Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bring more consistency to nmodl::stringutils utility functions #1066

Closed
tristan0x opened this issue Sep 6, 2023 · 6 comments · Fixed by #1071
Closed

Bring more consistency to nmodl::stringutils utility functions #1066

tristan0x opened this issue Sep 6, 2023 · 6 comments · Fixed by #1071
Assignees
Labels
proposal New suggestions

Comments

@tristan0x
Copy link
Member

Status

Here are some of the utility functions in src/utils/string_utils.hpp:

std::string& ltrim(std::string& s)
std::string& rtrim(std::string& s)
std::string& trim(std::string& s)
void remove_character(std::string& str, const char c)
std::string& trim_newline(std::string& s)
std::string escape_quotes(const std::string& before)
std::string align_text(const std::string& text, int width, text_alignment type)
std::string tolower(std::string text)

We can distinguish 2 categories:

  1. the functions doing a side effect by accepting a reference to a std::string. Most of these functions also return the reference, probably to use them with a functional style (std::cout << stringutils::trim(my_str)). Note that the function remove_character does not follow that idiom by returning void.
  2. The others returning a new instance of std::string with no side effect. Besides, tolower should accept a r-value, not an l-value :)

Situation

The functions of the first category are a bit misleading, because the fact that they return a std::string may suggest that the parameter is not modified, when it actually is.

Target

2 solutions:

  1. either functions accept an l-value, do in-place edition of the string, and return nothing.
  2. or adopt a full functional style by returning a copy of the input string with no side-effect

I prefer to sacrifice the functional style of (2) in the altar of performance by doing in-place edition of the strings with solution (1)

What do you think @ohm314 @pramodk ?

@tristan0x tristan0x added the proposal New suggestions label Sep 6, 2023
@pramodk
Copy link
Contributor

pramodk commented Sep 6, 2023

I didn’t check all usages but just want to say that « performance » aspect is not much a concern in the context of nmodl (as its translation program and strings that we manipulate here are hundreds of chars long).

So I don’t have strong: something that make usage simpler, nicer, intuitive is fine.

tagging @1uc as well as he has been looking at the code in recent weeks and would have opinion.

Edit: Copying @1uc's comment from the Slack:

In Option 1 how does one chain operations: ltrim(rtrim(s));

@ohm314
Copy link
Contributor

ohm314 commented Sep 6, 2023

uff, I'm happy to see that @pramodk had the same feeling as me:

  • I don't think either that performance is an issue here. The nmodl codegen is not so performance critical that I would be focusing too much on that
  • I much prefer to have an API to use in the codegen code that makes it easy and elegant to write

so my conclusion is that option 1 option 2 is preferable, since functional style allows you to do more and with simpler/cleaner code. Having said this, I absolutely agree with @tristan0x that these functions need to be cleaned up and made consistent.
(edit: I mistakenly wrote option 1, but meant option 2, clearly I should get some more rest before I continue here...)

@1uc
Copy link
Collaborator

1uc commented Sep 6, 2023

Even in Option 2 it's not entirely obvious that it's a large performance penalty, due to move semantics. For example:

auto str = trim("   fooo   "); // Create one string, and copy elision.
str = ltrim(rtrim(str)); // 1 copy, 2 moves.
str = ltrim(rtrim(std::move(str))); // 3 moves

Everyone's favourite source suggests that sizeof(std::string) == 3 * sizeof(void*). Hence, a move should copy the name number of bytes for both long and short strings. The source:
https://stackoverflow.com/a/21710033

If you're doing something like stripping whitespace from the front, you end up copying the size of the string (minus the leading whitespace). So you don't have to incur a penalty that larger than the cost of the function.

Finally, there's to_upper which from it's signature is likely used in contexts where one doesn't want to destroy the original. In these cases the copy is unavoidable; and there's no difference in cost between Option 1 and Option 2 in terms of speed.

Which brings us to the final reason why Option 1 is a bit clunky:

trim(str + "foo");  // Option 1 => compilation error
auto str_foo = trim(str + "foo"); // Option 2 => just fine.

@1uc
Copy link
Collaborator

1uc commented Sep 7, 2023

One observation, I can't remember seeing any of these functions and I don't immediately see a use for any of them, except tolower / toupper, because they're rather low level. NMODL has stronger abstractions, that I'd expect to deal with padding/alignment issues.

ltrim: 1 use
rtrim: 0 uses + 1 in trim
trim: approx. 16 uses
remove_character: 3 uses
align_text: 3 uses

@ohm314
Copy link
Contributor

ohm314 commented Sep 7, 2023

ah you're making a good point - but we can clean them up nevertheless.

@1uc
Copy link
Collaborator

1uc commented Sep 7, 2023

It's not an argument against cleaning it up. Rather it's an argument that a super ergonomic API maybe isn't as important as we initially thought. Also the cost of changing this if we notice that it's too slow, isn't prohibitive.

tristan0x added a commit that referenced this issue Sep 8, 2023
* The prototypes of these functions now follow these 2 principles:
  1. the input string is kept intact
  2. the transformed text is returned in a new `std::string` instance
* Improve the Doxygen documentation

Also removed some useless work along the way while updating the calls to these functions.

Fixes #1066
@tristan0x tristan0x assigned tristan0x and unassigned ohm314 and pramodk Sep 8, 2023
tristan0x added a commit that referenced this issue Sep 8, 2023
* The prototypes of these functions now follow these 2 principles:
  1. the input string is kept intact
  2. the transformed text is returned in a new `std::string` instance
* Improve the Doxygen documentation

Also removed some useless work along the way while updating the calls to these functions.

Fixes #1066
tristan0x added a commit that referenced this issue Sep 8, 2023
* The prototypes of these functions now follow these 2 principles:
  1. the input string is kept intact
  2. the transformed text is returned in a new `std::string` instance
* Improve the Doxygen documentation

Also removed some useless work along the way while updating the calls to these functions.

Fixes #1066

Co-authored-by: Luc Grosheintz <luc.grosheintz-laval@epfl.ch>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal New suggestions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants