New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve ad_utility::.strip() and ad_utility::splitAny() #88
Conversation
niklas88
commented
Aug 8, 2018
•
edited
edited
- Use std::string_first/last_not_of to simplify
- Template so we only need one version for string/char* chars to strip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments with optional changes, apart from that seems good to me.
src/util/StringUtils.h
Outdated
size_t i = 0; | ||
while (i < text.size() && text[i] == c) { | ||
++i; | ||
auto it = text.begin(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also add an additional variable auto right = text.end();
and use that instead of both calls to text.end()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd hope that every self respecting compiler inlines std::string::end()
and then it's just a pointer/size comparison either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, compiler magic. Okay, then leaving it that way could also result in better readable code.
src/util/StringUtils.h
Outdated
size_t i = 0; | ||
while (i < text.size() && chars[static_cast<unsigned char>(text[i])]) { | ||
++i; | ||
auto it = text.begin(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, regarding the text.end()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
find_first_not_of
src/util/StringUtils.h
Outdated
while (i > 0 && text[i - 1] == c) { | ||
--i; | ||
inline string rstrip(const string& text, const char c) { | ||
auto it = text.end(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for text.begin()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
find_last_not_of
src/util/StringUtils.h
Outdated
size_t i = text.size(); | ||
while (i > 0 && chars[static_cast<unsigned char>(text[i - 1])]) { | ||
--i; | ||
auto it = text.end(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for text.begin()
.
src/util/StringUtils.h
Outdated
; | ||
return std::string(left, right); | ||
auto right = text.end(); | ||
for (; left < text.end(); left++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here one could also use the right
instead of text.end()
.
src/util/StringUtils.h
Outdated
} | ||
auto left = text.begin(); | ||
auto right = text.end(); | ||
for (; left < text.end(); left++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here one could also use the right
instead of text.end()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the strip functions are used in a performance critical area, we could also try returning string_views, but I am not sure whether this causes too much trouble when explicitly having to cast them to strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the main problem with the string_view
approach is that we must be sure that none of the strings that are viewed goes out of scope/is deleted while we keep a view.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct. If (and only if) we have use cases, where this is critical for performance, we could implement it with
template <class Prefix, bool returnStringView = false>
structConvertingTrueToStringView::value lstrip(const string& text, Prefix p) {...}
and then
string bla = lstrip(someString, "abcdef");
string_view blubb = lstrip<true>(someString, "abcdef");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this whole section can be cleaned up by using the STL.
src/util/StringUtils.h
Outdated
@@ -508,26 +508,30 @@ J join(const vector<J>& to_join, const S& joiner) { | |||
|
|||
// _____________________________________________________________________________ | |||
inline string lstrip(const string& text, char c) { | |||
size_t i = 0; | |||
while (i < text.size() && text[i] == c) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason, why lstrip (for any arguments) does not use string::find_first_of which is basically reimplemented here for all possible overloads?
inline string lstrip(const string& text, string s) { return text.substr(text.find_first_not_of(s)); }
The same goes for rstrip with find_last_not_of
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@niklas88 Actually you could probably template this to "everything that has an overload of find_first_not_of, so
template <class S>
inline string lstrip(const string& text, S&& s)
{ return text.substr(text.find_first_not_of(std::forward<S>(s))); }
(the first argument to find_first_not_of is always "a char, or a set of chars" in some way (string, char, char*, convertible to string_view") so this should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly templating lstrip()
on the first argument doesn't work because then in lstrip("foo", ' ')
the first argument is deduced to char[size]
which doesn't have last_index_not_of()
. So we can only template the second argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is exactly what I suggested.
I don't know if string_views have these methods, but const string&'s are sufficiently effective for now I would guess.
src/util/StringUtils.h
Outdated
size_t i = 0; | ||
while (i < text.size() && text[i] == c) { | ||
++i; | ||
auto it = text.begin(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The join function is basically std::accumulate
auto op = [joiner&](const auto& a, const auto&b) { return a + joiner + b;};
return std::accumulate(to_join.begin(), to_join.end(), op;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if Performance is critical, it could be overloaded for strings, where we do it by hand but with a call to reserve
before or even resize
and manual emplacement with memcpy
. (Should not be necessary if the operator+=
is implemented well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't get this to work as the init
argument needs to match the joind types and e.g. the following
template <typename J, typename S>
J join(const vector<J>& to_join, const S& joiner) {
auto op = [&joiner](const auto& a, const auto& b) {return a + joiner + b;};
return std::accumulate(to_join.begin(), to_join.end(), J(joiner), op);
}
breaks since there is no std::basic_string(const char&)
constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so first, you are setting the initial value to the joiner, this is not correct.
My version has the flaw that it would start with the joiner (and yours twice if it could be constructed). So what we would need is probably:
template <typename J, typename S>
J join(const vector<J>& to_join, const S& joiner) {
if (to_join.empty) {return J();}
auto op = [&joiner](const auto& a, const auto& b) {return a + joiner + b;};
return std::accumulate(to_join.begin() + 1, to_join.end(), to_join.first, op);
}
This should works, since adding single chars to strings is ok.
src/util/StringUtils.h
Outdated
size_t i = 0; | ||
while (i < text.size() && chars[static_cast<unsigned char>(text[i])]) { | ||
++i; | ||
auto it = text.begin(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
find_first_not_of
src/util/StringUtils.h
Outdated
while (i > 0 && text[i - 1] == c) { | ||
--i; | ||
inline string rstrip(const string& text, const char c) { | ||
auto it = text.end(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
find_last_not_of
src/util/StringUtils.h
Outdated
} | ||
auto left = text.begin(); | ||
auto right = text.end(); | ||
for (; left < text.end(); left++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the strip functions are used in a performance critical area, we could also try returning string_views, but I am not sure whether this causes too much trouble when explicitly having to cast them to strings.
@joka921 very good points, it makes no sense to stick with such low level primitives as the original code used. I really should study up on some of those STL algorithm tricks. I'll address your comments and agree on the potential for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Templates are somewhat implicit so we need explicit comments unless we have concepts.
src/util/StringUtils.h
Outdated
|
||
inline string rstrip(const string& text, const char* s); | ||
template <class T> | ||
inline string strip(const string& text, const T& s); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be more explicit to comment the constraints of the template parameters like "T must be a single char or a string(_view) like type." (not a good example but I think you get what I mean).
@joka921 I've amended the commit with better documentation and also made |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I think this already helps a lot.
src/util/StringUtils.h
Outdated
inline vector<string> splitAny(const string& orig, const char* seps); | ||
|
||
inline vector<string> splitAny(const string& orig, const string& seps); | ||
//! Splits a string a any character inside the separators string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:s/a/at
src/util/StringUtils.h
Outdated
//! a std::string_view. | ||
//! This is analogous to how std::string::find_first_of() works | ||
template <class T> | ||
inline vector<string> splitAny(const string& orig, const T& seperators); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:s/seperators/separators
src/util/StringUtils.h
Outdated
vector<string> splitAny(const string& orig, const string& seps) { | ||
template <class T> | ||
vector<string> splitAny(const string& orig, const T& seperators) { | ||
std::string_view seps(seperators); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe comment that this ensures us to have a size() method when we come from string literals etc. I was a bit confused at first why this is done.
* Use std::string_first/last_not_of to simplify * Template so we only need one version for string/char* chars to strip