Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ad_utility::.strip() and ad_utility::splitAny() #88

Merged
merged 1 commit into from Aug 16, 2018
Merged

Improve ad_utility::.strip() and ad_utility::splitAny() #88

merged 1 commit into from Aug 16, 2018

Conversation

niklas88
Copy link
Member

@niklas88 niklas88 commented Aug 8, 2018

  • Use std::string_first/last_not_of to simplify
  • Template so we only need one version for string/char* chars to strip

Copy link

@naetherm naetherm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments with optional changes, apart from that seems good to me.

size_t i = 0;
while (i < text.size() && text[i] == c) {
++i;
auto it = text.begin();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also add an additional variable auto right = text.end(); and use that instead of both calls to text.end().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd hope that every self respecting compiler inlines std::string::end() and then it's just a pointer/size comparison either way.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, compiler magic. Okay, then leaving it that way could also result in better readable code.

size_t i = 0;
while (i < text.size() && chars[static_cast<unsigned char>(text[i])]) {
++i;
auto it = text.begin();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, regarding the text.end().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find_first_not_of

while (i > 0 && text[i - 1] == c) {
--i;
inline string rstrip(const string& text, const char c) {
auto it = text.end();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for text.begin().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find_last_not_of

size_t i = text.size();
while (i > 0 && chars[static_cast<unsigned char>(text[i - 1])]) {
--i;
auto it = text.end();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for text.begin().

;
return std::string(left, right);
auto right = text.end();
for (; left < text.end(); left++) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here one could also use the right instead of text.end().

}
auto left = text.begin();
auto right = text.end();
for (; left < text.end(); left++) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here one could also use the right instead of text.end().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the strip functions are used in a performance critical area, we could also try returning string_views, but I am not sure whether this causes too much trouble when explicitly having to cast them to strings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main problem with the string_view approach is that we must be sure that none of the strings that are viewed goes out of scope/is deleted while we keep a view.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct. If (and only if) we have use cases, where this is critical for performance, we could implement it with
template <class Prefix, bool returnStringView = false>
structConvertingTrueToStringView::value lstrip(const string& text, Prefix p) {...}

and then
string bla = lstrip(someString, "abcdef");
string_view blubb = lstrip<true>(someString, "abcdef");

@niklas88 niklas88 removed the request for review from floriankramer August 8, 2018 15:41
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this whole section can be cleaned up by using the STL.

@@ -508,26 +508,30 @@ J join(const vector<J>& to_join, const S& joiner) {

// _____________________________________________________________________________
inline string lstrip(const string& text, char c) {
size_t i = 0;
while (i < text.size() && text[i] == c) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason, why lstrip (for any arguments) does not use string::find_first_of which is basically reimplemented here for all possible overloads?
inline string lstrip(const string& text, string s) { return text.substr(text.find_first_not_of(s)); }
The same goes for rstrip with find_last_not_of

Copy link
Member

@joka921 joka921 Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@niklas88 Actually you could probably template this to "everything that has an overload of find_first_not_of, so
template <class S>
inline string lstrip(const string& text, S&& s)
{ return text.substr(text.find_first_not_of(std::forward<S>(s))); }

(the first argument to find_first_not_of is always "a char, or a set of chars" in some way (string, char, char*, convertible to string_view") so this should work.

Copy link
Member Author

@niklas88 niklas88 Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly templating lstrip() on the first argument doesn't work because then in lstrip("foo", ' ') the first argument is deduced to char[size] which doesn't have last_index_not_of(). So we can only template the second argument

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is exactly what I suggested.
I don't know if string_views have these methods, but const string&'s are sufficiently effective for now I would guess.

size_t i = 0;
while (i < text.size() && text[i] == c) {
++i;
auto it = text.begin();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The join function is basically std::accumulate
auto op = [joiner&](const auto& a, const auto&b) { return a + joiner + b;};
return std::accumulate(to_join.begin(), to_join.end(), op;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if Performance is critical, it could be overloaded for strings, where we do it by hand but with a call to reserve before or even resize and manual emplacement with memcpy. (Should not be necessary if the operator+= is implemented well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't get this to work as the init argument needs to match the joind types and e.g. the following

template <typename J, typename S>
J join(const vector<J>& to_join, const S& joiner) {
  auto op = [&joiner](const auto& a, const auto& b) {return a + joiner + b;};
  return std::accumulate(to_join.begin(), to_join.end(), J(joiner), op);
}

breaks since there is no std::basic_string(const char&) constructor

Copy link
Member

@joka921 joka921 Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so first, you are setting the initial value to the joiner, this is not correct.
My version has the flaw that it would start with the joiner (and yours twice if it could be constructed). So what we would need is probably:
template <typename J, typename S>

J join(const vector<J>& to_join, const S& joiner) {
  if (to_join.empty) {return J();}
  auto op = [&joiner](const auto& a, const auto& b) {return a + joiner + b;};
  return std::accumulate(to_join.begin() + 1, to_join.end(), to_join.first, op);
}

This should works, since adding single chars to strings is ok.

size_t i = 0;
while (i < text.size() && chars[static_cast<unsigned char>(text[i])]) {
++i;
auto it = text.begin();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find_first_not_of

while (i > 0 && text[i - 1] == c) {
--i;
inline string rstrip(const string& text, const char c) {
auto it = text.end();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find_last_not_of

}
auto left = text.begin();
auto right = text.end();
for (; left < text.end(); left++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the strip functions are used in a performance critical area, we could also try returning string_views, but I am not sure whether this causes too much trouble when explicitly having to cast them to strings.

@niklas88
Copy link
Member Author

@joka921 very good points, it makes no sense to stick with such low level primitives as the original code used. I really should study up on some of those STL algorithm tricks. I'll address your comments and agree on the potential for std::string_view though I think that's a question for another PR

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Templates are somewhat implicit so we need explicit comments unless we have concepts.


inline string rstrip(const string& text, const char* s);
template <class T>
inline string strip(const string& text, const T& s);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more explicit to comment the constraints of the template parameters like "T must be a single char or a string(_view) like type." (not a good example but I think you get what I mean).

@niklas88 niklas88 changed the title Improve string strip Improve ad_utility::.strip() and ad_utility::splitAny() Aug 16, 2018
@niklas88
Copy link
Member Author

@joka921 I've amended the commit with better documentation and also made splitAny() a template as well. Instead of reimplementing it completely with find_.*_of(), for now it just uses the same "convert to std::string_view" trick. Interestingly http://en.cppreference.com currently seems to miss the std::string_view(const std::string&) constructor.

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think this already helps a lot.

inline vector<string> splitAny(const string& orig, const char* seps);

inline vector<string> splitAny(const string& orig, const string& seps);
//! Splits a string a any character inside the separators string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:s/a/at

//! a std::string_view.
//! This is analogous to how std::string::find_first_of() works
template <class T>
inline vector<string> splitAny(const string& orig, const T& seperators);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:s/seperators/separators

vector<string> splitAny(const string& orig, const string& seps) {
template <class T>
vector<string> splitAny(const string& orig, const T& seperators) {
std::string_view seps(seperators);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe comment that this ensures us to have a size() method when we come from string literals etc. I was a bit confused at first why this is done.

* Use std::string_first/last_not_of to simplify
* Template so we only need one version for string/char* chars to strip
@niklas88 niklas88 merged commit 7c6005f into ad-freiburg:master Aug 16, 2018
@niklas88 niklas88 deleted the improve-string-strip branch October 31, 2018 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants