New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
String: optimize gsub and tr for ascii char and single byte string #4978
String: optimize gsub and tr for ascii char and single byte string #4978
Conversation
if replacement.is_a?(Char) && char.ascii? && replacement.ascii? | ||
return gsub_ascii_char(char, replacement) | ||
end | ||
|
||
gsub { |my_char| char == my_char ? replacement : my_char } | ||
else | ||
self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slightly unrelated but there is no direct coverage of this line in the specs. I wanted to double check the optimized overload have specs, and they do. But there is no String#gsub(Char, replacement)
invocation that will stress this path. Other specs fails if this line is changed though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a spec.
Another discussion: replacement
can be anything, like a number, array, string or char. Maybe we should limit it to only chars and strings? (like in Ruby)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave it unbounded until there is another overload. I think is clear from the class and the name of the method what to expect.
|
Yes, this is with release. Note that this only improves the performance if you replace one ascii char with another one. In all other cases the performance doesn't improve. And this was already in the order of nanoseconds so it's probably not noticeable in a real app. But you can tell us later :-) |
Fixes #4586
The main performance benefit is that when replacing a single byte char with another single byte char the new string bytesize is known to be the same as the current one, so we can just allocate that. The general
gsub
implementation uses aString::Builder
with an initial approximate capacity.Here I also check for single byte strings and invoke the same method (different overload) with Char, which should be more efficient in general.
I used this as a benchmark:
Old output:
New output:
(the last two entries are just there to see that those checks don't ruin the performance when the optimization can't be applied)