[string] Chunk reads #4610
Profiling with callgrind revealed that about 60% of the time in a `something | string match` call was actually spent in `string_get_arg_stdin()`, because it was calling `read` one byte at a time. This makes it read in chunks, similar to the `read` builtin. This improves the run time of `getent hosts | string match -v '0.0.0.0*'` from about 300ms to about 30ms (i.e. a reduction of roughly 90%). At that point it's _actually_ quicker than `grep`. To improve performance even more, we'd have to cut down on `str2wcstring`.
Urgh, I need to attribute https://stackoverflow.com/questions/1583353/how-to-read-exactly-one-line/1584620#1584620, which this is based on.
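For readers following along, here is a minimal sketch of the chunked-read idea described above. It is not the code from this PR: `read_line_chunked` is a hypothetical helper, the 4096-byte chunk size is an arbitrary assumption, and it works on plain `std::string` rather than going through `str2wcstring`.

```cpp
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <string>

// Hypothetical helper for illustration, not fish's string_get_arg_stdin().
// Pops one line (without its '\n') from `buffer` into `line`, refilling
// `buffer` from `fd` in chunks while it holds no complete line.
// Returns false on a read error, or at EOF once nothing is left to return.
bool read_line_chunked(int fd, std::string &buffer, std::string &line) {
    assert(fd >= 0);
    size_t pos;
    while ((pos = buffer.find('\n')) == std::string::npos) {
        char chunk[4096];  // chunk size is an assumption of this sketch
        ssize_t n = read(fd, chunk, sizeof chunk);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal, retry
            return false;                  // any other error: give up
        }
        if (n == 0) {
            // EOF: hand back a final unterminated line, if there is one.
            if (buffer.empty()) return false;
            line = buffer;
            buffer.clear();
            return true;
        }
        buffer.append(chunk, static_cast<size_t>(n));
    }
    // Split the buffer around the '\n' found and return the first part.
    line.assign(buffer, 0, pos);
    buffer.erase(0, pos + 1);
    return true;
}
```

The caller keeps `buffer` alive between calls, so one `read` can serve many lines instead of costing one syscall per byte.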
Nice work; a few nits for improvement and then it's G2G
src/builtin_string.cpp
Outdated
// Read in chunks from fd until buffer has a line.
std::string::iterator pos;
while ((pos = std::find (buffer.begin(), buffer.end(), '\n')) == buffer.end ()) {
This is a bit more idiomatic as:
size_t pos;
while ((pos = buffer.find('\n')) == std::string::npos) {...
THAT WAS IT! I've been thinking that there was some nicer way to `find`, but I couldn't for the life of me come up with it.
src/builtin_string.cpp
Outdated
*storage = str2wcstring(arg);
return storage->c_str();
// Split the buffer around '\n' found and return first part.
*storage = str2wcstring(buffer.c_str(), std::distance(buffer.begin(), pos));
With the above this could be simply *storage = str2wcstring(buffer, pos);
src/builtin_string.cpp
Outdated
return storage->c_str();
// Split the buffer around '\n' found and return first part.
*storage = str2wcstring(buffer.c_str(), std::distance(buffer.begin(), pos));
buffer = std::string(pos + 1, buffer.end());
and this could be buffer.erase(0, pos + 1);
I think with my suggestions we can skip the attributions
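To make the review suggestions concrete, the sketch below puts the iterator-based form from the diff next to the index-based form proposed in the comments. Both helper names (`pop_line_iter`, `pop_line_index`) are made up for this illustration, and both assume the buffer already contains a `'\n'`; the point is only that the two forms should leave the buffer in the same state.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Iterator-based form, as in the diff under review.
// Precondition (assumed by this sketch): `buffer` contains a '\n'.
std::string pop_line_iter(std::string &buffer) {
    std::string::iterator pos = std::find(buffer.begin(), buffer.end(), '\n');
    assert(pos != buffer.end());
    std::string line(buffer.begin(), pos);
    // Rebuild the buffer from everything after the newline.
    buffer = std::string(pos + 1, buffer.end());
    return line;
}

// Index-based form, following the review suggestions.
// Same precondition: `buffer` contains a '\n'.
std::string pop_line_index(std::string &buffer) {
    size_t pos = buffer.find('\n');
    assert(pos != std::string::npos);
    std::string line(buffer, 0, pos);
    // Drop the line and its newline in place, without a temporary string.
    buffer.erase(0, pos + 1);
    return line;
}
```

The index form avoids the `std::distance` call and the temporary `std::string`, which is the simplification the comments above are pointing at.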
Description
Fixes issue #4604.
TODOs: