Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[string] Chunk reads #4610

Closed
wants to merge 5 commits into from
Closed

[string] Chunk reads #4610

wants to merge 5 commits into from

Conversation

faho
Copy link
Member

@faho faho commented Dec 18, 2017

Description

Profiling with callgrind revealed that about 60% of the time in a something | string match call
was actually spent in string_get_arg_stdin(),
because it was calling read one byte at a time.

This makes it read in chunks similar to builtin read.

This increases performance for getent hosts | string match -v '0.0.0.0*' from about 300ms to about 30ms (i.e. 90%).
At that point it's actually quicker than grep.

To improve performance even more, we'd have to cut down on str2wcstring.

Fixes issue #4604.

TODOs:

  • Changes to fish usage are reflected in user documentation/manpages.
  • Tests have been added for regressions fixed
  • User-visible changes noted in CHANGELOG.md

Profiling with callgrind revealed that about 60% of the time in a `something | string match` call
was actually spent in `string_get_arg_stdin()`,
because it was calling `read` one byte at a time.

This makes it read in chunks similar to builtin read.

This increases performance for `getent hosts | string match -v '0.0.0.0*'` from about 300ms to about 30ms (i.e. 90%).
At that point it's _actually_ quicker than `grep`.

To improve performance even more, we'd have to cut down on str2wcstring.
@faho faho added this to the fish-3.0 milestone Dec 18, 2017
@faho
Copy link
Member Author

faho commented Dec 18, 2017

Urgh, I need to attribute https://stackoverflow.com/questions/1583353/how-to-read-exactly-one-line/1584620#1584620, which this is based on.

Copy link
Member

@ridiculousfish ridiculousfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work; a few nits for for improvement and then it's G2G


// Read in chunks from fd until buffer has a line.
std::string::iterator pos;
while ((pos = std::find (buffer.begin(), buffer.end(), '\n')) == buffer.end ()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit more idiomatic as:

    size_t pos;
    while ((pos = buffer.find('\n')) == std::string::npos) {...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THAT WAS IT! I've been thinking that there was some nicer way to to find, but I couldn't for the life of me come up with it.

*storage = str2wcstring(arg);
return storage->c_str();
// Split the buffer around '\n' found and return first part.
*storage = str2wcstring(buffer.c_str(), std::distance(buffer.begin(), pos));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the above this could be simply *storage = str2wcstring(buffer, pos);

return storage->c_str();
// Split the buffer around '\n' found and return first part.
*storage = str2wcstring(buffer.c_str(), std::distance(buffer.begin(), pos));
buffer = std::string(pos + 1, buffer.end());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this could be buffer.erase(0, pos + 1);

@ridiculousfish
Copy link
Member

I think with my suggestions we can skip the attributions

@faho
Copy link
Member Author

faho commented Dec 20, 2017

Merged with a bit of cleanup as 2de38ef, with some additional work in f9d883d.

@faho faho closed this Dec 20, 2017
@faho faho added the release notes Something that is or should be mentioned in the release notes label Dec 20, 2017
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 17, 2020
@faho faho deleted the unbuffer branch March 4, 2022 13:26
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement release notes Something that is or should be mentioned in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants