Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implemented efficient readline for ByteBuffer #426

Merged
merged 4 commits into from Mar 15, 2020
Merged

implemented efficient readline for ByteBuffer #426

merged 4 commits into from Mar 15, 2020

Conversation

mpenkov
Copy link
Collaborator

@mpenkov mpenkov commented Mar 9, 2020

Motivation

The previous implementation searched for the terminator in the buffer twice. This was unnecessary.

The new implementaiton performs the search only once.

Checklist

Before you create the PR, please make sure you have:

  • Picked a concise, informative and complete title
  • Clearly explained the motivation behind the PR
  • Linked to any existing issues that your PR will be solving
  • Included tests for any new functionality
  • Checked that all unit tests pass

@mpenkov mpenkov added this to the 1.10.0 milestone Mar 9, 2020
@mpenkov mpenkov requested a review from piskvorky March 9, 2020 03:19
Copy link
Owner

@piskvorky piskvorky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a PR for performance optimization, but I see no numbers. What is the impact of this change?

smart_open/bytebuffer.py Outdated Show resolved Hide resolved
Co-Authored-By: Radim Řehůřek <me@radimrehurek.com>
smart_open/bytebuffer.py Show resolved Hide resolved
Before:

--------------------------------------------- benchmark: 1 tests --------------------------------------------
Name (time in s)         Min      Max     Mean  StdDev   Median     IQR  Outliers     OPS  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------
test                 37.5192  58.2815  42.6553  8.8182  39.4090  7.2349       1;1  0.0234       5           1
-------------------------------------------------------------------------------------------------------------

After:

--------------------------------------------- benchmark: 1 tests --------------------------------------------
Name (time in s)         Min      Max     Mean  StdDev   Median     IQR  Outliers     OPS  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------
test                 36.2176  40.6042  37.2671  1.8898  36.3286  1.6347       1;1  0.0268       5           1
-------------------------------------------------------------------------------------------------------------
@mpenkov
Copy link
Collaborator Author

mpenkov commented Mar 11, 2020

Please check out the benchmarks above. With 5 runs to read 1 million lines, the minimum time hasn't decreased all that much (from 38 to 36s), but the max time changed significantly (from 58 to 40s).

@mpenkov mpenkov self-assigned this Mar 11, 2020
@mpenkov mpenkov merged commit 89b2f44 into piskvorky:master Mar 15, 2020
@mpenkov mpenkov deleted the read_next_line branch March 15, 2020 05:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BufferedInputBase readline reading buffer twice
2 participants