[C++] IO: BufferedInputStream triggers a small IO when a small read crosses the edge of the buffer #37434
Comments
cc @jp0317 as a Buffered-Read user in parquet :-)
I would probably have expected it to read a full chunk, yes. I suppose here we should copy the part of the existing buffer that we need, then discard the rest and read a new chunk.
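A minimal sketch of that strategy, as a toy model that tracks only byte counts. It is not Arrow's implementation: `ToyBufferedReader`, `raw_io`, and the bypass threshold for large reads are hypothetical names and choices made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model (hypothetical, not Arrow code): tracks sizes only, no real data.
struct ToyBufferedReader {
  int64_t buffer_size;          // capacity of the internal buffer
  int64_t buffered = 0;         // bytes currently available in the buffer
  std::vector<int64_t> raw_io;  // size of every read issued to the raw stream

  explicit ToyBufferedReader(int64_t size) : buffer_size(size) {
    assert(size > 0);
  }

  void Read(int64_t nbytes) {
    // 1. Copy the part of the existing buffer that we need.
    const int64_t from_buffer = std::min(nbytes, buffered);
    buffered -= from_buffer;
    nbytes -= from_buffer;
    if (nbytes == 0) return;

    // The buffer is empty from here on.
    // 2. Assumption: a remainder at least as large as the buffer is read
    //    directly, without going through the buffer.
    if (nbytes >= buffer_size) {
      raw_io.push_back(nbytes);
      return;
    }

    // 3. Small remainder: read a whole new chunk, then serve from it.
    raw_io.push_back(buffer_size);
    buffered = buffer_size - nbytes;
  }
};
```

With a 100k buffer and 34 consecutive 3k reads, this model issues two raw reads of 100k each. The 34th read takes its first 1k from the old chunk and its last 2k from the freshly read one.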
Okay, I'll submit a version that reads a new large chunk tonight.
Yes, this could be improved. But for large reads (larger than the buffer size), we don't want to write first to the buffer and then to
I've drafted a patch here: #37460
I found that
#37460)

### Rationale for this change

If we set BufferSize == 100k and read 3k bytes per IO, the 34th read covers (99k, 102k]. In `Read`, it will read the buffered (99k, 100k] and issue an IO for (100k, 102k], rather than for (100k, 200k].

### What changes are included in this PR?

Refactor `BufferedInputStream::Read` to optimize small IO.

### Are these changes tested?

Already has tests?

### Are there any user-facing changes?

Users might see a changed IO pattern. It can be an optimization or a regression.

* Closes: #37434

Lead-authored-by: mwish <maplewish117@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Describe the enhancement requested
Suppose we set BufferSize == 100k and read 3k bytes per IO. The 34th read then covers (99k, 102k]. In `Read`, it reads the buffered (99k, 100k] and issues an IO for (100k, 102k], rather than for (100k, 200k]. Is this expected?
Component(s)
C++
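The arithmetic in the description can be checked with a small model of the reported behavior. This is a sketch under assumptions, not Arrow code: `ReportedRawReads` is a hypothetical helper, "k" is taken as 1000 bytes, and the model assumes that a read arriving at an empty buffer fills a whole chunk.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the sizes of the raw reads issued for
// `num_reads` sequential reads of `read_size` bytes, modeling the behavior
// reported in this issue (a read that crosses the end of the buffered data
// issues a raw read for exactly the missing bytes).
std::vector<int64_t> ReportedRawReads(int64_t buffer_size, int64_t read_size,
                                      int num_reads) {
  assert(buffer_size > 0 && read_size > 0 && read_size < buffer_size);
  std::vector<int64_t> raw;
  int64_t buffered = 0;  // bytes left in the buffer
  for (int i = 0; i < num_reads; ++i) {
    if (buffered >= read_size) {
      // Fully served from the buffer, no raw IO.
      buffered -= read_size;
    } else if (buffered > 0) {
      // Straddling read: drain the tail, then a small raw read for the rest.
      raw.push_back(read_size - buffered);
      buffered = 0;
    } else {
      // Assumption: an empty buffer is refilled with a full chunk.
      raw.push_back(buffer_size);
      buffered = buffer_size - read_size;
    }
  }
  return raw;
}
```

For `ReportedRawReads(100000, 3000, 34)` the model yields one 100k read followed by one 2k read: the first 33 reads consume (0, 99k], and the 34th takes 1k from the buffer and issues a raw read for only the remaining 2k.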