GH-49896: [C++] Reject short buffer reads in IPC reader#49897
GH-49896: [C++] Reject short buffer reads in IPC reader#49897pitrou merged 1 commit intoapache:mainfrom
Conversation
a2afb58 to
5ced4d5
Compare
|
@github-actions crossbow submit -g cpp |
|
Revision: 5ced4d5 Submitted crossbow builds: ursacomputing/crossbow @ actions-ccefa1cf50 |
|
This is adding to the number of virtual methods and overloads in the IO interfaces. I think we could deprecate the legacy |
|
(if I could redo this, the |
| /// Like `ReadAt(position, nbytes, allow_short_read, out)` with `allow_short_read` | ||
| /// set to true. |
There was a problem hiding this comment.
I wonder if we should deprecate these overloads over time (it feels like it would be safer to have allow_short_read be the opt-in rather than opt-out behavior at least)
There was a problem hiding this comment.
By "these overloads", you mean those without the allow_short_read parameter, right?
And, yes, I agree that disallowing short reads by default would definitely be safer. Short reads by default is fine in a "safe" language like Python, not so much in C++.
There was a problem hiding this comment.
Yes, I think it would be safer if they were eventually removed to avoid this cropping up.
There was a problem hiding this comment.
Ah sorry I missed your comment above. Yes, I agree, we should do that in a later PR.
There was a problem hiding this comment.
(I'll open a separate issue and PR for deprecation)
5ced4d5 to
83a1648
Compare
|
I've created #49904 for the deprecation. |
|
After merging your PR, Conbench analyzed the 0 benchmarking runs that have been run so far on merge-commit 5a331a9. None of the specified runs were found on the Conbench server. The full Conbench report has more details. |
Rationale for this change
IO methods like
ReadAtcan return less bytes than asked for if the file is too short, but the IPC reader doesn't always detect for this situation. On invalid IPC files, this can produce issues down the road such as half-initialized buffers and large processing times (with a potential denial of service).This issue was detected by OSS-Fuzz: https://issues.oss-fuzz.com/issues/489758017
What changes are included in this PR?
ReadAtandReadAsyncoverloads that accept abool allow_short_readargumentallow_short_read = falsein all suitable places in IPC and Parquet readersAre these changes tested?
Yes, by existing tests and new fuzz regression file.
Are there any user-facing changes?
No, except potentially better detection of invalid IPC streams and files.