
GH-49896: [C++] Reject short buffer reads in IPC reader #49897

Merged
pitrou merged 1 commit into apache:main from
pitrou:gh49896-ipc-short-reads
Apr 30, 2026

Conversation

@pitrou
Member

@pitrou pitrou commented Apr 29, 2026

Rationale for this change

IO methods like ReadAt can return fewer bytes than requested if the file is too short, but the IPC reader doesn't always detect this situation. On invalid IPC files, this can cause problems down the road, such as half-initialized buffers and long processing times (with a potential denial of service).

This issue was detected by OSS-Fuzz: https://issues.oss-fuzz.com/issues/489758017

What changes are included in this PR?

  1. Add ReadAt and ReadAsync overloads that accept a bool allow_short_read argument
  2. Pass allow_short_read = false in all suitable places in IPC and Parquet readers
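As a rough illustration of the behavior described above, here is a minimal, self-contained sketch. `ToyFile` is a hypothetical in-memory stand-in, not Arrow's `RandomAccessFile`, and its return type (`std::optional<int64_t>`) is an assumption made to keep the example dependency-free; the real overloads may differ in signature and error reporting.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <utility>

// Hypothetical in-memory file used only for illustration; NOT an Arrow class.
class ToyFile {
 public:
  explicit ToyFile(std::string data) : data_(std::move(data)) {}

  // Copies up to `nbytes` bytes starting at `position` into `out`.
  // Returns the number of bytes copied, or std::nullopt on error.
  // With allow_short_read == false, reaching end-of-file before `nbytes`
  // bytes are available is reported as an error instead of a short count.
  std::optional<int64_t> ReadAt(int64_t position, int64_t nbytes,
                                bool allow_short_read, void* out) const {
    const int64_t size = static_cast<int64_t>(data_.size());
    if (position < 0 || nbytes < 0 || position > size) return std::nullopt;
    const int64_t available = std::min(nbytes, size - position);
    if (!allow_short_read && available < nbytes) return std::nullopt;
    if (available > 0) {
      std::memcpy(out, data_.data() + position, static_cast<size_t>(available));
    }
    return available;
  }

  // Legacy-style overload: short reads are silently allowed, so the caller
  // must compare the returned count against `nbytes` itself.
  std::optional<int64_t> ReadAt(int64_t position, int64_t nbytes, void* out) const {
    return ReadAt(position, nbytes, /*allow_short_read=*/true, out);
  }

 private:
  std::string data_;
};
```

With a 10-byte file, asking for 8 bytes at offset 6 through the legacy-style overload yields a count of 4 and leaves the tail of the destination buffer untouched, whereas the strict call fails outright, which is the property a reader of untrusted files wants.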

Are these changes tested?

Yes, by existing tests and a new fuzz regression file.

Are there any user-facing changes?

No, except potentially better detection of invalid IPC streams and files.

@pitrou pitrou force-pushed the gh49896-ipc-short-reads branch 2 times, most recently from a2afb58 to 5ced4d5 Compare April 29, 2026 15:46
@pitrou pitrou added the CI: Extra: C++ and CI: Extra: CUDA labels Apr 29, 2026
@pitrou
Member Author

pitrou commented Apr 29, 2026

@github-actions crossbow submit -g cpp

@github-actions

Revision: 5ced4d5

Submitted crossbow builds: ursacomputing/crossbow @ actions-ccefa1cf50

Task Status
example-cpp-minimal-build-static GitHub Actions
example-cpp-minimal-build-static-system-dependency GitHub Actions
example-cpp-tutorial GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-debian-13-cpp-amd64 GitHub Actions
test-debian-13-cpp-i386 GitHub Actions
test-debian-experimental-cpp-gcc-15 GitHub Actions
test-fedora-42-cpp GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-bundled GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-bundled-offline GitHub Actions
test-ubuntu-24.04-cpp-gcc-13-bundled GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions
test-ubuntu-24.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-24.04-cpp-thread-sanitizer GitHub Actions

@pitrou pitrou marked this pull request as ready for review April 29, 2026 16:36
@pitrou pitrou requested a review from wgtmac as a code owner April 29, 2026 16:36
@pitrou
Member Author

pitrou commented Apr 29, 2026

This is adding to the number of virtual methods and overloads in the IO interfaces. I think we could deprecate the legacy ReadAt and ReadAsync that don't take a bool allow_short_read argument in a later PR. Thoughts @lidavidm @WillAyd @zanmato1984 ?
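One way such a deprecation could look is sketched below, using the standard C++ `[[deprecated]]` attribute on a hypothetical `ToyReader` class. This is only an assumption about the general shape; the class, its exception-based error handling, and the deprecation message are invented for illustration and do not reflect how Arrow would implement it.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical reader type for illustration; not an Arrow class.
class ToyReader {
 public:
  explicit ToyReader(std::string data) : data_(std::move(data)) {}

  // Preferred overload: the caller states whether a short read is acceptable.
  // Throws std::runtime_error on a disallowed short read.
  int64_t ReadAt(int64_t position, int64_t nbytes, bool allow_short_read,
                 void* out) const {
    const int64_t size = static_cast<int64_t>(data_.size());
    if (position < 0 || nbytes < 0 || position > size) {
      throw std::out_of_range("invalid read range");
    }
    const int64_t available = std::min(nbytes, size - position);
    if (!allow_short_read && available < nbytes) {
      throw std::runtime_error("short read");
    }
    if (available > 0) {
      std::memcpy(out, data_.data() + position, static_cast<size_t>(available));
    }
    return available;
  }

  // Legacy overload: compilers warn at each call site, nudging callers
  // toward making the short-read policy explicit.
  [[deprecated("pass allow_short_read explicitly")]]
  int64_t ReadAt(int64_t position, int64_t nbytes, void* out) const {
    return ReadAt(position, nbytes, /*allow_short_read=*/true, out);
  }

 private:
  std::string data_;
};
```

Existing callers keep compiling and behaving as before, but each use of the three-argument form produces a compiler warning until it is migrated.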

@pitrou
Member Author

pitrou commented Apr 29, 2026

(if I could redo this, the ReadAt and ReadAsync methods would read exactly the given number of bytes by default)

@pitrou pitrou requested review from WillAyd and lidavidm April 29, 2026 17:01
Comment on lines +291 to +292
/// Like `ReadAt(position, nbytes, allow_short_read, out)` with `allow_short_read`
/// set to true.
Member


I wonder if we should deprecate these overloads over time (it feels like it would be safer to have allow_short_read be the opt-in rather than opt-out behavior at least)

Member Author


By "these overloads", you mean those without the allow_short_read parameter, right?

And, yes, I agree that disallowing short reads by default would definitely be safer. Short reads by default is fine in a "safe" language like Python, not so much in C++.

Member


Yes, I think it would be safer if they were eventually removed to avoid this cropping up.

Member


Ah sorry I missed your comment above. Yes, I agree, we should do that in a later PR.

Member Author


(I'll open a separate issue and PR for deprecation)

Comment thread cpp/src/arrow/io/interfaces.cc
@github-actions github-actions Bot added the awaiting merge and awaiting changes labels and removed the awaiting review and awaiting merge labels Apr 30, 2026
@pitrou pitrou force-pushed the gh49896-ipc-short-reads branch from 5ced4d5 to 83a1648 Compare April 30, 2026 08:22
@github-actions github-actions Bot added the awaiting change review label and removed the CI: Extra: C++ and awaiting changes labels Apr 30, 2026
@pitrou pitrou merged commit 5a331a9 into apache:main Apr 30, 2026
67 of 75 checks passed
@pitrou pitrou removed the awaiting change review label Apr 30, 2026
@pitrou pitrou deleted the gh49896-ipc-short-reads branch April 30, 2026 13:30
@pitrou
Member Author

pitrou commented Apr 30, 2026

I've created #49904 for the deprecation.

@conbench-apache-arrow

After merging your PR, Conbench analyzed the 0 benchmarking runs that have been run so far on merge-commit 5a331a9.

None of the specified runs were found on the Conbench server.

The full Conbench report has more details.
