Prevent buffer builder length overflow in `MutableBuffer::extend_zeros` by alamb · Pull Request #9820 · apache/arrow-rs

alamb · 2026-04-25T03:53:15Z

Which issue does this PR close?

None.

Rationale for this change

BufferBuilder reserve paths relied on unchecked usize arithmetic when calculating the required byte length. In optimized builds, where integer overflow checks are disabled by default, a very large requested length could wrap around before the capacity was grown.

What changes are included in this PR?

This adds checked arithmetic for MutableBuffer byte length calculations used by reserve and zero-extension paths.
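
A minimal sketch of the difference between unchecked and checked length arithmetic, using only the standard library. The helper names `required_bytes` and `required_bytes_wrapping` are hypothetical and are not part of arrow-rs; the wrapping version only spells out what unchecked `usize` arithmetic computes when overflow checks are off.

```rust
use std::mem::size_of;

/// Hypothetical helper (not an arrow-rs API): number of bytes needed to hold
/// `len + additional` elements of type `T`.
/// Returns `None` if either the addition or the multiplication overflows.
fn required_bytes<T>(len: usize, additional: usize) -> Option<usize> {
    len.checked_add(additional)?
        .checked_mul(size_of::<T>())
}

/// The same calculation with explicit wrapping operations, to show the
/// value that unchecked arithmetic silently produces on overflow.
fn required_bytes_wrapping<T>(len: usize, additional: usize) -> usize {
    len.wrapping_add(additional)
        .wrapping_mul(size_of::<T>())
}
```

With the checked version a caller can choose to panic (`.expect("buffer length overflow")`, as this PR does) or to surface an error. The wrapping version returns a small number for a huge request, which is the situation the PR guards against.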

Are these changes tested?

Yes. This adds regression coverage for overflowing BufferBuilder length calculations through reserve, append_n_zeroed, and advance.

Are there any user-facing changes?

Invalid requested lengths whose byte size cannot be represented without overflow now panic consistently. There are no API changes.

scovich · 2026-04-27T15:27:10Z

+        let new_len = self
+            .len
+            .checked_add(additional)
+            .expect("buffer length overflow");


Introducing internal panics is not ideal, tho clearly better than UB. How should library users code defensively to avoid panic, and how can we make their life easier?

I agree panics are not good and should be avoided when possible

We had a bit of a philosophical debate about this earlier (when to panic vs Error) and the conclusion we came to got codified in this doc, which I think is relevant here: https://github.com/apache/arrow-rs#guidelines-for-panic-vs-result

Basically, do we force all downstream consumers to check for what is very likely an error that will never happen? I think the answer depends on opinion.

Yeah... arguably the guidelines say we should be returning an error here (because asking for too many entries is a form of invalid input), but it certainly complicates the API. I don't know if there's a way to provide a fallible version of this API, for paranoid consumers to use?

I do agree it should be a very rare error, but I've also been unpleasantly surprised at how often 32-bit StringArray offsets blow up in practice.

These are 64-bit values right? Probably ok to leave it as a panic because last I knew most hardware cannot physically index more than 48 bits of virtual memory and most operating systems cap the size of any one memory mapping to a few TB of contiguous virtual address space (even one not backed by memory).

I think usize is 64 bits on 64-bit architectures and 32 bits on 32-bit architectures
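
To illustrate why the pointer width matters here, the sketch below repeats the same byte-length calculation with explicit `u32` and `u64` types standing in for a 32-bit and a 64-bit `usize`. The function names are hypothetical and chosen only for this example.

```rust
/// Byte length as it would be computed with a 32-bit `usize`
/// (simulated with `u32` so the example behaves the same on any target).
fn bytes_needed_32(elements: u32, elem_size: u32) -> Option<u32> {
    elements.checked_mul(elem_size)
}

/// Byte length as it would be computed with a 64-bit `usize`
/// (simulated with `u64`).
fn bytes_needed_64(elements: u64, elem_size: u64) -> Option<u64> {
    elements.checked_mul(elem_size)
}
```

Requesting 2^29 eight-byte elements needs 2^32 bytes: that fits comfortably in 64 bits but is one past the largest value a 32-bit length can hold, so the overflow is far easier to reach on 32-bit targets.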

> I do agree it should be a very rare error, but I've also been unpleasantly surprised at how often 32-bit StringArray offsets blow up in practice.

Yeah, i32 (2GB strings) is shockingly common

> I don't know if there's a way to provide a fallible version of this API, for paranoid consumers to use?

I mean we could add a try_extend_zeros or something 🤔

Yeah, I agree it might be nice to add some try_ functions and then document that the current versions might panic.
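
One possible shape for such a pair of functions, sketched on a toy type. Everything here (`ToyBuffer`, `GrowError`, and the exact `try_extend_zeros` signature) is an assumption made for illustration; the thread does not settle the real API, and `MutableBuffer` is not used.

```rust
/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq, Eq)]
enum GrowError {
    /// `len + additional` does not fit in `usize`.
    LengthOverflow,
    /// The allocator could not provide the requested capacity.
    AllocFailed,
}

/// Toy stand-in for a growable byte buffer (not arrow-rs's `MutableBuffer`).
#[derive(Default)]
struct ToyBuffer {
    data: Vec<u8>,
}

impl ToyBuffer {
    /// Fallible variant: reports overflow instead of panicking.
    fn try_extend_zeros(&mut self, additional: usize) -> Result<(), GrowError> {
        let new_len = self
            .data
            .len()
            .checked_add(additional)
            .ok_or(GrowError::LengthOverflow)?;
        self.data
            .try_reserve(additional)
            .map_err(|_| GrowError::AllocFailed)?;
        self.data.resize(new_len, 0);
        Ok(())
    }

    /// Panicking variant, implemented on top of the fallible one so the
    /// two cannot drift apart.
    fn extend_zeros(&mut self, additional: usize) {
        self.try_extend_zeros(additional)
            .expect("buffer length overflow");
    }

    fn len(&self) -> usize {
        self.data.len()
    }
}
```

Layering the panicking method over the fallible one keeps a single copy of the length check, and callers who never expect to hit the limit keep the simpler signature.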

I filed a ticket to consider adding new try_ variants

[arrow-buffer] Add fallible APIs for overflow-prone buffer growth #9843

I also pushed a bunch of documentation updates to this PR to document that the APIs panic in certain cases

alamb · 2026-04-28T19:23:39Z

@scovich or @etseidl I would like to get this into the 58.2.0 release if possible -- I realize we can continue to improve the PR, but I think this is better than what is on main. As @scovich says:

> Introducing internal panics is not ideal, tho clearly better than UB.

If you agree can I ask one of you to approve this PR so I can merge it?

etseidl

I agree. This is better, and we can worry about further improvements later.

alamb · 2026-04-28T20:16:46Z

Awesome - with that I think I have all the content we need for the release. I'll move to making an RC.

Prevent buffer builder length overflow

0bb8caf

github-actions Bot added the arrow Changes to the arrow crate label Apr 25, 2026

alamb marked this pull request as ready for review April 25, 2026 04:02

alamb mentioned this pull request Apr 25, 2026

Release arrow-rs / parquet Minor version 58.2.0 (April 2026) #9109

Open

19 tasks

scovich reviewed Apr 27, 2026

alamb mentioned this pull request Apr 28, 2026

[arrow-buffer] Add fallible APIs for overflow-prone buffer growth #9843

Open

alamb added 3 commits April 28, 2026 15:16

Document panic conditions

f1ed690

Merge remote-tracking branch 'apache/main' into codex/buffer-builder-…

0321722

…reserve-overflow

More docs

ed53dc9

alamb changed the title ~~Prevent buffer builder length overflow~~ Prevent buffer builder length overflow in MutableBuffer::extend_zeros Apr 28, 2026

etseidl approved these changes Apr 28, 2026

alamb merged commit 3c4311c into apache:main Apr 28, 2026
27 checks passed

alamb deleted the codex/buffer-builder-reserve-overflow branch April 28, 2026 20:17
