Prevent buffer builder length overflow in MutableBuffer::extend_zeros#9820
alamb merged 4 commits into apache:main
    let new_len = self
        .len
        .checked_add(additional)
        .expect("buffer length overflow");
Introducing internal panics is not ideal, though clearly better than UB. How should library users code defensively to avoid a panic, and how can we make their lives easier?
I agree panics are not good and should be avoided when possible
We had a bit of a philosophical debate about this earlier (when to panic vs Error) and the conclusion we came to got codified in this doc, which I think is relevant here: https://github.com/apache/arrow-rs#guidelines-for-panic-vs-result
Basically, do we force all downstream consumers to check for an error that will very likely never happen? I think the answer is a matter of opinion.
Yeah... arguably the guidelines say we should be returning an error here (because asking for too many entries is a form of invalid input), but it certainly complicates the API. I don't know if there's a way to provide a fallible version of this API, for paranoid consumers to use?
I do agree it should be a very rare error, but I've also been unpleasantly surprised at how often 32-bit StringArray offsets blow up in practice.
These are 64-bit values, right? It is probably OK to leave it as a panic because, last I knew, most hardware cannot physically index more than 48 bits of virtual memory, and most operating systems cap the size of any one memory mapping to a few TB of contiguous virtual address space (even one not backed by memory).
I think usize is 64 bits on 64-bit architectures and 32 bits on 32-bit architectures
> I do agree it should be a very rare error, but I've also been unpleasantly surprised at how often 32-bit StringArray offsets blow up in practice.
Yeah, overflowing i32 offsets (more than 2 GB of string data) is shockingly common.
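To make the 32-bit offset problem concrete, here is a minimal sketch of how a running `i32` offset stops fitting once the total string data reaches 2 GiB. The helper `i32_offsets` is hypothetical and is not part of arrow-rs; it only illustrates the arithmetic.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not an arrow-rs API): builds i32 offsets for a
/// sequence of string lengths, returning `None` as soon as the running
/// total no longer fits in an `i32`.
fn i32_offsets(lengths: &[usize]) -> Option<Vec<i32>> {
    let mut offsets = Vec::with_capacity(lengths.len() + 1);
    let mut total: usize = 0;
    offsets.push(0);
    for &len in lengths {
        // Guard the usize accumulation first, then the narrowing to i32.
        total = total.checked_add(len)?;
        offsets.push(i32::try_from(total).ok()?);
    }
    Some(offsets)
}
```

Two strings of 1 GiB each are already enough to make this return `None`, since their combined length is one more than `i32::MAX`.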
> I don't know if there's a way to provide a fallible version of this API, for paranoid consumers to use?
I mean we could add a try_extend_zeros or something 🤔
Yeah, I agree it might be nice to add some try_ functions and then document that the current versions might panic.
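As a rough illustration of what a `try_` variant could look like, here is a sketch on a toy buffer. `ToyBuffer`, `ExtendError`, and the exact signature are assumptions made for this example; this is not arrow's `MutableBuffer` and not the API that was eventually proposed in the ticket.

```rust
/// Toy stand-in for a growable byte buffer; NOT arrow's `MutableBuffer`.
#[derive(Debug, Default)]
struct ToyBuffer {
    data: Vec<u8>,
}

/// Hypothetical error type for the fallible variant.
#[derive(Debug, PartialEq)]
enum ExtendError {
    LengthOverflow,
    AllocationFailed,
}

impl ToyBuffer {
    /// Fallible variant: reports overflow or allocation failure to the caller.
    fn try_extend_zeros(&mut self, additional: usize) -> Result<(), ExtendError> {
        let new_len = self
            .data
            .len()
            .checked_add(additional)
            .ok_or(ExtendError::LengthOverflow)?;
        self.data
            .try_reserve(additional)
            .map_err(|_| ExtendError::AllocationFailed)?;
        self.data.resize(new_len, 0);
        Ok(())
    }

    /// Panicking variant, kept as a thin wrapper over the fallible one.
    fn extend_zeros(&mut self, additional: usize) {
        self.try_extend_zeros(additional)
            .expect("failed to extend buffer");
    }
}
```

With this shape, most callers keep the simple panicking call, while callers that handle untrusted sizes can match on the `Result` instead.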
I filed a ticket to consider adding new try_ variants
I also pushed a bunch of documentation updates to this PR to document that the APIs panic in certain cases
@scovich or @etseidl I would like to get this into the 58.2.0 release if possible -- I realize we can continue to improve the PR, but I think this is better than what is on main. As @scovich says:
If you agree, can I ask one of you to approve this PR so I can merge it?
etseidl left a comment:
I agree. This is better, and we can worry about further improvements later.
Awesome - with that I think I have all the content we need for the release. I'll move to making an RC.
Which issue does this PR close?
Rationale for this change
BufferBuilder reserve paths relied on unchecked usize arithmetic when calculating the required byte length. In optimized builds, very large requested lengths could wrap before capacity growth.
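The wrap-around can be reproduced in isolation. The sketch below assumes the byte length is computed roughly as `len * size_of::<T>()` and uses `wrapping_mul` to mimic what an unchecked multiplication does when overflow checks are disabled (the default for release builds). The function name is illustrative only.

```rust
use std::mem::size_of;

/// Illustrative only: the byte length an unchecked `len * size_of::<T>()`
/// would yield when overflow checks are off and the product wraps.
fn wrapping_byte_len<T>(len: usize) -> usize {
    len.wrapping_mul(size_of::<T>())
}
```

For example, `wrapping_byte_len::<u64>(usize::MAX / 8 + 2)` evaluates to 8, so a request for an enormous number of elements would look like a request for only 8 bytes.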
What changes are included in this PR?
This adds checked arithmetic for MutableBuffer byte length calculations used by reserve and zero-extension paths.
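A minimal sketch of the checked approach, assuming the new length is the current byte length plus `additional * size_of::<T>()`. The helper `checked_new_len` is hypothetical and is not the code from this PR; only the `checked_add(...).expect("buffer length overflow")` pattern is taken from the diff above.

```rust
use std::mem::size_of;

/// Hypothetical helper: total bytes needed after appending `additional`
/// elements of `T` to a buffer that currently holds `len_bytes` bytes.
/// Panics instead of wrapping when the result does not fit in `usize`.
fn checked_new_len<T>(len_bytes: usize, additional: usize) -> usize {
    let additional_bytes = additional
        .checked_mul(size_of::<T>())
        .expect("buffer length overflow");
    len_bytes
        .checked_add(additional_bytes)
        .expect("buffer length overflow")
}
```

Both the multiplication and the addition are checked, so the panic fires before any capacity growth is attempted with a wrapped value.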
Are these changes tested?
Yes. This adds regression coverage for overflowing BufferBuilder length calculations through reserve, append_n_zeroed, and advance.
Are there any user-facing changes?
Invalid requested lengths whose byte size cannot be represented without overflow now panic consistently. There are no API changes.