Prevent ArrayData validation length overflow#9816
Conversation
| /// A thread-safe, shared reference to the Arrow array data. | ||
| pub type ArrayDataRef = Arc<ArrayData>; | ||
|
|
||
| fn checked_len_plus_offset( |
There was a problem hiding this comment.
the idea is to validate / error on overflow with a common helper
| if let Some(null_bit_buffer) = null_bit_buffer.as_ref() { | ||
| let needed_len = bit_util::ceil(len + offset, 8); | ||
| let len_plus_offset = checked_len_plus_offset(&data_type, len, offset)?; | ||
| let needed_len = bit_util::ceil(len_plus_offset, 8); |
There was a problem hiding this comment.
Just double checking -- this bit_util::ceil cannot overflow because it always performs a division whose output is smaller than the input (so adding one cannot overflow). Or, when dividing by 1, the result is returned unchanged (no need to add one)?
There was a problem hiding this comment.
Right --
arrow-rs/arrow-buffer/src/util/bit_util.rs
Lines 95 to 97 in 4fa8d2f
Uses div_ceil, https://doc.rust-lang.org/std/primitive.usize.html#method.div_ceil
Calculates the smallest value greater than or equal to self that is a multiple of rhs.
though maybe we should just remove bit_util::ceil and call div_ceil directly now that it is in stable Rust 🤔 ( as a follow on PR)
|
|
||
| // This should have been checked as part of `validate()` prior | ||
| // to calling `validate_full()` but double check to be sure | ||
| assert!(buffer.len() / mem::size_of::<T>() >= required_len); |
There was a problem hiding this comment.
aside: Is this a potential panic we should think about turning to an error instead?
There was a problem hiding this comment.
I do think it is a potential panic (though as the code says it "should not be possible")
making it an internal error of some sort would likely be the more defensive strategy
I can do that in a follow on PR
| if let Some(null_bit_buffer) = null_bit_buffer.as_ref() { | ||
| let needed_len = bit_util::ceil(len + offset, 8); | ||
| let len_plus_offset = checked_len_plus_offset(&data_type, len, offset)?; | ||
| let needed_len = bit_util::ceil(len_plus_offset, 8); |
There was a problem hiding this comment.
Right --
arrow-rs/arrow-buffer/src/util/bit_util.rs
Lines 95 to 97 in 4fa8d2f
Uses div_ceil, https://doc.rust-lang.org/std/primitive.usize.html#method.div_ceil
Calculates the smallest value greater than or equal to self that is a multiple of rhs.
though maybe we should just remove bit_util::ceil and call div_ceil directly now that it is in stable Rust 🤔 ( as a follow on PR)
|
|
||
| // This should have been checked as part of `validate()` prior | ||
| // to calling `validate_full()` but double check to be sure | ||
| assert!(buffer.len() / mem::size_of::<T>() >= required_len); |
There was a problem hiding this comment.
I do think it is a potential panic (though as the code says it "should not be possible")
making it an internal error of some sort would likely be the more defensive strategy
I can do that in a follow on PR
|
Thank you for the review @scovich |
- None. `ArrayData` validation used unchecked `usize` arithmetic when combining array lengths and offsets. In optimized builds, very large lengths could wrap these calculations and allow invalid `ArrayData` metadata to pass validation. This adds checked arithmetic for length plus offset calculations in `ArrayData` validation, including offset-buffer validation and related typed-buffer sizing paths. Yes. This adds regression coverage for overflowing offset-buffer and typed-buffer length calculations. Validated with: ```bash cargo test -p arrow-data overflow --release ``` Invalid `ArrayData` whose length and offset cannot be represented without overflow now returns an validation error consistently across build modes. There are no API changes.
Which issue does this PR close?
Rationale for this change
ArrayDatavalidation used uncheckedusizearithmetic when combining array lengths and offsets. In optimized builds, very large lengths could wrap these calculations and allow invalidArrayDatametadata to pass validation.What changes are included in this PR?
This adds checked arithmetic for length plus offset calculations in
ArrayDatavalidation, including offset-buffer validation and related typed-buffer sizing paths.Are these changes tested?
Yes. This adds regression coverage for overflowing offset-buffer and typed-buffer length calculations.
Validated with:
cargo test -p arrow-data overflow --releaseAre there any user-facing changes?
Invalid
ArrayDatawhose length and offset cannot be represented without overflow now returns an validation error consistently across build modes. There are no API changes.