Prevent FixedSizeBinaryArray::value offset truncation#9850
Prevent FixedSizeBinaryArray::value offset truncation#9850alamb wants to merge 1 commit intoapache:mainfrom
FixedSizeBinaryArray::value offset truncation#9850Conversation
e77819a to
7faf57d
Compare
| /// checking for overflow. | ||
| #[inline] | ||
| fn value_offset_at(&self, i: usize) -> i32 { | ||
| self.value_length * i as i32 |
There was a problem hiding this comment.
this arithmetic can overflow if the result is larger than i32::MAX
| /// Caller is responsible for ensuring that the index is within the bounds | ||
| /// of the array | ||
| /// of the array and the resulting byte offset fits in `i32` | ||
| pub unsafe fn value_unchecked(&self, i: usize) -> &[u8] { |
There was a problem hiding this comment.
Would it make sense to add new methods that are alternatives to value_offset and value_offset_at that return usize, so we don't need this limitation? Or at least update this method so it doesn't use them and doesn't suffer from i32 overflow. Because with this change, value now handles when value_offset is greater than max i32, but value_unchecked still doesn't.
I would expect value_unchecked to work correctly for all cases where value doesn't panic.
And existing code that uses value_unchecked might validate the index but not be aware of this hidden safety requirement.
There was a problem hiding this comment.
Short answer is yes. I also spent some more time reviewing the code in FixedSizeBinaryArray and I am now convinced there are several other miuses of i32 <-> usize . I am working on an improvement, though I worry it will be a larger PR
|
I also added some more docs for FixedSizeBinaryArray that may help reviewers |
|
I have played around with several options for improving this code. I think there are several potential i32 math overflow issues, but fixing them all in a single PR is going to be somewhat hard to review and take some time What I am thinking about is adding a new invariant to the FixedSizeArray constructor that prevents constructing arrays with value buffers larger than 2GB as a temporary workaround in one PR. Then I can fixup the actual arithmetic in a second: |
Which issue does this PR close?
Rationale for this change
FixedSizeBinaryArray::value_offset_atcast the requested index toi32before multiplying by the element width. For indexes beyondi32::MAX, that truncation could produce a negative byte offset and causevalue()to read before the start of the value buffer.What changes are included in this PR?
Note I also added some more docs for FixedSizeBinaryArray that may help reviewers
Are these changes tested?
I can't find any way to test this this issue without actually allocating a large array (over 2GB)
Are there any user-facing changes?
Better limit checking