Prevent FixedSizeBinaryArray::value offset truncation#9850

Open
alamb wants to merge 1 commit into apache:main from alamb:codex/fixed-size-binary-offset-overflow

@alamb alamb commented Apr 29, 2026

Which issue does this PR close?

  • None.

Rationale for this change

FixedSizeBinaryArray::value_offset_at cast the requested index to i32 before multiplying by the element width. For indexes beyond i32::MAX, that truncation could produce a negative byte offset and cause value() to read before the start of the value buffer.
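
To make the failure mode concrete, here is a minimal standalone sketch of the old arithmetic. `truncating_offset` is a hypothetical free function for illustration, not the arrow-rs method. It uses `wrapping_mul` so that it behaves the same in debug and release builds (a plain `*` panics on overflow in debug builds).

```rust
/// Hypothetical stand-in for the pre-fix arithmetic: cast the index to
/// `i32` first, then multiply by the element width.
fn truncating_offset(value_length: i32, i: usize) -> i32 {
    // `i as i32` silently keeps only the low 32 bits of `i`.
    value_length.wrapping_mul(i as i32)
}
```

With an element width of 1, the index `i32::MAX as usize + 1` maps to the offset `i32::MIN`, which points far before the start of the value buffer.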

What changes are included in this PR?

  1. Checks for offset overflow
  2. Adds regression tests

Note: I also added some more docs for FixedSizeBinaryArray that may help reviewers.

Are these changes tested?

I can't find any way to test this issue without actually allocating a large array (over 2 GB).

Are there any user-facing changes?

Better limit checking

@github-actions github-actions Bot added the arrow Changes to the arrow crate label Apr 29, 2026
@alamb alamb force-pushed the codex/fixed-size-binary-offset-overflow branch from e77819a to 7faf57d Compare April 29, 2026 17:42
/// checking for overflow.
#[inline]
fn value_offset_at(&self, i: usize) -> i32 {
    self.value_length * i as i32

Contributor Author

This arithmetic can overflow if the result is larger than i32::MAX.
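
One way to guard this arithmetic, sketched as a standalone function: `checked_value_offset` is a hypothetical name, and the actual change in this PR may be structured differently. The idea is to reject an index that does not fit in `i32` before multiplying, and to use `checked_mul` so that an overflowing product is reported instead of wrapping.

```rust
use std::convert::TryFrom;

/// Hypothetical checked variant of the offset computation.
/// Returns `None` when the index or the resulting byte offset
/// does not fit in `i32`.
fn checked_value_offset(value_length: i32, i: usize) -> Option<i32> {
    // Fails for any index above i32::MAX instead of truncating it.
    let i = i32::try_from(i).ok()?;
    // Fails if value_length * i exceeds the i32 range.
    value_length.checked_mul(i)
}
```

A caller could turn the `None` case into a panic with a clear message, or into an error, depending on the API contract it wants.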

@alamb alamb marked this pull request as ready for review April 29, 2026 18:22
/// Caller is responsible for ensuring that the index is within the bounds
/// of the array
/// of the array and the resulting byte offset fits in `i32`
pub unsafe fn value_unchecked(&self, i: usize) -> &[u8] {

Contributor

Would it make sense to add new methods that are alternatives to value_offset and value_offset_at that return usize, so we don't need this limitation? Or at least update this method so it doesn't use them and doesn't suffer from i32 overflow. Because with this change, value now handles when value_offset is greater than max i32, but value_unchecked still doesn't.

I would expect value_unchecked to work correctly for all cases where value doesn't panic.

And existing code that uses value_unchecked might validate the index but not be aware of this hidden safety requirement.
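
A rough sketch of what a `usize`-based alternative could look like, using a plain byte slice in place of the array's value buffer. Both `value_offset_usize` and `value_at` are hypothetical names for illustration only and are not part of the arrow-rs API.

```rust
/// Hypothetical helper: byte offset of element `i`, computed entirely in
/// `usize` so that no cast to `i32` is involved.
fn value_offset_usize(value_length: usize, i: usize) -> Option<usize> {
    value_length.checked_mul(i)
}

/// Hypothetical accessor over a plain byte slice that stands in for the
/// value buffer. Returns `None` for an out-of-range index.
fn value_at(values: &[u8], value_length: usize, i: usize) -> Option<&[u8]> {
    let start = value_offset_usize(value_length, i)?;
    let end = start.checked_add(value_length)?;
    // `get` performs the bounds check, so no unsafe code is needed here.
    values.get(start..end)
}
```

Because the offset never passes through `i32`, the only remaining precondition for an unchecked variant would be that the index is within bounds.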

Contributor Author

Short answer is yes. I also spent some more time reviewing the code in FixedSizeBinaryArray and I am now convinced there are several other misuses of i32 <-> usize. I am working on an improvement, though I worry it will be a larger PR.

alamb commented Apr 30, 2026

I also added some more docs for FixedSizeBinaryArray that may help reviewers

alamb commented Apr 30, 2026

I have played around with several options for improving this code. I think there are several potential i32 math overflow issues, but fixing them all in a single PR is going to be somewhat hard to review and will take some time.

What I am thinking about is adding a new invariant to the FixedSizeArray constructor that prevents constructing arrays with value buffers larger than 2GB as a temporary workaround in one PR. Then I can fix up the actual arithmetic in a second:
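
As an illustration only (this is not the author's code), a constructor-time invariant of this kind might look roughly like the sketch below. `check_value_buffer_len` is a hypothetical helper, and the error type is simplified to `String`.

```rust
use std::convert::TryFrom;

/// Hypothetical validation: reject arrays whose value buffer would be
/// larger than `i32::MAX` bytes, so later `i32` offset math cannot overflow.
/// Returns the total buffer size in bytes on success.
fn check_value_buffer_len(value_length: i32, len: usize) -> Result<usize, String> {
    // A negative element width is never valid.
    let width = usize::try_from(value_length)
        .map_err(|_| format!("negative value_length: {}", value_length))?;
    // Total bytes = element width * number of elements, checked in usize.
    let total = width
        .checked_mul(len)
        .ok_or_else(|| "value buffer size overflows usize".to_string())?;
    if total > i32::MAX as usize {
        return Err(format!("value buffer of {} bytes exceeds i32::MAX", total));
    }
    Ok(total)
}
```

With such a check in place, every valid element offset fits in `i32`, which is what makes the workaround safe until the arithmetic itself is reworked.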

Labels

arrow Changes to the arrow crate

2 participants