Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: bug in offset calculation for unions #863

Merged
merged 1 commit into from
Oct 26, 2021

Conversation

helgikrs
Copy link
Contributor

Which issue does this PR close?

Closes #862.

What changes are included in this PR?

The value_offset function only read the least significant byte in the
offset array, causing issues with unions with more than 255 rows of any
given variant. This fixes the issue by reading the entire i32 offset.

The `value_offset` function only read the least significant byte in the
offset array, causing issues with unions with more than 255 rows of any
given variant. Fix the issue by reading the entire i32 offset and add a
unit test.
@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 26, 2021
@helgikrs helgikrs changed the title fix: fix a bug in offset calculation for unions fix: bug in offset calculation for unions Oct 26, 2021
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @helgikrs

Note there is outstanding work to update the Rust union implementation to a newer version of the arrow spec -- see #85 and #814 if you are interested

I ran the test without the changes in this PR and it fails thusly 👍 so in my mind the PR is good to go.

---- array::array_union::tests::test_dense_i32_large stdout ----
thread 'array::array_union::tests::test_dense_i32_large' panicked at 'assertion failed: `(left == right)`
  left: `0`,
 right: `256`', arrow/src/array/array_union.rs:366:13
stack backtrace:
   0: rust_begin_unwind
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/std/src/panicking.rs:517:5
   1: core::panicking::panic_fmt
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/panicking.rs:101:14
   2: core::panicking::assert_failed_inner
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/panicking.rs:183:17
   3: core::panicking::assert_failed
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/panicking.rs:140:5
   4: arrow::array::array_union::tests::test_dense_i32_large
             at ./src/array/array_union.rs:366:13
   5: arrow::array::array_union::tests::test_dense_i32_large::{{closure}}
             at ./src/array/array_union.rs:338:5
   6: core::ops::function::FnOnce::call_once
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/ops/function.rs:227:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/09c42c45858d5f3aedfa670698275303a3d19afa/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

cc @nevi-me @paddyhoran

@codecov-commenter
Copy link

Codecov Report

Merging #863 (3b4a836) into master (2662bd8) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 3b4a836 differs from pull request most recent head 43900cd. Consider uploading reports for the commit 43900cd to get more accurate results
Impacted file tree graph

@@           Coverage Diff           @@
##           master     #863   +/-   ##
=======================================
  Coverage   82.66%   82.67%           
=======================================
  Files         168      168           
  Lines       48172    48197   +25     
=======================================
+ Hits        39823    39848   +25     
  Misses       8349     8349           
Impacted Files Coverage Δ
arrow/src/array/array_union.rs 89.88% <100.00%> (+0.61%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2662bd8...43900cd. Read the comment docs.

Copy link
Contributor

@paddyhoran paddyhoran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @helgikrs, LGTM.

@alamb alamb merged commit 05eb63f into apache:master Oct 26, 2021
alamb pushed a commit that referenced this pull request Oct 27, 2021
The `value_offset` function only read the least significant byte in the
offset array, causing issues with unions with more than 255 rows of any
given variant. Fix the issue by reading the entire i32 offset and add a
unit test.
alamb added a commit that referenced this pull request Oct 27, 2021
The `value_offset` function only read the least significant byte in the
offset array, causing issues with unions with more than 255 rows of any
given variant. Fix the issue by reading the entire i32 offset and add a
unit test.

Co-authored-by: Helgi Kristvin Sigurbjarnarson <helgikrs@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect offset calculation for unions
4 participants