Use littleendian arrow files for projection_should_work #1573

Merged
merged 1 commit into from Apr 16, 2022

Conversation

viirya
Member

@viirya viirya commented Apr 16, 2022

Which issue does this PR close?

Closes #1548.

Rationale for this change

Currently, with the force_validate feature enabled, the test projection_should_work fails.

cargo test --features=force_validate -p arrow
---- ipc::reader::tests::projection_should_work stdout ----
thread 'ipc::reader::tests::projection_should_work' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Last offset 251658240 of List(Field { name: \"item\", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }) is larger than values length 15")', arrow/src/array/data.rs:301:34

This is because the Arrow files used in the test are big-endian, and the IPC reader does not currently translate big-endian offsets (#859). Since that is a known issue, I changed the test to use the little-endian Arrow files and added a comment there to explain the reason.
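To see why the reported offset is so large, note that 251658240 is 0x0F000000, which is the value 15 with its four bytes reversed, and 15 is exactly the values length in the panic message. The sketch below reproduces that arithmetic with the standard library only. The functions `read_be_bytes_as_le` and `check_last_offset` are hypothetical names made up for illustration; they are not the arrow crate's reader or validation code.

```rust
/// Hypothetical helper: take an offset as a big-endian writer would store it
/// and decode those same four bytes as little-endian, i.e. without any
/// byte swapping.
fn read_be_bytes_as_le(value: i32) -> i32 {
    i32::from_le_bytes(value.to_be_bytes())
}

/// Hypothetical stand-in for an offset check of the kind that fails in the
/// panic message: the last offset must not exceed the length of the values.
fn check_last_offset(last_offset: i32, values_len: usize) -> Result<(), String> {
    if last_offset < 0 {
        return Err(format!("Last offset {} is negative", last_offset));
    }
    if last_offset as usize > values_len {
        return Err(format!(
            "Last offset {} is larger than values length {}",
            last_offset, values_len
        ));
    }
    Ok(())
}

fn main() {
    // The list's real last offset equals the values length, 15.
    let true_offset: i32 = 15;

    // Decoding the big-endian bytes without swapping gives the number
    // reported in the panic message.
    let misread = read_be_bytes_as_le(true_offset);
    assert_eq!(misread, 251_658_240);

    // The misread offset fails the check; the correctly decoded one passes.
    assert!(check_last_offset(misread, 15).is_err());
    assert!(check_last_offset(true_offset, 15).is_ok());

    println!("{:?}", check_last_offset(misread, 15));
}
```

Reading the little-endian copies of the same test files avoids the mismatch, because no byte swapping is needed on a little-endian host.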

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Apr 16, 2022
fn projection_should_work() {
    // complementary to the previous test
    let testdata = crate::util::test_util::arrow_test_data();
    let paths = vec![
        "generated_interval",
        "generated_datetime",
        // "generated_map", Err: Last offset 872415232 of Utf8 is larger than values length 52 (https://github.com/apache/arrow-rs/issues/859)
Member Author


This is also due to big-endian offsets, so I uncommented it too.
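The generated_map error fits the same pattern: 872415232 is 0x34000000, and reversing its bytes gives 52, the values length in that message. A rough diagnostic along these lines can be sketched as follows. `looks_byte_swapped` is a hypothetical helper invented for this illustration, not something provided by the arrow crate, and it is only a heuristic.

```rust
/// Hypothetical heuristic: an out-of-range last offset that becomes valid
/// once its bytes are reversed suggests an endianness mismatch rather than
/// corrupt data.
fn looks_byte_swapped(last_offset: i32, values_len: usize) -> bool {
    // An offset is in range if it is non-negative and does not exceed
    // the length of the values buffer.
    let in_range = |offset: i32| offset >= 0 && (offset as usize) <= values_len;
    !in_range(last_offset) && in_range(last_offset.swap_bytes())
}

fn main() {
    // Offset from the commented-out generated_map case.
    assert_eq!(872_415_232i32.swap_bytes(), 52);
    assert!(looks_byte_swapped(872_415_232, 52));

    // A correctly decoded offset is not flagged.
    assert!(!looks_byte_swapped(52, 52));

    // An offset that is out of range either way is not flagged as swapped.
    assert!(!looks_byte_swapped(1_000, 52));
}
```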

@codecov-commenter

codecov-commenter commented Apr 16, 2022

Codecov Report

Merging #1573 (c9a0e93) into master (dbc47e0) will decrease coverage by 0.00%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #1573      +/-   ##
==========================================
- Coverage   82.87%   82.87%   -0.01%     
==========================================
  Files         193      193              
  Lines       55304    55304              
==========================================
- Hits        45835    45834       -1     
- Misses       9469     9470       +1     
Impacted Files Coverage Δ
arrow/src/ipc/reader.rs 88.61% <ø> (ø)
arrow/src/datatypes/datatype.rs 66.40% <0.00%> (-0.40%) ⬇️
arrow/src/array/transform/mod.rs 86.35% <0.00%> (-0.12%) ⬇️
parquet_derive/src/parquet_field.rs 66.21% <0.00%> (+0.22%) ⬆️


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Contributor

@tustvold tustvold left a comment


@Dandandan
Contributor

Thanks for fixing this; it was an oversight on my part!

@Dandandan Dandandan merged commit 6bb6ed0 into apache:master Apr 16, 2022
@viirya
Member Author

viirya commented Apr 16, 2022

Thanks @tustvold @Dandandan!

Labels
arrow Changes to the arrow crate
Development

Successfully merging this pull request may close these issues.

Output of ipc::reader::tests::projection_should_work fails validation
4 participants