Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix null bitmap length validation (#1231) #1232

Merged
merged 2 commits into from
Jan 24, 2022

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Closes #1231.

Rationale for this change

See ticket

What changes are included in this PR?

Fixes the null bitmap length validation

Are there any user-facing changes?

Previously invalid data will now be caught by validation

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 23, 2022
@codecov-commenter
Copy link

Codecov Report

Merging #1232 (4d112c1) into master (fcd37ee) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1232      +/-   ##
==========================================
- Coverage   82.70%   82.70%   -0.01%     
==========================================
  Files         175      175              
  Lines       51711    51713       +2     
==========================================
+ Hits        42769    42770       +1     
- Misses       8942     8943       +1     
Impacted Files Coverage Δ
arrow/src/array/data.rs 82.29% <100.00%> (+0.01%) ⬆️
parquet_derive/src/parquet_field.rs 65.98% <0.00%> (-0.23%) ⬇️
parquet/src/encodings/encoding.rs 93.52% <0.00%> (-0.20%) ⬇️
arrow/src/array/transform/mod.rs 85.56% <0.00%> (ø)
arrow/src/csv/reader.rs 88.12% <0.00%> (+0.01%) ⬆️
arrow/src/datatypes/field.rs 54.10% <0.00%> (+0.30%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fcd37ee...4d112c1. Read the comment docs.

fn test_bitmap_too_small() {
let buffer = make_i32_buffer(100);
let buffer = make_i32_buffer(9);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why changes (other than the message) are needed for this test.

The original formulation with 100 still fails even after this PR, but with a different error message

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused -- if the size calculation really was off, I would expect this code to fail ... But it passes just fine on master without the changes in this PR. So 🤔

    #[test]
    fn test_bitmap_size_ok_ok() {
        let buffer = make_i32_buffer(8);
        let null_bit_buffer = Buffer::from(vec![0b11111111]);

        ArrayData::try_new(
            DataType::Int32,
            8,
            Some(0),
            Some(null_bit_buffer),
            0,
            vec![buffer],
            vec![],
        ).unwrap();
    }

Copy link
Contributor Author

@tustvold tustvold Jan 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've written make_i32_buffer(8); not make_i32_buffer(9); ?

The changes are necessary to make this test more restrictive, previously it passed because 100 rows needs 13 bytes which is still more than the 8 bytes it thinks are in the null buffer (when actually there is only 1 byte and 8 bits)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've written make_i32_buffer(8); not make_i32_buffer(9); ?

Yes, because I am making a 8 element array of Int32s.

What I was trying to do was find a test that shows incorrect behavior on master but passes with this PR. The test you have modified passed for me on master (with a different error message).

Is there a test that we can write that would have failed (validate() would pass on master but not on this PR)?

Copy link
Contributor Author

@tustvold tustvold Jan 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test as in the PR will not fail on master on account of the null bitmask length, it will in fact fail because it reads beyond the bounds of the null bit buffer and thinks there is a null bit 😱

The below passes validation on master

fn test_bitmap_too_small() {
        let buffer = make_i32_buffer(9);
        let null_bit_buffer = Buffer::from(vec![0b11111111]);

        ArrayData::try_new(
            DataType::Int32,
            9,
            Some(1), // OUT-OF-BOUNDS READ
            Some(null_bit_buffer),
            0,
            vec![buffer],
            vec![],
        )
        .unwrap();
    }

@@ -650,7 +650,8 @@ impl ArrayData {
}

// check null bit buffer size
if let Some(null_bit_buffer) = self.null_bitmap.as_ref() {
if let Some(null_bit_map) = self.null_bitmap.as_ref() {
let null_bit_buffer = null_bit_map.buffer_ref();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this change -- the length of the BitMap is in bits which seems like it is properly handled in bit_util::ceil() on the next linel

If you change this to use the underlying buffer, then the length is in bytes, so then the division should also be removed?

Copy link
Contributor Author

@tustvold tustvold Jan 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ceil computes the number of bytes from len_plus_offset and then compares this against the number of bits in the bitmap. This is the issue. In particular null_bit_buffer.len() previously returned the length in bits as that is what Bitmap::len returns. With this PR it instead calls Buffer::len which returns the length in bytes.

FWIW I filed #1233 as this is really confusing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha -- I finally got it to fail on master with MIRI

test array::data::tests::test_bitmap_too_small_invalid ... error: Undefined Behavior: using uninitialized data, but this operation requires initialized memory
    --> arrow/src/util/bit_chunk_iterator.rs:94:25
     |
94   |                 bits |= (byte as u64) << (i * 8 - bit_offset);
     |                         ^^^^^^^^^^^^^ using uninitialized data, but this operation requires initialized memory
     |
     = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
     = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
             
     = note: inside `util::bit_chunk_iterator::BitChunks::remainder_bits` at arrow/src/util/bit_chunk_iterator.rs:94:25
note: inside `buffer::immutable::Buffer::count_set_bits_offset` at arrow/src/buffer/immutable.rs:210:18
    --> arrow/src/buffer/immutable.rs:210:18
     |
210  |         count += chunks.remainder_bits().count_ones() as usize;
     |                  ^^^^^^^^^^^^^^^^^^^^^^^
note: inside `array::data::count_nulls` at arrow/src/array/data.rs:43:25
    --> arrow/src/array/data.rs:43:25
     |
43   |         len.checked_sub(buf.count_set_bits_offset(offset, len))
     |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `array::data::ArrayData::validate_full` at arrow/src/array/data.rs:931:33
    --> arrow/src/array/data.rs:931:33
     |
931  |         let actual_null_count = count_nulls(null_bitmap_buffer, self.offset, self.len);
     |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `array::data::ArrayData::try_new` at arrow/src/array/data.rs:344:9
    --> arrow/src/array/data.rs:344:9
     |
344  |         new_self.validate_full()?;
     |         ^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `array::data::tests::test_bitmap_too_small_invalid` at arrow/src/array/data.rs:1720:9
    --> arrow/src/array/data.rs:1720:9
     |
1720 | /         ArrayData::try_new(
1721 | |             DataType::Int32,
1722 | |             9,
1723 | |             Some(1), // OUT-OF-BOUNDS READ
...    |
1727 | |             vec![],
1728 | |         )
     | |_________^
note: inside closure at arrow/src/array/data.rs:1716:5
    --> arrow/src/array/data.rs:1716:5
     |
1715 |       #[test]
     |       ------- in this procedural macro expansion
1716 | /     fn test_bitmap_too_small_invalid() {
1717 | |         let buffer = make_i32_buffer(9);
1718 | |         let null_bit_buffer = Buffer::from(vec![0b11111111]);
1719 | |
...    |
1729 | |         .unwrap();
1730 | |     }
     | |_____^
     = note: inside `<[closure@arrow/src/array/data.rs:1716:5: 1730:6] as std::ops::FnOnce<()>>::call_once - shim` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `test::__rust_begin_short_backtrace::<fn()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:574:5
     = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:565:30
     = note: inside `<[closure@test::run_test::{closure#1}] as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send> as std::ops::FnOnce<()>>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1854:9
     = note: inside `<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>> as std::ops::FnOnce<()>>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:271:9
     = note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:485:40
     = note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:449:19
     = note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:136:14
     = note: inside `test::run_test_in_process` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:597:18
     = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:491:39
     = note: inside `test::run_test::run_test_inner` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:529:13
     = note: inside `test::run_test` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:561:28
     = note: inside `test::run_tests::<[closure@test::run_tests_console::{closure#2}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:304:17
     = note: inside `test::run_tests_console` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/console.rs:290:5
     = note: inside `test::test_main` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:115:15
     = note: inside `test::test_main_static` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/test/src/lib.rs:134:5
     = note: inside `main`
     = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:123:18
     = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:145:18
     = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
     = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:485:40
     = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:449:19
     = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:136:14
     = note: inside closure at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:128:48
     = note: inside `std::panicking::r#try::do_call::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:485:40
     = note: inside `std::panicking::r#try::<isize, [closure@std::rt::lang_start_internal::{closure#2}]>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panicking.rs:449:19
     = note: inside `std::panic::catch_unwind::<[closure@std::rt::lang_start_internal::{closure#2}], isize>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/panic.rs:136:14
     = note: inside `std::rt::lang_start_internal` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:128:20
     = note: inside `std::rt::lang_start::<()>` at /Users/alamb/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:144:17
     = note: this error originates in the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)

error: aborting due to previous error

error: test failed, to rerun pass '-p arrow --lib'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to make the docs clearer in #1237

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed that the test in this PR fails MIRI on master.

Thanks for bearing with me @tustvold

@alamb alamb merged commit 0375e81 into apache:master Jan 24, 2022
@alamb alamb added security and removed security labels Jan 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bitmap Length Validation is Incorrect
3 participants