Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-10943: WIP: HACK! Test bool miri bugs #8948

Closed
wants to merge 2 commits into from

Conversation

GregBowyer
Copy link
Contributor

@alamb @andygrove

No idea if this is the root cause, but I figured that if its reading an incorrect number as a boolean its a messy transmute or similar.

As such I ran miri over the specific failing test and got two hits, one on the hash function and one on the bit packing.

The first commit states that the bit packing is unaligned which I think is ok to take (although it might be worth revisiting the bit-packing code).

The second replaces the hash function which is stated as being unaligned. I dont know if wyhash is the perfect choice (xxHash, t1ha or others might be equally good choices) but it does not suffer from irritating miri.

Thoughts? Ideas?

Running the boolean encoding test with Miri has the following warning:

```
greg@gregslaptop ~/.../rust/parquet $ MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test 'encodings::encoding::tests::test_bool'
   Compiling parquet v3.0.0-SNAPSHOT (/home/greg/projects/arrow/rust/parquet)
    Finished test [unoptimized + debuginfo] target(s) in 1.94s
     Running /home/greg/projects/arrow/rust/target/x86_64-unknown-linux-gnu/debug/deps/parquet-9bca74cf6a2c2a95

running 1 test
test encodings::encoding::tests::test_bool ... error: Undefined Behavior: accessing memory with alignment 2, but alignment 4 is required
    --> parquet/src/util/bit_packing.rs:84:13
     |
84   |     *out = ((*in_buf) >> 1) & 1;
     |             ^^^^^^^^^ accessing memory with alignment 2, but alignment 4 is required
     |
     = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
     = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information

     = note: inside `util::bit_packing::unpack1_32` at parquet/src/util/bit_packing.rs:84:13
note: inside `util::bit_packing::unpack32` at parquet/src/util/bit_packing.rs:36:14
    --> parquet/src/util/bit_packing.rs:36:14
     |
36   |         1 => unpack1_32(in_ptr, out_ptr),
     |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `util::bit_util::BitReader::get_batch::<bool>` at parquet/src/util/bit_util.rs:554:30
    --> parquet/src/util/bit_util.rs:554:30
     |
554  |                     in_ptr = unpack32(in_ptr, out_ptr, num_bits);
     |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `<bool as data_type::private::ParquetValueType>::decode` at parquet/src/data_type.rs:684:31
    --> parquet/src/data_type.rs:684:31
     |
684  |             let values_read = bit_reader.get_batch(&mut buffer[..num_values], 1);
     |                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `<encodings::decoding::PlainDecoder<data_type::BoolType> as encodings::decoding::Decoder<data_type::BoolType>>::get` at parquet/src/encodings/decoding.rs:197:9
    --> parquet/src/encodings/decoding.rs:197:9
     |
197  |         T::T::decode(buffer, &mut self.inner)
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `encodings::encoding::tests::put_and_get::<data_type::BoolType>` at parquet/src/encodings/encoding.rs:1271:9
    --> parquet/src/encodings/encoding.rs:1271:9
     |
1271 |         decoder.get(output)
     |         ^^^^^^^^^^^^^^^^^^^
note: inside `<data_type::BoolType as encodings::encoding::tests::EncodingTester<data_type::BoolType>>::test_internal` at parquet/src/encodings/encoding.rs:1214:28
    --> parquet/src/encodings/encoding.rs:1214:28
     |
1214 |               actual_total = put_and_get(
     |  ____________________________^
1215 | |                 &mut encoder,
1216 | |                 &mut decoder,
1217 | |                 &values[..],
1218 | |                 &mut result_data[..],
1219 | |             )?;
     | |_____________^
note: inside `<data_type::BoolType as encodings::encoding::tests::EncodingTester<data_type::BoolType>>::test` at parquet/src/encodings/encoding.rs:1159:24
    --> parquet/src/encodings/encoding.rs:1159:24
     |
1159 |                 enc => Self::test_internal(enc, total, type_length),
     |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `encodings::encoding::tests::test_bool` at parquet/src/encodings/encoding.rs:967:9
    --> parquet/src/encodings/encoding.rs:967:9
     |
967  |         BoolType::test(Encoding::PLAIN, TEST_SET_SIZE, -1);
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside closure at parquet/src/encodings/encoding.rs:966:5
    --> parquet/src/encodings/encoding.rs:966:5
     |
966  | /     fn test_bool() {
967  | |         BoolType::test(Encoding::PLAIN, TEST_SET_SIZE, -1);
968  | |         BoolType::test(Encoding::PLAIN_DICTIONARY, TEST_SET_SIZE, -1);
969  | |         BoolType::test(Encoding::RLE, TEST_SET_SIZE, -1);
970  | |     }
     | |_____^
     = note: inside <[closure@parquet/src/encodings/encoding.rs:966:5: 970:6] as std::ops::FnOnce<()>>::call_once - shim` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `test::__rust_begin_short_backtrace::<fn()>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:516:5
     = note: inside closure at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:507:30
     = note: inside `<[closure@test::run_test::{closure#2}] as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send> as std::ops::FnOnce<()>>::call_once` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1328:9
     = note: inside `<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>> as std::ops::FnOnce<()>>::call_once` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:322:9
     = note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:379:40
     = note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:343:19
     = note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:396:14
     = note: inside `test::run_test_in_process` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:538:18
     = note: inside closure at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:449:39
     = note: inside `test::run_test::run_test_inner` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:474:13
     = note: inside `test::run_test` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:504:28
     = note: inside `test::run_tests::<[closure@test::run_tests_console::{closure#2}]>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:283:13
     = note: inside `test::run_tests_console` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/console.rs:289:5
     = note: inside `test::test_main` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:121:15
     = note: inside `test::test_main_static` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:140:5
     = note: inside `main`
     = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18
     = note: inside closure at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:66:18
     = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
     = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:379:40
     = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:343:19
     = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:396:14
     = note: inside `std::rt::lang_start_internal` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:51:25
     = note: inside `std::rt::lang_start::<()>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:65:5
     = note: this error originates in an attribute macro (in Nightly builds, run with -Z macro-backtrace for more info)

error: aborting due to previous error

error: test failed, to rerun pass '--lib'`
```

... This might be related to ARROW-10943, stating that the pointer read
is unaligned throughout seems to resolve the miri error
Miri complains that murmur2 is performing unaligned reads (and without
telling the compiler about this, its technically UB even on x64).

wyhash has emerged as a simpler hash with a similar performance and
without needing unaligned reads.
@github-actions
Copy link

@alamb
Copy link
Contributor

alamb commented Dec 17, 2020

Thanks @GregBowyer - this is great. I plan to review this / do some more careful testing later this week or the weekend

@alamb
Copy link
Contributor

alamb commented Dec 19, 2020

I found that by running the parquet tests in a loop I can reproduce the failure on my machine locally more details in JIRA.

Unfortunately, when I tried running the same experiment on this branch the test still fails intermittently

---- encodings::encoding::tests::test_bool stdout ----
thread 'encodings::encoding::tests::test_bool' panicked at 'Invalid byte when reading bool', parquet/src/util/bit_util.rs:73:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@GregBowyer
Copy link
Contributor Author

This appears not to be the issue, we might want miri at some point but not his.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants