New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-10943: WIP: HACK! Test bool miri bugs #8948
Conversation
Running the boolean encoding test with Miri has the following warning: ``` greg@gregslaptop ~/.../rust/parquet $ MIRIFLAGS="-Zmiri-disable-isolation" cargo +nightly miri test 'encodings::encoding::tests::test_bool' Compiling parquet v3.0.0-SNAPSHOT (/home/greg/projects/arrow/rust/parquet) Finished test [unoptimized + debuginfo] target(s) in 1.94s Running /home/greg/projects/arrow/rust/target/x86_64-unknown-linux-gnu/debug/deps/parquet-9bca74cf6a2c2a95 running 1 test test encodings::encoding::tests::test_bool ... error: Undefined Behavior: accessing memory with alignment 2, but alignment 4 is required --> parquet/src/util/bit_packing.rs:84:13 | 84 | *out = ((*in_buf) >> 1) & 1; | ^^^^^^^^^ accessing memory with alignment 2, but alignment 4 is required | = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information = note: inside `util::bit_packing::unpack1_32` at parquet/src/util/bit_packing.rs:84:13 note: inside `util::bit_packing::unpack32` at parquet/src/util/bit_packing.rs:36:14 --> parquet/src/util/bit_packing.rs:36:14 | 36 | 1 => unpack1_32(in_ptr, out_ptr), | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `util::bit_util::BitReader::get_batch::<bool>` at parquet/src/util/bit_util.rs:554:30 --> parquet/src/util/bit_util.rs:554:30 | 554 | in_ptr = unpack32(in_ptr, out_ptr, num_bits); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `<bool as data_type::private::ParquetValueType>::decode` at parquet/src/data_type.rs:684:31 --> parquet/src/data_type.rs:684:31 | 684 | let values_read = bit_reader.get_batch(&mut buffer[..num_values], 1); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `<encodings::decoding::PlainDecoder<data_type::BoolType> as encodings::decoding::Decoder<data_type::BoolType>>::get` at parquet/src/encodings/decoding.rs:197:9 --> parquet/src/encodings/decoding.rs:197:9 | 197 | T::T::decode(buffer, &mut self.inner) | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `encodings::encoding::tests::put_and_get::<data_type::BoolType>` at parquet/src/encodings/encoding.rs:1271:9 --> parquet/src/encodings/encoding.rs:1271:9 | 1271 | decoder.get(output) | ^^^^^^^^^^^^^^^^^^^ note: inside `<data_type::BoolType as encodings::encoding::tests::EncodingTester<data_type::BoolType>>::test_internal` at parquet/src/encodings/encoding.rs:1214:28 --> parquet/src/encodings/encoding.rs:1214:28 | 1214 | actual_total = put_and_get( | ____________________________^ 1215 | | &mut encoder, 1216 | | &mut decoder, 1217 | | &values[..], 1218 | | &mut result_data[..], 1219 | | )?; | |_____________^ note: inside `<data_type::BoolType as encodings::encoding::tests::EncodingTester<data_type::BoolType>>::test` at parquet/src/encodings/encoding.rs:1159:24 --> parquet/src/encodings/encoding.rs:1159:24 | 1159 | enc => Self::test_internal(enc, total, type_length), | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside `encodings::encoding::tests::test_bool` at parquet/src/encodings/encoding.rs:967:9 --> parquet/src/encodings/encoding.rs:967:9 | 967 | BoolType::test(Encoding::PLAIN, TEST_SET_SIZE, -1); | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ note: inside closure at parquet/src/encodings/encoding.rs:966:5 --> parquet/src/encodings/encoding.rs:966:5 | 966 | / fn test_bool() { 967 | | BoolType::test(Encoding::PLAIN, TEST_SET_SIZE, -1); 968 | | BoolType::test(Encoding::PLAIN_DICTIONARY, TEST_SET_SIZE, -1); 969 | | BoolType::test(Encoding::RLE, TEST_SET_SIZE, -1); 970 | | } | |_____^ = note: inside <[closure@parquet/src/encodings/encoding.rs:966:5: 970:6] as std::ops::FnOnce<()>>::call_once - shim` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5 = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5 = note: inside `test::__rust_begin_short_backtrace::<fn()>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:516:5 = note: inside closure at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:507:30 = note: inside `<[closure@test::run_test::{closure#2}] as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5 = note: inside `<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send> as std::ops::FnOnce<()>>::call_once` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1328:9 = note: inside `<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>> as std::ops::FnOnce<()>>::call_once` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:322:9 = note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:379:40 = note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:343:19 = note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:396:14 = note: inside `test::run_test_in_process` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:538:18 = note: inside closure at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:449:39 = note: inside `test::run_test::run_test_inner` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:474:13 = note: inside `test::run_test` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:504:28 = note: inside `test::run_tests::<[closure@test::run_tests_console::{closure#2}]>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:283:13 = note: inside `test::run_tests_console` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/console.rs:289:5 = note: inside `test::test_main` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:121:15 = note: inside `test::test_main_static` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:140:5 = note: inside `main` = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5 = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18 = note: inside closure at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:66:18 = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13 = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:379:40 = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:343:19 = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:396:14 = note: inside `std::rt::lang_start_internal` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:51:25 = note: inside `std::rt::lang_start::<()>` at /home/greg/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:65:5 = note: this error originates in an attribute macro (in Nightly builds, run with -Z macro-backtrace for more info) error: aborting due to previous error error: test failed, to rerun pass '--lib'` ``` ... This might be related to ARROW-10943, stating that the pointer read is unaligned throughout seems to resolve the miri error
Miri complains that murmur2 is performing unaligned reads (and without telling the compiler about this, its technically UB even on x64). wyhash has emerged as a simpler hash with a similar performance and without needing unaligned reads.
Thanks @GregBowyer - this is great. I plan to review this / do some more careful testing later this week or the weekend |
I found that by running the parquet tests in a loop I can reproduce the failure on my machine locally more details in JIRA. Unfortunately, when I tried running the same experiment on this branch the test still fails intermittently
|
This appears not to be the issue, we might want miri at some point but not his. |
@alamb @andygrove
No idea if this is the root cause, but I figured that if its reading an incorrect number as a boolean its a messy transmute or similar.
As such I ran miri over the specific failing test and got two hits, one on the hash function and one on the bit packing.
The first commit states that the bit packing is unaligned which I think is ok to take (although it might be worth revisiting the bit-packing code).
The second replaces the hash function which is stated as being unaligned. I dont know if wyhash is the perfect choice (xxHash, t1ha or others might be equally good choices) but it does not suffer from irritating miri.
Thoughts? Ideas?