MIRI CI check fails intermittently with thread 'main' panicked at 'invalid time' #345

Closed
alamb opened this issue May 24, 2021 · 7 comments · Fixed by #421
Labels
bug development-process Related to development process of arrow-rs security

Comments

@alamb
Contributor

alamb commented May 24, 2021

Describe the bug
Ever since we enabled MIRI checks on CI in #323 (epic thanks to @roee88), we have been seeing intermittent failures of the check.

To Reproduce
Run CI checks on a PR

Here are some example failures:

https://github.com/apache/arrow-rs/runs/2656075800
https://github.com/apache/arrow-rs/runs/2658362992

An excerpt from the log at
https://github.com/alamb/arrow-rs/runs/2659329294?check_suite_focus=true

test array::array_primitive::tests::test_time32_millisecond_array_from_vec ... ok
test array::array_primitive::tests::test_time32second_fmt_debug ... ok
thread 'main' panicked at 'invalid time', /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/chrono-0.4.19/src/naive/time.rs:412:67
stack backtrace:
   0:          0xa9fa3ef - std::backtrace_rs::backtrace::miri::trace_unsynchronized::<&mut [closure@std::sys_common::backtrace::_print_fmt::{closure#1}]>
                               at /usr/share/rust/.rustup/toolchains/nightly-2021-03-24-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/../../backtrace/src/backtrace/miri.rs:65:18
   1:          0xaa266e9 - std::backtrace_rs::backtrace::miri::trace::<&mut [closure@std::sys_common::backtrace::_print_fmt::{closure#1}]>
                               at /usr/share/rust/.rustup/toolchains/nightly-2021-03-24-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/../../backtrace/src/backtrace/miri.rs:51:14
   2:          0xa9dc49b - std::backtrace_rs::backtrace::trace_unsynchronized::<[closure@std::sys_common::backtrace::_print_fmt::{closure#1}]>
                               at /usr/share/rust/.rustup/toolchains/nightly-2021-03-24-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   3:          0x298f21f - std::sys_common::backtrace::_print_fmt
                               at /usr/share/rust/.rustup/toolchains/nightly-2021-03-24-x86_64-unknown-linux-g


alamb added the bug and development-process (Related to development process of arrow-rs) labels May 24, 2021
@alamb
Contributor Author

alamb commented May 24, 2021

Caused by:
  process didn't exit successfully: `/usr/share/rust/.rustup/toolchains/nightly-2021-03-24-x86_64-unknown-linux-gnu/bin/cargo-miri /home/runner/work/arrow-rs/arrow-rs/target/x86_64-unknown-linux-gnu/debug/deps/arrow-d73b9ed71d5fffd0 --skip csv --skip ipc --skip json` (exit code: 255)
test compute::kernels::length::tests::bit_length_test_string ... 
Error: Process completed with exit code 255.

🤔

@roee88
Contributor

roee88 commented May 26, 2021

I see that it's (almost?) always in bit_length_test_string. Did you consider skipping this test? Should we open an issue in miri to help us debug this? It might reveal something important about this test.

@alamb
Contributor Author

alamb commented May 26, 2021

> I see that it's (almost?) always in bit_length_test_string. Did you consider skipping this test? Should we open an issue in miri to help us debug this? It might reveal something important about this test.

@roee88 I did not -- I tried to skip array::array_primitive::tests::test_time32second_invalid_neg in #346, but that did not seem to help. I don't have much time to devote to this issue at the moment (most of my arrow bandwidth is going towards getting the releases in shape), so any help here would be most appreciated.

@roee88
Contributor

roee88 commented May 26, 2021

I think that the issue description here is wrong: the 'invalid time' panic is expected for test_time32second_invalid_neg and is not the cause of the CI failure.
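To illustrate why a panic message can appear in the log of a passing test, here is a minimal sketch. It assumes the test is written to expect the panic, which this thread does not confirm. `checked_time_from_secs` and `time_from_secs` are hypothetical stand-ins, not the chrono or arrow-rs API; only the message text 'invalid time' is taken from the log above.

```rust
/// Hypothetical stand-in for a checked constructor:
/// returns `None` when `secs` is outside one day.
fn checked_time_from_secs(secs: i32) -> Option<(u32, u32, u32)> {
    if !(0..86_400).contains(&secs) {
        return None;
    }
    let secs = secs as u32;
    // (hours, minutes, seconds)
    Some((secs / 3600, (secs / 60) % 60, secs % 60))
}

/// Panicking wrapper; the message mirrors the one seen in the CI log.
fn time_from_secs(secs: i32) -> (u32, u32, u32) {
    checked_time_from_secs(secs).expect("invalid time")
}

#[cfg(test)]
mod tests {
    use super::*;

    // The test passes *because* the call panics; the panic message
    // is still printed to the test output.
    #[test]
    #[should_panic(expected = "invalid time")]
    fn time_from_negative_secs_panics() {
        time_from_secs(-1);
    }
}
```

With a test shaped like this, the panic line in the output is noise rather than the failure itself, which would be consistent with the process later dying with exit code 255 in a different test.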

I did a quick memory usage check and identified the following:

  1. bit_length_test_string consumes ~8 GB on its own, bringing the accumulated peak to ~10 GB, and this memory is not released after the test is done
  2. bit_length_test_large_string has similar memory consumption, but the memory is freed after the test is done
  3. match_single_group consumes ~2.7 GB, bringing the accumulated peak to ~4.5 GB, and this memory is not released after the test is done

I tested ignoring bit_length_test_string, and the GitHub Actions checks pass (5 re-runs). It might be worth ignoring all three, though. Some questions:

  1. Is it possible that there is a memory leak in bit_length_test_string and match_single_group?
  2. Should we disable bit_length_test_large_string just because it consumes a lot of RAM while running?
  3. Should a support ticket be opened in Miri?

I will submit a PR based on the answers.
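One way to probe the first question outside of Miri is to count live heap bytes before and after the code under test. The following is a minimal sketch of that idea using a counting global allocator. `CountingAlloc`, `live_bytes`, and `leaked_by` are hypothetical names introduced here, and nothing in this thread says the measurements above were taken this way.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Bytes currently allocated through the global allocator.
static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

/// Wraps the system allocator and tracks outstanding bytes.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = System.alloc(layout);
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size(), Ordering::SeqCst);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout);
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::SeqCst);
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn live_bytes() -> usize {
    LIVE_BYTES.load(Ordering::SeqCst)
}

/// Runs `f` and returns the change in live heap bytes it left behind.
/// Allocations made by other threads in the meantime add noise.
fn leaked_by<F: FnOnce()>(f: F) -> isize {
    let before = live_bytes();
    f();
    live_bytes() as isize - before as isize
}
```

If wrapping the body of a test in `leaked_by` reports roughly zero in a native run, the memory that stays resident under Miri is more likely interpreter overhead than a leak in the test itself. A clearly positive value would point at the test or the kernel it exercises.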

@alamb
Contributor Author

alamb commented May 26, 2021

Thank you @roee88

@jorgecarleitao
Member

Just to understand: the high memory consumption only occurs when running the tests via MIRI, right?

@alamb
Contributor Author

alamb commented May 26, 2021

> Is it possible that there is a memory leak in bit_length_test_string and match_single_group?

I don't know, but it sounds like it is worth more investigation.

> Should we disable bit_length_test_large_string just because it consumes a lot of RAM while running?

I think we should disable the test under MIRI.

> Should a support ticket be opened in MIRI?

If there is some bug or improvement you can articulate, I am sure the MIRI developers would appreciate a ticket.
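For reference, disabling a test only under Miri could look like the sketch below. Miri sets `cfg(miri)` when it builds the crate, so `#[cfg_attr(miri, ignore)]` marks the test as ignored there while it still runs under a normal `cargo test`. The `bit_length` function is a hypothetical stand-in for the real kernel, and this is not necessarily the change that was eventually merged.

```rust
/// Hypothetical stand-in for a kernel whose test is expensive under Miri:
/// returns the length of each string in bits.
fn bit_length(values: &[&str]) -> Vec<usize> {
    values.iter().map(|s| s.len() * 8).collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    // Ignored only when the crate is built by Miri; a regular
    // `cargo test` still runs it.
    #[cfg_attr(miri, ignore)]
    fn bit_length_test_string() {
        let input = vec!["hello"; 4096];
        assert!(bit_length(&input).iter().all(|&bits| bits == 40));
    }
}
```

Compared with adding another `--skip` filter to the CI command, the attribute keeps the decision next to the test and documents why it is skipped.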

3 participants