Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miri: enable preemption again #855

Merged
merged 11 commits into from
Jul 1, 2022
Merged

Miri: enable preemption again #855

merged 11 commits into from
Jul 1, 2022

Conversation

RalfJung
Copy link
Contributor

Miri has a work-around for rust-lang/rust#55005 now.

Also, "check-number-validity" is on by default these days. And I am not sure why "compare-exchange-weak-failure-rate=0.0" was set so let's see what happens when we remove it. :)

@taiki-e
Copy link
Member

taiki-e commented Jun 26, 2022

Thanks!

I am not sure why "compare-exchange-weak-failure-rate=0.0" was set

This is because some single-threaded tests (e.g., doctest of Stealer::steal) assume that weak CAS will not fail. It is not strictly a correct assumption, but there were many tests (especially doctests) that depended on that behavior, so I had to set it. (sorry I forgot to leave a comment explaining this)

@RalfJung
Copy link
Contributor Author

Also looks like I failed to actually remove that flag.^^ I'll add a comment instead.

@RalfJung
Copy link
Contributor Author

RalfJung commented Jun 26, 2022

That failure in crossbeam-deque stampede looks real though?

error: Undefined Behavior: type validation failed: encountered a dangling box (use-after-free)
    --> /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:1504:9
     |
1504 |         intrinsics::volatile_load(src)
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ type validation failed: encountered a dangling box (use-after-free)
     |
     = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
     = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
     = note: inside `std::ptr::read_volatile::<std::boxed::Box<usize>>` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ptr/mod.rs:1504:9
note: inside `crossbeam_deque::deque::Buffer::<std::boxed::Box<usize>>::read` at /home/runner/work/crossbeam/crossbeam/crossbeam-deque/src/deque.rs:76:9
    --> /home/runner/work/crossbeam/crossbeam/crossbeam-deque/src/deque.rs:76:9
     |
76   |         ptr::read_volatile(self.at(index))
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `crossbeam_deque::Stealer::<std::boxed::Box<usize>>::steal` at /home/runner/work/crossbeam/crossbeam/crossbeam-deque/src/deque.rs:652:29
    --> /home/runner/work/crossbeam/crossbeam/crossbeam-deque/src/deque.rs:652:29
     |
652  |         let task = unsafe { buffer.deref().read(f) };
     |                             ^^^^^^^^^^^^^^^^^^^^^^
note: inside closure at crossbeam-deque/tests/fifo.rs:126:41
    --> crossbeam-deque/tests/fifo.rs:126:41
     |
126  |                     if let Success(x) = s.steal() {
     |                                         ^^^^^^^^^
note: inside closure at /home/runner/work/crossbeam/crossbeam/crossbeam-utils/src/thread.rs:438:31
    --> /home/runner/work/crossbeam/crossbeam/crossbeam-utils/src/thread.rs:438:31
     |
438  |                     let res = f(&scope);
     |                               ^^^^^^^^^

Is there an issue for that?

@RalfJung
Copy link
Contributor Author

I think I fixed a soundness bug. :D And the code also became cleaner by removing the need for a bunch of mem::forget.
Let me know if you'd prefer that to be a separate PR.

@taiki-e
Copy link
Member

taiki-e commented Jun 26, 2022

Thanks. IIUC, that could have been real use-after-free before GHSA-pqqp-xmhj-wgcw fixed. After that fix, that value is never used, but your fix to use MaybeUninit looks correct because it is possible to read a value that is not valid as T.

@RalfJung
Copy link
Contributor Author

Hm, but something seems to not be right with the patch...

@taiki-e
Copy link
Member

taiki-e commented Jun 26, 2022

I think you need to change this line:

buffer.deref().at(i).drop_in_place();

to:

(buffer.deref().at(i) as *mut T).drop_in_place();

@@ -62,7 +64,7 @@ impl<T> Buffer<T> {
/// technically speaking a data race and therefore UB. We should use an atomic store here, but
/// that would be more expensive and difficult to implement generically for all types `T`.
/// Hence, as a hack, we use a volatile write instead.
unsafe fn write(&self, index: isize, task: T) {
unsafe fn write(&self, index: isize, task: MaybeUninit<T>) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the value written to self.at(index) must be valid T (otherwise, the destructor cannot be sure that the value is valid), it may be better to use T instead of MaybeUninit<T> in the argument (and use MaybeUninit::new(task) in the next line).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because there are some places that just read the data and write it back immediately. By taking MaybeUninit<T> here we can avoid having to assert that this data is valid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense.

@RalfJung
Copy link
Contributor Author

I think you need to change this line:

Ah, yes!

@RalfJung
Copy link
Contributor Author

Btw, Miri might at some point actually notice that these volatile reads/writes should be atomic. We do have a data race detector, after all. I am not sure why it doesn't kick in here; probably it needs some really specific interleaving to actually be a race.

@RalfJung
Copy link
Contributor Author

Well, I think that is exactly what is happening in the latest test failure. :)

@RalfJung
Copy link
Contributor Author

I added timing because the tests became a lot slower, but now they are fast again... whatever.^^
This still helps to identify slow tests, so if you don't mind I will keep it in.

Copy link
Member

@taiki-e taiki-e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again!

bors r+

Comment on lines +6 to +8
# We need 'ts' for the per-line timing
sudo apt-get -y install moreutils
echo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, passing -- -Z unstable-options --report-time to cargo also works. However, I'm not sure if this is better than ts, as it seems there is no way to color the report in CI.

report-time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't know about this flag.
However (a) it only works with isolation disabled (since for regular tests it actually measures the time inside Miri), (b) it is harder to read without colors and libtest output in Miri is never colored for $REASONS, (c) the colors might not be appropriate for Miri; 1.3s seems still pretty good to me. ;)

Up to you, I am fine with removing the ts thing again. You have isolation disabled anyway.

bors bot added a commit that referenced this pull request Jun 30, 2022
855: Miri: enable preemption again r=taiki-e a=RalfJung

Miri has a [work-around](rust-lang/miri#2248) for rust-lang/rust#55005 now.

Also, "check-number-validity" is on by default these days.  And I am not sure why "compare-exchange-weak-failure-rate=0.0" was set so let's see what happens when we remove it. :)

Co-authored-by: Ralf Jung <post@ralfj.de>
@bors
Copy link
Contributor

bors bot commented Jun 30, 2022

Timed out.

@RalfJung
Copy link
Contributor Author

4h? Either there is a non-deterministic infinite loop somewhere or the test runner just got stuck. Let me still reduce some test sizes.

Argh, Rust 1.36 is ancient...

@RalfJung
Copy link
Contributor Author

What is clippy talking about, I didn't even change that code?!?

@taiki-e
Copy link
Member

taiki-e commented Jul 1, 2022

bors r+

It seems that a new clippy lint was added in Rust 1.62 released about 10 hours ago.

bors bot added a commit that referenced this pull request Jul 1, 2022
855: Miri: enable preemption again r=taiki-e a=RalfJung

Miri has a [work-around](rust-lang/miri#2248) for rust-lang/rust#55005 now.

Also, "check-number-validity" is on by default these days.  And I am not sure why "compare-exchange-weak-failure-rate=0.0" was set so let's see what happens when we remove it. :)

858: Ignore clippy::let_unit_value lint r=taiki-e a=taiki-e



Co-authored-by: Ralf Jung <post@ralfj.de>
Co-authored-by: Taiki Endo <te316e89@gmail.com>
@taiki-e
Copy link
Member

taiki-e commented Jul 1, 2022

bors r-

https://github.com/crossbeam-rs/crossbeam/runs/7142722550?check_suite_focus=true

test arc ... error: unsupported operation: integer-to-pointer casts and `ptr::from_exposed_addr` are not supported with `-Zmiri-strict-provenance`
0.000047      --> /home/runner/.cargo/registry/src/github.com-1ecc6299db9ec823/once_cell-1.12.0/src/imp_std.rs:175:30
0.000007       |
0.000005   175 |             let mut waiter = (queue & !STATE_MASK) as *const Waiter;
0.000006       |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ integer-to-pointer casts and `ptr::from_exposed_addr` are not supported with `-Zmiri-strict-provenance`
0.000005       |
0.000006       = help: use Strict Provenance APIs (https://doc.rust-lang.org/nightly/std/ptr/index.html#strict-provenance, https://crates.io/crates/sptr) instead
0.000005    

@bors
Copy link
Contributor

bors bot commented Jul 1, 2022

Canceled.

@RalfJung
Copy link
Contributor Author

RalfJung commented Jul 1, 2022

Yeah once_cell does not work with strict provenance. It never did. But that only became apparent with a preemptive scheduler (the casts only occur when threads actually race on a once_cell).

This depends on once_cell, which is not strict provenance compatible.
@taiki-e
Copy link
Member

taiki-e commented Jul 1, 2022

bors r+

@bors
Copy link
Contributor

bors bot commented Jul 1, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants