Print a failing schedule even during double panics #40

Merged 1 commit on Jul 7, 2021

@jamesbornholt jamesbornholt commented Jul 6, 2021

Double panics are a common problem in concurrent Rust code. For example,
a thread might panic while holding a lock, and then need to acquire that
lock again during stack unwinding, but the lock is already poisoned. In
these cases, Shuttle currently can't print the schedule that led to the
original panic, because the double panic aborts the process before our
catch_unwind has a chance to run.
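
To make the failure mode concrete, here is a minimal sketch of the lock scenario described above. The `Cleanup` type and `panic_while_holding_lock` function are hypothetical names made up for illustration and are not part of Shuttle. The destructor needs the same lock the panicking thread held, so by the time it runs during unwinding the lock is already poisoned.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical resource whose destructor needs the lock that the
/// panicking code was holding.
pub struct Cleanup {
    shared: Arc<Mutex<Vec<&'static str>>>,
}

impl Drop for Cleanup {
    fn drop(&mut self) {
        // Writing `self.shared.lock().unwrap()` here would panic a second
        // time during unwinding, because the lock is poisoned, and a panic
        // during unwinding aborts the process. Recovering the guard from
        // the PoisonError avoids the double panic.
        let mut events = self
            .shared
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner());
        events.push("cleanup");
    }
}

/// Spawns a thread that panics while holding the lock. Returns true if the
/// thread panicked (and the process survived the unwinding).
pub fn panic_while_holding_lock(shared: Arc<Mutex<Vec<&'static str>>>) -> bool {
    let worker_shared = Arc::clone(&shared);
    let joined = thread::spawn(move || {
        // Declared first, so it is dropped last, after the guard below.
        let _cleanup = Cleanup {
            shared: Arc::clone(&worker_shared),
        };
        let mut events = worker_shared.lock().unwrap();
        events.push("locked");
        // The guard is dropped while panicking, which poisons the mutex.
        panic!("first panic, raised while the lock is held");
    })
    .join();
    joined.is_err()
}
```

Code under test often uses the `unwrap()` form in destructors, which is exactly what turns a single failure into a process abort.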

This change adds a new panic hook that gets a chance to run before the
double panic happens. We use this hook to print the schedule, so that
even if we double panic in future at least the user gets some output
they can use to reproduce the problem. There's some trickiness here
(outlined in a module comment for failure.rs) around running different
panic handlers at different times, which makes the code a little more
complex than I would have hoped, but it gets the job done.
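
The general shape of the idea can be sketched with the standard library's panic hook API. This is not the code in `failure.rs`; `ScheduleLog` and `install_schedule_hook` are hypothetical names, and the real implementation has to handle the hook-ordering issues mentioned above. The sketch only shows a hook that prints the recorded schedule once, at the first panic, and then defers to whatever hook was installed before.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex, TryLockError};

/// Hypothetical stand-in for the schedule recorded so far (task ids in
/// the order they were scheduled).
#[derive(Default)]
pub struct ScheduleLog {
    steps: Mutex<Vec<usize>>,
    printed: AtomicBool,
}

impl ScheduleLog {
    pub fn record(&self, task_id: usize) {
        self.steps
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner())
            .push(task_id);
    }

    /// Renders the schedule without blocking or panicking, since this is
    /// called from inside a panic hook.
    pub fn render(&self) -> String {
        let steps = match self.steps.try_lock() {
            Ok(guard) => guard.to_vec(),
            Err(TryLockError::Poisoned(poisoned)) => poisoned.into_inner().to_vec(),
            Err(TryLockError::WouldBlock) => return String::from("<schedule unavailable>"),
        };
        let ids: Vec<String> = steps.iter().map(|id| id.to_string()).collect();
        ids.join(",")
    }

    pub fn was_printed(&self) -> bool {
        self.printed.load(Ordering::SeqCst)
    }
}

/// Installs a hook that prints the schedule at the first panic, before any
/// unwinding (and therefore before any double panic) can happen.
pub fn install_schedule_hook(log: Arc<ScheduleLog>) {
    let previous = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        // `swap` returns the old value, so only the first panic prints.
        if !log.printed.swap(true, Ordering::SeqCst) {
            eprintln!("failing schedule: {}", log.render());
        }
        // Chain to the previously installed hook for the usual message.
        previous(info);
    }));
}
```

Because the hook runs when the panic is raised rather than when it is caught, the output is produced even if unwinding later aborts the process.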


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
@jamesbornholt
Member Author

> Can we grab the schedule generated by the test, and add another (also ignored) test that replays it? Just to make sure, for instance, that the printed schedule is really complete?

The schedule actually won't be complete in this double panic scenario, as it'll miss whatever steps were taken during unwinding. I don't think there's a good way to fix that; we'd have to intercept the second panic somehow. Instead, what happens is that the replay successfully makes it to the first panic, which it reproduces, and then panics again because the schedule ends early. Same double-panic effect, just that the second panic is different.
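
As a rough illustration of why the replay ends in a second failure, here is a toy replay scheduler. `ReplayScheduler` and `ScheduleExhausted` are hypothetical names and do not reflect Shuttle's internals. The sketch returns an error where the real replay panics, purely so that the condition is easy to observe.

```rust
/// Hypothetical error for a replay that needs more steps than were recorded.
#[derive(Debug, PartialEq)]
pub struct ScheduleExhausted {
    pub steps_replayed: usize,
}

/// Toy scheduler that replays a recorded list of task ids in order.
pub struct ReplayScheduler {
    recorded: Vec<usize>,
    position: usize,
}

impl ReplayScheduler {
    pub fn new(recorded: Vec<usize>) -> Self {
        ReplayScheduler { recorded, position: 0 }
    }

    /// Returns the next recorded task id, or an error once the recorded
    /// schedule runs out. A schedule printed at the first panic has no
    /// entries for steps taken during unwinding, so a replay that reaches
    /// those steps lands in the error branch.
    pub fn next_task(&mut self) -> Result<usize, ScheduleExhausted> {
        match self.recorded.get(self.position) {
            Some(&task_id) => {
                self.position += 1;
                Ok(task_id)
            }
            None => Err(ScheduleExhausted {
                steps_replayed: self.position,
            }),
        }
    }
}
```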

@jorajeev jorajeev merged commit 1f2f98f into awslabs:main Jul 7, 2021
@jamesbornholt jamesbornholt deleted the panic-hook branch August 16, 2021 17:27