Print a failing schedule even during double panics #40

jamesbornholt · 2021-07-06T19:40:56Z

Double panics are a common problem in concurrent Rust code. For example,
a thread might panic while holding a lock, and then need to acquire that
lock again during stack unwinding, but the lock is already poisoned. In
these cases, Shuttle currently can't print the schedule that led to the
original panic, because the double panic aborts the process before our
catch_unwind has a chance to run.
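
As a concrete illustration of this failure mode, here is a minimal sketch that uses only `std` rather than Shuttle's own primitives. The names `Cleanup` and `poisoned_during_unwind` are hypothetical and do not come from the Shuttle code base. The destructor observes the poisoned lock instead of unwrapping it, so the sketch can run to completion rather than abort.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical type whose destructor needs the same lock the panicking
/// thread was holding.
struct Cleanup {
    lock: Arc<Mutex<u32>>,
    saw_poison: Arc<AtomicBool>,
}

impl Drop for Cleanup {
    fn drop(&mut self) {
        match self.lock.lock() {
            Ok(mut value) => *value += 1,
            // The guard was dropped while the thread was panicking, so the
            // mutex is poisoned. Calling `.unwrap()` here instead would
            // panic during unwinding, and that second panic aborts the
            // process.
            Err(_) => self.saw_poison.store(true, Ordering::SeqCst),
        }
    }
}

/// Returns true if the destructor saw a poisoned lock during unwinding.
fn poisoned_during_unwind() -> bool {
    let lock = Arc::new(Mutex::new(0u32));
    let saw_poison = Arc::new(AtomicBool::new(false));
    let (l, s) = (lock.clone(), saw_poison.clone());

    let result = thread::spawn(move || {
        // Declared first, so it is dropped last, after the guard below.
        let _cleanup = Cleanup { lock: l.clone(), saw_poison: s };
        let _guard = l.lock().unwrap();
        panic!("first panic");
    })
    .join();

    result.is_err() && saw_poison.load(Ordering::SeqCst)
}
```

Calling `poisoned_during_unwind()` from `main` or a test should return `true`. Swapping the `Err` arm for an `.unwrap()` turns this into the double panic described above.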

This change adds a new panic hook that gets a chance to run before the
double panic happens. We use this hook to print the schedule, so that
even if we double panic in future at least the user gets some output
they can use to reproduce the problem. There's some trickiness here
(outlined in a module comment for failure.rs) around running different
panic handlers at different times, which makes the code a little more
complex than I would have hoped, but it gets the job done.
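
The general idea can be sketched with the standard library's panic hook API. This is not the code in failure.rs. `Schedule`, `install_schedule_hook`, `run_demo`, and the `sink` parameter are all hypothetical names, and the schedule is modeled as a plain list of thread IDs. The sketch relies on the panic hook running at the panic site, before unwinding starts, so whatever it emits survives a later abort.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a recorded schedule: thread IDs chosen so far.
type Schedule = Arc<Mutex<Vec<usize>>>;

/// Installs a hook that reports the schedule on the first panic, then
/// defers to whatever hook was installed before. `sink` collects the
/// reported lines so the behavior can be checked in a test.
fn install_schedule_hook(schedule: Schedule, sink: Arc<Mutex<Vec<String>>>) {
    let previous = panic::take_hook();
    let printed = AtomicBool::new(false);
    panic::set_hook(Box::new(move |info| {
        // Report only once, even if a later panic reaches the hook.
        if !printed.swap(true, Ordering::SeqCst) {
            // `try_lock` never blocks, so the hook cannot deadlock if the
            // panicking thread already holds one of these locks.
            if let Ok(steps) = schedule.try_lock() {
                let line = format!("failing schedule: {:?}", *steps);
                eprintln!("{line}");
                if let Ok(mut out) = sink.try_lock() {
                    out.push(line);
                }
            }
        }
        previous(info);
    }));
}

/// Runs one panicking closure under the hook and returns what was reported.
fn run_demo() -> Vec<String> {
    let schedule: Schedule = Arc::new(Mutex::new(vec![0, 1, 1, 0]));
    let sink = Arc::new(Mutex::new(Vec::new()));
    install_schedule_hook(schedule, sink.clone());

    let result = panic::catch_unwind(|| {
        panic!("first panic");
    });
    // Remove the custom hook again, which re-registers the default one.
    let _ = panic::take_hook();
    assert!(result.is_err());

    let reported = sink.lock().unwrap().clone();
    reported
}
```

Chaining to the previous hook keeps the usual panic message and backtrace output intact. A real implementation also has to decide which hook runs when, which is the trickiness mentioned above.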

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


jamesbornholt · 2021-07-07T18:11:16Z

> Can we grab the schedule generated by the test, and add another (also ignored) test that replays it? Just to make sure, for instance, that the printed schedule is really complete?

The schedule actually won't be complete in this double panic scenario, as it'll miss whatever steps were taken during unwinding. I don't think there's a good way to fix that; we'd have to intercept the second panic somehow. Instead, what happens is that the replay successfully makes it to the first panic, which it reproduces, and then panics again because the schedule ends early. Same double-panic effect, just that the second panic is different.
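
A toy model of that replay behavior, under stated assumptions: `Replayer` and `replay_truncated` are hypothetical names, not Shuttle's API, and running out of recorded steps is reported as a flag where a real replayer would panic.

```rust
/// Hypothetical replayer that hands back recorded scheduling decisions
/// in order.
struct Replayer {
    steps: Vec<usize>,
    next: usize,
}

impl Replayer {
    fn new(steps: Vec<usize>) -> Self {
        Self { steps, next: 0 }
    }

    /// Returns the next recorded thread ID, or `None` once the recording
    /// has run out.
    fn next_step(&mut self) -> Option<usize> {
        let step = self.steps.get(self.next).copied();
        if step.is_some() {
            self.next += 1;
        }
        step
    }
}

/// Replays `recorded`, then asks for `unwinding_steps` further decisions,
/// standing in for the steps taken during unwinding that the printed
/// schedule never captured. Returns the steps that were replayed and
/// whether the schedule ended early.
fn replay_truncated(recorded: Vec<usize>, unwinding_steps: usize) -> (Vec<usize>, bool) {
    let total = recorded.len() + unwinding_steps;
    let mut replayer = Replayer::new(recorded);
    let mut replayed = Vec::new();
    for _ in 0..total {
        match replayer.next_step() {
            Some(thread_id) => replayed.push(thread_id),
            // A real replayer would panic here. During unwinding, that
            // panic is the second one.
            None => return (replayed, true),
        }
    }
    (replayed, false)
}
```

With a four-step recording and two unwinding steps, all four recorded steps replay and the fifth request finds the schedule exhausted, which corresponds to the second panic in the replay.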

jamesbornholt requested a review from jorajeev July 6, 2021 19:40

jamesbornholt force-pushed the panic-hook branch from 3579cd1 to 3e4e9e9 Compare July 7, 2021 03:50

jorajeev approved these changes Jul 7, 2021

jorajeev merged commit 1f2f98f into awslabs:main Jul 7, 2021

jamesbornholt deleted the panic-hook branch August 16, 2021 17:27
