Fix pipelined rendering shutdown deadlock #24059

Open
bryancostanich wants to merge 2 commits into bevyengine:main from bryancostanich:fix/pipelined-rendering-shutdown-deadlock
@bryancostanich

@bryancostanich bryancostanich commented May 1, 2026

Hey folks! Bryan Costanich here. I'm loving Bevy, but I keep getting bit by this issue, so I decided to take a whack at fixing it.

This is the macOS shutdown deadlock from #12912 -- the freeze that #23838 didn't catch. App fires AppExit, main thread parks forever, you have to force-quit. About 20-40% of the time on Apple Silicon / Metal in tight loops, which is what made it so annoying to track down.

The root cause is what @msvbg sketched out in his comment on #12912: when RenderAppChannels drops, the main thread calls recv_blocking and parks. Meanwhile the render thread is mid-SubApp::update() and has queued a task that needs to run on the main thread (anything with NonSendMarker, e.g. create_surfaces). The render thread's MultiThreadedExecutor::run blocks waiting for that task. The main thread can't run the task because it's parked on recv_blocking. Both sides stuck.

Sample trace (captured with the sample tool on macOS):

main thread:
  World::clear_resources
    -> drop RenderAppChannels
      -> async_channel::RecvInner::wait
        -> pthread_cond_wait        (parked)

render thread:
  SubApp::update
    -> MultiThreadedExecutor::run
      -> block_on
        -> Inner::park
          -> pthread_cond_wait      (parked, waiting on a main-thread task)
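
The cycle in that trace can be reproduced in miniature with plain std channels. This is only a model, not Bevy code: `MainThreadTask`, `wait_without_pumping`, and both channels are hypothetical stand-ins for the `MainThreadExecutor` queue and the render-to-app channel. The wait uses `recv_timeout` instead of a true blocking receive so the sketch terminates and can report that it got stuck.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for work that must run on the main thread.
type MainThreadTask = Box<dyn FnOnce() + Send>;

/// Models the buggy shutdown shape: the main thread waits for the sub-app
/// without running queued main-thread tasks. Returns true if the wait timed
/// out, i.e. neither side could make progress.
fn wait_without_pumping(timeout: Duration) -> bool {
    let (task_tx, task_rx) = mpsc::channel::<MainThreadTask>();
    let (app_tx, app_rx) = mpsc::channel::<&'static str>();

    // "Render thread": queue a main-thread task, park until it has run,
    // and only then hand the sub-app back.
    let render = thread::spawn(move || {
        let (done_tx, done_rx) = mpsc::channel::<()>();
        task_tx
            .send(Box::new(move || {
                let _ = done_tx.send(());
            }))
            .unwrap();
        done_rx.recv().unwrap();
        app_tx.send("render sub-app").unwrap();
    });

    // "Main thread": wait on the channel only. The queued task never runs,
    // so the render thread never sends, and this wait can only time out.
    let timed_out = app_rx.recv_timeout(timeout).is_err();

    // Break the cycle by hand so the model exits cleanly: run the task,
    // which lets the render thread finish.
    let task = task_rx.recv().unwrap();
    task();
    render.join().unwrap();

    timed_out
}
```

With a real blocking receive in place of `recv_timeout`, the main thread would never reach the line that runs the task, which matches the two parked threads in the trace.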

If you want to repro: any DefaultPlugins app that fires AppExit shortly after the window comes up, run it in a loop of 20 with a 5-second timeout. You'll see 3-8 hangs per batch on a recent M-series Mac.

What I did

Stashed an Arc<ThreadExecutor> clone of MainThreadExecutor inside RenderAppChannels. In Drop, swapped recv_blocking for ComputeTaskPool::scope_with_executor so the main-thread executor keeps getting pumped while we wait for the render thread to return. That's already how renderer_extract handles the steady-state path -- I'm just applying it to the shutdown path too. Once main-thread tasks can complete, the render thread's block_on resolves, the SubApp comes back, and the channel's recv returns.
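
To make the shape of the fix concrete without pulling in Bevy, here is a minimal std-only sketch of a `Drop` that keeps running main-thread tasks while it waits. Everything in it is an assumption for illustration: `ModelChannels` stands in for `RenderAppChannels`, the `main_thread_tasks` receiver stands in for the stashed `Arc<ThreadExecutor>`, and the polling loop stands in for `ComputeTaskPool::scope_with_executor`. The actual change uses Bevy's executor rather than polling.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for work that must run on the main thread.
type MainThreadTask = Box<dyn FnOnce() + Send>;

/// Hypothetical model of RenderAppChannels with the extra executor handle.
struct ModelChannels {
    render_to_app: Receiver<&'static str>,
    main_thread_tasks: Receiver<MainThreadTask>,
}

impl Drop for ModelChannels {
    fn drop(&mut self) {
        loop {
            // Pump: run whatever the render thread queued for the main thread.
            for task in self.main_thread_tasks.try_iter() {
                task();
            }
            // Then check whether the sub-app has come back.
            match self.render_to_app.try_recv() {
                Ok(_sub_app) => break,
                Err(TryRecvError::Disconnected) => break,
                Err(TryRecvError::Empty) => thread::sleep(Duration::from_millis(1)),
            }
        }
    }
}

/// Runs the same scenario as the deadlock: the render thread needs a
/// main-thread task to finish before it returns the sub-app.
/// Returns true if the render thread finished without panicking.
fn shutdown_with_pumping() -> bool {
    let (task_tx, task_rx) = mpsc::channel::<MainThreadTask>();
    let (app_tx, app_rx) = mpsc::channel::<&'static str>();

    let render = thread::spawn(move || {
        let (done_tx, done_rx) = mpsc::channel::<()>();
        task_tx
            .send(Box::new(move || {
                let _ = done_tx.send(());
            }))
            .unwrap();
        done_rx.recv().unwrap();
        app_tx.send("render sub-app").unwrap();
    });

    let channels = ModelChannels {
        render_to_app: app_rx,
        main_thread_tasks: task_rx,
    };
    drop(channels); // returns only after the sub-app has been received

    render.join().is_ok()
}
```

The point of the model is the ordering inside `drop`: tasks are drained before each check of the channel, so the render thread's wait can resolve and the sub-app can come back.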

It's a small change: one new field on RenderAppChannels, one new constructor arg, and a rewrite of the Drop body.

Heads up: tiny breaking change

RenderAppChannels::new takes a third arg now -- the Arc<ThreadExecutor<'static>> from MainThreadExecutor.0. The struct is internal plumbing, but the constructor is pub, so I'm flagging it. Here's what PipelinedRenderingPlugin::cleanup looks like now:

let executor = app.world().get_resource::<MainThreadExecutor>().unwrap();
let main_thread_executor = executor.0.clone();
// ...
RenderAppChannels::new(
    app_to_render_sender,
    render_to_app_receiver,
    main_thread_executor,
)

If anyone downstream is constructing RenderAppChannels directly (probably nobody, this is pretty internal), they'll need to pass the executor in.

Testing

Added a regression test: pipelined_rendering::tests::drop_pumps_main_thread_executor_to_avoid_shutdown_deadlock. The fake "render thread" in the test does the same thing the real one does in the bad case -- spawns a task on the shared MainThreadExecutor and awaits it before sending the SubApp back. Without the fix the test hangs forever (I verified -- temporarily reverted Drop back to recv_blocking and watched it block past the 60s mark). With the fix it finishes immediately. Gated on feature = "multi_threaded" since ThreadExecutor::spawn only exists there.
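
Since the failure mode is a hang rather than an assertion failure, one option for a test like this is a watchdog that turns "hangs forever" into a failed assertion. The sketch below is a generic std-only helper and not necessarily how the test in this PR is written. `completes_within` is a hypothetical name.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `work` on a helper thread and reports whether it finished before
/// `timeout`. If `work` hangs, the helper thread is left detached and the
/// caller gets `false` instead of blocking. A panic inside `work` also
/// yields `false`, because the sender is dropped without signalling.
fn completes_within<F>(timeout: Duration, work: F) -> bool
where
    F: FnOnce() + Send + 'static,
{
    let (done_tx, done_rx) = mpsc::channel::<()>();
    thread::spawn(move || {
        work();
        let _ = done_tx.send(());
    });
    done_rx.recv_timeout(timeout).is_ok()
}
```

In a regression test, the closure would own the value whose `Drop` is under test and drop it, with the assertion on the returned bool.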

I also stress-tested it end-to-end on a real wgpu/Metal app on macOS / Apple Silicon, with this patch cherry-picked onto release-0.18.1 and PipelinedRenderingPlugin re-enabled. Before the fix: 20-40% flake rate across 20+ runs through a few different exit paths (scheduled AppExit, programmatic WindowCloseRequested, timer-driven exit). After the fix: 30/30 clean across the same paths.

Happy to take review feedback. Thanks for all the great work on Bevy!

@github-actions
Contributor

github-actions Bot commented May 1, 2026

Welcome, new contributor!

Please make sure you've read our contributing guide, as well as our policy regarding AI usage, and we look forward to reviewing your pull request shortly ✨

@bryancostanich bryancostanich force-pushed the fix/pipelined-rendering-shutdown-deadlock branch 2 times, most recently from 63db147 to 57a0699 Compare May 1, 2026 17:10
PipelinedRenderingPlugin can deadlock during shutdown when the render
thread is mid-update at the moment the main thread tears down. The
main thread enters World::clear_resources, drops RenderAppChannels,
and calls recv_blocking on the render-to-app channel. Meanwhile the
render thread's MultiThreadedExecutor has queued tasks that need to
run on the main thread (non-Send resources, MainThreadExecutor-routed
work) and parks waiting for them. The main thread cannot run those
tasks because it is parked on recv_blocking. Mutual park, no progress.

Fix: stash an Arc<ThreadExecutor> clone of MainThreadExecutor inside
RenderAppChannels at construction, and in Drop use
ComputeTaskPool::scope_with_executor (the same pattern renderer_extract
uses) to pump the executor while waiting for the render thread to
return. Same shape as the steady-state path, just applied to shutdown.

The race is timing-sensitive (~20-40% repro rate on macOS / Apple
Silicon / Metal in test runs), which is why the issue has been hard
to pin down.
@bryancostanich bryancostanich force-pushed the fix/pipelined-rendering-shutdown-deadlock branch from 57a0699 to a7b3260 Compare May 1, 2026 18:16
@bryancostanich
Author

hey all right, made it through the CI checks. that took some work. :D

@bryancostanich bryancostanich changed the title from "Fix pipelined rendering shutdown deadlock by pumping MainThreadExecutor in RenderAppChannels::Drop" to "Fix pipelined rendering shutdown deadlock" May 1, 2026
@kfc35 kfc35 added labels May 1, 2026: C-Bug (An unexpected or incorrect behavior), O-MacOS (Specific to the MacOS (Apple) desktop operating system), S-Needs-Review (Needs reviewer attention (from anyone!) to move forward)
@JMS55 JMS55 requested a review from atlv24 May 2, 2026 00:10
@cart cart closed this May 5, 2026
@cart cart reopened this May 5, 2026