Fix pipelined rendering shutdown deadlock#24059
Open
bryancostanich wants to merge 2 commits into
Open
Conversation
Contributor
|
Welcome, new contributor! Please make sure you've read our contributing guide, as well as our policy regarding AI usage, and we look forward to reviewing your pull request shortly ✨ |
63db147 to
57a0699
Compare
PipelinedRenderingPlugin can deadlock during shutdown when the render thread is mid-update at the moment the main thread tears down. The main thread enters World::clear_resources, drops RenderAppChannels, and calls recv_blocking on the render-to-app channel. Meanwhile the render thread's MultiThreadedExecutor has queued tasks that need to run on the main thread (non-Send resources, MainThreadExecutor-routed work) and parks waiting for them. The main thread cannot run those tasks because it is parked on recv_blocking. Mutual park, no progress. Fix: stash an Arc<ThreadExecutor> clone of MainThreadExecutor inside RenderAppChannels at construction, and in Drop use ComputeTaskPool::scope_with_executor (the same pattern renderer_extract uses) to pump the executor while waiting for the render thread to return. Same shape as the steady-state path, just applied to shutdown. The race is timing-sensitive (~20-40% repro rate on macOS / Apple Silicon / Metal in test runs), which is why the issue has been hard to pin down.
57a0699 to
a7b3260
Compare
Author
|
hey all right, made it through the CI checks. that took some work. :D |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Hey folks! Bryan Costanich here. I'm loving Bevy, but I keep getting bit by this issue, so I decided to take a whack at fixing it.
This is the macOS shutdown deadlock from #12912 -- the freeze that #23838 didn't catch. App fires
AppExit, main thread parks forever, you have to force-quit. About 20-40% of the time on Apple Silicon / Metal in tight loops, which is what made it so annoying to track down.The root cause is what @msvbg sketched out in his comment on #12912: when
RenderAppChannelsdrops, the main thread callsrecv_blockingand parks. Meanwhile the render thread is mid-SubApp::update()and has queued a task that needs to run on the main thread (anything withNonSendMarker, e.g.create_surfaces). The render thread'sMultiThreadedExecutor::runblocks waiting for that task. The main thread can't run the task because it's parked onrecv_blocking. Both sides stuck.Sample trace (
sampleon macOS):If you want to repro: any
DefaultPluginsapp that firesAppExitshortly after the window comes up, run it in a loop of 20 with a 5-second timeout. You'll see 3-8 hangs per batch on a recent M-series Mac.What I did
Stashed an
Arc<ThreadExecutor>clone ofMainThreadExecutorinsideRenderAppChannels. InDrop, swappedrecv_blockingforComputeTaskPool::scope_with_executorso the main-thread executor keeps getting pumped while we wait for the render thread to return. That's already howrenderer_extracthandles the steady-state path -- I'm just applying it to the shutdown path too. Once main-thread tasks can complete, the render thread'sblock_onresolves, theSubAppcomes back, and the channel'srecvreturns.It's a small change: one new field on
RenderAppChannels, one new constructor arg, and a rewrite of theDropbody.Heads up: tiny breaking change
RenderAppChannels::newtakes a third arg now -- theArc<ThreadExecutor<'static>>fromMainThreadExecutor.0. The struct is internal plumbing, but the constructor ispub, so I'm flagging it. Here's whatPipelinedRenderingPlugin::cleanuplooks like now:If anyone downstream is constructing
RenderAppChannelsdirectly (probably nobody, this is pretty internal), they'll need to pass the executor in.Testing
Added a regression test:
pipelined_rendering::tests::drop_pumps_main_thread_executor_to_avoid_shutdown_deadlock. The fake "render thread" in the test does the same thing the real one does in the bad case -- spawns a task on the sharedMainThreadExecutorand awaits it before sending theSubAppback. Without the fix the test hangs forever (I verified -- temporarily revertedDropback torecv_blockingand watched it block past the 60s mark). With the fix it finishes immediately. Gated onfeature = "multi_threaded"sinceThreadExecutor::spawnonly exists there.I also stress-tested it end-to-end on a real wgpu/Metal app on macOS / Apple Silicon, with this patch cherry-picked onto
release-0.18.1andPipelinedRenderingPluginre-enabled. Before the fix: 20-40% flake rate across 20+ runs through a few different exit paths (scheduledAppExit, programmaticWindowCloseRequested, timer-driven exit). After the fix: 30/30 clean across the same paths.Happy to take review feedback. Thanks for all the great work on Bevy!