Skip to content

Fix WorkQueue destructor deadlock by merging into AppRuntime#147

Open
bghgary wants to merge 3 commits intoBabylonJS:mainfrom
bghgary:fix/workqueue-deadlock
Open

Fix WorkQueue destructor deadlock by merging into AppRuntime#147
bghgary wants to merge 3 commits intoBabylonJS:mainfrom
bghgary:fix/workqueue-deadlock

Conversation

@bghgary
Copy link
Copy Markdown
Contributor

@bghgary bghgary commented Apr 2, 2026

Fix the WorkQueue destructor deadlock by merging WorkQueue into AppRuntime and dispatching cancellation as a work item.

The Bug

WorkQueue::~WorkQueue() cancelled from the main thread via cancel() + notify_all(). The notify fired without holding the queue mutex, so if the worker thread hadn't entered condition_variable::wait() yet, the signal was lost and join() hung forever. See #146 for a deterministic repro of the deadlock against the old code.

The Fix

  • Merge WorkQueue into AppRuntime: Eliminates split-lifetime issues where the worker thread could access partially-destroyed AppRuntime members during shutdown.
  • Dispatch cancellation as a work item: Append(cancel) calls push() which acquires the queue mutex, so it blocks until the worker enters wait(). Push completes, notifies, worker wakes. No lost signal possible.
  • Fix member declaration order: m_options now declared before the worker thread members so it outlives them during destruction.

Regression Test

Includes a deterministic test using arcana testing hooks (microsoft/arcana.cpp#59) that sleeps while holding the queue mutex before wait(). This guarantees the worker is in the vulnerable window when destruction fires. The test passes with this fix and deadlocks with the old code (#146).

Dependencies

@bghgary bghgary force-pushed the fix/workqueue-deadlock branch 2 times, most recently from 8649d1f to 3e5dd39 Compare April 2, 2026 20:32
@bghgary bghgary marked this pull request as ready for review April 2, 2026 21:02
Copilot AI review requested due to automatic review settings April 2, 2026 21:02
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a shutdown deadlock in the JavaScript worker queue by inlining the former WorkQueue implementation into Babylon::AppRuntime and changing cancellation to be dispatched as a queued work item, then adds a deterministic regression test using arcana testing hooks to reproduce the old race.

Changes:

  • Merge WorkQueue into AppRuntime (thread, dispatcher, cancellation, suspend/resume) and cancel via queued work item.
  • Add a deterministic unit test (DestroyDoesNotDeadlock) guarded by ARCANA_TESTING_HOOKS.
  • Update test harness / build configuration (Win32 gtest argv init; arcana fork + ARCANA_TESTING_HOOKS definition).

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
Tests/UnitTests/Win32/App.cpp Initializes gtest with argc/argv and runs tests directly.
Tests/UnitTests/Shared/Shared.cpp Adds deterministic deadlock regression test using arcana hooks.
Core/AppRuntime/Source/WorkQueue.h Removes standalone WorkQueue (merged into AppRuntime).
Core/AppRuntime/Source/WorkQueue.cpp Removes standalone WorkQueue implementation.
Core/AppRuntime/Source/AppRuntime.cpp Moves queue/thread/cancel/suspend logic into AppRuntime and cancels via queued work item.
Core/AppRuntime/Source/AppRuntime_JSI.cpp Switches V8 JSI task runner adapter to post tasks via AppRuntime::Dispatch.
Core/AppRuntime/Include/Babylon/AppRuntime.h Adds dispatcher/thread/cancellation members and in-class Append helper.
Core/AppRuntime/CMakeLists.txt Removes WorkQueue sources from build.
CMakeLists.txt Switches arcana dependency to a fork and defines ARCANA_TESTING_HOOKS when tests are enabled.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@bghgary bghgary force-pushed the fix/workqueue-deadlock branch 2 times, most recently from 8452ce3 to 8ac4ba4 Compare April 2, 2026 21:12
WorkQueue::~WorkQueue() had a race condition where cancel() + notify_all()
fired without the queue mutex, so the signal could be lost if the worker
thread hadn't entered condition_variable::wait() yet, causing join() to
hang forever.

This change merges WorkQueue into AppRuntime, eliminating split-lifetime
issues, and dispatches cancellation as a work item via Append(). Since
push() acquires the queue mutex, it blocks until the worker enters wait(),
guaranteeing the notification is delivered.

Changes:
- Merge WorkQueue members into AppRuntime (thread, dispatcher, cancel
  source, env, suspension lock)
- Remove WorkQueue.h and WorkQueue.cpp
- Update AppRuntime_JSI.cpp TaskRunnerAdapter to use AppRuntime::Dispatch
- Add deterministic regression test using arcana testing hooks
- Fix member declaration order so m_options outlives worker thread

The regression test uses arcana::set_before_wait_callback() to sleep while
holding the queue mutex before wait(), deterministically triggering the race.
See BabylonJS#146 for the test running against the old broken code.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bghgary bghgary force-pushed the fix/workqueue-deadlock branch from 8ac4ba4 to 354f12a Compare April 2, 2026 23:26
@bghgary
Copy link
Copy Markdown
Contributor Author

bghgary commented Apr 2, 2026

⚠️ Do not merge until microsoft/arcana.cpp#59 is merged. The arcana.cpp FetchContent currently points to bghgary/arcana.cpp. Once the upstream PR merges, update the GIT_REPOSITORY and GIT_TAG back to microsoft/arcana.cpp.

bghgary and others added 2 commits April 2, 2026 16:59
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants