Skip to content

Fix data races in Apple WebSocket for TSan safety#27

Merged
bghgary merged 4 commits intoBabylonJS:mainfrom
bghgary:fix/tsan-websocket-race
Apr 16, 2026
Merged

Fix data races in Apple WebSocket for TSan safety#27
bghgary merged 4 commits intoBabylonJS:mainfrom
bghgary:fix/tsan-websocket-race

Conversation

@bghgary
Copy link
Copy Markdown
Contributor

@bghgary bghgary commented Apr 16, 2026

[Created by Copilot on behalf of @bghgary]

Fix data races in Apple WebSocket detected by ThreadSanitizer.

Problem

TSan detected races in WebSocket_Apple_ObjC.m: the JS thread calling close() writes to webSocketTask/session while GCD delegate threads read them concurrently in didCompleteWithError, receiveMessage, etc. No synchronization existed.

Fix

  • Add @synchronized(self) to all methods accessing webSocketTask, session, and callbacks
  • Copy ivars to locals under the lock, operate on locals outside the lock
  • open: wraps session/webSocketTask assignment in @synchronized, copies task to local before resume
  • close: routes through invalidateAndCancelWithCloseCode instead of duplicating teardown logic
  • didOpenWithProtocol: copies openCallback under @synchronized before invoking
  • invalidateAndCancelWithCloseCode: nil-guards closeCb invocation for safety
  • Preserves close-once semantics from Fire onClose after onError on all WebSocket error paths #26 while making them thread-safe

Style

  • Standardize pointer style to Type* (C++ convention) throughout the file, matching the majority of .mm files in the BabylonNative ecosystem

bghgary and others added 2 commits April 16, 2026 11:39
All accesses to session/webSocketTask lifecycle state are now protected
with @synchronized(self). Callbacks are invoked outside the lock to
avoid reentrancy issues. The close-once check, state transition, and
callback invocation follow the pattern: lock → copy state + nil ivars →
unlock → cancel/invalidate → invoke callbacks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 16, 2026 18:43
bghgary added a commit to bghgary/JsRuntimeHost that referenced this pull request Apr 16, 2026
- Add tsan_suppressions.txt to suppress JavaScriptCore/WTF/bmalloc
  internal data races on Ubuntu (not fixable by us)
- Wire suppressions into linux.yml for TSan jobs via TSAN_OPTIONS
- Point arcana.cpp at upstream (microsoft/arcana.cpp#61 merged)
- Update UrlLib fork SHA (post-rebase, pending BabylonJS/UrlLib#27)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to eliminate ThreadSanitizer-reported data races in the Apple WebSocket implementation by synchronizing access to shared ivars (session, webSocketTask) and standardizing pointer formatting in WebSocket_Apple_ObjC.m.

Changes:

  • Added @synchronized(self) guards around several reads/writes of session/webSocketTask, and copied ivars to locals before operating outside the lock.
  • Updated send/receive paths to safely snapshot webSocketTask (and messageCallback) before use.
  • Standardized pointer style in the Objective-C implementation to Type*.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread Source/WebSocket_Apple_ObjC.m Outdated
Comment thread Source/WebSocket_Apple_ObjC.m
Comment thread Source/WebSocket_Apple_ObjC.m
- close now routes through invalidateAndCancelWithCloseCode instead of
  duplicating the teardown logic
- open wraps session/webSocketTask assignment in @synchronized
- didOpenWithProtocol copies openCallback under @synchronized before invoking

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

@bghgary bghgary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Responded by Copilot on behalf of @bghgary] All three review comments addressed in 45808ef.

- Copy webSocketTask to local inside @synchronized before calling resume
- Guard closeCb invocation with nil check for consistency

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bghgary bghgary merged commit d251ad4 into BabylonJS:main Apr 16, 2026
1 check passed
bghgary added a commit to bghgary/JsRuntimeHost that referenced this pull request Apr 16, 2026
Switches UrlLib source from bghgary/UrlLib fork back to BabylonJS/UrlLib
now that the Apple WebSocket data race fixes are merged upstream
(BabylonJS/UrlLib#27).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bghgary added a commit to BabylonJS/JsRuntimeHost that referenced this pull request Apr 17, 2026
Add ThreadSanitizer (TSan) support for detecting data races at runtime.

[Created by Copilot on behalf of @bghgary]

## Changes

- **CMakeLists.txt**: New `ENABLE_THREAD_SANITIZER` option with
`-fsanitize=thread`. Includes a mutual exclusion check with
`ENABLE_SANITIZERS` since TSan and ASan cannot be combined.
- **.github/workflows/build-linux.yml**: New `enable-thread-sanitizer`
input, passed as `ENABLE_THREAD_SANITIZER` to CMake. Wires
`TSAN_OPTIONS` with suppression file and sets `JSC_useConcurrentGC=0`
for the Run Tests step (see below).
- **.github/workflows/build-macos.yml**: New `enable-thread-sanitizer`
input; wires `TSAN_OPTIONS`.
- **.github/workflows/ci.yml**: New `Ubuntu_ThreadSanitizer_clang` and
`macOS_Xcode164_ThreadSanitizer` jobs.
- **.github/tsan_suppressions.txt**: Suppresses JSC-internal data races
on Ubuntu (`called_from_lib:libjavascriptcoregtk`). These are
allocator-level races in JSC's JIT worker threads that we cannot fix.

## Linux TSan: Concurrent GC Workaround

JSC on Linux uses `pthread_kill(tid, SIGUSR1)` + `sem_wait` in
`WTF::Thread::suspend()` to stop mutator threads at GC safepoints.
TSan's signal interception defers SIGUSR1 delivery indefinitely when the
target is inside instrumented code; the handler never runs and the
Collector Thread's `sem_wait` deadlocks. macOS JSC uses Mach
`thread_suspend()` (no Unix signals) and is unaffected.

Setting `JSC_useConcurrentGC=0` on the Linux TSan job removes the
dedicated Collector Thread; GC runs on the mutator without cross-thread
signals. Reproduced locally in WSL Ubuntu: default configuration hung
9/20 runs; with `JSC_useConcurrentGC=0`, 0/30 runs hung.

## Dependency Updates

- **arcana.cpp**: Points to upstream `microsoft/arcana.cpp` (includes
#61 — TSan-safe test hook callback mutex).
- **UrlLib**: Points to upstream `BabylonJS/UrlLib` at `d251ad44`
(includes BabylonJS/UrlLib#27 — Apple WebSocket `@synchronized` fixes).

## Platform Support

TSan is supported on Linux and macOS with Clang/GCC. MSVC does not
support TSan and Clang targeting Windows does not have a TSan runtime
library.

## CI Impact

The TSan jobs run in parallel with other jobs and do not increase
overall pipeline time.

## Status

- macOS TSan: passes clean
- Ubuntu TSan: JSC-internal races suppressed via
`called_from_lib:libjavascriptcoregtk`; concurrent GC disabled to avoid
TSan/signal deadlock

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants