Skip to content

fix(runtime): thread-safety hardening for cross-thread runtime statics#1319

Merged
proggeramlug merged 2 commits into
mainfrom
fix/runtime-thread-safety-hardening
May 22, 2026
Merged

fix(runtime): thread-safety hardening for cross-thread runtime statics#1319
proggeramlug merged 2 commits into
mainfrom
fix/runtime-thread-safety-hardening

Conversation

@TheHypnoo
Copy link
Copy Markdown
Contributor

Summary

perry/thread workers run user JS code on background OS threads. The runtime exposed several process-wide static muts touched on every allocation, throw, and dispatch — concurrent worker code raced on shared mutable state, and stored arena-local pointers that became use-after-free when the originating thread exited.

This PR partitions the offending state per-thread (or per-process where pointer-identity must be global), with no codegen changes — every static mut removed was grep-confirmed unreferenced from crates/perry-codegen.

Before / After

# Issue Before After
1 INTERN_TABLE (8K entries) static mut, racy &mut INTERN_TABLE[slot] on every property-name allocation; cross-thread reads returned foreign-arena pointers thread_local! UnsafeCell; per-thread GC scanner sees only its own table
2 TRANSITION_CACHE_GLOBAL (16K entries) Same — raced under concurrent object construction; comment claimed codegen inlined against the #[no_mangle] symbol but grep showed zero codegen refs thread_local!; dead #[no_mangle] export removed
3 Exception state (JUMP_BUFFERS[128], TRY_DEPTH, CURRENT_EXCEPTION, HAS_EXCEPTION, IN_FINALLY) 5 process-wide static muts; concurrent throws raced on the depth counter and longjmp'd into stack frames that belonged to other threads Single ExceptionState struct in TLS; js_throw releases the TLS borrow before longjmp; per-thread GC scanner roots current_exception
4 SMALL_INT_CACHE (Number→string) static mut of arena pointers; cache populated on thread A handed back invalid pointers on thread B thread_local!; pinned StringHeaders per thread
5 HANDLE_METHOD_DISPATCH / HANDLE_PROPERTY_DISPATCH / HANDLE_PROPERTY_SET_DISPATCH static mut Option<fn> written once at startup, read from many threads without synchronization AtomicPtr<()> with Acquire/Release; safe accessors handle_method_dispatch() etc.
6 Symbol.for / well-known symbol descriptions Box::leak'd SymbolHeader carried a *mut StringHeader allocated in the calling thread's arena; freed when that worker exited, leaving a dangling pointer in the process-lifetime symbol New REGISTERED_SYMBOL_DESCRIPTIONS side table (HashMap<sym_ptr, Arc<str>>); readers materialize a fresh StringHeader in the caller's arena on demand
7 arena::ArenaBlock::alloc bump arithmetic Wrapping + for aligned_offset + size; a hostile size/align pair could wrap and return an in-bounds pointer for an out-of-bounds region All bump arithmetic via checked_add, matching the slow path that already did this
8 Closure capture write-barrier ordering Flagged by the audit; on review, callers store-then-barrier with the value rooted on the Rust stack between the two — the existing pattern is correct No code change; added a SAFETY comment so future maintainers don't try to "fix" it

Test plan

  • cargo build --release -p perry-runtime -p perry-stdlib -p perry — clean
  • cargo test --release -p perry-runtime --lib — 450 pass; 5 pre-existing flakes in event_pump::tests (timing-sensitive sustained_budget_zero_spin_is_throttled poisons the shared SERIAL mutex; commit ea6d467 already loosened this test once; grep confirms event_pump does not touch any state changed here)
  • ./run_parity_tests.sh --filter test_gap_ — 36/37 pass (97.2%); the lone failure is test_gap_class_advanced, a known failure listed in test-parity/known_failures.json for static-class-block lowering, unrelated to runtime statics

Notes

  • No Cargo.toml / CLAUDE.md / CHANGELOG.md edits — maintainer handles version + changelog at merge.
  • No codegen changes; every removed static mut was grep-confirmed unreferenced from crates/perry-codegen.

TheHypnoo added 2 commits May 22, 2026 10:33
`perry/thread` workers run user JS code on background OS threads, but
several runtime hot paths held process-wide `static mut`s that both
raced under concurrent access and stored arena-local pointers that
became use-after-free when the originating thread exited.

Partitions the offending state:

- INTERN_TABLE, TRANSITION_CACHE_GLOBAL, SMALL_INT_CACHE → thread_local!
- exception state (jump_buffers, try_depth, current_exception,
  has_exception, in_finally) → single per-thread ExceptionState in TLS
- HANDLE_METHOD_DISPATCH / HANDLE_PROPERTY_DISPATCH /
  HANDLE_PROPERTY_SET_DISPATCH → AtomicPtr with Acquire/Release
- Symbol.for / well-known symbol descriptions → process-lifetime
  Arc<str> side table; readers materialize a fresh StringHeader in the
  caller's arena on demand
- arena::ArenaBlock::alloc bump arithmetic → checked_add

No codegen changes: every removed `static mut` was grep-confirmed
unreferenced from crates/perry-codegen.
Review feedback on #1319: Symbol.for / well_known_symbol inserted the
pointer into the SYMBOL_REGISTRY / WELL_KNOWN_SYMBOLS map before
recording its description and adding it to SYMBOL_POINTERS. A
concurrent reader observing the pointer through the map would see:

  - registered_symbol_description() = None
  - description field = null
  - is_registered_symbol() = false

…producing transiently wrong sym.description / sym.toString() /
Symbol.keyFor() / is_symbol() results — exactly the half-initialized
visibility pattern this PR was supposed to eliminate.

Populate REGISTERED_SYMBOL_DESCRIPTIONS and SYMBOL_POINTERS first, then
publish into the registry/cache. Lock order is
SYMBOL_REGISTRY → SYMBOL_POINTERS → REGISTERED_SYMBOL_DESCRIPTIONS;
no reader takes them in the reverse order.
@proggeramlug proggeramlug merged commit fdbe19e into main May 22, 2026
9 checks passed
@proggeramlug proggeramlug deleted the fix/runtime-thread-safety-hardening branch May 22, 2026 09:02
proggeramlug added a commit that referenced this pull request May 22, 2026
…sweep (#1414)

Rolls up 26 PRs that merged to main post-v0.5.1023 without version
bumps:

- node:crypto gap-fixes (#1386 #1393 #1394 #1402 #1405): randomInt,
  timingSafeEqual, getHashes/getCiphers, sha224/sha384, base64 digest,
  Buffer hash input, no-arg digest() → Buffer, pbkdf2Sync digest arg,
  scryptSync.
- node:perf_hooks (#1321 + #1328 #1342 coverage): performance + User
  Timing + PerformanceObserver native impl, granular node-suite +
  edge-case coverage.
- #1090 GC checkpoint runtime work (#1324).
- #1311 geisterhand on iOS (#1316 #1383 #1384 #1385).
- #1312 process.env.X (unset) is nullish undefined (#1314).
- #1319 thread-safety hardening for cross-thread runtime statics.
- #1322 exact-head GC evidence packet.
- #1323 wasm timers dispatch through mem_call bridge (#1329).
- #1317 node:timers/promises shadow-segfault fix (#1326).
- #1330 node:process suite (#1331).
- #1292 bcrypt.hash() returns String (#1307).
- #1293 fastify .json()/.body external-fastify dispatch (#1308).
- #1296 app pattern performance gaps.
- #1297 diagnostics_channel parity.
- #1301 iOS App Groups capability (#1313).
- #1318 #1325 os/methods/modern-methods static dispatch.
- #1315 expanded Node parity test coverage.
- #1382 ui-ios stdlib pump for async fetch.
- #1392 ui-wasm reactive state + setText (#1404).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants