perf(#361): skip realm-proto wrappers in call_value for same-realm callees#368
Conversation
📝 WalkthroughWalkthroughAdds a private ChangesCross-realm callee fast path
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 57d70339b3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Benchmark ResultsRun: https://github.com/dowdiness/js_engine/actions/runs/27615428559
Stage summary
Focused bytecode base-vs-head comparison
Base-vs-head comparison
Mean-time chart (log scale)
Closure-conversion comparison
|
…llees Add a fast path in Interpreter::call_value that bypasses both with_active_value + with_active_callee_realm_value when every active- override slot is None and every callee stamped proto matches the main realm's (or is absent). Storage: pack all 10 realm proto slots into a single 10-element Array[Value] at symbol key -101 (FUNCTION_REALM_PROTOS_PACKED_SYMBOL_ID) instead of 10 separate symbol_properties entries. The fast path check does 1 HashMap lookup + 10 sequential array reads + 10 physical_equal comparisons vs 10 HashMap lookups in the earlier full-10-slot approach. Keys -102..-110 are freed. Safety: all 10 active-override Ref slots are checked (Part 1) and all 10 callee stamped proto slots are compared (Part 2), so the check is safe even when stamp_function_realm_with sets slots independently or when a non-function override is active. Two whitebox positive-control tests verify that the OLD single-proto check would have failed both scenarios (active_array_prototype_override set, mixed-stamp with foreign array proto) while the full check correctly blocks the fast path. Benchmark improvement vs post-#367 baseline (10-run median): isolate/bytecode/call_frame: 7.514 → 6.295 ms (−16%) isolate/bytecode/method_call: 8.562 → 7.086 ms (−17%) isolate/bytecode/runtime_helpers: 10.369 → 9.013 ms (−13%) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
57d7033 to
96355bb
Compare
…rotos?] (#369) (#369) * perf: replace 10 Ref reads in realm_fast_path_allowed with single Bool Replace the 10 double-indirect Ref reads in Part 1 of realm_fast_path_allowed with a single Bool field (has_active_override) maintained by apply_active_realm_protos. The Bool is set to true when any active override is Some, false when all are None. Ablation established that the 10 reads + HashMap lookup consumed 14-22% of per-call time (PR #367/#368 session). This change eliminates Part 1 (10 x 2 pointer dereferences) at the cost of 1 direct field read. Measured gain on JS target (5-run median): call_frame 6.66 ms → 6.25 ms (−6.2%, CV 4.7% → 0.9%) method_call 7.66 ms → 7.01 ms (−8.5%) runtime_helpers 9.49 ms → 8.79 ms (−7.4%) local_access (control): noise only All 7 raw direct-write sites in wbtest files updated to maintain the invariant (has_active_override == OR of all 10 active_*_override Refs). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: update remaining has_active_override write sites in has_property_realm_wbtest Three direct writes to active_{map,set,promise}_prototype_override.val in has_property_realm_wbtest.mbt were missing the companion has_active_override = true update. All 10 direct raw write sites in the source tree are now audited and consistent with the invariant: has_active_override == (any active_*_prototype_override.val is Some). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(perf): replace 10-Ref + Bool cache with Ref[FunctionRealmProtos?] PR #369 introduced a stale-cache hazard: the 10 active_*_prototype_override Ref fields remained publicly writable (RealmState is pub(all)), so external code writing any Ref directly would bypass has_active_override and allow call_value to take the fast path incorrectly. Fix: collapse the 10 individual Refs + Bool into a single Ref[FunctionRealmProtos?] on RealmState. - None = no cross-realm context active - Some(protos) = at least one override set The "any active?" check in realm_fast_path_allowed Part 1 is now `active_overrides.val is None` — one Ref read. This is structurally correct regardless of any external write, because writing to the single combined Ref atomically updates both the proto data AND the "any active?" check. No separate Bool cache exists to become stale. Public API changes: - Remove: 10 active_*_prototype_override fields, has_active_override Bool - Add: active_overrides : Ref[FunctionRealmProtos?] - Promote to pub: apply_active_realm_protos, FunctionRealmProtos struct + constructor All wbtest direct-write sites updated to use apply_active_realm_protos. apply_active_realm_protos is now simpler: 10 Ref writes → 1 Option write. active_realm_protos is now simpler: 10 .val reads → unwrap one Option. 2096/2096 tests green. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Motivation
The two realm-proto wrapper layers in `call_value` perform ~80 Ref ops + 10 HashMap lookups on every function call to save/apply/restore 10 prototype override slots. In single-realm bytecode execution (the common case), both wrappers are pure overhead.
Fast path correctness
Two parts, both required:
Part 1 — all 10 `active_*_prototype_override` Refs are `None`. Checked individually; sampling a subset is unsafe because the `RealmState` Refs are public.
Part 2 — every callee stamped proto slot is absent (Null in the packed array) or pointer-equals the main realm's corresponding proto. All 10 slots are checked; `stamp_function_realm_with` accepts them independently, so sampling one slot is unsafe.
When both hold, the wrappers produce identical results to skipping them.
Two whitebox positive-control tests in `factories_wbtest.mbt` verify scenarios the old single-proto check would have passed incorrectly:
Storage change
Realm protos were stored as 10 separate entries in `symbol_properties` (keys -101..-110). They are now packed into a single 10-element `Array[Value]` at key -101 (`FUNCTION_REALM_PROTOS_PACKED_SYMBOL_ID`). Keys -102..-110 are freed. The fast path reads the array elements directly (sequential memory, warm cache line) rather than doing 10 independent HashMap lookups.
Benchmark (10-run median, JS target, vs post-#367 baseline)
Test plan
🤖 Generated with Claude Code