perf(#361): skip realm-proto wrappers in call_value for same-realm callees #368

Merged
dowdiness merged 1 commit into main from perf/call-value-realm-fast-path on Jun 16, 2026

dowdiness (Owner) commented Jun 16, 2026

Summary

  • Adds a fast path in `Interpreter::call_value` that skips both the `with_active_value` and `with_active_callee_realm_value` wrappers when no active-override slot is set and the callee belongs to the main realm
  • Packs all 10 realm proto slots into a single `Array[Value]` entry at symbol key `-101`, reducing the fast-path check from 10 HashMap lookups to 1 lookup + 10 sequential array reads

Motivation

The two realm-proto wrapper layers in `call_value` perform ~80 Ref ops + 10 HashMap lookups on every function call to save/apply/restore 10 prototype override slots. In single-realm bytecode execution (the common case), both wrappers are pure overhead.
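
A rough sketch of the save/apply/restore pattern described above, written in Python rather than MoonBit. `with_overrides`, `call_with_wrapper`, and the list-of-slots layout are hypothetical stand-ins, not the engine's `with_active_value` or `with_active_callee_realm_value`; the sketch only illustrates that each call pays for a save, an apply, and a restore even when the applied values equal the saved ones.

```python
from contextlib import contextmanager

SLOT_COUNT = 10  # the PR describes 10 prototype override slots


@contextmanager
def with_overrides(slots: list, new_values: list):
    """Hypothetical wrapper: save all slots, apply new values, restore on exit."""
    saved = list(slots)        # save: one read per slot
    slots[:] = new_values      # apply: one write per slot
    try:
        yield
    finally:
        slots[:] = saved       # restore: one more write per slot


def call_with_wrapper(slots: list, callee_protos: list, fn):
    """Run fn with the callee's protos active, then put the old ones back."""
    with with_overrides(slots, callee_protos):
        return fn()
```

In a single-realm program the callee's protos already equal the active ones, so all of this work leaves the state unchanged.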

Fast path correctness

Two parts, both required:

  1. Part 1: all 10 `active_*_prototype_override` Refs are `None`. Each is checked individually; sampling a subset is unsafe because the `RealmState` Refs are public.

  2. Part 2: every callee-stamped proto slot is either absent (`Null` in the packed array) or pointer-equal to the main realm's corresponding proto. All 10 slots are checked; `stamp_function_realm_with` accepts them independently, so sampling one slot is unsafe.

When both hold, running the wrappers produces the same result as skipping them.
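
The two-part condition can be sketched as follows. This is a Python illustration under stated assumptions, not the MoonBit code from the PR: `Realm`, `fast_path_allowed`, and the treatment of an unstamped callee (`None`) as same-realm are all hypothetical, and Python's `is` stands in for the pointer-equality comparison.

```python
from dataclasses import dataclass, field
from typing import Optional

SLOT_COUNT = 10  # the PR describes 10 realm proto slots


@dataclass
class Realm:
    # One prototype object per slot; a bare object() stands in for a real prototype.
    protos: list = field(default_factory=lambda: [object() for _ in range(SLOT_COUNT)])
    # Hypothetical stand-in for the 10 active_*_prototype_override Refs.
    active_overrides: list = field(default_factory=lambda: [None] * SLOT_COUNT)


def fast_path_allowed(realm: Realm, callee_stamps: Optional[list]) -> bool:
    # Part 1: no override may be active in any of the slots.
    if any(o is not None for o in realm.active_overrides):
        return False
    # Assumption: a callee with no stamp at all cannot disagree with the realm.
    if callee_stamps is None:
        return True
    # Part 2: each stamped slot is absent (None) or identical to the realm's proto.
    return all(s is None or s is p for s, p in zip(callee_stamps, realm.protos))
```

A callee whose stamps match in nine slots but carry a foreign prototype in the tenth is rejected, which is the kind of case a single-slot check would miss.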

Two whitebox positive-control tests in `factories_wbtest.mbt` verify scenarios the old single-proto check would have passed incorrectly:

  • Active `array_prototype_override` set while `function` slot is None
  • Callee stamped with foreign `Array.prototype` while `function`/`object` slots match the realm

Storage change

Realm protos were stored as 10 separate entries in `symbol_properties` (keys -101..-110). They are now packed into a single 10-element `Array[Value]` at key -101 (`FUNCTION_REALM_PROTOS_PACKED_SYMBOL_ID`). Keys -102..-110 are freed. The fast path reads the array elements directly (sequential memory, warm cache line) rather than doing 10 independent HashMap lookups.
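
To make the layout change concrete, here is a small Python sketch of both storage schemes. A plain `dict` stands in for `symbol_properties`, and the four helper names are hypothetical; only the key range -101..-110 and the single packed key -101 come from the description above.

```python
PACKED_KEY = -101  # FUNCTION_REALM_PROTOS_PACKED_SYMBOL_ID in the PR
SLOT_COUNT = 10


def stamp_separate(symbol_properties: dict, protos: list) -> None:
    # Old layout: one entry per slot at keys -101..-110.
    for i, proto in enumerate(protos):
        symbol_properties[PACKED_KEY - i] = proto


def stamp_packed(symbol_properties: dict, protos: list) -> None:
    # New layout: a single 10-element list stored at key -101.
    symbol_properties[PACKED_KEY] = list(protos)


def read_separate(symbol_properties: dict) -> list:
    # 10 independent hash lookups, one per slot.
    return [symbol_properties.get(PACKED_KEY - i) for i in range(SLOT_COUNT)]


def read_packed(symbol_properties: dict) -> list:
    # 1 hash lookup, then sequential reads of the list elements.
    packed = symbol_properties.get(PACKED_KEY)
    return list(packed) if packed is not None else [None] * SLOT_COUNT
```

Both readers return the same ten values; the packed form simply reaches them through one lookup instead of ten.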

Benchmark (10-run median, JS target, vs post-#367 baseline)

| benchmark | before | after | delta |
|---|---|---|---|
| `isolate/bytecode/call_frame` | 7.514 ms | 6.295 ms | −16% |
| `isolate/bytecode/method_call` | 8.562 ms | 7.086 ms | −17% |
| `isolate/bytecode/runtime_helpers` | 10.369 ms | 9.013 ms | −13% |

Test plan

  • `moon test` passes 2096/2096 (including 2 new positive-control fast-path whitebox tests)
  • `moon check --deny-warn` clean
  • CI test262 (cross-realm regression gate)

🤖 Generated with Claude Code

coderabbitai Bot commented Jun 16, 2026

Walkthrough

Adds a private `is_cross_realm_callee` helper in `factories.mbt` that detects cross-realm function objects via a realm-stamp slot and proxy-following recursion. `call_value` in `call.mbt` uses this helper to skip the `with_active_value`/`with_active_callee_realm_value` wrappers when no prototype override is active and the callee is same-realm.

Changes

Cross-realm callee fast path

| Layer / File(s) | Summary |
|---|---|
| `is_cross_realm_callee` helper (`interpreter/runtime/factories.mbt`) | New private function reads `FUNCTION_REALM_FUNCTION_PROTO_SYMBOL_ID` from the callee, follows proxy targets recursively, and returns false for non-function or uninitialized callees. |
| Fast path in `call_value` (`interpreter/runtime/call.mbt`) | `call_value` now calls `call_value_impl` directly when `active_function_prototype_override` is absent and the callee is not cross-realm; the existing realm-wrapper path is kept for all other cases. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 Hop, hop, skip the wrapper wide,
When realms align, no need to hide!
A stamp on proto tells the tale—
Same realm? Fast path! No need to flail.
The rabbit zips through, quick as a blink,
Less wrapping means more time to think! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly describes the main optimization: skipping realm-proto wrappers for same-realm callees, which directly matches the PR's primary change and performance objective. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 57d70339b3

Comment thread interpreter/runtime/call.mbt Outdated
github-actions Bot (Contributor) commented Jun 16, 2026

Benchmark Results

Run: https://github.com/dowdiness/js_engine/actions/runs/27615428559

startup/tiny_program is the PR #153 / issue #141 guardrail for built-in realm-stamping startup cost.

Stage summary

| stage | benchmarks | total mean | slowest benchmark | slowest mean | noisy rows |
|---|---|---|---|---|---|
| startup | 3 | 2.282 ms | startup/tiny_program | 1.158 ms | 0 |
| frontend | 7 | 0.871 ms | pipeline/parse_heavy | 0.497 ms | 2 |
| execution | 25 | 14313.208 ms | exec/fibonacci_30 | 13020.802 ms | 2 |

Focused bytecode base-vs-head comparison

Base-vs-head deltas are reporting-only. Negative delta and PR/base < 1.00x mean the PR is faster; interpret high-CV or noisy rows cautiously.

| benchmark | stage | base mean | PR mean | delta | PR/base | base CV | PR CV | noisy |
|---|---|---|---|---|---|---|---|---|
| baseline/bytecode/closure_factory | execution | 14.931 ms | 12.643 ms | -15.3% | 0.85x | 5.3% | 4.3% | no |
| pipeline/bytecode/evaluate | execution | 9.956 ms | 9.211 ms | -7.5% | 0.93x | 3.3% | 2.6% | no |
| isolate/bytecode/call_frame | execution | 10.351 ms | 8.541 ms | -17.5% | 0.83x | 0.8% | 4.7% | no |
| isolate/bytecode/runtime_helpers | execution | 13.863 ms | 12.221 ms | -11.8% | 0.88x | 0.9% | 1.2% | no |
| isolate/bytecode/local_access | execution | 36.250 ms | 36.966 ms | +2.0% | 1.02x | 1.2% | 2.3% | no |
| isolate/bytecode/env_access | execution | 39.620 ms | 36.754 ms | -7.2% | 0.93x | 2.2% | 1.9% | no |
| isolate/bytecode/captured_access | execution | 35.661 ms | 37.575 ms | +5.4% | 1.05x | 1.3% | 1.5% | no |
| isolate/bytecode/dispatch_stack | execution | 22.649 ms | 23.935 ms | +5.7% | 1.06x | 1.0% | 2.6% | no |
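
The report does not spell out how `delta` and `PR/base` are derived. A reading consistent with the rows above is relative change and plain ratio of the two means; the Python helper below (`compare` is a hypothetical name) works one row through under that assumption.

```python
def compare(base_ms: float, pr_ms: float) -> tuple[float, float]:
    """Return (delta in percent, PR/base ratio), rounded as in the report."""
    ratio = pr_ms / base_ms
    delta_pct = (pr_ms - base_ms) / base_ms * 100.0
    return round(delta_pct, 1), round(ratio, 2)


# isolate/bytecode/call_frame: base 10.351 ms, PR 8.541 ms
print(compare(10.351, 8.541))  # (-17.5, 0.83)
```

A negative delta and a ratio below 1.00 both mean the PR is faster, matching the note above the table.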

Base-vs-head comparison

| benchmark | stage | base mean | PR mean | delta | PR/base | base CV | PR CV | noisy |
|---|---|---|---|---|---|---|---|---|
| startup/tiny_program | startup | 1.415 ms | 1.158 ms | -18.2% | 0.82x | 5.5% | 6.7% | no |
| lexer/small | frontend | 0.031 ms | 0.030 ms | -3.8% | 0.96x | 23.2% | 19.3% | base, PR |
| lexer/large | frontend | 0.266 ms | 0.264 ms | -0.6% | 0.99x | 3.7% | 7.9% | no |
| exec/fibonacci_30 | execution | 15188.602 ms | 13020.802 ms | -14.3% | 0.86x | 0.3% | 0.7% | no |
| exec/property_chain | execution | 13.739 ms | 13.616 ms | -0.9% | 0.99x | 7.5% | 14.6% | no |
| startup/phase/parse_tiny | frontend | 0.002 ms | 0.002 ms | +7.7% | 1.08x | 0.8% | 0.6% | no |
| startup/phase/new_interpreter | startup | 1.390 ms | 1.124 ms | -19.1% | 0.81x | 11.9% | 13.9% | no |
| startup/phase/execute_preparsed_tiny | execution | 0.000 ms | 0.000 ms | +4.8% | 1.05x | 2.1% | 0.8% | no |
| startup/phase/event_loop_drain_empty | startup | 0.000 ms | 0.000 ms | -3.9% | 0.96x | 0.7% | 0.6% | no |
| startup/phase/result_stringify_output | execution | 0.000 ms | 0.000 ms | +0.3% | 1.00x | 0.5% | 0.5% | no |
| exec/array_map_filter | execution | 21.227 ms | 19.839 ms | -6.5% | 0.93x | 17.2% | 21.6% | base, PR |
| exec/closure_factory | execution | 30.885 ms | 29.701 ms | -3.8% | 0.96x | 5.3% | 4.8% | no |
| baseline/closure_legacy/closure_factory | execution | 28.957 ms | 28.148 ms | -2.8% | 0.97x | 9.4% | 9.6% | no |
| baseline/bytecode/closure_factory | execution | 14.931 ms | 12.643 ms | -15.3% | 0.85x | 5.3% | 4.3% | no |
| isolate/bytecode/dispatch_stack | execution | 22.649 ms | 23.935 ms | +5.7% | 1.06x | 1.0% | 2.6% | no |
| isolate/bytecode/local_access | execution | 36.250 ms | 36.966 ms | +2.0% | 1.02x | 1.2% | 2.3% | no |
| isolate/bytecode/env_access | execution | 39.620 ms | 36.754 ms | -7.2% | 0.93x | 2.2% | 1.9% | no |
| isolate/bytecode/captured_access | execution | 35.661 ms | 37.575 ms | +5.4% | 1.05x | 1.3% | 1.5% | no |
| isolate/bytecode/call_frame | execution | 10.351 ms | 8.541 ms | -17.5% | 0.83x | 0.8% | 4.7% | no |
| isolate/bytecode/runtime_helpers | execution | 13.863 ms | 12.221 ms | -11.8% | 0.88x | 0.9% | 1.2% | no |
| isolate/bytecode/property_get | execution | 45.654 ms | 44.657 ms | -2.2% | 0.98x | 1.6% | 3.8% | no |
| isolate/bytecode/property_set | execution | 42.442 ms | 40.842 ms | -3.8% | 0.96x | 2.3% | 0.9% | no |
| isolate/bytecode/method_call | execution | 11.356 ms | 9.429 ms | -17.0% | 0.83x | 3.8% | 0.5% | no |
| isolate/bytecode/object_literal | execution | 14.073 ms | 13.229 ms | -6.0% | 0.94x | 0.7% | 1.6% | no |
| isolate/bytecode/array_literal | execution | 14.648 ms | 14.326 ms | -2.2% | 0.98x | 0.7% | 1.6% | no |
| exec/arithmetic_loop | execution | 899.341 ms | 841.581 ms | -6.4% | 0.94x | 0.4% | 0.6% | no |
| exec/object_construction | execution | 7.446 ms | 7.084 ms | -4.9% | 0.95x | 5.7% | 6.3% | no |
| exec/string_ops | execution | 2.137 ms | 1.884 ms | -11.8% | 0.88x | 17.0% | 16.8% | base, PR |
| pipeline/exec/lex | frontend | 0.028 ms | 0.027 ms | -1.8% | 0.98x | 2.4% | 0.5% | no |
| pipeline/exec/parse | frontend | 0.027 ms | 0.028 ms | +3.7% | 1.04x | 3.2% | 3.2% | no |
| pipeline/exec/evaluate | execution | 25.857 ms | 25.658 ms | -0.8% | 0.99x | 4.5% | 12.5% | no |
| pipeline/closure_legacy/evaluate | execution | 26.334 ms | 24.564 ms | -6.7% | 0.93x | 4.8% | 4.5% | no |
| pipeline/bytecode/compile | frontend | 0.022 ms | 0.023 ms | +4.0% | 1.04x | 28.2% | 23.2% | base, PR |
| pipeline/bytecode/evaluate | execution | 9.956 ms | 9.211 ms | -7.5% | 0.93x | 3.3% | 2.6% | no |
| pipeline/parse_heavy | frontend | 0.490 ms | 0.497 ms | +1.5% | 1.02x | 5.2% | 8.6% | no |

Mean-time chart (log scale)

| benchmark | stage | mean | chart |
|---|---|---|---|
| startup/tiny_program | startup | 1.158 ms | ## |
| lexer/small | frontend | 0.030 ms ⚠ | # |
| lexer/large | frontend | 0.264 ms | # |
| exec/fibonacci_30 | execution | 13020.802 ms | ############################## |
| exec/property_chain | execution | 13.616 ms | ######## |
| startup/phase/parse_tiny | frontend | 0.002 ms | # |
| startup/phase/new_interpreter | startup | 1.124 ms | ## |
| startup/phase/execute_preparsed_tiny | execution | 0.000 ms | # |
| startup/phase/event_loop_drain_empty | startup | 0.000 ms | # |
| startup/phase/result_stringify_output | execution | 0.000 ms | # |
| exec/array_map_filter | execution | 19.839 ms ⚠ | ######### |
| exec/closure_factory | execution | 29.701 ms | ########## |
| baseline/closure_legacy/closure_factory | execution | 28.148 ms | ########## |
| baseline/bytecode/closure_factory | execution | 12.643 ms | ######## |
| isolate/bytecode/dispatch_stack | execution | 23.935 ms | ########## |
| isolate/bytecode/local_access | execution | 36.966 ms | ########### |
| isolate/bytecode/env_access | execution | 36.754 ms | ########### |
| isolate/bytecode/captured_access | execution | 37.575 ms | ########### |
| isolate/bytecode/call_frame | execution | 8.541 ms | ####### |
| isolate/bytecode/runtime_helpers | execution | 12.221 ms | ######## |
| isolate/bytecode/property_get | execution | 44.657 ms | ############ |
| isolate/bytecode/property_set | execution | 40.842 ms | ########### |
| isolate/bytecode/method_call | execution | 9.429 ms | ####### |
| isolate/bytecode/object_literal | execution | 13.229 ms | ######## |
| isolate/bytecode/array_literal | execution | 14.326 ms | ######## |
| exec/arithmetic_loop | execution | 841.581 ms | ##################### |
| exec/object_construction | execution | 7.084 ms | ###### |
| exec/string_ops | execution | 1.884 ms ⚠ | ### |
| pipeline/exec/lex | frontend | 0.027 ms | # |
| pipeline/exec/parse | frontend | 0.028 ms | # |
| pipeline/exec/evaluate | execution | 25.658 ms | ########## |
| pipeline/closure_legacy/evaluate | execution | 24.564 ms | ########## |
| pipeline/bytecode/compile | frontend | 0.023 ms ⚠ | # |
| pipeline/bytecode/evaluate | execution | 9.211 ms | ####### |
| pipeline/parse_heavy | frontend | 0.497 ms | # |

Closure-conversion comparison

  • unavailable

…llees

Add a fast path in Interpreter::call_value that bypasses both
with_active_value + with_active_callee_realm_value when every active-
override slot is None and every callee stamped proto matches the main
realm's (or is absent).

Storage: pack all 10 realm proto slots into a single 10-element
Array[Value] at symbol key -101 (FUNCTION_REALM_PROTOS_PACKED_SYMBOL_ID)
instead of 10 separate symbol_properties entries. The fast path check
does 1 HashMap lookup + 10 sequential array reads + 10 physical_equal
comparisons vs 10 HashMap lookups in the earlier full-10-slot approach.
Keys -102..-110 are freed.

Safety: all 10 active-override Ref slots are checked (Part 1) and all
10 callee stamped proto slots are compared (Part 2), so the check is
safe even when stamp_function_realm_with sets slots independently or
when a non-function override is active. Two whitebox positive-control
tests cover scenarios that the old single-proto check would have wrongly
let through (active_array_prototype_override set, mixed stamp with a
foreign array proto); the full check correctly blocks the fast path.

Benchmark improvement vs post-#367 baseline (10-run median):
  isolate/bytecode/call_frame:      7.514 → 6.295 ms  (−16%)
  isolate/bytecode/method_call:     8.562 → 7.086 ms  (−17%)
  isolate/bytecode/runtime_helpers: 10.369 → 9.013 ms  (−13%)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dowdiness force-pushed the perf/call-value-realm-fast-path branch from 57d7033 to 96355bb on June 16, 2026 11:50
dowdiness merged commit 722db05 into main on Jun 16, 2026
15 checks passed
dowdiness deleted the perf/call-value-realm-fast-path branch on June 16, 2026 12:36
dowdiness added a commit that referenced this pull request Jun 16, 2026
…rotos?] (#369) (#369)

* perf: replace 10 Ref reads in realm_fast_path_allowed with single Bool

Replace the 10 double-indirect Ref reads in Part 1 of
realm_fast_path_allowed with a single Bool field (has_active_override)
maintained by apply_active_realm_protos. The Bool is set to true when
any active override is Some, false when all are None.

Ablation established that the 10 reads + HashMap lookup consumed 14-22%
of per-call time (PR #367/#368 session). This change eliminates Part 1
(10 x 2 pointer dereferences) at the cost of 1 direct field read.

Measured gain on JS target (5-run median):
  call_frame     6.66 ms → 6.25 ms  (−6.2%, CV 4.7% → 0.9%)
  method_call    7.66 ms → 7.01 ms  (−8.5%)
  runtime_helpers  9.49 ms → 8.79 ms  (−7.4%)
  local_access (control): noise only

All 7 raw direct-write sites in wbtest files updated to maintain the
invariant (has_active_override == OR of all 10 active_*_override Refs).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: update remaining has_active_override write sites in has_property_realm_wbtest

Three direct writes to active_{map,set,promise}_prototype_override.val
in has_property_realm_wbtest.mbt were missing the companion
has_active_override = true update. All 10 direct raw write sites in
the source tree are now audited and consistent with the invariant:
has_active_override == (any active_*_prototype_override.val is Some).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(perf): replace 10-Ref + Bool cache with Ref[FunctionRealmProtos?]

PR #369 introduced a stale-cache hazard: the 10 active_*_prototype_override
Ref fields remained publicly writable (RealmState is pub(all)), so external
code writing any Ref directly would bypass has_active_override and allow
call_value to take the fast path incorrectly.

Fix: collapse the 10 individual Refs + Bool into a single
Ref[FunctionRealmProtos?] on RealmState.

- None = no cross-realm context active
- Some(protos) = at least one override set

The "any active?" check in realm_fast_path_allowed Part 1 is now
`active_overrides.val is None` — one Ref read. This is structurally correct
regardless of any external write, because writing to the single combined Ref
atomically updates both the proto data AND the "any active?" check. No
separate Bool cache exists to become stale.

Public API changes:
- Remove: 10 active_*_prototype_override fields, has_active_override Bool
- Add: active_overrides : Ref[FunctionRealmProtos?]
- Promote to pub: apply_active_realm_protos, FunctionRealmProtos struct + constructor

All wbtest direct-write sites updated to use apply_active_realm_protos.
apply_active_realm_protos is now simpler: 10 Ref writes → 1 Option write.
active_realm_protos is now simpler: 10 .val reads → unwrap one Option.
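
The hazard and the fix described in this commit can be illustrated with a short Python sketch. The class and function names are hypothetical stand-ins for the two `RealmState` layouts, not the MoonBit definitions; `Optional[list]` plays the role of `Ref[FunctionRealmProtos?]`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedRealmState:
    # Intermediate layout: separate override slots plus a Bool cache.
    overrides: list
    has_active_override: bool = False


@dataclass
class CombinedRealmState:
    # Layout after the fix: one optional bundle; None means nothing is active.
    active_overrides: Optional[list] = None


def cached_part1(state: CachedRealmState) -> bool:
    # Trusts the cache, so a direct slot write can leave it stale.
    return not state.has_active_override


def combined_part1(state: CombinedRealmState) -> bool:
    # The data and the "any active?" answer live in the same field.
    return state.active_overrides is None


stale = CachedRealmState(overrides=[None] * 10)
stale.overrides[2] = object()   # direct write that skips the Bool update
print(cached_part1(stale))      # True: the fast path is wrongly allowed
```

With the combined layout there is no second field to forget: any write that installs overrides also changes what `combined_part1` returns.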

2096/2096 tests green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>