perf: collapse 10 active-override Refs into single Ref[FunctionRealmProtos?] (#369)

Merged: dowdiness merged 3 commits into main from perf/active-override-count on Jun 16, 2026

dowdiness (Owner) commented Jun 16, 2026

Summary

Replaces Part 1 of realm_fast_path_allowed — originally 10 double-indirect Ref[Value?] reads — with a single Ref[FunctionRealmProtos?] on RealmState.

The initial approach (Bool cache + 10 individual Refs) had a correctness flaw: RealmState is pub(all), so external code could write active_*_prototype_override.val directly without updating the Bool. The Bool would then be stale, and call_value would wrongly take the fast path while a cross-realm override was active.
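
The hazard can be reproduced in a few lines. The sketch below is a Python model, not the engine's MoonBit code: `Ref`, `OldRealmState`, `part1_allows_fast_path`, and `any_override_set` are hypothetical stand-ins, and only two of the ten override slots are modelled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Ref:
    """Mutable cell standing in for MoonBit's Ref[T]; read and written via .val."""
    val: Optional[str] = None


@dataclass
class OldRealmState:
    # Two of the ten public override slots are enough to show the problem.
    active_map_prototype_override: Ref = field(default_factory=Ref)
    active_set_prototype_override: Ref = field(default_factory=Ref)
    # Cached summary of "is any override Some?", maintained by convention only.
    has_active_override: bool = False


def part1_allows_fast_path(state: OldRealmState) -> bool:
    # The cached check trusts the Bool and never inspects the Refs.
    return not state.has_active_override


def any_override_set(state: OldRealmState) -> bool:
    # Ground truth: what reading the individual Refs would have reported.
    return (
        state.active_map_prototype_override.val is not None
        or state.active_set_prototype_override.val is not None
    )


state = OldRealmState()
# External code writes a public Ref directly and does not touch the Bool.
state.active_map_prototype_override.val = "other-realm Map.prototype"

# The fast path is allowed even though an override is active: the cache is stale.
stale = part1_allows_fast_path(state) and any_override_set(state)
print(stale)  # True
```

Nothing in this layout forces the two writes to happen together, which is why the fix below removes the cache rather than adding more call-site discipline.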

New design: collapses all 10 override Refs and the Bool into one field:

active_overrides : Ref[FunctionRealmProtos?]
// None  = no cross-realm context active
// Some(protos) = at least one override is set
  • Part 1 check: realm_state.active_overrides.val is None — one Ref read
  • No separate Bool cache, so no stale-cache hazard: writing the single combined Ref atomically updates both the "any active?" check and the proto values
  • apply_active_realm_protos: 10 Ref writes → 1 Option write
  • active_realm_protos: 10 .val reads → unwrap one Option
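
A minimal Python model of the combined field may make the argument concrete. It assumes a `Ref` cell with a `.val` slot and models only two of the ten prototype slots; the field names `map_proto` and `set_proto` are illustrative, not the real MoonBit definitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FunctionRealmProtos:
    # Illustrative subset of the ten per-realm prototype slots.
    map_proto: Optional[str] = None
    set_proto: Optional[str] = None


@dataclass
class Ref:
    """Mutable cell standing in for Ref[FunctionRealmProtos?]."""
    val: Optional[FunctionRealmProtos] = None


@dataclass
class RealmState:
    active_overrides: Ref


def part1_allows_fast_path(state: RealmState) -> bool:
    # One read: None means no cross-realm context is active.
    return state.active_overrides.val is None


state = RealmState(active_overrides=Ref())
before = part1_allows_fast_path(state)

# Even a direct external write goes through the same field that Part 1 reads,
# so the check and the proto data cannot disagree.
state.active_overrides.val = FunctionRealmProtos(map_proto="other-realm Map.prototype")
after = part1_allows_fast_path(state)

print(before, after)  # True False
```

The design choice is to make the "any active?" answer a property of the data itself instead of a second field that must be kept in sync.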

Public API changes (interpreter/runtime):

  • Removed: 10 active_*_prototype_override : Ref[Value?] fields, has_active_override : Bool
  • Added: active_overrides : Ref[FunctionRealmProtos?]
  • Promoted to pub: apply_active_realm_protos, FunctionRealmProtos struct + constructor

All wbtest direct-write sites updated to use the now-public apply_active_realm_protos.

Motivation

An ablation (return true at top of realm_fast_path_allowed) showed 17–22% total savings from skipping both Part 1 and Part 2. Part 1 alone accounts for ~6–9% of per-call cost. Measured gain on JS target (5-run median, post-#368 baseline):

| Benchmark | Before | After | Δ |
|---|---|---|---|
| call_frame | 6.66 ms | 6.25 ms | −6.2% |
| method_call | 7.66 ms | 7.01 ms | −8.5% |
| runtime_helpers | 9.49 ms | 8.79 ms | −7.4% |

CV on call_frame dropped 4.7% → 0.9%, confirming the Ref reads were adding timing variance.
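
For readers unfamiliar with the two statistics quoted here, the sketch below shows how a 5-run median and a coefficient of variation (CV, standard deviation divided by mean) are commonly computed. The run times are hypothetical, and the use of the sample standard deviation is an assumption; the PR does not say which estimator the benchmark harness uses.

```python
import statistics


def coefficient_of_variation(samples):
    # CV = standard deviation / mean; sample (n - 1) stdev is assumed here.
    return statistics.stdev(samples) / statistics.mean(samples)


# Hypothetical timings in ms for five runs; not taken from the PR.
runs_ms = [6.20, 6.25, 6.30, 6.21, 6.28]

median_ms = statistics.median(runs_ms)
cv = coefficient_of_variation(runs_ms)

print(f"median={median_ms:.2f} ms, CV={cv:.1%}")  # median=6.25 ms, CV=0.7%
```

A lower CV means the runs cluster more tightly around their mean, which is the sense in which the call_frame measurement became less noisy.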

Test plan

  • moon check --deny-warn clean
  • moon test: 2096/2096 passed
  • moon info && moon fmt clean
  • P2 correctness issue resolved: no Bool cache separate from its backing data

🤖 Generated with Claude Code

dowdiness and others added 2 commits June 16, 2026 23:45
Replace the 10 double-indirect Ref reads in Part 1 of
realm_fast_path_allowed with a single Bool field (has_active_override)
maintained by apply_active_realm_protos. The Bool is set to true when
any active override is Some, false when all are None.

Ablation established that the 10 reads + HashMap lookup consumed 14-22%
of per-call time (PR #367/#368 session). This change eliminates Part 1
(10 x 2 pointer dereferences) at the cost of 1 direct field read.

Measured gain on JS target (5-run median):
  call_frame     6.66 ms → 6.25 ms  (−6.2%, CV 4.7% → 0.9%)
  method_call    7.66 ms → 7.01 ms  (−8.5%)
  runtime_helpers  9.49 ms → 8.79 ms  (−7.4%)
  local_access (control): noise only

All 7 raw direct-write sites in wbtest files updated to maintain the
invariant (has_active_override == OR of all 10 active_*_override Refs).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…_realm_wbtest

Three direct writes to active_{map,set,promise}_prototype_override.val
in has_property_realm_wbtest.mbt were missing the companion
has_active_override = true update. All 10 direct raw write sites in
the source tree are now audited and consistent with the invariant:
has_active_override == (any active_*_prototype_override.val is Some).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
coderabbitai Bot commented Jun 16, 2026

Walkthrough

Adds a has_active_override boolean field to RealmState, initialized to false. realm_fast_path_allowed now checks only this flag instead of 10 per-slot *_prototype_override.val checks. apply_active_realm_protos sets the flag based on whether any of the 10 FunctionRealmProtos fields are Some(_). Six whitebox tests are updated to set the flag explicitly.

Changes

has_active_override consolidation

| Layer | File(s) | Summary |
|---|---|---|
| RealmState field declaration and initialization | interpreter/runtime/realm_state.mbt, interpreter/runtime/pkg.generated.mbti | Adds mut has_active_override : Bool to RealmState and initializes it to false in from_symbols; the generated interface file reflects the new field. |
| Fast-path guard and override flag setter | interpreter/runtime/factories.mbt | realm_fast_path_allowed checks only has_active_override instead of 10 explicit per-slot checks; apply_active_realm_protos sets has_active_override to true when any prototype field is Some(_), otherwise false. |
| Whitebox test precondition updates | interpreter/runtime/factories_wbtest.mbt, interpreter/runtime/has_property_realm_wbtest.mbt, interpreter/runtime/instanceof_wbtest.mbt, interpreter/stdlib/borrowed_builtin_realm_wbtest.mbt | Sets realm_state.has_active_override = true explicitly in six tests to ensure override-aware routing and fast-path gating are exercised under the new consolidated flag. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • dowdiness/js_engine#153: Touches the same interpreter/runtime/factories.mbt and interpreter/runtime/realm_state.mbt files, introducing the active prototype override fields that has_active_override now summarizes.

Poem

🐇 Ten checks were too many, a burden to bear,
So I bundled them neatly with one boolean there.
has_active_override — a flag, crisp and true,
The fast path now gallops on just one field's view.
With tests all updated, the warren is tight,
One flag rules them all — and that feels just right! 🌿

🚥 Pre-merge checks | ✅ 5 passed

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main performance optimization: replacing 10 Refs with a single Bool field. It is concise, specific, and directly reflects the primary change in the changeset. |


chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b14ccac369

Comment thread interpreter/runtime/factories.mbt Outdated
github-actions Bot (Contributor) commented Jun 16, 2026

Benchmark Results

Run: https://github.com/dowdiness/js_engine/actions/runs/27628615012

startup/tiny_program is the PR #153 / issue #141 guardrail for built-in realm-stamping startup cost.

Stage summary

| stage | benchmarks | total mean | slowest benchmark | slowest mean | noisy rows |
|---|---|---|---|---|---|
| startup | 3 | 2.603 ms | startup/tiny_program | 1.319 ms | 0 |
| frontend | 7 | 0.866 ms | pipeline/parse_heavy | 0.493 ms | 2 |
| execution | 25 | 15119.738 ms | exec/fibonacci_30 | 13627.469 ms | 2 |

Focused bytecode base-vs-head comparison

Base-vs-head deltas are reporting-only. Negative delta and PR/base < 1.00x mean the PR is faster; interpret high-CV or noisy rows cautiously.

| benchmark | stage | base mean | PR mean | delta | PR/base | base CV | PR CV | noisy |
|---|---|---|---|---|---|---|---|---|
| baseline/bytecode/closure_factory | execution | 13.776 ms | 15.330 ms | +11.3% | 1.11x | 7.2% | 8.2% | no |
| pipeline/bytecode/evaluate | execution | 10.045 ms | 9.445 ms | -6.0% | 0.94x | 1.7% | 2.3% | no |
| isolate/bytecode/call_frame | execution | 8.918 ms | 8.579 ms | -3.8% | 0.96x | 4.6% | 0.9% | no |
| isolate/bytecode/runtime_helpers | execution | 12.889 ms | 12.276 ms | -4.8% | 0.95x | 0.9% | 2.4% | no |
| isolate/bytecode/local_access | execution | 37.180 ms | 38.630 ms | +3.9% | 1.04x | 1.5% | 2.1% | no |
| isolate/bytecode/env_access | execution | 38.641 ms | 37.317 ms | -3.4% | 0.97x | 2.2% | 1.4% | no |
| isolate/bytecode/captured_access | execution | 37.646 ms | 36.951 ms | -1.8% | 0.98x | 1.6% | 3.4% | no |
| isolate/bytecode/dispatch_stack | execution | 23.010 ms | 22.702 ms | -1.3% | 0.99x | 0.6% | 1.5% | no |
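
The delta and PR/base columns follow directly from the two mean columns. The sketch below recomputes them for two rows of the focused comparison; the helper name `compare` and the rounding to one and two decimals are assumptions inferred from how the report prints its values, not taken from the benchmark tooling.

```python
def compare(base_ms, pr_ms):
    """Return (delta in percent, PR/base ratio) for one benchmark row."""
    delta_pct = (pr_ms - base_ms) / base_ms * 100.0
    ratio = pr_ms / base_ms
    return round(delta_pct, 1), round(ratio, 2)


# (base mean, PR mean) in ms, copied from the report.
rows = {
    "isolate/bytecode/call_frame": (8.918, 8.579),
    "baseline/bytecode/closure_factory": (13.776, 15.330),
}

for name, (base, pr) in rows.items():
    delta, ratio = compare(base, pr)
    # Negative delta (ratio below 1.00x) means the PR build is faster.
    verdict = "faster" if delta < 0 else "slower"
    print(f"{name}: {delta:+.1f}% {ratio:.2f}x ({verdict})")
```

Both rows reproduce the reported figures (-3.8% at 0.96x and +11.3% at 1.11x), so the sign convention is PR relative to base.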

Base-vs-head comparison

| benchmark | stage | base mean | PR mean | delta | PR/base | base CV | PR CV | noisy |
|---|---|---|---|---|---|---|---|---|
| startup/tiny_program | startup | 1.111 ms | 1.319 ms | +18.8% | 1.19x | 4.2% | 4.1% | no |
| lexer/small | frontend | 0.031 ms | 0.031 ms | -0.3% | 1.00x | 25.1% | 20.2% | base, PR |
| lexer/large | frontend | 0.271 ms | 0.264 ms | -2.6% | 0.97x | 7.6% | 0.8% | no |
| exec/fibonacci_30 | execution | 12963.406 ms | 13627.469 ms | +5.1% | 1.05x | 0.8% | 1.1% | no |
| exec/property_chain | execution | 13.867 ms | 15.341 ms | +10.6% | 1.11x | 8.4% | 11.3% | no |
| startup/phase/parse_tiny | frontend | 0.002 ms | 0.002 ms | -0.8% | 0.99x | 0.8% | 0.9% | no |
| startup/phase/new_interpreter | startup | 1.138 ms | 1.283 ms | +12.8% | 1.13x | 15.8% | 6.0% | base |
| startup/phase/execute_preparsed_tiny | execution | 0.001 ms | 0.001 ms | -2.0% | 0.98x | 1.6% | 1.1% | no |
| startup/phase/event_loop_drain_empty | startup | 0.000 ms | 0.000 ms | +5.4% | 1.05x | 0.8% | 0.9% | no |
| startup/phase/result_stringify_output | execution | 0.000 ms | 0.000 ms | -0.7% | 0.99x | 0.6% | 0.5% | no |
| exec/array_map_filter | execution | 19.963 ms | 20.774 ms | +4.1% | 1.04x | 17.5% | 18.9% | base, PR |
| exec/closure_factory | execution | 29.866 ms | 30.789 ms | +3.1% | 1.03x | 6.0% | 5.8% | no |
| baseline/closure_legacy/closure_factory | execution | 28.934 ms | 30.616 ms | +5.8% | 1.06x | 6.9% | 10.1% | no |
| baseline/bytecode/closure_factory | execution | 13.776 ms | 15.330 ms | +11.3% | 1.11x | 7.2% | 8.2% | no |
| isolate/bytecode/dispatch_stack | execution | 23.010 ms | 22.702 ms | -1.3% | 0.99x | 0.6% | 1.5% | no |
| isolate/bytecode/local_access | execution | 37.180 ms | 38.630 ms | +3.9% | 1.04x | 1.5% | 2.1% | no |
| isolate/bytecode/env_access | execution | 38.641 ms | 37.317 ms | -3.4% | 0.97x | 2.2% | 1.4% | no |
| isolate/bytecode/captured_access | execution | 37.646 ms | 36.951 ms | -1.8% | 0.98x | 1.6% | 3.4% | no |
| isolate/bytecode/call_frame | execution | 8.918 ms | 8.579 ms | -3.8% | 0.96x | 4.6% | 0.9% | no |
| isolate/bytecode/runtime_helpers | execution | 12.889 ms | 12.276 ms | -4.8% | 0.95x | 0.9% | 2.4% | no |
| isolate/bytecode/property_get | execution | 46.873 ms | 43.766 ms | -6.6% | 0.93x | 1.7% | 1.3% | no |
| isolate/bytecode/property_set | execution | 43.942 ms | 40.170 ms | -8.6% | 0.91x | 2.0% | 1.9% | no |
| isolate/bytecode/method_call | execution | 9.722 ms | 9.579 ms | -1.5% | 0.99x | 2.1% | 0.8% | no |
| isolate/bytecode/object_literal | execution | 13.696 ms | 13.153 ms | -4.0% | 0.96x | 1.4% | 1.0% | no |
| isolate/bytecode/array_literal | execution | 15.348 ms | 14.851 ms | -3.2% | 0.97x | 2.5% | 3.0% | no |
| exec/arithmetic_loop | execution | 902.862 ms | 1028.564 ms | +13.9% | 1.14x | 0.4% | 0.2% | no |
| exec/object_construction | execution | 7.403 ms | 7.437 ms | +0.5% | 1.00x | 4.7% | 4.9% | no |
| exec/string_ops | execution | 1.832 ms | 1.984 ms | +8.3% | 1.08x | 18.0% | 17.4% | base, PR |
| pipeline/exec/lex | frontend | 0.028 ms | 0.028 ms | +0.1% | 1.00x | 1.1% | 0.8% | no |
| pipeline/exec/parse | frontend | 0.028 ms | 0.028 ms | -0.8% | 0.99x | 3.2% | 3.2% | no |
| pipeline/exec/evaluate | execution | 26.663 ms | 27.405 ms | +2.8% | 1.03x | 4.9% | 7.6% | no |
| pipeline/closure_legacy/evaluate | execution | 25.590 ms | 26.610 ms | +4.0% | 1.04x | 3.8% | 4.1% | no |
| pipeline/bytecode/compile | frontend | 0.023 ms | 0.022 ms | -3.0% | 0.97x | 31.6% | 28.9% | base, PR |
| pipeline/bytecode/evaluate | execution | 10.045 ms | 9.445 ms | -6.0% | 0.94x | 1.7% | 2.3% | no |
| pipeline/parse_heavy | frontend | 0.498 ms | 0.493 ms | -1.1% | 0.99x | 5.7% | 5.5% | no |

Mean-time chart (log scale)

| benchmark | stage | mean | chart |
|---|---|---|---|
| startup/tiny_program | startup | 1.319 ms | ## |
| lexer/small | frontend | 0.031 ms ⚠ | # |
| lexer/large | frontend | 0.264 ms | # |
| exec/fibonacci_30 | execution | 13627.469 ms | ############################## |
| exec/property_chain | execution | 15.341 ms | ######## |
| startup/phase/parse_tiny | frontend | 0.002 ms | # |
| startup/phase/new_interpreter | startup | 1.283 ms | ## |
| startup/phase/execute_preparsed_tiny | execution | 0.001 ms | # |
| startup/phase/event_loop_drain_empty | startup | 0.000 ms | # |
| startup/phase/result_stringify_output | execution | 0.000 ms | # |
| exec/array_map_filter | execution | 20.774 ms ⚠ | ######### |
| exec/closure_factory | execution | 30.789 ms | ########## |
| baseline/closure_legacy/closure_factory | execution | 30.616 ms | ########## |
| baseline/bytecode/closure_factory | execution | 15.330 ms | ######## |
| isolate/bytecode/dispatch_stack | execution | 22.702 ms | ######### |
| isolate/bytecode/local_access | execution | 38.630 ms | ########### |
| isolate/bytecode/env_access | execution | 37.317 ms | ########### |
| isolate/bytecode/captured_access | execution | 36.951 ms | ########### |
| isolate/bytecode/call_frame | execution | 8.579 ms | ####### |
| isolate/bytecode/runtime_helpers | execution | 12.276 ms | ######## |
| isolate/bytecode/property_get | execution | 43.766 ms | ########### |
| isolate/bytecode/property_set | execution | 40.170 ms | ########### |
| isolate/bytecode/method_call | execution | 9.579 ms | ####### |
| isolate/bytecode/object_literal | execution | 13.153 ms | ######## |
| isolate/bytecode/array_literal | execution | 14.851 ms | ######## |
| exec/arithmetic_loop | execution | 1028.564 ms | ##################### |
| exec/object_construction | execution | 7.437 ms | ###### |
| exec/string_ops | execution | 1.984 ms ⚠ | ### |
| pipeline/exec/lex | frontend | 0.028 ms | # |
| pipeline/exec/parse | frontend | 0.028 ms | # |
| pipeline/exec/evaluate | execution | 27.405 ms | ########## |
| pipeline/closure_legacy/evaluate | execution | 26.610 ms | ########## |
| pipeline/bytecode/compile | frontend | 0.022 ms ⚠ | # |
| pipeline/bytecode/evaluate | execution | 9.445 ms | ####### |
| pipeline/parse_heavy | frontend | 0.493 ms | # |

Closure-conversion comparison

  • unavailable

PR #369 introduced a stale-cache hazard: the 10 active_*_prototype_override
Ref fields remained publicly writable (RealmState is pub(all)), so external
code writing any Ref directly would bypass has_active_override and allow
call_value to take the fast path incorrectly.

Fix: collapse the 10 individual Refs + Bool into a single
Ref[FunctionRealmProtos?] on RealmState.

- None = no cross-realm context active
- Some(protos) = at least one override set

The "any active?" check in realm_fast_path_allowed Part 1 is now
`active_overrides.val is None` — one Ref read. This is structurally correct
regardless of any external write, because writing to the single combined Ref
atomically updates both the proto data AND the "any active?" check. No
separate Bool cache exists to become stale.

Public API changes:
- Remove: 10 active_*_prototype_override fields, has_active_override Bool
- Add: active_overrides : Ref[FunctionRealmProtos?]
- Promote to pub: apply_active_realm_protos, FunctionRealmProtos struct + constructor

All wbtest direct-write sites updated to use apply_active_realm_protos.
apply_active_realm_protos is now simpler: 10 Ref writes → 1 Option write.
active_realm_protos is now simpler: 10 .val reads → unwrap one Option.

2096/2096 tests green.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dowdiness changed the title from "perf: replace 10 Ref reads in realm_fast_path_allowed with single Bool (#369)" to "perf: collapse 10 active-override Refs into single Ref[FunctionRealmProtos?] (#369)" on Jun 16, 2026
dowdiness merged commit 3717dea into main on Jun 16, 2026
15 checks passed
dowdiness deleted the perf/active-override-count branch on June 16, 2026 15:53
dowdiness added a commit that referenced this pull request on Jun 16, 2026
…overrides (#370)

Covers the clearing path: set an override, call apply_active_realm_protos
with FunctionRealmProtos() (all None), confirm fast path is re-allowed and
proto getter falls back to the base proto.

Follow-up to #369 — identified as a gap during review.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
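
The clearing path described in this follow-up can be modelled as below. This is a Python sketch under stated assumptions, not the MoonBit test from #370: it assumes apply_active_realm_protos stores an all-None bundle as None, and the names `base_map_proto`, `map_proto`, and `fast_path_part1` are hypothetical stand-ins for the real getter and guard.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class FunctionRealmProtos:
    # Illustrative subset of the ten prototype slots; all default to None.
    map_proto: Optional[str] = None
    set_proto: Optional[str] = None


class RealmState:
    def __init__(self, base_map_proto: str):
        self.base_map_proto = base_map_proto
        # Models the value held in Ref[FunctionRealmProtos?].
        self.active_overrides: Optional[FunctionRealmProtos] = None


def apply_active_realm_protos(state: RealmState, protos: FunctionRealmProtos) -> None:
    # Assumed normalisation: a bundle with no override set is stored as None.
    has_any = any(getattr(protos, f.name) is not None for f in fields(protos))
    state.active_overrides = protos if has_any else None


def fast_path_part1(state: RealmState) -> bool:
    return state.active_overrides is None


def map_proto(state: RealmState) -> str:
    # Prefer the active override, otherwise fall back to the base prototype.
    active = state.active_overrides
    if active is not None and active.map_proto is not None:
        return active.map_proto
    return state.base_map_proto


state = RealmState(base_map_proto="base Map.prototype")

apply_active_realm_protos(state, FunctionRealmProtos(map_proto="other-realm Map.prototype"))
while_set = (fast_path_part1(state), map_proto(state))

apply_active_realm_protos(state, FunctionRealmProtos())  # all None clears the override
after_clear = (fast_path_part1(state), map_proto(state))

print(while_set)    # (False, 'other-realm Map.prototype')
print(after_clear)  # (True, 'base Map.prototype')
```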