perf: collapse 10 active-override Refs into single Ref[FunctionRealmProtos?] (#369) by dowdiness · Pull Request #369 · dowdiness/js_engine

dowdiness · 2026-06-16T14:50:43Z

Summary

Replaces Part 1 of realm_fast_path_allowed — originally 10 double-indirect Ref[Value?] reads — with a single Ref[FunctionRealmProtos?] on RealmState.

The initial approach (Bool cache + 10 individual Refs) had a correctness flaw: RealmState is pub(all), so any external code writing active_*_prototype_override.val directly would bypass the Bool without updating it, causing call_value to take the fast path erroneously when a cross-realm override was active.

New design: collapses all 10 override Refs and the Bool into one field:

active_overrides : Ref[FunctionRealmProtos?]
// None  = no cross-realm context active
// Some(protos) = at least one override is set

Part 1 check: realm_state.active_overrides.val is None — one Ref read
No separate Bool cache, so no stale-cache hazard: writing the single combined Ref atomically updates both the "any active?" check and the proto values
apply_active_realm_protos: 10 Ref writes → 1 Option write
active_realm_protos: 10 .val reads → unwrap one Option

Public API changes (interpreter/runtime):

Removed: 10 active_*_prototype_override : Ref[Value?] fields, has_active_override : Bool
Added: active_overrides : Ref[FunctionRealmProtos?]
Promoted to pub: apply_active_realm_protos, FunctionRealmProtos struct + constructor

All wbtest direct-write sites updated to use the now-public apply_active_realm_protos.

Motivation

An ablation (return true at top of realm_fast_path_allowed) showed 17–22% total savings from skipping both Part 1 and Part 2. Part 1 alone accounts for ~6–9% of per-call cost. Measured gain on JS target (5-run median, post-#368 baseline):

Benchmark	Before	After	Δ
`call_frame`	6.66 ms	6.25 ms	−6.2%
`method_call`	7.66 ms	7.01 ms	−8.5%
`runtime_helpers`	9.49 ms	8.79 ms	−7.4%

CV on call_frame dropped 4.7% → 0.9%, confirming the Ref reads were adding timing variance.

Test plan

moon check --deny-warn clean
moon test: 2096/2096 passed
moon info && moon fmt clean
P2 correctness issue resolved: no Bool cache separate from its backing data

🤖 Generated with Claude Code

Replace the 10 double-indirect Ref reads in Part 1 of realm_fast_path_allowed with a single Bool field (has_active_override) maintained by apply_active_realm_protos. The Bool is set to true when any active override is Some, false when all are None. Ablation established that the 10 reads + HashMap lookup consumed 14-22% of per-call time (PR #367/#368 session). This change eliminates Part 1 (10 x 2 pointer dereferences) at the cost of 1 direct field read. Measured gain on JS target (5-run median): call_frame 6.66 ms → 6.25 ms (−6.2%, CV 4.7% → 0.9%) method_call 7.66 ms → 7.01 ms (−8.5%) runtime_helpers 9.49 ms → 8.79 ms (−7.4%) local_access (control): noise only All 7 raw direct-write sites in wbtest files updated to maintain the invariant (has_active_override == OR of all 10 active_*_override Refs). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…_realm_wbtest Three direct writes to active_{map,set,promise}_prototype_override.val in has_property_realm_wbtest.mbt were missing the companion has_active_override = true update. All 10 direct raw write sites in the source tree are now audited and consistent with the invariant: has_active_override == (any active_*_prototype_override.val is Some). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

coderabbitai · 2026-06-16T14:51:03Z

📝 Walkthrough

Walkthrough

Adds a has_active_override boolean field to RealmState, initialized to false. realm_fast_path_allowed now checks only this flag instead of 10 per-slot *_prototype_override.val checks. apply_active_realm_protos sets the flag based on whether any of the 10 FunctionRealmProtos fields are Some(_). Six whitebox tests are updated to set the flag explicitly.

Changes

has_active_override consolidation

Layer / File(s)	Summary
RealmState field declaration and initialization `interpreter/runtime/realm_state.mbt`, `interpreter/runtime/pkg.generated.mbti`	Adds `mut has_active_override : Bool` to `RealmState` and initializes it to `false` in `from_symbols`; the generated interface file reflects the new field.
Fast-path guard and override flag setter `interpreter/runtime/factories.mbt`	`realm_fast_path_allowed` checks only `has_active_override` instead of 10 explicit per-slot checks; `apply_active_realm_protos` sets `has_active_override` to `true` when any prototype field is `Some(_)`, otherwise `false`.
Whitebox test precondition updates `interpreter/runtime/factories_wbtest.mbt`, `interpreter/runtime/has_property_realm_wbtest.mbt`, `interpreter/runtime/instanceof_wbtest.mbt`, `interpreter/stdlib/borrowed_builtin_realm_wbtest.mbt`	Sets `realm_state.has_active_override = true` explicitly in six tests to ensure override-aware routing and fast-path gating are exercised under the new consolidated flag.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

dowdiness/js_engine#153: Touches the same interpreter/runtime/factories.mbt and interpreter/runtime/realm_state.mbt files, introducing the active prototype override fields that has_active_override now summarizes.

Poem

🐇 Ten checks were too many, a burden to bear,
So I bundled them neatly with one boolean there.
has_active_override — a flag, crisp and true,
The fast path now gallops on just one field's view.
With tests all updated, the warren is tight,
One flag rules them all — and that feels just right! 🌿

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main performance optimization: replacing 10 Refs with a single Bool field. It is concise, specific, and directly reflects the primary change in the changeset.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch perf/active-override-count

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b14ccac369

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

github-actions · 2026-06-16T14:56:57Z

Benchmark Results

Run: https://github.com/dowdiness/js_engine/actions/runs/27628615012

startup/tiny_program is the PR #153 / issue #141 guardrail for built-in realm-stamping startup cost.

Stage summary

stage	benchmarks	total mean	slowest benchmark	slowest mean	noisy rows
startup	3	2.603 ms	startup/tiny_program	1.319 ms	0
frontend	7	0.866 ms	pipeline/parse_heavy	0.493 ms	2
execution	25	15119.738 ms	exec/fibonacci_30	13627.469 ms	2

Focused bytecode base-vs-head comparison

Base-vs-head deltas are reporting-only. Negative delta and PR/base < 1.00x mean the PR is faster; interpret high-CV or noisy rows cautiously.

benchmark	stage	base mean	PR mean	delta	PR/base	base CV	PR CV	noisy
baseline/bytecode/closure_factory	execution	13.776 ms	15.330 ms	+11.3%	1.11x	7.2%	8.2%	no
pipeline/bytecode/evaluate	execution	10.045 ms	9.445 ms	-6.0%	0.94x	1.7%	2.3%	no
isolate/bytecode/call_frame	execution	8.918 ms	8.579 ms	-3.8%	0.96x	4.6%	0.9%	no
isolate/bytecode/runtime_helpers	execution	12.889 ms	12.276 ms	-4.8%	0.95x	0.9%	2.4%	no
isolate/bytecode/local_access	execution	37.180 ms	38.630 ms	+3.9%	1.04x	1.5%	2.1%	no
isolate/bytecode/env_access	execution	38.641 ms	37.317 ms	-3.4%	0.97x	2.2%	1.4%	no
isolate/bytecode/captured_access	execution	37.646 ms	36.951 ms	-1.8%	0.98x	1.6%	3.4%	no
isolate/bytecode/dispatch_stack	execution	23.010 ms	22.702 ms	-1.3%	0.99x	0.6%	1.5%	no

Base-vs-head comparison

benchmark	stage	base mean	PR mean	delta	PR/base	base CV	PR CV	noisy
startup/tiny_program	startup	1.111 ms	1.319 ms	+18.8%	1.19x	4.2%	4.1%	no
lexer/small	frontend	0.031 ms	0.031 ms	-0.3%	1.00x	25.1%	20.2%	base, PR
lexer/large	frontend	0.271 ms	0.264 ms	-2.6%	0.97x	7.6%	0.8%	no
exec/fibonacci_30	execution	12963.406 ms	13627.469 ms	+5.1%	1.05x	0.8%	1.1%	no
exec/property_chain	execution	13.867 ms	15.341 ms	+10.6%	1.11x	8.4%	11.3%	no
startup/phase/parse_tiny	frontend	0.002 ms	0.002 ms	-0.8%	0.99x	0.8%	0.9%	no
startup/phase/new_interpreter	startup	1.138 ms	1.283 ms	+12.8%	1.13x	15.8%	6.0%	base
startup/phase/execute_preparsed_tiny	execution	0.001 ms	0.001 ms	-2.0%	0.98x	1.6%	1.1%	no
startup/phase/event_loop_drain_empty	startup	0.000 ms	0.000 ms	+5.4%	1.05x	0.8%	0.9%	no
startup/phase/result_stringify_output	execution	0.000 ms	0.000 ms	-0.7%	0.99x	0.6%	0.5%	no
exec/array_map_filter	execution	19.963 ms	20.774 ms	+4.1%	1.04x	17.5%	18.9%	base, PR
exec/closure_factory	execution	29.866 ms	30.789 ms	+3.1%	1.03x	6.0%	5.8%	no
baseline/closure_legacy/closure_factory	execution	28.934 ms	30.616 ms	+5.8%	1.06x	6.9%	10.1%	no
baseline/bytecode/closure_factory	execution	13.776 ms	15.330 ms	+11.3%	1.11x	7.2%	8.2%	no
isolate/bytecode/dispatch_stack	execution	23.010 ms	22.702 ms	-1.3%	0.99x	0.6%	1.5%	no
isolate/bytecode/local_access	execution	37.180 ms	38.630 ms	+3.9%	1.04x	1.5%	2.1%	no
isolate/bytecode/env_access	execution	38.641 ms	37.317 ms	-3.4%	0.97x	2.2%	1.4%	no
isolate/bytecode/captured_access	execution	37.646 ms	36.951 ms	-1.8%	0.98x	1.6%	3.4%	no
isolate/bytecode/call_frame	execution	8.918 ms	8.579 ms	-3.8%	0.96x	4.6%	0.9%	no
isolate/bytecode/runtime_helpers	execution	12.889 ms	12.276 ms	-4.8%	0.95x	0.9%	2.4%	no
isolate/bytecode/property_get	execution	46.873 ms	43.766 ms	-6.6%	0.93x	1.7%	1.3%	no
isolate/bytecode/property_set	execution	43.942 ms	40.170 ms	-8.6%	0.91x	2.0%	1.9%	no
isolate/bytecode/method_call	execution	9.722 ms	9.579 ms	-1.5%	0.99x	2.1%	0.8%	no
isolate/bytecode/object_literal	execution	13.696 ms	13.153 ms	-4.0%	0.96x	1.4%	1.0%	no
isolate/bytecode/array_literal	execution	15.348 ms	14.851 ms	-3.2%	0.97x	2.5%	3.0%	no
exec/arithmetic_loop	execution	902.862 ms	1028.564 ms	+13.9%	1.14x	0.4%	0.2%	no
exec/object_construction	execution	7.403 ms	7.437 ms	+0.5%	1.00x	4.7%	4.9%	no
exec/string_ops	execution	1.832 ms	1.984 ms	+8.3%	1.08x	18.0%	17.4%	base, PR
pipeline/exec/lex	frontend	0.028 ms	0.028 ms	+0.1%	1.00x	1.1%	0.8%	no
pipeline/exec/parse	frontend	0.028 ms	0.028 ms	-0.8%	0.99x	3.2%	3.2%	no
pipeline/exec/evaluate	execution	26.663 ms	27.405 ms	+2.8%	1.03x	4.9%	7.6%	no
pipeline/closure_legacy/evaluate	execution	25.590 ms	26.610 ms	+4.0%	1.04x	3.8%	4.1%	no
pipeline/bytecode/compile	frontend	0.023 ms	0.022 ms	-3.0%	0.97x	31.6%	28.9%	base, PR
pipeline/bytecode/evaluate	execution	10.045 ms	9.445 ms	-6.0%	0.94x	1.7%	2.3%	no
pipeline/parse_heavy	frontend	0.498 ms	0.493 ms	-1.1%	0.99x	5.7%	5.5%	no

Mean-time chart (log scale)

benchmark	stage	mean	chart
startup/tiny_program	startup	1.319 ms	`##`
lexer/small	frontend	0.031 ms ⚠	`#`
lexer/large	frontend	0.264 ms	`#`
exec/fibonacci_30	execution	13627.469 ms	`##############################`
exec/property_chain	execution	15.341 ms	`########`
startup/phase/parse_tiny	frontend	0.002 ms	`#`
startup/phase/new_interpreter	startup	1.283 ms	`##`
startup/phase/execute_preparsed_tiny	execution	0.001 ms	`#`
startup/phase/event_loop_drain_empty	startup	0.000 ms	`#`
startup/phase/result_stringify_output	execution	0.000 ms	`#`
exec/array_map_filter	execution	20.774 ms ⚠	`#########`
exec/closure_factory	execution	30.789 ms	`##########`
baseline/closure_legacy/closure_factory	execution	30.616 ms	`##########`
baseline/bytecode/closure_factory	execution	15.330 ms	`########`
isolate/bytecode/dispatch_stack	execution	22.702 ms	`#########`
isolate/bytecode/local_access	execution	38.630 ms	`###########`
isolate/bytecode/env_access	execution	37.317 ms	`###########`
isolate/bytecode/captured_access	execution	36.951 ms	`###########`
isolate/bytecode/call_frame	execution	8.579 ms	`#######`
isolate/bytecode/runtime_helpers	execution	12.276 ms	`########`
isolate/bytecode/property_get	execution	43.766 ms	`###########`
isolate/bytecode/property_set	execution	40.170 ms	`###########`
isolate/bytecode/method_call	execution	9.579 ms	`#######`
isolate/bytecode/object_literal	execution	13.153 ms	`########`
isolate/bytecode/array_literal	execution	14.851 ms	`########`
exec/arithmetic_loop	execution	1028.564 ms	`#####################`
exec/object_construction	execution	7.437 ms	`######`
exec/string_ops	execution	1.984 ms ⚠	`###`
pipeline/exec/lex	frontend	0.028 ms	`#`
pipeline/exec/parse	frontend	0.028 ms	`#`
pipeline/exec/evaluate	execution	27.405 ms	`##########`
pipeline/closure_legacy/evaluate	execution	26.610 ms	`##########`
pipeline/bytecode/compile	frontend	0.022 ms ⚠	`#`
pipeline/bytecode/evaluate	execution	9.445 ms	`#######`
pipeline/parse_heavy	frontend	0.493 ms	`#`

Closure-conversion comparison

unavailable

PR #369 introduced a stale-cache hazard: the 10 active_*_prototype_override Ref fields remained publicly writable (RealmState is pub(all)), so external code writing any Ref directly would bypass has_active_override and allow call_value to take the fast path incorrectly. Fix: collapse the 10 individual Refs + Bool into a single Ref[FunctionRealmProtos?] on RealmState. - None = no cross-realm context active - Some(protos) = at least one override set The "any active?" check in realm_fast_path_allowed Part 1 is now `active_overrides.val is None` — one Ref read. This is structurally correct regardless of any external write, because writing to the single combined Ref atomically updates both the proto data AND the "any active?" check. No separate Bool cache exists to become stale. Public API changes: - Remove: 10 active_*_prototype_override fields, has_active_override Bool - Add: active_overrides : Ref[FunctionRealmProtos?] - Promote to pub: apply_active_realm_protos, FunctionRealmProtos struct + constructor All wbtest direct-write sites updated to use apply_active_realm_protos. apply_active_realm_protos is now simpler: 10 Ref writes → 1 Option write. active_realm_protos is now simpler: 10 .val reads → unwrap one Option. 2096/2096 tests green. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…overrides (#370) Covers the clearing path: set an override, call apply_active_realm_protos with FunctionRealmProtos() (all None), confirm fast path is re-allowed and proto getter falls back to the base proto. Follow-up to #369 — identified as a gap during review. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

dowdiness and others added 2 commits June 16, 2026 23:45

chatgpt-codex-connector Bot reviewed Jun 16, 2026

View reviewed changes

Comment thread interpreter/runtime/factories.mbt Outdated

dowdiness changed the title ~~perf: replace 10 Ref reads in realm_fast_path_allowed with single Bool (#369)~~ perf: collapse 10 active-override Refs into single Ref[FunctionRealmProtos?] (#369) Jun 16, 2026

dowdiness merged commit 3717dea into main Jun 16, 2026
15 checks passed

dowdiness deleted the perf/active-override-count branch June 16, 2026 15:53

dowdiness mentioned this pull request Jun 16, 2026

test(runtime): verify apply_active_realm_protos with all-None clears overrides #370

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: collapse 10 active-override Refs into single Ref[FunctionRealmProtos?] (#369)#369

perf: collapse 10 active-override Refs into single Ref[FunctionRealmProtos?] (#369)#369
dowdiness merged 3 commits into
mainfrom
perf/active-override-count

dowdiness commented Jun 16, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

github-actions Bot commented Jun 16, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dowdiness commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

Test plan

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

github-actions Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark Results

Stage summary

Focused bytecode base-vs-head comparison

Base-vs-head comparison

Mean-time chart (log scale)

Closure-conversion comparison

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

dowdiness commented Jun 16, 2026 •

edited

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

github-actions Bot commented Jun 16, 2026 •

edited

Loading