engine: VM set_value cannot apply distinct per-instance overrides to two instances of the same shared CompiledModule #616

@bpowers

Summary

The bytecode VM's set_value / set_value_by_offset override mechanism (src/simlin-engine/src/vm.rs) cannot apply distinct overrides to two instances of the same shared CompiledModule. When a model instantiates the same submodel (or stdlib macro: SMOOTH/DELAY/etc.) twice and a host calls set_value on the same relative constant in each instance with different values, the second override clobbers the first -- both instances then read the last value.

This is a VM (interpreter) correctness bug. It does not affect the new wasm backend, which handles this case correctly and serves as the reference behavior (see below).

Discovered and independently re-confirmed during Phase 7 of the wasm-backend work (docs/implementation-plans/2026-05-20-wasm-backend/).

Root cause (verified against vm.rs)

The override path is set_value/set_value_by_offset -> apply_override -> write_literal, with module resolution via module_idx_for.

collect_constant_info (vm.rs:439-520) recurses into submodules (lines 511-517) and keys overridable constants by their absolute data-buffer offset abs_off = base_off + off (line 453). So two instances of the same submodel at different module_off bases DO get distinct absolute-offset entries in cached_constant_info.

But the BytecodeLocation stored for each of those distinct absolute offsets carries (lines 457-461 / 492-496):

  • module_key -- make_module_key(model_name, input_set), which is identical for both instances (same model name + input set), and
  • literal_id -- the literal index within the shared bytecode, also identical for both instances.

write_literal (vm.rs:952-979) then resolves module_idx_for(module_key) (line 959) to the single shared ResolvedModule index and does Arc::make_mut(bytecode).literals[literal_id] = value (line 966) -- writing into the one shared bytecode literal that both instances execute.

So two distinct absolute offsets belonging to two instances of the same ModuleKey resolve to the same (module, literal_id), and the second set_value overwrites the first. Both instances subsequently read the last-written value.

(module_idx_for at vm.rs:915-921 is the cold-path key->index map. key_to_idx is one entry per unique (model_name, input_set), by construction -- modules are deduplicated by ModuleKey -- so there is no per-instance module index for the override path to target.)
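
The aliasing described above can be reproduced in a few lines. The following is a minimal sketch, not the engine's code: `ToyVm`, `ToyLocation`, `SharedBytecode`, the string module key, and the offsets 10 and 20 are all hypothetical stand-ins chosen only to mirror the shape of the override path (distinct absolute offsets, one shared literal).

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the bytecode shared by every instance of a module.
#[derive(Clone, Debug)]
struct SharedBytecode {
    literals: Vec<f64>,
}

/// Hypothetical, simplified analogue of the per-constant location record.
#[derive(Clone, Debug)]
struct ToyLocation {
    module_key: String, // identical for both instances of the same submodel
    literal_id: usize,  // index into the shared literal table
}

struct ToyVm {
    key_to_idx: HashMap<String, usize>,         // one entry per unique module key
    modules: Vec<Arc<SharedBytecode>>,          // deduplicated by module key
    constant_info: HashMap<usize, ToyLocation>, // keyed by absolute offset
}

impl ToyVm {
    /// Mirrors the shape of the override path: absolute offset -> location ->
    /// shared module index -> in-place write to the shared literal.
    fn set_value_by_offset(&mut self, abs_off: usize, value: f64) {
        let loc = self.constant_info[&abs_off].clone();
        let idx = self.key_to_idx[&loc.module_key];
        Arc::make_mut(&mut self.modules[idx]).literals[loc.literal_id] = value;
    }

    /// What an instance at `abs_off` reads when it executes the shared bytecode.
    fn literal_at(&self, abs_off: usize) -> f64 {
        let loc = &self.constant_info[&abs_off];
        self.modules[self.key_to_idx[&loc.module_key]].literals[loc.literal_id]
    }
}

/// Two instances of one submodel (`out = in + k`, `in` wired to 7).
fn clobber_demo() -> (f64, f64) {
    let key = String::from("sub"); // hypothetical module key
    let shared = Arc::new(SharedBytecode { literals: vec![0.0] }); // literal 0 is `k`
    let mut vm = ToyVm {
        key_to_idx: HashMap::from([(key.clone(), 0)]),
        modules: vec![shared],
        constant_info: HashMap::from([
            (10, ToyLocation { module_key: key.clone(), literal_id: 0 }), // sub0.k
            (20, ToyLocation { module_key: key, literal_id: 0 }),         // sub1.k
        ]),
    };
    vm.set_value_by_offset(10, 100.0); // sub0.k = 100
    vm.set_value_by_offset(20, 200.0); // sub1.k = 200, overwrites the same literal
    let input = 7.0;
    (input + vm.literal_at(10), input + vm.literal_at(20))
}
```

In this sketch `clobber_demo()` returns `(207.0, 207.0)`: the two offsets are distinct keys, but both map to the same `(module, literal_id)` pair, so the second write wins.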

Empirical evidence (Phase 7)

A model instantiating a submodel twice, where the submodel has its own overridable constant k and out = in + k (with in wired to the constant 7):

  • Override sub0.k = 100, sub1.k = 200.
  • VM produces sub0.out = 207 and sub1.out = 207 -- clobbered; both got 200.
  • wasm backend produces sub0.out = 107 and sub1.out = 207 -- correct; each instance carries its own override.

The wasm backend is strictly more correct here because its constants-override region is indexed by absolute slot (const_region_base + (module_off + off) * 8; see src/simlin-engine/src/wasmgen/lower.rs:224-230 and 1346-1352). A shared module run at several module_offs thus picks up each instance's distinct override. The VM's shared-literal mutation has no equivalent per-instance addressing.
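
As an illustration of why absolute-slot addressing separates the instances, the formula quoted above can be written as a small function. This is a sketch only: `override_addr` and `SLOT_BYTES` are hypothetical names, and the base and offsets used in the note below are made-up values, not ones taken from lower.rs.

```rust
/// Width of one slot in bytes, matching the `* 8` in the quoted formula.
const SLOT_BYTES: usize = 8;

/// Byte address of the override slot for a constant at relative offset `off`
/// inside a module instance based at `module_off` (hypothetical helper).
fn override_addr(const_region_base: usize, module_off: usize, off: usize) -> usize {
    const_region_base + (module_off + off) * SLOT_BYTES
}
```

With an assumed base of 1024, a constant at relative offset 1 lands at `override_addr(1024, 4, 1) == 1064` for an instance based at 4 and at `override_addr(1024, 9, 1) == 1104` for an instance based at 9, so each instance has its own slot even though both run the same module code.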

Severity / reachability

Triggering this requires both:

  1. a model that instantiates the same submodel / macro more than once, AND
  2. a host driving distinct per-instance overrides via set_value / set_value_by_offset on a constant that lives inside that shared module.

Notes on blast radius:

  • The wasm full-corpus parity gates are unaffected. The corpus parity tests run compiled defaults and do not call set_value, so this VM/wasm divergence is invisible to the wasm parity floor and does not affect the Phase 8 full-corpus-parity goal. It only matters for a host that overrides per-instance constants.
  • The same-value override case is harmless (clobbering a value with the same value is a no-op), so a host that sets all instances of a shared constant to one value is unaffected.
  • No in-repo caller is known to drive distinct per-instance overrides of a shared-module constant today. The override callers (libsimlin / pysimlin / the app) are where this could appear in the wild: any of them that sets distinct per-instance constants on a model with a repeated submodel would hit this bug.

Phase 7 test posture (already correct -- no change needed)

The Phase 7 wasm tests deliberately do not bake in the VM bug:

  • compile_simulation_two_instances_distinct_overrides (src/simlin-engine/src/wasmgen/module.rs:3144) tests wasm correctness directly (asserts 107 and 207), and does NOT assert wasm == VM for the distinct-override case.
  • compile_simulation_two_instances_same_value_override_matches_vm (module.rs:3200) anchors wasm-vs-VM parity in the same-value regime, where the clobber is harmless.

Both test doc comments document this VM divergence and note it is "tracked separately." This issue is that tracking item.

Possible approaches for resolution

The fix needs per-instance override storage rather than per-ModuleKey-shared-literal mutation. Options to evaluate:

  • Give the VM an absolute-offset-keyed override region (mirroring the wasm backend's const_region indexed by absolute slot) and have AssignConstCurr read from it when an override is present, instead of mutating the shared bytecode literal in place. This converges the two backends on the same (correct) model and removes the Arc::make_mut-on-shared-bytecode override path entirely.
  • Alternatively, de-share the bytecode for a module that is instantiated more than once (one ResolvedModule per instance). This is simpler conceptually but gives up the dedup that ModuleKey sharing buys and would not match the wasm backend's design.

The first approach is preferred: it is the same mechanism the wasm backend already proves correct, keeps module-bytecode sharing, and would let the distinct-override case become a genuine VM==wasm parity assertion.
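
A rough sketch of what the first approach could look like, under stated assumptions: `OverrideRegion`, `ToyModule`, and `assign_const` are hypothetical names, a `HashMap` stands in for whatever storage the VM would actually use, and the real AssignConstCurr opcode handling is not modeled. The point is only that the override is looked up by absolute offset and the shared literal is never mutated.

```rust
use std::collections::HashMap;

/// Hypothetical shared module: compiled default literals, never mutated.
struct ToyModule {
    literals: Vec<f64>,
}

/// Hypothetical per-simulation override storage keyed by absolute offset.
#[derive(Default)]
struct OverrideRegion {
    by_abs_off: HashMap<usize, f64>,
}

impl OverrideRegion {
    fn set(&mut self, abs_off: usize, value: f64) {
        self.by_abs_off.insert(abs_off, value);
    }

    /// Return the override for this slot if present, else the compiled default.
    fn resolve(&self, abs_off: usize, compiled_default: f64) -> f64 {
        self.by_abs_off.get(&abs_off).copied().unwrap_or(compiled_default)
    }
}

/// Sketch of a constant assignment that consults the override region first.
fn assign_const(
    module: &ToyModule,
    overrides: &OverrideRegion,
    module_off: usize,
    off: usize,
    literal_id: usize,
) -> f64 {
    overrides.resolve(module_off + off, module.literals[literal_id])
}

/// Same scenario as the issue: `out = in + k`, `in` = 7, two instances.
fn per_instance_demo() -> (f64, f64) {
    let module = ToyModule { literals: vec![0.0] }; // literal 0 is `k`
    let mut overrides = OverrideRegion::default();
    let (sub0_off, sub1_off, k_off) = (10, 20, 0); // made-up offsets
    overrides.set(sub0_off + k_off, 100.0); // sub0.k = 100
    overrides.set(sub1_off + k_off, 200.0); // sub1.k = 200
    let input = 7.0;
    (
        input + assign_const(&module, &overrides, sub0_off, k_off, 0),
        input + assign_const(&module, &overrides, sub1_off, k_off, 0),
    )
}
```

Here `per_instance_demo()` returns `(107.0, 207.0)`, matching the wasm result, and `module.literals` still holds the compiled default afterwards. A dense array indexed by absolute slot, as in the wasm backend, would likely be a better fit for a hot path than the map used in this sketch.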

Labels: bug, engine, rust
