Skip to content

cranelift: include every SSA value bound to a variable in stack maps#13449

Merged
cfallin merged 2 commits into
bytecodealliance:mainfrom
vouillon:gc-fix
May 22, 2026
Merged

cranelift: include every SSA value bound to a variable in stack maps#13449
cfallin merged 2 commits into
bytecodealliance:mainfrom
vouillon:gc-fix

Conversation

@vouillon
Copy link
Copy Markdown
Contributor

SSABuilder::variables is a SecondaryMap<Variable, SecondaryMap<Block, PackedOption<Value>>> that stores one Value per (Variable, Block) pair, so each def_var overwrites whatever the prior definition in that block was. values_for_var then iterates this map and returns only the latest Value per block. That is the wrong set for the safepoint pass: FunctionBuilder::finalize uses values_for_var to propagate stack-map flags from a variable onto its associated values, and any earlier SSA value bound to the variable in the same block is silently dropped even if it is still live across a later safepoint.

Concretely, Wasm code that re-defines a stack-map variable while an earlier SSA value bound to it is still live on the operand stack across an intervening safepoint hits the pattern:

    local.get N           ;; pushes the SSA value of the first def
    ...
    call $foo             ;; safepoint
    local.tee N           ;; redefines the variable
    ...
    array.new_fixed       ;; another safepoint that may GC

Without this fix the safepoint pass omits the first def's value from the stack map, GC's trace doesn't mark it, sweep removes it from OASR, the slot is freed, and the subsequent init-barrier in array.new_fixed inc-refs a freed slot, corrupting the OASR list.

Add all_var_values: SecondaryMap<Variable, EntitySet<Value>> to SSABuilder, and maintain the invariant that every Value written to variables[var][block] is also in all_var_values[var]. The only places that store a Value into variables[var][block] for the first time are def_var and find_var's block-param-creation tail, so those are the two sites that need to insert. use_var_nonlocal's predecessor-copy loop only re-stores a val that find_var just
returned, so the value is already in all_var_values; a debug_assert! documents and checks this. Finally switch
values_for_var to iterate the new set.

vouillon added 2 commits May 22, 2026 13:54
Declares a variable that needs stack maps, defines it twice in the same
block (`v1` then `v2`), reads both values, then calls a function as a
safepoint before returning both. The expected CLIF spills both `v1` and
`v2` to separate slots and lists both in the call's `stack_map=[…]`.

Without the accompanying fix, `SSABuilder::variables` stores only the
latest `Value` per `(Variable, Block)` pair, so the earlier definition
is dropped from `values_for_var` and the safepoint pass omits it from
the stack map. The test fails with only `v2` spilled and only one entry
in `stack_map`.
`SSABuilder::variables` is a `SecondaryMap<Variable,
SecondaryMap<Block, PackedOption<Value>>>` that stores one `Value` per
`(Variable, Block)` pair, so each `def_var` overwrites whatever the
prior definition in that block was. `values_for_var` then iterates this
map and returns only the latest `Value` per block. That is the wrong
set for the safepoint pass: `FunctionBuilder::finalize` uses
`values_for_var` to propagate stack-map flags from a variable onto its
associated values, and any earlier SSA value bound to the variable in
the same block is silently dropped even if it is still live across a
later safepoint.

Concretely, wasm code that re-defines a stack-map variable while an
earlier SSA value bound to it is still live on the operand stack
across an intervening safepoint hits the pattern:

    local.get N           ;; pushes the SSA value of the first def
    ...
    call $foo             ;; safepoint
    local.tee N           ;; redefines the variable
    ...
    array.new_fixed       ;; another safepoint that may GC

The safepoint pass omits the first def's value from the stack map,
GC's trace doesn't mark it, sweep removes it from OASR, the slot is
freed, and the subsequent init-barrier in `array.new_fixed` inc-refs
a freed slot, corrupting the OASR list.

Add `all_var_values: SecondaryMap<Variable, EntitySet<Value>>` to
`SSABuilder`, and maintain the invariant that every `Value` written to
`variables[var][block]` is also in `all_var_values[var]`. The only
places that store a `Value` into `variables[var][block]` for the
first time are `def_var` and `find_var`'s block-param-creation tail,
so those are the two sites that need to insert. `use_var_nonlocal`'s
predecessor-copy loop only re-stores a `val` that `find_var` just
returned, so the value is already in `all_var_values`; a
`debug_assert!` documents and checks this. Finally switch
`values_for_var` to iterate the new set.
@vouillon vouillon requested a review from a team as a code owner May 22, 2026 12:38
@vouillon vouillon requested review from cfallin and removed request for a team May 22, 2026 12:38
@github-actions github-actions Bot added the cranelift Issues related to the Cranelift code generator label May 22, 2026
Copy link
Copy Markdown
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find -- thank you!

@cfallin cfallin added this pull request to the merge queue May 22, 2026
Merged via the queue into bytecodealliance:main with commit decb45f May 22, 2026
51 checks passed
gfx added a commit to wado-lang/wasmtime that referenced this pull request May 23, 2026
…liance#13228)

A GC reference that is live across a call safepoint is reloaded (after
bytecodealliance#13228, "Handle alias values in `SafepointSpiller::rewrite_use`") from a
stack slot that was never spilled, so it reads back as null and a later
`ref.as_non_null` traps with `wasm trap: null reference`.

The attached module is produced by a real frontend (the Wado compiler):
it JSON-deserializes a two-field struct in a loop. The struct-access
object is built with `struct.new` (hence non-null) and is held in a local
that is live across the call fetching the next field; on the second loop
iteration the reloaded local reads back null.

- Good: wasmtime 44.0.0 — `(invoke "run")` returns without trapping.
- Bad:  wasmtime 45.0.0 .. 46.0.0-dev (main) — `wasm trap: null reference`.

Bisected first bad commit: 24938c4 (bytecodealliance#13228). The follow-up bytecodealliance#13449
does not fix this case. The miscompile is register-allocation-sensitive,
so the module does not reduce under `wasm-tools shrink` (already DCE'd).

This test currently FAILS on main; it documents the bug and is expected
to pass once the safepoint spiller is fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants