DRC collector isn't walking exnref roots in the store #13316

@alexcrichton

This test:

#[wasmtime_test(wasm_features(exceptions))]
#[cfg_attr(miri, ignore)]
fn store_pending_exception_is_rooted(config: &mut Config) -> wasmtime::Result<()> {
    let engine = Engine::new(&config)?;
    let mut store = Store::new(&engine, ());

    let module = Module::new(
        &engine,
        r#"
        (module
          (import "h" "t1" (tag $t1 (param i32)))
          (import "h" "throw_t1" (func $throw_t1))
          (func (export "run") (result i32)
            (block $h (result i32)
              (try_table (result i32) (catch $t1 $h)
                call $throw_t1
                unreachable
              )
            )
          )
        )
        "#,
    )?;

    let functy = FuncType::new(&engine, [ValType::I32], []);
    let tagty = TagType::new(functy);
    let t1 = Tag::new(&mut store, &tagty)?;
    let exnty = ExnType::from_tag_type(&tagty)?;
    let exnpre_for_t1 = ExnRefPre::new(&mut store, exnty);

    let throw_t1 = Func::wrap(
        &mut store,
        move |mut caller: Caller<'_, ()>| -> Result<()> {
            let err = {
                let mut scope = RootScope::new(&mut caller);
                let exn = ExnRef::new(&mut scope, &exnpre_for_t1, &t1, &[Val::I32(0x1111_1111)])?;
                scope.as_context_mut().throw::<()>(exn)
            };
            caller.as_context_mut().gc(None)?;
            err.map_err(|e| e.into())
        },
    );

    let instance = Instance::new(
        &mut store,
        &module,
        &[Extern::Tag(t1), Extern::Func(throw_t1)],
    )?;
    let run = instance.get_typed_func::<(), i32>(&mut store, "run")?;
    let result = run.call(&mut store, ())?;
    assert_eq!(result, 0x1111_1111);
    Ok(())
}

currently fails with:

$ cargo test --test all store_pending_exc
   Compiling wasmtime-cli v46.0.0 (/home/alex/code/wasmtime)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 2.41s
     Running tests/all/main.rs (target/debug/deps/all-5d784fe606f0e513)

running 3 tests
test exceptions::winch_store_pending_exception_is_rooted ... ok

thread 'exceptions::craneliftpulley_store_pending_exception_is_rooted' (2812196) panicked at crates/wasmtime/src/runtime/vm/gc/enabled/drc.rs:709:9:
assertion failed: self.ref_count > 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

thread 'exceptions::craneliftpulley_store_pending_exception_is_rooted' (2812196) panicked at /rustc/59807616e1fa2540724bfbac14d7976d7e4a3860/library/core/src/panicking.rs:225:5:
panic in a function that cannot unwind
stack backtrace:
 ... big backtrace here ...
thread caused non-unwinding panic. aborting.
error: test failed, to rerun pass `--test all`

Caused by:
  process didn't exit successfully: `/home/alex/code/wasmtime/target/debug/deps/all-5d784fe606f0e513 store_pending_exc` (signal: 6, SIGABRT: process abort signal)

An LLM-generated summary (possibly wrong) is:

Details

Pending exception's VMGcRef is neither reference-counted nor traced as a strong root, causing use-after-free of the throw payload across a host-triggered GC

Scope:

  • crates/wasmtime/src/runtime/store.rs:2747-2754
    (StoreOpaque::set_pending_exception: plain assignment, no inc_ref).
  • crates/wasmtime/src/runtime/store.rs:2348-2356
    (StoreOpaque::trace_pending_exception_roots: registers the slot via
    add_vmgcref_root).
  • crates/wasmtime/src/runtime/vm/gc/enabled/drc.rs:425-475
    (DrcHeap::trace: explicitly skips every root for which
    !root.is_on_wasm_stack(), so the pending-exception root never
    marks its referent alive).
  • crates/wasmtime/src/runtime/gc/enabled/exnref.rs:419-427
    (ExnRef::_to_raw: try_clone_gc_ref then expose_gc_ref_to_wasm;
    the +1 ref count from the clone is consumed entirely by the OASR
    list, leaving the pending slot as a borrowed view).
  • crates/wasmtime/src/runtime/vm/throw.rs:21-26
    (compute_handler reads (instance_id, defined_tag_index) from the
    pending exnref after the embedder may have GC'd that slot).

Severity: Use-after-free of a GC-heap exception object reachable
through safe public APIs. With the default-on gc Cargo feature (DRC
collector), Config::wasm_exceptions(true), plus no debug feature
and no async, the embedder can use only safe wasmtime::* APIs to:

  1. Allocate an exn X1,
  2. Store::throw(X1),
  3. Drop X1's last LIFO root (e.g., by calling throw from inside a
    RootScope),
  4. Trigger an explicit GC via Store::gc() — this dec_refs the OASR
    entry and dealloc's X1's heap slot,
  5. Allocate a new exn X2 of the same size — the FreeList hands
    back the very slot X1 lived in,
  6. Return Err(ThrownException).

The runtime's compute_handler then reads the
(instance_id, defined_tag_index) from X2's bytes and uses them to
search the wasm stack for a matching try_table clause. The
embedder controls every byte of X2, including the tag-identity
header that determines which wasm catch clause runs (or whether
the "thrown" exception escapes the supposedly-catching try_table
entirely). This is a clean primitive for forcing wasm to run a
handler for a tag that was never actually thrown (or to fail to run
a handler for a tag that was thrown).
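
To make the consequence concrete, here is a minimal, self-contained sketch of matching by tag identity. It is a toy model and not Wasmtime's compute_handler: the names TagIdentity, CatchClause, find_handler and lookup_with_header are hypothetical and exist only for illustration. The sketch assumes nothing beyond what the summary states, namely that the handler is chosen by comparing an (instance_id, defined_tag_index) pair read from the pending exception.

```rust
/// Hypothetical stand-in for the (instance_id, defined_tag_index) pair
/// that the summary says identifies an exception's tag.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct TagIdentity {
    instance_id: u32,
    defined_tag_index: u32,
}

/// Hypothetical model of one `catch` clause of a `try_table`.
struct CatchClause {
    tag: TagIdentity,
    handler: &'static str,
}

/// Return the first handler whose tag identity equals the thrown one.
fn find_handler(clauses: &[CatchClause], thrown: TagIdentity) -> Option<&'static str> {
    clauses.iter().find(|c| c.tag == thrown).map(|c| c.handler)
}

/// Model the lookup for a `try_table` with a single `(catch $t1 $h)`,
/// given whatever header the runtime reads through the pending slot.
fn lookup_with_header(header: TagIdentity) -> Option<&'static str> {
    let t1 = TagIdentity { instance_id: 0, defined_tag_index: 0 };
    let clauses = [CatchClause { tag: t1, handler: "$h" }];
    find_handler(&clauses, header)
}
```

If the header read through the pending slot is still t1's, the clause matches; if the slot was recycled and now carries a different identity, the same lookup finds no handler and the exception escapes the try_table.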

This bug is distinct from reports 003 / 004 / 010:

  • 003 / 010 are about the pending-exception slot's contents being
    manipulated through additional Store::throw / take_pending_exception
    calls; they require the debug feature.
  • 004 is about the slot being emptied during a debug pre-pass.
  • This bug requires neither debug nor async and no second
    Store::throw. It is purely a memory-safety consequence of
    pending_exception being a borrowed VMGcRef rather than an
    owned/reference-counted slot, combined with DRC's trace
    explicitly skipping non-stack roots.

Summary

In the DRC collector, every reference-typed slot that holds a GC
object is expected to "own" a +1 on the object's reference count
(or to be live-traced as a stack root). For example:

  • The user LIFO root list owns +1 per entry; exit_lifo_scope
    dec_refs each entry's gc_ref before truncating the list.
  • A wasm-stack value is added to the over-approximated stack-roots
    (OASR) list with +1 and is then "marked" during the GC trace
    phase by walking the stack maps; the OASR sweep keeps the entry
    if it was marked.
  • Globals and table elements are stored via write_gc_ref, which
    inc_refs the source and dec_refs the destination.

The pending-exception slot does not follow either contract:

// crates/wasmtime/src/runtime/store.rs:2747-2754
#[cfg(feature = "gc")]
pub(crate) fn set_pending_exception(&mut self, exnref: VMExnRef) {
    self.pending_exception = Some(exnref);
}

Plain assignment; no inc_ref, no write_gc_ref. The slot does not
hold its own +1. It is added as a root for tracing
(trace_pending_exception_roots, store.rs:2348-2356), but only via
add_vmgcref_root, which DRC's trace ignores:

// crates/wasmtime/src/runtime/vm/gc/enabled/drc.rs:436-444
for root in roots {
    if !root.is_on_wasm_stack() {
        // We only trace on-Wasm-stack GC roots. ...
        continue;
    }
    ...
    self.index_mut(drc_ref(&gc_ref)).set_marked();
}

Only is_on_wasm_stack() roots get marked; everything added via
add_vmgcref_root (LIFO/Owned user roots, the pending-exception
slot, globals, table elements) is treated as if it were responsible
for holding its own +1. The pending-exception slot does not.

The two host-side throw paths interact with the OASR list as follows:

  • Wasm throw_ref libcall (vm/libcalls.rs::throw_ref): explicitly
    calls clone_gc_ref (which inc_refs) before calling
    set_pending_exception. The +1 from clone_gc_ref belongs to the
    slot. OK.

  • Host Store::throw (store.rs::throw_impl):

    fn throw_impl(&mut self, exception: Rooted<ExnRef>) {
        let mut nogc = AutoAssertNoGc::new(self);
        let exnref = exception._to_raw(&mut nogc).unwrap();   // (a)
        let exnref = VMGcRef::from_raw_u32(exnref)
            .expect("exception cannot be null")
            .into_exnref_unchecked();                         // (b)
        nogc.set_pending_exception(exnref);                   // (c)
    }

    At (a), _to_raw calls try_clone_gc_ref (inc_ref) and then
    expose_gc_ref_to_wasm. expose_gc_ref_to_wasm (DRC)
    consumes the cloned VMGcRef into the OASR list — so the +1 from
    the clone is owned by the OASR entry. At (b), a fresh VMGcRef
    is reconstructed from the raw u32 value (no inc). At (c), it is
    stored as the pending exception (no inc). Pending slot is now a
    borrowed view of the heap object whose only refcount is held by
    the OASR list.

The OASR list is dec_ref'd at the next GC sweep for any object whose
mark bit is not set. Since no marker walks the pending-exception
root, a sweep called between the host's Store::throw and the
runtime's eventual compute_handler will dealloc the underlying
slot (assuming no other refcount holders, e.g., once the user's
LIFO root is gone). The next allocation of the same size hits the
free list and returns the same slot, populated with attacker-chosen
bytes.
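
The sequence described above can be simulated with a small toy collector. This is a sketch, not the code in drc.rs: ToyDrcHeap, its fields, and tag_seen_after_gc are hypothetical names, slots are plain indices, and the free list is assumed to hand back the most recently freed slot. The pending_slot_owns_ref flag models a hypothetical variant in which the pending slot holds its own +1.

```rust
const TAG_T1: u32 = 1;
const TAG_DECOY: u32 = 2;

/// Hypothetical toy heap: parallel per-slot arrays plus two lists.
#[derive(Default)]
struct ToyDrcHeap {
    ref_counts: Vec<u32>,
    marked: Vec<bool>,
    tags: Vec<u32>,        // stand-in for the tag-identity header
    free_list: Vec<usize>, // freed slots, reused most-recent-first
    oasr: Vec<usize>,      // over-approximated stack roots, +1 each
}

impl ToyDrcHeap {
    /// Allocate an object with ref count 1, reusing a freed slot if any.
    fn alloc(&mut self, tag: u32) -> usize {
        if let Some(slot) = self.free_list.pop() {
            self.ref_counts[slot] = 1;
            self.tags[slot] = tag;
            return slot;
        }
        self.ref_counts.push(1);
        self.marked.push(false);
        self.tags.push(tag);
        self.ref_counts.len() - 1
    }

    fn inc_ref(&mut self, slot: usize) {
        self.ref_counts[slot] += 1;
    }

    /// Drop one reference; a count of zero returns the slot to the free list.
    fn dec_ref(&mut self, slot: usize) {
        assert!(self.ref_counts[slot] > 0);
        self.ref_counts[slot] -= 1;
        if self.ref_counts[slot] == 0 {
            self.free_list.push(slot);
        }
    }

    /// Mark only on-Wasm-stack roots, then sweep the OASR list:
    /// marked entries are kept, unmarked entries are dec-ref'd.
    fn gc(&mut self, roots: &[(usize, bool)]) {
        for &(slot, on_wasm_stack) in roots {
            if on_wasm_stack {
                self.marked[slot] = true;
            }
        }
        for slot in std::mem::take(&mut self.oasr) {
            if std::mem::take(&mut self.marked[slot]) {
                self.oasr.push(slot);
            } else {
                self.dec_ref(slot);
            }
        }
    }
}

/// Replay throw, scope exit, GC, and decoy allocation; return the tag
/// that is read through the pending slot afterwards.
fn tag_seen_after_gc(pending_slot_owns_ref: bool) -> u32 {
    let mut heap = ToyDrcHeap::default();
    let x1 = heap.alloc(TAG_T1); // +1 owned by the user's LIFO root
    heap.inc_ref(x1);            // clone made before exposing to Wasm
    heap.oasr.push(x1);          // the clone's +1 now belongs to OASR
    let pending = x1;            // pending slot: same object, no inc
    if pending_slot_owns_ref {
        heap.inc_ref(pending);   // hypothetical owning variant
    }
    heap.dec_ref(x1);            // scope exit drops the LIFO +1
    heap.gc(&[(pending, false)]); // pending root is not on the Wasm stack
    let _decoy = heap.alloc(TAG_DECOY);
    heap.tags[pending]
}
```

In this model a borrowed pending slot ends up reading the decoy's tag, because the sweep frees the object and the next allocation reuses its slot. When the slot owns a +1, the object survives the sweep and the decoy lands in a fresh slot.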

Reproducer

reports/011-pending-exception-uaf-via-gc/ is a self-contained
Cargo project (depends on the wasmtime crate at
../../../wasmtime/crates/wasmtime, i.e. the main worktree).
Build and run:

cd reports/011-pending-exception-uaf-via-gc
cargo build --release
./target/release/repro ; echo "EXIT: $?"

Observed output

error from go.call(): error while executing at wasm backtrace:
    0:     0x49 - <unknown>!<wasm function 1>

Caused by:
    thrown Wasm exception


BUG REPRODUCED (error path): the runtime walked into a deallocated/recycled heap slot.
EXIT: 1

Expected output

The wasm (try_table (catch $t1 ...)) should catch the host-thrown
$t1 exception and produce its i32 payload 0x11111111. Process exits 0.

Reproducer outline

// (full source: reports/011-pending-exception-uaf-via-gc/src/main.rs)

let throw_evil = Func::wrap(&mut store, move |mut caller: Caller<'_, ()>| -> Result<()> {
    // (1) Allocate `X1` of tag $t1 inside an inner RootScope so we can
    //     drop X1's LIFO refcount without leaving the host function.
    {
        let mut scope = RootScope::new(&mut caller);
        let exn = ExnRef::new(&mut scope, &exnpre_for_t1, &t1, &[Val::I32(0x1111_1111)])?;
        let _: Result<(), _> = scope.as_context_mut().throw::<()>(exn);
    }
    // After the inner RootScope drops:
    //   refcount(X1) = 1 (held by OASR only).
    //   pending_exception slot = borrowed VMGcRef into X1's heap slot.

    // (2) Trigger an explicit GC. DRC `trace` skips non-stack roots,
    //     so the pending-exception root and the user-LIFO root do
    //     NOT mark X1. The OASR sweep dec_refs X1: 1 -> 0 -> dealloc.
    caller.as_context_mut().gc(None)?;

    // (3) Reuse the freed slot. The decoy exn's body is the same size,
    //     so the FreeList returns the very slot X1 lived in. The
    //     `(instance_id, tag_idx)` header bytes that the still-set
    //     `pending_exception` slot points at are now t_decoy's tag
    //     identity, not t1's.
    let _decoy = ExnRef::new(&mut caller, &exnpre_decoy, &t_decoy,
                             &[Val::I64(0xDEAD_BEEF_CAFE_BABEu64 as i64)])?;

    // (4) Return Err(ThrownException). compute_handler reads the
    //     (instance, tag_idx) header of the pending exnref — but those
    //     bytes are t_decoy's tag, not t1's. The wasm `(catch $t1 ...)`
    //     does not match, so the supposedly-catchable throw escapes
    //     to the host instead.
    Err(ThrownException.into())
});

The wasm side is the standard "catch $t1, return its i32 payload":

(module
  (import "h" "t1" (tag $t1 (param i32)))
  (import "h" "throw_evil" (func $throw_evil))
  (func (export "go") (result i32)
    (block $h (result i32)
      (try_table (result i32) (catch $t1 $h)
        call $throw_evil
        i32.const 0xdeadbeef))))

Root cause

pending_exception violates the DRC contract on two counts:

  1. No owned refcount. set_pending_exception is a raw assignment;
    it does not call inc_ref/write_gc_ref for the new value (and
    does not dec_ref the displaced value, so two consecutive
    set_pending_exception calls also leak refs — but that is the
    smaller concern). The runtime currently relies on whichever caller
    set the slot to have already taken care of the inc. The throw_ref
    libcall does, but throw_impl (host's Store::throw) does not:
    the +1 it inc_refs is consumed by expose_gc_ref_to_wasm into
    the OASR list, not into the pending-exception slot.

  2. Tracing does not mark. trace_pending_exception_roots uses
    add_vmgcref_root, which DRC's trace impl explicitly skips. So
    even if the slot is "live" from the runtime's perspective, the
    GC sweep treats its referent as garbage as soon as no is_on_wasm_stack()
    root marks it. For an exception that has been set by Store::throw
    from a host function and whose host-side Rooted has gone out of
    scope (or is in the process of going out of scope), nothing else
    marks it.

The exact failure window is "between Store::throw and the runtime's
consumption of the pending exception". Host code can trigger a GC in
that window through any of:

  • An explicit Store::gc(None)/StoreContextMut::gc(...) call
    (used by the reproducer).
  • Any GC-heap allocation that triggers retry-after-GC (e.g.,
    ExnRef::new/StructRef::new/ArrayRef::new when the heap is
    near-full).
  • Less reliably, the trampoline's exit-path call hooks
    (call_hook(CallHook::ReturningFromHost)).

Once the slot is freed, even if no second allocation reuses it, the
DRC dealloc adds the slot to the FreeList, which writes free-list
metadata into the slot's first bytes (not POISON unless gc_zeal is
on). The exception-tag header offsets +24/+28 (DRC) overlap with
where the runtime later reads instance_id and defined_tag_index,
so the read returns whatever happened to be written into those bytes
during dealloc.

Fix options

  1. Make set_pending_exception reference-count-correct. Have it
    take the source via write_gc_ref (or its own init_gc_ref-style
    wrapper that handles the optional displaced value): inc_ref the
    incoming exnref, dec_ref the displaced one. This makes the slot
    own its +1 and decouples its lifetime from the OASR list. This
    is the smallest behavioral change and also fixes the silent
    leak when Store::throw is called twice.

  2. Have DRC's trace mark non-stack roots too. Change the
    if !root.is_on_wasm_stack() { continue; } to also handle
    add_vmgcref_root and add_val_raw_root roots by setting their
    referents' mark bit (the existing globals/tables paths are already
    relying on naive +1 ref counting, so this is a defensive
    measure that costs only the trace traversal). This alone does not
    fix the underlying refcount issue but stops the dealloc from
    firing while the pending slot is still set.

  3. Make throw_impl go through a slot-aware setter. Equivalent
    to (1) but localised: have throw_impl use a write-barriered
    helper instead of set_pending_exception directly.

Option (1) is the minimal correct fix. It also addresses a related
concern: any two consecutive set_pending_exception calls silently
leak a ref, because the displaced value's GC ref is dropped without
a dec barrier.
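
A minimal sketch of what option (1) could look like, written against a toy store and not against Wasmtime's StoreOpaque. ToyStore, its fields, and counts_after_two_throws are hypothetical names, and plain integer counts stand in for the real inc_ref/dec_ref barriers (which would also deallocate at zero).

```rust
/// Hypothetical toy store: per-slot ref counts and one pending slot.
struct ToyStore {
    ref_counts: Vec<u32>,
    pending_exception: Option<usize>,
}

impl ToyStore {
    /// Owning setter: the slot takes its own +1 on the incoming object
    /// and releases the +1 it held on any displaced object.
    fn set_pending_exception(&mut self, incoming: usize) {
        // Increment first so that re-setting the same object never
        // lets its count dip to zero in between.
        self.ref_counts[incoming] += 1;
        if let Some(displaced) = self.pending_exception.replace(incoming) {
            self.ref_counts[displaced] -= 1;
        }
    }
}

/// Throw object 0, then object 1; both start with one outside owner.
fn counts_after_two_throws() -> Vec<u32> {
    let mut store = ToyStore {
        ref_counts: vec![1, 1],
        pending_exception: None,
    };
    store.set_pending_exception(0);
    store.set_pending_exception(1);
    store.ref_counts
}
```

After the second call, the displaced object is back to its original count and only the currently pending object carries the slot's +1, so neither the leak nor the borrowed-view problem arises in this model.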

Severity / impact assessment

  • wasm_exceptions(true) is currently 🚧 (work-in-progress), so this
    is not classified as a security issue under Wasmtime's published
    guarantees today. It becomes a security issue the moment the
    exception proposal graduates to tier 1.
  • This is the most impactful exception-proposal bug in the auditing
    series (003/004/005/010/011) because:
    • It does not require the debug feature.
    • It does not require async.
    • It is a memory-safety violation (use-after-free of a GC heap
      object) reachable from safe public API.
    • The freed slot's bytes are attacker-controllable (via a
      follow-up ExnRef::new/StructRef::new of the same size), so
      the embedder can substitute any (instance_id, defined_tag_index)
      into what the runtime believes is the pending exception's tag
      header — a primitive for forcing a wasm try_table to catch
      a tag that was never actually thrown (or to fail to catch a
      tag that was).
    • When the substituted exn is of a different tag that the wasm
      happens to have a (catch ...) for, the wasm catches the
      "wrong" exception and reads its payload. Combined with the
      bug-010 attack shape (catch's compiled unboxer reads fields
      using the catch-side tag's layout), this can produce a more
      severe type-confusion variant than 010 alone, and without
      requiring debug.
  • The fix is local: option (1) modifies only set_pending_exception
    and throw_impl's call site (~10 lines).

Labels: wasm-proposal:exceptions (Issues for WebAssembly exceptions/exception-handling), wasm-proposal:gc (Issues with the implementation of the gc wasm proposal)
