
V8 procedures: async/await support for blocking host calls #4697

@philtrem

Description


Summary

V8 procedure syscalls like fetch() block the V8 thread for their full duration, forcing the runtime to spawn additional instances for concurrent requests — each with its own OS thread and V8 isolate. This proposal adds async/await support for procedures so they can yield at host calls, multiplexing multiple in-flight procedures on a single V8 worker. Synchronous procedures continue to work unchanged; module authors opt in to async at their own pace.

Motivation

Blocking procedure syscalls (fetch()) in the V8 runtime park the V8 thread for their full duration — up to 30s per request. When another operation arrives while an instance is blocked, ModuleInstanceManager creates a new V8 instance — spawning an OS thread, allocating a V8 isolate, and recompiling the module. The pool never shrinks, so instances accumulate at peak load.

Consider an LLM chat app where each message triggers a procedure calling an LLM API (30-60s response time). With 10 concurrent users, that's 10 blocked instances, each holding an OS thread and a V8 heap, plus additional instances for interleaved work. The instance count grows with every concurrent long-running request and is never reclaimed.

PR #4663 addresses this for reducers, views, and lifecycle callbacks by moving them to a single-worker FIFO lane (JsInstanceLane). Procedures are deliberately left on the old pool because they block on rt.block_on(). V8 has native async/await and there's exactly one guest language to support, so async procedures are a natural fit. A file-by-file analysis of what the implementation would involve accompanies this issue (see comments below), along with a verification checklist, to help expedite this work.

How the WASM runtime already solves this

The WASM runtime doesn't have this problem. Both runtimes call the same instance_env.http_request(), but the paths diverge at the syscall layer:

// WASM path (wasmtime/wasm_instance_env.rs) — yields via async host function
let result = async { env.instance_env.http_request(request, body)?.await }.await;

// V8 path (v8/syscall/common.rs) — blocks the thread
let (response, body) = rt.block_on(env.instance_env.http_request(request, body)?)?;

The WASM runtime uses SingleCoreExecutor backed by a tokio::task::LocalSet — a single-threaded async executor where multiple tasks are multiplexed cooperatively. When a procedure yields at an async host function, the executor polls other tasks. Wasmtime bridges synchronous WASM guest code to async host functions via stack switching.

V8 doesn't have stack switching, but it doesn't need it — native async/await serves the same purpose. #4663 brings the reducer path closer to this model (single worker, FIFO queue) but doesn't add the async yielding that would let procedures share the worker.

What it looks like to module authors

// Synchronous procedure — works as today, blocks V8 thread
(ctx) => {
  const resp = ctx.http.fetch(url);
  return resp.text();
}

// Async procedure — V8 thread is free during await
async (ctx) => {
  const resp = await ctx.http.fetch(url);
  return resp.text();
}

Both forms coexist. Synchronous procedures use the existing rt.block_on() path unchanged. Async procedures yield at await points. Module authors adopt async at their own pace — no migration required. Reducers remain synchronous and never yield, same as the WASM runtime.

The runtime detects async functions at registration time via fn.constructor.name === 'AsyncFunction' and automatically selects the async execution path, providing AsyncProcedureCtx (where fetch() returns Promise<Response> instead of SyncResponse). No explicit flag is needed:

export const myProc = spacetimedb.procedure(
  { url: t.string() },
  t.string(),
  async (ctx, { url }) => {
    const resp = await ctx.http.fetch(url);
    return resp.text();
  }
);
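A minimal sketch of that detection check, assuming a helper at registration time (the `detectKind` name and `ProcedureKind` type are illustrative, not part of the actual runtime; only the constructor-name check comes from the proposal):

```typescript
type ProcedureKind = "sync" | "async";

function detectKind(fn: Function): ProcedureKind {
  // Async functions are instances of the AsyncFunction constructor.
  // Note: a plain function that merely returns a Promise is still classified
  // as "sync" by this check; authors opt in by writing `async` explicitly.
  return fn.constructor.name === "AsyncFunction" ? "async" : "sync";
}
```

This is why no explicit flag is needed: the `async` keyword on the handler itself is the opt-in signal.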

withTx vulnerability (ships independently)

The current withTx<T>(body: (ctx) => T): T signature silently accepts async callbacks — TypeScript infers T = Promise<X>, the call type-checks, and the transaction commits before the async body has finished running. This is a latent data corruption path that exists today.

Fix: a conditional type T extends Promise<any> ? never : T rejects async callbacks at compile time, plus a runtime thenable check as defense-in-depth. This can ship independently of async procedures.
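One way the guard could look (a sketch, not the final signature: `TxCtx` and the commit placement are illustrative, and intersecting `T` with the conditional type is one formulation that makes inference reject async callbacks at the call site):

```typescript
interface TxCtx { /* transaction handle */ }

// If the callback returns a Promise, this collapses to `never`, so the
// callback's actual return type is no longer assignable and the call
// fails to type-check.
type NotAsync<T> = T extends Promise<any> ? never : T;

function withTx<T>(body: (ctx: TxCtx) => T & NotAsync<T>): T {
  const ctx: TxCtx = {};
  const result: T = body(ctx);
  // Runtime defense-in-depth: reject thenables that slip past the types.
  if (result !== null && typeof result === "object" &&
      typeof (result as { then?: unknown }).then === "function") {
    throw new TypeError("withTx body must be synchronous; got a thenable");
  }
  // ...the real implementation commits the transaction here...
  return result;
}
```

The runtime check matters because callers can always cast around the type system; the thenable test catches those at the commit boundary.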

Key implementation areas

The changes needed have been traced through the codebase. The full file-by-file analysis and verification checklist are in the comments below. Here's a summary of the key areas:

Event loop. The synchronous for request in request_rx.iter() loop in spawn_instance_worker() becomes an async event loop with tokio::task::LocalSet, FuturesUnordered for in-flight async futures, and v8::MicrotasksPolicy::kExplicit to control when Promise resolutions propagate. The scheduling priority (biased toward the request channel, with a non-blocking try_next so in-flight futures are never starved) ensures reducers aren't delayed by async procedure work.
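The real loop is Rust (tokio::task::LocalSet plus FuturesUnordered); as a sketch of the same scheduling shape, with all names illustrative, the request-biased loop looks roughly like:

```typescript
type Request = { run: () => Promise<void> };

// Worker loop sketch: drain the request queue first (biased, non-blocking),
// then park on the in-flight set until some procedure makes progress.
async function workerLoop(queue: Request[]): Promise<void> {
  const inFlight = new Set<Promise<void>>();
  while (queue.length > 0 || inFlight.size > 0) {
    // Bias: start every queued request before waiting on in-flight work,
    // so short synchronous calls (reducers) aren't delayed by procedures.
    while (queue.length > 0) {
      const req = queue.shift()!;
      const p: Promise<void> = req.run().then(
        () => { inFlight.delete(p); },
        () => { inFlight.delete(p); },
      );
      inFlight.add(p);
    }
    if (inFlight.size > 0) {
      await Promise.race(inFlight); // yield until one in-flight call settles
    }
  }
}
```

In the Rust version the queue is a live channel rather than an array, and the microtask checkpoint after each completion is explicit; the bias and the non-blocking drain are the part this sketch is meant to show.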

Per-call state. InstanceEnv and JsInstanceEnv store per-call mutable state (start_time, func_name, tx, iters, call_times, timing_spans) in single slots. With multiple procedures in-flight, these move into a HashMap<CallId, CallContext>. Syscall handlers access the active call's state via env.current_call().
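The actual change is in Rust; a TypeScript sketch of the shape (CallContext fields, method names, and the class itself are illustrative, not the JsInstanceEnv API):

```typescript
type CallId = number;

interface CallContext {
  funcName: string;
  startTime: number;
  isAsync: boolean;
}

// Per-call slots move into a keyed map; the event loop marks which call is
// active before re-entering JS, so syscall handlers find the right state.
class InstanceEnvSketch {
  private calls = new Map<CallId, CallContext>();
  private activeCall: CallId | null = null;

  beginCall(id: CallId, ctx: CallContext): void {
    this.calls.set(id, ctx);
  }

  setActive(id: CallId): void {
    this.activeCall = id;
  }

  currentCall(): CallContext {
    if (this.activeCall === null || !this.calls.has(this.activeCall)) {
      throw new Error("no active call");
    }
    return this.calls.get(this.activeCall)!;
  }

  endCall(id: CallId): void {
    this.calls.delete(id);
    if (this.activeCall === id) this.activeCall = null;
  }
}
```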

Promise-returning syscalls. procedure_http_request branches on env.current_call().is_async: the sync path uses rt.block_on() unchanged, the async path creates a v8::PromiseResolver, stashes it with the Rust future, and returns the Promise to JS. When the future completes, the event loop resolves the Promise and runs a microtask checkpoint. This pattern also unlocks future streaming syscalls.
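The resolver-stash pattern, sketched in TypeScript (in the real code the resolver is a v8::PromiseResolver held on the Rust side; `PendingSyscalls` and the id scheme are illustrative):

```typescript
// The syscall handler returns a Promise immediately and records its resolver;
// the event loop resolves it when the host-side future completes.
class PendingSyscalls<T> {
  private next = 0;
  private pending = new Map<number, (value: T) => void>();

  // Called on the async syscall path: hand the Promise back to JS.
  begin(): { id: number; promise: Promise<T> } {
    const id = this.next++;
    const promise = new Promise<T>((resolve) => this.pending.set(id, resolve));
    return { id, promise };
  }

  // Called by the event loop when the host future finishes; in V8 this is
  // followed by a microtask checkpoint so `await` continuations run.
  complete(id: number, value: T): void {
    const resolve = this.pending.get(id);
    this.pending.delete(id);
    resolve?.(value);
  }
}
```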

__call_procedure__ ABI. Returns Promise<Uint8Array> for async procedures instead of Uint8Array. The Rust side branches on the isAsync metadata from registration; .now_or_never() is removed for async calls.

Host-side dispatch. #4663 already has per-request oneshot reply channels and a cloneable JsInstance. The remaining change is routing procedures through the lane worker instead of procedure_instances, with a semaphore-based max-in-flight limit (e.g. 100).

Trap handling. #4663's replace_active_if_current() mechanism handles traps. The addition is failing all in-flight async procedures' reply channels before replacing the worker.

Safety. A starvation watchdog (terminate_execution() after 30s without yielding) prevents runaway JS. A wall-clock timeout (fixed 5-minute default) catches never-resolving Promises. Both use hard cancellation — graceful cancellation (AbortSignal, cleanup deadlines) can follow later.
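The wall-clock timeout is essentially a race against a timer; a sketch (the helper name is illustrative; the 5-minute default and hard-cancellation semantics follow the proposal):

```typescript
// Rejects if `work` doesn't settle within `ms`. Hard cancellation: the
// caller drops the result, the underlying work isn't notified (graceful
// AbortSignal support is a listed follow-up).
function withWallClockTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`procedure exceeded ${ms}ms wall-clock limit`)),
      ms,
    );
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

The starvation watchdog is different in kind: terminate_execution() interrupts JS that never yields, whereas this timeout catches JS that yields but whose Promise never settles.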

Suggested incremental path

Steps 1-2 are independent of #4663 and could be contributed immediately. Steps 3-6 build on #4663's single-worker lane architecture.

  1. Pool shrinking (independent): idle timeout on ModuleInstanceManager so instances are reclaimed after peak load.
  2. withTx hardening (independent): conditional type + runtime thenable check. Fixes a latent bug that exists today.
  3. Event loop + per-call state: async event loop infrastructure + CallContext map. No behavior change yet — all operations still synchronous.
  4. Route procedures through lane worker: remove procedure_instances, add FuturesUnordered + max-in-flight semaphore.
  5. Promise-returning syscalls + async procedures: the actual feature — AsyncProcedureCtx, Promise-based fetch(), __call_procedure__ ABI branching.
  6. Starvation watchdog + wall-clock timeout: safety mechanisms.
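Step 1's idle reclamation could look roughly like this (shape only; the instance interface and reap function are illustrative, not ModuleInstanceManager's actual API):

```typescript
interface PooledInstance {
  lastUsedMs: number;
  dispose(): void; // frees the OS thread and V8 isolate
}

// Periodic sweep: drop any instance idle longer than `idleLimitMs`.
// Returns how many instances were reclaimed.
function reapIdle(
  pool: Map<string, PooledInstance>,
  idleLimitMs: number,
  nowMs: number,
): number {
  let reaped = 0;
  for (const [key, inst] of pool) {
    if (nowMs - inst.lastUsedMs > idleLimitMs) {
      inst.dispose();
      pool.delete(key);
      reaped++;
    }
  }
  return reaped;
}
```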

Relationship to open PRs

Scope considerations

The following are intentionally left out of the core proposal to keep scope manageable. They're natural follow-ups once the foundation is in place:

  • Graceful cancellation (AbortSignal, cleanup deadlines) — let procedures catch cancellation and run cleanup before being killed.
  • Client disconnect cancellation — cancel only the disconnecting client's procedures.
  • Module shutdown drain mode — grace period for in-flight procedures during update/hotswap.
  • Scheduled procedure overlap — repeat-scheduled async procedures overlapping instead of running serially. Breaking semantic change, needs its own discussion.
  • Overflow workers — spawn additional workers when in-flight count exceeds threshold.
  • Per-procedure timeout configuration — custom timeouts instead of the fixed 5-minute default.
