diff --git a/design/mvp/Binary.md b/design/mvp/Binary.md index 14bec54e..03812964 100644 --- a/design/mvp/Binary.md +++ b/design/mvp/Binary.md @@ -216,6 +216,7 @@ valtype ::= i: => i resourcetype ::= 0x3f 0x7f f?:? => (resource (rep i32) (dtor f)?) | 0x3e 0x7f f: cb?:? => (resource (rep i32) (dtor async f (callback cb)?)) 🚝 functype ::= 0x40 ps: rs: => (func ps rs) + | 0x43 ps: rs: => (func async ps rs) paramlist ::= lt*:vec() => (param lt)* resultlist ::= 0x00 t: => (result t) | 0x01 0x00 => @@ -288,7 +289,6 @@ canon ::= 0x00 0x00 f: opts: ft: => (canon lift | 0x01 0x00 f: opts: => (canon lower f opts (core func)) | 0x02 rt: => (canon resource.new rt (core func)) | 0x03 rt: => (canon resource.drop rt (core func)) - | 0x07 rt: => (canon resource.drop rt async (core func)) 🚝 | 0x04 rt: => (canon resource.rep rt (core func)) | 0x08 => (canon backpressure.set (core func)) πŸ”€βœ• | 0x24 => (canon backpressure.inc (core func)) πŸ”€ @@ -515,7 +515,8 @@ named once. * The opcodes (for types, canon built-ins, etc) should be re-sorted * The two `depname` cases should be merged into one (`dep=<...>`) -* The two `list` type codes should be merged into one with an optional immediate. +* The two `list` type codes should be merged into one with an optional immediate + and similarly for `func`. * The `0x00` variant of `importname'` and `exportname'` will be removed. Any remaining variant(s) will be renumbered or the prefix byte will be removed or repurposed. diff --git a/design/mvp/CanonicalABI.md b/design/mvp/CanonicalABI.md index 7170dc15..bc2a63f4 100644 --- a/design/mvp/CanonicalABI.md +++ b/design/mvp/CanonicalABI.md @@ -570,6 +570,7 @@ complementarily using `parent_lock` and `fiber_lock` as follows: assert(not self.running()) def suspend(self, cancellable) -> SuspendResult: + assert(self.task.may_block()) assert(self.running() and not self.cancellable and self.suspend_result is None) self.cancellable = cancellable self.parent_lock.release() @@ -598,6 +599,7 @@ The `Thread.suspend_until` method is used by a multiple internal callers below to specify a custom `ready_func` that is polled by `Store.tick`: ```python def suspend_until(self, ready_func, cancellable = False) -> SuspendResult: + assert(self.task.may_block()) assert(self.running()) if ready_func() and not DETERMINISTIC_PROFILE and random.randint(0,1): return SuspendResult.NOT_CANCELLED @@ -877,14 +879,21 @@ synchronously or with `async callback`. This predicate is used by the other ```python def needs_exclusive(self): return not self.opts.async_ or self.opts.callback +``` +The `Task.may_block` predicate returns whether the [current task]'s function's +type is allowed to [block]. Specifically, functions that do not declare the +`async` effect that have not yet returned a value may not block. +```python + def may_block(self): + return self.ft.async_ or self.state == Task.State.RESOLVED ``` -The `Task.enter` method implements [backpressure] between when a caller makes a -call to an imported callee and when the callee's core wasm entry point is -executed. This interstitial placement allows an overloaded component instance -to avoid the need to otherwise-endlessly allocate guest memory for blocked -async calls until OOM. When backpressure is enabled, `enter` will block until +The `Task.enter` method implements [backpressure] between when the caller of an +`async`-typed function initiates the call and when the callee's core wasm entry +point is executed. This interstitial placement allows a component instance that +has been overloaded with concurrent function invocations to avoid OOM. When +backpressure is enabled, `enter` will block new `async`-typed calls until backpressure is disabled. There are three sources of backpressure: 1. *Explicit backpressure* is triggered by core wasm calling `backpressure.{inc,dec}` which modify the `ComponentInstance.backpressure` @@ -896,9 +905,17 @@ backpressure is disabled. There are three sources of backpressure: `enter` that need to be given the chance to start without getting starved by new tasks. +Note that, because non-`async`-typed functions ignore backpressure entirely, +they may reenter core wasm when an `async`-typed function would have been +blocked by implicit backpressure. Thus, export bindings generators must be +careful to handle this possibility (e.g., while maintaining the linear-memory +shadow stack pointer) for components with mixed `async`- and non-`async`- typed +exports. ```python def enter(self, thread): assert(thread in self.threads and thread.task is self) + if not self.ft.async_: + return True def has_backpressure(): return self.inst.backpressure > 0 or (self.needs_exclusive() and self.inst.exclusive) if has_backpressure() or self.inst.num_waiting_to_enter > 0: @@ -931,6 +948,8 @@ returns to clear the `exclusive` flag set by `Task.enter`, allowing other ```python def exit(self): assert(len(self.threads) > 0) + if not self.ft.async_: + return if self.needs_exclusive(): assert(self.inst.exclusive) self.inst.exclusive = False @@ -3249,12 +3268,17 @@ function (specified as a `funcidx` immediate in `canon lift`) until the inst.exclusive = False match code: case CallbackCode.YIELD: - event = task.yield_until(lambda: not inst.exclusive, thread, cancellable = True) + if task.may_block(): + event = task.yield_until(lambda: not inst.exclusive, thread, cancellable = True) + else: + event = (EventCode.NONE, 0, 0) case CallbackCode.WAIT: + trap_if(not task.may_block()) wset = inst.table.get(si) trap_if(not isinstance(wset, WaitableSet)) event = task.wait_until(lambda: not inst.exclusive, thread, wset, cancellable = True) case CallbackCode.POLL: + trap_if(not task.may_block()) wset = inst.table.get(si) trap_if(not isinstance(wset, WaitableSet)) event = task.poll_until(lambda: not inst.exclusive, thread, wset, cancellable = True) @@ -3272,6 +3296,12 @@ built-ins. Thus, the main difference between stackful and stackless async is whether these suspending operations are performed from an empty or non-empty core wasm callstack (with the former allowing additional engine optimization). +If a `Task` is not allowed to block (because it was created for a non-`async`- +typed function call and has not yet returned a value), `YIELD` is always a +no-op and `WAIT` and `POLL` always trap. Thus, a component may implement a +non-`async`-typed function with the `async callback` ABI, but the component +*must* call `task.return` *before* returning `WAIT` or `POLL`. + The event loop also releases `ComponentInstance.exclusive` (which was acquired by `Task.enter` and will be released by `Task.exit`) before potentially suspending the thread to allow other synchronous and `async callback` tasks to @@ -3363,14 +3393,24 @@ Based on this, `canon_lower` is defined in chunks as follows: ```python def canon_lower(opts, ft, callee: FuncInst, thread, flat_args): trap_if(not thread.task.inst.may_leave) - subtask = Subtask() - cx = LiftLowerContext(opts, thread.task.inst, subtask) + trap_if(not thread.task.may_block() and ft.async_ and not opts.async_) ``` +A non-`async`-typed function export that has not yet returned a value +unconditionally traps if it transitively attempts to make a synchronous call to +an `async`-typed function import (even if the callee wouldn't have actually +blocked at runtime). It is however fine to make an `async`-lowered call to an +`async`-typed function import, since this never blocks (only a subsequent call +to, e.g., `waitable-set.wait` would block). + Each call to `canon_lower` creates a new `Subtask`. However, this `Subtask` is only added to the current component instance's table (below) if `async` is specified *and* `callee` blocks. In any case, this `Subtask` is used as the `LiftLowerContext.borrow_scope` for `borrow` arguments, ensuring that owned handles are not dropped before `Subtask.deliver_return` is called (below). +```python + subtask = Subtask() + cx = LiftLowerContext(opts, thread.task.inst, subtask) +``` The next chunk makes the call to `callee` (which has type `FuncInst`, as defined in the [Embedding](#embedding) interface). The [current task] serves as @@ -3415,6 +3455,7 @@ above). flat_results = lower_flat_values(cx, max_flat_results, result, ft.result_type(), flat_args) subtask.callee = callee(thread.task, on_start, on_resolve) + assert(ft.async_ or subtask.state == Subtask.State.RETURNED) ``` The `Subtask.state` field is updated by the callbacks to keep track of the call progres. The `on_progress` variable starts as a no-op, but is used by the @@ -3423,7 +3464,8 @@ call progres. The `on_progress` variable starts as a no-op, but is used by the According to the `FuncInst` calling contract, the call to `callee` should never "block" (i.e., wait on I/O). If the `callee` *would* block, it will instead return a `Call` object which is stored in the `Subtask` (so that it can be used -to `request_cancellation` in the future). +to `request_cancellation` in the future). Furthermore, if the function type +does not have the `async` effect, the function *must* have returned a value. In the synchronous case (when the `async` `canonopt` is not set), if the `callee` blocked before calling `on_resolve`, the synchronous caller's thread @@ -3518,20 +3560,18 @@ For a canonical definition: validation specifies: * `$rt` must refer to resource type * `$f` is given type `(func (param i32))` -* πŸ”€+🚝 - `async` is allowed (otherwise it is not allowed) Calling `$f` invokes the following function, which removes the handle from the current component instance's table and, if the handle was owning, calls the resource's destructor. ```python -def canon_resource_drop(rt, async_, thread, i): +def canon_resource_drop(rt, thread, i): trap_if(not thread.task.inst.may_leave) inst = thread.task.inst h = inst.table.remove(i) trap_if(not isinstance(h, ResourceHandle)) trap_if(h.rt is not rt) trap_if(h.num_lends != 0) - flat_results = [] if not async_ else [0] if h.own: assert(h.borrow_scope is None) if inst is rt.impl: @@ -3539,24 +3579,26 @@ def canon_resource_drop(rt, async_, thread, i): rt.dtor(h.rep) else: if rt.dtor: - caller_opts = CanonicalOptions(async_ = async_) + caller_opts = CanonicalOptions(async_ = False) callee_opts = CanonicalOptions(async_ = rt.dtor_async, callback = rt.dtor_callback) - ft = FuncType([U32Type()],[]) + ft = FuncType([U32Type()],[], async_ = False) callee = partial(canon_lift, callee_opts, rt.impl, ft, rt.dtor) - flat_results = canon_lower(caller_opts, ft, callee, thread, [h.rep]) + [] = canon_lower(caller_opts, ft, callee, thread, [h.rep]) else: thread.task.trap_if_on_the_stack(rt.impl) else: h.borrow_scope.num_borrows -= 1 - return flat_results -``` -In general, the call to a resource's destructor is treated like a -cross-component call (as-if the destructor was exported by the component -defining the resource type). This means that cross-component destructor calls -follow the same concurrency rules as normal exports. However, since there are -valid reasons to call `resource.drop` in the same component instance that -defined the resource, which would otherwise trap at the reentrance guard of -`Task.enter`, an exception is made when the resource type's + return [] +``` +The call to a resource's destructor is defined as a non-`async`-lowered, +non-`async`-typed function call to a possibly-`async`-lifted callee, passing +the private `i32` representation as a parameter. Thus, destructors *may* block +on I/O, but only after they `task.return`, ensuring that `resource.drop` never +blocks. + +Since there are valid reasons to call `resource.drop` in the same component +instance that defined the resource, which would otherwise trap at the +reentrance guard of `Task.enter`, an exception is made when the resource type's implementation-instance is the same as the current instance (which is statically known for any given `canon resource.drop`). @@ -3798,6 +3840,7 @@ returning its `EventCode` and writing the payload values into linear memory: ```python def canon_waitable_set_wait(cancellable, mem, thread, si, ptr): trap_if(not thread.task.inst.may_leave) + trap_if(not thread.task.may_block()) wset = thread.task.inst.table.get(si) trap_if(not isinstance(wset, WaitableSet)) event = thread.task.wait_until(lambda: True, thread, wset, cancellable) @@ -3810,6 +3853,10 @@ def unpack_event(mem, thread, ptr, e: EventTuple): store(cx, p2, U32Type(), ptr + 4) return [event] ``` +A non-`async`-typed function export that has not yet returned a value +unconditionally traps if it transitively attempts to call `wait` (regardless of +whether there are any waitables with pending events). + The `lambda: True` passed to `wait_until` means that `wait_until` will only wait for the given `wset` to have a pending event with no extra conditions. @@ -3837,6 +3884,7 @@ same way as `wait`. ```python def canon_waitable_set_poll(cancellable, mem, thread, si, ptr): trap_if(not thread.task.inst.may_leave) + trap_if(not thread.task.may_block()) wset = thread.task.inst.table.get(si) trap_if(not isinstance(wset, WaitableSet)) event = thread.task.poll_until(lambda: True, thread, wset, cancellable) @@ -3845,7 +3893,10 @@ def canon_waitable_set_poll(cancellable, mem, thread, si, ptr): Even though `waitable-set.poll` doesn't block until the given waitable set has a pending event, `poll_until` does transitively perform a `Thread.suspend` which allows the embedder to nondeterministically switch to executing another -task (like `thread.yield`). +task (like `thread.yield`). To avoid encouraging spin-waiting and to support +hosts like browsers that require returning to the event loop for async I/O to +resolve, a non-`async`-typed function export that has not yet returned a value +unconditionally traps if it transitively attempts to call `poll`. If `cancellable` is set, then `waitable-set.poll` will return whether the supertask has already or concurrently requested cancellation. @@ -3944,6 +3995,7 @@ BLOCKED = 0xffff_ffff def canon_subtask_cancel(async_, thread, i): trap_if(not thread.task.inst.may_leave) + trap_if(not thread.task.may_block() and not async_) subtask = thread.task.inst.table.get(i) trap_if(not isinstance(subtask, Subtask)) trap_if(subtask.resolve_delivered()) @@ -3963,9 +4015,12 @@ def canon_subtask_cancel(async_, thread, i): assert(subtask.resolve_delivered()) return [subtask.state] ``` -The initial trapping conditions disallow calling `subtask.cancel` twice for the -same subtask or after the supertask has already been notified that the subtask -has returned. +A non-`async`-typed function export that has not yet returned a value +unconditionally traps if it transitively attempts to make a synchronous call to +`subtask.cancel` (regardless of whether the cancellation would have succeeded +without blocking). The other traps disallow calling `subtask.cancel` twice for +the same subtask or after the supertask has already been notified that the +subtask has returned. A race condition handled by the above code is that it's possible for a subtask to have already resolved (by calling `task.return` or `task.cancel`) and @@ -4060,13 +4115,20 @@ def canon_stream_write(stream_t, opts, thread, i, ptr, n): stream_t, opts, thread, i, ptr, n) ``` -Introducing the `stream_copy` function in chunks, `stream_copy` first checks -that the element at index `i` is of the right type and allowed to start a new -copy. (In the future, the "trap if not `IDLE`" condition could be relaxed to -allow multiple pipelined reads or writes.) +Introducing the `stream_copy` function in chunks, a non-`async`-typed function +export that has not yet returned a value unconditionally traps if it +transitively attempts to perform a synchronous `read` or `write` (regardless of +whether the operation would have succeeded eagerly without blocking). ```python def stream_copy(EndT, BufferT, event_code, stream_t, opts, thread, i, ptr, n): trap_if(not thread.task.inst.may_leave) + trap_if(not thread.task.may_block() and not opts.async_) +``` + +Next, `stream_copy` checks that the element at index `i` is of the right type +and allowed to start a new copy. (In the future, the "trap if not `IDLE`" +condition could be relaxed to allow multiple pipelined reads or writes.) +```python e = thread.task.inst.table.get(i) trap_if(not isinstance(e, EndT)) trap_if(e.shared.t != stream_t.t) @@ -4166,11 +4228,14 @@ def canon_future_write(future_t, opts, thread, i, ptr): ``` Introducing the `future_copy` function in chunks, `future_copy` starts with the -same set of guards as `stream_copy` for parameters `i` and `ptr`. The only -difference is that, with futures, the `Buffer` length is fixed to `1`. +same set of guards as `stream_copy` regarding whether suspension is allowed and +parameters `i` and `ptr`. The only difference is that, with futures, the +`Buffer` length is fixed to `1`. ```python def future_copy(EndT, BufferT, event_code, future_t, opts, thread, i, ptr): trap_if(not thread.task.inst.may_leave) + trap_if(not thread.task.may_block() and not opts.async_) + e = thread.task.inst.table.get(i) trap_if(not isinstance(e, EndT)) trap_if(e.shared.t != future_t.t) @@ -4254,6 +4319,7 @@ def canon_future_cancel_write(future_t, async_, thread, i): def cancel_copy(EndT, event_code, stream_or_future_t, async_, thread, i): trap_if(not thread.task.inst.may_leave) + trap_if(not thread.task.may_block() and not async_) e = thread.task.inst.table.get(i) trap_if(not isinstance(e, EndT)) trap_if(e.shared.t != stream_or_future_t.t) @@ -4269,8 +4335,12 @@ def cancel_copy(EndT, event_code, stream_or_future_t, async_, thread, i): assert(not e.copying() and code == event_code and index == i) return [payload] ``` -Cancellation traps if there is not currently an async copy in progress (sync -copies do not expect or check for cancellation and thus cannot be cancelled). +A non-`async`-typed function export that has not yet returned a value +unconditionally traps if it transitively attempts to make a synchronous call to +`cancel-read` or `cancel-write` (regardless of whether the cancellation would +have completed without blocking). There is also a trap if there is not +currently an async copy in progress (sync copies do not expect or check for +cancellation and thus cannot be cancelled). The *first* check for `e.has_pending_event()` catches the case where the copy has already racily finished, in which case we must *not* call `cancel()`. Calling @@ -4435,9 +4505,13 @@ calling component. ```python def canon_thread_suspend(cancellable, thread): trap_if(not thread.task.inst.may_leave) + trap_if(not thread.task.may_block()) suspend_result = thread.task.suspend(thread, cancellable) return [suspend_result] ``` +A non-`async`-typed function export that has not yet returned a value traps if +it transitively attempts to call `thread.suspend`. + If `cancellable` is set, then `thread.suspend` will return a `SuspendResult` value to indicate whether the supertask has already or concurrently requested cancellation. `thread.suspend` (and other cancellable operations) will only @@ -4521,6 +4595,8 @@ other threads in a cooperative setting. ```python def canon_thread_yield(cancellable, thread): trap_if(not thread.task.inst.may_leave) + if not thread.task.may_block(): + return [SuspendResult.NOT_CANCELLED] event_code,_,_ = thread.task.yield_until(lambda: True, thread, cancellable) match event_code: case EventCode.NONE: @@ -4528,6 +4604,13 @@ def canon_thread_yield(cancellable, thread): case EventCode.TASK_CANCELLED: return [SuspendResult.CANCELLED] ``` +If a non-`async`-typed function export that has not yet returned a value +transitively calls `thread.yield`, it returns immediately without blocking +(instead of trapping, as with other possibly-blocking operations like +`waitable-set.poll`). This is because, unlike other built-ins, `thread.yield` +may be scattered liberally throughout code that might show up in the transitive +call tree of a synchronous function call. + Even though `yield_until` passes `lambda: True` as the condition it is waiting for, `yield_until` does transitively peform a `Thread.suspend` which allows the embedder to nondeterministically switch to executing another thread. @@ -4741,11 +4824,12 @@ def canon_thread_available_parallelism(): [Adapter Functions]: FutureFeatures.md#custom-abis-via-adapter-functions [Shared-Everything Dynamic Linking]: examples/SharedEverythingDynamicLinking.md [Concurrency Explainer]: Concurrency.md -[Suspended]: Concurrency#waiting +[Suspended]: Concurrency#thread-built-ins [Structured Concurrency]: Concurrency.md#subtasks-and-supertasks [Backpressure]: Concurrency.md#backpressure [Current Thread]: Concurrency.md#current-thread-and-task [Current Task]: Concurrency.md#current-thread-and-task +[Block]: Concurrency.md#blocking [Subtasks]: Concurrency.md#subtasks-and-supertasks [Readable and Writable Ends]: Concurrency.md#streams-and-futures [Readable or Writable End]: Concurrency.md#streams-and-futures diff --git a/design/mvp/Concurrency.md b/design/mvp/Concurrency.md index 55bd9856..3784257c 100644 --- a/design/mvp/Concurrency.md +++ b/design/mvp/Concurrency.md @@ -13,10 +13,12 @@ emojis. For an even higher-level introduction, see [these][wasmio-2024] * [Threads and Tasks](#threads-and-tasks) * [Subtasks and Supertasks](#subtasks-and-supertasks) * [Current Thread and Task](#current-thread-and-task) + * [Thread Built-ins](#thread-built-ins) * [Thread-Local Storage](#thread-local-storage) + * [Blocking](#blocking) + * [Waitables and Waitable Sets](#waitables-and-waitable-sets) * [Streams and Futures](#streams-and-futures) * [Stream Readiness](#stream-readiness) - * [Waiting](#waiting) * [Backpressure](#backpressure) * [Returning](#returning) * [Borrows](#borrows) @@ -52,7 +54,7 @@ concurrency-specific goals and use cases: [shared-everything-threads]. * Allow polyfilling in browsers via JavaScript Promise Integration ([JSPI]) * Avoid partitioning interfaces and components into separate ecosystems based - on degree of concurrency; don't give functions or components a "[color]". + on degree of concurrency; don't give components a "[color]". * Maintain meaningful cross-language call stacks (for the benefit of debugging, logging and tracing). * Consider backpressure and cancellation as part of the design. @@ -72,34 +74,58 @@ as `select`, `epoll`, `io_uring`, `kqueue` and Overlapped I/O) making the Component Model "just another OS" from the language toolchain's perspective. The new async ABI can be used alongside or instead of the existing Preview 2 -"sync ABI" to call or implement *any* WIT function type, not just functions -with specific signatures. This allows *all* function types to be called or -implemented concurrently. When *calling* an imported function via the async -ABI, if the callee blocks, control flow is returned immediately to the caller, -and the callee resumes execution concurrently. When *implementing* an exported -function via the async ABI, multiple concurrent export calls are allowed to -be made by the caller. Critically, both sync-to-async and async-to-sync -pairings have well-defined, composable behavior for both inter-component and -intra-component calls, so that functions and components are not forced to pick -a "[color]". - -Although Component Model function *types* are colorless, it can still be -beneficial, especially in languages with `async`/`await`-style concurrency, to -give the bindings generator a *hint* as to whether or not a particular function -declared in WIT should appear as an `async` function in the generated bindings -by default. Even in languages with colorless functions, developers and their -tools can still benefit from such a hint when determining whether they want to -call a particular imported function concurrently or not. To support these use -cases, functions in WIT can be annotated with an `async` hint. E.g.: +"sync ABI" to call or implement *any* WIT function type. When *calling* an +imported function via the async ABI, if the callee [blocks](#blocking), control +flow is returned immediately to the caller, and the callee continues executing +concurrently. When *implementing* an exported function via the async ABI, +multiple concurrent export calls are allowed to be made by the caller. +Critically, both sync-ABI-calls-async-ABI and async-ABI-calls-sync-ABI pairings +have well-defined, composable behavior for both inter-component and +intra-component calls. + +In addition to adding a new async *ABI* for use by the language's compiler and +runtime, the Component Model also adds a new `async` [effect type] that can be +added to function types (in both WIT and raw component function type +definitions) to indicate that the function may block before returning its value. ```wit -interface http-handler { - use http-types.{request, response, error}; - handle: async func(r: request) -> result; +interface processor { + process: async func(in: inputs) -> outputs; /* may block */ + ready: func() -> bool; /* may not block */ } ``` -Since `async` is just a hint, this `handle` function can be called using both -the sync and async ABIs. Bindings generators can even generate both variants -side-by-side, giving the developer the choice. +When a function type does *not* contain `async`, the Component Model traps if +the callee blocks at runtime before returning a value (as described in more +detail [below](#blocking)). Thus, the absence of `async` means that a host or +component caller never needs to handle the case where the callee blocks before +returning a value. For hosts like browsers with event-loop concurrency, this +invariant is necessary to allow non-`async` component exports to be called in +synchronous contexts (like event listeners, callbacks, getters, setters and +constructors). + +Because `async` function exports may be implemented with the *sync* ABI and +then call `async` function imports using the *sync* ABI, traditional sync code +can compile directly to components exporting `async` functions without having +to be rewritten to use source-language concurrency mechanisms (like callbacks, +`async`/`wait`, coroutines, etc). For example, traditional C programs with a +`main()` and calls to `read()`, `write()` and `select()` can run without change +in the Preview 3 `wasi:cli/command` world, which exports `run: async func() -> +result`. Thus, `async` in WIT does not require the same kind of transitive +source-code changes as source-level `async` in langauges like C#, Python, JS, +Rust and Dart. + +Because `async` exports impose little to no requirements on the guest +language's style of concurrency, most `world`s (including `wasi:cli/command`, +`wasi:http/service` and `wasi:http/middleware`) are expected to export `async` +functions so that the contained Core WebAssembly code is free to block. +Implementing a non-`async` function will primarily only arise when a component +is *virtualizing* the non-`async` *imports* of a `world` (e.g., the getters and +setters of `wasi:http/types.headers`). In this virtualization scenario (once +functions are allowed to be [recursive](#TODO)), the Canonical ABI and/or Core +WebAssembly [stack-switching] proposal will allow a parent component to +implement a child's non-`async` imports in terms of the parent's `async` +imports in the same manner as [JSPI]. Thus, overall, `async` in WIT and the +Component Model does not behave like a "color" in the sense described by the +popular [What Color Is Your Function?] essay. Each time a component export is called, the wasm runtime logically spawns a new [green thread] (as opposed to a [kernel thread]) to execute the export call @@ -118,8 +144,8 @@ In addition to the *implicit* threads logically created for export calls, Core WebAssembly code can also *explicitly* create new green threads by calling the [`thread.new-indirect`] built-in. Regardless of how they were created, all threads can call a set of Component Model-defined `thread.*` built-in functions -(listed [below](#waiting)) to suspend themselves and/or resume other threads. -These built-ins provide sufficient functionality to implement both the +(listed [below](#thread-built-ins)) to suspend themselves and/or resume other +threads. These built-ins provide sufficient functionality to implement both the internally-scheduled "green thread" and the externally-scheduled "host thread" use cases mentioned in the [goals](#goals). @@ -155,22 +181,26 @@ runtime while waiting in the event loop. To propagate backpressure, it's necessary for a component to be able to say "there are too many concurrent export calls already in progress, don't start -any more until I let some of them complete". Thus, the Component Model provides -a built-in way for a component instance to apply and release backpressure that -callers must always be prepared to handle. - -With this backpressure mechanism in place, there is a natural way for sync and -async code to interoperate: -1. If an async component calls a sync component and the sync component blocks, - execution is immediately returned to the async component, effectively - suspending the sync component. -2. If anyone tries to reenter the now-suspended sync component, the Component - Model automatically signals backpressure on the suspended component's - behalf. - -Thus, backpressure combined with the partitioning of low-level state provided -by the Component Model enables sync and async code to interoperate while -preserving the expectations of both. +any more `async` calls until I let some of the current calls complete". Thus, +the Component Model provides a built-in way for a component instance to apply +and release backpressure that callers experience by having their import call +immediately block. + +This backpressure mechanism provides the basis for how the sync and async ABIs +interoperate: +1. If a component calls an import using the async ABI, and the import is + implemented by a component using the sync ABI, and the callee blocks, + execution is immediately transferred back to the caller (as required by the + async ABI) and the callee's component instance is marked "suspended". +2. If another async call attempts to start in a "suspended" component instance, + the Component Model automatically makes the call block, the same way as when + backpressure is active. + +Note that because functions without `async` in their type are not allowed to +block, non-`async` functions do not check for backpressure or suspension; they +always run synchronously. Components exporting a mix of `async` and non-`async` +functions (which again mostly only arises in the more advanced virtualization +scenarios) must thus take care to handle non-`async` reentrance gracefully. Lastly, WIT is extended with two new type constructorsβ€”`future` and `stream`β€”to allow new WIT interfaces to explicitly represent concurrency in @@ -326,6 +356,57 @@ object as an argument to all function calls so that the semantic "current thread" is always the value of the `thread` parameter. Threads store their containing task so that the "current task" is always `thread.task`. +Because there is always a well-defined current task and tasks are always +created for calls to typed functions, it is therefore also always well-defined +to refer to the current task's function's type, e.g., when returning a value or +determining whether blocking is allowed. + +### Thread Built-ins + +The Component Model provides a set of built-in Core WebAssembly functions for +creating and running threads. + +New threads are created with the [`thread.new-indirect`] built-in. As mentioned +[above](#threads-and-tasks), a spawned thread inherits the task of the spawning +thread which is why threads and tasks are N:1. `thread.new-indirect` adds a new +thread to the component instance's table and returns the `i32` index of this +table entry to the Core WebAssembly caller. Like [`pthread_create`], +`thread.new-indirect` takes a Core WebAssembly function (via `i32` index into a +`funcref` table) and a "closure" parameter to pass to the function when called +on the new thread. However, unlike `pthread_create`, the new thread is +initially in a "suspended" state and must be explicitly "resumed" using one of +the following 3 thread built-ins. Once the thread is resumed, the thread can +learn its own index by calling the [`thread.index`] built-in. + +A suspended thread (identified by table index) can be resumed at some +non-deterministic point in future via the [`thread.resume-later`] built-in. In +contrast, the [`thread.yield-to`] built-in switches execution to the given +thread immediately, leaving the *calling* thread to be resumed at some +non-deterministic point in the future. Lastly, the [`thread.switch-to`] +built-in switches execution to the given thread immediately, like `yield-to`, +but leaves the calling thread in the "suspended" state. These three functions +can be used to resume both newly-created threads as well as threads that +executed and then suspended. + +In addition to threads entering the suspended state via `thread.new-indirect` +and `thread.switch-to`, threads can also explicitly suspend themselves via the +[`thread.suspend`] built-in. Thus, there are three ways a thread *enters* the +suspended state and three ways a thread *exits* the suspended state (with +`thread.switch-to` serving in both categories). Together, these 5 thread +built-ins support both the "green thread" [use cases](#goals) where Core +WebAssembly code running inside the component wants to fully control thread +scheduling (via `thread.switch-to` and `thread.suspend`) as well as the "host +thread" use cases where the Core WebAssembly code wants to let the containing +runtime nondeterministically schedule threads (via `thread.resume-later` or +`thread.yield-to`). + +Lastly, since threads are cooperative, there is a [`thread.yield`] built-in +that can be called in the middle of long-running computations to allow the +runtime to nondeterministically switch execution to another thread. +`thread.yield` is equivalent to (but obviously more efficient than) creating a +new thread with a no-op function (via `thread.new-indirect`) and then yielding +to it (via `thread.yield-to`). + ### Thread-Local Storage Each thread contains a distinct mutable **thread-local storage** array. The @@ -374,6 +455,69 @@ allowing control flow to escape into the wild. For more information, see [`context.get`] in the AST explainer. +### Blocking + +When a thread calls an import using the async ABI, the Component Model +guarantees that if the callee **blocks**, control flow is immediately returned +back to the caller. When the callee is implemented by the *host*, what counts as +"blocking" is up to the host; e.g., the host can arbitrarily determine whether +file I/O "blocks" or not depending on whether the host is implemented using +traditional synchronous OS syscalls or an asynchronous `io_uring`. However, +when the callee is implemented by another component, the Component Model +defines exactly what counts as "blocking". + +At a high level, there are six ways for a call to a component export to block, +all of which are described above or below in more detail: +* calling an `async`-typed function import using the sync ABI +* suspending the current thread via the + [`thread.suspend`](#thread-built-ins) built-in +* cooperatively yielding (e.g., during a long-running computation) via the + [`thread.yield`](#thread-built-ins) built-in +* waiting for one of a set of concurrent operations to complete via the + [`waitable-set.{wait,poll}`](#waitables-and-waitable-sets) built-ins +* waiting for a stream or future operation to complete via the + [`{stream,future}.{,cancel-}{read,write}`](#streams-and-futures) built-ins +* waiting for a subtask to cooperatively cancel itself via the + [`subtask.cancel`](#cancellation) built-in + +At each of these points, the [current thread](#current-thread-and-task) will be +suspended and execution will transfer to a caller's thread, if there is one. +Additionally, each of these potentially-blocking operations will trap if the +[current task's function type](#current-thread-and-task) does not declare the +`async` effect, since only `async`-typed functions are allowed to block. As an +exception, to allow it to be called arbitrarily from anywhere, `thread.yield` +does not trap but instead behaves as a no-op if the current task's function +type does not contain `async`. + +The [Canonical ABI explainer] defines the above behavior more precisely; search +for `may_block` to see all the relevant points. + +### Waitables and Waitable Sets + +When an `async`-typed function is called with the async ABI and the call +[blocks](#blocking) before returning a value, the return value to the Core +WebAssembly caller is the index of a newly-created **subtask** representing the +concurrent execution of the callee. Subtasks are a kind of **waitable**. +Multiple waitables can be added to a **waitable set** to wait for one of them +to make progress. Waitable sets work like a simplified version of [`epoll`] and +are designed to avoid the O(N) cost associated with traditional +[`select`]-style primitives. + +Specifically, waitable sets are created and used via the following built-ins: +* [`waitable-set.new`]: return a new empty waitable set +* [`waitable.join`]: add, move, or remove a given waitable to/from a given + waitable set +* [`waitable-set.wait`]: suspend until one of the waitables in the given set + has a pending event and then return that event +* [`waitable-set.poll`]: first `thread.yield` and, once resumed, if any of the + waitables in the given set has a pending event, return that event; otherwise + return a sentinel "none" value + +In addition to subtasks, (the readable and writable ends of) [streams and +futures](#streams-and-futures) are *also* waitables, which means that a single +waitable set can uniformly wait on all the kinds of heterogeneous I/O available +in the Component Model. + ### Streams and Futures Streams and Futures have two "ends": a *readable end* and *writable end*. When @@ -403,12 +547,13 @@ pointer and length of a linear-memory buffer to write-into or read-from, resp. These built-ins can either return immediately if >0 elements were able to be written or read immediately (without blocking) or return a sentinel "blocked" value indicating that the read or write will execute concurrently. The readable -and writable ends of streams and futures can then be [waited](#waiting) on to -make progress. Notification of progress signals *completion* of a read or write -(i.e., the bytes have already been copied into the buffer). Additionally, -*readiness* (to perform a read or write in the future) can be queried and -signalled by performing a `0`-length read or write (see the [Stream State] -section in the Canonical ABI explainer for details). +and writable ends of streams and futures can then be +[waited](#waitables-and-waitable-sets) on to make progress. Notification of +progress signals *completion* of a read or write (i.e., the bytes have already +been copied into the buffer). Additionally, *readiness* (to perform a read or +write in the future) can be queried and signalled by performing a `0`-length +read or write (see the [Stream State] section in the Canonical ABI explainer +for details). As a temporary limitation, if a `read` and `write` for a single stream or future occur from within the same component and the element type is non-empty, @@ -462,8 +607,8 @@ scheme is also possible for `read()`): * When `select()` is called to wait for a stream-backed file descriptor to be writable: * `select()` starts a zero-length write if there is not already a pending - write in progress and then [waits](#waiting) on the stream (along with the - other `select()` arguments). + write in progress and then [waits](#waitables-and-waitable-sets) on the + stream (along with the other `select()` arguments). * If the pending write completes, `select()` updates the file descriptor and returns that the file descriptor is ready. * When `write()` is called for an `O_NONBLOCKING` file descriptor: @@ -490,79 +635,6 @@ readiness resembles the buffering normally performed by the kernel for a `write` syscall and reflects the fact that streams do not perform internal buffering between the readable and writable ends. -### Waiting - -When a component asynchronously lowers an import, it is explicitly requesting -that, if the import blocks, control flow be returned back to the calling thread -so that it can do something else. Similarly, if `stream.read` or `stream.write` -are called asynchronously and block, they return a "blocked" code so that the -caller can continue to make progress on other things. But eventually, a thread -will run out of other things to do and will need to wait for something else to -happen by **suspending** itself until something else happens. - -The following three built-ins put threads into a suspended state: -* [`thread.new-indirect`]: create a new thread that is initially suspended - and continue executing the current thread -* [`thread.switch-to`]: suspend the current thread and immediately resume a - given thread -* [`thread.suspend`]: suspend the current thread and resume any transitive - async caller on the stack - -These built-ins enable "green thread" [use cases](#goals), allowing the -language's runtime (compiled to wasm) to deterministically control which thread -executes when. - -The following three built-ins can additionally be called to -nondeterministically resume a thread at some point in the future (allowing the -embedder to use whatever scheduler heuristics based on, e.g., timing and -priority): - -* [`thread.resume-later`]: resume a given thread at some point in the future - and continue executing in the current thread -* [`thread.yield-to`]: immediately resume a given thread and resume the current - thread at some point in the future -* [`thread.yield`]: immediately resume *some* (nondeterministically-selected) - other thread and resume the current thread at some point in the future - -These built-ins enable the "host thread" [use cases](#goals), allowing the -embedder to nondeterministically control which thread is resumed when. In -particular, [`pthread_create`] can be implemented using `thread.new-indirect` -and either `thread.resume-later` or `thread.yield-to` (thereby allowing the -pthreads implementation to choose whether to execute a new pthread eagerly or -not). - -Additionally, a thread may need to wait for progress to be made on an async -subtask or stream/future read/write in progress. Subtasks and readable/writable -ends of streams/futures are collectively called **waitables** and can be put -into **waitable sets** which a thread can wait on. Waitable sets avoid the O(N) -cost of passing and examining a list of waitables every time a thread needs to -wait for progress in the same manner as, e.g., `epoll`. - -In particular, the following built-ins allow building and using waitable sets: -* [`waitable-set.new`]: return a new empty waitable set -* [`waitable.join`]: add, move, or remove a given waitable to/from a given - waitable set -* [`waitable-set.wait`]: suspend until one of the waitables in the given set - has a pending event and then return that event -* [`waitable-set.poll`]: first `thread.yield` and, once resumed, if any of the - waitables in the given set has a pending event, return that event; otherwise - return a sentinel "none" value - -Threads that are explicitly suspended (via `thread.new-indirect`, -`thread.switch-to` or `thread.suspend`) will stay suspended indefinitely until -explicitly resumed (via `thread.switch-to`, `thread.resume-later`, -`thread.yield-to`). Attempting to explicitly resume a thread that was *not* -explicitly suspended by one of these three built-ins traps. For example, -attempting to `thread.resume-later` a thread waiting on `waitable-set.wait` or -a synchronous import call will trap. Thus, language runtimes and compilers have -to be careful when using a mix of explicit and implicit suspension/resumption. - -Lastly, when an async function is implemented using the `callback` suboption -(mentioned in the [summary](#summary)), instead of calling `wait`, `poll` or -`yield`, as an optimization, the `callback` function can *return* to wait in -the event loop, minimizing switching costs and freeing up the stack in the -interim. - ### Backpressure Once a component exports functions using the async ABI, multiple concurrent @@ -588,6 +660,13 @@ event, allowing a higher degree of concurrency than synchronous exports. Stackfull async exports ignore the lock entirely and thus achieve the highest degree of (cooperative) concurrency. +Since non-`async` functions are not allowed to block (including due to +backpressure) and also don't pile up like `async` functions, non-`async` +functions ignore backpressure (explicit and implicit) entirely. If a +component exports a mix of `async` and non-`async` functions, code generation +must therefore be prepared to handle non-`async` functions executing at +any cooperative yield point, even in the middle of a `callback`. + Once a task is allowed to start according to these backpressure rules, its arguments are lowered into the callee's linear memory and the task is in the "started" state. @@ -616,7 +695,8 @@ thread created implicitly for the initial export call or some thread transitively created by that thread) must call `task.return`. Once `task.return` is called, the task is in the "returned" state. Calling -`task.return` when not in the "started" state traps. +`task.return` when not in the "started" state traps. Once in a "returned" +state, non-`async` functions are allowed to block. ### Borrows @@ -676,7 +756,7 @@ all. When `subtask.cancel` is called, it will attempt to immediately resume one of the subtask's threads which is in a cancellable state, passing it a sentinel "cancelled" value. A thread is in a "cancellable" state if it calls one of the -[waiting](#waiting) built-ins with the `cancellable` immediate set (indicating +[blocking](#blocking) built-ins with the `cancellable` immediate set (indicating that the caller expects and propagates cancellation appropriately) or, if using a `callback`, returns to the event loop (which always waits cancellably). If a subtask has no cancellable threads, no thread is resumed and the request for @@ -685,13 +765,13 @@ the next cancellable wait. In the worst case, though, a component may never wait cancellably and thus cancellation may be silently ignored. `subtask.cancel` can be called synchronously or asynchronously. If called -synchronously, `subtask.cancel` waits until the subtask reaches a resolved +synchronously, `subtask.cancel` blocks until the subtask reaches a resolved state and returns which state was reached. If called asynchronously, then if a cancellable subtask thread is resumed *and* the subtask reaches a resolved state before suspending itself for whatever reason `subtask.cancel` will return which state was reached. Otherwise, `subtask.cancel` will return a "blocked" -sentinel value and the caller must [wait][waiting] via waitable set until the -subtask reaches a resolved state. +sentinel value and the caller must [wait][#waitables-and-waitable-sets] via +waitable set until the subtask reaches a resolved state. The Component Model does not provide a mechanism to force prompt termination of threads as this can lead to leaks and corrupt state in a still-live component @@ -806,21 +886,24 @@ function an async-oriented core function signature that can be used instead of or in addition to the existing (Preview-2-defined) synchronous core function signature. This async-oriented core function signature is intended to be called or implemented by generated bindings which then map the low-level core async -protocol to the languages' higher-level native concurrency features. Because -the WIT-level `async` attribute is purely a *hint* (as mentioned -[above](#summary)), *every* WIT function has an async core function signature; -`async` just provides hints to the bindings generator for which to use by -default. +protocol to the languages' higher-level native concurrency features. + +Note that *every* WIT-level function type can be lifted and lowered using the +async (or sync) ABI. While calling a non-`async`-typed function import using +the async ABI will never returned that the call "blocked" (as guaranteed by the +Component Model trapping if the callee would have blocked), the async ABI is +still allowed to be used (for the benefit of code generators that only want +to think about one ABI). ### Async Import ABI Given these imported WIT functions (using the fixed-length-list feature πŸ”§): ```wit world w { - import foo: func(s: string) -> u32; - import bar: func(s: string) -> string; - import baz: func(t: list) -> string; - import quux: func(t: list) -> string; + import foo: async func(s: string) -> u32; + import bar: async func(s: string) -> string; + import baz: async func(t: list) -> string; + import quux: async func(t: list) -> string; } ``` the default/synchronous lowered import function signatures are: @@ -870,22 +953,22 @@ receiving an event indicating that the async subtask has started/returned. Other example asynchronous lowered signatures: -| WIT function type | Async ABI | -| ----------------------------------------- | --------------------- | -| `func()` | `(func (result i32))` | -| `func() -> string` | `(func (param $out-ptr i32) (result i32))` | -| `func(x: f32) -> f32` | `(func (param $x f32) (param $out-ptr i32) (result i32))` | -| `func(s: string, t: string)` | `(func (param $s-ptr i32) (param $s-len i32) (result $t-ptr i32) (param $t-len i32) (result i32))` | +| WIT function type | Async ABI | +| ---------------------------------- | --------------------- | +| `async func()` | `(func (result i32))` | +| `async func() -> string` | `(func (param $out-ptr i32) (result i32))` | +| `async func(x: f32) -> f32` | `(func (param $x f32) (param $out-ptr i32) (result i32))` | +| `async func(s: string, t: string)` | `(func (param $s-ptr i32) (param $s-len i32) (result $t-ptr i32) (param $t-len i32) (result i32))` | `future` and `stream` can appear anywhere in the parameter or result types. For example: ```wit -func(s1: stream>, s2: list>) -> result, stream> +async func(s1: stream>, s2: list>) -> result, stream> ``` In *both* the sync and async ABIs, a `future` or `stream` in the WIT-level type translates to a single `i32` in the ABI. This `i32` is an index into the current component instance's handle table. For example, for the WIT function type: ```wit -func(f: future) -> future +async func(f: future) -> future ``` the synchronous ABI has signature: ```wat @@ -910,7 +993,7 @@ Explainer]. For a complete description of how async imports work, see Given an exported WIT function: ```wit world w { - export foo: func(s: string) -> string; + export foo: async func(s: string) -> string; } ``` @@ -1003,7 +1086,7 @@ Starting with the stackful ABI, the meat of this example component is replaced with `...` to focus on the overall flow of function calls: ```wat (component - (import "fetch" (func $fetch (param "url" string) (result (list u8)))) + (import "fetch" (func $fetch async (param "url" string) (result (list u8)))) (core module $Libc (memory (export "mem") 1) (func (export "realloc") (param i32 i32 i32 i32) (result i32) ...) @@ -1063,7 +1146,7 @@ with `...` to focus on the overall flow of function calls: )))) (canon lift (core func $main "summarize") async (memory $mem) (realloc $realloc) - (func $summarize (param "urls" (list string)) (result string))) + (func $summarize async (param "urls" (list string)) (result string))) (export "summarize" (func $summarize)) ) ``` @@ -1086,6 +1169,10 @@ call to `waitable-set.wait` blocks, the runtime will suspend its callstack entry point and store it in context-local storage (via `context.set`) instead of simply using a `global`, as in a synchronous function. +Note that removing `async` from the type of `summarize` specified in the `canon +lift` definition would cause the above component to trap when it attempted to +call `waitable-set.wait`. + ### Stackless ABI example The stackful example can be re-written to use the `callback` immediate (thereby @@ -1101,7 +1188,7 @@ core wasm code between events, not externally-visible behavior. ```wat (component - (import "fetch" (func $fetch (param "url" string) (result (list u8)))) + (import "fetch" (func $fetch async (param "url" string) (result (list u8)))) (core module $Libc (memory (export "mem") 1) (func (export "realloc") (param i32 i32 i32 i32) (result i32) ...) @@ -1168,7 +1255,7 @@ core wasm code between events, not externally-visible behavior. )))) (canon lift (core func $main "summarize") async (callback (core func $main "cb")) (memory $mem) (realloc $realloc) - (func $summarize (param "urls" (list string)) (result string))) + (func $summarize async (param "urls" (list string)) (result string))) (export "summarize" (func $summarize)) ) ``` @@ -1213,9 +1300,11 @@ comes after: [wasmio-2025]: https://www.youtube.com/watch?v=mkkYNw8gTQg [Color]: https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ +[What Color Is Your Function?]: https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ [Weak Memory Model]: https://people.mpi-sws.org/~rossberg/papers/Watt,%20Rossberg,%20Pichon-Pharabod%20-%20Weakening%20WebAssembly%20[Extended].pdf [Fiber]: https://en.wikipedia.org/wiki/Fiber_(computer_science) +[Effect Type]: https://en.wikipedia.org/wiki/Effect_system [CPS Transform]: https://en.wikipedia.org/wiki/Continuation-passing_style [Asyncify]: https://emscripten.org/docs/porting/asyncify.html [Session Types]: https://en.wikipedia.org/wiki/Session_type diff --git a/design/mvp/Explainer.md b/design/mvp/Explainer.md index 46c329b1..fc053745 100644 --- a/design/mvp/Explainer.md +++ b/design/mvp/Explainer.md @@ -568,7 +568,7 @@ valtype ::= | resourcetype ::= (resource (rep i32) (dtor )?) | (resource (rep i32) (dtor async (callback )?)?) 🚝 -functype ::= (func (param "