4 changes: 2 additions & 2 deletions .github/workflows/main.yml
@@ -11,6 +11,6 @@ jobs:
     runs-on: ubuntu-latest
     steps:
     - uses: actions/checkout@v2
-    - uses: WebAssembly/wit-abi-up-to-date@v4
+    - uses: WebAssembly/wit-abi-up-to-date@v13
       with:
-        wit-abi-tag: wit-abi-0.4.0
+        wit-abi-tag: wit-abi-0.11.0
15 changes: 6 additions & 9 deletions CONTRIBUTING.md
@@ -5,18 +5,15 @@ same guidelines as the [WASI contribution guidelines].

[WASI contribution guidelines]: https://github.com/WebAssembly/WASI/blob/main/Contributing.md

-The specification is written in WIT (WebAssembly Interface Types) syntax in code
-blocks inside a Markdown file (`wasi-nn.wit.md`). This is parseable by WIT tools
-(e.g., [`wit-bindgen`], [`wit-abi`]). Note that, when altering the WIT
-specification, the ABI file (`wasi-nn.abi.md`) must also be updated or CI will
-fail. To use [`wit-abi`] to update the ABI file, run:
+The specification is written in WIT (WebAssembly Interface Types); see `wit/wasi-nn.wit`.
+This is parseable by WIT tools (e.g., [`wit-bindgen`], [`wit-abi`]). Note that, when altering the
+WIT specification, the Markdown example file (`ml-example.md`) must also be updated or CI will fail.
+To use [`wit-abi`] to update the ABI file, run:

[`wit-bindgen`]: https://github.com/bytecodealliance/wit-bindgen
[`wit-abi`]: https://github.com/WebAssembly/wasi-tools/tree/main/wit-abi

```console
-$ git clone https://github.com/WebAssembly/wasi-tools
-$ cd wasi-tools/wit-abi
-$ cargo build
-$ target/debug/wit-abi <path to wasi-nn directory>
+$ cargo install wit-abi --locked --git https://github.com/WebAssembly/wasi-tools --tag wit-abi-0.11.0
+$ wit-abi markdown --html-in-md wit
```
202 changes: 202 additions & 0 deletions ml-example.md
@@ -0,0 +1,202 @@
<h1><a name="ml_example">World ml-example</a></h1>
<p><code>wasi-nn</code> is a WASI API for performing machine learning (ML) inference. The API is not (yet)
capable of performing ML training. WebAssembly programs that want to use a host's ML
capabilities can access these capabilities through <code>wasi-nn</code>'s core abstractions: <em>graphs</em> and
<em>tensors</em>. A user <a href="#load"><code>load</code></a>s a model -- instantiated as a <em>graph</em> -- to use in an ML <em>backend</em>.
Then, the user passes <em>tensor</em> inputs to the <em>graph</em>, computes the inference, and retrieves the
<em>tensor</em> outputs.</p>
<p>This example world shows how to use these primitives together.</p>
<ul>
<li>Imports:
<ul>
<li>interface <a href="#wasi:nn_tensor"><code>wasi:nn/tensor</code></a></li>
<li>interface <a href="#wasi:nn_errors"><code>wasi:nn/errors</code></a></li>
<li>interface <a href="#wasi:nn_graph"><code>wasi:nn/graph</code></a></li>
<li>interface <a href="#wasi:nn_inference"><code>wasi:nn/inference</code></a></li>
</ul>
</li>
</ul>
<h2><a name="wasi:nn_tensor">Import interface wasi:nn/tensor</a></h2>
<p>All inputs and outputs to an ML inference are represented as <a href="#tensor"><code>tensor</code></a>s.</p>
<hr />
<h3>Types</h3>
<h4><a name="tensor_type"><code>enum tensor-type</code></a></h4>
<p>The type of the elements in a tensor.</p>
<h5>Enum Cases</h5>
<ul>
<li><a name="tensor_type.fp16"><code>fp16</code></a></li>
<li><a name="tensor_type.fp32"><code>fp32</code></a></li>
<li><a name="tensor_type.bf16"><code>bf16</code></a></li>
<li><a name="tensor_type.up8"><code>up8</code></a></li>
<li><a name="tensor_type.ip32"><code>ip32</code></a></li>
</ul>
<h4><a name="tensor_dimensions"><code>type tensor-dimensions</code></a></h4>
<p><a href="#tensor_dimensions"><a href="#tensor_dimensions"><code>tensor-dimensions</code></a></a></p>
<p>The dimensions of a tensor.
<p>The array length matches the tensor rank and each element in the array describes the size of
each dimension</p>
<h4><a name="tensor_data"><code>type tensor-data</code></a></h4>
<p><a href="#tensor_data"><a href="#tensor_data"><code>tensor-data</code></a></a></p>
<p>The tensor data.
<p>Initially conceived as a sparse representation, each empty cell would be filled with zeros
and the array length must match the product of all of the dimensions and the number of bytes
in the type (e.g., a 2x2 tensor with 4-byte f32 elements would have a data array of length
16). Naturally, this representation requires some knowledge of how to lay out data in
memory--e.g., using row-major ordering--and could perhaps be improved.</p>
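The length rule described above can be checked mechanically. The following is a minimal sketch, not part of the specification: the function names and the `ELEMENT_SIZE` table are hypothetical, and it assumes that `up8` and `ip32` denote 8-bit unsigned and 32-bit signed integers and that data is laid out in row-major order.

```python
from math import prod

# Bytes per element for each `tensor-type` case listed above.
# Assumption: `up8` is an 8-bit unsigned and `ip32` a 32-bit signed integer.
ELEMENT_SIZE = {"fp16": 2, "fp32": 4, "bf16": 2, "up8": 1, "ip32": 4}


def expected_data_length(dimensions, tensor_type):
    """Return the byte length `tensor-data` must have for this shape and type."""
    return prod(dimensions) * ELEMENT_SIZE[tensor_type]


def is_well_formed(dimensions, tensor_type, data):
    """Check that `data` holds exactly one element per cell of the tensor."""
    return len(data) == expected_data_length(dimensions, tensor_type)


def row_major_offset(dimensions, index, tensor_type):
    """Byte offset of the element at `index`, assuming row-major (C-order) layout."""
    flat = 0
    for size, i in zip(dimensions, index):
        if not 0 <= i < size:
            raise IndexError("index out of bounds")
        # Each step scales the running offset by the size of the next dimension.
        flat = flat * size + i
    return flat * ELEMENT_SIZE[tensor_type]
```

For the 2x2 `fp32` case mentioned in the text, `expected_data_length([2, 2], "fp32")` gives 16, and the element at row 1, column 0 starts at byte 8.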
<h4><a name="tensor"><code>record tensor</code></a></h4>
<h5>Record Fields</h5>
<ul>
<li><a name="tensor.dimensions"><code>dimensions</code></a>: <a href="#tensor_dimensions"><a href="#tensor_dimensions"><code>tensor-dimensions</code></a></a></li>
<li><a name="tensor.tensor_type"><a href="#tensor_type"><code>tensor-type</code></a></a>: <a href="#tensor_type"><a href="#tensor_type"><code>tensor-type</code></a></a></li>
<li><a name="tensor.data"><code>data</code></a>: <a href="#tensor_data"><a href="#tensor_data"><code>tensor-data</code></a></a></li>
</ul>
<h2><a name="wasi:nn_errors">Import interface wasi:nn/errors</a></h2>
<p>TODO: create function-specific errors (https://github.com/WebAssembly/wasi-nn/issues/42)</p>
<hr />
<h3>Types</h3>
<h4><a name="error"><code>enum error</code></a></h4>
<h5>Enum Cases</h5>
<ul>
<li><a name="error.invalid_argument"><code>invalid-argument</code></a></li>
<li><a name="error.invalid_encoding"><code>invalid-encoding</code></a></li>
<li><a name="error.busy"><code>busy</code></a></li>
<li><a name="error.runtime_error"><code>runtime-error</code></a></li>
<li><a name="error.unsupported_operation"><code>unsupported-operation</code></a></li>
<li><a name="error.model_too_large"><code>model-too-large</code></a></li>
<li><a name="error.model_not_found"><code>model-not-found</code></a></li>
</ul>
<h2><a name="wasi:nn_graph">Import interface wasi:nn/graph</a></h2>
<p>A <a href="#graph"><code>graph</code></a> is a loaded instance of a specific ML model (e.g., MobileNet) for a specific ML
framework (e.g., TensorFlow):</p>
<hr />
<h3>Types</h3>
<h4><a name="error"><code>type error</code></a></h4>
<p><a href="#error"><a href="#error"><code>error</code></a></a></p>
<p>
#### <a name="tensor">`type tensor`</a>
[`tensor`](#tensor)
<p>
#### <a name="graph_encoding">`enum graph-encoding`</a>
<p>Describes the encoding of the graph. This allows the API to be implemented by various
backends that encode (i.e., serialize) their graph IR with different formats.</p>
<h5>Enum Cases</h5>
<ul>
<li><a name="graph_encoding.openvino"><code>openvino</code></a></li>
<li><a name="graph_encoding.onnx"><code>onnx</code></a></li>
<li><a name="graph_encoding.tensorflow"><code>tensorflow</code></a></li>
<li><a name="graph_encoding.pytorch"><code>pytorch</code></a></li>
<li><a name="graph_encoding.tensorflowlite"><code>tensorflowlite</code></a></li>
<li><a name="graph_encoding.autodetect"><code>autodetect</code></a></li>
</ul>
<h4><a name="graph_builder"><code>type graph-builder</code></a></h4>
<p><a href="#graph_builder"><a href="#graph_builder"><code>graph-builder</code></a></a></p>
<p>The graph initialization data.
<p>This gets bundled up into an array of buffers because implementing backends may encode their
graph IR in parts (e.g., OpenVINO stores its IR and weights separately).</p>
<h4><a name="graph"><code>type graph</code></a></h4>
<p><code>u32</code></p>
<p>An execution graph for performing inference (i.e., a model).
<p>TODO: replace with <code>resource</code> (https://github.com/WebAssembly/wasi-nn/issues/47).</p>
<h4><a name="execution_target"><code>enum execution-target</code></a></h4>
<p>Define where the graph should be executed.</p>
<h5>Enum Cases</h5>
<ul>
<li><a name="execution_target.cpu"><code>cpu</code></a></li>
<li><a name="execution_target.gpu"><code>gpu</code></a></li>
<li><a name="execution_target.tpu"><code>tpu</code></a></li>
</ul>
<hr />
<h3>Functions</h3>
<h4><a name="load"><code>load: func</code></a></h4>
<p>Load a <a href="#graph"><code>graph</code></a> from an opaque sequence of bytes to use for inference.</p>
<h5>Params</h5>
<ul>
<li><a name="load.builder"><code>builder</code></a>: list&lt;<a href="#graph_builder"><a href="#graph_builder"><code>graph-builder</code></a></a>&gt;</li>
<li><a name="load.encoding"><code>encoding</code></a>: <a href="#graph_encoding"><a href="#graph_encoding"><code>graph-encoding</code></a></a></li>
<li><a name="load.target"><code>target</code></a>: <a href="#execution_target"><a href="#execution_target"><code>execution-target</code></a></a></li>
</ul>
<h5>Return values</h5>
<ul>
<li><a name="load.0"></a> result&lt;<a href="#graph"><a href="#graph"><code>graph</code></a></a>, <a href="#error"><a href="#error"><code>error</code></a></a>&gt;</li>
</ul>
<h4><a name="load_named_model"><code>load-named-model: func</code></a></h4>
<p>Load a <a href="#graph"><code>graph</code></a> by name.</p>
<p>How the host expects the names to be passed and how it stores the graphs for retrieval via
this function is <strong>implementation-specific</strong>. This allows hosts to choose name schemes that
range from simple to complex (e.g., URLs?) and caching mechanisms of various kinds.</p>
<h5>Params</h5>
<ul>
<li><a name="load_named_model.name"><code>name</code></a>: <code>string</code></li>
</ul>
<h5>Return values</h5>
<ul>
<li><a name="load_named_model.0"></a> result&lt;<a href="#graph"><a href="#graph"><code>graph</code></a></a>, <a href="#error"><a href="#error"><code>error</code></a></a>&gt;</li>
</ul>
<h2><a name="wasi:nn_inference">Import interface wasi:nn/inference</a></h2>
<p>An inference &quot;session&quot; is encapsulated by a <a href="#graph_execution_context"><code>graph-execution-context</code></a>. This structure binds a
<a href="#graph"><code>graph</code></a> to input tensors before <a href="#compute"><code>compute</code></a>-ing an inference:</p>
<hr />
<h3>Types</h3>
<h4><a name="error"><code>type error</code></a></h4>
<p><a href="#error"><a href="#error"><code>error</code></a></a></p>
<p>
#### <a name="tensor">`type tensor`</a>
[`tensor`](#tensor)
<p>
#### <a name="tensor_data">`type tensor-data`</a>
[`tensor-data`](#tensor_data)
<p>
#### <a name="graph">`type graph`</a>
[`graph`](#graph)
<p>
#### <a name="graph_execution_context">`type graph-execution-context`</a>
`u32`
<p>Bind a `graph` to the input and output tensors for an inference.
<p>TODO: this is no longer necessary in WIT (https://github.com/WebAssembly/wasi-nn/issues/43)</p>
<hr />
<h3>Functions</h3>
<h4><a name="init_execution_context"><code>init-execution-context: func</code></a></h4>
<p>Create an execution instance of a loaded graph.</p>
<h5>Params</h5>
<ul>
<li><a name="init_execution_context.graph"><a href="#graph"><code>graph</code></a></a>: <a href="#graph"><a href="#graph"><code>graph</code></a></a></li>
</ul>
<h5>Return values</h5>
<ul>
<li><a name="init_execution_context.0"></a> result&lt;<a href="#graph_execution_context"><a href="#graph_execution_context"><code>graph-execution-context</code></a></a>, <a href="#error"><a href="#error"><code>error</code></a></a>&gt;</li>
</ul>
<h4><a name="set_input"><code>set-input: func</code></a></h4>
<p>Define the inputs to use for inference.</p>
<h5>Params</h5>
<ul>
<li><a name="set_input.ctx"><code>ctx</code></a>: <a href="#graph_execution_context"><a href="#graph_execution_context"><code>graph-execution-context</code></a></a></li>
<li><a name="set_input.index"><code>index</code></a>: <code>u32</code></li>
<li><a name="set_input.tensor"><a href="#tensor"><code>tensor</code></a></a>: <a href="#tensor"><a href="#tensor"><code>tensor</code></a></a></li>
</ul>
<h5>Return values</h5>
<ul>
<li><a name="set_input.0"></a> result&lt;_, <a href="#error"><a href="#error"><code>error</code></a></a>&gt;</li>
</ul>
<h4><a name="compute"><code>compute: func</code></a></h4>
<p>Compute the inference on the given inputs.</p>
<p>Note the expected sequence of calls: <a href="#set_input"><code>set-input</code></a>, <a href="#compute"><code>compute</code></a>, <a href="#get_output"><code>get-output</code></a>. TODO: this
expectation could be removed as a part of https://github.com/WebAssembly/wasi-nn/issues/43.</p>
<h5>Params</h5>
<ul>
<li><a name="compute.ctx"><code>ctx</code></a>: <a href="#graph_execution_context"><a href="#graph_execution_context"><code>graph-execution-context</code></a></a></li>
</ul>
<h5>Return values</h5>
<ul>
<li><a name="compute.0"></a> result&lt;_, <a href="#error"><a href="#error"><code>error</code></a></a>&gt;</li>
</ul>
<h4><a name="get_output"><code>get-output: func</code></a></h4>
<p>Extract the outputs after inference.</p>
<h5>Params</h5>
<ul>
<li><a name="get_output.ctx"><code>ctx</code></a>: <a href="#graph_execution_context"><a href="#graph_execution_context"><code>graph-execution-context</code></a></a></li>
<li><a name="get_output.index"><code>index</code></a>: <code>u32</code></li>
</ul>
<h5>Return values</h5>
<ul>
<li><a name="get_output.0"></a> result&lt;<a href="#tensor_data"><a href="#tensor_data"><code>tensor-data</code></a></a>, <a href="#error"><a href="#error"><code>error</code></a></a>&gt;</li>
</ul>
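The expected call sequence (`set-input`, `compute`, `get-output`) can be illustrated with a toy in-memory stand-in. This is only a sketch under stated assumptions: `MockExecutionContext`, `Tensor`, and `NnError` are hypothetical names, not generated bindings; the mock copies input bytes to the outputs instead of running a model; and the error cases it raises are illustrative, since the interface does not say which `error` a host returns for out-of-order calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class NnError(Exception):
    """Stands in for the `error` enum; the message carries the case name."""


@dataclass
class Tensor:
    """Mirrors the fields of `record tensor`."""
    dimensions: List[int]
    tensor_type: str
    data: bytes


@dataclass
class MockExecutionContext:
    """Toy stand-in for `graph-execution-context` (hypothetical, not a real host)."""
    inputs: Dict[int, Tensor] = field(default_factory=dict)
    outputs: Dict[int, bytes] = field(default_factory=dict)
    computed: bool = False

    def set_input(self, index: int, tensor: Tensor) -> None:
        self.inputs[index] = tensor
        self.computed = False  # new inputs invalidate earlier results

    def compute(self) -> None:
        if not self.inputs:
            raise NnError("invalid-argument")  # assumed error case
        # A real backend would run the model here; the mock copies bytes through.
        self.outputs = {i: t.data for i, t in self.inputs.items()}
        self.computed = True

    def get_output(self, index: int) -> bytes:
        if not self.computed or index not in self.outputs:
            raise NnError("runtime-error")  # assumed error case
        return self.outputs[index]


# The sequence named in the `compute` documentation.
ctx = MockExecutionContext()
ctx.set_input(0, Tensor([2, 2], "fp32", bytes(16)))
ctx.compute()
output = ctx.get_output(0)
```

In the real API the context would come from `init-execution-context` on a `graph` returned by `load` or `load-named-model`; the mock skips those steps to keep the ordering constraint in focus.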
2 changes: 1 addition & 1 deletion wasi-nn.witx
@@ -1,4 +1,4 @@
-;; This WITX version of the wasi-nn API is retained for consistency only. See the `wasi-nn.wit`
+;; This WITX version of the wasi-nn API is retained for consistency only. See the `wit/wasi-nn.wit`
;; version for the official specification and documentation.

(typename $buffer_size u32)
File renamed without changes.