This repository implements a reference interpreter for WebAssembly. It is written for clarity and simplicity, not speed. It is intended as a playground for trying out ideas and a device for nailing down the exact semantics, and as a proxy for the (yet to be produced) formal specification of WebAssembly. For that purpose, the code is written in a fairly declarative, "speccy" way.
Currently, it can
- decode/parse and validate modules in binary or text format
- execute test scripts with assertions about modules
- convert between binary and text format (both directions)
- run as an interactive interpreter
The text format defines modules in S-expression syntax. Moreover, it is generalised to a (very dumb) form of script that can define multiples module and a batch of invocations, assertions, and conversions between them. As such it is richer than the binary format, with the additional functionality purely intended as testing infrastructure. (See below for details.)
You'll need OCaml 4.02 or higher. An easy way to get this on Linux is to download the source tarball from our mirror of the ocaml website and do the configure / make dance. On macOS, with Homebrew installed, simply
brew install ocaml ocamlbuild.
Once you have OCaml, simply do
You'll get an executable named
./wasm. This is a byte code executable. If you want a (faster) native code executable, do
To run the test suite,
To do everything:
Before committing changes, you should do
all, plus updates
Building on Windows
We recommend a pre-built installer. With this one you have two options:
- Bare OCaml. If you just want to build the interpreter and don't care about modifying it, you don't need to install the Cygwin core that comes with the installer. Just install OCaml itself and run
in a Windows shell, which creates a program named
wasm. Note that this will be a byte code executable only, i.e., somewhat slower.
- OCaml + Cygwin. If you want to build a native code executable, or want to hack on the interpreter (i.e., use incremental compilation), then you need to install the Cygwin core that is included with the OCaml installer. Then you can build the interpreter using
makein the Cygwin terminal, as described above.
Either way, in order to run the test suite you'll need to have Python installed. If you used Option 1, you can invoke the test runner
runtests.py directly instead of doing it through
Running Modules or Scripts
You can call the executable with
wasm [option | file ...]
file, depending on its extension, either should be an S-expression script file (see below) to be run, or a binary module file to be loaded.
By default, the interpreter validates all modules.
-u option selects "unchecked mode", which skips validation and runs code as is.
Runtime type errors will be captured and reported appropriately.
Converting Modules or Scripts
A file prefixed by
-o is taken to be an output file. Depending on its extension, this will write out the preceding module definition in either S-expression or binary format. This option can be used to convert between the two in both directions, e.g.:
wasm -d module.wast -o module.wasm wasm -d module.wasm -o module.wast
In the second case, the produced script contains exactly one module definition.
-d option selects "dry mode" and ensures that the input module is not run, even if it has a start section.
In addition, the
-u option for "unchecked mode" can be used to convert even modules that do not validate.
The interpreter can also convert entire test scripts:
wasm -d script.wast -o script.bin.wast wasm -d script.wast -o script2.wast wasm -d script.wast -o script.js
The first creates a new test scripts where all embedded modules are converted to binary, the second one where all are converted to textual.
-h can be used to omit the test harness from the converted file;
it then is the client's responsibility to provide versions of the necessary functions.
By default, the generated script will require
assert_soft_invalid (see below) to detect validation failures. Use the
-us flag ("unchecked soft") to deactivate these assertions to run on implementations that do not validate dead code.
Command Line Expressions
Finally, the option
-e allows to provide arbitrary script commands directly on the command line. For example:
wasm module.wasm -e '(invoke "foo")'
If neither a file nor any of the previous options is given, you'll land in the REPL and can enter script commands interactively. You can also get into the REPL by explicitly passing
- as a file name. You can do that in combination to giving a module file, so that you can then invoke its exports interactively, e.g.:
wasm module.wast -
wasm -h for (the few) additional options.
The implementation consumes a WebAssembly AST given in S-expression syntax. Here is an overview of the grammar of types, expressions, functions, and modules, mirroring what's described in the design doc:
value: <int> | <float> var: <int> | <name> name: $(<letter> | <digit> | _ | . | + | - | * | / | \ | ^ | ~ | = | < | > | ! | ? | @ | # | $ | % | & | | | : | ' | `)+ string: "(<char> | \n | \t | \\ | \' | \" | \<hex><hex>)*" type: i32 | i64 | f32 | f64 elem_type: anyfunc unop: ctz | clz | popcnt | ... binop: add | sub | mul | ... relop: eq | ne | lt | ... sign: s|u offset: offset=<nat> align: align=(1|2|4|8|...) cvtop: trunc_s | trunc_u | extend_s | extend_u | ... block_sig : <type>* func_sig: ( type <var> ) | <param>* <result>* global_sig: <type> | ( mut <type> ) table_sig: <nat> <nat>? <elem_type> memory_sig: <nat> <nat>? expr: ( <op> ) ( <op> <expr>+ ) ;; = <expr>+ (<op>) ( block <name>? <block_sig>? <instr>* ) ( loop <name>? <block_sig>? <instr>* ) ( if <name>? <block_sig>? ( then <instr>* ) ( else <instr>* )? ) ( if <name>? <block_sig>? <expr> ( then <instr>* ) ( else <instr>* )? ) ;; = (if <name>? <block_sig>? <expr> (then <instr>*) (else <instr>*)?) ( if <name>? <block_sig>? <expr> <expr> <expr>? ) ;; = (if <name>? <block_sig>? <expr> (then <expr>) (else <expr>?)) instr: <expr> <op> ;; = (<op>) block <name>? <block_sig>? <instr>* end <name>? ;; = (block <name>? <block_sig>? <instr>*) loop <name>? <block_sig>? <instr>* end <name>? ;; = (loop <name>? <block_sig>? <instr>*) if <name>? <block_sig>? <instr>* end <name>? ;; = (if <name>? <block_sig>? (then <instr>*)) if <name>? <block_sig>? <instr>* else <name>? <instr>* end <name>? ;; = (if <name>? <block_sig>? (then <instr>*) (else <instr>*)) op: unreachable nop br <var> br_if <var> br_table <var>+ return call <var> call_indirect <var> drop select get_local <var> set_local <var> tee_local <var> get_global <var> set_global <var> <type>.load((8|16|32)_<sign>)? <offset>? <align>? <type>.store(8|16|32)? <offset>? <align>? current_memory grow_memory <type>.const <value> <type>.<unop> <type>.<binop> <type>.<testop> <type>.<relop> <type>.<cvtop>/<type> func: ( func <name>? <func_sig> <local>* <instr>* ) ( func <name>? ( export <string> ) <func_sig> <local>* <instrr>* ) ;; = (export <string> (func <N>) (func <name>? <func_sig> <local>* <instr>*) ( func <name>? ( import <string> <string> ) <func_sig>) ;; = (import <name>? <string> <string> (func <func_sig>)) param: ( param <type>* ) | ( param <name> <type> ) result: ( result <type> ) local: ( local <type>* ) | ( local <name> <type> ) global: ( global <name>? <global_sig> ) ( global <name>? ( export <string> ) <global_sig> ) ;; = (export <string> (global <N>)) (global <name>? <global_sig>) ( global <name>? ( import <string> <string> ) <global_sig> ) ;; = (import <name>? <string> <string> (global <global_sig>)) table: ( table <name>? <table_sig> ) ( table <name>? ( export <string> ) <table_sig> ) ;; = (export <string> (table <N>)) (table <name>? <table_sig>) ( table <name>? ( import <string> <string> ) <table_sig> ) ;; = (import <name>? <string> <string> (table <table_sig>)) ( table <name>? ( export <string> )? <elem_type> ( elem <var>* ) ) ;; = (table <name>? ( export <string> )? <size> <size> <elem_type>) (elem (i32.const 0) <var>*) elem: ( elem <var>? (offset <instr>* ) <var>* ) ( elem <var>? <expr> <var>* ) ;; = (elem <var>? (offset <expr>) <var>*) memory: ( memory <name>? <memory_sig> ) ( memory <name>? ( export <string> ) <memory_sig> ) ;; = (export <string> (memory <N>)) (memory <name>? <memory_sig>) ( memory <name>? ( import <string> <string> ) <memory_sig> ) ;; = (import <name>? <string> <string> (memory <memory_sig>)) ( memory <name>? ( export <string> )? ( data <string>* ) ;; = (memory <name>? ( export <string> )? <size> <size>) (data (i32.const 0) <string>*) data: ( data <var>? ( offset <instr>* ) <string>* ) ( data <var>? <expr> <string>* ) ;; = (data <var>? (offset <expr>) <string>*) start: ( start <var> ) typedef: ( type <name>? ( func <funcsig> ) ) import: ( import <string> <string> <imkind> ) imkind: ( func <name>? <func_sig> ) ( global <name>? <global_sig> ) ( table <name>? <table_sig> ) ( memory <name>? <memory_sig> ) export: ( export <string> <exkind> ) exkind: ( func <var> ) ( global <var> ) ( table <var> ) ( memory <var> ) module: ( module <name>? <typedef>* <func>* <import>* <export>* <table>? <memory>? <elem>* <data>* <start>? ) ( module <name>? <string>+ )
Here, productions marked with respective comments are abbreviation forms for equivalent expansions (see the explanation of the AST below).
In particular, WebAssembly is a stack machine, so that all expressions of the form
(<op> <expr>+) are merely abbreviations of a corresponding post-order sequence of instructions.
For raw instructions, the syntax allows omitting the parentheses around the operator name and its immediate operands. In the case of control operators (
if), this requires marking the end of the nested sequence with an explicit
Any form of naming via
<var> (including expression labels) is merely notational convenience of this text format. The actual AST has no names, and all bindings are referred to via ordered numeric indices; consequently, names are immediately resolved in the parser and replaced by indices. Indices can also be used directly in the text format.
A module of the form
(module <string>+) is given in binary form and will be decoded from the (concatenation of the) strings.
The segment strings in the memory field are used to initialize the consecutive memory at the given offset.
<size> in the expansion of the two short-hand forms for
memory is the minimal size that can hold the segment: the number of
<var>s for tables, and the accumulative length of the strings rounded up to page size for memories.
Comments can be written in one of two ways:
comment: ;; <char>* <eol> (; (<char> | <comment>)* ;)
In particular, comments of the latter form nest properly.
In order to be able to check and run modules for testing purposes, the S-expression format is interpreted as a very simple and dumb notion of "script", with commands as follows:
script: <cmd>* cmd: <module> ;; define, validate, and initialize module ( register <string> <name>? ) ;; register module for imports module with given failure string <action> ;; perform action and print results <assertion> ;; assert result of an action <meta> ;; meta command action: ( invoke <name>? <string> <expr>* ) ;; invoke function export ( get <name>? <string> ) ;; get global export assertion: ( assert_return <action> <expr>* ) ;; assert action has expected results ( assert_return_nan <action> ) ;; assert action results in NaN ( assert_trap <action> <failure> ) ;; assert action traps with given failure string ( assert_malformed <module> <failure> ) ;; assert module cannot be decoded with given failure string ( assert_invalid <module> <failure> ) ;; assert module is invalid with given failure string ( assert_soft_invalid <module> <failure> ) ;; assert module is for cases that are not required to be checked ( assert_unlinkable <module> <failure> ) ;; assert module fails to link ( assert_trap <module> <failure> ) ;; assert module traps on instantiation meta: ( script <name>? <script> ) ;; name a subscript ( input <name>? <string> ) ;; read script or module from file ( output <name>? <string>? ) ;; output module to stout or file
Commands are executed in sequence. Commands taking an optional module name refer to the most recently defined module if no name is given. They are only possible after a module has been defined.
After a module is registered under a string name it is available for importing in other modules.
There are also a number of meta commands.
script command is a simple mechanism to name sub-scripts themselves. This is mainly useful for converting scripts with the
output command. Commands inside a
script will be executed normally, but nested meta are expanded in place (
input, recursively) or elided (
output) in the named script.
output meta commands determine the requested file format from the file name extension. They can handle both
.wasm files. In the case of input, a
.wast script will be recursively executed. Output additionally handles
.bin.wast specially, which creates a script where module definitions are in binary.
The interpreter supports a "dry" mode (flag
-d), in which modules are only validated. In this mode, all actions and assertions are ignored.
It also supports an "unchecked" mode (flag
-u), in which module definitions are not validated before use.
Finally, "unchecked soft" mode (flag
-us), will not require
However, to simplify the implementation, this AST representation represents some of the inner structure of the operators more explicitly. The mapping from the operators as given in the design doc to their structured form is defined in operators.ml.
The implementation is split into four directories:
spec: the part of the implementation that corresponds to the actual language specification.
text: parsing and printing the S-expressions text format.
host: infrastructure for loading and running scripts and defining host environment modules.
util: utility libraries.
The implementation consists of the following parts:
Abstract Syntax (
script.ml). Notably, the
phrasewrapper type around each AST node carries the source position information.
parse.ml[i]). Generated with ocamllex and ocamlyacc. The lexer does the opcode encoding (non-trivial tokens carry e.g. type information as semantic values, as declared in
parser.mly), the parser the actual S-expression parsing.
Pretty Printer (
sexpr.ml[i]). Turns a module or script AST back into the textual S-expression format.
encode.ml[i]). The former parses the binary format and turns it into an AST, the latter does the inverse.
valid.ml[i]). Does a recursive walk of the AST, passing down the expected type for expressions, and checking each expression against that. An expected empty type can be matched by any result, corresponding to implicit dropping of unused values (e.g. in a block).
memory.ml[i], and a few more). Implements evaluation as a small-step semantics that rewrites a program one computation step at a time.
JS Generator (
flags.ml). Executes scripts, reports results or errors, etc.
The most relevant pieces are probably the validator (
valid.ml) and the evaluator (
eval.ml). They are written to look as much like a "specification" as possible. Hopefully, the code is fairly self-explanatory, at least for those with a passing familiarity with functional programming.
In typical FP convention (and for better readability), the code tends to use single-character names for local variables where consistent naming conventions are applicable (e.g.,
e for expressions,
v for values,
xs for lists of
xs, etc.). See
eval.ml for more comments.