#### WASM Binary Format

In [1]:
def runpywasm(wasmfile):
    import pywasm
    def write(s, i): print(i, end=' ')
    def writeln(s): print()
    def read(s): return int(input())
    vm = pywasm.load(wasmfile, {'P0lib': {'write': write, 'writeln': writeln, 'read': read}})

In [2]:
import nbimporter; nbimporter.options["only_defs"] = False
from P0 import compileString

The Python function `hexdump` prints the contents of a binary file in hexadecimal, four bytes per line. Each line starts with the address of its first byte and ends with the four bytes rendered as ASCII characters, with `.` standing in for any byte that is not printable:

In [3]:
def hexdump(fn: str):
    with open(fn, 'rb') as hexfile: # open binary file for reading
        word, pos = hexfile.read(4), 0 # type(word) == bytes
        while len(word) > 0:
            print('{:#08x}'.format(pos) + ': ' +
                  ' '.join('{:02X}'.format(b) for b in word) + '    ' +
                  ''.join(chr(c) if 32 <= c < 127 else '.' for c in word))
            word, pos = hexfile.read(4), pos + 4

Let us apply `hexdump` to the generated .wasm file and analyze the output:

In [4]:
compileString("""
program max
  var x, y: integer
    x ← read(); y ← read()
    if x > y then write(x) else write(y)
""", 'max.wat')

In [5]:
!wat2wasm max.wat

In [6]:
hexdump('max.wasm')

0x000000: 00 61 73 6D    .asm
0x000004: 01 00 00 00    ....
0x000008: 01 0C 03 60    ...`
0x00000c: 01 7F 00 60    ...`
0x000010: 00 00 60 00    ..`.
0x000014: 01 7F 02 2C    ...,
0x000018: 03 05 50 30    ..P0
0x00001c: 6C 69 62 05    lib.
0x000020: 77 72 69 74    writ
0x000024: 65 00 00 05    e...
0x000028: 50 30 6C 69    P0li
0x00002c: 62 07 77 72    b.wr
0x000030: 69 74 65 6C    itel
0x000034: 6E 00 01 05    n...
0x000038: 50 30 6C 69    P0li
0x00003c: 62 04 72 65    b.re
0x000040: 61 64 00 02    ad..
0x000044: 03 02 01 01    ....
0x000048: 05 03 01 00    ....
0x00004c: 01 06 06 01    ....
0x000050: 7F 01 41 00    ..A.
0x000054: 0B 08 01 03    ....
0x000058: 0A 1F 01 1D    ....
0x00005c: 01 03 7F 10    ....
0x000060: 02 21 00 10    .!..
0x000064: 02 21 01 20    .!. 
0x000068: 00 20 01 4A    . .J
0x00006c: 04 40 20 00    .@ .
0x000070: 10 00 05 20    ... 
0x000074: 01 10 00 0B    ....
0x000078: 0B    .


The [WebAssembly Specification](https://webassembly.github.io/spec/) defines the binary format by a [grammar](https://webassembly.github.io/spec/core/binary/index.html). The specification uses an attribute grammar whose terminal symbols are bytes and whose attributes yield the abstract syntax of WebAssembly modules. Here, we give a slightly simplified grammar:

```ebnf
magic ::= 0x00 0x61 0x73 0x6D
version ::= 0x01 0x00 0x00 0x00
module ::= magic
           version
           typesec?
           importsec?
           funcsec?
           memsec?
           globalsec?
           startsec?
           codesec?
```

A WebAssembly module is structured into _sections_. Each section starts with a section id, a byte that is unique to the kind of section, followed by the size of the section's contents in bytes and then the contents. This layout allows a quick scan of the module to locate the start of every section, so that sections can be processed in parallel, e.g. by compiling function bodies concurrently.
```ebnf
size ::= u32
```

The size of each section is a `u32`, an unsigned 32-bit integer. All integers are represented using the [LEB128](https://en.wikipedia.org/wiki/LEB128) variable-length integer encoding; in the examples here, most integers fit into one byte. Strings are represented by their length in bytes followed by the characters in UTF-8 encoding. The binary format is close to the textual format (`.wat`), except that functions are split into the function section (`funcsec`), which lists only the type index of each function, and the code section (`codesec`), which holds the bodies including local variables.
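To make the two encodings concrete, here is a minimal sketch of a decoder for unsigned LEB128 integers and for length-prefixed strings. The helper names `read_u32` and `read_name` are hypothetical and not part of the notebook; the sketch does not check that a value actually fits into 32 bits.

```python
def read_u32(data: bytes, pos: int = 0):
    """Decode an unsigned LEB128 integer starting at data[pos].

    Returns (value, position just after the last byte consumed)."""
    result, shift = 0, 0
    while True:
        b = data[pos]
        pos += 1
        result |= (b & 0x7F) << shift   # the low 7 bits carry the payload
        if (b & 0x80) == 0:             # high bit clear: this was the last byte
            return result, pos
        shift += 7

def read_name(data: bytes, pos: int = 0):
    """Decode a string stored as byte length followed by UTF-8 bytes."""
    n, pos = read_u32(data, pos)
    return data[pos:pos + n].decode('utf-8'), pos + n

print(read_u32(bytes([0x2C])))              # one-byte case, as in max.wasm
print(read_u32(bytes([0xE5, 0x8E, 0x26])))  # three-byte case
print(read_name(b'\x05P0lib'))
```

The first call decodes the size byte `2C` of the import section as 44; the second decodes the three-byte sequence used as the example in the LEB128 article linked above.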

The grammar uses the convention that `vec(X)` stands for `n Xⁿ`, that is, `n` repetitions of `X` preceded by `n`, for some `n`. 

A function type specifies a vector of parameters and a vector of results. The type `i32` is encoded as `0x7F`. Here, `size` must be the length of `vec(functype)` in bytes:
```ebnf
typesec ::= 0x01 size vec(functype)
functype ::= 0x60 vec(valtype) vec(valtype)
valtype ::= 0x7F | ...
```
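As an illustration of the `vec(X)` convention, the following sketch parses the contents of the type section of `max.wasm`, that is, the 12 bytes that follow `01 0C` in the hex dump. All helper names (`read_u32`, `read_vec`, `read_valtype`, `read_functype`) are assumptions made for this example, and only `i32` is handled as a value type.

```python
VALTYPES = {0x7F: 'i32'}   # other value types are omitted in this sketch

def read_u32(data, pos):
    """Decode an unsigned LEB128 integer; returns (value, next position)."""
    result, shift = 0, 0
    while True:
        b = data[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if (b & 0x80) == 0:
            return result, pos
        shift += 7

def read_vec(data, pos, read_item):
    """vec(X): a count n followed by n items, each read by read_item."""
    n, pos = read_u32(data, pos)
    items = []
    for _ in range(n):
        item, pos = read_item(data, pos)
        items.append(item)
    return items, pos

def read_valtype(data, pos):
    return VALTYPES[data[pos]], pos + 1

def read_functype(data, pos):
    if data[pos] != 0x60:
        raise ValueError('functype must start with 0x60')
    params, pos = read_vec(data, pos + 1, read_valtype)
    results, pos = read_vec(data, pos, read_valtype)
    return (params, results), pos

# Contents of the type section, copied from the hex dump of max.wasm
typesec = bytes.fromhex('03 60 01 7F 00 60 00 00 60 00 01 7F')
types, end = read_vec(typesec, 0, read_functype)
for i, (params, results) in enumerate(types):
    print(i, params, '->', results)
```

The three entries are the types of `write`, `writeln`, and `read`, in that order; `end` equals the section size, which confirms that `size` covers exactly `vec(functype)`.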

Functions, tables, memories, and globals can be imported. Here, only functions are used. Only the index to the function type (`typeidx`) is specified for functions. The `size` in `importsec` is the length of `vec(import)` in bytes, and likewise, the `size` of `funcsec` is the length of `vec(typeidx)` in bytes:

```ebnf
importsec ::= 0x02 size vec(import)
import ::= modname name importdesc
importdesc ::= 0x00 typeidx | ...
typeidx ::= u32
```

```ebnf
funcsec ::= 0x03 size vec(typeidx)
```
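The import section of `max.wasm` can be decoded in the same way. The sketch below handles only function imports (tag `0x00`), as the simplified grammar does; the names `read_u32`, `read_name`, `read_import`, and `read_importsec` are hypothetical helpers introduced for this example.

```python
def read_u32(data, pos):
    """Decode an unsigned LEB128 integer; returns (value, next position)."""
    result, shift = 0, 0
    while True:
        b = data[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if (b & 0x80) == 0:
            return result, pos
        shift += 7

def read_name(data, pos):
    """Decode a string stored as byte length followed by UTF-8 bytes."""
    n, pos = read_u32(data, pos)
    return data[pos:pos + n].decode('utf-8'), pos + n

def read_import(data, pos):
    module, pos = read_name(data, pos)
    name, pos = read_name(data, pos)
    if data[pos] != 0x00:               # 0x00 tags a function import
        raise NotImplementedError('only function imports are handled')
    typeidx, pos = read_u32(data, pos + 1)
    return (module, name, typeidx), pos

def read_importsec(data):
    """Parse vec(import), the contents of the import section."""
    n, pos = read_u32(data, 0)
    imports = []
    for _ in range(n):
        imp, pos = read_import(data, pos)
        imports.append(imp)
    return imports

# Contents of the import section (44 bytes), copied from the hex dump
importsec = bytes.fromhex(
    '03'
    '05 50 30 6C 69 62  05 77 72 69 74 65        00 00 '
    '05 50 30 6C 69 62  07 77 72 69 74 65 6C 6E  00 01 '
    '05 50 30 6C 69 62  04 72 65 61 64           00 02')
for imp in read_importsec(importsec):
    print(imp)
```

Each import refers to its type only through a type index, so the three imported functions occupy function indices 0 to 2, and the single function declared in `funcsec` gets index 3.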

The memory section specifies, for each memory, either only the minimum number of pages or both the minimum and the maximum number of pages. Here, `size` is the length of `vec(mem)` in bytes:

```ebnf
memsec ::= 0x05 size vec(mem)
mem ::= 0x00 u32 | 0x01 u32 u32
```

Global variables are constants if `mut` is `0x00` and variables if `mut` is `0x01`. An expression is an instruction sequence that must be terminated with `0x0B`. Here, `size` is the length of `vec(global)` in bytes:
```ebnf
globalsec ::= 0x06 size vec(global)
global ::= valtype mut expr
mut ::= 0x00 | 0x01
expr ::= instr* 0x0B
```

The start section specifies the index of the start function. Here, `size` is the length of `funcidx` in bytes:
```ebnf
startsec ::= 0x08 size funcidx
funcidx ::= u32
```

The code section is a sequence of function bodies with locals. The `size` in `code` is the length of `vec(locals) expr` in bytes. Each `locals` entry declares a group of local variables: the `u32` is the number of variables and `valtype` is their common type:
```ebnf
codesec ::= 0x0A size vec(code)
code ::= size vec(locals) expr
locals ::= u32 valtype
```
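A short sketch shows how one `code` entry splits into locals and body, using the contents of the code section of `max.wasm`. The helper names `read_u32` and `read_code` are assumptions for this example; the body is returned as raw bytes and the instructions are not decoded.

```python
def read_u32(data, pos):
    """Decode an unsigned LEB128 integer; returns (value, next position)."""
    result, shift = 0, 0
    while True:
        b = data[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if (b & 0x80) == 0:
            return result, pos
        shift += 7

def read_code(data, pos):
    """Parse one code entry: size, vec(locals), then the body up to `size`."""
    size, pos = read_u32(data, pos)
    end = pos + size                    # size covers vec(locals) and expr
    ngroups, pos = read_u32(data, pos)
    local_types = []
    for _ in range(ngroups):
        count, pos = read_u32(data, pos)
        local_types.extend([data[pos]] * count)  # one entry per variable
        pos += 1
    return (local_types, data[pos:end]), end

# Contents of the code section (31 bytes), copied from the hex dump
codesec = bytes.fromhex(
    '01 1D 01 03 7F 10 02 21 00 10 02 21 01 20 00 20 '
    '01 4A 04 40 20 00 10 00 05 20 01 10 00 0B 0B')
nfuncs, pos = read_u32(codesec, 0)
(local_types, body), end = read_code(codesec, pos)
print(nfuncs, [hex(t) for t in local_types], len(body), hex(body[-1]))
```

The single locals entry `03 7F` expands to three `i32` variables, and the body ends with the `0x0B` that terminates the function's expression.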

For the representation of instructions, consult https://webassembly.github.io/spec/core/binary/instructions.html.

Here is an annotated version of the hex dump of `max.wasm`:

<pre style="font-family:monospace;color:royalblue">
0x000000: 00 61 73 6D    .asm    magic
0x000004: 01 00 00 00    ....    version
0x000008: 01 0C 03 60    ...`    typesecid, size typesec, #types, functype (0)
0x00000c: 01 7F 00 60    ...`    #params, i32, #results, functype (1)
0x000010: 00 00 60 00    ..`.    #params, #results, functype (2), #params
0x000014: 01 7F 02 2C    ...,    #results, i32, importsecid, length import
0x000018: 03 05 50 30    ..P0    #imports, len(P0lib), 'P', '0'
0x00001c: 6C 69 62 05    lib.    'l', 'i', 'b', len('write')
0x000020: 77 72 69 74    writ    'w', 'r', 'i', 't'
0x000024: 65 00 00 05    e...    'e', func import (0x00), typeidx 0, len('P0lib')
0x000028: 50 30 6C 69    P0li    'P', '0', 'l', 'i'
0x00002c: 62 07 77 72    b.wr    'b', len('writeln'), 'w', 'r'
0x000030: 69 74 65 6C    itel    'i', 't', 'e', 'l'
0x000034: 6E 00 01 05    n...    'n', func import (0x00), typeidx 1, len('P0lib')
0x000038: 50 30 6C 69    P0li    'P', '0', 'l', 'i'
0x00003c: 62 04 72 65    b.re    'b', len('read'), 'r', 'e'
0x000040: 61 64 00 02    ad..    'a', 'd', func import (0x00), typeidx 2
0x000044: 03 02 01 01    ....    funcsec, length 2, #typeidx, type 1
0x000048: 05 03 01 00    ....    memsec, length 3, #mem, 0 (only min)
0x00004c: 01 06 06 01    ....    1 (min), globalsec, size globalsec, #globals
0x000050: 7F 01 41 00    ..A.    i32, var, i32.const, 0 
0x000054: 0B 08 01 03    ....    end (expression), startsec, size startsec, 3 (function)
0x000058: 0A 1F 01 1D    ....    codesec, size codesec, #functions, size code
0x00005c: 01 03 7F 10    ....    1 (#locals entries), 3 (#vars in entry), i32, call
0x000060: 02 21 00 10    .!..    2 (read), local.set, 0 (x), call
0x000064: 02 21 01 20    .!.     2 (read), local.set, 1 (y), local.get
0x000068: 00 20 01 4A    . .J    0 (x), local.get, 1 (y), i32.gt_s
0x00006c: 04 40 20 00    .@ .    if, [] (empty type), local.get, 0 (x)
0x000070: 10 00 05 20    ...     call, 0 (write), else, local.get
0x000074: 01 10 00 0B    ....    1 (y), call, 0 (write), end (if)
0x000078: 0B             .       end (function)
</pre>
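The section boundaries in the annotated dump can be cross-checked with a small scanner that reads only the id and size of each section and skips its contents. This is a sketch under the assumption that the bytes below match the hex dump of `max.wasm`; `scan_sections` and `read_u32` are hypothetical names, not functions of the notebook.

```python
# The 121 bytes of max.wasm, copied from the hex dump
MAX_WASM = bytes.fromhex(
    '00 61 73 6D 01 00 00 00 01 0C 03 60 01 7F 00 60 '
    '00 00 60 00 01 7F 02 2C 03 05 50 30 6C 69 62 05 '
    '77 72 69 74 65 00 00 05 50 30 6C 69 62 07 77 72 '
    '69 74 65 6C 6E 00 01 05 50 30 6C 69 62 04 72 65 '
    '61 64 00 02 03 02 01 01 05 03 01 00 01 06 06 01 '
    '7F 01 41 00 0B 08 01 03 0A 1F 01 1D 01 03 7F 10 '
    '02 21 00 10 02 21 01 20 00 20 01 4A 04 40 20 00 '
    '10 00 05 20 01 10 00 0B 0B')

def read_u32(data, pos):
    """Decode an unsigned LEB128 integer; returns (value, next position)."""
    result, shift = 0, 0
    while True:
        b = data[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if (b & 0x80) == 0:
            return result, pos
        shift += 7

def scan_sections(module: bytes):
    """Return (section id, offset of contents, size) for every section."""
    if module[:4] != b'\x00asm' or module[4:8] != b'\x01\x00\x00\x00':
        raise ValueError('not a version 1 WebAssembly module')
    pos, found = 8, []
    while pos < len(module):
        secid = module[pos]
        size, start = read_u32(module, pos + 1)
        found.append((secid, start, size))
        pos = start + size              # skip the contents without parsing them
    return found

for secid, start, size in scan_sections(MAX_WASM):
    print('section {:2d}: contents at {:#04x}, {:2d} bytes'.format(secid, start, size))
```

The same function can be applied to the generated file with `scan_sections(open('max.wasm', 'rb').read())`. Because each section's extent is known without decoding its contents, the sections could be handed to separate workers.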