wasm-tools

wasm-tools is a pure-Python WebAssembly parser and disassembler. It is designed around binary decoding and callback-based visitors rather than a large object model. The project currently focuses on practical inspection of .wasm binaries, objdump-style disassembly, and programmatic extraction of decoded instructions for integration into other tooling.

What this project is for

This repository is useful when you need a lightweight WebAssembly parser that can:

inspect a binary module without depending on native parsing libraries,
produce readable instruction traces for analyst review,
expose structured instruction data as Python dictionaries or JSON,
behave safely on malformed or truncated input by reporting parser errors through callbacks instead of crashing the caller.

For a security engineering audience, the main value is that the code path is short and inspectable. Most behavior lives in four files:

wasm_tools/parser.py for binary decoding and traversal,
wasm_tools/opcodes.py for opcode and immediate metadata,
wasm_tools/visitor.py for human-readable output,
wasm_tools/api.py for library-first structured output.

Command-line usage

The installed console script is wasm-tools, as defined in pyproject.toml.

Disassemble a fixture module:

python -m wasm_tools.cli tests/fixtures/simple_add.wasm -d

If installed as a package, the equivalent entrypoint is:

wasm-tools tests/fixtures/simple_add.wasm -d

Current CLI flags in wasm_tools/cli.py:

-h, --headers — print section header table with ids, sizes, and offsets
-x, --details — print section contents: type signatures, imports, exports, globals, tables, memories, data segments, elements, tags, and code body summaries
-d, --disassemble — decode and print function body instructions
--json — print a minified JSON report to stdout
--json-out PATH — write a minified JSON report to PATH
--analysis-only — with --json and/or --json-out, emit only the high-level analysis object

With no flags, --details is the default.

Index notes for CLI output:

function/global/table/memory/tag indices are printed in module-global index space,
locally-defined function bodies therefore start at func[imported_function_count] when function imports are present,
section detail headers use entry counts (for example Function[3], Code[3], Data[1]) and DataCount prints the decoded count value.
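
The index rule above can be illustrated with a few lines of Python. This is a minimal sketch, not code from the repository: the helper name global_function_index is hypothetical, and it only restates the arithmetic that imported functions occupy the lowest indices of the function index space.

```python
def global_function_index(imported_function_count: int, local_index: int) -> int:
    """Map the position of a body in the Code section to its module-global index.

    Hypothetical helper for illustration; not part of wasm_tools.
    """
    if imported_function_count < 0 or local_index < 0:
        raise ValueError("counts and indices must be non-negative")
    # Imported functions take indices 0 .. imported_function_count - 1,
    # so locally defined bodies are numbered after them.
    return imported_function_count + local_index


# A module with 2 function imports and 3 locally defined bodies.
labels = [f"func[{global_function_index(2, i)}]" for i in range(3)]
print(labels)
```

With two function imports, the first locally defined body is therefore labelled func[2] rather than func[0].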

Write a minified JSON report to a file:

wasm-tools tests/fixtures/simple_add.wasm --json-out simple_add.json

Print a minified JSON report to stdout:

wasm-tools tests/fixtures/simple_add.wasm --json

Print only the high-level analysis object to stdout:

wasm-tools tests/fixtures/wasi_capabilities.wasm --json --analysis-only

Use both JSON options together to write a file and print the same payload:

wasm-tools tests/fixtures/simple_add.wasm --json --json-out simple_add.json

Write only the analysis object to a file:

wasm-tools tests/fixtures/dos_growth_loop.wasm --json-out analysis.json --analysis-only

Library usage

Parse from a file

from wasm_tools.api import parse_wasm_file

report = parse_wasm_file("tests/fixtures/simple_add.wasm")
print(report["module_version"])
print(report["function_count"])
print(report["functions"][0]["instructions"])

Parse from bytes and emit JSON

from wasm_tools.api import parse_wasm_bytes_json

with open("tests/fixtures/unicode_names.wasm", "rb") as wasm_file:
    print(parse_wasm_bytes_json(wasm_file.read(), filename="unicode_names.wasm"))

Trust and provenance

The source code in this repository was fully generated by AI assistants, with any human edits limited to formatting or minor changes. For a technical reader, the practical implication is simple: treat the codebase as useful, but review it line by line. Check parser behavior, test coverage, and known gaps before depending on it in a security workflow.

The repository itself already reflects this review posture:

parser failures are covered by unit tests for malformed input,
end-to-end tests assert exact disassembly substrings,
CLI and JSON outputs use module-global index spaces for functions, globals, tables, memories, and tags, including imported-entity offsets.

Architecture

A detailed description of the WebAssembly binary format, the parser internals, visitor pattern, two-pass execution model, and security-relevant design decisions is in ARCHITECTURE.md.

The short version:

BinaryReader in wasm_tools/parser.py owns the binary walk. It reads the module header, iterates sections, and decodes function bodies instruction by instruction. It does not build a full AST. Instead, it emits parser events to a delegate object. The parser checks callbacks with hasattr(...) before calling them, so a visitor only needs to implement the hooks it cares about.
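
The optional-callback idea can be sketched in isolation. The snippet below is an assumption-laden illustration, not the parser's code: the emit helper and the DataCountOnly class are hypothetical, and the callback signatures for on_data_count and on_error are guessed for the example.

```python
class DataCountOnly:
    """Hypothetical visitor that implements a single hook."""

    def __init__(self):
        self.data_count = None

    def on_data_count(self, count):  # signature assumed for illustration
        self.data_count = count


def emit(delegate, event_name, *args):
    """Call delegate.<event_name>(*args) only when the delegate defines it."""
    if hasattr(delegate, event_name):
        getattr(delegate, event_name)(*args)
        return True
    return False


visitor = DataCountOnly()
handled = emit(visitor, "on_data_count", 3)        # hook exists, so it is called
ignored = emit(visitor, "on_error", "bad magic")   # hook missing, silently skipped
print(handled, ignored, visitor.data_count)
```

The design trade-off is that a misspelled hook name is skipped silently, so visitors written this way benefit from tests that assert the hook was actually reached.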

The CLI and the JSON API both run the parse twice. The first pass collects names and type information into ObjdumpState. The second pass uses that state to produce disassembly, section details, or a structured JSON report. The shared state lives in wasm_tools/models.py.
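
A toy version of the two-pass model shows why a second walk is useful: the name section conventionally appears after the code and data sections, so function names are not yet known when bodies are first encountered. Everything here is a sketch under that assumption. NameState is a hypothetical stand-in for the shared state, and the event tuples are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NameState:
    """Hypothetical stand-in for the state collected by the first pass."""
    function_names: dict = field(default_factory=dict)


# Invented event stream: the body is seen before its name.
EVENTS = [
    ("body", 0, ["local.get 0", "local.get 1", "i32.add"]),
    ("name", 0, "add"),
]


def first_pass(events):
    """Collect names only; ignore function bodies."""
    state = NameState()
    for kind, index, payload in events:
        if kind == "name":
            state.function_names[index] = payload
    return state


def second_pass(events, state):
    """Render bodies, using the names gathered by the first pass."""
    lines = []
    for kind, index, payload in events:
        if kind == "body":
            name = state.function_names.get(index, f"func[{index}]")
            lines.append(f"func[{index}] <{name}>:")
            lines.extend(f"  {instr}" for instr in payload)
    return lines


state = first_pass(EVENTS)
listing = second_pass(EVENTS, state)
print("\n".join(listing))
```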

wasm_tools/opcodes.py defines the mapping from (prefix, opcode) to (mnemonic, immediate type). The parser uses this table inside BinaryReader.read_instructions() to decide how many bytes to consume. When extending the instruction set, only this table and the immediate dispatch branches in the parser need to change.
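
The table-driven approach can be sketched with a handful of entries. This is not the contents of wasm_tools/opcodes.py: TOY_OPCODES, Imm, and lookup are hypothetical names, the use of None for single-byte opcodes is an assumption, and the exact formatting of the unknown_<prefix>_<opcode> fallback (lowercase hex here) is a guess.

```python
from enum import Enum


class Imm(Enum):
    """Toy immediate kinds; the real table has more."""
    NONE = "none"
    INDEX = "index"
    MEMARG = "memarg"


# (prefix, opcode) -> (mnemonic, immediate kind).
# Prefix None marks single-byte opcodes in this sketch.
TOY_OPCODES = {
    (None, 0x20): ("local.get", Imm.INDEX),
    (None, 0x28): ("i32.load", Imm.MEMARG),
    (None, 0x6A): ("i32.add", Imm.NONE),
    (0xFC, 0x0B): ("memory.fill", Imm.INDEX),
}


def lookup(prefix, opcode):
    """Return (mnemonic, immediate kind), falling back to an unknown_* name."""
    try:
        return TOY_OPCODES[(prefix, opcode)]
    except KeyError:
        prefix_text = "none" if prefix is None else f"{prefix:02x}"
        return (f"unknown_{prefix_text}_{opcode:02x}", Imm.NONE)


print(lookup(None, 0x6A))
print(lookup(0xFC, 0x63))
```

Note that a fallback entry cannot know how many immediate bytes an unsupported instruction carries, so decoding after an unknown opcode may drift out of sync with the byte stream.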

Relationship to the specification

The repository includes a local specification snapshot under specification/wasm-latest/. The most relevant files for current implementation work are:

specification/wasm-latest/5.3-binary.instructions.spectec
specification/wasm-latest/5.4-binary.modules.spectec
specification/wasm-latest/6.3-text.instructions.spectec

These files are useful when validating opcode encodings, section layouts, and text-to-binary expectations. The current parser is not a full implementation of everything described by the latest specification snapshot. It implements a practical subset and falls back to unknown_<prefix>_<opcode> names for unsupported instructions.

Spec coverage matrix

This matrix is a planning aid, not a certification statement. It reflects what the current codebase does today based on wasm_tools/parser.py, wasm_tools/opcodes.py, wasm_tools/visitor.py, wasm_tools/api.py, and the current test suite.

Status terms used below:

Tested: implemented and covered by the current automated tests.
Partial: implemented in a limited way, or traversed without full semantic decoding.
Known gap: explicitly tracked as missing behavior in tests.
Not implemented or unverified: no support or no current evidence in tests.

Module and section coverage

Area	Spec reference	Status	Current behavior and evidence
Module header and version	`5.4-binary.modules.spectec`	Tested	Validates magic and version in `BinaryReader._do_read_module()`. Error cases for short files and bad magic are covered in `tests/test_parser.py`.
Section framing and bounds checks	`5.4-binary.modules.spectec`	Tested	Reads section id and size, checks file bounds, and reports errors through `on_error`. Covered by truncated section tests.
Custom sections, generic	`5.4-binary.modules.spectec`	Partial	Parser reads custom section name and skips unknown payloads. The JSON API records the custom section name, but does not decode arbitrary custom payloads.
Custom `name` section for function and local names	`5.4-binary.modules.spectec`	Tested	Subsections 1 (function names) and 2 (local names) are decoded and stored in `ObjdumpState`. Names appear in disassembly and JSON reports. Covered by `custom_name.wasm` and `unicode_names.wat`.
Type section	`5.4-binary.modules.spectec`	Tested	Full function type decoding with GC subtype / rec-type wrappers. Params and results stored as `FuncType` in `ObjdumpState.types` and surfaced in `--details`, JSON `types[]`, and `tests/test_details.py`.
Import section	`5.4-binary.modules.spectec`	Tested	All five import kinds (func, table, memory, global, tag) fully decoded into `ImportEntry` with kind-specific fields. Exposed in `--details` output, JSON `imports[]`, and covered by `tests/test_details.py`.
Function section	`5.4-binary.modules.spectec`	Tested	Function signature indices decoded and stored via `on_function`. Used in prepass and JSON reports.
Table section	`5.4-binary.modules.spectec`	Tested	Reference type and limits decoded into `TableEntry`. Exposed in `--details` and JSON `tables[]`.
Memory section	`5.4-binary.modules.spectec`	Tested	Limits decoded (i32 and i64 variants, including shared flag combinations) into `MemoryEntry`. Exposed in `--details` and JSON `memories[]`.
Global section	`5.4-binary.modules.spectec`	Tested	Value type, mutability, and constant init expression decoded into `GlobalEntry`. Exposed in `--details` and JSON `globals[]`.
Export section	`5.4-binary.modules.spectec`	Tested	All five export kinds decoded into `ExportEntry`. Exposed in `--details` and JSON `exports[]`.
Start section	`5.4-binary.modules.spectec`	Tested	Start function index stored and surfaced in JSON `start_function` field and `--details` output.
Element section	`5.4-binary.modules.spectec`	Tested	All 8 element segment variants decoded, with mode, ref type, table index, offset expression, and function index list stored in `ElementEntry`.
Code section and function bodies	`5.4-binary.modules.spectec`	Tested	Local declaration headers are consumed, instructions are decoded, and end-of-body tracking is implemented. Covered heavily by `tests/test_e2e.py` and `tests/test_json_api.py`.
Data section	`5.4-binary.modules.spectec`	Tested	Active (mem 0), passive, and active (mem x) variants decoded into `DataEntry`. Exposed in `--details` and JSON `data_segments[]`. Covered by `bulk_memory.wat` and `memory_data.wat`.
Data count section	`5.4-binary.modules.spectec`	Tested	Data count is decoded and forwarded to delegates via `on_data_count`.
Tag section	`5.4-binary.modules.spectec`	Tested	Tag entries decoded into `TagEntry` with type index. Exposed in `--details` and JSON `tags[]`.
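
The first two rows of the table (header validation and section framing with bounds checks) can be sketched as a standalone reader. This is an illustrative sketch that follows the WebAssembly binary format, not the code in BinaryReader. ToyParseError, read_u32_leb128, and read_sections are hypothetical names.

```python
import struct

WASM_MAGIC = b"\x00asm"


class ToyParseError(Exception):
    """Hypothetical error type for this sketch."""


def read_u32_leb128(data, pos):
    """Decode an unsigned LEB128 u32 at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    for _ in range(5):  # a u32 needs at most 5 bytes
        if pos >= len(data):
            raise ToyParseError("truncated LEB128")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            if result > 0xFFFFFFFF:
                raise ToyParseError("LEB128 value exceeds u32")
            return result, pos
        shift += 7
    raise ToyParseError("LEB128 too long for u32")


def read_sections(data):
    """Validate the 8-byte header, then frame each section with bounds checks."""
    if len(data) < 8:
        raise ToyParseError("file too short for header")
    if data[:4] != WASM_MAGIC:
        raise ToyParseError("bad magic")
    (version,) = struct.unpack_from("<I", data, 4)
    pos = 8
    sections = []
    while pos < len(data):
        section_id = data[pos]
        size, payload_start = read_u32_leb128(data, pos + 1)
        if payload_start + size > len(data):
            raise ToyParseError(f"section {section_id} extends beyond end of file")
        sections.append({"id": section_id, "size": size, "offset": payload_start})
        pos = payload_start + size
    return version, sections


# Header plus one Type section (id 1) declaring a single type () -> ().
module = WASM_MAGIC + b"\x01\x00\x00\x00" + b"\x01\x04\x01\x60\x00\x00"
version, sections = read_sections(module)
print(version, sections)
```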

Instruction coverage

Area	Spec reference	Status	Current behavior and evidence
Basic parametric instructions (`unreachable`, `nop`, `drop`, `select`)	`5.3-binary.instructions.spectec`	Tested	All mapped explicitly in `OPCODES`. Typed `select` with result type vector is handled via `SELECT_T` immediate dispatch. Covered by fixture disassembly tests.
Block/control structure (`block`, `loop`, `if`, `else`, `end`)	`5.3-binary.instructions.spectec`	Tested	Block signatures and expression depth tracking are implemented in `read_instructions()`. Covered by `control_flow.wat` and `complex_flow.wat`.
Branching (`br`, `br_if`, `br_table`, `return`)	`5.3-binary.instructions.spectec`	Tested	Core branch immediates are decoded. `br_table` target list decoded and printed. Covered by `tests/test_e2e.py` and `adversarial_ops.wat`.
Direct and indirect calls (`call`, `call_indirect`)	`5.3-binary.instructions.spectec`	Tested	Direct index operands and `call_indirect` signature/table operands decoded. Covered by `call_indirect.wat` and `complex_flow.wat`.
Return-call extensions (`return_call`, `return_call_indirect`, `call_ref`, `return_call_ref`)	`5.3-binary.instructions.spectec`	Tested	All four opcodes are in `OPCODES` with correct immediate types. Covered by `tests/test_extended_ops.py` and fixture-level `call_ref` disassembly in `call_refs.wat`.
Variable access (`local.get/set/tee`, `global.get/set`)	`5.3-binary.instructions.spectec`	Tested	Index immediates decoded and printed. Covered by arithmetic, globals, and control-flow fixtures.
Memory load/store with memarg	`5.3-binary.instructions.spectec`	Tested	All scalar load/store instructions use the `MEMARG` decoder path, including memory64 large-offset fixtures. Covered by `memory_data.wat`, `complex_flow.wat`, and `load64.wat`.
Integer and float constants	`5.3-binary.instructions.spectec`	Tested	`i32.const`, `i64.const`, `f32.const`, and `f64.const` immediates decoded. Edge signed immediates covered in parser tests and `adversarial_ops.wat`.
Scalar numeric arithmetic and comparisons	`5.3-binary.instructions.spectec`	Tested	Full i32, i64, f32, f64 arithmetic, comparison, and conversion opcode sets are in `OPCODES`. Sign-extension opcodes (`0xC0-0xC4`) included. Covered by `tests/test_extended_ops.py`.
Reference type instructions (`ref.null`, `ref.func`, `ref.eq`, etc.)	`5.3-binary.instructions.spectec`	Tested	`0xD0-0xD6` fully mapped. `ref.null` uses `HEAP_TYPE` immediate. `br_on_null`/`br_on_non_null` use `INDEX`. Covered by `tests/test_extended_ops.py`.
Saturating truncation (`i32.trunc_sat_`, `i64.trunc_sat_`)	`5.3-binary.instructions.spectec`	Tested	All eight `0xFC 0-7` opcodes in `OPCODES` with `NONE` immediate. Dispatch covered by `tests/test_extended_ops.py::test_dispatch_sat_trunc`.
Bulk memory (`memory.init`, `data.drop`, `memory.copy`, `memory.fill`)	`5.3-binary.instructions.spectec`	Tested	`0xFC 8-11` with correct binary operand order for `memory.init`. Covered by `tests/test_confidence_parser.py`, `tests/test_e2e.py`, `tests/test_json_api.py`.
Table bulk ops (`table.init`, `elem.drop`, `table.copy`, `table.grow`, `table.size`, `table.fill`)	`5.3-binary.instructions.spectec`	Tested	`0xFC 12-17` fully mapped with `TABLE_INIT`, `TABLE_COPY`, and `INDEX` immediate types. Dispatch covered by `tests/test_extended_ops.py`.
Exception handling (`throw`, `throw_ref`, `try_table`)	`5.3-binary.instructions.spectec`	Tested	`throw` (0x08), `throw_ref` (0x0A), and `try_table` (0x1F with full catch list) decoded. `TRY_TABLE_BLOCK` parses catch opcodes 0x00-0x03. Covered by `tests/test_extended_ops.py`.
GC / reference types (`0xFB` prefix, struct/array/ref ops)	`5.3-binary.instructions.spectec`	Tested	All 31 `0xFB 0-30` opcodes in `OPCODES`. `BR_ON_CAST` (flags + label + 2 heaptypes) fully decoded. `tests/test_extended_ops.py` covers table completeness and dispatch for `array.len`, `struct.new`, `ref.test`.
SIMD / vector instructions (`0xFD` prefix)	`5.3-binary.instructions.spectec`	Tested	All standard SIMD opcodes 0-275 mapped, including relaxed SIMD. Load/store use `MEMARG`, `v128.const` uses `V128_CONST` (16 raw bytes), `i8x16.shuffle` uses `V128_SHUFFLE`, lane ops use `LANE_IDX` and `MEMARG_LANE`. Covered by `tests/test_extended_ops.py`.
Threads / atomics (`0xFE` prefix)	`5.3-binary.instructions.spectec`	Tested	All atomic operations mapped. `atomic.fence` uses `ATOMIC_FENCE` (reads reserved byte). All others use `MEMARG`. Covered by `tests/test_extended_ops.py`.
Unknown opcode resilience	`5.3-binary.instructions.spectec`	Tested	Unsupported opcodes fall back to `unknown_<prefix>_<opcode>` rather than crashing. Covered by `tests/test_confidence_parser.py`.
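
The constant rows above depend on signed LEB128 decoding, which is where edge immediates tend to go wrong. The following is a minimal sketch of that decoding under the standard encoding rules; read_signed_leb128 is a hypothetical name, and unlike a strict decoder it does not reject unused high bits in the final byte.

```python
def read_signed_leb128(data, pos, bits=32):
    """Decode a signed LEB128 integer at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    max_bytes = (bits + 6) // 7  # 5 bytes for i32, 10 for i64
    for _ in range(max_bytes):
        if pos >= len(data):
            raise ValueError("truncated immediate")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            # Bit 6 of the final byte is the sign bit; extend it if set.
            if byte & 0x40:
                result -= 1 << shift
            return result, pos
    raise ValueError("LEB128 immediate too long")


# i32.const -1 is encoded as opcode 0x41 followed by the single byte 0x7F.
code = bytes([0x41, 0x7F])
value, next_pos = read_signed_leb128(code, 1)
print(value, next_pos)
```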

Interface and analysis coverage

Area	Status	Current behavior and evidence
CLI disassembly mode (`-d`)	Tested	Covered by `tests/test_e2e.py` with exact substring assertions across all fixture files.
CLI headers mode (`--headers`)	Tested	`BinaryReaderObjdumpHeaders` prints section id, name, size, and offset. Covered by `tests/test_details.py`.
CLI details mode (`-x`)	Tested	`BinaryReaderObjdumpDetails` prints all section contents: types, imports, exports, globals, tables, memories, data segments, elements, tags, and code bodies. Covered by `tests/test_details.py`.
JSON-friendly library API	Tested	`parse_wasm_file()` and related helpers return full semantic reports including types, imports, exports, globals, tables, memories, data segments, and elements. Covered in `tests/test_json_api.py`.
Non-throwing parse errors for library callers	Tested	Malformed inputs populate `errors` instead of forcing a traceback. Covered in parser and JSON API tests.
Full validation against the specification	Not implemented	The current code decodes and reports binary structure; it does not implement the validation chapters from the bundled specification snapshot.
Text-format parsing (`.wat` as input)	Not implemented	The repository consumes `.wat` only through the external fixture build step with `wat2wasm`.

How to use this matrix

At the decoding level, the library aims to cover the WebAssembly binary format as summarized in the tables above; see Known limitations for the remaining decoding caveats. The items below are deliberate scope choices rather than missing work items:

Spec validation (type checking, structural constraints from chapters 2 and 3 of the spec) is not the goal of this library. Validation belongs in a downstream consumer such as a language runtime.
Text-format (.wat) input is handled externally by WABT and is not in scope.
The specification snapshot is kept locally under specification/wasm-latest/ to serve as an authoritative reference during development but is not shipped with the distributed package.

Report schema

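The coverage tables above also reference JSON keys such as types[], imports[], exports[], globals[], data_segments[], tags[], and start_function, and the analysis object is described under High-level security analysis, so the list below is not exhaustive. The structured report currently contains: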

file: source path or caller-supplied label,
module_version: wasm version from the module header, or None on parse failure,
section_count: number of recorded sections,
sections: list of section dictionaries with index, id, name, size, and offset,
function_count: number of decoded function bodies,
functions: list of function dictionaries with index, name, signature_index, offset, body_size, instruction_count, and instructions,
tables: list of decoded table entries with index, ref_type, and limits (min, max, is_64),
memories: list of decoded memory entries with index and limits (min, max, is_64),
errors: list of parsing or file read errors.

Each instruction entry contains:

offset: byte offset used by the parser when the opcode was decoded,
opcode: mnemonic from OPCODES or an unknown_... fallback,
immediates: decoded immediate values in parser order,
decode_incomplete: present only when a function body ended with a partially decoded instruction record.

This shape is covered by tests/test_json_api.py.
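
A consumer of this shape might look like the sketch below. It uses only keys named in the lists above, but the sample report is hand-written for illustration (its values are made up, not real parser output), summarize_report is a hypothetical helper, and the element type of errors is treated as opaque by passing it through str().

```python
def summarize_report(report):
    """Build a short text summary from a report dictionary."""
    if report["errors"]:
        return "parse failed: " + "; ".join(str(e) for e in report["errors"])
    lines = [
        f"wasm version {report['module_version']}, "
        f"{report['function_count']} function(s)"
    ]
    for func in report["functions"]:
        opcodes = [ins["opcode"] for ins in func["instructions"]]
        unknown = [op for op in opcodes if op.startswith("unknown_")]
        lines.append(
            f"func[{func['index']}] {func['name']}: "
            f"{func['instruction_count']} instr, {len(unknown)} unknown"
        )
    return "\n".join(lines)


# Hand-written sample; offsets and values are invented for the example.
sample = {
    "file": "simple_add.wasm",
    "module_version": 1,
    "function_count": 1,
    "functions": [
        {
            "index": 0,
            "name": "add",
            "instruction_count": 4,
            "instructions": [
                {"offset": 30, "opcode": "local.get", "immediates": [0]},
                {"offset": 32, "opcode": "local.get", "immediates": [1]},
                {"offset": 34, "opcode": "i32.add", "immediates": []},
                {"offset": 35, "opcode": "end", "immediates": []},
            ],
        }
    ],
    "errors": [],
}

summary = summarize_report(sample)
print(summary)
```

Checking errors first matters because module_version may be None on parse failure.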

High-level security analysis

The JSON report includes an analysis object designed for analyst triage.

summary: overall risk_score, risk_tier, and finding_count,
detections.wasi: explicit WASI import detection (detected, variants, matched import modules/count),
detections.js_interface: JavaScript-interface signals from imports/exports (js/wbg namespaces, wasm:* builtins such as wasm:js-string, and common glue symbol patterns),
detections.format: coarse format classification (core, possible-component, invalid-core) with evidence signals,
capabilities: inferred host capability tags from imports (for example fs.path, network, process.terminate),
profiles.memory: memory access density, memory.grow, bulk-memory activity, and total data segment bytes,
profiles.control_flow: dynamic dispatch metrics (call_indirect, call_ref) and table mutation counts,
profiles.compute: loop depth and loop-contained memory/control-flow pressure,
findings: actionable rule-based results with stable ids and remediation guidance.

Current built-in finding ids:

WASM-CAP-001: filesystem and network host capabilities imported together.
WASM-CFG-002: indirect call surface combined with mutable table operations.
WASM-DOS-003: memory growth in loop context.
WASM-LOOP-004: deep loop nesting amplification signal.
WASM-FMT-005: binary appears to be non-core or otherwise parse-incompatible for this decoder.
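
To make the flavor of these rules concrete, here is a rough sketch of how a check in the spirit of WASM-DOS-003 could be written over decoded instruction entries. It is not the project's implementation: memory_grow_in_loop is a hypothetical function, the finding dictionary keys are assumptions, and the sample body is invented.

```python
BLOCK_OPENERS = {"block", "loop", "if", "try_table"}


def memory_grow_in_loop(instructions):
    """Return offsets of memory.grow instructions nested inside any loop."""
    stack = []  # kinds of the currently open structured blocks
    hits = []
    for ins in instructions:
        op = ins["opcode"]
        if op in BLOCK_OPENERS:
            stack.append(op)
        elif op == "end":
            # The final end closes the function body, when the stack is empty.
            if stack:
                stack.pop()
        elif op == "memory.grow" and "loop" in stack:
            hits.append(ins["offset"])
    return hits


# Invented function body: a loop that grows memory on every iteration.
body = [
    {"offset": 10, "opcode": "loop"},
    {"offset": 12, "opcode": "i32.const"},
    {"offset": 14, "opcode": "memory.grow"},
    {"offset": 16, "opcode": "drop"},
    {"offset": 17, "opcode": "br"},
    {"offset": 19, "opcode": "end"},
    {"offset": 20, "opcode": "end"},
]

hits = memory_grow_in_loop(body)
# Key names in this finding are assumptions, not the report schema.
finding = {"id": "WASM-DOS-003", "offsets": hits} if hits else None
print(finding)
```

A syntactic check like this flags candidates for review; it says nothing about whether the loop is reachable or bounded.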

Error handling model

The parser does not re-raise WasmParseError by default. BinaryReader.read_module() catches parse exceptions and forwards the message to delegate.on_error(...) when that callback exists.

This behavior is important for integration scenarios:

command-line flows can report errors without a Python traceback,
library callers can collect structured failure information,
fuzzing or batch inspection pipelines can continue after a malformed file.

Unit tests cover this behavior in tests/test_parser.py and tests/test_confidence_parser.py.
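
The catch-and-forward pattern can be sketched with a toy reader. This is a simplified illustration under assumptions, not BinaryReader itself: ToyReader, ToyParseError, and ErrorCollector are hypothetical, and the boolean return value of read_module is invented for the example.

```python
class ToyParseError(Exception):
    """Hypothetical stand-in for the parser's error type."""


class ToyReader:
    def __init__(self, data, delegate):
        self.data = data
        self.delegate = delegate

    def _do_read_module(self):
        if len(self.data) < 8:
            raise ToyParseError("file too short for module header")
        if self.data[:4] != b"\x00asm":
            raise ToyParseError("bad magic value")

    def read_module(self):
        """Parse without raising; forward failures to delegate.on_error."""
        try:
            self._do_read_module()
            return True
        except ToyParseError as exc:
            if hasattr(self.delegate, "on_error"):
                self.delegate.on_error(str(exc))
            return False


class ErrorCollector:
    def __init__(self):
        self.errors = []

    def on_error(self, message):
        self.errors.append(message)


# A batch run keeps going after malformed inputs.
inputs = [b"\x00asm\x01\x00\x00\x00", b"\x00as", b"NOPE\x01\x00\x00\x00"]
collector = ErrorCollector()
results = [ToyReader(blob, collector).read_module() for blob in inputs]
print(results, collector.errors)
```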

Examples of currently tested failure cases include:

truncated modules,
bad magic values,
sections extending beyond file boundaries,
malformed LEB128 encodings,
truncated instruction immediates.

Test fixtures and what they cover

The repository uses .wat fixtures under tests/fixtures/, compiled to .wasm with WABT's wat2wasm.

Representative fixtures include:

simple_add.wat for minimal arithmetic and local access,
control_flow.wat for block, loop, br, and br_if,
labels_control.wat for named-label lowering, br_table depth vectors, and label shadowing/redefinition patterns,
memory_data.wat for memory load semantics and data segments,
globals_imports.wat for imported globals and functions,
call_indirect.wat for indirect calls,
call_refs.wat for typed call_ref through locals and globals, plus null-ref call paths,
load64.wat for memory64 ((memory i64 ...)) addressing and large memarg offsets,
float_memory64.wat for memory64 float load/store decoding across f32.* and f64.* memory ops,
bulk64.wat for memory64 memory.init, data.drop, memory.copy, and memory.fill,
memory_trap64.wat for memory64 boundary-style address construction with memory.size, memory.grow, and scalar load/store ops,
memory64_shared.wat for shared memory64 limit decoding and memory.size/memory.grow disassembly,
table_fill64.wat for table64 table.fill and table.get,
table_set64.wat for table64 table.set/table.get on externref and funcref tables,
table_size64.wat for table64 table.size/table.grow plus i64 table limits,
table_init64.wat for table64 ((table ... i64 ...)) offsets plus table.init, table.copy, and table-indexed call_indirect,
simd_store64_lane.wat for SIMD lane memory operands, including v128.store64_lane alignment, offset, and lane immediates,
unreachable.wat for stack-polymorphic unreachable behavior across blocks, loops, calls, branches, memory, and numeric operators,
bulk_memory.wat for memory.init, data.drop, and memory.fill,
complex_flow.wat for mixed control flow, memory, direct calls, and indirect calls,
unicode_names.wat for Unicode content,
adversarial_ops.wat for edge immediates and br_table,
wasi_capabilities.wat for host capability/risk analysis checks,
wasi_preview2_like.wat for WASI preview2-like namespace detection (wasi:* imports),
js_interface.wat for JavaScript embedding detection (js, wbg, and wasm:js-string imports),
dos_growth_loop.wat for loop + memory.grow DoS heuristics.

These fixtures are used in tests/test_e2e.py to validate the disassembly output and in tests/test_json_api.py to validate the structured API.

Known limitations

The repository is a practical decoder, not a full specification implementation:

Spec validation (type checking, module-level structural constraints) is deliberately out of scope.
The custom name section decodes subsections 1 (function names) and 2 (local names); other subsections such as label names are skipped.
Some rarely used init-expression forms in element and data segments fall back to a hex scan rather than full expression decoding.
The analysis layer is heuristic by design and is intended for triage, not formal proof of exploitability.

Development workflow

Run the full test suite:

python -m pytest -q

Rebuild .wasm fixtures from .wat sources:

python tests/fixtures/build.py

The fixture build script requires WABT's wat2wasm binary to be available on PATH.
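
Before rebuilding fixtures, the PATH requirement can be checked from Python with the standard library. This is a small convenience sketch, not part of tests/fixtures/build.py; find_tool is a hypothetical helper.

```python
import shutil


def find_tool(name="wat2wasm"):
    """Return the absolute path of the tool if it is on PATH, else None."""
    return shutil.which(name)


path = find_tool()
if path:
    print(f"wat2wasm found at {path}")
else:
    print("wat2wasm not found on PATH; install WABT before rebuilding fixtures")
```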

The repository metadata in pyproject.toml indicates Poetry-based packaging. If you prefer Poetry, the equivalent commands are:

poetry install
poetry run pytest -q
poetry run python tests/fixtures/build.py

Guidance for reviewers and integrators

If you are evaluating this project for security tooling or pipeline integration, start with these files:

wasm_tools/parser.py for parse correctness,
wasm_tools/opcodes.py for current opcode coverage,
wasm_tools/api.py for the stable integration surface,
tests/test_e2e.py for output expectations,
specification/wasm-latest/5.3-binary.instructions.spectec for spec alignment work.

License

This project is licensed under the MIT License. See LICENSE for details.

The inputs to the AI agents came from the WebAssembly specification, the WABT project, and the author's knowledge of Python and WebAssembly. The outputs are original code generated by the AI agents based on those inputs. It is possible this project is therefore not MIT-licensed due to the presence of third-party specification text in the training data. The author has made a good faith effort to generate original code and to avoid copying any specific text from the specification, but this cannot be guaranteed. Users should review the code and the specification to ensure compliance with their licensing needs.
