wasm-tools is a pure-Python WebAssembly parser and disassembler. It is designed around binary decoding and callback-based visitors rather than a large object model. The project currently focuses on practical inspection of .wasm binaries, objdump-style disassembly, and programmatic extraction of decoded instructions for integration into other tooling.
This repository is useful when you need a lightweight WebAssembly parser that can:
- inspect a binary module without depending on native parsing libraries,
- produce readable instruction traces for analyst review,
- expose structured instruction data as Python dictionaries or JSON,
- behave safely on malformed or truncated input by reporting parser errors through callbacks instead of crashing the caller.
For a security engineering audience, the main value is that the code path is short and inspectable. Most behavior lives in four files:
- wasm_tools/parser.py for binary decoding and traversal,
- wasm_tools/opcodes.py for opcode and immediate metadata,
- wasm_tools/visitor.py for human-readable output,
- wasm_tools/api.py for library-first structured output.

The installed console script is wasm-tools, as defined in pyproject.toml.
Disassemble a fixture module:

    python -m wasm_tools.cli tests/fixtures/simple_add.wasm -d

If installed as a package, the equivalent entrypoint is:

    wasm-tools tests/fixtures/simple_add.wasm -d

Current CLI flags in wasm_tools/cli.py:

- -h, --headers: print section header table with ids, sizes, and offsets
- -x, --details: print section contents: type signatures, imports, exports, globals, tables, memories, data segments, elements, tags, and code body summaries
- -d, --disassemble: decode and print function body instructions
- --json: print a minified JSON report to stdout
- --json-out PATH: write a minified JSON report to PATH
- --analysis-only: with --json and/or --json-out, emit only the high-level analysis object

With no flags, --details is the default.
Index notes for CLI output:

- function/global/table/memory/tag indices are printed in module-global index space,
- locally-defined function bodies therefore start at func[imported_function_count] when function imports are present,
- section detail headers use entry counts (for example Function[3], Code[3], Data[1]) and DataCount prints the decoded count value.

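
The index rule above can be illustrated with a small sketch. The helper name global_function_index is hypothetical and is not part of wasm_tools; it only shows the arithmetic implied by "imports come first in the index space".

```python
def global_function_index(imported_function_count: int, local_index: int) -> int:
    """Map the Nth locally defined function body to its module-global index.

    Hypothetical helper for illustration; not part of wasm_tools.
    """
    if imported_function_count < 0 or local_index < 0:
        raise ValueError("counts and indices must be non-negative")
    # Imported functions occupy indices 0..imported_function_count-1,
    # so locally defined bodies start right after them.
    return imported_function_count + local_index


# A module with 2 imported functions and 3 code bodies:
labels = [f"func[{global_function_index(2, i)}]" for i in range(3)]
print(labels)  # ['func[2]', 'func[3]', 'func[4]']
```

With no function imports, the first code body is simply func[0].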
Write a minified JSON report to a file:

    wasm-tools tests/fixtures/simple_add.wasm --json-out simple_add.json

Print a minified JSON report to stdout:

    wasm-tools tests/fixtures/simple_add.wasm --json

Print only the high-level analysis object to stdout:

    wasm-tools tests/fixtures/wasi_capabilities.wasm --json --analysis-only

Use both JSON options together to write a file and print the same payload:

    wasm-tools tests/fixtures/simple_add.wasm --json --json-out simple_add.json

Write only the analysis object to a file:

    wasm-tools tests/fixtures/dos_growth_loop.wasm --json-out analysis.json --analysis-only

Parse a file into a structured report from Python:

    from wasm_tools.api import parse_wasm_file

    report = parse_wasm_file("tests/fixtures/simple_add.wasm")
    print(report["module_version"])
    print(report["function_count"])
    print(report["functions"][0]["instructions"])

Parse raw bytes and print the result of the JSON helper:

    from wasm_tools.api import parse_wasm_bytes_json

    with open("tests/fixtures/unicode_names.wasm", "rb") as wasm_file:
        print(parse_wasm_bytes_json(wasm_file.read(), filename="unicode_names.wasm"))

The source code in this repository was fully generated by AI assistants, with any human edits limited to formatting or minor changes. For a technical reader, the practical implication is simple: treat the codebase as useful but review every line of code carefully. Review parser behavior, test coverage, and known gaps before depending on it in a security workflow.
The repository itself already reflects this review posture:
- parser failures are covered by unit tests for malformed input,
- end-to-end tests assert exact disassembly substrings,
- CLI and JSON outputs use module-global index spaces for functions, globals, tables, memories, and tags, including imported-entity offsets.
A detailed description of the WebAssembly binary format, the parser internals, visitor pattern, two-pass execution model, and security-relevant design decisions is in ARCHITECTURE.md.
The short version:
BinaryReader in wasm_tools/parser.py owns the binary walk. It reads the module header, iterates sections, and decodes function bodies instruction by instruction. It does not build a full AST. Instead, it emits parser events to a delegate object. The parser checks callbacks with hasattr(...) before calling them, so a visitor only needs to implement the hooks it cares about.
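
The hasattr-based dispatch described above can be sketched as follows. This is a minimal illustration, not the project's code: MiniReader, OpcodeCounter, and the hook names begin_module, on_opcode, and end_module are hypothetical stand-ins chosen for the example.

```python
class MiniReader:
    """Hypothetical stand-in for an event-emitting parser (not BinaryReader)."""

    def __init__(self, delegate):
        self.delegate = delegate

    def _emit(self, hook, *args):
        # Call a hook only if the delegate defines it; silently skip otherwise.
        if hasattr(self.delegate, hook):
            getattr(self.delegate, hook)(*args)

    def run(self, events):
        # Replay a pre-built event list instead of decoding real bytes.
        for hook, args in events:
            self._emit(hook, *args)


class OpcodeCounter:
    """A visitor that implements a single hook and ignores everything else."""

    def __init__(self):
        self.count = 0

    def on_opcode(self, offset, name):
        self.count += 1


events = [
    ("begin_module", (1,)),
    ("on_opcode", (0x20, "local.get")),
    ("on_opcode", (0x22, "i32.add")),
    ("end_module", ()),
]
counter = OpcodeCounter()
MiniReader(counter).run(events)
print(counter.count)  # 2
```

The design trade-off is that a misspelled hook name on the visitor is skipped silently, so tests that assert on visitor output are the main safety net.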
The CLI and the JSON API both run the parse twice. The first pass collects names and type information into ObjdumpState. The second pass uses that state to produce disassembly, section details, or a structured JSON report. The shared state lives in wasm_tools/models.py.
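
One way to see why two passes help: names can be recorded after the instructions that refer to them, so a single pass cannot label early references. The sketch below assumes a simplified event stream; NameState, Prepass, Printer, walk, and the hook names are hypothetical and much smaller than the project's ObjdumpState and visitors.

```python
from dataclasses import dataclass, field


@dataclass
class NameState:
    """Hypothetical shared state; stands in for a small slice of ObjdumpState."""
    function_names: dict = field(default_factory=dict)


def walk(events, delegate):
    """Replay events, calling only the hooks the delegate defines."""
    for hook, args in events:
        handler = getattr(delegate, hook, None)
        if handler is not None:
            handler(*args)


class Prepass:
    """Pass 1: collect names, ignore instructions."""

    def __init__(self, state):
        self.state = state

    def on_function_name(self, index, name):
        self.state.function_names[index] = name


class Printer:
    """Pass 2: render calls using whatever names the state already holds."""

    def __init__(self, state):
        self.state = state
        self.lines = []

    def on_call(self, index):
        name = self.state.function_names.get(index, f"func[{index}]")
        self.lines.append(f"call {index} <{name}>")


# The name for function 0 arrives after the first call that uses it.
events = [("on_call", (0,)), ("on_function_name", (0, "add")), ("on_call", (1,))]

state = NameState()
walk(events, Prepass(state))   # first pass fills the state
printer = Printer(state)
walk(events, printer)          # second pass renders with full knowledge
print(printer.lines)  # ['call 0 <add>', 'call 1 <func[1]>']
```

Running only the Printer over an empty state would label the first call as func[0], which is the gap the prepass closes.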
wasm_tools/opcodes.py defines the mapping from (prefix, opcode) to (mnemonic, immediate type). The parser uses this table inside BinaryReader.read_instructions() to decide how many bytes to consume. When extending the instruction set, only this table and the immediate dispatch branches in the parser need to change.
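
A table of this shape can be sketched as a plain dictionary. The excerpt below is hypothetical: the real table lives in wasm_tools/opcodes.py, and the key layout (None for unprefixed opcodes), the hex formatting of the fallback name, and the fallback immediate type are assumptions made for the example.

```python
# Hypothetical excerpt; keys are (prefix, opcode), values are (mnemonic, immediate type).
OPCODES = {
    (None, 0x20): ("local.get", "INDEX"),
    (None, 0x28): ("i32.load", "MEMARG"),
    (None, 0x6A): ("i32.add", "NONE"),
    (0xFC, 0x00): ("i32.trunc_sat_f32_s", "NONE"),
    (0xFC, 0x0C): ("table.init", "TABLE_INIT"),
}


def lookup(prefix, opcode):
    """Return (mnemonic, immediate_type), falling back to an unknown_* name."""
    entry = OPCODES.get((prefix, opcode))
    if entry is not None:
        return entry
    # Assumed formatting for the unknown_<prefix>_<opcode> fallback.
    prefix_text = "00" if prefix is None else f"{prefix:02x}"
    return (f"unknown_{prefix_text}_{opcode:02x}", "NONE")


print(lookup(None, 0x6A))  # ('i32.add', 'NONE')
print(lookup(0xFC, 0x63))  # ('unknown_fc_63', 'NONE')
```

The immediate type is what tells a decoder how many bytes follow the opcode, which is why an unknown opcode can only be named, not reliably skipped past.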
The repository includes a local specification snapshot under specification/wasm-latest/. The most relevant files for current implementation work are:
- specification/wasm-latest/5.3-binary.instructions.spectec
- specification/wasm-latest/5.4-binary.modules.spectec
- specification/wasm-latest/6.3-text.instructions.spectec

These files are useful when validating opcode encodings, section layouts, and text-to-binary expectations. The current parser is not a full implementation of everything described by the latest specification snapshot. It implements a practical subset and falls back to unknown_<prefix>_<opcode> names for unsupported instructions.
This matrix is a planning aid, not a certification statement. It reflects what the current codebase does today based on wasm_tools/parser.py, wasm_tools/opcodes.py, wasm_tools/visitor.py, wasm_tools/api.py, and the current test suite.
Status terms used below:
- Tested: implemented and covered by the current automated tests.
- Partial: implemented in a limited way, or traversed without full semantic decoding.
- Known gap: explicitly tracked as missing behavior in tests.
- Not implemented or unverified: no support or no current evidence in tests.

| Area | Spec reference | Status | Current behavior and evidence |
|---|---|---|---|
| Module header and version | 5.4-binary.modules.spectec | Tested | Validates magic and version in BinaryReader._do_read_module(). Error cases for short files and bad magic are covered in tests/test_parser.py. |
| Section framing and bounds checks | 5.4-binary.modules.spectec | Tested | Reads section id and size, checks file bounds, and reports errors through on_error. Covered by truncated section tests. |
| Custom sections, generic | 5.4-binary.modules.spectec | Partial | Parser reads custom section name and skips unknown payloads. The JSON API records the custom section name, but does not decode arbitrary custom payloads. |
| Custom name section for function and local names | 5.4-binary.modules.spectec | Tested | Subsections 1 (function names) and 2 (local names) are decoded and stored in ObjdumpState. Names appear in disassembly and JSON reports. Covered by custom_name.wasm and unicode_names.wat. |
| Type section | 5.4-binary.modules.spectec | Tested | Full function type decoding with GC subtype / rec-type wrappers. Params and results stored as FuncType in ObjdumpState.types and surfaced in --details, JSON types[], and tests/test_details.py. |
| Import section | 5.4-binary.modules.spectec | Tested | All five import kinds (func, table, memory, global, tag) fully decoded into ImportEntry with kind-specific fields. Exposed in --details output, JSON imports[], and covered by tests/test_details.py. |
| Function section | 5.4-binary.modules.spectec | Tested | Function signature indices decoded and stored via on_function. Used in prepass and JSON reports. |
| Table section | 5.4-binary.modules.spectec | Tested | Reference type and limits decoded into TableEntry. Exposed in --details and JSON tables[]. |
| Memory section | 5.4-binary.modules.spectec | Tested | Limits decoded (i32 and i64 variants, including shared flag combinations) into MemoryEntry. Exposed in --details and JSON memories[]. |
| Global section | 5.4-binary.modules.spectec | Tested | Value type, mutability, and constant init expression decoded into GlobalEntry. Exposed in --details and JSON globals[]. |
| Export section | 5.4-binary.modules.spectec | Tested | All five export kinds decoded into ExportEntry. Exposed in --details and JSON exports[]. |
| Start section | 5.4-binary.modules.spectec | Tested | Start function index stored and surfaced in JSON start_function field and --details output. |
| Element section | 5.4-binary.modules.spectec | Tested | All 8 element segment variants decoded, with mode, ref type, table index, offset expression, and function index list stored in ElementEntry. |
| Code section and function bodies | 5.4-binary.modules.spectec | Tested | Local declaration headers are consumed, instructions are decoded, and end-of-body tracking is implemented. Covered heavily by tests/test_e2e.py and tests/test_json_api.py. |
| Data section | 5.4-binary.modules.spectec | Tested | Active (mem 0), passive, and active (mem x) variants decoded into DataEntry. Exposed in --details and JSON data_segments[]. Covered by bulk_memory.wat and memory_data.wat. |
| Data count section | 5.4-binary.modules.spectec | Tested | Data count is decoded and forwarded to delegates via on_data_count. |
| Tag section | 5.4-binary.modules.spectec | Tested | Tag entries decoded into TagEntry with type index. Exposed in --details and JSON tags[]. |

| Area | Spec reference | Status | Current behavior and evidence |
|---|---|---|---|
| Basic parametric instructions (unreachable, nop, drop, select) | 5.3-binary.instructions.spectec | Tested | All mapped explicitly in OPCODES. Typed select with result type vector is handled via SELECT_T immediate dispatch. Covered by fixture disassembly tests. |
| Block/control structure (block, loop, if, else, end) | 5.3-binary.instructions.spectec | Tested | Block signatures and expression depth tracking are implemented in read_instructions(). Covered by control_flow.wat and complex_flow.wat. |
| Branching (br, br_if, br_table, return) | 5.3-binary.instructions.spectec | Tested | Core branch immediates are decoded. br_table target list decoded and printed. Covered by tests/test_e2e.py and adversarial_ops.wat. |
| Direct and indirect calls (call, call_indirect) | 5.3-binary.instructions.spectec | Tested | Direct index operands and call_indirect signature/table operands decoded. Covered by call_indirect.wat and complex_flow.wat. |
| Return-call extensions (return_call, return_call_indirect, call_ref, return_call_ref) | 5.3-binary.instructions.spectec | Tested | All four opcodes are in OPCODES with correct immediate types. Covered by tests/test_extended_ops.py and fixture-level call_ref disassembly in call_refs.wat. |
| Variable access (local.get/set/tee, global.get/set) | 5.3-binary.instructions.spectec | Tested | Index immediates decoded and printed. Covered by arithmetic, globals, and control-flow fixtures. |
| Memory load/store with memarg | 5.3-binary.instructions.spectec | Tested | All scalar load/store instructions use the MEMARG decoder path, including memory64 large-offset fixtures. Covered by memory_data.wat, complex_flow.wat, and load64.wat. |
| Integer and float constants | 5.3-binary.instructions.spectec | Tested | i32.const, i64.const, f32.const, and f64.const immediates decoded. Edge signed immediates covered in parser tests and adversarial_ops.wat. |
| Scalar numeric arithmetic and comparisons | 5.3-binary.instructions.spectec | Tested | Full i32, i64, f32, f64 arithmetic, comparison, and conversion opcode sets are in OPCODES. Sign-extension opcodes (0xC0-0xC4) included. Covered by tests/test_extended_ops.py. |
| Reference type instructions (ref.null, ref.func, ref.eq, etc.) | 5.3-binary.instructions.spectec | Tested | 0xD0-0xD6 fully mapped. ref.null uses HEAP_TYPE immediate. br_on_null/br_on_non_null use INDEX. Covered by tests/test_extended_ops.py. |
| Saturating truncation (i32.trunc_sat_*, i64.trunc_sat_*) | 5.3-binary.instructions.spectec | Tested | All eight 0xFC 0-7 opcodes in OPCODES with NONE immediate. Dispatch covered by tests/test_extended_ops.py::test_dispatch_sat_trunc. |
| Bulk memory (memory.init, data.drop, memory.copy, memory.fill) | 5.3-binary.instructions.spectec | Tested | 0xFC 8-11 with correct binary operand order for memory.init. Covered by tests/test_confidence_parser.py, tests/test_e2e.py, tests/test_json_api.py. |
| Table bulk ops (table.init, elem.drop, table.copy, table.grow, table.size, table.fill) | 5.3-binary.instructions.spectec | Tested | 0xFC 12-17 fully mapped with TABLE_INIT, TABLE_COPY, and INDEX immediate types. Dispatch covered by tests/test_extended_ops.py. |
| Exception handling (throw, throw_ref, try_table) | 5.3-binary.instructions.spectec | Tested | throw (0x08), throw_ref (0x0A), and try_table (0x1F with full catch list) decoded. TRY_TABLE_BLOCK parses catch opcodes 0x00-0x03. Covered by tests/test_extended_ops.py. |
| GC / reference types (0xFB prefix, struct/array/ref ops) | 5.3-binary.instructions.spectec | Tested | All 31 0xFB 0-30 opcodes in OPCODES. BR_ON_CAST (flags + label + 2 heaptypes) fully decoded. tests/test_extended_ops.py covers table completeness and dispatch for array.len, struct.new, ref.test. |
| SIMD / vector instructions (0xFD prefix) | 5.3-binary.instructions.spectec | Tested | All standard SIMD opcodes 0-275 mapped, including relaxed SIMD. Load/store use MEMARG, v128.const uses V128_CONST (16 raw bytes), i8x16.shuffle uses V128_SHUFFLE, lane ops use LANE_IDX and MEMARG_LANE. Covered by tests/test_extended_ops.py. |
| Threads / atomics (0xFE prefix) | 5.3-binary.instructions.spectec | Tested | All atomic operations mapped. atomic.fence uses ATOMIC_FENCE (reads reserved byte). All others use MEMARG. Covered by tests/test_extended_ops.py. |
| Unknown opcode resilience | 5.3-binary.instructions.spectec | Tested | Unsupported opcodes fall back to unknown_<prefix>_<opcode> rather than crashing. Covered by tests/test_confidence_parser.py. |

| Area | Status | Current behavior and evidence |
|---|---|---|
| CLI disassembly mode (-d) | Tested | Covered by tests/test_e2e.py with exact substring assertions across all fixture files. |
| CLI headers mode (--headers) | Tested | BinaryReaderObjdumpHeaders prints section id, name, size, and offset. Covered by tests/test_details.py. |
| CLI details mode (-x) | Tested | BinaryReaderObjdumpDetails prints all section contents: types, imports, exports, globals, tables, memories, data segments, elements, tags, and code bodies. Covered by tests/test_details.py. |
| JSON-friendly library API | Tested | parse_wasm_file() and related helpers return full semantic reports including types, imports, exports, globals, tables, memories, data segments, and elements. Covered in tests/test_json_api.py. |
| Non-throwing parse errors for library callers | Tested | Malformed inputs populate errors instead of forcing a traceback. Covered in parser and JSON API tests. |
| Full validation against the specification | Not implemented | The current code decodes and reports binary structure; it does not implement the validation chapters from the bundled specification snapshot. |
| Text-format parsing (.wat as input) | Not implemented | The repository consumes .wat only through the external fixture build step with wat2wasm. |

At the decoding level, the matrices above mark every standard section and instruction family as Tested; generic custom-section payloads are the one Partial entry. The remaining gaps are deliberate scope choices rather than missing work items:

- Spec validation (type checking, structural constraints from chapters 2 and 3 of the spec) is not the goal of this library. Validation belongs in a downstream consumer such as a language runtime.
- Text-format (.wat) input is handled externally by WABT and is not in scope.
- The specification snapshot is kept locally under specification/wasm-latest/ to serve as an authoritative reference during development but is not shipped with the distributed package.

The structured report currently contains:

- file: source path or caller-supplied label,
- module_version: wasm version from the module header, or None on parse failure,
- section_count: number of recorded sections,
- sections: list of section dictionaries with index, id, name, size, and offset,
- function_count: number of decoded function bodies,
- functions: list of function dictionaries with index, name, signature_index, offset, body_size, instruction_count, and instructions,
- tables: list of decoded table entries with index, ref_type, and limits (min, max, is_64),
- memories: list of decoded memory entries with index and limits (min, max, is_64),
- errors: list of parsing or file read errors.

Each instruction entry contains:

- offset: byte offset used by the parser when the opcode was decoded,
- opcode: mnemonic from OPCODES or an unknown_... fallback,
- immediates: decoded immediate values in parser order,
- decode_incomplete: present only when a function body ended with a partially decoded instruction record.

This shape is covered by tests/test_json_api.py.
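
As a rough illustration of consuming this shape, the sketch below builds an opcode histogram and lists functions whose decoding ended early. The report literal is hand-written to match the fields described above; it is not real output from parse_wasm_file, and the helper names are hypothetical.

```python
from collections import Counter

# Hand-built sample that mirrors the documented field names (not real tool output).
report = {
    "functions": [
        {
            "index": 0,
            "name": "add",
            "instructions": [
                {"offset": 30, "opcode": "local.get", "immediates": [0]},
                {"offset": 32, "opcode": "local.get", "immediates": [1]},
                {"offset": 34, "opcode": "i32.add", "immediates": []},
                {"offset": 35, "opcode": "end", "immediates": []},
            ],
        },
        {
            "index": 1,
            "name": "broken",
            "instructions": [
                {"offset": 40, "opcode": "i32.const", "immediates": [],
                 "decode_incomplete": True},
            ],
        },
    ],
    "errors": [],
}


def opcode_histogram(report):
    """Count how often each mnemonic appears across all functions."""
    counts = Counter()
    for function in report.get("functions", []):
        for instruction in function.get("instructions", []):
            counts[instruction["opcode"]] += 1
    return counts


def incomplete_functions(report):
    """Return indices of functions containing a decode_incomplete record."""
    return [
        function["index"]
        for function in report.get("functions", [])
        if any("decode_incomplete" in ins for ins in function.get("instructions", []))
    ]


print(opcode_histogram(report).most_common(1))  # [('local.get', 2)]
print(incomplete_functions(report))             # [1]
```

Checking for the presence of the decode_incomplete key, rather than its value, follows the description that the field appears only on partially decoded records.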
The JSON report includes an analysis object designed for analyst triage.

- summary: overall risk_score, risk_tier, and finding_count,
- detections.wasi: explicit WASI import detection (detected, variants, matched import modules/count),
- detections.js_interface: JavaScript-interface signals from imports/exports (js/wbg namespaces, wasm:* builtins such as wasm:js-string, and common glue symbol patterns),
- detections.format: coarse format classification (core, possible-component, invalid-core) with evidence signals,
- capabilities: inferred host capability tags from imports (for example fs.path, network, process.terminate),
- profiles.memory: memory access density, memory.grow, bulk-memory activity, and total data segment bytes,
- profiles.control_flow: dynamic dispatch metrics (call_indirect, call_ref) and table mutation counts,
- profiles.compute: loop depth and loop-contained memory/control-flow pressure,
- findings: actionable rule-based results with stable ids and remediation guidance.

Current built-in finding ids:

- WASM-CAP-001: filesystem and network host capabilities imported together.
- WASM-CFG-002: indirect call surface combined with mutable table operations.
- WASM-DOS-003: memory growth in loop context.
- WASM-LOOP-004: deep loop nesting amplification signal.
- WASM-FMT-005: binary appears to be non-core or otherwise parse-incompatible for this decoder.

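
To make a rule like WASM-DOS-003 concrete, here is one possible way to detect memory.grow inside a loop from a flat mnemonic list. This is an illustrative heuristic written for this README, not the project's rule implementation; the function name and the choice of block-opening opcodes are assumptions.

```python
def grows_memory_in_loop(opcodes):
    """Return True if memory.grow appears while at least one loop is open.

    Illustrative heuristic only; not the project's WASM-DOS-003 implementation.
    """
    open_blocks = []  # one entry per open structured block; True marks a loop
    for op in opcodes:
        if op in ("block", "loop", "if", "try_table"):
            open_blocks.append(op == "loop")
        elif op == "end":
            # The final end closes the function body itself, so the stack
            # may already be empty at that point.
            if open_blocks:
                open_blocks.pop()
        elif op == "memory.grow" and any(open_blocks):
            return True
    return False


in_loop = ["loop", "i32.const", "memory.grow", "drop", "br", "end", "end"]
outside = ["block", "memory.grow", "drop", "end", "loop", "nop", "end", "end"]
print(grows_memory_in_loop(in_loop))   # True
print(grows_memory_in_loop(outside))   # False
```

A check of this kind only says that growth is syntactically nested in a loop; it says nothing about whether the loop is bounded, which is why such results are triage signals.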
The parser does not re-raise WasmParseError by default. BinaryReader.read_module() catches parse exceptions and forwards the message to delegate.on_error(...) when that callback exists.
This behavior is important for integration scenarios:
- command-line flows can report errors without a Python traceback,
- library callers can collect structured failure information,
- fuzzing or batch inspection pipelines can continue after a malformed file.
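
The catch-and-forward pattern described above can be sketched as follows. ParseError, ErrorCollector, and this read_module function are hypothetical stand-ins (the project's own names are WasmParseError and BinaryReader.read_module()); the header check is reduced to length and magic for brevity.

```python
class ParseError(Exception):
    """Hypothetical stand-in for the project's WasmParseError."""


class ErrorCollector:
    """Delegate that records error messages instead of raising."""

    def __init__(self):
        self.errors = []

    def on_error(self, message):
        self.errors.append(message)


def read_module(data, delegate):
    """Return True on success; forward parse failures to delegate.on_error."""
    try:
        if len(data) < 8:
            raise ParseError("file too short for module header")
        if data[:4] != b"\x00asm":
            raise ParseError("bad magic value")
    except ParseError as exc:
        # Forward the message only if the delegate opted in to error events.
        if hasattr(delegate, "on_error"):
            delegate.on_error(str(exc))
        return False
    return True


samples = {
    "ok.wasm": b"\x00asm\x01\x00\x00\x00",
    "short.wasm": b"\x00as",
    "bad.wasm": b"NOPE\x01\x00\x00\x00",
}
results = {}
for name, data in samples.items():
    collector = ErrorCollector()
    ok = read_module(data, collector)
    results[name] = (ok, collector.errors)  # the batch keeps going after failures

print(results["bad.wasm"])  # (False, ['bad magic value'])
```

Because failures arrive as data rather than exceptions, a batch loop like this one needs no try/except of its own around each file.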
Unit tests cover this behavior in tests/test_parser.py and tests/test_confidence_parser.py.
Examples of currently tested failure cases include:
- truncated modules,
- bad magic values,
- sections extending beyond file boundaries,
- malformed LEB128 encodings,
- truncated instruction immediates.
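
For readers unfamiliar with why LEB128 input can be malformed, the sketch below decodes an unsigned 32-bit LEB128 value and rejects truncated or oversized encodings. It is a generic illustration of the encoding, not the decoder in wasm_tools/parser.py, and it raises ValueError rather than the project's exception type.

```python
def read_u32_leb128(data: bytes, offset: int = 0):
    """Decode an unsigned 32-bit LEB128 value; return (value, next_offset)."""
    result = 0
    for i in range(5):  # a u32 needs at most 5 bytes (5 * 7 = 35 bits)
        if offset + i >= len(data):
            raise ValueError("truncated LEB128")
        byte = data[offset + i]
        # The fifth byte may only carry the top 4 bits and no continuation bit.
        if i == 4 and byte > 0x0F:
            raise ValueError("LEB128 value too long or out of range for u32")
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:  # continuation bit clear: this was the last byte
            return result, offset + i + 1
    raise AssertionError("unreachable: the fifth byte always terminates or raises")


print(read_u32_leb128(b"\xe5\x8e\x26"))          # (624485, 3)
print(read_u32_leb128(b"\xff\xff\xff\xff\x0f"))  # (4294967295, 5)
```

Inputs such as b"\x80" (continuation bit set, then end of data) or a fifth byte above 0x0F raise ValueError instead of returning a wrong value.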
The repository uses .wat fixtures under tests/fixtures/, compiled to .wasm with WABT's wat2wasm.
Representative fixtures include:
- simple_add.wat for minimal arithmetic and local access,
- control_flow.wat for block, loop, br, and br_if,
- labels_control.wat for named-label lowering, br_table depth vectors, and label shadowing/redefinition patterns,
- memory_data.wat for memory load semantics and data segments,
- globals_imports.wat for imported globals and functions,
- call_indirect.wat for indirect calls,
- call_refs.wat for typed call_ref through locals and globals, plus null-ref call paths,
- load64.wat for memory64 ((memory i64 ...)) addressing and large memarg offsets,
- float_memory64.wat for memory64 float load/store decoding across f32.* and f64.* memory ops,
- bulk64.wat for memory64 memory.init, data.drop, memory.copy, and memory.fill,
- memory_trap64.wat for memory64 boundary-style address construction with memory.size, memory.grow, and scalar load/store ops,
- memory64_shared.wat for shared memory64 limit decoding and memory.size/memory.grow disassembly,
- table_fill64.wat for table64 table.fill and table.get,
- table_set64.wat for table64 table.set/table.get on externref and funcref tables,
- table_size64.wat for table64 table.size/table.grow plus i64 table limits,
- table_init64.wat for table64 ((table ... i64 ...)) offsets plus table.init, table.copy, and table-indexed call_indirect,
- simd_store64_lane.wat for SIMD lane memory operands, including v128.store64_lane alignment, offset, and lane immediates,
- unreachable.wat for stack-polymorphic unreachable behavior across blocks, loops, calls, branches, memory, and numeric operators,
- bulk_memory.wat for memory.init, data.drop, and memory.fill,
- complex_flow.wat for mixed control flow, memory, direct calls, and indirect calls,
- unicode_names.wat for Unicode content,
- adversarial_ops.wat for edge immediates and br_table,
- wasi_capabilities.wat for host capability/risk analysis checks,
- wasi_preview2_like.wat for WASI preview2-like namespace detection (wasi:* imports),
- js_interface.wat for JavaScript embedding detection (js, wbg, and wasm:js-string imports),
- dos_growth_loop.wat for loop + memory.grow DoS heuristics.

These fixtures are used in tests/test_e2e.py to validate the disassembly output and in tests/test_json_api.py to validate the structured API.
The repository is a practical decoder, not a full specification implementation:

- Spec validation (type checking, module-level structural constraints) is deliberately out of scope.
- The custom name section decodes subsections 1 (function names) and 2 (local names); other subsections such as label names are skipped.
- Some rarely used init-expression forms in element and data segments fall back to a hex scan rather than full expression decoding.
- The analysis layer is heuristic by design and is intended for triage, not formal proof of exploitability.

Run the full test suite:

    python -m pytest -q

Rebuild .wasm fixtures from .wat sources:

    python tests/fixtures/build.py

The fixture build script requires WABT's wat2wasm binary to be available on PATH.

If you prefer using Poetry, the repository metadata in pyproject.toml indicates Poetry-based packaging:

    poetry install
    poetry run pytest -q
    poetry run python tests/fixtures/build.py

If you are evaluating this project for security tooling or pipeline integration, start with these files:

- wasm_tools/parser.py for parse correctness,
- wasm_tools/opcodes.py for current opcode coverage,
- wasm_tools/api.py for the stable integration surface,
- tests/test_e2e.py for output expectations,
- specification/wasm-latest/5.3-binary.instructions.spectec for spec alignment work.

This project is licensed under the MIT License. See LICENSE for details.
The inputs to the AI agents came from the WebAssembly specification, the WABT project, and the author's knowledge of Python and WebAssembly. The outputs are original code generated by the AI agents based on those inputs. It is possible this project is therefore not MIT-licensed due to the presence of third-party specification text in the training data. The author has made a good faith effort to generate original code and to avoid copying any specific text from the specification, but this cannot be guaranteed. Users should review the code and the specification to ensure compliance with their licensing needs.