ToCode exports a binary or IDA database into a source-like project tree: raw recovered C, matching assembly, function summaries, section data, optional IDA database, and metadata that coding agents can read directly.
AI models are strong at coding, especially when they can traverse large codebases and accumulate context with subagents and similar strategies. When we use these agents to assist with reverse engineering, we usually expose tools such as IDA and r2 through MCP or similar integrations, so the agent has to learn those tools and build strategies around them. This approach constrains how the agent behaves and increases the need for deep, complex reasoning.
There should be a better way, one that lets even smaller models perform well on this kind of work.
The idea behind ToCode is simple: use a disassembler such as IDA to create a source-code-like project for a given binary, with a pre-built AGENTS.md so most coding agents start with precomputed context. ToCode also produces rich .json files with important metadata.
With this approach, even tiny models can perform well without MCP-style tool calls, because ToCode gives coding agents exactly what they are best at working with: code.
The exported project contains the following structure:

```text
sample_decompiler/
  AGENTS.md
  CLAUDE.md
  src/raw/**/*.c
  src/raw/**/*.asm
  src/raw/**/*.summary
  include/*.h
  data/*.bin
  data/variables.json
  data/variables_interesting.json
  function-index.json
  functions.json
  sections.json
  strings.json
  imports.json
  exports.json
  relocations.json
  reachable.json
  cluster-graph.json
  triage.json
  project.json
  export-manifest.json
```
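The tree places recovered C, assembly, and summary files side by side under `src/raw`. Before reading, it can help to line those three views up per function. The Python sketch below assumes the three files for one function share a file stem and a directory under `src/raw`; that naming scheme is an assumption rather than documented ToCode behavior, and `pair_function_files` and `incomplete` are hypothetical helpers.

```python
import sys
from collections import defaultdict
from pathlib import Path

# Assumption: the artifacts for one function share a stem and sit in the same
# directory under src/raw (e.g. <cluster>/sub_401000.c, .asm, .summary).
ARTIFACT_KINDS = {".c", ".asm", ".summary"}


def pair_function_files(root: Path) -> dict[tuple[str, str], dict[str, Path]]:
    """Group src/raw files by (directory relative to src/raw, file stem)."""
    raw = root / "src" / "raw"
    groups: dict[tuple[str, str], dict[str, Path]] = defaultdict(dict)
    for path in raw.rglob("*"):
        if path.is_file() and path.suffix in ARTIFACT_KINDS:
            rel_dir = path.parent.relative_to(raw).as_posix()
            # Key the file by its kind without the dot: "c", "asm", "summary".
            groups[(rel_dir, path.stem)][path.suffix.lstrip(".")] = path
    return dict(groups)


def incomplete(groups: dict[tuple[str, str], dict[str, Path]]) -> list[tuple[str, str]]:
    """Return the keys of functions lacking at least one artifact kind."""
    wanted = {kind.lstrip(".") for kind in ARTIFACT_KINDS}
    return sorted(key for key, files in groups.items() if set(files) != wanted)


if __name__ == "__main__":
    project = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    found = pair_function_files(project)
    print(f"{len(found)} functions, {len(incomplete(found))} missing an artifact")
```

Run it from inside an exported project (or pass the project path) to spot functions whose assembly or summary did not export.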
| Path | Description |
|---|---|
| `src/raw` | Decompiled C-like output, assembly, and summaries grouped by cluster. |
| `include` | Generated headers for the exported project. |
| `data` | Raw section dumps and variable metadata. |
| `*.json` | Functions, sections, strings, imports, exports, relocations, reachability, clusters, triage, project metadata, and export manifest. |
| `AGENTS.md` / `CLAUDE.md` | Instructions for agents analyzing the exported binary. |
| `src/tree` | Optional scanner-friendly C output when `--tree` is used. |
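The JSON files are plain data, so a quick overview of what an export contains needs nothing beyond the standard library. The sketch below deliberately assumes no schema: it only reports whether each top-level or `data/` JSON file is a list or an object and how large it is. `describe_metadata` is a hypothetical helper, not part of ToCode.

```python
import json
import sys
from pathlib import Path


def describe_metadata(root: Path) -> dict[str, str]:
    """Summarize top-level and data/ JSON files without assuming a schema."""
    report: dict[str, str] = {}
    paths = sorted(root.glob("*.json")) + sorted((root / "data").glob("*.json"))
    for path in paths:
        name = path.relative_to(root).as_posix()
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, UnicodeDecodeError, json.JSONDecodeError) as exc:
            report[name] = f"unreadable: {exc}"
            continue
        if isinstance(data, list):
            report[name] = f"list with {len(data)} items"
        elif isinstance(data, dict):
            preview = ", ".join(list(data)[:5])  # show only the first few keys
            report[name] = f"object with {len(data)} keys ({preview})"
        else:
            report[name] = type(data).__name__
    return report


if __name__ == "__main__":
    project = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name, summary in describe_metadata(project).items():
        print(f"{name}: {summary}")
```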
Currently, IDA (using the ida-domain/idapro Python libraries) and radare2 are supported. Other disassemblers may be added in the future.
ToCode supports Windows, Linux, and macOS with Python 3.10 or newer.
On Windows PowerShell:

```powershell
powershell -ExecutionPolicy Bypass -File .\install.ps1
```

On Linux or macOS:

```bash
bash ./install.sh
```

Manual setup (requires uv):

```bash
git clone https://github.com/buzzer-re/ToCode
cd ToCode
uv sync --locked
uv tool install --force --editable .
```

Export a binary and start your coding agent inside the exported project:

```bash
tocode firmwareX.bin -o firmwareX_decompiled/
cd firmwareX_decompiled/
codex
# Inside your agent shell, type your goals, e.g.: "Give me a brief overview of the boot process of this firmware."
```

An existing IDA database can be exported the same way:

```bash
tocode firmwareX.bin.i64 -o firmwareX_decompiled/
...
```
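When several firmware images need exporting, the same `tocode <input> -o <output>` invocation can be scripted. The sketch below only builds and runs that command form; the output-directory naming rule (text before the first dot plus `_decompiled`) is a hypothetical convention chosen to match the examples, and `export_command` / `export_all` are illustrative names.

```python
import shutil
import subprocess
from pathlib import Path


def export_command(target: Path, out_root: Path) -> list[str]:
    """Build a `tocode <input> -o <output>` call like the examples above.

    Hypothetical naming rule: keep the file name up to its first dot and
    append "_decompiled", so firmwareX.bin and firmwareX.bin.i64 both map
    to firmwareX_decompiled.
    """
    base = target.name.split(".", 1)[0]
    return ["tocode", str(target), "-o", str(out_root / f"{base}_decompiled")]


def export_all(targets: list[Path], out_root: Path, dry_run: bool = False) -> list[list[str]]:
    """Export each target in turn; stop at the first failing export."""
    commands = [export_command(t, out_root) for t in targets]
    if not dry_run:
        if shutil.which("tocode") is None:
            raise RuntimeError("tocode is not on PATH; run the installer first")
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

Calling `export_all(targets, Path("exports"), dry_run=True)` first is a cheap way to review the commands before running them.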
This tool was built with agentic coding, so if you plan to contribute, I strongly advise doing the same.
Before changing ToCode, have Python, uv, ruff, mypy, pytest, and compileall available. For backend work, also have IDA or radare2 installed, depending on what you are touching.
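A quick way to confirm these prerequisites is to look each command up on `PATH`. The sketch below is an assumption-laden helper, not part of ToCode: it treats `compileall` as the standard-library module it is, assumes the IDA libraries import as `ida_domain` or `idapro`, and looks for radare2 as `r2` or `radare2`. If ruff, mypy, and pytest live only in the uv-managed environment, run the script with `uv run python <script>` so they are found.

```python
import importlib.util
import shutil
from typing import Any, Callable, Optional

Which = Callable[[str], Optional[str]]
FindSpec = Callable[[str], Any]

REQUIRED_COMMANDS = ("uv", "ruff", "mypy", "pytest")


def missing_prerequisites(which: Which = shutil.which,
                          find_spec: FindSpec = importlib.util.find_spec) -> list[str]:
    """Return the required tools that cannot be found."""
    missing = [cmd for cmd in REQUIRED_COMMANDS if which(cmd) is None]
    # compileall ships with Python, so check for it as an importable module.
    if find_spec("compileall") is None:
        missing.append("compileall")
    return missing


def available_backends(which: Which = shutil.which,
                       find_spec: FindSpec = importlib.util.find_spec) -> list[str]:
    """Report which disassembler backends appear to be installed."""
    backends = []
    # Assumption: the IDA Python libraries import as `ida_domain` / `idapro`.
    if find_spec("ida_domain") is not None or find_spec("idapro") is not None:
        backends.append("ida")
    if which("r2") is not None or which("radare2") is not None:
        backends.append("radare2")
    return backends


if __name__ == "__main__":
    print("missing:", missing_prerequisites() or "nothing")
    print("backends:", available_backends() or "none found")
```

The `which` and `find_spec` parameters exist only so the checks can be exercised without touching the real system.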
The main instructions for agents are in AGENTS.md. Read it before starting, and make sure the local quality gate passes before proceeding.
Run the local CI gate before opening a PR:
```bash
./ci-local.sh
```

On Windows PowerShell:

```powershell
powershell -ExecutionPolicy Bypass -File .\ci-local.ps1
```
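If you drive the gate from another script (for example, a pre-push hook), picking the right entry point per OS is a small platform check. This is a hypothetical sketch that mirrors the two commands above; `ci_command` and `run_ci` are illustrative names, and it assumes `ci-local.sh` is executable on Linux and macOS.

```python
import platform
import subprocess
from pathlib import Path
from typing import Optional


def ci_command(system: Optional[str] = None) -> list[str]:
    """Pick the local CI gate for the given (or current) OS."""
    system = system or platform.system()
    if system == "Windows":
        return ["powershell", "-ExecutionPolicy", "Bypass", "-File", r".\ci-local.ps1"]
    # Linux ("Linux") and macOS ("Darwin") run the shell script directly.
    return ["./ci-local.sh"]


def run_ci(repo: Path) -> int:
    """Run the gate from the repository root and return its exit code."""
    return subprocess.run(ci_command(), cwd=repo).returncode
```

A hook could then end with `raise SystemExit(run_ci(Path(".")))` so a failing gate blocks the push.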