buzzer-re/ToCode

ToCode

ToCode exports a binary or IDA database into a source-like project tree: raw recovered C, matching assembly, function summaries, section data, optional IDA database, and metadata that coding agents can read directly.

Why

AI models are strong at coding, especially when they can traverse large codebases and accumulate context with subagents and other strategies. When we use these agents to assist with reverse engineering, we usually expose tools through MCP or similar mechanisms so the coding agent can learn and build strategies around disassemblers such as IDA and r2. This constrains how the agent behaves and increases the need for deep, complex reasoning.

There should be a better approach, one that lets even smaller models perform well on this kind of work.

The idea behind ToCode is simple: use a disassembler such as IDA to create a source-code-like project for a given binary, with a pre-built AGENTS.md so most coding agents start with precomputed context. ToCode also produces rich .json files with important metadata.

With this approach, even tiny models can perform well without MCP-style tool calls, because ToCode provides exactly what coding agents work best with: code.

Export layout

The exported project contains the following structure:

sample_decompiler/
  AGENTS.md
  CLAUDE.md
  src/raw/**/*.c
  src/raw/**/*.asm
  src/raw/**/*.summary
  include/*.h
  data/*.bin
  data/variables.json
  data/variables_interesting.json
  function-index.json
  functions.json
  sections.json
  strings.json
  imports.json
  exports.json
  relocations.json
  reachable.json
  cluster-graph.json
  triage.json
  project.json
  export-manifest.json

| Path | Description |
| --- | --- |
| `src/raw` | Decompiled C-like output, assembly, and summaries grouped by cluster. |
| `include` | Generated headers for the exported project. |
| `data` | Raw section dumps and variable metadata. |
| `*.json` | Functions, sections, strings, imports, exports, relocations, reachability, clusters, triage, project metadata, and the export manifest. |
| `AGENTS.md` / `CLAUDE.md` | Instructions for agents analyzing the exported binary. |
| `src/tree` | Optional scanner-friendly C output when `--tree` is used. |
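As an illustration of how this layout can be consumed by a script, the sketch below walks an export directory, groups the per-function .c, .asm, and .summary files, and records the top-level JSON type of each root-level metadata file. It assumes that a function's three artifacts share a file stem inside its cluster directory under src/raw; inventory_export is a hypothetical helper, not part of ToCode, and it deliberately assumes nothing about the JSON schemas.

```python
import json
from collections import defaultdict
from pathlib import Path

# Assumption: a function's .c, .asm and .summary files share a stem
# inside the same src/raw/<cluster>/ directory.
ARTIFACT_KINDS = {"c", "asm", "summary"}


def inventory_export(root: Path) -> dict:
    """Summarize a ToCode export directory (hypothetical helper)."""
    raw = root / "src" / "raw"
    artifacts: dict[str, dict[str, Path]] = defaultdict(dict)
    for path in sorted(raw.rglob("*")):
        kind = path.suffix.lstrip(".")
        if path.is_file() and kind in ARTIFACT_KINDS:
            # Key by the path relative to src/raw with the extension removed,
            # e.g. "cluster_0/sub_401000".
            key = path.relative_to(raw).with_suffix("").as_posix()
            artifacts[key][kind] = path

    # Record only the top-level JSON type of each metadata file,
    # since the exact schemas are not assumed here.
    metadata: dict[str, str] = {}
    for json_path in sorted(root.glob("*.json")):
        with json_path.open(encoding="utf-8") as fh:
            metadata[json_path.name] = type(json.load(fh)).__name__

    # Functions missing at least one of the three artifact kinds.
    incomplete = [k for k, v in artifacts.items() if set(v) != ARTIFACT_KINDS]
    return {"functions": dict(artifacts), "metadata": metadata, "incomplete": incomplete}
```

For example, inventory_export(Path("firmwareX_decompiled")) would return a dict whose "incomplete" list names any function that lacks one of the three files, which is a cheap sanity check before handing the tree to an agent.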

Supported backends

Currently, IDA (using the ida-domain/idapro Python libraries) and radare2 are supported. Other disassemblers may be added in the future.

Usage

ToCode supports Windows, Linux, and macOS with Python 3.10 or newer.

On Windows PowerShell:

powershell -ExecutionPolicy Bypass -File .\install.ps1

On Linux or macOS:

bash ./install.sh

Manual setup (requires uv):

git clone https://github.com/buzzer-re/ToCode
cd ToCode
uv sync --locked
uv tool install --force --editable .

Example

tocode firmwareX.bin -o firmwareX_decompiled/
cd firmwareX_decompiled/
codex 

# Inside your agent shell, type your goals, e.g.: "Give me a brief overview of the boot process of this firmware."
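To export several images in one go, a small wrapper can build the same tocode <input> -o <dir> invocation shown above for each input. This is a hypothetical helper, not part of ToCode: it uses only the -o and --tree flags mentioned in this README, assumes --tree takes no value, and derives output names in the firmwareX_decompiled/ style. It defaults to a dry run that only returns the commands.

```python
import shutil
import subprocess
from pathlib import Path


def export_command(target: Path, out_root: Path, tree: bool = False) -> list[str]:
    """Build one tocode invocation using only flags mentioned in this README."""
    # firmwareX.bin and firmwareX.bin.i64 both map to firmwareX_decompiled/,
    # matching the naming used in the examples above.
    base = target.name.split(".", 1)[0]
    cmd = ["tocode", str(target), "-o", str(out_root / f"{base}_decompiled")]
    if tree:
        cmd.append("--tree")  # assumed to be a value-less switch
    return cmd


def export_all(targets: list[Path], out_root: Path, tree: bool = False,
               dry_run: bool = True) -> list[list[str]]:
    """Return the commands and, unless dry_run is set, execute them in order."""
    commands = [export_command(t, out_root, tree) for t in targets]
    if not dry_run:
        if shutil.which("tocode") is None:
            raise FileNotFoundError("tocode is not on PATH; run the install script first")
        for cmd in commands:
            # check=True raises CalledProcessError on the first failed export.
            subprocess.run(cmd, check=True)
    return commands
```

For example, export_all(sorted(Path("images").glob("*.bin")), Path("exports"), dry_run=False) would export every .bin file in a hypothetical images/ folder. Pass either a raw binary or its IDA database, not both, since they map to the same output directory.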

From an existing IDA database (ongoing RE work)

tocode firmwareX.bin.i64 -o firmwareX_decompiled/
...

Development

This tool was built using agentic coding, so if you plan to contribute, I strongly advise doing the same.

Before changing ToCode, make sure Python (including its standard-library compileall module), uv, ruff, mypy, and pytest are available. For backend work, also install IDA or radare2, depending on which backend you are touching.
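A quick preflight along these lines can confirm the prerequisites before you start. It is a sketch, not part of ToCode: it checks the Python version, looks up uv, ruff, mypy, and pytest on PATH, and probes for radare2 and the IDA Python libraries. The executable name r2 and the module names ida_domain and idapro are assumptions; adjust them to your installation.

```python
import importlib.util
import shutil
import sys

# Command-line tools named in the Development section.
DEV_TOOLS = ("uv", "ruff", "mypy", "pytest")
# Assumptions: radare2 is on PATH as "r2"; the IDA libraries import under these names.
R2_EXECUTABLE = "r2"
IDA_MODULES = ("ida_domain", "idapro")


def preflight() -> dict[str, bool]:
    """Return a name -> available mapping without importing any backend."""
    report = {"python>=3.10": sys.version_info >= (3, 10)}
    # compileall is part of the standard library, so this should always be True.
    report["compileall"] = importlib.util.find_spec("compileall") is not None
    for tool in DEV_TOOLS:
        report[tool] = shutil.which(tool) is not None
    report["radare2"] = shutil.which(R2_EXECUTABLE) is not None
    for module in IDA_MODULES:
        # find_spec locates a top-level module without executing it.
        report[module] = importlib.util.find_spec(module) is not None
    return report


for name, ok in preflight().items():
    print(f"{'ok' if ok else 'missing':8} {name}")
```

Only one backend needs to report ok, and only if you are working on that backend.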

The main instructions for agents are in AGENTS.md. Read it before starting, and make sure the local quality gate passes before proceeding.

Quality Gate

Run the local CI gate before opening a PR:

./ci-local.sh

On Windows PowerShell:

powershell -ExecutionPolicy Bypass -File .\ci-local.ps1
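The gate is a fail-fast sequence: each check must pass before the next runs. The sketch below illustrates that pattern with the tools listed under Development. The step list is hypothetical and is not a copy of ci-local.sh or ci-local.ps1, which may use different commands or arguments; always run the real scripts before opening a PR.

```python
import subprocess
import sys
from typing import Callable, Optional, Sequence

# Hypothetical steps built from the tools listed under Development; the real
# ci-local.sh / ci-local.ps1 may use different commands or arguments.
STEPS: list[list[str]] = [
    ["ruff", "check", "."],
    ["mypy", "."],
    ["pytest", "-q"],
    [sys.executable, "-m", "compileall", "-q", "."],
]


def _run(cmd: Sequence[str]) -> int:
    """Run one step and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_gate(
    steps: Sequence[Sequence[str]] = STEPS,
    runner: Optional[Callable[[Sequence[str]], int]] = None,
) -> bool:
    """Run steps in order and stop at the first non-zero exit (fail-fast)."""
    run = runner or _run
    for cmd in steps:
        print("->", " ".join(cmd))
        if run(cmd) != 0:
            print("quality gate failed at:", " ".join(cmd))
            return False
    return True
```

The injectable runner keeps the ordering logic testable without launching the real tools.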

About

Transform binaries into source-code-like projects that coding agents can traverse, analyze, and use as an oracle for large binaries. Supports IDA Pro and radare2.
