This repository contains the code, configurations, and scripts for reproducing the experiments in our paper.
Note: models and data are hosted on HuggingFace under `decaf-usenix/` (anonymized) due to their size: `Decaf-Gen-{1.3b, 6.7b, 22b}`, two 32B rerankers, the ExeBench test sets, and the Juliet vulnerability-detection dataset. The provided download script fetches whichever subset you need (the full set is over 200 GB).
We recommend opening the repository as a Dev Container in VS Code.
- Edit `.devcontainer/devcontainer.json` to mount your storage directory on the host machine.
- After building the container, run `.devcontainer/conda_install.sh` to install all dependencies.
- Create an `HF_TOKEN` on HuggingFace and add it both to the environment and to the project's `.dotenv` file to authorize HuggingFace access, as sketched below.
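A minimal sketch of the token setup, assuming the `.dotenv` file uses plain `KEY=VALUE` lines (the token value below is a placeholder):

```bash
# Export the token for the current shell session (placeholder value).
export HF_TOKEN=hf_xxxxxxxxxxxxxxxx

# Persist it in the project's .dotenv file (assumed KEY=VALUE format).
echo "HF_TOKEN=${HF_TOKEN}" >> .dotenv
```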
| Script | Description |
|---|---|
| `scripts/download.sh` | Download models and data from HuggingFace |
| `scripts/inference.py` | LLM inference (via vLLM) to sample decompilations from the LLMs |
| `scripts/evaluator.py` | Evaluation of LLM-generated decompilations: compilation, execution, edit distance, etc. |
| `scripts/eval_rerank.py` | Reranking using the different reranking methodologies |
| `src/utils/exebench.py` | Core utility for compiling, executing, disassembling, and evaluating ExeBench examples |
| `src/training/trainer.py` | Supervised fine-tuning (SFT) training |
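These scripts naturally chain: `inference.py` samples candidate decompilations, `evaluator.py` scores them, and `eval_rerank.py` selects among them. A hedged sketch of one such chain; the `--config` flag and config name are illustrative assumptions, and the shipped drivers under `scripts/merged_test_set_experiments/` show the real invocations:

```bash
# Illustrative chain only; flag and config names are assumptions,
# see the shipped driver scripts for the real invocations.
python scripts/inference.py   --config configs/test_experiments/example.yaml  # sample decompilations (vLLM)
python scripts/evaluator.py   --config configs/test_experiments/example.yaml  # compile, execute, and score them
python scripts/eval_rerank.py --config configs/test_experiments/example.yaml  # rerank the scored candidates
```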
The `scripts/merged_test_set_experiments/` directory contains shell scripts for running the merged-test-set experiments (inference, evaluation, reranking) across the different models.
For the Juliet vulnerability-recovery experiment:
```bash
# End-to-end (GPUs needed): Ghidra prepare -> infer -> evaluate -> rerank -> analysis
bash scripts/juliet/run_juliet_pipeline.sh

# Re-run just the analysis side (no GPUs) against the shipped reranked_results.jsonl
bash scripts/juliet/run_juliet_pipeline.sh analyze
```

Individual stages are also addressable: `prepare`, `infer`, `evaluate`, `rerank`, `populate`, `codeql`, `summarize`. The `analyze` shortcut composes `populate -> codeql -> summarize`.
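Each stage can also be run on its own by passing its name, just as with `analyze`:

```bash
# Run single stages of the Juliet pipeline (stage names from the list above).
bash scripts/juliet/run_juliet_pipeline.sh prepare   # Ghidra preparation only
bash scripts/juliet/run_juliet_pipeline.sh rerank    # reranking only
```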
| Path | Description |
|---|---|
| `configs/test_experiments/` | Inference / evaluation / reranking configs (merged test sets) |
| `configs/decaf_batch_juliet_ceiling_O2_funceval.yaml` | Juliet O2 batch pipeline config (used by `run_juliet_full_pipeline.sh`) |
To verify the environment setup, run:
```bash
pytest tests/ --ignore=tests/models/ -v
```

This validates core functionality (compilation, execution, distance metrics) without requiring GPU resources.
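While debugging, pytest's standard `-k` filter can narrow the run; the keyword below is illustrative and should be matched to the actual test names:

```bash
# Run only tests whose names match "distance" (illustrative keyword).
pytest tests/ --ignore=tests/models/ -k distance -v
```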
- Download models and data:

  ```bash
  ./scripts/download.sh --all
  ```

  This fetches:

  - `Decaf-Gen-1.3b` / `-6.7b` / `-22b`: LLM generators at three scales
  - `Decaf-ReRanker-32b-stripped` / `-unstripped`: neural rerankers
  - `Decaf-Test-Sets`: Real and Synth merged-test-set evaluation JSONL files
  - `Decaf-Juliet-Funceval`: Juliet binaries, manifests, alignments, and a 3.2 GB `reranked_results.jsonl` so the analysis pipeline can run without GPUs

  You can also fetch selectively:

  ```bash
  ./scripts/download.sh --models-only   # all generators + rerankers
  ./scripts/download.sh --data-only     # test sets + Juliet
  ./scripts/download.sh --juliet-only   # just the Juliet dataset (~707 MB)
  ```
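  As a quick sanity check after downloading, compare on-disk sizes against the figures above; the paths here are assumptions, so adjust them to wherever `download.sh` places its artifacts:

  ```bash
  # Assumed layout; adjust paths to match where download.sh writes.
  du -sh models/ data/                        # full set is over 200 GB; Juliet alone ~707 MB
  ls -lh data/juliet/reranked_results.jsonl   # expected around 3.2 GB
  ```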
- Merged-test-set experiments: run the drivers under `scripts/merged_test_set_experiments/` or the configs under `configs/test_experiments/ours_base_{1.3b,6.7b}/`:

  ```bash
  bash scripts/merged_test_set_experiments/ours_base_v2_n32_inference.sh
  bash scripts/merged_test_set_experiments/ours_base_v2_n32_eval.sh
  bash scripts/merged_test_set_experiments/ours_base_v2_n32_rerank_comprehensive.sh
  ```
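  The three drivers appear intended to run in that order; a small sketch that chains them using the script names above:

  ```bash
  # Inference, then evaluation, then comprehensive reranking.
  for stage in inference eval rerank_comprehensive; do
    bash "scripts/merged_test_set_experiments/ours_base_v2_n32_${stage}.sh"
  done
  ```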
- Juliet vulnerability-recovery experiment: after `download.sh --juliet-only`, run either:

  ```bash
  # Analysis-only (no GPU needed): uses the shipped reranked_results.jsonl
  bash scripts/juliet/run_juliet_analysis_pipeline.sh

  # Full end-to-end (GPUs needed): re-runs inference / evaluate / rerank
  bash scripts/juliet/run_juliet_full_pipeline.sh
  ```