Decaf: Decompilation with Automatic Feedback

Artifact Repository

This repository contains the code, configurations, and scripts for reproducing the experiments in our paper.

Note: Due to their size, models and data are hosted on HuggingFace under decaf-usenix/ (anonymized): Decaf-Gen-{1.3b, 6.7b, 22b}, two 32B rerankers, the ExeBench test sets, and the Juliet vulnerability-detection dataset. The provided download script fetches whichever subset you need (the full set is over 200 GB).


Environment Setup

We recommend opening the repository as a Dev Container in VS Code.

  1. Edit .devcontainer/devcontainer.json to mount your storage directory on the host machine.
  2. After building the container, run .devcontainer/conda_install.sh to install all dependencies.
  3. Create an HF_TOKEN on HuggingFace and add it to both your environment and the project's .dotenv file to authenticate with HuggingFace.
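For reference, the .dotenv file is a plain KEY=VALUE file. A minimal loader might look like the following sketch (the function name and behavior are illustrative assumptions, not the repo's actual code, which may use a dotenv library instead):

```python
import os

def load_dotenv(path=".dotenv"):
    """Sketch of a minimal .dotenv loader: export KEY=VALUE lines
    into os.environ so scripts can pick up HF_TOKEN.
    Illustrative only; not the repository's implementation."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and malformed lines.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Do not override variables already set in the environment.
            os.environ.setdefault(key.strip(), value.strip())
```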

Key Scripts

| Script | Description |
| --- | --- |
| scripts/download.sh | Download models and data from HuggingFace |
| scripts/inference.py | LLM inference (vLLM) to sample decompilation candidates from the LLMs |
| scripts/evaluator.py | Evaluate LLM-generated decompilations: compilation, execution, edit distance, etc. |
| scripts/eval_rerank.py | Reranking using the different reranking methodologies |
| src/utils/exebench.py | Core utility for compiling, executing, disassembling, and evaluating ExeBench examples |
| src/training/trainer.py | Supervised fine-tuning training |
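As an illustration of the edit-distance metric the evaluator reports, a normalized Levenshtein distance over source strings can be computed as below. This is a sketch only; the exact metric in scripts/evaluator.py may tokenize or normalize differently:

```python
def normalized_edit_distance(a: str, b: str) -> float:
    """Levenshtein distance between two strings, divided by the longer
    length: 0.0 means identical, 1.0 means maximally different.
    Illustrative sketch; the repo's evaluator may differ."""
    if not a and not b:
        return 0.0
    # Classic dynamic-programming Levenshtein, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1] / max(len(a), len(b))
```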


Experiment Drivers

The scripts/merged_test_set_experiments/ directory contains shell scripts for running the merged-test-set experiments (inference, evaluation, reranking) across the different models.

For the Juliet vulnerability-recovery experiment:

# End-to-end (GPUs needed): Ghidra prepare -> infer -> evaluate -> rerank -> analysis
bash scripts/juliet/run_juliet_pipeline.sh

# Re-run just the analysis side (no GPUs) against the shipped reranked_results.jsonl
bash scripts/juliet/run_juliet_pipeline.sh analyze

Individual stages are also addressable: prepare, infer, evaluate, rerank, populate, codeql, summarize. The analyze shortcut composes populate -> codeql -> summarize.
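The stage dispatch just described can be sketched as a small expansion table (a Python sketch; the stage names come from the text above, but the actual dispatcher is a shell script and may differ):

```python
from typing import List, Optional

# Full end-to-end stage order, per the pipeline description.
PIPELINE = ["prepare", "infer", "evaluate", "rerank",
            "populate", "codeql", "summarize"]
# The "analyze" shortcut composes the analysis-only tail.
SHORTCUTS = {"analyze": ["populate", "codeql", "summarize"]}

def expand_stages(arg: Optional[str] = None) -> List[str]:
    """Map a pipeline argument to concrete stages: no argument runs
    everything, a shortcut expands, a single stage runs alone."""
    if arg is None:
        return PIPELINE
    if arg in SHORTCUTS:
        return SHORTCUTS[arg]
    if arg in PIPELINE:
        return [arg]
    raise ValueError(f"unknown stage: {arg}")
```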


Configurations

| Path | Description |
| --- | --- |
| configs/test_experiments/ | Inference / evaluation / reranking configs (merged test sets) |
| configs/decaf_batch_juliet_ceiling_O2_funceval.yaml | Juliet O2 batch pipeline config (used by run_juliet_full_pipeline.sh) |


Testing

To verify the environment setup, run:

pytest tests/ --ignore=tests/models/ -v

This validates core functionality (compilation, execution, distance metrics) without requiring GPU resources.


Reproduce

  1. Download models and data:

    ./scripts/download.sh --all

    This fetches:

    • Decaf-Gen-1.3b / -6.7b / -22b — LLM generators at three scales
    • Decaf-ReRanker-32b-stripped / -unstripped — neural rerankers
    • Decaf-Test-Sets — Real and Synth merged-test-set evaluation jsonl
    • Decaf-Juliet-Funceval — Juliet binaries, manifests, alignments, and a 3.2 GB reranked_results.jsonl so the analysis pipeline can run without GPUs

    You can also fetch selectively:

    ./scripts/download.sh --models-only      # all generators + rerankers
    ./scripts/download.sh --data-only        # test sets + Juliet
    ./scripts/download.sh --juliet-only      # just the Juliet dataset (~707 MB)
  2. Merged-test-set experiments: run the drivers under scripts/merged_test_set_experiments/, which use the configs under configs/test_experiments/ours_base_{1.3b,6.7b}/:

    bash scripts/merged_test_set_experiments/ours_base_v2_n32_inference.sh
    bash scripts/merged_test_set_experiments/ours_base_v2_n32_eval.sh
    bash scripts/merged_test_set_experiments/ours_base_v2_n32_rerank_comprehensive.sh
  3. Juliet vulnerability-recovery experiment. After download.sh --juliet-only, run either:

    # Analysis-only (no GPU needed) — uses shipped reranked_results.jsonl
    bash scripts/juliet/run_juliet_analysis_pipeline.sh
    
    # Full end-to-end (GPUs needed) — re-runs inference / evaluate / rerank
    bash scripts/juliet/run_juliet_full_pipeline.sh
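Conceptually, the rerank stage in both experiments selects one decompilation from the sampled candidates for each function. A minimal sketch of that selection, with entirely hypothetical candidate fields (the repo's actual schema and reranking methodologies differ):

```python
def select_best(candidates):
    """Pick one candidate per function: prefer candidates that compile,
    then those that pass execution checks, then the highest reranker
    score. Field names ("compiles", "passes_tests", "score") are
    hypothetical, not the repository's schema."""
    def key(c):
        return (c.get("compiles", False),
                c.get("passes_tests", False),
                c.get("score", float("-inf")))
    return max(candidates, key=key)
```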
