CRAFT

Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies

A closed-loop post-training framework for autonomous driving policies.

Project Page · arXiv

Keyu Chen¹ · Nanfei Ye² · Yida Wang² · Wenchao Sun¹ · Danqi Zhao¹ · Hao Cheng¹ · Sifa Zheng¹
¹School of Vehicle and Mobility, Tsinghua University · ²Li Auto Inc

CRAFT overview

💫 CRAFT improves driving policies by combining dense counterfactual proxy supervision with residual correction from true closed-loop interaction.

If you find this repository useful, please consider giving it a star.

✨ News

  • 2026-05-07 Our paper is available on arXiv📄!
  • 2026-05-06 Explore our project page, now live here🔗!

📆 TODO List

  • Training & Eval Pipeline
  • Fine-Tuning Support (SparseDriveV2 / HiP-AD / MindDrive)
  • Visualization Utilities
  • Official fine-tuned checkpoints (Post-Acceptance Release)


🔨 Setup

Recommended system: Ubuntu 20.04 or 22.04

Step 1: Download CARLA 0.9.16

```shell
mkdir carla
cd carla
wget https://carla-releases.s3.us-east-005.backblazeb2.com/Linux/CARLA_0.9.16.tar.gz
tar -xvf CARLA_0.9.16.tar.gz
cd Import && wget https://carla-releases.s3.us-east-005.backblazeb2.com/Linux/AdditionalMaps_0.9.16.tar.gz
cd .. && bash ImportAssets.sh
```

Update your system's PYTHONPATH with the following paths:

```shell
export CARLA_ROOT=YOUR_CARLA_PATH
export PYTHONPATH=$PYTHONPATH:${CARLA_ROOT}/PythonAPI
export PYTHONPATH=$PYTHONPATH:${CARLA_ROOT}/PythonAPI/carla
```
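
If you prefer not to edit your shell profile, the same two entries can be derived and prepended to `sys.path` at runtime. This is an illustrative sketch, not part of the repository; `carla_pythonpath_entries` is a hypothetical helper that assumes only the standard CARLA `PythonAPI` layout shown above.

```python
import os
import sys

def carla_pythonpath_entries(carla_root: str) -> list:
    """The two PythonAPI directories that the exports above put on PYTHONPATH."""
    api = os.path.join(carla_root, "PythonAPI")
    return [api, os.path.join(api, "carla")]

# Prepend the entries at runtime instead of exporting PYTHONPATH in the shell.
entries = carla_pythonpath_entries(os.environ.get("CARLA_ROOT", "/opt/carla"))
sys.path[:0] = entries
```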

Step 2: Set up the conda environment and install CARLA

```shell
conda create -n Craft python=3.10
conda activate Craft
# install the CARLA Python API wheel
pip install YOUR_CARLA_PATH/PythonAPI/carla/dist/carla-0.9.16-cp310-cp310-manylinux_2_31_x86_64.whl
```
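
The `cp310` tag in the wheel filename must match the interpreter, which is why the conda environment above pins Python 3.10. A small sanity check, assuming only the standard wheel naming convention; `expected_cp_tag` is a hypothetical helper:

```python
import sys

def expected_cp_tag() -> str:
    """CPython tag the CARLA wheel filename must contain, e.g. 'cp310' for Python 3.10."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"

wheel = "carla-0.9.16-cp310-cp310-manylinux_2_31_x86_64.whl"
if expected_cp_tag() not in wheel:
    print(f"Interpreter tag is {expected_cp_tag()}, but this wheel is built for cp310.")
```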

Step 3: Clone this git repo in an appropriate folder

```shell
git clone git@github.com:CurryChen77/CraftPolicy.git
cd CraftPolicy
```

Step 4: Install packages

```shell
pip install -r requirements.txt
pip install -e .
```

Step 5: Install E2E packages

Follow the E2E installation guide to install E2E-related packages.

🎁 Data and Checkpoints

  • CARLA Map Data

Doc of HD Map

Please ensure the HD map and speed limit data are downloaded before starting fine-tuning.

| Name | Google Drive | Approx. Size | Storage Place |
| --- | --- | --- | --- |
| HD Map Data | Link | 714 MB | Folder |
| Speed Limits Data | Link | 79 MB | Folder |
  • AV Checkpoints

| AV Name | Google Drive | Approx. Size | Storage Place |
| --- | --- | --- | --- |
| PlanT-V2 | Link | 732 MB | Folder |

For E2E AV checkpoints, see the E2E installation guide.

  • CBV Checkpoints

| CBV Name | Google Drive | Approx. Size | Storage Place |
| --- | --- | --- | --- |
| Pluto | Link | 51.4 MB | Folder |
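
Before launching fine-tuning, it can save a failed run to verify that the downloaded data and checkpoints are actually in place. This is an illustrative sketch, not part of the repository; the entry names below are hypothetical placeholders for the storage places referenced in the tables above.

```python
from pathlib import Path

# Hypothetical entry names; substitute the actual storage places
# listed in the tables above.
REQUIRED = ["hd_map", "speed_limits", "model_ckpt"]

def missing_entries(root: str) -> list:
    """Return the required entries that do not exist under `root`."""
    return [name for name in REQUIRED if not (Path(root) / name).exists()]

print(missing_entries("."))  # anything printed here still needs downloading
```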

🔥 Usage

The main workflows are launched from scripts/. For most experiments, start with the two high-level launchers below:

  • scripts/run_train_and_eval.sh: collect rollouts, train one fine-tuner, then evaluate it.
  • scripts/run_multi_finetuners.sh: run several fine-tuners sequentially with the same settings.

A typical multi-GPU run uses one CARLA worker per collector/evaluator GPU and Lightning DDP for the trainer GPUs. In the examples below, GPUs 0,1,2,3 run CARLA workers and GPUs 4,5,6,7 run training.

Training / Evaluation

Train and evaluate one fine-tuner:

```shell
bash scripts/run_train_and_eval.sh \
  -e sparsedrive_scorer \
  -c scenario \
  -f craft \
  -p bench2drive220 \
  -s 0 \
  -t 1 \
  -r 1 \
  -g 0,1,2,3 \
  -u 4,5,6,7 \
  -S \
  -V
```

Train and evaluate multiple fine-tuners sequentially:

```shell
bash scripts/run_multi_finetuners.sh \
  -e sparsedrive_scorer \
  -c scenario \
  -f craft,grpo,ppo,reinforce_plus \
  -p bench2drive220 \
  -s 0 \
  -t 1 \
  -r 1 \
  -g 0,1,2,3 \
  -u 4,5,6,7 \
  -S
```

Recommended launcher flags:

| Flag | Meaning |
| --- | --- |
| `-e` | Ego policy config name without `.yaml`, such as `sparsedrive_scorer`, `sparsedrive`, `hip_ad`, `pdm_lite`, `uniad`, or `vad`. |
| `-c` | CBV / traffic policy config name without `.yaml`, such as `scenario`, `autopilot`, or `pluto`. |
| `-f` | Fine-tuner name. Supported values include `craft`, `grpo`, `ppo`, `reinforce_plus`, `rift`, and `soft_align`. `run_multi_finetuners.sh` accepts a comma-separated list. |
| `-p` | Route set under `rift/scenario/route`; pass either `bench2drive220` or `bench2drive220.xml`. |
| `-s` | Random seed. |
| `-t` | Training repetitions for the checkpoint/output tag. |
| `-r` | Evaluation runs used after training. |
| `-g` | Collector GPU list. One CARLA worker is launched per listed GPU. |
| `-u` | Trainer GPU list. Multiple trainer GPUs automatically use Lightning DDP. |
| `-S` | Start from scratch: clear the matching logs, rollout buffers, and checkpoints before running. |
| `-V` | Record videos during the evaluation stage. |

In run_train_and_eval.sh, the evaluation stage uses the union of the -g and -u GPU lists as evaluator GPUs. For lower-level collection, training, evaluation-only runs, pretrained-checkpoint evaluation, and output path conventions, see the detailed usage guide.
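
The union rule above can be illustrated with a small order-preserving helper. This is a sketch of the behavior, not the launcher's actual implementation; `parse_gpus` and `evaluator_gpus` are hypothetical names.

```python
def parse_gpus(spec: str) -> list:
    """Parse a comma-separated GPU list such as '0,1,2,3'."""
    return [int(g) for g in spec.split(",") if g]

def evaluator_gpus(collector_spec: str, trainer_spec: str) -> list:
    """Order-preserving union of the -g (collector) and -u (trainer) lists."""
    union = []
    for gpu in parse_gpus(collector_spec) + parse_gpus(trainer_spec):
        if gpu not in union:
            union.append(gpu)
    return union

print(evaluator_gpus("0,1,2,3", "4,5,6,7"))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

With disjoint lists, as in the examples above, the evaluation stage simply uses all eight GPUs; overlapping lists are deduplicated.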

Common outputs:

| Output | Path pattern |
| --- | --- |
| Rollout buffers | `data/rollout_data/<ego>-<cbv>-seed<seed>/<route>x<train_repetitions>-<finetuner>_finetuner` |
| Collection logs | `log/collect_data/<ego>-<cbv>-seed<seed>/<route>x<train_repetitions>-<finetuner>_finetuner` |
| Training logs | `log/train_ego/<ego>-<cbv>-seed<seed>/<route>x<train_repetitions>-<finetuner>_finetuner` |
| Fine-tuned checkpoints | `rift/ego/model_ckpt/<ego>/<route>x<train_repetitions>-<finetuner>_finetuner-<cbv>-seed<seed>` |
| Evaluation logs | `log/eval/<ego>-<cbv>-seed<seed>/<route>x<train_repetitions>-<finetuner>_finetuner` for the default `eval_runs=1` same-route case. For `eval_runs>1` or cross-route evaluation, the suffix becomes `<eval_route_tag>-train_<train_route>x<train_repetitions>-<finetuner>_finetuner`. |
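
For scripting around these outputs, the patterns can be formatted programmatically. A minimal sketch for the rollout-buffer pattern, using the flags from the single-fine-tuner example above; `rollout_buffer_dir` is a hypothetical helper, not a function shipped with the repository.

```python
def rollout_buffer_dir(ego, cbv, seed, route, train_repetitions, finetuner):
    """Format the rollout-buffer path pattern from the table above."""
    return (f"data/rollout_data/{ego}-{cbv}-seed{seed}/"
            f"{route}x{train_repetitions}-{finetuner}_finetuner")

print(rollout_buffer_dir("sparsedrive_scorer", "scenario", 0,
                         "bench2drive220", 1, "craft"))
# data/rollout_data/sparsedrive_scorer-scenario-seed0/bench2drive220x1-craft_finetuner
```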

Visualization

Plotting scripts are under tools/plot. The notebook provides an executable gallery for the main figures below.

Aggregate ego evaluation metrics and export a CSV:

```shell
python tools/plot/plot_ego_eval_result.py
```

Render the scaling-and-reward combo figure. This command expects assets/results_ego.csv, collection logs, and matching evaluation logs:

```shell
python tools/plot/plot_scaling_reward_combo.py
```

Render policy distribution figures from a rollout snapshot and available pretrained / fine-tuned checkpoints:

```shell
python tools/plot/plot_policy_distribution.py
```

Render training-stability figures from offline W&B logs:

```shell
python tools/plot/plot_train_stability.py
```

🔖 Citation

If you find our paper useful, please cite it via:

```bibtex
@misc{chen2026craft,
      title={CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies},
      author={Keyu Chen and Nanfei Ye and Yida Wang and Wenchao Sun and Danqi Zhao and Hao Cheng and Sifa Zheng},
      year={2026},
      eprint={2605.04470},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2605.04470},
}
```

❤️ Acknowledgements

This implementation is based on code from several repositories. We sincerely thank the authors for their awesome work.
