CRAFT

Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies

A closed-loop post-training framework for autonomous driving policies.

Project Page · arXiv

Keyu Chen¹ · Nanfei Ye² · Yida Wang² · Wenchao Sun¹ · Danqi Zhao¹ · Hao Cheng¹ · Sifa Zheng¹
¹School of Vehicle and Mobility, Tsinghua University · ²Li Auto Inc

CRAFT overview

💫 CRAFT improves driving policies by combining dense counterfactual proxy supervision with residual correction from true closed-loop interaction.

If you find this repository useful, please consider giving it a star.

✨ News

  • 2026-05-07 Our paper is available on arXiv📄!
  • 2026-05-06 Explore our project page, now live here🔗!

📆 TODO List

  • Training & Eval Pipeline
  • Fine-Tuning Support (SparseDriveV2 / HiP-AD / MindDrive)
  • Visualization Utilities
  • Official fine-tuned checkpoints (Post-Acceptance Release)


🔨 Setup

Recommended system: Ubuntu 20.04 or 22.04

Step 1: Download CARLA 0.9.16

```shell
mkdir carla
cd carla
wget https://carla-releases.s3.us-east-005.backblazeb2.com/Linux/CARLA_0.9.16.tar.gz
tar -xvf CARLA_0.9.16.tar.gz
cd Import && wget https://carla-releases.s3.us-east-005.backblazeb2.com/Linux/AdditionalMaps_0.9.16.tar.gz
cd .. && bash ImportAssets.sh
```

Update your system's PYTHONPATH with the following paths:

```shell
export CARLA_ROOT=YOUR_CARLA_PATH
export PYTHONPATH=$PYTHONPATH:${CARLA_ROOT}/PythonAPI
export PYTHONPATH=$PYTHONPATH:${CARLA_ROOT}/PythonAPI/carla
```
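
If you prefer not to edit your shell profile, the same two entries can be derived and prepended to `sys.path` at runtime. This is an illustrative sketch, not part of the repository; `carla_pythonpath_entries` is a hypothetical helper that assumes only the standard CARLA `PythonAPI` layout shown above.

```python
import os
import sys

def carla_pythonpath_entries(carla_root: str) -> list:
    """The two PythonAPI directories that the exports above put on PYTHONPATH."""
    api = os.path.join(carla_root, "PythonAPI")
    return [api, os.path.join(api, "carla")]

# Prepend the entries at runtime instead of exporting PYTHONPATH in the shell.
entries = carla_pythonpath_entries(os.environ.get("CARLA_ROOT", "/opt/carla"))
sys.path[:0] = entries
```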

Step 2: Set up the conda environment and install CARLA

```shell
conda create -n Craft python=3.10
conda activate Craft
# install the CARLA Python API wheel
pip install YOUR_CARLA_PATH/PythonAPI/carla/dist/carla-0.9.16-cp310-cp310-manylinux_2_31_x86_64.whl
```
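
The `cp310` tag in the wheel filename must match the interpreter, which is why the conda environment above pins Python 3.10. A small sanity check, assuming only the standard wheel naming convention; `expected_cp_tag` is a hypothetical helper:

```python
import sys

def expected_cp_tag() -> str:
    """CPython tag the CARLA wheel filename must contain, e.g. 'cp310' for Python 3.10."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"

wheel = "carla-0.9.16-cp310-cp310-manylinux_2_31_x86_64.whl"
if expected_cp_tag() not in wheel:
    print(f"Interpreter tag is {expected_cp_tag()}, but this wheel is built for cp310.")
```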

Step 3: Clone this git repo in an appropriate folder

```shell
git clone git@github.com:CurryChen77/CraftPolicy.git
cd CraftPolicy
```

Step 4: Install packages

```shell
pip install -r requirements.txt
pip install -e .
```

Step 5: Install E2E packages

Follow the E2E installation guide to install E2E-related packages.

🎁 Data and Checkpoints

  • CARLA Map Data

Doc of HD Map

Please ensure the HD map and speed limit data are downloaded before starting fine-tuning.

| Name | Google Drive | Approx. Size | Storage Place |
| --- | --- | --- | --- |
| HD Map Data | Link | 714 MB | Folder |
| Speed Limits Data | Link | 79 MB | Folder |
  • AV Checkpoints

| AV Name | Google Drive | Approx. Size | Storage Place |
| --- | --- | --- | --- |
| PlanT-V2 | Link | 732 MB | Folder |

For E2E AV checkpoints, see the E2E installation guide.

  • CBV Checkpoints

| CBV Name | Google Drive | Approx. Size | Storage Place |
| --- | --- | --- | --- |
| Pluto | Link | 51.4 MB | Folder |
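
Before launching fine-tuning, it can save a failed run to verify that the downloaded data and checkpoints are actually in place. This is an illustrative sketch, not part of the repository; the entry names below are hypothetical placeholders for the storage places referenced in the tables above.

```python
from pathlib import Path

# Hypothetical entry names; substitute the actual storage places
# listed in the tables above.
REQUIRED = ["hd_map", "speed_limits", "model_ckpt"]

def missing_entries(root: str) -> list:
    """Return the required entries that do not exist under `root`."""
    return [name for name in REQUIRED if not (Path(root) / name).exists()]

print(missing_entries("."))  # anything printed here still needs downloading
```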

🔥 Usage

The main workflows are launched from scripts/. For most experiments, start with the two high-level launchers below:

  • scripts/run_train_and_eval.sh: collect rollouts, train one fine-tuner, then evaluate it.
  • scripts/run_multi_finetuners.sh: run several fine-tuners sequentially with the same settings.

A typical multi-GPU run uses one CARLA worker per collector/evaluator GPU and Lightning DDP for the trainer GPUs. In the examples below, GPUs 0,1,2,3 run CARLA workers and GPUs 4,5,6,7 run training.

Training / Evaluation

Train and evaluate one fine-tuner:

```shell
bash scripts/run_train_and_eval.sh \
  -e sparsedrive_scorer \
  -c scenario \
  -f craft \
  -p bench2drive220 \
  -s 0 \
  -t 1 \
  -r 1 \
  -g 0,1,2,3 \
  -u 4,5,6,7 \
  -S \
  -V
```

Train and evaluate multiple fine-tuners sequentially:

```shell
bash scripts/run_multi_finetuners.sh \
  -e sparsedrive_scorer \
  -c scenario \
  -f craft,grpo,ppo,reinforce_plus \
  -p bench2drive220 \
  -s 0 \
  -t 1 \
  -r 1 \
  -g 0,1,2,3 \
  -u 4,5,6,7 \
  -S
```

Recommended launcher flags:

| Flag | Meaning |
| --- | --- |
| `-e` | Ego policy config name without `.yaml`, such as `sparsedrive_scorer`, `sparsedrive`, `hip_ad`, `pdm_lite`, `uniad`, or `vad`. |
| `-c` | CBV / traffic policy config name without `.yaml`, such as `scenario`, `autopilot`, or `pluto`. |
| `-f` | Fine-tuner name. Supported values include `craft`, `grpo`, `ppo`, `reinforce_plus`, `rift`, and `soft_align`. `run_multi_finetuners.sh` accepts a comma-separated list. |
| `-p` | Route set under `rift/scenario/route`; pass either `bench2drive220` or `bench2drive220.xml`. |
| `-s` | Random seed. |
| `-t` | Training repetitions for the checkpoint/output tag. |
| `-r` | Evaluation runs used after training. |
| `-g` | Collector GPU list. One CARLA worker is launched per listed GPU. |
| `-u` | Trainer GPU list. Multiple trainer GPUs automatically use Lightning DDP. |
| `-S` | Start from scratch: clear the matching logs, rollout buffers, and checkpoints before running. |
| `-V` | Record videos during the evaluation stage. |

In run_train_and_eval.sh, the evaluation stage uses the union of the -g and -u GPU lists as evaluator GPUs. For lower-level collection, training, evaluation-only runs, pretrained-checkpoint evaluation, and output path conventions, see the detailed usage guide.
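
The union rule above can be illustrated with a small order-preserving helper. This is a sketch of the behavior, not the launcher's actual implementation; `parse_gpus` and `evaluator_gpus` are hypothetical names.

```python
def parse_gpus(spec: str) -> list:
    """Parse a comma-separated GPU list such as '0,1,2,3'."""
    return [int(g) for g in spec.split(",") if g]

def evaluator_gpus(collector_spec: str, trainer_spec: str) -> list:
    """Order-preserving union of the -g (collector) and -u (trainer) lists."""
    union = []
    for gpu in parse_gpus(collector_spec) + parse_gpus(trainer_spec):
        if gpu not in union:
            union.append(gpu)
    return union

print(evaluator_gpus("0,1,2,3", "4,5,6,7"))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

With disjoint lists, as in the examples above, the evaluation stage simply uses all eight GPUs; overlapping lists are deduplicated.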

Common outputs:

| Output | Path pattern |
| --- | --- |
| Rollout buffers | `data/rollout_data/<ego>-<cbv>-seed<seed>/<route>x<train_repetitions>-<finetuner>_finetuner` |
| Collection logs | `log/collect_data/<ego>-<cbv>-seed<seed>/<route>x<train_repetitions>-<finetuner>_finetuner` |
| Training logs | `log/train_ego/<ego>-<cbv>-seed<seed>/<route>x<train_repetitions>-<finetuner>_finetuner` |
| Fine-tuned checkpoints | `rift/ego/model_ckpt/<ego>/<route>x<train_repetitions>-<finetuner>_finetuner-<cbv>-seed<seed>` |
| Evaluation logs | `log/eval/<ego>-<cbv>-seed<seed>/<route>x<train_repetitions>-<finetuner>_finetuner` for the default `eval_runs=1` same-route case. For `eval_runs>1` or cross-route evaluation, the suffix becomes `<eval_route_tag>-train_<train_route>x<train_repetitions>-<finetuner>_finetuner`. |
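
For scripting around these outputs, the patterns can be formatted programmatically. A minimal sketch for the rollout-buffer pattern, using the flags from the single-fine-tuner example above; `rollout_buffer_dir` is a hypothetical helper, not a function shipped with the repository.

```python
def rollout_buffer_dir(ego, cbv, seed, route, train_repetitions, finetuner):
    """Format the rollout-buffer path pattern from the table above."""
    return (f"data/rollout_data/{ego}-{cbv}-seed{seed}/"
            f"{route}x{train_repetitions}-{finetuner}_finetuner")

print(rollout_buffer_dir("sparsedrive_scorer", "scenario", 0,
                         "bench2drive220", 1, "craft"))
# data/rollout_data/sparsedrive_scorer-scenario-seed0/bench2drive220x1-craft_finetuner
```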

Visualization

Plotting scripts are under tools/plot. The notebook provides an executable gallery for the main figures below.

Aggregate ego evaluation metrics and export a CSV:

```shell
python tools/plot/plot_ego_eval_result.py
```

Render the scaling-and-reward combo figure. This command expects assets/results_ego.csv, collection logs, and matching evaluation logs:

```shell
python tools/plot/plot_scaling_reward_combo.py
```

Render policy distribution figures from a rollout snapshot and available pretrained / fine-tuned checkpoints:

```shell
python tools/plot/plot_policy_distribution.py
```

Render training-stability figures from offline W&B logs:

```shell
python tools/plot/plot_train_stability.py
```

🔖 Citation

If you find our paper useful, please cite it via:

```bibtex
@misc{chen2026craft,
      title={CRAFT: Counterfactual-to-Interactive Reinforcement Fine-Tuning for Driving Policies},
      author={Keyu Chen and Nanfei Ye and Yida Wang and Wenchao Sun and Danqi Zhao and Hao Cheng and Sifa Zheng},
      year={2026},
      eprint={2605.04470},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2605.04470},
}
```

❤️ Acknowledgements

This implementation is based on code from several repositories. We sincerely thank the authors for their awesome work.
