This repository provides the official implementation of SDP (Set-Supervised Diffusion Policy: Learning Action-Chunking Diffusion through Corrections), a framework that trains diffusion policies using contrastive positive-negative action-chunk pairs, improving robustness to noisy data and data efficiency in interactive imitation learning. It includes baselines such as Diffusion Policy, Ambient Diffusion, Diffusion-DPO, CLIC, and IBC.
We support conda, poetry, docker, and apptainer. The code has been tested on Ubuntu 22.04 and 24.04.
You can use our conda environment YAML file to create the env:
cd Files/src/
conda env create -f environment.yml
# If issues occur, update:
conda env update -n conda-env-SDP -f environment.yml --prune
⚠️ The required Torch version may depend on your GPU. For newer GPUs (e.g., RTX 5070 Ti), adjust environment.yml.
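To see which Torch build ended up in the environment and whether it detects your GPU, a small check like the one below may help. This is a minimal sketch and not part of the repository: the helper name `describe_torch` is hypothetical, and it only reports what PyTorch itself exposes.

```python
"""Report the installed PyTorch build and whether it can see a CUDA GPU."""
import importlib.util


def describe_torch():
    """Return a small dict describing the local PyTorch/CUDA setup.

    Hypothetical helper for illustration; not part of the SDP code base.
    """
    # Avoid an ImportError when the environment has not been created yet.
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    info = {
        "installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        # Device 0 is assumed to be the GPU used for training.
        info["gpu_name"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

If `cuda_available` is false on a machine with a recent GPU, the Torch build pinned in environment.yml is a likely place to look first.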
The Poetry configuration is in Files/src/pyproject.toml. Create the environment with:
cd Files/src/
poetry install
After installation, activate it with:
cd Files/src/
poetry shell
Docker is used for real-robot experiments because it includes the required ROS 1 dependencies. Build the SDP image from the repository root:
cd Files
sudo docker build -t sdp-image -f dockerfile_SDP .
The Dockerfile uses zhaoting123/franka_robot_docker:v1 as the base image by default. If you already have the base image locally under another tag, override it with:
cd Files
sudo docker build \
--build-arg SDP_BASE_IMAGE=franka_robot_docker:v1 \
-t sdp-image \
-f dockerfile_SDP .
Run the SDP container with a bind mount so your local Files/src directory is available at /app inside the container:
cd Files
sudo docker run -it \
--gpus=all \
--net=host \
--env="NVIDIA_DRIVER_CAPABILITIES=all" \
--env="DISPLAY" \
--env="QT_X11_NO_MITSHM=1" \
--volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
--volume="$(pwd)/src:/app:rw" \
sdp-image bash
Pull the pre-built SDP Apptainer image. You could also build the image from the repository root:
cd Files
apptainer build --fakeroot SDP_nvidia.sif apptainer_SDP.def
Use the container with NVIDIA support enabled. The image already sets the working directory to /opt/sdp/src, so you can run the training entrypoints directly:
cd Files
apptainer exec --nv --bind "$PWD/src:/opt/sdp/src" SDP_nvidia.sif \
python /opt/sdp/src/main-receding_horizon.py \
--config-name train_CLIC_Diffusion_image_Ta8 \
hydra.run.dir='outputs/${experiment_id}' \
+GENERAL.render_savefig_flag=false
Run a simple 2D point-to-goal task:
python main-receding_horizon.py --config-name=train_Set_Supervised_Diffusion_low_dim_Ta8 +GENERAL.render_savefig_flag=true
| Path | Description |
|---|---|
| Files/src/main-receding_horizon.py | Simulation training/evaluation entrypoint |
| Files/src/main-real-robot.py | Real-robot entrypoint |
| Files/src/agents/ | Policy and baseline implementations |
| Files/src/env/ | Simulation and real-robot env wrappers |
| Files/src/tools/ | Buffers, feedback, helpers |
| Files/src/config/ | Hydra configs for simulation |
| Files/src/config_real/ | Hydra configs for real robot |
Run algorithms with:
python main-receding_horizon.py --config-path='config/exp_accurate_interactive' --config-name=train_Circular_Set_Supervised_Diffusion_image_Ta8
Config files for all algorithms are in Files/src/config/. An introduction to the file main-receding_horizon.py is provided in this document.
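When launching several runs from a script, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: `build_train_command` is a hypothetical helper, and the example override `+GENERAL.render_savefig_flag=false` is copied from the commands earlier in this README (whether it applies to every config is an assumption).

```python
"""Assemble the training command as an argument list (illustrative sketch)."""
import shlex
import sys


def build_train_command(config_path, config_name, overrides=()):
    """Return an argv list for main-receding_horizon.py.

    Hypothetical helper: it only concatenates the flags shown in this README
    and does not validate them against the Hydra configs.
    """
    return [
        sys.executable,
        "main-receding_horizon.py",
        f"--config-path={config_path}",
        f"--config-name={config_name}",
        *overrides,  # Hydra overrides such as key=value or +key=value
    ]


if __name__ == "__main__":
    cmd = build_train_command(
        "config/exp_accurate_interactive",
        "train_Circular_Set_Supervised_Diffusion_image_Ta8",
        overrides=["+GENERAL.render_savefig_flag=false"],
    )
    # Print instead of executing; to run it, pass `cmd` to
    # subprocess.run(cmd, cwd="Files/src", check=True).
    print(shlex.join(cmd))
```

Passing the command as a list avoids shell quoting issues, so the single quotes around the config path in the shell version are not needed here.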
Override the task directly in the command. Task configs live in Files/src/config/exp_accurate_interactive/task/, for example square_image_abs, pickcan_image_abs, pushT_abs, and twoArmLift_image_abs:
python main-receding_horizon.py \
--config-path='config/exp_accurate_interactive' \
--config-name=train_Circular_Set_Supervised_Diffusion_image_Ta8 \
task=pickcan_image_abs
To simulate noisy corrective feedback, enable Gaussian teacher noise with a Hydra override:
python main-receding_horizon.py \
--config-path='config/exp_accurate_interactive' \
--config-name=train_Circular_Set_Supervised_Diffusion_image_Ta8 \
task=pickcan_image_abs \
GENERAL.oracle_teacher_Gaussian_noise=true
You can also use the preset noisy-feedback configs under Files/src/config/exp_noisy_demo_interactive/:
python main-receding_horizon.py \
--config-path='config/exp_noisy_demo_interactive' \
--config-name=train_Circular_Set_Supervised_Diffusion_image_Ta8 \
task=pickcan_image_abs
Offline training expects a trajectory dataset in SDP HDF5 format. You can preview HDF5 datasets before downloading them with this Hugging Face Space. Released correction datasets are available from Hugging Face:
| Task | Dataset | Download |
|---|---|---|
| Square correction | Robosuite Square image absolute dataset with state | Download from Hugging Face |
| PickCan correction | Robosuite PickCan image absolute dataset with state | Download from Hugging Face |
To run the Square correction dataset, download trajectory_buffer_0.hdf5 from the Square correction Hugging Face link above and place it at:
Files/src/outputs/square_dataset_SDP/trajectory_buffer_0.hdf5
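Before inspecting or training on the file, it can be worth confirming that the download landed at the expected location and looks like an HDF5 file. The sketch below is an illustration only: `check_dataset` is a hypothetical helper, and it checks for the standard HDF5 signature at byte offset 0, which is a heuristic (the format also allows the signature at later offsets).

```python
"""Sanity-check the downloaded Square correction dataset (illustrative sketch)."""
from pathlib import Path

# Location relative to Files/src, as given in this README.
EXPECTED_RELATIVE_PATH = Path("outputs/square_dataset_SDP/trajectory_buffer_0.hdf5")

# Standard 8-byte HDF5 format signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def check_dataset(src_dir="."):
    """Return (ok, message) for the dataset expected under `src_dir`.

    Hypothetical helper; `src_dir` is assumed to be the Files/src directory.
    """
    path = Path(src_dir) / EXPECTED_RELATIVE_PATH
    if not path.is_file():
        return False, f"missing: {path}"

    # A failed download often leaves an HTML error page instead of HDF5 data.
    with path.open("rb") as handle:
        header = handle.read(len(HDF5_SIGNATURE))
    if header != HDF5_SIGNATURE:
        return False, f"not an HDF5 file (unexpected header): {path}"

    size_mb = path.stat().st_size / 1e6
    return True, f"found: {path} ({size_mb:.1f} MB)"


if __name__ == "__main__":
    # Run from Files/src so the relative path resolves correctly.
    print(check_dataset(".")[1])
```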
From Files/src, inspect the downloaded dataset with:
python script/visualize_traj_buffer_data_hdf5.py \
--buffer-path outputs/square_dataset_SDP/trajectory_buffer_0.hdf5 \
--img-key agentview_image
Then train a policy offline with the dataset path override:
python main-receding_horizon.py \
--config-path='config/exp_offline' \
--config-name=train_Circular_Set_Supervised_Diffusion_image_abs_Ta8 \
AGENT.buffer_dataset_path=outputs/square_dataset_SDP/trajectory_buffer_0.hdf5
For other datasets, replace AGENT.buffer_dataset_path with the local path to the downloaded SDP HDF5 file.
We use:
For Metaworld (old simulator version), ensure:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/<user>/.mujoco/mujoco210/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so
This section describes the recommended workflow for running real-robot experiments with the Franka manipulator.
Before running any experiment, please follow the step-by-step instructions in this document for conducting experiments with the Franka manipulator.
The real-robot learning pipeline follows an iterative workflow:
1. Collect offline demonstrations
↓
2. Offline training with SDP
↓
3. Deploy SDP with online IIL and collect a correction dataset
↓
4. Offline training with SDP using the updated dataset
↓
5. Deploy SDP with online IIL and collect corrections again
↓
6. Repeat steps 4-5 as needed
In summary, the workflow alternates between offline training and online interactive imitation learning (IIL). Each online IIL round produces additional correction data, which is then used to improve the policy in the next offline training stage.
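The alternation described above can be written down as a simple schedule. This is a sketch for planning purposes only: `workflow_stages` is a hypothetical helper that lists the stages in order and does not call any code from this repository.

```python
"""List the stages of the real-robot workflow in order (illustrative sketch)."""


def workflow_stages(num_iil_rounds):
    """Return stage descriptions for one demonstration phase followed by
    `num_iil_rounds` rounds of offline training and online IIL.

    Hypothetical helper; the stage names paraphrase the numbered workflow.
    """
    stages = ["collect offline demonstrations"]
    for round_idx in range(1, num_iil_rounds + 1):
        # Each round first trains on all data gathered so far...
        stages.append(f"round {round_idx}: offline training with SDP")
        # ...then deploys the policy and gathers new corrections.
        stages.append(f"round {round_idx}: deploy with online IIL and collect corrections")
    return stages


if __name__ == "__main__":
    # Two rounds correspond to steps 1 to 5 of the workflow.
    for step, stage in enumerate(workflow_stages(2), start=1):
        print(f"{step}. {stage}")
```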
Use the following command to collect offline demonstration data:
python main-real-robot.py \
--config-path='config_real' \
--config-name=train_Set_Supervised_Diffusion_image_Ta8 \
AGENT.offline_data_collection=true
The collected demonstrations will be saved as an offline dataset. This dataset is used for the first offline training stage.
The offline demonstration data must be converted into a correction dataset; see Files/src/script/hdf5_traj_process/README.md for the HDF5 trajectory processing workflow.
Before starting offline training, specify the dataset path in the configuration file using:
buffer_dataset_path: <path_to_dataset>
Then run:
python main-real-robot.py \
--config-path='config_real' \
--config-name=train_Set_Supervised_Diffusion_image_Ta8_offline
This stage trains an SDP policy from the collected offline demonstrations.
Before running online interactive imitation learning, specify the checkpoint path in the configuration file using:
load_dir: <path_to_checkpoint>
Then deploy the trained policy with online IIL:
python main-real-robot.py \
--config-path='config_real' \
--config-name=train_Set_Supervised_Diffusion_image_Ta8
During online IIL, the trained policy is deployed on the real robot. Human corrections are collected during execution and saved as a correction dataset.
This code builds on and adapts:
- CLIC
- Diffusion Policy
- DiffusionDPO
- IBC (MCMC sampling, baseline)
- Ambient Diffusion
- Robosuite
- Robomimic
- Metaworld
If you find this repository useful for your research, please consider citing the following paper:
@inproceedings{RSS2026_SDP,
title={Set-Supervised Diffusion Policy: Learning Action-Chunking Diffusion through Corrections},
author={Li, Zhaoting and Chen, Gang and Alonso-Mora, Javier and Della Santina, Cosimo and Kober, Jens},
booktitle={Robotics: Science and Systems 2026},
year={2026},
}
This project is licensed under the MIT License - see the LICENSE file for details.
