Refine

Diffusion behavioral cloning + DSRL-style RL fine-tuning on Robomimic.

A diffusion policy is first trained with behavioral cloning (BC) on demonstrations, then refined by SAC operating in the policy's noise space (DSRL, Diffusion Steering via Reinforcement Learning). The frozen diffusion model maps each SAC noise vector w to environment actions via DDIM sampling, so RL learns to steer an already-good prior instead of learning control from scratch.
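To make the noise-to-action mapping concrete, here is a minimal NumPy sketch of deterministic DDIM sampling (eta = 0): the same w always denoises to the same action chunk, which is what lets SAC treat w as its action. The linear beta schedule, the toy `toy_eps_model`, and the function names are illustrative assumptions, not this repository's code.

```python
import numpy as np

def make_alpha_bars(num_train_steps=100, beta_start=1e-4, beta_end=0.02):
    # Assumed linear beta schedule; the repository's schedule may differ.
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)

def ddim_sample(eps_model, obs, w, alpha_bars, ddim_steps=10):
    """Deterministically denoise the initial noise w into an action chunk."""
    # Evenly spaced subset of the training timesteps, most noisy first.
    timesteps = np.linspace(len(alpha_bars) - 1, 0, ddim_steps).round().astype(int)
    x = np.asarray(w, dtype=np.float64)
    for i, t in enumerate(timesteps):
        a_t = alpha_bars[t]
        # After the last step we land on the clean sample (alpha_bar = 1).
        a_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = eps_model(x, t, obs)
        # Predict the clean sample, then move to the previous noise level.
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x

def toy_eps_model(x, t, obs):
    # Hypothetical stand-in for the trained MLP epsilon predictor.
    return 0.5 * x + 0.1 * np.mean(obs)

alpha_bars = make_alpha_bars()
obs = np.zeros(3)
w = np.ones(8)  # e.g. pred_horizon=4 steps of a 2-D action, flattened
action_chunk = ddim_sample(toy_eps_model, obs, w, alpha_bars)
```

Because no fresh noise is injected during sampling, the only source of variation in the output is w itself.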

Results (Robomimic, low-dim, proficient-human)

Diffusion BC alone:

Task	Success rate
Lift	98%
Can	96%
Square	82%

Evaluated over 20 episodes per task with DDIM (10 steps), pred_horizon=4.
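Twenty episodes give a fairly coarse estimate of a success rate. As a rough guide to the uncertainty, the sketch below computes a Wilson score interval for a given number of successes; the function name and the choice of interval are illustrative and not part of evaluate.py.

```python
import math

def wilson_interval(successes, episodes, z=1.96):
    """Approximate 95% Wilson score interval for a binomial success rate."""
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    p_hat = successes / episodes
    denom = 1.0 + z * z / episodes
    center = (p_hat + z * z / (2 * episodes)) / denom
    half = z * math.sqrt(
        p_hat * (1.0 - p_hat) / episodes + z * z / (4 * episodes ** 2)
    ) / denom
    # Clamp to the valid probability range.
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical example: 18 successes out of 20 episodes.
low, high = wilson_interval(successes=18, episodes=20)
print(f"18/20 successes, interval: {low:.2f} to {high:.2f}")
```

Raising --n_episodes narrows the interval, at the cost of longer evaluation runs.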

Sample rollouts in videos/:

Lift	Can	Square
ep01 / ep03	ep01 / ep02 / ep03	ep01 / ep02 / ep03

Layout

config/         per-task BC + DSRL hyperparameters
data/           Robomimic hdf5 loader and min-max normalizer
model/          diffusion policy (MLP epsilon predictor, EMA, DDIM sampler)
wrappers/       DSRLEnvWrapper - exposes noise space to SB3 SAC
train_bc.py     stage 1: BC pretraining of the diffusion policy
train_dsrl.py   stage 2: SAC in noise space over the frozen diffusion policy
evaluate.py     roll out the BC diffusion policy
record_eval.py  same, with video recording
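As an illustration of what the min-max normalizer in data/ typically does, here is a small sketch that scales each feature dimension to [-1, 1] and back. The class name, the target range, and the epsilon guard are assumptions, not the repository's actual implementation.

```python
import numpy as np

class MinMaxNormalizer:
    """Scale each feature dimension to [-1, 1] using its min and max."""

    def __init__(self, data, eps=1e-8):
        data = np.asarray(data, dtype=np.float64)
        self.min = data.min(axis=0)
        self.max = data.max(axis=0)
        # Guard against constant dimensions, where max == min.
        self.scale = np.maximum(self.max - self.min, eps)

    def normalize(self, x):
        return 2.0 * (np.asarray(x, dtype=np.float64) - self.min) / self.scale - 1.0

    def unnormalize(self, x):
        return (np.asarray(x, dtype=np.float64) + 1.0) / 2.0 * self.scale + self.min

# Hypothetical action data: two samples, two dimensions.
actions = np.array([[0.0, 10.0], [2.0, 30.0]])
normalizer = MinMaxNormalizer(actions)
normalized = normalizer.normalize(actions)
restored = normalizer.unnormalize(normalized)
```

Keeping one normalizer for observations and one for actions matches the "both normalizers" stored in the checkpoints.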

Stage 1 - BC pretraining

python train_bc.py \
    --data_path datasets/lift/ph/low_dim_v141.hdf5 \
    --save_dir checkpoints/bc/lift \
    --task lift \
    --pred_horizon 4 \
    --diffusion_steps 100 \
    --hidden_dims 1024 1024 1024 \
    --epochs 3000

Per-task configs live in config/bc_{lift,can,square}.yaml. Checkpoints store the model, optimizer, EMA, both normalizers, observation keys, and the full arg dict so downstream stages can reconstruct everything.

Evaluate:

python evaluate.py --checkpoint checkpoints/bc/lift/state_3000.pt --n_episodes 20 --ddim_steps 10

Stage 2 - DSRL fine-tuning

SAC's action becomes the diffusion noise vector w of shape pred_horizon * action_per_step. Each wrapper step denoises w into a chunk of pred_horizon environment actions and executes them open-loop, without re-planning inside the chunk. The replay buffer is warm-started by rolling out the BC policy with w ~ N(0, I).
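The wrapper step described above can be sketched as follows. This is a simplified stand-in, not the repository's DSRLEnvWrapper: the class and argument names are hypothetical, the inner environment is assumed to follow the Gymnasium reset/step convention, rewards are assumed to be summed over the chunk, and action_magnitude is assumed to bound each noise coordinate.

```python
import numpy as np

class NoiseSpaceEnv:
    """Treats the diffusion noise w as the action of an outer RL problem."""

    def __init__(self, env, denoise_fn, pred_horizon, action_dim, action_magnitude=1.0):
        self.env = env                # assumed Gymnasium-style reset()/step()
        self.denoise_fn = denoise_fn  # (obs, w) -> array (pred_horizon, action_dim)
        self.pred_horizon = pred_horizon
        self.action_dim = action_dim
        self.action_magnitude = action_magnitude
        self.obs = None

    def reset(self):
        self.obs, info = self.env.reset()
        return self.obs, info

    def step(self, w):
        # Assumption: action_magnitude bounds each coordinate of w.
        w = np.clip(np.asarray(w, dtype=np.float64),
                    -self.action_magnitude, self.action_magnitude)
        w = w.reshape(self.pred_horizon, self.action_dim)
        actions = self.denoise_fn(self.obs, w)
        total_reward, terminated, truncated, info = 0.0, False, False, {}
        for action in actions:  # open-loop: no re-planning inside the chunk
            self.obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward  # assumption: chunk reward is the plain sum
            if terminated or truncated:
                break
        return self.obs, total_reward, terminated, truncated, info

class ToyEnv:
    """Stand-in environment: reward 1 per step, episode ends after 6 steps."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(2), {}

    def step(self, action):
        self.t += 1
        return np.full(2, float(self.t)), 1.0, self.t >= 6, False, {}

# Identity "denoiser" for illustration; a real one would run DDIM on w.
wrapped = NoiseSpaceEnv(ToyEnv(), lambda obs, w: w, pred_horizon=4, action_dim=3)
wrapped.reset()
obs, reward, terminated, truncated, info = wrapped.step(np.zeros(12))
```

One outer step therefore advances the underlying environment by up to pred_horizon steps, so SAC sees a shorter effective horizon than the raw task.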

python train_dsrl.py \
    --bc_checkpoint checkpoints/bc/lift/state_3000.pt \
    --task lift \
    --action_magnitude 1.0 \
    --ddim_steps 10 \
    --initial_steps 320000 \
    --train_steps 100000 \
    --utd 20

Best checkpoint by online success rate is written to checkpoints/dsrl/<task>/sac_best.zip.

Requirements

PyTorch (CUDA recommended)
robosuite, robomimic
stable-baselines3, gymnasium
h5py, numpy

Datasets: standard Robomimic low_dim_v141.hdf5 files under datasets/<task>/ph/.
