BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning
Official codebase for the paper BiTrajDiff: Bidirectional Trajectory Generation with Diffusion Models for Offline Reinforcement Learning.
TLDR: This paper presents Bi-directional Trajectory Diffusion (BiTrajDiff), a novel data augmentation framework for offline reinforcement learning (RL) that improves trajectory connectivity and dataset diversity through bidirectional diffusion-based trajectory generation. Unlike prior single-direction augmentation methods, BiTrajDiff generates both forward and backward trajectory bridges between disconnected states, enabling more effective recovery of missing trajectory-level transitions in sparse and conservative offline datasets. By leveraging a dual diffusion process, the method synthesizes high-quality intermediate trajectories that better preserve behavioral consistency while enhancing long-horizon compositionality. Extensive experiments demonstrate that BiTrajDiff consistently improves the performance of multiple offline RL algorithms and outperforms existing state-of-the-art data augmentation and trajectory stitching baselines.
BiTrajDiff is built on the CleanDiffuser repo. To set up the dependencies for BiTrajDiff, follow the CleanDiffuser guidelines directly.
BiTrajDiff model training can be reproduced with:
python src/bitrajdiff_pipeline.py task=<env_name> mode=train_diffusion
More detailed hyperparameters are provided in the config directory.
After BiTrajDiff model training has finished, you can use the trained model to generate your own dataset for enhancing an offline RL algorithm:
python src/bitrajdiff_pipeline.py task=<env_name> mode=stitch
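The two commands above can be chained for a single task. The following is a minimal sketch, not part of the official codebase: the helper names `bitrajdiff_cmd` and `run_bitrajdiff` and the `DRY_RUN` variable are hypothetical, and the task name must be replaced with an environment name that actually exists in the config directory. By default it only prints the commands; set `DRY_RUN=0` to execute them from the repository root.

```shell
# Hypothetical helper (an assumption, not shipped with the repo):
# build the pipeline command for one task and one mode.
bitrajdiff_cmd() {
    # $1 = task name, $2 = mode (train_diffusion or stitch)
    printf 'python src/bitrajdiff_pipeline.py task=%s mode=%s\n' "$1" "$2"
}

# Run diffusion training first, then dataset generation (stitching).
run_bitrajdiff() {
    task="$1"
    for mode in train_diffusion stitch; do
        cmd="$(bitrajdiff_cmd "$task" "$mode")"
        echo "$cmd"
        # DRY_RUN defaults to 1, so nothing is executed unless requested.
        if [ "${DRY_RUN:-1}" = "0" ]; then
            # Stop before stitching if training fails.
            $cmd || return 1
        fi
    done
}
```

For example, `run_bitrajdiff <env_name>` prints both commands in order, and `DRY_RUN=0 run_bitrajdiff <env_name>` runs them, skipping the stitch step if training exits with an error.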
As illustrated in the experiment section of the original paper, we directly use the JAX-CORL repo without modification to run and evaluate the downstream offline RL algorithms.
If you find this work useful for your research, please cite our paper:
@article{qing2025bitrajdiff,
title={Bitrajdiff: Bidirectional trajectory generation with diffusion models for offline reinforcement learning},
author={Qing, Yunpeng and Chi, Yixiao and Chen, Shuo and Liu, Shunyu and Yao, Kelu and Lin, Sixu and Liu, Litao and Zou, Changqing},
journal={arXiv preprint arXiv:2506.05762},
year={2025}
}
Please feel free to contact me by email at qingyunpeng@zju.edu.cn if you are interested in our research :)
