Wanpeng Zhang1,3, Hao Luo1,3, Sipeng Zheng3, Yicheng Feng1,3, Haiweng Xu1,3,
Ziheng Xi2,3, Chaoyi Xu1,3, Haoqi Yuan1,3, Zongqing Lu1,3,†
1Peking University
2Tsinghua University
3BeingBeyond
PTR (Posterior-Transition Reweighting) is a reward-free, conservative post-training method for robot policies that uses post-action consequences to decide which logged samples deserve a larger gradient budget. For each sample, PTR encodes the observed post-action consequence as a latent target, inserts it into a pool of mismatched alternatives, and asks whether the matched future can be identified from the current context and the logged action chunk. The resulting posterior-to-uniform ratio defines the PTR score, which is converted into a clipped-and-mixed weight and applied to the original supervised action objective via self-normalized weighted regression.
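The scoring-and-reweighting pipeline described above can be sketched as follows. This is a minimal illustration, not the released implementation: the function names, the softmax-over-similarities posterior, and the specific `clip_max`/`mix` hyperparameters are all assumptions made for the example.

```python
import numpy as np

def ptr_score(sim_to_candidates, matched_idx):
    """Posterior-to-uniform ratio for one sample (illustrative).

    sim_to_candidates: (K,) similarity logits between the (context, action
    chunk) pair and K candidate future latents (1 matched, K-1 mismatched).
    """
    K = sim_to_candidates.shape[0]
    logits = sim_to_candidates - sim_to_candidates.max()  # numerical stability
    posterior = np.exp(logits) / np.exp(logits).sum()     # P(candidate | context, action)
    return K * posterior[matched_idx]                     # posterior / uniform (1/K)

def ptr_weighted_loss(per_sample_losses, scores, clip_max=2.0, mix=0.5):
    """Clipped-and-mixed weights applied via self-normalized weighted regression."""
    clipped = np.clip(scores, 0.0, clip_max)   # cap the influence of any one sample
    weights = (1.0 - mix) + mix * clipped      # mix with a uniform baseline weight
    weights = weights / weights.sum()          # self-normalization across the batch
    return float((weights * per_sample_losses).sum())
```

When every sample's score is 1 (i.e. the posterior matches the uniform prior), the weights collapse to uniform and the objective reduces to the ordinary mean supervised loss, which is the conservative baseline behavior.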
- [2026-03-17]: We released PTR! Check out our paper here. 🔥🔥🔥
If you find our work useful, please consider citing:
```bibtex
@article{zhang2026conservative,
  title={Conservative Offline Robot Policy Learning via Posterior-Transition Reweighting},
  author={Zhang, Wanpeng and Luo, Hao and Zheng, Sipeng and Feng, Yicheng and Xu, Haiweng and Xi, Ziheng and Xu, Chaoyi and Yuan, Haoqi and Lu, Zongqing},
  journal={arXiv preprint arXiv:2603.16542},
  year={2026}
}
```