PTR: Conservative Offline Robot Policy Learning via Posterior-Transition Reweighting

Wanpeng Zhang1,3, Hao Luo1,3, Sipeng Zheng3, Yicheng Feng1,3, Haiweng Xu1,3,
Ziheng Xi2,3, Chaoyi Xu1,3, Haoqi Yuan1,3, Zongqing Lu1,3,†

1Peking University    2Tsinghua University    3BeingBeyond

Website arXiv

[Figure: PTR framework overview]

PTR is a reward-free and conservative post-training method for robot policies that uses post-action consequences to decide which logged samples deserve more gradient budget. For each sample, PTR encodes the observed post-action consequence as a latent target, inserts it into a pool of mismatched alternatives, and asks whether the matched future can be identified from the current context and the logged action chunk. The resulting posterior-to-uniform ratio defines the PTR score, which is converted into a clipped-and-mixed weight and applied to the original supervised action objective through self-normalized weighted regression.
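The weighting step described above can be sketched in a few lines. This is a minimal NumPy illustration, not the released implementation: it assumes a pool of `K` candidate latent futures per sample (the matched one at index 0), identification logits from some learned matcher, and hypothetical `clip_max` / `mix` hyperparameters for the clipped-and-mixed weight.

```python
import numpy as np

def ptr_weights(logits, pool_size, clip_max=2.0, mix=0.1):
    """Sketch of PTR sample weighting (hypothetical hyperparameters).

    logits: (B, K) matching scores between each sample's (context,
            action chunk) and K candidate latent futures; column 0
            holds the matched (ground-truth) future.
    Returns per-sample weights, self-normalized to mean 1.
    """
    # Posterior over the candidate pool via a numerically stable softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # PTR score: posterior-to-uniform ratio for the matched future,
    # i.e. p_match / (1 / K).
    score = p[:, 0] * pool_size
    # Conservative clipped-and-mixed weight: cap large ratios, then
    # mix in a uniform floor so no sample's gradient vanishes.
    w = (1.0 - mix) * np.clip(score, 0.0, clip_max) + mix
    # Self-normalize so the batch-average weight is 1.
    return w * len(w) / w.sum()
```

The resulting weights would multiply the per-sample terms of the original supervised action objective, e.g. `loss = (ptr_weights(logits, K) * per_sample_bc_loss).mean()`.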

News

  • [2026-03-17]: We publish PTR! Check our paper here. 🔥🔥🔥

Citation

If you find our work useful, please consider citing:

@article{zhang2026conservative,
  title={Conservative Offline Robot Policy Learning via Posterior-Transition Reweighting},
  author={Zhang, Wanpeng and Luo, Hao and Zheng, Sipeng and Feng, Yicheng and Xu, Haiweng and Xi, Ziheng and Xu, Chaoyi and Yuan, Haoqi and Lu, Zongqing},
  journal={arXiv preprint arXiv:2603.16542},
  year={2026}
}
