This is the source code for the paper "Beyond Reward: Offline Preference-guided Policy Optimization" (OPPO).
The main code is in the `oppo` folder.
It contains two parts:
- `scripted` contains code to reproduce results using preferences generated by a "scripted teacher".
- `human` contains code to train and evaluate OPPO using human-labeled preferences, which come from Preference Transformer; please refer to their codebase for further details and consider citing their paper if needed.
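As background, a "scripted teacher" typically labels preferences synthetically by comparing the ground-truth returns of two trajectory segments. A minimal sketch of that scheme (hypothetical helper name, not this repository's actual API):

```python
import numpy as np

def scripted_preference(seg_a_rewards, seg_b_rewards):
    """Label a preference between two trajectory segments by comparing
    their ground-truth returns (a common scripted-teacher scheme;
    hypothetical helper, not this repo's implementation)."""
    ret_a = float(np.sum(seg_a_rewards))
    ret_b = float(np.sum(seg_b_rewards))
    if ret_a > ret_b:
        return 0    # segment A preferred
    if ret_b > ret_a:
        return 1    # segment B preferred
    return 0.5      # tie: segments equally preferred

# Example: the segment with the higher cumulative reward wins.
print(scripted_preference([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))  # -> 0
```

This removes the need for human annotators when the offline dataset already carries reward signals.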
If you find our work useful, please cite:

```bibtex
@misc{kang2023reward,
    title={Beyond Reward: Offline Preference-guided Policy Optimization},
    author={Yachen Kang and Diyuan Shi and Jinxin Liu and Li He and Donglin Wang},
    year={2023},
    eprint={2305.16217},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```
Our code is largely based on Decision Transformer.
Human labels were obtained thanks to Preference Transformer.
Our experiments largely used the D4RL dataset.
The Lift and Can environments come from the Robomimic and Robosuite projects.