PPO Implementation Details - Checklist #53

Closed
9 of 13 tasks
herbiebradley opened this issue Oct 20, 2022 · 2 comments

herbiebradley commented Oct 20, 2022

The 37 Implementation Details of PPO, a blog post published at ICLR, describes a number of PPO implementation details that improve both efficiency and model performance. See also: Andrychowicz et al., Engstrom et al.

Some of these optimizations are minor and probably irrelevant, many are already implemented here, and some may provide performance boosts to trlx. This issue documents the details as a checklist, to track this repository's progress towards covering the full list.

  • 1. Vectorized Architecture - trlx already does this.
  • 2. Weight and bias initialization. Any layers initialized from scratch should use orthogonal initialization with gain sqrt(2) and biases of 0, with the policy network's last layer scaled by 0.01 after initialization (see the init sketch below).
  • 3. Adam optimizer initialization. Andrychowicz et al. recommend 1e-7 as the Adam epsilon (and actually find that the PyTorch default of 1e-8 is the worst of the choices tested).
  • 4. Optimizer weight decay and learning rate annealing. Currently the code does not appear to use the weight_decay: 1e-6 config value at all. It also uses cosine annealing instead of linear, and decays the learning rate not to 0 (as recommended by Andrychowicz et al.) but to 1.412e-4 by default. It may be worth testing linear annealing to 0 to see if it makes a difference (a combined optimizer/schedule sketch is included below).
  • 5. Generalized Advantage Estimation. Correctly implemented in trlx (a reference GAE sketch is included below).
  • 6. Mini-batch updates. In trlx this is being done in make_experience.
  • 7. Normalization of Advantages (at the mini-batch level). I believe this is already done, since whiten appears to be called at the mini-batch level (see the whitening sketch below).
  • 8. Clipped surrogate objective. Done in trlx.
  • 9. Value function loss clipping. Done in trlx (both clipped losses are shown in the sketch below).
  • 10. Overall loss and entropy bonus. Entropy is not used for regularization in trlx. OpenAI set the entropy coefficient to 0 for MuJoCo anyway, and Andrychowicz et al. find that entropy regularization does not help performance, so this may not be worth implementing.
  • 11. Global gradient clipping. The trlx grad_clip config option does not appear to be connected to anything. Andrychowicz et al. find a small performance boost from ensuring the global norm of the gradients of all parameters does not exceed 0.5 (see the clipping sketch below).
  • 12. KL approximation. Check that the unbiased estimator is being used (an example estimator is sketched below).
  • 13. Shared vs separate policy/value networks. Irrelevant in trlx due to the hydra heads implementation.

Other items in the blog post are specific to environments or network architectures that trlx does not target. Andrychowicz et al. also cover other hyperparameter choices not mentioned here which may be of interest.
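
Rough sketches for some of the items above follow, written as standalone PyTorch rather than trlx code. First, the initialization scheme from item 2; the helper name init_linear and the layer shapes are placeholders.

```python
import torch.nn as nn


def init_linear(layer: nn.Linear, gain: float = 2 ** 0.5) -> nn.Linear:
    """Orthogonal weight init with the given gain and zero bias."""
    nn.init.orthogonal_(layer.weight, gain=gain)
    nn.init.constant_(layer.bias, 0.0)
    return layer


# Hidden layers use gain sqrt(2); the policy head is scaled down to 0.01
# so the initial action distribution is close to uniform.
hidden = init_linear(nn.Linear(64, 64))
policy_head = init_linear(nn.Linear(64, 6), gain=0.01)
value_head = init_linear(nn.Linear(64, 1), gain=1.0)  # the blog uses gain 1.0 for the value head
```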
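
Items 3 and 4 could look roughly like the following; the model, learning rate, and step count are placeholders, and a LambdaLR linear schedule is just one way to anneal to 0.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # placeholder for the actual policy/value model
total_updates = 10_000   # placeholder for the real number of optimizer steps

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=3e-4,
    eps=1e-7,           # Andrychowicz et al.'s recommendation instead of the 1e-8 default
    weight_decay=1e-6,  # the value already present in the trlx config
)

# Linear anneal from the initial LR down to 0 over the course of training.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: max(0.0, 1.0 - step / total_updates)
)
```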
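
For comparison against the existing implementation, a textbook GAE loop (item 5) looks like this; tensor names and shapes are illustrative only.

```python
import torch


def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """GAE over a rollout of length T.

    rewards, dones: shape (T,); values: shape (T + 1,), where the extra
    entry is the bootstrap value for the state after the rollout.
    """
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    last_gae = 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        last_gae = delta + gamma * lam * not_done * last_gae
        advantages[t] = last_gae
    returns = advantages + values[:-1]
    return advantages, returns
```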
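
Item 7 amounts to something like the whitening helper below, applied per mini-batch rather than over the whole rollout buffer; the function name mirrors trlx's whiten, but the body here is a generic sketch, not the trlx implementation.

```python
import torch


def whiten(advantages: torch.Tensor, shift_mean: bool = True) -> torch.Tensor:
    """Normalize to unit variance and (optionally) zero mean."""
    mean, var = advantages.mean(), advantages.var()
    whitened = (advantages - mean) * torch.rsqrt(var + 1e-8)
    if not shift_mean:
        whitened += mean
    return whitened


# e.g. inside the PPO update loop, per mini-batch:
# mb_advantages = whiten(mb_advantages)
```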
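
Items 8 and 9 together are roughly the following loss computation; all argument names are placeholders and clip_range = 0.2 is just a common default.

```python
import torch


def ppo_losses(logprobs, old_logprobs, advantages,
               values, old_values, returns, clip_range=0.2):
    """Clipped surrogate policy loss (item 8) and clipped value loss (item 9)."""
    # Policy: take the pessimistic (max) of the unclipped and clipped surrogate losses.
    ratio = torch.exp(logprobs - old_logprobs)
    pg_loss1 = -advantages * ratio
    pg_loss2 = -advantages * torch.clamp(ratio, 1.0 - clip_range, 1.0 + clip_range)
    pg_loss = torch.max(pg_loss1, pg_loss2).mean()

    # Value: clip the new value prediction so it stays close to the old one.
    values_clipped = old_values + torch.clamp(values - old_values, -clip_range, clip_range)
    vf_loss1 = (values - returns) ** 2
    vf_loss2 = (values_clipped - returns) ** 2
    vf_loss = 0.5 * torch.max(vf_loss1, vf_loss2).mean()
    return pg_loss, vf_loss
```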
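
Wiring up the grad_clip option from item 11 would be a one-liner between the backward pass and the optimizer step; the model variable is a placeholder.

```python
import torch.nn as nn
from torch.nn.utils import clip_grad_norm_

model = nn.Linear(8, 2)  # placeholder for the actual policy/value model

# loss.backward()
clip_grad_norm_(model.parameters(), max_norm=0.5)  # 0.5 per Andrychowicz et al.
# optimizer.step()
```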
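
For item 12, what I believe is the estimator being referred to (the unbiased, low-variance k3 estimator from John Schulman's "Approximating KL Divergence" note) can be written as:

```python
import torch


def approx_kl(logprobs: torch.Tensor, old_logprobs: torch.Tensor) -> torch.Tensor:
    """Estimate KL(old || new) from samples drawn under the old policy."""
    logratio = logprobs - old_logprobs
    ratio = torch.exp(logratio)
    return ((ratio - 1) - logratio).mean()
```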

Dahoas (Collaborator) commented Oct 21, 2022

Thanks for this!

LouisCastricato (Contributor) commented

Closing
