The 37 Implementation Details of PPO, a blog post published at ICLR, details a number of PPO implementation details that improve both efficiency and model performance. See also: Andrychowicz et al., Engstrom et al.

Some of these optimizations are minor and probably irrelevant, many are already implemented here, and some may provide performance boosts to trlx. This issue documents these details as a checklist, to track this repository's progress through the list.
1. Vectorized Architecture - trlx already does this.
2. Weight and Bias Initialization. Any layers initialized from scratch should use orthogonal initialization with gain `sqrt(2)` and biases of 0, with the policy network's last layer scaled by `0.01` after init.
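A minimal sketch of this initialization scheme in PyTorch (the layer sizes and the `init_layer` helper are illustrative, not trlx's actual code):

```python
import math
import torch
import torch.nn as nn

def init_layer(layer: nn.Linear, gain: float = math.sqrt(2)) -> nn.Linear:
    """Orthogonal weight initialization with the given gain; biases zeroed."""
    nn.init.orthogonal_(layer.weight, gain=gain)
    nn.init.constant_(layer.bias, 0.0)
    return layer

# Hypothetical policy head: hidden layers use gain sqrt(2),
# the final policy layer is scaled down with gain 0.01.
hidden = init_layer(nn.Linear(64, 64))
policy_out = init_layer(nn.Linear(64, 8), gain=0.01)
```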
3. Adam Optimizer initialization. Andrychowicz et al. recommend 1e-7 as Adam epsilon (and actually find that the PyTorch default of 1e-8 is the worst of the choices tested).
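Passing the recommended epsilon is a one-line change when constructing the optimizer (the model and learning rate here are placeholders):

```python
import torch

model = torch.nn.Linear(4, 2)
# eps=1e-7 per Andrychowicz et al.; PyTorch's Adam default is 1e-8.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, eps=1e-7)
```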
4. Optimizer Weight Decay. Currently the code does not appear to use the config value `weight_decay: 1e-6` at all. It also uses cosine annealing instead of linear decay, and decays not to 0 (as recommended by Andrychowicz et al.) but to `1.412e-4` by default. Linear decay may be worth testing to see if it makes a difference.
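A sketch of wiring up both pieces, assuming a plain PyTorch setup (the model, learning rate, and step count are illustrative): the config's weight decay is passed to the optimizer, and the learning rate decays linearly to 0 rather than annealing to a small floor.

```python
import torch

model = torch.nn.Linear(4, 2)
# Apply the weight_decay value from the config.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-6)

total_steps = 1000
# Linear decay from the initial lr down to exactly 0 over training.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: max(0.0, 1.0 - step / total_steps)
)
```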
5. Generalized Advantage Estimation. Correctly implemented in trlx.
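For reference, the standard GAE recursion looks roughly like this (a NumPy sketch of the textbook algorithm, not trlx's implementation):

```python
import numpy as np

def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    `values` has one entry per step; `last_value` bootstraps the final state.
    """
    advantages = np.zeros_like(rewards, dtype=np.float64)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, then the exponentially weighted backward sum.
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

With `gamma=1, lam=1` this reduces to advantage = (sum of future rewards) - value, a useful sanity check.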
6. Mini-batch updates. In trlx this is being done in `make_experience`.
7. Normalization of Advantages (at the mini-batch level). This appears to be done already, since `whiten` seems to be called at the mini-batch level.
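The whitening operation itself is just per-mini-batch standardization; a simplified NumPy sketch (trlx's `whiten` helper may differ in details such as a `shift_mean` option):

```python
import numpy as np

def whiten(advantages: np.ndarray) -> np.ndarray:
    """Normalize advantages to zero mean and unit variance within a mini-batch."""
    return (advantages - advantages.mean()) / (advantages.std() + 1e-8)
```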
8. Clipped surrogate objective. Done in trlx.
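For reference, the clipped surrogate objective from the PPO paper, as a NumPy sketch (the function name and signature are illustrative):

```python
import numpy as np

def clipped_surrogate_loss(logprobs, old_logprobs, advantages, clip=0.2):
    """PPO clipped surrogate objective, returned as a loss to minimize."""
    ratio = np.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip) * advantages
    # Pessimistic bound: take the minimum of the two objectives.
    return -np.minimum(unclipped, clipped).mean()
```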
9. Value function loss clipping. Done in trlx.
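The value-clipping counterpart, again as an illustrative NumPy sketch rather than trlx's exact code:

```python
import numpy as np

def clipped_value_loss(values, old_values, returns, clip=0.2):
    """Value loss with PPO-style clipping of the value update."""
    clipped = old_values + np.clip(values - old_values, -clip, clip)
    # Pessimistic bound: take the larger of the clipped/unclipped errors.
    loss = np.maximum((values - returns) ** 2, (clipped - returns) ** 2)
    return 0.5 * loss.mean()
```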
10. Overall loss and entropy bonus. Entropy is not used for regularization in trlx. OAI set the entropy coefficient to 0 for MuJoCo anyway, and Andrychowicz et al. find that entropy regularization does not help performance, so this may not be worth implementing.
11. Global gradient clipping. The trlx `grad_clip` config option does not appear to be connected to anything. Andrychowicz et al. find a small performance boost from ensuring the global norm of the gradients over all parameters does not exceed `0.5`.
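Wiring the config option up would amount to one call between the backward pass and the optimizer step, e.g. with PyTorch's built-in utility (the model and loss here are placeholders):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 1)
loss = model(torch.randn(4, 8)).sum()
loss.backward()
# Rescale all gradients so their global L2 norm does not exceed 0.5.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
```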
13. Shared vs separate policy/value networks. Irrelevant in trlx due to the hydra heads implementation.
Other items in the blog post are specific to environments or network architectures that trlx does not use. Andrychowicz et al. also contains other hyperparameter choices not mentioned here which may be of interest.