What does the hyperparameter "normalize" refer to in PPO? #64
Comments
Yes, this parameter refers to wrapping the environment with VecNormalize. PS: Thanks for the kind words, which make the return from vacation easier :)
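For readers unfamiliar with what that wrapper does, here is a minimal pure-Python sketch of the running-mean/variance observation normalization that VecNormalize performs. This is a simplification for illustration, not the library's actual code: the real wrapper works on NumPy arrays, also normalizes rewards by a running estimate of discounted-return variance, and clips both.

```python
import math

class RunningMeanStd:
    """Track a running mean and variance of a stream of values,
    updated batch-by-batch (as VecNormalize does per env step)."""
    def __init__(self):
        self.mean = 0.0
        self.var = 1.0
        self.count = 1e-4  # small epsilon so the first update is well-defined

    def update(self, batch):
        batch_mean = sum(batch) / len(batch)
        batch_var = sum((v - batch_mean) ** 2 for v in batch) / len(batch)
        delta = batch_mean - self.mean
        total = self.count + len(batch)
        # Parallel-variance combination of the old stats and the new batch
        m2 = (self.var * self.count
              + batch_var * len(batch)
              + delta ** 2 * self.count * len(batch) / total)
        self.mean = self.mean + delta * len(batch) / total
        self.var = m2 / total
        self.count = total

def normalize_obs(obs, rms, eps=1e-8, clip=10.0):
    """Scale a scalar observation to roughly zero mean / unit variance, then clip."""
    z = (obs - rms.mean) / math.sqrt(rms.var + eps)
    return max(-clip, min(clip, z))
```

In use, each batch of observations coming out of the vectorized env first updates the running stats and is then normalized before the agent sees it; this is why evaluating a trained agent without the saved normalization statistics gives different results.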
Yes, those are parameters to VecNormalize. A PR that documents this value in the config would be appreciated ;)
Awesome, thanks. So just to be sure I got this right: all algos, not just PPO, use the same VecNormalize defaults? Or it looks like some of those parameters are overwritten, e.g. gamma for the normalization is set to whatever gamma the agent is using? Happy to prep a PR, and sorry for all the questions; my earlier attempts to eyeball the source definitely misread this.
Yes. gamma is the only one we override automatically, for correctness (and only if present in the hyperparameters). No problem ;)
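The override described above can be sketched in a few lines. Note this is an illustrative reconstruction, not the zoo's actual code: the function name prepare_normalize_kwargs is hypothetical, and only the behavior stated in the comment (gamma is copied into the VecNormalize kwargs when, and only when, it appears in the hyperparameters) is taken from the thread.

```python
# Hypothetical helper illustrating the gamma override: if the agent's
# hyperparameters define gamma, reuse it as VecNormalize's discount so the
# reward-normalization statistics match the agent's return definition.
def prepare_normalize_kwargs(hyperparams):
    normalize = hyperparams.get("normalize", False)
    if normalize is False:
        return None  # no VecNormalize wrapping requested
    # "normalize: true" means wrap with default kwargs;
    # a dict means the config specified explicit VecNormalize kwargs.
    kwargs = dict(normalize) if isinstance(normalize, dict) else {}
    if "gamma" in hyperparams:  # overridden only if present
        kwargs["gamma"] = hyperparams["gamma"]
    return kwargs
```

With `normalize: true` and `gamma: 0.98` in the config, the wrapper would therefore be built with `gamma=0.98` rather than VecNormalize's default.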
* extend documentation to address #64 and a few additional comments regarding hyperparameter defaults in general
* Update changelog and readme
* Update README

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>
PPO hyperparameter configurations often refer to "normalize" as a boolean, e.g. rl-baselines3-zoo/hyperparams/ppo.yml (line 44 in 8ea4f4a).
It's not clear to me what configuration this particular hyperparameter refers to (if anything?). E.g., I see A2C tunes the normalize_advantage parameter, but that's not a hyperparameter for PPO. PPO's policy has a normalize_images boolean, but I don't think that's it either. Is this controlling whether or not the env gets wrapped in VecNormalize?

(For context: I've found the zoo scripts particularly handy for tuning, even for my custom environments, thanks! But I am struggling to reproduce some of the tuned results by passing the best hyperparameters directly to fresh initializations of the RL algorithms. Thanks for the amazing work you've done developing stable-baselines and the zoo!)
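To make the answer concrete, here is an illustrative hyperparams entry in the zoo's YAML style. The env name and values are made up for illustration (this is not the actual line 44 referenced above); the point is only that "normalize: true" toggles the VecNormalize wrapper:

```yaml
Pendulum-v1:
  normalize: true         # wrap the env in VecNormalize with default kwargs
  n_envs: 4
  n_timesteps: !!float 1e5
  policy: 'MlpPolicy'
  gamma: 0.99             # also reused as VecNormalize's discount (see the answers above)
```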