
What does the hyperparameter "normalize" refer to in PPO? #64

Closed
cboettig opened this issue Dec 30, 2020 · 4 comments · Fixed by #65
Labels
documentation (Improvements or additions to documentation), question (Further information is requested)

Comments

cboettig (Contributor) commented Dec 30, 2020

PPO hyperparameter configurations often set normalize as a boolean, e.g.

normalize: true

It's not clear to me which setting this particular hyperparameter controls (if anything?). For example, I see A2C tunes the normalize_advantage parameter, but that isn't a hyperparameter for PPO. PPO has a normalize_image boolean, but I don't think that's it either. Is this controlling whether or not the env gets wrapped in VecNormalize?

(For context: I've found the zoo scripts particularly handy for tuning, even for my custom environments, but I am struggling to reproduce some of the tuned results by passing the best hyperparameters directly to fresh initializations of the RL algorithms. Thanks for the amazing work you've done on stable-baselines and the zoo!)

@araffin araffin added the Maintainers on vacation Maintainers are on vacation so they can recharge their batteries, we will be back soon ;) label Dec 31, 2020
@Miffyli Miffyli removed the Maintainers on vacation Maintainers are on vacation so they can recharge their batteries, we will be back soon ;) label Jan 4, 2021
Miffyli (Collaborator) commented Jan 4, 2021

Yes, this parameter refers to wrapping the environment with VecNormalize, which is done in the zoo's training code.

PS: Thanks for the kind words which make return from the vacations easier :)
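For readers unfamiliar with what that wrapping does: VecNormalize keeps running statistics of the observations (and returns) and rescales them on the fly. A minimal sketch of the idea, using Welford's running mean/variance (a simplified stand-in, not the library code; the real class lives in stable_baselines3.common.vec_env.vec_normalize and additionally handles arrays, clipping options, and saving/loading of statistics):

```python
import math

class RunningMeanStd:
    """Running mean/variance via Welford's algorithm (a simplified
    stand-in for the per-observation statistics VecNormalize keeps)."""
    def __init__(self):
        self.mean, self.var, self.count = 0.0, 1.0, 1e-4

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.var += (delta * (x - self.mean) - self.var) / self.count

def normalize_obs(obs, rms, clip=10.0, eps=1e-8):
    """Update statistics, then rescale an observation to
    roughly zero mean / unit variance, clipped to +/- clip."""
    rms.update(obs)
    z = (obs - rms.mean) / math.sqrt(rms.var + eps)
    return max(-clip, min(clip, z))
```

The clipping mirrors VecNormalize's clip_obs behavior: without it, an outlier early in training (when the statistics are still poor) could produce an arbitrarily large normalized observation.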

@Miffyli Miffyli added the question Further information is requested label Jan 4, 2021
araffin (Member) commented Jan 5, 2021

> It's not clear to me what configuration this particular hyperparameter refers to

Yes, those are parameters for the VecNormalize wrapper.
By default, observation and return normalization are both enabled, but you can change that as in
https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/sac.yml#L208

A PR that documents this value in the config would be appreciated ;)
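Concretely, in a zoo hyperparams yaml the value can be a plain boolean, or (as in the sac.yml line linked above) a string-encoded dict of keyword arguments forwarded to VecNormalize. A sketch, with hypothetical env ids:

```yaml
# Shorthand: enable both observation and return normalization
MyEnv-v0:
  normalize: true

# Keyword form: forward specific options to VecNormalize
MyOtherEnv-v0:
  normalize: "{'norm_obs': True, 'norm_reward': False}"
```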

@araffin araffin added the documentation Improvements or additions to documentation label Jan 5, 2021
cboettig (Contributor, author) commented Jan 5, 2021

Awesome, thanks. So just to be sure I've got this right: all algos, not just PPO, use the VecNormalize wrapper around the environment, with norm_obs and norm_reward both True by default (unless overridden in the hyperparameters yaml files as shown?), with these parameters listed at
https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/vec_normalize.py#L29-L37 ?

Or it looks like some of those parameters are overwritten? E.g. gamma for the normalization is set to whatever gamma the agent is using? Happy to prep a PR, and sorry for all the questions; my earlier attempts to eyeball the source definitely misread this (it looked to me like the default normalization was False, a la https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/utils/exp_manager.py#L85). Again, apologies for being dense.

araffin (Member) commented Jan 6, 2021

> So just to be sure I got this right, all algos, not just PPO, use the VecNormalize wrapper around the environment, with norm_obs and norm_reward True by default,
> unless overridden in hyperparameters yaml files as shown?

yes

> Or it looks like some of those parameters are overwritten? E.g. gamma for the normalization is set to whatever gamma the agent is using?

gamma is the only one we override automatically for correctness (and only if present in the hyperparameters).
We also deactivate reward normalization when evaluating the agent (to report the true reward), even though it is not strictly needed anymore since we recently switched to the Monitor wrapper.

> Happy to prep a PR, and sorry for all the questions

no pb ;)
If you were confused, then you were probably not the only one.
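The gamma coupling matters because VecNormalize scales rewards by the standard deviation of a gamma-discounted running return, so the discount must match the agent's. A toy sketch of that scaling (a hypothetical simplified class, not the library API), including a training flag like the one flipped off at evaluation time:

```python
import math

class RewardNormalizer:
    """Toy version of VecNormalize-style reward scaling: rewards are
    divided by the std of a gamma-discounted running return, which is
    why gamma must match the agent's. Simplified sketch only."""
    def __init__(self, gamma=0.99, eps=1e-8):
        self.gamma, self.eps = gamma, eps
        self.ret = 0.0                    # running discounted return
        self.mean, self.var, self.count = 0.0, 1.0, 1e-4
        self.training = True              # disable scaling at eval time

    def __call__(self, reward):
        if not self.training:
            return reward                 # evaluation sees the true reward
        # Update running statistics of the discounted return (Welford)
        self.ret = self.ret * self.gamma + reward
        self.count += 1
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.var += (delta * (self.ret - self.mean) - self.var) / self.count
        return reward / math.sqrt(self.var + self.eps)
```

Note the reward itself is divided by the return's std but not mean-centered, mirroring how VecNormalize scales rather than shifts rewards.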

cboettig added a commit to cboettig/rl-baselines3-zoo that referenced this issue Jan 8, 2021:
extend documentation to address #64 (and a few additional comments regarding hyperparameter defaults in general)
araffin added a commit that referenced this issue Jan 12, 2021
* extend documentation to address #64

and a few additional comments regarding hyperparameter defaults in general

* Update changelog and readme

* Update README

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>