
What does the hyperparameter "normalize" refer to in PPO? #64

Closed
cboettig opened this issue Dec 30, 2020 · 4 comments · Fixed by #65
Labels
documentation (Improvements or additions to documentation), question (Further information is requested)

Comments

cboettig (Contributor) commented Dec 30, 2020

PPO hyperparameter configurations often set normalize as a boolean, e.g.

normalize: true

It's not clear to me which setting this particular hyperparameter controls (if anything?). For example, I see A2C tunes the normalize_advantage parameter, but that isn't a hyperparameter for PPO. PPO has a normalize_image boolean, but I don't think that's it either. Is this controlling whether or not the env gets wrapped in VecNormalize?

(For context: I've found the zoo scripts particularly handy for tuning, even for my custom environments, but I am struggling to reproduce some of the tuned results by passing the best hyperparameters directly to fresh initializations of the RL algorithms. Thanks for the amazing work you've done on stable-baselines and the zoo!)

@araffin araffin added the Maintainers on vacation Maintainers are on vacation so they can recharge their batteries, we will be back soon ;) label Dec 31, 2020
@Miffyli Miffyli removed the Maintainers on vacation Maintainers are on vacation so they can recharge their batteries, we will be back soon ;) label Jan 4, 2021
Miffyli (Collaborator) commented Jan 4, 2021

Yes, this parameter refers to wrapping the environment with VecNormalize, which is done in the zoo's training code.

PS: Thanks for the kind words which make return from the vacations easier :)
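For readers unfamiliar with what that wrapping does: VecNormalize keeps running statistics of the observations (and returns) and rescales them on the fly. A minimal sketch of the idea, using Welford's running mean/variance (a simplified stand-in, not the library code; the real class lives in stable_baselines3.common.vec_env.vec_normalize and additionally handles arrays, clipping options, and saving/loading of statistics):

```python
import math

class RunningMeanStd:
    """Running mean/variance via Welford's algorithm (a simplified
    stand-in for the per-observation statistics VecNormalize keeps)."""
    def __init__(self):
        self.mean, self.var, self.count = 0.0, 1.0, 1e-4

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.var += (delta * (x - self.mean) - self.var) / self.count

def normalize_obs(obs, rms, clip=10.0, eps=1e-8):
    """Update statistics, then rescale an observation to
    roughly zero mean / unit variance, clipped to +/- clip."""
    rms.update(obs)
    z = (obs - rms.mean) / math.sqrt(rms.var + eps)
    return max(-clip, min(clip, z))
```

The clipping mirrors VecNormalize's clip_obs behavior: without it, an outlier early in training (when the statistics are still poor) could produce an arbitrarily large normalized observation.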

@Miffyli Miffyli added the question Further information is requested label Jan 4, 2021
araffin (Member) commented Jan 5, 2021

> It's not clear to me what configuration this particular hyperparameter refers to

Yes, those are parameters for the VecNormalize wrapper.
By default, observation and return normalization are both enabled, but you can change that as in
https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/sac.yml#L208

A PR that documents this value in the config would be appreciated ;)
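Concretely, in a zoo hyperparams yaml the value can be a plain boolean, or (as in the sac.yml line linked above) a string-encoded dict of keyword arguments forwarded to VecNormalize. A sketch, with hypothetical env ids:

```yaml
# Shorthand: enable both observation and return normalization
MyEnv-v0:
  normalize: true

# Keyword form: forward specific options to VecNormalize
MyOtherEnv-v0:
  normalize: "{'norm_obs': True, 'norm_reward': False}"
```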

@araffin araffin added the documentation Improvements or additions to documentation label Jan 5, 2021
cboettig (Contributor, author) commented Jan 5, 2021

Awesome, thanks. So just to be sure I've got this right: all algos, not just PPO, use the VecNormalize wrapper around the environment, with norm_obs and norm_reward both True by default (unless overridden in the hyperparameters yaml files as shown?), with these parameters listed at
https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/vec_normalize.py#L29-L37 ?

Or it looks like some of those parameters are overwritten? E.g. gamma for the normalization is set to whatever gamma the agent is using? Happy to prep a PR, and sorry for all the questions; my earlier attempts to eyeball the source definitely misread this (it looked to me like the default normalization was False, a la https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/utils/exp_manager.py#L85). Again, apologies for being dense.

araffin (Member) commented Jan 6, 2021

> So just to be sure I got this right, all algos, not just PPO, use the VecNormalize wrapper around the environment, with norm_obs and norm_reward True by default,
> unless overridden in hyperparameters yaml files as shown?

yes

> Or it looks like some of those parameters are overwritten? E.g. gamma for the normalization is set to whatever gamma the agent is using?

gamma is the only one we override automatically for correctness (and only if present in the hyperparameters).
We also deactivate reward normalization when evaluating the agent (to report the true reward), even though it is not strictly needed anymore since we recently switched to the Monitor wrapper.

> Happy to prep a PR, and sorry for all the questions

no pb ;)
If you were confused, then you were probably not the only one.
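The gamma coupling matters because VecNormalize scales rewards by the standard deviation of a gamma-discounted running return, so the discount must match the agent's. A toy sketch of that scaling (a hypothetical simplified class, not the library API), including a training flag like the one flipped off at evaluation time:

```python
import math

class RewardNormalizer:
    """Toy version of VecNormalize-style reward scaling: rewards are
    divided by the std of a gamma-discounted running return, which is
    why gamma must match the agent's. Simplified sketch only."""
    def __init__(self, gamma=0.99, eps=1e-8):
        self.gamma, self.eps = gamma, eps
        self.ret = 0.0                    # running discounted return
        self.mean, self.var, self.count = 0.0, 1.0, 1e-4
        self.training = True              # disable scaling at eval time

    def __call__(self, reward):
        if not self.training:
            return reward                 # evaluation sees the true reward
        # Update running statistics of the discounted return (Welford)
        self.ret = self.ret * self.gamma + reward
        self.count += 1
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.var += (delta * (self.ret - self.mean) - self.var) / self.count
        return reward / math.sqrt(self.var + self.eps)
```

Note the reward itself is divided by the return's std but not mean-centered, mirroring how VecNormalize scales rather than shifts rewards.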

cboettig added a commit to cboettig/rl-baselines3-zoo that referenced this issue Jan 8, 2021:
extend documentation to address #64 (and a few additional comments regarding hyperparameter defaults in general)
araffin added a commit that referenced this issue Jan 12, 2021
* extend documentation to address #64

and a few additional comments regarding hyperparameter defaults in general

* Update changelog and readme

* Update README

Co-authored-by: Antonin RAFFIN <antonin.raffin@ensta.org>