[Question] What is the real intention behind reward scaling with the running variance of discounted returns? #1165
❓ Question
It confuses me a lot that the statistics of the discounted return are used to rescale a different quantity, the raw reward. This seems to be the default choice for PPO. Is there any intuition for interpreting this choice? Why not just use the running variance of the individual rewards instead? As far as I can tell, its effect is a lot like learning rate annealing, or is there something I'm missing?
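
For reference, here is a minimal sketch of the scheme in question, loosely following what implementations such as OpenAI Baselines' `VecNormalize` do. The class and parameter names (`RewardScaler`, `RunningMeanStd`, `gamma`, `eps`) are illustrative, not any particular library's API:

```python
import numpy as np

class RunningMeanStd:
    """Tracks a running mean/variance with Welford-style batch updates."""
    def __init__(self):
        self.mean = 0.0
        self.var = 1.0
        self.count = 1e-4  # small prior to avoid division by zero early on

    def update(self, x):
        batch_mean = np.mean(x)
        batch_var = np.var(x)
        batch_count = len(x)
        delta = batch_mean - self.mean
        tot = self.count + batch_count
        self.mean += delta * batch_count / tot
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta**2 * self.count * batch_count / tot) / tot
        self.count = tot

class RewardScaler:
    """Scales each reward by the running std of the *discounted return*,
    not of the raw reward itself -- the choice this issue asks about."""
    def __init__(self, gamma=0.99, eps=1e-8):
        self.gamma = gamma
        self.eps = eps
        self.ret = 0.0  # running discounted return
        self.ret_rms = RunningMeanStd()

    def scale(self, reward):
        # Accumulate the discounted return, update its running variance,
        # then divide the raw reward by the return's running std.
        self.ret = self.gamma * self.ret + reward
        self.ret_rms.update([self.ret])
        return reward / np.sqrt(self.ret_rms.var + self.eps)

    def reset(self):
        self.ret = 0.0  # reset at episode boundaries
```

Note that in this sketch only the standard deviation of the running return is used; the mean is not subtracted, so the sign of each reward is preserved.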