
Some problems when running the example with OvercookedMultiEnv-v0 PPO #5

Closed
lixiyun98 opened this issue Jul 11, 2022 · 2 comments

Comments

@lixiyun98

Hello, I ran this framework with the PPO training command python3 trainer.py OvercookedMultiEnv-v0 PPO PPO --env-config '{"layout_name": "simple"}' --seed 10 --preset 1. However, in the training log I found that the value function loss increases as the reward increases, which confuses me; ep_rew_mean can reach 300 when total-timesteps is 500000. How can I solve this? It looks like a bug.

@ShaoZhang0115

We also encountered this problem. The loss, entropy loss, value loss, and policy gradient loss all increase over the course of training as the reward increases.
We have tried all the layouts, and the loss only decreases when the reward stays at 0, which is the behavior one would normally expect.
We checked the call into SB3 and the loss computation but found no obvious errors, and no errors or warnings were raised during training.
Could something be wrong with SB3 itself, or with how the log is printed?

@bsarkar321
Collaborator

I believe this behavior is expected and isn't an issue. SB3 (and every other PPO implementation I'm aware of) uses a mean-squared-error loss between the value function's predicted returns and the "true" returns computed from the rollout. When the policy is stochastic and its expected reward is increasing, the returns naturally have higher variance, so the MSE grows even though the value function is becoming more accurate.
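
Here's a minimal numerical sketch (plain NumPy, not SB3 code; the return statistics are made up purely for illustration) of why the MSE value loss can grow even when the value prediction exactly matches the expected return: for a perfect predictor, the MSE is simply the variance of the sampled returns.

```python
# Minimal sketch, not taken from SB3: the return means/standard deviations
# below are hypothetical, chosen only to illustrate the effect.
import numpy as np

rng = np.random.default_rng(0)

def mse_value_loss(mean_return, return_std, n=4096):
    # Sampled rollout returns under a stochastic policy.
    returns = rng.normal(mean_return, return_std, size=n)
    # A "perfect" value prediction: exactly the expected return.
    predicted = np.full(n, mean_return)
    # SB3-style value loss: mean squared error against rollout returns.
    return np.mean((returns - predicted) ** 2)

# Early in training: low reward, returns tightly clustered.
print(mse_value_loss(mean_return=10.0, return_std=5.0))    # ~25
# Later: higher reward, but a stochastic policy spreads the returns out,
# so the value loss is much larger even though the prediction is exact.
print(mse_value_loss(mean_return=300.0, return_std=60.0))  # ~3600
```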

In my experience, the value loss and policy loss reported by PPO (or most other RL algorithms) do not provide a strong signal for "learning." I've noticed this same behavior with single-player environments like CartPole, and also with different implementations of PPO (like CleanRL or Garage).
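
If you want to see the same effect outside this repository, here's a rough sketch using the standard SB3 API on CartPole (not this repo's trainer.py; hyperparameters are arbitrary, and you may need to swap gym for gymnasium depending on your SB3 version). As ep_rew_mean climbs, the reported value_loss typically climbs as well.

```python
# Rough sketch: plain SB3 PPO on CartPole to observe the same loss pattern.
import gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1, seed=10)
# Watch the printed logs: ep_rew_mean rises toward ~500 while the reported
# value_loss also tends to rise, matching the behavior described in this issue.
model.learn(total_timesteps=100_000)
```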
