This repository has been archived by the owner on Dec 11, 2022. It is now read-only.

Problems with PPO/ClippedPPO #87

Closed
KuenstlicheIntelligenz opened this issue Apr 9, 2018 · 10 comments
Comments

@KuenstlicheIntelligenz

KuenstlicheIntelligenz commented Apr 9, 2018

Hey Guys,

I'm having trouble with the likelihood ratio and NaN/inf values. I'm using entropy regularization for exploration, and the entropy gets quite low, so I think the distributions can't be compared anymore at some point.
My model learns up to a certain point and then the NaNs happen. Adding a small epsilon in the ratio avoids the NaNs, but then the reward curve just drops at some point and the model stops learning.
(The KL divergence also diverges.) I'm using my own environment, a feed-forward architecture, and I'm working on a continuous control problem.

I've already tried many things:

  • Optimizers: Adam (with different epsilons), RMSProp
  • Reducing the LR (that just postpones the crash)
  • Reducing the clipping (0.1) and the number of epochs, clipping the gradients
  • Changing the coefficients for the value loss, policy loss, and entropy
  • Changing the weight initializers and the network sizes (a bigger network postpones the problem)
  • Changing the activation function (ReLU, LReLU, SELU, tanh)

If I change the beta coefficient for the entropy, I get either an ever-increasing entropy, or it falls until the crash happens.
The agent learns pretty well until that point, so I suppose I haven't made an error in my implementation.
I may have made an error in how much I changed the parameters.

Any tips or ideas on this?
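
To make it concrete, here is a minimal sketch (PyTorch, not my actual code; the numbers and names are made up) of how the likelihood ratio turns the loss into inf/NaN once the policy std collapses:

```python
import torch

def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s); exp() overflows to inf once the
    # log-prob difference gets large, e.g. after the policy std has collapsed.
    ratio = torch.exp(logp_new - logp_old)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Any inf/NaN in the ratio propagates into the loss and then, via
    # backward(), into the gradients and the weights.
    return -torch.min(ratio * advantages, clipped * advantages).mean()

# Toy numbers: once the policy is near-deterministic, the log-prob gap
# explodes and exp() returns inf.
logp_old = torch.tensor([-1.0, -2.0, -500.0])
logp_new = torch.tensor([-1.1, -1.9, 200.0])
adv = torch.tensor([0.5, -0.3, -1.0])
print(clipped_surrogate(logp_new, logp_old, adv))  # -> inf loss
```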

@KuenstlicheIntelligenz
Author

KuenstlicheIntelligenz commented Apr 11, 2018

If anyone has a similar problem: I solved it with the right parameters and, as a quickfix, by clipping the logstd so it doesn't get too small. Also, there's a known issue with Adam where you have to choose a higher epsilon; I used RMSProp.

Update: That didn't solve it completely, it only postponed the crash.
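
For reference, the quickfix looks roughly like this (a PyTorch sketch, not the code I actually used; the bounds are illustrative):

```python
import torch
import torch.nn as nn

LOG_STD_MIN, LOG_STD_MAX = -5.0, 2.0  # illustrative bounds

class GaussianHead(nn.Module):
    def __init__(self, hidden_dim, action_dim):
        super().__init__()
        self.mu = nn.Linear(hidden_dim, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, h):
        # Clamp log_std so the std can neither collapse towards 0 (near-Dirac
        # distribution, huge log-prob ratios) nor explode.
        log_std = torch.clamp(self.log_std, LOG_STD_MIN, LOG_STD_MAX)
        return torch.distributions.Normal(self.mu(h), log_std.exp())
```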

@ujjawalchugh97

I'm facing the same issue. No matter what hyperparameters I use for the clipped PPO algorithm, the surrogate loss, KL divergence, and entropy become NaN after some time. Any solution to this?

@ujjawalchugh97

@KuenstlicheIntelligenz did you figure out the cause/solution to the problem?

@MeiJuanLiu

@ujjawalchugh97, I'm facing the same issue too, so I'd like to know whether you have solved this problem. A smaller network (with the default layers) has no problem, but a bigger network produces NaN values after training for just a few epochs, and I have no idea how to solve it.

@dhruvramani

dhruvramani commented Jun 27, 2020

You can try this, maybe: https://stats.stackexchange.com/a/66621

EDIT: Didn't work for me. Most of the values became zero instead. If anyone finds a solution, please let me know.

@eflopez1

eflopez1 commented Dec 8, 2020

Hello. I don't use this library and am instead using Stable-Baselines. A similar issue came up for me where NaNs would appear after a few epochs of training on my custom environment and on larger neural networks. An interim solution has been to use smaller networks, but this is not ideal.

Recently I have seen success with large networks by setting the entropy coefficient to 0, as discussed here: https://stable-baselines.readthedocs.io/en/master/.

A quick survey of how this library is set up suggests it has a similar option, so perhaps try that? On a side note, I am actually here because I am trying to learn why setting the entropy coefficient to 0 solves the problem.

Hope this helps.
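
For reference, the entropy coefficient typically enters the combined loss like this (a generic sketch, not this library's or Stable-Baselines' actual code; the names and default values are illustrative):

```python
def ppo_loss(policy_loss, value_loss, entropy, vf_coef=0.5, ent_coef=0.01):
    # The entropy bonus is subtracted to encourage exploration; with
    # ent_coef = 0.0 that term (and its gradient) drops out entirely.
    return policy_loss + vf_coef * value_loss - ent_coef * entropy

print(ppo_loss(0.8, 1.5, 2.3))                 # entropy bonus active
print(ppo_loss(0.8, 1.5, 2.3, ent_coef=0.0))   # entropy term removed
```

In Stable-Baselines this corresponds to the ent_coef argument of PPO2.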

@KuenstlicheIntelligenz
Author

Sorry, guys, for the late answer. I had this problem in my master's thesis and remember solving it with clipping and by adding epsilons to avoid numbers that get too small or too large.

@richardrl

Hi @KuenstlicheIntelligenz, can you elaborate on where exactly you did the clipping? Or reference which part of the code it was added to?

Also, where did you add the epsilon?

I have been running into this exact same issue training a large model with PPO.

@KuenstlicheIntelligenz
Author

@richardrl
Sorry, I cannot. I heavily changed the original repo and didn't document it well due to time pressure. Also, I'm not on the project anymore and currently don't have the time to get back into it.

I think the problem was that the probability distributions collapsed to something close to Dirac distributions and could not be compared anymore. So you could say Q(x) would be near 0 for some x while P(x) had a high probability, which resulted in NaNs in the KL computation. Adding an epsilon to Q(x) would still result in a very high KL divergence, which led to exploding gradients that "destroyed" the model (the learning curve dropping).
I eventually solved the problem by making sure the KL divergence and the gradients did not get too big. I'm fairly sure I clipped the gradients right before applying them to the network weights.
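
Roughly, the two guards look like this (a PyTorch sketch, not my thesis code; the epsilon and the clipping norm are illustrative):

```python
import torch

EPS = 1e-6

def gaussian_kl(mu_p, std_p, mu_q, std_q):
    # KL(P || Q) for diagonal Gaussians; the epsilon keeps the stds away from 0,
    # which is where the log and the 1/std_q^2 terms blow up.
    std_p = std_p.clamp(min=EPS)
    std_q = std_q.clamp(min=EPS)
    return (torch.log(std_q / std_p)
            + (std_p ** 2 + (mu_p - mu_q) ** 2) / (2 * std_q ** 2)
            - 0.5).sum(-1)

# ...inside the training loop, after loss.backward():
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
# optimizer.step()
```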

@richardrl

Yeah, I "fixed" this problem in the sense that I don't get errors anymore. I had to use clipping in the logp computation (visualize log(x) and exp(x) and think about what's reasonable). I also added an epsilon in the ratio computation (the ratio is computed as exp(logp_new - logp_old)). Finally, I initialized the mean and variance of my distribution to be relatively small. This is important: if the variance is too high, you are guaranteed to NaN out when calculating the ratio.
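
Roughly along these lines (a PyTorch sketch, not my exact code; the bounds are illustrative):

```python
import torch

LOGP_MIN, LOGP_MAX = -20.0, 2.0   # keep log-probs in a sane range
LOG_RATIO_MAX = 10.0              # exp(10) ~ 2.2e4, still finite in float32

def safe_ratio(logp_new, logp_old):
    logp_new = logp_new.clamp(LOGP_MIN, LOGP_MAX)
    logp_old = logp_old.clamp(LOGP_MIN, LOGP_MAX)
    # Clamp the log-ratio before exponentiating so the ratio cannot overflow.
    return torch.exp((logp_new - logp_old).clamp(-LOG_RATIO_MAX, LOG_RATIO_MAX))
```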
