Problems with PPO/ClippedPPO #87
Comments
If anyone has a similar problem: I solved it with the right parameters and by just clipping the logstd as a quick fix so it doesn't get too small. There's also a known issue with Adam where you have to choose a higher epsilon; I used RMSProp instead. Update: this didn't solve it completely, but it postponed the crash.
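For reference, a minimal sketch of what clipping the logstd can look like, assuming a diagonal Gaussian policy in PyTorch; the bounds and names are illustrative, not taken from this library:

```python
import torch

LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0  # assumed bounds, tune per problem

def make_policy_distribution(mean, log_std):
    # Clamping keeps std inside [exp(-20), exp(2)], so the Gaussian can
    # never collapse to a near-Dirac and log-probs stay finite.
    log_std = torch.clamp(log_std, LOG_STD_MIN, LOG_STD_MAX)
    return torch.distributions.Normal(mean, log_std.exp())
```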
I'm facing the same issue. No matter what hyperparameters I use for the clipped PPO algorithm, the surrogate loss, KL divergence, and entropy become NaN after some time. Any solution to this?
@KuenstlicheIntelligenz did you figure out the cause/solution to the problem?
@ujjawalchugh97, I'm facing the same issue too, so I'd like to know whether you have solved this problem. A smaller network (with the default layers) has no problem, but a bigger network produces NaN values after training for just a few epochs, and I have no idea how to solve it.
Maybe you can try this: https://stats.stackexchange.com/a/66621 EDIT: It didn't work for me; most of the values became zero instead. If anyone finds a solution, please let me know.
Hello. I don't use this library and am instead using Stable-Baselines. A similar issue came up for me, where NaNs would appear after a few epochs of training on my custom environment and on larger neural networks. An interim solution has been to use smaller networks, but this is not ideal. Recently I have seen success with large networks by setting the entropy coefficient to 0, as is discussed here: https://stable-baselines.readthedocs.io/en/master/. A quick survey of how this library is set up suggests it has a similar option, so perhaps try that? On a side note, I am actually here because I am trying to learn why setting the entropy coefficient to 0 solves the problem. Hope this helps.
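In Stable-Baselines that roughly looks like the following; `CustomEnv-v0` is a placeholder for your own registered environment:

```python
import gym
from stable_baselines import PPO2

env = gym.make("CustomEnv-v0")  # placeholder environment id
# ent_coef=0.0 disables the entropy bonus that was producing NaNs for me.
model = PPO2("MlpPolicy", env, ent_coef=0.0)
model.learn(total_timesteps=100_000)
```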
Sorry guys for the late answer. I had the problem in my master's thesis and remember solving it with clipping and by adding epsilons to avoid getting too small and too large numbers.
Hi @KuenstlicheIntelligenz, can you elaborate on where exactly you did the clipping, or reference the part of the code where it was added? Also, where did you add the epsilon? I have been running into this exact same issue training a large model with PPO.
@richardrl I think the problem was that the probability distributions collapsed to somewhat near-Dirac distributions and could not be compared anymore. So you could say Q(x) would be near 0 for some x while P(x) had a high probability, which resulted in NaNs in the KL computation. Adding an epsilon to Q(x) would still result in a very high KL divergence, which caused exploding gradients that "destroyed" the model (the learning curve dropping).
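A toy example of that failure mode, with made-up numbers just to show the mechanics:

```python
import numpy as np

p = np.array([0.5, 0.5])
q = np.array([1.0, 0.0])        # collapsed, near-Dirac: Q(x) = 0 for one x
kl = np.sum(p * np.log(p / q))  # log(0.5 / 0) -> inf
print(kl)                       # inf, and NaN gradients follow

eps = 1e-8
kl_eps = np.sum(p * np.log(p / (q + eps)))
print(kl_eps)                   # finite, but huge -> exploding gradients
```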
Yea, I "fixed" this problem in the sense I get no errors anymore. I had to use clipping in the logp computation (visualize log(x) and exp(x) and think about what's reasonable). I also added an epsilon (ratio = logp_new -logp_old). Finally, I initialized the mean and variance of my distribution to be relatively small. This is important. If the variance is too high you are guaranteed to nan out when calculating the ratio |
Hey guys,
I'm having trouble with the likelihood ratio and NaN/inf values. I'm using entropy regularization for exploration, and the entropy is getting quite low, so I think that at some point the distributions can't be compared anymore.
My model learns until a certain point and then the NaNs happen. Adding a small epsilon in the ratio avoids the NaNs, but then the reward curve just drops at some point and the model stops learning.
(The KL divergence also diverges.) I'm using my own environment, a feed-forward architecture, and a continuous problem.
I've already tried many things:
If I change the beta coefficient for the entropy, I get either an ever-increasing entropy or an entropy that falls until the crash happens.
The agent learns pretty well until that point, so i suppose i haven't made any error in my implementation.
I may have made an error in how much I changed the parameters.
Any tips or ideas on this?