How to set clipnorm? #510
Can you reformat your post in a way that would make it readable, or otherwise reformulate your question?
Done
To set clipnorm, you can simply do:
However, it is unclear why you would want to clip your gradients in a simple MLP. The cause of your issue is likely in your data. Additionally, you can impose constraints on your weights, which might be closer to what you want to do: http://keras.io/constraints/
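For reference, what `clipnorm` does to each gradient can be sketched in plain numpy. This is a minimal illustration of the rescaling, not the actual Keras implementation, and the `clip_by_norm` helper name is made up for this example: if a gradient's L2 norm exceeds the threshold, the gradient is scaled down to exactly that norm; otherwise it is left untouched.

```python
import numpy as np

def clip_by_norm(grad, clipnorm):
    """Rescale grad so its L2 norm does not exceed clipnorm.

    Hypothetical helper illustrating the clipnorm behaviour; not a Keras API.
    """
    norm = np.linalg.norm(grad)
    if norm > clipnorm:
        grad = grad * (clipnorm / norm)
    return grad

g = np.array([3.0, 4.0])      # L2 norm is 5
print(clip_by_norm(g, 1.0))   # rescaled to norm 1 -> [0.6 0.8]
print(clip_by_norm(g, 10.0))  # norm below threshold, unchanged -> [3. 4.]
```

Note that clipping changes only the magnitude of the update, not its direction, which is why it can mask (rather than fix) whatever is producing the huge gradients in the first place.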
The same data works just fine for the lasagne model on the same machine without clipnorm. When I use the same data with the keras model, I get a nan loss unless I set clipnorm. I read some threads about nan losses, and using clipnorm is the one solution that works for me.
Two possible fixes:
A simple MLP will never cause gradient explosion if your data is correctly preprocessed.
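To illustrate the "correctly preprocessed" point above: per-feature standardization (zero mean, unit variance) is a common way to keep activations and gradients in a sane range. The sketch below assumes a hypothetical `standardize` helper (not a Keras API); the statistics should be computed on the training set only and reused for validation/test data.

```python
import numpy as np

def standardize(X, eps=1e-8):
    """Scale each feature to zero mean and unit variance.

    Hypothetical helper for illustration; fit mean/std on training data only.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / (std + eps), mean, std

# Features on wildly different scales, a common cause of exploding losses.
X_train = np.array([[1.0, 100.0],
                    [2.0, 200.0],
                    [3.0, 300.0]])
X_scaled, mean, std = standardize(X_train)
# Each column of X_scaled now has mean ~0 and std ~1.
```

If the raw inputs have features on very different scales, some weights receive far larger gradients than others, which is a more likely cause of the nan loss than anything clipnorm addresses directly.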
Hi all,
I tried to use keras to reproduce the results of a very simple MLP model in lasagne. The simple MLP lasagne model is defined as follows:
(lasagne model code omitted; see net1 for more details)
And here is how I defined it in keras,
(keras model code omitted)
I have already ensured that the input data for both models are identical, but my keras model gives a nan loss, and I noticed this loss quickly blows up as more batches are trained. This problem still happens even if I use CPU mode.
However, the problem disappears if I use the clipnorm option in sgd. I tried many different clipnorm values, and the relationship between them is P(0.9) < P(0.95) < P(1) < P(10) < P(100), while P(1000) keeps oscillating and never converges, where P(.) is the final validation loss in terms of MSE after 400 epochs. Another thing is that even though I achieved similar performance with clipnorm=100, its losses in the first 40 epochs also oscillated severely, then suddenly stabilized at epoch 39. A detailed loss log can be found at the end of this post.
This raises the question of how to set clipnorm correctly. I really don't know how to do this, and I would probably have terminated the clipnorm=100 experiment early because of its big oscillations. I would appreciate any help on this. On the other hand, I think it would be better if keras had a regression example.
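Since no regression example exists in the thread, here is a sketch of what a minimal one could look like: a one-hidden-layer MLP with MSE loss and per-gradient norm clipping, written in plain numpy so the mechanics are explicit. All names (`clip_by_norm`, the data, the hyperparameters) are illustrative, not taken from Keras or from the original post.

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy regression data: y = 2x + 1 plus a little noise.
X = rng.randn(200, 1)
y = 2.0 * X + 1.0 + 0.01 * rng.randn(200, 1)

# One hidden tanh layer trained with MSE, like the MLPs discussed above.
W1 = 0.1 * rng.randn(1, 8); b1 = np.zeros(8)
W2 = 0.1 * rng.randn(8, 1); b2 = np.zeros(1)

def clip_by_norm(grad, clipnorm):
    """Rescale grad if its L2 norm exceeds clipnorm (illustrative helper)."""
    norm = np.linalg.norm(grad)
    return grad * (clipnorm / norm) if norm > clipnorm else grad

lr, clipnorm = 0.1, 1.0
for epoch in range(1000):
    # forward pass
    h = np.tanh(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - y
    loss = (err ** 2).mean()
    # backward pass (plain MSE/tanh gradients)
    dpred = 2.0 * err / len(X)
    dW2 = h.T @ dpred; db2 = dpred.sum(axis=0)
    dh = (dpred @ W2.T) * (1.0 - h ** 2)
    dW1 = X.T @ dh; db1 = dh.sum(axis=0)
    # clip each gradient's norm, then take a full-batch SGD step
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= lr * clip_by_norm(grad, clipnorm)

print(loss)  # final training MSE
```

With well-scaled inputs like these, the clipping rarely activates after the first few epochs, which matches the earlier comment that a properly preprocessed MLP should not need clipnorm at all.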