RemoveNotFinite step rule #343
Conversation
class RemoveNotFinite(StepRule):
    """If gradients norm is inf or nan.
You'll need a blank line after this... Sorry, the docstring checker is picky!
class RemoveNotFinite(StepRule):
    """A step rule that replaces non-finite elements.

    Replaces non-finite elements (`inf` or `NaN`) in the step with
Is this really what it does? If I read the code it seems like it replaces the entire step with the scaled parameter, instead of doing it entry-wise (which is how I read this docstring).
Yeah, this is not correct. @dwf suggested this docstring. (For some odd reason, his comment was on my fork, not here.) How about replacing the first line with: Replaces steps with non-finite norm (`inf` or `NaN`) with ...
`tensor.switch` operates element-wise.
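For context, a minimal sketch of what a step rule along these lines might look like (the `scaler` argument and the exact norm check are assumptions for illustration, not necessarily the code under review):

    from theano import tensor

    from blocks.algorithms import StepRule


    class RemoveNotFinite(StepRule):
        """Replace steps with non-finite norm by a scaled parameter."""
        def __init__(self, scaler=0.1):
            self.scaler = scaler

        def compute_step(self, parameter, previous_step):
            # The condition is a scalar norm check, so even though
            # tensor.switch operates element-wise, every entry shares
            # the same condition and the whole step is replaced at once.
            norm = previous_step.norm(2)
            not_finite = tensor.or_(tensor.isnan(norm), tensor.isinf(norm))
            return tensor.switch(not_finite, self.scaler * parameter,
                                 previous_step), []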
Oh, I misread. Sorry.
This overlaps with #162, but is different; I guess we should have both. In more detail: I would also like to be able to avoid updating parameters when the step contains NaNs.
You want to compile two separate Theano functions, one which calculates the step and another one which applies it (using the output of the first as an input)? That sounds woefully inefficient.
... and another one that subtracts it. What is so inefficient about it? Computation-wise it is the same; memory-wise it is harder to tell.
You could be stopping Theano from performing a whole bunch of optimizations, for example memory-wise as you mention (perhaps Theano likes updating parameters one by one, not needing all of the steps to be available simultaneously). But most importantly, wouldn't this force Theano to push the step from GPU to CPU (as output from the first function) and then back onto the GPU (as input to the second function)? That's a massive bandwidth issue.
Why would it force Theano to do such an awful thing? I assume that the step is stored in shared variables on the GPU, which guarantees that both functions will be performed without copying to main memory. In fact, this is exactly the way it is done in GroundHog. Memory-wise it might indeed be inefficient.
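For what it's worth, the two-function scheme under discussion might look roughly like this (a sketch with a made-up toy model; `compute_steps` and `apply_steps` are hypothetical names):

    import numpy
    import theano
    from theano import tensor

    # Toy model, assumed purely for illustration.
    x = tensor.matrix('x')
    y = tensor.vector('y')
    W = theano.shared(numpy.zeros((3, 1), dtype=theano.config.floatX), 'W')
    cost = ((x.dot(W).flatten() - y) ** 2).mean()
    parameters = [W]
    learning_rate = 0.1

    # One shared buffer per parameter keeps the steps on the device.
    steps = [theano.shared(p.get_value() * 0.) for p in parameters]

    # First function: compute the steps into the shared buffers.
    compute_steps = theano.function(
        [x, y], cost,
        updates=list(zip(steps, tensor.grad(cost, parameters))))

    # Second function: subtract the stored steps. Because the steps
    # live in shared variables, nothing is copied back to main memory
    # between the two calls.
    apply_steps = theano.function(
        [], [],
        updates=[(p, p - learning_rate * s)
                 for p, s in zip(parameters, steps)])

This is also where the memory concern below comes from: the `steps` buffers roughly double the parameter storage.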
If you keep it on GPU, that could potentially be almost a doubling in memory, because you need to keep a step for each parameter in your model, while Theano could otherwise do them weight matrix by weight matrix. If that's really how it was done in GroundHog, I'm surprised the models fit on GPUs at all!
I doubt that Theano is smart enough to do that! However, if it is not that smart, which I would expect, the inefficiency is still there. I guess it is time to ask Fred what the recommended way in Theano is to skip an update if it would lead to corruption with NaNs...
Why not just switch on `any(isnan(update))`? Or take the disjunction across ...
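Presumably something along these lines (a sketch; `step` is a stand-in for the symbolic update):

    from theano import tensor

    step = tensor.vector('step')  # stand-in for a symbolic step

    # Scalar flag: any NaN entry (taking the disjunction with isinf too).
    bad = tensor.any(tensor.or_(tensor.isnan(step), tensor.isinf(step)))
    # Skip the step entirely when the flag is set.
    safe_step = tensor.switch(bad, tensor.zeros_like(step), step)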
I guess because there's no ...
Ah, great, my bad! I was looking for it in the documentation with other logical functions like ...
#352 better?
Simple trick to remove nan and inf from gradients. Thanks to @jych.
Not the best name chosen; can change if anyone has a better suggestion.