RemoveNotFinite step rule #343

Merged (1 commit) on Feb 25, 2015
Conversation

@memimo (Contributor) commented Feb 24, 2015

A simple trick to remove `nan` and `inf` from gradients. Thanks to @jych.
Not the best name; I can change it if anyone has a better suggestion.
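For reference, a minimal sketch of the kind of step rule being described, assuming Blocks' `StepRule` interface where `compute_step(parameter, previous_step)` returns a `(step, updates)` pair; the class name, the `scaler` argument, and the fallback to a scaled parameter are illustrative placeholders, not the merged code:

```python
from theano import tensor

from blocks.algorithms import StepRule


class RemoveNotFiniteSketch(StepRule):
    """Replace a step whose norm is not finite by a scaled parameter.

    Illustrative sketch only, not the merged implementation.
    """
    def __init__(self, scaler=0.1):
        self.scaler = scaler

    def compute_step(self, parameter, previous_step):
        norm = tensor.sqrt((previous_step ** 2).sum())
        not_finite = tensor.isnan(norm) + tensor.isinf(norm)
        # Fall back to a small multiple of the parameter when the proposed
        # step is corrupted; otherwise keep the step unchanged.
        step = tensor.switch(not_finite, self.scaler * parameter, previous_step)
        return step, []
```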



Review comment on:

    class RemoveNotFinite(StepRule):
        """If gradients norm is inf or nan.

Member: You'll need a blank line after this... Sorry, the docstring checker is picky!

@memimo force-pushed the notfinite branch 3 times, most recently from 7bad42f to 7115439 on February 24, 2015, 19:50.
Review comment on:

    class RemoveNotFinite(StepRule):
        """A step rule that replaces non-finite elements.

        Replaces non-finite elements (`inf` or `NaN`) in the step with

Member: Is this really what it does? If I read the code, it seems like it replaces the entire step with the scaled parameter, instead of doing it entry-wise (which is how I read this docstring).

memimo (author): Yeah, this is not correct. @dwf suggested this docstring. (For some odd reason, his comment was on my fork, not here.)
How about replacing the first line with:

    Replaces steps with non-finite norm (`inf` or `NaN`) with

Contributor: `tensor.switch` operates element-wise.
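To illustrate the point: `tensor.switch` is element-wise, but when the condition is a scalar (such as a check on the step's norm) it broadcasts over the whole tensor, so the entire step is either kept or replaced. A small sketch (variable names are made up):

```python
import numpy
import theano
from theano import tensor

step = tensor.vector('step')
fallback = tensor.vector('fallback')

# Scalar condition (a check on the step's norm): the condition broadcasts,
# so the whole step is either kept or replaced at once.
norm = tensor.sqrt((step ** 2).sum())
whole = theano.function(
    [step, fallback],
    tensor.switch(tensor.isnan(norm) + tensor.isinf(norm), fallback, step))

# Element-wise condition: only the offending entries are replaced.
entrywise = theano.function(
    [step, fallback], tensor.switch(tensor.isnan(step), fallback, step))

s = numpy.array([1.0, numpy.nan, 3.0], dtype=theano.config.floatX)
f = numpy.zeros(3, dtype=theano.config.floatX)
print(whole(s, f))      # [ 0.  0.  0.]  -- entire step replaced
print(entrywise(s, f))  # [ 1.  0.  3.]  -- only the NaN entry replaced
```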

Contributor: Oh, I misread. Sorry.

@rizar (Contributor) commented Feb 25, 2015

This overlaps with #162, but is different; I guess we should have both.

In more detail: I would also like to be able to avoid updating parameters when the step contains NaN, so that the last valid parameters remain available for analysis. For this I guess we will have to split the single Theano function of the DifferentiableCostMinimizer into two, one computing the step and another one subtracting it from the parameters, similarly to how it was done in Groundhog.
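A rough sketch of the split being proposed; the steps live in shared variables, one function fills them in, and a second applies them only when they are finite. The name `make_split_functions`, the plain SGD step, and the fixed learning rate are assumptions for illustration, not how Blocks or Groundhog actually structure this:

```python
import numpy
import theano
from theano import tensor


def make_split_functions(inputs, cost, parameters, learning_rate=0.1):
    """Build one function that computes steps into shared variables and a
    second one that applies them only when they are finite."""
    lr = tensor.constant(learning_rate, dtype=theano.config.floatX)
    grads = tensor.grad(cost, parameters)
    steps = [theano.shared(numpy.zeros_like(p.get_value())) for p in parameters]

    # Function 1: compute the steps and store them in shared variables on the
    # same device as the parameters (GPU shared variables stay on the GPU,
    # so there is no host round-trip between the two functions).
    compute_steps = theano.function(
        inputs, cost,
        updates=[(s, lr * g) for s, g in zip(steps, grads)])

    # Function 2: subtract the stored steps, but skip the whole update if any
    # step contains NaN or inf, keeping the last valid parameters around.
    finite = tensor.constant(1, dtype='int8')
    for s in steps:
        finite = finite * (1 - tensor.any(tensor.isnan(s) + tensor.isinf(s)))
    apply_steps = theano.function(
        [], [],
        updates=[(p, tensor.switch(finite, p - s, p))
                 for p, s in zip(parameters, steps)])

    return compute_steps, apply_steps
```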

rizar added a commit that referenced this pull request on Feb 25, 2015: "RemoveNotFinite step rule"
@rizar rizar merged commit 1c2278b into mila-iqia:master Feb 25, 2015
@bartvm (Member) commented Feb 25, 2015

You want to compile two separate Theano functions, one which calculates the step and another one which applies it (using the output of the first as an input)? That sounds woefully inefficient.

@rizar (Contributor) commented Feb 25, 2015

... and another one that subtracts it. What is so inefficient about it? Computation-wise it is the same; memory-wise it is harder to tell.

@bartvm (Member) commented Feb 25, 2015

You could be stopping Theano from performing a whole bunch of optimizations, for example memory-wise as you mention (perhaps Theano likes updating parameters one by one, not needing all of the steps to be available simultaneously). But most importantly, wouldn't this force Theano to push the step from GPU to CPU (as output from the first function) and then back onto the GPU (as input to the second function)? That's a massive bandwidth issue.

@rizar (Contributor) commented Feb 25, 2015

Why would it be forced to do such an awful thing?

I assume that the step is stored in shared variables on the GPU, which guarantees that both functions will be performed without copying to main memory. In fact, this is exactly the way it is done in Groundhog.

Memory-wise it might indeed be inefficient.

@bartvm (Member) commented Feb 25, 2015

If you keep it on GPU, that could potentially be almost a doubling in memory, because you need to keep a step for each parameter in your model, while Theano could otherwise do them weight matrix by weight matrix. If that's really how it was done in GroundHog, I'm surprised the models fit on GPUs at all!

@rizar (Contributor) commented Feb 25, 2015

I doubt that Theano is smart enough to do that! But if it is not that smart, which is what I would expect, the inefficiency is there already.

I guess it is time to ask Fred what the recommended way in Theano is to skip an update if it would lead to corruption with NaNs...

@dwf (Contributor) commented Feb 25, 2015

Why not just switch on `any(isnan(update))`? Or take the disjunction across all proposed updates.
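A sketch of what that could look like, with an assumed helper name `guard_updates` operating on `(parameter, new_value)` pairs:

```python
from theano import tensor


def guard_updates(updates):
    """Given (parameter, new_value) update pairs, apply them only if no
    proposed value contains NaN (disjunction over all updates)."""
    bad = tensor.constant(0, dtype='int8')
    for _, new_value in updates:
        bad = tensor.or_(bad, tensor.any(tensor.isnan(new_value)))
    # If any update is bad, keep every parameter unchanged this step.
    return [(parameter, tensor.switch(bad, parameter, new_value))
            for parameter, new_value in updates]
```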


@bartvm (Member) commented Feb 25, 2015

I guess because there's no `any` operator in Theano, is there? I guess that `l2_norm` is a bit strange though; a simple reduce like `sum` would have the same effect.

@dwf (Contributor) commented Feb 25, 2015

`theano.tensor.basic.any` seems to exist for me.


@bartvm (Member) commented Feb 25, 2015

Ah, great, my bad! I was looking for it in the documentation with the other logical functions like `and_` and `or_`...

@bartvm (Member) commented Feb 25, 2015

Is #352 better?
