RemoveNotFinite step rule #343

Merged (1 commit) on Feb 25, 2015
Conversation

@memimo (Contributor) commented Feb 24, 2015

A simple trick to remove `nan` and `inf` from gradients. Thanks to @jych.
Not the best name; I can change it if anyone has a better suggestion.
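For reference, a minimal sketch of the kind of step rule being described, assuming Blocks' `StepRule` interface where `compute_step(parameter, previous_step)` returns a `(step, updates)` pair; the class name, the `scaler` argument, and the fallback to a scaled parameter are illustrative placeholders, not the merged code:

```python
from theano import tensor

from blocks.algorithms import StepRule


class RemoveNotFiniteSketch(StepRule):
    """Replace a step whose norm is not finite by a scaled parameter.

    Illustrative sketch only, not the merged implementation.
    """
    def __init__(self, scaler=0.1):
        self.scaler = scaler

    def compute_step(self, parameter, previous_step):
        norm = tensor.sqrt((previous_step ** 2).sum())
        not_finite = tensor.isnan(norm) + tensor.isinf(norm)
        # Fall back to a small multiple of the parameter when the proposed
        # step is corrupted; otherwise keep the step unchanged.
        step = tensor.switch(not_finite, self.scaler * parameter, previous_step)
        return step, []
```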



Review comment on:

    class RemoveNotFinite(StepRule):
        """If gradients norm is inf or nan.

Member: You'll need a blank line after this... Sorry, the docstring checker is picky!

@memimo force-pushed the notfinite branch 3 times, most recently from 7bad42f to 7115439 on February 24, 2015, 19:50.
Review comment on:

    class RemoveNotFinite(StepRule):
        """A step rule that replaces non-finite elements.

        Replaces non-finite elements (`inf` or `NaN`) in the step with

Member: Is this really what it does? If I read the code, it seems like it replaces the entire step with the scaled parameter, instead of doing it entry-wise (which is how I read this docstring).

memimo (author): Yeah, this is not correct. @dwf suggested this docstring. (For some odd reason, his comment was on my fork, not here.)
How about replacing the first line with:

    Replaces steps with non-finite norm (`inf` or `NaN`) with

Contributor: `tensor.switch` operates element-wise.
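To illustrate the point: `tensor.switch` is element-wise, but when the condition is a scalar (such as a check on the step's norm) it broadcasts over the whole tensor, so the entire step is either kept or replaced. A small sketch (variable names are made up):

```python
import numpy
import theano
from theano import tensor

step = tensor.vector('step')
fallback = tensor.vector('fallback')

# Scalar condition (a check on the step's norm): the condition broadcasts,
# so the whole step is either kept or replaced at once.
norm = tensor.sqrt((step ** 2).sum())
whole = theano.function(
    [step, fallback],
    tensor.switch(tensor.isnan(norm) + tensor.isinf(norm), fallback, step))

# Element-wise condition: only the offending entries are replaced.
entrywise = theano.function(
    [step, fallback], tensor.switch(tensor.isnan(step), fallback, step))

s = numpy.array([1.0, numpy.nan, 3.0], dtype=theano.config.floatX)
f = numpy.zeros(3, dtype=theano.config.floatX)
print(whole(s, f))      # [ 0.  0.  0.]  -- entire step replaced
print(entrywise(s, f))  # [ 1.  0.  3.]  -- only the NaN entry replaced
```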

Contributor: Oh, I misread. Sorry.

@rizar (Contributor) commented Feb 25, 2015

This overlaps with #162, but is different; I guess we should have both.

In more detail: I would also like to be able to avoid updating parameters when the step contains NaN, so that the last valid parameters remain available for analysis. For this I guess we will have to split the single Theano function of the DifferentiableCostMinimizer into two, one computing the step and another one subtracting it from the parameters, similarly to how it was done in Groundhog.
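A rough sketch of the split being proposed; the steps live in shared variables, one function fills them in, and a second applies them only when they are finite. The name `make_split_functions`, the plain SGD step, and the fixed learning rate are assumptions for illustration, not how Blocks or Groundhog actually structure this:

```python
import numpy
import theano
from theano import tensor


def make_split_functions(inputs, cost, parameters, learning_rate=0.1):
    """Build one function that computes steps into shared variables and a
    second one that applies them only when they are finite."""
    lr = tensor.constant(learning_rate, dtype=theano.config.floatX)
    grads = tensor.grad(cost, parameters)
    steps = [theano.shared(numpy.zeros_like(p.get_value())) for p in parameters]

    # Function 1: compute the steps and store them in shared variables on the
    # same device as the parameters (GPU shared variables stay on the GPU,
    # so there is no host round-trip between the two functions).
    compute_steps = theano.function(
        inputs, cost,
        updates=[(s, lr * g) for s, g in zip(steps, grads)])

    # Function 2: subtract the stored steps, but skip the whole update if any
    # step contains NaN or inf, keeping the last valid parameters around.
    finite = tensor.constant(1, dtype='int8')
    for s in steps:
        finite = finite * (1 - tensor.any(tensor.isnan(s) + tensor.isinf(s)))
    apply_steps = theano.function(
        [], [],
        updates=[(p, tensor.switch(finite, p - s, p))
                 for p, s in zip(parameters, steps)])

    return compute_steps, apply_steps
```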

rizar added a commit that referenced this pull request on Feb 25, 2015: "RemoveNotFinite step rule"
@rizar rizar merged commit 1c2278b into mila-iqia:master Feb 25, 2015
@bartvm (Member) commented Feb 25, 2015

You want to compile two separate Theano functions, one which calculates the step and another one which applies it (using the output of the first as an input)? That sounds woefully inefficient.

@rizar (Contributor) commented Feb 25, 2015

... and another one that subtracts it. What is so inefficient about it? Computation-wise it is the same; memory-wise it is harder to tell.

@bartvm (Member) commented Feb 25, 2015

You could be stopping Theano from performing a whole bunch of optimizations, for example memory-wise as you mention (perhaps Theano likes updating parameters one by one, not needing all of the steps to be available simultaneously). But most importantly, wouldn't this force Theano to push the step from GPU to CPU (as output from the first function) and then back onto the GPU (as input to the second function)? That's a massive bandwidth issue.

@rizar (Contributor) commented Feb 25, 2015

Why would it be forced to do such an awful thing?

I assume that the step is stored in shared variables on the GPU, which guarantees that both functions will be performed without copying to main memory. In fact, this is exactly the way it is done in Groundhog.

Memory-wise it might indeed be inefficient.

@bartvm (Member) commented Feb 25, 2015

If you keep it on GPU, that could potentially be almost a doubling in memory, because you need to keep a step for each parameter in your model, while Theano could otherwise do them weight matrix by weight matrix. If that's really how it was done in GroundHog, I'm surprised the models fit on GPUs at all!

@rizar (Contributor) commented Feb 25, 2015

I doubt that Theano is smart enough to do that! But if it is not that smart, which is what I would expect, the inefficiency is there already.

I guess it is time to ask Fred what the recommended way in Theano is to skip an update if it would lead to corruption with NaNs...

@dwf (Contributor) commented Feb 25, 2015

Why not just switch on `any(isnan(update))`? Or take the disjunction across all proposed updates.
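A sketch of what that could look like, with an assumed helper name `guard_updates` operating on `(parameter, new_value)` pairs:

```python
from theano import tensor


def guard_updates(updates):
    """Given (parameter, new_value) update pairs, apply them only if no
    proposed value contains NaN (disjunction over all updates)."""
    bad = tensor.constant(0, dtype='int8')
    for _, new_value in updates:
        bad = tensor.or_(bad, tensor.any(tensor.isnan(new_value)))
    # If any update is bad, keep every parameter unchanged this step.
    return [(parameter, tensor.switch(bad, parameter, new_value))
            for parameter, new_value in updates]
```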


@bartvm (Member) commented Feb 25, 2015

I guess because there's no `any` operator in Theano, is there? I guess that `l2_norm` is a bit strange though; a simple reduce like `sum` would have the same effect.

@dwf (Contributor) commented Feb 25, 2015

`theano.tensor.basic.any` seems to exist for me.


@bartvm (Member) commented Feb 25, 2015

Ah, great, my bad! I was looking for it in the documentation with the other logical functions like `and_` and `or_`...

@bartvm (Member) commented Feb 25, 2015

Is #352 better?
