dropout in place incompatible with max pooling #117

Closed
mavenlin opened this issue Feb 16, 2014 · 10 comments

@mavenlin
Contributor

It took me several hours to finally track down this problem.
In my own implementation of dropout in cuda-convnet, I randomly drop half of the nodes at training time and multiply by one half at test time. In Caffe, the kept nodes are multiplied by two at training time and nothing is done at test time.
The two approaches seem equivalent, but they are not when dropout is applied in place on a max pooling layer: the backward pass of max pooling needs the layer's output, and in-place dropout disrupts it by scaling it by a factor of two. On the dropout side, this can be resolved by instead multiplying by one half at test time.

Any ideas on how to prevent in-place operation when the data is needed in the backward pass? In cuda-convnet, there is a useAct flag that indicates the layer's activation data will be needed later and should not be overwritten.
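
To make the two conventions concrete, here is a minimal sketch (not Caffe's or cuda-convnet's actual code). With drop_prob = 0.5, variant (a) scales by one half at test time and variant (b) scales the kept units by two at training time; variant (b) rescales the layer's output during training, so when it runs in place on a pooling layer's top blob, the stored values no longer equal the per-window maxima that the pooling backward pass compares against.

```cpp
#include <cstdlib>
#include <vector>

void dropout_train(std::vector<float>& data, float drop_prob, bool inverted) {
  const float keep_prob = 1.0f - drop_prob;
  for (float& x : data) {
    if (static_cast<float>(std::rand()) / RAND_MAX < drop_prob) {
      x = 0.0f;                  // dropped unit
    } else if (inverted) {
      x /= keep_prob;            // (b) Caffe-style: scale kept units at train time
    }
  }
}

void dropout_test(std::vector<float>& data, float drop_prob, bool inverted) {
  if (!inverted) {
    const float keep_prob = 1.0f - drop_prob;
    for (float& x : data) x *= keep_prob;  // (a) cuda-convnet-style: scale at test time
  }
  // (b): the test-time forward pass is the identity.
}
```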

@kloudkl
Contributor

kloudkl commented Feb 17, 2014

@dnouri implemented dropout in cuda-convnet with a mask matrix, which drops out units while keeping the underlying data intact.

@kloudkl
Contributor

kloudkl commented Feb 17, 2014

Sorry, the above implementation is the same as yours. But in practice, dropout is usually applied to the fully connected layers. Is there any special reason to apply it to the max pooling layer?

@mavenlin
Contributor Author

Yes, in my paper Network in Network, dropout is applied to the max pooling layer.
Dropout is also applied to the max pooling layer in convolutional maxout networks; one example is here: https://github.com/lisa-lab/pylearn2/blob/master/pylearn2/scripts/papers/maxout/cifar10.yaml

@mavenlin
Contributor Author

We can keep the data intact by allocating a top blob different from the bottom, though there are two disadvantages:

  1. It uses extra memory.
  2. I didn't realize it could cause a problem, so I just did it in place.

@Yangqing
Member

@mavenlin if we need to keep both versions, having a separate copy is probably necessary, so I wouldn't worry too much about the extra memory. It would indeed be helpful to have a mechanism to check whether a blob can be used in in-place operations, probably in net.cpp when we construct the layer.
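
Something along these lines could let net.cpp refuse unsafe in-place setups; the names here (LayerInfo, backward_needs_top_data, InPlaceAllowed) are purely illustrative, not existing Caffe API.

```cpp
// Hypothetical sketch: when net.cpp wires a layer in place (its top blob is
// its bottom blob), it first asks the layer that produced that blob whether
// its Backward pass still needs the data stored there.
struct LayerInfo {
  bool backward_needs_top_data;  // e.g. true for max pooling without a mask
};

// The in-place consumer overwrites the shared blob in its Forward pass, so
// this is only safe if the producer never reads that data again.
bool InPlaceAllowed(const LayerInfo& producer) {
  return !producer.backward_needs_top_data;
}
```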

@sguada
Contributor

sguada commented Feb 17, 2014

I think that could be fixed by changing the max pooling layer, which shouldn't rely on comparing the max values for backprop, since that can introduce errors: for example, if two inputs have the same max value, then both would propagate the gradient in the backward pass. If max pooling relied on a mask, similar to the dropout layer, there would be no problem.

Sergio
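
For illustration, here is a simplified 1-D version of a comparison-based backward pass (stride equal to kernel size; this is not Caffe's pooling code). Each top gradient is routed to every bottom element whose value equals the stored top value, so ties send the gradient to several inputs, and it silently breaks if another layer has rescaled top_data in place.

```cpp
#include <vector>

void MaxPoolBackwardByComparison(const std::vector<float>& bottom_data,
                                 const std::vector<float>& top_data,
                                 const std::vector<float>& top_diff,
                                 int kernel,
                                 std::vector<float>* bottom_diff) {
  bottom_diff->assign(bottom_data.size(), 0.0f);
  for (size_t t = 0; t < top_data.size(); ++t) {
    for (int k = 0; k < kernel; ++k) {
      const size_t b = t * kernel + k;
      if (b < bottom_data.size() && bottom_data[b] == top_data[t]) {
        (*bottom_diff)[b] += top_diff[t];  // every tied element receives the gradient
      }
    }
  }
}
```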


@Yangqing
Member

That is true. Explicitly storing the indices ("sufficient statistics" for the mask) during the forward pass would help (and would also increase speed).

Yangqing
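
A minimal sketch of that index-based alternative (again simplified 1-D with stride equal to kernel size, and not the code that ended up in #162): the forward pass records the argmax of each window, and the backward pass routes each top gradient to exactly one bottom element, independent of what is currently stored in the top blob.

```cpp
#include <vector>

void MaxPoolForwardWithMask(const std::vector<float>& bottom_data, int kernel,
                            std::vector<float>* top_data,
                            std::vector<int>* max_idx) {
  top_data->clear();
  max_idx->clear();
  for (size_t start = 0; start + kernel <= bottom_data.size(); start += kernel) {
    size_t best = start;
    for (int k = 1; k < kernel; ++k) {
      if (bottom_data[start + k] > bottom_data[best]) best = start + k;
    }
    top_data->push_back(bottom_data[best]);
    max_idx->push_back(static_cast<int>(best));  // remember which input won
  }
}

void MaxPoolBackwardWithMask(const std::vector<float>& top_diff,
                             const std::vector<int>& max_idx,
                             size_t bottom_size,
                             std::vector<float>* bottom_diff) {
  bottom_diff->assign(bottom_size, 0.0f);
  for (size_t t = 0; t < top_diff.size(); ++t) {
    (*bottom_diff)[max_idx[t]] += top_diff[t];  // exactly one winner per window
  }
}
```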


@sguada
Contributor

sguada commented Feb 17, 2014

I will work on that.
Right now, MaxPoolBackward accounts for 3.84% of the time, while MaxPoolForward accounts for only 0.79% of the time.

Sergio


@sguada
Contributor

sguada commented Feb 26, 2014

@mavenlin take a look at #162 and let me know if it fixes the problem. Comments are welcome.

@Yangqing I stored the indices, but in the GPU backward pass I still needed to do extra comparisons to avoid races.
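
For context, a gather-style arrangement along these lines is one way such races are avoided (plain C++ here; on the GPU the body of the outer loop would run as one thread per bottom element). This is only an illustration of why extra comparisons appear, not necessarily what #162 does: a scatter-style backward, with one thread per top element writing to its argmax, can race when pooling windows overlap, whereas gathering per bottom element means only one thread writes each output, at the cost of re-checking the stored index for every window covering that element.

```cpp
#include <vector>

void MaxPoolBackwardGather(const std::vector<float>& top_diff,
                           const std::vector<int>& max_idx,
                           int kernel, int stride, size_t bottom_size,
                           std::vector<float>* bottom_diff) {
  bottom_diff->assign(bottom_size, 0.0f);
  for (size_t b = 0; b < bottom_size; ++b) {      // one GPU thread per bottom element
    // For simplicity this scans all windows; a real kernel would restrict the
    // loop to the windows that can cover position b.
    for (size_t t = 0; t < top_diff.size(); ++t) {
      const size_t window_start = t * stride;
      const bool covers_b = window_start <= b && b < window_start + kernel;
      if (covers_b && max_idx[t] == static_cast<int>(b)) {  // the extra comparison
        (*bottom_diff)[b] += top_diff[t];         // only this thread writes b
      }
    }
  }
}
```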

@shelhamer
Member

Addressed by #162.
