dropout in place incompatible with max pooling #117
Comments
@dnouri implemented dropout in cuda-convnet with a mask matrix, which zeroes out activations in the output while keeping the input data intact.
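A rough sketch of that mask-matrix approach (illustrative numpy, not the actual cuda-convnet code): the mask is sampled in the forward pass, the masked result is written to a separate array so the input survives, and the same mask routes the gradient in the backward pass.

```python
import numpy as np

def dropout_forward(acts, drop_prob=0.5, rng=np.random):
    # Sample a 0/1 keep mask; `acts` itself is never modified.
    mask = (rng.rand(*acts.shape) >= drop_prob).astype(acts.dtype)
    out = acts * mask              # written to a new array, input kept intact
    return out, mask

def dropout_backward(top_diff, mask):
    # Gradient flows only through the units that were kept.
    return top_diff * mask
```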
Sorry, the above implementation is the same as yours. But in practice dropout is usually applied to the fully connected layers. Is there any special reason to apply it to the max pooling layer?
Yes, in my paper Network in Network, dropout is applied to the max pooling layer.
We can keep the data intact by allocating a top blob different from the bottom, though. Two disadvantages here:
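For reference, the trade-off between the two wirings as a small numpy illustration (not Caffe's actual blob machinery): a separate top buffer keeps the bottom data available for other layers' backward passes, at the cost of one extra activation-sized buffer.

```python
import numpy as np

def apply_dropout_mask(bottom, mask, top):
    # `top` may alias `bottom` (in-place) or be a separate buffer.
    np.multiply(bottom, mask, out=top)

x = np.random.randn(4, 6).astype(np.float32)
mask = (np.random.rand(*x.shape) >= 0.5).astype(np.float32)

y = np.empty_like(x)             # separate top: costs extra memory, x stays intact
apply_dropout_mask(x, mask, y)

apply_dropout_mask(x, mask, x)   # in-place: saves memory, but x is overwritten
```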
@mavenlin if we need to keep both versions, having a separate copy would probably be necessary, so I won't worry about the extra memory too much. It would indeed be helpful to have a mechanism to check whether a blob can be used in in-place operations, probably in net.cpp when we construct the layers.
I think that could be fixed by changing the max pooling layer, which … -- Sergio
That is true. Explicitly storing the indices ("sufficient statistics" for …) -- Yangqing
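That idea, sketched in numpy (a simplification, not the actual Caffe kernels): the forward pass records the argmax of each pooling window, and the backward pass routes gradients with those stored indices, so it no longer reads the layer's output at all and in-place layers on top of it become harmless.

```python
import numpy as np

def max_pool_forward_1d(bottom, k):
    # Non-overlapping 1-D max pooling that also returns the winning indices.
    windows = bottom.reshape(-1, k)
    idx = windows.argmax(axis=1)                    # the "sufficient statistics"
    top = windows[np.arange(len(idx)), idx]
    return top, idx

def max_pool_backward_with_indices(top_diff, idx, k):
    # Backward only needs the stored indices, not the (possibly overwritten) output.
    bottom_diff = np.zeros((len(idx), k), dtype=top_diff.dtype)
    bottom_diff[np.arange(len(idx)), idx] = top_diff
    return bottom_diff.reshape(-1)
```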
I will work on that. -- Sergio
Addressed by #162.
It took me several hours to finally find this problem.
In my own implementation of dropout in cuda-convnet, I randomly drop half of the nodes at training time and multiply the activations by one half at test time. In Caffe, the kept nodes are multiplied by two during training, and nothing is done at test time.
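Both conventions, as a numpy sketch of the general technique (not either code base verbatim): with a drop probability of 0.5 they produce the same expected activations, differing only in where the scaling happens.

```python
import numpy as np

def dropout_scale_at_test(acts, train, drop_prob=0.5, rng=np.random):
    # cuda-convnet style: plain mask at training time, scale by (1 - drop_prob) at test time.
    if train:
        return acts * (rng.rand(*acts.shape) >= drop_prob)
    return acts * (1.0 - drop_prob)

def dropout_scale_at_train(acts, train, drop_prob=0.5, rng=np.random):
    # Caffe style ("inverted" dropout): mask and scale by 1/(1 - drop_prob) at training time.
    if train:
        return acts * (rng.rand(*acts.shape) >= drop_prob) / (1.0 - drop_prob)
    return acts
```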
The two approaches seem the same, but they are not when dropout is applied in place on a max pooling layer: the backward pass of max pooling needs the layer's output, and in-place dropout corrupts that output by multiplying it by a factor of two. For dropout itself, this can be avoided by switching to the other convention and multiplying by one half at test time instead.
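To make the failure concrete, here is a max pooling backward pass that re-derives the winning input by comparing against the stored output, which is one common way such a kernel "needs its output" (an illustrative numpy sketch, not the actual Caffe code). Once in-place dropout has doubled the output, no input matches it any more and the gradient is silently lost.

```python
import numpy as np

def max_pool_backward_from_output(bottom, top, top_diff, k):
    # Route each output gradient to the input element that equals the stored output.
    windows = bottom.reshape(-1, k)
    winners = (windows == top[:, None])       # breaks if `top` was rescaled in place
    return (winners * top_diff[:, None]).reshape(-1)

x = np.array([1.0, 3.0, 2.0, 5.0], dtype=np.float32)
top = x.reshape(-1, 2).max(axis=1)            # [3., 5.]
top *= 2                                      # what train-time dropout does to the shared blob
print(max_pool_backward_from_output(x, top, np.ones(2, dtype=np.float32), 2))
# -> [0. 0. 0. 0.]: all gradients vanish
```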
Are there any ideas for preventing in-place operation when a layer's data is needed in the backward pass? In cuda-convnet there is a useAct flag that indicates the activation data is needed by the layer later and should not be overwritten.
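One possible shape for such a check (a hypothetical sketch; `needs_top_for_backward` and these class names are invented for illustration, not an actual Caffe or cuda-convnet API): each layer declares whether its backward pass reads its own output, and the net refuses to wire a following layer in place on top of it.

```python
class Layer:
    # Hypothetical flag, similar in spirit to cuda-convnet's useAct: True means this
    # layer's output must survive unchanged until its backward pass has run.
    needs_top_for_backward = False

class MaxPoolingLayer(Layer):
    needs_top_for_backward = True    # backward reads the pooled output

class DropoutLayer(Layer):
    pass

def check_in_place(producer, consumer):
    # Reject in-place wiring that would clobber data the producer still needs
    # for its backward pass.
    if producer.needs_top_for_backward:
        raise ValueError(f"{type(consumer).__name__} must not run in place on "
                         f"{type(producer).__name__}'s output")

try:
    check_in_place(MaxPoolingLayer(), DropoutLayer())
except ValueError as e:
    print(e)   # DropoutLayer must not run in place on MaxPoolingLayer's output
```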