Conv gemm non-square kernel support #2023
Conversation
…sample (x, y) values and all kernel, batch, and image sizes compatible on the GPU. @nouiz: Can you please run the tests once again just to be doubly sure? I think it works correctly.
Looks good to me, and the tests pass on my office computer:

git fetch upstream pull/2023/head:corrmm
git checkout corrmm
cd theano/sandbox/cuda/tests
nosetests test_conv_cuda_ndarray.py

Using gpu device 0: GeForce GT 640
.............
----------------------------------------------------------------------
Ran 13 tests in 839.078s

OK

You should just change the docstring of
I did a PR to this PR with some doc fixes. Can you review it?
I've also sent you a PR for this PR to have the Jan commit indicate that we didn't get out correctly; otherwise, the tests
Ah, I see. @stencilman: I've sent you a pull request to address Fred's suggestion.

@nouiz: Everything passes now!
@@ -7,6 +7,7 @@
 import numpy
 import scipy
This is not OK. Scipy is an optional dependency for Theano, so the tests still need to work (i.e., not crash) when it isn't installed.
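A common pattern for this is to guard the import and skip the reference check at test time. This is only a sketch: the function name and the scipy comparison are illustrative, not the actual Theano test code, and it uses the standard library's SkipTest (which nose also understands) rather than a nose-specific import.

```python
# Sketch: make scipy optional so the test module still imports and runs
# when scipy is not installed. Names here are illustrative.
try:
    import scipy.signal  # noqa: F401
    imported_scipy = True
except ImportError:
    imported_scipy = False


def test_conv_matches_reference():
    if not imported_scipy:
        # Raising SkipTest marks the test as skipped instead of letting
        # the module crash with an ImportError at collection time.
        from unittest import SkipTest
        raise SkipTest("scipy is not available")
    # ... compare the GPU convolution against scipy.signal.convolve2d ...
```

This way the test suite degrades gracefully: with scipy installed the reference check runs, without it the test is reported as skipped.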
@stencilman: I've sent you a PR, hopefully the last one, to clean up the tests.

@f0k: Did I merge it right, or did I screw it up?

@stencilman: Hmm, you merged the wrong branch of mine. I'll instruct you how to fix it in a minute.
So the pull request I meant was this one: https://github.com/stencilman/Theano-1/pull/7

git checkout conv_gemm
git reset --hard 4d4c928
git push --force origin conv_gemm

This way the PR will be reset to the state of this morning. Afterwards, go to https://github.com/stencilman/Theano-1/pull/7 and click the green merge button to accept my PR.
Cleanup CUDA convolution tests
Thanks a lot @f0k!! :-) Also, do you mind sharing your findings regarding the speed comparison with torch7? I would be very grateful. Thank you!

I don't have torch7 installed and the installation instructions scare me. My updated Theano benchmark is merged into convnet-benchmarks, though; please feel free to try it and report back!
Here are the results of the benchmark I get on a Titan Black. As you see, we are very competitive, except that for some reason our bprop wrt. weights is slow (up to 3x). Do you have any clue why?

Torch7:

Ours:
Cool, thanks for the direct comparison! Our bprop wrt. weights uses the same algorithm as the fprop (both do a valid convolution, and we currently only have a single gemm-based algorithm for that). Caffe uses a slightly different variant for the bprop wrt. weights, and it seems this is faster. The results indicate that we should really split

PS: Let's hope someone has both mercy on us and time to merge this PR soon so we can go on :)
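For reference, the gemm-based trick behind corrMM can be sketched in numpy (the helper name is illustrative; the real op works on batched 4-D tensors and calls cublasSgemm): every kh-by-kw patch of the input is unrolled into a column, after which the valid correlation is a single matrix product.

```python
import numpy as np

def im2col_valid(img, kh, kw):
    # Unroll every kh-by-kw patch of a 2-D image into one column, so the
    # valid correlation reduces to a single gemm (matrix multiply).
    h, w = img.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = img[i:i + kh, j:j + kw].ravel()
    return cols

img = np.arange(16.0).reshape(4, 4)
kern = np.ones((2, 2))
# One gemm: (1 x kh*kw) times (kh*kw x oh*ow), reshaped to the output map.
out = kern.ravel().dot(im2col_valid(img, 2, 2)).reshape(3, 3)
# out[0, 0] is the sum of the top-left 2x2 patch: 0 + 1 + 4 + 5 = 10
```

The same unrolled matrix is reused for the gradients, which is why the fprop and both bprops currently share this one algorithm.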
I see, thanks for your explanation. Yes, please merge this PR! @nouiz is perhaps away until Friday, and then things should move faster. Hmm, so what do you think is the best and fastest way to make it as fast? I am happy to write code for it. I need the conv to be at least as fast as torch7 to be able to use Theano.
Well, when this PR is merged, I will rebase my other PR and then we can either work from there or start a second attempt. I would redefine the

Thanks, I'll let you know how you can help!
What about the FFT-based convolution, then? In convnet-benchmarks, it is faster than Torch7 for all configurations except L1.

My GPU is occupied with other tests right now. I'll give it a spin after that and merge if the tests pass (should be later tonight or tomorrow).

You have to install the development version. The last release doesn't have the necessary bindings.

Any updates on merging this PR? Thanks!
Conv gemm non-square kernel support
Just noticed that the tests have been successful.

Thanks a lot @abergeron!

Great, thank you!
Below I attach the convnet-benchmarks results for FFT vs. corrMM. However, when I try to run it for my project, it throws a scikits.cuda.cufft.cufftAllocFailed. So FFT is not practical to use. And corrMM is still so slow compared to torch7 :-(
I didn't catch up on everything that was done while I was away. I have one question. Does one of you know what the difference is in the

As both implementations are valid convolutions, we could do multiple ops, but
Hi @nouiz: Torch does an updateOutput (which is a valid conv) for the fprop, but an accGradParameters when updating the weights (also a valid conv). If you look at the code (https://github.com/torch/cunn/blob/master/SpatialConvolutionMM.cu) you will see that they are different, and perhaps that's why the grad wrt. the weights is faster for torch. Does that answer your question? Yes, perhaps you can choose a different code path depending on the sizes, but you guys (@f0k) will know this better.
@nouiz: The difference is that for the bprop wrt. weights, caffe iterates over the batch and computes a number of dot products that are accumulated into a weight gradient (by setting both alpha and beta to 1 in the gemm call). I have it almost working now; will push to #2033 soon. My plan was to first have a
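The accumulation described here can be sketched with numpy stand-ins for the gemm calls (all shapes and names are illustrative, not caffe's actual code): each batch item contributes one dot product, and setting alpha = beta = 1 in gemm corresponds to "+=" on the same output buffer.

```python
import numpy as np

# Illustrative shapes: kdim would be channels * kh * kw, odim = oh * ow.
batch, kdim, odim = 4, 6, 9
rng = np.random.RandomState(0)
cols = rng.randn(batch, kdim, odim)   # im2col'ed input, one slice per image
grad_out = rng.randn(batch, odim)     # gradient wrt. each output map

grad_w = np.zeros(kdim)
for b in range(batch):
    # One gemm per batch item; with alpha = beta = 1 the result is
    # accumulated into grad_w instead of overwriting it.
    grad_w += cols[b].dot(grad_out[b])
```

A single valid convolution over the whole batch computes the same quantity; the per-item loop just maps directly onto repeated accumulating gemm calls.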
@f0k: Awesome!! Can't wait, thank you so much!! 👍
This adds support for non-square kernel and stride sizes. It passes all tests.
Features:
This finishes some TODOs in gh-2015.