
Conv gemm non-square kernel support #2023

Merged
merged 11 commits into Theano:master on Aug 14, 2014

Conversation

stencilman
Contributor

This adds support for non-square kernel and stride sizes. It passes all tests.

Features:

  • All filter sizes (including non-square) supported.
  • All stride sizes supported.

This finishes some TODOs in gh-2015.

…sample (x, y) values and all kernel, batch, and image sizes compatible on GPU. @nouiz: Can you please run the tests once again, just to be doubly sure? I think it works correctly.
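For illustration, here is a minimal sketch of what this enables; it assumes the GpuCorrMM op from theano.sandbox.cuda.blas with the border_mode/subsample constructor arguments discussed below, and the shapes are made up:

import theano
import theano.tensor as T
from theano.sandbox.cuda.basic_ops import gpu_contiguous
from theano.sandbox.cuda.blas import GpuCorrMM

inputs = T.ftensor4('inputs')    # (batch, channels, rows, cols)
filters = T.ftensor4('filters')  # e.g. (nfilters, channels, 5, 3): non-square
# subsample=(2, 1): stride 2 vertically, stride 1 horizontally
op = GpuCorrMM(border_mode='valid', subsample=(2, 1))
# gpu_contiguous guards against non-contiguous inputs
out = op(gpu_contiguous(inputs), gpu_contiguous(filters))
f = theano.function([inputs, filters], out)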
@f0k
Contributor

f0k commented Aug 8, 2014

Looks good to me, and the tests pass on my office computer:

git fetch upstream pull/2023/head:corrmm
git checkout corrmm
cd theano/sandbox/cuda/tests
nosetests test_conv_cuda_ndarray.py
Using gpu device 0: GeForce GT 640
.............
----------------------------------------------------------------------
Ran 13 tests in 839.078s

OK

You should just change the docstring of GpuCorrMM.__init__(); it still says strides are unsupported.
/Edit: And the local_conv_gemm() optimizer should skip the node.op.subsample == (1, 1) test.
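For clarity, the optimizer change I mean looks roughly like this (a sketch only; the real local_conv_gemm in theano/sandbox/cuda/opt.py may differ in details, and I only show the valid case):

from theano.gof import local_optimizer
from theano.sandbox.cuda.basic_ops import gpu_contiguous
from theano.sandbox.cuda.blas import GpuConv, GpuCorrMM

@local_optimizer([GpuConv])
def local_conv_gemm(node):
    if isinstance(node.op, GpuConv) and node.op.border_mode == 'valid':
        # the old guard `node.op.subsample == (1, 1)` is gone:
        # GpuCorrMM now handles arbitrary strides itself
        img, kern = node.inputs
        return [GpuCorrMM('valid', node.op.subsample)(
            gpu_contiguous(img), gpu_contiguous(kern))]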

@nouiz
Member

nouiz commented Aug 8, 2014

I did a PR to this PR with some doc fixes. Can you review it? If you are good with that, merge it and I'll merge this PR.


@stencilman
Contributor Author

Thanks for the doc changes @nouiz, and thanks for testing it @f0k! The doc changes look good to me.

@f0k
Contributor

f0k commented Aug 8, 2014

I've also sent you a PR for this PR to have the conv_gemm optimizer support strided convolution.

@nouiz
Member

nouiz commented Aug 8, 2014

Jan's commit indicates that we didn't get it right; otherwise, the tests would have failed. The problem is that test_valid and test_full don't try subsampling. Can you modify test_subsample like test_valid to test those cases?

@f0k
Contributor

f0k commented Aug 11, 2014

Ah, I see, test_valid and test_full were always meant to test the new convolution only? They should probably be named differently then.

@stencilman: I've sent you a pull request to address Fred's suggestion.

@f0k
Contributor

f0k commented Aug 11, 2014

@nouiz: Everything passes now!

@@ -7,6 +7,7 @@


import numpy
import scipy
This is not OK. SciPy is an optional dependency for Theano, so the tests still need to work (i.e., not crash) when it isn't there.
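The usual guard for that looks like this (a sketch; the test name is a placeholder):

try:
    import scipy
    imported_scipy = True
except ImportError:
    imported_scipy = False

from nose.plugins.skip import SkipTest

def test_something_using_scipy():
    if not imported_scipy:
        raise SkipTest('scipy is not available, skipping this test')
    # ... the scipy-dependent checks go here ...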

@f0k
Contributor

f0k commented Aug 13, 2014

@stencilman: I've sent you a PR, hopefully the last one, to clean up the tests: test_valid, test_full and test_subsample now test the original convolution code; test_gemm_valid, test_gemm_full and test_gemm_subsample reuse these test suites to test the gemm-based convolution code (inserted via graph optimization); and test_gemm_directly tests the gemm-based convolution by manually constructing a graph with it. test_subsample fails on my office machine, but the other six test suites pass. I'll have a look at it...
/Edit: Okay, fixed it. You can merge my PR now, and then hopefully your PR can be merged into Theano.
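Roughly, the reuse works like this (a sketch; test_valid here stands in for the real suite, and I assume the gemm optimization is registered under the 'conv_gemm' tag):

import theano

def test_valid(mode=None):
    # stand-in: the real suite runs the valid-convolution checks,
    # compiling with the given mode
    pass

def gemm_mode():
    # a mode whose optimizer replaces GpuConv with the gemm-based op
    return theano.compile.get_default_mode().including('conv_gemm')

def test_gemm_valid():
    test_valid(mode=gemm_mode())  # same checks, gemm ops inserted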

@stencilman
Contributor Author

@f0k: Did I merge it right, or did I screw it up?

@f0k
Contributor

f0k commented Aug 13, 2014

@stencilman: Hmm, you merged the wrong branch of mine. I'll instruct you how to fix it in a minute.

@f0k
Contributor

f0k commented Aug 13, 2014

So the pull request I meant was this one: https://github.com/stencilman/Theano-1/pull/7
In your checkout, please do the following:

git checkout conv_gemm
git reset --hard 4d4c928
git push --force origin conv_gemm

This way the PR will be reset to the state of this morning. Afterwards, go to https://github.com/stencilman/Theano-1/pull/7 and click the green merge button to accept my PR.

Cleanup CUDA convolution tests
@stencilman
Contributor Author

Thanks a lot @f0k!! :-)

Also, do you mind sharing your findings regarding speed comparison with torch7? I would be very grateful. Thank you!

@f0k
Contributor

f0k commented Aug 13, 2014

I don't have torch7 installed, and the installation instructions scare me. My updated Theano benchmark is merged into convnet-benchmarks, though; please feel free to try it and report back!

@stencilman
Contributor Author

Here are the results of the benchmark I get on a Titan Black. As you can see, we are very competitive, except that for some reason our bprop wrt. weights is slow (up to 3x). Do you have any clue why?

Torch7:

CONFIG: input = 3x128x128 * ker = 3x96x11x11 (bs = 128, stride = 1)
SpatialConvolutionMM:updateOutput(): (tm = 0.11015701293945)
SpatialConvolutionMM:updateGradInput(): (tm = 0.099327743053436)
SpatialConvoltionMM:accGradParameters(): (tm = 0.22154402732849)

CONFIG: input = 64x64x64 * ker = 64x128x9x9 (bs = 128, stride = 1)
SpatialConvolutionMM:updateOutput(): (tm = 0.24804699420929)
SpatialConvolutionMM:updateGradInput(): (tm = 0.3508819937706)
SpatialConvoltionMM:accGradParameters(): (tm = 0.39958328008652)

CONFIG: input = 128x32x32 * ker = 128x128x9x9 (bs = 128, stride = 1)
SpatialConvolutionMM:updateOutput(): (tm = 0.1720854640007)
SpatialConvolutionMM:updateGradInput(): (tm = 0.14392179250717)
SpatialConvoltionMM:accGradParameters(): (tm = 0.15553104877472)

Ours

CONFIG: input = 3 x 128 x 128 * ker = 3 x 96 x 11 x 11 ( bs = 128 , stride = 1 )
(experimental) theano...blas.CorrMM fprop: 1171.54006131 GFLOP/s ( tm = 0.106029006958 )
(experimental) theano...blas.CorrMM bprop inputs: 0.0 GFLOP/s ( tm = 0.0922187194824 )
(experimental) theano...blas.CorrMM bprop weights: 0.0 GFLOP/s ( tm = 0.278021148682 )

CONFIG: input = 64 x 64 x 64 * ker = 64 x 128 x 9 x 9 ( bs = 128 , stride = 1 )
(experimental) theano...blas.CorrMM fprop: 2329.91436963 GFLOP/s ( tm = 0.228639373779 )
(experimental) theano...blas.CorrMM bprop inputs: 0.0 GFLOP/s ( tm = 0.351494049072 )
(experimental) theano...blas.CorrMM bprop weights: 0.0 GFLOP/s ( tm = 0.958748046875 )

CONFIG: input = 128 x 32 x 32 * ker = 128 x 128 x 9 x 9 ( bs = 128 , stride = 1 )
(experimental) theano...blas.CorrMM fprop: 1035.75263494 GFLOP/s ( tm = 0.188934539795 )
(experimental) theano...blas.CorrMM bprop inputs: 0.0 GFLOP/s ( tm = 0.152022628784 )
(experimental) theano...blas.CorrMM bprop weights: 0.0 GFLOP/s ( tm = 0.40099987793 )

@f0k
Contributor

f0k commented Aug 13, 2014

Cool, thanks for the direct comparison! Our bprop wrt. weights uses the same algorithm as the fprop (both do a valid convolution, and we currently only have a single gemm-based algorithm for that). Caffe uses a slightly different variant for the bprop wrt. weights, and it seems this is faster.

The results indicate that we should really split GpuCorrMM into three ops (answering my question in #2033): the forward pass for valid correlation, GpuCorrMM_gradInput for the gradient wrt. inputs (a full convolution), and GpuCorrMM_gradWeights for the gradient wrt. weights (a valid... convolution, if I see correctly). If we then give GpuCorrMM a grad() method, it should perform similarly to Torch. Adapting the optimizer to choose the best replacement for any GpuConv ops it stumbles upon (to have it work for models using the standard conv2d() instead of directly using GpuCorrMM) might be tricky. My ultimate goal would be a meta-optimizer that tries the different variants we have, including the FFT-based ops, and then chooses the best-performing replacement for each individual GpuConv.
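To be explicit about the three computations (a NumPy sketch, single channel and single filter; the op names above are only the proposal):

import numpy as np

def corr_valid(img, w):
    # valid 2-D cross-correlation: what the forward pass computes
    kh, kw = w.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * w)
    return out

rng = np.random.RandomState(0)
img, w = rng.randn(8, 9), rng.randn(3, 4)  # non-square on purpose
top = rng.randn(6, 6)                      # gradient wrt the output

# gradient wrt weights: a valid correlation of the input with top grad
d_w = corr_valid(img, top)                 # shape (3, 4), like w

# gradient wrt inputs: a full convolution of top grad with the kernel,
# i.e. valid correlation of the padded top grad with the flipped kernel
padded = np.pad(top, ((2, 2), (3, 3)), mode='constant')
d_img = corr_valid(padded, w[::-1, ::-1])  # shape (8, 9), like img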

PS: Let's hope someone has both mercy with us and time to merge this PR soon so we can go on :)

@stencilman
Contributor Author

I see, thanks for your explanation.

Yes, please merge this PR! @nouiz is perhaps away until Friday, and then things should move faster.

Hmm, so what do you think is the best and fastest way to make it that fast? I am happy to write code for it. I need the conv to be at least as fast as torch7 to be able to use Theano.

@f0k
Contributor

f0k commented Aug 13, 2014

Hmm, so what do you think is the best and fastest way to make it that fast?

Well, when this PR is merged, I will rebase my other PR and then we can either work from there or start a second attempt. I would redefine the mode parameter in the CUDA code not to switch between valid and full convolution, but to switch between forward pass, bprop wrt. inputs and bprop wrt. weights, with the three matrix arguments always referring to the layer input, the filters and the layer output (instead of swapping input and output for full convolution, which made everything harder). There would be three corresponding Theano ops, the first of which would use the other two to define its gradient (similar to the cuda-convnet wrapper in pylearn2). The three ops should share as much of their C code as possible.
I'd rather like to do this tomorrow than next week... if @nouiz is away, maybe @abergeron can merge your PR?
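Structurally, I picture something like this (a pure-Python skeleton with placeholder bodies, not the real op machinery; all names follow the plan above):

class BaseGpuCorrMM(object):
    # shared setup; the three ops would also share most of their C code
    def __init__(self, border_mode='valid', subsample=(1, 1)):
        self.border_mode = border_mode
        self.subsample = subsample

class GpuCorrMM_gradInputs(BaseGpuCorrMM):
    def __call__(self, weights, top):    # a full convolution
        raise NotImplementedError('CUDA mode: bprop wrt. inputs')

class GpuCorrMM_gradWeights(BaseGpuCorrMM):
    def __call__(self, bottom, top):     # a valid convolution
        raise NotImplementedError('CUDA mode: bprop wrt. weights')

class GpuCorrMM(BaseGpuCorrMM):
    # forward pass: valid correlation of bottom with weights
    def grad(self, inputs, output_grads):
        bottom, weights = inputs
        top, = output_grads
        d_bottom = GpuCorrMM_gradInputs(self.border_mode,
                                        self.subsample)(weights, top)
        d_weights = GpuCorrMM_gradWeights(self.border_mode,
                                          self.subsample)(bottom, top)
        return d_bottom, d_weights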

I am happy to write code for it.

Thanks, I'll let you know how you can help!

I need the conv to be at least as fast as torch7 to be able to use Theano.

What about the FFT-based convolution then? In convnet-benchmarks, it is faster than Torch7 for all configurations except L1.
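Trying it looks roughly like this (the module path is the one from the benchmarks; treat the exact signature as an assumption):

import theano
import theano.tensor as T
from theano.sandbox.cuda.fftconv import conv2d_fft

x = T.ftensor4('x')
w = T.ftensor4('w')
y = conv2d_fft(x, w, border_mode='valid')  # no subsampling support
f = theano.function([x, w], y)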

@abergeron
Member

My GPU is occupied with other tests right now. I'll give it a spin after that and merge if the tests pass (should be later tonight or tomorrow).

@stencilman
Contributor Author

@f0k or @benanne: I cannot get fft to work; somehow scikits.cuda.cublas.cublasCgemmBatched seems to be missing from the scikits.cuda.cublas module (all batched versions seem to be missing, while e.g. cublasCgemm exists). Do you have any idea why? Thanks.

@abergeron
Member

You have to install the development version. The last release doesn't have the necessary bindings.
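You can quickly check whether your installed scikits.cuda has the batched bindings:

import scikits.cuda.cublas as cublas

for name in ('cublasCgemmBatched', 'cublasSgemmBatched'):
    # releases before the development version lack these attributes
    print('%s: %s' % (name, hasattr(cublas, name)))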

@stencilman
Contributor Author

Any updates on merging this PR? Thanks!

abergeron added a commit that referenced this pull request Aug 14, 2014
Conv gemm non-square kernel support
@abergeron merged commit 0037c72 into Theano:master on Aug 14, 2014
@abergeron
Member

Just noticed that the tests have been successful.

@stencilman
Contributor Author

Thanks a lot @abergeron!

@f0k
Contributor

f0k commented Aug 14, 2014

Great, thank you!

@stencilman
Contributor Author

Below I attach the convnet benchmark results for fft vs corrMM. However, when I try to run it for my project, it throws a scikits.cuda.cufft.cufftAllocFailed, so fft is not practical to use. And corrMM is still slow compared to torch7 :-(

CONFIG: input = 3 x 128 x 128 * ker = 3 x 96 x 11 x 11 ( bs = 128 , stride = 1 )
(experimental) theano.sandbox.cuda.fftconv.conv2d_fft fprop: 720.042110896 GFLOP/s ( tm = 0.172513839722 )
(experimental) theano.sandbox.cuda.fftconv.conv2d_fft bprop weights: 0.0 GFLOP/s ( tm = 0.190046844482 )
(experimental) theano.sandbox.cuda.fftconv.conv2d_fft bprop inputs: 0.0 GFLOP/s ( tm = 1.12835046387 )
(experimental) theano.sandbox.cuda.blas.CorrMM fprop: 1025.87552679 GFLOP/s ( tm = 0.121084114075 )
(experimental) theano.sandbox.cuda.blas.CorrMM bprop weights: 0.0 GFLOP/s ( tm = 0.264007446289 )
(experimental) theano.sandbox.cuda.blas.CorrMM bprop inputs: 0.0 GFLOP/s ( tm = 1.65133056641 )

CONFIG: input = 64 x 64 x 64 * ker = 64 x 128 x 9 x 9 ( bs = 128 , stride = 1 )
(experimental) theano.sandbox.cuda.fftconv.conv2d_fft fprop: 6408.74976884 GFLOP/s ( tm = 0.0831223220825 )
(experimental) theano.sandbox.cuda.fftconv.conv2d_fft bprop weights: 0.0 GFLOP/s ( tm = 0.104528160095 )
(experimental) theano.sandbox.cuda.fftconv.conv2d_fft bprop inputs: 0.0 GFLOP/s ( tm = 0.446905944824 )
(experimental) theano.sandbox.cuda.blas.CorrMM fprop: 1839.06468606 GFLOP/s ( tm = 0.289663635254 )
(experimental) theano.sandbox.cuda.blas.CorrMM bprop weights: 0.0 GFLOP/s ( tm = 0.887582336426 )
(experimental) theano.sandbox.cuda.blas.CorrMM bprop inputs: 0.0 GFLOP/s ( tm = 0.677083068848 )

CONFIG: input = 128 x 32 x 32 * ker = 128 x 128 x 9 x 9 ( bs = 128 , stride = 1 )
(experimental) theano.sandbox.cuda.fftconv.conv2d_fft fprop: 6634.08404822 GFLOP/s ( tm = 0.0294975833893 )
(experimental) theano.sandbox.cuda.fftconv.conv2d_fft bprop weights: 0.0 GFLOP/s ( tm = 0.0319159526825 )
(experimental) theano.sandbox.cuda.fftconv.conv2d_fft bprop inputs: 0.0 GFLOP/s ( tm = 0.151375915527 )
(experimental) theano.sandbox.cuda.blas.CorrMM fprop: 1041.26429915 GFLOP/s ( tm = 0.187934463501 )
(experimental) theano.sandbox.cuda.blas.CorrMM bprop weights: 0.0 GFLOP/s ( tm = 0.327285858154 )
(experimental) theano.sandbox.cuda.blas.CorrMM bprop inputs: 0.0 GFLOP/s ( tm = 0.281016693115 )

CONFIG: input = 128 x 16 x 16 * ker = 128 x 128 x 7 x 7 ( bs = 128 , stride = 1 )
(experimental) theano.sandbox.cuda.fftconv.conv2d_fft fprop: 2118.82111964 GFLOP/s ( tm = 0.0096997756958 )
(experimental) theano.sandbox.cuda.fftconv.conv2d_fft bprop weights: 0.0 GFLOP/s ( tm = 0.00985289573669 )
(experimental) theano.sandbox.cuda.fftconv.conv2d_fft bprop inputs: 0.0 GFLOP/s ( tm = 0.0386824493408 )
(experimental) theano.sandbox.cuda.blas.CorrMM fprop: 454.019960479 GFLOP/s ( tm = 0.0452669296265 )
(experimental) theano.sandbox.cuda.blas.CorrMM bprop weights: 0.0 GFLOP/s ( tm = 0.0540057106018 )
(experimental) theano.sandbox.cuda.blas.CorrMM bprop inputs: 0.0 GFLOP/s ( tm = 0.076849357605 )

CONFIG: input = 384 x 13 x 13 * ker = 384 x 384 x 3 x 3 ( bs = 128 , stride = 1 )
(experimental) theano.sandbox.cuda.fftconv.conv2d_fft fprop: 586.160629223 GFLOP/s ( tm = 0.0701315841675 )
(experimental) theano.sandbox.cuda.fftconv.conv2d_fft bprop weights: 0.0 GFLOP/s ( tm = 0.0719367828369 )
(experimental) theano.sandbox.cuda.fftconv.conv2d_fft bprop inputs: 0.0 GFLOP/s ( tm = 0.0903026046753 )
(experimental) theano.sandbox.cuda.blas.CorrMM fprop: 716.315122147 GFLOP/s ( tm = 0.057388671875 )
(experimental) theano.sandbox.cuda.blas.CorrMM bprop weights: 0.0 GFLOP/s ( tm = 0.215068939209 )
(experimental) theano.sandbox.cuda.blas.CorrMM bprop inputs: 0.0 GFLOP/s ( tm = 0.0558946075439 )

@nouiz
Member

nouiz commented Aug 18, 2014

I didn't catch up on everything that was done while I was away.

I have one question. Does one of you know what the difference is between the implementation of the bprop wrt. weights and the valid implementation we have? Both are valid convolutions. From the profile against torch7, and what was written, it seems the algorithm isn't the same.

As both implementations are valid convolutions, we could make multiple ops, but we could also select different code paths within one Op depending on the input shape. That would be lighter and more newbie-proof, in case someone wants to use the convolution with different input shapes.


@stencilman
Contributor Author

Hi @nouiz: Torch does an updateOutput (which is a valid conv) for the fprop, but does an accGradParameters when updating the weights (also a valid conv). If you look at the code (https://github.com/torch/cunn/blob/master/SpatialConvolutionMM.cu) you will see that they are different, and perhaps that's why the grad wrt. the weights is faster for Torch. Does that answer your question?

Yes, perhaps you can choose a different code path depending on the sizes, but you guys (@f0k) will know this better.

@f0k
Contributor

f0k commented Aug 18, 2014

@nouiz: The difference is that for the bprop wrt. weights, caffe iterates over the batch and computes a number of dot products that are accumulated into a weight gradient (by setting both alpha and beta to 1 in the gemm call). I have it almost working now, will push to #2033 soon.
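In NumPy terms, the scheme looks like this (a sketch with an explicit im2col; the real code accumulates inside the gemm call with alpha = beta = 1 instead of a Python +=):

import numpy as np

def im2col(x, kh, kw):
    # unfold all valid (kh, kw) patches of x with shape (channels, H, W)
    # into a matrix of shape (channels*kh*kw, oh*ow)
    c, h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, oh * ow), dtype=x.dtype)
    row = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                cols[row] = x[ci, i:i + oh, j:j + ow].ravel()
                row += 1
    return cols

def grad_weights(inputs, top_grad, kh, kw):
    # accumulate the weight gradient over the batch: one gemm per
    # sample, with beta=1 so the products add up (the Caffe variant)
    batch, channels = inputs.shape[:2]
    nfilters = top_grad.shape[1]
    d_w = np.zeros((nfilters, channels * kh * kw), dtype=inputs.dtype)
    for b in range(batch):
        col = im2col(inputs[b], kh, kw)          # (c*kh*kw, oh*ow)
        top = top_grad[b].reshape(nfilters, -1)  # (nfilters, oh*ow)
        d_w += top.dot(col.T)                    # the accumulated gemm
    return d_w.reshape(nfilters, channels, kh, kw)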

My plan was to first have a GpuCorrMM with gradients, then update the optimizer to choose the best op depending on the input shapes. /Edit: This will only work for replacing GpuConv ops with fully-defined shape information, of course. But one can always use GpuCorrMM directly if it has a grad() function.

@stencilman
Contributor Author

@f0k: Awesome!! Can't wait, thank you so much!! 👍

@ballasn mentioned this pull request Sep 4, 2014