
Make automatic insertions of dnn_conv() use the optimal cuDNN code path (2) #2273

Merged
merged 4 commits into Theano:master from f0k:conv-direction-hints on Nov 21, 2014

Conversation

@f0k
Contributor

f0k commented Nov 20, 2014

Continuation of #2272, which has already been merged.

This also chooses between GpuDnnConv and GpuDnnConvGradW depending on the original purpose of the convolution in the graph. For me, the gradient w.r.t. the weights is faster than before, but still quite a bit slower than using GpuDnnConv manually. I cannot look into this any more today, but if anyone wants to have a try, please go ahead. The goal is to find out whether it is slower because of extra dimshuffles/copies or because I did something wrong. Please don't merge before this has been investigated, and also not before re-running the tests.
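
For reference, a minimal sketch of calling dnn_conv() by hand with such a hint (the hint string and the exact signature are assumptions based on this PR's discussion, not a definitive API reference):

    import theano.tensor as T
    from theano.sandbox.cuda.dnn import dnn_conv

    imgs = T.ftensor4('imgs')    # (batch, channels, rows, cols)
    kerns = T.ftensor4('kerns')  # (nkern, channels, kernel_rows, kernel_cols)

    # With border_mode='valid' and subsample=(1, 1), a 'bprop weights' hint is
    # meant to let dnn_conv() pick GpuDnnConvGradW instead of the plain forward
    # GpuDnnConv when the convolution actually computes a weight gradient.
    out = dnn_conv(imgs, kerns, border_mode='valid', subsample=(1, 1),
                   direction_hint='bprop weights')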

@f0k

Contributor

f0k commented Nov 21, 2014

OK, found it: both local_conv_dnn() and dnn_conv() wrapped the images and kernels in gpu_contiguous(). Because dnn_conv() needs to do a lot of flipping and dimshuffling when it is not emulating the forward pass, this introduced additional copy operations. The graph for the automatic insertion still contains a number of redundant operations (dimshuffles and flips), but they do not incur any practical overhead.
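
Schematically, the redundant wrapping looked like this (a sketch using the names from this thread; the particular dimshuffle/flip pattern is only illustrative):

    import theano.tensor as T
    from theano.sandbox.cuda.basic_ops import gpu_contiguous

    kerns = T.ftensor4('kerns')

    # local_conv_dnn() forced contiguity before handing the kernels to dnn_conv():
    kerns_opt = gpu_contiguous(kerns)

    # dnn_conv() then dimshuffled/flipped them for the non-forward code path,
    # which breaks contiguity, so its own gpu_contiguous() materialised a
    # second copy on the GPU:
    kerns_view = kerns_opt.dimshuffle(1, 0, 2, 3)[:, :, ::-1, ::-1]
    kerns_dnn = gpu_contiguous(kerns_view)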

On my desktop, I get:

SKIP=legacy,meta,gemm,fft,convnet PRINT_GRAPH=0 python pylearn2_benchmark.py i1x115x80,k32x8x6,b64
Using gpu device 0: GeForce GT 640

CONFIG: input = 1 x 115 x 80 * ker = 1 x 32 x 8 x 6 ( bs = 64 , stride = 1 )
(auto) theano.sandbox.cuda.dnn.GpuDnnConv          ==> fprop         ==>      13
(auto) theano.sandbox.cuda.dnn.GpuDnnConv          ==> bprop inputs  ==>      19
(auto) theano.sandbox.cuda.dnn.GpuDnnConv          ==> bprop weights ==>      16

(manual conv) theano.sandbox.cuda.dnn.GpuDnnConv   ==> fprop         ==>      13
(manual conv) theano.sandbox.cuda.dnn.GpuDnnConv   ==> bprop inputs  ==>      19
(manual conv) theano.sandbox.cuda.dnn.GpuDnnConv   ==> bprop weights ==>      16

Before, I had:

(auto) theano.sandbox.cuda.dnn.GpuDnnConv          ==> bprop weights ==>     148

I'll rebase and then it's ready to merge!

@f0k

Contributor

f0k commented Nov 21, 2014

Rebased and Travis passes. Merge merge merge! :)

@f0k f0k referenced this pull request Nov 21, 2014

Merged

conv2d meta-optimizer #2266

@@ -280,6 +280,7 @@ def __init__(self, imshp=None, kshp=None, nkern=None, bsize=None,
                  kshp_logical_top_aligned=True,
                  verbose=0,
                  version=-1,
+                 direction_hint='forward',

@nouiz

nouiz Nov 21, 2014

Member

This op and GpuConv need a __setstate__ method to allow unpickling old graphs:

    def __setstate__(self, d):
        self.__dict__.update(d)
        self.direction_hint = d.get('direction_hint', None)
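
As a self-contained illustration of why that fallback is needed (a toy class, not Theano's actual op):

    class ConvLikeOp(object):
        """Toy stand-in for an op that gained a direction_hint attribute."""

        def __init__(self, direction_hint='forward'):
            self.direction_hint = direction_hint

        def __setstate__(self, d):
            self.__dict__.update(d)
            # Pickles written before the attribute existed don't contain it,
            # so fall back to a default instead of failing with an
            # AttributeError later on.
            self.direction_hint = d.get('direction_hint', None)

    old_state = {'other_attr': 42}      # simulates a pre-direction_hint pickle
    op = ConvLikeOp.__new__(ConvLikeOp)
    op.__setstate__(old_state)
    assert op.direction_hint is None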

@f0k

f0k Nov 21, 2014

Contributor

OK. What about __hash__ and __eq__? Should direction_hint be part of them or not? It doesn't change what the op computes, since it isn't used in any of the computations, but I don't know what the equality tests are used for.

@nouiz

nouiz Nov 21, 2014

Member

Do not change __hash__ and __eq__. They are used to merge identical computations, so they should only contain what causes a different computation, not a different algorithm.

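As a toy illustration of that merge behaviour (not Theano's actual op): two instances that differ only in the hint should still compare equal, so the merge optimizer can collapse their nodes into one computation:

    class ConvWithHint(object):
        """Toy op: direction_hint only changes which code path is chosen,
        not the result, so it stays out of __eq__ and __hash__."""

        def __init__(self, border_mode, direction_hint=None):
            self.border_mode = border_mode
            self.direction_hint = direction_hint

        def __eq__(self, other):
            return (type(self) == type(other)
                    and self.border_mode == other.border_mode)

        def __hash__(self):
            return hash((type(self), self.border_mode))

    a = ConvWithHint('valid', direction_hint='forward')
    b = ConvWithHint('valid', direction_hint='bprop weights')
    assert a == b and hash(a) == hash(b)  # the merge optimizer can unify these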


@nouiz

Member

nouiz commented Nov 21, 2014

thanks

nouiz added a commit that referenced this pull request Nov 21, 2014

Merge pull request #2273 from f0k/conv-direction-hints
Make automatic insertions of dnn_conv() use the optimal cuDNN code path (2)

@nouiz nouiz merged commit 18fe036 into Theano:master Nov 21, 2014

1 check was pending

continuous-integration/travis-ci The Travis CI build is in progress

@f0k f0k deleted the f0k:conv-direction-hints branch Nov 21, 2014
