cuDNN v6 dilated convolution #2858

Merged (16 commits, Oct 28, 2017)

Conversation

@anaruse (Contributor) commented on Jun 14, 2017

This PR is related to dilated convolution (#2693); it enables Chainer to use the dilated convolution implementation in cuDNN v6. Note that you need to use this branch of CuPy (cupy/cupy#133).

@ir5 (Contributor) left a comment

Thank you for the PR! I have put in some comments; could you check them?

@@ -33,7 +33,7 @@ def _pair(x):

 class Convolution2DFunction(function.Function):

-    def __init__(self, stride=1, pad=0, cover_all=False, **kwargs):
+    def __init__(self, stride=1, pad=0, cover_all=False, dilate=1, **kwargs):
ir5 (Contributor) commented:

It seems you put the dilate option in Convolution2DFunction, but it is not supported in CPU mode. This is misleading for users. Further, if Convolution2DFunction fully supports the dilation feature, then DilatedConvolution2DFunction becomes a subset of it, which means we would have some overlapping code. I think there are two ways to fix this:

  1. Do not modify Convolution2DFunction (i.e., do not add the dilate option there) and modify only the dilated function.
  2. Fully implement the dilate argument (i.e., implement dilation in CPU mode as well) and make the current dilated function simply call Convolution2DFunction internally.

What do you think about this?
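[Editor's note] For context on what option 2 would involve, here is a minimal NumPy sketch of a CPU-mode dilated convolution (stride 1, no padding, NCHW layout). This is only an illustration of the idea, not the PR's implementation, and the function name is made up.

import numpy as np

def naive_dilated_conv2d(x, W, dilate=1):
    # x: (batch, in_ch, h, w), W: (out_ch, in_ch, kh, kw); stride 1, no padding.
    n, c, h, w = x.shape
    out_c, _, kh, kw = W.shape
    # A k-tap kernel with dilation d spans d * (k - 1) + 1 input pixels.
    kh_eff = dilate * (kh - 1) + 1
    kw_eff = dilate * (kw - 1) + 1
    out_h, out_w = h - kh_eff + 1, w - kw_eff + 1
    y = np.zeros((n, out_c, out_h, out_w), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            # Each kernel tap reads the input at an offset of dilate * (i, j).
            patch = x[:, :, i * dilate:i * dilate + out_h,
                      j * dilate:j * dilate + out_w]
            y += np.einsum('nchw,oc->nohw', patch, W[:, :, i, j])
    return y

With dilate=1 this reduces to an ordinary convolution, which is why the two functions could share one code path.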

anaruse (Contributor, Author) replied:

(Please look at my comment below)

assert out_w > 0, 'Width in the output should be positive.'

y = cuda.cupy.empty((n, out_c, out_h, out_w), dtype=x.dtype)
# print('# conv_2d.py:120, y.shape: {}'.format(y.shape)) # debug
ir5 (Contributor) commented:

Please remove the debug message.

anaruse (Contributor, Author) replied:

Noted, will remove it.

@@ -101,7 +101,8 @@ class Convolution2D(link.Link):
     """ # NOQA

     def __init__(self, in_channels, out_channels, ksize=None, stride=1, pad=0,
-                 nobias=False, initialW=None, initial_bias=None, **kwargs):
+                 dilate=1, nobias=False, initialW=None, initial_bias=None,
ir5 (Contributor) commented:

A similar problem holds for the link.

anaruse (Contributor, Author) replied:

(Please look at my comment below)

@ir5 self-assigned this on Jun 19, 2017
@anaruse (Contributor, Author) commented on Jun 19, 2017

Thank you for the feedback.

> It seems you put the dilate option in Convolution2DFunction, but it is not supported in CPU mode. This is misleading for users. Further, if Convolution2DFunction fully supports the dilation feature, then DilatedConvolution2DFunction becomes a subset of it, which means we would have some overlapping code. I think there are two ways to fix this:
>
>   1. Do not modify Convolution2DFunction (i.e., do not add the dilate option there) and modify only the dilated function.
>   2. Fully implement the dilate argument (i.e., implement dilation in CPU mode as well) and make the current dilated function simply call Convolution2DFunction internally.

I was not sure which was better when I sent the PR, but now I think option 2 would be better, mainly because:

  • The code of Convolution2DFunction and DilatedConvolution2DFunction is very similar, with many duplicated lines; the main difference is the code that targets cuDNN v5.x or earlier (which does not support dilated convolution) and calls cuDNN's 1x1 convolutions multiple times to realize a dilated convolution (Dilated Convolution Layer #1335). That implementation for cuDNN v5.x or earlier is great, but it does not seem much faster than a CuPy-based implementation and does not look easy to maintain. Considering that dilated convolution is natively supported by cuDNN from v6, and that the dilation feature can be added to Convolution2DFunction without a large amount of code modification, I think it is a good time to merge Convolution2DFunction and DilatedConvolution2DFunction.

What do you think?
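[Editor's note] As a rough illustration of the merged design being discussed here (not code from this PR; every name and the version constants below are illustrative), the dispatch between a cuDNN v6 path and a fallback could look like this:

# Hypothetical sketch of how a merged convolution function might choose its
# implementation; names and version constants are illustrative only.
def _choose_conv_path(dilate, cudnn_enabled, cudnn_version):
    """Return which forward implementation a (possibly dilated) conv would use."""
    dilated = dilate != (1, 1)
    if cudnn_enabled and (not dilated or cudnn_version >= 6000):
        # cuDNN v6 accepts a dilation factor in the convolution descriptor,
        # so dilated and standard convolution can share a single cuDNN path.
        return 'cudnn'
    # Older cuDNN (or CPU): fall back to an im2col-based implementation, or to
    # the existing trick of calling 1x1 convolutions multiple times.
    return 'im2col'

assert _choose_conv_path((2, 2), True, 6000) == 'cudnn'
assert _choose_conv_path((2, 2), True, 5110) == 'im2col'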

@ir5 (Contributor) commented on Jun 20, 2017

Thank you for your opinion. I have discussed this with the Chainer core developers, since the merge is a fairly big change. They and I consider merging the two similar functions a good contribution, but it is somewhat different from the objective of this PR. So I recommend doing (1) implementing the cuDNN v6 dilated convolution and (2) merging the two functions in separate PRs; you may do either (1) or (2) first. If you feel that making two PRs is troublesome, however, doing both in one PR is probably acceptable.

@anaruse (Contributor, Author) commented on Jun 20, 2017

Thank you for the comment.

I agree that merging the two functions is not a small change. So, in this PR, I will focus mainly on implementing the dilated convolution feature in Convolution2DFunction for CPU and GPU, and try to keep the amount of code modification to DilatedConvolution2DFunction small.

@ir5 (Contributor) commented on Jun 21, 2017

So, I assume we are working only on (1) in this PR.
I guess we need to check whether the result of standard convolution with the dilate option is compatible with the existing dilated convolution.
I noticed you added a hidden flag _use_standard_convolution in dilated_convolution_2d.py, which is probably for local testing. When I set it to True and ran the tests in test_dilated_convolution_2d.py, however, some test cases failed. My cuDNN is a little old (4.0.7), so my environment probably differs from yours. My CuPy was at the latest master branch.
I feel that testing through _use_standard_convolution is a little awkward.
How about adding the tests for dilated convolution to standard convolution so that we can see the test results on CI?
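[Editor's note] A rough sketch of what such a compatibility test could look like, assuming the dilate keyword that this PR adds to F.convolution_2d; shapes and parameter values are arbitrary:

import unittest

import numpy as np

import chainer.functions as F
from chainer import testing


@testing.parameterize(*testing.product({
    'dilate': [1, 2, 3],
}))
class TestConvolution2DDilateCompat(unittest.TestCase):

    def test_matches_dilated_convolution_2d(self):
        x = np.random.uniform(-1, 1, (2, 3, 9, 9)).astype(np.float32)
        W = np.random.uniform(-1, 1, (4, 3, 3, 3)).astype(np.float32)
        # F.dilated_convolution_2d is the existing implementation; the dilate
        # keyword of F.convolution_2d is the code path added in this PR.
        y_ref = F.dilated_convolution_2d(x, W, dilate=self.dilate)
        y_new = F.convolution_2d(x, W, dilate=self.dilate)
        testing.assert_allclose(y_ref.data, y_new.data)


testing.run_module(__name__, __file__)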

@anaruse (Contributor, Author) commented on Jun 21, 2017

Thank you for the feedback.
As you noticed, the flag _use_standard_convolution is for local testing. I agree with you about making that flag adjustable from test scripts like test_dilated_convolution_2d.py, so I will fix the code.

@anaruse (Contributor, Author) commented on Jun 21, 2017

I've fixed test_dilated_convolution_2d.py so that it also tests the dilated convolution code in the Convolution2DFunction class.

@ir5 (Contributor) commented on Jun 22, 2017

Thank you for adding the tests. My comments are:

  • I feel it would be better to move the new tests in test_dilated_convolution_2d.py to test_convolution_2d.py, because they are about the dilate argument of Convolution2DFunction. I guess testing.parameterize will reduce code duplication.
  • We probably need to add an explanation of the dilate argument to the docstrings of both the function and the link. There are many ways to explain it, but I guess a simple description is enough; for example, adding an item to the "Args:" section such as "dilate (int or pair of ints): Dilation factor of filter applications. For details, please see :class:DilatedConvolution2DFunction." looks sufficient. It may also be worth noting in the docstrings that cuDNN is used for dilated convolution if its version is >= 6; we may write this either in the standard convolution or in the dilated convolution.
  • _use_standard_convolution is unused code, so I feel it is better to remove it from this PR.

@@ -61,6 +70,9 @@ def check_type_forward(self, in_types):
         )

     def forward_cpu(self, inputs):
+        if getattr(self, '_func', False):
ir5 (Contributor) commented:

Knowing that we will work on the merging task (2) right after this PR, this part is unused in this PR as far as I can see. So, to avoid confusion, it might be better to remove it in this PR. What do you think? The same applies to backward_cpu.

@anaruse (Contributor, Author) commented on Jun 23, 2017

Thank you for the comment.

I've updated the branch based on your comments:

  • moving some test code from test_dilated_convolution_2d.py to test_convolution_2d.py,
  • adding an explanation of the dilate argument to the function class Convolution2DFunction,
  • removing some unused code in the function class DilatedConvolution2DFunction, etc.

Note that I have not added a dilate argument to the link class Convolution2D yet, so I kept the test code for the link class unchanged. I think the addition of the dilate argument to the link class should be done as a next step.

BTW, no failure has been observed so far when running nosetests with cuDNN 5.1 and 6.0.

@@ -388,6 +398,8 @@ def convolution_2d(x, W, b=None, stride=1, pad=0, cover_all=False, **kwargs):
             ``pad=p`` and ``pad=(p, p)`` are equivalent.
         cover_all (bool): If ``True``, all spatial locations are convoluted
             into some output pixels.
+        dilate (int or pair of ints): Dilation factor of filter applications.
+            ``dilate=d`` and ``dilate=(d, d)`` are equivalent.
ir5 (Contributor) commented:

How about putting a reference to the DilatedConvolution2DFunction class here? I guess dilated convolution is not familiar to beginners.
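[Editor's note] For example, the entry could carry a Sphinx cross-reference, roughly as below (a sketch of the suggested wording only; the reference may need the fully qualified class path to resolve):

    Args:
        dilate (int or pair of ints): Dilation factor of filter applications.
            ``dilate=d`` and ``dilate=(d, d)`` are equivalent.
            For details, see :class:`DilatedConvolution2DFunction`.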

@ir5 (Contributor) commented on Jun 23, 2017

Thank you. The code now looks good to me. I left one minor comment.

@beam2d self-requested a review on June 23, 2017, 08:48
@okuta added the cat:feature label (implementation that introduces new interfaces) on Aug 20, 2017
@okuta self-assigned this on Sep 18, 2017
@okuta (Member) commented on Sep 18, 2017

@anaruse Could you resolve the conflicts?

Conflicts:
	chainer/functions/connection/convolution_2d.py
	chainer/functions/connection/deconvolution_2d.py
	tests/chainer_tests/functions_tests/connection_tests/test_dilated_convolution_2d.py
@okuta (Member) commented on Oct 16, 2017

jenkins, test this please.

@okuta (Member) commented on Oct 17, 2017

I got an error on a CUDA v5 environment. Please fix this line.

@anaruse (Contributor, Author) commented on Oct 17, 2017

Thank you for pointing it out. It has been fixed.

@okuta (Member) commented on Oct 20, 2017

jenkins, test this please.

@okuta (Member) commented on Oct 21, 2017

Please merge or rebase master, and fix get_deconv_outsize.
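[Editor's note] For reference, these are the usual output-size formulas once a dilation factor d is taken into account; this is a standalone sketch mirroring what chainer.utils.conv computes, not the PR's code, and the names are illustrative:

# Reference sketch of output-size formulas with dilation; names are illustrative.
def conv_outsize(size, k, s, p, d=1):
    # A k-tap kernel with dilation d spans d * (k - 1) + 1 input pixels.
    return (size + 2 * p - d * (k - 1) - 1) // s + 1

def deconv_outsize(size, k, s, p, d=1):
    # Inverse of conv_outsize for the transposed (de)convolution case.
    return s * (size - 1) + d * (k - 1) + 1 - 2 * p

assert conv_outsize(9, 3, 1, 0, d=2) == 5
assert deconv_outsize(5, 3, 1, 0, d=2) == 9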

@okuta added this to the v4.0.0b1 milestone on Oct 21, 2017
Conflicts:
	chainer/utils/conv.py
@anaruse (Contributor, Author) commented on Oct 23, 2017

Thanks for your comment. The conflict was resolved.

@okuta (Member) commented on Oct 23, 2017

jenkins, test this please.

@okuta added the st:test-and-merge label (state indicating that the pull request is approved by a reviewer and can be merged after CI passes) and removed the st:ready-for-review label on Oct 23, 2017
@okuta (Member) commented on Oct 25, 2017

I got the following error. Please fix the test of the dilated conv.

01:13:45 _____ TestDilatedConvolution2DCudnnCall_param_7.test_call_cudnn_backrward ______
01:13:45 
01:13:45 self = <chainer.testing.parameterized.TestDilatedConvolution2DCudnnCall_param_7 testMethod=test_call_cudnn_backrward>
01:13:45 
01:13:45     def test_call_cudnn_backrward(self):
01:13:45         with chainer.using_config('use_cudnn', self.use_cudnn):
01:13:45             y = self.forward()
01:13:45             y.grad = self.gy
01:13:45             name = 'cupy.cudnn.cudnn.convolutionBackwardData_v3'
01:13:45             with mock.patch(name) as func:
01:13:45                 y.backward()
01:13:45 >               self.assertEqual(func.called, self.expect)
01:13:45 E               AssertionError: False != True

@anaruse (Contributor, Author) commented on Oct 25, 2017

Could you tell me the CUDA version, cuDNN version, etc. when that error occurred?

@okuta (Member) commented on Oct 26, 2017

I got errors on CUDA 7 / cuDNN 4 and CUDA 8 / cuDNN 5.1.

@anaruse (Contributor, Author) commented on Oct 26, 2017

> I got the following error. Please fix the test of the dilated conv.

Please run the test again. I think the error has already been resolved by the last commit. I have confirmed that the error above does not happen with CUDA 8 and cuDNN 5.1.

@okuta (Member) commented on Oct 28, 2017

jenkins, test this please.

@okuta (Member) commented on Oct 28, 2017

LGTM!
