
Fixed issue #3069 by checking op tag #3070

Merged · 15 commits into apache:master · Apr 27, 2019
Conversation

@kumasento (Contributor)

This PR intends to fix issue #3069.

Changes:

  • Added in_channels to the Conv2DAttrs interface, as well as the conv2d make function.
  • Updated the logic for deciding depthwise or group conv2d.

Please do not hesitate to let me know what can be improved.

@tqchen @Huyuwei @vinx13 any review from you will be very helpful, thanks!

@tqchen (Member)

tqchen commented Apr 22, 2019

I am not sure if we want to introduce the additional attribute, as the number of input channels is implied by the input tensor itself.

@kumasento (Contributor Author)

@tqchen

Thanks for your reply. I agree this fix may not be optimal, but since the schedule_conv2d interface doesn't take in any information about the inputs, it seems impossible to correctly distinguish depthwise and group conv2d.

Please do let me know if you have any suggestions. Thanks!

@tqchen (Member)

tqchen commented Apr 22, 2019

Usually in this case we can hide additional information in the attrs field of the compute: https://docs.tvm.ai/api/python/tvm.html?highlight=compute#tvm.compute
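
A minimal sketch of this idea, assuming a hypothetical op (the name fake_group_conv2d and the groups/in_channels keys are made up for illustration): tvm.compute accepts an attrs dict, and those attributes stay visible on the resulting op inside a schedule function.

import tvm

# Hypothetical example: stash extra information on the op via attrs.
data = tvm.placeholder((1, 32, 28, 28), name="data")
out = tvm.compute(
    data.shape,
    lambda n, c, h, w: data[n, c, h, w],  # dummy body, not a real convolution
    name="fake_group_conv2d",
    attrs={"groups": 4, "in_channels": 32},
)

def schedule_fake(outs):
    # The hidden attributes can be recovered from the op later on,
    # e.g. to branch between depthwise and group schedules.
    groups = outs[0].op.attrs["groups"].value
    s = tvm.create_schedule([x.op for x in outs])
    return s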

@kumasento (Contributor Author)

Thank you so much for pointing me to that doc.

Maybe I didn't describe the context of this fix clearly enough: the target issue happens while using the Relay front end to convert a model with group convolution, which means I cannot apply a custom compute function or add a new attribute without changing the existing mechanisms within Relay. Therefore, I'm afraid I cannot see a viable solution based on tvm.compute for the specific case in issue #3069.

Please let me know if you have any other hint, thanks!

@kevinthesun (Contributor)

@kumasento We don't need to add extra attrs to conv2d, since autotvm registers a "workload" attr for the conv2d compute. In the schedule, you can simply call outs[0].op.attrs["workload"] to get the input information.
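
A minimal sketch of that approach, assuming the conv2d compute was created through an autotvm-registered TOPI compute (the exact contents of the workload tuple are not spelled out here):

def schedule_conv2d_sketch(attrs, outs, target):
    # If the compute was registered through autotvm, the op carries a
    # "workload" attr: the task name followed by the serialized compute
    # arguments (tensor shapes/dtypes, strides, padding, ...).
    workload = outs[0].op.attrs["workload"]
    # The data tensor entry in the workload gives the input shape, from which
    # the number of input channels can be read.
    print(workload)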

@kumasento (Contributor Author)

@kevinthesun Thank you so much for your valuable comment. I did try that but outs[0].op.attrs in my case is an empty dictionary. If you can go to #3069 and insert print(outs[0].op.attrs) directly after this line, you may get {}.

I could look deeper into this issue - would you mind letting me know where workload is actually registered? Thanks!

@kevinthesun (Contributor)

kevinthesun commented Apr 22, 2019

@kumasento https://github.com/dmlc/tvm/blob/master/python/tvm/autotvm/task/topi_integration.py#L333
In TOPI, each conv2d compute is registered with this decorator function. Note that if you set opt_level=3 and use "llvm" as the target, you won't get attrs in schedule_conv2d for some conv2d ops. The reason is here: https://github.com/dmlc/tvm/blob/master/topi/python/topi/x86/conv2d.py#L325. x86 conv2d will replace some conv2d ops at opt_level=3 in _alter_conv2d_layout for better performance. Currently it replaces "normal" conv2d and depthwise conv2d; for group conv2d, it should keep the original implementation.
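
Roughly, that registration pattern looks like the sketch below (the function name and body are illustrative, not the actual x86 implementation); the decorator records a workload for the compute and attaches it to the op's attrs.

from tvm import autotvm
from topi import nn

# Illustrative only: wrapping a TOPI compute so autotvm attaches a "workload"
# attr to the op it produces (a real 'cpu'/'direct' registration already
# exists in topi/x86, so this just shows the shape of the pattern).
@autotvm.register_topi_compute(nn.conv2d, 'cpu', 'direct')
def conv2d_cpu_sketch(cfg, data, kernel, strides, padding, dilation, layout, out_dtype):
    # cfg is the autotvm config space; here we simply fall back to the
    # default NCHW compute.
    return nn.conv2d_nchw(data, kernel, strides, padding, dilation, out_dtype)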

@kumasento (Contributor Author)

Thank you so much @kevinthesun. I will try this out tomorrow and get back to you.

@kumasento (Contributor Author)

@kevinthesun

Sorry for getting back late.

I did a bit more investigation into the register_topi_compute function you pointed me to. The problem is that this function is never called in my case (target="llvm", opt_level=1), so I cannot get workload in outs[0].op.attrs.

I will spend more time in this direction, but if you have any other suggestions, please let me know. Thanks!

@kumasento (Contributor Author)

I'm still stuck here. I was trying to figure out why I cannot get a workload-annotated outs in schedule_conv2d, but its caller in the call stack is within the C++ codebase, specifically GraphRuntimeCodegen::Codegen.

If possible, could anyone give me a hint about where Codegen might call the schedule_conv2d function, so that I can trace the outs object back to its origin? Thanks!

@vinx13 (Member)

vinx13 commented Apr 23, 2019

@kumasento schedule_conv2d is called here:

tvm/src/relay/backend/compile_engine.cc, line 128 (at 3f835bd):
fschedule[master_op_](master_attrs_, tensor_outs, target_);

@kumasento (Contributor Author)

Thanks! I am checking it now.

@kumasento changed the title from "Fixed issue #3069 by adding in_channels" to "Fixed issue #3069 by register group_conv2d_nchw to TOPI" on Apr 23, 2019
@kumasento changed the title from "Fixed issue #3069 by register group_conv2d_nchw to TOPI" to "Fixed issue #3069 by registering group_conv2d_nchw to TOPI" on Apr 23, 2019
@kumasento (Contributor Author)

@vinx13 Thank you so much for your help. Now I can avoid adding new attributes to Conv2DAttrs to get the number of input channels, and can therefore correctly distinguish depthwise and group conv2d.

If I'm on the right track, I will move on to implement some test cases for this PR.

@tqchen @kevinthesun would you mind taking a look at this solution as well? Thanks

@@ -133,12 +141,18 @@ def schedule_conv2d(attrs, outs, target):
    if groups == 1 and layout == "NHWC":
        return topi.generic.schedule_conv2d_nhwc(outs)
    if groups != 1:
        if layout == "NCHW":
            # collect in_channels to distinguish depthwise and group conv2d
@kumasento (Contributor Author)

Here I updated the logic to distinguish depthwise and group conv2d

@@ -523,3 +523,15 @@ def traverse(op):

    traverse(outs[0].op)
    return s

@autotvm.register_topi_compute(nn.group_conv2d_nchw, 'cpu', 'direct')
@kumasento (Contributor Author)

And this is where the registration happens.

op = _find_conv2d_op(outs[0].op)
assert op is not None

is_depthwise = 'depthwise' in op.tag
@kumasento (Contributor Author)

I'm not sure whether this is the best way to go, but checking whether 'depthwise' is in the tag value seems much easier and more intuitive.
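
Putting the pieces together, the dispatch ends up looking roughly like the sketch below (a simplified reconstruction around the snippet above, not the verbatim final code; the traversal helper is written out here for illustration):

import topi

def _find_conv2d_op(op):
    # Walk up the producer chain from `op` and return the conv2d op, if any.
    if 'conv2d' in op.tag:
        return op
    for tensor in op.input_tensors:
        found = _find_conv2d_op(tensor.op)
        if found is not None:
            return found
    return None

def _schedule_grouped_conv2d_sketch(outs):
    op = _find_conv2d_op(outs[0].op)
    assert op is not None
    # TOPI tags depthwise computes with 'depthwise...', so the tag alone is
    # enough to tell depthwise apart from general group conv2d.
    if 'depthwise' in op.tag:
        return topi.generic.schedule_depthwise_conv2d_nchw(outs)
    return topi.generic.schedule_group_conv2d_nchw(outs)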

@kumasento changed the title from "Fixed issue #3069 by registering group_conv2d_nchw to TOPI" to "Fixed issue #3069 by registering group_conv2d_nchw to TOPI and checking op tag" on Apr 23, 2019
@@ -27,7 +27,7 @@
from .. import generic, tag
from .. import nn
from ..util import get_const_tuple
-from ..nn.conv2d import conv2d, conv2d_NCHWc, \
+from ..nn.conv2d import conv2d, conv2d_NCHWc, _group_conv2d_nchw, \
Contributor

We'd better not import an internal function. I think in nn.py we don't need an extra internal function for group_conv2d_nchw.

@kumasento (Contributor Author)

Thanks! I've removed this update since there is no need to register a TOPI compute for group_conv2d_nchw now.

@kevinthesun (Contributor)

Maybe it's worth adding a test case to verify that the correct schedules are fetched for different kinds of conv2d?

if layout == "NHWC" and kernel_layout == "HWOI":
return topi.generic.schedule_depthwise_conv2d_nhwc(outs)
else:
if layout == "NCHW":
Member

layout in ['NCHW', 'NCHW4c']

@kumasento (Contributor Author)

Thanks! Just updated

@@ -523,3 +523,15 @@ def traverse(op):

    traverse(outs[0].op)
    return s

@autotvm.register_topi_compute(nn.group_conv2d_nchw, 'cpu', 'direct')
Member

We don't need to register this as an autotvm template since you are using the depthwise tag.

@kumasento (Contributor Author)

Yes, I agree. I've just removed it.

@kumasento changed the title from "Fixed issue #3069 by registering group_conv2d_nchw to TOPI and checking op tag" to "Fixed issue #3069 by checking op tag" on Apr 24, 2019
@kumasento (Contributor Author)

> Maybe it's worth adding a test case to verify that the correct schedules are fetched for different kinds of conv2d?

That would be great. I'm still looking for the right file for adding such a test under tests/python/relay, but I cannot find one. Should I create a new test file for it?

@kumasento (Contributor Author)

kumasento commented Apr 25, 2019

@vinx13 Sure, I have added the assertions in the latest commit.

But the second problem persists. I've dug into the error and noticed that group_conv2d_nchw_cuda returns a 5D tensor, which doesn't match the interface expected by Relay.

TVMError: Check failed: Equal(lhs, rhs): Failed to match the compute with TensorIntrin tensor_intrin's declaration  provided= reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=[(int32(x(rc))*int32(y(rc)))], axis=[iter_var(rc, Range(min=0, extent=4))], where=(uint1)1, value_index=0), intrin=  reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=[(int32(x(rc))*int32(y(rc)))], axis=[iter_var(rc, Range(min=0, extent=4))], where=(uint1)1, value_index=0)
Error during compile function

You may also notice this in the CI error message.

Should I alter the shape of the tensor returned from group_conv2d_nchw_cuda directly? Thanks

@vinx13 (Member)

vinx13 commented Apr 25, 2019

@kumasento group_conv in NCHWc layout on CUDA is not intended to be used directly. The autotvm registration is incorrect. Currently group_conv2d on CUDA has only an int8 template, which should not be registered under 'direct'.
Replacing https://github.com/dmlc/tvm/blob/1e84d04cbd56e2dc54f8e099acaa783607714796/topi/python/topi/cuda/group_conv2d_nchw.py#L30 with the lines below will fix the problem.

autotvm.register_topi_compute(nn.group_conv2d_nchw, ['cuda', 'gpu'], 'direct', nn.group_conv2d_nchw.fdefault)
 
@autotvm.register_topi_compute(nn.group_conv2d_nchw, ['cuda', 'gpu'], ['int8'])

Since we don't have a direct schedule on CUDA yet, we need to disable the group conv test on GPU.

@kumasento (Contributor Author)

kumasento commented Apr 25, 2019

I think it might be because group_conv2d_nchw_cuda is actually designed for NCHW4c?

UPDATE: I just noticed your comment :) Yes, I agree with that

@vinx13 (Member)

vinx13 commented Apr 25, 2019

> I think it might be because group_conv2d_nchw_cuda is actually designed for NCHW4c?

Yes, it is NCHW4c for the int8 dtype. It's expected to be called by the AlterOpLayout pass.

Removed 'direct' CUDA tests
padding=(0, 1), channels=10, kernel_size=(1 ,3))
# mixed precision group conv2d
# NOTE(kumasento): This test cannot pass
@kumasento (Contributor Author)

I tried to schedule group_conv2d_NCHW4c_int8 here but this test failed. It needs some further investigation.

@vinx13 (Member), Apr 26, 2019

@kumasento By default we use direct schedule, so this test won't pass on CUDA. I would suggest disabling CUDA target for now

@kumasento (Contributor Author)

Alright, I will just remove these commented lines of tests.

@kumasento (Contributor Author)

@vinx13 would you mind doing another round of review? Thanks!

@@ -575,6 +575,11 @@ def group_conv2d_nchw(Input, Filter, stride, padding, dilation, groups, out_dtyp
    else:
        dilation_h, dilation_w = dilation

    if Input.dtype == 'int8':
        tag = 'group_conv2d_NCHWc_int8'
@vinx13 (Member), Apr 25, 2019

This is a generic compute function that can handle any dtype and should not be tagged as NCHWc. There is another implementation optimized for int8 on the CUDA backend.

@kumasento (Contributor Author)

Sure, I've just removed that. Thanks for pointing it out.

@tqchen (Member)

tqchen commented Apr 25, 2019

Will leave the PR management to @vinx13

@vinx13 requested a review from kevinthesun on April 26, 2019, 12:52
@vinx13 merged commit 8f56949 into apache:master on Apr 27, 2019
@vinx13 (Member)

vinx13 commented Apr 27, 2019

Thanks @kumasento @kevinthesun @tqchen, this is merged.

wweic pushed a commit to wweic/tvm that referenced this pull request May 13, 2019
* Fixed issue apache#3069 by adding in_channels

* Registered group_conv2d_nchw as topi compute

* Improved by checking tag value

* Removed group_conv2d_nchw topi registration

* Added test for relay group_conv2d_nchw

* Added assertions to forbid small group size

* Removed hard-coded oc_block_factor

* Added explanatory comments to group_conv2d_nchw_cuda

* Updated group_conv2d_nchw_cuda schedule

Removed 'direct' CUDA tests

* Reverted an accidental change in a conv2d test

* Fixed indentation problems

* Fixed a mis-commented line

* Reverted change in group_conv2d_nchw tag

* Removed commented int8 group_conv2d test

* Fixed group size assertions in group_conv2d_nchw_cuda
wweic pushed a commit to neo-ai/tvm that referenced this pull request May 13, 2019