
Fixed issue #3069 by checking op tag #3070

Merged · 15 commits into apache:master · Apr 27, 2019
Conversation

@kumasento (Contributor)

This PR intends to fix issue #3069.

Changes:

  • Added in_channels to the Conv2DAttrs interface, as well as the conv2d make function.
  • Updated the logic for deciding depthwise or group conv2d.

Please do not hesitate to let me know what can be improved.

@tqchen @Huyuwei @vinx13 any review from you will be very helpful, thanks!

@tqchen (Member)

tqchen commented Apr 22, 2019

I am not sure if we want to introduce the additional attribute, as the number of input channels is implied by the input tensor itself.

@kumasento (Contributor Author)

@tqchen

Thanks for your reply. I agree this fix may not be optimal, but since the schedule_conv2d interface doesn't take in any information about the inputs, it seems impossible to correctly distinguish depthwise and group conv2d.

Please do let me know if you have any suggestions. Thanks!

@tqchen (Member)

tqchen commented Apr 22, 2019

Usually in this case we can hide additional information in the attrs field of the compute: https://docs.tvm.ai/api/python/tvm.html?highlight=compute#tvm.compute
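
A minimal sketch of this idea, assuming a hypothetical op (the name fake_group_conv2d and the groups/in_channels keys are made up for illustration): tvm.compute accepts an attrs dict, and those attributes stay visible on the resulting op inside a schedule function.

import tvm

# Hypothetical example: stash extra information on the op via attrs.
data = tvm.placeholder((1, 32, 28, 28), name="data")
out = tvm.compute(
    data.shape,
    lambda n, c, h, w: data[n, c, h, w],  # dummy body, not a real convolution
    name="fake_group_conv2d",
    attrs={"groups": 4, "in_channels": 32},
)

def schedule_fake(outs):
    # The hidden attributes can be recovered from the op later on,
    # e.g. to branch between depthwise and group schedules.
    groups = outs[0].op.attrs["groups"].value
    s = tvm.create_schedule([x.op for x in outs])
    return s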

@kumasento (Contributor Author)

Thank you so much for pointing me to that doc.

Maybe I didn't describe the context of this fix clearly enough: the target issue happens while using the Relay front end to convert a model with group convolution, which means I cannot apply a custom compute function or add a new attribute without changing the existing mechanisms within Relay. Therefore, I'm afraid I cannot see a viable solution based on tvm.compute for the specific case in issue #3069.

Please let me know if you have any other hint, thanks!

@kevinthesun (Contributor)

@kumasento We don't need to add extra attrs to conv2d, since autotvm registers a "workload" attr for the conv2d compute. In the schedule, you can simply call outs[0].op.attrs["workload"] to get the input information.
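
A minimal sketch of that approach, assuming the conv2d compute was created through an autotvm-registered TOPI compute (the exact contents of the workload tuple are not spelled out here):

def schedule_conv2d_sketch(attrs, outs, target):
    # If the compute was registered through autotvm, the op carries a
    # "workload" attr: the task name followed by the serialized compute
    # arguments (tensor shapes/dtypes, strides, padding, ...).
    workload = outs[0].op.attrs["workload"]
    # The data tensor entry in the workload gives the input shape, from which
    # the number of input channels can be read.
    print(workload)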

@kumasento (Contributor Author)

@kevinthesun Thank you so much for your valuable comment. I did try that but outs[0].op.attrs in my case is an empty dictionary. If you can go to #3069 and insert print(outs[0].op.attrs) directly after this line, you may get {}.

I could look deeper into this issue - would you mind letting me know where workload is actually registered? Thanks!

@kevinthesun (Contributor)

kevinthesun commented Apr 22, 2019

@kumasento https://github.com/dmlc/tvm/blob/master/python/tvm/autotvm/task/topi_integration.py#L333
In TOPI, each conv2d compute is registered with this decorator function. Note that if you set opt_level=3 and use "llvm" as the target, you won't get attrs in schedule_conv2d for some conv2d ops. The reason is here: https://github.com/dmlc/tvm/blob/master/topi/python/topi/x86/conv2d.py#L325. x86 conv2d will replace some conv2d ops at opt_level=3 in _alter_conv2d_layout for better performance. Currently it replaces "normal" conv2d and depthwise conv2d; for group conv2d, it should keep the original implementation.
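
Roughly, that registration pattern looks like the sketch below (the function name and body are illustrative, not the actual x86 implementation); the decorator records a workload for the compute and attaches it to the op's attrs.

from tvm import autotvm
from topi import nn

# Illustrative only: wrapping a TOPI compute so autotvm attaches a "workload"
# attr to the op it produces (a real 'cpu'/'direct' registration already
# exists in topi/x86, so this just shows the shape of the pattern).
@autotvm.register_topi_compute(nn.conv2d, 'cpu', 'direct')
def conv2d_cpu_sketch(cfg, data, kernel, strides, padding, dilation, layout, out_dtype):
    # cfg is the autotvm config space; here we simply fall back to the
    # default NCHW compute.
    return nn.conv2d_nchw(data, kernel, strides, padding, dilation, out_dtype)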

@kumasento (Contributor Author)

Thank you so much @kevinthesun. I will try this out tomorrow and get back to you.

@kumasento (Contributor Author)

@kevinthesun

Sorry for getting back late.

I did a bit more investigation into the register_topi_compute function you pointed me to. The problem is that this function is never called in my case (target="llvm", opt_level=1), so I cannot get workload in outs[0].op.attrs.

I will spend more time in this direction, but if you have any other suggestions, please let me know. Thanks!

@kumasento (Contributor Author)

I'm still stuck here. I was trying to figure out why I cannot get a workload-annotated outs in schedule_conv2d, but its caller in the call stack is within the C++ codebase, specifically GraphRuntimeCodegen::Codegen.

If possible, could anyone give me a hint about where Codegen might call the schedule_conv2d function, so that I can trace the outs object back to its origin? Thanks!

@vinx13 (Member)

vinx13 commented Apr 23, 2019

@kumasento schedule_conv2d is called here:

tvm/src/relay/backend/compile_engine.cc, line 128 (at 3f835bd):
fschedule[master_op_](master_attrs_, tensor_outs, target_);

@kumasento (Contributor Author)

Thanks! I am checking it now.

@kumasento changed the title from "Fixed issue #3069 by adding in_channels" to "Fixed issue #3069 by register group_conv2d_nchw to TOPI" on Apr 23, 2019
@kumasento changed the title from "Fixed issue #3069 by register group_conv2d_nchw to TOPI" to "Fixed issue #3069 by registering group_conv2d_nchw to TOPI" on Apr 23, 2019
@kumasento (Contributor Author)

@vinx13 Thank you so much for your help. Now I can avoid adding new attributes to Conv2DAttrs to get the number of input channels, and can therefore correctly distinguish depthwise and group conv2d.

If I'm on the right track, I will move on to implement some test cases for this PR.

@tqchen @kevinthesun would you mind taking a look at this solution as well? Thanks

@@ -133,12 +141,18 @@ def schedule_conv2d(attrs, outs, target):
    if groups == 1 and layout == "NHWC":
        return topi.generic.schedule_conv2d_nhwc(outs)
    if groups != 1:
        if layout == "NCHW":
            # collect in_channels to distinguish depthwise and group conv2d
@kumasento (Contributor Author)

Here I updated the logic to distinguish depthwise and group conv2d

@@ -523,3 +523,15 @@ def traverse(op):

    traverse(outs[0].op)
    return s

@autotvm.register_topi_compute(nn.group_conv2d_nchw, 'cpu', 'direct')
@kumasento (Contributor Author)

And this is where the registration happens.

op = _find_conv2d_op(outs[0].op)
assert op is not None

is_depthwise = 'depthwise' in op.tag
@kumasento (Contributor Author)

I'm not sure whether this is the best way to go, but checking whether 'depthwise' is in the tag value seems much easier and more intuitive.
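
Putting the pieces together, the dispatch ends up looking roughly like the sketch below (a simplified reconstruction around the snippet above, not the verbatim final code; the traversal helper is written out here for illustration):

import topi

def _find_conv2d_op(op):
    # Walk up the producer chain from `op` and return the conv2d op, if any.
    if 'conv2d' in op.tag:
        return op
    for tensor in op.input_tensors:
        found = _find_conv2d_op(tensor.op)
        if found is not None:
            return found
    return None

def _schedule_grouped_conv2d_sketch(outs):
    op = _find_conv2d_op(outs[0].op)
    assert op is not None
    # TOPI tags depthwise computes with 'depthwise...', so the tag alone is
    # enough to tell depthwise apart from general group conv2d.
    if 'depthwise' in op.tag:
        return topi.generic.schedule_depthwise_conv2d_nchw(outs)
    return topi.generic.schedule_group_conv2d_nchw(outs)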

@kumasento changed the title from "Fixed issue #3069 by registering group_conv2d_nchw to TOPI" to "Fixed issue #3069 by registering group_conv2d_nchw to TOPI and checking op tag" on Apr 23, 2019
@@ -27,7 +27,7 @@
from .. import generic, tag
from .. import nn
from ..util import get_const_tuple
-from ..nn.conv2d import conv2d, conv2d_NCHWc, \
+from ..nn.conv2d import conv2d, conv2d_NCHWc, _group_conv2d_nchw, \
Contributor

We'd better not import an internal function. I think in nn.py we don't need an extra internal function for group_conv2d_nchw.

@kumasento (Contributor Author)

Thanks! I've removed this update since there is no need to register a TOPI compute for group_conv2d_nchw now.

@kevinthesun (Contributor)

Maybe it's worth adding a test case to verify that the correct schedules are fetched for different kinds of conv2d?

if layout == "NHWC" and kernel_layout == "HWOI":
return topi.generic.schedule_depthwise_conv2d_nhwc(outs)
else:
if layout == "NCHW":
Member

layout in ['NCHW', 'NCHW4c']

@kumasento (Contributor Author)

Thanks! Just updated

@@ -523,3 +523,15 @@ def traverse(op):

    traverse(outs[0].op)
    return s

@autotvm.register_topi_compute(nn.group_conv2d_nchw, 'cpu', 'direct')
Member

We don't need to register this as an autotvm template since you are using the depthwise tag.

@kumasento (Contributor Author)

Yes, I agree. I've just removed it.

@kumasento changed the title from "Fixed issue #3069 by registering group_conv2d_nchw to TOPI and checking op tag" to "Fixed issue #3069 by checking op tag" on Apr 24, 2019
@kumasento (Contributor Author)

> Maybe it's worth adding a test case to verify that the correct schedules are fetched for different kinds of conv2d?

That would be great. I'm still looking for the right file for adding such a test under tests/python/relay, but I cannot find one. Should I create a new test file for it?

@kumasento (Contributor Author)

kumasento commented Apr 25, 2019

@vinx13 Sure, I have added the assertions in the latest commit.

But the second problem persists. I've dug into the error and noticed that group_conv2d_nchw_cuda returns a 5D tensor, which doesn't match the interface expected by Relay.

TVMError: Check failed: Equal(lhs, rhs): Failed to match the compute with TensorIntrin tensor_intrin's declaration  provided= reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=[(int32(x(rc))*int32(y(rc)))], axis=[iter_var(rc, Range(min=0, extent=4))], where=(uint1)1, value_index=0), intrin=  reduce(combiner=comm_reducer(result=[(x + y)], lhs=[x], rhs=[y], identity_element=[0]), source=[(int32(x(rc))*int32(y(rc)))], axis=[iter_var(rc, Range(min=0, extent=4))], where=(uint1)1, value_index=0)
Error during compile function

You may also notice this in the CI error message.

Should I alter the shape of the tensor returned from group_conv2d_nchw_cuda directly? Thanks

@vinx13 (Member)

vinx13 commented Apr 25, 2019

@kumasento group_conv in NCHWc layout on CUDA is not intended to be used directly. The autotvm registration is incorrect. Currently group_conv2d on CUDA has only an int8 template, which should not be registered under 'direct'.
Replacing https://github.com/dmlc/tvm/blob/1e84d04cbd56e2dc54f8e099acaa783607714796/topi/python/topi/cuda/group_conv2d_nchw.py#L30 with the lines below will fix the problem.

autotvm.register_topi_compute(nn.group_conv2d_nchw, ['cuda', 'gpu'], 'direct', nn.group_conv2d_nchw.fdefault)
 
@autotvm.register_topi_compute(nn.group_conv2d_nchw, ['cuda', 'gpu'], ['int8'])

Since we don't have a direct schedule on CUDA yet, we need to disable the group conv test on GPU.

@kumasento (Contributor Author)

kumasento commented Apr 25, 2019

I think it might be because group_conv2d_nchw_cuda is actually designed for NCHW4c?

UPDATE: I just noticed your comment :) Yes, I agree with that

@vinx13 (Member)

vinx13 commented Apr 25, 2019

> I think it might be because group_conv2d_nchw_cuda is actually designed for NCHW4c?

Yes, it is NCHW4c for the int8 dtype. It's expected to be called by the AlterOpLayout pass.

Removed 'direct' CUDA tests
padding=(0, 1), channels=10, kernel_size=(1 ,3))
# mixed precision group conv2d
# NOTE(kumasento): This test cannot pass
@kumasento (Contributor Author)

I tried to schedule group_conv2d_NCHW4c_int8 here but this test failed. It needs some further investigation.

@vinx13 (Member), Apr 26, 2019

@kumasento By default we use direct schedule, so this test won't pass on CUDA. I would suggest disabling CUDA target for now

@kumasento (Contributor Author)

Alright, I will just remove these commented lines of tests.

@kumasento (Contributor Author)

@vinx13 would you mind doing another round of review? Thanks!

@@ -575,6 +575,11 @@ def group_conv2d_nchw(Input, Filter, stride, padding, dilation, groups, out_dtyp
    else:
        dilation_h, dilation_w = dilation

    if Input.dtype == 'int8':
        tag = 'group_conv2d_NCHWc_int8'
@vinx13 (Member), Apr 25, 2019

This is a generic compute function that can handle any dtype and should not be tagged as NCHWc. There is another implementation optimized for int8 on the CUDA backend.

@kumasento (Contributor Author)

Sure, I've just removed that. Thanks for pointing it out.

@tqchen (Member)

tqchen commented Apr 25, 2019

Will leave the PR management to @vinx13

@vinx13 requested a review from kevinthesun on April 26, 2019, 12:52
@vinx13 merged commit 8f56949 into apache:master on Apr 27, 2019
@vinx13 (Member)

vinx13 commented Apr 27, 2019

Thanks @kumasento @kevinthesun @tqchen, this is merged.

wweic pushed a commit to wweic/tvm that referenced this pull request May 13, 2019
* Fixed issue apache#3069 by adding in_channels

* Registered group_conv2d_nchw as topi compute

* Improved by checking tag value

* Removed group_conv2d_nchw topi registration

* Added test for relay group_conv2d_nchw

* Added assertions to forbid small group size

* Removed hard-coded oc_block_factor

* Added explanatory comments to group_conv2d_nchw_cuda

* Updated group_conv2d_nchw_cuda schedule

Removed 'direct' CUDA tests

* Reverted an accidental change in a conv2d test

* Fixed indentation problems

* Fixed a mis-commented line

* Reverted change in group_conv2d_nchw tag

* Removed commented int8 group_conv2d test

* Fixed group size assertions in group_conv2d_nchw_cuda
wweic pushed a commit to neo-ai/tvm that referenced this pull request May 13, 2019