Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for TFLite QnnSubtract operator. #5230

Merged
merged 1 commit into from Apr 10, 2020

Conversation

shoubhik
Copy link
Contributor

@shoubhik shoubhik commented Apr 3, 2020

  • Adding TFLite quantized subtract operator with test case.
  • Adding fix for depthwise conv2d if input channel is 1 with test case.

@shoubhik
Copy link
Contributor Author

shoubhik commented Apr 3, 2020

@anijain2305 and @zhiics can you take a look at this PR.

@anijain2305
Copy link
Contributor

anijain2305 commented Apr 3, 2020

@FrozenGene @yzhliu Can you take a look as well? Idea is that if input channel == 1, the depthwise conv becomes simple conv.

@shoubhik
Copy link
Contributor Author

shoubhik commented Apr 3, 2020

The context for the change is this - tensorflow/tensorflow#23563

@shoubhik
Copy link
Contributor Author

shoubhik commented Apr 6, 2020

@FrozenGene @yzhliu Can you take a look at the fix for depthwise conv2d with 1 input channel.

@anijain2305
Copy link
Contributor

anijain2305 commented Apr 7, 2020

@FrozenGene If we have C = 1, then depth wise conv becomes normal conv. There is nothing to accumulate across input channels basically. And depth_multiplier becomes equal to the number of output channels. What do you think? Is the change good with you?

@FrozenGene
Copy link
Member

@FrozenGene If we have C = 1, then depth wise conv becomes normal conv. There is nothing to accumulate across input channels basically. And depth_multiplier becomes equal to the number of output channels. What do you think? Is the change good with you?

I think it is ok. I met this thing ever. That is when input channel is 1 (for example, gray image), we will get depthwise convolution of multiplier greater than 1 rather than normal convolution. Would you mind doing one performance comparison? I expect we could get better performance when we use normal convolution.

@anijain2305
Copy link
Contributor

@FrozenGene It is not possible to use depthwise conv inside Relay here. The groups=1 in this case and Relay code only considers a conv as depthwise when groups > 1. So, I don't think there are 2 choices here. Let us know if you are ok with this.

@FrozenGene
Copy link
Member

FrozenGene commented Apr 9, 2020

@FrozenGene It is not possible to use depthwise conv inside Relay here. The groups=1 in this case and Relay code only considers a conv as depthwise when groups > 1. So, I don't think there are 2 choices here. Let us know if you are ok with this.

Yes. My meaning is before we decide to change or not, could we do one performance comparation so that we know depthwise with multiplier > 1 is good or becomes normal convolution is good.

@anijain2305
Copy link
Contributor

Thanks for the reply. Can you please share what two things we are comparing? I am still confused. Please let me know example of input shape, depth multiplier, groups etc.

@FrozenGene
Copy link
Member

FrozenGene commented Apr 9, 2020

Thanks for the reply. Can you please share what two things we are comparing? I am still confused. Please let me know example of input shape, depth multiplier, groups etc.

I leverage this issue tensorflow/tensorflow#23563 we talked about. For example we feed the input (1, 224, 224, 1),in our tensorflow we have convolution to handle it. i.e. input -> convolution. However, the tflite will change the convolution into depthwise convolution with multiplier greater than 1, the input channel will be 1. Like this picture of tensorflow issue:
image
I suspect if we don't change to depthwise convolution with multiplier greater than 1 and keep the original convolution, like this:
image
We could get better performance.

@anijain2305
Copy link
Contributor

Correct. So here, TFLite converts Conv to depthwise conv, where depthwise conv has input_channels = groups = 1 and depth multiplier > 1

This PR converts this type of depthwise conv back to normal conv. Exactly the same way that you are suggesting. (There is not even a way to run Relay depthwise conv with groups = 1)

Let us know if that is ok.

@FrozenGene
Copy link
Member

Correct. So here, TFLite converts Conv to depthwise conv, where depthwise conv has input_channels = groups = 1 and depth multiplier > 1

This PR converts this type of depthwise conv back to normal conv. Exactly the same way that you are suggesting. (There is not even a way to run Relay depthwise conv with groups = 1)

Let us know if that is ok.

Yes. Convolution should be better I expect. However, I wish we could have one performance comparation result between these two ways. So that we decide whether we keep depthwise convolution with multiplier greater than 1 or like this pr converting back to normal convolution finally.

@shoubhik
Copy link
Contributor Author

shoubhik commented Apr 9, 2020

Correct. So here, TFLite converts Conv to depthwise conv, where depthwise conv has input_channels = groups = 1 and depth multiplier > 1
This PR converts this type of depthwise conv back to normal conv. Exactly the same way that you are suggesting. (There is not even a way to run Relay depthwise conv with groups = 1)
Let us know if that is ok.

Yes. Convolution should be better I expect. However, I wish we could have one performance comparation result between these two ways. So that we decide whether we keep depthwise convolution with multiplier greater than 1 or like this pr converting back to normal convolution finally.

@FrozenGene I think I have not clarified the issue why we are converting the kernel layout for HWOI to HWIO when number of input channels is 1.

First, let us take the example of the current scenario

params['kernel_layout'] = 'HWOI'

Assuming you have a network you mentioned before with depthwise convolution with input_channel = 1 and try to parse a Relay graph using the from_tflite API you get an error like this

v0.0.4
fn (%input: Tensor[(1, 76, 64, 1), uint8], %v_param_1: Tensor[(64, 1), uint8], %v_param_2: Tensor[(1, 64, 1), uint8], %v_param_3: Tensor[(9, 5, 1, 96), uint8], ...) {
  %0 = qnn.subtract(%input, %v_param_1, ...);
  %1 = qnn.mul(%0, %v_param_2, ...);
  %2 = qnn.conv2d(%1, %v_param_3, ..., data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32") in particular dimension 2 conflicts 96 does not match 1dimension 3 conflicts 1 does not match 96; unable to unify: `Tensor[(9, 5, 96, 1), uint8]` and `Tensor[(9, 5, 1, 96), uint8]`; ;

The change

params['kernel_layout'] = 'HWIO' if input_c == 1 else 'HWOI'

fixes this issue.

This error happens because input_channel is 1 and in the Conv2dRel function fails as there are checks for param->groups > 1. So this fix is mainly to get the current implementation working.

In order to do the benchmarking, you are suggesting we need to change the underlying Conv2dRel functionality which I think should be part of another PR and RFC perhaps.

@FrozenGene
Copy link
Member

Correct. So here, TFLite converts Conv to depthwise conv, where depthwise conv has input_channels = groups = 1 and depth multiplier > 1
This PR converts this type of depthwise conv back to normal conv. Exactly the same way that you are suggesting. (There is not even a way to run Relay depthwise conv with groups = 1)
Let us know if that is ok.

Yes. Convolution should be better I expect. However, I wish we could have one performance comparation result between these two ways. So that we decide whether we keep depthwise convolution with multiplier greater than 1 or like this pr converting back to normal convolution finally.

@FrozenGene I think I have not clarified the issue why we are converting the kernel layout for HWOI to HWIO when number of input channels is 1.

First, let us take the example of the current scenario

params['kernel_layout'] = 'HWOI'

Assuming you have a network you mentioned before with depthwise convolution with input_channel = 1 and try to parse a Relay graph using the from_tflite API you get an error like this

v0.0.4
fn (%input: Tensor[(1, 76, 64, 1), uint8], %v_param_1: Tensor[(64, 1), uint8], %v_param_2: Tensor[(1, 64, 1), uint8], %v_param_3: Tensor[(9, 5, 1, 96), uint8], ...) {
  %0 = qnn.subtract(%input, %v_param_1, ...);
  %1 = qnn.mul(%0, %v_param_2, ...);
  %2 = qnn.conv2d(%1, %v_param_3, ..., data_layout="NHWC", kernel_layout="HWOI", out_dtype="int32") in particular dimension 2 conflicts 96 does not match 1dimension 3 conflicts 1 does not match 96; unable to unify: `Tensor[(9, 5, 96, 1), uint8]` and `Tensor[(9, 5, 1, 96), uint8]`; ;

The change

params['kernel_layout'] = 'HWIO' if input_c == 1 else 'HWOI'

fixes this issue.

This error happens because input_channel is 1 and in the Conv2dRel function fails as there are checks for param->groups > 1. So this fix is mainly to get the current implementation working.

In order to do the benchmarking, you are suggesting we need to change the underlying Conv2dRel functionality which I think should be part of another PR and RFC perhaps.

Oh...I understand it fully now. Thanks for clarifying. If we don't change the Conv2dRel, yes, we can not do benchmark comparison currently. I am fine with this change.

@anijain2305 anijain2305 merged commit 6ecfaaf into apache:master Apr 10, 2020
@anijain2305
Copy link
Contributor

Thanks @shoubhik @FrozenGene This is merged!

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Apr 16, 2020
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Apr 17, 2020
dpankratz pushed a commit to dpankratz/incubator-tvm that referenced this pull request Apr 24, 2020
shoubhik added a commit to shoubhik/incubator-tvm that referenced this pull request May 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants