
[Frontend][PaddlePaddle] Update the export method of PaddlePaddle Softmax #16653

Open · wants to merge 7 commits into main

Conversation

@Zheng-Bicheng (Contributor) commented Feb 28, 2024

Introduction

When a PaddlePaddle model wants to offload operators to CMSIS-NN, the graph must match qnn_softmax_pattern() in cmsisnn.py. Currently, when a PaddlePaddle model is converted for CMSIS-NN, Softmax is split into several different operators, so it cannot be offloaded to CMSIS-NN.
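
For context, a simplified sketch of the pattern the CMSIS-NN partitioner matches; the full definition (with its additional predicate checks) lives in python/tvm/relay/op/contrib/cmsisnn.py:

```python
from tvm.relay.dataflow_pattern import is_constant, is_op, wildcard

def qnn_softmax_pattern():
    """Match the quantized softmax sequence CMSIS-NN can offload:
    qnn.dequantize -> nn.softmax -> qnn.quantize."""
    pattern = is_op("qnn.dequantize")(wildcard(), is_constant(), is_constant())
    pattern = is_op("nn.softmax")(pattern)
    return is_op("qnn.quantize")(pattern, is_constant(), is_constant())
```

If the frontend emits softmax as a chain of exp/sum/divide operators instead, no subgraph matches this pattern and nothing is offloaded.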

@leandron (Contributor) left a comment


Given this is improving coverage for operators to be offloaded to CMSIS-NN, does it make sense to add a test case in the CMSIS-NN namespace (https://github.com/apache/tvm/tree/main/tests/python/contrib/test_cmsisnn) that tests for the proposed change, so that it doesn't regress in the future?

I suggest this is done in this same PR, so that we are not missing out on tests.

@Zheng-Bicheng (Contributor, Author) commented

> Given this is improving coverage for operators to be offloaded to CMSIS-NN, does it make sense to add a test case in the CMSIS-NN namespace (https://github.com/apache/tvm/tree/main/tests/python/contrib/test_cmsisnn) that tests for the proposed change, so that it doesn't regress in the future?
>
> I suggest this is done in this same PR, so that we are not missing out on tests.

Hello Leandron (@leandron), I understand your concerns, but TVM currently does not support reading quantized models from Paddle, so the Softmax operator cannot yet be offloaded to CMSIS-NN. I am working on adding support for TVM to handle Paddle's quantized models, and I plan to add this test after TVM PR #16651 is merged.

@Zheng-Bicheng (Contributor, Author) commented

@jiangjiajun

@jiangjiajun (Contributor) left a comment


This PR optimizes the conversion mapping from the Paddle Softmax operator to the TVM operator, replacing the original combination of multiple small operators with a single softmax operator, which is more reasonable.
This PR looks good to me.

@leandron (Contributor) commented Mar 4, 2024

> Hello Leandron (@leandron), I understand your concerns, but TVM currently does not support reading quantized models from Paddle, so the Softmax operator cannot yet be offloaded to CMSIS-NN. I am working on adding support for TVM to handle Paddle's quantized models, and I plan to add this test after TVM PR #16651 is merged.

Sure, so let's get #16651 merged, then update this one with a test case, and then we merge this.

@Zheng-Bicheng (Contributor, Author) commented

Hi, @leandron. TVM PR #16651 has been merged. What kind of test should I add next? (I see you've already added tests for Softmax.)

@leandron (Contributor) commented Mar 7, 2024

> Hi, @leandron. TVM PR #16651 has been merged. What kind of test should I add next? (I see you've already added tests for Softmax.)

Generally speaking, I suggest adding tests that exercise the path being proposed here, that is, from PaddlePaddle to CMSIS-NN, including at least one softmax operator. Does that make sense?
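
For illustration, a minimal sketch of what such a test might look like, under stated assumptions: the single-layer model, the save path, and the input shape are hypothetical, and partition_for_cmsisnn is the entry point used by the existing tests in tests/python/contrib/test_cmsisnn:

```python
import paddle
from tvm import relay
from tvm.relay.op.contrib.cmsisnn import partition_for_cmsisnn

# Hypothetical minimal network: a single softmax layer.
net = paddle.nn.Softmax()
spec = [paddle.static.InputSpec(shape=[1, 10], dtype="float32", name="x")]
paddle.jit.save(paddle.jit.to_static(net, input_spec=spec), "/tmp/softmax_net")

# paddle.jit.load returns a TranslatedLayer, which from_paddle accepts.
loaded = paddle.jit.load("/tmp/softmax_net")
mod, params = relay.frontend.from_paddle(loaded, shape_dict={"x": [1, 10]})

# Partition for CMSIS-NN; with this PR, a (quantized) softmax should match
# qnn_softmax_pattern() as a single operator instead of a split-up chain.
mod = partition_for_cmsisnn(mod, params)
```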

@Zheng-Bicheng (Contributor, Author) commented

> Generally speaking, I suggest adding tests that exercise the path being proposed here, that is, from PaddlePaddle to CMSIS-NN, including at least one softmax operator. Does that make sense?

Are you suggesting using a PaddlePaddle model containing a softmax operator, specifying the runtime as CMSIS-NN, and then validating whether the output of CMSIS-NN matches that of the PaddlePaddle model? This approach might not be feasible at the moment. I've already highlighted the potential issues with this method in TVM PR #16651.

In simple terms, in the current version of TVM, when a quantized PaddlePaddle model is converted to a TVM model, there are discrepancies in the computation results. I'm confident this isn't an issue with my porting work, because the same problem exists with ONNX models.

You can review my detailed test code in TVM PR #16651, where I convert the quantized Paddle model to a TVM model and specify the target as llvm running on the CPU.

@leandron (Contributor) commented Mar 8, 2024

> Generally speaking, I suggest adding tests that exercise the path being proposed here, that is, from PaddlePaddle to CMSIS-NN, including at least one softmax operator. Does that make sense?
>
> Are you suggesting using a PaddlePaddle model containing a softmax operator, specifying the runtime as CMSIS-NN, and then validating whether the output of CMSIS-NN matches that of the PaddlePaddle model? This approach might not be feasible at the moment. I've already highlighted the potential issues with this method in TVM PR #16651.

I'm suggesting that tests need to be added when new code is contributed. Now, what degree of testing is practical for a specific contribution is part of the contribution process.

It is a known fact that TVM will have numerical discrepancies with various input frameworks, but that hasn't been a blocker to adding tests for the existing implementations across all the different frameworks. I don't see why PaddlePaddle should be an exception to that, and it is not clear, in the long term, what the rush is to merge contributions without tests, as that increases the project's technical debt.

@Zheng-Bicheng (Contributor, Author) commented

> I'm suggesting that tests need to be added when new code is contributed. Now, what degree of testing is practical for a specific contribution is part of the contribution process.
>
> It is a known fact that TVM will have numerical discrepancies with various input frameworks, but that hasn't been a blocker to adding tests for the existing implementations across all the different frameworks. I don't see why PaddlePaddle should be an exception to that, and it is not clear, in the long term, what the rush is to merge contributions without tests, as that increases the project's technical debt.

Sorry, English isn't my native language, so I might not have expressed my point clearly. The issue we're discussing isn't 'whether to add tests' but rather 'what level of error is acceptable in the tests.'

There's an inherent error between TVM and various frameworks, and without altering the core TVM code, we can only tolerate a larger margin of error. This error isn't unique to the PaddlePaddle framework; it's also present in the ONNX framework. It's not that PaddlePaddle models are a special case.

The pull request for supporting ONNX quantized models was merged earlier than the PR for supporting PaddlePaddle quantized models. Since we allow for errors in ONNX quantized models in TVM, why can't we allow for the same level of error in PaddlePaddle quantized models in TVM?

@Zheng-Bicheng (Contributor, Author) commented

For example, with the same input, assume there is a 5% error between the outputs of the TVM and Paddle models (using, for instance, cosine similarity as the measure of error). When determining whether the output of the TVM model is correct, is it acceptable to allow for this 5% error?
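
A minimal sketch of such a tolerance check (pure NumPy; the arrays and the 0.95 threshold are illustrative, not an established project standard):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.ravel().astype(np.float64), b.ravel().astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical outputs from running the same input through both models.
tvm_out = np.random.rand(1, 10).astype("float32")
paddle_out = tvm_out + np.random.normal(scale=1e-3, size=tvm_out.shape).astype("float32")

# Accept the TVM result if cosine similarity stays within the 5% margin.
assert cosine_similarity(tvm_out, paddle_out) > 0.95
```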

@leandron (Contributor) commented Mar 8, 2024

> The pull request for supporting ONNX quantized models was merged earlier than the PR for supporting PaddlePaddle quantized models. Since we allow for errors in ONNX quantized models in TVM, why can't we allow for the same level of error in PaddlePaddle quantized models in TVM?

Sure, I understand your point.

I agree with this approach, please add tests for the PaddlePaddle path offloading softmax to CMSIS-NN that contain an error margin for the expected numerical discrepancy. Please do that as part of this PR.

@Zheng-Bicheng (Contributor, Author) commented Mar 8, 2024

> Sure, I understand your point.
>
> I agree with this approach, please add tests for the PaddlePaddle path offloading softmax to CMSIS-NN that contain an error margin for the expected numerical discrepancy. Please do that as part of this PR.

I apologize for not expressing my meaning clearly. I will add the corresponding tests in this PR. 😊

@leandron (Contributor) commented Mar 8, 2024

> For example, with the same input, assume there is a 5% error between the outputs of the TVM and Paddle models (using, for instance, cosine similarity as the measure of error). When determining whether the output of the TVM model is correct, is it acceptable to allow for this 5% error?

If that's what's achievable, I'd say we should add the tests, merge this PR, and in parallel try to understand why this discrepancy is as high as 5%. Maybe that understanding will involve the PaddlePaddle community as well, so it's a longer process, and it shouldn't block this particular PR.

@Zheng-Bicheng (Contributor, Author) commented Mar 8, 2024

> If that's what's achievable, I'd say we should add the tests, merge this PR, and in parallel try to understand why this discrepancy is as high as 5%. Maybe that understanding will involve the PaddlePaddle community as well, so it's a longer process.

I fully endorse your point of view. I also believe that investigating the reasons behind the errors is crucial, as it helps make TVM's code more robust. After this PR is merged, I will create a separate issue to discuss this matter, as it concerns not only the PaddlePaddle community but also the ONNX community.

I've conducted some tests regarding the error. I pruned the model so that only a quantized conv2d operator remained and tested it with input data of the same shape but different values. I found that this error doesn't occur for every input; in most cases, it doesn't occur at all.

Subsequently, I analyzed each element of the output. When the error occurs, the majority of elements are identical, with only a small portion differing. Based on my work experience, I suspect this is likely due to improper handling of overflow during computation.
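
As a hedged illustration of that failure mode (pure NumPy, not the actual TVM code path), fixed-width integer arithmetic wraps silently, and only the elements near the type's boundary get corrupted:

```python
import numpy as np

x = np.array([100, 3, -7], dtype=np.int8)

wrapped = x + x                    # int8 arithmetic wraps: [-56, 6, -14]
correct = x.astype(np.int32) + x   # widened accumulator:   [200, 6, -14]

# Only the element near the int8 boundary is corrupted, matching the
# observation that most output elements agree and a few differ.
print(wrapped.tolist(), correct.tolist())
```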

@tqchen (Member) commented Mar 8, 2024

Chiming in here a bit. I think overall we would like to gradually move away from relying on e2e tests. In this case, there are usually two parts:

  • T0: the frontend imports into the right operator, unit-tested through structural equality, especially as we migrate towards Relax
  • T1: the overall accuracy of a particular pattern that causes a backend to regress; in this case it is usually desirable to construct the IR at the lowest level possible and test that directly

Moving forward, I think we should decouple T0 and T1 (likely into separate PRs). This would help us build more robust tests that run faster, keep unit tests truly unit-level, and locate errors more effectively. A T0-style check is sketched below.
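
A minimal sketch of such a T0 structural-equality unit test; both sides are built by hand here purely for illustration (in a real test, imported would come from relay.frontend.from_paddle):

```python
import tvm
from tvm import relay

# Expected Relay graph: a single softmax, rather than an exp/sum/divide split.
x = relay.var("x", shape=(1, 10), dtype="float32")
expected = relay.Function([x], relay.nn.softmax(x, axis=-1))

# Stand-in for the frontend's output; a real test would import a Paddle model.
imported = relay.Function([x], relay.nn.softmax(x, axis=-1))

assert tvm.ir.structural_equal(expected, imported)
```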

@Zheng-Bicheng (Contributor, Author) commented Mar 26, 2024

Hello, @leandron. I found in cmsisnn.py that the scale of softmax must be 1/256 and the zero point must be -128. Why is that? According to the formula Q(x_fp32, scale, zero_point) = round(x_fp32 / scale) + zero_point, the scale and zero point should be adjustable (for example, a scale of 1/128 with a zero point of 0 should still satisfy the int8 conditions), right?

@Zheng-Bicheng (Contributor, Author) commented

> Hello, @leandron. I found in cmsisnn.py that the scale of softmax must be 1/256 and the zero point must be -128. Why is that? According to the formula Q(x_fp32, scale, zero_point) = round(x_fp32 / scale) + zero_point, the scale and zero point should be adjustable (for example, a scale of 1/128 with a zero point of 0 should still satisfy the int8 conditions), right?

By the way, in my testing of the Paddle model, the scale is 0.0078649195 (close to 1/127) and the zero point is 0.

@leandron (Contributor) commented

> Hello, @leandron. I found in cmsisnn.py that the scale of softmax must be 1/256 and the zero point must be -128. Why is that? According to the formula Q(x_fp32, scale, zero_point) = round(x_fp32 / scale) + zero_point, the scale and zero point should be adjustable (for example, a scale of 1/128 with a zero point of 0 should still satisfy the int8 conditions), right?
>
> By the way, in my testing of the Paddle model, the scale is 0.0078649195 (close to 1/127) and the zero point is 0.

The SOFTMAX operator coming from CMSIS-NN is, by design, bit-accurate with TFLite.

Note the following restriction in the TFLite quantization spec for softmax:

```
SOFTMAX
  Input 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
  Output 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
    restriction: (scale, zero_point) = (1.0 / 256.0, -128)
```

In summary, the restriction is expected, and it comes from the bit-accurate support for TFLite.
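
As a worked check of why (1/256, -128) is the natural choice (a hedged sketch; the quantize helper below is illustrative, not a TVM or TFLite API): softmax outputs lie in [0, 1], and Q(x) = round(x / scale) + zero_point with these parameters stretches that interval over the full int8 range [-128, 127], whereas something like (1/128, 0) would use only the non-negative half:

```python
import numpy as np

def quantize(x, scale, zero_point):
    # Q(x_fp32, scale, zero_point) = round(x_fp32 / scale) + zero_point,
    # clamped to the int8 range.
    return np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)

probs = np.array([0.0, 0.25, 0.5, 1.0])  # possible softmax outputs

print(quantize(probs, 1.0 / 256.0, -128))  # [-128  -64    0  127] -> full int8 range
print(quantize(probs, 1.0 / 128.0, 0))     # [   0   32   64  127] -> only half the range
```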
