
[Frontend][PaddlePaddle] Update the export method of PaddlePaddle Softmax #16653

Open · wants to merge 7 commits into main

Conversation

@Zheng-Bicheng (Contributor) commented Feb 28, 2024

Introduction

When a PaddlePaddle model wants to offload operators to CMSIS-NN, the graph must match qnn_softmax_pattern() in cmsisnn.py. Currently, when a PaddlePaddle model is converted for CMSIS-NN, Softmax is split into several different operators, so it cannot be offloaded to CMSIS-NN.
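
For context, a simplified sketch of the pattern the CMSIS-NN partitioner matches; the full definition (with its additional predicate checks) lives in python/tvm/relay/op/contrib/cmsisnn.py:

```python
from tvm.relay.dataflow_pattern import is_constant, is_op, wildcard

def qnn_softmax_pattern():
    """Match the quantized softmax sequence CMSIS-NN can offload:
    qnn.dequantize -> nn.softmax -> qnn.quantize."""
    pattern = is_op("qnn.dequantize")(wildcard(), is_constant(), is_constant())
    pattern = is_op("nn.softmax")(pattern)
    return is_op("qnn.quantize")(pattern, is_constant(), is_constant())
```

If the frontend emits softmax as a chain of exp/sum/divide operators instead, no subgraph matches this pattern and nothing is offloaded.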

@leandron (Contributor) left a comment


Given this is improving coverage for operators to be offloaded to CMSIS-NN, does it make sense to add a test case in the CMSIS-NN namespace (https://github.com/apache/tvm/tree/main/tests/python/contrib/test_cmsisnn) that tests for the proposed change, so that it doesn't regress in the future?

I suggest this is done in this same PR, so that we are not missing out on tests.

@Zheng-Bicheng (Contributor, Author) commented

> Given this is improving coverage for operators to be offloaded to CMSIS-NN, does it make sense to add a test case in the CMSIS-NN namespace (https://github.com/apache/tvm/tree/main/tests/python/contrib/test_cmsisnn) that tests for the proposed change, so that it doesn't regress in the future?
>
> I suggest this is done in this same PR, so that we are not missing out on tests.

Hello Leandron (@leandron), I understand your concerns, but TVM currently does not support reading quantized models from Paddle, so the Softmax operator cannot yet be offloaded to CMSIS-NN. I am working on adding support for TVM to handle Paddle's quantized models, and I plan to add this test after TVM PR #16651 is merged.

@Zheng-Bicheng (Contributor, Author) commented

@jiangjiajun

@jiangjiajun (Contributor) left a comment


This PR optimizes the conversion mapping from the Paddle Softmax operator to the TVM operator, replacing the original combination of multiple small operators with a single softmax operator, which is more reasonable.
This PR looks good to me.

@leandron (Contributor) commented Mar 4, 2024

> Hello Leandron (@leandron), I understand your concerns, but TVM currently does not support reading quantized models from Paddle, so the Softmax operator cannot yet be offloaded to CMSIS-NN. I am working on adding support for TVM to handle Paddle's quantized models, and I plan to add this test after TVM PR #16651 is merged.

Sure, so let's get #16651 merged, then update this one with a test case, and then we merge this.

@Zheng-Bicheng (Contributor, Author) commented

Hi, @leandron. TVM PR #16651 has been merged. What kind of test should I add next? (I see you've already added tests for Softmax.)

@leandron (Contributor) commented Mar 7, 2024

> Hi, @leandron. TVM PR #16651 has been merged. What kind of test should I add next? (I see you've already added tests for Softmax.)

Generally speaking, I suggest adding tests that exercise the path being proposed here, that is, from PaddlePaddle to CMSIS-NN, including at least one softmax operator. Does that make sense?
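
For illustration, a minimal sketch of what such a test might look like, under stated assumptions: the single-layer model, the save path, and the input shape are hypothetical, and partition_for_cmsisnn is the entry point used by the existing tests in tests/python/contrib/test_cmsisnn:

```python
import paddle
from tvm import relay
from tvm.relay.op.contrib.cmsisnn import partition_for_cmsisnn

# Hypothetical minimal network: a single softmax layer.
net = paddle.nn.Softmax()
spec = [paddle.static.InputSpec(shape=[1, 10], dtype="float32", name="x")]
paddle.jit.save(paddle.jit.to_static(net, input_spec=spec), "/tmp/softmax_net")

# paddle.jit.load returns a TranslatedLayer, which from_paddle accepts.
loaded = paddle.jit.load("/tmp/softmax_net")
mod, params = relay.frontend.from_paddle(loaded, shape_dict={"x": [1, 10]})

# Partition for CMSIS-NN; with this PR, a (quantized) softmax should match
# qnn_softmax_pattern() as a single operator instead of a split-up chain.
mod = partition_for_cmsisnn(mod, params)
```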

@Zheng-Bicheng (Contributor, Author) commented

> Generally speaking, I suggest adding tests that exercise the path being proposed here, that is, from PaddlePaddle to CMSIS-NN, including at least one softmax operator. Does that make sense?

Are you suggesting using a PaddlePaddle model containing a softmax operator, specifying the runtime as CMSIS-NN, and then validating whether the output of CMSIS-NN matches that of the PaddlePaddle model? This approach might not be feasible at the moment. I've already highlighted the potential issues with this method in TVM PR #16651.

In simple terms, in the current version of TVM, when a quantized PaddlePaddle model is converted to a TVM model, there are discrepancies in the computation results. I'm confident this isn't an issue with my porting work, because the same problem exists with ONNX models.

You can review my detailed test code in TVM PR #16651, where I convert the quantized Paddle model to a TVM model and specify the target as llvm running on the CPU.

@leandron (Contributor) commented Mar 8, 2024

> Generally speaking, I suggest adding tests that exercise the path being proposed here, that is, from PaddlePaddle to CMSIS-NN, including at least one softmax operator. Does that make sense?
>
> Are you suggesting using a PaddlePaddle model containing a softmax operator, specifying the runtime as CMSIS-NN, and then validating whether the output of CMSIS-NN matches that of the PaddlePaddle model? This approach might not be feasible at the moment. I've already highlighted the potential issues with this method in TVM PR #16651.

I'm suggesting that tests need to be added when new code is contributed. Now, what degree of testing is practical for a specific contribution is part of the contribution process.

It is a known fact that TVM will have numerical discrepancies with various input frameworks, but that hasn't been a blocker to adding tests for the existing implementations across all the different frameworks. I don't see why PaddlePaddle should be an exception to that, and it is not clear, in the long term, what the rush is to merge contributions without tests, as that increases the project's technical debt.

@Zheng-Bicheng (Contributor, Author) commented

> I'm suggesting that tests need to be added when new code is contributed. Now, what degree of testing is practical for a specific contribution is part of the contribution process.
>
> It is a known fact that TVM will have numerical discrepancies with various input frameworks, but that hasn't been a blocker to adding tests for the existing implementations across all the different frameworks. I don't see why PaddlePaddle should be an exception to that, and it is not clear, in the long term, what the rush is to merge contributions without tests, as that increases the project's technical debt.

Sorry, English isn't my native language, so I might not have expressed my point clearly. The issue we're discussing isn't 'whether to add tests' but rather 'what level of error is acceptable in the tests.'

There's an inherent error between TVM and various frameworks, and without altering the core TVM code, we can only tolerate a larger margin of error. This error isn't unique to the PaddlePaddle framework; it's also present in the ONNX framework. It's not that PaddlePaddle models are a special case.

The pull request for supporting ONNX quantized models was merged earlier than the PR for supporting PaddlePaddle quantized models. Since we allow for errors in ONNX quantized models in TVM, why can't we allow for the same level of error in PaddlePaddle quantized models in TVM?

@Zheng-Bicheng (Contributor, Author) commented

For example, with the same input, assume there is a 5% error between the outputs of the TVM and Paddle models (using, for instance, cosine similarity as the measure of error). When determining whether the output of the TVM model is correct, is it acceptable to allow for this 5% error?
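
A minimal sketch of such a tolerance check (pure NumPy; the arrays and the 0.95 threshold are illustrative, not an established project standard):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.ravel().astype(np.float64), b.ravel().astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical outputs from running the same input through both models.
tvm_out = np.random.rand(1, 10).astype("float32")
paddle_out = tvm_out + np.random.normal(scale=1e-3, size=tvm_out.shape).astype("float32")

# Accept the TVM result if cosine similarity stays within the 5% margin.
assert cosine_similarity(tvm_out, paddle_out) > 0.95
```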

@leandron (Contributor) commented Mar 8, 2024

> The pull request for supporting ONNX quantized models was merged earlier than the PR for supporting PaddlePaddle quantized models. Since we allow for errors in ONNX quantized models in TVM, why can't we allow for the same level of error in PaddlePaddle quantized models in TVM?

Sure, I understand your point.

I agree with this approach, please add tests for the PaddlePaddle path offloading softmax to CMSIS-NN that contain an error margin for the expected numerical discrepancy. Please do that as part of this PR.

@Zheng-Bicheng (Contributor, Author) commented Mar 8, 2024

> Sure, I understand your point.
>
> I agree with this approach, please add tests for the PaddlePaddle path offloading softmax to CMSIS-NN that contain an error margin for the expected numerical discrepancy. Please do that as part of this PR.

I apologize for not expressing my meaning clearly. I will add the corresponding tests in this PR. 😊

@leandron (Contributor) commented Mar 8, 2024

> For example, with the same input, assume there is a 5% error between the outputs of the TVM and Paddle models (using, for instance, cosine similarity as the measure of error). When determining whether the output of the TVM model is correct, is it acceptable to allow for this 5% error?

If that's what's achievable, I'd say we should add the tests, merge this PR, and in parallel try to understand why this discrepancy is as high as 5%. Maybe that understanding will involve the PaddlePaddle community as well, so it's a longer process, and it shouldn't block this particular PR.

@Zheng-Bicheng (Contributor, Author) commented Mar 8, 2024

> If that's what's achievable, I'd say we should add the tests, merge this PR, and in parallel try to understand why this discrepancy is as high as 5%. Maybe that understanding will involve the PaddlePaddle community as well, so it's a longer process.

I fully endorse your point of view. I also believe that investigating the reasons behind the errors is crucial, as it helps make TVM's code more robust. After this PR is merged, I will create a separate issue to discuss this matter, as it concerns not only the PaddlePaddle community but also the ONNX community.

I've conducted some tests regarding the error. I pruned the model so that only a quantized conv2d operator remained and tested it with input data of the same shape but different values. I found that this error doesn't occur for every input; in most cases, it doesn't occur at all.

Subsequently, I analyzed each element of the output. When the error occurs, the majority of elements are identical, with only a small portion differing. Based on my work experience, I suspect this is likely due to improper handling of overflow during computation.
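
As a hedged illustration of that failure mode (pure NumPy, not the actual TVM code path), fixed-width integer arithmetic wraps silently, and only the elements near the type's boundary get corrupted:

```python
import numpy as np

x = np.array([100, 3, -7], dtype=np.int8)

wrapped = x + x                    # int8 arithmetic wraps: [-56, 6, -14]
correct = x.astype(np.int32) + x   # widened accumulator:   [200, 6, -14]

# Only the element near the int8 boundary is corrupted, matching the
# observation that most output elements agree and a few differ.
print(wrapped.tolist(), correct.tolist())
```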

@tqchen (Member) commented Mar 8, 2024

Chiming in here a bit. I think overall we would like to gradually move away from relying on e2e tests. In this case, there are usually two parts:

  • T0: the frontend imports into the right operator, unit-tested through structural equality, especially as we migrate towards Relax
  • T1: the overall accuracy of a particular pattern that causes a backend to regress; in this case it is usually desirable to construct the IR at the lowest level possible and test that directly

Moving forward, I think we should decouple T0 and T1 (likely into separate PRs). This would help us build more robust tests that run faster, keep unit tests truly unit-level, and locate errors more effectively. A T0-style check is sketched below.
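
A minimal sketch of such a T0 structural-equality unit test; both sides are built by hand here purely for illustration (in a real test, imported would come from relay.frontend.from_paddle):

```python
import tvm
from tvm import relay

# Expected Relay graph: a single softmax, rather than an exp/sum/divide split.
x = relay.var("x", shape=(1, 10), dtype="float32")
expected = relay.Function([x], relay.nn.softmax(x, axis=-1))

# Stand-in for the frontend's output; a real test would import a Paddle model.
imported = relay.Function([x], relay.nn.softmax(x, axis=-1))

assert tvm.ir.structural_equal(expected, imported)
```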

@Zheng-Bicheng (Contributor, Author) commented Mar 26, 2024

Hello, @leandron. I found in cmsisnn.py that the scale of softmax must be 1/256 and the zero point must be -128. Why is that? According to the formula Q(x_fp32, scale, zero_point) = round(x_fp32 / scale) + zero_point, the scale and zero point should be adjustable (for example, a scale of 1/128 with a zero point of 0 should still satisfy the int8 conditions), right?

@Zheng-Bicheng (Contributor, Author) commented

> Hello, @leandron. I found in cmsisnn.py that the scale of softmax must be 1/256 and the zero point must be -128. Why is that? According to the formula Q(x_fp32, scale, zero_point) = round(x_fp32 / scale) + zero_point, the scale and zero point should be adjustable (for example, a scale of 1/128 with a zero point of 0 should still satisfy the int8 conditions), right?

By the way, in my testing of the Paddle model, the scale is 0.0078649195 (close to 1/127) and the zero point is 0.

@leandron (Contributor) commented

> Hello, @leandron. I found in cmsisnn.py that the scale of softmax must be 1/256 and the zero point must be -128. Why is that? According to the formula Q(x_fp32, scale, zero_point) = round(x_fp32 / scale) + zero_point, the scale and zero point should be adjustable (for example, a scale of 1/128 with a zero point of 0 should still satisfy the int8 conditions), right?
>
> By the way, in my testing of the Paddle model, the scale is 0.0078649195 (close to 1/127) and the zero point is 0.

The SOFTMAX operator coming from CMSIS-NN is, by design, bit-accurate with TFLite.

Note the following restriction in the TFLite quantization spec for softmax:

```
SOFTMAX
  Input 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
  Output 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
    restriction: (scale, zero_point) = (1.0 / 256.0, -128)
```

In summary, the restriction is expected, and it comes from the bit-accurate support for TFLite.
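
As a worked check of why (1/256, -128) is the natural choice (a hedged sketch; the quantize helper below is illustrative, not a TVM or TFLite API): softmax outputs lie in [0, 1], and Q(x) = round(x / scale) + zero_point with these parameters stretches that interval over the full int8 range [-128, 127], whereas something like (1/128, 0) would use only the non-negative half:

```python
import numpy as np

def quantize(x, scale, zero_point):
    # Q(x_fp32, scale, zero_point) = round(x_fp32 / scale) + zero_point,
    # clamped to the int8 range.
    return np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)

probs = np.array([0.0, 0.25, 0.5, 1.0])  # possible softmax outputs

print(quantize(probs, 1.0 / 256.0, -128))  # [-128  -64    0  127] -> full int8 range
print(quantize(probs, 1.0 / 128.0, 0))     # [   0   32   64  127] -> only half the range
```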
