
[Frontend][PaddlePaddle] PaddlePaddle model with NCHW data format that supports quantization #16651

Merged: 17 commits into apache:main on Mar 7, 2024

Conversation

@Zheng-Bicheng (Contributor) commented Feb 28, 2024

PaddlePaddle model with NCHW data format that supports quantization

@Zheng-Bicheng changed the title from "[Frontend][PaddlePaddle] PaddlePaddle model with NHWC data format that supports quantization" to "[Frontend][PaddlePaddle] PaddlePaddle model with NCHW data format that supports quantization" on Feb 28, 2024
@Zheng-Bicheng (Contributor, Author) commented Feb 28, 2024

@jiangjiajun I have added support for the PaddleSlim quantization model, but TVM seems to have issues in this regard.
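As a rough sketch only (this is not the converter code in this PR), a PaddleSlim dequantize_linear / quantize_linear pair can be expressed with Relay QNN ops along the following lines, assuming per-tensor scale and zero point; the helper names below are made up for illustration:

import numpy as np
from tvm import relay

def dequant(x_int8, scale, zero_point):
    # qnn.dequantize computes (x - zero_point) * scale and returns float32
    return relay.qnn.op.dequantize(
        x_int8, relay.const(np.float32(scale)), relay.const(np.int32(zero_point))
    )

def quant(x_fp32, scale, zero_point):
    # qnn.quantize computes clip(round(x / scale) + zero_point) in int8
    return relay.qnn.op.quantize(
        x_fp32,
        relay.const(np.float32(scale)),
        relay.const(np.int32(zero_point)),
        out_dtype="int8",
    )

x = relay.var("x", shape=(1, 3, 224, 224), dtype="int8")
y = quant(dequant(x, 0.05, 0), 0.05, 0)
print(relay.Function([x], y))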

Test in PaddlePaddle

I used a MobileNetV1_QAT model trained with PaddleSlim (quantization-aware training) for testing, and the test code is:

import paddle
import tvm
from tvm import relay
from tvm.contrib import graph_executor
import numpy as np

log_file = "tune.json"
if __name__ == "__main__":
    input_shape = [1, 3, 224, 224]
    input_name = "inputs"

    paddle.enable_static()
    prefix = "MobileNetV1_QAT/inference"
    params_file_path = prefix + ".pdiparams"
    exe = paddle.static.Executor(paddle.CPUPlace())
    prog, feed_target_names, fetch_targets = paddle.static.load_inference_model(prefix, exe)
    # build
    mod, params = relay.frontend.from_paddle(prog, shape_dict={input_name: input_shape})

    with tvm.transform.PassContext(opt_level=5):
        lib = relay.build(mod, target="llvm", params=params)
    # create input data
    input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)

    # tvm inference
    ctx = tvm.cpu()
    tvm_model = graph_executor.GraphModule(lib['default'](ctx))
    tvm_model.set_input(input_name, input_data)
    tvm_model.run()
    tvm_output = tvm_model.get_output(0).asnumpy()

    # paddle inference
    paddle_output, = exe.run(prog, feed={feed_target_names[0]: input_data}, fetch_list=fetch_targets)
    print(np.argmax(tvm_output[0]), np.argmax(paddle_output[0]))
    np.testing.assert_allclose(tvm_output[0], paddle_output[0], rtol=1e-5, atol=1e-5)

I found that the test failed with the following error:

AssertionError: 
Not equal to tolerance rtol=1e-05, atol=1e-05

Mismatched elements: 5 / 1000 (0.5%)
Max absolute difference: 0.02359476
Max relative difference: 1.
 x: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,...
 y: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,...

Test in ONNX

To verify whether the problem is caused by the different inference mechanisms of the TVM and Paddle frameworks, I ran an additional test against ONNX Runtime. The input is the same model exported with Paddle2ONNX, and the test code is as follows:

import tvm
from tvm import relay
from tvm.contrib import graph_executor
import numpy as np
import onnx
import onnxruntime as rt

onnx_model_path = "MobileNetV1_QAT/inference.onnx"
log_file = "tune.json"
if __name__ == "__main__":
    input_shape = [1, 3, 224, 224]
    input_name = "inputs"

    # build
    onnx_model = onnx.load_model(onnx_model_path)
    mod, params = relay.frontend.from_onnx(onnx_model, shape={input_name: input_shape})

    with tvm.transform.PassContext(opt_level=5):
        lib = relay.build(mod, target="llvm", params=params)
    # create input data
    input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)

    # tvm inference
    ctx = tvm.cpu()
    tvm_model = graph_executor.GraphModule(lib['default'](ctx))
    tvm_model.set_input(input_name, input_data)
    tvm_model.run()
    tvm_output = tvm_model.get_output(0).asnumpy()

    sess = rt.InferenceSession(onnx_model_path, None)
    input_name = sess.get_inputs()[0].name
    out_name = sess.get_outputs()[0].name
    onnx_output = sess.run([out_name], {input_name: input_data})[0]

    print(np.max(tvm_output[0] - onnx_output[0]))
    print(np.argmax(tvm_output[0] - onnx_output[0]))
    np.testing.assert_allclose(tvm_output[0], onnx_output[0], rtol=1e-5, atol=1e-5)

I found that the test still failed, and the error was still large:

Mismatched elements: 270 / 1000 (27%)
Max absolute difference: 0.01025282
Max relative difference: 0.52028346
 x: array([6.539907e-06, 6.727985e-05, 5.355392e-05, 4.267169e-06,
       1.363640e-04, 1.172559e-04, 1.836461e-04, 6.608244e-06,
       7.928120e-06, 3.821332e-06, 1.304903e-05, 4.503604e-05,...
 y: array([6.145719e-06, 7.556988e-05, 5.032599e-05, 4.009968e-06,
       1.281447e-04, 1.101883e-04, 1.725770e-04, 6.209937e-06,
       7.450255e-06, 4.292187e-06, 1.465689e-05, 4.232152e-05,...

@Zheng-Bicheng (Contributor, Author) commented Feb 28, 2024

@jiangjiajun Is this level of error acceptable for a PaddleInference model? I tested with a single convolution operator, and in most cases a single operator meets the requirement of relative and absolute errors within 1e-5. My current judgment is that the accumulation of errors across multiple convolution operators changed the final output of the model.
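For intuition only, here is a standalone numpy sketch (unrelated to the actual model weights or to TVM) of how per-layer int8 fake-quantization rounding errors can compound across stacked layers; all shapes and scales are arbitrary:

import numpy as np

rng = np.random.default_rng(0)

def fake_quant(x):
    # Symmetric per-tensor int8 quantize followed by dequantize.
    scale = np.abs(x).max() / 127.0
    return np.clip(np.round(x / scale), -127, 127) * scale

x0 = rng.standard_normal(256).astype(np.float32)
ref, quant = x0, x0.copy()
for depth in range(1, 11):
    w = rng.standard_normal((256, 256)).astype(np.float32) / 16.0
    ref = np.maximum(ref @ w, 0.0)                              # float32 reference layer
    quant = np.maximum(fake_quant(quant) @ fake_quant(w), 0.0)  # fake-quantized layer
    err = np.abs(ref - quant).max() / (np.abs(ref).max() + 1e-12)
    print(f"depth {depth:2d}: max relative error {err:.2e}")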

@lhutton1 (Contributor):

@tvm-bot rerun

@Zheng-Bicheng (Contributor, Author):

@lhutton1, thanks for helping me rerun the CI. However, the errors in the CI [unity/pr-head] job seem to remain unresolved, and the tests still fail.

@jiangjiajun (Contributor) commented Feb 29, 2024

(Quoting the earlier comment:) @jiangjiajun For a PaddleInference model, this error should be acceptable, right? I tested with a single convolution operator, and in most cases a single convolution operator meets the requirement of relative and absolute errors within 1e-5. My current judgment is that the accumulation of errors across multiple convolution operators changed the final output of the model.

How about the difference between quantized paddle model and quantized onnx model?

@Zheng-Bicheng (Contributor, Author) commented Feb 29, 2024

@jiangjiajun I integrated the inference code for TVM, PaddlePaddle, and ONNX. The code is as follows:

import paddle
import tvm
from tvm import relay
from tvm.contrib import graph_executor
import numpy as np
import onnx
import onnxruntime as rt

# Model Attr
input_shape = [1, 3, 224, 224]
input_name = "inputs"


def infer_by_paddlepaddle(temp_prefix, temp_input_data):
    paddle.enable_static()
    exe = paddle.static.Executor(paddle.CPUPlace())
    temp_prog, feed_target_names, fetch_targets = paddle.static.load_inference_model(temp_prefix, exe)
    temp_output, = exe.run(temp_prog, feed={feed_target_names[0]: temp_input_data}, fetch_list=fetch_targets)
    return temp_prog, temp_output


def infer_by_onnx(temp_model_path, temp_input_data):
    sess = rt.InferenceSession(temp_model_path, None)
    temp_input_name = sess.get_inputs()[0].name
    out_name = sess.get_outputs()[0].name
    temp_onnx_output = sess.run([out_name], {temp_input_name: temp_input_data})[0]
    temp_onnx_model = onnx.load_model(temp_model_path)
    return temp_onnx_model, temp_onnx_output


def infer_by_tvm(temp_model, temp_input_data):
    if isinstance(temp_model, paddle.static.Program):
        # model is loaded by `paddle.static.load_inference_model`
        mod, params = relay.frontend.from_paddle(temp_model, shape_dict={input_name: input_shape})
    else:
        mod, params = relay.frontend.from_onnx(temp_model, shape={input_name: input_shape})

    with tvm.transform.PassContext(opt_level=5):
        lib = relay.build(mod, target="llvm", params=params)

    # tvm inference
    ctx = tvm.cpu()
    tvm_model = graph_executor.GraphModule(lib['default'](ctx))
    tvm_model.set_input(input_name, temp_input_data)
    tvm_model.run()
    tvm_output = tvm_model.get_output(0).asnumpy()
    return tvm_output


log_file = "tune.json"
if __name__ == "__main__":
    np.random.seed(520)
    # create input data
    input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)

    paddle_prefix = "MobileNetV1_QAT/inference"
    paddle_model, paddle_output = infer_by_paddlepaddle(paddle_prefix, input_data)

    onnx_model_path = "MobileNetV1_QAT/inference.onnx"
    onnx_model, onnx_output = infer_by_onnx(onnx_model_path, input_data)

    # Compare the outputs of the Paddle model and the ONNX model (passes)
    np.testing.assert_allclose(paddle_output[0], onnx_output[0], rtol=1e-5, atol=1e-5)

    # Compare the outputs of the TVM-from-Paddle model and the TVM-from-ONNX model (passes)
    tvm_paddle_result = infer_by_tvm(paddle_model, input_data)
    tvm_onnx_result = infer_by_tvm(onnx_model, input_data)
    np.testing.assert_allclose(tvm_paddle_result[0], tvm_onnx_result[0], rtol=1e-5, atol=1e-5)

    # Compare the outputs of the Paddle model and the TVM-from-Paddle model
    # np.testing.assert_allclose(tvm_paddle_result[0], paddle_output[0], rtol=1e-5, atol=1e-5)

    # Compare the outputs of the ONNX model and the TVM-from-ONNX model
    np.testing.assert_allclose(tvm_onnx_result[0], onnx_output[0], rtol=1e-5, atol=1e-5)

I found that, given the same input data, the outputs of the Paddle model and the ONNX model are consistent.

The differences between TVM and Paddle are as follows:

Mismatched elements: 4 / 1000 (0.4%)
Max absolute difference: 0.01572984
Max relative difference: 1.
 x: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,...
 y: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,...

The differences between TVM and ONNX are as follows:

Mismatched elements: 4 / 1000 (0.4%)
Max absolute difference: 0.01572984
Max relative difference: 1.
 x: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,...
 y: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,...

Therefore, my earlier statement should be considered incorrect: with the same input data, both the Paddle model and the ONNX model exhibit the same symptoms.
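As a small debugging aid (not part of this PR), a hypothetical helper like the one below lists exactly which output indices exceed the tolerance, which makes it easier to confirm that the same few logits fail for both the Paddle-derived and the ONNX-derived TVM results:

import numpy as np

def report_mismatches(x, y, rtol=1e-5, atol=1e-5):
    # Print every element pair that np.testing.assert_allclose would reject.
    x, y = np.asarray(x).ravel(), np.asarray(y).ravel()
    bad = ~np.isclose(x, y, rtol=rtol, atol=atol)
    for i in np.flatnonzero(bad):
        print(f"index {i}: x={x[i]:.6f} y={y[i]:.6f} abs diff={abs(x[i] - y[i]):.6f}")

# For example: report_mismatches(tvm_paddle_result[0], paddle_output[0])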

@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun tvm-wasm

@Zheng-Bicheng (Contributor, Author):

Hello, @Hzfengsy. I noticed you have submitted PRs related to tvm-bot, so I'd like to ask: is there a way to rerun only the failed unit tests, instead of using the "@tvm-bot rerun" command to rerun all CI tests? This would help speed up merging PRs.

@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun

1 similar comment
@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun

@Hzfengsy (Member):

@Zheng-Bicheng Good question. We do not have such a mechanism, because it's unsafe to rerun only the tests that failed. For example, a fix for test A, which failed last time, may introduce a new failure in test B.

It's a good method when debugging locally, but it's not suitable for CI.

@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun

@Zheng-Bicheng (Contributor, Author):

@Zheng-Bicheng Good question. We do not have such a mechanism, because it's unsafe to rerun only the tests that failed. For example, a fix for test A, which failed last time, may introduce a new failure in test B.

It's a good method when debugging locally, but it's not suitable for CI.

I understand what you mean, but I've found that CI currently encounters some unknown errors. For example:

CI[lint/pr-head] (Lint 1 of 2) log : log.txt

Each rerun may result in a different CI failure, and I haven't figured out what's causing it. It seems unrelated to the code I've submitted.

@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun

@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun

1 similar comment
@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun

@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun

@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun

@jiangjiajun (Contributor) left a review comment:


LGTM

@jiangjiajun merged commit e005f85 into apache:main on Mar 7, 2024
19 checks passed
Lunderberg pushed a commit to Lunderberg/tvm that referenced this pull request Mar 12, 2024
…t supports quantization (apache#16651)

* support conv2d when data_format is NHWC

* modify the annotation

* Do not convert input data when processing quantization conv_2d nodes

* Fix code formatting issues

* fixed error code format

* update dequantize and quantize

* fixed bug when model is fp32 model

* update dequantize and quantize

* update for paddle quantize model when format is NCHW
thaisacs pushed a commit to thaisacs/tvm that referenced this pull request Apr 3, 2024
…t supports quantization (apache#16651)

* support conv2d when data_format is NHWC

* modify the annotation

* Do not convert input data when processing quantization conv_2d nodes

* Fix code formatting issues

* fixed error code format

* update dequantize and quantize

* fixed bug when model is fp32 model

* update dequantize and quantize

* update for paddle quantize model when format is NCHW