
[Frontend][PaddlePaddle] PaddlePaddle model with NCHW data format that supports quantization #16651

Merged: 17 commits into apache:main on Mar 7, 2024

Conversation

@Zheng-Bicheng (Contributor) commented Feb 28, 2024

PaddlePaddle model with NCHW data format that supports quantization

@Zheng-Bicheng changed the title from "[Frontend][PaddlePaddle] PaddlePaddle model with NHWC data format that supports quantization" to "[Frontend][PaddlePaddle] PaddlePaddle model with NCHW data format that supports quantization" on Feb 28, 2024
@Zheng-Bicheng (Contributor, Author) commented Feb 28, 2024

@jiangjiajun I have added support for the PaddleSlim quantization model, but TVM seems to have issues in this regard.
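As a rough sketch only (this is not the converter code in this PR), a PaddleSlim dequantize_linear / quantize_linear pair can be expressed with Relay QNN ops along the following lines, assuming per-tensor scale and zero point; the helper names below are made up for illustration:

import numpy as np
from tvm import relay

def dequant(x_int8, scale, zero_point):
    # qnn.dequantize computes (x - zero_point) * scale and returns float32
    return relay.qnn.op.dequantize(
        x_int8, relay.const(np.float32(scale)), relay.const(np.int32(zero_point))
    )

def quant(x_fp32, scale, zero_point):
    # qnn.quantize computes clip(round(x / scale) + zero_point) in int8
    return relay.qnn.op.quantize(
        x_fp32,
        relay.const(np.float32(scale)),
        relay.const(np.int32(zero_point)),
        out_dtype="int8",
    )

x = relay.var("x", shape=(1, 3, 224, 224), dtype="int8")
y = quant(dequant(x, 0.05, 0), 0.05, 0)
print(relay.Function([x], y))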

Test in PaddlePaddle

I used a MobileNetV1_QAT model trained with PaddleSlim (quantization-aware training) for testing, and the test code is:

import paddle
import tvm
from tvm import relay
from tvm.contrib import graph_executor
import numpy as np

log_file = "tune.json"
if __name__ == "__main__":
    input_shape = [1, 3, 224, 224]
    input_name = "inputs"

    paddle.enable_static()
    prefix = "MobileNetV1_QAT/inference"
    params_file_path = prefix + ".pdiparams"
    exe = paddle.static.Executor(paddle.CPUPlace())
    prog, feed_target_names, fetch_targets = paddle.static.load_inference_model(prefix, exe)
    # build
    mod, params = relay.frontend.from_paddle(prog, shape_dict={input_name: input_shape})

    with tvm.transform.PassContext(opt_level=5):
        lib = relay.build(mod, target="llvm", params=params)
    # create input data
    input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)

    # tvm inference
    ctx = tvm.cpu()
    tvm_model = graph_executor.GraphModule(lib['default'](ctx))
    tvm_model.set_input(input_name, input_data)
    tvm_model.run()
    tvm_output = tvm_model.get_output(0).asnumpy()

    # paddle inference
    paddle_output, = exe.run(prog, feed={feed_target_names[0]: input_data}, fetch_list=fetch_targets)
    print(np.argmax(tvm_output[0]), np.argmax(paddle_output[0]))
    np.testing.assert_allclose(tvm_output[0], paddle_output[0], rtol=1e-5, atol=1e-5)

I found that the test failed with the following error:

AssertionError: 
Not equal to tolerance rtol=1e-05, atol=1e-05

Mismatched elements: 5 / 1000 (0.5%)
Max absolute difference: 0.02359476
Max relative difference: 1.
 x: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,...
 y: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,...

Test in ONNX

To verify whether the problem is caused by the different inference mechanisms of the TVM and Paddle frameworks, I ran an additional test against ONNX Runtime. The input is the same model exported with Paddle2ONNX, and the test code is as follows:

import tvm
from tvm import relay
from tvm.contrib import graph_executor
import numpy as np
import onnx
import onnxruntime as rt

onnx_model_path = "MobileNetV1_QAT/inference.onnx"
log_file = "tune.json"
if __name__ == "__main__":
    input_shape = [1, 3, 224, 224]
    input_name = "inputs"

    # build
    onnx_model = onnx.load_model(onnx_model_path)
    mod, params = relay.frontend.from_onnx(onnx_model, shape={input_name: input_shape})

    with tvm.transform.PassContext(opt_level=5):
        lib = relay.build(mod, target="llvm", params=params)
    # create input data
    input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)

    # tvm inference
    ctx = tvm.cpu()
    tvm_model = graph_executor.GraphModule(lib['default'](ctx))
    tvm_model.set_input(input_name, input_data)
    tvm_model.run()
    tvm_output = tvm_model.get_output(0).asnumpy()

    sess = rt.InferenceSession(onnx_model_path, None)
    input_name = sess.get_inputs()[0].name
    out_name = sess.get_outputs()[0].name
    onnx_output = sess.run([out_name], {input_name: input_data})[0]

    print(np.max(tvm_output[0] - onnx_output[0]))
    print(np.argmax(tvm_output[0] - onnx_output[0]))
    np.testing.assert_allclose(tvm_output[0], onnx_output[0], rtol=1e-5, atol=1e-5)

I found that the test still failed, and the error was still large:

Mismatched elements: 270 / 1000 (27%)
Max absolute difference: 0.01025282
Max relative difference: 0.52028346
 x: array([6.539907e-06, 6.727985e-05, 5.355392e-05, 4.267169e-06,
       1.363640e-04, 1.172559e-04, 1.836461e-04, 6.608244e-06,
       7.928120e-06, 3.821332e-06, 1.304903e-05, 4.503604e-05,...
 y: array([6.145719e-06, 7.556988e-05, 5.032599e-05, 4.009968e-06,
       1.281447e-04, 1.101883e-04, 1.725770e-04, 6.209937e-06,
       7.450255e-06, 4.292187e-06, 1.465689e-05, 4.232152e-05,...

@Zheng-Bicheng (Contributor, Author) commented Feb 28, 2024

@jiangjiajun Is this level of error acceptable for a PaddleInference model? I tested with a single convolution operator, and in most cases a single operator meets the requirement of relative and absolute errors within 1e-5. My current judgment is that the accumulation of errors across multiple convolution operators changed the final output of the model.
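For intuition only, here is a standalone numpy sketch (unrelated to the actual model weights or to TVM) of how per-layer int8 fake-quantization rounding errors can compound across stacked layers; all shapes and scales are arbitrary:

import numpy as np

rng = np.random.default_rng(0)

def fake_quant(x):
    # Symmetric per-tensor int8 quantize followed by dequantize.
    scale = np.abs(x).max() / 127.0
    return np.clip(np.round(x / scale), -127, 127) * scale

x0 = rng.standard_normal(256).astype(np.float32)
ref, quant = x0, x0.copy()
for depth in range(1, 11):
    w = rng.standard_normal((256, 256)).astype(np.float32) / 16.0
    ref = np.maximum(ref @ w, 0.0)                              # float32 reference layer
    quant = np.maximum(fake_quant(quant) @ fake_quant(w), 0.0)  # fake-quantized layer
    err = np.abs(ref - quant).max() / (np.abs(ref).max() + 1e-12)
    print(f"depth {depth:2d}: max relative error {err:.2e}")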

@lhutton1 (Contributor):

@tvm-bot rerun

@Zheng-Bicheng (Contributor, Author):

@lhutton1, thanks for helping me rerun the CI. However, the errors in the CI [unity/pr-head] job seem to remain unresolved, and the tests still fail.

@jiangjiajun (Contributor) commented Feb 29, 2024

(Quoting the earlier comment:) @jiangjiajun For a PaddleInference model, this error should be acceptable, right? I tested with a single convolution operator, and in most cases a single convolution operator meets the requirement of relative and absolute errors within 1e-5. My current judgment is that the accumulation of errors across multiple convolution operators changed the final output of the model.

How about the difference between quantized paddle model and quantized onnx model?

@Zheng-Bicheng (Contributor, Author) commented Feb 29, 2024

@jiangjiajun I integrated the inference code for TVM, PaddlePaddle, and ONNX. The code is as follows:

import paddle
import tvm
from tvm import relay
from tvm.contrib import graph_executor
import numpy as np
import onnx
import onnxruntime as rt

# Model Attr
input_shape = [1, 3, 224, 224]
input_name = "inputs"


def infer_by_paddlepaddle(temp_prefix, temp_input_data):
    paddle.enable_static()
    exe = paddle.static.Executor(paddle.CPUPlace())
    temp_prog, feed_target_names, fetch_targets = paddle.static.load_inference_model(temp_prefix, exe)
    temp_output, = exe.run(temp_prog, feed={feed_target_names[0]: temp_input_data}, fetch_list=fetch_targets)
    return temp_prog, temp_output


def infer_by_onnx(temp_model_path, temp_input_data):
    sess = rt.InferenceSession(temp_model_path, None)
    temp_input_name = sess.get_inputs()[0].name
    out_name = sess.get_outputs()[0].name
    temp_onnx_output = sess.run([out_name], {temp_input_name: temp_input_data})[0]
    temp_onnx_model = onnx.load_model(temp_model_path)
    return temp_onnx_model, temp_onnx_output


def infer_by_tvm(temp_model, temp_input_data):
    if isinstance(temp_model, paddle.static.Program):
        # model is loaded by `paddle.static.load_inference_model`
        mod, params = relay.frontend.from_paddle(temp_model, shape_dict={input_name: input_shape})
    else:
        mod, params = relay.frontend.from_onnx(temp_model, shape={input_name: input_shape})

    with tvm.transform.PassContext(opt_level=5):
        lib = relay.build(mod, target="llvm", params=params)

    # tvm inference
    ctx = tvm.cpu()
    tvm_model = graph_executor.GraphModule(lib['default'](ctx))
    tvm_model.set_input(input_name, temp_input_data)
    tvm_model.run()
    tvm_output = tvm_model.get_output(0).asnumpy()
    return tvm_output


log_file = "tune.json"
if __name__ == "__main__":
    np.random.seed(520)
    # create input data
    input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)

    paddle_prefix = "MobileNetV1_QAT/inference"
    paddle_model, paddle_output = infer_by_paddlepaddle(paddle_prefix, input_data)

    onnx_model_path = "MobileNetV1_QAT/inference.onnx"
    onnx_model, onnx_output = infer_by_onnx(onnx_model_path, input_data)

    # Compare the outputs of the Paddle model and the ONNX model (passes)
    np.testing.assert_allclose(paddle_output[0], onnx_output[0], rtol=1e-5, atol=1e-5)

    # Compare the outputs of the TVM-from-Paddle model and the TVM-from-ONNX model (passes)
    tvm_paddle_result = infer_by_tvm(paddle_model, input_data)
    tvm_onnx_result = infer_by_tvm(onnx_model, input_data)
    np.testing.assert_allclose(tvm_paddle_result[0], tvm_onnx_result[0], rtol=1e-5, atol=1e-5)

    # Compare the outputs of the Paddle model and the TVM-from-Paddle model
    # np.testing.assert_allclose(tvm_paddle_result[0], paddle_output[0], rtol=1e-5, atol=1e-5)

    # Compare the outputs of the ONNX model and the TVM-from-ONNX model
    np.testing.assert_allclose(tvm_onnx_result[0], onnx_output[0], rtol=1e-5, atol=1e-5)

I found that, given the same input data, the outputs of the Paddle model and the ONNX model are consistent.

The differences between TVM and Paddle are as follows:

Mismatched elements: 4 / 1000 (0.4%)
Max absolute difference: 0.01572984
Max relative difference: 1.
 x: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,...
 y: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,...

The differences between TVM and ONNX are as follows:

Mismatched elements: 4 / 1000 (0.4%)
Max absolute difference: 0.01572984
Max relative difference: 1.
 x: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,...
 y: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,...

Therefore, my earlier statement should be considered incorrect: with the same input data, both the Paddle model and the ONNX model exhibit the same symptoms.
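As a small debugging aid (not part of this PR), a hypothetical helper like the one below lists exactly which output indices exceed the tolerance, which makes it easier to confirm that the same few logits fail for both the Paddle-derived and the ONNX-derived TVM results:

import numpy as np

def report_mismatches(x, y, rtol=1e-5, atol=1e-5):
    # Print every element pair that np.testing.assert_allclose would reject.
    x, y = np.asarray(x).ravel(), np.asarray(y).ravel()
    bad = ~np.isclose(x, y, rtol=rtol, atol=atol)
    for i in np.flatnonzero(bad):
        print(f"index {i}: x={x[i]:.6f} y={y[i]:.6f} abs diff={abs(x[i] - y[i]):.6f}")

# For example: report_mismatches(tvm_paddle_result[0], paddle_output[0])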

@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun tvm-wasm

@Zheng-Bicheng (Contributor, Author):

Hello, @Hzfengsy. I noticed you have submitted PRs related to tvm-bot, so I'd like to ask: is there a way to rerun only the failed unit tests, instead of using the "@tvm-bot rerun" command to rerun all CI tests? This would help speed up merging PRs.

@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun

1 similar comment
@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun

@Hzfengsy (Member):

@Zheng-Bicheng Good question. We do not have such a mechanism, because it's unsafe to rerun only the tests that failed. For example, a fix for test A, which failed last time, may introduce a new failure in test B.

It's a good method when debugging locally, but it's not suitable for CI.

@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun

@Zheng-Bicheng (Contributor, Author):

@Zheng-Bicheng Good question. We do not have such a mechanism, because it's unsafe to rerun only the tests that failed. For example, a fix for test A, which failed last time, may introduce a new failure in test B.

It's a good method when debugging locally, but it's not suitable for CI.

I understand what you mean, but I've found that CI currently encounters some unknown errors. For example:

CI[lint/pr-head] (Lint 1 of 2) log : log.txt

Each rerun may result in a different CI failure, and I haven't figured out what's causing it. It seems unrelated to the code I've submitted.

@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun

@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun

1 similar comment
@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun

@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun

@Zheng-Bicheng (Contributor, Author):

@tvm-bot rerun

@jiangjiajun (Contributor) left a review comment:


LGTM

@jiangjiajun merged commit e005f85 into apache:main on Mar 7, 2024
19 checks passed
Lunderberg pushed a commit to Lunderberg/tvm that referenced this pull request Mar 12, 2024
…t supports quantization (apache#16651)

* support conv2d when data_format is NHWC

* modify the annotation

* Do not convert input data when processing quantization conv_2d nodes

* Fix code formatting issues

* fixed error code format

* update dequantize and quantize

* fixed bug when model is fp32 model

* update dequantize and quantize

* update for paddle quantize model when format is NCHW
thaisacs pushed a commit to thaisacs/tvm that referenced this pull request Apr 3, 2024
…t supports quantization (apache#16651)

* support conv2d when data_format is NHWC

* modify the annotation

* Do not convert input data when processing quantization conv_2d nodes

* Fix code formatting issues

* fixed error code format

* update dequantize and quantize

* fixed bug when model is fp32 model

* update dequantize and quantize

* update for paddle quantize model when format is NCHW