
Mixed PTQ (int8 & fp16) gives wrong results compared to FP16 with the same C++ code #3975

Closed

renshujiajia opened this issue Jul 1, 2024 · 3 comments

renshujiajia commented Jul 1, 2024

Description

I converted an ONNX FP32 ETDS model to both an FP16 engine and a mixed-PTQ engine (INT8 and FP16, calibrated), and I ran inference with both engines through the Python and C++ APIs:

  1. With the Python API, I get correct results from both the FP16 engine and the INT8+FP16 engine.
  2. With the C++ API, the two engines give different results: the FP16 engine is correct, but the INT8+FP16 engine produces a wrong result.

What confuses me is that both engines are run with exactly the same C++ code, yet they produce different results.

Please help me analyze this🙏

Environment

TensorRT Version: 10.0.1

NVIDIA GPU: RTX4090

NVIDIA Driver Version: 12.0

CUDA Version: 12.0

CUDNN Version: 8.2.0

Operating System: Linux interactive11554 5.11.0-27-generic #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Python Version (if applicable): 3.8.19

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

renshujiajia (Author) commented:

Below is my main inference code; using it with the FP16 engine, I get the correct output image.

bool SrInfer::processInput(const std::string& imagepath)
{
    const int inputH = 4320;
    const int inputW = 7680;
    if (!imagepath.empty())
    {
        std::cout << "will process a image" << std::endl;

        // read image
        cv::Mat srcimg = cv::imread(imagepath);
        if (srcimg.empty())
        {
            std::cerr << "read image failed" << std::endl;
            return false;
        }
        srcimg.convertTo(srcimg, CV_32F, 1.0 / 255.0);

        // preprocess: convert the normalized image to FP16 (half) in the host input buffer
        half* hostDataBuffer = static_cast<half*>(mBuffers->getHostBuffer("input"));

        int channels = 3;
        int height = 4320;
        int width = 7680;

        // optimized version: HWC -> CHW, float -> half
        int channelSize = height * width;
        for (int h = 0; h < height; ++h) {
            for (int w = 0; w < width; ++w) {
                for (int c = 0; c < channels; ++c) {
                    int dstIdx = c * channelSize + h * width + w;
                    float normalized_pixel = srcimg.ptr<float>(h)[w * channels + c];
                    hostDataBuffer[dstIdx] = __float2half(normalized_pixel);
                }
            }
        }
    }


    return true;
}

bool SrInfer::postProcess(const std::string& outputpath)
{

    // read the output tensor from the host output buffer, post-process it, and save the result image
    half* hostResultBuffer = static_cast<half*>(mBuffers->getHostBuffer("output"));

    std::cout << " the output size is " << mBuffers->size("output") << std::endl;

    // chw->hwc
    int channels = 3;
    int height = 4320 * 2;
    int width = 7680 * 2;

    float* data_fp32 = new float[channels * height * width];
    
    // optimized version: CHW -> HWC, half -> float
    int planeSize = height * width;
    for (int h = 0; h < height; ++h) {
        for (int w = 0; w < width; ++w) {
            for (int c = 0; c < channels; ++c) {
                // compute the indices into the FP16 (CHW) and FP32 (HWC) arrays
                int index_fp16 = c * planeSize + h * width + w;
                int index_fp32 = h * width * channels + w * channels + c;
                
                // convert and assign
                data_fp32[index_fp32] = __half2float(hostResultBuffer[index_fp16]);
            }
        }
    }
    

    cv::Mat image(height, width, CV_32FC3, data_fp32);
    // image.convertTo(image, CV_8UC3, 255.0);
    image = image * 255.0;

    image.convertTo(image, CV_8UC3);
    cv::imwrite(outputpath, image);

    delete[] data_fp32;
    return true;
}



bool SrInfer::srInfer(const std::string& inputimage, const std::string& outputimage) 
{

    // bind each IO tensor name to its device buffer address
    for (int32_t i = 0, e = mEngine->getNbIOTensors(); i < e; i++)
    {
        auto const name = mEngine->getIOTensorName(i);
        mContext->setTensorAddress(name, mBuffers->getDeviceBuffer(name));
    }

    // Read the input data into the managed buffers
    // ASSERT(mParams.inputTensorNames.size() == 1); // only one input for now
    if (!processInput(inputimage))
    {
            return false;
    }

    mBuffers->copyInputToDevice();
    auto time_start = std::chrono::steady_clock::now();
    bool status = mContext->executeV2(mBuffers->getDeviceBindings().data());
    auto time_end = std::chrono::steady_clock::now();
    std::cout << "infer time is " << std::chrono::duration_cast<std::chrono::milliseconds>(time_end - time_start).count() << "ms" << std::endl;
    if (!status)
    {
        return false;
    }


    // // Memcpy from device output buffers to host output buffers
    mBuffers->copyOutputToHost();
    if (!postProcess(outputimage))
    {
        return false;
    }


    return true;
}
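
Not part of the original post, but useful for this kind of analysis: before assuming both engines use FP16 buffers, it can help to dump the data type and shape of every IO tensor and compare the FP16 build against the mixed-PTQ build. A minimal sketch against the TensorRT 10 C++ API, reusing the mEngine member from the code above (assumed to be an nvinfer1::ICudaEngine*); the helper name is hypothetical:

    #include <iostream>
    #include <NvInfer.h>

    // Hypothetical helper, not in the original code: print name, IO mode,
    // data type and shape of every IO tensor so the two engines can be compared.
    void dumpIOTensorInfo(const nvinfer1::ICudaEngine* engine)
    {
        for (int32_t i = 0, e = engine->getNbIOTensors(); i < e; ++i)
        {
            const char* name  = engine->getIOTensorName(i);
            const auto  mode  = engine->getTensorIOMode(name);
            const auto  dtype = engine->getTensorDataType(name);
            const auto  dims  = engine->getTensorShape(name);

            std::cout << (mode == nvinfer1::TensorIOMode::kINPUT ? "input  " : "output ")
                      << name << " dtype=";
            switch (dtype)
            {
                case nvinfer1::DataType::kFLOAT: std::cout << "FP32"; break;
                case nvinfer1::DataType::kHALF:  std::cout << "FP16"; break;
                case nvinfer1::DataType::kINT8:  std::cout << "INT8"; break;
                default:                         std::cout << "other"; break;
            }
            std::cout << " shape=[";
            for (int32_t d = 0; d < dims.nbDims; ++d)
                std::cout << dims.d[d] << (d + 1 < dims.nbDims ? "," : "");
            std::cout << "]" << std::endl;
        }
    }

Calling this right after deserializing each engine would show immediately whether the mixed-PTQ engine still exposes "input" and "output" as FP16, or whether the builder changed them (e.g. to FP32).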

lix19937 commented Jul 1, 2024

Maybe

    half* hostResultBuffer = static_cast<half*>(mBuffers->getHostBuffer("output"));

in your postProcess function is not right for the INT8 case. You should check your output tensor's data type.

If you provide the layer info, that would be more helpful.
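
A minimal sketch of that check (not part of the original comment), assuming mEngine and mBuffers are accessible in postProcess, that channels/height/width/planeSize/data_fp32 are defined as in the code above, and that the output is either FP16 or FP32:

    // Sketch only: query the output tensor's data type and cast the host
    // buffer accordingly, instead of always treating it as half.
    const auto   outType    = mEngine->getTensorDataType("output");
    const half*  outAsHalf  = static_cast<const half*>(mBuffers->getHostBuffer("output"));
    const float* outAsFloat = static_cast<const float*>(mBuffers->getHostBuffer("output"));
    if (outType != nvinfer1::DataType::kHALF && outType != nvinfer1::DataType::kFLOAT)
    {
        std::cerr << "unexpected output tensor data type" << std::endl;
        return false;
    }
    for (int h = 0; h < height; ++h) {
        for (int w = 0; w < width; ++w) {
            for (int c = 0; c < channels; ++c) {
                int index_chw = c * planeSize + h * width + w;           // engine layout (CHW)
                int index_hwc = h * width * channels + w * channels + c; // OpenCV layout (HWC)
                data_fp32[index_hwc] = (outType == nvinfer1::DataType::kHALF)
                                           ? __half2float(outAsHalf[index_chw])
                                           : outAsFloat[index_chw];
            }
        }
    }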

renshujiajia (Author) commented:


Thanks a lot, I thought the input type of the int8_mix_fp16 model was the same as the fp16 model's 🤣
Thanks again.
