
How to make PTQ calibration for a Hybrid Quantization model (int8 & fp16) #3978

Open

renshujiajia opened this issue Jul 3, 2024 · 3 comments

@renshujiajia
Description

What is the right way to calibrate a hybrid quantization model?
I built my TensorRT engine from an ONNX model with the code below, using the class Calibrator(trt.IInt8EntropyCalibrator2) as config.int8_calibrator.

My hybrid-quantized (INT8 & FP16) super-resolution model's inference results are biased towards magenta. I have already applied clipping; what could be the reason? Is there an issue with my calibration code, or could it be a poor distribution of the calibration dataset? I am certain my inference program itself is correct.
[image: super-resolution output showing the magenta color cast]

import tensorrt as trt

def build_engine_onnx(model_file, engine_file_path, min_shape, opt_shape, max_shape, calibration_stream):
    logger = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(logger)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)  # 2 GiB workspace
    config.set_flag(trt.BuilderFlag.FP16)
    config.set_flag(trt.BuilderFlag.INT8)

    # Enable strongly typed matching
    # config.set_flag(trt.BuilderFlag.GPU_FALLBACK)

    # Add calibrator
    calibrator = Calibrator(calibration_stream, 'calibration.cache')
    config.int8_calibrator = calibrator

    with open(model_file, 'rb') as model:
        if not parser.parse(model.read()):
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None

    profile = builder.create_optimization_profile()
    input_name = network.get_input(0).name

    # Dynamic input tensor dimensions:
    # profile.set_shape(input_name, min_shape, opt_shape, max_shape)

    # Fixed input tensor dimensions: pin the network input to a single shape
    network.get_input(0).shape = opt_shape
    config.add_optimization_profile(profile)

    print(f"Building TensorRT engine from file {model_file}...")
    plan = builder.build_serialized_network(network, config)
    if plan is None:
        raise RuntimeError("Failed to build the TensorRT engine!")

    with open(engine_file_path, "wb") as f:
        f.write(bytearray(plan))
    return plan
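
The Calibrator class itself is not shown in the issue. For context, a minimal sketch of an IInt8EntropyCalibrator2 subclass could look like the following; the stream interface (batch_size, batch_nbytes, next_batch()) is assumed here for illustration and is not from the original post:

import os
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context on import
import tensorrt as trt

class Calibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed batches to the builder during INT8 calibration."""

    def __init__(self, calibration_stream, cache_file):
        super().__init__()
        self.stream = calibration_stream
        self.cache_file = cache_file
        # One device buffer, sized for a single calibration batch (assumed attribute).
        self.d_input = cuda.mem_alloc(self.stream.batch_nbytes)

    def get_batch_size(self):
        return self.stream.batch_size

    def get_batch(self, names):
        batch = self.stream.next_batch()  # float32 ndarray, or None when exhausted
        if batch is None:
            return None  # tells TensorRT that calibration data is finished
        cuda.memcpy_htod(self.d_input, np.ascontiguousarray(batch))
        return [int(self.d_input)]

    def read_calibration_cache(self):
        # Reuse a previous calibration run if the cache file exists.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

Whatever preprocessing next_batch() applies must match the inference path exactly (channel order, normalization range); a calibration/inference mismatch there is a common cause of color casts like the magenta bias described above.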

Environment

TensorRT Version: 10.0.1

NVIDIA GPU: RTX4090

NVIDIA Driver Version: 12.0

CUDA Version: 12.0

CUDNN Version: 8.2.0

Operating System: Linux interactive11554 5.11.0-27-generic #29 SMP Wed Aug 11 15:58:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Python Version (if applicable): 3.8.19

@lix19937

lix19937 commented Jul 4, 2024

Try adding

profile.set_shape(input_name, opt_shape, opt_shape, opt_shape)  # for a fixed shape

before config.add_optimization_profile(profile).

Also check your preprocessing code, or try the min-max calibrator.
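
Applied to the build function above, the suggestion amounts to something like this sketch (builder, network, config, and opt_shape come from the surrounding function; the calibrator swap only changes the base class):

profile = builder.create_optimization_profile()
input_name = network.get_input(0).name
# min = opt = max pins the profile to one fixed shape.
profile.set_shape(input_name, opt_shape, opt_shape, opt_shape)
config.add_optimization_profile(profile)

# Min-max calibration instead of entropy calibration: change only the base class.
class Calibrator(trt.IInt8MinMaxCalibrator):
    ...  # same __init__ / get_batch / cache methods as the entropy version above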

@renshujiajia (Author)

Thanks a lot, I will try the min-max calibrator. But don't network.get_input(0).shape = opt_shape and profile.set_shape(input_name, opt_shape, opt_shape, opt_shape) serve the same purpose? The exported model's binding information is as follows:

 input id:  0    is input:  True      binding name:  input    shape:  (1, 3, 4320, 7680)      type:  DataType.FLOAT
 input id:  1    is input:  False     binding name:  output   shape:  (1, 3, 8640, 15360)     type:  DataType.FLOAT

@lix19937

lix19937 commented Jul 4, 2024

If you don't call profile.set_shape, your profile is empty. In fact, for a fixed-shape model you don't need to care about the optimization profile at all.

network.get_input(0).shape = opt_shape
and
profile.set_shape(input_name, opt_shape, opt_shape, opt_shape)
play different roles: the first fixes the dimensions in the network definition itself, while the second tells the builder which shape range the engine must support at runtime.
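
In code, using the binding information posted above, the two calls could look like this sketch (profile and config assumed in scope):

# Fixes the input dimensions in the network definition itself;
# after this the network has no dynamic dimensions left.
network.get_input(0).shape = (1, 3, 4320, 7680)

# Declares the min/opt/max shapes an optimization profile supports;
# a non-empty profile is required whenever any input dimension is dynamic (-1).
profile.set_shape("input", (1, 3, 4320, 7680),
                           (1, 3, 4320, 7680),
                           (1, 3, 4320, 7680))
config.add_optimization_profile(profile)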
