
Quantized TensorRT engine doesn't achieve an inference speed advantage over FP16 on A100 #3

@bigmover

Description


I replaced engine/unet.trt10.0.1.plan with the unet.trt10.0.1.6.post12.dev1.engine built by following "Build the TRT engine for the INT8 Quantized ONNX UNet", then reran the text-to-image demo:

python demo_txt2img.py "enchanted winter forest, soft diffuse light on a snow-filled day, serene nature scene, the forest is illuminated by the snow" --negative-prompt "normal quality, low quality, worst quality, low res, blurry, nsfw, nude" --scheduler Euler --denoising-steps 30 --seed 2946901

The INT8 engine runs no faster than the FP16 one.

Is there something I missed, or what mistake did I make? Any reply will be appreciated.
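To rule out end-to-end overhead (tokenization, scheduler, VAE) dominating the comparison, it can help to time each UNet engine's inference in isolation. Below is a minimal, engine-agnostic timing sketch; the `fp16_engine` / `int8_engine` names and their `infer` call in the usage comment are hypothetical placeholders, not the demo's actual API:

```python
import time
import statistics

def benchmark(run_fn, warmup=5, iters=30):
    """Time a callable: warm up first (CUDA init, caches), then
    return the median latency in milliseconds over `iters` runs."""
    for _ in range(warmup):
        run_fn()
    times_ms = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_fn()
        times_ms.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(times_ms)

# Usage (hypothetical engine objects): wrap each engine's execute
# call with identical inputs and compare the medians directly.
# fp16_ms = benchmark(lambda: fp16_engine.infer(inputs))
# int8_ms = benchmark(lambda: int8_engine.infer(inputs))
# print(f"fp16: {fp16_ms:.2f} ms  int8: {int8_ms:.2f} ms")
```

If the per-engine medians are also equal, the issue is likely in the engine build itself (e.g. layers falling back to FP16/FP32) rather than in the pipeline around it.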

fp16: [screenshot of demo output and timing]

int8: [screenshot of demo output and timing]
