I benchmarked my model with an FP16 + INT8 engine and an FP16-only engine, and somehow the FP16 engine is faster, even though it is a timm ConvNeXt model with many convolution layers.
Here is a quantized model demonstrating my issue: https://drive.google.com/file/d/1kFJnHLcFAVFWyrIEvJz-l3TKW0WgsasZ/view?usp=sharing
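For reference, here is a minimal sketch of how the two engines could be built and timed with trtexec. The ONNX file name, engine names, and iteration counts are placeholders rather than details from my actual setup:

```python
# Sketch: build an FP16-only engine and an FP16+INT8 engine from the same
# quantized ONNX export with trtexec, and let it report latency statistics.
# "model_qdq.onnx" is a placeholder for the quantized ConvNeXt export.
import subprocess

ONNX = "model_qdq.onnx"  # placeholder path, not the actual file name

def run_trtexec(extra_flags, engine_path):
    cmd = [
        "trtexec",
        f"--onnx={ONNX}",
        f"--saveEngine={engine_path}",
        "--fp16",            # FP16 is enabled in both builds
        "--warmUp=500",      # warm-up time in ms before timing starts
        "--iterations=200",  # number of timed inference iterations
    ] + extra_flags
    print(">>>", " ".join(cmd))
    # trtexec prints latency statistics (e.g. mean GPU compute time) to stdout
    subprocess.run(cmd, check=True)

# FP16-only engine
run_trtexec([], "convnext_fp16.engine")

# FP16 + INT8 engine: TensorRT can still choose FP16 tactics per layer when
# they benchmark faster than the INT8 ones, so INT8 is not guaranteed to win.
run_trtexec(["--int8"], "convnext_fp16_int8.engine")
```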