
Please help, ModelOPT int8 quantized model runs slower than fp16 quantized model. #80

@Rajjeshwar

Description


I benchmarked my model both as an fp16 + int8 TensorRT engine and as a plain fp16 engine, and somehow the fp16 engine is faster, even though it's a timm ConvNeXt model with many convolution layers.

Here is a quantized model demonstrating the issue: https://drive.google.com/file/d/1kFJnHLcFAVFWyrIEvJz-l3TKW0WgsasZ/view?usp=sharing
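For context, here is a minimal sketch of the kind of quantize-then-benchmark flow being described, assuming ModelOpt's PyTorch int8 path. The choice of `convnext_base`, the single dummy calibration batch, the output filename, and the direct `torch.onnx.export` call are all illustrative assumptions, not details from the issue:

```python
import timm
import torch
import modelopt.torch.quantization as mtq

# Stand-in for the model in the issue (hypothetical choice of variant).
model = timm.create_model("convnext_base", pretrained=True).cuda().eval()
dummy = torch.randn(1, 3, 224, 224, device="cuda")

def forward_loop(m):
    # Calibration pass; real calibration batches should be used instead
    # of a single random tensor.
    with torch.no_grad():
        m(dummy)

# Insert int8 Q/DQ quantizers and calibrate using ModelOpt's default
# int8 configuration.
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)

# Export to ONNX for TensorRT engine building (simplified; recent ModelOpt
# versions document dedicated export helpers for quantized models).
torch.onnx.export(model, dummy, "convnext_int8.onnx", opset_version=17)
```

The two engines can then be built and timed with `trtexec --onnx=convnext_int8.onnx --fp16 --int8` versus `trtexec --onnx=convnext_fp16.onnx --fp16`, which is where the fp16-only engine is reportedly coming out faster.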
