
Support for low bit-width Quantization in TensorRT #3245


Description

@hachons

I'm interested in knowing whether TensorRT currently supports quantization for 2-bit and 4-bit models, and whether mixed precision is supported at these bit-widths as well. Additionally, I'd appreciate insights on accuracy metrics and the challenges of achieving accurate inference at such low bit-widths.
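To make the accuracy question concrete, here is a minimal NumPy sketch (not TensorRT code; the function name `fake_quantize` and the uniform symmetric scheme are illustrative assumptions) showing why error grows sharply as bit-width drops: at 2 bits a symmetric scheme leaves only the levels {-1, 0, +1} times the scale, so most of the weight distribution collapses.

```python
import numpy as np

def fake_quantize(w, bits):
    # Uniform symmetric quantization: round to signed integers in
    # [-(2**(bits-1) - 1), 2**(bits-1) - 1], then dequantize back to
    # float to simulate the error a low-bit engine would introduce.
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)

for bits in (8, 4, 2):
    mse = np.mean((w - fake_quantize(w, bits)) ** 2)
    print(f"{bits}-bit quantization MSE: {mse:.6f}")
```

Running this shows the mean-squared error rising by orders of magnitude from 8 to 2 bits, which is the core challenge behind accurate low-bit inference; mixed precision mitigates it by keeping sensitive layers at higher bit-widths.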

Metadata


Labels

triaged — Issue has been triaged by maintainers
