Understanding int8 vs fp16 Performance Differences with trtexec Quantization Logs #3200

@WoodieDudy

Description

Hello,

I'm trying to understand the performance difference between the fp16 and int8 quantized versions of my model using trtexec, and I'd like to know what insights I can get from the trtexec logs.

Environment details (using the pytorch:23.07-py3 Docker image):

  • TensorRT Version: v8.6.1.6
  • Driver Version: 470.82.01
  • CUDA Version: 12.1
  • GPU: V100

I've attached the logs for the trtexec commands I ran; representative command lines are sketched below.
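A representative pair of builds for this comparison might look like the following (a sketch, not the exact attached commands; model.onnx is a placeholder path, and the flags are as listed by trtexec --help in TensorRT 8.6):

trtexec --onnx=model.onnx --fp16 --verbose

trtexec --onnx=model.onnx --int8 --fp16 --verbose

For a Q/DQ (explicitly quantized) network, --int8 must be passed so that TensorRT accepts the QuantizeLinear/DequantizeLinear nodes; keeping --fp16 as well lets the layers that are not quantized run in fp16 rather than fp32.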

For context, I've provided a snippet of how I quantized and exported my model:

import os

import torch
import torch.nn as nn
from pytorch_quantization import nn as quant_nn
from pytorch_quantization import quant_modules

# Monkey-patch torch.nn layers with their quantized counterparts. This
# must run *before* the model is instantiated; otherwise the model is
# built from the plain nn.Linear and no quantizers are inserted.
quant_modules.initialize()

class OneLayer(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        d_model = 512
        d_ff = 2048
        self.lin1 = nn.Linear(d_model, d_ff, bias=False)

    def forward(self, x):
        return self.lin1(x)

block = OneLayer().eval()

# Export TensorQuantizer nodes as ONNX QuantizeLinear/DequantizeLinear ops.
quant_nn.TensorQuantizer.use_fb_fake_quant = True

input_shape = (1, 128, 512)   # example shape: (batch, seq_len, d_model)
dest_dir = "."                # placeholder output directory
save_name = "one_layer.onnx"  # placeholder file name

torch.onnx.export(
    block,
    torch.rand(input_shape),
    os.path.join(dest_dir, save_name),
    opset_version=13,  # Q/DQ export needs opset >= 13
    verbose=False,
    input_names=["x"],
)
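One detail the snippet above glosses over: the TensorQuantizer modules need calibrated amax ranges before export. A minimal sketch of the usual pytorch_quantization calibration flow, assuming the default max calibrator (calibrate and data_loader are hypothetical names; data_loader is an iterable of example input tensors):

import torch
from pytorch_quantization import nn as quant_nn

def calibrate(model, data_loader):
    # Put every TensorQuantizer into calibration mode.
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                module.disable_quant()
                module.enable_calib()
            else:
                module.disable()

    # Feed a few batches so the calibrators record activation ranges.
    with torch.no_grad():
        for batch in data_loader:
            model(batch)

    # Compute amax from the collected statistics and re-enable quantization.
    for module in model.modules():
        if isinstance(module, quant_nn.TensorQuantizer):
            if module._calibrator is not None:
                # (histogram calibrators need a method argument here)
                module.load_calib_amax()
                module.enable_quant()
                module.disable_calib()
            else:
                module.enable()

Without calibrated ranges, the exported Q/DQ scales won't reflect the real activation distribution, which can skew any fp16-vs-int8 comparison.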

If someone could point me to an article or resource on how to interpret trtexec logs and what insights can be extracted from them (e.g., which kernels are used), I would be very grateful.
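From what I can tell, trtexec itself can already surface which tactics (kernels) were selected and how long each layer took. A sketch of the relevant flags (names as listed by trtexec --help in 8.6; model.onnx and profile.json are placeholder paths):

trtexec --onnx=model.onnx --int8 --fp16 \
    --profilingVerbosity=detailed \
    --dumpLayerInfo --dumpProfile --separateProfileRun \
    --exportProfile=profile.json

With --profilingVerbosity=detailed, the dumped layer information includes the selected tactic names and precisions, while --dumpProfile / --exportProfile report per-layer latencies, so it's possible to check whether a given layer actually ran in int8 or fell back to fp16/fp32.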

Thank you for your assistance!

Labels: triaged (issue has been triaged by maintainers)
