Closed
Labels
triaged (Issue has been triaged by maintainers)
Description
Hello,
I'm currently trying to understand the performance difference between fp16 and int8 quantization of my model using trtexec, and I would like to know what insights I can get from the trtexec logs.
Environment details (using the pytorch:23.07-py3 Docker image):
- TensorRT Version: v8.6.1.6
- Driver Version: 470.82.01
- CUDA Version: 12.1
- GPU: V100
I've attached the logs for the following commands:
trtexec --onnx=onnx/OneLayer.onnx --verbose --fp16
trtexec --onnx=onnx/OneLayer.onnx --verbose --int8 --fp16
OneLayer_fp16.txt
OneLayer_quantized_int8_fp16.txt
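For per-layer timings and the chosen tactics/kernels, I understand trtexec also exposes profiling flags along these lines (sketched against the same model; layer_info.json is just an output path I picked, and I have not dug into this output yet):

trtexec --onnx=onnx/OneLayer.onnx --int8 --fp16 \
    --profilingVerbosity=detailed \
    --dumpProfile --separateProfileRun \
    --dumpLayerInfo --exportLayerInfo=layer_info.json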
For context, I've provided a snippet of how I quantized and exported my model:
import os

import torch
import torch.nn as nn

from pytorch_quantization import nn as quant_nn
from pytorch_quantization import quant_modules

# initialize() monkey-patches torch.nn so that modules created afterwards
# (e.g. nn.Linear) are replaced by their quantized counterparts, so it
# must run before the model is instantiated.
quant_modules.initialize()

class OneLayer(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        d_model = 512
        d_ff = 2048
        self.lin1 = nn.Linear(d_model, d_ff, bias=False)

    def forward(self, x):
        return self.lin1(x)

block = OneLayer()

# Export the fake-quant nodes as ONNX QuantizeLinear/DequantizeLinear pairs.
quant_nn.TensorQuantizer.use_fb_fake_quant = True

input_shape = (32, 512)  # example shape; the last dim must match d_model
dest_dir, save_name = "onnx", "OneLayer.onnx"

torch.onnx.export(
    block,
    torch.rand(input_shape),
    os.path.join(dest_dir, save_name),
    opset_version=13,  # per-channel Q/DQ export needs opset >= 13
    verbose=False,
    input_names=["x"],
)

If someone could direct me to an article or resource on how to interpret trtexec logs and the insights that can be extracted from them (e.g. which kernels are used), I would be very grateful.
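For completeness, here is a minimal sketch of the calibration pass that pytorch_quantization expects before export; it would sit between building block and the export call above, and calib_loader is a placeholder for whatever supplies calibration batches:

# Collect statistics with quantization disabled and calibration enabled.
for mod in block.modules():
    if isinstance(mod, quant_nn.TensorQuantizer):
        mod.disable_quant()
        mod.enable_calib()

with torch.no_grad():
    for batch in calib_loader:  # placeholder calibration data
        block(batch)

# Load the collected amax values and switch back to quantized mode.
for mod in block.modules():
    if isinstance(mod, quant_nn.TensorQuantizer):
        mod.load_calib_amax()
        mod.disable_calib()
        mod.enable_quant()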
Thank you for your assistance!