Description
I ran INT8 calibration using IInt8EntropyCalibrator2 on an A40 GPU and dumped the calibration cache to disk. I then transferred this cache to a Jetson Nano Orin Devkit and used the commands below on the Orin to convert ONNX -> mixed-precision TensorRT engine, but both attempts failed:
-
Attempt 1:
>>> trtexec --onnx=model.onnx --useCudaGraph --calib=model_calibrated_mixed.cache --verbose --saveEngine=model_calibrated_mixed.engine
Got the error: [network.cpp::validate::2925] Error Code 4: Internal Error (fp16 precision has been set for a layer or layer output, but fp16 is not configured in the builder)
-
Attempt 2: Added the --best flag to the above command:
>>> trtexec --onnx=model.onnx --useCudaGraph --calib=model_calibrated_mixed.cache --verbose --saveEngine=model_calibrated_mixed.engine --best
But I kept getting the warning "Setting a default quantization params because quantization data is missing for" for every layer, i.e. it seems like the cache isn't being used.
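One way to check whether the cache contents line up with the network on the Orin is to parse the cache and compare its tensor names with the ones named in the warnings. Below is a minimal sketch. It assumes the usual text layout of a calibration cache (a header line followed by "tensor_name: hex" lines, with the hex read as a big-endian IEEE-754 float32 scale). The function name and the sample data are hypothetical and are not taken from my actual cache.

```python
import struct

# Hypothetical sample in the assumed cache layout; a real cache would be
# read from disk, e.g. open("model_calibrated_mixed.cache").read().
SAMPLE_CACHE = """TRT-8501-EntropyCalibration2
input: 3f800000
conv1_out: 3c000000
"""


def parse_calibration_cache(text):
    """Return (header, {tensor_name: scale}) from calibration-cache text.

    Assumption: each entry is "name: hex", where hex encodes a float32
    scale in big-endian byte order.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty calibration cache")
    header, scales = lines[0], {}
    for ln in lines[1:]:
        # Split on the last ": " so tensor names containing ':' survive.
        name, sep, hex_value = ln.rpartition(": ")
        if not sep:
            continue  # not an entry line in the assumed layout
        raw = struct.pack("!I", int(hex_value, 16))
        scales[name] = struct.unpack("!f", raw)[0]
    return header, scales


if __name__ == "__main__":
    header, scales = parse_calibration_cache(SAMPLE_CACHE)
    print("header:", header)
    for name, scale in scales.items():
        print(f"{name}: scale={scale}")
```

If the tensor names in the cache differ from the layer outputs listed in the verbose log, that would be consistent with the builder not finding matching entries, though it would not by itself prove the cause.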
I have 2 questions:
- Is it possible to do INT8/mixed calibration on one GPU and use the resulting cache on another GPU?
- What's the use case for a cache if the above is not supported? Is it intended for different checkpoints of the same model to share the same INT8/mixed cache? That seems very suboptimal to me, because there's no guarantee that different checkpoints will have similar activation ranges.