Is it possible to do INT8/mixed Calibration on one GPU, and use cache with another GPU? #3612

@saryazdi

Description

I ran INT8 calibration with IInt8EntropyCalibrator2 on an A40 GPU and dumped the calibration cache to disk. I then transferred this cache to a Jetson Orin Nano Devkit and used the commands below on the Orin to convert the ONNX model to a mixed-precision TensorRT engine, but both attempts failed:

  • Attempt1:

    >>> trtexec --onnx=model.onnx --useCudaGraph --calib=model_calibrated_mixed.cache --verbose --saveEngine=model_calibrated_mixed.engine

    Got the error: [network.cpp::validate::2925] Error Code 4: Internal Error (fp16 precision has been set for a layer or layer output, but fp16 is not configured in the builder)

  • Attempt2: Added the --best flag to the above command:

    >>> trtexec --onnx=model.onnx --useCudaGraph --calib=model_calibrated_mixed.cache --verbose --saveEngine=model_calibrated_mixed.engine --best

    But I kept getting the message "Setting a default quantization params because quantization data is missing for <layer>" for every layer, so it seems the cache isn't being used.
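One way to narrow down why the cache appears to be ignored is to inspect it directly. The sketch below assumes the commonly observed plain-text cache layout, which is not formally specified: a header line naming the TensorRT version and calibrator, followed by `tensor_name: hex` lines in which the hex digits are the bits of a float32 scale. The function name `parse_calibration_cache` and the sample contents are hypothetical and for illustration only.

```python
import struct


def parse_calibration_cache(text):
    """Return (header, {tensor_name: scale}) parsed from calibration cache text.

    Assumed layout (not formally specified): the first non-empty line is a
    header, and each later line is "tensor_name: XXXXXXXX", where the 8 hex
    digits are the big-endian bits of a float32 scale.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty calibration cache")
    header, scales = lines[0], {}
    for ln in lines[1:]:
        # Split on the last colon so tensor names containing ':' survive.
        name, _, hex_bits = ln.rpartition(":")
        hex_bits = hex_bits.strip()
        if not name or len(hex_bits) != 8:
            continue  # skip lines that do not look like "name: 8 hex digits"
        try:
            raw = bytes.fromhex(hex_bits)
        except ValueError:
            continue  # skip lines whose value is not valid hex
        scales[name.strip()] = struct.unpack("!f", raw)[0]
    return header, scales


# Hypothetical sample contents, not taken from the cache in this issue.
sample = "TRT-8501-EntropyCalibration2\ninput: 3c010a14\noutput: 3f800000\n"
header, scales = parse_calibration_cache(sample)

# For a real file, read it first, e.g.:
# with open("model_calibrated_mixed.cache") as f:
#     header, scales = parse_calibration_cache(f.read())
```

Comparing `header` against the TensorRT version installed on the Orin, and the keys of `scales` against the tensor names reported in the `--verbose` build log, may show whether the cache was written by a different TensorRT version or whether the tensor names no longer match. This is a diagnostic aid under the stated assumptions, not a confirmed explanation of the failure.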

I have 2 questions:

  1. Is it possible to do INT8/mixed calibration on one GPU and use the resulting cache on another GPU?
  2. What's the use case for a cache if the above is not supported? Is it intended for different checkpoints of the same model to share the same INT8/mixed cache? That seems very suboptimal to me, because there's no guarantee that different checkpoints will have similar activation ranges.

Labels

triaged: Issue has been triaged by maintainers
