In trt10.0.1, these two APIs: setPrecision and setOutputType do not work #3941
Comments
I suggest use |
I have done this, and it works on 8.6, but fails on 10.0.1:
|
On TensorRT 10.0.1, try building the engine with trtexec: trtexec --fp16 --onnx=sample.onnx --precisionConstraints="obey" --layerPrecisions=${layer_precision} --layerOutputTypes=${layer_precision} --saveEngine=sample.trt --builderOptimizationLevel=5
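For reference, the `${layer_precision}` value passed to `--layerPrecisions` and `--layerOutputTypes` is a comma-separated list of `layerName:precision` pairs. A minimal sketch of assembling that string is below. The helper name `buildLayerSpec` and the layer names in the usage note are hypothetical and would need to be replaced with the names from your own network.

```cpp
#include <string>
#include <vector>

// Build a trtexec per-layer spec such as "layerA:fp32,layerB:fp32".
// buildLayerSpec is a hypothetical helper, not part of TensorRT.
std::string buildLayerSpec(const std::vector<std::string>& layerNames,
                           const std::string& precision) {
    std::string spec;
    for (const auto& name : layerNames) {
        if (!spec.empty()) {
            spec += ',';  // pairs are separated by commas
        }
        spec += name + ':' + precision;  // each pair is layerName:precision
    }
    return spec;
}
```

For example, `buildLayerSpec({"layerA", "layerB"}, "fp32")` yields `layerA:fp32,layerB:fp32`, which can be exported as `layer_precision` before running the command above.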
|
I have added --builderOptimizationLevel=5, but it still overflows |
You can compare the tactics selected by the two versions. |
Thank you very much for your reply. After setting builderOptimizationLevel to 5, the cache cannot be generated in TensorRT 8.6, but it can be generated in TensorRT 10. |
Description
We have a model that overflows in fp16, so we use per-layer precision constraints to keep some layers in fp32. This worked in version 8.6 and inference produced normal results. After upgrading to 10.0.1, the model output overflows. Using Polygraphy, we found that NaN is already generated at the first overflow location. Are setPrecision and setOutputType no longer taking effect?
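As an illustration of the kind of check involved, the sketch below scans a buffer of fp32 reference activations for the first value that is non-finite or whose magnitude exceeds 65504, the largest finite fp16 value. It is a conservative standalone check, not Polygraphy's implementation, and the name `firstFp16Overflow` is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Largest finite value representable in IEEE 754 half precision.
constexpr float kFp16Max = 65504.0f;

// Return the index of the first element that is NaN/Inf or lies outside the
// finite fp16 range, or -1 if every element fits.
long firstFp16Overflow(const std::vector<float>& values) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        const float v = values[i];
        if (!std::isfinite(v) || std::fabs(v) > kFp16Max) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

Running such a check on the fp32 outputs of each layer can help decide which layers are candidates for an fp32 constraint.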
Environment
TensorRT Version:
10.0.1
NVIDIA GPU:
3090 & 3080
NVIDIA Driver Version:
550
CUDA Version:
cuda-12.2
Steps To Reproduce
my code is like this:
By the way, I have already set kOBEY_PRECISION_CONSTRAINTS
env.config_->setFlag(nvinfer1::BuilderFlag::kOBEY_PRECISION_CONSTRAINTS);