Detected subnormal FP16 values. --precisionConstraints --layerPrecisions didn't work #2600
Comments
@zerollzeng I don't understand what '||' means: "OR" or "AND"? Also, how should I handle the unnamed layers?
This is a warning; it might not affect the final accuracy. If you want to get rid of this warning entirely, I think the better solution is to retrain your model and restrict all the weights to within the FP16 range.
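As a sketch of what "restrict all the weights to within the FP16 range" could look like in practice (a numpy-only illustration, not the retraining itself; `clamp_to_fp16_range` and its behavior are my own naming and an assumption about the intent):

```python
import numpy as np

fp16 = np.finfo(np.float16)
MIN_NORMAL = fp16.tiny  # smallest positive normal FP16 value, 2**-14
MAX_FP16 = fp16.max     # largest finite FP16 value, 65504.0

def clamp_to_fp16_range(w: np.ndarray, zero_subnormals: bool = True) -> np.ndarray:
    """Adjust FP32 weights so casting to FP16 produces no subnormals or overflow."""
    w = w.astype(np.float32).copy()
    # Clip magnitudes that would overflow FP16.
    w = np.clip(w, -MAX_FP16, MAX_FP16)
    # Find nonzero magnitudes that would become FP16 subnormals.
    small = (np.abs(w) > 0) & (np.abs(w) < MIN_NORMAL)
    if zero_subnormals:
        w[small] = 0.0  # flush tiny weights to zero
    else:
        w[small] = np.sign(w[small]) * MIN_NORMAL  # round up to smallest normal
    return w

w = np.array([1e-9, 3e-5, 0.5, 7e4], dtype=np.float32)
print(clamp_to_fp16_range(w))
```

At training time the same effect could be approximated by applying such a clamp to the weights after each optimizer step, or by regularizing weight magnitudes so they stay well inside the FP16 range.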
It affects the accuracy.
You don't need to care about it.
Falling back many layers to FP32 may lead to a huge performance degradation; that's why I suggest retraining the model.
OK, thanks.
I am having a similar issue when using torch_tensorrt. How can I retrain my model within the FP16 range? I tried retraining with GradScaler as shown in the AMP training tutorials for PyTorch. Do you mean a different approach? If so, which one? Thank you.
I would suggest you use the TensorRT Python API for easy experiments, and read this similar issue: #2922
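A minimal sketch of the per-layer precision approach with the TensorRT Python API (assumptions: the layer names below are hypothetical placeholders; in a real model you must use the layer names from the verbose build log, which are not the same as weight names like Conv_240.weight):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("encoder2.onnx", "rb") as f:
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# Make TensorRT honor the per-layer precision set below.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

keep_fp32 = {"Conv_240", "Conv_263", "Conv_286"}  # hypothetical layer names
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if layer.name in keep_fp32:
        layer.precision = trt.float32
        layer.set_output_type(0, trt.float32)

engine_bytes = builder.build_serialized_network(network, config)
```

This makes it easy to iterate: inspect `network.num_layers` and each `layer.name` interactively, then pin only the layers whose weights trigger the warning.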
Closing inactive issues, thanks all!
Could you tell me how to restrict all the weights to within the FP16 range? It has confused me for a long time, thank you.
Description
I converted a very complex ONNX model to a TensorRT FP32 engine; the outputs of the ONNX model and the TRT FP32 engine are the same. The command is as follows:
trtexec --onnx=encoder2.onnx --fp16 --saveEngine=encoderfp16.trt --useCudaGraph --verbose --tacticSources=-cublasLt,+cublas --workspace=10240M --minShapes=src_tokens:1x1000 --optShapes=src_tokens:1x100000 --maxShapes=src_tokens:1x700000 --preview=+fasterDynamicShapes0805 >log.en
Everything is OK.
However, when I convert the ONNX model to a TensorRT FP16 engine, the output is very different and some weights are affected:
[01/13/2023-10:07:54] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/13/2023-10:09:43] [W] [TRT] Using kFASTER_DYNAMIC_SHAPES_0805 preview feature.
[01/13/2023-10:20:45] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[01/13/2023-10:20:45] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[01/13/2023-10:20:45] [W] [TRT] Check verbose logs for the list of affected weights.
[01/13/2023-10:20:45] [W] [TRT] - 254 weights are affected by this issue: Detected subnormal FP16 values.
[01/13/2023-10:20:45] [W] [TRT] - 31 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
Detailed log in log.en.
List of affected weights: Conv_240.weight, Conv_263.weight, Conv_286.weight, Conv_309.weight, Conv_332.weight, Conv_355.weight, Conv_378.weight, Conv_416.weight, Conv_5165.bias, Conv_5165.weight, Conv_5169.weight, Conv_5173.bias, Conv_5173.weight, Gemm_1046.bias, Gemm_1046.weight, Gemm_1239.weight, Gemm_1432.bias, Gemm_1432.weight, Gemm_1625.weight, Gemm_181.....
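For context on what these warnings mean, the relevant FP16 thresholds can be inspected with numpy: magnitudes below 2**-14 become subnormal when cast to FP16, and below about 2**-25 they round to zero entirely (a standalone illustration, not part of the original report):

```python
import numpy as np

fp16 = np.finfo(np.float16)
print(fp16.tiny)                # smallest positive normal FP16 value, 2**-14
print(fp16.smallest_subnormal)  # smallest positive subnormal value, 2**-24

# Casting FP32 weights below these thresholds loses precision exactly as
# the trtexec warnings describe.
w32 = np.array([1e-4, 1e-5, 1e-8], dtype=np.float32)
w16 = w32.astype(np.float16)
subnormal = (np.abs(w16) > 0) & (np.abs(w16) < fp16.tiny)
flushed = (w32 != 0) & (w16 == 0)
print(subnormal)  # [False  True False] -> 1e-5 becomes a subnormal
print(flushed)    # [False False  True] -> 1e-8 underflows to zero
```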
I want to use --precisionConstraints and --layerPrecisions to restrict some weights to FP32; the command is as follows:
trtexec --onnx=encoder2.onnx --fp16 --saveEngine=encoderfp16.trt --useCudaGraph --verbose --tacticSources=-cublasLt,+cublas --workspace=10240M --minShapes=src_tokens:1x1000 --optShapes=src_tokens:1x100000 --maxShapes=src_tokens:1x700000 --preview=+fasterDynamicShapes0805 --precisionConstraints=obey --layerPrecisions=Conv_240.weight:fp32, Conv_263.weight:fp32, Conv_286.weight:fp32 >log.en1
But the log output is the same:
[01/13/2023-10:42:38] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/13/2023-10:44:28] [W] [TRT] Using kFASTER_DYNAMIC_SHAPES_0805 preview feature.
[01/13/2023-10:55:24] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[01/13/2023-10:55:24] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[01/13/2023-10:55:24] [W] [TRT] Check verbose logs for the list of affected weights.
[01/13/2023-10:55:24] [W] [TRT] - 254 weights are affected by this issue: Detected subnormal FP16 values.
[01/13/2023-10:55:24] [W] [TRT] - 31 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[01/13/2023-10:55:26] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
Detailed log in log.en1.
Environment
TensorRT Version: TensorRT-8.5.2.2.Linux.x86_64-gnu.cuda-11.8.cudnn8.6.tar.gz
GPU Type: Tesla V100-PCIE
CUDA Version: 11.6
CUDNN Version: cudnn-linux-x86_64-8.6.0.163_cuda11-archive
PyTorch Version (if applicable): 1.12.0
onnx: 1.12.0