
Detected subnormal FP16 values. --precisionConstraints --layerPrecisions didn't work #2600

Closed
zll0000 opened this issue Jan 13, 2023 · 14 comments
Labels: triaged (Issue has been triaged by maintainers)

Comments

zll0000 commented Jan 13, 2023

Description

I converted a very complex ONNX model to a TensorRT FP32 engine; the outputs of the ONNX model and the TRT FP32 engine are the same. The command was as follows:
trtexec --onnx=encoder2.onnx --fp16 --saveEngine=encoderfp16.trt --useCudaGraph --verbose --tacticSources=-cublasLt,+cublas --workspace=10240M --minShapes=src_tokens:1x1000 --optShapes=src_tokens:1x100000 --maxShapes=src_tokens:1x700000 --preview=+fasterDynamicShapes0805 >log.en
Everything was OK.
However, when I converted the ONNX model to a TensorRT FP16 engine, the output was very different and some weights were affected:

[01/13/2023-10:07:54] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/13/2023-10:09:43] [W] [TRT] Using kFASTER_DYNAMIC_SHAPES_0805 preview feature.
[01/13/2023-10:20:45] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[01/13/2023-10:20:45] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[01/13/2023-10:20:45] [W] [TRT] Check verbose logs for the list of affected weights.
[01/13/2023-10:20:45] [W] [TRT] - 254 weights are affected by this issue: Detected subnormal FP16 values.
[01/13/2023-10:20:45] [W] [TRT] - 31 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.

Detailed log is in log.en.
List of affected weights: Conv_240.weight, Conv_263.weight, Conv_286.weight, Conv_309.weight, Conv_332.weight, Conv_355.weight, Conv_378.weight, Conv_416.weight, Conv_5165.bias, Conv_5165.weight, Conv_5169.weight, Conv_5173.bias, Conv_5173.weight, Gemm_1046.bias, Gemm_1046.weight, Gemm_1239.weight, Gemm_1432.bias, Gemm_1432.weight, Gemm_1625.weight, Gemm_181.....
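For reference, FP16 only represents normalized magnitudes down to about 6.1e-05; smaller non-zero values become subnormal, and anything below about 6.0e-08 underflows to zero. A minimal sketch (assuming the encoder2.onnx path above) for scanning the model's initializers for such values, which should roughly reproduce the list in the warning:

```python
# Hedged sketch (not from this thread): scan an ONNX model's initializers for
# values that would become subnormal, or underflow, when cast to FP16.
import numpy as np
import onnx
from onnx import numpy_helper

FP16_MIN_NORMAL = 6.1035156e-05    # 2**-14, smallest positive normal FP16 value
FP16_MIN_SUBNORMAL = 5.9604645e-08  # 2**-24, smallest positive subnormal FP16 value

model = onnx.load("encoder2.onnx")  # path taken from this issue; adjust as needed
for init in model.graph.initializer:
    w = numpy_helper.to_array(init).astype(np.float32)
    mag = np.abs(w[w != 0])
    if mag.size == 0:
        continue
    n_subnormal = int((mag < FP16_MIN_NORMAL).sum())
    n_underflow = int((mag < FP16_MIN_SUBNORMAL).sum())
    if n_subnormal:
        print(f"{init.name}: {n_subnormal} values subnormal in FP16, "
              f"{n_underflow} below the smallest FP16 subnormal")
```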

I want to use --precisionConstraints and --layerPrecisions to keep some weights in FP32. The command is as follows:
trtexec --onnx=encoder2.onnx --fp16 --saveEngine=encoderfp16.trt --useCudaGraph --verbose --tacticSources=-cublasLt,+cublas --workspace=10240M --minShapes=src_tokens:1x1000 --optShapes=src_tokens:1x100000 --maxShapes=src_tokens:1x700000 --preview=+fasterDynamicShapes0805 --precisionConstraints=obey --layerPrecisions=Conv_240.weight:fp32, Conv_263.weight:fp32, Conv_286.weight:fp32 >log.en1

But the log output is the same:


[01/13/2023-10:42:38] [W] [TRT] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[01/13/2023-10:44:28] [W] [TRT] Using kFASTER_DYNAMIC_SHAPES_0805 preview feature.
[01/13/2023-10:55:24] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[01/13/2023-10:55:24] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[01/13/2023-10:55:24] [W] [TRT] Check verbose logs for the list of affected weights.
[01/13/2023-10:55:24] [W] [TRT] - 254 weights are affected by this issue: Detected subnormal FP16 values.
[01/13/2023-10:55:24] [W] [TRT] - 31 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[01/13/2023-10:55:26] [W] [TRT] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage. See CUDA_MODULE_LOADING in https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars


Detailed log is in log.en1.


Environment

TensorRT Version: TensorRT-8.5.2.2.Linux.x86_64-gnu.cuda-11.8.cudnn8.6.tar.gz
GPU Type: Tesla V100-PCIE
CUDA Version: 11.6
CUDNN Version: cudnn-linux-x86_64-8.6.0.163_cuda11-archive
PyTorch Version (if applicable): 1.12.0
onnx: 1.12.0

zll0000 commented Jan 14, 2023

zerollzeng (Collaborator) commented:
Can you try --precisionConstraints=obey --layerPrecisions=Conv_240:fp32,Conv_263:fp32,Conv_286:fp32 and see if it makes a difference? IIUC, --layerPrecisions applies to ONNX nodes, not to weights or tensors.
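The same constraint can also be expressed through the TensorRT Python API, which makes it easier to set many layers at once. This is only a sketch against the TensorRT 8.5-era API; the node names in keep_fp32 are just the examples from this thread, and the parsed layer names may not match them exactly:

```python
# Hedged sketch: pin selected layers to FP32 with OBEY precision constraints.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("encoder2.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

# Dynamic-shape profile mirroring the trtexec --minShapes/--optShapes/--maxShapes above.
profile = builder.create_optimization_profile()
profile.set_shape("src_tokens", (1, 1000), (1, 100000), (1, 700000))
config.add_optimization_profile(profile)

# Node names (not weight names); exact matching may need adjusting to the parsed names.
keep_fp32 = {"Conv_240", "Conv_263", "Conv_286"}
for i in range(network.num_layers):
    layer = network.get_layer(i)
    if any(name in layer.name for name in keep_fp32):
        layer.precision = trt.float32
        layer.set_output_type(0, trt.float32)

engine_bytes = builder.build_serialized_network(network, config)
```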

zerollzeng self-assigned this on Jan 16, 2023.
zerollzeng added the triaged label (Issue has been triaged by maintainers) on Jan 16, 2023.
zll0000 commented Jan 17, 2023

@zerollzeng
In the log there are several hundred parameters out of the FP16 range. Does each layer have to be added to the command like --layerPrecisions=Conv_240:fp32,Conv_263:fp32,Conv_286:fp32..., or are there other solutions?
[01/17/2023-15:00:13] [W] [TRT] - 225 weights are affected by this issue: Detected subnormal FP16 values.
[01/17/2023-15:00:13] [W] [TRT] - 22 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.

zll0000 commented Jan 17, 2023

@zerollzeng
(two screenshots attached)

zll0000 commented Jan 17, 2023

@zerollzeng I don't understand what '||' means ("OR" or "AND"?). Also, how should the unnamed* layers be handled?

(screenshot: Image_20230117154723)

zerollzeng (Collaborator) commented:
> In the log there are several hundred parameters out of the FP16 range. Does each layer have to be added to the command like --layerPrecisions=Conv_240:fp32,Conv_263:fp32,Conv_286:fp32..., or are there other solutions?

This is a warning; it might not affect the final accuracy. If you want to get rid of this warning entirely, I think the better solution is to retrain your model and constrain all the weights to the FP16 range.
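One hedged interpretation of "constrain all the weights to the FP16 range", sketched in PyTorch (this is not something prescribed in the thread): after each optimizer step, flush weight magnitudes that would be subnormal in FP16 either to zero or up to the smallest FP16 normal value.

```python
# Hedged sketch: keep trained weights out of the FP16 subnormal range.
import torch

FP16_MIN_NORMAL = 6.1035156e-05  # 2**-14, smallest positive normal FP16 value

@torch.no_grad()
def clamp_to_fp16_range(model: torch.nn.Module, flush_to_zero: bool = True):
    for p in model.parameters():
        # Non-zero values whose magnitude would be subnormal in FP16.
        tiny = (p.abs() < FP16_MIN_NORMAL) & (p != 0)
        if flush_to_zero:
            p[tiny] = 0.0
        else:
            p[tiny] = torch.sign(p[tiny]) * FP16_MIN_NORMAL

# Usage inside the training loop, after optimizer.step():
#   clamp_to_fp16_range(model)
```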

zll0000 commented Jan 18, 2023

It does affect the accuracy.

zerollzeng (Collaborator) commented:
> I don't understand what '||' means ("OR" or "AND"?). Also, how should the unnamed* layers be handled?

You don't need to care about the unnamed* layers; they are added by TensorRT. I think you can filter out all the node names here (ignore the .weight suffixes and the Unnamed Layer entries), as sketched below.
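A small sketch of that filtering step (the affected string below stands in for whatever the verbose log printed; the exact formatting of the Unnamed Layer entries is an assumption):

```python
# Hedged sketch: turn the "List of affected weights" from the verbose log into a
# --layerPrecisions argument, keeping only ONNX node names.
affected = "Conv_240.weight, Conv_263.weight, Gemm_1046.bias"  # paste the full list from log.en

nodes = []
for entry in affected.split(","):
    name = entry.strip()
    # Skip empty entries and layers inserted by TensorRT itself.
    if not name or name.startswith("(Unnamed Layer"):
        continue
    # Drop the .weight / .bias suffix to recover the node name.
    node = name.rsplit(".", 1)[0] if name.endswith((".weight", ".bias")) else name
    if node not in nodes:
        nodes.append(node)

print("--layerPrecisions=" + ",".join(f"{n}:fp32" for n in nodes))
```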

zerollzeng (Collaborator) commented:
Falling back many layers to FP32 may lead to a huge performance degradation; that's why I suggest retraining the model.

zll0000 commented Jan 18, 2023

> Falling back many layers to FP32 may lead to a huge performance degradation; that's why I suggest retraining the model.

OK, thanks.

sardanian commented:
I am having a similar issue when using torch_tensorrt.

How can I retrain my model within the FP16 range? I tried retraining using GradScaler as shown in the PyTorch AMP training tutorials. Do you mean a different approach? If so, which one?

Thank you.

dchebakov commented:
I would suggest using the TensorRT Python API for easy experiments, and reading this similar issue: #2922.

ttyio commented Nov 23, 2023

Closing inactive issues, thanks all!

@ttyio ttyio closed this as completed Nov 23, 2023
mxsurui commented Jun 17, 2024

> In the log there are several hundred parameters out of the FP16 range. Does each layer have to be added to the command like --layerPrecisions=Conv_240:fp32,Conv_263:fp32,Conv_286:fp32..., or are there other solutions?
>
> This is a warning; it might not affect the final accuracy. If you want to get rid of this warning entirely, I think the better solution is to retrain your model and constrain all the weights to the FP16 range.

Could you tell me how to restrict all the weights to the FP16 range? It has confused me for a long time. Thank you.
