Building the BEVFusion TRT engines via the C++ API instead of trtexec: they run but give garbage results #240
Actually, I'm now less certain that the layernorm layers are the (only) issue. When I only set the model inputs and outputs to FP16 but leave the rest of the network using FP32, the layernorm FP16 overflow warning no longer shows up, but the values are still bad, both for […]
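Concretely, by FP16 inputs/outputs over an otherwise-FP32 network I mean roughly the following sketch (not my exact code; `network` and `config` are the usual `INetworkDefinition` and `IBuilderConfig` from the build step):

```cpp
// Mark only the network I/O tensors as FP16 while pinning every layer's
// compute precision to FP32.
for (int i = 0; i < network->getNbInputs(); ++i)
    network->getInput(i)->setType(nvinfer1::DataType::kHALF);
for (int i = 0; i < network->getNbOutputs(); ++i)
    network->getOutput(i)->setType(nvinfer1::DataType::kHALF);
for (int i = 0; i < network->getNbLayers(); ++i)
    network->getLayer(i)->setPrecision(nvinfer1::DataType::kFLOAT);
config->setFlag(nvinfer1::BuilderFlag::kFP16);
config->setFlag(nvinfer1::BuilderFlag::kOBEY_PRECISION_CONSTRAINTS);
```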
I am trying to integrate this TensorRT implementation of BEVFusion into an existing codebase that has TensorRT but not `trtexec`. As such, I have been trying to get the engines to build via the C++ API instead of `trtexec` and the `build_trt_engine.sh` script. I got them to build and run inference without error, but the results I get are garbage and do not at all match the ones I get in your standalone implementation.

The code I wrote to build the engine is as follows (with TensorRT version `8.6.1.6-1+cuda11.8`):
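In outline it follows the standard parse-ONNX-and-serialize flow, roughly like this (paths, the workspace size, and error handling are simplified placeholders rather than my exact code):

```cpp
#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <cstdio>
#include <fstream>

// Minimal logger; the real code forwards messages to the host application's logging.
class Logger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        if (severity <= Severity::kWARNING) std::printf("%s\n", msg);
    }
} gLogger;

// Build one engine from an ONNX file and serialize it to a .plan file.
bool buildEngine(const char* onnxPath, const char* planPath) {
    auto* builder = nvinfer1::createInferBuilder(gLogger);
    const uint32_t flags = 1U << static_cast<uint32_t>(
        nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
    auto* network = builder->createNetworkV2(flags);

    auto* parser = nvonnxparser::createParser(*network, gLogger);
    if (!parser->parseFromFile(
            onnxPath, static_cast<int>(nvinfer1::ILogger::Severity::kWARNING)))
        return false;

    auto* config = builder->createBuilderConfig();
    config->setFlag(nvinfer1::BuilderFlag::kFP16);  // matching trtexec --fp16
    // (the resnet50int8 variants get nvinfer1::BuilderFlag::kINT8 as well)
    config->setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE, 1ULL << 32);

    auto* serialized = builder->buildSerializedNetwork(*network, *config);
    if (!serialized) return false;

    std::ofstream out(planPath, std::ios::binary);
    out.write(static_cast<const char*>(serialized->data()),
              static_cast<std::streamsize>(serialized->size()));
    // Cleanup of builder/network/parser/config/serialized omitted for brevity.
    return true;
}
```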
This successfully builds the engines and saves them to the `*.plan` files. But when I try to run a sample image that works well with the `trtexec` version, the outputs are garbage: `inf`s and other nonsense values with odd patterns among them. When building the head engine, a warning is logged flagging several layernorm layers at risk of FP16 overflow, which seems like a likely cause of the bad outputs. As such, I added the following code to try to force the offending layers to use FP32:
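Roughly the following, using the same `network` and `config` as in the builder sketch above (the name substring here is illustrative; I matched the actual layer names printed in the warning):

```cpp
// Force the layers named in the overflow warning to run in FP32.
for (int i = 0; i < network->getNbLayers(); ++i) {
    nvinfer1::ILayer* layer = network->getLayer(i);
    const std::string name = layer->getName();
    if (name.find("LayerNormalization") != std::string::npos) {  // illustrative match
        layer->setPrecision(nvinfer1::DataType::kFLOAT);
        for (int j = 0; j < layer->getNbOutputs(); ++j)
            layer->setOutputType(j, nvinfer1::DataType::kFLOAT);
    }
}
// Without one of the precision-constraint flags the builder may ignore
// the per-layer settings, so:
config->setFlag(nvinfer1::BuilderFlag::kOBEY_PRECISION_CONSTRAINTS);
```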
But the outputs are still garbage and, very bizarrely, the warning still shows, except that now all of the layer names are replaced with empty strings (though still listed). I tried additionally forcing the layers 1, 5, or 10 positions before and after the listed layernorm layers to FP32, but it didn't help.
Why might my model give such garbage outputs? I used `polygraphy` to compare the ONNX and TensorRT outputs with commands of the form `polygraphy run camera.backbone.onnx --trt --fp16 --onnxrt --precision-constraints=obey`. It fails on most of the `resnet50int8` .onnx files due to non-positive scaling values in some of the Q/DQ layers, and it fails on the lidar backbone because of the sparse convolution, but it works on the rest of the models in the `resnet50` variant. There it shows essentially no error in FP32 and only small error in FP16, except for the outputs of `head.bbox.onnx`. So it really does seem likely that those layernorm layers and the resulting FP16 overflow are the cause of the bad values I'm getting, but how can I set their precision properly with the C++ API?

Thanks in advance for any help.