Half function undefined during compilation #1003
Comments
You'd probably want to guard the intrinsics with SM version checks. For example: https://github.com/NVIDIA/TensorRT/blob/master/plugin/common/common.cuh#L67
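A minimal sketch of what such a guard might look like, assuming a hypothetical helper named addCompat and a hypothetical macro COMPAT_ADD_HD (neither is taken from the TensorRT codebase or from the linked file). The generic template handles float and other arithmetic types; the __half specialization uses the native __hadd intrinsic only when compiling for compute capability 5.3 or newer, and otherwise falls back to float arithmetic. The snippet is written so that it also builds with a plain host C++ compiler, in which case the half specialization is simply left out.

```cpp
// Sketch only: addCompat and COMPAT_ADD_HD are illustrative names,
// not part of TensorRT or CUDA.
#if defined(__CUDACC__)
#include <cuda_fp16.h>
#ifndef COMPAT_ADD_HD
#define COMPAT_ADD_HD __host__ __device__ inline
#endif
#else
#ifndef COMPAT_ADD_HD
#define COMPAT_ADD_HD inline
#endif
#endif

// Generic version: used for float, double, int, and so on.
template <typename T>
COMPAT_ADD_HD T addCompat(T a, T b)
{
    return a + b;
}

#if defined(__CUDACC__)
// Half specialization: only visible when compiling with nvcc.
template <>
COMPAT_ADD_HD __half addCompat<__half>(__half a, __half b)
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 530)
    // Device pass targeting SM 5.3+: native half arithmetic is available.
    return __hadd(a, b);
#else
    // Host pass or older device: convert to float, add, convert back.
    return __float2half(__half2float(a) + __half2float(b));
#endif
}
#endif
```

Kernel code can then call addCompat(x, y) for both float and half inputs without repeating the architecture check at every call site.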
Is there a good way to determine the earliest SM version that supports a particular function? The math intrinsics documentation doesn't specify this as far as I can tell. Also, is it still required that I support SM < 5.3, since those architectures are being deprecated and removed in future versions of TensorRT?
Is there a recommended way to test against older SM architectures? I've got things compiling now, but it's a bit messy trying to cover both float and half and recasting things where half isn't supported (especially without if constexpr). The plugin works fine in my own application with a 3090; I'm just not sure how to validate it properly as a unit test and with other GPUs. I also haven't signed off commits before (I haven't needed to previously). I've botched some of the commits; should I be able to fix this by rebasing?
There's a table here which shows what functionality each compute capability supports.
You should be able to retroactively sign commits with:
CC'ing @rajeevsrao regarding how to test on older SMs.
Interestingly, __hmin and __hmax only work when __CUDA_ARCH__ >= 800. When guarded with __CUDA_ARCH__ >= 530 they are still reported as undefined, even though they should be supported from that version onwards. Other functions such as __hdiv or __habs work correctly when guarded with __CUDA_ARCH__ >= 530.
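One way to work around this, sketched under the assumption that the behaviour is exactly as observed in this thread (native __hmax usable only from SM 8.0 with this toolkit): use __hmax when __CUDA_ARCH__ >= 800 and compare through float otherwise. The names maxCompat and COMPAT_MAX_HD are hypothetical and exist only for illustration. As with any fallback, NaN handling may not match the native intrinsic exactly, so it would need checking if NaN inputs matter.

```cpp
// Sketch only: maxCompat and COMPAT_MAX_HD are illustrative names,
// not part of TensorRT or CUDA.
#if defined(__CUDACC__)
#include <cuda_fp16.h>
#ifndef COMPAT_MAX_HD
#define COMPAT_MAX_HD __host__ __device__ inline
#endif
#else
#ifndef COMPAT_MAX_HD
#define COMPAT_MAX_HD inline
#endif
#endif

// Generic version: used for float and other types with operator<.
template <typename T>
COMPAT_MAX_HD T maxCompat(T a, T b)
{
    return a < b ? b : a;
}

#if defined(__CUDACC__)
// Half specialization: only visible when compiling with nvcc.
template <>
COMPAT_MAX_HD __half maxCompat<__half>(__half a, __half b)
{
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 800)
    // Device pass targeting SM 8.0+: use the native intrinsic.
    return __hmax(a, b);
#else
    // Host pass or older device: compare as float, return the original half.
    // Note: NaN inputs may be treated differently from __hmax.
    return __half2float(a) < __half2float(b) ? b : a;
#endif
}
#endif
```

A matching minCompat could be written the same way with the comparison reversed and __hmin in the SM 8.0+ branch.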
Closing for now due to >14 days with no response. Please feel free to reopen if the issue still exists. Thanks |
Description
I've implemented a plugin for grid sample 2D, based on the implementation in the PyTorch repo, and used it within my own codebase. However, when I try to add it to the TensorRT OSS codebase to contribute it, compilation fails with undefined-identifier errors for all the half functions used, e.g. error: identifier "__hmax" is undefined. I also can't see any other half math intrinsics used in the rest of the codebase; is that something that is to be avoided?
Environment
TensorRT Version: OSS-master
GPU Type: RTX 3090
Nvidia Driver Version: 455.32.00
CUDA Version: 11.1
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Relevant Files
https://github.com/5had3z/TensorRT
Steps To Reproduce
Just compile as per the README in the standard TensorRT OSS instructions.