Half function undefined during compilation #1003

Closed
5had3z opened this issue Jan 11, 2021 · 6 comments
Labels: OSS Build, triaged

Comments

@5had3z

5had3z commented Jan 11, 2021

Description

I've implemented a 2D grid sample plugin based on the implementation in the PyTorch repo and used it within my own codebase. However, when I try to add it to the TensorRT OSS codebase to contribute it, compilation fails with undefined-identifier errors for every half-precision function used, e.g. error: identifier "__hmax" is undefined. As far as I can tell, no other half-precision math intrinsics are used in the rest of the codebase; are they something to be avoided?

Environment

TensorRT Version: OSS-master
GPU Type: RTX 3090
Nvidia Driver Version: 455.32.00
CUDA Version: 11.1
CUDNN Version:
Operating System + Version: Ubuntu 20.04
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

https://github.com/5had3z/TensorRT

Steps To Reproduce

Compile as described in the README of the standard TensorRT OSS build instructions.

@pranavm-nvidia
Collaborator

You'd probably want to guard the intrinsics with SM version checks. For example: https://github.com/NVIDIA/TensorRT/blob/master/plugin/common/common.cuh#L67
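A minimal sketch of such a guard, assuming CUDA 11.x and cuda_fp16.h. The device pass for sm_53 and newer uses the native __hadd intrinsic; older device passes (and the host pass, where __CUDA_ARCH__ is undefined) fall back to float arithmetic through __half2float and __float2half, which work on every architecture. The names addHalf, addHalfKernel and runAddHalf are hypothetical and not part of the TensorRT plugin code; error checking is omitted for brevity.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical helper: half addition that compiles for every target architecture.
__device__ inline __half addHalf(__half a, __half b)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 530
    // Native FP16 arithmetic is available from sm_53 onwards.
    return __hadd(a, b);
#else
    // Older architectures: widen to float, add, and round back to half.
    return __float2half(__half2float(a) + __half2float(b));
#endif
}

__global__ void addHalfKernel(const __half* a, const __half* b, __half* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        out[i] = addHalf(a[i], b[i]);
    }
}

// Hypothetical host wrapper: adds two scalars on the GPU (no error checking).
float runAddHalf(float x, float y)
{
    __half host[3] = {__float2half(x), __float2half(y), __float2half(0.0f)};
    __half* dev = nullptr;
    cudaMalloc(&dev, sizeof(host));
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);
    addHalfKernel<<<1, 1>>>(dev, dev + 1, dev + 2, 1);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return __half2float(host[2]);
}
```

Both branches give the same result for values that are exactly representable in half, so the same unit test can run regardless of which architecture the binary was built for.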

@ttyio added the OSS Build and triaged labels on Jan 15, 2021
@5had3z
Author

5had3z commented Jan 27, 2021

Is there a good way to determine the earliest SM version which supports a particular function? The math intrinsics documentation doesn't specify, as far as I can tell. Also, is it still required that I support SM < 5.3, since those architectures are being deprecated and removed in future versions of TensorRT?
I've been without a PSU for the past few weeks, so I have only just resumed working on this.

@5had3z
Author

5had3z commented Jan 28, 2021

Is there a recommended way to go about testing on older SM architectures? I've got things compiling now, but it's a bit messy trying to cover both float and half and recasting things when half isn't supported (especially without if constexpr). The plugin works fine in my own application on an RTX 3090; I'm just not sure how to validate it properly as a unit test and on other GPUs.
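One way to cover both float and half without C++17's if constexpr is to move the type-specific parts into overloads and explicit template specializations, so a single kernel template does its arithmetic in float and only converts at load and store. A minimal sketch, assuming CUDA 11.x; toFloat, fromFloat, scaleKernel and runScale are hypothetical names, and error checking is omitted.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Overloads selected at compile time by argument type (C++11, no if constexpr).
__host__ __device__ inline float toFloat(float x) { return x; }
__host__ __device__ inline float toFloat(__half x) { return __half2float(x); }

// The return type cannot drive overload resolution, so use explicit specializations.
template <typename T>
__host__ __device__ inline T fromFloat(float x);

template <>
__host__ __device__ inline float fromFloat<float>(float x) { return x; }

template <>
__host__ __device__ inline __half fromFloat<__half>(float x) { return __float2half(x); }

// One kernel template for both precisions; the arithmetic itself is done in float.
template <typename T>
__global__ void scaleKernel(const T* in, T* out, float alpha, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        out[i] = fromFloat<T>(alpha * toFloat(in[i]));
    }
}

// Hypothetical host wrapper: scales one value on the GPU using storage type T.
template <typename T>
float runScale(float x, float alpha)
{
    T host[2] = {fromFloat<T>(x), fromFloat<T>(0.0f)};
    T* dev = nullptr;
    cudaMalloc(&dev, sizeof(host));
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);
    scaleKernel<T><<<1, 1>>>(dev, dev + 1, alpha, 1);
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return toFloat(host[1]);
}
```

With this split, any FP16-only intrinsic can live inside a small helper guarded by __CUDA_ARCH__, and the kernel body stays identical for both types.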

I also haven't played with signing commits off before (previously haven't needed to do so). I've cooked some of the commits, should I be able to fix this by rebasing?

@pranavm-nvidia
Collaborator

Is there a good way to determine the earliest SM version which supports a particular function?

There's a table here which shows what functionality each compute capability supports.

I also haven't played with signing commits off before (previously haven't needed to do so). I've cooked some of the commits, should I be able to fix this by rebasing?

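For the most recent commit, you should be able to add the sign-off retroactively with: git commit --amend --signoff (for earlier commits, amend each one the same way during an interactive rebase).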

CC'ing @rajeevsrao regarding how to test on older SMs.
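For unit tests, one option is to query the compute capability of the GPU the test actually runs on and skip or report cases that depend on newer features; exercising older architectures for real still requires building for them and running on such hardware. A minimal sketch using the CUDA runtime's cudaDeviceGetAttribute; computeCapability, supportsNativeHalf and reportHalfSupport are hypothetical helper names. The sm_53 threshold is where the native FP16 arithmetic intrinsics become available.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: compute capability of `device` encoded as major * 10 + minor
// (e.g. 86 for an RTX 3090), or -1 if no CUDA device can be queried.
int computeCapability(int device = 0)
{
    int major = 0;
    int minor = 0;
    if (cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, device) != cudaSuccess
        || cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, device) != cudaSuccess)
    {
        return -1;
    }
    return major * 10 + minor;
}

// Native FP16 arithmetic intrinsics (__hadd, __hmul, ...) need sm_53 or newer.
bool supportsNativeHalf(int cc) { return cc >= 53; }

// Example: a test can report which half path the device will take, assuming the
// binary contains code for that architecture.
void reportHalfSupport()
{
    int cc = computeCapability();
    if (cc < 0)
    {
        std::printf("No CUDA device available; skipping GPU tests\n");
    }
    else
    {
        std::printf("sm_%d: native FP16 %s\n", cc, supportsNativeHalf(cc) ? "yes" : "no");
    }
}
```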

@5had3z
Author

5had3z commented Feb 3, 2021

Interestingly, __hmin and __hmax only work when guarded with __CUDA_ARCH__ >= 800; with a __CUDA_ARCH__ >= 530 guard they are still reported as undefined, even though I'd expect them to be supported from sm_53 onwards. Other functions such as __hdiv and __habs work correctly under a __CUDA_ARCH__ >= 530 guard.
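Consistent with that observation, a portable max can be sketched with a three-way guard, assuming CUDA 11.x headers: __hmax on sm_80 and newer, the __hgt comparison intrinsic (available from sm_53) on sm_53 through sm_75, and a float comparison elsewhere. hmaxCompat, hmaxKernel and runHmax are hypothetical names. One caveat: the fallbacks do not reproduce __hmax's documented NaN and signed-zero handling (it returns the other operand when exactly one input is NaN), so they only match __hmax for ordinary non-NaN inputs.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical portable half max; fallbacks ignore __hmax's NaN/signed-zero rules.
__device__ inline __half hmaxCompat(__half a, __half b)
{
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
    return __hmax(a, b);                               // native FP16 max
#elif defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 530
    return __hgt(a, b) ? a : b;                        // FP16 comparison from sm_53
#else
    return __half2float(a) > __half2float(b) ? a : b;  // compare in float
#endif
}

__global__ void hmaxKernel(const __half* a, const __half* b, __half* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        out[i] = hmaxCompat(a[i], b[i]);
    }
}

// Hypothetical host wrapper: max of two scalars computed on the GPU (no error checking).
float runHmax(float x, float y)
{
    __half host[3] = {__float2half(x), __float2half(y), __float2half(0.0f)};
    __half* dev = nullptr;
    cudaMalloc(&dev, sizeof(host));
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);
    hmaxKernel<<<1, 1>>>(dev, dev + 1, dev + 2, 1);
    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return __half2float(host[2]);
}
```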

@nvpohanh
Collaborator

Closing for now due to >14 days with no response. Please feel free to reopen if the issue still exists. Thanks
