
Workaround a compiler problem that caused Invalid device function error. #2656

Merged: 2 commits merged into NVIDIA:master on Feb 4, 2021

Conversation

@mzient (Contributor) commented on Feb 4, 2021:

Signed-off-by: Michał Zientkiewicz mzient@gmail.com

Why do we need this PR?

  • It fixes bugs:
    • Invalid device function error when calling the Slice operator on half-precision data.
    • Unknown type: float16 error in DLTensor.

What happened in this PR?

  • What solution was applied:
    • Workaround (WAR): use a compiler-dependent macro instead of an identity typedef (see the sketch after the PR description).
    • Use DALI_TYPE_SWITCH_WITH_FP16 and is_fp_or_half in DLTensor.
  • Affected modules and functionalities:
    • float16 header
    • Slice and SliceFlipNormalize kernels
    • Slice tests
    • DLTensor
  • Key points relevant for the review:
    • N/A
  • Validation and testing:
    • Python tests
  • Documentation (including examples):
    • N/A

JIRA TASK: DALI-1831
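For readers unfamiliar with the terminology in the solution bullet, below is a minimal sketch of the difference between an identity typedef and a compiler-dependent macro for the half-precision type. Every name in it (DALI_FP16_TYPE and the guard used) is a hypothetical illustration, not the macro this PR actually adds; the only point carried over from the PR is that an identity-style alias triggered a compiler problem and a macro sidesteps it.

    // Hypothetical sketch; DALI_FP16_TYPE is an illustrative name, not DALI code.
    #include <cstdint>

    // Problematic pattern, per the PR description: an identity-style alias for
    // the half type, e.g.
    //   typedef __half float16;
    // which the affected compiler mishandled, producing the
    // "Invalid device function" error.

    // Workaround pattern: select the underlying type with a macro so each
    // compiler sees the concrete type token directly.
    #if defined(__CUDACC__)
      #define DALI_FP16_TYPE __half         // CUDA compiler: device-capable half
    #else
      #define DALI_FP16_TYPE std::uint16_t  // host-only compiler: opaque 16-bit storage
    #endif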

@mzient requested a review from a team on February 4, 2021, 11:46
@mzient (Contributor, Author) commented on Feb 4, 2021:

!build

@dali-automaton (Collaborator) commented:

CI MESSAGE: [2044971]: BUILD STARTED

  dl_type.bits = sizeof(T) * 8;
  dl_type.lanes = 1;
- if (std::is_floating_point<T>::value) {
+ if (dali::is_fp_or_half<T>::value) {
Review comment (Contributor):

Bad idea. I'm not sure if fp16 is supported by dlpack at all. It supports bfloat but not fp16. Now you would put fp16 inside and claim it to be float.

Review comment (Contributor):

OK. The Slice tests check this indirectly.

@mzient (Contributor, Author) replied:
For the record: DLPack capsules created this way are compatible with PyTorch. My guess is that it's a de-facto standard for float16 DLPack tensors now.
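For context on this thread, a short sketch of how fp16 is conventionally described in a DLPack DLDataType: the struct has no dedicated half code, so fp16 is expressed as the float code with a 16-bit width, which is also the encoding PyTorch produces and accepts. The helper name below is illustrative, not a DALI function.

    // Sketch: conventional DLPack encoding of float16. MakeFloat16Type is
    // an illustrative helper, not part of DALI.
    #include <dlpack/dlpack.h>

    DLDataType MakeFloat16Type() {
      DLDataType dl_type;
      dl_type.code  = kDLFloat;  // fp16 reuses the floating-point type code
      dl_type.bits  = 16;        // the 16-bit width marks it as half precision
      dl_type.lanes = 1;
      return dl_type;
    }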

@klecki (Contributor) left a review comment:

Looks OK, but I have had a bad experience with the commented constructor being used.

@@ -256,7 +256,7 @@ class SliceGPU {
      for (int i = 0; i < in.size(); i++) {
        if (default_fill_values_) {
          assert(nfill_values_ == 1);
-         fill_values_cpu[i] = static_cast<OutputType>(0.f);
+         fill_values_cpu[i] = OutputType{};
Review comment (@klecki, Contributor):

Can you check whether this works as intended? When I was writing GaussianBlur, the compiler did some weird things when the output was half, and I had to initialize the zero by using the conversion from float (that one worked without problems).
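To make the question above concrete, here is a tiny sketch of the two initialization forms being compared; OutputType and the function name are illustrative, and whether both yield the same result for a half type depends on the half implementation and compiler in use, which is exactly what is being asked to verify.

    // Illustrative comparison of the two zero-initialization forms; not DALI code.
    template <typename OutputType>
    void ZeroInitForms() {
      // Value-initialization: relies on OutputType{} yielding zero. Guaranteed
      // for built-in arithmetic types; for a half wrapper it depends on its
      // default constructor.
      OutputType a{};

      // Explicit conversion from float: goes through the float-to-half
      // conversion path, which the reviewer reports worked reliably in
      // GaussianBlur when the output type was half.
      OutputType b = static_cast<OutputType>(0.f);

      (void)a;
      (void)b;
    }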

@JanuszL self-assigned this on Feb 4, 2021
@dali-automaton (Collaborator) commented:

CI MESSAGE: [2044971]: BUILD PASSED

@mzient merged commit 92f1d5c into NVIDIA:master on Feb 4, 2021