
Workaround a compiler problem that caused Invalid device function error. #2656

Merged: 2 commits merged into NVIDIA:master on Feb 4, 2021

Conversation

@mzient (Contributor) commented on Feb 4, 2021:

Signed-off-by: Michał Zientkiewicz mzient@gmail.com

Why do we need this PR?

  • It fixes bugs:
    • Invalid device function error when calling the Slice operator on half-precision data.
    • Unknown type: float16 error in DLTensor.

What happened in this PR?

  • What solution was applied:
    • Workaround (WAR): use a compiler-dependent macro instead of an identity typedef (see the sketch after the PR description).
    • Use DALI_TYPE_SWITCH_WITH_FP16 and is_fp_or_half in DLTensor.
  • Affected modules and functionalities:
    • float16 header
    • Slice and SliceFlipNormalize kernels
    • Slice tests
    • DLTensor
  • Key points relevant for the review:
    • N/A
  • Validation and testing:
    • Python tests
  • Documentation (including examples):
    • N/A

JIRA TASK: DALI-1831
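For readers unfamiliar with the terminology in the solution bullet, below is a minimal sketch of the difference between an identity typedef and a compiler-dependent macro for the half-precision type. Every name in it (DALI_FP16_TYPE and the guard used) is a hypothetical illustration, not the macro this PR actually adds; the only point carried over from the PR is that an identity-style alias triggered a compiler problem and a macro sidesteps it.

    // Hypothetical sketch; DALI_FP16_TYPE is an illustrative name, not DALI code.
    #include <cstdint>

    // Problematic pattern, per the PR description: an identity-style alias for
    // the half type, e.g.
    //   typedef __half float16;
    // which the affected compiler mishandled, producing the
    // "Invalid device function" error.

    // Workaround pattern: select the underlying type with a macro so each
    // compiler sees the concrete type token directly.
    #if defined(__CUDACC__)
      #define DALI_FP16_TYPE __half         // CUDA compiler: device-capable half
    #else
      #define DALI_FP16_TYPE std::uint16_t  // host-only compiler: opaque 16-bit storage
    #endif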

@mzient requested a review from a team on February 4, 2021, 11:46
@mzient (Contributor, Author) commented on Feb 4, 2021:

!build

@dali-automaton (Collaborator) commented:

CI MESSAGE: [2044971]: BUILD STARTED

  dl_type.bits = sizeof(T) * 8;
  dl_type.lanes = 1;
- if (std::is_floating_point<T>::value) {
+ if (dali::is_fp_or_half<T>::value) {
Review comment (Contributor):

Bad idea. I'm not sure if fp16 is supported by dlpack at all. It supports bfloat but not fp16. Now you would put fp16 inside and claim it to be float.

Review comment (Contributor):

OK. The Slice tests check this indirectly.

@mzient (Contributor, Author) replied:
For the record: DLPack capsules created this way are compatible with PyTorch. My guess is that it's a de-facto standard for float16 DLPack tensors now.
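For context on this thread, a short sketch of how fp16 is conventionally described in a DLPack DLDataType: the struct has no dedicated half code, so fp16 is expressed as the float code with a 16-bit width, which is also the encoding PyTorch produces and accepts. The helper name below is illustrative, not a DALI function.

    // Sketch: conventional DLPack encoding of float16. MakeFloat16Type is
    // an illustrative helper, not part of DALI.
    #include <dlpack/dlpack.h>

    DLDataType MakeFloat16Type() {
      DLDataType dl_type;
      dl_type.code  = kDLFloat;  // fp16 reuses the floating-point type code
      dl_type.bits  = 16;        // the 16-bit width marks it as half precision
      dl_type.lanes = 1;
      return dl_type;
    }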

@klecki (Contributor) left a review comment:

Looks OK, but I have had a bad experience with the commented constructor being used.

@@ -256,7 +256,7 @@ class SliceGPU {
      for (int i = 0; i < in.size(); i++) {
        if (default_fill_values_) {
          assert(nfill_values_ == 1);
-         fill_values_cpu[i] = static_cast<OutputType>(0.f);
+         fill_values_cpu[i] = OutputType{};
Review comment (@klecki, Contributor):

Can you check whether this works as intended? When I was writing GaussianBlur, the compiler did some weird things when the output was half, and I had to initialize the zero by using the conversion from float (that one worked without problems).
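To make the question above concrete, here is a tiny sketch of the two initialization forms being compared; OutputType and the function name are illustrative, and whether both yield the same result for a half type depends on the half implementation and compiler in use, which is exactly what is being asked to verify.

    // Illustrative comparison of the two zero-initialization forms; not DALI code.
    template <typename OutputType>
    void ZeroInitForms() {
      // Value-initialization: relies on OutputType{} yielding zero. Guaranteed
      // for built-in arithmetic types; for a half wrapper it depends on its
      // default constructor.
      OutputType a{};

      // Explicit conversion from float: goes through the float-to-half
      // conversion path, which the reviewer reports worked reliably in
      // GaussianBlur when the output type was half.
      OutputType b = static_cast<OutputType>(0.f);

      (void)a;
      (void)b;
    }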

@JanuszL self-assigned this on Feb 4, 2021
@dali-automaton (Collaborator) commented:

CI MESSAGE: [2044971]: BUILD PASSED

@mzient merged commit 92f1d5c into NVIDIA:master on Feb 4, 2021