
Add isfinite, isinf, isnormal and signbit relational built-ins #959

Merged · 9 commits · Apr 3, 2023

Conversation

@nmnobre (Member) commented Mar 4, 2023

Hi @illuhad,

Congratulations on the name change to Open SYCL, it's looking great. :)

I created some new macros to avoid repeating the same code structure over and over.
Let me know if their names make sense to you, and double check the return types as well. :)
I just didn't implement isnormal because I don't know what I'd map it to for the Nvidia libdevice SSCP path.

Cheers,
-Nuno

@nmnobre changed the title Add isfinite and isinf relational built-in functions Add isfinite, isinf and signbit relational built-ins Mar 4, 2023
@illuhad (Collaborator) left a comment

Awesome! Would it be possible to maybe add some tests? At least compile tests, to see if the functions get instantiated correctly for all backends.

include/hipSYCL/sycl/libkernel/builtins.hpp

HIPSYCL_DECLARE_SSCP_GENFLOAT_REL_BUILTIN(isfinite)

HIPSYCL_DECLARE_SSCP_GENFLOAT_REL_BUILTIN(signbit)
@illuhad (Collaborator):
I don't think you actually save a lot by using macros here. You only need the declaration for f32 and f64. The one line that you save per builtin is almost eaten up by the macro definition (plus additional complexity) :-)

@nmnobre (Member Author) Mar 6, 2023:

Yes, it was a tough one... I guess there are two reasons for a macro: 1) saving lines of code, and/or 2) guaranteed consistency and immediate visual certainty that you are looking at the same code structure, i.e. that the four built-ins are declared in the same way. For instance, it might have helped prevent the bug with lgamma_r fixed in #960.
If, however, you still think the macros are a bit too much, I'll remove them, no problem.

@illuhad (Collaborator):

I don't think we currently use any macros for the SSCP builtin headers. If this is the case I think we should not start here (where there's only little benefit) for the sake of consistency.

@nmnobre (Member Author):

Ok, reverted!


HIPSYCL_SSCP_MAP_OCML_REL_BUILTIN(isfinite)

HIPSYCL_SSCP_MAP_OCML_REL_BUILTIN(signbit)
@illuhad (Collaborator):

Similarly, is it really worth using macros here (and in the other sscp implementation files)?

@nmnobre (Member Author):

I've added isnormal, so now it's just slightly more worth it. :-)

@illuhad (Collaborator):

Awesome, thank you :)

@nmnobre requested a review from illuhad March 7, 2023 11:29
@nmnobre changed the title Add isfinite, isinf and signbit relational built-ins Add isfinite, isinf, isnormal and signbit relational built-ins Mar 9, 2023
@nmnobre (Member Author) commented Mar 9, 2023

Awesome! Would it be possible to maybe add some tests? At least compile tests, to see if the functions get instantiated correctly for all backends.

Sure, can you maybe point me to an existing example?

@illuhad (Collaborator) commented Mar 10, 2023

Sure, can you maybe point me to an existing example?

You can look at the tests for the math builtins for some inspiration: https://github.com/OpenSYCL/OpenSYCL/blob/develop/tests/sycl/math.cpp

@illuhad (Collaborator) commented Mar 16, 2023

Hi Nuno, what is the state here? From my perspective it's only waiting for tests. Just want to make sure you're not waiting on something from me :-)

@nmnobre (Member Author) commented Mar 17, 2023

Hi Nuno, what is the state here? From my perspective it's only waiting for tests. Just want to make sure you're not waiting on something from me :-)

No :-P It's just lack of time, I'll try and have a look soon :-)

@nmnobre force-pushed the to_infinity_and_beyond branch 12 times, most recently from e17485d to 76a9961 on March 19, 2023 14:15
@nmnobre (Member Author) commented Mar 20, 2023

Hi Aksel,

I've given it a shot. 🙂

I took heavy inspiration from the tests for the math builtins, thanks for the tip! There's some redundancy now on the utility types, traits and functions between relational.cpp and math.cpp, let me know if you'd like to move them to a common header.

Unfortunately, isnormal remains a bit of a problem. I'm facing linking issues so I've commented it out for now. What's the path that the nvc++ flow/tests take? Calling std::isnormal should work in that case, so I could make the switch.

Cheers!

@nmnobre (Member Author) commented Mar 20, 2023

Unfortunately, isnormal remains a bit of a problem. I'm facing linking issues so I've commented it out for now. What's the path that the nvc++ flow/tests take? Calling std::isnormal should work in that case, so I could make the switch.

Okay, the linking problem on the Linux clang-based tests only happens with clang 11; all other tested versions are okay (I suspect isnormal just wasn't implemented back then for the AMD backend). Two options:

  • We fence the test using __clang_major__;
  • We stop testing with clang 11.

Which would you prefer?

@illuhad (Collaborator) commented Mar 20, 2023

Lovely! Thanks, Nuno :-)

I'm facing linking issues so I've commented it out for now.

What are the error messages?

I'm facing linking issues so I've commented it out for now. What's the path that the nvc++ flow/tests take?

It goes through the same path as clang CUDA.

Which would you prefer?

I'd first like to have a better understanding of the issue :-)

@nmnobre (Member Author) commented Mar 20, 2023

Lovely! Thanks, Nuno :-)

😊

What are the error messages?

I thought it'd be easier if I just let you see the error messages for yourself so I've (re-)enabled the test for the isnormal built-in.

It goes through the same path as clang CUDA.

Ah, in that case I don't quite understand what the problem is, because I can definitely use ::isnormal inside a CUDA kernel with nvc++. I don't have version 22.1 installed locally, but I did try it with both 21.7 and 23.1...

if(idx==12) return v.sC();
if(idx==13) return v.sD();
if(idx==14) return v.sE();
if(idx==15) return v.sF();
@illuhad (Collaborator):

SYCL 2020 has introduced vec::operator[], so I think you could simplify this quite a bit :-)

@nmnobre (Member Author):

Done.

}
template<typename DT, int D, std::enable_if_t<D==16, int> = 0>
auto get_rel_input(cl::sycl::vec<DT, 16> v) {
return v;
@illuhad (Collaborator):

This is just to obtain n-component subvector of a 16-component vector, right? Couldn't you just do something like:

template<class T, int N>
auto get_subvector(vec<T, 16> v) {
  if constexpr(N==1) {
    return vec<T,1>{v.swizzle<0>()};
  } else if constexpr(N==2) {
    return vec<T,2>{v.swizzle<0,1>()};
  } else if constexpr(N==3) {
    ....
  } ....
}

@nmnobre (Member Author):

Done, but nvc++ doesn't like it, we need to think of a workaround...

{
auto inputs = in.template get_access<s::access::mode::write>();
auto outputs = out.template get_access<s::access::mode::write>();
inputs[0] = get_rel_input<DT, D>({NAN, INFINITY, INFINITY - INFINITY, 0.0, 0.0/0.0, 1.0/0.0, sqrt(-1), FLT_MIN, FLT_MIN/2.0, DBL_MIN, DBL_MIN/2.0, -1.0, 17.0, -4.0, -2.0, 3.0});
@illuhad (Collaborator):

Maybe std::numeric_limits<T>::max() instead of FLT_MAX etc, as we are in C++? :-)

@nmnobre (Member Author):

Done.

s::buffer<OutType> out{{FUN_COUNT}};
{
auto inputs = in.template get_access<s::access::mode::write>();
auto outputs = out.template get_access<s::access::mode::write>();
@illuhad (Collaborator):

Better use get_host_access(), otherwise you get a deprecation warning once #979 is merged.

@nmnobre (Member Author):

Done.

@illuhad (Collaborator) commented Mar 20, 2023

For the clang issue, you could try whether __builtin_isnormal() works. It's possible clang 11 cuda wrapper headers just don't support isnormal(). The implementation of this function should just call __builtin_isnormal().

For nvc++: nvc++ is known to be a little brittle as a compiler. It's not as stable as clang (or gcc). There are already a couple of test cases disabled for nvc++ because they cause it to crash. There are also a couple of math builtins that don't get resolved correctly by nvc++ in specific code environments, with similar linker issues. We've reported those issues where we could find a minimal reproducer, but this is not always easy. Long story short: nvc++ has bugs. Just disable the test case for nvc++ in that case.

@nmnobre force-pushed the to_infinity_and_beyond branch 2 times, most recently from e36c4d9 to 55b8bfc on March 21, 2023 22:19
@nmnobre (Member Author) commented Mar 23, 2023

For the clang issue, you could try whether __builtin_isnormal() works. It's possible clang 11 cuda wrapper headers just don't support isnormal(). The implementation of this function should just call __builtin_isnormal().

Bingo, that was it! :)

For nvc++: nvc++ is known to be a little brittle as a compiler. It's not as stable as clang (or gcc). There are already a couple of test cases disabled for nvc++ because they cause it to crash. There are also a couple of math builtins that don't get resolved correctly by nvc++ in specific code environments, with similar linker issues. We've reported those issues where we could find a minimal reproducer, but this is not always easy. Long story short: nvc++ has bugs. Just disable the test case for nvc++ in that case.

I've built a minimal reproducer and reported the issue in the Nvidia forums... I'm still a bit confused, but hopefully it'll get clearer.
I've disabled the test but, as hinted in another comment, the use of sycl::vec::swizzle brought new problems with it...
I've discovered that newer versions of the Nvidia HPC SDK don't have the same issues and I've prepared #986 which if merged will fix the failing test in this PR. :-)

Cheers,
-Nuno

@nmnobre requested a review from illuhad March 27, 2023 14:36
@illuhad (Collaborator) left a comment

Thanks, that looks good so far. I'm just running some more tests that CI cannot yet do automatically (like JIT code from generic compilation flow).

Which GPUs have you already validated?

@nmnobre (Member Author) commented Mar 28, 2023

I normally test on a V100 and on an MI210.
I think I remember testing the new functionality on both, but I'm also almost certain I used literals... I hope that didn't replace any actual device work with compile time constant expression evaluations... 😬

@illuhad (Collaborator) commented Mar 28, 2023

I've tested on NVIDIA and Intel with the generic SSCP path. Works too :-) At least the SSCP compiler does generate actual device work. Here's an excerpt of the code it generates (note the calls to __hipsycl_sscp_isnormal_f32() etc) for relational.cpp:

define linkonce_odr dso_local void @_ZNK7hipsycl4glue15__sscp_dispatch11single_taskIZZN9rel_tests18rel_genfloat_unaryINS_4sycl3vecIfLi2ENS5_6detail11vec_storageIfLi2EEEEEE11test_methodEvENKUlRNS5_7handlerEE_clESD_EUlvE_EclEv(%"class.hipsycl::glue::__sscp_dispatch::single_task.731"* noundef nonnull align 8 dereferenceable(48) %0) local_unnamed_addr #1 comdat align 2 {
  %2 = getelementptr inbounds %"class.hipsycl::glue::__sscp_dispatch::single_task.731", %"class.hipsycl::glue::__sscp_dispatch::single_task.731"* %0, i64 0, i32 0, i32 1
  %3 = bitcast %"class.hipsycl::sycl::accessor.726"* %2 to <2 x float>**
  %4 = load <2 x float>*, <2 x float>** %3, align 8, !tbaa !5
  %5 = load <2 x float>, <2 x float>* %4, align 8, !tbaa.struct !9
  %6 = extractelement <2 x float> %5, i64 0
  %7 = tail call i32 @__hipsycl_sscp_isfinite_f32(float noundef %6) #4
  %8 = extractelement <2 x float> %5, i64 1
  %9 = tail call i32 @__hipsycl_sscp_isfinite_f32(float noundef %8) #4
  %10 = zext i32 %9 to i64
  %11 = shl nuw i64 %10, 32
  %12 = zext i32 %7 to i64
  %13 = or i64 %11, %12
  %14 = bitcast %"class.hipsycl::glue::__sscp_dispatch::single_task.731"* %0 to %"class.hipsycl::sycl::vec.715"**
  %15 = bitcast %"class.hipsycl::glue::__sscp_dispatch::single_task.731"* %0 to i64**
  %16 = load i64*, i64** %15, align 8, !tbaa !5
  store i64 %13, i64* %16, align 8, !tbaa.struct !9
  %17 = load <2 x float>*, <2 x float>** %3, align 8, !tbaa !5
  %18 = load <2 x float>, <2 x float>* %17, align 8, !tbaa.struct !9
  %19 = extractelement <2 x float> %18, i64 0
  %20 = tail call i32 @__hipsycl_sscp_isinf_f32(float noundef %19) #4
  %21 = extractelement <2 x float> %18, i64 1
  %22 = tail call i32 @__hipsycl_sscp_isinf_f32(float noundef %21) #4
  %23 = zext i32 %22 to i64
  %24 = shl nuw i64 %23, 32
  %25 = zext i32 %20 to i64
  %26 = or i64 %24, %25
  %27 = load %"class.hipsycl::sycl::vec.715"*, %"class.hipsycl::sycl::vec.715"** %14, align 8, !tbaa !5
  %28 = getelementptr inbounds %"class.hipsycl::sycl::vec.715", %"class.hipsycl::sycl::vec.715"* %27, i64 1
  %29 = bitcast %"class.hipsycl::sycl::vec.715"* %28 to i64*
  store i64 %26, i64* %29, align 8, !tbaa.struct !9
  %30 = load <2 x float>*, <2 x float>** %3, align 8, !tbaa !5
  %31 = load <2 x float>, <2 x float>* %30, align 8, !tbaa.struct !9
  %32 = extractelement <2 x float> %31, i64 0
  %33 = tail call i32 @__hipsycl_sscp_isnan_f32(float noundef %32) #4
  %34 = extractelement <2 x float> %31, i64 1
  %35 = tail call i32 @__hipsycl_sscp_isnan_f32(float noundef %34) #4
  %36 = zext i32 %35 to i64
  %37 = shl nuw i64 %36, 32
  %38 = zext i32 %33 to i64
  %39 = or i64 %37, %38
  %40 = load %"class.hipsycl::sycl::vec.715"*, %"class.hipsycl::sycl::vec.715"** %14, align 8, !tbaa !5
  %41 = getelementptr inbounds %"class.hipsycl::sycl::vec.715", %"class.hipsycl::sycl::vec.715"* %40, i64 2
  %42 = bitcast %"class.hipsycl::sycl::vec.715"* %41 to i64*
  store i64 %39, i64* %42, align 8, !tbaa.struct !9
  %43 = load <2 x float>*, <2 x float>** %3, align 8, !tbaa !5
  %44 = load <2 x float>, <2 x float>* %43, align 8, !tbaa.struct !9
  %45 = extractelement <2 x float> %44, i64 0
  %46 = tail call i32 @__hipsycl_sscp_isnormal_f32(float noundef %45) #4
  %47 = extractelement <2 x float> %44, i64 1
  %48 = tail call i32 @__hipsycl_sscp_isnormal_f32(float noundef %47) #4
  %49 = zext i32 %48 to i64
  %50 = shl nuw i64 %49, 32
  %51 = zext i32 %46 to i64
  %52 = or i64 %50, %51
  %53 = load %"class.hipsycl::sycl::vec.715"*, %"class.hipsycl::sycl::vec.715"** %14, align 8, !tbaa !5
  %54 = getelementptr inbounds %"class.hipsycl::sycl::vec.715", %"class.hipsycl::sycl::vec.715"* %53, i64 3
  %55 = bitcast %"class.hipsycl::sycl::vec.715"* %54 to i64*
  store i64 %52, i64* %55, align 8, !tbaa.struct !9
  %56 = load <2 x float>*, <2 x float>** %3, align 8, !tbaa !5
  %57 = load <2 x float>, <2 x float>* %56, align 8, !tbaa.struct !9
  %58 = extractelement <2 x float> %57, i64 0
  %59 = tail call i32 @__hipsycl_sscp_signbit_f32(float noundef %58) #4
  %60 = extractelement <2 x float> %57, i64 1
  %61 = tail call i32 @__hipsycl_sscp_signbit_f32(float noundef %60) #4
  %62 = zext i32 %61 to i64
  %63 = shl nuw i64 %62, 32
  %64 = zext i32 %59 to i64
  %65 = or i64 %63, %64
  %66 = load %"class.hipsycl::sycl::vec.715"*, %"class.hipsycl::sycl::vec.715"** %14, align 8, !tbaa !5
  %67 = getelementptr inbounds %"class.hipsycl::sycl::vec.715", %"class.hipsycl::sycl::vec.715"* %66, i64 4
  %68 = bitcast %"class.hipsycl::sycl::vec.715"* %67 to i64*
  store i64 %65, i64* %68, align 8, !tbaa.struct !9
  ret void
}

If you'd like to test with the other compilation flows (HIP/CUDA), you can try compiling relational.cpp using --cuda-device-only -emit-llvm and/or -S.
But seeing that it does not generate constant expressions in the generic compiler, it is unlikely that the other backends behave differently.

EDIT: Can you rebase on current develop so that we have the updated nvc++ in CI? I'd like to see it pass there too.

@illuhad (Collaborator) commented Mar 29, 2023

Hi Nuno,

given all the contributions you've made/are making, I've sent an invitation to join the OpenSYCL github organization. No obligations or expectations are tied to this, it just makes it potentially easier to collaborate - and once our self-hosted CI is fixed, you'll get access to our self-hosted github runners and can test PRs on GPUs :-)

PS: I realize the edit might have been easy to overlook - as stated in my previous post, this PR is just waiting for rebasing on current develop so that we can have nvc++ tests run with the updated nvc++ CI :)

@nmnobre (Member Author) commented Apr 2, 2023

Hi Nuno,

given all the contributions you've made/are making, I've sent an invitation to join the OpenSYCL github organization. No obligations or expectations are tied to this, it just makes it potentially easier to collaborate - and once our self-hosted CI is fixed, you'll get access to our self-hosted github runners and can test PRs on GPUs :-)

Hi Aksel,

Thank you for the kind words and for making me part of the GitHub organisation. :-)

PS: I realize the edit might have been easy to overlook - as stated in my previous post, this PR is just waiting for rebasing on current develop so that we can have nvc++ tests run with the updated nvc++ CI :)

Done.

@nmnobre (Member Author) commented Apr 2, 2023

I've also tried generating human-readable (debatable :-P) LLVM IR (.ll) with syclcc --hipsycl-targets=cuda:sm_70 -O0 -S -emit-llvm relational.cpp. Then, I grepped the directory for isnormal and this is the result:

[Screenshot: grep hits for isnormal in the generated .ll files]

As you can see, I've commented out the host call at the bottom of the file to make sure these are due to the SYCL kernel region. But also note that all the hits are in relational.ll and not relational-cuda-nvptx64-nvidia-cuda-sm_70.ll, which is why I didn't use --cuda-device-only... I've used -O0 because with the usual -O3 I suspect everything gets inlined and I lose the explicit calls.

The only thing left in my mind is whether these could be host calls. Does Open SYCL take two passes, one for the host and one for the CUDA device, or just one for the latter? I'm almost sure at least some of that is device code though, since commenting out the kernel call, leaving the host call at the bottom and changing it to the sycl namespace, gives:

[Screenshot: grep hits after commenting out the kernel call, with the host call in the sycl namespace]

Cheers,
-Nuno

@illuhad (Collaborator) commented Apr 2, 2023

Thank you for the kind words and for making me part of the GitHub organisation. :-)

No problem :)

The only thing left in my mind is if these could be host calls. Does Open SYCL take two passes, one for the host and one for the CUDA device, or just one for the latter? I'm almost sure at least some of that is device code though, since commenting out the kernel call, leaving the host call at the bottom and changing it to the sycl namespace, gives:

The cuda and hip targets go through the clang CUDA/HIP toolchains, which are indeed based on a multipass design. So the actual GPU code would be in relational-cuda-nvptx64-nvidia-cuda-sm_70.ll. (Note that our generic SSCP compiler is different and does not rely on a multipass design.)

However, note that we also generate kernels for the host CPU inside the host pass, so you still have kernel code in relational.ll - just not for the GPU path.

It might not be very reliable to grep for isnormal as the inlining behavior on GPU might be different even with -O0. Better to actually look at what the LLVM IR for the kernels is doing.

@nmnobre (Member Author) commented Apr 3, 2023

The cuda and hip targets go through the clang CUDA/HIP toolchains which indeed is based on a multipass design. So, the actual GPU code would be in the relational-cuda-nvptx64-nvidia-cuda-sm_70.ll. (Note that our generic SSCP compiler is different and does not rely on a multipass design)

However, note that we always also generate kernels for the host CPU too inside the host pass, so you still have kernel code in the relational.ll - just not for the GPU path.

It might not be very reliable to grep for isnormal as the inlining behavior on GPU might be different even with -O0. Better to actually look at what the LLVM IR for the kernels is doing.

Thanks for the explanation, that clarifies a few things.
So now I just did syclcc --hipsycl-targets=cuda:sm_70 -g -O0 -S --cuda-device-only -emit-llvm relational.cpp (note the -g) and then grepped the directory for hiplike_builtin to catch them all:

[Screenshot: grep hits for hiplike_builtin in relational-cuda-nvptx64-nvidia-cuda-sm_70.ll]

I think I'm more convinced now, what do you think?

@illuhad (Collaborator) commented Apr 3, 2023

It's probably fine, but what you are seeing is debug information metadata, not actual code. In theory the compiler might still have evaluated the expression at compile time, and then just slapped the metadata there to say that this expression came from the isnormal function call. Can you attach the full .ll?

@nmnobre (Member Author) commented Apr 3, 2023

It's probably fine, but what you are seeing is debug information metadata, not actual code.

Indeed.

In theory the compiler might still have evaluated the expression at compile time, and then just slapped the metadata there to say that this expression came from the isnormal function call.

Oh, I didn't think that was possible, I was maybe reading too much into "linkageName"...

Can you attach the full .ll?

Certainly.
relational-cuda-nvptx64-nvidia-cuda-sm_70.ll.zip

@illuhad (Collaborator) commented Apr 3, 2023

Thanks - to me it looks good. It does contain stuff like:

%109 = load double, ptr %30, align 8, !dbg !6688
%110 = call double @llvm.nvvm.fabs.d(double %109) #8, !dbg !6689
%111 = fcmp ole double %110, 0x7FF0000000000000, !dbg !6689
%112 = xor i1 %111, true, !dbg !6689
%113 = zext i1 %112 to i32, !dbg !6689
%114 = icmp ne i32 %113, 0, !dbg !6690
%115 = zext i1 %114 to i32, !dbg !6681
%116 = sext i32 %115 to i64, !dbg !6677

where it loads a double and then does some magic to it with bit patterns. That is the typical result of a floating point builtin (which doesn't result in an actual function call). The compiler inlines the calls to __hipsycl_*, because, as I remember, they have the __attribute__((always_inline)) :D

@illuhad merged commit 5ce98cf into AdaptiveCpp:develop Apr 3, 2023
13 of 16 checks passed
@nmnobre (Member Author) commented Apr 3, 2023

where it loads a double, and then does some magic to it with some bitmasks. That is the typical result of a floating point builtin (which don't result in actual function calls). The compiler inlines the calls to __hipsycl_*, because, as I remember, they have the __attribute__((always_inline)) :D

I think you are right, again! :P Following the debugging symbols, you get:

!6689 = !DILocation(line: 442, column: 45, scope: !5576, inlinedAt: !6687)

and then,

!5576 = distinct !DISubprogram(name: "__isnan", linkageName: "_ZL7__isnand", scope: !2331, file: !2331, line: 442, type: !1115, scopeLine: 442, flags: DIFlagPrototyped, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition, unit: !6, retainedNodes: !844)

🙌

@illuhad (Collaborator) commented Apr 3, 2023

Good find, I forgot about the debug symbols :D
