-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add isfinite, isinf, isnormal and signbit relational built-ins #959
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Would it be possible to maybe add some tests? At least compile tests, to see if the functions get instantiated correctly for all backends.
|
||
HIPSYCL_DECLARE_SSCP_GENFLOAT_REL_BUILTIN(isfinite) | ||
|
||
HIPSYCL_DECLARE_SSCP_GENFLOAT_REL_BUILTIN(signbit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you actually save a lot by using macros here. You only need the declaration for f32 and f64. The one line that you save per builtin is almost eaten up by the macro definition (plus additional complexity) :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it was a tough one... I guess there are two reasons for a macro, 1) saving lines of code and/or 2) guaranteed consistency and immediate certainty, visually, that you are looking at the same code structure, i.e. that the four built-ins are declared in the same way. For instance, it might have helped prevent the bug with lgamma_r
fixed in #960.
If, however, you still think the macros are a bit too much, I'll remove them, no problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we currently use any macros for the SSCP builtin headers. If this is the case I think we should not start here (where there's only little benefit) for the sake of consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, reverted!
|
||
HIPSYCL_SSCP_MAP_OCML_REL_BUILTIN(isfinite) | ||
|
||
HIPSYCL_SSCP_MAP_OCML_REL_BUILTIN(signbit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, is it really worth using macros here (and in the other sscp implementation files)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added isnormal
, so now it's just slightly more worth it. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thank you :)
bfba422
to
c0896b0
Compare
Sure, can you maybe point me to an existing example? |
You can look at the tests for the math builtins for some inspiration: https://github.com/OpenSYCL/OpenSYCL/blob/develop/tests/sycl/math.cpp |
Hi Nuno, what is the state here? From my perspective it's only waiting for tests. Just want to make sure you're not waiting on something from me :-) |
No :-P It's just lack of time, I'll try and have a look soon :-) |
e17485d
to
76a9961
Compare
Hi Aksel, I've given it a shot. 🙂 I took heavy inspiration from the tests for the math builtins, thanks for the tip! There's some redundancy now on the utility types, traits and functions between Unfortunately, Cheers! |
Okay, the linking problem on the Linux clang based tests only happens with clang 11, all other tested versions are okay (I suspect
Which would you prefer? |
Lovely! Thanks, Nuno :-)
What are the error messages?
It goes through the same path as clang CUDA.
I'd first like to have a better understanding of the issue :-) |
😊
I thought it'd be easier if I just let you see the error messages for yourself so I've (re-)enabled the test for the
Ah, in that case I don't quite understand what the problem is then, because I can definitely use |
tests/sycl/relational.cpp
Outdated
if(idx==12) return v.sC(); | ||
if(idx==13) return v.sD(); | ||
if(idx==14) return v.sE(); | ||
if(idx==15) return v.sF(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SYCL 2020 has introduced vec::operator[]
, so I think you could simplify this quite a bit :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
template<typename DT, int D, std::enable_if_t<D==16, int> = 0> | ||
auto get_rel_input(cl::sycl::vec<DT, 16> v) { | ||
return v; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just to obtain n-component subvector of a 16-component vector, right? Couldn't you just do something like:
template<class T, int N>
auto get_subvector(vec<T, 16> v) {
if constexpr(N==1) {
return vec<T,1>{v.swizzle<0>()};
else if constexpr(N==2){
return vec<T,2>{v.swizzle<0,1>()};
} else if constexpr(N==3){
....
} ....
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, but nvc++ doesn't like it, we need to think of a workaround...
tests/sycl/relational.cpp
Outdated
{ | ||
auto inputs = in.template get_access<s::access::mode::write>(); | ||
auto outputs = out.template get_access<s::access::mode::write>(); | ||
inputs[0] = get_rel_input<DT, D>({NAN, INFINITY, INFINITY - INFINITY, 0.0, 0.0/0.0, 1.0/0.0, sqrt(-1), FLT_MIN, FLT_MIN/2.0, DBL_MIN, DBL_MIN/2.0, -1.0, 17.0, -4.0, -2.0, 3.0}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe std::numeric_limits<T>::max()
instead of FLT_MAX
etc, as we are in C++? :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
tests/sycl/relational.cpp
Outdated
s::buffer<OutType> out{{FUN_COUNT}}; | ||
{ | ||
auto inputs = in.template get_access<s::access::mode::write>(); | ||
auto outputs = out.template get_access<s::access::mode::write>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better use get_host_access()
, otherwise you get a deprecation warning once #979 is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
For the clang issue, you could try whether For nvc++: nvc++ is known to be a little brittle as a compiler. It's not as stable as clang (or gcc). There are already a couple of test cases disabled for nvc++, because they cause it to crash. There are also a couple of math builtins that don't get resolved correctly by nvc++ in specific code environments with similar linker issues. We've reported those issues were we could find a minimal reproducer, but this is not always easy. Long story short: nvc++ has bugs. Just disable the test case for nvc++ in that case. |
e36c4d9
to
55b8bfc
Compare
55b8bfc
to
b928152
Compare
Bingo, that was it! :)
I've built a minimal reproducer and reported the issue in the Nvidia forums... I'm still a bit confused, but hopefully it'll get clearer. Cheers, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that looks good so far. I'm just running some more tests that CI cannot yet do automatically (like JIT code from generic compilation flow).
Which GPUs have you already validated?
I normally test on a V100 and on an MI210. |
I've tested on NVIDIA and Intel with the generic SSCP path. Works too :-) At least the SSCP compiler does generate actual device work. Here's an excerpt of the code it generates (note the calls to
If you'd like to test with the other compilation flows (HIP/CUDA), you can try compiling EDIT: Can you rebase on current |
Hi Nuno, given all the contributions you've made/are making, I've sent an invitation to join the OpenSYCL github organization. No obligations or expectations are tied to this, it just makes it potentially easier to collaborate - and once our self-hosted CI is fixed, you'll get access to our self-hosted github runners and can test PRs on GPUs :-) PS: I realize the edit might have been easy to overlook - as stated in my previous post, this PR is just waiting for rebasing on current develop so that we can have nvc++ tests run with the updated nvc++ CI :) |
0651b3d
to
aa659cd
Compare
Hi Aksel, Thank you for the kind words and for making me part of the GitHub organisation. :-)
Done. |
I've also tried generating human-readable (debatable :-P) LLVM bitcode (.ll) with As you can see, I've commented out the host call at the bottom of the file to make sure these are due to the SYCL kernel region. But also note that all the hits are in The only thing left in my mind is if these could be host calls. Does Open SYCL take two passes, one for the host and one for the CUDA device, or just one for the latter? I'm almost sure at least some of that is device code though, since commenting out the kernel call, leaving the host call at the bottom and changing it to the sycl namespace, gives: Cheers, |
No problem :)
The However, note that we always also generate kernels for the host CPU too inside the host pass, so you still have kernel code in the It might not be very reliable to grep for |
Thanks for the explanation, that clarifies a few things. I think I'm more convinced now, what do you think? |
It's probably fine, but what you are seeing is debug information metadata, not actual code. In theory the compiler might still have evaluated the expression at compile time, and then just slapped the metadata there to say that this expression came from the |
Indeed.
Oh, I didn't think that was possible, I was maybe reading too much into "linkageName"...
|
Thanks - to me it looks good. It does contain stuff like:
where it loads a double, and then does some magic to it with some bitmasks. That is the typical result of a floating point builtin (which don't result in actual function calls). The compiler inlines the calls to |
I think you are right, again! :P Following the debugging symbols, you get:
and then,
🙌 |
Good find, I forgot about the debug symbols :D |
Hi @illuhad,
Congratulations on the name change to Open SYCL, it's looking great. :)
I created some new macros to avoid repeating the same code structure over and over.
Let me know if their names make sense to you, and double check the return types as well. :)
I just didn't implement
isnormal
because I don't know what I'd map it to for the Nvidia libdevice SSCP path.Cheers,
-Nuno