add clz builtin #965
Nice, more awesome PRs! :D
This looks plausible to me.
That too.
You can add a test case. We do test builtin correctness for math functions here: https://github.com/OpenSYCL/OpenSYCL/blob/develop/tests/sycl/math.cpp There seems to be an issue with the current
I've changed the declaration; the checks now pass on my computer, at least. Currently on the host side I use a declaration like this:
template <class T,
std::enable_if_t<
(std::is_same_v<T, unsigned int> || std::is_same_v<T, int> ||
std::is_same_v<T, unsigned short> || std::is_same_v<T, short> ||
std::is_same_v<T, unsigned char> ||
std::is_same_v<T, signed char> || std::is_same_v<T, char>),
int> = 0>
HIPSYCL_BUILTIN T __hipsycl_clz(T x) noexcept {
return __builtin_clz(x);
}
template <class T, std::enable_if_t<(std::is_same_v<T, unsigned long> ||
std::is_same_v<T, long>),
int> = 0>
HIPSYCL_BUILTIN T __hipsycl_clz(T x) noexcept {
return __builtin_clzl(x);
}
template <class T, std::enable_if_t<(std::is_same_v<T, unsigned long long> ||
std::is_same_v<T, long long>),
int> = 0>
HIPSYCL_BUILTIN T __hipsycl_clz(T x) noexcept {
return __builtin_clzll(x);
}
This works but is rather long. Would this be OK?
I feel like this could be simplified. Would something like this maybe work?
template<class T>
T __hipsycl_clz(T x) {
  if constexpr(std::is_same_v<T, long long> || std::is_same_v<T, unsigned long long>)
    return __builtin_clzll(x);
  else if constexpr(std::is_same_v<T, long> || std::is_same_v<T, unsigned long>)
    return __builtin_clzl(x);
  else
    return __builtin_clz(x); // do we need static_cast<T>(...) for the return statements?
}
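On the static_cast question: the GCC/Clang builtins return int, so the value is converted implicitly when returned as T, and an explicit static_cast<T> mostly documents that conversion. A related detail worth checking is that __builtin_clz takes an unsigned int, so a char or short argument is promoted and the count comes out relative to the width of int rather than the width of T. The following is only a sketch of one way to account for that on the host; the name clz_nonzero_sketch is hypothetical and not part of OpenSYCL, and it assumes a GCC- or Clang-compatible compiler.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// Hypothetical helper (not OpenSYCL code): leading-zero count relative to the
// width of T. Assumes the GCC/Clang __builtin_clz* intrinsics are available.
template <class T>
T clz_nonzero_sketch(T x) noexcept {
  static_assert(std::is_integral_v<T>, "integer types only");
  // The builtins are undefined for 0, so this sketch rejects it up front.
  assert(x != 0);
  // Convert through the unsigned type so negative values are not sign-extended
  // when they are widened to the builtin's argument type.
  using U = std::make_unsigned_t<T>;
  const U ux = static_cast<U>(x);
  if constexpr (sizeof(T) > sizeof(unsigned int)) {
    // Wider than int: use the unsigned long long builtin.
    constexpr int padding = (sizeof(unsigned long long) - sizeof(T)) * CHAR_BIT;
    return static_cast<T>(__builtin_clzll(ux) - padding);
  } else {
    // int or narrower: subtract the bits added by widening to unsigned int.
    constexpr int padding = (sizeof(unsigned int) - sizeof(T)) * CHAR_BIT;
    return static_cast<T>(__builtin_clz(ux) - padding);
  }
}
```

With 8-bit char and 32-bit int, clz_nonzero_sketch<unsigned char>(1) yields 7, whereas a plain __builtin_clz(1) yields 31.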
CI says nvc++ is not happy:
"/home/runner/work/OpenSYCL/OpenSYCL/build/install/bin/../include/CL/../hipSYCL/sycl/libkernel/builtin_interface.hpp", line 534: error: no instance of overloaded function "hipsycl::sycl::detail::hiplike_builtins::__hipsycl_clz" matches the argument list
argument types are: (char)
HIPSYCL_RETURN_DISPATCH_BUILTIN(__hipsycl_clz, x);
^
detected during instantiation of "T hipsycl::sycl::detail::__hipsycl_clz(T) noexcept [with T=char]" at line 788 of "/home/runner/work/OpenSYCL/OpenSYCL/build/install/bin/../include/CL/../hipSYCL/sycl/libkernel/builtins.hpp"
"/home/runner/work/OpenSYCL/OpenSYCL/build/install/bin/../include/CL/../hipSYCL/sycl/libkernel/builtin_interface.hpp", line 534: error: no instance of overloaded function "hipsycl::sycl::detail::hiplike_builtins::__hipsycl_clz" matches the argument list
argument types are: (signed char)
HIPSYCL_RETURN_DISPATCH_BUILTIN(__hipsycl_clz, x);
^
detected during instantiation of "T hipsycl::sycl::detail::__hipsycl_clz(T) noexcept [with T=signed char]" at line 788 of "/home/runner/work/OpenSYCL/OpenSYCL/build/install/bin/../include/CL/../hipSYCL/sycl/libkernel/builtins.hpp"
"/home/runner/work/OpenSYCL/OpenSYCL/build/install/bin/../include/CL/../hipSYCL/sycl/libkernel/builtin_interface.hpp", line 534: error: no instance of overloaded function "hipsycl::sycl::detail::hiplike_builtins::__hipsycl_clz" matches the argument list
argument types are: (unsigned char)
HIPSYCL_RETURN_DISPATCH_BUILTIN(__hipsycl_clz, x);
^
detected during instantiation of "T hipsycl::sycl::detail::__hipsycl_clz(T) noexcept [with T=unsigned char]" at line 788 of "/home/runner/work/OpenSYCL/OpenSYCL/build/install/bin/../include/CL/../hipSYCL/sycl/libkernel/builtins.hpp"
"/home/runner/work/OpenSYCL/OpenSYCL/build/install/bin/../include/CL/../hipSYCL/sycl/libkernel/builtin_interface.hpp", line 534: error: no instance of overloaded function "hipsycl::sycl::detail::hiplike_builtins::__hipsycl_clz" matches the argument list
argument types are: (short)
HIPSYCL_RETURN_DISPATCH_BUILTIN(__hipsycl_clz, x);
^
detected during instantiation of "T hipsycl::sycl::detail::__hipsycl_clz(T) noexcept [with T=short]" at line 788 of "/home/runner/work/OpenSYCL/OpenSYCL/build/install/bin/../include/CL/../hipSYCL/sycl/libkernel/builtins.hpp"
"/home/runner/work/OpenSYCL/OpenSYCL/build/install/bin/../include/CL/../hipSYCL/sycl/libkernel/builtin_interface.hpp", line 534: error: no instance of overloaded function "hipsycl::sycl::detail::hiplike_builtins::__hipsycl_clz" matches the argument list
argument types are: (unsigned short)
HIPSYCL_RETURN_DISPATCH_BUILTIN(__hipsycl_clz, x);
^
detected during instantiation of "T hipsycl::sycl::detail::__hipsycl_clz(T) noexcept [with T=unsigned short]" at line 788 of "/home/runner/work/OpenSYCL/OpenSYCL/build/install/bin/../include/CL/../hipSYCL/sycl/libkernel/builtins.hpp"
"/home/runner/work/OpenSYCL/OpenSYCL/build/install/bin/../include/CL/../hipSYCL/sycl/libkernel/builtin_interface.hpp", line 534: error: no instance of overloaded function "hipsycl::sycl::detail::hiplike_builtins::__hipsycl_clz" matches the argument list
argument types are: (long)
HIPSYCL_RETURN_DISPATCH_BUILTIN(__hipsycl_clz, x);
^
detected during instantiation of "T hipsycl::sycl::detail::__hipsycl_clz(T) noexcept [with T=long]" at line 788 of "/home/runner/work/OpenSYCL/OpenSYCL/build/install/bin/../include/CL/../hipSYCL/sycl/libkernel/builtins.hpp"
"/home/runner/work/OpenSYCL/OpenSYCL/build/install/bin/../include/CL/../hipSYCL/sycl/libkernel/builtin_interface.hpp", line 534: error: no instance of overloaded function "hipsycl::sycl::detail::hiplike_builtins::__hipsycl_clz" matches the argument list
argument types are: (unsigned long)
HIPSYCL_RETURN_DISPATCH_BUILTIN(__hipsycl_clz, x);
^
detected during instantiation of "T hipsycl::sycl::detail::__hipsycl_clz(T) noexcept [with T=unsigned long]" at line 788 of "/home/runner/work/OpenSYCL/OpenSYCL/build/install/bin/../include/CL/../hipSYCL/sycl/libkernel/builtins.hpp"
Actually, after checking, the HIP/CUDA docs specify
So I've changed it to:
template<class T>
HIPSYCL_HIPLIKE_BUILTIN T __hipsycl_clz(T x) noexcept {
  // Use __clzll or __clz depending on the bit length, because
  // the NVIDIA/HIP documentation describes clz as 32-bit and clzll as 64-bit.
  if constexpr (sizeof(T)*CHAR_BIT == 64){
    return __clzll(static_cast<__hipsycl_int64>(x));
  }
  return __clz(static_cast<__hipsycl_int32>(x));
}
It should also solve the instantiation issues.
Co-authored-by: Ronan Keryell <ronan@keryell.fr>
Can you confirm whether you intend to also add a test case, or if this PR should be considered for final review/testing as is?
I think I won't be able to work on this one for a few weeks, so I can't add tests right now...
Would you prefer to wait until I add a test case, or do you want to review it now?
I'll start reviewing now and merge once tests are there too :-)
I've added the test and cleaned up the clz implementation.
Thank you, I have just merged our new self-hosted CI with NVIDIA and AMD GPUs - so if you rebase on current
We don't support nvcc, only nvc++ :-) (those are two very different compilers).
Yeah, my bad, I meant nvc++. The test in math.cpp passes with nvc++ with the fix, though I'm not sure whether the __clz or the fallback_clz version was used. Edit: after checking the docs, it says
So I reverted to the previous commit. With the test passing on all configs (+ self-hosted), is everything OK for final review?
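The thread mentions a fallback_clz path without showing it. For readers who want a picture of what a portable fallback of this kind can look like, here is a minimal sketch that uses no compiler builtins. The name fallback_clz_sketch and the loop are assumptions made for illustration and may differ from what the PR actually contains.

```cpp
#include <climits>
#include <type_traits>

// Hypothetical portable fallback (not the OpenSYCL implementation): scans
// from the most significant bit downwards and counts zero bits until the
// first set bit. Returns the full bit width of T when x is 0.
template <class T>
constexpr T fallback_clz_sketch(T x) noexcept {
  static_assert(std::is_integral_v<T>, "integer types only");
  using U = std::make_unsigned_t<T>;
  constexpr int bits = sizeof(T) * CHAR_BIT;
  const U ux = static_cast<U>(x);
  int count = 0;
  // mask starts at the top bit; it becomes 0 once it is shifted past bit 0.
  for (U mask = static_cast<U>(U{1} << (bits - 1));
       mask != 0 && (ux & mask) == 0; mask >>= 1) {
    ++count;
  }
  return static_cast<T>(count);
}
```

Because the loop never calls a builtin, the zero input needs no special case here; the cost is up to one iteration per bit.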
Thanks!
Hm.. it seems that there might still be an issue on CPU: https://github.com/OpenSYCL/OpenSYCL/actions/runs/4883593430/jobs/8716435004?pr=965
Yes, but I cannot reproduce the issue...
The only potential issue I see is that it would probably evaluate
Let's see if it works now :)
It's odd that the macOS test doesn't work. It was OK here: https://github.com/OpenSYCL/OpenSYCL/actions/runs/4883180762/jobs/8714601525 and the only difference is this line: if(x==0){return sizeof(T)*CHAR_BIT;} The test also passes on my Mac (M1) with AppleClang :'(
Mac also failed previously, before the fix. So I wonder why changing to
It is still correct on my Mac. I've added a bunch of print statements; can you try running only the macOS workflow?
I've invited you into the OpenSYCL organization; the workflows will then run automatically for you.
Oh thanks a lot, it will speed up the process for sure ^^
OK, I found the issue: in the failing tests you have this ...
The clz builtin looks broken in the macOS workflow; its results also do not look consistent and are hard to check ... I also found this: https://stackoverflow.com/questions/19527897/how-undefined-are-builtin-ctz0-or-builtin-clz0
Good find! So basically we should always
whenever using the builtin? I do wonder though, when this case is so undefined, whether the SYCL specification even mandates a specific result here? I can kind of see that the concept of leading or trailing zeros falls apart somewhat in the case of a value of 0. Good thing we noticed this edge case :)
The SYCL standard actually says:
So I've added the check for __builtin_clz. In CUDA:
so this one is fine. I'm not 100% sure about the others.
My guess is that SPIR-V will probably be aligned with OpenCL, which is probably aligned with SYCL. No idea for AMD. So maybe also add the check for those two backends to be sure?
... Or maybe just add the check to the high-level builtin interface, so that the backends don't need to be concerned about this case anymore?
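To make the "check once in the high-level interface" idea concrete, here is a small host-only sketch. Both function names are hypothetical stand-ins rather than OpenSYCL symbols, and the sketch is fixed to unsigned int to stay short. It mirrors the guard quoted earlier in the thread, if(x==0){return sizeof(T)*CHAR_BIT;}, and assumes a GCC- or Clang-compatible compiler for __builtin_clz.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for a backend builtin that is only defined for
// nonzero input, as is the case for __builtin_clz on GCC and Clang.
inline int backend_clz32_sketch(unsigned int x) noexcept {
  assert(x != 0);
  return __builtin_clz(x);
}

// Hypothetical high-level wrapper: handles zero itself, so the backend
// function is never called with an input for which it is undefined.
inline int clz32_checked_sketch(unsigned int x) noexcept {
  if (x == 0) {
    return static_cast<int>(sizeof(unsigned int) * CHAR_BIT);
  }
  return backend_clz32_sketch(x);
}
```

The trade-off is one extra comparison on backends whose intrinsic already returns the bit width for zero, in exchange for not having to audit each backend separately.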
As you guessed
So SPIR-V should be fine.
Actually, we can just add the check in the tests. I also don't know why the value 0 was in the inputs in the first place; looking at the test, it shouldn't be.
From RadeonCompute gits:
Looks like they are aware and check for it :), so it looks like we only have to be careful with the host.
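Since the zero input is what tripped the CI here, a test that pins it down explicitly may be useful. The following is a sketch of such a check, not the test that was added to math.cpp: clz_passes_edge_cases, run_clz_edge_case_checks and the shift-based reference lambda are all hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// Hypothetical test helper: returns true if clz_impl gives the expected count
// for zero and for every single-bit value of the unsigned type U.
template <class U, class F>
bool clz_passes_edge_cases(F clz_impl) {
  static_assert(std::is_unsigned_v<U>, "sketch covers unsigned types only");
  constexpr int bits = sizeof(U) * CHAR_BIT;
  // Zero is expected to report the full bit width of the type.
  if (static_cast<int>(clz_impl(U{0})) != bits) return false;
  // A value with only bit k set has bits - 1 - k leading zeros.
  for (int k = 0; k < bits; ++k) {
    const U one_bit = static_cast<U>(U{1} << k);
    if (static_cast<int>(clz_impl(one_bit)) != bits - 1 - k) return false;
  }
  return true;
}

// Example use with a simple shift-based implementation (also hypothetical).
inline void run_clz_edge_case_checks() {
  auto shift_clz = [](auto x) {
    int n = static_cast<int>(sizeof(x) * CHAR_BIT);
    while (x != 0) { x >>= 1; --n; }
    return n;
  };
  assert(clz_passes_edge_cases<unsigned char>(shift_clz));
  assert(clz_passes_edge_cases<unsigned int>(shift_clz));
  assert(clz_passes_edge_cases<unsigned long long>(shift_clz));
}
```

Any clz candidate that can be called with an unsigned value can be passed in place of shift_clz, so the same helper could exercise a host or fallback path.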
Hi,
I've tried to implement the clz builtin in OpenSYCL by mimicking the way it was done for mul24.
The parts where I'm unsure whether the definition is correct are:
HIPSYCL_DEFINE_BUILTIN(clz, HIPSYCL_BUILTIN_OVERLOAD_SET_GENINTEGER, HIPSYCL_BUILTIN_GENERATOR_UNARY_T)
and
Also, do you have a mechanism in place to check the correctness of builtin functions?