-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL2020] Migrate information descriptors to their respective namespaces #987
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, Nuno, you are on fire :D
include/hipSYCL/sycl/device.hpp
Outdated
rt::device_uint_property::max_group_size0)); | ||
std::size_t size1 = static_cast<std::size_t>(get_rt_device()->get_property( | ||
rt::device_uint_property::max_group_size1)); | ||
return id<2>{size0, size1}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we need to be extra careful. The main motivation for switching to structs instead of the old enums in SYCL 2020 was exactly this query: SYCL implementations for some cases need to flip the dimensions for some backends.
E.g. on CUDA, the "vectorized" index is always x (i.e. 0), while the SYCL memory layout wants the highest index to be the fast one.
So, on such backends, SYCL implementations flip indices such that from the perspective of the hardware, x is the fast index, while from the application point of view it remains the highest index.
I believe that these runtime queries return directly the result from the backend for the respective dimension (Can you double check?). Therefore, to make everything match, we would need to check here whether a flip is required (this might require a new property needs_index_flip
or similar for the runtime), and if so, return {size1, size0}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think you suggested that in #832, I'll look into that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Aksel, I've taken a shortcut by just flipping the return value of the query irrespectively of the runtime backend.
Here's my reasoning:
- CUDA: the flip is needed, thus covered.
- HIP: since they've followed the CUDA implementation so closely, I'd assume the flip also applies. In any case, I've confirmed that on both the MI100 and MI200 series, the limits are the same for the three dimensions (1024), so it's rather irrelevant.
- OpenMP: the limits are currently being hardcoded by us and they are the same for all three dimensions (1024).
- Level Zero: not sure here, never tried this backend myself if I'm honest, but all examples I found online seem to indicate the limits are also the same for the three dimensions. See here and also here.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably true that currently we might be able to get by by just flipping in any case. HIP, CUDA and Level Zero does the flip, and OpenMP does not (but exposes similar limits in all dimensions as you point out - although it's questionable whether this is actually correct).
My concern is that this introduces an assumption that might not hold anymore in the future e.g. if new backends are introduced, and might be surprising then. To me it seems the better strategy long term to ask those components that actually know the details (such as the runtime backends) instead of making assumptions in the SYCL headers.
Is there a problem introducing new runtime queries?
An alternative to asking whether things should flip could be to directly add a runtime query for the 1, 2, and 3-dimensional maximums. That might be even more flexible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, there is no problem, it's just I guess I started wondering if it was worth the trouble. :-)
Please have a look now, do I just leave the properties hardcoded or is there a better way of querying the backends?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me. What do you mean by hardcoded properties for querying the backends?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, you knew that...
HIP, CUDA and Level Zero does the flip, and OpenMP does not
But I didn't. So I was wondering if there is a property of the respective runtimes I could query that would expose the need for flipping instead of just returning the boolean literals explicitly like we're doing now.
return id<2>{size0, size1}; | ||
} | ||
|
||
HIPSYCL_SPECIALIZE_GET_INFO(device, max_work_item_sizes<3>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
4b8096f
to
f8f24d5
Compare
include/hipSYCL/runtime/hardware.hpp
Outdated
@@ -60,6 +60,7 @@ enum class device_uint_property { | |||
max_global_size0, | |||
max_global_size1, | |||
max_global_size2, | |||
max_group_sizes_need_flip, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this specific to max_group_sizes
? It also flips the work item indices passed into the kernel, global ranges etc. So maybe just flips_dimensions
or needs_dimension_flip
or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I've opted for needs_dimension_flip
.
So I expect there's other places, e.g. kernel launches, that need tweaking, can you point me to those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. The relevant code would be the kernel launchers (inside glue
), but those are already backend-specific and know the right behavior for their backend.
The one exception is the sscp kernel launcher, but that one currently only supports backends that do the flip, and so it just flips. Even then I'm not sure it would be possible to have the SSCP compilation flow flip only in some cases, since the assignment of builtin queries to indices would need to change which is handled inside the SYCL headers, so outside the control of the SSCP kernel launcher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so do you still think it's worth having this property at all?
I guess it makes sense if you can think of a backend out there that doesn't do the flip and whose limits aren't the same for every dimension.
Otherwise, I'm happy if you are with this PR. :-)
struct param_traits {}; | ||
|
||
#define HIPSYCL_PARAM_TRAIT_RETURN_VALUE(T, param, ret_value) \ | ||
template<> \ | ||
struct param_traits<T, param> \ | ||
struct param_traits<param> \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just looked at the spec again to check what it says on param_traits
- the only mention I find is this:
Information query descriptors have been changed to structures under namespaces named accordingly. param_traits has been removed and the return type of an information query is now contained in the descriptor. [...]
from here: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:what-changed-between
However, I can't find any mention of how exactly the return type should be exposed in the struct. Arguably, that one quote might not even be normative since it just describes the changes as a changelog.
Are you aware of any place where it says more?
I'm not sure it makes sense to do the required refactoring if the spec does not even clearly say anything. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right... I didn't think this was standardised. Well, looking at what our friends in blue did:
We should probably do something similar then... I'm thinking of removing param_traits.hpp
and creating two macros in info.hpp
, one for the plain case and the other for the templated case, just like their __SYCL_PARAM_TRAITS_SPEC
and __SYCL_PARAM_TRAITS_TEMPLATE_SPEC
. Help me choose some names:
HIPSYCL_INFO_DESCRIPTOR(param, ret_value)
and HIPSYCL_INFO_DESCRIPTOR_TEMPLATE(...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I'd say that arguably it's not standardized.. That one sentence is only a changelog. So, we could adopt the point of view that this is not required. However, it is possible that, if this is discussed inside the SYCL WG, it is found that this is just an oversight, and we have to implement the type inside the descriptor classes.
If you want to make the transition, I agree with the approach from DPC++.
Do we actually need two macros?
#define HIPSYCL_DECLARE_INFO_DESCRIPTOR(param, ret_type) \
struct param { using return_type = ret_type; };
template<int Dim>
HIPSYCL_DECLARE_INFO_DESCRIPTOR(max_group_sizes, range<Dim>)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Solid.
I'd just rather argue this a definition, and so HIPSYCL_DEFINE_INFO_DESCRIPTOR()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're totally right. Define it is :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you aware of any place where it says more? I'm not sure it makes sense to do the required refactoring if the spec does not even clearly say anything. What do you think?
I found this for get_info()
from https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#table.members.device:
"Queries this SYCL device for information requested by the template parameter Param. The type alias Param::return_type must be defined in accordance with the info parameters in Table 25 to facilitate returning the type associated with the Param parameter."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In hindsight, I'm supper happy we decided to do the refactoring.
Same functionality (or just slightly more) in 60% of the lines of code. 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, looks much better :) Thanks for all this work :)
edf6a72
to
cc0e979
Compare
8db9d4b
to
a3151fa
Compare
include/hipSYCL/sycl/info/device.hpp
Outdated
HIPSYCL_DEFINE_INFO_DESCRIPTOR(max_work_item_dimensions, detail::u_int); | ||
|
||
template<int Dimensions = 3> | ||
HIPSYCL_DEFINE_INFO_DESCRIPTOR(max_work_item_sizes, id<Dimensions>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be range<Dim>
, not id. Looks like this is a preexisting bug, so we could also fix this in a separate PR if your prefer. https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_device_information_descriptors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
18af41e
to
8b1e73c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Nuno!
Hi Aksel,
I've tried to move all the information descriptors from their old enum classes to their new namespaces.
The motivation for this is that
info::device::max_work_item_sizes
is now a template and I couldn't figure out a clean way of maintaining the old way of doing things while implementing this template...I've not removed or added any functionality and, in particular, that means that old features like the
program
class and its information descriptors remain, just with new syntax. I believe this should be completely transparent for the user, it's just the slightly awkward situation of implementing some parts of the old standard in the new standard's way...Closes #774.
Cheers,
-Nuno