Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL2020] Migrate information descriptors to their respective namespaces #987

Merged
merged 11 commits into from
May 3, 2023

Conversation

nmnobre
Copy link
Member

@nmnobre nmnobre commented Mar 27, 2023

Hi Aksel,

I've tried to move all the information descriptors from their old enum classes to their new namespaces.
The motivation for this is that info::device::max_work_item_sizes is now a template and I couldn't figure out a clean way of maintaining the old way of doing things while implementing this template...
I've not removed or added any functionality and, in particular, that means that old features like the program class and its information descriptors remain, just with new syntax. I believe this should be completely transparent for the user, it's just the slightly awkward situation of implementing some parts of the old standard in the new standard's way...

Closes #774.

Cheers,
-Nuno

Copy link
Collaborator

@illuhad illuhad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, Nuno, you are on fire :D

rt::device_uint_property::max_group_size0));
std::size_t size1 = static_cast<std::size_t>(get_rt_device()->get_property(
rt::device_uint_property::max_group_size1));
return id<2>{size0, size1};
Copy link
Collaborator

@illuhad illuhad Mar 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we need to be extra careful. The main motivation for switching to structs instead of the old enums in SYCL 2020 was exactly this query: SYCL implementations for some cases need to flip the dimensions for some backends.
E.g. on CUDA, the "vectorized" index is always x (i.e. 0), while the SYCL memory layout wants the highest index to be the fast one.
So, on such backends, SYCL implementations flip indices such that from the perspective of the hardware, x is the fast index, while from the application point of view it remains the highest index.

I believe that these runtime queries return directly the result from the backend for the respective dimension (Can you double check?). Therefore, to make everything match, we would need to check here whether a flip is required (this might require a new property needs_index_flip or similar for the runtime), and if so, return {size1, size0}.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think you suggested that in #832, I'll look into that...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Aksel, I've taken a shortcut by just flipping the return value of the query irrespectively of the runtime backend.
Here's my reasoning:

  • CUDA: the flip is needed, thus covered.
  • HIP: since they've followed the CUDA implementation so closely, I'd assume the flip also applies. In any case, I've confirmed that on both the MI100 and MI200 series, the limits are the same for the three dimensions (1024), so it's rather irrelevant.
  • OpenMP: the limits are currently being hardcoded by us and they are the same for all three dimensions (1024).
  • Level Zero: not sure here, never tried this backend myself if I'm honest, but all examples I found online seem to indicate the limits are also the same for the three dimensions. See here and also here.

Thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably true that currently we might be able to get by by just flipping in any case. HIP, CUDA and Level Zero does the flip, and OpenMP does not (but exposes similar limits in all dimensions as you point out - although it's questionable whether this is actually correct).

My concern is that this introduces an assumption that might not hold anymore in the future e.g. if new backends are introduced, and might be surprising then. To me it seems the better strategy long term to ask those components that actually know the details (such as the runtime backends) instead of making assumptions in the SYCL headers.

Is there a problem introducing new runtime queries?

An alternative to asking whether things should flip could be to directly add a runtime query for the 1, 2, and 3-dimensional maximums. That might be even more flexible.

Copy link
Member Author

@nmnobre nmnobre Apr 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there is no problem, it's just I guess I started wondering if it was worth the trouble. :-)
Please have a look now, do I just leave the properties hardcoded or is there a better way of querying the backends?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me. What do you mean by hardcoded properties for querying the backends?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you knew that...

HIP, CUDA and Level Zero does the flip, and OpenMP does not

But I didn't. So I was wondering if there is a property of the respective runtimes I could query that would expose the need for flipping instead of just returning the boolean literals explicitly like we're doing now.

return id<2>{size0, size1};
}

HIPSYCL_SPECIALIZE_GET_INFO(device, max_work_item_sizes<3>)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

@nmnobre nmnobre force-pushed the templatez branch 2 times, most recently from 4b8096f to f8f24d5 Compare April 3, 2023 15:30
@@ -60,6 +60,7 @@ enum class device_uint_property {
max_global_size0,
max_global_size1,
max_global_size2,
max_group_sizes_need_flip,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this specific to max_group_sizes? It also flips the work item indices passed into the kernel, global ranges etc. So maybe just flips_dimensions or needs_dimension_flip or something like that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I've opted for needs_dimension_flip.
So I expect there's other places, e.g. kernel launches, that need tweaking, can you point me to those?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. The relevant code would be the kernel launchers (inside glue), but those are already backend-specific and know the right behavior for their backend.
The one exception is the sscp kernel launcher, but that one currently only supports backends that do the flip, and so it just flips. Even then I'm not sure it would be possible to have the SSCP compilation flow flip only in some cases, since the assignment of builtin queries to indices would need to change which is handled inside the SYCL headers, so outside the control of the SSCP kernel launcher.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so do you still think it's worth having this property at all?
I guess it makes sense if you can think of a backend out there that doesn't do the flip and whose limits aren't the same for every dimension.

Otherwise, I'm happy if you are with this PR. :-)

struct param_traits {};

#define HIPSYCL_PARAM_TRAIT_RETURN_VALUE(T, param, ret_value) \
template<> \
struct param_traits<T, param> \
struct param_traits<param> \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just looked at the spec again to check what it says on param_traits - the only mention I find is this:

Information query descriptors have been changed to structures under namespaces named accordingly. param_traits has been removed and the return type of an information query is now contained in the descriptor. [...]

from here: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:what-changed-between

However, I can't find any mention of how exactly the return type should be exposed in the struct. Arguably, that one quote might not even be normative since it just describes the changes as a changelog.

Are you aware of any place where it says more?
I'm not sure it makes sense to do the required refactoring if the spec does not even clearly say anything. What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right... I didn't think this was standardised. Well, looking at what our friends in blue did:

https://github.com/intel/llvm/blob/f81499ecad464bd03d7f50e95004eeed1e85dbc9/sycl/include/sycl/info/info_desc.hpp#L26-L29

We should probably do something similar then... I'm thinking of removing param_traits.hpp and creating two macros in info.hpp, one for the plain case and the other for the templated case, just like their __SYCL_PARAM_TRAITS_SPEC and __SYCL_PARAM_TRAITS_TEMPLATE_SPEC. Help me choose some names:
HIPSYCL_INFO_DESCRIPTOR(param, ret_value) and HIPSYCL_INFO_DESCRIPTOR_TEMPLATE(...)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'd say that arguably it's not standardized.. That one sentence is only a changelog. So, we could adopt the point of view that this is not required. However, it is possible that, if this is discussed inside the SYCL WG, it is found that this is just an oversight, and we have to implement the type inside the descriptor classes.

If you want to make the transition, I agree with the approach from DPC++.

Do we actually need two macros?

#define HIPSYCL_DECLARE_INFO_DESCRIPTOR(param, ret_type) \
 struct param { using return_type = ret_type; };

template<int Dim>
HIPSYCL_DECLARE_INFO_DESCRIPTOR(max_group_sizes, range<Dim>)

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Solid.

I'd just rather argue this a definition, and so HIPSYCL_DEFINE_INFO_DESCRIPTOR().

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're totally right. Define it is :-)

Copy link
Member Author

@nmnobre nmnobre Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you aware of any place where it says more? I'm not sure it makes sense to do the required refactoring if the spec does not even clearly say anything. What do you think?

I found this for get_info() from https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#table.members.device:

"Queries this SYCL device for information requested by the template parameter Param. The type alias Param::return_type must be defined in accordance with the info parameters in Table 25 to facilitate returning the type associated with the Param parameter."

Copy link
Member Author

@nmnobre nmnobre Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In hindsight, I'm supper happy we decided to do the refactoring.
Same functionality (or just slightly more) in 60% of the lines of code. 🎉

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, looks much better :) Thanks for all this work :)

@nmnobre nmnobre force-pushed the templatez branch 2 times, most recently from edf6a72 to cc0e979 Compare April 27, 2023 16:09
@nmnobre nmnobre requested a review from illuhad April 27, 2023 19:02
@nmnobre nmnobre force-pushed the templatez branch 3 times, most recently from 8db9d4b to a3151fa Compare May 2, 2023 21:32
HIPSYCL_DEFINE_INFO_DESCRIPTOR(max_work_item_dimensions, detail::u_int);

template<int Dimensions = 3>
HIPSYCL_DEFINE_INFO_DESCRIPTOR(max_work_item_sizes, id<Dimensions>);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be range<Dim>, not id. Looks like this is a preexisting bug, so we could also fix this in a separate PR if your prefer. https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_device_information_descriptors

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@nmnobre nmnobre force-pushed the templatez branch 2 times, most recently from 18af41e to 8b1e73c Compare May 3, 2023 14:00
Copy link
Collaborator

@illuhad illuhad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Nuno!

@illuhad illuhad merged commit d2bd9fc into AdaptiveCpp:develop May 3, 2023
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Information descriptors are enum values, must be structs
2 participants