Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SYCL: Add support for Graphs #6912

Open
wants to merge 38 commits into
base: develop
Choose a base branch
from

Conversation

masterleinad
Copy link
Contributor

@masterleinad masterleinad commented Apr 2, 2024

This pull request adds support for Graphs based on https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc. The current status is sufficient to pass all related unit tests.
Note that:

  • immediate command lists are not supported in some implementations
  • SYCL_EXT_ONEAPI_GRAPH needs to be defined
  • we don't deal with scratch memory at all (same as for Cuda and HIP) so Team policy and parallel_reduce/parallel_scan are not properly supported.

@masterleinad masterleinad marked this pull request as ready for review May 8, 2024 20:51
@masterleinad
Copy link
Contributor Author

@romintomasetti FYI

Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the no_immediate_command_list changes stand on their own or are do they only make sense in the context of this PR?

{0, 0}},
KOKKOS_LAMBDA(int, int, int&){}, count)
.then_parallel_reduce(
Kokkos::TeamPolicy<TEST_EXECSPACE>{0, 1},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use Kokkos::AUTO as 2nd argument here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting that this is increasing coverage of the current test and could be proposed on its own.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to use Kokkos::AUTO as 2nd argument here?

No

Just noting that this is increasing coverage of the current test and could be proposed on its own.

Yes, we can do this independently. I only noticed that we don't test all the implementations (even though we can't do proper reductions anyway).

@@ -19,6 +19,9 @@

#if defined(KOKKOS_ENABLE_SYCL)
#include <SYCL/Kokkos_SYCL.hpp>
#ifdef SYCL_EXT_ONEAPI_GRAPH
#include <SYCL/Kokkos_SYCL_GraphNodeKernel.hpp>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the current design was you get these when you explicitly include <Kokkos_Graph.hpp> meaning we should not add it here.
Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cuda and HIP are abstracting the kernel launch away but we don't have such a mechanism for the SYCL backend (yet). This means that the parallel construct implementations are calling get_sycl_graph_node_from_kernel directly and need this header.

For a correct parallel_reduce/parallel_scan implementation, we will need to create multiple nodes anyway and can't get away with just attaching to the node in the kernel launch abstraction.

@masterleinad
Copy link
Contributor Author

Do the no_immediate_command_list changes stand on their own or are do they only make sense in the context of this PR?

Whether or not to use immediate command lists is independent of this pull request (the difference is whether the runtime waits for a couple of kernels before executing them batched or immediately executes them). Thus, it's something users could play with depending on which mode gives better performance.
The reason to introduce the option here is that the initial implementation didn't support immediate command lists. This limitation was removed in intel/llvm#12279 and oneapi-src/unified-runtime#1218 but is not included in the default versions on the testbeds.

@dalg24
Copy link
Member

dalg24 commented May 9, 2024

Then please open a separate PR for this change

@@ -21,6 +21,7 @@

namespace Test {

#if !defined(KOKKOS_ENABLE_SYCL) || defined(SYCL_EXT_ONEAPI_GRAPH)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure to understand why you want to disable the whole code in this file if KOKKOS_ENABLE_SYCL is defined...

I'm also wondering who is defining SYCL_EXT_ONEAPI_GRAPH ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the logic be in the CMake itself ? Something like:

if(KOKKOS_ENABLE_SYCL AND SYCL_EXT_ONEAPI_GRAPH)
    .... enable the SYCL graph test ...
endif()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition reads: only enable the test if the SYCL backend is not enabled, or if the SYCL backend is enabled and the SYCL_EXT_ONEAPI_GRAPH feature macro is defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the logic be in the CMake itself ?

The feature macro is not a CMake variable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I would suggest to move the line

#if !defined(KOKKOS_ENABLE_SYCL) || defined(SYCL_EXT_ONEAPI_GRAPH)

to TestSYCL_Graph.hpp then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We typically auto-generate the *.cpp test files (such as TestSYCL_Graph.cpp) which we might want to do in a follow-up. Moving the guard to the *.cpp file would prevent that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I would strongly advise a "CMake experimental feature flag" like KOKKOS_ENABLE_SYCL_GRAPH or so for several reasons 🔢

  1. I feel strongly against introducing backend-specific preprocessing guard that deactivate the whole bunch of generic tests in TestGraph.hpp.
    This goes against the idea of a portable test file and is a backdoor for future maintenance headaches.
  2. It feels strange that if a user has the SYCL backend enabled, but hasn't support for SYCL_EXT_ONEAPI_GRAPH, still there will be a test target generated
    for TestSYCL_Graph.cpp that actually does nothing. To me, such a test deactivation should be done in the CMakeLists.txt for sure.
  3. Am I right to say that if KOKKOS_ENABLE_SYCL is defined, but SYCL_EXT_ONEAPI_GRAPH is not, then Kokkos::Graph<SYCL> will fallback on the
    default GraphImpl? If so, I see that as a very big warning sign ⚠️. Indeed, as a user that wants the SYCL specialization, I would not be happy to see that
    Kokkos::Graph<SYCL> silently falls back on the default Graph implementation...

For these reasons, I strongly suggest that you introduce KOKKOS_ENABLE_SYCL_GRAPH or so. It would be an "experimental feature" flag that, if set by the user,
enforces the SYCL Graph specialization (and therefore ensures that compilation fails if SYCL_EXT_ONEAPI_GRAPH is not supported).

You could also check for the SYCL_EXT_ONEAPI_GRAPH support with some kind of try_compile.

The logic could be align these lines:

if(NOT DEFINED KOKKOS_ENABLE_SYCL_GRAPH)
    try_compile(...)
    if(COMPILATION_SUCCEEDED)
        set(KOKKOS_ENABLE_SYCL_GRAPH ON)
    endif()
endif()

...

if(KOKKOS_ENABLE_SYCL_GRAPH)
    add_test(TestSYCL_Graph.cpp)
endif()

What do you think ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I feel strongly against introducing backend-specific preprocessing guard that deactivate the whole bunch of generic tests in TestGraph.hpp.
  2. It feels strange that if a user has the SYCL backend enabled, but hasn't support for SYCL_EXT_ONEAPI_GRAPH, still there will be a test target generated
    for TestSYCL_Graph.cpp that actually does nothing. To me, such a test deactivation should be done in the CMakeLists.txt for sure.

We shouldn't actually need that guard and can just always execute the test even if no specialization is available, see #7011.

Am I right to say that if KOKKOS_ENABLE_SYCL is defined, but SYCL_EXT_ONEAPI_GRAPH is not, then Kokkos::Graph will fallback on the
default GraphImpl? If so, I see that as a very big warning sign ⚠️. Indeed, as a user that wants the SYCL specialization, I would not be happy to see that
Kokkos::Graph silently falls back on the default Graph implementation...

That's how the Graph implementation is designed and we should not start breaking Graphs for users if there is a specialization but it's not active in their configuration. We always reserve the right to improve performance based on the backend capability without the need for the user to pitch in. I added information about SYCL_EXT_ONEAPI_GRAPH to print_configuration, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't actually need that guard and can just always execute the test even if no specialization is available, see #7011.

Nice 👍

That's how the Graph implementation is designed and we should not start breaking Graphs for users if there is a specialization but it's not active in their configuration. We always reserve the right to improve performance based on the backend capability without the need for the user to pitch in. I added information about SYCL_EXT_ONEAPI_GRAPH to print_configuration, though.

My suggestion is about allowing the user to "fore opt-in into the specialization or stop compilation" 😉 So if I define KOKKOS_ENABLE_SYCL_GRAPH_IMPL, and for some reason it will not be possible to compile the SYCL specialization, I would prefer that Kokkos does not compile, rather than falling back to the default implementation. However, if I don't define KOKKOS_ENABLE_SYCL_GRAPH_IMPL, then I agree with you that Kokkos can do whatever it want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what your use case is. Why would you rather fail compilation than use a fallback implementation? Since KOKKOS_ENABLE_SYCL_GRAPH_IMPL is not portable, you can just as well check for SYCL_EXT_ONEAPI_GRAPH in user code.

Note that we already have

#if !((HIP_VERSION_MAJOR == 5) && (HIP_VERSION_MINOR == 2))
#define KOKKOS_IMPL_HIP_GRAPH_ENABLED
#endif

which silently enables or disables the Graph specialization based on the compiler version.

static sycl::event sycl_direct_launch(const Policy& policy,
const Functor& functor,
const sycl::event& memcpy_event) {
sycl::event sycl_direct_launch(const Policy& policy, const Functor& functor,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just observing that this PR makes all the Parallel<Construct>::sycl_direct_launch have a somewhat misleading name because they now either launch a kernel or append a node to some graph.
Any thought on whether/how we can improve?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we could inline sycl_direct_launch into execute but that's just as misleading (but there is no good reason anymore not to do it other than that there is little gain from it).
Just as for all other Graph specializations, the implementation calls execute since that's where the functor is set up. In the case of HIP and Cuda, the functor is then passed to a launch mechanism outside, and only there treating it as a regular kernel or graph node happens. This won't work once a parallel construct requires more than one node though.

We could think about renaming execute everywhere and split it into a prepare_kernel and a launch step and then introduce a setup_graph_node function (which might just call launch for Cuda and HIP but would call sycl_attach_kernel_to_node for SYCL).

if constexpr (Policy::is_graph_kernel::value)
Kokkos::abort(
"parallel_reduce not implemented for graph kernels if result is "
"not device-accessible!");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that something we could turn into a compile error?
(Not necessarily blocking, just wondering if you look into this and if it is doable)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little tricky since we don't know at compile-time at this point whether parallel_reduce was called with a result parameter or not.

core/unit_test/TestGraph.hpp Outdated Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_Graph_Impl.hpp Outdated Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_Graph_Impl.hpp Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_Graph_Impl.hpp Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_Graph_Impl.hpp Outdated Show resolved Hide resolved
}

void set_sycl_graph_node_ptr(
std::optional<sycl::ext::oneapi::experimental::node>* arg_node) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you would prefer pointer to optional to resemble the CUDA/HIP code?
I suppose that's fine.

core/src/SYCL/Kokkos_SYCL_Graph_Impl.hpp Outdated Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_Graph_Impl.hpp Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_GraphNodeKernel.hpp Outdated Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_GraphNodeKernel.hpp Outdated Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_GraphNodeKernel.hpp Outdated Show resolved Hide resolved
@masterleinad
Copy link
Contributor Author

I can confirm that this works correctly on the Intel testbeds with two different compiler drops (one of them requires SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0).

@masterleinad masterleinad requested a review from nliber May 31, 2024 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants