-
Notifications
You must be signed in to change notification settings - Fork 405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SYCL: Add support for Graphs #6912
base: develop
Are you sure you want to change the base?
Conversation
32b3243
to
7bb46ab
Compare
f07fc23
to
3ef4b14
Compare
@romintomasetti FYI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the no_immediate_command_list
changes stand on their own or are do they only make sense in the context of this PR?
core/unit_test/TestGraph.hpp
Outdated
{0, 0}}, | ||
KOKKOS_LAMBDA(int, int, int&){}, count) | ||
.then_parallel_reduce( | ||
Kokkos::TeamPolicy<TEST_EXECSPACE>{0, 1}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to use Kokkos::AUTO
as 2nd argument here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting that this is increasing coverage of the current test and could be proposed on its own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to use Kokkos::AUTO as 2nd argument here?
No
Just noting that this is increasing coverage of the current test and could be proposed on its own.
Yes, we can do this independently. I only noticed that we don't test all the implementations (even though we can't do proper reductions anyway).
@@ -19,6 +19,9 @@ | |||
|
|||
#if defined(KOKKOS_ENABLE_SYCL) | |||
#include <SYCL/Kokkos_SYCL.hpp> | |||
#ifdef SYCL_EXT_ONEAPI_GRAPH | |||
#include <SYCL/Kokkos_SYCL_GraphNodeKernel.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the current design was you get these when you explicitly include <Kokkos_Graph.hpp>
meaning we should not add it here.
Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cuda
and HIP
are abstracting the kernel launch away but we don't have such a mechanism for the SYCL
backend (yet). This means that the parallel construct implementations are calling get_sycl_graph_node_from_kernel
directly and need this header.
For a correct parallel_reduce/parallel_scan implementation, we will need to create multiple nodes anyway and can't get away with just attaching to the node in the kernel launch abstraction.
Whether or not to use immediate command lists is independent of this pull request (the difference is whether the runtime waits for a couple of kernels before executing them batched or immediately executes them). Thus, it's something users could play with depending on which mode gives better performance. |
Then please open a separate PR for this change |
277b470
to
67597df
Compare
core/unit_test/TestGraph.hpp
Outdated
@@ -21,6 +21,7 @@ | |||
|
|||
namespace Test { | |||
|
|||
#if !defined(KOKKOS_ENABLE_SYCL) || defined(SYCL_EXT_ONEAPI_GRAPH) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure to understand why you want to disable the whole code in this file if KOKKOS_ENABLE_SYCL
is defined...
I'm also wondering who is defining SYCL_EXT_ONEAPI_GRAPH
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the logic be in the CMake
itself ? Something like:
if(KOKKOS_ENABLE_SYCL AND SYCL_EXT_ONEAPI_GRAPH)
.... enable the SYCL graph test ...
endif()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition reads: only enable the test if the SYCL backend is not enabled, or if the SYCL backend is enabled and the SYCL_EXT_ONEAPI_GRAPH feature macro is defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the logic be in the
CMake
itself ?
The feature macro is not a CMake
variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I would suggest to move the line
#if !defined(KOKKOS_ENABLE_SYCL) || defined(SYCL_EXT_ONEAPI_GRAPH)
to TestSYCL_Graph.hpp
then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We typically auto-generate the *.cpp test files (such as TestSYCL_Graph.cpp
) which we might want to do in a follow-up. Moving the guard to the *.cpp file would prevent that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I would strongly advise a "CMake experimental feature flag" like KOKKOS_ENABLE_SYCL_GRAPH
or so for several reasons 🔢
- I feel strongly against introducing backend-specific preprocessing guard that deactivate the whole bunch of generic tests in
TestGraph.hpp
.
This goes against the idea of a portable test file and is a backdoor for future maintenance headaches. - It feels strange that if a user has the
SYCL
backend enabled, but hasn't support forSYCL_EXT_ONEAPI_GRAPH
, still there will be a test target generated
forTestSYCL_Graph.cpp
that actually does nothing. To me, such a test deactivation should be done in theCMakeLists.txt
for sure. - Am I right to say that if
KOKKOS_ENABLE_SYCL
is defined, butSYCL_EXT_ONEAPI_GRAPH
is not, thenKokkos::Graph<SYCL>
will fallback on the
defaultGraphImpl
? If so, I see that as a very big warning sign⚠️ . Indeed, as a user that wants theSYCL
specialization, I would not be happy to see that
Kokkos::Graph<SYCL>
silently falls back on the defaultGraph
implementation...
For these reasons, I strongly suggest that you introduce KOKKOS_ENABLE_SYCL_GRAPH
or so. It would be an "experimental feature" flag that, if set by the user,
enforces the SYCL
Graph
specialization (and therefore ensures that compilation fails if SYCL_EXT_ONEAPI_GRAPH
is not supported).
You could also check for the SYCL_EXT_ONEAPI_GRAPH
support with some kind of try_compile
.
The logic could be align these lines:
if(NOT DEFINED KOKKOS_ENABLE_SYCL_GRAPH)
try_compile(...)
if(COMPILATION_SUCCEEDED)
set(KOKKOS_ENABLE_SYCL_GRAPH ON)
endif()
endif()
...
if(KOKKOS_ENABLE_SYCL_GRAPH)
add_test(TestSYCL_Graph.cpp)
endif()
What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I feel strongly against introducing backend-specific preprocessing guard that deactivate the whole bunch of generic tests in TestGraph.hpp.
- It feels strange that if a user has the SYCL backend enabled, but hasn't support for SYCL_EXT_ONEAPI_GRAPH, still there will be a test target generated
for TestSYCL_Graph.cpp that actually does nothing. To me, such a test deactivation should be done in the CMakeLists.txt for sure.
We shouldn't actually need that guard and can just always execute the test even if no specialization is available, see #7011.
Am I right to say that if KOKKOS_ENABLE_SYCL is defined, but SYCL_EXT_ONEAPI_GRAPH is not, then Kokkos::Graph will fallback on the
default GraphImpl? If so, I see that as a very big warning sign⚠️ . Indeed, as a user that wants the SYCL specialization, I would not be happy to see that
Kokkos::Graph silently falls back on the default Graph implementation...
That's how the Graph implementation is designed and we should not start breaking Graphs for users if there is a specialization but it's not active in their configuration. We always reserve the right to improve performance based on the backend capability without the need for the user to pitch in. I added information about SYCL_EXT_ONEAPI_GRAPH
to print_configuration
, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't actually need that guard and can just always execute the test even if no specialization is available, see #7011.
Nice 👍
That's how the Graph implementation is designed and we should not start breaking Graphs for users if there is a specialization but it's not active in their configuration. We always reserve the right to improve performance based on the backend capability without the need for the user to pitch in. I added information about SYCL_EXT_ONEAPI_GRAPH to print_configuration, though.
My suggestion is about allowing the user to "fore opt-in into the specialization or stop compilation" 😉 So if I define KOKKOS_ENABLE_SYCL_GRAPH_IMPL
, and for some reason it will not be possible to compile the SYCL
specialization, I would prefer that Kokkos
does not compile, rather than falling back to the default implementation. However, if I don't define KOKKOS_ENABLE_SYCL_GRAPH_IMPL
, then I agree with you that Kokkos
can do whatever it want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what your use case is. Why would you rather fail compilation than use a fallback implementation? Since KOKKOS_ENABLE_SYCL_GRAPH_IMPL
is not portable, you can just as well check for SYCL_EXT_ONEAPI_GRAPH
in user code.
Note that we already have
#if !((HIP_VERSION_MAJOR == 5) && (HIP_VERSION_MINOR == 2))
#define KOKKOS_IMPL_HIP_GRAPH_ENABLED
#endif
which silently enables or disables the Graph specialization based on the compiler version.
16ed361
to
8da765f
Compare
64a959a
to
96209c1
Compare
96209c1
to
358416e
Compare
static sycl::event sycl_direct_launch(const Policy& policy, | ||
const Functor& functor, | ||
const sycl::event& memcpy_event) { | ||
sycl::event sycl_direct_launch(const Policy& policy, const Functor& functor, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just observing that this PR makes all the Parallel<Construct>::sycl_direct_launch
have a somewhat misleading name because they now either launch a kernel or append a node to some graph.
Any thought on whether/how we can improve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we could inline sycl_direct_launch
into execute
but that's just as misleading (but there is no good reason anymore not to do it other than that there is little gain from it).
Just as for all other Graph specializations, the implementation calls execute
since that's where the functor is set up. In the case of HIP
and Cuda
, the functor is then passed to a launch mechanism outside, and only there treating it as a regular kernel or graph node happens. This won't work once a parallel construct requires more than one node though.
We could think about renaming execute
everywhere and split it into a prepare_kernel
and a launch
step and then introduce a setup_graph_node
function (which might just call launch
for Cuda and HIP but would call sycl_attach_kernel_to_node
for SYCL).
if constexpr (Policy::is_graph_kernel::value) | ||
Kokkos::abort( | ||
"parallel_reduce not implemented for graph kernels if result is " | ||
"not device-accessible!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that something we could turn into a compile error?
(Not necessarily blocking, just wondering if you look into this and if it is doable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little tricky since we don't know at compile-time at this point whether parallel_reduce was called with a result parameter or not.
} | ||
|
||
void set_sycl_graph_node_ptr( | ||
std::optional<sycl::ext::oneapi::experimental::node>* arg_node) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you would prefer pointer to optional to resemble the CUDA/HIP code?
I suppose that's fine.
5e530ad
to
548d4dc
Compare
I can confirm that this works correctly on the Intel testbeds with two different compiler drops (one of them requires |
This pull request adds support for Graphs based on https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_graph.asciidoc. The current status is sufficient to pass all related unit tests.
Note that:
SYCL_EXT_ONEAPI_GRAPH
needs to be defined