
Use reference-counting of user SYCL objects to manage runtime lifetime #749

Merged Jun 15, 2022 (4 commits)

Conversation

illuhad (Collaborator) commented Jun 14, 2022

Previously, the runtime was a singleton that lived within libhipSYCL-rt. This occasionally caused issues with shutdown behavior: because the unload order of shared libraries is not well defined, backend libraries (e.g. the CUDA runtime) could unload before the hipSYCL runtime, which can prevent a clean release of allocated resources.

This PR fixes this at a fundamental level:

  • The runtime is no longer a global singleton; instead, the application merely manages a std::weak_ptr<runtime>.
  • User-facing SYCL objects (queue, buffer, context etc.) obtain a std::shared_ptr<runtime> from the weak pointer. If the weak_ptr is not currently valid, a new runtime object is constructed. This is abstracted by a runtime_keep_alive_token, which obtains the shared_ptr in its constructor.
  • The runtime internally forwards raw runtime* pointers to all of its classes that need it; we cannot use shared_ptr here, since that could cause cyclic references or deadlocks during startup or shutdown.

With this approach, the runtime automatically shuts down once no SYCL objects remain in use in the main application. If such objects are created again, the runtime starts up again.

Generally, SYCL applications cannot assume a global SYCL state in other implementations either; hipSYCL, however, previously exhibited this behavior. The change can have a performance impact, since launching the runtime is not free. The old behavior can be restored by setting the environment variable HIPSYCL_PERSISTENT_RUNTIME=1.

illuhad merged commit 6aa58ce into develop on Jun 15, 2022
illuhad deleted the feature/rt-auto-shutdown branch on Jun 15, 2022, 15:39
psalz (Member) commented Jul 14, 2022

In Celerity we used hipsycl::rt::application::dag().flush_async(); to immediately start execution of submitted work (previous discussion in #599). This API seems to have been removed in this patch. Would hipsycl::rt::application::get_runtime_pointer()->dag().flush_async(); be an equivalent call?

illuhad (Collaborator, Author) commented Jul 14, 2022

In principle yes, although get_runtime_pointer() is a bit of an implementation detail and not particularly efficient, because it locks a mutex.
A more idiomatic approach is one of the following:

  • Store an rt::runtime_keep_alive_token in some object that is central to Celerity (something like a Celerity context?). The runtime_keep_alive_token obtains a shared_ptr to the runtime in its constructor, thereby locking the mutex only once, and keeps the hipSYCL runtime alive for as long as it exists. You can use its get() member function to obtain a pointer to the runtime efficiently.
  • Alternatively, all objects from the hipSYCL SYCL interface already have such a runtime_keep_alive_token, and you can actually leverage that using context::hipSYCL_runtime(), e.g. my_queue.get_context().hipSYCL_runtime()->dag().flush_async().

psalz commented Jul 14, 2022

Perfect, thank you!
