
Use reference-counting of user SYCL objects to manage runtime lifetime #749

Merged Jun 15, 2022 (4 commits)

Conversation

illuhad (Collaborator) commented Jun 14, 2022

Previously, the runtime was a singleton that lived within libhipSYCL-rt. This occasionally caused issues with shutdown behavior: because the unload order of shared libraries is not well defined, backend libraries (e.g. the CUDA runtime) could unload before the hipSYCL runtime, which can prevent a clean release of allocated resources.

This PR fixes this at a fundamental level:

  • The runtime is no longer a global singleton; instead, the application merely manages a std::weak_ptr<runtime>.
  • User-facing SYCL objects (queue, buffer, context etc.) obtain a std::shared_ptr<runtime> from the weak pointer. If the weak_ptr is not currently valid, a new runtime object is constructed. This is abstracted by a runtime_keep_alive_token, which obtains the shared_ptr in its constructor.
  • The runtime internally forwards raw runtime* pointers to all of its classes that need it; we cannot use shared_ptr here, since that could cause cyclic references or deadlocks during startup or shutdown.

With this approach, the runtime automatically shuts down once no SYCL objects remain in use in the main application. If such objects are created again, the runtime starts up again.

Generally, SYCL applications cannot assume a global SYCL state in other implementations either; hipSYCL, however, previously exhibited this behavior. The change can have a performance impact, since launching the runtime is not free. The old behavior can be restored by setting the environment variable HIPSYCL_PERSISTENT_RUNTIME=1.

illuhad merged commit 6aa58ce into develop on Jun 15, 2022
illuhad deleted the feature/rt-auto-shutdown branch on Jun 15, 2022, 15:39
psalz (Member) commented Jul 14, 2022

In Celerity we used hipsycl::rt::application::dag().flush_async(); to immediately start execution of submitted work (previous discussion in #599). This API seems to have been removed in this patch. Would hipsycl::rt::application::get_runtime_pointer()->dag().flush_async(); be an equivalent call?

illuhad (Collaborator, Author) commented Jul 14, 2022

In principle yes, although get_runtime_pointer() is a bit of an implementation detail and not particularly efficient, because it locks a mutex.
A more idiomatic approach is one of the following:

  • Store an rt::runtime_keep_alive_token in some object that is central to Celerity (something like a Celerity context?). The runtime_keep_alive_token obtains a shared_ptr to the runtime in its constructor, thereby locking the mutex only once, and keeps the hipSYCL runtime alive for as long as it exists. You can use its get() member function to obtain a pointer to the runtime efficiently.
  • Alternatively, all objects from the hipSYCL SYCL interface already have such a runtime_keep_alive_token, and you can actually leverage that using context::hipSYCL_runtime(), e.g. my_queue.get_context().hipSYCL_runtime()->dag().flush_async().

psalz commented Jul 14, 2022

Perfect, thank you!
