Update rotmg interface to handle issue in OpenCL CPU support #532
src/interface/blas1_interface.hpp
Outdated
| auto copy_y1 = | ||
| blas::helper::copy_to_device(sb_handle.get_queue(), &_y1, _y1_tmp, 1); |
Does copying this need to wait on _dependencies?
Thank you for your review @hjabird
I thought it was more asynchronous, but maybe y1 could be the result of another operation and the copy needs to wait on the dependencies. I will update it.
I addressed this in 81f5ee7 and changed the other dependencies accordingly.
| constexpr helper::AllocType mem_type = std::is_pointer_v<container_0_t> | ||
| ? helper::AllocType::usm | ||
| : helper::AllocType::buffer; | ||
| auto _y1_tmp = blas::helper::allocate<mem_type, container_3_t>( |
Thanks for your PR, @s-Nick.
I think that _y1_tmp needs to be deallocated when everything is done.
src/interface/blas1_interface.hpp
Outdated
| // portBLAS implementation. Otherwise event dependencies works fine. The | ||
| // issue has been reported to intel/llvm project here: | ||
| // https://github.com/intel/llvm/issues/14623 | ||
| if constexpr (mem_type != helper::AllocType::buffer) copy_y1.wait(); |
Is this still required since we are not dealing anymore with host_task dependencies?
When I first tested it, the wait still seemed necessary, but with more synchronization now in place it no longer is. I removed it in c128d67. I tested in a loop on multiple hardware targets and it works fine now. Thanks for pointing it out.
For compatibility with oneMKL, rotmg needs to handle a scalar value that is not passed as a device pointer. Moreover, a bug in the compiler means that event dependencies are not always respected when using the OpenCL backend on CPUs, so a workaround is needed specifically for the default target, which can be used on any hardware.
The workaround is required only for the USM memory type.
Add dependencies to copy operator and change how the dependencies are passed accordingly
Add async deallocation for the temporary memory allocated to support scalar value.
This patch adds the new rotmg interface to the library itself, not only for header-only usage. It adds tests to portBLAS and updates the existing test. The test updates include removing y1 from result checking, since according to the spec y1 is not part of the operator output.
host_task caused issues with some backends, so change to a synchronization and a simple sycl::free when using USM. Remove the synchronization on the copy since it is no longer necessary.
To offer proper oneMKL support, the rotmg operator must allow the y1 argument to be passed as a scalar value. This PR enables this by adding a check on its type and handling both cases: pointer/buffer or scalar.
Moreover, due to a bug in OpenCL, the copy for the DEFAULT target, which is generally used on CPUs, must be explicitly synchronized when using USM.
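The type check described above could be sketched roughly as follows. This is a minimal, host-only illustration in plain C++17, not the code from this PR: `FakeBuffer`, `Y1Kind`, `classify_y1` and `load_y1` are hypothetical names introduced here, and no SYCL or portBLAS API is used. It only shows how a compile-time check on the type of y1 can select between the pointer, buffer and scalar paths.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a buffer-like container (not a portBLAS type).
template <typename T>
struct FakeBuffer {
  T value;
};

// Trait that detects the hypothetical buffer wrapper above.
template <typename T>
struct is_fake_buffer : std::false_type {};
template <typename T>
struct is_fake_buffer<FakeBuffer<T>> : std::true_type {};

// The three ways y1 may be passed in this sketch.
enum class Y1Kind { pointer, buffer, scalar };

// Classify the y1 argument type at compile time.
template <typename Y1>
constexpr Y1Kind classify_y1() {
  using D = std::decay_t<Y1>;
  if constexpr (std::is_pointer_v<D>) {
    return Y1Kind::pointer;
  } else if constexpr (is_fake_buffer<D>::value) {
    return Y1Kind::buffer;
  } else {
    static_assert(std::is_arithmetic_v<D>,
                  "y1 must be a pointer, a buffer, or an arithmetic scalar");
    return Y1Kind::scalar;
  }
}

// Read the value of y1 regardless of how it was passed.
// Only the branch matching the argument type is instantiated.
template <typename Y1>
auto load_y1(const Y1& y1) {
  constexpr Y1Kind kind = classify_y1<Y1>();
  if constexpr (kind == Y1Kind::pointer) {
    assert(y1 != nullptr);  // a pointer argument must be dereferenceable
    return *y1;
  } else if constexpr (kind == Y1Kind::buffer) {
    return y1.value;
  } else {
    return y1;  // scalar: use the value directly
  }
}
```

In the PR itself the scalar case goes further than this sketch: per the commit messages above, a temporary is allocated for the scalar, the value is copied to the device with the appropriate dependencies, and the temporary is freed once the operation has completed.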