Skip to content
This repository was archived by the owner on Jan 13, 2025. It is now read-only.

Conversation

@s-Nick
Copy link
Collaborator

@s-Nick s-Nick commented Jul 31, 2024

To offer proper oneMKL support rotmg operator must support the possibility to have y1 argument passed as scalar value.
This PR enables this possibility by adding a check on its type and handling both cases: pointer/buffer or scalar.
Moreover, due to a bug in OpenCL the copy for DEFAULT target, that it generally used on CPUs, must be explicitly synchronized if using USM.

@s-Nick s-Nick requested review from Rbiessy, hjabird and pgorlani July 31, 2024 08:13
Comment on lines 823 to 824
auto copy_y1 =
blas::helper::copy_to_device(sb_handle.get_queue(), &_y1, _y1_tmp, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does copying this need to wait on _dependencies?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review @hjabird
I thought it was more asynchronous, but maybe y1 could be the result of another operation and the copy needs to wait on the dependencies. I will update it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed in 81f5ee7. I changed other dependencies accordingly.

constexpr helper::AllocType mem_type = std::is_pointer_v<container_0_t>
? helper::AllocType::usm
: helper::AllocType::buffer;
auto _y1_tmp = blas::helper::allocate<mem_type, container_3_t>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR, @s-Nick.

I think that _y1_tmp needs to be deallocated when everything is done.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review @pgorlani

yes, you are right. I addressed it in a4e833c and update it in c128d67 to avoid host_task usage that causes some issues.

// portBLAS implementation. Otherwise event dependencies works fine. The
// issue has been reported to intel/llvm project here:
// https://github.com/intel/llvm/issues/14623
if constexpr (mem_type != helper::AllocType::buffer) copy_y1.wait();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still required since we are not dealing anymore with host_task dependencies?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back when I tested it still seemed necessary, but now with more sync in place it doesn't. I removed in c128d67. I tested on multiple hardware within a loop and it is fine now. Thanks for pointing it out.

@pgorlani
Copy link
Contributor

I think that src/interface/blas1/rotmg.cpp.in needs to be updated in order to consider the new rotmg interface. Probably, the unit test could be updated as well.

s-Nick added 7 commits August 21, 2024 10:20
Due to compatibility with oneMKL rotmg needs to handle scalar value not
passed as device pointer.
Moreover, a bug in the compiler make the event dependency not always
respected when using OpenCL backend on CPUs, so it is necessary a
work-around specifically for the default target which can be used for
any hardware.
The work around is required only for usm memory type.
Add dependencies to copy operator and change how the dependencies are
passed accordingly
Add async deallocation for the temporary memory allocated to support
scalar value.
This patch adds the new interface of rotmg to the library and not only
for header only usage. It adds tests to portBLAS and update the test
itself.
Test updates include removal of y1 from result checking, since according
to spec y1 is not part of the operator output.
host_task caused issues with some backends, so changing to a
synchronization and a simple sycl::free when using usm.
Removing synchronization on copy since it is not necessary anymore.
@s-Nick
Copy link
Collaborator Author

s-Nick commented Aug 21, 2024

@pgorlani I added the new interface and modify the tests in 39016ea

@s-Nick s-Nick merged commit a9378fb into codeplaysoftware:main Aug 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants