emulate hip/cuda-Memcpy3D with a kernel #1014
force-pushed from 5e529c7 to 08ba662
/// It is required to start `height * depth` HIP/CUDA blocks.
/// The kernel loops over the memory rows.
template<typename T>
__global__ void hipMemcpy3DEmulatedKernelD2D(
note: a native kernel is used to avoid cyclic dependencies within alpaka.
cmake/alpakaCommon.cmake
@@ -18,7 +19,10 @@ set(ALPAKA_ACC_CPU_B_SEQ_T_FIBERS_ENABLE_DEFAULT ON)
set(ALPAKA_ACC_CPU_B_TBB_T_SEQ_ENABLE_DEFAULT ON)
set(ALPAKA_ACC_CPU_B_OMP2_T_SEQ_ENABLE_DEFAULT ON)
set(ALPAKA_ACC_CPU_B_SEQ_T_OMP2_ENABLE_DEFAULT ON)
set(ALPAKA_ACC_CPU_BT_OMP4_ENABLE_DEFAULT ON)
set(ALPAKA_ACC_ANY_BT_OMP5_ENABLE_DEFAULT ON)
set(ALPAKA_ACC_ANY_BT_OACC_ENABLE_DEFAULT OFF)
I think this should not be part of this PR.
fixed
force-pushed from e5889d1 to d7a4c63
- add kernel to emulate hip/cuda-Memcpy3D
- add CMake option to enable/disable emulated memory copy (by default only for HIP enabled)
force-pushed from d7a4c63 to e503b7f
@BenjaminW3 Do you know what we can do if a CI test fails? GitHub Actions does not allow restarting single tests.
GitHub is working on it. I do not know of a solution other than rebuilding everything or merging it anyway.
Feel free to merge it even if the CI is not passing. The CI passed before, but I fixed an indentation issue in CMake, and today not all tests are passing.
ALPAKA_EMU_MEMCPY3D for one HIP and one CUDA CI test
This optimization is based on my HIP issue and increases memory copy performance for device-to-device copies on the same device.
I enabled the emulated copy only for HIP by default. For CUDA it can optionally be enabled, but it does not show any improvement. I assume the CUDA driver already uses a kernel instead of looping over the rows and calling 1D memory copies.
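For readers who want to see the indexing behind "start `height * depth` blocks", here is a rough CPU-side C++ model of such a copy. It is only a sketch, not the kernel from this PR: `PitchedView`, `emulatedMemcpy3D` and `threadsPerBlock` are hypothetical names, the copy works on bytes instead of a templated element type `T`, and the mapping of one block to one row with threads striding across that row is an assumption about how the kernel distributes work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one pitched 3D buffer.
// The names are illustrative and are not alpaka types.
struct PitchedView
{
    std::uint8_t* data;     // first byte of the buffer
    std::size_t rowPitch;   // bytes between consecutive rows (>= row width)
    std::size_t slicePitch; // bytes between consecutive slices
};

// CPU model of the launch: one "block" per row (height * depth blocks),
// with `threadsPerBlock` "threads" striding across the bytes of that row.
inline void emulatedMemcpy3D(
    PitchedView dst,
    PitchedView src,
    std::size_t widthBytes,
    std::size_t height,
    std::size_t depth,
    std::size_t threadsPerBlock = 4)
{
    assert(widthBytes <= src.rowPitch && widthBytes <= dst.rowPitch);
    assert(threadsPerBlock > 0);

    std::size_t const numBlocks = height * depth;
    for(std::size_t block = 0; block < numBlocks; ++block)
    {
        // Recover the slice (z) and row (y) from the linear block index.
        std::size_t const z = block / height;
        std::size_t const y = block % height;

        std::uint8_t const* srcRow = src.data + z * src.slicePitch + y * src.rowPitch;
        std::uint8_t* dstRow = dst.data + z * dst.slicePitch + y * dst.rowPitch;

        // On a GPU these iterations run in parallel; here they run in order.
        for(std::size_t thread = 0; thread < threadsPerBlock; ++thread)
            for(std::size_t x = thread; x < widthBytes; x += threadsPerBlock)
                dstRow[x] = srcRow[x];
    }
}
```

The point of the model is that a single launch covers all rows and slices, whereas a driver that loops over the rows issues one 1D copy per row. Padding bytes between `widthBytes` and the pitch are never touched, so source and destination may use different pitches.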