Convert LinearAlgebra::distributed::Vector to Kokkos #14537
Conversation
With the latest commit, I can run …
(force-pushed from d317012 to f11fc59)
/rebuild
Can you rebase to see if the new MPI Jenkins passes?
(force-pushed from c554965 to d8ac4cd)
(force-pushed from d8ac4cd to b21c27b)
I fixed the authorship for all commits and reverted unifying the implementations for …
It seems …
Just minor comments. Do you know if this PR fixes the problem we have with ghosted vectors (see the cuda/parallel_vector tests failing here)?
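For context, here is a hedged, self-contained sketch of the ghosted-vector pattern the cuda/parallel_vector tests exercise. This is not one of the failing tests; the index layout, sizes, and names are made up for illustration, and it uses the host memory space rather than the Kokkos/CUDA one.

```cpp
#include <deal.II/base/index_set.h>
#include <deal.II/base/mpi.h>
#include <deal.II/lac/la_parallel_vector.h>

// Each rank owns a contiguous block of entries and additionally imports one
// ghost entry owned by the next rank; update_ghost_values() makes that ghost
// entry readable locally.
int main(int argc, char *argv[])
{
  dealii::Utilities::MPI::MPI_InitFinalize mpi_init(argc, argv, 1);

  const MPI_Comm     comm    = MPI_COMM_WORLD;
  const unsigned int rank    = dealii::Utilities::MPI::this_mpi_process(comm);
  const unsigned int n_ranks = dealii::Utilities::MPI::n_mpi_processes(comm);

  const unsigned int n_local = 4;
  dealii::IndexSet   owned(n_local * n_ranks);
  owned.add_range(n_local * rank, n_local * (rank + 1));

  dealii::IndexSet ghosts(n_local * n_ranks);
  if (rank + 1 < n_ranks)
    ghosts.add_index(n_local * (rank + 1)); // first entry of the next rank

  dealii::LinearAlgebra::distributed::Vector<double> v(owned, ghosts, comm);
  for (unsigned int i = 0; i < v.locally_owned_size(); ++i)
    v.local_element(i) = static_cast<double>(n_local * rank + i);

  v.update_ghost_values(); // import the ghost entry from the neighbor
  return 0;
}
```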
```cpp
Kokkos::realloc(indices_dev, tmp_n_elements);
Kokkos::deep_copy(indices_dev,
                  Kokkos::View<size_type *>(indices.data(),
                                            tmp_n_elements));
```
`indices` is already a View, so why do you create a new one? Shouldn't this be a subview?
Fixed.
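For reference, a minimal hedged sketch of the suggested change (a standalone program invented for illustration, not the PR's code): take the first `tmp_n_elements` entries of the existing View with `Kokkos::subview` instead of wrapping its raw pointer in a new unmanaged View.

```cpp
#include <Kokkos_Core.hpp>

#include <cstddef>

int main(int argc, char *argv[])
{
  Kokkos::initialize(argc, argv);
  {
    using size_type = std::size_t;

    // Host-side View that already exists (stand-in for `indices`).
    Kokkos::View<size_type *, Kokkos::HostSpace> indices("indices", 10);
    const size_type tmp_n_elements = 6;

    // Resize the device View, then copy from a subview of the host View
    // instead of constructing Kokkos::View<size_type *>(indices.data(), ...).
    Kokkos::View<size_type *> indices_dev("indices_dev", 0);
    Kokkos::realloc(indices_dev, tmp_n_elements);
    Kokkos::deep_copy(indices_dev,
                      Kokkos::subview(indices,
                                      Kokkos::make_pair(size_type(0),
                                                        tmp_n_elements)));
  }
  Kokkos::finalize();
  return 0;
}
```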
```cpp
AssertCuda(error_code);
typename ::dealii::MemorySpace::Default::kokkos_space::execution_space exec;
Kokkos::parallel_reduce(
```
You forgot to give a name to all the `parallel_reduce` calls.
Fixed.
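For completeness, a small standalone example of what the request amounts to (the label, sizes, and names are invented, not taken from the PR): pass a string label as the first argument so the kernel shows up by name in Kokkos profiling and debugging tools.

```cpp
#include <Kokkos_Core.hpp>

int main(int argc, char *argv[])
{
  Kokkos::initialize(argc, argv);
  {
    const int              n = 1000;
    Kokkos::View<double *> v("v", n);
    Kokkos::deep_copy(v, 1.0);

    double sum = 0.;
    Kokkos::parallel_reduce(
      "example::sum_of_squares", // the label the review asks for
      n,
      KOKKOS_LAMBDA(const int i, double &local) { local += v(i) * v(i); },
      sum);
  }
  Kokkos::finalize();
  return 0;
}
```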
I haven't run the tests on NVIDIA GPUs but only on Intel GPUs (and CPUs using …).
Maybe a problem when we use the CUDA-aware MPI path then. Is there something similar to CUDA-aware MPI with SYCL?
Yes, there is GPU-aware MPI for Intel GPUs but this pull request doesn't address that. That's what #14571 is for (I have only tested that one with …).
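As a hedged illustration of what the GPU-aware MPI path means in this discussion (this is not code from this PR or from #14571; the helper function and flag are hypothetical): with a GPU-aware MPI library the device buffer can be handed to MPI directly, otherwise it has to be staged through a host copy first.

```cpp
#include <Kokkos_Core.hpp>

#include <mpi.h>

// Hypothetical helper: send a device buffer to `dest`, either directly
// (GPU-aware MPI) or via a host mirror (plain MPI).
void send_buffer(const Kokkos::View<double *> &device_buffer,
                 const int                     dest,
                 const bool                    mpi_is_gpu_aware,
                 const MPI_Comm                comm)
{
  if (mpi_is_gpu_aware)
    {
      // GPU-aware path: MPI reads straight from device memory.
      MPI_Send(device_buffer.data(),
               static_cast<int>(device_buffer.size()),
               MPI_DOUBLE,
               dest,
               /*tag=*/0,
               comm);
    }
  else
    {
      // Fallback: copy to a host mirror and send from there.
      auto host_buffer =
        Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace{},
                                            device_buffer);
      MPI_Send(host_buffer.data(),
               static_cast<int>(host_buffer.size()),
               MPI_DOUBLE,
               dest,
               /*tag=*/0,
               comm);
    }
}
```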
(force-pushed from c9628f1 to cacd8d3)
Depends on #14510.