Convert CUDA tests to use Kokkos #14628
/rebuild
Since we are not using any shared memory space like |
Why? If they are not necessary, why do you want to add them?
I think it makes sense to at least avoid global fences in certain situations (we will need more profiling to identify which ones), and I would rather have the implementation ready for that. If the global fences don't hurt us, adding a few more fences is likely not performance-critical either. Either way, I think we should make it hard for users to forget about synchronization and offer options to be more specific when necessary. Also, I have already removed the fences you were commenting on.
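To make the distinction concrete, here is a toy sketch of the difference between a global fence and a more specific one. It does not use Kokkos: `ToyStream` and `global_fence` are hypothetical stand-ins invented for illustration, with deferred tasks playing the role of asynchronously launched kernels. In Kokkos itself, the corresponding calls would be `Kokkos::fence()` for the global case and the `fence()` member of an execution space instance for the specific case.

```cpp
#include <cassert>
#include <functional>
#include <initializer_list>
#include <queue>
#include <utility>

// Toy stand-in for one execution space instance (NOT a Kokkos type).
// Submitted work is deferred until a fence drains the queue.
class ToyStream {
public:
  void enqueue(std::function<void()> task) { pending_.push(std::move(task)); }

  // Instance-level fence: completes only the work submitted to this stream.
  void fence() {
    while (!pending_.empty()) {
      pending_.front()();
      pending_.pop();
    }
  }

  bool idle() const { return pending_.empty(); }

private:
  std::queue<std::function<void()>> pending_;
};

// Global fence: waits for every stream, whether or not the caller needs it.
inline void global_fence(std::initializer_list<ToyStream *> streams) {
  for (ToyStream *s : streams)
    s->fence();
}
```

With two streams `a` and `b`, calling `a.fence()` completes only the work queued on `a` and leaves `b` untouched, whereas `global_fence({&a, &b})` drains both. Whether the extra waiting matters in practice is exactly what the profiling mentioned above would have to show.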
This should cover all tests that do not require CUDA/GPU/Device-aware MPI, CUDA objects (sparse matrices, vectors, preconditioners), or CUDA MatrixFree.