Convert CUDA tests to use Kokkos #14628

Merged
merged 7 commits into from Jan 24, 2023

masterleinad
Member

This should cover all tests that do not require CUDA/GPU/Device-aware MPI, CUDA objects (sparse matrices, vectors, preconditioners), or CUDA MatrixFree.

@masterleinad
Member Author

/rebuild

include/deal.II/lac/affine_constraints.templates.h (outdated, resolved)
tests/base/array_view_access_data.cc (outdated, resolved)
tests/base/kokkos_point.cc (outdated, resolved)
tests/base/kokkos_point.cc (outdated, resolved)
tests/base/kokkos_tensor_01.cc (outdated, resolved)
tests/base/kokkos_tensor_02.cc (outdated, resolved)
tests/lac/affine_constraints_set_zero.cc (outdated, resolved)
tests/lac/affine_constraints_set_zero.cc (resolved)
@masterleinad
Member Author

We are not using any shared memory space like CudaUVM, we only ever use the default execution space instance explicitly, and we only use the two-argument deep_copy (which fences all execution space instances before and after the copy). Given that, we should be fine without ever fencing explicitly. That being said, I think we should reevaluate the fences after profiling.

@Rombur
Member

Rombur commented Jan 22, 2023

> we should be fine without ever fencing explicitly. That being said, I think we should reevaluate the fences after profiling.

Why? If they are not necessary, why do you want to add them?

@masterleinad
Member Author

> Why? If they are not necessary, why do you want to add them?

I think that it makes sense to at least avoid global fences in certain situations (which we will need to do some more profiling for) and I would rather have the implementation ready for that. If the global fences don't hurt us, adding some more fences is likely also not performance-critical. Either way, I think we should make it easy for users not to forget about synchronization and offer options to be more specific if necessary. Also, I already removed the fences you were commenting on.
I would prefer to get the conversion done and consider synchronization again later.

@Rombur Rombur merged commit 069d370 into dealii:master Jan 24, 2023