
Improve compress for cuda-aware mpi #7707

Merged · 4 commits merged into dealii:master on Feb 12, 2019

Conversation

Rombur
Member

@Rombur Rombur commented Feb 8, 2019

This PR does two things:

  • Update the API of a few functions in cuda_kernel for consistency (these functions are only a couple of months old, so we are free to change the API).
  • Use the same logic as in Allow using CUDA-aware MPI #7303 (where it was used to speed up update_ghost) to speed up compress.

cc: @dsambit

@Rombur Rombur added the GPU label Feb 8, 2019
@Rombur Rombur added this to In Progress in CUDA support Feb 8, 2019
Member

@kronbichler kronbichler left a comment


This looks great from my point of view. I also like that the import indices for the GPU got a separate function.

@tamiko
Member

tamiko commented Feb 8, 2019

/rebuild

@tamiko
Member

tamiko commented Feb 8, 2019

(It is probably a moot point to let the CI run, but let's check the non-cuda scenario for successful compilation anyway.)

Member

@masterleinad masterleinad left a comment


I have a few comments below.

for (const auto &import_range : import_indices_data)
const unsigned int import_indices_plain_dev_size =
import_indices_plain_dev.size();
for (unsigned int i = 0; i < import_indices_plain_dev_size; ++i)
Member


It seems that you can still use a range-based for loop here and I expect clang-tidy to complain if it sees this loop.

{
const unsigned int import_indices_plain_dev_size =
import_indices_plain_dev.size();
for (unsigned int i = 0; i < import_indices_plain_dev_size; ++i)
Member


see above

{
const unsigned int import_indices_plain_dev_size =
import_indices_plain_dev.size();
for (unsigned int i = 0; i < import_indices_plain_dev_size; ++i)
Member


see above

{
const unsigned int import_indices_plain_dev_size =
import_indices_plain_dev.size();
for (unsigned int i = 0; i < import_indices_plain_dev_size; ++i)
Member


see above

@@ -573,9 +591,9 @@ namespace LinearAlgebra

template <typename Number>
__global__ void
add_permutated(Number * val,
add_permutated(const size_type *indices,
Member


Why doesn't this function have an IndexType template parameter? Would it make sense for consistency?

Member Author


Because it doesn't need to be templated. I only template the functions that have to be templated.

Member


OK

@masterleinad
Member

I can push the range-based loop changes here or create a PR afterward if you prefer that.

@Rombur
Member Author

Rombur commented Feb 10, 2019

I can push the range-based loop changes here or create a PR afterward if you prefer that.

That's fine, I'll do it.

@Rombur
Member Author

Rombur commented Feb 10, 2019

@masterleinad done

Member

@masterleinad masterleinad left a comment


Thanks!

@masterleinad
Member

Passes all cuda tests for me locally.

@@ -571,6 +571,13 @@ namespace Utilities
<< " elements for this partitioner.");

private:
/**
* Initialize import_indices_plain_dev from import_indices_data. This
* function is only used when CUDA-aware MPI.
Member


Suggested change
* function is only used when CUDA-aware MPI.
* function is only used when using CUDA-aware MPI.

@kronbichler kronbichler merged commit 1efdc0a into dealii:master Feb 12, 2019
@Rombur Rombur deleted the cuda_compress branch February 14, 2019 01:57
@masterleinad masterleinad moved this from In Progress to Done in CUDA support Feb 28, 2019