Improve compress for cuda-aware mpi #7707
Conversation
This looks great from my point of view. I also like that the import indices for the GPU got a separate function.
/rebuild
(It is probably a moot point to let the CI run, but let's check the non-cuda scenario for successful compilation anyway.)
I have a few comments below.
for (const auto &import_range : import_indices_data)
const unsigned int import_indices_plain_dev_size =
  import_indices_plain_dev.size();
for (unsigned int i = 0; i < import_indices_plain_dev_size; ++i)
It seems that you can still use a range-based for loop here and I expect clang-tidy
to complain if it sees this loop.
{
  const unsigned int import_indices_plain_dev_size =
    import_indices_plain_dev.size();
  for (unsigned int i = 0; i < import_indices_plain_dev_size; ++i)
see above
{
  const unsigned int import_indices_plain_dev_size =
    import_indices_plain_dev.size();
  for (unsigned int i = 0; i < import_indices_plain_dev_size; ++i)
see above
{
  const unsigned int import_indices_plain_dev_size =
    import_indices_plain_dev.size();
  for (unsigned int i = 0; i < import_indices_plain_dev_size; ++i)
see above
@@ -573,9 +591,9 @@ namespace LinearAlgebra

    template <typename Number>
    __global__ void
    add_permutated(Number * val,
    add_permutated(const size_type *indices,
Why doesn't this function have an IndexType
template parameter? Would it make sense for conformity?
Because it doesn't need to be templated. I only template the functions that have to be templated.
OK
I can push the range-based loop changes here or create a PR afterward if you prefer that.
That's fine, I'll do it.
Force-pushed 2607958 to eccbb86
@masterleinad done
Thanks!
Passes all
include/deal.II/base/partitioner.h (Outdated)
@@ -571,6 +571,13 @@ namespace Utilities
      << " elements for this partitioner.");

    private:
      /**
       * Initialize import_indices_plain_dev from import_indices_data. This
       * function is only used when CUDA-aware MPI.
Suggested change:
* function is only used when CUDA-aware MPI.
* function is only used when using CUDA-aware MPI.
Reduce the number of kernel launches in a way similar to what is done for update_ghost.
Force-pushed eccbb86 to 55026d6
This PR does two things:
- `cuda_kernel` for consistency (these functions are a couple of months old so we are free to change the API)
- reduce the number of kernel launches (similar to `update_ghost`) to speed up `compress`

cc: @dsambit