MPI version of the Nullspace tutorial
ddemidov committed Dec 7, 2020
1 parent 9be6a71 commit afe27b1
Showing 5 changed files with 259 additions and 1 deletion.
81 changes: 81 additions & 0 deletions docs/NullspaceMPI.rst
@@ -0,0 +1,81 @@
Using near null-space vectors (MPI version)
-------------------------------------------

Let us look at how to use the near null-space vectors in the MPI version of the
solver for the elasticity problem (see :doc:`Nullspace`). The following points
need to be kept in mind:

- The near null-space vectors need to be partitioned (and reordered) in the
  same way as the RHS vector.
- Since we are using the coordinates of the discretization grid nodes to
  compute the rigid body modes, and we want to do this locally, we need to
  partition the system in such a way that all DOFs of a single grid node are
  owned by the same MPI process. Here this means using a block-wise
  partitioning with :math:`3\times 3` blocks.
- It is more convenient to partition the coordinate matrix first and only
  then compute the rigid body modes (see the sketch after this list).
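
For reference, a grid node at coordinates :math:`(x, y, z)` contributes the
following :math:`3\times 6` block (three translations followed by three
rotations) to the near null-space matrix; this is the standard textbook form,
which ``amgcl::coarsening::rigid_body_modes()`` may additionally normalize:

.. math::

   B_{\text{node}} =
   \begin{pmatrix}
   1 & 0 & 0 &  0 &  z & -y \\
   0 & 1 & 0 & -z &  0 &  x \\
   0 & 0 & 1 &  y & -x &  0
   \end{pmatrix}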

The listing below shows the complete source code for the MPI elasticity solver
(`tutorial/5.Nullspace/nullspace_mpi.cpp`_).

.. _tutorial/5.Nullspace/nullspace_mpi.cpp: https://github.com/ddemidov/amgcl/blob/master/tutorial/5.Nullspace/nullspace_mpi.cpp
.. _ParMETIS: http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview
.. _PT-SCOTCH: https://www.labri.fr/perso/pelegrin/scotch/

.. literalinclude:: ../tutorial/5.Nullspace/nullspace_mpi.cpp
   :caption: The MPI solution of the elasticity problem
   :language: cpp
   :linenos:

In lines 44--49 we split the system into approximately equal chunks of rows,
while making sure the chunk sizes are divisible by 3 (the number of DOFs per
grid node). This is a naive partitioning that will be improved a bit later.
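
For example, for the :math:`81657\times 81657` matrix from the output below
and 4 MPI processes, the chunk size is :math:`\lceil 81657/4 \rceil = 20415`,
which happens to be divisible by 3 already; ranks 0--2 then own 20415 rows
each, and rank 3 owns the remaining :math:`81657 - 3 \cdot 20415 = 20412`
rows.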

We read the parts of the system matrix, the RHS vector, and the grid node
coordinates that belong to the current MPI process in lines 52--61. The
backends for the iterative solver and the preconditioner, and the solver
type, are declared in lines 72--82. In lines 85--86 we create the distributed
version of the matrix from the local CRS arrays. After that, we are ready to
partition the system using the AMGCL wrapper for either the ParMETIS_ or the
PT-SCOTCH_ library (lines 91--123). Note that we reorder the coordinate
matrix ``coo`` in the same way as the RHS vector, even though the coordinate
matrix has three times fewer rows than the system matrix. We can do this
because the coordinate matrix is stored in row-major order and each of its
rows holds three coordinates, so the total number of its elements is equal to
the number of elements in the RHS vector, and the same block-wise DOF
permutation applies to it.
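
Concretely, grid node :math:`i` owns DOFs :math:`3i`, :math:`3i+1`, and
:math:`3i+2` of the system, while its coordinates occupy exactly the same
flat positions :math:`3i`, :math:`3i+1`, :math:`3i+2` of the row-major
``coo`` array. The block-wise partitioning keeps these three indices on one
process, so the permutation moves each node's coordinate triple intact.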

The coordinates for the current MPI domain are converted into the rigid body
modes in lines 135--136, after which we are ready to set up the solver (line
140) and solve the system (line 152). Below is the output of the compiled
program::

    $ export OMP_NUM_THREADS=1
    $ mpirun -np 4 nullspace_mpi A.bin b.bin C.bin
    Matrix A.bin: 81657x81657
    RHS b.bin: 81657x1
    Coords C.bin: 27219x3
    Partitioning[ParMETIS] 4 -> 4
    Type:             CG
    Unknowns:         19965
    Memory footprint: 311.95 K

    Number of levels:    3
    Operator complexity: 1.53
    Grid complexity:     1.10

    level     unknowns       nonzeros
    ---------------------------------
        0        81657        3171111 (65.31%) [4]
        1         7824        1674144 (34.48%) [4]
        2          144          10224 ( 0.21%) [4]

    Iters: 104
    Error: 9.26388e-09

    [Nullspace:       2.833 s] (100.00%)
    [  self:          0.070 s] (  2.48%)
    [  partition:     0.230 s] (  8.10%)
    [  read:          0.009 s] (  0.32%)
    [  setup:         1.081 s] ( 38.15%)
    [  solve:         1.443 s] ( 50.94%)
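
Note that the reported number of unknowns, 19965, is divisible by 3
(:math:`19965 = 3 \cdot 6655`), so whole grid nodes were indeed kept together
by the partitioner; it is also slightly below the naive chunk size of 20415,
presumably because ParMETIS rebalanced the domains.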
1 change: 1 addition & 0 deletions docs/tutorial.rst
@@ -22,3 +22,4 @@ GeForce GTX 1050 Ti GPU.
CoupCons3D
Stokes
Nullspace
NullspaceMPI
2 changes: 1 addition & 1 deletion tutorial/2.Serena/serena_mpi.cpp
@@ -147,7 +147,7 @@ int main(int argc, char *argv[]) {

      // and the RHS vector:
      std::vector<dvec_type> new_rhs(R->loc_rows());
-     R->move_to_backend(typename DBackend::params());
+     R->move_to_backend();
      amgcl::backend::spmv(1, *R, rhs, 0, new_rhs);
      rhs.swap(new_rhs);
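
(The explicit backend-parameters argument is dropped here: ``move_to_backend()``
apparently provides a default value for it, and the new ``nullspace_mpi.cpp``
below relies on the same default.)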

13 changes: 13 additions & 0 deletions tutorial/5.Nullspace/CMakeLists.txt
@@ -1,3 +1,16 @@
add_executable(nullspace nullspace.cpp)
target_link_libraries(nullspace amgcl)

if (TARGET mpi_target)
    add_executable(nullspace_mpi nullspace_mpi.cpp)
    target_link_libraries(nullspace_mpi amgcl mpi_target)

    if (TARGET scotch_target)
        target_link_libraries(nullspace_mpi scotch_target)
    endif()

    if (TARGET Metis::metis)
        target_link_libraries(nullspace_mpi Metis::metis)
    endif()
endif()
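
(Which of the two graph partitioners gets compiled in is controlled by the
``AMGCL_HAVE_PARMETIS`` and ``AMGCL_HAVE_SCOTCH`` macros, see the ``#if``
block at the top of ``nullspace_mpi.cpp``; with neither library available the
example keeps its naive row-based partitioning.)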
163 changes: 163 additions & 0 deletions tutorial/5.Nullspace/nullspace_mpi.cpp
@@ -0,0 +1,163 @@
#include <vector>
#include <iostream>

#include <amgcl/backend/builtin.hpp>
#include <amgcl/adapter/crs_tuple.hpp>
#include <amgcl/coarsening/rigid_body_modes.hpp>

#include <amgcl/mpi/distributed_matrix.hpp>
#include <amgcl/mpi/make_solver.hpp>
#include <amgcl/mpi/amg.hpp>
#include <amgcl/mpi/coarsening/smoothed_aggregation.hpp>
#include <amgcl/mpi/relaxation/spai0.hpp>
#include <amgcl/mpi/solver/cg.hpp>

#include <amgcl/io/binary.hpp>
#include <amgcl/profiler.hpp>

#if defined(AMGCL_HAVE_PARMETIS)
# include <amgcl/mpi/partition/parmetis.hpp>
#elif defined(AMGCL_HAVE_SCOTCH)
# include <amgcl/mpi/partition/ptscotch.hpp>
#endif

int main(int argc, char *argv[]) {
    // The command line should contain the matrix, the RHS, and the coordinate files:
    if (argc < 4) {
        std::cerr << "Usage: " << argv[0] << " <A.bin> <b.bin> <coo.bin>" << std::endl;
        return 1;
    }

    amgcl::mpi::init mpi(&argc, &argv);
    amgcl::mpi::communicator world(MPI_COMM_WORLD);

    // The profiler:
    amgcl::profiler<> prof("Nullspace");

    // Read the system matrix, the RHS, and the coordinates:
    prof.tic("read");
    // Get the global size of the matrix:
    ptrdiff_t rows = amgcl::io::crs_size<ptrdiff_t>(argv[1]);

    // Split the matrix into approximately equal chunks of rows, and
    // make sure each chunk size is divisible by 3.
    ptrdiff_t chunk = (rows + world.size - 1) / world.size;
    if (chunk % 3) chunk += 3 - chunk % 3;

    ptrdiff_t row_beg = std::min(rows, chunk * world.rank);
    ptrdiff_t row_end = std::min(rows, row_beg + chunk);
    chunk = row_end - row_beg;

    // Read our part of the system matrix, the RHS and the coordinates.
    std::vector<ptrdiff_t> ptr, col;
    std::vector<double> val, rhs, coo;
    amgcl::io::read_crs(argv[1], rows, ptr, col, val, row_beg, row_end);

    ptrdiff_t n, m;
    amgcl::io::read_dense(argv[2], n, m, rhs, row_beg, row_end);
    amgcl::precondition(n == rows && m == 1, "The RHS file has wrong dimensions");

    amgcl::io::read_dense(argv[3], n, m, coo, row_beg / 3, row_end / 3);
    amgcl::precondition(n * 3 == rows && m == 3, "The coordinate file has wrong dimensions");
    prof.toc("read");

    if (world.rank == 0) {
        std::cout
            << "Matrix " << argv[1] << ": " << rows << "x" << rows << std::endl
            << "RHS " << argv[2] << ": " << rows << "x1" << std::endl
            << "Coords " << argv[3] << ": " << rows / 3 << "x3" << std::endl;
    }

    // Declare the backends and the solver type
    typedef amgcl::backend::builtin<double> SBackend; // the solver backend
    typedef amgcl::backend::builtin<float>  PBackend; // the preconditioner backend

    typedef amgcl::mpi::make_solver<
        amgcl::mpi::amg<
            PBackend,
            amgcl::mpi::coarsening::smoothed_aggregation<PBackend>,
            amgcl::mpi::relaxation::spai0<PBackend>
            >,
        amgcl::mpi::solver::cg<SBackend>
        > Solver;

    // The distributed matrix
    auto A = std::make_shared<amgcl::mpi::distributed_matrix<SBackend>>(
            world, std::tie(chunk, ptr, col, val));

    // Partition the matrix, the RHS vector, and the coordinates.
    // If neither ParMETIS nor PT-SCOTCH is available,
    // just keep the current naive partitioning.
#if defined(AMGCL_HAVE_PARMETIS) || defined(AMGCL_HAVE_SCOTCH)
#  if defined(AMGCL_HAVE_PARMETIS)
    typedef amgcl::mpi::partition::parmetis<SBackend> Partition;
#  elif defined(AMGCL_HAVE_SCOTCH)
    typedef amgcl::mpi::partition::ptscotch<SBackend> Partition;
#  endif

    if (world.size > 1) {
        auto t = prof.scoped_tic("partition");
        Partition part;

        // part(A) returns the distributed permutation matrix.
        // Keep the DOFs belonging to the same grid nodes together
        // (use block-wise partitioning with block size 3).
        auto P = part(*A, 3);
        auto R = transpose(*P);

        // Reorder the matrix:
        A = product(*R, *product(*A, *P));

        // Reorder the RHS vector and the coordinates:
        R->move_to_backend();
        std::vector<double> new_rhs(R->loc_rows());
        std::vector<double> new_coo(R->loc_rows());
        amgcl::backend::spmv(1, *R, rhs, 0, new_rhs);
        amgcl::backend::spmv(1, *R, coo, 0, new_coo);
        rhs.swap(new_rhs);
        coo.swap(new_coo);

        // Update the number of the local rows
        // (it may have changed as a result of the permutation).
        chunk = A->loc_rows();
    }
#endif

    // Solver parameters:
    Solver::params prm;
    prm.solver.maxiter = 500;
    prm.precond.coarsening.aggr.eps_strong = 0;

    // Convert the coordinates to the rigid body modes.
    // The function returns the number of near null-space vectors
    // (3 in 2D case, 6 in 3D case) and writes the vectors to the
    // std::vector<double> specified as the last argument:
    prm.precond.coarsening.aggr.nullspace.cols = amgcl::coarsening::rigid_body_modes(
            3, coo, prm.precond.coarsening.aggr.nullspace.B);

    // Initialize the solver with the system matrix.
    prof.tic("setup");
    Solver solve(world, A, prm);
    prof.toc("setup");

    // Show the mini-report on the constructed solver:
    if (world.rank == 0) std::cout << solve << std::endl;

    // Solve the system with the zero initial approximation:
    int iters;
    double error;
    std::vector<double> x(chunk, 0.0);

    prof.tic("solve");
    std::tie(iters, error) = solve(*A, rhs, x);
    prof.toc("solve");

    // Output the number of iterations, the relative error,
    // and the profiling data:
    if (world.rank == 0) {
        std::cout
            << "Iters: " << iters << std::endl
            << "Error: " << error << std::endl
            << prof << std::endl;
    }
}
