-
Notifications
You must be signed in to change notification settings - Fork 107
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
MPI version of the Nullspace tutorial
- Loading branch information
Showing
5 changed files
with
259 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
Using near null-space vectors (MPI version) | ||
------------------------------------------- | ||
|
||
Let us look at how to use the near null-space vectors in the MPI version of the | ||
solver for the elasticity problem (see :doc:`Nullspace`). The following points | ||
need to be kept in mind: | ||
|
||
- The near null-space vectors need to be partitioned (and reordered) similar to | ||
the RHS vector. | ||
- Since we are using coordinates of the discretization grid nodes for the | ||
computation of the rigid body modes, in order to be able to do this locally | ||
we need to partition the system in such a way that DOFs from a single grid | ||
node are owned by the same MPI process. In this case this means we need to do | ||
a block-wise partitioning with a :math:`3\times3` blocks. | ||
- It is more convenient to partition the coordinate matrix and then to compute | ||
the rigid body modes. | ||
|
||
The listing below shows the complete source code for the MPI elasticity solver | ||
(`tutorial/5.Nullspace/nullspace_mpi.cpp`_) | ||
|
||
.. _tutorial/5.Nullspace/nullspace_mpi.cpp: https://github.com/ddemidov/amgcl/blob/master/tutorial/5.Nullspace/nullspace_mpi.cpp | ||
.. _ParMETIS: http://glaros.dtc.umn.edu/gkhome/metis/parmetis/overview | ||
.. _PT-SCOTCH: https://www.labri.fr/perso/pelegrin/scotch/ | ||
|
||
.. literalinclude:: ../tutorial/5.Nullspace/nullspace_mpi.cpp | ||
:caption: The MPI solution of the elasticity problem | ||
:language: cpp | ||
:linenos: | ||
|
||
In lines 44--49 we split the system into approximately equal chunks of rows, | ||
while making sure the chunk sizes are divisible by 3 (the number of DOFs per | ||
grid node). This is a naive paritioning that will be improved a bit later: | ||
|
||
We read the parts of the system matrix, the RHS vector, and the grid node | ||
coordinates that belong to the current MPI process in lines 52--61. The | ||
backends for the iterative solver and the preconditioner and the solver type | ||
are declared in lines 72--82. In lines 85--86 we create the distributed | ||
version of the matrix from the local CRS arrays. After that, we are ready to | ||
partition the system using AMGCL wrapper for either ParMETIS_ or PT-SCOTCH_ | ||
libraries (lines 91--123). Note that we are reordering the coordinate matrix | ||
``coo`` in the same way the RHS vector is reordered, even though the coordinate | ||
matrix has three times less rows than the system matrix. We can do this because | ||
the coordinate matrix is stored in the row-major order, and each row of the | ||
matrix has three coordinates, which means the total number of elements in the | ||
matrix is equal to the number of elements in the RHS vector, and we can apply | ||
our block-wise partitioning to the coordinate matrix. | ||
|
||
The coordinates for the current MPI domain are converted into the rigid body | ||
modes in lines 135--136, after which we are ready to setup the solver (line | ||
140) and solve the system (line 152). Below is the output of the compiled | ||
program:: | ||
|
||
$ export OMP_NUM_THREADS=1 | ||
$ mpirun -np 4 nullspace_mpi A.bin b.bin C.bin | ||
Matrix A.bin: 81657x81657 | ||
RHS b.bin: 81657x1 | ||
Coords C.bin: 27219x3 | ||
Partitioning[ParMETIS] 4 -> 4 | ||
Type: CG | ||
Unknowns: 19965 | ||
Memory footprint: 311.95 K | ||
|
||
Number of levels: 3 | ||
Operator complexity: 1.53 | ||
Grid complexity: 1.10 | ||
|
||
level unknowns nonzeros | ||
--------------------------------- | ||
0 81657 3171111 (65.31%) [4] | ||
1 7824 1674144 (34.48%) [4] | ||
2 144 10224 ( 0.21%) [4] | ||
|
||
Iters: 104 | ||
Error: 9.26388e-09 | ||
|
||
[Nullspace: 2.833 s] (100.00%) | ||
[ self: 0.070 s] ( 2.48%) | ||
[ partition: 0.230 s] ( 8.10%) | ||
[ read: 0.009 s] ( 0.32%) | ||
[ setup: 1.081 s] ( 38.15%) | ||
[ solve: 1.443 s] ( 50.94%) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,3 +22,4 @@ GeForce GTX 1050 Ti GPU. | |
CoupCons3D | ||
Stokes | ||
Nullspace | ||
NullspaceMPI |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,16 @@ | ||
add_executable(nullspace nullspace.cpp) | ||
target_link_libraries(nullspace amgcl) | ||
|
||
if (TARGET mpi_target) | ||
add_executable(nullspace_mpi nullspace_mpi.cpp) | ||
target_link_libraries(nullspace_mpi amgcl mpi_target) | ||
|
||
if (TARGET scotch_target) | ||
target_link_libraries(nullspace_mpi scotch_target) | ||
endif() | ||
|
||
if (TARGET Metis::metis) | ||
target_link_libraries(nullspace_mpi Metis::metis) | ||
endif() | ||
|
||
endif() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,163 @@ | ||
#include <vector> | ||
#include <iostream> | ||
|
||
#include <amgcl/backend/builtin.hpp> | ||
#include <amgcl/adapter/crs_tuple.hpp> | ||
#include <amgcl/coarsening/rigid_body_modes.hpp> | ||
|
||
#include <amgcl/mpi/distributed_matrix.hpp> | ||
#include <amgcl/mpi/make_solver.hpp> | ||
#include <amgcl/mpi/amg.hpp> | ||
#include <amgcl/mpi/coarsening/smoothed_aggregation.hpp> | ||
#include <amgcl/mpi/relaxation/spai0.hpp> | ||
#include <amgcl/mpi/solver/cg.hpp> | ||
|
||
#include <amgcl/io/binary.hpp> | ||
#include <amgcl/profiler.hpp> | ||
|
||
#if defined(AMGCL_HAVE_PARMETIS) | ||
# include <amgcl/mpi/partition/parmetis.hpp> | ||
#elif defined(AMGCL_HAVE_SCOTCH) | ||
# include <amgcl/mpi/partition/ptscotch.hpp> | ||
#endif | ||
|
||
int main(int argc, char *argv[]) { | ||
// The command line should contain the matrix, the RHS, and the coordinate files: | ||
if (argc < 4) { | ||
std::cerr << "Usage: " << argv[0] << " <A.bin> <b.bin> <coo.bin>" << std::endl; | ||
return 1; | ||
} | ||
|
||
amgcl::mpi::init mpi(&argc, &argv); | ||
amgcl::mpi::communicator world(MPI_COMM_WORLD); | ||
|
||
// The profiler: | ||
amgcl::profiler<> prof("Nullspace"); | ||
|
||
// Read the system matrix, the RHS, and the coordinates: | ||
prof.tic("read"); | ||
// Get the global size of the matrix: | ||
ptrdiff_t rows = amgcl::io::crs_size<ptrdiff_t>(argv[1]); | ||
|
||
// Split the matrix into approximately equal chunks of rows, and | ||
// make sure each chunk size is divisible by 3. | ||
ptrdiff_t chunk = (rows + world.size - 1) / world.size; | ||
if (chunk % 3) chunk += 3 - chunk % 3; | ||
|
||
ptrdiff_t row_beg = std::min(rows, chunk * world.rank); | ||
ptrdiff_t row_end = std::min(rows, row_beg + chunk); | ||
chunk = row_end - row_beg; | ||
|
||
// Read our part of the system matrix, the RHS and the coordinates. | ||
std::vector<ptrdiff_t> ptr, col; | ||
std::vector<double> val, rhs, coo; | ||
amgcl::io::read_crs(argv[1], rows, ptr, col, val, row_beg, row_end); | ||
|
||
ptrdiff_t n, m; | ||
amgcl::io::read_dense(argv[2], n, m, rhs, row_beg, row_end); | ||
amgcl::precondition(n == rows && m == 1, "The RHS file has wrong dimensions"); | ||
|
||
amgcl::io::read_dense(argv[3], n, m, coo, row_beg / 3, row_end / 3); | ||
amgcl::precondition(n * 3 == rows && m == 3, "The coordinate file has wrong dimensions"); | ||
prof.toc("read"); | ||
|
||
if (world.rank == 0) { | ||
std::cout | ||
<< "Matrix " << argv[1] << ": " << rows << "x" << rows << std::endl | ||
<< "RHS " << argv[2] << ": " << rows << "x1" << std::endl | ||
<< "Coords " << argv[3] << ": " << rows / 3 << "x3" << std::endl; | ||
} | ||
|
||
// Declare the backends and the solver type | ||
typedef amgcl::backend::builtin<double> SBackend; // the solver backend | ||
typedef amgcl::backend::builtin<float> PBackend; // the preconditioner backend | ||
|
||
typedef amgcl::mpi::make_solver< | ||
amgcl::mpi::amg< | ||
PBackend, | ||
amgcl::mpi::coarsening::smoothed_aggregation<PBackend>, | ||
amgcl::mpi::relaxation::spai0<PBackend> | ||
>, | ||
amgcl::mpi::solver::cg<PBackend> | ||
> Solver; | ||
|
||
// The distributed matrix | ||
auto A = std::make_shared<amgcl::mpi::distributed_matrix<SBackend>>( | ||
world, std::tie(chunk, ptr, col, val)); | ||
|
||
// Partition the matrix, the RHS vector, and the coordinates. | ||
// If neither ParMETIS not PT-SCOTCH are not available, | ||
// just keep the current naive partitioning. | ||
#if defined(AMGCL_HAVE_PARMETIS) || defined(AMGCL_HAVE_SCOTCH) | ||
# if defined(AMGCL_HAVE_PARMETIS) | ||
typedef amgcl::mpi::partition::parmetis<SBackend> Partition; | ||
# elif defined(AMGCL_HAVE_SCOTCH) | ||
typedef amgcl::mpi::partition::ptscotch<SBackend> Partition; | ||
# endif | ||
|
||
if (world.size > 1) { | ||
auto t = prof.scoped_tic("partition"); | ||
Partition part; | ||
|
||
// part(A) returns the distributed permutation matrix. | ||
// Keep the DOFs belonging to the same grid nodes together | ||
// (use block-wise partitioning with block size 3). | ||
auto P = part(*A, 3); | ||
auto R = transpose(*P); | ||
|
||
// Reorder the matrix: | ||
A = product(*R, *product(*A, *P)); | ||
|
||
// Reorder the RHS vector and the coordinates: | ||
R->move_to_backend(); | ||
std::vector<double> new_rhs(R->loc_rows()); | ||
std::vector<double> new_coo(R->loc_rows()); | ||
amgcl::backend::spmv(1, *R, rhs, 0, new_rhs); | ||
amgcl::backend::spmv(1, *R, coo, 0, new_coo); | ||
rhs.swap(new_rhs); | ||
coo.swap(new_coo); | ||
|
||
// Update the number of the local rows | ||
// (it may have changed as a result of permutation). | ||
chunk = A->loc_rows(); | ||
} | ||
#endif | ||
|
||
// Solver parameters: | ||
Solver::params prm; | ||
prm.solver.maxiter = 500; | ||
prm.precond.coarsening.aggr.eps_strong = 0; | ||
|
||
// Convert the coordinates to the rigid body modes. | ||
// The function returns the number of near null-space vectors | ||
// (3 in 2D case, 6 in 3D case) and writes the vectors to the | ||
// std::vector<double> specified as the last argument: | ||
prm.precond.coarsening.aggr.nullspace.cols = amgcl::coarsening::rigid_body_modes( | ||
3, coo, prm.precond.coarsening.aggr.nullspace.B); | ||
|
||
// Initialize the solver with the system matrix. | ||
prof.tic("setup"); | ||
Solver solve(world, A, prm); | ||
prof.toc("setup"); | ||
|
||
// Show the mini-report on the constructed solver: | ||
if (world.rank == 0) std::cout << solve << std::endl; | ||
|
||
// Solve the system with the zero initial approximation: | ||
int iters; | ||
double error; | ||
std::vector<double> x(chunk, 0.0); | ||
|
||
prof.tic("solve"); | ||
std::tie(iters, error) = solve(*A, rhs, x); | ||
prof.toc("solve"); | ||
|
||
// Output the number of iterations, the relative error, | ||
// and the profiling data: | ||
if (world.rank == 0) { | ||
std::cout | ||
<< "Iters: " << iters << std::endl | ||
<< "Error: " << error << std::endl | ||
<< prof << std::endl; | ||
} | ||
} |