Iterative parallel tridiagonal solver #2030

JosephThomasParker · 2020-05-11T14:51:02Z

This adds an iterative parallel tridiagonal solver for Laplacian inversion, called as type=ipt.

This is a hybrid approach that uses direct solves with the Thomas algorithm on each processor, and multigrid between processors. Given a processor's subdomain, we can write the solution u as
u = u_0 + u_lower . alpha + u_upper . beta
where u_0, alpha and beta are vectors output from the Thomas algorithm, and u_lower and u_upper are the values in the processor's guard cells. That is, given the guard cell values, we can construct the full solution. To find the guard cell values, we solve using multigrid on a reduced grid containing only guard cells.
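
The decomposition above can be sketched in a few lines of C++. This is a minimal illustration, not the code in this PR: the names `thomasSolve`, `localSolve` and `reconstruct` are hypothetical, and it assumes that alpha and beta come from solving the local system with right-hand sides that carry only the coupling to the lower and upper guard cell respectively.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Thomas algorithm for a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = r[i].
// a[0] and c[n-1] couple to points outside the system and are not used here.
Vec thomasSolve(const Vec& a, const Vec& b, const Vec& c, const Vec& r) {
  const std::size_t n = b.size();
  Vec cp(n), rp(n), x(n);
  cp[0] = c[0] / b[0];
  rp[0] = r[0] / b[0];
  for (std::size_t i = 1; i < n; ++i) {
    const double denom = b[i] - a[i] * cp[i - 1];
    cp[i] = c[i] / denom;
    rp[i] = (r[i] - a[i] * rp[i - 1]) / denom;
  }
  x[n - 1] = rp[n - 1];
  for (std::size_t i = n - 1; i-- > 0;) {
    x[i] = rp[i] - cp[i] * x[i + 1];
  }
  return x;
}

// The three local vectors; none of them depends on the guard cell values.
struct LocalSolution {
  Vec u0, alpha, beta;
};

LocalSolution localSolve(const Vec& a, const Vec& b, const Vec& c, const Vec& r) {
  const std::size_t n = b.size();
  Vec lowerRhs(n, 0.0), upperRhs(n, 0.0);
  lowerRhs[0] = -a[0];         // response to a unit value in the lower guard cell
  upperRhs[n - 1] = -c[n - 1]; // response to a unit value in the upper guard cell
  return {thomasSolve(a, b, c, r), thomasSolve(a, b, c, lowerRhs),
          thomasSolve(a, b, c, upperRhs)};
}

// u = u_0 + u_lower * alpha + u_upper * beta
Vec reconstruct(const LocalSolution& s, double uLower, double uUpper) {
  Vec u(s.u0.size());
  for (std::size_t i = 0; i < u.size(); ++i) {
    u[i] = s.u0[i] + uLower * s.alpha[i] + uUpper * s.beta[i];
  }
  return u;
}
```

Once the reduced problem has supplied `uLower` and `uUpper`, `reconstruct` needs no further solves, so the per-iteration cost on each processor is a single pass over the local points.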

The algorithm scales very favourably, as most of the work is in the local Thomas algorithm that scales like O(number of x grid points / number of processors), while almost all the communication is local halo swaps in multigrid.

I can't find a reference for this algorithm in the literature. It is somewhat similar to [1], but that uses a direct solve rather than multigrid after reducing the system. I have pdf notes, but it would be good to get an idea of how best to document this in BOUT++'s infrastructure.

[1] A Memory Efficient Parallel Tridiagonal Solver, T. M. Austin, M. Berndt, and J. D. Moulton https://pdfs.semanticscholar.org/3a05/eb3923980416762ff444d68ca49d2479834c.pdf

Development commits in branch feature/ipt-parallel-multigrid

bendudson · 2020-05-11T15:02:20Z

Awesome! Thanks @JosephThomasParker !
If possible, it would be good to convert the pdf source (LaTeX?) to .rst format, so it can be included in the online manual. This is mostly automatic, but equations often need some manual intervention.

ZedThree

Thanks @JosephThomasParker ! Sorry for the tons of comments, but it's quite a large PR.

Would it be possible to write some unit tests for the individual parts? For example, we know that the refine/coarsen operations must be inverses of each other, so we should be able to test that. There's an MpiWrapper class that can be used to fake running on many cores, which could be useful here.
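
As a rough idea of what such a test could check, here is a self-contained sketch. The `refine` and `coarsen` functions below are hypothetical stand-ins (linear interpolation and injection on a plain `std::vector`), not the PR's implementations, and no MPI is involved. With this particular pair, refining and then coarsening returns the original data, while the reverse order generally does not.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical refinement: keep every coarse point and insert the
// average of each neighbouring pair between them.
std::vector<double> refine(const std::vector<double>& coarse) {
  if (coarse.empty()) {
    return {};
  }
  std::vector<double> fine;
  fine.reserve(2 * coarse.size() - 1);
  for (std::size_t i = 0; i + 1 < coarse.size(); ++i) {
    fine.push_back(coarse[i]);
    fine.push_back(0.5 * (coarse[i] + coarse[i + 1]));
  }
  fine.push_back(coarse.back());
  return fine;
}

// Hypothetical coarsening by injection: keep every second fine point.
std::vector<double> coarsen(const std::vector<double>& fine) {
  std::vector<double> coarse;
  coarse.reserve((fine.size() + 1) / 2);
  for (std::size_t i = 0; i < fine.size(); i += 2) {
    coarse.push_back(fine[i]);
  }
  return coarse;
}
```

A unit test would then assert `coarsen(refine(x)) == x` for a few representative inputs; the comparison can be exact here because the retained points are copied, not recomputed.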

Some general comments:

- Please run git clang-format next, there's some confusing indentation!
- Prefer the Matrix/Tensor::reallocate and std::vector::resize methods over the copy-assignment operators to change the size
- Prefer the Matrix/Tensor assignment operators over nested loops to fill them with a scalar value
- Matrix/Tensor have a constructor that takes the size, which simplifies a few places
- Try to make all local variables const to begin with, and then un-const as necessary
- Invert Level::included conditionals to return early
- All the functions that take a Level& should probably be methods on Level
- The source and header files need adding to CMakeLists.txt
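
A few of these points (early return on `included`, const locals, a method on `Level` instead of a free function taking `Level&`) can be illustrated together. The `Level` struct below is a hypothetical, heavily simplified stand-in for the class in this PR, and `residualNormSquared` is an invented name used only for the example.

```cpp
#include <vector>

// Hypothetical, simplified stand-in for the solver's Level type.
struct Level {
  bool included;                // does this processor take part on this level?
  std::vector<double> residual; // local residual values

  // Sum of squared residuals; skipped processors contribute nothing.
  double residualNormSquared() const {
    if (!included) {
      return 0.0; // early return instead of wrapping the body in `if (included)`
    }
    double total = 0.0;
    for (const double r : residual) { // loop variable is const by default
      total += r * r;
    }
    return total;
  }
};
```

Returning early keeps the main body at one indentation level, which also makes it easier to see which variables can stay const.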

src/invert/laplace/impls/iterative_parallel_tri/iterative_parallel_tri.hxx

src/invert/laplace/impls/iterative_parallel_tri/iterative_parallel_tri.cxx

ZedThree · 2020-05-13T10:51:01Z

src/invert/laplace/impls/iterative_parallel_tri/iterative_parallel_tri.cxx

+  if(l.included){
+    for(int kz=0; kz<nmode; kz++){
+      if(!converged[kz]){
+	for(int ix=1; ix<3; ix++){


This loop is over elements 0, 1, 2, but fine_error has 4 elements in first dimension. Also, I can't see where fine_error(0, :) is set, or fine_error(2, :) for the non-last processors

JosephThomasParker · 2020-05-14

Think of fine_error, like all the length-4 arrays, as being:
(proc_in's first point, my first point, my last point, proc_out's first point)

I think the reason fine_error is confusing is that it shouldn't really have guard cells. It's not like the solution, where we are syncing something between procs; we're really filling in an array that previously had no values:
|(*,a,0,*) | skipped | (*,b,0,*) |
gets refined to
|(*,a,0,*) | (*,0.5*(a+b),0,*) | (*,b,0,*) |

Should probably rewrite this as a length-2 array.

ZedThree · 2020-05-13T10:54:41Z

src/invert/laplace/impls/iterative_parallel_tri/iterative_parallel_tri.cxx

+	    l.residual(2,kz) = l.rr(2,kz) - l.ar(jy,2,kz)*l.xloc(1,kz) - l.br(jy,2,kz)*l.xloc(2,kz) - l.cr(jy,2,kz)*l.xloc(3,kz)  ;
+	  }
+	  else{
+	    l.residual(2,kz) = l.rr(2,kz) - l.ar(jy,2,kz)*l.xloc(0,kz) - l.br(jy,2,kz)*l.xloc(2,kz) - l.cr(jy,2,kz)*l.xloc(3,kz)  ;


Worth making this a method on Level, and passing in the indices?
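
One possible shape for that suggestion is sketched below. It is only an illustration: `rowResidual` is a hypothetical name, the `jy` index is dropped for brevity, and `std::array` of `std::vector` stands in for the Matrix/Tensor containers used in the PR.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical, simplified Level: four rows per Fourier mode, no jy index.
struct Level {
  using Rows = std::array<std::vector<double>, 4>;
  Rows ar, br, cr, rr, xloc, residual;

  explicit Level(std::size_t nmode)
      : ar(zeros(nmode)), br(zeros(nmode)), cr(zeros(nmode)), rr(zeros(nmode)),
        xloc(zeros(nmode)), residual(zeros(nmode)) {}

  // Residual of one reduced-system row. `lower` and `upper` select which
  // xloc entries act as the row's neighbours, so the caller picks the
  // indices once instead of duplicating the whole expression per branch.
  double rowResidual(std::size_t row, std::size_t lower, std::size_t upper,
                     std::size_t kz) const {
    return rr[row][kz] - ar[row][kz] * xloc[lower][kz]
           - br[row][kz] * xloc[row][kz] - cr[row][kz] * xloc[upper][kz];
  }

private:
  static Rows zeros(std::size_t nmode) {
    Rows m;
    for (auto& row : m) {
      row.assign(nmode, 0.0);
    }
    return m;
  }
};
```

The two branches in the diff above would then reduce to a single call such as `l.residual[2][kz] = l.rowResidual(2, lower, 3, kz);` with `lower` set to 1 or 0 by whatever condition selects the branch.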

JosephThomasParker · 2020-05-13T18:37:40Z

Thanks @ZedThree for the detailed comments! Really appreciated!

ZedThree · 2020-06-08T12:37:11Z

@JosephThomasParker Please could you look over the changes in #2040 ?

Add iterative parallel tridiagonal solver

22e7937

Development commits in branch feature/ipt-parallel-multigrid

JosephThomasParker added the feature label May 11, 2020

ZedThree reviewed May 13, 2020

JosephThomasParker added 25 commits May 13, 2020 19:47

Run clang format

1b94d58

Use helper function for power of 2

f65c769

Tidy array declarations and assignments

d988744

Remove unused code for using approximate solution as initial guess

7e33659

Use all_of in custom all(array) routine

89385cc

Delete unused any, max, and maxloc functions

93e18e1

Move solve to header

86f9812

Use std ceil and log

11b92a6

Preincrement

6a2e352

Add transpose function

c5fcc35

Return when proc is not included

2502b37

Remove unused variables

4079e64

Replace divides with multiplication by one-over

ce17f23

Reorder lower/upperGuard cell arrays

919ee45

Nice speed-up in reconstructing the solution

Improve comment on Nyquist frequency in transpose

80e58a7

Tidy logic of black and red procs

29fafc8

Use default destructor

b0e0a16

Improve comment in header

758ff4d

Take kz=0 case out of kz loop

97b8659

Change first_call from matrix to array (now kz is bundled)

567c16d

Make Delta a const

388ed3f

Remove some branching from residual calculation

6bf7a0d

To remove more, would need to make sure (2,kz) is only used on last proc

Dubious start on using level as a class

2ce3ab1

Make level a class, and pass LaplaceIPT this as argument

d239e1e

Make jy a member of LaplaceIPT so no need to pass to functions

eca6846

ZedThree and others added 17 commits May 27, 2020 17:26

Make LaplaceIPT::Level::init methods constructors

d71df89

- Use Array/Matrix/Tensor::reallocate to set size of container - Use `1 << n` instead of `pow(2, n)` for integral types

Use fmt formatting in error message

a191e20

Avoid needless assignment

e34b259

Make some methods and local variables const

6a981b2

Remove unneeded else statement

fec39e8

Remove function declaration for deleted function

b82a27f

Apply clang-format

cececb7

Fix some pass by const-ref issues

4944b4d

Add helper function for testing Laplacian flags

cc122e1

Use ternaries to make local variable const

df9ddb0

Use helper func to communicate reduced coefficients in Level ctors

0474ac6

Suggest increase rtol or atol in convergence error message

943d164

Delete some commented out code

5ed6ff6

Make flag helper methods const

323153c

Improve defaults, and prevent calling MG on 1 core

db03e2d

Do not calculate diagonal dominance on skipped procs

7345339

Commit forgotten header

3e8b1cc

JosephThomasParker and others added 5 commits June 12, 2020 17:20

Merge branch 'ipt_polish' into feature/ipt+next

d6145fa

Combine rtol and atol into single error measure

0f905ed

Revert "Use helper func to communicate reduced coefficients in Level …

c6df3ae

…ctors" This reverts commit 0474ac6.

Test for convergence before start of while loop

276f777

This allows skipping work in cvode when we call the RHS many times

Merge pull request #2140 from boutproject/feature/ipt+new-error+bugfi…

faaad42

…x+allow-zero-loops Update IPT feature branch with code used in study

bendudson previously approved these changes Dec 14, 2020

bendudson dismissed their stale review via 1fedab3 December 28, 2020 11:44

bendudson approved these changes Dec 28, 2020

Merge branch 'next' into feature/ipt+next

1fedab3

bendudson merged commit a59648a into next Dec 28, 2020

bendudson deleted the feature/ipt+next branch December 28, 2020 12:26

ZedThree mentioned this pull request Jan 4, 2021

test-invpar timing out #2185

Closed
