Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterative parallel tridiagonal solver #2030

Merged
merged 78 commits into from
Dec 28, 2020
Merged

Iterative parallel tridiagonal solver #2030

merged 78 commits into from
Dec 28, 2020

Conversation

JosephThomasParker
Copy link
Contributor

This adds an iterative parallel tridiagonal solver for Laplacian inversion, called as type=ipt.

This is a hybrid approach that uses direct solves with the Thomas algorithm on each processor, and multigrid between processors. Given a processor's subdomain, we can write the solution u as
u = u_0 + u_lower . alpha + u_upper . beta
where u_0, alpha and beta are vectors output from the Thomas algorithm, and u_lower and u_upper are the values in the processor's guard cells. That is, given guard cells values, we can construct the full solution. To find the guard cell values, we solve using multigrid on a reduced grid containing only guard cells.

The algorithm scales very favourably, as most of the work is in the local Thomas algorithm that scales like O(number of x grid points / number of processors), while almost all the communication is local halo swaps in multigrid.

I can't find a reference for this algorithm in the literature. It is somewhat similar to [1], but that uses a direct solve rather than multigrid after reducing the system. I have pdf notes, but it would be good to get an idea of how best to document this in BOUT++'s infrastructure.

[1] A Memory Efficient Parallel Tridiagonal Solver, T. M. Austin, M. Berndt, and J. D. Moulton https://pdfs.semanticscholar.org/3a05/eb3923980416762ff444d68ca49d2479834c.pdf

Development commits in branch feature/ipt-parallel-multigrid
@bendudson
Copy link
Contributor

Awesome! Thanks @JosephThomasParker !
If possible, it would be good to convert the pdf source (LaTeX?) to .rst format, so it can be included in the online manual. This is mostly automatic, but equations often need some manual intervention.

Copy link
Member

@ZedThree ZedThree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JosephThomasParker ! Sorry for the tons of comments, but it's quite a large PR.

Would it be possible to write some unit tests for the individual parts? For example, we know that the refine/coarsen operations must be inverses of each other, we should be able to test that. There's an MpiWrapper class that can be used to fake running on many cores that could be useful.

Some general comments:

  • Please run git clang-format next, there's some confusing indentation!
  • Prefer the Matrix/Tensor::reallocate and std::vector::resize methods over the copy-assignment operators to change the size
  • Prefer the Matrix/Tensor assignment operators over nested loops to fill them with a scalar value
  • Matrix/Tensor have a constructor that takes the size, which simplifies a few places
  • Try to make all local variables const to begin with, and then un-const as necessary
  • Invert Level::included conditionals to return early
  • All the functions that take a Level& should probably be methods on Level
  • The source and header files need adding to CMakeLists.txt

if(l.included){
for(int kz=0; kz<nmode; kz++){
if(!converged[kz]){
for(int ix=1; ix<3; ix++){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop is over elements 0, 1, 2, but fine_error has 4 elements in first dimension. Also, I can't see where fine_error(0, :) is set, or fine_error(2, :) for the non-last processors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think of fine_error, like all the length 4 arrays as being:
(proc_in's first point, my first point, my last point, proc_out's first point)

I think the reason fine_error is confusing is that it shouldn't really have guard cells. It's not like with the solution where we are syncing something between procs; we're really filling in an array that previous had no values:
|(*,a,0,*) | skipped | (*,b,0,*) |
gets refined to
|(*,a,0,*) | (*,0.5*(a+b),0,*) | (*,b,0,*) |

Should probably rewrite this as a 2 array.

l.residual(2,kz) = l.rr(2,kz) - l.ar(jy,2,kz)*l.xloc(1,kz) - l.br(jy,2,kz)*l.xloc(2,kz) - l.cr(jy,2,kz)*l.xloc(3,kz) ;
}
else{
l.residual(2,kz) = l.rr(2,kz) - l.ar(jy,2,kz)*l.xloc(0,kz) - l.br(jy,2,kz)*l.xloc(2,kz) - l.cr(jy,2,kz)*l.xloc(3,kz) ;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth making this a method on Level, and passing in the indices?

@JosephThomasParker
Copy link
Contributor Author

Thanks @ZedThree for the detailed comments! Really appreciated!

@ZedThree
Copy link
Member

ZedThree commented Jun 8, 2020

@JosephThomasParker Please could you look over the changes in #2040 ?

bendudson
bendudson previously approved these changes Dec 14, 2020
@bendudson bendudson merged commit a59648a into next Dec 28, 2020
@bendudson bendudson deleted the feature/ipt+next branch December 28, 2020 12:26
@ZedThree ZedThree mentioned this pull request Jan 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants