Iterative parallel tridiagonal solver #2030
Conversation
Development commits in branch feature/ipt-parallel-multigrid
Awesome! Thanks @JosephThomasParker!
Thanks @JosephThomasParker! Sorry for the tons of comments, but it's quite a large PR.

Would it be possible to write some unit tests for the individual parts? For example, we know that the refine/coarsen operations must be inverses of each other, so we should be able to test that. There's an `MpiWrapper` class that can be used to fake running on many cores, which could be useful.
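Not from the PR itself, but a minimal sketch of what such a round-trip test could look like, using GoogleTest and stand-in `refine`/`coarsen` free functions (linear-interpolation prolongation, injection restriction) in place of the PR's actual `Level` methods and the `MpiWrapper` machinery:

```cpp
#include <vector>

#include "gtest/gtest.h"

namespace {
// Stand-ins for the PR's transfer operators: refine() inserts midpoints by
// linear interpolation, coarsen() restricts by injection (every other point).
std::vector<double> refine(const std::vector<double>& coarse) {
  std::vector<double> fine;
  fine.reserve(2 * coarse.size() - 1);
  for (std::size_t i = 0; i + 1 < coarse.size(); ++i) {
    fine.push_back(coarse[i]);
    fine.push_back(0.5 * (coarse[i] + coarse[i + 1]));
  }
  fine.push_back(coarse.back());
  return fine;
}

std::vector<double> coarsen(const std::vector<double>& fine) {
  std::vector<double> coarse;
  for (std::size_t i = 0; i < fine.size(); i += 2) {
    coarse.push_back(fine[i]);
  }
  return coarse;
}
} // namespace

TEST(MultigridTransferTest, CoarsenUndoesRefine) {
  const std::vector<double> original{1.0, 4.0, 2.0, 8.0, 5.0};

  const auto roundtrip = coarsen(refine(original));

  ASSERT_EQ(roundtrip.size(), original.size());
  for (std::size_t i = 0; i < original.size(); ++i) {
    EXPECT_DOUBLE_EQ(roundtrip[i], original[i]);
  }
}
```

The same structure should carry over once the real `Level` transfer operators and a faked multi-core `MpiWrapper` are plugged in.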
Some general comments:

- Please run `git clang-format next`, there's some confusing indentation!
- Prefer the `Matrix`/`Tensor::reallocate` and `std::vector::resize` methods over the copy-assignment operators to change the size (see the sketch after this list)
- Prefer the `Matrix`/`Tensor` assignment operators over nested loops to fill them with a scalar value
- `Matrix`/`Tensor` have a constructor that takes the size, which simplifies a few places
- Try to make all local variables `const` to begin with, and then un-`const` as necessary
- Invert `Level::included` conditionals to return early
- All the functions that take a `Level&` should probably be methods on `Level`
- The source and header files need adding to `CMakeLists.txt`
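A hedged sketch of the sort of change the `reallocate`, scalar-assignment, and early-return points are asking for; only `Level::included` comes from the PR, while `update_rhs`, `rhs`, and the include path are assumptions for illustration:

```cpp
#include "utils.hxx" // BOUT++ Matrix/Tensor; include path assumed

// Hypothetical member function, just to show the suggested patterns together.
struct Level {
  bool included{false};
  Matrix<BoutReal> rhs;

  void update_rhs(int nmode, int nx) {
    // Invert the `included` conditional and return early, rather than
    // nesting the whole body inside `if (included) { ... }`.
    if (!included) {
      return;
    }

    // Resize in place with reallocate() instead of copy-assigning a
    // freshly constructed Matrix just to change the size...
    rhs.reallocate(nmode, nx);

    // ...and use the scalar assignment operator instead of nested loops
    // to fill with a single value.
    rhs = 0.0;
  }
};
```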
```cpp
if(l.included){
  for(int kz=0; kz<nmode; kz++){
    if(!converged[kz]){
      for(int ix=1; ix<3; ix++){
```
This loop is over elements 0, 1, 2, but `fine_error` has 4 elements in its first dimension. Also, I can't see where `fine_error(0, :)` is set, or `fine_error(2, :)` for the non-last processors.
Think of `fine_error`, like all the length-4 arrays, as being:

`(proc_in's first point, my first point, my last point, proc_out's first point)`

I think the reason `fine_error` is confusing is that it shouldn't really have guard cells. It's not like the solution, where we are syncing something between procs; here we're really filling in an array that previously had no values:

`|(*,a,0,*) | skipped | (*,b,0,*) |`

gets refined to

`|(*,a,0,*) | (*,0.5*(a+b),0,*) | (*,b,0,*) |`

Should probably rewrite this as a length-2 array.
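Not the PR's code, but a small self-contained illustration of that refine step, assuming the neighbouring processors' first points have already been halo-swapped into the guard slots of the length-4 layout described above:

```cpp
#include <array>

// Layout of each per-processor array, as described in the comment above:
//   [0] = proc_in's first point, [1] = my first point,
//   [2] = my last point,         [3] = proc_out's first point
using GuardArray = std::array<double, 4>;

// For a processor that was skipped on the coarse level, the refined value of
// its first point is interpolated from the neighbouring processors' points:
//   |(*,a,0,*)| skipped |(*,b,0,*)|  ->  |(*,a,0,*)| (*,0.5*(a+b),0,*)| (*,b,0,*)|
double refined_first_point(const GuardArray& fine_error) {
  const double a = fine_error[0]; // proc_in's first point
  const double b = fine_error[3]; // proc_out's first point
  return 0.5 * (a + b);
}
```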
```cpp
  l.residual(2,kz) = l.rr(2,kz) - l.ar(jy,2,kz)*l.xloc(1,kz) - l.br(jy,2,kz)*l.xloc(2,kz) - l.cr(jy,2,kz)*l.xloc(3,kz) ;
}
else{
  l.residual(2,kz) = l.rr(2,kz) - l.ar(jy,2,kz)*l.xloc(0,kz) - l.br(jy,2,kz)*l.xloc(2,kz) - l.cr(jy,2,kz)*l.xloc(3,kz) ;
```
Worth making this a method on `Level`, and passing in the indices?
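A hedged sketch of what that could look like; the member names come from the lines quoted above, but the method name, element type, and include path are assumptions:

```cpp
#include "utils.hxx" // BOUT++ Matrix/Tensor; include path assumed

// Hypothetical stub of Level, just to show the shape of the suggested method.
struct Level {
  Matrix<BoutReal> residual, rr, xloc;
  Tensor<BoutReal> ar, br, cr;

  // The two branches above differ only in which xloc column multiplies ar,
  // so that index is passed in and the if/else collapses into two calls.
  void update_residual(int jy, int kz, int lower) {
    residual(2, kz) = rr(2, kz) - ar(jy, 2, kz) * xloc(lower, kz)
                      - br(jy, 2, kz) * xloc(2, kz)
                      - cr(jy, 2, kz) * xloc(3, kz);
  }
};
```

The call sites would then read `l.update_residual(jy, kz, 1)` and `l.update_residual(jy, kz, 0)`.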
Thanks @ZedThree for the detailed comments! Really appreciated!
Nice speed-up in reconstructing the solution
To remove more, would need to make sure (2,kz) is only used on last proc
- Use `Array/Matrix/Tensor::reallocate` to set size of container
- Use `1 << n` instead of `pow(2, n)` for integral types
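For the second point, a quick illustration (variable names are hypothetical); `pow` returns a floating-point value, so using it for integer powers of two needs a cast back, while a shift keeps everything integral:

```cpp
// instead of: const int stride = static_cast<int>(pow(2, level));
const int stride = 1 << level;
```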
@JosephThomasParker Please could you look over the changes in #2040?
…ctors" This reverts commit 0474ac6.
This allows skipping work in cvode when we call the RHS many times
…x+allow-zero-loops Update IPT feature branch with code used in study
This adds an iterative parallel tridiagonal solver for Laplacian inversion, called as `type=ipt`. This is a hybrid approach that uses direct solves with the Thomas algorithm on each processor, and multigrid between processors.

Given a processor's subdomain, we can write the solution `u` as

`u = u_0 + u_lower . alpha + u_upper . beta`

where `u_0`, `alpha` and `beta` are vectors output from the Thomas algorithm, and `u_lower` and `u_upper` are the values in the processor's guard cells. That is, given the guard cell values, we can construct the full solution. To find the guard cell values, we solve using multigrid on a reduced grid containing only guard cells.

The algorithm scales very favourably, as most of the work is in the local Thomas algorithm, which scales like `O(number of x grid points / number of processors)`, while almost all the communication is local halo swaps in multigrid.
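Not part of the PR description, but a minimal sketch of the per-processor reconstruction step under that decomposition, using plain `std::vector`s and hypothetical names in place of the solver's actual data structures:

```cpp
#include <cstddef>
#include <vector>

// Once the multigrid solve on the reduced (guard-cell-only) grid has produced
// this processor's guard values u_lower and u_upper, the full local solution
// is a linear combination of the three vectors the Thomas algorithm already
// produced: u = u_0 + u_lower * alpha + u_upper * beta.
std::vector<double> reconstruct_local_solution(const std::vector<double>& u0,
                                               const std::vector<double>& alpha,
                                               const std::vector<double>& beta,
                                               double u_lower, double u_upper) {
  std::vector<double> u(u0.size());
  for (std::size_t ix = 0; ix < u0.size(); ++ix) {
    u[ix] = u0[ix] + u_lower * alpha[ix] + u_upper * beta[ix];
  }
  return u;
}
```

This per-processor loop is the `O(number of x grid points / number of processors)` work referred to above.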
, while almost all the communication is local halo swaps in multigrid.I can't find a reference for this algorithm in the literature. It is somewhat similar to [1], but that uses a direct solve rather than multigrid after reducing the system. I have pdf notes, but it would be good to get an idea of how best to document this in BOUT++'s infrastructure.
[1] T. M. Austin, M. Berndt, and J. D. Moulton, "A Memory Efficient Parallel Tridiagonal Solver", https://pdfs.semanticscholar.org/3a05/eb3923980416762ff444d68ca49d2479834c.pdf