Algebraic multigrid #1803

Merged
merged 258 commits into next from algebraic_multigrid
Mar 18, 2020

Conversation

cmacmackin
Collaborator

This is my work on the algebraic multigrid solver for the fully-3D Laplace inversion. It is currently passing all unit tests, but these only cover simple Cartesian metrics on a single processor. I am awaiting integration tests to try it in more complex scenarios. There is a little bit of simple optimisation still required prior to merging, and possibly some tidying up here and there.

There is also an implementation of an AMG solver for the 2D Laplace inversion, written by a previous developer. I have not been involved with that code and cannot vouch for its behaviour.

I have implemented a new interface to PETSc (see issue #1771) which allows vectors and matrices to be set using the local BOUT++ index objects for that processor. The wrapper automatically handles conversion to global indexing for PETSc. It also handles the interpolation weights needed when calculating along-field values in the y-direction. It has been extensively unit tested. The unit tests for the index conversion required some modest refactoring of the Mesh class: making some methods virtual which weren't previously, and adding methods which wrap some MPI calls (so that they can be overridden for testing purposes). This allows the parallel behaviour of the indexing to be tested while running the unit tests in serial.
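For illustration, here is a rough sketch of the kind of usage this enables. The constructor arguments, element-access syntax, and assemble() calls below are assumptions for the sketch rather than the exact API in include/bout/petsc_interface.hxx; the point is just that elements are addressed with local BOUT++ indices and the wrapper converts to PETSc's global numbering internally.

#include <bout/petsc_interface.hxx>

// Sketch only: the 'Indexer' parameter and the exact element-access and
// assemble() calls are illustrative assumptions, not the real API.
template <class Indexer>
void fillIdentity(const Field3D& f, const Indexer& indexer) {
  PetscVector<Field3D> rhs(f, indexer);  // vector over this processor's local points
  PetscMatrix<Field3D> op(f, indexer);   // matrix with rows indexed the same way

  BOUT_FOR_SERIAL(i, f.getRegion("RGN_NOBNDRY")) {
    rhs(i) = f[i];   // set using the local index; converted to PETSc's global index internally
    op(i, i) = 1.0;  // diagonal entry of the identity, addressed with local indices
  }
  rhs.assemble();    // finalise PETSc assembly once all elements are set
  op.assemble();
}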

The way in which @johnomotani changed D2DXDY made my unit tests of the Laplace solver fail, as the forward version of the Laplace operator I had defined could no longer work. See my email to John below:

Hi John,

I got my Laplace solver to the point where it was passing all of my unit tests. I then decided to merge in the next branch. I've gotten everything compiling and unit tests are all passing, except the ones for the Laplace solver. These now all fail when I apply my forward Laplace operator due to NaNs occurring while taking D2DXDY (called from Coordinates::Laplace). This is a result of some changes which git-blame shows you made to how boundaries are handled. Previously the routine was

const Field3D D2DXDY(const Field3D &f, CELL_LOC outloc, const std::string &method, REGION region) {
  Field3D dfdy = DDY(f, outloc, method, RGN_NOY);
  f.getMesh()->communicate(dfdy);
  return DDX(dfdy, outloc, method, region);
}

It is now

Field3D D2DXDY(const Field3D& f, CELL_LOC outloc, const std::string& method,
               const std::string& region, const std::string& dfdy_boundary_condition) {

  // If staggering in x, take y-derivative at f's location.
  const auto y_location =
      (outloc == CELL_XLOW or f.getLocation() == CELL_XLOW) ? CELL_DEFAULT : outloc;

  Field3D dfdy = DDY(f, y_location, method, region);

  // Set x-guard cells and x-boundary cells before calculating DDX
  f.getMesh()->communicate(dfdy);
  dfdy.applyBoundary(dfdy_boundary_condition);

  return DDX(dfdy, outloc, method, region);
}

Whereas previously the y-derivative was computed in the region of the x-guards, it no longer is and instead it must be extrapolated from the interior cells. What is the reasoning for this choice? To me it seems like it would be less accurate. It would certainly mean a different result when applying the forward D2DXDY operator than when doing the same with a differentiation matrix.

In addition to the intrinsic numerical differences between these two approaches, the change causes problems for my unit tests. I am testing on a field with NX=3, meaning that you can't really do a decent extrapolation from only one interior point. Furthermore, the FakeMesh object doesn't properly initialise itself with boundaries, so the FakeMesh::getBoundaries() method returns an empty vector and Field3D::applyBoundaries() doesn't actually do anything to set the boundaries of a field anyway.

I'm not really sure what the best approach would be to fixing my unit tests. Any ideas? Also, let me know if it would be better for me to post this issue on GitHub and I can copy it over.

Thanks,
Chris

He replied

Hi Chris,

The reason for the change is that the values in the corner guard cells are not well defined.

As you pointed out at our last meeting, they are populated by 'communicate', but only by out-of-date values (unless we changed to do the x-communication before y-communication).

As for the corner boundary cells, I'm surprised that they weren't 'NaN' already (at least if you ran with CHECK=3), because I don't think the boundary conditions set the corner values, although they might be set when a field is initialised (e.g. from an expression in the input file). The problem is that we have not had to make a choice before about how boundary conditions are defined at the corner cells - the x- and y-boundaries are just treated independently - and I don't see an obviously correct choice of what to do.

The above two issues are why I took the corner cells to be undefined, and therefore communicate and apply a boundary condition to 'dfdy' before calculating 'DDX(dfdy)'. In principle, the x-boundary condition on 'dfdy' should be consistent with the x-boundary condition defined for 'f', but I couldn't see a straightforward way of doing that generally, hence the use of extrapolating boundary conditions, which normally do something that's not violently wrong.

There's a work-around if you were using 'D2DXDY' directly, namely passing something like "neumann" (which can be applied with only one grid point, i.e. NX=3) for the 'dfdy_boundary_condition' argument, but at the moment we've not implemented anything to pass that argument through the 'Laplace' or 'Laplace_perp' operators. That argument might be a good thing to have, although it makes the signature of Laplace/Laplace_perp more complicated. If generalising this argument is a solution for you, it would make sense to raise an issue on github to see whether anyone has any objections - it should be simple to implement, as D2DXDY is only used in a couple of places.

The issue of FakeMesh not actually applying boundary conditions doesn't have a simple answer that I can see... Peter Hill would be the best person to ask about the unit testing framework; I'd suggest dropping him a message on Slack.

Cheers,
John

Ultimately, the resolution I settled on was to add a dfdy_boundary_condition argument to the Laplace routines. I also added an argument to D2DXDY called dfdy_region, defining which region to take DDY over. It defaults to an empty string, which indicates that the region should be the same as for the D2DXDY operation as a whole. I added a corresponding dfdy_dy_region argument to the Laplace routines, which can then be passed through to D2DXDY.

None of this changes any default behaviour. However, it allows me to call the Laplace operator in my tests as Laplace_perp(f, CELL_DEFAULT, "free", "RGN_NOY"), which gives me the behaviour I need.
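For concreteness, this is roughly what the extended D2DXDY looks like with the extra argument. It is a sketch based on the version quoted above; the fallback logic for an empty dfdy_region follows the description here, and no default argument values are shown since those are implementation details.

Field3D D2DXDY(const Field3D& f, CELL_LOC outloc, const std::string& method,
               const std::string& region, const std::string& dfdy_boundary_condition,
               const std::string& dfdy_region) {
  // If staggering in x, take y-derivative at f's location (as before).
  const auto y_location =
      (outloc == CELL_XLOW or f.getLocation() == CELL_XLOW) ? CELL_DEFAULT : outloc;

  // An empty dfdy_region means "use the same region as the overall operation".
  const std::string& y_region = dfdy_region.empty() ? region : dfdy_region;

  Field3D dfdy = DDY(f, y_location, method, y_region);

  // Set x-guard and x-boundary cells before taking DDX.
  f.getMesh()->communicate(dfdy);
  dfdy.applyBoundary(dfdy_boundary_condition);

  return DDX(dfdy, outloc, method, region);
}

The Laplace and Laplace_perp routines then simply forward their dfdy_boundary_condition and dfdy_dy_region arguments into this call.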

I am of course willing to reconsider any of these changes to the codebase if people object to them.

johnomotani and others added 30 commits March 16, 2018 02:37
Added a new MMS test for 2d Laplacian solvers, but on a 3d
circular-cross-section-tokamak grid.
Copied from test-multigrid_laplace with laplace:type changed.
Major change is to correct use of 'x' as poloidal flux (starting from
inner edge of grid) vs. [0,1].

Fix calculation of dx/dy/dz. Were previously, incorrectly, using Lx as
a box length in all directions. Should be grid width of psiwidth in
x-direction and 2pi in y- and z-directions.

Add psiN0 argument of SimpleTokamak to choose where in minor radius
to start the grid.
This is currently not working in mms_alternate.py because the symbolic
integration to calculate zShift fails.
Useful to reduce size of z-direction
Makes grid closer to square, easier for iterative Laplace solvers.
This option may help to normalize matrix coefficients in iterative
Laplacian solvers, aiding convergence (at least in simple tests).
Normalize length scales to make gradients closer to order unity.
Change to make q and shear small (q=.5+.1*psiN instead of q=2+psiN**2).

The test now passes when using the iterative multigrid solver.
Was applying petscamg matrix and BOUT++ operators to a field which did
not have a Dirichlet boundary correctly set numerically. This was
causing a discrepancy in the last grid cell.
Can change this to change the size of the radial domain
tests/MMS/laplace3d/runtest now (i) takes command line argument to
specify the maximum number of processors to use and (ii) saves plots as
pdf instead of showing interactively.
If Christoffel symbols and G1/G2/G3 are present in the input (grid file
or [mesh] section of BOUT.inp) then use these instead of calculating
from the metric with finite differences.
@ZedThree
Member

Assuming the Travis backlog clears overnight, the last two split PRs will go into next tomorrow, then I'll merge next in here and fix the conflicts.

@johnomotani I can't see mms_alternate.py being used anywhere. Can it go, or should it replace the existing mms.py?

@johnomotani
Contributor

👍 @ZedThree I think mms_alternate.py can go. The (old, partially implemented) test I was using it in has gone. It was originally intended as an upgrade/replacement of mms.py, but it's not polished/tested enough to be worth worrying about. A utility to provide analytic inputs with tokamak single/double-null topology for MMS testing would be useful, but probably better to start that from scratch.

* next: (254 commits)
  Return current position from MsgStack::push for OpenMP fix
  Remove unused macro in msg_stack header
  Temporary fix for MsgStack thread safety: only one thread
  Fix typo in example/performance/communications
  Make local variables const and reduce scope in XZ interpolations
  Add region arguments for methods of Interpolation classes
  Make build_and_log build the test for CMake
  Pass method and region arguments to D2DXDY in Laplace(Field2D)
  Simplify boundary loops in LaplaceXY::precon()
  Copy parallel slices in Field3D copy ctor
  Add Field3D move assignment operator
  Fix infinite recursion in identity transform calcParallelSlices
  fix for running unit-test as root
  mention that *_ROOT is optional
  Don't increase iterations before GMRES restart in test-laplacexy*
  Fix indexing in LaplaceXY preconditioner
  Better comments in LaplaceXY::solveFiniteDifference()
  Keep corner boundary cells from solution for finite-difference LaplaceXY
  Correct comments in finite-difference LaplaceXY preallocation
  Remove communications of corner cells in LaplaceXY
  ...
@ZedThree
Member

ZedThree commented Mar 12, 2020

@johnomotani Dealing with the conflicts went pretty smoothly -- just keep the version from next! However, test-laplacexy is quite different between next and this PR -- which one to keep?

EDIT: I can reproduce the failing test on my machine now, so hopefully tomorrow I'll get a bit closer to working out what's causing it

@johnomotani
Contributor

test-laplacexy is quite different between next and this PR -- which one to keep?

Keep the one from next. I think this branch had a very rough version from ages ago. The one from #1789 should be better.

@ZedThree
Member

Thanks to the magic of ASan, I think I've found the issue:

WRITE of size 8 at 0x61b000027d78 thread T4
    #0 0x7f515d657d5b in MatDestroy /home/peter/Tools/petsc-git/src/mat/interface/matrix.c:1264
    #1 0xa22b6f in PetscMatrix<Field3D>::MatrixDeleter::operator()(_p_Mat**) const ../include/bout/petsc_interface.hxx:507
    #2 0xa337be in std::_Sp_counted_deleter<_p_Mat**, PetscMatrix<Field3D>::MatrixDeleter, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /usr/include/c++/9/bits/shared_ptr_base.h:471
...
    #6 0xa131f6 in std::shared_ptr<_p_Mat*>::operator=(std::shared_ptr<_p_Mat*> const&) /usr/include/c++/9/bits/shared_ptr.h:103
    #7 0xa132e3 in PetscMatrix<Field3D>::ynext(int) ../include/bout/petsc_interface.hxx:742
    #8 0xa0e1e0 in PetscMatrix<Field3D>::ydown(int) ../include/bout/petsc_interface.hxx:735
    #9 0xa023a8 in LaplacePetsc3dAmg::updateMatrix3D() [clone ._omp_fn.0] ../src/invert/laplace/impls/petsc3damg/petsc3damg.cxx:352
...

This is essentially because PETSc is not thread safe. A minimal fix is:

@@ -275,7 +279,7 @@ void LaplacePetsc3dAmg::updateMatrix3D() {
   
   // Set up the matrix for the internal points on the grid.
   // Boundary conditions were set in the constructor.
-  BOUT_FOR(l, indexer->getRegionNobndry()) {
+  BOUT_FOR_SERIAL(l, indexer->getRegionNobndry()) {
     // Index is called l for "location". It is not called i so as to
     // avoid confusing it with the x-index.
 
@@ -352,7 +356,7 @@ void LaplacePetsc3dAmg::updateMatrix3D() {
 
   // Must add these (rather than assign) so that elements used in
   // interpolation don't overwrite each other.
-  BOUT_FOR(l, indexer->getRegionNobndry()) {
+  BOUT_FOR_SERIAL(l, indexer->getRegionNobndry()) {
     BoutReal C_df_dy = (coords->G2[l] - dJ_dy[l]/coords->J[l]);
     if (issetD) {
       C_df_dy *= D[l];

but really, we should be using BOUT_FOR_SERIAL for all loops that interact with PETSc. Or possibly go further and not allow compiling with both OpenMP and PETSc?

@johnomotani @cmacmackin How important is OpenMP to the performance here?


There's a second issue here: PetscMatrix::ynext returns a new PetscMatrix. This is more expensive than necessary (there's a shared_ptr copy on top of the shared_ptr ctor/dtor and the PetscLib ctor/dtor). It might be better to follow how Field3D does it and keep vector<PetscMatrix> members yup_matrices, ydown_matrices
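Roughly something like this sketch (illustrative only; the member names and the splitParallelSlices()-style call that populates them are made up here, not the real petsc_interface.hxx code):

#include <cstdlib>
#include <vector>

// Sketch: cache the parallel-slice matrices as members and hand out
// references, in the spirit of Field3D::yup/ydown.
template <class T>
class PetscMatrix {
public:
  PetscMatrix& ynext(int dir) {
    if (dir == 0) {
      return *this;  // an offset of zero is just this matrix
    }
    // dir > 0 selects the yup slices, dir < 0 the ydown slices
    auto& slices = (dir > 0) ? yup_matrices : ydown_matrices;
    return slices[std::abs(dir) - 1];
  }
  PetscMatrix& yup(int index = 0) { return ynext(index + 1); }
  PetscMatrix& ydown(int index = 0) { return ynext(-(index + 1)); }

private:
  // Populated once (e.g. by an explicit splitParallelSlices()-style call) and
  // then reused, avoiding repeated shared_ptr and PetscLib ctor/dtor costs.
  std::vector<PetscMatrix> yup_matrices, ydown_matrices;
};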

@johnomotani
Contributor

@ZedThree we haven't tested performance at all yet, but as PETSc isn't thread-safe I would not expect to be using OpenMP with this solver. I think the aim was just to make the BOUT++ part thread-safe, and then wrap the PETSc calls in BOUT_OMP(critical); I'd hope this would make it easier to convert the whole interface to (for example) Hypre, which is (or can be?) thread-safe.

Would it be enough to just put a BOUT_OMP(critical) around the call to MatDestroy?
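i.e. something like this sketch (the deleter body here is assumed for illustration; the real one is in include/bout/petsc_interface.hxx):

// Hypothetical illustration of wrapping the PETSc call in a critical section.
// BOUT_OMP(critical) expands to the corresponding OpenMP pragma when OpenMP is
// enabled, so only one thread at a time would enter MatDestroy.
struct MatrixDeleter {
  void operator()(Mat* mat) const {
    BOUT_OMP(critical)
    {
      MatDestroy(mat);  // only one thread at a time may call into PETSc here
    }
  }
};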

@ZedThree
Member

I think even accesses to Mats/Vecs aren't thread-safe, so there should be no calls to PETSc in any OMP regions. It seems to be mostly fine, aside from a few cases like this, but it's likely just setting up a footgun for ourselves.

The tests are passing now so I'm going to go read through the changes again, apply some polish where needed, and then it's ready to merge.

@ZedThree
Member

Switching PetscMatrix::yup/ydown to return references to members of a vector, similar to how Field3D::yup/ydown work, saves about 30% of the runtime of the loops in LaplacePetsc3dAmg::updateMatrix3D.

However, it then requires an explicit call to PetscMatrix::splitParallelSlices, which I'm less sure about. I'll see if I can magic it away somehow.

@ZedThree ZedThree left a comment
Member

I've got a bunch of changes for this, but they're stacking up. It might be best to just merge this now, and I'll open separate PRs for my changes. @johnomotani thoughts?

g_13 = 0.
g_23 = Bt*hthe*Rxy/Bp

\int_theta0^theta{nu dtheta}
Member

Should this be a comment?

g_13 = 0.
g_23 = Bt*hthe*Rxy/Bp

\int_theta0^theta{nu dtheta}
Member

Comment?

@johnomotani
Contributor

I've got a bunch of changes for this, but they're stacking up. It might be best to just merge this now, and I'll open separate PRs for my changes.

I think that's a good idea. The solver does work and pass tests as-is, and it'll be easier to review changes in separate PRs.

@johnomotani johnomotani merged commit 4db6a98 into next Mar 18, 2020
@johnomotani johnomotani deleted the algebraic_multigrid branch March 18, 2020 15:14
@ZedThree
Member

🎉 Thanks @cmacmackin @johnomotani !

@ZedThree ZedThree added the feature label and removed the work in progress (Not ready for merging) label Mar 26, 2020
@ZedThree ZedThree added this to the BOUT-5.0 milestone Mar 26, 2020
@ZedThree ZedThree mentioned this pull request Mar 26, 2020