Set corner cells consistently #1885

Merged: 53 commits merged into next from consistent-corner-cells, Mar 9, 2020

Conversation

@johnomotani (Contributor) commented Jan 10, 2020:

To properly support non-orthogonal grids (where the x- and y- directions are not orthogonal) we should properly fill the corner cells which are needed by mixed-derivative stencils (i.e. D2DXDY).
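For example (a standard second-order central stencil, not quoting the exact BOUT++ implementation), the mixed derivative at an interior point uses the four diagonal neighbours,

    D2DXDY(f)[i][j] ≈ (f[i+1][j+1] - f[i+1][j-1] - f[i-1][j+1] + f[i-1][j-1]) / (4*dx*dy),

so a point adjacent to both an x- and a y-processor boundary reaches into the corner guard cell.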

This PR addresses this by adding communication of corner guard cells, and updating boundary conditions to apply something to the corner boundary cells.

The communications are defined to make the corner cells consistent with what would happen if there was no processor boundary (in particular y-processor boundaries are required at X-points, but the separatrices can be anywhere in the x-domain, either on boundaries or not).

The corner boundary points are set by applying a y-boundary condition using x-boundary points as input. This is a somewhat arbitrary choice. It requires applying x-boundary conditions before y-boundary conditions (which happened to be the existing behaviour anyway), so user-code boundary conditions (e.g. sheath boundary conditions) need to take care with the order. The y-boundary iterators have been updated to include the corner boundary points, so user-implemented boundary conditions should not need any special handling of the corners, only to ensure that x-boundary conditions have been applied before they are called.
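For illustration, a user-implemented sheath-style y-boundary condition might look like the sketch below (apply_sheath_value and the field f are hypothetical; the loop just assumes the usual BOUT++ RangeIterator pattern):

```cpp
// Sketch only: the upper-Y boundary iterator now also covers the corner
// boundary cells, so this loop fills them too, provided the x-boundary
// condition on f has already been applied.
for (RangeIterator r = mesh->iterateBndryUpperY(); !r.isDone(); r.next()) {
  for (int jy = mesh->yend + 1; jy < mesh->LocalNy; jy++) {
    for (int jz = 0; jz < mesh->LocalNz; jz++) {
      apply_sheath_value(f, r.ind, jy, jz); // hypothetical user function
    }
  }
}
```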

There is an option mesh:include_corner_cells which can be set to false to revert to the original behaviour, which is correct for orthogonal grids and might be slightly more optimized for that case.
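For reference, assuming the usual BOUT.inp option syntax, this would be set as:

```ini
[mesh]
include_corner_cells = false  # revert to the previous (orthogonal-grid) behaviour
```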

WIP because tests of the new communications are still needed.

Possible backport candidate (maybe with the default of include_corner_cells set to false)? If we want to backport the 3D Laplace solvers to v4.4 then this will be needed.

Defaults to true, as it is better to be correct by default than
maximally optimized.

The flag is not implemented yet, but x-boundary cells are now included in the y-communication.
Define some variables to specify where corner cells should be sent to
(dest) and where they should be received from (orig). Set to standard
neighbouring connections in BoutMesh::default_connections() and modified
as necessary when modifying connections with BoutMesh::set_connection()
or adding a target with BoutMesh::add_target().
Provides and uses a utility function in_range(value, lower, upper) for
forcing an index to be in the range [lower, upper).
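A minimal sketch of what such a helper might look like (written here as a simple clamp; the actual implementation in the PR may differ, e.g. by wrapping periodic indices instead):

```cpp
// Force value into the half-open range [lower, upper).
int in_range(int value, int lower, int upper) {
  if (value < lower) {
    return lower;
  }
  if (value >= upper) {
    return upper - 1;
  }
  return value;
}
```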
If a processor boundary is at the end of a limiter, the y-boundary cells
would previously be sent but not received. Add separate y-ranges for
sending and receiving data in the x-direction. Also ensure we don't
communicate corner cells separately when including y-boundaries in the
x-communication.
Since IDATA_buff_lowerY, etc., were split into IDATA_buff_lowerY_send
and IDATA_buff_lowerY_recv, etc., we need to take the max of all of
these to ensure the buffers created in BoutMesh::send are big enough.
This is (or will be) needed for non-orthogonal grids which have stencils
extending into corner cells so that corner cells need to be filled
consistently. Note that this fills the corner cells by applying a
y-boundary condition using x-boundary points. Therefore x-boundary
conditions must be applied before y-boundary conditions.
johnomotani added the "work in progress (Not ready for merging)" and "feature proposal (A code/feature outline proposal)" labels on Jan 10, 2020
@ZedThree (Member) commented:
Related and might be interesting, while researching multiblock libraries this week, I came across one library which extrapolated into the corners. Unfortunately, I can't remember which library! I'll try to dig it out.
Extrapolating is essentially equivalent to changing the stencil weights, I think, so might end up reducing the overall order.


TS_up_out = ts;
output_info.write("=> This processor sending out up\n");
}

if (include_corner_cells) {
// Connect corners as if communicating all guard cells first in y, then in x
Contributor:
I fail to see why you do not do that. It would keep the number of communications the same and only introduce slightly larger messages.

What you would do is first communicate in y as we do now, then communicate in x but including the boundary cells. This way you need the same number of communications, which would mean less impact on the runtime, as the message size only increases slightly.

johnomotani (PR author):

The trouble is you'd have to wait for the y-communication to complete before you can do the x-communication; at the moment, and in the changes proposed here, all the communications can be sent at the same time, and you only have to call wait once. So while it's true that the amount of information passed would not change much, I'm under the impression that communications are limited by latency rather than bandwidth, so we don't want to wait for two sets of communication to complete rather than one.
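To illustrate the point (this is not BOUT++'s actual communication code, just a sketch of the pattern): all requests are posted non-blocking up front and a single wait absorbs the latency of every message at once.

```cpp
#include <mpi.h>
#include <cstddef>
#include <vector>

// Post every send/receive (x, y and corner messages alike) before waiting once.
void exchange_all(const std::vector<double*>& send_bufs,
                  const std::vector<double*>& recv_bufs,
                  const std::vector<int>& counts,
                  const std::vector<int>& neighbours, MPI_Comm comm) {
  std::vector<MPI_Request> requests(2 * neighbours.size());
  for (std::size_t i = 0; i < neighbours.size(); ++i) {
    MPI_Irecv(recv_bufs[i], counts[i], MPI_DOUBLE, neighbours[i], 0, comm,
              &requests[2 * i]);
    MPI_Isend(send_bufs[i], counts[i], MPI_DOUBLE, neighbours[i], 0, comm,
              &requests[2 * i + 1]);
  }
  // One wait for all messages; the cost is dominated by the slowest single
  // message, not by the number of communication phases.
  MPI_Waitall(static_cast<int>(requests.size()), requests.data(),
              MPI_STATUSES_IGNORE);
}
```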

Contributor:

I guess we should do some proper benchmarking ...

A simple way to do this, without any code changes, would be switching between async and sync send, for which we already have a switch. If we are limited by latencies, the async version should make the communication roughly 4x as fast, but as far as I remember this is not the case - though I might be wrong.

It's been a while, but I seem to remember that the async switch didn't have a significant impact.

johnomotani (PR author):

The switch is async_send, which results in using MPI_Isend instead of MPI_Send. However, because MPI can copy the buffer and return from MPI_Send without needing to actually complete the communication, this may not make much difference (https://stackoverflow.com/questions/10017301/mpi-blocking-vs-non-blocking). I just did a quick test communicating over 4 nodes of Marconi, and async_send made no noticeable difference. The receives in BOUT++ are always asynchronous, though - I think that means that, regardless of the async_send setting, the communications are overlapping. It would need more hacking around to force the x- and y-communications to really be consecutive to test this further - it would be interesting to prove there's a difference, but I'm not going to pursue it now.

Tests both disconnected-double-null and limiter topologies, with a
variety of different NXPE values to check different cases of
separatrices being on or away from processor boundaries.
Previously include_corner_cells was initialised after nx, ny, nz, MXG,
MYG, and MZG although it is declared before them.
Now that corner cells are excluded from the y-communications, there can
sometimes be zero-size parts of buffers. These could previously cause
out-of-bounds indexing errors when calling things like &buffer[len] to
get the start of the part of the buffer.
What should have been an 'else if' was previously an 'if'.
Previously the corner guard cells were copied into the buffers to be
sent. It should be the corner interior cells, which are sent to the corner
guard cells on another processor.
Allows them to be used inside const methods.
Update the recent modification to the iterateBndry*Y() methods, which
made them include corner boundary cells, so that the corner cells are
only included when include_corner_cells=true. This way if
include_corner_cells=false, the previous behaviour is maintained.
Handling the nullptr case within the initialisation list means that
there is always a valid options member, which can be used to set the
'const bool include_corner_cells' in the initialisation list.
johnomotani (PR author) commented:
I've refactored now to implement corners by doing communications first in the y-direction then in the x-direction. Across 4 nodes on Marconi, communication seems to take about the same time as on current next, which is better than it was when communicating the corners separately, and the code is much simpler too.

I've also added the boundary regions @cmacmackin mentioned.

@ZedThree (Member) commented:
Apparently our communications are limited by bandwidth and not latency, which was a surprise to me

@johnomotani This is an interesting result! Any chance the data and generating code could be shared? Maybe stick it in examples/performance?

At some point, I will get around to making a BOUT++ performance repo, with all our test cases and harnesses, data, etc.

johnomotani (PR author) commented:
@ZedThree I added the code in examples/performance/communications already. I was testing with the data/BOUT.inp.smallblocks case, which for 4 nodes of 48 processors on Marconi is only 2x2 grids per processor, and has the x-domain split across 2 nodes. I'll double-check and upload some example log files.

johnomotani (PR author) commented:
Here are some logs for simulations with 2x2 blocks on each processor:

Here are some with a larger grid (192x128x128, with nxpe=96, nype=2) (note that the root processor claims to spend less time in communications, but almost all the rest spend about 60% in communications for all cases below - I've uploaded all the logs in the tarballs):

@cmacmackin (Collaborator) commented:
@johnomotani Thanks, I can confirm that the regions are now behaving correctly. I have already pulled this work into #1803, where it seems to be working.

@dschwoerer (Contributor) commented:
@johnomotani The test you sent me behaves strangely, and behaves better if you set the time derivative ddt(f)=0 explicitly. Not sure why it didn't crash without setting the derivative, but without it some procs spend 9 times as long in rhs/solver compared to otherwise. This might have caused issues for the timing ...

I haven't got the new results off Marconi before it went down ... so I will post again tomorrow.

In your case there are several processors that spend twice as much time in calc and also more time in solving. As such the timings are not that reliable, as all the other procs need to wait on the slow ones in comm.

cmacmackin added a commit that referenced this pull request Feb 3, 2020
Updated the way FakeParallelMesh works so that it reflects the new
approach to communication implemented in #1885, guaranteeing
consistent corner cells.
Geometric fields that are loaded from the grid by Coordinates previously
had the corner boundary cells set to NaN. These should now either be
read in, or filled by extrapolating boundary conditions if
include_corner_cells=true, so do not set them to NaN in that case. The
NaNs could cause floating-point exception errors when --enable-sigfpe is
used.
Add an MPI_Barrier() before mesh->communicate() and time only
mesh->communicate(); this ensures different execution times on different
processors do not give misleading timings.
johnomotani (PR author) commented:
> In your case there are several ones that spend 2 times as much in calc and also more time in solving. As such the timings are not that reliably, as all the other procs need to wait on the slow ones in comm.

@dschwoerer has a good point. I've updated the performance test to have an MPI_Barrier() just before the call to mesh->communicate() and to time only that call - printed on the just comms line in the output.

With that update, I get (using a 192x128x128 grid on 4 nodes of Marconi with nxpe=48 so both x- and y-directions are split onto 2 nodes):

  • 7.46 s existing standard implementation (without consistent corner cells)
  • 15.49 s using consecutive x- then y-communications
  • 13.43 s using overlapping x-, y- and corner-communications (version from 5ae549f, before Reverts)

I still don't understand why the version with corner communications but everything overlapping is so much slower, but it seems to be faster than the x-then-y version (although not massively) - numbers above are from repeating each run twice and the timings of the communications seem fairly consistent. I propose to reset this PR to the previous, overlapping-communications version (and cherry-pick the last 4 commits which are still needed/useful). Any objections? As an argument against, the non-overlapping implementation is much simpler, and we don't usually spend much time in mesh->communicate() anyway.
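For reference, the barrier-then-time pattern used for the "just comms" numbers above is roughly the following sketch (the function name is illustrative and BOUT++ headers are omitted; the real code is in examples/performance/communications):

```cpp
#include <mpi.h>

// Time only mesh->communicate(): the barrier removes skew from earlier work,
// so slow processors elsewhere do not get counted as communication time.
double time_communications(Mesh* mesh, Field3D& f, int repeats) {
  MPI_Barrier(BoutComm::get());
  const double start = MPI_Wtime();
  for (int i = 0; i < repeats; ++i) {
    mesh->communicate(f);
  }
  return MPI_Wtime() - start;
}
```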

result(i, localmesh->LocalNy - 1 - j) = BoutNaN;
result(localmesh->LocalNx - 1 - i, j) = BoutNaN;
result(localmesh->LocalNx - 1 - i, localmesh->LocalNy - 1 - j) = BoutNaN;
if (not localmesh->include_corner_cells) {
Contributor:

What is the point?
If they are not included, they shouldn't be used.
If you think they are invalid, then you should set them to NaN in all cases.
This seems to me to be causing unnecessary breakage.
Especially if the field is generated from an expression, like 1.0, there is absolutely nothing wrong with the corner cells, and setting them to NaN has caused issues for me several times in the past.

Member:

A fair point, I think. If include_corner_cells is true, then the corners will be communicated later, but they will remain invalid until that communication.

I'm not sure we either currently do or should treat fields generated from expressions differently, so probably the best thing to do is to always set them to NaN.

Contributor:

My issue is that in the coords3d branch I removed the setting to NaN, so that dz is constant if it is generated from a constant expression. Another option would be to extrapolate them if we extrapolate in x and y, however that would be more work ...

Member:

Could wrap setting them in #if CHECK > 0 (level to be debated)?

What do you do if dz is not generated from a constant expression? And do you need dz in the corners?

At some point we're also going to add guards and boundaries for z, so this may get even more fun.

Contributor:

Wrapping in CHECK is probably useful anyway; > 0 is probably fine because it is only used in the init. However, that does not solve this for coords3d.

I check whether dz (or zlength()) is constant. Currently the check is for RGN_ALL, as then no further checks are needed on what the actually used domain is, and the first element can be used.

In the case of a non-constant dz, any of the routines that assume a constant dz/zlength() cannot be used, so dz doesn't need to be constant, and corner cells are no more important than in any other case.

It is probably best to check in all cases what domain is used, and only check constantness for this domain rather than RGN_ALL, but dz(0,0) is so much easier than dz[dz.getRegion(...).begin()[0]]

To do things right, I think we should:

  • extrapolate into corners, if they are to be used, otherwise leave them. The motivation to not set them to NaN is that they might be generated from a grid or input, which already includes guards and corners. We also do not break boundaries, if they are not extrapolated.
  • use appropriate regions in coord3d

Member:

> It is probably best to check in all cases what domain is used, and only check constantness for this domain, rather then RGN_ALL, but dz(0,0) is so much easier then dz[dz.getRegion(...).begin()[0]]

Might be wise to wrap this up in something like constant_dz(const string& region) that throws if it's not constant over that region.
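A possible shape for that helper (a sketch only, using the name suggested above and assuming a field-valued dz as in the coords3d branch; exact types, headers and region API details may differ):

```cpp
// Return the single value of dz on a region, throwing if it is not constant there.
BoutReal constant_dz(const Field3D& dz, const std::string& region) {
  const auto& rgn = dz.getRegion(region);
  const BoutReal first = dz[*rgn.begin()];
  for (const auto& i : rgn) {
    if (dz[i] != first) {
      throw BoutException("dz is not constant over region %s", region.c_str());
    }
  }
  return first;
}
```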

> extrapolate into corners, if they are to be used, otherwise leave them

Is that not what is done in this change? This flag doesn't say whether or not to use the corners, just whether or not they are communicated.

We could of course mandate that grids for v5 provide values in the boundaries, which removes all this extrapolation stuff. That might just be moving the problem up a level.

Contributor:

> It is probably best to check in all cases what domain is used, and only check constantness for this domain, rather then RGN_ALL, but dz(0,0) is so much easier then dz[dz.getRegion(...).begin()[0]]
>
> Might be wise to wrap this up in something like constant_dz(const string& region) that throws if it's not constant over that region.

👍 will do

> extrapolate into corners, if they are to be used, otherwise leave them
>
> Is that not what is done in this change? This flag doesn't say whether or not to use the corners, just whether or not they are communicated.

I do not see any corner extrapolation. Either they are set to NaN (if !include_corner_cells) or otherwise left as they are. With the methods from the non-uniform boundary stuff, it shouldn't be that hard to extend this into 2D/3D to allow extrapolation into corners.

> We could of course mandate that grids for v5 provide values in the boundaries, which removes all this extrapolation stuff. That might just be moving the problem up a level.

Certainly it would be easier for the initialization. Also, the extrapolation would only need to be done once, at grid creation time. Writing it as a script might be easier, because we don't have to think about all possible cases at once.

Also, with the extrapolation I feel uncertain about whether we should include the grid spacing. For dx, dy, dz that is non-trivial, but it might be more accurate for the other ones ...

johnomotani (PR author):

Sorry, all the corner-cell handling is rather awful. I probably need more comments/documentation...

If include_corner_cells == true, we should be extrapolating into the corners. This PR modifies the boundary regions so that if include_corner_cells == true then the y-boundaries include the corner cells. As the y-boundaries are added after the x-boundaries, the for (auto& bndry : localmesh->getBoundaries()) { loop above should first extrapolate into the x-boundaries, then into the y-boundaries, including extrapolating into corner cells using values from x-boundary cells.
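In other words, a sketch of the intended ordering (extrapolate_into is an illustrative placeholder for the extrapolation applied in that loop, not a real BOUT++ function):

```cpp
// getBoundaries() lists the x-boundary regions before the y-boundary regions,
// so one pass fills the x-boundaries first; the y-boundaries (which include the
// corner cells when include_corner_cells == true) are filled afterwards and
// their corner extrapolation picks up the freshly set x-boundary values.
for (auto& bndry : localmesh->getBoundaries()) {
  extrapolate_into(result, *bndry); // placeholder for the existing extrapolation
}
```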

johnomotani (PR author):

I'd be happy to set the corners to NaN only when we are extrapolating into some boundary cells. Otherwise there is correct data in the corner cells, which there is no reason to overwrite. Keeping it can occasionally be useful: for example when using staggered grids, there is a case when we want to invert a Laplacian at Ny+1, and the non-uniform grid correction breaks unless there is valid data in the corner cells.


Field3D f;
};

BOUTMAIN(TestCommunications);
Member:
Future work: could this be ported to be a unit test using a mock MPI?

The corner cells are only invalid when include_corner_cells==false and
we are extrapolating into either or both the x- and y-boundaries.
Otherwise they are valid and there is no reason to set them to NaN.
johnomotani pushed a commit that referenced this pull request Mar 8, 2020
ZedThree merged commit f519ebe into next on Mar 9, 2020
ZedThree deleted the consistent-corner-cells branch on Mar 9, 2020 at 13:50
ZedThree added a commit that referenced this pull request Mar 10, 2020
* next: (95 commits)
  Format test-yupdown-weights runtest
  Fix test-yupdown-weights
  Add y-communications in Hermite spline interpolations
  Exclude corner cells in LaplaceXY
  Fix options for test-yupdown-weights
  Added test-yupdown-weights to CMake build
  Changes requested in code review of PR #1885
  Set twistshift=true to fix test-yupdown-weights
  Changed ParallelTransform::positionsAndWeights to PositionsAndWeights
  Update test-yupdown-weights for v4.3
  Getters for interpolation weights moved to ParallelTransforms
  Add test for constructing derivatives from Hermite weights.
  Declare result where it is first assigned
  Fix elm-pb
  Only set corner cells of Coordinates fields to NaN if CHECK>0
  Simplify performance communications test, removing PhysicsModel
  Only set corner cells of Coordinates fields to NaN when necessary
  Update to 6field-simple example to use interpolation_xz.hxx header
  Remove option to use CtoL and LtoC operators in 6field-simple example
  Attempt to update staggered grid support in gravity_reduced example
  ...
Labels: feature proposal (A code/feature outline proposal)

4 participants