Set corner cells consistently #1885

Merged: 53 commits merged into next from consistent-corner-cells, Mar 9, 2020

Conversation

@johnomotani (Contributor) commented Jan 10, 2020:

To properly support non-orthogonal grids (where the x- and y- directions are not orthogonal) we should properly fill the corner cells which are needed by mixed-derivative stencils (i.e. D2DXDY).
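For example (a standard second-order central stencil, not quoting the exact BOUT++ implementation), the mixed derivative at an interior point uses the four diagonal neighbours,

    D2DXDY(f)[i][j] ≈ (f[i+1][j+1] - f[i+1][j-1] - f[i-1][j+1] + f[i-1][j-1]) / (4*dx*dy),

so a point adjacent to both an x- and a y-processor boundary reaches into the corner guard cell.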

This PR addresses this by adding communication of corner guard cells, and updating boundary conditions to apply something to the corner boundary cells.

The communications are defined to make the corner cells consistent with what would happen if there was no processor boundary (in particular y-processor boundaries are required at X-points, but the separatrices can be anywhere in the x-domain, either on boundaries or not).

The corner boundary points are set by applying a y-boundary condition using x-boundary points as input. This is a somewhat arbitrary choice. It requires applying x-boundary conditions before y-boundary conditions (which happened to be the existing behaviour anyway), so user-code boundary conditions (e.g. sheath boundary conditions) need to take care with the order. The y-boundary iterators have been updated to include the corner boundary points, so user-implemented boundary conditions should not need any special handling of the corners, only to ensure that x-boundary conditions have been applied before they are called.
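For illustration, a user-implemented sheath-style y-boundary condition might look like the sketch below (apply_sheath_value and the field f are hypothetical; the loop just assumes the usual BOUT++ RangeIterator pattern):

```cpp
// Sketch only: the upper-Y boundary iterator now also covers the corner
// boundary cells, so this loop fills them too, provided the x-boundary
// condition on f has already been applied.
for (RangeIterator r = mesh->iterateBndryUpperY(); !r.isDone(); r.next()) {
  for (int jy = mesh->yend + 1; jy < mesh->LocalNy; jy++) {
    for (int jz = 0; jz < mesh->LocalNz; jz++) {
      apply_sheath_value(f, r.ind, jy, jz); // hypothetical user function
    }
  }
}
```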

There is an option mesh:include_corner_cells which can be set to false to revert to the original behaviour, which is correct for orthogonal grids and might be slightly more optimized for that case.
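For reference, assuming the usual BOUT.inp option syntax, this would be set as:

```ini
[mesh]
include_corner_cells = false  # revert to the previous (orthogonal-grid) behaviour
```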

WIP because tests of the new communications are still needed.

Possible backport candidate (maybe with the default of include_corner_cells set to false)? If we want to backport the 3D Laplace solvers to v4.4 then this will be needed.

Defaults to true, as it is better to be correct by default than
maximally optimized.

The flag is not implemented yet, but x-boundary cells are now included in the y-communication.
Define some variables to specify where corner cells should be sent to
(dest) and where they should be received from (orig). Set to standard
neighbouring connections in BoutMesh::default_connections() and modified
as necessary when modifying connections with BoutMesh::set_connection()
or adding a target with BoutMesh::add_target().
Provides and uses a utility function in_range(value, lower, upper) for
forcing an index to be in the range [lower, upper).
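A minimal sketch of what such a helper might look like (written here as a simple clamp; the actual implementation in the PR may differ, e.g. by wrapping periodic indices instead):

```cpp
// Force value into the half-open range [lower, upper).
int in_range(int value, int lower, int upper) {
  if (value < lower) {
    return lower;
  }
  if (value >= upper) {
    return upper - 1;
  }
  return value;
}
```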
If a processor boundary is at the end of a limiter, the y-boundary cells
would previously be sent but not received. Add separate y-ranges for
sending and receiving data in the x-direction. Also ensure we don't
communicate corner cells separately when including y-boundaries in the
x-communication.
Since IDATA_buff_lowerY, etc., were split into IDATA_buff_lowerY_send
and IDATA_buff_lowerY_recv, etc., we need to take the max of all of
these to ensure the buffers created in BoutMesh::send are big enough.
This is (or will be) needed for non-orthogonal grids which have stencils
extending into corner cells so that corner cells need to be filled
consistently. Note that this fills the corner cells by applying a
y-boundary condition using x-boundary points. Therefore x-boundary
conditions must be applied before y-boundary conditions.
johnomotani added the "work in progress (Not ready for merging)" and "feature proposal (A code/feature outline proposal)" labels on Jan 10, 2020
@ZedThree (Member) commented:
Related and might be interesting, while researching multiblock libraries this week, I came across one library which extrapolated into the corners. Unfortunately, I can't remember which library! I'll try to dig it out.
Extrapolating is essentially equivalent to changing the stencil weights, I think, so might end up reducing the overall order.


TS_up_out = ts;
output_info.write("=> This processor sending out up\n");
}

if (include_corner_cells) {
// Connect corners as if communicating all guard cells first in y, then in x
Contributor:
I fail to see why you do not do that. It would keep the number of communications the same and only introduce slightly larger messages.

What you would do is first communicate in y as we do now, then communicate in x but including the boundary cells. This way you need the same number of communications, which would mean less impact on the runtime, as the message size only increases slightly.

johnomotani (PR author):

The trouble is you'd have to wait for the y-communication to complete before you can do the x-communication; at the moment, and in the changes proposed here, all the communications can be sent at the same time, and you only have to call wait once. So while it's true that the amount of information passed would not change much, I'm under the impression that communications are limited by latency rather than bandwidth, so we don't want to wait for two sets of communication to complete rather than one.
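To illustrate the point (this is not BOUT++'s actual communication code, just a sketch of the pattern): all requests are posted non-blocking up front and a single wait absorbs the latency of every message at once.

```cpp
#include <mpi.h>
#include <cstddef>
#include <vector>

// Post every send/receive (x, y and corner messages alike) before waiting once.
void exchange_all(const std::vector<double*>& send_bufs,
                  const std::vector<double*>& recv_bufs,
                  const std::vector<int>& counts,
                  const std::vector<int>& neighbours, MPI_Comm comm) {
  std::vector<MPI_Request> requests(2 * neighbours.size());
  for (std::size_t i = 0; i < neighbours.size(); ++i) {
    MPI_Irecv(recv_bufs[i], counts[i], MPI_DOUBLE, neighbours[i], 0, comm,
              &requests[2 * i]);
    MPI_Isend(send_bufs[i], counts[i], MPI_DOUBLE, neighbours[i], 0, comm,
              &requests[2 * i + 1]);
  }
  // One wait for all messages; the cost is dominated by the slowest single
  // message, not by the number of communication phases.
  MPI_Waitall(static_cast<int>(requests.size()), requests.data(),
              MPI_STATUSES_IGNORE);
}
```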

Contributor:

I guess we should do some proper benchmarking ...

A simple way to do this, without any code changes, would be switching between async and sync send, for which we already have a switch. If we are limited by latencies, the async version should make the communication roughly 4x as fast, but as far as I remember this is not the case - though I might be wrong.

It's been a while, but I seem to remember that the async switch didn't have a significant impact.

johnomotani (PR author):

The switch is async_send, which results in using MPI_Isend instead of MPI_Send. However, because MPI can copy the buffer and return from MPI_Send without needing to actually complete the communication, this may not make much difference (https://stackoverflow.com/questions/10017301/mpi-blocking-vs-non-blocking). I just did a quick test communicating over 4 nodes of Marconi, and async_send made no noticeable difference. The receives in BOUT++ are always asynchronous, though - I think that means that, regardless of the async_send setting, the communications are overlapping. It would need more hacking around to force the x- and y-communications to really be consecutive to test this further - it would be interesting to prove there's a difference, but I'm not going to pursue it now.

Tests both disconnected-double-null and limiter topologies, with a
variety of different NXPE values to check different cases of
separatrices being on or away from processor boundaries.
Previously include_corner_cells was initialised after nx, ny, nz, MXG,
MYG, and MZG although it is declared before them.
Now that corner cells are excluded from the y-communications, there can
sometimes be zero-size parts of buffers. These could previously cause
out-of-bounds indexing errors when calling things like &buffer[len] to
get the start of the part of the buffer.
What should have been an 'else if' was previously an 'if'.
Previously the corner guard cells were copied into the buffers to be
sent. It should be the corner interior cells, which are sent to the corner
guard cells on another processor.
Allows them to be used inside const methods.
Update the recent modification to the iterateBndry*Y() methods, which
made them include corner boundary cells, so that the corner cells are
only included when include_corner_cells=true. This way if
include_corner_cells=false, the previous behaviour is maintained.
Handling the nullptr case within the initialisation list means that
there is always a valid options member, which can be used to set the
'const bool include_corner_cells' in the initialisation list.
johnomotani (PR author) commented:
I've refactored now to implement corners by doing communications first in the y-direction then in the x-direction. Across 4 nodes on Marconi, communication seems to take about the same time as on current next, which is better than it was when communicating the corners separately, and the code is much simpler too.

I've also added the boundary regions @cmacmackin mentioned.

@ZedThree (Member) commented:
Apparently our communications are limited by bandwidth and not latency, which was a surprise to me

@johnomotani This is an interesting result! Any chance the data and generating code could be shared? Maybe stick it in examples/performance?

At some point, I will get around to making a BOUT++ performance repo, with all our test cases and harnesses, data, etc.

johnomotani (PR author) commented:
@ZedThree I added the code in examples/performance/communications already. I was testing with the data/BOUT.inp.smallblocks case, which for 4 nodes of 48 processors on Marconi is only 2x2 grids per processor, and has the x-domain split across 2 nodes. I'll double-check and upload some example log files.

johnomotani (PR author) commented:
Here are some logs for simulations with 2x2 blocks on each processor:

Here are some with a larger grid (192x128x128, with nxpe=96, nype=2) (note that the root processor claims to spend less time in communications, but almost all the rest spend about 60% in communications for all cases below - I've uploaded all the logs in the tarballs):

@cmacmackin (Collaborator) commented:
@johnomotani Thanks, I can confirm that the regions are now behaving correctly. I have already pulled this work into #1803, where it seems to be working.

@dschwoerer (Contributor) commented:
@johnomotani The test you sent me behaves strangely, and behaves better if you set the time derivative ddt(f)=0 explicitly. Not sure why it didn't crash without setting the derivative, but without it some procs spend 9 times as long in rhs/solver compared to otherwise. This might have caused issues for the timing ...

I haven't got the new results off Marconi before it went down ... so I will post again tomorrow.

In your case there are several processors that spend twice as much time in calc and also more time in solving. As such the timings are not that reliable, as all the other procs need to wait on the slow ones in comm.

cmacmackin added a commit that referenced this pull request Feb 3, 2020
Updated the way FakeParallelMesh works so that it reflects the new
approach to communication implemented in #1885, guaranteeing
consistent corner cells.
Geometric fields that are loaded from the grid by Coordinates previously
had the corner boundary cells set to NaN. These should now either be
read in, or filled by extrapolating boundary conditions if
include_corner_cells=true, so do not set them to NaN in that case. The
NaNs could cause floating-point exception errors when --enable-sigfpe is
used.
Add an MPI_Barrier() before mesh->communicate() and time only
mesh->communicate(); this ensures different execution times on different
processors do not give misleading timings.
johnomotani (PR author) commented:
> In your case there are several ones that spend 2 times as much in calc and also more time in solving. As such the timings are not that reliably, as all the other procs need to wait on the slow ones in comm.

@dschwoerer has a good point. I've updated the performance test to have an MPI_Barrier() just before the call to mesh->communicate() and to time only that call - printed on the just comms line in the output.

With that update, I get (using a 192x128x128 grid on 4 nodes of Marconi with nxpe=48 so both x- and y-directions are split onto 2 nodes):

  • 7.46 s existing standard implementation (without consistent corner cells)
  • 15.49 s using consecutive x- then y-communications
  • 13.43 s using overlapping x-, y- and corner-communications (version from 5ae549f, before Reverts)

I still don't understand why the version with corner communications but everything overlapping is so much slower, but it seems to be faster than the x-then-y version (although not massively) - numbers above are from repeating each run twice and the timings of the communications seem fairly consistent. I propose to reset this PR to the previous, overlapping-communications version (and cherry-pick the last 4 commits which are still needed/useful). Any objections? As an argument against, the non-overlapping implementation is much simpler, and we don't usually spend much time in mesh->communicate() anyway.
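For reference, the barrier-then-time pattern used for the "just comms" numbers above is roughly the following sketch (the function name is illustrative and BOUT++ headers are omitted; the real code is in examples/performance/communications):

```cpp
#include <mpi.h>

// Time only mesh->communicate(): the barrier removes skew from earlier work,
// so slow processors elsewhere do not get counted as communication time.
double time_communications(Mesh* mesh, Field3D& f, int repeats) {
  MPI_Barrier(BoutComm::get());
  const double start = MPI_Wtime();
  for (int i = 0; i < repeats; ++i) {
    mesh->communicate(f);
  }
  return MPI_Wtime() - start;
}
```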

result(i, localmesh->LocalNy - 1 - j) = BoutNaN;
result(localmesh->LocalNx - 1 - i, j) = BoutNaN;
result(localmesh->LocalNx - 1 - i, localmesh->LocalNy - 1 - j) = BoutNaN;
if (not localmesh->include_corner_cells) {
Contributor:

What is the point?
If they are not included, they shouldn't be used.
If you think they are invalid, then you should set them to NaN in all cases.
This seems to me to be causing unnecessary breakage.
Especially if the field is generated from an expression, like 1.0, there is absolutely nothing wrong with the corner cells, and setting them to NaN has caused issues for me several times in the past.

Member:

A fair point, I think. If include_corner_cells is true, then the corners will be communicated later, but they will remain invalid until that communication.

I'm not sure we either currently do or should treat fields generated from expressions differently, so probably the best thing to do is to always set them to NaN.

Contributor:

My issue is that in the coords3d branch I removed the setting to NaN, so that dz is constant if it is generated from a constant expression. Another option would be to extrapolate them if we extrapolate in x and y, however that would be more work ...

Member:

Could wrap setting them in #if CHECK > 0 (level to be debated)?

What do you do if dz is not generated from a constant expression? And do you need dz in the corners?

At some point we're also going to add guards and boundaries for z, so this may get even more fun.

Contributor:

Wrapping in CHECK is probably useful anyway; > 0 is probably fine because it is only used in the init. However, that does not solve this for coords3d.

I check whether dz (or zlength()) is constant. Currently the check is for RGN_ALL, as then no further checks are needed on what the actually used domain is, and the first element can be used.

In the case of a non-constant dz, any of the routines that assume a constant dz/zlength() cannot be used, so dz doesn't need to be constant, and corner cells are no more important than in any other case.

It is probably best to check in all cases what domain is used, and only check constantness for this domain rather than RGN_ALL, but dz(0,0) is so much easier than dz[dz.getRegion(...).begin()[0]]

To do things right, I think we should:

  • extrapolate into corners, if they are to be used, otherwise leave them. The motivation to not set them to NaN is that they might be generated from a grid or input, which already includes guards and corners. We also do not break boundaries, if they are not extrapolated.
  • use appropriate regions in coord3d

Member:

> It is probably best to check in all cases what domain is used, and only check constantness for this domain, rather then RGN_ALL, but dz(0,0) is so much easier then dz[dz.getRegion(...).begin()[0]]

Might be wise to wrap this up in something like constant_dz(const string& region) that throws if it's not constant over that region.
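A possible shape for that helper (a sketch only, using the name suggested above and assuming a field-valued dz as in the coords3d branch; exact types, headers and region API details may differ):

```cpp
// Return the single value of dz on a region, throwing if it is not constant there.
BoutReal constant_dz(const Field3D& dz, const std::string& region) {
  const auto& rgn = dz.getRegion(region);
  const BoutReal first = dz[*rgn.begin()];
  for (const auto& i : rgn) {
    if (dz[i] != first) {
      throw BoutException("dz is not constant over region %s", region.c_str());
    }
  }
  return first;
}
```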

> extrapolate into corners, if they are to be used, otherwise leave them

Is that not what is done in this change? This flag doesn't say whether or not to use the corners, just whether or not they are communicated.

We could of course mandate that grids for v5 provide values in the boundaries, which removes all this extrapolation stuff. That might just be moving the problem up a level.

Contributor:

> It is probably best to check in all cases what domain is used, and only check constantness for this domain, rather then RGN_ALL, but dz(0,0) is so much easier then dz[dz.getRegion(...).begin()[0]]
>
> Might be wise to wrap this up in something like constant_dz(const string& region) that throws if it's not constant over that region.

👍 will do

> extrapolate into corners, if they are to be used, otherwise leave them
>
> Is that not what is done in this change? This flag doesn't say whether or not to use the corners, just whether or not they are communicated.

I do not see any corner extrapolation. Either they are set to NaN (if !include_corner_cells) or otherwise left as they are. With the methods from the non-uniform boundary stuff, it shouldn't be that hard to extend this into 2D/3D to allow extrapolation into corners.

> We could of course mandate that grids for v5 provide values in the boundaries, which removes all this extrapolation stuff. That might just be moving the problem up a level.

Certainly it would be easier for the initialization. Also, the extrapolation would only need to be done once, at grid creation time. Writing it as a script might be easier, because we don't have to think about all possible cases at once.

Also, with the extrapolation I feel uncertain about whether we should include the grid spacing. For dx, dy, dz that is non-trivial, but it might be more accurate for the other ones ...

johnomotani (PR author):

Sorry, all the corner-cell handling is rather awful. I probably need more comments/documentation...

If include_corner_cells == true, we should be extrapolating into the corners. This PR modifies the boundary regions so that if include_corner_cells == true then the y-boundaries include the corner cells. As the y-boundaries are added after the x-boundaries, the for (auto& bndry : localmesh->getBoundaries()) { loop above should first extrapolate into the x-boundaries, then into the y-boundaries, including extrapolating into corner cells using values from x-boundary cells.
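In other words, a sketch of the intended ordering (extrapolate_into is an illustrative placeholder for the extrapolation applied in that loop, not a real BOUT++ function):

```cpp
// getBoundaries() lists the x-boundary regions before the y-boundary regions,
// so one pass fills the x-boundaries first; the y-boundaries (which include the
// corner cells when include_corner_cells == true) are filled afterwards and
// their corner extrapolation picks up the freshly set x-boundary values.
for (auto& bndry : localmesh->getBoundaries()) {
  extrapolate_into(result, *bndry); // placeholder for the existing extrapolation
}
```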

johnomotani (PR author):

I'd be happy to set the corners to NaN only when we are extrapolating into some boundary cells. Otherwise there is correct data in the corner cells, which there is no reason to overwrite. Keeping it can occasionally be useful: for example when using staggered grids, there is a case when we want to invert a Laplacian at Ny+1, and the non-uniform grid correction breaks unless there is valid data in the corner cells.


Field3D f;
};

BOUTMAIN(TestCommunications);
Member:
Future work: could this be ported to be a unit test using a mock MPI?

The corner cells are only invalid when include_corner_cells==false and
we are extrapolating into either or both the x- and y-boundaries.
Otherwise they are valid and there is no reason to set them to NaN.
johnomotani pushed a commit that referenced this pull request Mar 8, 2020
ZedThree merged commit f519ebe into next on Mar 9, 2020
ZedThree deleted the consistent-corner-cells branch on Mar 9, 2020 at 13:50
ZedThree added a commit that referenced this pull request Mar 10, 2020
* next: (95 commits)
  Format test-yupdown-weights runtest
  Fix test-yupdown-weights
  Add y-communications in Hermite spline interpolations
  Exclude corner cells in LaplaceXY
  Fix options for test-yupdown-weights
  Added test-yupdown-weights to CMake build
  Changes requested in code review of PR #1885
  Set twistshift=true to fix test-yupdown-weights
  Changed ParallelTransform::positionsAndWeights to PositionsAndWeights
  Update test-yupdown-weights for v4.3
  Getters for interpolation weights moved to ParallelTransforms
  Add test for constructing derivatives from Hermite weights.
  Declare result where it is first assigned
  Fix elm-pb
  Only set corner cells of Coordinates fields to NaN if CHECK>0
  Simplify performance communications test, removing PhysicsModel
  Only set corner cells of Coordinates fields to NaN when necessary
  Update to 6field-simple example to use interpolation_xz.hxx header
  Remove option to use CtoL and LtoC operators in 6field-simple example
  Attempt to update staggered grid support in gravity_reduced example
  ...
Labels: feature proposal (A code/feature outline proposal)

4 participants