Set corner cells consistently #1885
Conversation
Defaults to true, as it is better to be correct by default than maximally optimized. Flag not implemented yet.
But include x-boundary cells in the y-communication.
Define some variables to specify where corner cells should be sent to (dest) and where they should be received from (orig). Set to standard neighbouring connections in BoutMesh::default_connections() and modified as necessary when modifying connections with BoutMesh::set_connection() or adding a target with BoutMesh::add_target().
Provides and uses a utility function in_range(value, lower, upper) for forcing an index to be in the range [lower, upper).
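The description above only names the helper's contract. A minimal sketch, assuming clamping semantics for the half-open range [lower, upper) (the actual BOUT++ helper may differ in detail):

```cpp
#include <cassert>

// Illustrative sketch of the in_range utility described above, assuming
// clamping semantics: force `value` into the half-open range [lower, upper).
// This is not the BOUT++ implementation, just the contract as described.
inline int in_range(int value, int lower, int upper) {
  if (value < lower) {
    return lower;
  }
  if (value >= upper) {
    return upper - 1; // largest index still inside [lower, upper)
  }
  return value;
}
```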
If processor boundary is at the end of a limiter, previously the y-boundary cells would be sent but not received. Add separate y-ranges for sending and receiving data in the x-direction. Also ensure we don't communicate corner cells separately when including y-boundaries in the x-communication.
Since IDATA_buff_lowerY, etc., were split into IDATA_buff_lowerY_send and IDATA_buff_lowerY_recv, etc., we need to take the max of all of these to ensure the buffers created in BoutMesh::send are big enough.
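The sizing rule above can be sketched as follows; the parameter names are illustrative stand-ins for the split send/receive ranges, not the actual BOUT++ members:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Sketch: with separate send and receive y-ranges, a shared communication
// buffer must be sized for the largest of all the parts it may hold.
inline std::size_t required_buffer_len(std::size_t lowerY_send,
                                       std::size_t lowerY_recv,
                                       std::size_t upperY_send,
                                       std::size_t upperY_recv) {
  return std::max({lowerY_send, lowerY_recv, upperY_send, upperY_recv});
}
```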
This is (or will be) needed for non-orthogonal grids which have stencils extending into corner cells so that corner cells need to be filled consistently. Note that this fills the corner cells by applying a y-boundary condition using x-boundary points. Therefore x-boundary conditions must be applied before y-boundary conditions.
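The ordering constraint above (x-boundary conditions before y-boundary conditions) can be sketched on a toy grid. This is not the BOUT++ implementation; Neumann-like copies stand in for real boundary conditions, and the point is only that the y-boundary loop spans the full x-range so the corners are filled from already-set x-boundary values:

```cpp
#include <cassert>
#include <vector>

constexpr int NX = 4; // total x size, including one guard column on each side
constexpr int NY = 4; // total y size, including one guard row on each side

// Apply the x-boundary first, then a y-boundary over the FULL x-range
// (including the x-guard columns), so corner guard cells are filled using
// x-boundary points, as described above.
std::vector<double> fill_boundaries(std::vector<double> f) {
  auto at = [&f](int x, int y) -> double& { return f[y * NX + x]; };
  // x-boundary condition: copy the nearest interior value outward
  for (int y = 1; y < NY - 1; ++y) {
    at(0, y) = at(1, y);
    at(NX - 1, y) = at(NX - 2, y);
  }
  // y-boundary condition over every x index, guard columns included
  for (int x = 0; x < NX; ++x) {
    at(x, 0) = at(x, 1);
    at(x, NY - 1) = at(x, NY - 2);
  }
  return f;
}
```

With this ordering each corner guard cell ends up consistent with its diagonally adjacent interior cell; reversing the two loops would leave the corners untouched unless the x-boundary loop were widened instead.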
Related and might be interesting: while researching multiblock libraries this week, I came across one library which extrapolated into the corners. Unfortunately, I can't remember which library! I'll try to dig it out.
src/mesh/impls/bout/boutmesh.cxx (outdated diff context):

```cpp
    TS_up_out = ts;
    output_info.write("=> This processor sending out up\n");
  }
  // ...
  if (include_corner_cells) {
    // Connect corners as if communicating all guard cells first in y, then in x
```
I fail to see why you do not do that.
That would keep the number of communications the same, only introducing slightly larger messages.
What you would do is first communicate in y as we do now. Then you communicate in x, but including the boundary cells. This way you need the same number of communications, which would mean less impact on the runtime, as the message size only increases slightly.
The trouble is you'd have to wait for the y-communication to complete before you can do the x-communication. At the moment, and in the changes proposed here, all the communications can be sent at the same time, and you only have to call wait() once. So while it's true that the amount of information passed would not change much, I'm under the impression that communications are limited by latency rather than bandwidth, so we don't want to wait for two sets of communication to complete rather than one.
I guess we should do some proper benchmarking ...
A simple way to do this, without any code changes, would be switching between async and sync send, for which we already have a switch.
If we are limited by latencies, the async version should make the communication roughly 4x as fast, but as far as I remember this is not the case; I might be wrong, though.
It's been a while, but I seem to remember that the async switch didn't have a significant impact.
The switch is async_send, which results in using MPI_Isend instead of MPI_Send, but because MPI can copy the buffer and return from MPI_Send without needing to actually complete the communication, this may not make much difference (https://stackoverflow.com/questions/10017301/mpi-blocking-vs-non-blocking). I just did a quick test communicating over 4 nodes of Marconi, and async_send made no noticeable difference. However, the receives in BOUT++ are always asynchronous, so I think that regardless of the async_send setting, the communications are overlapping. It would need more hacking around to force the x- and y-communications to really be consecutive to test this further. It would be interesting to prove there's a difference, but I'm not going to pursue it now.
Tests both disconnected-double-null and limiter topologies, with a variety of different NXPE values to check different cases of separatrices being on or away from processor boundaries.
Previously include_corner_cells was initialised after nx, ny, nz, MXG, MYG, and MZG although it is declared before them.
Now that corner cells are excluded from the y-communications, there can sometimes be zero-size parts of buffers. These could previously cause out-of-bounds indexing errors when calling things like &buffer[len] to get the start of a part of the buffer.
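The pitfall above is a standard C++ one, sketched here with a hypothetical helper (not the BOUT++ code): when a part has zero length, its offset can equal the buffer's size, and `&buffer[offset]` is then out-of-bounds element access, whereas `buffer.data() + offset` is valid for any offset up to and including `size()`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return a pointer to the start of the buffer part beginning at `offset`.
// Using data() + offset is well-defined even when offset == buffer.size()
// (a one-past-the-end pointer), which &buffer[offset] is not.
inline double* part_start(std::vector<double>& buffer, std::size_t offset) {
  return buffer.data() + offset;
}
```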
What should have been an 'else if' was previously an 'if'.
Previously the corner guard cells were copied into the buffers to be sent. It should be the corner interior cells, which are sent to the corner guard cells on another processor.
Allows them to be used inside const methods.
Update the recent modification to the iterateBndry*Y() methods, which made them include corner boundary cells, so that the corner cells are only included when include_corner_cells=true. This way if include_corner_cells=false, the previous behaviour is maintained.
Handling the nullptr case within the initialisation list means that there is always a valid options member, which can be used to set the 'const bool include_corner_cells' in the initialisation list.
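The pattern above can be sketched with hypothetical names (these are not the BOUT++ classes): resolving the possibly-null pointer inside the initialisation list means every later member initialiser, including the `const bool`, reads from a valid object. This relies on members being initialised in declaration order, which is why `options` must be declared first:

```cpp
#include <cassert>

// Stand-in for an options container.
struct Options {
  bool include_corner_cells = true;
};

// Fallback used when no options are supplied.
inline Options& default_options() {
  static Options defaults;
  return defaults;
}

struct MeshLike {
  Options& options; // always refers to a valid object after construction
  const bool include_corner_cells;

  // Resolve nullptr in the initialisation list, so `options` can be used
  // to initialise the const member that is declared after it.
  explicit MeshLike(Options* opt)
      : options(opt != nullptr ? *opt : default_options()),
        include_corner_cells(options.include_corner_cells) {}
};
```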
Force-pushed from 63b3846 to 5b4dd27 (compare)
I've refactored now to implement corners by doing communications first in the y-direction then in the x-direction. Across 4 nodes on Marconi, communication seems to take about the same time as on current. I've also added the boundary regions @cmacmackin mentioned.
@johnomotani This is an interesting result! Any chance the data and generating code could be shared? Maybe stick it in At some point, I will get around to making a
@ZedThree I added the code in
Here are some logs for simulations with 2x2 blocks on each processor:
Here are some with a larger grid (
@johnomotani Thanks, I can confirm that the regions are now behaving correctly. I have already pulled this work into #1803, where it seems to be working.
@johnomotani The tests you sent me behave strangely, and behave better if you set the time derivative I haven't got the new results off Marconi before it went down ... so will post again tomorrow. In your case there are several processors that spend twice as much time in calc and also more time in solving. As such the timings are not that reliable, as all the other procs need to wait on the slow ones in comm.
Updated the way FakeParallelMesh works so that it reflects the new approach to communication implemented in #1885, guaranteeing consistent corner cells.
Geometric fields that are loaded from the grid by Coordinates previously had the corner boundary cells set to NaN. These should now either be read in, or filled by extrapolating boundary conditions if include_corner_cells=true, so do not set them to NaN in that case. The NaNs could cause floating-point exception errors when --enable-sigfpe is used.
Call MPI_Barrier() before mesh->communicate() and only time mesh->communicate(); this ensures different execution times on different processors do not give misleading timings.
@dschwoerer has a good point. I've updated the performance test to have an With that update, I get (using a 192x128x128 grid on 4 nodes of Marconi with nxpe=48 so both x- and y-directions are split onto 2 nodes):
I still don't understand why the version with corner communications but everything overlapping is so much slower, but it seems to be faster than the x-then-y version (although not massively); the numbers above are from repeating each run twice, and the timings of the communications seem fairly consistent. I propose to reset this PR to the previous, overlapping-communications version (and cherry-pick the last 4 commits which are still needed/useful). Any objections? As an argument against, the non-overlapping implementation is much simpler, and we don't usually spend much time in
src/mesh/coordinates.cxx (outdated diff context):

```cpp
  result(i, localmesh->LocalNy - 1 - j) = BoutNaN;
  result(localmesh->LocalNx - 1 - i, j) = BoutNaN;
  result(localmesh->LocalNx - 1 - i, localmesh->LocalNy - 1 - j) = BoutNaN;
  if (not localmesh->include_corner_cells) {
```
What is the point?
If they are not included, they shouldn't be used.
If you think they are invalid, then you should set them to NaN in all cases.
This seems to me like it is causing unnecessary breakage.
Especially if this is generated from an expression, like 1.0, there is absolutely nothing wrong with the corner cells, and setting them to NaN has caused issues for me several times in the past.
A fair point, I think. If include_corner_cells is true, then the corners will be communicated later, but they will remain invalid until that communication.
I'm not sure we either currently do or should treat fields generated from expressions differently, so probably the best thing to do is to always set them to NaN.
My issue is that in the coords3d branch I removed the setting to NaN, so that dz is constant if it is generated from a constant expression. Another option would be to extrapolate them if we extrapolate in x and y; however, that would be more work ...
Could wrap setting them in #if CHECK > 0 (level to be debated)?
What do you do if dz is not generated from a constant expression? And do you need dz in the corners?
At some point we're also going to add guards and boundaries for z, so this may get even more fun.
Wrapping in CHECK is probably useful anyway; > 0 is probably fine because it is only used in the init. However, that does not solve this for coords3d.
I check whether dz (or zlength()) is constant. Currently the check is for RGN_ALL, as then no further checks are needed on what the actually used domain is, and the first element is used.
In the case of a non-constant dz, any of the routines that assume a constant dz/zlength() cannot be used, so dz doesn't need to be constant, and corner cells are no more important than in any other case.
It is probably best to check in all cases what domain is used, and only check constantness for this domain, rather than RGN_ALL, but dz(0,0) is so much easier than dz[dz.getRegion(...).begin()[0]].
To do things right, I think we should:
- extrapolate into corners if they are to be used, otherwise leave them. The motivation for not setting them to NaN is that they might be generated from a grid or input which already includes guards and corners. We also do not break boundaries if they are not extrapolated.
- use appropriate regions in coord3d
> It is probably best to check in all cases what domain is used, and only check constantness for this domain, rather than RGN_ALL, but dz(0,0) is so much easier than dz[dz.getRegion(...).begin()[0]]

Might be wise to wrap this up in something like constant_dz(const string& region) that throws if it's not constant over that region.

> extrapolate into corners, if they are to be used, otherwise leave them

Is that not what is done in this change? This flag doesn't say whether or not to use the corners, just whether or not they are communicated.
We could of course mandate that grids for v5 provide values in the boundaries, which removes all this extrapolation stuff. That might just be moving the problem up a level.
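The suggested helper might look something like the following hypothetical sketch. A vector of values stands in for the field evaluated over a region; a real version would take a region name and iterate the field instead:

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical sketch of the proposed constant_dz helper: return the dz
// value if it is constant over the given samples, otherwise throw.
inline double constant_dz(const std::vector<double>& dz_over_region) {
  if (dz_over_region.empty()) {
    throw std::invalid_argument("constant_dz: empty region");
  }
  for (double value : dz_over_region) {
    if (value != dz_over_region.front()) {
      throw std::runtime_error("constant_dz: dz is not constant over region");
    }
  }
  return dz_over_region.front();
}
```

Throwing keeps callers honest: any routine that assumes a constant dz fails loudly instead of silently using dz(0,0).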
> It is probably best to check in all cases what domain is used, and only check constantness for this domain, rather than RGN_ALL, but dz(0,0) is so much easier than dz[dz.getRegion(...).begin()[0]]

> Might be wise to wrap this up in something like constant_dz(const string& region) that throws if it's not constant over that region.

👍 will do

> extrapolate into corners, if they are to be used, otherwise leave them

> Is that not what is done in this change? This flag doesn't say whether or not to use the corners, just whether or not they are communicated.

I do not see any corner extrapolation. Either they are set to NaN (if !include_corner_cells) or otherwise left as they are. With the methods from the non-uniform boundary stuff, it shouldn't be that hard to extend this into 2D/3D to allow extrapolation into corners.

> We could of course mandate that grids for v5 provide values in the boundaries, which removes all this extrapolation stuff. That might just be moving the problem up a level.

Certainly that would be easier for the initialization. Also the extrapolation would only need to be done once, at grid creation time. Writing it as a script might be easier, because we don't have to think of all possible cases at once.
Also with the extrapolation, I feel uncertain about whether we should include the grid spacing. For dx, dy, dz that is non-trivial, but might be more accurate for the other ones ...
Sorry, all the corner-cell handling is rather awful. I probably need more comments/documentation...
If include_corner_cells == true, we should be extrapolating into the corners. This PR modifies the boundary regions so that if include_corner_cells == true then the y-boundaries include the corner cells. As the y-boundaries are added after the x-boundaries, the for (auto& bndry : localmesh->getBoundaries()) { loop above should first extrapolate into the x-boundaries, then into the y-boundaries, including extrapolating into corner cells using values from x-boundary cells.
I'd be happy to set the corners to NaN only when we are extrapolating into some boundary cells. Otherwise there is correct data in the corner cells, which there is no reason to overwrite. Keeping it can occasionally be useful: for example when using staggered grids, there is a case when we want to invert a Laplacian at Ny+1, and the non-uniform grid correction breaks unless there is valid data in the corner cells.
```cpp
  Field3D f;
};
// ...
BOUTMAIN(TestCommunications);
```
Future work: could this be ported to a unit test using a mock MPI?
The corner cells are only invalid when include_corner_cells==false and we are extrapolating into either or both the x- and y-boundaries. Otherwise they are valid and there is no reason to set them to NaN.
Use custom main() instead.
* next: (95 commits)
  - Format test-yupdown-weights runtest
  - Fix test-yupdown-weights
  - Add y-communications in Hermite spline interpolations
  - Exclude corner cells in LaplaceXY
  - Fix options for test-yupdown-weights
  - Added test-yupdown-weights to CMake build
  - Changes requested in code review of PR #1885
  - Set twistshift=true to fix test-yupdown-weights
  - Changed ParallelTransform::positionsAndWeights to PositionsAndWeights
  - Update test-yupdown-weights for v4.3
  - Getters for interpolation weights moved to ParallelTransforms
  - Add test for constructing derivatives from Hermite weights.
  - Declare result where it is first assigned
  - Fix elm-pb
  - Only set corner cells of Coordinates fields to NaN if CHECK>0
  - Simplify performance communications test, removing PhysicsModel
  - Only set corner cells of Coordinates fields to NaN when necessary
  - Update to 6field-simple example to use interpolation_xz.hxx header
  - Remove option to use CtoL and LtoC operators in 6field-simple example
  - Attempt to update staggered grid support in gravity_reduced example
  - ...
To properly support non-orthogonal grids (where the x- and y-directions are not orthogonal) we should properly fill the corner cells which are needed by mixed-derivative stencils (i.e. D2DXDY). This PR addresses this by adding communication of corner guard cells, and updating boundary conditions to apply something to the corner boundary cells.

The communications are defined to make the corner cells consistent with what would happen if there was no processor boundary (in particular, y-processor boundaries are required at X-points, but the separatrices can be anywhere in the x-domain, either on processor boundaries or not).

The corner boundary points are set by applying a y-boundary condition using x-boundary points as input. This is a somewhat arbitrary choice. It requires applying x-boundary conditions before y-boundary conditions (which happened to be the existing behaviour anyway). User-code boundary conditions (e.g. sheath boundary conditions) need to take care with the order. The y-boundary iterators have been updated to include the corner boundary points, so user-implemented boundary conditions should not need any special handling of the corners, just care that x-boundary conditions have been applied before they are called.

There is an option mesh:include_corner_cells which can be set to false to revert to the original behaviour, which is correct for orthogonal grids and might be slightly more optimized for that case.

WIP because tests of the new communications are still needed.

Possible backport candidate (maybe with the default of include_corner_cells set to false)? If we want to back-port the 3d Laplace solvers to v4.4 then this will be needed.