
Feature p4est find partition - Find MPI ranks of point owners in distributed meshes #12341

Merged
merged 1 commit into from Oct 15, 2021

Conversation

konsim83
Contributor

@konsim83 konsim83 commented May 28, 2021

To open the gate for massively parallel semi-Lagrangian simulations (with high Courant numbers) one may need to trace back points across partition boundaries of distributed meshes. These points may be located in cells outside the ghost layer of a process or even outside the mesh.

The purpose of this feature is to introduce a first function that can find the rank of the MPI process that owns the cell in which a point is located. This is done by interfacing p4est's p4est_search_partition(...) function with appropriate callbacks. Mathematically, this amounts to a search across all trees and, within each tree, along the Morton curve, utilizing meta information communicated throughout the forest.

@marcfehling
Member

marcfehling commented May 28, 2021

In case you need it: If you just need to find the MPI rank a cell belongs to, you can use GridTools::get_subdomain_association(Tria, CellId) introduced in #11366. For p:d:T it uses p4est_comm_find_owner internally which does not require iterating over all cells.

@konsim83
Contributor Author

konsim83 commented Jun 5, 2021

Thanks for the hint @marcfehling. This feature is slightly different: it finds the rank of the process that owns the cell containing an arbitrary point (if there is such a cell), even when the querying rank does not have (all) information about other parts of the triangulation. I will try to be quick to come up with tests.

@konsim83 konsim83 marked this pull request as ready for review June 11, 2021 15:14
@konsim83
Contributor Author

Hi @tjhei, as I promised, here the new feature. I guess it works, at least it does so on meshes that have the default manifold attached. (Where can I check that?) I will be happy about comments and hints.

I have removed the draft flag and turned it into a PR. The project is also ready to test now.

@bangerth maybe this PR is of relevance to you, too.

@bangerth
Member

I have a feeling like this could be done in a simpler way using the functionality that sits inside GridTools::distributed_compute_point_locations() that was recently added. Take a look at how that function is implemented in grid_tools.cc, I think it does what you want.

@konsim83
Contributor Author

konsim83 commented Jun 12, 2021


Hi @bangerth, I was not aware of this function, but as far as I understand, this PR introduces a slightly different tool. The GridTools::distributed_compute_point_locations() function communicates. The p4est interface is free of communication (the p4est internals communicate some meta information of the mesh) and hence reduces this overhead drastically. It is a top-down search that terminates if the point is found in a cell (not necessarily active, or a leaf in p4est terms) that has children owned by only one processor. There are more differences: the query points on each rank (which can also differ between ranks) can be anywhere in the mesh globally. Also, no bounding boxes are required.

I guess you refer to #5414 and the related #5458.

FYI @cburstedde, maybe a question where you could help: the tests fail in the release config but not in debug. My hunch is that I am doing something wrong with the sc_array.

@cburstedde

Hi @konsim83, your understanding of the functionality is pretty much spot-on. The p4est_search_partition is communication free and ideal for large CFL type simulations. It does not even require a ghost layer.

What exactly is the fail you're getting? Please feel free to isolate / discuss in the p4est repository.

Review thread (resolved): source/distributed/tria.cc
@konsim83
Contributor Author

It would be great if a Deal.II maintainer would add the "ready to test" label, so I can be confident that the bug I am chasing does not come from my installation of p4est.

@jppelteret
Member

There you go :-)

/rebuild

@konsim83
Contributor Author

Btw, am I as a contributor able to set labels like "documentation" or "bug" myself or upon request?

@peterrum
Member

peterrum commented Jun 19, 2021

Btw, am I as a contributor able to set labels like "documentation" or "bug" myself

Nope ;)

@konsim83
Contributor Author

konsim83 commented Jun 20, 2021

/jenkins/workspace/dealii_PR-12341/include/deal.II/distributed/p4est_wrappers.h:99:41: error: ‘p4est_search_partition_t’ does not name a type
       using search_partition_callback = p4est_search_partition_t;

This should be defined in the header p4est_search.h. Could it be that the p4est version used for testing is a bit too old to contain the searches?

@masterleinad
Member

This should be defined in the header p4est_search.h. Could it be that the p4est version used for testing is a bit too old to contain the searches?

Looks like at least version 2.2 is required (https://github.com/cburstedde/p4est/blob/v2.1/src/p4est_search.h vs. https://github.com/cburstedde/p4est/blob/v2.2/src/p4est_search.h).

@konsim83
Contributor Author

This should be defined in the header p4est_search.h. Could it be that the p4est version used for testing is a bit too old to contain the searches?

Looks like at least version 2.2 is required (https://github.com/cburstedde/p4est/blob/v2.1/src/p4est_search.h vs. https://github.com/cburstedde/p4est/blob/v2.2/src/p4est_search.h).

I see, it seems like my local version is newer. But that means that even if I fix the bug, the CI has no chance to pass.

@cburstedde

cburstedde commented Jun 21, 2021 via email

@cburstedde

cburstedde commented Jun 21, 2021 via email

@tjhei
Member

tjhei commented Jun 21, 2021

If you are running into issues in release mode, it could be because we don't set vertex positions in p4est in release mode, only in debug mode.

@konsim83
Contributor Author

konsim83 commented Jun 21, 2021

Any chance to compute the vertices if they are not set?

@konsim83
Contributor Author

Thanks for the hint @tjhei, I should have read the code more thoroughly. I guess you mean these lines

https://github.com/dealii/dealii/blob/master/source/distributed/tria.cc#L1802

which occur in a similar way in several places in this file. Is it critical not to set the vertices in release mode?

@tjhei
Member

tjhei commented Jun 22, 2021

I have to admit that we did this a long time ago without checking memory and performance implications. As this information is not used anywhere, we decided not to provide it.

@konsim83
Contributor Author

I have to admit that we did this a long time ago without checking memory and performance implications. As this information is not used anywhere, we decided not to provide it.

Would it make sense if I compared the performance in terms of timing and memory in some tests, maybe on a small cluster? Another option would be to introduce another #define P4EST_USE_VERTICES_RELEASE_MODE or so that could be set in the config phase. What is your opinion?

@cburstedde

cburstedde commented Jun 22, 2021 via email

@konsim83
Contributor Author

If you are running into issues in release mode, it could be because we don't set vertex positions in p4est in release mode, only in debug mode.

@tjhei, I checked it. The cause of the failure in release mode is exactly this. As far as I understand, the coarsest mesh is available on each processor, right? Is there any simple way to compute the vertex information from this efficiently? (Maybe only for default manifolds, for a start.) Sorry if this question is dumb, but I am not familiar with all the internals here.

What would be a good way to proceed with this, then? In particular, keeping @cburstedde's last comment in mind.

@tjhei
Member

tjhei commented Jun 22, 2021

What would be a good way to proceed with this then

You need the vertices to answer if a point is within a quadrant, right? Would it be possible to use the deal.II functionality to answer the question instead?

@konsim83
Contributor Author

What would be a good way to proceed with this then

You need the vertices to answer if a point is within a quadrant, right? Would it be possible to use the deal.II functionality to answer the question instead?

Sorry, @tjhei, for the late reply. @cburstedde Please correct me if I am wrong with the following.

Yes, to locate a point you need the vertices. Please note that the p4est algorithm is free of any communication. But of course you cannot hold all points of the mesh on one processor. According to my understanding of p4est (and I am for sure not an expert on it), locating a point is possible since each MPI process knows about the physical location of its first quadrant. This gives you the partition boundaries on the space-filling curve, and you can exclude entire trees from the search when going top-down from coarse to fine.

Once you refine the mesh you may need to rebalance, and this invokes communicating the new location of the first quadrant of each process. I did not find any deal.II function that does that, right? My feeling here is that having the vertices in p4est as in the debug config may be redundant, but it is not a big overhead and can enable users to find arbitrary points (or more general objects, through modifying the callback wrappers) in the partition without communication during the search. I also have the feeling that doing it directly in deal.II would only reimplement what is already available in p4est. I personally find these searches really useful since you can minimize communication (in particular, no bad all-to-all communication is necessary) in semi-Lagrangian simulations, which are a big deal in climate- and weather-related applications. This could even lift them to another level, since you do not need ghost layers and hence the CFL number can be larger than in state-of-the-art algorithms.

My suggestion would be to also set the vertices in release mode. The change is minor, and the additional memory overhead caused by this little redundancy is not too big. It will still hide the p4est backend from the deal.II user. Also keep in mind that the bounding-box-based solution currently implemented does not scale very well with the number of processes. Opinions on this?

Last remark: a case that I do not fully understand is when a manifold other than the default manifold is attached, e.g., a CAD-based geometry. Are vertices communicated to p4est after applying the manifold projection (after refinement)? As far as I understand, yes, which would be good.

Best regards and thanks for your patience,
Konrad

@konsim83
Contributor Author

I have to admit that we did this a long time ago without checking memory and performance implications. As this information is not used anywhere, we decided not to provide it.
This policy makes a lot of sense imo, since it forces both deal.II and application developers to handle geometry the deal.II way only.

Right, I think this p4est internal geometry handling should be hidden from the user. But I do think that the way p4est_search_partition() is wrapped does hide the internals.

Would it make sense if I compared the performance in terms of timing and memory in some tests, maybe on a small cluster? Another option would be to introduce another #define P4EST_USE_VERTICES_RELEASE_MODE or so that could be set in the config phase. What is your opinion?
I don't know much about deal.II geometries, but since most applications would use deal.II specific geometry code, it appears least redundant to me to use such code in the new functionality and not to rely on p4est-internal vertices. The deal.II way will allow us to stop maintaining vertex logic in the future.

In deal.II's debug mode you still have the vertex info. Having it also in release mode is redundant, yes, but without it one would need to reimplement the search in deal.II terms, and that would be a lot more work, I guess.

@cburstedde

Whatever works for you! If you feel like using the vertices and p4est_qcoord_to_vertex, that is certainly fine by me.

@tjhei
Member

tjhei commented Jun 27, 2021

Last remark: A case that I do not fully understand is the case when a manifold other than the default manifold is attached, e.g., CAD based geometry or so. Are vertices after applying the manifold projection (after refinement) communicated to p4est? As far as I understand yes which would be good.

We only ever communicate the coarse mesh vertices to p4est. If you have a manifold attached, new vertices when refining will be moved to conform to the manifold. p4est does not know about any of that.
This is why we see the vertex information as a debugging tool (in fact, we only ever decided to send them to be able to use the p4est vtk output when testing our code at the very beginning).

If you need to go this way, I would suggest to add a settings flag to the Triangulation (rather than a configuration macro).

@cburstedde

Wrt. the earlier question of @peterrum, for local leaves and entirely local branches, the partition search returns exactly one owner. For higher branches in the tree, we return a range of processes from pfirst to plast. This may happen to be a single process or more than one. For example, one process is returned for a branch that is on exactly one remote process.

@cburstedde

Wrt. the earlier question of @peterrum, for local leaves and entirely local branches, the partition search returns exactly one owner. For higher branches in the tree, we return a range of processes from pfirst to plast. This may happen to be a single process or more than one. For example, one process is returned for a branch that is on exactly one remote process.

To add to this, users are free to return true for a point in multiple quadrants. Accordingly, they may get multiple owner processes for any given point, local or not.

@konsim83
Contributor Author

konsim83 commented Oct 1, 2021

My question would be how we could call the new function at these two places. I guess we could introduce a new version of guess_point_owner() that takes a Triangulation instance; but how, and who, decides whether this new function can be called? It depends on the type of the mapping/manifold and the triangulation type. Should there be a dedicated function? Or does one need to dynamic_cast, which would be unfortunate once the function is implemented for other Triangulation types? Would one create a check around this code:

std::vector<BoundingBox<spacedim>> local_boxes;
for (const auto &cell : tria.active_cell_iterators())
  if (cell->is_locally_owned())
    local_boxes.push_back(mapping.get_bounding_box(cell));

// create r-tree of bounding boxes
const auto local_tree = pack_rtree(local_boxes);

// compress r-tree to a minimal set of bounding boxes
const auto local_reduced_box = extract_rtree_level(local_tree, rtree_level);

// gather bounding boxes of other processes
const auto global_bboxes =
  Utilities::MPI::all_gather(tria.get_communicator(), local_reduced_box);

const GridTools::Cache<dim, spacedim> cache(tria, mapping);
const auto data = GridTools::internal::distributed_compute_point_locations(
  cache, points, global_bboxes, tolerance, true, enforce_unique_mapping);

and only set up the BBs if the new guess_point_owner is supported?

The p4est function seems to return at most one owner for each point. Many algorithms rely on all owners of a vertex being known (e.g., in the context of DG). I guess if one would like to have the same effect as the current internal::guess_point_owner, one would need to add an optional communication step to guarantee that the behavior is the same. Question: how does the new function determine the owner of a vertex with multiple potential owners? If one enables enforce_unique_mapping in

const bool enforce_unique_mapping)

the lowest rank seizes ownership. Is this similar to p4est?

@peterrum The function that I implemented is only the first step toward what you refer to in your comments (which name important applications of this new interface). Currently the new function is implemented through an interface that returns exactly one process per point. The implementation of the C-callback responsible for the decision on what you return is flexible as @cburstedde mentioned and can be extended to, e.g., a DG usecase.

IMHO, incorporating this function or a modified version of its internals into, e.g., the ParticleHandler class is definitely worth it but deserves some more planning and another PR.

@marcfehling, @tjhei and @peterrum: Many thanks for the reviews. I will take care of incorporating your comments :-)

Contributor Author

@konsim83 konsim83 left a comment

I incorporated the comments. What will we do with the version of p4est? The CI does not run the tests since the search functions are not available.

Review threads (all resolved):
include/deal.II/distributed/tria.h
doc/news/changes/minor/20210924Simon
include/deal.II/distributed/p4est_wrappers.h
source/distributed/tria.cc
tests/mpi/p4est_find_point_owner_rank.h
@konsim83
Contributor Author

konsim83 commented Oct 1, 2021

write_mesh_vtk

@tjhei and @marcfehling I added an AssertThrow to all calls I could find (2x in tests and in write_mesh_vtk()).

Member

@marcfehling marcfehling left a comment

I removed the empty output file manually using the GitHub interface.

Would you still incorporate what you reverted in 7952529, but fix it with what I suggested in #12341 (review)? In a follow-up PR (not this one), we can then change the behavior of deal.II and remove the distinction between debug and release mode.

Then this PR is good to be merged in my opinion. We should address @peterrum's important points #12341 (review) in a follow-up PR as you suggested in #12341 (comment).

In terms of CI: we either need to kindly ask @tjhei to update p4est, or rely on the word of someone else that the testsuite passes on a manual run. Since I already have your branch on my workstation (while I tried to fix that static_cast-enum problem), I can do that without much overhead.

(EDIT: I think I got you wrong the first time. The CI worker does not run the whole testsuite. However, we need to skip these tests when P4EST_SEARCH_LOCAL is not set. Hmm, we need to tell cmake about it like in #9266. Hmm, I don't know straight away how to do it.)

(EDIT EDIT: Or we add an ifdef P4EST_SEARCH_LOCAL in these tests and provide an empty alternative output file. This would be the cheap solution.)

@marcfehling
Member

marcfehling commented Oct 1, 2021

All the 3D tests fail on my workstation with p4est-2.2 in debug mode. In release mode however, they pass.

4306: mpi/p4est_3d_find_point_owner_rank_01.mpirun=1.debug: RUN failed. ------ Partial output:
4306: DEAL:0:3D-Test:: Point search on subdivided hyper cube...
4306: DEAL:0:3D-Test::   Number of MPI processes = 1
4306: DEAL:0:3D-Test::   Number of this MPI processes = 0
4306: DEAL:0:3D-Test::   Number of cells = 4096
4306: DEAL:0:3D-Test::   Number of levels = 4
4306: DEAL:0:3D-Test::   Number of global levels = 4
4306: DEAL:0:3D-Test::   Triangulation checksum = 544659457
4306: DEAL:0:3D-Test::   points = 0.230000 0.770000 0.770000    
4306: 
4306: mpi/p4est_3d_find_point_owner_rank_01.mpirun=1.debug: RUN failed. ------ Additional output on stdout/stderr:
4306: 
4306: Abort: Assertion 'x >= 0 && x <= P4EST_ROOT_LEN'
4306: Abort: .../bin/packages/p4est-2.2/src/p4est.c:111
4306: Abort

@konsim83 -- Can you investigate?

@konsim83
Contributor Author

konsim83 commented Oct 2, 2021

All the 3D tests fail on my workstation with p4est-2.2 in debug mode. In release mode however, they pass.

4306: mpi/p4est_3d_find_point_owner_rank_01.mpirun=1.debug: RUN failed. ------ Partial output:
4306: DEAL:0:3D-Test:: Point search on subdivided hyper cube...
4306: DEAL:0:3D-Test::   Number of MPI processes = 1
4306: DEAL:0:3D-Test::   Number of this MPI processes = 0
4306: DEAL:0:3D-Test::   Number of cells = 4096
4306: DEAL:0:3D-Test::   Number of levels = 4
4306: DEAL:0:3D-Test::   Number of global levels = 4
4306: DEAL:0:3D-Test::   Triangulation checksum = 544659457
4306: DEAL:0:3D-Test::   points = 0.230000 0.770000 0.770000    
4306: 
4306: mpi/p4est_3d_find_point_owner_rank_01.mpirun=1.debug: RUN failed. ------ Additional output on stdout/stderr:
4306: 
4306: Abort: Assertion 'x >= 0 && x <= P4EST_ROOT_LEN'
4306: Abort: .../bin/packages/p4est-2.2/src/p4est.c:111
4306: Abort

@konsim83 -- Can you investigate?

@marcfehling thank you for the hint, I need to test it with another version anyway. I am on it.

Contributor Author

@konsim83 konsim83 left a comment

Incorporated suggestions.

Review threads (all resolved): source/distributed/tria.cc
@marcfehling
Member

The find_point_owner specific tests pass! Thank you.


Now we still need to figure out a way to hide your tests from incompatible versions of p4est, i.e., if we run the testsuite with p4est 2.0, which is the current minimal requirement. P4EST_SEARCH_LOCAL was introduced with p4est 2.2.

We discussed ideas for this in #12341 (review). I think the cheapest solution, with alternative output files, is the way to go here. I would not rewrite the cmake module for p4est to check its configuration. @tamiko -- What do you think?

@marcfehling
Member

marcfehling commented Oct 13, 2021

@konsim83 -- Could you review and merge the two open PRs against this feature branch?

@konsim83
Contributor Author

konsim83 commented Oct 13, 2021

@konsim83 -- Could you review and merge the two open PRs against this feature branch?

Sorry, it was on my list, then I forgot. It is merged now. Many thanks for the work! :-) Let me know when everything is ready so I can squash my commits.

@konsim83
Contributor Author

@marcfehling The windows 2019 check fails and I do not see why. Any idea?

@marcfehling
Member

@marcfehling The windows 2019 check fails and I do not see why. Any idea?

Just a hiccup of GitHub Actions. Re-running the tests helped -- there are no errors.

@konsim83
Contributor Author

OK, squashed everything. Actual use cases of this function are about to follow. We already implemented them and now would like to share them with the community :-)

Member

@marcfehling marcfehling left a comment

Looks good to me!

There are just two things that we should address in future pull requests as this one is already large enough:

I looked over this PR for quite some time, and feel like I'm less likely to find anything else. I would prefer someone else looking over this PR one more time before merging.

@konsim83
Contributor Author

Many thanks @marcfehling for your valuable comments and add-ons. Thumbs up! @bangerth @tjhei @peterrum @cburstedde, or anyone else that I forgot to mention: any additional comments?

@peterrum peterrum self-requested a review October 14, 2021 06:21
Member

@peterrum peterrum left a comment

This PR looks ready to be merged! Very nice; thanks for the efforts! I will merge tomorrow morning CEST to give others the chance to express their opinions. The changes are very much localized in p:d:T and do not affect the rest of the code. A better integration of the introduced functionality into the rest of the library would be very nice, but needs the joint involvement of more people!

@konsim83 @cburstedde Thanks for the explanations and sorry for the silence -- vacations and busy days since then! I remember reading "Parallel tree algorithms for AMR and non-standard data access" and "Parallel level-set methods on adaptive tree-based grids" quite a while ago; it is nice to see algorithms from there now also in deal.II!

@peterrum peterrum merged commit 0159f37 into dealii:master Oct 15, 2021
8 participants