
Feature p4est find partition - Find MPI ranks of point owners in distributed meshes #12341

Merged
merged 1 commit into from Oct 15, 2021

Conversation

konsim83
Contributor

@konsim83 konsim83 commented May 28, 2021

To open the gate for massively parallel semi-Lagrangian simulations (with high Courant numbers) one may need to trace back points across partition boundaries of distributed meshes. These points may be located in cells outside the ghost layer of a process or even outside the mesh.

The purpose of this feature is to introduce a first function that can find the rank of the MPI process that owns the cell in which a point is located. This is done by interfacing p4est's p4est_search_partition(...) function with appropriate callbacks. Mathematically, this amounts to a search across all trees and, within each tree, along the Morton curve, utilizing meta information communicated throughout the forest.

@marcfehling
Member

marcfehling commented May 28, 2021

In case you need it: If you just need to find the MPI rank a cell belongs to, you can use GridTools::get_subdomain_association(Tria, CellId) introduced in #11366. For p:d:T it uses p4est_comm_find_owner internally which does not require iterating over all cells.

@konsim83
Contributor Author

konsim83 commented Jun 5, 2021

Thanks for the hint @marcfehling. This feature is slightly different: it finds the rank of the process that owns the cell containing an arbitrary point (if there is such a cell), even when the querying rank does not have (all) information about other parts of the triangulation. I will try to be quick to come up with tests.

@konsim83 konsim83 marked this pull request as ready for review June 11, 2021 15:14
@konsim83
Contributor Author

Hi @tjhei, as I promised, here the new feature. I guess it works, at least it does so on meshes that have the default manifold attached. (Where can I check that?) I will be happy about comments and hints.

I have removed the draft flag and turned it into a PR. The project is also ready to test now.

@bangerth maybe this PR is of relevance to you, too.

@bangerth
Member

I have a feeling like this could be done in a simpler way using the functionality that sits inside GridTools::distributed_compute_point_locations() that was recently added. Take a look at how that function is implemented in grid_tools.cc, I think it does what you want.

@konsim83
Contributor Author

konsim83 commented Jun 12, 2021


Hi @bangerth, I was not aware of this function, but as far as I understand, this PR introduces a slightly different tool. The GridTools::distributed_compute_point_locations() function communicates. The p4est interface is free of communication (the p4est internals communicate some meta information of the mesh) and hence reduces this overhead drastically. It is a top-down search that terminates if the point is found in a cell (not necessarily active, or a leaf in p4est terms) that has children owned by only one processor. There are more differences: the query points on each rank (which can also differ between ranks) can be anywhere in the mesh globally. Also, no bounding boxes are required.

I guess you refer to #5414 and the related #5458.

FYI @cburstedde, maybe a question where you could help: the tests fail in the release config but not in debug. My hunch is that I am doing something wrong with the sc_array.

@cburstedde

Hi @konsim83, your understanding of the functionality is pretty much spot-on. The p4est_search_partition is communication free and ideal for large CFL type simulations. It does not even require a ghost layer.

What exactly is the fail you're getting? Please feel free to isolate / discuss in the p4est repository.

Review thread (resolved): source/distributed/tria.cc
@konsim83
Contributor Author

It would be great if a Deal.II maintainer would add the "ready to test" label, so I can be confident that the bug I am chasing does not come from my installation of p4est.

@jppelteret
Member

There you go :-)

/rebuild

@konsim83
Contributor Author

Btw, am I as a contributor able to set labels like "documentation" or "bug" myself or upon request?

@peterrum
Member

peterrum commented Jun 19, 2021

Btw, am I as a contributor able to set labels like "documentation" or "bug" myself

Nope ;)

@konsim83
Contributor Author

konsim83 commented Jun 20, 2021

/jenkins/workspace/dealii_PR-12341/include/deal.II/distributed/p4est_wrappers.h:99:41: error: ‘p4est_search_partition_t’ does not name a type
       using search_partition_callback = p4est_search_partition_t;

This should be defined in the header p4est_search.h. Could it be that the p4est version used for testing is a bit too old to contain the searches?

@masterleinad
Member

This should be defined in the header p4est_search.h. Could it be that the p4est version used for testing is a bit too old to contain the searches?

Looks like at least version 2.2 is required (https://github.com/cburstedde/p4est/blob/v2.1/src/p4est_search.h vs. https://github.com/cburstedde/p4est/blob/v2.2/src/p4est_search.h).

@konsim83
Contributor Author

This should be defined in the header p4est_search.h. Could it be that the p4est version used for testing is a bit too old to contain the searches?

Looks like at least version 2.2 is required (https://github.com/cburstedde/p4est/blob/v2.1/src/p4est_search.h vs. https://github.com/cburstedde/p4est/blob/v2.2/src/p4est_search.h).

I see, it seems like my local version is newer. But that means that even if I fix the bug, the CI has no chance to pass.

@cburstedde

cburstedde commented Jun 21, 2021 via email

@cburstedde

cburstedde commented Jun 21, 2021 via email

@tjhei
Member

tjhei commented Jun 21, 2021

If you are running into issues in release mode, it could be because we don't set vertex positions in p4est in release mode, only in debug mode.

@konsim83
Contributor Author

konsim83 commented Jun 21, 2021

Any chance to compute the vertices if they are not set?

@konsim83
Contributor Author

Thanks for the hint @tjhei, I should have read the code more thoroughly. I guess you mean these lines

https://github.com/dealii/dealii/blob/master/source/distributed/tria.cc#L1802

which occur in a similar way in several places in this file. Is it critical not to set the vertices in release mode?

@tjhei
Member

tjhei commented Jun 22, 2021

I have to admit that we did this a long time ago without checking memory and performance implications. As this information is not used anywhere, we decided not to provide it.

@konsim83
Contributor Author

I have to admit that we did this a long time ago without checking memory and performance implications. As this information is not used anywhere, we decided not to provide it.

Would it make sense if I compared the performance in terms of timing and memory in some tests, maybe on a small cluster? Another option would be to introduce another #define P4EST_USE_VERTICES_RELEASE_MODE or so that could be set in the config phase. What is your opinion?

@cburstedde

cburstedde commented Jun 22, 2021 via email

@konsim83
Contributor Author

If you are running into issues in release mode, it could be because we don't set vertex positions in p4est in release mode, only in debug mode.

@tjhei, I checked it. The cause of the failure in release mode is exactly this. As far as I understand, the coarsest mesh is available on each processor, right? Is there any simple way to compute the vertex information from this efficiently? (Maybe only for default manifolds, for a start.) Sorry if this question is dumb, but I am not familiar with all the internals here.

What would be a good way to proceed with this, then? In particular, keeping @cburstedde's last comment in mind.

@tjhei
Member

tjhei commented Jun 22, 2021

What would be a good way to proceed with this then

You need the vertices to answer if a point is within a quadrant, right? Would it be possible to use the deal.II functionality to answer the question instead?

@konsim83
Contributor Author

What would be a good way to proceed with this then

You need the vertices to answer if a point is within a quadrant, right? Would it be possible to use the deal.II functionality to answer the question instead?

Sorry, @tjhei, for the late reply. @cburstedde Please correct me if I am wrong with the following.

Yes, to locate a point you need the vertices. Please note that the p4est algorithm is free of any communication. But of course you cannot hold all points of the mesh on one processor. According to my understanding of p4est (and I am for sure not an expert on it), locating a point is possible since each MPI process knows about the physical location of its first quadrant. This gives you the partition boundaries on the space-filling curve, and you can exclude entire trees from the search when going top-down from coarse to fine.

Once you refine the mesh you may need to rebalance, and this invokes communicating the new location of the first quadrant of each process. I did not find any deal.II function that does that, right? My feeling here is that having the vertices in p4est as in the debug config may be redundant, but it is not a big overhead and can enable users to find arbitrary points (or more general objects, through modifying the callback wrappers) in the partition without communication during the search. I also have the feeling that doing it directly in deal.II would only reimplement what is already available in p4est. I personally find these searches really useful since you can minimize communication (in particular, no bad all-to-all communication is necessary) in semi-Lagrangian simulations, which are a big deal in climate- and weather-related applications. This could even lift them to another level, since you do not need ghost layers and hence the CFL number can be larger than in state-of-the-art algorithms.

My suggestion would be to also set the vertices in release mode. The change is minor, and the additional memory overhead caused by this little redundancy is not too big. It will still hide the p4est backend from the deal.II user. Also keep in mind that the bounding-box-based solution currently implemented does not scale very well with the number of processes. Opinions on this?

Last remark: a case that I do not fully understand is when a manifold other than the default manifold is attached, e.g., a CAD-based geometry. Are vertices communicated to p4est after applying the manifold projection (after refinement)? As far as I understand, yes, which would be good.

Best regards and thanks for your patience,
Konrad

@konsim83
Contributor Author

I have to admit that we did this a long time ago without checking memory and performance implications. As this information is not used anywhere, we decided not to provide it.
This policy makes a lot of sense imo, since it forces both deal.II and application developers to handle geometry the deal.II way only.

Right, I think this p4est internal geometry handling should be hidden from the user. But I do think that the way p4est_search_partition() is wrapped does hide the internals.

Would it make sense if I compared the performance in terms of timing and memory in some tests, maybe on a small cluster? Another option would be to introduce another #define P4EST_USE_VERTICES_RELEASE_MODE or so that could be set in the config phase. What is your opinion?
I don't know much about deal.II geometries, but since most applications would use deal.II specific geometry code, it appears least redundant to me to use such code in the new functionality and not to rely on p4est-internal vertices. The deal.II way will allow us to stop maintaining vertex logic in the future.

In deal.II's debug mode you still have the vertex info. Having it also in release mode is redundant, yes, but without it one would need to reimplement the search in deal.II terms, and that would be a lot more work, I guess.

@cburstedde

Whatever works for you! If you feel like using the vertices and p4est_qcoord_to_vertex, that is certainly fine by me.

@tjhei
Member

tjhei commented Jun 27, 2021

Last remark: A case that I do not fully understand is the case when a manifold other than the default manifold is attached, e.g., CAD based geometry or so. Are vertices after applying the manifold projection (after refinement) communicated to p4est? As far as I understand yes which would be good.

We only ever communicate the coarse mesh vertices to p4est. If you have a manifold attached, new vertices when refining will be moved to conform to the manifold. p4est does not know about any of that.
This is why we see the vertex information as a debugging tool (in fact, we only ever decided to send them to be able to use the p4est vtk output when testing our code at the very beginning).

If you need to go this way, I would suggest to add a settings flag to the Triangulation (rather than a configuration macro).

@cburstedde

Wrt. the earlier question of @peterrum, for local leaves and entirely local branches, the partition search returns exactly one owner. For higher branches in the tree, we return a range of processes from pfirst to plast. This may happen to be a single process or more than one. For example, one process is returned for a branch that is on exactly one remote process.

@cburstedde

Wrt. the earlier question of @peterrum, for local leaves and entirely local branches, the partition search returns exactly one owner. For higher branches in the tree, we return a range of processes from pfirst to plast. This may happen to be a single process or more than one. For example, one process is returned for a branch that is on exactly one remote process.

To add to this, users are free to return true for a point in multiple quadrants. Accordingly, they may get multiple owner processes for any given point, local or not.

@konsim83
Contributor Author

konsim83 commented Oct 1, 2021

My question would be how we could call the new function at these two places. I guess we could introduce a new version of guess_point_owner() that takes a Triangulation instance; but how, and who, decides whether this new function can be called? It depends on the type of the mapping/manifold and the triangulation type. Should there be a dedicated function? Or does one need to dynamic_cast, which would be unfortunate once the function is implemented for other Triangulation types? Would one create a check around this code:

std::vector<BoundingBox<spacedim>> local_boxes;
for (const auto &cell : tria.active_cell_iterators())
  if (cell->is_locally_owned())
    local_boxes.push_back(mapping.get_bounding_box(cell));

// create r-tree of bounding boxes
const auto local_tree = pack_rtree(local_boxes);

// compress r-tree to a minimal set of bounding boxes
const auto local_reduced_box = extract_rtree_level(local_tree, rtree_level);

// gather bounding boxes of other processes
const auto global_bboxes =
  Utilities::MPI::all_gather(tria.get_communicator(), local_reduced_box);

const GridTools::Cache<dim, spacedim> cache(tria, mapping);
const auto data = GridTools::internal::distributed_compute_point_locations(
  cache, points, global_bboxes, tolerance, true, enforce_unique_mapping);

and only set up the BBs if the new guess_point_owner is supported?

The p4est function seems to return at most one owner for each point. Many algorithms rely on all owners of a vertex being known (e.g., in the context of DG). I guess if one would like to have the same effect as the current internal::guess_point_owner, one would need to add an optional communication step to guarantee that the behavior is the same. Question: how does the new function determine the owner of a vertex with multiple potential owners? If one enables enforce_unique_mapping in

const bool enforce_unique_mapping)

the lowest rank seizes ownership. Is this similar to p4est?

@peterrum The function that I implemented is only the first step toward what you refer to in your comments (which name important applications of this new interface). Currently the new function is implemented through an interface that returns exactly one process per point. The implementation of the C-callback responsible for the decision on what you return is flexible as @cburstedde mentioned and can be extended to, e.g., a DG usecase.

IMHO, incorporating this function or a modified version of its internals into, e.g., the ParticleHandler class is definitely worth it but deserves some more planning and another PR.

@marcfehling, @tjhei and @peterrum: Many thanks for the reviews. I will take care of incorporating your comments :-)

Contributor Author

@konsim83 konsim83 left a comment

I incorporated the comments. What will we do with the version of p4est? The CI does not run the tests since the search functions are not available.

Review threads (all resolved):
include/deal.II/distributed/tria.h
doc/news/changes/minor/20210924Simon
include/deal.II/distributed/p4est_wrappers.h
source/distributed/tria.cc
tests/mpi/p4est_find_point_owner_rank.h
@konsim83
Contributor Author

konsim83 commented Oct 1, 2021

write_mesh_vtk

@tjhei and @marcfehling I added an AssertThrow to all calls I could find (2x in tests and in write_mesh_vtk()).

Member

@marcfehling marcfehling left a comment

I removed the empty output file manually using the GitHub interface.

Would you still incorporate what you reverted in 7952529, but fix it with what I suggested in #12341 (review)? In a follow-up PR (not this one), we can then change the behavior of deal.II and remove the distinction between debug and release mode.

Then this PR is good to be merged in my opinion. We should address @peterrum's important points #12341 (review) in a follow-up PR as you suggested in #12341 (comment).

In terms of CI: we either need to kindly ask @tjhei to update p4est, or rely on the word of someone else that the testsuite passes on a manual run. Since I already have your branch on my workstation (while I tried to fix that static_cast-enum problem), I can do that without much overhead.

(EDIT: I think I got you wrong the first time. The CI worker does not run the whole testsuite. However, we need to skip these tests when P4EST_SEARCH_LOCAL is not set. Hmm, we need to tell cmake about it like in #9266. Hmm, I don't know straight away how to do it.)

(EDIT EDIT: Or we add an ifdef P4EST_SEARCH_LOCAL in these tests and provide an empty alternative output file. This would be the cheap solution.)

@marcfehling
Member

marcfehling commented Oct 1, 2021

All the 3D tests fail on my workstation with p4est-2.2 in debug mode. In release mode however, they pass.

4306: mpi/p4est_3d_find_point_owner_rank_01.mpirun=1.debug: RUN failed. ------ Partial output:
4306: DEAL:0:3D-Test:: Point search on subdivided hyper cube...
4306: DEAL:0:3D-Test::   Number of MPI processes = 1
4306: DEAL:0:3D-Test::   Number of this MPI processes = 0
4306: DEAL:0:3D-Test::   Number of cells = 4096
4306: DEAL:0:3D-Test::   Number of levels = 4
4306: DEAL:0:3D-Test::   Number of global levels = 4
4306: DEAL:0:3D-Test::   Triangulation checksum = 544659457
4306: DEAL:0:3D-Test::   points = 0.230000 0.770000 0.770000    
4306: 
4306: mpi/p4est_3d_find_point_owner_rank_01.mpirun=1.debug: RUN failed. ------ Additional output on stdout/stderr:
4306: 
4306: Abort: Assertion 'x >= 0 && x <= P4EST_ROOT_LEN'
4306: Abort: .../bin/packages/p4est-2.2/src/p4est.c:111
4306: Abort

@konsim83 -- Can you investigate?

@konsim83
Contributor Author

konsim83 commented Oct 2, 2021

All the 3D tests fail on my workstation with p4est-2.2 in debug mode. In release mode however, they pass.

4306: mpi/p4est_3d_find_point_owner_rank_01.mpirun=1.debug: RUN failed. ------ Partial output:
4306: DEAL:0:3D-Test:: Point search on subdivided hyper cube...
4306: DEAL:0:3D-Test::   Number of MPI processes = 1
4306: DEAL:0:3D-Test::   Number of this MPI processes = 0
4306: DEAL:0:3D-Test::   Number of cells = 4096
4306: DEAL:0:3D-Test::   Number of levels = 4
4306: DEAL:0:3D-Test::   Number of global levels = 4
4306: DEAL:0:3D-Test::   Triangulation checksum = 544659457
4306: DEAL:0:3D-Test::   points = 0.230000 0.770000 0.770000    
4306: 
4306: mpi/p4est_3d_find_point_owner_rank_01.mpirun=1.debug: RUN failed. ------ Additional output on stdout/stderr:
4306: 
4306: Abort: Assertion 'x >= 0 && x <= P4EST_ROOT_LEN'
4306: Abort: .../bin/packages/p4est-2.2/src/p4est.c:111
4306: Abort

@konsim83 -- Can you investigate?

@marcfehling thank you for the hint, I need to test it with another version anyway. I am on it.

Contributor Author

@konsim83 konsim83 left a comment

Incorporated suggestions.

Review threads (all resolved): source/distributed/tria.cc
@marcfehling
Member

The find_point_owner specific tests pass! Thank you.


Now we still need to figure out a way to hide your tests from incompatible versions of p4est, i.e., if we run the testsuite with p4est 2.0, which is the current minimal requirement. P4EST_SEARCH_LOCAL was introduced with p4est 2.2.

We discussed ideas for this in #12341 (review). I think the cheapest solution, with alternative output files, is the way to go here. I would not rewrite the cmake module for p4est to check its configuration. @tamiko -- What do you think?

@marcfehling
Member

marcfehling commented Oct 13, 2021

@konsim83 -- Could you review and merge the two open PRs against this feature branch?

@konsim83
Contributor Author

konsim83 commented Oct 13, 2021

@konsim83 -- Could you review and merge the two open PRs against this feature branch?

Sorry, it was on my list, then I forgot. It is merged now. Many thanks for the work! :-) Let me know when everything is ready so I can squash my commits.

@konsim83
Contributor Author

@marcfehling The windows 2019 check fails and I do not see why. Any idea?

@marcfehling
Member

@marcfehling The windows 2019 check fails and I do not see why. Any idea?

Just a hiccup of GitHub Actions. Re-running the tests helped -- there are no errors.

@konsim83
Contributor Author

OK, squashed everything. Actual use cases of this function are about to follow. We already implemented them and now would like to share them with the community :-)

Member

@marcfehling marcfehling left a comment

Looks good to me!

There are just two things that we should address in future pull requests as this one is already large enough:

I looked over this PR for quite some time, and feel like I'm less likely to find anything else. I would prefer someone else looking over this PR one more time before merging.

@konsim83
Contributor Author

Many thanks @marcfehling for your valuable comments and add-ons. Thumbs up! @bangerth @tjhei @peterrum @cburstedde, or anyone else that I forgot to mention: any additional comments?

@peterrum peterrum self-requested a review October 14, 2021 06:21
Member

@peterrum peterrum left a comment

This PR looks ready to be merged! Very nice; thanks for the efforts! I will merge tomorrow morning CEST to give others the chance to express their opinions. The changes are very much localized in p:d:T and do not affect the rest of the code. A better integration of the introduced functionality into the rest of the library would be very nice, but needs the joint involvement of more people!

@konsim83 @cburstedde Thanks for the explanations and sorry for the silence -- vacations and busy days since then! I remember reading "Parallel tree algorithms for AMR and non-standard data access" and "Parallel level-set methods on adaptive tree-based grids" quite a while ago; it is nice to see algorithms from there now also in deal.II!

@peterrum peterrum merged commit 0159f37 into dealii:master Oct 15, 2021
8 participants