Improve performance of setup of MGTwoLevelTransfer/MGTransferGlobalCoarsening #14968
Comments
I have a benchmark for matrix-free, global-coarsening multigrid, p-multigrid and mesh adaptivity 95% ready. I use an affine mesh because that has a higher arithmetic intensity and thus more easily exposes bad data structures in our code. I will post it in the coming days, unless someone else feels we should do something entirely different.
Let me give some numbers from a 2D version of my benchmark; I plan to do the 3D case as well, of course:
The cost of the setup, compared to the solve cost, shows that there is something to be done.
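For what it's worth, a minimal sketch of how such setup-vs-solve timings can be collected with deal.II's TimerOutput; the setup_transfer()/solve() helpers are hypothetical stand-ins for the benchmark, not its actual code:

```cpp
#include <deal.II/base/timer.h>

#include <iostream>

// Hypothetical stand-ins for the benchmark phases; only the timing
// structure is of interest here.
void setup_transfer() {}
void solve() {}

int main()
{
  dealii::TimerOutput timer(std::cout,
                            dealii::TimerOutput::summary,
                            dealii::TimerOutput::wall_times);

  {
    dealii::TimerOutput::Scope scope(timer, "setup of transfer");
    setup_transfer(); // reinit() of all two-level transfers
  }
  {
    dealii::TimerOutput::Scope scope(timer, "solve");
    solve(); // multigrid-preconditioned solve on the active level
  }
  // the wall-time summary is printed when `timer` goes out of scope
}
```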
Which version are you using:
dealii/include/deal.II/multigrid/mg_transfer_global_coarsening.templates.h, Lines 2385 to 2388 in b8135fa,
or
dealii/include/deal.II/multigrid/mg_transfer_global_coarsening.templates.h, Lines 2460 to 2466 in b8135fa?
If the latter, which policy are you using?
I don't see an obvious reason why it is so expensive. The element prolongation matrices are only set up once, and there are a few loops over cells to collect data like indices or constraints. I guess you use …
Let me post some results here: PR #15794 allows running local smoothing with … The left column contains the current times of … Exploiting the knowledge of having the first-child policy, or of having meshes that are not repartitioned, allows us to reduce the setup costs by approximately 35%.
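To make the policy variants concrete, here is a hedged sketch of building the level-mesh sequence with and without the first-child policy; the exact overload flags of create_geometric_coarsening_sequence are quoted from memory and may differ from what the PR uses:

```cpp
#include <deal.II/distributed/repartitioning_policy_tools.h>
#include <deal.II/grid/tria.h>
#include <deal.II/multigrid/mg_transfer_global_coarsening.h>

using namespace dealii;

template <int dim>
void
create_level_meshes(Triangulation<dim> &tria_fine)
{
  // Default: each coarser level mesh may be partitioned independently.
  const auto trias_default =
    MGTransferGlobalCoarseningTools::create_geometric_coarsening_sequence(
      tria_fine);

  // First-child policy: a coarse cell is assigned to the rank that owns its
  // first child, so parent and child cells stay on the same process and the
  // transfer setup can exploit this locality.
  const RepartitioningPolicyTools::FirstChildPolicy<dim> policy(tria_fine);

  const auto trias_first_child =
    MGTransferGlobalCoarseningTools::create_geometric_coarsening_sequence(
      tria_fine,
      policy,
      /*preserve_fine_triangulation=*/true,
      /*repartition_fine_triangulation=*/false);

  (void)trias_default;
  (void)trias_first_child;
}
```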
When timing MGTwoLevelTransfer/MGTransferGlobalCoarsening via dealii/tests/performance/timing_mg_glob_coarsen.cc (Lines 463 to 466 in 2e55d72) …
Thanks for the timings; this helps to understand the variations.
In my opinion, the performance tests do not necessarily need to cover the best possible way to run a problem, and stressing an additional component can make sense. Therefore, I do not think it is necessary to change this part of the benchmark; I would rather see repartitioning and creating a new mesh as part of the benchmark.

On a related note, I would think that it should be possible to make this manipulation of the grid much cheaper than the operations working with the DoFs, since we use higher-order polynomials for the solution and the grid does not need them. This does not need to happen now, but I think it is good to have it on the radar. If we have good reasons, we can of course also think of a different setup that we think is even more representative of real workloads. I think repartitioning is useful in real cases.
I don't understand what you mean. We work on the cell level, but we need to gather the DoFs after we have figured out which cells are relevant.
I was not super precise, but I thought it was obvious from the context: the test case uses polynomial coarsening, so the grid manipulation and repartitioning happen on a low-order finite element space (degree 2) and should thus be much cheaper than the work on the active DoFs (degree 4) inside the solver.
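As an illustration of the degrees involved, a small sketch using MGTransferGlobalCoarseningTools::create_polynomial_coarsening_sequence; whether the test builds its sequence this way is an assumption:

```cpp
#include <deal.II/multigrid/mg_transfer_global_coarsening.h>

#include <iostream>

int main()
{
  using namespace dealii;

  // p-multigrid levels for a degree-4 active space with the bisection
  // sequence: all grid manipulation and repartitioning happens on the
  // low-order spaces, while the solver itself runs at degree 4.
  const auto degrees =
    MGTransferGlobalCoarseningTools::create_polynomial_coarsening_sequence(
      4,
      MGTransferGlobalCoarseningTools::PolynomialCoarseningSequenceType::
        bisect);

  for (const unsigned int degree : degrees)
    std::cout << degree << ' '; // e.g. "1 2 4"
  std::cout << std::endl;
}
```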
Already in two projects, I observed that MGTwoLevelTransfer::reinit() takes a long time for polynomial transfer. More precisely, I observed that the case at dealii/include/deal.II/multigrid/mg_transfer_global_coarsening.templates.h, Lines 3240 to 3248 in a1b1356, …
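For orientation, a minimal sketch of the kind of polynomial two-level transfer meant here, between a degree-4 and a degree-2 space on the same mesh; the mesh, degrees, and constraint setup are placeholders, not the code from the two projects:

```cpp
#include <deal.II/dofs/dof_handler.h>
#include <deal.II/fe/fe_q.h>
#include <deal.II/grid/grid_generator.h>
#include <deal.II/grid/tria.h>
#include <deal.II/lac/affine_constraints.h>
#include <deal.II/lac/la_parallel_vector.h>
#include <deal.II/multigrid/mg_transfer_global_coarsening.h>

int main()
{
  using namespace dealii;
  constexpr int dim = 2;

  Triangulation<dim> tria;
  GridGenerator::hyper_cube(tria);
  tria.refine_global(5);

  // Same mesh, different polynomial degrees: the p-transfer case in which
  // reinit() is observed to be expensive.
  DoFHandler<dim> dof_fine(tria), dof_coarse(tria);
  dof_fine.distribute_dofs(FE_Q<dim>(4));
  dof_coarse.distribute_dofs(FE_Q<dim>(2));

  AffineConstraints<double> constraints_fine, constraints_coarse;
  constraints_fine.close();
  constraints_coarse.close();

  MGTwoLevelTransfer<dim, LinearAlgebra::distributed::Vector<double>> transfer;
  transfer.reinit(dof_fine, dof_coarse, constraints_fine, constraints_coarse);
}
```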