
Improve performance of setup of MGTwoLevelTransfer/MGTransferGlobalCoarsening #14968

Open
kronbichler opened this issue Mar 24, 2023 · 9 comments
@kronbichler
Member

In two projects already, I observed that MGTwoLevelTransfer::reinit() takes a long time for polynomial transfer. More precisely, I observed that the case

if (do_polynomial_transfer)
  internal::MGTwoLevelTransferImplementation::reinit_polynomial_transfer(
    dof_handler_fine,
    dof_handler_coarse,
    constraints_fine,
    constraints_coarse,
    mg_level_fine,
    mg_level_coarse,
    *this);
with uniform polynomial degree takes much longer to set up than the geometric transfer, which is surprising given the simplicity of the case with the same mesh. Obviously, the code is written for the general case, but there might be room for some improvements. Maybe we should also add a benchmark similar to https://github.com/dealii/dealii/blob/master/tests/performance/timing_step_37.cc, but using some combination of adaptive meshes, global coarsening, and p-multigrid to ensure that these steps are done reasonably well.

@kronbichler
Member Author

I have a benchmark for matrix-free, global-coarsening multigrid, p-multigrid, and mesh adaptivity that is 95% ready. I use an affine mesh because that has a higher arithmetic intensity and thus more easily exposes bad data structures in our code. I will post it in the coming days, unless someone else feels we should do something entirely different.

@kronbichler
Member Author

Let me give some numbers from a 2D version of my benchmark; the 3D case is planned, of course:

  • Refining the grid adaptively takes 600m instructions
  • Setting up geometric_coarsening_sequence takes 4.9b instructions
  • DoFHandler::distribute_dofs takes 400m instructions, showing the optimizations done in the past
  • Matrix-free setup + renumbering (float 2x, double) is around 3b instructions, so pretty good given the work we do there
  • MGTwoLevelTransfer::reinit_polynomial_transfer takes 3.8b instructions
  • MGTwoLevelTransfer::reinit_geometric_transfer takes 1.8b instructions
  • Solving two times takes 1.1b instructions

The cost of the setup, compared to the solve cost, shows there is something to be done.

@peterrum
Member

Setting up geometric_coarsening_sequence takes 4.9b instructions

Which version are you using?

template <int dim, int spacedim>
std::vector<std::shared_ptr<const Triangulation<dim, spacedim>>>
create_geometric_coarsening_sequence(
  const Triangulation<dim, spacedim> &fine_triangulation_in)

or

template <int dim, int spacedim>
std::vector<std::shared_ptr<const Triangulation<dim, spacedim>>>
create_geometric_coarsening_sequence(
  Triangulation<dim, spacedim>                         &fine_triangulation_in,
  const RepartitioningPolicyTools::Base<dim, spacedim> &policy,
  const bool                                            keep_fine_triangulation,
  const bool                                            repartition_fine_triangulation)

If the latter, which policy are you using?

MGTwoLevelTransfer::reinit_polynomial_transfer takes 3.8b instructions
MGTwoLevelTransfer::reinit_geometric_transfer takes 1.8b instructions

I don't see an obvious reason why it is so expensive. The element prolongation matrices are only set up once, and there are a few loops over cells to collect data like indices or constraints. I guess you use FE_Q? Which degree? Do you first coarsen in p and after that in h?

@kronbichler
Member Author

@peterrum have a look at #14981. The numbers you see here are for a 2D version of the code, but they are essentially the same otherwise. (I did in fact use the first variant of create_geometric_coarsening_sequence before and switched to the second one later, but in 3D they are not that far apart.)

@peterrum
Member

peterrum commented Aug 1, 2023

Let me post some results here:

PR #15794 allows running local smoothing with MGTransferMF (aka MGTransferGlobalCoarsening), and PR #15807 specializes the setup (for the first-child policy and for p-multigrid without repartitioning).

The left column contains the current times of timing_step_37 with MGTransferMatrixFree and with MGTransferMF, and of timing_mg_glob_coarsen. The right column contains the old times.

[Screenshot from 2023-08-01 15-35-14: timing comparison table]

Exploiting the knowledge of having the first-child policy, or of having meshes that are not repartitioned, allows reducing the setup costs by approx. 35%.

MGTransferMatrixFree and MGTransferMF have similar costs, but the latter still has higher setup costs.

@peterrum peterrum changed the title reinit_polynomial_transfer is very slow Improve performance of setup of MGTwoLevelTransfer/MGTransferGlobalCoarsening Aug 5, 2023
@peterrum
Member

peterrum commented Aug 5, 2023

When timing timing_mg_glob_coarsen with #15807, I noticed that changing

coarse_triangulations =
  MGTransferGlobalCoarseningTools::create_geometric_coarsening_sequence(
    triangulation /*,
    RepartitioningPolicyTools::MinimalGranularityPolicy<dim>(16)*/);

to RepartitioningPolicyTools::FirstChildPolicy<dim>(triangulation) allows cutting the setup costs by 50%. Shall we adopt this change in the performance test as well?

@kronbichler
Member Author

Thanks for the timings, this helps to understand the variations.

[…] RepartitioningPolicyTools::FirstChildPolicy<dim>(triangulation) allows to cut down the setup costs by 50%. Shall we adopt this change also in the performance test?

In my opinion, the performance tests do not necessarily need to cover the best possible way to run a problem, and stressing an additional component can make sense. Therefore, I do not think it is necessary to change this part of the benchmark; I would rather see repartitioning and creating a new mesh as part of it.

On a related note, I would think it should be possible to make this manipulation of the grid much cheaper than the operations working with the DoFs, since we use higher-order polynomials and the grid does not. This does not need to happen now, but I think it is good to have it on the radar.

If we have good reasons, we can of course also think of a different setup that we consider even more representative of real workloads. I think repartitioning is useful in real cases.

@peterrum
Member

peterrum commented Aug 7, 2023

On a related note, I would think that it should be possible to make this manipulation of the grid much cheaper than operations working with the dofs, since we use higher order polynomials and the grid should not. This does not need to happen now, but I think it is good to have it on the radar.

I don't understand what you mean. We work on the cell level, but need to gather the DoFs after we have figured out which cells are relevant.

@kronbichler
Copy link
Member Author

kronbichler commented Aug 7, 2023

I was not super precise, but thought it was obvious from the context: the test case uses polynomial coarsening, so the grid manipulation and repartitioning act on a low-order finite element space (degree 2) and should thus be much cheaper than the work on the active DoFs (degree 4) inside the solver.
