Add load balancing control (#198)
* Redo load balancing (only).

* Update submodules.

* Describe additional derived vars.

* Add a brief description of the code base.

* A few fixes.

* Add control doc.

* Trailing whitespaces.

* Use LMeX version of the diagnostics only if requested.

* Add a couple of details to address Bruce's comments.
esclapez committed Apr 28, 2023
1 parent e7e1fcc commit e5864ff
Showing 19 changed files with 1,027 additions and 50 deletions.
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -28,6 +28,7 @@ option(PELELMEX_USE_EB "Enable Embedded Boundary" OFF)
option(PELELMEX_USE_EFIELD "Enable Electric field module" OFF)
option(PELELMEX_USE_PARTICLES "Enable Spray module" OFF)
option(PELELMEX_USE_SOOT "Enable Soot module" OFF)
option(PELELMEX_USE_DIAG "Enable LMeX Diagn module" OFF)

#
# Misc options
197 changes: 197 additions & 0 deletions Docs/source/manual/Implementation.rst
@@ -0,0 +1,197 @@
.. highlight:: rst

.. _sec:code:

Source code
===========

The following provides an overview of the *PeleLMeX* source code and basic information on its data structures, and is
useful for any user intending to use and/or develop the code.

Overview of source code
-----------------------

*PeleLMeX* is based upon AMReX's `AmrCore <https://amrex-codes.github.io/amrex/docs_html/AmrCore.html>`_ from which it inherits
an AMR hierarchy data structure and basic regridding functionalities. The code is entirely written in C++, with low-level
compute-intensive kernels implemented as lambda functions to seamlessly run on CPUs and various GPU backends through AMReX's
high-performance portability abstraction.

The core of the algorithm is implemented in the ``advance()`` function, which acts on all the levels concurrently.
Projection operators and advection scheme functions are imported from the `AMReX-Hydro library <https://amrex-codes.github.io/AMReX-Hydro>`_
while the core of the thermo-chemistry functionality comes from `PelePhysics <https://amrex-combustion.github.io/PelePhysics/>`_ .
Users are responsible for providing initial and boundary conditions in the local subfolder implementing their case, i.e. it is
not possible to compile and run *PeleLMeX* without actually writing a few lines of code. However, numerous examples are provided
in ``Exec/RegTests`` from which new users can draw to set up their own case.

The source code contains a few dozen files, organized around the pieces of the algorithm and major functionalities:

* ``PeleLMEvolve``: top-level time advance loop, with IO/exit controls
* ``PeleLMSetup``: setting up the simulation parameters, parsing the input file
* ``PeleLMInit``: generating the initial solution from scratch or from a checkpoint file, performing initial projections/iteration(s)
* ``PeleLMAdvance``: top-level implementation of the time step algorithm
* ``PeleLMProjection``: implementation of the various flavors of the nodal projection
* ``PeleLMUmac``: construction and projection of the MAC velocities
* ``PeleLMAdvection``: functions to compute the explicit advection terms
* ``PeleLMDiffusion``: functions to compute the diffusion terms, using the operators defined in ``DiffusionOp``
* ``PeleLMReaction``: functions using *PelePhysics* reactors to integrate the chemistry and linearized advection/diffusion
* ``PeleLMPlot``: implementation of plotfile and checkpoint file IOs
* ``PeleLMBC``: functions filling the ghost cells (at fine/fine, coarse/fine and domain boundaries)
* ``PeleLMRegrid``: creating new AMR levels or remaking modified AMR levels during adaptive refinement
* ``PeleLMTagging``: marking cells for refinement
* ``PeleLM_K.H``: low-level kernel functions

Data structure and containers
-----------------------------

AMReX 101
^^^^^^^^^

AMReX's basic data structure is the `MultiFab <https://amrex-codes.github.io/amrex/docs_html/Basics.html#fabarray-multifab-and-imultifab>`_
(historically, a collection of Fortran Array Boxes (FABs)).
Within the block-structured AMR approach of AMReX, the domain is decomposed into non-overlapping rectangular `boxes`,
which can be assembled into a `boxArray`. Each AMR level has a `boxArray` providing the list of `boxes` of that level.
The `boxes` are distributed across the MPI ranks, the mapping of which is described by a `DistributionMap`. Given a
`boxArray` and a `DistributionMap`, one can define an actual data container (`boxes` are only lightweight descriptors
of rectangular geometrical objects, containing bounds and centering information only), where each rank will
allocate a FAB for the boxes it owns in the `boxArray`, resulting in a collection of FABs, or a MultiFab, distributed
across the MPI ranks.
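
A minimal sketch of this chain of objects, using only standard AMReX calls (not *PeleLMeX*-specific; the
domain size and box size below are illustrative), could read: ::

   // Build a boxArray covering a 64^3 domain, chop it into boxes,
   // map the boxes to MPI ranks and allocate the corresponding FABs.
   amrex::Box domain(amrex::IntVect(AMREX_D_DECL(0,0,0)),
                     amrex::IntVect(AMREX_D_DECL(63,63,63)));
   amrex::BoxArray ba(domain);
   ba.maxSize(32);                          // boxes of at most 32^3 cells
   amrex::DistributionMapping dm(ba);       // boxes <-> MPI ranks mapping
   amrex::MultiFab mf(ba, dm, 1, 0);        // 1 component, 0 ghost cells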

To access the data in a MultiFab, one uses a `MFIter <https://amrex-codes.github.io/amrex/docs_html/Basics.html#mfiter-and-tiling>`_
(or MultiFab iterator), which provides each MPI rank access to the FABs it owns within the MultiFab. Actual access to the data in
memory is then provided by the lightweight `Array4` structure, and it is strongly advised to rely on the AMReX
`ParallelFor <https://amrex-codes.github.io/amrex/docs_html/Basics.html#parallelfor>`_ function template to loop through the logical `i,j,k` indices.
For example, to set the velocity data stored in a MultiFab called `NewState` with data from an `OldState` and an increment
from a third MultiFab `advTerm`: ::

   for (MFIter mfi(NewState,TilingIfNotGPU()); mfi.isValid(); ++mfi) {
      Box const& bx = mfi.tilebox();
      auto const& velo_old = OldState.const_array(mfi,VELOCITY_INDEX);
      auto const& incr     = advTerm.const_array(mfi);
      auto const& velo_new = NewState.array(mfi,VELOCITY_INDEX);
      ParallelFor(bx,
      [=] AMREX_GPU_DEVICE (int i, int j, int k) noexcept
      {
         velo_new(i,j,k) = velo_old(i,j,k) + incr(i,j,k);
      });
   }

.. note::
   For the example above to work, all three MultiFabs must have the same `boxArray` and `DistributionMap`.

Users are strongly encouraged to review the content of the `AMReX documentation <https://amrex-codes.github.io/amrex/docs_html/Basics.html>`_
to become more familiar with AMReX data structures and environment.

*PeleLMeX* state and advance containers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The state vector of *PeleLMeX* contains the 2 or 3 components of velocity, the mixture density, the species densities (rhoYs),
rhoH, the temperature and the thermodynamic pressure. The state components are stored in a cell-centered MultiFab with
`NVAR` components. Additionally, the perturbational pressure stored at the nodes is contained in a separate MultiFab.
Together with the cell-centered pressure gradient, the cell-centered divergence constraint and cell-centered
transport properties, these MultiFabs are assembled into a `LevelData` struct.

Each level in the AMR hierarchy has two versions of the `LevelData` at any point during the simulation: one
for the old state and one for the new state. The developer can get a pointer to the `LevelData` struct by
calling: ::

auto ldata_p = getLevelDataPtr(lev,AmrOldTime);

with either `AmrOldTime` or `AmrNewTime` on level `lev`. Additionally, calling this function with
`AmrHalfTime` will return a `LevelData` struct whose `state` is linearly interpolated between the old and new
states (but the other MultiFabs in `LevelData` are empty!).
It is also often useful to have access to a vector of a state component across the entire AMR hierarchy. To do so, *PeleLMeX*
provides a set of functions returning a vector of MultiFab `std::unique_ptr` aliased into the `LevelData`
MultiFabs on each level: ::

   getStateVect(time);         // Return the entire state (ncomp: NVAR)
   getVelocityVect(time);      // Return the velocity only (ncomp: AMREX_SPACEDIM)
   getDensityVect(time);       // Return the mixture density (ncomp: 1)
   getSpeciesVect(time);       // Return the species density (ncomp: NUM_SPECIES)
   getRhoHVect(time);          // Return rhoH (ncomp: 1)
   getTempVect(time);          // Return temperature (ncomp: 1)
   getDivUVect(time);          // Return divergence constraint (ncomp: 1)
   getDiffusivityVect(time);   // Return diffusivity (ncomp: NUM_SPECIES+2)
   getViscosityVect(time);     // Return viscosity (ncomp: 1)

where ``time`` can be either `AmrOldTime` or `AmrNewTime`.
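
For instance, a minimal sketch reporting the temperature range on each level of the hierarchy (``finestLevel()``
is provided by `AmrCore`) could read: ::

   auto tempVect = getTempVect(AmrNewTime);
   for (int lev = 0; lev <= finestLevel(); ++lev) {
      // Example use: print the temperature extrema on each level
      amrex::Print() << "Level " << lev
                     << " T_min: " << tempVect[lev]->min(0)
                     << " T_max: " << tempVect[lev]->max(0) << "\n";
   }
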
Also available at any point during the simulation is the `LevelDataReact`, which contains the species
chemical source terms. A single version of the container is available on each level and can be accessed
using: ::

auto ldataR_p = getLevelDataReactPtr(lev);

Within the time-advance function, the *PeleLMeX* algorithm calls for the computation of the advection,
diffusion and reaction source terms iteratively using SDC. At each step, the results of other steps
can be used as part of the numerical scheme (e.g. the explicit advection with a Godunov scheme uses
the diffusion term). These temporary variables, only useful in the scope of the advance function, are
assembled into two structs: ``AdvanceDiffData`` and ``AdvanceAdvData``. The former contains three
MultiFabs for the separate diffusion term evaluations described in :numref:`LMeX_Algo`: :math:`D^n`,
:math:`D^{n+1,k}` and :math:`D^{n+1,k+1}`, as well as additional containers for the :math:`\overline{W}`
and Soret contributions. The latter encapsulates the face-centered MAC velocities :math:`U_{ADV}`, the
advection term :math:`A_{n+1/2,(k+1)}`, the pressure correction :math:`\chi` and a forcing container
used in the RHS of the advection/diffusion/reaction solves. In contrast with the `LevelData`, these two containers
are freed at the end of the advance function and are passed around among the functions called within `advance()`.

Parallelism
-----------

*PeleLMeX* inherits the MPI+X approach from the AMReX library, where X can be any of CUDA, HIP or SYCL
for heterogeneous architectures (OpenMP is currently only available for non-reacting cases or explicit chemistry integrators).
The reader is referred to the `AMReX GPU documentation <https://amrex-codes.github.io/amrex/docs_html/GPU.html>`_ for more details on
the thread parallelism.

As mentioned above, the top-level spatial decomposition arises from AMReX's block-structured approach. On each level, non-overlapping
`boxes` are assembled into a `boxArray` and distributed across the MPI ranks with a `DistributionMap` (or `DMap`).
It is in our best interest to ensure that all the MultiFabs in the code use the same `boxArray` and `DMap`,
such that operations using `MFIter` can be performed and data copies across MPI ranks are minimized.
However, it is also important to maintain a good load balance, i.e. ensure that each MPI rank has the same amount
of work, to avoid wasting computational resources. Reactive flow simulations are challenging in this regard, because the chemistry
integration is very spatially heterogeneous, with stiff ODE integration required within the flame front and non-stiff
integration of the linearized advection/diffusion required in the cold gases or burnt mixture. Additionally, because
a non-subcycling approach is used in *PeleLMeX*, the chemistry doesn't have to be integrated in fine-covered regions.
Two `boxArray` objects and associated `DMaps` are thus available in *PeleLMeX*:

1. The first one is inherited from `AmrCore` and is available as ``grids[lev]`` (`boxArray`) and ``dmap[lev]`` (`DMap`) throughout the code. Most
   of the *PeleLMeX* MultiFabs use these two, and the `boxes` sizes are dictated by the `amr.max_grid_size` and `amr.blocking_factor` keys in the input
   file. These are employed for all the operations in the code except the chemistry integration. The default load balancing approach is to use
   a space-filling curve (SFC), with each box weighted by its number of cells. Advanced users can try alternate approaches using the
   keys listed in :doc:`LMeXControls`.
2. A second one is created masking the fine-covered regions and is updated during regrid operations. It is used to perform the chemistry integration,
   and because this is a purely local integration (in contrast with an implicit diffusion solve, for instance, which requires communications
   to solve the linear problem using GMG), a knapsack load balancing approach is used by default, where the weight of each box is based
   on the total number of chemistry RHS calls in the box. The size of the `boxes` in the chemistry `boxArray` (accessible with ``m_baChem[lev]``)
   is controlled by the `peleLM.max_grid_size_chem` key in the input file. Once again, advanced users can try alternate approaches to load
   balancing the chemistry `DMap` using the keys described in :doc:`LMeXControls`. A short input file example is provided after this list.

After each regrid operation, even if the grids did not actually change, *PeleLMeX* will try to find a better load balancing for the
`AmrCore` `DMap`. Because changing the load balancing requires copying data across MPI ranks, the `DMap` is only changed
if a significantly better new `DMap` can be obtained, with the threshold for a better `DMap` defined by the value of
`peleLM.load_balancing_efficiency_threshold`.
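
For example, load balancing can be activated and the improvement threshold set by adding the following keys
(detailed in :doc:`LMeXControls`) to the input file: ::

   peleLM.do_load_balancing = 1
   peleLM.load_balancing_efficiency_threshold = 1.05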

Debugging
---------

The first step to debug any addition or undefined behavior of *PeleLMeX* is to turn the ``DEBUG`` flag ``ON`` in the
GNUmakefile and activate AMReX's floating point exception traps in the input file: ::

amrex.fpe_trap_invalid = 1
amrex.fpe_trap_zero = 1
amrex.fpe_trap_overflow = 1

This will slow down the code considerably, but will enable bound checks on all AMReX low-level data structures,
catch floating point errors (use of NaNs, division by zero, ...) and trigger any ``AMREX_ASSERT`` statement added to the
code base. It is also often useful to visualize data in order to understand the erroneous results the solver can
return. Developers can write a single MultiFab to disk using AMReX's `VisMF`: ::

VisMF::Write(myMF,"VisMyMF");

which can then be visualized with `Amrvis` using `amrvis -mf`. Alternatively, visualizing the entire AMR hierarchy is also
useful. *PeleLMeX* provides a simple function to write a vector of MultiFabs: ::

which can be opened with `Amrvis` or any other visualization software. This last function will work provided that
the MultiFabs in the vector all have the same number of components.
Finally, another way of checking individual pieces of the algorithm is to use the *PeleLMeX* evaluate mode (``peleLM.run_mode=evaluate``)
and specify a list of fields with ``peleLM.evaluate_vars`` as described in :doc:`LMeXControls`. Note that not all of the
algorithm is available in this mode yet.
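
For instance, a hypothetical evaluate-mode setup could read (the field list below is purely illustrative; see
:doc:`LMeXControls` for the available options): ::

   peleLM.run_mode      = evaluate
   peleLM.evaluate_vars = divU diffTerm
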
31 changes: 30 additions & 1 deletion Docs/source/manual/LMeXControls.rst
@@ -52,6 +52,27 @@ Grid/AMR parameters
amr.blocking_factor = 16 # block factor in grid generation (min box size)
amr.max_grid_size = 64 # max box size

peleLM.max_grid_size_chem = 32 # [OPT, DEF="None"] Max box size for the Chemistry BoxArray

Load balancing
--------------

*PeleLMeX* relies on two distribution maps (see :doc:`Implementation` for more details).

::

peleLM.do_load_balancing = 1 # [OPT, DEF=0] Activate load balancing
peleLM.load_balancing_method = sfc # [OPT, DEF="sfc"] AmrCore dmap load balancing method
peleLM.load_balancing_cost_estimate = ncell # [OPT, DEF="ncell"] AmrCore dmap balancing cost
peleLM.chem_load_balancing_method = knapsack # [OPT, DEF="knapsack"] Chemistry dmap load balancing method
peleLM.chem_load_balancing_cost_estimate = chemfunctcall_sum # [OPT, DEF="chemfunctcall_sum"] Chemistry dmap balancing cost
peleLM.load_balancing_efficiency_threshold = 1.05 # What constitutes a better dmap?

The balancing method can be one of `sfc`, `roundrobin` or `knapsack`, while the cost estimate can be one of
`ncell`, `chemfunctcall_avg`, `chemfunctcall_max`, `chemfunctcall_sum`, `userdefined_avg` or `userdefined_sum`. When
using either of the last two options, the user must provide a definition of the `derUserDefined` derived variable. If multiple components
are defined in the `derUserDefined` function, the first one is used for load balancing.

Time stepping parameters
------------------------

@@ -125,7 +146,6 @@ overview of the available controls:
The `field_name` can be any component of the state or derived variables (see below). Additional controls specific
to embedded boundaries are discussed below.


PeleLMeX derived variables
--------------------------

@@ -186,8 +206,17 @@ The following list of derived variables is available in PeleLMeX:
* - `coordinates`
- AMREX_SPACEDIM
- Cell-center coordinates
* - `DistributionMap`
- 1
- The MPI-rank of each box
* - `derUserDefined`
- ?
- A user-defined derived variable whose number of components is provided by the user (see below).

Note that `mixture_fraction` and `progress_variable` require additional inputs from the user as described below.
The `derUserDefined` variable allows users to define their own derived variable, which can comprise several components. To do
so, the user needs to copy the Source/DeriveUserDefined.cpp file into their run folder and update it. The number of
components is defined by the size of the vector returned by pelelm_setuserderives().
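
As a sketch, assuming `pelelm_setuserderives()` returns an `amrex::Vector` of component names (the exact signature
should be checked against Source/DeriveUserDefined.cpp, and the names below are purely illustrative), a two-component
user-defined derived variable could be declared as: ::

   #include <AMReX_Vector.H>
   #include <string>

   // The size of the returned vector sets the number of derUserDefined components.
   amrex::Vector<std::string> pelelm_setuserderives()
   {
      return {"myScalar1", "myScalar2"};
   }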

PeleLMeX algorithm
------------------
9 changes: 6 additions & 3 deletions Docs/source/manual/Model.rst
Expand Up @@ -40,7 +40,7 @@ In a nutshell, `PeleLMeX` features include:
Mathematical background
-----------------------

`PeleLMeX` evolves chemically reacting low Mach number flows with block-structured adaptive mesh refinement (AMR). The code depends upon the `AMReX <https://github.com/AMReX-Codes/amrex>`_ library to provide the underlying data structures, and tools to manage and operate on them across massively parallel computing architectures. `PeleLMeX` also utilizes the source code and algorithmic infrastructure of `AMReX-Hydro <https://github.com/AMReX-Codes/AMReX-Hydro>`_. `PeleLMeX` borrows heavily from `PeleLM <https://github.com/AMReX-Combustion/PeleLM>`_. The core algorithms in `PeleLM` are described in the following papers:
`PeleLMeX` evolves chemically reacting low Mach number flows with block-structured adaptive mesh refinement (AMR). The code depends upon the `AMReX <https://github.com/AMReX-Codes/amrex>`_ library to provide the underlying data structures, and tools to manage and operate on them across massively parallel computing architectures. `PeleLMeX` also utilizes the source code and algorithmic infrastructure of `AMReX-Hydro <https://github.com/AMReX-Codes/AMReX-Hydro>`_. `PeleLMeX` borrows heavily from `PeleLM`_. The core algorithms in `PeleLM` are described in the following papers:

* *A conservative, thermodynamically consistent numerical approach for low Mach number combustion. I. Single-level integration*, A. Nonaka, J. B. Bell, and M. S. Day, *Combust. Theor. Model.*, **22** (1) 156-184 (2018)

@@ -172,12 +172,15 @@ and simplified velocity constraint,
PeleLMeX Algorithm
------------------

An overview of `PeleLMeX` time-advance function is provided in the figure below and details are provided in the following subsections.
An overview of `PeleLMeX` time-advance function is provided in :numref:`LMeX_Algo` and details are provided in the following subsections.

.. figure:: images/model/PeleLMeX_Algorithm.png
   :name: LMeX_Algo
   :align: center
   :figwidth: 50%

   Flowchart of the *PeleLMeX* advance function.

The three steps of the low Mach number projection scheme described :ref:`below <ssec:projScheme>` are referenced to better
emphasize how the thermodynamic solve is closely woven into the fractional step approach. Striped boxes indicate where the
:ref:`Godunov procedure <ssec:advScheme>` is employed, while the four different linear solves are highlighted.
@@ -340,7 +343,7 @@ Note that in the presence of EB, only the `Godunov_PLM` variant is available.
AMR extension
^^^^^^^^^^^^^

In contrast with `PeleLM <https://github.com/AMReX-Combustion/PeleLM>`_, `PeleLMeX` does not rely on a subcycling approach to advance the AMR hierarchy.
In contrast with `PeleLM`_, `PeleLMeX` does not rely on a subcycling approach to advance the AMR hierarchy.
This difference is illustrated in the figure below comparing the multi-level time-stepping approach in both codes:

.. figure:: images/model/PeleLMeX_Subcycling.png
1 change: 1 addition & 0 deletions Docs/source/manual/index.rst
@@ -22,6 +22,7 @@ point your web browser at the file ``${PELELMEX_HOME}/Docs/build/html/index.html
:caption: Theory:

Model.rst
Implementation.rst
Validation.rst
Performances.rst

4 changes: 3 additions & 1 deletion Source/CMakeLists.txt
Expand Up @@ -42,7 +42,9 @@ target_sources(pelelmex
Utils.cpp
)

add_subdirectory(Diagnostics)
if (PELELMEX_USE_DIAGS)
add_subdirectory(Diagnostics)
endif ()

if (PELELMEX_USE_EFIELD)
add_subdirectory(Efield)