Add Summit perfs #162

Merged · 5 commits · Dec 21, 2022

16 changes: 10 additions & 6 deletions Docs/source/landing/overview.html
@@ -58,13 +58,19 @@
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Written in C++</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Two- and Three- dimensional support</li>
MPI+X approach, with X one of OpenMP, CUDA, HIP or SYCL</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Combustion (transport, kinetics, thermodynamics) models based on Cantera and EGLib</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Parallelization using MPI+X approach, with X one of OpenMP, CUDA, HIP or SYCL</li>
2D-Cartesian, 2D-Axisymmetric and 3D support</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Parallel I/O</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Built-in profiling tools</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Finite volume, block-structured AMR approach</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Complex geometries using Embedded Boundaries (EB)</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Combustion (transport, kinetics, thermodynamics) models based on Cantera and EGLib</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Plotfile format supported by
<a href="https://github.com/AMReX-Codes/Amrvis">Amrvis</a>,
@@ -82,8 +88,6 @@
Temporally implicit viscosity, species mass diffusion, thermal conductivity, chemical kinetics</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Closed chamber algorithm enables time-varying background pressure changes</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Built-in profiling tools</li>
</ul>
</li>

111 changes: 109 additions & 2 deletions Docs/source/manual/Performances.rst
Expand Up @@ -42,8 +42,8 @@ the time step size usually employed to remove artifacts from the initial data:
amr.dt_shrink = 1.0
amr.fixed_dt = 2.5e-7

Additionally, all the tests on GPUs are conducted using the MAGMA dense-direct solver to solve for
the Newton direction within CVODE's non-linear integration.
Additionally, unless otherwise specified, all the tests on GPUs are conducted
using the MAGMA dense-direct solver to solve for the Newton direction within CVODE's non-linear integration.

::

@@ -102,3 +102,110 @@ details):
The total time comparison shows close to a 4x speed-up on a node basis on this platform, with the AMD Milan CPU being amongst
the most performant to date. The detailed distribution of the computational time within each run highlights the dominant contribution
of the stiff chemistry integration, especially on the GPU.

Results on Crusher (ORNL)
^^^^^^^^^^^^^^^^^^^^^^^^^

Crusher is the testbed for Frontier, DOE's first exascale platform. Crusher's `nodes <https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html#crusher-compute-nodes>`_ consist of a single 64-core AMD EPYC 7A53 (Trento) CPU connected to 4 AMD MI250X accelerators,
each containing 2 Graphics Compute Dies (GCDs), for a total of 8 GCDs per node. When running with GPU acceleration, `PeleLMeX` uses 8 MPI ranks, each with access to one GCD, while flat MPI runs use 64 MPI ranks.

The FlameSheet case is run using 2 levels of refinement (3 levels total) and the following domain size and cell count:

::

geometry.prob_lo = 0.0 0.0 0.0 # x_lo y_lo (z_lo)
geometry.prob_hi = 0.016 0.016 0.016 # x_hi y_hi (z_hi)

amr.n_cell = 64 64 64
amr.max_level = 2

leading to an initial cell count of 6.545 M, i.e. about 0.8 M cells per GPU (a quick sanity check of this
per-GPU load is sketched after the build information below). The git hashes of `PeleLMeX` and its dependencies for
these tests are:

::

================= Build infos =================
PeleLMeX git hash: v22.12-15-g769168c-dirty
AMReX git hash: 22.12-15-gff1cce552-dirty
PelePhysics git hash: v0.1-1054-gd6733fef
AMReX-Hydro git hash: cc9b82d
===============================================
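
A quick sanity check of the per-GPU load quoted above (a minimal sketch; the 6.545 M
total comes from the run itself and depends on the refinement tagging, so it is simply
divided across the node here):

.. code-block:: python

   # Hypothetical sanity check: divide the reported initial cell count across
   # the 8 GCDs of a Crusher node (one MPI rank per GCD).
   base_cells = 64 * 64 * 64      # amr.n_cell on the coarse level
   total_cells = 6.545e6          # reported initial cell count over the 3 AMR levels
   gcds_per_node = 8              # 4 MI250X x 2 GCDs per node

   print(f"coarse-level cells: {base_cells:,}")
   print(f"cells per GCD     : {total_cells / gcds_per_node / 1e6:.2f} M")
   # -> about 0.82 M cells per GCD, i.e. the ~0.8 M cells per GPU quoted above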

The graph below compares the timings of the two runs obtained from `AMReX` TinyProfiler.
Inclusive averaged data are presented here for separate portions of the `PeleLMeX` algorithm
(see the `algorithm page <https://amrex-combustion.github.io/PeleLMeX/manual/html/Model.html#pelelmex-algorithm>`_ for more
details):


.. figure:: images/performances/PMF/SingleNodePMF_Crusher.png
:align: center
:figwidth: 90%

The total time comparison shows more than a 7.5x speed-up on a node basis on this platform.
The detailed distribution of the computational time within each run highlights the dominant contribution
of the stiff chemistry integration, especially on the GPU.

Results on Summit (ORNL)
^^^^^^^^^^^^^^^^^^^^^^^^

Summit was launched in 2018 as DOE's first fully GPU-accelerated platform.
Summit's `nodes <https://docs.olcf.ornl.gov/systems/summit_user_guide.html#summit-nodes>`_ consist
of two IBM Power9 CPUs connected to 6 NVIDIA V100 GPUs. When running with GPU acceleration, `PeleLMeX`
uses 6 MPI ranks, each with access to one V100, while flat MPI runs use 42 MPI ranks.
Note that, in contrast with the newer GPUs available on Perlmutter or Crusher, Summit's V100s only have 16 GB of
memory, which limits the number of cells per GPU. For this reason, the chemical linear solver used within Sundials is
switched to the less memory-demanding *cuSparse* solver:

::

cvode.solve_type = sparse_direct

The FlameSheet case is run using 2 levels of refinement (3 levels total) and the following domain size and cell count:

::

geometry.prob_lo = 0.0 0.0 0.0 # x_lo y_lo (z_lo)
geometry.prob_hi = 0.004 0.008 0.016 # x_hi y_hi (z_hi)

amr.n_cell = 16 32 64
amr.max_level = 2

leading to an initial cell count of 0.819 M, i.e. about 0.136 M cells per GPU. The git hashes of `PeleLMeX` and its dependencies for
these tests are:

::

================= Build infos =================
PeleLMeX git hash: v22.12-15-g769168c-dirty
AMReX git hash: 22.12-15-gff1cce552-dirty
PelePhysics git hash: v0.1-1054-gd6733fef
AMReX-Hydro git hash: cc9b82d
===============================================

The graph below compares the timings of the two runs obtained from `AMReX` TinyProfiler.
Inclusive averaged data are presented here for separate portions of the `PeleLMeX` algorithm
(see the `algorithm page <https://amrex-combustion.github.io/PeleLMeX/manual/html/Model.html#pelelmex-algorithm>`_ for more
details):


.. figure:: images/performances/PMF/SingleNodePMF_Summit.png
:align: center
:figwidth: 90%

The total time comparison shows close to a 4.5x speed-up on a node basis on this platform.
The detailed distribution of the computational time within each run highlights the dominant contribution
of the stiff chemistry integration.

System comparison
^^^^^^^^^^^^^^^^^

It is interesting to compare the performance of each system on a node basis, normalizing by the number of cells
to obtain a node time per million cells (a minimal sketch of this normalization is given below).
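
A sketch of how this normalization can be computed (the node times below are
placeholders rather than the measured values; the cell counts are the ones quoted in the
sections above):

.. code-block:: python

   # Hypothetical example of the "node time per million cells" metric.
   # The wall-clock times are placeholders, NOT measured data; substitute the
   # total times reported by AMReX TinyProfiler for each single-node run.
   cases = {
       # name                  (total cells, node time [s], placeholder)
       "Summit  (6 V100)":     (0.819e6, 100.0),
       "Crusher (8 GCD)":      (6.545e6, 400.0),
   }

   for name, (cells, node_time) in cases.items():
       per_mcell = node_time / (cells / 1e6)
       print(f"{name}: {per_mcell:.1f} s per million cells")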

.. figure:: images/performances/PMF/SingleNodePMFComparison.png
:align: center
:figwidth: 60%


Results show that 3x and 4.2x speed-ups are obtained on a node basis going from Summit to the more recent
Perlmutter and Crusher, respectively.
2 changes: 1 addition & 1 deletion Docs/source/manual/index.rst
@@ -7,7 +7,7 @@ Welcome to PeleLMeX's documentation!
====================================

`PeleLMeX` is the non-subcycling version of `PeleLM <https://amrex-combustion.github.io/PeleLM/>`_, an adaptive-mesh low Mach number hydrodynamics
code for reacting flows. If you need help or have questions, please join the users `forum <https://groups.google.com/forum/#!forum/pelelmusers>`_.
code for reacting flows. If you need help or have questions, please use the `GitHub discussion <https://github.com/AMReX-Combustion/PeleLMeX/discussions>`_.
The documentation pages appearing here are distributed with the code in the ``Docs`` folder as "restructured text" files. The html is built
automatically with certain pushes to the `PeleLMeX` GitHub repository. A local version can also be built as follows ::
