Add Summit perfs #162

Merged · 5 commits · Dec 21, 2022

16 changes: 10 additions & 6 deletions Docs/source/landing/overview.html
@@ -58,13 +58,19 @@
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Written in C++</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Two- and Three- dimensional support</li>
MPI+X approach, with X one of OpenMP, CUDA, HIP or SYCL</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Combustion (transport, kinetics, thermodynamics) models based on Cantera and EGLib</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Parallelization using MPI+X approach, with X one of OpenMP, CUDA, HIP or SYCL</li>
2D-Cartesian, 2D-Axisymmetric and 3D support</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Parallel I/O</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Built-in profiling tools</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Finite volume, block-structured AMR approach</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Complex geometries using Embedded Boundaries (EB)</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Combustion (transport, kinetics, thermodynamics) models based on Cantera and EGLib</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Plotfile format supported by
<a href="https://github.com/AMReX-Codes/Amrvis">Amrvis</a>,
@@ -82,8 +88,6 @@
Temporally implicit viscosity, species mass diffusion, thermal conductivity, chemical kinetics</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Closed chamber algorithm enables time-varying background pressure changes</li>
<li style="margin-bottom: -15px; margin-left:20px; LINE-HEIGHT:25px">
Built-in profiling tools</li>
</ul>
</li>

111 changes: 109 additions & 2 deletions Docs/source/manual/Performances.rst
Expand Up @@ -42,8 +42,8 @@ the time step size usually employed to remove artifacts from the initial data:
amr.dt_shrink = 1.0
amr.fixed_dt = 2.5e-7

Additionally, all the tests on GPUs are conducted using the MAGMA dense-direct solver to solve for
the Newton direction within CVODE's non-linear integration.
Additionally, unless otherwise specified, all the tests on GPUs are conducted
using the MAGMA dense-direct solver to solve for the Newton direction within CVODE's non-linear integration.

::

@@ -102,3 +102,110 @@ details):
The total time comparison shows close to a 4x speed-up on a node basis on this platform, with the AMD Milan CPU being amongst
the most performant to date. The detailed distribution of the computational time within each run highlights the dominant contribution
of the stiff chemistry integration, especially on the GPU.

Results on Crusher (ORNL)
^^^^^^^^^^^^^^^^^^^^^^^^^

Crusher is the testbed for Frontier, DOE's first exascale platform. Crusher's `nodes <https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html#crusher-compute-nodes>`_ consist of a single 64-core AMD EPYC 7A53 (Trento) CPU connected to 4 AMD MI250X accelerators,
each containing 2 Graphics Compute Dies (GCDs), for a total of 8 GCDs per node. When running with GPU acceleration, `PeleLMeX` uses 8 MPI ranks, each with access to one GCD, while flat MPI runs use 64 MPI ranks.

The FlameSheet case is run using 2 levels of refinement (3 levels total) and the following domain size and cell count:

::

geometry.prob_lo = 0.0 0.0 0.0 # x_lo y_lo (z_lo)
geometry.prob_hi = 0.016 0.016 0.016 # x_hi y_hi (z_hi)

amr.n_cell = 64 64 64
amr.max_level = 2

leading to an initial cell count of 6.545 M, i.e. about 0.8 M cells per GPU (a quick sanity check of this
per-GPU load is sketched after the build information below). The git hashes of `PeleLMeX` and its dependencies for
these tests are:

::

================= Build infos =================
PeleLMeX git hash: v22.12-15-g769168c-dirty
AMReX git hash: 22.12-15-gff1cce552-dirty
PelePhysics git hash: v0.1-1054-gd6733fef
AMReX-Hydro git hash: cc9b82d
===============================================
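
A quick sanity check of the per-GPU load quoted above (a minimal sketch; the 6.545 M
total comes from the run itself and depends on the refinement tagging, so it is simply
divided across the node here):

.. code-block:: python

   # Hypothetical sanity check: divide the reported initial cell count across
   # the 8 GCDs of a Crusher node (one MPI rank per GCD).
   base_cells = 64 * 64 * 64      # amr.n_cell on the coarse level
   total_cells = 6.545e6          # reported initial cell count over the 3 AMR levels
   gcds_per_node = 8              # 4 MI250X x 2 GCDs per node

   print(f"coarse-level cells: {base_cells:,}")
   print(f"cells per GCD     : {total_cells / gcds_per_node / 1e6:.2f} M")
   # -> about 0.82 M cells per GCD, i.e. the ~0.8 M cells per GPU quoted above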

The graph below compares the timings of the two runs obtained from `AMReX` TinyProfiler.
Inclusive averaged data are presented here for separate portions of the `PeleLMeX` algorithm
(see the `algorithm page <https://amrex-combustion.github.io/PeleLMeX/manual/html/Model.html#pelelmex-algorithm>`_ for more
details):


.. figure:: images/performances/PMF/SingleNodePMF_Crusher.png
:align: center
:figwidth: 90%

The total time comparison shows more than a 7.5x speed-up on a node basis on this platform.
The detailed distribution of the computational time within each run highlights the dominant contribution
of the stiff chemistry integration, especially on the GPU.

Results on Summit (ORNL)
^^^^^^^^^^^^^^^^^^^^^^^^

Summit was launched in 2018 as DOE's first fully GPU-accelerated platform.
Summit's `nodes <https://docs.olcf.ornl.gov/systems/summit_user_guide.html#summit-nodes>`_ consist
of two IBM Power9 CPUs connected to 6 NVIDIA V100 GPUs. When running with GPU acceleration, `PeleLMeX`
uses 6 MPI ranks, each with access to one V100, while flat MPI runs use 42 MPI ranks.
Note that, in contrast with the newer GPUs available on Perlmutter or Crusher, Summit's V100s only have 16 GB of
memory, which limits the number of cells per GPU. For this reason, the chemical linear solver used within Sundials is
switched to the less memory-demanding *cuSparse* solver:

::

cvode.solve_type = sparse_direct

The FlameSheet case is run using 2 levels of refinement (3 levels total) and the following domain size and cell count:

::

geometry.prob_lo = 0.0 0.0 0.0 # x_lo y_lo (z_lo)
geometry.prob_hi = 0.004 0.008 0.016 # x_hi y_hi (z_hi)

amr.n_cell = 16 32 64
amr.max_level = 2

leading to an initial cell count of 0.819 M, i.e. about 0.136 M cells per GPU. The git hashes of `PeleLMeX` and its dependencies for
these tests are:

::

================= Build infos =================
PeleLMeX git hash: v22.12-15-g769168c-dirty
AMReX git hash: 22.12-15-gff1cce552-dirty
PelePhysics git hash: v0.1-1054-gd6733fef
AMReX-Hydro git hash: cc9b82d
===============================================

The graph below compares the timings of the two runs obtained from `AMReX` TinyProfiler.
Inclusive averaged data are presented here for separate portions of the `PeleLMeX` algorithm
(see the `algorithm page <https://amrex-combustion.github.io/PeleLMeX/manual/html/Model.html#pelelmex-algorithm>`_ for more
details):


.. figure:: images/performances/PMF/SingleNodePMF_Summit.png
:align: center
:figwidth: 90%

The total time comparison shows close to a 4.5x speed-up on a node basis on this platform.
The detailed distribution of the computational time within each run highlights the dominant contribution
of the stiff chemistry integration.

System comparison
^^^^^^^^^^^^^^^^^

It is interesting to compare the performance of each system on a node basis, normalizing by the number of cells
to obtain a node time per million cells (a minimal sketch of this normalization is given below).
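
A sketch of how this normalization can be computed (the node times below are
placeholders rather than the measured values; the cell counts are the ones quoted in the
sections above):

.. code-block:: python

   # Hypothetical example of the "node time per million cells" metric.
   # The wall-clock times are placeholders, NOT measured data; substitute the
   # total times reported by AMReX TinyProfiler for each single-node run.
   cases = {
       # name                  (total cells, node time [s], placeholder)
       "Summit  (6 V100)":     (0.819e6, 100.0),
       "Crusher (8 GCD)":      (6.545e6, 400.0),
   }

   for name, (cells, node_time) in cases.items():
       per_mcell = node_time / (cells / 1e6)
       print(f"{name}: {per_mcell:.1f} s per million cells")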

.. figure:: images/performances/PMF/SingleNodePMFComparison.png
:align: center
:figwidth: 60%


Results show that 3x and 4.2x speed-ups are obtained on a node basis going from Summit to the more recent
Perlmutter and Crusher, respectively.
2 changes: 1 addition & 1 deletion Docs/source/manual/index.rst
@@ -7,7 +7,7 @@ Welcome to PeleLMeX's documentation!
====================================

`PeleLMeX` is the non-subcycling version of `PeleLM <https://amrex-combustion.github.io/PeleLM/>`_, an adaptive-mesh low Mach number hydrodynamics
code for reacting flows. If you need help or have questions, please join the users `forum <https://groups.google.com/forum/#!forum/pelelmusers>`_.
code for reacting flows. If you need help or have questions, please use the `GitHub discussion <https://github.com/AMReX-Combustion/PeleLMeX/discussions>`_.
The documentation pages appearing here are distributed with the code in the ``Docs`` folder as "restructured text" files. The html is built
automatically with certain pushes to the `PeleLMeX` GitHub repository. A local version can also be built as follows ::
