1 change: 1 addition & 0 deletions docs/platforms/platforms_index.rst
@@ -210,6 +210,7 @@ libEnsemble on specific HPC systems.
bebop
cori
perlmutter
polaris
spock/crusher <spock_crusher>
summit
theta
109 changes: 109 additions & 0 deletions docs/platforms/polaris.rst
@@ -0,0 +1,109 @@
=======
Polaris
=======

Polaris_ is a 560-node HPE system located in the ALCF_ at Argonne
National Laboratory. Each compute node is equipped with a single AMD EPYC Milan
processor and four NVIDIA A100 GPUs. Polaris uses the PBS scheduler to submit
jobs from the login nodes to run on the compute nodes.


Configuring Python and Installation
-----------------------------------

Python and libEnsemble are available on Polaris with the conda_ module. Load the
``conda`` module and activate the base environment::

module load conda
conda activate base

This also gives you access to machine-optimized packages such as mpi4py_.

To install further packages, including updating libEnsemble, you may either create
a virtual environment on top of the base environment (if you only need ``pip install``)
or clone the base environment (if you need ``conda install``). More details can be
found at `Python for Polaris`_.

.. container:: toggle

.. container:: header

Example of Conda + virtual environment

For example, to create a virtual environment that allows installation of further packages::

python -m venv /path/to-venv --system-site-packages
. /path/to-venv/bin/activate

Where ``/path/to-venv`` can be anywhere you have write access. For future sessions,
just load the ``conda`` module and run the activate line.
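
If you need ``conda install`` rather than ``pip``, a cloned conda environment can be used
instead of the virtual environment. A minimal sketch (the environment location is just an
example)::

    conda create --prefix /path/to/conda-env --clone base
    conda activate /path/to/conda-env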

You can now pip install libEnsemble::

pip install libensemble

See :doc:`here<../advanced_installation>` for more information on advanced options
for installing libEnsemble, including using Spack.


Ensuring use of mpiexec
-----------------------

If using the :doc:`MPIExecutor<../executor/mpi_executor>`, it is recommended to
ensure you are using ``mpiexec`` instead of ``aprun``. When setting up the executor, use::

from libensemble.executors.mpi_executor import MPIExecutor
exctr = MPIExecutor(custom_info={'mpi_runner':'mpich', 'runner_name':'mpiexec'})
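
A rough sketch of how the executor might then be used (the application path, name, and
arguments below are placeholders, and the exact registration call may vary between
libEnsemble versions)::

    from libensemble.executors.mpi_executor import MPIExecutor

    exctr = MPIExecutor(custom_info={'mpi_runner': 'mpich', 'runner_name': 'mpiexec'})

    # Register a compiled application so the sim function can launch it by name
    exctr.register_app(full_path='/path/to/forces.x', app_name='forces')

    # Inside the simulation function, a run is then launched with, e.g.:
    # task = exctr.submit(app_name='forces', num_procs=4, app_args='1000')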


Job Submission
--------------

Polaris uses the PBS scheduler to submit jobs from login nodes to run on
the compute nodes. libEnsemble runs on the compute nodes using either
``multiprocessing`` or ``mpi4py``.

A simple example batch script for a libEnsemble use case that runs five workers
(e.g., one persistent generator and four workers for simulations) on one node:

.. code-block:: bash
:linenos:

#!/bin/bash
#PBS -A <myproject>
#PBS -lwalltime=00:15:00
#PBS -lselect=1
#PBS -q debug
#PBS -lsystem=polaris
#PBS -lfilesystems=home:grand

export MPICH_GPU_SUPPORT_ENABLED=1

cd $PBS_O_WORKDIR

python run_libe_forces.py --comms local --nworkers 5

The script can be run with::

qsub submit_libe.sh

Or you can run an interactive session with::

qsub -A <myproject> -l select=1 -l walltime=15:00 -lfilesystems=home:grand -qdebug -I

You may need to reload your ``conda`` module and reactivate your ``venv`` environment
after starting the interactive session.

Demonstration
-------------

For an example that runs a small ensemble using a C application (offloading work to the
GPU), see the :doc:`forces_gpu<../tutorials/forces_gpu_tutorial>` tutorial. A video demonstration_
of this example is also available.


.. _Polaris: https://www.alcf.anl.gov/polaris
.. _ALCF: https://www.alcf.anl.gov/
.. _Python for Polaris: https://www.alcf.anl.gov/support/user-guides/polaris/data-science-workflows/python/index.html
.. _conda: https://conda.io/en/latest/
.. _mpi4py: https://mpi4py.readthedocs.io/en/stable/
.. _demonstration: https://youtu.be/Ff0dYYLQzoU
84 changes: 43 additions & 41 deletions docs/tutorials/forces_gpu_tutorial.rst
@@ -3,36 +3,28 @@ Executor - Assign GPUs
======================

This tutorial shows the most portable way to assign tasks (user applications)
to the GPU.
to the GPU. The libEnsemble scripts in this example are available under
forces_gpu_ in the libEnsemble repository.

This example is based on the
:doc:`simple forces tutorial <../tutorials/executor_forces_tutorial>` with
a slightly modified simulation function (to assign GPUs) and an increased number
of particles (allows live GPU usage to be viewed).

In the first example, each worker will be using one GPU. We assume the workers are on a
cluster with CUDA-capable GPUs. We will assign GPUs by setting the environment
variable ``CUDA_VISIBLE_DEVICES``. An equivalent approach can be used with other
devices.

This example is based on the
:doc:`simple forces tutorial <../tutorials/executor_forces_tutorial>` with
a slightly modified simulation function.

To compile the forces application to use the GPU, ensure forces.c_ has the
``#pragma omp target`` line uncommented and comment out the equivalent
``#pragma omp parallel`` line. Then compile **forces.x** using one of the
GPU build lines in build_forces.sh_ or similar for your platform.

The libEnsemble scripts in this example are available under forces_gpu_ in
the libEnsemble repository.

Note that at the time of writing, the calling script **run_libe_forces.py** is functionally
the same as that in *forces_simple*, but contains some commented out lines that can
be used for a variable resources example. The *forces_simf.py* file has slight modifications
to assign GPUs.

Videos demonstrate running this example on Perlmutter_ and Spock_.
Videos demonstrate running this example on Perlmutter_, Spock_, and Polaris_.
*(The first two videos are from an earlier release - you no longer need to change
the particle count or modify the `forces.c` file.)*

Simulation function
-------------------

The ``sim_f`` (``forces_simf.py``) becomes as follows. The new lines are highlighted:
The ``sim_f`` (``forces_simf.py``) is as follows. The lines that are different
to the forces simple example are highlighted:

.. code-block:: python
:linenos:
@@ -132,6 +124,16 @@ Alternative environment variables can be simply substituted in ``set_env_to_slot
On some systems ``CUDA_VISIBLE_DEVICES`` may be overridden by other assignments
such as ``--gpus-per-task=1``


Compiling the Forces application
--------------------------------

First, compile the forces application under the ``forces_app`` directory.

Compile **forces.x** using one of the GPU build lines in build_forces.sh_
or similar for your platform.
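
For illustration only, on a system with the NVIDIA HPC SDK an OpenMP target-offload build
might look something like the following (use the actual line from build_forces.sh_ for your
platform, which may include additional flags)::

    nvc -O3 -mp=gpu -o forces.x forces.c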


Running the example
-------------------

@@ -141,22 +143,22 @@ eight workers. For example::
python run_libe_forces.py --comms local --nworkers 8

Note that if you are running one persistent generator that does not require
resources, then assign nine workers, and fix the number of *resource_sets* in
you calling script::
resources, then assign nine workers and fix the number of *resource_sets* in
your calling script::

libE_specs["num_resource_sets"] = 8

See :ref:`zero resource workers<zero_resource_workers>` for more ways to express this.

Changing number of GPUs per worker
----------------------------------
Changing the number of GPUs per worker
--------------------------------------

If you want to have two GPUs per worker on the same system (four GPUs per node),
you could assign only four workers, and change line 24 to::

resources.set_env_to_slots("CUDA_VISIBLE_DEVICES", multiplier=2)

In this case there are two GPUs per worker (and per slot).
In this case, there are two GPUs per worker (and per slot).
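
For reference, a minimal sketch of how the worker's resources object is obtained and used
inside the simulation function (based on the forces_gpu example; the function name is
illustrative)::

    from libensemble.resources.resources import Resources

    def run_forces(H, persis_info, sim_specs, libE_info):
        # Resources assigned to the worker running this simulation
        resources = Resources.resources.worker_resources

        # Assign this worker's GPU slots (two GPUs per slot in this variant)
        resources.set_env_to_slots("CUDA_VISIBLE_DEVICES", multiplier=2)
        ...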

Varying resources
-----------------
@@ -167,25 +169,29 @@ calling script.

In the generator function, assign the ``resource_sets`` field of
:ref:`H<funcguides-history>` for each point generated. For example
if a larger simulation requires two MPI tasks (and two GPUs), set ``resource_sets``
if a larger simulation requires two MPI tasks (and two GPUs), set the ``resource_sets``
field to *2* for that sim_id in the generator function.
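
As a rough illustration (assuming ``gen_specs["out"]`` includes a ``("resource_sets", int)``
field; the other names are placeholders), a generator might set this per point as follows::

    import numpy as np

    # Inside the generator: build a batch of points to return
    H_o = np.zeros(batch_size, dtype=gen_specs["out"])
    H_o["x"] = sampled_points

    # Default to one resource set, but request two (two MPI tasks / two GPUs)
    # for the larger simulations
    H_o["resource_sets"] = 1
    H_o["resource_sets"][large_sim_indices] = 2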

The calling script run_libe_forces.py_ contains alternative commented out lines for
The calling script run_libe_forces.py_ contains alternative commented-out lines for
a variable resource example. Search for "Uncomment for var resources".

In this case, the simulator function will still work, assigning one CPU processor
and one GPU to each MPI rank. If you want to have one rank with multiple GPUs,
then change source lines 29/30 accordingly.

Further guidance on varying resource to workers can be found under the
:doc:`resource manager<../resource_manager/resources_index>`.
Further guidance on varying the resources assigned to workers can be found under the
:doc:`resource manager<../resource_manager/resources_index>` section.

Checking GPU usage
------------------

You can check you are running forces on the GPUs as expected by using profiling tools and/or by using
a monitoring utility. For NVIDIA GPUs, for example, the **Nsight** profiler is generally available
and can be run from the command line. To simply run `forces.x` stand-alone you could run::
The output of `forces.x` states whether it ran on the host or the device. When running
via libEnsemble, this output can be found under the ``ensemble`` directory.

You can check you are running forces on the GPUs as expected by using profiling tools and/or
by using a monitoring utility. For NVIDIA GPUs, for example, the **Nsight** profiler is
generally available and can be run from the command line. To simply run `forces.x` stand-alone
you could run::

nsys profile --stats=true mpirun -n 2 ./forces.x

@@ -195,22 +201,17 @@ running (this may entail using *ssh* to get on to the node), and run::
watch -n 0.1 nvidia-smi

This will update GPU usage information every 0.1 seconds. You would need to ensure the code
runs for long enough to register on the monitor, so lets try 100,000 particles::
runs for long enough to register on the monitor, so let's try 100,000 particles::

mpirun -n 2 ./forces.x 100000

It is also recommended that you run without the profiler when using the `nvidia-smi` utility.

This can also be used when running via libEnsemble, so long as you are on the node where the
forces applications are being run. As the default particles in the forces example is 1000, you
will need to to increase particles to see clear GPU usage in the live monitor. E.g.,~ in line 14
to multiply the particles by 10::

# Parse out num particles, from generator function
particles = str(int(H["x"][0][0]) * 10)
forces applications are being run.

Alternative monitoring devices include ``rocm-smi`` (AMD) and ``intel_gpu_top`` (Intel). The latter
does not need the *watch* command.
Alternative monitoring devices include ``rocm-smi`` (AMD) and ``intel_gpu_top`` (Intel).
The latter does not need the *watch* command.

Example submission script
-------------------------
@@ -242,4 +243,5 @@ resource conflicts on each node.
.. _build_forces.sh: https://github.com/Libensemble/libensemble/blob/develop/libensemble/tests/scaling_tests/forces/forces_app/build_forces.sh
.. _Perlmutter: https://www.youtube.com/watch?v=Av8ctYph7-Y
.. _Spock: https://www.youtube.com/watch?v=XHXcslDORjU
.. _Polaris: https://youtu.be/Ff0dYYLQzoU
.. _run_libe_forces.py: https://github.com/Libensemble/libensemble/blob/develop/libensemble/tests/scaling_tests/forces/forces_gpu/run_libe_forces.py
10 changes: 10 additions & 0 deletions examples/libE_submission_scripts/submit_pbs_simple.sh
@@ -0,0 +1,10 @@
#!/bin/bash -l
#PBS -l select=2
#PBS -l walltime=00:15:00
#PBS -q <queue_name>
#PBS -A <myproject>

# We selected 2 nodes - now running with 8 workers.
export MPICH_GPU_SUPPORT_ENABLED=1
cd $PBS_O_WORKDIR
python run_libe_forces.py --comms local --nworkers 8
@@ -0,0 +1,10 @@
#!/bin/bash -l
#PBS -l select=1:system=polaris
#PBS -l walltime=00:15:00
#PBS -l filesystems=home:grand
#PBS -q debug
#PBS -A <myproject>

export MPICH_GPU_SUPPORT_ENABLED=1
cd $PBS_O_WORKDIR
python run_libe_forces.py --comms local --nworkers 4