1 change: 1 addition & 0 deletions docs/platforms/platforms_index.rst
@@ -210,6 +210,7 @@ libEnsemble on specific HPC systems.
bebop
cori
perlmutter
polaris
spock/crusher <spock_crusher>
summit
theta
109 changes: 109 additions & 0 deletions docs/platforms/polaris.rst
@@ -0,0 +1,109 @@
=======
Polaris
=======

Polaris_ is a 560-node HPE system located in the ALCF_ at Argonne
National Laboratory. Each compute node is equipped with a single AMD EPYC Milan
processor and four NVIDIA A100 GPUs. Polaris uses the PBS scheduler to submit
jobs from the login nodes to run on the compute nodes.


Configuring Python and Installation
-----------------------------------

Python and libEnsemble are available on Polaris with the conda_ module. Load the
``conda`` module and activate the base environment::

module load conda
conda activate base

This also gives you access to machine-optimized packages such as mpi4py_.

To install further packages, including updating libEnsemble, you may either create
a virtual environment on top of the base environment (if you only need ``pip install``)
or clone the base environment (if you need ``conda install``). More details can be
found at `Python for Polaris`_.

.. container:: toggle

.. container:: header

Example of Conda + virtual environment

For example, to create a virtual environment that allows installation of further packages::

python -m venv /path/to-venv --system-site-packages
. /path/to-venv/bin/activate

Where ``/path/to-venv`` can be anywhere you have write access. For future sessions,
just load the ``conda`` module and run the activate line.
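
If you need ``conda install`` rather than ``pip``, a cloned conda environment can be used
instead of the virtual environment. A minimal sketch (the environment location is just an
example)::

    conda create --prefix /path/to/conda-env --clone base
    conda activate /path/to/conda-env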

You can now pip install libEnsemble::

pip install libensemble

See :doc:`here<../advanced_installation>` for more information on advanced options
for installing libEnsemble, including using Spack.


Ensuring use of mpiexec
-----------------------

If using the :doc:`MPIExecutor<../executor/mpi_executor>`, it is recommended to
ensure you are using ``mpiexec`` instead of ``aprun``. When setting up the executor, use::

from libensemble.executors.mpi_executor import MPIExecutor
exctr = MPIExecutor(custom_info={'mpi_runner':'mpich', 'runner_name':'mpiexec'})
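
A rough sketch of how the executor might then be used (the application path, name, and
arguments below are placeholders, and the exact registration call may vary between
libEnsemble versions)::

    from libensemble.executors.mpi_executor import MPIExecutor

    exctr = MPIExecutor(custom_info={'mpi_runner': 'mpich', 'runner_name': 'mpiexec'})

    # Register a compiled application so the sim function can launch it by name
    exctr.register_app(full_path='/path/to/forces.x', app_name='forces')

    # Inside the simulation function, a run is then launched with, e.g.:
    # task = exctr.submit(app_name='forces', num_procs=4, app_args='1000')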


Job Submission
--------------

Polaris uses the PBS scheduler to submit jobs from login nodes to run on
the compute nodes. libEnsemble runs on the compute nodes using either
``multiprocessing`` or ``mpi4py``.

A simple example batch script for a libEnsemble use case that runs five workers
(e.g., one persistent generator and four workers for simulations) on one node:

.. code-block:: bash
:linenos:

#!/bin/bash
#PBS -A <myproject>
#PBS -lwalltime=00:15:00
#PBS -lselect=1
#PBS -q debug
#PBS -lsystem=polaris
#PBS -lfilesystems=home:grand

export MPICH_GPU_SUPPORT_ENABLED=1

cd $PBS_O_WORKDIR

python run_libe_forces.py --comms local --nworkers 5

The script can be run with::

qsub submit_libe.sh

Or you can run an interactive session with::

qsub -A <myproject> -l select=1 -l walltime=15:00 -lfilesystems=home:grand -qdebug -I

You may need to reload your ``conda`` module and reactivate your ``venv`` environment
after starting the interactive session.

Demonstration
-------------

For an example that runs a small ensemble using a C application (offloading work to the
GPU), see the :doc:`forces_gpu<../tutorials/forces_gpu_tutorial>` tutorial. A video demonstration_
of this example is also available.


.. _Polaris: https://www.alcf.anl.gov/polaris
.. _ALCF: https://www.alcf.anl.gov/
.. _Python for Polaris: https://www.alcf.anl.gov/support/user-guides/polaris/data-science-workflows/python/index.html
.. _conda: https://conda.io/en/latest/
.. _mpi4py: https://mpi4py.readthedocs.io/en/stable/
.. _demonstration: https://youtu.be/Ff0dYYLQzoU
84 changes: 43 additions & 41 deletions docs/tutorials/forces_gpu_tutorial.rst
@@ -3,36 +3,28 @@ Executor - Assign GPUs
======================

This tutorial shows the most portable way to assign tasks (user applications)
to the GPU.
to the GPU. The libEnsemble scripts in this example are available under
forces_gpu_ in the libEnsemble repository.

This example is based on the
:doc:`simple forces tutorial <../tutorials/executor_forces_tutorial>` with
a slightly modified simulation function (to assign GPUs) and an increased number
of particles (allows live GPU usage to be viewed).

In the first example, each worker will be using one GPU. We assume the workers are on a
cluster with CUDA-capable GPUs. We will assign GPUs by setting the environment
variable ``CUDA_VISIBLE_DEVICES``. An equivalent approach can be used with other
devices.

This example is based on the
:doc:`simple forces tutorial <../tutorials/executor_forces_tutorial>` with
a slightly modified simulation function.

To compile the forces application to use the GPU, ensure forces.c_ has the
``#pragma omp target`` line uncommented and comment out the equivalent
``#pragma omp parallel`` line. Then compile **forces.x** using one of the
GPU build lines in build_forces.sh_ or similar for your platform.

The libEnsemble scripts in this example are available under forces_gpu_ in
the libEnsemble repository.

Note that at the time of writing, the calling script **run_libe_forces.py** is functionally
the same as that in *forces_simple*, but contains some commented out lines that can
be used for a variable resources example. The *forces_simf.py* file has slight modifications
to assign GPUs.

Videos demonstrate running this example on Perlmutter_ and Spock_.
Videos demonstrate running this example on Perlmutter_, Spock_, and Polaris_.
*(The first two videos are from an earlier release - you no longer need to change
the particle count or modify the `forces.c` file.)*

Simulation function
-------------------

The ``sim_f`` (``forces_simf.py``) becomes as follows. The new lines are highlighted:
The ``sim_f`` (``forces_simf.py``) is as follows. The lines that are different
to the forces simple example are highlighted:

.. code-block:: python
:linenos:
@@ -132,6 +124,16 @@ Alternative environment variables can be simply substituted in ``set_env_to_slot
On some systems ``CUDA_VISIBLE_DEVICES`` may be overridden by other assignments
such as ``--gpus-per-task=1``


Compiling the Forces application
--------------------------------

First, compile the forces application under the ``forces_app`` directory.

Compile **forces.x** using one of the GPU build lines in build_forces.sh_
or similar for your platform.
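
For illustration only, on a system with the NVIDIA HPC SDK an OpenMP target-offload build
might look something like the following (use the actual line from build_forces.sh_ for your
platform, which may include additional flags)::

    nvc -O3 -mp=gpu -o forces.x forces.c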


Running the example
-------------------

@@ -141,22 +143,22 @@ eight workers. For example::
python run_libe_forces.py --comms local --nworkers 8

Note that if you are running one persistent generator that does not require
resources, then assign nine workers, and fix the number of *resource_sets* in
you calling script::
resources, then assign nine workers and fix the number of *resource_sets* in
your calling script::

libE_specs["num_resource_sets"] = 8

See :ref:`zero resource workers<zero_resource_workers>` for more ways to express this.

Changing number of GPUs per worker
----------------------------------
Changing the number of GPUs per worker
--------------------------------------

If you want to have two GPUs per worker on the same system (four GPUs per node),
you could assign only four workers, and change line 24 to::

resources.set_env_to_slots("CUDA_VISIBLE_DEVICES", multiplier=2)

In this case there are two GPUs per worker (and per slot).
In this case, there are two GPUs per worker (and per slot).
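
For reference, a minimal sketch of how the worker's resources object is obtained and used
inside the simulation function (based on the forces_gpu example; the function name is
illustrative)::

    from libensemble.resources.resources import Resources

    def run_forces(H, persis_info, sim_specs, libE_info):
        # Resources assigned to the worker running this simulation
        resources = Resources.resources.worker_resources

        # Assign this worker's GPU slots (two GPUs per slot in this variant)
        resources.set_env_to_slots("CUDA_VISIBLE_DEVICES", multiplier=2)
        ...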

Varying resources
-----------------
@@ -167,25 +169,29 @@ calling script.

In the generator function, assign the ``resource_sets`` field of
:ref:`H<funcguides-history>` for each point generated. For example
if a larger simulation requires two MPI tasks (and two GPUs), set ``resource_sets``
if a larger simulation requires two MPI tasks (and two GPUs), set the ``resource_sets``
field to *2* for that sim_id in the generator function.
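
As a rough illustration (assuming ``gen_specs["out"]`` includes a ``("resource_sets", int)``
field; the other names are placeholders), a generator might set this per point as follows::

    import numpy as np

    # Inside the generator: build a batch of points to return
    H_o = np.zeros(batch_size, dtype=gen_specs["out"])
    H_o["x"] = sampled_points

    # Default to one resource set, but request two (two MPI tasks / two GPUs)
    # for the larger simulations
    H_o["resource_sets"] = 1
    H_o["resource_sets"][large_sim_indices] = 2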

The calling script run_libe_forces.py_ contains alternative commented out lines for
The calling script run_libe_forces.py_ contains alternative commented-out lines for
a variable resource example. Search for "Uncomment for var resources".

In this case, the simulator function will still work, assigning one CPU processor
and one GPU to each MPI rank. If you want to have one rank with multiple GPUs,
then change source lines 29/30 accordingly.

Further guidance on varying resource to workers can be found under the
:doc:`resource manager<../resource_manager/resources_index>`.
Further guidance on varying the resources assigned to workers can be found under the
:doc:`resource manager<../resource_manager/resources_index>` section.

Checking GPU usage
------------------

You can check you are running forces on the GPUs as expected by using profiling tools and/or by using
a monitoring utility. For NVIDIA GPUs, for example, the **Nsight** profiler is generally available
and can be run from the command line. To simply run `forces.x` stand-alone you could run::
The output of `forces.x` states whether it ran on the host or the device. When running
via libEnsemble, this output can be found under the ``ensemble`` directory.

You can check you are running forces on the GPUs as expected by using profiling tools and/or
by using a monitoring utility. For NVIDIA GPUs, for example, the **Nsight** profiler is
generally available and can be run from the command line. To simply run `forces.x` stand-alone
you could run::

nsys profile --stats=true mpirun -n 2 ./forces.x

@@ -195,22 +201,17 @@ running (this may entail using *ssh* to get on to the node), and run::
watch -n 0.1 nvidia-smi

This will update GPU usage information every 0.1 seconds. You would need to ensure the code
runs for long enough to register on the monitor, so lets try 100,000 particles::
runs for long enough to register on the monitor, so let's try 100,000 particles::

mpirun -n 2 ./forces.x 100000

It is also recommended that you run without the profiler when using the `nvidia-smi` utility.

This can also be used when running via libEnsemble, so long as you are on the node where the
forces applications are being run. As the default particles in the forces example is 1000, you
will need to to increase particles to see clear GPU usage in the live monitor. E.g.,~ in line 14
to multiply the particles by 10::

# Parse out num particles, from generator function
particles = str(int(H["x"][0][0]) * 10)
forces applications are being run.

Alternative monitoring devices include ``rocm-smi`` (AMD) and ``intel_gpu_top`` (Intel). The latter
does not need the *watch* command.
Alternative monitoring devices include ``rocm-smi`` (AMD) and ``intel_gpu_top`` (Intel).
The latter does not need the *watch* command.

Example submission script
-------------------------
@@ -242,4 +243,5 @@ resource conflicts on each node.
.. _build_forces.sh: https://github.com/Libensemble/libensemble/blob/develop/libensemble/tests/scaling_tests/forces/forces_app/build_forces.sh
.. _Perlmutter: https://www.youtube.com/watch?v=Av8ctYph7-Y
.. _Spock: https://www.youtube.com/watch?v=XHXcslDORjU
.. _Polaris: https://youtu.be/Ff0dYYLQzoU
.. _run_libe_forces.py: https://github.com/Libensemble/libensemble/blob/develop/libensemble/tests/scaling_tests/forces/forces_gpu/run_libe_forces.py
10 changes: 10 additions & 0 deletions examples/libE_submission_scripts/submit_pbs_simple.sh
@@ -0,0 +1,10 @@
#!/bin/bash -l
#PBS -l select=2
#PBS -l walltime=00:15:00
#PBS -q <queue_name>
#PBS -A <myproject>

# We selected 2 nodes - now running with 8 workers.
export MPICH_GPU_SUPPORT_ENABLED=1
cd $PBS_O_WORKDIR
python run_libe_forces.py --comms local --nworkers 8
@@ -0,0 +1,10 @@
#!/bin/bash -l
#PBS -l select=1:system=polaris
#PBS -l walltime=00:15:00
#PBS -l filesystems=home:grand
#PBS -q debug
#PBS -A <myproject>

export MPICH_GPU_SUPPORT_ENABLED=1
cd $PBS_O_WORKDIR
python run_libe_forces.py --comms local --nworkers 4