Commit

Merge 0b8af7d into 5ffea25
shuds13 committed Nov 19, 2019
2 parents 5ffea25 + 0b8af7d commit f450f46
Showing 12 changed files with 273 additions and 62 deletions.
6 changes: 3 additions & 3 deletions docs/FAQ.rst
Original file line number Diff line number Diff line change
Expand Up @@ -126,9 +126,9 @@ to ``pdb``. How well this works varies by system::

**Can I use the MPI Job Controller when running libEnsemble with multiprocessing?**

Actually, yes! The job controller type only determines how launched jobs communicate
with libEnsemble, and is independent of ``comms`` chosen for manager-worker
communications.
Actually, yes! The job controller type only determines how libEnsemble workers
launch and interact with user applications, and is independent of ``comms`` chosen
for manager-worker communications.

macOS-specific Errors
---------------------
Expand Down
9 changes: 9 additions & 0 deletions docs/history_output.rst
Original file line number Diff line number Diff line change
Expand Up @@ -53,3 +53,12 @@ Other libEnsemble files produced by default are:
set to DEBUG. If this file is not removed, multiple runs will append output.
Messages at or above level MANAGER_WARNING are also copied to stderr to alert
the user promptly. For more info, see :doc:`Logging<logging>`.

Output Analysis
^^^^^^^^^^^^^^^
The ``postproc_scripts`` directory, in the libEnsemble project root directory,
contains scripts to compare outputs and create plots based on the ensemble output.

.. include:: ../postproc_scripts/readme.rst

.. include:: ../postproc_scripts/balsam/readme.rst
1 change: 0 additions & 1 deletion docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,6 @@
Quickstart<introduction>
overview_usecases
programming_libE
utilities
platforms/platforms_index

.. toctree::
Expand Down
1 change: 1 addition & 0 deletions docs/platforms/platforms_index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ Read more about configuring and launching libEnsemble on some HPC systems:

bebop
theta
summit
example_scripts

.. _Balsam: https://balsam.readthedocs.io/en/latest/
136 changes: 136 additions & 0 deletions docs/platforms/summit.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,136 @@
======
Summit
======

Summit_ is an IBM AC922 system located at the Oak Ridge Leadership Computing Facility.
Each of the approximately 4,600 compute nodes on Summit contains two IBM POWER9 processors and six NVIDIA Volta V100 accelerators.

Summit features three tiers of nodes: login, launch, and compute nodes.
Users on login nodes submit batch runs to the launch nodes.
Launch nodes execute user batch-scripts to run on the compute nodes via ``jsrun``.

Configuring Python
------------------

Begin by loading the Python 3 Anaconda module::

    $ module load python

You can now create your own custom Conda_ environment::

    conda create --name my_env python=3.7

Now activate the environment::

    export PYTHONNOUSERSITE=1 # Make sure to get Python from the conda env
    . activate my_env
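
As a quick sanity check after activation, the following minimal sketch reports
which interpreter is running and whether user site-packages are disabled. It is
not part of libEnsemble. It assumes that activating a Conda environment sets the
``CONDA_PREFIX`` environment variable, which Conda normally does; the function
name ``python_env_report`` is hypothetical.

```python
import os
import sys


def python_env_report():
    """Summarize which Python is running and how user site-packages are handled."""
    # Assumption: conda sets CONDA_PREFIX when an environment is activated.
    conda_prefix = os.environ.get("CONDA_PREFIX")
    in_conda_env = conda_prefix is not None and (
        os.path.realpath(sys.prefix) == os.path.realpath(conda_prefix)
    )
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "conda_prefix": conda_prefix,
        # True only if the interpreter's prefix matches the active conda env
        "in_conda_env": in_conda_env,
        # True when PYTHONNOUSERSITE is set (or Python was started with -s)
        "user_site_disabled": bool(sys.flags.no_user_site),
    }


if __name__ == "__main__":
    for key, value in python_env_report().items():
        print(f"{key}: {value}")
```

If ``in_conda_env`` or ``user_site_disabled`` is reported as ``False``, the
activation steps above may not have taken effect in the current shell.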

If you are installing any packages with extensions, ensure the correct compiler module
is loaded. If using mpi4py_, it must be installed from source, referencing the compiler.
At the time of writing, mpi4py must be built with gcc::

    module load gcc

With your environment activated::

    CC=mpicc MPICC=mpicc pip install mpi4py --no-binary mpi4py
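
To confirm that the build succeeded, a short check along the following lines may
help. This is a sketch rather than an official procedure: the function name
``mpi4py_summary`` is hypothetical, and importing ``mpi4py.MPI`` initializes MPI,
which may not be permitted on every node type.

```python
def mpi4py_summary():
    """Return a one-line description of the mpi4py build, or a hint if it is missing."""
    try:
        from mpi4py import MPI  # importing this module initializes MPI by default
    except ImportError as err:
        return f"mpi4py is not importable: {err}"
    # Vendor string of the MPI library that mpi4py was linked against
    library = MPI.Get_library_version().strip().split("\n")[0]
    # Version of the MPI standard supported by that library
    major, minor = MPI.Get_version()
    return f"mpi4py OK: MPI standard {major}.{minor}, library: {library}"


if __name__ == "__main__":
    print(mpi4py_summary())
```

The library string should name the MPI implementation provided by the loaded
compiler and MPI modules; if it does not, the wheel was probably not built from
source as described above.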

Installing libEnsemble
----------------------

Obtaining libEnsemble is now as simple as ``pip install libensemble``.
Your prompt should be similar to the following line:

.. code-block:: console

    (my_env) user@login5:~$ pip install libensemble

.. note::
    If you encounter pip errors, run ``python -m pip install --upgrade pip`` first.

Job Submission
--------------

Summit uses LSF_ for job management and submission. For libEnsemble, the most
important command is ``bsub``, which submits batch scripts from the login nodes
to execute on the launch nodes.

It is recommended to run libEnsemble on the launch nodes (assuming workers are submitting
MPI jobs) using the ``local`` comms mode (multiprocessing). In the future, Balsam may be used
to run libEnsemble on compute nodes.

Interactive Runs
^^^^^^^^^^^^^^^^

Users can run interactively with ``bsub`` by specifying the ``-Is`` flag, similarly
to the following::

    $ bsub -W 30 -P [project] -nnodes 8 -Is

This will place the user on a launch node. Then, to launch MPI jobs to the compute
nodes, use ``jsrun`` where you would otherwise use ``mpirun``.

.. note::
    You will need to re-activate your conda virtual environment.

Batch Runs
^^^^^^^^^^

Batch scripts specify run settings using ``#BSUB`` statements. The following
simple example configures and launches libEnsemble on a launch node with
multiprocessing. This script also assumes the user is using the ``parse_args()``
convenience function within libEnsemble's ``utils.py``.

.. code-block:: bash

    #!/bin/bash -x
    #BSUB -P <project code>
    #BSUB -J libe_mproc
    #BSUB -W 60
    #BSUB -nnodes 128
    #BSUB -alloc_flags "smt1"

    # --- Prepare Python ---

    # Load conda module and gcc.
    module load python
    module load gcc

    # Name of Conda environment
    export CONDA_ENV_NAME=my_env

    # Activate Conda environment
    export PYTHONNOUSERSITE=1
    source activate $CONDA_ENV_NAME

    # --- Prepare libEnsemble ---

    # Name of calling script
    export EXE=calling_script.py

    # Communication Method
    export COMMS='--comms local'

    # Number of workers.
    export NWORKERS='--nworkers 128'

    hash -r # Check no commands hashed (pip/python...)

    # Launch libE
    python $EXE $COMMS $NWORKERS > out.txt 2>&1

With this saved as ``myscript.sh``, allocating, configuring, and queueing
libEnsemble on Summit becomes::

    $ bsub myscript.sh
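
For illustration, the flags that the batch script passes
(``--comms local --nworkers 128``) could be read in a calling script along these
lines. This is a standalone sketch using Python's ``argparse``, not libEnsemble's
own ``parse_args()`` implementation; the function name ``parse_libe_flags`` and
the default values are assumptions made for the example. The ``--comms`` choices
are taken from the usage text in the utilities documentation.

```python
import argparse


def parse_libe_flags(argv=None):
    """Parse --comms and --nworkers, returning any unrecognized arguments as well."""
    parser = argparse.ArgumentParser(description="Illustrative calling-script options")
    parser.add_argument(
        "--comms",
        choices=["local", "tcp", "ssh", "client", "mpi"],
        default="mpi",  # assumed default, for this example only
        help="manager-worker communication method",
    )
    parser.add_argument(
        "--nworkers",
        type=int,
        default=None,  # assumed default, for this example only
        help="number of workers",
    )
    # parse_known_args leaves unrecognized flags for the script to handle
    args, extra = parser.parse_known_args(argv)
    return args.comms, args.nworkers, extra


if __name__ == "__main__":
    comms, nworkers, extra = parse_libe_flags()
    print(f"comms={comms} nworkers={nworkers} extra={extra}")
```

Passing ``argv=None`` reads from ``sys.argv``, so the same function works both
from the batch script and from a test that supplies an explicit argument list.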

Additional Information
----------------------

See the OLCF guides_ for more information about Summit.

.. _Summit: https://www.olcf.ornl.gov/for-users/system-user-guides/summit/
.. _LSF: https://www.olcf.ornl.gov/wp-content/uploads/2018/12/summit_workshop_fuson.pdf
.. _guides: https://www.olcf.ornl.gov/for-users/system-user-guides/summit/
.. _Conda: https://conda.io/en/latest/
.. _mpi4py: https://mpi4py.readthedocs.io/en/stable/
8 changes: 4 additions & 4 deletions docs/platforms/theta.rst
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,8 @@ Theta_ is a Cray XC40 system based on the second-generation Intel
Xeon Phi processor, available within ALCF_ at Argonne National Laboratory.

Theta features three tiers of nodes: login, MOM (Machine-Oriented Mini-server),
and compute nodes. Users on login nodes submit batch runs to the MOM nodes.
MOM nodes execute user batch-scripts to run on the compute nodes.
and compute nodes. Users on login nodes submit batch jobs to the MOM nodes.
MOM nodes execute user batch-scripts to run on the compute nodes via ``aprun``.

Configuring Python
------------------
Expand Down Expand Up @@ -119,7 +119,7 @@ Interactive Runs
Users can run interactively with ``qsub`` by specifying the ``-I`` flag, similarly
to the following::

$ qsub -A [project] -n 128 -q default -t 120 -I
$ qsub -A [project] -n 8 -q debug-cache-quad -t 60 -I

This will place the user on a MOM node. Then, to launch MPI jobs to the compute
nodes use ``aprun`` where you would use ``mpirun``.
Expand Down Expand Up @@ -196,7 +196,7 @@ Here is an example Balsam submission script:
#COBALT -O libE_test
#COBALT -n 128
#COBALT -q default
##COBALT -A [project]
#COBALT -A [project]
# Name of calling script
export EXE=calling_script.py
Expand Down
3 changes: 3 additions & 0 deletions docs/programming_libE.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,3 +14,6 @@ Programming with libEnsemble
.. toctree::
job_controller/jc_index
logging

.. toctree::
utilities
41 changes: 4 additions & 37 deletions docs/utilities.rst
Original file line number Diff line number Diff line change
@@ -1,42 +1,9 @@
Utilities
=========

libEnsemble features several modules and tools to assist in writing consistent
libEnsemble features a utilities module to assist in writing consistent
calling scripts and user functions.

Input consistency
-----------------

Users can check the formatting and consistency of ``exit_criteria`` and each
``specs`` dictionary with the ``check_inputs()`` function from the ``utils``
module. Provide any combination of these data structures as keyword arguments.
For example::

    from libensemble.utils import check_inputs
    check_inputs(sim_specs=my_sim_specs, gen_specs=my_gen_specs, exit_criteria=ec)

Parameters as command-line arguments
------------------------------------

The ``parse_args()`` function can be used to pass common libEnsemble parameters as
command-line arguments.

In your calling script::

from libensemble.utils import parse_args
nworkers, is_master, libE_specs, misc_args = parse_args()

From the shell, for example::

$ python calling_script --comms local --nworkers 4

Usage:

.. code-block:: bash
usage: test_... [-h] [--comms [{local,tcp,ssh,client,mpi}]]
[--nworkers [NWORKERS]] [--workers WORKERS [WORKERS ...]]
[--workerID [WORKERID]] [--server SERVER SERVER SERVER]
[--pwd [PWD]] [--worker_pwd [WORKER_PWD]]
[--worker_python [WORKER_PYTHON]]
[--tester_args [TESTER_ARGS [TESTER_ARGS ...]]]
.. automodule:: utils
    :members:
    :no-undoc-members:
8 changes: 5 additions & 3 deletions libensemble/tests/scaling_tests/forces/summit_submit_mproc.sh
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,11 @@
# Name of calling script-
export EXE=run_libe_forces.py

# Communication Method
export COMMS="--comms local"

# Number of workers.
#export NUM_WORKERS=4 # Optional if pass to script
export NWORKERS="--nworkers 4"

# Wallclock for libE. Slightly smaller than job wallclock
#export LIBE_WALLCLOCK=15 # Optional if pass to script
Expand All @@ -40,8 +43,7 @@ hash -r # Check no commands hashed (pip/python...)

# Launch libE.
#python $EXE $NUM_WORKERS $LIBE_WALLCLOCK > out.txt 2>&1
#python $EXE $NUM_WORKERS > out.txt 2>&1
python $EXE > out.txt 2>&1
python $EXE $COMMS $NWORKERS > out.txt 2>&1

if [[ $LIBE_PLOTS = "true" ]]; then
python $PLOT_DIR/plot_libe_calcs_util_v_time.py
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,8 @@ export LIBE_WALLCLOCK=25
export WORKFLOW_NAME=libe_workflow #sh - todo - may currently be hardcoded to this in libE - allow user to specify

#Tell libE manager to stop workers, dump timing.dat and exit after this time. Script must be set up to receive as argument.
export SCRIPT_ARGS=$(($LIBE_WALLCLOCK-5))
export SCRIPT_ARGS="--comms mpi --nworkers $NUM_WORKERS"
# export SCRIPT_ARGS=$(($LIBE_WALLCLOCK-5))
# export SCRIPT_ARGS='' #Default No args

# Name of Conda environment (Need to have set up: https://balsam.alcf.anl.gov/quick/quickstart.html)
Expand Down
7 changes: 5 additions & 2 deletions libensemble/tests/scaling_tests/forces/theta_submit_mproc.sh
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,11 @@
# Name of calling script
export EXE=run_libe_forces.py

# Communication Method
export COMMS="--comms local"

# Number of workers.
#export NUM_WORKERS=4 # Optional if pass to script
export NWORKERS="--nworkers 4"

# Wallclock for libE (allow clean shutdown)
#export LIBE_WALLCLOCK=25 # Optional if pass to script
Expand Down Expand Up @@ -43,7 +46,7 @@ export PYTHONNOUSERSITE=1
# Launch libE
#python $EXE $NUM_WORKERS $LIBE_WALLCLOCK > out.txt 2>&1
#python $EXE $NUM_WORKERS > out.txt 2>&1
python $EXE > out.txt 2>&1
python $EXE $COMMS $NWORKERS > out.txt 2>&1

if [[ $LIBE_PLOTS = "true" ]]; then
python $PLOT_DIR/plot_libe_calcs_util_v_time.py
Expand Down
