Merge 4e9a1cf into 5ffea25
shuds13 committed Nov 18, 2019
2 parents 5ffea25 + 4e9a1cf commit 76a0747
Showing 10 changed files with 232 additions and 20 deletions.
6 changes: 3 additions & 3 deletions docs/FAQ.rst
@@ -126,9 +126,9 @@ to ``pdb``. How well this works varies by system::

**Can I use the MPI Job Controller when running libEnsemble with multiprocessing?**

Actually, yes! The job controller type only determines how launched jobs communicate
with libEnsemble, and is independent of ``comms`` chosen for manager-worker
communications.
Actually, yes! The job controller type only determines how libEnsemble workers
launch and interact with user applications, and is independent of ``comms`` chosen
for manager-worker communications.

macOS-specific Errors
---------------------
10 changes: 10 additions & 0 deletions docs/history_output.rst
@@ -53,3 +53,13 @@ Other libEnsemble files produced by default are:
set to DEBUG. If this file is not removed, multiple runs will append output.
Messages at or above level MANAGER_WARNING are also copied to stderr to alert
the user promptly. For more info, see :doc:`Logging<logging>`.

Output Analysis
^^^^^^^^^^^^^^^
The ``postproc_scripts`` directory, in the libEnsemble project root directory,
contains scripts to compare outputs and create plots based on the ensemble output.

.. include:: ../postproc_scripts/readme.rst

.. include:: ../postproc_scripts/balsam/readme.rst

1 change: 1 addition & 0 deletions docs/platforms/platforms_index.rst
@@ -43,6 +43,7 @@ Read more about configuring and launching libEnsemble on some HPC systems:

bebop
theta
summit
example_scripts

.. _Balsam: https://balsam.readthedocs.io/en/latest/
138 changes: 138 additions & 0 deletions docs/platforms/summit.rst
@@ -0,0 +1,138 @@
======
Summit
======

Summit_ is an IBM AC922 system located at the Oak Ridge Leadership Computing Facility.
Each of the approximately 4,600 compute nodes on Summit contains two IBM POWER9 processors and six NVIDIA Volta V100 accelerators.

Summit features three tiers of nodes: login, launch, and compute nodes.
Users on login nodes submit batch runs to the launch nodes.
Launch nodes execute user batch-scripts to run on the compute nodes via ``jsrun``.

Configuring Python
------------------

Begin by loading the Python 3 Anaconda module::

    $ module load python

You can now create your own custom Conda_ environment::

    conda create --name my_env python=3.7

Now activate the environment::

    export PYTHONNOUSERSITE=1 # Ensure Python comes from the conda env
    . activate my_env

If you are installing any packages with extensions, ensure the correct compiler module
is loaded. If using mpi4py_, this must be installed from source, referencing the compiler.
At the time of writing, mpi4py must be built with gcc::

    module load gcc

With your environment activated::

    CC=mpicc MPICC=mpicc pip install mpi4py --no-binary mpi4py


Installing libEnsemble
----------------------

Obtaining libEnsemble is now as simple as ``pip install libensemble``.
Your prompt should be similar to the following line:

.. code-block:: console

    (my_env) user@login5:~$ pip install libensemble

.. note::
    If you encounter pip errors, run ``python -m pip install --upgrade pip`` first.

Job Submission
--------------

Summit uses LSF_ for job management and submission. For libEnsemble, the most
important command is ``bsub``, which submits batch scripts from the login nodes
to execute on the launch nodes.

It is recommended to run libEnsemble on the launch nodes (assuming workers are submitting
MPI jobs) using ``local`` comms mode (multiprocessing). In the future, Balsam may be used
to run libEnsemble on compute nodes.

Interactive Runs
^^^^^^^^^^^^^^^^

Users can run interactively with ``bsub`` by specifying the ``-Is`` flag, similar
to the following::

    $ bsub -W 30 -P [project] -nnodes 8 -Is

This will place the user on a launch node. Then, to launch MPI jobs to the compute
nodes, use ``jsrun`` where you would use ``mpirun``.

.. note::
    You will need to re-activate your conda virtual environment.

Batch Runs
^^^^^^^^^^

Batch scripts specify run settings using ``#BSUB`` statements. The following
simple example configures and launches libEnsemble on a launch node with
multiprocessing. This script also assumes the user is using the ``parse_args()``
convenience function within libEnsemble's ``utils.py``.

.. code-block:: bash

    #!/bin/bash -x
    #BSUB -P <project code>
    #BSUB -J libe_mproc
    #BSUB -W 60
    #BSUB -nnodes 128
    #BSUB -alloc_flags "smt1"

    # --- Prepare Python ---

    # Load conda module and gcc.
    module load python
    module load gcc

    # Name of Conda environment
    export CONDA_ENV_NAME=my_env

    # Activate Conda environment
    export PYTHONNOUSERSITE=1
    source activate $CONDA_ENV_NAME

    # --- Prepare libEnsemble ---

    # Name of calling script
    export EXE=calling_script.py

    # Communication Method
    export COMMS='--comms local'

    # Number of workers.
    export NWORKERS='--nworkers 128'

    hash -r # Check no commands hashed (pip/python...)

    # Launch libE
    python $EXE $COMMS $NWORKERS > out.txt 2>&1

With this saved as ``myscript.sh``, allocating, configuring, and queueing
libEnsemble on Summit becomes::

    $ bsub myscript.sh
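
The batch script above forwards ``--comms local`` and ``--nworkers 128`` to the
calling script, where ``parse_args()`` interprets them. The following is a minimal
standard-library sketch of that idea, not libEnsemble's actual implementation. The
function name ``parse_libe_flags``, the default of ``mpi``, and the returned
dictionary layout are assumptions made for illustration.

```python
import argparse


def parse_libe_flags(argv):
    """Hypothetical stand-in for a parse_args()-style helper."""
    parser = argparse.ArgumentParser(prog="calling_script.py")
    # Flag names match those passed by the batch script above.
    # The default value is an assumption for this sketch.
    parser.add_argument("--comms", choices=["local", "mpi"], default="mpi")
    parser.add_argument("--nworkers", type=int, default=None)
    args = parser.parse_args(argv)
    # With multiprocessing, the worker count cannot be inferred from MPI ranks.
    if args.comms == "local" and args.nworkers is None:
        parser.error("--nworkers is required with --comms local")
    return {"comms": args.comms, "nworkers": args.nworkers}


if __name__ == "__main__":
    print(parse_libe_flags(["--comms", "local", "--nworkers", "128"]))
```

Keeping the flags in separate shell variables (``COMMS``, ``NWORKERS``) lets the same
script switch communication modes without editing the launch line.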

Additional Information
----------------------

See the OLCF guides_ for more information about Summit.

.. _Summit: https://www.olcf.ornl.gov/for-users/system-user-guides/summit/
.. _LSF: https://www.olcf.ornl.gov/wp-content/uploads/2018/12/summit_workshop_fuson.pdf
.. _guides: https://www.olcf.ornl.gov/for-users/system-user-guides/summit/
.. _Conda: https://conda.io/en/latest/
.. _mpi4py: https://mpi4py.readthedocs.io/en/stable/
8 changes: 4 additions & 4 deletions docs/platforms/theta.rst
@@ -6,8 +6,8 @@ Theta_ is a Cray XC40 system based on the second-generation Intel
Xeon Phi processor, available within ALCF_ at Argonne National Laboratory.

Theta features three tiers of nodes: login, MOM (Machine-Oriented Mini-server),
and compute nodes. Users on login nodes submit batch runs to the MOM nodes.
MOM nodes execute user batch-scripts to run on the compute nodes.
and compute nodes. Users on login nodes submit batch jobs to the MOM nodes.
MOM nodes execute user batch-scripts to run on the compute nodes via ``aprun``.

Configuring Python
------------------
@@ -119,7 +119,7 @@ Interactive Runs
Users can run interactively with ``qsub`` by specifying the ``-I`` flag, similarly
to the following::

$ qsub -A [project] -n 128 -q default -t 120 -I
$ qsub -A [project] -n 8 -q debug-cache-quad -t 60 -I

This will place the user on a MOM node. Then, to launch MPI jobs to the compute
nodes use ``aprun`` where you would use ``mpirun``.
@@ -196,7 +196,7 @@ Here is an example Balsam submission script:
#COBALT -O libE_test
#COBALT -n 128
#COBALT -q default
##COBALT -A [project]
#COBALT -A [project]
# Name of calling script
export EXE=calling_script.py
9 changes: 8 additions & 1 deletion docs/utilities.rst
@@ -1,7 +1,7 @@
Utilities
=========

libEnsemble features several modules and tools to assist in writing consistent
libEnsemble features a utilities module to assist in writing consistent
calling scripts and user functions.

Input consistency
@@ -40,3 +40,10 @@ Usage:
[--pwd [PWD]] [--worker_pwd [WORKER_PWD]]
[--worker_python [WORKER_PYTHON]]
[--tester_args [TESTER_ARGS [TESTER_ARGS ...]]]

Utilities API
-------------

.. automodule:: utils
    :members:
    :no-undoc-members:
8 changes: 5 additions & 3 deletions libensemble/tests/scaling_tests/forces/summit_submit_mproc.sh
@@ -15,8 +15,11 @@
# Name of calling script-
export EXE=run_libe_forces.py

# Communication Method
export COMMS="--comms local"

# Number of workers.
#export NUM_WORKERS=4 # Optional if pass to script
export NWORKERS="--nworkers 4"

# Wallclock for libE. Slightly smaller than job wallclock
#export LIBE_WALLCLOCK=15 # Optional if pass to script
@@ -40,8 +43,7 @@ hash -r # Check no commands hashed (pip/python...)

# Launch libE.
#python $EXE $NUM_WORKERS $LIBE_WALLCLOCK > out.txt 2>&1
#python $EXE $NUM_WORKERS > out.txt 2>&1
python $EXE > out.txt 2>&1
python $EXE $COMMS $NWORKERS > out.txt 2>&1

if [[ $LIBE_PLOTS = "true" ]]; then
python $PLOT_DIR/plot_libe_calcs_util_v_time.py
@@ -25,7 +25,8 @@ export LIBE_WALLCLOCK=25
export WORKFLOW_NAME=libe_workflow #sh - todo - may currently be hardcoded to this in libE - allow user to specify

#Tell libE manager to stop workers, dump timing.dat and exit after this time. Script must be set up to receive as argument.
export SCRIPT_ARGS=$(($LIBE_WALLCLOCK-5))
export SCRIPT_ARGS="--comms mpi --nworkers $NUM_WORKERS"
# export SCRIPT_ARGS=$(($LIBE_WALLCLOCK-5))
# export SCRIPT_ARGS='' #Default No args

# Name of Conda environment (Need to have set up: https://balsam.alcf.anl.gov/quick/quickstart.html)
7 changes: 5 additions & 2 deletions libensemble/tests/scaling_tests/forces/theta_submit_mproc.sh
@@ -14,8 +14,11 @@
# Name of calling script
export EXE=run_libe_forces.py

# Communication Method
export COMMS="--comms local"

# Number of workers.
#export NUM_WORKERS=4 # Optional if pass to script
export NWORKERS="--nworkers 4"

# Wallclock for libE (allow clean shutdown)
#export LIBE_WALLCLOCK=25 # Optional if pass to script
@@ -43,7 +46,7 @@ export PYTHONNOUSERSITE=1
# Launch libE
#python $EXE $NUM_WORKERS $LIBE_WALLCLOCK > out.txt 2>&1
#python $EXE $NUM_WORKERS > out.txt 2>&1
python $EXE > out.txt 2>&1
python $EXE $COMMS $NWORKERS > out.txt 2>&1

if [[ $LIBE_PLOTS = "true" ]]; then
python $PLOT_DIR/plot_libe_calcs_util_v_time.py
62 changes: 56 additions & 6 deletions libensemble/utils.py
@@ -1,6 +1,6 @@
"""
libEnsemble utilities
============================================
=====================
"""

@@ -310,7 +310,11 @@ def _client_parse_args(args):


def parse_args():
"Unified parsing interface for regression test arguments"
"""Parses command line arguments.
:doc:`(See usage)<utilities>`
"""
args = parser.parse_args(sys.argv[1:])
front_ends = {
'mpi': _mpi_parse_args,
@@ -326,6 +330,35 @@ def parse_args():


def save_libE_output(H, persis_info, calling_file, nworkers):
"""
Writes out history array and persis_info to files.
Format: <user_script>_results_History_length=<history_length>_evals=<Completed evals>_ranks=<nworkers>
Parameters
----------
H: `NumPy structured array <https://docs.scipy.org/doc/numpy/user/basics.rec.html>`_
History array storing rows for each point.
:doc:`(example)<data_structures/history_array>`
persis_info: :obj:`dict`
Persistent information dictionary
:doc:`(example)<data_structures/persis_info>`
calling_file : :obj:`string`
Name of user calling script (or user chosen name) to prefix output files.
The convention is to send __file__ from user calling script.
nworkers: :obj:`int`
The number of workers in this ensemble. Added to output file names.
"""

script_name = os.path.splitext(os.path.basename(calling_file))[0]
short_name = script_name.split("test_", 1).pop()
filename = short_name + '_results_History_length=' + str(len(H)) \
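
The output name format given in the ``save_libE_output`` docstring can be sketched
as follows. This is an illustration built from the docstring and the visible lines of
the hunk, not the full function: the helper name ``results_basename`` is hypothetical,
and the history length and completed-evaluation count are passed in as plain integers
rather than derived from ``H``.

```python
import os


def results_basename(calling_file, history_length, completed_evals, nworkers):
    """Hypothetical helper that builds the documented output file prefix."""
    # Strip the directory and extension from the calling script's path.
    script_name = os.path.splitext(os.path.basename(calling_file))[0]
    # Keep the text after the first "test_" if present, as in the hunk above.
    short_name = script_name.split("test_", 1).pop()
    return (short_name
            + "_results_History_length=" + str(history_length)
            + "_evals=" + str(completed_evals)
            + "_ranks=" + str(nworkers))


if __name__ == "__main__":
    print(results_basename("/home/user/test_forces.py", 100, 95, 4))
```

Passing ``__file__`` as ``calling_file``, as the docstring suggests, ties each output
file to the script that produced it.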
@@ -341,10 +374,27 @@ def save_libE_output(H, persis_info, calling_file, nworkers):
# ===================== per-worker numpy random-streams ========================


def add_unique_random_streams(persis_info, size):
# Creates size random number streams for the libE manager and workers when
# size is num_workers + 1. Stream i is initialized with seed i.
for i in range(size):
def add_unique_random_streams(persis_info, nstreams):
"""Creates nstreams random number streams for the libE manager and workers
when nstreams is num_workers + 1. Stream i is initialized with seed i.
The entries are appended to the existing persis_info dictionary.
Parameters
----------
persis_info: :obj:`dict`
Persistent information dictionary
:doc:`(example)<data_structures/persis_info>`
nstreams: :obj:`int`
Number of independent random number streams to produce
"""

for i in range(nstreams):
if i in persis_info:
persis_info[i].update({
'rand_stream': np.random.RandomState(i),
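
The per-worker seeding pattern described in the docstring above can be sketched as
follows. This is a simplified illustration under stated assumptions: it substitutes the
standard library's ``random.Random`` for ``np.random.RandomState`` so that it runs
without NumPy, and the ``else`` branch and return value are guesses, since the hunk is
truncated. The name ``add_unique_streams`` is hypothetical.

```python
import random


def add_unique_streams(persis_info, nstreams):
    """Sketch: attach one independently seeded stream per integer key."""
    for i in range(nstreams):
        # Stream i is seeded with i, so runs are reproducible.
        # random.Random stands in for np.random.RandomState here.
        entry = {"rand_stream": random.Random(i)}
        if i in persis_info:
            # Preserve any existing entries for this key.
            persis_info[i].update(entry)
        else:
            persis_info[i] = entry
    return persis_info


if __name__ == "__main__":
    info = add_unique_streams({0: {"note": "manager"}}, 3)
    print(sorted(info), info[0]["note"])
```

Updating existing entries rather than overwriting them means user-supplied
``persis_info`` fields survive the call.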