Merge 608eb9f into 3cad97c
shuds13 committed Nov 27, 2019
2 parents 3cad97c + 608eb9f commit 591c5e4
Showing 31 changed files with 107 additions and 57 deletions.
2 changes: 1 addition & 1 deletion README.rst
@@ -242,4 +242,4 @@ Resources
.. _tarball: https://github.com/Libensemble/libensemble/releases/latest
.. _Travis CI: https://travis-ci.org/Libensemble/libensemble
.. _user guide: https://libensemble.readthedocs.io/en/latest/user_guide.html
.. _xSDK Extreme-scale Scientific Software Development Kit: https://xsdk.info/
.. _xSDK Extreme-scale Scientific Software Development Kit: https://xsdk.info
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -91,7 +91,7 @@ def __getattr__(cls, name):

# General information about the project.
project = 'libEnsemble'
copyright = '2019'
copyright = '2019 Argonne National Laboratory'
author = 'Jeffrey Larson, Stephen Hudson, Stefan M. Wild, David Bindel and John-Luke Navarro'

# The version info for the project you're documenting, acts as replacement for
2 changes: 1 addition & 1 deletion docs/introduction_latex.rst
@@ -20,7 +20,6 @@ Quickstart Guide
:end-before: after_resources_rst_tag

.. _Balsam: https://www.alcf.anl.gov/balsam
.. _common.py: https://github.com/Libensemble/libensemble/blob/develop/libensemble/tests/regression_tests/common.py
.. _Coveralls: https://coveralls.io/github/Libensemble/libensemble?branch=master
.. _GitHub: https://github.com/Libensemble/libensemble
.. _libEnsemble mailing list: https://lists.mcs.anl.gov/mailman/listinfo/libensemble
@@ -45,3 +44,4 @@ Quickstart Guide
.. _tarball: https://github.com/Libensemble/libensemble/releases/latest
.. _Travis CI: https://travis-ci.org/Libensemble/libensemble
.. _user guide: https://libensemble.readthedocs.io/en/latest/user_guide.html
.. _xSDK Extreme-scale Scientific Software Development Kit: https://xsdk.info
20 changes: 17 additions & 3 deletions docs/platforms/example_scripts.rst
@@ -5,14 +5,14 @@ Below are some example job-submission scripts used to configure and launch libEn
on a variety of high-powered systems. See :doc:`here<platforms_index>` for more
information about the respective systems and configuration.

Bebop - Central Mode
Bebop - Central mode
--------------------

.. literalinclude:: ../../examples/job_submission_scripts/bebop_submit_slurm_centralmode.sh
:caption: /examples/job_submission_scripts/bebop_submit_slurm_centralmode.sh
:language: bash

Bebop - Distributed Mode
Bebop - Distributed mode
------------------------

.. literalinclude:: ../../examples/job_submission_scripts/bebop_submit_slurm.sh
@@ -26,9 +26,23 @@ Blues
:caption: /examples/job_submission_scripts/blues_script.pbs
:language: bash

Theta - Central Mode with Balsam
Theta - On MOM nodes with multiprocessing
-----------------------------------------

.. literalinclude:: ../../examples/job_submission_scripts/theta_submit_mproc.sh
:caption: /examples/job_submission_scripts/theta_submit_mproc.sh
:language: bash

Theta - Central mode with Balsam
--------------------------------

.. literalinclude:: ../../examples/job_submission_scripts/theta_submit_balsam.sh
:caption: /examples/job_submission_scripts/theta_submit_balsam.sh
:language: bash

Summit - On launch nodes with multiprocessing
---------------------------------------------

.. literalinclude:: ../../examples/job_submission_scripts/summit_submit_mproc.sh
:caption: /examples/job_submission_scripts/summit_submit_mproc.sh
:language: bash
31 changes: 30 additions & 1 deletion docs/platforms/theta.rst
@@ -78,6 +78,11 @@ Initialize a new database similarly to the following (from the Balsam docs):
Read Balsam's documentation here_.

.. note::
Balsam will create the run directories inside the data sub-directory within the database
directory. From here files can be staged out to the user directory (see the example
batch script below).

Job Submission
--------------

@@ -108,7 +113,7 @@ functions execute computationally expensive code, or code built for specific
architectures. Recall also that only the MOM nodes can launch MPI jobs.

Although libEnsemble workers on the MOM nodes can technically submit
user-applications to the compute nodes via ``aprun`` within user functions, it
user-applications to the compute nodes directly via ``aprun`` within user functions, it
is highly recommended that the aforementioned :doc:`job_controller<../job_controller/overview>`
interface is used instead. The libEnsemble job-controller features advantages like
automatic resource-detection, portability, launch failure resilience, and ease-of-use.
@@ -119,6 +124,20 @@ Theta features one default production queue, ``default``, and two debug queues,
.. note::
For the default queue, the minimum number of nodes to allocate at once is 128

Module and environment variables
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To ensure proper functioning of libEnsemble, including the ability to kill running jobs, it
is recommended that the following environment variable be set::

export PMI_NO_FORK=1

It is also recommended that the following environment modules are unloaded, if present::

module unload trackdeps
module unload darshan
module unload xalt
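
The settings above can be wrapped in a small helper so that the same script also runs on machines without the ``module`` command. This is only a sketch: the function name ``prepare_theta_env`` is hypothetical and is not part of libEnsemble or the Theta environment.

```shell
# Hypothetical helper (not part of libEnsemble): applies the recommended
# Theta settings, skipping module unloads where `module` is unavailable.
prepare_theta_env() {
    # Required for python kills on Theta
    export PMI_NO_FORK=1

    # Unload modules that may interfere with job monitoring/kills, if present
    if command -v module >/dev/null 2>&1; then
        for mod in trackdeps darshan xalt; do
            module unload "$mod" 2>/dev/null || true
        done
    fi
}

prepare_theta_env
echo "PMI_NO_FORK=$PMI_NO_FORK"
```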

Interactive Runs
^^^^^^^^^^^^^^^^

@@ -183,6 +202,11 @@ convenience function from libEnsemble's :doc:`utils module<../utilities>`.
# Required for python kills on Theta
export PMI_NO_FORK=1
# Unload Theta modules that may interfere with job monitoring/kills
module unload trackdeps
module unload darshan
module unload xalt
python $EXE $COMMS $NWORKERS > out.txt 2>&1
With this saved as ``myscript.sh``, allocating, configuring, and queueing
@@ -232,6 +256,11 @@ Here is an example Balsam submission script:
# Required for python kills on Theta
export PMI_NO_FORK=1
# Unload Theta modules that may interfere with job monitoring/kills
module unload trackdeps
module unload darshan
module unload xalt
# Activate conda environment
. activate $CONDA_ENV_NAME

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

1 change: 0 additions & 1 deletion examples/calling_scripts/test_chwirut_pounders.py

This file was deleted.

This file was deleted.

1 change: 0 additions & 1 deletion examples/calling_scripts/test_fast_alloc.py

This file was deleted.

1 change: 0 additions & 1 deletion examples/calling_scripts/test_jobcontroller_hworld.py

This file was deleted.

1 change: 0 additions & 1 deletion examples/calling_scripts/test_nan_func_aposmm.py

This file was deleted.

5 changes: 1 addition & 4 deletions examples/job_submission_scripts/blues_script.pbs
@@ -30,9 +30,6 @@
##PBS -q ivy
#PBS -q shared

export NLOPT_PYTHON_HOME="/home/jlarson/software/nlopt_install/lib/python2.7/site-packages"
export PYTHONPATH="${PYTHONPATH}:${NLOPT_PYTHON_HOME}"

cd $PBS_O_WORKDIR

# A little useful information for the log file...
@@ -55,7 +52,7 @@ cat $PBS_NODEFILE | sort | uniq >> libE_machinefile
echo Starting execution at: `date`

pwd
cmd="mpiexec -np 5 -machinefile libE_machinefile python2 libE_calling_script.py libE_machinefile"
cmd="mpiexec -np 5 -machinefile libE_machinefile python libE_calling_script.py libE_machinefile"
# Note that this command passes the libE_machinefile to both MPI and the
# libE_calling_script; in the latter script, it can be parsed and given to the
# alloc_func
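
As a rough illustration of how the machinefile in this script is built, the sketch below deduplicates a node list, which may name a node more than once. The temporary files and node names are assumptions standing in for ``$PBS_NODEFILE`` and ``libE_machinefile``; ``sort -u`` is used as an equivalent of ``sort | uniq``.

```shell
# Stand-in for $PBS_NODEFILE (assumed contents, for illustration only)
NODEFILE=$(mktemp)
printf 'node2\nnode1\nnode2\nnode1\n' > "$NODEFILE"

# Stand-in for libE_machinefile: one line per unique node, sorted
MACHINEFILE=$(mktemp)
sort -u "$NODEFILE" > "$MACHINEFILE"

NODE_COUNT=$(wc -l < "$MACHINEFILE" | tr -d ' ')
FIRST_NODE=$(head -n 1 "$MACHINEFILE")
echo "unique nodes: $NODE_COUNT (first: $FIRST_NODE)"

rm -f "$NODEFILE" "$MACHINEFILE"
```
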
11 changes: 6 additions & 5 deletions examples/job_submission_scripts/summit_submit_mproc.sh
@@ -13,7 +13,7 @@
# - Workers submit jobs to the nodes in the job available.

# Name of calling script-
export EXE=run_libe_forces.py
export EXE=libE_calling_script.py

# Communication Method
export COMMS="--comms local"
@@ -22,7 +22,7 @@ export COMMS="--comms local"
export NWORKERS="--nworkers 4"

# Wallclock for libE. (allow clean shutdown)
#export LIBE_WALLCLOCK=25 # Optional if pass to script
export LIBE_WALLCLOCK=25 # Optional if passed to the script

# Name of Conda environment
export CONDA_ENV_NAME=<conda_env_name>
@@ -38,6 +38,7 @@ export PYTHONNOUSERSITE=1
# hash -d python # Check pick up python in conda env
hash -r # Check no commands hashed (pip/python...)

# Launch libE.
#python $EXE $LIBE_WALLCLOCK > out.txt 2>&1 # If user script takes wall-clock as positional arg.
python $EXE $COMMS $NWORKERS > out.txt 2>&1 # If script is using utils.parse_args()
# Launch libE
# python $EXE $NUM_WORKERS > out.txt 2>&1 # No args. All defined in calling script
# python $EXE $COMMS $NWORKERS > out.txt 2>&1 # If calling script is using utils.parse_args()
python $EXE $LIBE_WALLCLOCK $COMMS $NWORKERS > out.txt 2>&1 # If calling script takes wall-clock as positional arg.
12 changes: 5 additions & 7 deletions examples/job_submission_scripts/theta_submit_balsam.sh
@@ -32,11 +32,9 @@ export WORKFLOW_NAME=libe_workflow
export LIBE_WALLCLOCK=$(($BALSAM_WALLCLOCK-3))

# libEnsemble calling script arguments
# export SCRIPT_ARGS='' # No args
# export SCRIPT_ARGS='$LIBE_WALLCLOCK' # If calling script takes wall-clock as positional arg.

# If script is using utilsparse_args() and takes wall-clock as positional arg.
export SCRIPT_ARGS="--comms mpi --nworkers $NUM_WORKERS $LIBE_WALLCLOCK"
# export SCRIPT_ARGS='' # No args. All defined in calling script
# export SCRIPT_ARGS="--comms mpi --nworkers $NUM_WORKERS" # If calling script is using utils.parse_args()
export SCRIPT_ARGS="$LIBE_WALLCLOCK --comms mpi --nworkers $NUM_WORKERS" # If calling script takes wall-clock as positional arg.

# Name of Conda environment (Need to have set up: https://balsam.readthedocs.io/en/latest/userguide/getting-started.html)
export CONDA_ENV_NAME=<conda_env_name>
@@ -45,8 +43,8 @@ export CONDA_ENV_NAME=<conda_env_name>
export DBASE_NAME=<dbase_name> # default - to use default database.

# Conda location - theta specific
# export PATH=/opt/intel/python/2017.0.035/intelpython35/bin:$PATH
# export LD_LIBRARY_PATH=~/.conda/envs/$CONDA_ENV_NAME/lib:$LD_LIBRARY_PATH
export PATH=/opt/intel/python/2017.0.035/intelpython35/bin:$PATH
export LD_LIBRARY_PATH=~/.conda/envs/$CONDA_ENV_NAME/lib:$LD_LIBRARY_PATH

export PYTHONNOUSERSITE=1 # Ensure environment isolated
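
For reference, a minimal sketch of how the wall-clock and argument string in this script evaluate. The values of ``BALSAM_WALLCLOCK`` and ``NUM_WORKERS`` are assumed for illustration and are not taken from the script.

```shell
BALSAM_WALLCLOCK=60   # assumed value for illustration
NUM_WORKERS=4         # assumed value for illustration

# Leave libEnsemble a 3-unit margin before the Balsam wall-clock expires
LIBE_WALLCLOCK=$((BALSAM_WALLCLOCK - 3))

# Double quotes are needed so that the variables are expanded;
# single quotes would pass the literal text $LIBE_WALLCLOCK instead.
SCRIPT_ARGS="$LIBE_WALLCLOCK --comms mpi --nworkers $NUM_WORKERS"
echo "$SCRIPT_ARGS"
```

With these assumed values the calling script would receive ``57`` as its positional wall-clock argument, followed by the ``--comms`` and ``--nworkers`` options.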

12 changes: 6 additions & 6 deletions examples/job_submission_scripts/theta_submit_mproc.sh
@@ -14,7 +14,7 @@
# - Workers submit jobs to the compute nodes in the node allocation available.

# Name of calling script
export EXE=run_libe_forces.py
export EXE=libE_calling_script.py

# Communication Method
export COMMS="--comms local"
@@ -23,14 +23,14 @@ export COMMS="--comms local"
export NWORKERS="--nworkers 4"

# Wallclock for libE (allow clean shutdown)
#export LIBE_WALLCLOCK=25 # Optional if pass to script
export LIBE_WALLCLOCK=25 # Optional if passed to the script

# Name of Conda environment
export CONDA_ENV_NAME=<conda_env_name>

# Conda location - theta specific
export PATH=/opt/intel/python/2017.0.035/intelpython35/bin:$PATH
export LD_LIBRARY_PATH=~/.conda/envs/balsam/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=~/.conda/envs/$CONDA_ENV_NAME/lib:$LD_LIBRARY_PATH
export PMI_NO_FORK=1 # Required for python kills on Theta

# Unload Theta modules that may interfere with job monitoring/kills
@@ -43,6 +43,6 @@ export PYTHONNOUSERSITE=1
. activate $CONDA_ENV_NAME

# Launch libE
#python $EXE $NUM_WORKERS > out.txt 2>&1 # No args
#python $EXE $NUM_WORKERS $LIBE_WALLCLOCK > out.txt 2>&1 # If user script takes wall-clock as positional arg.
python $EXE $COMMS $NWORKERS > out.txt 2>&1 # If script is using utils.parse_args()
# python $EXE $NUM_WORKERS > out.txt 2>&1 # No args. All defined in calling script
# python $EXE $COMMS $NWORKERS > out.txt 2>&1 # If calling script is using utils.parse_args()
python $EXE $LIBE_WALLCLOCK $COMMS $NWORKERS > out.txt 2>&1 # If calling script takes wall-clock as positional arg.
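
The launch line relies on ``$COMMS`` and ``$NWORKERS`` being left unquoted so that the shell splits each into an option and its value. The sketch below illustrates this under the default ``IFS``; ``count_args`` is a hypothetical helper used only to count arguments and is not part of the submission script.

```shell
EXE=libE_calling_script.py
LIBE_WALLCLOCK=25
COMMS="--comms local"
NWORKERS="--nworkers 4"

# Hypothetical helper: prints the number of arguments it receives
count_args() { echo "$#"; }

# Unquoted: "--comms local" and "--nworkers 4" each split into two words
ARG_COUNT=$(count_args $EXE $LIBE_WALLCLOCK $COMMS $NWORKERS)

# Quoted: each variable is passed as a single argument
QUOTED_COUNT=$(count_args "$EXE" "$LIBE_WALLCLOCK" "$COMMS" "$NWORKERS")

echo "unquoted: $ARG_COUNT, quoted: $QUOTED_COUNT"
```

Quoting the variables would hand the calling script ``--comms local`` as one argument, which an option parser is unlikely to recognize.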
3 changes: 1 addition & 2 deletions libensemble/resources.py
@@ -158,8 +158,7 @@ def get_MPI_variant():
if 'unrecognized argument npernode' in stdout.decode():
return 'mpich'
return 'openmpi'
except Exception as e:
print('Testing: Error on MPI command: {}'.format(e))
except Exception:
pass

try:
@@ -48,5 +48,5 @@ python $EXE $COMMS $NWORKERS > out.txt 2>&1
if [[ $LIBE_PLOTS = "true" ]]; then
python $PLOT_DIR/plot_libe_calcs_util_v_time.py
python $PLOT_DIR/plot_libe_runs_util_v_time.py
python $PLOT_DIR/plot_libE_histogram.py
python $PLOT_DIR/plot_libe_histogram.py
fi
@@ -92,7 +92,7 @@ balsam launcher --consume-all --job-mode=mpi --num-transition-threads=1
if [[ $LIBE_PLOTS = "true" ]]; then
python $PLOT_DIR/plot_libe_calcs_util_v_time.py
python $PLOT_DIR/plot_libe_runs_util_v_time.py
python $PLOT_DIR/plot_libE_histogram.pyfi
python $PLOT_DIR/plot_libe_histogram.py
fi

if [[ $BALSAM_PLOTS = "true" ]]; then
# export MPLBACKEND=TkAgg
@@ -51,5 +51,5 @@ python $EXE $COMMS $NWORKERS > out.txt 2>&1
if [[ $LIBE_PLOTS = "true" ]]; then
python $PLOT_DIR/plot_libe_calcs_util_v_time.py
python $PLOT_DIR/plot_libe_runs_util_v_time.py
python $PLOT_DIR/plot_libE_histogram.py
python $PLOT_DIR/plot_libe_histogram.py
fi
