Merge branch 'develop' into docs/spellcheck
jmlarson1 committed Dec 11, 2019
2 parents 67b8932 + fa57caa commit f144322
Showing 3 changed files with 77 additions and 11 deletions.
2 changes: 1 addition & 1 deletion docs/job_controller/jc_index.rst
@@ -5,7 +5,7 @@ Job Controller

The job controller can be used within the simulator (and potentially generator)
functions to provide a simple, portable interface for running and managing user
jobs.
applications.

.. toctree::
    :maxdepth: 2
80 changes: 70 additions & 10 deletions docs/platforms/cori.rst
@@ -16,38 +16,94 @@ Begin by loading the Python 3 Anaconda_ module::
    module load python/3.7-anaconda-2019.07

In many cases this may provide all the dependent packages you need (including
mpi4py). Then libEnsemble can be installed locally::
mpi4py). Note that these packages are installed under the ``/global/common``
file system, which gives the best performance when importing Python packages.

Installing libEnsemble
----------------------

Having loaded the Anaconda Python module, libEnsemble can be installed
in one of the following ways.

1. External pip installation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

libEnsemble can be installed locally either with::

    export PYTHONNOUSERSITE=0
    pip install libensemble --user

Alternatively, you can create your own Conda_ environment in which to install
libEnsemble and all dependencies. If using ``mpi4py``, installation will need
to be done using the `specific instructions from NERSC`_. libEnsemble can then
be pip installed into the environment.
Or, if you have a project directory under ``/global/common/software`` it is
recommended to pip install there, for performance::

    export PREFIX_PATH=/global/common/software/<project_name>/packages
    pip install --install-option="--prefix=$PREFIX_PATH" libensemble

For the latter option, to ensure you pick up from this install you will need
to prepend to your ``PYTHONPATH`` when running (check the exact ``pythonX.Y`` version)::

    export PYTHONPATH=$PREFIX_PATH/lib/<pythonX.Y>/site-packages:$PYTHONPATH

If libEnsemble is not found, ensure that local paths are being used with::

    export PYTHONNOUSERSITE=0
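
The ``PYTHONPATH`` step for a prefix install can be sketched as a small shell helper. This is a minimal sketch, not part of libEnsemble: the function name ``site_packages_dir`` and the example project name are hypothetical, and the ``pythonX.Y`` component is assumed to match the Python version of the loaded Anaconda module.

```shell
#!/bin/sh
# Hypothetical helper (not part of libEnsemble): print the site-packages
# directory that a --prefix install places under a given prefix.
#   $1 = install prefix, $2 = Python directory name, e.g. python3.7
site_packages_dir() {
    printf '%s/lib/%s/site-packages\n' "$1" "$2"
}

# Example values are assumptions; substitute your own project and version.
PREFIX_PATH=/global/common/software/my_project/packages
SITE_DIR=$(site_packages_dir "$PREFIX_PATH" python3.7)

# Prepend, avoiding a trailing colon when PYTHONPATH is initially unset.
export PYTHONPATH="$SITE_DIR${PYTHONPATH:+:$PYTHONPATH}"
echo "$PYTHONPATH"
```

Building the path in one place makes it easier to keep the interactive session and the batch script pointing at the same install.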

2. Create a Conda environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As an alternative to using an external pip install, you can create your own
Conda_ environment in which to install libEnsemble and all dependencies.
If using ``mpi4py``, installation will need to be done using the
`specific instructions from NERSC`_. libEnsemble can then be pip installed
into the environment.

.. code-block:: console

    (my_env) user@cori07:~$ pip install libensemble

Again, it is preferable to create your Conda environment under the ``common``
file system. This can be done by modifying your ``~/.condarc`` file.
For example, add the lines::

    envs_dirs:
      - /path/to/my/conda_envs
    env_prompt: ({name})

The ``env_prompt`` line ensures that the whole directory path is not prepended
to your prompt. The ``({name})`` here is literal; do not substitute it.

If highly parallel runs experience long start-up delays, consider the NERSC
documentation on `scaling Python`_.

Job Submission
--------------

Cori uses Slurm_ for job submission and management. The two commands you'll
likely use the most to run jobs are ``srun`` and ``sbatch`` for running
likely use the most to initiate jobs are ``salloc`` and ``sbatch`` for running
interactively and in batch mode, respectively. libEnsemble runs on the compute nodes
on Cori using either ``multi-processing`` or ``mpi4py``.

.. note::
    While it is possible to submit jobs from the user ``$HOME`` file system, this
    is likely to perform very poorly, especially for large ensembles. Users
    should preferably submit their calling script from the user
    ``$SCRATCH`` (``/global/cscratch1/sd/<YourUserName>``) directory (fastest but
    regularly purged) or the project directory (``/project/projectdirs/<project_name>/``).
    You cannot run and create output under the ``/global/common/`` file system
    as this is read-only from compute nodes, but any imported codes (including
    libEnsemble and gen/sim functions) are best imported from there, especially
    when running at scale.
    See instructions in `scaling Python`_ for more information.

Interactive Runs
^^^^^^^^^^^^^^^^

You can allocate four Knights Landing nodes for thirty minutes through the following::

    salloc -N 4 -C knl -q interactive -t 00:30:00

Ensure that the Python 3 Anaconda module is loaded. If you have installed
libEnsemble under the ``common`` file system, ensure ``PYTHONPATH`` is set (as above).

With your nodes allocated, queue your job to start with four MPI ranks::

    srun --ntasks 4 --nodes=1 python calling.py
@@ -57,7 +113,7 @@ allocated compute node, with three nodes available for the workers to launch
user applications with the job-controller or a job-launch command.

This is an example of running in :doc:`centralized<platforms_index>` mode and,
if using the :doc:`job_controller<../job_controller/mpi_controller>`, it should
if using the :doc:`job_controller<../job_controller/jc_index>`, it should
be initiated with ``central_mode=True``. libEnsemble must be run in central mode
on Cori as jobs cannot share nodes.
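
The node accounting for central mode can be sketched with shell arithmetic. This is an illustrative sketch only: the variable names and values are assumptions, chosen so that one node hosts libEnsemble (the manager plus its workers) and the remaining nodes are left for applications launched by the workers.

```shell
#!/bin/sh
# Illustrative only: variable names and values are hypothetical.
NUM_NODES=5      # nodes requested from Slurm (assumed value)
NUM_WORKERS=4    # libEnsemble workers (assumed value)

# One MPI rank for the manager plus one per worker.
NTASKS=$((NUM_WORKERS + 1))
# In central mode one node hosts libEnsemble; the rest run applications.
APP_NODES=$((NUM_NODES - 1))

echo "srun --ntasks $NTASKS --nodes=1 python calling_script.py"
echo "nodes left for user applications: $APP_NODES"
```

With these values the sketch prints an ``srun`` line with five ranks on one node and reports four nodes remaining for user applications.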

@@ -66,7 +122,7 @@ Batch Runs

Batch scripts specify run-settings using ``#SBATCH`` statements. A simple example
for a libEnsemble use-case running in :doc:`centralized<platforms_index>` MPI
mode on KNL nodes resembles the following:
mode on KNL nodes resembles the following (add ``PYTHONPATH`` lines if necessary):

.. code-block:: bash
    :linenos:
@@ -81,6 +137,8 @@ mode on KNL nodes resembles the following:
    #SBATCH -t 00:15:00
    #SBATCH -C knl
    module load python/3.7-anaconda-2019.07
    # Run libEnsemble (manager and 4 workers) on one node
    # leaving 4 nodes for worker launched applications.
    srun --ntasks 5 --nodes=1 python calling_script.py
@@ -90,7 +148,7 @@ on Cori becomes::

    sbatch myscript.sh

If you wish to run in multi-processing (local) mode instead of using mpi4py,
If you wish to run in multi-processing (local) mode instead of using ``mpi4py``,
and your calling script uses the :doc:`parse_args()<../utilities>` function,
then the run line in the above script would be::

@@ -114,6 +172,8 @@ user application. libEnsemble could be run on more than one node, but here the
    #SBATCH -t 01:00:00
    #SBATCH -C knl
    module load python/3.7-anaconda-2019.07
    # Run libEnsemble (manager and 128 workers) on one node
    # leaving 256 nodes for worker launched applications.
    srun --overcommit --ntasks 129 --nodes=1 python calling_script.py
6 changes: 6 additions & 0 deletions examples/job_submission_scripts/cori_submit.sh
@@ -17,6 +17,12 @@
export EXE=libE_calling_script.py
export NUM_WORKERS=4

# Ensure anaconda Python module is loaded
module load python/3.7-anaconda-2019.07

# If libensemble is installed under common (set to your install location and python version)
export PYTHONPATH=/global/common/software/<my_project>/<user_name>/packages/lib/python3.7/site-packages:$PYTHONPATH

# Overcommit will allow ntasks up to the no. of contexts on one node (e.g., 320 on Bebop)
srun --overcommit --ntasks=$(($NUM_WORKERS+1)) --nodes=1 python $EXE
