libEnsemble


What is libEnsemble?

libEnsemble is a Python library to coordinate the concurrent evaluation of ensembles of computations. Designed with flexibility in mind, libEnsemble can utilize massively parallel resources to accelerate the solution of design, decision, and inference problems.

A visual overview is given in the libEnsemble poster.

libEnsemble aims for:

  • Extreme scaling
  • Resilience/Fault tolerance
  • Monitoring/killing jobs (and recovering resources)
  • Portability and flexibility
  • Exploitation of persistent data/control flow

The user selects or supplies a generation function that produces simulation input, as well as a simulation function that performs and monitors the simulations. The generation function may contain, for example, an optimization method that generates new simulation parameters on the fly, based on the results of previous simulations. Examples and templates of these functions are included in the library.
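The interplay between the two functions can be sketched in plain Python. This is a conceptual illustration only and does not use the libEnsemble API: the names `gen_f`, `sim_f`, and `run_ensemble`, the quadratic test problem, and the simple random-search strategy are all hypothetical.

```python
import random


def sim_f(x):
    """Hypothetical simulation: evaluate a simple quadratic at x."""
    return (x - 2.0) ** 2


def gen_f(history, rng, batch_size=4):
    """Hypothetical generator: propose new inputs based on previous results.

    With no history, sample uniformly on [-10, 10]; otherwise sample
    near the best input found so far.
    """
    if not history:
        return [rng.uniform(-10.0, 10.0) for _ in range(batch_size)]
    best_x, _ = min(history, key=lambda pair: pair[1])
    return [best_x + rng.gauss(0.0, 1.0) for _ in range(batch_size)]


def run_ensemble(n_batches=20, seed=0):
    """Alternate between generating inputs and simulating them."""
    rng = random.Random(seed)
    history = []  # list of (input, output) pairs
    for _ in range(n_batches):
        for x in gen_f(history, rng):
            history.append((x, sim_f(x)))
    # Return the (input, output) pair with the smallest output.
    return min(history, key=lambda pair: pair[1])


best_x, best_f = run_ensemble()
print(f"best x = {best_x:.3f}, f(x) = {best_f:.5f}")
```

Here the simulations in each batch run one after another; in libEnsemble the corresponding evaluations would be distributed to workers and run concurrently.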

libEnsemble employs a manager-worker scheme that can run on various communication media (including MPI, multiprocessing, and TCP). Each worker can control and monitor any type of job from small sub-node jobs to huge many-node simulations. A job controller interface is provided to ensure scripts are portable, resilient and flexible; it also enables automatic detection of the nodes and cores in a system and can split up jobs automatically if nodes/cores are not supplied.

Dependencies

Required dependencies:

For running libEnsemble with mpi4py parallelism:

  • A functional MPI 1.x/2.x/3.x implementation such as MPICH built with shared/dynamic libraries.
  • mpi4py v2.0.0 or above

Optional dependency:

From v0.2.0, libEnsemble has the option of using the Balsam job manager. This is required for running libEnsemble on the compute nodes of some supercomputing platforms (e.g., Cray XC40), that is, platforms that do not support launching jobs from compute nodes. Note that as of v0.5.0, libEnsemble can also be run on the launch nodes using multiprocessing.

The example sim and gen functions and tests require the following dependencies:

PETSc and NLopt must be built with shared libraries enabled and present in sys.path (e.g., via setting the PYTHONPATH environment variable). NLopt should produce a file nlopt.py if Python is found on the system.
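A quick way to confirm that such modules are visible to the interpreter is to look them up without importing them. The following is a minimal sketch using only the standard library; the helper name `find_missing` is hypothetical, and the module names in the list are examples to adjust for your setup.

```python
import importlib.util


def find_missing(modules):
    """Return the names in `modules` that cannot be found on sys.path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


# Top-level module names to check; adjust to the packages you need.
missing = find_missing(["mpi4py", "nlopt"])
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```

If a module is reported as not importable even though the library is installed, check that its install location is included in PYTHONPATH.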

Installation

Use pip to install libEnsemble and its dependencies:

pip install libensemble

libEnsemble is also available in the Spack distribution. It can be installed from Spack with:

spack install py-libensemble

The tests and examples can be accessed in the GitHub repository. A tarball of the most recent release is also available.

Testing

The provided test suite includes both unit and regression tests and is run regularly on:

The test suite requires the mock, pytest, pytest-cov, and pytest-timeout packages. With these installed, run the suite from the libensemble/tests directory of the source distribution:

./run-tests.sh

To clean the test directories, run:

./run-tests.sh -c

Further options are available. To see the complete list, run:

./run-tests.sh -h

Coverage reports are produced separately for unit tests and regression tests under the relevant directories. For parallel tests, the union of the coverage from all processors is taken. In addition, a combined coverage report is created at the top level; after running the tests, it can be viewed by opening the HTML file libensemble/tests/cov_merge/index.html. The Travis CI coverage results are given online at Coveralls.

Note: The job_controller tests can be run using the direct-launch or Balsam job controllers. However, currently only the direct-launch versions can be run on Travis CI, which reduces the test coverage results.

Basic Usage

The examples directory contains example libEnsemble calling scripts, sim functions, gen functions, alloc functions and job submission scripts.

The user creates a Python script that calls the libEnsemble libE function. The script must supply sim_specs and gen_specs, and may optionally supply libE_specs, alloc_specs, and persis_info.

The default manager/worker communications mode is MPI. The user script is launched as:

mpiexec -np N python myscript.py

where N is the number of processes. This will launch one manager and N-1 workers.
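The split into one manager and N-1 workers can be pictured as a mapping from process rank to role. The sketch below is illustrative only: `role_for_rank` is a hypothetical helper, and treating rank 0 as the manager is an assumption made for the example. In an MPI run, the rank and size would come from mpi4py (`MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`).

```python
def role_for_rank(rank, size):
    """Return 'manager' or 'worker' for a process in a group of `size` processes.

    Assumption for illustration: rank 0 acts as the manager.
    """
    if size < 2:
        raise ValueError("need at least one manager and one worker")
    if not 0 <= rank < size:
        raise ValueError("rank out of range")
    return "manager" if rank == 0 else "worker"


# Equivalent to launching with -np 4: one manager and three workers.
roles = [role_for_rank(r, 4) for r in range(4)]
print(roles)  # ['manager', 'worker', 'worker', 'worker']
```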

If running in local mode, which uses Python's multiprocessing module, the 'local' comms option and the number of workers must be specified in libE_specs. The script can then be run as a regular Python script:

python myscript.py

To set these options from the command line, one may use the parse_args function from the regression tests, which can be found in libensemble/tests/regression_tests/common.py.
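The general idea of turning command-line options into a libE_specs dictionary can be sketched with the standard argparse module. This is not the parse_args function shipped with the regression tests: the helper `build_libE_specs`, the flag names `--comms` and `--nworkers`, and the dictionary keys `comms` and `nworkers` are assumptions for illustration, so check the libE_specs documentation for the key names your version expects.

```python
import argparse


def build_libE_specs(argv=None):
    """Parse hypothetical command-line options into a libE_specs-style dict."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--comms", choices=["mpi", "local"], default="mpi")
    parser.add_argument("--nworkers", type=int, default=None)
    args = parser.parse_args(argv)

    # Local mode cannot infer the worker count from an MPI launch.
    if args.comms == "local" and args.nworkers is None:
        parser.error("--nworkers is required when --comms is 'local'")

    specs = {"comms": args.comms}  # key name is an assumption
    if args.nworkers is not None:
        specs["nworkers"] = args.nworkers  # key name is an assumption
    return specs


print(build_libE_specs(["--comms", "local", "--nworkers", "4"]))
```

With a helper along these lines, the same script could be started either under mpiexec or as a regular Python script with the local options given on the command line.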

See the user-guide for more information.

Documentation

Citing libEnsemble

Please use the following to cite libEnsemble in a publication:

@techreport{libEnsemble,
  author      = {Stephen Hudson and Jeffrey Larson and Stefan M. Wild and David Bindel and John-Luke Navarro},
  title       = {{libEnsemble} Users Manual},
  institution = {Argonne National Laboratory},
  number      = {Revision 0.5.1},
  year        = {2019},
  url         = {https://buildmedia.readthedocs.org/media/pdf/libensemble/latest/libensemble.pdf}
}

Support

Join the libEnsemble mailing list at:

or email questions to:

or communicate (and establish a private channel, if desired) at: