Merge branch 'develop' into performance/sliding_window
jmlarson1 committed Aug 3, 2023
2 parents 1b19ccc + dfdcd4a commit 0c78428
Showing 44 changed files with 522 additions and 248 deletions.
4 changes: 4 additions & 0 deletions .github/workflows/ci.yml
@@ -131,6 +131,10 @@ jobs:
conda env update --file install/gen_deps_environment.yml
pip install ax-platform==0.2.8
- name: Install surmise
if: matrix.os != 'macos-latest' && steps.cache.outputs.cache-hit != 'true'
run: |
pip install --upgrade git+https://github.com/surmising/surmise.git@develop
- name: Build ytopt and dependencies
4 changes: 2 additions & 2 deletions .wci.yml
@@ -17,8 +17,8 @@ description: |
language: Python

release:
version: 0.10.1
date: 2023-07-10
version: 0.10.2
date: 2023-07-24

documentation:
general: https://libensemble.readthedocs.io
47 changes: 33 additions & 14 deletions CHANGELOG.rst
@@ -8,6 +8,26 @@ GitHub issues are referenced, and can be viewed with hyperlinks on the `github r

.. _`github releases page`: https://github.com/Libensemble/libensemble/releases

Release 0.10.2
--------------

:Date: July 24, 2023

* Fixes issues with workflow directories:
* Ensure relative paths are interpreted from where libEnsemble is run. #1020
* Create intermediate directories for workflow paths. #1017

* Fixes issue where libEnsemble pre-initialized a shared multiprocessing queue. #1026

:Note:

* Tested platforms include Linux, macOS, and Windows, and major systems including Frontier (OLCF), Polaris (ALCF), Perlmutter (NERSC), Theta (ALCF), and Bebop. The major system tests ran heterogeneous workflows.

:Known issues:

* On systems using SLURM 23.02, some issues have been experienced when using ``mpi4py`` comms.
* See the known issues section in the documentation for more information (https://libensemble.readthedocs.io/en/main/known_issues.html).

Release 0.10.1
--------------

@@ -27,7 +47,6 @@ Hotfix for breaking changes in Pydantic.

* See known issues section in the documentation.


Release 0.10.0
--------------

@@ -36,16 +55,16 @@ Release 0.10.0
New capabilities:

* Enhance portability and simplify the assignment of procs/GPUs to worker resources #928 / #983
* Auto-detect GPUs across systems (inc. Nvidia, AMD, and Intel GPUs).
* Auto-determination of GPU assignment method by MPI runner or provided platform.
* Portable `auto_assign_gpus` / `match_procs_to_gpus` and `num_gpus` arguments added to the MPI executor submit.
* Add `set_to_gpus` function (similar to `set_to_slots`).
* Allow users to specify known systems via option or environment variable.
* Allow users to specify their own system configurations.
* These changes remove a number of tweaks that were needed for particular platforms.

* Resource management supports GPU and non-GPU simulations in the same ensemble. #993
* Users can specify `num_procs` and `num_gpus` in the generator for each evaluation.

* Pydantic models are used for validating major libE input (input can be provided as classes or dictionaries). #878
* Added option to store output and ensemble directories in a workflow directory. #982
@@ -71,10 +90,10 @@ Documentation:
Tests and Examples:

* Updated forces_gpu tutorial example. #956
* Source code edit is not required for the GPU version.
* Reports whether running on device or host.
* Increases problem size.
* Added versions with persistent generator and multi-task (GPU v non-GPU).
* Moved multiple tests, generators, and simulators to the community repo.
* Added ytopt example and updated heFFTe example. #943
* Support Python 3.11. #922
14 changes: 11 additions & 3 deletions README.rst
@@ -1,4 +1,4 @@
.. image:: docs/images/libEnsemble_Logo.svg
.. image:: https://raw.githubusercontent.com/Libensemble/libensemble/main/docs/images/libE_logo.png
:align: center
:alt: libEnsemble

@@ -7,6 +7,14 @@
.. image:: https://img.shields.io/pypi/v/libensemble.svg?color=blue
:target: https://pypi.org/project/libensemble

.. image:: https://img.shields.io/conda/v/conda-forge/libensemble?color=blue
:target: https://anaconda.org/conda-forge/libensemble

.. image:: https://img.shields.io/spack/v/py-libensemble?color=blue
:target: https://spack.readthedocs.io/en/latest/package_list.html#py-libensemble

|
.. image:: https://github.com/Libensemble/libensemble/workflows/libEnsemble-CI/badge.svg?branch=main
:target: https://github.com/Libensemble/libensemble/actions

@@ -82,6 +90,7 @@ Resources
doi = {10.1109/tpds.2021.3082815}
}
.. _Community Examples repository: https://github.com/Libensemble/libe-community-examples
.. _conda-forge: https://conda-forge.org/
.. _Contributions: https://github.com/Libensemble/libensemble/blob/main/CONTRIBUTING.rst
.. _docs: https://libensemble.readthedocs.io/en/main/advanced_installation.html
@@ -90,7 +99,6 @@ Resources
.. _libEnsemble Slack page: https://libensemble.slack.com
.. _MPICH: http://www.mpich.org/
.. _mpmath: http://mpmath.org/
.. _Quickstart: https://libensemble.readthedocs.io/en/main/introduction.html
.. _PyPI: https://pypi.org
.. _Quickstart: https://libensemble.readthedocs.io/en/main/introduction.html
.. _ReadtheDocs: http://libensemble.readthedocs.org/
.. _Community Examples repository: https://github.com/Libensemble/libe-community-examples
8 changes: 5 additions & 3 deletions docs/advanced_installation.rst
@@ -9,8 +9,9 @@ automatically installed alongside libEnsemble:
* Python_ 3.8 or above
* NumPy_
* psutil_
* setuptools_
* pydantic_
* pyyaml_
* tomli_

In view of libEnsemble's compiled dependencies, the following installation
methods each offer a trade-off between convenience and the ability
@@ -170,15 +171,16 @@ the given system (rather than building from scratch). This may include
``Python`` and the packages distributed with it (e.g., ``numpy``), and will
often include the system MPI library.

.. _GitHub: https://github.com/Libensemble/libensemble
.. _Conda: https://docs.conda.io/en/latest/
.. _conda-forge: https://conda-forge.org/
.. _GitHub: https://github.com/Libensemble/libensemble
.. _MPICH: https://www.mpich.org/
.. _NumPy: http://www.numpy.org
.. _`Open MPI`: https://www.open-mpi.org/
.. _psutil: https://pypi.org/project/psutil/
.. _pydantic: https://pydantic-docs.helpmanual.io/
.. _pyyaml: https://github.com/yaml/pyyaml
.. _Python: http://www.python.org
.. _setuptools: https://setuptools.pypa.io/en/latest/
.. _Spack: https://spack.readthedocs.io/en/latest
.. _spack_libe: https://github.com/Libensemble/spack_libe
.. _tomli: https://github.com/hukkin/tomli
4 changes: 4 additions & 0 deletions docs/data_structures/libE_specs.rst
@@ -77,6 +77,10 @@ the ``LibeSpecs`` class. When provided as a Python class, options are validated
Whether to copy directories within ``ensemble_dir_path`` back to the launch
location. Useful if ``ensemble_dir_path`` is located on node-local storage.

"reuse_output_dir" [bool] = ``False``:
Whether to allow overwrites and access to previous ensemble and workflow directories in subsequent runs.
``False`` by default to protect results.
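
For illustration, a minimal sketch of enabling this option through the ``LibeSpecs``
class (the directory name is hypothetical)::

    from libensemble.specs import LibeSpecs

    # Allow a repeat run to write into the existing ensemble/workflow directories
    libE_specs = LibeSpecs(
        ensemble_dir_path="my_ensemble",  # hypothetical ensemble directory
        reuse_output_dir=True,
    )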

"use_worker_dirs" [bool] = ``False``:
Whether to organize calculation directories under worker-specific directories:

2 changes: 1 addition & 1 deletion docs/introduction_latex.rst
@@ -29,7 +29,6 @@
.. _NLopt documentation: https://nlopt.readthedocs.io/en/latest/NLopt_Installation/
.. _nlopt: https://nlopt.readthedocs.io/en/latest/
.. _NumPy: http://www.numpy.org
.. _Quickstart: https://libensemble.readthedocs.io/en/main/introduction.html
.. _OPAL: http://amas.web.psi.ch/docs/opal/opal_user_guide-1.6.0.pdf
.. _petsc4py: https://bitbucket.org/petsc/petsc4py
.. _PETSc/TAO: http://www.mcs.anl.gov/petsc
@@ -44,6 +43,7 @@
.. _pytest: https://pypi.org/project/pytest/
.. _Python: http://www.python.org
.. _pyyaml: https://pyyaml.org/
.. _Quickstart: https://libensemble.readthedocs.io/en/main/introduction.html
.. _ReadtheDocs: http://libensemble.readthedocs.org/
.. _SciPy: http://www.scipy.org
.. _scipy.optimize: https://docs.scipy.org/doc/scipy/reference/optimize.html
14 changes: 11 additions & 3 deletions docs/known_issues.rst
@@ -4,9 +4,15 @@ Known Issues
The following selection describes known bugs, errors, or other difficulties that
may occur when using libEnsemble.

* As of 10/13/2022, on Perlmutter there was an issue running concurrent applications
on a node, following a recent system update. This also affects previous versions
of libEnsemble, and is being investigated.
* Platforms using SLURM version 23.02 experience a `pickle error`_ when using
``mpi4py`` comms. Disabling matching probes via the environment variable
``export MPI4PY_RC_RECV_MPROBE=0`` or adding ``mpi4py.rc.recv_mprobe = False``
at the top of the calling script should resolve this error. If using the MPI
executor and multiple workers per node, some users may experience failed
applications with the message
``srun: error: CPU binding outside of job step allocation, allocated`` in
the application's standard error. This is being investigated. If this happens,
we recommend using ``local`` comms in place of ``mpi4py``.
* When using the Executor: OpenMPI does not work with direct MPI task
submissions in mpi4py comms mode, since OpenMPI does not support nested MPI
executions. Use either ``local`` mode or the Balsam Executor instead.
@@ -23,3 +29,5 @@ may occur when using libEnsemble.
:doc:`FAQ<FAQ>` for more information.
* We currently recommend running in Central mode on Bridges as distributed
runs are experiencing hangs.

.. _pickle error: https://docs.nersc.gov/development/languages/python/using-python-perlmutter/#missing-support-for-matched-proberecv
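
For illustration, a minimal calling-script sketch of the ``mpi4py`` workaround described
in the SLURM 23.02 item above (the rest of the script is assumed)::

    import mpi4py
    mpi4py.rc.recv_mprobe = False  # disable matching probes (SLURM 23.02 workaround)
    from mpi4py import MPI  # import MPI only after setting the rc option

    from libensemble.libE import libE  # the libEnsemble run then proceeds as usual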
36 changes: 19 additions & 17 deletions docs/platforms/platforms_index.rst
@@ -146,13 +146,13 @@ When using the MPI Executor, it is possible to override the detected information

.. _funcx_ref:

funcX - Remote User functions
-----------------------------
Globus Compute - Remote User functions
--------------------------------------

*Alternatively to much of the above*, if libEnsemble is running on some resource with
internet access (laptops, login nodes, other servers, etc.), workers can be instructed to
launch generator or simulator user function instances to separate resources from
themselves via funcX_, a distributed, high-performance function-as-a-service platform:
themselves via `Globus Compute`_, a distributed, high-performance function-as-a-service platform:

.. image:: ../images/funcxmodel.png
:alt: running_with_funcx
@@ -162,17 +162,17 @@
This is useful for running ensembles across machines and heterogeneous resources, but
comes with several caveats:

1. User functions registered with funcX must be *non-persistent*, since
1. User functions registered with Globus Compute must be *non-persistent*, since
manager-worker communicators can't be serialized or used by a remote resource.

2. Likewise, the ``Executor.manager_poll()`` capability is disabled. The only
available control over remote functions by workers is processing return values
or exceptions when they complete.

3. funcX imposes a `handful of task-rate and data limits`_ on submitted functions.
3. Globus Compute imposes a `handful of task-rate and data limits`_ on submitted functions.

4. Users are responsible for authenticating via Globus_ and maintaining their
`funcX endpoints`_ on their target systems.
`Globus Compute endpoints`_ on their target systems.

Users can still define Executor instances within their user functions and submit
MPI applications normally, as long as libEnsemble and the target application are
@@ -184,15 +184,17 @@ accessible on the remote system::
exctr.register_app(full_path="/home/user/forces.x", app_name="forces")
task = exctr.submit(app_name="forces", num_procs=64)

Specify a funcX endpoint in either :class:`sim_specs<libensemble.specs.SimSpecs>` or :class:`gen_specs<libensemble.specs.GenSpecs>` via the ``funcx_endpoint``
key. For example::
Specify a Globus Compute endpoint in either :class:`sim_specs<libensemble.specs.SimSpecs>` or :class:`gen_specs<libensemble.specs.GenSpecs>` via the ``globus_compute_endpoint``
argument. For example::

from libensemble.specs import SimSpecs

sim_specs = {
"sim_f": sim_f,
"in": ["x"],
"out": [("f", float)],
"funcx_endpoint": "3af6dc24-3f27-4c49-8d11-e301ade15353",
}
sim_specs = SimSpecs(
sim_f = sim_f,
inputs = ["x"],
out = [("f", float)],
globus_compute_endpoint = "3af6dc24-3f27-4c49-8d11-e301ade15353",
)

See the ``libensemble/tests/scaling_tests/funcx_forces`` directory for a complete
remote-simulation example.
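
For illustration, the class-based ``sim_specs`` above could then be passed into a
libEnsemble run in the usual way (assuming ``gen_specs`` and ``exit_criteria`` are
defined elsewhere; classes or dictionaries are both accepted)::

    from libensemble.libE import libE
    from libensemble.tools import parse_args

    nworkers, is_manager, libE_specs, _ = parse_args()  # standard command-line setup

    # sim_specs carries the Globus Compute endpoint shown above
    H, persis_info, flag = libE(sim_specs, gen_specs, exit_criteria, libE_specs=libE_specs)
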
@@ -219,7 +221,7 @@ libEnsemble on specific HPC systems.

.. _Balsam: https://balsam.readthedocs.io/en/latest/
.. _Cooley: https://www.alcf.anl.gov/support-center/cooley
.. _funcX: https://funcx.org/
.. _`funcX endpoints`: https://funcx.readthedocs.io/en/latest/endpoints.html
.. _`Globus Compute`: https://www.globus.org/compute
.. _`Globus Compute endpoints`: https://globus-compute.readthedocs.io/en/latest/endpoints.html
.. _Globus: https://www.globus.org/
.. _`handful of task-rate and data limits`: https://funcx.readthedocs.io/en/latest/limits.html
.. _`handful of task-rate and data limits`: https://globus-compute.readthedocs.io/en/latest/limits.html
7 changes: 5 additions & 2 deletions docs/platforms/polaris.rst
@@ -42,12 +42,15 @@ for installing libEnsemble, including using Spack.
Ensuring use of mpiexec
-----------------------

If using the :doc:`MPIExecutor<../executor/mpi_executor>` it is recommended to
ensure you are using ``mpiexec`` instead of ``aprun``. When setting up the executor use::
Prior to libE v0.10.0, when using the :doc:`MPIExecutor<../executor/mpi_executor>` it
was necessary to manually tell libEnsemble to use ``mpiexec`` instead of ``aprun``.
When setting up the executor, use::

from libensemble.executors.mpi_executor import MPIExecutor
exctr = MPIExecutor(custom_info={'mpi_runner':'mpich', 'runner_name':'mpiexec'})

From version 0.10.0, this is not necessary.

Job Submission
--------------

2 changes: 1 addition & 1 deletion docs/requirements.txt
@@ -1,4 +1,4 @@
sphinx
sphinx<7
sphinxcontrib-bibtex
autodoc_pydantic
sphinx-design
2 changes: 1 addition & 1 deletion docs/running_libE.rst
@@ -99,7 +99,7 @@ Reverse-ssh interface
By specifying ``--comms ssh`` on the command line, libEnsemble workers can
be launched to remote ssh-accessible systems without needing to specify ``"port"`` or ``"authkey"``. This allows users
to colocate workers, simulation, or generator functions, and any applications they submit on the same machine. Such user
functions can also be persistent, unlike when launching remote functions via :ref:`funcX<funcx_ref>`.
functions can also be persistent, unlike when launching remote functions via :ref:`Globus Compute<funcx_ref>`.

The working directory and Python to run on the remote system need to be specified. Running a calling script may resemble::

