Merge pull request #314 from Libensemble/develop
Updating Stefan's branch with latest Develop
jmlarson1 committed Dec 3, 2019
2 parents 0c36945 + da24a66 commit 29defcc
Showing 20 changed files with 226 additions and 133 deletions.
3 changes: 2 additions & 1 deletion docs/FAQ.rst
@@ -145,7 +145,8 @@ There are several ways to address this nuisance, but all involve trial and error
An easy (but insecure) solution is temporarily disabling the Firewall through
System Preferences -> Security & Privacy -> Firewall -> Turn Off Firewall. Alternatively,
adding a Firewall "Allow incoming connections" rule can be attempted for the offending
Job Controller executable. We've had limited success running ``sudo codesign --force --deep --sign - /path/to/application.app``
Job Controller executable. We've had limited success running
``sudo codesign --force --deep --sign - /path/to/application.app``
on our Job Controller executables, then confirming the next alerts for the executable
and ``mpiexec.hydra``.

10 changes: 9 additions & 1 deletion docs/data_structures/calc_status.rst
@@ -3,7 +3,15 @@
calc_status
===========

The ``calc_status`` is an integer attribute with named (enumerated) values and a corresponding description that can be used in :ref:`sim_f<api_sim_f>` or :ref:`gen_f<api_gen_f>` functions to capture the status of a calculation. This is returned to the manager and printed to the ``libE_stats.txt`` file. Only the status values ``FINISHED_PERSISTENT_SIM_TAG`` and ``FINISHED_PERSISTENT_GEN_TAG`` are currently used by the manager, but others can still provide a useful summary in ``libE_stats.txt``. The user determines the status of the calculation, as it could include multiple application runs. It can be added as a third return variable in ``sim_f`` or ``gen_f`` functions.
The ``calc_status`` is an integer attribute with named (enumerated) values and
a corresponding description that can be used in :ref:`sim_f<api_sim_f>` or
:ref:`gen_f<api_gen_f>` functions to capture the status of a calculation. This
is returned to the manager and printed to the ``libE_stats.txt`` file. Only the
status values ``FINISHED_PERSISTENT_SIM_TAG`` and
``FINISHED_PERSISTENT_GEN_TAG`` are currently used by the manager, but others
can still provide a useful summary in ``libE_stats.txt``. The user determines
the status of the calculation, as it could include multiple application runs.
It can be added as a third return variable in ``sim_f`` or ``gen_f`` functions.
The calc_status codes are in the ``libensemble.message_numbers`` module.

Example of ``calc_status`` used along with :ref:`job controller<jobcontroller_index>` in sim_f:
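A minimal sketch of the return pattern (without the job-controller launch
itself; the input field ``'x'``, output field ``'f'``, and the constants
``WORKER_DONE``/``JOB_FAILED`` from ``libensemble.message_numbers`` are
illustrative assumptions, not taken from this page)::

    import numpy as np
    from libensemble.message_numbers import WORKER_DONE, JOB_FAILED

    def sim_norm(H, persis_info, sim_specs, libE_info):
        # Evaluate the received points and report how the calculation went.
        out = np.zeros(H['x'].shape[0], dtype=sim_specs['out'])
        out['f'] = np.linalg.norm(H['x'], axis=1)   # placeholder computation

        calc_status = WORKER_DONE                   # or e.g. JOB_FAILED on error

        # calc_status is the third return value; the manager records it
        # in libE_stats.txt.
        return out, persis_info, calc_status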
7 changes: 5 additions & 2 deletions docs/data_structures/work_dict.rst
@@ -20,8 +20,11 @@ Dictionary with integer keys ``i`` and dictionary values to be given to worker ``i``
'persistent' [bool]: True if worker 'i' will enter persistent mode
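A rough sketch of the shape of one entry, built inside an allocation function.
The helper name is hypothetical, and the keys ``'H_fields'``, ``'persis_info'``,
``'tag'``, and ``'libE_info'`` (with ``EVAL_SIM_TAG`` from
``libensemble.message_numbers``) are assumed from typical allocation functions
such as those linked below; a persistent worker would additionally carry the
``'persistent'`` flag described above::

    from libensemble.message_numbers import EVAL_SIM_TAG

    def fill_sim_work(Work, wid, sim_specs, persis_info, H_rows):
        # Hypothetical helper: ask worker `wid` to run its sim_f on rows
        # `H_rows` of the history array H.
        Work[wid] = {'H_fields': sim_specs['in'],       # fields of H sent to the worker
                     'persis_info': persis_info[wid],   # state carried between calls
                     'tag': EVAL_SIM_TAG,               # tells the worker to call sim_f
                     'libE_info': {'H_rows': H_rows}}   # rows of H to evaluate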

.. seealso::
For allocation functions giving Work dictionaries using persistent workers, see `start_only_persistent.py`_ or `start_persistent_local_opt_gens.py`_.
For a use case where the allocation and generator functions combine to do simulation evaluations with different resources (blocking some workers), see `test_6-hump_camel_with_different_nodes_uniform_sample.py`_.
For allocation functions giving Work dictionaries using persistent workers,
see `start_only_persistent.py`_ or `start_persistent_local_opt_gens.py`_.
For a use case where the allocation and generator functions combine to do
simulation evaluations with different resources (blocking some workers), see
`test_6-hump_camel_with_different_nodes_uniform_sample.py`_.

.. _start_only_persistent.py: https://github.com/Libensemble/libensemble/blob/develop/libensemble/alloc_funcs/start_only_persistent.py
.. _start_persistent_local_opt_gens.py: https://github.com/Libensemble/libensemble/blob/develop/libensemble/alloc_funcs/start_persistent_local_opt_gens.py
7 changes: 4 additions & 3 deletions docs/data_structures/worker_array.rst
@@ -12,9 +12,10 @@ worker array
'blocked' [int]:
Are the worker's resources blocked by another calculation?

Since workers can be in a variety of states, the worker array ``W`` contains
information about each worker's state. This can allow an allocation function to
determine what work should be performed.
The worker array ``W`` contains information about each worker's state. This
information helps allocation functions determine what work should be
performed next.
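For example, an allocation function might pick out idle, unblocked workers as
in this small self-contained sketch (the field ``'blocked'`` is documented
above; ``'worker_id'``, ``'active'``, and ``'persis_state'`` are assumed from
the convention table below)::

    import numpy as np

    # Illustrative worker array with the fields discussed on this page.
    W = np.zeros(3, dtype=[('worker_id', int), ('active', int),
                           ('persis_state', int), ('blocked', int)])
    W['worker_id'] = [1, 2, 3]
    W['active'] = [0, 1, 0]        # worker 2 is currently busy

    # Workers that are idle and not blocked can be given new work.
    idle = W['worker_id'][(W['active'] == 0) & (W['blocked'] == 0)]
    print(idle)                    # -> [1 3]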

We take the following convention:

========================================= ======= ============ =======
@@ -13,9 +13,11 @@ This assumes you have already:

Details on how to create forks can be found at: https://help.github.com/articles/fork-a-repo

You now have a configuration like the one shown in the answer at: https://stackoverflow.com/questions/6286571/are-git-forks-actually-git-clones
You now have a configuration like the one shown in the answer at:
https://stackoverflow.com/questions/6286571/are-git-forks-actually-git-clones

Upstream, in this case, is the official Spack repository on GitHub. Origin is your fork on GitHub and Local Machine is your local clone (from your fork).
Upstream, in this case, is the official Spack repository on GitHub. Origin is
your fork on GitHub and Local Machine is your local clone (from your fork).

Make sure ``SPACK_ROOT`` is set and the ``spack`` binary is on your path::

@@ -37,8 +39,8 @@ To set upstream repo::
Now to update (the main develop branch)
---------------------------------------

You will now update your local machine from the upstream repo (if in doubt, make a copy of your local repo
in your filesystem before doing the following).
You will now update your local machine from the upstream repo (if in doubt,
make a copy of your local repo in your filesystem before doing the following).

Check upstream remote is present::

12 changes: 8 additions & 4 deletions docs/dev_guide/release_management/release_process.rst
@@ -13,9 +13,11 @@ Before release

- A release branch should be taken off develop (or develop pulls controlled).

- Release notes for this version are added to the documentation with release date, including a list of supported (tested) platforms.
- Release notes for this version are added to the documentation with release
date, including a list of supported (tested) platforms.

- Version number is updated wherever it appears (in ``setup.py``, ``libensemble/__init__.py``, ``README.rst`` and twice in ``docs/conf.py``)
- Version number is updated wherever it appears
(in ``setup.py``, ``libensemble/__init__.py``, ``README.rst``, and twice in ``docs/conf.py``).

- Check year is correct in ``README.rst`` under *Citing libEnsemble* and in ``docs/conf.py``.

@@ -31,7 +33,8 @@ Before release

- Documentation must build and display correctly wherever hosted (currently readthedocs.com).

- Pull request from either develop or release branch to master requesting reviewer/s (including at least one other administrator).
- Pull request from either develop or release branch to master requesting
reviewer/s (including at least one other administrator).

- Reviewer will check tests have passed and approve merge.

@@ -55,6 +58,7 @@ An administrator will take the following steps.
After release
-------------

- Ensure all relevant GitHub issues are closed and moved to the *Done* column on the kanban project board (inc. the release checklist).
- Ensure all relevant GitHub issues are closed and moved to the *Done* column
on the kanban project board (inc. the release checklist).

- Email libEnsemble mailing list
31 changes: 15 additions & 16 deletions docs/history_output.rst
@@ -1,16 +1,15 @@
The History Array
~~~~~~~~~~~~~~~~~
libEnsemble uses a NumPy structured array :ref:`H<datastruct-history-array>` to
store output from ``gen_f`` and corresponding ``sim_f`` output. Similarly,
``gen_f`` and ``sim_f`` are expected to return output in NumPy structured
arrays. The names of the fields to be given as input to ``gen_f`` and ``sim_f``
must be an output from ``gen_f`` or ``sim_f``. In addition to the fields output
from ``sim_f`` and ``gen_f``, the final history returned from libEnsemble will
include the following fields:
store corresponding output from each ``gen_f`` and ``sim_f``. Similarly,
``gen_f`` and ``sim_f`` are expected to return output as NumPy structured
arrays. The names of the input fields for ``gen_f`` and ``sim_f``
must be output from ``gen_f`` or ``sim_f``. In addition to the user-function output fields,
the final history from libEnsemble will include the following:

* ``sim_id`` [int]: Each unit of work output from ``gen_f`` must have an
associated ``sim_id``. The generator can assign this, but users must be
careful to ensure points are added in order. For example, ``if alloc_f``
careful to ensure points are added in order. For example, if ``alloc_f``
allows for two ``gen_f`` instances to be running simultaneously, ``alloc_f``
should ensure that both don’t generate points with the same ``sim_id``.

@@ -20,7 +19,7 @@ include the following fields:
* ``given_time`` [float]: At what time (since the epoch) was this ``gen_f``
output given to a worker?

* ``sim_worker`` [int]: libEnsemble worker that it was given to be evaluated.
* ``sim_worker`` [int]: libEnsemble worker to which the output was given for evaluation

* ``gen_worker`` [int]: libEnsemble worker that generated this ``sim_id``
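As the introduction above notes, ``gen_f`` returns its output as a NumPy
structured array whose field names match the inputs expected by ``sim_f``. A
minimal sketch, modeled loosely on a uniform-sampling generator (the keys
``'gen_batch_size'`` and ``'rand_stream'`` and the field ``'x'`` are
assumptions)::

    import numpy as np

    def gen_uniform_sketch(H, persis_info, gen_specs, libE_info):
        # Build a batch of points as a structured array; 'x' must be an
        # input field of the sim_f.
        batch = gen_specs['gen_batch_size']
        out = np.zeros(batch, dtype=gen_specs['out'])
        out['x'] = persis_info['rand_stream'].uniform(0, 1, out['x'].shape)
        return out, persis_info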

@@ -44,15 +43,15 @@ where ``sim_count`` is the number of points evaluated.

Other libEnsemble files produced by default are:

* ``libE_stats.txt``: This contains a one-line summary of all user
calculations. Each calculation summary is sent by workers to the manager and
printed as the run progresses.
* ``libE_stats.txt``: This contains a one-line summary of each user
calculation. Each summary is sent by workers to the manager and
logged as the run progresses.

* ``ensemble.log``: This is the logging output from libEnsemble. The default
logging is at INFO level. To gain additional diagnostics logging level can be
set to DEBUG. If this file is not removed, multiple runs will append output.
Messages at or above level MANAGER_WARNING are also copied to stderr to alert
the user promptly. For more info, see :doc:`Logging<logging>`.
* ``ensemble.log``: This contains logging output from libEnsemble. The default
logging level is INFO. To gain additional diagnostics, the logging level can be
set to DEBUG. If this file is not removed, multiple runs will append output.
Messages at or above MANAGER_WARNING are also copied to stderr to alert
the user promptly. For more info, see :doc:`Logging<logging>`.

Output Analysis
^^^^^^^^^^^^^^^
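For example, a history file saved by libEnsemble can be loaded and inspected
with NumPy (the filename below is a placeholder for whatever ``.npy`` file a
given run produces)::

    import numpy as np

    H = np.load('libE_history_for_some_run.npy')    # placeholder filename
    print(H.dtype.names)                            # e.g. 'sim_id', 'given_time', 'sim_worker', ...
    print(H[['sim_id', 'given_time', 'sim_worker']][:5])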
4 changes: 3 additions & 1 deletion docs/job_controller/jc_index.rst
@@ -3,7 +3,9 @@
Job Controller
==============

The job controller can be used within the simulator (and potentially generator) functions to provide a simple, portable interface for running and managing user jobs.
The job controller can be used within the simulator (and potentially generator)
functions to provide a simple, portable interface for running and managing user
jobs.

.. toctree::
:maxdepth: 2
14 changes: 9 additions & 5 deletions docs/job_controller/job_controller.rst
@@ -18,9 +18,10 @@ See the controller APIs for optional arguments.
Job Class
---------

Jobs are created and returned through the job_controller launch function. Jobs can be polled and
killed with the respective poll and kill functions. Job information can be queried through the job attributes
below and the query functions. Note that the job attributes are only updated when they are
Jobs are created and returned through the job_controller launch function. Jobs
can be polled and killed with the respective poll and kill functions. Job
information can be queried through the job attributes below and the query
functions. Note that the job attributes are only updated when they are
polled/killed (or through other job or job controller functions).

.. autoclass:: Job
@@ -32,9 +33,12 @@ polled/killed (or through other job or job controller functions).
Job Attributes
--------------

Following is a list of job status and configuration attributes that can be retrieved from a job.
Following is a list of job status and configuration attributes that can be
retrieved from a job.

:NOTE: These should not be set directly. Jobs are launched by the job controller and job information can be queried through the job attributes below and the query functions.
:NOTE: These should not be set directly. Jobs are launched by the job
controller and job information can be queried through the job attributes
below and the query functions.
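A sketch of polling a job and reading a few of these attributes (the module
path, retrieval via ``JobController.controller``, and the attribute and state
names are recalled from this era of the API and should be treated as
assumptions)::

    import time
    from libensemble.controller import JobController

    def run_sim_job(num_procs=4, timeout=60):
        jobctl = JobController.controller       # controller created in the calling script
        job = jobctl.launch(calc_type='sim', num_procs=num_procs)

        start = time.time()
        while not job.finished and time.time() - start < timeout:
            time.sleep(2)
            job.poll()                          # attributes update only on poll/kill

        if not job.finished:
            job.kill()                          # give up after the timeout

        return job.state, job.errcode           # e.g. 'FINISHED' or 'FAILED', plus error code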

Job Status attributes include:

8 changes: 6 additions & 2 deletions docs/job_controller/mpi_controller.rst
@@ -22,7 +22,9 @@ See the controller API below for optional arguments.
Class specific attributes
-------------------------

These attributes can be set directly to alter behaviour of the MPI job controller. However, they should be used with caution, as they may not be implemented in other job controllers.
These attributes can be set directly to alter behaviour of the MPI job
controller. However, they should be used with caution, as they may not be
implemented in other job controllers.

:max_launch_attempts: (int) Maximum number of launch attempts for a given job. *Default: 5*.
:fail_time: (int) *Only if wait_on_run is set.* Maximum run-time to failure in seconds that results in re-launch. *Default: 2*.
@@ -33,4 +35,6 @@ Example. To increase resilience against launch failures::
jobctrl.max_launch_attempts = 10
jobctrl.fail_time = 5

Note that the re-try delay on launches starts at 5 seconds and increments by 5 seconds for each retry. So the 4th re-try will wait for 20 seconds before re-launching.
Note that the re-try delay on launches starts at 5 seconds and increments by
5 seconds for each retry. So the 4th re-try will wait for 20 seconds before
re-launching.
65 changes: 33 additions & 32 deletions docs/job_controller/overview.rst
@@ -1,38 +1,37 @@
Job Controller Overview
=======================

Many users will wish to launch an application to the system from a :ref:`sim_f<api_sim_f>`
(or :ref:`gen_f<api_gen_f>`), running on a worker.

An MPI job, for example, could be initialized with a subprocess call to ``mpirun``, or
an alternative launcher such as ``aprun`` or ``jsrun``. The sim_f may then monitor this job,
check output, and possibly kill the job. The word ``job`` is used here to represent
a launch of an application to the system, where the system could be a supercomputer,
cluster, or any other provision of compute resources.

In order to remove the burden of system interaction from the user, and enable sim_f
scripts that are portable between systems, a job_controller interface is provided by
libEnsemble. The job_controller provides the key functions: ``launch()``, ``poll()`` and
``kill()``. libEnsemble auto-detects a number of system criteria, such as the MPI launcher,
along with correct mechanisms for polling and killing jobs, on supported systems. It also
contains built in resilience, such as re-launching jobs that fail due to system factors.
User scripts that employ the job_controller interface will be portable between supported
systems. Job attributes can be queried to determine status after each poll. Functions are
also provided to access and interrogate files in the job's working directory.

The Job Controller module can be used to submit
and manage jobs using a portable interface. Various back-end mechanisms may be
used to implement this interface on the system, including a proxy launcher and
job management system, such as Balsam. Currently, these job_controllers launch
at the application level within an existing resource pool. However, submissions
to a batch schedular may be supported in the future.

At the top-level calling script, a job_controller is created and the executable
gen or sim applications are registered to it (these are applications that will
be runnable jobs). If an alternative job_controller, such as Balsam, is to be
used, then these can be created as in the example. Once in the user-side worker
code (sim/gen func), an MPI based job_controller can be retrieved without any
need to specify the type.
Users who wish to launch jobs to a system from a :ref:`sim_f<api_sim_f>` (or :ref:`gen_f<api_gen_f>`)
running on a worker have several options.

Typically, an MPI job could be initialized with a subprocess call to
``mpirun`` or an alternative launcher such as ``aprun`` or ``jsrun``. The ``sim_f``
may then monitor this job, check output, and possibly kill the job. We use "job"
to represent an application launch to the system, which may be a supercomputer,
cluster, or other provision of compute resources.

A **job_controller** interface is provided by libEnsemble to remove the burden of
system interaction from the user and ease writing portable user scripts that
launch applications. The job_controller provides the key functions: ``launch()``,
``poll()`` and ``kill()``. Job attributes can be queried to determine status after
each poll. To implement these functions, libEnsemble auto-detects system criteria
such as the MPI launcher and mechanisms to poll and kill jobs on supported systems.
libEnsemble's job_controller is resilient, and can re-launch jobs that fail due
to system factors.

Functions are also provided to access and interrogate files in the job's working directory.

Various back-end mechanisms may be used by the job_controller to best interact
with each system, including proxy launchers or job management systems like
Balsam_. Currently, these job_controllers launch at the application level within
an existing resource pool. However, submissions to a batch scheduler may be
supported in the future.

In a calling script, a job_controller object is created and the executable
generator or simulation applications are registered to it for launch. If an
alternative job_controller such as Balsam is to be used, the applications can be
registered as in the example below. Once in the user-side worker code (sim/gen func),
an MPI based job_controller can be retrieved without any need to specify the type.

**Example usage (code runnable with or without a Balsam backend):**
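A minimal sketch of the pattern (module paths, ``register_calc``, and the
launch arguments reflect the job-controller API of this period and are
assumptions to check against the registration docs)::

    # Calling script: create a job controller and register the sim application.
    from libensemble.mpi_controller import MPIJobController

    jobctrl = MPIJobController()
    jobctrl.register_calc(full_path='/path/to/sim_app', calc_type='sim')

    # Inside the sim_f (worker side): retrieve the controller and launch a job.
    from libensemble.controller import JobController

    jobctl = JobController.controller
    job = jobctl.launch(calc_type='sim', num_procs=4, app_args='input.txt',
                        stdout='out.txt', stderr='err.txt')
    job.poll()
    print(job.state)        # job attributes refresh on poll/kill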

@@ -92,3 +91,5 @@ For a more realistic example see:
- libensemble/tests/scaling_tests/forces/

which launches the forces.x application as an MPI job.

.. _Balsam: https://balsam.readthedocs.io/en/latest/
