Merged
Changes from all commits
39 commits
4ef9318
initial commit for making a pair of scripts that respectively produce…
jlnav May 27, 2022
434cb5c
add remaining options, with sensible defaults.
jlnav May 31, 2022
605fdc6
renames, some cleanup, begin submission script
jlnav May 31, 2022
2618db8
black first script
jlnav May 31, 2022
e5a5f06
add launcher choices
jlnav May 31, 2022
c5cd5fd
replace some unportable defaults, small fixes, instantiate psi/j exec…
jlnav Jun 1, 2022
938ef2d
Merge branch 'develop' into feature/psij_scripts
jlnav Jun 2, 2022
750d522
small adjusmtments, auto-specify PYTHONNOUSERSITE, add logic to libes…
jlnav Jun 3, 2022
076972e
small fixes
jlnav Jun 3, 2022
0d18e26
small adjustments
jlnav Jun 6, 2022
a9991a5
various small fixes, plus allow user to narrow down search directory …
jlnav Jun 8, 2022
6eef8f0
black, and create stdout/stderr files
jlnav Jun 8, 2022
cabb019
proper python headers?
jlnav Jun 8, 2022
f867030
small reformattings, set new script as installable in environment's bin
jlnav Jun 8, 2022
8a1a847
make default nnodes 1
jlnav Jun 8, 2022
dca2923
probably giving up on evaluating and waiting for exit criteria for no…
jlnav Jun 13, 2022
66a5cf6
black
jlnav Jun 13, 2022
85cb75a
fix reference to other psi-j script
jlnav Jun 13, 2022
f54c786
first attempts to define MPI resources for an MPI run
jlnav Jun 13, 2022
6d94da0
black
jlnav Jun 13, 2022
6cb26bb
specify processes per node
jlnav Jun 13, 2022
111375f
try libesubmit with ppn=1
jlnav Jun 13, 2022
be55f73
just experimenting until some arrangement of resources for MPI runs w…
jlnav Jun 13, 2022
613c8ba
trying out dry-run feature and adding nnodes option to libesubmit
jlnav Jun 14, 2022
9d9d1e3
print help and exit if libesubmit didn't take in a json file
jlnav Jun 15, 2022
5ba3733
Merge branch 'develop' into feature/psij_scripts
jlnav Jun 30, 2022
8d95433
move psij scripts into base scripts directory
jlnav Jun 30, 2022
c1aea4d
fix symlink
jlnav Jun 30, 2022
4a8d9ac
Merge branch 'develop' into feature/psij_scripts
jlnav Jul 6, 2022
f585498
oMerge branch 'develop' into feature/psij_scripts
jlnav Jul 11, 2022
177c16f
odd symlink in repo base?
jlnav Jul 11, 2022
4e23b95
some fixes, add if name == main line to simple forces
jlnav Jul 11, 2022
e9edbb9
some attempts to catch and help resolve missing dependencies in psij …
jlnav Jul 13, 2022
4748b79
first attempt at including nsim_workers and nresource_sets in libereg…
jlnav Jul 13, 2022
20e0071
Merge branch 'develop' into feature/psij_scripts
jlnav Jul 14, 2022
2dd65b7
fix logic so args are just passed though - evaluating how well they w…
jlnav Jul 14, 2022
7777f2c
Merge branch 'develop' into feature/psij_scripts
jlnav Jul 15, 2022
25c5a86
first approach to documenting psi/j scripts
jlnav Jul 18, 2022
41ad54d
mention new optional dependencies in README
jlnav Jul 19, 2022
13 changes: 11 additions & 2 deletions README.rst
@@ -71,7 +71,7 @@ and cores, and can dynamically assign resources to workers.
Dependencies
~~~~~~~~~~~~

Required dependencies:
**Required dependencies**:

* Python_ 3.7 or above
* NumPy_
@@ -83,7 +83,7 @@ When using ``mpi4py`` for libEnsemble communications:
* A functional MPI 1.x/2.x/3.x implementation, such as MPICH_, built with shared/dynamic libraries
* mpi4py_ v2.0.0 or above

Optional dependencies:
**Optional dependencies**:

* Balsam_

@@ -103,6 +103,12 @@ a function-as-a-service platform to which workers can submit remote generator or
simulator function instances. This feature can help distribute an ensemble
across systems and heterogeneous resources.

* `psi-j-python`_

As of v0.9.2+dev, libEnsemble features a set of command-line utilities for submitting
libEnsemble jobs to almost any system or scheduler via `psi-j-python`_, a Python
interface to `PSI/J`_. These utilities also require tqdm_.

The example simulation and generation functions and tests require the following:

* SciPy_
@@ -330,6 +336,8 @@ See a complete list of `example user scripts`_.
.. _petsc4py: https://bitbucket.org/petsc/petsc4py
.. _PETSc/TAO: http://www.mcs.anl.gov/petsc
.. _poster: https://figshare.com/articles/libEnsemble_A_Python_Library_for_Dynamic_Ensemble-Based_Computations/12559520
.. _PSI/J: https://exaworks.org/psij
.. _psi-j-python: https://github.com/ExaWorks/psi-j-python
.. _psutil: https://pypi.org/project/psutil/
.. _PyPI: https://pypi.org
.. _pytest-cov: https://pypi.org/project/pytest-cov/
@@ -348,6 +356,7 @@ See a complete list of `example user scripts`_.
.. _tarball: https://github.com/Libensemble/libensemble/releases/latest
.. _Tasmanian: https://tasmanian.ornl.gov/
.. _Theta: https://www.alcf.anl.gov/alcf-resources/theta
.. _tqdm: https://tqdm.github.io/
.. _user guide: https://libensemble.readthedocs.io/en/latest/programming_libE.html
.. _VTMOP: https://github.com/Libensemble/libe-community-examples#vtmop
.. _WarpX: https://warpx.readthedocs.io/en/latest/
3 changes: 3 additions & 0 deletions docs/introduction_latex.rst
@@ -54,6 +54,8 @@ We now present further information on running and testing libEnsemble.
.. _petsc4py: https://bitbucket.org/petsc/petsc4py
.. _PETSc/TAO: http://www.mcs.anl.gov/petsc
.. _poster: https://figshare.com/articles/libEnsemble_A_Python_Library_for_Dynamic_Ensemble-Based_Computations/12559520
.. _PSI/J: https://exaworks.org/psij
.. _psi-j-python: https://github.com/ExaWorks/psi-j-python
.. _psutil: https://pypi.org/project/psutil/
.. _PyPI: https://pypi.org
.. _pytest-cov: https://pypi.org/project/pytest-cov/
@@ -72,6 +74,7 @@ We now present further information on running and testing libEnsemble.
.. _tarball: https://github.com/Libensemble/libensemble/releases/latest
.. _Tasmanian: https://tasmanian.ornl.gov/
.. _Theta: https://www.alcf.anl.gov/alcf-resources/theta
.. _tqdm: https://tqdm.github.io/
.. _user guide: https://libensemble.readthedocs.io/en/latest/programming_libE.html
.. _VTMOP: https://informs-sim.org/wsc20papers/311.pdf
.. _WarpX: https://warpx.readthedocs.io/en/latest/
4 changes: 4 additions & 0 deletions docs/platforms/example_scripts.rst
@@ -5,6 +5,10 @@ Below are example submission scripts used to configure and launch libEnsemble
on a variety of high-powered systems. See :doc:`here<platforms_index>` for more
information about the respective systems and configuration.

As an alternative to interacting with the scheduler directly or configuring submission
scripts, libEnsemble now features a portable set of :ref:`command-line utilities<liberegister>`
for submitting workflows to almost any system or scheduler.

Slurm - Basic
-------------

141 changes: 141 additions & 0 deletions docs/running_libE.rst
@@ -134,6 +134,147 @@ Further command line options

See the **parse_args()** function in :doc:`Convenience Tools<utilities>` for further command line options.

.. _liberegister:

liberegister / libesubmit
-------------------------

libEnsemble now features a pair of command-line utilities for preparing and launching libEnsemble workflows on almost
any machine and scheduler, using a `PSI/J`_ Python implementation. This is an alternative
to maintaining system- or scheduler-specific batch submission scripts.

- `liberegister`

Creates an initial, platform-independent PSI/J serialization of a libEnsemble submission. Run this utility on
a calling script in a familiar manner::

liberegister my_calling_script.py --comms local --nworkers 4

This produces an initial `my_calling_script.json` serialization conforming to PSI/J's specification:

.. container:: toggle

.. container:: header

`my_calling_script.json`

.. code-block:: JSON

{
"version": 0.1,
"type": "JobSpec",
"data": {
"name": "libe-job",
"executable": "python",
"arguments": [
"my_calling_script.py",
"--comms",
"local",
"--nworkers",
"4"
],
"directory": null,
"inherit_environment": true,
"environment": {
"PYTHONNOUSERSITE": "1"
},
"stdin_path": null,
"stdout_path": null,
"stderr_path": null,
"resources": {
"node_count": 1,
"process_count": null,
"process_per_node": null,
"cpu_cores_per_process": null,
"gpu_cores_per_process": null,
"exclusive_node_use": true
},
"attributes": {
"duration": "30",
"queue_name": null,
"project_name": null,
"reservation_id": null,
"custom_attributes": {}
},
"launcher": null
}
}
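The serialization above is plain JSON, so it can also be produced or inspected
programmatically. Below is a minimal sketch of building an equivalent JobSpec
dictionary with only the standard library; ``make_libe_jobspec`` is a hypothetical
helper for illustration (not liberegister's actual implementation), with field
values mirroring the example above:

```python
import json


def make_libe_jobspec(script, comms="local", nworkers=4):
    """Build a PSI/J-style JobSpec serialization for a libEnsemble calling script.

    Illustrative sketch only; mirrors the fields liberegister emits.
    """
    return {
        "version": 0.1,
        "type": "JobSpec",
        "data": {
            "name": "libe-job",
            "executable": "python",
            "arguments": [script, "--comms", comms, "--nworkers", str(nworkers)],
            "directory": None,
            "inherit_environment": True,
            "environment": {"PYTHONNOUSERSITE": "1"},
            "stdin_path": None,
            "stdout_path": None,
            "stderr_path": None,
            "resources": {
                "node_count": 1,
                "process_count": None,
                "process_per_node": None,
                "cpu_cores_per_process": None,
                "gpu_cores_per_process": None,
                "exclusive_node_use": True,
            },
            "attributes": {
                "duration": "30",
                "queue_name": None,
                "project_name": None,
                "reservation_id": None,
                "custom_attributes": {},
            },
            "launcher": None,
        },
    }


# Round-trips cleanly through JSON, like the file liberegister writes
spec = make_libe_jobspec("my_calling_script.py")
serialized = json.dumps(spec, indent=4)
```

Because everything scheduler-specific is left ``null``, the same serialization can
later be parameterized for any target system.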

- `libesubmit`

Further parameterizes a serialization and submits a corresponding Job to the specified scheduler.
There is no need to run ``qsub``, ``sbatch``, or similar commands on a batch submission script. For instance::

libesubmit my_calling_script.json -q debug -A project -s slurm --nnodes 8

Results in::

*** libEnsemble 0.9.2+dev ***
Imported PSI/J serialization: my_calling_script.json. Preparing submission...
Calling script: my_calling_script.py
...found! Proceding.
Submitting Job!: Job[id=ce4ead75-a3a4-42a3-94ff-c44b3b2c7e61, native_id=None, executor=None, status=JobStatus[NEW, time=1658167808.5125017]]

$ squeue --long --users=user
Mon Jul 18 13:10:15 2022
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)
2508936 debug ce4ead75 user PENDING 0:00 30:00 8 (Priority)

This also produces a Job-specific representation, e.g.:

.. container:: toggle

.. container:: header

`8ba9de56.my_calling_script.json`

.. code-block:: JSON

{
"version": 0.1,
"type": "JobSpec",
"data": {
"name": "libe-job",
"executable": "/Users/jnavarro/miniconda3/envs/libe/bin/python3.8",
"arguments": [
"my_calling_script.py",
"--comms",
"local",
"--nworkers",
"4"
],
"directory": "/home/user/libensemble/scratch",
"inherit_environment": true,
"environment": {
"PYTHONNOUSERSITE": "1"
},
"stdin_path": null,
"stdout_path": "8ba9de56.my_calling_script.out",
"stderr_path": "8ba9de56.my_calling_script.err",
"resources": {
"node_count": 8,
"process_count": null,
"process_per_node": null,
"cpu_cores_per_process": null,
"gpu_cores_per_process": null,
"exclusive_node_use": true
},
"attributes": {
"duration": "30",
"queue_name": "debug",
"project_name": "project",
"reservation_id": null,
"custom_attributes": {}
},
"launcher": null
}
}

If libesubmit is run on a ``.json`` serialization from liberegister and cannot find the
specified calling script, it will help search for matching candidate scripts.
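Conceptually, libesubmit's parameterization step amounts to loading the
serialization, overriding the scheduler-specific fields, and writing a Job-specific
copy with stdout/stderr paths, as in the example output above. A rough sketch under
that assumption follows; ``parameterize`` is a hypothetical helper written for
illustration, not libesubmit's actual API:

```python
import json
import uuid
from pathlib import Path


def parameterize(spec_file, queue=None, project=None, nnodes=None):
    """Hypothetical sketch of libesubmit's parameterization step: load a
    liberegister serialization, apply scheduler options, and write a
    Job-specific copy with stdout/stderr paths filled in."""
    spec = json.loads(Path(spec_file).read_text())
    data = spec["data"]

    jobid = uuid.uuid4().hex[:8]  # short Job ID, e.g. "8ba9de56"
    stem = Path(spec_file).stem
    data["stdout_path"] = f"{jobid}.{stem}.out"
    data["stderr_path"] = f"{jobid}.{stem}.err"

    if queue:
        data["attributes"]["queue_name"] = queue
    if project:
        data["attributes"]["project_name"] = project
    if nnodes:
        data["resources"]["node_count"] = nnodes

    out = Path(f"{jobid}.{stem}.json")
    out.write_text(json.dumps(spec, indent=4))
    return out
```

The real utility additionally instantiates a PSI/J executor for the chosen
scheduler (``-s slurm`` above) and submits the Job, which this sketch omits.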

.. _PSI/J: https://exaworks.org/psij

Persistent Workers
------------------
.. _persis_worker:
Expand Down
@@ -9,47 +9,49 @@
from libensemble.tools import parse_args, add_unique_random_streams
from libensemble.executors import MPIExecutor

if __name__ == "__main__":

# Parse number of workers, comms type, etc. from arguments
nworkers, is_manager, libE_specs, _ = parse_args()

# Initialize MPI Executor instance
exctr = MPIExecutor()

# Register simulation executable with executor
sim_app = os.path.join(os.getcwd(), "../forces_app/forces.x")

if not os.path.isfile(sim_app):
sys.exit("forces.x not found - please build first in ../forces_app dir")

exctr.register_app(full_path=sim_app, app_name="forces")

# State the sim_f, inputs, outputs
sim_specs = {
"sim_f": run_forces, # sim_f, imported above
"in": ["x"], # Name of input for sim_f
"out": [("energy", float)], # Name, type of output from sim_f
}

# State the gen_f, inputs, outputs, additional parameters
gen_specs = {
"gen_f": uniform_random_sample, # Generator function
"in": [], # Generator input
"out": [("x", float, (1,))], # Name, type and size of data from gen_f
"user": {
"lb": np.array([1000]), # User parameters for the gen_f
"ub": np.array([3000]),
"gen_batch_size": 8,
},
}

# Create and work inside separate per-simulation directories
libE_specs["sim_dirs_make"] = True

# Instruct libEnsemble to exit after this many simulations
exit_criteria = {"sim_max": 8}

# Seed random streams for each worker, particularly for gen_f
persis_info = add_unique_random_streams({}, nworkers + 1)

# Launch libEnsemble
H, persis_info, flag = libE(sim_specs, gen_specs, exit_criteria, persis_info=persis_info, libE_specs=libE_specs)
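The ``add_unique_random_streams`` call above gives the manager and each worker an
independently seeded NumPy random stream, so each worker's ``gen_f`` draws
reproducible but distinct samples. Conceptually, it behaves like this simplified
sketch (an assumption for illustration, not libEnsemble's actual implementation):

```python
import numpy as np


def add_unique_random_streams_sketch(persis_info, nstreams):
    # Simplified sketch: one independently seeded generator per index
    # (manager is index 0; workers are 1..nstreams-1), so each worker's
    # gen_f draws reproducible but distinct random numbers.
    for i in range(nstreams):
        persis_info[i] = {"rand_stream": np.random.default_rng(i)}
    return persis_info


persis_info = add_unique_random_streams_sketch({}, 5)  # manager + 4 workers
# A draw like the uniform_random_sample gen_f makes within lb/ub above:
x = persis_info[1]["rand_stream"].uniform(1000, 3000)
```

Because each stream is seeded by worker index, rerunning the ensemble with the
same worker count reproduces the same sample sequence.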