Merge pull request #19 from djf604/v0.1.8
V0.1.8
Dominic Fitzgerald committed Aug 29, 2018
2 parents 5979fd6 + 16344ec commit 16aa26c
Showing 31 changed files with 841 additions and 671 deletions.
71 changes: 8 additions & 63 deletions docs/building_pipelines.rst
@@ -152,38 +152,6 @@ A ``CondaPackage`` named tuple takes the following keys:
To see which executables are offered by Bioconda, please refer to their `package index
<https://bioconda.github.io/recipes.html>`_.

Parsl Configuration
###################
A default Parsl configuration can be provided in the event the user doesn't provide any higher-precedence Parsl
configuration. The returned ``dict`` will be fed directly to Parsl before execution.

.. code-block:: python
def parsl_configuration(self):
return {
'sites': [
{
'site': 'Local_Threads',
'auth': {'channel': None},
'execution': {
'executor': 'threads',
'provider': None,
'max_workers': 4
}
}
],
'globals': {'lazyErrors': True}
}
To better understand Parsl configuration, please refer to `their documentation
<http://parsl.readthedocs.io/en/latest/userguide/configuring.html>`_ on the subject.

.. note::

This method of configuring Parsl has very low precedence, and that's on purpose. The user is given every
opportunity to provide a configuration that works for her specific platform, so the configuration provided
by the pipeline is only meant as a desperation-style "we don't have anything else" configuration.

Pipeline Configuration
######################
The pipeline configuration contains attributes passed into the pipeline logic which may change from platform to
@@ -406,18 +374,18 @@ and ``Redirect`` objects.

Meta ``operon.meta.Meta``
#########################
The ``Meta`` class has only one method ``define_site()`` used to give a name to a resource configuration.
The ``Meta`` class has a method ``define_executor()`` used to give a name to a resource configuration.

.. code-block:: python
from operon.meta import Meta
Meta.define_site(name='small_site', resources={
Meta.define_executor(label='small_site', resources={
'cpu': '2',
'mem': '2G'
})
Meta.define_site(name='large_site', resources={
Meta.define_executor(label='large_site', resources={
'cpu': '8',
'mem': '50G'
})
Expand Down Expand Up @@ -480,22 +448,22 @@ dependencies are resolved.
In the above example, ``third`` won't start running until both ``first`` is finished running and the output from
``second`` called ``second.out`` is available.
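A minimal sketch of such a dependency chain is below. It is illustrative only: it assumes the
``Data(...).as_input()``/``.as_output()`` markers and a ``wait_on=`` registration argument, and the software names
and parameters are hypothetical.

.. code-block:: python

    from operon.components import Software, Parameter, Data

    first = Software('first')
    second = Software('second')
    third = Software('third')

    first.register(Parameter('--fast'))
    # second declares second.out as an output of its execution
    second.register(Parameter('--out', Data('second.out').as_output()))
    # third consumes second.out as input, and additionally waits on first itself
    third.register(
        Parameter('--in', Data('second.out').as_input()),
        wait_on=[first]
    )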

Multisite Pipelines
Multiexecutor Pipelines
-----------------------
For many workflows, the resource requirements of the constituent software won't be uniform. One solution is to calculate the
largest resource need and allocate that to every software, but this leaves a large amount of resources unused. A
better solution is to define resource pools of varying size and assign software to an appropriate pool. This can be
done with the ``meta=`` keyword argument in two ways.
done with the ``meta=`` keyword argument.

The developer can define a resource configuration with a call to ``Meta.define_site()`` and then pass that name to the
The developer can define a resource configuration with a call to ``Meta.define_executor()`` and then pass that name to the
``meta=`` keyword argument:

.. code-block:: python
from operon.components import Software
from operon.meta import Meta
Meta.define_site(name='small_site', resources={
Meta.define_executor(label='small_site', resources={
'cpu': '2',
'mem': '2G'
})
@@ -505,30 +473,10 @@ The developer can define a resource configuration with a call to ``Meta.define_s
Parameter('-a', '1'),
Parameter('-b', '2'),
meta={
'site': 'small_site' # Matches the above Meta definition
}
)
The developer can also define only the resource requirements of a given software at the time of registration:

.. code-block:: python
from operon.components import Software
soft1 = Software('soft1')
soft1.register(
Parameter('-a', '1'),
Parameter('-b', '2'),
meta={
'resources': {
'cpu': '2',
'mem': '2G'
}
'executor': 'small_site' # Matches the above Meta definition
}
)
The above implicitly defines a site called ``resources_(2,2G)``. For notes on how a multisite pipeline changes the
Parsl configuration, refer to the section on :ref:`Parsl configuration <parsl_configuration>`.
CodeBlock ``operon.components.CodeBlock``
#########################################
@@ -601,11 +549,8 @@ which stream(s) to redirect and to where on the filesystem, respectively.
.. code-block:: text
Redirect.STDOUT # >
Redirect.STDOUT_APPEND # >>
Redirect.STDERR # 2>
Redirect.STDERR_APPEND # 2>>
Redirect.BOTH # &>
Redirect.BOTH_APPEND # &>>
The order of ``Redirect`` objects passed to a ``Software`` instance, both in relation to each other and to other
``Parameter`` objects, doesn't matter. However, if more than two ``Redirect`` s are passed in, only the first two
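As a sketch of stream redirection (the tool name is hypothetical, and the ``stream=``/``dest=`` keywords are assumed
from the description above), stdout and stderr might be split like so:

.. code-block:: python

    from operon.components import Software, Parameter, Redirect

    aligner = Software('aligner')
    aligner.register(
        Parameter('--threads', '4'),
        # Send alignments to a file and diagnostics to a log
        Redirect(stream=Redirect.STDOUT, dest='alignments.sam'),
        Redirect(stream=Redirect.STDERR, dest='aligner.log')
    )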
39 changes: 32 additions & 7 deletions docs/changelog.rst
@@ -1,12 +1,29 @@
Changelog
=========

v0.1.1 (released Jan 2018)
--------------------------
* If the ``path=`` argument isn't provided to a ``Software`` instance, the path will attempt to populate from
``pipeline_config[software_name]['path']``
* Added ``subprogram=`` argument to ``Software``
* Made tab completer program silent if it ever fails because it doesn't exist
v0.1.8 (released 29 August 2018)
--------------------------------
* Moved to Parsl 0.6.0+ exclusively, now accepting only class-based Parsl configurations
* Integrated TinyDB for the framework state DB
* Tab autocompleter now automatically updates itself if needed
* Relaxed requirements on the name of the pipeline class, though ``Pipeline`` is still recommended
* Removed 'append' style redirects (``>>``) because Parsl doesn't support them at this time
* Stderr redirects are now properly respected on the lefthand side of a Pipe
* Supporting information added to run logs, such as using the run name in the filename, logging the original command
  used to start the run, etc.

v0.1.5 (released 7 June 2018)
-----------------------------
* Software execution itself can now be considered as input in the workflow graph
* Default dfk execution set down to 2 Python threads
* Fixed future state reporting
* Operon's temporary directory is now set to the same place as ``--logs-dir`` in hopes that this directory
will exist on worker nodes which may not have access to the same filesystem as the head node
* ``Data('')`` now returns an empty string
* Added a ``batch-run`` execution mode, which allows for multiple concurrent runs of a single pipeline, each
with different input and output, using the same resource pool
* Added support for multiple resource pools
* Added unit tests

v0.1.2 (released 21 Feb 2018)
-----------------------------
@@ -22,4 +39,12 @@ v0.1.2 (released 21 Feb 2018)
* Updated output for ``operon show``
* When a conda environment already exists, added ability to reinstall
* Added ``operon uninstall`` to remove pipelines
* Refactored cleanup so that it always runs, even if some programs fail during the run
* Refactored cleanup so that it always runs, even if some programs fail during the run

v0.1.1 (released Jan 2018)
--------------------------
* If the ``path=`` argument isn't provided to a ``Software`` instance, the path will attempt to populate from
``pipeline_config[software_name]['path']``
* Added ``subprogram=`` argument to ``Software``
* Made tab completer program silent if it ever fails because it doesn't exist

43 changes: 28 additions & 15 deletions docs/using_operon.rst
@@ -88,13 +88,15 @@ The set of accepted ``pipeline-options`` is defined by the pipeline itself and a
run to run, such as input files, metadata, etc. Four options will always exist:

* ``--pipeline-config`` can point to a pipeline config to use for this run only
* ``--parsl-config`` can point to a file containing JSON that represents a Parsl config to use for this run only
* ``--parsl-config`` can point to a Python file that represents a Parsl config to use for this run only (see below)
* ``--logs-dir`` can point to a location where log files from this run should be deposited; if it doesn't exist, it
will be created; defaults to the currect directory
will be created; defaults to the current directory
* ``--run-name`` gives a name to the run, which will be used in the log filename and helps differentiate this run from
other runs
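For example, a run might be started as follows (the pipeline name and paths are illustrative):

.. code-block:: text

    operon run my_pipeline \
        --pipeline-config /path/to/pipeline_config.json \
        --parsl-config /path/to/parsl_config.py \
        --logs-dir logs \
        --run-name sample_42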

When an Operon pipeline is run, under the hood it creates a Parsl workflow which can be run in many different ways
depending on the accompanying Parl configuration. This means that while the definition for a pipeline run with the
``run`` subprogram is consistent, that actual execution model may vary if the Parsl configuration varies.
When an Operon pipeline is run, under the hood it creates a Parsl workflow which can be executed in different ways
depending on the accompanying Parsl configuration. This means that while the definition for a pipeline run with the
``run`` subprogram is consistent, the actual execution model may vary if the Parsl configuration varies.

.. _parsl_configuration:

@@ -105,20 +107,31 @@ Parsl is the package that powers Operon and is responsible for Operon's power
Operon itself is only a front-end abstraction of a Parsl workflow; the actual execution model is fully
Parsl-specific and as such it's advised to check out the
`Parsl documentation <http://parsl.readthedocs.io/en/latest/>`_
to get a sense for how to design a Parls configuration for a specific need-case.
to get a sense for how to design a Parsl configuration for a specific use case.

The Parsl configuration must be specified in a Python file where the variable name ``config`` contains an object of
type ``parsl.config.Config``:

.. code-block:: python
from parsl.config import Config
config = Config(
executors=[...],
lazy_errors=True,
retries=10
)
The ``run`` subprogram attempts to pull a Parsl configuration from the user in the following order:

1. From the command line argument ``--parsl-config``
2. From the pipeline configuration key ``parsl_config``
3. From a platform default JSON file located at ``$OPERON_HOME/.operon/parsl_config.json``
4. A default parsl configuration provided by the pipeline
5. A package default parsl configuration of 8 workers using Python threads
1. A path from the command line argument ``--parsl-config``
2. A path from the pipeline configuration key ``parsl_config``
3. A package default Parsl configuration of 2 workers using Python threads

The Parsl configuration can contain multiple sites, each with different models of execution and different available
resources. If a multisite Parsl configuration is provided to Operon, it will try to match up the site names as best as
possible and execute software on appropriate sites. Any software which can't find a Parsl configuration site match will
run in a random site. The set of site names the pipeline expects is output as a part of ``operon show``.
The Parsl configuration can contain multiple executors, each with different models of execution and different available
resources. If a multiexecutor Parsl configuration is provided to Operon, it will try to match up the executor names as best as
possible and execute software on appropriate executors. Any software which can't find a Parsl configuration executor match will
run in a random executor. The set of executor names the pipeline expects is output as a part of ``operon show``.
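As a sketch, a minimal multiexecutor configuration built on local threads might look like the following, where the
executor labels are illustrative but would be expected to match those the pipeline declares via ``Meta.define_executor()``:

.. code-block:: python

    from parsl.config import Config
    from parsl.executors.threads import ThreadPoolExecutor

    # Two thread pools of different sizes; Operon matches software to
    # executors by label
    config = Config(
        executors=[
            ThreadPoolExecutor(label='small_site', max_threads=2),
            ThreadPoolExecutor(label='large_site', max_threads=8)
        ]
    )

Saved to a Python file, this configuration could be passed to a run with ``--parsl-config``.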

For more detailed information, refer to the
`Parsl documentation <http://parsl.readthedocs.io/en/latest/userguide/configuring.html>`_ on the subject.
3 changes: 2 additions & 1 deletion operon/__init__.py
@@ -1 +1,2 @@
__version__ = '0.1.5'
__version__ = '0.1.8'
COMPLETER_VERSION = 1
28 changes: 18 additions & 10 deletions operon/_cli/_completer.py
@@ -2,24 +2,20 @@
import os
import sys
import subprocess
import json
try:
from operon._cli import get_operon_subcommands
from operon._util.home import OperonState
from operon import COMPLETER_VERSION
except ImportError:
sys.exit()

COMPGEN = 'compgen -W "{options}" -- "{stub}"'
VERSION = 1
SEMANTIC_VERSION = '0.1.8'


def get_pipeline_options():
operon_home_root = os.environ.get('OPERON_HOME') or os.path.expanduser('~')
operon_state_json_path = os.path.join(operon_home_root, '.operon', 'operon_state.json')
try:
with open(operon_state_json_path) as operon_state_json:
operon_state = json.load(operon_state_json)
return ' '.join(operon_state['pipelines'].keys())
except OSError:
return ''
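# Query the TinyDB-backed state database for all installed pipeline records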
return ' '.join([_p['name'] for _p in OperonState().db.search(OperonState().query.type == 'pipeline_record')])


def get_completion_options(options, stub):
@@ -42,7 +38,7 @@ def completer():
completion_options = ''
if num_completed_tokens == 1:
completion_options = get_completion_options(
options=' '.join(get_operon_subcommands().replace('_', '-')),
options=' '.join([_sub.replace('_', '-') for _sub in get_operon_subcommands()]),
stub=stub_token
)
elif num_completed_tokens == 2:
@@ -57,4 +53,16 @@ def completer():


if __name__ == '__main__':
# Run version check, self-update if necessary
if COMPLETER_VERSION > VERSION:
try:
import inspect
from operon._cli import _completer
completer_path = os.path.abspath(__file__)
with open(completer_path, 'w') as operon_completer:
operon_completer.write(inspect.getsource(_completer))
os.chmod(completer_path, 0o755)
except:
pass

completer()
37 changes: 35 additions & 2 deletions operon/_cli/subcommands/__init__.py
@@ -1,9 +1,14 @@
import os
import inspect

from operon._util.home import get_operon_home, load_pipeline_file
from operon._util.errors import MalformedPipelineError
from operon.components import ParslPipeline

FLATTEN = 0
MODULE_NAME = 0
MODULE_INSTANCE = 1


class BaseSubcommand(object):
home_pipelines = os.path.join(get_operon_home(), 'pipelines')
@@ -23,16 +28,44 @@ def get_pipeline_instance(self, pipeline_name):
else:
return None

# Get all classes in the pipeline file
classes_in_pipeline_mod = [
c for c in
inspect.getmembers(pipeline_mod, inspect.isclass)
if '__operon.pipeline' in str(c[MODULE_INSTANCE])
]
# If there is only one class in the pipeline file, use that class
if len(classes_in_pipeline_mod) == 1:
pipeline_class = classes_in_pipeline_mod[FLATTEN][MODULE_INSTANCE]
# If there are multiple classes, attempt to find one called Pipeline
elif len(classes_in_pipeline_mod) > 1:
try:
pipeline_class = [
c[MODULE_INSTANCE]
for c in classes_in_pipeline_mod
if c[MODULE_NAME] == 'Pipeline'
][FLATTEN]
except:
# If no class named Pipeline exists, raise a MalformedPipelineError
raise MalformedPipelineError(
'Pipeline file contained multiple classes, none of '
'which were called \'Pipeline\'\n'
'Try the form:\n\n'
'\tclass Pipeline(ParslPipeline):\n')
else:
# If no classes are found, raise a MalformedPipelineError
raise MalformedPipelineError('Pipeline file has no classes')

# Return pipeline instance
try:
# Ensure Pipeline subclasses ParslPipeline
if not issubclass(pipeline_mod.Pipeline, ParslPipeline):
if not issubclass(pipeline_class, ParslPipeline):
raise MalformedPipelineError(
'Pipeline class does not subclass ParslPipeline\n'
'Try the form:\n\n'
'\tclass Pipeline(ParslPipeline):\n'
)
return pipeline_mod.Pipeline()
return pipeline_class()
except AttributeError:
# Ensure the pipeline file contains a class called Pipeline
raise MalformedPipelineError(
