Merge pull request #19 from djf604/v0.1.8
V0.1.8
Dominic Fitzgerald committed Aug 29, 2018
2 parents 5979fd6 + 16344ec commit 16aa26c
Showing 31 changed files with 841 additions and 671 deletions.
71 changes: 8 additions & 63 deletions docs/building_pipelines.rst
@@ -152,38 +152,6 @@ A ``CondaPackage`` named tuple takes the following keys:
To see which executables are offered by Bioconda, please refer to their `package index
<https://bioconda.github.io/recipes.html>`_.

Parsl Configuration
###################
A default Parsl configuration can be provided in the event the user doesn't provide any higher-precedence Parsl
configuration. The returned ``dict`` will be fed directly to Parsl before execution.

.. code-block:: python
def parsl_configuration(self):
return {
'sites': [
{
'site': 'Local_Threads',
'auth': {'channel': None},
'execution': {
'executor': 'threads',
'provider': None,
'max_workers': 4
}
}
],
'globals': {'lazyErrors': True}
}
To better understand Parsl configuration, please refer to `their documentation
<http://parsl.readthedocs.io/en/latest/userguide/configuring.html>`_ on the subject.

.. note::

This method of configuring Parsl has very low precedence, and that's on purpose. The user is given every
opportunity to provide a configuration that works for her specific platform, so the configuration provided
by the pipeline is only meant as a desperation-style "we don't have anything else" configuration.

Pipeline Configuration
######################
The pipeline configuration contains attributes passed into the pipeline logic which may change from platform to
@@ -406,18 +374,18 @@ and ``Redirect`` objects.

Meta ``operon.meta.Meta``
#########################
The ``Meta`` class has only one method ``define_site()`` used to give a name to a resource configuration.
The ``Meta`` class has a method ``define_executor()`` used to give a name to a resource configuration.

.. code-block:: python
from operon.meta import Meta
Meta.define_site(name='small_site', resources={
Meta.define_executor(label='small_site', resources={
'cpu': '2',
'mem': '2G'
})
Meta.define_site(name='large_site', resources={
Meta.define_executor(label='large_site', resources={
'cpu': '8',
'mem': '50G'
})
Expand Down Expand Up @@ -480,22 +448,22 @@ dependencies are resolved.
In the above example, ``third`` won't start running until both ``first`` is finished running and the output from
``second`` called ``second.out`` is available.
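A minimal sketch of such a dependency chain is below. It is illustrative only: it assumes the
``Data(...).as_input()``/``.as_output()`` markers and a ``wait_on=`` registration argument, and the software names
and parameters are hypothetical.

.. code-block:: python

    from operon.components import Software, Parameter, Data

    first = Software('first')
    second = Software('second')
    third = Software('third')

    first.register(Parameter('--fast'))
    # second declares second.out as an output of its execution
    second.register(Parameter('--out', Data('second.out').as_output()))
    # third consumes second.out as input, and additionally waits on first itself
    third.register(
        Parameter('--in', Data('second.out').as_input()),
        wait_on=[first]
    )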

Multisite Pipelines
Multiexecutor Pipelines
-----------------------
For many workflows, the resource requirements of the constituent software won't be uniform. One solution is to calculate the
largest resource need and allocate that to every software, but this leaves a large amount of resources unused. A
better solution is to define resource pools of varying size and assign software to an appropriate pool. This can be
done with the ``meta=`` keyword argument in two ways.
done with the ``meta=`` keyword argument.

The developer can define a resource configuration with a call to ``Meta.define_site()`` and then pass that name to the
The developer can define a resource configuration with a call to ``Meta.define_executor()`` and then pass that name to the
``meta=`` keyword argument:

.. code-block:: python
from operon.components import Software
from operon.meta import Meta
Meta.define_site(name='small_site', resources={
Meta.define_executor(label='small_site', resources={
'cpu': '2',
'mem': '2G'
})
@@ -505,30 +473,10 @@ The developer can define a resource configuration with a call to ``Meta.define_s
Parameter('-a', '1'),
Parameter('-b', '2'),
meta={
'site': 'small_site' # Matches the above Meta definition
}
)
The developer can also define only the resource requirements of a given software at the time of registration:

.. code-block:: python
from operon.components import Software
soft1 = Software('soft1')
soft1.register(
Parameter('-a', '1'),
Parameter('-b', '2'),
meta={
'resources': {
'cpu': '2',
'mem': '2G'
}
'executor': 'small_site' # Matches the above Meta definition
}
)
The above implicitly defines a site called ``resources_(2,2G)``. For notes on how a multisite pipeline changes the
Parsl configuration, refer to the section on :ref:`Parsl configuration <parsl_configuration>`.
CodeBlock ``operon.components.CodeBlock``
#########################################
@@ -601,11 +549,8 @@ which stream(s) to redirect and to where on the filesystem, respectively.
.. code-block:: text
Redirect.STDOUT # >
Redirect.STDOUT_APPEND # >>
Redirect.STDERR # 2>
Redirect.STDERR_APPEND # 2>>
Redirect.BOTH # &>
Redirect.BOTH_APPEND # &>>
The order of ``Redirect`` objects passed to a ``Software`` instance, both in relation to each other and to other
``Parameter`` objects, doesn't matter. However, if more than two ``Redirect`` s are passed in, only the first two
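As a sketch of stream redirection (the tool name is hypothetical, and the ``stream=``/``dest=`` keywords are assumed
from the description above), stdout and stderr might be split like so:

.. code-block:: python

    from operon.components import Software, Parameter, Redirect

    aligner = Software('aligner')
    aligner.register(
        Parameter('--threads', '4'),
        # Send alignments to a file and diagnostics to a log
        Redirect(stream=Redirect.STDOUT, dest='alignments.sam'),
        Redirect(stream=Redirect.STDERR, dest='aligner.log')
    )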
39 changes: 32 additions & 7 deletions docs/changelog.rst
@@ -1,12 +1,29 @@
Changelog
=========

v0.1.1 (released Jan 2018)
--------------------------
* If the ``path=`` argument isn't provided to a ``Software`` instance, the path will attempt to populate from
``pipeline_config[software_name]['path']``
* Added ``subprogram=`` argument to ``Software``
* Made tab completer program silent if it ever fails because it doesn't exist
v0.1.8 (released 29 August 2018)
--------------------------------
* Moved to Parsl 0.6.0+ exclusively, now accepting only class-based Parsl configurations
* Integrated TinyDB for the framework state DB
* Tab autocompleter now automatically updates itself if needed
* Relaxed requirements on the name of the pipeline class, though ``Pipeline`` is still recommended
* Removed 'append' style redirects (``>>``) because Parsl doesn't support them at this time
* Stderr redirects are now properly respected on the lefthand side of a Pipe
* Supporting information added to run logs, such as using the run name in the filename, logging the original command
  used to start the run, etc.

v0.1.5 (released 7 June 2018)
-----------------------------
* Software execution itself can now be considered as input in the workflow graph
* Default dfk execution set down to 2 Python threads
* Fixed future state reporting
* Operon's temporary directory is now set to the same place as ``--logs-dir`` in hopes that this directory
will exist on worker nodes which may not have access to the same filesystem as the head node
* ``Data('')`` now returns an empty string
* Added a ``batch-run`` execution mode, which allows for multiple concurrent runs of a single pipeline, each
with different input and output, using the same resource pool
* Added support for multiple resource pools
* Added unit tests

v0.1.2 (released 21 Feb 2018)
-----------------------------
@@ -22,4 +39,12 @@ v0.1.2 (released 21 Feb 2018)
* Updated output for ``operon show``
* When a conda environment already exists, added ability to reinstall
* Added ``operon uninstall`` to remove pipelines
* Refactored cleanup so that it always runs, even if some programs fail during the run
* Refactored cleanup so that it always runs, even if some programs fail during the run

v0.1.1 (released Jan 2018)
--------------------------
* If the ``path=`` argument isn't provided to a ``Software`` instance, the path will attempt to populate from
``pipeline_config[software_name]['path']``
* Added ``subprogram=`` argument to ``Software``
* Made tab completer program silent if it ever fails because it doesn't exist

43 changes: 28 additions & 15 deletions docs/using_operon.rst
@@ -88,13 +88,15 @@ The set of accepted ``pipeline-options`` is defined by the pipeline itself and a
run to run, such as input files, metadata, etc. Four options will always exist:

* ``--pipeline-config`` can point to a pipeline config to use for this run only
* ``--parsl-config`` can point to a file containing JSON that represents a Parsl config to use for this run only
* ``--parsl-config`` can point to a Python file that represents a Parsl config to use for this run only (see below)
* ``--logs-dir`` can point to a location where log files from this run should be deposited; if it doesn't exist, it
will be created; defaults to the currect directory
will be created; defaults to the current directory
* ``--run-name`` gives a name to the run, which will be used in the log filename and helps differentiate this run from
other runs
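For example, a run might be started as follows (the pipeline name and paths are illustrative):

.. code-block:: text

    operon run my_pipeline \
        --pipeline-config /path/to/pipeline_config.json \
        --parsl-config /path/to/parsl_config.py \
        --logs-dir logs \
        --run-name sample_42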

When an Operon pipeline is run, under the hood it creates a Parsl workflow which can be run in many different ways
depending on the accompanying Parl configuration. This means that while the definition for a pipeline run with the
``run`` subprogram is consistent, that actual execution model may vary if the Parsl configuration varies.
When an Operon pipeline is run, under the hood it creates a Parsl workflow which can be executed in different ways
depending on the accompanying Parsl configuration. This means that while the definition for a pipeline run with the
``run`` subprogram is consistent, the actual execution model may vary if the Parsl configuration varies.

.. _parsl_configuration:

@@ -105,20 +107,31 @@ Parsl is the package that powers Operon and is responsible for Operon's power
Operon itself is only a front-end abstraction of a Parsl workflow; the actual execution model is fully
Parsl-specific and as such it's advised to check out the
`Parsl documentation <http://parsl.readthedocs.io/en/latest/>`_
to get a sense for how to design a Parls configuration for a specific need-case.
to get a sense for how to design a Parsl configuration for a specific use case.

The Parsl configuration must be specified in a Python file where the variable name ``config`` contains an object of
type ``parsl.config.Config``:

.. code-block:: python
from parsl.config import Config
config = Config(
executors=[...],
lazy_errors=True,
retries=10
)
The ``run`` subprogram attempts to pull a Parsl configuration from the user in the following order:

1. From the command line argument ``--parsl-config``
2. From the pipeline configuration key ``parsl_config``
3. From a platform default JSON file located at ``$OPERON_HOME/.operon/parsl_config.json``
4. A default parsl configuration provided by the pipeline
5. A package default parsl configuration of 8 workers using Python threads
1. A path from the command line argument ``--parsl-config``
2. A path from the pipeline configuration key ``parsl_config``
3. A package default Parsl configuration of 2 workers using Python threads

The Parsl configuration can contain multiple sites, each with different models of execution and different available
resources. If a multisite Parsl configuration is provided to Operon, it will try to match up the site names as best as
possible and execute software on appropriate sites. Any software which can't find a Parsl configuration site match will
run in a random site. The set of site names the pipeline expects is output as a part of ``operon show``.
The Parsl configuration can contain multiple executors, each with different models of execution and different available
resources. If a multiexecutor Parsl configuration is provided to Operon, it will try to match up the executor names as best as
possible and execute software on appropriate executors. Any software which can't find a Parsl configuration executor match will
run in a random executor. The set of executor names the pipeline expects is output as a part of ``operon show``.
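As a sketch, a minimal multiexecutor configuration built on local threads might look like the following, where the
executor labels are illustrative but would be expected to match those the pipeline declares via ``Meta.define_executor()``:

.. code-block:: python

    from parsl.config import Config
    from parsl.executors.threads import ThreadPoolExecutor

    # Two thread pools of different sizes; Operon matches software to
    # executors by label
    config = Config(
        executors=[
            ThreadPoolExecutor(label='small_site', max_threads=2),
            ThreadPoolExecutor(label='large_site', max_threads=8)
        ]
    )

Saved to a Python file, this configuration could be passed to a run with ``--parsl-config``.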

For more detailed information, refer to the
`Parsl documentation <http://parsl.readthedocs.io/en/latest/userguide/configuring.html>`_ on the subject.
3 changes: 2 additions & 1 deletion operon/__init__.py
@@ -1 +1,2 @@
__version__ = '0.1.5'
__version__ = '0.1.8'
COMPLETER_VERSION = 1
28 changes: 18 additions & 10 deletions operon/_cli/_completer.py
@@ -2,24 +2,20 @@
import os
import sys
import subprocess
import json
try:
from operon._cli import get_operon_subcommands
from operon._util.home import OperonState
from operon import COMPLETER_VERSION
except ImportError:
sys.exit()

COMPGEN = 'compgen -W "{options}" -- "{stub}"'
VERSION = 1
SEMANTIC_VERSION = '0.1.8'


def get_pipeline_options():
operon_home_root = os.environ.get('OPERON_HOME') or os.path.expanduser('~')
operon_state_json_path = os.path.join(operon_home_root, '.operon', 'operon_state.json')
try:
with open(operon_state_json_path) as operon_state_json:
operon_state = json.load(operon_state_json)
return ' '.join(operon_state['pipelines'].keys())
except OSError:
return ''
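# Query the TinyDB-backed state database for all installed pipeline records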
return ' '.join([_p['name'] for _p in OperonState().db.search(OperonState().query.type == 'pipeline_record')])


def get_completion_options(options, stub):
@@ -42,7 +38,7 @@ def completer():
completion_options = ''
if num_completed_tokens == 1:
completion_options = get_completion_options(
options=' '.join(get_operon_subcommands().replace('_', '-')),
options=' '.join([_sub.replace('_', '-') for _sub in get_operon_subcommands()]),
stub=stub_token
)
elif num_completed_tokens == 2:
@@ -57,4 +53,16 @@ def completer():


if __name__ == '__main__':
# Run version check, self-update if necessary
if COMPLETER_VERSION > VERSION:
try:
import inspect
from operon._cli import _completer
completer_path = os.path.abspath(__file__)
with open(completer_path, 'w') as operon_completer:
operon_completer.write(inspect.getsource(_completer))
os.chmod(completer_path, 0o755)
except:
pass

completer()
37 changes: 35 additions & 2 deletions operon/_cli/subcommands/__init__.py
@@ -1,9 +1,14 @@
import os
import inspect

from operon._util.home import get_operon_home, load_pipeline_file
from operon._util.errors import MalformedPipelineError
from operon.components import ParslPipeline

FLATTEN = 0
MODULE_NAME = 0
MODULE_INSTANCE = 1


class BaseSubcommand(object):
home_pipelines = os.path.join(get_operon_home(), 'pipelines')
@@ -23,16 +28,44 @@ def get_pipeline_instance(self, pipeline_name):
else:
return None

# Get all classes in the pipeline file
classes_in_pipeline_mod = [
c for c in
inspect.getmembers(pipeline_mod, inspect.isclass)
if '__operon.pipeline' in str(c[MODULE_INSTANCE])
]
# If there is only one class in the pipeline file, use that class
if len(classes_in_pipeline_mod) == 1:
pipeline_class = classes_in_pipeline_mod[FLATTEN][MODULE_INSTANCE]
# If there are multiple classes, attempt to find one called Pipeline
elif len(classes_in_pipeline_mod) > 1:
try:
pipeline_class = [
c[MODULE_INSTANCE]
for c in classes_in_pipeline_mod
if c[MODULE_NAME] == 'Pipeline'
][FLATTEN]
except:
# If no class named Pipeline exists, raise a MalformedPipelineError
raise MalformedPipelineError(
'Pipeline file contained multiple classes, none of '
'which were called \'Pipeline\'\n'
'Try the form:\n\n'
'\tclass Pipeline(ParslPipeline):\n')
else:
# If no classes are found, raise a MalformedPipelineError
raise MalformedPipelineError('Pipeline file has no classes')

# Return pipeline instance
try:
# Ensure Pipeline subclasses ParslPipeline
if not issubclass(pipeline_mod.Pipeline, ParslPipeline):
if not issubclass(pipeline_class, ParslPipeline):
raise MalformedPipelineError(
'Pipeline class does not subclass ParslPipeline\n'
'Try the form:\n\n'
'\tclass Pipeline(ParslPipeline):\n'
)
return pipeline_mod.Pipeline()
return pipeline_class()
except AttributeError:
# Ensure the pipeline file contains a class called Pipeline
raise MalformedPipelineError(
