# Introduction to pyflow

Pyflow is a high-level language to describe suites. We aim to build object-oriented suites that are designed and maintained as software.

Pyflow acts as both a compiler and a library for that language, generating ecflow suites. Internally it wraps the ecflow Python library.

* Provides a higher-level API
* Introduces common idioms and provides helper functionality
* Encourages the use of certain working practices
* Wraps ecflow classes (for example, `pyflow.Family` wraps `ecflow.Family`)

An ecflow suite is made up of four elements:

* A visual and executable structure - the definitions
* A set of scripts
* Any additional deployable resources (e.g. list of MARS requests)
* A set of machine configurations on which to run scripts

Pyflow provides interfaces to define and control any or all of these components.

***This course assumes a working knowledge of ecflow, and will not attempt to introduce ecflow concepts.***

# Preconfiguration of notebook

To make use of this course you will need:

 1. An environment containing Python, ecflow and pyflow. 
 2. A local ecflow server, either one started with the `ecflow_start.sh` script or an existing server. Update the `server_host` and `server_port` values below to match your server.

In [18]:
import datetime
import sys
import os
import pyflow as pf
scratchdir = os.path.join(os.path.abspath(''), 'scratch')
filesdir = os.path.join(scratchdir, 'files')
outdir = os.path.join(scratchdir, 'out')

# exist_ok=True makes a separate existence check unnecessary
os.makedirs(outdir, exist_ok=True)

server_host = 'localhost'
server_port = 2001

# Trivial Starting Example

To get started, consider a very simple suite:

In [7]:
with pf.Suite('trivial',
              host=pf.LocalHost('localhost'),
              files=os.path.join(filesdir, 'trivial'),
              home=outdir,
              defstatus=pf.state.suspended) as s:
    pf.Label('flag', '')
    t1 = pf.Task('t1', script='echo "I am on $(hostname) : $ECF_HOST"')
    t2 = pf.Task('t2', script='ecflow_client --alter=change label flag "I am set" /trivial')
    t1 >> t2

You can check that ecflow is happy with everything, and see the definitions that will be generated.

In [8]:
s.check_definition()
print(s)

suite trivial
  defstatus suspended
  edit ECF_FILES '/Users/macw/git/pyflow/tutorials/course/scratch/files/trivial'
  edit ECF_HOME '/Users/macw/git/pyflow/tutorials/course/scratch/out'
  edit ECF_JOB_CMD 'bash -c 'export ECF_PORT=%ECF_PORT%; export ECF_HOST=%ECF_HOST%; export ECF_NAME=%ECF_NAME%; export ECF_PASS=%ECF_PASS%; export ECF_TRYNO=%ECF_TRYNO%; export PATH=/Users/macw/opt/miniconda3/envs/pyflow_test/bin:$PATH; ecflow_client --init="$$" && %ECF_JOB% && ecflow_client --complete || ecflow_client --abort ' 1> %ECF_JOBOUT% 2>&1 &'
  edit ECF_KILL_CMD 'pkill -15 -P %ECF_RID%'
  edit ECF_STATUS_CMD 'true'
  edit ECF_OUT '%ECF_HOME%'
  label exec_host "localhost"
  label flag ""
  task t1
  task t2
    trigger t1 eq complete
endsuite



Deploying the suite is a two-stage process, in which we generate any script files and then play the suite. These two processes are entirely independent.

To use a software analogy, the generation stage acts as a compilation phase and the deployment as an installation phase.

In [9]:
s.deploy_suite()
s.replace_on_server(server_host, server_port)

Overwriting existing file: /Users/macw/git/pyflow/tutorials/course/scratch/files/trivial/t1.ecf
Save /Users/macw/git/pyflow/tutorials/course/scratch/files/trivial/t1.ecf
Overwriting existing file: /Users/macw/git/pyflow/tutorials/course/scratch/files/trivial/t2.ecf
Save /Users/macw/git/pyflow/tutorials/course/scratch/files/trivial/t2.ecf


# Idiomatic pyflow suites

We aim to build object-oriented suites, which compile to ecflow output.

Ecflow suites involve the construction of three structures:

 1. A graphical tree, visible to the user of the suite.
 2. A Directed Graph for execution (not necessarily a DAG, as it may contain cycles).
 3. An on-disk layout of scripts.
 
The on-disk layout for scripts is constrained by ecflow and is discussed in the section on Scripts.

## Structure of Suites

### Suite Structural Layout

We encourage pyflow users to use the python `with` statement to build the structure of the suites following the graphical ecflow tree.

Dependencies are then added to form the Directed Graph for execution.

The example below creates an initial simple suite with interdependent tasks. In software terms it is essentially an example of *procedural programming*.

In [25]:
with pf.Suite('first_suite') as s:
    
    with pf.Family('family1') as f1:
        t1 = pf.Task('t1')
        with pf.Task('t2') as t2:
            pf.Variable('FOO', 'bar')
            
        t1 >> t2
        
    with pf.Family('family2') as f2:
        t1 = pf.Task('t1')
        t2 = pf.Task('t2')
        t1 >> t2
        
    f1 >> f2
    
print(s)

suite first_suite
  edit ECF_JOB_CMD 'bash -c 'export ECF_PORT=%ECF_PORT%; export ECF_HOST=%ECF_HOST%; export ECF_NAME=%ECF_NAME%; export ECF_PASS=%ECF_PASS%; export ECF_TRYNO=%ECF_TRYNO%; export PATH=/Users/macw/opt/miniconda3/envs/pyflow_test/bin:$PATH; ecflow_client --init="$$" && %ECF_JOB% && ecflow_client --complete || ecflow_client --abort ' 1> %ECF_JOBOUT% 2>&1 &'
  edit ECF_KILL_CMD 'pkill -15 -P %ECF_RID%'
  edit ECF_STATUS_CMD 'true'
  edit ECF_OUT '%ECF_HOME%'
  label exec_host "default"
  family family1
    task t1
    task t2
      trigger t1 eq complete
      edit FOO 'bar'
  endfamily
  family family2
    trigger family1 eq complete
    task t1
    task t2
      trigger t1 eq complete
  endfamily
endsuite
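
To make the mechanics of this pattern concrete, the following is a minimal, pyflow-free sketch of how a `with` block and the `>>` operator can build a tree together with a set of triggers. The `Node` class and its `dump` method are hypothetical and exist only for illustration; pyflow's own implementation may well differ.

```python
class Node:
    """Toy stand-in for a suite/family/task node (not a pyflow class)."""

    _stack = []  # nodes currently open in a `with` block, innermost last

    def __init__(self, name):
        self.name = name
        self.children = []
        self.triggers = []  # names of nodes that must complete first
        # Attach to whichever node is currently open, if any.
        if Node._stack:
            Node._stack[-1].children.append(self)

    def __enter__(self):
        Node._stack.append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        Node._stack.pop()
        return False  # never swallow exceptions

    def __rshift__(self, other):
        # `a >> b` records that b waits for a, and returns b so chains work.
        other.triggers.append(self.name)
        return other

    def dump(self, indent=0):
        """Return the subtree as a list of indented text lines."""
        lines = [" " * indent + self.name]
        for trigger in self.triggers:
            lines.append(" " * (indent + 2) + "trigger {} eq complete".format(trigger))
        for child in self.children:
            lines.extend(child.dump(indent + 2))
        return lines


with Node("first_suite") as suite:
    with Node("family1") as f1:
        t1 = Node("t1")
        t2 = Node("t2")
        t1 >> t2
    with Node("family2") as f2:
        Node("t1") >> Node("t2")
    f1 >> f2

print("\n".join(suite.dump()))
```

The nesting of the `with` blocks determines the visual tree, while `>>` only records dependencies. This keeps the two structures separate even though they are declared in the same place.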



## Object Oriented Suites

Whilst procedural programming can be used to build simple suites, to manage long-term lifecycles of complex suites we encourage drawing inspiration from object-oriented software development.

Suites can be split into objects that are derived from pyflow components. Suites can then be assembled from those configurable and reusable objects.

### Deriving From Task

Probably the most important pyflow class to subclass is `pf.Task`. This object describes what should be carried out as one executable unit.

Consider this ***non-object-oriented*** task definition built within a Family:

In [26]:
with pf.Family('f') as f:
    
    variables = {
        'HALF': 7,
        'LIMIT': 2*7
    }
        
    labels = {
        'a_label': 'with a value'
    }
    
    t = pf.Task('my_task', labels=labels, defstatus=pf.state.suspended, **variables)
    
    # Note that t is incomplete at this point...
    t.script = [
        'echo "This is a counting task ..."',
        'for i in $(seq 1 $HALF); do echo "count $i/$LIMIT"; done',
        'i=$((HALF+1)); while [ $i -le $LIMIT ]; do echo "count $i/$LIMIT" ; i=$((i+1)); done'
    ]
        
print(f)

  family f
    task my_task
      defstatus suspended
      edit HALF '7'
      edit LIMIT '14'
      label a_label "with a value"
  endfamily



As a suite grows, and the number of tasks increases, the complexity of managing all of these components becomes prohibitive.

We wish to *encapsulate* all of the functionality related to this task into a single object. As we want to reuse functionality we organise objects into classes. These classes should be appropriately configurable.

As the number of tasks increases, we can re-use the class to create objects with similar behaviour. This in turn will dramatically reduce the complexity of the Families and then of the Suites.

The above task should now be defined as a reusable class, as below.

In [27]:
class MyTask(pf.Task):
    
    """Counts to the double of a number, first half using a for loop then a while loop"""
    
    def __init__(self, name, default_value=0, **kwargs):
        
        variables = {
            'HALF': default_value,
            'LIMIT': 2*default_value,
        }
        variables.update(**kwargs)
        
        labels = {
            'counter_label': 'count to {}'.format(2*default_value)
        }
        
        script = [
            'echo "This is a counting task named {}"'.format(name),
            'for i in $(seq 1 $HALF); do echo "count $i/$LIMIT"; done',
            'i=$((HALF+1)); while [ $i -le $LIMIT ]; do echo "count $i/$LIMIT" ; i=$((i+1)); done'
        ]
        
        super().__init__(name,
                         script=script,
                         labels=labels,
                         **variables)

with pf.Suite('CountingSuite', files=os.path.join(filesdir, 'CountingSuite')) as s:
    with pf.Family('F') as f:
        MyTask('Seven', 7, defstatus=pf.state.suspended)
        MyTask('Five', 5)
    
print(s)
s.deploy_suite()
s.replace_on_server(server_host, server_port)

suite CountingSuite
  edit ECF_FILES '/Users/macw/git/pyflow/tutorials/course/scratch/files/CountingSuite'
  edit ECF_JOB_CMD 'bash -c 'export ECF_PORT=%ECF_PORT%; export ECF_HOST=%ECF_HOST%; export ECF_NAME=%ECF_NAME%; export ECF_PASS=%ECF_PASS%; export ECF_TRYNO=%ECF_TRYNO%; export PATH=/Users/macw/opt/miniconda3/envs/pyflow_test/bin:$PATH; ecflow_client --init="$$" && %ECF_JOB% && ecflow_client --complete || ecflow_client --abort ' 1> %ECF_JOBOUT% 2>&1 &'
  edit ECF_KILL_CMD 'pkill -15 -P %ECF_RID%'
  edit ECF_STATUS_CMD 'true'
  edit ECF_OUT '%ECF_HOME%'
  label exec_host "default"
  family F
    task Seven
      defstatus suspended
      edit HALF '7'
      edit LIMIT '14'
      label counter_label "count to 14"
    task Five
      edit HALF '5'
      edit LIMIT '10'
      label counter_label "count to 10"
  endfamily
endsuite

Overwriting existing file: /Users/macw/git/pyflow/tutorials/course/scratch/files/CountingSuite/Seven.ecf
Save /Users/macw/git/pyflow/tutorials/course/scrat

### Deriving from Family and other pyflow objects

The same process can be used for deriving from Families or other pyflow-related classes. In this manner we can build up configurable functionality piece by piece.

Note how the family takes an input parameter, `counters`, to control how many tasks it generates internally.

In [28]:
class MyFamily(pf.Family):
    
    def __init__(self, name, counters, **kwargs):
        
        labels = {
            'total_counters': counters
        }
        
        super().__init__(name, labels=labels, **kwargs)
        
        with self:
            pf.sequence(MyTask('{}_{}'.format(name,i), i) for i in range(counters))
         
with pf.Suite('CountingSuite', files=os.path.join(filesdir, 'CountingSuite')) as s:
    print(MyFamily('TaskCounter', 7))

s.deploy_suite()
s.replace_on_server(server_host, server_port)

  family TaskCounter
    label total_counters "7"
    task TaskCounter_0
      edit HALF '0'
      edit LIMIT '0'
      label counter_label "count to 0"
    task TaskCounter_1
      trigger TaskCounter_0 eq complete
      edit HALF '1'
      edit LIMIT '2'
      label counter_label "count to 2"
    task TaskCounter_2
      trigger TaskCounter_1 eq complete
      edit HALF '2'
      edit LIMIT '4'
      label counter_label "count to 4"
    task TaskCounter_3
      trigger TaskCounter_2 eq complete
      edit HALF '3'
      edit LIMIT '6'
      label counter_label "count to 6"
    task TaskCounter_4
      trigger TaskCounter_3 eq complete
      edit HALF '4'
      edit LIMIT '8'
      label counter_label "count to 8"
    task TaskCounter_5
      trigger TaskCounter_4 eq complete
      edit HALF '5'
      edit LIMIT '10'
      label counter_label "count to 10"
    task TaskCounter_6
      trigger TaskCounter_5 eq complete
      edit HALF '6'
      edit LIMIT '12'
      label counter_label

### Composing Suites from Reusable Components

All objects in the suite can be constructed and configured. It is worth noting that the derived class can be used within python `with` statements in the same way as the base classes. This allows us to set some values or defaults without *forcing* us to build the entire suite inside the constructor of a derived type.

Here we define a CourseSuite class which we will use throughout this notebook to facilitate working with the notebook environment and the local ecflow server.

In [29]:
class CourseSuite(pf.Suite):
    """
    This CourseSuite object will be used throughout the course to provide sensible
    defaults without verbosity
    """
    def __init__(self, name, **kwargs):
        
        config = {
            'host': pf.LocalHost('localhost'),
            'files': os.path.join(filesdir, name),
            'home': outdir,
            'defstatus': pf.state.suspended
        }
        config.update(kwargs)
        
        super().__init__(name, **config)

         
with CourseSuite('configurable_suite') as s:
    MyFamily('fam1', 3)
    MyFamily('fam2', 5)
    
print(s)
s.deploy_suite()
s.replace_on_server(server_host, server_port)

suite configurable_suite
  defstatus suspended
  edit ECF_FILES '/Users/macw/git/pyflow/tutorials/course/scratch/files/configurable_suite'
  edit ECF_HOME '/Users/macw/git/pyflow/tutorials/course/scratch/out'
  edit ECF_JOB_CMD 'bash -c 'export ECF_PORT=%ECF_PORT%; export ECF_HOST=%ECF_HOST%; export ECF_NAME=%ECF_NAME%; export ECF_PASS=%ECF_PASS%; export ECF_TRYNO=%ECF_TRYNO%; export PATH=/Users/macw/opt/miniconda3/envs/pyflow_test/bin:$PATH; ecflow_client --init="$$" && %ECF_JOB% && ecflow_client --complete || ecflow_client --abort ' 1> %ECF_JOBOUT% 2>&1 &'
  edit ECF_KILL_CMD 'pkill -15 -P %ECF_RID%'
  edit ECF_STATUS_CMD 'true'
  edit ECF_OUT '%ECF_HOME%'
  label exec_host "localhost"
  family fam1
    label total_counters "3"
    task fam1_0
      edit HALF '0'
      edit LIMIT '0'
      label counter_label "count to 0"
    task fam1_1
      trigger fam1_0 eq complete
      edit HALF '1'
      edit LIMIT '2'
      label counter_label "count to 2"
    task fam1_2
      trigger fam1_

Pyflow aims to provide a library of commonly used, abstract functionality. Suite developers should in turn build up their own collection of classes for functionality specific to their needs, from which suites can be assembled out of relevant objects.

### Utility Tasks for this Notebook

We define here some useful Tasks that we use throughout this course, to build object-oriented suites. The details of how they work are covered later.

In [30]:
class LabelSetter(pf.Task):
    
    def __init__(self, *args, **kwargs):
        """
        Accepts a sequence of label-value tuples
        """
        script = [
            pf.TemplateScript(
                'ecflow_client --alter=change label {{ LABEL.name }} "{{ VALUE }}" {{ LABEL.parent.fullname }}',
                LABEL=label, VALUE=value
            ) for label, value in args
        ]
        
        name = kwargs.pop('name', 'set_labels')
        super().__init__(name, script=script, **kwargs)
        
        
class WaitSeconds(pf.Task):
    def __init__(self, seconds, **kwargs):
        name = kwargs.pop('name', 'wait_{}'.format(seconds))
        super().__init__(name, script='sleep {}'.format(seconds), **kwargs)

### Configurability of Suites, Families, Tasks, Hosts ...

We build such a library of classes and objects so that we can reuse these components (Tasks, Families, Suites) in different contexts. A given task class could be used in a research workflow and then reused in an operational workflow.

However, different contexts may require some differences in suite execution. To keep the suite concise, maintainable and easily checkable, we should cater for those differences in a single entity where possible, rather than spreading them throughout the suite.

To that end, we introduce a _configuration object_ that handles the differences, and that interacts with and configures our objects for each context.

This results in suites that are _configurable_ for different use cases and contexts, and that build fundamentally different generated suites from the same components.

A configuration object can be constructed manually for different use cases or as a result of parsing configuration files. It can be used to:

 * Provide constants and data for specific cases that will be needed in the suites.
 * Switch functionality on or off, or modify it.
 * Configure the hosts on which tasks run.
 * Specify the locations and details of the data to process.
 
But most importantly, as objects, these configuration objects can be programmable in themselves (can include code). The suite components can delegate part of the suite definition to these _configurators_ and as such the structure of the suite can be determined by logic in the configuration object if necessary.
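
As a rough illustration of a configuration object that is parsed from a file and also carries logic, consider the sketch below. It uses only the standard library, and JSON rather than any other file format, to stay self-contained. The `SuiteConfig` class, its fields, the host name `hpc` and the `load_config` function are hypothetical names chosen for this sketch and are not part of pyflow.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class SuiteConfig:
    """Hypothetical configuration object; field names are illustrative only."""
    name: str
    host: str = "localhost"
    unit_count: int = 0
    run_integration: bool = False

    def enabled_stages(self):
        # The decision logic lives in the configuration object, so the suite
        # code can simply iterate over whatever stages it is given.
        stages = ["common"]
        if self.unit_count > 0:
            stages.append("unit")
        if self.run_integration:
            stages.append("integration")
        stages.append("cleanup")
        return stages


def load_config(text):
    """Build a SuiteConfig from JSON text, rejecting unknown keys early."""
    data = json.loads(text)
    known = {f.name for f in fields(SuiteConfig)}
    unknown = set(data) - known
    if unknown:
        raise ValueError("unknown configuration keys: {}".format(sorted(unknown)))
    return SuiteConfig(**data)


dev = load_config('{"name": "dev", "unit_count": 25}')
prod = load_config('{"name": "prod", "host": "hpc", "run_integration": true}')
print(dev.name, dev.enabled_stages())
print(prod.name, prod.enabled_stages())
```

Rejecting unknown keys at load time is a design choice of this sketch: it surfaces typos in configuration files before any suite is generated.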

***Delegation is preferred over conditional 'if' statements in the suite depending on configuration values***

In [31]:
class BaseConfig:
    """This is a very contrived example showing delegation of behaviour to configuration"""
    
    def __init__(self, name, common_count=3, unit_count=4, integration_count=5):
        self.name = name
        self.common_count = common_count
        self.unit_count = unit_count
        self.integration_count = integration_count
    
    def build_unit_tests(self):
        pass
    
    def build_integration_tests(self):
        pass    
    
    
class ProductionConfig(BaseConfig): 
    def build_integration_tests(self):
        with pf.Family('integration') as f:
            pf.sequence(MyTask('integration_{}'.format(i), 123*i) for i in range(self.integration_count))
        return f
            
    
class DevConfig(BaseConfig):
    def build_unit_tests(self):
        with pf.Family('unit') as f:
            pf.sequence(MyTask('unit_{}'.format(i), 123*i) for i in range(self.unit_count))
        return f

We can now build a common testing family that behaves (structurally) differently according to the configuration supplied.

In [32]:
class ConfiguredFamily(pf.Family):
    def __init__(self, config):
        super().__init__(config.name)
        
        with self:

            # the static part of the suite, common to all suites of this type
            
            with pf.Family('common') as common:
                pf.sequence(MyTask('common_{}'.format(i), 123*i) for i in range(5))

            # the dynamic part of the suite, with hooks for the variability

            test_families = [
                config.build_unit_tests(),
                config.build_integration_tests()
            ]

            # another static part of the suite

            with pf.Family('cleanup') as cleanup:
                MyTask('cleaner')
            
            # establish dependencies
            
            common >> cleanup
            for f in test_families:
                if f is not None:
                    common >> f >> cleanup


In [33]:
with CourseSuite('configuration_example') as s:
    
    ConfiguredFamily(ProductionConfig('prod', integration_count=3))
    
    ConfiguredFamily(DevConfig('dev', unit_count=25))

print(s.prod)
print('\n------------------------------------\n')
print(s.dev)
s.deploy_suite()
s.replace_on_server(server_host, server_port)

  family prod
    family common
      task common_0
        edit HALF '0'
        edit LIMIT '0'
        label counter_label "count to 0"
      task common_1
        trigger common_0 eq complete
        edit HALF '123'
        edit LIMIT '246'
        label counter_label "count to 246"
      task common_2
        trigger common_1 eq complete
        edit HALF '246'
        edit LIMIT '492'
        label counter_label "count to 492"
      task common_3
        trigger common_2 eq complete
        edit HALF '369'
        edit LIMIT '738'
        label counter_label "count to 738"
      task common_4
        trigger common_3 eq complete
        edit HALF '492'
        edit LIMIT '984'
        label counter_label "count to 984"
    endfamily
    family integration
      trigger common eq complete
      task integration_0
        edit HALF '0'
        edit LIMIT '0'
        label counter_label "count to 0"
      task integration_1
        trigger integration_0 eq complete
        edit HALF 

### Pyflow Families and AnchorFamilies

The Family class provides the fundamental visual block of pyflow. Families play two distinct roles within suites:

 1. Visually grouping related families/tasks
 2. Logically grouping related families/tasks from an execution perspective
 
Due to constraints imposed by the order in which ecflow searches for scripts within the configured `files` location, by default ***all*** tasks with the same name must share the same script located in the `files` directory (if scripts are deployed by pyflow, they will be deployed to this directory). This means that tasks with the same name must either be avoided, or written to have identical scripts, and is a significant constraint on encapsulation in object-oriented suite design.

For simple aggregation of tasks, we encourage using `pf.Family` or deriving from it.
This provides minimal encapsulation of tasks, but not of scripts.
All tasks with the same name will share the same script.

In [34]:
with CourseSuite('family_example') as s:
    with pf.Family('simple', labels={'example': ''}) as f:
        LabelSetter((f.example, 'example text'))
    MyFamily('derived_family', 5)
print(s)

suite family_example
  defstatus suspended
  edit ECF_FILES '/Users/macw/git/pyflow/tutorials/course/scratch/files/family_example'
  edit ECF_HOME '/Users/macw/git/pyflow/tutorials/course/scratch/out'
  edit ECF_JOB_CMD 'bash -c 'export ECF_PORT=%ECF_PORT%; export ECF_HOST=%ECF_HOST%; export ECF_NAME=%ECF_NAME%; export ECF_PASS=%ECF_PASS%; export ECF_TRYNO=%ECF_TRYNO%; export PATH=/Users/macw/opt/miniconda3/envs/pyflow_test/bin:$PATH; ecflow_client --init="$$" && %ECF_JOB% && ecflow_client --complete || ecflow_client --abort ' 1> %ECF_JOBOUT% 2>&1 &'
  edit ECF_KILL_CMD 'pkill -15 -P %ECF_RID%'
  edit ECF_STATUS_CMD 'true'
  edit ECF_OUT '%ECF_HOME%'
  label exec_host "localhost"
  family simple
    label example ""
    task set_labels
  endfamily
  family derived_family
    label total_counters "5"
    task derived_family_0
      edit HALF '0'
      edit LIMIT '0'
      label counter_label "count to 0"
    task derived_family_1
      trigger derived_family_0 eq complete
      edit HAL

For more complex functionality containing groups of tasks that require encapsulation we encourage the use of `AnchorFamily`.

The `AnchorFamily` class updates the `files` location according to the relative path of the family from the suite (or previous `AnchorFamily`). Within an `AnchorFamily`, all script lookups are relative to this new location, providing isolation and encapsulation.

All tasks with the same name *within an `AnchorFamily`* **must share the same script** located in the `files` location *for that `AnchorFamily`*.

As such it is encouraged to:
 * Use `AnchorFamily` to encapsulate independent units within a suite. Typically these are the subtrees that make sense to deploy as a whole.
 * Use `Family` to aggregate tasks that could share scripts with each other. This can be within an `AnchorFamily`.
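
The difference between `Family` and `AnchorFamily` described above can be modelled in a few lines. This is a toy model of the rule as stated in this section, not ecflow's actual script search nor pyflow's implementation; the `script_location` function and the `/suite/files` path are hypothetical.

```python
import posixpath


def script_location(files, ancestors, task_name):
    """Return where a task's script is expected under the rule described above.

    `ancestors` lists the enclosing families from the suite downwards as
    (name, is_anchor) pairs. Each anchor moves the lookup root to its own
    path relative to the suite; plain families leave the root unchanged.
    """
    root = files
    path_from_suite = []
    for name, is_anchor in ancestors:
        path_from_suite.append(name)
        if is_anchor:
            root = posixpath.join(files, *path_from_suite)
    return posixpath.join(root, task_name + ".ecf")


files = "/suite/files"

# Plain families: tasks with the same name resolve to the same script.
print(script_location(files, [("f1", False)], "test1"))
print(script_location(files, [("f2", False)], "test1"))

# Inside an anchor the lookup is scoped to the anchor's directory.
print(script_location(files, [("f", True), ("f1", False)], "test1"))
```

In this model two tasks collide exactly when they share a name and the same innermost anchor, which is the constraint stated above.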

The following example shows a suite with identical task names using different scripts, by scoping them with the `AnchorFamily`. These scripts are located and managed externally to the suite (in the `course` folder).

In [35]:
with CourseSuite('anchor_families', files=os.path.join(os.path.abspath(''), 'anchor_families')) as s:
    with pf.Family('f1'):
        pf.Task('test1')        # Script <files>/test1.ecf
    with pf.Family('f2'):
        pf.Task('test1')        # Script <files>/test1.ecf
    with pf.AnchorFamily('f'):
        with pf.Family('f1'):
            pf.Task('test1')    # Script <files>/f/test1.ecf
            pf.Task('test2')    # Script <files>/f/test2.ecf
        with pf.Family('f2'):
            pf.Task('test2')    # Script <files>/f/test2.ecf
            
s.replace_on_server(server_host, server_port)

This supports two ways of attaching scripts to identical `Tasks` with different parameters:

 * Generate one script per task containing the parameters
 * Use one script that is parameterised by the `Variables` on the `Families` and `Tasks`

### Layout of suites in repository

Object-oriented suites imply a certain breakdown of suites into components. The classes that implement these components should be placed in separate files within a repository.

Subcomponents of suites should be placed in their own Python submodules.

We encourage maintaining configurations independently of the suite structure.

As an example, we have (part of) the filesystem layout for the MARS testing suites:

```
repo/
 ├─ configuration/
 │   ├─ marsdev.yaml
 │   ├─ fdb-server-dev.yaml
 │   └─ ...
 │  
 ├─ server/
 │   ├─ deployment/
 │   │   ├─ scripts/
 │   │   │   ├─ configure.sh
 │   │   │   ├─ build.sh
 │   │   │   └─ ...
 │   │   ├─ __init__.py
 │   │   ├─ deployment_family.py
 │   │   ├─ config.py
 │   │   └─ ...
 │   │  
 │   ├─ tests/
 │   │   ├─ fdb-tools/
 │   │   │   ├─ scripts/
 │   │   │   │   ├─ write/
 │   │   │   │   │   ├─ simple.sh
 │   │   │   │   │   ├─ masking.sh
 │   │   │   │   │   └─ ...
 │   │   │   │   └─ ...
 │   │   │   │
 │   │   │   ├─ __init__.py
 │   │   │   ├─ tools_family.py
 │   │   │   ├─ fdb_write.py
 │   │   │   ├─ fdb_wipe.py
 │   │   │   └─ ...
 │   │   └─ ... 
 │   │
 │   ├─ __init__.py
 │   ├─ server_family.py
 │   └─ ...
 │   
 ├─ client/
 │   └─ ...
 │ 
 ├─ mars_flow.py
 └─ ...
```

# Worked Example: Building a Configurable, Object-Oriented Suite

In this section we construct a component of a test suite that obtains testing data from MARS, performs some "test" on it, and then cleans up after itself. This demonstrates a number of characteristics of object-oriented suite design:

 1. Functionality that is configurable based on a data description.
 2. Functionality that is encapsulated in reusable subcomponents.
 3. Delegation or inheritance to fine-tune behaviour within an existing framework.

Firstly we create a helper class that can understand MARS requests, and output them in a useful format.

In [36]:
class MarsRequest:
    
    separator = ",\n    "
    
    def __init__(self, verb, request_dict):
        self._verb = verb
        self._request_dict = request_dict
        
    def __str__(self):
        return (
            self._verb +
            self.separator +            
            self.separator.join("{}={}".format(k, self._resolve(v)) for k, v in self._request_dict.items())
        )
        
    @staticmethod
    def _resolve(v):
        '''Convert values into something understood by MARS'''
        if isinstance(v, bool):
            return "on" if v else "off"
        if isinstance(v, list):
            return '/'.join(MarsRequest._resolve(vv) for vv in v)
        if isinstance(v, str) and ('/' in v or '$' in v):
            return '"{}"'.format(v)
        return str(v)        
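
To see what kind of text this helper produces, the sketch below restates the same formatting rules as two plain functions so that it can run on its own. The function names `resolve` and `format_request` and the `$SCRATCH` variable are made up for this illustration; the request keys are taken from the examples in this notebook.

```python
def resolve(value):
    """Apply the same conversion rules as MarsRequest._resolve."""
    if isinstance(value, bool):
        return "on" if value else "off"
    if isinstance(value, list):
        # Lists become slash-separated values.
        return "/".join(resolve(v) for v in value)
    if isinstance(value, str) and ("/" in value or "$" in value):
        # Strings containing '/' or '$' are wrapped in double quotes.
        return '"{}"'.format(value)
    return str(value)


def format_request(verb, request_dict, separator=",\n    "):
    """Join the verb and key=value pairs, one per indented line."""
    pairs = ("{}={}".format(k, resolve(v)) for k, v in request_dict.items())
    return verb + separator + separator.join(pairs)


text = format_request("retrieve", {
    "class": "od",
    "time": [0, 12],
    "step": 0,
    "target": "$SCRATCH/retrieved.grib",
})
print(text)
```

Note that the `bool` check has to come before any generic conversion, because `bool` is a subclass of `int` in Python and `str(True)` would otherwise yield `True` rather than `on`.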

These requests are useful in the context of a MarsTask. This makes use of the MarsRequest object defined above to do something in the current working directory. It also creates a label for monitoring in ecflow and a timers file for diagnostics according to the environment variables understood by MARS.

In [37]:
mars_task_script = """
req=$(mktemp req.XXXX)
cat > $req <<@
{{ REQUEST }}
@
mars $req
rm $req
"""


class MarsTask(pf.Task):
    
    verb = None
    
    def __init__(self, request_dict, **kwargs):
        
        # Construct a MarsRequest object from the dictionary supplied
        assert self.verb is not None
        request = MarsRequest(self.verb, request_dict)
        
        # pop (not get) so that 'name' is not passed to the base class twice
        name = kwargs.pop('name', "{}_data".format(self.verb))
        
        super().__init__(name,
                         labels={'info': ''},
                         script=pf.TemplateScript(mars_task_script, REQUEST=request),
                         **kwargs)
        
        self.script.define_environment_variable('MARS_ECFLOW_LABEL', self.info)
        self.script.define_environment_variable('MARS_TIMERS_FILE', "{}.timers".format(name))
        
        
class ArchiveTask(MarsTask):
    verb = 'archive'
    
class RetrieveTask(MarsTask):
    verb = 'retrieve'

There are two major object-oriented approaches to building encapsulated components: inheritance and delegation.

## Suite Objects using Inheritance

In this first example we use *inheritance*, although this is a fairly arbitrary choice; which approach is preferable depends very much on context. We also avoid using the `ArchiveTask` defined above, so that we do not have to put lots of safety-related code into these examples.

We wish to define a standard test pattern. This will:

 1. Create a temporary (scratch) directory within the scratch space configured for the given host.
 2. Retrieve testing data, which is specified by the derived class.
 3. Run a test, which is defined by the derived class.
 4. Clean up after itself.

In [38]:
class Cleanup(pf.Task):
    def __init__(self, path, name='cleanup', **kwargs):
        assert path != "/"
        super().__init__(name, script='rm -rf "{}"'.format(path), **kwargs)
      
    
class TestBase(pf.AnchorFamily):
    
    """This class is an interface"""
    
    def __init__(self, name, **kwargs):
        super().__init__(name, **kwargs)
        
        # Generate a unique working directory
        self._workdir = os.path.join(scratchdir,
                                     self.suite.name, self.fullname.replace('/', '_'))
        
        # Ensure that the data gets put somewhere
        self._data_filename = 'retrieved.grib'
        request = self.request_dict().copy()
        request['target'] = self._data_filename
        
        with self:
            (
                RetrieveTask(request, workdir=self._workdir)
                >>
                self.build_test()
                >>
                Cleanup(self._workdir)
            )
            
    def request_dict(self):
        raise NotImplementedError("abstract base method")
        
    def build_test(self):
        raise NotImplementedError("abstract base method")

Classes should be derived from this abstract base test class, implementing the `request_dict` and `build_test` methods. These derived classes can be further derived from, or set up according to configuration passed in from outside.
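
The same "fixed skeleton with hooks" idea can be sketched without pyflow using the standard `abc` module, which refuses to instantiate a class whose hooks are missing rather than failing later with `NotImplementedError`. The class names `StepPattern` and `ListingPattern` are hypothetical, and whether `abc.ABC` can be combined directly with `pf.AnchorFamily` has not been checked here, so the sketch uses plain classes.

```python
from abc import ABC, abstractmethod


class StepPattern(ABC):
    """Fixed retrieve -> test -> cleanup skeleton with two hooks."""

    data_filename = "retrieved.grib"

    def __init__(self, name):
        self.name = name
        # Copy the request so that adding the target does not mutate
        # the dictionary returned by the hook.
        request = dict(self.request_dict())
        request["target"] = self.data_filename
        self.steps = [
            ("retrieve", request),
            ("test", self.build_test()),
            ("cleanup", None),
        ]

    @abstractmethod
    def request_dict(self):
        """Return the data request as a dictionary."""

    @abstractmethod
    def build_test(self):
        """Return the command that tests the retrieved data."""


class ListingPattern(StepPattern):
    def request_dict(self):
        return {"param": "t", "step": 0}

    def build_test(self):
        return "ls -l {}".format(self.data_filename)


try:
    StepPattern("incomplete")
except TypeError as err:
    print("cannot instantiate the base class:", err)

pattern = ListingPattern("ls")
for step in pattern.steps:
    print(step)
```

The base class owns the order of the steps, while each derived class only supplies the parts that vary.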

In [39]:
class GribLsTest(TestBase):
    def __init__(self, date, param, **kwargs):
        self._date = date
        self._param = param
        name = kwargs.pop('name', 'grib_ls')
        super().__init__(name, **kwargs)
        
    def request_dict(self):
        return {
            'class': 'od',
            'expver': '0001',
            'stream': 'oper',
            'date': self._date,
            'time': [0, 12],
            'step': 0,
            'type': 'an',
            'levtype': 'ml',
            'levelist': 1,
            'param': self._param,
        }
    
    def build_test(self):
        return pf.Task('grib_ls', workdir=self._workdir, script='grib_ls -m {}'.format(self._data_filename))

In [40]:
class LsTest(TestBase):
    def __init__(self, **kwargs):
        super().__init__('ls', **kwargs)
        
    def request_dict(self):
        return {
            'class': 'od',
            'expver': '0001',
            'stream': 'oper',
            'date': -1,
            'time': [0, 12],
            'step': 0,
            'type': 'an',
            'levtype': 'ml',
            'levelist': 1,
            'param': 't',
        }
    
    def build_test(self):
        with pf.Family('test_family') as f:
            pf.Task('ls', workdir=self._workdir, script='ls -l {}'.format(self._data_filename))
        return f

These tests can be combined inside a suite:

In [41]:
with CourseSuite('inheritance_example') as s:
    with pf.Family('tests'):
        (
            GribLsTest(datetime.date.today() - datetime.timedelta(days=2), 't')
            >>
            GribLsTest(datetime.date.today() - datetime.timedelta(days=1), 'z', name='grib_ls_2')
            >>
            LsTest()
        )
    
s.deploy_suite()
s.replace_on_server(server_host, server_port)

Overwriting existing file: /Users/macw/git/pyflow/tutorials/course/scratch/files/inheritance_example/tests/grib_ls/retrieve_data.ecf
Save /Users/macw/git/pyflow/tutorials/course/scratch/files/inheritance_example/tests/grib_ls/retrieve_data.ecf
Overwriting existing file: /Users/macw/git/pyflow/tutorials/course/scratch/files/inheritance_example/tests/grib_ls/grib_ls.ecf
Save /Users/macw/git/pyflow/tutorials/course/scratch/files/inheritance_example/tests/grib_ls/grib_ls.ecf
Overwriting existing file: /Users/macw/git/pyflow/tutorials/course/scratch/files/inheritance_example/tests/grib_ls/cleanup.ecf
Save /Users/macw/git/pyflow/tutorials/course/scratch/files/inheritance_example/tests/grib_ls/cleanup.ecf
Overwriting existing file: /Users/macw/git/pyflow/tutorials/course/scratch/files/inheritance_example/tests/grib_ls_2/retrieve_data.ecf
Save /Users/macw/git/pyflow/tutorials/course/scratch/files/inheritance_example/tests/grib_ls_2/retrieve_data.ecf
Overwriting existing file: /Users/macw/git/p

## Suite Objects using Delegation

Alternatively, we can take the approach of delegation such that decisions about the data request to use and the test to construct are delegated to a configuration object that is injected from the controlling scope. If we do this then the resultant `Test` class is now a concrete class (and we no longer need to derive from it), changing the structure of the suite somewhat.

In this case, we build our Test class to delegate the construction to a config object whose type is unknown.

In [42]:
class DelegatingTest(pf.AnchorFamily):
    def __init__(self, config, **kwargs):
        
        name = config.name
        super().__init__(name, **kwargs)
        
        # Generate a unique working directory
        workdir = os.path.join(scratchdir,
                               self.suite.name, self.fullname.replace('/', '_'))
        
        # Ensure that the data gets put somewhere
        data_filename = 'retrieved.grib'
        request = config.request_dict.copy()
        request['target'] = data_filename
        
        with self:
            (
                RetrieveTask(request, workdir=workdir)
                >>
                config.build_test(workdir, data_filename)
                >>
                Cleanup(workdir)
            )

We can now create config classes that provide this functionality. They do not have to be built in the same way, or related to each other in any way other than that they provide the given functionality.

In [43]:
class LsConfig:
    name = 'ls'
    request_dict = {
        'class': 'od',
        'expver': '0001',
        'stream': 'oper',
        'date': -1,
        'time': [0, 12],
        'step': 0,
        'type': 'an',
        'levtype': 'ml',
        'levelist': 1,
        'param': 't',
    }
    
    @staticmethod
    def build_test(workdir, data_filename):
        with pf.Family('test_family') as f:
            return pf.Task('ls', workdir=workdir, script='ls -l {}'.format(data_filename))

In [44]:
class GribLsConfig:
    
    def __init__(self, date, param, name='grib_ls'):
        self.name = name
        self._date = date
        self._param = param
        
    @property
    def request_dict(self):
        return {
            'class': 'od',
            'expver': '0001',
            'stream': 'oper',
            'date': self._date,
            'time': [0, 12],
            'step': 0,
            'type': 'an',
            'levtype': 'ml',
            'levelist': 1,
            'param': self._param,
        }
    
    def build_test(self, workdir, data_filename):
        return pf.Task('grib_ls', workdir=workdir, script='grib_ls -m {}'.format(data_filename))
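
Because the config objects share no base class, a misspelt member name only surfaces when the delegating class tries to use it. A small duck-typing check can report this earlier. This is a sketch in plain Python: `missing_members` and the two toy config classes are hypothetical helpers, not part of pyflow or of the course code.

```python
REQUIRED_MEMBERS = ('name', 'request_dict', 'build_test')


def missing_members(config):
    """Return the names in REQUIRED_MEMBERS that config does not provide."""
    return [member for member in REQUIRED_MEMBERS if not hasattr(config, member)]


class StaticConfig:
    # Everything is defined at class level, so the class itself can be passed around.
    name = 'static'
    request_dict = {'param': 't'}

    @staticmethod
    def build_test(workdir, data_filename):
        return 'ls -l {}'.format(data_filename)


class IncompleteConfig:
    name = 'incomplete'


print(missing_members(StaticConfig))
print(missing_members(IncompleteConfig))
```

The check works equally for classes and instances, since it only asks whether the attribute can be looked up.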

We can then construct a combined configuration object

In [45]:
class CombinedConfig:
    def __init__(self):
        self.tests = [
            GribLsConfig(datetime.date.today() - datetime.timedelta(days=2), 't'),
            GribLsConfig(datetime.date.today() - datetime.timedelta(days=1), 'z', name='grib_ls_2'),
            LsConfig  # n.b. here we pass the class itself rather than an instance.
        ]

And we then configure the suite with the config object

In [46]:
class DelegatedSuite(CourseSuite):
    def __init__(self, config):
        super().__init__('delegated_example')
        
        with self:
            pf.sequence(DelegatingTest(test_cfg) for test_cfg in config.tests)
            
s = DelegatedSuite(CombinedConfig())
s.deploy_suite()
s.replace_on_server(server_host, server_port)

Overwriting existing file: /Users/macw/git/pyflow/tutorials/course/scratch/files/delegated_example/grib_ls/retrieve_data.ecf
Save /Users/macw/git/pyflow/tutorials/course/scratch/files/delegated_example/grib_ls/retrieve_data.ecf
Overwriting existing file: /Users/macw/git/pyflow/tutorials/course/scratch/files/delegated_example/grib_ls/grib_ls.ecf
Save /Users/macw/git/pyflow/tutorials/course/scratch/files/delegated_example/grib_ls/grib_ls.ecf
Overwriting existing file: /Users/macw/git/pyflow/tutorials/course/scratch/files/delegated_example/grib_ls/cleanup.ecf
Save /Users/macw/git/pyflow/tutorials/course/scratch/files/delegated_example/grib_ls/cleanup.ecf
Overwriting existing file: /Users/macw/git/pyflow/tutorials/course/scratch/files/delegated_example/grib_ls_2/retrieve_data.ecf
Save /Users/macw/git/pyflow/tutorials/course/scratch/files/delegated_example/grib_ls_2/retrieve_data.ecf
Overwriting existing file: /Users/macw/git/pyflow/tutorials/course/scratch/files/delegated_example/grib_ls_2

# Attributes on Suites, Families and Tasks

### Creation of Attributes - All Methods

Typically, we have three methods to construct Attributes (or sub nodes) attached to any specific node. We give examples here both within a simple tree formulation of a suite and within a class derived from a specific pyflow class.

These methods have different constraints, and differ in clarity and legibility in different contexts. Ultimately, the choice of which to use should come down to which is most legible in context.

Firstly, we can construct the pyflow object within a context manager containing the parent node.

In [47]:
with pf.Suite('s', host=pf.NullHost()) as s:
    with pf.Family('f') as f:
        pf.Label('l', 'text')
        pf.Variable('V', 'value')
print(s)

suite s
  family f
    edit V 'value'
    label l "text"
  endfamily
endsuite



In [48]:
class DerivedFamily(pf.Family):
    def __init__(self):
        super().__init__('f')
        with self:
            pf.Label('l', 'text')
            pf.Variable('V', 'value')

with pf.Suite('s', host=pf.NullHost()) as s:
    DerivedFamily()
print(s)

suite s
  family f
    edit V 'value'
    label l "text"
  endfamily
endsuite



Secondly, objects can be allocated by using keyword arguments on the parent node constructor. These take three forms:

 1. For an attribute of which there can only be one instance, the keyword argument is the lower-case string of the attribute class name. E.g. "script=".
 2. For an attribute of which there can be multiple instances, the keyword argument is the lower-case, pluralised version of the class name. E.g. "labels=", which accepts a list or tuple.
 3. Ecflow variables are passed in as direct keyword arguments, identified by being capitalised and valid ecflow variable names.
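
The third rule can be illustrated without pyflow. The sketch below separates capitalised keyword arguments (treated as ecflow variables) from all others. `split_kwargs` is a hypothetical helper written for this illustration. It is not how pyflow itself is implemented, and pyflow's validation of variable names may be stricter.

```python
def split_kwargs(**kwargs):
    """Split keyword arguments into (variables, attributes) by capitalisation."""
    variables = {}
    attributes = {}
    for key, value in kwargs.items():
        # str.isupper() ignores digits and underscores, so 'ECF_HOME' counts as upper case.
        if key.isupper():
            variables[key] = value
        else:
            attributes[key] = value
    return variables, attributes


variables, attributes = split_kwargs(labels={'l': 'text'}, script='echo hello', V='value')
print(variables)
print(attributes)
```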

In [49]:
with pf.Suite('s', host=pf.NullHost()) as s:
    pf.Family('f', labels={'l': 'text'}, V='value')
print(s)

suite s
  family f
    edit V 'value'
    label l "text"
  endfamily
endsuite



In [50]:
class DerivedFamily(pf.Family):
    def __init__(self, **kwargs):
        
        variables = {'V': 'value'}
        variables.update(kwargs)
        
        labels = {'l': 'text'}
        
        super().__init__('f', labels=labels, **variables)

with pf.Suite('s', host=pf.NullHost()) as s:
    DerivedFamily()
print(s)

suite s
  family f
    edit V 'value'
    label l "text"
  endfamily
endsuite



Finally, unambiguously named pyflow objects (variables, script, ...) can be directly assigned to their parent nodes.

In [51]:
with pf.Suite('s', host=pf.NullHost()) as s:
    f = pf.Family('f')
    f.V = 'value' 
        
print(s)

suite s
  family f
    edit V 'value'
  endfamily
endsuite



### Best Practice for Variables and Attributes

Best practice for pyflow is to create derived types that encapsulate all of the concerns of a given class. This means that variable and attribute creation should occur within the constructor of the class being written. This should generally take the form of a setup section, in which various children are defined, before passing them through to the constructor of the superclass. Any structural children should then be defined below.

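**NOTE:** `super().__init__()` must be called before `self` is used as a node, for example in `with self:` or when assigning attributes such as `self.V = 'value'`, because the object is not a valid node until the superclass constructor has run.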

In [52]:
class ExampleFamily(pf.Family):
    def __init__(self, name, example_value, initial_label, **kwargs):
        
        # This structure allows the kwargs to override any of these variables if needed, or
        # to set other more general properties of the superclass (such as host=). The same
        # effect could be achieved by using kwargs.setdefault(...) and passing kwargs through.
        variables = {
            'REQUIRED_VARIABLE': 'required_value',
            'EXAMPLE_VARIABLE': example_value
        }
        variables.update(kwargs)
        
        labels = {
            'a_label': initial_label
        }
        
        super().__init__(name, labels=labels, **variables)
        
        # Here we define structural children
        with self:
            (
                MyFamily('f1')
                >>
                MyTask('t1')
            )

### Variable substitution and expansion

Variables and attributes can be directly referred to in scripts by making use of automatically exported environment variables of the same name. For example, a `RepeatDate('YMD', ...)` object may be referred to in a script by writing `$YMD`. This will be automatically detected by pyflow and the variable exported.

If generating scripts, or using the templating engine, pyflow objects can generate their own representations. The `str()` and `repr()` functions in python will return representations of variables that can be used in scripts (after automatic variable exporting) and in technical contexts (pre variable exporting, such as in other ecflow variables) respectively.

We can access the properties of an ecflow `Variable` programmatically. This allows us to make interdependencies explicit, and to generate snippets within scripts that are guaranteed to correctly use the objects.

In [53]:
with pf.Suite('s'):
    v = pf.Variable('A_VARIABLE', 1234)
    
print(str(v), repr(v), v.value)
print(v.name, v.fullname)

$A_VARIABLE %A_VARIABLE% 1234
A_VARIABLE /s:A_VARIABLE


This allows us to automatically generate the correct shell-expansion of variables in the appropriate script context. Note that both python string substitution and Jinja2 templating use the `str()` representation by default.

In [54]:
text_script = 'echo "Variable value: {}"'.format(v)
print(text_script)

echo "Variable value: $A_VARIABLE"
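
When the ecflow form is needed instead (for example inside the value of another ecflow variable), Python's `!r` conversion selects `repr()` within a format string. The sketch below uses a stand-in class so that it runs without pyflow. `ToyVariable` is hypothetical and only mimics the two representations printed above.

```python
class ToyVariable:
    """Hypothetical stand-in that mimics the two representations shown above."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __str__(self):
        # Shell form, for use inside scripts.
        return '${}'.format(self.name)

    def __repr__(self):
        # ecflow form, for use inside other ecflow variables.
        return '%{}%'.format(self.name)


toy = ToyVariable('A_VARIABLE', 1234)
print('in a script: {}'.format(toy))
print('in an ecflow variable: {!r}'.format(toy))
```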


In [55]:
templated_script = pf.TemplateScript(
    'echo "variable {{ VARIABLE.name }} has value {{ VARIABLE }}"',
    VARIABLE=v
)
print(templated_script)

echo "variable A_VARIABLE has value $A_VARIABLE"


Other ecflow objects that set accessible values can be accessed in the same way

In [56]:
with pf.Suite('s') as s:
    pf.RepeatDate("YMD", datetime.date(2019, 1, 1), datetime.date(2019, 12, 31))
    
print(pf.TemplateScript(
    'echo "The current date object is {{ YMD.name }}. Value={{ YMD }}"',
    YMD=s.YMD
))

echo "The current date object is YMD. Value=$YMD"


We can also use templating to facilitate accessing attributes using the `ecflow_client`, and to correctly set them according to mutable values (including ecflow variables).

In [57]:
with pf.Suite('s', FOO='bar') as s:
    pf.Label('label', '')
    
print(pf.TemplateScript(
    'ecflow_client --alter=change label {{ LABEL.name }} "{{ VALUE }}" {{ LABEL.parent.fullname }}',
    LABEL=s.label,
    VALUE=s.FOO
))

ecflow_client --alter=change label label "$FOO" /s


### Using attributes belonging to other nodes

Attributes associated with other nodes can be used by passing the relevant attribute object to the site where it is needed. This can be facilitated by accessing children of various nodes as attributes of the parent.

In [58]:
with pf.Suite('s') as s:
    with pf.Family('family1') as f1:
        pf.Label('the_label', '')
        
    with pf.Family('family2') as f2:
        LabelSetter((f1.the_label, "a value"), name='labeller')
        
print(f2.labeller.script)

ecflow_client --alter=change label the_label "a value" /s/family1


In contexts where the relative path between nodes and attributes is required, the `relative_path` method is able to interrogate the relationships. Alternatively the `fullname` attribute will give the absolute path of nodes.

Within pyflow expressions it should not be necessary to generate these paths manually, as the expression generator should do the right thing. However, it is sometimes useful to refer to these components within scripts, especially as expansions within templated scripts.

In [59]:
print(s.family1.the_label.relative_path(s.family2))
print(s.family2.labeller.relative_path(s.family1))
print(s.family2.labeller.relative_path(s.family1.the_label))
print(s.family2.labeller.fullname)
print(s.family1.the_label.fullname)

print('\nscript: \n', pf.TemplateScript(
    'location of external node: {{ NODE.fullname }}',
    NODE=s.family2.labeller
))
print('\nscript: \n', pf.TemplateScript(
    'attribute relative path: {{ ATTRIBUTE.relative_path(NODE) }}',
    ATTRIBUTE=s.family1.the_label,
    NODE=s.family2.labeller
))


family1:the_label
family2/labeller
../family2/labeller
/s/family2/labeller
/s/family1:the_label

script: 
 location of external node: /s/family2/labeller

script: 
 attribute relative path: ../family1:the_label


### Using variables defined in parents

Ecflow nodes inherit variables from their parents. It is therefore very easy to write tasks that silently assume a variable already exists somewhere above them in the suite, without anything programmatically indicating or enforcing that this relationship exists.

Derived Tasks that make use of external variables should require that they be passed in from outside. If the variable object is not itself used to generate the script (i.e. the variable name is hard-coded in the script text), then its validity should be checked with an `assert` in the code.

In [60]:
class ChildTask(pf.Task):
    def __init__(self, external_variable):
        
        assert external_variable.name == 'EXTERNAL_VAR'
        script = 'echo "external variable: $EXTERNAL_VAR"'
        super().__init__('uses_var', script=script)
        
with CourseSuite('assert_external_variable') as s:
    with pf.Family('containing_family', EXTERNAL_VAR=1234) as f:
        ChildTask(f.EXTERNAL_VAR)
        
print(s)
print("script:\n", f.uses_var.script, '\n')
s.deploy_suite()
s.replace_on_server(server_host, server_port)

suite assert_external_variable
  defstatus suspended
  edit ECF_FILES '/Users/macw/git/pyflow/tutorials/course/scratch/files/assert_external_variable'
  edit ECF_HOME '/Users/macw/git/pyflow/tutorials/course/scratch/out'
  edit ECF_JOB_CMD 'bash -c 'export ECF_PORT=%ECF_PORT%; export ECF_HOST=%ECF_HOST%; export ECF_NAME=%ECF_NAME%; export ECF_PASS=%ECF_PASS%; export ECF_TRYNO=%ECF_TRYNO%; export PATH=/Users/macw/opt/miniconda3/envs/pyflow_test/bin:$PATH; ecflow_client --init="$$" && %ECF_JOB% && ecflow_client --complete || ecflow_client --abort ' 1> %ECF_JOBOUT% 2>&1 &'
  edit ECF_KILL_CMD 'pkill -15 -P %ECF_RID%'
  edit ECF_STATUS_CMD 'true'
  edit ECF_OUT '%ECF_HOME%'
  label exec_host "localhost"
  family containing_family
    edit EXTERNAL_VAR '1234'
    task uses_var
  endfamily
endsuite

script:
 echo "external variable: $EXTERNAL_VAR" 

Overwriting existing file: /Users/macw/git/pyflow/tutorials/course/scratch/files/assert_external_variable/uses_var.ecf
Save /Users/macw/git/pyfl

If scripts are being generated or templated, then the existence of inherited variables can be enforced through generation.

In [61]:
class ChildTask(pf.Task):
    def __init__(self, external_variable):
        script = pf.TemplateScript(
            'echo "external variable: {{ VARIABLE }}"',
            VARIABLE=external_variable
        )
        super().__init__('uses_var', script=script)
        
with CourseSuite('templated_external_variable') as s:
    with pf.Family('containing_family', MY_VAR=1234) as f:
        ChildTask(f.MY_VAR)
        
print(s)
print("script:\n", f.uses_var.script, '\n')
s.deploy_suite()
s.replace_on_server(server_host, server_port)

suite templated_external_variable
  defstatus suspended
  edit ECF_FILES '/Users/macw/git/pyflow/tutorials/course/scratch/files/templated_external_variable'
  edit ECF_HOME '/Users/macw/git/pyflow/tutorials/course/scratch/out'
  edit ECF_JOB_CMD 'bash -c 'export ECF_PORT=%ECF_PORT%; export ECF_HOST=%ECF_HOST%; export ECF_NAME=%ECF_NAME%; export ECF_PASS=%ECF_PASS%; export ECF_TRYNO=%ECF_TRYNO%; export PATH=/Users/macw/opt/miniconda3/envs/pyflow_test/bin:$PATH; ecflow_client --init="$$" && %ECF_JOB% && ecflow_client --complete || ecflow_client --abort ' 1> %ECF_JOBOUT% 2>&1 &'
  edit ECF_KILL_CMD 'pkill -15 -P %ECF_RID%'
  edit ECF_STATUS_CMD 'true'
  edit ECF_OUT '%ECF_HOME%'
  label exec_host "localhost"
  family containing_family
    edit MY_VAR '1234'
    task uses_var
  endfamily
endsuite

script:
 echo "external variable: $MY_VAR" 

Overwriting existing file: /Users/macw/git/pyflow/tutorials/course/scratch/files/templated_external_variable/uses_var.ecf
Save /Users/macw/git/pyflow/

Alternatively, we can provide default values which are overridden in the context of an externally supplied variable.

In [62]:
class TaskWithVariable(pf.Task):
    def __init__(self, name, default_value=1234, **kwargs):
        super().__init__(name, **kwargs)
        
        # Note that this sort of introspective setup is one that requires constructing
        # components after calling the superclass
        if isinstance(default_value, pf.Variable):
            var = default_value
        else:
            self.TASK_VALUE = default_value
            var = self.TASK_VALUE
        
        self.script = pf.TemplateScript(
            'echo "external variable: {{ VARIABLE }}"',
            VARIABLE=var
        )

with CourseSuite('internal_or_external_variable') as s:
    with pf.Family('containing_family', MY_VAR=1234) as f:
        TaskWithVariable('external_variable', f.MY_VAR)
        TaskWithVariable('external_value', f.MY_VAR.value)
        TaskWithVariable('default_value')
        
print(s)
print("script external:\n", f.external_variable.script, '\n')
print("script default:\n", f.default_value.script, '\n')
s.deploy_suite()
s.replace_on_server(server_host, server_port)

suite internal_or_external_variable
  defstatus suspended
  edit ECF_FILES '/Users/macw/git/pyflow/tutorials/course/scratch/files/internal_or_external_variable'
  edit ECF_HOME '/Users/macw/git/pyflow/tutorials/course/scratch/out'
  edit ECF_JOB_CMD 'bash -c 'export ECF_PORT=%ECF_PORT%; export ECF_HOST=%ECF_HOST%; export ECF_NAME=%ECF_NAME%; export ECF_PASS=%ECF_PASS%; export ECF_TRYNO=%ECF_TRYNO%; export PATH=/Users/macw/opt/miniconda3/envs/pyflow_test/bin:$PATH; ecflow_client --init="$$" && %ECF_JOB% && ecflow_client --complete || ecflow_client --abort ' 1> %ECF_JOBOUT% 2>&1 &'
  edit ECF_KILL_CMD 'pkill -15 -P %ECF_RID%'
  edit ECF_STATUS_CMD 'true'
  edit ECF_OUT '%ECF_HOME%'
  label exec_host "localhost"
  family containing_family
    edit MY_VAR '1234'
    task external_variable
    task external_value
      edit TASK_VALUE '1234'
    task default_value
      edit TASK_VALUE '1234'
  endfamily
endsuite

script external:
 echo "external variable: $MY_VAR" 

script default:
 echo "

### General node properties

Nodes and attributes expose many useful properties. Here is a non-exhaustive list of general node properties:

 - `suite` - The `Suite` object containing the node
 - `host()` - The currently active `Host` object
 - `anchor` - The current anchor (either `Suite` or `AnchorFamily`) containing this node
 - `name` - The visible name of this node
 - `fullname` - The full path of this node from the root
 - `all_children` - All (direct) children of a node
 - `all_executable_children` - All `Tasks` and `Families` (directly) contained within a `Family`
 - `all_tasks` - All `Tasks` (directly) contained within a `Family`

## Looping Constructs

Pyflow supports ecflow looping constructs, and ensures that they are initialised in a type-safe manner. The values of these looping constructs can be accessed from scripts in the same manner as normal ecflow variables.

In [63]:
with CourseSuite('looping_constructs') as s:
    
    with pf.Family('date_family'):
        pf.RepeatDate('REPEAT_DATE',
                      datetime.date(year=2019, month=1, day=1),
                      datetime.date(year=2019, month=12, day=31))
        
        with pf.Family('hour_family', labels={'date_time': ''}) as f:
            pf.RepeatInteger('REPEAT_HOUR', 1, 24)
            (
                LabelSetter((f.date_time, '$REPEAT_DATE hour $REPEAT_HOUR'))
                >>
                WaitSeconds(2)
            )

s.deploy_suite()
s.replace_on_server(server_host, server_port)

Overwriting existing file: /Users/macw/git/pyflow/tutorials/course/scratch/files/looping_constructs/set_labels.ecf
Save /Users/macw/git/pyflow/tutorials/course/scratch/files/looping_constructs/set_labels.ecf
Overwriting existing file: /Users/macw/git/pyflow/tutorials/course/scratch/files/looping_constructs/wait_2.ecf
Save /Users/macw/git/pyflow/tutorials/course/scratch/files/looping_constructs/wait_2.ecf
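
The nesting of the two repeats determines the order of iteration: the inner repeat runs through all of its values for each value of the outer one. The plain-Python sketch below reproduces that order, assuming both end points are inclusive as in ecflow's `repeat date` and `repeat integer`. The helpers `repeat_date` and `repeat_integer` are hypothetical and only model the values; they are not pyflow functions.

```python
import datetime


def repeat_date(start, end, step_days=1):
    """Yield YYYYMMDD integers from start to end inclusive."""
    current = start
    while current <= end:
        yield int(current.strftime('%Y%m%d'))
        current += datetime.timedelta(days=step_days)


def repeat_integer(start, end, step=1):
    """Return the integers from start to end inclusive."""
    return range(start, end + 1, step)


# The outer (date) repeat advances only after the inner (hour) repeat has finished.
iterations = [
    (ymd, hour)
    for ymd in repeat_date(datetime.date(2019, 1, 1), datetime.date(2019, 12, 31))
    for hour in repeat_integer(1, 24)
]
print(iterations[0], iterations[24], iterations[-1])
print(len(iterations))
```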


## Expressions - Triggers, Completes

Ecflow has a rich language (and associated behaviour) for expressions that trigger dependencies and conditional behaviour in suites. These expressions are ultimately strings that are parsed by the ecflow server and evaluated to control the suite.

Within pyflow, all of the components that make up ecflow expressions are already present as objects in the script. This means we can generate type-safe, validated expressions by using the existing objects directly. These can then be assigned to the `triggers` or `completes` attributes of any appropriate node.

Trigger expressions should follow the natural arithmetic expressing the problem.

In [64]:
with pf.Suite('s'):
    
    with pf.Family('repeat1') as repeat1:
        pf.RepeatDate('YMD', datetime.date(2019, 1, 1), datetime.date(2019, 12, 31))
        
    with pf.Family('repeat2') as repeat2:
        pf.RepeatDate('YMD', datetime.date(2019, 1, 1), datetime.date(2019, 12, 31))
        
    repeat2.triggers = (repeat1 == pf.state.complete) | (repeat1.YMD > repeat2.YMD)
        
    pf.Task('t3').completes = (repeat2.YMD > '20190616')

### Shortcut - helper properties

A number of shortcut properties exist to construct standard expression components. The following sets of examples are equivalent.

In [65]:
t = MyTask('a_task')
exprn = (t == pf.state.aborted)
exprn = (t == pf.state.complete)
exprn = (t == pf.state.unknown)
exprn = (t == pf.state.queued)
exprn = (t == pf.state.submitted)
exprn = (t == pf.state.active)

In [66]:
t = MyTask('a_task')
exprn = t.aborted
exprn = t.complete
exprn = t.unknown
exprn = t.queued
exprn = t.submitted
exprn = t.active

### Combined Expressions

Expressions can be combined with logical operators, both unary and binary.

In [67]:
with pf.Suite('s'):
    t1 = MyTask('t1')
    t2 = MyTask('t2')
    t3 = MyTask('t3')
    
    t1.triggers = t2.complete & t3.aborted

In [68]:
with pf.Suite('s'):
    t1 = MyTask('t1')
    t2 = MyTask('t2')
    t3 = MyTask('t3')
    
    t1.triggers = t2.complete
    t1.triggers |= t3.aborted
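
To see how Python operators can produce expression strings of this kind, consider the toy model below. It is a sketch of the general operator-overloading technique only: `Expr` and `StateTask` are hypothetical classes, and pyflow's real expression classes may differ in structure and in the exact text they generate. Negation uses `~` here because Python's `not` cannot be overloaded.

```python
class Expr:
    """Hypothetical expression node that renders to an ecflow-style string."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        return Expr('({} and {})'.format(self.text, other.text))

    def __or__(self, other):
        return Expr('({} or {})'.format(self.text, other.text))

    def __invert__(self):
        return Expr('not ({})'.format(self.text))

    def __str__(self):
        return self.text


class StateTask:
    """Hypothetical task exposing state shortcuts as expression objects."""

    def __init__(self, name):
        self.name = name

    @property
    def complete(self):
        return Expr('{} eq complete'.format(self.name))

    @property
    def aborted(self):
        return Expr('{} eq aborted'.format(self.name))


toy2, toy3 = StateTask('t2'), StateTask('t3')
print(toy2.complete & toy3.aborted)
print(toy2.complete | ~toy3.aborted)
```

Since only `__or__` is defined, an augmented assignment such as `expr |= other` falls back to `expr = expr | other`, which is why expressions can be built up incrementally.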

### Shortcut - dependencies

The most common trigger expression is a simple dependency: Task A runs only after Task B has completed. Pyflow provides the `>>` operator to simplify this case.

The following are equivalent approaches.

In [69]:
with pf.Suite('s'):
    t1 = MyTask('t1')
    t2 = MyTask('t2')
    t2.triggers = t1.complete

In [70]:
with pf.Suite('s'):
    t1 = MyTask('t1')
    t2 = MyTask('t2')
    t2.triggers = (t1 == pf.state.complete)

In [71]:
with pf.Suite('s'):
    t1 = MyTask('t1')
    t2 = MyTask('t2')
    t1 >> t2

In [72]:
with pf.Suite("s"):
    (
        MyTask('t1')
        >>
        MyTask('t2')
    )
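
The reason that chains such as retrieve `>>` test `>>` cleanup read from left to right can be shown with a toy model. In this sketch `a >> b` makes `b` wait for `a` and returns `b`, so that the next `>>` applies to `b`. `ChainTask` is a hypothetical class for illustration. It records a single trigger string and is not pyflow's implementation.

```python
class ChainTask:
    """Hypothetical task that records a single trigger string."""

    def __init__(self, name):
        self.name = name
        self.trigger = None

    def __rshift__(self, other):
        # 'a >> b' makes b wait for a, then returns b so that chains read left to right.
        other.trigger = '{} eq complete'.format(self.name)
        return other


a, b, c = ChainTask('a'), ChainTask('b'), ChainTask('c')
a >> b >> c
print(a.trigger, '|', b.trigger, '|', c.trigger)
```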

If we need to establish a dependency relationship between a large number of tasks, the `sequence()` helper described under In-depth Functionality can be used.

## External Ecflow Dependencies

Pyflow builds its dependency trees using python objects. This means that if we wish to have connections to external suites, that are not built from the same repository, then we must build shadow objects that map to the nodes we wish to connect to.

A full range of these `Extern*` objects exists, which may be used in the normal way.

In [73]:
with pf.Suite('s'):
    
    etask = pf.ExternTask('/a/b/c/d')
    efamily = pf.ExternFamily('/f/g/h/i')
    
    eymd = pf.ExternYMD('/a/b/c/d:YMD')
    eevent = pf.ExternEvent('/e/f/g/h:ev')
    emeter = pf.ExternMeter('/g/h/i/j:mt')
    
    t1 = pf.Task('t1')
    t1.triggers = etask & efamily

# In-depth Functionality

Pyflow aims to contain not just a collection of ecflow functionality, but also helper functionality to assist in building suites. Where idiomatic uses of ecflow result in the same mechanisms being built repeatedly, pyflow can incorporate these to help generate clearer suites.

The `ecflow_name()` functionality converts an arbitrary string into a name which meets the character restrictions for ecflow nodes. This is very useful for converting strings such as hostnames or the names of various data sets into a form that can be used as the name of a Family or Task.

In [74]:
print(pf.ecflow_name('hyphenated-name'))

hyphenated_name
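
A rough approximation of this kind of conversion can be written with a regular expression. The sketch below is an assumption about the general idea, not pyflow's actual rule set: `toy_ecflow_name` is a hypothetical function that replaces every character other than letters, digits and underscores, which may be stricter than ecflow requires.

```python
import re


def toy_ecflow_name(text):
    """Replace every character outside [A-Za-z0-9_] with an underscore."""
    return re.sub(r'[^A-Za-z0-9_]', '_', text)


print(toy_ecflow_name('hyphenated-name'))
print(toy_ecflow_name('host.example.org'))
```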


The `all_complete()` and `sequence()` functions facilitate working with generated sequences of python tasks. `all_complete()` generates an expression suitable for use in triggers (or completes). `sequence()` generates triggers such that all of the tasks will run sequentially.

In [75]:
with CourseSuite('sequences') as s:
    tasks = [pf.Task('t_{}'.format(i)) for i in range(10)]
    pf.Task('done', triggers=pf.all_complete(tasks))
    pf.sequence(tasks)
print(s)

suite sequences
  defstatus suspended
  edit ECF_FILES '/Users/macw/git/pyflow/tutorials/course/scratch/files/sequences'
  edit ECF_HOME '/Users/macw/git/pyflow/tutorials/course/scratch/out'
  edit ECF_JOB_CMD 'bash -c 'export ECF_PORT=%ECF_PORT%; export ECF_HOST=%ECF_HOST%; export ECF_NAME=%ECF_NAME%; export ECF_PASS=%ECF_PASS%; export ECF_TRYNO=%ECF_TRYNO%; export PATH=/Users/macw/opt/miniconda3/envs/pyflow_test/bin:$PATH; ecflow_client --init="$$" && %ECF_JOB% && ecflow_client --complete || ecflow_client --abort ' 1> %ECF_JOBOUT% 2>&1 &'
  edit ECF_KILL_CMD 'pkill -15 -P %ECF_RID%'
  edit ECF_STATUS_CMD 'true'
  edit ECF_OUT '%ECF_HOME%'
  label exec_host "localhost"
  task t_0
  task t_1
    trigger t_0 eq complete
  task t_2
    trigger t_1 eq complete
  task t_3
    trigger t_2 eq complete
  task t_4
    trigger t_3 eq complete
  task t_5
    trigger t_4 eq complete
  task t_6
    trigger t_5 eq complete
  task t_7
    trigger t_6 eq complete
  task t_8
    trigger t_7 eq complet

A common idiom in looping suites is to have two suites that both loop on dates/times, one which runs behind the other. For example the `lag` family running after the forecast has completed. This idiom can be expressed more clearly by encapsulating its functionality.

In [76]:
with pf.Suite('follow') as s:
    with pf.Family('leader') as leader:
        pf.RepeatDate("YMD", datetime.date(2019, 1, 1), datetime.date(2019, 12, 31))
    with pf.Family('follower') as follower:
        pf.RepeatDate("YMD", datetime.date(2019, 1, 1), datetime.date(2019, 12, 31))
    follower.follow = leader.YMD
print(s)

suite follow
  edit ECF_JOB_CMD 'bash -c 'export ECF_PORT=%ECF_PORT%; export ECF_HOST=%ECF_HOST%; export ECF_NAME=%ECF_NAME%; export ECF_PASS=%ECF_PASS%; export ECF_TRYNO=%ECF_TRYNO%; export PATH=/Users/macw/opt/miniconda3/envs/pyflow_test/bin:$PATH; ecflow_client --init="$$" && %ECF_JOB% && ecflow_client --complete || ecflow_client --abort ' 1> %ECF_JOBOUT% 2>&1 &'
  edit ECF_KILL_CMD 'pkill -15 -P %ECF_RID%'
  edit ECF_STATUS_CMD 'true'
  edit ECF_OUT '%ECF_HOME%'
  label exec_host "default"
  family leader
    repeat date YMD 20190101 20191231 1
  endfamily
  family follower
    trigger leader eq complete or follower:YMD lt leader:YMD
    repeat date YMD 20190101 20191231 1
  endfamily
endsuite
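
The generated trigger can be read as a simple predicate: the follower may run if the leader has finished entirely, or if the follower's date is strictly behind the leader's. The plain-Python sketch below evaluates that condition for a few dates. `follower_may_run` is a hypothetical helper that models the trigger text shown above; it does not call ecflow or pyflow.

```python
def follower_may_run(leader_complete, follower_ymd, leader_ymd):
    """Evaluate 'leader eq complete or follower:YMD lt leader:YMD'."""
    return leader_complete or follower_ymd < leader_ymd


# Leader is working on 3 January: the follower may process 1 and 2 January only.
results = [
    (follower_ymd, follower_may_run(False, follower_ymd, 20190103))
    for follower_ymd in (20190101, 20190102, 20190103)
]
print(results)

# Once the leader has completed, the follower can catch up fully.
print(follower_may_run(True, 20191231, 20191231))
```

Dates in `YYYYMMDD` form compare correctly as integers, which is what makes the `lt` comparison meaningful.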



This collection of utility functionality is (perpetually) in progress, and will be updated to account for useful idioms as they emerge.

# Host management

Ecflow is ultimately a framework for executing tasks, but task execution requires a context. Pyflow makes use of a `Host` object to supply the context for this execution. As such pyflow *requires* a host object to be defined before it will generate any executable nodes in the tree. The `host` can be set at any level (`Suite`, `Family` or `Task`) and is inherited unless overridden.

If the default behaviour of ecflow is required, and task execution is being managed explicitly, the host may be set to `NullHost()` at the `Suite` level. This will suppress all host-related behaviour inside pyflow.

For task handling, it is important that the `ecflow_client` is configured (via appropriate environment variables) and that it is correctly called to trigger changes of state in the server. Further, any and all errors that may occur in a script must be correctly caught and reported to the ecflow server.

`Host` objects must also know how to transfer data to/from the host to be able to implement the Data Resource functionality discussed later.

## Host Arguments

Host classes have many configurable options. Some of these are available for all host classes, as they configure the base `Host` class. Other than `name`, all of these are optional keyword arguments with plausible defaults.

 * `name` - the name used for the host. Required (non keyword argument).
 * `hostname` - The hostname to run the task on. Defaults to `name` if not supplied
 * `scratch_directory` - The path in which tasks will be run, unless otherwise specified. Also to be used within suites when a scratch location is needed.
 * `log_directory` - The directory to use for script output. Defaults to `ECF_HOME`, but may need to be changed on systems with scheduling systems to make the output visible to the ecflow server.
 * `limit` - How many tasks can run on the node simultaneously.
 * `extra_paths` - Paths that are to be added to PATH on the host.
 * `extra_variables` - A dictionary of additional ECFLOW variables that should be set to configure the host (e.g. {'SCHOST': 'ccb'}).
 * `environment_variables` - Additional environment variables to export into all scripts.
 * `modules` - Modules to `module load`
 * `module_purge` - Should a `module purge` command be run (before loading any modules). Default False.
 * `module_source` - The shell script to source to initialise the module system. Default None.
 * `ecflow_path` - The directory containing the `ecflow_client` executable
 * `label_host` - When the `host` property is changed on a node, should a `Label` be created in the tree. Default True.
 


## Existing Host Classes

A number of existing host classes have been defined. These can be extended, and alternatives provided.


### `LocalHost`

This is essentially a trivial host. It runs tasks as background processes on the current node - i.e. on the ecflow server, and running as the same user as the server. Other than for examples, this is extremely useful for running tasks that update labels, meters, events and variables on a node that is certain to have the `ecflow_client` working correctly and with no job queuing delay.

In [77]:
host = pf.LocalHost()

### `SSHHost`

Runs a script on a remote host accessed by ssh. The `name` argument is treated as the target hostname unless the `hostname` keyword argument is explicitly supplied. By default the user that generated the pyflow suite is used, unless the `user` argument is supplied.

The SSHHost is special in that it does not require the `ecflow_client` to be installed on the remote host and does not require the presence of any shared filesystems or log servers to make output logs visible to the user. All of the `ecflow_client` commands required are executed on the *server side*, and the script output is piped back through the SSH command.

For these connections to be established, it is necessary that the ecflow server is configured to have SSH access to the target systems using SSH keys. Further, as this requires an SSH connection to be maintained for each of the running commands, it imposes a practical limit on the number of commands that can be run simultaneously on any remote host. There may be value in setting up SSH connections that persist across multiple commands, by making use of the `ControlMaster`, `ControlPath` and `ControlPersist` options in the ssh config file.
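
As a sketch of the connection-sharing idea, the snippet below renders an `ssh_config` stanza that enables multiplexing for one host. `ControlMaster`, `ControlPath` and `ControlPersist` are standard OpenSSH client options; the host name `dhs9999`, the socket path and the ten-minute persistence are illustrative assumptions, not values that pyflow sets or requires.

```python
def ssh_multiplexing_stanza(host, persist="10m"):
    """Render an ssh_config stanza that shares one connection per host."""
    lines = [
        f"Host {host}",
        # Reuse an existing master connection, or create one if none exists.
        "    ControlMaster auto",
        # Socket file for the shared connection (%r user, %h host, %p port).
        "    ControlPath ~/.ssh/cm-%r@%h:%p",
        # Keep the master connection open in the background after use.
        f"    ControlPersist {persist}",
    ]
    return "\n".join(lines) + "\n"


stanza = ssh_multiplexing_stanza("dhs9999")
print(stanza)
```

The resulting text would be added to the ssh config file of the user that runs the ecflow server, since that is the account that opens the connections.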

In [78]:
host = pf.SSHHost('dhs9999', user='max', scratch_directory='/data/a_mounted_filesystem/tmp')

The SSHHost class can also take additional optional arguments `indirect_host` and `indirect_user`. If `indirect_host` is supplied then a two-hop connection is made, such that a connection is made to the `indirect_host`, and then a further ssh connection is made to the real host. Note that this is not the same as using a `ProxyCommand` configured to a normal ssh connection - the credentials for the second hop are held on the intermediate system. `indirect_user` defaults to `user` if it is not supplied.

In [79]:
host = pf.SSHHost('cloud-mvr001',
                  user='mover-user',
                  indirect_host='cloud-gateway',
                  indirect_user='cloud-user')

### `PBSHost`

Connects to a remote host by ssh, and submits a job on the batch scheduling system. As this task will run asynchronously on a remote system this *requires* the `ecflow_client` to be available, and if it is not at the default location this should be configured with the `ecflow_path` keyword argument.

It is anticipated that for real use this class will be derived from to add and configure site-specific functionality (such as knowledge of, and handling of, queues). See the example for hosts cca/ccb in `pyflow/hosts/cca.py`.

It is likely that the `log_directory` will need to be modified, and the `ECF_LOGHOST` and `ECF_LOGPORT` variables are likely to be needed to operate with a log server to get output working fully.

***This class is very much a work in progress, and is probably currently too ECMWF-specific***

### `SLURMHost`

This executes scripts on a remote system, by ssh-ing in and submitting to the SLURM job scheduling system. This is very much analogous to the PBSHost.

### `TroikaHost`

This executes scripts on a remote system by ssh-ing in and submitting the jobs using Troika, a tool developed by ECMWF that abstracts the job submission system through configuration files.

## For Discussion

### For Discussion: Limits

`Host` objects accept an argument `limit=`. This can be used to construct a limit (preferably in a sensible location within the suite). Once this has been set up then any `Task` that is created using this host object will automatically be added to the limit for the given host.

Note that this implies that the same host *object* should be used to configure `Tasks` throughout the suite, rather than just using host objects that refer to the same host.

In [80]:
with CourseSuite('limits', host=pf.LocalHost('localhost', limit=3)) as s:
    
    with pf.Family('limits'):
        s.host.build_limits()
        
    pf.Task('t1', script='I am limited')
    
print(s)

suite limits
  defstatus suspended
  edit ECF_FILES '/Users/macw/git/pyflow/tutorials/course/scratch/files/limits'
  edit ECF_HOME '/Users/macw/git/pyflow/tutorials/course/scratch/out'
  edit ECF_JOB_CMD 'bash -c 'export ECF_PORT=%ECF_PORT%; export ECF_HOST=%ECF_HOST%; export ECF_NAME=%ECF_NAME%; export ECF_PASS=%ECF_PASS%; export ECF_TRYNO=%ECF_TRYNO%; export PATH=/Users/macw/opt/miniconda3/envs/pyflow_test/bin:$PATH; ecflow_client --init="$$" && %ECF_JOB% && ecflow_client --complete || ecflow_client --abort ' 1> %ECF_JOBOUT% 2>&1 &'
  edit ECF_KILL_CMD 'pkill -15 -P %ECF_RID%'
  edit ECF_STATUS_CMD 'true'
  edit ECF_OUT '%ECF_HOME%'
  label exec_host "localhost"
  family limits
    limit localhost 3
  endfamily
  task t1
    inlimit /limits/limits:localhost
endsuite



***Discussion: Is this useful functionality to have in pyflow, or should this be nuked?***

### For Discussion: Job Characteristics

In pyflow, a task is generated as a synthesis of multiple pieces of information:

 - The Task object in the suite - *when* to run
 - The Script object (script attribute on Task) - *what* to run
 - The Host object - *how* to run
 
The combination of these three components provides the information to determine *when*, *what*, and *how* a task should be executed. The Host object is important as it provides two major components:

 1. A mechanism by which a task should be executed. This reduces to the ECF_JOB_CMD and associated machinery.
 2. Preamble and Postamble material that is used for constructing the script to execute.
 
Unfortunately, the breakdown is not nearly so clear in real life. Consider the case of one of the HPC machines. We can:

 - Run a task on the head node as a simple SSHHost
 - Submit a serial, fractional or parallel job
 - Submit jobs using various (machine specific) resource requirements
 
This is a problem. Conceptually, properties such as the number of cores and nodes, or whether to use hyperthreading or hugepages, are properties of the Task, but they depend very strongly on the Host.

Currently all properties that determine the execution process must belong to the Host. These can be parameterised to use Ecflow variables that are set on `Families` or `Tasks`, but this is a bit of a hack. We would like this parameterisation to only be needed if those properties should be changeable at runtime (e.g. by the operators).

To Discuss: What should the API look like?

# Deployable Resources

There are many hacks to deploy resources in suites, or resources can be managed and deployed out of band with the suite. It is, however, better to manage versioning of deployed resources in conjunction with the suite. This ensures that a deployed suite always runs what is expected.

Pyflow provides a dedicated mechanism for deploying resources, in which each resource is described by a Resource object. This can include static data files and anything else that should already be in place for tasks to run correctly.

***Do not use Resources to deploy scripts or other executable code.***

The Resource mechanism provides a decoupling between:
    
 1. Specifying what resource should be deployed (at suite generation time)
 2. Obtaining the resource, and on which host this resource should be obtained
 3. On which host(s) the resource should be deployed
 
The host that runs the resource task can be selected by setting the host attribute on the Resources family. The hosts onto which the resources are deployed are specified in a list - this enables the suite to retrieve an external resource only once, even if it needs to be deployed to multiple locations.

In [81]:
# BROKEN
with pf.Suite('s'):
    with pf.Resources(host=pf.LocalHost()):
        pf.DataResource('script', [pf.LocalHost('localhost')], 'some data'.encode('utf-8'))

TypeError: expected str, bytes or os.PathLike object, not NoneType

Data can be retrieved from a number of types of location.

In [2]:
# BROKEN
with pf.Suite('s'):
    with pf.Resources():
        
        # Deploy data directly from the python code
        pf.DataResource('data1', [pf.LocalHost('localhost')], "this is some data".encode('utf-8'))
        
        # Deploy data from a file accessible at generation time
        pf.FileResource('data2', [pf.LocalHost('localhost')], 'path/to/data.dat')
        
        # Deploy data accessible from a URL
        pf.WebResource('data3', [pf.LocalHost('localhost')], 'https://example.com/data')
        pf.WebResource('data4', [pf.LocalHost('localhost')], 'https://example.com/data', md5='0123456789abcdef')

TypeError: expected str, bytes or os.PathLike object, not NoneType

The resource class can be derived from to obtain more complex resources. The FDB test suite has a MARSResource that runs on a host that has a MARS client to obtain test data from MARS, and which is then transferred to the relevant hosts for testing the FDB tools (which do not have a working and configured MARS client able to interact with the operational MARS).

To extend the functionality of a Resource class, the `get_resource` member function should be overridden to return an array of lines that can be combined into a script to be run on the Resource execution host to obtain the data.
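
The shape of such an override can be sketched without pyflow. In the snippet below `ResourceSketch` is a stand-in written for this illustration; it is not pyflow's resource class, and its constructor arguments and `build_script` helper are hypothetical. Only the idea that `get_resource` returns a list of shell lines comes from the description above.

```python
class ResourceSketch:
    """Stand-in base class for illustration; NOT pyflow's resource class."""

    def __init__(self, name):
        self.name = name

    def get_resource(self):
        """Return the shell lines that obtain the data on the execution host."""
        raise NotImplementedError

    def build_script(self):
        # Hypothetical helper: join the returned lines into a single script.
        return "\n".join(self.get_resource())


class CopyResourceSketch(ResourceSketch):
    """Hypothetical resource that copies a file already present on the host."""

    def __init__(self, name, source_path):
        super().__init__(name)
        self.source_path = source_path

    def get_resource(self):
        # One list entry per line of the script that fetches the data.
        return [
            f'echo "fetching {self.name}"',
            f'cp "{self.source_path}" "{self.name}"',
        ]


script = CopyResourceSketch("data5", "/path/to/source.dat").build_script()
print(script)
```

In a real suite the subclass would derive from the pyflow resource class instead, and the signature should be checked against the pyflow source.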

# Script handling in pyflow

Pyflow is designed to facilitate a number of modes of use:

 * Running scripts that already exist, or are developed outside of the pyflow suite.
 * Running scripts that are stand-alone in the pyflow suite.
 * Running scripts that are generated from pieces, and templated using pyflow objects.
 
This enables clean pathways for migrating existing suites, whilst also giving flexibility for generated functionality.

## Script Locations

ecflow uses a well-defined strategy for locating the scripts to run. It looks in the location specified by ECF_FILES if it is specified, or ECF_HOME otherwise (these can be set using the `files=` or `home=` arguments to the Suite or to anchor families above).

Then, given a specific script path /a/b/c/task the following locations will be considered (in order):

```
$ECF_FILES/a/b/c/task
$ECF_FILES/b/c/task
$ECF_FILES/c/task
$ECF_FILES/task
```
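
The search order listed above can be reproduced with a few lines of plain Python. This is only a sketch of the rule as stated here; the function name is made up for the illustration, and file extensions are left out, as in the listing.

```python
def candidate_script_paths(node_path, files_root="$ECF_FILES"):
    """List the locations considered for a node path, most specific first."""
    parts = [part for part in node_path.split("/") if part]
    # Drop one leading path component at a time: a/b/c/task, b/c/task, ...
    return ["/".join([files_root] + parts[i:]) for i in range(len(parts))]


paths = candidate_script_paths("/a/b/c/task")
for path in paths:
    print(path)
```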

This is designed for a use case such as the operational forecast suites, where tasks/families are grouped macroscopically at a high level (e.g. each forecast ensemble member), where all the tasks differ only by ECFLOW variables that have been set.

In pyflow we define a type of Family called an AnchorFamily (a Suite counts as an AnchorFamily for this purpose). The value of ECF_FILES is updated for an AnchorFamily relative to the most recent parent AnchorFamily. All scripts within an AnchorFamily with the same name *must* be identical. Consider the suite layout:

```
Suite(s, files='root-path')
  Task(t1)
  Family(f1)
    Task(t1)
    Task(t2)
    Task(t3)
  Family(f2)
    AnchorFamily(f3)
      Task(t1)
      Family(f1)
        Task(t1)
        Task(t2)
        Task(t3)
```

This will correspond to an on disk arrangement of

```
root-path/
  t1.ecf
  t2.ecf
  t3.ecf
  f2/
    f3/
      t1.ecf
      t2.ecf
      t3.ecf
```

If the scripts are generated within pyflow then the appropriate uniqueness of scripts will be tested at generation time, and they will be automatically deployed to these locations. If scripts are supplied by the user outside of pyflow, they should be supplied to match this structure.
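
The mapping from suite layout to on-disk layout can be sketched in plain Python. The tuple-based tree and the `script_locations` function are hypothetical and exist only for this illustration; they mirror the example layout above and are not how pyflow represents nodes internally.

```python
def task(name):
    return ("task", name, [])


def script_locations(node, anchor_dir, rel, node_path, found):
    """Record, for each script file, the suite nodes that would share it."""
    kind, name, children = node
    path = f"{node_path}/{name}"
    if kind == "task":
        # Scripts live directly in the directory of the nearest anchor.
        found.setdefault(f"{anchor_dir}/{name}.ecf", []).append(path)
        return
    if kind == "anchor":
        # A new anchor directory, relative to the parent anchor.
        anchor_dir = "/".join((anchor_dir,) + rel + (name,))
        rel = ()
    else:
        # Plain families only contribute to the path of a later anchor.
        rel = rel + (name,)
    for child in children:
        script_locations(child, anchor_dir, rel, path, found)


# Children of Suite(s, files='root-path'), as in the layout above.
suite_children = [
    task("t1"),
    ("family", "f1", [task("t1"), task("t2"), task("t3")]),
    ("family", "f2", [
        ("anchor", "f3", [
            task("t1"),
            ("family", "f1", [task("t1"), task("t2"), task("t3")]),
        ]),
    ]),
]

found = {}
for child in suite_children:
    script_locations(child, "root-path", (), "/s", found)

for script_file in sorted(found):
    print(script_file, "<-", ", ".join(found[script_file]))
```

Every script file that is shared by more than one node, such as `root-path/t1.ecf` here, is a place where the scripts of those nodes must be identical.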

## Script Generation
Scripts are generated by a combination of:

 1. The script attribute of the task (a Script object)
 2. Attributes of the Task object
 3. The execution host (which may be an attribute of the Task object, or one of its parents)
 
The simplest example of a script can be seen here

In [83]:
with pf.Suite('s', host=pf.LocalHost('localhost'), files='/s') as s:
    pf.Task('t', script='Running on $ECF_HOST')
    
s.deploy_suite(target=pf.deployment.Notebook)

Note that the script is automatically run with `set -uex`. As such any access to undefined variables, or any commands that fail, will trigger failure of the overall script. If the success of individual commands needs to be tested, this behaviour will need to be selectively turned off (`set +e`).
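
A minimal sketch of selectively disabling the exit-on-error behaviour follows. The script text is run through a local `bash` (if one is found) purely to show the effect; the explicit `set -uex` line imitates what pyflow adds, so in a Task script only the remaining lines would be written. The helper name `run_in_bash` is an assumption of this example.

```python
import shutil
import subprocess

SCRIPT = """\
set -uex
set +e          # allow the next command to fail without aborting
false
rc=$?           # capture the exit status of the command under test
set -e          # restore exit-on-error for the rest of the script
echo "rc=$rc"
"""


def run_in_bash(script):
    """Run the script with bash; return None if bash is not installed."""
    bash = shutil.which("bash")
    if bash is None:
        return None
    return subprocess.run([bash, "-c", script], capture_output=True, text=True)


result = run_in_bash(SCRIPT)
if result is not None:
    # The -x trace goes to stderr; stdout holds only the echo output.
    print(result.stdout, end="")
```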

The script proper is placed within a `%nopp / %end` pair. As such, explicit access to ecflow pre-processing is not available in the script object.

If the host has more complicated behaviour, the preamble and postamble applied are more complex. In particular, if the ecflow_client is (known to be) available on the target host then the relevant environment/ecflow variables are introduced, and the `PATH` is updated such that the ecflow_client is available.

This is also coupled with:

 1. Access to referenced ecflow Variables (or other exportable objects, such as Repeats).
 2. Manuals
 3. Modules
 4. Working directory information

In [84]:
with pf.Suite('s', host=pf.LocalHost(), files='/s', A_VARIABLE='has a value') as s:
    pf.Task('t',
            script='Running on $ECF_HOST\nVariable value $A_VARIABLE',
            manual="This is a multi-line manual\nwhich can contain instructions",
            workdir='/tmp/pyflow/s',
            modules=['ecbuild'])
    
s.deploy_suite(target=pf.deployment.Notebook)

## What is a valid script

Pyflow scripts are instances of the Script class. At generation time, these call some composition functionality to combine script fragments (in the `generate_stub` method) and then call the `generate` method which can be overridden to provide customisable functionality. A number of Script types can be found in the source file `pyflow/script.py`.

Scripts are automatically generated from simple strings or lists of other objects that are convertible to Scripts themselves.

In [85]:
t = pf.Task('t', script='echo "I am a simple script"')
print(type(t.script))
print(t.script)

<class 'pyflow.script.Script'>
echo "I am a simple script"


In [86]:
t = pf.Task('t', script=[
    'echo "I am the first line"',
    'echo "I am the second line"\necho "and I am the third"'
])
print(type(t.script))
print(t.script)

<class 'pyflow.script.Script'>
echo "I am the first line"
echo "I am the second line"
echo "and I am the third"


Scripts can be loaded from files. Additional environment variables can be supplied explicitly (they can also be supplied by the host).
 
In pyflow we aim to minimise the number of environment variables that are made available to scripts and the number of Variables (and other ecflow objects) that are exported to the scripts. This is typically done by analysing the scripts for references to the variables used which are then automatically exported.
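
The kind of analysis described can be illustrated with a small regular-expression scan. This is a simplified sketch of the idea, not pyflow's implementation; the function name and the variable names in the usage lines are made up for the example.

```python
import re

# Matches $NAME and ${NAME} style shell variable references.
VARIABLE_REFERENCE = re.compile(
    r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))"
)


def referenced_variables(script, known_names):
    """Return the known variable names that the script actually references."""
    found = {braced or bare for braced, bare in VARIABLE_REFERENCE.findall(script)}
    # Only variables defined in the suite are candidates for export.
    return sorted(found & set(known_names))


script = 'echo "Running on $ECF_HOST"\necho "Variable value ${A_VARIABLE}"'
to_export = referenced_variables(script, ["A_VARIABLE", "UNUSED_VARIABLE"])
print(to_export)
```

A scan like this only sees references written in the script text, which is why variables consumed inside binaries need the explicit mechanisms.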

There are cases, especially where environment variables are used by opaque binaries, where this exporting cannot be automatic. In these contexts, environment variables can be explicitly exported using the `Script.define_environment_variable(name, value)` function, and pyflow objects can be explicitly exported by using the `Script.force_exported` function. These should be used *minimally* to make scripts work such that we keep generated scripts to minimal length and complexity, and that it is clear what interdependencies actually exist.

i.e. there should not be large numbers of environment variables or ecflow variable exports contained in included header files shared between many tasks.

In [87]:
class Config:
    debug = 1
config = Config()

with pf.Suite('exporting', host=pf.LocalHost()) as s:
    with pf.Task('mars', DEBUG=config.debug) as t:
        t.script = pf.FileScript('sample_script.sh')
        t.script.define_environment_variable("ENV1", 1234)
        t.script.force_exported(t.DEBUG)

s.deploy_suite(target=pf.deployment.Notebook)

## Script templating

It is useful to be able to build scripts out of parameterisable components. These have two major advantages:

 1. Script components can be reused in multiple contexts, which encourages modular and object-oriented suite design.
 2. Referenced pyflow objects (Variables, Tasks, Labels, ...) are expanded at suite/script generation time, and any referencing errors will be caught at that point. This makes it easy to change the names of ecflow nodes and avoid runtime errors from missing symbols (including by typos).
 
Templating uses the Jinja2 engine. This is a very powerful templating engine for building templated scripts in a python environment. From pyflow, objects should be supplied to the templates as arguments to the TemplateScript object or TemplateFileScript object.
 
An example follows where Labels attached to a task are updated according to the ecflow variables:

In [88]:
def update_label(label, text):
    return pf.TemplateScript(
        'ecflow_client --alter=change label {{ LABEL.name }} "{{ TEXT }}" {{ LABEL.parent.fullname }}',
        LABEL=label,
        TEXT=text
    )


with pf.Suite('s', A_VARIABLE=1234) as s:
    pf.RepeatDate('DATE_REPEAT',
                  datetime.date(year=2019, month=1, day=1),
                  datetime.date(year=2019, month=12, day=31))
    
    t = pf.Task('a_task', labels={'date_label': '', 'var_label': '', 'static_label': ''})
    t.script = [
        update_label(t.date_label, s.DATE_REPEAT),
        update_label(t.var_label, s.A_VARIABLE),
        update_label(t.static_label, 'some static text')
    ]
     
print(t.script)    

ecflow_client --alter=change label date_label "$DATE_REPEAT" /s/a_task
ecflow_client --alter=change label var_label "$A_VARIABLE" /s/a_task
ecflow_client --alter=change label static_label "some static text" /s/a_task


Templatable scripts can be loaded from files, and any valid Script object can be used as the input into a TemplateScript object. Once a script object exists, additional parameters can be added using the `add_parameters` method.

In [89]:
with pf.Suite('s', A_VARIABLE=1234) as s:
    pf.RepeatDate('DATE_REPEAT',
                  datetime.date(year=2019, month=1, day=1),
                  datetime.date(year=2019, month=12, day=31))
    
    t = pf.Task('a_task')
    t.script = pf.TemplateFileScript('template_sample_script.sh', TASK=t)
    
    t2 = pf.Task('another_task')
    t2.script = pf.TemplateScript([
            pf.FileScript('sample_script.sh'),
            'Current task: {{ TASK.name }} ({{ TASK.fullname }}, in suite {{ TASK.suite.name }})',
            'Variable {{ VAR.name }} has value {{ VAR }}, and started with value {{ VAR.value }}',
            'And date: {{ DATE }}'
        ],
        TASK=t2
    )
    t2.script.add_parameters(VAR=s.A_VARIABLE, DATE=s.DATE_REPEAT)
    
print(t.script)
print("\n------------------------------------\n")
print(t2.script)

echo "A am a templated sample script"
echo "I belong to task a_task with full path /s/a_task"

------------------------------------

echo "I am a sample script"
echo "With multiple lines"
echo "Env variable ENV1: $ENV1"

Current task: another_task (/s/another_task, in suite s)
Variable A_VARIABLE has value $A_VARIABLE, and started with value 1234
And date: $DATE_REPEAT


# Extra Examples (more complex)

## Conditional Suite Structure

One of the goals of building an Object-Oriented suite is avoiding tangled, procedural complexity in constructing suites. Making a suite configurable and multi-purpose requires conditionality in how the suite is constructed.

The most obvious way to do this is to put conditional expressions, namely if statements, into the suite structure. This works, but leads to a long-term increase in the complexity of the suite. But worse, it puts the configuration- and system-dependent logic about how a suite should be built into the structure of the suite rather than with the configuration where it belongs.

This example shows delegation of conditional behaviour to a configuration, such that the configuration can use arbitrary logic and complexity (in this case just a lookup) to determine which subsections of a suite get built.

In [90]:
class Config:
    
    def __init__(self, **tests):
        
        # Default tests that should be built. Otherwise assume not
        self.enabled_tests = {
            'test3': True
        }
        self.enabled_tests.update(tests)
        
    def build_test(self, cls, name, *args, **kwargs):
        if self.enabled_tests.get(name, False):
            return cls(name, *args, **kwargs)
        
        
class ATest(pf.Task):
    def __init__(self, name, val):
        super().__init__(name, script="echo test={} : val={}".format(name, val))
        

class TestingSuite(CourseSuite):
    
    def __init__(self, name, config, **kwargs):
        super().__init__(name, **kwargs)
        with self:
            config.build_test(ATest, 'test1', 1234)
            config.build_test(ATest, 'test2', 4321)
            config.build_test(ATest, 'test3', 6666)
            config.build_test(ATest, 'test4', 7777)

In [91]:
print(TestingSuite('default_tests', Config()))

suite default_tests
  defstatus suspended
  edit ECF_FILES '/Users/macw/git/pyflow/tutorials/course/scratch/files/default_tests'
  edit ECF_HOME '/Users/macw/git/pyflow/tutorials/course/scratch/out'
  edit ECF_JOB_CMD 'bash -c 'export ECF_PORT=%ECF_PORT%; export ECF_HOST=%ECF_HOST%; export ECF_NAME=%ECF_NAME%; export ECF_PASS=%ECF_PASS%; export ECF_TRYNO=%ECF_TRYNO%; export PATH=/Users/macw/opt/miniconda3/envs/pyflow_test/bin:$PATH; ecflow_client --init="$$" && %ECF_JOB% && ecflow_client --complete || ecflow_client --abort ' 1> %ECF_JOBOUT% 2>&1 &'
  edit ECF_KILL_CMD 'pkill -15 -P %ECF_RID%'
  edit ECF_STATUS_CMD 'true'
  edit ECF_OUT '%ECF_HOME%'
  label exec_host "localhost"
  task test3
endsuite



In [92]:
print(TestingSuite('add_test4', Config(test4=True)))

suite add_test4
  defstatus suspended
  edit ECF_FILES '/Users/macw/git/pyflow/tutorials/course/scratch/files/add_test4'
  edit ECF_HOME '/Users/macw/git/pyflow/tutorials/course/scratch/out'
  edit ECF_JOB_CMD 'bash -c 'export ECF_PORT=%ECF_PORT%; export ECF_HOST=%ECF_HOST%; export ECF_NAME=%ECF_NAME%; export ECF_PASS=%ECF_PASS%; export ECF_TRYNO=%ECF_TRYNO%; export PATH=/Users/macw/opt/miniconda3/envs/pyflow_test/bin:$PATH; ecflow_client --init="$$" && %ECF_JOB% && ecflow_client --complete || ecflow_client --abort ' 1> %ECF_JOBOUT% 2>&1 &'
  edit ECF_KILL_CMD 'pkill -15 -P %ECF_RID%'
  edit ECF_STATUS_CMD 'true'
  edit ECF_OUT '%ECF_HOME%'
  label exec_host "localhost"
  task test3
  task test4
endsuite



In [93]:
print(TestingSuite('override_default_test', Config(test1=True, test2=True, test3=False)))

suite override_default_test
  defstatus suspended
  edit ECF_FILES '/Users/macw/git/pyflow/tutorials/course/scratch/files/override_default_test'
  edit ECF_HOME '/Users/macw/git/pyflow/tutorials/course/scratch/out'
  edit ECF_JOB_CMD 'bash -c 'export ECF_PORT=%ECF_PORT%; export ECF_HOST=%ECF_HOST%; export ECF_NAME=%ECF_NAME%; export ECF_PASS=%ECF_PASS%; export ECF_TRYNO=%ECF_TRYNO%; export PATH=/Users/macw/opt/miniconda3/envs/pyflow_test/bin:$PATH; ecflow_client --init="$$" && %ECF_JOB% && ecflow_client --complete || ecflow_client --abort ' 1> %ECF_JOBOUT% 2>&1 &'
  edit ECF_KILL_CMD 'pkill -15 -P %ECF_RID%'
  edit ECF_STATUS_CMD 'true'
  edit ECF_OUT '%ECF_HOME%'
  label exec_host "localhost"
  task test1
  task test2
endsuite



## Structural Delegation

This first example demonstrates delegating a structural decision to a configuration object. We wish to loop over two different axes - one an integer axis, and the other a string based one. The configuration objects decide how this should be done, and the order of the looping.

Further configuration objects can be derived from `Config1` and `Config2` to update the values, while leaving the structures the same.

Once the suite has delegated construction of the looping structure to the config, the construction of the tasks within the looping structure can be continued in the normal way.

In [94]:
class ConfigBase:
    suite_name = None
    min_integer = 1
    max_integer = 5
    strings = ['a', 'b', 'c', 'd', 'e']
    
    def build_nested_loops(self, **kwargs):
        raise NotImplementedError

class Config1(ConfigBase):
    suite_name = 'config_string_integer'
    def build_nested_loops(self, **kwargs):
        with pf.Family('string_looper'):
            pf.RepeatEnumerated('REPEAT_STRING', self.strings)
            with pf.Family('integer_looper', **kwargs) as inner:
                pf.RepeatInteger('REPEAT_INTEGER', self.min_integer, self.max_integer)
        return inner
    
class Config2(ConfigBase):
    suite_name = 'config_integer_string'
    def build_nested_loops(self, **kwargs):
        with pf.Family('integer_looper'):
            pf.RepeatInteger('REPEAT_INTEGER', self.min_integer, self.max_integer)
            with pf.Family('string_looper', **kwargs) as inner:
                pf.RepeatEnumerated('REPEAT_STRING', self.strings)
        return inner
              
class NestedLoopingSuite(CourseSuite):
    def __init__(self, config):
        super().__init__(config.suite_name)
        
        with self:
            with config.build_nested_loops(labels={'info': ''}) as f:
                (                
                    LabelSetter((f.info, '$REPEAT_INTEGER : $REPEAT_STRING'))
                    >>
                    WaitSeconds(2)
                )
    
s1 = NestedLoopingSuite(Config1())
print(s1)
print('\n------------------------------------------------\n')
s2 = NestedLoopingSuite(Config2())
print(s2)

s1.deploy_suite()
s1.replace_on_server(server_host, server_port)
s2.deploy_suite()
s2.replace_on_server(server_host, server_port)

suite config_string_integer
  defstatus suspended
  edit ECF_FILES '/Users/macw/git/pyflow/tutorials/course/scratch/files/config_string_integer'
  edit ECF_HOME '/Users/macw/git/pyflow/tutorials/course/scratch/out'
  edit ECF_JOB_CMD 'bash -c 'export ECF_PORT=%ECF_PORT%; export ECF_HOST=%ECF_HOST%; export ECF_NAME=%ECF_NAME%; export ECF_PASS=%ECF_PASS%; export ECF_TRYNO=%ECF_TRYNO%; export PATH=/Users/macw/opt/miniconda3/envs/pyflow_test/bin:$PATH; ecflow_client --init="$$" && %ECF_JOB% && ecflow_client --complete || ecflow_client --abort ' 1> %ECF_JOBOUT% 2>&1 &'
  edit ECF_KILL_CMD 'pkill -15 -P %ECF_RID%'
  edit ECF_STATUS_CMD 'true'
  edit ECF_OUT '%ECF_HOME%'
  label exec_host "localhost"
  family string_looper
    repeat enumerated REPEAT_STRING "a" "b" "c" "d" "e"
    family integer_looper
      repeat integer REPEAT_INTEGER 1 5
      label info ""
      task set_labels
      task wait_2
        trigger set_labels eq complete
    endfamily
  endfamily
endsuite


----------------