# Anchor Families

In [1]:
# Following code is needed to preconfigure this notebook
import sys
import os
sys.path.insert(0, os.path.abspath('../../..'))

import pyflow as pf

scratchdir = os.path.join('/', 'path', 'to', 'scratch')
filesdir = os.path.join(scratchdir, 'files')
outdir = os.path.join(scratchdir, 'out')


class CourseSuite(pf.Suite):
    """
    This CourseSuite object will be used throughout the course to provide sensible
    defaults without verbosity
    """
    def __init__(self, name, **kwargs):
        
        config = {
            'host': pf.LocalHost(),
            'files': os.path.join(filesdir, name),
            'home': outdir,
            'defstatus': pf.state.suspended
        }
        config.update(kwargs)
        
        super().__init__(name, **config)


class MyTask(pf.Task):
    
    """Counts to the double of a number, first half using a for loop then a while loop"""
    
    def __init__(self, name, default_value=0, **kwargs):
        
        variables = {
            'HALF': default_value,
            'LIMIT': 2*default_value,
        }
        variables.update(**kwargs)
        
        labels = {
            'counter_label': 'count to {}'.format(2*default_value)
        }
        
        script = [
            'echo "This is a counting task named {}"'.format(name),
            'for i in $(seq 1 $HALF); do echo "count $i/$LIMIT"; done',
            'i=$((HALF+1)); while [ $i -le $LIMIT ]; do echo "count $i/$LIMIT" ; i=$((i+1)); done'
        ]
        
        super().__init__(name,
                         script=script,
                         labels=labels,
                         **variables)


class MyFamily(pf.Family):
    
    def __init__(self, name, counters, **kwargs):
        
        labels = {
            'total_counters': counters
        }
        
        super().__init__(name, labels=labels, **kwargs)
        
        with self:
            pf.sequence(MyTask('{}_{}'.format(name, i), i) for i in range(counters))


class LabelSetter(pf.Task):
    
    def __init__(self, *args, **kwargs):
        """
        Accepts a sequence of label-value tuples
        """
        script = [
            pf.TemplateScript(
                'ecflow_client --alter=change label {{ LABEL.name }} "{{ VALUE }}" {{ LABEL.parent.fullname }}',
                LABEL=label, VALUE=value
            ) for label, value in args
        ]
        
        name = kwargs.pop('name', 'set_labels')
        super().__init__(name, script=script, **kwargs)

The `Family` class is the fundamental building block of a **pyflow** suite. Families serve two distinct roles within suites:

1. Visually grouping related families/tasks
2. Logically grouping related families/tasks from an execution perspective
 
Because of the order in which **ecFlow** searches for scripts within the configured `files` location, by default _all_ tasks with the same name must share the same script in the `files` directory (scripts deployed by **pyflow** are deployed to this directory). Tasks with the same name must therefore either be avoided or written to have identical scripts, which is a significant constraint on encapsulation in object-oriented suite design.

For simple aggregation of tasks, use `pf.Family` or derive from it. This provides minimal encapsulation of tasks, but not of scripts: all tasks with the same name still share the same script. We build a library of such classes and objects so that we can reuse these components (Tasks, Families, Suites) in different contexts. A task class written for a research workflow could, for example, be reused in an operational workflow.

However, different contexts may require differences in suite execution. To keep the suite concise, maintainable and easy to check, these differences are best handled in a single entity rather than spread throughout the suite.

To that end, we introduce a _configuration object_ that handles these differences and configures our objects for each context.

This results in suites that are _configurable_ for different use cases and contexts, so that fundamentally different suites can be generated from the same components.

A configuration object can be constructed manually for different use cases or as a result of parsing configuration files. It can be used to:

* Provide the constants and data that the suites need in specific cases
* Switch functionality on/off or modify it
* Configure the hosts on which the tasks run
* Specify the locations and details of the data to process
 
Most importantly, configuration objects are themselves programmable: they can contain code. Suite components can delegate part of the suite definition to these _configurators_, so the structure of the suite can be determined by logic in the configuration object where necessary.
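As a rough illustration of the idea, the sketch below models a configuration object in plain Python. It is not part of **pyflow**: the class `SuiteConfig`, its fields and the method `task_names` are hypothetical names chosen for this example, and the only point is that logic inside the configurator decides which tasks a suite would contain.

```python
from dataclasses import dataclass


@dataclass
class SuiteConfig:
    """Hypothetical configuration object; none of these names come from pyflow."""
    host_name: str = 'localhost'      # host on which the tasks should run
    data_dir: str = '/path/to/data'   # location of the data to process
    members: int = 1                  # a constant needed by the suite
    enable_archive: bool = False      # switches a piece of functionality on/off

    def task_names(self):
        """Logic in the configurator decides the structure of the suite."""
        names = ['process_{}'.format(i) for i in range(self.members)]
        if self.enable_archive:
            names.append('archive')
        return names


# Two contexts built from the same class, yielding different task lists.
research = SuiteConfig(members=2)
operational = SuiteConfig(host_name='hpc', members=3, enable_archive=True)

print(research.task_names())
print(operational.task_names())
```

In a real suite, a family's `__init__` could receive such an object and create one task per returned name, so the same family class generates a different subtree in each context.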

In [2]:
with CourseSuite('family_example') as s:
    with pf.Family('simple', labels={'example': ''}) as f:
        LabelSetter((f.example, 'example text'))
    MyFamily('derived_family', 5)

s

For more complex functionality, where groups of tasks require encapsulation, we encourage the use of `AnchorFamily`.

The `AnchorFamily` class updates the `files` location according to the relative path of the family from the suite (or previous `AnchorFamily`). Within an `AnchorFamily`, all script lookups are relative to this new location, providing isolation and encapsulation.

All tasks with the same name _within an `AnchorFamily`_ **must share the same script** located in the `files` location _for that `AnchorFamily`_.
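The lookup rule described above can be modelled in a few lines of plain Python. This is a toy model based only on the description in this section, not the **pyflow** or **ecFlow** implementation: the `Node` class and its methods are hypothetical, and the `.ecf` suffix is assumed for scripts.

```python
import posixpath


class Node:
    """Toy stand-in for a suite, family or task; not a pyflow class."""

    def __init__(self, name, parent=None, anchor=False, files=None):
        self.name = name
        self.parent = parent
        self.anchor = anchor      # True for the suite and for anchor families
        self._files = files       # only set explicitly on the suite

    def files(self):
        """Return the script directory that applies at this node."""
        if self._files is not None:
            return self._files
        parent_files = self.parent.files()
        if not self.anchor:
            return parent_files   # plain families inherit the location
        # An anchor appends its path relative to the enclosing anchor (or suite).
        parts = [self.name]
        node = self.parent
        while not node.anchor:
            parts.append(node.name)
            node = node.parent
        return posixpath.join(parent_files, *reversed(parts))

    def script(self):
        """Script path for a task: <files of its parent>/<task name>.ecf."""
        return posixpath.join(self.parent.files(), self.name + '.ecf')


suite = Node('s', anchor=True, files='/path/to/files')
outer = Node('test1', Node('f1', suite))                    # under a plain family
anchor = Node('f', suite, anchor=True)                      # an anchor family
inner = Node('test1', Node('f1', anchor))                   # same task name, scoped

print(outer.script())
print(inner.script())
```

The two tasks share the name `test1` but resolve to different script paths, because the second one sits below an anchor.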

As such it is encouraged to:

* Use `AnchorFamily` to encapsulate independent units within a suite. Typically these are the subtrees that make sense to deploy as a whole.
* Use `Family` to aggregate tasks that could share scripts with each other. This can be within an `AnchorFamily`.

The following example shows a suite with identical task names using different scripts, by scoping them with the `AnchorFamily`.

In [3]:
with CourseSuite('anchor_families', files=filesdir) as s:
    with pf.Family('f1'):
        pf.Task('test1')        # Script <files>/test1.ecf
    with pf.Family('f2'):
        pf.Task('test1')        # Script <files>/test1.ecf
    with pf.AnchorFamily('f'):
        with pf.Family('f1'):
            pf.Task('test1')    # Script <files>/f/test1.ecf
            pf.Task('test2')    # Script <files>/f/test2.ecf
        with pf.Family('f2'):
            pf.Task('test2')    # Script <files>/f/test2.ecf
            
s

This design supports two ways of attaching scripts to identical `Tasks` with different parameters:

* Generate one script per task containing the parameters
* Use one script that is parameterised by the `Variables` on the `Families` and `Tasks`
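The difference between the two options can be sketched with plain strings, independently of **pyflow**. The helper names `baked_in_script` and `shared_script_task` are hypothetical; the shell line mirrors the counting loop used by `MyTask` earlier in this notebook.

```python
def baked_in_script(limit):
    """Option 1: the parameter is written into the script text itself."""
    return 'for i in $(seq 1 {0}); do echo "count $i/{0}"; done'.format(limit)


# Option 2: one script text for every task; the value arrives as a variable.
SHARED_SCRIPT = 'for i in $(seq 1 $LIMIT); do echo "count $i/$LIMIT"; done'


def shared_script_task(limit):
    """Return the (script, variables) pair a task would be built from."""
    return SHARED_SCRIPT, {'LIMIT': limit}


option1 = [baked_in_script(n) for n in (2, 4)]
option2 = [shared_script_task(n) for n in (2, 4)]

print(option1[0] == option1[1])          # scripts differ per task
print(option2[0][0] == option2[1][0])    # script text is shared
```

With the first option every task carries a different script, so same-named tasks need to live in separate anchor families. With the second option the script is identical and only the variables differ, so same-named tasks can share one script file.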