# Script Handling

In [1]:
# The following code is needed to preconfigure this notebook
import datetime
import sys
import os
sys.path.insert(0, os.path.abspath('../../..'))

import pyflow as pf

**pyflow** is designed to support several modes of use:

* Running scripts that already exist, or are developed outside of the **pyflow** suite.
* Running scripts that are stand-alone in the **pyflow** suite.
* Running scripts that are generated from pieces, and templated using **pyflow** objects.
 
This provides a clean path for migrating existing suites, while retaining flexibility for generated functionality.

## Script Locations

**ecFlow** uses a well-defined strategy for locating the scripts to run. It looks in the location specified by `ECF_FILES` if that is set, or in `ECF_HOME` otherwise (these can be set using the `files=` or `home=` arguments to the Suite or to anchor families above the task).

Then, given a specific script path `/a/b/c/task`, the following locations will be considered (in order):

```
$ECF_FILES/a/b/c/task
$ECF_FILES/b/c/task
$ECF_FILES/c/task
$ECF_FILES/task
```

This is designed for use cases such as the operational forecast suites, where tasks and families are grouped at a high level (e.g. one group per forecast ensemble member) and the tasks within each group differ only in the **ecFlow** variables that have been set.
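The lookup order listed above can be sketched in a few lines of plain Python. This is only an illustration of the ordering described here, not code taken from **ecFlow** or **pyflow**; the function name `candidate_script_paths` is hypothetical, and, like the listing above, the sketch ignores file extensions.

```python
def candidate_script_paths(ecf_files, node_path):
    """Return the locations to try for a node path, most specific first.

    Hypothetical helper: each step drops the leading component of the
    node path and re-anchors the remainder under `ecf_files`.
    """
    parts = node_path.strip("/").split("/")
    root = ecf_files.rstrip("/")
    return ["/".join([root] + parts[i:]) for i in range(len(parts))]


paths = candidate_script_paths("$ECF_FILES", "/a/b/c/task")
for path in paths:
    print(path)
```

The first entry that exists on disk would be the script that is used, which is why a single script placed high in the tree can serve many similarly named tasks below it.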

In **pyflow** we define a type of Family called an AnchorFamily (a Suite counts as an AnchorFamily for this purpose). The value of `ECF_FILES` is updated for an AnchorFamily relative to the most recent parent AnchorFamily. All scripts with the same name within an AnchorFamily _must_ be identical. Consider the suite layout:

```
Suite(s, files='root-path')
  Task(t1)
  Family(f1)
    Task(t1)
    Task(t2)
    Task(t3)
  Family(f2)
    AnchorFamily(f3)
      Task(t1)
      Family(f1)
        Task(t1)
        Task(t2)
        Task(t3)
```

This corresponds to the following on-disk arrangement:

```
root-path/
  t1.ecf
  t2.ecf
  t3.ecf
  f2/
    f3/
      t1.ecf
      t2.ecf
      t3.ecf
```

If the scripts are generated within **pyflow**, the required uniqueness of scripts is checked at generation time, and the scripts are automatically deployed to these locations. If scripts are supplied by the user outside of **pyflow**, they should be arranged to match this structure.
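The uniqueness rule can be illustrated with a small stand-alone check. This is a sketch of the idea only and does not reproduce the test that **pyflow** performs internally; the function `find_conflicts` and the tuple layout `(anchor_path, task_name, script_text)` are assumptions made for the example.

```python
def find_conflicts(scripts):
    """Return (anchor_path, task_name) pairs whose scripts disagree.

    `scripts` is an iterable of (anchor_path, task_name, script_text)
    tuples. Tasks with the same name under the same anchor share one
    file on disk, so their script text has to match.
    """
    seen = {}
    conflicts = set()
    for anchor, name, text in scripts:
        key = (anchor, name)
        if key in seen and seen[key] != text:
            conflicts.add(key)
        seen.setdefault(key, text)
    return sorted(conflicts)


scripts = [
    ("/s", "t1", "echo one"),              # /s/t1
    ("/s", "t1", "echo one"),              # /s/f1/t1, identical: fine
    ("/s/f2/f3", "t1", "echo other"),      # different anchor: fine
    ("/s/f2/f3", "t1", "echo changed"),    # /s/f2/f3/f1/t1 differs: conflict
]
print(find_conflicts(scripts))
```

Here the two `t1` tasks under the suite anchor agree, whereas the two under the `f3` anchor do not, so only the latter pair is reported.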

## Script Generation

Scripts are generated by a combination of:

1. The script attribute of the task (a Script object)
2. Attributes of the Task object
3. The execution host (which may be an attribute of the Task object, or one of its parents)
 
The simplest example of a script can be seen below.

In [2]:
with pf.Suite('s', host=pf.LocalHost(), files='/s') as s:
    pf.Task('t', script='Running on $ECF_HOST')
    
s.deploy_suite(target=pf.Notebook)

Note that the script is automatically run with `set -uex`. As such, any access to undefined variables, or any command that fails, will trigger failure of the overall script. If the success of individual commands needs to be tested, this behaviour will need to be selectively turned off (`set +e`).
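As a minimal sketch of selectively turning off `set -e`, the snippet below runs a small shell fragment under `sh` with the same `set -uex` options. It does not use **pyflow**: `false` stands in for a command that is allowed to fail, the variable name `rc` is purely illustrative, and a POSIX `sh` is assumed to be available.

```python
import subprocess

# Shell fragment of the kind that could be passed to a Task's `script=` argument.
# `false` stands in for any command whose failure should be tolerated.
fragment = "\n".join([
    "set +e",   # stop aborting on a non-zero exit status
    "false",
    "rc=$?",    # capture the exit status of the command above
    "set -e",   # restore abort-on-error for the rest of the script
    'echo "command exited with status $rc"',
])

# Emulate the shell options described above for generated scripts.
result = subprocess.run(
    ["sh", "-c", "set -uex\n" + fragment],
    capture_output=True,
    text=True,
)

print(result.returncode)
print(result.stdout.strip())
```

Without the `set +e` / `set -e` pair, the failing command would abort the fragment before the exit status could be inspected.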

The script proper is placed within a `%nopp / %end` pair. As such, explicit access to **ecFlow** pre-processing is not available in the script object.

If the host has more complicated behaviour, the preamble and postamble applied are more complex. In particular, if `ecflow_client` is (known to be) available on the target host, then the relevant environment variables are introduced and the `PATH` is updated so that `ecflow_client` can be found.

This is also coupled with:

1. Access to referenced **ecFlow** Variables (or other exportable objects, such as Repeats).
2. Manuals
3. Modules
4. Working directory information

In [3]:
with pf.Suite('s', host=pf.LocalHost(), files='/s', A_VARIABLE='has a value') as s:
    pf.Task('t',
            script='Running on $ECF_HOST\nVariable value $A_VARIABLE',
            manual="This is a multi-line manual\nwhich can contain instructions",
            workdir='/tmp/pyflow/s',
            modules=['ecbuild'])
    
s.deploy_suite(target=pf.Notebook)

### Manual via docstring

**pyflow** also supports writing the text of script manuals as Python docstrings in classes derived from `pf.Task`.

In [4]:
class DocumentedTask(pf.Task):
    """
    This is a multi-line manual
    which can contain instructions
    """


with pf.Suite('s', host=pf.LocalHost(), files='/s') as s:
    DocumentedTask('dt')

s.deploy_suite(target=pf.Notebook)

## What is a valid script?

**pyflow** scripts are instances of the Script class. At generation time, a script first combines its fragments (in the `generate_stub` method) and then calls the `generate` method, which can be overridden to provide custom behaviour. A number of Script types can be found in the source file `pyflow/script.py`.

Scripts are automatically generated from simple strings or lists of other objects that are convertible to Scripts themselves.

In [5]:
t = pf.Task('t', script='echo "I am a simple script"')

print(type(t.script))
print(t.script)

<class 'pyflow.script.Script'>
echo "I am a simple script"


In [6]:
t = pf.Task('t', script=[
    'echo "I am the first line"',
    'echo "I am the second line"\necho "and I am the third"'
])

print(type(t.script))
print(t.script)

<class 'pyflow.script.Script'>
echo "I am the first line"
echo "I am the second line"
echo "and I am the third"


Scripts can be loaded from files. Additional environment variables can be supplied explicitly (they can also be supplied by the host).
 
In **pyflow** we aim to minimise the number of environment variables that are made available to scripts and the number of Variables (and other **ecFlow** objects) that are exported to the scripts. This is typically done by analysing the scripts for references to variables, which are then exported automatically.
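The idea of scanning a script for variable references can be sketched with a regular expression. This is a simplified illustration and not the analysis that **pyflow** actually performs; the function `referenced_variables` and the pattern used are assumptions made for the example.

```python
import re

# Matches shell-style references such as $NAME or ${NAME}.
_VAR_REF = re.compile(r"\$(?:\{(\w+)\}|(\w+))")


def referenced_variables(script_text, known_variables):
    """Return the known variables that the script refers to, sorted by name."""
    found = {braced or bare for braced, bare in _VAR_REF.findall(script_text)}
    return sorted(found & set(known_variables))


script = "\n".join([
    'echo "Running on $ECF_HOST"',
    'echo "Variable value ${A_VARIABLE}"',
    'echo "Home is $HOME"',
])

# Only variables that are both known and referenced would need exporting.
exports = referenced_variables(script, {"A_VARIABLE", "ECF_HOST", "UNUSED"})
print(exports)
```

A textual scan like this cannot see variables that are read by a compiled binary rather than named in the script, which is the limitation the explicit export mechanisms address.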

There are cases, especially where environment variables are used by opaque binaries, where this exporting cannot be automatic. In these cases, environment variables can be explicitly exported using the `Script.define_environment_variable(name, value)` method, and **pyflow** objects can be explicitly exported using the `Script.force_exported` method. These should be used _minimally_, only as far as needed to make scripts work, so that generated scripts stay short and simple and it remains clear which interdependencies actually exist.

In other words, there should not be large numbers of environment variables or **ecFlow** variable exports contained in included header files shared between many tasks.

In [7]:
class Config:
    debug = 1


config = Config()

with pf.Suite('exporting', host=pf.LocalHost()) as s:
    with pf.Task('mars', DEBUG=config.debug) as t:
        t.script = pf.FileScript('sample_script.sh')
        t.script.define_environment_variable("ENV1", 1234)
        t.script.force_exported(t.DEBUG)

s

## Script Templating

It is useful to be able to build scripts out of parameterisable components. This has two major advantages:

1. Script components can be reused in multiple contexts, which encourages modular and object-oriented suite design.
2. Referenced **pyflow** objects (Variables, Tasks, Labels, ...) are expanded at suite/script generation time, and any referencing errors are caught at that point. This makes it easy to rename **ecFlow** nodes and avoids runtime errors from missing symbols (including those caused by typos).
 
Templating uses the Jinja2 engine. This is a very powerful templating engine for building templated scripts in a Python environment. From **pyflow**, objects should be supplied to the templates as arguments to the TemplateScript object or TemplateFileScript object.
 
An example follows where Labels attached to a task are updated according to the **ecFlow** variables.

In [8]:
def update_label(label, text):
    return pf.TemplateScript(
        'ecflow_client --alter=change label {{ LABEL.name }} "{{ TEXT }}" {{ LABEL.parent.fullname }}',
        LABEL=label,
        TEXT=text
    )


with pf.Suite('s', A_VARIABLE=1234) as s:
    pf.RepeatDate('DATE_REPEAT',
                  datetime.date(year=2019, month=1, day=1),
                  datetime.date(year=2019, month=12, day=31))
    
    t = pf.Task('a_task', labels={'date_label': '', 'var_label': '', 'static_label': ''})
    t.script = [
        update_label(t.date_label, s.DATE_REPEAT),
        update_label(t.var_label, s.A_VARIABLE),
        update_label(t.static_label, 'some static text')
    ]
     
print(t.script)

ecflow_client --alter=change label date_label "$DATE_REPEAT" /s/a_task
ecflow_client --alter=change label var_label "$A_VARIABLE" /s/a_task
ecflow_client --alter=change label static_label "some static text" /s/a_task


Templatable scripts can be loaded from files, and any valid Script object can be used as the input into a TemplateScript object. Once a script object exists, additional parameters can be added using the `add_parameters` method.

In [9]:
with pf.Suite('s', A_VARIABLE=1234) as s:
    pf.RepeatDate('DATE_REPEAT',
                  datetime.date(year=2019, month=1, day=1),
                  datetime.date(year=2019, month=12, day=31))
    
    t = pf.Task('a_task')
    t.script = pf.TemplateFileScript('template_sample_script.sh', TASK=t)
    
    t2 = pf.Task('another_task')
    t2.script = pf.TemplateScript([
            pf.FileScript('sample_script.sh'),
            'Current task: {{ TASK.name }} ({{ TASK.fullname }}, in suite {{ TASK.suite.name }})',
            'Variable {{ VAR.name }} has value {{ VAR }}, and started with value {{ VAR.value }}',
            'And date: {{ DATE }}'
        ],
        TASK=t2
    )
    t2.script.add_parameters(VAR=s.A_VARIABLE, DATE=s.DATE_REPEAT)