# Wrapping Existing Scripts with Kosh

In this tutorial we show you how to wrap existing command-line scripts and use them in combination with Kosh.

Table of Contents

* [The Script](#script)
* [Script Expectations](#expect)
* [Description of a wrapper](#wrapper)
  * [Initializing](#init)
  * [Adding parameters](#adding)
  * [Positional parameters](#pos)
* [Setting up the notebook](#setting)
* [Part 1: Feeding a single object to the wrapper](#single)
  * [Named parameters](#named)
  * [Positional parameters](#positional)
  * [Mapping parameters names to your objects attributes](#mapping)
  * [Complex attributes mapping](#complex)
* [Part 2: Passing multiple objects to the wrapper](#multiple)


## The script<a id="script"></a>

We will be using a *dummy* script that simply outputs the parameters passed to it.



In [1]:
! python ../tests/baselines/scripts/dummy.py --help

usage: dummy.py [-h] [--param1 PARAM1] [--param2 PARAM2] [--combined COMBINED]
                [--run RUN]

optional arguments:
  -h, --help           show this help message and exit
  --param1 PARAM1      First prameter (default: None)
  --param2 PARAM2      Second parameter (default: None)
  --combined COMBINED  A param that will come from two dataset attributes
                       (default: None)
  --run RUN, -r RUN    run (default: None)


Example:

In [2]:
! python ../tests/baselines/scripts/dummy.py --param1 P1 --param2=P2 --combined=COMB -r my_run blah blah blah

Run:my_run, P1:P1, P2:P2, C:COMB, extras:['blah', 'blah', 'blah']
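
For readers who do not have the repository at hand, a script with this behavior could look roughly like the sketch below. It is a hypothetical reconstruction based only on the `--help` text and the sample output above, not the actual `dummy.py`; the helper names `build_parser` and `describe` are made up for illustration.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction: option names are taken from the --help text above
    parser = argparse.ArgumentParser(
        prog="dummy.py",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("--param1", help="First parameter")
    parser.add_argument("--param2", help="Second parameter")
    parser.add_argument(
        "--combined", help="A param that will come from two dataset attributes"
    )
    parser.add_argument("--run", "-r", help="run")
    return parser


def describe(argv):
    # parse_known_args returns (namespace, leftovers); the leftovers act as "extras"
    args, extras = build_parser().parse_known_args(argv)
    return "Run:{}, P1:{}, P2:{}, C:{}, extras:{}".format(
        args.run, args.param1, args.param2, args.combined, extras
    )


print(describe(["--param1", "P1", "--param2=P2", "--combined=COMB",
                "-r", "my_run", "blah", "blah", "blah"]))
```

Any script that accepts dash-prefixed options and positional arguments in this way can be wrapped the same way.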


## Script expectations<a id="expect"></a>

At the moment, the script must follow these requirements:

 * It must be executed from the command line; it can be written in any language.
 * Arguments are passed either as dash-prefixed options (e.g. `--param1`, `-r`) or positionally.
 
## Setting up the wrapper<a id='wrapper'></a>

### Initializing with an executable<a id='init'></a>

When initializing the wrapper, we must let it know:

 * The `executable` (e.g. `python ../tests/baselines/scripts/dummy.py`)

### Adding parameters to parse<a id='adding'></a>

Once your wrapper is *linked* to the executable, you need to let it know which parameters you want to map.

This is done via the `add_argument` command (similar to the `argparse` module).

The syntax is:
```python
wrapper.add_argument(parameter, feed_attribute, default, mapper, feed_pos)
```

* `parameter`: the name of the parameter in your script (e.g. `--param`).
* `feed_attribute`: the name of the attribute to match to this parameter in your input feeds. By default it is the same as the parameter name (without the leading dashes).
* `default`: the default value to use if the parameter cannot be constructed from the feed objects. The special value `use_default` means the parameter is left out of the command line so that the script picks its own value. Any other value is used to construct the command line.
* `mapper`: a function that takes the **feed object** and the **feed attribute name** as input and returns the value to use. If the function fails, the *default* value is used for this parameter.
* `feed_pos`: index of the object in the input feed that is used to construct this parameter's value. Possible values:
  * `index`: The `index`th object in the feed is used to construct the value.
  * `None`: All objects are scanned in order. Once a value is obtained, subsequent feed objects are ignored.
  * `-1`: All objects are scanned in order. The last value successfully constructed is used.
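
The `feed_pos` rules above can be sketched in plain Python. This is only an illustration of the described behavior, not Kosh's implementation: `resolve_value` is a hypothetical helper, and `SimpleNamespace` objects stand in for Kosh datasets.

```python
from types import SimpleNamespace


def resolve_value(feeds, attribute, default="use_default", feed_pos=None):
    """Pick a value for `attribute` from `feeds` following the rules listed above."""
    if feed_pos is None or feed_pos == -1:
        # Scan every object in order and collect the values that exist
        found = [getattr(obj, attribute) for obj in feeds if hasattr(obj, attribute)]
        if not found:
            return default
        # None keeps the first value found, -1 keeps the last one
        return found[0] if feed_pos is None else found[-1]
    # Explicit index: only that object is consulted
    if 0 <= feed_pos < len(feeds):
        return getattr(feeds[feed_pos], attribute, default)
    return default


# Hypothetical feed objects
a = SimpleNamespace(run="run1")
b = SimpleNamespace(run="run2", param1=2)
c = SimpleNamespace(run="run3", param1=3, param2=3)

print(resolve_value([a, b, c], "param1"))               # first match
print(resolve_value([a, b, c], "param1", feed_pos=-1))  # last match
print(resolve_value([a, b, c], "param2", default="my_p2_default", feed_pos=1))
```

With these objects the three calls print `2`, `3` and `my_p2_default`: the first object providing `param1` is `b`, the last is `c`, and `b` (index 1) has no `param2`.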

### Positional parameters<a id='pos'></a>

Positional parameters can be defined by passing `parameter=''` ***AND*** `feed_attribute='some_name'`.

**IMPORTANT**: Positional parameters are constructed in the order they have been added to the wrapper via the `add_argument` function.

## Setting up the notebook<a id='setting'></a>

Let's import the necessary modules and create an empty store containing a single dataset with no attributes.


In [3]:
import kosh
store = kosh.create_new_db("script_wrapping_tutorial.sql")
dataset = store.create(name="tutorial")

## Part 1: Single object feed<a id='single'></a>

### Named parameters<a id='named'></a>

Let's set up our script wrapper. We will simply tell it to use our dataset's `param1` attribute value as the value for the `--param1` command line argument.

In [4]:
# How to run my script?
my_executable = "python ../tests/baselines/scripts/dummy.py"
wrapper = kosh.utils.KoshScriptWrapper(executable=my_executable)
wrapper.add_argument("--param1")

Now let's use this on our dataset:

In [5]:
# First let's set `param1`
dataset.param1 = "parameter 1"
# By default the call returns the process output and error pipes
o, e = wrapper.run(dataset)
print(o.decode())

Run:None, P1:parameter 1, P2:None, C:None, extras:[]



In [6]:
# If you prefer you can get back the process itself before the call to `communicate`
p = wrapper.run(dataset, call_communicate=False)
print(p.communicate()[0].decode())

Run:None, P1:parameter 1, P2:None, C:None, extras:[]
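
The pair returned by `run` and the object returned with `call_communicate=False` behave like what Python's standard `subprocess` module provides. The sketch below shows that pattern on its own, assuming a `subprocess.Popen`-style process; a tiny Python one-liner stands in for `dummy.py` so that the example is self-contained.

```python
import subprocess
import sys

# Stand-in command: a Python one-liner instead of the tutorial's dummy.py
command = [sys.executable, "-c", "print('hello from the child process')"]

process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# communicate() waits for the process to finish and returns (stdout, stderr) as bytes
out, err = process.communicate()

text = out.decode().strip()
print(text)
print(process.returncode)
```

Getting the process back before `communicate` is called is useful when you want to keep working while the script runs, or inspect its return code yourself.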



You can inspect the call that was generated via `wrapper.constructed_command_line`:

In [7]:
print("We called:", wrapper.constructed_command_line)

We called: python ../tests/baselines/scripts/dummy.py --param1 'parameter 1'


In [8]:
# Let's double check we get the same answer
!python ../tests/baselines/scripts/dummy.py --param1 'parameter 1'

Run:None, P1:parameter 1, P2:None, C:None, extras:[]
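
The quotes around `'parameter 1'` matter: they keep a value containing a space as a single argument. The standard `shlex` module makes this visible. This sketch only illustrates POSIX shell quoting rules and makes no claim about how Kosh builds or splits the command line internally.

```python
import shlex

command_line = "python ../tests/baselines/scripts/dummy.py --param1 'parameter 1'"

# shlex.split applies POSIX shell quoting rules to a command line string
tokens = shlex.split(command_line)
print(tokens)

# shlex.quote protects a value that contains whitespace
quoted = shlex.quote("parameter 1")
print(quoted)
```

The quoted value comes out as one token, `parameter 1`, which is why the script reports `P1:parameter 1`.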


Let's map all parameters

In [9]:
wrapper = kosh.utils.KoshScriptWrapper(executable=my_executable)
wrapper.add_argument("--param1")
wrapper.add_argument("--param2")
wrapper.add_argument("--run")
wrapper.add_argument("--combined")
wrapper.run(dataset)
# note that parameters not mapped to the dataset were not
# constructed by default, letting the script pick their default value
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 'parameter 1'


We can pass the desired values at call time to override any value found in dataset(s):

In [10]:
wrapper.run(dataset,param2="P2")
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 'parameter 1' --param2 'P2'


We can also set our own defaults for unmapped parameters (rather than the script's):

In [11]:
wrapper = kosh.utils.KoshScriptWrapper(executable=my_executable)
wrapper.add_argument("--param1", default="use_default")
wrapper.add_argument("--param2", default="my_p2_default")
wrapper.add_argument("--combined", default="my_Def_combined")
wrapper.run(dataset)
# note that parameters not mapped to the dataset were
# constructed with our new defaults
# while `--run` was left unconstructed
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 'parameter 1' --param2 'my_p2_default' --combined 'my_Def_combined'


In [12]:
# We can mix and match; call-time values always override everything else
wrapper.run(dataset, param2="P2", run="MY RUN")
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 'parameter 1' --param2 'P2' --combined 'my_Def_combined'


In [13]:
# we can also override the dataset mapping
wrapper.run(dataset, param2="P2", run="MY RUN", param1="my forced param1")
# Note that `run` is NOT constructed because our wrapper does NOT know about it
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 'my forced param1' --param2 'P2' --combined 'my_Def_combined'


In [14]:
# We need to let the wrapper know about `run`
wrapper = kosh.utils.KoshScriptWrapper(executable=my_executable)
wrapper.add_argument("--param1", default="use_default")
wrapper.add_argument("--param2", default="my_p2_default")
wrapper.add_argument("--combined", default="my_Def_combined")
wrapper.add_argument("--run")
wrapper.run(dataset, param2="P2", run="MY RUN", param1="my forced param1")
# Note that `run` is now constructed because our wrapper does know about it
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 'my forced param1' --param2 'P2' --combined 'my_Def_combined' --run 'MY RUN'


In [15]:
# we can also let it know about the "-r" alias if we prefer
wrapper = kosh.utils.KoshScriptWrapper(executable=my_executable)
wrapper.add_argument("--param1", default="use_default")
wrapper.add_argument("--param2", default="my_p2_default")
wrapper.add_argument("--combined", default="my_Def_combined")
wrapper.add_argument("-r")

# we need to pass it via "r" though
wrapper.run(dataset, param2="P2", r="MY RUN")
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 'parameter 1' --param2 'P2' --combined 'my_Def_combined' -r 'MY RUN'
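
The precedence seen in the cells above (call-time value first, then the dataset attribute, then the wrapper default, with `use_default` meaning "leave it out") can be sketched in plain Python. This is only an illustration, not Kosh's implementation: `build_named_args` and `USE_DEFAULT` are hypothetical names, `SimpleNamespace` stands in for a dataset, and `repr` is used to mimic the quoting shown in the outputs.

```python
from types import SimpleNamespace

USE_DEFAULT = "use_default"


def build_named_args(declared, feed, **call_time):
    """declared: list of (flag, default) pairs, in declaration order."""
    parts = []
    for flag, default in declared:
        key = flag.lstrip("-")
        if key in call_time:        # 1. value passed at call time
            value = call_time[key]
        elif hasattr(feed, key):    # 2. attribute found on the fed object
            value = getattr(feed, key)
        else:                       # 3. default declared on the wrapper
            value = default
        if value == USE_DEFAULT:    # leave the choice to the script
            continue
        parts.append("{} {!r}".format(flag, value))
    return " ".join(parts)


dataset = SimpleNamespace(param1="parameter 1")
declared = [
    ("--param1", USE_DEFAULT),
    ("--param2", "my_p2_default"),
    ("--combined", "my_Def_combined"),
]

print(build_named_args(declared, dataset))
# `run` is not declared, so the call-time value for it is ignored
print(build_named_args(declared, dataset, param2="P2", run="MY RUN"))
```

Because the loop only visits declared flags, a call-time keyword such as `run` has no effect until the matching argument is added to the wrapper.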


### Positional parameters<a id='positional'></a>

Now let's tell our wrapper that we want some positional parameters.

To map them to our dataset attributes and preserve their order, we need to declare them in the order they should be passed to the script.

In [16]:
wrapper = kosh.utils.KoshScriptWrapper(executable=my_executable)
wrapper.add_argument("--param1", default="use_default")
wrapper.add_argument("--param2", default="my_p2_default")
wrapper.add_argument("--combined", default="my_Def_combined")
wrapper.add_argument("-r")
# Now the positional arguments and their corresponding target attributes on the feed
# Also we can similarly replace the default values
wrapper.add_argument("", feed_attribute="first")
wrapper.add_argument("", feed_attribute="second", default="my_def_second")
wrapper.add_argument("", feed_attribute="third")

# let's set first on our dataset
dataset.first = "positional_1"
wrapper.run(dataset)
# Note that the last positional arg was not constructed
# because its value was not updated from "use_default"
# and no positional argument exists after it
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 'parameter 1' --param2 'my_p2_default' --combined 'my_Def_combined' 'positional_1' 'my_def_second'


Note that the declaration order matters:

In [17]:
wrapper = kosh.utils.KoshScriptWrapper(executable=my_executable)
wrapper.add_argument("--param1", default="use_default")
wrapper.add_argument("--param2", default="my_p2_default")
wrapper.add_argument("--combined", default="my_Def_combined")
wrapper.add_argument("-r")
# Now the positional arguments and their corresponding target attributes on the feed
# Also we can similarly replace the default values
wrapper.add_argument("", feed_attribute="second", default="my_def_second")
wrapper.add_argument("", feed_attribute="first")
wrapper.add_argument("", feed_attribute="third")

# let's set first on our dataset
dataset.first = "positional_1"
wrapper.run(dataset)
# Note that the last positional arg was not constructed
# because its value was not updated from "use_default"
# and no positional argument exists after it
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 'parameter 1' --param2 'my_p2_default' --combined 'my_Def_combined' 'my_def_second' 'positional_1'


### Mapping parameters<a id='mapping'></a>

While being able to map parameters is nice, it is often impractical because metadata names will not exactly match the parameter names the script expects.

As with *positional parameters*, we can pass a `feed_attribute` to point to the corresponding attribute in the feed object.

In our case, let's say that the `run` parameter actually maps to the `name` attribute of our dataset:

In [18]:
wrapper = kosh.utils.KoshScriptWrapper(executable=my_executable)
wrapper.add_argument("--param1", default="use_default")
wrapper.add_argument("--param2", default="my_p2_default")
wrapper.add_argument("--combined", default="my_Def_combined")
wrapper.add_argument("-r", feed_attribute="name")
wrapper.run(dataset)
# Note that `-r` was mapped to `tutorial`, which is our dataset's name
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 'parameter 1' --param2 'my_p2_default' --combined 'my_Def_combined' -r 'tutorial'


### Complex name mapping: using functions<a id='complex'></a>

Sometimes we need a more elaborate way to construct the value.

One can pass a `mapper` function to the `add_argument` command. It takes as input the feed object passed at call time and the feed attribute it is mapped to.

Here we will map `combined` to a path created by joining the `root` attribute of our dataset to its `name` attribute.


In [19]:
import os
wrapper = kosh.utils.KoshScriptWrapper(executable=my_executable)
wrapper.add_argument("--param1", default="use_default")
wrapper.add_argument("--param2", default="my_p2_default")
wrapper.add_argument("--combined", default="my_Def_combined", mapper=lambda x, y: os.path.join(x.root, x.name))
wrapper.add_argument("--run", feed_attribute="name")

# let's set root on our dataset
dataset.root = "/my/root/path"
wrapper.run(dataset)
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 'parameter 1' --param2 'my_p2_default' --combined '/my/root/path/tutorial' --run 'tutorial'
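
As stated earlier, a failing mapper results in the default value being used. A minimal sketch of that fallback, under the assumption that any exception counts as a failure, could look like the following. `apply_mapper` and `join_root_and_name` are hypothetical names, `SimpleNamespace` stands in for a dataset, and `posixpath` is used instead of `os.path` only so that the result is the same on every platform.

```python
import posixpath
from types import SimpleNamespace


def apply_mapper(mapper, feed, attribute, default):
    """Call mapper(feed, attribute); fall back to `default` if it raises."""
    try:
        return mapper(feed, attribute)
    except Exception:
        return default


def join_root_and_name(feed, attribute):
    # Same idea as the lambda above: the attribute name is accepted but not needed
    return posixpath.join(feed.root, feed.name)


complete = SimpleNamespace(name="tutorial", root="/my/root/path")
incomplete = SimpleNamespace(name="tutorial")  # no `root` attribute

print(apply_mapper(join_root_and_name, complete, "combined", "my_Def_combined"))
print(apply_mapper(join_root_and_name, incomplete, "combined", "my_Def_combined"))
```

The first call prints `/my/root/path/tutorial`; the second prints `my_Def_combined` because accessing the missing `root` attribute raises an `AttributeError`.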


We can use this for positional parameters as well.

In this case we will map the third positional argument to the URI of the first associated data source with mime_type "py".

In [20]:
def my_function(obj, attribute):
    associated_source = obj.search(mime_type="py")[0]
    return associated_source.uri

wrapper = kosh.utils.KoshScriptWrapper(executable=my_executable)
wrapper.add_argument("--param1", default="use_default")
wrapper.add_argument("--param2", default="my_p2_default")
wrapper.add_argument("--combined", default="my_Def_combined", mapper=lambda x, y: os.path.join(x.root, x.name))
wrapper.add_argument("--run", feed_attribute="name")
wrapper.add_argument("", feed_attribute="first")
wrapper.add_argument("", feed_attribute="second", default="my_def_second")
wrapper.add_argument("", feed_attribute="third", mapper=my_function)


dataset.associate("../setup.py", "py")
wrapper.run(dataset)
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 'parameter 1' --param2 'my_p2_default' --combined '/my/root/path/tutorial' --run 'tutorial' 'positional_1' 'my_def_second' '/g/g19/cdoutrix/git/kosh/setup.py'


## Part 2: Passing multiple objects to the run command<a id='multiple'></a>

So far we showed how to map parameters to a single object, but we can *feed* many objects to our `run` command.

Via the `feed_pos` argument of `add_argument`, you can control which object of the feed is used to construct each parameter.

By default, the wrapper constructs each parameter value from the first object in the feed that provides it.

For example, let's create the following 3 objects:

In [21]:
d1 = store.create(name='d1', metadata={"run": "run1"})
d2 = store.create(name='d2', metadata={"run": "run2", "param1":2})
d3 = store.create(name='d3', metadata={"run": "run3", "param1":3, "param2":3})

wrapper = kosh.utils.KoshScriptWrapper(executable=my_executable)
wrapper.add_argument("--param1", default="p1_default")
wrapper.add_argument("--param2", default="my_p2_default")
wrapper.add_argument("--run")
wrapper.add_argument("--combined", default='combined_default')


Let's *feed* these 3 objects to our wrapper. They all contain `run`, so the value of `run` is taken from the first object fed to it (`d1`). `param1` is only on two datasets, so the wrapper uses the first value it can construct, which comes from `d2`. Only `d3` has the `param2` attribute, so it is taken from there. Finally, none of them has the `combined` attribute, so the *default value* from the `add_argument` command is used.

In [22]:
wrapper.run(d1, d2, d3)
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 2 --param2 3 --run 'run1' --combined 'combined_default'


Note how changing the *feed* order matters:

In [23]:
wrapper.run(d3, d1, d2)
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 3 --param2 3 --run 'run3' --combined 'combined_default'


This default might not be what you prefer, so you can also tell the wrapper to use the value from the last possible object. To do so, use the `feed_pos=-1` argument:

In [24]:
wrapper = kosh.utils.KoshScriptWrapper(executable=my_executable)
wrapper.add_argument("--param1", default="p1_default", feed_pos=-1)
wrapper.add_argument("--param2", default="my_p2_default", feed_pos=-1)
wrapper.add_argument("--run", feed_pos=-1)
wrapper.run(d1, d2, d3)
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 3 --param2 3 --run 'run3'


Note how `param1` now comes from the last object that provides it. Changing the feed order changes the result:

In [25]:
wrapper.run(d3, d1, d2)
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 2 --param2 3 --run 'run2'


But sometimes you want to control exactly which object from the feed is used.
Let's force `param1` and `param2` to come from the second passed object (`feed_pos=1`, because Python uses zero-based indexing). Since `d2` has no `param2` attribute, the default is used for `--param2`.

In [26]:
wrapper = kosh.utils.KoshScriptWrapper(executable=my_executable)
wrapper.add_argument("--param1", default="p1_default", feed_pos=1)
wrapper.add_argument("--param2", default="my_p2_default", feed_pos=1)
wrapper.add_argument("--run")
wrapper.run(d1, d2, d3)
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 2 --param2 'my_p2_default' --run 'run1'


`feed_pos` can also be used with positional arguments

In [27]:
wrapper = kosh.utils.KoshScriptWrapper(executable=my_executable)
wrapper.add_argument("--param1", default="p1_default", feed_pos=1)
wrapper.add_argument("--param2", default="my_p2_default", feed_pos=1)
wrapper.add_argument("", feed_attribute="name", feed_pos=1)
wrapper.add_argument("", feed_attribute="param1", feed_pos=1)
wrapper.add_argument("--run")
wrapper.run(d1, d2, d3)
print(wrapper.constructed_command_line)

python ../tests/baselines/scripts/dummy.py --param1 2 --param2 'my_p2_default' --run 'run1' 'd2' 2
