# SoS Actions and common action options

* **Difficulty level**: intermediate
* **Time needed to learn**: 10 minutes or less
* **Key points**:
  
  

## SoS Action

Although arbitrary Python functions can be used in SoS steps, SoS defines many special functions called **`actions`** that accept a set of shared parameters and can behave differently in different execution modes of SoS.

For example, the function call `time.sleep(1)` is executed in run mode,

```
%run
[0]
import time
st = time.time()
time.sleep(1)
print('I just slept {:.2f} seconds'.format(time.time() - st))
```

```
I just slept 1.01 seconds
```


and also in dryrun mode (option `-n`),

```
%run -n
[0]
import time
st = time.time()
time.sleep(1)
print('I just slept {:.2f} seconds'.format(time.time() - st))
```

```
I just slept 1.00 seconds
```


because these statements are regular Python code. However, if you put the statements in a `python` action, they are executed in run mode,

```
%run
[0]
python:
    import time
    st = time.time()
    time.sleep(1)
    print('I just slept {:.2f} seconds'.format(time.time() - st))
```

```
I just slept 1.00 seconds
```


but in dryrun mode (option `-n`) the action only prints the script it would execute:

```
%run -n
[0]
python:
    import time
    st = time.time()
    time.sleep(1)
    print('I just slept {:.2f} seconds'.format(time.time() - st))
```

```
python:
import time
st = time.time()
time.sleep(1)
print('I just slept {:.2f} seconds'.format(time.time() - st))
```
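The mode-dependent behavior of an action can be pictured with a minimal sketch. The function `python_action` and its `dryrun` flag are hypothetical illustrations, not SoS's API. Like the `python` action above, it runs the script in run mode and only prints it in dryrun mode.

```python
import subprocess
import sys
from typing import Optional


def python_action(script: str, dryrun: bool = False) -> Optional[str]:
    """Hypothetical stand-in for an SoS script-executing action.

    In dryrun mode the script is only printed. Otherwise it is executed
    by the current Python interpreter and its standard output is returned.
    """
    if dryrun:
        # Mimic the dryrun output above: action name followed by the script
        print("python:")
        print(script)
        return None
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    code = "print('hello from the action')"
    print(python_action(code), end="")  # executes the script
    python_action(code, dryrun=True)    # only prints the script
```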




## Action options

Actions define their own parameters but their execution is controlled by a common set of options.

### `active`

Action option `active` is used to activate or deactivate an action in an input loop. It accepts either a condition that evaluates to a boolean value (`True` or `False`), or one or more integers or slices that correspond to the indexes of active substeps.

The first usage allows you to execute an action only if a certain condition is met, so

```
if cond:
  action(param)
```

is equivalent to

```
action(param, active=cond)
```
or
```
action: active=cond
  param
```
in script format. For example, the following action will only be executed if `a.txt` exists

```
!echo "something" > a.txt

sh: active=path('a.txt').exists()
  echo `wc a.txt`
```

```
1 1 10 a.txt
```


For the second usage, when a loop is defined by the `for_each` or `group_by` option of the `input:` statement, each action after the `input:` statement is repeated for each substep. The `active` parameter then accepts a non-negative integer, a negative integer (counting backward from the last substep), a sequence of indexes, or a slice object, and the action is only active for the substeps it selects.

For example, for an input loop over a sequence of numbers, the first `run` action is executed for all groups, the second only for groups with even indexes, and the last only for the last group.

```
seq = range(5)
input: for_each='seq'
run: expand=True
   echo I am active at all groups {_index}
run: active=slice(None, None, 2), expand=True
   echo I am active at even groups {_index}
run: active=-1, expand=True
   echo I am active at last group {_index}
```

```
echo I am active at all groups 0
I am active at all groups 0
echo I am active at even groups 0
I am active at even groups 0
echo I am active at all groups 1
I am active at all groups 1
echo I am active at all groups 2
I am active at all groups 2
echo I am active at even groups 2
I am active at even groups 2
echo I am active at all groups 3
I am active at all groups 3
echo I am active at all groups 4
I am active at all groups 4
echo I am active at even groups 4
I am active at even groups 4
echo I am active at last group 4
I am active at last group 4
```
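The selection rules for `active` can be sketched as a small predicate. `is_active` is a hypothetical helper written only to illustrate the semantics described above. It is not part of SoS, and the handling of nested sequences is an assumption.

```python
from typing import Iterable, Union

ActiveSpec = Union[bool, int, slice, Iterable[int]]


def is_active(active: ActiveSpec, index: int, n_substeps: int) -> bool:
    """Hypothetical: is substep `index` (0-based) active out of n_substeps?"""
    if isinstance(active, bool):
        # Condition form: the same value applies to every substep.
        # Checked first because bool is a subclass of int.
        return active
    if isinstance(active, int):
        # Negative integers count backward from the last substep
        return index == (active if active >= 0 else n_substeps + active)
    if isinstance(active, slice):
        return index in range(n_substeps)[active]
    # A sequence of indexes: active if any element selects this substep
    return any(is_active(a, index, n_substeps) for a in active)


if __name__ == "__main__":
    n = 5
    for spec in (True, slice(None, None, 2), -1, [0, 3]):
        print(spec, [i for i in range(n) if is_active(spec, i, n)])
```

With five substeps, the slice selects indexes 0, 2 and 4 and `-1` selects index 4. This matches the output of the example above.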


### `allow_error`

Option `allow_error` tells SoS that the action might fail, but that this should not stop the workflow from executing. This option essentially turns an error into a warning message and changes the return value of the action to `None`.

In the following example, the invalid shell script stops the execution of the step, so the statement after the action is not executed.

```
%sandbox --expect-error
run:
    This is not shell
print('Step after run')
```

```
This is not shell
/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmp61jtc31t.sh: line 1: This: command not found

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
script_-6007614678196007106 in <module>
      run(r"""This is not shell
----> """)
      print('Step after run')

RuntimeError: Failed to execute commmand "/bin/bash -ev /var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmp61jtc31t.sh" (ret=127, workdir=/private/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmp6gu8uy4m, script now in /var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmp6gu8uy4m/.sos/scratch_0_0_e3a03a5f.sh)
```


With `allow_error=True`, however, the error of the `run` action is turned into a warning message and the statement after the action is still executed.

```
run: allow_error=True
    This is not shell
print('Step after run')
```

```
This is not shell
/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmp9ldhoswt.sh: line 1: This: command not found

Step after run
```
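The error-to-warning behavior of `allow_error` can be sketched as follows. `run_action` is a hypothetical runner, not SoS code. To keep the example portable, it runs the script with the current Python interpreter instead of `bash`.

```python
import subprocess
import sys
import warnings
from typing import Optional


def run_action(script: str, allow_error: bool = False) -> Optional[subprocess.CompletedProcess]:
    """Hypothetical action runner illustrating allow_error semantics.

    A failing script raises RuntimeError unless allow_error is True,
    in which case a warning is issued and None is returned.
    """
    try:
        return subprocess.run([sys.executable, "-c", script], check=True,
                              capture_output=True, text=True)
    except subprocess.CalledProcessError as err:
        message = f"Failed to execute script (ret={err.returncode})"
        if not allow_error:
            raise RuntimeError(message) from err
        warnings.warn(message)
        return None


if __name__ == "__main__":
    result = run_action("This is not Python", allow_error=True)
    print("Statement after action, result =", result)
```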


### `args`

All script-executing actions accept an option `args`, which changes how the script is executed.

By default, such an action has an `interpreter` (e.g. `bash`) and a default `args='{filename:q}'`, and the script is executed as `interpreter args`, which for `bash` is
```
bash {filename:q}
```
where `{filename:q}` would be replaced by the script file created from the body of the action.

If you would like to add parameters to the command line, or use a different format for the filename, you can specify an alternative `args` using the variables `filename` (name of the temporary script file) and `script` (actual content of the script).

For example, option `-n` can be added to command `bash` to check the script in dryrun mode, which only checks its syntax without executing it

```
bash: args='-n {filename:q}'
    echo "-n means running in dryrun mode (only check syntax)"
```

You can also execute a command without a script file, passing the content of the script directly on the command line

```
python: args='-m timeit {script}'
    '"-".join(str(n) for n in range(100))'
```

```
10000 loops, best of 3: 32.1 usec per loop
```
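The way `interpreter` and `args` combine into a command line can be sketched with Python's `string.Formatter`. `QuoteFormatter` and `build_command` are hypothetical names, and treating `:q` as "shell-quote with `shlex.quote`" is an assumption about what that format spec does in SoS.

```python
import shlex
import string


class QuoteFormatter(string.Formatter):
    """str.format variant where the ':q' spec means 'shell-quote' (assumption)."""

    def format_field(self, value, format_spec):
        if format_spec == "q":
            return shlex.quote(str(value))
        return super().format_field(value, format_spec)


def build_command(interpreter: str, args: str, filename: str, script: str) -> str:
    """Hypothetical: expand `args` and prepend the interpreter, if any."""
    expanded = QuoteFormatter().format(args, filename=filename, script=script)
    return f"{interpreter} {expanded}" if interpreter else expanded


if __name__ == "__main__":
    print(build_command("bash", "{filename:q}", "/tmp/my script.sh", ""))
    print(build_command("bash", "-n {filename:q}", "/tmp/a.sh", ""))
    print(build_command("python", "-m timeit {script}", "", "'1 + 1'"))
```

Quoting the filename matters when a temporary path contains spaces. `{script}` is inserted verbatim, so the script body itself has to be valid on the command line.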


### `container` and `engine`

Parameters `container` and `engine` specify the name or URL of the container and the engine used to execute the action in it. Parameter `engine` is usually derived from `container` but can be specified explicitly as one of

* `engine='docker'`: Execute the script in specified container using [docker](https://www.docker.com/)
* `engine='singularity'`: Execute the script with [singularity](https://www.sylabs.io/)
* `engine='local'`: Execute the script locally, this is the default mode.

Parameters `container` and `engine` accept the following values:

| `container` | `engine` | executed by | example | comment |
| -- | -- | -- | -- | -- |
| `tag` | | docker | `container='ubuntu'` | docker is the default container engine |
| `name` | `docker` | docker | `container='ubuntu', engine='docker'` | treat `name` as docker tag |
| `docker://tag` | | docker | `container='docker://ubuntu'` | |
| `filename.simg` | | singularity | `container='ubuntu.simg'` | |
| `shub://tag` | | singularity | `container='shub://GodloveD/lolcow'` | image will be pulled to a local image |
| `name` | `singularity` | singularity | `container='a_dir', engine='singularity'` | treat `name` as singularity image file or directory |
| `docker://tag` | `singularity` | singularity | `container='docker://godlovdc/lolcow', engine='singularity'` | |
| `file://filename` | | singularity | `container='file://ubuntu.simg'` | |
| `local://name` | | local | `container='local:any_tag'` | `local://any_tag` is equivalent to `engine='local'` |
| `name` | `local` | local | `engine=engine` with `parameter: engine='docker'` | usually used to override parameter `container` |

Basically,
* `container='tag'` pulls and uses docker image `tag`
* `container='filename.simg'` uses an existing singularity image
* `container='shub://tag'` pulls and uses singularity image `shub://tag`, which will generate a local `tag.simg` file
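The rules in the table can be condensed into a lookup function. `guess_engine` is a hypothetical helper that mirrors the table above and is not SoS's implementation. It accepts both the `local:` and `local://` prefixes because the table shows both forms.

```python
from typing import Optional


def guess_engine(container: str, engine: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the container/engine table."""
    if engine:
        # An explicitly specified engine always wins
        return engine
    if container.startswith("local:"):
        # Covers both 'local:name' and 'local://name'
        return "local"
    if container.startswith(("shub://", "file://")) or container.endswith(".simg"):
        return "singularity"
    # 'docker://tag' and plain tags go to docker, the default engine
    return "docker"
```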

If a docker image is specified, the action is assumed to be executed in the specified docker container. The image will be automatically downloaded (pulled) if it is not available locally. 

For example, executing the following script 

```
[10]
python3: container='python'
  set = {'a', 'b'}
  print(set)
```

under a docker terminal (that is connected to the docker daemon) will

1. Pull docker image `python`, which is the official docker image for Python 2 and 3.
2. Create a python script with the specified content
3. Run the docker container `python` and make the script available inside the container
4. Use the `python3` command inside the container to execute the script.

Additional `docker run` parameters can be passed to actions when the action
is executed in a docker image. These options include

* `name`: name of the container (option `--name`)
* `tty`: if a tty is attached (default to `True`, option `-t`)
* `stdin_open`: if stdin should be open (default to `False`, option `-i`)
* `user`: username (default to `root`, option `-u`)
* `environment`: a string, a list of strings, or a dictionary of environment variables for docker (option `-e`)
* `volumes`: shared volumes as a string or list of strings, in the format of `hostdir` (for `hostdir:hostdir`) or `hostdir:mnt_dir`, in addition to the current working directory, which is always shared.
* `volumes_from`: container names or IDs to get volumes from
* `port`: port opened (option `-p`)
* `extra_args`: any extra arguments you would like to pass to the `docker run` process (after checking the actual `docker run` command used by SoS)
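To show how these options relate to `docker run` flags, here is a hypothetical mapping. `docker_run_args` is not SoS's command builder. The flags `--name`, `-t`, `-i`, `-u`, `-e` and `-p` come from the list above. Using the standard `docker run` flags `-v` and `--volumes-from` for `volumes` and `volumes_from` is an assumption, and the always-shared working directory is not modeled.

```python
import shlex
from typing import Dict, List, Optional, Sequence, Union


def docker_run_args(
    image: str,
    name: Optional[str] = None,
    tty: bool = True,
    stdin_open: bool = False,
    user: str = "root",
    environment: Union[str, Sequence[str], Dict[str, str], None] = None,
    volumes: Union[str, Sequence[str], None] = None,
    volumes_from: Union[str, Sequence[str], None] = None,
    port: Union[str, int, None] = None,
    extra_args: str = "",
) -> List[str]:
    """Hypothetical mapping of action options onto a `docker run` command."""
    cmd = ["docker", "run"]
    if name:
        cmd += ["--name", name]
    if tty:
        cmd.append("-t")
    if stdin_open:
        cmd.append("-i")
    cmd += ["-u", user]
    # environment may be a string, a list of 'KEY=VALUE' strings, or a dict
    if isinstance(environment, dict):
        environment = [f"{k}={v}" for k, v in environment.items()]
    elif isinstance(environment, str):
        environment = [environment]
    for env in environment or []:
        cmd += ["-e", env]
    if isinstance(volumes, str):
        volumes = [volumes]
    for vol in volumes or []:
        # 'hostdir' is shorthand for 'hostdir:hostdir'
        cmd += ["-v", vol if ":" in vol else f"{vol}:{vol}"]
    if isinstance(volumes_from, str):
        volumes_from = [volumes_from]
    for source in volumes_from or []:
        cmd += ["--volumes-from", source]
    if port is not None:
        cmd += ["-p", str(port)]
    cmd += shlex.split(extra_args)
    cmd.append(image)
    return cmd
```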

Because of the different configurations of docker images, use of docker in SoS can be complicated. Please refer to http://vatlab.github.io/SOS/doc/tutorials/SoS_Docker_Guide.html for details.


### `docker_image` (deprecated)

`docker_image='tag'` is now replaced with `container='tag'` or `container='docker://tag'` 

### `default_env`

Option `default_env` sets environment variables **only if they do not already exist in the system**. The value of this option should be a dictionary with string keys and values.

### `docker_file` (deprecated)

This option allows you to import a docker image from the specified `docker_file`, which can be an archive file (`.tar`, `.tar.gz`, `.tgz`, `.bzip`, `.tar.xz`, `.txz`) or a URL to an archive file (e.g. `http://example.com/exampleimage.tgz`). SoS will use the command `docker import` to import the `docker_file`. However, because SoS does not know the repository and tag names of the imported image, you will still need to use option `docker_image` to specify the image to use.

### `env`

Option `env` sets environment variables **that override system variables defined in `os.environ`**. This option can be used to define `PATH` and other environment variables for the action.
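The difference between `default_env` and `env` can be sketched as two ways of merging dictionaries. `action_environment` is a hypothetical helper. The precedence used when a variable appears in both options is an assumption.

```python
import os
from typing import Dict, Mapping, Optional


def action_environment(
    env: Optional[Mapping[str, str]] = None,
    default_env: Optional[Mapping[str, str]] = None,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical: compute the environment an action would run with."""
    result = dict(os.environ if base is None else base)
    # default_env only fills in variables that are not already defined
    for key, value in (default_env or {}).items():
        result.setdefault(key, value)
    # env overrides existing values. Applying it last is an assumption
    # about precedence when a key appears in both options.
    result.update(env or {})
    return result
```

The resulting dictionary could then be passed as the `env` argument of `subprocess.run`.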

### `input`

Parameter `input` specifies the input files that an action needs before it can be executed. However, unlike targets in the `input:` statement of a step, where a missing input target triggers the execution of an auxiliary step (if one is defined) to produce it, SoS yields an error if an input file of an action does not exist.

In the following example, step `20` is executed after step `10`, so its `report` action can report the content of `a.txt` produced by step `10`.

```
%sandbox
%run
[10]
output: 'a.txt'
bash:
    echo 'content of a.txt' > a.txt

[20]
report: input='a.txt'
```

```
content of a.txt
```



However, in the following example, step `20` is executed as the first step of workflow `default`. Its `report` action requires input file `a.txt`, which does not exist yet, so it yields an error.

```
%sandbox --expect-error
%run
[a: provides='a.txt']
bash:
    echo 'content of a.txt' > a.txt

[20]
report: input='a.txt'
```

```
ValueError: Input file a.txt does not exist.
```


`a.txt` has to be put into the input statement of step `20` for the auxiliary step to be executed:

```
%sandbox
%run
[a: provides='a.txt']
bash:
    echo 'content of a.txt' > a.txt

[20]
input: 'a.txt'
report: input=_input[0]
```

```
content of a.txt
```



Although all actions accept parameter `input` and SoS always checks the existence of the specified input files, the actions themselves might or might not make use of this parameter. Roughly speaking, script-executing actions such as `run`, `bash` and `python` accept this parameter and prepend the content of all input files to the script; report-generation actions such as `report`, `pandoc` and `RMarkdown` append the content of input files after the specified script; and other actions usually ignore this parameter.

For example, if you have a function that needs to be included in a Python script (more likely multiple scripts), you could define it in a separate file and include it with scripts defined in a `python` action: 

```
%run
# define a function and save to file myfunc.inc
report: output="myfunc.inc"
  def myfunc():
    print("Hello")

[1]
python: input='myfunc.inc'
    myfunc()
```

```
Hello
```
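Checking input files and combining them with the script, as described above, can be sketched as follows. `compose_script` is a hypothetical helper, and its `input_files` parameter stands in for the action's `input` option. Joining the parts with a newline is an assumption about how SoS concatenates content.

```python
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def compose_script(script: str,
                   input_files: Union[PathLike, Iterable[PathLike]] = (),
                   prepend: bool = True) -> str:
    """Hypothetical: combine input files with an action's script.

    prepend=True mimics script-executing actions (inputs before the script);
    prepend=False mimics report-generation actions (inputs after it).
    """
    files = [input_files] if isinstance(input_files, (str, Path)) else list(input_files)
    # Every input file must exist before the action runs
    for f in files:
        if not Path(f).exists():
            raise ValueError(f"Input file {f} does not exist.")
    contents = [Path(f).read_text() for f in files]
    parts = contents + [script] if prepend else [script] + contents
    return "\n".join(parts)
```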


### `output`

Similar to `input`, parameter `output` defines the output of an action, which can be a single name (or target) or a list of files or targets. SoS checks the existence of the output targets after the completion of the action. For example,

```
%sandbox --expect-error
%run
[10]
bash: output='non_existing.txt'
```

```
RuntimeError: Output target non_existing.txt does not exist after completion of action bash
```


In addition to checking the existence of input and output files, specifying `input` and `output` of an action allows SoS to create a signature for the action, so that the action is not executed again when it is called with the same input and output files. This is in addition to step-level signatures and can be useful for long-running actions.

For example, suppose the time-consuming action `sh` produces output `test.txt`

```
%run -s default
[10]
import time, os
time.sleep(2)

sh: input=[], output='test.txt'
   touch test.txt

print(os.path.getmtime('test.txt'))
```

```
1512781007.0
```


Because the action has parameters `input` and `output`, a signature will be created, so the action will not be re-executed even when the step itself is changed (from `sleep(2)` to `sleep(1)`).

```
%run -s default
[10]
import time, os
time.sleep(1)

sh: input=[], output='test.txt'
   touch test.txt

print(os.path.getmtime('test.txt'))
```

```
1512781008.0
```


Note that we have to use option `-s default` in these examples because the default signature mode of SoS in Jupyter is `ignore`, so no signatures are saved or used by default.

### `stdout`

Option `stdout` is applicable to script-executing actions such as `bash` and `R` and redirects the standard output of the action to the specified file. The value of the option should be a path-like object (`str`, `path`, etc.) or `False`. The file is opened in `append` mode, so you will have to remove or truncate it first if it already exists and you do not want to keep its content. If `stdout=False`, the output is suppressed (redirected to `/dev/null` under Linux).

For example,

```
!rm -f test.log

sh: stdout='test.log'
    ls *.ipynb
```



```
!head -2 test.log
```

```
Extending_SoS.ipynb
Language_Module.ipynb
```


### `stderr`

Option `stderr` is similar to `stdout` but redirects the standard error output of actions. `stderr=False` also suppresses stderr.
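The redirection rules for `stdout` and `stderr` (append to a file, or discard with `False`) can be sketched with `subprocess`. `run_redirected` is a hypothetical helper, and passing `None` to mean "no redirection" is an assumption of this sketch.

```python
import contextlib
import subprocess


def _target(spec, stack):
    """Map an option value to something subprocess accepts."""
    if spec is False:
        return subprocess.DEVNULL          # suppress the stream
    if spec is None:
        return None                        # inherit from the parent (assumption)
    # Open in append mode, so repeated runs accumulate output
    return stack.enter_context(open(spec, "a"))


def run_redirected(cmd, stdout=None, stderr=None):
    """Hypothetical: run cmd with stdout/stderr redirected like the options above."""
    with contextlib.ExitStack() as stack:
        return subprocess.run(cmd, stdout=_target(stdout, stack),
                              stderr=_target(stderr, stack), check=True)
```

Running a command twice with the same `stdout` file leaves both outputs in it, which is why the example above removes `test.log` first.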

### `tracked`

If an action takes a long time to execute and the step it resides in tends to change (for example, during the development of a workflow step), you might want to keep action-level signatures so that the action can be skipped if it has been executed before.

Action-level signature is controlled by parameter `tracked`, which can be `None` (no signature), `True` (record signature), `False` (do not record signature), a string (filename), or a list of filenames. When this parameter is `True` or one or more filenames, SoS will

1. if specified, collect targets specified by parameter `input`
2. if specified, collect targets specified by parameter `output`
3. if one or more files are specified, collect targets from parameter `tracked`

These files, together with the content of the first parameter (usually a script), are used to create an action-level signature, so that an action with a saved signature can be skipped.
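The signature idea can be sketched by hashing the script together with the names and contents of the collected files. `action_signature` and `SignatureStore` are hypothetical. The choice of SHA-256 and the in-memory store are assumptions, and a real implementation would need to persist signatures between runs.

```python
import hashlib
from pathlib import Path
from typing import Callable, Iterable, Set


def action_signature(script: str, input_files: Iterable[str] = (),
                     output_files: Iterable[str] = (),
                     tracked_files: Iterable[str] = ()) -> str:
    """Hypothetical: hash the script plus the names and contents of all targets."""
    digest = hashlib.sha256(script.encode())
    for label, group in (("input", input_files), ("output", output_files),
                         ("tracked", tracked_files)):
        for name in sorted(group):
            digest.update(f"{label}:{name}".encode())
            path = Path(name)
            digest.update(path.read_bytes() if path.exists() else b"<missing>")
    return digest.hexdigest()


class SignatureStore:
    """Hypothetical in-memory signature store."""

    def __init__(self) -> None:
        self._saved: Set[str] = set()

    def run(self, action: Callable[[], object], script: str, **targets) -> bool:
        """Run `action` unless a saved signature matches. Return True if it ran."""
        if action_signature(script, **targets) in self._saved:
            return False
        action()
        # Record the signature after execution, when the outputs exist
        self._saved.add(action_signature(script, **targets))
        return True
```

The action is skipped only if the script is unchanged and the collected files still match what was recorded. Editing the script or modifying an output forces re-execution.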

For example, suppose the time-consuming action `sh` produces output `test.txt`

```
%run -s force
[10]
import time, os
time.sleep(2)

sh: output='test.txt', tracked=True
   touch test.txt

print(os.path.getmtime('test.txt'))
```

```
1512781011.0
```


Because of the `tracked=True` parameter, a signature will be created with `output` and it will not be re-executed even when the step itself is changed (from `sleep(2)` to `sleep(1)`).

```
%run -s default
[10]
import time, os
time.sleep(1)

sh: output='test.txt', tracked=True
   touch test.txt

print(os.path.getmtime('test.txt'))
```

```
INFO: Action sh is ignored due to saved signature
1512781011.0
```


Note that the signature can only be saved and used with appropriate signature mode (`force`, `default` etc).

### `workdir`

Option `workdir` changes the current working directory for the action and changes back to the original directory once the action is executed. The directory will be created if it does not exist.

```
bash: workdir='tmp'
   touch a.txt
bash:
    ls tmp
    rm tmp/a.txt
    rmdir tmp
```

```
a.txt
```
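The create, change and restore behavior of `workdir` matches a common Python context-manager pattern. `in_workdir` is a hypothetical helper, not SoS code.

```python
import contextlib
import os
from pathlib import Path


@contextlib.contextmanager
def in_workdir(path):
    """Hypothetical: create `path` if needed, chdir into it, then restore."""
    original = os.getcwd()
    Path(path).mkdir(parents=True, exist_ok=True)
    os.chdir(path)
    try:
        yield Path.cwd()
    finally:
        # Always change back, even if the action raised an error
        os.chdir(original)
```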


## Core Actions

Let us start by listing all options for action `run` and comparing them with those of actions `script` and `bash` before we dive into the details:

| action | condition | interpreter (configurable for `script`) | args (configurable) | command | comment |
|:--|:--|:-|:-|:-|:-|
| `run` | Windows | | `{filename}` | `{filename}` | execute script directly as `.bat` file |
| | non-Windows | `/bin/bash` | `-ev {filename}` | `/bin/bash -ev {filename}` | execute script by bash, print commands and exit on error |
| | script with shebang line (`#!`) | | `{filename}` | `{filename}` | execute script directly |
| | | `/bin/bash` | `{script}` | `/bin/bash` content of script | `script` as arguments of `/bin/bash` |
| `bash` | | `/bin/bash` | `{filename}` | `/bin/bash {filename}` | execute script as a bash script |
| `script` | | | `{filename}` | `{filename}` | execute script directly |
| | | `any_interpreter` | `{filename}` | `any_interpreter {filename}` | execute with specified interpreter |
| | | `any_interpreter` | `{script}` | `any_interpreter` content of script | execute content of script directly in command line |

Note that
1. All script-executing actions except `run` and `script` have fixed interpreters. Action `run` uses different interpreters for different situations, and the interpreter of action `script` is configurable.
2. All actions accept configurable `args`, which can contain `{filename}` and `{script}`, with `filename` being the name of the temporary script file and `script` being the content of the script. In the latter case, the content of the script goes to the command line directly. `args` can of course contain any other fixed options.
3. If no interpreter is specified, the command consists of only `args`, so either the script file (if `args={filename}`) or the content of the script (if `args={script}`) is executed. SoS will make the script file executable in this case.
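The default command selection for action `run` in the table can be sketched as a function. `run_command` is hypothetical and only covers the default `args`, not a user-supplied `args`.

```python
import sys
from typing import List, Optional


def run_command(script: str, filename: str, platform: Optional[str] = None) -> List[str]:
    """Hypothetical: default command for action `run`, following the table."""
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        # Windows: the script file (a .bat file) is executed directly
        return [filename]
    if script.startswith("#!"):
        # A shebang line selects the interpreter, so run the file directly
        return [filename]
    # Otherwise bash echoes input lines (-v) and exits on the first error (-e)
    return ["/bin/bash", "-ev", filename]
```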
