# Getting Started with SoS Workflow System

* **Difficulty level**: easy
* **Time needed to learn**: 10 minutes or less
* **Key points**:
  * SoS uses classic Jupyter or Jupyter Lab with a SoS kernel as its IDE
  * SoS steps can be developed and executed in SoS Notebook
  * SoS workflows are embedded in Jupyter notebook
  * Complete SoS workflows can be executed in Jupyter notebook with magics `%run` and `%sosrun`, or with the `sos` command from the command line

## SoS Workflow System in Jupyter

SoS Workflow System uses SoS Notebook as its IDE. The following figure illustrates the overall design of SoS Workflow System and SoS Notebook:

![JupyterCon18 SoS Talk](https://vatlab.github.io/sos-docs/doc/media/SoS_Notebook_and_Workflow.png)

Basically,

* SoS Notebook is a [Jupyter Notebook](https://jupyter.org/) with a SoS kernel.
* SoS Notebook serves as a super kernel to all other Jupyter kernels and allows the use of multiple kernels in a single notebook.
* SoS Notebook also serves as the IDE for SoS Workflow System.

The figure is linked to a [YouTube video](https://www.youtube.com/watch?v=U75eKosFbp8) of a [presentation on SoS at JupyterCon 2018](https://github.com/vatlab/JupyterCon2018), which introduces both SoS Notebook and SoS Workflow System and is a good starting point for learning SoS. The SoS Workflow part starts at the [20-minute mark](https://www.youtube.com/watch?v=U75eKosFbp8#t=20m).

## Running SoS

The [Running SoS section](https://vatlab.github.io/sos-docs/running.html#content) of the [SoS Homepage](https://vatlab.github.io/sos-docs/) contains all the instructions on how to install and run SoS. Briefly, you have the following options for using SoS:

* Try SoS using our live server [http://vatlab.github.io/sos/live](http://vatlab.github.io/sos/live).
* Start a Jupyter notebook server from our docker image [mdabioinfo/sos-notebook](https://hub.docker.com/r/mdabioinfo/sos-notebook/).
* Install `sos` and `sos-notebook` locally if you have a local Python (3.6 or higher) installation and a working Jupyter server with kernels of interest.
* Check with your system administrator if you have access to an institutional JupyterHub server with SoS installed.

For this tutorial, our live server [http://vatlab.github.io/sos/live](http://vatlab.github.io/sos/live) is sufficient. Once the Jupyter interface has loaded, select New -> SoS to create a SoS notebook. You can also go to `examples` and open existing SoS notebooks.

## Using the SoS kernel

This tutorial is written in a SoS Notebook, which consists of multiple **markdown cells** and **code cells**. With the SoS kernel, each code cell can have its own kernel. SoS Notebook allows you to use multiple kernels in a single notebook and exchange variables among live kernels. This allows you to develop scripts and analyze data in different languages.

For example, the following three code cells perform a multi-language data analysis: the first cell defines a few variables, the second cell runs a bash script to convert an Excel file to CSV format, and the last cell uses R to read the CSV file and generate a plot. Three different kernels, SoS, [bash_kernel](https://github.com/takluyver/bash_kernel), and [IRkernel](https://github.com/IRkernel/IRkernel), are used, and a `%expand` magic is used to pass filenames from the SoS kernel to the other kernels.

In [1]:
excel_file = 'data/DEG.xlsx'
csv_file = 'DEG.csv'
figure_file = 'output.pdf'

In [2]:
%expand
xlsx2csv {excel_file} > {csv_file}

In [3]:
%expand
data <- read.csv('{csv_file}')
pdf('{figure_file}')
plot(data$log2FoldChange, data$stat)
dev.off()
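
The effect of `%expand` in the cells above can be approximated in plain Python. The following is a minimal sketch, not SoS's implementation: it assumes the cell text behaves like a format template that is filled in from the variables defined in the SoS kernel, and the helper name `expand_cell` is hypothetical.

```python
# Hypothetical helper that mimics what %expand does to a cell before
# the text is handed to the target kernel (bash, R, ...).
def expand_cell(text, variables):
    """Replace each {name} in text with the value of variables[name]."""
    return text.format(**variables)


# Variables as defined in the first (SoS) cell of the example above.
kernel_vars = {
    'excel_file': 'data/DEG.xlsx',
    'csv_file': 'DEG.csv',
    'figure_file': 'output.pdf',
}

bash_cell = "xlsx2csv {excel_file} > {csv_file}"
r_line = "data <- read.csv('{csv_file}')"

print(expand_cell(bash_cell, kernel_vars))
print(expand_cell(r_line, kernel_vars))
```

In this sketch, a script that contains literal braces would need them doubled (`{{` and `}}`) so that `str.format` leaves them alone.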

<div class="bs-callout bs-callout-primary" role="alert">
  <h4>SoS is extended from Python 3.6+</h4>
    <p>The SoS workflow system extends the syntax of Python 3.6+, so <b>SoS code cells accept any Python code</b>.</p>
</div>

## Use magic `%run` to execute the current cell as a SoS workflow

Scripts in different languages can be converted to steps in SoS workflows by adding section headers in the format of

```
[header_name]
```
or
```
[header_name: options]
```
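
To make the two header forms concrete, here is a small Python sketch that recognizes them. The regular expression and the function name `parse_header` are illustrative assumptions, not the grammar SoS itself uses, and the sketch treats everything after the first colon as an opaque option string.

```python
import re

# Matches "[name]" or "[name: options]"; a sketch, not the SoS grammar.
HEADER = re.compile(
    r'^\[\s*(?P<name>[^:\]]+?)\s*(?::\s*(?P<options>.*?))?\s*\]$'
)


def parse_header(line):
    """Return (name, options) for a section header line, or None."""
    match = HEADER.match(line.strip())
    if match is None:
        return None
    return match.group('name'), match.group('options')


print(parse_header('[header_name]'))
print(parse_header('[header_name: options]'))
print(parse_header('print("not a header")'))
```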


<div class="bs-callout bs-callout-primary" role="alert">
  <h4>%run</h4>
    <p> The <code>%run</code> magic executes the content of the cell as a complete SoS workflow using an external process.</p>
</div>

The SoS magic `%run` can be used to execute workflows defined in the current cell. For example, the following cell executes a simple `hello_world` workflow with a single `print` statement.

In [4]:
%run

print('This is our first hello world workflow')

This is our first hello world workflow


SoS starts an external `sos` process, executes the workflow, and displays the output in the notebook. **The workflow is executed independently and does not share any variables with the SoS kernel**. For example, suppose you define a variable in the SoS kernel:

In [5]:
my_name = 'sos_in_notebook.ipynb'

You can use the variable in another SoS cell:

In [6]:
print(f'This notebook is named {my_name}')

This notebook is named sos_in_notebook.ipynb


But the variable is not available in the following cell, which magic `%run` executes as an independent workflow, unless you [define a parameter and pass the variable to it from the command line](parameters.html).

In [7]:
%run

print(f'This notebook is named {my_name}')

ERROR: [default]: 
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
script_4008715518445167588 in <module>
----> print(f'This notebook is named {my_name}')

NameError: name 'my_name' is not defined
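
The same isolation can be reproduced with plain Python, without SoS. The sketch below is only an analogy for how `%run` behaves: it defines `my_name` in the current interpreter and then runs the `print` statement in a fresh interpreter started with `subprocess`, which knows nothing about the parent's variables.

```python
import subprocess
import sys

# Defined in this (parent) interpreter, like a variable in the SoS kernel.
my_name = 'sos_in_notebook.ipynb'

# Run the same statement in a fresh interpreter, which starts with an
# empty namespace, much as an external process would.
child_code = "print(f'This notebook is named {my_name}')"
result = subprocess.run(
    [sys.executable, '-c', child_code],
    capture_output=True,
    text=True,
)

# The child exits with an error because my_name is unknown there.
print('child failed:', result.returncode != 0)
print('NameError reported:', 'NameError' in result.stderr)
```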


A notebook cell can contain a complete workflow with multiple steps. For example, the following cell defines a SoS workflow that resembles the analysis that was performed using three different kernels. The workflow consists of a `global` section that defines variables for all steps, and two sections, `plot_10` and `plot_20`, that constitute two steps of workflow `plot`. The first step executes a shell script and the second step executes an R script. The syntax of this workflow will be discussed in detail in [the next tutorial](doc/user_guide/scripts_in_sos.html).

In [8]:
%run

[global]
excel_file = 'data/DEG.xlsx'
csv_file = 'DEG.csv'
figure_file = 'output.pdf'

[plot_10]
run: expand=True
    xlsx2csv {excel_file} > {csv_file}

[plot_20]
R: expand=True
    data <- read.csv('{csv_file}')
    pdf('{figure_file}')
    plot(data$log2FoldChange, data$stat)
    dev.off()

xlsx2csv data/DEG.xlsx > DEG.csv



null device 
          1 


### Controlling verbosity (option `-v`)

Magic `%run` (actually the underlying `sos` command) accepts a set of optional arguments, the simplest of which is option `-v`, which controls the verbosity of output.

<div class="bs-callout bs-callout-info" role="alert">
    <h4>The verbosity (<code>-v</code>) argument of magics <code>%run</code>, <code>%sosrun</code> and command <code>sos run</code></h4>
    <p>The verbosity argument <code>-v</code> accepts the following values:</p>
    <ul>
        <li><code>-v 0</code>: Display no system messages except errors</li>
        <li><code>-v 1</code>: Display errors and warnings, and a text-based progress bar</li>
        <li><code>-v 2</code> (default): Display errors, warnings, and informational messages</li>
        <li><code>-v 3</code>: Display additional debug messages</li>
        <li><code>-v 4</code>: Display very verbose trace messages for development purposes</li>
     </ul>
</div>
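
The levels are cumulative, which the following Python sketch makes explicit. It is a transcription of the list above into a lookup, with hypothetical names `CATEGORIES` and `shown_at`; it does not model the progress bar shown at `-v 1` and is not taken from the SoS source code.

```python
# Message categories in the order the levels above add them.
CATEGORIES = ['error', 'warning', 'info', 'debug', 'trace']


def shown_at(level):
    """Return the message categories displayed at verbosity `level` (0 to 4)."""
    if level not in range(len(CATEGORIES)):
        raise ValueError(f'verbosity must be between 0 and 4, got {level}')
    return CATEGORIES[:level + 1]


for level in range(5):
    print(f'-v{level}:', ', '.join(shown_at(level)))
```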

As you have seen, the default verbosity level `-v2` displays messages to report the status of execution:

In [9]:
%run

print('This is our first hello world workflow')

This is our first hello world workflow


You can suppress these messages using option `-v1` or even `-v0`:

In [10]:
%run -v0

print('This is our first hello world workflow')

This is our first hello world workflow


### Running long workflows in background *

<div class="bs-callout bs-callout-info" role="alert">
    <h4>Execute workflow in non-blocking mode</h4>
    <p>You can execute a workflow using magics <code>%run</code>, <code>%sosrun</code>, and <code>%runfile</code> in the background by adding a <code>&</code> at the end of the magic. The workflow will be executed in a queue while you continue to work in the notebook.</p>
</div>

SoS Notebook usually starts a workflow and waits until the workflow is completed. If a workflow takes a long time to execute, you can send it to a queue, in which workflows are executed one by one while you continue to work on the notebook. A status table is displayed for each queued workflow, and log messages and results continue to be sent back to SoS Notebook.

In [11]:
%run -v0 &

import time
for i in range(5):
    print(i)
    time.sleep(10)

0
1
2
3
4

plot | Workflow ID 4dd99bdf71b08cf8 | Index #1 | completed | Ran for 1 sec
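
The queueing behavior described above can be pictured with a plain Python analogy. This sketch assumes nothing about how SoS Notebook implements its queue: it uses a single-worker `ThreadPoolExecutor` so that submitted jobs run one at a time, while the submitting code continues immediately. The function `workflow` is a stand-in for a long-running workflow.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def workflow(name, seconds):
    """Stand-in for a long-running workflow."""
    time.sleep(seconds)
    return f'{name} completed'


# One worker thread means queued jobs run one at a time, in submission order.
with ThreadPoolExecutor(max_workers=1) as queue:
    first = queue.submit(workflow, 'first', 0.2)
    second = queue.submit(workflow, 'second', 0.1)

    # submit() returns immediately, so the caller keeps working meanwhile.
    print('still free to run other code')

    # Collect the results once they are needed.
    results = [first.result(), second.result()]

print(results)
```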


## Execute embedded workflows using magic `%sosrun`

A SoS notebook can have multiple workflow sections defined in multiple code cells. These sections constitute the content of the **embedded SoS script** of the notebook.

<div class="bs-callout bs-callout-primary" role="alert">
  <h4>Embedded SoS script</h4>
  <p>An embedded SoS script consists of SoS sections in all SoS cells of a notebook.</p>
</div>
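
As a rough illustration of this idea, the Python sketch below collects workflow sections from a notebook represented as a dictionary in the Jupyter notebook JSON layout. It rests on several assumptions that are not stated in this tutorial: that each SoS cell is marked with `"kernel": "SoS"` in its cell metadata, that lines starting with `%` are magics to be dropped, and that any cell containing a `[...]` header line belongs to the embedded script. The function `embedded_script` is hypothetical and SoS's own extraction rules may differ.

```python
import re

# A line consisting only of "[...]" is treated as a section header (assumption).
SECTION = re.compile(r'^\[.+\]\s*$', re.MULTILINE)


def embedded_script(notebook):
    """Join the workflow sections found in the SoS code cells of a notebook dict."""
    parts = []
    for cell in notebook['cells']:
        if cell['cell_type'] != 'code':
            continue
        # Assumed convention: the kernel of a cell is stored in its metadata.
        if cell.get('metadata', {}).get('kernel', 'SoS') != 'SoS':
            continue
        source = cell['source']
        if isinstance(source, list):  # notebook JSON may store a list of lines
            source = ''.join(source)
        # Drop magic lines such as %run before looking for section headers.
        lines = [line for line in source.splitlines() if not line.startswith('%')]
        text = '\n'.join(lines).strip()
        if SECTION.search(text):
            parts.append(text)
    return '\n\n'.join(parts)


toy_notebook = {
    'cells': [
        {'cell_type': 'markdown', 'metadata': {}, 'source': '# Title'},
        {'cell_type': 'code', 'metadata': {'kernel': 'SoS'},
         'source': "my_name = 'sos_in_notebook.ipynb'"},
        {'cell_type': 'code', 'metadata': {'kernel': 'SoS'},
         'source': ['%run\n', '[plot_10]\n', "print('step 10')\n"]},
        {'cell_type': 'code', 'metadata': {'kernel': 'R'},
         'source': 'x <- c(1, 2)'},
    ]
}

print(embedded_script(toy_notebook))
```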

The easiest way to view the embedded script of a SoS notebook is to use the `%preview --workflow` magic as follows (the option `-n` lists the script in the notebook instead of the console panel). As you can see, the embedded script consists of steps from the entire notebook.

In [12]:
%preview -n --workflow

<div class="bs-callout bs-callout-primary" role="alert">
  <h4>%sosrun</h4>
  <p> The <code>%sosrun</code> magic executes workflows defined in the embedded SoS script of a notebook.</p>
</div>

The `%sosrun` magic can be used to execute any of the workflows defined in the notebook. For example, the following magic executes the workflow `plot` defined in the above section. Because multiple workflows are defined in this notebook (`hello_world` and `plot`), a workflow name is required for this magic.

In [13]:
%sosrun plot

xlsx2csv data/DEG.xlsx > DEG.csv



null device 
          1 


<div class="bs-callout bs-callout-warning" role="alert">
  <h4>Warning</h4>
    <p>Workflow cells can only be executed by SoS magics <code>%run</code> and <code>%sosrun</code>. SoS will not produce any output if you execute a workflow cell directly.</p>  
</div>

## Execute external script with magic `%runfile`

<div class="bs-callout bs-callout-primary" role="alert">
  <h4>%runfile filename</h4>
  <p> The <code>%runfile</code> magic executes a SoS script from the specified file with the specified options. Both SoS scripts (usually with extension <code>.sos</code>) and SoS notebooks (with extension <code>.ipynb</code>) are supported.</p>
</div>

The third magic for executing SoS workflows in SoS Notebook is `%runfile`, which executes workflows from a specified external file. For example, instead of using magic `%sosrun`, you can execute the current notebook with magic `%runfile`:

In [14]:
%runfile sos_in_notebook.ipynb plot

xlsx2csv data/DEG.xlsx > DEG.csv



null device 
          1 


## Execute embedded workflows with command `sos`

The `%sosrun` magic calls an external command `sos` to execute workflows defined in the notebook. Although for the sake of convenience we will use magic `%run` to execute workflows throughout this documentation, please remember that **you can execute the notebook using command `sos` from the command line**.

![running notebook from command line](../media/sos_cmd_cli.png)


Alternatively, you can write the workflow in a text file (usually with extension `.sos`) and execute it with command `sos run`:

![running script from command line](../media/sos_cmd_script_cli.png)


## Further reading

* [Inclusion of scripts](Inclusion_of_scripts.html)
* [How to define and execute basic forward-type workflows](doc/user_guide/forward_workflow.html)
* [Command line interface](doc/user_guide/cli.html)