# Getting Started with SoS Workflow System

* **Difficulty level**: easy
* **Time needed to learn**: 10 minutes or less
* **Key points**:
  * SoS uses classic Jupyter or JupyterLab with a SoS kernel as its IDE
  * SoS steps can be developed and executed in SoS Notebook
  * SoS workflows are embedded in Jupyter notebooks
  * Complete SoS workflows can be executed in a Jupyter notebook with the magics `%run` and `%sosrun`, or with the `sos` command from the command line

## SoS Workflow System in Jupyter

SoS Workflow System uses SoS Notebook as its IDE. The following figure illustrates the overall design of SoS Workflow System and SoS Notebook:

![JupyterCon18 SoS Talk](https://vatlab.github.io/sos-docs/doc/media/SoS_Notebook_and_Workflow.png)

Basically,

* SoS Notebook is a [Jupyter Notebook](https://jupyter.org/) with a SoS kernel.
* SoS Notebook serves as a super kernel to all other Jupyter kernels and allows the use of multiple kernels in a single notebook.
* SoS Notebook also serves as the IDE for SoS Workflow System.

The figure is linked to a [YouTube video](https://www.youtube.com/watch?v=U75eKosFbp8) of a [presentation on SoS at JupyterCon 2018](https://github.com/vatlab/JupyterCon2018), which introduces both SoS Notebook and SoS Workflow System and is a good starting point for learning SoS. The SoS Workflow part starts at [20 minutes](https://www.youtube.com/watch?v=U75eKosFbp8#t=20m).

## Running SoS

The Running SoS section of the [SoS Homepage](https://vatlab.github.io/sos-docs/) contains complete instructions on how to install SoS. Briefly, you have the following options for using SoS:

* Try SoS using our live server [http://vatlab.github.io/sos/live](http://vatlab.github.io/sos/live).
* Start a Jupyter notebook server from our docker image [mdabioinfo/sos-notebook](https://hub.docker.com/r/mdabioinfo/sos-notebook/).
* Install `sos` and `sos-notebook` locally if you have a local Python (3.6 or higher) installation and a working Jupyter server with kernels of interest.
* Check with your system administrator if you have access to an institutional JupyterHub server with SoS installed.

For the purpose of this tutorial, it is sufficient to use our live server [http://vatlab.github.io/sos/live](http://vatlab.github.io/sos/live). Once the Jupyter interface has loaded, select New -> SoS to create a SoS notebook. You can also go to `examples` and open existing SoS notebooks.

## Using the SoS kernel

This tutorial is written in a SoS Notebook, which consists of multiple **markdown cells** and **code cells**. With the SoS kernel, each code cell can have its own kernel. SoS Notebook allows you to use multiple kernels in a single notebook and exchange variables among live kernels. This allows you to develop scripts and analyze data in different languages.

For example, the following three code cells perform a multi-language data analysis: the first cell defines a few variables (in Python, as SoS is based on Python), the second cell runs a bash script to convert an Excel file to CSV format, and the last cell uses R to read the CSV file and generate a plot. Three different kernels, SoS, [bash_kernel](https://github.com/takluyver/bash_kernel), and [IRkernel](https://github.com/IRkernel/IRkernel), are used, and the `%expand` magic is used to pass filenames from the SoS kernel to the other kernels.

In [1]:
excel_file = 'data/DEG.xlsx'
csv_file = 'DEG.csv'
figure_file = 'output.pdf'

In [2]:
%expand
xlsx2csv {excel_file} > {csv_file}

In [3]:
%expand
data <- read.csv('{csv_file}')
pdf('{figure_file}')
plot(data$log2FoldChange, data$stat)
dev.off()

The SoS cell above is called a **scratch cell** because it does not contain a formal SoS step. Such cells accept:
* **Any Python statements** because SoS is extended from Python 3.6,
* **SoS magics** as documented [here](https://vatlab.github.io/sos-docs/doc/documentation/SoS_Magics.html), and
* **Any SoS step definition without a header**, which we will introduce later.
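
The effect of `%expand` in the cells above can be approximated in plain Python. The following is only a sketch of the visible behavior, not SoS's implementation: the helper name `expand` is hypothetical, and it handles only simple `{name}` placeholders, whereas SoS may accept richer expressions inside the braces.

```python
# Plain-Python sketch of the substitution that %expand appears to perform.
# `expand` is a hypothetical helper, not part of SoS.
excel_file = 'data/DEG.xlsx'
csv_file = 'DEG.csv'


def expand(template, variables):
    """Replace {name} placeholders in `template` with values from `variables`."""
    return template.format(**variables)


command = expand('xlsx2csv {excel_file} > {csv_file}',
                 {'excel_file': excel_file, 'csv_file': csv_file})
print(command)  # xlsx2csv data/DEG.xlsx > DEG.csv
```

The expanded string is what the bash kernel would receive, which is why filenames defined in the SoS kernel can be used in cells that run in other kernels.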

## Use magic `%run` to execute the current cell as a SoS workflow

Scripts in different languages can be converted to steps in SoS workflows by adding section headers in the format of

```
[header_name]
```
or
```
[header_name: options]
```


<div class="bs-callout bs-callout-info" role="alert">
  <h4>%run</h4>
    <p>The <code>%run</code> magic executes the content of the cell as a complete SoS workflow.</p>
</div>

The SoS magic `%run` can be used to execute workflows defined in the current cell. For example, the following cell executes a simple `hello_world` workflow with a single `print` statement.

In [5]:
%run
[hello_world]
print('This is our first hello world workflow')

| Workflow | Workflow ID | Index | Status |
|---|---|---|---|
| hello_world | 74fbf4d9ab411228 | #1 | completed, ran for 0 sec |

This is our first hello world workflow


SoS starts an external `sos` process, executes the workflow, and displays the output in the notebook. A status table lists the workflow name, ID, and other information; you can remove it by clicking the status icon.

A notebook cell can contain a complete workflow with multiple steps. For example, the following cell defines a SoS workflow that resembles the analysis performed earlier using three different kernels. The workflow consists of a `global` section that defines variables for all steps, and two sections, `plot_10` and `plot_20`, that constitute the two steps of workflow `plot`. The first step executes a shell script and the second step executes an `R` script. The syntax of this workflow will be discussed in detail soon.

In [3]:
%run

[global]
excel_file = 'data/DEG.xlsx'
csv_file = 'DEG.csv'
figure_file = 'output.pdf'

[plot_10]
run: expand=True
    xlsx2csv {excel_file} > {csv_file}

[plot_20]
R: expand=True
    data <- read.csv('{csv_file}')
    pdf('{figure_file}')
    plot(data$log2FoldChange, data$stat)
    dev.off()

| Workflow | Workflow ID | Index | Status |
|---|---|---|---|
| plot | 221e15922696f853 | #3 | completed, ran for < 5 seconds |


xlsx2csv data/DEG.xlsx > DEG.csv



null device 
          1 
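
The naming convention used above, where `plot_10` and `plot_20` are two steps of workflow `plot` and `global` is shared by all steps, can be illustrated with a small Python sketch. The functions `split_step_name` and `group_steps` are hypothetical helpers written for this illustration; they are not SoS's parser and ignore section options.

```python
import re
from collections import defaultdict


def split_step_name(name):
    """Split 'plot_10' into ('plot', 10); names without an index get None."""
    match = re.fullmatch(r'(.+?)_(\d+)', name)
    if match:
        return match.group(1), int(match.group(2))
    return name, None


def group_steps(section_names):
    """Group section names by workflow, ordering steps by numeric index."""
    workflows = defaultdict(list)
    for name in section_names:
        if name == 'global':
            # The global section defines shared variables; it is not a step.
            continue
        workflow, index = split_step_name(name)
        workflows[workflow].append((index if index is not None else 0, name))
    return {wf: [step for _, step in sorted(steps)]
            for wf, steps in workflows.items()}


workflows = group_steps(['global', 'plot_20', 'plot_10', 'hello_world'])
print(workflows)
# {'plot': ['plot_10', 'plot_20'], 'hello_world': ['hello_world']}
```

Under this reading, the numeric suffix determines the order in which the steps of `plot` run, regardless of the order in which the sections appear.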


## Execute embedded workflows using magic `%sosrun`

A SoS notebook can have multiple workflow sections defined in multiple code cells. These sections constitute the content of the **embedded SoS script** of the notebook.

<div class="bs-callout bs-callout-info" role="alert">
  <h4>Embedded SoS script</h4>
  <p>An embedded SoS script consists of SoS sections in all SoS cells of a notebook.</p>
</div>
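
To make the idea of an embedded script concrete, here is a rough Python sketch that collects workflow sections from a notebook loaded as a dictionary. It is an illustration only and not how SoS extracts the script: the function `embedded_script` is hypothetical, it assumes that each cell's kernel is recorded under a `kernel` key in the cell metadata, and it treats any line of the form `[...]` as a section header.

```python
import re

# A line such as "[plot_10]" or "[header_name: options]" (simplified).
SECTION_HEADER = re.compile(r'^\[[^\]]+\]\s*$')


def embedded_script(notebook):
    """Concatenate SoS code cells that contain at least one section header."""
    parts = []
    for cell in notebook.get('cells', []):
        if cell.get('cell_type') != 'code':
            continue
        # Assumption: the cell's kernel name is stored in its metadata.
        if cell.get('metadata', {}).get('kernel', 'SoS') != 'SoS':
            continue
        source = cell.get('source', '')
        if isinstance(source, list):  # notebooks may store source as a list of lines
            source = ''.join(source)
        lines = source.splitlines()
        # Drop leading magics (for example %run) and blank lines.
        while lines and (not lines[0].strip() or lines[0].startswith('%')):
            lines.pop(0)
        if any(SECTION_HEADER.match(line) for line in lines):
            parts.append('\n'.join(lines).strip())
    return '\n\n'.join(parts)


# A tiny in-memory notebook used only for this illustration.
notebook = {
    'cells': [
        {'cell_type': 'markdown', 'metadata': {}, 'source': '# Title'},
        {'cell_type': 'code', 'metadata': {'kernel': 'SoS'},
         'source': "excel_file = 'data/DEG.xlsx'"},
        {'cell_type': 'code', 'metadata': {'kernel': 'SoS'},
         'source': "%run\n[hello_world]\nprint('hello')"},
        {'cell_type': 'code', 'metadata': {'kernel': 'R'},
         'source': 'plot(1)'},
    ]
}
script = embedded_script(notebook)
print(script)
```

In this sketch the scratch cell and the R cell are skipped because they contain no section header or belong to another kernel, so only the `hello_world` section ends up in the script.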

The easiest way to view the embedded script of a SoS notebook is to use the `%preview --workflow` magic as follows (the option `-n` displays the script in the notebook instead of in the console panel). As you can see, the embedded script consists of steps from the entire notebook.

In [7]:
%preview -n --workflow

<div class="bs-callout bs-callout-info" role="alert">
  <h4>%sosrun</h4>
  <p>The <code>%sosrun</code> magic executes workflows defined in the embedded SoS script of a notebook.</p>
</div>

The `%sosrun` magic can be used to execute any of the workflows defined in the notebook. For example, the following magic executes the workflow `plot` defined in the section above. Because multiple workflows are defined in this notebook (`hello_world` and `plot`), a workflow name is required for this magic.

In [4]:
%sosrun plot

| Workflow | Workflow ID | Index | Status |
|---|---|---|---|
| plot | 262714c9215ce33a | #4 | completed, ran for 1 sec |


xlsx2csv data/DEG.xlsx > DEG.csv



null device 
          1 


<div class="bs-callout bs-callout-warning" role="alert">
  <h4>Warning</h4>
    <p>Workflow cells can only be executed by SoS magics <code>%run</code> and <code>%sosrun</code>. SoS will not produce any output if you execute a workflow cell directly.</p>  
</div>

## Execute embedded workflows with command `sos`

The `%run` and `%sosrun` magics call an external command `sos` to execute workflows defined in the notebook. Although we use the `%run` magic to execute workflows throughout this documentation for convenience, remember that **you can execute the notebook, or a text file containing the workflows, with the `sos` command from the command line**.

Using the magic `!`, which executes any shell command, we can mimic running this notebook from the command line inside the notebook:

In [6]:
!sos run sos_in_notebook.ipynb plot

INFO: Running plot_10:
xlsx2csv data/DEG.xlsx > DEG.csv

INFO: Running plot_20:
null device 
          1 
INFO: Workflow plot (ID=262714c9215ce33a) is executed successfully with 2 completed steps.


This is equivalent to running the command `sos run sos_in_notebook.sos plot` if you create a plain text file named `sos_in_notebook.sos` with the content of the embedded workflow.
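
The same command can also be launched from an ordinary Python script. The sketch below only uses the `sos run <script> <workflow>` form shown above; the helper `sos_run_command` is a hypothetical convenience function, and the command is executed only if `sos` is installed and the notebook file exists in the current directory.

```python
import os
import shutil
import subprocess


def sos_run_command(script, workflow=None):
    """Build the argument list for `sos run <script> [workflow]`."""
    command = ['sos', 'run', script]
    if workflow is not None:
        command.append(workflow)
    return command


command = sos_run_command('sos_in_notebook.ipynb', 'plot')
print(' '.join(command))  # sos run sos_in_notebook.ipynb plot

# Run the workflow only when the `sos` executable and the notebook are available.
if shutil.which('sos') and os.path.exists('sos_in_notebook.ipynb'):
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues when a filename contains spaces.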

## Further reading

* [Inclusion of scripts](Inclusion_of_scripts.html)
* [How to define and execute basic forward-type workflows](doc/user_guide/forward_workflow.html)