# How to deal with input and output of subkernels

* **Difficulty level**: easy
* **Time needed to learn**: 20 minutes or less
* **Key points**:
  * SoS works with any Jupyter kernel
  * `%expand` treats the input of a cell as a Python f-string and expands it before sending it to the subkernel
  * `%capture` captures output from subkernels and saves it into Python variables
  * `%render` renders output from subkernels in different formats

A SoS kernel is a master kernel that can start, stop, and interact with any Jupyter kernel; SoS calls these kernels *subkernels*. Although most of the time SoS just passes user input to subkernels and sends outputs from subkernels to the frontend (notebook), you can use a few SoS magics to modify user input before it is sent to the subkernel, and to process output before it is sent to the frontend.

<p align="center">
  <img src="https://vatlab.github.io/sos-docs/doc/media/subkernel.png">
</p>


## <a id="magic_expand"></a> Interpolate user input with magic `%expand` 

Scripts in SoS cells are by default sent verbatim to SoS or the subkernels. However, similar to the `expand` option of SoS actions, you can interpolate scripts before they are executed by the kernels.

For example, without the `%expand` magic, user input is sent to the subkernel verbatim:

In [1]:
print("A parameter {par} is specified.")

[1] "A parameter {par} is specified."


With variable `par` defined in the SoS kernel,

In [2]:
par = 100

the `%expand` magic treats the content of the following cell as a [Python f-string](https://www.python.org/dev/peps/pep-0498/) and expands expressions inside braces:

In [3]:
%expand
print("A parameter {par} is specified.")

[1] "A parameter 100 is specified."
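
The interpolation performed by `%expand` can be pictured in plain Python. The sketch below is not SoS's implementation: `expand_script` is a hypothetical helper that evaluates a script as if it were the body of an f-string, and the dictionary `sos_vars` stands in for variables defined in the SoS kernel.

```python
def expand_script(script, namespace):
    """Interpolate {expr} in script the way an f-string would.

    Hypothetical helper for illustration; SoS's own code may differ.
    """
    # Build the source of an f-string literal from the script text and
    # evaluate it against the given variables.
    return eval('f' + repr(script), {}, namespace)


sos_vars = {'par': 100}  # stands in for variables defined in the SoS kernel
expanded = expand_script('print("A parameter {par} is specified.")', sos_vars)
print(expanded)
```

The expanded text, not the original, is what would be passed on to the subkernel. Because the text is treated as an f-string, doubled braces (`{{` and `}}`) come out as single literal braces.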


If the script itself contains braces, which is quite common in R, you can double them so that they are kept as literal braces:

In [4]:
%expand
if ({par} > 50) {{
    print("A parameter {par} greater than 50 is specified.");
}}

[1] "A parameter 100 greater than 50 is specified."


If the script contains many braces, it is easier to use a different sigil, such as `${ }`, to interpolate the script:

In [5]:
%expand ${ }
if (${par} > 50) {
    print("A parameter ${par} greater than 50 is specified.");
}

[1] "A parameter 100 greater than 50 is specified."
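
One way to think about a custom sigil is as a search-and-replace over the script, where only text between the chosen delimiters is evaluated. The following is a minimal sketch under that assumption; `expand_with_sigil` is a hypothetical helper, not part of SoS, and it does not handle expressions that themselves contain the closing delimiter.

```python
import re


def expand_with_sigil(script, namespace, left='${', right='}'):
    """Replace left+expr+right with the value of expr (illustrative only)."""
    pattern = re.compile(re.escape(left) + r'(.+?)' + re.escape(right))
    # Evaluate each captured expression and substitute its string form.
    return pattern.sub(lambda m: str(eval(m.group(1), {}, namespace)), script)


r_script = '''if (${par} > 50) {
    print("A parameter ${par} greater than 50 is specified.");
}'''
expanded = expand_with_sigil(r_script, {'par': 100})
print(expanded)
```

Braces that are not preceded by `$` are left untouched, which is why the R block delimiters no longer need to be doubled.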


Although it is not the topic of this tutorial, it is worth mentioning that the `%expand` magic is used in the same way as the `expand` option of SoS actions, so you can convert the above script, which was executed in an R session, to an R action in a SoS workflow as follows:

In [6]:
R: expand='${ }'
if (${par} > 50) {
    print("A parameter ${par} greater than 50 is specified.");
}

[1] "A parameter 100 greater than 50 is specified."


## <a id="magic_capture"></a>Capture cell output with magic `%capture` 

The `%capture` magic captures all or part of the output of a cell to a SoS variable. To understand how this magic works, you will need to understand [how Jupyter works](https://jupyter-client.readthedocs.io/en/stable/messaging.html). Briefly, when a cell is executed, the kernel sends one or more `stream`, `display_data`, and `execute_result` messages, along with other control messages, before it sends `execute_reply` to conclude the execution. A `stream` message contains standard output (`stdout`) or standard error (`stderr`), whereas a `display_data` message can contain much richer data in a dictionary with keys such as `text/html`, `text/plain`, and `text/markdown`; the frontend decides how to display these messages.

### Determine what to capture

The `%capture` magic can capture the following types of information

| name | message |
|-- | -- |
| `stdout` | `stdout` of `stream` messages |
| `stderr` | `stderr` of `stream` messages |
| `text` | `text/plain` of `display_data` or `execute_result` messages |
| `html` | `text/html` of `display_data` or `execute_result` messages |
| `markdown` | `text/markdown` of `display_data` or `execute_result` messages |
| `raw` | All above messages |

The first step to capture output from a cell is to determine what types of messages are sent by the cell. If you are uncertain, you can open the console panel (right click and select `New Console for Notebook` if you are using JupyterLab), and use the `%capture` magic without options (or with the `raw` option).

In [7]:
%capture
echo "I am from Bash"

I am from Bash


The messages that have been returned by the cell will be displayed in the console window

```python
[('stream', {'name': 'stdout', 'text': 'I am from Bash\n'})]
```
for this cell, from which you can see that the output is a `stream` message named `stdout`. You can then specify the `stdout` type:

In [8]:
%capture stdout
echo "I am from Bash"

I am from Bash


The captured result is by default saved to a variable `__captured` in the SoS kernel:

In [9]:
__captured

'I am from Bash\n'
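
The relationship between the raw message list and the `stdout` capture can be sketched in a few lines of Python. This is only an illustration of the filtering idea, not SoS's code; `collect_stream` is a hypothetical name, and the message list is copied from the console output shown earlier.

```python
# Raw messages in the (message type, content) form shown in the console.
raw_messages = [('stream', {'name': 'stdout', 'text': 'I am from Bash\n'})]


def collect_stream(messages, name='stdout'):
    """Join the text of all stream messages with the given name."""
    return ''.join(content['text']
                   for msg_type, content in messages
                   if msg_type == 'stream' and content.get('name') == name)


captured_stdout = collect_stream(raw_messages)
print(repr(captured_stdout))
```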

You can use option `-t` (`--to`) to specify the name of the variable:

In [10]:
%capture stdout --to bash_output
echo "I am from Bash"

I am from Bash


In [11]:
bash_output

'I am from Bash\n'

As a more complex example, the following cell runs a SPARQL query and returns multiple messages.

In [12]:
%capture

%format json
%display table 
%endpoint http://dbpedia.org/sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
    ?person a foaf:Person .
    ?person rdfs:label ?name
    FILTER regex(?name,"da Vinci","i")
    FILTER langMatches(lang(?name),"en")
} LIMIT 2

person,name
http://dbpedia.org/resource/Paul_Da_Vinci,Paul Da Vinci
http://dbpedia.org/resource/Leonardo_da_Vinci,Leonardo da Vinci


In [13]:
__captured

[('display_data',
  {'data': {'text/html': '<div class="krn-spql"><div class="magic">Return format: JSON</div><div class="magic">Display: table</div><div class="magic">Endpoint set to: http://dbpedia.org/sparql</div></div>',
    'text/plain': 'Return format: JSON\nDisplay: table\nEndpoint set to: http://dbpedia.org/sparql\n'},
   'metadata': {}}),
 ('display_data',
  {'data': {'text/html': '<div class="krn-spql"><table><tr class=hdr><th>person</th>\n<th>name</th></tr><tr class=odd><td class=val><a href="http://dbpedia.org/resource/Paul_Da_Vinci" target="_other">http://dbpedia.org/resource/Paul_Da_Vinci</a></td>\n<td class=val>Paul Da Vinci</td></tr><tr class=even><td class=val><a href="http://dbpedia.org/resource/Leonardo_da_Vinci" target="_other">http://dbpedia.org/resource/Leonardo_da_Vinci</a></td>\n<td class=val>Leonardo da Vinci</td></tr></table><div class="tinfo">Total: 2, Shown: 2</div></div>'},
   'metadata': {}})]
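
The `html` capture type selects the `text/html` entries of such messages. A rough sketch of that selection is shown below, assuming the entries are simply concatenated in order; `collect_mime` is a hypothetical helper, and the messages are shortened stand-ins for the two shown above.

```python
def collect_mime(messages, mime='text/html'):
    """Concatenate one MIME type from display_data/execute_result messages."""
    parts = []
    for msg_type, content in messages:
        if msg_type in ('display_data', 'execute_result'):
            data = content.get('data', {})
            if mime in data:
                parts.append(data[mime])
    return ''.join(parts)


# Shortened stand-ins for the two messages returned by the SPARQL cell.
raw_messages = [
    ('display_data', {'data': {'text/html': '<div>Return format: JSON</div>',
                               'text/plain': 'Return format: JSON\n'},
                      'metadata': {}}),
    ('display_data', {'data': {'text/html': '<table><tr><td>Paul Da Vinci</td></tr></table>'},
                      'metadata': {}}),
]
html = collect_mime(raw_messages)
print(html)
```

Messages that lack the requested MIME type are skipped, so asking for `text/plain` here would return only the text of the first message.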

You can then save the content of `text/html` to a variable `html_table`

In [14]:
%capture html --to html_table

%format json
%display table 
%endpoint http://dbpedia.org/sparql

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE {
    ?person a foaf:Person .
    ?person rdfs:label ?name
    FILTER regex(?name,"da Vinci","i")
    FILTER langMatches(lang(?name),"en")
} LIMIT 2

person,name
http://dbpedia.org/resource/Paul_Da_Vinci,Paul Da Vinci
http://dbpedia.org/resource/Leonardo_da_Vinci,Leonardo da Vinci


which contains the `text/html` data of two messages

In [15]:
print(html_table)

<div class="krn-spql"><div class="magic">Return format: JSON</div><div class="magic">Display: table</div><div class="magic">Endpoint set to: http://dbpedia.org/sparql</div></div><div class="krn-spql"><table><tr class=hdr><th>person</th>
<th>name</th></tr><tr class=odd><td class=val><a href="http://dbpedia.org/resource/Paul_Da_Vinci" target="_other">http://dbpedia.org/resource/Paul_Da_Vinci</a></td>
<td class=val>Paul Da Vinci</td></tr><tr class=even><td class=val><a href="http://dbpedia.org/resource/Leonardo_da_Vinci" target="_other">http://dbpedia.org/resource/Leonardo_da_Vinci</a></td>
<td class=val>Leonardo da Vinci</td></tr></table><div class="tinfo">Total: 2, Shown: 2</div></div>


Then, if you would like to process the output programmatically, you can use one of the many powerful Python modules, for example `BeautifulSoup`:

In [16]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_table, 'html.parser')
soup.find_all('a')

[<a href="http://dbpedia.org/resource/Paul_Da_Vinci" target="_other">http://dbpedia.org/resource/Paul_Da_Vinci</a>,
 <a href="http://dbpedia.org/resource/Leonardo_da_Vinci" target="_other">http://dbpedia.org/resource/Leonardo_da_Vinci</a>]

### Capture formatted content

If the output of a cell is well-formatted, it is possible to capture the output as variables in a type other than `str`.

For example, suppose you would like to capture the sizes of a few notebook files. Instead of using a Python script, you could use a shell command as follows:

In [17]:
!ls -l ex*.ipynb  | awk '{{printf("%s,%d\n", $9, $5)}}'

10:35,0
11:29,0
13:44,0
13:42,0
11:30,0


The output is well formatted, so you can capture it in `csv` format as follows:

In [18]:
%capture stdout --as csv --to notebooks
!ls -l ex*.ipynb  | awk '{{printf("%s,%d\n", $9, $5)}}'

10:35,0
11:29,0
13:44,0
13:42,0
11:30,0


The resulting variable is a Pandas DataFrame. Unfortunately, the first data line is treated as the header, which is not correct here.

In [19]:
notebooks

Unnamed: 0,10:35,0
0,11:29,0
1,13:44,0
2,13:42,0
3,11:30,0
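
If the header handling is a problem, one possible workaround is to capture the output as plain text (the default) and parse it yourself. The sketch below uses only the standard library; the column names `name` and `size` are hypothetical labels chosen to mirror `$9` and `$5` in the `awk` command, and `captured_text` reproduces the output shown above.

```python
import csv
import io

# Text as it would be captured by `%capture stdout` (values from the output above).
captured_text = '10:35,0\n11:29,0\n13:44,0\n13:42,0\n11:30,0\n'

# Hypothetical column names; the captured text itself has no header line.
columns = ('name', 'size')
rows = [dict(zip(columns, (first, int(second))))
        for first, second in csv.reader(io.StringIO(captured_text))]
print(rows[0])
```

With Pandas, `pandas.read_csv(io.StringIO(captured_text), header=None, names=list(columns))` would give a DataFrame that keeps all five lines as data.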


The `%capture` magic can capture data in `text` (default), `json`, `csv`, and `tsv` format, and can append to an existing variable instead of replacing it (option `-a`). Please refer to the [SoS Magics reference](sos_magics.html) or command `%capture -h` for a complete list of options.

## <a id="magic_render"></a>Intercept and render cell output with magic `%render` 

The `%render` magic intercepts the output of a cell and converts it to a certain format before displaying it in the notebook. The format can be any format supported by the [`IPython.display` module](https://ipython.org/ipython-doc/3/api/generated/IPython.display.html) and defaults to `Markdown`.

For example, if you have a dataset 

In [20]:
data = [('John', 'Smith', 50),
        ('Eve', 'Jackson', 35)
       ]

You can format it in HTML format

In [21]:
res = '''
<table>
  <tr>
    <th>Firstname</th>
    <th>Lastname</th> 
    <th>Age</th>
  </tr>
'''
for first_name, last_name, age in data:
    res += f'''
  <tr>
    <td>{first_name}</td>
    <td>{last_name}</td> 
    <td>{age}</td>
  </tr>
'''
res += '</table>'

print(res)


<table>
  <tr>
    <th>Firstname</th>
    <th>Lastname</th> 
    <th>Age</th>
  </tr>

  <tr>
    <td>John</td>
    <td>Smith</td> 
    <td>50</td>
  </tr>

  <tr>
    <td>Eve</td>
    <td>Jackson</td> 
    <td>35</td>
  </tr>
</table>


and render it as an HTML table.

In [22]:
%render --as HTML
res

Firstname,Lastname,Age
John,Smith,50
Eve,Jackson,35


Currently `%render` only renders `stdout` (of `stream` messages, default) and `text` (`text/plain` of `display_data` messages) contents, and you should probably use `%capture raw` to check the type of output before you `%render`.

The `%render` magic accepts any renderer that is defined in the `IPython.display` module. The following cell lists all renderers,

In [23]:
import IPython.display
import inspect

print('Options of magic %render')
for key in IPython.display.__dict__.keys():
    cls = getattr(IPython.display, key)
    if inspect.isclass(cls) and issubclass(cls, IPython.display.DisplayObject):
        print('* {}'.format(key))

Options of magic %render
* DisplayObject
* TextDisplayObject
* Pretty
* HTML
* Markdown
* Math
* Latex
* SVG
* ProgressBar
* JSON
* GeoJSON
* Javascript
* Image
* Video
* Audio
* Code


and a `%render` magic without options treats the output as Markdown and displays the items as bullet points:

In [24]:
%render
import IPython.display
import inspect

print('Options of magic %render')
for key in IPython.display.__dict__.keys():
    cls = getattr(IPython.display, key)
    if inspect.isclass(cls) and issubclass(cls, IPython.display.DisplayObject):
        print('* {}'.format(key))

Options of magic %render
* DisplayObject
* TextDisplayObject
* Pretty
* HTML
* Markdown
* Math
* Latex
* SVG
* ProgressBar
* JSON
* GeoJSON
* Javascript
* Image
* Video
* Audio
* Code


The ability to render text output as Markdown alleviates a problem with Jupyter notebooks: their markdown cells cannot contain variables, so you cannot mix results with their descriptions as easily as R Markdown inline expressions allow. However, with the `%render` magic, you can write Markdown text as a string in any kernel, and use the `%render` magic to display it.

For example, if you have `res` obtained from some analysis

In [25]:
res <- rnorm(5)

You can report the result by generating Markdown text programmatically and using the `%render` magic to render it:

In [26]:
%render
cat(paste0('Array of size ', length(res), '\n\n'),
    paste('*', res, '\n'))

Array of size 5

 * 1.08093160164272 
 * 0.847488857762835 
 * -0.702061156101016 
 * 0.471996793250837 
 * -0.200615864745463 


This is less intuitive than writing Markdown text directly, but you have the flexibility to generate the Markdown text in any language, and you can use conditions and loops to automate the output of long reports.
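
As an illustration of using loops and conditions, the following sketch builds a small Markdown report in Python. The function `markdown_report` and its `threshold` parameter are hypothetical and exist only for this example; in a cell that starts with `%render`, the printed text would be expected to display as formatted Markdown rather than plain text.

```python
def markdown_report(values, threshold=0.0):
    """Build Markdown text summarizing a list of numbers (illustrative only)."""
    lines = [f'Array of size {len(values)}', '']
    for value in values:
        # Values above the threshold are wrapped in ** to make them bold.
        marker = '**' if value > threshold else ''
        lines.append(f'* {marker}{value:.3f}{marker}')
    return '\n'.join(lines)


report = markdown_report([1.081, 0.847, -0.702])
print(report)
```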

## Further reading

* [Exchanging data among kernels](doc/user_guide/exchange_variable.html) for other options to exchange data between kernels