# Lecture 9 - Introduction to Systems Biology

In this lecture you have learned about the main types of biological pathways. Most pathway databases like [Reactome](https://reactome.org/), [KEGG](https://www.genome.jp/kegg/), and [BioCyc](https://biocyc.org/) allow you to visually navigate through the pathways and search for certain keywords. 

These user-friendly features are convenient for quick searches, but they are less useful when you need to repeat the same search many times and would rather automate it. Many biological databases offer programmatic access through a [REST](https://en.wikipedia.org/wiki/Representational_state_transfer) API.

> In short, a REST API is a way to perform a query over the web using HTTP, where the query parameters are encoded in the URL and the response contains the query result in a text format (plain text, XML, JSON, or HTML).
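
To make "query parameters encoded in the URL" concrete, here is a minimal sketch using only the standard library. It builds the kind of URL that an HTTP client would send for the Rhea endpoint used later in this notebook; the variable names (`BASE_URL`, `params`) are illustrative, and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://www.rhea-db.org/rhea"  # same endpoint as in Exercise 1

# Query parameters are plain key/value pairs...
params = {"query": "ec:1.1.1.1", "format": "tsv"}

# ...that are percent-encoded and appended to the URL after a "?"
url = BASE_URL + "?" + urlencode(params)
print(url)

# Decoding the query string recovers the original parameters
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

Note how the `:` in `ec:1.1.1.1` is escaped in the URL. Libraries such as `requests` do this encoding for you when you pass a `params` dictionary.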


### Learning objectives:

- Learn to *"browse the web"* (i.e. to read data from webpages) using Python
- Have a first (very brief) encounter with the *Pandas* library
- Use *pyvis* to draw networks inside a Jupyter notebook

## Exercise 1

In this exercise we will query the [**Rhea**](https://www.rhea-db.org/) database to obtain all the biochemical reactions in the aromatic amino acid pathway. 

**Rhea** provides a REST API as described in the [**documentation**](https://www.rhea-db.org/help/rest-api). We can use the [**requests**](https://docs.python-requests.org/en/latest/) library in Python to make a request to a REST API. 

In [None]:
import requests

RHEA_URL = "https://www.rhea-db.org/rhea?"

response = requests.get(RHEA_URL, params={'query': 'ec:1.1.1.1'})

By default, Rhea will return an HTML page just as if you had performed the query directly on the website:

In [None]:
from IPython.display import display, HTML

HTML(response.text)

-------

As explained in the [**documentation**](https://www.rhea-db.org/help/rest-api), we can instead request the result as a tab-separated table:

In [None]:
response = requests.get(RHEA_URL, params={'query': 'ec:1.1.1.1', 'format': 'tsv'})

print(response.text)

The [**Pandas**](https://pandas.pydata.org/) library can help us create a DataFrame object directly from this tab-separated table:

> PS: I highly recommend you explore the [documentation](https://pandas.pydata.org/docs/user_guide/index.html). Pandas is one of the **best** scientific libraries in Python.

In [None]:
import pandas as pd
import io

pd.read_csv(io.StringIO(response.text), sep='\t')

### 1.1

Use the Rhea API to find all the reactions in the aromatic amino acid pathway (using the list of EC numbers below) and return the result as a Pandas DataFrame.

> Tip: you can search for multiple terms in Rhea if you separate them with " or ".

In [None]:
pathway = ['2.7.1.71', '2.5.1.19', '4.2.3.5', '5.4.99.5', '4.2.1.51', '1.3.1.12',
           '2.6.1.57', '4.1.3.27', '2.4.2.18', '5.3.1.24', '4.1.1.48', '4.2.1.20']

In [None]:
# type your code here...

Click below to see the solution...

In [None]:

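# Join the EC numbers into one search expression: "2.7.1.71 or 2.5.1.19 or ..."
query = ' or '.join(pathway)

# Request TSV output restricted to the three columns we need
response = requests.get(RHEA_URL, params={'query': query,
                                          'format': 'tsv',
                                          'columns': 'rhea-id,ec,equation'})

# Parse the tab-separated text into a DataFrame
df = pd.read_csv(io.StringIO(response.text), sep='\t')

df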
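
The query above searches for the EC numbers as plain terms. A possible variation is to prefix each number with the `ec:` field used in the first example. This is a sketch under the assumption that Rhea accepts field-prefixed terms combined with " or "; whether it changes the result for this pathway has not been checked here. Only the query string is built, using a shortened list for readability.

```python
# First three EC numbers from the pathway list above (shortened for readability)
pathway_subset = ['2.7.1.71', '2.5.1.19', '4.2.3.5']

# Assumption: field-prefixed terms can be joined with " or " like plain terms
query = ' or '.join(f'ec:{ec}' for ec in pathway_subset)
print(query)
```

The resulting string could be passed as the `'query'` parameter in place of the plain version.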

## Exercise 2

[**Pyvis**](https://pyvis.readthedocs.io/en/latest/) is a Python library for interactive network visualization. Here is a simple example of how to use it:

In [None]:
from pyvis.network import Network

net = Network(directed=True, notebook=True, height='300px', width='500px')

nodes = ['a', 'b', 'c', 'd']
net.add_nodes(nodes)

edges = [('a', 'b'), ('b', 'c'), ('b', 'd')]
net.add_edges(edges)

net.show('tmp1.html')
display(HTML(filename='tmp1.html')) # fix for html not displaying correctly 

> Tip: try to drag around the nodes and see what happens.
    
--------- 

We can use [bipartite graphs](https://en.wikipedia.org/wiki/Bipartite_graph) (graphs with two kinds of nodes) to represent chemical reactions.

Let's create a simple network with two reactions:
- R1: a + b -> c
- R2: c -> d + e

In [None]:
net = Network(directed=True, notebook=True, height='300px', width='500px')

metabolites = ['a', 'b', 'c', 'd', 'e']
for m in metabolites:
    net.add_node(m)

reactions = ['R1', 'R2']
for r in reactions:
    net.add_node(r, shape='box')

substrates = [('a', 'R1'), ('b', 'R1'), ('c', 'R2')]
net.add_edges(substrates)

products = [('R1', 'c'),  ('R2', 'd'), ('R2', 'e')]
net.add_edges(products)

net.show('tmp2.html')
display(HTML(filename='tmp2.html'))
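
The substrate and product edge lists above were typed by hand. As a sketch of the same idea in plain Python, the edges of the bipartite graph can be derived from a small reaction table. The `reactions` dictionary layout and the helper name `bipartite_edges` are illustrative choices, not part of Pyvis.

```python
# Each reaction maps to (substrates, products); same toy network as above
reactions = {
    'R1': (['a', 'b'], ['c']),
    'R2': (['c'], ['d', 'e']),
}

def bipartite_edges(reactions):
    """Return the directed (source, target) edges of the bipartite graph."""
    edges = []
    for name, (substrates, products) in reactions.items():
        edges += [(s, name) for s in substrates]  # metabolite -> reaction
        edges += [(name, p) for p in products]    # reaction -> metabolite
    return edges

edges = bipartite_edges(reactions)
print(edges)
```

Every edge connects a metabolite to a reaction or a reaction to a metabolite, never two nodes of the same kind, which is what makes the graph bipartite. Once the nodes have been added, the list can be passed to `net.add_edges(edges)`.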

### 2.1

Create a metabolic network using the reactions in the dataframe obtained in the previous exercise:

> Tip 1: use [iterrows()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iterrows.html) to iterate over the rows in the DataFrame.

> Tip 2: use the string [split()](https://docs.python.org/3/library/stdtypes.html#str.split) method to break the equations into substrates and products.

> Tip 3: this is a hard exercise, so don't be frustrated if it takes a while... 💪



In [None]:
# type your code here...

Click below to see the solution...

In [None]:

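# Larger canvas, since the pathway has many more nodes than the toy examples
net = Network(directed=True, notebook=True, height='600px', width='1000px')

for _, row in df.iterrows():
    # One box-shaped node per reaction, labelled with its EC number
    reaction = row['Reaction identifier']
    net.add_node(reaction, shape='box', label=row['EC number'])

    # Equations are written as "substrates = products"
    left, right = row['Equation'].split(' = ')

    # Substrates point into the reaction node...
    for compound in left.split(' + '):
        net.add_node(compound)
        net.add_edge(compound, reaction)

    # ...and the reaction node points to its products
    for compound in right.split(' + '):
        net.add_node(compound)
        net.add_edge(reaction, compound)

net.show('tmp3.html')
display(HTML(filename='tmp3.html'))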
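
The solution above treats each term between ` + ` signs as a compound name. If an equation carries a stoichiometric coefficient (for example `2 H2O`), that term would become a separate node from `H2O`. The helper below is a minimal sketch of one way to handle this, assuming coefficients are plain integers followed by a space; `parse_equation` is a hypothetical name and the second example equation is made up for illustration.

```python
import re

def parse_equation(equation):
    """Split 'A + 2 B = C' into (substrates, products), dropping coefficients."""
    def side_to_compounds(side):
        compounds = []
        for term in side.split(' + '):
            # Strip a leading integer coefficient followed by a space
            # ("2 H2O" -> "H2O"). Names such as "3-phosphoshikimate" have
            # no space after the digit, so they are left untouched.
            compounds.append(re.sub(r'^\d+\s+', '', term.strip()))
        return compounds

    left, right = equation.split(' = ')
    return side_to_compounds(left), side_to_compounds(right)

print(parse_equation('ATP + shikimate = 3-phosphoshikimate + ADP + H(+)'))
print(parse_equation('2 a + b = c'))  # made-up equation with a coefficient
```

Note that `H(+)` survives intact because the split only happens on a plus sign surrounded by spaces.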

> 🧠 Can you find your way through the graph by *"walking"* from shikimate to L-tryptophan? 
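
"Walking" through a directed graph can also be automated. The sketch below uses a breadth-first search on the toy network from Exercise 2, not on the real pathway, so the puzzle above stays unsolved. The function name `find_path` and the edge-list input format are illustrative assumptions.

```python
from collections import deque

def find_path(edges, start, goal):
    """Breadth-first search: return a shortest path from start to goal, or None."""
    # Build an adjacency list: node -> nodes reachable in one step
    neighbours = {}
    for source, target in edges:
        neighbours.setdefault(source, []).append(target)

    queue = deque([[start]])  # each queue entry is a path explored so far
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in neighbours.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # goal cannot be reached by following edge directions

# Toy network from Exercise 2 (R1: a + b -> c, R2: c -> d + e)
edges = [('a', 'R1'), ('b', 'R1'), ('c', 'R2'),
         ('R1', 'c'), ('R2', 'd'), ('R2', 'e')]

print(find_path(edges, 'a', 'e'))  # ['a', 'R1', 'c', 'R2', 'e']
print(find_path(edges, 'e', 'a'))  # None
```

For the real network, the same `(source, target)` pairs that are passed to `net.add_edge` in the solution could be collected into a list and searched in the same way.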

### Conclusion:

This was a very *condensed* tutorial with many new concepts at once. You learned how to query a web API from Python, how to load tabular data into a Pandas DataFrame (we will explore Pandas in more detail later), and how to create and display networks inside Jupyter using Pyvis.

Don't worry if it felt like too much at once (it was indeed). The main goal was to show you how much stuff you can do with just a few lines of code 😎