# Supplementary material for the brave and curious

## About Jupyter notebooks
<span style="color:red">**Assignment (1 min):**</span> Visit the [Jupyter Project website](https://jupyter.org/index.html). Scroll down on the homepage to the section "Currently in use at". Recognize any companies and/or institutions?  

<span style="color:red">**Assignment (5 min):**</span> For more examples of what you can do with Jupyter notebooks, visit [A gallery of interesting Jupyter Notebooks](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks).

**Noteworthy examples (TA's opinion):**
- [Lung Cancer Post-Translational Modification and Gene Expression Regulation](http://nbviewer.jupyter.org/github/MaayanLab/CST_Lung_Cancer_Viz/blob/master/notebooks/CST_Data_Viz.ipynb?flush_cache=true)
- [2010 US Census data](https://anaconda.org/jbednar/census/notebook)
 
 
<span style="color:red">**Assignment (1 min):**</span> Can you come up with some arguments to support the statement "Jupyter notebooks will revolutionize computational science"?

**Hint:** See [The Paper of the Future by Alyssa Goodman et al. (Authorea Preprint, 2017)](https://dx.doi.org/10.22541/au.148769949.92783646). This article explains, and shows with demonstrations, how scholarly "papers" can morph into long-lasting rich records of scientific discourse, enriched with deep data and code linkages, interactive figures, audio, video, and commenting. It includes an interactive d3.js visualization and has an astronomical data figure with an IPython Notebook "behind" it.

## Restarting the kernels manually

The kernel maintains the state of a notebook's computations. You can reset this state by restarting the kernel. This is done by clicking on the <button class='btn btn-default btn-xs'><i class='fa fa-repeat icon-repeat'></i></button> in the toolbar above. 

**Note:** Restarting the kernel wipes the memory clean.

<span style="color:red">**Assignment (2 min):**</span>
1. Execute the cell below. This stores the variable dna_seq, a short DNA sequence string, in memory.
2. Now restart the kernel manually as described above. 
3. In the second cell below try showing the contents of dna_seq. 

Does Python still know what dna_seq is?
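
The two cells referred to above are not reproduced in this section. As a minimal stand-in, the sketch below uses a made-up sequence for dna_seq and imitates the kernel restart with `del`, since a real restart cannot be triggered from inside a cell. It illustrates the expected behavior and is not the original course cell.

```python
# Cell 1: store a short DNA sequence in the kernel's memory.
# The sequence is a made-up example, not the one used in the course.
dna_seq = "ATGGCCATTGTAATGGGCCGCTGA"
print(dna_seq)

# A kernel restart discards every variable. Inside a single script we can
# only imitate that effect by deleting the variable ourselves.
del dna_seq

# Cell 2: try to show the contents of dna_seq after the "restart".
try:
    print(dna_seq)
    still_defined = True
except NameError as err:
    still_defined = False
    print("NameError:", err)
```

In a real notebook, the second cell would stop with the NameError instead of catching it. The `try`/`except` is only there so that the sketch runs from top to bottom.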

## All you need to know
Some words of wisdom by Tim Peters: the Zen of Python.

In [1]:
# ignore the next two lines: they make the notebook show every output of a cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all" 

import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


## Strings, lists and dictionaries
An example of each of the most important types in Python.

In [2]:
x = 'One definition of systems biology: the study of the interactions between the components of biological systems, \
and how these interactions give rise to the function and behavior of that system (for example, the enzymes and \
metabolites in a metabolic pathway or the heart beats)' # string
print(type(x))
print(x)
print()

x = [0.5,2,24] # list
print(type(x))
print(x)
print('The (rough) cell cycle time of e.coli =',x[0],'hrs, of yeast =',x[1],'hrs, of a human cell',x[2],'hrs')
print()

x = {'e. coli':5,'yeast':12,'human':2.9e3} # dictionary
print(type(x))
print(x)
print('The genome size of e.coli =',x['e. coli'],'Mbp, of yeast =', x['yeast'],'Mbp, of a human cell',x['human'],'Mbp')

<class 'str'>
One definition of systems biology: the study of the interactions between the components of biological systems, and how these interactions give rise to the function and behavior of that system (for example, the enzymes and metabolites in a metabolic pathway or the heart beats)

<class 'list'>
[0.5, 2, 24]
The (rough) cell cycle time of e.coli = 0.5 hrs, of yeast = 2 hrs, of a human cell 24 hrs

<class 'dict'>
{'e. coli': 5, 'yeast': 12, 'human': 2900.0}
The genome size of e.coli = 5 Mbp, of yeast = 12 Mbp, of a human cell 2900.0 Mbp



## For the brave
Mathematically, flux balance analysis (FBA) can be concisely summarized as
\begin{align*}
\text{Maximize } Z &= c^T v, \text{ such that} \\
Sv &= 0, \\
\alpha_k &\leq v_k \leq \beta_k \text{ for all } k.
\end{align*}
Here $Z$ is the objective, $c$ is a column vector of objective coefficients (one per reaction), and $v$ is the column vector of fluxes through all reactions in the model. The objective $c^T v$ is therefore a linear combination of fluxes. $S$ is the stoichiometry matrix, $\alpha_k$ is the lower bound for reaction $k$, and $\beta_k$ is the upper bound for reaction $k$.

This can be thought of as a two-step process. First, all possible solutions are constrained to the ones that allow a steady state and satisfy the bounds (this results in a multidimensional cone within the null space of $S$). Second, the optimal solution is found among the remaining degrees of freedom.
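
To make the constraints concrete, here is a minimal pure-Python sketch on a hypothetical three-reaction chain (-> A -> B ->). The network, the bounds, and the helper names `is_feasible` and `objective` are assumptions made for illustration. The code only checks candidate flux vectors against $Sv = 0$ and the bounds and evaluates $c^T v$; it does not solve the linear program itself.

```python
# Hypothetical toy network: v1: -> A, v2: A -> B, v3: B ->
# Rows of S are the metabolites A and B; columns are the reactions v1, v2, v3.
S = [
    [1, -1, 0],   # A: produced by v1, consumed by v2
    [0, 1, -1],   # B: produced by v2, consumed by v3
]
c = [0, 0, 1]             # objective coefficients: maximize the flux through v3
lower = [0, 0, 0]         # alpha_k, lower bounds
upper = [10, 1000, 1000]  # beta_k, upper bounds; the uptake v1 is capped at 10


def is_feasible(v, tol=1e-9):
    """Return True if v satisfies Sv = 0 and alpha_k <= v_k <= beta_k."""
    steady_state = all(
        abs(sum(s_ij * v_j for s_ij, v_j in zip(row, v))) <= tol for row in S
    )
    within_bounds = all(lo <= v_k <= hi for lo, v_k, hi in zip(lower, v, upper))
    return steady_state and within_bounds


def objective(v):
    """Compute Z = c^T v."""
    return sum(c_k * v_k for c_k, v_k in zip(c, v))


candidates = {
    "half uptake": [5, 5, 5],
    "full uptake": [10, 10, 10],
    "accumulates A": [10, 5, 5],
    "exceeds uptake bound": [12, 12, 12],
}
for name, v in candidates.items():
    print(name, "| feasible:", is_feasible(v), "| Z =", objective(v))
```

In this chain the steady-state condition forces all three fluxes to be equal, so the uptake bound of 10 caps the objective. Of the candidates, only "half uptake" and "full uptake" are feasible, and "full uptake" has the larger $Z$.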

## Parsimonious enzyme-usage flux balance analysis (pFBA)
(Lewis et al., 2010) http://doi.org/10.1038/msb.2010.47

pFBA is an often-used extension of standard flux balance analysis (FBA). It performs FBA as usual, followed by an additional optimization that minimizes the total flux in the network. Intuitively, this means that if two solutions achieve the same growth rate but one is a lot longer than the other (it has more steps, so its total flux is higher), pFBA will return the shorter solution.

Biologically, this may also be interpreted as giving you an optimal solution that is efficient in its enzyme usage.
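
The idea of "same growth rate, smaller total flux" can be sketched in plain Python. The two flux distributions, their reaction names, and the helper `total_flux` are made up for illustration. The actual pFBA solves a second optimization problem rather than comparing hand-written candidates.

```python
# Two hypothetical flux distributions that reach the same objective flux.
short_route = {
    "uptake": 10.0, "direct": 10.0, "detour_1": 0.0, "detour_2": 0.0, "biomass": 10.0,
}
long_route = {
    "uptake": 10.0, "direct": 0.0, "detour_1": 10.0, "detour_2": -10.0, "biomass": 10.0,
}


def total_flux(fluxes):
    """Sum of absolute fluxes, so that backward (negative) fluxes also count."""
    return sum(abs(value) for value in fluxes.values())


solutions = {"short": short_route, "long": long_route}

# FBA cannot distinguish the two: the objective flux is identical.
same_objective = short_route["biomass"] == long_route["biomass"]

# The parsimony step prefers the solution with the smallest total flux.
most_parsimonious = min(solutions, key=lambda name: total_flux(solutions[name]))

print("Same objective flux:", same_objective)
print("Total flux, short route:", total_flux(short_route))
print("Total flux, long route:", total_flux(long_route))
print("Preferred solution:", most_parsimonious)
```

Without `abs()`, the negative flux through `detour_2` would cancel the flux through `detour_1`, and the longer route would wrongly appear cheaper.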

**Note:** pFBA has one pitfall: the 'objective_value' it returns is the total flux in the network, not the objective flux. As with model.optimize(), the individual fluxes are available through the 'fluxes' attribute of the solution.

In [3]:
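# pfba is provided by cobrapy
from cobra.flux_analysis import pfba

# M must already hold the model (it is not defined in this section);
# without it, this cell raises a NameError
model_pfba = M.copy()
sol = pfba(model_pfba)
print('Solution status: {}'.format(sol.status))
print('Objective value: {}'.format(sol.fluxes['biomass_reaction']))
print('Total flux in pFBA solution: {}'.format(sol.objective_value))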

NameError: name 'M' is not defined

In [None]:
model_fba = M.copy()

sol = model_fba.optimize()

# total_flux = ...  # calculate the total flux here, using sol.fluxes
# Take the absolute value of each flux (use abs()) so that negative fluxes
# (reactions running in the backward direction) do not reduce the total.

print('Total flux in FBA solution: {}'.format(total_flux))

<span style="color:red">**Assignment (3 min):**</span> Using the cell below, inspect the inputs and outputs of the network in the FBA and pFBA solutions. Are the growth rates still the same? What happens to the amount of flux going in and out of the network?

In [None]:
model_fba.summary()
print() # empty line
model_pfba.summary()