## Synthesizing building blocks

In [None]:
import cobra
from qbio_resources.plotting_functions import substrate_name_to_rxn, plot_theoretical_yields
from matplotlib import pyplot
%matplotlib inline

----
## A) Simulating carbon yield of key biomass components for 10 substrates

This section will assess the potential of 10 carbon substrates to produce essential biomass components. 

To do this, simulate the maximum theoretical yield of the following biomass components:
 - Histidine
 - ATP hydrolysis (using ATPM)
 
using the substrates contained in the dictionary below:

In [None]:
print(substrate_name_to_rxn)

### 1) Load the model and save it as a variable called `model` 

 - Set lower bound of the ATPM reaction to 0. This reaction will be discussed later


### 2) Iterate through the substrates and optimize for histidine production

**Instructions**
- Use -10 $\frac{mmol}{gDW \cdot hr}$ as the lower bound for each substrate


- Store the maximum theoretical yield in a `list` called `yield_list`
  - The yield is defined as $\mathrm{\frac{product\_secretion\_flux}{absolute\_value\_of\_substrate\_uptake\_flux}}$ 
  
  
- Store the substrate names in a `list` called `substrate_list`


- Store the number of reactions active in each condition in a `list` called `num_reaction_list`
  - Only consider reactions with an absolute flux value $\geq 10^{-6}$ 
  
  
- Use the `cobra.flux_analysis.pfba()` function, as in Exercise 1, to simulate the model
  - pFBA minimizes the total flux at the optimal objective value, which gives an approximately minimal set of reactions that must be active
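
The active-reaction count described above can be sketched without cobrapy, assuming the fluxes have already been pulled out of the simulation result into a plain dictionary. The helper name `count_active_reactions` and the toy flux values are illustrative only and are not part of the exercise code.

```python
def count_active_reactions(fluxes, threshold=1e-6):
    """Count reactions whose absolute flux is at least `threshold`."""
    return sum(1 for flux in fluxes.values() if abs(flux) >= threshold)


# Toy flux distribution (made-up numbers, not model output).
example_fluxes = {"PFK": 7.5, "PGI": -2.1, "PPM": 1e-9, "ASPtpp": 0.0}
n_active = count_active_reactions(example_fluxes)
print(n_active)
```

With a cobrapy solution object, something like `solution.fluxes.to_dict()` should give a dictionary of this shape, since `solution.fluxes` is a pandas `Series`.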
  
**Hints:** 

1) You will be reusing the same or similar code to run each of the following analyses. It might be worth writing a function to do this.

2) You may run into divide-by-zero errors when calculating the product yield if histidine cannot be synthesized from a particular substrate. You will need to find a way to account for this.
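
One possible way to handle the divide-by-zero case is to treat a negligible substrate uptake as a yield of zero. The sketch below assumes that convention; the function name `theoretical_yield`, the `tolerance` parameter, and the toy flux pairs are all illustrative rather than taken from the exercise.

```python
def theoretical_yield(product_flux, substrate_flux, tolerance=1e-9):
    """Return product flux per unit of substrate taken up.

    If essentially no substrate is consumed, report a yield of 0.0
    instead of dividing by zero (an assumed convention).
    """
    uptake = abs(substrate_flux)
    if uptake < tolerance:
        return 0.0
    return product_flux / uptake


# Toy (product flux, substrate flux) pairs; made-up numbers, not model output.
toy_results = {"glucose": (2.0, -10.0), "no_uptake_substrate": (0.0, 0.0)}

substrate_list = []
yield_list = []
for name, (product_flux, substrate_flux) in toy_results.items():
    substrate_list.append(name)
    yield_list.append(theoretical_yield(product_flux, substrate_flux))

print(substrate_list, yield_list)
```

Wrapping the calculation in a function also fits the first hint, because the same helper can be reused for the anaerobic and ATP analyses.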

### 3) Use the `plot_theoretical_yields()` function imported above to visualize results

This function takes the following arguments (in order):
1. List of substrate yields
2. List of substrate names
3. The name (string) of the metabolite that was optimized for
4. List of the number of reactions active in the simulation

### 4) Visualize the correlation (or lack thereof) between the maximum theoretical yield and the number of active reactions

 - Use matplotlib's scatter plot function which can be executed with `pyplot.scatter(x_values, y_values)` where `x_values` and `y_values` correspond to lists of values that should be plotted on the x- and y-axis, respectively
 
 - Add x- and y-axis labels with `pyplot.xlabel(label_string)` and `pyplot.ylabel(label_string)` 
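
A scatter plot shows the relationship visually; a Pearson correlation coefficient can put a number on it. The following is a plain-Python sketch, with a hypothetical helper name and made-up inputs, that could be applied to `yield_list` and `num_reaction_list` once they exist.

```python
from math import sqrt


def pearson_correlation(x_values, y_values):
    """Return the Pearson correlation, or None if either input is constant."""
    n = len(x_values)
    if n != len(y_values) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(x_values, y_values))
    variance_x = sum((x - mean_x) ** 2 for x in x_values)
    variance_y = sum((y - mean_y) ** 2 for y in y_values)
    if variance_x == 0 or variance_y == 0:
        return None  # correlation is undefined for constant data
    return covariance / sqrt(variance_x * variance_y)


# Made-up values for illustration only.
r = pearson_correlation([1, 2, 3], [2, 4, 6])
print(r)
```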

### 5) Repeat the above analysis for anaerobic histidine production

### 6) Repeat the above analysis for aerobic and anaerobic ATP production

To do this, we will use the non-growth-associated ATP maintenance reaction (ATPM). This reaction models cellular ATP demands that are not due to growth (maintaining ion gradients, etc.) and has the following form:

In [None]:
model.reactions.ATPM.reaction

Therefore, maximizing flux through this ATP hydrolysis reaction indicates the maximum amount of ATP that can be produced from a given substrate.

---
## B) Assess solution variability
One caveat of COBRA methods is that an optimal FBA solution is generally not unique. In fact, there are often infinitely many alternative optimal flux distributions that achieve the same objective value, all lying within what is called the "solution space". Many model predictions therefore need to account for this variability in order to be meaningful.

A common way to get an understanding of solution variability is called "flux variability analysis" (FVA). This section will outline how this method is used.
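
In symbols, FVA can be written as a pair of linear programs for each reaction $i$. The notation used here is the conventional one and is introduced only for illustration: $S$ is the stoichiometric matrix, $v$ the flux vector with bounds $v^{lb}$ and $v^{ub}$, $c$ the objective vector, $Z_{opt}$ the optimal objective value from FBA, and $\gamma$ the fraction of the optimum that must be retained.

$$
\begin{aligned}
\max_{v} \ \text{and} \ \min_{v} \quad & v_i \\
\text{subject to} \quad & S v = 0 \\
& v^{lb} \leq v \leq v^{ub} \\
& c^{T} v \geq \gamma \, Z_{opt}
\end{aligned}
$$

With $\gamma = 1$ the last constraint forces the objective to stay at its optimum, so the resulting range for $v_i$ describes the variability among the alternative optimal solutions only.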

### 1) Like above, optimize for the aerobic synthesis of histidine from glucose
- Save the maximum histidine synthesis flux in a variable called `max_flux`

### 2) Set the upper and lower bound of the histidine production flux equal to `max_flux`

### 3) Find the maximum and minimum flux possible through phosphofructokinase (PFK) at the maximum histidine synthesis condition
- The solutions to the two optimizations represent the total flux variability of PFK at optimal histidine production

**Hint:** the minimum flux can be found by executing `model.optimize('minimize')`

### 4) Reset the model
 - Return the objective to `EX_his__L_e` and return `EX_his__L_e`'s default bounds (lower_bound = 0, upper_bound = 1000)
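
Pinning a flux and later restoring its bounds (steps 2 and 4) can be bundled into a small context manager so that the reset is not forgotten. This is a sketch under the assumption that the reaction object exposes a `bounds` attribute holding a `(lower, upper)` tuple, as cobrapy reactions are expected to; the name `pinned_flux` and the stand-in object are hypothetical.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def pinned_flux(reaction, value):
    """Temporarily fix a reaction's flux by setting both bounds to `value`."""
    original_bounds = reaction.bounds
    reaction.bounds = (value, value)
    try:
        yield reaction
    finally:
        # Restore the original bounds even if an error occurs inside the block.
        reaction.bounds = original_bounds


# Stand-in for a reaction object (assumption: only `.bounds` is needed).
toy_exchange = SimpleNamespace(bounds=(0.0, 1000.0))

with pinned_flux(toy_exchange, 1.5):
    bounds_while_pinned = toy_exchange.bounds

bounds_after = toy_exchange.bounds
print(bounds_while_pinned, bounds_after)
```

Setting both bounds in a single assignment avoids a transient state in which the lower bound exceeds the upper bound. The sketch does not touch the model objective, which still has to be reset separately.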

### 5) Perform flux variability analysis on every reaction in the model

 - Use the cobra function `cobra.flux_analysis.flux_variability_analysis()` with `fraction_of_optimum=1`. This will perform the optimizations above on every reaction in the model at the optimal solution.

 - Assign the output to a variable called `fva_solution`

### 6) Remove values from `fva_solution` with 'maximum' and 'minimum' absolute values < $10^{-3}$ 

- Store the resulting values in two lists `maximum_flux` and `minimum_flux` 

**Hints:**

1) The maximum or minimum values can be obtained from `fva_solution` with:

In [None]:
fva_solution['maximum'].values

2) Two lists of equal length can be iterated through together using `zip`, as shown below:

In [None]:
a = [1, 2, 3]
b = [4, 5, 6]
for i1, i2 in zip(a,b):
    print(i1, i2)
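
Combining the two hints, the filtering step might look like the sketch below. It assumes that a reaction is dropped only when both its maximum and its minimum are below the threshold in absolute value, which is one reading of the instruction; the function name and toy values are illustrative.

```python
def filter_small_ranges(maximum_values, minimum_values, threshold=1e-3):
    """Keep only reactions whose max or min flux is at least `threshold` in size."""
    maximum_flux, minimum_flux = [], []
    for max_value, min_value in zip(maximum_values, minimum_values):
        # Skip reactions that carry essentially no flux in either direction.
        if abs(max_value) < threshold and abs(min_value) < threshold:
            continue
        maximum_flux.append(max_value)
        minimum_flux.append(min_value)
    return maximum_flux, minimum_flux


# Made-up FVA-style values, not model output.
toy_maximum = [5.0, 1e-5, 0.0, 2.0]
toy_minimum = [1.0, -1e-6, -2.0, 2.0]
maximum_flux, minimum_flux = filter_small_ranges(toy_maximum, toy_minimum)
print(maximum_flux, minimum_flux)
```

Filtering both lists in the same loop keeps them aligned, so each position still refers to the same reaction when the lists are plotted.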

### 7) Visualize the variability of the reactions in the optimal solution using the `plot_variability()` function below

In [None]:
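def plot_variability(maximum_flux, minimum_flux):
    """Plot the FVA flux range of each reaction on a symmetric log scale."""
    x_array = list(range(len(maximum_flux)))
    # Draw the minimum and maximum flux of every reaction as two lines
    pyplot.plot(x_array, minimum_flux, x_array, maximum_flux, '#1f77b4')
    # Shade the region between the two lines
    pyplot.fill_between(x_array, minimum_flux, maximum_flux)
    # symlog copes with zero and negative fluxes, unlike a plain log scale
    pyplot.yscale('symlog')
    pyplot.xlabel('Reaction')
    pyplot.ylabel('Max and min flux possible in model')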

----
## C) Sampling the solution space

Methods have been developed to address the uncertainty that comes from alternate optimal solutions. One such method is called "solution space sampling". This approach generates many feasible flux distributions from the model (often 10,000+ samples) until the solution space has been effectively sampled. The resulting distributions indicate which flux values are most "likely" for each reaction within the solution space.

This section will outline how this approach can be implemented.

### 1) Like above, set the lower and upper bound of histidine exchange equal to the optimal histidine synthesis flux in glucose aerobic conditions

### 2) Sample the model 300 times using the `cobra.flux_analysis.sample()` function
- This may take a few minutes to run
- Assign the output to a variable called `sampled_fluxes` 

### 3) Look up the sampled distribution of fluxes for glucose uptake 
This is a reaction that we would expect to show no variability at the optimal solution, as glucose is the limiting substrate.

**Hint:** `sampled_fluxes` is a pandas `DataFrame` whose columns correspond to each reaction in the model. The values in a column can be returned by executing `data_frame_variable['column_name']`
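
A quick numerical check of "no variability" is to compare the largest and smallest sampled values. The sketch below works on a plain list, so a column would first be converted with something like `sampled_fluxes['EX_glc__D_e'].tolist()`; that reaction ID is an assumption here, as are the helper names and toy values.

```python
def flux_spread(samples):
    """Return the difference between the largest and smallest sampled flux."""
    return max(samples) - min(samples)


def is_effectively_constant(samples, tolerance=1e-6):
    """True if all sampled fluxes agree to within `tolerance`."""
    return flux_spread(samples) <= tolerance


# Made-up samples for illustration only.
toy_glucose_samples = [-10.0, -10.0, -10.0]
toy_variable_samples = [0.5, 3.2, 7.9]

print(is_effectively_constant(toy_glucose_samples))
print(is_effectively_constant(toy_variable_samples))
```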

### 4) Visualize the variability of the following reactions as a histogram
Attempt to infer the most "likely" fluxes that would be carried by the following reactions:

1. Phosphofructokinase (PFK)
2. Glucose-6-phosphate isomerase (PGI)
3. Phosphopentomutase (PPM)
4. L-aspartate uptake via facilitated diffusion (ASPtpp)

**Hints:** 

1) The method for querying column values (shown above) returns a pandas `Series`. This data type has a method that can be used to easily create histograms: `series_variable_name.hist()`

2) Set the number of bins to 25

3) The `series_variable_name.hist()` function returns an instance of a matplotlib `Axes`. Assign this to a variable named `ax`.

4) The x-axis and y-axis labels can be set using `ax.set_xlabel(label_name)` and `ax.set_ylabel(label_name)`, respectively
   - The flux value of the sampling should be plotted on the x-axis and the number of sampling solutions corresponding to that flux value should be plotted on the y-axis
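
To make "most likely flux" concrete, the binning behind a histogram can also be done by hand and the fullest bin read off directly. This is a minimal plain-Python sketch with hypothetical helper names and made-up samples; it uses equal-width bins with the top edge included in the last bin, and it is not how pandas implements `hist()`.

```python
def histogram_counts(values, bins=25):
    """Return (left_edge, right_edge, count) for equal-width bins."""
    lowest, highest = min(values), max(values)
    if lowest == highest:
        # All samples are identical, so a single bin holds everything.
        return [(lowest, highest, len(values))]
    width = (highest - lowest) / bins
    counts = [0] * bins
    for value in values:
        # Clamp so that the largest value falls into the last bin.
        index = min(int((value - lowest) / width), bins - 1)
        counts[index] += 1
    return [(lowest + i * width, lowest + (i + 1) * width, count)
            for i, count in enumerate(counts)]


def most_populated_bin(values, bins=25):
    """Return the (left_edge, right_edge, count) entry with the highest count."""
    return max(histogram_counts(values, bins), key=lambda entry: entry[2])


# Made-up samples for illustration only.
toy_samples = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0]
left, right, count = most_populated_bin(toy_samples, bins=2)
print(left, right, count)
```

The fullest bin is only a rough guide: with few samples or a flat distribution, several bins may be nearly tied, and the histogram itself is the better picture.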