# Setup of Notebook

The following code clones the GitHub repository with the course files.
It then imports all libraries and custom modules needed for this notebook.

In [1]:
# clone the github repository
!git clone https://github.com/DataHow/analytics-course-scripts.git

Cloning into 'analytics-course-scripts'...
remote: Enumerating objects: 336, done.
remote: Counting objects: 100% (100/100), done.
remote: Compressing objects: 100% (70/70), done.
remote: Total 336 (delta 63), reused 65 (delta 30), pack-reused 236
Receiving objects: 100% (336/336), 5.36 MiB | 15.13 MiB/s, done.
Resolving deltas: 100% (197/197), done.


In [2]:
# import libraries
import pandas as pd
import numpy as np
import scipy
import importlib  
import itertools
import scipy.integrate
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from matplotlib import pyplot as plt
pd.set_option('display.max_rows', None)

# import custom modules
simulator = importlib.import_module("analytics-course-scripts.scripts.modules.simulator")
plothelpers = importlib.import_module("analytics-course-scripts.scripts.modules.plothelpers")

# Introduction to Cell Culture Fed-Batch Process Simulator

The simulator provides in-silico data for testing some of the machine learning tools discussed during the course. It mimics the behavior of a fed-batch cell culture process in which only a few components are present:


*   The cells, which are responsible for making the product, consume glucose to sustain their metabolism and produce lactate as a by-product. They are tracked as VCD (viable cell density), typically expressed in million cells/mL.
*   Glucose (Glc) is consumed by the cells and is continuously fed to the process (F_glc). Glucose concentrations that are too low slow down cell growth and product formation. Glucose concentrations that are too high poison the system and accelerate cell death.
*   Lactate (Lac) is a by-product of the cells and is toxic to them, so lactate concentrations that are too high slow down cell metabolism and accelerate cell death.
*   Product (Titer) is produced by the cells. The faster the cells grow, the less product they produce.

The simulator uses the following equations to generate the in-silico data.

### Model Equation Parameters

- Balance on VCD: $\frac{dVCD}{dt} = (\mu_{g} - \mu_{d})\,VCD$
- Balance on Glucose: $\frac{dGlc}{dt} = -k_{Glc} \frac{Glc}{Glc + 0.05}\,VCD + F_{Glc}$
- Balance on Lactate: $\frac{dLac}{dt} = k_{Lac}\,VCD$
- Balance on Product (Titer): $\frac{dProd}{dt} = k_{Prod}\frac{Glc}{Glc + K_{g,Glc}}\left(\frac{\mu_{g}}{\mu_{g,max}}\right)^{2} VCD - 2\frac{dAggr}{dt}$

Where:
- Growth rate: $\mu_{g} = \mu_{g,max}\frac{Glc}{Glc + K_{g,Glc}}\frac{K_{i,Lac}}{Lac + K_{i,Lac}}$
- Death rate: $\mu_{d} = \mu_{d,max}\left(1 + \frac{\varphi}{1 + \varphi}\right)\frac{Lac}{Lac + K_{d,Lac}}$
- Glc saturation: $\varphi = e^{0.1(Glc - 75)}$
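
As a rough illustration of how these balances fit together, the sketch below codes the right-hand side in Python and integrates it with `scipy.integrate.solve_ivp`. It is not the course's `simulator` module. The function names `rates` and `rhs` are hypothetical, time is assumed to be in hours with the feed rate given per day (hence the division by 24), the feed is assumed to be a simple on/off step between the start and end day, and the aggregation term $-2\frac{dAggr}{dt}$ is left out because $Aggr$ is not defined in this section.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Default rate constants, copied from the notebook's parameter cell.
PARAMS = dict(mu_g_max=0.05, mu_d_max=0.025, K_g_Glc=1.0, K_I_Lac=30.0,
              K_d_Lac=50.0, k_Glc=0.04, k_Lac=0.06, k_Prod=1.0)


def rates(glc, lac, p):
    """Return (mu_g, mu_d) for the given glucose and lactate levels."""
    mu_g = (p["mu_g_max"] * glc / (glc + p["K_g_Glc"])
            * p["K_I_Lac"] / (lac + p["K_I_Lac"]))
    phi = np.exp(0.1 * (glc - 75.0))  # glucose saturation term
    mu_d = (p["mu_d_max"] * (1.0 + phi / (1.0 + phi))
            * lac / (lac + p["K_d_Lac"]))
    return mu_g, mu_d


def rhs(t, y, p, feed_start, feed_end, feed_rate):
    """Right-hand side of the four balances (aggregation term omitted)."""
    vcd, glc, lac, prod = y
    glc = max(glc, 0.0)  # guard added for this sketch only
    mu_g, mu_d = rates(glc, lac, p)
    # Assumption: t in hours, feed window in days, feed rate in g/L/day.
    feeding = feed_start * 24.0 <= t < feed_end * 24.0
    f_glc = feed_rate / 24.0 if feeding else 0.0
    d_vcd = (mu_g - mu_d) * vcd
    d_glc = -p["k_Glc"] * glc / (glc + 0.05) * vcd + f_glc
    d_lac = p["k_Lac"] * vcd
    d_prod = (p["k_Prod"] * glc / (glc + p["K_g_Glc"])
              * (mu_g / p["mu_g_max"]) ** 2 * vcd)
    return [d_vcd, d_glc, d_lac, d_prod]


# Initial state: VCD, Glc, Lac, Titer.
y0 = [0.3, 10.0, 0.0, 0.0]
sol = solve_ivp(rhs, (0.0, 14 * 24.0), y0,
                args=(PARAMS, 3.0, 12.0, 12.0),
                t_eval=np.arange(0, 15) * 24.0,  # one sample per day
                max_step=1.0)  # small steps around the feed on/off switch
```

Because of the omitted term and the assumed units, the trajectories from this sketch should not be expected to reproduce the simulator's numbers; it is only meant to make the structure of the equations concrete.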

You can change these rate constants in the simulator to alter the process behavior. Please start with the default values.


### Process Parameters

Set the values of the manipulated process variables:

- Feed start (day): day at which Glc feed is started
- Feed end (day): day at which Glc feed is stopped
- Feed rate: mass rate (g/L/day) at which Glc is fed (continuous feed over 24 hours)
- Initial Glc concentration (g/L): Glc at time t = 0
- Initial VCD (10^6 cells/mL): VCD at time t = 0

# Generate Experiment Run

In [3]:
""" Model parameters: Dictate the behaviour of the cell process """
MU_G_MAX = 0.05
MU_D_MAX = 0.025
K_G_GLC  = 1
K_I_LAC  = 30
K_D_LAC  = 50
K_GLC    = 0.04
K_LAC    = 0.06
K_PROD   = 1
MODEL_PARAM = [MU_G_MAX,MU_D_MAX,K_G_GLC,K_I_LAC,K_D_LAC,K_GLC,K_LAC,K_PROD]

""" Process parameters: Conditions at which process is run """
FEED_START = 3.0
FEED_END = 12.0
GLC_FEED_RATE = 12.0
GLC_0 = 10.0
VCD_0 = 0.3
PROCESS_PARAM = [FEED_START,FEED_END,GLC_FEED_RATE,GLC_0,VCD_0]


In [4]:
# Generate experiment run
t,x = simulator.predict_process(MODEL_PARAM,PROCESS_PARAM)
times = np.array(t)
run = pd.DataFrame(x,columns=["X:VCD", "X:Glc", "X:Lac", "X:Titer"])
run.head(15)

| | X:VCD | X:Glc | X:Lac | X:Titer |
|---|---|---|---|---|
| 0 | 0.3 | 10.0 | 0.0 | 0.0 |
| 1 | 0.877666 | 9.484266 | 0.777573 | 0.129431 |
| 2 | 2.389344 | 8.030174 | 2.971193 | 0.931287 |
| 3 | 5.451007 | 4.419283 | 8.432332 | 7.038657 |
| 4 | 9.954081 | 9.17163 | 19.380135 | 32.67212 |
| 5 | 14.236051 | 9.490923 | 36.991679 | 107.606442 |
| 6 | 16.097601 | 6.789103 | 59.1792 | 249.179059 |
| 7 | 15.297378 | 3.687971 | 82.05645 | 427.837006 |
| 8 | 12.870307 | 2.335591 | 102.445937 | 595.285539 |
| 9 | 10.24535 | 3.474163 | 119.046093 | 738.83047 |


In [5]:
# Plot generated experiment run
fig = make_subplots(rows=1, cols=4, subplot_titles=run.columns)
for i,column in enumerate(run.columns):
    fig.add_trace(go.Scatter(x=times/24, y=run[column].values), row=1, col=i+1)
fig.update_layout(showlegend=False, title_text="X Variable evolution for generated run")
fig.show()

# Generate Design of Experiments

Despite its apparent simplicity and small number of components, the model behaves in complex ways. In this section you can simulate the process over a broad range of the process variables.

The number of simulations is set by `NUM_RUNS`. A Latin hypercube design (LHD) with that many points is created to cover the 5-dimensional space of the manipulated variables evenly.

### Manipulated Variables

For each of the manipulated variables defined in the section above, the user can define the limits of the exploration space (first value: lower limit; second value: upper limit).

*Note: `NUM_RUNS` defines the number of simulations. The generated simulations are saved locally in the file "ExperimentsDOE_test.csv", which will be used as the test set for the machine learning tools in the following lectures.*
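
To make the idea of a Latin hypercube design concrete, here is a minimal sketch using `scipy.stats.qmc`. It is an assumption that this resembles what `simulator.generate_data` does internally; the course module may build its design differently. The bounds mirror the ones in this notebook's DOE cell, and the variable names `lower`, `upper`, and `design` are illustrative only.

```python
import numpy as np
from scipy.stats import qmc

# Manipulated variables and their [lower, upper] limits.
names = ["feed_start", "feed_end", "Glc_feed_rate", "Glc_0", "VCD_0"]
lower = [1.0, 8.0, 5.0, 10.0, 0.1]
upper = [4.0, 12.0, 20.0, 80.0, 1.0]
num_runs = 40

# Draw num_runs points in the unit hypercube: each variable's range is
# split into num_runs equal strata and every stratum is hit exactly once.
sampler = qmc.LatinHypercube(d=len(names), seed=0)
unit_sample = sampler.random(n=num_runs)

# Rescale from [0, 1) to the physical limits of each variable.
design = qmc.scale(unit_sample, lower, upper)
```

Each row of `design` is one experiment and each column one manipulated variable, in the order given by `names`.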




In [6]:
""" DOE Dataset definition (variable = [lower bound, upper bound]) """
""" Model parameters: Dictate the behaviour of the cell process """
MU_G_MAX = 0.05
MU_D_MAX = 0.025
K_G_GLC  = 1
K_I_LAC  = 30
K_D_LAC  = 50
K_GLC    = 0.04
K_LAC    = 0.06
K_PROD   = 1

""" Process parameters: Conditions at which process is run """
FEED_START = [1, 4]
FEED_END = [8, 12]
GLC_FEED_RATE = [5, 20]
GLC_0 = [10, 80.0]
VCD_0 = [0.1, 1.0]

""" Number of experiments to generate """
NUM_RUNS = 40

""" Filename and filepath for the dataset """
FILENAME = "owu.csv"
FILEPATH = "/content/"

# Collect parameters to dictionary
VAR_LIMS = {"mu_g_max":MU_G_MAX,
    "mu_d_max": MU_D_MAX,
    "K_g_Glc" : K_G_GLC,
    "K_I_Lac" : K_I_LAC,
    "K_d_Lac" : K_D_LAC,
    "k_Glc" : K_GLC,
    "k_Lac" : K_LAC,
    "k_Prod" : K_PROD,
    "feed_start" : FEED_START,
    "feed_end" : FEED_END,
    "Glc_feed_rate" : GLC_FEED_RATE,
    "Glc_0" : GLC_0,
    "VCD_0" : VCD_0}

In [7]:
# Generate Dataset
data = simulator.generate_data(VAR_LIMS, NUM_RUNS, FILENAME)
# Import DOE
doe = pd.read_csv(FILEPATH+FILENAME.replace(".csv","_doe.csv"),index_col=None, usecols =["feed_start","feed_end","Glc_feed_rate","Glc_0","VCD_0"])
# Import OWU
owu = pd.read_csv(FILEPATH+FILENAME,index_col=None, usecols = ["X:VCD", "X:Glc", "X:Lac", "X:Titer","W:Feed"])
owu.index = pd.MultiIndex.from_product([list(range(NUM_RUNS)),list(range(15))], names=["run","time"])

In [8]:
# Read DOE - Initial Conditions
doe

| | feed_start | feed_end | Glc_feed_rate | Glc_0 | VCD_0 |
|---|---|---|---|---|---|
| 0 | 2.5 | 10.0 | 12.5 | 45.0 | 0.55 |
| 1 | 1.269231 | 11.230769 | 19.038462 | 52.179487 | 0.111538 |
| 2 | 1.576923 | 10.410256 | 5.961538 | 77.307692 | 0.388462 |
| 3 | 2.423077 | 9.076923 | 19.423077 | 39.615385 | 0.596154 |
| 4 | 1.192308 | 8.871795 | 5.576923 | 53.974359 | 0.55 |
| 5 | 1.038462 | 10.615385 | 7.115385 | 79.102564 | 0.826923 |
| 6 | 3.807692 | 8.564103 | 12.884615 | 23.461538 | 0.157692 |
| 7 | 1.807692 | 11.435897 | 10.961538 | 28.846154 | 0.526923 |
| 8 | 2.653846 | 9.589744 | 15.192308 | 12.692308 | 0.965385 |
| 9 | 2.192308 | 9.179487 | 19.807692 | 14.487179 | 0.342308 |


In [9]:
# Read OWU - Process Evolutions
owu

| run | time | X:VCD | X:Glc | X:Lac | X:Titer | W:Feed |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.55 | 45.0 | 0.0 | 0.0 | 0.0 |
| 0 | 1 | 1.725244 | 44.00819 | 1.489389 | 0.054632 | 0.0 |
| 0 | 2 | 4.779558 | 41.08962 | 5.872389 | 1.311539 | 12.5 |
| 0 | 3 | 10.278397 | 40.27148 | 16.487303 | 15.870869 | 12.5 |
| 0 | 4 | 15.886799 | 40.0836 | 35.542541 | 85.824621 | 12.5 |
| 0 | 5 | 18.568089 | 35.78307 | 60.77645 | 245.221131 | 12.5 |
| 0 | 6 | 18.099141 | 30.5052 | 87.483607 | 468.616199 | 12.5 |
| 0 | 7 | 15.811653 | 26.66724 | 112.033694 | 706.666289 | 12.5 |
| 0 | 8 | 12.920362 | 25.38948 | 132.740373 | 924.761704 | 12.5 |
| 0 | 9 | 10.123559 | 26.87354 | 149.296217 | 1108.215155 | 12.5 |
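
The process evolution table is indexed by a (`run`, `time`) pair, so a single run or a single day can be pulled out with a cross-section. The sketch below uses a small toy frame with made-up values rather than the generated dataset; only the index layout is meant to match, and the names `toy`, `run_1`, and `final_titer` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame with the same (run, time) index layout: 3 runs x 15 days.
num_runs, num_days = 3, 15
index = pd.MultiIndex.from_product(
    [range(num_runs), range(num_days)], names=["run", "time"]
)
toy = pd.DataFrame(
    {"X:Titer": np.arange(num_runs * num_days, dtype=float)}, index=index
)

# All time points of run 1 (the result is indexed by "time" only).
run_1 = toy.xs(1, level="run")

# Titer of every run at day 14 (the result is indexed by "run" only).
final_titer = toy["X:Titer"].xs(14, level="time")
```

Selecting by level name rather than by position keeps the code readable if the index order ever changes.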


## Analyse single experiment

Here you can visualize a particular run from the generated dataset. Change `PLOT_RUN_ID` to choose which single run is plotted, and `PLOT_RUNS_IDS` to plot a group of runs.



In [17]:
""" Select specific run to be plotted """
PLOT_RUN_ID = 27
""" Select group of runs to be plotted """
PLOT_RUNS_IDS = [0,1,2,3,4,5,6,7,8,9] 

In [18]:
# Evolution profiles for selected runs
if PLOT_RUN_ID is not None:
    fig = make_subplots(rows=1, cols=5, subplot_titles=owu.columns)
    plot_run_ix = owu.index.get_level_values("run") == PLOT_RUN_ID
    for i,c in enumerate(owu.columns):
        fig.add_trace(go.Scatter(x=list(range(15)), y=owu[c].values[plot_run_ix]), row=1, col=i+1)
    fig.update_layout(showlegend=False, title_text="X Variable evolution for run "+str(PLOT_RUN_ID))
    fig.show()
if PLOT_RUNS_IDS is not None:
    fig = make_subplots(rows=1, cols=5, subplot_titles=owu.columns)
    for j in PLOT_RUNS_IDS:
        plot_run_ix = owu.index.get_level_values("run") == j
        for i,c in enumerate(owu.columns):
            fig.add_trace(go.Scatter(x=list(range(15)), y=owu[c].values[plot_run_ix], name="Run id = " + str(j),marker=dict(color=px.colors.qualitative.Plotly[j % 10])), row=1, col=i+1)
    fig.update_layout(showlegend=False, title_text="X Variable evolution for runs "+str(PLOT_RUNS_IDS))
    fig.show()


## Analyse all experiments

Here you can visualize all runs from the generated dataset. Change `PLOT_COLOR` to choose how the experiments are colored. The options are:
* `Run_id`: runs are colored by the order in which they appear in the dataset.
* `Titer_14`: runs are colored by the amount of Titer at day 14 of the experiments.
* `Glc_0`: runs are colored by the designed initial Glucose level.
* `VCD_0`: runs are colored by the designed initial VCD level.
* `feed_start`: runs are colored by the designed feeding start day.
* `feed_end`: runs are colored by the designed feeding end day.
* `Glc_feed_rate`: runs are colored by the designed Glucose feed rate.
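
Every coloring option above boils down to one value per run, repeated for each time point of that run and then normalized to the range 0 to 1 for the color scale. A minimal sketch of that step follows. The helper name `per_row_color` is hypothetical, and the guard for the case where all runs share the same value is an addition of this sketch.

```python
import numpy as np


def per_row_color(values_per_run, rows_per_run=15):
    """Expand one value per run to one value per row, raw and normalized."""
    values = np.asarray(values_per_run, dtype=float)
    span = values.max() - values.min()
    # Avoid dividing by zero when every run has the same value.
    if span == 0:
        norm = np.zeros_like(values)
    else:
        norm = (values - values.min()) / span
    return np.repeat(values, rows_per_run), np.repeat(norm, rows_per_run)


# Three runs with two rows each, for illustration.
color, color_norm = per_row_color([10.0, 20.0, 30.0], rows_per_run=2)
```

Here `color` holds the raw value for each row and `color_norm` the position of that value on a 0 to 1 color scale.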

In [21]:
""" Plot coloring option, one of (Run_id, Titer_14, Glc_0, VCD_0, feed_start, feed_end, Glc_feed_rate)"""
PLOT_COLOR = "Titer_14"
""" Select run to be highlighted in all plots """
PLOT_HIGHLIGHT = 10

In [23]:
# Plot all Univariate plots
color_show=True
if PLOT_COLOR == "Titer_14":
    color_idx =np.repeat(np.array(owu["X:Titer"][:,14]),15)
elif PLOT_COLOR in doe.columns:
    color_idx = np.repeat(np.array(doe[PLOT_COLOR]),15)
else:
    PLOT_COLOR ="Run_id"
    color_idx=np.repeat(np.array(list(range(doe.shape[0]))),15)     
    color_show=False
owu["color"] = color_idx

for i in owu.columns[0:-1]:
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=owu.index.get_level_values(1),y=owu[i],mode='markers',marker=dict(size=0,color="rgba(0,0,0,0)",colorscale='Portland',cmin=min(color_idx),cmax=max(color_idx),colorbar=dict(thickness=40, title=str(PLOT_COLOR))),showlegend=False))
    for color_val in np.unique(color_idx):
        color_val_norm = (color_val -min(color_idx)) / (max(color_idx)-min(color_idx))
        owu_subset = owu[owu['color']==color_val]
        fig.add_trace(go.Scatter(x=owu_subset.index.get_level_values(1),y=owu_subset[i],mode='lines+markers',name="Run id = " + str(owu.index.get_level_values(0)[color_val == color_idx][0]),marker=dict(color=plothelpers.get_color('Portland',color_val_norm))))
    if PLOT_HIGHLIGHT is not None:
        highlight_ix = owu.index.get_level_values(0)==PLOT_HIGHLIGHT
        # Use the highlighted run's own time index instead of the last loop subset
        fig.add_trace(go.Scatter(x=owu.index.get_level_values(1)[highlight_ix],y=owu[i][highlight_ix],mode='lines+markers',name="Run id = " + str(PLOT_HIGHLIGHT),marker=dict(color="black",size=10)))
    fig.update_layout(showlegend=False,xaxis_title='days',yaxis_title=i, title=i)
    fig.show()
if 'color' in owu.columns: owu.drop(columns='color',inplace=True)

# Tasks

1. Identify which process parameters produce the highest level of Titer in the simulated experiments.
2. Replicate that run in the single experiment generation and try to modify it to produce an even higher level of Titer.
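
One possible starting point for the first task is to put the final Titer of each run next to its design row and sort. The sketch below uses made-up toy numbers, not results from the simulator, and the names `doe_toy`, `final_titer`, and `summary` are hypothetical.

```python
import pandas as pd

# Toy design table: one row per run (values are illustrative only).
doe_toy = pd.DataFrame(
    {"Glc_0": [45.0, 52.0, 77.0], "VCD_0": [0.55, 0.11, 0.39]}
)

# Toy final Titer per run, aligned with the design table by run index.
final_titer = pd.Series([1100.0, 900.0, 1500.0], name="Titer_14")

# Attach the outcome to the design and rank the runs from best to worst.
summary = doe_toy.assign(Titer_14=final_titer).sort_values(
    "Titer_14", ascending=False
)
best_run = summary.index[0]
```

The first row of `summary` then shows the process parameters of the best run, which can be copied into the single experiment cell for the second task.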