In [20]:
import scipy.integrate  # binds the name scipy and loads the integrate submodule
import numpy as np
import pandas as pd

<details>
<summary>
<font size="3" color="black">
<b>Introduction to Cell Culture Fed-Batch Process Simulator ⏏︎Click to open</b>
</font>
</summary>

### **Understanding the Fed-Batch Cell Culture Process Simulator**
The fed-batch cell culture process simulator is designed to generate in-silico (simulated) data that helps in testing and understanding various machine learning tools discussed in your course. Here's an overview of the key components and dynamics modeled in this simulator:

#### 1. Key Components:
- **Viable Cell Density (VCD)**: Represents the concentration of live cells capable of reproduction in the culture, typically measured in $10^6$ cells/mL. This parameter is crucial as it directly influences the production rate of the bioproduct.
- **Glucose (Glc)**: Serves as the primary energy source for the cells. It is fed into the culture at a constant rate between a chosen start and end day to maintain growth. The glucose concentration must be balanced: too little limits cell growth and product formation, while too much (above roughly 75 g/L in this model) accelerates cell death.
- **Lactate (Lac)**: A metabolic by-product produced by the cells. High levels of lactate can be toxic and adversely affect cell metabolism and viability.
- **Product (Titer)**: The bioproduct produced by the cells. In this model, productivity is inversely related to growth rate: faster-growing cells produce less product.

#### 2. Dynamics and Interactions:
- **Cell Metabolism**: Cells consume glucose to sustain their metabolism, which includes growth and maintenance. The consumption rate affects how much glucose needs to be fed into the system.
- **Product Formation**: As cells metabolize glucose, they also produce the desired product (Titer). The efficiency of product formation can vary based on the cell's health and the culture conditions.
- **Toxicity and Death**: Both glucose and lactate have threshold levels beyond which they can harm the cells. Managing these concentrations is critical to prevent accelerated cell death and decreased productivity.

#### 3. Simulator Equations:
The simulator describes how these components interact over time with a set of ordinary differential equations:
- **Growth Equations**: Describe how the VCD changes over time, influenced by nutrient availability (like glucose) and inhibitory effects (like lactate).
- **Consumption and Production Equations**: Model how glucose is consumed and how lactate and the product (Titer) are produced, using kinetic terms that account for growth rate, substrate concentration, and inhibitory effects.
- **Feeding Strategy**: Defines how and when glucose is added to the culture; in this simulator, a constant feed rate is applied between a feed start day and a feed end day.

### **Application of Machine Learning:**
Machine learning can be applied to analyze the in-silico data generated by this simulator to:
- **Predictive Modeling**: Develop models that predict cell growth, product formation, or nutrient consumption under various conditions.
- **Optimization**: Use algorithms to find optimal feeding and operation strategies that maximize product yield and minimize costs.
- **Anomaly Detection**: Identify unusual patterns in the data that could indicate problems in the culture process.

By understanding and utilizing this simulator, you can explore the complexities of bioprocess control and optimization in a controlled, risk-free environment. This allows for experimentation with different strategies and machine learning techniques to enhance the understanding and efficiency of bioprocesses.

The simulator generates its in-silico data from the following equations.


### **Explaining Model Equations for Simulator**

The equations provided describe the dynamics of the fed-batch cell culture process in terms of how the viable cell density, glucose, lactate, and product concentrations change over time. Here's a breakdown of each equation and the parameters involved:

#### 1. Balance on Viable Cell Density (VCD)
   $$
   \frac{dVCD}{dt} = (\mu_g - \mu_d)VCD
   $$
   - **$\mu_g$ (Growth rate)** and **$\mu_d$ (Death rate)** are key factors determining the net rate of change of VCD. The growth rate is typically influenced by nutrient availability and metabolic health, while the death rate is influenced by toxic by-products and suboptimal conditions.
   - **VCD** reflects the concentration of viable cells, and this equation calculates how it changes over time considering both growth and death.
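If $\mu_g$ and $\mu_d$ were held constant, the VCD balance would have the closed-form solution $VCD(t) = VCD_0\,e^{(\mu_g - \mu_d)t}$. The minimal sketch below integrates the balance numerically and compares it with that formula; the rate and initial-density values are illustrative assumptions, and in the full model both rates vary with Glc and Lac.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative constant rates (1/h) and initial density (10^6 cells/mL); assumed values
mu_g, mu_d, vcd_0 = 0.05, 0.01, 0.5


def vcd_balance(t, y):
    # dVCD/dt = (mu_g - mu_d) * VCD
    return [(mu_g - mu_d) * y[0]]


t_eval = np.linspace(0.0, 48.0, 5)  # hours
sol = solve_ivp(vcd_balance, (0.0, 48.0), [vcd_0], t_eval=t_eval, rtol=1e-8, atol=1e-10)

# Closed-form solution for constant rates
analytic = vcd_0 * np.exp((mu_g - mu_d) * t_eval)
print("max relative error:", np.max(np.abs(sol.y[0] - analytic) / analytic))
```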

#### 2. Balance on Glucose
   $$
   \frac{dGlc}{dt} = -k_{Glc} \frac{Glc}{Glc + 0.05} VCD + F_{Glc}
   $$
   - **$k_{Glc}$** is the specific glucose consumption rate (per unit of viable cell density).
   - **Glc** is the glucose concentration, and the term $\frac{Glc}{Glc + 0.05}$ is a Michaelis-Menten-type saturation term with a half-saturation constant of 0.05 g/L: the uptake rate increases with glucose concentration but levels off at high concentrations.
   - **$F_{Glc}$** is the glucose feed rate, nonzero only between feed start and feed end, which keeps glucose high enough to support growth without reaching toxic levels.
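The saturation behaviour of the uptake term is easy to check numerically. A minimal sketch, assuming $k_{Glc} = 0.04$ (the `K_GLC` value in `predict_process`) and the 0.05 g/L constant from the glucose balance; the function name `glucose_uptake` is hypothetical.

```python
def glucose_uptake(glc, vcd, k_glc=0.04, k_m=0.05):
    """Consumption term k_Glc * Glc / (Glc + 0.05) * VCD from the glucose balance.

    k_glc mirrors K_GLC in predict_process; k_m is the 0.05 g/L half-saturation constant.
    """
    return k_glc * glc / (glc + k_m) * vcd


# Uptake per unit VCD: half of k_glc at Glc = 0.05 g/L, close to k_glc well above that
for glc in [0.0, 0.05, 0.5, 5.0, 45.0]:
    print(glc, glucose_uptake(glc, vcd=1.0))
```

Because the constant is so small relative to typical culture concentrations (tens of g/L), uptake is effectively at its maximum unless glucose is almost exhausted.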

#### 3. Balance on Lactate
   $$
   \frac{dLac}{dt} = k_{Lac} VCD
   $$
   - **$k_{Lac}$** represents the rate of lactate production per cell. This equation shows that lactate production is proportional to the cell density, reflecting that it's a metabolic by-product of cell activity.
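Because $dLac/dt$ depends only on VCD, integrating it gives $Lac(t) = Lac_0 + k_{Lac}\int_0^t VCD\,d\tau$, so lactate tracks the integral of viable cell density (IVCD). A minimal sketch, assuming daily VCD samples and the trapezoidal rule; $k_{Lac} = 0.06$ matches `K_LAC` in `predict_process`, and the VCD values are those of run 0 in the `owu_df.head(10)` output.

```python
import numpy as np

k_lac = 0.06  # same value as K_LAC in predict_process
t_h = np.arange(5) * 24.0  # sampling times in hours (days 0-4)
vcd = np.array([0.55, 1.725244, 4.779558, 10.281522, 15.843605])  # run 0, days 0-4

# Cumulative trapezoidal IVCD: running sum of 0.5 * (VCD_i + VCD_{i+1}) * dt
ivcd = np.concatenate(([0.0], np.cumsum(0.5 * (vcd[1:] + vcd[:-1]) * np.diff(t_h))))
lac_est = k_lac * ivcd  # Lac_0 = 0, as in predict_process
print(np.round(lac_est, 3))
```

The estimates land within about 10% of the simulated lactate for run 0 (1.49, 5.87, 16.49, 35.53); the gap comes from approximating an exponentially growing VCD with straight lines on a daily grid.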

#### 4. Balance on Product (Titer)
   $$
   \frac{dProd}{dt} = k_{Prod}\frac{Glc}{Glc + K_{g, Glc}} \left(1 - \frac{\mu_{g}}{\mu_{g,max}}\right)^2 VCD - 2 \frac{dAggr}{dt}
   $$
   - **$k_{Prod}$** is the specific product formation rate. Production is scaled by glucose availability and by $\left(1 - \frac{\mu_g}{\mu_{g,max}}\right)^2$, so it is highest when growth is slow and drops to zero as growth approaches its maximum rate, consistent with the inverse relation between growth and productivity described above.
   - **$\frac{dAggr}{dt}$** represents the rate of product aggregation or degradation, which reduces the net product concentration. The `process_ode` implementation below does not include this term.
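The effect of the growth term on productivity can be seen by evaluating the production rate at fixed glucose and cell density for three growth regimes. This sketch follows the form used in `process_ode` (cell In [21]), which applies $(1 - \mu_g/\mu_{g,max})^2$ and omits aggregation; `product_rate` is a hypothetical helper and its defaults mirror `K_PROD`, `K_G_GLC` and `MU_G_MAX` in `predict_process`.

```python
def product_rate(glc, mu_g, vcd, k_prod=1.0, k_g_glc=1.0, mu_g_max=0.05):
    """dProd/dt without the aggregation term, as implemented in process_ode:
    k_Prod * Glc / (Glc + K_g,Glc) * (1 - mu_g / mu_g,max)**2 * VCD."""
    growth_ratio = mu_g / mu_g_max
    return k_prod * glc / (glc + k_g_glc) * (1.0 - growth_ratio) ** 2 * vcd


# Same glucose (g/L) and VCD (10^6 cells/mL): no growth, half-maximal growth, maximal growth
for mu_g in [0.0, 0.025, 0.05]:
    print(mu_g, product_rate(glc=40.0, mu_g=mu_g, vcd=10.0))
```

Halving the growth rate relative to its maximum cuts the rate to a quarter of its no-growth value, and production stops entirely at $\mu_g = \mu_{g,max}$.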

#### 5. Growth Rate and Death Rate
   - **Growth rate function ($\mu_g$)**:
      $$
      \mu_g =  μ_{g,max}\frac{Glc}{Glc + K_{g, Glc}}\frac{K_{i, Lac}}{Lac + K_{i, Lac}}
      $$
      - **$μ_{g,max}$** is the maximum growth rate, achievable under optimal conditions.
      - **$K_{g, Glc}$** is the half-saturation (Monod) constant for glucose, and **$K_{i, Lac}$** is the lactate inhibition constant: growth is halved when $Lac = K_{i, Lac}$.
   - **Death rate function ($\mu_d$)**:
      $$
      \mu_d = μ_{d,max}(1+\frac{φ}{1+φ})\frac{Lac}{Lac + K_{d, Lac}}
      $$
      - **$μ_{d,max}$** is the maximum lactate-driven death rate; the glucose factor $1+\frac{φ}{1+φ}$ can raise the effective death rate to at most $2μ_{d,max}$.
      - **$K_{d, Lac}$** is the Michaelis-Menten constant for lactate in the death rate equation.
      - **$\phi$** is a glucose-toxicity factor, $\phi = e^{0.1(Glc-75)}$. The switch $\frac{\phi}{1+\phi}$ rises from about 0 to 1 as glucose passes 75 g/L (it equals 0.5 at exactly 75 g/L), so high glucose can at most double the death rate.
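A minimal sketch of the two rate functions, with parameter values copied from `predict_process`. It shows that lactate both slows growth and raises death, while glucose only matters for death once it approaches the 75 g/L switch; the helper names `growth_rate` and `death_rate` are hypothetical.

```python
import numpy as np

# Parameter values copied from predict_process
MU_G_MAX, MU_D_MAX = 0.05, 0.025
K_G_GLC, K_I_LAC, K_D_LAC = 1.0, 30.0, 50.0


def growth_rate(glc, lac):
    # Monod term in glucose times the lactate inhibition factor K_i / (Lac + K_i)
    return MU_G_MAX * glc / (glc + K_G_GLC) * K_I_LAC / (lac + K_I_LAC)


def death_rate(glc, lac):
    # phi / (1 + phi) is a logistic switch centred at Glc = 75 g/L
    phi = np.exp(0.1 * (glc - 75.0))
    return MU_D_MAX * (1.0 + phi / (1.0 + phi)) * lac / (lac + K_D_LAC)


for glc in [45.0, 75.0, 120.0]:
    print(glc, growth_rate(glc, lac=30.0), death_rate(glc, lac=30.0))
```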

These equations model the interactions in cell culture processes, where keeping nutrients and by-products at suitable levels is crucial for productivity and viability. In the code below, the kinetic parameters are fixed inside `predict_process`; different scenarios are explored by changing the process parameters listed next. Understanding and tuning these parameters can help in developing strategies for industrial-scale bioprocesses.


### **Process Parameters**

The following process parameters (manipulated variables) define each run:

- Feed start (day): day at which the Glc feed is started
- Feed end (day): day at which the Glc feed is stopped
- Feed rate (g/L/day): mass rate at which Glc is fed, delivered continuously over each 24 h
- Initial Glc concentration (g/L): Glc at time t = 0
- Initial VCD (10^6 cells/mL): VCD at time t = 0
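Since the ODEs run in hours, these day-based settings are converted before integration (`predict_process` multiplies the feed days by 24 and divides the feed rate by 24). A minimal sketch of that conversion, plus the total glucose added over the feed window; the helper `to_model_units` and the example settings are hypothetical.

```python
def to_model_units(feed_start_day, feed_end_day, feed_rate_g_per_l_day):
    """Convert process parameters to the hourly units used by process_ode,
    mirroring the conversion inside predict_process."""
    return 24.0 * feed_start_day, 24.0 * feed_end_day, feed_rate_g_per_l_day / 24.0


# Hypothetical settings: feed from day 2 to day 12 at 12.5 g/L/day
start_h, end_h, rate_h = to_model_units(2, 12, 12.5)
total_glc_fed = rate_h * (end_h - start_h)  # g/L added over the whole feed window
print(start_h, end_h, rate_h, total_glc_fed)
```

The total is simply feed rate times feed duration in days (here 12.5 g/L/day over 10 days, about 125 g/L), which is a quick sanity check on a design before simulating it.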


### **Bolus Feed Addition Method**

Instead of a continuous feed, glucose can be added as daily boluses. `solve_ivp` events only detect zero crossings and cannot modify the state, so each bolus is applied by integrating piecewise between feed times and adding glucose to the state at each boundary:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Assumes all required parameters and initial conditions are defined
bolus_times = np.array([0, 24, 48, 72, 96, 120, 144, 168, 192])  # bolus feed times, in hours
bolus_amount = np.array([0, 0, 0, 10, 10, 10, 10, 0, 0])  # glucose added at each bolus time (10 units per bolus)


def process_ode(t, y, p):
    # Define the variables and the ODEs
    VCD, Glc, Lac, titer = y
    # The actual ODE calculations are omitted here...
    dVCDdt = -VCD  # placeholder
    dGlcdt = -Glc  # placeholder
    dLacdt = Lac  # placeholder
    dTiterdt = titer  # placeholder
    return [dVCDdt, dGlcdt, dLacdt, dTiterdt]


# Initial conditions and run length (assumed values)
y0 = np.array([1.0, 2.0, 3.0, 4.0])
t_end = 192  # total time, hours

# Apply each bolus as an instantaneous jump in Glc (index 1),
# then integrate up to the next bolus time
segment_edges = np.append(bolus_times, t_end)
y_curr = y0.copy()
t_out, y_out = [], []
for i, t_bolus in enumerate(bolus_times):
    y_curr[1] += bolus_amount[i]
    t_next = segment_edges[i + 1]
    if t_next <= t_bolus:
        continue  # bolus at the very end of the run: nothing left to integrate
    sol = solve_ivp(process_ode, (t_bolus, t_next), y_curr, args=(None,))
    t_out.append(sol.t)
    y_out.append(sol.y)
    y_curr = sol.y[:, -1].copy()

# Output results: state at the end of each segment (just before the next bolus)
print("Solution at segment ends:", np.array([seg[:, -1] for seg in y_out]))
```

</details>

In [21]:
def process_ode(t, y, p):
    (
        mu_g_max,
        mu_d_max,
        K_g_Glc,
        K_I_Lac,
        K_d_Lac,
        k_Glc,
        k_Lac,
        k_Prod,
        feed_start,
        feed_end,
        Glc_feed_rate,
    ) = p

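    VCD, Glc, Lac, titer = y  # state: [VCD, Glc, Lac, titer]

    # growth rate: Monod term in glucose times lactate inhibition
    MM_Glc = Glc / (K_g_Glc + Glc)
    mu_g = mu_g_max * MM_Glc * K_I_Lac / (K_I_Lac + Lac)
    # death rate: lactate saturation term, amplified up to 2x once Glc exceeds ~75 g/L
    phi = np.exp(0.1 * (Glc - 75.0))
    mu_d = mu_d_max * (1.0 + phi / (1.0 + phi)) * Lac / (K_d_Lac + Lac)
    growth_ratio = mu_g / mu_g_max

    # compute mass balances (time in hours)
    Glc_Min = Glc / (0.05 + Glc)  # saturation term for glucose uptake, K = 0.05 g/L
    dVCDdt = (mu_g - mu_d) * VCD
    dGlcdt = -k_Glc * Glc_Min * VCD
    dLacdt = k_Lac * VCD
    # productivity is highest when growth is slow: factor (1 - mu_g/mu_g_max)^2
    dTiterdt = k_Prod * MM_Glc * ((1.0 - growth_ratio) ** 2.0) * VCD

    # add the constant glucose feed inside the feed window [feed_start, feed_end] (hours)
    if feed_start <= t <= feed_end:
        dGlcdt += Glc_feed_rate

    return [dVCDdt, dGlcdt, dLacdt, dTiterdt]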

In [22]:
def predict_process(feed_start, feed_end, feed_rate, glc_0, vcd_0):
    MU_G_MAX = 0.05
    MU_D_MAX = 0.025
    K_G_GLC = 1
    K_I_LAC = 30
    K_D_LAC = 50
    K_GLC = 0.04
    K_LAC = 0.06
    K_PROD = 1

    # model parameters, followed by the feed settings converted from
    # days and g/L/day to the hourly units used by process_ode
    p = (
        MU_G_MAX,
        MU_D_MAX,
        K_G_GLC,
        K_I_LAC,
        K_D_LAC,
        K_GLC,
        K_LAC,
        K_PROD,
        24.0 * feed_start,
        24.0 * feed_end,
        feed_rate / 24.0,
    )

    # initial state: [VCD, Glc, Lac, titer]; no lactate or product at t = 0
    y0 = [vcd_0, glc_0, 0.0, 0.0]
    t_start, t_end = 0, 24 * 14  # 14-day run, in hours
    t_eval = np.arange(t_start, t_end + 24, 24)  # one sample per day (15 points)

    # integrate the equations with a stiff (BDF) solver
    sol = scipy.integrate.solve_ivp(
        process_ode,
        t_span=(t_start, t_end),
        y0=y0,
        t_eval=t_eval,
        method="BDF",
        args=(p,),
        rtol=1e-6,
        atol=1e-6,
    )

    t = sol.t.tolist()
    y = sol.y.T.tolist()

    return t, y

In [23]:
def generate_data(doe_scaled, filename):
    col_names = ["timesteps", "X:VCD", "X:Glc", "X:Lac", "X:Titer", "W:Feed"]
    frames = []  # one DataFrame per run, concatenated once at the end

    num_runs = len(doe_scaled)
    for i in range(num_runs):
        # DOE columns: feed start (day), feed end (day), feed rate (g/L/day),
        # initial Glc (g/L), initial VCD (10^6 cells/mL)
        feed_start = doe_scaled[i, 0].round()
        feed_end = doe_scaled[i, 1].round()
        feed_rate = doe_scaled[i, 2]
        glc_0 = doe_scaled[i, 3]
        vcd_0 = doe_scaled[i, 4]

        t, x = predict_process(feed_start, feed_end, feed_rate, glc_0, vcd_0)

        time = np.array([t]).T / 24  # hours -> days, as a column vector
        wvar = np.zeros_like(time)
        # rows are daily samples, so row index == day; mark the days with feed applied
        wvar[int(feed_start) : int(feed_end), :] = feed_rate
        res = np.hstack([time, x, wvar])
        frames.append(pd.DataFrame(res, columns=col_names))

    owu_df = pd.concat(frames, ignore_index=True)
    # 15 daily samples (days 0-14) per run
    owu_df.index = pd.MultiIndex.from_product(
        [list(range(num_runs)), list(range(15))], names=["run", "time"]
    )
    owu_df.to_csv(filename)
    return owu_df

In [24]:
DOE_FILENAME = 'dataset/datahow_concise/owu_doe.csv'
doe_df = pd.read_csv(DOE_FILENAME)
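`generate_data` reads the design row by row and expects five columns in the order feed start (day), feed end (day), feed rate (g/L/day), initial Glc (g/L), initial VCD (10^6 cells/mL). The contents of `owu_doe.csv` are not shown here. If a comparable design has to be built from scratch, a Latin hypercube sample is one option; the bounds below are assumptions for illustration only, not the ranges used for `owu_doe.csv`.

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical bounds, in the column order generate_data expects:
# feed_start (day), feed_end (day), feed_rate (g/L/day), glc_0 (g/L), vcd_0 (10^6 cells/mL)
l_bounds = [1.0, 8.0, 5.0, 20.0, 0.3]
u_bounds = [4.0, 13.0, 20.0, 60.0, 1.0]

sampler = qmc.LatinHypercube(d=5)
doe_scaled = qmc.scale(sampler.random(n=20), l_bounds, u_bounds)  # 20 runs x 5 factors
print(np.round(doe_scaled[:3], 2))
```

Keeping the feed-start and feed-end ranges disjoint guarantees that every run feeds for at least a few days; the resulting array can be passed directly as `doe_scaled`.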

In [25]:
OWU_FILENAME = 'dataset/datahow_concise/owu.csv'
owu_df = generate_data(doe_df.values, OWU_FILENAME)

In [26]:
owu_df.head(10)

| run | time | timesteps | X:VCD | X:Glc | X:Lac | X:Titer | W:Feed |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | 0.55 | 45.0 | 0.0 | 0.0 | 0.0 |
| 0 | 1 | 1.0 | 1.725244 | 44.008188 | 1.489389 | 0.054632 | 0.0 |
| 0 | 2 | 2.0 | 4.779558 | 41.089862 | 5.872389 | 1.311539 | 12.5 |
| 0 | 3 | 3.0 | 10.281522 | 46.518976 | 16.490684 | 15.722004 | 12.5 |
| 0 | 4 | 4.0 | 15.843605 | 46.342735 | 35.525329 | 85.316154 | 12.5 |
| 0 | 5 | 5.0 | 18.448543 | 42.118516 | 60.639939 | 243.66668 | 12.5 |
| 0 | 6 | 6.0 | 17.949288 | 36.9702 | 87.145958 | 465.229084 | 12.5 |
| 0 | 7 | 7.0 | 15.678095 | 33.265127 | 111.48835 | 701.440408 | 12.5 |
| 0 | 8 | 8.0 | 12.8173 | 32.095324 | 132.024633 | 918.197936 | 12.5 |
| 0 | 9 | 9.0 | 10.043513 | 33.661695 | 148.450216 | 1100.697239 | 12.5 |
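`generate_data` saves the `(run, time)` MultiIndex with `to_csv`. When the file is read back, passing `index_col=[0, 1]` to `pd.read_csv` restores the two index levels instead of treating them as ordinary columns. A self-contained sketch with a toy frame (placeholder values, not simulator output):

```python
import io

import pandas as pd

# Toy frame with the same (run, time) MultiIndex layout that generate_data writes
index = pd.MultiIndex.from_product([[0, 1], [0, 1]], names=["run", "time"])
df = pd.DataFrame({"X:VCD": [1.0, 2.0, 3.0, 4.0]}, index=index)

buffer = io.StringIO()
df.to_csv(buffer)  # header row: run,time,X:VCD
buffer.seek(0)

# index_col=[0, 1] rebuilds the two index levels from the first two columns
restored = pd.read_csv(buffer, index_col=[0, 1])
print(restored.index.names)
print(restored.loc[1])  # all time steps of run 1
```

With the index restored, `owu_df.loc[run_id]` returns one run's trajectory, which is the natural unit for per-run analysis.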
