# Tutorial 1.2 - Stochastic Genetic Information Process

Here we examine a chemical master equation (CME) model of a stochastic genetic information process.

The model includes transcription of the gene into mRNA, translation of the mRNA into protein, and degradation of both mRNA and protein.

The model presented here can be found in the classic article: [Analytical distributions for stochastic gene expression](https://www.pnas.org/doi/full/10.1073/pnas.0803850105).


In [None]:
# Import Standard Python Libraries
import os
import numpy as np

# Import jLM Libraries
import jLM.CME as CME
import jLM.units as units
import jLM.CMEPostProcessing as PostProcessing

# Enable plotting inline in the Jupyter notebook
%matplotlib inline

In [None]:
# Import Standard Python Libraries
import os
import numpy as np
import matplotlib.pyplot as plt

# Import pyLM Libraries
from pyLM import *
from pyLM.units import *
from pySTDLM import *
from pySTDLM.PostProcessing import *

# Enable plotting inline in the Jupyter notebook
%matplotlib inline

## Constants

Reaction rates come from the whole-cell model in the [Cell, 2022](https://www.cell.com/cell/fulltext/S0092-8674(21)01488-4?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0092867421014884%3Fshowall%3Dtrue) paper, using the values for the DnaA-coding gene (G_0001) at initial conditions.

The protein degradation rate is calculated from the 25-hour half-life reported in [Maier et al, 2011](https://www.embopress.org/doi/full/10.1038/msb.2011.38).

In [None]:
# Constants
k_transcription = 6.41e-4   # transcription, s^-1
k_degra_mRNA    = 2.59e-3   # degradation of mRNA, s^-1
k_translation   = 7.2e-2    # translation, s^-1
k_degra_ptn     = 7.70e-6   # degradation of protein, s^-1
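
As a quick sanity check on these constants, a first-order rate constant follows from a half-life as k = ln 2 / t_half. The sketch below reproduces the protein degradation rate from the 25-hour half-life and derives a few quantities implied by the four rates. The helper `rate_from_half_life` and the variable names ending in `_check`, `_ss`, and `_min` are illustrative and not part of jLM.

```python
import math

def rate_from_half_life(t_half_s):
    """First-order rate constant (s^-1) for a half-life given in seconds."""
    return math.log(2) / t_half_s

# Rates copied from the constants cell above
k_transcription = 6.41e-4
k_degra_mRNA = 2.59e-3
k_translation = 7.2e-2

# 25 hours expressed in seconds
k_degra_ptn_check = rate_from_half_life(25 * 3600)

# Quantities implied by the rates for a single gene copy
mean_mRNA_ss = k_transcription / k_degra_mRNA    # steady-state mean mRNA count
ptn_per_mRNA = k_translation / k_degra_mRNA      # mean proteins made per mRNA lifetime
mRNA_lifetime_min = 1.0 / k_degra_mRNA / 60.0    # mean mRNA lifetime in minutes

print(f"protein degradation rate: {k_degra_ptn_check:.3e} s^-1")
print(f"steady-state mean mRNA:   {mean_mRNA_ss:.3f}")
print(f"proteins per mRNA:        {ptn_per_mRNA:.1f}")
print(f"mean mRNA lifetime:       {mRNA_lifetime_min:.1f} min")
```

With a mean mRNA count well below one and tens of proteins made per mRNA, protein production is expected to occur in bursts separated by periods with no mRNA.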

## Define CME simulation

In [None]:
# Create our CME simulation object
sim = CME.CMESimulation(name='Gene Expression')

In [None]:
# Define our chemical species
species = ['gene', 'mRNA', 'protein']
sim.defineSpecies(species)

In [None]:
# Add reactions to the simulation

sim.addReaction(reactant='gene', product=('gene','mRNA'), rate=k_transcription)
sim.addReaction(reactant='mRNA', product='', rate=k_degra_mRNA)
sim.addReaction(reactant='mRNA', product=('mRNA','protein'), rate=k_translation)
sim.addReaction(reactant='protein', product='', rate=k_degra_ptn)


In [None]:
# Set our initial species counts
# The initial protein count of 148 comes from proteomics data
sim.addParticles(species='gene', count=1)
sim.addParticles(species='mRNA', count=1)
sim.addParticles(species='protein', count=148)

Finally, we define the simulation execution parameters. The simulation runs for 6300 seconds of simulated time to cover the entire cell cycle.

Species counts are recorded every 1 second.

Then we name the simulation output file and save the simulation definition to it.

In [None]:
# Simulation time is 6300 s (the entire cell cycle of the Minimal Cell); traces are written every 1 s.
writeInterval = 1
simtime = 6300

sim.setWriteInterval(writeInterval)
sim.setSimulationTime(simtime)

filename = "./T2.1-GeneticInformationProcess.lm"

if os.path.exists(filename):
    os.remove(filename) # Remove previous LM file

sim.save(filename)

In [None]:
# Print out the information of the system
sim

## Run Simulation

Change **`reps`** to simulate more cell replicates.

In [None]:
# Run multiple replicates using the Gillespie solver
reps = 10

sim.run(filename=filename, method="lm::cme::GillespieDSolver", replicates=reps)
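
The solver name points to the Gillespie direct method. For intuition, the sketch below is a plain NumPy illustration of that algorithm applied to the four reactions above. It is not the Lattice Microbes implementation, and `stoich`, `reactant`, and `gillespie_direct` are names introduced here for illustration only.

```python
import numpy as np

# Species order: gene, mRNA, protein
stoich = np.array([
    [0,  1,  0],   # gene -> gene + mRNA     (transcription)
    [0, -1,  0],   # mRNA -> (nothing)       (mRNA degradation)
    [0,  0,  1],   # mRNA -> mRNA + protein  (translation)
    [0,  0, -1],   # protein -> (nothing)    (protein degradation)
])
rates = np.array([6.41e-4, 2.59e-3, 7.2e-2, 7.70e-6])
reactant = np.array([0, 1, 1, 2])  # index of the single reactant of each reaction

def gillespie_direct(x0, t_end, rng):
    """Return the species counts at t_end for one stochastic trajectory."""
    t = 0.0
    x = np.array(x0, dtype=np.int64)
    while True:
        props = rates * x[reactant]          # first-order propensities
        total = props.sum()
        if total == 0.0:
            break                            # no reaction can fire any more
        t += rng.exponential(1.0 / total)    # waiting time to the next event
        if t > t_end:
            break
        r = rng.choice(len(rates), p=props / total)  # which reaction fires
        x += stoich[r]
    return x

rng = np.random.default_rng(0)
final = np.array([gillespie_direct([1, 1, 148], 6300.0, rng) for _ in range(10)])
print(final)  # one row per replicate: gene, mRNA, protein at the end of the cycle
```

Each loop iteration draws an exponentially distributed waiting time from the total propensity and then picks one reaction with probability proportional to its propensity.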

## Post-Processing

In [None]:
# Import Custom Analysis and Plotting Modules
import sys
sys.path.append('../analyze_scripts')
import custom_plot as plot

The plots are saved to **`fig_dir`**.

In [None]:
# Create folder to store plotted figures
fig_dir = './plots_GIP/'

if not os.path.exists(fig_dir):
    os.mkdir(fig_dir)

Serialize the traces in the LM file into a 3D NumPy array with dimensions *(time, species, replicates)*.

In [None]:
fileHandle = PostProcessing.openLMFile(filename) # Create h5py file handle
timestep = PostProcessing.getTimesteps(fileHandle) # use PostProcessing to get the timesteps of the simulation

traces = np.zeros((len(timestep), len(sim.particleMap), reps)) # Initiate 3D array

# See the custom_plot module in ../analyze_scripts for details
traces = plot.get_sim_data(traces, reps, filename) # Get 3D array

print('The size of the 3D trajectories is {0} with dimensions time, species, and replicates.'.format(np.shape(traces)))
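
To make the *(time, species, replicates)* layout concrete, the sketch below builds a small random array of the same shape and slices it the way the following cells do. The array `toy` is synthetic stand-in data, not simulation output, and the way `custom_plot` computes the span internally is not shown in this notebook, so the percentile call is an assumption about an equivalent calculation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_time, n_species, n_reps = 5, 3, 4
# Synthetic stand-in for `traces`: axis 0 = time, axis 1 = species, axis 2 = replicates
toy = rng.integers(0, 10, size=(n_time, n_species, n_reps))

toy_mRNA = toy[:, 1, :]            # species index 1 -> 2D array (time, replicates)

avg = toy_mRNA.mean(axis=1)        # population average at each time point
low = toy_mRNA.min(axis=1)         # lower edge of the full span
high = toy_mRNA.max(axis=1)        # upper edge of the full span

# Percentiles 0 and 100 across replicates give the same full span
low_p, high_p = np.percentile(toy_mRNA, [0, 100], axis=1)

print(toy_mRNA.shape, avg.shape)
```

Reducing along axis 1 of the 2D slice collapses the replicates and leaves one value per time point.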

**Mean and Variance of mRNA and Protein**

Plot the population average and full span of mRNA and protein abundances across replicates.

The shaded area is the full span (minimum to maximum across replicates), and the solid line is the population average.

In [None]:
# mRNA and Protein in one plot
trace_mRNA = traces[:,1,:] # 2D array
trace_ptn = traces[:,2,:] # 2D array

time = timestep/60
xlabel = 'Time [Min]'
title = f'mRNA and Protein in {reps} Replicates'
percentile = [0,100] # Full span
fig_size = [7, 7]
fig_name = f'GIP_mRNA_Protein_{reps}Replicates'

left_data = [trace_mRNA]
left_colors = ['red']
left_ylabel = f'mRNA'
left_plots = ['range_avg']
left_ylabel_color = 'red'
left_legends = len(left_data)*['']

right_data = [trace_ptn]
right_colors = ['blue']
right_ylabel = f'Protein'
right_plots = ['range_avg']
right_ylabel_color = 'blue'
right_legends = len(right_data)*['']

plot.plot_time_dualAxes(fig_dir, fig_name, fig_size,
            time, xlabel, title, percentile,
            left_data, left_legends, left_colors, left_ylabel, left_plots, left_ylabel_color,
            right_data, right_legends, right_colors, right_ylabel, right_plots, right_ylabel_color,
            xlimit=[0,simtime/60], title_set=True, fonts_sizes=[21, 21, 24, 18],
            extension='.png', tick_setting=[12, 4.5, 15, 'out'], line_widths = [3, 4.5], legend_pos='best')
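
Because every reaction in this model is first order, the CME means obey linear rate equations exactly, so the population average can be compared against a closed-form mean. The sketch below evaluates that solution for the rates and initial counts used above. The short names (`k_tx`, `k_dm`, `k_tl`, `k_dp`) and the two helper functions are introduced here for illustration.

```python
import numpy as np

# Rates and initial counts copied from the cells above
k_tx, k_dm, k_tl, k_dp = 6.41e-4, 2.59e-3, 7.2e-2, 7.70e-6
gene, m0, p0 = 1, 1, 148

def mean_mRNA(t):
    """Solution of d<m>/dt = k_tx*gene - k_dm*<m> with <m>(0) = m0."""
    m_ss = k_tx * gene / k_dm
    return m_ss + (m0 - m_ss) * np.exp(-k_dm * t)

def mean_protein(t):
    """Solution of d<p>/dt = k_tl*<m> - k_dp*<p> with <p>(0) = p0."""
    m_ss = k_tx * gene / k_dm
    decay_p = np.exp(-k_dp * t)
    decay_m = np.exp(-k_dm * t)
    return (p0 * decay_p
            + k_tl * m_ss / k_dp * (1.0 - decay_p)
            + k_tl * (m0 - m_ss) * (decay_m - decay_p) / (k_dp - k_dm))

t = np.arange(0, 6301, 1.0)   # seconds, same grid as the 1 s write interval
m_mean = mean_mRNA(t)
p_mean = mean_protein(t)

print(f"mean mRNA at cycle end:    {m_mean[-1]:.3f}")
print(f"mean protein at cycle end: {p_mean[-1]:.1f}")
```

With only 10 replicates the simulated average will scatter around this curve; the agreement should tighten as `reps` grows.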

**mRNA and protein in each single cell replicate individually**

The genetic information process occurs in each single cell replicate. Now, let's look at protein synthesis at the single-cell level. You will see a burst in protein count while mRNA is present, and protein synthesis halts when no mRNA is left.

Change **`rep`** to see a different pattern of stochastic protein synthesis over the cell cycle.

In [None]:
# mRNA and Protein in one plot
rep = 1 # Choose cell replicate `rep`
trace_mRNA = traces[:,1,:] # 2D array
trace_ptn = traces[:,2,:] # 2D array

time = timestep/60
xlabel = 'Time [Min]'
title = f'mRNA and Protein in Cell {rep}'
percentile = [0,100] # Full span
fig_size = [7, 7]
fig_name = f'GIP_mRNA_Protein_Cell{rep}'

left_data = [trace_mRNA[:,rep-1]]
left_colors = ['red']
left_ylabel = f'mRNA'
left_plots = ['single']
left_ylabel_color = 'red'
left_legends = len(left_data)*['']

right_data = [trace_ptn[:,rep-1]]
right_colors = ['blue']
right_ylabel = f'Protein'
right_plots = ['single']
right_ylabel_color = 'blue'
right_legends = len(right_data)*['']

plot.plot_time_dualAxes(fig_dir, fig_name, fig_size,
            time, xlabel, title, percentile,
            left_data, left_legends, left_colors, left_ylabel, left_plots, left_ylabel_color,
            right_data, right_legends, right_colors, right_ylabel, right_plots, right_ylabel_color,
            xlimit=[0,simtime/60], title_set=True, fonts_sizes=[21, 21, 24, 18],
            extension='.png', tick_setting=[12, 4.5, 15, 'out'], line_widths = [3, 4.5], legend_pos='best')
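
The link between mRNA presence and protein bursts can also be checked numerically. The sketch below uses short hand-made traces as stand-ins for `trace_mRNA[:,rep-1]` and `trace_ptn[:,rep-1]` and splits the protein gain by whether mRNA was present at the start of each write interval. The toy values and variable names are illustrative only.

```python
import numpy as np

# Hand-made single-cell traces sampled once per write interval (NOT simulation output)
mRNA = np.array([1, 1, 0, 0, 0, 1, 2, 1, 0, 0])
ptn = np.array([148, 150, 151, 151, 151, 151, 153, 156, 158, 158])

d_ptn = np.diff(ptn)             # protein change over each interval
mRNA_present = mRNA[:-1] > 0     # mRNA present at the start of the interval?

gain_with_mRNA = d_ptn[mRNA_present].sum()
gain_without_mRNA = d_ptn[~mRNA_present].sum()

print("protein gained while mRNA present:", gain_with_mRNA)
print("protein gained while mRNA absent: ", gain_without_mRNA)
```

On real traces the split is approximate, since an mRNA can appear and start being translated within a single write interval, and protein degradation contributes occasional negative steps.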

**Distribution of Protein at the end of the cell cycle**

Restart the kernel, increase **`reps`** from 10 to 100, rerun the notebook, and redo the histogram.

In [None]:
# Protein Distribution
ptn_endcycle = trace_ptn[-1,:] # 1D array

fig_size=[7, 7]
fig_name=f'GIP_Proteins_CycleEnd_{reps}replicates'
data_list=[ptn_endcycle]
legends=['']
colors = ['blue']
xlabel='Protein Counts at Cycle End'
ylabel='Cells'
title=f'Protein Distribution in {reps} Replicates'
bins=min(int(reps/2), 10)

plot.plot_hists(fig_dir, fig_name, fig_size,
            data_list, legends, colors, xlabel, ylabel, title, bins,
            mean_median=[False, False],
            title_set=True, fonts_sizes=[21, 21, 21, 18],
            extension='.png', range=None, 
            tick_setting=[12, 4.5, 18, 'out'], line_widths = [3, 4.5], legend_pos='upper left')
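
Alongside the histogram, a few summary statistics help quantify the cell-to-cell variability. The sketch below computes the mean, variance, and Fano factor (variance divided by mean) and bins the counts with the same rule used for `bins` above. The array `toy_counts` is synthetic data with an arbitrary mean, standing in for `ptn_endcycle`; it is not simulation output.

```python
import numpy as np

rng = np.random.default_rng(2)
n_cells = 100
# Synthetic stand-in for `ptn_endcycle` (arbitrary Poisson mean, for illustration only)
toy_counts = rng.poisson(270, size=n_cells)

mean = toy_counts.mean()
var = toy_counts.var(ddof=1)     # sample variance across cells
fano = var / mean                # equals 1 for a Poisson distribution in expectation

n_bins = min(int(n_cells / 2), 10)   # same binning rule as the cell above
hist, edges = np.histogram(toy_counts, bins=n_bins)

print(f"mean = {mean:.1f}, variance = {var:.1f}, Fano factor = {fano:.2f}")
print("cells per bin:", hist)
```

Replacing `toy_counts` with `ptn_endcycle` gives the same statistics for the simulated cells; a Fano factor well above 1 would indicate a broader-than-Poisson distribution.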