# Tutorial 1.2 - Stochastic Genetic Information Process

Here we examine a chemical master equation (CME) model of a stochastic genetic information process.

The model includes transcription of the gene into mRNA and translation of the mRNA into protein, together with degradation of both mRNA and protein.

The model presented here can be found in the article: [Analytical distributions for stochastic gene expression](https://www.pnas.org/doi/full/10.1073/pnas.0803850105).


In [None]:
# Import Standard Python Libraries
import os
import numpy as np
import matplotlib.pyplot as plt

# Import pyLM Libraries
from pyLM import *
from pyLM.units import *
from pySTDLM import *
from pySTDLM.PostProcessing import *

# Enable plotting inline in the Jupyter notebook
%matplotlib inline

## Constants

The reaction rates come from the Whole Cell Model in the [Cell](https://www.cell.com/cell/fulltext/S0092-8674(21)01488-4?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS0092867421014884%3Fshowall%3Dtrue) paper, for the DnaA-coding gene (G_0001) at initial conditions.

The protein degradation rate is calculated from a half-life of 25 hours, following [Maier et al., 2011](https://www.embopress.org/doi/full/10.1038/msb.2011.38).

In [None]:
# Constants
k_transcription  = 6.41e-4       # Transcription, s^-1
k_degra_mRNA = 2.59e-3     # degradation of mRNA, s^-1
k_translation = 7.2e-2        # translation, s^-1
k_degra_ptn = 7.70e-6      # degradation of protein, s^-1
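
The conversion from half-life to rate constant can be checked with a short calculation. This is a minimal sketch that assumes simple first-order (exponential) decay, so that the rate constant equals ln(2) divided by the half-life; the variable names are illustrative and are not used elsewhere in the tutorial.

```python
import numpy as np

# Assumption: first-order decay, so k = ln(2) / t_half
half_life_s = 25 * 3600                      # 25 hours expressed in seconds
k_degra_ptn_check = np.log(2) / half_life_s  # protein degradation rate, s^-1

print('Protein degradation rate: {0:.2e} s^-1'.format(k_degra_ptn_check))
```

The result agrees with the value of `k_degra_ptn` used above to three significant figures.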

## Define CME simulation

We begin by creating a [CMESimulation](https://luthey-schulten.chemistry.illinois.edu/software/LM2.4/_autosummary/pyLM.CME.html#module-pyLM.CME) "object" that we call ```sim```. This object will include the definition of the whole stochastic simulation.

In [None]:
# Create our CME simulation object
sim = CME.CMESimulation(name='Gene Expression')

Next we define the chemical species in the simulation. First, we specify the names of the chemical species. Then we register these species with the simulation. The [```defineSpecies()```](https://luthey-schulten.chemistry.illinois.edu/software/LM2.4/_autosummary/pyLM.CME.html#module-pyLM.CME) function can be called multiple times and will add any new names to the list of species.

In [None]:
# Define our chemical species
species = ['gene', 'mRNA', 'ptn']
sim.defineSpecies(species)

Here we add reactions to the simulation using the [```addReaction()```](https://luthey-schulten.chemistry.illinois.edu/software/LM2.4/_autosummary/pyLM.CME.html#module-pyLM.CME) function, which is a member of the ```CMESimulation``` object. We add four first-order reactions: transcription, mRNA degradation, translation, and protein degradation. When more than one species is involved on one side of a reaction, the names should be passed as a tuple, as in the products of the transcription and translation reactions. An empty string is passed as the product of the two degradation reactions. The rates in this command must be in units of molecules and seconds; for the first-order reactions used here, that is ```/sec```.

In [None]:
# Add reactions to the simulation

sim.addReaction(reactant='gene', product=('gene','mRNA'), rate=k_transcription)
sim.addReaction(reactant='mRNA', product='', rate=k_degra_mRNA)
sim.addReaction(reactant='mRNA', product=('mRNA','ptn'), rate=k_translation)
sim.addReaction(reactant='ptn', product='', rate=k_degra_ptn)


Next, we add the initial particle counts to the simulation using the [```addParticles()```](https://luthey-schulten.chemistry.illinois.edu/software/LM2.4/_autosummary/pyLM.CME.html#module-pyLM.CME) function.

In [None]:
# Set our initial species counts

sim.addParticles(species='gene', count=1)
sim.addParticles(species='mRNA', count=1)
sim.addParticles(species='ptn', count=0)
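
Because every reaction in this model is first order, the ensemble means of the CME obey linear rate equations that can be solved in closed form. The sketch below evaluates that solution for the rate constants and initial counts defined above, which gives a reference curve to compare with the simulated ensemble averages later on. The function names `mean_mRNA` and `mean_ptn` are illustrative and are not part of pyLM.

```python
import numpy as np

# Rate constants (s^-1) and initial counts copied from the cells above
k_transcription, k_degra_mRNA = 6.41e-4, 2.59e-3
k_translation, k_degra_ptn = 7.2e-2, 7.70e-6
gene0, mRNA0, ptn0 = 1, 1, 0

# Long-time limit of the mean mRNA count
mRNA_ss = k_transcription * gene0 / k_degra_mRNA

def mean_mRNA(t):
    """Solution of d<m>/dt = k_transcription*gene - k_degra_mRNA*<m>."""
    return mRNA_ss + (mRNA0 - mRNA_ss) * np.exp(-k_degra_mRNA * t)

def mean_ptn(t):
    """Solution of d<p>/dt = k_translation*<m> - k_degra_ptn*<p>."""
    decay_p = np.exp(-k_degra_ptn * t)
    decay_m = np.exp(-k_degra_mRNA * t)
    return (ptn0 * decay_p
            + k_translation * mRNA_ss * (1.0 - decay_p) / k_degra_ptn
            + k_translation * (mRNA0 - mRNA_ss) * (decay_p - decay_m)
            / (k_degra_mRNA - k_degra_ptn))

t = np.linspace(0.0, 6300.0, 6301)  # one point per second
mean_mRNA_t = mean_mRNA(t)
mean_ptn_t = mean_ptn(t)
```

Plotting `mean_mRNA_t` and `mean_ptn_t` against `t` alongside the ensemble averages from the stochastic runs shows how closely a finite number of replicates tracks the exact mean.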

Finally, we define the simulation execution parameters. The simulation runs for 6300 seconds of simulated time to cover the entire cell cycle.

Species counts are recorded every 1 second.

Then we name the simulation output file and save the simulation definition to it.

In [None]:
# Simulation time is 6300 s (the entire cell cycle of the Minimal Cell); counts are written every 1 s.
writeInterval = 1
simtime = 6300

sim.setWriteInterval(writeInterval)
sim.setSimulationTime(simtime)

filename = "./T2.1-GeneticInformationProcess.lm"

if os.path.exists(filename):
    os.remove(filename) # Remove previous LM file

sim.save(filename)

In [None]:
# Print out the information of the system
sim

## Run Simulation

In [None]:
# Run multiple replicates using the Gillespie solver
reps = 10

sim.run(filename=filename, method="lm::cme::GillespieDSolver", replicates=reps)
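
To make the solver step less of a black box, the following is a minimal NumPy sketch of the Gillespie direct method applied to the same four reactions. It is not the Lattice Microbes implementation and it records only the final state rather than a full trace; the names `stoich`, `propensities`, and `gillespie_direct` are hypothetical helpers introduced for illustration.

```python
import numpy as np

# Rate constants (s^-1) copied from the Constants cell
k_transcription, k_degra_mRNA = 6.41e-4, 2.59e-3
k_translation, k_degra_ptn = 7.2e-2, 7.70e-6

# One row per reaction; columns give the change in (gene, mRNA, ptn)
stoich = np.array([[0,  1,  0],   # gene -> gene + mRNA
                   [0, -1,  0],   # mRNA -> (degraded)
                   [0,  0,  1],   # mRNA -> mRNA + ptn
                   [0,  0, -1]])  # ptn  -> (degraded)

def propensities(state):
    """First-order propensities for the current (gene, mRNA, ptn) counts."""
    gene, mRNA, ptn = state
    return np.array([k_transcription * gene, k_degra_mRNA * mRNA,
                     k_translation * mRNA, k_degra_ptn * ptn])

def gillespie_direct(state0, t_end, rng):
    """Advance one replicate to t_end and return the final counts."""
    t = 0.0
    state = np.array(state0, dtype=int)
    while True:
        a = propensities(state)
        a_total = a.sum()
        if a_total == 0.0:
            break                              # no reaction can fire
        t += rng.exponential(1.0 / a_total)    # waiting time to next event
        if t > t_end:
            break
        reaction = rng.choice(len(a), p=a / a_total)  # which reaction fires
        state += stoich[reaction]
    return state

rng = np.random.default_rng(0)
final_state = gillespie_direct([1, 1, 0], 6300.0, rng)
print('Final (gene, mRNA, ptn) counts:', final_state)
```

Each loop iteration draws an exponentially distributed waiting time from the total propensity and then picks one reaction with probability proportional to its propensity, which are the two sampling steps of the direct method.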

## Post-Processing

Create a folder for the plots.

In [None]:
plotfolder = './plots_GeneticInformationProcess/'

if not os.path.exists(plotfolder):
    os.mkdir(plotfolder)

#### Serialize traces in the LM file to a 3D NumPy array with dimensions *(time, species, replicates)*.

In [None]:
import plot as plot

fileHandle = PostProcessing.openLMFile(filename) # Create h5py file handle
timestep = PostProcessing.getTimesteps(fileHandle) # use PostProcessing to get the timesteps of the simulation

traces = np.zeros((len(timestep), len(sim.particleMap), reps)) # Initiate 3D array

# See the plot module imported above for details
traces = plot.get_sim_data(traces, reps, filename) # Get 3D array

print('The size of the 3D trajectories is {0} with dimensions time, species, and replicates.'.format(np.shape(traces)))
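
The cells that follow slice this array along different axes. As a small illustration of the indexing convention, the sketch below builds a synthetic array with the same *(time, species, replicates)* layout, assuming the species order gene, mRNA, ptn used in this tutorial. The array `demo` holds placeholder numbers, not simulation output.

```python
import numpy as np

n_times, n_species, n_reps = 5, 3, 4
# Synthetic placeholder data with layout (time, species, replicates)
demo = np.arange(n_times * n_species * n_reps).reshape(n_times, n_species, n_reps)

mRNA_rep0 = demo[:, 1, 0]      # mRNA trace of the first replicate, shape (5,)
ptn_all_reps = demo[:, 2, :]   # protein traces of all replicates, shape (5, 4)
ptn_end = demo[-1, 2, :]       # protein counts at the last time point, shape (4,)

print(mRNA_rep0.shape, ptn_all_reps.shape, ptn_end.shape)
```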

Plot the traces of mRNA and protein for each replicate individually.

Compare the traces of mRNA (e.g. mRNA_replicate1.png) and protein (e.g. protein_replicate1.png) from the same replicate. You should see an increase (burst) in the protein count while mRNA is present; when no mRNA is present, the protein count holds steady or gradually decreases.

In [None]:
legends = ['mRNA']
for rep_mRNA in range(1, min(reps, 10)+1): # Plot at most the first 10 replicates
    mRNA_singlereplicate = traces[:,1,rep_mRNA-1]  # Slice single replicate from the 3D array  
    mRNA_singlereplicate = mRNA_singlereplicate[:, np.newaxis] # Change shape from (n,) to (n,1)
    fig_path = './plots_GeneticInformationProcess/{0}_replicate{1}.png'.format(legends[0], rep_mRNA)
    ylabel = 'Count'; title = 'Trace of {0} in Replicate {1} '.format(legends[0], rep_mRNA)

    plot.plot_traces(timestep, mRNA_singlereplicate, legends, fig_path, ylabel, title)


legends = ['protein']
for rep_ptn in range(1, min(reps, 10)+1): # Plot at most the first 10 replicates

    ptn_singlereplicate = traces[:,2,rep_ptn-1]
    ptn_singlereplicate = ptn_singlereplicate[:, np.newaxis]

    fig_path = './plots_GeneticInformationProcess/{0}_replicate{1}.png'.format(legends[0], rep_ptn)
    ylabel = 'Count'; title = 'Trace of {0} in Replicate {1} '.format(legends[0], rep_ptn)

    plot.plot_traces(timestep, ptn_singlereplicate, legends, fig_path, ylabel, title)

Plot the min, max and ensemble average of mRNA and protein

In [None]:
# mRNA
trace_mRNA = traces[:,1,:]
specie = 'mRNA'
fig_path = './plots_GeneticInformationProcess/minmaxavg_{0}_{1}replicates.png'.format(specie, reps)
ylabel = 'Count'; title = 'Trace of {0} Among {1} replicates'.format(specie, reps)
plot.plot_min_max_avg(timestep, trace_mRNA, fig_path, ylabel, title)

# protein
trace_ptn = traces[:,2,:]
specie = 'protein'
fig_path = './plots_GeneticInformationProcess/minmaxavg_{0}_{1}replicates.png'.format(specie, reps)
ylabel = 'Count'; title = 'Trace of {0} Among {1} replicates'.format(specie, reps)
plot.plot_min_max_avg(timestep, trace_ptn, fig_path, ylabel, title)
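
The internals of `plot.plot_min_max_avg` are not shown in this notebook. As an assumption about what such a helper computes, the sketch below reduces a synthetic *(time, replicates)* array along the replicate axis to obtain the minimum, maximum, and ensemble average at each time point. The array `demo_trace` is random placeholder data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic placeholder counts with layout (time, replicates)
demo_trace = rng.poisson(lam=5.0, size=(100, 10))

trace_min = demo_trace.min(axis=1)    # lowest count across replicates
trace_max = demo_trace.max(axis=1)    # highest count across replicates
trace_avg = demo_trace.mean(axis=1)   # ensemble average across replicates

print(trace_min.shape, trace_max.shape, trace_avg.shape)
```

Reducing over `axis=1` keeps one value per time point, so each result can be plotted directly against the time axis.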

#### Plot the distribution of Protein at the end of the simulation

Go to *fig_path* to see the plots

In [None]:
# Protein Distribution
ptns_end = traces[-1,2,:] # Slice Numpy array to get the counts of ptn at the end of the whole cell cycle
fig_path = plotfolder + 'Distribution_Ptns_at_{0}_seconds_{1}replicates.png'.format(simtime, reps)
xlabel = 'Counts of Protein [#]'
title = 'Distribution of Ptn Counts at {0} seconds {1} replicates'.format(simtime, reps)
plot.plot_histogram(data=ptns_end, figure_path=fig_path, bins=20, xlabel=xlabel, title=title)

# mRNA Distribution
mRNAs_end = traces[-1,1,:] # Slice Numpy array to get the counts of mRNA at the end of the whole cell cycle
fig_path = plotfolder + 'Distribution_mRNA_at_{0}_seconds_{1}replicates.png'.format(simtime, reps)
xlabel = 'Counts of mRNA [#]'
title = 'Distribution of mRNA Counts at {0} seconds {1} replicates'.format(simtime, reps)
plot.plot_histogram(data=mRNAs_end, figure_path=fig_path, bins=5, xlabel=xlabel, title=title)
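
A histogram built from a handful of replicates is noisy, so it can help to summarize the end-point counts numerically as well. The sketch below computes the sample mean, the variance, and the standard error of the mean for a synthetic set of counts; `demo_end_counts` is random placeholder data standing in for an array such as `ptns_end`.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic placeholder for end-point counts from 100 replicates
demo_end_counts = rng.poisson(lam=50.0, size=100)

mean_count = demo_end_counts.mean()
var_count = demo_end_counts.var(ddof=1)               # unbiased sample variance
sem_count = np.sqrt(var_count / demo_end_counts.size) # standard error of the mean

print('mean = {0:.2f}, variance = {1:.2f}, SEM = {2:.2f}'.format(
    mean_count, var_count, sem_count))
```

The standard error shrinks in proportion to one over the square root of the number of replicates, so ten times more replicates narrows the uncertainty on the mean by roughly a factor of three.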

## Discussion 2

1. Do mRNA and protein reach steady state during the 6300-second simulation? How can you tell from the plots? Try increasing the number of replicates, ```reps```, from 10 to 100 to make your analysis more statistically meaningful.

2. The initial count of protein P\_0001/DnaA from experimental proteomics data is 148. Compare the mean protein count at the end of the cell cycle to this experimental count. Does the simulation generate roughly 148 proteins over the entire cell cycle? Why is this important? Consider cell division.