# 20191108 theory-vs-experiment analysis

(c) 2019 Manuel Razo & Niko McCarty. This work is licensed under a [Creative Commons Attribution License CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/). All code contained herein is licensed under an [MIT license](https://opensource.org/licenses/MIT).

In [1]:
import os
import glob
import itertools
import re
import regex
import numpy as np
import pandas as pd
import skbio
import git

# Import this project's library
import rnaseq_barcode as rnaseq

# Import Interactive plot libraries
import bokeh.io
import bokeh.palettes
import bokeh.plotting
import bokeh.layouts
from bokeh.themes import Theme
import holoviews as hv
import hvplot
import hvplot.pandas
import panel as pn
import bokeh_catplot

bokeh.io.output_notebook()
hv.extension('bokeh')

In [2]:
theme = Theme(json=rnaseq.viz.pboc_style_bokeh())
hv.renderer('bokeh').theme = theme
bokeh.io.curdoc().theme = theme

# Objective

In this notebook we take the output generated in `barcode_quantification.ipynb`, i.e. the cDNA counts normalized by the corresponding gDNA counts, and compute the fold-change in gene expression in order to compare theory with experiment.

First we will read the list of normalized barcode counts.

In [3]:
# Find home directory for repo
repo = git.Repo("./", search_parent_directories=True)
homedir = repo.working_dir

# Define directory with barcodes
bcdir = f'{homedir}/data/barcodes/20191108_RNA_DNA_seq/'

df_norm = pd.read_csv(f'{bcdir}barcode_norm_counts.csv')

df_norm.head()

Unnamed: 0,op_bc,gfp_bc,counts,bio_rep,tec_rep,repressor,operator,norm_counts
0,AAAAAAATATAATTAGGACC,AGTC,9,1,1,60,O1,0.125
1,AAAAAAATATAATTAGGACC,CGTT,18,1,1,1740,O1,0.28125
2,AAAAAAATATAATTAGGACC,GTAC,61,1,1,260,O1,0.622449
3,AAAAAAATATAATTAGGACC,TCAG,98,1,1,124,O1,0.308176
4,AAAAAAATATAATTAGGACC,TGCA,9,1,1,1220,O1,0.409091


Out of curiosity, let's look at the ECDF of the normalized counts, split by operator.

In [9]:
p = bokeh.plotting.figure(
    width=400,
    height=300,
    x_axis_type='log',
    x_axis_label='norm. counts (cDNA/gDNA)',
    y_axis_label='ECDF'
)

bokeh_catplot.ecdf(
    data=df_norm,
    cats='operator',
    val='norm_counts',
    formal=True,
    p=p
)

bokeh.io.show(p)



We can see that the trend in the ECDF is what we would expect: O3, the weakest-binding operator, has mostly high normalized counts.
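For reference, an ECDF assigns to each sorted value the fraction of data points less than or equal to it. Below is a minimal sketch with NumPy. The helper name `ecdf_points` and the input values are hypothetical, and this is not meant to reproduce the internals of `bokeh_catplot.ecdf`.

```python
import numpy as np


def ecdf_points(values):
    """Return sorted values and their ECDF heights (hypothetical helper)."""
    # Sort the data; the i-th sorted point has i/n of the data at or below it
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


# Made-up normalized counts, for illustration only
x_demo, y_demo = ecdf_points([0.4, 0.1, 0.3, 0.2])
```

In the notebook's terms, one could call such a helper on `norm_counts` for each operator separately to obtain the curves drawn in the plot.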

Let's compute the fold-change by normalizing the counts by those of the strain with zero repressors (∆lacI). As a first approximation, within each biological and technical replicate, we will take the mean normalized count for each operator and repressor copy number combination and divide it by the corresponding mean ∆lacI count.

In [15]:
# group data
df_group = df_norm.groupby(
    ['bio_rep', 'tec_rep', 'operator', 'repressor']
)

# Initialize dataframe to save output
df_fc = pd.DataFrame([])

# Loop through groups
for group, data in df_group:
    # Extract information
    bio_rep, tec_rep, op, rep = [*group]
    # Extract ∆lacI information
    data_delta = df_norm[
        (df_norm.bio_rep == bio_rep)
        & (df_norm.tec_rep == tec_rep)
        & (df_norm.operator == op)
        & (df_norm.repressor == 0)
    ]
    # Compute mean ∆lacI counts
    mean_delta = data_delta.norm_counts.mean()
    # Compute mean strain info
    mean_strain = data.norm_counts.mean()
    # Compute fold-change
    fc = mean_strain / mean_delta
    series = pd.Series([fc, mean_strain, bio_rep, tec_rep, op, rep],
                      index=['fold_change', 'mean_norm', 'bio_rep',
                               'tec_rep', 'operator', 'repressor'])
    df_fc = df_fc.append(series, ignore_index=True)
    
df_fc.head()

Unnamed: 0,bio_rep,fold_change,mean_norm,operator,repressor,tec_rep
0,1.0,1.0,1.454559,O1,0.0,1.0
1,1.0,0.446021,0.648764,O1,22.0,1.0
2,1.0,0.339889,0.494388,O1,60.0,1.0
3,1.0,0.324489,0.471989,O1,124.0,1.0
4,1.0,0.337867,0.491448,O1,260.0,1.0
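The loop above relies on `DataFrame.append`, which was removed in pandas 2.0. The same computation can be sketched without an explicit loop using `groupby` and `merge`. The toy dataframe `df_toy` and its values are hypothetical. It only mimics the column names of `df_norm`, so this is an illustration under that assumption rather than a drop-in replacement.

```python
import pandas as pd

# Hypothetical toy data with the same column names as df_norm
df_toy = pd.DataFrame({
    "bio_rep": [1, 1, 1, 1],
    "tec_rep": [1, 1, 1, 1],
    "operator": ["O1", "O1", "O1", "O1"],
    "repressor": [0, 0, 22, 22],
    "norm_counts": [1.0, 3.0, 0.5, 1.5],
})

keys = ["bio_rep", "tec_rep", "operator"]

# Mean normalized count per replicate, operator, and repressor copy number
df_mean = (
    df_toy.groupby(keys + ["repressor"], as_index=False)["norm_counts"]
    .mean()
    .rename(columns={"norm_counts": "mean_norm"})
)

# Mean ∆lacI (repressor == 0) count for each replicate and operator
df_delta = (
    df_mean.loc[df_mean["repressor"] == 0, keys + ["mean_norm"]]
    .rename(columns={"mean_norm": "mean_delta"})
)

# Attach the ∆lacI reference to every row and take the ratio
df_fc_toy = df_mean.merge(df_delta, on=keys, how="left")
df_fc_toy["fold_change"] = df_fc_toy["mean_norm"] / df_fc_toy["mean_delta"]
```

A left merge keeps groups that lack a ∆lacI reference, leaving their fold-change as `NaN` instead of silently dropping them.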


Let's now compute the theoretical fold-change for each operator over a range of repressor copy numbers, in the absence of inducer.

In [23]:
# Import constants
constants = rnaseq.thermo.load_constants()

# Extract binding energies
era = [constants[op] for op in df_fc.operator.unique()]

# Define repressor array
rep_array = np.logspace(1, np.log10(2000), 50)

# Generate meshgrid to feed into function
rr, ee = np.meshgrid(rep_array, era)

# Instantiate simple repression class
theory = rnaseq.thermo.SimpleRepression(
    rr,
    ee,
    effector_conc=0,
    ka=constants["Ka"],
    ki=constants["Ki"],
    ep_ai=constants["ep_AI"],
)

# Compute fold-change for lacI titration
fc_theory = theory.fold_change()
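To make the calculation explicit, here is a minimal sketch of the fold-change expression, assuming `rnaseq.thermo.SimpleRepression` follows the standard thermodynamic model of simple repression with an allosteric repressor. The function name `fold_change_simple_repression` is hypothetical. The default parameter values (binding energies in kBT, dissociation constants in µM, and the number of nonspecific sites `n_ns`) are illustrative values taken from the published literature on the lac system, not values read from `load_constants()`.

```python
import numpy as np


def fold_change_simple_repression(
    rep, eps_ra, c=0.0, ka=139.0, ki=0.53, eps_ai=4.5, n_ns=4.6e6
):
    """Fold-change for simple repression (hypothetical helper).

    rep    : repressors per cell
    eps_ra : repressor-operator binding energy (kBT)
    c      : inducer concentration (same units as ka and ki)
    ka, ki : inducer dissociation constants, active/inactive repressor
    eps_ai : energy difference between inactive and active states (kBT)
    n_ns   : number of nonspecific binding sites (assumed value)
    """
    # Probability that a repressor is in the active (DNA-binding) state
    active = (1 + c / ka) ** 2
    inactive = np.exp(-eps_ai) * (1 + c / ki) ** 2
    p_act = active / (active + inactive)
    # Fold-change = 1 / (1 + p_act * (R / N_NS) * exp(-eps_ra))
    return 1.0 / (1.0 + p_act * (rep / n_ns) * np.exp(-eps_ra))


# Illustrative binding energies for O1, O2, O3 (assumed, in kBT)
eps_demo = np.array([-15.3, -13.9, -9.7])
rep_demo = np.logspace(1, np.log10(2000), 50)

# Broadcasting yields one row per operator and one column per copy number
fc_demo = fold_change_simple_repression(rep_demo[None, :], eps_demo[:, None])
```

Broadcasting a row of repressor counts against a column of binding energies plays the same role as the `np.meshgrid` call above: each row of the result is the titration curve for one operator.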

In [35]:
colors = bokeh.palettes.Dark2_3

p_fc = bokeh.plotting.figure(
    width=400,
    height=300,
    x_axis_type='log',
    y_axis_type='log',
    x_axis_label='repressor/cell',
    y_axis_label='fold-change'
)

for i, fc in enumerate(fc_theory):
    p_fc.line(rep_array, fc, color=colors[i],
              line_width=2)

df_group = df_fc.groupby('operator')
# Loop through operators
for i, (group, data) in enumerate(df_group):
    p_fc.circle(data.repressor, data.fold_change, color=colors[i])
    
bokeh.io.show(p_fc)