# Thermodynamic model to predict gene expression

(c) 2020 Tom Röschinger. This work is licensed under a [Creative Commons Attribution License CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/). All code contained herein is licensed under an [MIT license](https://opensource.org/licenses/MIT).

In [2]:
import wgregseq
%load_ext autoreload
%autoreload 2

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

# Set default plotting style
wgregseq.plotting_style();



In this notebook we write down a thermodynamic model that predicts gene expression from promoter sequences. The goal is to use these predictions to identify the locations of binding sites without knowing the underlying energy matrix of the binding site.

### Input

The input to the model is the binding energy of a transcription factor. This binding energy is determined by an energy matrix, which may be chosen arbitrarily, and by the sequence, which may contain single mutations or scrambles. The binding energy is the only quantity that varies between sequences.
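
As a minimal sketch of how an energy matrix and a sequence could be combined into a binding energy, the snippet below assumes an additive model in which each position contributes independently. The helper `binding_energy`, the column order A, C, G, T, the units of $k_BT$, and the random matrix are illustrative assumptions and are not part of `wgregseq`.

```python
import numpy as np

# Assumed column order of the energy matrix (hypothetical convention).
BASES = "ACGT"


def binding_energy(seq, energy_matrix):
    """Sum per-position energy contributions (additive model, assumed).

    energy_matrix has shape (len(seq), 4), with columns ordered as BASES
    and entries in units of k_B T.
    """
    energy_matrix = np.asarray(energy_matrix, dtype=float)
    if len(seq) != energy_matrix.shape[0]:
        raise ValueError("sequence and energy matrix lengths differ")
    # Column index of the base found at each position of the sequence.
    cols = [BASES.index(base) for base in seq.upper()]
    return energy_matrix[np.arange(len(seq)), cols].sum()


# Arbitrary energy matrix for a 5 bp site, for illustration only.
rng = np.random.default_rng(0)
emat = rng.normal(size=(5, 4))

wt = "ACGTA"
mut = "ACTTA"  # single mutation G -> T at position index 2

# Energy change caused by the mutation.
delta = binding_energy(mut, emat) - binding_energy(wt, emat)
```

Under the additive assumption, a single mutation changes the energy only through the mutated position, so `delta` equals `emat[2, 3] - emat[2, 2]` here.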

## Simple repression motif

From [Chure et al., 2019](https://www.pnas.org/content/116/37/18275.short) the fold change in expression due to the simple repression motif is given by

\begin{equation}
    \text{fold-change} = \left( 1 + e^{-\beta\Delta \epsilon_{RA} +\log\left( R_A/N_{NS} \right) } \right)^{-1},
\end{equation}

where $R_A$ is the repressor copy number, $N_{NS}$ the number of non-specific binding sites, $\beta = 1/k_BT$ the inverse thermal energy, and $\Delta \epsilon_{RA}$ the binding energy of the repressor to the specific site relative to the non-specific background. Since we are interested in how a mutation changes the binding energy, we solve this expression for the energy.
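
The fold-change expression above can be evaluated numerically with a few lines of NumPy. This is a sketch only: the function name `fold_change`, the default `N_ns = 4.6e6`, and the parameter values in the example are assumptions chosen for illustration, not values taken from this notebook.

```python
import numpy as np


def fold_change(beta_eps, R, N_ns=4.6e6):
    """Fold change for simple repression.

    beta_eps : binding energy in units of k_B T (beta * Delta epsilon_RA)
    R        : repressor copy number
    N_ns     : number of non-specific binding sites (assumed default)
    """
    return 1.0 / (1.0 + np.exp(-beta_eps + np.log(R / N_ns)))


# Illustrative values only: stronger binding (more negative energy)
# and more repressors both lower the fold change.
energies = np.array([-14.0, -10.0, -6.0])
f_low_R = fold_change(energies, R=10)
f_high_R = fold_change(energies, R=1000)
```

Because the energy enters through a vectorized expression, the same function accepts a scalar or an array of energies, which is convenient when many mutated sequences are scored at once.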

Labeling fold change by $f$, we can write (for the wild type)

$$
-\log (1/f^{\text{(wt)}} -1) + \log (R_A/N_{NS}) = \beta\Delta\epsilon_{RA}^{\text{(wt)}}.
$$

Now we write the difference in binding energies between wild type and mutant. Conveniently, the repressor copy number and the number of non-specific binding sites cancel, provided they are the same for both sequences,

$$
-\log (1/f^{\text{(wt)}} -1) + \log (1/f^{\text{(mut)}} -1) = \beta(\Delta\epsilon_{RA}^{\text{(wt)}} - \Delta\epsilon_{RA}^{\text{(mut)}}).
$$
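
The relation above can be checked numerically with a round trip: compute fold changes from known energies, then recover their difference from the fold changes alone. The function `energy_shift` and all parameter values below are hypothetical and serve only as a sketch; the formula is valid for fold changes strictly between 0 and 1.

```python
import numpy as np


def energy_shift(f_wt, f_mut):
    """Return beta * (eps_wt - eps_mut) from two fold changes in (0, 1)."""
    f_wt = np.asarray(f_wt, dtype=float)
    f_mut = np.asarray(f_mut, dtype=float)
    return -np.log(1.0 / f_wt - 1.0) + np.log(1.0 / f_mut - 1.0)


# Round trip with illustrative values (energies in units of k_B T).
log_ratio = np.log(100 / 4.6e6)      # log(R_A / N_NS), assumed values
beta_eps_wt, beta_eps_mut = -10.0, -8.0

# Fold changes from the simple repression expression.
f_wt = 1.0 / (1.0 + np.exp(-beta_eps_wt + log_ratio))
f_mut = 1.0 / (1.0 + np.exp(-beta_eps_mut + log_ratio))

# Recovered energy difference; log_ratio is not needed here.
shift = energy_shift(f_wt, f_mut)
```

The recovered `shift` matches `beta_eps_wt - beta_eps_mut` without any knowledge of the repressor copy number or the number of non-specific sites, which is the cancellation stated above.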