# Modeling experimental noise

(c) 2019 Manuel Razo. This work is licensed under a [Creative Commons Attribution License CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/). All code contained herein is licensed under an [MIT license](https://opensource.org/licenses/MIT)

---

In [1]:
import os
import pickle
import cloudpickle
import itertools
import glob

# Our numerical workhorses
import numpy as np
import scipy as sp
import pandas as pd

# Import matplotlib stuff for plotting
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib as mpl

# Seaborn, useful for graphics
import seaborn as sns

# Import the project utils
import sys
sys.path.insert(0, '../../')
import ccutils

# Magic function to make matplotlib inline; other style specs must come AFTER
%matplotlib inline

# Render inline figures at high ('retina') resolution
%config InlineBackend.figure_format = 'retina'

tmpdir = '../../tmp/'
figdir = '../../fig/moment_dynamics_numeric/'
datadir = '../../data/csv_maxEnt_dist/'

In [2]:
# Set PBoC plotting format
ccutils.viz.set_plotting_style()
# Increase dpi
mpl.rcParams['figure.dpi'] = 110

### $\LaTeX$ macros
$\newcommand{\kpon}{k^{(p)}_{\text{on}}}$
$\newcommand{\kpoff}{k^{(p)}_{\text{off}}}$
$\newcommand{\kron}{k^{(r)}_{\text{on}}}$
$\newcommand{\kroff}{k^{(r)}_{\text{off}}}$
$\newcommand{\rm}{r _m}$
$\newcommand{\gm}{\gamma _m}$
$\newcommand{\rp}{r _p}$
$\newcommand{\gp}{\gamma _p}$
$\newcommand{\mm}{\left\langle m \right\rangle}$
$\newcommand{\foldchange}{\text{fold-change}}$
$\newcommand{\ee}[1]{\left\langle #1 \right\rangle}$
$\newcommand{\bb}[1]{\mathbf{#1}}$
$\newcommand{\dt}[1]{{\partial{#1} \over \partial t}}$
$\newcommand{\Km}{\bb{K}}$
$\newcommand{\Rm}{\bb{R}_m}$
$\newcommand{\Gm}{\bb{\Gamma}_m}$
$\newcommand{\Rp}{\bb{R}_p}$
$\newcommand{\Gp}{\bb{\Gamma}_p}$
$\newcommand{\Var}{\text{Var}}$
$\newcommand{\std}{\text{STD}}$

## Experimental noise in the data and information loss

The objective of this notebook is to explore possible sources of experimental noise in the measured single-cell distributions. Our theoretical model predicts the protein distribution $P(p)$, but what we actually observe is not this quantity. Instead we get an indirect readout: the number of electrons that the photomultiplier in the microscope camera emits in response to a certain number of incoming photons. That means we have a Markov chain of the form
$$
c \rightarrow p \rightarrow \nu,
$$
where $c$ is the environmental concentration of the inducer, $p$ is the protein copy number and $\nu$ is the intensity readout from the microscope. The data processing inequality tells us that for this chain it must be the case that
$$
I(c; \nu) \leq I(c; p),
$$
in other words, as we have more and more steps in the chain we can only lose information about the input.
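The inequality can be illustrated numerically with a toy version of the chain. The sketch below is not the model used in this notebook: the three input concentrations, the Poisson form of $P(p \mid c)$, the mean protein counts, and the value of $\lambda$ are all assumptions chosen only to make the information loss visible.

```python
import numpy as np
from scipy import stats


def mutual_information(p_x, p_y_given_x):
    """Return I(X; Y) in bits from P(x) and the kernel P(y | x) (rows: x)."""
    joint = p_x[:, None] * p_y_given_x
    p_y = joint.sum(axis=0)
    indep = p_x[:, None] * p_y[None, :]
    mask = joint > 0  # skip zero-probability terms (0 log 0 = 0)
    return np.sum(joint[mask] * np.log2(joint[mask] / indep[mask]))


# Toy inputs (all values are assumptions for illustration)
p_c = np.ones(3) / 3                          # three equally likely inputs
mean_p_given_c = np.array([5.0, 20.0, 60.0])  # mean protein count per input
lam = 0.5                                     # mean photons per protein

p_vals = np.arange(151)
nu_vals = np.arange(201)

# P(p | c): rows index c, columns index p (truncated and renormalized)
p_given_c = stats.poisson.pmf(p_vals[None, :], mean_p_given_c[:, None])
p_given_c /= p_given_c.sum(axis=1, keepdims=True)

# P(nu | p): rows index p, columns index nu (truncated and renormalized)
nu_given_p = stats.poisson.pmf(nu_vals[None, :], lam * p_vals[:, None])
nu_given_p /= nu_given_p.sum(axis=1, keepdims=True)

# Chain the two kernels: P(nu | c) = sum_p P(p | c) P(nu | p)
nu_given_c = p_given_c @ nu_given_p

info_p = mutual_information(p_c, p_given_c)
info_nu = mutual_information(p_c, nu_given_c)
print(f"I(c; p)  = {info_p:.3f} bits")
print(f"I(c; nu) = {info_nu:.3f} bits")
```

Whatever values are chosen for the toy parameters, the second number should never exceed the first, since $\nu$ depends on $c$ only through $p$.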

So far our theoretical model overestimates the channel capacity of the cells compared with the experimental determination of this quantity. More fundamentally, the single-cell intensity data that we obtain from microscopy are noisier than the theoretical predictions. Here we model sources of experimental noise to see if they can explain the discrepancy between theory and data. Our first attempt is to model the number of photons emitted by each fluorophore as Poisson distributed.

### Poisson emission of photons

The protein that we indirectly measure in the microscope is a YFP molecule that, upon excitation at the appropriate wavelength, emits an unknown number of photons $\nu$. That means that the quantity we observe experimentally (ignoring for now the downstream conversion into electrons by the photomultiplier) is not the protein number but a photon count.

If we assume each protein emits on average $\lambda$ photons, we have that for a single protein the number of photons $\nu$ is distributed as
$$
P(\nu) = {\lambda^\nu e^{- \lambda} \over \nu!},
$$
i.e. a Poisson distribution. So for a given protein copy number $p$ the probability of observing $\nu$ photons is given by
$$
P(\nu \mid p) = {(p\lambda)^\nu e^{- p\lambda} \over \nu!},
$$
since the sum of $p$ i.i.d. Poisson random variables with mean $\lambda$ is also Poisson distributed, with mean $p\lambda$. Therefore, to obtain the distribution of the readout $\nu$ we have to marginalize over all possible protein values, i.e.
$$
P(\nu) = \sum_{p=0}^{\infty} P(\nu \mid p) P(p),
$$
where $P(p)$ is the theoretical protein distribution.
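In practice the infinite sum has to be truncated. A minimal sketch of the marginalization is shown below, assuming a negative binomial as a hypothetical stand-in for the theoretical $P(p)$ and an arbitrary value of $\lambda$; the variable names and truncation limits are illustrative choices only.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for the theoretical P(p): a negative binomial
# truncated at p_max and renormalized (an assumption for illustration only).
p_max = 400
p_vals = np.arange(p_max + 1)
p_dist = stats.nbinom.pmf(p_vals, 5, 0.05)
p_dist /= p_dist.sum()

lam = 3.0      # assumed mean number of photons per protein
nu_max = 2000  # photon-count truncation, well above lam * p_max
nu_vals = np.arange(nu_max + 1)

# P(nu | p): rows index nu, columns index p; Poisson with mean lam * p
cond = stats.poisson.pmf(nu_vals[:, None], lam * p_vals[None, :])

# Marginalize over the protein copy number: P(nu) = sum_p P(nu | p) P(p)
nu_dist = cond @ p_dist

print(f"total probability captured: {nu_dist.sum():.6f}")
```

Writing the sum as a matrix-vector product keeps the code short; for much larger supports it may be preferable to loop over blocks of $p$ to limit memory use.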

The mean photon emission in this model is then computed as
$$
\ee{\nu} = \sum_{\nu = 0}^\infty \nu P(\nu).
$$
Substituting the definition of $P(\nu)$ gives
$$
\ee{\nu} = \sum_{\nu = 0}^\infty \nu \sum_{p=0}^{\infty} P(\nu \mid p) P(p).
$$
Rearranging the sums results in
$$
\ee{\nu} = \sum_{p = 0}^\infty P(p)
\underbrace{\sum_{\nu = 0}^\infty \nu P(\nu \mid p)}_{\lambda p}.
$$
The term under the brace is the mean photon emission by $p$ proteins, $\lambda p$. This then results in
$$
\ee{\nu} = \lambda \sum_{p = 0}^\infty p P(p)
= \lambda \ee{p}.
$$
This result makes sense: the mean number of photons emitted is equal to the mean protein copy number $\ee{p}$ times the mean photon count per protein $\lambda$.

To compute the variance in photon emission we first need the second moment $\ee{\nu^2}$. Following the same steps as for the mean, this is given by
$$
\ee{\nu^2} = \sum_{p = 0}^\infty P(p)
\underbrace{\sum_{\nu = 0}^\infty \nu^2 P(\nu \mid p)}_{\lambda p + (\lambda p)^2}.
$$
Again we notice that the term under the brace is the second moment of a Poisson distribution with mean $\lambda p$. Substituting this results in
$$
\ee{\nu^2} = \sum_{p = 0}^\infty P(p)\left( \lambda p + (\lambda p)^2 \right)
= \lambda \ee{p} + \lambda^2 \ee{p^2}.
$$

With these two results we can compute the variance in photon count $\Var(\nu)$ as
$$
\Var(\nu) = \ee{\left( \nu - \ee{\nu} \right)^2} =
\ee{\nu^2} - \ee{\nu}^2.
$$
Substituting the two moments we obtained results in
$$
\Var(\nu) = \lambda \ee{p} + \lambda^2 \ee{p^2} - 
\left( \lambda \ee{p} \right)^2.
$$
Rearranging terms we obtain
$$
\Var(\nu) = \lambda \ee{p} + \lambda^2 \left( \ee{p^2} - \ee{p}^2 \right)
= \lambda \ee{p} + \lambda^2 \Var(p).
$$

This is an interesting result. The variance in the number of photons equals the variance in the number of proteins scaled by $\lambda^2$, plus an extra term that is linear in the mean protein count. The Poisson photon emission that we are assuming therefore adds variance on top of the variability in protein copy number.
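The expressions for $\ee{\nu}$ and $\Var(\nu)$ can be checked with a quick simulation of the two-step process. The sketch below assumes a negative binomial protein distribution and an arbitrary $\lambda$; both are hypothetical choices, since the derivation holds for any $P(p)$.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_cells = 200_000
lam = 3.0  # assumed mean number of photons per protein

# Hypothetical protein distribution (negative binomial) standing in for P(p)
p = rng.negative_binomial(5, 0.05, size=n_cells)
# Each cell emits a Poisson number of photons with mean lam * p
nu = rng.poisson(lam * p)

# Predictions from the derivation, using the sampled protein moments
mean_pred = lam * p.mean()
var_pred = lam * p.mean() + lam**2 * p.var()

print(f"mean:     sampled {nu.mean():.1f}, predicted {mean_pred:.1f}")
print(f"variance: sampled {nu.var():.1f}, predicted {var_pred:.1f}")
```

The sampled and predicted values should agree up to sampling error, which shrinks as `n_cells` grows.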

If we now compute the noise (STD / mean) we obtain
$$
\text{noise} = {\std(\nu) \over \ee{\nu}} =
{\sqrt{\lambda \ee{p} + \lambda^2 \Var(p)} \over \lambda \ee{p}}
$$
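As a worked step, squaring the noise and splitting the fraction separates the two contributions. This is only a rearrangement of the expression above and introduces no new assumptions:

$$
\text{noise}^2 = {\Var(\nu) \over \ee{\nu}^2}
= {\lambda \ee{p} + \lambda^2 \Var(p) \over \lambda^2 \ee{p}^2}
= {1 \over \lambda \ee{p}} + {\Var(p) \over \ee{p}^2}.
$$

The second term is the squared noise of the protein distribution itself, while the first term is the contribution from the Poisson photon emission, which becomes small when the mean photon count $\lambda \ee{p}$ is large.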