## siRNA knockdown calibrated ##

This notebook fits analytical functions to Rafał’s data.

## Motivation
siRNA (small interfering RNA) triggers specific degradation of mRNA. The RISC (RNA-induced silencing complex), which consists of siRNA and some proteins, cuts mRNA containing a strand sequence complementary to the sequence of the siRNA. Reducing protein expression by adding siRNA is called gene knockdown.

Gene knockdown is a promising approach for the treatment of some diseases, e.g. cancer. The aim of this project is to study the influence of siRNA on gene expression and mRNA degradation at the single-cell level to promote the development of siRNA-based medical treatments.

To this end, the solutions of the differential equations describing the expression network are fitted to measured fluorescence traces of cells transfected with a GFP mRNA and an RFP mRNA. siRNA specific for the GFP mRNA is added, and RFP serves as a reference.

The fit parameters include the GFP mRNA degradation rate $\delta_\text{g}$ and the RFP mRNA degradation rate $\delta_\text{r}$.

The solutions look like this:

$$f_\text{red}(t) =
m_\text{r}\,k_\text{tl} \left(
\frac{1}{\beta_\text{r}-\delta_\text{r}+k_\text{m,r}} \mathrm{e}^{-(\beta_\text{r}+k_\text{m,r})(t-t_0)}
-\frac{1}{\beta_\text{r} - \delta_\text{r}} \mathrm{e}^{-\beta_\text{r} (t-t_0)}
+\frac{k_\text{m,r}}{(\beta_\text{r}-\delta_\text{r}) (\beta_\text{r}-\delta_\text{r}+k_\text{m,r})} \mathrm{e}^{-\delta_\text{r} (t-t_0)}
\right)
$$

$$f_\text{green}(t) =
m_\text{g}\,k_\text{tl} \left(
\frac{1}{\beta_\text{g}-\delta_\text{g}+k_\text{m,g}} \mathrm{e}^{-(\beta_\text{g}+k_\text{m,g})(t-t_0)}
-\frac{1}{\beta_\text{g} - \delta_\text{g}} \mathrm{e}^{-\beta_\text{g} (t-t_0)}
+\frac{k_\text{m,g}}{(\beta_\text{g}-\delta_\text{g}) (\beta_\text{g}-\delta_\text{g}+k_\text{m,g})} \mathrm{e}^{-\delta_\text{g} (t-t_0)}
\right)
$$
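
As a quick plausibility check of the closed-form solutions above, the following minimal sketch evaluates one of them with NumPy. The function name `expression_curve` and the parameter values are illustrative assumptions, not values taken from the data. Note that the expression is undefined for $\beta = \delta$ or $\beta - \delta + k_\text{m} = 0$, because a denominator vanishes there.

```python
import numpy as np

def expression_curve(t, t0, m_ktl, k_m, beta, delta):
    """Closed-form solution from above; set to zero before the onset time t0."""
    t = np.asarray(t, dtype=float)
    f = np.zeros_like(t)
    after = t > t0
    dt = t[after] - t0
    f[after] = m_ktl * (
        np.exp(-(beta + k_m) * dt) / (beta - delta + k_m)
        - np.exp(-beta * dt) / (beta - delta)
        + k_m * np.exp(-delta * dt) / ((beta - delta) * (beta - delta + k_m))
    )
    return f

# Illustrative (assumed) parameter values
t = np.linspace(0, 30, 301)
curve = expression_curve(t, t0=2.0, m_ktl=2e4, k_m=0.1, beta=0.04, delta=0.2)

# Just after the onset, the three terms in the bracket cancel almost exactly
just_after_onset = expression_curve(np.array([2.0 + 1e-6]), t0=2.0, m_ktl=2e4,
                                    k_m=0.1, beta=0.04, delta=0.2)[0]
```

The three prefactors in the bracket sum to zero, so the curve starts continuously at zero at $t = t_0$ and then rises before decaying again.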

## Notebook structure
The notebook has the following structure:

First, the model functions are defined and the data are loaded. The following section contains code for fitting the two models separately, and the section after that contains code for fitting both traces in one run with parameters shared between the models.
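
The idea of a combined fit with shared parameters can be illustrated with a minimal sketch. It assumes a toy exponential model instead of the expression model, synthetic noise-free data, and the `L-BFGS-B` method of `scipy.optimize.minimize`; the names `toy_model` and `combined_objective` are hypothetical and are not used elsewhere in this notebook.

```python
import numpy as np
from scipy.optimize import minimize

def toy_model(t, amplitude, rate):
    """Toy stand-in for the expression model: a single exponential decay."""
    return amplitude * np.exp(-rate * t)

def combined_objective(theta, t, data_red, data_green):
    """Sum of squared residuals of both traces; `amplitude` is shared."""
    amplitude, rate_red, rate_green = theta
    res_red = data_red - toy_model(t, amplitude, rate_red)
    res_green = data_green - toy_model(t, amplitude, rate_green)
    return 0.5 * (np.sum(res_red**2) + np.sum(res_green**2))

# Synthetic traces that share the amplitude but differ in their rates
t = np.linspace(0, 10, 50)
data_red = toy_model(t, 5.0, 0.3)
data_green = toy_model(t, 5.0, 1.0)

# One parameter vector for both traces: (shared amplitude, red rate, green rate)
theta0 = np.array([3.0, 0.5, 0.5])
result = minimize(combined_objective, theta0, args=(t, data_red, data_green),
                  method='L-BFGS-B', bounds=[(0, None)] * 3)
```

Because both traces enter a single objective, the shared parameter is constrained by both data sets at once, whereas separate fits would estimate it twice.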

Fitting requires that the result list `R` is defined, which can be done by running the corresponding cell. When `R` has been populated by fitting, the results can be plotted. There are cells for plotting the results of the separate fit, the results of the combined fit, and the pure parameter distributions of all fits.

Additionally, there are cells for saving and loading parameters with Python’s `pickle` module.

In [None]:
# Import modules needed

# Standard library
from collections import OrderedDict
from copy import deepcopy
import inspect
import os
import pickle
import re
import sys

# Scientific stack
import numpy as np
np.seterr(divide='print')
import pandas as pd
import scipy as sc
import scipy.optimize as so
import scipy.stats as ss
from sklearn.neighbors import KernelDensity

# Matplotlib
%matplotlib inline
from matplotlib.backends.backend_pdf import PdfPages
from matplotlib.gridspec import GridSpec
import matplotlib.lines as mlin
import matplotlib.patches as mptch
import matplotlib.pyplot as plt

# Notebook utilities
import IPython
import ipywidgets as wdg

In [None]:
# Define utility functions
def getTimeStamp():
    """Returns a human-readable string representation of the current time"""
    from datetime import datetime
    return datetime.now().strftime("%Y-%m-%d–%H%M%S")


def getOutpath(filename='', timestamp=None):
    """Returns (and creates, if necessary) the path to a directory
    called “out” inside the current directory.
    If `filename` is given, the filename is appended to the output directory.
    A timestamp will be added to the filename if `timestamp != ''`.
    If timestamp is `None`, the current timestamp is used.
    """
    # Create output directory
    outpath = os.path.join(os.getcwd(), 'out')
    if not os.path.isdir(outpath) and not os.path.lexists(outpath):
        os.mkdir(outpath)

    # If requested, build filename
    if len(filename) > 0:
        if timestamp is None:
            timestamp = getTimeStamp()
        outpath = os.path.join(outpath, ((timestamp + '_') if len(timestamp) > 0 else '') + filename)
    return outpath

In [None]:
def model(t, t0, m, k, beta, delta, offset):
    """General expression model function"""

    f = np.zeros(np.shape(t))
    idx_after = (t > t0)
    dt = t[idx_after] - t0

    f1 = np.exp(- (beta + k) * dt) / (beta - delta + k)
    f2 = - np.exp(- beta * dt) / (beta - delta)
    f3 = k * np.exp(- delta * dt) / (beta - delta) / (beta - delta + k)

    f[idx_after] = (f1 + f2 + f3) * m

    return f + offset

def red(t, tr, m_ktl, kmr, betr, deltr, offr):
    """Model function for RFP data"""
    return model(t=t, t0=tr, m=m_ktl, k=kmr, beta=betr, delta=deltr, offset=offr)

def green(t, tg, m_ktl, kmg, betg, deltg, offg):
    """Model function for GFP data"""
    return model(t=t, t0=tg, m=m_ktl, k=kmg, beta=betg, delta=deltg, offset=offg)

def combined(t, tr, tg, m_ktl, kmr, kmg, betr, betg, deltr, deltg, offr, offg):
    """Model function for a combined fit of red and green data"""
    return np.stack(
        (model(t=t, t0=tr, m=m_ktl, k=kmr, beta=betr, delta=deltr, offset=offr),
         model(t=t, t0=tg, m=m_ktl, k=kmg, beta=betg, delta=deltg, offset=offg)),
        axis=1)

In [None]:
# Set default parameter values
m_ktl_0 = 2e4

tr_0 = 4.5
kmr_0 = 0.1
betr_0 = 0.3
deltr_0 = 0.03
offr_0 = 0

tg_0 = 2
kmg_0 = 0.1
betg_0 = 0.04
deltg_0 = 2
offg_0 = 0

MAX_m_ktl = 5e5
MAX_tr = 30
MAX_tg = 30
MAX_kmr = 30
MAX_kmg = 30
MAX_betr = 10
MAX_betg = 10
MAX_deltr = None
MAX_deltg = None

MIN_m_ktl = 1

In [None]:
class FitParameters:
    """FitParameters facilitates managing values and bounds of fit parameters"""
    def __init__(self, fun, independent=(), fixed=()):
        # Store function
        self.fun = fun

        # Get parameters of fun
        params = inspect.signature(self.fun).parameters

        # Build data frame of parameters
        self.df = pd.DataFrame(columns=['value', 'min', 'max'],
                               index=[p for p in params.keys()],
                               dtype=np.float64)

        # Set “independent” and “fixed” flags as boolean columns
        self.df['independent'] = [p in independent for p in self.df.index]
        self.df['fixed'] = [p in fixed for p in self.df.index]

        # Set default parameters
        for p in self.df.index.values:
            if params[p].default == inspect.Parameter.empty:
                if self.df.loc[p, 'independent']:
                    self.df.loc[p, 'value'] = np.nan
                else:
                    self.df.loc[p, 'value'] = 0
            else:
                self.df.loc[p, 'value'] = params[p].default

    def set(self, p, **props):
        """Allows user to change parameter properties"""
        if p not in self.df.index.values:
            raise KeyError("Unknown parameter name: {}".format(p))

        for prop, val in props.items():
            if prop == 'value':
                self.df.loc[p, 'value'] = val
            elif prop == 'min':
                self.df.loc[p, 'min'] = val
            elif prop == 'max':
                self.df.loc[p, 'max'] = val
            elif prop == 'independent':
                self.df.loc[p, 'independent'] = val
            elif prop == 'fixed':
                self.df.loc[p, 'fixed'] = val
            else:
                raise KeyError("Illegal parameter property: {}".format(prop))

    def eval_params(self, params=[], **vals):
        """Returns parameters for evaluating the function.

        Arguments:
        params: optional list of values of free parameters
        vals: dictionary of parameter values

        If a value for a parameter is specified in both `params` and `vals`,
        the value from `vals` is used.
        Values for independent parameters must be specified in `vals`."""
        # Add additional values from `params` to vals
        if len(params) != 0:
            par_names = self.names()
            if np.size(par_names) != len(params):
                raise ValueError("Wrong number of parameters given ({})".format(len(params)))
            for pn, pv in zip(par_names, params):
                if pn not in vals:
                    vals[pn] = pv

        # Fill values unspecified so far from `self.df`
        for p in self.df.index.values:
            if p not in vals:
                if self.df.loc[p, 'independent']:
                    raise ValueError("Independent parameter `{}` not specified".format(p))
                else:
                    vals[p] = self.df.loc[p, 'value']
        return vals

    def fixed_params(self):
        """Returns a dictionary of all fixed parameters and their values."""
        return self.df.loc[self.df.index[self.df.loc[:,'fixed']],'value'].to_dict()

    def eval(self, params=[], **vals):
        """Evaluates the function.

        Arguments:
        params: optional list of values of free parameters
        vals: dictionary of parameter values

        If a value for a parameter is specified in both `params` and `vals`,
        the value from `vals` is used.
        Values for independent parameters must be specified in `vals`."""
        return self.fun(**self.eval_params(params, **vals))

    def freeIdx(self):
        """Returns a list of names of free parameters"""
        return [p for p in self.df.index.values
                if not (self.df.loc[p, 'independent'] or self.df.loc[p, 'fixed'])]

    def bounds(self):
        """Returns a list of bound tuples of free parameters
        for use in scipy.optimize.minimize"""
        bnds = []
        for p in self.freeIdx():
            # Get parameter bounds
            min_val = self.df.loc[p, 'min']
            max_val = self.df.loc[p, 'max']

            # Replace missing values with default minimum and maximum values
            if np.isnan(min_val):
                min_val = None
            if np.isnan(max_val):
                max_val = None

            # Append to bounds list
            bnds.append((min_val, max_val))
        return bnds

    def initial(self):
        """Returns a numpy.ndarray of initial values for use in scipy.optimize.minimize"""
        return self.df.loc[self.freeIdx(), 'value'].values.copy()

    def index(self, p):
        """Returns the index of a given parameter in the parameter vector"""
        idx = np.flatnonzero(self.df.index.values == p)
        if len(idx) == 0:
            raise KeyError("Unknown parameter name: {}".format(p))
        return idx[0]

    def names(self, onlyFree=True):
        """Returns an array of the parameter names.

        If `onlyFree == True`, only free parameters are returned.
        Else, all parameters (including independent and fixed parameters) are returned."""
        if onlyFree:
            return np.array(self.freeIdx(), dtype=np.object_)
        else:
            return self.df.index.values.copy()

    def free_values(self, values):
        """Returns a dictionary of all free parameters and their values.

        Arguments:
            values -- an iterable with the values of the free parameters

        The entries in `values` must have the same order as the parameters
        returned by `FitParameters.names`.
        """
        return {p: v for p, v in zip(self.freeIdx(), values)}

    def copy(self):
        """Returns a deep copy of this instance"""
        return deepcopy(self)

In [None]:
# Separate models for scipy.optimize.minimize
red_p = FitParameters(red, independent='t')
red_p.set('tr', min=0, max=MAX_tr, value=tr_0)
red_p.set('m_ktl', min=MIN_m_ktl, max=MAX_m_ktl, value=m_ktl_0)
red_p.set('kmr', min=0, max=MAX_kmr, value=kmr_0)
red_p.set('betr', min=0.001, max=MAX_betr, value=betr_0)
red_p.set('deltr', min=0.001, max=MAX_deltr, value=deltr_0)
red_p.set('offr', value=offr_0)

green_p = FitParameters(green, independent='t')
green_p.set('tg', min=0, max=MAX_tg, value=tg_0)
green_p.set('m_ktl', min=MIN_m_ktl, max=MAX_m_ktl, value=m_ktl_0)
green_p.set('kmg', min=0, max=MAX_kmg, value=kmg_0)
green_p.set('betg', min=0.001, max=MAX_betg, value=betg_0)
green_p.set('deltg', min=0.001, max=MAX_deltg, value=deltg_0)
green_p.set('offg', value=offg_0)

In [None]:
# Combined model for scipy.optimize.minimize
combined_p = FitParameters(combined, independent = 't')
combined_p.set('tr', min=0, max=MAX_tr, value=tr_0)
combined_p.set('tg', min=0, max=MAX_tg, value=tg_0)
combined_p.set('m_ktl', min=MIN_m_ktl, max=MAX_m_ktl, value=m_ktl_0)
combined_p.set('kmr', min=0, max=MAX_kmr, value=kmr_0)
combined_p.set('kmg', min=0, max=MAX_kmg, value=kmg_0)
combined_p.set('betr', min=0, max=MAX_betr, value=betr_0)
combined_p.set('betg', min=0, max=MAX_betg, value=betg_0)
combined_p.set('deltr', min=0, max=MAX_deltr, value=deltr_0)
combined_p.set('deltg', min=0, max=MAX_deltg, value=deltg_0)
combined_p.set('offr', value=offr_0)
combined_p.set('offg', value=offg_0)

## Jacobian
To make fitting more efficient, the Jacobian (i.e. the gradient) of the objective function is passed to the optimization routine.
If the objective function is the usual negative log-likelihood function for normally distributed residuals
$$
 L(\theta) = \sum_{t\in T} \frac{1}{2\sigma_t^2} \big(D_t - f(t\mid\theta)\big)^2 \text{,}
$$
where $D_t$ is the measured data at time $t$ and $f(t\mid\theta)$ is the value of the model function at time $t$ with parameters $\theta$, the Jacobian is:
$$\begin{align}
\nabla L(\theta) &= \nabla \sum_{t\in T} \frac{1}{2\sigma_t^2} \big(D_t - f\left(t\,\middle|\,\theta\right)\big)^2 \\
&= \sum_{t\in T} \nabla \frac{1}{2\sigma_t^2} \big( D_t - f\left(t\,\middle|\,\theta\right) \big)^2 \\
&= \sum_{t\in T} \frac{2}{2\sigma_t^2} \big( D_t - f\left(t\,\middle|\,\theta\right) \big) \nabla\big( D_t - f\left(t\,\middle|\,\theta\right) \big) \\
&= \sum_{t\in T} \frac{1}{\sigma_t^2} \big( D_t - f\left(t\,\middle|\,\theta\right) \big)\big(\nabla D_t - \nabla  f\left(t\,\middle|\,\theta\right)\big) \\
&= -\sum_{t\in T} \frac{1}{\sigma_t^2} \big(D_t - f\left(t\,\middle|\,\theta\right)\big) \nabla f\left(t\,\middle|\,\theta\right) \\
\end{align}$$
We see that for calculating the Jacobian of the objective function we need the Jacobian of the model function.
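
The last line of the derivation translates directly into NumPy. The sketch below is an illustration under assumptions: it uses a hypothetical straight-line model (`line`, `line_jac`) instead of the expression model, and it compares the analytic gradient with central finite differences.

```python
import numpy as np

def neg_log_likelihood(theta, t, data, sigma, f):
    """Objective L(theta) for normally distributed residuals."""
    residual = data - f(t, theta)
    return np.sum(residual**2 / (2 * sigma**2))

def neg_log_likelihood_grad(theta, t, data, sigma, f, jac_f):
    """Gradient of L(theta): -sum_t (D_t - f) / sigma_t^2 * grad f."""
    residual = data - f(t, theta)
    # jac_f returns shape (len(t), len(theta)); the product sums over time
    return -(residual / sigma**2) @ jac_f(t, theta)

# Hypothetical model for illustration: f(t | a, b) = a + b * t
def line(t, theta):
    return theta[0] + theta[1] * t

def line_jac(t, theta):
    return np.column_stack((np.ones_like(t), t))

t = np.linspace(0, 5, 20)
data = 1.0 + 2.0 * t
sigma = np.full_like(t, 0.5)
theta = np.array([0.5, 1.5])

analytic = neg_log_likelihood_grad(theta, t, data, sigma, line, line_jac)

# Central finite differences along each parameter axis
eps = 1e-6
numeric = np.array([
    (neg_log_likelihood(theta + eps * e, t, data, sigma, line)
     - neg_log_likelihood(theta - eps * e, t, data, sigma, line)) / (2 * eps)
    for e in np.eye(theta.size)
])
```

If `analytic` and `numeric` agree, the sign and the weighting by $1/\sigma_t^2$ have been implemented consistently with the derivation.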

We use the general expression model function
$$
f\left(t \,\middle|\, t_0, m, k, \beta, \delta, a\right) = a + m \left(\frac{k \mathrm{e}^{- \delta \left(t - t_{0}\right)}}{\left(\beta - \delta\right) \left(k + \beta - \delta\right)} + \frac{\mathrm{e}^{\left(- k - \beta\right) \left(t - t_{0}\right)}}{k + \beta - \delta} - \frac{\mathrm{e}^{- \beta \left(t - t_{0}\right)}}{\beta - \delta}\right) \text{,}
$$
where $t_0$ is the mRNA expression onset time, $m$ is the product of initial mRNA amount and translation rate, $k$ is the maturation rate, $\beta$ is the protein degradation rate, $\delta$ is the mRNA degradation rate, and $a$ is a vertical offset.

The Jacobian $\nabla f\left(t \,\middle|\, t_0, m, k, \beta, \delta, a\right)$ of the general expression model function is the vector of the derivatives with respect to the various parameters:
$$\begin{align*}
\frac{\partial f}{\partial t_0} &= m \left(\frac{k \delta \mathrm{e}^{- \delta \left(t - t_{0}\right)}}{\left(\beta - \delta\right) \left(k + \beta - \delta\right)} - \frac{\beta \mathrm{e}^{- \beta \left(t - t_{0}\right)}}{\beta - \delta} + \frac{\left(k + \beta\right) \mathrm{e}^{\left(- k - \beta\right) \left(t - t_{0}\right)}}{k + \beta - \delta}\right)\\
\frac{\partial f}{\partial m} &= \frac{k \mathrm{e}^{- \delta \left(t - t_{0}\right)}}{\left(\beta - \delta\right) \left(k + \beta - \delta\right)} + \frac{\mathrm{e}^{\left(- k - \beta\right) \left(t - t_{0}\right)}}{k + \beta - \delta} - \frac{\mathrm{e}^{- \beta \left(t - t_{0}\right)}}{\beta - \delta}\\
\frac{\partial f}{\partial k} &= m \left(- \frac{k \mathrm{e}^{- \delta \left(t - t_{0}\right)}}{\left(\beta - \delta\right) \left(k + \beta - \delta\right)^{2}} + \frac{\mathrm{e}^{\left(- k - \beta\right) \left(t - t_{0}\right)}}{k + \beta - \delta} \left(- t + t_{0}\right) - \frac{\mathrm{e}^{\left(- k - \beta\right) \left(t - t_{0}\right)}}{\left(k + \beta - \delta\right)^{2}} + \frac{\mathrm{e}^{- \delta \left(t - t_{0}\right)}}{\left(\beta - \delta\right) \left(k + \beta - \delta\right)}\right)\\
\frac{\partial f}{\partial \beta} &= m \left(- \frac{k \mathrm{e}^{- \delta \left(t - t_{0}\right)}}{\left(\beta - \delta\right) \left(k + \beta - \delta\right)^{2}} - \frac{k \mathrm{e}^{- \delta \left(t - t_{0}\right)}}{\left(\beta - \delta\right)^{2} \left(k + \beta - \delta\right)} + \frac{\mathrm{e}^{\left(- k - \beta\right) \left(t - t_{0}\right)}}{k + \beta - \delta} \left(- t + t_{0}\right) - \frac{\mathrm{e}^{- \beta \left(t - t_{0}\right)}}{\beta - \delta} \left(- t + t_{0}\right) - \frac{\mathrm{e}^{\left(- k - \beta\right) \left(t - t_{0}\right)}}{\left(k + \beta - \delta\right)^{2}} + \frac{\mathrm{e}^{- \beta \left(t - t_{0}\right)}}{\left(\beta - \delta\right)^{2}}\right)\\
\frac{\partial f}{\partial \delta} &= m \left(\frac{k \left(- t + t_{0}\right) \mathrm{e}^{- \delta \left(t - t_{0}\right)}}{\left(\beta - \delta\right) \left(k + \beta - \delta\right)} + \frac{k \mathrm{e}^{- \delta \left(t - t_{0}\right)}}{\left(\beta - \delta\right) \left(k + \beta - \delta\right)^{2}} + \frac{k \mathrm{e}^{- \delta \left(t - t_{0}\right)}}{\left(\beta - \delta\right)^{2} \left(k + \beta - \delta\right)} + \frac{\mathrm{e}^{\left(- k - \beta\right) \left(t - t_{0}\right)}}{\left(k + \beta - \delta\right)^{2}} - \frac{\mathrm{e}^{- \beta \left(t - t_{0}\right)}}{\left(\beta - \delta\right)^{2}}\right)\\
\frac{\partial f}{\partial a} &= 1\\
\end{align*}$$
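
Hand-derived derivatives like these are easy to get wrong, so it can help to compare them with finite differences. The following sketch does this for $\partial f/\partial k$ and $\partial f/\partial\delta$; the function names, the parameter values, and the step size `h` are illustrative assumptions, and only times after $t_0$ are considered.

```python
import numpy as np

def model_f(t, t0, m, k, beta, delta, a):
    """General expression model function for t > t0."""
    dt = t - t0
    bmd = beta - delta
    kbd = k + bmd
    return a + m * (k * np.exp(-delta * dt) / (bmd * kbd)
                    + np.exp(-(k + beta) * dt) / kbd
                    - np.exp(-beta * dt) / bmd)

def df_dk(t, t0, m, k, beta, delta, a):
    """Analytic derivative with respect to the maturation rate k."""
    dt = t - t0
    bmd = beta - delta
    kbd = k + bmd
    return m * (-k * np.exp(-delta * dt) / (bmd * kbd**2)
                - dt * np.exp(-(k + beta) * dt) / kbd
                - np.exp(-(k + beta) * dt) / kbd**2
                + np.exp(-delta * dt) / (bmd * kbd))

def df_ddelta(t, t0, m, k, beta, delta, a):
    """Analytic derivative with respect to the mRNA degradation rate delta."""
    dt = t - t0
    bmd = beta - delta
    kbd = k + bmd
    return m * (-k * dt * np.exp(-delta * dt) / (bmd * kbd)
                + k * np.exp(-delta * dt) / (bmd * kbd**2)
                + k * np.exp(-delta * dt) / (bmd**2 * kbd)
                + np.exp(-(k + beta) * dt) / kbd**2
                - np.exp(-beta * dt) / bmd**2)

# Illustrative (assumed) parameter values; all time points lie after t0
params = dict(t0=2.0, m=2e4, k=0.1, beta=0.3, delta=0.03, a=0.0)
t = np.linspace(3, 30, 28)

def central_difference(name, h=1e-6):
    """Numerical derivative of model_f with respect to parameter `name`."""
    upper = dict(params, **{name: params[name] + h})
    lower = dict(params, **{name: params[name] - h})
    return (model_f(t, **upper) - model_f(t, **lower)) / (2 * h)

analytic_k = df_dk(t, **params)
numeric_k = central_difference('k')
analytic_delta = df_ddelta(t, **params)
numeric_delta = central_difference('delta')
```

The same pattern can be applied to the remaining parameters by adding the corresponding analytic derivative.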

In [None]:
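def general_jacobian(t, t0, m, k, beta, delta, a):
    """Returns the Jacobi matrix of the general expression model function
    with time along axis=0 and parameters along axis=1.

    Only the derivatives w.r.t. t0, m, delta and a are returned
    (columns 0 to 3); the derivatives w.r.t. k and beta are commented out."""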

    # Initialize Jacobian
    #jac = np.zeros((np.size(t), 6))
    jac = np.zeros((np.size(t), 4))

    # Get time after onset “kink”
    after_t0 = (t > t0)

    # Define abbreviations for frequent terms
    dt = t[after_t0] - t0
    bmd = beta - delta
    kbd = k + bmd
    kpb = k + beta

    # Derive w.r.t. t0
    jac[after_t0, 0] = m * (k * delta * np.exp(-delta * dt) / bmd / kbd 
                          - beta * np.exp(-beta * dt) / bmd
                          + kpb * np.exp(-kpb * dt) / kbd)

    # Derive w.r.t. m
    jac[after_t0, 1] = (k * np.exp(-delta * dt) / bmd / kbd
                      + np.exp(-kpb * dt) / kbd
                      - np.exp(-beta * dt) / bmd)

    # Derive w.r.t. k
    #jac[after_t0, 2] = m * (-k * np.exp(-delta * dt) / bmd / kbd**2
    #                       - dt * np.exp(-kpb * dt) / kbd
    #                       - np.exp(-kpb * dt) / kbd**2
    #                       + np.exp(-delta * dt) / bmd / kbd)

    # Derive w.r.t. beta
    #jac[after_t0, 3] = m * (-k * np.exp(-delta * dt) / bmd / kbd**2
    #                       - k * np.exp(-delta * dt) / bmd**2 / kbd
    #                       - dt * np.exp(-kpb * dt) / kbd
    #                       + dt * np.exp(-beta * dt) / bmd
    #                       - np.exp(-kpb * dt) / kbd**2
    #                       + np.exp(-beta * dt) / bmd**2)

    # Derive w.r.t. delta
    jac[after_t0, 2] = m * (-k * dt * np.exp(-delta * dt) / bmd / kbd
                           + k * np.exp(-delta * dt) / bmd / kbd**2
                           + k * np.exp(-delta * dt) / bmd**2 / kbd
                           + np.exp(-kpb * dt) / kbd**2
                           - np.exp(-beta * dt) / bmd**2)

    # Derive w.r.t. a
    jac[:, 3] = 1

    return jac

In [None]:
def red_jacobian(t, tr, m_ktl, kmr, betr, deltr, offr):
    """Wrapper function for Jacobian of red model function"""
    return general_jacobian(t=t, t0=tr, m=m_ktl, k=kmr, beta=betr, delta=deltr, a=offr)

def green_jacobian(t, tg, m_ktl, kmg, betg, deltg, offg):
    """Wrapper function for Jacobian of green model function"""
    return general_jacobian(t=t, t0=tg, m=m_ktl, k=kmg, beta=betg, delta=deltg, a=offg)

## Read in data and prepare result list

In [None]:
# Calculate kernel density estimation of parameter distributions
def parameter_KDE(par_tab):
    """Returns a kernel density estimation of the parameter values for plotting"""
    dens_res = 200
    bw_div = 15

    par_dist = {}

    for par_name in par_tab.columns:
        # Get parameter values
        par_vals = par_tab.loc[:,par_name].values
        par_vals = par_vals.reshape((par_vals.size, 1))

        # Test parameter values for validity
        if np.any(np.logical_not(np.isfinite(par_vals))):
            print("Warning: invalid values encountered for “{}”".format(par_name))
            par_vals = par_vals[np.isfinite(par_vals)]
            if par_vals.size > 0:
                # Calculate distribution of valid entries
                par_vals = par_vals.reshape((par_vals.size, 1))
            else:
                # No valid entries found; cancel distribution calculation
                par_dist[par_name] = {'val': [], 'prob': []}
                continue

        # Get parameter extrema and bandwidth
        par_min = np.min(par_vals)
        par_max = np.max(par_vals)
        bw = (par_max - par_min) / bw_div

        # Get kernel density estimation of parameter values
        kde = KernelDensity(kernel='epanechnikov', bandwidth=bw).fit(par_vals)
        par_x = np.linspace(par_min, par_max, dens_res).reshape((dens_res, 1))
        par_dens = np.exp(kde.score_samples(par_x))

        # Adjust values for nicer plotting (KDE >= 0, edges == 0)
        #par_dens[par_dens < 0] = 0
        if par_dens[0] != 0:
            par_dens = np.insert(par_dens, 0, 0)
            par_x = np.insert(par_x, 0, par_min)
        if par_dens[-1] != 0:
            par_dens = np.append(par_dens, 0)
            par_x = np.append(par_x, par_max)

        # Insert KDE into dict
        par_dist[par_name] = {'val': par_x.flatten(), 'prob': par_dens.flatten()}
    return par_dist

In [None]:
def plot_kde(ax, dist, label, clr_face='b', clr_edge='k', mark=None):
    """Plots the current parameter value in relation to the distribution
    in the whole dataset."""
    ax.fill_betweenx(dist['val'], dist['prob'], color=clr_face)
    if mark is not None:
        ax.axhline(y=mark, color=clr_edge)
    ax.set_xticks([])
    ax.spines['left'].set_position('zero')
    for s in [ax.spines[pos] for pos in ['bottom', 'right', 'top']]:
        s.set_visible(False)
    ax.set_title(label)

In [None]:
def getDataLabel(d, filename=False):
    """Returns a nicely formatted name for the data info dict `d`.
    `d` must have the keys "measurement", "sample" and "condition",
    such as the elements of `D`.
    Set `filename=True` for a filename-friendly output."""
    if filename:
        return "{0[measurement]}_{0[sample]}_{0[condition]}".format(d)
    return "{0[sample]}: {0[condition]} [{0[measurement]}]".format(d)

In [None]:
# Prepare data loading

# Define available files
datafiles = [
    {
    #    "sample": "A549",
    #    "condition": "control",
    #    "measurement": "Test",
    #    "file": "data/A549_control_test.xlsx"
    #}, {
        "sample": "A549",
        "condition": "siRNA",
        "measurement": "2016-01-09_seq3",
        "file": "data/2016-01-09_seq3_A549_siRNA_#molecules.xlsx"
    }, {
        "sample": "A549",
        "condition": "control",
        "measurement": "2016-01-09_seq5",
        "file": "data/2016-01-09_seq5_A549_Control_#molecules.xlsx"
    }, {
        "sample": "A549",
        "condition": "siRNA",
        "measurement": "2016-12-20_seq3",
        "file": "data/2016-12-20_seq3_A549_siRNA_#molecules.xlsx"
    }, {
        "sample": "A549",
        "condition": "control",
        "measurement": "2016-12-20_seq4",
        "file": "data/2016-12-20_seq4_A549_control_#molecules.xlsx"
    }, {
        "sample": "Huh7",
        "condition": "siRNA",
        "measurement": "2017-05-26_seq10",
        "file": "data/2017-05-26_seq10_Huh7_siRNA_#molecules.xlsx"
    }, {
        "sample": "Huh7",
        "condition": "siRNA",
        "measurement": "2017-05-26_seq11",
        "file": "data/2017-05-26_seq11_Huh7_siRNA_#molecules.xlsx"
    }, {
        "sample": "Huh7",
        "condition": "siRNA",
        "measurement": "2017-06-02_seq4",
        "file": "data/2017-06-02_seq4_Huh7_siRNA_#molecules.xlsx"
    }, {
        "sample": "Huh7",
        "condition": "siRNA",
        "measurement": "2017-06-02_seq5",
        "file": "data/2017-06-02_seq5_Huh7_siRNA_#molecules.xlsx"
    }, {
        "sample": "Huh7",
        "condition": "control",
        "measurement": "2017-05-26_seq6",
        "file": "data/2017-05-26_seq6_Huh7_control_#molecules.xlsx"
    }, {
        "sample": "Huh7",
        "condition": "control",
        "measurement": "2017-05-26_seq7",
        "file": "data/2017-05-26_seq7_Huh7_control_#molecules.xlsx"
    }, {
        "sample": "Huh7",
        "condition": "control",
        "measurement": "2017-06-02_seq6",
        "file": "data/2017-06-02_seq6_Huh7_control_#molecules.xlsx"
    }, {
        "sample": "Huh7",
        "condition": "control",
        "measurement": "2017-06-02_seq7",
        "file": "data/2017-06-02_seq7_Huh7_control_#molecules.xlsx"
    }
]

# By default, mark all files for loading
load_idcs = range(len(datafiles))

# Define function for loading data
def load_data_from_files():
    """Loads data from specified files into `D`.
    Requires `load_idcs` to hold a list of indices to `datafiles`."""
    global D
    D = []
    for i in load_idcs:
        # Show message
        print("Loading file: {}".format(datafiles[i]["file"]))

        # Read sheets from excel file
        X = pd.read_excel(datafiles[i]['file'], dtype=np.float64, sheet_name=[
            '#RFP', '#GFP_corrected', '#RFP_error', '#GFP_error'])

        # Write data into easy-to-access structure
        d = {}
        d['sample'] = datafiles[i]['sample']
        d['condition'] = datafiles[i]['condition']
        d['measurement'] = datafiles[i]['measurement']
        d['file'] = datafiles[i]['file']
        d['t'] = X['#RFP'].values[:,0].flatten()
        #d['rfp'] = X['RFP'].values[:,1:]
        #d['gfp'] = X['GFP_corrected'].values[:,1:]
        d['rfp'] = X['#RFP'].values[:,1:]
        d['gfp'] = X['#GFP_corrected'].values[:,1:]
        d['rfp_error'] = X['#RFP_error'].values[:,1:]
        d['gfp_error'] = X['#GFP_error'].values[:,1:]
        d['nTraces'] = np.shape(d['rfp'])[1]
        d['iFile'] = i
        D.append(d)

In [None]:
# Read in data from excel sheets

# Prompt user for files to load
lbl = wdg.Label('Select the files to load:')
lbl.layout.width = 'initial'
entries = []
for f in datafiles:
    entries.append("{} {}: {}".format(
        f['sample'], f['condition'], f['file']))
sel_entry = wdg.SelectMultiple(options=entries, rows=len(entries))
sel_entry.layout.width = 'initial'
bload = wdg.Button(description='Load')
bselall = wdg.Button(description='Select all')
bselnone = wdg.Button(description='Select none')

# Define callbacks
def sel_all_files(_):
    sel_entry.value = entries
def sel_no_files(_):
    sel_entry.value = ()
def load_button_clicked(_):
    global load_idcs
    load_idcs = [entries.index(r) for r in sel_entry.value]
    vb.close()
    load_data_from_files()
bselall.on_click(sel_all_files)
bselnone.on_click(sel_no_files)
bload.on_click(load_button_clicked)

# Finally, show the widgets
vb = wdg.VBox((lbl, sel_entry, wdg.HBox((bload,bselall,bselnone))))
IPython.display.display(vb)

In [None]:
# Insert here the timestamp you want to enforce for the parameters.

# `require_timestamp` is the enforced timestamp as string or
# `None` for loading the latest file.
# Example:
# to enforce timestamp "2012-03-04–123456",
# execute: require_timestamp = "2012-03-04–123456"
require_timestamp = None

In [None]:
# Assess data files to load
outpath = getOutpath()
file_re = re.compile(r"(?P<date>\d{4}-\d{2}-\d{2}–\d{6})_fixed_distribution_moments\.xlsx$")

# Search file for fixed parameters
# After this cell, `fixparfile` will be a string of the path to the file
# holding the fixed parameter values, or `None` if no file was found.
fixparfile = None
ts = None
files_found = {}

# Iteratively search parameter files and add them to dictionary
for f in os.listdir(outpath):
    m = file_re.match(f)
    if not m:
        continue
    files_found[m.group('date')] = os.path.join(outpath, m.string)

# Select newest (or preferred) timestamp or set `fixparfile` to None,
# if no file was found
if len(files_found) > 0:
    if require_timestamp:
        if require_timestamp in files_found.keys():
            # Make `fixparfile` point to the requested file
            fixparfile = files_found[require_timestamp]
            ts = require_timestamp
        else:
            print("Requested timestamp “{}” not found.".format(require_timestamp))
    else:
        ts = sorted(files_found.keys(), reverse=True)[0]
        fixparfile = files_found[ts]

if fixparfile is not None:
    print("Found parameter data with timestamp “{}”:\n{}".format(ts, fixparfile))
else:
    print("No parameter data found.")

# DEBUG
#for d, c in datafiles.items():
#    print(d + ": " + str(c))

In [None]:
# Load fixed parameters
F = pd.read_excel(fixparfile, dtype=np.float64, sheet_name=['red', 'green'])

red_p.set('kmr', fixed=True, value=F['red'].loc['kmr', 'mean'])
red_p.set('betr', fixed=True, value=F['red'].loc['betr', 'mean'])

green_p.set('kmg', fixed=True, value=F['green'].loc['kmg', 'mean'])
green_p.set('betg', fixed=True, value=F['green'].loc['betg', 'mean'])

combined_p.set('kmr', fixed=True, value=F['red'].loc['kmr', 'mean'])
combined_p.set('betr', fixed=True, value=F['red'].loc['betr', 'mean'])
combined_p.set('kmg', fixed=True, value=F['green'].loc['kmg', 'mean'])
combined_p.set('betg', fixed=True, value=F['green'].loc['betg', 'mean'])

In [None]:
# Provide output tables

# Initialize result list
R = []

# Get a list of fit parameters
par_names = green_p.names().tolist()
par_names.extend(p for p in red_p.names() if p not in par_names)
par_names.sort()

# Iteratively populate the result list
for k in range(len(D)):
    R.insert(k, {})
    nTraces = np.shape(D[k]['gfp'])[1]
    nTimes = np.shape(D[k]['gfp'])[0]
    tpl_traces = np.full((nTimes, nTraces), np.nan)

    R[k]['green'] = {}
    R[k]['green']['params'] = pd.DataFrame(index=np.arange(nTraces), columns=green_p.names(), dtype='float64')
    R[k]['green']['fit'] = np.copy(tpl_traces)
    R[k]['green']['success'] = np.zeros(nTraces, dtype=np.bool_)
    R[k]['green']['chisq'] = np.full(nTraces, np.nan)

    R[k]['red'] = {}
    R[k]['red']['params'] = pd.DataFrame(index=np.arange(nTraces), columns=red_p.names(), dtype='float64')
    R[k]['red']['fit'] = np.copy(tpl_traces)
    R[k]['red']['success'] = np.zeros(nTraces, dtype=np.bool_)
    R[k]['red']['chisq'] = np.full(nTraces, np.nan)

    R[k]['combined'] = {}
    R[k]['combined']['params'] = pd.DataFrame(index=np.arange(nTraces), columns=combined_p.names(), dtype='float64')
    #R[k]['combined']['fit'] = {}
    R[k]['combined']['success'] = np.zeros(nTraces, dtype=np.bool_)
    R[k]['combined']['chisq'] = np.full((nTraces, 2), np.nan)

### Pickle or load fitting results
Pickling is only reasonable if the result list `R` has already been populated by fitting (see below).

In [None]:
# Pickle fit results for future sessions
outfile = getOutpath('fit_results.pickled')
with open(outfile, 'wb') as f:
    pickle.dump(R, f)

In [None]:
# Load pickled results (requires file suffix “.pickled”)
pickfiles = [f for f in os.listdir(getOutpath()) if f.lower().endswith('.pickled')]
pickfiles.sort(reverse=True)

lbl = wdg.Label('Select the file to load:')
lbl.layout.width = 'initial'
rad = wdg.RadioButtons(options=pickfiles)
but = wdg.Button(description='Load')
vb = wdg.VBox([lbl, rad, but])
IPython.display.display(vb)

def clicked_on_but(b):
    global R
    with open(getOutpath(rad.value, ''), 'rb') as f:
        R = pickle.load(f)
    print('Loaded: ' + rad.value)
    vb.close()
but.on_click(clicked_on_but)

In [None]:
# Write results to XLSX
if len(R) != len(D):
    raise ValueError("R and D must have the same length!")

for k in range(len(D)):
    # Collect information about this file
    measurement = D[k]['measurement']
    sample = D[k]['sample']
    condition = D[k]['condition']
    time = D[k]['t']
    rfp_raw = D[k]['rfp']
    gfp_raw = D[k]['gfp']
    rfp_error = D[k]['rfp_error']
    gfp_error = D[k]['gfp_error']

    rfp_fit = R[k]['red']['fit']
    gfp_fit = R[k]['green']['fit']
    rfp_params = R[k]['red']['params']
    gfp_params = R[k]['green']['params']
    rfp_chsq = R[k]['red']['chisq']
    gfp_chsq = R[k]['green']['chisq']

    try:
        cmb_fit_rfp = R[k]['combined']['fit']['red']
        cmb_fit_gfp = R[k]['combined']['fit']['green']
        cmb_params = R[k]['combined']['params']
        cmb_chsq = R[k]['combined']['chisq']
        hasCombined = True
    except KeyError:
        # The combined fit has not been run for this dataset
        hasCombined = False

    # Write data to file
    file = getOutpath("{}_{}_{}.xlsx".format(sample, measurement, condition))
    xlsx_writer = pd.ExcelWriter(file, engine='xlsxwriter')

    pd.DataFrame(time).to_excel(xlsx_writer, sheet_name="t")
    pd.DataFrame(rfp_raw).to_excel(xlsx_writer, sheet_name="RFP_raw")
    pd.DataFrame(gfp_raw).to_excel(xlsx_writer, sheet_name="GFP_raw")
    pd.DataFrame(rfp_error).to_excel(xlsx_writer, sheet_name="RFP_error")
    pd.DataFrame(gfp_error).to_excel(xlsx_writer, sheet_name="GFP_error")

    pd.DataFrame(rfp_fit).to_excel(xlsx_writer, sheet_name="RFP_fit")
    pd.DataFrame(gfp_fit).to_excel(xlsx_writer, sheet_name="GFP_fit")
    pd.DataFrame(rfp_chsq).to_excel(xlsx_writer, sheet_name="RFP_chisq")
    pd.DataFrame(gfp_chsq).to_excel(xlsx_writer, sheet_name="GFP_chisq")
    rfp_params.to_excel(xlsx_writer, sheet_name="RFP_params")
    gfp_params.to_excel(xlsx_writer, sheet_name="GFP_params")

    if hasCombined:
        pd.DataFrame(cmb_fit_rfp).to_excel(xlsx_writer, sheet_name="RFP_fit_cmb")
        pd.DataFrame(cmb_fit_gfp).to_excel(xlsx_writer, sheet_name="GFP_fit_cmb")
        cmb_params.to_excel(xlsx_writer, sheet_name="params_cmb")
        pd.DataFrame(cmb_chsq).to_excel(xlsx_writer, sheet_name="chisq_cmb")
    
    # `ExcelWriter.save()` was removed in pandas 2.0; `close()` writes the file
    xlsx_writer.close()

## Fit and plot separate models

In [None]:
def plotSeparate(ds, tr, pdf=None, par_kde=None):
    """Plots the results of the separate RFP and GFP fits for one trace.

    Arguments:
    ds -- index of the dataset in `D` and `R`
    tr -- index of the trace in the dataset to be plotted
    pdf -- a PdfPages object to which the figure is written if it is not None
    par_kde -- dict of parameter distributions (keys 'red' and 'green');
               if not None, the distributions are plotted next to the fit
    """

    # Plot fit results
    fig = plt.figure()

    if par_kde is not None:
        fig.set_figwidth(1.6 * fig.get_figwidth())

        pn_red = [p for p in ['m_ktl', 'tr', 'kmr', 'betr', 'deltr', 'offr']
                 if not red_p.df.loc[p,'fixed']]
        pn_green = [p for p in ['m_ktl', 'tg', 'kmg', 'betg', 'deltg', 'offg']
                 if not green_p.df.loc[p,'fixed']]

        grid = (2, max(len(pn_red), len(pn_green)))
        gs = GridSpec(grid[0], grid[1])

        # Plot green parameters
        for pi, label in enumerate(pn_green):
            ax = plt.subplot(gs.new_subplotspec((0, pi)))
            data = par_kde['green'][label]
            clr_face = '#00ff0055'
            clr_edge = '#009900ff'
            curr_val = R[ds]['green']['params'].loc[tr,label]
            plot_kde(ax, data, label, clr_face, clr_edge, curr_val)

        # Plot red parameters
        for pi, label in enumerate(pn_red):
            ax = plt.subplot(gs.new_subplotspec((1, pi)))
            data = par_kde['red'][label]
            clr_face = '#ff000055'
            clr_edge = '#990000ff'
            curr_val = R[ds]['red']['params'].loc[tr,label]
            plot_kde(ax, data, label, clr_face, clr_edge, curr_val)

        # Adjust subplot layout
        gs.tight_layout(fig, pad=0, rect=(0.5, 0, 1, 1))

        # Create axes for fit
        gs_fit = GridSpec(1, 1)
        ax = fig.add_subplot(gs_fit[0])
        gs_fit.tight_layout(fig, pad=0, rect=(0, 0, 0.5, 1))

    else:
        ax = fig.gca()

    p_tr = ax.axvline(R[ds]['red']['params']['tr'][tr], label='RFP onset',
                       color='#ff0000', linewidth=.5, linestyle='--')
    p_tg = ax.axvline(R[ds]['green']['params']['tg'][tr], label='GFP onset',
                      color='#00ff00', linewidth=.5, linestyle='--')
    p_fr, = ax.plot(D[ds]['t'], R[ds]['red']['fit'][:,tr], '-', label='RFP (fit)', color='#ff0000', linewidth=1)
    p_fg, = ax.plot(D[ds]['t'], R[ds]['green']['fit'][:,tr], '-', label='GFP (fit)', color='#00ff00', linewidth=1)
    p_dr, = ax.plot(D[ds]['t'], D[ds]['rfp'][:,tr], '-', label='RFP (measured)', color='#990000', linewidth=.5)
    p_dg, = ax.plot(D[ds]['t'], D[ds]['gfp'][:,tr], '-', label='GFP (measured)', color='#009900', linewidth=.5)

    # Format plot
    ax.set_xlabel('Time [h]')
    ax.set_ylabel('Number of molecules [10³]')
    ax.set_title('{} #{:03d}\n(separate fit)'.format(getDataLabel(D[ds]), tr))
    ax.legend(handles=[p_dg, p_fg, p_tg, p_dr, p_fr, p_tr])

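    # Write figure to pdf
    if pdf is not None:
        pdf.savefig(fig, bbox_inches='tight')

    # Show at most 200 figures inline and always close the figure.
    # `i_plot` is the module-level counter initialized in the plotting cell;
    # without the `global` declaration, `i_plot += 1` raises UnboundLocalError.
    global i_plot
    if i_plot < 200:
        i_plot += 1
        plt.show(fig)
    plt.close(fig)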

In [None]:
# Fit traces separately
for ds in range(len(D)):
    nTraces = np.shape(D[ds]['rfp'])[1]

    for tr in range(nTraces):
        print('Fitting "{}" #{:03d}/{:03d} …'.format(getDataLabel(D[ds]), tr, nTraces))
        
        # Prepare data
        time = D[ds]['t']
        data_red = D[ds]['rfp'][:,tr].flatten()
        data_green = D[ds]['gfp'][:,tr].flatten()

        wght_red = D[ds]['rfp_error'][:,tr]**2
        wght_green = D[ds]['gfp_error'][:,tr]**2
        
        # Adjust parameter properties for onset time and offset
        red_p.set('tr', max=time[data_red.argmax()])
        red_p.set('offr',
                       min=data_red[:10].min(),
                       max=data_red[:10].max(),
                       value=np.median(data_red[:10]))
        green_p.set('tg', max=time[data_green.argmax()])
        green_p.set('offg',
                       min=data_green[:10].min(),
                       max=data_green[:10].max(),
                       value=np.median(data_green[:10]))

        # Assess fixed parameters
        red_fixed = red_p.df.loc[red_p.df.index[red_p.df.loc[:,'fixed']],'value'].to_dict()
        green_fixed = green_p.df.loc[green_p.df.index[green_p.df.loc[:,'fixed']],'value'].to_dict()

        # Objective function (closure)
        def objective_red(params):
            """Objective function for RFP in separate model"""
            cur_val = red_p.eval(params, t=time)
            lik = np.sum(.5 * (data_red - cur_val)**2 / wght_red)
            return lik

        # Jacobian/gradient (closure)
        def gradient_red(params):
            """Gradient for RFP in separate model"""
            J = red_jacobian(**red_p.eval_params(params, t=time))
            residuals = (data_red - red_p.eval(params, t=time)).reshape((np.size(time),1))
            vrnc = wght_red.reshape(np.shape(residuals))
            return -np.sum(J * residuals / vrnc, axis=0).flatten()

        # Fit the data
        result = sc.optimize.minimize(objective_red,
                                      red_p.initial(),
                                      method='TNC',# one of: 'SLSQP' 'TNC' 'L-BFGS-B'
                                      bounds=red_p.bounds(),
                                      jac=gradient_red,
                                      options={'disp':True,
                                               'maxiter': 10000}
                                     )

        # Print result
        print("\tRed success {}: {}".format(result.success, result.message))

        # Save results to R
        res_red = red_p.free_values(result.x)
        R[ds]['red']['params'].iloc[tr] = res_red#result.x
        best_fit = red(time, **res_red, **red_fixed)
        R[ds]['red']['fit'][:,tr] = best_fit
        R[ds]['red']['success'][tr] = result.success
        R[ds]['red']['chisq'][tr] = np.sum((best_fit - data_red)**2)

        # Fit green data
        def objective_green(params):
            """Objective function for green model"""
            cur_val = green_p.eval(params, t=time)
            lik = np.sum(.5 * (data_green - cur_val)**2 / wght_green)
            return lik

        def gradient_green(params):
            """Gradient for GFP in separate model"""
            J = green_jacobian(**green_p.eval_params(params, t=time))
            residuals = (data_green - green_p.eval(params, t=time)).reshape((np.size(time),1))
            vrnc = wght_green.reshape(np.shape(residuals))
            return -np.sum(J * residuals / vrnc, axis=0).flatten()

        result = sc.optimize.minimize(objective_green,
                                      green_p.initial(),
                                      method='TNC',
                                      bounds=green_p.bounds(),
                                      jac=gradient_green,
                                      options={'disp': True,
                                               'maxiter': 10000})
        print("\tGreen success {}: {}".format(result.success, result.message))

        res_green = green_p.free_values(result.x)
        R[ds]['green']['params'].iloc[tr] = res_green
        best_fit = green(time, **res_green, **green_fixed)
        R[ds]['green']['fit'][:,tr] = best_fit
        R[ds]['green']['success'][tr] = result.success
        R[ds]['green']['chisq'][tr] = np.sum((best_fit - data_green)**2)

        # DEBUG
        #if tr >= 2:
        #    print("Breaking loop for debugging purposes")
        #    break
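
The separate fits above minimize a weighted sum of squared residuals and pass an analytical gradient to the optimizer. A wrong gradient usually shows up only as poor convergence, so it can help to compare it against finite differences. The sketch below does this for a toy exponential decay; `model`, `jacobian` and the test values are hypothetical stand-ins, not the notebook's `red`/`red_jacobian`.

```python
import numpy as np

def model(params, t):
    """Toy model (assumption): single exponential decay a * exp(-k * t)."""
    a, k = params
    return a * np.exp(-k * t)

def jacobian(params, t):
    """Derivatives of the model w.r.t. (a, k), shape (len(t), 2)."""
    a, k = params
    e = np.exp(-k * t)
    return np.stack([e, -a * t * e], axis=1)

def objective(params, t, data, var):
    """Half the variance-weighted sum of squared residuals."""
    return np.sum(0.5 * (data - model(params, t))**2 / var)

def gradient(params, t, data, var):
    """Analytical gradient of `objective`, same form as in the fit cell."""
    residuals = (data - model(params, t)) / var
    return -np.sum(jacobian(params, t) * residuals[:, np.newaxis], axis=0)

def numerical_gradient(params, t, data, var, h=1e-6):
    """Central finite differences of `objective`."""
    params = np.asarray(params, dtype=float)
    grad = np.empty_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = h
        grad[i] = (objective(params + step, t, data, var)
                   - objective(params - step, t, data, var)) / (2 * h)
    return grad

# Synthetic data with known noise level
rng = np.random.default_rng(0)
t = np.linspace(0, 5, 50)
var = np.full(t.shape, 0.05**2)
data = model((2.0, 0.7), t) + rng.normal(0, 0.05, t.size)

# Compare both gradients away from the optimum
p_test = np.array([1.5, 0.5])
g_analytic = gradient(p_test, t, data, var)
g_numeric = numerical_gradient(p_test, t, data, var)
gradients_agree = np.allclose(g_analytic, g_numeric, rtol=1e-4)
print(gradients_agree)
```

The same comparison could be applied to `gradient_red` and `gradient_green` at a few parameter vectors inside the bounds.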

In [None]:
# Plot results of separate fit
ts = getTimeStamp()
i_plot = 0
for ds in range(len(D)):
    par_kde = {}
    for t in ('red', 'green'):
        par_kde[t] = parameter_KDE(R[ds][t]['params'])
    pdffile = os.path.join(getOutpath(), '{}_separate_{}.pdf'.format(ts, getDataLabel(D[ds], True)))
    with PdfPages(pdffile) as pdf:
        for tr in range(np.shape(D[ds]['rfp'])[1]):
            plotSeparate(ds, tr, pdf, par_kde)

            # DEBUG
            #if tr >= 2:
            #    print("Break loop")
            #    break

## Fit and plot combined model

In [None]:
def plotCombined(ds, tr, pdf=None, par_kde=None):
    """Plots the results of the combined RFP/GFP fit for one trace.

    Arguments:
    ds -- index of the dataset in `D` and `R`
    tr -- index of the trace in the dataset to be plotted
    pdf -- a PdfPages object to which the figure is written if it is not None
    par_kde -- dict of parameter distributions (key 'combined');
               if not None, the distributions are plotted next to the fit
    """

    # Plot fit results
    fig = plt.figure()
    #fig.set_tight_layout(False)

    if par_kde is not None:
        fig.set_figwidth(1.6 * fig.get_figwidth())

        pn_both = ['m_ktl']
        pn_red = [p for p in ['tr', 'kmr', 'betr', 'deltr', 'offr']
                 if not red_p.df.loc[p, 'fixed']]
        pn_green = [p for p in ['tg', 'kmg', 'betg', 'deltg', 'offg']
                   if not green_p.df.loc[p, 'fixed']]

        # Plot combined parameters
        grid = (2, 1+max(len(pn_red), len(pn_green)))
        gs = GridSpec(grid[0], grid[1])

        #for pi, label in enumerate(pn_both):
        pi = 0
        label = pn_both[pi]
        ax = plt.subplot(gs.new_subplotspec((pi, 0), rowspan=2))
        data = par_kde['combined'][label]
        clr_face = '#0000ff55'
        clr_edge = '#000099ff'
        curr_val = R[ds]['combined']['params'].loc[tr,label]
        plot_kde(ax, data, label, clr_face, clr_edge, curr_val)

        # Plot green parameters
        for pi, label in enumerate(pn_green):
            ax = plt.subplot(gs.new_subplotspec((0, pi+1)))
            data = par_kde['combined'][label]
            clr_face = '#00ff0055'
            clr_edge = '#009900ff'
            curr_val = R[ds]['combined']['params'].loc[tr,label]
            plot_kde(ax, data, label, clr_face, clr_edge, curr_val)

        # Plot red parameters
        for pi, label in enumerate(pn_red):
            ax = plt.subplot(gs.new_subplotspec((1, pi+1)))
            data = par_kde['combined'][label]
            clr_face = '#ff000055'
            clr_edge = '#990000ff'
            curr_val = R[ds]['combined']['params'].loc[tr,label]
            plot_kde(ax, data, label, clr_face, clr_edge, curr_val)

        # Adjust subplot layout
        gs.tight_layout(fig, pad=0, rect=(0.5, 0, 1, 1))

        # Create axes for fit
        gs_fit = GridSpec(1, 1)
        ax = fig.add_subplot(gs_fit[0])
        gs_fit.tight_layout(fig, pad=0, rect=(0, 0, 0.5, 1))

    else:
        ax = fig.gca()

    #wr = np.sqrt(D[ds]['rfp'][:,tr])
    #ax.fill_between(D[ds]['t'], D[ds]['rfp'][:,tr]-wr, D[ds]['rfp'][:,tr]+wr, color='#ff000033')
    #wg = np.sqrt(D[ds]['gfp'][:,tr])
    #ax.fill_between(D[ds]['t'], D[ds]['gfp'][:,tr]-wg, D[ds]['gfp'][:,tr]+wg, color='#00ff0033')

    p_tr = ax.axvline(R[ds]['combined']['params']['tr'][tr], label='RFP onset',
                       color='#ff0000', linewidth=.5, linestyle='--')
    p_tg = ax.axvline(R[ds]['combined']['params']['tg'][tr], label='GFP onset',
                      color='#00ff00', linewidth=.5, linestyle='--')
    p_fr, = ax.plot(D[ds]['t'], R[ds]['combined']['fit']['red'][tr], '-', label='RFP (fit)', color='#ff0000', linewidth=1)
    p_fg, = ax.plot(D[ds]['t'], R[ds]['combined']['fit']['green'][tr], '-', label='GFP (fit)', color='#00ff00', linewidth=1)
    p_dr, = ax.plot(D[ds]['t'], D[ds]['rfp'][:,tr], '-', label='RFP (measured)', color='#990000', linewidth=.5)
    p_dg, = ax.plot(D[ds]['t'], D[ds]['gfp'][:,tr], '-', label='GFP (measured)', color='#009900', linewidth=.5)

    # Format plot
    ax.set_xlabel('Time [h]')
    ax.set_ylabel('Fluorescence intensity [a.u.]')
    ax.set_title('{} {} [{}] #{:03d}\n(combined fit)'.format(
        D[ds]['sample'], D[ds]['condition'], D[ds]['measurement'], tr))
    ax.legend(handles=[p_dg, p_fg, p_tg, p_dr, p_fr, p_tr])

    # Write figure to pdf
    if pdf is not None:
        pdf.savefig(fig, bbox_inches='tight')

    # Show and close figure
    plt.show(fig)
    plt.close(fig)

In [None]:
# Fit combined model
combined_fixed = combined_p.fixed_params()
for ds in range(len(D)):
    R[ds]['combined']['fit'] = {'red': [], 'green': []}
    nTraces = np.shape(D[ds]['rfp'])[1]

    for tr in range(nTraces):
        print('Fitting "{}" #{:03d}/{:03d}. '.format(getDataLabel(D[ds]), tr, nTraces))#, end='')

        # Get the data for fitting
        time = D[ds]['t']
        data = np.stack([D[ds]['rfp'][:,tr], D[ds]['gfp'][:,tr]], axis=1)
        wght = np.stack([D[ds]['rfp_error'][:,tr], D[ds]['gfp_error'][:,tr]], axis=1)**2
        
        # Adjust parameter properties for onset time and offset
        combined_p.set('tr', max=time[data[:,0].argmax()])
        combined_p.set('offr',
                       min=data[:10,0].min(),
                       max=data[:10,0].max(),
                       value=np.median(data[:10,0]))
        combined_p.set('tg', max=time[data[:,1].argmax()])
        combined_p.set('offg',
                       min=data[:10,1].min(),
                       max=data[:10,1].max(),
                       value=np.median(data[:10,1]))

        # Get amplitude correction
        amp_red = data[:,0].max() - data[:,0].min()
        amp_green = data[:,1].max() - data[:,1].min()
        amp_correct = amp_red - amp_green

        # Fit the data
        def objective_fcn(params):
            """Objective function for combined model"""
            cur_val = combined_p.eval(params, t=time)
            chisq = np.sum(.5 * (data - cur_val)**2 / wght)
            return chisq

        def gradient_combined(params):
            """Gradient for combined model"""
            J_red = red_jacobian(**{p: v for p, v in combined_p.eval_params(params, t=time).items()
                                    if p in {'t', 'm_ktl', 'tr', 'kmr', 'betr', 'deltr', 'offr'}})
            J_green = green_jacobian(**{p: v for p, v in combined_p.eval_params(params, t=time).items()
                                    if p in {'t', 'm_ktl', 'tg', 'kmg', 'betg', 'deltg', 'offg'}})
            residuals = (data - combined_p.eval(params, t=time)) / wght
            Jr = -np.sum(residuals[:,np.newaxis,0] * J_red, axis=0)
            Jg = -np.sum(residuals[:,np.newaxis,1] * J_green, axis=0)

            # Assemble gradient according to parameter order:
            # (tr, tg, m_ktl, kmr, kmg, betr, betg, deltr, deltg, offr, offg)
            #return np.array([Jr[0], Jg[0], Jr[1]+Jg[1], Jr[2], Jg[2], Jr[3], Jg[3],
            #                 Jr[4], Jg[4], Jr[5], Jg[5]])
            return np.array([Jr[0], Jg[0], Jr[1]+Jg[1], Jr[2], Jg[2], Jr[3], Jg[3]])

        result = sc.optimize.minimize(objective_fcn,
                                      combined_p.initial(),
                                      method='TNC',#'L-BFGS-B','TNC'
                                      bounds=combined_p.bounds(),
                                      jac=gradient_combined,
                                      options={'disp': True,
                                               'maxiter': 10000})

        # Save results to R
        res_combined = combined_p.free_values(result.x)
        best_fit = combined(time, **res_combined, **combined_fixed)
        tr_idx = R[ds]['combined']['params'].index[tr]

        R[ds]['combined']['params'].loc[tr_idx, list(res_combined.keys())] = res_combined
        #R[ds]['combined']['params'].loc[tr_idx, combined_fixed.keys()] = combined_fixed

        R[ds]['combined']['fit']['red'].insert(tr, best_fit[:,0])
        R[ds]['combined']['fit']['green'].insert(tr, best_fit[:,1])

        R[ds]['combined']['success'][tr] = result.success
        R[ds]['combined']['chisq'][tr,:] = np.sum((best_fit - data)**2, axis=0)

        # Print result
        print("\tSuccess {} after {} iterations: {}".format(
            result.success, result.nit, result.message))

        # DEBUG
        #if tr >= 20:
        #    print("Breaking loop for debugging purposes")
        #    break
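
In the combined fit, a parameter shared by both channels (here `m_ktl`) receives gradient contributions from the red and the green residuals, which is why `gradient_combined` adds `Jr[1]+Jg[1]`. The sketch below illustrates this with a toy two-channel model and checks the assembled gradient against finite differences. The functions `channel` and `channel_jacobian` and the parameter order `(m, k_red, k_green)` are assumptions for illustration only.

```python
import numpy as np

def channel(m, k, t):
    """Toy channel model (assumption): m * exp(-k * t)."""
    return m * np.exp(-k * t)

def channel_jacobian(m, k, t):
    """Derivatives w.r.t. (m, k), shape (len(t), 2)."""
    e = np.exp(-k * t)
    return np.stack([e, -m * t * e], axis=1)

def evaluate(params, t):
    """Both channels as columns (red, green), sharing the amplitude m."""
    m, k_red, k_green = params
    return np.stack([channel(m, k_red, t), channel(m, k_green, t)], axis=1)

def objective(params, t, data, var):
    return np.sum(0.5 * (data - evaluate(params, t))**2 / var)

def gradient(params, t, data, var):
    m, k_red, k_green = params
    residuals = (data - evaluate(params, t)) / var
    g_red = -np.sum(residuals[:, np.newaxis, 0] * channel_jacobian(m, k_red, t), axis=0)
    g_green = -np.sum(residuals[:, np.newaxis, 1] * channel_jacobian(m, k_green, t), axis=0)
    # Shared parameter: sum of both contributions; others: one channel each
    return np.array([g_red[0] + g_green[0], g_red[1], g_green[1]])

# Synthetic two-channel data
rng = np.random.default_rng(3)
t = np.linspace(0, 4, 40)
var = np.full((t.size, 2), 0.1**2)
data = evaluate((3.0, 0.5, 1.2), t) + rng.normal(0, 0.1, (t.size, 2))

# Finite-difference check at a test point
p_test = np.array([2.5, 0.4, 1.0])
h = 1e-6
g_numeric = np.array([
    (objective(p_test + h * np.eye(3)[i], t, data, var)
     - objective(p_test - h * np.eye(3)[i], t, data, var)) / (2 * h)
    for i in range(3)])
shared_gradient_ok = np.allclose(gradient(p_test, t, data, var), g_numeric, rtol=1e-4)
print(shared_gradient_ok)
```

The order of the returned array has to match the order in which the optimizer sees the free parameters, which is the reason the hard-coded assembly in `gradient_combined` must be updated whenever parameters are fixed or released.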

In [None]:
# Plot results of combined fit
ts = getTimeStamp()
for ds in range(len(D)):
    par_kde = {}
    for t in ('combined',):
        par_kde[t] = parameter_KDE(R[ds][t]['params'])
    pdffile = os.path.join(getOutpath(), '{}_combined_{}.pdf'.format(ts, getDataLabel(D[ds], True)))
    with PdfPages(pdffile) as pdf:
        for tr in range(np.shape(D[ds]['rfp'])[1]):
            plotCombined(ds, tr, pdf, par_kde=par_kde)
            
            # DEBUG
            #if tr >= 40:
            #    print("Forcing break")
            #    break

In [None]:
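# Compare chi-square of separate and combined fits for the first dataset
# (positive values: the combined fit has the smaller sum of squared residuals)
print("     {:10s} {:10s}".format('red', 'green'))
for i in range(len(R[0]['red']['chisq'])):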
    print("{:03d}: {:10.0f} {:10.0f}".format(
        i,
        R[0]['red']['chisq'][i] - R[0]['combined']['chisq'][i,0],
        R[0]['green']['chisq'][i] - R[0]['combined']['chisq'][i,1]
    ))

## Parameter correlations ($t_0$, $mk_\text{tl}$, $\delta$)

In [None]:
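# Plot options for this section
want_loglog = True   # plot on log-log axes
want_ellipse = True  # draw confidence ellipses
want_trim = False    # trim outliers by percentile (see `trimOutliers`)
want_clean = True    # discard badly fitted traces (see `eliminate_outliers`)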

In [None]:
def do_PCA(p1, p2, n_sigma=1, draw=True, isLog=False):
    """
    Calculates properties of a confidence ellipse by PCA.

    Input:
        p1 -- 1dim array of observations of first (horizontal) dimension
        p2 -- 1dim array of observations of second (vertical) dimension
        n_sigma -- number of standard deviations to use as ellipse semi-axes
        draw -- boolean flag whether to create Ellipse artist
        isLog -- if True, all returned values refer to the log10 of p1 and p2

    Returns dictionary with fields:
        center -- tuple (x,y) of ellipse center coordinates
        width -- semi-axis length along the first principal component
        height -- semi-axis length along the second principal component
        angle -- rotation angle of ellipse in degrees
        artist -- matplotlib.patches.Ellipse artist, or None if draw==False
    """
    # Get logarithm of values
    if isLog:
        p1 = np.log10(p1)
        p2 = np.log10(p2)

    # Get eigenvalues and eigenvectors of covariance matrix
    eigvals, eigvecs = np.linalg.eig(np.cov(p1, p2))

    # Sort eigenvalues and eigenvectors
    if eigvals[1] > eigvals[0]:
        eigvals = eigvals[::-1]
        eigvecs = eigvecs[:,::-1]

    # Calculate ellipse properties
    widths = n_sigma * np.sqrt(eigvals)
    theta = np.degrees(np.arctan2(*eigvecs[::-1,0]))
    mean1 = np.mean(p1)
    mean2 = np.mean(p2)

    # Build ellipse (Ellipse expects full axis lengths, hence the factor 2)
    if draw:
        ellipse = mptch.Ellipse(xy=(mean1, mean2), width=2 * widths[0],
                                height=2 * widths[1], angle=theta)
    else:
        ellipse = None

    return {"center": (mean1, mean2),
            "width": widths[0],
            "height": widths[1],
            "angle": theta,
            "artist": ellipse}

def construct_ellipse(width, height, center=(0,0), theta=0, isLog=False):
    """
    Calculates points on the ellipse line.

    Input:
        width -- length of horizontal semi-axis
        height -- length of vertical semi-axis
        center -- tuple of horizontal and vertical coordinates of ellipse center
        theta -- angle (in degrees) of ellipse rotation
        isLog -- if True, the inputs refer to log10 values and the returned
                 points are transformed back by raising 10 to their power
    """
    # Calculate points on the unrotated ellipse, shape (2, 200)
    phi = np.linspace(0, 2 * np.pi, 200)
    vals = np.array([width * np.cos(phi), height * np.sin(phi)])

    # Set up rotation matrix (named `rot` to avoid shadowing the result list `R`)
    theta_rad = np.deg2rad(theta)
    c = np.cos(theta_rad)
    s = np.sin(theta_rad)
    rot = np.array([[c, -s], [s, c]])

    # Rotate and shift ellipse
    vals_trafo = rot @ vals + np.reshape(center, (2, 1))

    # If logarithmic data were given, raise with basis 10
    if isLog:
        vals_trafo = np.power(10, vals_trafo)

    # Ensure that first and last point are numerically equal
    vals_trafo[:,-1] = vals_trafo[:,0]

    return vals_trafo[0,:], vals_trafo[1,:]

def eliminate_outliers(iData, thresh_chsq=3.6, thresh_delta=1e-2):
    """
    Eliminates outliers based on heuristic thresholds.
    A logical index array is returned, in which outliers are marked
    as False and traces to keep as True.

    Input:
        iData -- index of dataset in D and R
        thresh_chsq -- chisquare threshold: sort out traces above threshold
        thresh_delta -- delta threshold: sort out traces below threshold

    Returns:
        idx -- logical array with as many elements as traces
    """
    # Obtain chi-square values
    cqr = R[iData]['red']['chisq']
    cqg = R[iData]['green']['chisq']
    if cqr.shape != cqg.shape:
        raise ValueError("Incompatible shapes of RFP and GFP data.")

    # Get index based on chi-square thresholding
    rawr = D[iData]['rfp']
    rawg = D[iData]['gfp']
    qmr = np.log10(cqr / rawr.max(axis=0))
    qmg = np.log10(cqg / rawg.max(axis=0))
    idx = (qmr < thresh_chsq) & (qmg < thresh_chsq)

    # Sort out outliers based on small delta value
    deltr = R[iData]['red']['params']['deltr'].values
    deltg = R[iData]['green']['params']['deltg'].values
    idx &= (deltr > thresh_delta) & (deltg > thresh_delta)

    return idx

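def trimOutliers(x, log=False):
    """Returns a logical index of all non-outliers.

    Values above 1.5 times the 95th percentile count as outliers.
    If `log` is True, the test is applied to log10(x), and values far
    below the 5th percentile are excluded as well.
    """
    if log:
        x = np.log10(x)
    p95 = np.percentile(x, 95)
    thrsh = p95 * 1.5
    i_in = x <= thrsh
    if log:
        p05 = np.percentile(x, 5)
        i_in = np.logical_and(i_in, x >= p05 - .5 * (p95 - p05))
    return i_in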

def plotParamCorrelations(p_green, p_red, lbl='', param_name='',
                          param_unit='a.u.', loglog=False, pdf=None,
                          trim=True, fit_diag=False, equal_aspect=False,
                          p2_green=None, p2_red=None, desc1=None, desc2=None,
                          draw_ellipse=True):
    """
    Plots the GFP/RFP correlation of one fit parameter
    and saves the plot to PDF if `pdf` is a `PdfPages` instance.

    Arguments:
        p_green -- 1-dim array of parameter values for GFP
        p_red -- array of parameter values for RFP, same shape as p_green
        lbl -- dataset label for figure title
        param_name -- parameter name, used for figure labels
        param_unit -- unit of parameter, used in axes labels
        loglog -- if True, scale both axes logarithmically
        pdf -- PdfPages object to save figure to (None for no saving)
        trim -- if True, trim outliers
        fit_diag -- if True, fit and plot diagonal (slope 1) to data
        equal_aspect -- if True, use same scaling for x- and y-axis
        p2_green -- 1-dim array of second dataset for GFP
        p2_red -- 1-dim array of second dataset for RFP, same shape as p2_green
        desc1 -- short label for first dataset
        desc2 -- short label for second dataset
        draw_ellipse -- if True, draw a confidence ellipse
    """
    # Define constants
    col1 = "#1f77b4"
    col2 = "#ff7f0e"

    if desc1 is not None:
        descc1 = "{}: ".format(desc1)
    else:
        descc1 = ""
    if desc2 is not None:
        descc2 = "{}: ".format(desc2)
    else:
        descc2 = ""

    # Test for second dataset
    hasSecond = False
    if p2_green is not None and p2_red is not None:
        hasSecond = True
    
    # Trim outliers
    out_lbl = ""
    out_lbl2 = ""
    if trim:
        ing = trimOutliers(p_green, log=loglog)
        inr = trimOutliers(p_red, log=loglog)
        if hasSecond:
            ing2 = trimOutliers(p2_green, log=loglog)
            inr2 = trimOutliers(p2_red, log=loglog)

            # Get combined value ranges
            maxg = max(p_green[ing].max(), p2_green[ing2].max())
            ming = min(p_green[ing].min(), p2_green[ing2].min())
            maxr = max(p_red[inr].max(), p2_red[inr2].max())
            minr = min(p_red[inr].min(), p2_red[inr2].min())

            # Trim outliers outside of combined ranges
            in_all = np.all(np.stack((p_green >= ming, p_green <= maxg,
                                      p_red >= minr, p_red <= maxr)), axis=0)
            p_green = p_green[in_all]
            p_red = p_red[in_all]
            in_all2 = np.all(np.stack((p2_green >= ming, p2_green <= maxg,
                                      p2_red >= minr, p2_red <= maxr)), axis=0)
            p2_green = p2_green[in_all2]
            p2_red = p2_red[in_all2]

            # Set outlier labels
            n_out = np.sum(~in_all)
            out_lbl = " ($+${} outlier{})".format(n_out, '' if n_out == 1 else 's')
            n_out2 = np.sum(~in_all2)
            out_lbl2 = " ($+${} outlier{})".format(n_out2, '' if n_out2 == 1 else 's')

        elif not np.all(ing) or not np.all(inr):
            in_all = np.logical_and(ing, inr)
            n_out = np.sum(np.logical_not(in_all))
            p_green = p_green[in_all]
            p_red = p_red[in_all]
            out_lbl = "\n($+${} outlier{})".format(n_out, '' if n_out == 1 else 's')

    # Make figure and axes
    f, ax = plt.subplots(figsize=(4, 4))

    # Plot data
    ax.plot(p_green, p_red, '.', color=col1, ms=2, mec='none', zorder=3)
    if hasSecond:
        ax.plot(p2_green, p2_red, '.', color=col2, ms=2, mec='none', zorder=3)

    # Fix and get axes limits
    if loglog:
        ax.set_xscale('log')
        ax.set_yscale('log')
    ax.autoscale(enable=False)
    if equal_aspect:
        ax.set_aspect('equal')
    xlim = np.array(ax.get_xlim())
    ylim = np.array(ax.get_ylim())

    # Fit and plot line
    if fit_diag:
        offset = so.least_squares(lambda p: p_green - p_red + p, np.zeros(1)).x
        diag_x_vals = np.linspace(xlim.min(), xlim.max(), 200)
        diag_y_vals = diag_x_vals + offset
        ax.plot(diag_x_vals, diag_y_vals, '-', color=col1, lw=.75, zorder=2)

    # Draw confidence ellipse
    if draw_ellipse:
        ellinfo = do_PCA(p_green, p_red, isLog=loglog)
        ell_x, ell_y = construct_ellipse(width=ellinfo['width'],
                        height=ellinfo['height'], center=ellinfo['center'],
                        theta=ellinfo['angle'], isLog=loglog)
        ax.plot(ell_x, ell_y, '-', lw=.5, color=col1, zorder=4)
        ctr = ellinfo['center']
        if loglog:
            ctr = np.power(10, ctr)
        ax.plot(*ctr, marker='+', mec=col1, mew=.75, ms=5, zorder=4)
        if hasSecond:
            ellinfo = do_PCA(p2_green, p2_red, isLog=loglog)
            ell_x, ell_y = construct_ellipse(width=ellinfo['width'],
                            height=ellinfo['height'], center=ellinfo['center'],
                            theta=ellinfo['angle'], isLog=loglog)
            ax.plot(ell_x, ell_y, '-', lw=.5, color=col2, zorder=4)
            ctr = ellinfo['center']
            if loglog:
                ctr = np.power(10, ctr)
            ax.plot(*ctr, marker='+', mec=col2, mew=.75, ms=5, zorder=4)

    # Draw diagonal
    diag_lim = (max(xlim.min(), ylim.min()), min(xlim.max(), ylim.max()))
    ax.plot(diag_lim, diag_lim, '--k', lw=.5, zorder=1)

    # Show legend
    if hasSecond:
        dummy1 = mlin.Line2D([], [], marker='.', ls='', color=col1, label=desc1)
        dummy2 = mlin.Line2D([], [], marker='.', ls='', color=col2, label=desc2)
        ax.legend(handles=(dummy1, dummy2),
                  loc=('lower left' if fit_diag else 'lower right'),
                  borderpad=.1, borderaxespad=.1, labelspacing=.2, handletextpad=.5)
        #ax.legend(handles=(dummy1, dummy2), ncol=2, mode='expand',
        #         loc='lower left', bbox_to_anchor=(0, 1, 1, 0.1))

    # Write offset
    if fit_diag:
        ax.text(xlim.max()-.1, ylim.min()+.1,
                "$t_\mathrm{{r}}-t_\mathrm{{g}}={:.2f}$".format(
                    offset.item()), {'color': col1, 'ha': 'right', 'va': 'bottom'})

    # Write statistics information
    stat_text = "{}{} cells".format(descc1, p_green.size) + out_lbl
    if hasSecond:
        stat_text += "\n{}{} cells".format(descc2, p2_green.size) + out_lbl2
    ax.text(xlim.min() * 1.01, ylim.max() * .99, stat_text,
           {'color': 'k', 'ha': 'left', 'va': 'top'})

    # Axes formatting
    ax.set_xlabel("{} for GFP [{}]".format(param_name, param_unit), color='g')
    ax.set_ylabel("{} for RFP [{}]".format(param_name, param_unit), color='r')
    ax.set_title('GFP/RFP correlation of {}\n{}'.format(param_name, lbl))
    f.tight_layout(pad=0)

    # Show, optionally save, and close the figure
    plt.show(f)
    if pdf is not None:
        pdf.savefig(f)
    plt.close(f)
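
The confidence ellipse used above follows from an eigendecomposition of the sample covariance matrix: the eigenvectors give the orientation and the square roots of the eigenvalues give the semi-axes for one standard deviation. The following sketch reproduces this construction on synthetic data and checks that every point of the resulting curve lies at a Mahalanobis distance of `n_sigma` from the center. All data and variable names are made up for illustration.

```python
import numpy as np

# Synthetic correlated 2D data (assumed values)
rng = np.random.default_rng(1)
cov_true = np.array([[2.0, 0.8], [0.8, 0.5]])
xy = rng.multivariate_normal([3.0, -1.0], cov_true, size=500)

# Eigendecomposition of the sample covariance, largest eigenvalue first
cov = np.cov(xy[:, 0], xy[:, 1])
eigvals, eigvecs = np.linalg.eigh(cov)          # eigh returns ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Ellipse properties: semi-axes, rotation angle and center
n_sigma = 1
semi_axes = n_sigma * np.sqrt(eigvals)
theta = np.arctan2(eigvecs[1, 0], eigvecs[0, 0])
center = xy.mean(axis=0)

# Points on the ellipse: scale the unit circle, rotate, then shift
phi = np.linspace(0, 2 * np.pi, 200)
unrotated = np.array([semi_axes[0] * np.cos(phi), semi_axes[1] * np.sin(phi)])
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
ellipse = rot @ unrotated + center[:, np.newaxis]

# Check: Mahalanobis distance of every ellipse point equals n_sigma
diff = ellipse - center[:, np.newaxis]
mahal = np.sqrt(np.einsum('in,ij,jn->n', diff, np.linalg.inv(cov), diff))
on_contour = np.allclose(mahal, n_sigma)
print(on_contour)
```

For log-log plots the same construction is applied to the log10 of the data and the curve is mapped back with `10**x`, so the curve is an ellipse only on logarithmic axes.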

In [None]:
# Onset time correlations for separate files
with PdfPages(getOutpath("onset_correlations.pdf")) as pdf:
    for i in range(len(R)):
        # Load relevant data
        if want_clean:
            idx = eliminate_outliers(i)
        else:
            idx = np.ones((D[i]['nTraces']), dtype=np.bool_)

        t_green = R[i]['green']['params']['tg'].values[idx]
        t_red = R[i]['red']['params']['tr'].values[idx]
        lbl = getDataLabel(D[i])

        #plotOnsetCorrelations(t_green, t_red, lbl, pdf)
        plotParamCorrelations(t_green, t_red, lbl=lbl, param_name='Onset time',
                param_unit='h', fit_diag=True, equal_aspect=True, pdf=pdf,
                loglog=False, trim=want_trim, draw_ellipse=want_ellipse)
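
With `fit_diag=True`, a line of slope 1 is fitted to the onset times, so the only free parameter is the offset $t_\text{r}-t_\text{g}$. Since this least-squares problem is linear in the offset, its solution is the mean of the differences. The sketch below checks this on synthetic onset times; the data and the true offset of 0.4 h are assumptions.

```python
import numpy as np

# Synthetic onset times (assumed): red lags green by about 0.4 h
rng = np.random.default_rng(2)
t_green = rng.uniform(1, 10, 100)
t_red = t_green + 0.4 + rng.normal(0, 0.3, 100)

# Closed-form solution: mean of the pairwise differences
offset_mean = np.mean(t_red - t_green)

# Generic linear least squares with a single constant column
design = np.ones((t_green.size, 1))
offset_lstsq = np.linalg.lstsq(design, t_red - t_green, rcond=None)[0][0]

offsets_agree = bool(np.isclose(offset_mean, offset_lstsq))
print(offsets_agree)
```

The mean of the differences is therefore a quick cross-check for the offset printed in the figure.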

In [None]:
# Correlate onsets of merged datasets
onset_sorted = []
samples = set()
conditions = ["control", "siRNA"]

# Write relevant data in list
for i in range(len(R)):
    if want_clean:
        idx = eliminate_outliers(i)
    else:
        idx = np.ones((D[i]['nTraces']), dtype=np.bool_)

    onset_sorted.append({
        'green': R[i]['green']['params']['tg'].values[idx],
        'red': R[i]['red']['params']['tr'].values[idx],
        'sample': D[i]['sample'],
        'condition': D[i]['condition']
    })

    samples.add(onset_sorted[i]['sample'])
    if onset_sorted[i]['condition'] not in conditions:
        conditions.append(onset_sorted[i]['condition'])

# Iterate over relevant data list
with PdfPages(getOutpath("onset_correlations_merged.pdf")) as pdf:
    for sample in samples:
        for condition in conditions:
            # Get the indices for this property combination
            idx = [i for i in range(len(onset_sorted))
                   if onset_sorted[i]['sample'] == sample
                   and onset_sorted[i]['condition'] == condition]
            # Skip combinations without data; np.concatenate fails on an empty list
            if len(idx) == 0:
                continue

            # Merge corresponding datasets
            t_red = np.concatenate([onset_sorted[i]['red'] for i in idx])
            t_green = np.concatenate([onset_sorted[i]['green'] for i in idx])
            lbl = "{} ({})".format(sample, condition)

            # Plot merged datasets
            plotParamCorrelations(t_green, t_red, lbl=lbl, param_name='Onset time',
                    param_unit='h', fit_diag=True, equal_aspect=True, pdf=pdf,
                    loglog=want_loglog, trim=want_trim)

In [None]:
# Correlate onsets of merged datasets in one figure
onset_sorted = []
samples = set()
conditions = ["control", "siRNA"]

# Write relevant data in list
for i in range(len(R)):
    if want_clean:
        idx = eliminate_outliers(i)
    else:
        idx = np.ones((D[i]['nTraces']), dtype=np.bool_)

    onset_sorted.append({
        'green': R[i]['green']['params']['tg'].values[idx],
        'red': R[i]['red']['params']['tr'].values[idx],
        'sample': D[i]['sample'],
        'condition': D[i]['condition']
    })

    samples.add(onset_sorted[i]['sample'])
    if onset_sorted[i]['condition'] not in conditions:
        conditions.append(onset_sorted[i]['condition'])

# Iterate over relevant data list
with PdfPages(getOutpath("onset_correlations_merged_log.pdf")) as pdf:
    for sample in samples:
        lbl = "{}".format(sample)
        p_red = []
        p_green = []
        desc = []

        for condition in conditions:
            # Get the indices for this property combination
            idx = [i for i in range(len(onset_sorted))
                   if onset_sorted[i]['sample'] == sample
                   and onset_sorted[i]['condition'] == condition]
            if len(idx) == 0:
                continue

            # Merge corresponding datasets
            p_red.append(np.concatenate([onset_sorted[i]['red'] for i in idx]))
            p_green.append(np.concatenate([onset_sorted[i]['green'] for i in idx]))
            desc.append(condition)

        # Plot merged datasets
        plotParamCorrelations(p_green[0], p_red[0], lbl=lbl, param_name='Onset time',
                param_unit='h', fit_diag=True, equal_aspect=True, pdf=pdf,
                loglog=False, trim=want_trim,
                p2_green=p_green[1], p2_red=p_red[1], desc1=desc[0], desc2=desc[1])

In [None]:
# m_ktl for separate files
with PdfPages(getOutpath("m-ktl_correlations.pdf")) as pdf:
    for i in range(len(R)):
        # Load relevant data
        if want_clean:
            idx = eliminate_outliers(i)
        else:
            idx = np.ones((D[i]['nTraces']), dtype=np.bool_)

        p_green = R[i]['green']['params']['m_ktl'].values[idx]
        p_red = R[i]['red']['params']['m_ktl'].values[idx]
        lbl = getDataLabel(D[i])

        plotParamCorrelations(p_green, p_red, lbl=lbl, pdf=pdf,
                              param_name=r"$mk_\mathrm{tl}$",
                              param_unit=r"$\mathregular{h^{-1}}$",
                              loglog=True, trim=want_trim)

# delta for separate files
with PdfPages(getOutpath("delta_correlations.pdf")) as pdf:
    for i in range(len(R)):
        # Load relevant data
        if want_clean:
            idx = eliminate_outliers(i)
        else:
            idx = np.ones((D[i]['nTraces']), dtype=np.bool_)

        p_green = R[i]['green']['params']['deltg'].values[idx]
        p_red = R[i]['red']['params']['deltr'].values[idx]
        lbl = getDataLabel(D[i])

        plotParamCorrelations(p_green, p_red, lbl=lbl, pdf=pdf,
                              param_name=r"$\delta$",
                              param_unit=r"$\mathregular{h^{-1}}$",
                              loglog=want_loglog, trim=want_trim)

In [None]:
# Correlate m_ktl and delta of merged datasets
mktl_sorted = []
delta_sorted = []
samples = set()
conditions = ["control", "siRNA"]

# Write relevant data in list
for i in range(len(R)):
    sample = D[i]['sample']
    condition = D[i]['condition']

    samples.add(sample)
    if condition not in conditions:
        conditions.append(condition)

    if want_clean:
        idx = eliminate_outliers(i)
    else:
        idx = np.ones((D[i]['nTraces']), dtype=np.bool_)

    mktl_sorted.append({
        'green': R[i]['green']['params']['m_ktl'].values[idx],
        'red': R[i]['red']['params']['m_ktl'].values[idx],
        'sample': sample,
        'condition': condition
    })
    delta_sorted.append({
        'green': R[i]['green']['params']['deltg'].values[idx],
        'red': R[i]['red']['params']['deltr'].values[idx],
        'sample': sample,
        'condition': condition
    })

# Iterate over merged m_ktl list
with PdfPages(getOutpath("m-ktl_correlations_merged.pdf")) as pdf:
    for sample in samples:
        for condition in conditions:
            # Get the indices for this property combination
            idx = [i for i in range(len(mktl_sorted))
                   if mktl_sorted[i]['sample'] == sample
                   and mktl_sorted[i]['condition'] == condition]
            # Skip combinations without data; np.concatenate fails on an empty list
            if len(idx) == 0:
                continue

            # Merge corresponding datasets
            p_red = np.concatenate([mktl_sorted[i]['red'] for i in idx])
            p_green = np.concatenate([mktl_sorted[i]['green'] for i in idx])
            lbl = "{} ({})".format(sample, condition)

            # Plot merged datasets
            plotParamCorrelations(p_green, p_red, lbl=lbl, pdf=pdf,
                    param_name=r"$mk_\mathrm{tl}$",
                    param_unit=r"$\mathregular{h^{-1}}$",
                    loglog=want_loglog, trim=want_trim)

# Iterate over merged delta list
with PdfPages(getOutpath("delta_correlations_merged.pdf")) as pdf:
    for sample in samples:
        for condition in conditions:
            # Get the indices for this property combination
            idx = [i for i in range(len(delta_sorted))
                   if delta_sorted[i]['sample'] == sample
                   and delta_sorted[i]['condition'] == condition]
            # Skip combinations without data; np.concatenate fails on an empty list
            if len(idx) == 0:
                continue

            # Merge corresponding datasets
            p_red = np.concatenate([delta_sorted[i]['red'] for i in idx])
            p_green = np.concatenate([delta_sorted[i]['green'] for i in idx])
            lbl = "{} ({})".format(sample, condition)

            # Plot merged datasets
            plotParamCorrelations(p_green, p_red, lbl=lbl, pdf=pdf,
                    param_name=r"$\delta$",
                    param_unit=r"$\mathregular{h^{-1}}$",
                    loglog=want_loglog, trim=want_trim)

In [None]:
# Correlate m_ktl and delta of merged datasets in one figure
mktl_sorted = []
delta_sorted = []
samples = set()
conditions = ["control", "siRNA"]

# Write relevant data in list
for i in range(len(R)):
    sample = D[i]['sample']
    condition = D[i]['condition']

    samples.add(sample)
    if condition not in conditions:
        conditions.append(condition)

    if want_clean:
        idx = eliminate_outliers(i)
    else:
        idx = np.ones((D[i]['nTraces']), dtype=np.bool_)

    mktl_sorted.append({
        'green': R[i]['green']['params']['m_ktl'].values[idx],
        'red': R[i]['red']['params']['m_ktl'].values[idx],
        'sample': sample,
        'condition': condition
    })
    delta_sorted.append({
        'green': R[i]['green']['params']['deltg'].values[idx],
        'red': R[i]['red']['params']['deltr'].values[idx],
        'sample': sample,
        'condition': condition
    })

# Iterate over merged m_ktl list
with PdfPages(getOutpath("m-ktl_correlations_merged_log.pdf")) as pdf:
    for sample in samples:
        lbl = "{}".format(sample)
        p_red = []
        p_green = []
        desc = []

        for condition in conditions:
            # Get the indices for this property combination
            idx = [i for i in range(len(mktl_sorted))
                   if mktl_sorted[i]['sample'] == sample
                   and mktl_sorted[i]['condition'] == condition]
            #if len(idx) == 0:
            #    continue

            # Merge corresponding datasets
            p_red.append(np.concatenate([mktl_sorted[i]['red'] for i in idx]))
            p_green.append(np.concatenate([mktl_sorted[i]['green'] for i in idx]))
            desc.append(condition)

        # Plot merged datasets
        plotParamCorrelations(p_green[0], p_red[0], lbl=lbl, pdf=pdf,
                param_name=r"$mk_\mathrm{tl}$",
                param_unit=r"$\mathregular{h^{-1}}$",
                loglog=want_loglog,  trim=want_trim,
                equal_aspect=True,
                p2_green=p_green[1], p2_red=p_red[1], desc1=desc[0], desc2=desc[1])

# Iterate over merged delta list
with PdfPages(getOutpath("delta_correlations_merged_log.pdf")) as pdf:
    for sample in samples:
        lbl = "{}".format(sample)
        p_red = []
        p_green = []
        desc = []

        for condition in conditions:
            # Get the indices for this property combination
            idx = [i for i in range(len(delta_sorted))
                   if delta_sorted[i]['sample'] == sample
                   and delta_sorted[i]['condition'] == condition]

            # Merge corresponding datasets
            p_red.append(np.concatenate([delta_sorted[i]['red'] for i in idx]))
            p_green.append(np.concatenate([delta_sorted[i]['green'] for i in idx]))
            desc.append(condition)

        # Plot merged datasets
        plotParamCorrelations(p_green[0], p_red[0], lbl=lbl, pdf=pdf,
                param_name=r"$\delta$",
                param_unit=r"$\mathregular{h^{-1}}$",
                loglog=want_loglog,  trim=want_trim,
                equal_aspect=True,
                p2_green=p_green[1], p2_red=p_red[1], desc1=desc[0], desc2=desc[1])

In [None]:
# Set up merged parameter lists
# Stores all parameters in a data structure accessible as
# parameters[<sample>][<condition>][<parameter>][<color>]
par_names = ("onset", "mktl", "delta")
colors = ("red", "green")
parameters = {}

# Write relevant data in list
for i, r in enumerate(R):
    if want_clean:
        idx = eliminate_outliers(i)
    else:
        idx = np.ones((D[i]['nTraces']), dtype=np.bool_)

    sample = D[i]['sample']
    condition = D[i]['condition']

    if sample not in parameters:
        parameters[sample] = {}
    if condition not in parameters[sample]:
        parameters[sample][condition] = {}
    for p in par_names:
        if p not in parameters[sample][condition]:
            parameters[sample][condition][p] = {}
        for c in colors:
            if c not in parameters[sample][condition][p]:
                parameters[sample][condition][p][c] = []

    parameters[sample][condition]["onset"]["red"] = np.concatenate(
        (parameters[sample][condition]["onset"]["red"],
         r['red']['params']['tr'].values[idx]))
    parameters[sample][condition]["onset"]["green"] = np.concatenate(
        (parameters[sample][condition]["onset"]["green"],
         r['green']['params']['tg'].values[idx]))
    parameters[sample][condition]["mktl"]["red"] = np.concatenate(
        (parameters[sample][condition]["mktl"]["red"],
         r['red']['params']['m_ktl'].values[idx]))
    parameters[sample][condition]["mktl"]["green"] = np.concatenate(
        (parameters[sample][condition]["mktl"]["green"],
         r['green']['params']['m_ktl'].values[idx]))
    parameters[sample][condition]["delta"]["red"] = np.concatenate(
        (parameters[sample][condition]["delta"]["red"],
         r['red']['params']['deltr'].values[idx]))
    parameters[sample][condition]["delta"]["green"] = np.concatenate(
        (parameters[sample][condition]["delta"]["green"],
         r['green']['params']['deltg'].values[idx]))

In [None]:
parameters

## Population analysis

Goal: test whether the fitted parameters of control and siRNA-treated cells follow the same distribution.
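
As a cross-check for the hand-written tests defined below, SciPy provides equivalents: `scipy.stats.mannwhitneyu` for the rank-sum test and `scipy.stats.levene` with `center='median'`, which is the Brown-Forsythe variant of Levene's test. The following is a minimal sketch on synthetic data; the arrays `control` and `sirna` are made up for illustration and are not taken from the measurements.

```python
import numpy as np
import scipy.stats as ss

# Synthetic stand-ins for one fitted parameter in two conditions (assumption)
rng = np.random.default_rng(0)
control = rng.lognormal(mean=0.0, sigma=0.5, size=200)
sirna = rng.lognormal(mean=0.3, sigma=0.5, size=200)

# Rank-based test for a shift between the two populations
mwu = ss.mannwhitneyu(control, sirna, alternative='two-sided')

# Brown-Forsythe test for equal spread (Levene's test centred on the median)
bf = ss.levene(control, sirna, center='median')

print("Mann-Whitney U: U={:.1f}, p={:.3g}".format(mwu.statistic, mwu.pvalue))
print("Brown-Forsythe: W={:.3f}, p={:.3g}".format(bf.statistic, bf.pvalue))
```

SciPy applies tie and continuity corrections in the asymptotic Mann-Whitney test, so its p-values may differ slightly from those of an uncorrected normal approximation.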

In [None]:
def ranksums(X, Y, roundDigits=None, debug=False):
    """Calculates the rank sums for the Wilcoxon-Mann-Whitney U test.
    Tied values are assigned the mean of the ranks they occupy.

    Input:
        X -- 1-dimensional array of first sample (need not be sorted)
        Y -- 1-dimensional array of second sample (need not be sorted)
        roundDigits -- number of digits to round to; no rounding if None
        debug -- if True, the rank lists are printed

    Returns:
        tuple (ranksum of X, ranksum of Y)
    """
    # Flatten, round and sort X and Y
    if len(X.shape) > 1:
        X = X.flatten()
    if len(Y.shape) > 1:
        Y = Y.flatten()
    if roundDigits is not None:
        X = X.round(decimals=roundDigits)
        Y = Y.round(decimals=roundDigits)
    XY = np.concatenate((X,Y))
    XY.sort()

    # Compute ranks
    R = {}
    iR = 0
    xy_uq, xy_ct = np.unique(XY, return_counts=True)
    for i, c in enumerate(xy_ct):
        iR_old = iR
        iR += c
        R[xy_uq[i]] = np.arange(iR, iR_old, -1).sum() / c

    # Compute rank sums
    if debug:
        xnew = []
        ynew = []

    sumX = 0
    for x in X:
        sumX += R[x]
        if debug:
            xnew.append(R[x])

    sumY = 0
    for y in Y:
        sumY += R[y]
        if debug:
            ynew.append(R[y])

    if debug:
        print("X: {}\nY:{}".format(str(xnew), str(ynew)))

    return sumX, sumY

def criticalValue(U, n1, n2):
    """Computes the standardized test statistic (z-score) for the
    Wilcoxon-Mann-Whitney test.
    The value is obtained from the normal approximation, which
    assumes large n1 and n2.
    The null hypothesis cannot be rejected when this value lies inside
    the acceptance interval of the standard normal distribution.

    Input:
        U -- the smaller Mann-Whitney-U test statistic of the two samples
        n1 -- the number of elements in one of the samples
        n2 -- the number of elements in the other sample

    Returns:
        Approximate z-score of U
    """
    return (U - n1 * n2 / 2) / np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

def WilcoxonMannWhitney(X, Y, alpha=.05, roundDigits=None, verbose=False):
    """Wilcoxon-Mann-Whitney-U test

    Input:
        X -- first sample as 1-dim array
        Y -- second sample as 1-dim array
        alpha -- significance level
        roundDigits -- if not None, X and Y are rounded; corresponds
                       to the "decimals" parameter of ndarray.round
        verbose -- if True, print detailed information

    Returns:
        tuple (isAccept, p); isAccept is True if the null hypothesis
        (X and Y are from the same population) cannot be rejected and
        False otherwise, p is the two-sided p-value
    """
    # Obtain ranksums
    sumX, sumY = ranksums(X, Y, roundDigits=roundDigits)
    nX = X.size
    nY = Y.size

    # Find critical value
    Ux = sumX - nX * (nX + 1) / 2
    Uy = sumY - nY * (nY + 1) / 2
    Z = criticalValue(min(Ux, Uy), nX, nY)

    # Get interval of non-critical values
    intZ = ss.norm.ppf((alpha/2, 1-alpha/2))

    # Get p-value
    sgnZ = -1 if Z < 0 else 1
    p = ss.norm.cdf(-sgnZ * Z) + 1 - ss.norm.cdf(sgnZ * Z)

    # Get test result
    isAccept = False
    if Z > intZ[0] and Z < intZ[1]:
        isAccept = True

    # DEBUG
    if verbose:
        print("""Ranksums: x={:6.0f}, y={:6.0f}
U:        x={:6.3f}, y={:6.3f}
Z={:.3f} {} [{:.3f}, {:.3f}]
p-value={:.3f}, 𝛼={:.3f}
Null hypothesis is {}.""".format(sumX, sumY, Ux, Uy, Z,
                                 '∈' if isAccept else '∉',
                                 intZ[0], intZ[1], p, alpha,
                                 'not rejected' if isAccept else 'rejected'))

    return isAccept, p
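
For reference, the normal approximation implemented by `criticalValue` and the two-sided p-value computed in `WilcoxonMannWhitney` can be written as follows, where $R_X$ and $R_Y$ are the rank sums, $n_X$ and $n_Y$ the sample sizes and $\Phi$ the standard normal CDF:

$$U_X = R_X - \frac{n_X (n_X + 1)}{2}, \qquad
U_Y = R_Y - \frac{n_Y (n_Y + 1)}{2}, \qquad
U = \min(U_X, U_Y)$$

$$Z = \frac{U - \frac{n_X n_Y}{2}}{\sqrt{\frac{n_X n_Y (n_X + n_Y + 1)}{12}}}, \qquad
p = 2\,\Phi(-|Z|)$$

The null hypothesis is not rejected if $Z$ lies between the $\alpha/2$ and $1-\alpha/2$ quantiles of the standard normal distribution. No tie or continuity correction is applied here, so the approximation is best suited to large samples with few ties.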

In [None]:
X = np.array((1,2,3,4,4,6))
Y = np.array((2,2,3,4,5,6))
ranksums(X, Y, debug=True)
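
The tie handling in `ranksums` can be checked by hand on the small example above. The sketch below recomputes the mean ranks in plain Python; `mean_ranks` is a hypothetical helper written only for this illustration and is not part of the notebook.

```python
from collections import Counter

def mean_ranks(values):
    """Map each distinct value to the mean of the ranks it occupies."""
    counts = Counter(values)
    ranks = {}
    next_rank = 1
    for v in sorted(counts):
        c = counts[v]
        # Ranks next_rank .. next_rank + c - 1 are shared by c tied values
        ranks[v] = next_rank + (c - 1) / 2
        next_rank += c
    return ranks

X = (1, 2, 3, 4, 4, 6)
Y = (2, 2, 3, 4, 5, 6)
ranks = mean_ranks(X + Y)

sum_x = sum(ranks[x] for x in X)
sum_y = sum(ranks[y] for y in Y)
print(sum_x, sum_y)  # 37.0 41.0
```

The two rank sums add up to $N(N+1)/2 = 78$ for $N = 12$, which is a quick sanity check for any rank-sum computation.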

In [None]:
X = np.array((1,2,3,4,4,6,1,1))
Y = np.array((2,2,3,4,5,6,7,10))
WilcoxonMannWhitney(X, Y, verbose=True)

In [None]:
WilcoxonMannWhitney(parameters['A549']['control']['mktl']['green'],
                    parameters['A549']['siRNA']['mktl']['green'],
                    roundDigits=3, alpha=.01, verbose=True)

In [None]:
def BrownForsythe(Y, alpha=.05, verbose=False):
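    """Brown-Forsythe test for equality of variances

    Input:
        Y -- sequence of samples, each a 1-dim array
        alpha -- significance level
        verbose -- if True, print detailed information

    Returns:
        True if null hypothesis cannot be rejected (all samples have
        equal variance), False otherwise
    """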
    k = len(Y)
    N = 0
    Ni = np.zeros(k)
    Zij = []
    Zi = []
    denominator = 0

    for i, y in enumerate(Y):
        Ni[i] = y.size
        N += Ni[i]
        my = np.median(y)
        Zij.insert(i, abs(y - my))
        Zi.insert(i, np.mean(Zij[i]))
        denominator += np.sum((Zij[i] - Zi[i])**2)

    Z = np.mean(np.concatenate(Zij))
    numerator = np.sum(Ni * (Zi - Z)**2)

    W = (N - k) / (k - 1) * numerator / denominator
    maxW = ss.f.ppf((1 - alpha,), k - 1, N - k)[0]
    p = 1 - ss.f.cdf(W, k - 1, N - k)

    isAccept = False
    if W <= maxW:
        isAccept = True

    if verbose:
        print("W={:.3f} {} {:.3f}\np-value={:.3f}, 𝛼={:.3f}\nNull hypothesis is {}.".format(
            W, "≤" if isAccept else "≰", maxW, p, alpha,
            "not rejected" if isAccept else "rejected"))

    return isAccept
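
For reference, the statistic computed by `BrownForsythe` for $k$ samples with $N_i$ values $y_{ij}$ each and $N = \sum_i N_i$ is based on the absolute deviations from the sample medians $\tilde{y}_i$:

$$z_{ij} = \left| y_{ij} - \tilde{y}_i \right|, \qquad
W = \frac{N - k}{k - 1} \,
\frac{\sum_{i=1}^{k} N_i \left( \bar{z}_{i\cdot} - \bar{z}_{\cdot\cdot} \right)^2}
{\sum_{i=1}^{k} \sum_{j=1}^{N_i} \left( z_{ij} - \bar{z}_{i\cdot} \right)^2}$$

Here $\bar{z}_{i\cdot}$ is the mean of the $z_{ij}$ within sample $i$ and $\bar{z}_{\cdot\cdot}$ is the mean over all $z_{ij}$. Under the null hypothesis of equal variances, $W$ approximately follows an $F$ distribution with $(k-1, N-k)$ degrees of freedom, so the null hypothesis is rejected at level $\alpha$ if $W$ exceeds the $(1-\alpha)$ quantile of that distribution.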

In [None]:
BrownForsythe((parameters['A549']['control']['mktl']['green'],
              parameters['A549']['siRNA']['mktl']['green']),
             verbose=True)

In [None]:
samples = ("Huh7", "A549")
par_names = ("onset", "mktl", "delta")
colors = ("red", "green")
alpha = .05

for sample in samples:
    for par_name in par_names:
        for color in colors:
            print("********************\n{:9s}: {:5s}\n{:9s}: {:5s}\n{:9s}: {:5s}\n".format(
                    "Sample", sample, "Parameter", par_name, "Color", color))
            print("Wilcoxon-Mann-Whitney-U test:")
            WilcoxonMannWhitney(parameters[sample]['control'][par_name][color],
                                parameters[sample][ 'siRNA' ][par_name][color],
                                roundDigits=3, alpha=alpha, verbose=True)
            print("\nBrown-Forsythe test:")
            BrownForsythe((parameters[sample]['control'][par_name][color],
                           parameters[sample][ 'siRNA' ][par_name][color]),
                          alpha=alpha, verbose=True)
            print("********************")

In [None]:
samples = ("Huh7",)
par_names = ("mktl",)
colors = ("red", "green")
alpha = .05

for sample in samples:
    for par_name in par_names:
        for color in colors:
            print("********************\n{:9s}: {:5s}\n{:9s}: {:5s}\n{:9s}: {:5s}\n".format(
                    "Sample", sample, "Parameter", par_name, "Color", color))
            print("Wilcoxon-Mann-Whitney-U test:")
            WilcoxonMannWhitney(parameters[sample]['control'][par_name][color],
                                parameters[sample][ 'siRNA' ][par_name][color],
                                roundDigits=3, alpha=alpha, verbose=True)
            print("\nBrown-Forsythe test:")
            BrownForsythe((parameters[sample]['control'][par_name][color],
                           parameters[sample][ 'siRNA' ][par_name][color]),
                          alpha=alpha, verbose=True)
            print("********************")

## Playground for parameter correlations

In [None]:
# Make list of deltas and mktls
delta_sorted = []
mktl_sorted = []
onset_sorted = []
samples = set()
conditions = ["control", "siRNA"]

# Write relevant data in list
for i in range(len(R)):
    sample = D[i]['sample']
    condition = D[i]['condition']

    samples.add(sample)
    if condition not in conditions:
        conditions.append(condition)

    if want_clean:
        idx = eliminate_outliers(i)
    else:
        idx = np.ones((D[i]['nTraces']), dtype=np.bool_)

    delta_sorted.append({
        'green': R[i]['green']['params']['deltg'].values[idx],
        'red': R[i]['red']['params']['deltr'].values[idx],
        'sample': sample,
        'condition': condition
    })
    mktl_sorted.append({
        'green': R[i]['green']['params']['m_ktl'].values[idx],
        'red': R[i]['red']['params']['m_ktl'].values[idx],
        'sample': sample,
        'condition': condition
    })
    onset_sorted.append({
        'green': R[i]['green']['params']['tg'].values[idx],
        'red': R[i]['red']['params']['tr'].values[idx],
        'sample': D[i]['sample'],
        'condition': D[i]['condition']
    })

params = {}
for sample in samples:
    lbl = "{}".format(sample)
    params[sample] = {}

    for condition in conditions:
        params[sample][condition] = {'delta': {}, 'mktl': {}, 't0': {}}

        # Get the indices for this property combination
        idx = [i for i in range(len(delta_sorted))
               if delta_sorted[i]['sample'] == sample
               and delta_sorted[i]['condition'] == condition]

        # Merge corresponding datasets
        params[sample][condition]['delta']['red'] = \
            np.concatenate([delta_sorted[i]['red'] for i in idx])
        params[sample][condition]['delta']['green'] = \
            np.concatenate([delta_sorted[i]['green'] for i in idx])

        params[sample][condition]['mktl']['red'] = \
            np.concatenate([mktl_sorted[i]['red'] for i in idx])
        params[sample][condition]['mktl']['green'] = \
            np.concatenate([mktl_sorted[i]['green'] for i in idx])

        params[sample][condition]['t0']['red'] = \
            np.concatenate([onset_sorted[i]['red'] for i in idx])
        params[sample][condition]['t0']['green'] = \
            np.concatenate([onset_sorted[i]['green'] for i in idx])

In [None]:
# Calculate Pearson correlation coefficient and save to XLSX
samples = set(params.keys())
conditions = set()
par_names = set()
colors = ['red', 'green']
corr_lin_col_name = "Pearson (linear)"
corr_log_col_name = "Pearson (logarithmic)"

# Collect keys
for k1 in samples:
    conditions.update(params[k1].keys())
    for k2 in conditions:
        par_names.update(params[k1][k2].keys())

# Initialize table
# Note: the `labels` argument of MultiIndex is called `codes` in pandas >= 0.24
muli = pd.MultiIndex(names=("Sample", "Condition", "Parameter"),
                     levels=[[], [], []], codes=[[], [], []])
par_corr_tab = pd.DataFrame(index=muli,
                            columns=(corr_lin_col_name,corr_log_col_name),
                            dtype=np.float64)

# Populate table
for k1, d1 in params.items():
    for k2, d2 in d1.items():
        for k3, d3 in d2.items():
            log_idx = np.logical_and(d3['green'] > 0, d3['red'] > 0)
            d3_log_green = np.log10(d3['green'][log_idx])
            d3_log_red = np.log10(d3['red'][log_idx])

            r_pearson_lin = ss.pearsonr(d3['green'], d3['red'])[0]
            r_pearson_log = ss.pearsonr(d3_log_green, d3_log_red)[0]

            print("{:4s} {:7s} {:5s}: pearson correlation: lin={: 5.3f} log={: 5.3f}".format(
                k1, k2, k3, r_pearson_lin, r_pearson_log))
            par_corr_tab.loc[(k1, k2, k3), (corr_lin_col_name, corr_log_col_name)] = \
                (r_pearson_lin, r_pearson_log)

# Save to Excel file; the context manager writes and closes the file
xlsx_file = getOutpath("CORR-COEFF" + ".xlsx")
with pd.ExcelWriter(xlsx_file, engine='xlsxwriter') as xlsx_writer:
    par_corr_tab.to_excel(xlsx_writer, sheet_name="Correlation")

In [None]:
par_corr_tab
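
The table lists two coefficients because the fitted parameters can span several orders of magnitude: a few large values can dominate the linear Pearson coefficient, whereas the coefficient of the log-transformed values weights all decades equally. The following is a minimal sketch on synthetic data; the arrays `green` and `red` are made up for illustration, and `np.corrcoef` is used here in place of `ss.pearsonr` (both yield Pearson's r).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic, log-normally distributed "green" values and a noisy "red" partner
green = rng.lognormal(mean=0.0, sigma=1.5, size=500)
red = green * rng.lognormal(mean=0.0, sigma=0.5, size=500)

# Only strictly positive pairs can be log-transformed
pos = np.logical_and(green > 0, red > 0)

r_lin = np.corrcoef(green, red)[0, 1]
r_log = np.corrcoef(np.log10(green[pos]), np.log10(red[pos]))[0, 1]
print("lin={: 5.3f} log={: 5.3f}".format(r_lin, r_log))
```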

In [None]:
# Kolmogorov-Smirnov test
sample = "Huh7"
res_red = ss.ks_2samp(params[sample]["control"]["mktl"]["red"],
                      params[sample]["siRNA"]["mktl"]["red"])
res_green = ss.ks_2samp(params[sample]["control"]["mktl"]["green"],
                        params[sample]["siRNA"]["mktl"]["green"])
print(res_red.pvalue)
print(res_green.pvalue)

## $\chi^2$ analysis

In [None]:
i = 4 # 4 or 9
r = R[i]
d = D[i]
print(getDataLabel(d))

# Raw traces
dr = d['rfp']
dg = d['gfp']

# Fitted traces are calculated in the next cell

# Chi-square (sum of squared residuals)
cqr = r['red']['chisq']
cqg = r['green']['chisq']

# Logarithm of chi-square
lcqr = np.log10(cqr)
lcqg = np.log10(cqg)

# Maxima of raw traces
mr = dr.max(axis=0)
mg = dg.max(axis=0)

# Logarithm of chi-square relative to the trace maximum
qmr = np.log10(cqr / mr)
qmg = np.log10(cqg / mg)

In [None]:
# Calculate fits
pr = r['red']['params']
pg = r['green']['params']

fr = np.empty_like(dr)
fg = np.empty_like(dg)

for j in range(d['nTraces']):
    fr[:,j] = red_p.eval(**pr.iloc[j,:].to_dict(), t=d['t'])
    fg[:,j] = green_p.eval(**pg.iloc[j,:].to_dict(), t=d['t'])

# Logarithm of the sum of squared residuals, each normalized by the trace maximum
lnrr = np.log10(np.sum(((fr - dr) / mr)**2, axis=0))
lnrg = np.log10(np.sum(((fg - dg) / mg)**2, axis=0))
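
The quantities `lnrr` and `lnrg` divide every residual by the maximum of its raw trace before squaring, so that bright and dim cells become comparable. The following is a small sketch with made-up arrays (time along axis 0, one column per trace, matching the layout used above); `data` and `fit` are synthetic.

```python
import numpy as np

# Two synthetic traces with identical shape but 10x different brightness
t = np.linspace(0, 1, 5)
data = np.column_stack((t, 10 * t))
fit = data * 1.1  # a "fit" that is 10 % too high everywhere

# The plain sum of squared residuals grows with brightness ...
chisq = np.sum((fit - data)**2, axis=0)

# ... while the max-normalized version is the same for both traces
chisq_rel = np.sum(((fit - data) / data.max(axis=0))**2, axis=0)

print(chisq)      # second entry is about 100 times the first
print(chisq_rel)  # both entries are about equal
```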

In [None]:
f, ax = plt.subplots(1, 2)
ax[0].hist(qmr, color='r', bins=80)
ax[1].hist(qmg, color='g', bins=80)
plt.show(f)
plt.close(f)

In [None]:


for j in range(D[i]['nTraces']):
    #print("{:03d}: {:15f} {:15f}".format(j, lcqr[j], lcqg[j]))
    print("{:03d}: {:15f} {:15f}".format(j, qmr[j], qmg[j]))

In [None]:
print(np.flatnonzero(qmr >= 3.6))
print(np.flatnonzero(qmg >= 3.6))

In [None]:
j = 166
plt.plot(d['t'], fr[:,j], d['t'], dr[:,j])

In [None]:
j = 237
plt.plot(d['t'], fg[:,j], d['t'], dg[:,j])

In [None]:
f, ax = plt.subplots(1, 2)
ax[0].plot(mr, lcqr, '.r', ms=1.5)
ax[1].plot(mg, lcqg, '.g', ms=1.5)

plt.show(f)
plt.close(f)

## Parameter distributions

In [None]:
# Plot violin distributions of the data sets (both separate and combined)
pn_both = ('m_ktl',)
pn_red = ('tr', 'kmr', 'betr', 'deltr', 'offr')
pn_green = ('tg', 'kmg', 'betg', 'deltg', 'offg')

grid = (2, len(pn_both)+max(len(pn_red), len(pn_green)))

with PdfPages(os.path.join(getOutpath(), '{:s}_parameter_distributions.pdf'.format(getTimeStamp()))) as pdf:
    for ds in range(len(D)):

        par_kde = {}
        fit_types = []

        # Check for separate fit
        if 'red' in R[ds] and 'green' in R[ds]:
            hasSeparate = True
            fit_types += ['red', 'green']
        else:
            hasSeparate = False

        # Check for combined fit
        par_kde_combined = {}
        if 'combined' in R[ds]:
            hasCombined = True
            fit_types += ['combined']
        else:
            hasCombined = False

        # Calculate parameter distributions
        for t in fit_types:
            par_kde[t] = parameter_KDE(R[ds][t]['params'])

        # Plot parameter distributions
        for typeName, hasType in zip(('separate', 'combined'), (hasSeparate, hasCombined)):
            if not hasType:
                continue

            fig = plt.figure()
            gs = GridSpec(grid[0], grid[1])

            if typeName == 'combined':
                # Combined fit; define specific settings
                pn_green_temp = pn_green
                pn_red_temp = pn_red
                offset_both = len(pn_both)
                kde_label_green = 'combined'
                kde_label_red = 'combined'

                # Plot combined parameters
                for pi, label in enumerate(pn_both):
                    ax = plt.subplot(gs.new_subplotspec((0, pi), rowspan=2))
                    data = par_kde['combined'][label]
                    clr_face = '#0000ff55'
                    #clr_edge = '#000099ff'
                    plot_kde(ax, data, label, clr_face)
            else:
                # Separate fit; define specific settings
                pn_green_temp = pn_both + pn_green
                pn_red_temp = pn_both + pn_red
                offset_both = 0
                kde_label_green = 'green'
                kde_label_red = 'red'

            # Plot green parameters
            for pi, par_label in enumerate(pn_green_temp):
                ax = plt.subplot(gs.new_subplotspec((0, pi+offset_both)))
                data = par_kde[kde_label_green][par_label]
                clr_face = '#00ff0055'
                #clr_edge = '#009900ff'
                plot_kde(ax, data, par_label, clr_face)

            # Plot red parameters
            for pi, par_label in enumerate(pn_red_temp):
                ax = plt.subplot(gs.new_subplotspec((1, pi+offset_both)))
                data = par_kde[kde_label_red][par_label]
                clr_face = '#ff000055'
                #clr_edge = '#990000ff'
                plot_kde(ax, data, par_label, clr_face)

            # Show and close figure
            fig.suptitle(getDataLabel(D[ds]) + " (" + typeName + " fit)")
            fig.tight_layout(pad=0, rect=(0, 0, 1, .93))
            pdf.savefig(fig, bbox_inches='tight')
            plt.show(fig)
            plt.close(fig)

## Playground
This section contains code that was/is used for developing ideas.

In [None]:
# Plot parameter correlations

# Get parameters to be correlated
par_cor = (('tr', 'tg'), ('m_ktl', 'm_ktl'), ('kmr', 'kmg'),
           ('betr', 'betg'), ('deltr', 'deltg'), ('offr', 'offg'))

for i, r in enumerate(R):
    with PdfPages(os.path.join(getOutpath(), '{:s}_{}_parameter_correlations.pdf'.format(getTimeStamp(), getDataLabel(D[i], True)))) as pdf:
        for pr, pg in par_cor:
            # Get parameter values
            valr = r['red']['params'].loc[:,pr].values
            valg = r['green']['params'].loc[:,pg].values

            # Sort out outliers
            idx1 = np.ones(np.size(valr), dtype=np.bool_)
            #isr = valr.argsort()[-2:]
            #isg = valg.argsort()[-2:]
            #if valr[isr[0]] < 0.9 * valr[isr[1]]:
            #    idx1[isr[1]] = False
            #if valg[isg[0]] < 0.9 * valg[isg[1]]:
            #    idx1[isg[1]] = False

            # Plot
            fig = plt.figure(figsize=(12,4))
            tit = "{}\nCorrelation {} – {}".format(getDataLabel(D[i]), pr, pg)
            ax1 = fig.add_subplot(131)
            ax1.set_xscale('log')
            ax1.set_yscale('log')
            ax1.plot(valr[idx1], valg[idx1], '.')

            ax1.set_autoscale_on(False)
            lmt = np.array([ax1.get_xlim(), ax1.get_ylim()])
            diag = (lmt[:,0].max(), lmt[:,1].min())
            ax1.plot(diag, diag, '-k')

            ax1.set_xlabel(pr, color='r')
            ax1.set_ylabel(pg, color='g')
            ax1.set_title(tit)

            fig.tight_layout(pad=0)
            plt.show(fig)
            pdf.savefig(fig)
            plt.close(fig)

In [None]:
# Plot the parameter distributions for the datasets
ds_keys = list(R.keys())
ds_keys.sort()
params = R[ds_keys[0]]['combined']['params'].columns
grid = (len(params), len(ds_keys))
i_col = 0

pdffile = os.path.join(getOutpath(), '{:s}_parameters.pdf'.format(getTimeStamp()))
with PdfPages(pdffile) as pdf:
    fig = plt.figure()
    fig.set_figheight(grid[0] * .8 * fig.get_figheight())
    fig.set_figwidth(grid[1] * .8 * fig.get_figwidth())

    for ds in ds_keys:
        i_row = 0
        for p in params:
            ax = plt.subplot2grid(grid, (i_row, i_col))
            ax.hist(R[ds]['combined']['params'][p], bins=100)
            if i_row == grid[0] - 1:
                ax.set_xlabel('Value [a.u.]')
            if i_col == 0:
                ax.set_ylabel('Occurrences [#]')
            ax.set_title('{:s}: {:s}'.format(ds, p))
            i_row += 1
        i_col += 1

    pdf.savefig(fig)
    plt.show(fig)
    plt.close(fig)

In [None]:
# Plot onset time correlations
pdffile = os.path.join(getOutpath(), '{:s}_onset_correlations.pdf'.format(getTimeStamp()))
with PdfPages(pdffile) as pdf:
    for k in R.keys():
        fig = plt.figure()
        plt.plot([0, 30], [0, 30], 'k-')
        plt.plot(R[k]['combined']['params']['tr'], R[k]['combined']['params']['tg'], '.')
        plt.xlabel('Onset RFP [h]')
        plt.ylabel('Onset GFP [h]')
        plt.title(k)
        pdf.savefig(fig)
        plt.show()
        plt.close()
    

In [None]:
# Degradation rate ratio
def plotHistograms(maxH):
    Rkeys = sorted(R.keys())
    for ds in Rkeys:
        #deltg = R[ds]['green']['params']['deltg']
        #deltr = R[ds]['red']['params']['deltr']
        deltg = R[ds]['combined']['params']['deltg']
        deltr = R[ds]['combined']['params']['deltr']
        quot = deltg / deltr

        fig = plt.figure()
        plt.hist(quot, bins=150, range=(0, maxH))
        plt.title(ds)
        # Raw string so that the LaTeX backslashes are not read as escape sequences
        plt.xlabel(r'$\delta_\mathrm{green} / \delta_\mathrm{red}$ [a.u.]')
        plt.ylabel('Occurrences [#]')
        plt.show()
        plt.close(fig)

wdg.interact(plotHistograms, maxH=wdg.IntSlider(
    value=100, min=0, max=1000, step=10, description='Histogram maximum', continuous_update=False));

In [None]:
# Fit distribution to degradation rate quotient histograms
def gamma(x, p=2, b=1, s=10):
    return s * b**p * x**(p-1) * np.exp(-b * x) / sc.special.gamma(p)

def gamma2(x, p1=1.9, p2=2.1, b1=0.9, b2=1.1, s1=10, s2=10):
    return gamma(x, p1, b1, s1) + gamma(x, p2, b2, s2)

def weibull(x, lmbd=.2, k=2, s=10):
    return s * lmbd * k * (lmbd * x)**(k - 1) * np.exp(- (lmbd * x)**k)

def weibull2(x, lmbd1=.15, lmbd2=.25, k1=1.9, k2=2.1, s1=10, s2=10):
    return weibull(x, lmbd=lmbd1, k=k1, s=s1) + weibull(x, lmbd=lmbd2, k=k2, s=s2)

# Define models
model_gamma = lm.Model(gamma)
model_gamma.set_param_hint(name='p', min=.01)
model_gamma.set_param_hint(name='b', min=.01)
model_gamma.set_param_hint(name='s', min=1)

model_gamma2 = lm.Model(gamma2)
model_gamma2.set_param_hint(name='p1', min=.01)
model_gamma2.set_param_hint(name='p2', min=.01)
model_gamma2.set_param_hint(name='b1', min=.01)
model_gamma2.set_param_hint(name='b2', min=.01)
model_gamma2.set_param_hint(name='s1', min=1)
model_gamma2.set_param_hint(name='s2', min=1)

model_weibull = lm.Model(weibull)
model_weibull.set_param_hint(name='lmbd', min=.001)
model_weibull.set_param_hint(name='k', min=.001, max=5)
model_weibull.set_param_hint(name='s', min=1)

model_weibull2 = lm.Model(weibull2)
model_weibull2.set_param_hint(name='lmbd1', min=.001)
model_weibull2.set_param_hint(name='lmbd2', min=.001)
model_weibull2.set_param_hint(name='k1', min=.001, max=5)
model_weibull2.set_param_hint(name='k2', min=.001, max=5)
model_weibull2.set_param_hint(name='s1', min=1)
model_weibull2.set_param_hint(name='s2', min=1)

maxH = 40

with PdfPages(os.path.join(getOutpath(), '{:s}_degradation_distribution.pdf'.format(getTimeStamp()))) as pdf:
    for ds in sorted(R.keys()):
        # Calculate degradation rate quotient
        deltg = R[ds]['combined']['params']['deltg']
        deltr = R[ds]['combined']['params']['deltr']
        quot = deltg / deltr

        # Create histogram
        fig = plt.figure()
        ax = fig.add_subplot(1, 2, 1)
        hist_val, hist_edg = ax.hist(quot, bins=70, range=(0, maxH), label='Histogram')[:2]
        hist_ctr = (hist_edg[:-1] + hist_edg[1:]) / 2

        # Fit models
        result_g = model_gamma.fit(hist_val, x=hist_ctr)
        result_g2 = model_gamma2.fit(hist_val, x=hist_ctr)
        result_w = model_weibull.fit(hist_val, x=hist_ctr)
        result_w2 = model_weibull2.fit(hist_val, x=hist_ctr)

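        # Select models: keep the two-component variant only if it
        # lowers chi-square by more than 30 % relative to the single component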
        #print('gamma: {}'.format(result_g.chisqr))
        #print('gamma2: {}'.format(result_g2.chisqr))
        #print('weibull: {}'.format(result_w.chisqr))
        #print('weibull2: {}'.format(result_w2.chisqr))

        if result_g2.chisqr < .7 * result_g.chisqr:
            res_g = result_g2
            name_g = 'gamma2'
        else:
            res_g = result_g
            name_g = 'gamma'

        if result_w2.chisqr < .7 * result_w.chisqr:
            res_w = result_w2
            name_w = 'weibull2'
        else:
            res_w = result_w
            name_w = 'weibull'

        # Plot models (evaluated at the histogram bin centers)
        ax.plot(hist_ctr, res_g.best_fit, '-', label=name_g, color='orange')
        ax.plot(hist_ctr, res_w.best_fit, '-', label=name_w, color='magenta')
        ax.legend()
        ax.set_xlabel(r'$\delta_\mathrm{green} / \delta_\mathrm{red}$ [a.u.]')
        ax.set_ylabel('Counts [#]')
        ax.set_title(ds)

        # Print fit reports
        rep = res_g.fit_report(show_correl=False) + '\n' + res_w.fit_report(show_correl=False)
        ax = fig.add_subplot(1, 2, 2)
        ax.set_axis_off()
        ax.text(0, 1, rep, ha='left', va='top', family='monospace', size=5.5)

        # Display, save and close figure
        plt.show()
        pdf.savefig(fig)
        plt.close(fig)
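
The fit functions above are probability densities multiplied by a scale factor `s`, so the area under each fitted curve equals `s`. Because the models are fitted to raw bin counts, a single-component fit that describes the whole histogram should yield `s` close to the number of counted cells times the bin width. The sketch below checks the normalization numerically with the standard library only; the helper `trapezoid` and the cell count `n_cells` are hypothetical and serve only as illustration.

```python
import math


def gamma_scaled(x, p=2.0, b=1.0, s=10.0):
    """Gamma density (shape p, rate b) multiplied by the scale s."""
    return s * b**p * x**(p - 1) * math.exp(-b * x) / math.gamma(p)


def weibull_scaled(x, lmbd=0.2, k=2.0, s=10.0):
    """Weibull density (rate lmbd, shape k) multiplied by the scale s."""
    return s * lmbd * k * (lmbd * x)**(k - 1) * math.exp(-(lmbd * x)**k)


def trapezoid(f, lo, hi, n=20000):
    """Composite trapezoidal rule for f on [lo, hi] with n intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


# With the default parameters both tails are negligible beyond x = 60,
# so both areas should be close to the default scale s = 10.
area_gamma = trapezoid(gamma_scaled, 0.0, 60.0)
area_weibull = trapezoid(weibull_scaled, 0.0, 60.0)
print(area_gamma, area_weibull)

# Expected scale for a histogram with 70 bins on (0, 40), as in the cell above.
n_cells = 500            # hypothetical number of cells inside the histogram range
bin_width = 40 / 70
expected_s = n_cells * bin_width
print(expected_s)
```

A fitted `s` far from this expectation suggests that the model captures only part of the histogram, for example a single peak of a bimodal distribution.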

In [None]:
# Scatter plot of degradation rates
Rkeys = sorted(R.keys())
for ds in Rkeys:
    deltg = R[ds]['combined']['params']['deltg']
    deltr = R[ds]['combined']['params']['deltr']

    fig = plt.figure()
    plt.plot(deltg, deltr, '.')
    plt.title(ds)
    # Raw strings so that the LaTeX backslashes are not read as escape sequences
    plt.xlabel(r'$\delta_\mathrm{green}$ [a.u.]')
    plt.ylabel(r'$\delta_\mathrm{red}$ [a.u.]')
    plt.xscale('log')
    plt.yscale('log')
    plt.show()
    plt.close(fig)