# Project EA P2

# Simulations of the model with telomerase

## References:
1. The asymmetry of telomere replication contributes to replicative senescence heterogeneity
2. The Length of the Shortest Telomere as the Major Determinant of the Onset of Replicative Senescence

Our goal here is to determine the parameters $\beta$ and $L_0$ of the distribution to fit the following empirical statistical parameters:
+ [mode](https://fr.wikipedia.org/wiki/Mode_(statistiques)) = 260 (taken from [1])
+ [skewness](https://fr.wikipedia.org/wiki/Asym%C3%A9trie_(statistiques)) = 0.73 (taken from [2])

We then simulate random draws from this distribution to compare the sizes of the smallest telomeres, as done in [2].
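
Both targets can be read directly off a discrete distribution. As a minimal sketch (the function name `mode_and_skewness` and the toy distribution are illustrative assumptions, not part of the model), the mode is the support point with the largest probability and the skewness is the third standardized moment:

```python
def mode_and_skewness(support, pmf):
    """Return (mode, skewness) of a discrete distribution.

    `support` and `pmf` are equal-length sequences and `pmf` sums to 1.
    """
    best = max(range(len(pmf)), key=lambda i: pmf[i])   # first index of the maximum
    mode = support[best]
    mean = sum(x * q for x, q in zip(support, pmf))
    var = sum((x - mean) ** 2 * q for x, q in zip(support, pmf))
    third = sum((x - mean) ** 3 * q for x, q in zip(support, pmf))
    return mode, third / var ** 1.5


# Toy distribution with a longer right tail, hence a positive skewness
toy_support = [0, 1, 2, 3]
toy_pmf = [0.1, 0.4, 0.3, 0.2]
toy_mode, toy_skewness = mode_and_skewness(toy_support, toy_pmf)
print("mode:", toy_mode, "skewness:", round(toy_skewness, 3))
```

A symmetric distribution gives a skewness of zero, so the target of 0.73 asks for a distribution whose right tail is heavier than its left one.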

In [1]:
from numpy import random as npr
import random
import numpy as np
from scipy import stats as scs
from ipywidgets import FloatSlider, IntSlider, Text, IntProgress, VBox, HBox
from bqplot import pyplot as plt

In [2]:
# Parameters from the model
s = 3.5
p = 0.026
L_0 = 430
beta = 1
n = 1700    # maximal length allowed for one telomere (>> mode length, which is around L_0)

In [3]:
# gives the probability of action of telomerase as a function of telomere length
def P(L, L_0, beta):
    if L >= L_0:
        return 1./(1 + beta*(L-L_0))
    else:
        return 1
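
As a quick illustration of the rule above, the following sketch evaluates the probability at a few lengths around $L_0$. The function is a renamed copy of `P` (the name `telomerase_probability` is ours) so that the block runs on its own, and the lengths are arbitrary.

```python
def telomerase_probability(L, L_0, beta):
    """Same rule as P: telomerase always acts below L_0, then the probability decays."""
    if L >= L_0:
        return 1.0 / (1.0 + beta * (L - L_0))
    return 1.0


# Arbitrary lengths around L_0 = 430, with beta = 1 as in the parameters above
for length in (400, 430, 431, 440, 530):
    print(length, telomerase_probability(length, L_0=430, beta=1))
```

With $\beta = 1$ the probability already drops to one half a single unit above $L_0$, so telomeres are rarely elongated once they exceed $L_0$.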

In [4]:
# one step of the Markov chain: shortening by 2*s, plus a geometric elongation if telomerase acts
def evolution(L, L_0, beta):
    prob = P(L, L_0, beta)
    x = random.random()
    if x >= prob:
        return L - 2*s
    else:
        return L - 2*s + 2*npr.geometric(p=p)

**Remark**: we slightly modified the equations found in the paper to map the state space from half-integers to integers.
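
Reading `evolution` against this remark, the change of state space amounts to doubling every length: a shortening of `s = 3.5` becomes 7 index units and an elongation of `G` becomes `2*G`. The helpers below (`to_index`, `to_length`, `step_in_index_units`) are hypothetical names used only to make that bookkeeping explicit, and the uniform elongation is a stand-in for the geometric draw.

```python
import random

SHORTENING = 3.5   # shortening per division in half-integer units (s above)


def to_index(length):
    """Map a half-integer length to its integer state (length * 2)."""
    doubled = 2 * length
    if doubled != int(doubled):
        raise ValueError("length must be a multiple of 0.5")
    return int(doubled)


def to_length(index):
    """Inverse map: integer state back to a half-integer length."""
    return index / 2


def step_in_index_units(index, elongation):
    """One division: lose 2 * 3.5 = 7 index units, gain 2 * elongation."""
    return index - to_index(SHORTENING) + 2 * elongation


rng = random.Random(0)
state = to_index(225.0)
for _ in range(10):
    # elongation 0 stands for "telomerase did not act" (stand-in for the geometric draw)
    state = step_in_index_units(state, rng.randint(0, 5))
print(state, to_length(state))
```

Every step changes the index by an odd integer, so the chain never leaves the integers.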

In [5]:
# an example of trajectory
L = [450]
n_ex = 100
for k in range(n_ex):
    L.append(evolution(L[-1], L_0, beta))
plt.figure(2, title='Example of trajectory')
plt.clear()
plt.plot(range(n_ex+1), L, labels=['One trajectory'])
plt.plot([1, n_ex+1], [L_0, L_0], colors=['red'], labels=['L_0'])
plt.legend()
plt.show()

## Approximation of the distribution using simulation of the Markov chain

Here we compute the distribution by a Monte Carlo method, that is, by simulating many independent draws of the Markov chain. This method is quite inefficient, as convergence is very slow. It is still worth testing, and the approximate distribution found this way can be used as a starting point for the next method (see below).
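
To make the slow convergence concrete, here is a standard-library sketch of the same idea: one long chain is thinned, and the mean is reported with its standard error. The thinning interval, sample count and helper names are assumptions chosen to keep the example fast, and the error bar is only meaningful if the thinned samples are close to independent, which this sketch does not check.

```python
import math
import random


def geometric(rng, p):
    """Sample G >= 1 with P(G = k) = p * (1 - p)**(k - 1) by inverse transform."""
    u = 1.0 - rng.random()                      # uniform on (0, 1]
    return 1 + int(math.log(u) / math.log(1.0 - p))


def step(L, rng, L_0=430, beta=1.0, p=0.026):
    """One division in integer (doubled) units, mirroring `evolution`."""
    prob = 1.0 if L < L_0 else 1.0 / (1.0 + beta * (L - L_0))
    L -= 7                                      # 2 * s with s = 3.5
    if rng.random() < prob:
        L += 2 * geometric(rng, p)
    return L


def thinned_samples(n_samples, thin, rng, start=450):
    """Run one chain and keep every `thin`-th state."""
    L, out = start, []
    for _ in range(n_samples):
        for _ in range(thin):
            L = step(L, rng)
        out.append(L)
    return out


rng = random.Random(0)
samples = thinned_samples(n_samples=400, thin=50, rng=rng)
mean = sum(samples) / len(samples)
std = math.sqrt(sum((x - mean) ** 2 for x in samples) / (len(samples) - 1))
print("mean estimate:", mean, "+/-", std / math.sqrt(len(samples)))
```

For independent samples the standard error decreases like $1/\sqrt{N}$, so each additional digit of precision costs roughly a hundred times more draws.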

In [6]:
# computes histogram from many independent draws from the Markov chain
L = 450
L_realisation = []
n_realisation = 10**4
len_buffer = 500
for i in range(n_realisation):
    for k in range(len_buffer):
        L = evolution(L, L_0, beta)
    L_realisation.append(L)
plt.figure(3, title='Histogram of the distribution of the Markov chain')
plt.clear()
plt.hist(L_realisation, bins=int(4*(n_realisation)**(1/3)), range=(0, max(L_realisation)))
plt.show()

In [7]:
# compute a curve from the histogram
plt.figure(4, title='Distribution computed from the histogram')
plt.clear()
hist, bin_edges = np.histogram(L_realisation, bins=int(4*(n_realisation)**(1/3)), range=(0., 1400.), density=True)
plt.plot([0.5*(bin_edges[i]+bin_edges[i+1]) for i in range(len(bin_edges)-1)], hist)
plt.show()

In [8]:
# Empirical computation of the mode and the skewness
print('Skewness:' + str(scs.skew(L_realisation)))
print('Mode:' + str(bin_edges[np.argmax(hist)-1: np.argmax(hist)+1]))

Skewness:1.8553971153463187
Mode:[ 439.53488372  455.81395349]


## Iterative computation of the distribution

We compute the distribution iteratively with the following formula, where $\Pi_n$ denotes the distribution at the $n$-th iteration. At equal computational cost, this method yields more precise results than the previous one. We test two different initializations: a piecewise constant function, and the approximate distribution found above (warm start). In the latter case, convergence is quicker (40 iterations are enough), which is expected since we start closer to the true stationary distribution.

$$\Pi_{n+1}(i)= \Pi_{n}(i+3.5)\,(1-P(i+3.5))+ \sum_{k=1}^{i+3.5}p (1-p)^{k-1}P(i+3.5-k)\, \Pi_{n}(i+3.5-k)$$
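
One way to sanity-check the formula is to compare it with the same step written from the point of view of the source state. The sketch below works in the doubled integer units used by the code (so $i+3.5$ becomes `i + 7` and an elongation of $k$ becomes `2 * k`) on a deliberately small state space. `pull_update`, `push_update` and the toy parameters are illustrative names and values, not taken from the papers.

```python
def P(k, L_0, beta):
    """Probability that telomerase acts on a telomere in state k."""
    return 1.0 if k < L_0 else 1.0 / (1.0 + beta * (k - L_0))


def pull_update(pi, L_0, beta, p):
    """Formula above: state i collects mass from i+7 and from i+7-2k, k >= 1."""
    n = len(pi)
    new = [0.0] * n
    for i in range(n - 7):
        total = pi[i + 7] * (1.0 - P(i + 7, L_0, beta))
        k = 1
        while i + 7 - 2 * k >= 0:
            j = i + 7 - 2 * k
            total += p * (1.0 - p) ** (k - 1) * P(j, L_0, beta) * pi[j]
            k += 1
        new[i] = total
    return new


def push_update(pi, L_0, beta, p):
    """Same step from the source state j: send its mass to j-7 or to j-7+2k."""
    n = len(pi)
    new = [0.0] * n
    for j, mass in enumerate(pi):
        act = P(j, L_0, beta)
        if j - 7 >= 0:
            new[j - 7] += mass * (1.0 - act)
        k = 1
        while j - 7 + 2 * k < n - 7:            # stay inside the truncated range
            target = j - 7 + 2 * k
            if target >= 0:
                new[target] += mass * act * p * (1.0 - p) ** (k - 1)
            k += 1
    return new


# Toy setting: 40 states, uniform initial mass on states 10..29
pi0 = [1.0 / 20 if 10 <= k < 30 else 0.0 for k in range(40)]
pulled = pull_update(pi0, L_0=15, beta=0.2, p=0.3)
pushed = push_update(pi0, L_0=15, beta=0.2, p=0.3)
print(max(abs(a - b) for a, b in zip(pulled, pushed)))
```

The two versions agree up to rounding. The total mass after one step is slightly below 1 because of the truncation of the state space, which is why the iteration renormalizes after every step.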

In [9]:
# useful for quick computation
def precompute_arrays(L_0, beta):
    geom = np.array([p*(1-p)**k for k in range(n)])
    probas = np.array([P(k, L_0, beta) for k in range(n)])
    return geom, probas

In [10]:
# one iteration from the distribution computation
def iter_distr(pi, L_0, beta, geom, probas):
    # the last 7 entries are left at zero (truncation of the state space)
    new_pi = np.zeros(len(pi))
    for i in range(len(pi)-7):
        # no elongation from state i+7, plus elongation by 2*k from states i+7-2*k (k >= 1)
        new_pi[i] = pi[i+7]*(1-probas[i+7]) + np.sum(geom[:int(i/2.+3.5)]*probas[i+5::-2]*pi[i+5::-2])
    return new_pi

In [11]:
# Many iterations for the distribution computation + graph plot + skewness and mode computation
def evol_distr(n_iter, pi, L_0, beta, verbose=True):
    geom, probas = precompute_arrays(L_0, beta)
    pi2 = np.copy(pi)
    for k in range(n_iter):
        pi2 = iter_distr(pi2, L_0, beta, geom, probas)
        pi2 = pi2 / np.sum(pi2)
    
    r = np.array(range(n))/2
    if verbose:
        plt.figure()
        plt.plot(r, pi2)
        plt.show()
    mode = r[np.argmax(pi2)]
    mean = np.sum(r*pi2)
    std = np.sqrt(np.sum(r**2*pi2)-mean**2)
    skewness = np.sum(((r-mean)/std)**3*pi2)
    
    if verbose:
        print('Mode: ' + str(mode))
        print('Skewness: ' + str(skewness))
    return pi2, mode, skewness
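
`evol_distr` runs a fixed number of iterations. A possible alternative, sketched here on a toy two-state chain rather than on the telomere model, is to stop once two successive iterates are close in total variation distance. The function names, the tolerance and the toy transition probabilities are all assumptions made for the illustration.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def iterate_until_converged(step, pi0, tol=1e-10, max_iter=10_000):
    """Apply `step`, renormalize, and stop when successive iterates are within `tol`."""
    pi = list(pi0)
    for n_done in range(1, max_iter + 1):
        new = step(pi)
        total = sum(new)
        new = [x / total for x in new]
        if total_variation(pi, new) < tol:
            return new, n_done
        pi = new
    return pi, max_iter


def toy_step(pi):
    """Two-state chain: state 0 stays with prob. 0.9, state 1 stays with prob. 0.5."""
    a, b = pi
    return [0.9 * a + 0.5 * b, 0.1 * a + 0.5 * b]


stationary, n_done = iterate_until_converged(toy_step, [1.0, 0.0])
print(stationary, "after", n_done, "iterations")
```

For this toy chain the stationary distribution is $(5/6, 1/6)$, which the loop recovers. For the telomere model, `step` would be a wrapper around `iter_distr`, and a suitable tolerance would need to be chosen by experiment.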

### Using a piecewise constant function as an initialization

In [22]:
pi = np.array([0.]*n)
for k in range(L_0, 2*L_0):
    pi[k] = 1./L_0
plt.figure(title='Initialization of the distribution: piecewise constant')
plt.plot(range(n), pi)
plt.show()

In [23]:
n_iter = 40
print('After 40 iterations:')
pi_final, m, skew = evol_distr(n_iter, pi, L_0, beta)  # not `s`: that name holds the shortening parameter used by evolution()

Mode: 219.5
Skewness: 1.82959136316


In [24]:
n_iter = 200
print('After 200 iterations:')
pi_final, m, skew = evol_distr(n_iter, pi, L_0, beta)

Mode: 218.5
Skewness: 1.86825163703


### Using the empirical distribution as an initialization

In [26]:
pi = np.array([0.]*n)
for k in range(n):
    i = 0
    while i<len(bin_edges)-1 and bin_edges[i] < k:
        i += 1
    pi[k] = hist[i-1]
# Renormalize to take into account small errors
# (no temporary named `s`, which would overwrite the shortening parameter used by evolution())
pi = pi / np.sum(pi)
plt.figure(title='Initialization of the distribution: empirical distribution')
plt.plot(range(n), pi)
plt.show()

In [27]:
n_iter = 40
print('After 40 iterations:')
pi_final, m, skew = evol_distr(n_iter, pi, L_0, beta)

Mode: 218.0
Skewness: 1.87223966963


In [28]:
n_iter = 200
print('After 200 iterations:')
pi_final, m, skew = evol_distr(n_iter, pi, L_0, beta)

Mode: 218.0
Skewness: 1.86824701755


## Determination of $\beta$ and $L_0$

We provide a graphical user interface to study the influence of $\beta$ and $L_0$ on the statistical parameters of the distribution (mode and skewness). The following values of $\beta$ and $L_0$ seem to best fit the target values given at the beginning:

+ $\beta = 0.0236$
+ $L_0 = 101$

When the user changes the value of $\beta$ or $L_0$, the stationary distribution is recomputed starting from the previous distribution (hot restart). The `n_iter` slider controls the number of iterations performed at each parameter update. Consider increasing it if, for instance, the distribution has not yet converged at the end of the computation.

The first computation is longer (800 iterations) to ensure that we start from a correct stationary distribution.

*Remark*: the values of $\beta$ and $L_0$ are close to those found in [2] (they find $\beta = 0.045$ and $L_0 = 90$, but we rescaled our Markov chain to take integer values instead of half-integers, which explains why our $\beta$ is approximately half as large).
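
When comparing candidate pairs by hand with the sliders, it can help to reduce the two targets to a single number. The score below is a hypothetical choice (sum of squared relative errors with equal weights) and is not the criterion used to obtain the values above. The two test points are the mode and skewness printed elsewhere in this notebook for the final parameters and for the initial ones.

```python
def fit_error(mode, skewness, target_mode=260.0, target_skewness=0.73):
    """Sum of squared relative errors on the two targets (equal weights: an assumption)."""
    mode_term = ((mode - target_mode) / target_mode) ** 2
    skewness_term = ((skewness - target_skewness) / target_skewness) ** 2
    return mode_term + skewness_term


# (mode, skewness) pairs reported in this notebook
error_fitted = fit_error(260.5, 0.7298)     # beta = 0.0236, L_0 = 101
error_initial = fit_error(218.0, 1.868)     # beta = 1, L_0 = 430
print(error_fitted, error_initial)
```

A smaller score means a better fit. Relative errors are used so that the mode, which is several hundred times larger than the skewness, does not dominate the score.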

In [31]:
# GUI construction
slider_beta = FloatSlider(min=0.02, max=0.03, value=0.0236, step=0.0001, readout_format='.4f', description='beta:', continuous_update=False)
slider_L_0 = IntSlider(min=80, max=150, value=101, step=1, description='L_0:', continuous_update=False)
slider_n_iter = IntSlider(min=100, max=1500, value=300, step=50, description='n_iter:', continuous_update=False)
mode_output = Text(value='0.0', placeholder='Type something', description='Mode:', disabled=False)
skewness_output = Text(value='0.0', placeholder='Type something', description='Skewness:', disabled=False)

n_iter_initial = 800
progress_bar = IntProgress(value=0, min=0, max=slider_n_iter.value, step=1,description='Progress:', bar_style='success')
items = []
left_box = VBox([slider_beta, slider_L_0, slider_n_iter])
right_box = VBox([progress_bar, mode_output, skewness_output])
box = HBox([left_box, right_box])
display(box)

# Initialization of the distribution: piecewise constant
global pi
pi = np.array([0.]*n)
for k in range(L_0, 2*L_0):
    pi[k] = 1./L_0

plt.figure(1, title='Current distribution')
plt.show()

# Callback when the user changes one of the GUI values: the distribution is recomputed
def on_value_change(change, **kwargs):
    global pi
    beta = slider_beta.value
    L_0 = slider_L_0.value
    geom, probas = precompute_arrays(L_0, beta)
    
    r = np.array(range(n))/2
    
    if 'init' in kwargs:
        n_iter = n_iter_initial
    else:
        n_iter = slider_n_iter.value
    progress_bar.max = n_iter
    for k in range(n_iter):
        pi = iter_distr(pi, L_0, beta, geom, probas)
        pi = pi / np.sum(pi)
        if k % 50 == 0:
            plt.clear()
            plt.plot(r, pi)
        
            mode = r[np.argmax(pi)]
            mean = np.sum(r*pi)
            std = np.sqrt(np.sum(r**2*pi)-mean**2)
            skewness = np.sum(((r-mean)/std)**3*pi)
            mode_output.value = str(mode)
            skewness_output.value = str(skewness)
        progress_bar.value = k+1
        
slider_beta.observe(on_value_change, names='value')
slider_L_0.observe(on_value_change, names='value')
#slider_n_iter.observe(on_value_change, names='value')
on_value_change(None, init=True)

## About the size of the smallest telomeres

Here we draw samples from the stationary distribution found above. We want to study the mean and variance of the size of the smallest telomeres in a cell. As in [2], we find that the size of the smallest telomere has a higher variance than that of the next smallest ones.
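
The experiment can be sketched independently of the model: draw a fixed number of lengths per cell, sort them, and collect the smallest few across many cells. In the sketch below the Gaussian `toy_draw` is only a stand-in for the stationary distribution, and the function name and cell count are assumptions, so the printed numbers say nothing about telomeres.

```python
import random
import statistics


def smallest_k_per_cell(draw, n_cells, n_telomeres=32, k=4):
    """For each simulated cell, record its k smallest lengths, grouped by rank."""
    ranks = [[] for _ in range(k)]
    for _ in range(n_cells):
        lengths = sorted(draw() for _ in range(n_telomeres))
        for r in range(k):
            ranks[r].append(lengths[r])
    return ranks


rng = random.Random(0)


def toy_draw():
    """Stand-in distribution, NOT the stationary distribution of the model."""
    return rng.gauss(260.0, 60.0)


ranks = smallest_k_per_cell(toy_draw, n_cells=2000)
for r, values in enumerate(ranks, start=1):
    print(r, round(statistics.mean(values), 1), round(statistics.stdev(values), 1))
```

Because the lengths are sorted within each cell, the mean necessarily increases with the rank. How the spread changes with the rank depends on the lower tail of the distribution that is sampled.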

In [12]:
# Redefining the parameters with the values found above
L_0 = 101
beta = 0.0236

In [14]:
# Initialization
pi = np.array([0.]*n)
for k in range(L_0, 2*L_0):
    pi[k] = 1./L_0
    
# Computation of the stationary distribution
n_iter = 1000
print('Stationary distribution (one thousand iterations, takes a dozen seconds):')
pi_final, m, skew = evol_distr(n_iter, pi, L_0, beta)
cdf = np.cumsum(pi_final)
print('Cumulated sum of the stationary distribution:')
plt.figure()
plt.plot(np.arange(0, 850, step=0.5), cdf)
plt.show()

Stationary distribution (one thousand iterations, takes a dozen seconds):


Mode: 260.5
Skewness: 0.729789529549
Cumulated sum of the stationary distribution:


In [15]:
# Initialization of the simulation: a random variable following the above distribution
def initialization_from_distr():
    u = npr.rand()
    return np.where(cdf>u)[0][0] / 2.
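
`initialization_from_distr` is an inverse-CDF draw: a uniform number is mapped to the first index whose cumulative probability exceeds it, and that index is halved to get a length. The same lookup can be done by binary search. The sketch below uses the standard library on a three-point toy distribution, and `make_sampler` is a hypothetical helper.

```python
import bisect
import itertools
import random


def make_sampler(pmf, rng):
    """Return (index_for, sample) for inverse-CDF sampling from `pmf`."""
    cdf = list(itertools.accumulate(pmf))
    last = len(cdf) - 1

    def index_for(u):
        # first index whose cumulative probability exceeds u; the clamp guards
        # against a final cdf value slightly below 1 because of rounding
        return min(bisect.bisect_right(cdf, u), last)

    def sample():
        return index_for(rng.random())

    return index_for, sample


rng = random.Random(0)
index_for, sample = make_sampler([0.2, 0.5, 0.3], rng)
draws = [sample() for _ in range(10_000)]
print([draws.count(i) / len(draws) for i in range(3)])
```

The empirical frequencies should be close to the toy probabilities. The clamp matters in practice because `np.where(cdf>u)[0][0]` raises an `IndexError` when `u` exceeds the last value of `cdf`.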

In [57]:
# simulation of the distribution computed above
L_1 = []
L_2 = []
L_3 = []
L_4 = []
L_all = []

for k in range(n_realisation):
    list_lengths = [initialization_from_distr() for k in range(32)]
    L_all.extend(list_lengths)
    list_lengths = np.sort(list_lengths)
    L_1.append(list_lengths[0])
    L_2.append(list_lengths[1])
    L_3.append(list_lengths[2])
    L_4.append(list_lengths[3])

In [70]:
plt.figure(title='Distribution of telomeres')
plt.clear()
colors = ['red', 'magenta', 'blue', 'cyan', 'black']
labels=['smallest', '2nd smallest', '3rd smallest', '4th smallest', 'all']
i = 0
for L in [L_1, L_2, L_3, L_4, L_all]:
    hist, bin_edges = np.histogram(L, bins=100, range=(50., 400.), density=True)
    plt.plot([0.5*(bin_edges[i]+bin_edges[i+1]) for i in range(len(bin_edges)-1)], hist, colors=colors[i:i+1], labels=labels[i:i+1])
    i += 1
plt.legend()
plt.show()

i = 0
for L in [L_1, L_2, L_3, L_4]:
    print('Mean of the %s telomere length: %.0f and its standard deviation: %.0f' % (labels[i], np.mean(L), np.std(L)))
    i += 1

Mean of the smallest telomere length: 138 and its standard deviation: 24
Mean of the 2nd smallest telomere length: 161 and its standard deviation: 21
Mean of the 3rd smallest telomere length: 175 and its standard deviation: 20
Mean of the 4th smallest telomere length: 187 and its standard deviation: 19
