# Alexandra Gavrilina

# HOMEWORK 2: COALESCENT WITH MUTATION

In [1]:
import numpy as np
import random
import math
import scipy
from scipy import special
import matplotlib.pyplot as plt
%matplotlib inline

## 1. Basics

Chromosome is an interval $[0, 1]$.

Individual (or individual’s genome) is a set of $M$ chromosomes, numbered from $0$ to $M − 1$.

Chromosomes with the same id (from different individuals) are related by a single tree genealogy (no recombination).

Genealogies for chromosomes with different ids are simulated independently from each other.

## 2. Coalescence with mutation
Let there be $K$ lineages. Mutation rate is $\mu$, effective population size over time is $ν(t)$. Assume that $ν(t)$ is piecewise constant function.

Coalescence with mutation is a Poisson process with the (variable) rate

$$\omega(K, t) = K \mu + \dfrac{1}{v(t)} \begin{pmatrix} K \\ 2 \end{pmatrix}.$$

Simulation scheme.

(1) Set $t = 0$, initialise $K$.

(2) Sample time $T$ till the next event from Poisson process with the rate $\omega (K, t)$. Set $t = t + T$.

(3) Generate type of the event following Bernoulli distribution with weights proportional to Kµ (mutation) and $\dfrac{1}{v(t)} \begin{pmatrix} K \\ 2 \end{pmatrix}$ (coalescence).

* Mutation: sample ancestral lineages $l$ where mutation occurs independently from $K$ available lineages. Sample mutation position $p$ on a genome uniformly on $[0, 1]$. All individuals which are decedents of $l$ get variant $1$ at position $p$. All other individuals have variant $0$ at position $p$.

* Coalescence. Choose uniformly a random pair of lineages $l_1$ and $l_2$. These two lineages coalesce at time $t$. Update genealogy. Set $K = K − 1$.

(4) stop if $K = 1$. Otherwise go to step 2.

In [2]:
'''
M: number of chromosomes
K: lineages
mu: mutation rate
'''
M = 1000
K = 10
mu = 2

In [3]:
'''
T: time periods
N: population sizes
'''
T = np.array([0, .04, .01, .07], dtype=float)
N = np.array([2, 5, 10, 6], dtype=int)

In [4]:
'''
Population size at time t
Input: time t
Output: v(t)
'''
def v(t):
    return np.piecewise(t, (t < 100, t >= 100), (1, 2))

In [5]:
'''
Poisson process rate
Input: K, mu, v(t)
Output: w(K, mu)
'''
def rate(K, mu, v):
    return K*mu + (1/v)*scipy.special.binom(K, 2)

In [6]:
'''
Coalescence and mutation
Input: K, mu, T, N
Output: coalescent (pair of lineages, time), mutation (positionm time), nc (coalescence number), nm ()
'''
def coal_mut(K, mu, T, N):
    t = 0
    coalescent = [] # list for coalescence
    mutation = [] # list for mutations
    nm = 0 # number of mutations
    nc = 0 # coalescence number
    w = rate(mu, K, v(t)) # rate of Poisson process
    while K != 1:
        t += np.random.exponential(1 / w) # from Poisson process
        if np.random.binomial(1, K*mu/w): # binomial distribution
            i = random.randint(0, K-1)
            p = random.random()
            mutation.append([t, i, p]) # time, id, position    
            nm += 1
        else:
            i = random.randint(0, K - 1)
            j = random.randint(0, K - 1)
            coalescent.append([t, min(i, j), max(i, j)]) # time, pair i-j
            nc += 1
        K -= 1 # decrease K: K = K - 1
    return coalescent, mutation, nc, nm

In [7]:
random.seed(1)

coal, mut, nc, nm = coal_mut(K, mu, T, N)

print("Сoalescence number: ", nc)
print("Coalescent: time t, i and j")
print(np.asarray(coal))

print("Number of mutations: ", nm)
print("Mutation: time t, i, position p:")
print(np.array(mut))

Сoalescence number:  4
Coalescent: time t, i and j
[[0.18582178 1.         4.        ]
 [0.32000425 0.         3.        ]
 [0.32826334 0.         3.        ]
 [0.34203032 1.         2.        ]]
Number of mutations:  5
Mutation: time t, i, position p:
[[0.0494092  2.         0.56920387]
 [0.19817279 1.         0.49543509]
 [0.20068388 3.         0.47224524]
 [0.30639009 3.         0.78872335]
 [0.36207458 0.         0.69583287]]
