# Finite Populations

## Exponential Growth

The size $n(t)$ of a bacterial population with growth rate $r$ is described by an exponential function

$$
    \frac{dn(t)}{dt} = r n(t) \Leftrightarrow n(t) = n(0) e^{r t}
$$

When there are multiple species, the difference between their growth rates determines which population will dominate over time

$$
    \frac{n(t)}{n'(t)} = e^{(r-r')t}\frac{n(0)}{n'(0)}
$$

The term $r-r'$ defines which species will dominate. If $r > r'$ then the species with growth rate $r$ will dominate, and vice versa.

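The proportion of species $i$ among all species is then given by the following, where $\langle e^{r t} \rangle = \sum_j e^{r_j t} \rho_j(0)$ is the average weighted by the initial proportions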

$$
    \rho_i(t) 
    = 
    \frac{n_i(t)}{\sum_j n_j(t)} 
    = 
    \frac{e^{r_i t}n_i(0)}{\sum_j e^{r_j t} n_j(0)}
    =
    \frac{e^{r_i t} (\sum_k n_k(0)) \rho_i(0)}{\sum_j e^{r_j t} (\sum_k n_k(0)) \rho_j(0)}
    =
    \frac{e^{r_i t} \rho_i(0)}{\sum_j e^{r_j t} \rho_j(0)}
    = 
    \frac{e^{r_i t}}{\langle e^{r t} \rangle} \rho_i (0)
$$
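The last expression can be evaluated directly. The following is a minimal sketch, assuming illustrative growth rates and initial proportions that are not taken from the text; the function name `proportions` is hypothetical.

```python
import math

def proportions(rho0, rates, t):
    """Return rho_i(t) = e^{r_i t} rho_i(0) / sum_j e^{r_j t} rho_j(0)."""
    # Subtract the largest exponent before exponentiating to avoid overflow;
    # the common factor cancels in the ratio.
    shift = max(r * t for r in rates)
    weights = [p * math.exp(r * t - shift) for p, r in zip(rho0, rates)]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative values (assumptions): three species with slightly different rates.
rho = proportions([0.5, 0.3, 0.2], [1.0, 1.1, 1.2], t=50.0)
print(rho)
```

Even though the third species starts as the rarest, it ends up with almost the entire population because it has the largest growth rate.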

## Evolution in a controlled environment


Assume we start with a population of bacteria, let it grow for a certain time, then take a fraction (0.01) of the population and regrow that sample.

Initially all bacteria are the same and have the same growth rate $r$.
At some point in time, a mutant appears with the growth rate $r + dr$.

Overnight we then have a population consisting of the wildtype, which starts at $n$ and grows to $ne^{rt_1}$, and the mutant, which starts at $1$ and grows to $e^{(r+dr)t_1}$.
Diluting the population and letting it grow again gives the wildtype starting at $0.01ne^{r t_1}$, which grows to $0.01ne^{r (t_1+t_2)}$, and the mutant starting at $0.01e^{(r+dr)t_1}$, which grows to $0.01e^{(r+dr)(t_1+t_2)}$.
After $k$ rounds of growth and dilution the population size of the wildtype is given by $0.01^k n e^{r\sum_{i=1}^k t_i}$ and that of the mutant by $0.01^k e^{(r+dr)\sum_{i=1}^k t_i}$.
Writing $\sum_{i=1}^k t_i = t$, the mutant fraction of the population is then given as

$$
    \rho(t) 
    = 
    \frac{0.01^k e^{(r+dr)t}}{0.01^k e^{(r+dr)t} + 0.01^k n e^{rt}}
    =
    \frac{e^{(r+dr)t}}{e^{(r+dr)t} + n e^{rt}}
    =
    \frac{e^{drt}}{e^{drt} + n}
$$

If $dr > 0$ the mutant is guaranteed to dominate the population after some time. If $dr < 0$ the mutant fraction will decrease over time, and if $dr = 0$ the mutant fraction will stay constant.
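As a sanity check, the growth and dilution rounds can be iterated explicitly and compared with the closed form $\rho(t)$. This is a sketch under the assumptions of the text (deterministic growth, one mutant at the start, dilution after every night); the function names and numerical values are illustrative.

```python
import math

def mutant_fraction(n, dr, t):
    """Closed form rho(t) = e^{dr t} / (e^{dr t} + n), rearranged for moderate dr * t."""
    return 1.0 / (1.0 + n * math.exp(-dr * t))

def serial_dilution(n, r, dr, nights, dilution=0.01):
    """Grow wildtype and mutant for each night, then dilute both by the same factor."""
    wild, mutant = float(n), 1.0
    for t_night in nights:
        wild *= math.exp(r * t_night) * dilution
        mutant *= math.exp((r + dr) * t_night) * dilution
    return mutant / (mutant + wild)

# Illustrative values (assumptions): 5 nights of 8 time units each.
nights = [8.0] * 5
print(serial_dilution(1e6, 1.0, 0.1, nights))
print(mutant_fraction(1e6, 0.1, sum(nights)))
```

Both numbers agree because the dilution factor cancels in the ratio, which is why only the total growth time $t$ matters.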

## Stochastic dynamics in finite populations

Assume a population evolving from one generation to the next while the population size stays constant. All individuals have the same growth rate $r$. At the start, each individual has a unique genotype. Overnight all of them grow by exactly the same amount, from $N$ to $100 N$. After this we take a random sample of size $N$.

The frequency of any gene variant in the population will vary from sample to sample due to genetic drift. Because of the random sampling, some variants are left out while others are overrepresented, so that after some generations the entire population stems from one single individual.

Using coalescent theory we can determine how long it takes on average for this takeover to occur, and how far back we will find the current population's most recent common ancestor (MRCA).

We describe the lineage tree by the times $t_k$ during which the population had $k$ parallel lineages, i.e. the times between $k$ and $k-1$ ancestors.
For this we determine the probability of spending $t_k$ generations with $k$ ancestors, that is, the probability distribution of the time until the $k$ lineages merge into $k-1$ lineages.

We start at the first generation in the past where there were $k$ ancestors. For there to be $k$ ancestors also in the previous generation, all $k$ individuals need to have a different parent in the previous generation.
The probability of this for a population of $N$ individuals is 

$$
    P(\text{all } k \text{ different}) = 1 \left(1 - \frac{1}{N}\right) \left(1 - \frac{2}{N}\right) \cdots \left(1 - \frac{k-1}{N}\right) = 1 - \frac{k(k-1)}{2N} + O\left(\frac{1}{N^2}\right)
$$

The probability that all $k$ individuals remain separate for at least $t$ time steps is then

$$
    P_k(t) \approx (1 - \frac{k(k-1)}{2N})^t
$$

We can approximate this with the exponential function through a Taylor expansion, which gives

$$
    P_k(t) \approx e^{- \frac{k(k-1)t}{2N}}
$$
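The quality of the two approximations can be checked numerically. The sketch below compares the exact product raised to the power $t$ with the exponential form; the helper names and the chosen values of $k$, $N$ and $t$ are assumptions for illustration.

```python
import math

def p_all_different(k, N):
    """Exact probability that k lineages pick k distinct parents among N."""
    p = 1.0
    for i in range(k):
        p *= 1.0 - i / N
    return p

def p_separate_exact(k, N, t):
    """All k lineages stay separate for t generations (exact one-step probability)."""
    return p_all_different(k, N) ** t

def p_separate_approx(k, N, t):
    """Exponential approximation e^{-k(k-1)t/(2N)}."""
    return math.exp(-k * (k - 1) * t / (2 * N))

# Illustrative values (assumptions): k much smaller than N.
print(p_separate_exact(5, 10_000, 1000), p_separate_approx(5, 10_000, 1000))
```

The approximation is good as long as $k \ll N$; for $k$ comparable to $N$ the neglected $O(1/N^2)$ terms become noticeable.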

The probability that all $k$ remain separate for exactly $t$ time steps is then the probability for at least $t$ time steps minus the probability for at least $t+1$ time steps

$$
    p_k(t) = P_k(t) - P_k(t+1) = e^{- \frac{k (k-1)t}{2N}} (1 - e^{- \frac{k(k-1)}{2N}})
$$

The average time that all $k$ remain separate is then

$$
    \langle t_k \rangle = \int_{t=0}^{\infty} t p_k(t) dt = ... = \frac{2N}{k(k-1)}
$$
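The result can be cross-checked without doing the integral by summing $t\,p_k(t)$ over discrete generations. This is a rough numerical sketch, assuming a cutoff of 50 decay times for the sum; the function name is hypothetical.

```python
import math

def mean_waiting_time(k, N):
    """Sum t * p_k(t) over generations, with p_k(t) = P_k(t) - P_k(t+1)."""
    lam = k * (k - 1) / (2 * N)
    t_max = int(50 / lam)  # assumption: the tail beyond 50 decay times is negligible
    total = 0.0
    for t in range(t_max):
        p_t = math.exp(-lam * t) - math.exp(-lam * (t + 1))
        total += t * p_t
    return total

# Illustrative values (assumptions).
k, N = 3, 1000
print(mean_waiting_time(k, N), 2 * N / (k * (k - 1)))
```

The discrete sum is slightly smaller than $\frac{2N}{k(k-1)}$ (by roughly half a generation), which is the expected difference between a sum over whole generations and the continuous-time integral.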

The time to the most recent common ancestor is then the sum over all the times $t_k$ 

$$
    T_1 = \sum_{k=2}^N \langle t_k \rangle = \sum_{k=2}^N \frac{2N}{k(k-1)} = 2N (\frac{N - 1}{N}) = 2N - 2 \approx 2N
$$
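A small backward-in-time simulation illustrates this estimate. The sketch below assumes the sampling scheme described above (each lineage picks its parent uniformly among $N$ individuals) and uses hypothetical function names; the population size and number of runs are illustrative.

```python
import random

def expected_tmrca(N):
    """Evaluate the sum over 2N / (k (k - 1)) for k = 2..N."""
    return sum(2 * N / (k * (k - 1)) for k in range(2, N + 1))

def generations_to_mrca(N, rng):
    """Trace all N lineages backwards until a single ancestor remains."""
    lineages = N
    generations = 0
    while lineages > 1:
        # Each lineage picks a parent uniformly at random; shared parents merge.
        parents = {rng.randrange(N) for _ in range(lineages)}
        lineages = len(parents)
        generations += 1
    return generations

def mean_tmrca(N, runs, seed=0):
    rng = random.Random(seed)
    return sum(generations_to_mrca(N, rng) for _ in range(runs)) / runs

print(expected_tmrca(50), mean_tmrca(50, 400))
```

The simulated mean fluctuates considerably between runs because the last coalescence (from two lineages to one) alone takes about $N$ generations on average, so many repetitions are needed for a stable estimate.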

When looking at populations, mutations can also occur, where the bases of the genetic code are changed randomly.
Each time an individual reproduces, there is a probability $\beta$ of mutation per base, thus the probability that one of the $L$ letters mutates is approximately $L \beta$. If the mutation rate per individual is $\mu$, then an individual mutates with the probability $\mu$ and stays the same with the probability $1 - \mu$.

The variation is proportional to the total number of mutations in the population tree. The average number of mutations is proportional to the number of individuals in the tree and the mutation rate.

$$
    M = \mu \sum_{k=2}^N k \langle t_k \rangle = \mu \sum_{k=2}^N \frac{2N}{k-1} = 2\mu N \sum_{k=2}^N \frac{1}{k-1} \approx 2 \mu N \log(N)
$$

This is the number of mutations we expect to observe in a population tree.
The term $k \langle t_k \rangle$ counts the $k$ lineages that were present during the time in which the number of lineages went from $k$ to $k-1$.
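The logarithmic approximation can be compared with the exact sum. A short sketch, assuming illustrative values for $\mu$ and $N$ and hypothetical function names:

```python
import math

def expected_mutations(mu, N):
    """Exact sum M = 2 mu N * sum_{k=2}^{N} 1 / (k - 1)."""
    return 2 * mu * N * sum(1.0 / (k - 1) for k in range(2, N + 1))

def expected_mutations_log(mu, N):
    """Approximation M ~ 2 mu N log(N)."""
    return 2 * mu * N * math.log(N)

# Illustrative values (assumptions).
mu, N = 1e-3, 1000
print(expected_mutations(mu, N), expected_mutations_log(mu, N))
```

The exact sum is a harmonic number, which exceeds $\log N$ by roughly Euler's constant ($\approx 0.58$), so the logarithmic form slightly underestimates $M$ at moderate $N$.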

Next we look at the differences between a pair of individuals. Per generation, going back in time, the pair coalesces (same parent) with a probability of $1/N$, a mutation occurs on one of the two lineages with probability $2 \mu$, and nothing happens with probability $1 - \frac{1}{N} - 2\mu$.

The probability that exactly $n$ mutations occur before the coalescence is obtained by summing over all sequences with $n$ mutations, $k$ generations in which nothing happens with $k \in [0, \infty)$, and a final coalescence, giving

$$
    \sum_{k=0}^{\infty} (2\mu)^n \left(1 - 2\mu - \frac{1}{N}\right)^k \frac{1}{N} \frac{(k+n)!}{k!n!} = \left(\frac{2\mu N}{2\mu N + 1}\right)^n \frac{1}{2\mu N + 1}
$$

This is the same as the probability that a mutation occurs when something happens, taken $n$ times, times the probability that a coalescence occurs when something happens.

Writing $x = 2\mu N$ and $y=\frac{x}{x+1}$, the expected number of mutations is then given by

$$
    \langle n \rangle = \sum_i i y^i (1 - y) = \frac{y}{1-y} = 2\mu N
$$ 
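Both the series and the mean can be verified numerically. The sketch below sums the series term by term and compares it with the geometric form $y^n (1 - y)$ used in the mean; the function names, the truncation `k_max` and the parameter values are assumptions.

```python
def p_n_series(n, mu, N, k_max=20_000):
    """Truncated sum over k of (2 mu)^n q^k (1/N) C(k+n, n), with q = 1 - 2 mu - 1/N."""
    a, b = 2 * mu, 1.0 / N
    q = 1.0 - a - b
    total = 0.0
    term = a ** n * b  # k = 0 term, where the binomial coefficient is 1
    for k in range(k_max):
        total += term
        # C(k+1+n, n) / C(k+n, n) = (k + 1 + n) / (k + 1)
        term *= q * (k + 1 + n) / (k + 1)
    return total

def p_n_closed(n, mu, N):
    """Geometric form y^n (1 - y) with y = x / (x + 1) and x = 2 mu N."""
    x = 2 * mu * N
    return (x / (x + 1)) ** n / (x + 1)

# Illustrative values (assumptions): x = 2 mu N = 2.
mu, N = 0.01, 100
print(p_n_series(3, mu, N), p_n_closed(3, mu, N))
print(sum(n * p_n_closed(n, mu, N) for n in range(400)))  # close to 2 mu N
```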

The mutation probability of a single site is given by $\beta$. For two individuals from a population of size $N$, the probability that the site has mutated $n$ times since their common ancestor is

$$
    P(n) = (\frac{2 \beta N}{2 \beta N + 1})^n \frac{1}{2 \beta N + 1}
$$

with $\langle n \rangle = 2 \beta N \approx 2\cdot 10^{-4}$, meaning that at a single site nearly all individuals of a population carry the same letter.

## Moran model

We assume a fixed population of size $N$. Looking at a single position in the genome, we assume that the individuals with the letter "A" replicate with a rate $\sigma$ and all others with the rate $1$.
Initially we have $N - n$ individuals with the letter "A" and $n$ individuals with another letter (mutants). Per unit time there is a probability $\mu$ that an individual undergoes a mutation at the chosen position.
During a time interval $dt$ the following events happen with the given probabilities:

- An "A" individual duplicates : $\sigma(N - n)dt$
- Another type duplicates : $ndt$
- An "A" type mutates : $\mu(N - n)dt$
- Another type mutates to "A" : $\frac{\mu n}{3}dt$
- An "A" type is removed : $(1 - \frac{n}{N})$
- Another type is removed : $\frac{n}{N}$


## Deriving instantaneous rates of change in mutant frequency

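Using the Moran model and writing $f = n/N$ for the mutant fraction, the probability that the number of mutants decreases by one (an "A" individual duplicates and a mutant is removed) is

$$
    T(f, \delta f = - \frac{1}{N}, dt) = \sigma(N - n) \frac{n}{N} dt = \sigma n (1 - f) dt
$$

and the probability that it increases by one (a mutant duplicates and an "A" individual is removed) is

$$
    T(f, \delta f = + \frac{1}{N}, dt) = n (1 - \frac{n}{N}) dt = n (1 - f) dt
$$

Using the substitution $n = Nf$ we get the new event probabilities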

- An "A" individual duplicates : $N\sigma(1 - f)dt$
- Another type duplicates : $Nfdt$
- An "A" type mutates : $N\mu(1 - f)dt$
- Another type mutates to "A" : $\frac{\mu}{3}Nfdt$
- An "A" type is removed : $(1 - f)$
- Another type is removed : $f$

Including mutation, the probability of changing by one individual during $dt$ is, for the total decrease of the mutant,

$$
    T(f, \delta f = - \frac{1}{N}, dt) = N(\sigma f (1 - f) + \frac{\mu}{3}f)dt
$$
and for the total increase of the mutant
$$
    T(f, \delta f = + \frac{1}{N}, dt) = N(f(1-f) + \mu(1-f))dt
$$

Each of these has the form $(\text{replication} \times \text{removal}) + \text{mutation}$.
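The two transition probabilities per unit time can be written as a small helper. This is a sketch only; the function name and return order are assumptions, and the formulas are copied from the expressions for $\delta f = -\frac{1}{N}$ and $\delta f = +\frac{1}{N}$ above.

```python
def moran_transition_rates(f, N, sigma, mu):
    """Return (rate_down, rate_up): probabilities per unit time of
    delta f = -1/N and delta f = +1/N at mutant fraction f."""
    # "A" duplicates and a mutant is removed, or a mutant mutates back to "A".
    rate_down = N * (sigma * f * (1 - f) + mu * f / 3)
    # A mutant duplicates and an "A" is removed, or an "A" mutates.
    rate_up = N * (f * (1 - f) + mu * (1 - f))
    return rate_down, rate_up

# Illustrative values (assumptions).
print(moran_transition_rates(0.5, 100, 0.9, 1e-3))
```

Without mutation and with $\sigma = 1$ the two rates are equal, so the mutant fraction performs an unbiased random walk; at $f = 0$ only new mutations can create mutants.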


# Probability of fixation

- $f$ : Fraction of the population with the mutant genotype
- $\pi(f)$ : Probability that the mutant will eventually take over the population given that it starts from a fraction $f$
- $T(f, \delta f, dt)$ : Probability that the fraction changes from $f$ to $f + \delta f$ in a small interval $dt$

The fixation probability obeys the master equation:

$$
\pi(f) = \int T(f, \delta f, dt) \pi(f + \delta f) d\delta f
$$

Because the change $\delta f$ during $dt$ is small, we can Taylor expand $\pi(f + \delta f)$ around $f$

$$
\pi(f + \delta f) = \pi(f) + \delta f \pi'(f) + \frac{(\delta f)^2}{2}\pi''(f) + ...
$$

Which gives 

$$
\pi(f) 
= 
\int T(f, \delta f, dt) \left( \pi(f) + \delta f \pi'(f) + \frac{(\delta f)^2}{2}\pi''(f)\right) d\delta f 
=
\pi(f) + \pi'(f) \langle \delta f \rangle_f + \pi''(f) \frac{\langle (\delta f)^2 \rangle_f}{2} 
$$

Here we used the normalization $\int T(f, \delta f, dt) d \delta f = 1$ and defined $\langle \delta f \rangle_f = \int \delta f T(f, \delta f, dt) d \delta f$ and $\langle (\delta f)^2 \rangle_f = \int (\delta f)^2 T(f, \delta f, dt) d \delta f$.

Cancelling $\pi(f)$ on both sides and defining $X(f) = \pi'(f)$ gives us 

$$
X(f)\langle \delta f \rangle_f + X'(f) \frac{\langle (\delta f)^2 \rangle_f}{2} = 0 \Leftrightarrow \frac{X'(f)}{X(f)} = -2 \frac{\langle \delta f \rangle_f}{\langle (\delta f)^2 \rangle_f}
$$
$$

Since the ODE $y'=a y$ has the solution $y(t) = c \, e^{\int a dt}$, we get 

$$
X(f) = c \, e^{\int -2 \frac{\langle \delta f \rangle_f}{\langle (\delta f)^2 \rangle_f} df}
$$

And then, because $X(f) = \pi'(f)$, integrating once more gives

$$
\pi(f) = C' + C \int \exp\left({\int -2 \frac{\langle \delta f \rangle_f}{\langle (\delta f)^2 \rangle_f} df}\right) df
$$

If we set $g(f) := -2 \frac{\langle \delta f \rangle_f}{\langle (\delta f)^2 \rangle_f}$ and $G(f) := \int g(f) df$, then we can rewrite this as 

$$
\pi(f) = C' + C \int e^{G(f)} df
$$

The constants follow from the boundary conditions. For $f = 0$ the mutant will never take over the population, thus $\pi(0) = 0$, and for $f = 1$ the mutant has already taken over the population, thus $\pi(1) = 1$. We get the constants

$$
C = \frac{1}{[\int e^{G(f)}df]_1 - [\int e^{G(f)}df]_0} \qquad C' = \frac{- [\int e^{G(f)}df]_0}{[\int e^{G(f)}df]_1 - [\int e^{G(f)}df]_0}
$$

This gives in the end

$$
\pi(f) = \frac{\int_0^f \exp \left( -2 \int_0^x \frac{\langle \delta f \rangle_y}{\langle (\delta f)^2 \rangle_y} dy \right)dx}{\int_0^1 \exp \left( -2 \int_0^x \frac{\langle \delta f \rangle_y}{\langle (\delta f)^2 \rangle_y} dy \right)dx}
$$
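This expression can be evaluated numerically for any pair of moments. The following is a minimal sketch using a midpoint rule; the function name, the callables `drift` and `variance` (standing for $\langle \delta f \rangle_y$ and $\langle (\delta f)^2 \rangle_y$) and the number of steps are assumptions, and no care is taken against overflow for very strong selection.

```python
import math

def fixation_probability(f, drift, variance, steps=2000):
    """Midpoint-rule evaluation of pi(f) from <df>_y and <(df)^2>_y."""
    h = 1.0 / steps
    G = 0.0       # running value of -2 * int_0^x drift / variance dy at the left cell edge
    numer = 0.0   # integral of exp(G) from 0 to f
    denom = 0.0   # integral of exp(G) from 0 to 1
    for i in range(steps):
        y = (i + 0.5) * h  # midpoints avoid the boundaries, where both moments may vanish
        g = -2.0 * drift(y) / variance(y)
        weight = math.exp(G + 0.5 * g * h) * h
        denom += weight
        if y < f:
            numer += weight
        G += g * h
    return numer / denom

# Illustrative moments (assumptions): constant ratio drift / variance = 10.
print(fixation_probability(0.05, lambda y: 0.1 * y * (1 - y), lambda y: 0.01 * y * (1 - y)))
```

Any common factor of $dt$ in the two moments cancels in the ratio, so the callables can return the moments per unit time.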

For the Moran model without mutation ($\mu = 0$), we calculate the averages $\langle \delta f \rangle_f$ and $\langle (\delta f)^2 \rangle_f$

$$
\begin{align*}
    &\langle \delta f \rangle = \frac{1}{N} n(1-f)dt - \frac{1}{N}\sigma n (1 - f)dt = f(1-f)(1-\sigma)dt \\
    &\langle (\delta f)^2 \rangle = \frac{1}{N^2} n(1-f)dt + \frac{1}{N^2}\sigma n (1 - f)dt = \frac{f(1-f)(1+\sigma)}{N}dt\\
\end{align*}
$$

The ratio $\langle \delta f \rangle / \langle (\delta f)^2 \rangle$ is independent of $f$ and given as

$$
N\frac{1-\sigma}{1+\sigma}
$$

The integral then becomes

$$
\pi(f) 
= 
\frac{\int_0^f \exp \left( -2 \int_0^x N\frac{1-\sigma}{1+\sigma} dy \right)dx}{\int_0^1 \exp \left( -2 \int_0^x N\frac{1-\sigma}{1+\sigma} dy \right)dx}
= 
\frac{\int_0^f \exp \left( -2  Nx\frac{1-\sigma}{1+\sigma} \right)dx}{\int_0^1 \exp \left( -2 Nx\frac{1-\sigma}{1+\sigma} \right)dx}
= 
\frac{1 - e^{2Nf \frac{\sigma - 1}{\sigma + 1}}}{1 - e^{2N \frac{\sigma - 1}{\sigma + 1}}}
$$
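The closed form can be compared with a direct simulation of the mutant count. The sketch below assumes no mutation and simulates only the jumps of the process: given that the mutant count changes, it goes up with probability $\frac{1}{1+\sigma}$ and down with probability $\frac{\sigma}{1+\sigma}$, which follows from the ratio of the two transition probabilities. Function names and parameter values are illustrative.

```python
import math
import random

def pi_moran(f, N, sigma):
    """Diffusion result (1 - e^{2 N f a}) / (1 - e^{2 N a}) with a = (sigma - 1) / (sigma + 1)."""
    a = (sigma - 1.0) / (sigma + 1.0)
    if a == 0.0:
        return f  # neutral limit of the expression
    return math.expm1(2 * N * f * a) / math.expm1(2 * N * a)

def simulate_fixation(N, sigma, runs, seed=0):
    """Fraction of runs in which a single mutant takes over the population."""
    rng = random.Random(seed)
    p_up = 1.0 / (1.0 + sigma)
    fixed = 0
    for _ in range(runs):
        n = 1
        while 0 < n < N:
            n += 1 if rng.random() < p_up else -1
        fixed += (n == N)
    return fixed / runs

# Illustrative values (assumptions): mutant with a 10% replication advantage.
print(pi_moran(1 / 20, 20, 0.9), simulate_fixation(20, 0.9, 20_000))
```

Even with this advantage, most single mutants are lost by chance in the first few steps, in line with the closed form.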

Here $\sigma$ is the replication rate of the dominant genotype relative to the mutant. $\sigma \approx 0$ means that the mutant has a large fitness advantage, and vice versa.
Even with a large advantage the mutant may fail to fix, and the higher $\sigma$ is, the lower the fixation probability.

For mutations with small effects, we can write $\sigma - 1 = s$ (so that $s > 0$ corresponds to a slightly deleterious mutant) and rewrite the exponential factor as $e^{\frac{2(\sigma - 1)}{\sigma + 1}} = e^{\frac{2s}{2+s}}$, which then gives for $s \approx 0$

$$
\pi(\frac{1}{N}) \approx \frac{1 - e^{s}}{1 - e^{Ns}} \approx \frac{1}{N} \frac{Ns}{e^{Ns} - 1} \xrightarrow[s \rightarrow 0]{} \frac{1}{N}
$$

Meaning that neutral mutations ($s = 0$) fix with the probability $ \frac{1}{N} $. For $s \neq 0$ the effect of selection depends on the product $Ns$.
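The small-$s$ approximation can be checked against the full expression for a single mutant. A short sketch with hypothetical function names and illustrative values of $N$ and $s$:

```python
import math

def pi_single_mutant(N, s):
    """Full diffusion result for f = 1/N with sigma = 1 + s."""
    a = s / (2.0 + s)  # (sigma - 1) / (sigma + 1)
    if a == 0.0:
        return 1.0 / N
    return math.expm1(2 * a) / math.expm1(2 * N * a)

def pi_small_s(N, s):
    """Approximation (1/N) * N s / (e^{N s} - 1)."""
    if s == 0.0:
        return 1.0 / N
    return s / math.expm1(N * s)

N = 100
for s in (-0.05, -1e-3, 0.0, 1e-3, 0.05):
    print(s, pi_single_mutant(N, s), pi_small_s(N, s))
```

For $|Ns| \ll 1$ both values stay close to $\frac{1}{N}$, while for $|Ns| \gg 1$ selection dominates over drift.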

# Finite population implementation

In [3]:
from enum import Enum
import numpy as np

In [9]:
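class Base(Enum):
    A = 0
    T = 1
    C = 2
    G = 3

class Action(Enum):
    # (id, probability) pairs: with bare values 0.4 and 0.4, Survive would
    # silently become an alias of Replicate.
    Kill = (0, 0.2)
    Replicate = (1, 0.4)
    Survive = (2, 0.4)

class Individual:
    def __init__(self, length: int, generation: int = 1, parent: "Individual | None" = None):
        # Founders get a random gene code; offspring copy the code of their parent.
        if parent is None:
            self.gene_code = list(np.random.choice([Base.A, Base.T, Base.C, Base.G], length))
        else:
            self.gene_code = list(parent.gene_code)
        self.parent = parent
        self.generation = generation

class Simulation:
    def __init__(self, length: int, population_size: int):
        self.population_size = population_size
        # Instance attribute: a class-level list would be shared by all simulations.
        self.population = [Individual(length, 1) for _ in range(population_size)]

    def run(self, iterations: int):
        for _ in range(iterations):
            # Neutral resampling (assumption, following the sampling step in the text):
            # draw N parents with replacement and give each draw one offspring.
            parents = np.random.choice(self.population, self.population_size)
            self.population = [
                Individual(len(p.gene_code), p.generation + 1, parent=p) for p in parents
            ]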
                

In [8]:
S = Simulation(50, 100)
S.run(50)