<a href="https://colab.research.google.com/github/RobBurnap/Bioinformatics-MICR4203-MICR5203/blob/main/notebooks/ctmc_primer_extended.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# CTMC Primer for Phylogenetics



**CTMC = Continuous-Time Markov Chain**

It’s the mathematical framework almost all phylogenetic substitution models (JC, HKY, GTR, PAM, etc.) are built on.

---

### 1. Markov chain
- A stochastic process that jumps between a finite set of states (e.g., the nucleotides A, C, G, T, or the 20 amino acids).
- **Markov property**: the probability of the next state depends only on the current state, not on the past history.
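Here is a minimal sketch of the Markov property. It uses a discrete-step chain for simplicity, and the continuous-time version follows in the next section. The one-step matrix `P_step` and the helper `simulate_chain` are hypothetical illustrations, not part of any library. Each new state is sampled from the row of the current state only, so the earlier path never enters the calculation.

```python
import numpy as np

states = ["A", "C", "G", "T"]

# Hypothetical one-step transition matrix: row i gives P(next = j | current = i)
P_step = np.array([
    [0.91, 0.03, 0.03, 0.03],
    [0.03, 0.91, 0.03, 0.03],
    [0.03, 0.03, 0.91, 0.03],
    [0.03, 0.03, 0.03, 0.91],
])

def simulate_chain(start, n_steps, P, rng):
    """Simulate a discrete-step Markov chain; return the visited state indices."""
    path = [start]
    for _ in range(n_steps):
        current = path[-1]
        # Markov property: sample from row `current` only; earlier states are ignored
        path.append(rng.choice(len(P), p=P[current]))
    return path

rng = np.random.default_rng(1)
path = simulate_chain(start=0, n_steps=20, P=P_step, rng=rng)
print("".join(states[i] for i in path))
```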

---

### 2. Continuous-time
- Time is treated as a **continuous** variable, not discrete steps.
- Mutations can occur at **any time** along a branch, not just at fixed intervals.
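In a CTMC, the time spent in state $i$ before the next change is exponentially distributed with rate $-q_{ii}$. The next state $j$ is then chosen with probability $q_{ij}/(-q_{ii})$; the rates $q_{ij}$ are defined in the next section. The sketch below uses this standard construction, often called the jump-chain or Gillespie algorithm, to simulate one branch under JC69 with expected rate 1. `simulate_branch` is a hypothetical helper written for this primer.

```python
import numpy as np

states = ["A", "C", "G", "T"]

# JC69 rate matrix scaled so the expected rate is 1 substitution per unit time
Q = np.full((4, 4), 1.0 / 3.0)
np.fill_diagonal(Q, -1.0)

def simulate_branch(start, t, Q, rng):
    """Simulate one CTMC path along a branch of length t.

    Returns a list of (time, state_index) events, starting with (0.0, start).
    """
    events = [(0.0, start)]
    time, state = 0.0, start
    while True:
        total_rate = -Q[state, state]               # rate of leaving the current state
        time += rng.exponential(1.0 / total_rate)   # waiting time ~ Exponential(total_rate)
        if time > t:                                # no further change before the branch ends
            break
        # Jump probabilities: off-diagonal rates divided by the total leaving rate
        jump_probs = Q[state].copy()
        jump_probs[state] = 0.0
        jump_probs /= total_rate
        state = rng.choice(len(Q), p=jump_probs)
        events.append((time, state))
    return events

rng = np.random.default_rng(42)
for time, s in simulate_branch(start=0, t=0.5, Q=Q, rng=rng):
    print(f"{time:.3f}  {states[s]}")
```

Because the total rate is 1, the number of substitutions along a branch of length $t$ averages $t$. Most simulated paths on a branch of length 0.5 therefore show zero or one change.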

---

### 3. Rate matrix (Q)
- Describes the **instantaneous rates** of change between states.
- Off-diagonals: $q_{ij} \ge 0$ is the rate of substitution from state $i$ to state $j$.
- Diagonals: $q_{ii} = -\sum_{j \ne i} q_{ij}$, so each row sums to **zero**.
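Here is a minimal sketch of how the diagonal is filled in: start from any non-negative off-diagonal rates and set each $q_{ii}$ to minus the sum of the other entries in its row. The helper `make_rate_matrix` and the example rates are hypothetical. The rates are chosen so that A↔G and C↔T are faster, like transitions.

```python
import numpy as np

def make_rate_matrix(offdiag_rates):
    """Build a CTMC rate matrix Q from a square array of off-diagonal rates.

    The input diagonal is ignored and replaced by minus the row sum of the
    off-diagonal entries, so every row of Q sums to zero.
    """
    Q = np.array(offdiag_rates, dtype=float)
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    return Q

# Hypothetical off-diagonal rates for A, C, G, T (diagonal entries are ignored)
rates = [[0.0, 0.1, 0.4, 0.1],
         [0.1, 0.0, 0.1, 0.4],
         [0.4, 0.1, 0.0, 0.1],
         [0.1, 0.4, 0.1, 0.0]]
Q = make_rate_matrix(rates)
print(Q)
print(Q.sum(axis=1))  # each row sums to zero (up to floating-point rounding)
```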

---

### 4. Transition probabilities
The probability of being in state $j$ after time $t$, given a start in state $i$, is entry $(i, j)$ of the **matrix exponential** of $Qt$:

$$
P(t) = e^{Qt}
$$

Here $P(t)$ is the **transition probability matrix**.
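A few properties of $P(t) = e^{Qt}$ follow directly from the definition and are easy to check numerically:

- $P(0) = I$.
- Every row of $P(t)$ sums to 1.
- $P(s+t) = P(s)\,P(t)$, the Chapman–Kolmogorov equation.

The sketch below checks these with `scipy.linalg.expm` for the JC69 matrix used in this notebook. It also recomputes $P(t)$ through an eigendecomposition $Q = V\,\mathrm{diag}(\lambda)\,V^{\top}$, which works here because the JC69 $Q$ is symmetric. That is an illustrative alternative, not necessarily how any particular phylogenetics program evaluates the exponential.

```python
import numpy as np
from scipy.linalg import expm

# JC69 rate matrix with expected rate 1 (same scaling as the JC69 example)
Q = np.full((4, 4), 1.0 / 3.0)
np.fill_diagonal(Q, -1.0)

def P_of_t(t):
    """Transition probability matrix P(t) = exp(Q t)."""
    return expm(Q * t)

s, t = 0.2, 0.3
print(np.allclose(P_of_t(0.0), np.eye(4)))                # P(0) = I
print(np.allclose(P_of_t(t).sum(axis=1), 1.0))            # rows sum to 1
print(np.allclose(P_of_t(s + t), P_of_t(s) @ P_of_t(t)))  # Chapman-Kolmogorov

# Alternative: exponentiate the eigenvalues (eigh applies because Q is symmetric)
eigvals, V = np.linalg.eigh(Q)
P_eig = V @ np.diag(np.exp(eigvals * t)) @ V.T
print(np.allclose(P_eig, P_of_t(t)))
```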

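**Example:** $P_{AG}(0.5)$ is the probability that a site in state A at the start of a branch is in state G at its end, after evolutionary time $t=0.5$ (in units of expected substitutions per site). It accounts for every path, including multiple substitutions such as $A \to C \to G$.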
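For JC69 the matrix exponential has a well-known closed form. With the scaling used in the JC69 example below (off-diagonal rate $\alpha = 1/3$, so the expected rate is 1), $Q$ has eigenvalues $0$ and $-4\alpha = -4/3$ (the latter three times). This gives:

$$
P_{ii}(t) = \frac{1}{4} + \frac{3}{4}e^{-4t/3}, \qquad
P_{ij}(t) = \frac{1}{4} - \frac{1}{4}e^{-4t/3} \quad (i \ne j)
$$

At $t = 0.5$, $e^{-2/3} \approx 0.513417$, so

$$
P_{AG}(0.5) \approx \frac{1}{4} - \frac{1}{4}(0.513417) \approx 0.121646,
\qquad
P_{AA}(0.5) \approx \frac{1}{4} + \frac{3}{4}(0.513417) \approx 0.635063,
$$

which agrees with the `expm` output of that example.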

### JC69 rate matrix example

```python
import numpy as np
import pandas as pd
from scipy.linalg import expm

states = ["A", "C", "G", "T"]
n = 4

# JC69 rate matrix with expected rate = 1 (so t is in expected substitutions/site)
mu = 1.0  # overall rate scale
alpha = mu / (n - 1)  # off-diagonal rate so that expected rate is 1
Q = np.full((n, n), alpha)
np.fill_diagonal(Q, -alpha * (n - 1))  # each row sums to zero

t = 0.5  # branch length
P = expm(Q * t)

# Rows = starting base, columns = ending base
pd.DataFrame(P, index=states, columns=states).round(6)
```

|   | A | C | G | T |
|---|---|---|---|---|
| A | 0.635063 | 0.121646 | 0.121646 | 0.121646 |
| C | 0.121646 | 0.635063 | 0.121646 | 0.121646 |
| G | 0.121646 | 0.121646 | 0.635063 | 0.121646 |
| T | 0.121646 | 0.121646 | 0.121646 | 0.635063 |
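Why scale by `mu / (n - 1)`? Only the product of rate and time enters $e^{Qt}$, so doubling every rate is indistinguishable from doubling the branch length. Fixing the expected rate $-\sum_i \pi_i q_{ii}$ at 1, with JC69's equal frequencies $\pi_i = 1/4$, pins down the unit of $t$ as expected substitutions per site. The sketch below checks both points. `jc69_Q` is a hypothetical helper that repeats the construction above.

```python
import numpy as np
from scipy.linalg import expm

def jc69_Q(mu=1.0, n=4):
    """JC69 rate matrix with overall rate scale mu (same construction as above)."""
    alpha = mu / (n - 1)
    Q = np.full((n, n), alpha)
    np.fill_diagonal(Q, -alpha * (n - 1))
    return Q

pi = np.full(4, 0.25)  # JC69 assumes equal base frequencies

# Expected substitution rate at stationarity: -sum_i pi_i * q_ii
for mu in (1.0, 2.0):
    print(mu, -np.sum(pi * np.diag(jc69_Q(mu))))

# Rate and time are confounded: doubling mu gives the same P as doubling t
t = 0.5
print(np.allclose(expm(jc69_Q(2.0) * t), expm(jc69_Q(1.0) * 2 * t)))
```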


### Demo: $P(t) \to I$ as $t \to 0$

Here we show how the transition matrix approaches the identity matrix $I$ as the branch length $t$ shrinks toward zero.


```python
# Reuses Q, states, expm and pd from the JC69 cell above
ts = [1e-6, 1e-3, 1e-1]
for t in ts:
    P = expm(Q * t)
    print(f"t = {t}")
    print(pd.DataFrame(P, index=states, columns=states).round(6))
    print()
```


```text
t = 1e-06
          A         C         G         T
A  0.999999  0.000000  0.000000  0.000000
C  0.000000  0.999999  0.000000  0.000000
G  0.000000  0.000000  0.999999  0.000000
T  0.000000  0.000000  0.000000  0.999999

t = 0.001
          A         C         G         T
A  0.999001  0.000333  0.000333  0.000333
C  0.000333  0.999001  0.000333  0.000333
G  0.000333  0.000333  0.999001  0.000333
T  0.000333  0.000333  0.000333  0.999001

t = 0.1
          A         C         G         T
A  0.906380  0.031207  0.031207  0.031207
C  0.031207  0.906380  0.031207  0.031207
G  0.031207  0.031207  0.906380  0.031207
T  0.031207  0.031207  0.031207  0.906380
```
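Why does $P(t)$ approach $I$? Expanding the exponential gives $e^{Qt} = I + Qt + \tfrac{1}{2}(Qt)^2 + \cdots$. For small $t$, each off-diagonal probability is therefore approximately $q_{ij}t$. For JC69 that is $t/3$, which matches the 0.000333 entries at $t = 0.001$. The sketch below assumes the same JC69 $Q$ and measures how the error of this first-order approximation shrinks, roughly like $t^2$.

```python
import numpy as np
from scipy.linalg import expm

# JC69 rate matrix with expected rate 1 (same scaling as the JC69 cell)
Q = np.full((4, 4), 1.0 / 3.0)
np.fill_diagonal(Q, -1.0)
I = np.eye(4)

errors = {}
for t in (1e-1, 1e-2, 1e-3):
    P = expm(Q * t)
    # First-order Taylor approximation: exp(Qt) = I + Qt + O(t^2)
    errors[t] = np.abs(P - (I + Q * t)).max()
    print(f"t = {t:g}: max |P(t) - (I + Qt)| = {errors[t]:.2e}")
```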




### Demo: HKY-like model
We can also build an HKY (HKY85) rate matrix, which allows unequal base frequencies and a different rate for transitions than for transversions.
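Written out, the off-diagonal rates that the code below assigns follow the usual HKY85 form, with $\kappa$ multiplying transitions:

$$
q_{ij} =
\begin{cases}
\kappa\,\pi_j & \text{if } i \to j \text{ is a transition } (\mathrm{A} \leftrightarrow \mathrm{G},\ \mathrm{C} \leftrightarrow \mathrm{T}),\\
\pi_j & \text{if } i \to j \text{ is a transversion},
\end{cases}
\qquad q_{ii} = -\sum_{j \ne i} q_{ij}.
$$

The matrix is then divided by the expected rate $-\sum_i \pi_i q_{ii}$. For $\pi = (0.4, 0.2, 0.2, 0.2)$ and $\kappa = 3$, the unscaled diagonal is $(-1.0, -1.2, -1.6, -1.2)$, so the divisor is

$$
0.4(1.0) + 0.2(1.2) + 0.2(1.6) + 0.2(1.2) = 1.2.
$$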


```python
# Example HKY Q (reuses np, pd, expm, states and n from the JC69 cell)
pi = np.array([0.4, 0.2, 0.2, 0.2])  # A, C, G, T stationary frequencies
kappa = 3.0  # transition/transversion rate ratio

Q_hky = np.zeros((n, n))
for i, xi in enumerate(states):
    for j, xj in enumerate(states):
        if i != j:
            # transitions: A<->G, C<->T
            if (xi, xj) in [("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")]:
                Q_hky[i, j] = kappa * pi[j]
            else:
                # transversions
                Q_hky[i, j] = pi[j]
# set diagonals so each row sums to zero
np.fill_diagonal(Q_hky, -Q_hky.sum(axis=1))

# scale so the expected rate at stationarity, -sum_i pi_i q_ii, equals 1
rate = -np.sum(pi * np.diag(Q_hky))
Q_hky = Q_hky / rate

# compute transition matrix at t = 0.5
P_hky = expm(Q_hky * 0.5)
pd.DataFrame(P_hky, index=states, columns=states).round(6)
```


|   | A | C | G | T |
|---|---|---|---|---|
| A | 0.709081 | 0.068152 | 0.154616 | 0.068152 |
| C | 0.136304 | 0.633955 | 0.068152 | 0.161589 |
| G | 0.309231 | 0.068152 | 0.554465 | 0.068152 |
| T | 0.136304 | 0.161589 | 0.068152 | 0.633955 |
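Two quick checks on the HKY matrix follow, as a sketch that rebuilds the same scaled $Q$ so it runs on its own. First, the base frequencies $\pi$ are the stationary distribution ($\pi Q = 0$), so on a very long branch every row of $P(t)$ approaches $\pi$ regardless of the starting base. Second, after scaling, the expected rate is 1. The branch length 50 is an arbitrary "long" value chosen for illustration.

```python
import numpy as np
from scipy.linalg import expm

states = ["A", "C", "G", "T"]
pi = np.array([0.4, 0.2, 0.2, 0.2])  # A, C, G, T frequencies (as above)
kappa = 3.0
transitions = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

# Rebuild the scaled HKY Q so this cell runs on its own
Q = np.zeros((4, 4))
for i, a in enumerate(states):
    for j, b in enumerate(states):
        if i != j:
            Q[i, j] = (kappa if (a, b) in transitions else 1.0) * pi[j]
np.fill_diagonal(Q, -Q.sum(axis=1))
Q /= -np.sum(pi * np.diag(Q))  # expected rate 1

print(np.allclose(pi @ Q, 0.0))   # pi is stationary: pi Q = 0
print(-np.sum(pi * np.diag(Q)))   # expected rate after scaling
print(expm(Q * 50.0).round(6))    # every row approaches pi on a long branch
```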
