# **Markov Chains** 

Markov chains are a model for describing systems that move from **state to state** via random transitions.



Here is a simple example: 

Imagine that we run a restaurant that serves exactly one of three items each day:
1) Burgers
2) Pizzas 
3) Hotdogs 

The states and the transitions between them are shown below:

<img src="https://i.ytimg.com/vi/i3AkTO9HLXo/maxresdefault.jpg" alt="Image Alt Text" width="600" height="315">

But what do all those numbers and lines mean?
* They represent the probabilities of going from one state to another 
* Each state represents the menu item which is served that day 

Looking at our Markov Chain, we can observe the following states and their probabilities: 
* If we have a burger on a specific day, then the next day we will have:  
    * A burger again with probability $0.2$
    * A pizza with probability $0.6$
    * A hotdog with probability $0.2$

* If we have pizza on a specific day, then the next day we will have:  
    * A burger with probability $0.3$
    * A pizza again with probability $0$
    * A hotdog with probability $0.7$

* If we have a hotdog on a specific day, then the next day we will have:  
    * A burger with probability $0.5$
    * A pizza with probability $0$
    * A hotdog again with probability $0.5$


### **A More Formal Setup**

We define our state space $K = \{1, 2, \dots, K\}$ for a finite number of states (the symbol $K$ is used for both the set of states and its size).

Transition matrix: $P$, which is a $K$ by $K$ real matrix satisfying: 
* $P(i,j) \geq 0$       $\forall i,j$ $\in K$ (non-negative)
    * This represents the probability of going from state $i$ to state $j$

* $\sum_{j} P(i,j) = 1$          $\forall i$ $\in$ $K$
    * All the probabilities going out of a state sum to $1$
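
The two conditions above are easy to check mechanically. The following is a minimal sketch, assuming the matrix is stored as a list of rows and using the burger, pizza, and hotdog probabilities listed earlier; the helper name `is_transition_matrix` and the tolerance `tol` are illustrative choices, not part of the original notes.

```python
# Rows = today's item, columns = tomorrow's item (Burger, Pizza, Hotdog).
P = [
    [0.2, 0.6, 0.2],  # from Burger
    [0.3, 0.0, 0.7],  # from Pizza
    [0.5, 0.0, 0.5],  # from Hotdog
]


def is_transition_matrix(P, tol=1e-9):
    """Return True if P is square, non-negative, and every row sums to 1."""
    k = len(P)
    for row in P:
        if len(row) != k:
            return False  # not a K by K matrix
        if any(p < 0 for p in row):
            return False  # violates P(i, j) >= 0
        if abs(sum(row) - 1.0) > tol:
            return False  # violates sum_j P(i, j) = 1 (up to rounding)
    return True


print(is_transition_matrix(P))
```

The tolerance is there only because floating-point sums such as `0.2 + 0.6 + 0.2` may differ from `1` by a tiny rounding error.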

Given any $X_0 \in K$, define a random sequence $X_0, X_1, X_2, \dots$ by 

$$P[X_{n+1} = j | X_n = i, X_{n-1}, ... X_0] = P(i,j)$$

This is the **Markov property**: the probability of transitioning to the next state depends **only on** the current state $X_n = i$, and not on the sequence of states that preceded it.

More generally, $X_0$ may itself be random, with a probability distribution on $K$.

For the burger situation, here is the transition matrix: 

$$
\text{Transition Matrix} = 
\begin{array}{c|ccc}
& \text{Burger} & \text{Pizza} & \text{Hotdog} \\
\hline
\text{Burger} & 0.2 & 0.6 & 0.2 \\
\text{Pizza} & 0.3 & 0 & 0.7 \\
\text{Hotdog} & 0.5 & 0 & 0.5 \\
\end{array}
$$
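
To see the chain in action, one can sample a sequence of daily menu items from this matrix. This is a minimal sketch using only the standard library; the function name `simulate`, the state indexing (0 = Burger, 1 = Pizza, 2 = Hotdog), and the fixed seed are assumptions made for illustration.

```python
import random

STATES = ["Burger", "Pizza", "Hotdog"]
P = [
    [0.2, 0.6, 0.2],  # from Burger
    [0.3, 0.0, 0.7],  # from Pizza
    [0.5, 0.0, 0.5],  # from Hotdog
]


def simulate(P, x0, n_steps, rng):
    """Sample X_0, ..., X_n; each step draws the next state from row P[current]."""
    path = [x0]
    for _ in range(n_steps):
        current = path[-1]
        # The next state depends only on the current state (Markov property).
        nxt = rng.choices(range(len(P)), weights=P[current])[0]
        path.append(nxt)
    return path


rng = random.Random(0)  # seeded so the run is repeatable
path = simulate(P, 0, 10, rng)
print([STATES[s] for s in path])
```

Because $P(\text{Pizza}, \text{Pizza}) = 0$ and $P(\text{Hotdog}, \text{Pizza}) = 0$, a sampled path never shows pizza two days in a row, and pizza only ever follows a burger day.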




### **Matrix-vector formulation**

Let $\pi_{n}$ be a row vector describing the probability distribution over states after $n$ transitions, i.e. 

$\pi_n (i) = P[X_{n} = i]$
* So each element of the row vector represents the probability of being in the corresponding state after $n$ transitions

Given a certain $\pi_n$, what does $\pi_{n+1}$ look like? 

$\rightarrow \pi_{n+1}(j) = \sum_{i \in K} \pi_{n}(i) \cdot P(i,j)$

So: $\pi_{n+1} = \pi_{n}P$ 
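
The update rule is a row vector times a matrix, which can be sketched in a few lines of plain Python. The helper name `step` and the choice of starting on a burger day with certainty are assumptions for illustration.

```python
def step(pi, P):
    """One transition: pi_next(j) = sum over i of pi(i) * P(i, j)."""
    k = len(P)
    return [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]


# Burger / Pizza / Hotdog chain from the example above.
P = [
    [0.2, 0.6, 0.2],
    [0.3, 0.0, 0.7],
    [0.5, 0.0, 0.5],
]

pi0 = [1.0, 0.0, 0.0]  # assumed start: a burger day with certainty
pi1 = step(pi0, P)     # the Burger row of P
pi2 = step(pi1, P)     # distribution two days later
print(pi1)
print(pi2)
```

Starting from a single known state, one step simply reads off the corresponding row of $P$; further steps mix the rows according to the current distribution.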

Let us consider the following example: 

Consider a two-state Markov chain with the following transition matrix:

$$ P = \begin{pmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{pmatrix} $$

This matrix represents the probabilities of transitioning between the two states. For example, $ P_{1,1} = 0.8 $ represents the probability of transitioning from state 1 to state 1, and $ P_{2,1} = 0.4 $ represents the probability of transitioning from state 2 to state 1.

Now, let's assume we start with an initial probability distribution $ \pi_0 = [0.5, 0.5] $. This means that at the beginning, there's an equal probability of being in either state 1 or state 2.

To find the probability distribution after 2 transitions ( $ \pi_2 $ ), we apply the equation $ \pi_{n+1} = \pi_{n}P $ twice:

$$ \pi_2 = \pi_0 P^2 $$

$$ \pi_2 = [0.5, 0.5] \times \begin{pmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{pmatrix}^2 $$

$$ \pi_2 = [0.5, 0.5] \times \begin{pmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{pmatrix} \times \begin{pmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{pmatrix} $$

$$ \pi_2 = [0.5, 0.5] \times \begin{pmatrix} (0.8 \times 0.8 + 0.2 \times 0.4) & (0.8 \times 0.2 + 0.2 \times 0.6) \\ (0.4 \times 0.8 + 0.6 \times 0.4) & (0.4 \times 0.2 + 0.6 \times 0.6) \end{pmatrix} $$

$$ \pi_2 = [0.5, 0.5] \times \begin{pmatrix} 0.72 & 0.28 \\ 0.56 & 0.44 \end{pmatrix} $$

$$ \pi_2 = [0.5 \times 0.72 + 0.5 \times 0.56, 0.5 \times 0.28 + 0.5 \times 0.44] $$

$$ \pi_2 = [0.64, 0.36] $$

So after 2 transitions, the probability distribution over the two states is exactly $ \pi_2 = [0.64, 0.36] $. 
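
The hand calculation can be double-checked with a small matrix product routine. This is a sketch in plain Python; `matmul` is a hypothetical helper written here for illustration, and the row vector $\pi_0$ is stored as a $1 \times 2$ matrix.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    inner = len(B)
    cols = len(B[0])
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(A))
    ]


P = [[0.8, 0.2], [0.4, 0.6]]
P2 = matmul(P, P)            # two-step transition matrix P^2
pi0 = [[0.5, 0.5]]           # row vector as a 1 x 2 matrix
pi2 = matmul(pi0, P2)[0]     # pi_2 = pi_0 P^2

print([round(x, 4) for row in P2 for x in row])
print([round(x, 4) for x in pi2])
```

Up to floating-point rounding, `P2` matches the matrix $\begin{pmatrix} 0.72 & 0.28 \\ 0.56 & 0.44 \end{pmatrix}$ and `pi2` matches $[0.64, 0.36]$ from the derivation.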

### **Invariant Distribution**

* Also known as the stationary distribution 

Definition: A distribution $\pi$ over $K$ is **invariant** for $P$ if $\pi P = \pi$
* In other words, $\pi$ does not change under the action of $P$
 
* So it is a probability distribution over the states of a Markov chain that remains unchanged over time

* Once a Markov chain reaches its invariant distribution, the probability of finding the system in each state stays fixed, and further transitions do not alter these probabilities

Note that if $\pi_0$ is invariant, then: 

$$ \pi_n = \pi_0 P^{n} = \pi_0 \quad \forall n$$

Finding an invariant distribution: the condition $\pi P = \pi$ corresponds to $K$ linear equations, one for each state $j$. These are referred to as the **balance equations**:

$$ \pi(j) = \sum_{i \in K} \pi(i) P(i,j) $$

* For each state $j$, the probability of being in state $j$ after a transition, $\pi(j)$, should equal the sum over all states $i$ (including $j$ itself) of the probability of being in state $i$, $\pi(i)$, multiplied by the probability of transitioning from $i$ to $j$, $P(i,j)$

* The primary purpose of these **balance equations** is to find the equilibrium distribution of the Markov chain
* Together with the normalization $\sum_{i} \pi(i) = 1$, they tell us whether the chain has a unique invariant distribution

### **Balance Equations Example**


Let's work through an example with a $2 \times 2$ transition matrix for a Markov chain.

Suppose we have the following transition matrix:

$$
P = \begin{pmatrix} 0.8 & 0.2 \\ 0.4 & 0.6 \end{pmatrix}
$$

To find the invariant distribution $(\pi = [\pi_1, \pi_2])$, we'll set up the balance equations for each state $(j)$:

1. For state 1:
   $$
   \pi_1 = \pi_1 \cdot 0.8 + \pi_2 \cdot 0.4
   $$

2. For state 2:
   $$
   \pi_2 = \pi_1 \cdot 0.2 + \pi_2 \cdot 0.6
   $$

Writing the coefficients in front, we have the following system of equations:

1. For state 1:
   $$
   \pi_1 = 0.8\pi_1 + 0.4\pi_2
   $$

2. For state 2:
   $$
   \pi_2 = 0.2\pi_1 + 0.6\pi_2
   $$

To solve this system of equations, we can use various methods such as substitution, elimination, or matrix operations. Let's use substitution:

From equation 1, we can express $\pi_1$ in terms of $\pi_2$:

$$
\pi_1 - 0.8\pi_1 = 0.4\pi_2
$$
$$
0.2\pi_1 = 0.4\pi_2
$$
$$
\pi_1 = 2\pi_2
$$

Now, we substitute this expression for $(\pi_1)$ into equation 2:

$$
\pi_2 = 0.2(2\pi_2) + 0.6\pi_2
$$
$$
\pi_2 = 0.4\pi_2 + 0.6\pi_2
$$
$$
\pi_2 = 1\pi_2
$$

So we are left with $\pi_2 = \pi_2$, which holds for any value of $\pi_2$. The second balance equation gives no new information, because the two balance equations are linearly dependent.

Therefore, any solution of the balance equations has the form $[\pi_1, \pi_2] = [2k, k]$ for some real number $k$.

To pin down $k$, we use the fact that $\pi$ is a probability distribution, so $\pi_1 + \pi_2 = 1$:

$$
2k + k = 1
$$
$$
3k = 1
$$
$$
k = \frac{1}{3}
$$

Now, we can find $(\pi_1)$:

$$
\pi_1 = 2 \times \frac{1}{3} = \frac{2}{3}
$$

Therefore, the specific values of $(\pi_1)$ and $(\pi_2)$ are $(\frac{2}{3})$ and $(\frac{1}{3})$, respectively.

Hence, the invariant distribution $(\pi)$ is:

$$
\pi = \left[\frac{2}{3}, \frac{1}{3}\right]
$$
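
The same steps generalize to any two-state chain: the first balance equation reduces to $\pi_1 P(1,2) = \pi_2 P(2,1)$, and normalization then fixes both values. Below is a sketch using exact fractions; the function name `two_state_invariant` is a hypothetical helper, and it assumes at least one of the off-diagonal entries is positive.

```python
from fractions import Fraction


def two_state_invariant(P):
    """Solve pi_1 * P(1,2) = pi_2 * P(2,1) together with pi_1 + pi_2 = 1."""
    a = P[0][1]  # probability of leaving state 1
    b = P[1][0]  # probability of leaving state 2
    if a + b == 0:
        raise ValueError("both states are absorbing: every distribution is invariant")
    return [b / (a + b), a / (a + b)]


P = [
    [Fraction(4, 5), Fraction(1, 5)],  # 0.8, 0.2
    [Fraction(2, 5), Fraction(3, 5)],  # 0.4, 0.6
]

pi = two_state_invariant(P)
# Check invariance directly: (pi P)(j) = sum over i of pi(i) * P(i, j)
pi_next = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]
print(pi, pi_next == pi)
```

Using `Fraction` avoids rounding, so the check `pi_next == pi` is an exact comparison rather than an approximate one.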

### **Convergence to Invariant Distribution**

Informal Theorem: Under mild conditions, a Markov chain converges to a unique invariant distribution, for any initial distribution $\pi_0$.

Two conditions together guarantee convergence to a unique invariant distribution:

**1) Irreducibility**

Definition: A Markov chain with transition matrix $P$ is **irreducible** if: 
* $\forall i, j \in K$ $\exists n$ such that $[P^n](i,j) > 0$
* So for each $i$ and $j$, there exists a path of transitions leading from $i$ to $j$ (we can reach any state from any other state, possibly after some number of steps)
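
Irreducibility only depends on which entries of $P$ are positive, so it can be checked as a graph reachability problem. The following is a sketch under that view; the helper names `reachable_from` and `is_irreducible` are illustrative, and a state is counted as reachable from itself in zero steps.

```python
def reachable_from(P, start):
    """Set of states reachable from `start` along positive-probability transitions."""
    seen = {start}  # `start` reaches itself in zero steps
    stack = [start]
    while stack:
        i = stack.pop()
        for j, p in enumerate(P[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def is_irreducible(P):
    """True if every state can reach every other state."""
    k = len(P)
    return all(len(reachable_from(P, i)) == k for i in range(k))


BURGER = [
    [0.2, 0.6, 0.2],
    [0.3, 0.0, 0.7],
    [0.5, 0.0, 0.5],
]
# Assumed counterexample: state 0 is absorbing, so it never reaches state 1.
ABSORBING = [
    [1.0, 0.0],
    [0.5, 0.5],
]

print(is_irreducible(BURGER), is_irreducible(ABSORBING))
```

In the restaurant chain, hotdog days never lead directly to pizza, but pizza is still reachable from hotdog through a burger day, which is why the chain counts as irreducible.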

**2) Aperiodicity**

Definition: A Markov chain with transition matrix $P$ is **aperiodic** if 
* $\forall i \in K$, $\gcd \{ n \geq 1 : [P^n](i,i) > 0 \} = 1$

* In other words, the greatest common divisor of the set of all possible return times to each state is $1$, so the chain does not return to any state only at multiples of some fixed period

* If the gcd of these return times is greater than $1$, the chain can return to the state only at multiples of that gcd, which is called the period of the state
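
The gcd of the return times can be estimated numerically by tracking which states are reachable in exactly $n$ steps. This is a rough sketch: the helper name `period` and the cutoff `max_n` are assumptions, and only return times up to `max_n` are examined, so the cutoff should be chosen comfortably larger than the number of states.

```python
from math import gcd


def period(P, i, max_n=50):
    """gcd of all n in 1..max_n with [P^n](i, i) > 0 (0 if no return is seen)."""
    k = len(P)
    # reach[j] is True when state j can be reached from i in exactly n steps
    reach = [j == i for j in range(k)]
    g = 0
    for n in range(1, max_n + 1):
        reach = [any(reach[a] and P[a][j] > 0 for a in range(k)) for j in range(k)]
        if reach[i]:
            g = gcd(g, n)  # gcd(0, n) == n, so the first return time seeds g
    return g


FLIP = [[0.0, 1.0], [1.0, 0.0]]  # assumed example: always switch state
BURGER = [
    [0.2, 0.6, 0.2],
    [0.3, 0.0, 0.7],
    [0.5, 0.0, 0.5],
]

print(period(FLIP, 0))
print([period(BURGER, i) for i in range(3)])
```

The flip chain can only return to a state after an even number of steps, so its period is $2$. In the restaurant chain, pizza can recur after $2$ days (pizza, burger, pizza) or after $3$ days (pizza, burger, burger, pizza), and $\gcd(2, 3) = 1$.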

### **Fundamental Theorem of Markov Chains**

If $P$ is irreducible and aperiodic, then it has a unique invariant distribution $\pi$ with $\pi(i) > 0$ for all $i$. Also, the distribution after $n$ steps converges to $\pi$ as $n \rightarrow \infty$ for any initial distribution $\pi_0$.

So: 
* $\forall i \in K$: $P[X_n = i] \rightarrow \pi(i)$ as $n \rightarrow \infty$
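
The convergence can be observed numerically by repeatedly applying $\pi_{n+1} = \pi_n P$ from several starting points. This sketch reuses the two-state matrix from the earlier examples; the helper names `step` and `iterate` and the choice of 50 steps are illustrative assumptions.

```python
def step(pi, P):
    """One transition: pi_next(j) = sum over i of pi(i) * P(i, j)."""
    k = len(P)
    return [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]


def iterate(pi0, P, n):
    """Distribution after n transitions, starting from pi0."""
    pi = list(pi0)
    for _ in range(n):
        pi = step(pi, P)
    return pi


P = [[0.8, 0.2], [0.4, 0.6]]
starts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
results = [iterate(pi0, P, 50) for pi0 in starts]

for pi0, pi in zip(starts, results):
    print(pi0, "->", [round(x, 6) for x in pi])
```

All three starting distributions end up close to $\left[\frac{2}{3}, \frac{1}{3}\right]$, the invariant distribution found from the balance equations, which is what the theorem predicts for this irreducible and aperiodic chain.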

