In [2]:
import numpy as np
import numpy.linalg as la
import matplotlib.pyplot as plt

# Introduction to Markov Chains

A Markov chain is a mathematical model that describes a set of states and the probabilities of transitioning between them. In this simple example, we use a Markov chain to model the weather. Each day is in one of two states: Sunny or Snowy. Suppose that, after collecting weather data for many years, you observed that the probability of a snowy day following a snowy day is 90%, and the probability of a snowy day following a sunny day is 70%.

We can see this visually in the following graph. Can you work out how the remaining numbers were obtained? Recall that the probabilities of all transitions out of a given state must sum to 100%.

<img src="weather_graph.png" width=446px></img>

This is a *directed graph* because its *edges have a direction*. Unsurprisingly, we can represent it with a matrix, much as we built the adjacency matrix, using the following convention: the columns of the matrix correspond to outgoing edges (the state we move *from*), and the rows correspond to incoming edges (the state we move *to*):

<img src="weather_matrix.png" width=305px></img>

Hence each entry of the matrix is given by:

$$ M_{ij} = \text{probability of moving from } j \text{ to } i $$

The matrix above is called a **Markov matrix** (or transition matrix), and it has the following properties:

- The entry $M_{ij}$ is the probability of transitioning from state $j$ to state $i$.

- Since the entries are probabilities, they are non-negative real numbers, and each column sums to 1: from any state, the chain must move to *some* state (possibly the same one) on the next step.
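Both properties can be checked mechanically. Below is a minimal sketch of a hypothetical helper, `is_markov_matrix` (not part of NumPy or of this notebook), that tests non-negativity and the column sums up to a floating-point tolerance. It uses a small made-up example matrix rather than the weather matrix, so the exercise that follows is still left for you.

```python
import numpy as np

def is_markov_matrix(A, tol=1e-12):
    """Hypothetical helper: True if A is square, has non-negative
    entries, and every column sums to 1 (within tol)."""
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        return False
    nonnegative = np.all(A >= 0)
    # axis=0 sums down each column, matching M_ij = P(move from j to i)
    columns_sum_to_one = np.allclose(A.sum(axis=0), 1.0, atol=tol)
    return bool(nonnegative and columns_sum_to_one)

# A made-up column-stochastic example: each column sums to 1
A = np.array([[0.5, 0.2],
              [0.5, 0.8]])
print(is_markov_matrix(A))    # True
# Its transpose has rows (not columns) summing to 1
print(is_markov_matrix(A.T))  # False
```

Summing over columns (`axis=0`) rather than rows is the important design choice here: with the convention $M_{ij}$ = probability of moving from $j$ to $i$, it is the columns that must sum to 1.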

**Try this!**

Write the matrix above as a 2D NumPy array and assign it to the variable `M`.

In [8]:
#clear
M = np.array([[0.3, 0.1], [0.7, 0.9]])

Now that we have created the model, we can use it to calculate various probabilities. Suppose today is a sunny day, which we can represent by a state vector that is 100% Sunny and 0% Snowy.

**Try this!**

Write this initial vector as a 1D NumPy array, where the first entry corresponds to Sunny and the second entry corresponds to Snowy. Recall that the entries of a state vector should sum to 1. Assign it to the variable `x`.

In [4]:
#clear
x = np.array([1.0, 0.0])

If we multiply the transition matrix by the state vector, we get the probability of each type of weather tomorrow:

In [5]:
x1 = M @ x
x1

array([0.3, 0.7])

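This simply reproduces the first column of ${\bf M}$ (the transition probabilities out of Sunny), so it gives us no new information. Let's see what happens when we multiply the resulting vector by ${\bf M}$ again: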

In [6]:
x2 = M @ x1
x2

array([0.16, 0.84])
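Written out entry by entry, this second multiplication sums over the two possible weather states for tomorrow:

$$ {\bf M}{\bf x}_1 = \begin{bmatrix} 0.3 & 0.1 \\ 0.7 & 0.9 \end{bmatrix} \begin{bmatrix} 0.3 \\ 0.7 \end{bmatrix} = \begin{bmatrix} 0.3 \times 0.3 + 0.1 \times 0.7 \\ 0.7 \times 0.3 + 0.9 \times 0.7 \end{bmatrix} = \begin{bmatrix} 0.16 \\ 0.84 \end{bmatrix} $$

For example, the probability of sun two days from now is $0.3 \times 0.3$ (sunny tomorrow, then sunny again) plus $0.1 \times 0.7$ (snowy tomorrow, then sunny).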

Now we have "simulated" the Markov chain for two steps, which gives the weather probabilities _two_ days from now. What would happen if we multiplied the state vector by the matrix a large number of times?
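Since each multiplication advances the chain by one day, applying ${\bf M}$ twice is the same as applying the matrix product ${\bf M}^2 = {\bf M}{\bf M}$ once, and in general the distribution $k$ days ahead is ${\bf M}^k {\bf x}$. The short sketch below checks this with `numpy.linalg.matrix_power`; the variable names `x2_stepwise`, `x2_power`, and `x3` are illustrative only.

```python
import numpy as np
import numpy.linalg as la

M = np.array([[0.3, 0.1], [0.7, 0.9]])
x = np.array([1.0, 0.0])  # today: 100% Sunny, 0% Snowy

# Two steps of the chain, applied one at a time
x2_stepwise = M @ (M @ x)
# The same two steps, applied at once through the matrix power M^2
x2_power = la.matrix_power(M, 2) @ x
# Three days ahead
x3 = la.matrix_power(M, 3) @ x

print(x2_stepwise)  # [0.16 0.84]
print(x2_power)     # [0.16 0.84]
print(x3)           # [0.132 0.868]
```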

**Try this!**

Write a loop that left-multiplies the state vector by ${\bf M}$ (i.e., computes ${\bf Mx}$) $15$ times, printing each intermediate result. Start your iterations from the state vector defined above as `x`.

In [7]:
xc = x.copy()
# Write loop here
#clear
for i in range(15):
    xc = M @ xc
    print(xc)

[0.3 0.7]
[0.16 0.84]
[0.132 0.868]
[0.1264 0.8736]
[0.12528 0.87472]
[0.125056 0.874944]
[0.1250112 0.8749888]
[0.12500224 0.87499776]
[0.12500045 0.87499955]
[0.12500009 0.87499991]
[0.12500002 0.87499998]
[0.125 0.875]
[0.125 0.875]
[0.125 0.875]
[0.125 0.875]


After enough iterations, the state vector converges to a steady state ${\bf x}^*$, and multiplying this steady state by the Markov matrix no longer changes it, i.e.

$$ {\bf M}{\bf x}^* = {\bf x}^* $$

Note that this is an eigenvalue problem, where $(1,{\bf x}^*)$ is an eigenpair. Indeed, we have found an eigenvector of ${\bf M}$ with corresponding eigenvalue $\lambda = 1$!
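As a cross-check, the same eigenvector can be obtained directly from `numpy.linalg.eig`. This is a sketch: `eig` returns eigenvectors scaled to unit 2-norm with an arbitrary sign, so the code rescales the chosen eigenvector so that its entries sum to 1, like a probability vector.

```python
import numpy as np
import numpy.linalg as la

M = np.array([[0.3, 0.1], [0.7, 0.9]])

# la.eig returns the eigenvalues and a matrix whose columns are eigenvectors
eigvals, eigvecs = la.eig(M)

# Pick the column whose eigenvalue is (numerically) closest to 1
k = np.argmin(np.abs(eigvals - 1.0))
v = eigvecs[:, k].real

# Rescale so the entries sum to 1; this also removes the arbitrary sign
xstar = v / np.sum(v)

print(eigvals)  # the two eigenvalues, 1 and 0.2 (order not guaranteed)
print(xstar)    # approximately [0.125 0.875]
```

The other eigenvalue, $0.2$, is consistent with how fast the loop above converged: the first entry's distance from $0.125$ goes $0.175, 0.035, 0.007, 0.0014, \dots$, shrinking by a factor of $0.2$ at every step.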

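Computing an eigenvector this way is called the [*Power Iteration method*](https://en.wikipedia.org/wiki/Power_iteration). It can be used to find the eigenvector corresponding to the *dominant* eigenvalue (the eigenvalue largest in magnitude), provided that eigenvalue is strictly larger in magnitude than all the others and the starting vector has a nonzero component along its eigenvector.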
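For a general matrix, the length of the iterate grows or shrinks roughly like $|\lambda_1|^k$, so power iteration is usually written with a rescaling step after every multiplication. Our Markov matrix did not need this: because its columns sum to 1, multiplying by ${\bf M}$ preserves the sum of the entries of ${\bf x}$. The function below, `power_iteration_normalized`, is a hypothetical sketch (not the notebook's `power_iteration()`) that normalizes to unit 2-norm and estimates the eigenvalue with the Rayleigh quotient.

```python
import numpy as np

def power_iteration_normalized(A, x0, num_iters=100):
    """Hypothetical general-purpose power iteration (a sketch).

    Rescales the iterate to unit 2-norm after every multiplication so it
    neither overflows nor underflows, and estimates the dominant eigenvalue
    with the Rayleigh quotient x^T A x / x^T x.
    """
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)
    for _ in range(num_iters):
        y = A @ x
        x = y / np.linalg.norm(y)
    eigenvalue = x @ (A @ x)  # Rayleigh quotient; x already has unit norm
    return eigenvalue, x

# A matrix whose columns do NOT sum to 1; its eigenvalues are 3 and 1
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, v = power_iteration_normalized(A, np.array([1.0, 0.0]))
print(lam)  # approximately 3.0
print(v)    # approximately [0.70710678 0.70710678], i.e. (1, 1)/sqrt(2)

# The weather matrix: dominant eigenvalue 1
M = np.array([[0.3, 0.1], [0.7, 0.9]])
lam_M, v_M = power_iteration_normalized(M, np.array([1.0, 0.0]))
print(lam_M)            # approximately 1.0
print(v_M / v_M.sum())  # approximately [0.125 0.875]
```

Normalizing by the 2-norm is one common choice; for probability vectors, dividing by the sum of the entries (as in the last line) gives the more natural scaling.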

**Check your answers!**

Implement the function `power_iteration()` that takes a matrix `M` and a starting vector `x`, and computes the eigenvector corresponding to the dominant eigenvalue (as you did above).

For simplicity, use $100$ iterations for your loop.

In [9]:
#grade_clear
#clear
def power_iteration(M, x):
    # Perform power iteration and return steady state vector xstar
    xc = x.copy()
    #clear
    for i in range(100):
        xc = M @ xc
    #clear
    return xc
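A fixed count of $100$ iterations is simple but arbitrary. A common alternative is to stop once successive iterates stop changing. The variant below, `power_iteration_tol`, and its `tol` and `max_iters` parameters are hypothetical additions for illustration, not part of the exercise.

```python
import numpy as np

def power_iteration_tol(M, x, tol=1e-12, max_iters=1000):
    """Hypothetical variant of power_iteration() that stops early.

    Iterates x <- M x until successive iterates differ by less than tol
    in the infinity norm, or until max_iters multiplications are done.
    Returns the final vector and the number of multiplications performed.
    """
    xc = np.asarray(x, dtype=float).copy()
    for k in range(1, max_iters + 1):
        x_next = M @ xc
        if np.linalg.norm(x_next - xc, np.inf) < tol:
            return x_next, k
        xc = x_next
    return xc, max_iters

M = np.array([[0.3, 0.1], [0.7, 0.9]])
xstar, iters = power_iteration_tol(M, np.array([1.0, 0.0]))
print(xstar, iters)  # xstar is approximately [0.125 0.875]
```

Because the error here shrinks by a factor of $0.2$ per step, far fewer than $100$ multiplications are needed for this matrix; a matrix whose second eigenvalue is close to 1 in magnitude would need many more.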

Run your `power_iteration()` function on `M` and a new vector,
$$ {\bf x} = \begin{bmatrix} 0.5 \\ 0.5\end{bmatrix} $$

Do you get the same result as before?

In [10]:
power_iteration(M, np.array([0.5, 0.5]))

array([0.125, 0.875])

As long as the starting state vector `x` is normalized (the entries add up to one), the steady state solution will be the same. There is one caveat to this statement, which we will discuss in the next section.
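Why normalization matters: since every column of ${\bf M}$ sums to 1, multiplying by ${\bf M}$ preserves the sum of the entries of ${\bf x}$. Power iteration is also linear in its starting vector, so a starting vector whose entries sum to 2 converges to twice the steady state. The helper `repeat_multiply` below is a hypothetical stand-in for `power_iteration()`, written out so the snippet is self-contained.

```python
import numpy as np

M = np.array([[0.3, 0.1], [0.7, 0.9]])

def repeat_multiply(M, x, num_iters=100):
    # Hypothetical stand-in for power_iteration(): repeated multiplication by M
    xc = np.asarray(x, dtype=float).copy()
    for _ in range(num_iters):
        xc = M @ xc
    return xc

x_unnormalized = np.array([1.0, 1.0])  # entries sum to 2, not 1
print(np.sum(M @ x_unnormalized))      # approximately 2: the sum is preserved
print(repeat_multiply(M, x_unnormalized))  # approximately [0.25 1.75] = 2 * [0.125 0.875]
```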

Take a look at the code snippet below. Notice that the steady state solution does not change, regardless of the initial vector (here generated at random).

In [16]:
# run this as many times as you want, the bottom vector should always stay the same!
random_vector = np.random.rand(2)
random_vector /= np.sum(random_vector) # normalize

print(random_vector)
print(power_iteration(M, random_vector))

[0.28563721 0.71436279]
[0.125 0.875]
