# Markov Chain Monte Carlo

Markov chain Monte Carlo (MCMC) is used when the samples are not independent. Under the **Markov property**, the distribution of the next sample depends only on the most recent sample.

A Markov chain is a stochastic process. Other stochastic processes include those studied in queueing theory, Brownian motion, and the Poisson process.

Monte Carlo uses simulated random variables (R.V.s) to approximate integrals and similar quantities, but the R.V.s do not need to be independent for the approximation to work. MCMC constructs a dependent sequence of R.V.s that can be used to approximate integrals just as in ordinary Monte Carlo. The advantage of introducing this dependence is that very general "black box" algorithms (and corresponding theory) are available to perform the required simulations. This page discusses some basics of Markov chains and MCMC, but note that important questions about how and when MCMC works remain unanswered.
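To make the claim about dependence concrete, here is a minimal sketch that estimates the integral $E[X^2] = 1$ for $X \sim N(0,1)$ twice: once from independent draws, and once from a dependent sequence. The dependent sequence is an AR(1) chain whose stationary distribution is $N(0,1)$; this chain, the value `rho = 0.9`, and the sample size are illustrative assumptions, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Ordinary Monte Carlo: independent draws from N(0, 1).
iid_draws = rng.standard_normal(n)
iid_estimate = np.mean(iid_draws ** 2)

# Dependent draws: an AR(1) chain X_t = rho * X_{t-1} + sqrt(1 - rho^2) * Z_t.
# Started from N(0, 1), every X_t is N(0, 1), but consecutive values are correlated.
rho = 0.9  # assumed value, chosen only for illustration
noise = rng.standard_normal(n)
chain = np.empty(n)
chain[0] = rng.standard_normal()
for t in range(1, n):
    chain[t] = rho * chain[t - 1] + np.sqrt(1 - rho ** 2) * noise[t]
dependent_estimate = np.mean(chain ** 2)

print("independent draws:", iid_estimate)
print("dependent draws:  ", dependent_estimate)
```

Both averages should land near 1, although the dependent estimate is typically noisier for the same number of draws.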

## Definition

A Markov chain is a sequence of R.V.s $\{X_0, X_1, X_2, \dots\}$ with a specific type of dependence structure.
In particular, a Markov chain satisfies

$$P(X_{n+1} \in B \mid X_0, \dots, X_{n-1}, X_n) = P(X_{n+1} \in B \mid X_n)$$

where $X_{n+1}$ is the future, $X_0, \dots, X_{n-1}$ is the past, and $X_n$ is the present.
Therefore, this property states that, given the present, the future does not depend on the past.
This is called the *Markov property*.

A sequence of independent R.V.s is a trivial Markov chain: the next value does not depend on any earlier value, including the present one.

From the Markov property, we can argue that the probabilistic properties of the chain are completely determined by:

i. initial distribution for $X_0$

ii. the transition distribution, i.e. distribution of $X_{n+1}$ given $X_n$

Note: We assume throughout that the Markov chain is homogeneous, i.e. the transition distribution does not depend on $n$.
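The two ingredients above translate directly into a simulation recipe: draw $X_0$ from the initial distribution, then repeatedly draw $X_{n+1}$ from the transition distribution given $X_n$. The sketch below is one possible way to write this; the helper name `simulate_chain` and the toy initial and transition samplers are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_chain(sample_initial, sample_transition, n_steps, rng):
    """Simulate X_0, ..., X_{n_steps} of a homogeneous Markov chain.

    sample_initial(rng)       -> a draw of X_0
    sample_transition(x, rng) -> a draw of X_{n+1} given X_n = x
    The same transition sampler is used at every step (homogeneity).
    """
    path = [sample_initial(rng)]
    for _ in range(n_steps):
        path.append(sample_transition(path[-1], rng))
    return path

rng = np.random.default_rng(1)

# Toy example (an assumption, not from the notes): start at 0 and move to
# half the current value plus standard normal noise.
path = simulate_chain(
    sample_initial=lambda rng: 0.0,
    sample_transition=lambda x, rng: 0.5 * x + rng.standard_normal(),
    n_steps=10,
    rng=rng,
)
print(path)
```

Note that `sample_transition` only ever sees the current value `path[-1]`, which is exactly the Markov property.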

Example: **simple random walk**
Let $U_1, U_2, \dots$ be iid, each equal to $-1$ or $+1$ with probability $1/2$.

Set $X_0 = 0$ and $X_n = \sum_{i=1}^n U_i = X_{n-1} + U_n$.
The initial distribution is $P(X_0 = 0) = 1$.
The transition distribution is determined by $$X_n = \begin{cases}
X_{n-1}-1 & \text{with prob. } 1/2\\
X_{n-1}+1 & \text{with prob. } 1/2
\end{cases}$$

While very simple, the random walk is an important example in probability theory, with connections to more advanced objects such as Brownian motion: under suitable rescaling of time and space, the random walk converges to Brownian motion.
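The simple random walk can be simulated in a few lines. The sketch below draws many walks at once with steps of $-1$ or $+1$, each with probability $1/2$, and checks that the variance of $X_n$ is close to $n$, which is the scaling behind the link to Brownian motion. The number of walks and steps are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_walks, n_steps = 5_000, 400  # illustrative sizes

# Each step is -1 or +1 with probability 1/2.
steps = rng.choice([-1, 1], size=(n_walks, n_steps))

# X_0 = 0 and X_n = X_{n-1} + U_n, so each path is a cumulative sum of steps.
paths = np.concatenate(
    [np.zeros((n_walks, 1), dtype=int), np.cumsum(steps, axis=1)], axis=1
)

endpoints = paths[:, -1]
print("mean of X_n:    ", endpoints.mean())            # should be close to 0
print("var of X_n / n: ", endpoints.var() / n_steps)   # should be close to 1
```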

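```python
from pprint import pprint

import numpy as np

# Task: estimate the transition probabilities needed to predict the next number.

# Sample set 1: the digits 0-9 repeated 100 times, followed by a single 3.
X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9] * 100 + [3])

total_count_dict = {}      # total_count_dict[i]: number of transitions out of state i
dependent_count_dict = {}  # dependent_count_dict[i][j]: number of i -> j transitions
for i in range(len(X) - 1):
    # Convert to plain ints so the dictionary keys print cleanly.
    current, following = int(X[i]), int(X[i + 1])

    if current not in total_count_dict:
        total_count_dict[current] = 0
    total_count_dict[current] += 1

    if current not in dependent_count_dict:
        dependent_count_dict[current] = {}
    if following not in dependent_count_dict[current]:
        dependent_count_dict[current][following] = 0
    dependent_count_dict[current][following] += 1

# Normalize each transition count by the number of transitions out of its starting state
for k, d in dependent_count_dict.items():
    for key in d.keys():
        d[key] /= total_count_dict[k]

print("Transition Matrix:")
pprint(dependent_count_dict)
```

```
Transition Matrix:
{0: {1: 1.0},
 1: {2: 1.0},
 2: {3: 1.0},
 3: {4: 1.0},
 4: {5: 1.0},
 5: {6: 1.0},
 6: {7: 1.0},
 7: {8: 1.0},
 8: {9: 1.0},
 9: {0: 0.99, 3: 0.01}}
```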
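The estimated transition probabilities can be used for the stated task of predicting the next number. As a minimal sketch, the block below copies the probabilities from the output above, arranges them in a matrix whose rows sum to 1, and predicts the most probable next state. The names `P_hat` and `predict_next` are hypothetical helpers introduced here for illustration.

```python
import numpy as np

# Estimated transition probabilities, copied from the output above.
transition = {
    0: {1: 1.0}, 1: {2: 1.0}, 2: {3: 1.0}, 3: {4: 1.0}, 4: {5: 1.0},
    5: {6: 1.0}, 6: {7: 1.0}, 7: {8: 1.0}, 8: {9: 1.0},
    9: {0: 0.99, 3: 0.01},
}

# Arrange the nested dictionary as a matrix: row = current state, column = next state.
states = sorted(transition)
index = {state: k for k, state in enumerate(states)}
P_hat = np.zeros((len(states), len(states)))
for i, row in transition.items():
    for j, p in row.items():
        P_hat[index[i], index[j]] = p

def predict_next(state):
    """Return the most probable next state given the current state."""
    return max(transition[state], key=transition[state].get)

print(P_hat.sum(axis=1))  # every row of a transition matrix should sum to 1
print(predict_next(9))
```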

## Brownian Motion

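Repeated plays of a game, as in the gambler's ruin problem, trace out a random walk in the gambler's fortune; Brownian motion is the continuous limit of such walks.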
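The gambler's ruin problem can be explored by simulation. The sketch below assumes a fair game in which the gambler wins or loses one unit per play and stops on reaching 0 (ruin) or a target fortune. The starting fortune of 3, the target of 10, and the number of games are illustrative assumptions. For a fair game, the probability of reaching the target before ruin is the starting fortune divided by the target.

```python
import random

def reaches_target(start, target, rng):
    """Play a fair +/-1 game from `start`; return True if `target` is hit before 0."""
    fortune = start
    while 0 < fortune < target:
        fortune += 1 if rng.random() < 0.5 else -1
    return fortune == target

rng = random.Random(0)
start, target, n_games = 3, 10, 5_000  # assumed values for illustration

wins = sum(reaches_target(start, target, rng) for _ in range(n_games))
win_fraction = wins / n_games

print("simulated:", win_fraction)
print("theory:   ", start / target)
```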

## Discrete time Markov Chain (DTMC)

A DTMC takes values in a discrete state space and satisfies the Markov property in the form

$$P(X_{n+1}=j \mid X_n = i, X_{n-1}, \dots, X_1, X_0) = P(X_{n+1}=j \mid X_n=i)$$

If the MC is homogeneous, then the one-step transition probabilities do not depend on $n$: $p_{ij} = P(X_{n+1}=j \mid X_n=i) = P(X_1 = j \mid X_0 = i) = P(X_2=j \mid X_1 = i) = P(X_3=j \mid X_2=i) = \dots$

We can collect the $p_{ij}$ over the state space $S = \{ 1, 2, \dots, m \}$ into a transition matrix:

$$
P = \begin{bmatrix}
p_{11} & p_{12} & ... & p_{1m}\\
p_{21} & p_{22} & ... & p_{2m}\\
\vdots & \vdots & \ddots & \vdots\\
p_{m1} & p_{m2} & ... & p_{mm}\\
\end{bmatrix}
$$

To describe a DTMC completely, we need the distribution of $X_0$ (the initial state) and $P$ (the transition probabilities).
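As a sketch of why these two ingredients suffice: if $\mu_0$ is the row vector of initial probabilities, the distribution of $X_n$ is $\mu_0 P^n$, and sample paths can be drawn one step at a time from the rows of $P$. The 3-state matrix below is a made-up example, and the code uses 0-based state labels rather than the $1, \dots, m$ labels used in the text.

```python
import numpy as np

# Hypothetical 3-state transition matrix (each row sums to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]])
mu0 = np.array([1.0, 0.0, 0.0])  # start in state 0 with probability 1

def distribution_after(mu0, P, n):
    """Distribution of X_n, computed as mu0 @ P^n."""
    return mu0 @ np.linalg.matrix_power(P, n)

def simulate_dtmc(P, initial_state, n_steps, rng):
    """Draw X_0, ..., X_{n_steps}, sampling each step from the row of the current state."""
    path = [initial_state]
    for _ in range(n_steps):
        path.append(int(rng.choice(len(P), p=P[path[-1]])))
    return path

dist_2 = distribution_after(mu0, P, 2)
rng = np.random.default_rng(2)
path = simulate_dtmc(P, initial_state=0, n_steps=20, rng=rng)

print("distribution of X_2:", dist_2)
print("one sample path:    ", path)
```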

Example: Two-state DTMC

$S = \{ 1, 2 \}$  and $P = \begin{bmatrix} \alpha&1-\alpha\\ 1-\beta&\beta\\ \end{bmatrix}$

where $0 < \alpha < 1$ and $0 < \beta < 1$.

$p_{11} = \alpha = P(X_1=1 | X_0 = 1)$
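The meaning of $p_{11} = \alpha$ can be checked empirically. The sketch below simulates the two-state chain and estimates $p_{11}$ as the fraction of visits to state 1 that are followed by state 1 again. The values $\alpha = 0.7$ and $\beta = 0.4$, the starting state, and the run length are assumptions chosen for illustration.

```python
import numpy as np

alpha, beta = 0.7, 0.4  # hypothetical values with 0 < alpha, beta < 1
P = np.array([[alpha, 1 - alpha],
              [1 - beta, beta]])

rng = np.random.default_rng(3)
n_steps = 50_000
states = np.empty(n_steps + 1, dtype=int)
states[0] = 1  # states are labeled 1 and 2, as in the text
uniforms = rng.random(n_steps)

for t in range(n_steps):
    current = states[t]
    stay_prob = P[current - 1, current - 1]  # alpha in state 1, beta in state 2
    if uniforms[t] < stay_prob:
        states[t + 1] = current
    else:
        states[t + 1] = 3 - current  # switch between 1 and 2

# Estimate p_11: among steps that start in state 1, the fraction that stay in state 1.
from_1 = states[:-1] == 1
p11_hat = np.mean(states[1:][from_1] == 1)

print("estimated p_11:", p11_hat)
print("alpha:         ", alpha)
```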