In [1]:
import numpy as np
import matplotlib.pyplot as plt

Regarding the notation used

- $\pi^{(i)} \rightarrow \,$ distribution corresponding to time step $\, i; \,$ $\, \pi^{(i)} = [P(X_i = 0) \quad P(X_i = 1) \quad P(X_i = 2) \quad \dots \quad P(X_i = r)]. \,$
- $\pi \rightarrow \,$ stationary distribution.
- $\pi_i \rightarrow \,$ long-run probability of being in state $\, i \,$ under the stationary distribution; $\, \pi_i = \underset{k \rightarrow \infty}{\text{lim}} P(X_k = i), \,$ where $\, k \,$ is the number of steps taken (i.e., the time step index). For example, if the chain converges, then at $\, k = 1 \, \text{million} \,$ the probability $\, P(X_k = i) \,$ that the Markov chain is in state $\, i \,$ is a very close approximation of $\, \pi_i. \,$

<h2>Task 1</h2>

**(a)** 

$$ P =
\begin{bmatrix} 
0.3 & 0.5 & 0.2 \\
0.3 & 0.4 & 0.3 \\
0.6 & 0.2 & 0.2
\end{bmatrix}
$$
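A quick sanity check (a sketch, not part of the original task) that $\, P \,$ is a valid transition matrix: every entry is nonnegative and every row sums to 1.

```python
import numpy as np

P = np.array([[0.3, 0.5, 0.2],
              [0.3, 0.4, 0.3],
              [0.6, 0.2, 0.2]])

# A valid transition matrix has nonnegative entries and rows that sum to 1
print((P >= 0).all(), np.allclose(P.sum(axis=1), 1.0))  # True True
```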

**(b)**

Let $\, X_0 = 1. \,$ This means that the initial (probability) distribution is $\, \pi^{(0)} = [1 \quad 0 \quad 0]. \,$

Hence the distribution of $\, X_1 \,$ is $\, \pi^{(1)} = \pi^{(0)} P \,$ and the distribution of $\, X_2 \,$ is $\, \pi^{(2)} = \pi^{(1)} P. \,$

In [2]:
P = np.array([0.3, 0.5, 0.2, 0.3, 0.4, 0.3, 0.6, 0.2, 0.2]).reshape(3,3)
pi0 = np.array([1, 0, 0])[np.newaxis, :]

In [3]:
# The distribution of X1
pi0 @ P

array([[0.3, 0.5, 0.2]])

In [4]:
# The distribution of X2
(pi0 @ P) @ P

array([[0.36, 0.39, 0.25]])
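Since $\, \pi^{(2)} = \pi^{(1)} P = \pi^{(0)} P^2, \,$ the distribution after $\, k \,$ steps is $\, \pi^{(k)} = \pi^{(0)} P^k \,$ in general. A short sketch using `numpy.linalg.matrix_power`:

```python
import numpy as np

P = np.array([[0.3, 0.5, 0.2],
              [0.3, 0.4, 0.3],
              [0.6, 0.2, 0.2]])
pi0 = np.array([1.0, 0.0, 0.0])   # X_0 = 1, i.e. all mass on the first state

# k-step distribution: pi^(k) = pi^(0) P^k
for k in (1, 2, 10):
    print(k, pi0 @ np.linalg.matrix_power(P, k))
# k = 2 reproduces the distribution of X2 computed above
```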

**(c)**

In [5]:
def stationary_distr(n, pi, P):
    """
    Args:
        n: number of steps to take
        pi: initial (probability) distribution of each state
        P: transition matrix
    """
    for i in range(n):
        pi = pi @ P
    return pi.ravel()
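The loop above always runs all `n` steps, even though convergence is fast here: the two non-unit eigenvalues of this $\, P \,$ form a complex pair with modulus $\, \sqrt{0.03} \approx 0.17, \,$ so asymptotically the distance to $\, \pi \,$ shrinks by roughly that factor per step. A sketch that stops once successive iterates agree (`stationary_power`, `tol`, and `max_steps` are illustrative names, not from the original):

```python
import numpy as np

def stationary_power(P, pi0, tol=1e-12, max_steps=10_000):
    """Power iteration pi <- pi P that stops once two successive
    iterates differ by less than tol in every entry."""
    pi = np.asarray(pi0, dtype=float).ravel()
    for step in range(1, max_steps + 1):
        nxt = pi @ P
        if np.max(np.abs(nxt - pi)) < tol:
            return nxt, step
        pi = nxt
    return pi, max_steps  # no convergence within max_steps

P = np.array([[0.3, 0.5, 0.2],
              [0.3, 0.4, 0.3],
              [0.6, 0.2, 0.2]])
pi, steps = stationary_power(P, [1, 0, 0])
print(pi, steps)  # converges in a few dozen steps at most, not a million
```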

In [6]:
stat_distr = stationary_distr(n=1000000, pi=pi0, P=P)

In [7]:
stat_distr

array([0.37168142, 0.38938053, 0.23893805])
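Instead of iterating, $\, \pi \,$ can also be obtained directly by solving $\, \pi P = \pi \,$ together with the normalization $\, \sum_i \pi_i = 1. \,$ A sketch using `numpy.linalg.lstsq` (`stationary_by_solve` is a hypothetical name); for this $\, P \,$ the exact answer works out to $\, (42, 44, 27)/113, \,$ consistent with the output above.

```python
import numpy as np

def stationary_by_solve(P):
    """Solve pi P = pi together with sum(pi) = 1 as one linear system."""
    r = P.shape[0]
    # Balance equations (P^T - I) pi^T = 0 stacked with the normalization row
    A = np.vstack([P.T - np.eye(r), np.ones((1, r))])
    b = np.zeros(r + 1)
    b[-1] = 1.0
    # The stacked system is overdetermined but consistent, so least squares
    # returns its exact solution (up to rounding)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

P = np.array([[0.3, 0.5, 0.2],
              [0.3, 0.4, 0.3],
              [0.6, 0.2, 0.2]])
pi = stationary_by_solve(P)
print(pi)
print(np.allclose(pi, np.array([42, 44, 27]) / 113))  # True
```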

**(d)**

A Markov chain is said to be *time reversible* if

$$ \large\pi_i P_{ij} = \pi_j P_{ji} \quad \quad \forall i,j \in \mathbb{S} $$

where $\, \pi \,$ is the stationary distribution of the Markov chain and $\, P \,$ is the transition matrix.
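Equivalently, the flow matrix $\, F \,$ with entries $\, F_{ij} = \pi_i P_{ij} \,$ must be symmetric, which can be checked without explicit loops. A sketch with `is_reversible` as a hypothetical helper name; it uses the exact stationary distribution $\, (42, 44, 27)/113 \,$ of this $\, P, \,$ which agrees with the numerical result computed above to the printed digits. A symmetric two-state matrix with the uniform distribution is included as a contrasting reversible example.

```python
import numpy as np

def is_reversible(P, pi, atol=1e-10):
    """Detailed balance holds iff F[i, j] = pi_i * P_ij is a symmetric matrix."""
    F = pi[:, None] * P               # probability flow from state i to state j
    return np.allclose(F, F.T, atol=atol)

P = np.array([[0.3, 0.5, 0.2],
              [0.3, 0.4, 0.3],
              [0.6, 0.2, 0.2]])
pi = np.array([42, 44, 27]) / 113     # stationary distribution of P

print(is_reversible(P, pi))           # False: e.g. F[0, 1] != F[1, 0]

# A symmetric P with uniform pi always satisfies detailed balance
Q = np.array([[0.2, 0.8],
              [0.8, 0.2]])
print(is_reversible(Q, np.array([0.5, 0.5])))  # True
```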

In [8]:
def check_reversibility(P, pi):
    forward = np.zeros(P.size)
    backward = np.zeros(P.size)
    k = 0
    for i in range(P.shape[0]):
        for j in range(P.shape[1]):
            forward[k] = pi[i] * P[i,j]
            backward[k] = pi[j] * P[j,i]
            k += 1
    return forward, backward

In [9]:
check_reversibility(P=P, pi=stat_distr)

(array([0.11150442, 0.18584071, 0.07433628, 0.11681416, 0.15575221,
        0.11681416, 0.14336283, 0.04778761, 0.04778761]),
 array([0.11150442, 0.11681416, 0.14336283, 0.18584071, 0.15575221,
        0.04778761, 0.07433628, 0.11681416, 0.04778761]))

- The Markov chain is not time reversible, since the condition above fails. For example, the second entries of the two arrays give $\, \pi_1 P_{12} \approx 0.186 \,$ but $\, \pi_2 P_{21} \approx 0.117. \,$ If the chain were reversible, the two arrays would agree at every index position.

<h2>Task 2</h2>

In [10]:
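def generate_Markov_chain(s, n):
    """
    Simulate n steps of the chain that starts at X_0 = 0 and, at every step,
    resets to 0 with probability s or moves from i to i+1 with probability 1-s.
    """
    x = np.zeros(n)
    x[0] = 0
    for i in range(1, n):
        # Choose between resetting (state 0) and stepping up (state x[i-1] + 1)
        x[i] = np.random.choice([0, x[i-1] + 1], p=[s, 1 - s])
    return x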

In [11]:
marko = generate_Markov_chain(s=1/2, n=100000)
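The loop above calls `np.random.choice` once per step, which is slow for long chains. Because the state at time $\, t \,$ is simply the number of steps since the last reset, the whole path can be generated at once. A vectorized sketch (`simulate_reset_chain` is a hypothetical name; it uses NumPy's `default_rng` Generator rather than the legacy `np.random` functions):

```python
import numpy as np

def simulate_reset_chain(s, n, seed=None):
    """Vectorized simulation: X_0 = 0, then reset to 0 with probability s
    or step up by one with probability 1 - s."""
    rng = np.random.default_rng(seed)
    reset = rng.random(n) < s          # True where X_t is reset to 0
    reset[0] = True                    # X_0 = 0
    t = np.arange(n)
    # Index of the most recent reset at or before each time step
    last_reset = np.maximum.accumulate(np.where(reset, t, 0))
    # The state is the number of steps since the last reset
    return t - last_reset

x = simulate_reset_chain(s=1/2, n=100_000, seed=0)
print(x.min(), x.max())
```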

**(a)**

In [13]:
print(f'Empirical state space: [{marko.min()}, {marko.max()}]')

Empirical state space: [0.0, 14.0]


The theoretical state space $\, \mathbb{S} \,$ is the set of all nonnegative integers, $\, \mathbb{S} = \{0, 1, 2, \dots\}. \,$ This is because at each step the chain either resets to 0 (with probability $\, s = 0.5$) or moves from $\, i \,$ to $\, i+1 \,$ (with probability $\, 1-s = 0.5$), so every nonnegative integer can be reached. The empirical maximum of 14 only reflects the longest run without a reset in this particular simulation.

In a Markov chain, each element $\, P(i,j) \,$ of the transition matrix is the probability of moving from state $\, i \,$ to state $\, j. \,$ Hence, for any current state $\, i, \,$ the probability of moving to state 0 is 

$$ P(i,0) = \frac{1}{2}. $$

Likewise, the probability of moving up one state is

$$ P(i, i+1) = \frac{1}{2}. $$

For all the other states $\, j \ne 0, \,$ $\, j \ne i+1, \,$ we have

$$ P(i, j) = 0. $$

Putting these together we get

$$
P(i,j) =
\begin{cases} 
\frac{1}{2}, & \text{if} \, \, j=0 \, \, \text{or} \, \, j=i+1 \\
0, & \text{otherwise}
\end{cases}
$$
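The piecewise definition can be written as a small function. `transition_prob` is a hypothetical helper name, and the reset probability `s` is kept as a parameter (defaulting to 1/2) so the same function also covers a general $\, s. \,$ Since only $\, j = 0 \,$ and $\, j = i+1 \,$ carry mass, summing over $\, j = 0, \dots, i+1 \,$ covers a whole row, which should sum to 1.

```python
def transition_prob(i, j, s=0.5):
    """Hypothetical helper: P(i, j) for the reset chain with reset probability s."""
    if j == 0:
        return s          # reset to state 0
    if j == i + 1:
        return 1 - s      # step up from i to i + 1
    return 0.0            # every other transition is impossible

# Only j = 0 and j = i + 1 carry mass, so j = 0, ..., i + 1 covers a whole row
row_sums = [sum(transition_prob(i, j) for j in range(i + 2)) for i in range(5)]
print(row_sums)  # [1.0, 1.0, 1.0, 1.0, 1.0]
```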

**(b)**

$\pi = [\pi_0 \quad \pi_1 \quad \pi_2 \quad \dots] \quad \Rightarrow \quad$ the entries $\, \pi_i \,$ represent the long-run proportion of time that the Markov chain spends in state $\, i. \,$

In [30]:
np.unique(marko, return_counts=True)[1] / marko.size

array([5.0121e-01, 2.5099e-01, 1.2421e-01, 6.2210e-02, 3.0790e-02,
       1.5310e-02, 7.5600e-03, 3.8500e-03, 1.9300e-03, 1.0200e-03,
       4.6000e-04, 2.8000e-04, 1.2000e-04, 4.0000e-05, 2.0000e-05])

In [32]:
(np.unique(marko, return_counts=True)[1] / marko.size).round(3)

array([0.501, 0.251, 0.124, 0.062, 0.031, 0.015, 0.008, 0.004, 0.002,
       0.001, 0.   , 0.   , 0.   , 0.   , 0.   ])

- The empirical distribution (above) is obtained by counting how many times each state appeared and dividing each count by the total number of steps taken.

To compare with theory, we derive $\, \pi \,$ analytically, i.e., the limiting distribution as the number of steps $\, n \rightarrow \infty. \,$ The stationary distribution satisfies the balance equations

$$ \pi_j = \sum_{i} \pi_i P(i, j) \quad \forall j. $$

For state 0, using $\, P(i,0) = \frac{1}{2} \,$ for every $\, i, \,$ this gives

$$ \pi_0 = \sum_{i=0}^{\infty} \pi_i P(i,0) = \sum_{i=0}^{\infty} \pi_i \cdot \frac{1}{2} = \frac{1}{2} \sum_{i=0}^{\infty} \pi_i = \frac{1}{2} \cdot 1 = \frac{1}{2}. $$

where the last equality follows from the fact that $\, \pi \,$ is a probability distribution, so its elements must sum up to 1.

The Markov chain is defined such that the only way to reach state $\, i \ge 1 \,$ is from state $\, i-1 \,$ (state 1 only from state 0, state 2 only from state 1, and so on), so the balance equation for state $\, i \,$ has a single term:

$$ \pi_1 = \pi_0 P(0,\boldsymbol{1}) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4} $$

$$ \pi_2 = \pi_1 P(1,\boldsymbol{2}) = \frac{1}{4} \cdot \frac{1}{2} = \frac{1}{8} $$

$$ \pi_3 = \pi_2 P(2,\boldsymbol{3}) = \frac{1}{8} \cdot \frac{1}{2} = \frac{1}{16} $$

and so on. Each step multiplies by $\, \frac{1}{2}, \,$ i.e., $\, \pi_i = \frac{1}{2} \pi_{i-1}; \,$ unrolling this recursion gives the closed-form formula

$$ \pi_i = \left(\frac{1}{2} \right)^{i+1}, $$

which matches the empirical result.
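As a numerical sanity check on the derivation, one can truncate $\, \pi \,$ at a finite number of states $\, N \,$ (an assumption: the omitted tail has mass $\, (1/2)^N, \,$ which is negligible for $\, N = 60$) and verify the normalization and the two balance equations used above.

```python
import numpy as np

N = 60                               # truncation level; omitted tail mass is (1/2)^N
i = np.arange(N)
pi = 0.5 ** (i + 1)                  # closed form pi_i = (1/2)^(i+1)

print(np.isclose(pi.sum(), 1.0))                # probabilities sum to (almost exactly) 1
print(np.isclose(pi[0], 0.5 * pi.sum()))        # pi_0 = sum_i pi_i * P(i, 0)
print(np.allclose(pi[1:], 0.5 * pi[:-1]))       # pi_i = pi_{i-1} * P(i-1, i)
```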

In [38]:
def f(i):
    return (1/2)**(i+1)

In [49]:
# Closed-form formula (1/2)^(i+1)
for i in range(len(np.unique(marko))):
    print(np.round(f(i), 3))

0.5
0.25
0.125
0.062
0.031
0.016
0.008
0.004
0.002
0.001
0.0
0.0
0.0
0.0
0.0


In [47]:
# Empirical result
for val in (np.unique(marko, return_counts=True)[1] / marko.size).round(3):
    print(val)

0.501
0.251
0.124
0.062
0.031
0.015
0.008
0.004
0.002
0.001
0.0
0.0
0.0
0.0
0.0


**(c)**

We'll use the stationary condition once again

$$ \pi_j = \sum_{i} \pi_i P(i, j) \quad \forall j. $$

The only way to get to state $\, j \ge 1 \,$ is from $\, j-1, \,$ and this happens with probability $\, 1-s. \,$ For state 0, exactly as before, $\, \pi_0 = \sum_i \pi_i \, s = s. \,$ Hence we have

$$ \pi_1 = \pi_0 (1-s) = s (1-s) $$

$$ \pi_2 = \pi_1 (1-s) = s (1-s) (1-s) $$

$$ \pi_3 = \pi_2 (1-s) = s (1-s) (1-s) (1-s) $$

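and so on. Hence, for $\, 0 < s < 1, \,$ we get a closed-form formula for any element of the stationary distribution, namely a geometric distribution on $\, \{0, 1, 2, \dots\} \,$ (its terms sum to 1, and $\, s = \frac{1}{2} \,$ recovers part (b)):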

$$ \pi_j = s (1-s)^j. $$
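To check $\, \pi_j = s(1-s)^j \,$ for a value of $\, s \,$ other than $\, \frac{1}{2}, \,$ here is a sketch that simulates the chain with $\, s = 0.3 \,$ (an arbitrary choice) and compares the empirical state frequencies with the formula. `simulate_reset_chain_loop` is a hypothetical helper that mirrors the simulation from Task 2 but draws all uniforms up front with NumPy's `Generator` API.

```python
import numpy as np

def simulate_reset_chain_loop(s, n, seed=None):
    """Hypothetical helper: X_0 = 0; reset to 0 w.p. s, else step up by one."""
    rng = np.random.default_rng(seed)
    u = rng.random(n)                  # one uniform draw per step
    x = np.zeros(n, dtype=int)
    for t in range(1, n):
        x[t] = 0 if u[t] < s else x[t - 1] + 1
    return x

s = 0.3
x = simulate_reset_chain_loop(s, n=200_000, seed=0)
empirical = np.bincount(x) / x.size               # proportion of time in each state
theoretical = s * (1 - s) ** np.arange(empirical.size)
print(np.abs(empirical - theoretical).max())      # small for long runs
```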

<h2>Task 3</h2>

- Let $\, X = (X_1,...,X_n) \,$ be our original sample. 
- A bootstrap sample $\, X^j = (X_1^j,...,X_n^j) \,$ is formed by drawing $\, n \,$ independent samples from $\, X \,$ with replacement. 
- Let $\, Y^j = (Y_1^j,...,Y_n^j) \,$ be a random vector that counts how many times each original observation $\, X_i \,$ appears in the bootstrap sample $\, X^j. \,$ Hence $\, Y_i^j \,$ indicates how many times $\, X_i \,$ is drawn (chosen) from $\, n \,$ draws in the bootstrap sample $\, X^j. \,$

We will assume that all of the original data points $\, X_1,...,X_n \,$ are equally likely to be drawn. Hence, on each of the $\, n \,$ draws, each original data point $\, X_i \,$ is drawn with probability $\, \frac{1}{n}. \,$ 

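Since the $\, n \,$ draws are independent and each has the same $\, n \,$ equally likely outcomes, the count vector follows a multinomial distribution (see the definition on [this](https://en.wikipedia.org/wiki/Multinomial_distribution) Wikipedia page):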

$$ Y^j \sim \text{Multinomial} \left(n, \frac{1}{n}, \frac{1}{n}, ..., \frac{1}{n} \right), $$

where $\, n \,$ is the number of trials (draws) and $\, \frac{1}{n}, \frac{1}{n}, ..., \frac{1}{n} \,$ are the probabilities of each event happening (i.e., the probability of drawing each of the original samples $\, X_1,...,X_n).$
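A simulation sketch of this claim: build count vectors from actual bootstrap resamples and compare them with direct draws from NumPy's `Generator.multinomial`. `bootstrap_counts` is a hypothetical helper; $\, n = 10 \,$ and $\, b = 20000 \,$ are arbitrary choices. Each coordinate $\, Y_i^j \,$ is marginally $\, \text{Binomial}(n, \frac{1}{n}), \,$ so both sets of counts should show mean close to 1 and variance close to $\, 1 - \frac{1}{n}. \,$

```python
import numpy as np

def bootstrap_counts(n, b, seed=None):
    """Hypothetical helper: count vectors Y^1..Y^b from b bootstrap resamples."""
    rng = np.random.default_rng(seed)
    counts = np.empty((b, n), dtype=int)
    for j in range(b):
        idx = rng.integers(0, n, size=n)           # n draws with replacement
        counts[j] = np.bincount(idx, minlength=n)  # Y_i^j = times X_i was drawn
    return counts

n, b = 10, 20_000
Y_boot = bootstrap_counts(n, b, seed=1)
Y_multi = np.random.default_rng(2).multinomial(n, [1 / n] * n, size=b)

# Each Y_i is marginally Binomial(n, 1/n): mean 1, variance 1 - 1/n
print(Y_boot.mean(axis=0).round(2), Y_boot.var(axis=0).round(2))
print(Y_multi.mean(axis=0).round(2), Y_multi.var(axis=0).round(2))
```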

The second part of the question is ambiguous; one reading is the following. If we generate $\, b \,$ bootstrap samples independently, each producing a count vector $\, Y^j, \, j=1,...,b, \,$ then $\, Y^1, ..., Y^b \,$ are $\, b \,$ i.i.d. draws from $\, \text{Multinomial} \left(n, \frac{1}{n}, \frac{1}{n}, ..., \frac{1}{n} \right).$

<h2>Task 4</h2>

In [2]:
def jackknife(X):
    n = X.shape[0]
    jackknife_samples = np.zeros((n, n-1))
    for i in range(n):
        jackknife_samples[i, :] = np.delete(X, i)
    return jackknife_samples

In [3]:
X = np.arange(1,11)    # Toy data: the integers 1, ..., 10
samples = jackknife(X=X)

In [4]:
samples

array([[ 2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.],
       [ 1.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.],
       [ 1.,  2.,  4.,  5.,  6.,  7.,  8.,  9., 10.],
       [ 1.,  2.,  3.,  5.,  6.,  7.,  8.,  9., 10.],
       [ 1.,  2.,  3.,  4.,  6.,  7.,  8.,  9., 10.],
       [ 1.,  2.,  3.,  4.,  5.,  7.,  8.,  9., 10.],
       [ 1.,  2.,  3.,  4.,  5.,  6.,  8.,  9., 10.],
       [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  9., 10.],
       [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8., 10.],
       [ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.]])

- So there is no randomness involved in the jackknife subsamples once the original data X is fixed.
- At every iteration, we just leave one of the original data points out.

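Row $\, i \,$ of the matrix above contains every original data point except $\, X_i. \,$ Hence the count matrix $\, Y, \,$ whose entry $\, Y_{ij} \,$ counts how many times $\, X_j \,$ appears in the $\, i$-th jackknife sample, has zeros on the diagonal and ones everywhere else: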

In [15]:
n = X.shape[0]
Y = np.ones((n, n))
np.fill_diagonal(Y, 0)

In [16]:
Y

array([[0., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 0., 1., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 0., 1., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 0., 1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 0., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 0., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 0., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 0., 1., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 0., 1.],
       [1., 1., 1., 1., 1., 1., 1., 1., 1., 0.]])
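The count matrix also works as a boolean mask. A sketch, assuming the same toy data $\, X = 1, \dots, 10, \,$ that rebuilds all jackknife samples at once by selecting, row by row, the entries where $\, Y \,$ is 1:

```python
import numpy as np

X = np.arange(1, 11)                 # same toy data as above
n = X.size
Y = ~np.eye(n, dtype=bool)           # Y[i, j] is True iff X_j is in jackknife sample i

# Boolean indexing reads the mask row by row, so row i of the result is X without X_i
samples = np.broadcast_to(X, (n, n))[Y].reshape(n, n - 1)

print(Y.astype(int))                 # the count matrix: zeros on the diagonal, ones elsewhere
print(samples)
```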

- As there is no randomness, this matrix is **deterministic**.
- Its distribution is therefore a [degenerate distribution](https://en.wikipedia.org/wiki/Degenerate_distribution): all of the probability mass sits on this single matrix.