# Consistently Estimating Markov Chains with Noisy Aggregated Data

## Notebook with numerical experiments for the end-of-course seminar of Numerical Methods for Markov Chains (non-stationary version)

In [2]:
# Import some basic stuff
import numpy as np
import matplotlib.pyplot as plt
from utilities import P_mom_nonstationary, add_noise, generate_random_P

In [3]:
# Number of repeated observations
K = 30
# Population size
N = 100
# Number of states
S = 20
# Number of timesteps
T = 2000

### Generate the data

In [4]:
# True transition matrix
P = generate_random_P(S)

# Initial distribution
pi_0 = np.random.rand(S)
pi_0 = pi_0/pi_0.sum()

Let's generate the observed data, drawing all $K$ repeated observations at each timestep.

At $t=0$ we draw $n_0\sim\mathrm{Multinomial}(N,\pi_0)$. For $t>0$ we draw $n_t\sim\mathrm{Multinomial}(N,\mu_t)$, where $\mu_t=\pi_0^TP^t$ is the marginal state distribution at time $t$.
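
As a small self-contained check of the sampling scheme, the sketch below compares the closed form $\mu_t=\pi_0^TP^t$ with the step-by-step update $\mu_{t+1}=\mu_tP$ and then draws the repeated multinomial counts. All names ending in `_demo`, the small sizes, and the row-normalised random matrix standing in for `generate_random_P` are assumptions made for illustration, not the notebook's own utilities.

```python
import numpy as np

rng = np.random.default_rng(0)
S_demo, N_demo, K_demo = 4, 100, 3  # small hypothetical sizes

# Stand-in for generate_random_P: normalise the rows of a random positive matrix
P_demo = rng.random((S_demo, S_demo))
P_demo /= P_demo.sum(axis=1, keepdims=True)

# Random initial distribution, normalised to sum to 1
pi0_demo = rng.random(S_demo)
pi0_demo /= pi0_demo.sum()

t_demo = 5
# Closed form: mu_t = pi_0^T P^t
mu_closed = pi0_demo @ np.linalg.matrix_power(P_demo, t_demo)

# Iterative form: apply mu <- mu P a total of t times
mu_iter = pi0_demo.copy()
for _ in range(t_demo):
    mu_iter = mu_iter @ P_demo

# K independent draws of the aggregated counts at time t, shape (K, S)
n_demo = rng.multinomial(N_demo, mu_closed, size=K_demo)
print(np.allclose(mu_closed, mu_iter), n_demo.shape)
```

The iterative form avoids recomputing a matrix power at every timestep, which is presumably why a loop over $t$ is the natural way to generate the whole trajectory.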

In [5]:
mu_t = pi_0.T
n_t_vector = []
y_t_vector = []
for t in range(T):
    # create K observations of the observed data (multinomial draw)
    n_t = np.random.multinomial(n=N, pvals=mu_t, size=K)
    # create noisy observations
    y_t, _ = add_noise(n_t)
    # append the observations
    n_t_vector.append(n_t)
    y_t_vector.append(y_t)
    # update the distribution of x_t for the next iteration
    mu_t = np.dot(mu_t,P)

**Note**: `n_t_vector` and `y_t_vector` are lists of length $T$. The item at position $t\in\{0,\dots,T-1\}$ is a $K\times S$ `np.ndarray` holding the $K$ observations for the $(t+1)$-th timestep (timesteps counted from 1, so position 0 holds the draw from $\pi_0$).
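
If a single array is more convenient than a list of arrays, one possible approach is to stack the list along a new leading axis. The sketch below uses a random stand-in list with the same layout as `y_t_vector`; the `_demo` names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T_demo, K_demo, S_demo = 5, 3, 4  # small hypothetical sizes

# Stand-in for y_t_vector: a list of T arrays, each of shape (K, S)
y_list_demo = [rng.random((K_demo, S_demo)) for _ in range(T_demo)]

# Stack into one array of shape (T, K, S); index 0 is the timestep
y_arr_demo = np.stack(y_list_demo)

# Average over the K repeated observations at each timestep, shape (T, S)
y_mean_demo = y_arr_demo.mean(axis=1)
print(y_arr_demo.shape, y_mean_demo.shape)
```

With this layout, `y_arr_demo[t]` is the same $K\times S$ block as `y_list_demo[t]`.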

In [7]:
type(y_t_vector[0]), y_t_vector[0].shape

(numpy.ndarray, (30, 20))

### Estimators of $P$

In [6]:
# Let's fix a value of t, as an example
t = 1000
P_mom_t = P_mom_nonstationary(y_t_array = y_t_vector[t-1],
                              y_tp1_array = y_t_vector[t], 
                              A_t = np.eye(S), 
                              A_tp1 = np.eye(S),
                              N = N)
#print("The rows of P_mom_t sum to 1." if all(P_mom_t.sum(axis=1) == np.ones(S)) else "The rows of P_mom_t do not sum to 1.")

All sanity checks passed successfully.


In [9]:
np.linalg.norm(P_mom_t-P)

3.3102760572268592
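
The value above is a Frobenius norm, which is what `np.linalg.norm` returns for a 2-D array with default arguments. To put such a number in context, it may help to also look at a relative error and at the largest entrywise deviation, and to check row sums with a tolerance rather than exact floating-point equality. The sketch below illustrates this on a hypothetical perturbed matrix `P_hat_demo`; it is not the output of `P_mom_nonstationary`.

```python
import numpy as np

rng = np.random.default_rng(0)
S_demo = 4  # small hypothetical number of states

# Hypothetical "true" row-stochastic matrix
P_true_demo = rng.random((S_demo, S_demo))
P_true_demo /= P_true_demo.sum(axis=1, keepdims=True)

# Hypothetical estimate: the true matrix plus small Gaussian noise
P_hat_demo = P_true_demo + 0.01 * rng.standard_normal((S_demo, S_demo))

# Absolute Frobenius error (default norm for 2-D input)
abs_err = np.linalg.norm(P_hat_demo - P_true_demo)
# Error relative to the size of the true matrix
rel_err = abs_err / np.linalg.norm(P_true_demo)
# Largest entrywise deviation
max_abs = np.abs(P_hat_demo - P_true_demo).max()

# Row-sum check with a floating-point tolerance
rows_ok = np.allclose(P_true_demo.sum(axis=1), 1.0)
print(abs_err, rel_err, max_abs, rows_ok)
```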