# Consistently estimating Markov Chains with Noisy Aggregated Data

## Notebook with numerical experiments for the end-of-course seminar of Numerical Methods for Markov Chains (non-stationary version)

In [5]:
# to avoid the pain of restarting the kernel each time
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [6]:
# Import some basic stuff
import numpy as np
import matplotlib.pyplot as plt
from utilities import P_mom_nonstationary, add_noise, generate_random_P

In [7]:
# Number of repeated observations
K = 30
# Population size
N = 100
# Number of states
S = 20
# Number of timesteps
T = 2000

### Generate the data

In [8]:
# True transition matrix
P = generate_random_P(S)

# Initial distribution
pi_0 = np.random.rand(S)
pi_0 = pi_0/pi_0.sum()

The function `create_observations` generates $K$ independent observations for each of the $T$ timesteps of the aggregate process.

In the non-stationary case, this is done by drawing $n_0\sim\mathrm{Multinomial}(N,\pi_0)$ at $t=0$ and $n_t\sim\mathrm{Multinomial}(N,\mu_t)$ for $t>0$, where $\mu_t=\pi_0^TP^t$ is the marginal state distribution at time $t$.
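The sampling scheme just described can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code of `create_observations`: the helper name `sample_aggregate_counts` and its `seed` argument are hypothetical, the $K$ draws at each timestep are taken to be independent, and no observation noise is added.

```python
import numpy as np

def sample_aggregate_counts(pi_0, P, N, T, K, seed=0):
    """Hypothetical helper: draw K count vectors per timestep from
    Multinomial(N, mu_t), with mu_t = pi_0^T P^t (noise-free sketch)."""
    rng = np.random.default_rng(seed)
    S = len(pi_0)
    counts = np.empty((T, K, S), dtype=np.int64)
    mu = np.asarray(pi_0, dtype=float)
    for t in range(T):
        # At the start of iteration t, mu holds pi_0^T P^t.
        counts[t] = rng.multinomial(N, mu, size=K)
        mu = mu @ P
        mu = mu / mu.sum()  # guard against round-off drift
    return counts

# Tiny usage example with a 2-state chain (illustrative values only).
P_demo = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
pi_demo = np.array([0.5, 0.5])
counts_demo = sample_aggregate_counts(pi_demo, P_demo, N=100, T=5, K=3)
```

Updating `mu` by one matrix-vector product per step avoids forming $P^t$ explicitly, which matters when $T$ is in the thousands.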

In [21]:
from utilities import create_observations

n_t_vector, y_t_vector, A = create_observations(T,K,N,
                                                pi_0,
                                                stationary=False,
                                                P=P,
                                                noise_type='binomial',
                                                alpha=0.2)

#print(f'n_t_vector.shape = {n_t_vector.shape}, (T, K, S) = ({T}, {K}, {S})')

**Note**: `n_t_vector` and `y_t_vector` are `np.ndarray`s of shape $T\times K\times S$.

They can be thought of as lists of length $T$ in which the item at position $t\in\{0,\dots,T-1\}$ is a $K\times S$ `np.ndarray` containing the $K$ observations for timestep $t+1$.
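To make the indexing convention concrete, here is a small sketch on a stand-in array. The sizes and the array `y` are made up for illustration and are not the notebook's data.

```python
import numpy as np

# Small illustrative sizes, not the notebook's values.
T, K, S = 5, 3, 4
y = np.arange(T * K * S).reshape(T, K, S)  # stand-in for y_t_vector

obs = y[2]            # K x S block: all K observations stored at position 2
first_obs = y[2, 0]   # length-S vector: the first of those K observations

# Consecutive blocks, as needed by an estimator that uses two adjacent timesteps.
consecutive_pairs = list(zip(y[:-1], y[1:]))  # T-1 pairs of K x S arrays

# Average over the K repetitions: one length-S mean vector per timestep.
mean_counts = y.mean(axis=1)  # shape T x S
```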

### Estimators of $P$

In [24]:
# Let's fix a value of t, as an example
t = 1000
P_mom_t = P_mom_nonstationary(y_t_array = y_t_vector[t-1],
                              y_tp1_array = y_t_vector[t], 
                              A_t = np.eye(S), 
                              A_tp1 = np.eye(S),
                              N = N)
#print("The rows of P_mom_t sum to 1." if np.allclose(P_mom_t.sum(axis=1), 1.0) else "The rows of P_mom_t do not sum to 1.")
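For intuition about how two consecutive blocks of counts can identify $P$, here is a minimal moment-matching sketch. It is not the implementation of `P_mom_nonstationary`, whose internals are not shown here. It assumes noise-free counts (identity observation matrices) and that each row of the second block is obtained by moving the individuals counted in the matching row of the first block, so that $\mathbb{E}[n_{t+1}\mid n_t]=n_tP$. The helpers `step_counts` and `moment_estimate` are hypothetical names.

```python
import numpy as np

def step_counts(n_t, P, rng):
    """Move each of the n_t[i] individuals in state i according to row P[i]."""
    return sum(rng.multinomial(n_i, P[i]) for i, n_i in enumerate(n_t))

def moment_estimate(Y_t, Y_tp1):
    """Least-squares solution of Y_t @ P ~= Y_tp1 (hypothetical helper).

    With full column rank this equals (Y_t^T Y_t)^{-1} Y_t^T Y_tp1,
    i.e. the empirical second moments plugged into E[n_t^T n_{t+1}] = E[n_t^T n_t] P.
    """
    P_hat, *_ = np.linalg.lstsq(Y_t, Y_tp1, rcond=None)
    return P_hat

# Illustrative 3-state example (made-up values).
rng = np.random.default_rng(0)
P_demo = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4]])
pi_demo = np.array([0.3, 0.3, 0.4])
Y_t = rng.multinomial(1000, pi_demo, size=200)                  # 200 x 3
Y_tp1 = np.array([step_counts(n, P_demo, rng) for n in Y_t])    # 200 x 3
P_hat_demo = moment_estimate(Y_t.astype(float), Y_tp1.astype(float))
```

Because every row of both blocks sums to the same population size, the rows of the least-squares solution sum to 1 up to floating-point error, although individual entries are not constrained to lie in $[0,1]$.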

In [27]:
np.linalg.norm(P_mom_t-P,'fro')/(S**2)

0.00854914702747366
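A brief remark on the scaling of this error, with a sketch on made-up matrices: for an $S\times S$ matrix, the Frobenius norm divided by $S$ is the root-mean-square entry, so dividing by $S^2$ as in the cell above yields that quantity divided by a further factor $S$. The helper `scaled_errors` is a hypothetical name introduced only to show the alternatives side by side.

```python
import numpy as np

def scaled_errors(P_hat, P_true):
    """Hypothetical helper: a few scalings of the estimation error."""
    diff = np.asarray(P_hat, dtype=float) - np.asarray(P_true, dtype=float)
    S = diff.shape[0]
    fro = np.linalg.norm(diff, 'fro')
    return {
        "fro_over_S2": fro / S**2,        # the scaling used in the cell above
        "rms_entry": fro / S,             # root-mean-square entrywise error
        "max_abs_entry": np.abs(diff).max(),
    }

# Made-up 4 x 4 example where every entry differs by exactly 0.5.
errs = scaled_errors(np.full((4, 4), 0.5), np.zeros((4, 4)))
```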