# Stochastic Models in Neurocognition

## Class 7 - STATISTICS FOR PROCESSES

<hr>

**Preliminary Notes**:

Homework to send to Josue & Etienne before January 9th.

<hr>

## 1 - Likelihood

For a given model, $X\sim\mathbb{P}_\theta$, $g_\theta(x)=\mathbb{P}_\theta(X=x)$, one can compute the likelihood: $$\theta\rightarrow\mathcal{L}_\theta(X)=\mathbb{P}_\theta(X=x)|_{x=X}$$ 

The MLE is $\hat{\theta}=\underset{\theta\in\Theta}{\operatorname{argmax}}\,\,\mathcal{L}_\theta(X)=\underset{\theta\in\Theta}{\operatorname{argmax}}\,\,l_\theta(X)$ with $l_\theta=\log\mathcal{L}_\theta$ the log-likelihood.

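One needs $\theta$ to be the *minimal possible set of parameters*: a quantity that can be deduced from the others (e.g. $p_{0\rightarrow0}=1-p_{0\rightarrow1}$ in the Markov chain below) is not a separate parameter.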

### 1.1 Markov Chain

**Setup**: Markov chain $X_0, X_1, ..., X_n$ with two states, taking values in $\{0,1\}$. We have full knowledge of the setup if we know $p=P(X_0 = 1)$, $p_{0\rightarrow1}$ and $p_{1\rightarrow0}$.

**Example**: Binned spike trains (e.g. 0001000100100110..., at least one spike has happened at step 4) with a refractory period (after a spike, a neuron is less likely to produce another spike). 

<span style="color:red">ADD GRAPH (state 0 to 1 and back)</span>

**Parameter space:**

$$\theta = (p, p_{0\rightarrow1}, p_{1\rightarrow0})$$

For a given realisation $x_0,...,x_n$:

\begin{align}
P_\theta((X_0, ..., X_n) = (x_0,...,x_n)) &= P_\theta(X_0=x_0, ..., X_n=x_n)\\
&=P_\theta(X_0=x_0, ..., X_{n-1}=x_{n-1})\,P_\theta(X_n=x_n|X_0=x_0, ..., X_{n-1}=x_{n-1})\\
&=P_\theta(X_0=x_0, ..., X_{n-1}=x_{n-1})\,P_\theta(X_n=x_n|X_{n-1}=x_{n-1})\quad\text{(Markov property)}\\
&=P_\theta(X_0=x_0)\,P_\theta(X_1=x_1|X_0=x_0)\cdots P_\theta(X_n=x_n|X_{n-1}=x_{n-1})\\
&=p^{x_0}(1-p)^{1-x_0}\,p_{0\rightarrow1}^{n_{0\rightarrow1}}\,p_{1\rightarrow0}^{n_{1\rightarrow0}}\,p_{0\rightarrow0}^{n_{0\rightarrow0}}\,p_{1\rightarrow1}^{n_{1\rightarrow1}}
\end{align}

With $n_{i\rightarrow j}$ the number of transitions from $i$ to $j$ seen in the sequence $x_0,...,x_n$, $p_{0\rightarrow0}=1-p_{0\rightarrow1}$ and $p_{1\rightarrow1}=1-p_{1\rightarrow0}$.

**Likelihoods:**

So the likelihood of a sequence $X_0,...,X_n$, where $N_{i\rightarrow j}$ is the number of times a transition $i\rightarrow j$ occurs in the sequence, is:

$$L_\theta(X)=p^{X_0}(1-p)^{1-X_0}\,p_{0\rightarrow1}^{N_{0\rightarrow1}}\,(1-p_{0\rightarrow1})^{N_{0\rightarrow0}}\,p_{1\rightarrow0}^{N_{1\rightarrow0}}\,(1-p_{1\rightarrow0})^{N_{1\rightarrow1}}$$

The log-likelihood is:

$$l_\theta(X) = X_0\log(p) + (1-X_0)\log(1-p) + N_{1\rightarrow0}\log(p_{1\rightarrow0}) + N_{1\rightarrow1}\log(1 - p_{1\rightarrow0}) + N_{0\rightarrow0}\log(1 - p_{0\rightarrow1}) + N_{0\rightarrow1}\log(p_{0\rightarrow1})$$

**MLE of the transition**:

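Only two terms of the log-likelihood depend on $p_{1\rightarrow0}$, so:

$$\frac{\partial l_\theta(X)}{\partial p_{1\rightarrow0}} = \frac{N_{1\rightarrow0}}{p_{1\rightarrow0}} - \frac{N_{1\rightarrow1}}{1 - p_{1\rightarrow0}}$$

This derivative vanishes when $N_{1\rightarrow0}(1-p_{1\rightarrow0}) = N_{1\rightarrow1}\,p_{1\rightarrow0}$. It is positive below that value and negative above it, so the root is a maximum.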

Hence:

$$\hat{p}_{1\rightarrow0} = \frac{N_{1\rightarrow0}}{N_{1\rightarrow1}+N_{1\rightarrow0}}$$

i.e. the number of times "1 to 0" occurred divided by the number of times the chain left state "1". Similarly:

$$\hat{p}_{0\rightarrow1} = \frac{N_{0\rightarrow1}}{N_{0\rightarrow1}+N_{0\rightarrow0}}$$

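In general (for the tutorial), **without further constraint, the MLE for the transition i to j is equal to** $\frac{N_{i\rightarrow j}}{\sum_k N_{i\rightarrow k}}$, i.e. the number of transitions $i\rightarrow j$ divided by the total number of transitions leaving $i$. This agrees with the two formulas above.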
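As an illustration, here is a minimal R sketch of these estimators. The helper name `estimate_transitions` is hypothetical (not from the course), and the input is the 16 bins shown in the spike train example; it is only meant to show the counting, not a reference implementation.

```R
# Hypothetical helper: MLE of the transition probabilities of a two-state chain.
# `x` is a vector of 0/1 values (x_0, ..., x_n).
estimate_transitions <- function(x) {
  from <- factor(x[-length(x)], levels = c(0, 1))  # states at times 0..n-1
  to   <- factor(x[-1],         levels = c(0, 1))  # states at times 1..n
  counts <- table(from, to)                        # counts[i, j] = N_{i -> j}
  probs  <- prop.table(counts, margin = 1)         # divide each row by the departures from i
  list(counts = counts, probs = probs)
}

# The 16 bins shown in the binned spike train example
x <- as.integer(strsplit("0001000100100110", "")[[1]])
est <- estimate_transitions(x)
est$counts
est$probs["1", "0"]  # estimate of p_{1 -> 0}
est$probs["0", "1"]  # estimate of p_{0 -> 1}
```

If a state is never left in the observed sequence, its row of counts is zero and the corresponding estimates are undefined (`NaN`), which is the finite-sample face of the recurrence condition.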

> This estimator is consistent if the chain is recurrent. Recurrence means that the chain visits each state, and therefore takes each possible transition, an infinite number of times when $n\rightarrow+\infty$.

<span style="color:red">ADD GRAPH</span>

Usually, when looking at the graph, one can guess which states are recurrent.

<span style="color:red">ADD GRAPH</span>

Behind the fact that $\hat{p}_{i\rightarrow j}\underset{n\rightarrow+\infty}{\rightarrow}p_{i\rightarrow j}$ when $i$ and $j$ are recurrent, there is the ergodic theorem <span style="color:red">FINISH</span>

### About Continuous-Time Markov Processes with discrete states

<span style="color:red">ADD GRAPH SLIDE 9</span>

The MLE is:

$$\hat{\lambda}_{i\rightarrow j} = \frac{1}{\text{mean duration of being in state $i$ and jumping to state $j$}}$$

It is consistent if the chain is recurrent.

### 1.2 Point processes

We observe a point process $N$ with conditional intensity $\lambda(.)$ that depends only on the previous points of $N$. Suppose we observe $N=\{t_1,...,t_n\}$ on $[0,T_{max}]$.

<span style="color:red">ADD GRAPH + formula slide 10</span>

Thanks to thinning, one knows that $N$ can be generated as follows:

<span style="color:red">ADD slide 11, 12</span>
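Since the slides are not reproduced here, the following R sketch shows one common form of thinning, under the assumption that the conditional intensity is bounded by a known constant `lambda_max`. The function `simulate_thinning` and the example intensity `lambda_refractory` are hypothetical names chosen for the illustration, and the slides may present the algorithm differently.

```R
# Sketch of thinning, assuming the intensity is bounded by `lambda_max`.
# `lambda(t, past)` is a user-supplied conditional intensity, given the
# vector `past` of points already accepted before time t.
simulate_thinning <- function(lambda, lambda_max, t_max) {
  accepted <- numeric(0)
  t <- 0
  repeat {
    t <- t + rexp(1, rate = lambda_max)       # next candidate of a rate-lambda_max Poisson process
    if (t > t_max) break
    if (runif(1) * lambda_max < lambda(t, accepted)) {
      accepted <- c(accepted, t)              # keep the candidate with probability lambda / lambda_max
    }
  }
  accepted
}

set.seed(1)
# Example intensity with a refractory period: no spike within 0.05 after the previous one
lambda_refractory <- function(t, past) {
  if (length(past) > 0 && t - past[length(past)] < 0.05) 0 else 10
}
spikes <- simulate_thinning(lambda_refractory, lambda_max = 10, t_max = 20)
length(spikes)
min(diff(spikes))  # delays between spikes respect the refractory period
```

The bound `lambda_max` must hold whatever the past is; a bound that is too loose only costs extra rejected candidates.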

So the likelihood of $N$ is:

\begin{align}
L_\lambda(N) &= \left(\prod_{T\in N}\lambda(T)\right)e^{-\int_0^{T_{max}}\lambda(t)dt}
\end{align}

The log-likelihood of $N$ is:

\begin{align}
l_\lambda(N) &= \sum_{T\in N}\log(\lambda(T))-\int_0^{T_{max}}\lambda(t)dt\\
&= \int_0^{T_{max}}\log(\lambda(t))dN_t-\int_0^{T_{max}}\lambda(t)dt
\end{align}
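The log-likelihood can be evaluated numerically. The R sketch below assumes the simpler case where the intensity is a deterministic, vectorised function of time (no dependence on the past), and uses made-up event times; `point_process_loglik` is a hypothetical helper name.

```R
# Log-likelihood of a point process for an intensity that is a deterministic function of time.
# `event_times`: points in [0, t_max]; `lambda`: vectorised intensity function (assumption).
point_process_loglik <- function(event_times, lambda, t_max) {
  sum(log(lambda(event_times))) - integrate(lambda, lower = 0, upper = t_max)$value
}

spikes <- c(0.4, 1.1, 1.3, 2.7, 3.2)             # made-up event times for the illustration
lambda_const <- function(t) rep(2, length(t))    # constant intensity equal to 2
ll <- point_process_loglik(spikes, lambda_const, t_max = 4)
ll
# For a constant intensity, the closed form is 5 * log(2) - 2 * 4
```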

<u>**Homogeneous Poisson Process on $[0,T_{max}]$**</u>

We observe $N$, a Poisson process with fixed intensity $\theta\in\mathbb{R}_+$. The conditional intensity is therefore $\lambda(t)=\theta$.

\begin{align}
l_\lambda(N)&= \int_0^{T_{max}}\log(\lambda(t))dN_t-\int_0^{T_{max}}\lambda(t)dt\\
l_\theta(N)&=\log(\theta)N_{[0,T_{max}]}-\theta T_{max}\\
\frac{\partial l_\theta}{\partial\theta}&= \frac{1}{\theta}N_{[0,T_{max}]}-T_{max}
\end{align}

The derivative is positive for $\theta<N_{[0,T_{max}]}/T_{max}$ and negative beyond, so the log-likelihood is maximal at:

$$\hat{\theta}=\frac{N_{[0,T_{max}]}}{T_{max}}$$

This is the **classical estimator of the firing rate**.
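A quick simulation can illustrate this estimator. The R sketch below uses arbitrary values for the true rate and the duration, and compares the closed-form estimate with a direct numerical maximisation of $l_\theta(N)$.

```R
set.seed(42)
t_max <- 100
theta <- 5                                    # true rate, arbitrary value for the illustration
n_points <- rpois(1, theta * t_max)           # number of points on [0, t_max]
spikes <- sort(runif(n_points, 0, t_max))     # given their number, the points are uniform

theta_hat <- length(spikes) / t_max           # closed-form MLE: count divided by duration

# Numerical check: maximise l_theta(N) = log(theta) * N - theta * t_max directly
loglik <- function(th) log(th) * length(spikes) - th * t_max
theta_num <- optimize(loglik, interval = c(0.01, 50), maximum = TRUE)$maximum

c(closed_form = theta_hat, numerical = theta_num)
```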

<u>**Non-homogeneous Poisson Process on $[0,T_{max}]$**</u>

If we observe $N_1, ..., N_n$, $n$ i.i.d. Poisson processes with intensity $f(t)$, then $N_1\cup...\cup N_n$ is a Poisson process with intensity $nf(t)$ (the sum of the intensities). 

The asymptotic regime here is $n\rightarrow+\infty$, because the number of points then increases everywhere on $[0,T_{max}]$.

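<span style="color:red">ADD GRAPH</span>

For instance, take $f$ piecewise constant, equal to $a$ on $[0,\frac{Tmax}{2}]$ and to $b$ on $]\frac{Tmax}{2}, Tmax]$ (the model implied by the formula below), and let $N$ denote the pooled process $N_1\cup...\cup N_n$, with intensity $\lambda=nf$:

\begin{align}
l_\lambda(N) &= \int_0^{T_{max}}\log(\lambda(t))dN_t-\int_0^{T_{max}}\lambda(t)dt\\
l_{(a, b)}(N) &= \log(na)N_{[0,\frac{Tmax}{2}]} + \log(nb)N_{]\frac{Tmax}{2}, Tmax]}-na\frac{Tmax}{2} - nb\frac{Tmax}{2}\\
\frac{\partial l_{(a,b)}}{\partial a} &= \frac{N_{[0,\frac{Tmax}{2}]}}{a}-n\frac{Tmax}{2}=0 \quad\Rightarrow\quad \hat{a} =\frac{2N_{[0,\frac{Tmax}{2}]}}{n\,Tmax}\\
\frac{\partial l_{(a,b)}}{\partial b} &= \frac{N_{]\frac{Tmax}{2}, Tmax]}}{b}-n\frac{Tmax}{2}=0 \quad\Rightarrow\quad \hat{b} =\frac{2N_{]\frac{Tmax}{2}, Tmax]}}{n\,Tmax}
\end{align}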

One can also select models in general:

- by using AIC: for models $m\in\mathcal{M}$ with $d_m$ degrees of freedom and maximised log-likelihood $l_m(N)$, take $\hat{m}=\underset{m\in\mathcal{M}}{\operatorname{argmin}}\, \{-l_m(N) + d_m\}$
- or, when i.i.d. trials are available, by cross-validation
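The criterion $-l_m(N)+d_m$ can be tried on simulated data. The R sketch below compares a constant intensity ($d_m=1$) with an intensity taking one value on each half of $[0, Tmax]$ ($d_m=2$), with the pooled process of $n$ trials as data. The rates, the number of trials and the variable names are assumptions made for the illustration.

```R
set.seed(7)
n_trials <- 50
t_max <- 2
# Simulated pooled data: rate 3 on the first half, rate 6 on the second half (illustrative values)
first_half  <- runif(rpois(1, n_trials * 3 * t_max / 2), 0, t_max / 2)
second_half <- runif(rpois(1, n_trials * 6 * t_max / 2), t_max / 2, t_max)
n1 <- length(first_half)
n2 <- length(second_half)

# Model 1: constant intensity f(t) = a, pooled intensity n_trials * a, one parameter
a_hat <- (n1 + n2) / (n_trials * t_max)
l1 <- (n1 + n2) * log(n_trials * a_hat) - n_trials * a_hat * t_max

# Model 2: f = a on the first half and b on the second half, two parameters
a2_hat <- 2 * n1 / (n_trials * t_max)
b2_hat <- 2 * n2 / (n_trials * t_max)
l2 <- n1 * log(n_trials * a2_hat) + n2 * log(n_trials * b2_hat) -
  n_trials * (a2_hat + b2_hat) * t_max / 2

criterion <- c(constant = -l1 + 1, two_pieces = -l2 + 2)
criterion
selected <- names(which.min(criterion))       # model with the smallest penalised criterion
selected
```

Because the constant model is nested in the two-piece model, `l2` is never smaller than `l1`; the penalty $d_m$ is what prevents the larger model from always being selected.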

<u>Remark:</u> the log-likelihood is not the only contrast for point processes. A least-squares contrast also exists: $$-2\int^{Tmax}_0\lambda(t)dN_t + \int_0^{Tmax}\lambda(t)^2dt$$
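As a worked check of the least-squares contrast, take the homogeneous model $\lambda(t)=\theta$ and write $\gamma(\theta)$ for the contrast (a notation introduced here only for this example):

\begin{align}
\gamma(\theta) &= -2\theta N_{[0,Tmax]} + \theta^2\,Tmax\\
\frac{\partial \gamma}{\partial\theta} &= -2N_{[0,Tmax]} + 2\theta\,Tmax = 0 \quad\Rightarrow\quad \hat{\theta}=\frac{N_{[0,Tmax]}}{Tmax}
\end{align}

In this simple model, minimising the least-squares contrast gives the same estimator as maximising the log-likelihood.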

## 2 - Time rescaling theorem and goodness-of-fit tests

### Time rescaling theorem (also time change theorem, Watanabe theorem)

> If $N$ is a point process of conditional intensity $\lambda(t)$ with **compensator** $\Lambda(t)=\int_0^t\lambda(u)du$ on $[0,Tmax]$ then the process $N'=\{\Lambda(T)\text{ for }T\in N\}$ is a Poisson process of rate 1 on $[0,\Lambda(Tmax)]$

<u>Remark:</u>

$t\rightarrow N_t=N_{[0,t]}$ is a counting process, hence increasing. Its compensator is $t\rightarrow \Lambda(t) = \int_0^{t}\lambda(u)du$. $\Lambda(t)$ is called a compensator because $M_t = N_t - \Lambda(t)$ is a martingale. This means that $\mathbb{E}[M_t|\mathcal{F}_s]=M_s\text{ for }s<t$ with $\mathcal{F}_s$ the past until $s$.

$$\mathbb{E}[M_t|\mathcal{F}_0]=\mathbb{E}[M_t]=M_0 = N_0-\Lambda(0) = 0$$

$M_t$ is centered.

The compensator follows $N_t$ more closely than its expectation does, so the difference $N_t-\Lambda(t)$ is a "super centered" noise. Moreover:

$$\mathbb{E}[N'_{[0, \Lambda(t)]}] = \mathbb{E}[N_{[0, t]}] = \mathbb{E}[\Lambda(t)]\quad\text{because}\quad \mathbb{E}[N_t] - \mathbb{E}[\Lambda(t)] = \mathbb{E}[M_t] = 0$$

so the mean number of points of $N'$ on $[0, \Lambda(t)]$ is roughly $\Lambda(t)$, which corresponds to rate 1.

### Ogata's tests

I observe a process $N$ on $[0, Tmax]$ with intensity $\lambda$. I want to test $H_0:\lambda(t) = \lambda_0(t)$. 
- One computes the candidate compensator $\Lambda_0(t)=\int^t_0\lambda_0(u)du$
- One computes $N'=\{\Lambda_0(T), T\in N\}$
- Under $H_0$, $N'$ should be a Poisson process with rate 1. 

$K$ tests are performed on the points of $N'$ (with $K=2+L$, where $L$ is the number of lags used in the third test) in order to check the null hypothesis.

1) **First test**

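 The points are uniformly distributed on $[0, \Lambda_0(Tmax)]$: Kolmogorov-Smirnov test of uniformity on the points of $N'$.
 
```R
# N_prime: rescaled points; Lambda0_Tmax: value of the candidate compensator at Tmax
ks.test(N_prime, "punif", 0, Lambda0_Tmax)
```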
 
2) **Second test**
 
The delays between points are exponential with parameter 1: Kolmogorov-Smirnov test of exponentiality on the delays.

3) **Third test**

Independence between the delays: autocorrelation tests with different lags.

 
```R
# delays: delays between rescaled points; L: number of lags; K: total number of tests
acf(delays, lag.max = L, ci = 1 - 0.05/K)
```

Be careful, at the end you have $2 + L$ tests (with $L$ the number of lags), so correct the level of the p-values with Bonferroni (hence the division by $K$).

> **We reject if the p-value of the first or second test is smaller than 0.05/K, or if the third test gives at least one computed correlation falling outside the confidence interval on the plot**
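The whole procedure can be sketched in R on simulated data. The intensity `lambda0`, its hand-computed compensator `Lambda0`, the number of lags and the simulation by thinning are all choices made for the illustration, not part of the course material.

```R
set.seed(123)
t_max <- 200
lambda0 <- function(t) 2 + sin(t)                 # hypothesised intensity (illustrative)
Lambda0 <- function(t) 2 * t + 1 - cos(t)         # its compensator, computed by hand

# Simulate N under H0 by thinning a rate-3 homogeneous Poisson process (lambda0 <= 3)
candidates <- sort(runif(rpois(1, 3 * t_max), 0, t_max))
spikes <- candidates[runif(length(candidates)) < lambda0(candidates) / 3]

rescaled <- Lambda0(spikes)                       # N' = {Lambda0(T), T in N}
delays <- diff(c(0, rescaled))                    # delays between rescaled points
n_lags <- 5
K <- 2 + n_lags                                   # total number of tests

p_uniform <- ks.test(rescaled, "punif", 0, Lambda0(t_max))$p.value   # first test
p_expo    <- ks.test(delays, "pexp", 1)$p.value                      # second test
rho  <- acf(delays, lag.max = n_lags, plot = FALSE)$acf[-1]          # third test: lags 1..n_lags
band <- qnorm(1 - (0.05 / K) / 2) / sqrt(length(delays))             # half-width of the corrected band

reject <- p_uniform < 0.05 / K || p_expo < 0.05 / K || any(abs(rho) > band)
c(p_uniform = p_uniform, p_expo = p_expo)
reject
```

The band computed by hand is the same as the one drawn by `acf` when it is called with `ci = 1 - 0.05/K`; it is recomputed here so that the decision does not require reading a plot.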

**In practice (simulation)** -- one has a new algorithm to simulate $N$ with intensity $\lambda$. To check that it works, one computes $\Lambda(t)=\int_0^t\lambda(u)du$, then $N'=\{\Lambda(T), T\in N\}$, then runs Ogata's tests on $N'$. One wants large p-values, so that the null hypothesis is not rejected.

**In practice (real data)** -- one observes $N$. One thinks the model $\lambda_\theta$ is good but does not know $\theta$, and wants to verify it. 
- Compute $\hat{\theta}$ (MLE, etc.); please verify on simulations of the model that $\hat{\theta}$ is a good estimator for the model.
- Compute $\hat{\Lambda}(t) = \int_0^t\lambda_{\hat{\theta}}(u)du$
- Compute $\hat{N}'=\{\hat{\Lambda}(T), T\in N\}$
- Run Ogata's tests on $\hat{N}'$

Be careful, this test is conservative:
- if you reject: there really is a problem with the model
- if you accept: you don't know