# Package installation

In [None]:
# Install the required packages (numpy and scipy come in as dependencies of particles)
!pip install matplotlib
!pip install seaborn
!pip install particles

# Imports

In [None]:
import matplotlib.pyplot as plt 
import numpy as np
import seaborn as sb
import scipy.stats as stats

# Modules from particles
import particles 
from particles import distributions as dists  # where probability distributions are defined
from particles import state_space_models as ssm  # where state-space models are defined

# A word about what we are trying to achieve

Generally, for the type of problem considered here, we deal with:
- A hidden variable $X_t$ that follows a Markov process. This variable can be multivariate. It is characterized by a **transition kernel**.
- An observed variable $Y_t$ whose distribution depends on $X_t$. It is characterized by an **emission law**.

The basic task is, given that we *know* the transition kernel, the emission law, and all their parameters, to recover the $X_t$ based on a sequence of $Y_t$ (*filtering* / *complete smoothing*).

A more challenging task is to perform **Bayesian inference**, that is, to estimate the posterior of the parameters given the data (and a prior on the parameters). This task relies on particle MCMC algorithms.

In our setting: 

- $X$ is composed of 
    - $u_t$, the expression level. It is the variable that, in real life, researchers are trying to recover. 
    - $s_t$, the local scaling term. A variable that follows a Markov process independently of everything else and will be useful at some point.
    - We might even add $a_t$ and $x_t$.
- $Y$ is simply $y_t$, the read counts. It is basically **a noisy observation of $u_t$**. Its distribution is described by the **emission law** that involves $u_t$ and $s_t$. 

## What we have to do 

Given the level of complexity of the model, our assignment has changed. We can focus on:

1. Creating the model
    - Having a working subclass of Feynman-Kac
    - Being able to generate data (should not be too hard)

2. Using the bootstrap filter

3. Using the PMMH algorithm to perform some Bayesian inference

## How we are going to do it 

Since our model is fairly complicated, the previous approach of subclassing ssm.StateSpaceModel will not work.
Instead, we are going to define a Feynman-Kac model ourselves, as explained here:
https://particles-sequential-monte-carlo-in-python.readthedocs.io/en/latest/notebooks/Defining_Feynman-Kac_models_manually.html

- In the method M, we define how to sample from our $X$
    - If we do it smartly, the same function can also be used to generate the data
- In the method logG, we compute the log-likelihood

**Note:** xp and x are arrays containing N particles (we will have to loop through them).
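A minimal sketch of that layout, under stated assumptions: the class below is a toy model (a Gaussian random walk observed with Gaussian noise), not the model of this notebook, and it is written in plain NumPy so that it runs without particles. The method names M0, M and logG follow the convention described in the documentation linked above; ToyFeynmanKac, sigma_x, sigma_y and seed are hypothetical names chosen for the illustration. The toy transition is simple enough to be vectorized over the N particles; a more complicated transition could replace the vectorized line with an explicit loop.

```python
import numpy as np


class ToyFeynmanKac:
    """Toy model with the M0 / M / logG layout (NOT the model of this notebook).

    Hidden state: Gaussian random walk. Observation: y_t ~ N(x_t, sigma_y^2).
    """

    def __init__(self, data, sigma_x=1.0, sigma_y=0.5, seed=0):
        self.data = np.asarray(data, dtype=float)
        self.T = len(self.data)
        self.sigma_x, self.sigma_y = sigma_x, sigma_y
        self.rng = np.random.default_rng(seed)

    def M0(self, N):
        # Draw the N initial particles.
        return self.rng.normal(0.0, self.sigma_x, size=N)

    def M(self, t, xp):
        # xp holds the N particles at time t-1; return the N particles at time t.
        return xp + self.rng.normal(0.0, self.sigma_x, size=xp.shape)

    def logG(self, t, xp, x):
        # Log-weight of each particle: Gaussian log-density of y_t given x_t.
        resid = (self.data[t] - x) / self.sigma_y
        return -0.5 * resid**2 - np.log(self.sigma_y) - 0.5 * np.log(2 * np.pi)


fk = ToyFeynmanKac(data=[0.1, 0.4, -0.2])
x0 = fk.M0(N=5)            # array of 5 particles at time 0
x1 = fk.M(1, x0)           # array of 5 particles at time 1
logw = fk.logG(1, x0, x1)  # one log-weight per particle
```

Every method takes and returns arrays of length N, which is the point of the note above: x and xp are never single values.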

## Various notes and tips

- We can further simplify the model if it is too complicated
- We could use **numba** (a package) to make the loops run faster
- Compute the log-likelihood, not the likelihood. When we have to sum likelihoods, we can always use the log-sum-exp trick
- Simulate the data using parameter values that make sense (look in the articles) or ev
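The log-sum-exp trick mentioned above can be sketched as follows. The helper name log_sum_exp and the example values are illustrative only; scipy.special.logsumexp provides the same computation and would likely be the practical choice.

```python
import numpy as np


def log_sum_exp(log_terms):
    """Return log(sum(exp(log_terms))) without overflow or underflow."""
    log_terms = np.asarray(log_terms, dtype=float)
    m = np.max(log_terms)
    if not np.isfinite(m):
        # All terms are -inf (the sum is 0), or a term is +inf or nan.
        return m
    # Factor out the largest term so that every exponent is <= 0.
    return m + np.log(np.sum(np.exp(log_terms - m)))


log_terms = np.array([-1000.0, -1001.0, -1002.0])

# Naive version: exp(-1000) underflows to 0, so the log is -inf.
with np.errstate(divide="ignore"):
    naive = np.log(np.sum(np.exp(log_terms)))

stable = log_sum_exp(log_terms)
```

Here the naive computation loses all information, while the stable version returns a finite value close to -1000.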


## Choice regarding the drifts
The explanations in the supplementary materials are unclear. We choose the following:

**Upward drift**

$u_{t+1} = u_t + Z, \quad Z \sim \mathcal{E}(\frac{\lambda_u}{u_t})$

**Downward drift**

$u_{t+1} = u_t - Z, \quad Z \sim \mathcal{E}(\frac{\lambda_d}{u_t})$
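A minimal sketch of these two drift steps, assuming that $\mathcal{E}(\cdot)$ denotes the exponential distribution parameterized by its rate, so that $Z$ has mean $u_t / \lambda$. NumPy's sampler takes the scale (the inverse of the rate), hence the u / lambda argument. The function names and the parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def upward_drift(u, lambda_u, rng):
    # ASSUMPTION: Z ~ Exponential with rate lambda_u / u, i.e. scale u / lambda_u.
    return u + rng.exponential(scale=u / lambda_u)


def downward_drift(u, lambda_d, rng):
    # ASSUMPTION: Z ~ Exponential with rate lambda_d / u, i.e. scale u / lambda_d.
    return u - rng.exponential(scale=u / lambda_d)


u = np.full(1000, 10.0)  # 1000 particles, all at u_t = 10
u_up = upward_drift(u, lambda_u=5.0, rng=rng)
u_down = downward_drift(u, lambda_d=5.0, rng=rng)
```

With this reading, the downward step can produce a negative value whenever $Z > u_t$. The notes above do not say how that case is handled, so the sketch leaves it untouched.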

## Regarding the emission law

**We have two definitions of it**
- The 1st will be useful to generate the data 
- The 2nd will be useful to compute the likelihood (we can truncate the sum at 10 or 30)

**Other tips**
- Likelihood of a zero-truncated negative binomial: compute it the usual way, then divide by (1 - probability of 0)
- Poisson law truncated at 0: keep generating until you get a non-zero value
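Both tips can be sketched as below. The negative binomial parameterization is an assumption: k failures before r successes with success probability p (the convention of scipy.stats.nbinom), which may differ from the one used in the articles. All function names are hypothetical.

```python
import math
import numpy as np


def sample_zero_truncated_poisson(lam, size, rng):
    """Rejection sampling: redraw every zero until none is left."""
    draws = rng.poisson(lam, size=size)
    zeros = draws == 0
    while zeros.any():
        draws[zeros] = rng.poisson(lam, size=int(zeros.sum()))
        zeros = draws == 0
    return draws


def nb_logpmf(k, r, p):
    """Log-pmf of a negative binomial (ASSUMED: k failures, r successes, prob p)."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def zero_truncated_nb_logpmf(k, r, p):
    """Usual log-pmf minus log(1 - P(0)); the support starts at k = 1."""
    if k < 1:
        return -math.inf
    log_p0 = r * math.log(p)  # P(0) = p**r under the assumed parameterization
    return nb_logpmf(k, r, p) - math.log1p(-math.exp(log_p0))


rng = np.random.default_rng(0)
samples = sample_zero_truncated_poisson(lam=0.8, size=1000, rng=rng)

# The truncated pmf should sum to 1 over k = 1, 2, ...
total = sum(math.exp(zero_truncated_nb_logpmf(k, r=2.5, p=0.3))
            for k in range(1, 2000))
```

Dividing by (1 - probability of 0) becomes a subtraction in log space, which keeps the computation consistent with the log-likelihood advice given earlier.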