# Models

List of models that could be implemented as part of the numerical experiments for the paper. Could be further expanded in the future when jump processes are considered.

Selection of appropriate forward and backward proposals can be reduced to the selection of an appropriate Linear SDE whose dynamics closely match that of the signal process.

# Ornstein Uhlenbeck (1-D)

# Model:
$$dX(s) = -\rho X(s) ds + \phi dB(s) \qquad X(0) = 0$$

$$f_t: Y_t | E_t = e_t \sim \mathcal{N}(e_t, \eta^2) \qquad \qquad \theta = (\rho, \phi, \eta^2)$$

## Create a notebook for 1-D OU filtering:

### Proposal Choices - Forward Proposals:

The only choice that needs to be made for the forward proposal is the choice of a Linear SDE with a known linear Gaussian transition density:

- Optimal Proposal (corresponds to linearisation)
- Scaled Brownian Motion Proposal
- Unscaled Brownian Motion Proposal

### Proposal Choices - Backward Proposals:

A choice of linear SDE with known linear Gaussian transition density (that satisfies the matching condition) immediately implies a full backward proposal.

The constant diffusion coefficient means that the Delyon-Hu bridge is easy to implement, and doesn't involve the final stochastic integral. Instead, it only involves the expression given in Roberts and Papaspiliopoulos (2012). 

#### Evaluation of Impact of Bridge Choice on Proposals

- Optimal End Point Proposal, True Diffusion Bridge of OU Process (OU-SDE for end point and bridge)
- Optimal End Point Proposal, Scaled Brownian Bridge
- Optimal End Point Proposal, Unscaled Brownian Bridge  
- Optimal End Point Proposal, Delyon-Hu Bridge

#### Evaluation of Impact of End Point choice on Proposals

- Optimal End Point Proposal, True Diffusion Bridge of OU Process (OU-SDE for end point and bridge)
- Scaled Brownian Proposal, True Diffusion Bridge of OU Process (OU-SDE for end point and bridge)
- Unscaled Proposal, True Diffusion Bridge of OU Process (OU-SDE for end point and bridge)
- Driftless End Point Proposal, True Diffusion Bridge of OU Process (OU-SDE for end point and bridge)

#### Other interesting Proposals

- Scaled Brownian End Point Proposal, Scaled Brownian Diffusion Bridge (Scaled Brownian SDE for end point and bridge)
- Scaled Brownian End Point Proposal, Delyon-Hu Bridge
- Unscaled Brownian End Point Proposal, Unscaled Brownian Diffusion Bridge
- Unscaled Brownian End Point Proposal, Delyon-Hu Bridge 

## Plots: - Look through the results and decide how best to communicate the ideas

- ESS for a single run
- Root MSE of estimator of the first moment over time across different particle filters
- Boxplots of the log likelihood estimate at time $T$
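For the ESS plot, a minimal sketch of the usual estimate $1/\sum_n W_n^2$ computed from unnormalised log-weights. The function name is hypothetical, and the filter code may already expose this quantity.

```python
import numpy as np

def ess_from_log_weights(lw):
    """Effective sample size 1 / sum(W_n^2) from unnormalised log-weights."""
    lw = np.asarray(lw, dtype=float)
    w = np.exp(lw - np.max(lw))   # subtract the max for numerical stability
    w /= np.sum(w)                # normalised weights W_n
    return 1.0 / np.sum(w**2)
```

With equal weights the ESS equals the number of particles; when a single particle carries almost all of the weight it is close to 1.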

# To do
## Filtering
- Get the filtering working on the TV CDSSM for the VanDerMeulen - Schauer bridges.

## Smoothing
- Run some experiments on the smoothing fk models that you have working already. Setup the framework to test this on different SDEs.

## Particle Gibbs
- Start looking at running a parameter inference algorithm in one dimension.
- You can run PGBS, PG and PMMH all for the same model.

## Running in higher dimensions
- Add the higher dimensional processes
- Start building and implementing the models in higher dimensions.

## Smoothing Algorithms that require a density:

### Offline Smoothing: 

- FFBS
- FFBS-Reject
- FFBS-MCMC (recommended)
- FFBS-QMC
- Two-Filter Smoothing

### Online Smoothing:
- PaRIS $\mathcal{O}(NT)$
- Online Smoothing $\mathcal{O}(N^2T)$

### MCMC

- Particle Gibbs with Backward/Ancestral Sampling (can you implement ancestral too?)
- The higher dimensional methods covered in Finke et al and Finke & Corenflos

### Calculation of Path Integrals with Vectorisation in 1D:

For Girsanov path integrals (forward proposals), the inputs are:

- times: (num+1, )
- X: (N, num+1)
- `b_1` is the drift of the model SDE
- `b_2` is the drift of the proposal SDE
- `Cov` is the common covariance coefficient of the model and proposal SDE. Taken in code from the model SDE.

The model sde object will always have float values for the parameters. 
The proposal sde will (in general) have dimension (N, ) values for its parameters.
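A minimal sketch of how the Girsanov log-weight could be vectorised with these shapes, using left-point (Ito) sums. The function name and the callable signature `(s, x) -> drift` are assumptions made for illustration, `cov` is assumed to be the squared diffusion coefficient, and per-particle proposal parameters of shape (N, ) are reshaped to (N, 1) so that they broadcast against the (N, num) path array.

```python
import numpy as np

def girsanov_log_weight(times, X, b_model, b_prop, cov):
    """Log density of the model path law w.r.t. the proposal path law.

    times : (num+1,) time grid, X : (N, num+1) simulated paths.
    b_model, b_prop : callables (s, x) -> drift, returning shape (N, num).
    cov : common squared diffusion coefficient (assumed constant here).
    """
    s = times[:-1]               # (num,)   left end points
    ds = np.diff(times)          # (num,)   step sizes
    x = X[:, :-1]                # (N, num) path values at left end points
    dX = np.diff(X, axis=1)      # (N, num) path increments
    b1, b2 = b_model(s, x), b_prop(s, x)
    ito = np.sum((b1 - b2) / cov * dX, axis=1)
    leb = 0.5 * np.sum((b1**2 - b2**2) / cov * ds, axis=1)
    return ito - leb             # (N,)

rng = np.random.default_rng(0)
N, num = 5, 20
times = np.linspace(0.0, 1.0, num + 1)
X = np.cumsum(rng.normal(scale=0.2, size=(N, num + 1)), axis=1)
rho, phi = 0.5, 1.0              # float-valued model parameters
mu_prop = rng.normal(size=N)     # per-particle proposal parameter, shape (N,)

logw = girsanov_log_weight(
    times, X,
    b_model=lambda s, x: -rho * x,
    b_prop=lambda s, x: rho * (mu_prop[:, None] - x),  # (N, 1) against (N, num)
    cov=phi**2,
)
```

Indexing with `[:, None]` is one alternative to transposing; either way the per-particle axis has to line up with the first axis of `X`.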

For Delyon-Hu bridge path integrals (backward proposals), the inputs are:

- times: (num+1, )
- X: (N, num+1)

The auxiliary bridge has attribute x_end: of dimension (N, )
The Delyon-Hu bridge is a particular choice of bridge construction, so we don't introduce parameters of dimension (N, ).
The issue of calculations involving these 3 quantities is solved in the code by applying matrix transpose in various places
to ensure that the broadcasting occurs correctly.
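As an illustration of the broadcasting involved, the sketch below evaluates the bridge guiding term $(x_{end} - X(s))/(t_{end} - s)$ on the left grid points for all particles at once. The function name is hypothetical, and it uses `[:, None]` indexing rather than transposes, which is only one of several ways to align the three shapes.

```python
import numpy as np

def guiding_term(times, X, x_end):
    """(x_end - X(s)) / (t_end - s) on the left grid points, shape (N, num).

    times : (num+1,), X : (N, num+1), x_end : (N,).
    The final grid point is excluded because the term is singular there.
    """
    s = times[:-1]                                   # (num,)
    remaining = times[-1] - s                        # (num,) time left to the end point
    return (x_end[:, None] - X[:, :-1]) / remaining  # (N, 1) - (N, num), then / (num,)
```

Here `x_end[:, None]` varies along the particle axis and `remaining` along the time axis, so NumPy aligns them without any explicit transpose.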

For van-der-Meulen Schauer bridge path integrals (backward proposals), the inputs are:

- times: (num+1, )
- X: (N, num+1)
- `b`, `b_tilde`, `Cov`, `Cov_tilde`, `r`, `H`

### The Bridge of the Optimal Forward Proposal of the OU process:

$$M_t^{\rightarrow}[e_{t-1}, dv_t] \qquad dX_t(s) = b_t^{\rightarrow, opt}(s, X_t(s); y_t)ds + \sigma_t(s, X_t(s))dB(s) \qquad X(0) = e_{t-1}$$

In this case, we have the following expression for $b_t^{\rightarrow}$:

$$b_t^{\rightarrow, opt}(s, x; y_t) = -\rho x + \phi^2 a(s, \Delta_t)\frac{y_t - a(s, \Delta_t)x - b(s, \Delta_t)}{L^2v(s, \Delta_t) + \sigma_Y^2}$$


- $a(s, t) = e^{-\rho(t-s)}$
- $b(s, t) = \mu (1-e^{-\rho(t-s)})$
- $v(s, t) = \frac{\phi^2}{2\rho}(1 - e^{-2\rho (t-s)})$

This forward proposal is itself a Linear SDE, with the following expressions for $A(s)$, $B(s)$, $C(s)$:

- $A(s) = \phi^2 a(s, \Delta_t)\frac{y_t - b(s, \Delta_t)}{L^2v(s, \Delta_t) + \sigma_Y^2}$
- $B(s) = -\rho - \phi^2 \frac{a(s, \Delta_t)^2}{L^2v(s, \Delta_t) + \sigma_Y^2}$
- $C(s) = \phi$

We want to choose our auxiliary bridge process to be the diffusion bridge of the proposal SDE. Setting $K = L^2 \frac{\phi^2}{2\rho}$, this gives the following complete expressions:

- $A(s) = \phi^2 e^{-\rho (\Delta_t-s)}\frac{y_t - \mu (1-e^{-\rho(\Delta_t-s)})}{K(1 - e^{-2\rho (\Delta_t -s)}) + \sigma_Y^2}$
- $B(s) = -\rho - \phi^2 \frac{e^{-2\rho (\Delta_t-s)}}{K(1 - e^{-2\rho (\Delta_t -s)}) + \sigma_Y^2}$
- $C(s) = \phi$

Recall that the general Linear SDE, given by:

$$dX(s) = (A(s) + B(s)X(s))ds + C(s)dB(s) \qquad X(0) = x(0)$$

can be solved analytically by using the integrating factor:

$$dY(s) = -B(s)Y(s)ds \qquad Y(0) = 1.$$

Defining $Z(s) = X(s)Y(s)$ and applying Ito's Formula, we obtain:

$$dZ(s) = A(s)Y(s)ds + C(s)Y(s)dB(s) \qquad Z(0) = x(0)$$


Hence, defining $U(s) = Y(s)^{-1}$, we have for $s_2 > s_1$:

$$X(s_2) = X(s_1)\frac{Y(s_1)}{Y(s_2)} + U(s_2)\int_{s_1}^{s_2} A(u)Y(u) du + U(s_2)\int_{s_1}^{s_2} C(u)Y(u) dB(u)$$


The solution to the integrating factor ODE for $Y$ is of the form:

$$Y(s) = \exp[-\int_0^s B(u) du]$$

$$I(s) = -\int_0^s B(u) du$$

For this particular choice of $B(u)$, the integral can be solved analytically using the substitution $w = e^{-2\rho(\Delta_t - u)}$. The logarithm picks up the prefactor $\frac{\phi^2}{2\rho K} = \frac{1}{L^2}$, which gives:

$$I(s) = \rho s + \frac{1}{L^2}\log\left(\frac{K + \sigma_Y^2 - Ke^{-2\rho \Delta_t}}{K + \sigma_Y^2 - Ke^{-2\rho (\Delta_t-s)}}\right)$$

$$Y(s) = e^{\rho s}\left(\frac{K + \sigma_Y^2 - Ke^{-2\rho \Delta_t}}{K + \sigma_Y^2 - Ke^{-2\rho (\Delta_t-s)}}\right)^{\frac{1}{L^2}}$$
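A closed form of this kind is cheap to check numerically. The sketch below compares a trapezoidal approximation of $-\int_0^s B(u)du$ with the closed form $\rho s + \frac{\phi^2}{2\rho K}\log(\cdot)$ obtained from the substitution $w = e^{-2\rho(\Delta_t - u)}$. All function names and parameter values are illustrative assumptions.

```python
import numpy as np

def B(u, rho, phi, K, sig2, delta):
    """Linear drift coefficient B(u) of the optimal forward proposal."""
    e = np.exp(-2.0 * rho * (delta - u))
    return -rho - phi**2 * e / (K * (1.0 - e) + sig2)

def I_closed(s, rho, phi, K, sig2, delta):
    """Candidate closed form for I(s) = -int_0^s B(u) du."""
    D = lambda u: K + sig2 - K * np.exp(-2.0 * rho * (delta - u))
    return rho * s + phi**2 / (2.0 * rho * K) * np.log(D(0.0) / D(s))

def I_numeric(s, rho, phi, K, sig2, delta, n=20001):
    """Trapezoidal approximation of I(s) on n equally spaced points."""
    u = np.linspace(0.0, s, n)
    f = -B(u, rho, phi, K, sig2, delta)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(u))

rho, phi, L, sig2, delta = 0.7, 1.3, 2.0, 0.5, 1.0
K = L**2 * phi**2 / (2.0 * rho)
gap = abs(I_closed(0.8, rho, phi, K, sig2, delta)
          - I_numeric(0.8, rho, phi, K, sig2, delta))
```

`gap` should be at the level of the quadrature error. Using $L \neq 1$ in the check matters, since the prefactor of the logarithm depends on $L$.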

To evaluate the transition density of this Linear SDE exactly, we would be required to analytically evaluate the integrals, for $s_2 > s_1$

$$I_1(s_1, s_2) = \int_{s_1}^{s_2} A(u)Y(u) du \qquad I_2(s_1, s_2) = \int_{s_1}^{s_2} C^2(u)Y^2(u)du$$

Neither of these integrals has an obvious closed form, so we do not have an analytical expression for the transition density of the Optimal Forward Proposal of the OU SDE. Consequently, we do not have an analytical expression for the true diffusion bridge of the Optimal Forward Proposal either, because the transition density appears in the drift function of the bridge. A possible solution to this problem would be to evaluate the integrals numerically. This is a possible extension to the code that could be considered in the future.

Possible code extension:

Choose a certain level of imputation $\Delta s = \frac{\Delta_t}{M}$, where $M$ is the number of imputed points (not to be confused with the constant $K$ defined above), and then, when the Linear SDE is initialised, calculate:

- $B(s)$ at each point on the imputed grid. Use this to evaluate numerically $Y(s)$ at every point on the imputed grid.
- $A(s)Y(s)$ at each point on the imputed grid. Use these points to evaluate numerically $I_1(0, s)$ at every point on the imputed grid.
- $C(s)^2 Y(s)^2$ at each point on the imputed grid. Use these points to evaluate numerically $I_2(0, s)$ at every point on the imputed grid.

These outputs can then be used to construct the usual functions $a(s,t)$, $b(s, t)$ and $v(s, t)$ for the Linear SDE, as long as the input points $s$ and $t$ are on the imputed grid.

Using this approach, we can find numerically the transition density of a general 1D linear SDE, which can be used to construct an approximation of the diffusion bridge, or to construct forward proposals.

Note that $M$ must be chosen such that `num` divides $M$. The simplest approach would be to take them to be equal.
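The grid-based idea above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the quadrature is a plain cumulative trapezoidal rule, and `A`, `B`, `C` are assumed to be vectorised callables of time. The transition coefficients follow from the integrating factor solution: $a(s,t) = Y(s)/Y(t)$, $b(s,t) = (I_1(0,t) - I_1(0,s))/Y(t)$ and $v(s,t) = (I_2(0,t) - I_2(0,s))/Y(t)^2$.

```python
import numpy as np

def cumtrapz0(f, grid):
    """Cumulative trapezoidal integral of the samples f, equal to 0 at grid[0]."""
    steps = 0.5 * (f[1:] + f[:-1]) * np.diff(grid)
    return np.concatenate(([0.0], np.cumsum(steps)))

def linear_sde_tables(A, B, C, grid):
    """Tabulate Y, I1(0, .) and I2(0, .) for dX = (A + B X) ds + C dB."""
    Y = np.exp(-cumtrapz0(B(grid), grid))
    I1 = cumtrapz0(A(grid) * Y, grid)
    I2 = cumtrapz0(C(grid) ** 2 * Y**2, grid)
    return Y, I1, I2

def transition_coeffs(tables, i, j):
    """(a, b, v) with X(grid[j]) | X(grid[i]) = x ~ N(a x + b, v), for i < j."""
    Y, I1, I2 = tables
    a = Y[i] / Y[j]
    b = (I1[j] - I1[i]) / Y[j]
    v = (I2[j] - I2[i]) / Y[j] ** 2
    return a, b, v

# Sanity check on an OU process with mean mu, where a, b, v are known exactly.
rho, mu, phi = 0.8, 0.3, 1.1
grid = np.linspace(0.0, 1.0, 2001)
tables = linear_sde_tables(
    A=lambda s: np.full_like(s, rho * mu),
    B=lambda s: np.full_like(s, -rho),
    C=lambda s: np.full_like(s, phi),
    grid=grid,
)
a_num, b_num, v_num = transition_coeffs(tables, 500, 2000)
```

Since $Y(s)$ grows like $e^{\rho s}$ in the OU case, a production version would probably work with $\log Y$ over long time intervals.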

The same principle can be applied to construct a numerical approximation of the transition density in the case where $B(s) = 0$. In this case, we only need to evaluate the integrals $I_3(0, s)$ and $I_4(0, s)$, where these integrals are given by:

$$I_3(s_1, s_2) = \int_{s_1}^{s_2} A(u) du \qquad I_4(s_1, s_2) = \int_{s_1}^{s_2} C^2(u)du$$

However, we should leave this idea for now, since this level of generality is not required.

The function that we are linearising is given by:

$$b_t^{\rightarrow, opt}(s, x; y_t) = -\rho x + \phi^2 a(s, \Delta_t)\frac{y_t - a(s, \Delta_t)x - b(s, \Delta_t)}{L^2v(s, \Delta_t) + \sigma_Y^2}$$

Taking the partial derivative w.r.t x, we obtain:

$$\tilde{b}_t^{\rightarrow, opt}(s, x; y_t) = -\rho -  \phi^2 \frac{a^2(s, \Delta_t)}{L^2 v(s, \Delta_t) + \sigma_Y^2}$$

To locally linearise the process in the code, we take:

- $\tilde{A} = b_t^{\rightarrow, opt}(\Delta_t, x_{end}; y_t) - \tilde{b}_t^{\rightarrow, opt}(\Delta_t, x_{end}; y_t)\, x_{end}$
- $\tilde{B} = \tilde{b}_t^{\rightarrow, opt}(\Delta_t, x_{end}; y_t)$

At $s = \Delta_t$ we have $a(\Delta_t, \Delta_t) = 1$, $b(\Delta_t, \Delta_t) = 0$ and $v(\Delta_t, \Delta_t) = 0$, so we therefore have:

- $\tilde{A} = -\rho x_{end} + \phi^2 \frac{y_t - x_{end}}{\sigma_Y^2} + \rho x_{end} + \phi^2 \frac{x_{end}}{\sigma_Y^2} = \frac{\phi^2 y_t}{\sigma_Y^2}$
- $\tilde{B} = -\rho - \frac{\phi^2}{\sigma_Y^2}$
- $\tilde{C} = \phi$


Matching these to an OU process $dX(s) = \tilde{\rho}(\tilde{\mu} - X(s))ds + \phi dB(s)$, we have that $\tilde{\rho} \tilde{\mu} = \frac{\phi^2 y_t}{\sigma_Y^2}$ and that $\tilde{\rho} = \rho + \frac{\phi^2}{\sigma_Y^2}$.

Hence, the local linear OU approximation of the optimal forward proposal for the OU process is given by:

- $\tilde{\rho} = \rho + \frac{\phi^2}{\sigma_Y^2}$
- $\tilde{\mu} = \frac{\phi^2 y_t}{\sigma_Y^2}(\rho + \frac{\phi^2}{\sigma_Y^2})^{-1} = \frac{\phi^2 y_t}{\rho \sigma_Y^2 + \phi^2}$
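The linearisation at the end point can be sanity-checked numerically. The sketch below evaluates the optimal drift at $s = \Delta_t$, takes its derivative in $x$ by a central difference, and forms $\tilde{A} = b(x_{end}) - \tilde{B}x_{end}$, $\tilde{\rho} = -\tilde{B}$ and $\tilde{\mu} = \tilde{A}/\tilde{\rho}$. The function names and parameter values are illustrative assumptions.

```python
import numpy as np

def b_opt(s, x, y, rho, mu, phi, sig2, delta, L=1.0):
    """Drift of the optimal forward proposal for the OU process."""
    a = np.exp(-rho * (delta - s))
    b = mu * (1.0 - a)
    v = phi**2 / (2.0 * rho) * (1.0 - a**2)
    return -rho * x + phi**2 * a * (y - a * x - b) / (L**2 * v + sig2)

def linearise_at_end(x_end, y, rho, mu, phi, sig2, delta, h=1e-4):
    """Return (rho_tilde, mu_tilde) of the local linear OU approximation."""
    f = lambda x: b_opt(delta, x, y, rho, mu, phi, sig2, delta)
    B_tilde = (f(x_end + h) - f(x_end - h)) / (2.0 * h)  # exact up to rounding: drift is linear in x
    A_tilde = f(x_end) - B_tilde * x_end
    rho_tilde = -B_tilde
    return rho_tilde, A_tilde / rho_tilde

rho_tilde, mu_tilde = linearise_at_end(
    x_end=0.1, y=0.44, rho=0.5, mu=0.0, phi=1.2, sig2=0.3, delta=1.0
)
```

As $\sigma_Y^2 \to 0$ the value of $\tilde{\mu}$ approaches $y_t$, which is a useful qualitative check on any closed-form expression for it.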

Say that we have a multivariate linear, Gaussian transition density:
$$X_t | X_s = x_s \sim \mathcal{N}(Ax_s + b, \Sigma)$$

The gradient of the log transition density with respect to $x_s$ is:

$$ \nabla_{x_s} \log p_{s, t}(x_t|x_s) = A^T\Sigma^{-1}(x_t - b) - Kx_s$$

where $K = A^T\Sigma^{-1}A$.
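A short sketch of this gradient, together with the log density it differentiates so that the two can be compared by finite differences. The function names are hypothetical.

```python
import numpy as np

def log_trans(x_t, x_s, A, b, Sigma):
    """log N(x_t; A x_s + b, Sigma)."""
    r = x_t - A @ x_s - b
    _, logdet = np.linalg.slogdet(Sigma)
    d = x_t.shape[0]
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + r @ np.linalg.solve(Sigma, r))

def grad_xs_log_trans(x_t, x_s, A, b, Sigma):
    """Gradient of log_trans w.r.t. x_s: A^T Sigma^{-1} (x_t - b - A x_s)."""
    return A.T @ np.linalg.solve(Sigma, x_t - b - A @ x_s)
```

Using `np.linalg.solve` avoids forming $\Sigma^{-1}$ explicitly, which is usually preferable when $\Sigma$ is poorly conditioned.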

# Notes on the filtering results from the TV CDSSM

## Bootstrap DA

- All reparameterised model results appear to match the original parameterisation.
- The results from all DA Bootstrap filters match results from the filter without the use of data augmentation.

## Forward Proposals

- Use of any auxiliary bridge other than Delyon & Hu causes the results from reparameterised fk models to not match the original parameterisation.
- Some (negative) bias is introduced into estimates of the log likelihood.
- The filters track the filtering distribution well, except at the last timestep.
- For the fk models that are not working, the ESS drops significantly in the first timestep, sometimes all the way to 0.

## Backward Proposals

- Use of any auxiliary bridge other than Delyon & Hu causes the results from reparameterised fk models to not match the original parameterisation.
- Some (positive) bias is introduced into estimates of the log likelihood.
- The filters track the filtering distribution well, except at the last timestep.


## Conclusions

- The DA bootstrap models should be very close to the standard bootstrap. The difference may be caused by bias introduced by the imputation. Increasing the number of imputed points may help with this.
- For the backward proposals, given the observation noise is low, they should be able to propose points that are very close to $y_T \approx 0.44$. Instead, they all propose around $0.32$, which is close to the filtering mean at time $T-1$. 

# Score function of the OU process

$\theta = (\rho, \mu, \phi, \eta)$

$E_t | E_{t-1} = e_{t-1} \sim \mathcal{N}(a(\theta)e_{t-1} + b(\theta), v(\theta))$

$\nabla_{\theta}\log p_\theta(x_t|x_{t-1}) = -\frac{1}{2}\frac{v'(\theta)}{v(\theta)} - \frac{1}{2}\frac{v(\theta)k'(\theta) - k(\theta)v'(\theta)}{v(\theta)^2}$

$k(\theta) = (x_t - a(\theta)x_{t-1} - b(\theta))^2$

$k'(\theta) = -2(a'(\theta)x_{t-1} + b'(\theta))(x_t - a(\theta)x_{t-1} - b(\theta))$

Given these expressions, all that is left to do is take the derivatives of $a$, $b$ and $v$ with respect to $(\rho, \mu, \phi)$

$v(\theta) = \frac{\phi^2}{2\rho}(1-e^{-2\rho(t-s)}) \quad \frac{dv}{d\rho} = \frac{\phi^2}{2\rho}(2((t-s)+\frac{1}{2\rho})e^{-2\rho(t-s)}-\frac{1}{\rho}) \quad \frac{dv}{d\phi} = \frac{\phi}{\rho}(1-e^{-2\rho(t-s)}) \quad \frac{dv}{d\mu} = 0$

$a(\theta) = e^{-\rho(t-s)} \quad \frac{da}{d\rho} = -(t-s)e^{-\rho(t-s)} \quad \frac{da}{d\phi} = 0 \quad \frac{da}{d\mu} = 0$

$b(\theta) = \mu(1-e^{-\rho(t-s)}) \quad \frac{db}{d\rho} = \mu (t-s)e^{-\rho (t-s)}  \quad \frac{db}{d\phi} = 0 \quad \frac{db}{d\mu} = 1 - e^{-\rho(t-s)}$
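The derivatives above can be assembled into the score of the log transition density $-\frac{1}{2}\log(2\pi v) - \frac{k}{2v}$ and checked against finite differences. This is a minimal sketch: the function names are hypothetical, `tau` stands for $t - s$, and the components are ordered as $(\rho, \mu, \phi)$.

```python
import numpy as np

def ou_coeffs(rho, mu, phi, tau):
    """a, b, v and their derivatives w.r.t. (rho, mu, phi)."""
    e1, e2 = np.exp(-rho * tau), np.exp(-2.0 * rho * tau)
    a, b, v = e1, mu * (1.0 - e1), phi**2 / (2.0 * rho) * (1.0 - e2)
    da = np.array([-tau * e1, 0.0, 0.0])
    db = np.array([mu * tau * e1, 1.0 - e1, 0.0])
    dv = np.array([
        phi**2 / (2.0 * rho) * (2.0 * (tau + 1.0 / (2.0 * rho)) * e2 - 1.0 / rho),
        0.0,
        phi / rho * (1.0 - e2),
    ])
    return a, b, v, da, db, dv

def ou_log_trans(x_t, x_prev, rho, mu, phi, tau):
    """Log transition density of the OU process over a time step tau."""
    a, b, v, *_ = ou_coeffs(rho, mu, phi, tau)
    return -0.5 * np.log(2.0 * np.pi * v) - 0.5 * (x_t - a * x_prev - b) ** 2 / v

def ou_score(x_t, x_prev, rho, mu, phi, tau):
    """Gradient of ou_log_trans w.r.t. (rho, mu, phi)."""
    a, b, v, da, db, dv = ou_coeffs(rho, mu, phi, tau)
    r = x_t - a * x_prev - b
    k = r**2
    dk = -2.0 * (da * x_prev + db) * r
    return -0.5 * dv / v - 0.5 * (v * dk - k * dv) / v**2
```

The observation noise $\eta$ does not enter the transition density, so its component of this score is zero and is omitted here.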

# Particle MCMC for CDSSMs

## The Model: The OU-CDSSM process in 1D:

Consider the Ornstein-Uhlenbeck process in one-dimension:

$$dX(s) = -\rho X(s) ds + \phi dB(s) \qquad X(0) = 0.$$ (1)

This process is observed at $T$ discrete times $s_1 < s_2 < \dots < s_T$, with noise. So we are working with a continuous time set $\mathcal{S} = [0, S]$ and a discrete one $\mathcal{T} = \{1, \dots T\}$. We set $E_t = X(s_t)$

We assume for simplicity that observation times are equidistant $s_t - s_{t-1} = \Delta_t = 1$, and we define each observation density $f_t(y_t|e_t)$ to be linear, Gaussian:

$$Y_t |E_t = e_t \sim \mathcal{N}(e_t, \eta^2)$$ (2)

Equations (1)-(2) define a CD-SSM with parameters $\theta = (\rho, \phi, \eta^2)$. As the OU process is a linear SDE, it has a tractable transition density $p_t(e_t|e_{t-1})$, given by:

$$E_t|E_{t-1} = e_{t-1} \sim \mathcal{N}\left(e^{-\rho \Delta_t}e_{t-1}, \frac{\phi^2}{2\rho}(1 - e^{-2\rho \Delta_t})\right)$$

Thus, the random variables $(E_{1:T}, Y_{1:T})$ form a linear, Gaussian state space model, and it is possible to derive analytically the filtering and smoothing distributions, through Kalman filtering and smoothing. One can also implement particle filters/smoothers on discrete space, without data augmentation, with the optimal proposal being analytically tractable. Finally, the data augmentation approaches outlined in the contribution can be used.
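As a reference point for the particle methods, the sketch below simulates $(E_{1:T}, Y_{1:T})$ from the exact transition density and evaluates the marginal log likelihood $\log p_\theta(y_{1:T})$ with a scalar Kalman filter. The function names are hypothetical and this is not the implementation used in the code base.

```python
import numpy as np

def ou_discrete(rho, phi, delta=1.0):
    """Autoregressive coefficient and innovation variance over a step delta."""
    a = np.exp(-rho * delta)
    return a, phi**2 / (2.0 * rho) * (1.0 - a**2)

def simulate_ou_cdssm(T, rho, phi, eta2, delta=1.0, rng=None):
    """Draw (E_{1:T}, Y_{1:T}) with E_0 = X(0) = 0."""
    rng = np.random.default_rng() if rng is None else rng
    a, sx2 = ou_discrete(rho, phi, delta)
    e = np.zeros(T + 1)
    for t in range(1, T + 1):
        e[t] = a * e[t - 1] + np.sqrt(sx2) * rng.normal()
    y = e[1:] + np.sqrt(eta2) * rng.normal(size=T)
    return e[1:], y

def kalman_loglik(y, rho, phi, eta2, delta=1.0):
    """Exact log p(y_{1:T}) for the linear Gaussian model, with E_0 = 0 known."""
    a, sx2 = ou_discrete(rho, phi, delta)
    m, P, ll = 0.0, 0.0, 0.0
    for yt in y:
        m_pred, P_pred = a * m, a**2 * P + sx2      # predict
        S = P_pred + eta2                           # predictive variance of y_t
        ll += -0.5 * (np.log(2.0 * np.pi * S) + (yt - m_pred) ** 2 / S)
        gain = P_pred / S                           # update
        m, P = m_pred + gain * (yt - m_pred), (1.0 - gain) * P_pred
    return ll
```

The value returned by `kalman_loglik` is the quantity that the particle filter log likelihood estimates should be centred on, which makes it a natural reference line for the boxplots.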

## Model Reparameterisation

We introduce a reparameterisation of the CDSSM, so that the distribution of $(E_{1:T}, Y_{1:T})$ can be defined similarly through the following updates: 

$$E_t = aE_{t-1} + \epsilon_t \quad \quad \epsilon_t \sim \mathcal{N}(0, \sigma_X^2) \qquad (3)$$
$$Y_t = E_{t} + \eta_t \quad \quad \eta_t \sim \mathcal{N}(0, \sigma_Y^2) \qquad (4)$$

for $t \in \mathcal{T}$, where $X(0) = E_0 = 0$ is known. Under this model parameterisation, we define the parameter vector $\theta^* = (a, \sigma_X^2, \sigma_Y^2)$. There is a bijective mapping between $\theta$ and $\theta^*$, so this can be considered as a reparameterisation of the CDSSM.

- $a = e^{-\rho \Delta_t}$
- $\sigma_X^2 = \frac{\phi^2}{2\rho}(1 - e^{-2\rho \Delta_t})$
- $\sigma_Y^2 = \eta^2$
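The mapping and its inverse can be written down directly; the inverse uses $\rho = -\log(a)/\Delta_t$ and $\phi^2 = 2\rho\sigma_X^2/(1 - a^2)$. A minimal sketch, with hypothetical function names and assuming $\rho > 0$, $\phi > 0$ (equivalently $0 < a < 1$):

```python
import numpy as np

def to_star(rho, phi, eta2, delta=1.0):
    """theta = (rho, phi, eta^2) -> theta* = (a, sigma_X^2, sigma_Y^2)."""
    a = np.exp(-rho * delta)
    return a, phi**2 / (2.0 * rho) * (1.0 - a**2), eta2

def from_star(a, sx2, sy2, delta=1.0):
    """theta* = (a, sigma_X^2, sigma_Y^2) -> theta = (rho, phi, eta^2)."""
    rho = -np.log(a) / delta
    phi = np.sqrt(2.0 * rho * sx2 / (1.0 - a**2))
    return rho, phi, sy2
```

A round trip through `to_star` and `from_star` should reproduce the original parameters, which gives a quick check when switching between the two parameterisations inside an MCMC run.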

Given data $Y_{1:T}$, by defining a prior on $\theta^*$ that we call $\nu(\theta^*)$, we can set up a Bayesian inference problem to infer either $\theta^*$ alone or the latent states and parameters $(E_{1:T}, \theta^*)$ jointly. We can use various MCMC algorithms that use Sequential Monte Carlo methods to approach this problem, known as Particle MCMC methods. As the original model is a CDSSM, we can also use the Data Augmentation methods that have been developed.

This reparameterisation is motivated by the application of these algorithms. Specifically, idealised algorithms exist for Linear, Gaussian state space models (e.g. IMMH) that serve as idealised versions of Particle MCMC algorithms. It is more natural to consider such methods under a natural parameterisation for a Linear, Gaussian State Space model. 

Further, under this parameterisation it is possible to select a conjugate prior for this model, assuming that both ($E_{1:T}, Y_{1:T}$) are observed. This selection enables one to do an update of $\theta | E_{1:T}, Y_{1:T}$ within a Gibbs sampler by sampling from the true conditional distribution, as opposed to using a Metropolis step within the sampler. 

## Selection of the Prior distribution - Go through this?

We define the prior distribution $\nu(\theta^*)$ as follows:

- $(a, \sigma_X^2) \sim NIG(\mu, \lambda, \alpha_X, \beta_X)$
- $\sigma_Y^2 \sim IG(\alpha_Y, \beta_Y)$ (note that it may be better to fix this parameter in practice, following Alex's advice)

If we assume that the data for our model is $(E_{1:T}, Y_{1:T})$, then the above prior is conjugate for this model, and the posterior distribution of the parameters given the data can be derived analytically:

- $(a, \sigma_X^2) | X_{1:T} \sim NIG(\mu', \lambda', \alpha_X', \beta_X')$
- $\sigma_Y^2 |X_{1:T}, Y_{1:T} \sim IG(\alpha'_Y, \beta'_Y)$

Writing $x_t$ for the value of $E_t$ (with $x_0 = 0$), we have that:

- $A = \sum_{t=0}^{T-1} x_t^2 + \lambda$
- $B = \lambda \mu + \sum_{t=1}^T x_t x_{t-1}$
- $C = \sum_{t=1}^T x_t^2 + 2\beta_X + \lambda \mu^2$

Then, expressing the posterior parameters for the distribution of $(a, \sigma_X^2) | X_{1:T}$ in terms of $A, B, C$:
- $\alpha_X' = \frac{T}{2} + \alpha_X$
- $\beta_X' = \frac{1}{2}(C - \frac{B^2}{A})$
- $\mu' = \frac{B}{A}$
- $\lambda' = A$

The posterior parameters of $\sigma_Y^2 | X_{1:T}, Y_{1:T}$ are given by:

- $\alpha_Y' = \alpha_Y + \frac{T}{2}$
- $\beta_Y' = \beta_Y + \frac{1}{2} \sum_{t=1}^T (y_t - x_t)^2$
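The conjugate update can be sketched as below. It assumes the NIG prior is parameterised as $a | \sigma_X^2 \sim \mathcal{N}(\mu, \sigma_X^2/\lambda)$ and $\sigma_X^2 \sim IG(\alpha_X, \beta_X)$ with $\beta$ a rate-type parameter; that convention, and the function names, are assumptions made for illustration and should be checked against the convention used in the code.

```python
import numpy as np

def nig_posterior(x, mu, lam, alpha, beta):
    """Posterior NIG parameters (mu', lambda', alpha', beta').

    x has length T+1 and includes the known initial state x[0] = E_0.
    """
    x_prev, x_curr = x[:-1], x[1:]
    T = x_curr.shape[0]
    A = np.sum(x_prev**2) + lam
    B = lam * mu + np.sum(x_curr * x_prev)
    C = np.sum(x_curr**2) + 2.0 * beta + lam * mu**2
    return B / A, A, alpha + T / 2.0, 0.5 * (C - B**2 / A)

def sample_theta_star(x, y, nig_prior, ig_prior, rng):
    """One Gibbs draw of (a, sigma_X^2, sigma_Y^2) given states x and data y."""
    mu_p, lam_p, al_p, be_p = nig_posterior(x, *nig_prior)
    sx2 = 1.0 / rng.gamma(al_p, 1.0 / be_p)      # IG(alpha, beta) as 1 / Gamma(alpha, scale=1/beta)
    a = rng.normal(mu_p, np.sqrt(sx2 / lam_p))
    al_y, be_y = ig_prior
    resid = y - x[1:]
    sy2 = 1.0 / rng.gamma(al_y + y.shape[0] / 2.0, 1.0 / (be_y + 0.5 * np.sum(resid**2)))
    return a, sx2, sy2
```

Note that nothing here constrains the draw of $a$ to $(0, 1)$, so mapping back to $\rho$ requires either a truncated prior or rejecting draws outside that interval.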

With these results, we can now do some inference!

## Choice of MCMC Algorithms

Take as your example the OU process. Marginalising out the paths between the end points, such a process is a LGSSM, so we can implement the following MCMC algorithms to infer the parameter $\theta^*$ or jointly $(E_{1:T}, \theta^*)$:

- IMMH with access to the true marginal likelihood $p_{\theta}(y_{1:T})$ - **correct**
- PMMH with varying $N$ to test performance when there is low variance. - **correct**
- Single Site Gibbs - Conjugate theta update - **correct**
- Single Site Gibbs - MH theta update
- Particle Gibbs - Conjugate theta update - **correct**
- Particle Gibbs - MH theta update
- Particle Gibbs with the backward step - Conjugate Theta update - **correct**
- Particle Gibbs with the backward step - MH theta update

We then also have for CDSSMs:

- PMMH using Forward Proposal (no need for a reparameterised fk class as only filtering)
- Particle Gibbs - Forward Proposal - MH theta update
- Particle Gibbs - Backward Proposal - MH theta update
- PGBS - Forward Proposal - MH theta update
- PGBS - Backward Proposal - MH theta update

This gives a total of 13 different MCMC algorithms to debug.

- Full conditional update Gibbs (will require Kalman filter simulation)

Start off with this to familiarise yourself with the `mcmc.py` library, and then take it from there!

You have created the following new classes for general use:

- IMMH: Idealised Marginal MH algos, that can be applied for parameter inference on LGSSMs
- AutoGibbs: Automated full parameter updates using MWG steps - uses MetropoliswithinGibbs class.
- AutoParticleGibbs: PG with automated full parameter updates using MWG steps - uses MetropoliswithinGibbs class.
- CDSSM_ParticleGibbs: PG using reparameterised CDSSM FK models, with automated full parameter updates, using MWG steps - uses CDSSM_MetropoliswithinGibbs class.

## First Results

I ran and debugged the algorithms for $T=100$ timesteps, using $1000$ MCMC steps. This resulted in the following output:

- **IMMH**: Run time: 4.8s, MSJD: 4.5
- **PMMH**: Run time: 10.7s, MSJD: 7.0
- **SSGibbs**: Run time: 0.4s, MSJD: 197874.9
- **SSAutoGibbs**: Run time: 41.2s, MSJD: 0.0
- **PGibbs**: Run time: 11.5s, MSJD: 314790.6
- **AutoPGibbs**: Run time: 54.6s, MSJD: 0.0
- **PGibbsBS**: Run time: 15.2s, MSJD: 284293.5
- **AutoPGibbsBS**: Run time: 57.8s, MSJD: 0.0
- **CDSSM_PMMH**: Run time: 41.9s, MSJD: 3.6
- **FW_CDSSM_PG**: Run time: 84.5s, MSJD: 0.0
- **BW_CDSSM_PG**: Run time: 59.6s, MSJD: 0.0
- **BW_CDSSM_PGBS**: Run time: 74.5s, MSJD: 0.0
- **FW_CDSSM_PGBS**: Run time: 112.1s, MSJD: 0.0


The total run time of all the algorithms was 568.7 seconds
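For reference, a minimal sketch of how a mean squared jumping distance can be computed from a stored chain. It assumes the MSJD is the average squared Euclidean distance between consecutive states; whether the figures above were computed over parameters only or over parameters and latent states is not specified here, and the function name is hypothetical.

```python
import numpy as np

def msjd(chain):
    """Mean squared jumping distance of a chain of shape (niter,) or (niter, dim)."""
    chain = np.asarray(chain, dtype=float)
    if chain.ndim == 1:
        chain = chain[:, None]
    jumps = np.diff(chain, axis=0)          # (niter-1, dim) consecutive moves
    return np.mean(np.sum(jumps**2, axis=1))
```

Under this definition an MSJD of exactly 0.0 means that the stored chain never moved, i.e. every proposal was rejected or the state was not being written back.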

#### Run times for niter=100000

- MCMC algorithm IMMH finished. Run time: 422.917699709069, MSJD: 226.15912305396
- MCMC algorithm PMMH finished. Run time: 1156.0924493749626, MSJD: 228.37935091689414
- MCMC algorithm SSGibbs finished. Run time: 27.983020332991146, MSJD: 973.3467055182768
- MCMC algorithm PGibbs finished. Run time: 1165.92318862502, MSJD: 1720.186074574441
- MCMC algorithm PGibbsBS finished. Run time: 1512.8000926659442, MSJD: 1412.609163801374

Total run time: 4285.7 seconds

#### Run times for niter=1000, Nsteps=100

- MCMC algorithm SSAutoGibbs finished. Run time: 492.66876841697376, MSJD: 2.364475633333281
- MCMC algorithm AutoPGibbs finished. Run time: 505.24563491600566, MSJD: 1.5275133605689981
- MCMC algorithm AutoPGibbsBS finished. Run time: 510.16826579195913, MSJD: 2.7337942300261173

Total run time: 1508.1 seconds

#### Run times for niter=100, Nsteps=100

- MCMC algorithm CDSSM_PMMH finished. Run time: 4.326336166006513, MSJD: 0.0006807386535997021
- MCMC algorithm FW_CDSSM_PGBS finished. Run time: 303.1901090419851, MSJD: 0.12257182145364039
- MCMC algorithm BW_CDSSM_PGBS finished. Run time: 161.39417616603896, MSJD: 0.5776958863503586
- MCMC algorithm BW_CDSSM_PG finished. Run time: 160.71721804200206, MSJD: 0.09070488696201738
- MCMC algorithm FW_CDSSM_PG finished. Run time: 298.9900925840484, MSJD: 0.34570081561265703
- Total time: 928.617932000081

#### Run times for niter=23000, Nsteps=10

- MCMC algorithm IMMH finished. Run time: 95.13535604206845, MSJD: 53.52646511809797
- MCMC algorithm PMMH finished. Run time: 263.93639608402736, MSJD: 55.533539226168244
- MCMC algorithm SSGibbs finished. Run time: 6.433977541979402, MSJD: 228.29189829263794
- MCMC algorithm SSAutoGibbs finished. Run time: 1147.6820821660804, MSJD: 4.0953430680514575
- MCMC algorithm PGibbs finished. Run time: 267.7983410420129, MSJD: 219.5383843231522
- MCMC algorithm AutoPGibbs finished. Run time: 1401.8546708330978, MSJD: 7.510379642630769
- MCMC algorithm PGibbsBS finished. Run time: 342.2828895000275, MSJD: 228.94151514681462
- MCMC algorithm AutoPGibbsBS finished. Run time: 1477.0890752909472, MSJD: 9.439677113318712
- MCMC algorithm CDSSM_PMMH finished. Run time: 1010.1793559170328, MSJD: 2.4864619067024045
- MCMC algorithm FW_CDSSM_PG finished. Run time: 7995.6691050839145, MSJD: 14.40418243649925
- MCMC algorithm BW_CDSSM_PG finished. Run time: 4603.258861584007, MSJD: 2.5234067378015217
- MCMC algorithm BW_CDSSM_PGBS finished. Run time: 4850.449482000084, MSJD: 3.8864972710139414
- MCMC algorithm FW_CDSSM_PGBS finished. Run time: 8519.406258875038, MSJD: 4.676462371430976
- MCMC Algorithm runs complete. Total run time: 31981.175851960317

#### Run times for niter=80000, Nsteps=1

Plot ideas:


- Chain Pairplots
- Marginal histograms of parameters
- ACFs
- Acceptance Rate vs MSJD

Next steps:

- Write a function to store the asymptotic_var for the states and for the parameters.
    - We can use this to calculate the effective sample size per unit time of the methods that have been run.
- Clean up the code base a bit. It is a mess!
    - Extend the PMMH class so that it creates a state container with which to store x, and stores it.
    - Create an IMMH class to do marginal inference on just the latent states.
    - All algorithms that run the MWG step are running extremely slowly. Try to fix the bug!
    - Think about automating Gibbs sampling of the states so that it has separate MWG steps. This may improve algo performance.


Coding stuff:

- Write a function that takes as input a completed run `MCMC` object and adds an attribute `mcse` and `ess` for the individual chain. We can use this to evaluate the performance of the MCMC algorithms by taking `ess`/`cpu`
- Finish writing the function that converts an MCMC output into an inference_data object. This will be useful for analysing chain outputs using the stats and plots in `arviz`. We can combine runs from multiple chains, and construct pooled estimators of `ess` and `cpu_time`
- Write a `PIMH` and `iCSMC` class for standard SSMs, then write them for CDSSMs. Test them to make sure that they work.
- Extend `CDSSM_SMC` class in `feynman_kac.py` so that after running the filter, it transforms the paths in the particle history object, instead of applying the transform at each time step in the filter.
- Start building out the functionality for inference in higher dimensions.
- Can run nice experiments with `online_smoothing.py` and `hybrid_paris.py` as benchmarks. 
- Fix the issues in `parallel_filtering.py` 

To do:

- Store log posterior and acceptance rate information in the 'summary_stats' group
- Don't worry about the acceptance rates: they vary at different timesteps for NUTS runs.

# TS_MvOrnsteinUhlenbeck results

## General Comments:

- The filtering and smoothing is working for both the Bootstrap Reparameterised methods and the Forward Reparameterised methods.
- The bootstrap reparameterised methods perform slightly worse than the standard bootstrap in the filtering, due to the introduction of bias in the sampling.
- The forward reparameterised methods outperform the standard bootstrap and bootstrap reparameterised methods in the filtering. This translates to outperformance in the smoothing.
- The 'better' forward proposals outperform more standard/naive choices of forward proposals (e.g. the OUP vs NDBBrP).
- The backward proposals don't work, likely because there is an issue with the end point proposal.

## Bootstrap Reparameterised Models

### Filtering

- For all of the Bootstrap Reparameterised filters, the performance of the estimators of the filtering means/marginal likelihoods is reasonable.
    - There is a small amount of additional bias from the bootstrap methods, which comes from sampling using a numerical scheme.
    - The level of bias is the same for all reparameterised models.
    - This implies that the simulation from the multivariate model SDEs is working well.

### Smoothing 

- When using the Delyon-Hu bridge, the smoothing distribution is correctly captured.
- When using a van-der-Meulen Schauer bridge, the smoothing distribution is correctly captured (with some bias) with all choices of auxiliary bridge at times $t>2$
- At time $t=2$, in one of the dimensions, all of the van-der-Meulen Schauer bridges fail to correctly move the particles to capture the smoothing distribution.
    - This is likely due to the difficulty of the problem: even when the ancestors can be resampled, it may be the case that they cannot select a single particle.


## Forward Reparameterised Results



### Filtering

- For all of the Bootstrap Reparameterised filters, the performance of the estimators of the filtering means/marginal likelihoods is reasonable.
    - There is a small amount of additional bias from the bootstrap methods, which comes from sampling using a numerical scheme.
    - The level of bias is the same for all reparameterised models.
    - This implies that the simulation from the multivariate model SDEs is working well.

### Smoothing 

- When using the Delyon-Hu bridge, the smoothing distribution is correctly captured.
- When using a van-der-Meulen Schauer bridge, the smoothing distribution is correctly captured (with some bias) with all choices of auxiliary bridge at times $t>2$
- At time $t=2$, in one of the dimensions, all of the van-der-Meulen Schauer bridges fail to correctly move the particles to capture the smoothing distribution.
    - This is likely due to the difficulty of the problem: even when the ancestors can be resampled, it may be the case that they cannot select a single particle.


## Backward Reparameterised Results

There is an issue in the filtering: the performance of all backward methods is much worse than that of the bootstrap. As a result,
the performance of the smoothing methods is also poor.

Since the performance is okay for all of the forward smoothing methods, it is likely that there is an issue with the proposal of the end point.

# Additional algorithms

## Online smoothing of pathspace additive functionals

The evaluation of pathspace additive functionals is useful, as it can be applied to develop:

- An Online EM algorithm for maximum likelihood
    - This requires evaluation of the smoothing expectation of the joint likelihood in the E-step.
    - The M-step is non-trivial on the pathspace.
- An Online gradient ascent algorithm for Maximum likelihood
    - This requires evaluation of the score function of the model at each update.
        - Using Fisher's identity, we can evaluate the score by integrating the gradient of the joint likelihood 
            w.r.t the smoothing distribution.

## Rejection-based SMC algorithms

If we are able to come up with an upper bound for $M_t(z_t|z_{t-1})G_t(z_{t-1}, z_t)$, then we can implement the following algorithms:

- FFBS-Reject (Offline smoothing) (Need to define method) `fk.upper_bound_trans(t)`
- PaRIS algorithm (Online smoothing) (Need to define method) `fk.ssm.upper_bound_log_pt(t)` 

These methods do smoothing at a cost of $\mathcal{O}(NT)$, where $N$ is the number of particles. It is non-trivial to come up with an upper bound for the forward and backward methods.

