GP prediction with uncertain inputs
---
*Nicolas Knudde & Joachim Van der Herten 2016*

These are implementation notes on GP prediction with uncertain inputs in the variational framework, as implemented in `GPflow.gplvm.BayesianGPLVM`. The underlying reference is [Girard 2003, *Gaussian process priors with uncertain inputs—application to multiple-step ahead time series forecasting.*](http://www.dcs.gla.ac.uk/~rod/publications/GirRasMur02-TR-2002-119.pdf), in the formulation of [Damianou 2011, *Variational Gaussian Process Dynamical Systems*](https://arxiv.org/abs/1107.4985). These notes map the conclusions of that paper to the implementation in GPflow. The reference work relies only on the properties of conditional expectations and covariances.

Two things are not covered by this notebook: prior mean functions and the extension to multiple independent outputs. These extensions are straightforward in theory but we have taken some care in the code to ensure that they are handled efficiently. 

### Prediction of mean and variance
The main result is that for an uncertain test point $\boldsymbol x_\star$, even though the output distribution is not Gaussian, its mean and covariance can still be computed, provided the required kernel expectations can be evaluated. The resulting formula for the mean is:

$$ \boldsymbol \mu_\star = \boldsymbol \Psi_{\star}  \boldsymbol \chi $$

and for the covariance:

$$ \boldsymbol \Sigma_\star = \boldsymbol \chi^T (\boldsymbol \Phi_{\star} - \boldsymbol \Psi_{\star}^T \boldsymbol \Psi_{\star}) \boldsymbol \chi + \xi_\star \boldsymbol I - \text{tr}((\boldsymbol K_{uu}^{-1} - (\boldsymbol K_{uu}+\beta \boldsymbol \Phi)^{-1}) \boldsymbol \Phi_\star) \boldsymbol I$$

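with $\boldsymbol \chi = \beta (\boldsymbol K_{uu}+\beta \boldsymbol \Phi)^{-1} \boldsymbol \Psi^T \boldsymbol Y$, $\mathbf \Phi_{\star} \in \mathbb{R}^{M \times M}$ and $\mathbf \Psi_{\star} \in \mathbb{R}^{1 \times M}$. Here $\beta$ denotes the noise precision and, following the usual Bayesian GP-LVM notation, $\boldsymbol \Psi$ and $\boldsymbol \Phi$ are the kernel expectations over the training inputs, while $\boldsymbol \Psi_\star$, $\boldsymbol \Phi_\star$ and $\xi_\star$ are the corresponding expectations under the distribution of $\boldsymbol x_\star$.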
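
To make the kernel expectations concrete, the sketch below estimates $\boldsymbol \Psi_\star$, $\boldsymbol \Phi_\star$ and $\xi_\star$ by Monte Carlo for a Gaussian-distributed test input. It assumes the usual Bayesian GP-LVM definitions $\boldsymbol \Psi_\star = \mathbb{E}[k(\boldsymbol x_\star, \boldsymbol Z)]$, $\boldsymbol \Phi_\star = \mathbb{E}[k(\boldsymbol Z, \boldsymbol x_\star) k(\boldsymbol x_\star, \boldsymbol Z)]$ and $\xi_\star = \mathbb{E}[k(\boldsymbol x_\star, \boldsymbol x_\star)]$, with $\boldsymbol Z$ the inducing inputs, and an RBF kernel. The helper names `rbf` and `mc_kernel_expectations` are hypothetical and not part of GPflow; sampling is used here only for illustration, and closed-form expressions would normally be preferred where they exist.

```python
import numpy as np

def rbf(X1, X2, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of X1 and X2."""
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)

def mc_kernel_expectations(x_mean, x_var, Z, n_samples=5000, seed=0):
    """Monte Carlo estimates of (Psi_star, Phi_star, xi_star)
    for x_star ~ N(x_mean, diag(x_var)). Illustrative helper, not a GPflow API."""
    rng = np.random.default_rng(seed)
    xs = x_mean + np.sqrt(x_var) * rng.standard_normal((n_samples, x_mean.size))
    Ksu = rbf(xs, Z)                             # (S, M): row s is k(x_s, Z)
    Psi_star = Ksu.mean(axis=0, keepdims=True)   # (1, M)
    Phi_star = Ksu.T @ Ksu / n_samples           # (M, M)
    # k(x, x) is constant for a stationary kernel, so no averaging is needed.
    xi_star = float(rbf(xs[:1], xs[:1])[0, 0])
    return Psi_star, Phi_star, xi_star

Z = np.array([[-1.0, 0.0], [0.0, 0.0], [1.0, 0.5]])   # M = 3 inducing inputs
Psi_star, Phi_star, xi_star = mc_kernel_expectations(
    x_mean=np.array([0.2, -0.1]), x_var=np.array([0.3, 0.3]), Z=Z)

# Sample covariance of k(x_star, Z): the matrix inside chi^T (...) chi.
excess = Phi_star - Psi_star.T @ Psi_star
```

The matrix $\boldsymbol \Phi_\star - \boldsymbol \Psi_\star^T \boldsymbol \Psi_\star$ is the covariance of $k(\boldsymbol x_\star, \boldsymbol Z)$, so it is positive semi-definite and vanishes when the input is certain. This is the term through which input uncertainty enters the first part of $\boldsymbol \Sigma_\star$.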

In fact, the formula for the mean is exactly the GP-LVM prediction formula for a certain input, with the kernel matrices replaced by kernel expectations. The derivation is nevertheless repeated here.
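
As a sanity check of this statement, the following sketch evaluates the two formulas literally with NumPy and feeds them certain inputs, for which the kernel expectations reduce to ordinary kernel matrices ($\boldsymbol \Psi = \boldsymbol K_{fu}$, $\boldsymbol \Phi = \boldsymbol K_{uf} \boldsymbol K_{fu}$, $\boldsymbol \Psi_\star = k(\boldsymbol x_\star, \boldsymbol Z)$, $\boldsymbol \Phi_\star = \boldsymbol \Psi_\star^T \boldsymbol \Psi_\star$, $\xi_\star = k(\boldsymbol x_\star, \boldsymbol x_\star)$). The RBF kernel, the toy data and the function names are assumptions made for illustration and do not come from GPflow.

```python
import numpy as np

def rbf(X1, X2, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of X1 and X2."""
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)

def predict_direct(Kuu, Psi, Phi, Y, beta, Psi_star, Phi_star, xi_star):
    """Literal evaluation of mu_star and Sigma_star.
    Uses explicit inverses, so it is meant for checking, not for production."""
    D = Y.shape[1]
    Kuu_inv = np.linalg.inv(Kuu)
    S_inv = np.linalg.inv(Kuu + beta * Phi)
    chi = beta * S_inv @ Psi.T @ Y                        # (M, D)
    mean = Psi_star @ chi                                 # (1, D)
    trace_term = np.trace((Kuu_inv - S_inv) @ Phi_star)
    cov = (chi.T @ (Phi_star - Psi_star.T @ Psi_star) @ chi
           + (xi_star - trace_term) * np.eye(D))          # (D, D)
    return mean, cov

# Toy 1-D regression problem (assumed data, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(30, 1))
Y = np.sin(X) + 0.1 * rng.standard_normal((30, 1))
Z = np.linspace(-2.0, 2.0, 5)[:, None]                    # inducing inputs
beta = 100.0                                              # precision of noise with std 0.1

# Certain inputs: expectations collapse to plain kernel evaluations.
Kuu = rbf(Z, Z)
Psi = rbf(X, Z)
Phi = Psi.T @ Psi
x_star = np.array([[0.3]])
Psi_star = rbf(x_star, Z)
Phi_star = Psi_star.T @ Psi_star
xi_star = 1.0

mean, cov = predict_direct(Kuu, Psi, Phi, Y, beta, Psi_star, Phi_star, xi_star)

# Usual sparse-GP predictive variance of the latent function, for comparison.
S_inv = np.linalg.inv(Kuu + beta * Phi)
var_certain = xi_star - Psi_star @ (np.linalg.inv(Kuu) - S_inv) @ Psi_star.T
```

With a certain test input the term $\boldsymbol \chi^T (\boldsymbol \Phi_\star - \boldsymbol \Psi_\star^T \boldsymbol \Psi_\star) \boldsymbol \chi$ is exactly zero, and `cov` coincides with `var_certain`.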

### Calculation of mean
\begin{align*}
\boldsymbol L &= \text{chol}( \boldsymbol K_{uu} ) \\
\boldsymbol A &= \sqrt{\beta} \boldsymbol L^{-1} \boldsymbol \Psi^T\\
\boldsymbol {tmp} &= \boldsymbol L^{-1}\boldsymbol \Phi \\
\boldsymbol {AAT} &= \beta \boldsymbol L^{-1} \boldsymbol {tmp} ^T \\ &= \beta \boldsymbol L^{-1}  \boldsymbol \Phi \boldsymbol L^{-T} \\
\boldsymbol B &=  \boldsymbol {AAT} + \boldsymbol I \\ &= \beta \boldsymbol L^{-1} \boldsymbol {tmp} ^T + \boldsymbol I \\&= \beta \boldsymbol L^{-1}  \boldsymbol \Phi \boldsymbol L^{-T} + \boldsymbol I \\
\boldsymbol {LB} &= \text{chol}( \boldsymbol B ) \\
\boldsymbol {c} &=\sqrt{\beta} \boldsymbol {LB}^{-1} \boldsymbol A \boldsymbol Y \\
\boldsymbol{tmp1} &= \boldsymbol L^{-1} \boldsymbol \Psi_\star^T \\
\boldsymbol{tmp2} &= \boldsymbol {LB}^{-1} \boldsymbol {tmp1} \\
 \boldsymbol{\mu}_\star &= \boldsymbol{tmp2}^T \boldsymbol c \\
&= \sqrt{\beta} \boldsymbol {tmp1}^T \boldsymbol {LB}^{-T} \boldsymbol {LB}^{-1} \boldsymbol A \boldsymbol Y \\
&= \beta \boldsymbol \Psi_\star \boldsymbol L^{-T} \boldsymbol {LB}^{-T} \boldsymbol {LB}^{-1} \boldsymbol L^{-1} \boldsymbol \Psi^T \boldsymbol Y \\ 
&= \beta \boldsymbol \Psi_\star \boldsymbol L^{-T} (\beta \boldsymbol L^{-1}  \boldsymbol \Phi \boldsymbol L^{-T} + \boldsymbol I)^{-1} \boldsymbol L^{-1} \boldsymbol \Psi^T \boldsymbol Y \\ 
&= \beta \boldsymbol \Psi_\star (\beta \boldsymbol \Phi + \boldsymbol K_{uu})^{-1} \boldsymbol \Psi^T \boldsymbol Y \\
&= \boldsymbol \Psi_\star \boldsymbol \chi\end{align*}
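
The chain of equalities above can be checked numerically. The sketch below mirrors the intermediate quantities (`L`, `A`, `B`, `LB`, `c`, `tmp1`, `tmp2`) in NumPy and compares the result with the closed-form mean. The random test matrices are assumptions chosen only to be well conditioned, and `np.linalg.solve` stands in for the triangular solver that an actual implementation would use.

```python
import numpy as np

def mean_via_cholesky(Kuu, Psi, Phi, Y, beta, Psi_star):
    """Predictive mean following the Cholesky-based steps of the derivation."""
    M = Kuu.shape[0]
    L = np.linalg.cholesky(Kuu)                       # K_uu = L L^T
    A = np.sqrt(beta) * np.linalg.solve(L, Psi.T)     # (M, N)
    tmp = np.linalg.solve(L, Phi)                     # L^{-1} Phi
    AAT = beta * np.linalg.solve(L, tmp.T)            # beta L^{-1} Phi L^{-T}
    B = AAT + np.eye(M)
    LB = np.linalg.cholesky(B)                        # B = LB LB^T
    c = np.sqrt(beta) * np.linalg.solve(LB, A @ Y)    # (M, D)
    tmp1 = np.linalg.solve(L, Psi_star.T)             # (M, 1)
    tmp2 = np.linalg.solve(LB, tmp1)                  # (M, 1)
    return tmp2.T @ c                                 # (1, D)

# Random, well-conditioned stand-ins for the kernel statistics (assumed data).
rng = np.random.default_rng(1)
N, M, D = 12, 4, 2
G = rng.standard_normal((M, M))
Kuu = G @ G.T + M * np.eye(M)                         # symmetric positive definite
Psi = rng.standard_normal((N, M))
Phi = Psi.T @ Psi                                     # symmetric positive semi-definite
Y = rng.standard_normal((N, D))
Psi_star = rng.standard_normal((1, M))
beta = 4.0

mu_chol = mean_via_cholesky(Kuu, Psi, Phi, Y, beta, Psi_star)
mu_direct = beta * Psi_star @ np.linalg.solve(beta * Phi + Kuu, Psi.T @ Y)
```

Both routes give the same mean up to floating-point error. The Cholesky route avoids forming $(\boldsymbol K_{uu} + \beta \boldsymbol \Phi)^{-1}$ explicitly.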

### Calculation of covariance
\begin{align*}
\boldsymbol {tmp3} &= \boldsymbol{LB}^{-1} \boldsymbol L^{-1} \boldsymbol \Psi_\star^T \\
\boldsymbol {tmp4} &= \boldsymbol {tmp3} \boldsymbol {tmp3}^T \\
&= \boldsymbol{LB}^{-1} \boldsymbol L^{-1} \boldsymbol \Psi_\star^T \boldsymbol \Psi_\star  \boldsymbol L^{-T} \boldsymbol{LB}^{-T} \\
\boldsymbol {tmp5} &= \boldsymbol L^{-1} \boldsymbol{\Phi}_\star \boldsymbol L^{-T} \\
\boldsymbol {tmp6} &= \boldsymbol {LB}^{-1} \boldsymbol {L}^{-1} \boldsymbol{\Phi}_\star \boldsymbol L^{-T} \boldsymbol {LB}^{-T} \\
TT &= \text{tr}(\boldsymbol {tmp5} - \boldsymbol {tmp6}) \\
&= \text{tr}(\boldsymbol L^{-T} \boldsymbol L^{-1}\boldsymbol{\Phi}_\star - \boldsymbol L^{-T} \boldsymbol {LB}^{-T}\boldsymbol {LB}^{-1} \boldsymbol {L}^{-1} \boldsymbol{\Phi}_\star)\\
&=\text{tr}((\boldsymbol K_{uu}^{-1} - (\boldsymbol K_{uu}+\beta \boldsymbol \Phi)^{-1}) \boldsymbol \Phi_\star)\\
\boldsymbol{diagonals} &= (\xi_\star - TT) \boldsymbol{I} \\
\boldsymbol{covar1} &= \boldsymbol{c}^T (\boldsymbol {tmp6}-\boldsymbol {tmp4}) \boldsymbol{c} \\
&= \beta \boldsymbol{Y}^T  \boldsymbol{A}^T \boldsymbol{LB}^{-T} \boldsymbol{LB}^{-1} \boldsymbol L^{-1} (\boldsymbol{\Phi}_\star - \boldsymbol \Psi_\star^T \boldsymbol \Psi_\star)  \boldsymbol L^{-T} \boldsymbol{LB}^{-T} \boldsymbol {LB}^{-1} \boldsymbol A \boldsymbol Y \\ 
&= \boldsymbol \chi^T (\boldsymbol \Phi_{\star} - \boldsymbol \Psi_{\star}^T \boldsymbol \Psi_{\star}) \boldsymbol \chi \\
\boldsymbol \Sigma_\star &=\boldsymbol{diagonals} +\boldsymbol{covar1} \\
&= \boldsymbol \chi^T (\boldsymbol \Phi_{\star} - \boldsymbol \Psi_{\star}^T \boldsymbol \Psi_{\star}) \boldsymbol \chi + \xi_\star \boldsymbol I - \text{tr}((\boldsymbol K_{uu}^{-1} - (\boldsymbol K_{uu}+\beta \boldsymbol \Phi)^{-1}) \boldsymbol \Phi_\star) \boldsymbol{I}
\end{align*}
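
The covariance derivation can be verified in the same way. The sketch below reproduces `tmp3` to `tmp6`, `TT`, `diagonals` and `covar1` in NumPy and compares the result against a literal evaluation of the formula for $\boldsymbol \Sigma_\star$. As before, the random inputs are assumptions used only for the check, the function name is hypothetical, and `np.linalg.solve` replaces a dedicated triangular solver.

```python
import numpy as np

def cov_via_cholesky(Kuu, Psi, Phi, Y, beta, Psi_star, Phi_star, xi_star):
    """Predictive covariance following the Cholesky-based steps of the derivation."""
    M, D = Kuu.shape[0], Y.shape[1]
    L = np.linalg.cholesky(Kuu)
    A = np.sqrt(beta) * np.linalg.solve(L, Psi.T)
    B = beta * np.linalg.solve(L, np.linalg.solve(L, Phi).T) + np.eye(M)
    LB = np.linalg.cholesky(B)
    c = np.sqrt(beta) * np.linalg.solve(LB, A @ Y)              # (M, D)

    tmp3 = np.linalg.solve(LB, np.linalg.solve(L, Psi_star.T))  # (M, 1)
    tmp4 = tmp3 @ tmp3.T
    # Phi_star and tmp5 are symmetric, so transposing between solves is valid.
    tmp5 = np.linalg.solve(L, np.linalg.solve(L, Phi_star).T)   # L^{-1} Phi_star L^{-T}
    tmp6 = np.linalg.solve(LB, np.linalg.solve(LB, tmp5).T)     # LB^{-1} tmp5 LB^{-T}
    TT = np.trace(tmp5 - tmp6)
    diagonals = (xi_star - TT) * np.eye(D)
    covar1 = c.T @ (tmp6 - tmp4) @ c
    return diagonals + covar1

# Random, well-conditioned stand-ins for the kernel statistics (assumed data).
rng = np.random.default_rng(2)
N, M, D = 12, 4, 2
G = rng.standard_normal((M, M))
Kuu = G @ G.T + M * np.eye(M)
Psi = rng.standard_normal((N, M))
Phi = Psi.T @ Psi
Y = rng.standard_normal((N, D))
Psi_star = rng.standard_normal((1, M))
H = rng.standard_normal((M, M))
Phi_star = Psi_star.T @ Psi_star + 0.1 * H @ H.T                # symmetric
xi_star, beta = 1.0, 4.0

cov_chol = cov_via_cholesky(Kuu, Psi, Phi, Y, beta, Psi_star, Phi_star, xi_star)

# Literal evaluation of the final formula for Sigma_star.
S = Kuu + beta * Phi
chi = beta * np.linalg.solve(S, Psi.T @ Y)
trace_term = np.trace((np.linalg.inv(Kuu) - np.linalg.inv(S)) @ Phi_star)
cov_direct = (chi.T @ (Phi_star - Psi_star.T @ Psi_star) @ chi
              + (xi_star - trace_term) * np.eye(D))
```

The two results agree up to floating-point error, which confirms that `TT` equals the trace term and `covar1` equals $\boldsymbol \chi^T (\boldsymbol \Phi_\star - \boldsymbol \Psi_\star^T \boldsymbol \Psi_\star) \boldsymbol \chi$.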