# Gaussian Process Latent Variable Models with SVI 

Vidhi Lalchand, 2021

## Introduction 

In this notebook we demonstrate the GPLVM model class introduced in [Lawrence, 2005](https://proceedings.neurips.cc/paper/2003/file/9657c1fffd38824e5ab0472e022e577e-Paper.pdf) and its Bayesian incarnation introduced in [Titsias & Lawrence, 2010](http://proceedings.mlr.press/v9/titsias10a/titsias10a.pdf).

GPLVMs use Gaussian processes in an unsupervised context, where a low-dimensional representation of the data ($X \equiv \{\mathbf{x}_{n}\}_{n=1}^{N}\in \mathbb{R}^{N \times Q}$) is learnt from high-dimensional real-valued observations $Y \equiv \{\mathbf{y}_{n}\}_{n=1}^{N} \in \mathbb{R}^{N \times D}$. Choosing $Q < D$ provides dimensionality reduction. The forward mapping ($X \longrightarrow Y$) is governed by GPs defined independently across the $D$ output dimensions. $Q$, the dimensionality of the latent space, is usually fixed beforehand.

One can either learn point estimates for each $\mathbf{x}_{n}$ by maximizing the GP marginal likelihood (use `gpytorch.mlls.ExactMarginalLogLikelihood` for this) jointly with respect to the kernel hyperparameters $\theta$ and the latent inputs $\mathbf{x}_{n}$. Alternatively, one can variationally integrate out $X$ using the sparse variational formulation, in which a variational distribution $q(X) = \prod_{n=1}^{N}\mathcal{N}(\mathbf{x}_{n}; \mu_{n}, s_{n}\mathbb{I}_{Q})$ approximates the posterior over the latents. This tutorial focuses on the latter.
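
As a rough illustration of the factorised $q(X)$, the NumPy sketch below draws a reparameterised sample and evaluates the closed-form KL divergence to a standard normal distribution over the latents. The names `mu`, `s`, `sample_q`, and `kl_to_standard_normal` are hypothetical, `s` is treated as one isotropic variance per data point, and the standard normal reference distribution is an assumption of this sketch; it is not GPyTorch's implementation.

```python
import numpy as np

def sample_q(mu, s, rng):
    """Draw X ~ q(X) via the reparameterisation x_n = mu_n + sqrt(s_n) * eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.sqrt(s)[:, None] * eps

def kl_to_standard_normal(mu, s):
    """Sum over n of KL( N(mu_n, s_n I_Q) || N(0, I_Q) )."""
    Q = mu.shape[1]
    per_point = 0.5 * (Q * s + (mu ** 2).sum(axis=1) - Q - Q * np.log(s))
    return per_point.sum()

rng = np.random.default_rng(0)
N, Q = 5, 2
mu = rng.standard_normal((N, Q))   # variational means, shape N x Q
s = np.full(N, 0.5)                # one variance per data point (assumed isotropic)

X_sample = sample_q(mu, s, rng)
kl = kl_to_standard_normal(mu, s)
```

The KL term vanishes only when every $\mu_{n} = \mathbf{0}$ and $s_{n} = 1$, which is why it acts as a regulariser on the latent positions.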

The probabilistic model is: 

\begin{align*}
\textrm{Prior on latents: } p(X) &= \displaystyle \prod _{n=1}^N \mathcal{N} (\mathbf{x}_{n};\mathbf{0}, \mathbb{I}_{Q}),\\
\textrm{Prior on mapping: } p(\mathbf{f}|X, \mathbf{\theta}) &= \displaystyle \prod_{d=1}^{D}\mathcal{N}(\mathbf{f}_{d}; \mathbf{0}, K_{ff}^{(d)}),\\
\textrm{Data likelihood: } p(Y| \mathbf{f}, X) &= \prod_{n=1}^N \prod_{d=1}^D \mathcal{N}(y_{n,d}; \mathbf{f}_{d}(\mathbf{x}_{n}), \sigma^{2}_{y}).
\end{align*}
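
To make the generative story concrete, here is a minimal NumPy sketch that samples synthetic data from the three distributions above. It assumes a single RBF kernel with fixed hyperparameters shared by all $D$ output dimensions; the helper `rbf_kernel`, the jitter value, and the noise variance are illustrative choices, not values taken from this notebook.

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0, outputscale=1.0):
    """Squared-exponential kernel matrix between all rows of X."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return outputscale * np.exp(-0.5 * sq_dists / lengthscale ** 2)

rng = np.random.default_rng(0)
N, Q, D = 50, 2, 12
noise_var = 0.01                      # sigma_y^2 (illustrative value)

# Prior on latents: x_n ~ N(0, I_Q)
X = rng.standard_normal((N, Q))

# Prior on mapping: each column f_d ~ N(0, K_ff); jitter keeps Cholesky stable
K = rbf_kernel(X) + 1e-6 * np.eye(N)
L = np.linalg.cholesky(K)
F = L @ rng.standard_normal((N, D))

# Data likelihood: y_{n,d} ~ N(f_d(x_n), sigma_y^2)
Y_sim = F + np.sqrt(noise_var) * rng.standard_normal((N, D))
```

Inference in the GPLVM runs this process in reverse: only `Y_sim` would be observed, and `X` has to be recovered.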


In [14]:
# Standard imports
import matplotlib.pylab as plt
import torch 
import numpy as np
import os

# `__file__` is not defined inside a notebook, so report the working directory instead.
print(os.getcwd())

# gpytorch imports
#from gpytorch.mlls import VariationalELBO
#from gpytorch.priors import NormalPrior

%matplotlib inline
%load_ext autoreload
%autoreload 2

/home/vidhi/Desktop/Workspace/gpytorch/examples/04_Variational_and_Approximate_GPs

### Set up training data 

We use the canonical multi-phase oilflow dataset from [Titsias & Lawrence, 2010](http://proceedings.mlr.press/v9/titsias10a/titsias10a.pdf), which consists of 1000 observations, each 12-dimensional, belonging to three known classes that correspond to different phases of oilflow.

In [None]:
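import urllib.request
import tarfile

# NOTE: `url` must be set to the location of the 3PhData.tar.gz archive
# before running this cell; it is not defined anywhere in this notebook.
urllib.request.urlretrieve(url, '3PhData.tar.gz')
with tarfile.open('3PhData.tar.gz', 'r') as f:
    f.extract('DataTrn.txt')
    f.extract('DataTrnLbls.txt')

# Observations as a float tensor (torch.pca_lowrank, used below, expects a tensor)
Y = torch.tensor(np.loadtxt(fname='DataTrn.txt'), dtype=torch.float32)
# Collapse the three label columns into a single class id (1, 2 or 3)
labels = np.loadtxt(fname='DataTrnLbls.txt')
labels = (labels @ np.diag([1, 2, 3])).sum(axis=1)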
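
The last line of the cell above is compact, so a toy example may help. Assuming the label file stores one-hot rows (which the `np.diag([1, 2, 3])` product suggests), multiplying by the diagonal matrix scales column $k$ by $k$, and summing each row leaves the class id. The array `one_hot` below is made up for illustration.

```python
import numpy as np

# Three toy one-hot rows standing in for rows of DataTrnLbls.txt
one_hot = np.array([[1., 0., 0.],
                    [0., 0., 1.],
                    [0., 1., 0.]])

# Same operation as in the notebook: scale column k by k, then sum each row
via_diag = (one_hot @ np.diag([1, 2, 3])).sum(axis=1)

# Equivalent formulation: position of the 1 in each row, shifted to start at 1
via_argmax = one_hot.argmax(axis=1) + 1
```

Both formulations give class ids 1, 3, 2 for these rows; the `argmax` form returns integers, while the matrix product returns floats.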

### Setting up the model

We will be using the GPLVM model class, which is compatible with three different modes of inference for the latent variables: point estimates (`PointLatentVariable`), MAP estimates (`MAPLatentVariable`), and variational inference (`VariationalLatentVariable`).

Since we're performing VI, we'll be using a `gpytorch.models.ApproximateGP`. Similar to the [SVGP example](./SVGP_Regression_CUDA.ipynb), we'll use a `VariationalStrategy` and a `CholeskyVariationalDistribution` to define the posterior approximation.
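
To give a feel for what a Cholesky-parameterised variational distribution stores, the NumPy sketch below keeps one mean vector and one lower-triangular factor per output dimension, so that each covariance $S = LL^\top$ is positive definite by construction. The sizes `D` and `M` and all variable names are hypothetical, and this is a conceptual sketch rather than GPyTorch's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 12, 25          # output dimensions (batch) and inducing points per dimension

# Variational mean of the inducing values, one row per output dimension
m = np.zeros((D, M))

# Lower-triangular factors; np.tril acts on the last two axes of the batch
L = np.tril(0.1 * rng.standard_normal((D, M, M)))
idx = np.arange(M)
L[:, idx, idx] = np.abs(L[:, idx, idx]) + 1.0   # strictly positive diagonal

# Batch of covariances S_d = L_d L_d^T, shape (D, M, M)
S = L @ np.swapaxes(L, -1, -2)

# Reparameterised sample u_d = m_d + L_d eps for every output dimension
eps = rng.standard_normal((D, M, 1))
u = m + (L @ eps)[..., 0]
```

Optimising the entries of `L` directly avoids having to constrain a full covariance matrix during training.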



In [2]:
# Assumes the GPLVM classes live under gpytorch.models.gplvm, as in released GPyTorch
from gpytorch.models.gplvm.latent_variable import *
from gpytorch.models.gplvm.bayesian_gplvm import BayesianGPLVM
from matplotlib import pyplot as plt
from tqdm import trange
from gpytorch.means import ZeroMean
from gpytorch.mlls import VariationalELBO
from gpytorch.priors import NormalPrior
from gpytorch.likelihoods import GaussianLikelihood
from gpytorch.variational import VariationalStrategy
from gpytorch.variational import CholeskyVariationalDistribution
from gpytorch.kernels import ScaleKernel, RBFKernel
from gpytorch.distributions import MultivariateNormal

def _init_pca(Y, latent_dim):
    # Project Y onto its leading principal directions to initialise X
    U, S, V = torch.pca_lowrank(Y, q=latent_dim)
    return torch.nn.Parameter(torch.matmul(Y, V[:, :latent_dim]))

class bGPLVM(BayesianGPLVM):

    def __init__(self, n, data_dim, latent_dim, n_inducing, pca=False):

        self.n = n
        self.batch_shape = torch.Size([data_dim])

        # Locations Z_{d} corresponding to u_{d}, they can be randomly initialized or
        # regularly placed with shape (D x n_inducing x latent_dim).
        self.inducing_inputs = torch.randn(data_dim, n_inducing, latent_dim)

        # Sparse Variational Formulation
        q_u = CholeskyVariationalDistribution(n_inducing, batch_shape=self.batch_shape)
        q_f = VariationalStrategy(self, self.inducing_inputs, q_u, learn_inducing_locations=True)

        # Define prior for X
        X_prior_mean = torch.zeros(n, latent_dim)  # shape: N x Q
        prior_x = NormalPrior(X_prior_mean, torch.ones_like(X_prior_mean))

        # Initialise X with PCA or 0s (the PCA branch reads the global tensor Y).
        if pca:
            X_init = _init_pca(Y, latent_dim)
        else:
            X_init = torch.nn.Parameter(torch.zeros(n, latent_dim))

        # LatentVariable (X): variational by default; swap in one of the
        # commented lines for point or MAP estimates.
        X = VariationalLatentVariable(n, data_dim, latent_dim, X_init, prior_x)
        #X = PointLatentVariable(n, latent_dim, X_init)
        #X = MAPLatentVariable(n, latent_dim, X_init, prior_x)

        super(bGPLVM, self).__init__(X, q_f)

        # Zero mean (matching the prior on the mapping) and ARD RBF kernel on the latents
        self.mean_module = ZeroMean()
        self.covar_module = ScaleKernel(RBFKernel(ard_num_dims=latent_dim))

    def forward(self, X):
        mean_x = self.mean_module(X)
        covar_x = self.covar_module(X)
        dist = MultivariateNormal(mean_x, covar_x)
        return dist

    def _get_batch_idx(self, batch_size):
        # Sample a sorted mini-batch of distinct data indices
        valid_indices = np.arange(self.n)
        batch_indices = np.random.choice(valid_indices, size=batch_size, replace=False)
        return np.sort(batch_indices)
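
The `_get_batch_idx` helper is what makes stochastic (mini-batch) training possible: each optimisation step would look at a random subset of the data rather than all $N$ points. A standalone NumPy sketch of the same idea follows, with a hypothetical free function `get_batch_idx` and a toy array `Y_toy` in place of the oilflow data.

```python
import numpy as np

def get_batch_idx(n, batch_size, rng):
    """Return a sorted array of `batch_size` distinct indices in [0, n)."""
    return np.sort(rng.choice(n, size=batch_size, replace=False))

rng = np.random.default_rng(0)
n, data_dim, batch_size = 1000, 12, 100

# Toy stand-in for the observations (not the real oilflow data)
Y_toy = rng.standard_normal((n, data_dim))

idx = get_batch_idx(n, batch_size, rng)
Y_batch = Y_toy[idx]          # rows of Y used in one stochastic step
```

Sampling without replacement guarantees that no data point appears twice within a batch, and the same indices can be used to select the matching latent variables.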