# Description of the modgp.py Model in GPflow

This notebook gives a detailed explanation of the modgp.py module.

In [37]:
import numpy as np
import GPflow
from GPflow import settings
from GPflow.tf_wraps import eye
import tensorflow as tf
from modulating_likelihood import ModLik # likelihood for the modulated GP

Now we define our model class

In [35]:
class ModGP(GPflow.model.Model):
    def __init__(self, X, Y, kern1, kern2, Z):    
        # call parent class constructor
        GPflow.model.Model.__init__(self)

        # set attributes
        self.X, self.Y, self.kern1, self.kern2 = X, Y, kern1, kern2
        self.likelihood = ModLik()
        self.Z = Z
        self.num_inducing = Z.shape[0]
        self.num_data = X.shape[0]
        
        # initialize variational mean
        self.q_mu1 = GPflow.param.Param(np.zeros((self.Z.shape[0], 1)))
        self.q_mu2 = GPflow.param.Param(np.zeros((self.Z.shape[0], 1)))
        
        # initialize variational covariance matrices
        q_sqrt = np.array([np.eye(self.num_inducing) for _ in range(1)]).swapaxes(0, 2) 
        self.q_sqrt1 = GPflow.param.Param(q_sqrt.copy())
        self.q_sqrt2 = GPflow.param.Param(q_sqrt.copy())

Our model class is named ModGP. It inherits attributes and methods from the general class GPflow.model.Model.

We first define the constructor, which takes the data, the kernels and the inducing points as inputs. The ModGP constructor then calls the constructor of GPflow.model.Model.

The attributes are then set: the input data X and output data Y, the two kernels, the inducing points Z, the likelihood, the number of inducing points and the number of observations.

Then the variational parameters are defined and initialized, that is, the mean and covariance of each variational distribution. These parameters are instances of the class GPflow.param.Param. Each mean is initialized as an $M \times 1$ vector of zeros, where $M$ is the number of inducing points.

The square roots of the variational covariance matrices are defined by first generating a list of identity matrices with the code "[np.eye(self.num_inducing) for _ in range(1)]". In this case only one matrix is generated, but when this list is passed through np.array() the resulting array has shape $1 \times M \times M$, and .swapaxes(0, 2) then rearranges it to shape $M \times M \times 1$. I think this shape is required for the GPflow code to run correctly when calling "GPflow.conditionals.conditional".
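
The shape manipulation described above can be checked in isolation with plain NumPy. This is a minimal sketch; the value `M = 4` is an arbitrary choice for illustration and does not come from the module.

```python
import numpy as np

M = 4  # illustrative number of inducing points (assumption, not from modgp.py)

# A list holding a single M x M identity matrix becomes a 1 x M x M array.
stacked = np.array([np.eye(M) for _ in range(1)])
print(stacked.shape)

# Swapping axes 0 and 2 moves the singleton dimension to the end: M x M x 1.
q_sqrt = stacked.swapaxes(0, 2)
print(q_sqrt.shape)

# The single M x M slice is still the identity matrix.
print(np.array_equal(q_sqrt[:, :, 0], np.eye(M)))
```

The trailing dimension of size 1 corresponds to a single latent function per variational distribution, which is why `range(1)` is used.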

Now we describe the second method in the class ModGP

In [34]:
    def build_prior_KL(self):
        # get cov matrix prior
        K1 = self.kern1.K(self.Z) + eye(self.num_inducing) * settings.numerics.jitter_level
        K2 = self.kern2.K(self.Z) + eye(self.num_inducing) * settings.numerics.jitter_level
        
        # calculate KL div
        KL1 = GPflow.kullback_leiblers.gauss_kl(self.q_mu1, self.q_sqrt1, K1)
        KL2 = GPflow.kullback_leiblers.gauss_kl(self.q_mu2, self.q_sqrt2, K2)
        return KL1 + KL2

The method "build_prior_KL" computes the KL divergence between the variational posterior and the prior over the inducing variables. A small jitter term, settings.numerics.jitter_level, is added to the diagonal of each prior covariance matrix (over $\textbf{u}_f$ and $\textbf{u}_g$) for numerical stability.
Once the covariance matrices are available, the KL divergence between each variational approximation and its prior is computed. The prior is assumed to have a zero mean function, so only its covariance matrix is needed; for the variational distribution both the mean vector and the square root of the covariance matrix are required.

As shown in the report, the total KL divergence is equal to the sum of the KL divergences between each variational approximation and its prior.
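
To make the quantity concrete, here is a NumPy sketch of the KL divergence between a Gaussian $\mathcal{N}(\mu, LL^\top)$ and a zero-mean Gaussian prior $\mathcal{N}(0, K)$, summed over the two latent functions. It is not GPflow's implementation: the function names `kl_to_zero_mean_prior` and `rbf`, the lengthscales and the jitter value are assumptions made for this illustration.

```python
import numpy as np

def kl_to_zero_mean_prior(q_mu, q_sqrt, K):
    """KL[ N(q_mu, q_sqrt q_sqrt^T) || N(0, K) ] for a single latent function.

    q_mu:   (M, 1) variational mean
    q_sqrt: (M, M) lower-triangular square root of the variational covariance
    K:      (M, M) prior covariance, jitter already added
    """
    M = K.shape[0]
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L, q_mu)        # L^{-1} mu
    Linv_Lq = np.linalg.solve(L, q_sqrt)    # L^{-1} q_sqrt
    mahalanobis = np.sum(alpha ** 2)        # mu^T K^{-1} mu
    trace = np.sum(Linv_Lq ** 2)            # tr(K^{-1} S), with S = q_sqrt q_sqrt^T
    logdet_K = 2.0 * np.sum(np.log(np.diag(L)))
    logdet_S = 2.0 * np.sum(np.log(np.abs(np.diag(q_sqrt))))
    return 0.5 * (mahalanobis + trace - M + logdet_K - logdet_S)

def rbf(Z, lengthscale):
    """Squared-exponential kernel matrix for 1-D inputs of shape (M, 1)."""
    return np.exp(-0.5 * (Z - Z.T) ** 2 / lengthscale ** 2)

M = 5
Z = np.linspace(0.0, 1.0, M)[:, None]
jitter = 1e-6  # stand-in for settings.numerics.jitter_level
K1 = rbf(Z, 0.3) + jitter * np.eye(M)
K2 = rbf(Z, 0.5) + jitter * np.eye(M)

# Same initialization as in the constructor: zero mean, identity square root.
q_mu, q_sqrt = np.zeros((M, 1)), np.eye(M)
KL = kl_to_zero_mean_prior(q_mu, q_sqrt, K1) + kl_to_zero_mean_prior(q_mu, q_sqrt, K2)

# Sanity check: the divergence vanishes when q equals the prior.
kl_at_prior = kl_to_zero_mean_prior(np.zeros((M, 1)), np.linalg.cholesky(K1), K1)
print(KL, kl_at_prior)
```

The sanity check is a useful way to catch sign or factor mistakes in a hand-written KL term.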

Now we describe the "build_likelihood" method

In [33]:
    def build_likelihood(self):
        # Get prior KL.
        KL = self.build_prior_KL()

        # Get conditionals
        fmean1, fvar1 = GPflow.conditionals.conditional(
            self.X, self.Z, self.kern1, self.q_mu1,
            q_sqrt=self.q_sqrt1, full_cov=False, whiten=False)

        fmean2, fvar2 = GPflow.conditionals.conditional(
            self.X, self.Z, self.kern2, self.q_mu2,
            q_sqrt=self.q_sqrt2, full_cov=False, whiten=False)

        fmean, fvar = tf.concat(1, [fmean1, fmean2]), tf.concat(1, [fvar1, fvar2])

        # Get variational expectations.
        var_exp = self.likelihood.variational_expectations(fmean, fvar, self.Y)

        # re-scale for minibatch size
        scale = (tf.cast(self.num_data, settings.dtypes.float_type) /
                 tf.cast(tf.shape(self.X)[0], settings.dtypes.float_type))

        return tf.reduce_sum(var_exp) * scale - KL

This method first computes the KL term. It then computes the mean and variance of the marginal distributions $q(\textbf{f})$ and $q(\textbf{g})$, e.g. $q(\textbf{f}) = \int p(\textbf{f}|\textbf{u}_f)q(\textbf{u}_f) \ \text{d}\textbf{u}_f$. This is done through the function "GPflow.conditionals.conditional".
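
For a Gaussian $q(\textbf{u}_f) = \mathcal{N}(\textbf{m}, \textbf{S})$ this integral has a closed form. The NumPy sketch below computes the resulting marginal means and variances for the non-whitened parameterization (matching whiten=False and full_cov=False). It is an illustration, not GPflow's code: the helper names `rbf` and `sparse_marginals`, the unit-variance kernel and the jitter value are assumptions.

```python
import numpy as np

def rbf(A, B, lengthscale=0.3):
    """Unit-variance squared-exponential kernel for 1-D inputs (N, 1) and (M, 1)."""
    return np.exp(-0.5 * (A - B.T) ** 2 / lengthscale ** 2)

def sparse_marginals(X, Z, q_mu, q_sqrt, jitter=1e-6):
    """Mean and variance of q(f_n) = int p(f_n | u) q(u) du at each row of X."""
    Kzz = rbf(Z, Z) + jitter * np.eye(len(Z))
    Kxz = rbf(X, Z)
    A = np.linalg.solve(Kzz, Kxz.T).T      # A = Kxz Kzz^{-1}, shape (N, M)
    S = q_sqrt @ q_sqrt.T                  # variational covariance
    mean = A @ q_mu                        # (N, 1)
    kxx = np.ones(len(X))                  # diagonal of Kxx for a unit-variance kernel
    var = kxx - np.sum(A * Kxz, axis=1) + np.sum((A @ S) * A, axis=1)
    return mean, var[:, None]

Z = np.linspace(0.0, 1.0, 5)[:, None]
X = np.linspace(0.0, 1.0, 20)[:, None]

# Sanity check: if q(u) equals the prior, the marginals reduce to the prior,
# i.e. zero mean and unit variance.
Kzz = rbf(Z, Z) + 1e-6 * np.eye(5)
mean, var = sparse_marginals(X, Z, np.zeros((5, 1)), np.linalg.cholesky(Kzz))
print(mean.shape, var.shape)
```

Only the diagonal of the covariance is computed, since the variational expectations factorize over data points.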

Then the means and variances of the two variational distributions are concatenated column-wise, so that fmean and fvar each have one column per latent function.

Given the means and variances needed to compute the $N$ variational expectations, we can call the function "variational_expectations" defined in our own likelihood class. If I am not wrong, this gives us the $N$ expectations.
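
The implementation of ModLik is not shown in this notebook, so the following sketch only illustrates what a variational expectation is, using a stand-in Gaussian likelihood with a single latent function (an assumption made purely for illustration). It computes $\mathbb{E}_{q(f_n)}[\log p(y_n|f_n)]$ for each data point by Gauss-Hermite quadrature and compares it with the known closed form.

```python
import numpy as np

def log_gaussian(y, f, noise_var):
    """Log density of the stand-in likelihood N(y | f, noise_var)."""
    return -0.5 * np.log(2.0 * np.pi * noise_var) - 0.5 * (y - f) ** 2 / noise_var

def expectations_quadrature(m, v, y, noise_var, num_points=20):
    """E_{N(f | m, v)}[log p(y | f)] per data point, by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(num_points)
    # Change of variables f = m + sqrt(2 v) x maps N(m, v) onto the weight exp(-x^2).
    f = m[:, None] + np.sqrt(2.0 * v[:, None]) * nodes[None, :]
    return log_gaussian(y[:, None], f, noise_var) @ weights / np.sqrt(np.pi)

def expectations_closed_form(m, v, y, noise_var):
    """Analytic value of the same expectation for a Gaussian likelihood."""
    return -0.5 * np.log(2.0 * np.pi * noise_var) - 0.5 * ((y - m) ** 2 + v) / noise_var

rng = np.random.default_rng(0)
N = 6
m, v, y = rng.normal(size=N), rng.uniform(0.1, 1.0, size=N), rng.normal(size=N)

var_exp_quad = expectations_quadrature(m, v, y, noise_var=0.5)
var_exp_exact = expectations_closed_form(m, v, y, noise_var=0.5)
print(var_exp_quad.shape)
```

Both functions return one value per data point, which is the shape that is later summed to form the bound.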

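If a minibatch is used, the sum has to be rescaled: scale is the ratio between the total number of observations, num_data, and the number of rows in the current batch, tf.shape(self.X)[0]. (Those lines of code still need to be analyzed properly.)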
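
The effect of this rescaling can be illustrated with NumPy. In the sketch below the per-point terms are random stand-ins for the variational expectations, and the data set size and batch size are arbitrary choices; none of these values come from the module.

```python
import numpy as np

rng = np.random.default_rng(0)
num_data = 1000
var_exp = rng.normal(size=num_data)   # stand-in for the N per-point expectations
full_sum = var_exp.sum()

batch_size = 100
scale = num_data / batch_size         # same ratio as in build_likelihood

# Split the data into disjoint minibatches and rescale each partial sum.
batches = var_exp.reshape(-1, batch_size)
estimates = scale * batches.sum(axis=1)

# Each scaled partial sum estimates the full sum; their average recovers it.
print(estimates.mean(), full_sum)
```

With the full data set as a single batch the ratio is 1 and the sum is left unchanged.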

Finally, the scaled sum of the $N$ variational expectations minus the KL divergence is returned. This is the value of the Evidence Lower Bound (ELBO)!
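
Written out, and assuming the factorized variational posterior that the code suggests (independent $q(\textbf{u}_f)$ and $q(\textbf{u}_g)$), the returned value corresponds to the expression below. The symbols $\mathcal{B}$ for the set of indices in the current minibatch and $B$ for its size are introduced here for illustration; with the full data set, $B = N$.

$$
\mathcal{L} = \frac{N}{B} \sum_{n \in \mathcal{B}} \mathbb{E}_{q(f_n)\,q(g_n)}\left[\log p(y_n \mid f_n, g_n)\right] - \text{KL}\left[q(\textbf{u}_f) \,\|\, p(\textbf{u}_f)\right] - \text{KL}\left[q(\textbf{u}_g) \,\|\, p(\textbf{u}_g)\right]
$$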

Now we describe the last section for making predictions over the latent functions.

In [None]:
    @GPflow.param.AutoFlow((tf.float64, [None, None]))
    def predict_f(self, Xnew):
        return GPflow.conditionals.conditional(Xnew, self.Z, self.kern1, self.q_mu1,
                                               q_sqrt=self.q_sqrt1, 
                                               full_cov=False, whiten=False)
    
    @GPflow.param.AutoFlow((tf.float64, [None, None]))
    def predict_g(self, Xnew):
        return GPflow.conditionals.conditional(Xnew, self.Z, self.kern2, self.q_mu2,
                                               q_sqrt=self.q_sqrt2, 
                                               full_cov=False, whiten=False)

Here we make use of decorators (@), a convenient way to alter functions and methods in Python.
We define the functions that predict each latent function at new inputs by taking the conditional given the corresponding variational posterior.
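
As a reminder of how a decorator alters a method, here is a small self-contained sketch. It does not reproduce AutoFlow, which additionally builds and runs the TensorFlow graph; the decorator `as_float64_matrix` and the class `ToyModel` are hypothetical names used only to illustrate wrapping a method so that its input is converted before the original body runs.

```python
import functools
import numpy as np

def as_float64_matrix(method):
    """Hypothetical decorator: convert the argument to a 2-D float64 array."""
    @functools.wraps(method)
    def wrapper(self, Xnew):
        Xnew = np.atleast_2d(np.asarray(Xnew, dtype=np.float64))
        return method(self, Xnew)
    return wrapper

class ToyModel:
    @as_float64_matrix
    def predict(self, Xnew):
        # The body can rely on Xnew being a float64 matrix.
        return Xnew.sum(axis=1, keepdims=True)

out = ToyModel().predict([[1, 2], [3, 4]])
print(out.dtype, out.shape)
```

The caller passes a plain nested list, and the decorated method receives a float64 matrix, in the same spirit as the (tf.float64, [None, None]) argument specification used above.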