Extended and Unscented Gaussian Processes
Author: Daniel Steinberg
Date: 31st Oct 2014
License: LGPLv3, see COPYING and COPYING.LESSER for more information
This code contains Python 2.7 and 3.x implementations of the Extended and Unscented Gaussian Processes (E/UGPs) for inversion problems, as described in [1]. These are similar to regular GPs, but the latent function, f, can optionally have an extra nonlinear relationship to the observations, y, in the likelihood:
- Normal GP likelihood: y ~ N(f, σ²I), or for a single observation n, y_n = f_n + ε.
- E/UGP likelihood: y ~ N(g(f), σ²I), or for a single observation n, y_n = g(f_n) + ε.
Here g(·) is an arbitrary scalar function (mapping R to R). The posterior Gaussian process parameters are learned using a variational objective with Gauss-Newton style linearisation (of g) and mean finding.
The extended GP uses a first order Taylor series expansion of g about the posterior mean of the latent function (f) to obtain an approximate posterior within the Gauss-Newton iterations. The unscented GP uses statistical linearisation, specifically the unscented transform, to linearise g about the posterior mean in the Gauss-Newton iterations. The EGP requires the derivatives dg/df_n for the Taylor expansion; the UGP does not, and so can be used on nonlinear functions that do not have derivatives, e.g. signum functions.
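To make the two linearisations concrete, here is a minimal sketch (not the package's internal code; the function names, the kappa parameter, and the scalar-only treatment are illustrative assumptions) of how g can be replaced by a line a*f + b about a posterior mean m with variance v:

```python
import numpy as np

def taylor_linearise(g, dg, m):
    """EGP-style first-order Taylor expansion of g about f = m (needs dg/df)."""
    a = dg(m)             # slope from the derivative
    b = g(m) - a * m      # intercept so the line passes through (m, g(m))
    return a, b

def unscented_linearise(g, m, v, kappa=0.5):
    """UGP-style statistical linearisation with the unscented transform (derivative free)."""
    s = np.sqrt((1.0 + kappa) * v)                   # sigma-point spread for a 1D Gaussian
    F = np.array([m, m + s, m - s])                  # sigma points
    w = np.array([kappa, 0.5, 0.5]) / (1.0 + kappa)  # their weights (sum to 1)
    Y = g(F)
    Ef, Ey = w.dot(F), w.dot(Y)                      # moments of f and g(f)
    a = w.dot((F - Ef) * (Y - Ey)) / v               # slope = cov(f, g(f)) / var(f)
    b = Ey - a * m
    return a, b

# e.g. both produce a line approximating g(f) = tanh(f) about m = 0.3
a_t, b_t = taylor_linearise(np.tanh, lambda f: 1 - np.tanh(f)**2, m=0.3)
a_u, b_u = unscented_linearise(np.tanh, m=0.3, v=0.5)
```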
NLopt gradient-free methods (BOBYQA) are used with the free energy lower bound to optimise the kernel hyperparameters and σ.
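For reference, here is a rough sketch of how a gradient-free NLopt optimiser with lower bounds can be driven from Python. This is not the package's code; the free_energy function below is only a stand-in placeholder, since the real bound is computed internally from the posterior parameters:

```python
import numpy as np
import nlopt

def free_energy(hypers, grad):
    # grad is ignored: BOBYQA is gradient free. hypers = (k_sigma, k_length, ynoise).
    # Placeholder objective standing in for the variational free energy bound.
    k_sigma, k_length, ynoise = hypers
    return -((k_sigma - 0.8)**2 + (k_length - 0.6)**2 + (ynoise - 0.2)**2)

opt = nlopt.opt(nlopt.LN_BOBYQA, 3)       # three hyperparameters
opt.set_lower_bounds([1e-1, 1e-1, 1e-2])  # cf. the learnLB() bounds used below
opt.set_max_objective(free_energy)        # maximise the (placeholder) lower bound
opt.set_ftol_rel(1e-6)
best_hypers = opt.optimize(np.array([1.0, 1.0, 1.0]))
```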
Also included in this repository are:
- A regular Gaussian process (as a working base class for the nonlinear methods).
- The experimental code from [1] for both experiments:
- The "toy" inversion problems
- The USPS handwritten digits binary classification experiment
You can obtain [1] from my homepage.
Dependencies
For the linearized GPs:
- Python 2.7 or 3.x (tested with 3.3)
- numpy
- scipy
- NLopt (with its Python bindings)
For the tests:
For the classification experiment:
- Octave (or Matlab for running the GPML classifiers)
- GPML toolbox (Matlab/Octave toolbox for the competing GP classification methods)
All of these should be available as packages on most major GNU/Linux distributions (I am using Fedora), apart from the GPML toolbox, and should also be easily obtainable through the various scientific Python distributions for Windows and OS X (e.g. Anaconda, Enthought Canopy, PythonXY, etc.).
NOTE: Make sure you use the correct libraries for python 2.7 or 3.x!
A setup.py script has been provided which can be used with distutils, easy_install and pip. It only checks for the scipy and numpy dependencies, so you will need to install the NLopt dependency before running this script. This is how you can download and install the code from github using pip,
```
git clone https://github.inside.nicta.com.au/SIEF/linearizedGP
cd linearizedGP
sudo pip install .
```
If you don't have pip, you can replace the last line with,
```
sudo python setup.py install
```
where you should substitute the correct python/pip commands for the version of Python you are using (like python3, depending on your OS).
If you don't want to run the experiments or tests, you do not need to keep the code downloaded from GitHub now.
Once installed, this package can be imported as,
```python
# Import this package
from linearizedGP import extendedGP
from linearizedGP import unscentedGP
from linearizedGP import kernels

# Other imports for this example
import numpy as np
import scipy.linalg as la
```
Now let's make some example data
```python
# Some settings
N = 200          # number of training points
sigma = 0.2      # observation (Y) noise
k_sigma = 0.8    # kernel amplitude
k_length = 0.6   # kernel length scale

# A simple nonlinear function and its derivative
nlfunc = lambda f: f**3 + f**2 + f
dnlfunc = lambda f: 3 * f**2 + 2 * f + 1

# Make a random draw from a GP
X = np.linspace(0, 10, N)                        # input observations
K = kernels.kern_m52(X[:, np.newaxis],           # Matern 5/2 covariance matrix
                     X[:, np.newaxis], k_sigma, k_length)
L = np.linalg.cholesky(K)                        # Cholesky factor of the covariance
F = np.random.randn(N).dot(L.T)                  # the "latent" function
Y = nlfunc(F) + np.random.randn(N) * sigma       # target observations
```
Now that we have the data, we want to instantiate the GPs and also put bounds on the kernel hyperparameters and likelihood noise,
```python
# Extended GP
egp = extendedGP.extendedGP(nlfunc, dnlfunc, kernels.kern_m52)
egp.learnLB((1e-1, 1e-1), ynoise=1e-2)  # establish lower bounds for NLopt

# Unscented GP
ugp = unscentedGP.unscentedGP(nlfunc, kernels.kern_m52)  # no derivative required
ugp.learnLB((1e-1, 1e-1), ynoise=1e-2)  # establish lower bounds for NLopt
```
Given the data and the bounds on the hyperparameters we can now learn the values for the posterior parameters and hyperparameters (as long as we set some initial values for the hyperparameters),
```python
# Extended GP
egp_objective = egp.learn(X, Y, kparams=(1.0, 1.0), ynoise=1.0, verbose=True)

# Unscented GP
ugp_objective = ugp.learn(X, Y, kparams=(1.0, 1.0), ynoise=1.0, verbose=True)
```
In both cases the objective function is the variational lower bound on the log-marginal likelihood (free energy). The learned parameters are now stored in the GP objects, e.g.,
```python
print(egp.m)        # posterior mean of the latent function
print(egp.C)        # posterior covariance
print(egp.kparams)  # learned kernel hyperparameters
print(egp.ynoise)   # learned observation noise
```
We can use these trained GPs to predict new targets, Y*, and latent functions F* given new test inputs, X*.
```python
# Test data
Ns = 1000
Xs = np.linspace(0, 10, Ns)

# EGP prediction
EYs_e, VYs_e, EFs_e, VFs_e = egp.predict(Xs)  # this uses quadrature

# UGP prediction
EYs_u, VYs_u, EFs_u, VFs_u = ugp.predict(Xs)  # this uses the unscented transform
```
where the expected values of Y* and the latent function F* are EYs_ and EFs_ respectively, and the variances of these predictions are given by VYs_ and VFs_ respectively. Here is an example of the unscented GP applied to this polynomial nonlinear transformation:
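The repository's figure for this example is not reproduced here, but a similar plot of the predictions can be made with something like the following (matplotlib is assumed to be installed; it is not required by the package itself):

```python
import matplotlib.pyplot as plt

# Plot the UGP's predicted targets with a 2-standard-deviation band
plt.plot(X, Y, 'k.', label='noisy observations')
plt.plot(Xs, EYs_u, 'b-', label='E[Y*] (UGP)')
plt.fill_between(Xs, EYs_u - 2 * np.sqrt(VYs_u), EYs_u + 2 * np.sqrt(VYs_u),
                 alpha=0.3, label='2 std. dev.')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()
```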
To run the experiments from [1], firstly make sure you have the dependencies installed for the relevant experiments. I prefer running Python code from an IPython console, and the following instructions will assume this is the case.
Note that since we released the paper we have changed the code slightly, so you may not get the exact results we got in the paper. If you wish to get the exact same results, make sure the initial guess at the posterior mean (m) is all zeros instead of being randomly initialised (change lines 166 and 219 of the EGP and UGP code respectively). We made this change because initialising m as all zeros led to some pathological errors for some forward models (like functions that have zero gradient at zero).
Toy inversion experiments
Firstly, generate the data using the experiments/datagen.py script. This will generate random data for all of the forward models in the paper and save them as .mat files in the data directory (thanks to socallaghan for automating this). These files can also be opened with Matlab and Octave.
Now to run the experiments, point the dataset variable in experiments/toytest.py to the data you just generated (e.g. "data/expdata.mat"), change the value of the dolinear variable as you see fit (the dolinear variable toggles an extra experiment for running a normal GP), then from the IPython console run,
```
In [1]: run experiments/toytest.py
```
This should then display the results of 5-fold cross validation of the EGP and UGP (and a linear GP if toggled) on predicting the generated data, as in [1].
USPS handwritten digits classification experiment
Here are the steps to run this experiment:
1. Download and extract the USPS handwritten digit data from here. This includes the data as a .mat file and a script to subset the digits.

2. I've written a helper script, experiments/makeUSPS35data.m, that generates the subset used for the classification experiment in [1]. Copy this script to where you extracted the dataset, and run it from Octave or Matlab (from the same directory). This creates the subset data file used in the next step.

3. Make sure the data path in experiments/uspsbclass.py points to the file you generated in step 2. Then, from the linearizedGP code directory, simply run,

    ```
    In [1]: run experiments/uspsbclass.py
    ```

4. To run the Octave/Matlab scripts for the GPML algorithms, run the experiments/uspsbclass.m script, making sure the datapath variables point to the relevant places.
References

[1] D. M. Steinberg, E. V. Bonilla. Extended and Unscented Gaussian Processes, Advances in Neural Information Processing Systems (NIPS), Montreal, December 2014. [PDF]
This research was supported by the Science Industry Endowment Fund (RP 04-174) Big Data Knowledge Discovery project.