AvGPR

AvGPR is a package that calculates a weighted average Gaussian Process regression model over 5 implementations from packages in both R and Python. It uses cross-validation to optimise the weights, giving the most influence to the regression models that best capture the data.

Installation

You can install the development version of AvGPR from GitHub with:

# install.packages("devtools")
devtools::install_github("benjmuller/AvGPR")

This package requires the Python modules ‘numpy’, ‘GPy’, ‘sklearn’, and ‘warnings’ in the Python virtual environment. These modules can be installed into the current reticulate Python virtual environment with:

install_python_packages()

Alternatively, the modules can be installed into a new Python environment with:

create_python_environment()

Example

This is a basic example of a simple GPR using the AvGPR function:

library(AvGPR)
# create_python_environment()
set.seed(1)
n <- 15
X <- runif(n, min = 0, max = 4 * pi)
Y <- 2 * sin(X)
XX <- seq(0, 4 * pi, length.out=100)
AvGPR(data.frame(X=X), data.frame(Y=Y), data.frame(X=XX))

Gaussian Process Tutorial

Introduction

This document is an introduction to Gaussian Process regression (GPR). It discusses how GPR works, gives an intuitive way to think about it, and shows some code for real-world examples.

Background

A Gaussian process is a stochastic process used to emulate an event whose outcomes are normally distributed random variables indexed over time or space; the process is defined by the joint distribution of these normal random variables. Gaussian processes can be used as a non-linear multivariate interpolation tool, a method known as Gaussian Process regression or Kriging. Using this method, we can write multi-output prediction problems in the following parametric form:

\hat{y}(x)=m(x)+Z(x),

where \hat{y} is the simulated output, m is the mean function and Z is a zero-mean Gaussian Process. The mean function is a “best fit” parametric function of the independent variables. The zero-mean Gaussian Process is specified by a covariance function. There are many choices of covariance function, each better suited to different types of events being emulated. A list of some standard functions can be found here (https://www.cs.toronto.edu/~duvenaud/cookbook/).

Idea behind Gaussian Processes

A way to understand Gaussian processes, in particular how the parametric form arises from the non-parametric one, is by considering a bivariate normal:

A contour plot of a bivariate normal with marginal means 0, marginal variances 2 and covariance 1.


The two normal random variables that make up the bivariate normal have covariance 1, so conditioning on an observation (data point) of one variable gives information about what values the other can take. Consider the observation Y_0=0 and plot some random samples from the conditional distribution Y_1|Y_0=0:


Consider expanding this notion to a 10-variate normal distribution where we fix Y_0=0 with Y_i \sim N(0,2) with correlations that are specified by the squared exponential covariance function (SE):


Now suppose we condition the observations Y_0=0, Y_6=0.3 and Y_9=-0.4 and plot samples of possible values:

LHS: Random samples. RHS: Random samples overlaid onto one set of axes.


The families of curves that join the points are subsets of the family of solutions compatible with the imposed values. If we take a large number of these sample curves, they appear to trace out regions of confidence, and at this stage the problem becomes one of nonlinear regression. So, by setting the mean and the covariance to be parametric functions that generalise the covariance structure of the discrete multivariate case, we obtain the construction of Gaussian Process regression.
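
To make the conditioning step concrete, here is a minimal sketch (not part of the package; the grid, kernel settings and number of samples are illustrative) that builds a 10-variate normal with marginal variance 2 and SE correlations, conditions on Y_0 = 0, Y_6 = 0.3 and Y_9 = -0.4, and draws a few sample curves from the conditional distribution:

# Build a 10-variate normal with marginal variance 2 and SE correlations
set.seed(1)
x <- 0:9                                   # indices of Y_0, ..., Y_9
se_cov <- function(a, b, sigma2 = 2, l = 1) {
  sigma2 * exp(-outer(a, b, "-")^2 / (2 * l^2))
}

obs.idx <- c(1, 7, 10)                     # positions of Y_0, Y_6, Y_9
obs.val <- c(0, 0.3, -0.4)                 # conditioned values

K <- se_cov(x, x)
K.oo <- K[obs.idx, obs.idx]                # observed block
K.uo <- K[-obs.idx, obs.idx]               # unobserved vs observed block
K.uu <- K[-obs.idx, -obs.idx]              # unobserved block

mu.cond  <- K.uo %*% solve(K.oo, obs.val)            # conditional mean
cov.cond <- K.uu - K.uo %*% solve(K.oo, t(K.uo))     # conditional covariance

# Draw 5 sample curves from the conditional distribution and plot them
L <- chol(cov.cond + 1e-8 * diag(nrow(cov.cond)))
samples <- mu.cond %*% t(rep(1, 5)) + t(L) %*% matrix(rnorm(7 * 5), 7, 5)
matplot(x[-obs.idx], samples, type = "l", lty = 1, xlab = "index", ylab = "Y")
points(x[obs.idx], obs.val, pch = 19)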

1-Dimensional Example

In this section we will go through a one dimensional example of Gaussian process regression. The data is created from a random sample of 8 points on the sine function in the domain [-2 \pi, 2 \pi]. The randomly generated values that will be used in this example are:

 -4.572  \quad -1.788 \quad-0.761 \quad 0.095 \quad 1.727 \quad 3.585 \quad 4.181 \quad 5.454

The covariance function that this example will be using is the squared exponential function (SE). This has the form

k(x_i, x_j)=\sigma ^2 \exp \left(\frac{-1}{2l^2}\sum^d_{k=1}(x_{i,k}-x_{j,k})^2 \right)


Using the samples and the chosen covariance function, this gives the covariance matrix:

library(plgp)

# Squared exponential (SE) covariance function; distance() from plgp returns
# the matrix of squared Euclidean distances between the rows of x1 and x2
kernel <- function(x1, x2, sigma = 1, l = 1) {
  matrix.dist <- distance(x1, x2)
  kern <- (sigma ^ 2) * exp(- matrix.dist / (2 * l ^ 2))
  return(kern)
}
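
For instance, a small usage sketch (assuming sigma = 1 and l = 1, which reproduces the matrix shown below) computes the covariance matrix from the eight sampled points:

# The eight sampled x values from above, stored as a one-column matrix
x.train <- matrix(c(-4.572, -1.788, -0.761, 0.095, 1.727, 3.585, 4.181, 5.454),
                  ncol = 1)
K <- kernel(x.train, x.train)   # 8 x 8 covariance matrix with sigma = 1, l = 1
round(K, 4)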

 K= \begin{pmatrix}
 1.0000 & 0.0208 & 0.0007 & 1.8635 \times 10^{-5} & 2.4219\times 10^{-9} & 3.5625\times 10^{-15} & 2.3081\times 10^{-17} & 1.4867\times 10^{-22} \\
 0.0208 & 1.0000 & 0.5902 & 0.1699 & 0.0021 & 5.3845\times 10^{-7} & 1.8335\times 10^{-8} & 4.0867\times 10^{-12} \\
 0.0007 & 0.5902 & 1.0000 & 0.6933 & 0.0453 & 7.9174\times 10^{-5} & 4.9720\times 10^{-6} & 4.0966\times 10^{-9} \\
 1.8635\times 10^{-5} & 0.1699 & 0.6933 & 1.0000 & 0.2640 & 0.0023 & 0.0002 & 5.8046\times 10^{-7} \\
 2.4219\times 10^{-9} & 0.0021 & 0.0453 & 0.2640 & 1.0000 & 0.1780 & 0.0492 & 0.0010 \\
 3.5625\times 10^{-15} & 5.3845\times 10^{-7} & 7.9174\times 10^{-5} & 0.0023 & 0.1780 & 1.0000 & 0.8373 & 0.1744 \\
 2.3081\times 10^{-17} & 1.8335\times 10^{-8} & 4.9720\times 10^{-6} & 0.0002 & 0.0492 & 0.8373 & 1.0000 & 0.4447 \\
 1.4867\times 10^{-22} & 4.0867\times 10^{-12} & 4.0966\times 10^{-9} & 5.8046\times 10^{-7} & 0.0010 & 0.1744 & 0.4447 & 1.0000
 \end{pmatrix}


Here we will plot a coloured gradient to give a visual representation of the correlations between each of the random variables. This is beneficial when the matrix is large, as there are too many values to reliably interpret the behaviour by eye.
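
A minimal sketch of such a plot, using base graphics and the matrix K computed above (the original figure may have been produced differently):

# Heat map of the 8 x 8 covariance matrix; K is symmetric, so the
# orientation of the axes does not matter
image(1:8, 1:8, K, col = hcl.colors(50, "Blues 3", rev = TRUE),
      xlab = "index i", ylab = "index j", main = "SE covariance matrix")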

Using the covariance function to construct covariance matrices from combinations of the observed data and data we want to predict, we can calculate the posterior distribution. The posterior mean vector and covariance matrix are given (respectively) by:

 \mu_* = K^T_*K^{-1}y \\ \Sigma_* = K_{**} - K^T_*K^{-1}K_*

where y is the vector of outputs from the observed data, K is the covariance matrix of the observed data, K_* is the covariance matrix between the observed data and the data we want to predict, and K_{**} is the covariance matrix of the data we want to predict.


# Posterior mean and covariance of the GP at the prediction inputs xvals.s,
# given training inputs x.train and outputs y.train; sigma.y adds a small
# jitter to the diagonal for numerical stability
posterior <- function(xvals.s, x.train, y.train, sigma = 1, l = 1, sigma.y = 1e-8) {
  n <- nrow(x.train); m <- nrow(xvals.s)
  k <- kernel(x.train, x.train, sigma = sigma, l = l) + sigma.y ^ 2 * diag(n)
  k.s <- kernel(x.train, xvals.s, sigma = sigma, l = l)
  k.ss <- kernel(xvals.s, xvals.s, sigma = sigma, l = l) + 1e-8 ^ 2 * diag(m)
  k.inv <- solve(k)
  y.train <- data.matrix(y.train)
  colnames(y.train) <- NULL
  mu <- t(k.s) %*% k.inv %*% y.train       # posterior mean: K_*^T K^{-1} y
  cov <- k.ss - t(k.s) %*% k.inv %*% k.s   # posterior covariance: K_** - K_*^T K^{-1} K_*

  return(list(mu, cov))
}

Using the predictive distribution, we can plot the Gaussian process by choosing the data we want to predict to be a large sample of points of the domain:
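
As a sketch, reusing the x.train matrix from above and taking y.train = sin(x.train) (the training outputs come from the sine function), the posterior mean and an approximate 95% band can be plotted with:

# Predict over a dense grid on [-2*pi, 2*pi] and plot mean +/- 2 standard deviations
y.train <- sin(x.train)
xx <- matrix(seq(-2 * pi, 2 * pi, length.out = 200), ncol = 1)
post <- posterior(xx, x.train, y.train, sigma = 1, l = 1)
mu <- post[[1]]
sd.post <- sqrt(pmax(diag(post[[2]]), 0))   # guard against tiny negative variances

plot(xx, mu, type = "l", ylim = range(mu - 2 * sd.post, mu + 2 * sd.post, y.train),
     xlab = "x", ylab = "y", main = "GP posterior")
lines(xx, mu + 2 * sd.post, lty = 2)
lines(xx, mu - 2 * sd.post, lty = 2)
points(x.train, y.train, pch = 19)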

2-Dimensional Example

In this example we will be using the Branin function, which has the form: a(x_2 - bx_1^2+cx_1-r)^2+s(1-t)\cos(x_1)+s. We will be using the parameters a=1, b=\frac{5.1}{4 \pi ^ 2}, c=\frac{5}{\pi}, r=6, s=10 and t=\frac{1}{8\pi}. The ranges of values x_1 and x_2 can take are x_1 \in [-5,10] and x_2 \in [0,15]. This example will use 40 random samples as training data.
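
A sketch of the setup (the exact random sample behind the original plots is not given, so the seed and sampling here are illustrative):

# Branin function with the parameter values stated above
branin <- function(x1, x2) {
  a <- 1; b <- 5.1 / (4 * pi^2); c <- 5 / pi
  r <- 6; s <- 10; t <- 1 / (8 * pi)
  a * (x2 - b * x1^2 + c * x1 - r)^2 + s * (1 - t) * cos(x1) + s
}

# 40 random training points over x1 in [-5, 10] and x2 in [0, 15]
set.seed(1)
x.train <- cbind(x1 = runif(40, -5, 10), x2 = runif(40, 0, 15))
y.train <- matrix(branin(x.train[, "x1"], x.train[, "x2"]), ncol = 1)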

Since the covariance matrix has dimensions 40x40, it is difficult to understand the behavior by looking at the correlation coefficients. So we will plot a coloured gradient for the correlations. We will be using the same SE function to calculate the covariance coefficients. Plotting the covariance gradient:

Using the same matrix multiplication as for the one-dimensional case, we can calculate the predictive distribution for this Gaussian process. Using this we can plot the GP:

library(rgl)

# Plot the GP posterior mean surface over a grid of (x1, x2) values
plotgp_2D <- function(x1, x2, x.train, y.train, sigma = 1, l = 1,
                        xlimit = c(), ylimit = c(), zlimit = c()) {
  xvals <- expand.grid(x1, x2)                        # prediction grid
  mu <- posterior(xvals, x.train, y.train, sigma = sigma, l = l)[[1]]
  mu <- matrix(mu, nrow = length(x1))                 # reshape to a surface over the grid

  # Map the posterior mean values onto a colour gradient
  zlim <- range(mu)
  zlen <- zlim[2] - zlim[1] + 1
  colorlut <- hcl.colors(zlen, palette = "Blues")
  col <- colorlut[ mu - zlim[1] + 1 ]
  persp3d(x = x1, y = x2, z = mu, color = col, zlab = "y(x1, x2)")
}

x1 <- seq(from = -5, to = 10, by = 0.2)
x2 <- seq(from = 0, to = 15, by = 0.2)
plotgp_2D(x1, x2, x.train, y.train, sigma = 1, l = 2)

Higher dimensional Examples

In this section we have some example functions for higher dimensional Gaussian Process regression.

Borehole function (8-dimensional)

The borehole function models water flowing through a borehole. The equation has the form:

 f(x)=\frac{2\pi T_u(H_u-H_l)}{\ln\left(\frac{r}{r_w}\right)\left(1+\frac{2LT_u}{\ln\left(\frac{r}{r_w}\right)r_w^2K_w} + \frac{T_u}{T_l} \right)} .
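
A direct R translation of this formula might look like the following sketch (the function and its argument names mirror the symbols above and are not part of the package):

# Borehole water-flow function; arguments follow the symbols in the formula
borehole <- function(rw, r, Tu, Hu, Tl, Hl, L, Kw) {
  numer <- 2 * pi * Tu * (Hu - Hl)
  denom <- log(r / rw) * (1 + (2 * L * Tu) / (log(r / rw) * rw^2 * Kw) + Tu / Tl)
  numer / denom
}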

OTL Circuit Function (6-dimensional)

The OTL Circuit function models an output transformerless push-pull circuit. The equation has the form:

 V_m(x)=\frac{(V_{b1} + 0.74)\beta (R_{c2}+9)}{\beta(R_{c2}+9)+R_f} + \frac{11.35R_f}{\beta (R_{c2}+9)+R_f} + \frac{0.74R_f\beta(R_{c2}+9)}{(\beta(R_{c2}+9)+R_f)R_{c1}}, \\ \text{ where } V_{b1}=\frac{12R_{b2}}{R_{b1}+R_{b2}}.

Piston Simulation Function (7-dimensional)

The piston simulation function models the circular motion of a piston within a cylinder. The equation has the form:

 C(x) = 2 \pi \sqrt{\frac{M}{k+S^2\frac{P_0V_0}{T_0}\frac{T_a}{V^2}}},\\ \text{ where } V = \frac{S}{2k}\left( \sqrt{A^2 + 4k \frac{P_0V_0}{T_0}T_a} - A  \right), \\ A=P_0S + 19.62M - \frac{kV_0}{S}.

Robot Arm Function (8-dimensional)

The robot arm function models the position of a 4 segment robotic arm. The equation has the form:

 f(x) = (u^2 + v^2)^{0.5}, \\ \text{ where } u= \sum^4_{i=1}L_i\cos\left(\sum^i_{j=1} \theta_j\right), \\ v= \sum^4_{i=1}L_i\sin\left(\sum^i_{j=1} \theta_j\right).

AvGPR

AvGPR (Average Gaussian Process Regression) is an R package that uses weighted model averaging to calculate an average Gaussian Process regression model. It fits regression models with 5 different GPR packages and then, through cross validation, calculates a “goodness of fit” statistic that is used to determine the weight of each model. This is an advantageous method of regression as it gains the benefits associated with model averaging. The measure-of-fit statistic used in this package is:

 \sum^n_{i=1}[(\sigma_i - \epsilon_i) ^ 2 + \epsilon_i^2],

where i indexes the predicted points, \sigma_i is the predictive variance at point i, and \epsilon_i is the residual at point i. Through analysis and experimentation, this statistic has been shown to provide a better representation of the fit than RSS and normalised RSS.
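
As a sketch, the statistic for a single candidate model could be computed as below; fit_statistic, pred.mean and pred.var are illustrative names rather than functions or arguments exported by the package:

# Measure-of-fit statistic: sum over held-out points of (sigma_i - eps_i)^2 + eps_i^2,
# where sigma_i is the predictive variance and eps_i the residual
fit_statistic <- function(y.true, pred.mean, pred.var) {
  eps <- y.true - pred.mean
  sum((pred.var - eps)^2 + eps^2)
}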

Here is an example plot from an AvGPR regression:
