# Gaussian processes – Kernel functions (R version)
GMM, INSA Toulouse, France <br />
Andrés F. López-Lopera, ONERA-DTIS <br />
May 2021
<br />
___

For this lab session, you are free to use the language of your choice (e.g. R or Python). This notebook proposes R implementations.

In [None]:
library(plot3D) # library used for 2D plots
rm(list=ls()) # clean all the variables from the environment

## Covariance functions

We recall some usual covariance functions $k: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, with variance parameter $\sigma^2$ and length-scale $\theta$:
- Squared Exponential (SE):
$$ k(x,y) = \sigma^2 \exp\left( - \frac{(x-y)^2}{2 \theta^2} \right)$$

- Matérn 5/2:
$$ k(x,y) = \sigma^2 \left(1+\frac{\sqrt{5} |x-y|}{\theta}+\frac{5 |x-y|^2}{3 \theta^2}\right)  \exp\left( - \frac{\sqrt{5}|x-y|}{\theta} \right) $$ 

- Matérn 3/2:
$$ k(x,y) = \sigma^2 \left(1+\frac{\sqrt{3} |x-y|}{\theta}\right)  \exp\left( - \frac{\sqrt{3}|x-y|}{\theta} \right) $$ 

- Exponential:
$$ k(x,y) = \sigma^2 \exp\left( - \frac{|x-y|}{\theta} \right)  $$ 

- Brownian:
$$ k(x,y) = \sigma^2 \min(x, y) $$ 

- White noise:
$$ k(x,y) = \sigma^2 \delta_{x,y} $$ 

- Constant:
$$ k(x,y) = \sigma^2 $$ 

- Linear:
$$ k(x,y) = \sigma^2 x y $$ 

- Cosine:
$$ k(x,y) = \sigma^2 \cos\left(\frac{x-y}{\theta}\right) $$ 

- Sinc:
$$ k(x,y) = \sigma^2 \frac{\theta}{x-y} \sin\left(\frac{x-y}{\theta}\right) $$ 

**Question 1.** For at least three kernels of your choice, write a function that takes as input the vectors ``x``, ``y`` and ``param`` and returns the matrix with general term $k(x_i, y_j)$.

In [None]:
SEKernel <- function(x, y, param){
    # input:
    #  x,y: input vectors
    #  param: parameters (sigma2,theta)
    # output:
    #  covariance matrix cov(x,y)
    sigma2 <- param[1]; theta <- param[2]
    dist <- outer(x/theta, y/theta, '-')
    kern = ## to be filled (covariance matrix)
    return(kern)
}

## to be filled (to define another 3 kernel functions) 
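
As a hedged illustration of the expected signature, here is one possible way to write three of the kernels listed above (SE, Matérn 5/2, exponential). The names ``Mat52Kernel`` and ``ExpKernel`` are assumptions chosen to fit the ``<name>Kernel`` pattern; this is a sketch, not the reference solution.

```r
# Sketch only: param = c(sigma2, theta), as in the SEKernel template.
SEKernel <- function(x, y, param){
    sigma2 <- param[1]; theta <- param[2]
    dist <- outer(x/theta, y/theta, '-')      # scaled differences (x_i - y_j)/theta
    sigma2 * exp(-0.5 * dist^2)
}

Mat52Kernel <- function(x, y, param){         # hypothetical name
    sigma2 <- param[1]; theta <- param[2]
    d <- abs(outer(x/theta, y/theta, '-'))    # scaled distances |x_i - y_j|/theta
    sigma2 * (1 + sqrt(5)*d + 5*d^2/3) * exp(-sqrt(5)*d)
}

ExpKernel <- function(x, y, param){           # hypothetical name
    sigma2 <- param[1]; theta <- param[2]
    d <- abs(outer(x/theta, y/theta, '-'))
    sigma2 * exp(-d)
}

# Small check: a 5 x 5 matrix with sigma2 on the diagonal
x <- seq(0, 1, length = 5)
K <- SEKernel(x, x, c(2, 0.3))
```

Each function returns a ``length(x)`` by ``length(y)`` matrix, so the same code serves for square covariance matrices and for rectangular cross-covariances.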

In [None]:
# Defining a wrapper function that takes the name of the kernel as an input parameter
# This function is called later
kernCompute <- function(x, y, kernType, param){
    # input:
    #  x,y: input vectors
    #  kernType: name of the covariance function
    #  param: parameters (e.g. sigma2, theta)
    # output:
    #  kern: covariance matrix cov(x,y) according to the type of the covariance
    kernFun <- get(paste(kernType, 'Kernel', sep = ''))
    return(kernFun(x, y, param))
}
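
The wrapper looks up a function by name with ``get``, so any kernel called ``<kernType>Kernel`` becomes available without modifying ``kernCompute``. The short sketch below illustrates this with the constant kernel from the list above; ``ConstantKernel`` is a name assumed here for the example.

```r
# Sketch: name-based dispatch with a constant kernel (hypothetical ConstantKernel).
ConstantKernel <- function(x, y, param){
    # every entry equals sigma2 = param[1]
    matrix(param[1], nrow = length(x), ncol = length(y))
}

kernCompute <- function(x, y, kernType, param){
    kernFun <- get(paste(kernType, 'Kernel', sep = ''))  # finds 'ConstantKernel'
    return(kernFun(x, y, param))
}

K <- kernCompute(c(0, 0.5, 1), c(0, 1), kernType = 'Constant', param = c(4))
```

Here ``K`` is a 3 by 2 matrix filled with the value 4. If no function with the requested name exists, ``get`` stops with an error.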

**Question 2.** For a grid of 100 points on $[0, 1]$, compute the covariance matrix associated with each kernel you wrote in **Question 1**. Simulate Gaussian samples using the function ``sample`` defined below.

In [None]:
# function for generating GP samples
jitter = 1e-10  # small number to ensure numerical stability (eigenvalues of K can decay rapidly)
sample <- function(mu, var, jitter, N){
    # Generate N samples from a multivariate Gaussian \mathcal{N}(mu, var)
    L <- ## to be filled (Cholesky decomposition)
    f_post <- ## to be filled (samples)
    return(f_post)
}    
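
A minimal sketch of the sampling step, assuming the standard construction $f = \mu + L z$ with $L L^\top = \Sigma + \text{jitter} \cdot I$ and $z \sim \mathcal{N}(0, I)$. The function is named ``sampleSketch`` (a hypothetical name) so that it does not mask anything else.

```r
# Sketch: draw N samples from N(mu, var) through a Cholesky factor.
sampleSketch <- function(mu, var, jitter, N){
    n <- length(mu)
    # chol() returns the upper-triangular R with t(R) %*% R = var + jitter*I,
    # so its transpose is the lower-triangular factor L
    L <- t(chol(var + jitter * diag(n)))
    z <- matrix(rnorm(n * N), nrow = n, ncol = N)   # standard normal draws
    # each column of the result is one sample mu + L z
    f <- matrix(as.vector(mu), nrow = n, ncol = N) + L %*% z
    return(f)
}

# Usage on a 2-dimensional Gaussian
set.seed(1)
V <- matrix(c(1, 0.5, 0.5, 2), 2, 2)
S <- sampleSketch(c(1, -1), V, 1e-10, 20000)
```

The result has one sample per column, which matches the ``matplot(x, samples, ...)`` calls used for plotting in this notebook.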

In [None]:
n <- 100 # number of input points
x <- y <- seq(0, 1, length=n) # input vectors
param <- c(1, 0.1) # parameters of the GP
nsamples <- 10 # number of GP samples

In [None]:
# samples from different types of kernels
set.seed(1)
par(mfrow = c(1,2))
options(repr.plot.width = 15, repr.plot.height = 6)

kern <- kernCompute(x, y, kernType = 'SE', param)
image2D(kern, xlab = 'x', ylab = "x'", main = 'Squared Exponential')
samples <- sample(matrix(0, nrow = n, ncol = 1), kern,  jitter, nsamples)
matplot(x, samples, type = 'l', xlab = 'x', ylab = 'Y(x)', main = "GP samples")

## to be filled (repeat the plots for other kernel functions)

**Question 3.** Change the kernel and the kernel parameters. What are the effects on the sample paths? Write down your observations.

**Question 4.**  Using the SE kernel, generate a large number of samples and extract the values of the samples at two (or three) points of the input space. Plot the associated cloud of points. What happens if the two input points are close to each other? What happens if they are far apart?

In [None]:
nsamples <- 100 # number of samples
kern <- kernCompute(x, y, kernType = 'SE', param) # covariance matrix
set.seed(1); samples <- sample(matrix(0, nrow = n, ncol = 1), kern, jitter, nsamples)

idx_points <- c(10, 15, 100)
x_points <- ## to be filled
samples_points <- ## to be filled

par(mfrow = c(1,2))
options(repr.plot.width = 15, repr.plot.height = 6)
plot(samples_points[1,], samples_points[2,], xlab = 'y1', ylab = 'y2')
plot(samples_points[1,], samples_points[3,], xlab = 'y1', ylab = 'y3')
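
The following self-contained sketch shows one possible way to extract the sample values and compare the empirical correlations with the ones implied by the SE kernel. The helper ``seK``, the jitter value and the number of samples are assumptions made for this example.

```r
# Sketch: correlation between sample values at close and distant inputs.
seK <- function(x, y, sigma2, theta) sigma2 * exp(-0.5 * outer(x/theta, y/theta, '-')^2)

n <- 100
x <- seq(0, 1, length = n)
K <- seK(x, x, 1, 0.1)

set.seed(1)
nsamples <- 1000
L <- t(chol(K + 1e-8 * diag(n)))                            # jitter is an assumption
samples <- L %*% matrix(rnorm(n * nsamples), n, nsamples)   # one sample path per column

idx_points <- c(10, 15, 100)
x_points <- x[idx_points]                  # input locations
samples_points <- samples[idx_points, ]    # row i: all samples evaluated at x_points[i]

emp_cor  <- cor(t(samples_points))         # empirical correlation matrix (3 x 3)
theo_cor <- seK(x_points, x_points, 1, 0.1) # correlation implied by the kernel (sigma2 = 1)
```

With $\theta = 0.1$, the kernel gives a correlation of about 0.88 between the 10th and 15th grid points and a value numerically indistinguishable from zero between the 10th and 100th, which is what the two scatter plots should reflect.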

## Building new kernels from other ones

**Question 5.**  As discussed in the course, we can create new kernels by combining predefined ones, e.g.:

$$
\begin{array}{ll}
	\text{Sum of kernels:} & k(x, y) = k_1(x, y) + k_2(x, y) \\
	\text{Product of kernels:} & k(x, y) = k_1(x, y) \times k_2(x, y)
\end{array}	
$$

Experiment with combinations of the kernels you wrote previously. Display the resulting covariance matrix and some GP samples.

In [None]:
par(mfrow = c(1,2))
options(repr.plot.width = 15, repr.plot.height = 6)

nsamples <- 10
kern <- ## to be filled

image2D(kern, xlab = 'x', ylab = 'y', main = 'linear + cosine')
set.seed(1); samples <- sample(matrix(0, nrow = n, ncol = 1), kern,  jitter, nsamples)
matplot(x, samples, type = 'l', xlab = 'x', ylab = 'samples')
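
As a hedged sketch of both rules, the example below builds a sum (linear + cosine) and a product (linear times SE) on a grid. The helper names and parameter values are assumptions chosen for illustration.

```r
# Sketch: combining kernels by sum and by product.
linK <- function(x, y, sigma2) sigma2 * outer(x, y, '*')
cosK <- function(x, y, sigma2, theta) sigma2 * cos(outer(x/theta, y/theta, '-'))
seK  <- function(x, y, sigma2, theta) sigma2 * exp(-0.5 * outer(x/theta, y/theta, '-')^2)

x <- seq(0, 1, length = 100)
kernSum  <- linK(x, x, 1) + cosK(x, x, 1, 0.05)   # linear trend plus oscillation
kernProd <- linK(x, x, 1) * seK(x, x, 1, 0.1)     # elementwise product, not %*%

# Both matrices should remain symmetric positive semi-definite (up to rounding)
minEigSum  <- min(eigen(kernSum,  symmetric = TRUE, only.values = TRUE)$values)
minEigProd <- min(eigen(kernProd, symmetric = TRUE, only.values = TRUE)$values)
```

Note that the product of kernels corresponds to the elementwise operator ``*`` on the covariance matrices; the matrix product ``%*%`` would not give $k_1(x_i, y_j) \times k_2(x_i, y_j)$.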

## Gaussian process regression

We aim at approximating the test function $f : x \in [0, 1] \mapsto x + \sin(6\pi x)$ by a Gaussian process regression model:

$$m(x) = k(x, X) k(X,X)^{-1} Y$$

$$c(x,y) = k(x,y) - k(x, X) k(X,X)^{-1} k(X,y)$$


**Question 6.** Write two functions $m$ and $c$ that return the conditional mean and covariance. These functions will typically take as inputs the scalar/vector of prediction point(s) ``x``, the design of experiments (DoE) vector ``X``, the vector of responses ``Y``, the kernel name ``kernType``, and the covariance parameters ``param``.

In [None]:
# functions used for computing the conditional mean and covariance functions
cond_mean <- function(x, X, Y, kernType, param){
    # input:
    #  x: vector of prediction points
    #  X: DoE vector
    #  Y: vector of responses
    #  kernType: type of covariance functions
    #  param: parameters of the covariance
    # output:
    #  m: conditional mean
    m <- ## to be filled
    return(m)
}

cond_cov <- function(x, X, Y, kernType, param){
    # input:
    #  x: vector of prediction points
    #  X: DoE vector
    #  Y: vector of responses
    #  kernType: type of covariance functions
    #  param: parameters of the covariance
    # output:
    #  C: conditional covariance
    C <- ## to be filled
    return(C)
} 
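
One possible implementation of the two formulas is sketched below. It assumes the ``kernCompute`` convention of this notebook, redefines a compact ``SEKernel`` to stay self-contained, and adds a small ``nugget`` on $k(X,X)$ for numerical stability; the names ``condMeanSketch``, ``condCovSketch`` and ``nugget`` are assumptions, as are the toy data.

```r
# Sketch: conditional mean and covariance of a zero-mean GP.
SEKernel <- function(x, y, param) param[1] * exp(-0.5 * outer(x/param[2], y/param[2], '-')^2)
kernCompute <- function(x, y, kernType, param) get(paste0(kernType, 'Kernel'))(x, y, param)

condMeanSketch <- function(x, X, Y, kernType, param, nugget = 1e-10){
    kxX <- kernCompute(x, X, kernType, param)
    kXX <- kernCompute(X, X, kernType, param) + nugget * diag(length(X))
    kxX %*% solve(kXX, Y)              # solve(A, b) avoids forming the explicit inverse
}

condCovSketch <- function(x, X, kernType, param, nugget = 1e-10){
    kxx <- kernCompute(x, x, kernType, param)
    kxX <- kernCompute(x, X, kernType, param)
    kXX <- kernCompute(X, X, kernType, param) + nugget * diag(length(X))
    kxx - kxX %*% solve(kXX, t(kxX))   # k(x,x) - k(x,X) k(X,X)^{-1} k(X,x)
}

# Toy data (assumption): the model should interpolate the observations
X <- seq(0, 1, length = 6)
Y <- sin(2 * pi * X)
mAtX <- condMeanSketch(X, X, Y, 'SE', c(1, 0.2))
vAtX <- diag(condCovSketch(X, X, 'SE', c(1, 0.2)))
```

At the design points the conditional mean reproduces ``Y`` and the conditional variance is close to zero, which gives a quick sanity check for any implementation. The responses ``Y`` do not enter the conditional covariance, so the sketch omits that argument.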

**Question 7.** Create a design of experiments $X$ composed of 5 to 20 points in the input space (regularly spaced points, for instance) and compute the vector of observations $Y = f(X)$. Display the design points and the target function in the same figure.

In [None]:
f <- function(x) # target function
    return(10*x + sin(6*pi*x))

n_design <- 11 ## to be filled (number of input points)
X <- ## to be filled (design points)
Y <- ## to be filled (responses at the design points)

options(repr.plot.width = 6, repr.plot.height = 6)
par(mfrow = c(1,1))
plot(X, Y, type = 'p', pch = 4, col = 'red')
X2 <- seq(0, 1, length = 1e3)
lines(X2, f(X2), pch = 4, col = 'red', lty = 2, lwd = 1)
legend('topleft', c('obs','test function'),
       col = c('red','red'), lty = c(NA,2), pch = c(4,NA))

**Question 8.**  Considering the SE kernel, draw on the same graph $f(x)$, $m(x)$ and $95\%$ confidence intervals: $m(x) \pm 1.96 \sqrt{c(x, x)}$.

In [None]:
x <- seq(0, 1, length = 500) # vector of prediction points
kernType <- 'SE' # type of covariance functions
param <- c(1, 0.1) # parameters of the covariance
mu <- ## to be filled (mean vector)
Cov <- ## to be filled (covariance matrix)

plotGP <- function(x, m, c, X, Y, y) {
    # input:
    #  x: test points
    #  m: conditional mean vector
    #  c: conditional covariance matrix
    #  X: DoE vector
    #  Y: vector of responses
    #  y: responses at test points
    # output: GP regression plot
    upperBound <- m + 1.96*sqrt(abs(diag(c)))
    lowerBound <- m - 1.96*sqrt(abs(diag(c)))
    
    plot(x, upperBound, type = 'l', col = 'lightblue',
         lwd = 2, xlab = 'x', ylab = 'f(x)',
         xlim = c(min(x), max(x)),
         ylim = range(y, lowerBound, upperBound)) # based on the arguments, not on the global f
    lines(x, lowerBound, type = 'l', col = 'lightblue', lwd = 2)
    polygon(c(x,rev(x)), c(upperBound, rev(lowerBound)),
            col = 'lightblue', border = NA)
    points(X, Y, type = 'p', pch = 4, col = 'red', lwd = 2)
    lines(x, m, col = 'dodgerblue', lwd = 2)
    lines(x, y, col = 'red', lty = 2, lwd = 1)

    legend('topleft', c('obs','predicted mean','CI 95%','test function'),
           col = c('red','dodgerblue','lightblue','red'), lty = c(NA,1,1,2),
           pch = c(4,NA,NA,NA), ncol = 1)
}

options(repr.plot.width = 7, repr.plot.height = 6)
plotGP(x, mu, Cov, X, Y, f(x))

**Question 9.**  Change the kernel as well as the values in ``param``. What is the effect of
- $\sigma^2$ on $m(x)$? Can you prove this result?
- $\sigma^2$ on the conditional variance $v(x) = c(x, x)$? Can you prove this result?
- $\theta$ on $m(x)$ (try (very) small and large values)?
- $\theta$ on $v(x)$ (try (very) small and large values)?
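
Before attempting the proofs, a quick numerical experiment can suggest what to expect. The sketch below compares the conditional mean and variance obtained with $\sigma^2 = 1$ and $\sigma^2 = 4$ for the SE kernel; the helpers ``seK`` and ``post`` and the toy data are assumptions made for this example.

```r
# Sketch: numerical effect of sigma2 on the conditional mean and variance.
seK <- function(x, y, sigma2, theta) sigma2 * exp(-0.5 * outer(x/theta, y/theta, '-')^2)

X <- seq(0, 1, length = 6)       # toy design (assumption)
Y <- cos(3 * X)                  # toy responses (assumption)
x <- seq(0, 1, length = 50)      # prediction points

post <- function(sigma2, theta){
    kxX <- seK(x, X, sigma2, theta)
    kXX <- seK(X, X, sigma2, theta)
    m <- as.vector(kxX %*% solve(kXX, Y))
    # diagonal of k(x,X) k(X,X)^{-1} k(X,x), computed row by row
    v <- sigma2 - rowSums(kxX * t(solve(kXX, t(kxX))))
    list(m = m, v = v)
}

p1 <- post(1, 0.2)
p4 <- post(4, 0.2)
diffMean <- max(abs(p1$m - p4$m))        # difference between the two means
diffVar  <- max(abs(p4$v - 4 * p1$v))    # variance compared with 4 times the reference
```

Both differences should be at rounding level, which hints at how $\sigma^2$ enters $m(x)$ and $v(x)$; the algebraic argument is left as the exercise asks.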

**Question 10.** Generate samples from the conditional process, i.e. from $\mathcal{N}(m(x), c(x, x))$ evaluated on the vector of prediction points.

In [None]:
par(mfrow = c(1,2))
nsamples <- 10
samples <- ## to be filled 

options(repr.plot.width = 15, repr.plot.height = 6)
image2D(Cov,  xlab = 'x', ylab = 'y', main = "conditional covariance matrix")
matplot(x, samples, type = 'l', xlab = 'x', ylab = 'samples')
points(X, Y, type = 'p', pch = 4, col = 'red', lwd = 2)
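
Conditional samples can be drawn with the same Cholesky construction as for the prior, using the conditional mean and covariance instead. The conditional covariance is singular at the design points, so the jitter matters here. The sketch below is self-contained; the helper ``seK``, the toy data and the jitter value are assumptions.

```r
# Sketch: sampling from the conditional (posterior) GP.
seK <- function(x, y, sigma2, theta) sigma2 * exp(-0.5 * outer(x/theta, y/theta, '-')^2)

X <- seq(0, 1, length = 6)            # toy design (assumption)
Y <- cos(3 * X)                       # toy responses (assumption)
x <- seq(0, 1, length = 101)          # prediction grid containing the design points

kxX <- seK(x, X, 1, 0.2)
kXX <- seK(X, X, 1, 0.2)
mu  <- as.vector(kxX %*% solve(kXX, Y))                  # conditional mean
Cov <- seK(x, x, 1, 0.2) - kxX %*% solve(kXX, t(kxX))    # conditional covariance
Cov <- (Cov + t(Cov)) / 2                                # remove rounding asymmetry

set.seed(1)
nsamples <- 10
L <- t(chol(Cov + 1e-8 * diag(length(x))))               # jitter is an assumption
samples <- mu + L %*% matrix(rnorm(length(x) * nsamples), length(x), nsamples)

idxDesign <- c(1, 21, 41, 61, 81, 101)                   # grid indices of the design points
maxGap <- max(abs(samples[idxDesign, ] - Y))             # deviation from the observations
```

Every sample path passes (up to the jitter) through the observations, so ``maxGap`` stays very small, while the paths spread out between design points.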

**Question 11.**  Use the resulting model to predict values of $f$ for $x \in [1, 1.5]$. What can you conclude?

In [None]:
x <- seq(0, 1.5, length = 500) # vector of prediction points
mu <- cond_mean(x, X, Y, kernType, param) # conditional mean
Cov <- cond_cov(x, X, Y, kernType, param) # conditional covariance

options(repr.plot.width = 7, repr.plot.height = 6)
plotGP(x, mu, Cov, X, Y, f(x))

**Question 12.** Repeat the procedure in **Question 11** but this time considering $k(x,y) = k_{lin}(x,y) + k_{cos}(x,y) + k_{SE}(x,y)$. For instance, fix the length-scale parameter of the cosine kernel to $\theta_{cos} = 1/(6\pi)$.

In [None]:
par(mfrow = c(1,2))
options(repr.plot.width = 15, repr.plot.height = 6)

linCosineSEKernel <- function(x, y, param){
    # input:
    #  x,y: input vectors
    #  param: parameters (sigma2_lin, sigma2_cos, theta_cos, sigma2_SE, theta_SE)
    # output:
    #  kern: covariance matrix cov(x,y)    
    kern <- ## to be filled 
    return(kern)
}

x <- seq(0, 1.5, length = 500) # vector of prediction points
kernType <- 'linCosineSE' # type of covariance functions
param <- c(1, 1, 1/(6*pi), 1, 0.5) # parameters of the covariance

#kern <- kernCompute(x, x, kernType, param) # covariance matrix
#image2D(kern, xlab = 'x', ylab = 'y', main = "linear + cosine + SE")

mu <- cond_mean(x, X, Y, kernType, param) # conditional mean
Cov <- cond_cov(x, X, Y, kernType, param) # conditional covariance
image2D(Cov, xlab = 'x', ylab = 'y', main = "conditional covariance (linear + cosine + SE)")
plotGP(x, mu, Cov, X, Y, f(x))
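
One possible way to fill in the composite kernel is sketched below, assuming the parameter ordering given in the comments of the template (``sigma2_lin, sigma2_cos, theta_cos, sigma2_SE, theta_SE``). It is an illustration, not the reference solution.

```r
# Sketch: k = k_lin + k_cos + k_SE with param = c(s2_lin, s2_cos, theta_cos, s2_SE, theta_SE).
linCosineSEKernel <- function(x, y, param){
    diff <- outer(x, y, '-')                         # differences x_i - y_j
    kLin <- param[1] * outer(x, y, '*')              # linear part
    kCos <- param[2] * cos(diff / param[3])          # cosine part
    kSE  <- param[4] * exp(-0.5 * (diff / param[5])^2)  # squared exponential part
    kLin + kCos + kSE
}

param <- c(1, 1, 1/(6*pi), 1, 0.5)
x <- seq(0, 1.5, length = 4)
K <- linCosineSEKernel(x, x, param)
```

With this name, the call ``kernCompute(x, x, 'linCosineSE', param)`` resolves to the function above. On the diagonal, the kernel equals $\sigma^2_{lin} x^2 + \sigma^2_{cos} + \sigma^2_{SE}$, so the prior variance grows with $x$ through the linear term.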

**Bonus question.** After testing different kernels and various values for $\sigma^2$ and $\theta$, which one would you recommend?