# Scalable Exact GP Posterior Sampling using Contour Integral Quadrature

This notebook demonstrates the simplest usage of contour integral quadrature (CIQ) with msMINRES, as described [here](https://arxiv.org/pdf/2006.11267.pdf), to sample from the predictive distribution of an exact GP.

Note that to achieve results where Cholesky would run the GPU out of memory, you'll need either to have KeOps installed (see our KeOps tutorial in this same folder) or to use the `checkpoint_kernel` beta feature. Even so, on this relatively simple 1D example, with 1000 training points and samples drawn at 20000 test points, we will achieve a significant speedup over Cholesky.
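To get a rough sense of why Cholesky becomes memory-bound at these sizes, the sketch below estimates the storage needed for one dense test covariance matrix. It assumes 4-byte (float32) entries, which is PyTorch's default dtype. `dense_covar_gib` is a hypothetical helper, not part of GPyTorch, and an actual Cholesky factorization would need further memory for the factor itself.

```python
def dense_covar_gib(n, bytes_per_element=4):
    """Size in GiB of one dense n x n matrix (assumes float32 by default)."""
    return n * n * bytes_per_element / 2**30


for n in (20000, 50000):
    print(f"{n} test points: {dense_covar_gib(n):.2f} GiB")
# 20000 test points: 1.49 GiB
# 50000 test points: 9.31 GiB
```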

In [1]:
import math
import torch
import gpytorch
from matplotlib import pyplot as plt

%matplotlib inline
%load_ext autoreload
%autoreload 2

In [2]:
# Training data is 1000 points in [0,1] inclusive regularly spaced
train_x = torch.linspace(0, 1, 1000)
# True function is sin(2*pi*x) with Gaussian noise
train_y = torch.sin(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.3

### Are we running with KeOps?

If you have KeOps, set the flag below to `True` to run with a significantly larger test set.

In [3]:
HAVE_KEOPS = False

### Define an Exact GP Model and train

In [4]:
class ExactGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        
        if HAVE_KEOPS:
            self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.keops.RBFKernel())
        else:
            self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())
    
    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

# initialize likelihood and model
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = ExactGPModel(train_x, train_y, likelihood)

In [5]:
train_x = train_x.cuda()
train_y = train_y.cuda()
model = model.cuda()
likelihood = likelihood.cuda()

In [6]:
# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam([
    {'params': model.parameters()},  # Includes GaussianLikelihood parameters
], lr=0.1)

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

training_iter = 50
for i in range(training_iter):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Output from model
    output = model(train_x)
    # Calc loss and backprop gradients
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f   lengthscale: %.3f   noise: %.3f' % (
        i + 1, training_iter, loss.item(),
        model.covar_module.base_kernel.lengthscale.item(),
        model.likelihood.noise.item()
    ))
    optimizer.step()

Iter 1/50 - Loss: 0.897   lengthscale: 0.693   noise: 0.693
Iter 2/50 - Loss: 0.853   lengthscale: 0.644   noise: 0.644
Iter 3/50 - Loss: 0.807   lengthscale: 0.598   noise: 0.598
Iter 4/50 - Loss: 0.765   lengthscale: 0.554   noise: 0.554
Iter 5/50 - Loss: 0.715   lengthscale: 0.513   noise: 0.513
Iter 6/50 - Loss: 0.678   lengthscale: 0.474   noise: 0.474
Iter 7/50 - Loss: 0.637   lengthscale: 0.439   noise: 0.437
Iter 8/50 - Loss: 0.602   lengthscale: 0.408   noise: 0.402
Iter 9/50 - Loss: 0.566   lengthscale: 0.380   noise: 0.370
Iter 10/50 - Loss: 0.537   lengthscale: 0.356   noise: 0.340
Iter 11/50 - Loss: 0.506   lengthscale: 0.335   noise: 0.312
Iter 12/50 - Loss: 0.473   lengthscale: 0.317   noise: 0.286
Iter 13/50 - Loss: 0.440   lengthscale: 0.301   noise: 0.262
Iter 14/50 - Loss: 0.421   lengthscale: 0.288   noise: 0.240
Iter 15/50 - Loss: 0.391   lengthscale: 0.276   noise: 0.219
Iter 16/50 - Loss: 0.371   lengthscale: 0.266   noise: 0.201
Iter 17/50 - Loss: 0.350   length

### Define test set

If we have KeOps installed, we'll test on 50000 points instead of 20000.

In [7]:
if HAVE_KEOPS:
    test_n = 50000
else:
    test_n = 20000

test_x = torch.linspace(0, 1, test_n).cuda()
print(test_x.shape)

torch.Size([20000])


### Draw a sample with CIQ

To do this, we wrap the `rsample` call in the `ciq_samples` setting. We additionally demonstrate all relevant settings for controlling Contour Integral Quadrature:

- The `ciq_samples` setting determines whether or not to use CIQ
- The `num_contour_quadrature` setting controls the number of quadrature sites (Q in the paper).
- The `minres_tolerance` setting controls the error we tolerate from minres (here, <0.01%).

Note that, of these settings, increasing `num_contour_quadrature` is unlikely to improve performance. As Theorem 1 from the paper demonstrates, virtually all of the error in this method is controlled by `minres_tolerance`. Here, we use quite a tight tolerance for MINRES.
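To make the role of the quadrature sites concrete, here is a minimal NumPy sketch of the general idea: approximate `K^{1/2} b` by a weighted sum of shifted linear solves, `K @ sum_q w_q (K + t_q I)^{-1} b`. This is not the paper's quadrature rule and not GPyTorch's implementation. It assumes a simpler midpoint rule applied to the identity `lambda^{-1/2} = (2/pi) * integral_0^inf dt / (t^2 + lambda)`, uses dense solves where the paper uses msMINRES, and computes the extreme eigenvalues exactly for convenience. The name `sqrt_matvec_quadrature` and the toy kernel matrix are illustrative only.

```python
import numpy as np


def sqrt_matvec_quadrature(K, b, num_quad=10):
    """Approximate K^{1/2} b with num_quad shifted solves (illustrative rule)."""
    eigs = np.linalg.eigvalsh(K)              # exact spectrum: demo only
    scale = np.sqrt(eigs[0] * eigs[-1])       # centres the shifts on the spectrum
    # Midpoint rule in theta after substituting t = sqrt(scale) * tan(theta).
    theta = (np.arange(num_quad) + 0.5) * np.pi / (2 * num_quad)
    shifts = scale * np.tan(theta) ** 2
    weights = np.sqrt(scale) / (num_quad * np.cos(theta) ** 2)
    eye = np.eye(K.shape[0])
    inv_sqrt_b = np.zeros_like(b)
    for w, t in zip(weights, shifts):
        # Each quadrature site costs one solve with a shifted copy of K.
        inv_sqrt_b += w * np.linalg.solve(K + t * eye, b)
    return K @ inv_sqrt_b                     # K^{1/2} b = K (K^{-1/2} b)


# Toy problem: RBF kernel on 50 points plus a noise term on the diagonal.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.2**2) + 0.05 * np.eye(50)
b = rng.standard_normal(50)                   # K^{1/2} b is then a sample from N(0, K)

# Reference answer from a full eigendecomposition.
evals, evecs = np.linalg.eigh(K)
exact = evecs @ (np.sqrt(evals) * (evecs.T @ b))

errors = {}
for q in (5, 10, 20, 40):
    approx = sqrt_matvec_quadrature(K, b, num_quad=q)
    errors[q] = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
    print(f"Q = {q:2d}: relative error {errors[q]:.2e}")
```

In this toy setting the error falls quickly as the number of sites grows, and every term needs only a linear solve with a shifted copy of `K`. That structure is what allows an iterative solver to stand in for a cubic-cost Cholesky factorization.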

In [9]:
import time

model.train()
likelihood.train()

# Get into evaluation (predictive posterior) mode
model.eval()
likelihood.eval()

# Test points are regularly spaced along [0,1]
# Make predictions by feeding model through likelihood

test_x.requires_grad_(True)

with torch.no_grad():
    observed_pred = likelihood(model(test_x))
    
    # All relevant settings for using CIQ.
    #   ciq_samples(True) - Use CIQ for sampling
    #   num_contour_quadrature(10) -- Use 10 quadrature sites (Q in the paper)
    #   minres_tolerance -- error tolerance from minres (here, <0.01%).
    with gpytorch.settings.ciq_samples(True), gpytorch.settings.num_contour_quadrature(10), gpytorch.settings.minres_tolerance(1e-4):
        %time y_samples = observed_pred.rsample(torch.Size([2]))
    
    # Make sure we use Cholesky
    with gpytorch.settings.fast_computations(covar_root_decomposition=False):
        %time y_samples = observed_pred.rsample(torch.Size([2]))



Wall time: 237 ms
Wall time: 1.54 s
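
CUDA kernels are launched asynchronously, so wall-clock timings of GPU code can be sensitive to when the host synchronizes with the device. Below is a minimal sketch of a timing helper that synchronizes before and after the timed call. `time_call` is a hypothetical helper, not part of GPyTorch; in this notebook one would pass `torch.cuda.synchronize` as `sync`.

```python
import time


def time_call(fn, sync=None):
    """Run fn() once and return (result, elapsed_seconds).

    `sync` is an optional zero-argument callable invoked before and after
    the call, e.g. torch.cuda.synchronize when timing GPU work.
    """
    if sync is not None:
        sync()                      # finish any pending work first
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        sync()                      # wait for the timed work to complete
    return result, time.perf_counter() - start


# CPU-only demonstration. With the notebook's objects this might read:
#   time_call(lambda: observed_pred.rsample(torch.Size([2])),
#             sync=torch.cuda.synchronize)
total, elapsed = time_call(lambda: sum(range(100000)))
print(f"sum = {total}, elapsed = {elapsed:.6f} s")
```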
