# Multitask GP Regression

## Introduction

Multitask regression, introduced in [this paper](https://papers.nips.cc/paper/3189-multi-task-gaussian-process-prediction.pdf), learns similarities between the outputs simultaneously. It's useful when you are performing regression on multiple functions that share the same inputs, especially if the functions are similar (for example, all sinusoidal).

Given inputs $x$ and $x'$, and tasks $i$ and $j$, the covariance between two datapoints and two tasks is given by

$$
k([x, i], [x', j]) = k_\text{inputs}(x, x') \cdot k_\text{tasks}(i, j)
$$

where $k_\text{inputs}$ is a standard kernel (e.g. RBF) that operates on the inputs.
$k_\text{tasks}$ is a lookup table containing the inter-task covariance.
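
As a rough illustration of the product structure above, the following NumPy sketch builds the full covariance over all (datapoint, task) pairs as a Kronecker product. The RBF lengthscale, the rank-1-plus-diagonal form of the task covariance, and all numeric values are assumptions chosen for illustration, not values taken from this notebook.

```python
import numpy as np


def rbf(xa, xb, lengthscale=0.5):
    """RBF kernel matrix between two sets of points (rows are points)."""
    sq_dist = ((xa[:, None, :] - xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


rng = np.random.default_rng(0)
X = rng.random((4, 2))            # 4 illustrative inputs in the unit square

K_inputs = rbf(X, X)              # (4, 4): k_inputs(x, x')

# Assumed task covariance: rank-1 factor plus a per-task diagonal term.
w = np.array([[1.0], [0.8]])
v = np.array([0.1, 0.2])
K_tasks = w @ w.T + np.diag(v)    # (2, 2): the "lookup table" k_tasks(i, j)

# Full covariance over all (datapoint, task) pairs.
# Row index = datapoint * num_tasks + task.
K_full = np.kron(K_inputs, K_tasks)   # (8, 8)


def k(n, i, m, j):
    """Covariance between (datapoint n, task i) and (datapoint m, task j)."""
    return K_inputs[n, m] * K_tasks[i, j]


print(K_full.shape)
print(np.isclose(K_full[1 * 2 + 0, 3 * 2 + 1], k(1, 0, 3, 1)))
```

Each entry of `K_full` is one input-kernel value multiplied by one entry of the task table, which is exactly the formula above.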

In [1]:
import math
import torch
import gpytorch
from matplotlib import pyplot as plt
from Data_Gen_Script import VField
import numpy as np
from scipy.stats import uniform

%matplotlib inline
%load_ext autoreload
%autoreload 2

tensor([[0.7755, 1.2539],
        [0.9630, 1.5359],
        [1.1248, 2.1697],
        ...,
        [0.5153, 1.5580],
        [0.6410, 0.6414],
        [1.3727, 1.7126]])


### Set up training data

In the next cell, we set up the training data for this example. We draw `n = 2000` input points uniformly at random from the unit square $[0, 1]^2$ and evaluate `VField` (imported from `Data_Gen_Script`) on them to get the targets. The first 80% of the points (1600) are used for training and the remaining 20% (400) are held out for testing.

The targets have two output columns, which the model treats as two tasks.

For MTGPs, our `train_targets` will actually have two dimensions, with the second dimension corresponding to the different tasks.

In [2]:
n = 2000  # number of input points
x = np.random.rand(n, 2)  # inputs drawn uniformly from the unit square
vfield = VField()
y = vfield(x)  # targets, one column per task

# 80/20 train/test split
n_train = int(0.8 * n)
train_x = torch.Tensor(x[:n_train, :])
test_x = torch.Tensor(x[n_train:, :])
train_y = y[:n_train, :]
test_y = y[n_train:, :]
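
`Data_Gen_Script` is not included here, so the cell above cannot be run as-is. To experiment without it, one option is a stand-in with the same call pattern: take an `(n, 2)` array and return an `(n, 2)` array of targets. The class below, `ToyVField`, is purely hypothetical (its name, the functions it evaluates, and the noise level are all assumptions) and is not the author's `VField`.

```python
import numpy as np


class ToyVField:
    """Hypothetical stand-in for VField: maps (n, 2) inputs to (n, 2) noisy targets."""

    def __init__(self, noise_std=0.05, seed=0):
        self.noise_std = noise_std
        self.rng = np.random.default_rng(seed)

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        # Two related outputs that share the same inputs (illustrative choice).
        task0 = np.sin(2 * np.pi * x[:, 0]) + x[:, 1]
        task1 = np.cos(2 * np.pi * x[:, 0]) + x[:, 1]
        clean = np.stack([task0, task1], axis=1)
        return clean + self.noise_std * self.rng.standard_normal(clean.shape)


n = 2000
x = np.random.default_rng(1).random((n, 2))
y = ToyVField()(x)

# Same 80/20 split as in the notebook cell.
n_train = int(0.8 * n)
train_x_np, test_x_np = x[:n_train], x[n_train:]
train_y_np, test_y_np = y[:n_train], y[n_train:]

print(train_x_np.shape, train_y_np.shape, test_x_np.shape, test_y_np.shape)
```

The arrays would still need converting to float tensors (for example with `torch.as_tensor(..., dtype=torch.float32)`) before being passed to the model.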

## Define a multitask model

The model should be somewhat similar to the `ExactGP` model in the [simple regression example](../01_Exact_GPs/Simple_GP_Regression.ipynb).
The differences:

1. We're going to wrap `ConstantMean` with a `MultitaskMean`. This makes sure we have a mean function for each task.
2. Rather than just using an `RBFKernel`, we're using it in conjunction with a `MultitaskKernel`. This gives us the covariance function described in the introduction.
3. We're using a `MultitaskMultivariateNormal` and a `MultitaskGaussianLikelihood`. This allows us to deal with the predictions/outputs in a nice way. For example, when we call `MultitaskMultivariateNormal.mean`, we get an `n x num_tasks` matrix back.

You may also notice that we don't use a `ScaleKernel`, since the `MultitaskKernel` will do some scaling for us. (This way we're not overparameterizing the kernel.)
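
The overparameterization point can be checked numerically. In the sketch below (NumPy only, with made-up matrices standing in for the input and task covariances), multiplying the input kernel by an output scale `s` and dividing the task covariance by `s` leaves the product kernel unchanged, so a separate output scale on the input kernel would not be identifiable.

```python
import numpy as np

# Made-up covariance matrices, for illustration only.
K_inputs = np.array([[1.0, 0.6],
                     [0.6, 1.0]])
K_tasks = np.array([[1.10, 0.80],
                    [0.80, 0.84]])

s = 3.7  # an arbitrary output scale

K_a = np.kron(K_inputs, K_tasks)
K_b = np.kron(s * K_inputs, K_tasks / s)  # scale moved from one factor to the other

print(np.allclose(K_a, K_b))
```

Because the task covariance already carries the overall magnitude, any scale on the input kernel can be absorbed into it.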

In [3]:
class MultitaskGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(MultitaskGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.MultitaskMean(
            gpytorch.means.ConstantMean(), num_tasks=2
        )
        self.covar_module = gpytorch.kernels.MultitaskKernel(
            gpytorch.kernels.RBFKernel(), num_tasks=2, rank=1
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)

    
likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=2)
model = MultitaskGPModel(train_x, train_y, likelihood)

### Train the model hyperparameters

In [4]:
# this is for running the notebook in our testing framework
import os
smoke_test = ('CI' in os.environ)
training_iterations = 2 if smoke_test else 100


# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the Adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # Includes MultitaskGaussianLikelihood parameters

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

for i in range(training_iterations):
    optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))
    optimizer.step()

Iter 1/100 - Loss: 1.113
Iter 2/100 - Loss: 1.077
Iter 3/100 - Loss: 1.040
Iter 4/100 - Loss: 1.003
Iter 5/100 - Loss: 0.965
Iter 6/100 - Loss: 0.927
Iter 7/100 - Loss: 0.888
Iter 8/100 - Loss: 0.849
Iter 9/100 - Loss: 0.809
Iter 10/100 - Loss: 0.769
Iter 11/100 - Loss: 0.729
Iter 12/100 - Loss: 0.688
Iter 13/100 - Loss: 0.647
Iter 14/100 - Loss: 0.606
Iter 15/100 - Loss: 0.565
Iter 16/100 - Loss: 0.523
Iter 17/100 - Loss: 0.482
Iter 18/100 - Loss: 0.441
Iter 19/100 - Loss: 0.400
Iter 20/100 - Loss: 0.360
Iter 21/100 - Loss: 0.320
Iter 22/100 - Loss: 0.281
Iter 23/100 - Loss: 0.242
Iter 24/100 - Loss: 0.204
Iter 25/100 - Loss: 0.167
Iter 26/100 - Loss: 0.132
Iter 27/100 - Loss: 0.098
Iter 28/100 - Loss: 0.065
Iter 29/100 - Loss: 0.034
Iter 30/100 - Loss: 0.005
Iter 31/100 - Loss: -0.022
Iter 32/100 - Loss: -0.047
Iter 33/100 - Loss: -0.069
Iter 34/100 - Loss: -0.089
Iter 35/100 - Loss: -0.107
Iter 36/100 - Loss: -0.122
Iter 37/100 - Loss: -0.134
Iter 38/100 - Loss: -0.143
Iter 39/100 -
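
For reference, the quantity being minimized can be written out directly. The sketch below computes the negative log marginal likelihood of a zero-mean GP with a Cholesky factorization in NumPy. The function name is ours, and dividing by the number of observations reflects our understanding of how `ExactMarginalLogLikelihood` normalizes its output, which is worth checking against the GPyTorch version in use.

```python
import numpy as np


def neg_mll_per_datum(K, y, noise):
    """Negative log marginal likelihood of y ~ N(0, K + noise * I), divided by len(y).

    For a multitask model, y would be the flattened targets and K the full
    (n * num_tasks) x (n * num_tasks) covariance.
    """
    n = y.shape[0]
    L = np.linalg.cholesky(K + noise * np.eye(n))
    # alpha = (K + noise * I)^{-1} y, via two solves against the Cholesky factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    log_det = 2.0 * np.log(np.diag(L)).sum()
    log_lik = -0.5 * (y @ alpha) - 0.5 * log_det - 0.5 * n * np.log(2.0 * np.pi)
    return -log_lik / n


# Sanity check: with K = 0, unit noise and y = 0 the value is 0.5 * log(2 * pi).
baseline = neg_mll_per_datum(np.zeros((3, 3)), np.zeros(3), noise=1.0)
print(np.isclose(baseline, 0.5 * np.log(2.0 * np.pi)))
```

A loss that falls below zero, as in the log above, is not an error: a log density can be positive when the model fits the data tightly.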

### Make predictions with the model

In [5]:
# Set into eval mode
model.eval()
likelihood.eval()

# Initialize plots
# f, (y1_ax, y2_ax) = plt.subplots(1, 2, figsize=(8, 3))

# Make predictions
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    predictions = likelihood(model(test_x))
    mean = predictions.mean
    covariance = predictions.covariance_matrix
    lower, upper = predictions.confidence_region()

print(mean.shape, covariance.shape)
print(f"mean:\n {mean}\n covariance:\n {covariance}")
    
# `mean`, `lower` and `upper` have shape (n_test, num_tasks): column 0 is the first task, column 1 the second.
# `covariance` is the joint (n_test * num_tasks) x (n_test * num_tasks) matrix over all test points and tasks.

# # Plot training data as black stars
# y1_ax.plot(train_x.detach().numpy(), train_y[:, 0].detach().numpy(), 'k*')
# # Predictive mean as blue line
# y1_ax.plot(test_x.numpy(), mean[:, 0].numpy(), 'b')
# # Shade in confidence 
# y1_ax.fill_between(test_x.numpy(), lower[:, 0].numpy(), upper[:, 0].numpy(), alpha=0.5)
# y1_ax.set_ylim([-3, 3])
# y1_ax.legend(['Observed Data', 'Mean', 'Confidence'])
# y1_ax.set_title('Observed Values (Likelihood)')

# # Plot training data as black stars
# y2_ax.plot(train_x.detach().numpy(), train_y[:, 1].detach().numpy(), 'k*')
# # Predictive mean as blue line
# y2_ax.plot(test_x.numpy(), mean[:, 1].numpy(), 'b')
# # Shade in confidence 
# y2_ax.fill_between(test_x.numpy(), lower[:, 1].numpy(), upper[:, 1].numpy(), alpha=0.5)
# y2_ax.set_ylim([-3, 3])
# y2_ax.legend(['Observed Data', 'Mean', 'Confidence'])
# y2_ax.set_title('Observed Values (Likelihood)')

# None

torch.Size([400, 2]) torch.Size([800, 800])
mean:
 tensor([[2.7331, 3.0249],
        [0.6658, 1.1221],
        [1.4976, 1.8884],
        [0.9946, 1.5341],
        [1.6013, 2.0730],
        [1.1205, 1.5370],
        [1.2469, 1.6842],
        [0.7642, 1.2598],
        [3.1728, 3.4279],
        [1.2255, 1.7454],
        [1.5258, 1.9614],
        [1.3026, 1.6858],
        [2.1324, 2.5344],
        [0.5791, 1.0772],
        [2.5914, 2.8625],
        [0.8225, 1.3300],
        [0.5106, 1.0130],
        [0.7567, 1.2487],
        [1.1896, 1.5807],
        [2.0414, 2.4620],
        [2.2778, 2.5796],
        [0.6476, 1.1159],
        [0.5458, 1.0540],
        [0.9920, 1.4213],
        [0.9265, 1.4216],
        [2.1411, 2.4702],
        [0.3873, 0.8949],
        [3.4742, 3.6996],
        [2.4984, 2.8052],
        [0.5960, 1.1137],
        [0.8272, 1.3549],
        [0.6943, 1.1472],
        [0.6862, 1.1471],
        [1.2699, 1.7299],
        [1.0771, 1.6077],
        [2.7110, 2.9777],
        [3.43
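
The two shapes printed above fit together as follows: the covariance is over all 400 × 2 (test point, task) pairs. The NumPy sketch below uses a small made-up covariance and assumes the interleaved ordering (all tasks for point 0, then all tasks for point 1, and so on), which we believe is the `MultitaskMultivariateNormal` default. It recovers per-task variances from the diagonal and forms a mean ± 2 standard deviation band like the one returned by `confidence_region()`.

```python
import numpy as np

n_points, num_tasks = 3, 2

# Made-up input and task covariances, for illustration only.
K_inputs = np.array([[1.0, 0.5, 0.2],
                     [0.5, 1.0, 0.5],
                     [0.2, 0.5, 1.0]])
K_tasks = np.array([[1.0, 0.8],
                    [0.8, 2.0]])

# Joint covariance in interleaved order: index = point * num_tasks + task.
covariance = np.kron(K_inputs, K_tasks)                       # (6, 6)
mean = np.arange(n_points * num_tasks, dtype=float).reshape(n_points, num_tasks)

# Per-point, per-task variance comes from the diagonal.
variance = np.diag(covariance).reshape(n_points, num_tasks)   # (3, 2)
std = np.sqrt(variance)
lower, upper = mean - 2.0 * std, mean + 2.0 * std

# Cross-task covariance at a single point p is a (num_tasks x num_tasks) diagonal block.
p = 1
block = covariance[p * num_tasks:(p + 1) * num_tasks,
                   p * num_tasks:(p + 1) * num_tasks]

print(variance.shape, block.shape)
```

If the distribution were built with a non-interleaved layout instead, the reshape would be `(num_tasks, n_points)` followed by a transpose.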

In [6]:
loc = torch.Tensor([[0.2, 0.1]])
print(loc.shape)
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    pred = likelihood(model(loc))
    mean = pred.mean
    covar = pred.covariance_matrix

mean

torch.Size([1, 2])


tensor([[0.5207, 1.0256]])
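
The held-out `test_y` from the data set-up cell is never compared with the predictions. One simple way to score the model is a per-task root-mean-square error. The helper below is a sketch with an assumed name (`rmse_per_task`) that works on NumPy arrays; the numbers in the example are made up.

```python
import numpy as np


def rmse_per_task(pred, target):
    """Root-mean-square error for each task (column) of two (n, num_tasks) arrays."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")
    return np.sqrt(((pred - target) ** 2).mean(axis=0))


# Made-up predictions and targets: task 0 is exact, task 1 is off at one point.
pred = np.array([[1.0, 2.0],
                 [2.0, 4.0],
                 [3.0, 6.0]])
target = np.array([[1.0, 2.0],
                   [2.0, 2.0],
                   [3.0, 6.0]])

scores = rmse_per_task(pred, target)
print(scores)
```

With the notebook's tensors one would pass something like `mean.numpy()` and `test_y.numpy()`, assuming `test_y` is a CPU tensor. Note that cell 6 rebinds `mean` to the single-point prediction, so the test-set mean from cell 5 would need to be recomputed or saved under another name first.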