# Capturing Non-Linearity with Kernels

### Authors: James Chapman, Sahan Bulathwela and John Shawe-Taylor

## Introduction

This exercise aims to help you understand how kernel functions can increase the expressivity of a model family, allowing the model to find non-linear solutions while still using a linear family.

## Expressivity vs generalisation

When the solution space we search includes relatively complex solutions (e.g. non-linear solutions rather than only linear ones), we increase the expressivity of the model. This in turn can reduce the model's ability to generalise and risks overfitting.
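The trade-off can be illustrated with a minimal sketch that is not part of the exercise data. It assumes a hypothetical 1-D toy problem (a noisy sine curve) and fits polynomials of increasing degree with `np.polyfit`. All names here (`x_tr`, `poly_mse`, `demo_errors`) are illustrative. The training error cannot increase as the degree grows, because each model family contains the previous one, whereas the test error need not follow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (an assumption for illustration): a noisy sine curve.
x_tr = np.sort(rng.uniform(-1, 1, 15))
y_tr = np.sin(np.pi * x_tr) + 0.2 * rng.standard_normal(15)
x_te = np.sort(rng.uniform(-1, 1, 200))
y_te = np.sin(np.pi * x_te) + 0.2 * rng.standard_normal(200)

def poly_mse(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    train = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    test = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    return train, test

# Higher degree = more expressive model family.
demo_errors = {d: poly_mse(d) for d in (1, 3, 9)}
for d, (tr, te) in demo_errors.items():
    print(f"degree {d}: train MSE {tr:.4f}, test MSE {te:.4f}")
```

Comparing the two columns of the printout shows whether the extra expressivity is fitting the signal or the noise.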

### Loading Data

We use a synthesised dataset for today's exercise. We start with the simpler dataset, in which the features have a linear relationship with the target variable.

In [1]:
#DO NOT CHANGE THIS CODE
import numpy as np
import matplotlib.pyplot as plt
X_train=np.loadtxt('./data/X_train_linear.txt')
X_test=np.loadtxt('./data/X_test_linear.txt')
y_train=np.loadtxt('./data/y_train_linear.txt')
y_test=np.loadtxt('./data/y_test_linear.txt')

### 1) Linear Regression
#### a) estimate w and b using linear regression based on the training data (X_train, y_train)

In [2]:
#YOUR CODE HERE

#### b) calculate the mean squared error for the training data and test data

In [3]:
#YOUR CODE HERE

### 2) Exploring the effects of dimensionality
#### a) Using different sized subsets of the training data, repeat (1) and illustrate the trends of training error and test error as the sample size is increased

***Hint:*** You may use a visualisation to explore this relationship.


In [4]:
#YOUR CODE HERE

### 3) Regularisation (Ridge Regression)

In this section, we penalise the model for choosing high-complexity solutions using L2 regularisation, aiming to *improve generalisability*.
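As a rough sketch of what the L2 penalty does, the closed-form ridge solution without an intercept is $w = (X^\top X + \lambda I)^{-1} X^\top y$. The example below assumes synthetic data and a hypothetical helper `ridge_fit`; neither comes from the exercise files. It shows that the norm of the estimated weights shrinks as $\lambda$ grows.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge weights without an intercept: (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration), not the exercise dataset.
X_demo = rng.standard_normal((30, 5))
w_true = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
y_demo = X_demo @ w_true + 0.1 * rng.standard_normal(30)

lambdas = [0.0, 1.0, 10.0, 100.0]
ridge_norms = [np.linalg.norm(ridge_fit(X_demo, y_demo, lam)) for lam in lambdas]
for lam, norm in zip(lambdas, ridge_norms):
    print(f"lambda = {lam:6.1f}  ||w|| = {norm:.4f}")
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a common choice for numerical stability. An intercept, if needed, would have to be handled separately (for example by centring the data).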

#### a) Using a suitable plot, demonstrate the effect of ridge regularisation on the train and test error

In [5]:
#YOUR CODE HERE

#### b) Using a suitable metric or plot, demonstrate the effect of ridge regularisation on the estimated weights

In [6]:
#YOUR CODE HERE

## 4) Kernel Regression

In this section, we use kernelisation, which allows the model to find solutions that go beyond linear modelling. This modification *increases the expressivity* of the downstream model.
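To make the idea concrete, here is a minimal NumPy sketch of kernel ridge regression in its dual form, where the coefficients are $\alpha = (K + \lambda I)^{-1} y$ and predictions are $K_{\text{new}}\,\alpha$. The toy data, the helper names (`linear_kernel`, `rbf_kernel`, `kernel_ridge_predict`) and the values of `gamma` and `lam` are all assumptions chosen for illustration. On a clearly non-linear target, the RBF kernel should fit the training data much more closely than the linear kernel.

```python
import numpy as np

def linear_kernel(A, B):
    """k(a, b) = a . b"""
    return A @ B.T

def rbf_kernel(A, B, gamma=2.0):
    """k(a, b) = exp(-gamma * ||a - b||^2); gamma is an illustrative choice."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def kernel_ridge_predict(kernel, X_fit, y_fit, X_new, lam):
    """Dual-form ridge: alpha = (K + lam*I)^{-1} y, prediction = k(X_new, X_fit) @ alpha."""
    K = kernel(X_fit, X_fit)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_fit)), y_fit)
    return kernel(X_new, X_fit) @ alpha

rng = np.random.default_rng(0)

# Hypothetical non-linear toy problem (not the exercise data).
X_toy = rng.uniform(-2, 2, size=(40, 1))
y_toy = np.sin(3 * X_toy[:, 0]) + 0.1 * rng.standard_normal(40)

lam = 1e-3
lin_pred = kernel_ridge_predict(linear_kernel, X_toy, y_toy, X_toy, lam)
rbf_pred = kernel_ridge_predict(rbf_kernel, X_toy, y_toy, X_toy, lam)
lin_train_mse = np.mean((lin_pred - y_toy) ** 2)
rbf_train_mse = np.mean((rbf_pred - y_toy) ** 2)
print(f"linear kernel train MSE: {lin_train_mse:.4f}")
print(f"RBF kernel train MSE:    {rbf_train_mse:.4f}")
```

Only the kernel function changes between the two fits; the solver is identical, which is what lets a linear method search a non-linear solution space.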

### a) Demonstrate the equivalence of Ridge and Kernel Ridge Regression when using a linear kernel

First, we use a "linear" kernel: we solve the regression problem in its dual form, and the resulting solution is equivalent to that of primal-form ridge regression.
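The equivalence rests on the identity $(X^\top X + \lambda I)^{-1} X^\top = X^\top (X X^\top + \lambda I)^{-1}$, so the primal weights satisfy $w = X^\top \alpha$ with $\alpha = (K + \lambda I)^{-1} y$ and $K = X X^\top$. The following sketch checks this numerically on random data; the variable names and the value of `lam` are assumptions for illustration, and no intercept is used.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random data purely for checking the identity (not the exercise dataset).
X_a = rng.standard_normal((20, 3))   # "training" inputs
y_a = rng.standard_normal(20)        # "training" targets
X_b = rng.standard_normal((5, 3))    # "new" inputs
lam = 0.5

# Primal form: solve a d x d system for the weights w.
w_primal = np.linalg.solve(X_a.T @ X_a + lam * np.eye(3), X_a.T @ y_a)
primal_pred = X_b @ w_primal

# Dual form with a linear kernel: solve an n x n system for alpha.
K = X_a @ X_a.T
alpha = np.linalg.solve(K + lam * np.eye(20), y_a)
dual_pred = (X_b @ X_a.T) @ alpha

print("max abs difference:", np.max(np.abs(primal_pred - dual_pred)))
```

The primal system has size $d \times d$ and the dual system has size $n \times n$, so which form is cheaper depends on whether there are more features or more samples.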

In [7]:
#YOUR CODE HERE

### b) Construct the feature space represented by a polynomial kernel with degree 2 and demonstrate that for small values of ridge regularisation, the predictions of ridge regression with the explicit feature space and kernel ridge regression with the kernel representation are the same (or similar)
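As a starting point, the sketch below assumes the homogeneous degree-2 kernel $k(x, z) = (x^\top z)^2$, whose explicit feature map consists of all pairwise products $x_i x_j$. The function name `phi_degree2` and the random inputs are hypothetical. If your kernel includes an additive constant or a scaling factor, the feature map needs extra constant and linear terms, so treat this as one possible convention rather than the required one.

```python
import numpy as np

def phi_degree2(X):
    """Explicit feature map for k(x, z) = (x . z)^2: all products x_i * x_j."""
    n, d = X.shape
    return (X[:, :, None] * X[:, None, :]).reshape(n, d * d)

rng = np.random.default_rng(2)

# Random inputs purely for checking the identity (not the exercise dataset).
X_p = rng.standard_normal((6, 3))
Z_p = rng.standard_normal((4, 3))

K_explicit = phi_degree2(X_p) @ phi_degree2(Z_p).T   # inner products in feature space
K_kernel = (X_p @ Z_p.T) ** 2                        # kernel evaluated directly

print("feature dimension:", phi_degree2(X_p).shape[1])
print("kernels match:", np.allclose(K_explicit, K_kernel))
```

The explicit map has $d^2$ columns here, while the kernel evaluation only needs the original $d$-dimensional inner products. This is the computational saving the kernel representation provides.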

In [8]:
#YOUR CODE HERE

### c) Using the following data, plot the train and test error for kernel ridge regression with polynomial kernels of different degrees

In [9]:
#DO NOT CHANGE THIS CODE
X_train=np.loadtxt('./data/X_train_poly.txt')
X_test=np.loadtxt('./data/X_test_poly.txt')
y_train=np.loadtxt('./data/y_train_poly.txt')
y_test=np.loadtxt('./data/y_test_poly.txt')

In [10]:
#YOUR CODE HERE

### d) Repeat 3a) for a polynomial kernel with a degree of your choice

In [11]:
#YOUR CODE HERE

### e) Using the following data, compare the performance of kernel ridge regression with an RBF kernel and a polynomial kernel

In [12]:
#DO NOT CHANGE THIS CODE
X_train=np.loadtxt('./data/X_train_poly.txt')
X_test=np.loadtxt('./data/X_test_poly.txt')
y_train=np.loadtxt('./data/y_train_poly.txt')
y_test=np.loadtxt('./data/y_test_poly.txt')

In [None]:
#YOUR CODE HERE