## Kernel methods

Author: Julian Lißner

For questions and feedback, send an email to: [lissner@mechbau.uni-stuttgart.de](mailto:lissner@mechbau.uni-stuttgart.de)

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import sys

sys.path.extend( ['provided_functions', 'incomplete_functions' ])
import kernels
import model_evaluation as evaluate

import result_check as check
import display_sets as display
import sample_sets as sample

## Interpolation and regression
- the two are closely related but differ in one key respect
- in interpolation, each support point (training sample) is matched exactly
- in regression, the support points are chosen separately and the model is fit to the training samples by minimizing a given loss

### Interpolation

- interpolating a set of support points (training samples) yields an approximation of the original function, provided the hyperparameters are chosen well
- the coefficients $\underline w$ (model weights) are given as $\underline w = (\underline{\underline K}(\gamma))^{-1} \, \underline y \quad$ <br>with $\, \underline{\underline K}(\gamma) \hat = k( x_i, x_j; \gamma) $ for all pairs of training inputs $x_i, x_j$, and $\underline y$ the corresponding training targets
- the model can then be evaluated as $ \underline{\hat y}(\underline x_{\rm valid} ) = \underline{w} \, \underline{\underline{K}} ( \underline{x}, \underline x_{\rm valid}; \gamma )$
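
The two formulas above can be sketched in plain NumPy. This is only an illustration under stated assumptions: `gaussian_kernel` is a hypothetical helper defined here (not the provided `kernels` module), the training points are equidistant instead of randomly sampled, and `np.linalg.solve` is used in place of an explicit inverse.

```python
import numpy as np

def gaussian_kernel(x, x_prime, gamma):
    """Hypothetical helper: k(x, x') = exp(-gamma * (x - x')**2) for 1D inputs."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    x_prime = np.asarray(x_prime, dtype=float).reshape(1, -1)
    return np.exp(-gamma * (x - x_prime) ** 2)

# assumed toy data, not the data of the notebook
f = lambda x: np.sin(2 * x)
x_train = np.linspace(0, 4, 6)
y_train = f(x_train)
x_valid = np.linspace(0, 4, 50)
gamma = 1.0

K = gaussian_kernel(x_train, x_train, gamma)           # shape (6, 6)
w = np.linalg.solve(K, y_train)                        # w = K^{-1} y without forming the inverse
y_hat = w @ gaussian_kernel(x_train, x_valid, gamma)   # shape (50,)

# the interpolant reproduces the training targets at the support points
y_at_train = w @ K
```

Evaluating the model at the training inputs (`y_at_train`) recovers `y_train` up to round-off, which is the defining property of interpolation.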

-----
__Task:__ Compute the kernel matrix using the available training data. Apply it for the interpolation of `x_valid`.<br>
Note that multiple executions will give you different results.

In [None]:
# all variables below can be adjusted
my_function = lambda x: -0.2*x**2 + np.sin( 2*x) + 0.5*(x+1)
interval = [0, 4]
n_train = 6
x_train, y_train = sample.function( my_function, interval, n_train) 
x_valid = sample.uniform_interval( *interval)


slope = 0.5
nugget = 1e-12

kernel_matrix = kernels.linear( x_train, x_train, slope)
kernel_matrix += nugget* #TODO #add the nugget on the diagonal
K_inv = #TODO
coefficients = #TODO
kernel_response = kernels.linear( x_train, x_valid, slope)
y_interpolated = #TODO #evaluation of x_valid using 'kernel_response'

fig, ax = plt.subplots( figsize=(6,6) )
ax.scatter( x_train, y_train, facecolor='lightblue', edgecolor='k', label='support points' )
ax.plot( x_valid, my_function( x_valid), color='blue', linewidth=2, label='reference solution' )
ax.plot( x_valid, y_interpolated, color='red', linewidth=2, label='interpolation' )
ax.grid( ls=':' )
ax.legend()

- the linear kernel gives a rather poor fit here; a different kernel can improve the result
- the Gaussian kernel is applicable to a wide range of problems
- the Gaussian kernel is defined as $k(x, x^{\prime}) = \exp( -\gamma ||x-x^{\prime}||_2^2 ) $
- an efficient implementation of the Gaussian kernel for higher-dimensional samples is already provided
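
For one-dimensional samples the squared norm reduces to $(x - x^{\prime})^2$, so the kernel matrix can be built with broadcasting. The sketch below is a possible starting point, not the reference solution: the name `gaussian_1d_sketch`, its argument order and the default `gamma=1.0` are assumptions, and the signature expected by `check.kernel_implementation` may differ.

```python
import numpy as np

def gaussian_1d_sketch(x, x_prime, gamma=1.0):
    """Return the matrix K[i, j] = exp(-gamma * (x[i] - x_prime[j])**2).

    Assumed interface: x and x_prime are 1D arrays (or array-likes) of scalars.
    """
    x = np.asarray(x, dtype=float).ravel()
    x_prime = np.asarray(x_prime, dtype=float).ravel()
    # broadcasting yields all pairwise differences, shape (len(x), len(x_prime))
    squared_distance = (x[:, None] - x_prime[None, :]) ** 2
    return np.exp(-gamma * squared_distance)

x_demo = np.array([0.0, 1.0, 3.0])
K_demo = gaussian_1d_sketch(x_demo, x_demo, gamma=0.5)
# K_demo is symmetric with ones on the diagonal, and K_demo[0, 1] equals exp(-0.5)
```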
----
__Task:__ Implement the `gaussian_1d` kernel in 'kernels.py'.

In [None]:
check.kernel_implementation( kernels.gaussian_1d)

- hyperparameters are parameters the model cannot learn; the expert (i.e. you) has to set them
- here, hyperparameters are:
    - kernel function
    - parameters of the function
- the MSE (mean squared error) is a good error measure for model evaluation; it is defined as
$$ MSE = \frac1n\sum\limits_{i=1}^{n} (y_i -\hat y_i)^2 $$
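
As a small worked example of the definition, the MSE can be written as a NumPy one-liner. The function name `mean_squared_error` and the toy numbers are chosen here for illustration only.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Mean of the squared differences between reference and prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# squared errors are 0, 0 and 4, so the mean is 4/3
error = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```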

----
__Task:__ Write the function `interpolation` in 'model_evaluation.py' and tune the hyperparameters to get a good match.

In [None]:
## Optional, adjust the sampled function
my_function = lambda x: -0.2*x**2 + np.sin( 2*x) + 0.5*(x+1)
interval = [0, 4]
n_train = 6
x_train, y_train = sample.function( my_function, interval, n_train, noise=0) 
x_valid = sample.uniform_interval( *interval)
y_valid = my_function( x_valid)

## Hyperparameters
## if you change the kernel function you will need different parameters
#help( kernels)
kernel_function = kernels.gaussian_1d #TODO #try different kernels
parameters = [ 1.01] 

## Interpolation
y_interpolated = evaluate.interpolation( #TODO, kernel_function, parameters, y_train=y_train, nugget=#TODO ) 
MSE = lambda y_1, y_2: #TODO
print( 'achieved MSE:', MSE( y_valid, y_interpolated) )

## Plotting
fig, ax = plt.subplots( figsize=(6,6) )
ax.scatter( x_train, y_train, facecolor='lightblue', edgecolor='k', label='support points' )
ax.plot( x_valid, y_valid, color='blue', linewidth=2, label='reference solution' )
ax.plot( x_valid, y_interpolated, color='red', linewidth=2, label='interpolation' )
ax.grid( ls=':' )
_=ax.legend()

### Least squares regression
- regression seeks the best fit to many (possibly noisy) samples
- interpolating these samples would usually lead to _overfitting_/oscillations
- the evaluation for regression is the same as for interpolation
- the model training differs: the support points $\underline z$ are selected manually and the weights follow from minimizing the loss/cost function
- the `coefficients` $\underline w$ are given by
$$ 
\underline w = (\underline{\underline L}^T\,\underline{\underline L})^{-1}\,\underline{\underline L}^T \, \underline y_{\rm train} $$

$\quad$ with $\underline{\underline L} = k( x_i, z_j;\gamma)$ for all training samples $x_i$ and support points $z_j$
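- the model can then be evaluated as $ \underline{\hat y}(\underline x^\prime ) = \underline{w} \, \underline{\underline{K}} ( \underline{z}, \underline x^\prime; \gamma )$, i.e. the kernel is evaluated between the support points $\underline z$ and the new inputs $\underline x^\prime$
- note that unlike $\underline{\underline K}$, the kernel matrix $\underline{\underline L}$ is generally not square (one row per training sample, one column per support point)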
- the computation of ${\underline w}$ can be implemented more efficiently using the SVD
- recall the definition $\text{SVD}( \underline{\underline L}) = \underline{\underline V}\, \underline{\underline{\sigma}}\, \underline{\underline W}^T$ with $\underline{\underline V}^T \underline{\underline V}=\underline{\underline I}$ and $\underline{\underline W}^T \underline{\underline W}=\underline{\underline I}$
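
The sketch below compares an SVD-based computation of the weights with the normal equations on toy data. It assumes that the reformulation ends up at $\underline w = \underline{\underline W}\, \underline{\underline \sigma}^{-1}\, \underline{\underline V}^T \underline y_{\rm train}$, which is worth verifying on paper first. The data, the support points `z`, the value of `gamma` and the inline Gaussian kernel are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# assumed toy data: noisy samples of a sine
x_train = np.linspace(0, 10, 40)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(x_train.size)
z = np.array([1.0, 4.0, 7.0, 9.0])   # manually chosen support points
gamma = 0.3

# L[i, j] = k(x_i, z_j; gamma), shape (40, 4), hence not square
L = np.exp(-gamma * (x_train[:, None] - z[None, :]) ** 2)

# thin SVD: L = V @ diag(sigma) @ W_T (NumPy returns W transposed)
V, sigma, W_T = np.linalg.svd(L, full_matrices=False)
w_svd = W_T.T @ ((V.T @ y_train) / sigma)

# reference: normal equations (L^T L)^{-1} L^T y
w_normal = np.linalg.solve(L.T @ L, L.T @ y_train)

# evaluation uses the kernel between support points and new inputs
x_new = np.linspace(0, 10, 5)
y_new = w_svd @ np.exp(-gamma * (z[:, None] - x_new[None, :]) ** 2)
```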
------
__Task:__ Write out the reformulation of the equation for ${\underline w}$ using the SVD and derive the efficient implementation on paper. Implement the efficient computation of the weights.<br>
Tune the hyperparameters to get a good fit.

In [None]:
## Definition of data, Optional: adjust the sampled function
my_function = lambda x: (-2*np.tanh( x-5) -2 + 1/3*x**2 -1/20*x**3 + np.exp( x/3) -1e-4*np.exp(x) -2*np.abs(x)**0.3).flatten()
interval = [0, 10]
n_samples = 125
noise = 0.15

x_train, y_train = sample.function( my_function, interval, n_samples, noise) 
x_valid = sample.uniform_interval( *interval)
y_valid = my_function( x_valid)


## Hyperparameters
kernel_function = kernels.linear #TODO #choose different kernel functions
parameters = [ 0.52] #TODO
support_points = np.array( [9, 1, 1.2] ) #TODO


## Model training
kernel_matrix = kernel_function( #TODO..., *parameters) 
L = kernel_matrix
coefficients = #TODO


## Evaluation
y_regression = evaluate.interpolation( #TODO, x_valid, kernel_function, parameters, coefficients) 

print( 'achieved MSE:', MSE( y_regression,y_valid) ) 
## Plotting
y_support = evaluate.interpolation( support_points, support_points, kernel_function, parameters, coefficients) 
fig, ax = plt.subplots( figsize=(6,6) )
ax.scatter( x_train, y_train, facecolor='lightblue', edgecolor='k', alpha=0.5, label='training samples' )
ax.scatter( support_points, y_support, facecolor='red', edgecolor='k', label='support points' )
ax.plot( x_valid, y_valid, color='blue', linewidth=2, label='reference solution' )
ax.plot( x_valid, y_regression, color='red', linewidth=2, label='regression' )

ax.grid( ls=':' )
_=ax.legend()
