
# Kernel Trick

When a simple (linear) model can't separate the data well, we can map the data into a higher-dimensional space where a simple separator may work.

## Higher Dimension

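Consider points on a line that no single cut can separate. Mapping each $x$ to $(x, x^2)$ lifts the points onto a parabola, where a straight line can separate them.

Similarly, mapping 2D data into 3D (or higher) is equivalent to using a higher-degree polynomial as the decision boundary in the original 2D space.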
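To make the line-to-parabola idea concrete, here is a minimal sketch on hypothetical 1D data (the points and the helper `best_threshold_accuracy` are made up for illustration). It checks every single-cut rule on the original line, then the same kind of cut on the new $x^2$ coordinate.

```python
import numpy as np

# Hypothetical 1D data: class 1 sits in the middle, class 0 on both sides,
# so no single cut point on the line separates them.
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([0, 0, 1, 1, 1, 0, 0])

def best_threshold_accuracy(values, labels):
    """Best accuracy of any single-cut rule 'value <= t' or 'value > t'."""
    best = 0.0
    for t in np.unique(values):
        for pred in (values <= t, values > t):
            best = max(best, np.mean(pred.astype(int) == labels))
    return best

# Lift each point onto the parabola (x, x^2); a horizontal line x^2 = t
# in this 2D space now separates the classes.
lifted = np.column_stack([x, x ** 2])

print(best_threshold_accuracy(x, y))             # cut on the original line
print(best_threshold_accuracy(lifted[:, 1], y))  # cut on the new x^2 axis
```

On the line, the best single cut still misclassifies two points; after lifting, a cut on $x^2$ classifies every point correctly.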

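We consider all the ways to combine the coordinates up to degree 2, which maps 2D to 5D:

$$(x, y) \mapsto (x, y, x^2, xy, y^2)$$

A linear separator in this 5D space is a 4D hyperplane; mapped back to the original 2D plane, it becomes a curved (quadratic) boundary.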
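A minimal sketch of the 2D → 5D map, assuming scikit-learn's `PolynomialFeatures` (with `include_bias=False` it produces exactly the columns $x, y, x^2, xy, y^2$). The data points and the hand-picked weights `w`, `b` are hypothetical: class 1 lies inside the unit circle, and a linear rule in 5D recovers the circle $x^2 + y^2 = 1$ in 2D.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical 2D data: class 1 inside the unit circle, class 0 outside.
X = np.array([[0.1, 0.2], [-0.3, 0.4], [0.5, -0.5],
              [1.5, 0.0], [-1.2, 1.1], [0.0, -2.0]])
y = np.array([1, 1, 1, 0, 0, 0])

# Degree-2 expansion without the constant term:
# (x, y) -> (x, y, x^2, xy, y^2), i.e. 2D -> 5D.
phi = PolynomialFeatures(degree=2, include_bias=False)
Z = phi.fit_transform(X)
print(Z.shape)

# In 5D a *linear* rule w . z + b is enough; these hand-picked weights
# give 1 - x^2 - y^2, so the 4D hyperplane w . z + b = 0 maps back to
# the circle x^2 + y^2 = 1 in the original 2D plane.
w = np.array([0.0, 0.0, -1.0, 0.0, -1.0])
b = 1.0
pred = (Z @ w + b > 0).astype(int)
print(pred)
```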

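A kernel corresponds to such a mapping $\phi$: it computes the dot product of two points in the mapped space, $K(\mathbf{a}, \mathbf{b}) = \phi(\mathbf{a}) \cdot \phi(\mathbf{b})$, usually without computing $\phi$ explicitly. This shortcut is the *kernel trick*.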
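As a small check of the kernel trick, the sketch below uses the degree-2 homogeneous polynomial kernel $K(\mathbf{a}, \mathbf{b}) = (\mathbf{a} \cdot \mathbf{b})^2$, whose explicit map in 2D is $\phi(x, y) = (x^2, \sqrt{2}\,xy, y^2)$. (The related kernel $(\mathbf{a} \cdot \mathbf{b} + 1)^2$ corresponds to the 5D map above plus a constant coordinate, with some coordinates scaled by $\sqrt{2}$.) The two test vectors are arbitrary.

```python
import numpy as np

# Explicit feature map for the degree-2 homogeneous polynomial kernel in 2D:
# phi(x, y) = (x^2, sqrt(2) * x * y, y^2)
def phi(p):
    x, y = p
    return np.array([x ** 2, np.sqrt(2) * x * y, y ** 2])

# The same kernel computed directly in the original 2D space.
def poly2_kernel(a, b):
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

explicit = np.dot(phi(a), phi(b))  # map to 3D, then take the dot product
trick = poly2_kernel(a, b)         # never leaves 2D
print(explicit, trick)
```

Both routes give the same number (up to floating-point rounding), but the kernel never builds the 3D vectors.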

A polynomial kernel gives the model more candidate boundaries to choose from than a linear kernel does.
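The extra flexibility of a polynomial kernel can be seen on a toy dataset. This sketch assumes scikit-learn's `make_circles` rings as stand-in data and compares training accuracy only (no held-out test set), so it illustrates what each kernel can represent rather than how well it generalizes.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line in 2D separates them.
X, y = make_circles(n_samples=200, factor=0.5, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
# Degree-2 polynomial kernel (gamma * a.b + coef0)^2 with coef0=1.
poly = SVC(kernel="poly", degree=2, coef0=1).fit(X, y)

print("linear training accuracy:", linear.score(X, y))
print("poly   training accuracy:", poly.score(X, y))
```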

# Kernel Discussion

> A **kernel** is a similarity measure between pairs of points; choosing a kernel implicitly chooses the space in which the data gets split into different parts

A common choice for SVMs is the **RBF** kernel

## RBF - Radial Basis Function Kernel

### Motivation

#### Intro Situation

Imagine two points. For each point, we can define a bump-shaped function centered on that point: tallest at the point itself and shrinking with distance.
![](images/rbf_2_separated.png)
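A minimal sketch of one bump per point, assuming the bumps are the standard Gaussian RBF $\exp(-\gamma (x - c)^2)$ in 1D. The two centers, the grid `xs`, and $\gamma = 1$ are hypothetical choices for illustration.

```python
import numpy as np

def rbf(x, center, gamma=1.0):
    """Bump centered at `center`: exp(-gamma * (x - center)^2), peak 1 at the center."""
    return np.exp(-gamma * (x - center) ** 2)

# Two hypothetical points on a line; each gets its own bump.
centers = np.array([-1.0, 2.0])
xs = np.linspace(-4, 5, 10)

for c in centers:
    # Heights of this point's bump along the grid xs.
    print(c, np.round(rbf(xs, c), 3))
```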

#### Intro Situation - One Function

![](images/rbf_2_combined.png)
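One way to read the single combined function is as a signed, weighted sum of the per-point bumps: hills for one class and valleys for the other. This is an assumption about the figure, chosen because it has the same form as an SVM decision function $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$. The points, labels, and equal weights below are hypothetical, with $b = 0$.

```python
import numpy as np

def rbf(x, center, gamma=1.0):
    return np.exp(-gamma * (x - center) ** 2)

# Hypothetical points, labels (+1 / -1) and equal weights.
centers = np.array([-1.0, 2.0])
labels = np.array([+1.0, -1.0])
weights = np.array([1.0, 1.0])

def combined(x):
    # Weighted sum of bumps: a hill over the +1 point, a valley over the -1 point.
    return sum(w * s * rbf(x, c) for w, s, c in zip(weights, labels, centers))

for x in (-1.0, 0.5, 2.0):
    print(x, round(float(combined(x)), 3))
```

The sign of `combined` matches each point's label, and it crosses zero halfway between the two points.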

#### Adding more points

![](images/rbf_3_combined.png)
![](images/rbf_3_all.png)

#### Q: What would it look like for 5 different points?

![](images/rbf_5_open.png)

#### Q: What about an additional point (not easily separable)?

![](images/rbf_6_open.png)

### How Is This Helpful?

> We can use the hills and valleys to separate the points!

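For each point, record the height of every RBF at that point

This turns each point into a vector with one entry per RBF, i.e. a point in a higher-dimensional space, where the classes are likely to be linearly separable

A hyperplane in that space then gives the weight for each RBF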

![Non-linear classifier using the kernel trick (source: https://www.researchgate.net/figure/Figure-B16-Non-linear-classifier-using-Kernel-trick_fig13_324250451)](images/kernel_trick_hyperdimensional.png)
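The recipe above (record RBF heights, then find a hyperplane) can be sketched explicitly. This assumes scikit-learn's `rbf_kernel` to compute the heights and swaps in a plain `LogisticRegression` as the linear classifier; the 1D points with alternating labels and $\gamma = 2$ are hypothetical. An RBF-kernel SVM does something similar in spirit implicitly, with its own regularization, so this is an illustration rather than what `SVC` computes internally.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.linear_model import LogisticRegression

# Hypothetical 1D points with alternating labels: no single cut separates them.
x = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 1, 0, 1, 0, 1])

# Row i = heights of every RBF (one centered on each point) at point i.
features = rbf_kernel(x, x, gamma=2.0)   # shape (6, 6): 1D -> 6D

# A plain linear model in the new space; its coefficients weight each RBF.
clf = LogisticRegression(C=100.0).fit(features, y)
print(features.shape, clf.score(features, y))
print(np.round(clf.coef_, 2))
```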

## Note on Hyperparameter $\gamma$

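The hyperparameter $\gamma$ sets the width of each RBF, $K(\mathbf{a}, \mathbf{b}) = \exp(-\gamma \lVert \mathbf{a} - \mathbf{b} \rVert^2)$: a big $\gamma$ gives narrow bumps, and a small $\gamma$ gives fat ones.

This lets us control overfitting: narrow bumps (big $\gamma$) follow individual points closely and risk overfitting, while fat bumps (small $\gamma$) give smoother boundaries and overfit less.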
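To see how $\gamma$ changes the width, the sketch below evaluates the RBF kernel (via scikit-learn's `rbf_kernel`) between two points one unit apart for a few arbitrary $\gamma$ values. A large $\gamma$ makes even nearby points look dissimilar, which is why each training point then influences only a small neighborhood.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

a = np.array([[0.0, 0.0]])
b = np.array([[1.0, 0.0]])  # one unit away from a

for gamma in (0.1, 1.0, 10.0):
    # K(a, b) = exp(-gamma * ||a - b||^2); here ||a - b||^2 = 1
    k = rbf_kernel(a, b, gamma=gamma)[0, 0]
    print(f"gamma={gamma:>4}: similarity at distance 1 = {k:.5f}")
```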

## Scikit Learn Example

> https://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html

In [None]:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.model_selection import GridSearchCV


# Utility function to move the midpoint of a colormap to be around
# the values of interest.

class MidpointNormalize(Normalize):

    def __init__(self, vmin=None, vmax=None, midpoint=None, clip=False):
        self.midpoint = midpoint
        Normalize.__init__(self, vmin, vmax, clip)

    def __call__(self, value, clip=None):
        x, y = [self.vmin, self.midpoint, self.vmax], [0, 0.5, 1]
        return np.ma.masked_array(np.interp(value, x, y))

# #############################################################################
# Load and prepare data set
#
# dataset for grid search

iris = load_iris()
X = iris.data
y = iris.target

# Dataset for decision function visualization: we only keep the first two
# features in X and sub-sample the dataset to keep only 2 classes and
# make it a binary classification problem.

X_2d = X[:, :2]
X_2d = X_2d[y > 0]
y_2d = y[y > 0]
y_2d -= 1

# It is usually a good idea to scale the data for SVM training.
# We are cheating a bit in this example in scaling all of the data,
# instead of fitting the transformation on the training set and
# just applying it on the test set.

scaler = StandardScaler()
X = scaler.fit_transform(X)
X_2d = scaler.fit_transform(X_2d)

# #############################################################################
# Train classifiers
#
# For an initial search, a logarithmic grid with base
# 10 is often helpful. Using a base of 2, a finer
# tuning can be achieved but at a much higher cost.

C_range = np.logspace(-2, 10, 13)
gamma_range = np.logspace(-9, 3, 13)
param_grid = dict(gamma=gamma_range, C=C_range)
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
grid = GridSearchCV(SVC(), param_grid=param_grid, cv=cv)
grid.fit(X, y)

print("The best parameters are %s with a score of %0.2f"
      % (grid.best_params_, grid.best_score_))

# Now we need to fit a classifier for all parameters in the 2d version
# (we use a smaller set of parameters here because it takes a while to train)

C_2d_range = [1e-2, 1, 1e2]
gamma_2d_range = [1e-1, 1, 1e1]
classifiers = []
for C in C_2d_range:
    for gamma in gamma_2d_range:
        clf = SVC(C=C, gamma=gamma)
        clf.fit(X_2d, y_2d)
        classifiers.append((C, gamma, clf))

# #############################################################################
# Visualization
#
# draw visualization of parameter effects

plt.figure(figsize=(8, 6))
xx, yy = np.meshgrid(np.linspace(-3, 3, 200), np.linspace(-3, 3, 200))
for (k, (C, gamma, clf)) in enumerate(classifiers):
    # evaluate decision function in a grid
    Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # visualize decision function for these parameters
    plt.subplot(len(C_2d_range), len(gamma_2d_range), k + 1)
    plt.title("gamma=10^%d, C=10^%d" % (np.log10(gamma), np.log10(C)),
              size='medium')

    # visualize parameter's effect on decision function
    plt.pcolormesh(xx, yy, -Z, cmap=plt.cm.RdBu)
    plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y_2d, cmap=plt.cm.RdBu_r,
                edgecolors='k')
    plt.xticks(())
    plt.yticks(())
    plt.axis('tight')

scores = grid.cv_results_['mean_test_score'].reshape(len(C_range),
                                                     len(gamma_range))

# Draw heatmap of the validation accuracy as a function of gamma and C
#
# The scores are encoded as colors with the hot colormap which varies from dark
# red to bright yellow. As the most interesting scores are all located in the
# 0.92 to 0.97 range we use a custom normalizer to set the mid-point to 0.92 so
# as to make it easier to visualize the small variations of score values in the
# interesting range while not brutally collapsing all the low score values to
# the same color.

plt.figure(figsize=(8, 6))
plt.subplots_adjust(left=.2, right=0.95, bottom=0.15, top=0.95)
plt.imshow(scores, interpolation='nearest', cmap=plt.cm.hot,
           norm=MidpointNormalize(vmin=0.2, midpoint=0.92))
plt.xlabel('gamma')
plt.ylabel('C')
plt.colorbar()
plt.xticks(np.arange(len(gamma_range)), gamma_range, rotation=45)
plt.yticks(np.arange(len(C_range)), C_range)
plt.title('Validation accuracy')
plt.show()