# Kernel Trick

In this exercise you learn to work with kernel functions directly.

a) What is the feature map $\phi$ of the polynomial kernel $K(\vec x, \vec y) = (\vec x^T \vec y)^d$ of degree $d = 2$ for two-dimensional observations $\vec x, \vec y \in \mathbb R^2$?

#### TODO

b) Consider the radial basis function (RBF) kernel, also known as the Gaussian kernel:
$$ K(\vec x, \vec y) = \exp \left(-\frac{1}{2}||\vec x - \vec y||^2\right) $$
It is possible to show that the feature map of this kernel satisfies:
$$ \phi(\vec x)^T \phi(\vec y) = \sum_{j=0}^\infty \frac{(\vec x^T \vec y)^j}{j!}\exp\left(-\frac{1}{2}||\vec x||^2\right)\exp\left(-\frac{1}{2}||\vec y||^2\right) $$
Evaluate this kernel:
 - What is the dimensionality of its feature map?
 - What does this mean for the training data?
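
The series expansion of the Gaussian kernel (with squared norms $||\vec x||^2$ and $||\vec y||^2$ in the exponentials) can be checked numerically. The following is a minimal sketch in plain Python; the helper names `rbf_kernel` and `rbf_series` and the two sample points are chosen for illustration only and are not part of the notebook.

```python
import math

def rbf_kernel(x, y):
    """Gaussian kernel exp(-0.5 * ||x - y||^2), evaluated directly."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-0.5 * sq_dist)

def rbf_series(x, y, n_terms):
    """First n_terms of sum_j (x.y)^j / j! * exp(-0.5||x||^2) * exp(-0.5||y||^2)."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    sq_norm_x = sum(xi * xi for xi in x)
    sq_norm_y = sum(yi * yi for yi in y)
    scale = math.exp(-0.5 * sq_norm_x) * math.exp(-0.5 * sq_norm_y)
    return scale * sum(dot ** j / math.factorial(j) for j in range(n_terms))

# Illustrative points; any pair of equally long vectors works
x, y = (0.5, -1.0), (1.5, 0.25)
exact = rbf_kernel(x, y)
for n_terms in (1, 2, 5, 20):
    # The absolute error shrinks as more terms of the series are included
    print(n_terms, abs(exact - rbf_series(x, y, n_terms)))
```

The kernel value itself is computed from a single squared distance, while the series only approaches it as more terms are added.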

#### TODO

# Support Vector Machines with SKLearn

In this task we will explore the decision boundaries of the SVM for two classes with $d = 2$. To do so, we will use the `make_blobs` function from `SKLearn`. In the Jupyter notebook you can find an example using `make_blobs` to generate some data. Furthermore, you can find a simple function `plot_svc_decision_function`, which plots the support vectors and decision boundaries of a trained SVM model.

In [None]:
from sklearn.datasets import make_blobs
from plotly import graph_objects as go
import numpy as np

# Example
blobs = make_blobs(n_samples=400,n_features=2,centers=2,cluster_std=2,random_state=42)
X = blobs[0]
y = blobs[1]

X0 = X[y==0]
X1 = X[y==1]

fig = go.Figure()
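# marker color (not fillcolor, which only affects area fills) sets the point color
fig.add_trace(go.Scatter(x=X0[:,0],y=X0[:,1],mode="markers",marker=dict(color="red")))
fig.add_trace(go.Scatter(x=X1[:,0],y=X1[:,1],mode="markers",marker=dict(color="blue")))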
fig.show()

In [None]:
def plot_svc_decision_function(model, xlim=None, ylim=None, plot_support=False):
    # A fitted model exposes decision_function; a plain callable does not,
    # so the call below fails for functions and is_fun is set to True
    is_fun = False
    try:
        model.decision_function([[0,0]])
    except Exception:
        is_fun = True
    if xlim is None:
        xlim = [0,1]
    if ylim is None:
        ylim = [0,1]
    
    # create grid to evaluate model
    x = np.linspace(xlim[0], xlim[1], 30)
    y = np.linspace(ylim[0], ylim[1], 30)
    Y, X = np.meshgrid(y, x)
    xy = np.vstack([X.ravel(), Y.ravel()]).T
    if is_fun:
        P = np.array([model(p) for p in xy]).reshape(X.shape)
    else:
        P = model.decision_function(xy).reshape(X.shape)
    
    # plot decision boundary and margins
    fig = go.Figure(layout=go.Layout(
        xaxis=dict(range=xlim),
        yaxis=dict(range=ylim)
    ))
    fig.add_trace(go.Contour(x=x,y=y,z=P.T,contours=dict(start=-1,end=1,size=1)))
    
    # plot support vectors
    if plot_support and not is_fun:
        pos_supp = model.support_vectors_[model.dual_coef_[0] >= 0]
        neg_supp = model.support_vectors_[model.dual_coef_[0] < 0]
        fig.add_trace(go.Scatter(x=pos_supp[:,0],y=pos_supp[:, 1], mode="markers", marker=dict(color="red")))
        fig.add_trace(go.Scatter(x=neg_supp[:,0],y=neg_supp[:, 1], mode="markers", marker=dict(color="cyan")))
    fig.show()

a) Use the `make_blobs` function to generate two datasets with $N=60$ and $N=120$ data points with $2$ centers and `cluster_std` equal to $0.6$. Make sure to use the same `random_state` (seed) for both data sets.

Train a linear SVM model with $C = 10^{10}$ on both data sets and plot the decision boundaries using the `plot_svc_decision_function` function.

Compare both decision boundaries - What do you see? Interpret the results with respect to the optimization problem of the SVM.
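
As a reminder for the interpretation, the soft-margin SVM is commonly stated as the primal problem below. This is the standard textbook form, assuming labels $y_i \in \{-1, 1\}$ and slack variables $\xi_i$; the notation used in the lecture may differ, and `LinearSVC` minimizes a closely related objective (by default with the squared hinge loss).

$$ \min_{\vec w, b, \vec \xi} \; \frac{1}{2}||\vec w||^2 + C \sum_{i=1}^N \xi_i \quad \text{s.t.} \quad y_i(\vec w^T \vec x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0 $$

The parameter $C$ weights the total slack against the margin term $\frac{1}{2}||\vec w||^2$.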

In [None]:
from sklearn.svm import LinearSVC # <- This is the linear SVM model of sklearn
# TODO

b) Use the `make_blobs` function to generate a dataset with $N=100$ data points with $2$ centers and a `cluster_std` of $0.8$.

Train two linear SVM models on this data set:
 - For the first SVM use $C = 10$.
 - For the second one, use $C = 0.1$.
 
Again, plot the decision boundaries using the `plot_svc_decision_function` function.

Compare both decision boundaries - What do you see? Interpret the results with respect to the optimization problem of the SVM.

In [None]:
# TODO

c) Implement a grid search to find the best SVM with respect to its cross-validated accuracy on the `iris` data set. Use $5$-fold cross-validation and vary the $C$ parameter as well as the kernels and their parameters. You may use SKLearn's `GridSearchCV` class to implement the grid search. A list of kernels can be found here: https://scikit-learn.org/stable/modules/svm.html#svm-kernels.

What is the best configuration you can find and what is its accuracy?
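
One possible starting point is sketched below. The grid is deliberately small and hypothetical (the values for `C` and `gamma` are illustrative, not recommended settings), and the variable names `X_iris`, `y_iris` and `search` are chosen here so that they do not overwrite the notebook's `X` and `y`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

# Hypothetical, deliberately small grid. Each dict is searched separately,
# so kernel-specific parameters are only combined with their own kernel.
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
]

# 5-fold cross-validation, scored by accuracy
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X_iris, y_iris)

print(search.best_params_)  # best parameter combination found in the grid
print(search.best_score_)   # its mean cross-validated accuracy
```

Passing a list of dicts rather than one dict avoids evaluating `gamma` for the linear kernel, where it has no effect.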

In [None]:
# TODO

# Threshold Logic Unit

a) Design a TLU Network for each of the following Boolean functions:

 - $f(x) = x_1 \neg x_2 \neg x_3 \lor \neg x_1 x_3 \lor x_1 x_4$
 - $g(x) = (x_1 \lor x_2) \land (\neg x_3 \lor x_2x_3) \land (x_4 x_5)$
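
To check a design by hand or by machine, a TLU can be simulated in a few lines. The sketch below assumes the convention that a unit outputs $1$ if the weighted sum of its inputs is greater than or equal to the threshold and $0$ otherwise; the lecture may use a strict inequality instead. The helper `tlu` and the example function `h` are illustrative and not one of the functions asked for above.

```python
from itertools import product

def tlu(weights, threshold):
    """Return a threshold logic unit: outputs 1 if sum(w_i * x_i) >= threshold, else 0."""
    def unit(*inputs):
        return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)
    return unit

# Elementary gates as single TLUs (one possible choice of weights and thresholds)
and2 = tlu([1, 1], 2)   # fires only if both inputs are 1
or2 = tlu([1, 1], 1)    # fires if at least one input is 1
not1 = tlu([-1], 0)     # fires only if the input is 0

# Small two-layer example: h(x) = x1 AND (NOT x2)
def h(x1, x2):
    return and2(x1, not1(x2))

# The same function also fits into a single unit
h_single = tlu([1, -1], 1)

# Print the full truth table of both variants
for x1, x2 in product([0, 1], repeat=2):
    print(x1, x2, h(x1, x2), h_single(x1, x2))
```

Enumerating all inputs with `product` gives the full truth table, which can be compared line by line with the Boolean function a network is supposed to implement.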

#### TODO

b) How many computations (summations and multiplications) are necessary to compute the output of the following networks? How many parameters are stored in each network?

*Note:* Here $10 \to 20$ denotes a layer with $10$ inputs and $20$ outputs.

 - $N_1: 10 \to 20 \to 10$
 - $N_2: 10 \to 200 \to 10 \to 10$
 - $N_3: 10 \to 200 \to 500 \to 10$
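
The counting scheme can be sketched for a small hypothetical network $3 \to 4 \to 2$. The function below assumes fully connected layers and the following convention, which may differ from the one used in the lecture: a unit with $n$ inputs needs $n$ multiplications and $n - 1$ additions for its weighted sum, plus one addition and one stored parameter for its bias (threshold).

```python
def count_network(layer_sizes, with_bias=True):
    """Count parameters, multiplications and additions of a fully connected network.

    layer_sizes lists the layer widths, e.g. [3, 4, 2] for 3 -> 4 -> 2.
    Assumed convention: a unit with n inputs needs n multiplications and
    n - 1 additions for the weighted sum, plus one addition and one stored
    parameter for its bias if with_bias is True.
    """
    params = mults = adds = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        params += n_in * n_out + (n_out if with_bias else 0)
        mults += n_in * n_out
        adds += (n_in - 1) * n_out + (n_out if with_bias else 0)
    return {"parameters": params, "multiplications": mults, "additions": adds}

print(count_network([3, 4, 2]))
# {'parameters': 26, 'multiplications': 20, 'additions': 20}
```

Setting `with_bias=False` shows how much of the count is due to the biases alone.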

#### TODO

c) Design a TLU structure with only two units which implements XOR.

*Hint:* Your design does not need to be a layered network.

#### TODO

# SVMs and Perceptrons

a) Consider a linear SVM with provided $\vec{w} = (-1,2)$ and $b = -3$.

Draw a perceptron network that encodes the linear SVM classifier with a minimal number of perceptrons.
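
A drawn network can be sanity-checked against the decision rule of the linear SVM, $\mathrm{sign}(\vec w^T \vec x + b)$. The sketch below uses the given $\vec w$ and $b$; the function name `linear_svm_decision`, the test points, and the choice to map points exactly on the boundary to $+1$ are assumptions made for illustration.

```python
def linear_svm_decision(x, w=(-1.0, 2.0), b=-3.0):
    """Classify x with sign(w^T x + b); points on the boundary are mapped to +1 here."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation >= 0 else -1

# Illustrative test points on both sides of the boundary
for point in [(0.0, 0.0), (0.0, 2.0), (-1.0, 3.0)]:
    print(point, linear_svm_decision(point))
```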

#### TODO

b) You are using an SVM with the $\tanh$ kernel function $k(x,y) = \tanh(\alpha x^Ty + c)$ on data in $\mathbb{R}^3$. After training, you obtain all primal and dual parameters of the kernel SVM and wish to encode the classifier in a perceptron network.

How many perceptrons are necessary to encode the parametrized classifier?

*Hint:* Take a look at the dual classification function on slide $2:93$ of the lecture. You will have to replace the product with the kernel.
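
For orientation, replacing the inner product with the kernel in the usual dual decision function gives the expression below. It assumes the standard form with support vectors $\vec x_i$, dual variables $\lambda_i$, labels $y_i$ and bias $b$; the notation on the lecture slide may differ.

$$ f(\vec x) = \mathrm{sign}\left(\sum_{i \in SV} \lambda_i y_i \tanh(\alpha \vec x_i^T \vec x + c) + b\right) $$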

#### TODO

c) In the Jupyter notebook, you are provided with a dataset and a fitted kernel SVM classifier. You can obtain the parameters of the kernel SVM via the fields `support_vectors_` $(x_i)$ and `dual_coef_` $(\lambda_i y_i)$. The parameter $\alpha$ is $0.25$ and $c$ is $-2$. Compute the matrices needed to evaluate classifications analogous to the formula from the lecture:

$$ f(\vec{x}) = \varphi_2(W_2 \cdot \mathbb{1} \varphi_1(W_1 \cdot \mathbb{1} \vec{x}))$$

Write a function `perceptron_svc` that returns $-1$ or $1$ for a given point, and test your perceptron network against the provided SVM with the `same_decision` function provided in the notebook.
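
The role of the padding in the formula can be illustrated on a small, made-up layer. The sketch assumes that $\mathbb{1}$ denotes prepending a constant $1$ to a vector, as the notebook's `padded` helper does, so that a bias can be stored as the first column of a weight matrix. The arrays `weights` and `biases` are hypothetical values and unrelated to the fitted SVM.

```python
import numpy as np

def padded(p, l):
    """Prepend ones to p until it has length l (same helper as in the notebook)."""
    ret = np.ones(l)
    ret[-len(p):] = p
    return ret

# Hypothetical layer with two units, two inputs and one bias per unit
weights = np.array([[0.5, -1.0],
                    [2.0, 0.25]])
biases = np.array([0.1, -0.3])

# Fold the biases into the first column: W @ [1, x1, x2] == weights @ x + biases
W = np.hstack([biases[:, None], weights])

x = np.array([1.5, -2.0])
print(W @ padded(x, 3))       # layer output via the padded input
print(weights @ x + biases)   # same values computed with an explicit bias
```

With this layout, each layer of the network reduces to one matrix-vector product followed by its activation function.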

In [None]:
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from plotly import graph_objects as go

moons = make_moons(1000,noise=.1,random_state=42)
X = moons[0]
y = moons[1]*2-1
svc = SVC(C=100,kernel='sigmoid', gamma=.25, coef0=-2)
svc.fit(X,y)

# Show decision boundaries
plot_svc_decision_function(svc, xlim=[-1.5,2.5], ylim=[-1,1.5])

# Show raw data
fig = go.Figure()
fig.add_trace(go.Scatter(x=X[y==-1][:,0],y=X[y==-1][:,1], mode="markers",marker=dict(color="red")))
fig.add_trace(go.Scatter(x=X[y==1][:,0],y=X[y==1][:,1], mode="markers",marker=dict(color="blue")))
fig.show()

In [None]:
a = .25
c = -2

# Adds as many 1's as necessary to stretch p to length l
# Example: padded([5,4,6],6) = [1,1,1,5,4,6]
def padded(p, l):
    ret = np.ones(l)
    ret[-len(p):] = p
    return ret

M1 = None # TODO
M2 = None # TODO

def perceptron_svc(p):
    global M1, M2
    # TODO
    return 1

In [None]:
def same_decision(func):
    global svc
    N = 2000
    l = []
    for p in np.random.sample(N).reshape(N//2,2)*4-1.5:
        l.append([*p,func(p)])
        if svc.predict([p])[0] != func(p):
            print("Prediction did not match on point ({:6.4f}, {:6.4f}).".format(p[0],p[1]))
            return False
    l = np.array(l)
    fig = go.Figure()
    la = l[l[:,2]>0][:,:2].T
    lb = l[l[:,2]<0][:,:2].T
    fig.add_trace(go.Scatter(x=la[0],y=la[1],mode="markers"))
    fig.add_trace(go.Scatter(x=lb[0],y=lb[1],mode="markers"))
    fig.show()
    print("All predictions were the same as the kernel SVM.")
    return True

same_decision(perceptron_svc)