# Self-study try-it activity 14.1: Selecting hyperplanes for two-dimensional data

## Assignment overview:

In this assignment, you will work with two different sets of data points: one exhibiting a linear pattern and the other a non-linear pattern. Your task is to determine which type of kernel performs best for each data set and provide a rationale for your selection.



In [None]:
# Import the necessary libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import svm
import math
import warnings
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
#Use the following to plot SVMs.
from mlxtend.plotting import plot_decision_regions # On terminal, install: pip install mlxtend

#### Task 1: SVM-based classifier on a simple data set
Load the two-dimensional data `Case1linear/X.npy`, which has 40 rows (20 per class) and two columns, and the corresponding target `Case1linear/y.npy`.

In [None]:
n = 20 # n points in each group
# number used for colour array construction (assumes balanced classes)
X = np.load('data/Case1linear/X.npy')
y = np.load('data/Case1linear/y.npy')

**Plot the two-dimensional points and colour them so that points with the same class have the same colour.**

In [None]:
# Assumes the first n rows belong to one class and the remaining n rows to the other.
color = np.concatenate((np.repeat("red", n), np.repeat("blue", n)), axis=0)
plt.scatter(X[:,0], X[:,1], c = color, alpha= .5)
plt.xlabel("$x_1$", size = 15)
plt.ylabel("$x_2$", size = 15)
plt.title("Two classes that are linearly separable")
plt.show()
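
The colour array above relies on the rows being stored class by class. A minimal sketch of an alternative, assuming only that `y` holds one label per row, is to map each label to a colour directly. The helper name `colours_from_labels` and the toy labels are illustrative and not part of the assignment.

```python
import numpy as np

def colours_from_labels(labels, palette=("red", "blue")):
    """Return one colour per sample, chosen by the sample's class label."""
    classes = np.unique(labels)  # sorted distinct labels
    if len(classes) > len(palette):
        raise ValueError("palette has fewer colours than there are classes")
    # Smallest label gets the first colour, next label the second, and so on.
    lookup = dict(zip(classes.tolist(), palette))
    return np.array([lookup[label] for label in np.asarray(labels).tolist()])

# Toy labels in mixed order (hypothetical, not the assignment data).
y_toy = np.array([1, 0, 1, 0, 0])
colours_toy = colours_from_labels(y_toy)
```

Passing `c=colours_from_labels(y)` to `plt.scatter` would then colour the points by class regardless of row order.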

**Train an SVM classifier with a linear kernel.**

In [None]:
clf = svm.SVC(kernel = 'linear')
clf.fit(X, y);

**Show the confusion matrix on the training set.**

In [None]:
cm = confusion_matrix(y, clf.predict(X))
cmd = ConfusionMatrixDisplay(cm, display_labels=np.unique(y))
cmd.plot();
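
As a reading aid for the plot above: in scikit-learn's confusion matrix, rows are true classes and columns are predicted classes, so correct predictions sit on the diagonal. The sketch below uses made-up labels (not the assignment data) to show how training accuracy follows from the matrix.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels with one misclassified sample.
y_true_toy = np.array([0, 0, 0, 1, 1, 1])
y_pred_toy = np.array([0, 0, 1, 1, 1, 1])

cm_toy = confusion_matrix(y_true_toy, y_pred_toy)

# Accuracy is the sum of the diagonal divided by the total number of samples.
accuracy_toy = np.trace(cm_toy) / cm_toy.sum()
print(cm_toy)
print(accuracy_toy)
```

A matrix whose off-diagonal entries are all zero therefore corresponds to an accuracy of 1.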

**Visualise the decision boundary of the kernel by using the `mlxtend` package.**

In [None]:
# Suppress warnings raised while plotting, e.g. when all samples belong to the same class and there is no boundary to draw.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    plot_decision_regions(X, y, clf=clf, legend=2, colors = "red,blue", markers= "o", zoom_factor = 10);
    ax=plt.gca();
    plt.title("Linear decision boundary")
    plt.show();

## Question:
Relate the visualisation of the decision boundary to the performance observed in the confusion matrix.

## Answer:
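The visualisation shows that a straight line separates the two classes, so the data is linearly separable. Consistent with this, the confusion matrix indicates perfect classification: every training point lies on the diagonal.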
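
To make the link between the boundary and the predictions concrete, the sketch below fits a linear SVM to synthetic, well-separated clusters (an assumed stand-in for `Case1linear`, not the real files) and reads the hyperplane $w \cdot x + b = 0$ from the fitted model. The names `X_demo`, `y_demo` and `clf_demo` are illustrative.

```python
import numpy as np
from sklearn import svm

# Synthetic stand-in data: two Gaussian clusters of 20 points each.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=[-3.0, -3.0], scale=0.5, size=(20, 2)),
    rng.normal(loc=[3.0, 3.0], scale=0.5, size=(20, 2)),
])
y_demo = np.repeat([0, 1], 20)

clf_demo = svm.SVC(kernel="linear").fit(X_demo, y_demo)

# For a linear kernel, the separating hyperplane is w . x + b = 0.
w = clf_demo.coef_[0]
b = clf_demo.intercept_[0]

# The sign of w . x + b decides the predicted class.
scores = X_demo @ w + b
predicted = (scores > 0).astype(int)
train_accuracy = clf_demo.score(X_demo, y_demo)
print(w, b, train_accuracy)
```

Points on one side of the line get a positive score and the other side a negative score, which is why a boundary that cleanly splits the classes goes hand in hand with a diagonal confusion matrix.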

#### Task 2: SVM-based classifier on nested data
Load the two-dimensional data `Case1rings/X.npy`, which has 150 rows and two columns, and the corresponding target `Case1rings/y.npy`.

In [None]:
X = np.load('data/Case1rings/X.npy')
y = np.load('data/Case1rings/y.npy')

**Plot the two-dimensional points and colour them so that points with the same class have the same colour.**

In [None]:
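n = 100  # Assumes the first 100 rows are one class and the remaining 50 rows the other.
# np.repeat needs an integer count, so use integer division (n // 2) rather than n / 2.
color = np.concatenate((np.repeat("blue", n), np.repeat("red", n // 2)), axis=0)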
plt.scatter(X[:,0], X[:,1], c = color, alpha= .5)
plt.xlabel("$x_1$", size = 15)
plt.ylabel("$x_2$", size = 15)
plt.title("A clustered class inside a ring-shaped class")
plt.show()

**Train an SVM classifier with a linear kernel.**

In [None]:
clf = svm.SVC(kernel = 'linear')
clf.fit(X, y);

**Show the confusion matrix on the training set.**

In [None]:
cm = confusion_matrix(y, clf.predict(X))
cmd = ConfusionMatrixDisplay(cm, display_labels=np.unique(y))
cmd.plot();

**Visualise the decision boundary of the kernel by using the `mlxtend` package.**

In [None]:
# Suppress warnings raised while plotting, e.g. when all samples belong to the same class and there is no boundary to draw.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    plot_decision_regions(X, y, clf=clf, legend=2, colors = "red,blue", markers= "o", zoom_factor = 10);
    ax=plt.gca();
    plt.show();

**Train an SVM classifier with a radial basis function (RBF) kernel.**

In [None]:
clf = svm.SVC(kernel = 'rbf')
clf.fit(X, y)
cm = confusion_matrix(y, clf.predict(X))
cmd = ConfusionMatrixDisplay(cm, display_labels=np.unique(y))
cmd.plot();

In [None]:
# Suppress warnings raised while plotting, e.g. when all samples are classified as '1' and there is no boundary to draw.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    plot_decision_regions(X, y, clf=clf, legend=2, colors = "red,blue", markers= "o", zoom_factor = 10);
    ax=plt.gca();
    plt.show();

**Train an SVM classifier with a polynomial kernel of degree 2.**

In [None]:
clf = svm.SVC(kernel = 'poly', degree = 2)
clf.fit(X, y)
cm = confusion_matrix(y, clf.predict(X))
cmd = ConfusionMatrixDisplay(cm, display_labels=np.unique(y))
cmd.plot();

In [None]:
# Suppress warnings raised while plotting, e.g. when all samples are classified as '1' and there is no boundary to draw.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    plot_decision_regions(X, y, clf=clf, legend=2, colors = "red,blue", markers= "o", zoom_factor = 10);
    ax=plt.gca();
    plt.show();
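
One way to see why a degree-2 polynomial kernel can handle ring-shaped data: a circle $x_1^2 + x_2^2 = r^2$ is a linear equation in the squared features. The sketch below illustrates this idea explicitly rather than through the kernel. It assumes synthetic data from scikit-learn's `make_circles` (a stand-in, not the `Case1rings` files) and adds the squared radius as a third feature before fitting a linear SVM. All variable names are illustrative.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_circles

# Synthetic stand-in: a small inner ring surrounded by a larger outer ring.
X_demo, y_demo = make_circles(n_samples=150, factor=0.3, noise=0.05, random_state=0)

# Explicit feature: squared distance from the origin, x1^2 + x2^2.
r_squared = (X_demo ** 2).sum(axis=1, keepdims=True)
X_lifted = np.hstack([X_demo, r_squared])

# A linear SVM on the raw 2-D points versus on the lifted 3-D points.
acc_raw = svm.SVC(kernel="linear").fit(X_demo, y_demo).score(X_demo, y_demo)
acc_lifted = svm.SVC(kernel="linear").fit(X_lifted, y_demo).score(X_lifted, y_demo)
print(acc_raw, acc_lifted)
```

In the lifted space the two classes differ mainly in the third coordinate, so a flat plane can separate them. The polynomial kernel achieves a comparable effect implicitly, without constructing the extra features by hand.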

## Question:
Compare the classifiers obtained in this case and discuss why some are better than others.

## Answer:

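The linear kernel fails because no straight line can separate a cluster from the ring that surrounds it, so the data is not linearly separable. The RBF and degree-2 polynomial kernels succeed because they model non-linear decision boundaries, here a closed curve around the inner cluster.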
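
The comparison can also be made numerically. The sketch below is a stand-in under stated assumptions: it uses scikit-learn's `make_circles` to generate ring-shaped data similar in spirit to `Case1rings` (the real files are not loaded, and the synthetic classes are balanced), fits the same three kernels with default settings, and records the training accuracy of each.

```python
from sklearn import svm
from sklearn.datasets import make_circles

# Synthetic stand-in for the nested data set.
X_demo, y_demo = make_circles(n_samples=150, factor=0.3, noise=0.05, random_state=0)

# The three classifiers compared in Task 2.
models = {
    "linear": svm.SVC(kernel="linear"),
    "rbf": svm.SVC(kernel="rbf"),
    "poly2": svm.SVC(kernel="poly", degree=2),
}

# Fit each model and store its accuracy on the training set.
accuracies = {
    name: model.fit(X_demo, y_demo).score(X_demo, y_demo)
    for name, model in models.items()
}

for name, acc in accuracies.items():
    print(f"{name}: training accuracy = {acc:.2f}")
```

Training accuracy alone does not measure generalisation; on real data, a held-out test split would give a fairer comparison between kernels.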