# Support Vector Machines
Load the `mnist` dataset. Split it into training and test sets. Train and test a support vector machine model using scikit-learn. Check the documentation to identify the most important hyperparameters, attributes, and methods of the model. Use them in practice.

## Importing Libraries

In [1]:
import pandas as pd
import sklearn.model_selection
import sklearn.metrics
import sklearn.svm
import plotly.express as px

## Loading the Dataset

In [2]:
df = pd.read_csv("../../datasets/mnist.csv")
df = df.set_index("id")
df.head(3)

| id | class | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | pixel9 | ... | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 | pixel784 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 31953 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 34452 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 60897 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
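Each row stores one grayscale digit image flattened into the 784 columns `pixel1` to `pixel784`. The sketch below shows how such a row maps back to a 28×28 image. It assumes row-major pixel order (as in the OpenML `mnist_784` dataset), and it uses a synthetic row instead of `mnist.csv` so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one mnist.csv row: values 0..783 in columns pixel1..pixel784
pixel_columns = [f"pixel{i}" for i in range(1, 785)]
row = pd.Series(np.arange(784), index=pixel_columns)

# Assumed row-major layout: pixel1..pixel28 form the first image row,
# pixel29..pixel56 the second row, and so on
image = row[pixel_columns].to_numpy().reshape(28, 28)

print(image.shape)  # (28, 28)
print(image[1, 0] == row["pixel29"])  # True
```

With the real data, `df.iloc[0][pixel_columns]` would take the place of the synthetic row, and `px.imshow(image)` can display the result.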


## Splitting Data into Training and Test Sets

In [3]:
x = df.drop(["class"], axis=1)
y = df["class"]

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y)
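`train_test_split` is called above with its defaults, so 25% of the rows go to the test set and the split changes on every run. A sketch of a reproducible, class-balanced split follows. It uses scikit-learn's small 8×8 `load_digits` set as a stand-in for `mnist.csv`, and the `stratify` and `random_state` choices are suggestions, not what the notebook does.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Stand-in data: 1797 digit images of 8x8 pixels instead of mnist.csv
x, y = load_digits(return_X_y=True)

# stratify=y keeps the class proportions the same in both splits;
# random_state fixes the shuffle so the split is reproducible
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, stratify=y, random_state=0
)

print(x_train.shape, x_test.shape)  # (1347, 64) (450, 64)

# Largest difference in class share between the training and test sets
train_share = np.bincount(y_train) / len(y_train)
test_share = np.bincount(y_test) / len(y_test)
print(np.abs(train_share - test_share).max())
```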

## Training a Model

In [4]:
model = sklearn.svm.SVC()
model.fit(x_train, y_train);
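MNIST has ten classes, but an SVM is a binary classifier. `SVC` handles this by training one binary classifier per pair of classes (one-vs-one). The `decision_function_shape` hyperparameter only changes how the scores from `decision_function` are presented. The sketch below demonstrates this on `load_digits`, which stands in for `mnist.csv` here.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

x, y = load_digits(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

ovo_model = SVC(decision_function_shape="ovo").fit(x_train, y_train)
ovr_model = SVC(decision_function_shape="ovr").fit(x_train, y_train)

n_classes = len(ovo_model.classes_)
ovo_scores = ovo_model.decision_function(x_test)  # one column per pair of classes
ovr_scores = ovr_model.decision_function(x_test)  # one column per class

print(n_classes, ovo_scores.shape[1], ovr_scores.shape[1])  # 10 45 10
```

Ten classes give 10 · 9 / 2 = 45 pairwise classifiers. The predicted labels are the same for both settings.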

## Testing the Model

In [5]:
y_predicted = model.predict(x_test)
accuracy = sklearn.metrics.accuracy_score(y_test, y_predicted)
accuracy

0.936
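A single accuracy value hides which digits get confused with each other. The sketch below adds a confusion matrix and a per-class report, and it reads a few fitted attributes of `SVC` (`classes_`, `n_support_`, `support_vectors_`). It runs on `load_digits` as a stand-in for `mnist.csv`, so its numbers will differ from the 0.936 above.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

x, y = load_digits(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

model = SVC().fit(x_train, y_train)
y_predicted = model.predict(x_test)

# Rows are true classes, columns are predicted classes; the diagonal counts correct predictions
matrix = confusion_matrix(y_test, y_predicted)
print(matrix)
print(classification_report(y_test, y_predicted))

# Fitted attributes: the class labels, the number of support vectors per class,
# and the support vectors themselves (one row per retained training sample)
print(model.classes_)
print(model.n_support_)
print(model.support_vectors_.shape)
```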

## Hyperparameter Tuning

In [12]:
c_list = range(1, 40, 3)  # C = 1, 4, 7, ..., 37
kernel_list = ["linear", "poly", "rbf", "sigmoid"]
rows = []

for c in c_list:
    for kernel in kernel_list:
        model = sklearn.svm.SVC(C=c, kernel=kernel)
        model.fit(x_train, y_train)
        y_predicted = model.predict(x_test)
        accuracy = sklearn.metrics.accuracy_score(y_test, y_predicted)
        # DataFrame.append was removed in pandas 2.0, so collect the rows and build the frame once
        rows.append({"C": c, "Kernel": kernel, "Accuracy": accuracy})

result_df = pd.DataFrame(rows, columns=["C", "Kernel", "Accuracy"])
result_df

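| | C | Kernel | Accuracy |
|---|---|---|---|
| 0 | 1 | linear | 0.896 |
| 1 | 1 | poly | 0.897 |
| 2 | 1 | rbf | 0.936 |
| 3 | 1 | sigmoid | 0.853 |
| 4 | 4 | linear | 0.896 |
| 5 | 4 | poly | 0.913 |
| 6 | 4 | rbf | 0.938 |
| 7 | 4 | sigmoid | 0.812 |
| 8 | 7 | linear | 0.896 |
| 9 | 7 | poly | 0.915 |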
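The loop above picks `C` and the kernel by their accuracy on the test set. This means the test set is used for model selection, so the best score it reports is likely optimistic. One common alternative is scikit-learn's `GridSearchCV`, which tunes the same grid with cross-validation on the training split and keeps the test split for a final check. The sketch uses `load_digits` as a stand-in for `mnist.csv`, and the choice of 3 folds is an arbitrary assumption.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

x, y = load_digits(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# Same grid as the manual loop: 13 values of C times 4 kernels
param_grid = {"C": list(range(1, 40, 3)), "kernel": ["linear", "poly", "rbf", "sigmoid"]}

# 3-fold cross-validation on the training split only; refit=True (the default)
# retrains the best configuration on the whole training split afterwards
search = GridSearchCV(SVC(), param_grid, cv=3, scoring="accuracy")
search.fit(x_train, y_train)

print(search.best_params_)
print(f"cross-validated accuracy: {search.best_score_:.3f}")
print(f"held-out test accuracy: {search.score(x_test, y_test):.3f}")

# Table comparable to result_df, but built from cross-validation scores
cv_df = pd.DataFrame(search.cv_results_)[["param_C", "param_kernel", "mean_test_score"]]
print(cv_df.head())
```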


In [13]:
c_df = result_df[result_df["Kernel"]=="rbf"]

fig = px.line(x=c_df["C"], y=c_df["Accuracy"], labels={'x':'C', 'y':'Accuracy'})
fig.show()
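The plot shows the effect of `C` visually. The best configuration can also be read directly from `result_df`. The sketch below rebuilds only the ten rows printed above, so its answer applies to that excerpt and not to the full grid.

```python
import pandas as pd

# The ten result_df rows printed above (the full table has one row per C/kernel pair)
result_df = pd.DataFrame({
    "C": [1, 1, 1, 1, 4, 4, 4, 4, 7, 7],
    "Kernel": ["linear", "poly", "rbf", "sigmoid"] * 2 + ["linear", "poly"],
    "Accuracy": [0.896, 0.897, 0.936, 0.853, 0.896, 0.913, 0.938, 0.812, 0.896, 0.915],
})

# idxmax returns the index label of the highest accuracy; loc selects that whole row
best = result_df.loc[result_df["Accuracy"].idxmax()]
print(best["C"], best["Kernel"], best["Accuracy"])  # 4 rbf 0.938
```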

In [16]:
kernel_df = result_df[result_df["C"]==22]

fig = px.bar(kernel_df, x='Kernel', y='Accuracy')
fig.show()
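The bar chart compares kernels for a single `C` value. A pivot table shows every `C` against every kernel at once. The sketch again uses only the ten rows printed earlier. Combinations missing from that excerpt appear as `NaN` and are skipped by `idxmax`.

```python
import pandas as pd

# The ten result_df rows printed earlier
result_df = pd.DataFrame({
    "C": [1, 1, 1, 1, 4, 4, 4, 4, 7, 7],
    "Kernel": ["linear", "poly", "rbf", "sigmoid"] * 2 + ["linear", "poly"],
    "Accuracy": [0.896, 0.897, 0.936, 0.853, 0.896, 0.913, 0.938, 0.812, 0.896, 0.915],
})

# One row per C value, one column per kernel (columns sorted alphabetically)
table = result_df.pivot(index="C", columns="Kernel", values="Accuracy")
print(table)

# Best evaluated kernel for each C value
best_kernel = table.idxmax(axis=1)
print(best_kernel)
```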