![img.png](..%2F..%2Fimgs%2Fimg.png)


### MNIST Digits - Classification Using SVM

**Objective** We will develop a Support Vector Machine (SVM) model that classifies handwritten digits from 0-9 based on the pixel values given as features. This is a 10-class classification problem.

**Feature** Each image is 28 x 28 pixels, and we treat each pixel as a feature, giving 784 features per image.
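To make the 28 x 28 to 784 mapping concrete, here is a minimal sketch using a random stand-in batch (the array `images` is made up for illustration and is not real MNIST data). It shows that a row-by-row reshape places pixel `(r, c)` at feature index `r * 28 + c`.

```python
import numpy as np

# Stand-in batch of 3 grayscale "images", 28 x 28 pixels each (random values, not MNIST).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Flatten each image row by row into a single feature vector.
features = images.reshape(len(images), -1)
print(features.shape)  # (3, 784)

# Pixel (row r, column c) of an image becomes feature r * 28 + c.
r, c = 5, 7
same_pixel = features[0, r * 28 + c] == images[0, r, c]
print(same_pixel)  # True
```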


### Load data

In [2]:
from sklearn.svm import SVC
from sklearn.model_selection import KFold
from sklearn.model_selection import GridSearchCV

import torchvision

train_dataset = torchvision.datasets.MNIST(root='../../data/',
                                           train=True,
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='../../data/',
                                          train=False)



### Transfer data to the required format

In [3]:

# torchvision exposes the images as `.data` and the labels as `.targets`
# (`train_data`, `train_labels`, `test_data`, `test_labels` are deprecated aliases).
training_data = train_dataset.data.numpy()[:5000].reshape(5000, -1)
# (5000, 28, 28) -> (5000, 784)
training_label = train_dataset.targets[:5000].numpy()

test_data = test_dataset.data.numpy()[:5000].reshape(5000, -1)
test_label = test_dataset.targets[:5000].numpy()



In [4]:
# Print training and test data sizes
print('Training data size: ', training_data.shape)
print('Training data label size:', training_label.shape)
print('Test data size: ', test_data.shape)
print('Test data label size:', test_label.shape)

Training data size:  (5000, 784)
Training data label size: (5000,)
Test data size:  (5000, 784)
Test data label size: (5000,)


### Normalization

In [None]:
# Normalization: scale pixel values from [0, 255] to [0, 1].
# Training and test data must use the same divisor.

training_data = training_data / 255.0
test_data = test_data / 255.0

### Build model (Linear-based SVM)

1. Linear-based SVM
SVM is a supervised machine learning algorithm used for classification and regression problems. In its basic form, it aims to find an optimal boundary between two possible outputs, so it assigns an instance to exactly one of two classes: yes/no, 1/0, or true/false.

In its base form, linear separation, SVM tries to find the line that maximizes the separation between the two classes of a data set of points in 2-dimensional space. To generalize, the objective is to find a hyperplane in an n-dimensional space that maximizes the margin between the data points of the different classes. The data points with the minimum distance to the hyperplane (the closest points) are called Support Vectors.

In the image below, the Support Vectors are the 3 points (2 blue and 1 green) lying on the dashed lines, and the separating hyperplane is the solid red line:

![img.png](..%2F..%2Fimgs%2Flinear_svm.png)
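As a minimal sketch of the same idea in code, the example below fits a linear SVM to six made-up 2-D points (the points and the large `C`, chosen to approximate a hard margin, are illustrative assumptions and not part of the MNIST experiment). After fitting, scikit-learn exposes the support vectors through `support_vectors_`.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable clusters (toy data, not MNIST).
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C penalizes margin violations heavily, approximating a hard-margin SVM.
toy_linear = SVC(kernel="linear", C=1000.0)
toy_linear.fit(X, y)

# The points closest to the separating hyperplane define the margin.
print("Support vectors:\n", toy_linear.support_vectors_)
# For a linear kernel the hyperplane is w . x + b = 0.
print("w =", toy_linear.coef_[0], "b =", toy_linear.intercept_[0])

toy_predictions = toy_linear.predict([[0.0, 0.0], [6.0, 6.0]])
print(toy_predictions)
```

In this toy set, only the points nearest the opposite cluster end up as support vectors; moving any of the other points farther away would not change the fitted hyperplane.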


In [6]:
# linear model
# C: float, default=1.0
# model_linear = SVC(kernel='linear')
# model_linear.fit(training_data, training_label)


2. Non-linear model
Linear models have linear decision boundaries (intersecting hyperplanes), while non-linear kernel models (polynomial or Gaussian RBF) have more flexible non-linear decision boundaries whose shapes depend on the kind of kernel and its parameters.

![img.png](..%2F..%2Fimgs%2Flinear_svm.png)
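To illustrate the difference, here is a small sketch on a synthetic data set that no straight line can separate: two concentric circles generated with `make_circles`. The data set and its parameters are assumptions chosen for illustration. The scores are training accuracies, used only to compare how well each kernel can fit this shape.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Inner circle = class 1, outer circle = class 0 (synthetic data, not MNIST).
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Same data, two kernels, default C and gamma.
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

print("linear kernel training accuracy:", linear_acc)
print("rbf kernel training accuracy:   ", rbf_acc)
```

The RBF kernel can draw a closed boundary around the inner circle, which a single hyperplane in the original 2-D space cannot do.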


In [5]:

# non-linear model
# using rbf kernel, C=1, default value of gamma

# model
non_linear_model = SVC(kernel='rbf')
non_linear_model.fit(training_data, training_label)

# predict
y_pred = non_linear_model.predict(test_data)

### Prediction and accuracy

In [6]:
# Accuracy
from sklearn import metrics
y_pred = non_linear_model.predict(test_data)

print("Accuracy without best param:", metrics.accuracy_score(y_true=test_label, y_pred=y_pred), "\n")


Accuracy without best param: 0.932 



### Binary Classification VS Multiclass Classification

In its simplest form, SVM does not support multiclass classification natively. It supports binary classification, separating data points into two classes. For multiclass classification, the same principle is applied after breaking the multiclass problem down into multiple binary classification problems.

The idea is to map data points to a high-dimensional space to gain mutual linear separation between every two classes. This is called the One-to-One approach (also known as one-vs-one): it trains one binary classifier for each pair of classes.

Another approach is One-to-Rest (also known as one-vs-rest), which trains one binary classifier per class, separating that class from all the others.

A single SVM does binary classification and can differentiate between two classes. Under the two breakdown approaches, classifying data points from a data set with $m$ classes works as follows:

![img.png](https://www.baeldung.com/wp-content/uploads/sites/4/2020/10/multiclass-svm1.png)
In the One-to-Rest approach, the classifier can use $m$ SVMs. Each SVM would predict membership in one of the $m$ classes.
![img.png](https://www.baeldung.com/wp-content/uploads/sites/4/2020/10/multiclass-svm2-e1601952762246.png)

In the One-to-One approach, the classifier can use $\frac{m (m-1)}{2}$ SVMs.
![img.png](https://www.baeldung.com/wp-content/uploads/sites/4/2020/10/multiclass-svm3-e1601952776445.png)
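The two counts can be checked in scikit-learn through the `decision_function_shape` parameter of `SVC`. The sketch below uses scikit-learn's small bundled digits data set (8 x 8 images, an assumption made here to keep it fast, not the MNIST data used in this notebook). Note that `SVC` always trains one-vs-one classifiers internally; the `'ovr'` setting only changes the shape of the returned decision values.

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC

# Small bundled digits data set: 10 classes, 64 features, pixel values in [0, 16].
X, y = load_digits(return_X_y=True)
X = X / 16.0
m = len(set(y))

ovo = SVC(kernel="rbf", decision_function_shape="ovo").fit(X, y)
ovr = SVC(kernel="rbf", decision_function_shape="ovr").fit(X, y)

# One column per pair of classes: m * (m - 1) / 2
ovo_shape = ovo.decision_function(X[:1]).shape
# One column per class: m
ovr_shape = ovr.decision_function(X[:1]).shape

print(m, ovo_shape, ovr_shape)  # 10 (1, 45) (1, 10)
```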



In [11]:
print(non_linear_model.decision_function(test_data)[0])

[ 1.76263225 -0.2926847   6.12095354  7.2675715   3.76465637  3.81325744
  0.70350723  9.31040532  3.79613291  8.27581795]


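A toy One-to-One example with three classes A, B, and C (illustrative values, not model output):

- Pairwise classifiers: [AB, AC, BC]
- Decision values: [1.2343, -9.33, -4.32]
- Winner of each pair, taking a positive value as a vote for the first class of the pair: [A, C, C]

Class C collects the most votes, so it is the prediction.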
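The A/B/C lists above can be read as a one-to-one vote. Here is a minimal sketch of that vote in plain Python, assuming a positive decision value is a vote for the first class of each pair and a negative value is a vote for the second. The class names and values are the illustrative ones from the lists, not output from the fitted model.

```python
from collections import Counter
from itertools import combinations

classes = ["A", "B", "C"]
# All unordered pairs, in order: (A, B), (A, C), (B, C)
pairs = list(combinations(classes, 2))
decision_values = [1.2343, -9.33, -4.32]

# Each pairwise classifier votes for one of its two classes.
winners = [first if value > 0 else second
           for (first, second), value in zip(pairs, decision_values)]

# The class with the most votes is the prediction.
votes = Counter(winners)
predicted = votes.most_common(1)[0][0]

print(winners, predicted)  # ['A', 'C', 'C'] C
```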

### Optimisation

- Training method
    - Cross Validation (KFold): in each round, one fold is used as the test set and all the remaining folds as the training set.


![kfold](..%2F..%2Fimgs%2Fkfold.png)
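To see what the folds look like, here is a short sketch that splits ten made-up samples into 5 folds (the tiny array is an assumption for readability). Each sample lands in the held-out fold exactly once.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten dummy samples, identified by their index.
X = np.arange(10).reshape(10, 1)

kf = KFold(n_splits=5, shuffle=True, random_state=10)
splits = list(kf.split(X))

for round_number, (train_idx, held_out_idx) in enumerate(splits, start=1):
    # 8 samples for training, 2 held out in every round.
    print(f"round {round_number}: train={train_idx} held out={held_out_idx}")

# Across all rounds, every sample is held out exactly once.
all_held_out = np.sort(np.concatenate([held_out for _, held_out in splits]))
```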

In [13]:
# creating a KFold object with 5 splits
folds = KFold(n_splits = 5, shuffle = True, random_state = 10)

In [None]:

# specify range of hyperparameters
# Set the parameters by cross-validation
hyper_params = [ {'gamma': [1e-2, 1e-3, 1e-4],
                     'C': [5,10]}]

# specify model
# model = SVC(kernel="rbf")

# set up GridSearchCV()
model_cv = GridSearchCV(estimator = non_linear_model,
                        param_grid = hyper_params,
                        scoring= 'accuracy',
                        cv = folds,
                        verbose = 1,
                        return_train_score=True)


In [14]:
# fit the model
model_cv.fit(training_data, training_label)

# printing the best mean cross-validation accuracy and the corresponding hyperparameters
best_score = model_cv.best_score_
best_hyperparams = model_cv.best_params_

print("The best cross-validation score is {0} corresponding to hyperparameters {1}".format(best_score, best_hyperparams))


Fitting 5 folds for each of 6 candidates, totalling 30 fits



KeyboardInterrupt
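The "6 candidates, totalling 30 fits" in the log above follows directly from the grid: 3 values of `gamma` times 2 values of `C`, each evaluated on 5 folds. A small sketch with scikit-learn's `ParameterGrid` enumerates the candidates, which can help estimate the cost before starting a long search.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the search above.
hyper_params = [{'gamma': [1e-2, 1e-3, 1e-4],
                 'C': [5, 10]}]
n_splits = 5

# Every combination of the listed values is one candidate.
candidates = list(ParameterGrid(hyper_params))
for params in candidates:
    print(params)

total_fits = len(candidates) * n_splits
print(len(candidates), total_fits)  # 6 30
```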

