## This notebook shows how to develop an SVM classifier for MNIST handwritten digit classification

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.metrics import classification_report
import pickle 
import gzip
import random

## SVM Classifier

Loading the Dataset

You can download the MNIST dataset here: http://yann.lecun.com/exdb/mnist/

In [5]:
# Parameters:
# filename     : Path to the MNIST '.gz' file, including the extension
# type         : 'image' or 'label', the kind of data stored in the file
# n_datapoints : Number of datapoints to read

def load_mnist(filename, type, n_datapoints):
    # MNIST images are 28x28 pixels
    image_size = 28
    # The context manager closes the file even if reading fails
    with gzip.open(filename) as f:
        if type == 'image':
            f.read(16)    # Skip the 16-byte header (magic number, image count, rows, columns)
            buf = f.read(n_datapoints * image_size * image_size)
            data = np.frombuffer(buf, dtype=np.uint8).astype(np.float32)
            data = data.reshape(n_datapoints, image_size, image_size, 1)
        elif type == 'label':
            f.read(8)     # Skip the 8-byte header (magic number, label count)
            buf = f.read(n_datapoints)
            data = np.frombuffer(buf, dtype=np.uint8).astype(np.int64)
            data = data.reshape(n_datapoints, 1)
        else:
            raise ValueError("type must be 'image' or 'label'")
    return data
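
The 16 and 8 bytes skipped in `load_mnist` are the IDX file headers: a big-endian 32-bit magic number followed by one 32-bit size per dimension. As a minimal sketch, the header could be parsed rather than skipped, so the counts and image dimensions come from the file itself. `read_idx_header` is a hypothetical helper that is not part of this notebook, and the demo builds synthetic headers in memory instead of reading a real MNIST file.

```python
import struct

# Magic numbers defined by the IDX format used for MNIST files
IMAGE_MAGIC = 2051   # idx3-ubyte: unsigned-byte data, 3 dimensions
LABEL_MAGIC = 2049   # idx1-ubyte: unsigned-byte data, 1 dimension

def read_idx_header(buf):
    """Parse an IDX header from bytes; return (magic, dims, header_size)."""
    (magic,) = struct.unpack('>I', buf[:4])    # big-endian 32-bit integer
    n_dims = magic & 0xFF                      # low byte holds the dimension count
    dims = struct.unpack('>' + 'I' * n_dims, buf[4:4 + 4 * n_dims])
    return magic, dims, 4 + 4 * n_dims

# Synthetic image header: magic number, 60000 images, 28 rows, 28 columns
image_header = struct.pack('>IIII', IMAGE_MAGIC, 60000, 28, 28)
magic, dims, header_size = read_idx_header(image_header)
print(magic, dims, header_size)

# Synthetic label header: magic number, 60000 labels
label_header = struct.pack('>II', LABEL_MAGIC, 60000)
label_magic, label_dims, label_header_size = read_idx_header(label_header)
print(label_magic, label_dims, label_header_size)
```

With a real file, the same function could be applied to the first 16 bytes returned by `gzip.open(filename).read(16)`, and a mismatch between `magic` and the expected constant would signal that the wrong file was passed.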

In [6]:
# Training and test datasets
train_size = 60000
test_size = 10000
dirpath = '/Users/daveyap/Desktop/github/Handwritten-digits-classification/'
X = load_mnist(dirpath + 'train-images-idx3-ubyte.gz', 'image', train_size)
y = load_mnist(dirpath + 'train-labels-idx1-ubyte.gz', 'label', train_size)
X_test = load_mnist(dirpath + 't10k-images-idx3-ubyte.gz', 'image', test_size)
y_test = load_mnist(dirpath + 't10k-labels-idx1-ubyte.gz', 'label', test_size)

## SVM Classifier

Create and train an SVM classifier with scikit-learn

In [7]:
classifier = SVC()

In [10]:
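# Flatten each 28x28x1 image into a 784-element feature vector.
# y.ravel() turns the (n, 1) label column into the 1-D array that scikit-learn expects.
classifier.fit(X.reshape(X.shape[0], 28*28), y.ravel())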

SVC()
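
The reshape used in the `fit` call can be checked on a small stand-in array. The sketch below uses random data with the same layout as the arrays returned by `load_mnist` (the names `X_demo` and `y_demo` and the sample count of 5 are illustrative, not from the notebook) to show the shapes an estimator receives: a 2-D feature matrix and a 1-D label vector.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins with the same layout as the load_mnist output, but only 5 samples
X_demo = rng.integers(0, 256, size=(5, 28, 28, 1)).astype(np.float32)
y_demo = rng.integers(0, 10, size=(5, 1)).astype(np.int64)

# One row of 784 pixel values per image
X_flat = X_demo.reshape(X_demo.shape[0], 28 * 28)
# Column vector of labels flattened to shape (n_samples,)
y_flat = y_demo.ravel()

print(X_flat.shape, y_flat.shape)
```

Passing labels as an `(n, 1)` column instead of a flat vector still trains, but scikit-learn emits a data conversion warning when it flattens the column itself.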

In [11]:
# Mean accuracy on the flattened test images
classifier.score(X_test.reshape(X_test.shape[0], 28*28), y_test.ravel())

0.9792

The classifier's accuracy on the 10,000 test images is 97.92%.
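
The notebook imports `confusion_matrix`, `accuracy_score` and `classification_report` but reports only the overall accuracy. A per-class breakdown might look like the sketch below. To stay self-contained and quick to run, it trains on scikit-learn's bundled 8x8 digits dataset (`load_digits`) rather than MNIST, so whatever it prints is not a result from this notebook. The split size and `random_state` are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Small bundled digits dataset used as a stand-in for MNIST
digits = load_digits()
X_tr, X_te, y_tr, y_te = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0, stratify=digits.target
)

demo_clf = SVC()
demo_clf.fit(X_tr, y_tr)
y_pred = demo_clf.predict(X_te)

# Rows are true digits, columns are predicted digits
cm = confusion_matrix(y_te, y_pred, labels=list(range(10)))
acc = accuracy_score(y_te, y_pred)

print(cm)
print(classification_report(y_te, y_pred))
```

With the notebook's own objects, the same calls would take `y_test.ravel()` and `classifier.predict(X_test.reshape(X_test.shape[0], 28*28))`. The diagonal of the matrix counts correct predictions, so off-diagonal cells show which digits are confused with each other.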

## Save the Model

The trained SVC model is saved so that it can be reconstructed and used in other Python modules or applications.

Joblib is a library that lets us save a copy of runtime data to a file and load it back whenever we need it.

1. joblib.dump - Save the runtime data to a file
2. joblib.load - Load the saved data from a file back into the runtime

Another option is to use pickle, but pickle can produce very large files. A KNN model trained on the 60K MNIST 28*28 images can be around 420MB.

Joblib offers compression. The same KNN model can weigh only 4MB if saved with joblib's compression enabled.

In [13]:
import joblib

In [14]:
joblib.dump(classifier,'svc_mnist_60k.gz',compress=('gzip',3))

['svc_mnist_60k.gz']
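
The notebook saves the model with `joblib.dump` but does not show the matching `joblib.load`. The round trip could be sketched as follows. To keep the example fast and self-contained, it fits a small SVC on scikit-learn's bundled digits dataset and writes to a temporary directory; the file name `svc_demo.gz` and the variable names are illustrative assumptions, not part of the notebook.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC

# Small stand-in model, trained on the bundled 8x8 digits dataset
digits = load_digits()
model = SVC()
model.fit(digits.data[:500], digits.target[:500])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'svc_demo.gz')
    # Same compression setting as in the notebook: gzip at level 3
    saved_files = joblib.dump(model, path, compress=('gzip', 3))
    # Load the model back from disk into a new object
    restored = joblib.load(path)

# The restored model should make the same predictions as the original
sample = digits.data[500:520]
same_predictions = np.array_equal(model.predict(sample), restored.predict(sample))
print(same_predictions)
```

For the model saved above, the equivalent call would be `joblib.load('svc_mnist_60k.gz')`. As with pickle, a joblib file should only be loaded if it comes from a trusted source, and ideally with the same scikit-learn version that produced it.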