# Support Vector Machines
## Quadratic Programming Problem
There are a number of sources that will show that the first step to building a support vector machine is solving the following quadratic programming problem.

\begin{equation}
W(\alpha) = \sum_i^n \alpha_i - \sum_i^n\sum_j^n \alpha_i \alpha_j y_i y_j \vec{x}_i^T \vec{x}_j
\end{equation}
\begin{equation}
\sum_i^n \alpha_iy_i = 0, \quad \quad \alpha_i \geq 0
\end{equation}

where $\alpha_i$ is unknown, $y_i$ is the label, and $x_i$ is the data.

These equations have important characteristics to them. The most important are
\begin{equation} 
\vec{w} = \sum_i^n \alpha_i y_i x_i
\end{equation} where $w$ is a vector that helps to define our decision boundary and that most of the $\alpha_i$'s will be zero. So, most of the vectors that constitute $w$ will be 0.

Therefore, the points closed to our decision boundary will be used to define it.

## Kernel Trick
Equation (1) works simply in the case of a problem that is linearly seperable. However, in the case of classifying points that are not linearly seperable, we can use something called a kernel trick and modify equation (1) to be
\begin{equation}
W(\alpha) = \sum_i^n \alpha_i - \sum_i^n\sum_j^n \alpha_i \alpha_j y_i y_j K(\vec{x}_i,\vec{x}_j)
\end{equation}
where $K(\vec{x}_i, \vec{x}_j)$ is some transformation of $\vec{x}_i$ and $\vec{x}_j$. Some examples of common kernel tricks are

1. $(\vec{x}^T_i \vec{x}_j)^n$ where $n$ is any power.
1. $(\vec{x}^T_i \vec{x}_j + b)^n$ where $n$ is any power and $b$ is some bias.
1. $e^{||\vec{x}_i - \vec{x}_j||^2/\sigma^2}$ where $\sigma$ can be modified for various fitting.

The only requirement for a kernel trick is that it returns some numerical value that can be considered some kind of distance between points. That is, if $x_i$ and $x_j$ were images, there was some determination of a distance between the two images that comes out of this kernel trick.

## Machine Learning Algorithms
Understanding these elements, we can begin to write a Support Vector Algorithm for classification. We will start with doing this under the pretense that all data is numerical in nature and so the kernel tricks needed will require no conversion to have numerical returns. 

The steps to writing a machine learning algorithm are

1. Train
1. Validate
1. Test

where using a training sample and equation (2) and (4), we come out with a model. That model is then validated against another sample. Several models will be built with various kernel tricks, and then they will also be validated. The model with the highest accuracy is then chosen and tested against our test set for accuracy and reported on.

#### Necessary Packages
1. **Numpy**: Numpy is needed for various matrix algebra and added mathematics functions.
1. **cvxopt**: cvxopt is used for Quadratic Programming. I don't know how to do it, so following Andrew Tulloch's code, I used it for solving the best weights.
1. **matplotlib**: This is explicity used for plotting and data visualization.
1. **argh**: Argh allows me to execute examples.
1. **itertools**: itertools helps in creating a contour plot. Also adopted from Andrew Tulloch's code.
1. **os**: Commly imported package. I use it for quickly reading in every file within a folder.
1. **scipy and scimage**: These are used for image transformations. 
1. **pandas**: Pandas is useful for dataframes. This way, I can label matrices within a single dataset.
1. **mnist**: This is used to process images from the mnist database.

In [151]:
import numpy as np
import numpy.linalg as la
import cvxopt
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import argh
import itertools
import pandas as pd
from os import listdir
from os.path import isfile, join
from scipy.ndimage.filters import gaussian_filter
from skimage.transform import resize
from mnist import MNIST

#### Kernel Functions
Before we are able to write a trainer, it is necessary to define a few kernel functions which will be a part of the Kernel class.

In [3]:
class Kernel(object):
    @staticmethod
    def linear():
        def f(x, y):
            return np.inner(x, y)
        return f
    
    @staticmethod
    def polynomial(dim, offset = 0.0):
        def f(x, y):
            return (np.inner(x,y) + offset) ** dim
        return f
    
    @staticmethod
    def radial_basis(sigma):
        def f(x, y):
            num = la.norm(x - y) ** 2
            den = 2*sigma**2
            return np.exp(num/den)
        return f
        
    @staticmethod
    def gaussian(sigma):
        def f(x, y):
            num = la.norm(x-y) ** 2
            den = (2 * sigma ** 2)
            return np.exp(-np.sqrt(num/den))
        return f
    
    @staticmethod
    def sigmoid(gamma, offset):
        def f(x, y):
            return np.tanh(gamma * np.dot(x,y) + offset)
        return f

## Writing a Trainer
#### The Predicter Class
We also need to have a predicter so that the trainer is able to continue updating itself. Predicting will use the decision boundary equation
\begin{equation}
\sum_i^n \alpha_i y_i K(\vec{x_i}, \vec{x}) + b \geq 0 \implies x \text{ is a positive sample.}
\end{equation}

In [4]:
class Predicter(object):
    def __init__(self, kernel, bias,
                alpha, X, labels):
        self._kernel = kernel
        self._bias = bias
        self._alpha = alpha
        self._X = X
        self._labels = labels
    def predict(self, x):
        result = self._bias
        for a_i, x_i, y_i in zip(self._alpha,
                                 self._X,
                                 self._labels):
            result += a_i*y_i*self._kernel(x_i, x)
        return np.sign(result).item()

This predicter object intializes itself as having the various components needed for labeling an individual point, and then has a function that performs the function. Then, depending on whether the sign is positive or negative, it returns a +1 for positive samples and a -1 for negative samples.

#### The Trainer Class
Now is the significantly more difficult part. The trainer for SVM utilizes quadratic programming which is honestly something I'm not super familiar with, so I'll be utilizing the cvxopt package to find the proper $\alpha$'s to maximize $W(\alpha)$. I will also take a lot of (most of) inspiration from Andrew Tulloch's Trainer in his guide [here](http://tullo.ch/articles/svm-py/).

The following is a brief overview of each function within the Trainer class:

1. **__init__**: Initializes the trainer with a kernel function and the cost variable used to determine the accuracy of the quadratic programming maximization.
1. **train**: Computes the alphas (weights) and creates a predicter class for training iterations.
1. **compute_weights**: Computes the weights for the predicter by using quadratic programming to maximize $W(\alpha)$.
1. **create_predicter**: Creates predicter class by converting all below minimal weights to 0 and using all others as weights in predicter. This minimizes computation by only iterating over points with $\alpha > 0$. I also use Andrew Tulloch's code here to compute the bias which is based on a presentation from Carnegie Mellon.
1. **compute_gram**: This creates the Gram Matrix, which in Machine Learning is just a matrix of every $x_i, x_j$ pair in a kernel function where $G_{ij}$ = $K(x_i, x_j)$. The Gram Matrix is necessary for the quadratic programming maximization.

In [78]:
class Trainer(object):
    def __init__(self, kernel, cost):
        self._kernel = kernel
        self._c = cost
        
    def train(self, X, labels):
        weights = self.compute_weights(X, labels)
        return self.create_predicter(X, labels, weights)
    
    def compute_weights(self, X, labels):
        n = len(X)
        Gram = self.compute_gram(X)
        P = cvxopt.matrix(np.outer(labels, labels) * Gram)
        q = cvxopt.matrix(-1 * np.ones(n))

        # -a_i \leq 0
        # TODO(tulloch) - modify G, h so that we have a soft-margin classifier
        G_std = cvxopt.matrix(np.diag(np.ones(n) * -1))
        h_std = cvxopt.matrix(np.zeros(n))

        # a_i \leq c
        G_slack = cvxopt.matrix(np.diag(np.ones(n)))
        h_slack = cvxopt.matrix(np.ones(n) * self._c)

        G = cvxopt.matrix(np.vstack((G_std, G_slack)))
        h = cvxopt.matrix(np.vstack((h_std, h_slack)))

        A = cvxopt.matrix(labels, (1, n))
        b = cvxopt.matrix(0.0)

        # solution = cvxopt.solvers.qp(P, q, G, h, A, b)
        solution = cvxopt.solvers.qp(P, q, G, h, kktsolver='ldl', options={'kktreg':1e-9})
        # Lagrange multipliers
        return np.ravel(solution['x'])
    
    def create_predicter(self, X, labels, weights):
        """non_minimal_indices = weights > 1e-5
        X_non_minimal = [X[i] for i in non_minimal_indices if i == True]
        labels_non_minimal = [labels[i] for i in non_minimal_indices if i == True]
        weights_non_minimal = [weights[i] for i in non_minimal_indices if i == True]"""

        bias = np.mean(
            [y_k - Predicter(
                kernel=self._kernel,
                bias=0.0,
                alpha=weights,
                X=X,
                labels=labels).predict(x_k)
            for (y_k, x_k) in zip(labels, X)])
        
        return Predicter(
                kernel = self._kernel,
                bias = bias,
                alpha = weights,
                X = X,
                labels = labels)
    
    
    def compute_gram(self, X):
        n = len(X)
        G = np.zeros((n, n))
        for i, x_i in enumerate(X):
            for j, x_j in enumerate(X):
                G[i, j] = self._kernel(x_i, x_j)
        return G

#### The Tester Class
This just takes the predicter class and predicts the label of testing information and returns the predicted labels and whether or not they were correct.

In [79]:
class Tester(object):
    def __init__ (self, predicter, data, true_labels):
        self._predicter = predicter
        self._data = data
        self._true_labels = true_labels
    def compute_accuracy(self):
        flatten = lambda m: np.array(m).reshape(-1,)
        predictions = [self._predicter.predict(x) for x in self._data]
        correct = [flatten(predictions)[i] == flatten(self._true_labels)[i]
                   for i in range(len(flatten(predictions)))]
        return (predictions, correct)

#### Examples

In [80]:
# Procedure used to cause of a list of lists to be a single list
flatten = lambda m: np.array(m).reshape(-1,)

In [85]:
def linear_example(num_samples=500, num_features=2, grid_size=100, filename = "svm.pdf"):
    samples = np.matrix(np.random.uniform(low = -1.5, high = 1.5, size=num_samples * num_features)
                        .reshape(num_samples, num_features))
    
    labels = 2 * (samples.sum(axis=1) > 0) - 1.0
    flatten = lambda m: np.array(m).reshape(-1,)

    
    plt.scatter(flatten(samples[:,0]), flatten(samples[:,1]),
                c=flatten(labels), cmap=cm.Paired, edgecolor = "white",
                s = 16)
    plt.savefig("examples.pdf")
    
    training_samples = samples[1:num_samples//2,]
    testing_samples = samples[num_samples//2 + 1:,]
    training_labels = labels[1:num_samples//2]
    testing_labels = labels[num_samples//2 + 1:]
    
    trainer = Trainer(Kernel.linear(), 0.01)
    predicter = trainer.train(training_samples, training_labels)
    tester = Tester(predicter, testing_samples, testing_labels)
    predicted_labels, correct = tester.compute_accuracy()
    
    plot(predicter, training_samples, training_labels, testing_samples, 
         predicted_labels, correct, grid_size, "linearcase.pdf")
    
def polynomial_example(num_samples=500, num_features=2, grid_size=100, filename = "svm.pdf"):
    samples = np.matrix(np.random.uniform(low = -1.5, high = 1.5, size=num_samples * num_features)
                        .reshape(num_samples, num_features))
    labels = 2 * (np.sqrt(np.power(samples[:,0],2) + np.power(samples[:,1],2)) > 1) - 1.0
    
    training_samples = samples[1:num_samples//2,]
    testing_samples = samples[num_samples//2 + 1:,]
    training_labels = labels[1:num_samples//2]
    testing_labels = labels[num_samples//2 + 1:]
    
    plt.scatter(flatten(samples[:,0]), flatten(samples[:,1]),
                c=flatten(labels), cmap=cm.Paired, edgecolor = "white",
                s = 16)
    plt.savefig("poly.pdf")
    
    
    trainer = Trainer(Kernel.polynomial(2), 0.01)
    predicter = trainer.train(samples, labels)
    tester = Tester(predicter, testing_samples, testing_labels)
    predicted_labels, correct = tester.compute_accuracy()
    
    plot(predicter, training_samples, training_labels, testing_samples, 
         predicted_labels, correct, grid_size, "polynomialcase.pdf")


def plot(predicter, training_X, training_y, testing_X, testing_y,
         correct, grid_size, filename):
    
    flatten = lambda m: np.array(m).reshape(-1,)
    x_min, x_max = training_X[:, 0].min() - 0.25, training_X[:, 0].max() + .25
    y_min, y_max = training_X[:, 1].min() - 0.25, training_X[:, 1].max() + .25
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, grid_size),
                         np.linspace(y_min, y_max, grid_size),
                         indexing='ij')
    flatten = lambda m: np.array(m).reshape(-1,)
    result = []
    for (i, j) in itertools.product(range(grid_size), range(grid_size)):
        point = np.array([xx[i, j], yy[i, j]]).reshape(1, 2)
        result.append(predicter.predict(point))
    Z = np.array(result).reshape(xx.shape)

    plt.contourf(xx, yy, Z,
                 cmap=cm.Paired,
                 levels=[-0.001, 0.001],
                 extend='both',
                 alpha=0.8)
    plt.scatter(flatten(training_X[:, 0]), flatten(training_X[:, 1]),
                c=flatten(training_y), cmap=cm.Paired, edgecolor = "white",
                s = 16)
    colors = ["white" if x == True else "red" for x in correct]
    plt.scatter(flatten(testing_X[:, 0]), flatten(testing_X[:, 1]),
                c = flatten(testing_y), cmap = cm.Paired, edgecolor = colors,
                s =16)
    plt.xlim(x_min, x_max)
    plt.ylim(y_min, y_max)
    plt.savefig(filename)
    plt.clf()

    
argh.dispatch_command(linear_example)
argh.dispatch_command(polynomial_example)


     pcost       dcost       gap    pres   dres
 0: -4.2085e+01 -4.7135e+00  1e+03  3e+01  3e-10
 1: -1.4258e+00 -4.6904e+00  1e+01  3e-01  3e-10
 2: -1.1304e+00 -3.0098e+00  2e+00  2e-10  2e-12
 3: -1.3132e+00 -1.5110e+00  2e-01  1e-10  1e-12
 4: -1.4191e+00 -1.4574e+00  4e-02  1e-10  2e-12
 5: -1.4373e+00 -1.4473e+00  1e-02  1e-10  9e-13
 6: -1.4426e+00 -1.4442e+00  2e-03  1e-10  4e-13
 7: -1.4434e+00 -1.4437e+00  2e-04  2e-10  2e-13
 8: -1.4436e+00 -1.4436e+00  2e-05  2e-10  8e-14
 9: -1.4436e+00 -1.4436e+00  4e-07  2e-10  1e-14
Optimal solution found.
     pcost       dcost       gap    pres   dres
 0: -1.5168e+02 -1.0043e+01  3e+03  5e+01  4e-10
 1: -4.1129e+00 -9.9868e+00  4e+01  6e-01  4e-10
 2: -2.4429e+00 -7.0537e+00  5e+00  1e-02  1e-11
 3: -2.6941e+00 -3.2563e+00  6e-01  1e-03  9e-13
 4: -2.9059e+00 -2.9947e+00  9e-02  2e-04  2e-12
 5: -2.9409e+00 -2.9618e+00  2e-02  3e-05  1e-12
 6: -2.9464e+00 -2.9564e+00  1e-02  1e-05  6e-13
 7: -2.9507e+00 -2.9521e+00  1e-03  1e-06  3e-1

## Comparisons to LDA, QDA, and KNN

In [104]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def linear_comparison(num_samples=1000, num_features=2, grid_size=100, filename = "svm.pdf"):
    samples = np.array(np.random.uniform(low = -1.5, high = 1.5, size=num_samples * num_features)
                        .reshape(num_samples, num_features))
    
    labels = 2 * (samples.sum(axis=1) > 0) - 1.0
    
    train_samples = samples[1:len(samples)//2,]
    train_labels = labels[1:len(samples)//2]
    test_samples = samples[(len(samples)//2 + 1):,]
    test_labels = labels[(len(samples)//2 + 1):,]
    
    
    trainer = Trainer(Kernel.linear(), 0.01)
    predicter = trainer.train(train_samples, train_labels)
    tester = Tester(predicter, test_samples, test_labels)
    predicted_labels, correct = tester.compute_accuracy()
    svm_accuracy = sum(correct)/len(correct)
    
    lda = LinearDiscriminantAnalysis()
    lda.fit(train_samples, train_labels)
    correct = (lda.predict(test_samples) == test_labels)
    lda_accuracy = sum(correct)/len(correct)
    
    qda = QuadraticDiscriminantAnalysis()
    qda.fit(train_samples, train_labels)
    correct = (qda.predict(test_samples) == test_labels)
    qda_accuracy = sum(correct)/len(correct)
    
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(train_samples, train_labels)
    correct = (knn.predict(test_samples) == test_labels)
    knn_accuracy = sum(correct)/len(correct)
    
    return(svm_accuracy, lda_accuracy, qda_accuracy, knn_accuracy)

def polynomial_comparison(num_samples=1000, num_features=2, grid_size=100, filename = "svm.pdf"):
    samples = np.array(np.random.uniform(low = -1.5, high = 1.5, size=num_samples * num_features)
                        .reshape(num_samples, num_features))
    labels = 2 * (np.sqrt(np.power(samples[:,0],2) + np.power(samples[:,1],2)) > 1) - 1.0
    
    train_samples = samples[1:len(samples)//2,]
    train_labels = labels[1:len(samples)//2]
    test_samples = samples[(len(samples)//2 + 1):,]
    test_labels = labels[(len(samples)//2 + 1):,]
    
    trainer = Trainer(Kernel.polynomial(2), 0.01)
    predicter = trainer.train(train_samples, train_labels)
    tester = Tester(predicter, test_samples, test_labels)
    predicted_labels, correct = tester.compute_accuracy()
    svm_accuracy = sum(correct)/len(correct)
    
    lda = LinearDiscriminantAnalysis()
    lda.fit(train_samples, train_labels)
    correct = (lda.predict(test_samples) == test_labels)
    lda_accuracy = sum(correct)/len(correct)
    
    qda = QuadraticDiscriminantAnalysis()
    qda.fit(train_samples, train_labels)
    correct = (qda.predict(test_samples) == test_labels)
    qda_accuracy = sum(correct)/len(correct)
    
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(train_samples, train_labels)
    correct = (knn.predict(test_samples) == test_labels)
    knn_accuracy = sum(correct)/len(correct)
    
    return(svm_accuracy, lda_accuracy, qda_accuracy, knn_accuracy)

In [100]:
linear_tests = np.mat([linear_comparison() for i in range(100)])

     pcost       dcost       gap    pres   dres
 0: -9.2812e+01 -8.9130e+00  2e+03  5e+01  3e-10
 1: -2.9513e+00 -8.8615e+00  3e+01  5e-01  3e-10
 2: -1.9662e+00 -6.0383e+00  4e+00  3e-10  3e-12
 3: -2.3319e+00 -2.7749e+00  4e-01  2e-10  1e-12
 4: -2.5543e+00 -2.6331e+00  8e-02  2e-10  2e-12
 5: -2.5919e+00 -2.6069e+00  2e-02  2e-10  9e-13
 6: -2.5994e+00 -2.6009e+00  1e-03  2e-10  3e-13
 7: -2.6003e+00 -2.6003e+00  3e-05  2e-10  5e-14
 8: -2.6003e+00 -2.6003e+00  3e-06  2e-10  2e-14
 9: -2.6003e+00 -2.6003e+00  4e-08  2e-10  1e-14
Optimal solution found.
     pcost       dcost       gap    pres   dres
 0: -7.8445e+01 -8.8158e+00  2e+03  5e+01  3e-10
 1: -2.5755e+00 -8.7612e+00  3e+01  6e-01  3e-10
 2: -1.6878e+00 -5.9130e+00  4e+00  3e-10  3e-12
 3: -2.0092e+00 -2.4899e+00  5e-01  2e-10  1e-12
 4: -2.2350e+00 -2.3167e+00  8e-02  2e-10  2e-12
 5: -2.2724e+00 -2.2873e+00  1e-02  2e-10  9e-13
 6: -2.2795e+00 -2.2813e+00  2e-03  2e-10  4e-13
 7: -2.2804e+00 -2.2805e+00  6e-05  2e-10  1e-1

  y = column_or_1d(y, warn=True)



Optimal solution found.
     pcost       dcost       gap    pres   dres
 0: -6.8569e+01 -5.0330e+00  1e+03  4e+01  4e-10
 1: -1.8624e+00 -5.0076e+00  2e+01  4e-01  4e-10
 2: -1.1967e+00 -3.4626e+00  2e+00  2e-10  4e-12
 3: -1.3622e+00 -1.6121e+00  2e-01  1e-10  8e-13
 4: -1.4767e+00 -1.5188e+00  4e-02  1e-10  2e-12
 5: -1.4927e+00 -1.5067e+00  1e-02  1e-10  1e-12
 6: -1.4985e+00 -1.5019e+00  3e-03  1e-10  5e-13
 7: -1.5000e+00 -1.5007e+00  7e-04  1e-10  4e-13
 8: -1.5003e+00 -1.5004e+00  4e-05  2e-10  9e-14
 9: -1.5004e+00 -1.5004e+00  5e-06  1e-10  1e-13
10: -1.5004e+00 -1.5004e+00  5e-08  2e-10  2e-15
Optimal solution found.
     pcost       dcost       gap    pres   dres
 0: -7.0840e+01 -5.0061e+00  1e+03  4e+01  4e-10
 1: -1.9174e+00 -4.9809e+00  2e+01  4e-01  4e-10
 2: -1.2277e+00 -3.4514e+00  2e+00  2e-10  4e-12
 3: -1.3875e+00 -1.6347e+00  2e-01  1e-10  8e-13
 4: -1.4961e+00 -1.5345e+00  4e-02  1e-10  2e-12
 5: -1.5138e+00 -1.5202e+00  6e-03  1e-10  1e-12
 6: -1.5167e+00 -1.517

ValueError: setting an array element with a sequence.

In [105]:
polynomial_tests = np.mat([polynomial_comparison() for i in range(100)])

     pcost       dcost       gap    pres   dres
 0: -1.4762e+02 -1.0007e+01  3e+03  5e+01  4e-10
 1: -3.9946e+00 -9.9507e+00  4e+01  6e-01  4e-10
 2: -2.3682e+00 -6.9692e+00  5e+00  9e-03  9e-12
 3: -2.6242e+00 -3.1933e+00  6e-01  1e-03  7e-13
 4: -2.8256e+00 -2.9370e+00  1e-01  2e-04  2e-12
 5: -2.8661e+00 -2.8935e+00  3e-02  3e-05  1e-12
 6: -2.8749e+00 -2.8836e+00  9e-03  8e-06  5e-13
 7: -2.8776e+00 -2.8808e+00  3e-03  2e-06  3e-13
 8: -2.8789e+00 -2.8794e+00  5e-04  1e-07  7e-14
 9: -2.8791e+00 -2.8792e+00  8e-05  5e-09  7e-14
10: -2.8791e+00 -2.8791e+00  2e-06  2e-10  1e-14
Optimal solution found.
     pcost       dcost       gap    pres   dres
 0: -1.4692e+02 -1.0011e+01  3e+03  5e+01  4e-10
 1: -3.9991e+00 -9.9540e+00  4e+01  6e-01  4e-10
 2: -2.3833e+00 -7.0459e+00  5e+00  2e-02  2e-11
 3: -2.6015e+00 -3.1720e+00  6e-01  2e-03  1e-12
 4: -2.8122e+00 -2.9112e+00  1e-01  3e-04  2e-12
 5: -2.8544e+00 -2.8711e+00  2e-02  3e-05  1e-12
 6: -2.8605e+00 -2.8649e+00  4e-03  8e-06  5e-1

In [106]:
print("Linear SVM-Average:", np.mean(linear_tests[0:,]))
print("Linear LDA-Average:", np.mean(linear_tests[1:,]))
print("Linear QDA-Average:", np.mean(linear_tests[2:,]))
print("Linear KNN-Average:", np.mean(linear_tests[3:,]))
print("Quadratic SVM-Average:", np.mean(polynomial_tests[0:,]))
print("Quadratic LDA-Average:", np.mean(polynomial_tests[1:,]))
print("Quadratic QDA-Average:", np.mean(polynomial_tests[2:,]))
print("Quadratic KNN-Average:", np.mean(polynomial_tests[3:,]))

Linear SVM-Average: 0.98006012024
Linear LDA-Average: 0.97995991984
Linear QDA-Average: 0.979944583044
Linear KNN-Average: 0.979820465674
Quadratic SVM-Average: 0.837870741483
Quadratic LDA-Average: 0.83778668448
Quadratic QDA-Average: 0.837660013905
Quadratic KNN-Average: 0.837768320145


#### Preprocessing
This returns the odd format of the .idx3 files as a numpy matrix so that is processes properly.

In [109]:
def preprocessing(num):
    mat = np.mat(num)
    return(mat)

#### Data Reading
Using the MNIST package (which was created to read this database), the information is processed and sorted into a training and testing dataset.

In [111]:
mndata = MNIST('samples')

train_images, train_labels = mndata.load_training()
test_images, test_labels = mndata.load_testing()
train_nums = [preprocessing(x) for x in train_images]
test_nums = [preprocessing(x) for x in test_images]

##### Creating Subsets
Because I haven't written a routine to handle classification of multiple classes, the below code creates a subset of samples that are only labeled 1 or 5.

In [114]:
# Create Dataframes of Training and Testing Images
train_df = pd.DataFrame()
train_df["Numbers"] = train_nums
train_df["Labels"] = train_labels
train_ones = train_df[train_df["Labels"] == 1]
train_fives = train_df[train_df["Labels"] == 5]

test_df = pd.DataFrame()
test_df["Numbers"] = test_nums
test_df["Labels"] = test_labels
test_ones = test_df[test_df["Labels"] == 1]
test_fives = test_df[test_df["Labels"] == 5]

train_samples = list(train_ones["Numbers"]) + list(train_fives["Numbers"])
train_labels = list(train_ones["Labels"]) + list(train_fives["Labels"])
test_samples = list(test_ones["Numbers"]) + list(test_fives["Numbers"])
test_labels = list(test_ones["Labels"]) + list(test_fives["Labels"])

#### Training/Testing
This just uses similar methods to above examples to show accuracy of the procedure.

In [None]:
trainer = Trainer(Kernel.polynomial(2), .1)
predicter = trainer.train(train_samples[0:1000], train_labels[0:1000])
tester = Tester(predicter, test_samples[0:1000], test_labels[0:1000])
predicted_labels, correct = tester.compute_accuracy()

In [148]:
# Compute Accuracy
sum(correct)/len(correct)

1.0