# CS 3120 Machine Learning Midterm
Devon DeJohn, Spring 2020

In [None]:
import sys
sys.path.append('../')
from source import multi_model as mm
mnist = mm.Model("../data/mnist.csv")

## MNIST dataset

### Partitions
```
Partition 0:
        name: 'train'
        size: 29400 / 42000
        pcnt: 70.0 %

Partition 1:
        name: 'test'
        size: 8400 / 42000
        pcnt: 20.0 %

Partition 2:
        name: 'validate'
        size: 4200 / 42000
        pcnt: 10.0 %
```
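
A minimal sketch of how a 70/20/10 split like the one above could be produced. The function `partition_indices` and its arguments are hypothetical and are not taken from `multi_model`; it only shuffles row indices and cuts them by integer percentages.

```python
import random


def partition_indices(n_rows, percents, seed=0):
    """Shuffle row indices and cut them into named partitions by integer percent."""
    if sum(percents.values()) != 100:
        raise ValueError("percents must sum to 100")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)

    parts, start = {}, 0
    names = list(percents)
    for i, name in enumerate(names):
        if i == len(names) - 1:
            # The last partition takes whatever remains so no row is dropped.
            stop = n_rows
        else:
            stop = start + n_rows * percents[name] // 100
        parts[name] = indices[start:stop]
        start = stop
    return parts


parts = partition_indices(42000, {"train": 70, "test": 20, "validate": 10})
for name, rows in parts.items():
    print(f"{name}: {len(rows)} / 42000")
```

Integer arithmetic is used for the cut points so that the partition sizes do not depend on floating-point rounding.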

### Scores
I used four different models for classification: support vector machine, decision tree, k-nearest neighbors, and logistic regression.

I used a validation set to compare the models while varying the number of principal components. I also ran an initial test of scaled vs. unscaled data and found that all models performed significantly better on scaled data.

Below are the validation results for the four models, with PCA run at `n = 8, 16, 32, 64` components. For the `KNN` model, I ran a separate round of tests and arrived at the best-performing parameters shown below.
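
The sweep described above can be sketched roughly as follows. This is not the code behind `mm.compare_models`: it assumes scikit-learn's bundled 8x8 digits as stand-in data (64 features, not `mnist.csv`), a single 80/20 train/validation split, and only the SVM.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: scikit-learn's bundled digits, not the mnist.csv used in this report.
X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

scores = {}
for n in (8, 16, 32, 64):
    # The pipeline scales, reduces to n principal components, then classifies.
    model = make_pipeline(StandardScaler(), PCA(n_components=n), SVC())
    model.fit(X_train, y_train)
    scores[n] = model.score(X_val, y_val)

for n, acc in scores.items():
    print(f"PCA {n}: {acc:.5f}")
```

Wrapping the steps in one pipeline keeps the scaling and projection identical between fitting and scoring.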

Here are the models (and the parameters) I used:

```python
"support vector machine": SVC(),
"k-nearest neighbors": KNeighborsClassifier(weights="distance", metric="manhattan", n_jobs=-1),
"logistic regression": LogisticRegression(max_iter=1000),
"decision tree": DecisionTreeClassifier()
```
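
The separate round of `KNN` tests is not recorded here. One way such a search could look is sketched below, assuming a hypothetical parameter grid, scikit-learn's bundled digits as stand-in data, and cross-validation via `GridSearchCV` in place of the single validation partition.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)  # stand-in data, not mnist.csv

# Hypothetical grid: the values tried in the original tests are not recorded.
param_grid = {
    "n_neighbors": [3, 5, 7],
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
print(f"mean CV accuracy: {search.best_score_:.5f}")
```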

In [2]:
results = mm.compare_models(mnist)
for mdl, res in results.items():
    print(f"\n{mdl.upper()}\n" + "\n".join(res))
# end


SUPPORT VECTOR MACHINE
         PCA 8: 0.84429
        PCA 16: 0.47167
        PCA 32: 0.55429
        PCA 64: 0.55548

K-NEAREST NEIGHBORS
         PCA 8: 0.82857
        PCA 16: 0.57024
        PCA 32: 0.52071
        PCA 64: 0.4881

LOGISTIC REGRESSION
         PCA 8: 0.74
        PCA 16: 0.52238
        PCA 32: 0.50476
        PCA 64: 0.435

DECISION TREE
         PCA 8: 0.70381
        PCA 16: 0.4331
        PCA 32: 0.39405
        PCA 64: 0.36857


### Choosing the best model
Out of the **very** limited set of hyperparameters explored above, the best-performing model was the support vector machine with PCA reduced to 8 principal components, on data scaled using `sklearn.preprocessing`. Here is the classification report for the testing dataset:

In [3]:
svm = mm.SVC()
xtrain = mm.process_data(mnist.train.X, 8)
ytrain = mnist.train.Y

xtest = mm.process_data(mnist.test.X, 8)
ytest = mnist.test.Y

svm.fit(xtrain, ytrain)
mnist.report(svm, xtest, ytest)

              precision    recall  f1-score   support

           0       0.92      0.96      0.94       835
           1       0.96      0.98      0.97       919
           2       0.90      0.92      0.91       862
           3       0.83      0.79      0.81       839
           4       0.76      0.78      0.77       827
           5       0.87      0.85      0.86       771
           6       0.95      0.94      0.94       821
           7       0.92      0.89      0.90       916
           8       0.82      0.84      0.83       810
           9       0.73      0.70      0.71       800

    accuracy                           0.87      8400
   macro avg       0.86      0.86      0.86      8400
weighted avg       0.87      0.87      0.87      8400
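
The cell above calls `mm.process_data` separately on the training and testing features. What that helper does internally is not shown here. If it fits the scaler and PCA on whatever array it receives, the two sets would be projected onto different bases. A sketch of the alternative, which fits both transforms on the training rows and reuses them on the test rows, is below. It assumes scikit-learn's bundled digits as stand-in data, and `project` is a hypothetical helper.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # stand-in data, not mnist.csv
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fit the scaler and the PCA basis on the training rows only.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=8).fit(scaler.transform(X_train))


def project(features):
    """Apply the training-set scaling and PCA basis to any feature array."""
    return pca.transform(scaler.transform(features))


svm = SVC().fit(project(X_train), y_train)
report = classification_report(y_test, svm.predict(project(X_test)))
print(report)
```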



### Best Classifier: SVM
The classifier that performed best on the `MNIST` dataset was the support vector machine, but not by much: the `KNN` model tested below with the same PCA feature reduction reaches 0.85 accuracy against 0.87 for the SVM. Because classifying handwritten digits is a relatively simple task (as opposed to more complex images like animals), all of these classifiers tend to perform similarly. However, there are too many unexplored hyperparameters to make any substantial claim about the performance of any particular model on this dataset.
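
One way to put the size of the gap in context is the binomial standard error of an accuracy measured on 8400 test rows. The sketch below uses the rounded accuracies from the two classification reports (0.87 and 0.85) and treats the two estimates as independent. That is only a rough approximation, since both models were scored on the same rows, and a paired comparison such as McNemar's test would need the per-sample predictions. It measures sampling noise only and says nothing about unexplored hyperparameters.

```python
from math import sqrt


def accuracy_standard_error(accuracy, n_samples):
    """Binomial standard error of an accuracy estimated from n_samples predictions."""
    return sqrt(accuracy * (1.0 - accuracy) / n_samples)


n_test = 8400
svm_acc, knn_acc = 0.87, 0.85  # rounded accuracies from the classification reports

svm_se = accuracy_standard_error(svm_acc, n_test)
knn_se = accuracy_standard_error(knn_acc, n_test)

gap = svm_acc - knn_acc
# Assumption: independent estimates, so the variances simply add.
gap_se = sqrt(svm_se**2 + knn_se**2)

print(f"SVM: {svm_acc:.2f} +/- {svm_se:.4f}")
print(f"KNN: {knn_acc:.2f} +/- {knn_se:.4f}")
print(f"gap: {gap:.2f}, about {gap / gap_se:.1f} standard errors")
```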

In [4]:
knn = mm.KNeighborsClassifier(weights="distance", metric="manhattan", n_jobs=-1)
xtrain = mm.process_data(mnist.train.X, 8)
ytrain = mnist.train.Y

xtest = mm.process_data(mnist.test.X, 8)
ytest = mnist.test.Y

knn.fit(xtrain, ytrain)
mnist.report(knn, xtest, ytest)

              precision    recall  f1-score   support

           0       0.88      0.96      0.92       835
           1       0.96      0.98      0.97       919
           2       0.91      0.90      0.91       862
           3       0.82      0.74      0.78       839
           4       0.73      0.77      0.75       827
           5       0.85      0.77      0.81       771
           6       0.93      0.95      0.94       821
           7       0.91      0.89      0.90       916
           8       0.77      0.82      0.79       810
           9       0.69      0.68      0.68       800

    accuracy                           0.85      8400
   macro avg       0.85      0.85      0.85      8400
weighted avg       0.85      0.85      0.85      8400

