
# **CS 4361/5361 Machine Learning**

**Practice Exam 1, Part 2**

**Author:** [Olac Fuentes](http://www.cs.utep.edu/ofuentes/)<br>


Your task is to write programs that determine whether a digit from the MNIST dataset has been flipped vertically.

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier 
from sklearn.metrics import accuracy_score, confusion_matrix
import time

Download data.

In [None]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


Subsample the data, keeping every fifth example, to make the classifiers' running times more manageable.

In [None]:
x_train = x_train[::5]
y_train = y_train[::5]
x_test = x_test[::5]
y_test = y_test[::5]
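As a quick illustration of the strided slice used above, here is a minimal sketch on a made-up array (the names `data` and `sub` are illustrative and not part of the exam code). A step of 5 keeps elements 0, 5, 10, and so on, so MNIST's 60,000 training and 10,000 test examples shrink to 12,000 and 2,000.

```python
import numpy as np

# Hypothetical stand-in for a dataset with 20 examples.
data = np.arange(20)

# Keep every fifth example, starting with the first.
sub = data[::5]

print(sub)       # the retained examples
print(len(sub))  # one fifth of the original length
```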

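Flip some of the examples (about 30% of them). The code reverses the column order of each selected image, which mirrors it about its vertical axis. The original digit labels are then replaced with binary labels: 1 if the image was flipped, 0 otherwise.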

In [None]:
f = np.random.random(size=x_train.shape[0])
x_train[f>0.7] = x_train[f>0.7,:,::-1]
y_train = np.int16(f>0.7)
f = np.random.random(size=x_test.shape[0])
x_test[f>0.7] = x_test[f>0.7,:,::-1]
y_test = np.int16(f>0.7)
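To see what the indexing in this cell does, here is a minimal sketch on a tiny made-up array (the names `imgs`, `mask`, and `labels` are illustrative assumptions, not part of the exam code). Reversing the last axis with `::-1` reverses the columns of each selected image, and the same boolean mask doubles as the label vector.

```python
import numpy as np

# Two hypothetical 2x3 "images" standing in for MNIST digits.
imgs = np.array([[[1, 2, 3],
                  [4, 5, 6]],
                 [[7, 8, 9],
                  [10, 11, 12]]])

# Illustrative mask: flip only the second image.
mask = np.array([False, True])

# Reversing the last axis reverses the columns of each selected image.
imgs[mask] = imgs[mask, :, ::-1]

# The label is 1 for flipped images and 0 otherwise.
labels = np.int16(mask)

print(imgs[1])
print(labels)
```

The first image is left untouched, while each row of the second image is read right to left.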

In [None]:
# Show 10 random training images with their labels (1 = flipped, 0 = not flipped)
for i in range(10):
  im = np.random.randint(0,x_train.shape[0])
  plt.imshow(x_train[im],cmap='gray')
  print('Class:',y_train[im])
  plt.show()

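Now let's convert the images to row form, as usual: each 28×28 image becomes a 784-element row vector, with pixel values scaled to the range [0, 1].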

In [None]:
x_train = np.float32(x_train/255).reshape(x_train.shape[0],-1)
x_test = np.float32(x_test/255).reshape(x_test.shape[0],-1)
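The same conversion can be checked on a small synthetic batch. This is only a sketch: `batch` and `rows` are hypothetical names, and the random pixels stand in for real digits.

```python
import numpy as np

# Hypothetical batch of 4 images of size 28x28 with uint8 pixel values.
batch = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Scale to [0, 1] and flatten each image into one row of 784 values.
rows = np.float32(batch / 255).reshape(batch.shape[0], -1)

print(rows.shape, rows.dtype)
```

Passing `-1` to `reshape` lets NumPy infer the row length (28 × 28 = 784) from the number of images.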

**Question 1.** 
Write a classifier that simply predicts the majority class in the training set as the class for all test examples and compute its accuracy.


In [None]:
def most_common_class(y):
  # More general than required; it will work for any set of classes
  labels = list(set(y))
  labels.sort()
  counts = []
  for i in labels:
    counts.append(np.sum(y==i))
  return labels[np.argmax(counts)]

pred = np.zeros_like(y_test)+most_common_class(y_train)

print('Accuracy = {:6.4f}'.format(accuracy_score(y_test, pred)))
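Because roughly 70% of the examples are left unflipped, this baseline should score close to 0.70 on this data. As an alternative sketch, the counting loop can be replaced by `np.unique` with `return_counts=True`. The function name `most_common_class_np` and the toy labels below are illustrative assumptions, not part of the original solution.

```python
import numpy as np

def most_common_class_np(y):
    # np.unique returns the sorted distinct labels and, with
    # return_counts=True, how many times each one occurs.
    labels, counts = np.unique(y, return_counts=True)
    return labels[np.argmax(counts)]

# Hypothetical labels: 7 unflipped (0) and 3 flipped (1) training examples.
y_toy_train = np.array([0, 0, 1, 0, 0, 1, 0, 0, 1, 0])
y_toy_test = np.array([0, 1, 0, 0, 1])

majority = most_common_class_np(y_toy_train)

# Predict the majority class for every test example.
toy_pred = np.full_like(y_toy_test, majority)
toy_accuracy = np.mean(toy_pred == y_toy_test)

print('Majority class:', majority)
print('Accuracy = {:6.4f}'.format(toy_accuracy))
```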

**Question 2.** 
Write a program to compare the performance of the k-nearest-neighbors classifier, the random forest classifier, and the multilayer perceptron on this problem.


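**Solution.** This solution adds two more classifiers to the comparison (multinomial naive Bayes and a decision tree). It takes advantage of the fact that in Python, classes can be passed as arguments just like functions, so `evaluate_model` receives a classifier class and instantiates it with default hyperparameters.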

In [None]:
from sklearn.neighbors import KNeighborsClassifier 
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

def evaluate_model(model, x_train, y_train, x_test, y_test):
  m = model()
  m.fit(x_train,y_train)
  pred = m.predict(x_test)
  return accuracy_score(y_test, pred)

models = [KNeighborsClassifier, RandomForestClassifier, MLPClassifier, MultinomialNB, DecisionTreeClassifier] # Classes are callable, so they can be passed around like functions
model_names = ['K-Nearest Neighbors', 'Random Forest', 'Multilayer Perceptron', 'Multinomial Naive Bayes', 'Decision Tree']

acc_list = []
for i in range(len(models)):
  print('Evaluating',model_names[i])
  acc_list.append(evaluate_model(models[i], x_train, y_train, x_test, y_test))
  print('Accuracy = {:6.4f}'.format(acc_list[-1]))

best = np.argmax(acc_list)
print('The best model is',model_names[best])
print('Accuracy = {:6.4f}'.format(acc_list[best]))

      

Evaluating K-Nearest Neighbors
Accuracy = 0.9710
Evaluating Random Forest
Accuracy = 0.9550
Evaluating Multilayer Perceptron
Accuracy = 0.9945
Evaluating Multinomial Naive Bayes
Accuracy = 0.7850
Evaluating Decision Tree
Accuracy = 0.8550
The best model is Multilayer Perceptron
Accuracy = 0.9945
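The pattern of passing a class and instantiating it inside the evaluation function can be sketched without scikit-learn. The two toy classifiers below (`AlwaysZero` and `MajorityClass`) and the helper `evaluate` are hypothetical, written only to mimic the `fit`/`predict` interface used above.

```python
import numpy as np

class AlwaysZero:
    """Hypothetical classifier that predicts class 0 for every example."""
    def fit(self, x, y):
        return self

    def predict(self, x):
        return np.zeros(len(x), dtype=int)

class MajorityClass:
    """Hypothetical classifier that predicts the most frequent training label."""
    def fit(self, x, y):
        labels, counts = np.unique(y, return_counts=True)
        self.label_ = labels[np.argmax(counts)]
        return self

    def predict(self, x):
        return np.full(len(x), self.label_)

def evaluate(model_class, x_train, y_train, x_test, y_test):
    # model_class is the class itself; calling it builds a fresh instance.
    m = model_class()
    m.fit(x_train, y_train)
    return np.mean(m.predict(x_test) == y_test)

# Tiny made-up dataset; the features are ignored by both toy classifiers.
x_tr = np.zeros((4, 2))
y_tr = np.array([1, 1, 1, 0])
x_te = np.zeros((3, 2))
y_te = np.array([1, 1, 0])

results = {cls.__name__: evaluate(cls, x_tr, y_tr, x_te, y_te)
           for cls in (AlwaysZero, MajorityClass)}

for name, acc in results.items():
    print('{}: accuracy = {:6.4f}'.format(name, acc))
```

One consequence of this design is that every model runs with its default hyperparameters. To try other settings, a callable such as `functools.partial(KNeighborsClassifier, n_neighbors=3)` could be passed in place of the bare class.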
