# Optimization Mini Project

## Introduction

The Modified National Institute of Standards and Technology (MNIST) dataset is a large database of images of handwritten digits, split into 60,000 training images and 10,000 test images. Each image is a 28x28 matrix in which every element holds a single grayscale value. The database is frequently used as a first benchmark for image classification models. The goal of this project is to design a convolutional neural network that classifies the MNIST test dataset with 99% or greater accuracy.

## Model Building

In [102]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import random
%matplotlib inline

In [103]:
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()

# normalize grayscale values
x_train, x_test = x_train / 255.0, x_test / 255.0

ndata_train = x_train.shape[0] # = 60000
ndata_test = x_test.shape[0] # = 10000

In [104]:
# Conv2D expects a trailing channel dimension on each image (1 channel for grayscale)
x_train2 = x_train.reshape((ndata_train,28,28,1))
x_test2 = x_test.reshape((ndata_test,28,28,1))

xshape = x_train2.shape[1:4] # we only need the image dimensions for the model

### Network Architecture

Convolutional neural networks (CNNs) are often used for tasks that take images as input. They work by applying a set of filters to the image. Each filter is a small matrix of weights, which can be thought of as a small image patch that the network looks for. As a filter is moved across the image, a similarity score (the dot product of the filter and the portion of the image it covers) is computed at each position. An example of a filter is seen below in Figure 1.
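The sliding dot product described above can be sketched in a few lines of NumPy. This is only an illustration, not the Keras implementation: the function name `slide_filter`, the 5x5 toy image, and the 3x3 filter are all made up for the example, and no padding and a stride of 1 are assumed.

```python
import numpy as np

def slide_filter(image, kernel):
    """Return the dot product of `kernel` with every patch of `image` (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    scores = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            patch = image[r:r + kh, c:c + kw]
            scores[r, c] = np.sum(patch * kernel)  # similarity score at this position
    return scores

# hypothetical 5x5 "image" with a vertical stroke in column 2
toy_image = np.zeros((5, 5))
toy_image[:, 2] = 1.0

# hypothetical 3x3 filter that responds to vertical strokes
vertical_filter = np.array([[0.0, 1.0, 0.0],
                            [0.0, 1.0, 0.0],
                            [0.0, 1.0, 0.0]])

scores = slide_filter(toy_image, vertical_filter)
print(scores)
```

The score is largest where the filter lines up with the stroke and zero elsewhere, which is the sense in which the output is a map of filter-image similarity.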

Applying many filters produces significantly more data than the original image contained, which can increase processing times for the network. To address this, we can introduce a max pooling layer, which functions similarly to a filter. The max pool window moves across the similarity score matrix, but rather than computing a dot product, it keeps only the largest value inside the window to represent that whole region. As a result, we decrease the number of input neurons required by the dense neural network while still identifying the general vicinity of strong filter-image similarities. The output of the max pool layer is passed into a dense neural network to complete the classification. The final output of the model is a vector of 10 probabilities, one for each digit 0-9. The index with the largest probability is the number predicted by the network. A visualization of this process can be seen below in Figure 2.
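As a rough illustration of the pooling step and of the final argmax, here is a minimal NumPy sketch. The function `max_pool`, the 4x4 score matrix, and the probability vector are hypothetical values chosen for the example and are not taken from the trained model.

```python
import numpy as np

def max_pool(scores, pool=2, stride=2):
    """Keep the largest value in each pool x pool window of `scores`."""
    out_h = (scores.shape[0] - pool) // stride + 1
    out_w = (scores.shape[1] - pool) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            window = scores[r * stride:r * stride + pool,
                            c * stride:c * stride + pool]
            pooled[r, c] = window.max()
    return pooled

# hypothetical 4x4 matrix of similarity scores
toy_scores = np.array([[1.0, 3.0, 2.0, 0.0],
                       [4.0, 2.0, 1.0, 1.0],
                       [0.0, 1.0, 5.0, 2.0],
                       [2.0, 0.0, 3.0, 6.0]])
pooled = max_pool(toy_scores)
print(pooled)

# hypothetical output of the final softmax layer: one probability per digit 0-9
toy_probs = np.array([0.01, 0.02, 0.05, 0.02, 0.10, 0.03, 0.02, 0.70, 0.03, 0.02])
predicted_digit = int(toy_probs.argmax())
print(predicted_digit)
```

With a 2x2 window and a stride of 2 the windows do not overlap, so each spatial dimension is halved (rounding down for odd sizes).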

Using this information, we developed our own CNN with the goal of achieving 99% or greater classification accuracy on the MNIST test dataset. We began by fitting different combinations of convolutional and max pool layers to the training set of 60,000 images. A validation split of 0.2 held out 20% of the training images, so that overfitting would show up as a gap between training and validation accuracy. The arrangement with the highest validation accuracy was two convolutional layers, a max pool layer, a third convolutional layer, and a final max pool layer.

The first convolutional layer is composed of 20 5x5 filters and a ReLU activation. The second convolutional layer has 20 2x2 filters and a ReLU activation. The first max pool layer uses a pool size of 2x2 with a stride of 2 to prevent overlap. The final convolutional layer contains 50 2x2 filters and a ReLU activation. The final max pool layer once again uses a pool size of 2x2 and a stride of 2. Its output is flattened and fed through a dropout layer (rate 0.4), a dense layer of 128 ReLU units, a second dropout layer (rate 0.2), and a dense layer of 64 ReLU units with L1 regularization, before a 10-unit softmax layer outputs the final classification of the image.

Once the model achieved greater than 99% validation accuracy, we retrained the same model using the entire training set. A summary of our group's model can be seen below in Figure 3. Additionally, Figure 4 displays a completed CNN similar to our own, which helps to visualize the model.
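The layer sizes that result from this architecture can be worked out by hand. The sketch below does the arithmetic, assuming the Keras defaults of no padding (`'valid'`) and a stride of 1 for the convolutional layers. The helper names `conv_output`, `pool_output`, and `conv_params` are made up for this example.

```python
def conv_output(size, kernel):
    """Side length after a convolution with no padding and stride 1."""
    return size - kernel + 1

def pool_output(size, pool=2, stride=2):
    """Side length after max pooling with no padding."""
    return (size - pool) // stride + 1

def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

sizes = []
params = []

size = 28                                   # MNIST images are 28x28x1
size = conv_output(size, 5)                 # 20 filters of 5x5
sizes.append(size)
params.append(conv_params(5, 1, 20))

size = conv_output(size, 2)                 # 20 filters of 2x2
sizes.append(size)
params.append(conv_params(2, 20, 20))

size = pool_output(size)                    # 2x2 max pool, stride 2
sizes.append(size)

size = conv_output(size, 2)                 # 50 filters of 2x2
sizes.append(size)
params.append(conv_params(2, 20, 50))

size = pool_output(size)                    # 2x2 max pool, stride 2
sizes.append(size)

flattened = size * size * 50                # inputs to the dense layers

print(sizes)      # side length after each layer
print(params)     # trainable parameters in each convolutional layer
print(flattened)
```

These values agree with the output shapes and parameter counts that `NNmodel.summary()` reports for the convolutional, pooling, and flatten layers.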

In [105]:
NNmodel = tf.keras.models.Sequential()
NNmodel.add(tf.keras.layers.Conv2D(filters = 20, kernel_size = (5,5), activation = tf.nn.relu, input_shape = xshape))
NNmodel.add(tf.keras.layers.Conv2D(filters = 20, kernel_size = (2,2), activation = tf.nn.relu))
NNmodel.add(tf.keras.layers.MaxPooling2D(pool_size = (2, 2), strides = 2))
NNmodel.add(tf.keras.layers.Conv2D(filters = 50, kernel_size = (2,2), activation = tf.nn.relu))
NNmodel.add(tf.keras.layers.MaxPooling2D(pool_size = (2,2), strides = 2))
NNmodel.add(tf.keras.layers.Flatten())
NNmodel.add(tf.keras.layers.Dropout(rate = 0.4))
NNmodel.add(tf.keras.layers.Dense(128,activation=tf.nn.relu))
NNmodel.add(tf.keras.layers.Dropout(rate = 0.2))
NNmodel.add(tf.keras.layers.Dense(64,activation=tf.nn.relu, kernel_regularizer = tf.keras.regularizers.l1(0.0005)))
NNmodel.add(tf.keras.layers.Dense(10,activation=tf.nn.softmax))

In [106]:
NNmodel.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

NNmodel.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_12 (Conv2D)          (None, 24, 24, 20)        520       
                                                                 
 conv2d_13 (Conv2D)          (None, 23, 23, 20)        1620      
                                                                 
 max_pooling2d_8 (MaxPooling  (None, 11, 11, 20)       0         
 2D)                                                             
                                                                 
 conv2d_14 (Conv2D)          (None, 10, 10, 50)        4050      
                                                                 
 max_pooling2d_9 (MaxPooling  (None, 5, 5, 50)         0         
 2D)                                                             
                                                                 
 flatten_4 (Flatten)         (None, 1250)             

In [107]:
# save model
# NNmodel.save('my_model')

In [108]:
# no need to re-run: the trained model is saved as 'my_model' and reloaded below

# fit model to entire training set
# NNmodel.fit(x_train2,y_train,epochs=20,batch_size=200)

In [109]:
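# evaluate on MNIST test set
# note: with the fit call above commented out, NNmodel is untrained in this session,
# so the accuracy here is near chance (about 10%); the saved, trained model is evaluated below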
NNmodel.evaluate(x_test2,y_test)



[2.667048215866089, 0.10599999874830246]

## Extract classified and misclassified image indices

In [110]:
# reload saved model
new_model = tf.keras.models.load_model('my_model')
new_model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_3 (Conv2D)           (None, 24, 24, 20)        520       
                                                                 
 conv2d_4 (Conv2D)           (None, 23, 23, 20)        1620      
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 11, 11, 20)       0         
 2D)                                                             
                                                                 
 conv2d_5 (Conv2D)           (None, 10, 10, 50)        4050      
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 5, 5, 50)         0         
 2D)                                                             
                                                                 
 flatten_1 (Flatten)         (None, 1250)             

In [111]:
# confirm that saved model yields same results on MNIST test set
new_model.evaluate(x_test2, y_test)



[0.04142482578754425, 0.9934999942779541]

In [112]:
# obtain vector of probabilities for each image
# (use the reshaped array with the channel dimension, matching the evaluate call above)
predicted = new_model.predict(x_test2)
predicted




array([[2.0668310e-10, 5.1234097e-06, 1.9271326e-06, ..., 9.9998558e-01,
        9.4626749e-08, 5.5025171e-06],
       [1.6515084e-09, 1.5389305e-07, 9.9999988e-01, ..., 3.1046592e-11,
        5.8142628e-09, 8.2657351e-13],
       [2.2984310e-08, 9.9993062e-01, 2.9836983e-06, ..., 4.2366764e-06,
        1.7984901e-06, 1.6960884e-07],
       ...,
       [3.4410442e-15, 4.7695217e-08, 3.5074996e-12, ..., 2.7821939e-10,
        1.4908559e-09, 1.0875946e-09],
       [1.1795041e-06, 4.8979149e-10, 2.9072463e-11, ..., 1.9401449e-11,
        1.1148242e-04, 4.8751126e-08],
       [9.7171311e-09, 1.5760673e-08, 5.0838898e-09, ..., 7.2510586e-15,
        3.6665288e-08, 3.0370886e-10]], dtype=float32)

In [113]:
def get_guess(predicted):
    '''this method takes all predicted probabilities and returns the predicted classification for each image'''
    # index of the largest probability in each row is the predicted digit
    return predicted.argmax(axis=1)

guesses = get_guess(predicted)

In [114]:
# get indexes of correctly classified images
correct = np.where(np.equal(guesses, y_test))
correct = correct[0].tolist()

In [115]:
# view examples of correctly classified numbers (function is defined below)
#graph, pred, actual = correct_plot()

# print('Predicted number:', pred)
# print('Actual number:', actual)

In [116]:
# get indexes of misclassified images
incorrect = np.where(np.not_equal(guesses, y_test))
incorrect = incorrect[0].tolist()
    

In [117]:
# view examples of misclassified numbers
# graph, pred, actual = incorrect_plot()

# print('Predicted number:', pred)
# print('Actual number:', actual)

In [118]:
print('Correct:', len(correct))
print('Incorrect:', len(incorrect))

Correct: 9935
Incorrect: 65
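To see which digits are confused with one another, the misclassified images can be tallied by (actual, predicted) pair. The sketch below shows one way to do this. The function name `count_mixups` and the toy label arrays are hypothetical and do not reflect the model's actual errors.

```python
from collections import Counter

import numpy as np

def count_mixups(actual, predicted):
    """Count how often each (actual, predicted) pair occurs among misclassified images."""
    pairs = Counter()
    for a, p in zip(actual, predicted):
        if a != p:
            pairs[(int(a), int(p))] += 1
    return pairs

# hypothetical labels, for illustration only
toy_actual = np.array([4, 9, 4, 7, 3, 4])
toy_predicted = np.array([9, 9, 9, 7, 5, 4])

mixups = count_mixups(toy_actual, toy_predicted)
print(mixups.most_common())
```

In this notebook the same function could be called as `count_mixups(y_test, guesses)` to rank the most frequent confusions on the test set.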


### How does the model perform on the MNIST test set? Give a detailed response



### What are common mix-ups of numbers in the network? Why? Is it possible to get to 100% accuracy?

## Anvil Uplink

In [119]:
# upload sample file for testing functions
file = pd.read_csv('sample_data.csv', header = None)

In [120]:
import anvil.server
anvil.server.connect("QFOHMQDBBF35GJCMIEXXISPD-QRBGHVVVOTAK7STL")

In [121]:
df = pd.DataFrame(file)

In [122]:
df.columns

Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27],
           dtype='int64')

In [123]:
import anvil.media

@anvil.server.callable
def image_classifier(file):
    '''this method takes an uploaded csv and classifies it 0-9'''
    with anvil.media.TempFile(file) as filename:
        file = pd.read_csv(filename, header = None)

    # first determine if grayscale values are normalized (any value above 1 means raw 0-255 pixels)
    if file.to_numpy().max() > 1:
        file = file / 255.0

    # convert to np array, reshape to make tf happy
    image = file.to_numpy().reshape(1, 28, 28, 1)

    # load saved model
    model = tf.keras.models.load_model('my_model')

    # get probabilities of each number, choose index of highest one
    predicted = model.predict(image).argmax()

    # return a plain Python int rather than a numpy integer
    return int(predicted)
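The preprocessing inside `image_classifier` can be tried out without Anvil or the saved model. The sketch below isolates that step in a hypothetical helper, `prepare_image`, which accepts anything `np.asarray` can convert (including a pandas DataFrame). It assumes, like the function above, that any value above 1 means the pixels are on the raw 0-255 scale. The shape check is an addition for the example and is not in the original function.

```python
import numpy as np

def prepare_image(table):
    """Scale a 28x28 table of pixels to [0, 1] and reshape it to a batch of one image."""
    values = np.asarray(table, dtype=float)
    if values.shape != (28, 28):
        raise ValueError(f"expected a 28x28 table, got {values.shape}")
    # assumption: any value above 1 means the pixels are raw 0-255 grayscale
    if values.max() > 1:
        values = values / 255.0
    # (batch, height, width, channels), as the convolutional model expects
    return values.reshape(1, 28, 28, 1)

# hypothetical unnormalized image: a short vertical stroke
toy_pixels = np.zeros((28, 28))
toy_pixels[10:18, 14] = 255

batch = prepare_image(toy_pixels)
print(batch.shape, batch.max())
```

Rejecting wrongly sized uploads early gives the user a clearer error than a failed reshape would.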


In [124]:
import anvil.mpl_util

@anvil.server.callable
def image_plot(file):
    '''this method plots the user's uploaded csv'''
    with anvil.media.TempFile(file) as filename:
        file = pd.read_csv(filename, header = None)
   
    # convert to np array
    image = file.to_numpy()

    # plot
    plt.pcolor(1 - image[::-1, :], cmap = 'gray')
    plt.axis('off')

    return anvil.mpl_util.plot_image()

In [125]:
@anvil.server.callable
def incorrect_plot():
    '''this method returns a random instance of an incorrectly labeled image along with pred/actual value'''
    # choose random instance of incorrectly labeled image
    i = random.choice(incorrect)

    # plot
    plt.pcolor(1-x_test[i,::-1,:] , cmap = 'gray' )
    plt.axis('off')

    # get predicted and actual number
    pred = guesses[i]
    actual = y_test[i] 

    return anvil.mpl_util.plot_image(), pred, actual

In [126]:
@anvil.server.callable
def correct_plot():
    '''this method returns a random instance of a correctly labeled image along with pred/actual value'''
    # choose random instance of correctly labeled image
    i = random.choice(correct)

    # plot
    plt.pcolor(1-x_test[i,::-1,:] , cmap = 'gray' )
    plt.axis('off')

    # get predicted and actual number
    pred = guesses[i]
    actual = y_test[i] 

    return anvil.mpl_util.plot_image(), pred, actual