<h3>Data Preparation</h3>

In [2]:
#Importing tensorflow to load mnist database
import tensorflow as tf
from sklearn.neural_network import MLPClassifier

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

<h3>Building the classifier</h3>
<div>https://scikit-learn.org/stable/modules/neural_networks_supervised.html</div>
<div>https://scikitlearn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier.fit</div>

In [3]:
import warnings
from sklearn.preprocessing import StandardScaler
from sklearn.exceptions import ConvergenceWarning


#flatten like in the HW
x_train_flatten = x_train.reshape(x_train.shape[0],x_train.shape[1]*x_train.shape[2])
x_test_flatten = x_test .reshape(x_test.shape[0],x_test.shape[1]*x_test.shape[2])

#scale the inputs
scaler = StandardScaler()
scaler.fit(x_train_flatten)
x_train_ready = scaler.transform(x_train_flatten)
x_test_ready = scaler.transform(x_test_flatten)

#Using stochastic gradient descent
clf = MLPClassifier(solver='sgd', alpha=1e-5,hidden_layer_sizes=(50,),max_iter=8
                    , verbose=True, random_state=1,
                    learning_rate_init=.1)

#https://scikit-learn.org/stable/auto_examples/neural_networks/plot_mnist_filters.html
#Catching the warning to get rid of the error message
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=ConvergenceWarning,
                            module="sklearn")
    clf.fit(x_train_ready,y_train)

Iteration 1, loss = 0.31342783
Iteration 2, loss = 0.25560445
Iteration 3, loss = 0.24057617
Iteration 4, loss = 0.24962298
Iteration 5, loss = 0.26456192
Iteration 6, loss = 0.20037670
Iteration 7, loss = 0.15379867
Iteration 8, loss = 0.13941311


In [3]:
print("Training set score: %f" % clf.score(x_train_ready, y_train))
print("Test set score: %f" % clf.score(x_test_ready, y_test))

Training set score: 0.964217
Test set score: 0.947500


I decided to stick with 2 hidden layers and max iterations of 8 for my network (the loss seemed to start fluctuating around there, but below I made a loop that optimizes the hidden layer sizes for the best output in intervals of 2 between 5 and 100. Might take a bit of time to run but should result in optimal layer size. Commenteded out to save resources when you run the entire project, but when I ran it it took around ~5min and got a value of 91.

In [4]:
def optimize():
    s = 0 
    maxScore = 0
    #5<size<51, step sizes of 5
    for i in range(5,101,2):
        print("Fitting for ",i," ...")
        clf = MLPClassifier(solver='sgd', alpha=1e-5,hidden_layer_sizes=(i,),max_iter=8
                 , verbose=False, random_state=1,
                learning_rate_init=.1)
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore", category=ConvergenceWarning,
                                    module="sklearn")
            clf.fit(x_train_ready,y_train)
            score = clf.score(x_test_ready, y_test)
        if(maxScore < score):
            maxScore = score
            s = i
            
    return s

p = optimize()

Fitting for , 5  ...
Fitting for , 7  ...
Fitting for , 9  ...
Fitting for , 11  ...
Fitting for , 13  ...
Fitting for , 15  ...
Fitting for , 17  ...
Fitting for , 19  ...
Fitting for , 21  ...
Fitting for , 23  ...
Fitting for , 25  ...
Fitting for , 27  ...
Fitting for , 29  ...
Fitting for , 31  ...
Fitting for , 33  ...
Fitting for , 35  ...
Fitting for , 37  ...
Fitting for , 39  ...
Fitting for , 41  ...
Fitting for , 43  ...
Fitting for , 45  ...
Fitting for , 47  ...
Fitting for , 49  ...
Fitting for , 51  ...
Fitting for , 53  ...
Fitting for , 55  ...
Fitting for , 57  ...
Fitting for , 59  ...
Fitting for , 61  ...
Fitting for , 63  ...
Fitting for , 65  ...
Fitting for , 67  ...
Fitting for , 69  ...
Fitting for , 71  ...
Fitting for , 73  ...
Fitting for , 75  ...
Fitting for , 77  ...
Fitting for , 79  ...
Fitting for , 81  ...
Fitting for , 83  ...
Fitting for , 85  ...
Fitting for , 87  ...
Fitting for , 89  ...
Fitting for , 91  ...
Fitting for , 93  ...
Fitting for ,

In [6]:
 #Uncomment if you want to do the above calculation yourself (~5min)
print("Optimal values: ",p)

clf = MLPClassifier(solver='sgd', alpha=1e-5,hidden_layer_sizes=(p,),max_iter=8
                    , verbose=True, random_state=1,
                    learning_rate_init=.1)
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=ConvergenceWarning,
                                        module="sklearn")
    clf.fit(x_train_ready,y_train)

print("Training set score: %f" % clf.score(x_train_ready, y_train))
print("Test set score: %f" % clf.score(x_test_ready, y_test))

Optimal values:  91
Iteration 1, loss = 0.29100477
Iteration 2, loss = 0.19626521
Iteration 3, loss = 0.25345731
Iteration 4, loss = 0.28562124
Iteration 5, loss = 0.26906339
Iteration 6, loss = 0.24219481
Iteration 7, loss = 0.19200839
Iteration 8, loss = 0.17674323
Training set score: 0.979800
Test set score: 0.962900


Overall, all the performance didn't change much when I modified other parameters besides layer size. Increasing the iterations past 8 didn't have a big effect either as the loss seemed to hover around a set value. Most of the 'good' models had a score around 0.95 to 0.98.