## Question 1

Download the benchmark dataset, MNIST, from http://yann.lecun.com/exdb/mnist/. Implement multi-class classification for recognizing handwritten digits (also known as multiclass logistic regression; this is simply a feedforward neural network with k output neurons, one for each class, where output neuron o_i returns the probability that the input data point x_j is in class i) and try it on MNIST.
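As a minimal sketch of the output layer described above, the snippet below turns k raw scores into class probabilities with a softmax and scores them with the cross-entropy loss. The three scores are made-up illustrative values, and the function names are this sketch's own, not part of any framework.

```python
import math

def softmax(logits):
    """Turn k raw scores into k probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[true_class])

# Hypothetical scores for a 3-class problem (illustrative values only).
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)

# The predicted class is the output neuron with the highest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)
loss = cross_entropy(probs, true_class=0)
```

For MNIST the same idea applies with k = 10 output neurons, one per digit.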

Comments: There is almost nothing in DL that you need to implement on your own (this is true in general); the software framework (i.e., the DL platform) typically provides implementations of everything discussed in class, such as the learning algorithms, the regularization methods, the cross-validation methods, etc.

Use your favorite deep learning platform. A few candidates:

1.	Marvin from http://marvin.is/ 
2.	Caffe from http://caffe.berkeleyvision.org
3.	TensorFlow from https://www.tensorflow.org
4.	Pylearn2 from http://deeplearning.net/software/pylearn2/
5.	Theano, Torch, Lasagne, etc. See more platforms at http://deeplearning.net/software_links/.

Read the tutorial for your selected platform (e.g., for TensorFlow: https://www.tensorflow.org/tutorials) and try it on MNIST; note that the first few examples in the tutorials are typically on MNIST or other simple image datasets, so you can follow the steps.

Comments: MNIST is a standard dataset for machine learning and deep learning. It is a good idea to try a shallow neural network first (with one output neuron; e.g., for distinguishing the character A from not-A characters) before trying a deep neural network with multiple outputs. Downloading the dataset from other places in preprocessed format is allowed, but practicing how to read the raw dataset prepares you for other new datasets you may be interested in (so please read the MNIST website carefully).
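For reading the raw files, the MNIST site describes the IDX layout: two zero bytes, a data-type byte (0x08 for unsigned bytes), a byte giving the number of dimensions, then each dimension as a 4-byte big-endian integer, followed by the pixel or label bytes. The sketch below parses such a buffer; `parse_idx` is a hypothetical helper name, and the input here is a synthetic buffer rather than a real MNIST file.

```python
import struct

def parse_idx(raw: bytes):
    """Parse an unsigned-byte IDX buffer into (shape, flat list of ints)."""
    zero, dtype_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("not an unsigned-byte IDX buffer")
    # One big-endian 32-bit size per dimension follows the 4-byte magic number.
    shape = struct.unpack(f">{ndim}I", raw[4:4 + 4 * ndim])
    data = raw[4 + 4 * ndim:]
    expected = 1
    for dim in shape:
        expected *= dim
    if len(data) != expected:
        raise ValueError("payload size does not match header")
    return shape, list(data)

# Synthetic example: two 2x2 "images" with pixel values 0..7.
header = struct.pack(">HBB", 0, 0x08, 3) + struct.pack(">3I", 2, 2, 2)
shape, pixels = parse_idx(header + bytes(range(8)))
```

The downloadable files are gzip-compressed, so in practice the bytes would first be read with something like `gzip.open(path, "rb").read()` before being passed to such a parser.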

1.	Try basic minibatch SGD as your learning algorithm. It is recommended to try different initializations, different batch sizes, and different learning rates, in order to get a sense of how to tune the hyperparameters (batch size and learning rate). Remember to create and use a validation dataset! It will be very useful for you to read Chapter 11 of the textbook.

2.	It is recommended to try at least one other optimization method of your choice (SGD with momentum, RMSProp, RMSProp with momentum, AdaGrad, AdaDelta, or Adam) and compare its performance to that of basic minibatch SGD on the MNIST dataset. Which methods you try, and how many you compare, is up to you and depends on the amount of time you have left to complete the assignment. Remember, this is a research course. You may also want to read Chapter 8.
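To make the difference between basic minibatch SGD and SGD with momentum concrete, here is a small sketch on a toy one-parameter regression problem (fitting y = 3x), not on MNIST. The data, learning rate, and function names are illustrative assumptions; setting `momentum=0.0` recovers plain minibatch SGD.

```python
import random

def minibatches(n, batch_size, rng):
    """Yield shuffled index batches that cover range(n) once (one epoch)."""
    order = list(range(n))
    rng.shuffle(order)
    for start in range(0, n, batch_size):
        yield order[start:start + batch_size]

def fit(xs, ys, lr, momentum, batch_size, epochs, seed=0):
    """Fit y ~ w * x by minibatch SGD; momentum=0.0 gives plain SGD."""
    rng = random.Random(seed)
    w, velocity = 0.0, 0.0
    for _ in range(epochs):
        for batch in minibatches(len(xs), batch_size, rng):
            # Gradient of the mean squared error over this batch only.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            # The velocity accumulates a decaying sum of past gradients.
            velocity = momentum * velocity - lr * grad
            w += velocity
    return w

xs = [i / 10 for i in range(1, 21)]   # 0.1 ... 2.0
ys = [3.0 * x for x in xs]            # noise-free targets, true w = 3
w_plain = fit(xs, ys, lr=0.05, momentum=0.0, batch_size=5, epochs=100)
w_mom = fit(xs, ys, lr=0.05, momentum=0.9, batch_size=5, epochs=100)
```

Both variants should recover a weight close to 3 here; on a real network the interesting comparison is how quickly the validation loss falls for each setting.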

For this question we experimented with the following approaches:

1. Minibatch SGD with momentum, batch size = 20.
2. Minibatch SGD with momentum, batch size = 100.
3. Adam optimizer, batch size = 20.
4. Adam optimizer, batch size = 100.

We use TensorFlow to load the dataset and build the model. When fitting each model, the `validation_split` parameter is set to 0.2 and `epochs` to 10.
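The arithmetic behind these settings can be checked with a short sketch. It assumes the standard 60,000-image MNIST training set; as far as the Keras documentation describes it, `validation_split` holds out the last fraction of the arrays before shuffling. The variable names are this sketch's own.

```python
import math

n_train_total = 60_000      # size of the MNIST training set
validation_split = 0.2
epochs = 10

n_val = round(n_train_total * validation_split)   # samples held out for validation
n_fit = n_train_total - n_val                     # samples actually used for updates

for batch_size in (20, 100):
    steps = math.ceil(n_fit / batch_size)
    print(f"batch_size={batch_size}: {steps} steps/epoch, {steps * epochs} updates in total")
```

The resulting 2400 and 480 steps per epoch match the progress counters in the training logs of this notebook, so the batch-size-20 runs perform five times as many parameter updates as the batch-size-100 runs over the same 10 epochs.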


In [1]:
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import classification_report
from sklearn.utils import class_weight

## Data preparation and model building

In [2]:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)
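Note that `tf.keras.utils.normalize` rescales to unit L2 norm along the chosen axis rather than dividing by 255. With `axis=1` on an array of shape (N, 28, 28), the norm is, as I read it, taken over the row index, so each pixel column of each image ends up with unit length. The pure-Python sketch below mimics that on a toy 2x2 "image"; `l2_normalize_columns` is a hypothetical helper, and all-zero columns are assumed to stay at zero.

```python
import math

def l2_normalize_columns(image):
    """Divide every column of a 2-D list by its Euclidean norm.

    All-zero columns are left at zero to avoid dividing by zero.
    """
    n_rows, n_cols = len(image), len(image[0])
    norms = [
        math.sqrt(sum(image[r][c] ** 2 for r in range(n_rows)))
        for c in range(n_cols)
    ]
    return [
        [image[r][c] / norms[c] if norms[c] else 0.0 for c in range(n_cols)]
        for r in range(n_rows)
    ]

# Toy 2x2 "image" (illustrative values): column 0 has norm 5, column 1 is blank.
toy = [[3, 0], [4, 0]]
normalized = l2_normalize_columns(toy)
```

A common alternative is to scale pixels to [0, 1] with `x_train / 255.0`; which choice works better here was not tested in this notebook.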

In [3]:
tf.keras.backend.clear_session()

model = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation=tf.nn.relu, name='L1'),
            tf.keras.layers.Dense(64, activation=tf.nn.relu, name='L2'),
            tf.keras.layers.Dense(32, activation=tf.nn.relu, name='L3'),
            tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='L4')
        ])
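As a quick sanity check on the size of this network, the sketch below counts its trainable parameters from the layer widths alone, assuming 28x28 inputs flattened to 784 values. It should agree with what `model.summary()` reports once the model has been built.

```python
layer_sizes = [28 * 28, 128, 64, 32, 10]   # flattened input, then units in L1..L4

def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

for name, count in zip(["L1", "L2", "L3", "L4"], per_layer):
    print(f"{name}: {count} parameters")
print(f"total: {total} parameters")
```

Roughly 90% of the parameters sit in the first layer, which is typical for a fully connected network on flattened images.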

## Minibatch SGD algorithm with momentum (batch size = 20)

In [4]:
sgd = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

# The output layer already applies softmax, so the loss receives probabilities, not logits.
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False), optimizer=sgd, metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=20, validation_split=0.2)

Epoch 1/10
2400/2400 ━━━━━━━━━━━━━━━━━━━━ 2s 602us/step - accuracy: 0.7844 - loss: 0.7022 - val_accuracy: 0.9482 - val_loss: 0.1693
Epoch 2/10
2400/2400 ━━━━━━━━━━━━━━━━━━━━ 1s 576us/step - accuracy: 0.9566 - loss: 0.1425 - val_accuracy: 0.9553 - val_loss: 0.1443
Epoch 3/10
2400/2400 ━━━━━━━━━━━━━━━━━━━━ 1s 525us/step - accuracy: 0.9694 - loss: 0.0966 - val_accuracy: 0.9652 - val_loss: 0.1202
Epoch 4/10
2400/2400 ━━━━━━━━━━━━━━━━━━━━ 1s 536us/step - accuracy: 0.9774 - loss: 0.0681 - val_accuracy: 0.9613 - val_loss: 0.1264
Epoch 5/10
2400/2400 ━━━━━━━━━━━━━━━━━━━━ 1s 551us/step - accuracy: 0.9810 - loss: 0.0581 - val_accuracy: 0.9697 - val_loss: 0.1076
Epoch 6/10
2400/2400 ━━━━━━━━━━━━━━━━━━━━ 1s 530us/step - accuracy: 0.9853 - loss: 0.0432 - val_accuracy: 0.9681 - val_loss: 0.1145
Epoch 7/10

<keras.src.callbacks.history.History at 0x321815f60>

In [5]:
eval_loss1, eval_acc1 = model.evaluate(x_test, y_test)

print('Test accuracy: ', eval_acc1)
print('Test loss: ', eval_loss1)

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 319us/step - accuracy: 0.9661 - loss: 0.1433
Test accuracy:  0.9717000126838684
Test loss:  0.11572042107582092


In [6]:
y_pred = np.argmax(model.predict(x_test), axis=1)
print(classification_report(y_test, y_pred))

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 375us/step
              precision    recall  f1-score   support

           0       0.98      0.99      0.98       980
           1       0.98      0.99      0.99      1135
           2       0.96      0.98      0.97      1032
           3       0.93      0.98      0.96      1010
           4       0.97      0.97      0.97       982
           5       0.98      0.94      0.96       892
           6       0.98      0.98      0.98       958
           7       0.98      0.97      0.98      1028
           8       0.95      0.97      0.96       974
           9       0.98      0.93      0.96      1009

    accuracy                           0.97     10000
   macro avg       0.97      0.97      0.97     10000
weighted avg       0.97      0.97      0.97     10000



## Minibatch SGD algorithm with momentum (batch size = 100)

In [7]:
tf.keras.backend.clear_session()
sgd = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
model2 = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation=tf.nn.relu, name='L1'),
            tf.keras.layers.Dense(64, activation=tf.nn.relu, name='L2'),
            tf.keras.layers.Dense(32, activation=tf.nn.relu, name='L3'),
            tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='L4')

    ])
# The output layer already applies softmax, so the loss receives probabilities, not logits.
model2.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False), optimizer=sgd, metrics=['accuracy'])
model2.fit(x_train, y_train, epochs=10, batch_size=100, validation_split=0.2)

Epoch 1/10
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.5852 - loss: 1.3260 - val_accuracy: 0.9102 - val_loss: 0.3052
Epoch 2/10
480/480 ━━━━━━━━━━━━━━━━━━━━ 0s 863us/step - accuracy: 0.9191 - loss: 0.2824 - val_accuracy: 0.9378 - val_loss: 0.2153
Epoch 3/10
480/480 ━━━━━━━━━━━━━━━━━━━━ 0s 881us/step - accuracy: 0.9411 - loss: 0.1997 - val_accuracy: 0.9487 - val_loss: 0.1803
Epoch 4/10
480/480 ━━━━━━━━━━━━━━━━━━━━ 0s 860us/step - accuracy: 0.9529 - loss: 0.1559 - val_accuracy: 0.9536 - val_loss: 0.1577
Epoch 5/10
480/480 ━━━━━━━━━━━━━━━━━━━━ 0s 909us/step - accuracy: 0.9616 - loss: 0.1274 - val_accuracy: 0.9606 - val_loss: 0.1358
Epoch 6/10
480/480 ━━━━━━━━━━━━━━━━━━━━ 0s 921us/step - accuracy: 0.9704 - loss: 0.1028 - val_accuracy: 0.9603 - val_loss: 0.1359
Epoch 7/10
480/480

<keras.src.callbacks.history.History at 0x35f031f30>

In [8]:
eval_loss2, eval_acc2 = model2.evaluate(x_test, y_test)

print('Test accuracy: ', eval_acc2)
print('Test loss: ', eval_loss2)

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 318us/step - accuracy: 0.9643 - loss: 0.1202
Test accuracy:  0.9688000082969666
Test loss:  0.10390076041221619


In [9]:
y_pred = np.argmax(model2.predict(x_test), axis=1)
print(classification_report(y_test, y_pred))

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 408us/step
              precision    recall  f1-score   support

           0       0.97      0.99      0.98       980
           1       0.98      0.99      0.98      1135
           2       0.98      0.97      0.97      1032
           3       0.96      0.97      0.96      1010
           4       0.98      0.95      0.96       982
           5       0.96      0.98      0.97       892
           6       0.97      0.97      0.97       958
           7       0.97      0.96      0.96      1028
           8       0.96      0.96      0.96       974
           9       0.95      0.95      0.95      1009

    accuracy                           0.97     10000
   macro avg       0.97      0.97      0.97     10000
weighted avg       0.97      0.97      0.97     10000



## Adam optimizer (batch size = 20)

In [10]:
tf.keras.backend.clear_session()

model3 = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation=tf.nn.relu, name='L1'),
            tf.keras.layers.Dense(64, activation=tf.nn.relu, name='L2'),
            tf.keras.layers.Dense(32, activation=tf.nn.relu, name='L3'),
            tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='L4')
    ])

# The output layer already applies softmax, so the loss receives probabilities, not logits.
model3.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False), optimizer='adam', metrics=['accuracy'])
model3.fit(x_train, y_train, batch_size=20, epochs=10, validation_split=0.2)

Epoch 1/10
2400/2400 ━━━━━━━━━━━━━━━━━━━━ 3s 875us/step - accuracy: 0.8367 - loss: 0.5249 - val_accuracy: 0.9561 - val_loss: 0.1402
Epoch 2/10
2400/2400 ━━━━━━━━━━━━━━━━━━━━ 2s 796us/step - accuracy: 0.9613 - loss: 0.1257 - val_accuracy: 0.9615 - val_loss: 0.1213
Epoch 3/10
2400/2400 ━━━━━━━━━━━━━━━━━━━━ 2s 767us/step - accuracy: 0.9775 - loss: 0.0737 - val_accuracy: 0.9689 - val_loss: 0.1078
Epoch 4/10
2400/2400 ━━━━━━━━━━━━━━━━━━━━ 2s 806us/step - accuracy: 0.9813 - loss: 0.0565 - val_accuracy: 0.9638 - val_loss: 0.1193
Epoch 5/10
2400/2400 ━━━━━━━━━━━━━━━━━━━━ 2s 771us/step - accuracy: 0.9843 - loss: 0.0481 - val_accuracy: 0.9722 - val_loss: 0.1078
Epoch 6/10
2400/2400 ━━━━━━━━━━━━━━━━━━━━ 2s 787us/step - accuracy: 0.9891 - loss: 0.0344 - val_accuracy: 0.9673 - val_loss: 0.1289
Epoch 7/10

<keras.src.callbacks.history.History at 0x3600a17b0>

In [11]:
eval_loss3, eval_acc3 = model3.evaluate(x_test, y_test)

print('Test accuracy: ', eval_acc3)
print('Test loss: ', eval_loss3)

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 319us/step - accuracy: 0.9629 - loss: 0.1731
Test accuracy:  0.9689000248908997
Test loss:  0.14368994534015656


In [12]:
y_pred = np.argmax(model3.predict(x_test), axis=1)
print(classification_report(y_test, y_pred))

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 371us/step
              precision    recall  f1-score   support

           0       0.99      0.98      0.99       980
           1       0.99      0.98      0.99      1135
           2       0.98      0.97      0.98      1032
           3       0.93      0.99      0.96      1010
           4       0.99      0.94      0.96       982
           5       0.98      0.96      0.97       892
           6       0.96      0.99      0.97       958
           7       0.99      0.95      0.97      1028
           8       0.93      0.98      0.95       974
           9       0.95      0.95      0.95      1009

    accuracy                           0.97     10000
   macro avg       0.97      0.97      0.97     10000
weighted avg       0.97      0.97      0.97     10000



## Adam optimizer (batch size = 100)

In [13]:
tf.keras.backend.clear_session()

model4 = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation=tf.nn.relu, name='L1'),
            tf.keras.layers.Dense(64, activation=tf.nn.relu, name='L2'),
            tf.keras.layers.Dense(32, activation=tf.nn.relu, name='L3'),
            tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='L4')
    ])

# The output layer already applies softmax, so the loss receives probabilities, not logits.
model4.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False), optimizer='adam', metrics=['accuracy'])
model4.fit(x_train, y_train, batch_size=100, epochs=10, validation_split=0.2)

Epoch 1/10
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.7634 - loss: 0.8087 - val_accuracy: 0.9461 - val_loss: 0.1919
Epoch 2/10
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9508 - loss: 0.1638 - val_accuracy: 0.9586 - val_loss: 0.1402
Epoch 3/10
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9677 - loss: 0.1105 - val_accuracy: 0.9632 - val_loss: 0.1171
Epoch 4/10
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9766 - loss: 0.0799 - val_accuracy: 0.9640 - val_loss: 0.1192
Epoch 5/10
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9812 - loss: 0.0638 - val_accuracy: 0.9693 - val_loss: 0.1038
Epoch 6/10
480/480 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9839 - loss: 0.0516 - val_accuracy: 0.9697 - val_loss: 0.1039
Epoch 7/10
480/480 ━━━━━━━

<keras.src.callbacks.history.History at 0x3608d4730>

In [14]:
eval_loss4, eval_acc4 = model4.evaluate(x_test, y_test)

print('Test accuracy: ', eval_acc4)
print('Test loss: ', eval_loss4)

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 345us/step - accuracy: 0.9644 - loss: 0.1246
Test accuracy:  0.97079998254776
Test loss:  0.10548028349876404


In [15]:
y_pred = np.argmax(model4.predict(x_test), axis=1)
print(classification_report(y_test, y_pred))

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 382us/step
              precision    recall  f1-score   support

           0       0.98      0.99      0.99       980
           1       0.99      0.99      0.99      1135
           2       0.97      0.95      0.96      1032
           3       0.93      0.99      0.96      1010
           4       0.97      0.97      0.97       982
           5       0.98      0.95      0.96       892
           6       0.97      0.98      0.98       958
           7       0.97      0.97      0.97      1028
           8       0.96      0.97      0.97       974
           9       0.98      0.95      0.96      1009

    accuracy                           0.97     10000
   macro avg       0.97      0.97      0.97     10000
weighted avg       0.97      0.97      0.97     10000
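To compare the four runs side by side, the sketch below collects the test accuracy and test loss printed by the evaluation cells above (rounded to four decimals) and ranks them. The dictionary keys are labels chosen for this sketch.

```python
# (test accuracy, test loss) copied from the evaluation outputs above.
results = {
    "SGD + momentum, batch 20":  (0.9717, 0.1157),
    "SGD + momentum, batch 100": (0.9688, 0.1039),
    "Adam, batch 20":            (0.9689, 0.1437),
    "Adam, batch 100":           (0.9708, 0.1055),
}

best_acc = max(results, key=lambda name: results[name][0])
best_loss = min(results, key=lambda name: results[name][1])
accuracies = [acc for acc, _ in results.values()]
spread = max(accuracies) - min(accuracies)

for name, (acc, loss) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:<26} accuracy={acc:.4f}  loss={loss:.4f}")
print(f"accuracy spread across runs: {spread:.4f}")
```

All four configurations land within about 0.3 percentage points of each other. Since each was trained once with a single random initialization, it seems likely (though untested here) that differences this small are within run-to-run variation, so repeating each run with several seeds would be needed before ranking the optimizers.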

