## Question 1

Download the benchmark dataset MNIST from http://yann.lecun.com/exdb/mnist/. Implement multi-class classification for recognizing handwritten digits (also known as multiclass logistic regression; this is simply a feedforward neural network with k output neurons, one per class, where output neuron $o_i$ returns the probability that the input data point $x_j$ is in class i) and try it on MNIST.
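
As a small illustration of the output layer described above, the sketch below computes softmax probabilities and the cross-entropy loss for one input in NumPy. The function names `softmax` and `cross_entropy` and the example scores are made up for illustration; they are not part of the assignment or of any framework.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of k raw scores into k probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def cross_entropy(probs, true_class):
    """Negative log-probability assigned to the correct class."""
    return -np.log(probs[true_class])

# Hypothetical scores for a 3-class problem (MNIST would have k = 10).
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)                    # one probability per output neuron
predicted_class = int(np.argmax(probs))    # class with the highest probability
loss = cross_entropy(probs, true_class=0)  # small when probs[0] is close to 1
```

Training the network amounts to adjusting the weights that produce `scores` so that this loss, averaged over the training set, goes down.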

Comments: You rarely need to implement anything in DL on your own (this is true in general); the software framework (i.e., the DL platform) typically provides implementations of everything discussed in class, such as the learning algorithms, the regularization methods, the cross-validation methods, etc.

Use your favorite deep learning platform. A few candidates:

1.	Marvin from http://marvin.is/ 
2.	Caffe from http://caffe.berkeleyvision.org
3.	TensorFlow from https://www.tensorflow.org
4.	Pylearn2 from http://deeplearning.net/software/pylearn2/
5.	Theano, Torch, Lasagne, etc. See more platforms at http://deeplearning.net/software_links/.

Read the tutorial for your selected platform (e.g., for TensorFlow: https://www.tensorflow.org/tutorials) and try it on MNIST; note that the first few examples in the tutorials are typically on MNIST or other simple image datasets, so you can follow the steps.

Comments: MNIST is a standard dataset for machine learning and deep learning. It is good to try it first on a shallow neural network (with one output neuron; e.g., for distinguishing the character A from non-A characters) before trying a deep neural network with multiple outputs. Downloading the dataset in preprocessed format from other places is allowed, but practicing how to read the dataset prepares you for other new datasets you may be interested in (so please read the MNIST website carefully).
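
For reading the raw files, the following is a minimal sketch of a parser for the IDX format described on the MNIST website, assuming unsigned-byte data (which is what the MNIST image and label files contain). The helper names `parse_idx` and `read_idx_gz` are hypothetical, and the path passed to `read_idx_gz` is whatever location you downloaded the file to.

```python
import gzip
import struct
import numpy as np

def parse_idx(raw):
    """Parse the bytes of an IDX file holding unsigned-byte data into an array."""
    # Header: two zero bytes, one data-type byte, one byte with the number of dimensions.
    zero, dtype_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    # Each dimension size is a 4-byte big-endian integer.
    dims = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    data = np.frombuffer(raw, dtype=np.uint8, offset=4 + 4 * ndim)
    return data.reshape(dims)

def read_idx_gz(path):
    """Read a gzip-compressed IDX file, e.g. train-images-idx3-ubyte.gz."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())

# Tiny synthetic file (two 2x3 "images") built by hand to exercise the parser.
header = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 3)
fake_file = header + bytes(range(12))
images = parse_idx(fake_file)
```

On the real training image file the same parser should return an array of shape (60000, 28, 28), and on the label file an array of shape (60000,).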

1.	Try the basic minibatch SGD as your learning algorithm. It is recommended to try different initializations, different batch sizes, and different learning rates to get a sense of how to tune the hyperparameters (batch size and learning rate). Remember to create and use a validation dataset! It will be very useful for you to read Chapter 11 of the textbook.

2.	It is recommended to try at least one other optimization method of your choice (SGD with momentum, RMSProp, RMSProp with momentum, AdaGrad, AdaDelta, or Adam) and compare its performance with that of basic minibatch SGD on the MNIST dataset. Which methods you try, and how many you compare, is up to you and the amount of time you have left to complete the assignment. Remember, this is a research course. You may also want to read Chapter 8.
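
The two items above can be sketched in plain NumPy: hold out a validation set, train with minibatch SGD plus momentum, and compare a few (learning rate, batch size) pairs by validation accuracy. This is an illustrative toy under stated assumptions, not the notebook's method: the data are synthetic 2-D blobs standing in for MNIST, the model is plain softmax regression, and the `train` function and the small grid of values are made up for the example. The momentum update follows the rule documented for Keras SGD (`velocity = momentum * velocity - learning_rate * gradient`).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for MNIST: three well-separated 2-D Gaussian blobs.
centers = np.array([[0.0, 4.0], [4.0, -4.0], [-4.0, -4.0]])
y = rng.integers(0, 3, size=600)
x = centers[y] + rng.normal(scale=0.5, size=(600, 2))

# Hold out the last 20% as a validation set (samples are already in random order).
n_train = 480
x_train, y_train = x[:n_train], y[:n_train]
x_val, y_val = x[n_train:], y[n_train:]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train(lr, batch_size, momentum=0.0, epochs=20):
    """Softmax regression trained with minibatch SGD; returns validation accuracy."""
    w = np.zeros((2, 3))
    b = np.zeros(3)
    vw = np.zeros_like(w)
    vb = np.zeros_like(b)
    for _ in range(epochs):
        order = rng.permutation(n_train)  # reshuffle every epoch
        for start in range(0, n_train, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = x_train[idx], y_train[idx]
            # Gradient of the mean cross-entropy with respect to the logits.
            grad_logits = softmax(xb @ w + b)
            grad_logits[np.arange(len(idx)), yb] -= 1.0
            grad_logits /= len(idx)
            # Momentum update: accumulate a velocity, then move along it.
            vw = momentum * vw - lr * (xb.T @ grad_logits)
            vb = momentum * vb - lr * grad_logits.sum(axis=0)
            w += vw
            b += vb
    pred = np.argmax(x_val @ w + b, axis=1)
    return float((pred == y_val).mean())

# Small hyperparameter grid, scored on the validation set only.
results = {(lr, bs): train(lr, bs, momentum=0.9)
           for lr in (0.01, 0.1) for bs in (20, 40)}
best = max(results, key=results.get)
```

The test set is never touched while choosing `best`; it would be evaluated once, at the end, with the selected setting.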

In [5]:
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import classification_report
from sklearn.utils import class_weight

**1. Data preparation and model building**

In [6]:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)
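
To make the preprocessing step concrete: as far as I understand it, `tf.keras.utils.normalize(x, axis=1)` divides by the L2 norm taken along axis 1, which for an array of shape (N, 28, 28) means each column of each image is scaled to unit length. The NumPy sketch below reproduces that behavior on a toy array and contrasts it with the more common fixed rescaling by 255. The helper `l2_normalize` and the toy `batch` are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def l2_normalize(x, axis):
    """Divide x by its L2 norm along `axis`; all-zero slices are left unchanged."""
    norms = np.linalg.norm(x.astype(np.float64), axis=axis, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for blank slices
    return x / norms

# Two hypothetical 3x3 "images" with pixel values in 0..255.
batch = np.array([[[0, 3, 0], [0, 4, 0], [0, 0, 255]],
                  [[255, 0, 0], [0, 0, 0], [0, 0, 0]]], dtype=np.uint8)

per_column = l2_normalize(batch, axis=1)  # unit norm down each image column
scaled = batch / 255.0                    # alternative: fixed rescale to [0, 1]
```

Note that per-column normalization changes the relative brightness between columns of the same image, whereas dividing by 255 preserves it; either choice keeps inputs in a small numeric range.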

In [7]:
tf.keras.backend.clear_session()

model = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation=tf.nn.relu, name='L1'),
            tf.keras.layers.Dense(64, activation=tf.nn.relu, name='L2'),
            tf.keras.layers.Dense(32, activation=tf.nn.relu, name='L3'),
            tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='L4')
        ])
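
As a quick sanity check on the size of this network, the sketch below counts its trainable parameters by hand, assuming the 28x28 input is flattened to 784 features. The helper `dense_params` is made up for the example; once the model has been built, `model.summary()` should report the same totals.

```python
# Layer widths: flattened 28x28 input, then L1, L2, L3, L4.
layer_sizes = [784, 128, 64, 32, 10]

def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

for name, count in zip(["L1", "L2", "L3", "L4"], per_layer):
    print(f"{name}: {count}")
print("total:", total)
```

About 90% of the parameters sit in the first layer, since it connects all 784 pixels to 128 units.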

**2. Model training using minibatch SGD with momentum (batch size = 20)**

In [8]:
sgd = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)

# The last layer already applies softmax, so the loss receives probabilities, not logits.
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False), optimizer=sgd, metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=20)

Epoch 1/10
3000/3000 ━━━━━━━━━━━━━━━━━━━━ 2s 501us/step - accuracy: 0.8030 - loss: 0.6247
Epoch 2/10
3000/3000 ━━━━━━━━━━━━━━━━━━━━ 1s 478us/step - accuracy: 0.9617 - loss: 0.1260
Epoch 3/10
3000/3000 ━━━━━━━━━━━━━━━━━━━━ 1s 480us/step - accuracy: 0.9740 - loss: 0.0814
Epoch 4/10
3000/3000 ━━━━━━━━━━━━━━━━━━━━ 1s 473us/step - accuracy: 0.9805 - loss: 0.0607
Epoch 5/10
3000/3000 ━━━━━━━━━━━━━━━━━━━━ 1s 482us/step - accuracy: 0.9849 - loss: 0.0484
Epoch 6/10
3000/3000 ━━━━━━━━━━━━━━━━━━━━ 1s 469us/step - accuracy: 0.9880 - loss: 0.0366
Epoch 7/10
3000/3000 ━━━━━━━━━━━━━━━━━━━━ 1s 473us/step - accuracy: 0.9894 - loss: 0.0311
Epoch 8/10
3000/3000 ━━━━━━━━━━━━━━━━━━━━ 1s 467us/step - accuracy: 0.9929 - loss: 0.0229
Epoch 9/10
3000

<keras.src.callbacks.history.History at 0x32034fe50>

In [9]:
eval_loss1, eval_acc1 = model.evaluate(x_test, y_test)

print('Test accuracy: ', eval_acc1)
print('Test loss: ', eval_loss1)

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 328us/step - accuracy: 0.9686 - loss: 0.1301
Test accuracy:  0.9722999930381775
Test loss:  0.11608284711837769


In [10]:
y_pred = np.argmax(model.predict(x_test), axis=1)
print(classification_report(y_test, y_pred))

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 374us/step
              precision    recall  f1-score   support

           0       0.96      0.99      0.98       980
           1       0.99      0.99      0.99      1135
           2       0.96      0.98      0.97      1032
           3       0.98      0.96      0.97      1010
           4       0.95      0.98      0.97       982
           5       0.99      0.95      0.97       892
           6       0.98      0.98      0.98       958
           7       0.97      0.97      0.97      1028
           8       0.95      0.97      0.96       974
           9       0.98      0.94      0.96      1009

    accuracy                           0.97     10000
   macro avg       0.97      0.97      0.97     10000
weighted avg       0.97      0.97      0.97     10000
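
For readers unfamiliar with the columns of this report, the sketch below shows how per-class precision, recall, and F1 follow from a confusion matrix. It uses a tiny made-up label set rather than the MNIST predictions, and the helper `per_class_metrics` is hypothetical; `sklearn.metrics.classification_report` is what the notebook actually uses.

```python
import numpy as np

def per_class_metrics(y_true, y_pred, n_classes):
    """Confusion matrix plus per-class precision, recall, and F1."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # rows: true class, columns: predicted class
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)  # correct / all predicted as the class
    recall = tp / np.maximum(cm.sum(axis=1), 1)     # correct / all truly in the class
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return cm, precision, recall, f1

# Made-up labels for a 3-class toy problem.
y_true_toy = np.array([0, 0, 1, 1, 2, 2, 2, 2])
y_pred_toy = np.array([0, 1, 1, 1, 2, 2, 0, 2])
cm, precision, recall, f1 = per_class_metrics(y_true_toy, y_pred_toy, 3)
support = cm.sum(axis=1)  # number of true samples per class, as in the report
```

In the report, a class with high precision but lower recall is one the model rarely predicts wrongly but sometimes misses.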



**3. Model training using minibatch SGD with momentum (batch size = 40)**
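
Changing the batch size also changes how many weight updates happen per epoch, which is the step counter shown in the training progress lines. A short sketch of the arithmetic, assuming the full 60,000-image MNIST training set is used with no validation split:

```python
import math

n_train = 60000  # MNIST training images
epochs = 10

# Updates per epoch for the two batch sizes used in this notebook.
steps_per_epoch = {bs: math.ceil(n_train / bs) for bs in (20, 40)}
total_updates = {bs: steps * epochs for bs, steps in steps_per_epoch.items()}

for bs in steps_per_epoch:
    print(f"batch size {bs}: {steps_per_epoch[bs]} steps/epoch, "
          f"{total_updates[bs]} updates in {epochs} epochs")
```

With the same learning rate and number of epochs, the batch-size-40 run therefore takes half as many gradient steps, each computed from twice as many examples.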

In [12]:
tf.keras.backend.clear_session()
sgd = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
model2 = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation=tf.nn.relu, name='L1'),
            tf.keras.layers.Dense(64, activation=tf.nn.relu, name='L2'),
            tf.keras.layers.Dense(32, activation=tf.nn.relu, name='L3'),
            tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='L4')

    ])
# The last layer already applies softmax, so the loss receives probabilities, not logits.
model2.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False), optimizer=sgd, metrics=['accuracy'])
model2.fit(x_train, y_train, epochs=10, batch_size=40)

Epoch 1/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 525us/step - accuracy: 0.7404 - loss: 0.8340
Epoch 2/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 538us/step - accuracy: 0.9491 - loss: 0.1671
Epoch 3/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 553us/step - accuracy: 0.9665 - loss: 0.1073
Epoch 4/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 522us/step - accuracy: 0.9757 - loss: 0.0787
Epoch 5/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 547us/step - accuracy: 0.9794 - loss: 0.0629
Epoch 6/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 532us/step - accuracy: 0.9843 - loss: 0.0515
Epoch 7/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 505us/step - accuracy: 0.9874 - loss: 0.0390
Epoch 8/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 525us/step - accuracy: 0.9904 - loss: 0.0315
Epoch 9/

<keras.src.callbacks.history.History at 0x35ee7add0>

In [13]:
eval_loss2, eval_acc2 = model2.evaluate(x_test, y_test)

print('Test accuracy: ', eval_acc2)
print('Test loss: ', eval_loss2)

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 331us/step - accuracy: 0.9700 - loss: 0.1108
Test accuracy:  0.975600004196167
Test loss:  0.09041903913021088


In [14]:
y_pred = np.argmax(model2.predict(x_test), axis=1)
print(classification_report(y_test, y_pred))

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 398us/step
              precision    recall  f1-score   support

           0       0.98      0.99      0.99       980
           1       0.99      0.99      0.99      1135
           2       0.97      0.98      0.98      1032
           3       0.96      0.97      0.97      1010
           4       0.98      0.96      0.97       982
           5       0.97      0.97      0.97       892
           6       0.99      0.98      0.98       958
           7       0.97      0.98      0.98      1028
           8       0.95      0.98      0.97       974
           9       0.97      0.96      0.97      1009

    accuracy                           0.98     10000
   macro avg       0.98      0.98      0.98     10000
weighted avg       0.98      0.98      0.98     10000



**4. Model training using Adam optimizer (batch size = 40)**

In [15]:
tf.keras.backend.clear_session()

model3 = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation=tf.nn.relu, name='L1'),
            tf.keras.layers.Dense(64, activation=tf.nn.relu, name='L2'),
            tf.keras.layers.Dense(32, activation=tf.nn.relu, name='L3'),
            tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='L4')
    ])

# The last layer already applies softmax, so the loss receives probabilities, not logits.
model3.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False), optimizer='adam', metrics=['accuracy'])
model3.fit(x_train, y_train, batch_size=40, epochs=10)

Epoch 1/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 2s 771us/step - accuracy: 0.8350 - loss: 0.5558
Epoch 2/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 758us/step - accuracy: 0.9613 - loss: 0.1233
Epoch 3/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 753us/step - accuracy: 0.9751 - loss: 0.0789
Epoch 4/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 765us/step - accuracy: 0.9816 - loss: 0.0589
Epoch 5/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 751us/step - accuracy: 0.9871 - loss: 0.0416
Epoch 6/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 796us/step - accuracy: 0.9892 - loss: 0.0331
Epoch 7/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 753us/step - accuracy: 0.9914 - loss: 0.0267
Epoch 8/10
1500/1500 ━━━━━━━━━━━━━━━━━━━━ 1s 762us/step - accuracy: 0.9924 - loss: 0.0229
Epoch 9/10
1500

<keras.src.callbacks.history.History at 0x35efee890>

In [16]:
eval_loss3, eval_acc3 = model3.evaluate(x_test, y_test)

print('Test accuracy: ', eval_acc3)
print('Test loss: ', eval_loss3)

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 328us/step - accuracy: 0.9700 - loss: 0.1334
Test accuracy:  0.9742000102996826
Test loss:  0.1181526631116867


In [17]:
y_pred = np.argmax(model3.predict(x_test), axis=1)
print(classification_report(y_test, y_pred))

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 387us/step
              precision    recall  f1-score   support

           0       0.99      0.97      0.98       980
           1       0.98      1.00      0.99      1135
           2       0.96      0.99      0.97      1032
           3       0.97      0.97      0.97      1010
           4       0.94      0.99      0.97       982
           5       0.98      0.97      0.97       892
           6       0.99      0.98      0.98       958
           7       0.98      0.97      0.97      1028
           8       0.98      0.96      0.97       974
           9       0.98      0.95      0.96      1009

    accuracy                           0.97     10000
   macro avg       0.97      0.97      0.97     10000
weighted avg       0.97      0.97      0.97     10000
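
To compare the three runs side by side, the sketch below collects the test accuracy and loss printed by the evaluation cells above (rounded to four decimals) and ranks them. The dictionary layout and run labels are made up for this summary. Each configuration was trained only once and without a validation set, so differences of a few tenths of a percentage point should be read as suggestive rather than conclusive.

```python
# Test-set results copied from the three evaluation cells, rounded to 4 decimals.
runs = {
    "SGD + momentum, batch 20": {"accuracy": 0.9723, "loss": 0.1161},
    "SGD + momentum, batch 40": {"accuracy": 0.9756, "loss": 0.0904},
    "Adam, batch 40": {"accuracy": 0.9742, "loss": 0.1182},
}

# Rank the runs by test accuracy, best first.
ranked = sorted(runs, key=lambda name: runs[name]["accuracy"], reverse=True)
best = ranked[0]
spread = runs[ranked[0]]["accuracy"] - runs[ranked[-1]]["accuracy"]

for name in ranked:
    r = runs[name]
    print(f"{name:<26} accuracy={r['accuracy']:.4f}  loss={r['loss']:.4f}")
print(f"accuracy spread between best and worst run: {spread:.4f}")
```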

