## Question 1

### Download the benchmark dataset, MNIST, from http://yann.lecun.com/exdb/mnist/. Implement multi-class classification for recognizing handwritten digits (also known as multiclass logistic regression: this is simply a feedforward neural network with k output neurons, one for each class, where output neuron o_i returns the probability that the input data point x_j is in class i) and try it on MNIST.

Comments: There is almost no need to implement anything in DL on your own (this is true in general); the software framework (i.e., the DL platform) typically provides implementations of everything discussed in class, such as the learning algorithms, regularization methods, and cross-validation methods.

Use your favorite deep learning platform. A few candidates:

1.	Marvin from http://marvin.is/ 
2.	Caffe from http://caffe.berkeleyvision.org
3.	TensorFlow from https://www.tensorflow.org
4.	Pylearn2 from http://deeplearning.net/software/pylearn2/
5.	Theano, Torch, Lasagne, etc. See more platforms at http://deeplearning.net/software_links/.

Read the tutorial for your selected platform (e.g., for TensorFlow: https://www.tensorflow.org/tutorials) and try it on MNIST. The first few examples in the tutorials are typically on MNIST or other simple image datasets, so you can follow the steps.

Comments: MNIST is a standard dataset for machine learning and deep learning. It is good to first try a shallow neural network with one output neuron (e.g., for recognizing the character A versus a not-A character) before trying a deep neural network with multiple outputs. Downloading the dataset from other places in a preprocessed format is allowed, but practicing how to read the raw dataset prepares you for other new datasets you may be interested in (so please read the MNIST website carefully).
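
Reading the raw files mostly comes down to parsing the IDX header described on the MNIST page. The sketch below is one possible way to do that with only the standard library. The function names `parse_idx` and `read_idx` are hypothetical, and the code assumes the unsigned-byte variant of the format (type code 0x08), which is what the MNIST image and label files use.

```python
import gzip
import struct


def parse_idx(data: bytes):
    """Parse IDX-formatted bytes into (shape, raw payload bytes)."""
    # Header: two zero bytes, a type code, and the number of dimensions.
    zero, type_code, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError("not an unsigned-byte IDX file")
    # Each dimension size is a big-endian 32-bit unsigned integer.
    shape = struct.unpack(">" + "I" * ndim, data[4:4 + 4 * ndim])
    payload = data[4 + 4 * ndim:]
    expected = 1
    for dim in shape:
        expected *= dim
    if len(payload) != expected:
        raise ValueError("payload size does not match the header")
    return shape, payload


def read_idx(path: str):
    """Read a plain or gzip-compressed IDX file from disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        return parse_idx(f.read())
```

Assuming the files have been downloaded to the working directory, `read_idx("train-images-idx3-ubyte.gz")` should return the shape `(60000, 28, 28)` together with the pixel bytes, which can then be wrapped with `np.frombuffer(payload, dtype=np.uint8).reshape(shape)`.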

1.	Try basic minibatch SGD as your learning algorithm. It is recommended to try different initializations, batch sizes, and learning rates to get a sense of how to tune the hyperparameters (batch size and learning rate). Remember to create and use a validation dataset! It will be very useful to read Chapter 11 of the textbook.

2.	It is recommended to try at least one other optimization method of your choice (SGD with momentum, RMSProp, RMSProp with momentum, AdaGrad, AdaDelta, or Adam) and compare its performance to that of basic minibatch SGD on the MNIST dataset. Which methods you try, and how many you compare, is up to you and the amount of time you have left to complete the assignment. Remember, this is a research course. You may also want to read Chapter 8.
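
The first item asks for a validation set and for experiments with the batch size. A minimal NumPy sketch of both ideas follows. The helper names `train_val_split` and `minibatches` are hypothetical, and the arrays are random stand-ins with MNIST-like shapes rather than real digits.

```python
import numpy as np


def train_val_split(x, y, val_size, seed=0):
    """Shuffle once, then hold out `val_size` examples for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    val_idx, train_idx = order[:val_size], order[val_size:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx])


def minibatches(x, y, batch_size, rng):
    """Yield shuffled minibatches that together cover the data once (one epoch)."""
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]


# Stand-in data: 600 fake "images" and labels 0..9 (assumption, not MNIST itself).
x = np.random.default_rng(1).random((600, 28, 28))
y = np.arange(600) % 10

(x_tr, y_tr), (x_val, y_val) = train_val_split(x, y, val_size=100)
n_batches = sum(1 for _ in minibatches(x_tr, y_tr, 20, np.random.default_rng(2)))
print(x_tr.shape, x_val.shape, n_batches)
```

In Keras, a held-out pair like this can be passed to `model.fit` through its `validation_data` argument, so that hyperparameters are compared on validation metrics and the test set is only used for the final report.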

In [335]:
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import classification_report
from sklearn.utils import class_weight

### Data preparation and model building

In [336]:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = tf.keras.utils.normalize(x_train, axis=1)
x_test = tf.keras.utils.normalize(x_test, axis=1)
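
It may help to spell out what this preprocessing does. As far as I understand it, `tf.keras.utils.normalize` divides by the L2 norm along the given axis, which is different from the more common rescaling of pixel values to [0, 1]. The NumPy sketch below re-implements that behaviour under this assumption; `l2_normalize` is a hypothetical helper, not the library's code, and the input is random stand-in data.

```python
import numpy as np


def l2_normalize(x, axis=1):
    """Divide by the L2 norm along `axis`; all-zero slices are left as zeros."""
    norms = np.linalg.norm(x, ord=2, axis=axis, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero
    return x / norms


rng = np.random.default_rng(0)
# Stand-in for a few 28x28 images with integer pixel values in 0..255.
images = rng.integers(1, 256, size=(4, 28, 28)).astype("float32")
images[0, :, 0] = 0.0  # one all-zero column, as in the blank border of a digit

by_norm = l2_normalize(images, axis=1)  # each image column gets unit length
by_range = images / 255.0               # plain rescaling to [0, 1]

column_norms = np.linalg.norm(by_norm, axis=1)
print(column_norms.min(), column_norms.max(), by_range.max())
```

With `axis=1` on an array of shape `(N, 28, 28)`, every column of every image is scaled separately, so relative brightness between columns is not preserved. Whether that or a simple division by 255 works better is worth checking on the validation set.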

In [337]:
tf.keras.backend.clear_session()

model = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation=tf.nn.relu, name='L1'),
            tf.keras.layers.Dense(64, activation=tf.nn.relu, name='L2'),
            tf.keras.layers.Dense(32, activation=tf.nn.relu, name='L3'),
            tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='L4')
        ])
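
The last layer uses softmax, so the ten outputs are class probabilities, as the question describes. The NumPy sketch below shows the softmax and the sparse cross-entropy computed from it, and illustrates why a loss should be told whether it receives probabilities or raw logits. The function names and the toy numbers are assumptions for illustration only.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sparse_cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the integer class labels."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked).mean())


logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])

probs = softmax(logits)
loss_ok = sparse_cross_entropy(probs, labels)
# Treating probabilities as if they were logits applies softmax a second time.
loss_double = sparse_cross_entropy(softmax(probs), labels)
print(probs.sum(axis=1), loss_ok, loss_double)
```

Applying softmax twice flattens the distribution and changes the loss value, which is why the `from_logits` setting of a cross-entropy loss should match what the final layer actually outputs.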

### Model training using mini-batch SGD (batch size 20)

In [338]:
# The final layer already applies softmax, so the loss receives probabilities, not logits.
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False), optimizer='sgd', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, batch_size=20)

Epoch 1/10
3000/3000 ━━━━━━━━━━━━━━━━━━━━ 2s 523us/step - accuracy: 0.5987 - loss: 1.3929
Epoch 2/10
3000/3000 ━━━━━━━━━━━━━━━━━━━━ 2s 535us/step - accuracy: 0.9077 - loss: 0.3183
Epoch 3/10
3000/3000 ━━━━━━━━━━━━━━━━━━━━ 2s 521us/step - accuracy: 0.9304 - loss: 0.2395
Epoch 4/10
3000/3000 ━━━━━━━━━━━━━━━━━━━━ 2s 539us/step - accuracy: 0.9426 - loss: 0.1954
Epoch 5/10
3000/3000 ━━━━━━━━━━━━━━━━━━━━ 2s 528us/step - accuracy: 0.9518 - loss: 0.1617
Epoch 6/10
3000/3000 ━━━━━━━━━━━━━━━━━━━━ 2s 526us/step - accuracy: 0.9563 - loss: 0.1427
Epoch 7/10
3000/3000 ━━━━━━━━━━━━━━━━━━━━ 2s 540us/step - accuracy: 0.9619 - loss: 0.1248
Epoch 8/10
3000/3000 ━━━━━━━━━━━━━━━━━━━━ 2s 523us/step - accuracy: 0.9647 - loss: 0.1139
Epoch 9/10
3000

<keras.src.callbacks.history.History at 0x2ef420dd6d0>

In [339]:
eval_loss1, eval_acc1 = model.evaluate(x_test, y_test)

print('Test accuracy: ', eval_acc1)
print('Test loss: ', eval_loss1)

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 469us/step - accuracy: 0.9590 - loss: 0.1292
Test accuracy:  0.9646999835968018
Test loss:  0.11357878148555756


In [340]:
y_pred = np.argmax(model.predict(x_test), axis=1)
print(classification_report(y_test, y_pred))

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 576us/step
              precision    recall  f1-score   support

           0       0.97      0.98      0.98       980
           1       0.99      0.98      0.98      1135
           2       0.97      0.96      0.97      1032
           3       0.96      0.95      0.96      1010
           4       0.95      0.98      0.96       982
           5       0.94      0.98      0.96       892
           6       0.96      0.98      0.97       958
           7       0.98      0.95      0.97      1028
           8       0.95      0.95      0.95       974
           9       0.98      0.94      0.96      1009

    accuracy                           0.96     10000
   macro avg       0.96      0.96      0.96     10000
weighted avg       0.96      0.96      0.96     10000
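
The columns of this report can be derived from a confusion matrix. The sketch below shows the arithmetic on a tiny three-class example with made-up labels; `confusion_matrix` and `per_class_metrics` are hypothetical helpers written for illustration, not the scikit-learn implementation.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] counts examples of true class i that were predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def per_class_metrics(cm):
    """Per-class precision, recall and F1 from a confusion matrix."""
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)  # column sums: predicted counts
    recall = tp / np.maximum(cm.sum(axis=1), 1)     # row sums: true counts (support)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1


# Made-up labels for three classes (assumption, not MNIST predictions).
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(y_true, y_pred, n_classes=3)
precision, recall, f1 = per_class_metrics(cm)
print(cm)
print(precision, recall, f1)
```

The off-diagonal entries of the matrix also show which digits are confused with which, something the per-class averages in the report hide.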



### Model training using mini-batch SGD (default batch size 32)

In [341]:
tf.keras.backend.clear_session()

model2 = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation=tf.nn.relu, name='L1'),
            tf.keras.layers.Dense(64, activation=tf.nn.relu, name='L2'),
            tf.keras.layers.Dense(32, activation=tf.nn.relu, name='L3'),
            tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='L4')

    ])
# The final layer already applies softmax, so the loss receives probabilities, not logits.
model2.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False), optimizer='sgd', metrics=['accuracy'])
model2.fit(x_train, y_train, epochs=10)

Epoch 1/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 1s 584us/step - accuracy: 0.5262 - loss: 1.6208
Epoch 2/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 1s 607us/step - accuracy: 0.8859 - loss: 0.3995
Epoch 3/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 1s 606us/step - accuracy: 0.9145 - loss: 0.2984
Epoch 4/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 1s 598us/step - accuracy: 0.9254 - loss: 0.2566
Epoch 5/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 1s 599us/step - accuracy: 0.9360 - loss: 0.2195
Epoch 6/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 1s 593us/step - accuracy: 0.9418 - loss: 0.2016
Epoch 7/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 1s 595us/step - accuracy: 0.9483 - loss: 0.1750
Epoch 8/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 1s 599us/step - accuracy: 0.9534 - loss: 0.1612
Epoch 9/10
1875

<keras.src.callbacks.history.History at 0x2ef4322bdd0>

In [342]:
eval_loss2, eval_acc2 = model2.evaluate(x_test, y_test)

print('Test accuracy: ', eval_acc2)
print('Test loss: ', eval_loss2)

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 470us/step - accuracy: 0.9518 - loss: 0.1593
Test accuracy:  0.9575999975204468
Test loss:  0.1401475965976715


In [343]:
y_pred = np.argmax(model2.predict(x_test), axis=1)
print(classification_report(y_test, y_pred))

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 591us/step
              precision    recall  f1-score   support

           0       0.96      0.98      0.97       980
           1       0.99      0.98      0.98      1135
           2       0.95      0.95      0.95      1032
           3       0.95      0.95      0.95      1010
           4       0.96      0.96      0.96       982
           5       0.97      0.95      0.96       892
           6       0.94      0.98      0.96       958
           7       0.97      0.95      0.96      1028
           8       0.92      0.95      0.94       974
           9       0.96      0.94      0.95      1009

    accuracy                           0.96     10000
   macro avg       0.96      0.96      0.96     10000
weighted avg       0.96      0.96      0.96     10000



### Model training using Adam algorithm
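
To make the difference between the optimizers concrete, here is a hedged sketch of the update rules for plain SGD, SGD with momentum, and Adam on a toy two-dimensional quadratic. The toy loss, the learning rate of 0.05, and the step count are assumptions chosen for illustration; they are not the Keras defaults used in this notebook, and the code is not the Keras implementation.

```python
import numpy as np

CURVATURE = np.array([1.0, 10.0])  # toy loss: 0.5 * sum(CURVATURE * w**2)


def loss(w):
    return 0.5 * float(np.sum(CURVATURE * w ** 2))


def grad(w):
    return CURVATURE * w


def run_sgd(w, lr=0.05, steps=200):
    for _ in range(steps):
        w = w - lr * grad(w)
    return w


def run_momentum(w, lr=0.05, beta=0.9, steps=200):
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v - lr * grad(w)  # velocity accumulates past gradients
        w = w + v
    return w


def run_adam(w, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=200):
    m = np.zeros_like(w)  # running mean of gradients
    v = np.zeros_like(w)  # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w


w0 = np.array([1.0, 1.0])
results = {
    name: loss(fn(w0.copy()))
    for name, fn in [("sgd", run_sgd), ("momentum", run_momentum), ("adam", run_adam)]
}
for name, value in results.items():
    print(f"{name:9s} final loss = {value:.3e}")
```

Adam rescales each coordinate by a running estimate of its gradient magnitude, so its step size is largely independent of the curvature along each axis. This is one common explanation for why it often makes faster early progress than plain SGD without per-problem learning-rate tuning.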

In [344]:
tf.keras.backend.clear_session()

model3 = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation=tf.nn.relu, name='L1'),
            tf.keras.layers.Dense(64, activation=tf.nn.relu, name='L2'),
            tf.keras.layers.Dense(32, activation=tf.nn.relu, name='L3'),
            tf.keras.layers.Dense(10, activation=tf.nn.softmax, name='L4')
    ])

# The final layer already applies softmax, so the loss receives probabilities, not logits.
model3.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False), optimizer='adam', metrics=['accuracy'])
model3.fit(x_train, y_train, epochs=10)

Epoch 1/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 840us/step - accuracy: 0.8444 - loss: 0.5189
Epoch 2/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 832us/step - accuracy: 0.9637 - loss: 0.1154
Epoch 3/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 828us/step - accuracy: 0.9769 - loss: 0.0737
Epoch 4/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 845us/step - accuracy: 0.9835 - loss: 0.0534
Epoch 5/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 793us/step - accuracy: 0.9869 - loss: 0.0387
Epoch 6/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 811us/step - accuracy: 0.9891 - loss: 0.0325
Epoch 7/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 863us/step - accuracy: 0.9910 - loss: 0.0262
Epoch 8/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 2s 827us/step - accuracy: 0.9929 - loss: 0.0214
Epoch 9/10
1875

<keras.src.callbacks.history.History at 0x2ef4471b650>

In [345]:
eval_loss3, eval_acc3 = model3.evaluate(x_test, y_test)

print('Test accuracy: ', eval_acc3)
print('Test loss: ', eval_loss3)

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 500us/step - accuracy: 0.9686 - loss: 0.1405
Test accuracy:  0.9732999801635742
Test loss:  0.11358001828193665


In [346]:
y_pred = np.argmax(model3.predict(x_test), axis=1)
print(classification_report(y_test, y_pred))

313/313 ━━━━━━━━━━━━━━━━━━━━ 0s 591us/step
              precision    recall  f1-score   support

           0       0.99      0.98      0.98       980
           1       0.98      0.99      0.99      1135
           2       0.98      0.97      0.97      1032
           3       0.96      0.98      0.97      1010
           4       0.96      0.98      0.97       982
           5       0.98      0.95      0.97       892
           6       0.99      0.97      0.98       958
           7       0.98      0.96      0.97      1028
           8       0.96      0.97      0.97       974
           9       0.95      0.97      0.96      1009

    accuracy                           0.97     10000
   macro avg       0.97      0.97      0.97     10000
weighted avg       0.97      0.97      0.97     10000
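
For a side-by-side view, the short snippet below tabulates the three test results. The numbers are copied from the evaluation outputs above and rounded to four decimals. The batch size of 32 for the second and third runs is an assumption based on the Keras default, which is consistent with the 1875 steps per epoch shown in the logs.

```python
# (run description, test accuracy, test loss), rounded from the outputs above.
results = [
    ("SGD, batch size 20", 0.9647, 0.1136),
    ("SGD, batch size 32", 0.9576, 0.1401),
    ("Adam, batch size 32", 0.9733, 0.1136),
]

print(f"{'run':<22}{'test acc':>10}{'test loss':>11}")
for name, acc, test_loss in results:
    print(f"{name:<22}{acc:>10.4f}{test_loss:>11.4f}")

# Pick the run with the highest test accuracy.
best = max(results, key=lambda r: r[1])
print("highest test accuracy:", best[0])
```

Each configuration was trained once, so differences of under a percentage point may partly reflect random initialization. Repeating runs with different seeds and comparing on a validation set would give a firmer ranking.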

