# Exercise - MNIST

1. Use the $\texttt{mnist}$ dataset (as just shown in the slides). Build a neural network using what we have explored so far and evaluate its performance on the test data.
1. Explore whether your neural network appears to be under- or overfitting by constructing plots of the train and test losses and accuracies during training. Use this information to improve your model - that is, train for longer if it appears to be underfitting and shorter if it appears to be overfitting. Does your test performance improve? What about your train performance?
1. (Bonus): Later during the semester, we will explore *convolutional neural networks*. For those of you who have finished (1) and (2), you may try this now to improve your model; check https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D for details.

**See slides for more details!**

# Exercise 1

Use the $\texttt{mnist}$ dataset (as just shown in the slides). Build a neural network using what we have explored so far and evaluate its performance on the test data.

In [1]:
import tensorflow as tf


(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Scale your features in some fashion (otherwise performance will likely suffer)
x_train = x_train/255 
x_test = x_test/255
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)

(60000, 28, 28) (60000,) (10000, 28, 28) (10000,)


In [None]:
import numpy as np
print(f'max ={np.max(x_train)}')
print(f'min ={np.min(x_train)}')

max =1.0
min =0.0
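The cell above confirms that the pixels now lie in [0, 1]. Dividing by 255 is one way to scale. Below is a minimal NumPy sketch of that approach and of a common alternative, standardizing with the training set's mean and standard deviation. Random `uint8` arrays stand in for the MNIST images so the snippet runs on its own, and all variable names are hypothetical. Note that NumPy turns `x_train/255` on a `uint8` array into `float64`. Casting to `float32` first halves the memory and matches Keras' default float type.

```python
import numpy as np

# Stand-in for MNIST images: uint8 pixels in [0, 255] (random data, not the real dataset)
rng = np.random.default_rng(0)
x_train_raw = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
x_test_raw = rng.integers(0, 256, size=(20, 28, 28), dtype=np.uint8)

# Option 1: min-max scaling to [0, 1]; cast to float32 before dividing
x_train_01 = x_train_raw.astype(np.float32) / 255.0
x_test_01 = x_test_raw.astype(np.float32) / 255.0

# Option 2: standardization using statistics computed on the training set only
mean = x_train_01.mean()
std = x_train_01.std()
x_train_std = (x_train_01 - mean) / std
x_test_std = (x_test_01 - mean) / std  # reuse the training mean/std; never refit on test data

print(x_train_01.dtype, x_train_01.min(), x_train_01.max())
print(x_train_std.mean(), x_train_std.std())  # approximately 0 and 1
```

Either option works for a dense network on MNIST. The important point is to apply the same transformation, with parameters taken from the training data, to both splits.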


Here is a model to get you started.

Take note of the "Flatten" layer: it reshapes each (28, 28) image into a (784,) vector, because the Dense layers that follow expect one flat feature vector per sample.

Alternatively, you could reshape your data (the x's). This can be done using:

$\texttt{x = x.reshape(n, 784)}$ 

where $n$ is the number of samples (60k for training, 10k for test).

Then you don't need the Flatten layer, but you must still specify the input shape of your first layer (i.e. (784,) if you have done this reshaping).
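A small NumPy sketch of the manual reshape is below. It uses a random stand-in batch of 5 images (not real MNIST data) and checks where each pixel ends up. NumPy's default row-major order places pixel (row i, column j) at index 28·i + j. This should be the same layout the Flatten layer produces for (28, 28) inputs.

```python
import numpy as np

# Small stand-in batch with MNIST's image shape (random values, not real digits)
n = 5
x = np.random.default_rng(1).random((n, 28, 28))

# Manual alternative to the Flatten layer: one 784-long row per image
x_flat = x.reshape(n, 784)          # equivalently x.reshape(-1, 28 * 28)
print(x.shape, '->', x_flat.shape)  # (5, 28, 28) -> (5, 784)

# Row-major order: pixel (row i, column j) lands at index 28 * i + j
i, j = 3, 17
assert x_flat[0, 28 * i + j] == x[0, i, j]
```

With flattened inputs, the first layer takes the input shape directly. In the Keras version used here, that would look like `tf.keras.layers.Dense(256, activation='relu', input_shape=(784,))` in place of the Flatten layer.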

**Note**: Do feel free to experiment with the number of layers, nodes per layer, and optimizer.

In [None]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy'],
)

model.summary()

Model: "sequential_12"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten_12 (Flatten)        (None, 784)               0         
                                                                 
 dense_26 (Dense)            (None, 256)               200960    
                                                                 
 dense_27 (Dense)            (None, 256)               65792     
                                                                 
 dense_28 (Dense)            (None, 10)                2570      
                                                                 
Total params: 269322 (1.03 MB)
Trainable params: 269322 (1.03 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
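The "Param #" column in the summary follows from a simple count. A Dense layer has one weight per (input, output) pair plus one bias per output unit, i.e. n_in · n_out + n_out parameters, and Flatten has none. A short sketch that reproduces the numbers for this 784-256-256-10 architecture (`dense_params` is a hypothetical helper, not a Keras function):

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

layer_sizes = [784, 256, 256, 10]  # flattened input, two hidden layers, output
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(per_layer)       # [200960, 65792, 2570]
print(sum(per_layer))  # 269322
```

This is a quick way to see how the number of nodes per layer affects model size when you experiment with the architecture.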


In [None]:
# Train the model
history = model.fit(x_train,
                    y_train,
                    validation_data=(x_test, y_test),
                    epochs=50,
                    batch_size=256,
                    verbose=1)

print(history.history)

Epoch 1/50
...
Epoch 50/50
{'loss': [0.31669390201568604, 0.11408108472824097, 0.07571295648813248, 0.05469875782728195, 0.042892515659332275, 0.03053305856883526, 0.023329129442572594, 0.01772758550941944, 0.01395347062498331, 0.01206878013908863, 0.01100819744169712, 0.008060701191425323, 0.007344502955675125, 0.008808438666164875, 0.00959924515336752, 0.005797688849270344, 0.005790538154542446, 0.013207390904426575, 0.01085977721

In [None]:
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print(f'Test Loss: {test_loss:.4f}')
print(f'Test Accuracy: {test_accuracy:.4f}')

Test Loss: 0.1241
Test Accuracy: 0.9813
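`model.evaluate` reports the loss and metrics chosen in `compile`. Below is a minimal NumPy sketch of what `sparse_categorical_crossentropy` and `accuracy` measure for integer labels such as `y_test`. The probabilities are made-up softmax outputs, and the function names are hypothetical. Keras' own implementation may differ in numerical details, such as how it clips probabilities near 0.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean of -log(probability assigned to the true class); y_true holds integer labels."""
    p_true = probs[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(np.clip(p_true, eps, 1.0))))

def accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class equals the label."""
    return float(np.mean(np.argmax(probs, axis=1) == y_true))

# Toy softmax outputs for 3 samples and 3 classes (made-up numbers)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
y_true = np.array([0, 1, 0])  # integer labels, like y_train / y_test

print(sparse_categorical_crossentropy(y_true, probs))
print(accuracy(y_true, probs))  # 2 of 3 predictions correct
```

The third sample is misclassified and also contributes the largest loss term, -log(0.3). This is why the loss can keep changing even when accuracy barely moves.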



# Exercise 2

Explore whether your neural network appears to be under- or overfitting by constructing plots of the train and test losses and accuracies during training. Use this information to improve your model - that is, train for longer if it appears to be underfitting and shorter if it appears to be overfitting. Does your test performance improve? What about your train performance?
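Because `validation_data` is passed to `fit`, `history.history` holds `'loss'` and `'accuracy'` for the training data and `'val_loss'` and `'val_accuracy'` for the held-out data (here, the test set), one value per epoch. Plotting each pair against the epoch number (e.g. with `matplotlib.pyplot.plot`) shows the typical overfitting pattern: training loss keeps falling while validation loss turns upward. To keep the snippet dependency-free, the sketch below computes summary numbers from such a dictionary instead of plotting. The curves are made up, and `summarize_history` is a hypothetical helper.

```python
import numpy as np

def summarize_history(history_dict):
    """Return the epoch (1-based, as Keras prints it) with the lowest validation
    loss, and the final gap between validation and training loss."""
    train_loss = np.asarray(history_dict['loss'])
    val_loss = np.asarray(history_dict['val_loss'])
    best_epoch = int(np.argmin(val_loss)) + 1
    final_gap = float(val_loss[-1] - train_loss[-1])
    return best_epoch, final_gap

# Made-up curves: training loss keeps falling while validation loss turns upward
fake_history = {
    'loss':     [0.30, 0.12, 0.08, 0.05, 0.03, 0.02, 0.01],
    'val_loss': [0.15, 0.10, 0.08, 0.08, 0.09, 0.10, 0.12],
}
best_epoch, gap = summarize_history(fake_history)
print(best_epoch, round(gap, 2))  # 3 0.11
```

A best epoch well before the last one, together with a growing gap, suggests training for fewer epochs. Keras can also stop automatically: passing `callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)]` to `fit` stops once `val_loss` has not improved for `patience` epochs. The value 3 here is an arbitrary choice.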

In [None]:
# Add a trailing channel dimension, (n, 28, 28) -> (n, 28, 28, 1), since Conv2D expects single-channel images
x_train = x_train.reshape(*x_train.shape[:3], 1)
x_test = x_test.reshape(*x_test.shape[:3], 1)

# Build a simple convolutional neural network (CNN)
model_cnn = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters=4, kernel_size=3, activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(filters=8, kernel_size=3, activation='relu'),
    tf.keras.layers.Conv2D(filters=8, kernel_size=3, activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])

model_cnn.summary()

model_cnn.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy'],
)

# Train the CNN
model_cnn.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))

# Evaluate the CNN on the test set
test_loss_cnn, test_accuracy_cnn = model_cnn.evaluate(x_test, y_test)
print(f'Test Loss (CNN): {test_loss_cnn:.4f}')
print(f'Test Accuracy (CNN): {test_accuracy_cnn:.4f}')


Model: "sequential_13"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 4)         40        
                                                                 
 conv2d_1 (Conv2D)           (None, 24, 24, 8)         296       
                                                                 
 conv2d_2 (Conv2D)           (None, 22, 22, 8)         584       
                                                                 
 flatten_13 (Flatten)        (None, 3872)              0         
                                                                 
 dense_29 (Dense)            (None, 10)                38730     
                                                                 
Total params: 39650 (154.88 KB)
Trainable params: 39650 (154.88 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Epoch 1/5
Epoch
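The shapes and parameter counts in this summary follow from two rules for a Conv2D layer with the default stride 1 and `padding='valid'`. Each spatial dimension shrinks by kernel_size − 1. Each filter has kernel_size · kernel_size · (input channels) weights plus one bias. A sketch that recomputes the summary (the helper names are hypothetical):

```python
def conv2d_valid_output(size, kernel):
    """Spatial size after a stride-1 convolution with no padding ('valid')."""
    return size - kernel + 1

def conv2d_params(kernel, c_in, c_out):
    """One kernel x kernel x c_in weight tensor per filter, plus one bias per filter."""
    return kernel * kernel * c_in * c_out + c_out

size, c_in, total = 28, 1, 0
for filters in [4, 8, 8]:             # the three Conv2D layers above, kernel_size=3
    total += conv2d_params(3, c_in, filters)
    size, c_in = conv2d_valid_output(size, 3), filters
    print((size, size, c_in))         # (26, 26, 4), then (24, 24, 8), then (22, 22, 8)

flat = size * size * c_in             # 22 * 22 * 8 = 3872 inputs to the Dense layer
total += flat * 10 + 10               # Dense(10): weights + biases
print(flat, total)                    # 3872 39650
```

Most of the CNN's parameters sit in the final Dense layer, not in the convolutions. Even so, the whole model is far smaller than the 269322-parameter dense network.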

# Exercise 3

Later during the semester, we will explore *convolutional neural networks*. For those of you who have finished (1) and (2), you may try this now to improve your model; check https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D for details.
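To build intuition for what a single Conv2D filter computes, here is a minimal NumPy sketch for one single-channel image and one 3×3 filter with stride 1 and no padding, followed by ReLU. Like deep-learning libraries, it computes a cross-correlation, so the kernel is not flipped. The toy image and the edge-detecting kernel are made-up values, and `conv2d_single` is a hypothetical helper. A real Conv2D layer also sums over input channels and applies many filters at once.

```python
import numpy as np

def conv2d_single(image, kernel, bias=0.0):
    """Valid, stride-1 2D cross-correlation of one single-channel image with one
    filter, followed by ReLU."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum over the kh x kw window whose top-left corner is (i, j)
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return np.maximum(out, 0.0)  # ReLU, as in the activation='relu' Conv2D layers

# Toy 5x5 "image": dark left half, bright right half (a vertical edge)
image = np.zeros((5, 5))
image[:, 2:] = 1.0
# 3x3 filter that responds to dark-to-bright transitions from left to right
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)
print(conv2d_single(image, kernel))  # each row is [3. 3. 0.]
```

The filter responds strongly only where its window straddles the edge. Stacking Conv2D layers lets the network combine such local detectors into features that help distinguish digits.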

In [None]:
# To get you started 

# Add a trailing channel dimension, (n, 28, 28) -> (n, 28, 28, 1), since Conv2D expects single-channel images
x_train = x_train.reshape(*x_train.shape[:3], 1)
x_test = x_test.reshape(*x_test.shape[:3], 1)

# An example model
model_cnn = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters=4, kernel_size=3, activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(filters=8, kernel_size=3, activation='relu'),
    tf.keras.layers.Conv2D(filters=8, kernel_size=3, activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])

model_cnn.summary()

model_cnn.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy'],
)

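model_cnn.fit(x_train,
              y_train,
              validation_data=(x_test, y_test),
              epochs=50,
              batch_size=256,
              verbose=1)

# Capture the returned loss and accuracy so the prints below report this model's results
test_loss_cnn, test_accuracy_cnn = model_cnn.evaluate(x_test, y_test)
print(f'Test Loss (CNN): {test_loss_cnn:.4f}')
print(f'Test Accuracy (CNN): {test_accuracy_cnn:.4f}')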

Model: "sequential_14"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_3 (Conv2D)           (None, 26, 26, 4)         40        
                                                                 
 conv2d_4 (Conv2D)           (None, 24, 24, 8)         296       
                                                                 
 conv2d_5 (Conv2D)           (None, 22, 22, 8)         584       
                                                                 
 flatten_14 (Flatten)        (None, 3872)              0         
                                                                 
 dense_30 (Dense)            (None, 10)                38730     
                                                                 
Total params: 39650 (154.88 KB)
Trainable params: 39650 (154.88 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Epoch 1/50
Epoc