Assignment 7: Neural Networks using Keras and TensorFlow

Please see the associated document for questions.

If you have problems with Keras and TensorFlow on your local installation, please make sure they are up to date. This notebook runs on Google Colab.

In [None]:
pip install tensorflow

In [16]:
# imports
import keras
import tensorflow
import tensorflow as tf
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K
from tensorflow.keras import regularizers
from matplotlib import pyplot as plt



In [5]:
# Hyper-parameters data-loading and formatting

batch_size = 128
num_classes = 10
epochs = 10

img_rows, img_cols = 28, 28

(x_train, lbl_train), (x_test, lbl_test) = mnist.load_data()



In [6]:
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

**Preprocessing**

In [7]:
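# Convert the 8-bit integer pixels to floats so they can be scaled
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# Scale pixel intensities from [0, 255] to [0, 1]
x_train /= 255
x_test /= 255

# One-hot encode the integer labels (0-9) into vectors of length num_classes
y_train = keras.utils.to_categorical(lbl_train, num_classes)
y_test = keras.utils.to_categorical(lbl_test, num_classes)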


In [11]:

## Define model ##
model = Sequential()

model.add(Flatten()) # since we want all 28x28 pixels on one row
model.add(Dense(64, activation = 'relu'))
model.add(Dense(64, activation = 'relu'))
model.add(Dense(num_classes, activation='softmax'))


model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.SGD(learning_rate=0.1),
              metrics=['accuracy'])

fit_info = model.fit(x_train, y_train,
                     batch_size=batch_size,
                     epochs=epochs,
                     verbose=1,
                     validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss: {}, Test accuracy {}'.format(score[0], score[1]))

Test loss: 0.08893831074237823, Test accuracy 0.9721999764442444


1. Preprocessing. In the notebook, the data is downloaded from an external server and imported into the notebook environment using the mnist.load_data() function call.

1.1. Explain the data pre-processing highlighted in the notebook
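
As a rough illustration of what the preprocessing cell does, the sketch below applies the same two ideas (scaling pixel values to the range 0 to 1 and one-hot encoding the label) to a tiny made-up image. The helper names `scale_pixels` and `one_hot` are hypothetical and only mimic the effect of dividing by 255 and of `to_categorical`; they are not part of Keras.

```python
def scale_pixels(image):
    """Map 8-bit pixel intensities (0..255) to floats in [0, 1]."""
    return [[pixel / 255 for pixel in row] for row in image]


def one_hot(label, num_classes):
    """Return a list with 1.0 at index `label` and 0.0 everywhere else."""
    encoded = [0.0] * num_classes
    encoded[label] = 1.0
    return encoded


# A made-up 2x2 "image" and label, for illustration only.
tiny_image = [[0, 128], [255, 64]]
scaled = scale_pixels(tiny_image)
encoded_label = one_hot(3, 10)

print(scaled)
print(encoded_label)
```

Keeping the inputs in a small, fixed range tends to make gradient descent better behaved, and the one-hot labels have the same length as the 10-unit softmax output, which is what the categorical crossentropy loss expects.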

2. Network model, training, and changing hyper-parameters

2.1. How many layers does the network in the notebook have? How many neurons does each layer have? What activation functions are used, and why are these appropriate for this application? What is the total number of parameters for the network? Why do the input and output layers have the dimensions they have?

The network has four layers if the input is counted: the input (Flatten) layer passes on 784 (28*28) values, one per pixel; the two hidden layers have 64 neurons each and use ReLU; and the output layer has 10 neurons with softmax, one for each digit from 0 to 9.
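
The total number of parameters follows from the layer sizes: each Dense layer has one weight per input-unit pair plus one bias per unit, and Flatten has no trainable parameters. A small sketch of that arithmetic (the helper `dense_params` is a hypothetical name used only here):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


# Flattened input, two hidden layers, output layer
layer_sizes = [28 * 28, 64, 64, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)     # [50240, 4160, 650]
print(total_params)  # 55050
```

Calling `model.summary()` after the model has been built should report the same per-layer counts.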

2.2. What loss function is used to train the network? What is the functional form (a mathematical expression) of the loss function, and how should we interpret it? Why is it appropriate for the problem at hand?

Categorical crossentropy is the loss function used, and its mathematical formula is:
$$L_{CE} = -\sum_{i=1}^{n} T_i \log(S_i)$$

where $S$ is the vector of output values after applying softmax, defined as:
$$S_j = \mathrm{Softmax}(Z)_j = \frac{\exp(z_j)}{\sum_{i=1}^{n}\exp(z_i)} \quad \text{for } j = 1, \dots, n$$

where $Z = (z_1, \dots, z_n)$ is the vector of raw outputs of the last layer, before softmax is applied.

$T$ is the vector of expected (true) values. In our case the labels are one-hot encoded, so exactly one entry of $T$ is 1 and the rest are 0.
This means that categorical crossentropy only depends on $S_i$ for the index where $T_i = 1$. As that $S_i$ approaches 1, $L_{CE}$ goes to zero, and as it approaches 0, $L_{CE}$ goes to infinity.
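
The behaviour described above can be checked numerically. The following is a minimal, dependency-free sketch of softmax and categorical crossentropy for a single example; the function names and the three-class logits are made up for illustration and are not the Keras implementation.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(target, probs):
    """-sum(T_i * log(S_i)); terms with T_i = 0 contribute nothing."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


target = [0.0, 1.0, 0.0]             # true class is index 1
confident = softmax([0.0, 5.0, 0.0])  # high score on the true class
wrong = softmax([5.0, 0.0, 0.0])      # high score on a wrong class

loss_confident = categorical_crossentropy(target, confident)
loss_wrong = categorical_crossentropy(target, wrong)
print(loss_confident, loss_wrong)
```

The loss is small when the probability on the true class is close to 1 and grows without bound as that probability approaches 0.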

Categorical crossentropy is a good fit for our model since we want it to choose between 10 different classes, and categorical crossentropy works for an arbitrary number of classes, as opposed to binary crossentropy, which can only separate two categories.

Softmax is a continuously differentiable function, which means that we can take the partial derivative of $L_{CE}$ with respect to any of the weights and use it in gradient descent to reduce the loss.
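
As a worked step, using the same symbols as the formulas above and assuming a one-hot target so that $\sum_{i} T_i = 1$, the derivative of the loss with respect to a raw output $z_j$ can be written out. The softmax derivative is

$$\frac{\partial S_i}{\partial z_j} = S_i(\delta_{ij} - S_j)$$

where $\delta_{ij}$ is 1 if $i = j$ and 0 otherwise. Substituting into the loss gives

$$\frac{\partial L_{CE}}{\partial z_j} = -\sum_{i=1}^{n} \frac{T_i}{S_i} S_i(\delta_{ij} - S_j) = -T_j + S_j\sum_{i=1}^{n} T_i = S_j - T_j$$

So the gradient at the output is simply the difference between the predicted probability and the true label, and the chain rule (backpropagation) carries it back to every weight in the earlier layers.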



2.3. Train the network for 10 epochs and plot the training and validation accuracy for each epoch.
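
One possible way to produce this plot is sketched below. It assumes the history dictionary uses the keys `accuracy` and `val_accuracy`, which is what recent TensorFlow versions record for `metrics=['accuracy']` (older versions may use `acc` and `val_acc`). The function names and the example numbers are made up for illustration and are not results from this notebook.

```python
def accuracy_curves(history):
    """Return (epoch, train_acc, val_acc) tuples, with epochs numbered from 1."""
    pairs = zip(history["accuracy"], history["val_accuracy"])
    return [(epoch, train, val) for epoch, (train, val) in enumerate(pairs, start=1)]


def plot_accuracy(history):
    """Plot training and validation accuracy per epoch."""
    # Imported here so the helper above also works without matplotlib installed.
    from matplotlib import pyplot as plt

    curves = accuracy_curves(history)
    epoch_axis = [epoch for epoch, _, _ in curves]
    plt.plot(epoch_axis, [train for _, train, _ in curves], label="training")
    plt.plot(epoch_axis, [val for _, _, val in curves], label="validation")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()


# Made-up numbers for illustration only.
example_history = {"accuracy": [0.90, 0.95, 0.97], "val_accuracy": [0.92, 0.95, 0.96]}
curves = accuracy_curves(example_history)
print(curves)
```

In the notebook this would be called as `plot_accuracy(fit_info.history)` after `model.fit` has finished.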


2.4. Update the model to implement a three-layer neural network where the hidden layers have 500 and 300 hidden units respectively. Train for 40 epochs. What is the best validation accuracy you can achieve? Geoff Hinton (a co-pioneer of deep learning) claimed this network could reach a validation accuracy of 0.9847 (http://yann.lecun.com/exdb/mnist/) using weight decay (L2 regularization of weights (kernels): https://keras.io/api/layers/regularizers/). Implement weight decay on hidden units and train and select 5 regularization factors from 0.000001 to 0.001. Train 3 replicate networks for each regularization factor. Plot the final validation accuracy with standard deviation (computed from the replicates) as a function of the regularization factor. How close do you get to Hinton's result? If you do not get the same results, what factors may influence this? (Hint: what information is not given by Hinton on the MNIST database that may influence model training?)
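
The sweep over regularization factors could be organised roughly as follows. This is only a sketch under stated assumptions: the factors are spaced logarithmically (the assignment does not say how to pick them), and `fake_train` is a stand-in that returns made-up accuracies. In the notebook it would be replaced by a function that builds the 500/300 network with `kernel_regularizer=regularizers.L2(factor)` on the hidden Dense layers, trains it, and returns the final validation accuracy.

```python
import statistics


def log_spaced(low, high, count):
    """Return `count` values spaced evenly on a log scale from low to high."""
    ratio = (high / low) ** (1 / (count - 1))
    return [low * ratio ** i for i in range(count)]


def sweep(train_fn, factors, replicates=3):
    """Train `replicates` networks per factor; return {factor: (mean, std)}."""
    summary = {}
    for factor in factors:
        accuracies = [train_fn(factor, replicate) for replicate in range(replicates)]
        summary[factor] = (statistics.mean(accuracies), statistics.stdev(accuracies))
    return summary


def fake_train(factor, replicate):
    """Hypothetical stand-in for real training: made-up numbers, NOT real results."""
    return 0.98 + 0.001 * replicate - factor


factors = log_spaced(1e-6, 1e-3, 5)
summary = sweep(fake_train, factors)
for factor, (mean, std) in summary.items():
    print(f"{factor:.2e}: mean={mean:.4f} std={std:.4f}")
```

The means and standard deviations can then be drawn with `plt.errorbar` against the factors, with a logarithmic x-axis so that all five factors are visible.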


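L1 regularization adds the sum of the absolute values of the weights to the loss, which tends to make the network sparse.
L2 regularization adds the sum of the squared values of the weights to the loss, so large weights are punished more severely.

The alpha parameter (the regularization factor) controls how much weight this penalty gets relative to the rest of the loss.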
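
To make the difference between the two penalties concrete, here is a small sketch that computes both for a made-up weight list. The function names, weights, and alpha value are chosen only for illustration.

```python
def l1_penalty(weights, alpha):
    """alpha times the sum of absolute weight values."""
    return alpha * sum(abs(w) for w in weights)


def l2_penalty(weights, alpha):
    """alpha times the sum of squared weight values."""
    return alpha * sum(w * w for w in weights)


alpha = 0.001
weights = [0.1, -0.5, 2.0]  # made-up weights

l1 = l1_penalty(weights, alpha)
l2 = l2_penalty(weights, alpha)
print(l1, l2)

# A small weight is penalised less by L2 than by L1, a large one more.
small_l1, small_l2 = l1_penalty([0.1], alpha), l2_penalty([0.1], alpha)
large_l1, large_l2 = l1_penalty([2.0], alpha), l2_penalty([2.0], alpha)
```

This is why L2 mostly shrinks the largest weights, while L1 keeps pushing small weights all the way towards zero.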

In [14]:
epochs = 60

model = Sequential()

model.add(Flatten()) # since we want all 28x28 pixels on one row
model.add(Dense(500, activation = 'relu'))
model.add(Dense(300, activation = 'relu'))
model.add(Dense(num_classes, activation='softmax', kernel_regularizer=regularizers.L1(0.001)))


model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.SGD(learning_rate=0.1),
              metrics=['accuracy'])

fit_info = model.fit(x_train, y_train,
                     batch_size=batch_size,
                     epochs=epochs,
                     verbose=1,
                     validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss: {}, Test accuracy {}'.format(score[0], score[1]))

Test loss: 0.0725211650133133, Test accuracy 0.9819999933242798


In [None]:
epochs = 60

model = Sequential()

model.add(Flatten()) # since we want all 28x28 pixels on one row
model.add(Dense(500, activation = 'relu'))
model.add(Dense(300, activation = 'relu'))
model.add(Dense(num_classes, activation='softmax'))


model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.SGD(learning_rate=0.1),
              metrics=['accuracy'])

fit_info = model.fit(x_train, y_train,
                     batch_size=batch_size,
                     epochs=epochs,
                     verbose=1,
                     validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss: {}, Test accuracy {}'.format(score[0], score[1]))

The validation accuracy seems to converge at about 0.9816, and the loss seems to converge at around 0.726.

3. Convolutional layers

3.1. Design a model that makes use of at least one convolutional layer. How performant a model can you get? According to the MNIST database, it should be possible to reach 99% accuracy on the validation data. If you choose to use any layers apart from the convolutional layers and layers that you used in previous questions, you must describe what they do. If you do not reach 99% accuracy, report your best performance, and explain your attempts and thought process.

3.2. Discuss the differences and potential benefits of using convolutional layers over fully connected ones for the application.
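
One difference that is easy to quantify is the number of parameters. A convolutional layer reuses the same small kernel at every image position, whereas a fully connected layer needs a separate weight for every pixel-unit pair. The sketch below compares a 3x3 convolution with 28 filters on a one-channel image against a Dense layer with 500 units on the flattened 28x28 image; the helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """One kernel of size h*w*in_channels per filter, plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(n_inputs, n_units):
    """One weight per input-unit pair, plus one bias per unit."""
    return n_inputs * n_units + n_units


conv_first = conv2d_params(3, 3, 1, 28)
dense_first = dense_params(28 * 28, 500)

print(conv_first)   # 280
print(dense_first)  # 392500
```

Because the kernel is shared across positions, the convolutional layer can also detect the same local pattern (an edge or a stroke) wherever it appears in the image, which a fully connected layer has to learn separately for each location.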

In [None]:
epochs = 60

model = Sequential()
model.add(Conv2D(28, (3, 3), activation='relu', input_shape=input_shape))
model.add(MaxPooling2D((2, 2)))  # keep the maximum of each 2x2 window
model.add(Conv2D(56, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(56, (3, 3), activation='relu'))
model.add(Flatten()) # turn the final feature maps into one vector for the Dense layers
model.add(Dense(500, activation = 'relu'))
model.add(Dense(300, activation = 'relu', 
          kernel_regularizer=regularizers.L1(0.00001)))
model.add(Dense(num_classes, activation='softmax'))


model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.SGD(learning_rate=0.1),
              metrics=['accuracy'])

fit_info = model.fit(x_train, y_train,
                     batch_size=batch_size,
                     epochs=epochs,
                     verbose=1,
                     validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss: {}, Test accuracy {}'.format(score[0], score[1]))

Epoch 1/60


ValueError: in user code:

    File "C:\Users\eliuh\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\engine\training.py", line 1284, in train_function  *
        return step_function(self, iterator)
    File "C:\Users\eliuh\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\engine\training.py", line 1268, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "C:\Users\eliuh\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\engine\training.py", line 1249, in run_step  **
        outputs = model.train_step(data)
    File "C:\Users\eliuh\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\engine\training.py", line 1050, in train_step
        y_pred = self(x, training=True)
    File "C:\Users\eliuh\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "C:\Users\eliuh\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\keras\engine\input_spec.py", line 253, in assert_input_compatibility
        raise ValueError(

    ValueError: Exception encountered when calling layer 'sequential_16' (type Sequential).
    
    Input 0 of layer "conv2d_2" is incompatible with the layer: expected min_ndim=4, found ndim=2. Full shape received: (None, 784)
    
    Call arguments received by layer 'sequential_16' (type Sequential):
      • inputs=tf.Tensor(shape=(None, 28, 28, 1), dtype=float32)
      • training=True
      • mask=None
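
The error message says that a Conv2D layer received a tensor of shape (None, 784), while the model itself was given (None, 28, 28, 1). Conv2D needs four dimensions (batch, rows, columns, channels), so this is consistent with a Flatten layer having been placed before the convolution in the model that produced this traceback; that is an inference from the message, since the cell shown above flattens only after the last convolution. The sketch below traces the spatial size through the layers of the cell above, assuming the Keras defaults of no padding and stride 1 for Conv2D and non-overlapping windows for MaxPooling2D. The helper names are hypothetical.

```python
def conv_valid(size, kernel):
    """Output side length of a convolution with no padding and stride 1."""
    return size - kernel + 1


def pool(size, window):
    """Output side length of non-overlapping pooling (incomplete windows dropped)."""
    return size // window


side = 28
side = conv_valid(side, 3)    # 26 after Conv2D(28, (3, 3))
side = pool(side, 2)          # 13 after MaxPooling2D((2, 2))
side = conv_valid(side, 3)    # 11 after Conv2D(56, (3, 3))
side = pool(side, 2)          # 5 after MaxPooling2D((2, 2))
side = conv_valid(side, 3)    # 3 after Conv2D(56, (3, 3))
flattened = side * side * 56  # values handed to the first Dense layer

print(side, flattened)  # 3 504
```

With Flatten placed after the last convolution, the Dense layers receive a vector of 504 values per image and the shapes are compatible.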
