#### Improvements that will be made to LeCun's 1998 LeNet-5 model

- 32 filters in the first convolutional layer, and 64 filters in the second one
- Only need to run max-pooling once (not twice)
- Leverage ReLU, dropout and batch normalization (none of which were in common use back then)

In [1]:
import tensorflow
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.layers import Dropout, BatchNormalization
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten #2D convolutional filter because of 2D image 
from matplotlib import pyplot as plt #for visualization purposes


#colour images also use 2D convolutional filters, even though they have 3 channels (R,G,B)
# this is because each filter spans all 3 input channels and only slides along height and width, so no extra spatial dimension is needed
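
The comment above about colour images can be made concrete with a minimal NumPy sketch. This is not how Keras implements `Conv2D`; the arrays `patch` and `kernel` and their values are made up for illustration. It shows one filter applied to one 3x3 RGB patch: the multiply-and-sum runs over height, width and channels together, so the filter still produces one number per position, i.e. a 2D feature map.

```python
import numpy as np

# Hypothetical 3x3 patch of an RGB image: (height, width, channels)
patch = np.ones((3, 3, 3), dtype="float32")

# One convolutional filter: it spans all 3 input channels
kernel = np.full((3, 3, 3), 0.5, dtype="float32")
bias = 1.0

# Multiply element-wise and sum over height, width AND channels
activation = float(np.sum(patch * kernel) + bias)

# Parameter count for a layer of 32 such filters on RGB input
n_filters = 32
n_params = (3 * 3 * 3 + 1) * n_filters  # +1 for each filter's bias

print(activation)
print(n_params)
```

For the greyscale MNIST images used here the channel depth is 1 instead of 3, so the same formula gives (3 * 3 * 1 + 1) * 32 parameters for the first layer.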


#### General structure

The network starts with convolutional layers, then switches to dense layers for the last couple of layers before the output. The data therefore has to be flattened in between, so that each image reaches the dense layers as a single 1D feature vector.
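
As a rough illustration of the flattening step, here is a NumPy sketch (a stand-in for the Keras `Flatten` layer, not its implementation). The array `feature_maps` is made up; its 12x12x64 shape is what the network defined later in this notebook produces just before flattening.

```python
import numpy as np

# Hypothetical batch of 2 stacks of feature maps: (batch, height, width, channels)
feature_maps = np.zeros((2, 12, 12, 64), dtype="float32")

# Flatten everything except the batch axis
flat = feature_maps.reshape(feature_maps.shape[0], -1)

print(flat.shape)
```

Each image becomes one vector of 12 * 12 * 64 = 9216 values, which is what the first dense layer receives.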

#### Load data

In [2]:
(x_train, y_train), (x_valid, y_valid) = mnist.load_data()

#### Preprocess data

There is no need to flatten the data like before: the input arrays go into the convolutional layers first, and 2D convolution needs each image as a 2D grid (plus a channel axis), not a 1D vector.

In [3]:
x_train = x_train.reshape(60000, 28, 28, 1).astype('float32') #28x28 unflattened image 

# ^the trailing 1 is the number of channels: one greyscale channel (colour RGB would be 3)

x_valid = x_valid.reshape(10000, 28, 28, 1).astype('float32')
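
The image counts (60000 and 10000) do not have to be hardcoded. A small sketch with a made-up stand-in array `images` shows two equivalent ways to add the channel axis:

```python
import numpy as np

# Stand-in for MNIST: 5 greyscale images of 28x28 pixels
images = np.zeros((5, 28, 28), dtype="uint8")

# -1 lets NumPy infer the number of images
with_channel = images.reshape(-1, 28, 28, 1).astype("float32")

# np.expand_dims appends the channel axis without naming any sizes
same = np.expand_dims(images, axis=-1).astype("float32")

print(with_channel.shape, same.shape)
```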

In [4]:
x_train /= 255 #scale pixel values from 0-255 down to 0-1
x_valid /= 255

In [5]:
y_train = to_categorical(y_train, 10)
y_valid = to_categorical(y_valid, 10)


print(x_train.shape)
print(x_valid.shape)

print(y_train.shape)
print(y_valid.shape)
print(y_valid[0])

(60000, 28, 28, 1)
(10000, 28, 28, 1)
(60000, 10)
(10000, 10)
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
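
The row printed above has its single 1 at index 7, meaning the first validation image is labelled as the digit 7. A minimal NumPy sketch of the same one-hot encoding (an illustration with made-up labels, not the `to_categorical` source):

```python
import numpy as np

labels = np.array([7, 2, 1])  # made-up integer class labels
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(one_hot[0])
# argmax along the last axis recovers the integer labels
print(np.argmax(one_hot, axis=-1))
```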


#### Design the neural network architecture

In [6]:
model = Sequential()


#first hidden layer
model.add(Input(shape=(28, 28, 1)))
model.add(Conv2D(32, kernel_size=(3,3), activation='relu')) #kernel size is dimensions of convolutional filter


#second hidden layer
model.add(Conv2D(64, kernel_size=(3,3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2))) #stride defaults to the pool size, so 2 here
model.add(BatchNormalization())
model.add(Dropout(0.2))
model.add(Flatten()) #flatten to 1D so the dense layers can take it


#third hidden layer
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.5)) #apply more dropout to deeper layers, since more prone to memorization/overfitting

#output layer:
model.add(Dense(10, activation='softmax'))
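
To see how the image size shrinks through the convolutional part, here is a small pure-Python sketch. It assumes the Keras defaults used above: no padding (`padding='valid'`), stride 1 for `Conv2D`, and a pooling stride equal to the pool size. The helper `conv_output_size` is a hypothetical name, not a Keras function.

```python
def conv_output_size(size, kernel, stride=1):
    """Output width/height of an unpadded convolution or pooling step."""
    return (size - kernel) // stride + 1

after_conv1 = conv_output_size(28, kernel=3)                    # first Conv2D
after_conv2 = conv_output_size(after_conv1, kernel=3)           # second Conv2D
after_pool = conv_output_size(after_conv2, kernel=2, stride=2)  # MaxPooling2D

# 64 channels come out of the second Conv2D
flattened = after_pool * after_pool * 64

print(after_conv1, after_conv2, after_pool, flattened)
```

The sizes go 28 -> 26 -> 24 -> 12, so `Flatten` hands 9216 values per image to the first dense layer.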


In [7]:
model.summary()

#### Observations

- Has a LOT more parameters than a typical purely-dense MNIST network; nearly all of them sit in the first dense layer after Flatten
- The Conv2D layers contribute comparatively few parameters, despite being computationally heavy
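
These observations can be checked by hand. The sketch below counts weights and biases for the Conv2D and Dense layers defined above; the helper names are hypothetical, and the small BatchNormalization parameter counts are left out for brevity.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # each filter has kernel_h * kernel_w * in_channels weights plus one bias
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    # one weight per input per unit, plus one bias per unit
    return (inputs + 1) * units

conv_total = conv2d_params(3, 3, 1, 32) + conv2d_params(3, 3, 32, 64)
flattened = 12 * 12 * 64  # feature-map size after pooling
first_dense = dense_params(flattened, 128)
output_layer = dense_params(128, 10)

print(conv_total, first_dense, output_layer)
```

The two convolutional layers together hold 18,816 parameters, while the first dense layer alone holds 1,179,776, because every one of the 9216 flattened values connects to every one of its 128 units.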


#### Configuring the model

In [8]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

#### Training the model

In [None]:
model.fit(x_train, y_train, batch_size=128, epochs=10, verbose=1, validation_data=(x_valid, y_valid))

Epoch 1/10
469/469 ━━━━━━━━━━━━━━━━━━━━ 29s 61ms/step - accuracy: 0.9070 - loss: 0.3197 - val_accuracy: 0.9743 - val_loss: 0.2097
Epoch 2/10
193/469 ━━━━━━━━━━━━━━━━━━━━ 19s 71ms/step - accuracy: 0.9827 - loss: 0.0587

#### Training takes a really long time without a GPU to speed up the computation
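
The `469` in the training log is the number of batches per epoch. A quick check, using the training-set size and the `batch_size` passed to `model.fit`:

```python
import math

n_train = 60000   # training images
batch_size = 128  # as passed to model.fit

steps_per_epoch = math.ceil(n_train / batch_size)
# the final batch of each epoch is smaller than the rest
last_batch = n_train - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch, last_batch)
```

At roughly 60-70 ms per step on CPU, 469 steps come to about half a minute per epoch, which matches the log.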

#### Model Evaluation:

In [None]:
model.evaluate(x_valid, y_valid)

#### Evaluate a classification

In [None]:
import random
random_sample_idx = random.randint(0, len(x_valid) - 1) #randint includes both endpoints

test_valid = x_valid[random_sample_idx].reshape(1, 28, 28, 1) #batch of 1 image, matching the model's input shape

In [None]:
model.predict(test_valid)

In [None]:
import numpy as np
np.argmax(model.predict(test_valid), axis=-1) #index of the highest probability in the output arr, i.e. the predicted digit
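
For reference, here is what `argmax` does on a softmax output, using a made-up probability row rather than a real prediction from this model:

```python
import numpy as np

# Made-up softmax output for one image: shape (1, 10), entries sum to 1
probs = np.array([[0.01, 0.01, 0.02, 0.01, 0.01, 0.01, 0.01, 0.90, 0.01, 0.01]])

predicted_digit = np.argmax(probs, axis=-1)  # index of the largest entry per row
confidence = np.max(probs, axis=-1)          # the largest probability itself

print(predicted_digit, confidence)
```

`argmax` returns the position of the largest value (the class), while `max` returns the value itself (how confident the model is in that class).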

In [None]:
plt.imshow(x_valid[random_sample_idx].reshape(28, 28), cmap='Greys') #drop the channel axis for plotting