# ECBE329 MNIST

## Convolutional Neural Networks

---

In this notebook, we train a Convolutional Neural Network to classify images from the MNIST database.

## Load MNIST Database

MNIST is one of the most famous datasets in the field of machine learning.
 - It has 70,000 images of handwritten digits
 - It is very straightforward to download
 - Image dimensions are 28x28
 - Images are grayscale

In [None]:
from keras.datasets import mnist
import numpy as np
from sklearn.model_selection import train_test_split

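# use Keras to load the MNIST database (60,000 training and 10,000 test images)
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# merge the default train/test sets so the data can be re-split below
x = np.concatenate((X_train, X_test))
y = np.concatenate((y_train, y_test))

# hold out 10% of the 70,000 images for testing
# (random_state=None gives a different split on every run)
test_size = 0.1
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=test_size, random_state=None)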

print("The MNIST database has a training set of %d examples." % len(X_train))
print("The MNIST database has a test set of %d examples." % len(X_test))

The MNIST database has a training set of 63000 examples.
The MNIST database has a test set of 7000 examples.


In [None]:
# rescale pixel values from [0, 255] to [0, 1]
X_train = X_train.astype('float32')/255   # training data
X_test = X_test.astype('float32')/255     # test data

print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

X_train shape: (63000, 28, 28)
63000 train samples
7000 test samples


In [None]:
# to_categorical is imported directly from keras.utils;
# the np_utils module is not available in recent Keras releases
from keras.utils import to_categorical

num_classes = 10
# print first ten (integer-valued) training labels
print('Integer-valued labels:')
print(y_train[:10])

# one-hot encode the labels:
# convert integer class vectors to binary class matrices
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)

# print first ten (one-hot) training labels
print('One-hot labels:')
print(y_train[:10])

Integer-valued labels:
[5 3 0 7 0 1 8 3 3 7]
One-hot labels:
[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]]
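
As a rough illustration of what one-hot encoding does, the sketch below rebuilds the first three rows of the output above with plain NumPy. The helper name `one_hot` is hypothetical and is not part of Keras; it assumes integer labels in the range 0 to `num_classes - 1`.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: map integer labels to one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    # row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype='float32')[labels]

encoded = one_hot([5, 3, 0], num_classes=10)
print(encoded)

# taking the argmax of each row recovers the original integer labels
decoded = encoded.argmax(axis=1)
print(decoded)
```

The `argmax` step is the same trick needed later whenever predictions or one-hot labels have to be compared as class indices.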


In [None]:
# input image dimensions: 28x28 pixels, 1 channel (grayscale)
img_rows, img_cols = 28, 28

# add a trailing channel axis so each image has shape (rows, cols, channels)
X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)

print('input_shape: ', input_shape)
print('x_train shape:', X_train.shape)

input_shape:  (28, 28, 1)
x_train shape: (63000, 28, 28, 1)
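
The reshape above only appends a channel axis; no pixel values change. A minimal sketch on a stand-in array (zeros, not real MNIST data), assuming the default channels-last layout that Keras convolution layers expect:

```python
import numpy as np

# stand-in batch of 4 blank grayscale images (not real MNIST data)
batch = np.zeros((4, 28, 28), dtype='float32')

# append a trailing channel axis: (4, 28, 28) -> (4, 28, 28, 1)
with_channel = batch.reshape(batch.shape[0], 28, 28, 1)

# np.expand_dims produces the same shape without spelling out the sizes
same_thing = np.expand_dims(batch, axis=-1)

print(with_channel.shape)
print(same_thing.shape)
```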


## Import


In [None]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import KFold

### Define the Model Architecture

When adding a **Conv2D** layer, you must pass the following arguments:
- filters - The number of filters.
- kernel_size - Number specifying both the height and width of the (square) convolution window.

There are some additional, optional arguments that you might like to tune:

- strides - The stride of the convolution. If you don't specify anything, strides is set to 1.
- padding - One of 'valid' or 'same'. If you don't specify anything, padding is set to 'valid'.
- activation - Typically 'relu'. If you don't specify anything, no activation is applied. You are strongly encouraged to add a ReLU activation function to every convolutional layer in your networks.
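
To see how `kernel_size`, `strides`, and `padding` interact, the sketch below computes the spatial output size of a convolution along one dimension. The function name `conv_output_size` is made up for this illustration; the formulas are the usual ones for 'valid' and 'same' padding.

```python
import math

def conv_output_size(n, kernel_size, stride=1, padding='valid'):
    """Output length along one spatial dimension of a convolution."""
    if padding == 'valid':
        # the window must fit entirely inside the input
        return (n - kernel_size) // stride + 1
    if padding == 'same':
        # the input is padded so only the stride shrinks the output
        return math.ceil(n / stride)
    raise ValueError("padding must be 'valid' or 'same'")

valid_out = conv_output_size(28, 5)                    # 28x28 input, 5x5 kernel
same_out = conv_output_size(28, 5, padding='same')
strided_out = conv_output_size(28, 5, stride=2)
print(valid_out, same_out, strided_out)
```

With a 28x28 input and a 5x5 kernel, 'valid' padding gives the 24x24 feature maps that appear in the model summaries of Parts 2 and 3.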

**Things to remember**
- Always add a ReLU activation function to the **Conv2D** layers in your CNN. With the exception of the final layer in the network, Dense layers should also have a ReLU activation function.
- When constructing a network for classification, the final layer in the network should be a **Dense** layer with a softmax activation function. The number of nodes in the final layer should equal the total number of classes in the dataset.
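
The softmax in the final layer turns raw scores into probabilities that sum to 1. A minimal pure-Python sketch (the scores are arbitrary illustration values, not outputs of the models in this notebook):

```python
import math

def softmax(scores):
    """Convert a list of raw scores into probabilities."""
    # subtracting the maximum avoids overflow and does not change the result
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The largest score always receives the largest probability, which is why the predicted class is simply the index of the maximum output.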

## PART 1

In [None]:
# build the model object
model = Sequential()

# Dense acts on the last axis only, so with input (28, 28, 1)
# this layer maps each pixel to 28 features
model.add(Dense(28, input_shape=(28, 28, 1), activation='relu'))

# flatten the (28, 28, 28) activations into a single vector
model.add(Flatten())

# FC_1: fully connected hidden layer with 32 units
model.add(Dense(32, activation='relu'))

# output layer: softmax turns the 10 scores into class probabilities
model.add(Dense(10, activation='softmax'))

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 28, 28, 28)        56        
_________________________________________________________________
flatten (Flatten)            (None, 21952)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 32)                702496    
_________________________________________________________________
dense_2 (Dense)              (None, 10)                330       
Total params: 702,882
Trainable params: 702,882
Non-trainable params: 0
_________________________________________________________________
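
The parameter counts in the summary above can be checked by hand: a Dense layer has one weight per (input, unit) pair plus one bias per unit. A short sketch, where `dense_params` is a hypothetical helper written for this check:

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

layers = {
    'dense': dense_params(1, 28),               # acts on the single channel axis
    'dense_1': dense_params(28 * 28 * 28, 32),  # input is the flattened 21952 vector
    'dense_2': dense_params(32, 10),
}
total = sum(layers.values())

for name, count in layers.items():
    print(name, count)
print('total', total)
```

Almost all of the parameters sit in `dense_1`, because the first Dense layer expands every pixel to 28 features before flattening.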


In [None]:
# compile the model
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', 
              metrics=['accuracy'])
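
For a one-hot label, categorical cross-entropy reduces to the negative log of the probability assigned to the true class. The sketch below is a simplified single-sample version written for illustration; the function name and the `eps` clipping value are assumptions, not the Keras implementation.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    # clip predictions away from zero so log() stays finite
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 0, 1]
loss_confident = categorical_crossentropy(y_true, [0.05, 0.05, 0.90])
loss_unsure = categorical_crossentropy(y_true, [0.40, 0.30, 0.30])
print(loss_confident, loss_unsure)
```

A confident correct prediction yields a loss near zero, while spreading probability over the wrong classes is penalized.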

In [None]:
# train the model
hist = model.fit(X_train, y_train, batch_size=10, epochs=10, verbose=2, shuffle=True)

Epoch 1/10
6300/6300 - 11s - loss: 0.2829 - accuracy: 0.9185
Epoch 2/10
6300/6300 - 10s - loss: 0.1772 - accuracy: 0.9503
Epoch 3/10
6300/6300 - 11s - loss: 0.1578 - accuracy: 0.9558
Epoch 4/10
6300/6300 - 10s - loss: 0.1490 - accuracy: 0.9593
Epoch 5/10
6300/6300 - 10s - loss: 0.1435 - accuracy: 0.9609
Epoch 6/10
6300/6300 - 10s - loss: 0.1404 - accuracy: 0.9615
Epoch 7/10
6300/6300 - 10s - loss: 0.1377 - accuracy: 0.9627
Epoch 8/10
6300/6300 - 10s - loss: 0.1366 - accuracy: 0.9630
Epoch 9/10
6300/6300 - 10s - loss: 0.1338 - accuracy: 0.9642
Epoch 10/10
6300/6300 - 10s - loss: 0.1314 - accuracy: 0.9646
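
The `6300/6300` counter in the log counts batches per epoch, not images. A quick check of that arithmetic, which also covers the K-fold cells in Parts 2 and 3 where each fit sees nine tenths of the training set with a batch size of 32 (`steps_per_epoch` is a helper name chosen for this sketch):

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)

part1_steps = steps_per_epoch(63000, 10)

# in 10-fold splitting, one tenth of the 63000 samples is held out
fold_train_size = 63000 - 63000 // 10
fold_steps = steps_per_epoch(fold_train_size, 32)

print(part1_steps, fold_train_size, fold_steps)
```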


In [None]:
# evaluate test accuracy
score = model.evaluate(X_test, y_test, verbose=0)
accuracy = 100*score[1]

# print test accuracy
print('Test accuracy: %.4f%%' % accuracy)

Test accuracy: 95.9000%


## PART 2

In [None]:
# build the model object
model = Sequential()


# CONV_1: convolutional layer with 20 filters of size 5x5 and sigmoid activation
model.add(Conv2D(20, kernel_size=(5, 5), padding="valid", activation='sigmoid', input_shape=input_shape))

# POOL_1: downsample each feature map by keeping the maximum of every 2x2 window
model.add(MaxPooling2D(pool_size=(2, 2)))

# flatten the feature maps into a single vector for the fully connected layers
model.add(Flatten())

# FC_1: fully connected hidden layer with 100 units
model.add(Dense(100, activation='sigmoid'))

# output layer: softmax turns the 10 scores into class probabilities
model.add(Dense(10, activation='softmax'))

model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 24, 24, 20)        520       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 20)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 2880)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 100)               288100    
_________________________________________________________________
dense_4 (Dense)              (None, 10)                1010      
Total params: 289,630
Trainable params: 289,630
Non-trainable params: 0
_________________________________________________________________
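
The counts in this summary follow from the layer definitions: each convolution filter has one weight per kernel position and input channel, plus one bias. A sketch that reproduces them, with helper names invented for the check:

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus biases of a 2D convolution layer."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

conv = conv2d_params(5, 5, 1, 20)        # 5x5 kernels on a 1-channel image
flat = 12 * 12 * 20                      # pooled feature maps, flattened
fc = dense_params(flat, 100)
out = dense_params(100, 10)
total = conv + fc + out

print(conv, flat, fc, out, total)
```

Compared with Part 1, the convolution and pooling stages shrink the flattened vector from 21952 to 2880 values, which is why the total parameter count drops even though the hidden layer is wider.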


In [None]:
# compile the model
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', 
              metrics=['accuracy'])

In [None]:
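# train the model, holding out one of k folds for validation in each pass
# note: the same model keeps training across folds (it is not re-initialized),
# so this amounts to ten single-epoch passes over rotating 90% subsets
k = 10
kf = KFold(n_splits=k, shuffle=True, random_state=1)
fold_acc = []
for train_index, val_index in kf.split(X_train, y_train):  # 10 folds
  Xtrain, X_val = X_train[train_index], X_train[val_index]
  ytrain, y_val = y_train[train_index], y_train[val_index]
  model.fit(Xtrain, ytrain, batch_size=32, epochs=1, verbose=2, shuffle=True)
  pred_values = model.predict(X_val, verbose=0)
  # classification accuracy on the held-out fold (R^2 is a regression metric)
  acc = accuracy_score(y_val.argmax(axis=1), pred_values.argmax(axis=1))
  fold_acc.append(acc)
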

1772/1772 - 6s - loss: 0.6027 - accuracy: 0.8050
1772/1772 - 4s - loss: 0.1717 - accuracy: 0.9483
1772/1772 - 4s - loss: 0.1079 - accuracy: 0.9668
1772/1772 - 4s - loss: 0.0812 - accuracy: 0.9747
1772/1772 - 4s - loss: 0.0659 - accuracy: 0.9797
1772/1772 - 4s - loss: 0.0549 - accuracy: 0.9827
1772/1772 - 4s - loss: 0.0494 - accuracy: 0.9845
1772/1772 - 4s - loss: 0.0444 - accuracy: 0.9860
1772/1772 - 4s - loss: 0.0407 - accuracy: 0.9879
1772/1772 - 4s - loss: 0.0376 - accuracy: 0.9887
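
The loop above keeps updating a single model, so samples held out in one fold have already been used for training in earlier folds. A conventional K-fold estimate instead trains a fresh model for every fold. The pure-Python sketch below shows that pattern with toy stand-ins: `kfold_indices`, `cross_validate`, `majority_label`, and `accuracy` are all hypothetical helpers, the "model" is just the most common training label, and the splitting scheme is a simplification rather than scikit-learn's.

```python
import random

def kfold_indices(n_samples, k, seed=1):
    """Shuffle the indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(build_and_fit, score, X, y, k=10):
    """Train a fresh model per fold; return one validation score per fold."""
    folds = kfold_indices(len(X), k)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        # a new model is built here, so no fold ever trains on its own validation data
        model = build_and_fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        scores.append(score(model, [X[j] for j in val_idx], [y[j] for j in val_idx]))
    return scores

def majority_label(X_train, y_train):
    """Toy 'model': remember the most common training label."""
    return max(set(y_train), key=y_train.count)

def accuracy(model, X_val, y_val):
    """Fraction of validation labels equal to the remembered label."""
    return sum(label == model for label in y_val) / len(y_val)

X = list(range(100))
y = [0] * 70 + [1] * 30
fold_scores = cross_validate(majority_label, accuracy, X, y, k=10)
mean_score = sum(fold_scores) / len(fold_scores)
print(fold_scores)
print(mean_score)
```

For the Keras models in this notebook, `build_and_fit` would need to construct, compile, and fit a new `Sequential` model on each call, at roughly ten times the training cost.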


In [None]:
# evaluate test accuracy
score = model.evaluate(X_test, y_test, verbose=0)
accuracy = 100*score[1]

# print test accuracy
print('Test accuracy: %.4f%%' % accuracy)

Test accuracy: 98.1286%


## PART 3

In [None]:
# build the model object
model = Sequential()

# CONV_1: convolutional layer with 20 filters of size 5x5 and ReLU activation
model.add(Conv2D(20, kernel_size=(5, 5), padding='valid', activation='relu', input_shape=input_shape))

# POOL_1: downsample each feature map by keeping the maximum of every 2x2 window
model.add(MaxPooling2D(pool_size=(2, 2)))

# CONV_2: second convolutional layer with 40 filters of size 5x5 and ReLU activation
model.add(Conv2D(40, kernel_size=(5, 5), padding='valid', activation='relu'))

# POOL_2: second 2x2 max-pooling stage
model.add(MaxPooling2D(pool_size=(2, 2)))

# flatten the feature maps into a single vector for the fully connected layers
model.add(Flatten())

# FC_1: fully connected hidden layer with 100 units
model.add(Dense(100, activation='relu'))

# output layer: softmax turns the 10 scores into class probabilities
model.add(Dense(10, activation='softmax'))

model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 24, 24, 20)        520       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 20)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 8, 8, 40)          20040     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 40)          0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 640)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 100)               64100     
_________________________________________________________________
dense_6 (Dense)              (None, 10)               
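
The output shapes in this summary can be traced layer by layer. The sketch below assumes stride-1 'valid' convolutions and non-overlapping 2x2 pooling, as in the model definition above; the helper names are invented for the trace, and the parameter counts are computed from the layer definitions rather than read from the summary.

```python
def conv_valid(size, kernel):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

def pool(size, window):
    """Spatial size after non-overlapping pooling."""
    return size // window

size = 28
trace = [size]
for step in ('conv', 'pool', 'conv', 'pool'):
    size = conv_valid(size, 5) if step == 'conv' else pool(size, 2)
    trace.append(size)

flat = size * size * 40                  # 40 feature maps after CONV_2
conv2_params = 5 * 5 * 20 * 40 + 40      # 20 input channels, 40 filters
fc_params = flat * 100 + 100
out_params = 100 * 10 + 10

print(trace, flat)
print(conv2_params, fc_params, out_params)
```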

In [None]:
# compile the model
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', 
              metrics=['accuracy'])

In [None]:
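# train the model, holding out one of k folds for validation in each pass
# note: the same model keeps training across folds (it is not re-initialized),
# so this amounts to ten single-epoch passes over rotating 90% subsets
k = 10
kf = KFold(n_splits=k, shuffle=True, random_state=1)
fold_acc = []
for train_index, val_index in kf.split(X_train, y_train):  # 10 folds
  Xtrain, X_val = X_train[train_index], X_train[val_index]
  ytrain, y_val = y_train[train_index], y_train[val_index]
  model.fit(Xtrain, ytrain, batch_size=32, epochs=1, verbose=2, shuffle=True)
  pred_values = model.predict(X_val, verbose=0)
  # classification accuracy on the held-out fold (R^2 is a regression metric)
  acc = accuracy_score(y_val.argmax(axis=1), pred_values.argmax(axis=1))
  fold_acc.append(acc)
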

1772/1772 - 5s - loss: 0.1313 - accuracy: 0.9596
1772/1772 - 5s - loss: 0.0420 - accuracy: 0.9869
1772/1772 - 5s - loss: 0.0307 - accuracy: 0.9907
1772/1772 - 5s - loss: 0.0256 - accuracy: 0.9923
1772/1772 - 5s - loss: 0.0212 - accuracy: 0.9938
1772/1772 - 5s - loss: 0.0183 - accuracy: 0.9946
1772/1772 - 5s - loss: 0.0166 - accuracy: 0.9954
1772/1772 - 5s - loss: 0.0138 - accuracy: 0.9962
1772/1772 - 5s - loss: 0.0130 - accuracy: 0.9962
1772/1772 - 5s - loss: 0.0120 - accuracy: 0.9968


In [None]:
# evaluate test accuracy
score = model.evaluate(X_test, y_test, verbose=0)
accuracy = 100*score[1]

# print test accuracy
print('Test accuracy: %.4f%%' % accuracy)

Test accuracy: 99.0286%
