# Neural Networks with TensorFlow

In this assignment, we are going to train a neural network on the Japanese MNIST dataset (Kuzushiji-MNIST, or KMNIST). It is composed of 70,000 images of handwritten Hiragana characters. The target variable has 10 different classes.

Each image is 28 by 28 pixels, but we will flatten the images to form a dataset composed of vectors of dimension (784, 1). The training process will then be similar to that for a structured (tabular) dataset.
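In this notebook the flattening itself happens inside the model, through Keras's `Flatten` layer. As a minimal NumPy sketch of the same reshaping (the `images` batch below is random stand-in data, not KMNIST):

```python
import numpy as np

# Hypothetical stand-in for a batch of 5 grayscale 28x28 images
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten every image into a 784-long row vector: shape (5, 784)
flat = images.reshape(images.shape[0], -1)

# A single image as a (784, 1) column vector
column = images[0].reshape(784, 1)

print(flat.shape, column.shape)  # (5, 784) (784, 1)
```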

<img src='https://drive.google.com/uc?id=16TqEl9ESfXYbUpVafXD6h5UpJYGKfMxE' width="500" height="200">

Your goal is to run at least 3 experiments and get a model that can achieve 80% accuracy with little overfitting on this dataset.

Some of the code has already been defined for you. You only need to add your code in the sections marked with **TODO**. Some assert statements have been added to verify that the expected outputs are correct. If one of them throws an error, your implementation is not behaving as expected.

# 1. Import Required Packages

[1.1] We are going to use numpy, matplotlib and google.colab packages

In [None]:
from google.colab import drive
import numpy as np
import matplotlib.pyplot as plt

# 2. Download Dataset

We will store the dataset into your personal Google Drive.


[2.1] Mount Google Drive

In [None]:
drive.mount('/content/gdrive')

[2.2] Create a folder called `DL_ASG_1` on your Google Drive at the root level

In [None]:
! mkdir -p /content/gdrive/MyDrive/DL_ASG_1

[2.3] Navigate to this folder

In [None]:
%cd /content/gdrive/MyDrive/DL_ASG_1

[2.4] Download the dataset files to your Google Drive if required

In [None]:
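import os.path

import requests
from tqdm import tqdm

def download_file(url):
    # save the file in the current folder, named after the last URL segment
    path = url.split('/')[-1]
    if os.path.isfile(path):
        print(f"{path} already exists")
        return
    r = requests.get(url, stream=True)
    r.raise_for_status()  # stop on HTTP errors instead of saving an error page
    # content-length may be missing; tqdm then shows progress without a total
    total_length = int(r.headers.get('content-length', 0))
    print('Downloading {} - {:.1f} MB'.format(path, total_length / (1024 * 1024)))
    with open(path, 'wb') as f:
        for chunk in tqdm(r.iter_content(chunk_size=1024),
                          total=(total_length // 1024 + 1) if total_length else None,
                          unit="KB"):
            if chunk:
                f.write(chunk)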

url_list = [
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-imgs.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-labels.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-imgs.npz',
    'http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-labels.npz'
]

for url in url_list:
    download_file(url)

[2.5] List the content of the folder and confirm the files have been downloaded properly

In [None]:
! ls

# 3. Load Data

[3.1] Import the required modules from Tensorflow

In [None]:
import tensorflow
from tensorflow import keras
from keras.models import Sequential, load_model
from keras.layers import Dense, Dropout, Flatten
from keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint

# extra packages used

# plot model architecture
from keras.utils import plot_model

# confusion matrix and per-class metrics
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns  # heatmap for confusion matrix readability

# convolutional neural network layers
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization

# synthetic data creation (data augmentation)
from keras.preprocessing.image import ImageDataGenerator

[3.2] **TODO** Create 2 variables called `img_height` and `img_width` that will both take the value 28

In [None]:
# TODO (Students need to fill this section)
img_height = 28
img_width = 28

[3.3] Create a function that loads a .npz file using numpy and returns the content of the `arr_0` key

In [None]:
def load(f):
    return np.load(f)['arr_0']
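`np.savez` stores arrays passed without a name under the default keys `arr_0`, `arr_1`, ..., which is the key the `load` helper reads. A small round-trip sketch using a hypothetical temporary file:

```python
import os
import tempfile

import numpy as np

def load(f):
    # same helper as above: return the array stored under the default key
    return np.load(f)['arr_0']

# np.savez names positional (unnamed) arrays arr_0, arr_1, ...
labels = np.array([0, 3, 9], dtype=np.uint8)
path = os.path.join(tempfile.mkdtemp(), "demo.npz")
np.savez(path, labels)

with np.load(path) as archive:
    print(archive.files)  # ['arr_0']

restored = load(path)
print(restored)  # [0 3 9]
```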

[3.4] **TODO** Load the 4 files saved on your Google Drive into their respective variables: x_train, y_train, x_test and y_test

In [None]:
# TODO (Students need to fill this section)
x_train = load('kmnist-train-imgs.npz')
x_test = load('kmnist-test-imgs.npz')
y_train = load('kmnist-train-labels.npz')
y_test = load('kmnist-test-labels.npz')

[3.5] **TODO** Using matplotlib display the first image from the train set and its target value

In [None]:
# TODO (Students need to fill this section)

plt.imshow(x_train[0], cmap='gray')  # grayscale instead of matplotlib's default colour map
plt.show()
print('Label', y_train[0])

# 4. Prepare Data

[4.1] **TODO** Reshape the images from the training and testing set to have the channel dimension last. The dimensions should be: (row_number, height, width, channel)

In [None]:
# TODO
x_train = np.reshape(x_train,(x_train.shape[0],img_height,img_width,1))
x_test = np.reshape(x_test,(x_test.shape[0],img_height,img_width,1))

[4.2] **TODO** Cast `x_train` and `x_test` into `float32` decimals

In [None]:
# TODO (Students need to fill this section)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

[4.3] **TODO** Normalise the images of the training and testing sets. Originally each image contains pixels with values ranging from 0 to 255. After normalisation, the new value range should be from 0 to 1.

In [None]:
# TODO (Students need to fill this section)

# divide by the maximum pixel value to map [0, 255] onto [0, 1]
x_train = x_train / 255
x_test = x_test / 255
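Strictly speaking, dividing by 255 is min-max scaling (normalisation) rather than standardisation, which would subtract the mean and divide by the standard deviation. A minimal sketch of steps 4.1 to 4.3 on random stand-in data (`raw` is hypothetical):

```python
import numpy as np

# Hypothetical uint8 images standing in for x_train (values 0..255)
rng = np.random.default_rng(1)
raw = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
raw[0, 0, 0], raw[0, 0, 1] = 0, 255  # make sure both extremes appear

# add the channel axis, cast to float32, then scale by the maximum value 255
x = raw.reshape(-1, 28, 28, 1).astype('float32') / 255

print(x.shape, x.dtype, x.min(), x.max())  # (4, 28, 28, 1) float32 0.0 1.0
```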

[4.4] **TODO** Create a variable called `num_classes` that will take the value 10 which corresponds to the number of classes for the target variable

In [None]:
# TODO (Students need to fill this section)
num_classes = 10

Get the first occurrence of each character class

In [None]:
# create an array of zeros to hold one index per class
list_of_positions = np.zeros((num_classes,), dtype=int)

# find the first occurrence of each character class by its integer label
for i in range(num_classes):  # labels 0-9 = 10 unique classes
    result = np.where(y_train == i)  # indices of all samples with label i
    list_of_positions[i] = result[0][0]  # keep the index of the first one

# create a 5x2 figure for showing the images
fig = plt.figure(figsize=(num_classes, 7))
columns = 5
rows = 2

# create subplots
for i in range(num_classes):
    fig.add_subplot(rows, columns, i + 1)
    plt.imshow(x_train[list_of_positions[i]].squeeze(), cmap=plt.get_cmap('gray'))  # black and white
    plt.axis('off')  # turn off axis
    plt.title(i)  # the title is just i, since that's how the list was built
plt.show()
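The loop above can also be written without an explicit loop: `np.unique` with `return_index=True` returns the index of the first occurrence of each distinct label. A sketch on hypothetical integer labels (it relies on the labels still being integers, i.e. before the one-hot conversion in 4.5):

```python
import numpy as np

# Hypothetical integer labels (before one-hot encoding)
y = np.array([3, 0, 1, 3, 2, 0, 1, 2])

# return_index=True gives the index of the first occurrence of each unique value
classes, first_idx = np.unique(y, return_index=True)
print(classes)    # [0 1 2 3]
print(first_idx)  # [1 2 4 0]
```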

[4.5] **TODO** Convert the target variable for the training and testing sets to a binary class matrix of dimension (rows, num_classes).

For example:
- class 0 will become [1, 0, 0, 0, 0, 0, 0, 0, 0, 0] 
- class 1 will become [0, 1, 0, 0, 0, 0, 0, 0, 0, 0] 
- class 5 will become [0, 0, 0, 0, 0, 1, 0, 0, 0, 0] 
- class 9 will become [0, 0, 0, 0, 0, 0, 0, 0, 0, 1] 

In [None]:
# TODO (Students need to fill this section)

# convert labels to one-hot encoded format using np.eye
y_train = np.eye(num_classes)[y_train]
y_test = np.eye(num_classes)[y_test]
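Indexing the identity matrix with the integer labels picks out one row per sample, and row `k` of `np.eye(num_classes)` is exactly the one-hot vector for class `k`. A small sketch with hypothetical labels; `keras.utils.to_categorical(y, num_classes)` produces an equivalent matrix:

```python
import numpy as np

num_classes = 10
y = np.array([0, 1, 5, 9])

# row k of the identity matrix is the one-hot vector for class k,
# so fancy-indexing np.eye with the labels one-hot encodes them
one_hot = np.eye(num_classes)[y]

print(one_hot[2])                  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
print(np.argmax(one_hot, axis=1))  # [0 1 5 9]
```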

# 5. Define Neural Networks Architecture

[5.1] Set the seed for Tensorflow Keras



In [None]:
keras.utils.set_random_seed(1)

[5.2] **TODO** Define the architecture of your Neural Networks and save it into a variable called `model`

In [None]:
# TODO (Students need to fill this section)

model_naive = Sequential()  # a linear stack of layers

# Input layer: 28x28x1 images flattened into 784-long vectors
model_naive.add(keras.Input(shape=(28, 28, 1)))
model_naive.add(Flatten())

# 1st hidden layer
model_naive.add(Dense(512, activation="relu"))

# 2nd hidden layer
model_naive.add(Dense(512, activation="relu"))

# Final layer: softmax turns the 10 outputs into a probability distribution (values sum to 1)
model_naive.add(Dense(num_classes, activation="softmax"))

[5.3] **TODO** Print the summary of your model

In [None]:
# TODO (Students need to fill this section)
model_naive.summary()
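The parameter counts that `summary()` reports can be checked by hand: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, and `Flatten` has none. A small sketch (the helper names are hypothetical):

```python
def dense_params(n_in, n_out):
    # one weight per input-output pair plus one bias per output unit
    return n_in * n_out + n_out

def mlp_params(n_inputs=784, hidden=(512, 512), n_classes=10):
    # layer sizes from the flattened input through the hidden layers to the output
    sizes = [n_inputs, *hidden, n_classes]
    return sum(dense_params(a, b) for a, b in zip(sizes[:-1], sizes[1:]))

print(mlp_params())                   # 669706
print(mlp_params(hidden=(512,) * 4))  # 1195018
```

Each extra 512-unit hidden layer, as in Experiment 2, adds 262,656 parameters.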

In [None]:
# plot model_naive
plot_model(model_naive, to_file='model_naive_plot.png', show_shapes=True, show_layer_names=True)

# 6. Train Neural Networks

[6.1] **TODO** Create 2 variables called `batch_size` and `epochs` that will respectively take the values 128 and 500

In [None]:
# TODO (Students need to fill this section)
batch_size = 128
epochs = 500

[6.2] **TODO** Compile your model with the appropriate loss function, the optimiser of your choice and the accuracy metric

In [None]:
# TODO (Students need to fill this section)
# try Adam and SGD
# try a list of learning rates as well (hyperparameter tuning)
model_naive.compile(loss='categorical_crossentropy',
                    optimizer=tensorflow.keras.optimizers.Adam(learning_rate=1e-2),
                    metrics=['accuracy'])
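With one-hot targets and a softmax output, `categorical_crossentropy` is the matching loss: for each sample it is the negative log of the probability assigned to the true class, averaged over the batch. A simplified NumPy sketch (Keras similarly clips predictions by a small epsilon; the value `1e-7` here is an assumption of this sketch). If the labels were kept as integers, `sparse_categorical_crossentropy` would be the equivalent loss.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # mean over samples of -sum_k y_k * log(p_k); clipping avoids log(0)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)
y_pred = np.array([[0.1, 0.8, 0.1], [0.5, 0.25, 0.25]])

# average of -log(0.8) and -log(0.5)
print(categorical_crossentropy(y_true, y_pred))
```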

[6.3] **TODO** Fit your model using the number of epochs defined. Save the output to a variable called `history`. You can set up some callbacks if you wish.
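The `ReduceLROnPlateau` callback multiplies the learning rate by `factor` once the monitored value has stopped improving for `patience` epochs, never going below `min_lr`. A simplified pure-Python simulation of that rule with the settings used in this notebook (it ignores the callback's `min_delta` and `cooldown` options; `simulate_reduce_lr` is a hypothetical helper):

```python
def simulate_reduce_lr(val_losses, lr=1e-2, factor=0.2, patience=5, min_lr=1e-5):
    best = float("inf")
    wait = 0
    lrs = []
    for loss in val_losses:
        if loss < best:
            # improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # plateau: shrink the learning rate, but not below min_lr
                lr = max(lr * factor, min_lr)
                wait = 0
        lrs.append(lr)
    return lrs

# two improving epochs, then five epochs without improvement
print(simulate_reduce_lr([1.0, 0.9] + [0.95] * 5))
```

`EarlyStopping` uses the same kind of counter, but stops training instead of lowering the learning rate, which is why its patience (10) is set larger than the one used here (5).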

In [None]:
# TODO (Students need to fill this section)

# adding callbacks

# reduce learning rate if val_loss reduction hits a limit
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                              patience=5, min_lr=0.00001)

# implement early stopping to limit time spent training and over fitting
earlystoppings = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=10) # patience larger than reduceLR

# save the best model (highest validation accuracy); it is reloaded in later steps
modelcheckpoint = ModelCheckpoint('best_model_naive.h5', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)

# train and keep the returned History object (per-epoch loss and accuracy)
# note: the test set doubles as validation data here, so the test accuracy is optimistic
history_naive = model_naive.fit(x_train, y_train,
                                batch_size=batch_size, epochs=epochs, verbose=1,
                                validation_data=(x_test, y_test),
                                callbacks=[reduce_lr, earlystoppings, modelcheckpoint])
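Passing `(x_test, y_test)` as `validation_data` means the test set also drives `EarlyStopping`, `ReduceLROnPlateau` and `ModelCheckpoint`, so the reported test accuracy is optimistically biased. One option is to hold out part of the training set instead; a minimal NumPy sketch (`train_val_split` is a hypothetical helper and the arrays are stand-ins):

```python
import numpy as np

def train_val_split(x, y, val_fraction=0.1, seed=1):
    # shuffle the indices once; the first n_val of them become the validation set
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return x[train_idx], y[train_idx], x[val_idx], y[val_idx]

# Hypothetical stand-ins for x_train / y_train (one-hot labels)
x = np.arange(100).reshape(100, 1)
y = np.eye(10)[np.arange(100) % 10]

x_tr, y_tr, x_val, y_val = train_val_split(x, y)
print(x_tr.shape, x_val.shape)  # (90, 1) (10, 1)
```

The held-out arrays would then replace `(x_test, y_test)` in `validation_data`, keeping the test set for the final evaluation only. Keras's `fit` also accepts `validation_split=0.1`, which takes the last 10% of the samples (before any shuffling) as validation data.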

[6.4] Save the weights of your model

In [None]:
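# save the final-epoch weights (the best epoch was saved separately by ModelCheckpoint)
os.makedirs('checkpoints', exist_ok=True)
model_naive.save_weights('./checkpoints/model_naive.weights.h5')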

# load the saved model
model_naive = load_model('best_model_naive.h5')

# 7. Analyse Results

Create helper functions to display the results

In [None]:
# functions to display results

def print_results(model, x_train, y_train, x_test, y_test):
    # evaluate() returns [loss, accuracy] because the model was
    # compiled with metrics=['accuracy']
    score_train = model.evaluate(x_train, y_train)
    print('train loss:', score_train[0])
    print('train accuracy:', score_train[1])

    score_test = model.evaluate(x_test, y_test)
    print('test loss:', score_test[0])
    print('test accuracy:', score_test[1])

def plot_results(history):
    # history is the History object returned by fit() (also stored as model.history);
    # plot training vs validation (here: test) accuracy and loss per epoch
    plt.plot(history.history['accuracy'])
    plt.plot(history.history['val_accuracy'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper right')
    plt.show()

    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()

def create_confusion_matrix(model, x_test, y_test):
    # run prediction on x_test and compare to y_test (true labels)
    predictions = model.predict(x_test)
    y_pred_argmax = np.argmax(predictions, axis=1)
    y_test_argmax = np.argmax(y_test, axis=1)

    # rows = true classes, columns = predicted classes
    conf_matrix = confusion_matrix(y_test_argmax, y_pred_argmax)

    # plot without transposing so true classes run down the y-axis and predictions along the x-axis;
    # fmt='d' prints the counts as integers
    sns.heatmap(conf_matrix, square=True, annot=True, fmt='d', cbar=False, cmap=plt.cm.Blues)
    plt.xlabel('Predicted Values')
    plt.ylabel('True Values')
    plt.show()
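`sklearn.metrics.confusion_matrix(y_true, y_pred)` puts true classes on the rows and predicted classes on the columns, and `sns.heatmap` draws rows along the y-axis. A minimal NumPy sketch of the same counting (the `confusion` helper and the labels are hypothetical) makes the orientation explicit:

```python
import numpy as np

def confusion(y_true, y_pred, num_classes):
    # rows are true classes, columns are predicted classes,
    # matching sklearn.metrics.confusion_matrix(y_true, y_pred)
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

y_true = np.array([0, 0, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 2, 0])
cm = confusion(y_true, y_pred, 3)
print(cm)
```

Here `cm[0, 1] == 1` records the one sample of true class 0 that was predicted as class 1.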

[7.1] **TODO** Display the performance of your model on the training and testing sets

In [None]:
# TODO
print_results(model_naive,x_train, y_train,x_test, y_test)

[7.2] **TODO** Plot the learning curve of your model

In [None]:
# TODO (Students need to fill this section)
plot_results(history_naive)

[7.3] **TODO** Display the confusion matrix on the testing set predictions

In [None]:
# TODO (Students need to fill this section)
create_confusion_matrix(model_naive,x_test,y_test)

# Experiment 2: Adding layers from 2 up to 10

4 hidden layers

In [None]:
%%script echo skipping=1
# Same model as naive but with 4 hidden layers
model_naive_4_layers = Sequential()  # a linear stack of layers

# Input layer  
model_naive_4_layers.add(keras.Input(shape=(28, 28, 1)))
model_naive_4_layers.add(Flatten())

# 1st hidden layer
model_naive_4_layers.add(Dense(512, activation="relu"))
#  2nd hidden layer
model_naive_4_layers.add(Dense(512, activation="relu"))
# 3rd hidden layer
model_naive_4_layers.add(Dense(512, activation="relu"))
#  4th hidden layer
model_naive_4_layers.add(Dense(512, activation="relu"))

# Final  layer
model_naive_4_layers.add(Dense(num_classes,activation="softmax"))

# compile model
model_naive_4_layers.compile(loss='categorical_crossentropy', optimizer=tensorflow.keras.optimizers.Adam(learning_rate=1e-2),metrics=['accuracy'])

# save the best model (loaded in later steps)
modelcheckpoint = ModelCheckpoint('best_model_naive_4_layers.h5', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)

# train #
model_naive_4_layers.fit(x_train, y_train,
          batch_size=batch_size, epochs=epochs, verbose=1,validation_data=(x_test, y_test),callbacks=[reduce_lr,earlystoppings,modelcheckpoint])

# save history
history_naive_4_layers = model_naive_4_layers.history

# print results
print_results(model_naive_4_layers,x_train, y_train,x_test, y_test)

plot_results(history_naive_4_layers)

In [None]:
%%script echo skipping=1
# show model
model_naive_4_layers.summary()

6 hidden layers

In [None]:
%%script echo skipping=1
# Same model as naive but with 6 hidden layers
model_naive_6_layers = Sequential()  # a linear stack of layers

# Input layer  
model_naive_6_layers.add(keras.Input(shape=(28, 28, 1)))
model_naive_6_layers.add(Flatten())

# 1st hidden layer
model_naive_6_layers.add(Dense(512, activation="relu"))
#  2nd hidden layer
model_naive_6_layers.add(Dense(512, activation="relu"))
# 3rd hidden layer
model_naive_6_layers.add(Dense(512, activation="relu"))
#  4th hidden layer
model_naive_6_layers.add(Dense(512, activation="relu"))
# 5th hidden layer
model_naive_6_layers.add(Dense(512, activation="relu"))
#  6th hidden layer
model_naive_6_layers.add(Dense(512, activation="relu"))
# Final  layer
model_naive_6_layers.add(Dense(num_classes,activation="softmax"))

# compile model
model_naive_6_layers.compile(loss='categorical_crossentropy', optimizer=tensorflow.keras.optimizers.Adam(learning_rate=1e-2),metrics=['accuracy'])

# save the best model (loaded in later steps)
modelcheckpoint = ModelCheckpoint('best_model_naive_6_layers.h5', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)

# train #
model_naive_6_layers.fit(x_train, y_train,
          batch_size=batch_size, epochs=epochs, verbose=1,validation_data=(x_test, y_test),callbacks=[reduce_lr,earlystoppings,modelcheckpoint])

# save history
history_naive_6_layers = model_naive_6_layers.history

# print results
print_results(model_naive_6_layers,x_train, y_train,x_test, y_test)

plot_results(history_naive_6_layers)

In [None]:
%%script echo skipping=1
# show model
model_naive_6_layers.summary()

8 hidden layers

In [None]:
%%script echo skipping=1
# Same model as naive but with 8 hidden layers
model_naive_8_layers = Sequential()  # a linear stack of layers

# Input layer  
model_naive_8_layers.add(keras.Input(shape=(28, 28, 1)))
model_naive_8_layers.add(Flatten())

# 1st hidden layer
model_naive_8_layers.add(Dense(512, activation="relu"))
#  2nd hidden layer
model_naive_8_layers.add(Dense(512, activation="relu"))
# 3rd hidden layer
model_naive_8_layers.add(Dense(512, activation="relu"))
#  4th hidden layer
model_naive_8_layers.add(Dense(512, activation="relu"))
# 5th hidden layer
model_naive_8_layers.add(Dense(512, activation="relu"))
#  6th hidden layer
model_naive_8_layers.add(Dense(512, activation="relu"))
# 7th hidden layer
model_naive_8_layers.add(Dense(512, activation="relu"))
#  8th hidden layer
model_naive_8_layers.add(Dense(512, activation="relu"))
# Final  layer
model_naive_8_layers.add(Dense(num_classes,activation="softmax"))

# compile model
model_naive_8_layers.compile(loss='categorical_crossentropy', optimizer=tensorflow.keras.optimizers.Adam(learning_rate=1e-2),metrics=['accuracy'])

# save the best model (loaded in later steps)
modelcheckpoint = ModelCheckpoint('best_model_naive_8_layers.h5', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)

# train #
model_naive_8_layers.fit(x_train, y_train,
          batch_size=batch_size, epochs=epochs, verbose=1,validation_data=(x_test, y_test),callbacks=[reduce_lr,earlystoppings,modelcheckpoint])

# save history
history_naive_8_layers = model_naive_8_layers.history

# print results
print_results(model_naive_8_layers,x_train, y_train,x_test, y_test)

plot_results(history_naive_8_layers)

In [None]:
%%script echo skipping=1
# show model
model_naive_8_layers.summary()

10 hidden layers

In [None]:
%%script echo skipping=1
# Same model as naive but with 10 hidden layers
model_naive_10_layers = Sequential()  # a linear stack of layers

# Input layer  
model_naive_10_layers.add(keras.Input(shape=(28, 28, 1)))
model_naive_10_layers.add(Flatten())

# 1st hidden layer
model_naive_10_layers.add(Dense(512, activation="relu"))
#  2nd hidden layer
model_naive_10_layers.add(Dense(512, activation="relu"))
# 3rd hidden layer
model_naive_10_layers.add(Dense(512, activation="relu"))
#  4th hidden layer
model_naive_10_layers.add(Dense(512, activation="relu"))
# 5th hidden layer
model_naive_10_layers.add(Dense(512, activation="relu"))
#  6th hidden layer
model_naive_10_layers.add(Dense(512, activation="relu"))
# 7th hidden layer
model_naive_10_layers.add(Dense(512, activation="relu"))
#  8th hidden layer
model_naive_10_layers.add(Dense(512, activation="relu"))
# 9th hidden layer
model_naive_10_layers.add(Dense(512, activation="relu"))
#  10th hidden layer
model_naive_10_layers.add(Dense(512, activation="relu"))
# Final  layer
model_naive_10_layers.add(Dense(num_classes,activation="softmax"))

# compile model
model_naive_10_layers.compile(loss='categorical_crossentropy', optimizer=tensorflow.keras.optimizers.Adam(learning_rate=1e-2),metrics=['accuracy'])

# save the best model (loaded in later steps)
modelcheckpoint = ModelCheckpoint('best_model_naive_10_layers.h5', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)

# train #
model_naive_10_layers.fit(x_train, y_train,
          batch_size=batch_size, epochs=epochs, verbose=1,validation_data=(x_test, y_test),callbacks=[reduce_lr,earlystoppings,modelcheckpoint])

# save history
history_naive_10_layers = model_naive_10_layers.history

# print results
print_results(model_naive_10_layers,x_train, y_train,x_test, y_test)

plot_results(history_naive_10_layers)

In [None]:
%%script echo skipping=1
# show model
model_naive_10_layers.summary()

# Experiment 3: Adding dropout to naive model

In [None]:
# add 20% dropout to the naive model
model_naive_dropout = Sequential()  # a linear stack of layers

# Input layer  
model_naive_dropout.add(keras.Input(shape=(28, 28, 1)))
model_naive_dropout.add(Flatten())

# 1st hidden layer
model_naive_dropout.add(Dense(512, activation="relu"))
model_naive_dropout.add(Dropout(0.2)) # dropout 20%

#  2nd hidden layer
model_naive_dropout.add(Dense(512, activation="relu"))
model_naive_dropout.add(Dropout(0.2)) # dropout 20%

# Final  layer
model_naive_dropout.add(Dense(num_classes,activation="softmax"))

# Compile
model_naive_dropout.compile(loss='categorical_crossentropy', optimizer=tensorflow.keras.optimizers.Adam(learning_rate=1e-2),metrics=['accuracy'])

# use a separate file for this model's best checkpoint
modelcheckpoint = ModelCheckpoint('best_model_naive_dropout.h5', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)

# fit model and keep the returned History object
history_naive_dropout = model_naive_dropout.fit(x_train, y_train,
                                                batch_size=batch_size, epochs=epochs, verbose=1,
                                                validation_data=(x_test, y_test),
                                                callbacks=[reduce_lr, earlystoppings, modelcheckpoint])

# save weights to their own file so the naive model's weights are not overwritten
os.makedirs('checkpoints', exist_ok=True)
model_naive_dropout.save_weights('./checkpoints/model_naive_dropout.weights.h5')


In [None]:
# get summary
model_naive_dropout.summary()

# plot model_naive_dropout
plot_model(model_naive_dropout, to_file='model_dropout_plot.png', show_shapes=True, show_layer_names=True)
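Keras's `Dropout` layer is only active during training: it zeroes each input with probability `rate` and scales the surviving inputs by `1 / (1 - rate)`, so the expected activation stays the same; at inference it passes inputs through unchanged. A minimal NumPy sketch of this "inverted dropout" (`inverted_dropout` is a hypothetical helper, not the Keras implementation):

```python
import numpy as np

def inverted_dropout(x, rate, training, rng):
    # at inference the layer is the identity
    if not training:
        return x
    # keep each unit with probability 1 - rate and scale survivors by
    # 1 / (1 - rate) so the expected activation is unchanged
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
activations = np.ones((1000, 512))
out = inverted_dropout(activations, rate=0.2, training=True, rng=rng)

print(np.unique(out))        # the only possible values: 0 and 1.25
print(round(out.mean(), 2))  # close to 1.0, the mean of the input
```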

In [None]:
# load the saved model
model_naive_dropout = load_model('best_model_naive_dropout.h5')

#print results
print_results(model_naive_dropout,x_train, y_train,x_test, y_test)

#plot results
plot_results(history_naive_dropout)

# confusion matrix
create_confusion_matrix(model_naive_dropout,x_test,y_test)

# Experiment 4: Convolutional Neural Network


In [None]:
# since we are working with images, I'm trying a Convolutional Neural Network
# kernels are 3x3 as they are cheap to compute, and stacking several small
# kernels adds more non-linearity, which may help
# the activation function is ReLU except for the final (softmax) layer
# based on an MNIST model that achieved 97% accuracy
# batch normalisation to stabilise training and hopefully limit overfitting (usage still to be justified)


In [None]:
# model architecture
# no initial flattening: the convolutions operate on each 28x28x1 image before it is flattened
model_cnn_1 = Sequential()

model_cnn_1.add(Conv2D(filters=64, kernel_size=(3, 3), activation="relu", input_shape=(28, 28, 1)))  # image input + first 3x3 convolution
model_cnn_1.add(Conv2D(filters=64, kernel_size=(3, 3), activation="relu"))  # second 3x3 convolution

model_cnn_1.add(MaxPooling2D(pool_size=(2, 2)))  # max pooling halves the feature map, reducing downstream parameters
model_cnn_1.add(BatchNormalization())  # batch normalisation to hopefully improve training speed and reduce overfitting
model_cnn_1.add(Conv2D(filters=128, kernel_size=(3, 3), activation="relu"))  # first of two convolutions with 128 filters
model_cnn_1.add(Conv2D(filters=128, kernel_size=(3, 3), activation="relu"))

model_cnn_1.add(MaxPooling2D(pool_size=(2, 2)))  # max pool again
model_cnn_1.add(BatchNormalization())
model_cnn_1.add(Conv2D(filters=256, kernel_size=(3, 3), activation="relu"))  # last convolution, 256 filters

model_cnn_1.add(MaxPooling2D(pool_size=(2, 2)))
# flatten for the dense layers
model_cnn_1.add(Flatten())
model_cnn_1.add(BatchNormalization())
model_cnn_1.add(Dense(512, activation="relu"))
# final layer for prediction using softmax
model_cnn_1.add(Dense(num_classes, activation="softmax"))

# compile model (Adam with its default learning rate)
model_cnn_1.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

# save the best epoch to its own file
modelcheckpoint = ModelCheckpoint('best_model_cnn_1.h5', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)

# train model and keep the returned History object
history_cnn_1 = model_cnn_1.fit(x_train, y_train, batch_size=batch_size, validation_data=(x_test, y_test),
                                epochs=epochs, verbose=1, callbacks=[reduce_lr, earlystoppings, modelcheckpoint])
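Because every `Conv2D` here uses the default `'valid'` padding and every `MaxPooling2D` uses a 2x2 window with stride 2, the spatial size shrinks at each step. A small sketch of that arithmetic for a 28x28 input (the helper names are hypothetical; `BatchNormalization` does not change the shape, so it is skipped):

```python
def conv_out(size, kernel=3, stride=1):
    # 'valid' padding (the Conv2D default): no zero padding around the border
    return (size - kernel) // stride + 1

def pool_out(size, pool=2):
    # MaxPooling2D defaults: stride equal to the pool size, 'valid' padding
    return (size - pool) // pool + 1

s = 28
for layer in ["conv", "conv", "pool", "conv", "conv", "pool", "conv", "pool"]:
    s = conv_out(s) if layer == "conv" else pool_out(s)
    print(layer, s)  # sizes: 26, 24, 12, 10, 8, 4, 2, 1

print("flattened features:", s * s * 256)
```

The feature map reaches 1x1 just before `Flatten`, so the dense layers see 256 features; adding another 3x3 `'valid'` convolution at that point would not fit.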

In [None]:
# plot cnn model architecture

# visualise model
!pip install visualkeras
from PIL import ImageFont
import visualkeras

model_cnn_1.summary()

visualkeras.layered_view(model_cnn_1, legend=True) 

In [None]:
#keras plot model
plot_model(model_cnn_1, to_file='model_cnn_1.png', show_shapes=True, show_layer_names=True)

In [None]:
# load model
model_cnn_1 = load_model('best_model_cnn_1.h5')

print_results(model_cnn_1,x_train, y_train,x_test, y_test)

# cnn plot results
plot_results(history_cnn_1)

# cnn confusion matrix
create_confusion_matrix(model_cnn_1,x_test,y_test)

# Synthetic data generator (data augmentation) for the CNN


In [None]:
# Data augmentation

# augment the training data to reduce overfitting
# rotation: some people may write at a slant
# zoom: characters may be written larger or smaller than average
# shifting: the character may not be centred in the image
# no flipping: it is pointless here, because written characters have a fixed orientation

datagen = ImageDataGenerator(
        rotation_range=15,       # random rotations of up to 15 degrees either way
        zoom_range=0.1,          # random zoom in [0.9, 1.1]
        width_shift_range=0.1,   # random horizontal shift (fraction of total width)
        height_shift_range=0.1,  # random vertical shift (fraction of total height)
        shear_range=0.3          # random shear (shear angle in degrees, counter-clockwise)
        )

# datagen.fit() is only needed for featurewise statistics, which are not used here
#datagen.fit(x_train)
train_data_aug = datagen.flow(x_train, y_train, batch_size=batch_size)
#test_data_aug = datagen.flow(x_test, y_test, batch_size=batch_size)
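To make the shift settings concrete: with `width_shift_range=0.1` on a 28-pixel-wide image, horizontal shifts go up to 10% of the width, roughly 2.8 pixels either way. The sketch below applies a single fixed translation with a hypothetical `shift_image` helper; unlike `ImageDataGenerator`, whose default `fill_mode` is `'nearest'`, it fills uncovered pixels with zeros and only shifts by whole pixels.

```python
import numpy as np

def shift_image(img, dx, dy):
    # translate a 2D image by dx pixels (right) and dy pixels (down),
    # filling the uncovered border with zeros
    out = np.zeros_like(img)
    h, w = img.shape
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    top, left = max(0, dy), max(0, dx)
    out[top:top + src.shape[0], left:left + src.shape[1]] = src
    return out

img = np.zeros((28, 28))
img[10, 10] = 1.0
shifted = shift_image(img, dx=2, dy=-1)  # 2 pixels right, 1 pixel up
print(np.argwhere(shifted))              # row 9, column 12
```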

In [None]:
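# train a fresh model with the same architecture as model_cnn_1
# (a plain assignment would keep training the already-trained model_cnn_1 object)
from keras.models import clone_model
model_cnn_data_aug = clone_model(model_cnn_1)
model_cnn_data_aug.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

modelcheckpoint = ModelCheckpoint('best_model_cnn_data_aug.h5', monitor='val_accuracy', mode='max', verbose=1, save_best_only=True)

# the iterator already yields batches of batch_size, so batch_size is not passed to fit()
history_cnn_data_aug = model_cnn_data_aug.fit(train_data_aug, validation_data=(x_test, y_test), epochs=100,
                                              verbose=1, callbacks=[reduce_lr, earlystoppings, modelcheckpoint])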

In [None]:
# load the best checkpoint
model_cnn_data_aug = load_model('best_model_cnn_data_aug.h5')

# evaluate on the original (non-augmented) training and test sets so the
# scores are comparable with the other experiments
print_results(model_cnn_data_aug, x_train, y_train, x_test, y_test)  # test accuracy was about 93% in the author's run

In [None]:
# plot results for data augmentation
plot_results(history_cnn_data_aug)

In [None]:
# data augmentation confusion matrix
create_confusion_matrix(model_cnn_data_aug,x_test,y_test)

In [None]:
from sklearn.metrics import classification_report

def predict_show_classes(model, x_val, y_val):
    # predicted class = index of the highest softmax probability
    y_predict = np.argmax(model.predict(x_val), axis=-1)

    # convert the one-hot true labels back to class indices
    y_true = np.argmax(y_val, axis=1)
    correct = np.nonzero(y_predict == y_true)[0]
    incorrect = np.nonzero(y_predict != y_true)[0]

    # print out results
    print("Correctly predicted samples:", correct.shape[0])
    print("Incorrectly predicted samples:", incorrect.shape[0])

    # per-class precision, recall and F1-score
    target_names = ["Class {}".format(i) for i in range(num_classes)]
    print(classification_report(y_true, y_predict, target_names=target_names))
    return correct, incorrect

In [None]:
# import classification_report from sklearn (used by predict_show_classes)
from sklearn.metrics import classification_report
correct, incorrect = predict_show_classes(model_naive, x_test,y_test)
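`classification_report` prints per-class precision, recall, F1-score and support. Precision and recall can be read straight off a confusion matrix (rows = true class, columns = predicted class); a NumPy sketch with a made-up 3-class matrix (`per_class_precision_recall` is a hypothetical helper):

```python
import numpy as np

def per_class_precision_recall(cm):
    # cm[i, j] = number of samples of true class i predicted as class j
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)  # column sums: how often each class was predicted
    actual = cm.sum(axis=1)     # row sums: how many samples each class has (the "support")
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall

cm = np.array([[1, 1, 0],
               [0, 1, 0],
               [1, 0, 2]])
precision, recall = per_class_precision_recall(cm)
print(precision)  # 0.5, 0.5 and 1.0
print(recall)     # 0.5, 1.0 and about 0.667
```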