# Using Machine Learning Tools 2024, Assignment 3

## Sign Language Image Classification using Deep Learning

## Overview

In this assignment you will implement different deep learning networks to classify images of hands in poses that correspond to letters in American Sign Language. The dataset is contained in the assignment zip file, along with some images and a text file describing the dataset. It is similar in many ways to other MNIST datasets.

The main aims of the assignment are:

 - To implement and train different types of deep learning network;
 
 - To systematically optimise the architecture and parameters of the networks;
  
 - To explore under- or over-fitting and know what appropriate actions to take in these cases.
 

During this assignment you will go through the process of implementing and optimising deep learning approaches. The way that you work is more important than the results for this assignment, as what is most crucial for you to learn is how to take a dataset, understand the problem, write appropriate code, optimize performance and present results. A good understanding of the different aspects of this process and how to put them together well (which will not always be the same, since different problems come with different constraints or difficulties) is the key to being able to effectively use deep learning techniques in practice.

This assignment relates to the following ACS CBOK areas: abstraction, design, hardware and software, data and information, and programming.


## Scenario

A client is interested in having you (or rather the company that you work for) investigate whether it is possible to develop an app that would enable American sign language to be translated for people that do not sign, or those that sign in different languages/styles. They have provided you with a labelled dataset of images related to signs (hand positions) that represent individual letters in order to do a preliminary test of feasibility.

Your manager has asked you to do this feasibility assessment, but subject to a constraint on the computational facilities available.  More specifically, you are asked to do **no more than 50 training runs in total** (where one training run consists of fitting a DL model, with as many epochs as you think are needed, and with fixed model specifications and fixed hyperparameter settings - that is, not including hyper-parameter optimisation). In addition, because it is intended to be for a lightweight app, your manager wants to to **limit the number of total parameters in each network to a maximum of 500,000.** Also, the data has already been double-checked for problems by an in-house data wrangling team and all erroneous data has already been identified and then fixed by the client, so you **do not need to check for erroneous data** in this case.

In addition, you are told to **create a fixed validation set and any necessary test sets using _only_ the supplied _testing_ dataset.** It is unusual to do this, but here the training set contains a lot of non-independent, augmented images and it is important that the validation images must be totally independent of the training data and not made from augmented instances of training images.

The clients have asked to be informed about the following:
 - **unbiased median accuracy** estimate of the letter predictions from a deep learning model
 - the letter with the highest individual accuracy
 - the letter with the lowest individual accuracy
 - the three most common single types of error (i.e. where one letter is being incorrectly labelled as another)
 
Your manager has asked you to create a jupyter notebook that shows the following:
 - loading the data and displaying a sample of each letter
 - training and optimising both **densely connected** *and* **CNN** style models
 - finding the best single model, subject to a rapid turn-around and corresponding limit of 50 training runs in total
 - reporting clearly and concisely what networks you have tried, the method you used to optimise them, the associated learning curves, the number of total parameters in each, their summary performance and the selection process used to pick the best model
     - this should be clear enough that another employee, with your skillset, should be able to take over from you and understand your code and your methods
 - results from the model that is selected as the best, showing the information that the clients have requested
 - it is hoped that the median accuracy will exceed 94% overall and better than 85% for every individual letter, and you are asked to report (in addition to the client's requests):
     - the overall mean accuracy
     - the accuracy for each individual letter
     - a short written recommendation (100 words maximum) regarding how likely you think it is to achieve these goals either with the current model or by continuing to do a small amount of model development/optimisation


## Guide to Assessment

This assignment is much more free-form than others in order to test your ability to run a full analysis like this one from beginning to end, using the correct procedures. So you should use a methodical approach, as a large portion of the marks are associated with the decisions that you take and the approach that you use.  There are no marks associated with the performance - just report what you achieve, as high performance does not get better marks - to get good marks you need to use the right steps as well as to create clean, concise code and outputs, just as you've done in other assignments.

Make sure that you follow the instructions found in the scenario above, as this is what will be marked.  And be careful to do things in a way that gives you an *unbiased* result.

The notebook that you submit should be similar to those in the other assignments, where it is important to clearly structure your outputs and code so that it could be understood by your manager or your co-worker - or, even more importantly, the person marking it! This does not require much writing beyond the code, comments and the small amount of output text that you've seen in previous assignments.  Do not write long paragraphs to explain every detail of everything you do - it is not that kind of report and longer is definitely not better.  Just make your code clear, your outputs easy to understand (very short summaries often help here), and include a few small markdown cells that describe or summarise things when you think they are necessary.

Marks for the assignment will be determined according to the rubric that you can find on MyUni, with a breakdown into sections as follows:
 - 30%: Loading and displaying data, plus initial model training (acting as a baseline)
 - 50%: Optimisation of an appropriate set of models in an appropriate way (given the imposed constraints)
 - 20%: Comparison of models, selection of the single best model and reporting of final results

Your report (notebook) should be **divided clearly into three sections**, corresponding to the three bullet points listed above.

Remember that most marks will be for the **steps you take**, rather than the achievement of any particular results. There will also be marks for showing appropriate understanding of the results that you present.  

What you need to do this assignment can all be found in the first 10 weeks of workshops, lectures and also the previous two assignments.

## Final Instructions

While you are free to use whatever IDE you like to develop your code, your submission should be formatted as a Jupyter notebook that interleaves Python code with output, commentary and analysis, and clearly divided into three main sections as described above. 
- All data processing must be done within the notebook after calling appropriate load functions.
- Comment your code appropriately, so that its purpose is clear to the reader, but not so full of comments that it is hard to follow the flow of the code. Also avoid interspersing, in the same cell, code that is run with function definitions as they make code hard to follow.
- In the submission file name, do not use spaces or special characters.

The marks for this assignment are mainly associated with making the right choices and executing the workflow correctly and efficiently, as well as having clean and concise code and outputs. Make sure your code and outputs are easy to follow and not unnecessarily long. Use of headings and very short summaries can help, and try to avoid lengthy portions of text or plots. The readability of the report (notebook) will count towards the marks (and please note that _excessive_ commenting or text outputs or text in output cells is strongly discouraged and will result in worse grades, so aim for a modest, well-chosen amount of comments and text in outputs).

This assignment can be solved using methods from sklearn, pandas, matplotlib, seaborn and keras/tensorflow, as presented in the workshops. Other high-level libraries should not be used, even though they might have nice functionality such as automated hyperparameter or architecture search/tuning/optimisation. For the deep learning parts please restrict yourself to the library calls used in workshops 7-10 or ones that are very similar to these. You are expected to search and carefully read the documentation for functions that you use, to ensure you are using them correctly.

As ususal, feel free to use code from internet sources, ChatGPT or the workshops as a base for this assignment, but be aware that they may not do *exactly* what you want (code examples rarely do!) and so you will need to make suitable modifications. Appropriate references for substantial excerpts, even if modified, should be given.


In [1]:
# Set the seed for reproducibility
import numpy as np
import random
import tensorflow as tf
def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)

set_seed(42)

In [2]:
import pandas as pd

from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

# Load the dataset
train_data = pd.read_csv('sign_mnist_train.csv')
test_data = pd.read_csv('sign_mnist_test.csv')

# Extract labels and images
X_train = train_data.iloc[:, 1:].values
y_train = train_data.iloc[:, 0].values
X_test = test_data.iloc[:, 1:].values
y_test = test_data.iloc[:, 0].values

In [3]:

# Adjust labels to be in range 0-23 instead of 0-24
y_train = [i if i < 9 else i - 1 for i in y_train]
y_test = [i if i < 9 else i - 1 for i in y_test]

# Convert lists to numpy arrays
y_train = np.array(y_train)
y_test = np.array(y_test)

# Reshape images to 32x32 (since the images consists of 32x32 pixels)
X_train = X_train.reshape(-1, 32, 32, 1)
X_test = X_test.reshape(-1, 32, 32, 1)

# Normalize pixel values
X_train = X_train / 255.0
X_test = X_test / 255.0

# Create a fixed validation set and a test set from the testing data
X_val, X_final_test, y_val, y_final_test = train_test_split(X_test, y_test, stratify= y_test, test_size=0.5, random_state=42)

# One-hot encode labels
y_train_enc = to_categorical(y_train, num_classes=24)
y_val_enc = to_categorical(y_val, num_classes=24)
y_final_test_enc = to_categorical(y_final_test, num_classes=24)

In [4]:
# Let's see how big it is
print(X_train.shape)
print(X_test.shape)
n_total = X_train.shape[0]

(27455, 32, 32, 1)
(7172, 32, 32, 1)


In [5]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten



# Define the densely connected model
dense_model = Sequential([
    Flatten(input_shape=(32, 32, 1)),
    Dense(512, activation='relu'),
    Dense(256, activation='relu'),
    Dense(24, activation='softmax')
])

# Compile the model
set_seed(42)
dense_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
history_dense = dense_model.fit(X_train, y_train_enc, epochs=20, validation_data=(X_val, y_val_enc))

Epoch 1/20


  super().__init__(**kwargs)


[1m858/858[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2ms/step - accuracy: 0.2827 - loss: 2.3754 - val_accuracy: 0.5683 - val_loss: 1.3102
Epoch 2/20
[1m858/858[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2ms/step - accuracy: 0.6973 - loss: 0.9249 - val_accuracy: 0.7192 - val_loss: 0.9155
Epoch 3/20
[1m858/858[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2ms/step - accuracy: 0.8297 - loss: 0.5159 - val_accuracy: 0.7462 - val_loss: 0.8934
Epoch 4/20
[1m858/858[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2ms/step - accuracy: 0.9095 - loss: 0.2890 - val_accuracy: 0.7521 - val_loss: 0.8922
Epoch 5/20
[1m858/858[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2ms/step - accuracy: 0.9410 - loss: 0.1866 - val_accuracy: 0.7513 - val_loss: 0.9831
Epoch 6/20
[1m858/858[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2ms/step - accuracy: 0.9664 - loss: 0.1147 - val_accuracy: 0.7607 - val_loss: 1.0020
Epoch 7/20
[1m858/858[0m [32m━━━━━━━

In [6]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense


# Define the CNN model
cnn_model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(24, activation='softmax')
])

# Compile the model
set_seed(42)
cnn_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
history_cnn = cnn_model.fit(X_train, y_train_enc, epochs=20, validation_data=(X_val, y_val_enc))


Epoch 1/20


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


[1m858/858[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 4ms/step - accuracy: 0.5623 - loss: 1.5057 - val_accuracy: 0.8795 - val_loss: 0.4001
Epoch 2/20
[1m858/858[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 3ms/step - accuracy: 0.9948 - loss: 0.0352 - val_accuracy: 0.9038 - val_loss: 0.3952
Epoch 3/20
[1m858/858[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 3ms/step - accuracy: 0.9988 - loss: 0.0076 - val_accuracy: 0.9163 - val_loss: 0.3932
Epoch 4/20
[1m858/858[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 3ms/step - accuracy: 1.0000 - loss: 0.0011 - val_accuracy: 0.9200 - val_loss: 0.3761
Epoch 5/20
[1m858/858[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 3ms/step - accuracy: 1.0000 - loss: 4.0789e-04 - val_accuracy: 0.9211 - val_loss: 0.3821
Epoch 6/20
[1m858/858[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m3s[0m 3ms/step - accuracy: 1.0000 - loss: 2.3329e-04 - val_accuracy: 0.9214 - val_loss: 0.3918
Epoch 7/20
[1m858/858[0m [32

In [7]:
from sklearn.metrics import confusion_matrix
def evaluate_model(name, model_name, X_val, y_val_enc, verbose = 2):
    model_eval = model_name.evaluate(X_val, y_val_enc, verbose = verbose)
    print(f"{name} Model - Test Accuracy: {model_eval[1]:.6f}\n")

    # Detailed evaluation of the model
    y_pred = np.argmax(model_name.predict(X_val), axis=1)

    # Compute accuracy per class, skipping index 9 (for J)
    y_true = np.argmax(y_val_enc, axis=1)
    accuracy_per_class = []
    for i in range(24):
        if np.sum(y_true == i) > 0:
            accuracy_per_class.append(np.mean(y_pred[y_true == i] == i))
        else:
            accuracy_per_class.append(np.nan)  # Handle classes with no samples

    # Filter out NaN values to calculate the median accuracy
    valid_accuracies = [acc for acc in accuracy_per_class if not np.isnan(acc)]
    median_accuracy = np.median(valid_accuracies)

    print(f"Unbiased Median Accuracy: {median_accuracy:.6f}")

    # Identify the letter with the highest individual accuracy
    highest_accuracy_class = np.nanargmax(accuracy_per_class)
    if highest_accuracy_class >= 9:  # Adjust for the missing 'J'
        letter_with_highest_accuracy = chr(highest_accuracy_class + ord('A') + 1)
    else:
        letter_with_highest_accuracy = chr(highest_accuracy_class + ord('A'))
    print(f"Letter with Highest Accuracy: {letter_with_highest_accuracy}")

    # Identify the letter with the lowest individual accuracy
    lowest_accuracy_class = np.nanargmin(accuracy_per_class)
    if lowest_accuracy_class >= 9:  # Adjust for the missing 'J'
        letter_with_lowest_accuracy = chr(lowest_accuracy_class + ord('A') + 1)
    else:
        letter_with_lowest_accuracy = chr(lowest_accuracy_class + ord('A'))
    print(f"Letter with Lowest Accuracy: {letter_with_lowest_accuracy}\n")


    # Calculate the confusion matrix
    conf_matrix = confusion_matrix(y_true, y_pred)

    # Set the diagonal elements to zero to exclude correct classifications
    np.fill_diagonal(conf_matrix, 0)

    # Flatten the confusion matrix and find top error indices
    flat_matrix = conf_matrix.flatten()
    top_indices = np.argsort(-flat_matrix)

    # Filter out zero errors
    non_zero_indices = top_indices[flat_matrix[top_indices] > 0]

    # Get top three non-zero errors
    top_3_indices = non_zero_indices[:3]
    top_3_errors = flat_matrix[top_3_indices]
    top_positions = np.unravel_index(top_3_indices, conf_matrix.shape)

    # Adjust indices to skip 'J' and map to the correct letters
    def adjust_for_j(index):
        return chr(index + ord('A') + 1) if index >= 9 else chr(index + ord('A'))
    
    print("Top three errors:")
    for i in range(len(top_3_errors)):
        print(f"-> Predict {adjust_for_j(top_positions[0][i])} as {adjust_for_j(top_positions[1][i])}")

    if len(top_3_errors) == 0:
        print("No error detected, the model is perfect")
    elif len(top_3_errors) != 3:
        print(f"Only {len(top_3_errors)} errors detected\n")

    # # Get the top three most common errors with adjustment for 'J'
    # common_errors = [(adjust_for_j(errors[0][i]), adjust_for_j(errors[1][i])) for i in range(3)]

    # print(f"Most Common Errors: {common_errors}")

    # Report overall mean accuracy and accuracy per letter
    mean_accuracy = np.nanmean(accuracy_per_class)
    print(f"Overall Mean Accuracy: {mean_accuracy:.6f}\n")

    # Print each letter and its accuracy
    letters = [chr(i + ord('A')) for i in range(26) if i not in [9, 25]]
    for i, acc in enumerate(accuracy_per_class):
        print(f"Letter {letters[i]}: Accuracy {acc:.6f}")


In [8]:
evaluate_model("Dense",dense_model, X_val, y_val_enc, 2)

113/113 - 0s - 1ms/step - accuracy: 0.8377 - loss: 0.8556
Dense Model - Test Accuracy: 0.837702

[1m113/113[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 866us/step
Unbiased Median Accuracy: 0.865254
Letter with Highest Accuracy: A
Letter with Lowest Accuracy: N

Top three errors:
-> Predict M as S
-> Predict N as A
-> Predict T as X
Overall Mean Accuracy: 0.827844

Letter A: Accuracy 1.000000
Letter B: Accuracy 1.000000
Letter C: Accuracy 0.954839
Letter D: Accuracy 0.852459
Letter E: Accuracy 0.955823
Letter F: Accuracy 0.967742
Letter G: Accuracy 0.844828
Letter H: Accuracy 0.963303
Letter I: Accuracy 0.840278
Letter K: Accuracy 0.638554
Letter L: Accuracy 0.951923
Letter M: Accuracy 0.812183
Letter N: Accuracy 0.365517
Letter O: Accuracy 0.902439
Letter P: Accuracy 1.000000
Letter Q: Accuracy 0.878049
Letter R: Accuracy 0.680556
Letter S: Accuracy 0.658537
Letter T: Accuracy 0.693548
Letter U: Accuracy 0.624060
Letter V: Accuracy 0.728324
Letter W: Accuracy 0.941748
Letter

In [9]:
evaluate_model("CNN", cnn_model, X_val, y_val_enc, 2)

113/113 - 0s - 1ms/step - accuracy: 0.9194 - loss: 0.6062
CNN Model - Test Accuracy: 0.919409

[1m113/113[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
Unbiased Median Accuracy: 0.936792
Letter with Highest Accuracy: A
Letter with Lowest Accuracy: O

Top three errors:
-> Predict N as M
-> Predict Y as I
-> Predict T as H
Overall Mean Accuracy: 0.912078

Letter A: Accuracy 1.000000
Letter B: Accuracy 1.000000
Letter C: Accuracy 0.954839
Letter D: Accuracy 0.975410
Letter E: Accuracy 1.000000
Letter F: Accuracy 1.000000
Letter G: Accuracy 0.919540
Letter H: Accuracy 0.963303
Letter I: Accuracy 0.916667
Letter K: Accuracy 0.933735
Letter L: Accuracy 1.000000
Letter M: Accuracy 0.883249
Letter N: Accuracy 0.703448
Letter O: Accuracy 0.682927
Letter P: Accuracy 1.000000
Letter Q: Accuracy 0.975610
Letter R: Accuracy 0.777778
Letter S: Accuracy 0.926829
Letter T: Accuracy 0.693548
Letter U: Accuracy 0.939850
Letter V: Accuracy 0.878613
Letter W: Accuracy 0.980583
Letter X: 

Part 2

In [10]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define the data augmentation generator
datagen = ImageDataGenerator(
    rotation_range=10,        # Randomly rotate images by 10 degrees
    width_shift_range=0.1,    # Randomly translate images horizontally by 10% of the width
    height_shift_range=0.1,   # Randomly translate images vertically by 10% of the height
    zoom_range=0.1,           # Randomly zoom images by 10%
    horizontal_flip=True      # Randomly flip images horizontally
)


In [11]:
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
# Early stopping and learning rate reduction callbacks
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=3, min_lr=0.00001)


In [12]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, BatchNormalization
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam


def create_dense_model_with_regularization(layer_sizes, learning_rate=0.001, l2_lambda=0.01):
    set_seed(42)
    model = Sequential()
    model.add(Flatten(input_shape=(32, 32, 1)))
    for size in layer_sizes:
        model.add(Dense(size, activation='relu', kernel_regularizer=l2(l2_lambda)))
        model.add(BatchNormalization())
    model.add(Dense(24, activation='softmax', kernel_regularizer=l2(l2_lambda)))
    model.compile(optimizer=Adam(learning_rate=learning_rate), loss='categorical_crossentropy', metrics=['accuracy'])
    return model


In [13]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, BatchNormalization
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam
from tensorflow import keras

def create_cnn_model_with_regularization(conv_layers, dense_layers, learning_rate=0.1, l2_lambda=0.01):
    model = Sequential()
    # model = keras.models.Sequential()
    for filters, kernel_size in conv_layers:
        model.add(Conv2D(filters, kernel_size, activation='relu', kernel_regularizer=l2(l2_lambda), input_shape=(32, 32, 1)))
        model.add(BatchNormalization())
        model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    for size in dense_layers:
        model.add(Dense(size, activation='relu', kernel_regularizer=l2(l2_lambda)))
        model.add(BatchNormalization())
    model.add(Dense(24, activation='softmax', kernel_regularizer=l2(l2_lambda)))
    model.compile(optimizer=Adam(learning_rate=learning_rate), loss='categorical_crossentropy', metrics=['accuracy'])
    return model

# def create_cnn_model_with_regularization(conv_layers, dense_layers, learning_rate=0.001):
#     model = Sequential()
#     # model = keras.models.Sequential()
#     for filters, kernel_size in conv_layers:
#         model.add(Conv2D(filters, kernel_size, activation='relu', input_shape=(32, 32, 1)))
#         model.add(BatchNormalization())
#         model.add(MaxPooling2D((2, 2)))
#     model.add(Flatten())
#     for size in dense_layers:
#         model.add(Dense(size, activation='relu'))
#         model.add(BatchNormalization())
#     model.add(Dense(24, activation='softmax'))
#     model.compile(optimizer=Adam(learning_rate=learning_rate), loss='categorical_crossentropy', metrics=['accuracy'])
#     return model

In [14]:
def return_best_model(group_name, model_group, X_train, y_train_enc, batch_size = 30):
    # Train and evaluate each  model
    model_histories = []
    for model in model_group:
        print(f'Model for {group_name}: {model}')
        set_seed(42)
        history = model.fit(
            datagen.flow(X_train, y_train_enc, batch_size=batch_size),
            steps_per_epoch=len(X_train) // batch_size,
            epochs=50,
            validation_data=(X_val, y_val_enc),
            callbacks=[early_stopping, reduce_lr],
            verbose=1
        )
        val_accuracy = history.history['val_accuracy'][-1]
        model_histories.append((val_accuracy, model))

    # Determine the best model based on validation accuracy
    best_val_accuracy, best_model = max(model_histories, key=lambda item: item[0])

    print(f'The best model is: {best_model} with a validation accuracy of: {best_val_accuracy}')
    return model_histories

In [15]:
# Define different dense models to experiment with
dense_models = [
    create_dense_model_with_regularization([512, 256], learning_rate=0.001),
    create_dense_model_with_regularization([1024, 512, 256], learning_rate=0.001),
    create_dense_model_with_regularization([1024, 512, 256, 128], learning_rate=0.001)
]

# Define different CNN models with regularization to experiment with
cnn_models = [
    create_cnn_model_with_regularization([(32, (3, 3)), (64, (3, 3))], [128],learning_rate=0.001),
    create_cnn_model_with_regularization([(32, (3, 3)), (64, (3, 3)), (128, (3, 3))], [256], learning_rate=0.001),
    create_cnn_model_with_regularization([(16, (3,3)), (32, (3, 3)), (64, (3, 3))], [512, 128], learning_rate=0.01)
]

  super().__init__(**kwargs)
  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


In [16]:
dense_histories = return_best_model("Dense", dense_models, X_train, y_train_enc, 30)

Model for Dense: <Sequential name=sequential_2, built=True>
Epoch 1/50
[1m 38/915[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m3s[0m 4ms/step - accuracy: 0.1227 - loss: 13.0615   

  self._warn_if_super_not_called()


[1m915/915[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 4ms/step - accuracy: 0.2234 - loss: 6.9141 - val_accuracy: 0.1107 - val_loss: 5.9806 - learning_rate: 0.0010
Epoch 2/50
[1m915/915[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 148us/step - accuracy: 0.3000 - loss: 2.8580 - val_accuracy: 0.0990 - val_loss: 5.9176 - learning_rate: 0.0010
Epoch 3/50
[1m  1/915[0m [37m━━━━━━━━━━━━━━━━━━━━[0m [1m35s[0m 39ms/step - accuracy: 0.3000 - loss: 2.4996

  self.gen.throw(value)


[1m915/915[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 4ms/step - accuracy: 0.3062 - loss: 2.5931 - val_accuracy: 0.0772 - val_loss: 3.9443 - learning_rate: 0.0010
Epoch 4/50
[1m915/915[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 142us/step - accuracy: 0.4667 - loss: 2.1094 - val_accuracy: 0.0828 - val_loss: 3.9080 - learning_rate: 0.0010
Epoch 5/50
[1m915/915[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 4ms/step - accuracy: 0.3703 - loss: 2.2784 - val_accuracy: 0.1882 - val_loss: 3.6117 - learning_rate: 0.0010
Epoch 6/50
[1m915/915[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 146us/step - accuracy: 0.5000 - loss: 1.9259 - val_accuracy: 0.1952 - val_loss: 3.6049 - learning_rate: 0.0010
Epoch 7/50
[1m915/915[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 4ms/step - accuracy: 0.3948 - loss: 2.1694 - val_accuracy: 0.1913 - val_loss: 3.2765 - learning_rate: 0.0010
Epoch 8/50
[1m915/915[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 

In [17]:
cnn_histories = return_best_model("CNN", cnn_models, X_train, y_train_enc, 30)

Model for CNN: <Sequential name=sequential_5, built=True>
Epoch 1/50
[1m915/915[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m7s[0m 6ms/step - accuracy: 0.5385 - loss: 3.7018 - val_accuracy: 0.8268 - val_loss: 1.4151 - learning_rate: 0.0010
Epoch 2/50
[1m915/915[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 222us/step - accuracy: 0.9000 - loss: 1.1569 - val_accuracy: 0.8299 - val_loss: 1.4124 - learning_rate: 0.0010
Epoch 3/50
[1m915/915[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 6ms/step - accuracy: 0.8725 - loss: 1.2010 - val_accuracy: 0.7094 - val_loss: 1.5567 - learning_rate: 0.0010
Epoch 4/50
[1m915/915[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 223us/step - accuracy: 0.9333 - loss: 0.9365 - val_accuracy: 0.7011 - val_loss: 1.6058 - learning_rate: 0.0010
Epoch 5/50
[1m915/915[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m6s[0m 6ms/step - accuracy: 0.8956 - loss: 1.0339 - val_accuracy: 0.6615 - val_loss: 1.8788 - learning_rate: 0.0010
Model fo

Part 3

In [18]:
# Combine all histories and models
all_histories = dense_histories + cnn_histories

# Select the best model based on validation accuracy
best_model = max(all_histories, key=lambda x: x[0])[1]

In [19]:
evaluate_model("CNN", best_model, X_val, y_val_enc)

113/113 - 0s - 2ms/step - accuracy: 1.0000 - loss: 0.0736
CNN Model - Test Accuracy: 1.000000

[1m113/113[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
Unbiased Median Accuracy: 1.000000
Letter with Highest Accuracy: A
Letter with Lowest Accuracy: A

Top three errors:
No error detected, the model is perfect
Overall Mean Accuracy: 1.000000

Letter A: Accuracy 1.000000
Letter B: Accuracy 1.000000
Letter C: Accuracy 1.000000
Letter D: Accuracy 1.000000
Letter E: Accuracy 1.000000
Letter F: Accuracy 1.000000
Letter G: Accuracy 1.000000
Letter H: Accuracy 1.000000
Letter I: Accuracy 1.000000
Letter K: Accuracy 1.000000
Letter L: Accuracy 1.000000
Letter M: Accuracy 1.000000
Letter N: Accuracy 1.000000
Letter O: Accuracy 1.000000
Letter P: Accuracy 1.000000
Letter Q: Accuracy 1.000000
Letter R: Accuracy 1.000000
Letter S: Accuracy 1.000000
Letter T: Accuracy 1.000000
Letter U: Accuracy 1.000000
Letter V: Accuracy 1.000000
Letter W: Accuracy 1.000000
Letter X: Accuracy 1.000