# Digit Recognizer - Image Classification with a LeNet Neural Network

The objective of this Kaggle competition is to build and train a neural network for image classification. The dataset is composed of grayscale images of handwritten digits from 0 to 9, which have to be labeled correctly. In this notebook the LeNet neural network is implemented and evaluated in two steps: first the original architecture is tested, then Optuna is used to optimize the hyperparameters while keeping the same architecture. Finally, the performance of the two approaches is compared.

Let us set the autoreloader:

In [None]:
%load_ext autoreload
%autoreload 2  

Let us set up tensorboard:

In [None]:
%load_ext tensorboard

Let us import the required libraries:

In [None]:
import numpy as np
import optuna
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import seaborn as sns 
from sklearn.model_selection import train_test_split

In [None]:
from preliminary.preprocess import show_image, create_dataset
from models.LeNet import LeNet_optimize, LeNet_predict, LeNet, LeNet_performance, plot_history
from models.LeNet_ensemble import LeNet_ensemble_performance, LeNet_ensemble_predict, plot_history_ensemble
from models.learning_curve import learning_curve, learning_curve_ensemble, plot_learning_curve
from utilities import generate_submission_file

Let us load the data:

In [None]:
dataset_train = pd.read_csv("train.csv").to_numpy()
dataset_test = pd.read_csv("test.csv").to_numpy()

The dataset is composed of images of size 28x28, which can be easily plotted:

In [None]:
image_index = 1
plot_size = (8,8)
plot_code = "show"
show_image(dataset_train, plot_code, image_index, plot_size)

In [None]:
image_index = 4
plot_size = (8,8)
plot_code = "explore"
show_image(dataset_test, plot_code, image_index, plot_size)

To feed the data to the neural networks that are going to be trained, it is necessary to reshape the data into suitable numpy arrays:

In [None]:
X_train, y_train = create_dataset(dataset_train, "train")

In [None]:
X_test = create_dataset(dataset_test, "test")
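
The internals of `create_dataset` are not shown in this notebook. As a rough sketch of what such a reshaping step could look like, the hypothetical function below assumes the Kaggle CSV layout (a label column followed by 784 pixel columns for the training file, 784 pixel columns only for the test file), scales the pixels to [0, 1] and adds a channel axis. The name `create_dataset_sketch`, the scaling and the integer (not one-hot) labels are assumptions made for illustration only.

```python
import numpy as np


def create_dataset_sketch(table, mode):
    """Hypothetical stand-in for create_dataset, not the notebook's own code."""
    table = np.asarray(table)
    if mode == "train":
        # Assumption: the first column holds the label, the rest are pixels
        labels = table[:, 0].astype(np.int64)
        pixels = table[:, 1:]
    elif mode == "test":
        labels = None
        pixels = table
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Assumption: scale to [0, 1] and reshape to (n_images, 28, 28, 1)
    images = pixels.astype(np.float32).reshape(-1, 28, 28, 1) / 255.0
    return images if labels is None else (images, labels)


# Tiny synthetic table with 3 images, used only to check the shapes
rng = np.random.default_rng(0)
fake_train = np.hstack([
    rng.integers(0, 10, size=(3, 1)),     # labels
    rng.integers(0, 256, size=(3, 784)),  # pixel intensities
])
X_demo, y_demo = create_dataset_sketch(fake_train, "train")
print(X_demo.shape, y_demo.shape)  # (3, 28, 28, 1) (3,)
```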

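Now it is possible to create the dictionaries used to train the neural network and to make predictions. For evaluation, the labeled data is split into 60% training, 20% validation and 20% test data; for prediction, it is split into 70% training and 30% validation data, while the unlabeled Kaggle test set is used for the final predictions: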

In [None]:
X_train_eval, X_test_eval, y_train_eval, y_test_eval = train_test_split(X_train, y_train, test_size=0.4)
X_val_eval, X_test_eval, y_val_eval, y_test_eval = train_test_split(X_test_eval, y_test_eval, test_size=0.5)

In [None]:
datasets_eval = {"train": [X_train_eval, y_train_eval], "val": [X_val_eval, y_val_eval], "test_eval": [X_test_eval, y_test_eval]}

In [None]:
X_train_pred, X_val_pred, y_train_pred, y_val_pred = train_test_split(X_train, y_train, test_size=0.3)

In [None]:
datasets_pred = {"train": [X_train_pred, y_train_pred], "val": [X_val_pred, y_val_pred], "test": X_test}

### Legacy LeNet Neural Network

Let us implement a version of the LeNet neural network similar to the original one. As a simplification, the flattening layer before the classification head can be replaced by a global average pooling layer, which reduces the total number of parameters. The parameters of this simplified LeNet neural network are:

In [None]:
LeNet_params_original = {
    "kernel size 1": 5,
    "kernel size 2": 5,
    "n filter 1": 6,
    "n filter 2": 16,
    "activation function conv": "tanh",
    "l2 regularizer conv": 0,
    "pool size 1": 2,
    "pool size 2": 2,
    "dense size 1": 120,
    "dense size 2": 84,
    "activation function dense": "tanh",
    "l2 regularizer dense": 0
}
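
To get a feeling for how much the global average pooling layer saves, the trainable parameters can be counted by hand. The sketch below is only an estimate under stated assumptions: 28x28 single-channel inputs, "valid" convolutions with stride 1, non-overlapping pooling, and a 10-unit output layer. The actual `LeNet` model may use different padding or layers, so `model.summary()` remains the reference.

```python
# Entries copied from LeNet_params_original (only those that affect the count)
params = {
    "kernel size 1": 5, "kernel size 2": 5,
    "n filter 1": 6, "n filter 2": 16,
    "pool size 1": 2, "pool size 2": 2,
    "dense size 1": 120, "dense size 2": 84,
}


def feature_map_size(size, kernel, pool):
    # Assumption: "valid" convolution with stride 1, then non-overlapping pooling
    return (size - kernel + 1) // pool


def count_parameters(p, head, input_size=28, channels=1, n_classes=10):
    """Count weights and biases for a flatten or global-average-pooling head."""
    side = feature_map_size(input_size, p["kernel size 1"], p["pool size 1"])
    side = feature_map_size(side, p["kernel size 2"], p["pool size 2"])
    conv1 = p["kernel size 1"] ** 2 * channels * p["n filter 1"] + p["n filter 1"]
    conv2 = p["kernel size 2"] ** 2 * p["n filter 1"] * p["n filter 2"] + p["n filter 2"]
    if head == "flatten":
        features = side * side * p["n filter 2"]
    elif head == "global average pooling":
        features = p["n filter 2"]
    else:
        raise ValueError(f"unknown head: {head!r}")
    dense1 = features * p["dense size 1"] + p["dense size 1"]
    dense2 = p["dense size 1"] * p["dense size 2"] + p["dense size 2"]
    output = p["dense size 2"] * n_classes + n_classes
    return conv1 + conv2 + dense1 + dense2 + output


print(count_parameters(params, "flatten"))                 # 44426
print(count_parameters(params, "global average pooling"))  # 15626
```

Under these assumptions almost all of the saving comes from the first dense layer, which sees 16 features instead of 4 x 4 x 16 = 256.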

The performance of the LeNet neural network can be assessed with different metrics, such as the accuracy, the confusion matrix and the ROC AUC scores. To estimate these quantities, the LeNet network is trained multiple times and evaluated on the data after each run, so that average values and standard deviations can be computed.

In [None]:
batch_size = 32
epochs = 100
n_samples = 10
results_original = LeNet_performance(datasets_eval, LeNet_params_original, batch_size, epochs, n_samples)
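
How `LeNet_performance` aggregates the repeated runs is not shown here. A minimal sketch of the idea, assuming that each run yields one accuracy per split and that the population standard deviation is used, could look as follows. The function name `summarize_runs` and the numbers are placeholders for illustration, not results from this notebook.

```python
from statistics import mean, pstdev


def summarize_runs(run_accuracies):
    """Return (average, standard deviation) over repeated training runs."""
    # pstdev is the population standard deviation, like the default of np.std
    return mean(run_accuracies), pstdev(run_accuracies)


# Placeholder accuracies for illustration only
example_runs = {
    "train": [0.99, 0.98, 1.00],
    "val": [0.97, 0.98, 0.96],
}
summary = {split: summarize_runs(values) for split, values in example_runs.items()}
for split, (avg, std) in summary.items():
    print(f"{split} accuracy - avg: {avg:.3f}, std: {std:.3f}")
```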

The performance of the original LeNet neural network is:

In [None]:
print(f"train accuracy - avg: {np.round(results_original['accuracy']['train'][0], decimals=3)}, std: {np.round(results_original['accuracy']['train'][1], decimals=3)}")
print(f"val accuracy - avg: {np.round(results_original['accuracy']['val'][0], decimals=3)}, std: {np.round(results_original['accuracy']['val'][1], decimals=3)}")
print(f"test accuracy - avg: {np.round(results_original['accuracy']['test'][0], decimals=3)}, std: {np.round(results_original['accuracy']['test'][1], decimals=3)}")

In [None]:
for c in range(10):
    print(f"roc auc score of class {c} (one vs rest approach) - avg: {np.round(results_original['roc auc']['mean'][c], decimals=4)}, std: {np.round(results_original['roc auc']['std'][c], decimals=4)}")
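
For reference, the one vs rest ROC AUC of a class can be computed by treating that class as positive and all others as negative, then measuring how often a positive sample receives a higher score than a negative one. The pure-Python sketch below illustrates this definition; it is quadratic in the number of samples and is meant for understanding only (in practice `sklearn.metrics.roc_auc_score` is the usual choice). The function names are hypothetical and do not come from the notebook's modules.

```python
def roc_auc_binary(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


def roc_auc_one_vs_rest(labels, probabilities, n_classes):
    """One AUC per class: class c against all the other classes."""
    return [
        roc_auc_binary(
            [1 if label == c else 0 for label in labels],
            [row[c] for row in probabilities],
        )
        for c in range(n_classes)
    ]


print(roc_auc_binary([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```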

It is also possible to compute the learning curve for the standard LeNet neural network:

In [None]:
training_set_sizes = np.array([0.01, 0.02, 0.03,0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1])
n_samples = 20
batch_size = 32
epochs = 2000
train_acc_avg_original, test_acc_avg_original, train_acc_std_original, test_acc_std_original = learning_curve(training_set_sizes, datasets_eval, n_samples, LeNet_params_original, batch_size, epochs)
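
The general structure of a learning-curve computation can be sketched as follows: for each training-set fraction, several random subsets are drawn, a model is trained on each subset, and the resulting accuracies are averaged. This is only a sketch under assumptions: the fractions are taken relative to the training split, and `fit_and_score` is a hypothetical callback standing in for the training and evaluation of a network. The actual `learning_curve` function may work differently.

```python
import random
from statistics import mean, pstdev


def learning_curve_sketch(fractions, n_train, n_samples, fit_and_score, seed=0):
    """Return (train_avg, test_avg, train_std, test_std), one value per fraction."""
    rng = random.Random(seed)
    train_avg, test_avg, train_std, test_std = [], [], [], []
    for fraction in fractions:
        subset_size = max(1, int(round(fraction * n_train)))
        train_scores, test_scores = [], []
        for _ in range(n_samples):
            # Draw a fresh random subset of the training rows for every repetition
            indices = rng.sample(range(n_train), subset_size)
            train_acc, test_acc = fit_and_score(indices)
            train_scores.append(train_acc)
            test_scores.append(test_acc)
        train_avg.append(mean(train_scores))
        test_avg.append(mean(test_scores))
        train_std.append(pstdev(train_scores))
        test_std.append(pstdev(test_scores))
    return train_avg, test_avg, train_std, test_std


def dummy_fit_and_score(indices):
    # Stand-in for training a network on the selected rows; returns made-up scores
    return 1.0, min(1.0, len(indices) / 100)


curve = learning_curve_sketch([0.1, 0.5], n_train=100, n_samples=3,
                              fit_and_score=dummy_fit_and_score)
print(curve)
```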

In [None]:
plot_size = (8,8)
classifier_name = "standard LeNet neural network"
plot_learning_curve(training_set_sizes, train_acc_avg_original, test_acc_avg_original, train_acc_std_original, test_acc_std_original, plot_size, classifier_name)
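# Save in the same cell as the plot: in a later cell the figure is usually already closed and an empty image would be written
# (this assumes plot_learning_curve leaves the figure open)
plt.savefig("standard_LeNet_learning_curve.png")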

Now the neural network can be used to make predictions:

In [None]:
batch_size = 32
epochs = 60
log_dir = "D://Codes//Python//Kaggle Competitions//Digit Recognizer//tensorboard_log//LeNet"
verbose = "auto"
model_original = LeNet(LeNet_params_original)
model_original, train_accuracy, val_accuracy, history, y_pred_original = LeNet_predict(datasets_pred, model_original, log_dir, batch_size, epochs, verbose)

In [None]:
print(f"train accuracy: {np.round(train_accuracy, decimals=3)}, val accuracy: {np.round(val_accuracy, decimals=3)}")

To check the convergence of the LeNet neural network used to make predictions, it is possible to analyze the accuracy on the training set and on the validation set as a function of the training epoch:

In [None]:
plot_size = (8,8)
classifier_name = "standard LeNet neural network"
plot_history(history, plot_size, classifier_name)
#plt.savefig("standard_LeNet_convergence_curve.png")
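
Besides the plot, a few numbers can help to judge convergence. The sketch below assumes that the training history is available as a dictionary of per-epoch lists with the keys `"accuracy"` and `"val_accuracy"` (as in the `history` attribute of a Keras `History` object when the metric is named `accuracy`). Whether the `history` returned by `LeNet_predict` has this form is an assumption, and `summarize_history` is a hypothetical helper.

```python
def summarize_history(history_dict, train_key="accuracy", val_key="val_accuracy"):
    """Report the best validation epoch and the final train/validation gap."""
    train = list(history_dict[train_key])
    val = list(history_dict[val_key])
    best_index = max(range(len(val)), key=val.__getitem__)
    return {
        "best epoch": best_index + 1,        # epochs counted from 1
        "best val accuracy": val[best_index],
        "final gap": train[-1] - val[-1],    # a large gap hints at overfitting
    }


# Placeholder curves for illustration only
example_history = {"accuracy": [0.5, 0.8, 0.9], "val_accuracy": [0.6, 0.85, 0.8]}
print(summarize_history(example_history))
```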

The model summary for the original LeNet neural network is:

In [None]:
model_original.summary()

The submission file for the original LeNet neural network can be produced, so that it is possible to establish a baseline to understand which model is better between the original one and the one obtained via hyperparameter optimization:

In [None]:
generate_submission_file(dataset_test, y_pred_original, "LeNet_baseline.csv")
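
The code of `generate_submission_file` is not shown in this notebook. For orientation, the Digit Recognizer competition expects a CSV file with the header `ImageId,Label` and image ids starting from 1. A minimal sketch that writes this format, assuming that each row of the predictions holds one score per class, could be the following; `write_submission` is a hypothetical name.

```python
import csv
import io


def write_submission(stream, y_pred):
    """Write ImageId,Label rows, taking the most probable class as the label."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["ImageId", "Label"])
    for image_id, scores in enumerate(y_pred, start=1):
        # Index of the largest score, i.e. the predicted digit
        label = max(range(len(scores)), key=scores.__getitem__)
        writer.writerow([image_id, label])


# Two fake predictions with two classes each, for illustration only
buffer = io.StringIO()
write_submission(buffer, [[0.1, 0.9], [0.8, 0.2]])
print(buffer.getvalue())
```

To write to disk, the same function can be called with a file opened via `open(path, "w", newline="")`.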

### Optimized LeNet Neural Network

Let us now define the search space for the hyperparameters of the LeNet neural network:

In [None]:
search_space = {
    "kernel size 1": [3, 5, 7],
    "kernel size 2": [3, 5, 7],
    "n filter 1": [3, 5, 7, 9, 11, 13],
    "n filter 2": [3, 5, 7, 9, 11, 13],
    "activation function conv": ["sigmoid", "relu", "gelu", "elu", "tanh"],
    "l2 regularizer conv": [1e-6, 1e-2],
    "pool size 1": [2, 4],
    "pool size 2": [2, 4],
    "dense size 1": [32, 64, 128, 256],
    "dense size 2": [32, 64, 128, 256],
    "activation function dense": ["sigmoid", "relu", "gelu", "elu", "tanh"],
    "l2 regularizer dense": [1e-6, 1e-2]
}
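
How `LeNet_optimize` turns this dictionary into Optuna suggestions is not shown. One possible mapping, given here as a sketch, uses `trial.suggest_categorical` for the lists of choices and `trial.suggest_float` with `log=True` for the L2 regularizers. That the two-element `l2 regularizer` entries are `[low, high]` ranges while every other entry is a list of choices is an assumption. The `FirstChoiceTrial` class is a stand-in used only so that the sketch runs without Optuna.

```python
def suggest_params(trial, search_space,
                   range_keys=("l2 regularizer conv", "l2 regularizer dense")):
    """Draw one hyperparameter set from the search space using a trial object."""
    params = {}
    for name, values in search_space.items():
        if name in range_keys:
            # Assumption: [low, high] bounds, sampled on a logarithmic scale
            low, high = values
            params[name] = trial.suggest_float(name, low, high, log=True)
        else:
            params[name] = trial.suggest_categorical(name, values)
    return params


class FirstChoiceTrial:
    """Minimal stand-in for an Optuna trial: always picks the first option."""

    def suggest_categorical(self, name, choices):
        return choices[0]

    def suggest_float(self, name, low, high, log=False):
        return low


example_space = {
    "kernel size 1": [3, 5, 7],
    "pool size 1": [2, 4],
    "l2 regularizer conv": [1e-6, 1e-2],
}
print(suggest_params(FirstChoiceTrial(), example_space))
```

Inside an Optuna objective function, `suggest_params` would be called with the `trial` argument that Optuna passes in, so that the suggested names match the keys later read from `study.best_params`.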

Let us now find the set of the optimal parameters of the LeNet neural network:

In [None]:
batch_size = 32
epochs = 100
n_trial = 200
verbose = "auto"
storage_url = "sqlite:///optuna_study.db"
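# Timestamped study name in the form "YYYY-MM-DD HH:MM - LeNet" (explicit format instead of slicing str(datetime))
study_name = f"{datetime.datetime.today().strftime('%Y-%m-%d %H:%M')} - LeNet"
study = LeNet_optimize(datasets_eval, search_space, batch_size, epochs, n_trial, study_name, verbose, storage_url)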

The results of the optimization procedure can be displayed; in particular, it is possible to plot the importance of the hyperparameters, the intermediate values of the objective function for the different trials and the optimization history:

In [None]:
optuna.visualization.plot_param_importances(study)

In [None]:
optuna.visualization.plot_intermediate_values(study)

In [None]:
optuna.visualization.plot_optimization_history(study, target_name="Validation Accuracy")

In a way analogous to what has already been done with the standard LeNet neural network, the performances of the optimized LeNet neural network can be evaluated:

In [None]:
batch_size = 32
epochs = 100
n_samples = 10
results = LeNet_performance(datasets_eval, study.best_params, batch_size, epochs, n_samples)

The performance of the LeNet neural network with optimized hyperparameters is:

In [None]:
print(f"train accuracy - avg: {np.round(results['accuracy']['train'][0], decimals=3)}, std: {np.round(results['accuracy']['train'][1], decimals=3)}")
print(f"val accuracy - avg: {np.round(results['accuracy']['val'][0], decimals=3)}, std: {np.round(results['accuracy']['val'][1], decimals=3)}")
print(f"test accuracy - avg: {np.round(results['accuracy']['test'][0], decimals=3)}, std: {np.round(results['accuracy']['test'][1], decimals=3)}")

In [None]:
for c in range(10):
    print(f"roc auc score of class {c} (one vs rest approach) - avg: {np.round(results['roc auc']['mean'][c], decimals=4)}, std: {np.round(results['roc auc']['std'][c], decimals=4)}")

It is also possible to compute the learning curve for the optimized LeNet neural network:

In [None]:
training_set_sizes = np.array([0.01, 0.02, 0.03,0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1])
n_samples = 20
batch_size = 32
epochs = 2000
train_acc_avg_optimized, test_acc_avg_optimized, train_acc_std_optimized, test_acc_std_optimized = learning_curve(training_set_sizes, datasets_eval, n_samples, study.best_params, batch_size, epochs)

In [None]:
plot_size = (8,8)
classifier_name = "optimized LeNet neural network"
plot_learning_curve(training_set_sizes, train_acc_avg_optimized, test_acc_avg_optimized, train_acc_std_optimized, test_acc_std_optimized, plot_size, classifier_name)
#plt.savefig("optimized_LeNet_learning_curve.png")

Now it is possible to use the optimal set of hyperparameters to train a LeNet neural network and use it for predictions:

In [None]:
batch_size = 32
epochs = 60
log_dir = "D://Codes//Python//Kaggle Competitions//Digit Recognizer//tensorboard_log//LeNet"
verbose = "auto"
model = LeNet(study.best_params)
model, train_accuracy, val_accuracy, history_optimized, y_pred = LeNet_predict(datasets_pred, model, log_dir, batch_size, epochs, verbose)

To check the convergence of the optimized LeNet neural network used to make predictions, it is possible to analyze the accuracy on the training set and on the validation set as a function of the training epoch:

In [None]:
plot_size = (8,8)
classifier_name = "optimized LeNet neural network"
plot_history(history_optimized, plot_size, classifier_name)
#plt.savefig("optimized_LeNet_convergence_curve.png")

The architecture of the trained neural network can be plotted as:

In [None]:
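# plot_model is not imported above; this import assumes that the model is a Keras model built with TensorFlow
from tensorflow.keras.utils import plot_model
plot_model(model, to_file="LeNet_1.png", show_shapes=True, show_layer_names=True)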

The model summary is:

In [None]:
model.summary()

Now it is possible to show the images in the test set together with their predicted label:

In [None]:
image_index = 67
plot_size = (8,8)
dataset_code = "predict"
predicted_label = np.argmax(y_pred[image_index,:])
show_image(dataset_test, dataset_code, image_index, plot_size, y_pred)

The output file for submission can be generated:

In [None]:
generate_submission_file(dataset_test, y_pred, "LeNet_1.csv")

### Deep Ensemble with LeNet neural networks

To obtain a better model, it is possible to build an ensemble of LeNet neural networks. All members share the same architecture and the same set of hyperparameters, which are set equal to the optimal ones obtained before, and each member is trained on the same dataset. The performance of the ensemble can be tested with an approach analogous to the one used to evaluate a single network.
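
How the members are combined inside `LeNet_ensemble_predict` is not shown in this notebook. A common choice for deep ensembles, assumed here purely for illustration, is to average the class probabilities predicted by the members and to take the most probable class of the average. The function `ensemble_average` is a hypothetical name.

```python
import numpy as np


def ensemble_average(member_probabilities):
    """Average per-member class probabilities and pick the most probable class."""
    # Shape after stacking: (n_members, n_samples, n_classes)
    stacked = np.stack([np.asarray(p, dtype=float) for p in member_probabilities])
    mean_probabilities = stacked.mean(axis=0)
    return mean_probabilities, mean_probabilities.argmax(axis=1)


# Two fake members, two samples, two classes (illustration only)
member_1 = [[0.6, 0.4], [0.2, 0.8]]
member_2 = [[0.2, 0.8], [0.4, 0.6]]
mean_probabilities, labels = ensemble_average([member_1, member_2])
print(mean_probabilities)
print(labels)  # [1 1]
```

In this toy example the first member alone would assign class 0 to the first sample, while the averaged prediction assigns class 1.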

Now it is possible to analyze in detail the performance of the ensemble of LeNet neural networks:

In [None]:
batch_size = 32
epochs = 100
n_samples = 10
ensemble_size = 10
results_ensemble = LeNet_ensemble_performance(datasets_eval, study.best_params, batch_size, epochs, n_samples, ensemble_size)

The performance of the ensemble of LeNet neural networks is:

In [None]:
print(f"train accuracy - avg: {np.round(results_ensemble['accuracy']['train'][0], decimals=3)}, std: {np.round(results_ensemble['accuracy']['train'][1], decimals=3)}")
print(f"val accuracy - avg: {np.round(results_ensemble['accuracy']['val'][0], decimals=3)}, std: {np.round(results_ensemble['accuracy']['val'][1], decimals=3)}")
print(f"test accuracy - avg: {np.round(results_ensemble['accuracy']['test'][0], decimals=3)}, std: {np.round(results_ensemble['accuracy']['test'][1], decimals=3)}")

In [None]:
for c in range(10):
    print(f"roc auc score of class {c} (one vs rest approach) - avg: {np.round(results_ensemble['roc auc']['mean'][c], decimals=5)}, std: {np.round(results_ensemble['roc auc']['std'][c], decimals=5)}")

It is also possible to compute the learning curve for the ensemble of LeNet neural networks:

In [None]:
training_set_sizes = np.array([0.01, 0.02, 0.03,0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1])
n_samples = 20
batch_size = 32
epochs = 2000
ensemble_size = 10
train_acc_avg_ensemble, test_acc_avg_ensemble, train_acc_std_ensemble, test_acc_std_ensemble = learning_curve_ensemble(training_set_sizes, datasets_eval, n_samples, study.best_params, batch_size, epochs, ensemble_size)

In [None]:
plot_size = (8,8)
classifier_name = "ensemble of LeNet neural networks"
plot_learning_curve(training_set_sizes, train_acc_avg_ensemble, test_acc_avg_ensemble, train_acc_std_ensemble, test_acc_std_ensemble, plot_size, classifier_name)
#plt.savefig("ensemble_LeNet_learning_curve.png")

Now the ensemble can be used to make predictions:

In [None]:
batch_size = 32
epochs = 100
ensemble_size = 10
models = [LeNet(study.best_params) for _ in range(ensemble_size)]
models, train_accuracy, val_accuracy, histories, y_pred_ensemble = LeNet_ensemble_predict(datasets_pred, models, batch_size, epochs)

In [None]:
print(f"train accuracy: {train_accuracy}, val accuracy: {val_accuracy}")

In [None]:
plot_size = (8,8)
plot_history_ensemble(histories, plot_size)
#plt.savefig("ensemble_LeNet_convergence_curve.png")

The submission file for the ensemble of LeNet neural networks can be produced:

In [None]:
generate_submission_file(dataset_test, y_pred_ensemble, "LeNet_ensemble.csv")