## Introduction
### 🔍 Efficient Exploration: Fine-Tuning and Grid Search with EfficientNetV2B0

This notebook shows how EfficientNetV2B0, fine-tuning, and grid search fit together for an image recognition task: classifying the handwritten digits of the Kaggle Digit Recognizer dataset. No theatrics, just a logical walk through each step.

### 🌐 EfficientNetV2B0: A Solid Starting Point

We kick things off with EfficientNetV2B0, a reliable architecture known for its efficiency in image classification. We'll explore its features and understand how it forms the backbone of our approach to achieving accurate predictions.

### 🎯 Fine-Tuning Strategies: Focusing on the Last Ten Layers

Fine-tuning doesn't have to be a mystery. Discover a straightforward approach as we target the last ten layers of our base model. This precise adjustment allows us to adapt our model to the specific characteristics of our dataset, enhancing its ability to make accurate classifications.

### 🔍 Grid Search: Systematic Parameter Exploration

Enter grid search, a methodical way to navigate the hyperparameter landscape. We'll walk through how this systematic exploration helps us find the sweet spot for hyperparameter configuration, ensuring our model operates at its best without unnecessary complexity.

### 🌐 Practical Logic: Simplifying Complexity

This isn't just theoretical; it's a practical guide for tackling image recognition challenges. We break down complex concepts into practical insights, making it accessible for anyone looking to optimize their models.

Elevate your image recognition game with a logical and systematic approach. Let's dive in! 🚀


In [None]:
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, GlobalAveragePooling2D

# KerasClassifier is only needed for the optional (commented-out) grid search below.
# The wrapper is deprecated and unavailable in newer TensorFlow releases, so guard the import.
try:
    from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
except ImportError:
    KerasClassifier = None

In [None]:
test_file = "/kaggle/input/digit-recognizer/test.csv"
test_csv = pd.read_csv(test_file)
train_file = "/kaggle/input/digit-recognizer/train.csv"
train_csv = pd.read_csv(train_file)

In [None]:
train_csv.columns

In [None]:
train_csv

### Visualization

In [None]:
row_index = 78  
label = train_csv.loc[row_index, 'label']
pixels = train_csv.loc[row_index, 'pixel0':'pixel783'].values
image = pixels.reshape(28, 28)  

# Display the image
plt.figure(figsize=(3,3))
plt.imshow(image, cmap='gray')
plt.title(f"Label: {label}")
plt.show()

In [None]:
train_csv["label"]

### Preprocessing for train_csv

In [None]:
X = train_csv.loc[:, 'pixel0':'pixel783'].values / 255.0
y = train_csv['label']

In [None]:
X_train_reshaped = X.reshape(-1, 28, 28, 1)
X_train_rgb = np.repeat(X_train_reshaped, 3, axis=-1)
X_train_rgb.shape,y.unique().shape

In [None]:
X_train, X_val, y_train, y_val = train_test_split(X_train_rgb, y, test_size=0.2, random_state=42)
X_train.shape, X_val.shape, y_train.shape, y_val.shape

### Preprocessing for the test_csv

In [None]:
X_test_csv = test_csv.loc[:, 'pixel0':'pixel783'].values / 255.0
X_test_csv_reshaped = X_test_csv.reshape(-1, 28, 28, 1)
X_test_csv_rgb = np.repeat(X_test_csv_reshaped, 3, axis=-1)
X_test_csv_rgb.shape

In [None]:
num_classes = 10
input_shape = (28, 28, 3)

> We define a create_model function that uses EfficientNetV2B0 as its base model and runs a small convolutional neural network (CNN) branch in parallel; the outputs of the two branches are concatenated before the final dense layers. In this notebook we unfreeze the last 10 layers of EfficientNetV2B0, but you can unfreeze more or fewer layers depending on your problem statement.
The function takes input_shape, learning_rate, num_hidden_units, conv_units, dense_units, and num_classes as arguments. Grid search is performed over learning_rate, num_hidden_units, conv_units, and dense_units, while input_shape and num_classes stay fixed (there is no need to search over those).

In [None]:
def create_model(input_shape, learning_rate, num_hidden_units, conv_units, dense_units, num_classes):
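    # Create the base EfficientNetV2B0 model (ImageNet weights, no classification head).
    # Note: with the default include_preprocessing=True, Keras EfficientNetV2 models rescale
    # their inputs internally and expect pixel values in [0, 255]; this notebook feeds [0, 1].
    base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(include_top=False)
    base_model.trainable = True  # Start with every layer trainable; most are frozen just below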

    # Freeze all layers except the last ten
    for layer in base_model.layers[:-10]:
        layer.trainable = False

    # Define the input layer
    inputs = tf.keras.layers.Input(shape=input_shape, name="input_layer")

    # Pass the input through the base model; training=False keeps its
    # BatchNormalization layers in inference mode while fine-tuning
    x1 = base_model(inputs, training=False)
    x1 = GlobalAveragePooling2D(name="global_average_pooling_layer")(x1)

    # Create the CNN-based model
    model_cnn = tf.keras.models.Sequential()
    model_cnn.add(Conv2D(conv_units, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
    model_cnn.add(MaxPooling2D(pool_size=(2, 2)))
    model_cnn.add(Flatten())
    model_cnn.add(Dense(dense_units, activation='relu'))
    model_cnn.add(Dense(num_classes, activation='softmax'))

    # Pass the input through the CNN-based model
    x2 = model_cnn(inputs)

    # Concatenate the outputs of the two models
    x = tf.keras.layers.concatenate([x1, x2])

    # Add additional layers as needed
    x = Dense(num_hidden_units, activation='relu')(x)

    # Output layer
    outputs = Dense(num_classes, activation='softmax', name="output_layer")(x)

    # Create the final model
    final_model = tf.keras.Model(inputs, outputs)

    # Compile the model
    final_model.compile(
        loss="sparse_categorical_crossentropy",
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        metrics=["accuracy"]
    )

    return final_model
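
> The "freeze everything except the last ten layers" step in create_model comes down to list slicing. The sketch below isolates that logic with a hypothetical FakeLayer stand-in (not a Keras class) so it runs without TensorFlow; unfreeze_last_n is likewise an illustrative helper, not part of the notebook. With a real model you would pass base_model.layers instead.

```python
class FakeLayer:
    """Hypothetical stand-in that exposes only the attributes used here."""

    def __init__(self, name):
        self.name = name
        self.trainable = True


def unfreeze_last_n(layers, n):
    """Freeze every layer, then re-enable training for the last n layers.

    Returns the number of layers left trainable.
    """
    for layer in layers:
        layer.trainable = False
    # Guard n == 0: layers[-0:] would select the whole list, not an empty one.
    if n > 0:
        for layer in layers[-n:]:
            layer.trainable = True
    return sum(layer.trainable for layer in layers)


# 25 fake layers stand in for base_model.layers (the real model has many more).
layers = [FakeLayer(f"layer_{i}") for i in range(25)]
n_trainable = unfreeze_last_n(layers, 10)
print(f"Trainable layers: {n_trainable} of {len(layers)}")
```

> Counting the trainable layers after freezing is a cheap sanity check before training, and the same one-line sum works on a real Keras model.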


> Here we use EfficientNetV2B0 as the base model for transfer learning, though you can increase the capacity of the base model by using one of the other EfficientNet models, labeled B0 to B7. For reference, see https://keras.io/api/applications/efficientnet/.
> 
> EfficientNet models are known for their ability to achieve state-of-the-art performance on various computer vision tasks while being computationally efficient. The scaling approach used in their design allows for effective model generalization across different tasks and datasets. Researchers often choose a specific EfficientNet variant based on the requirements of their task, balancing the need for accuracy with computational efficiency.

### 🚀 Fine-Tuning Brilliance: Exploring Model Excellence with Grid Search in Keras 🤖

> The grid search code below is commented out; uncomment it and run it to find the best hyperparameters for your model's architecture.
>
> We made things simpler by creating a new function, create_model_fixed, that includes fixed details like input_shape and num_classes. This helps keep the code neat and organized. The function is then used in our model setup for grid search, where we explore different hyperparameters. After the search, the best parameters and their corresponding accuracy are printed. This approach makes it easier to manage both fixed and adjustable parts of the model.
>
> While running the code you might see a DeprecationWarning saying that the KerasClassifier class is deprecated. The recommended alternative is keras-tuner.

> You can use the link for reference :
> https://www.tensorflow.org/tutorials/keras/keras_tuner

In [None]:
# fixed_params = {
#     'input_shape': input_shape,
#     'num_classes': num_classes, 
# }

# # Create a new function with fixed parameters
# def create_model_fixed(learning_rate, num_hidden_units, conv_units, dense_units):
#     return create_model(**fixed_params, learning_rate=learning_rate, num_hidden_units=num_hidden_units,
#                         conv_units=conv_units, dense_units=dense_units)

# model = KerasClassifier(build_fn=create_model_fixed, epochs=10, batch_size=32, verbose=0)

# # Define the hyperparameters for grid search
# param_grid = {
#     'learning_rate': [0.001, 0.01],
#     'num_hidden_units': [64, 128],
#     'conv_units': [32, 64],
#     'dense_units': [64, 128]
# }

# # Perform grid search with 2-fold cross-validation (cv=2)
# grid_search = GridSearchCV(estimator=model, param_grid=param_grid, scoring='accuracy',cv = 2 ,verbose=1)
# grid_result = grid_search.fit(X_train, y_train)

# # Print the best parameters and corresponding accuracy
# print("Best parameters found: ", grid_result.best_params_)
# print("Best accuracy found: ", grid_result.best_score_)
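
> Before uncommenting the grid search, it helps to know how many training runs it triggers. The sketch below enumerates the same param_grid with itertools.product and counts the fits, assuming cv=2 as in the cell above and scikit-learn's default refit=True, which retrains the best configuration once on the full training data. The variable names are illustrative.

```python
from itertools import product

# Same grid as in the commented-out cell above.
param_grid = {
    "learning_rate": [0.001, 0.01],
    "num_hidden_units": [64, 128],
    "conv_units": [32, 64],
    "dense_units": [64, 128],
}
cv_folds = 2

# Every combination of one value per hyperparameter.
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_fits = len(combinations) * cv_folds  # one training run per combination and fold
n_fits_with_refit = n_fits + 1         # plus the final refit of the best combination

print(f"{len(combinations)} combinations, {n_fits} CV fits, {n_fits_with_refit} fits in total")
```

> Each of those fits trains for the 10 epochs set on the KerasClassifier, so adding a third value to any hyperparameter raises the cost noticeably.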


> The provided code utilizes k-fold cross-validation to improve the model's performance evaluation. By splitting the data into different folds and training the model on one subset while validating on another, we obtain more reliable accuracy scores. This approach helps ensure the model's effectiveness across diverse portions of the dataset, leading to a more trustworthy assessment of its overall performance.
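
> To make the splitting concrete, here is a minimal pure-Python sketch of shuffled k-fold index generation. It mimics what KFold(n_splits=5, shuffle=True) does conceptually, but k_fold_indices is a hypothetical helper and its shuffling differs from scikit-learn's, so the exact indices will not match.

```python
import random


def k_fold_indices(n_samples, n_splits, seed=42):
    """Yield (train_indices, val_indices) pairs for shuffled k-fold splitting."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # The first (n_samples % n_splits) folds get one extra sample.
    base, extra = divmod(n_samples, n_splits)
    fold_sizes = [base + 1 if i < extra else base for i in range(n_splits)]

    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(n_samples=23, n_splits=5))
for fold_number, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"Fold {fold_number}: {len(train_idx)} train / {len(val_idx)} validation samples")
```

> Every sample lands in exactly one validation fold and in the training set of the other four, which is why the per-fold accuracies together cover the whole training set.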

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

num_splits = 5
kf = KFold(n_splits=num_splits, shuffle=True, random_state=42)

# Lists to store the cross-validation results
acc_scores = []
fold_number = 1
trained_models = {}
y_train_array = np.array(y_train)

# Perform k-fold cross-validation
for train_index, val_index in kf.split(X_train):
    X_train_fold, X_val_fold = X_train[train_index], X_train[val_index]
    y_train_fold, y_val_fold = y_train_array[train_index], y_train_array[val_index]

    # Assuming you have optimal hyperparameters determined through grid search
    best_learning_rate = 0.001
    best_num_hidden_units = 128
    conv_units = 64
    dense_units = 128

    # Create and compile the model with the best hyperparameters
    model = create_model(input_shape=input_shape,
                         learning_rate=best_learning_rate,
                         num_hidden_units=best_num_hidden_units,
                         conv_units=conv_units,
                         dense_units=dense_units,
                         num_classes=num_classes)

    # Train the model on the current fold
    history = model.fit(X_train_fold, y_train_fold, epochs=50, batch_size=32, verbose=0, validation_data=(X_val_fold, y_val_fold))

    # Evaluate the model on the validation fold
    _, accuracy = model.evaluate(X_val_fold, y_val_fold, verbose=0)
    print(f"Fold {fold_number}: Validation Accuracy = {accuracy}")

    # Plot loss and accuracy curves
    plt.figure(figsize=(12, 4))
    plt.subplot(1, 2, 1)
    plt.plot(history.history['loss'], label='Train')
    plt.plot(history.history['val_loss'], label='Validation')
    plt.title('Model Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()

    plt.subplot(1, 2, 2)
    plt.plot(history.history['accuracy'], label='Train')
    plt.plot(history.history['val_accuracy'], label='Validation')
    plt.title('Model Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()

    plt.tight_layout()
    plt.show()

    # Confusion Matrix
    y_val_pred = model.predict(X_val_fold)
    y_val_pred_classes = np.argmax(y_val_pred, axis=1)
    y_val_true_classes = y_val_fold

    cm = confusion_matrix(y_val_true_classes, y_val_pred_classes)

    plt.figure(figsize=(4, 4))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
    plt.title('Confusion Matrix - Fold {}'.format(fold_number))
    plt.xlabel('Predicted Label')
    plt.ylabel('True Label')
    plt.show()

    trained_models[fold_number] = model
    fold_number += 1
    acc_scores.append(accuracy)
    print("--------------------------------------------")
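
> The loop collects one accuracy per fold in acc_scores but never summarizes them. A minimal sketch of such a summary follows; the numbers are placeholders for illustration only, not results from this notebook, and summarize_scores is a hypothetical helper.

```python
from statistics import mean, stdev


def summarize_scores(scores):
    """Return the mean and sample standard deviation of per-fold scores."""
    return mean(scores), stdev(scores)


# Placeholder values only; in the notebook, use the acc_scores list filled by the loop.
example_scores = [0.98, 0.99, 0.97, 0.98, 0.98]
mean_acc, std_acc = summarize_scores(example_scores)
print(f"Mean validation accuracy: {mean_acc:.4f} +/- {std_acc:.4f}")
```

> A small spread across folds suggests the accuracy does not depend much on which samples ended up in the validation fold.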


### Validation check

In [None]:
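y_val_array = np.array(y_val)
# `model` here is the model trained on the last fold of the loop above;
# X_val is the hold-out split that was never used during cross-validation
val_loss, val_accuracy = model.evaluate(X_val, y_val_array)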
print(f'Validation Loss: {val_loss:.4f}')
print(f'Validation Accuracy: {val_accuracy:.4f}')

In [None]:
trained_models

> The loop iterates through each trained model stored in the trained_models dictionary. For each model, predictions are made on the X_test_csv_rgb dataset, and these individual predictions are appended to the individual_predictions list. Finally, the mean prediction is computed by taking the average of the individual predictions along the specified axis (axis=0).
> 
> This ensemble strategy can be particularly effective when the individual models capture different aspects of the data or have complementary strengths, contributing to a more reliable and stable prediction.

In [None]:
individual_predictions = []

# Iterate through the trained models and make predictions
for fold_number, model in trained_models.items():
    predictions = model.predict(X_test_csv_rgb)
    individual_predictions.append(predictions)

# Calculate the mean of all predictions
mean_prediction = np.mean(individual_predictions, axis=0)
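
> The averaging step can be checked by hand on a tiny example. The sketch below uses plain Python lists in place of the NumPy arrays and made-up softmax outputs from three models for two images and three classes; average_predictions and argmax are illustrative helpers that mirror np.mean(..., axis=0) and argmax(axis=1).

```python
def average_predictions(per_model_probs):
    """Average class probabilities across models for every sample."""
    n_models = len(per_model_probs)
    n_samples = len(per_model_probs[0])
    n_classes = len(per_model_probs[0][0])
    return [
        [sum(model[i][c] for model in per_model_probs) / n_models for c in range(n_classes)]
        for i in range(n_samples)
    ]


def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


# Made-up softmax outputs: per_model_probs[model][image][class].
per_model_probs = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]],
    [[0.7, 0.2, 0.1], [0.2, 0.3, 0.5]],
]

mean_probs = average_predictions(per_model_probs)
labels = [argmax(row) for row in mean_probs]
print(labels)
```

> For the second image the first model alone would pick class 1, while the averaged probabilities favor class 2: the ensemble follows the combined confidence of all models rather than any single one.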

In [None]:
submissions_file = "/kaggle/input/digit-recognizer/sample_submission.csv"
sample_submission = pd.read_csv(submissions_file)
sample_submission

In [None]:
image_ids = range(1, len(mean_prediction) + 1)
# Create a DataFrame with 'ImageId' and 'Label' columns
submission_df = pd.DataFrame({'ImageId': image_ids, 'Label': mean_prediction.argmax(axis=1)})


## Submitting to Kaggle

In [None]:
sample_submission['Label'] = submission_df['Label']
sample_submission.to_csv('/kaggle/working/submission.csv', index=False)
sample_submission.head()

# Scope for Improvement
1. Apply rotations and other image augmentation steps to the input images and check whether accuracy improves.
2. Swap EfficientNetV2B0 for a larger EfficientNet variant and check the accuracy.
3. Inspect the misclassified digits, then add similar-looking samples to the data (or duplicate them) to see whether that improves accuracy.
