# LAB 1: WORKING WITH PRE-TRAINED MODELS
## Machine Learning Hardware Course

---

## OVERVIEW

This lab introduces you to the practical application of pre-trained convolutional neural networks (CNNs) using the MNIST dataset. You will experiment with established architectures such as MobileNet, ResNet, and VGG, analyzing their performance, efficiency, and hardware requirements. Through hands-on implementation, you will gain experience with transfer learning, model adaptation, and quantitative evaluation of model characteristics.

---

## LEARNING OBJECTIVES

By the end of this lab, you will be able to:

1. Configure a Google Colab environment for deep learning development
2. Adapt pre-trained CNN architectures for the MNIST dataset
3. Compare multiple model architectures based on performance metrics
4. Evaluate the impact of model complexity on hardware requirements
5. Implement transfer learning techniques for efficient model adaptation
6. Quantitatively analyze model performance versus computational cost

---

## TIME ALLOCATION

Total time: 2 hours 10 minutes (130 minutes)

| Activity | Duration |
|----------|----------|
| Environment Setup | 15 minutes |
| Dataset Preparation | 15 minutes |
| Model Adaptation | 30 minutes |
| Model Evaluation | 30 minutes |
| Performance Analysis | 20 minutes |
| Worksheet Completion | 20 minutes |


# PART 1: ENVIRONMENT SETUP

In this section, we'll configure the Google Colab environment, import necessary libraries, and check for GPU availability.

## Mounting Google Drive

First, we'll mount Google Drive to save our work and create a directory for this lab.

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Create a directory for this lab
!mkdir -p "/content/drive/My Drive/ML_Hardware_Course/Lab1"

## Importing Required Libraries

Next, we'll import the necessary libraries for this lab, including TensorFlow, Keras, NumPy, Matplotlib, and other data analysis tools.

In [None]:
# Import basic libraries
import numpy as np
import matplotlib.pyplot as plt
import time
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report
import pandas as pd

# TensorFlow and Keras
import tensorflow as tf
from tensorflow.keras.datasets import mnist, fashion_mnist
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Flatten, Dropout, GlobalAveragePooling2D
from tensorflow.keras.layers import Conv2D, MaxPooling2D  # For model modification
from tensorflow.keras.applications import MobileNetV2, ResNet50, VGG16
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping

# Check TensorFlow version
print("TensorFlow version:", tf.__version__)

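# Check for GPU availability
print("GPU Available: ", tf.config.list_physical_devices('GPU'))
print("GPU Details:")
# A failed `!` shell command does not raise a Python exception,
# so check that the executable exists instead of using try/except
import shutil
if shutil.which("nvidia-smi"):
    !nvidia-smi
else:
    print("nvidia-smi command not available (likely not running on GPU)")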

# PART 2: MNIST DATASET PREPARATION

In this section, we'll load the MNIST dataset and prepare it for use with pre-trained models. The MNIST dataset contains grayscale images of handwritten digits (0-9), but pre-trained models expect RGB inputs with specific sizes. We need to adapt the data accordingly.

## Loading the MNIST Dataset

In [None]:
# Choose which dataset to use (uncomment one)
# Option 1: MNIST (Handwritten Digits)
(X_train, y_train), (X_test, y_test) = mnist.load_data()
class_names = [str(i) for i in range(10)]  # 0-9 digits

# Option 2: Fashion-MNIST (Clothing Items)
# (X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()
# class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
#                'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Print dataset shapes
print("Training data shape:", X_train.shape)
print("Training labels shape:", y_train.shape)
print("Test data shape:", X_test.shape)
print("Test labels shape:", y_test.shape)

## Visualizing Sample Images

Let's visualize some sample images from the MNIST dataset to better understand the data we're working with.

In [None]:
# Create a function to display multiple images
def display_sample_images(X, y, num_samples=10):
    plt.figure(figsize=(15, 3))
    for i in range(num_samples):
        plt.subplot(1, num_samples, i+1)
        plt.imshow(X[i], cmap='gray')
        plt.title(f"{class_names[y[i]]}")
        plt.axis('off')
    plt.tight_layout()
    plt.show()

# Display 10 sample images
display_sample_images(X_train, y_train)

## Preprocessing the MNIST Dataset

Now, we'll define a simple preprocessing function for the MNIST dataset. We'll normalize the pixel values to [0,1] and reshape the images to add a channel dimension, which is required for CNNs.
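To make the two steps concrete, here is a minimal sketch on a tiny stand-in batch rather than the real dataset. The array `fake_batch` and its pixel values are made up for illustration; only the normalization and the reshape mirror what the lab does.

```python
import numpy as np

# Hypothetical stand-in for two 28x28 grayscale images with uint8 pixels
fake_batch = np.zeros((2, 28, 28), dtype=np.uint8)
fake_batch[0, 0, 0] = 255   # brightest possible pixel
fake_batch[1, 5, 5] = 51    # an arbitrary darker pixel

# Step 1: scale pixel values from [0, 255] to [0, 1]
scaled = fake_batch.astype("float32") / 255.0

# Step 2: add a trailing channel dimension, (N, 28, 28) -> (N, 28, 28, 1)
with_channel = scaled.reshape(scaled.shape[0], 28, 28, 1)

print(with_channel.shape)   # (2, 28, 28, 1)
print(with_channel.dtype)   # float32
print(with_channel.max())   # 1.0
```

The trailing dimension of size 1 is what lets a `Conv2D` layer treat each image as a single-channel feature map.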

In [None]:
# Simple preprocessing function for MNIST
def preprocess_mnist_simple(X_train, X_test):
    """
    Simple preprocessing for MNIST, normalizing and reshaping to add channel dimension
    """
    # Normalize to [0,1]
    X_train = X_train.astype('float32') / 255.0
    X_test = X_test.astype('float32') / 255.0

    # Add channel dimension (28x28 -> 28x28x1)
    X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
    X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)

    return X_train, X_test

# Prepare the labels
y_train_encoded = to_categorical(y_train, 10)
y_test_encoded = to_categorical(y_test, 10)

# Create a validation set (20% of training data)
val_size = 12000  # 20% of 60,000
X_val = X_train[-val_size:]
y_val = y_train_encoded[-val_size:]
X_train_final = X_train[:-val_size]
y_train_final = y_train_encoded[:-val_size]

print("Training set size:", X_train_final.shape[0])
print("Validation set size:", X_val.shape[0])
print("Test set size:", X_test.shape[0])

# PART 3: MOBILENETV2 MODEL PREPARATION

In this section, we'll prepare the MobileNetV2 model for the MNIST dataset. MobileNetV2 is a lightweight architecture designed for mobile and edge devices, making it an excellent choice for efficiency-critical applications.

We need to adapt the model for our MNIST dataset, which requires addressing two key challenges:
1. MNIST images are grayscale (1 channel), while pre-trained models expect RGB images (3 channels)
2. MNIST images are 28x28 pixels, while pre-trained models often expect larger input sizes

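Our approach uses a zero-padding layer to grow each image from 28x28 to 32x32, followed by two small convolutional layers (a 3x3 convolution with 16 filters, then a 1x1 convolution with 3 filters) that map the single grayscale channel to the three channels the pre-trained model expects.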
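The arithmetic behind this adapter can be checked by hand. The sketch below assumes the layer sizes used in this lab's model-building code (padding of 2, a 3x3 convolution with 16 filters, then a 1x1 convolution with 3 filters); the helper names `padded_size` and `conv2d_params` are hypothetical and exist only for this illustration.

```python
def padded_size(size, padding):
    """Spatial size after symmetric zero padding on both sides."""
    return size + 2 * padding

def conv2d_params(kernel_size, in_channels, out_channels):
    """Weights plus one bias per output channel for a standard 2D convolution."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

side = padded_size(28, padding=2)
first_conv = conv2d_params(kernel_size=3, in_channels=1, out_channels=16)
second_conv = conv2d_params(kernel_size=1, in_channels=16, out_channels=3)

print(side)                      # 32
print(first_conv)                # 160
print(second_conv)               # 51
print(first_conv + second_conv)  # 211
```

The adapter therefore adds only a couple of hundred trainable parameters, which is negligible next to any of the pre-trained backbones.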

In [None]:
def create_mobilenet_model():
    """
    Create MobileNetV2 model with properly padded input for MNIST
    """
    # Preprocess data
    X_train_mobilenet, X_test_mobilenet = preprocess_mnist_simple(X_train_final, X_test)
    # The dummy array only satisfies the two-argument signature; its result is discarded
    X_val_mobilenet, _ = preprocess_mnist_simple(X_val, np.zeros((1, 28, 28)))

    # Create model architecture
    inputs = Input(shape=(28, 28, 1))

    # Pad the input from 28x28 to 32x32 using zero padding
    x = tf.keras.layers.ZeroPadding2D(padding=2)(inputs)  # Add 2 pixels on each side: 28x28 -> 32x32
    # Convert single-channel grayscale to 3-channel RGB format required by MobileNetV2
    x = Conv2D(16, kernel_size=3, padding='same', activation='relu')(x)
    x = Conv2D(3, kernel_size=1, padding='same', activation='relu')(x)  # Output 3 channels

    # Create sub-model with MobileNetV2
    base_model = MobileNetV2(
        include_top=False,
        weights='imagenet',
        input_shape=(32, 32, 3),
        pooling='avg'
    )

    # Set the MobileNetV2 layers as trainable, allowing fine-tuning on MNIST data
    base_model.trainable = True

    # To freeze the base model instead (feature extraction only), use:
    # base_model.trainable = False

    # Continue with the model architecture
    x = base_model(x)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.2)(x)
    outputs = Dense(10, activation='softmax')(x)

    # Combines the input and output layers into a single Keras model.
    mobilenet_model = Model(inputs, outputs)

    # Compile the model
    mobilenet_model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    # Display model summary
    mobilenet_model.summary()

    return mobilenet_model, X_train_mobilenet, X_val_mobilenet, X_test_mobilenet

# Create MobileNetV2 model
print("\n--- Creating MobileNetV2 Model ---")
mobilenet_model, X_train_mobilenet, X_val_mobilenet, X_test_mobilenet = create_mobilenet_model()

## Training MobileNetV2 Model

In [None]:
# Early stopping callback
early_stopping = EarlyStopping(
    monitor='val_accuracy',
    patience=3,
    restore_best_weights=True
)

# Train MobileNetV2 model
print("\n--- Training MobileNetV2 Model ---")
start_time = time.time()
mobilenet_history = mobilenet_model.fit(
    X_train_mobilenet,
    y_train_final,
    epochs=10,
    batch_size=64,
    validation_data=(X_val_mobilenet, y_val),
    callbacks=[early_stopping],
    verbose=1
)
mobilenet_training_time = time.time() - start_time
print(f"MobileNetV2 - Training completed in {mobilenet_training_time:.2f} seconds")
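The `EarlyStopping` callback above halts training once `val_accuracy` has failed to improve for `patience=3` consecutive epochs, and `restore_best_weights=True` rolls the model back to its best epoch. The following is a simplified pure-Python sketch of that stopping rule, not Keras's actual implementation; the function name `early_stopping_epoch` and the accuracy values are made up for illustration.

```python
def early_stopping_epoch(val_accuracies, patience=3):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Simplified rule: stop once `patience` consecutive epochs fail to
    beat the best validation accuracy seen so far.
    """
    best_value = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, value in enumerate(val_accuracies, start=1):
        if value > best_value:
            best_value = value
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Patience never ran out: training used every epoch
    return len(val_accuracies), best_epoch

# Made-up validation accuracies, one per epoch (not results from this lab)
history = [0.970, 0.981, 0.985, 0.984, 0.983, 0.985, 0.990]
print(early_stopping_epoch(history, patience=3))  # (6, 3)
```

In this toy run, training stops at epoch 6 and never reaches the better epoch 7, which shows the trade-off that a small `patience` makes between saved training time and missed late improvements.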


# PART 4: RESNET50 MODEL PREPARATION

Now, we'll prepare the ResNet50 model for the MNIST dataset. ResNet50 is a deeper architecture known for its residual connections that help address the vanishing gradient problem in deep networks.

Similar to the MobileNetV2 approach, we'll need to adapt the ResNet50 model to work with our MNIST dataset by addressing the channel and size discrepancies.
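As a toy illustration of the residual connection idea, the sketch below adds a block's input back onto its transformed output. It is not ResNet50's actual block, which uses convolutions, batch normalization, and ReLU; the functions `residual_block`, `zero_transform`, and `halve` are hypothetical stand-ins operating on plain Python lists.

```python
def residual_block(x, transform):
    """Return transform(x) + x, the skip-connection pattern."""
    fx = transform(x)
    return [f + xi for f, xi in zip(fx, x)]

def zero_transform(x):
    # A transform that contributes nothing: outputs all zeros
    return [0.0 for _ in x]

def halve(x):
    # An arbitrary non-trivial transform for comparison
    return [0.5 * xi for xi in x]

signal = [1.0, -2.0, 3.0]
print(residual_block(signal, zero_transform))  # [1.0, -2.0, 3.0]
print(residual_block(signal, halve))           # [1.5, -3.0, 4.5]
```

Even when the transform outputs zeros, the block passes its input through unchanged. That identity path is what gives gradients a direct route through many stacked blocks.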

In [None]:
def create_resnet_model():
    """
    Create ResNet50 model with properly padded input for MNIST
    """
    # Preprocess data
    X_train_resnet, X_test_resnet = preprocess_mnist_simple(X_train_final, X_test)
    # The dummy array only satisfies the two-argument signature; its result is discarded
    X_val_resnet, _ = preprocess_mnist_simple(X_val, np.zeros((1, 28, 28)))

    # Create model architecture
    inputs = Input(shape=(28, 28, 1))

    # Pad the input from 28x28 to 32x32
    x = tf.keras.layers.ZeroPadding2D(padding=2)(inputs)  # Add 2 pixels on each side

    # Convert single-channel to 3-channel input
    x = Conv2D(16, kernel_size=3, padding='same', activation='relu')(x)
    x = Conv2D(3, kernel_size=1, padding='same', activation='relu')(x)  # Output 3 channels

    # Load ResNet50 with proper input shape
    base_model = ResNet50(
        include_top=False,
        weights='imagenet',
        input_shape=(32, 32, 3),
        pooling='avg'
    )

    # Sets the ResNet50 layers as trainable, allowing fine-tuning on MNIST data
    base_model.trainable = True

    # Continue with model architecture
    x = base_model(x)
    x = Dense(256, activation='relu')(x)
    x = Dropout(0.3)(x)
    outputs = Dense(10, activation='softmax')(x)

    resnet_model = Model(inputs, outputs)

    # Compile the model
    resnet_model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    # Display model summary
    resnet_model.summary()

    return resnet_model, X_train_resnet, X_val_resnet, X_test_resnet

# Create ResNet50 model
print("\n--- Creating ResNet50 Model ---")
resnet_model, X_train_resnet, X_val_resnet, X_test_resnet = create_resnet_model()

## Training ResNet50 Model

In [None]:
# Train ResNet50 model
print("\n--- Training ResNet50 Model ---")
start_time = time.time()
resnet_history = resnet_model.fit(
    X_train_resnet,
    y_train_final,
    epochs=10,
    batch_size=32,  # Smaller batch size due to larger model
    validation_data=(X_val_resnet, y_val),
    callbacks=[early_stopping],
    verbose=1
)
resnet_training_time = time.time() - start_time
print(f"ResNet50 - Training completed in {resnet_training_time:.2f} seconds")

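# PART 5: VGG16 MODEL PREPARATION

Finally, we'll prepare the VGG16 model for the MNIST dataset, using the same padding and channel-conversion approach as for the previous two models.

If the accuracy is low after training with Adam, change the optimizer to SGD in the `compile` call inside `create_vgg_model`, as shown below, then recreate and retrain the model.

```
vgg_model.compile(
    optimizer='sgd',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
```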


In [None]:
def create_vgg_model():
    """
    Create VGG16 model with properly padded input for MNIST
    """
    # Preprocess data
    X_train_vgg, X_test_vgg = preprocess_mnist_simple(X_train_final, X_test)
    # The dummy array only satisfies the two-argument signature; its result is discarded
    X_val_vgg, _ = preprocess_mnist_simple(X_val, np.zeros((1, 28, 28)))

    # Create model architecture
    inputs = Input(shape=(28, 28, 1))

    # Pad the input from 28x28 to 32x32
    x = tf.keras.layers.ZeroPadding2D(padding=2)(inputs)  # Add 2 pixels on each side

    # Convert single-channel to 3-channel
    x = Conv2D(16, kernel_size=3, padding='same', activation='relu')(x)
    x = Conv2D(3, kernel_size=1, padding='same', activation='relu')(x)  # Output 3 channels

    # Load VGG16 with proper input shape
    base_model = VGG16(
        include_top=False,
        weights='imagenet',
        input_shape=(32, 32, 3),
        pooling='avg'
    )

    # Sets the VGG16 layers as trainable, allowing fine-tuning on MNIST data
    base_model.trainable = True

    # Continue with model architecture
    x = base_model(x)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.3)(x)
    outputs = Dense(10, activation='softmax')(x)

    vgg_model = Model(inputs, outputs)

    # Compile the model
    vgg_model.compile(
        optimizer='adam',  # switch to 'sgd' if accuracy stays low
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    # Display model summary
    vgg_model.summary()

    return vgg_model, X_train_vgg, X_val_vgg, X_test_vgg


# Create VGG16 model
print("\n--- Creating VGG16 Model ---")
vgg_model, X_train_vgg, X_val_vgg, X_test_vgg = create_vgg_model()


## Training VGG16 Model

In [None]:
# Train VGG16 model
print("\n--- Training VGG16 Model ---")
start_time = time.time()
vgg_history = vgg_model.fit(
    X_train_vgg,
    y_train_final,
    epochs=10,
    batch_size=64,
    validation_data=(X_val_vgg, y_val),
    callbacks=[early_stopping],
    verbose=1
)
vgg_training_time = time.time() - start_time
print(f"VGG16 - Training completed in {vgg_training_time:.2f} seconds")


# PART 7: MODEL EVALUATION

Now that we have trained all three models, let's evaluate their performance on the test set and visualize the training history.

In [None]:
# Evaluate models on test set
print("\n--- Model Evaluation on Test Set ---")
mobilenet_loss, mobilenet_accuracy = mobilenet_model.evaluate(X_test_mobilenet, y_test_encoded)
print(f"MobileNetV2 - Test accuracy: {mobilenet_accuracy*100:.2f}%")

resnet_loss, resnet_accuracy = resnet_model.evaluate(X_test_resnet, y_test_encoded)
print(f"ResNet50 - Test accuracy: {resnet_accuracy*100:.2f}%")

vgg_loss, vgg_accuracy = vgg_model.evaluate(X_test_vgg, y_test_encoded)
print(f"VGG16 - Test accuracy: {vgg_accuracy*100:.2f}%")

## Visualizing Training History

Let's visualize the training history of all three models to compare their learning curves.

In [None]:
# Plot training history
def plot_training_history(histories, titles):
    plt.figure(figsize=(15, 5))

    # Plot accuracy
    plt.subplot(1, 2, 1)
    for history, title in zip(histories, titles):
        plt.plot(history.history['accuracy'], label=f'{title} - Training')
        plt.plot(history.history['val_accuracy'], label=f'{title} - Validation')

    plt.title('Model Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.grid(True)

    # Plot loss
    plt.subplot(1, 2, 2)
    for history, title in zip(histories, titles):
        plt.plot(history.history['loss'], label=f'{title} - Training')
        plt.plot(history.history['val_loss'], label=f'{title} - Validation')

    plt.title('Model Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True)

    plt.tight_layout()
    plt.show()

# Plot training history for all models
plot_training_history(
    [mobilenet_history, resnet_history, vgg_history],
    ['MobileNetV2', 'ResNet50', 'VGG16']
)

# PART 8: CONFUSION MATRICES AND CLASSIFICATION REPORTS

In this section, we'll analyze the performance of each model in more detail by generating confusion matrices and classification reports. This will help us understand which digits are most frequently misclassified by each model.
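Before running the full analysis, it may help to see the idea on a toy problem. The sketch below builds a confusion table by hand and finds its largest off-diagonal entry, which is the "most confused pair". The helper names and the three-class labels are invented for illustration; the lab itself uses scikit-learn's `confusion_matrix` on the ten digit classes.

```python
def confusion_counts(y_true, y_pred, num_classes):
    """Build a table where rows are true labels and columns are predictions."""
    table = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        table[t][p] += 1
    return table

def most_confused_pair(table):
    """Return (true_label, predicted_label, count) for the largest off-diagonal entry."""
    best = None
    for t, row in enumerate(table):
        for p, count in enumerate(row):
            if t != p and (best is None or count > best[2]):
                best = (t, p, count)
    return best

# Made-up labels for a three-class problem
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 2, 2, 1, 1, 2]

table = confusion_counts(y_true, y_pred, num_classes=3)
print(table)                      # [[2, 0, 0], [0, 1, 1], [0, 2, 2]]
print(most_confused_pair(table))  # (2, 1, 2)
```

Here class 2 is mistaken for class 1 twice, more often than any other error, so that pair is reported.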

In [None]:
# Function to generate predictions and confusion matrix
def analyze_model_performance(model, X_test, y_test, model_name):
    # Generate predictions
    y_pred = model.predict(X_test)
    y_pred_classes = np.argmax(y_pred, axis=1)
    y_true_classes = np.argmax(y_test, axis=1)

    # Create confusion matrix
    cm = confusion_matrix(y_true_classes, y_pred_classes)
    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', cbar=False)
    plt.title(f'{model_name} - Confusion Matrix')
    plt.ylabel('True Label')
    plt.xlabel('Predicted Label')
    plt.show()

    # Find the most confused pair (largest off-diagonal entry)
    cm_off_diagonal = cm.copy()
    np.fill_diagonal(cm_off_diagonal, 0)  # Ignore correct predictions
    max_confusion = np.unravel_index(np.argmax(cm_off_diagonal), cm_off_diagonal.shape)
    print(f"Most confused pair: True digit {max_confusion[0]} predicted as {max_confusion[1]} ({cm_off_diagonal[max_confusion]} times)")

    # Generate classification report
    report = classification_report(y_true_classes, y_pred_classes, output_dict=True)
    report_df = pd.DataFrame(report).transpose()
    print(f"{model_name} Classification Report:")
    print(report_df.round(3))

    return y_pred_classes, report, max_confusion

# Analyze each model's performance
print("\n--- MobileNetV2 Performance Analysis ---")
mobilenet_pred, mobilenet_report, mobilenet_confused_pair = analyze_model_performance(
    mobilenet_model, X_test_mobilenet, y_test_encoded, 'MobileNetV2'
)

print("\n--- ResNet50 Performance Analysis ---")
resnet_pred, resnet_report, resnet_confused_pair = analyze_model_performance(
    resnet_model, X_test_resnet, y_test_encoded, 'ResNet50'
)

print("\n--- VGG16 Performance Analysis ---")
vgg_pred, vgg_report, vgg_confused_pair = analyze_model_performance(
    vgg_model, X_test_vgg, y_test_encoded, 'VGG16'
)

# PART 9: MODEL METRICS COMPARISON

Now, let's compare the models based on various metrics such as parameter count, training time, and inference time. This will help us understand the trade-offs between model complexity, performance, and efficiency.
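Latency measurements are easy to distort with one-time startup costs, so a common pattern is to run a few untimed warm-up calls first and then average over several timed calls. The sketch below shows that pattern on a stand-in workload; `average_latency_ms` and `dummy_workload` are hypothetical names, and the workload is plain Python arithmetic rather than a real model.

```python
import time

def average_latency_ms(fn, warmup_runs=3, timed_runs=10):
    """Average wall-clock time of fn() in milliseconds, after untimed warm-up calls."""
    for _ in range(warmup_runs):
        fn()
    start = time.perf_counter()
    for _ in range(timed_runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / timed_runs * 1000.0

def dummy_workload():
    # Stand-in for a model's forward pass
    return sum(i * i for i in range(10_000))

latency = average_latency_ms(dummy_workload)
print(f"{latency:.3f} ms per call")
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock intended for measuring short durations.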

In [None]:
def count_model_parameters(model):
    # Cast to int so an empty weight list (np.sum([]) returns 0.0) cannot turn the totals into floats
    trainable_params = int(np.sum([np.prod(v.shape) for v in model.trainable_weights]))
    non_trainable_params = int(np.sum([np.prod(v.shape) for v in model.non_trainable_weights]))
    total_params = trainable_params + non_trainable_params
    return trainable_params, non_trainable_params, total_params

# Get parameter counts for each model
mobilenet_trainable, mobilenet_non_trainable, mobilenet_total = count_model_parameters(mobilenet_model)
resnet_trainable, resnet_non_trainable, resnet_total = count_model_parameters(resnet_model)
vgg_trainable, vgg_non_trainable, vgg_total = count_model_parameters(vgg_model)

# Function to measure inference time
def measure_inference_time(model, X_test, batch_size=1, num_runs=50):
    # Warm-up runs (excluded from timing); verbose=0 suppresses the progress bars
    for _ in range(10):
        _ = model.predict(X_test[:batch_size], verbose=0)

    # Measure time for inference with a monotonic, high-resolution clock
    start_time = time.perf_counter()
    for _ in range(num_runs):
        _ = model.predict(X_test[:batch_size], verbose=0)
    total_time = time.perf_counter() - start_time

    # Calculate average inference time per batch
    avg_time = total_time / num_runs
    return avg_time * 1000  # Convert to milliseconds

# Measure inference time for each model (single image)
mobilenet_inference_time = measure_inference_time(mobilenet_model, X_test_mobilenet)
resnet_inference_time = measure_inference_time(resnet_model, X_test_resnet)
vgg_inference_time = measure_inference_time(vgg_model, X_test_vgg)

print("\n--- Single Image Inference Time ---")
print(f"MobileNetV2 - Inference time (1 image): {mobilenet_inference_time:.2f} ms")
print(f"ResNet50 - Inference time (1 image): {resnet_inference_time:.2f} ms")
print(f"VGG16 - Inference time (1 image): {vgg_inference_time:.2f} ms")

# Measure inference time for batch of 32 images
mobilenet_batch_time = measure_inference_time(mobilenet_model, X_test_mobilenet, batch_size=32, num_runs=20)
resnet_batch_time = measure_inference_time(resnet_model, X_test_resnet, batch_size=32, num_runs=20)
vgg_batch_time = measure_inference_time(vgg_model, X_test_vgg, batch_size=32, num_runs=20)

print("\n--- Batch Inference Time (32 images) ---")
print(f"MobileNetV2 - Inference time (32 images): {mobilenet_batch_time:.2f} ms")
print(f"ResNet50 - Inference time (32 images): {resnet_batch_time:.2f} ms")
print(f"VGG16 - Inference time (32 images): {vgg_batch_time:.2f} ms")

## Creating Comparison Tables

Let's create a comprehensive table that compares all the models based on various metrics.

In [None]:
# Create comparison table
model_metrics = {
    'Model': ['MobileNetV2', 'ResNet50', 'VGG16'],
    'Test Accuracy (%)': [
        mobilenet_accuracy * 100,
        resnet_accuracy * 100,
        vgg_accuracy * 100
    ],
    'Trainable Parameters': [
        mobilenet_trainable,
        resnet_trainable,
        vgg_trainable
    ],
    'Total Parameters': [
        mobilenet_total,
        resnet_total,
        vgg_total
    ],
    'Training Time (s)': [
        mobilenet_training_time,
        resnet_training_time,
        vgg_training_time
    ],
    'Inference Time (ms)': [
        mobilenet_inference_time,
        resnet_inference_time,
        vgg_inference_time
    ],
    'Batch Inference Time (ms)': [
        mobilenet_batch_time,
        resnet_batch_time,
        vgg_batch_time
    ],
    'Parameters/Second': [
        mobilenet_total / mobilenet_training_time,
        resnet_total / resnet_training_time,
        vgg_total / vgg_training_time
    ],
    'Accuracy/Million Params': [
        (mobilenet_accuracy * 100) / (mobilenet_total / 1e6),
        (resnet_accuracy * 100) / (resnet_total / 1e6),
        (vgg_accuracy * 100) / (vgg_total / 1e6)
    ],
    'Most Confused Pair': [
        f"{mobilenet_confused_pair[0]}-{mobilenet_confused_pair[1]}",
        f"{resnet_confused_pair[0]}-{resnet_confused_pair[1]}",
        f"{vgg_confused_pair[0]}-{vgg_confused_pair[1]}"
    ]
}

# Create and display DataFrame
metrics_df = pd.DataFrame(model_metrics).set_index('Model')
print("\n--- Model Comparison Metrics ---")
pd.set_option('display.float_format', '{:.2f}'.format)
print(metrics_df)

# PART 10: VISUALIZATION OF MODEL COMPARISONS

Let's create visual comparisons of the models to better understand their performance characteristics.

In [None]:
# Visualize the metrics
plt.figure(figsize=(15, 10))

# Accuracy comparison
plt.subplot(2, 2, 1)
plt.bar(model_metrics['Model'], model_metrics['Test Accuracy (%)'])
plt.title('Test Accuracy (%)')
plt.ylim(90, 100)  # Adjust as needed
plt.grid(axis='y')

# Training time comparison
plt.subplot(2, 2, 2)
plt.bar(model_metrics['Model'], model_metrics['Training Time (s)'])
plt.title('Training Time (seconds)')
plt.grid(axis='y')

# Parameter count comparison (log scale)
plt.subplot(2, 2, 3)
plt.bar(model_metrics['Model'], [np.log10(p) for p in model_metrics['Total Parameters']])
plt.title('Log10(Total Parameters)')
plt.grid(axis='y')

# Efficiency comparison
plt.subplot(2, 2, 4)
plt.bar(model_metrics['Model'], model_metrics['Accuracy/Million Params'])
plt.title('Accuracy/Million Parameters')
plt.grid(axis='y')

plt.tight_layout()
plt.show()

# Inference time comparison
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.bar(model_metrics['Model'], model_metrics['Inference Time (ms)'])
plt.title('Single Image Inference Time (ms)')
plt.grid(axis='y')

plt.subplot(1, 2, 2)
plt.bar(model_metrics['Model'], model_metrics['Batch Inference Time (ms)'])
plt.title('Batch Inference Time (32 images, ms)')
plt.grid(axis='y')

plt.tight_layout()
plt.show()

# PART 11: WORKSHEET VALUES SUMMARY

In this section, we'll extract the key metrics and findings that you'll need to complete the graded worksheet for this lab.

In [None]:
print("\n===== WORKSHEET VALUES =====")

print("\n1.1 Basic Performance Metrics:")
for model_name, trainable, total, accuracy, train_time, infer_time in zip(
    model_metrics['Model'],
    model_metrics['Trainable Parameters'],
    model_metrics['Total Parameters'],
    model_metrics['Test Accuracy (%)'],
    model_metrics['Training Time (s)'],
    model_metrics['Inference Time (ms)']
):
    print(f"\n{model_name}:")
    print(f"  Trainable Parameters: {trainable}")
    print(f"  Total Parameters: {total}")
    print(f"  Test Accuracy: {accuracy:.2f}%")
    print(f"  Training Time: {train_time:.2f} seconds")
    print(f"  Inference Time: {infer_time:.2f} ms")

print("\n1.2 Efficiency Metrics:")
for model_name, params_per_sec, acc_per_mil, batch_time in zip(
    model_metrics['Model'],
    model_metrics['Parameters/Second'],
    model_metrics['Accuracy/Million Params'],
    model_metrics['Batch Inference Time (ms)']
):
    print(f"\n{model_name}:")
    print(f"  Parameters/Second: {params_per_sec:.2f}")
    print(f"  Accuracy/Million Params: {acc_per_mil:.2f}")
    print(f"  Batch Inference Time: {batch_time:.2f} ms")

print("\n2.1 Most Confused Digit Pairs:")
for model_name, confused_pair in zip(
    model_metrics['Model'],
    model_metrics['Most Confused Pair']
):
    print(f"  {model_name}: {confused_pair}")

# Get the best performing model
best_model_idx = np.argmax(model_metrics['Test Accuracy (%)'])
best_model_name = model_metrics['Model'][best_model_idx]
best_model_report = [mobilenet_report, resnet_report, vgg_report][best_model_idx]

print(f"\n2.2 Per-Class Precision for Best Model ({best_model_name}):")
for digit in range(10):
    print(f"  Digit {digit}: {best_model_report[str(digit)]['precision']:.4f}")

## Model Comparison Table for Worksheet

Below is a formatted table of the key metrics that you need to include in your worksheet submission:

In [None]:
# Create a results summary DataFrame for the worksheet
results_df = pd.DataFrame({
    "Model": model_metrics['Model'],
    "Test Accuracy (%)": model_metrics['Test Accuracy (%)'],
    "Total Parameters": model_metrics['Total Parameters'],
    "Training Time (s)": model_metrics['Training Time (s)'],
    "Inference Time (ms)": model_metrics['Inference Time (ms)'],
    "Accuracy/Million Params": model_metrics['Accuracy/Million Params']
})

print("\n--- Results Summary for Worksheet ---")
print(results_df)

# PART 12: SAVE MODELS AND RESULTS

Now that we've completed all the analyses, let's save our models and a JSON summary of the results for future reference.

In [None]:
# Save models to Google Drive
mobilenet_model.save("/content/drive/My Drive/ML_Hardware_Course/Lab1/mobilenet_mnist.h5")
resnet_model.save("/content/drive/My Drive/ML_Hardware_Course/Lab1/resnet_mnist.h5")
vgg_model.save("/content/drive/My Drive/ML_Hardware_Course/Lab1/vgg_mnist.h5")
print("Models saved successfully!")

# Create a results summary file
results_summary = {
    "basic_metrics": {
        "mobilenet": {
            "trainable_params": int(mobilenet_trainable),
            "total_params": int(mobilenet_total),
            "test_accuracy": float(mobilenet_accuracy * 100),
            "training_time": float(mobilenet_training_time),
            "inference_time": float(mobilenet_inference_time),
        },
        "resnet": {
            "trainable_params": int(resnet_trainable),
            "total_params": int(resnet_total),
            "test_accuracy": float(resnet_accuracy * 100),
            "training_time": float(resnet_training_time),
            "inference_time": float(resnet_inference_time),
        },
        "vgg": {
            "trainable_params": int(vgg_trainable),
            "total_params": int(vgg_total),
            "test_accuracy": float(vgg_accuracy * 100),
            "training_time": float(vgg_training_time),
            "inference_time": float(vgg_inference_time),
        }
    },
    "efficiency_metrics": {
        "mobilenet": {
            "params_per_second": float(mobilenet_total / mobilenet_training_time),
            "accuracy_per_million_params": float((mobilenet_accuracy * 100) / (mobilenet_total / 1e6)),
            "batch_inference_time": float(mobilenet_batch_time),
        },
        "resnet": {
            "params_per_second": float(resnet_total / resnet_training_time),
            "accuracy_per_million_params": float((resnet_accuracy * 100) / (resnet_total / 1e6)),
            "batch_inference_time": float(resnet_batch_time),
        },
        "vgg": {
            "params_per_second": float(vgg_total / vgg_training_time),
            "accuracy_per_million_params": float((vgg_accuracy * 100) / (vgg_total / 1e6)),
            "batch_inference_time": float(vgg_batch_time),
        }
    },
    "confusion_pairs": {
        "mobilenet": {
            "pair": f"{mobilenet_confused_pair[0]}-{mobilenet_confused_pair[1]}",
            "count": int(confusion_matrix(np.argmax(y_test_encoded, axis=1), mobilenet_pred)[mobilenet_confused_pair])
        },
        "resnet": {
            "pair": f"{resnet_confused_pair[0]}-{resnet_confused_pair[1]}",
            "count": int(confusion_matrix(np.argmax(y_test_encoded, axis=1), resnet_pred)[resnet_confused_pair])
        },
        "vgg": {
            "pair": f"{vgg_confused_pair[0]}-{vgg_confused_pair[1]}",
            "count": int(confusion_matrix(np.argmax(y_test_encoded, axis=1), vgg_pred)[vgg_confused_pair])
        }
    },
    "best_model": {
        "name": best_model_name,
        "precision_by_digit": {str(digit): best_model_report[str(digit)]['precision'] for digit in range(10)}
    }
}

# Save results as JSON
import json
with open("/content/drive/My Drive/ML_Hardware_Course/Lab1/results_summary.json", "w") as f:
    json.dump(results_summary, f, indent=4)
print("Results summary saved successfully!")

# PART 13: ANALYSIS QUESTIONS DISCUSSION

Based on the experimental results, let's address the analysis questions posed in the lab worksheet. These discussions can help you formulate your answers for the worksheet submission.

## 1. Which model provides the best balance between accuracy and computational efficiency? Why?

To answer this question, consider metrics like:
- Test accuracy
- Inference time (especially important for deployment)
- Number of parameters (affects memory usage)
- Accuracy per million parameters (efficiency metric)

Looking at our results, MobileNetV2 likely provides the best balance because it was specifically designed for mobile and edge devices where computational resources are limited, while still maintaining good accuracy.
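To see how the accuracy-per-million-parameters metric behaves, here is a small worked sketch. The parameter counts are approximate Keras backbone sizes with `include_top=False` and exclude the adapter and classifier layers added in this lab, so your measured totals will be somewhat larger. The accuracy value is a placeholder assumption, not a result from this lab.

```python
def accuracy_per_million_params(accuracy_percent, total_params):
    """Accuracy (in percent) divided by the parameter count in millions."""
    return accuracy_percent / (total_params / 1e6)

# Approximate backbone sizes (include_top=False); assumed for illustration
backbone_params = {"MobileNetV2": 2.26e6, "ResNet50": 23.59e6, "VGG16": 14.71e6}

# Placeholder: pretend every model reaches the same accuracy
placeholder_accuracy = 99.0

for name, params in backbone_params.items():
    score = accuracy_per_million_params(placeholder_accuracy, params)
    print(f"{name}: {score:.1f}")
```

When accuracies are nearly equal, as is plausible on a simple dataset, this metric is driven almost entirely by model size, so use it alongside raw accuracy and latency rather than on its own.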

## 2. How does model size affect training time versus inference time? Explain the differences observed.

Compare the relationship between:
- Total parameters vs. training time
- Total parameters vs. inference time

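Generally, larger models take longer to train and have higher inference times than smaller models such as MobileNetV2. However, the relationship isn't always linear, because architecture matters as much as raw size. Note also that with `include_top=False`, as used in this lab, ResNet50 (roughly 23.6 million backbone parameters) is larger than VGG16 (roughly 14.7 million), since most of VGG16's parameters sit in the fully connected layers that are dropped here. Check the totals in your own comparison table before drawing conclusions.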

## 3. Why might you choose MobileNetV2 over ResNet50 or VGG16 for a mobile application?

Consider factors like:
- Memory footprint (parameter count)
- Inference speed
- Power consumption implications
- Acceptable accuracy trade-offs for mobile scenarios

MobileNetV2 was specifically designed with mobile devices in mind, using techniques like depthwise separable convolutions to reduce computational complexity while maintaining reasonable accuracy.
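The saving from depthwise separable convolutions can be estimated with simple arithmetic. The sketch below compares weight counts for a standard convolution and its depthwise separable counterpart, ignoring biases. The layer sizes are illustrative assumptions and are not taken from MobileNetV2 itself.

```python
def standard_conv_weights(kernel_size, in_channels, out_channels):
    """Every output channel has its own k x k filter over all input channels."""
    return kernel_size * kernel_size * in_channels * out_channels

def depthwise_separable_weights(kernel_size, in_channels, out_channels):
    """One k x k filter per input channel, then a 1x1 convolution to mix channels."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise

# Illustrative layer sizes (assumed, not from MobileNetV2)
k, c_in, c_out = 3, 64, 128

standard = standard_conv_weights(k, c_in, c_out)
separable = depthwise_separable_weights(k, c_in, c_out)

print(standard)                        # 73728
print(separable)                       # 8768
print(round(standard / separable, 1))  # 8.4
```

For this layer the separable version needs roughly an eighth of the weights, and the multiply-accumulate count shrinks by the same factor at a given output resolution.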

## 4. What hardware factors significantly impact the performance of these pre-trained models?

Consider:
- GPU vs. CPU differences
- Memory bandwidth
- Cache size effects
- Parallel processing capabilities
- Model quantization benefits

The performance of deep learning models is highly dependent on hardware acceleration (like GPUs), memory bandwidth, and the ability to parallelize operations, especially for larger models like ResNet50 and VGG16.
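One hardware-facing quantity that is easy to estimate is the memory needed just to hold a model's weights, which also hints at why quantization helps. The sketch below is a rough estimate under stated assumptions: the parameter count is hypothetical, and activations, optimizer state, and framework overhead are ignored.

```python
def weight_memory_mb(num_params, bytes_per_param):
    """Memory for the weights alone, in mebibytes (1 MB = 1024 * 1024 bytes here)."""
    return num_params * bytes_per_param / (1024 ** 2)

# Hypothetical model size, chosen only for illustration
num_params = 25_000_000

fp32 = weight_memory_mb(num_params, bytes_per_param=4)  # 32-bit floats
int8 = weight_memory_mb(num_params, bytes_per_param=1)  # 8-bit quantized weights

print(f"{fp32:.1f} MB")  # 95.4 MB
print(f"{int8:.1f} MB")  # 23.8 MB
```

Moving from 32-bit floats to 8-bit integers cuts weight storage by a factor of four, which reduces both the memory footprint and the memory bandwidth needed per inference.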

# CONCLUSION

In this lab, we explored three pre-trained CNN architectures (MobileNetV2, ResNet50, and VGG16) and adapted them for the MNIST dataset. We compared their performance in terms of accuracy, training time, inference time, and model size.

Key takeaways from this lab:

1. **Transfer Learning Effectiveness**: We successfully applied transfer learning to adapt pre-trained ImageNet models to a different domain (handwritten digit recognition).

2. **Performance-Efficiency Trade-offs**: We observed the trade-offs between model size, accuracy, and computational efficiency across different architectures.

3. **Adaptation Challenges**: We addressed the challenges of adapting pre-trained models to work with different input sizes and channel dimensions.

4. **Quantitative Evaluation**: We performed detailed quantitative evaluation of model performance and efficiency metrics.

This knowledge will be valuable when selecting appropriate model architectures for different deployment scenarios, especially when hardware constraints are a consideration.