# First Practical Laboratory: Deep Learning Architecture Experimentation

**Machine Learning Technologies (MUCEIM)**

**Student Name:** Borja Albert Gramaje

**Date:** 23/11/2025

---

## Instructions

1. **Make a copy** of this notebook to your own Google Drive (`File > Save a copy in Drive`) and make sure you select a runtime with a GPU
2. **Fill in all empty code cells** as instructed
3. **Document your analysis** in the markdown cells provided
4. **Ensure the entire notebook runs** from top to bottom without errors (`Runtime > Restart and run all`)
5. **Share the final notebook** with "Anyone with the link can view" and include the link in your PDF report

---

## 1. Import Libraries

Import all necessary libraries for your experiments. Common libraries include TensorFlow/Keras, NumPy, Matplotlib, and Pandas.

In [None]:
# Import necessary libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import SGD, Adam
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Check TensorFlow version
print(f"TensorFlow version: {tf.__version__}")

# Check if GPU is available
print(f"GPU available: {tf.config.list_physical_devices('GPU')}")

In [None]:
# --- Helper Functions for Plotting ---
def plot_acc(history, title="Model Accuracy"):
    plt.plot(history.history['accuracy'])
    plt.plot(history.history['val_accuracy'])
    plt.title(title)
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'], loc='upper left')
    plt.show()

def plot_loss(history, title="Model Loss"):
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title(title)
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'], loc='upper right')
    plt.show()

def plot_compare_losses(history1, history2, name1="Model 1",
                        name2="Model 2", title="Graph title"):
    # Solid lines: training; dashed lines: validation
    plt.plot(history1.history['loss'], color="green")
    plt.plot(history1.history['val_loss'], linestyle='--', color="green")
    plt.plot(history2.history['loss'], color="blue")
    plt.plot(history2.history['val_loss'], linestyle='--', color="blue")
    plt.title(title)
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train ' + name1, 'Val ' + name1,
                'Train ' + name2, 'Val ' + name2],
               loc='upper right')
    plt.show()

def plot_compare_accs(history1, history2, name1="Model 1",
                      name2="Model 2", title="Graph title"):
    # Solid lines: training; dashed lines: validation
    plt.plot(history1.history['accuracy'], color="green")
    plt.plot(history1.history['val_accuracy'], linestyle='--', color="green")
    plt.plot(history2.history['accuracy'], color="blue")
    plt.plot(history2.history['val_accuracy'], linestyle='--', color="blue")
    plt.title(title)
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend(['Train ' + name1, 'Val ' + name1,
                'Train ' + name2, 'Val ' + name2],
               loc='lower right')
    plt.show()

## 2. Dataset Selection and Loading

Choose your dataset from the options provided in the assignment document:
- MNIST
- Fashion MNIST
- CIFAR-10
- Custom dataset (with justification)

Load and inspect the data.

In [None]:
# Load Fashion MNIST dataset
from tensorflow.keras.datasets import fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

In [None]:
# Inspect the dataset
print(f"Training data shape: {x_train.shape}")
print(f"Test data shape: {x_test.shape}")
print(f"Number of classes: {len(np.unique(y_train))}")

# Visualize sample images
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[y_train[i]])
plt.show()

### Dataset Choice Justification

**Dataset Selected:** Fashion MNIST

**Justification:** I chose Fashion MNIST because it is a drop-in replacement for MNIST but offers a slightly more challenging classification task. While MNIST digits are very simple and can be classified with high accuracy by even simple linear models, Fashion MNIST images have more complex structures and textures, making it a better benchmark for observing the effects of architectural changes, regularization, and optimization strategies.

## 3. Data Preprocessing

Apply necessary preprocessing steps:
- Normalization (e.g., scaling pixel values to [0,1])
- One-hot encoding for labels (if needed)
- Train/validation split
- Any dataset-specific preprocessing

In [None]:
# Preprocessing code

# 1. Normalize pixel values to be between 0 and 1
x_train = x_train / 255.0
x_test = x_test / 255.0

# 2. Reshape data (flatten 28x28 images to 784-element vectors for the MLP)
x_train = x_train.reshape(-1, 784)
x_test = x_test.reshape(-1, 784)

# 3. Cast to float32
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# 4. One-hot encoding of labels
num_classes = len(np.unique(y_train))
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

print("Preprocessing complete.")
print(f"New x_train shape: {x_train.shape}")
print(f"New y_train shape: {y_train.shape}")
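
The preprocessing list mentions a train/validation split, but the cell above does not create one, and the training cells later pass the test set as `validation_data`. A minimal sketch of one way to hold out part of the training data follows. It uses small synthetic arrays in place of the real data; the helper name `train_val_split` and the 10% fraction are assumptions, not part of the assignment.

```python
import numpy as np

def train_val_split(x, y, val_fraction=0.1, seed=42):
    """Shuffle indices once and hold out val_fraction of the samples."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_val = int(len(x) * val_fraction)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx])

# Synthetic stand-ins with the same layout as the flattened images and one-hot labels
x_demo = np.random.default_rng(0).random((100, 784)).astype("float32")
y_demo = np.eye(10, dtype="float32")[np.arange(100) % 10]

(x_tr, y_tr), (x_val, y_val) = train_val_split(x_demo, y_demo)
print(x_tr.shape, x_val.shape)
```

With a split like this, `validation_data=(x_val, y_val)` could drive early stopping while `x_test` stays untouched until the final evaluation.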

## 4. Baseline Model

Define, compile, and train a simple baseline model. This will serve as your point of comparison for all subsequent experiments.

**Baseline Architecture Description:**
The baseline model is a Multi-Layer Perceptron (MLP) with 3 hidden layers using Sigmoid activation functions.
- Input Layer: 784 neurons (flattened image)
- Hidden Layer 1: 128 neurons, Sigmoid
- Hidden Layer 2: 128 neurons, Sigmoid
- Hidden Layer 3: 64 neurons, Sigmoid
- Output Layer: 10 neurons, Softmax
- Optimizer: SGD
- Loss: Categorical Crossentropy
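
As a sanity check against `model.summary()`, the number of trainable parameters implied by the layer sizes listed above can be computed by hand. This is a small sketch; `dense_param_count` is a hypothetical helper, not a Keras function.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Input, three hidden layers, output
baseline_sizes = [784, 128, 128, 64, 10]
print(dense_param_count(baseline_sizes))  # 125898
```

Most of these parameters (100,480 of them) sit in the first layer, because it connects all 784 input pixels to 128 units.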

In [None]:
# Define baseline model
def create_baseline_model():
    model = Sequential()
    model.add(Dense(128, activation='sigmoid', input_shape=(784,)))
    model.add(Dense(128, activation='sigmoid'))
    model.add(Dense(64, activation='sigmoid'))
    model.add(Dense(num_classes, activation='softmax'))
    return model

model_baseline = create_baseline_model()
model_baseline.summary()

In [None]:
# Compile baseline model
model_baseline.compile(loss='categorical_crossentropy',
                       optimizer='sgd',
                       metrics=['accuracy'])
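
To make the chosen loss concrete, the sketch below computes categorical cross-entropy for one-hot labels with NumPy. It illustrates what the loss measures and is not Keras's own implementation; the clipping constant `eps` is an assumption added to avoid `log(0)`.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# One sample whose true class is index 1, predicted with probability 0.8
y_true_demo = np.array([[0.0, 1.0, 0.0]])
y_pred_demo = np.array([[0.1, 0.8, 0.1]])
print(categorical_crossentropy(y_true_demo, y_pred_demo))  # about 0.2231, i.e. -ln(0.8)
```

With one-hot labels, only the probability assigned to the correct class contributes to the loss.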

In [None]:
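# Train baseline model
# EarlyStopping halts training once val_loss has not improved for 5 epochs
# and restores the weights from the best epoch.
# Note: validation_data below is the test set, so early stopping and test metrics are not independent.
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)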

print("Training Baseline Model...")
history_baseline = model_baseline.fit(x_train, y_train,
                                      batch_size=32,
                                      epochs=100,
                                      validation_data=(x_test, y_test),
                                      callbacks=[early_stopping],
                                      verbose=1)

# Plot baseline results
plot_acc(history_baseline, title="Baseline Model Accuracy")
plot_loss(history_baseline, title="Baseline Model Loss")
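
The effect of `patience=5` with `restore_best_weights=True` can be traced in plain Python. The function below is a simplified sketch of the stopping rule, not the Keras source: it treats any strict decrease as an improvement. The loss values are made up for illustration.

```python
def simulate_early_stopping(val_losses, patience=5):
    """Return (epoch at which training stops, epoch whose weights are kept).

    Epochs are 0-indexed. Simplified: any strict decrease counts as an improvement.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses, made up for illustration
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.6]
print(simulate_early_stopping(losses))  # (7, 2)
```

In this example training stops at epoch 7, so the later improvement to 0.6 is never reached. A larger patience would trade training time for a chance to find it.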

## 5. Systematic Experimentation (SGD Optimizer)

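Conduct experiments by modifying **one** aspect at a time: Experiment 1 changes the baseline's activation function, and Experiments 2 and 3 each change one aspect of the Experiment 1 (ReLU) model. All experiments in this section use **SGD**.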

### Experiment 1: Comparison of Activation Functions (Sigmoid vs ReLU)

**Goal:** Compare the performance of Sigmoid (Baseline) vs ReLU activation functions.
**Configuration:** Same architecture as baseline (128-128-64), but replace Sigmoid with **ReLU**. Optimizer: **SGD**.
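
One commonly cited reason to expect a difference is the size of the activation derivatives that backpropagation multiplies together. The sketch below looks only at the activation factor and ignores the weights, so it illustrates the intuition rather than predicting the outcome of this experiment.

```python
import math

def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum value is 0.25, reached at z = 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

hidden_layers = 3  # the 128-128-64 stack
# Largest possible product of activation derivatives along one path
print(sigmoid_grad(0.0) ** hidden_layers)  # 0.015625
print(relu_grad(1.0) ** hidden_layers)     # 1.0
```

Even in the best case, three Sigmoid layers scale the gradient by at most 0.25³, while active ReLU units pass it through unchanged.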

In [None]:
# Define Experiment 1 Model (ReLU)
def create_exp1_model():
    model = Sequential()
    model.add(Dense(128, activation='relu', input_shape=(784,)))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    return model

model_exp1 = create_exp1_model()
model_exp1.compile(loss='categorical_crossentropy',
                   optimizer=SGD(learning_rate=0.01), # 0.01 is also the Keras default for SGD, so this matches optimizer='sgd'
                   metrics=['accuracy'])

print("Training Experiment 1 (ReLU)...")
history_exp1 = model_exp1.fit(x_train, y_train,
                              batch_size=32,
                              epochs=100,
                              validation_data=(x_test, y_test),
                              callbacks=[early_stopping],
                              verbose=1)

plot_acc(history_exp1, title="Exp 1: ReLU Accuracy")
plot_loss(history_exp1, title="Exp 1: ReLU Loss")

### Experiment 2: Effect of Network Depth

**Goal:** Test if a deeper network improves performance with ReLU activation.
**Configuration:** Increase hidden layers to 5 (128-128-128-128-64). Activation: **ReLU**. Optimizer: **SGD**.

In [None]:
# Define Experiment 2 Model (Deep Network)
def create_exp2_model():
    model = Sequential()
    model.add(Dense(128, activation='relu', input_shape=(784,)))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    return model

model_exp2 = create_exp2_model()
model_exp2.compile(loss='categorical_crossentropy',
                   optimizer='sgd',
                   metrics=['accuracy'])

print("Training Experiment 2 (Deep Network)...")
history_exp2 = model_exp2.fit(x_train, y_train,
                              batch_size=32,
                              epochs=100,
                              validation_data=(x_test, y_test),
                              callbacks=[early_stopping],
                              verbose=1)

plot_acc(history_exp2, title="Exp 2: Deep Network Accuracy")
plot_loss(history_exp2, title="Exp 2: Deep Network Loss")

### Experiment 3: Effect of Dropout Regularization

**Goal:** Test if Dropout helps reduce overfitting.
**Configuration:** Same 128-128-64 architecture as Experiment 1 (ReLU), with a Dropout layer (rate=0.2) after each hidden layer. Optimizer: **SGD**.
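
For intuition about what a Dropout layer with rate 0.2 does, here is a minimal NumPy sketch of inverted dropout: a random 20% of activations are zeroed during training and the rest are scaled by 1/(1 - rate), while inference leaves the input unchanged. The function `dropout_forward` is a hypothetical illustration, not the Keras layer.

```python
import numpy as np

def dropout_forward(x, rate=0.2, training=True, rng=None):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return x
    if rng is None:
        rng = np.random.default_rng()
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

# Constant activations make the effect of the mask easy to read off
activations = np.ones((1000, 128), dtype="float32")
dropped = dropout_forward(activations, rate=0.2, rng=np.random.default_rng(42))
print(f"fraction zeroed: {np.mean(dropped == 0):.3f}")
print(f"mean activation: {dropped.mean():.3f}")
```

The rescaling keeps the expected activation the same in training and inference, which is why no correction is needed at test time.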

In [None]:
# Define Experiment 3 Model (Dropout)
def create_exp3_model():
    model = Sequential()
    model.add(Dense(128, activation='relu', input_shape=(784,)))
    model.add(Dropout(0.2))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(num_classes, activation='softmax'))
    return model

model_exp3 = create_exp3_model()
model_exp3.compile(loss='categorical_crossentropy',
                   optimizer='sgd',
                   metrics=['accuracy'])

print("Training Experiment 3 (Dropout)...")
history_exp3 = model_exp3.fit(x_train, y_train,
                              batch_size=32,
                              epochs=100,
                              validation_data=(x_test, y_test),
                              callbacks=[early_stopping],
                              verbose=1)

plot_acc(history_exp3, title="Exp 3: Dropout Accuracy")
plot_loss(history_exp3, title="Exp 3: Dropout Loss")

## 6. Optimization Experiments (Adam Optimizer)

In this section, we will repeat the previous three experiments (Activation, Depth, Regularization) but using the **Adam** optimizer instead of SGD. This will allow us to directly compare the impact of the optimizer on different architectures.

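**Adam (Adaptive Moment Estimation)** adapts the step size of each parameter using running estimates of the first and second moments of the gradient. It often converges in fewer epochs than plain SGD with a fixed learning rate, although faster convergence does not by itself guarantee better generalization.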
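
The difference between the two update rules can be sketched for a single scalar parameter. The code below uses the textbook form of Adam with the default hyperparameters of `keras.optimizers.Adam` (learning rate 0.001, beta_1 0.9, beta_2 0.999, epsilon 1e-7); it is an illustration, not the Keras implementation, and the gradient values are arbitrary.

```python
import math

def sgd_step(param, grad, lr=0.01):
    """Plain SGD: the step is proportional to the gradient."""
    return param - lr * grad

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for zero initialization
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Compare the first step for a very large and a very small gradient
for grad in (100.0, 0.01):
    sgd_delta = abs(sgd_step(0.0, grad))
    adam_param, _, _ = adam_step(0.0, grad, m=0.0, v=0.0, t=1)
    print(f"grad={grad}: SGD step={sgd_delta:.6f}, Adam step={abs(adam_param):.6f}")
```

The SGD step scales with the gradient, while the first Adam step is close to the learning rate in both cases. This normalization is one reason Adam tends to be less sensitive to gradient scale.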

### Experiment 4: ReLU Activation with Adam

**Goal:** Compare the performance of the ReLU model (from Exp 1) when trained with Adam vs SGD.
**Configuration:** 3 Hidden Layers (128-128-64), ReLU Activation, **Adam Optimizer**.

In [None]:
# Define Experiment 4 Model (ReLU + Adam)
def create_exp4_model():
    model = Sequential()
    model.add(Dense(128, activation='relu', input_shape=(784,)))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    return model

model_exp4 = create_exp4_model()
model_exp4.compile(loss='categorical_crossentropy',
                   optimizer='adam',
                   metrics=['accuracy'])

print("Training Experiment 4 (ReLU + Adam)...")
history_exp4 = model_exp4.fit(x_train, y_train,
                              batch_size=32,
                              epochs=100,
                              validation_data=(x_test, y_test),
                              callbacks=[early_stopping],
                              verbose=1)

plot_acc(history_exp4, title="Exp 4: ReLU + Adam Accuracy")
plot_loss(history_exp4, title="Exp 4: ReLU + Adam Loss")

### Experiment 5: Deep Network (5 Layers) with Adam

**Goal:** Compare the performance of the Deep ReLU model (from Exp 2) when trained with Adam vs SGD.
**Configuration:** 5 Hidden Layers (128-128-128-128-64), ReLU Activation, **Adam Optimizer**.

In [None]:
# Define Experiment 5 Model (Deep ReLU + Adam)
def create_exp5_model():
    model = Sequential()
    model.add(Dense(128, activation='relu', input_shape=(784,)))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(128, activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    return model

model_exp5 = create_exp5_model()
model_exp5.compile(loss='categorical_crossentropy',
                   optimizer='adam',
                   metrics=['accuracy'])

print("Training Experiment 5 (Deep ReLU + Adam)...")
history_exp5 = model_exp5.fit(x_train, y_train,
                              batch_size=32,
                              epochs=100,
                              validation_data=(x_test, y_test),
                              callbacks=[early_stopping],
                              verbose=1)

plot_acc(history_exp5, title="Exp 5: Deep ReLU + Adam Accuracy")
plot_loss(history_exp5, title="Exp 5: Deep ReLU + Adam Loss")

### Experiment 6: Dropout Regularization with Adam

**Goal:** Compare the performance of the Dropout model (from Exp 3) when trained with Adam vs SGD.
**Configuration:** 3 Hidden Layers (128-128-64) with Dropout (0.2), ReLU Activation, **Adam Optimizer**.

In [None]:
# Define Experiment 6 Model (Dropout + Adam)
def create_exp6_model():
    model = Sequential()
    model.add(Dense(128, activation='relu', input_shape=(784,)))
    model.add(Dropout(0.2))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(num_classes, activation='softmax'))
    return model

model_exp6 = create_exp6_model()
model_exp6.compile(loss='categorical_crossentropy',
                   optimizer='adam',
                   metrics=['accuracy'])

print("Training Experiment 6 (Dropout + Adam)...")
history_exp6 = model_exp6.fit(x_train, y_train,
                              batch_size=32,
                              epochs=100,
                              validation_data=(x_test, y_test),
                              callbacks=[early_stopping],
                              verbose=1)

plot_acc(history_exp6, title="Exp 6: Dropout + Adam Accuracy")
plot_loss(history_exp6, title="Exp 6: Dropout + Adam Loss")

## 7. Comprehensive Comparison

### 7.1 Optimizer Comparison (SGD vs Adam)

We compare the same architectures with different optimizers.

In [None]:
# Compare Exp 1 (ReLU SGD) vs Exp 4 (ReLU Adam)
plot_compare_accs(history_exp1, history_exp4, name1="Exp 1 (ReLU SGD)", name2="Exp 4 (ReLU Adam)", title="ReLU: SGD vs Adam")

# Compare Exp 2 (Depth SGD) vs Exp 5 (Depth Adam)
plot_compare_accs(history_exp2, history_exp5, name1="Exp 2 (Depth SGD)", name2="Exp 5 (Depth Adam)", title="Depth: SGD vs Adam")

# Compare Exp 3 (Dropout SGD) vs Exp 6 (Dropout Adam)
plot_compare_accs(history_exp3, history_exp6, name1="Exp 3 (Dropout SGD)", name2="Exp 6 (Dropout Adam)", title="Dropout: SGD vs Adam")

### 7.2 Master Comparison Table

Below is a summary of all experiments conducted.

In [None]:
results = {
    "Baseline (Sigmoid/SGD)": model_baseline.evaluate(x_test, y_test, verbose=0),
    "Exp 1 (ReLU/SGD)": model_exp1.evaluate(x_test, y_test, verbose=0),
    "Exp 2 (Depth/SGD)": model_exp2.evaluate(x_test, y_test, verbose=0),
    "Exp 3 (Dropout/SGD)": model_exp3.evaluate(x_test, y_test, verbose=0),
    "Exp 4 (ReLU/Adam)": model_exp4.evaluate(x_test, y_test, verbose=0),
    "Exp 5 (Depth/Adam)": model_exp5.evaluate(x_test, y_test, verbose=0),
    "Exp 6 (Dropout/Adam)": model_exp6.evaluate(x_test, y_test, verbose=0)
}

print(f"{'Experiment':<30} | {'Test Loss':<10} | {'Test Accuracy':<15}")
print("-"*60)
for name, metrics in results.items():
    # evaluate() returns [loss, accuracy]; pad the loss to the header's column width
    print(f"{name:<30} | {metrics[0]:<10.4f} | {metrics[1]*100:.2f}%")

## 8. Conclusion

**Write your conclusions here based on the Master Comparison Table.**

Consider discussing:
1.  **Optimizer Impact:** How much did Adam improve performance compared to SGD across the different architectures? Did it converge faster?
2.  **Activation Functions:** Did ReLU outperform Sigmoid (Baseline vs Exp 1)? Note that no Sigmoid model was trained with Adam, so this comparison is only available under SGD.
3.  **Regularization:** Did Dropout combined with Adam (Exp 6) yield the best generalization (lowest difference between train/val accuracy)?
4.  **Best Overall Model:** Which combination of architecture, activation, and optimizer achieved the highest test accuracy?
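
For the regularization question, the train/validation gap can be read from a `history.history` dictionary at the epoch with the lowest `val_loss`, which is the epoch whose weights `restore_best_weights=True` keeps. The helper below is a hypothetical sketch, and the numbers in `demo_history` are placeholders, not results from this notebook.

```python
def generalization_gap(history_dict):
    """Train-minus-validation accuracy at the epoch with the lowest val_loss."""
    val_losses = history_dict["val_loss"]
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    gap = history_dict["accuracy"][best_epoch] - history_dict["val_accuracy"][best_epoch]
    return best_epoch, gap

# Placeholder numbers in the shape of `history.history`; NOT results from this notebook
demo_history = {
    "accuracy":     [0.80, 0.86, 0.90, 0.93],
    "val_accuracy": [0.81, 0.85, 0.87, 0.87],
    "val_loss":     [0.52, 0.41, 0.36, 0.38],
}
epoch, gap = generalization_gap(demo_history)
print(epoch, round(gap, 2))  # 2 0.03
```

Calling it as `generalization_gap(history_exp6.history)` and comparing against the other experiments would give a like-for-like number for each model.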

## 9. AI Usage Documentation

**Prompts used:**
- "Modify the notebook to use Fashion MNIST."
- "Create a baseline model with Sigmoid activation."
- "Create experiments for Depth, Dropout, and ReLU vs Sigmoid."
- "Repeat experiments using Adam optimizer and compare results."
- "Generate comparison plots and tables."

**Modifications made:**
- The code was structured into clear sections for each experiment.
- Helper functions were used for consistent plotting.
- A master comparison table was added to summarize all findings.