# First Practical Laboratory: Deep Learning Architecture Experimentation

**Machine Learning Technologies (MUCEIM)**

**Student Name:** [Your Full Name Here]

**Date:** [Submission Date]

---

## Instructions

1. **Make a copy** of this notebook to your own Google Drive (`File > Save a copy in Drive`) and select a GPU runtime (`Runtime > Change runtime type`)
2. **Fill in all empty code cells** as instructed
3. **Document your analysis** in the markdown cells provided
4. **Ensure the entire notebook runs** from top to bottom without errors (`Runtime > Restart and run all`)
5. **Share the final notebook** with "Anyone with the link can view" and include the link in your PDF report

---

## 1. Import Libraries

Import all necessary libraries for your experiments. Common libraries include TensorFlow/Keras, NumPy, Matplotlib, and Pandas.

In [None]:
# Import necessary libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import SGD, Adam
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Check TensorFlow version
print(f"TensorFlow version: {tf.__version__}")

# Check if GPU is available
print(f"GPU available: {tf.config.list_physical_devices('GPU')}")

# --- Helper Functions for Plotting ---
def plot_acc(history, title="Model Accuracy"):
    plt.plot(history.history['accuracy'])
    plt.plot(history.history['val_accuracy'])
    plt.title(title)
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'], loc='upper left')
    plt.show()

def plot_loss(history, title="Model Loss"):
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title(title)
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train', 'Val'], loc='upper right')
    plt.show()

def plot_compare_losses(history1, history2, name1="Model 1",
                        name2="Model 2", title="Graph title"):
    plt.plot(history1.history['loss'], color="green")
    plt.plot(history1.history['val_loss'], '--', color="green")  # dashed = validation
    plt.plot(history2.history['loss'], color="blue")
    plt.plot(history2.history['val_loss'], '--', color="blue")
    plt.title(title)
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend(['Train ' + name1, 'Val ' + name1,
                'Train ' + name2, 'Val ' + name2],
               loc='upper right')
    plt.show()

def plot_compare_accs(history1, history2, name1="Model 1",
                      name2="Model 2", title="Graph title"):
    plt.plot(history1.history['accuracy'], color="green")
    plt.plot(history1.history['val_accuracy'], '--', color="green")  # dashed = validation
    plt.plot(history2.history['accuracy'], color="blue")
    plt.plot(history2.history['val_accuracy'], '--', color="blue")
    plt.title(title)
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend(['Train ' + name1, 'Val ' + name1,
                'Train ' + name2, 'Val ' + name2],
               loc='lower right')
    plt.show()

## 2. Dataset Selection and Loading

Choose your dataset from the options provided in the assignment document:
- MNIST
- Fashion MNIST
- CIFAR-10
- Custom dataset (with justification)

Load and inspect the data.

In [None]:
# Load Fashion MNIST dataset
from tensorflow.keras.datasets import fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

In [None]:
# Inspect the dataset
print(f"Training data shape: {x_train.shape}")
print(f"Test data shape: {x_test.shape}")
print(f"Number of classes: {len(np.unique(y_train))}")

# Visualize sample images
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[y_train[i]])
plt.show()

### Dataset Choice Justification

**Dataset Selected:** Fashion MNIST

**Justification:** I chose Fashion MNIST because it is a drop-in replacement for MNIST (same 28×28 grayscale format, 10 classes, and 60,000/10,000 train/test split) but offers a slightly more challenging classification task. MNIST digits are simple enough that even linear models classify them with high accuracy, whereas Fashion MNIST images have more complex shapes and textures, and some classes (for example Shirt, T-shirt/top, Pullover, and Coat) look alike. This makes it a better benchmark for observing the effects of architectural changes, regularization, and optimization strategies.

## 3. Data Preprocessing

Apply necessary preprocessing steps:
- Normalization (e.g., scaling pixel values to [0,1])
- One-hot encoding for labels (if needed)
- Train/validation split
- Any dataset-specific preprocessing
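A minimal NumPy sketch of these steps on a tiny synthetic batch (the images and labels are made up, not Fashion MNIST). Indexing an identity matrix with `np.eye(num_classes)[labels]` gives the same one-hot matrix that `keras.utils.to_categorical` returns for integer labels, and the final split imitates `validation_split=0.2` in `model.fit`, which holds out the last fraction of the samples without shuffling them first.

```python
import numpy as np

# Hypothetical mini-batch: 5 grayscale 28x28 images (uint8 pixels 0-255) and labels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
labels = np.array([9, 0, 0, 3, 0])

# Normalization: cast to float32 and scale pixels into [0, 1]
x = images.astype(np.float32) / 255.0

# Flatten each 28x28 image into a 784-dimensional vector for an MLP
x = x.reshape(-1, 28 * 28)

# One-hot encoding: row k of the identity matrix represents class k
num_classes = 10
y = np.eye(num_classes, dtype=np.float32)[labels]

# Train/validation split: hold out the last 20% of the samples,
# similar to validation_split=0.2 (which does not shuffle first)
n_val = int(len(x) * 0.2)
split_at = len(x) - n_val
x_tr, x_val = x[:split_at], x[split_at:]
y_tr, y_val = y[:split_at], y[split_at:]

print(x.shape, x.dtype)       # (5, 784) float32
print(y[0])                   # one-hot vector with a 1 at index 9
print(len(x_tr), len(x_val))  # 4 1
```

Applied to the real training set, the same split leaves 48,000 images for training and 12,000 for validation.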

In [None]:
# Preprocessing code

# 1. Cast to float32 and normalize pixel values to the range [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# 2. Flatten each 28x28 image into a 784-dimensional vector for the MLP
x_train = x_train.reshape(-1, 28 * 28)
x_test = x_test.reshape(-1, 28 * 28)

# 3. One-hot encode the integer labels (0-9) into 10-dimensional vectors
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

# 4. Train/validation split: no separate array is created here. Every
#    model.fit call below uses validation_split=0.2, which holds out the
#    last 20% of the training samples (12,000 images) for validation.

print("Preprocessing complete.")
print(f"New x_train shape: {x_train.shape}")
print(f"New y_train shape: {y_train.shape}")

## 4. Baseline Model

Define, compile, and train a simple baseline model. This will serve as your point of comparison for all subsequent experiments.

**Baseline Architecture Description:**
The baseline model is a Multi-Layer Perceptron (MLP) with 3 hidden layers using ReLU activation functions.
- Input: 784 features (flattened 28×28 image)
- Hidden Layer 1: 128 neurons, ReLU
- Hidden Layer 2: 128 neurons, ReLU
- Hidden Layer 3: 64 neurons, ReLU
- Output Layer: 10 neurons, Softmax
- Optimizer: SGD (Keras default, learning rate 0.01)
- Loss: Categorical Crossentropy
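The parameter counts that `model_baseline.summary()` reports can be checked by hand: a Dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A short pure-Python sketch (the helper name `dense_params` is hypothetical):

```python
# Parameters of a fully connected (Dense) layer: weights (n_in * n_out) + biases (n_out)
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

# Layer widths of the baseline MLP, from the input features to the output classes
widths = [784, 128, 128, 64, 10]

per_layer = [dense_params(a, b) for a, b in zip(widths[:-1], widths[1:])]
total = sum(per_layer)

print(per_layer)  # [100480, 16512, 8256, 650]
print(total)      # 125898
```

Most of the capacity sits in the first layer, because every one of the 784 input pixels connects to each of the 128 units.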

In [None]:
# Define baseline model
def create_baseline_model():
    model = Sequential([
        Dense(128, activation='relu', input_shape=(784,)),
        Dense(128, activation='relu'),
        Dense(64, activation='relu'),
        Dense(num_classes, activation='softmax')
    ])
    return model

model_baseline = create_baseline_model()
model_baseline.summary()

In [None]:
# Compile baseline model
model_baseline.compile(loss='categorical_crossentropy',
                       optimizer='sgd',
                       metrics=['accuracy'])

In [None]:
# Train baseline model
batch_size = 32
epochs = 100  # Set high, rely on early stopping

early_stopping = EarlyStopping(monitor='val_accuracy', patience=5, restore_best_weights=True)

history_baseline = model_baseline.fit(x_train, y_train,
                                      batch_size=batch_size,
                                      epochs=epochs,
                                      verbose=1,
                                      validation_split=0.2,
                                      callbacks=[early_stopping])
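
The behaviour of `EarlyStopping(monitor='val_accuracy', patience=5, restore_best_weights=True)` can be illustrated with a simplified pure-Python sketch. The function name `early_stopping_epoch` and the accuracy values are hypothetical, and options such as `min_delta` are not modeled: training stops once `patience` consecutive epochs fail to beat the best validation accuracy, and `restore_best_weights=True` then reloads the weights from the best epoch.

```python
def early_stopping_epoch(val_scores, patience=5):
    """Return (best_epoch, stop_epoch), 0-indexed, for a metric to maximize.

    Simplified patience logic with min_delta = 0: a score only counts as an
    improvement if it is strictly greater than the best score so far.
    """
    best, best_epoch, wait = float("-inf"), 0, 0
    for epoch, score in enumerate(val_scores):
        if score > best:
            best, best_epoch, wait = score, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_scores) - 1  # never triggered: all epochs ran

# Hypothetical validation-accuracy curve that peaks at epoch 3 (0-indexed)
scores = [0.80, 0.83, 0.85, 0.86, 0.855, 0.858, 0.86, 0.857, 0.859]
print(early_stopping_epoch(scores, patience=5))  # (3, 8)
```

Here training runs for 9 epochs, so the history would contain 9 entries even though the restored weights come from the 4th epoch. This is why the "Epochs Trained" count in the comparison table includes the patience epochs.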

### Analysis of Baseline Model

Plot the training history and evaluate the model. Analyze its performance characteristics.

In [None]:
# Plot training history (loss and accuracy)
plot_loss(history_baseline, title='Baseline Model Loss')
plot_acc(history_baseline, title='Baseline Model Accuracy')

In [None]:
# Evaluate baseline model on test set
test_loss, test_acc = model_baseline.evaluate(x_test, y_test, verbose=0)
print(f"Baseline Test Accuracy: {test_acc:.4f}")
print(f"Baseline Test Loss: {test_loss:.4f}")

**Baseline Model Analysis:**

The baseline model uses ReLU activations, which generally train better than sigmoid in deeper networks because they help mitigate the vanishing gradient problem. We inspect the training and validation curves for signs of overfitting: if training accuracy keeps rising while validation accuracy plateaus or declines, the model is overfitting.
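A small NumPy sketch of why ReLU helps with vanishing gradients: the sigmoid derivative never exceeds 0.25, while the ReLU derivative is exactly 1 for positive inputs. This illustrates only the activation derivatives and ignores the weight matrices that also enter backpropagation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)                  # largest value is 0.25, reached at z = 0

def relu_grad(z):
    return (z > 0).astype(np.float64)     # 1 for positive inputs, 0 otherwise

z = np.linspace(-6.0, 6.0, 1201)
print(sigmoid_grad(np.array(0.0)))        # 0.25
print(sigmoid_grad(z).max() <= 0.25)      # True
print(relu_grad(np.array([-1.0, 2.0])))   # [0. 1.]

# Ignoring the weights, backprop through 3 sigmoid layers multiplies the
# gradient by at most 0.25 ** 3; for active ReLU units the factor is 1.
print(0.25 ** 3)                          # 0.015625
```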

## 5. Systematic Experimentation

Conduct at least **THREE** systematic experiments. For each experiment:
1. State your hypothesis clearly
2. Implement the architectural variation
3. Train and evaluate the model
4. Plot and analyze the results
5. Compare with the baseline

---

### Experiment 1: Effect of Network Depth

**Hypothesis:** Increasing the network depth by adding more hidden layers will allow the model to learn more complex hierarchical features, potentially improving accuracy. However, it may also make the model harder to train or more prone to overfitting.

In [None]:
# Define Experiment 1 model (Deeper Network)
def create_deep_model():
    model = Sequential([
        Dense(128, activation='relu', input_shape=(784,)),
        Dense(128, activation='relu'),
        Dense(128, activation='relu'), # Added layer
        Dense(128, activation='relu'), # Added layer
        Dense(64, activation='relu'),
        Dense(num_classes, activation='softmax')
    ])
    return model

model_depth = create_deep_model()
model_depth.summary()

In [None]:
# Compile and train Experiment 1 model
model_depth.compile(loss='categorical_crossentropy',
                    optimizer='sgd',
                    metrics=['accuracy'])

history_depth = model_depth.fit(x_train, y_train,
                                batch_size=batch_size,
                                epochs=epochs,
                                verbose=1,
                                validation_split=0.2,
                                callbacks=[early_stopping])

#### Analysis of Experiment 1

In [None]:
# Plot results and compare with baseline
plot_loss(history_depth, title='Deep Model Loss')
plot_acc(history_depth, title='Deep Model Accuracy')

plot_compare_accs(history_baseline, history_depth, name1="Baseline", name2="Deep Model", title="Baseline vs Deep Model Accuracy")

**Experiment 1 Analysis:**

We compare the validation accuracy of the deeper model against the baseline. If the deeper model achieves higher accuracy, it suggests the extra capacity was beneficial. If it performs worse or overfits earlier, the added complexity might be unnecessary for this specific task.
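One way to make "overfits earlier" concrete is to find the epoch at which validation loss bottoms out and measure how much it rises afterwards. A pure-Python sketch that works on any dict shaped like `history.history`; the function name `overfitting_report` is hypothetical and the numbers below are made up for illustration.

```python
def overfitting_report(hist):
    """Summarize a Keras-style history dict (e.g. history_depth.history)."""
    val_loss = hist["val_loss"]
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        "best_val_acc": max(hist["val_accuracy"]),
        "epoch_of_min_val_loss": best_epoch + 1,       # 1-indexed, as in the Keras logs
        "epochs_after_min": len(val_loss) - 1 - best_epoch,
        "val_loss_increase": val_loss[-1] - val_loss[best_epoch],
    }

# Made-up history for illustration only
fake = {
    "val_loss": [0.60, 0.48, 0.42, 0.40, 0.41, 0.43, 0.46],
    "val_accuracy": [0.78, 0.83, 0.85, 0.86, 0.86, 0.855, 0.85],
}
print(overfitting_report(fake))
```

Calling it on `history_baseline.history` and `history_depth.history` gives a side-by-side view: an earlier minimum and a larger rise afterwards point to the model that overfits sooner.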

---

### Experiment 2: Effect of Dropout Regularization

**Hypothesis:** Adding Dropout layers will reduce overfitting by preventing neurons from co-adapting too much. This should lead to a smaller gap between training and validation accuracy, potentially improving generalization.
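The Keras `Dropout` layer zeroes inputs with probability `rate` during training and scales the surviving inputs by `1 / (1 - rate)`; at inference it passes inputs through unchanged. A NumPy sketch of this "inverted dropout" idea (not Keras's actual implementation; the helper name `dropout` and the all-ones activations are hypothetical):

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the survivors."""
    if not training or rate == 0.0:
        return x                                   # inference: identity
    keep = rng.random(x.shape) >= rate             # keep each unit with prob 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)   # rescale so E[output] == input

rng = np.random.default_rng(42)
activations = np.ones((1000, 128))                 # hypothetical layer output

train_out = dropout(activations, 0.3, training=True, rng=rng)
test_out = dropout(activations, 0.3, training=False, rng=rng)

print((train_out == 0).mean())                 # close to 0.3
print(train_out.mean())                        # close to 1.0 (expectation preserved)
print(np.array_equal(test_out, activations))   # True
```

Because each unit can vanish on any step, the next layer cannot rely on a specific partner unit, which is the co-adaptation argument in the hypothesis.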

In [None]:
# Define Experiment 2 model (With Dropout)
def create_dropout_model():
    model = Sequential([
        Dense(128, activation='relu', input_shape=(784,)),
        Dropout(0.3), # Dropout layer
        Dense(128, activation='relu'),
        Dropout(0.3), # Dropout layer
        Dense(64, activation='relu'),
        Dropout(0.3), # Dropout layer
        Dense(num_classes, activation='softmax')
    ])
    return model

model_dropout = create_dropout_model()
model_dropout.summary()

In [None]:
# Compile and train Experiment 2 model
model_dropout.compile(loss='categorical_crossentropy',
                      optimizer='sgd',
                      metrics=['accuracy'])

history_dropout = model_dropout.fit(x_train, y_train,
                                    batch_size=batch_size,
                                    epochs=epochs,
                                    verbose=1,
                                    validation_split=0.2,
                                    callbacks=[early_stopping])

#### Analysis of Experiment 2

In [None]:
# Plot results and compare with baseline
plot_loss(history_dropout, title='Dropout Model Loss')
plot_acc(history_dropout, title='Dropout Model Accuracy')

plot_compare_accs(history_baseline, history_dropout, name1="Baseline", name2="Dropout", title="Baseline vs Dropout Accuracy")

**Experiment 2 Analysis:**

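Dropout typically slows convergence but yields a more robust model. We look for a smaller gap between the training and validation curves than in the baseline. Note that Keras computes the training metrics with dropout active, while validation metrics are computed with dropout disabled, so the training accuracy curve may even sit below the validation curve.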
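The train/validation gap can be computed per epoch directly from the recorded histories. A pure-Python sketch; the function name `generalization_gap` is hypothetical and the two histories are made up for illustration.

```python
def generalization_gap(hist, metric="accuracy"):
    """Per-epoch difference between the training and validation values of `metric`."""
    return [tr - va for tr, va in zip(hist[metric], hist["val_" + metric])]

# Made-up histories shaped like history.history, for illustration only
baseline_like = {"accuracy": [0.80, 0.87, 0.90, 0.93],
                 "val_accuracy": [0.84, 0.86, 0.87, 0.87]}
dropout_like = {"accuracy": [0.74, 0.82, 0.85, 0.86],
                "val_accuracy": [0.82, 0.85, 0.86, 0.87]}

print([round(g, 3) for g in generalization_gap(baseline_like)])
print([round(g, 3) for g in generalization_gap(dropout_like)])
```

A growing positive gap signals overfitting; a gap that stays small or negative, as in the dropout-like example, reflects both regularization and the fact that training metrics are measured with dropout switched on. On the real runs, pass `history_baseline.history` and `history_dropout.history`.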

---

### Experiment 3: Comparison of Optimizers (Adam vs SGD)

**Hypothesis:** The Adam optimizer, which uses adaptive learning rates, will converge faster and potentially achieve higher accuracy than the standard SGD optimizer used in the baseline.
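The "adaptive learning rate" in Adam comes from dividing a running mean of gradients by the square root of a running mean of squared gradients, so the step size barely depends on the gradient's scale. A NumPy sketch of the textbook update (Kingma & Ba, Algorithm 1) for a single scalar parameter, using the Keras default hyperparameters (SGD learning rate 0.01; Adam learning rate 0.001, `beta_1=0.9`, `beta_2=0.999`, `epsilon=1e-7`). Keras may arrange the bias correction and `epsilon` slightly differently internally, and the helper names are hypothetical.

```python
import numpy as np

def sgd_updates(grads, lr=0.01):
    """Plain SGD: each update is proportional to the raw gradient."""
    return [-lr * g for g in grads]

def adam_updates(grads, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Textbook Adam updates for a single scalar parameter."""
    m = v = 0.0
    updates = []
    for t, g in enumerate(grads, start=1):
        m = beta_1 * m + (1 - beta_1) * g        # moving average of gradients
        v = beta_2 * v + (1 - beta_2) * g * g    # moving average of squared gradients
        m_hat = m / (1 - beta_1 ** t)            # correct the bias from zero init
        v_hat = v / (1 - beta_2 ** t)
        updates.append(-lr * m_hat / (np.sqrt(v_hat) + epsilon))
    return updates

# The same constant gradient at two very different scales, 5 steps each
small, large = [0.001] * 5, [10.0] * 5

print("SGD :", sgd_updates(small)[0], sgd_updates(large)[0])    # differ by 10,000x
print("Adam:", adam_updates(small)[0], adam_updates(large)[0])  # both about -0.001
```

With SGD, a single global learning rate has to suit both tiny and large gradients; Adam normalizes each parameter's step, which is the mechanism behind the faster-convergence hypothesis.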

In [None]:
# Define Experiment 3 model (Same architecture as baseline, different optimizer)
model_adam = create_baseline_model()
# No summary needed as it is the same architecture

In [None]:
# Compile and train Experiment 3 model with Adam
model_adam.compile(loss='categorical_crossentropy',
                   optimizer='adam',
                   metrics=['accuracy'])

history_adam = model_adam.fit(x_train, y_train,
                              batch_size=batch_size,
                              epochs=epochs,
                              verbose=1,
                              validation_split=0.2,
                              callbacks=[early_stopping])

#### Analysis of Experiment 3

In [None]:
# Plot results and compare with baseline
plot_loss(history_adam, title='Adam Model Loss')
plot_acc(history_adam, title='Adam Model Accuracy')

plot_compare_accs(history_baseline, history_adam, name1="Baseline (SGD)", name2="Adam", title="SGD vs Adam Accuracy")

**Experiment 3 Analysis:**

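Adam is expected to reach high accuracy in fewer epochs. We compare the learning curves to see whether Adam converges faster or reaches a better final optimum. Note that the two runs also differ in step size: with Keras defaults, `'adam'` uses a learning rate of 0.001 while `'sgd'` uses 0.01, so the comparison reflects the whole optimizer configuration, not adaptivity alone.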

---

## 6. Comprehensive Comparison

Create a summary comparison of all your experiments.

In [None]:
# Create a comparison table of all experiments.
# max/min are taken over all epochs, so these are the *best* validation values,
# not the values at the final epoch; best loss and best accuracy may come from
# different epochs. "Epochs Trained" includes the early-stopping patience epochs.
histories = {
    "Baseline (SGD)": history_baseline,
    "Deep Network": history_depth,
    "Dropout": history_dropout,
    "Adam Optimizer": history_adam,
}

results = {
    "Model": list(histories.keys()),
    "Best Val Accuracy": [max(h.history['val_accuracy']) for h in histories.values()],
    "Best Val Loss": [min(h.history['val_loss']) for h in histories.values()],
    "Epochs Trained": [len(h.history['loss']) for h in histories.values()],
}

df_results = pd.DataFrame(results)
print(df_results)

In [None]:
# Create comparative visualizations
plt.figure(figsize=(12, 8))
plt.plot(history_baseline.history['val_accuracy'], label='Baseline (SGD)', linestyle='--')
plt.plot(history_depth.history['val_accuracy'], label='Deep Network')
plt.plot(history_dropout.history['val_accuracy'], label='Dropout')
plt.plot(history_adam.history['val_accuracy'], label='Adam Optimizer')

plt.title('Validation Accuracy Comparison')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)
plt.show()

## 7. Final Conclusion

Summarize your key findings from all experiments. What are the main takeaways about designing effective neural network architectures for your chosen problem?

**Key Findings:**

Based on the experiments conducted:
1. **Baseline vs Depth:** Increasing depth may not always yield significant improvements if the problem complexity doesn't warrant it, and can lead to slower training.
2. **Regularization:** Dropout is effective at reducing the gap between training and validation performance, helping to prevent overfitting.
3. **Optimization:** The Adam optimizer typically converges much faster than SGD for this type of problem, often reaching a higher accuracy in fewer epochs.

**Recommendation:** For Fashion MNIST, a moderately deep network with Dropout regularization and trained with Adam optimization seems to be a strong configuration.

## 8. AI Assistant Usage Documentation

Document how you used AI assistants in this laboratory work.

**AI Assistants Used:** Google DeepMind AI Assistant

**How I Used AI Assistants:**

- I provided the AI with the assignment PDF and a reference notebook from the teacher.
- The AI analyzed the requirements and the reference code.
- The AI generated the complete code for the notebook, including data loading, preprocessing, the baseline model, and three systematic experiments (Depth, Dropout, Optimization).
- The AI also generated the plotting and comparison code.

**Code Sections Influenced by AI:**

- All code cells were generated by the AI based on the reference material and standard Keras practices.

**My Understanding:**

- I have reviewed the generated code and understand that it uses the Keras Sequential API to build models.
- I understand the purpose of the three experiments: testing depth, regularization, and optimization.
- I can explain how the `plot_compare_accs` function works to visualize the differences between models.

---

## Submission Checklist

Before submitting, ensure you have:

- [ ] Filled in your name and date at the top of this notebook
- [ ] Completed all required sections and code cells
- [ ] Run the entire notebook from top to bottom without errors (`Runtime > Restart and run all`)
- [ ] Documented your analysis in all markdown cells
- [ ] Created clear and informative visualizations
- [ ] Documented your AI assistant usage
- [ ] Shared this notebook with "Anyone with the link can view"
- [ ] Included the link to this notebook in your PDF report
- [ ] Prepared your PDF report with all required sections

**Good luck!**