# Initializing a 3-Layer Neural Network with PyTorch

Welcome! In this notebook, we will explore how to **initialize the parameters of a simple 3-layer neural network** using PyTorch.  
Rather than starting with a large dataset like MNIST, we will focus on a **Mini Iris** dataset — a small, intuitive problem where we classify flowers based on just a few physical measurements.  

### Why is initialization important?
Proper initialization of weights and biases is crucial to:
- Ensure stable gradients during training.  
- Speed up convergence.  
- Avoid issues like vanishing or exploding activations.  
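
To make the last point concrete, here is a minimal sketch (separate from the exercise) that pushes random inputs through a stack of purely linear layers and reports the spread of the final activations for three weight scales. The helper `final_activation_std`, the depth of 8, and the width of 64 are illustrative assumptions, and nonlinearities are left out so the scaling effect is easy to see.

```python
import math
import torch

def final_activation_std(weight_std, depth=8, width=64, seed=0):
    """Std of the activations after `depth` bias-free linear layers (hypothetical helper)."""
    gen = torch.Generator().manual_seed(seed)
    x = torch.randn(256, width, generator=gen)  # batch of unit-variance inputs
    for _ in range(depth):
        W = torch.randn(width, width, generator=gen) * weight_std
        x = x @ W
    return x.std().item()

width = 64
# Xavier normal standard deviation for a square layer: sqrt(2 / (fan_in + fan_out))
xavier_std = math.sqrt(2.0 / (width + width))

std_small = final_activation_std(0.01, width=width)
std_xavier = final_activation_std(xavier_std, width=width)
std_large = final_activation_std(1.0, width=width)

print(f"weight std 0.01   -> activation std {std_small:.3e}")
print(f"weight std Xavier -> activation std {std_xavier:.3e}")
print(f"weight std 1.0    -> activation std {std_large:.3e}")
```

Under these assumptions, the small scale shrinks the signal toward zero, the large scale inflates it by many orders of magnitude, and the Xavier scale keeps it near order one.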

### What you will learn
By the end of this notebook, you will be able to:
- Build a **3-layer feedforward neural network** in PyTorch (Input → Hidden → Output).  
- Apply **Xavier initialization** to the weights and zero initialization to the biases.  
- Train and evaluate the model on the **Mini Iris dataset**.  
- Validate your work with **unit tests** that check shapes, initialization, and learning performance.  

Let’s get started 🚀


## Setup and Imports

In [None]:
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from typing import Dict, Tuple

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print("✅ All imports successful!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

## Exercise: Initialize a 3-Layer Neural Network

In this exercise, you will translate the high-level idea of a 3-layer neural network into an actual **PyTorch implementation**.  
Your main goal is to focus on **building the architecture** and **applying Xavier initialization** to the parameters.  

### What to do:
- Define the network using `nn.Sequential`.  
- Add a **linear layer** that maps the 3 input features to 4 hidden neurons.  
- Insert a **ReLU activation** for non-linearity.  
- Add a second **linear layer** that maps 4 hidden neurons to 2 output classes.  
- Initialize all weights with **Xavier initialization** and set all biases to zero.  

👉 Don’t worry about training yet — for now, just make sure the network is built and initialized correctly.
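
As background for the initialization step, Xavier (Glorot) normal initialization draws each weight from a zero-mean normal distribution with standard deviation `gain * sqrt(2 / (fan_in + fan_out))`. The sketch below only evaluates that formula for the two layer shapes in this exercise; it does not build the network. `xavier_normal_std` is a hypothetical helper, and a gain of 1 is assumed.

```python
import math

def xavier_normal_std(fan_in, fan_out, gain=1.0):
    """Target standard deviation for Xavier normal initialization (hypothetical helper)."""
    return gain * math.sqrt(2.0 / (fan_in + fan_out))

# Layer shapes used in this exercise: 3 -> 4 and 4 -> 2
std_hidden = xavier_normal_std(fan_in=3, fan_out=4)
std_output = xavier_normal_std(fan_in=4, fan_out=2)

print(f"hidden layer: std={std_hidden:.4f}, var={std_hidden ** 2:.4f}")
print(f"output layer: std={std_output:.4f}, var={std_output ** 2:.4f}")
```

The target variances are 2/7 and 1/3. Because these layers hold only 12 and 8 weights, the sample variance of a single draw can differ noticeably from those targets.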


Complete the PyTorch implementation below:

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

def initialize_3layer_network(n_inputs=3, n_hidden=4, n_outputs=2):
    """
    TODO: Initialize a simple 3-layer neural network in PyTorch.

    The network should have:
    - Input -> Hidden (Linear)
    - ReLU activation
    - Hidden -> Output (Linear)

    Then apply Xavier initialization to all linear layers.
    """
    
    # TODO: Create a PyTorch Sequential model
    # Hint: Use nn.Linear layers with correct dimensions
    # Example: nn.Linear(input_dim, output_dim)
    # TODO: Replace ... with proper dimensions
    model = nn.Sequential(...)
    
    # TODO: Apply Xavier initialization
    # Hint: Use nn.init.xavier_normal_() for weights
    #       Use nn.init.zeros_() for biases
    for layer in model:
        if isinstance(layer, nn.Linear):
            # YOUR CODE HERE
            pass
    
    return model


## Mini Iris Dataset Preparation

In this section, we will prepare a simplified version of the Iris dataset to train our 3-layer neural network.

We select 3 input features (sepal length, sepal width, and petal length) to match our network’s input size.

We restrict the dataset to only two flower species (Setosa and Versicolor), since our output layer has 2 neurons (binary classification).

The data is then converted into PyTorch tensors, shuffled, and split into training (80%) and testing (20%) sets.

This setup ensures that the dataset structure is consistent with our model:
Input layer (3 features) → Hidden layer (4 neurons) → Output layer (2 classes).
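
The shuffle-and-split step can be sketched on synthetic data before touching the real dataset. The tensors below are random stand-ins with the same layout as the filtered Iris data (100 samples, 3 features, 50 samples per class); all `*_demo` names are illustrative and are not used elsewhere in the notebook.

```python
import torch

torch.manual_seed(0)

# Synthetic stand-in for the filtered data: 100 samples, 3 features, 50 per class
X_demo = torch.randn(100, 3)
y_demo = torch.cat([torch.zeros(50, dtype=torch.long),
                    torch.ones(50, dtype=torch.long)])

# Shuffle features and labels with the same permutation so pairs stay aligned
perm_demo = torch.randperm(len(X_demo))
X_demo, y_demo = X_demo[perm_demo], y_demo[perm_demo]

# 80/20 split along the sample dimension
split_demo = int(0.8 * len(X_demo))
X_tr_demo, y_tr_demo = X_demo[:split_demo], y_demo[:split_demo]
X_te_demo, y_te_demo = X_demo[split_demo:], y_demo[split_demo:]

train_counts = torch.bincount(y_tr_demo, minlength=2)
test_counts = torch.bincount(y_te_demo, minlength=2)

print("train:", tuple(X_tr_demo.shape), "class counts:", train_counts.tolist())
print("test: ", tuple(X_te_demo.shape), "class counts:", test_counts.tolist())
```

Printing the class counts is a cheap way to confirm that a random split did not leave one class badly under-represented in the small test set.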

In [None]:
from sklearn.datasets import load_iris

# Load the classic Iris dataset
iris = load_iris()

# Select 3 input features to match our network architecture
# Here: sepal length, sepal width, petal length
X = iris['data'][:, [0, 1, 2]]
y = iris['target']

# Filter only two classes (Setosa=0, Versicolor=1)
# This converts the problem into binary classification
mask = (y < 2)
X, y = X[mask], y[mask]

# Convert data into PyTorch tensors
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.long)

# Shuffle the dataset randomly
perm = torch.randperm(len(X))
X, y = X[perm], y[perm]

# Split into training (80%) and testing (20%) sets
split = int(0.8 * len(X))
X_train, y_train = X[:split], y[:split]
X_test, y_test   = X[split:], y[split:]


## Model Initialization and Training Setup

Now that we have prepared the dataset, let’s initialize our 3-layer neural network and set up the training components.  

- **Model**: We call `initialize_3layer_network(3, 4, 2)` to create a network with 3 input features, 4 hidden neurons, and 2 output classes.  
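- **Loss function**: We use `nn.CrossEntropyLoss`, the standard choice for classification. It expects raw logits and integer class labels and applies log-softmax internally, so the model does not need a final softmax layer.  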
- **Optimizer**: We use `Adam` with a learning rate of `0.03` to update the model parameters efficiently.  

This setup will allow us to train the network on the Mini Iris dataset in the next step.
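
To show what the loss function computes, the sketch below compares `nn.CrossEntropyLoss` with a hand-written version built from log-softmax. The logits and labels are made-up values chosen only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical logits for 3 samples and 2 classes, with integer class labels
ce_logits = torch.tensor([[2.0, -1.0], [0.5, 0.5], [-1.0, 3.0]])
ce_labels = torch.tensor([0, 1, 1])

loss_builtin = nn.CrossEntropyLoss()(ce_logits, ce_labels)

# Same quantity by hand: log-softmax over classes, pick the log-probability
# of the true class for each sample, then average the negatives
log_probs = F.log_softmax(ce_logits, dim=1)
loss_manual = -log_probs[torch.arange(len(ce_labels)), ce_labels].mean()

print(f"built-in: {loss_builtin.item():.6f}, manual: {loss_manual.item():.6f}")
```

Since the softmax is folded into the loss, the network's output layer can stay a plain `nn.Linear`.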


In [None]:
import torch.optim as optim

# Initialize model
model = initialize_3layer_network(n_inputs=3, n_hidden=4, n_outputs=2)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.03)

## Training the Neural Network

Now we are ready to train our 3-layer neural network on the Mini Iris dataset.  

The training process consists of repeating the following steps for several epochs:  
1. **Forward pass**: Pass the input data through the model to compute predictions (logits).  
2. **Loss computation**: Compare predictions with the true labels using the loss function.  
3. **Backward pass**: Compute gradients of the loss with respect to model parameters.  
4. **Parameter update**: Use the optimizer (Adam) to update the weights and biases.  

We will train for **300 epochs**, printing the loss every 50 epochs to monitor progress.
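
One detail behind steps 3 and 4 is that PyTorch accumulates gradients across `backward()` calls, which is why the loop clears them on every iteration. The sketch below demonstrates this on a single made-up parameter `w`; resetting `w.grad` by hand is used here as a stand-in for what `optimizer.zero_grad()` does for all parameters.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)  # hypothetical parameter

# First backward pass: the gradient of sum(w^2) is 2w
(w ** 2).sum().backward()
grad_first = w.grad.clone()

# Second backward pass without clearing: the new gradient is added to the old one
(w ** 2).sum().backward()
grad_accumulated = w.grad.clone()

# Clearing the stored gradient restores the expected value on the next pass
w.grad = None
(w ** 2).sum().backward()
grad_after_reset = w.grad.clone()

print(grad_first.tolist(), grad_accumulated.tolist(), grad_after_reset.tolist())
```

Skipping the reset would make each update use the sum of all past gradients, not the gradient of the current loss.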


In [None]:
epochs = 300
for epoch in range(epochs):
    # Forward pass
    logits = model(X_train)
    loss = criterion(logits, y_train)

    # Backward pass and parameter update (stale gradients are cleared first)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    # Print progress every 50 epochs
    if (epoch+1) % 50 == 0:
        print(f"Epoch {epoch+1}/{epochs}, Loss: {loss.item():.4f}")

## Model Evaluation

After training the network, we need to evaluate how well it performs on both the **training** and **test** sets.  

- We define an `accuracy` function that compares the model’s predictions with the true labels.  
- Accuracy is computed as the proportion of correctly classified samples.  
- We will report results on both the training set (to check learning) and the test set (to check generalization).  

This gives us a good indication of whether our 3-layer neural network has successfully learned to separate the two flower classes.
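
The accuracy computation can be traced by hand on a tiny example. The logits and labels below are invented for illustration: `argmax` over the class dimension gives the predicted class, and the mean of the elementwise matches gives the accuracy.

```python
import torch

# Hypothetical logits for 4 samples and 2 classes, plus their true labels
acc_logits = torch.tensor([[1.5, -0.5],
                           [0.2, 0.9],
                           [2.0, 1.0],
                           [-1.0, 0.0]])
acc_labels = torch.tensor([0, 1, 1, 1])

acc_preds = acc_logits.argmax(dim=1)    # predicted class per row
matches = (acc_preds == acc_labels)     # boolean tensor of correct predictions
demo_accuracy = matches.float().mean().item()

print("predictions:", acc_preds.tolist())
print("accuracy:", demo_accuracy)
```

Here the third sample is misclassified, so three of four predictions match and the accuracy is 0.75.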


In [None]:
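def accuracy(model, X, y):
    """Return the fraction of samples in X whose predicted class matches y."""
    with torch.no_grad():  # gradients are not needed for evaluation
        preds = model(X).argmax(1)  # index of the largest logit per row
        return (preds == y).float().mean().item()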

print("Train Accuracy:", round(accuracy(model, X_train, y_train), 3))
print("Test Accuracy:", round(accuracy(model, X_test, y_test), 3))

## Inspecting Sample Predictions

To better understand how our model works, let’s look at a few predictions on unseen test samples:  

- We will take the first **5 samples** from the test set.  
- The model will output predicted class labels.  
- We will compare these predictions with the **true labels**.  

This helps us verify that the network is not only achieving good accuracy, but also making correct predictions on individual examples.
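
When inspecting individual predictions, it can also help to see how confident the model is. The sketch below is an optional addition, not part of the original pipeline: it converts made-up logits into probabilities with `torch.softmax` and confirms that the predicted class is the same whether it is taken from the logits or from the probabilities.

```python
import torch

# Hypothetical logits for 3 samples and 2 classes
sample_logits = torch.tensor([[3.0, -1.0],
                              [0.1, 0.0],
                              [-2.0, 2.5]])

probs = torch.softmax(sample_logits, dim=1)      # each row sums to 1
pred_from_logits = sample_logits.argmax(dim=1)
pred_from_probs = probs.argmax(dim=1)
confidence = probs.max(dim=1).values             # probability of the predicted class

for p, c in zip(pred_from_probs.tolist(), confidence.tolist()):
    print(f"predicted class {p} with probability {c:.3f}")
```

The second sample shows why this is useful: it is assigned to class 0, but with a probability only slightly above 0.5.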


In [None]:
with torch.no_grad():
    sample = X_test[:5]
    preds = model(sample).argmax(1)
    print("Sample predictions:", preds.tolist())
    print("True labels:", y_test[:5].tolist())

### Interpreting the Sample Predictions

In this output, we compared the model’s predictions against the true labels for the first **5 samples** in the test set.

- **Predictions:** `[0, 0, 0, 0, 1]`  
- **True labels:** `[0, 0, 0, 0, 1]`  

The model correctly classified **all five samples**, which is consistent with the test accuracy reported earlier.  
Five examples are too few to establish generalization on their own, so treat this as a quick qualitative check on new examples, not as a substitute for the full accuracy evaluation.


## Unit Tests with Explicit Feedback

Run the following tests to verify your implementation. Each test provides clear feedback if something is incorrect:

In [None]:
# ✅ Unit Tests for the Mini Iris pipeline (init → train → eval)

import torch
import torch.nn as nn
import torch.optim as optim

def _clone_params(m):
    return [p.detach().clone() for p in m.parameters()]

# --- Build model and training objects ---
torch.manual_seed(0)
model = initialize_3layer_network(n_inputs=3, n_hidden=4, n_outputs=2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.03)

# 1) Initialization checks (layers, shapes, zeros in bias, sane variance)
assert isinstance(model[0], nn.Linear) and isinstance(model[2], nn.Linear), \
    "❌ Expected Linear layers at positions [0] and [2]."

W1, b1 = model[0].weight.detach(), model[0].bias.detach()
W2, b2 = model[2].weight.detach(), model[2].bias.detach()

assert W1.shape == (4, 3), f"❌ W1 shape {tuple(W1.shape)}; expected (4, 3)."
assert W2.shape == (2, 4), f"❌ W2 shape {tuple(W2.shape)}; expected (2, 4)."
assert torch.allclose(b1, torch.zeros_like(b1)), "❌ b1 is not zero-initialized."
assert torch.allclose(b2, torch.zeros_like(b2)), "❌ b2 is not zero-initialized."

v1, v2 = W1.var().item(), W2.var().item()
assert 1e-3 < v1 < 1.0, f"❌ Var(W1)={v1:.4f} out of reasonable range; check Xavier init."
assert 1e-3 < v2 < 1.0, f"❌ Var(W2)={v2:.4f} out of reasonable range; check Xavier init."

# 2) Forward pass shape
logits = model(X_train)
assert logits.shape == (X_train.shape[0], 2), \
    f"❌ Forward shape {tuple(logits.shape)}; expected ({X_train.shape[0]}, 2)."

# 3) Parameters should update after one optimizer step
before = _clone_params(model)
loss = criterion(logits, y_train)
optimizer.zero_grad()
loss.backward()
optimizer.step()
after = [p.detach().clone() for p in model.parameters()]
assert any(not torch.allclose(b, a, atol=1e-7) for b, a in zip(before, after)), \
    "❌ Parameters did not update after an optimizer step."

# 4) Training should reduce loss meaningfully
def eval_loss(m, X, y):
    with torch.no_grad():
        return criterion(m(X), y).item()

loss_before = eval_loss(model, X_train, y_train)
for _ in range(200):
    optimizer.zero_grad()
    out = model(X_train)
    step_loss = criterion(out, y_train)
    step_loss.backward()
    optimizer.step()
loss_after = eval_loss(model, X_train, y_train)

assert (loss_after < loss_before * 0.7) or (loss_after < 0.25), \
    f"❌ Loss did not decrease enough: before={loss_before:.4f}, after={loss_after:.4f}."

# 5) Test accuracy threshold (binary Mini Iris should be high)
with torch.no_grad():
    preds = model(X_test).argmax(1)
    acc_test = (preds == y_test).float().mean().item()

assert acc_test >= 0.85, f"❌ Test accuracy too low: {acc_test:.3f} (expected ≥ 0.85)."

print(f"✅ All tests passed! (loss: {loss_before:.4f} → {loss_after:.4f}, test_acc={acc_test:.3f})")


## 🎉 Congratulations!

All tests have passed successfully ✅  
You have:

- Correctly initialized a **3-layer neural network** with PyTorch.  
- Trained it on the **Mini Iris dataset**.  
- Evaluated its performance with accuracy and sample predictions.  

This means your model is not only running without errors, but it’s also **learning to separate flower classes effectively** 🌸🚀  

Keep going — you just leveled up your PyTorch skills! 🔥


---

## INSTRUCTOR SOLUTION (HIDDEN FROM STUDENTS) 

**Note: This section is for instructors only and should not be visible to learners.**

Below is the complete solution to the exercise:

In [None]:
# ✅ INSTRUCTOR SOLUTION (HIDDEN FROM STUDENTS)
def initialize_3layer_network(n_inputs=3, n_hidden=4, n_outputs=2):
    """
    SOLUTION: Initialize parameters for a 3-layer neural network using PyTorch.
    
    Args:
        n_inputs (int): Number of input features (default: 3)
        n_hidden (int): Number of hidden neurons (default: 4)
        n_outputs (int): Number of output classes (default: 2)
    
    Returns:
        nn.Sequential: Model with Xavier-initialized weights and zero biases
    """
    
    # Create a PyTorch Sequential model
    model = nn.Sequential(
        nn.Linear(n_inputs, n_hidden),
        nn.ReLU(),
        nn.Linear(n_hidden, n_outputs)
    )
    
    # Apply Xavier initialization to all linear layers
    for module in model.modules():
        if isinstance(module, nn.Linear):
            nn.init.xavier_normal_(module.weight)
            nn.init.zeros_(module.bias)
    
    # The model is returned directly; its parameters remain accessible as
    # model[0].weight, model[0].bias, model[2].weight, and model[2].bias.
    
    return model