In [None]:
# DS776 Environment Setup & Package Update
# Configures storage paths for proper cleanup/sync, then updates introdl if needed
# If this cell fails, see Lessons/Course_Tools/AUTO_UPDATE_SYSTEM.md for help
%run ../../Lessons/Course_Tools/auto_update_introdl.py

# Homework 04 Assignment
**Name:** [Student Name Here]  
**Total Points:** 40

## Submission Checklist
- [ ] All code cells executed with output saved
- [ ] All questions answered
- [ ] Notebook converted to HTML (use the Homework_04_Utilities notebook)
- [ ] Canvas notebook filename includes `_GRADE_THIS_ONE`
- [ ] Files uploaded to Canvas

---

# Deeper Networks

In this assignment you'll again classify FashionMNIST images, this time with a deeper, fully connected model.  The goal isn't to build the best possible classifier.  Instead, we want you to see firsthand how deep networks can be difficult to train, and how batch normalization and residual connections can make a network easier to train while also improving performance.  Along the way you'll practice building a deep model with loops instead of typing out each layer individually.


## Part 1 - Deep Fully Connected Network (8 pts)

Our network will consist of:
* an input layer that maps the flattened 784 pixels to 64 neurons, followed by a ReLU function
* 10 blocks, where each block is linear + ReLU + linear + ReLU and the number of neurons stays at 64
* a linear output layer that maps the 64 neurons to 10 output neurons (one per class)
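
As a quick sanity check on the architecture above, the parameter count can be worked out by hand. The sketch below assumes every linear layer has a bias term (the default for `nn.Linear`); the helper name `linear_params` is illustrative only and is not part of the course code.

```python
def linear_params(n_in, n_out):
    """Weights plus biases for one fully connected layer."""
    return n_in * n_out + n_out

hidden_dim = 64
num_blocks = 10

input_params = linear_params(784, hidden_dim)             # 784 -> 64
block_params = 2 * linear_params(hidden_dim, hidden_dim)  # two 64 -> 64 layers per block
output_params = linear_params(hidden_dim, 10)             # 64 -> 10

total_params = input_params + num_blocks * block_params + output_params
print(total_params)  # 134090
```

If the completed model is built as described, a model summary should report the same number of trainable parameters.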

Use this template (copy and paste it into a code cell, then complete the model):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleBlock(nn.Module):
    def __init__(self, hidden_dim=64):
        super(SimpleBlock, self).__init__()
        # Fill in the repeating elements here: linear(hidden_dim, hidden_dim) + ReLU + linear + ReLU
        
    def forward(self, x):
        # Fill in the forward method to call linear + ReLU + linear + ReLU and return the result
        return # complete

class Deep_MNIST_FC(nn.Module):
    def __init__(self, num_blocks=10):
        super(Deep_MNIST_FC, self).__init__()
        
        # Input layer
        self.input_layer = ### complete the input layer with linear and ReLU, will handle flatten in the forward below
    
        
        # Repeating simple blocks
        self.blocks = nn.Sequential(*[SimpleBlock(64) for _ in range(num_blocks)])
        
        # Output layer
        self.output_layer = ### add output layer
        
    def forward(self, x):
        x = x.view(x.size(0), -1)  # Flatten input
        ## call input layer
        ## call blocks
        ## call output layer
        return x
```

In [None]:
# YOUR CODE HERE
# TODO: Complete the SimpleBlock class
# - Add two linear layers (hidden_dim -> hidden_dim) 
# - Add ReLU activations between and after the layers
# - Remember to define layers in __init__ and use them in forward()

class SimpleBlock(nn.Module):
    def __init__(self, hidden_dim=64):
        super(SimpleBlock, self).__init__()
        # Fill in the repeating elements here: linear(hidden_dim, hidden_dim) + ReLU + linear + ReLU
        
    def forward(self, x):
        # Fill the forward method to call linear + ReLU + linear + ReLU and return result
        return  # Complete this

## Storage Guidance

**Always use the path variables** (`MODELS_PATH`, `DATA_PATH`, `CACHE_PATH`) instead of hardcoded paths. The actual locations depend on your environment:

| Variable | CoCalc Home Server | Compute Server |
|----------|-------------------|----------------|
| `MODELS_PATH` | `Homework_04_Models/` | `Homework_04_Models/` *(synced)* |
| `DATA_PATH` | `~/home_workspace/data/` | `~/cs_workspace/data/` *(local)* |
| `CACHE_PATH` | `~/home_workspace/downloads/` | `~/cs_workspace/downloads/` *(local)* |

**Why this matters:**
- On **Compute Servers**: Only `MODELS_PATH` syncs back to CoCalc (~10GB limit). Data and cache stay local (~50GB).
- On **CoCalc Home**: Everything syncs and counts against the ~10GB limit.
- **Storage_Cleanup.ipynb** (in this folder) helps free synced space when needed.

**Tip:** Always write `MODELS_PATH / 'model.pt'`; never hardcode paths like `'Homework_04_Models/model.pt'`.
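
A minimal sketch of the tip above. In the course environment `MODELS_PATH` is already defined by the setup cell; here a temporary directory stands in for it so the snippet runs anywhere, and `model.pt` is just the example file name from the tip.

```python
import tempfile
from pathlib import Path

# Stand-in for the course-provided MODELS_PATH (an assumption for this sketch only).
MODELS_PATH = Path(tempfile.mkdtemp()) / "Homework_04_Models"
MODELS_PATH.mkdir(parents=True, exist_ok=True)

# Build the checkpoint location from the variable, not from a literal string.
checkpoint_file = MODELS_PATH / "model.pt"
print(checkpoint_file.name)         # model.pt
print(checkpoint_file.parent.name)  # Homework_04_Models
```

Because the file location is derived from `MODELS_PATH`, the same code works unchanged on the CoCalc home server and on a compute server.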

Set up the downsampled (10% of training data) FashionMNIST Dataset and DataLoaders here.  Include the data augmentation from last week.  Use `batch_size=64`.  If you're not doing so already, name the second dataset (the one with `train=False`) `valid_dataset` and its loader `valid_loader`, instead of `test_dataset` and `test_loader`.

In [None]:
# YOUR CODE HERE
# TODO: Complete the Deep_MNIST_FC class
# - Fill in the input layer (784 -> 64 neurons + ReLU)
# - The blocks are already defined for you
# - Fill in the output layer (64 -> 10 classes)
# - Complete the forward method calls

class Deep_MNIST_FC(nn.Module):
    def __init__(self, num_blocks=10):
        super(Deep_MNIST_FC, self).__init__()
        
        # Input layer
        self.input_layer = ### Complete the input layer with linear and ReLU, will handle flatten in forward below
    
        # Repeating simple blocks (already provided)
        self.blocks = nn.Sequential(*[SimpleBlock(64) for _ in range(num_blocks)])
        
        # Output layer
        self.output_layer = ### Add output layer
        
    def forward(self, x):
        x = x.view(x.size(0), -1)  # Flatten input
        ## Call input layer
        ## Call blocks
        ## Call output layer
        return x

# Create model and show summary
# model = Deep_MNIST_FC()
# summary(model, input_size=(32, 1, 28, 28))

Now train the model with AdamW (lr = 0.001) for 40 epochs.  Track the accuracy.  Be sure to save a checkpoint file.
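
For reference, here is a rough sketch of what a single AdamW optimization step looks like in plain PyTorch. It uses a tiny stand-in model and random tensors rather than the assignment model or FashionMNIST, and it is not a replacement for the course training loop; the names `toy_model`, `images`, and `labels` are placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in model and random data (placeholders, not the assignment model).
toy_model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
images = torch.randn(64, 784)          # one batch of 64 flattened "images"
labels = torch.randint(0, 10, (64,))   # random class labels

loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(toy_model.parameters(), lr=0.001)

# One optimization step: clear gradients, compute loss, backpropagate, update.
optimizer.zero_grad()
loss = loss_func(toy_model(images), labels)
loss.backward()
optimizer.step()

# Batch accuracy: fraction of predictions that match the labels.
with torch.no_grad():
    preds = toy_model(images).argmax(dim=1)
    accuracy = (preds == labels).float().mean().item()
print(f"loss={loss.item():.3f}  accuracy={accuracy:.3f}")
```

A full training run repeats this step over every batch in the training loader for each epoch and evaluates on the validation loader in between.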

In [None]:
# YOUR CODE HERE
# TODO: Train the Deep_MNIST_FC model with AdamW optimizer for 40 epochs
# - Initialize the model with 10 blocks
# - Use AdamW optimizer with lr=0.001
# - Track training and validation accuracy
# - Save checkpoint after training
# - Use the training loop from previous assignments

Now use plot_training_metrics to plot the loss and accuracy for the training and validation sets.  You don't have to display the results dataframe, just the plots.

In [None]:
# YOUR CODE HERE
# TODO: Plot training metrics using plot_training_metrics function
# - Load the training results from the checkpoint
# - Use plot_training_metrics to display loss and accuracy curves
# - Show both training and validation metrics
# - No need to display the results dataframe

## Part 2 - Batch Normalization (8 pts)

Create a new model class called Deep_MNIST_FN based on your code from Part 1.  You should now use BatchNorm1d after every linear layer.  Create an instance of your model and train it for 40 epochs.  Produce the same graphs.  Comment on how this model trained compared to the model in Part 1.
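
To see what `BatchNorm1d` does to the output of a linear layer, consider the small sketch below. It uses arbitrary sizes (16 inputs, 8 outputs) and random data, so it illustrates the operation rather than the assignment model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Linear layer followed by batch normalization over its 8 output features.
layer = nn.Sequential(nn.Linear(16, 8), nn.BatchNorm1d(8))
layer.train()  # training mode: normalize with the statistics of the current batch

x = torch.randn(64, 16) * 5 + 3   # batch of 64 inputs with arbitrary scale and offset
out = layer(x)

# Each output feature has approximately zero mean and unit standard deviation
# across the batch, because BatchNorm1d starts with weight=1 and bias=0.
print(out.mean(dim=0))
print(out.std(dim=0, unbiased=False))
```

Note that the size passed to `BatchNorm1d` must match the number of output features of the linear layer it follows.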

In [None]:
# YOUR CODE HERE
# TODO: Set up FashionMNIST data with downsampling and augmentation
# - Load FashionMNIST with appropriate transforms (normalization: mean=0.2860, std=0.3530)
# - Downsample training set to 10% using provided code
# - Create DataLoaders with batch_size=64

# Use this code for downsampling (copy and uncomment):
# from torch.utils.data import Subset
# import numpy as np
# np.random.seed(42)  # use this seed for reproducibility
# subset_indices = np.random.choice(len(train_dataset), size=int(0.1 * len(train_dataset)), replace=False)
# train_dataset = Subset(train_dataset, subset_indices)

In [None]:
# YOUR CODE HERE
# TODO: Create Deep_MNIST_FN model class with batch normalization
# - Base on Deep_MNIST_FC from Part 1
# - Add BatchNorm1d after every linear layer
# - Train for 40 epochs with same parameters
# - Create plots comparing to Part 1 model
# - Comment on training differences

## Part 3 - Residual Connections (8 pts)

Now create a new model class called Deep_MNIST_FC_Res by modifying the model from Part 1 so that each block (with two linear layers) has a residual connection around that block.  This model should not include batch normalization.  Train for 40 epochs.  Make plots.
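
The idea of a residual connection can be sketched with a deliberately simplified block. `ToyResidualBlock` is a hypothetical name, and it wraps a single linear layer rather than the two required here, so treat it as an illustration of the pattern `x + F(x)` and not as the solution.

```python
import torch
import torch.nn as nn

class ToyResidualBlock(nn.Module):
    """Illustrative block (hypothetical name): output = x + F(x)."""
    def __init__(self, dim=8):
        super().__init__()
        self.inner = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())

    def forward(self, x):
        # The skip path requires inner(x) to have the same shape as x.
        return x + self.inner(x)

block = ToyResidualBlock(dim=8)

# Zero the inner layer so F(x) = ReLU(0) = 0; the block then passes x through unchanged.
with torch.no_grad():
    for p in block.inner.parameters():
        p.zero_()

x = torch.randn(4, 8)
print(torch.equal(block(x), x))  # True
```

The check at the end shows why residual blocks are easy to stack: when the inner layers contribute nothing, the block reduces to the identity, so the signal still reaches later layers.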

In [None]:
# YOUR CODE HERE
# TODO: Create Deep_MNIST_FC_Res model class with residual connections
# - Modify the SimpleBlock to include residual connections around each block
# - Add the input to the output of each block (x + block_output)
# - No batch normalization in this version
# - Make sure dimensions match for the residual connection

In [None]:
# YOUR CODE HERE
# TODO: Train the Deep_MNIST_FC_Res model for 40 epochs
# - Use same training parameters as previous models
# - AdamW optimizer with lr=0.001
# - Track training and validation metrics
# - Make plots to compare with previous models

## Part 4 - Combined: BatchNorm + Residuals (8 pts)

Now create a new model class called Deep_MNIST_FC_Res_BN by modifying one of the models from above so that each block (with two linear layers) has a residual connection around that block and all linear layers are followed by batch normalization.  Train for 40 epochs.  Make plots.

In [None]:
# YOUR CODE HERE
# TODO: Create Deep_MNIST_FC_Res_BN model with both residual connections and batch normalization
# - Combine residual connections from Part 3 with batch normalization from Part 2
# - Add BatchNorm1d after every linear layer
# - Include residual connections around each block
# - This should be the most advanced model combining both techniques

In [None]:
# YOUR CODE HERE
# TODO: Train the Deep_MNIST_FC_Res_BN model for 40 epochs
# - Use same training parameters as all previous models
# - AdamW optimizer with lr=0.001
# - Track training and validation metrics for comparison
# - Make plots to compare with all previous models

## Part 5 - Analysis and Comparison (6 pts)

Create two plots showing the validation loss and validation accuracy for each of the four models above.  Write a comparison of the four models.  If you could only choose batch normalization or residual connections, which would it be?  Does the model with both residual connections and batch normalization perform better than the others?  Which approach yields the fastest training?  Do you find that both residual connections and batch normalization are necessary?  Address ALL of these questions in your analysis.
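
One possible way to overlay the curves is sketched below with matplotlib. The function name `plot_validation_curves` and the dictionary keys `valid_loss` and `valid_acc` are hypothetical; adapt them to however your training results are actually stored. The numbers are dummy values, identical for every model, so they say nothing about real performance. The sketch draws the two plots as side-by-side panels; two separate figures work equally well.

```python
import matplotlib.pyplot as plt

def plot_validation_curves(histories):
    """Overlay validation loss and accuracy for several models.

    `histories` maps a model name to a dict with hypothetical keys
    'valid_loss' and 'valid_acc', each a list with one value per epoch.
    """
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(12, 4))
    for name, hist in histories.items():
        epochs = range(1, len(hist["valid_loss"]) + 1)
        ax_loss.plot(epochs, hist["valid_loss"], label=name)
        ax_acc.plot(epochs, hist["valid_acc"], label=name)
    ax_loss.set(title="Validation loss", xlabel="Epoch", ylabel="Loss")
    ax_acc.set(title="Validation accuracy", xlabel="Epoch", ylabel="Accuracy")
    ax_loss.legend()
    ax_acc.legend()
    return fig, (ax_loss, ax_acc)

# Dummy values only, identical for every model so no ranking is implied.
dummy = {"valid_loss": [1.0, 0.9, 0.8], "valid_acc": [0.5, 0.6, 0.7]}
model_names = ["Deep_MNIST_FC", "Deep_MNIST_FN", "Deep_MNIST_FC_Res", "Deep_MNIST_FC_Res_BN"]
histories = {name: dict(dummy) for name in model_names}

fig, (ax_loss, ax_acc) = plot_validation_curves(histories)
```

Plotting all four models on shared axes makes differences in convergence speed and final accuracy much easier to compare than four separate figures.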

In [None]:
# YOUR CODE HERE
# TODO: Create comparison plots for all four models
# - Plot validation loss for all models on one figure
# - Plot validation accuracy for all models on another figure
# - Use different colors/styles for each model
# - Add legends to identify each model clearly
# - Include models: Deep_MNIST_FC, Deep_MNIST_FN, Deep_MNIST_FC_Res, Deep_MNIST_FC_Res_BN

📝 **YOUR ANALYSIS HERE:**

Write a comprehensive comparison of the four models addressing all the questions above:
- Compare performance of all four models
- Choose between batch normalization OR residual connections if you could only pick one
- Evaluate whether the combined model (res + BN) performs best
- Analyze which approach yields fastest training
- Discuss necessity of both techniques

Use your training results and plots to support your analysis.

## Part 6 - Reflection (2 pts)

1. What, if anything, did you find difficult to understand for this lesson? Why?

📝 **YOUR ANSWER HERE:**

2. What resources did you find supported your learning most and least for this lesson? (Be honest - I use your input to shape the course.)

📝 **YOUR ANSWER HERE:**

### Export Notebook to HTML for Canvas Upload

Uncomment the two lines below and run the cell to export the current notebook to HTML.

In [None]:
# from introdl import export_this_to_html
# export_this_to_html()