# 📚 PyTorch Practice Notebook - Lecture 1 Exercises

**Based on:** SAIR PyTorch Mastery - Lecture 1: From NumPy to Production Neural Networks

**Instructions:** Complete the exercises below to test your understanding of PyTorch fundamentals. Try to solve them without looking at the original notebook first!

## 🔧 Setup & Imports

Run this cell first to set up your environment.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## 🎯 Exercise 1: Understanding Tensors & Autograd

### Part A: Manual Gradient Verification

**Task:** Create a simple computation graph and compute gradients both manually and with PyTorch's autograd.

Given:
- x = 3.0
- w = 2.0  
- b = 1.0

Compute: y = w*x + b, then loss = y²

**Your job:**
1. Create PyTorch tensors with gradient tracking
2. Compute the forward pass
3. Compute gradients using `.backward()`
4. Verify by manually computing ∂loss/∂w and ∂loss/∂x using the chain rule

In [None]:
# =========== YOUR CODE HERE ===========
# Create tensors with requires_grad=True

# Forward pass: y = w*x + b, loss = y²

# Backward pass

# Print gradients

# Manual verification
# Compute manually: ∂loss/∂w = ? and ∂loss/∂x = ?
# =========================================
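
With x = 3, w = 2 and b = 1, the forward pass gives y = 7 and loss = 49. By the chain rule, ∂loss/∂w = 2y·x = 42, ∂loss/∂x = 2y·w = 28 and ∂loss/∂b = 2y = 14. A minimal reference sketch for checking your answer, comparing autograd against these hand-derived values:

```python
import torch

# Leaf tensors; requires_grad=True tells autograd to record operations on them
x = torch.tensor(3.0, requires_grad=True)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

# Forward pass
y = w * x + b   # 7.0
loss = y ** 2   # 49.0

# Backward pass: fills x.grad, w.grad and b.grad
loss.backward()

# Chain rule by hand: dloss/dy = 2y, dy/dw = x, dy/dx = w, dy/db = 1
manual_dw = 2 * y.item() * x.item()
manual_dx = 2 * y.item() * w.item()
manual_db = 2 * y.item()

print(f"dloss/dw: autograd={w.grad.item()}, manual={manual_dw}")  # both 42.0
print(f"dloss/dx: autograd={x.grad.item()}, manual={manual_dx}")  # both 28.0
print(f"dloss/db: autograd={b.grad.item()}, manual={manual_db}")  # both 14.0
```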

### Part B: Gradient Accumulation Demonstration

**Task:** Show what happens when you forget `optimizer.zero_grad()`.

1. Create a simple linear model
2. Run two training steps WITHOUT zero_grad
3. Show that gradients accumulate
4. Fix by adding zero_grad

In [None]:
# =========== YOUR CODE HERE ===========
# Create a simple model
# model = nn.Linear(3, 1)

# Create dummy data
# X = torch.randn(5, 3)
# y = torch.randn(5, 1)

# Create optimizer
# optimizer = optim.SGD(model.parameters(), lr=0.01)

# First forward-backward (no zero_grad)
# Print gradient after first iteration

# Second forward-backward (no zero_grad)
# Print gradient after second iteration

# Show that gradients accumulated (exactly doubled if no optimizer.step() ran in between)

# Now do it correctly with zero_grad
# =========================================
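
One possible sketch of the demonstration. It calls `backward()` twice on the same batch with neither `zero_grad()` nor `optimizer.step()` in between, so the stored gradient is exactly twice the single-pass gradient; if a `step()` runs in between, the weights change and the second gradient is only roughly equal to the first. The seed and the use of `nn.MSELoss` are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(3, 1)
X = torch.randn(5, 3)
y = torch.randn(5, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

# First forward-backward: .grad holds g
loss_fn(model(X), y).backward()
grad_after_first = model.weight.grad.clone()

# Second forward-backward without zero_grad: backward() ADDS g again
loss_fn(model(X), y).backward()
grad_after_second = model.weight.grad.clone()

print("after 1st backward:", grad_after_first)
print("after 2nd backward:", grad_after_second)
print("doubled:", torch.allclose(grad_after_second, 2 * grad_after_first))

# Correct pattern: clear gradients before each backward pass
optimizer.zero_grad()
loss_fn(model(X), y).backward()
grad_fresh = model.weight.grad.clone()
print("fresh gradient matches first:", torch.allclose(grad_fresh, grad_after_first))
```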

## 🏗️ Exercise 2: Building Neural Networks with nn.Module

### Part A: Convert NumPy Network to PyTorch

**Task:** Convert this NumPy-style network to PyTorch using `nn.Module`.

Original NumPy network:
```python
class NumPyNetwork:
    def __init__(self):
        self.W1 = np.random.randn(10, 20)
        self.b1 = np.zeros(20)
        self.W2 = np.random.randn(20, 5)
        self.b2 = np.zeros(5)
        
    def forward(self, X):
        z1 = X @ self.W1 + self.b1
        a1 = np.maximum(0, z1)  # ReLU
        z2 = a1 @ self.W2 + self.b2
        return z2
```

Create a PyTorch version with:
1. Proper inheritance from `nn.Module`
2. PyTorch layers instead of manual weights
3. ReLU activation
4. Forward method

In [None]:
# =========== YOUR CODE HERE ===========
class PyTorchNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # Define layers here
        
    def forward(self, x):
        # Implement forward pass
        pass
# =========================================
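
A reference sketch of one possible conversion. Each `nn.Linear(in_features, out_features)` owns a weight and a bias and registers them as parameters, replacing the manual `W1`/`b1` and `W2`/`b2` arrays. Two differences from the NumPy version: `nn.Linear` uses its own default initialization instead of `np.random.randn`, and it stores the weight as `(out_features, in_features)`, computing `x @ W.T + b`.

```python
import torch
import torch.nn as nn

class PyTorchNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 20)  # replaces W1 (10x20) and b1 (20)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(20, 5)   # replaces W2 (20x5) and b2 (5)

    def forward(self, x):
        z1 = self.fc1(x)     # X @ W1 + b1
        a1 = self.relu(z1)   # np.maximum(0, z1)
        return self.fc2(a1)  # a1 @ W2 + b2

model = PyTorchNetwork()
print(model(torch.randn(8, 10)).shape)             # torch.Size([8, 5])
print(sum(p.numel() for p in model.parameters()))  # 325 = 10*20 + 20 + 20*5 + 5
```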

### Part B: Test Your Network

Test that your network works correctly:

In [None]:
# Test your network
model = PyTorchNetwork()

# Create dummy input
X_test = torch.randn(8, 10)  # batch_size=8, features=10

# Forward pass
output = model(X_test)

print(f"Input shape: {X_test.shape}")
print(f"Output shape: {output.shape}")
print(f"Output range: [{output.min():.3f}, {output.max():.3f}]")

## 🔄 Exercise 3: Complete Training Loop

### Part A: Fix the Buggy Training Loop

**Task:** This training loop has several bugs. Identify and fix them all.

In [None]:
# =========== BUGGY CODE - FIX ME! ===========
class BuggyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(5, 10)
        self.layer2 = nn.Linear(10, 1)
    
    def forward(self, x):
        x = torch.relu(self.layer1(x))
        return self.layer2(x)

# Create model and optimizer
model = BuggyNet()
optimizer = optim.Adam(model.parameters())

# Dummy data
X = torch.randn(100, 5)
y = torch.randn(100, 1)

# Training loop with bugs
losses = []
for epoch in range(50):
    # Forward pass
    predictions = model(X)
    loss = ((predictions - y) ** 2).mean()
    losses.append(loss.item())
    
    # Backward pass
    loss.backward()
    optimizer.step()
    
    if epoch % 10 == 0:
        print(f"Epoch {epoch}: Loss = {loss.item():.4f}")

# Plot results
plt.plot(losses)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss (Should Decrease!)')
plt.show()
# ============================================
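
One corrected version, for comparison after you have attempted the fix. It assumes the main bug is the missing `optimizer.zero_grad()`: without it, every `backward()` adds to the gradients of all previous epochs. The `model.train()` call and the explicit learning rate are good-practice additions rather than strict fixes, since this network has no dropout or batch norm and `1e-3` is Adam's default. `y` is random noise, so even a correct loop can only lower the loss modestly.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class FixedNet(nn.Module):
    """Same architecture as BuggyNet."""
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(5, 10)
        self.layer2 = nn.Linear(10, 1)

    def forward(self, x):
        x = torch.relu(self.layer1(x))
        return self.layer2(x)

torch.manual_seed(42)
model = FixedNet()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
X = torch.randn(100, 5)
y = torch.randn(100, 1)

losses = []
model.train()
for epoch in range(50):
    optimizer.zero_grad()                   # fix: clear gradients left from the previous epoch
    predictions = model(X)
    loss = ((predictions - y) ** 2).mean()  # MSE, equivalent to nn.MSELoss()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
    if epoch % 10 == 0:
        print(f"Epoch {epoch}: Loss = {loss.item():.4f}")
```

Plot `losses` the same way as in the buggy cell to compare the two curves.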

### Part B: Write a Correct Training Loop

**Task:** Write a complete, correct training loop from scratch for a regression problem.

Requirements:
1. Create a neural network with 2 hidden layers
2. Use an appropriate loss function for regression (e.g., `nn.MSELoss`)
3. Include all 6 training steps
4. Track and plot loss
5. Make it device-agnostic

In [None]:
# =========== YOUR CODE HERE ===========
# 1. Define your network architecture

# 2. Set device (CPU or GPU)

# 3. Create dummy data
# X = torch.randn(200, 7)  # 200 samples, 7 features
# y = torch.randn(200, 1)  # 200 target values

# 4. Initialize model, loss function, optimizer

# 5. Training loop (100 epochs)
# losses = []
# for epoch in range(100):
#     # Training steps here...
#     pass

# 6. Plot loss curve
# =========================================
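
A reference sketch for Part B, under these assumptions: two hidden layers of 32 units, `nn.MSELoss` as the regression loss, Adam at `lr=1e-3`, and full-batch training on the dummy data. The loop body follows the usual order (zero gradients, forward pass, compute loss, backward pass, optimizer step, record loss); the lecture's six steps may be grouped slightly differently. `Regressor` is a hypothetical name.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class Regressor(nn.Module):
    """Hypothetical regression MLP with two hidden layers."""
    def __init__(self, in_features=7, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden), nn.ReLU(),  # hidden layer 1
            nn.Linear(hidden, hidden), nn.ReLU(),       # hidden layer 2
            nn.Linear(hidden, 1),                       # single regression output
        )

    def forward(self, x):
        return self.net(x)

# Device-agnostic: use the GPU when present, otherwise the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

torch.manual_seed(42)
X = torch.randn(200, 7).to(device)  # 200 samples, 7 features
y = torch.randn(200, 1).to(device)  # 200 target values

model = Regressor().to(device)      # parameters must live on the same device as the data
loss_fn = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

losses = []
for epoch in range(100):
    model.train()
    optimizer.zero_grad()           # clear old gradients
    predictions = model(X)          # forward pass
    loss = loss_fn(predictions, y)  # compute loss
    loss.backward()                 # backpropagate
    optimizer.step()                # update parameters
    losses.append(loss.item())      # record a plain Python float
```

Plot with `plt.plot(losses)` as in Exercise 3A; `loss.item()` already returns a Python float, so nothing has to be moved back to the CPU for plotting.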

## 💾 Exercise 4: Model Persistence

### Part A: Save and Load Model Weights

**Task:** Train a simple model, save it, load it into a new model, and verify they produce identical predictions.

In [None]:
# =========== YOUR CODE HERE ===========
# 1. Create and train a simple model
# model = nn.Sequential(
#     nn.Linear(4, 10),
#     nn.ReLU(),
#     nn.Linear(10, 1)
# )

# 2. Train for a few epochs

# 3. Save model weights

# 4. Create a NEW model with same architecture
# new_model = ...

# 5. Load saved weights into new model

# 6. Verify predictions match
# test_input = torch.randn(5, 4)
# original_pred = model(test_input)
# loaded_pred = new_model(test_input)
# print(f"Predictions match: {torch.allclose(original_pred, loaded_pred, rtol=1e-4)}")
# =========================================
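
A minimal sketch of the save/load round trip, assuming a short full-batch training run on random data and a file in the system temp directory (`model_weights.pt` and `make_model` are arbitrary, hypothetical names). Only the `state_dict` (a dictionary of parameter tensors) is saved, so the new model must be built with the same architecture before `load_state_dict()` is called. Both models are switched to eval mode and compared under `torch.no_grad()`.

```python
import os
import tempfile

import torch
import torch.nn as nn

def make_model():
    # The same architecture must be rebuilt before loading a state_dict
    return nn.Sequential(nn.Linear(4, 10), nn.ReLU(), nn.Linear(10, 1))

torch.manual_seed(0)
model = make_model()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
X, y = torch.randn(64, 4), torch.randn(64, 1)

# Train for a few epochs
for _ in range(5):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    optimizer.step()

# Save only the learned parameters
path = os.path.join(tempfile.gettempdir(), "model_weights.pt")
torch.save(model.state_dict(), path)

# Fresh model (new random init), then overwrite it with the saved weights
new_model = make_model()
new_model.load_state_dict(torch.load(path))

model.eval()
new_model.eval()
test_input = torch.randn(5, 4)
with torch.no_grad():
    original_pred = model(test_input)
    loaded_pred = new_model(test_input)
print(f"Predictions match: {torch.allclose(original_pred, loaded_pred, rtol=1e-4)}")
```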

### Part B: Create Training Checkpoints

**Task:** Save a checkpoint that allows resuming training. Include:
- Model state_dict
- Optimizer state_dict
- Epoch number
- Loss value

Then demonstrate loading and resuming training.

In [None]:
# =========== YOUR CODE HERE ===========
# 1. Train a model for 30 epochs

# 2. Create checkpoint dictionary
# checkpoint = {
#     'epoch': 30,
#     'model_state_dict': ...,
#     'optimizer_state_dict': ...,
#     'loss': ...,
# }

# 3. Save checkpoint

# 4. Create new model and optimizer

# 5. Load checkpoint

# 6. Resume training from epoch 30
# for epoch in range(checkpoint['epoch'], 60):
#     # Training loop
#     print(f"Resumed training at epoch {epoch}")
# =========================================
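
A sketch of checkpointing and resuming, assuming full-batch Adam training on random data and a temp-directory file (`checkpoint.pt`, `make_model` and `train_one_epoch` are hypothetical names). Restoring the optimizer state matters for Adam because it carries running averages and a per-parameter step count; the last line shows the restored optimizer continuing from step 30 instead of starting over.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

def make_model():
    return nn.Sequential(nn.Linear(4, 10), nn.ReLU(), nn.Linear(10, 1))

def train_one_epoch(model, optimizer, X, y):
    """One full-batch training step; returns the loss as a float."""
    model.train()
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    optimizer.step()
    return loss.item()

torch.manual_seed(0)
X, y = torch.randn(64, 4), torch.randn(64, 1)

# 1. Train for 30 epochs
model = make_model()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(30):
    loss = train_one_epoch(model, optimizer, X, y)

# 2-3. Bundle everything needed to resume, then save
checkpoint = {
    "epoch": 30,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "loss": loss,
}
path = os.path.join(tempfile.gettempdir(), "checkpoint.pt")
torch.save(checkpoint, path)

# 4-5. New model and optimizer, restored from the checkpoint
new_model = make_model()
new_optimizer = optim.Adam(new_model.parameters(), lr=1e-3)
loaded = torch.load(path)
new_model.load_state_dict(loaded["model_state_dict"])
new_optimizer.load_state_dict(loaded["optimizer_state_dict"])
start_epoch = loaded["epoch"]
print(f"Resuming at epoch {start_epoch} (last loss {loaded['loss']:.4f})")

# 6. Continue training from where the first run stopped
for epoch in range(start_epoch, 60):
    loss = train_one_epoch(new_model, new_optimizer, X, y)

# Adam's step counter continued from 30 rather than restarting at 0
print(float(new_optimizer.state_dict()["state"][0]["step"]))  # 60.0
```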

## 🚀 Exercise 5: Real-World Application

### Part A: Boston Housing Prediction

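**Task:** Use PyTorch to predict housing prices.

**Note:** `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so importing it fails on current versions. `fetch_california_housing` is one of the replacements scikit-learn recommends (it downloads the data on first use); the steps below apply unchanged.

Steps:
1. Load the housing dataset
2. Split into train/test
3. Standardize features (fit the scaler on the training split only)
4. Create PyTorch model
5. Train model
6. Evaluate on test set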

In [None]:
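# load_boston was removed in scikit-learn 1.2; fetch_california_housing is one of
# the replacements scikit-learn recommends (it downloads the data on first use)
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# =========== YOUR CODE HERE ===========
# 1. Load dataset
# housing = fetch_california_housing()
# X, y = housing.data, housing.target  # 8 features; target in units of $100,000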

# 2. Split data (80% train, 20% test)

# 3. Standardize features

# 4. Convert to PyTorch tensors
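# X_train_tensor = ...  # float32, e.g. torch.tensor(X_train, dtype=torch.float32)
# y_train_tensor = ...  # reshape to (N, 1) so it matches the model's output shape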

# 5. Define model architecture
# class BostonPredictor(nn.Module):
#     def __init__(self, input_size):
#         super().__init__()
#         # Define layers
        
#     def forward(self, x):
#         # Forward pass

# 6. Create model instance
# model = BostonPredictor(input_size=X_train.shape[1])

# 7. Training loop
# ...

# 8. Evaluate on test set
# model.eval()
# with torch.no_grad():
#     test_predictions = model(X_test_tensor)
#     test_loss = nn.MSELoss()(test_predictions, y_test_tensor)
#     print(f"Test Loss: {test_loss.item():.4f}")

# 9. Plot predictions vs actual
# plt.scatter(y_test_tensor.numpy(), test_predictions.numpy())
# plt.plot([y.min(), y.max()], [y.min(), y.max()], 'r--')
# plt.xlabel('Actual Prices')
# plt.ylabel('Predicted Prices')
# plt.title('Boston Housing Predictions')
# plt.show()
# =========================================
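
A reference sketch of steps 2-8, written as a function of NumPy arrays `X` and `y` so it works with whichever housing dataset you load in step 1. To stay runnable offline, the demo call uses synthetic linear data instead of real prices; `HousingPredictor`, `train_and_evaluate` and the hyperparameters (hidden layers of 64 and 32 units, Adam at `lr=1e-2`, 300 full-batch epochs) are illustrative assumptions. The scaler is fit on the training split only, and targets are reshaped to `(N, 1)` to match the model output.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

class HousingPredictor(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.net(x)

def train_and_evaluate(X, y, epochs=300, lr=1e-2, seed=42):
    torch.manual_seed(seed)
    # 2. 80/20 split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    # 3. Standardize with statistics from the training split only
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    # 4. Tensors: float32 inputs, targets shaped (N, 1)
    X_train_t = torch.tensor(X_train, dtype=torch.float32)
    X_test_t = torch.tensor(X_test, dtype=torch.float32)
    y_train_t = torch.tensor(y_train, dtype=torch.float32).reshape(-1, 1)
    y_test_t = torch.tensor(y_test, dtype=torch.float32).reshape(-1, 1)
    # 5-6. Model, loss, optimizer
    model = HousingPredictor(input_size=X_train.shape[1])
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    # 7. Full-batch training loop
    for _ in range(epochs):
        model.train()
        optimizer.zero_grad()
        loss = loss_fn(model(X_train_t), y_train_t)
        loss.backward()
        optimizer.step()
    # 8. Evaluate on the held-out split
    model.eval()
    with torch.no_grad():
        test_predictions = model(X_test_t)
        test_loss = loss_fn(test_predictions, y_test_t).item()
    return test_loss, y_test_t, test_predictions

# Synthetic stand-in data (hypothetical): 500 samples, 8 features, linear target plus noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 8))
y_demo = X_demo @ rng.normal(size=8) + 0.1 * rng.normal(size=500)

test_loss, y_test_t, test_predictions = train_and_evaluate(X_demo, y_demo)
print(f"Test MSE: {test_loss:.4f} (variance of test targets: {y_test_t.var().item():.4f})")
```

The scatter plot in step 9 can then use `y_test_t.numpy()` and `test_predictions.numpy()` directly.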

## 🧪 Challenge Problems

### Challenge 1: Debug a Non-Training Model

Create a model that SHOULD learn but doesn't (loss stays constant). Include at least 3 common bugs. Then write debugging code to identify each bug.

In [None]:
# =========== CHALLENGE 1 ===========
# Create a buggy model
class BuggyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Add bugs here
        
    def forward(self, x):
        # Add bugs here
        pass

# Debugging function
def debug_model(model, X, y):
    """Identify why model isn't training"""
    print("Debugging Model...")
    
    # Check 1: Are gradients being computed?
    
    # Check 2: Are parameters updating?
    
    # Check 3: Is loss changing?
    
    # Check 4: Are weights reasonable?
    
    # Check 5: Is learning rate appropriate?
    
    print("Debugging complete!")

# Create and debug model
# model = BuggyModel()
# debug_model(model, X, y)
# ===================================
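
A sketch covering the first template check (are gradients being computed?) plus a count of trainable parameters, applied to two hypothetical buggy models. `debug_report` is a hypothetical helper that returns a dictionary instead of printing. `DetachedModel` cuts the autograd graph with `.detach()`; `UnregisteredModel` keeps its layer in a plain Python list, so `model.parameters()` never sees it (`nn.ModuleList` would register it). The checks for parameter updates, loss change and learning rate are left to you.

```python
import torch
import torch.nn as nn

def debug_report(model, X, y):
    """Hypothetical helper: one forward/backward pass, reporting common failure signs."""
    report = {}
    trainable = [(name, p) for name, p in model.named_parameters() if p.requires_grad]
    # Zero usually means layers are not registered, or every parameter is frozen
    report["trainable_params"] = sum(p.numel() for _, p in trainable)

    model.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    # False means the graph was cut (e.g. .detach() or torch.no_grad() inside forward)
    report["loss_connected"] = loss.requires_grad
    if loss.requires_grad:
        loss.backward()
        # Registered parameters that received no gradient signal
        report["params_without_grad"] = [
            name for name, p in trainable
            if p.grad is None or bool(torch.all(p.grad == 0))
        ]
    return report

class DetachedModel(nn.Module):      # bug: output detached from the graph
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3, 1)

    def forward(self, x):
        return self.fc(x).detach()

class UnregisteredModel(nn.Module):  # bug: plain list, so the layer is not registered
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(3, 1)]

    def forward(self, x):
        return self.layers[0](x)

torch.manual_seed(0)
X, y = torch.randn(16, 3), torch.randn(16, 1)
for m in (nn.Linear(3, 1), DetachedModel(), UnregisteredModel()):
    print(type(m).__name__, debug_report(m, X, y))
```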

### Challenge 2: Create a Learning Rate Finder

Implement a learning rate finder that:
1. Trains the model with exponentially increasing learning rates
2. Plots loss vs learning rate
3. Identifies optimal learning rate range

In [None]:
# =========== CHALLENGE 2 ===========
def find_learning_rate(model, X, y, min_lr=1e-5, max_lr=1, steps=100):
    """Find optimal learning rate range"""
    
    # Your implementation here
    
    # Return best learning rate
    return best_lr

# Usage:
# model = YourModel()
# optimal_lr = find_learning_rate(model, X_train, y_train)
# print(f"Optimal learning rate: {optimal_lr}")
# ===================================
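
A sketch of a learning-rate range test, under these assumptions: plain SGD, one full-batch step per learning rate, MSE loss, and an early stop once the loss exceeds four times its best value. It trains a deep copy so the caller's weights stay untouched, and it returns the history along with the suggestion so you can plot loss against learning rate. Taking one tenth of the learning rate with the lowest loss is a common rule of thumb, not a guarantee of the best value.

```python
import copy
import math

import torch
import torch.nn as nn

def find_learning_rate(model, X, y, min_lr=1e-5, max_lr=1, steps=100):
    """LR range test sketch: returns (suggested_lr, lrs, losses)."""
    model = copy.deepcopy(model)  # experiment on a copy
    optimizer = torch.optim.SGD(model.parameters(), lr=min_lr)
    loss_fn = nn.MSELoss()
    gamma = (max_lr / min_lr) ** (1 / (steps - 1))  # constant ratio between consecutive LRs

    lrs, losses = [], []
    lr = min_lr
    for _ in range(steps):
        for group in optimizer.param_groups:
            group["lr"] = lr
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()

        lrs.append(lr)
        losses.append(loss.item())
        # Stop early once the loss diverges
        if not math.isfinite(losses[-1]) or losses[-1] > 4 * min(losses):
            break
        lr *= gamma

    best_idx = min(range(len(losses)), key=losses.__getitem__)
    best_lr = lrs[best_idx] / 10  # rule of thumb: back off from the loss minimum
    return best_lr, lrs, losses

torch.manual_seed(0)
X = torch.randn(128, 3)
y = X.sum(dim=1, keepdim=True)
net = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))
best_lr, lrs, losses = find_learning_rate(net, X, y)
print(f"Suggested learning rate: {best_lr:.2e}")
```

To see the curve, plot on a log axis with `plt.semilogx(lrs, losses)`.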

## 📊 Assessment Questions

Answer these questions in markdown cells:

### Q1: What's the difference between these two lines?
```python
x = torch.tensor([1.0, 2.0, 3.0])
x = torch.Tensor([1.0, 2.0, 3.0])
```

### Q2: When should you use `model.train()` vs `model.eval()`?

### Q3: Why do we need both `loss.backward()` and `optimizer.step()`? Can't one function do both?

### Q4: What happens if you forget to call `optimizer.zero_grad()` in a training loop?

### Q5: How do you move a model to GPU? What common error occurs if you forget something?

### Q6: What's the difference between saving `model.state_dict()` and saving the entire model with `torch.save(model, ...)`?

### Q7: Why is it important to use `with torch.no_grad():` during inference?

## ✅ Progress Tracker

Check off exercises as you complete them:

- [ ] Exercise 1A: Manual Gradient Verification
- [ ] Exercise 1B: Gradient Accumulation
- [ ] Exercise 2A: Convert NumPy Network
- [ ] Exercise 2B: Test Network
- [ ] Exercise 3A: Fix Buggy Training Loop
- [ ] Exercise 3B: Write Correct Training Loop
- [ ] Exercise 4A: Save/Load Model Weights
- [ ] Exercise 4B: Training Checkpoints
- [ ] Exercise 5: Boston Housing Prediction
- [ ] Challenge 1: Debug Non-Training Model
- [ ] Challenge 2: Learning Rate Finder
- [ ] Assessment Questions Q1-Q7

## 🏆 Completion Certificate

Once you complete all the exercises, you will have mastered:
- ✅ PyTorch tensor operations and autograd
- ✅ Building neural networks with nn.Module
- ✅ Complete training loops
- ✅ Model persistence
- ✅ Real-world applications
- ✅ Debugging skills

**You're ready for Lecture 2: Advanced PyTorch Patterns!** 🎉