# Evaluating and Improving Models

## Overview
This notebook covers techniques for evaluating neural network performance and improving training: freezing layers (as used in transfer learning), weight initialization, validation loops, and accuracy metrics.

In [2]:
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
import pandas as pd
import torchmetrics

# Create a Sequential model with multiple layers
# This allows us to access layers by index (0, 1, 2, etc.)
model = nn.Sequential(
    nn.Linear(10, 8),   # Layer 0: Input layer
    nn.ReLU(),          # Layer 1: Activation
    nn.Linear(8, 5),    # Layer 2: Hidden layer
    nn.ReLU(),          # Layer 3: Activation
    nn.Linear(5, 2)     # Layer 4: Output layer
)

print("Model architecture:")
print(model)
print("\nModel parameters:")
for name, param in model.named_parameters():
    print(f"{name}: shape {param.shape}, requires_grad={param.requires_grad}")

Model architecture:
Sequential(
  (0): Linear(in_features=10, out_features=8, bias=True)
  (1): ReLU()
  (2): Linear(in_features=8, out_features=5, bias=True)
  (3): ReLU()
  (4): Linear(in_features=5, out_features=2, bias=True)
)

Model parameters:
0.weight: shape torch.Size([8, 10]), requires_grad=True
0.bias: shape torch.Size([8]), requires_grad=True
2.weight: shape torch.Size([5, 8]), requires_grad=True
2.bias: shape torch.Size([5]), requires_grad=True
4.weight: shape torch.Size([2, 5]), requires_grad=True
4.bias: shape torch.Size([2]), requires_grad=True


### Understanding Layer Freezing

**How it works:**
- `param.requires_grad = True`: Parameter will be updated during training
- `param.requires_grad = False`: Parameter is frozen (no updates)

**Layer naming in Sequential models:**
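- Modules are indexed by position: `0`, `1`, `2`, etc.
- Only modules with learnable parameters (here the `nn.Linear` layers) show up in `named_parameters()`; each contributes a `weight` and a `bias`. Activation modules such as `nn.ReLU` have none.
- Full name format: `{layer_index}.{parameter_name}`
- Example: `'0.weight'` = weight of the first `nn.Linear` layer (index 0), `'2.bias'` = bias of the second `nn.Linear` layer (index 2)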
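
The naming scheme can be checked directly by listing the parameter names of a small `nn.Sequential`. The sketch below rebuilds the architecture from the first cell under the hypothetical name `demo_model`, so it does not touch the notebook's own `model`.

```python
import torch.nn as nn

# Same architecture as the model defined at the top of the notebook
demo_model = nn.Sequential(
    nn.Linear(10, 8),
    nn.ReLU(),
    nn.Linear(8, 5),
    nn.ReLU(),
    nn.Linear(5, 2),
)

# Activation modules own no parameters, so indices 1 and 3 never appear
param_names = [name for name, _ in demo_model.named_parameters()]
print(param_names)

# Parameters can also be reached through the module index
first_layer_weight = demo_model[0].weight
print(first_layer_weight.shape)
```

Indexing the model (`demo_model[0]`) and looking up a parameter by name are two routes to the same tensor, which is why either can be used when freezing.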

**Common freezing patterns:**
```python
# Freeze all layers
for param in model.parameters():
    param.requires_grad = False

# Freeze only the first N parameterized layers (choose N yourself)
N = 1
for i, (name, param) in enumerate(model.named_parameters()):
    if i < N * 2:  # *2 because each Linear layer contributes a weight and a bias
        param.requires_grad = False

# Freeze by name pattern (for models whose submodules have names such as 'conv1')
for name, param in model.named_parameters():
    if 'conv' in name:  # Freeze all convolutional layers
        param.requires_grad = False
```

**Use case:** In transfer learning, you freeze early layers (that detect basic features) and only train the later layers (that learn task-specific patterns).
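
A minimal sketch of that transfer-learning pattern follows. The names `backbone`, `head`, and `transfer_model` are hypothetical, and the "pre-trained" backbone is just a randomly initialized stand-in; in a real project its weights would come from a checkpoint.

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained network (weights would normally be loaded from a checkpoint)
backbone = nn.Sequential(
    nn.Linear(10, 8),
    nn.ReLU(),
    nn.Linear(8, 5),
    nn.ReLU(),
)
head = nn.Linear(5, 2)  # New task-specific output layer

# Freeze every parameter in the backbone
for param in backbone.parameters():
    param.requires_grad = False

transfer_model = nn.Sequential(backbone, head)

# Hand only the trainable parameters to the optimizer
trainable = [p for p in transfer_model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)

n_trainable = sum(p.numel() for p in trainable)
n_total = sum(p.numel() for p in transfer_model.parameters())
print(f"Trainable parameters: {n_trainable} of {n_total}")
```

Filtering the optimizer's parameter list is optional, since frozen parameters receive no gradient, but it makes the intent explicit.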

## Example 1: Freezing Model Layers

**What is Layer Freezing?**
Freezing layers means preventing certain layers from being updated during training by setting `requires_grad=False` on their parameters.

**Why Freeze Layers?**
- **Transfer Learning**: Keep pre-trained weights from earlier layers unchanged
- **Faster Training**: Only update weights in unfrozen layers
- **Prevent Overfitting**: Preserve learned features from pre-trained models
- **Fine-tuning**: Adapt pre-trained models to new tasks with limited data

**When to Freeze Layers:**
- Using a pre-trained model on a similar task
- Training on a small dataset
- Early layers have learned useful general features
- Fine-tuning only the last few layers for a new task

In [2]:
# Freeze the weights of the first two linear layers by parameter name
for name, param in model.named_parameters():
    # '0.weight' belongs to layer 0, '2.weight' to layer 2
    if name in ('0.weight', '2.weight'):
        param.requires_grad = False  # the matching biases stay trainable
        print(f"Frozen: {name}")

print("\nAfter freezing:")
for name, param in model.named_parameters():
    print(f"{name}: requires_grad={param.requires_grad}")

Frozen: 0.weight
Frozen: 2.weight

After freezing:
0.weight: requires_grad=False
0.bias: requires_grad=True
2.weight: requires_grad=False
2.bias: requires_grad=True
4.weight: requires_grad=True
4.bias: requires_grad=True
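
One way to confirm that freezing has the intended effect is to take a single optimizer step and compare the parameters before and after. The sketch below uses a smaller hypothetical `check_model` and random data invented for the check.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
check_model = nn.Sequential(nn.Linear(10, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze only the first layer's weight
check_model[0].weight.requires_grad = False

frozen_before = check_model[0].weight.clone()
trainable_before = check_model[2].weight.clone()

optimizer = torch.optim.SGD(
    [p for p in check_model.parameters() if p.requires_grad], lr=0.1
)

# One training step on random data
inputs = torch.randn(16, 10)
targets = torch.randn(16, 2)
loss = nn.MSELoss()(check_model(inputs), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

frozen_unchanged = torch.equal(check_model[0].weight, frozen_before)
trainable_changed = not torch.equal(check_model[2].weight, trainable_before)
print(frozen_unchanged, trainable_changed)
```

The frozen weight never receives a gradient (its `.grad` stays `None`), so the optimizer has nothing to apply to it.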


## Example 2: Weight Initialization

This example demonstrates:
- **Creating layers** with `nn.Linear`
- **Initializing weights** using `nn.init.uniform_()`
- **Building a model** with initialized layers

**What is Weight Initialization?**
Weight initialization is the process of setting initial values for the weights in a neural network before training begins. Proper initialization is crucial for:
- Faster convergence during training
- Avoiding vanishing/exploding gradients
- Breaking symmetry (ensuring neurons learn different features)

**Uniform Initialization:**
$$\text{Uniform}(a, b): \text{weights} \sim U(a, b)$$

The `nn.init.uniform_()` function:
- Initializes weights from a uniform distribution
- Default range: `[0, 1]` (`a=0.0`, `b=1.0`). The range `[-1/√n, 1/√n]`, where n = number of input features, is what `nn.Linear` itself uses when a layer is created
- The `_` suffix means it modifies weights **in-place**
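
The ranges involved can be inspected directly. The sketch below compares the weights `nn.Linear` starts with, the result of `nn.init.uniform_()` with its default arguments (`a=0.0`, `b=1.0`), and an explicit symmetric range; the variable names are chosen for illustration.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(16, 32)

# Default nn.Linear initialization: values lie within +/- 1/sqrt(in_features)
default_bound = 1 / math.sqrt(layer.in_features)
default_max = layer.weight.abs().max().item()

# nn.init.uniform_ with no extra arguments draws from U(0, 1)
nn.init.uniform_(layer.weight)
uniform_min = layer.weight.min().item()
uniform_max = layer.weight.max().item()

# An explicit symmetric range, passed through a and b
nn.init.uniform_(layer.weight, a=-default_bound, b=default_bound)
custom_max = layer.weight.abs().max().item()

print(default_bound, default_max, uniform_min, uniform_max, custom_max)
```

Note that the default `U(0, 1)` call produces only non-negative weights, so passing `a` and `b` explicitly is usually the safer habit.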

**Code breakdown:**
```python
layer0 = nn.Linear(16, 32)              # Create layer
nn.init.uniform_(layer0.weight)          # Re-initialize its weights in place from U(0, 1)
```

**Common Initialization Methods:**

1. **Uniform Distribution**: `nn.init.uniform_(tensor, a, b)`
   - Random values uniformly distributed between a and b
   - Use: General purpose, simple initialization

2. **Normal Distribution**: `nn.init.normal_(tensor, mean, std)`
   - Random values drawn from a normal distribution
   - Use: When you want values clustered around the mean

3. **Xavier/Glorot**: `nn.init.xavier_uniform_(tensor)`
   - Maintains variance of activations across layers
   - Use: With sigmoid or tanh activations

4. **He/Kaiming**: `nn.init.kaiming_uniform_(tensor)`
   - Designed for ReLU activations
   - Use: With ReLU or LeakyReLU (most common in modern networks)

5. **Constant**: `nn.init.constant_(tensor, value)`
   - Sets all values in the tensor to the same value
   - Use: Usually for biases (often initialized to 0)
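
As a sketch of how these functions are typically combined, the hypothetical helper `init_linear` below applies He initialization to every `nn.Linear` weight and zeros the biases, using `Module.apply()` to visit each submodule.

```python
import math
import torch
import torch.nn as nn

def init_linear(module):
    """Hypothetical helper: He-initialize Linear weights and zero the biases."""
    if isinstance(module, nn.Linear):
        nn.init.kaiming_uniform_(module.weight, nonlinearity="relu")
        nn.init.zeros_(module.bias)

torch.manual_seed(0)
relu_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 64))
relu_model.apply(init_linear)  # apply() calls init_linear on every submodule

# With nonlinearity="relu", weights fall within +/- sqrt(6 / fan_in)
first_bound = math.sqrt(6 / 16)
first_max = relu_model[0].weight.abs().max().item()
print(first_max <= first_bound)
```

Swapping `kaiming_uniform_` for `xavier_uniform_` in the helper would give the variant suited to sigmoid or tanh networks.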

**Why Initialize Weights?**
- **Prevent symmetry**: If all weights start the same, all neurons learn the same features
- **Scale properly**: Too large → exploding gradients, too small → vanishing gradients
- **Faster training**: Good initialization helps the model learn faster
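
The symmetry problem can be made concrete. In the sketch below (hypothetical `sym_model`, random data invented for the demonstration), every weight starts at the same constant, and the gradient rows for all hidden neurons come out identical, so the neurons would keep receiving the same update.

```python
import torch
import torch.nn as nn

sym_model = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 1))

# Give every weight the same value and zero the biases
for module in sym_model:
    if isinstance(module, nn.Linear):
        nn.init.constant_(module.weight, 0.5)
        nn.init.zeros_(module.bias)

torch.manual_seed(0)
inputs = torch.rand(8, 3)   # positive inputs keep the ReLU units active
targets = torch.rand(8, 1)

loss = nn.MSELoss()(sym_model(inputs), targets)
loss.backward()

# Each row holds the gradient for one hidden neuron
hidden_grad = sym_model[0].weight.grad
rows_identical = all(
    torch.allclose(hidden_grad[0], hidden_grad[i]) for i in range(1, 4)
)
print(rows_identical)
```

Random initialization breaks this tie, which is why constant initialization is reserved for biases.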

**Best Practices:**
- Use **He initialization** for ReLU networks (most common)
- Use **Xavier initialization** for sigmoid/tanh networks
- Initialize **biases to zero** or small constants
- Don't initialize all weights to zero or to the same value

**Note:** PyTorch automatically initializes weights when you create a layer, but you can override this with custom initialization for better performance on specific tasks.

In [3]:
# Using uniform initialization for layer0 and layer1 weights

layer0 = nn.Linear(16, 32)
layer1 = nn.Linear(32, 64)

nn.init.uniform_(layer0.weight)
nn.init.uniform_(layer1.weight)

model = nn.Sequential(
    layer0,
    layer1
)

print(model)

Sequential(
  (0): Linear(in_features=16, out_features=32, bias=True)
  (1): Linear(in_features=32, out_features=64, bias=True)
)


## Example 3: Model Validation

This example demonstrates:
- **Creating a validation dataset** from a pandas DataFrame
- **Model evaluation mode** using `model.eval()`
- **Validation loop** with `torch.no_grad()`
- **Computing validation loss** without updating weights

**What is Model Validation?**
Validation is the process of evaluating a trained model on unseen data to assess its performance and generalization ability.

**Key Components:**

1. **`model.eval()`**: 
   - Puts model in evaluation mode
   - Turns dropout into a no-op and makes batch normalization use its running statistics instead of per-batch statistics (if such layers are present)
   - Important for consistent predictions

2. **`torch.no_grad()`**:
   - Disables gradient computation
   - Reduces memory usage
   - Speeds up computation (no need to track gradients)
   - Essential during validation/inference

3. **Validation Loss Calculation**:
   - Forward pass through model
   - Compute loss using criterion
   - Accumulate loss over all batches
   - **No backward pass or optimizer step**
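
The effect of the first two components can be observed on a tiny model with a dropout layer. This is a minimal sketch; `drop_model` and its input are invented for the demonstration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop_model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
inputs = torch.randn(2, 4)

drop_model.eval()  # dropout becomes a no-op
with torch.no_grad():
    out_a = drop_model(inputs)
    out_b = drop_model(inputs)

deterministic = torch.equal(out_a, out_b)  # same input, same output in eval mode
tracks_grad = out_a.requires_grad          # no graph is built under no_grad
print(deterministic, tracks_grad)

drop_model.train()  # restore training behaviour
print(drop_model.training)
```

The two calls are independent: `model.eval()` changes layer behaviour, while `torch.no_grad()` only switches off gradient tracking, so validation code normally uses both.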

**Validation vs Training:**
- **Training**: `model.train()` + gradients + optimizer updates
- **Validation**: `model.eval()` + no gradients + no updates
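
Put together, the two modes usually alternate once per epoch. The following is a sketch on synthetic regression data invented for illustration; `loop_model`, `train_loader`, and `val_loader` are hypothetical names.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
# Synthetic data: the target is the sum of the two features
X = torch.randn(40, 2)
y = X.sum(dim=1, keepdim=True)
train_loader = DataLoader(TensorDataset(X[:32], y[:32]), batch_size=8, shuffle=True)
val_loader = DataLoader(TensorDataset(X[32:], y[32:]), batch_size=8)

loop_model = nn.Linear(2, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(loop_model.parameters(), lr=0.1)

val_history = []
for epoch in range(5):
    loop_model.train()                      # training: gradients + updates
    for features, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(loop_model(features), targets)
        loss.backward()
        optimizer.step()

    loop_model.eval()                       # validation: no gradients, no updates
    running = 0.0
    with torch.no_grad():
        for features, targets in val_loader:
            running += criterion(loop_model(features), targets).item()
    val_history.append(running / len(val_loader))

print(val_history)
```

Keeping the validation samples out of `train_loader` is what makes the recorded losses a measure of generalization rather than memorization.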

**The Dataset:**
- 4 animals with features: Height and Weight
- Target: Age prediction
- Features (X): Height, Weight (2 features)
- Labels (y): Age (1 output)

**Why Validation is Important:**
- Check for overfitting (model memorizing training data)
- Compare different model architectures
- Tune hyperparameters (learning rate, batch size, etc.)
- Decide when to stop training (early stopping)

**Common Practice:**
Split your data into:
- **Training set** (70-80%): Used to train the model
- **Validation set** (10-15%): Used to tune and evaluate during training
- **Test set** (10-15%): Final evaluation on completely unseen data
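
One way to produce such a split is `torch.utils.data.random_split`. The sketch below assumes a hypothetical 100-sample dataset and a 70/15/15 split; the seeded generator makes the split reproducible.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Synthetic dataset of 100 samples, invented for illustration
full_dataset = TensorDataset(torch.randn(100, 2), torch.randn(100))

generator = torch.Generator().manual_seed(42)  # reproducible split
train_set, val_set, test_set = random_split(
    full_dataset, [70, 15, 15], generator=generator
)
print(len(train_set), len(val_set), len(test_set))
```

Each returned subset can be passed to its own `DataLoader`, with `shuffle=True` typically reserved for the training set.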

In [23]:
animals = pd.DataFrame({
    'Name': ['Dog', 'Cat', 'Bird', 'Fish'],
    'Height': [60, 25, 15, 5],
    'Weight': [30, 5, 0.5, 0.1],
    'Age': [5, 3, 2, 1]
})


X = animals.iloc[:, 1:-1].to_numpy().astype(float)
y = animals.iloc[:, -1].to_numpy().astype(float)

# Set random seed for reproducibility
torch.manual_seed(42)

model = nn.Sequential(
    nn.Linear(2, 8),  
    nn.Linear(8, 4),
    nn.Linear(4, 1)
)
dataset = TensorDataset(torch.tensor(X).float(), torch.tensor(y).float())
validationloader = DataLoader(dataset, batch_size=3, shuffle=False)  # shuffle=False for consistent results
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

num_epochs = 5
for epoch in range(num_epochs):
    for data in validationloader:
        optimizer.zero_grad()

        features, targets = data
        features = features.float()
        targets = targets.float()
        prediction = model(features)
        loss = criterion(prediction.squeeze(-1), targets)  # squeeze(-1) keeps the batch dimension, even for a batch of one
        loss.backward()
        optimizer.step()

# ---------- Ignore the training code above for now; the validation loop starts below ----------

model.eval()

validation_loss = 0.0
with torch.no_grad():
    for features, labels in validationloader:

        output = model(features).squeeze(-1)  # Only squeeze last dimension

        loss = criterion(output, labels)

        validation_loss += loss.item()

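# Average the per-batch mean losses over the number of batches
validation_loss_epoch = validation_loss / len(validationloader)
print(f"Validation loss (mean per batch): {validation_loss_epoch}")
model.train()  # Switch back to training mode

Validation loss (mean per batch): 0.8905611485242844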
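
With 4 samples and `batch_size=3`, the loader yields batches of 3 and 1, so averaging the per-batch losses gives the single-sample batch as much weight as the three-sample one. A possible alternative, sketched here with a hypothetical untrained `weighted_model`, weights each batch by its size so the result matches the loss over the whole dataset.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
features_all = torch.tensor([[60, 30], [25, 5], [15, 0.5], [5, 0.1]]).float()
targets_all = torch.tensor([5, 3, 2, 1]).float()
loader = DataLoader(TensorDataset(features_all, targets_all), batch_size=3)

weighted_model = nn.Linear(2, 1)  # untrained stand-in model
criterion = nn.MSELoss()

weighted_model.eval()
total_loss, total_samples = 0.0, 0
with torch.no_grad():
    for features, labels in loader:
        output = weighted_model(features).squeeze(-1)
        batch_size = labels.size(0)
        # MSELoss returns the batch mean, so scale it back up by the batch size
        total_loss += criterion(output, labels).item() * batch_size
        total_samples += batch_size

    per_sample_loss = total_loss / total_samples
    # Reference value: one pass over the full dataset at once
    full_pass_loss = criterion(
        weighted_model(features_all).squeeze(-1), targets_all
    ).item()

print(per_sample_loss, full_pass_loss)
```

When every batch has the same size the two averaging schemes agree, so the distinction only matters for a ragged final batch.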


In [None]:
# Classification dataset (Height, Weight) -> Class (0=Small, 1=Medium, 2=Large)
X_class = torch.tensor([[60, 30], [25, 5], [15, 0.5], [5, 0.1]]).float()
y_class = torch.tensor([2, 1, 0, 0])  # Class labels: 0, 1, 2

# Classification model (3 outputs for 3 classes)
torch.manual_seed(42)
class_model = nn.Sequential(
    nn.Linear(2, 8),
    nn.ReLU(),
    nn.Linear(8, 3)
)

class_dataset = TensorDataset(X_class, y_class)
class_loader = DataLoader(class_dataset, batch_size=2, shuffle=False)

# Accuracy metric (the model is untrained here, so a low score is expected)
class_model.eval()

metric = torchmetrics.Accuracy(task="multiclass", num_classes=3)
with torch.no_grad():
    for features, labels in class_loader:
        output = class_model(features)
        metric.update(output, labels)

accuracy = metric.compute()
print(f"Accuracy: {accuracy:.4f}")
metric.reset()

Accuracy: 0.2500
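
As a cross-check that does not depend on `torchmetrics`, multiclass accuracy can be computed by hand: take the `argmax` over the class dimension and compare with the labels. The logits below are hypothetical values invented for illustration, not outputs of `class_model`.

```python
import torch

# Hypothetical logits for 4 samples and 3 classes
logits = torch.tensor([
    [0.1, 0.2, 2.0],
    [0.3, 1.5, 0.2],
    [0.9, 1.1, 0.0],
    [2.0, 0.1, 0.3],
])
labels = torch.tensor([2, 1, 0, 0])

predicted = logits.argmax(dim=1)  # most likely class per sample
manual_accuracy = (predicted == labels).float().mean().item()
print(f"Accuracy: {manual_accuracy:.4f}")
```

Three of the four predictions match their labels here, so the sketch reports an accuracy of 0.75.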
