## <p style="font-family: Georgia; font-weight: normal; letter-spacing: 2px; color: #fabd2f;     font-size: 140%; text-align: left; padding: 0px; border-bottom: 3px solid #fabd2f;">Model Fine-Tuning with Custom Webcam Dataset</p> 


This notebook fine-tunes the pre-trained `MyCNN` model on additional custom data collected from webcam recordings (see `../scripts/`). It explores several fine-tuning strategies to improve emotion detection performance.

## <p style="font-family: Georgia; font-weight: normal; letter-spacing: 2px; color: #fabd2f;     font-size: 140%; text-align: left; padding: 0px; border-bottom: 3px solid #fabd2f;">Dataset Structure</p>


The custom dataset contains 48×48 grayscale images organized by emotion class. All images are preprocessed to match the model's input requirements:

```bash
dataset/
    0_happy/
        img_run1_0000.jpg
        img_run1_0001.jpg
        ...
    1_sad/
        img_run1_0000.jpg
        ...
    2_neutral/
        ...
```
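
Because an action item later in this notebook is to oversample the sad class, it can help to check how many images each class folder holds before training. The sketch below is one minimal way to do that with the standard library only; the helper name `count_images_per_class` and the `.jpg` filter are assumptions for illustration, not part of the original scripts.

```python
from pathlib import Path


def count_images_per_class(root, suffix=".jpg"):
    """Return {class_folder_name: image_count} for each subfolder of root.

    Hypothetical helper: assumes one subfolder per class, as in the tree above.
    """
    counts = {}
    for class_dir in sorted(Path(root).iterdir()):
        if class_dir.is_dir():
            # Count only files with the expected image suffix
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.suffix.lower() == suffix
            )
    return counts
```

For this notebook the call would be `count_images_per_class("../dataset")`.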

In [2]:
import torch
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
from model import MyCNN

device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')
print(f'Using device: {device}')

transform = T.Compose([
    T.Grayscale(),
    T.Resize((48, 48)),
    T.ToTensor(),
    T.Normalize([0.5], [0.5])
])

Using device: mps
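
To make the effect of the transform concrete: `ToTensor()` scales 8-bit pixel values to [0, 1], and `Normalize([0.5], [0.5])` then computes `(x - 0.5) / 0.5`, which maps that range to [-1, 1]. The pure-Python sketch below reproduces only this arithmetic for a single pixel; the function name `normalize_pixel` is made up for illustration.

```python
def normalize_pixel(value_8bit, mean=0.5, std=0.5):
    """Mimic ToTensor() followed by Normalize([mean], [std]) for one pixel."""
    scaled = value_8bit / 255.0   # ToTensor: uint8 [0, 255] -> float [0, 1]
    return (scaled - mean) / std  # Normalize: (x - mean) / std


for v in (0, 255):
    print(v, "->", normalize_pixel(v))
```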


## <p style="font-family: Georgia; font-weight: normal; letter-spacing: 2px; color: #fabd2f;     font-size: 140%; text-align: left; padding: 0px; border-bottom: 3px solid #fabd2f;">Load Dataset</p>


Load the custom dataset from disk with appropriate image transformations (grayscale conversion, resizing, normalization).

In [24]:
dataset = ImageFolder("../dataset", transform=transform)
train_loader = DataLoader(dataset, batch_size=16, shuffle=True)

## <p style="font-family: Georgia; font-weight: normal; letter-spacing: 2px; color: #fabd2f;     font-size: 140%; text-align: left; padding: 0px; border-bottom: 3px solid #fabd2f;">Load Pre-trained Model</p>


Initialize the model and load the original weights from `best_model.pth`.

In [None]:
model = MyCNN().to(device)
state = torch.load("models/best_model.pth", map_location=device, weights_only=True)
model.load_state_dict(state, strict=False)  # strict=False tolerates key mismatches after architecture changes, but skips them silently
model.eval()

MyCNN(
  (conv1): Sequential(
    (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU()
    (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (7): Dropout(p=0.05, inplace=False)
  )
  (conv2): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
    (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU()
    (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (7): Dropout
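
One caveat with `strict=False`: mismatched keys are skipped silently, so a renamed layer would simply keep its random initialization. `load_state_dict` returns an object with `missing_keys` and `unexpected_keys` that can be inspected. The sketch below shows the same comparison in plain Python so that it runs without a checkpoint; the helper name `diff_state_keys` and the example key names are illustrative assumptions.

```python
def diff_state_keys(model_keys, checkpoint_keys):
    """Compare parameter names the way a non-strict load would see them.

    missing:    expected by the model but absent from the checkpoint
                (these parameters keep their initial values)
    unexpected: present in the checkpoint but unknown to the model
                (these tensors are ignored)
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Made-up key names, only to illustrate a renamed classifier layer.
missing, unexpected = diff_state_keys(
    ["conv1.0.weight", "fc.0.weight"],
    ["conv1.0.weight", "classifier.0.weight"],
)
print("missing:", missing)
print("unexpected:", unexpected)
```

With the real objects this would be `diff_state_keys(model.state_dict().keys(), state.keys())`; both lists should be empty if the checkpoint matches the architecture.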

## <p style="font-family: Georgia; font-weight: normal; letter-spacing: 2px; color: #fabd2f;     font-size: 140%; text-align: left; padding: 0px; border-bottom: 3px solid #fabd2f;">Strategy: Conservative Fine-Tuning</p>

**Trade-off Analysis:**
- **Full model fine-tuning**: Risk of *catastrophic forgetting* (losing original learned features)
- **Classifier-only fine-tuning**: Safer but limited to existing feature representations

**Approach:** Start conservatively by freezing all convolutional layers and training only the classifier head. If the results are insufficient, progressively unfreeze the conv layers, starting with the last one.

In [None]:
# Freeze all convolutional layers to preserve learned features
for param in model.conv1.parameters():
    param.requires_grad = False
for param in model.conv2.parameters():
    param.requires_grad = False
for param in model.conv3.parameters():
    param.requires_grad = False

# Only train the classifier (fully connected layers)
for param in model.fc.parameters():
    param.requires_grad = True
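
The "progressively unfreeze" plan can be written down as a small schedule: stage 0 trains only the classifier, and each later stage unfreezes one more block, starting from the one closest to the output. The sketch below is pure Python and only decides which block names are trainable. The function name `trainable_blocks` and the stage numbering are assumptions for illustration, while the block names `conv1`, `conv2`, `conv3`, and `fc` come from the cell above.

```python
BLOCKS = ["conv1", "conv2", "conv3", "fc"]  # ordered from input to output


def trainable_blocks(stage, blocks=BLOCKS):
    """Return the block names to train at a given unfreezing stage.

    stage 0 -> classifier only; stage 1 -> last conv block + classifier; ...
    """
    if stage < 0:
        raise ValueError("stage must be non-negative")
    n = min(stage + 1, len(blocks))  # never ask for more blocks than exist
    return blocks[-n:]


for stage in range(3):
    print(stage, trainable_blocks(stage))
```

Applied to the model, this could look like `p.requires_grad = name.split(".")[0] in trainable_blocks(stage)` inside a loop over `model.named_parameters()`. Note that `model.train()` also puts the BatchNorm layers of frozen blocks into training mode, so their running statistics keep updating unless those blocks are switched back with `.eval()`.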

## <p style="font-family: Georgia; font-weight: normal; letter-spacing: 2px; color: #fabd2f;     font-size: 140%; text-align: left; padding: 0px; border-bottom: 3px solid #fabd2f;">Training Loop (Round 1: Classifier Only)</p>


Train the classifier head with frozen convolutional layers. Monitor loss and accuracy over epochs.

In [27]:
optimizer = torch.optim.AdamW(model.fc.parameters(), lr=1e-4, weight_decay=0.01)
criterion = torch.nn.CrossEntropyLoss()

num_epochs = 20

for epoch in range(num_epochs):
    model.train()
    total_loss, correct, total = 0, 0, 0
    for imgs, labels in train_loader:
        imgs, labels = imgs.to(device), labels.to(device)
        
        optimizer.zero_grad()
        outputs = model(imgs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        _, preds = torch.max(outputs, 1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
        
    train_loss = total_loss / len(train_loader)
    train_acc = correct / total * 100
    
    print(f"Epoch [{epoch+1}/{num_epochs}] "
          f"Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f} ")

Epoch [1/20] Train Loss: 1.0676 | Train Acc: 65.2720 
Epoch [2/20] Train Loss: 1.0476 | Train Acc: 64.4351 
Epoch [3/20] Train Loss: 0.9315 | Train Acc: 69.0377 
Epoch [4/20] Train Loss: 0.8825 | Train Acc: 69.8745 
Epoch [5/20] Train Loss: 0.8594 | Train Acc: 67.7824 
Epoch [6/20] Train Loss: 0.9060 | Train Acc: 66.9456 
Epoch [7/20] Train Loss: 0.8217 | Train Acc: 68.2008 
Epoch [8/20] Train Loss: 0.7935 | Train Acc: 69.8745 
Epoch [9/20] Train Loss: 0.7963 | Train Acc: 66.5272 
Epoch [10/20] Train Loss: 0.7699 | Train Acc: 71.5481 
Epoch [11/20] Train Loss: 0.7494 | Train Acc: 71.1297 
Epoch [12/20] Train Loss: 0.7233 | Train Acc: 68.6192 
Epoch [13/20] Train Loss: 0.7296 | Train Acc: 70.7113 
Epoch [14/20] Train Loss: 0.7147 | Train Acc: 71.1297 
Epoch [15/20] Train Loss: 0.7297 | Train Acc: 70.2929 
Epoch [16/20] Train Loss: 0.7309 | Train Acc: 70.7113 
Epoch [17/20] Train Loss: 0.7053 | Train Acc: 71.1297 
Epoch [18/20] Train Loss: 0.7077 | Train Acc: 70.2929 
Epoch [19/20] Train

In [None]:
# Save the fine-tuned model
torch.save(model.state_dict(), "models/fine_tuned_classifier.pth")

## <p style="font-family: Georgia; font-weight: normal; letter-spacing: 2px; color: #fabd2f;     font-size: 140%; text-align: left; padding: 0px; border-bottom: 3px solid #fabd2f;">Results & Analysis (Round 1)</p>


Evaluate performance metrics and identify weaknesses.

**Observations:**
- Happy detection improved significantly
- Sad detection degraded further; the model struggles with this class (the reason turns up later: a label mapping bug, described at the end of this notebook)
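
Observations like these are per-class statements, which overall accuracy cannot show. A minimal sketch of per-class accuracy from lists of true and predicted labels follows; the function name and the toy labels are made up for illustration and are not results from this notebook.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Return {label: fraction of samples with that true label predicted correctly}."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in sorted(totals)}


# Toy labels only (0=happy, 1=sad, 2=neutral); not measured data.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 2, 1, 2, 1]
print(per_class_accuracy(y_true, y_pred))
```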

**Action Items:**
1. Unfreeze `conv3` (the last convolutional block) to allow deeper adaptation
2. Increase training data for sad emotion (oversample)
3. Adjust learning rate and epochs for better convergence
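
For the oversampling item, one common approach is to draw samples with probability inversely proportional to their class frequency. The sketch below computes such per-sample weights in plain Python; the function name `inverse_frequency_weights` and the toy labels are assumptions for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per sample so that every class has equal total weight."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# Toy labels: class 1 ("sad") is under-represented.
labels = [0, 0, 0, 0, 1, 2, 2]
weights = inverse_frequency_weights(labels)
print(weights)
```

These weights could be passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)`, with the sampler given to the `DataLoader` through `sampler=` in place of `shuffle=True` and `dataset.targets` used as the labels.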

## <p style="font-family: Georgia; font-weight: normal; letter-spacing: 2px; color: #fabd2f;     font-size: 140%; text-align: left; padding: 0px; border-bottom: 3px solid #fabd2f;">Training Loop (Round 2: Partial Unfreezing)</p>


Unfreeze the last convolutional block (`conv3`) to allow more adaptive feature learning while preserving the early-stage features from `conv1` and `conv2`.

In [None]:
dataset = ImageFolder("../dataset", transform=transform)
# Note: a bug was found here later. ImageFolder assigns labels in alphabetical order
# of the folder names (0=happy, 1=neutral, 2=sad), which differs from the original
# training setup (0=happy, 1=sad, 2=neutral).
train_loader = DataLoader(dataset, batch_size=16, shuffle=True)

model = MyCNN().to(device)
state = torch.load("models/best_model.pth", map_location=device, weights_only=True)
model.load_state_dict(state, strict=False)
model.eval()

# freeze early convolutional layers
for param in model.conv1.parameters():
    param.requires_grad = False
for param in model.conv2.parameters():
    param.requires_grad = False

# unfreeze the last conv layer for fine-tuning
for param in model.conv3.parameters():
    param.requires_grad = True

# classifier
for param in model.fc.parameters():
    param.requires_grad = True
    
# only trainable parameters
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()), lr=2e-4, weight_decay=0.01
)
criterion = torch.nn.CrossEntropyLoss()

num_epochs = 5

for epoch in range(num_epochs):
    model.train()
    total_loss, correct, total = 0, 0, 0
    for imgs, labels in train_loader:
        imgs, labels = imgs.to(device), labels.to(device)
        
        optimizer.zero_grad()
        outputs = model(imgs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
        _, preds = torch.max(outputs, 1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
        
    train_loss = total_loss / len(train_loader)
    train_acc = correct / total * 100
    
    print(f"Epoch [{epoch+1}/{num_epochs}] "
          f"Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f} ")

Epoch [1/5] Train Loss: 0.6520 | Train Acc: 62.9857 
Epoch [2/5] Train Loss: 0.3592 | Train Acc: 88.9571 
Epoch [3/5] Train Loss: 0.3039 | Train Acc: 90.3885 
Epoch [4/5] Train Loss: 0.2264 | Train Acc: 93.2515 
Epoch [5/5] Train Loss: 0.1880 | Train Acc: 95.2965 


In [None]:
# Save the updated fine-tuned model
torch.save(model.state_dict(), "models/fine_tuned_classifier10.pth")

## <p style="font-family: Georgia; font-weight: normal; letter-spacing: 2px; color: #fabd2f;     font-size: 140%; text-align: left; padding: 0px; border-bottom: 3px solid #fabd2f;">Iterative Refinement</p>

Now **systematically** improve performance by adjusting:
- Learning rate (coarser -> finer updates)
- Number of epochs (underfitting -> appropriate fit -> overfitting)
- Unfreezing strategy (conservative -> progressive)

Save checkpoints from each round for comparison, and run `main.py` for real-world human (me) evaluation, with logging enabled.
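
One lightweight way to keep the rounds comparable is to record each round's settings next to its checkpoint name. The sketch below is an assumption about how that bookkeeping could look, not what `main.py` does; `RoundConfig`, its fields, and the filename pattern are all hypothetical. The learning rates and epoch counts are the ones used in the two rounds above.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class RoundConfig:
    """Settings for one fine-tuning round (hypothetical bookkeeping)."""
    round_id: int
    lr: float
    num_epochs: int
    unfrozen: tuple  # names of the blocks being trained

    def checkpoint_name(self):
        blocks = "-".join(self.unfrozen)
        return f"round{self.round_id}_{blocks}_lr{self.lr:g}_ep{self.num_epochs}.pth"

    def to_json(self):
        # One JSON line per round, easy to append to a log file
        return json.dumps({**asdict(self), "checkpoint": self.checkpoint_name()})


rounds = [
    RoundConfig(1, 1e-4, 20, ("fc",)),
    RoundConfig(2, 2e-4, 5, ("conv3", "fc")),
]
for cfg in rounds:
    print(cfg.to_json())
```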

## <p style="font-family: Georgia; font-weight: normal; letter-spacing: 2px; color: #fabd2f;     font-size: 140%; text-align: left; padding: 0px; border-bottom: 3px solid #fabd2f;">🐛 Critical Bug Fixed: Label Mapping Mismatch</p>

To keep it real: during this process I found (pretty early on, after a couple of interactions) a critical bug:

**Problem Discovered:**
- Original training: Label mapping was `[happy=0, sad=1, neutral=2]`
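- Fine-tuning with `ImageFolder`: Labels are assigned **alphabetically** by folder name, resulting in `[happy=0, neutral=1, sad=2]`. Explicitly writing `dataset.class_to_idx = {"happy": 0, "sad": 1, "neutral": 2}` did not help, because `ImageFolder` builds its list of (path, label) samples when it is constructed, so reassigning the attribute afterwards does not relabel anything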
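
The mechanism is easy to reproduce without any images: `ImageFolder` sorts the class folder names and numbers them in that order. The sketch below mimics that rule in plain Python (the helper name `folder_class_to_idx` is made up) for both the unprefixed folder names and the numerically prefixed ones.

```python
def folder_class_to_idx(folder_names):
    """Mimic how ImageFolder numbers classes: sorted folder names, enumerated."""
    return {name: idx for idx, name in enumerate(sorted(folder_names))}


# Unprefixed names: "neutral" sorts before "sad", so the two indices swap.
print(folder_class_to_idx(["happy", "sad", "neutral"]))

# Numeric prefixes force the intended order.
print(folder_class_to_idx(["0_happy", "1_sad", "2_neutral"]))
```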

**Impact:**
- This mismatch completely confused the model (and me) during fine-tuning
- Neutral and sad predictions were being swapped
- Explained the unexpectedly poor "sad" detection performance

**Solution:**
- Rename dataset folders to `0_happy`, `1_sad`, `2_neutral` (use numerical prefixes for proper ordering)
- Or manually remap labels with a custom Dataset class (didn't want to implement, easier to rename folders)
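
For completeness, the remapping alternative does not necessarily need a custom Dataset class: `ImageFolder` accepts a `target_transform` callable that is applied to each label. The sketch below builds such a remapping table in plain Python; the helper name `build_label_remap` is hypothetical, and this route was not the one taken in this notebook.

```python
def build_label_remap(folder_names, intended):
    """Map ImageFolder's alphabetical indices to the intended label indices.

    folder_names: class folder names as found on disk
    intended:     {class_name: index used when the model was originally trained}
    """
    alphabetical = {name: idx for idx, name in enumerate(sorted(folder_names))}
    return {alphabetical[name]: intended[name] for name in folder_names}


remap = build_label_remap(
    ["happy", "neutral", "sad"],
    {"happy": 0, "sad": 1, "neutral": 2},
)
print(remap)
```

The table could then be applied with `ImageFolder("../dataset", transform=transform, target_transform=lambda y: remap[y])`, assuming the folders keep their unprefixed names.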

**Lesson:** Always validate label mappings between training and inference to avoid subtle bugs!