# Pretrained Model Training Pipeline

This notebook implements the training pipeline for our audio classification model using pretrained CNN architectures. The pipeline consists of several key steps:

## Overview of Steps
1. **Setup and Imports**: Initialize necessary dependencies and paths
2. **Device Selection**: Configure GPU/CPU device for training
3. **Model Architecture**: Choose and configure a pretrained CNN model
4. **Training Approach**: Select between transfer learning or fine-tuning
5. **Classifier Configuration**: Add custom classifier layers
6. **Training Execution**: Train the model with specified parameters


## 1. Set up paths and imports

In [None]:
import os

import torch
import torch.nn as nn

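# If the notebook was started from inside notebooks/, move up to the
# repository root so that the `src` package is importable.
if not os.path.exists("./notebooks"):
    %cd ..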

from src.training import do_train, do_test
from src.dataset import prepare_dataset_loaders, RGBSpectrogramDataset

# W&B logging is off by default
wandb_enabled = False

In [None]:
class Config:
    """Training hyperparameters passed to `do_train`."""
    def __init__(self, lr=0.001, epochs=40, batch_size=32):
        self.learning_rate = lr
        self.epochs = epochs
        self.batch_size = batch_size

### Optionally initialize W&B project

In [None]:
# Run this cell only if training should be logged to Weights & Biases
wandb_enabled = True

## 2. Choose device

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

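## 3. Choose pretrained model architecture
We provide two pretrained model options. Run only one of the two cells below, since each one overwrites `pretrained_model`:
- **EfficientNetB0**: A lighter architecture (about 5.3M parameters) with accuracy similar to VGG16
- **VGG16**: A much larger architecture (about 138M parameters), more computationally intensive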

In [None]:
# EfficientNetB0
from torchvision.models import efficientnet_b0
from torchvision.models import EfficientNet_B0_Weights

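weights = EfficientNet_B0_Weights.DEFAULT
pretrained_model = efficientnet_b0(weights=weights)
# Preprocessing transforms that match the pretrained weights
pre_trans = weights.transforms()
name_base = "EfficientNet_B0"
# Input width of the stock linear layer; reused for our own classifier
num_features = pretrained_model.classifier[1].in_features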

In [None]:
# VGG16
from torchvision.models import vgg16
from torchvision.models import VGG16_Weights

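weights = VGG16_Weights.DEFAULT
pretrained_model = vgg16(weights=weights)
# Preprocessing transforms that match the pretrained weights
pre_trans = weights.transforms()
name_base = "VGG16"
# Input width of the first stock linear layer; reused for our own classifier
num_features = pretrained_model.classifier[0].in_features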

## 4. Choose training approach
Choose between two training strategies:
1. **Transfer Learning**: Freezes the pretrained model's weights and only trains the custom classifier. This approach is:
   - Faster to train
   - Less prone to overfitting
   - Useful when target task is similar to original task
   
2. **Fine-Tuning**: Updates both pretrained model and classifier weights. This approach:
   - Can achieve better performance
   - Requires more training data
   - May need careful learning rate selection
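
The difference between the two strategies comes down to which parameters have `requires_grad` set. The following is a minimal sketch on a toy stand-in model, not the pretrained networks used in this notebook; `toy_model` and `count_trainable` are names made up for the illustration.

```python
import torch.nn as nn

# Stand-in for a pretrained network: a small "backbone" followed by a "head".
toy_model = nn.Sequential(
    nn.Sequential(nn.Linear(16, 8), nn.ReLU()),  # backbone
    nn.Linear(8, 2),                             # head
)
backbone = toy_model[0]


def count_trainable(model):
    """Count parameters that will receive gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


total = sum(p.numel() for p in toy_model.parameters())

# Fine-tuning: every parameter is trainable by default.
fine_tuning_trainable = count_trainable(toy_model)

# Transfer learning: freeze the backbone, leave the head trainable.
backbone.requires_grad_(False)
transfer_trainable = count_trainable(toy_model)

print(f"total={total}, fine-tuning={fine_tuning_trainable}, transfer={transfer_trainable}")
```

With the backbone frozen, only the head's parameters remain trainable, which is why transfer learning trains faster and needs less data.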

In [None]:
# Freeze base model (transfer learning)
pretrained_model.requires_grad_(False)
# Sanity check: the pretrained weights must no longer require gradients
assert not next(pretrained_model.parameters()).requires_grad
name = name_base + "_transfer_learning"

In [None]:
# Do not freeze model
name = name_base + "_fine_tuning"
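
When fine-tuning, one common way to handle learning-rate sensitivity is to give the pretrained layers a smaller rate than the freshly initialized classifier, using optimizer parameter groups. This notebook does not do that (it uses a single rate for all parameters); the sketch below only illustrates the option on a toy model. `ToyNet`, `toy_optimizer`, and both learning rates are hypothetical.

```python
import torch
import torch.nn as nn


class ToyNet(nn.Module):
    """Stand-in model: `features` plays the pretrained part, `classifier` the new head."""

    def __init__(self):
        super().__init__()
        self.features = nn.Linear(16, 8)
        self.classifier = nn.Linear(8, 2)

    def forward(self, x):
        return self.classifier(torch.relu(self.features(x)))


toy_net = ToyNet()

# Hypothetical rates: small for pretrained weights, larger for the new head.
toy_optimizer = torch.optim.Adam(
    [
        {"params": toy_net.features.parameters(), "lr": 1e-5},
        {"params": toy_net.classifier.parameters(), "lr": 1e-3},
    ]
)

group_lrs = [group["lr"] for group in toy_optimizer.param_groups]
print(group_lrs)
```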

## 5. Add our small classifier after the pretrained model's feature extraction

The added classifier consists of:
- Linear layer reducing features to 256 dimensions
- ReLU activation
- 50% dropout for regularization
- Final classification layer for binary output
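
As a quick shape check, the sketch below builds the same stack of layers on its own and passes a batch of random feature vectors through it. The width of 1280 is an assumption for the example (it is the EfficientNetB0 value); `demo_head` and `feature_dim` are names used only here.

```python
import torch
import torch.nn as nn

N_CLASSES = 2
feature_dim = 1280  # assumed for this example: EfficientNetB0's feature width

demo_head = nn.Sequential(
    nn.Linear(feature_dim, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, N_CLASSES),
)

demo_head.eval()  # disable dropout for a deterministic forward pass
dummy_features = torch.randn(4, feature_dim)  # batch of 4 feature vectors
with torch.no_grad():
    logits = demo_head(dummy_features)
    probs = torch.softmax(logits, dim=1)  # only needed to read out probabilities

print(logits.shape)
```

The head ends in a plain linear layer without softmax because `CrossEntropyLoss` expects raw logits.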

In [None]:
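# Our own classifier. New layers are created with requires_grad=True,
# so they stay trainable even when the base model was frozen above.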
N_CLASSES = 2

pretrained_model.classifier = nn.Sequential(
    nn.Linear(num_features, 256),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(256, N_CLASSES)
)
my_model = pretrained_model

## 6. Train the model

In [None]:
model = my_model
# lr=0.0001 overrides the Config default of 0.001
config = Config(batch_size=32, epochs=40, lr=0.0001)
train_loader, val_loader, test_loader = prepare_dataset_loaders(pre_trans, config.batch_size, RGBSpectrogramDataset)
# CrossEntropyLoss expects raw logits, so the classifier has no softmax layer
criterion = torch.nn.CrossEntropyLoss()
# Frozen parameters receive no gradients, so Adam leaves them unchanged
optimizer = torch.optim.Adam(model.parameters(), lr=config.learning_rate)

do_train(name, train_loader, val_loader, config, model, criterion, optimizer, device, wandb_enabled)