These imports bring in:

torch, torchvision: For model building and training.

PIL: For image loading.

pandas: For reading the metadata CSV.

In [1]:
import os
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torchvision import models, transforms
from PIL import Image
import pandas as pd

In [2]:
import torch
print(torch.cuda.is_available())
print(torch.cuda.device_count())

if torch.cuda.is_available():
    print(torch.cuda.current_device())
    print(torch.cuda.get_device_name(torch.cuda.current_device()))
else:
    print("No CUDA device")

False
0
No CUDA device


In [3]:
DATA_DIR = r"C:\Users\athul\Documents\GitHub\birdclef-2025\data\processed\spectrograms"
CSV_PATH = r"C:\Users\athul\Documents\GitHub\birdclef-2025\data\processed\metadata_processed.csv"
BATCH_SIZE = 32
NUM_EPOCHS = 10
IMG_SIZE = 224
NUM_CLASSES = len(pd.read_csv(CSV_PATH)['label'].unique())
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

This sets paths and constants:

NUM_CLASSES is computed from the number of unique values in the label column of the metadata CSV (one per species).

DEVICE is set to the GPU if CUDA is available and falls back to the CPU otherwise.
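
Because nn.CrossEntropyLoss expects integer targets in the range 0 to NUM_CLASSES - 1, it can help to confirm that the label column has no gaps before training. The following is a minimal sketch, assuming the labels have already been read into a plain list; `labels_are_contiguous` and `remap_labels` are hypothetical helpers, not part of the notebook.

```python
def labels_are_contiguous(labels):
    """Return True if the distinct labels are exactly 0, 1, ..., n-1."""
    unique = sorted(set(labels))
    return unique == list(range(len(unique)))


def remap_labels(labels):
    """Map arbitrary label values to contiguous indices (in sorted order)."""
    mapping = {value: index for index, value in enumerate(sorted(set(labels)))}
    return [mapping[value] for value in labels], mapping


# Hypothetical label column with a gap (there is no class 1).
raw_labels = [0, 2, 2, 3, 0]
print(labels_are_contiguous(raw_labels))

fixed, mapping = remap_labels(raw_labels)
print(fixed, mapping)
```

With the notebook's data, the input would presumably be pd.read_csv(CSV_PATH)['label'].tolist().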

In [4]:
class SpectrogramDataset(Dataset):
    def __init__(self, csv_file, transform=None):
        self.data = pd.read_csv(csv_file)
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        img_path = self.data.iloc[idx]['spectrogram_path']
        image = Image.open(img_path).convert("RGB")
        label = int(self.data.iloc[idx]['label'])

        if self.transform:
            image = self.transform(image)

        return image, label

This class enables loading image-label pairs directly from the metadata file:

Each image is read using its full path and transformed.

Labels are read as integers.
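
Since each image is opened lazily from its full path, a missing file only surfaces partway through training. One possible pre-flight check is sketched below using only the standard library; `find_missing_files` is a hypothetical helper, and the column name `spectrogram_path` is taken from the dataset class above.

```python
import csv
import os
import tempfile


def find_missing_files(csv_path, path_column="spectrogram_path"):
    """Return the paths listed in the CSV that do not exist on disk."""
    missing = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            if not os.path.isfile(row[path_column]):
                missing.append(row[path_column])
    return missing


# Tiny demonstration: one real (empty) file and one dangling path.
with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "a.png")
    open(real, "wb").close()
    ghost = os.path.join(tmp, "b.png")  # never created

    demo_csv = os.path.join(tmp, "meta.csv")
    with open(demo_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["spectrogram_path", "label"])
        writer.writerow([real, 0])
        writer.writerow([ghost, 1])

    missing = find_missing_files(demo_csv)
    print(len(missing), "missing file(s)")
```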

In [5]:
transform = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])



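Resizes all images to 224x224 (IMG_SIZE).

Converts each image to a tensor with values in [0, 1], then normalizes each channel with mean 0.5 and standard deviation 0.5, which maps pixel values to [-1, 1].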
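
The [-1, 1] range follows from the per-channel formula (x - mean) / std applied to values that ToTensor has already scaled to [0, 1]. A small plain-Python sketch of that arithmetic (the `normalize` function here is illustrative, not the torchvision implementation):

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Apply the per-channel formula used by transforms.Normalize."""
    return (pixel - mean) / std


# The endpoints and midpoint of the [0, 1] range produced by ToTensor.
for value in (0.0, 0.5, 1.0):
    print(value, "->", normalize(value))
# 0.0 -> -1.0
# 0.5 -> 0.0
# 1.0 -> 1.0
```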

In [6]:
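dataset = SpectrogramDataset(CSV_PATH, transform=transform)
# Hold out 20% of the samples for validation.
train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_dataset, val_dataset = torch.utils.data.random_split(dataset, [train_size, val_size])
# Only the training loader shuffles; validation order does not affect the metrics.
# Note: on Windows, worker processes may fail in a notebook when the Dataset class
# is defined in the notebook itself; num_workers=0 is the safe fallback.
train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, num_workers=4)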


The dataset is randomly split into 80% training and 20% validation.

DataLoader batches both subsets; only the training loader shuffles its data each epoch.
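
As written, the random split differs on every run, so validation results are not directly comparable between runs. One option, sketched here on a toy dataset, is to pass a seeded torch.Generator to random_split. The seed value 42, the `split` helper, and `demo_dataset` are assumptions for illustration only.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy stand-in for the spectrogram dataset: ten samples.
demo_dataset = TensorDataset(torch.arange(10))
train_size = int(0.8 * len(demo_dataset))
val_size = len(demo_dataset) - train_size


def split(seed):
    """Split demo_dataset 80/20 using a generator seeded with `seed`."""
    generator = torch.Generator().manual_seed(seed)
    return random_split(demo_dataset, [train_size, val_size], generator=generator)


train_a, val_a = split(42)
train_b, val_b = split(42)

# The same seed yields the same train/validation indices.
print(train_a.indices == train_b.indices)
```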

In [7]:
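# `pretrained=True` is deprecated since torchvision 0.13; the `weights` argument replaces it.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)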
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
model = model.to(DEVICE)



Loads a pretrained ResNet18 from torchvision.

Replaces the final fully connected (fc) layer to output NUM_CLASSES.

Moves the model to the appropriate device (GPU or CPU).



In [8]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)

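CrossEntropyLoss is standard for multi-class classification. It takes the model's raw logits and integer class labels, so no softmax layer is needed on the model.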

Adam optimizer is used with a learning rate of 1e-4.
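
For a single sample, cross-entropy is the negative log of the softmax probability assigned to the true class. The plain-Python sketch below works through that formula on made-up logits; it illustrates the arithmetic and is not the PyTorch implementation.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]


# Made-up logits for three classes; the true class is index 0.
loss = cross_entropy([2.0, 1.0, 0.1], target=0)
print(round(loss, 4))

# With identical logits, every class gets probability 1/3.
uniform_loss = cross_entropy([0.0, 0.0, 0.0], target=1)
print(round(uniform_loss, 4), round(math.log(3), 4))
```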

In [None]:
for epoch in range(NUM_EPOCHS):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(DEVICE), labels.to(DEVICE)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    print(f"Epoch [{epoch+1}/{NUM_EPOCHS}], Loss: {running_loss/len(train_loader):.4f}")
os.makedirs("outputs/models", exist_ok=True)  # torch.save does not create missing directories
torch.save(model.state_dict(), "outputs/models/cnn_spectrogram_model.pth")

Iterates through epochs and batches.

Performs forward pass, computes loss, backpropagates gradients, and updates weights.

Logs the average training loss over the batches of each epoch.

Saves the model weights to disk for later use (e.g., prediction or fusion).
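
The loop above never uses val_loader, so it reports no validation metric. One way to add one is an evaluation pass after each epoch, sketched below. The `evaluate` function is a hypothetical helper, and the toy model and loader stand in for the notebook's model and val_loader so that the example runs on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion, device):
    """Return (average loss per batch, accuracy) over a data loader."""
    model.eval()  # disable dropout and use running batch-norm statistics
    total_loss, correct, seen = 0.0, 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            total_loss += criterion(outputs, labels).item()
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            seen += labels.size(0)
    return total_loss / len(loader), correct / seen


# Toy stand-in for the real model: zero weights and a bias that favors class 0,
# so it predicts class 0 for every input.
toy_model = nn.Linear(3, 2)
with torch.no_grad():
    toy_model.weight.zero_()
    toy_model.bias.copy_(torch.tensor([1.0, 0.0]))

# Four samples, half of which truly belong to class 0.
toy_loader = DataLoader(
    TensorDataset(torch.randn(4, 3), torch.tensor([0, 0, 1, 1])),
    batch_size=2,
)

val_loss, val_acc = evaluate(
    toy_model, toy_loader, nn.CrossEntropyLoss(), torch.device("cpu")
)
print(f"val loss {val_loss:.4f}, val acc {val_acc:.2f}")
```

In the notebook, the call would presumably be evaluate(model, val_loader, criterion, DEVICE) at the end of each epoch, followed by model.train() at the start of the next one (which the loop already does).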