### Dataset Description
The dataset contains bird images divided into train and test splits. The images are stored in the train_images and test_images folders.

The labels of the training images are in the train_images.csv file. In this file, the first column is image_path and the second is the label (1-200). The test_images_samples.csv file contains a row id and a dummy label for each test image. The goal of the challenge is to replace the dummy values in the label column with the predicted labels.

The class_names.npy file stores a dictionary with the name of each label. Load it with the following code: np.load("class_names.npy", allow_pickle=True).item()
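As a minimal sketch of that loading step, the snippet below writes a tiny stand-in dictionary to a temporary `.npy` file and reads it back the same way. The two bird names and the direction of the mapping (label to name) are assumptions made for illustration; inspect the real file to see which way its keys and values run.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for class_names.npy. The real file's contents and
# key/value orientation are assumptions here and should be inspected.
fake_class_names = {1: "Example_Bird_A", 2: "Example_Bird_B"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "class_names.npy")
    np.save(path, fake_class_names, allow_pickle=True)

    # np.load returns a 0-d object array; .item() unwraps the dict.
    # allow_pickle=True is needed because a dict is stored as a pickled object.
    loaded = np.load(path, allow_pickle=True).item()

print(type(loaded).__name__, loaded)
```

Only load pickled `.npy` files from sources you trust, since unpickling can execute arbitrary code.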

The final submission must have exactly the same structure as test_images_samples.csv; otherwise it will fail.
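One way to keep the structure identical is to read the sample file, overwrite only the label column, and write every other field back unchanged. The sketch below does this on an in-memory sample using the standard library. The function name `fill_submission` and the column names `id` and `label` are assumptions; check the real header before relying on them.

```python
import csv
import io


def fill_submission(sample_csv_text, predicted_labels, label_column="label"):
    """Return CSV text identical to the sample except for the label column."""
    reader = csv.DictReader(io.StringIO(sample_csv_text))
    rows = list(reader)
    if len(rows) != len(predicted_labels):
        raise ValueError("need exactly one prediction per row")

    for row, label in zip(rows, predicted_labels):
        # Labels in the CSV files run from 1 to 200, not 0 to 199.
        if not 1 <= label <= 200:
            raise ValueError(f"label {label} is outside 1-200")
        row[label_column] = str(label)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical sample content; the column names are assumptions.
sample = "id,label\n1,1\n2,1\n3,1\n"
submission = fill_submission(sample, [17, 200, 4])
print(submission)
```

Reusing the header and row order from the sample file avoids accidental changes such as an extra index column or reordered rows.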

Files

- train_images - the training images
- test_images - the test images
- test_images_sample.csv - a sample submission file in the correct format
- test_images_path.csv - path to test file images
- train_images.csv - the image paths and labels (1-200) of the training images
- class_names.npy - this file includes the name of each label
- attributes.npy - this file includes the attributes which are extra information for each class.
- attributes.txt - this file includes the attribute names which are extra information for each class.

### Ideas and links: 

1. Train from scratch

**complete tutorials**
- [Link](https://docs.pytorch.org/tutorials/beginner/transfer_learning_tutorial.html) to the PyTorch transfer learning tutorial
- [Link](https://www.youtube.com/watch?v=t6oHGXt04ik) to a YouTube tutorial on transfer learning with a ResNet

**Creating a custom dataset**
- [Link](https://docs.pytorch.org/tutorials/beginner/basics/data_tutorial.html#creating-a-custom-dataset-for-your-files) to the PyTorch tutorial on creating a custom dataset for your files


In [24]:
import os
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, Subset
from torchvision import transforms, models
from PIL import Image

**Easy to find parameters**

In [25]:
batch_size = 32
learning_rate = 0.001
num_epochs = 10
num_classes = 200
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
val_split = 0.2
seed = 42

torch.manual_seed(seed)
np.random.seed(seed)

### Setting up dataset and dataloaders
Creating the training and validation sets (80:20 split).

In [30]:
class BirdDataset(Dataset):
    def __init__(self, csv_file, root_dir, img_col_idx, label_col_idx, transform=None):
        self.data = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.img_col_idx = img_col_idx
        self.label_col_idx = label_col_idx
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Paths in the CSV may start with a slash; strip it so that
        # os.path.join does not discard root_dir.
        filename = str(self.data.iloc[idx, self.img_col_idx])
        clean_filename = filename.lstrip('/\\')
        img_path = os.path.join(self.root_dir, clean_filename)

        try:
            image = Image.open(img_path).convert('RGB')
        except (FileNotFoundError, OSError):
            # Fall back to a black placeholder so one bad file does not stop a run.
            print(f"Could not open {img_path}, using black image.")
            image = Image.new('RGB', (224, 224), (0, 0, 0))

        if self.label_col_idx is None:
            # Unlabeled data (e.g. the test set): return a dummy label.
            label = -1
        else:
            # Raw CSV labels are 1-200; subtract 1 to get 0-199 for PyTorch.
            raw_label = int(self.data.iloc[idx, self.label_col_idx])
            label = raw_label - 1

        if self.transform:
            image = self.transform(image)

        return image, label

In [31]:
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.8, 1.2), shear=10),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
])

# Validation: no augmentation, only resize and normalize.
val_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))
])
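Both pipelines end with the same `Normalize` step, which applies `(value - mean) / std` per channel to tensors that `ToTensor` has already scaled to [0, 1]. The means and standard deviations are the commonly used ImageNet channel statistics. A plain-Python sketch of the arithmetic on single pixels, with hypothetical helper names:

```python
# Per-channel statistics copied from the Normalize calls in the transforms.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Apply (value - mean) / std to one RGB pixel already scaled to [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, MEAN, STD))


def denormalize_pixel(rgb):
    """Invert the normalization, e.g. before displaying an image."""
    return tuple(v * s + m for v, m, s in zip(rgb, MEAN, STD))


black = normalize_pixel((0.0, 0.0, 0.0))
white = normalize_pixel((1.0, 1.0, 1.0))
print("black ->", [round(v, 3) for v in black])
print("white ->", [round(v, 3) for v in white])
```

The inverse function is handy when plotting augmented samples, because normalized tensors fall outside the [0, 1] range that image viewers expect.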

**Dataloaders**

In [32]:
full_train_dataset = BirdDataset('train_images.csv', 'train_images', 0, 1, transform=train_transform)
full_val_dataset   = BirdDataset('train_images.csv', 'train_images', 0, 1, transform=val_transform)

dataset_size = len(full_train_dataset)
indices = list(range(dataset_size))
split = int(np.floor(val_split * dataset_size))

np.random.shuffle(indices)
train_indices, val_indices = indices[split:], indices[:split]

# The training subset uses augmentation; the validation subset does not.
train_dataset = Subset(full_train_dataset, train_indices)
val_dataset   = Subset(full_val_dataset, val_indices)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader   = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

print(f"Total images: {dataset_size}")
print(f"Training set: {len(train_dataset)} images")
print(f"Validation set: {len(val_dataset)} images")

Total images: 3926
Training set: 3141 images
Validation set: 785 images
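The reported sizes follow directly from the split arithmetic: the validation count is the floor of 20% of the total, and the remainder goes to training. The sketch below reproduces the counts with the standard library and checks that the two index sets do not overlap. It uses `random` instead of NumPy, so the actual index order differs from the notebook's.

```python
import math
import random

dataset_size = 3926   # total reported in the notebook output
val_split = 0.2

split = math.floor(val_split * dataset_size)

indices = list(range(dataset_size))
random.Random(42).shuffle(indices)   # stdlib shuffle; order differs from NumPy's

val_indices, train_indices = indices[:split], indices[split:]

print(f"Training set: {len(train_indices)} images")
print(f"Validation set: {len(val_indices)} images")

# No image may appear in both subsets, otherwise validation accuracy is inflated.
assert set(train_indices).isdisjoint(val_indices)
```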


**Setting up test set**

In [None]:
test_dataset  = BirdDataset('test_images_path.csv', 'test_images', 1, None, transform=val_transform)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

print(f"Test set: {len(test_dataset)} images to predict.")

Test set: 4000 images to predict.


### Convolutional Neural Network

In [42]:
class bird4(nn.Module):
    def __init__(self, num_classes):
        super().__init__()

        def cnblock(in_channels, out_channels):
            return nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2, stride=2)
            )
        self.features = nn.Sequential(
            cnblock(3, 64),
            cnblock(64, 128),
            cnblock(128, 256),
            cnblock(256, 512),
        )

        self.global_pool = nn.AdaptiveAvgPool2d((1, 1))

        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),
            nn.Linear(512, 1024),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(1024, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = self.global_pool(x)
        x = self.classifier(x)
        return x
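To see why the classifier's first linear layer takes 512 inputs, it helps to trace the spatial size through the four blocks. Each 3x3 convolution with padding 1 preserves height and width, and each 2x2 max pool with stride 2 halves them. After global average pooling only the 512 channels remain. A small sketch of that arithmetic, assuming 224x224 inputs as produced by the transforms (the helper name `out_size` is hypothetical):

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output length of a conv or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
sizes = []
for _ in range(4):                                   # four blocks in the network
    size = out_size(size, kernel=3, padding=1)       # first 3x3 conv
    size = out_size(size, kernel=3, padding=1)       # second 3x3 conv
    size = out_size(size, kernel=2, stride=2)        # 2x2 max pool
    sizes.append(size)

print(sizes)   # spatial size after each block
```

Because the adaptive pooling layer reduces any remaining spatial size to 1x1, the network would also accept other input resolutions without changes to the classifier.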

In [43]:
model = bird4(num_classes=num_classes).to(device)

optimizer = optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()

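# Halve the learning rate when validation accuracy has not improved for 2 epochs.
# The verbose argument is omitted: it is deprecated in recent PyTorch releases,
# and the training loop prints the learning rate itself.
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', patience=2, factor=0.5)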
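A useful reference point when reading the loss values printed during training: a model that assigns equal probability to all 200 classes has a cross-entropy of ln(200), and random guessing yields 1/200 accuracy if the classes are roughly balanced (an assumption, not something stated about this dataset). A short calculation:

```python
import math

num_classes = 200

# Cross-entropy (natural log) of a uniform prediction over all classes.
chance_loss = math.log(num_classes)

# Expected accuracy of uniform random guessing, assuming balanced classes.
chance_acc = 100.0 / num_classes

print(f"Chance-level loss: {chance_loss:.3f}")
print(f"Chance-level accuracy: {chance_acc:.1f}%")
```

Losses well below this value and accuracies well above it indicate that the model has learned something beyond guessing.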



In [46]:
best_acc = 0.0

print("Starting training...")

for epoch in range(num_epochs):
    # --- TRAIN ---
    model.train()
    running_loss = 0.0

    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    avg_train_loss = running_loss / len(train_loader)

    # --- VALIDATION ---
    model.eval()
    val_loss = 0.0
    correct = 0
    total = 0

    with torch.no_grad():           
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)

            outputs = model(inputs)
            loss = criterion(outputs, labels)
            val_loss += loss.item()

            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    avg_val_loss = val_loss / len(val_loader)
    val_acc = 100.0 * correct / total

    # The scheduler was created with mode='max', so it tracks validation accuracy.
    scheduler.step(val_acc)

    current_lr = optimizer.param_groups[0]['lr']

    print(
        f"Epoch {epoch+1}/{num_epochs} | "
        f"Train Loss: {avg_train_loss:.3f} | "
        f"Val Loss: {avg_val_loss:.3f} | "
        f"Val Acc: {val_acc:.1f}% | "
        f"LR: {current_lr:.5f}"
    )

    # --- SAVE BEST MODEL ---
    if val_acc > best_acc:
        best_acc = val_acc
        torch.save(model.state_dict(), "best_model.pth")
        print(f"  --> Saved new best model ({val_acc:.2f}%)")

print(f"Training complete. Best Val Acc: {best_acc:.2f}%")

Starting training...
Epoch 1/10 | Train Loss: 5.194 | Val Loss: 5.250 | Val Acc: 1.8% | LR: 0.00100
  --> Saved new best model (1.78%)
Epoch 2/10 | Train Loss: 5.073 | Val Loss: 5.010 | Val Acc: 1.7% | LR: 0.00100
Epoch 3/10 | Train Loss: 5.003 | Val Loss: 5.059 | Val Acc: 2.4% | LR: 0.00100
  --> Saved new best model (2.42%)
Epoch 4/10 | Train Loss: 4.956 | Val Loss: 4.982 | Val Acc: 1.7% | LR: 0.00100
Epoch 5/10 | Train Loss: 4.925 | Val Loss: 4.935 | Val Acc: 2.2% | LR: 0.00100
Epoch 6/10 | Train Loss: 4.868 | Val Loss: 4.907 | Val Acc: 2.9% | LR: 0.00100
  --> Saved new best model (2.93%)
Epoch 7/10 | Train Loss: 4.842 | Val Loss: 4.942 | Val Acc: 2.0% | LR: 0.00100


KeyboardInterrupt: 
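Once training has finished or been interrupted, a natural next step is to reload the best checkpoint and predict the test set. The following is a minimal sketch under assumptions: the function name `predict_labels` is hypothetical, the loader is assumed to yield `(inputs, labels)` pairs whose labels are ignored, and the stand-in linear model and random batch exist only to make the snippet runnable. In the notebook one would instead restore the weights with `model.load_state_dict(torch.load("best_model.pth", map_location=device))` and pass `test_loader`.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def predict_labels(model, loader, device):
    """Return predicted labels in the CSV's 1-200 convention."""
    model.eval()  # disable dropout and use BatchNorm running statistics
    predictions = []
    for inputs, _ in loader:
        logits = model(inputs.to(device))
        # argmax gives class indices 0-199; add 1 to undo the shift
        # applied when the training labels were loaded.
        predictions.extend((logits.argmax(dim=1) + 1).cpu().tolist())
    return predictions


# Stand-in model and data, for illustration only.
device = torch.device("cpu")
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 200)).to(device)
toy_loader = [(torch.randn(5, 3, 8, 8), torch.full((5,), -1))]

labels = predict_labels(toy_model, toy_loader, device)
print(labels)
```

The returned list can then be written into the label column of the submission file, keeping the row order of the test loader (which is why it is created with `shuffle=False`).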