# Indian Bird Species Classification

This notebook demonstrates how to train a deep learning model to classify images of 25 different bird species found in India.

Tags: deep learning, computer vision, image classification, PyTorch


In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
from torch.utils.data import DataLoader
from sklearn.model_selection import KFold
import torch.nn.functional as F

import warnings
warnings.filterwarnings('ignore')

# Load and preprocess the dataset

This code loads and preprocesses a dataset of Indian bird species images using PyTorch and torchvision.

To begin, the number of classes in the dataset is defined as `num_classes = 25`. This is a necessary step when training a classification model, as the model needs to know how many different classes to predict.

Next, the directory containing the image dataset is specified using the `data_dir` variable. The dataset contains images of Indian bird species, which are organized into separate folders for each class.

A series of data transformations is then defined with `transforms.Compose` from torchvision, which takes a list of transform objects and applies them to each image in order. The following transformations are applied:

- `transforms.Resize((224, 224))`: Resizes each image to a fixed 224x224 pixels, the input size commonly used with ImageNet-pretrained models such as ResNet. Passing both dimensions does not preserve the original aspect ratio.
- `transforms.RandomHorizontalFlip()`: Flips each image horizontally with a probability of 0.5. This increases the diversity of the training data and improves the model's robustness. Because the same `transform` is attached to the whole dataset, the flip is also applied to images used for validation.
- `transforms.ToTensor()`: Converts each image to a PyTorch tensor with values scaled to the range [0, 1]. Tensors are the primary data structure PyTorch uses for training deep learning models.
- `transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])`: Normalizes each channel with the mean and standard deviation of the ImageNet dataset. These are the statistics the pre-trained weights were trained with, so matching them keeps the inputs in the range the network expects and can improve convergence.
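To make the last two steps concrete, the sketch below applies the same arithmetic by hand to a single RGB pixel: `ToTensor()` scales 8-bit values to [0, 1], and `Normalize` then computes `(x - mean) / std` per channel. The pixel value and the helper name `normalize_pixel` are made up for illustration and are not part of the notebook.

```python
# ImageNet channel statistics, as used in the transform above
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb_uint8):
    """Mimic ToTensor() followed by Normalize() for one RGB pixel (illustrative helper)."""
    # ToTensor: map 0..255 integers to 0.0..1.0 floats
    scaled = [value / 255.0 for value in rgb_uint8]
    # Normalize: subtract the channel mean, divide by the channel std
    return [(x - m) / s for x, m, s in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


pixel = (255, 0, 128)  # hypothetical pixel, not taken from the dataset
normalized = normalize_pixel(pixel)
print(normalized)
```

A fully saturated red channel ends up a little above 2, and an empty green channel a little below -2, so normalized inputs are roughly centered on zero.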

Finally, the `datasets.ImageFolder` class from torchvision is used to create a dataset from the image files. It reads the images from the directory given by `data_dir`, treats each immediate subfolder as one class, and applies the transformations defined in `transform` to each image as it is loaded. The resulting dataset can be used to train a model that classifies the images by bird species.

In [2]:
# Define the number of classes
num_classes = 25

# Load the dataset
data_dir = "/kaggle/input/25-indian-bird-species-with-226k-images"
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
dataset = datasets.ImageFolder(data_dir, transform=transform)
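Since `ImageFolder` takes its class list from the immediate subfolders of the root directory, it can be worth confirming that the number of folders it sees matches `num_classes`. If the root contained a single wrapper folder, every image would receive the same label. The sketch below shows the folder-discovery rule on a throwaway directory; `list_class_folders` and the species folder names are hypothetical.

```python
import os
import tempfile


def list_class_folders(root):
    """Return the sorted immediate subdirectories of root (hypothetical helper).

    ImageFolder builds its class list the same way: one class per
    top-level subdirectory, sorted by name.
    """
    return sorted(entry.name for entry in os.scandir(root) if entry.is_dir())


# Demo on a temporary directory tree with made-up species names
with tempfile.TemporaryDirectory() as root:
    for name in ["Indian-Peacock", "Common-Kingfisher", "House-Crow"]:
        os.makedirs(os.path.join(root, name))
    found = list_class_folders(root)

print(found)  # ['Common-Kingfisher', 'House-Crow', 'Indian-Peacock']
```

On the real data, the equivalent check would be comparing `len(dataset.classes)` with `num_classes` right after the dataset is created.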

# Define cross-validation splits

This code uses the `KFold` class from the scikit-learn library to define cross-validation splits for the dataset. Cross-validation is a common technique for evaluating the performance of a model on a limited dataset.

`KFold` takes several parameters, including `n_splits`, which specifies the number of folds to create. Here `n_splits=5`, so the dataset is split into 5 folds of (nearly) equal size. Each fold serves once as the validation set while the other four are used for training.

The `shuffle=True` parameter indicates that the data should be shuffled before splitting. This matters here because `ImageFolder` orders its samples by class; without shuffling, each fold would cover only a few species instead of a mix of all of them.

The `random_state` parameter sets the random seed. With a fixed seed, the same splits are generated each time the code is run, which helps with debugging and with comparing different models.
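As a small illustration of these three parameters, the sketch below splits 23 stand-in indices (an arbitrary number chosen so the folds cannot be exactly equal) and checks that the split is reproducible. The helper `make_folds` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import KFold

indices = np.arange(23)  # stand-in for 23 dataset items


def make_folds(seed):
    """Return the list of (train_idx, val_idx) pairs for a given seed."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    return list(kf.split(indices))


folds = make_folds(123)

# Validation fold sizes: the first 23 % 5 = 3 folds get one extra item
val_sizes = [len(val_idx) for _, val_idx in folds]
print(val_sizes)  # [5, 5, 5, 4, 4]

# Every item appears in exactly one validation fold
all_val = np.sort(np.concatenate([val_idx for _, val_idx in folds]))

# The same seed reproduces the same split
repeat = make_folds(123)
same_split = all(np.array_equal(a[1], b[1]) for a, b in zip(folds, repeat))
print(same_split)  # True
```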


In [3]:
# Define the cross-validation splits
kf = KFold(n_splits=5, shuffle=True, random_state=123)

# Train and evaluate the model using cross-validation

This code trains and evaluates a deep learning model using cross-validation. It uses a ResNet-18 architecture to classify images of Indian bird species into one of `num_classes` categories. 

The `KFold` object generates the indices of the training and validation sets for each fold. The loop `for fold, (train_idx, test_idx) in enumerate(kf.split(dataset))` yields the fold number together with these two index arrays.

The model is defined using the `resnet18` function from PyTorch's `torchvision.models` module. The model is initialized with weights pre-trained on the ImageNet dataset, and the fully connected layer at the end of the network is replaced with a new layer that has `num_classes` output units. 

The model is then moved to the GPU if available using the `to` method and the `device` variable. The loss function used to train the model is the `CrossEntropyLoss`, which is commonly used for multi-class classification tasks. The `Adam` optimizer is used to update the model weights during training.
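For a sense of scale when reading the loss values, the cross-entropy for one sample is the negative log of the softmax probability assigned to the true class (`nn.CrossEntropyLoss` averages this over the batch by default). The plain-Python sketch below, with a hypothetical helper `cross_entropy`, shows that a model guessing uniformly over 25 classes has a loss of ln(25), about 3.22.

```python
import math


def cross_entropy(logits, target):
    """Negative log of the softmax probability of the target class."""
    # Subtract the max logit before exponentiating for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


num_classes = 25

# Equal logits mean a uniform prediction over all classes
uniform_logits = [0.0] * num_classes
chance_loss = cross_entropy(uniform_logits, target=3)
print(chance_loss, math.log(num_classes))

# A confident, correct prediction gives a loss close to zero
confident_loss = cross_entropy([10.0, 0.0, 0.0], target=0)
print(confident_loss)
```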

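The data is loaded through PyTorch `DataLoader` objects that use the `SubsetRandomSampler` class. Each sampler is given the indices of one split and, in every pass, yields those indices in a random order without replacement, so the training loader only ever draws from the training indices and the validation loader only from the validation indices.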
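The sampler's behaviour is easy to inspect directly. In the sketch below the index list is made up; iterating the sampler twice stands in for two epochs.

```python
import torch
from torch.utils.data import SubsetRandomSampler

train_idx = [0, 2, 3, 5, 8, 9]  # hypothetical indices of one training split
sampler = SubsetRandomSampler(train_idx)

# Each pass over the sampler yields every index once, in a fresh random order
epoch_1 = list(sampler)
epoch_2 = list(sampler)
print(epoch_1)
print(epoch_2)
```

The two printed orders usually differ, but both contain exactly the indices in `train_idx`.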

The model is trained for 10 epochs using the training data. During each epoch, the model is run through the training data in batches. The gradients of the loss function with respect to the model parameters are computed using the `backward` method and the weights are updated using the `step` method of the optimizer.

After each epoch, the training loss, averaged over the batches of that epoch, is printed to the console. This provides a measure of how well the model is fitting the training data.

The model is then evaluated on the validation data using the `test_loader` object. The accuracy of the model on the validation data is computed by comparing the predicted labels to the true labels of the validation data.
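The accuracy computation amounts to taking the highest-scoring class for each image and counting matches. A minimal sketch with made-up logits and labels, using a hypothetical helper `batch_accuracy_counts`:

```python
import torch


def batch_accuracy_counts(logits, labels):
    """Return (number of correct predictions, batch size) for one batch."""
    predicted = logits.argmax(dim=1)  # index of the highest score per row
    correct = (predicted == labels).sum().item()
    return correct, labels.size(0)


# Four samples, three classes; values are illustrative only
logits = torch.tensor([
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [0.9, 0.8, 3.0],
    [1.2, 0.4, 0.3],
])
labels = torch.tensor([0, 1, 2, 1])

correct, total = batch_accuracy_counts(logits, labels)
accuracy = correct / total
print(accuracy)  # 0.75
```

Accumulating `correct` and `total` over all batches, as the notebook does, gives the accuracy for the whole validation fold.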

This process is repeated for each fold. Cross-validation gives a more robust estimate of the model's performance than a single train-test split. The code prints the accuracy of each fold; averaging these per-fold values gives the overall estimate.
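One way to combine the per-fold numbers is to collect them in a list and report the mean together with the spread across folds. The values below are placeholders chosen for the example, not results from this notebook, and `summarize_folds` is a hypothetical helper.

```python
from statistics import mean, stdev


def summarize_folds(fold_accuracies):
    """Return the mean and sample standard deviation of per-fold accuracies."""
    return mean(fold_accuracies), stdev(fold_accuracies)


# Placeholder values for illustration only
fold_accuracies = [0.90, 0.92, 0.88, 0.91, 0.89]

avg, spread = summarize_folds(fold_accuracies)
print(f"Mean accuracy: {avg:.3f} (std {spread:.3f})")
```

Reporting the spread alongside the mean shows how much the estimate depends on which fold was held out.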

The trained model is then saved with the `torch.save` function, which writes the model's parameters to a binary file so that the model can be reloaded later for use in other applications. Because a new model is created inside each fold, the model saved afterwards is the one trained in the last fold.

In [8]:
from torchvision.models import resnet18

# Train and evaluate the model using cross-validation
for fold, (train_idx, test_idx) in enumerate(kf.split(dataset)):
    print(f"Fold {fold+1}")
    
    # Define the model and move it to the GPU
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = resnet18(pretrained=True)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    model = model.to(device)

    # Define the loss function and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    train_sampler = torch.utils.data.SubsetRandomSampler(train_idx)
    test_sampler = torch.utils.data.SubsetRandomSampler(test_idx)
    train_loader = DataLoader(dataset, batch_size=64, sampler=train_sampler, num_workers=4, pin_memory=True)
    test_loader = DataLoader(dataset, batch_size=64, sampler=test_sampler, num_workers=4, pin_memory=True)
    # Train for 10 epochs on this fold's training indices
    for epoch in range(10):
        model.train()  # training mode: batch-norm statistics are updated
        running_loss = 0.0
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        # Average training loss per batch for this epoch
        print(f"Epoch {epoch+1} loss: {running_loss/len(train_loader)}")

    # Evaluate on this fold's held-out indices
    model.eval()  # evaluation mode: use the stored batch-norm statistics
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print(f"Accuracy: {correct/total}")

Fold 1


Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth


Epoch 1 loss: 0.025470126079035062
Epoch 2 loss: 2.029125873262084e-05
Epoch 3 loss: 1.52429088382941e-05
Epoch 4 loss: 1.1793854078962863e-05
Epoch 5 loss: 9.353173051296624e-06
Epoch 6 loss: 7.509635585523128e-06
Epoch 7 loss: 6.103404375158152e-06
Epoch 8 loss: 5.014294697583263e-06
Epoch 9 loss: 4.156751850874802e-06
Epoch 10 loss: 3.4732735394841624e-06
Accuracy: 1.0
Fold 2
Epoch 1 loss: 0.02714827297781528
Epoch 2 loss: 2.154370909669871e-05
Epoch 3 loss: 1.6277350478177452e-05
Epoch 4 loss: 1.2721924027355014e-05
Epoch 5 loss: 1.0168118833880148e-05
Epoch 6 loss: 8.241779323404119e-06
Epoch 7 loss: 6.751573395434757e-06
Epoch 8 loss: 5.584041047620718e-06
Epoch 9 loss: 4.66400936766342e-06
Epoch 10 loss: 3.907869145560675e-06
Accuracy: 1.0
Fold 3
Epoch 1 loss: 0.022968434523354776
Epoch 2 loss: 2.318731055854344e-05
Epoch 3 loss: 1.6988783474272578e-05
Epoch 4 loss: 1.29002609236757e-05
Epoch 5 loss: 1.0117977095825241e-05
Epoch 6 loss: 8.071181047544794e-06
Epoch 7 loss: 6.5488

# Save the trained model

This code saves the trained ResNet-18 model to the file given by the `model_path` variable. The model's `state_dict()` method returns a dictionary containing its parameters and persistent buffers, and `torch.save` writes that dictionary to disk.

The saved model can be loaded later using the `load_state_dict()` function from PyTorch. This allows you to reuse the trained model for inference or further training without having to retrain the model from scratch.


In [9]:
# Save the trained model
model_path = "/kaggle/working/trained_resnet18.pth"
torch.save(model.state_dict(), model_path)
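The reload step described above can be sketched as a round trip. To keep the example small and independent of the notebook, it uses a tiny `nn.Linear` as a stand-in for the ResNet-18 and a temporary file instead of `model_path`; both are assumptions for illustration.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # tiny stand-in for the trained network

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_model.pth")  # hypothetical file name

    # Save only the parameters, as in the cell above
    torch.save(model.state_dict(), path)

    # Rebuild the same architecture, then load the saved parameters into it
    restored = nn.Linear(4, 2)
    state = torch.load(path, map_location="cpu")
    restored.load_state_dict(state)

restored.eval()  # switch to evaluation mode before inference

# The restored parameters are identical to the saved ones
params_match = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(params_match)  # True
```

For the notebook's model, the architecture to rebuild would be a ResNet-18 whose `fc` layer has `num_classes` outputs, since `load_state_dict` expects the same layer shapes that were saved.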