# CNN and Transfer Learning with PyTorch: 200 Bird Species Image Classification

This project was written as part of the online live course Deep Learning with PyTorch: Zero to GANs, delivered by Jovian.ml in collaboration with FreeCodeCamp.org.

## Model training using a Convolutional Neural Network (CNN) and transfer learning

This project uses the 200 Bird Species image dataset to develop a classification model with a Convolutional Neural Network and transfer learning in PyTorch.

The dataset comes from Kaggle, under the name 200 Bird Species. As the name indicates, it contains images of 200 species of birds from different regions around the world. The images are in JPEG format with size 3x224x224. The dataset has 27,503 images for training, 1,000 images for validation and 1,000 images for testing. Some cell outputs shown below report larger counts (36,609 training images, 260 classes), which suggests they were produced with a later version of the dataset.

The objective of this project is to train a model with a CNN and transfer learning that reaches a high accuracy and a low loss, and that predicts the species in the bird images well.
We use the training dataset to train the model, the validation dataset to evaluate the model during training, and the test dataset to test the model on data it has not seen.

We start by importing the libraries and the datasets that we need for this work.

In [7]:
import os
import torch
import pandas as pd
import numpy as np
from torch.utils.data import Dataset, random_split, DataLoader
from PIL import Image
import torchvision.models as models
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
import torch.nn.functional as F
import torch.nn as nn
from torchvision.utils import make_grid
%matplotlib inline

In [11]:
DATA_DIR = 'C://Users//Yi Ming Chang//Desktop//Dal//PHYC 4250 Topics in Numerical Computing//Problem Sets//Problem set 5//bird species'

TRAIN_DIR = DATA_DIR + '/train'
VALID_DIR = DATA_DIR + '/valid'
TEST_DIR = DATA_DIR + '/test'                             

In [12]:
project_name='project-bird-species'

In [13]:
train_data = ImageFolder(TRAIN_DIR)
valid_data = ImageFolder(VALID_DIR)
test_data = ImageFolder(TEST_DIR)
len(train_data), len(valid_data), len(test_data)

(36609, 1300, 1300)

### 1. Data exploration and pre-processing

As we said at the beginning, the dataset has 200 bird species, meaning that we have 200 different classes (200 outputs) that we would like to predict with the trained model. These are the classes in the training dataset; the validation and test datasets have the same set of classes.

In [14]:
CLASSES = list(train_data.class_to_idx.keys())
CLASSES

['AFRICAN CROWNED CRANE',
 'AFRICAN FIREFINCH',
 'ALBATROSS',
 'ALEXANDRINE PARAKEET',
 'AMERICAN AVOCET',
 'AMERICAN BITTERN',
 'AMERICAN COOT',
 'AMERICAN GOLDFINCH',
 'AMERICAN KESTREL',
 'AMERICAN PIPIT',
 'AMERICAN REDSTART',
 'ANHINGA',
 'ANNAS HUMMINGBIRD',
 'ANTBIRD',
 'ARARIPE MANAKIN',
 'ASIAN CRESTED IBIS',
 'BALD EAGLE',
 'BALI STARLING',
 'BALTIMORE ORIOLE',
 'BANANAQUIT',
 'BANDED BROADBILL',
 'BAR-TAILED GODWIT',
 'BARN OWL',
 'BARN SWALLOW',
 'BARRED PUFFBIRD',
 'BAY-BREASTED WARBLER',
 'BEARDED BARBET',
 'BELTED KINGFISHER',
 'BIRD OF PARADISE',
 'BLACK FRANCOLIN',
 'BLACK SKIMMER',
 'BLACK SWAN',
 'BLACK TAIL CRAKE',
 'BLACK THROATED WARBLER',
 'BLACK VULTURE',
 'BLACK-CAPPED CHICKADEE',
 'BLACK-NECKED GREBE',
 'BLACK-THROATED SPARROW',
 'BLACKBURNIAM WARBLER',
 'BLUE GROUSE',
 'BLUE HERON',
 'BOBOLINK',
 'BROWN NOODY',
 'BROWN THRASHER',
 'CACTUS WREN',
 'CALIFORNIA CONDOR',
 'CALIFORNIA GULL',
 'CALIFORNIA QUAIL',
 'CANARY',
 'CAPE MAY WARBLER',
 'CAPUCHINBIRD',
 'C

Here, we count the number of images per species in each dataset. The validation and test datasets have 5 images for each class, while in the training dataset the number of images differs from class to class.

In [15]:
from collections import Counter
# Count labels from the stored targets so that no image has to be loaded
img_counter = Counter(train_data.classes[label] for label in train_data.targets)
img_counter

Counter({'AFRICAN CROWNED CRANE': 137,
         'AFRICAN FIREFINCH': 140,
         'ALBATROSS': 133,
         'ALEXANDRINE PARAKEET': 165,
         'AMERICAN AVOCET': 179,
         'AMERICAN BITTERN': 170,
         'AMERICAN COOT': 158,
         'AMERICAN GOLDFINCH': 133,
         'AMERICAN KESTREL': 130,
         'AMERICAN PIPIT': 179,
         'AMERICAN REDSTART': 139,
         'ANHINGA': 147,
         'ANNAS HUMMINGBIRD': 139,
         'ANTBIRD': 150,
         'ARARIPE MANAKIN': 105,
         'ASIAN CRESTED IBIS': 105,
         'BALD EAGLE': 160,
         'BALI STARLING': 132,
         'BALTIMORE ORIOLE': 137,
         'BANANAQUIT': 106,
         'BANDED BROADBILL': 194,
         'BAR-TAILED GODWIT': 114,
         'BARN OWL': 119,
         'BARN SWALLOW': 132,
         'BARRED PUFFBIRD': 136,
         'BAY-BREASTED WARBLER': 143,
         'BEARDED BARBET': 160,
         'BELTED KINGFISHER': 125,
         'BIRD OF PARADISE': 104,
         'BLACK FRANCOLIN': 131,
         'BLACK SKIMM

Let's check the characteristics of a single image in the dataset. The image has size 224x224 with three channels (RGB).

Now, we display a few images of the birds in the dataset.

In [18]:
set_images = ImageFolder(TRAIN_DIR, transform=T.Compose([T.ToTensor()]))
set_images_dl = DataLoader(set_images, 60 , shuffle=True, 
                      num_workers=3, pin_memory=True)

In [19]:
def show_batch(dl, invert=True):
    for images, labels in dl:
        fig, ax = plt.subplots(figsize=(15, 10))
        ax.set_xticks([]); ax.set_yticks([])
        data = 1-images if invert else images
        ax.imshow(make_grid(data, nrow=10).permute(1, 2, 0))
        break

In [20]:
#show_batch(set_images_dl, invert=True)

From here, we start preparing the data for training by applying some transformations to it.

PyTorch has utilities for randomized data augmentation and channel-wise normalization. These two transformations change the image data in a way that tends to improve the performance of the model. Before applying them, we first have to convert the images to tensors, as PyTorch only works with tensors.

Channel-wise normalization normalizes an image tensor by subtracting the mean and dividing by the standard deviation of each channel. In this case, our tensors have three channels (red, green and blue).

Randomized data augmentation helps to avoid overfitting by randomly cropping the image and flipping it horizontally (if we choose the flipping option). This also helps the model generalize better.
We apply randomized data augmentation and channel-wise normalization to the training dataset. For the validation and test datasets, we only apply normalization.

To normalize the image channels, we use the means and standard deviations of the ImageNet database, since the pretrained model used for transfer learning was trained with them.
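
As a small worked illustration of channel-wise normalization (a sketch on a synthetic tensor, not part of the original notebook), the snippet below applies the ImageNet statistics by hand and then undoes them. The tiny 3x4x4 "image" and the variable names are assumptions made for the example.

```python
import torch

# ImageNet channel statistics (RGB order), as used in this project.
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# A synthetic 3x4x4 "image" with every pixel at 0.5
# (values in [0, 1], as they are after ToTensor).
img = torch.full((3, 4, 4), 0.5)

# Channel-wise normalization: subtract the mean and divide by the
# standard deviation of each channel.
normalized = (img - mean) / std

# Undo the normalization, for example before displaying an image.
restored = normalized * std + mean

print(normalized[:, 0, 0])
```

Reversing the normalization in this way is useful when plotting transformed images, since normalized pixel values fall outside the range that image viewers expect.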

Below are the transformations that we use. Many other types of transformations are described in the PyTorch documentation.

In [21]:
imagenet_stats = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

train_tfms = T.Compose([
    T.RandomCrop(224, padding=8, padding_mode='reflect'),
    T.RandomHorizontalFlip(), 
    T.RandomRotation(10),
    T.ToTensor(), 
    T.Normalize(*imagenet_stats,inplace=True), 
    T.RandomErasing(inplace=True)
])

valid_tfms = T.Compose([ 
    T.ToTensor(), 
    T.Normalize(*imagenet_stats)
])

test_tfms = T.Compose([ 
    T.ToTensor(), 
    T.Normalize(*imagenet_stats)
])

Next, we apply the transformations to the datasets.

In [22]:
train_ds = ImageFolder(TRAIN_DIR, train_tfms)
val_ds = ImageFolder(VALID_DIR, valid_tfms)
test_ds = ImageFolder(TEST_DIR, test_tfms)

Here, we have the shape of a single transformed image tensor for which the label is 0. As we have 200 classes, they will be labeled from 0 to 199.

In [23]:
img_tensor, label = train_ds[0]
print(img_tensor.shape, label)

torch.Size([3, 224, 224]) 0


Number of classes we have.

In [24]:
len(train_ds.classes)             #Number of classes in the dataset

260

### 2. Training data and the model definition

For this training, we use a batch size of 256. As we run this model on a Kaggle kernel, we have enough memory and a GPU accelerator that makes the training faster.

In [25]:
batch_size = 256

We first define the data loader, which loads the data in batches of 256 image tensors during each epoch of training. We use twice the batch size for the validation data loader. This is because no gradients are computed during the validation step, so there is enough memory to process more data at once.

We shuffle the loading only for the training dataset. We do not need to do that for validation and test datasets as there is no need for randomization in these cases.

In [26]:
train_dl = DataLoader(train_ds, batch_size, shuffle=True, 
                      num_workers=2, pin_memory=True)
val_dl = DataLoader(val_ds, batch_size*2, 
                    num_workers=2, pin_memory=True)
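
To make the batching concrete, here is a minimal sketch on synthetic data (the `fake_ds` and `fake_dl` names and the tensor sizes are assumptions for illustration). It shows that a loader yields `ceil(N / batch_size)` batches, with a smaller last batch, and applies the same arithmetic to the training set size printed earlier.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for an image dataset: 1000 tiny "images" with labels.
fake_ds = TensorDataset(torch.zeros(1000, 3, 8, 8),
                        torch.zeros(1000, dtype=torch.long))
fake_dl = DataLoader(fake_ds, batch_size=256, shuffle=True)

num_batches = len(fake_dl)
batch_sizes = [images.shape[0] for images, _ in fake_dl]

# Same arithmetic for the 36609 training images counted earlier.
train_batches = math.ceil(36609 / 256)

print(num_batches, batch_sizes, train_batches)
```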

Here we show a set of images from the transformed data. Compared with the earlier images from the original dataset, we can see the effect of the cropping and the horizontal flipping.

In [27]:
def show_batch(dl, invert=True):
    for images, labels in dl:
        fig, ax = plt.subplots(figsize=(15, 10))
        ax.set_xticks([]); ax.set_yticks([])
        data = 1-images if invert else images
        ax.imshow(make_grid(data[:60], nrow=10).permute(1, 2, 0))  # plot `data` so that `invert` takes effect
        break

In [28]:
#show_batch(train_dl, invert=True)

Here we define the base model for training (training step and validation step). The metrics are accuracy and loss. The loss is computed with the cross-entropy function.

In [29]:
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch 
        out = self(images)                  # Generate predictions
        loss = F.cross_entropy(out, labels) # Calculate loss
        return loss
    
    def validation_step(self, batch):
        images, labels = batch 
        out = self(images)                    # Generate predictions
        loss = F.cross_entropy(out, labels)   # Calculate loss
        acc = accuracy(out, labels)           # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc}
        
    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()      # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}
    
    def epoch_end(self, epoch, result):
        print("Epoch [{}], last_lr: {:.5f}, train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['lrs'][-1], result['train_loss'], result['val_loss'], result['val_acc']))
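
As a quick sanity check on these two metrics (a sketch with made-up tensors, not part of the original notebook), the cross-entropy of a model that scores every class equally is ln(C) for C classes. For 260 classes that is about 5.56, close to the validation loss printed before training. The second half computes accuracy the same way as the `accuracy` function above.

```python
import math
import torch
import torch.nn.functional as F

num_classes = 260  # class count reported by len(train_ds.classes)

# A model that knows nothing gives the same score to every class.
uniform_logits = torch.zeros(4, num_classes)
labels = torch.tensor([0, 1, 2, 3])
chance_loss = F.cross_entropy(uniform_logits, labels).item()

# Accuracy: fraction of rows whose arg-max matches the label.
logits = torch.tensor([[2.0, 0.1, 0.3],
                       [0.2, 1.5, 0.1],
                       [0.9, 0.8, 0.1],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 1, 1, 2])
_, preds = torch.max(logits, dim=1)
acc = (preds == targets).float().mean().item()

print(round(chance_loss, 4), round(math.log(num_classes), 4), acc)
```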

As we are using transfer learning, we start from the pretrained model resnet34, which was trained on the ImageNet database. This lets us benefit from what it learned on millions of high-resolution images. The architecture of resnet34 can be inspected below.

In [30]:
resnet34 = models.resnet34()
#resnet34

We extend our base classification model with the pretrained model resnet34.

We also define two useful functions, freeze and unfreeze. Freeze fixes the weights and biases of the earlier layers (those from resnet34, before fc) during training, so that only the weights and biases of the last layer (fc) change. This lets the new last layer be trained first (the weights and biases from resnet34 have already been trained); after a few epochs we unfreeze all the weights and biases and train the entire model.

In [31]:
class BirdResnet(ImageClassificationBase):
    def __init__(self):
        super().__init__()
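        # Use a pretrained model (newer torchvision releases replace
        # pretrained=True with weights=models.ResNet34_Weights.DEFAULT)
        self.network = models.resnet34(pretrained=True)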
        # Replace last layer
        num_ftrs = self.network.fc.in_features
        self.network.fc = nn.Linear(num_ftrs, len(train_ds.classes))
    
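    def forward(self, xb):
        # Note: F.cross_entropy applies log-softmax itself, so the sigmoid
        # squashes the scores into (0, 1) and keeps the loss values high;
        # returning self.network(xb) directly is the more usual choice.
        return torch.sigmoid(self.network(xb))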
    
    def freeze(self):
        # Freeze the residual layers; the attribute is `requires_grad`
        for param in self.network.parameters():
            param.requires_grad = False
        # Keep the new last layer trainable
        for param in self.network.fc.parameters():
            param.requires_grad = True
    
    def unfreeze(self):
        # Unfreeze all layers
        for param in self.network.parameters():
            param.requires_grad = True
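
The effect of freezing can be checked by counting trainable parameters. The sketch below uses a hypothetical tiny network in place of resnet34 (the `net` and `count_trainable` names are assumptions for illustration). Note that the attribute autograd reads is `requires_grad`; a misspelled name would typically be accepted silently as a new Python attribute and freeze nothing.

```python
import torch.nn as nn

# Hypothetical tiny network standing in for resnet34:
# a "backbone" followed by a final classification layer.
net = nn.Sequential(
    nn.Linear(8, 16),   # stands in for the pretrained layers
    nn.ReLU(),
    nn.Linear(16, 4),   # stands in for the replaced fc layer
)

def count_trainable(module):
    """Number of parameter values that still receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

total = count_trainable(net)

# Freeze everything, then re-enable only the last layer.
for p in net.parameters():
    p.requires_grad = False
for p in net[2].parameters():
    p.requires_grad = True

head_only = count_trainable(net)
print(total, head_only)
```

The same count applied to the real model before and after `freeze()` gives a quick confirmation that only the fc layer remains trainable.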

Here we define helper functions that move the data and the model to the device. The device we use is the GPU. If no GPU is available, the data is moved to the CPU instead.

In [32]:
def get_default_device():
    """Pick GPU if available, else CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')
    
def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device
        
    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl: 
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)

In [33]:
device = get_default_device()
#device

We load the data to the default device.

In [34]:
train_dl = DeviceDataLoader(train_dl, device)
val_dl = DeviceDataLoader(val_dl, device)

Instead of using a fixed learning rate for the training, we use learning rate scheduling. With this strategy, the learning rate changes after every batch of training. max_lr is the maximum value that the learning rate reaches. PyTorch provides schedulers that adjust the learning rate automatically during the training process.

We also record the training loss, the validation metrics and the learning rates used in each epoch. Following the one-cycle policy recommended in the literature, the learning rate first goes from a small value up to the maximum value and then back down to a much smaller minimum. With the default settings of OneCycleLR, the increase takes the first 30 percent of the steps rather than half of them.

To keep the weights and the gradients from becoming too large, we also use weight decay and gradient clipping. These also help the model generalize better.

In [35]:
@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def get_lr(optimizer):
    for param_group in optimizer.param_groups:
        return param_group['lr']

def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader, 
                  weight_decay=0, grad_clip=None, opt_func=torch.optim.SGD):
    torch.cuda.empty_cache()
    history = []
    
    # Set up custom optimizer with weight decay
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    # Set up one-cycle learning rate scheduler
    sched = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, epochs=epochs, 
                                                steps_per_epoch=len(train_loader))
    
    for epoch in range(epochs):
        # Training Phase 
        model.train()
        train_losses = []
        lrs = []
        for batch in tqdm(train_loader):
            loss = model.training_step(batch)
            train_losses.append(loss.detach())  # store the value without its graph
            loss.backward()
            
            # Gradient clipping
            if grad_clip: 
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)
            
            optimizer.step()
            optimizer.zero_grad()
            
            # Record & update learning rate
            lrs.append(get_lr(optimizer))
            sched.step()
        
        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        result['lrs'] = lrs
        model.epoch_end(epoch, result)
        history.append(result)
    return history
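
To see the shape of the schedule without training anything, the sketch below drives `OneCycleLR` with a single dummy parameter for 100 steps (the step count and variable names are assumptions for illustration). With the default arguments, the learning rate starts at `max_lr / 25`, peaks after roughly 30 percent of the steps, and then anneals to a much smaller value.

```python
import torch

# A single dummy parameter is enough to drive the optimizer and scheduler.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.01)

total_steps = 100
sched = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.01, total_steps=total_steps)

lrs = []
for _ in range(total_steps):
    lrs.append(optimizer.param_groups[0]['lr'])  # rate used for this batch
    optimizer.step()   # no gradients were computed, so nothing changes
    sched.step()       # one scheduler step per batch

peak_step = lrs.index(max(lrs))
print(lrs[0], max(lrs), peak_step, lrs[-1])
```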

Now, we transfer the model to the device.

In [36]:
model = to_device(BirdResnet(), device)

### 3. The training

Before training starts, we evaluate the model to get the starting point of the metrics (loss and accuracy). In the original run this gave a loss of 5.30 and a very low accuracy of 0.005; the output below shows a loss of about 5.57 and an accuracy of about 0.002. The objective is to minimize the loss and maximize the accuracy during training.

In [37]:
history = [evaluate(model, val_dl)]
history

[{'val_loss': 5.572378635406494, 'val_acc': 0.001953125}]

To start training the model, we first freeze the previous layers of the model as we mentioned previously.

In [38]:
model.freeze()

Here we define the values of the hyperparameters. In the original run we trained for 15 epochs with the earlier layers frozen and 15 more after unfreezing the model, for 30 epochs in total. We used a maximum learning rate of 0.001 (this value worked better than the others we tested), a gradient clipping value of 1 and a weight decay of 1e-5. A gradient clipping value of 1 means that each gradient element is kept within the range -1 to 1. We use Adam as the optimizer.

The full run takes hours. The cell below is set to a single epoch, with max_lr = 0.01 and weight_decay = 1e-4.

In [39]:
epochs = 1
max_lr = 0.01
grad_clip = 1
weight_decay = 1e-4
opt_func = torch.optim.Adam
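
As a small illustration of what a clipping value of 1 does (a sketch with hand-set gradients, not part of the original notebook), `clip_grad_value_` clamps each gradient element to the range [-1, 1] and leaves smaller values untouched. The parameter `w` and its gradient values are made up for the example.

```python
import torch
import torch.nn as nn

# One parameter vector with hand-set gradients, so the effect is easy to read.
w = nn.Parameter(torch.zeros(4))
w.grad = torch.tensor([3.0, -2.5, 0.5, -0.1])

# Clamp every gradient element to [-1, 1], as grad_clip = 1 does in fit_one_cycle.
nn.utils.clip_grad_value_([w], clip_value=1.0)

clipped = w.grad.tolist()
print(clipped)
```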

Finally, we start training the model. We can see how the loss and the accuracy change over the different epochs of the training. Starting with 0.005 accuracy, we jump to 0.7987 after the first epoch.

In [None]:
%%time
history += fit_one_cycle(epochs, max_lr, model, train_dl, val_dl,  
                         opt_func=opt_func)

Next, we unfreeze the model to train the entire architecture. At this step, we found that a max_lr of 0.0001 gave better accuracy.

In [None]:
model.unfreeze()

In [None]:
%%time
history += fit_one_cycle(epochs, 0.0001, model, train_dl, val_dl, 
                         grad_clip=grad_clip, 
                         weight_decay=weight_decay,
                         opt_func=opt_func)

In [None]:
train_time='41mn:12s'

As graphs may be more informative, we plot the history of the loss and the accuracy during the training process.

In [None]:
def plot_accuracies(history):
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs');

The plot shows clearly that the accuracy increases very fast at first and that the increase slows down after a certain number of epochs.

In [None]:
plot_accuracies(history)

In [None]:
def plot_losses(history):
    train_losses = [x.get('train_loss') for x in history]
    val_losses = [x['val_loss'] for x in history]
    plt.plot(train_losses, '-bx')
    plt.plot(val_losses, '-rx')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend(['Training', 'Validation'])
    plt.title('Loss vs. No. of epochs');

We can also see that the training loss and the validation loss move together. This suggests that we avoided overfitting in our training. We have to be careful in choosing the number of epochs, because the model may start overfitting if this number becomes too large.

In [None]:
plot_losses(history)

In [None]:
def plot_lrs(history):
    lrs = np.concatenate([x.get('lrs', []) for x in history])
    plt.plot(lrs)
    plt.xlabel('Batch no.')
    plt.ylabel('Learning rate')
    plt.title('Learning Rate vs. Batch no.');

This is the history of the considered learning rates during the training process.

In [None]:
plot_lrs(history)

### 4. Predictions

Now that we have trained the model, let's make a few predictions and assess its performance.

In [None]:
def predict_image(img, model):
    xb = img.unsqueeze(0)
    xb = to_device(xb, device)
    yb = model(xb)
    _, preds  = torch.max(yb, dim=1)
    return preds[0].item()
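
The last two lines of `predict_image` can be illustrated on a made-up score tensor (a sketch; the three-class setup and the scores are assumptions, with names taken from the start of the class list). `torch.max` with `dim=1` returns the index of the highest score in each row, which is then mapped back to a class name.

```python
import torch

# Hypothetical three-class setup using the first entries of CLASSES.
class_names = ['AFRICAN CROWNED CRANE', 'AFRICAN FIREFINCH', 'ALBATROSS']

# Pretend model output for one image: a batch of size 1, one score per class.
scores = torch.tensor([[0.2, 2.7, 0.4]])

_, preds = torch.max(scores, dim=1)   # index of the highest score per row
pred_idx = preds[0].item()
pred_name = class_names[pred_idx]
print(pred_idx, pred_name)
```

With the real model, the same lookup would be `test_ds.classes[predict_image(img, model)]` for an image tensor `img` taken from the test dataset.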

As we can see, the model predicts the different species of birds quite well. Comparing the predicted label of each image with its actual label shows that they are the same.

### 5. Final evaluation of the trained model on test dataset

We evaluate the model on the test dataset and obtain a loss of 4.31 and an accuracy of 96 percent, which is a good accuracy.

In [None]:
test_dl = DataLoader(test_ds, batch_size*2, 
                    num_workers=2, pin_memory=True)

test_dl = DeviceDataLoader(test_dl, device)
result = evaluate(model, test_dl)
result
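
A loss of 4.31 alongside 96 percent accuracy may look surprising. One possible reading, offered here as an assumption rather than the author's stated analysis, is that `forward` applies `torch.sigmoid` before `F.cross_entropy`. Scores confined to (0, 1) cannot push the loss below log(1 + (C - 1) / e), which is about 4.31 for C = 200 classes. The sketch below checks this floor on made-up scores.

```python
import math
import torch
import torch.nn.functional as F

num_classes = 200  # class count stated in the text for the reported results

# Best case for sigmoid-squashed scores: the correct class saturates at 1
# and every other class saturates at 0.
raw = torch.full((1, num_classes), -50.0)
raw[0, 0] = 50.0
label = torch.tensor([0])

loss_with_sigmoid = F.cross_entropy(torch.sigmoid(raw), label).item()
loss_with_logits = F.cross_entropy(raw, label).item()

# Closed form of the same floor: log(1 + (C - 1) / e).
floor = math.log(1 + (num_classes - 1) / math.e)
print(round(loss_with_sigmoid, 3), round(floor, 3), loss_with_logits)
```

Under this reading the accuracy remains the more informative metric here, since the loss is compressed into a narrow band near its floor.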

### 6. Conclusion

In conclusion, we can say that our model predicts the different species of birds very well.

As future work on this dataset, we will try to use Generative Adversarial Networks to develop a model that generates images of birds.

This course was an amazing journey through which we learned many things. As the name of the course indicates, we went from zero to GANs. We learned how to do machine learning with linear and logistic regression, and deep learning with feed-forward neural networks, convolutional neural networks and GANs.

# References

https://neurohive.io/en/popular-networks/resnet/
https://sgugger.github.io/the-1cycle-policy.html
https://towardsdatascience.com/this-thing-called-weight-decay-a7cd4bcfccab
https://www.youtube.com/watch?v=sJF6PiAjE1M

# Social Media

https://www.linkedin.com/in/moustapha-daouda-dala/