# Classification of Bird Species in Singapore

## Introduction
Singapore is a vibrant tropical country that, despite its urbanization, is a paradise for nature lovers. In particular, its nature reserves are home to many bird species, attracting birdwatchers, nature enthusiasts and weekend hikers from all over the world. The official website of [Singapore's National Parks](https://www.nparks.gov.sg/biodiversity/wildlife-in-singapore/species-list/bird) catalogs over 400 bird species that can be found on the island. With so many species, it is common to photograph or spot a bird in the wild without being able to identify it.

## Dataset
We construct our dataset by employing web scrapers on websites such as [Flickr](./from_flickr_to_dataset.ipynb) and [Internet Bird Collection](./scraper_ibc.ipynb) to crawl publicly available images of the common bird species in Singapore.

In [None]:
import math
import pandas as pd
import numpy as np
import tensorflow as tf
from glob import glob
import os
import matplotlib.pyplot as plt
from IPython.display import display, display_html
import itertools
import keras_preprocessing.image
from collections import defaultdict
from keras.preprocessing.image import ImageDataGenerator
from keras import backend as K
from tensorflow.python.keras.preprocessing.image import load_img, img_to_array
from tensorflow.python.keras.applications import ResNet50
from tensorflow.python.keras.applications.resnet50 import preprocess_input
from tensorflow.python.keras.models import Sequential, Model
from tensorflow.python.keras.layers import Dense, Flatten, BatchNormalization, Input, Dropout, Lambda
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import normalize
from sklearn.neighbors import NearestNeighbors

# The following resolves the issue of possibly truncated images in the datasets
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True

## Data Preprocessing and Augmentation
Next we proceed with the preprocessing of the raw images.

This involves the following:

1. A [web application](http://172.21.144.45:5000/) was developed to let users manually remove noisy images from the raw dataset. Examples of noisy images:
   - The image does not correspond to the bird species
   - Too much sightline occlusion
   - More than one bird present

2. Remove bird species with fewer than 50 images

3. Data normalization and augmentation.

For (2) and (3), we employ the Keras API. Dataframes are created for the training and test sets.

In [None]:
# List all data folders for the raw images
data_folders = [f for f in os.scandir('data') if f.is_dir()]
nspecies = len(data_folders)

train_ratio = 0.8

train_paths = []

test_paths = []

# Create dataframes that can be used directly by Keras ImageDataGenerator class
for i in range(nspecies):
    image_list = glob(data_folders[i].path + "/*.jpg")

    # Remove datasets with fewer than 50 images
    if len(image_list)<50:
        continue

    # Split each species into training and testing images.
    # The [5:] slice strips the leading 'data/' prefix so paths are relative to the data folder.
    train_idx = math.floor(len(image_list)*train_ratio)
    for j in range(train_idx):
        train_paths.append(image_list[j][5:])
    for j in range(train_idx,len(image_list)):
        test_paths.append(image_list[j][5:])
train_set = pd.DataFrame({'paths':train_paths})
test_set = pd.DataFrame({'paths':test_paths})

# The clean dataset is obtained by manual cleaning
dataset_clean = pd.read_csv('./dataset_clean.csv')

train_set = pd.merge(train_set, dataset_clean, how='inner', on=['paths'])
test_set = pd.merge(test_set, dataset_clean, how='inner', on=['paths'])

## Data Visualization and Exploration

In [None]:
# First we list the amount of training and testing images for each species
train_distribution = train_set.groupby(['species'], as_index=False).count()
train_distribution.drop(train_distribution.columns.difference(['species','paths']), axis=1, inplace=True)
train_distribution.columns=['species', '# Train Images']
train_distribution

test_distribution = test_set.groupby(['species'], as_index=False).count()
test_distribution.drop(test_distribution.columns.difference(['species','paths']), axis=1, inplace=True)
test_distribution.columns=['species', '# Test Images']
test_distribution

species_info = pd.merge(train_distribution,test_distribution)
display(species_info)

In [None]:
# Read in all images belonging to a species
def read_images(species, trainortest):
    if trainortest == 'train':
        paths = train_set['paths'][train_set['species']==species].tolist()
    elif trainortest == 'test':
        paths = test_set['paths'][test_set['species']==species].tolist()
    paths = ['./data/' + s for s in paths]
    imgs = [load_img(img_path, target_size=(image_size, image_size)) for img_path in paths]
    img_array = np.array([img_to_array(img) for img in imgs])
    return img_array


# Visualize images from a species
image_size = 224  # ResNet50 input resolution; read_images needs it before the hyperparameter cell runs
all_species = train_set.species.unique()
species = all_species[0]

imgs = read_images(species, 'train')/255
n_imgs = imgs.shape[0]
columns = 4
rows = math.ceil(n_imgs / columns)  # subplot grid dimensions must be integers
fig = plt.figure(figsize=(20, rows * 5))
for i in range(n_imgs):
    plt.subplot(rows, columns, i + 1)
    plt.imshow(imgs[i])
    plt.axis('off')
plt.suptitle(species.upper(), fontsize='x-large')
fig.tight_layout()
fig.subplots_adjust(top=0.97)

## Our Proposed Classification Pipeline

Feature extraction will be done using the ResNet50 network pretrained on ImageNet.

Following feature extraction, we then train our network for classification over the bird species. For this task, we adopt and compare the classification performance of two loss functions.

- Softmax cross-entropy loss, which is commonly used in classification tasks.
- Triplet loss (an example of a metric learning loss).
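To make the first of these two losses concrete, here is a minimal NumPy sketch of softmax cross-entropy for a single example. The function name and the toy logits are illustrative assumptions; the notebook itself relies on Keras' built-in `categorical_crossentropy`.

```python
import numpy as np

def softmax_cross_entropy(logits, true_class):
    """Cross-entropy of a softmax over `logits` against one true class index."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[true_class]

# Toy scores for three hypothetical classes
logits = np.array([2.0, 0.5, -1.0])
loss_correct = softmax_cross_entropy(logits, true_class=0)  # highest score is the true class
loss_wrong = softmax_cross_entropy(logits, true_class=2)    # lowest score is the true class
print(loss_correct, loss_wrong)
```

The loss is small when the true class receives the highest score and grows as the true class is ranked lower.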

## Softmax Classification

In the following, we employ the Keras preprocessing framework to read in the training and testing datasets defined via the dataframes above. Keras also allows us to perform data augmentation efficiently.


**If the following error arises**, update your Keras package.

`'ImageDataGenerator' object has no attribute 'flow_from_dataframe'`

**Also note that we do not set `rescale=1./255` here, so that the inputs stay consistent with ResNet50 preprocessing, which operates on pixel values in the 0-255 range. Otherwise, there will be issues when evaluating the trained model on the test set.** This [link](https://github.com/keras-team/keras/issues/3477) discusses a similar issue.
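The sketch below illustrates why rescaling matters. It mimics, in plain NumPy, the "caffe"-style preprocessing that Keras' ResNet50 `preprocess_input` is understood to apply (RGB to BGR, then subtraction of the ImageNet channel means, with no division by 255). The helper name `resnet50_style_preprocess` is hypothetical and is not part of Keras.

```python
import numpy as np

# ImageNet per-channel means in BGR order (assumed values of the caffe-style preprocessing)
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def resnet50_style_preprocess(rgb_batch):
    """Flip RGB to BGR and subtract the channel means; pixel values are not rescaled."""
    bgr = rgb_batch[..., ::-1].astype(np.float32)
    return bgr - IMAGENET_BGR_MEAN

# A tiny all-white "image" batch of shape (1, 2, 2, 3)
white = np.full((1, 2, 2, 3), 255.0, dtype=np.float32)
processed = resnet50_style_preprocess(white)
rescaled_first = resnet50_style_preprocess(white / 255.0)

print(processed[0, 0, 0])       # roughly centered values for 0-255 input
print(rescaled_first[0, 0, 0])  # strongly negative values if the input was rescaled first
```

If the images were divided by 255 beforehand, every pixel would end up far below zero after mean subtraction, which is a very different input distribution from the one the pretrained weights expect.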

In [None]:
# Some hyperparameters
image_size = 224
batch_size = 32


# Patch the dataframe filenames interaction in Keras preprocessing
def patched_list_filenames(directory, white_list_formats, split,
                                       class_indices, follow_links, df=False):
    dirname = os.path.basename(directory)
    if split:
        num_files = len(list(
            _iter_valid_files(directory, white_list_formats, follow_links)))
        start, stop = int(split[0] * num_files), int(split[1] * num_files)
        valid_files = list(
            _iter_valid_files(
                directory, white_list_formats, follow_links))[start: stop]
    else:
        valid_files = _iter_valid_files(
            directory, white_list_formats, follow_links)
    if df:
        filenames = []
        for root, fname in valid_files:
            absolute_path = os.path.join(root, fname)
            relative_path = os.path.relpath(absolute_path, directory)
            filenames.append(relative_path)
        return filenames
    classes = []
    filenames = []
    for root, fname in valid_files:
        classes.append(class_indices[dirname])
        absolute_path = os.path.join(root, fname)
        relative_path = os.path.join(
            dirname, os.path.relpath(absolute_path, directory))
        filenames.append(relative_path)
    return classes, filenames

keras_preprocessing.image._list_valid_filenames_in_directory.__code__ = patched_list_filenames.__code__


# Train set generation and augmentation using Keras preprocessing
softmax_train_gen = ImageDataGenerator(horizontal_flip=True,
                                     width_shift_range = 0.4,
                                     height_shift_range = 0.4,
                                     zoom_range=0.3,
                                     rotation_range=20,
                                    )
softmax_train_gen = softmax_train_gen.flow_from_dataframe(
    dataframe=train_set, 
    directory='./data',
    x_col='paths', 
    y_col='species', 
    has_ext=True,
    target_size=(image_size, image_size),
    batch_size=batch_size,
    class_mode='categorical')


# Test set generation using Keras preprocessing
softmax_test_gen = ImageDataGenerator()
softmax_test_gen = softmax_test_gen.flow_from_dataframe(
    dataframe=test_set, 
    directory="./data/", 
    x_col="paths", 
    y_col="species", 
    has_ext=True,
    target_size=(image_size, image_size),
    batch_size=batch_size,
    class_mode='categorical')

num_classes = len(softmax_train_gen.class_indices)

## Softmax classification network

In [None]:
# Pretrained ResNet50 on ImageNet
softmax_model = Sequential()

softmax_model.add(ResNet50(include_top=False, pooling='avg', weights='imagenet'))
softmax_model.add(Flatten())
softmax_model.add(BatchNormalization())
softmax_model.add(Dense(2048, activation='relu'))
softmax_model.add(BatchNormalization())
softmax_model.add(Dense(1024, activation='relu'))
softmax_model.add(BatchNormalization())
softmax_model.add(Dense(num_classes, activation='softmax'))

softmax_model.layers[0].trainable = False

In [None]:
def top_3_accuracy(y_true, y_pred):
    return tf.keras.metrics.top_k_categorical_accuracy(y_true, y_pred, k=3)
# model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=[top_3_accuracy])

softmax_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
hist = softmax_model.fit_generator(generator=softmax_train_gen,
                           steps_per_epoch=int(train_set.shape[0]/batch_size)+1,
                           epochs=10,
                           shuffle=True,
                           validation_data=softmax_test_gen,
                           validation_steps=int(test_set.shape[0]/batch_size)+1)

## Evaluating on Test Images

We evaluate on test images and display the results using a confusion matrix and precision/recall/f1-scores.
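As a reminder of what these metrics measure, the following NumPy sketch builds a confusion matrix by hand (rows are true classes, columns are predicted classes) and derives per-class precision, recall and F1 from it. The toy labels and the helper name `confusion_counts` are assumptions for illustration; the notebook uses scikit-learn's `confusion_matrix` and `classification_report`.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    """Count matrix with true classes on rows and predicted classes on columns."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def safe_divide(num, den):
    """Element-wise division that returns 0 where the denominator is 0."""
    return np.divide(num, den, out=np.zeros(len(num)), where=den > 0)

# Toy labels for three hypothetical classes
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 0]

cm = confusion_counts(y_true, y_pred, n_classes=3)
true_positives = np.diag(cm).astype(float)
precision = safe_divide(true_positives, cm.sum(axis=0).astype(float))  # per predicted class
recall = safe_divide(true_positives, cm.sum(axis=1).astype(float))     # per true class
f1 = safe_divide(2 * precision * recall, precision + recall)

print(cm)
print(precision, recall, f1)
```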

In [None]:
# Species names ordered by the class index Keras assigned in the training generator
species_names = sorted(softmax_train_gen.class_indices, key=softmax_train_gen.class_indices.get)

def read_test_images(species):
    paths = test_set['paths'][test_set['species']==species].tolist()
    paths = ['./data/' + s for s in paths]
    imgs = [load_img(img_path, target_size=(image_size, image_size)) for img_path in paths]
    img_array = np.array([img_to_array(img) for img in imgs])
    return preprocess_input(img_array)

prediction_results = []
ground_truth = []
for i, species in enumerate(species_names):
    predictions = softmax_model.predict_classes(read_test_images(species))
    prediction_results.append(predictions)
    # One ground-truth label per test image of this species
    ground_truth.append(np.full(len(predictions), i))
prediction_results = np.hstack(prediction_results)
ground_truth = np.hstack(ground_truth)

In [None]:
print('Confusion Matrix')
CM = confusion_matrix(ground_truth, prediction_results)
print(CM)
print('Classification Report')
CR = classification_report(ground_truth, prediction_results,
                           labels=list(range(len(species_names))),
                           target_names=species_names)
print(CR)

## Triplet loss classification

The triplet loss is an example of a metric learning loss. The goal is to learn a metric function that maps from the base features extracted using the pretrained ResNet50 to a target embedding vector. This mapping minimizes intra-species distances while maximizing inter-species distances.
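The idea can be sketched in NumPy on a few hand-made, L2-normalized embeddings. This sketch assumes the squared Euclidean distance and a margin of 1, with a hinge ("maxplus") or softplus variant; the function names and toy vectors are illustrative and are not the notebook's Keras implementation.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def triplet_loss_np(anchor, positive, negative, margin_type="maxplus"):
    """Mean triplet loss over a batch, using squared Euclidean distances."""
    pos_dist = np.sum((anchor - positive) ** 2, axis=-1)
    neg_dist = np.sum((anchor - negative) ** 2, axis=-1)
    diff = pos_dist - neg_dist
    if margin_type == "maxplus":
        per_triplet = np.maximum(0.0, 1.0 + diff)  # hinge with margin 1
    else:
        per_triplet = np.log1p(np.exp(diff))       # smooth softplus variant
    return per_triplet.mean()

# Two toy triplets: the first is well separated, the second is badly ordered
anchor = l2_normalize(np.array([[1.0, 0.0], [0.0, 1.0]]))
positive = l2_normalize(np.array([[1.0, 0.0], [1.0, 0.0]]))
negative = l2_normalize(np.array([[-1.0, 0.0], [0.0, 1.0]]))

hinge = triplet_loss_np(anchor, positive, negative, "maxplus")
soft = triplet_loss_np(anchor, positive, negative, "softplus")
print(hinge, soft)
```

The first triplet contributes nothing to the hinge loss because its negative is already much farther away than its positive; only the second, badly ordered triplet is penalized.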

In [None]:
# Some hyperparameters
image_size = 224
batch_size = 32


class sample_gen(object):
    def __init__(self, file_class_mapping):
        self.file_class_mapping= file_class_mapping
        self.class_to_list_files = defaultdict(list)
        self.list_all_files = list(file_class_mapping.keys())
        self.range_all_files = list(range(len(self.list_all_files)))

        for file, class_ in file_class_mapping.items():
            self.class_to_list_files[class_].append(file)

        self.list_classes = list(set(self.file_class_mapping.values()))
        self.range_list_classes= range(len(self.list_classes))
        self.class_weight = np.array([len(self.class_to_list_files[class_]) for class_ in self.list_classes])
        self.class_weight = self.class_weight/np.sum(self.class_weight)

    def get_sample(self):
        class_idx = np.random.choice(self.range_list_classes, 1, p=self.class_weight)[0]
        examples_class_idx = np.random.choice(range(len(self.class_to_list_files[self.list_classes[class_idx]])), 2)
        anchor_example, positive_example = self.class_to_list_files[self.list_classes[class_idx]][examples_class_idx[0]], self.class_to_list_files[self.list_classes[class_idx]][examples_class_idx[1]]

        negative_example = None
        while negative_example is None or self.file_class_mapping[negative_example] == self.file_class_mapping[anchor_example]:
            negative_example_idx = np.random.choice(self.range_all_files, 1)[0]
            negative_example = self.list_all_files[negative_example_idx]
        return anchor_example, positive_example,negative_example

    
def read_single_image(img_path):
    img = load_img('./data/' + img_path, target_size=(image_size, image_size))
    img_array = np.array(img_to_array(img))
    return preprocess_input(img_array)


def gen(triplet_gen):
    while True:
        list_anchor = []
        list_negative = []
        list_positive = []

        for i in range(batch_size):
            path_anchor, path_positive, path_negative = triplet_gen.get_sample()
            
            anchor = read_single_image(path_anchor)
            positive = read_single_image(path_positive)
            negative = read_single_image(path_negative)

            list_anchor.append(anchor)
            list_positive.append(positive)
            list_negative.append(negative)

        # Images were already preprocessed in read_single_image, so only stack them here
        A = np.array(list_anchor)
        P = np.array(list_positive)
        N = np.array(list_negative)
        
        yield ({'anchor_input': A, 'positive_input': P, 'negative_input': N}, None)

In [None]:
def triplet_loss(inputs, dist='sqeuclidean', margin='maxplus'):
    anchor, positive, negative = inputs
    positive_distance = K.square(anchor - positive)
    negative_distance = K.square(anchor - negative)
    if dist == 'euclidean':
        positive_distance = K.sqrt(K.sum(positive_distance, axis=-1, keepdims=True))
        negative_distance = K.sqrt(K.sum(negative_distance, axis=-1, keepdims=True))
    elif dist == 'sqeuclidean':
        positive_distance = K.sum(positive_distance, axis=-1, keepdims=True)
        negative_distance = K.sum(negative_distance, axis=-1, keepdims=True)
    loss = positive_distance - negative_distance
    if margin == 'maxplus':
        loss = K.maximum(0.0, 1 + loss)
    elif margin == 'softplus':
        loss = K.log(1 + K.exp(loss))
    return K.mean(loss)


embedding_dim = 50
def GetModel():
    base_model = ResNet50(weights='imagenet', include_top=False, pooling='max')
    for layer in base_model.layers:
        layer.trainable = False
    
    x = base_model.output
    x = Dropout(0.6)(x)
    x = Dense(embedding_dim)(x)
    x = Lambda(lambda  x: K.l2_normalize(x,axis=1))(x)
    embedding_model = Model(base_model.input, x, name="embedding")

    input_shape = (image_size, image_size, 3)
    anchor_input = Input(input_shape, name='anchor_input')
    positive_input = Input(input_shape, name='positive_input')
    negative_input = Input(input_shape, name='negative_input')
    anchor_embedding = embedding_model(anchor_input)
    positive_embedding = embedding_model(positive_input)
    negative_embedding = embedding_model(negative_input)

    inputs = [anchor_input, positive_input, negative_input]
    outputs = [anchor_embedding, positive_embedding, negative_embedding]
       
    triplet_model = Model(inputs, outputs)
    triplet_model.add_loss(K.mean(triplet_loss(outputs)))

    return embedding_model, triplet_model


embedding_model, triplet_model = GetModel()

In [None]:
train_paths_species = {img_path: species for img_path, species in zip(train_set.paths, train_set.species)}
train_paths_species = sample_gen(train_paths_species)
triplet_train_gen = gen(train_paths_species)

test_paths_species = {img_path: species for img_path, species in zip(test_set.paths, test_set.species)}
test_paths_species = sample_gen(test_paths_species)
triplet_test_gen = gen(test_paths_species)

In [None]:
triplet_model.compile(loss=None, optimizer='adam')
history = triplet_model.fit_generator(triplet_train_gen,
                              epochs=5,
                              steps_per_epoch=train_set.shape[0]//batch_size,
                              validation_data=triplet_test_gen, 
                              validation_steps=test_set.shape[0]//batch_size)

## Evaluating on Test Images

We evaluate on test images and display the results using a confusion matrix and precision/recall/f1-scores.
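For the triplet model, classification is done in the embedding space: each test embedding receives the label of its nearest training embedding. A minimal NumPy sketch of this 1-nearest-neighbor rule follows; the species names and two-dimensional vectors are made up for illustration, and the cells below use scikit-learn's `NearestNeighbors` on the real embeddings.

```python
import numpy as np

def nearest_neighbor_labels(train_features, train_labels, test_features):
    """Assign each test vector the label of its closest training vector."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train)
    diffs = test_features[:, None, :] - train_features[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    nearest = np.argmin(sq_dists, axis=1)
    return [train_labels[i] for i in nearest]

# Hypothetical embeddings and labels
train_features = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
train_labels = ["kingfisher", "kingfisher", "sunbird"]
test_features = np.array([[0.8, 0.2], [0.1, 0.9]])

predicted = nearest_neighbor_labels(train_features, train_labels, test_features)
print(predicted)
```

For this to work on real data, the list of training labels must be in exactly the same order as the rows of the training feature matrix.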

In [None]:
def read_images(species, trainortest):
    if trainortest == 'train':
        paths = train_set['paths'][train_set['species']==species].tolist()
    elif trainortest == 'test':
        paths = test_set['paths'][test_set['species']==species].tolist()
    paths = ['./data/' + s for s in paths]
    imgs = [load_img(img_path, target_size=(image_size, image_size)) for img_path in paths]
    img_array = np.array([img_to_array(img) for img in imgs])
    return img_array


train_features = []
for species in all_species:
    train_imgs = preprocess_input(read_images(species, 'train'))
    train_features.append(embedding_model.predict(train_imgs))
    
train_features = np.vstack(train_features)


test_features = []
for species in all_species:
    test_imgs = preprocess_input(read_images(species, 'test'))
    test_features.append(embedding_model.predict(test_imgs))
    
test_features = np.vstack(test_features)

In [None]:
neigh = NearestNeighbors(n_neighbors=1)
neigh.fit(train_features)
distances_test, neighbors_test = neigh.kneighbors(test_features)

In [None]:
train_labels = []
test_labels = []
# Iterate in the same species order that was used to stack train_features and test_features
for i in all_species:
    nspecies_i = train_distribution['# Train Images'][train_distribution.species==i].iloc[0]
    for j in range(nspecies_i):
        train_labels.append(i)
    
    nspecies_i = test_distribution['# Test Images'][test_distribution.species==i].iloc[0]
    for j in range(nspecies_i):
        test_labels.append(i)

predicted_labels = []
for i in range(neighbors_test.shape[0]):
    # With n_neighbors=1, each row holds the index of the single nearest training image
    predicted_labels.append(train_labels[neighbors_test[i, 0]])

In [None]:
print('Confusion Matrix')
CM = confusion_matrix(test_labels, predicted_labels)
print(CM)
print('Classification Report')
# classification_report orders string labels alphabetically, so the names must be sorted to match
target_names = sorted(train_paths_species.list_classes)
CR = classification_report(test_labels, predicted_labels, target_names=target_names)
print(CR)