# Species Classification with Metadata and Uncertainty — iNaturalist Project

In this notebook, we build a **convolutional neural network (CNN) for image classification** that identifies species
from the iNaturalist dataset. The model combines computer vision with **user-provided metadata**
(such as kingdom and phylum) and outputs both a prediction and an **uncertainty estimate**, to build trust in the
app's computer-vision predictions.
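
One common way to combine image features with metadata is late fusion: each taxonomic field is one-hot encoded and concatenated with the CNN feature vector before the classification layers. The block below is a minimal sketch of that idea, not necessarily the fusion used later in this notebook. The vocabularies `KINGDOMS` and `PHYLA`, the helpers `one_hot` and `fuse_features`, and the 8-dimensional feature vector are all hypothetical stand-ins.

```python
import numpy as np

# Hypothetical vocabularies; in practice they would come from the dataset's category table.
KINGDOMS = ["Animalia", "Fungi", "Plantae"]
PHYLA = ["Arthropoda", "Basidiomycota", "Chordata", "Tracheophyta"]


def one_hot(value, vocabulary):
    """Return a one-hot vector for `value`; all zeros if the value is unknown."""
    vec = np.zeros(len(vocabulary), dtype=np.float32)
    if value in vocabulary:
        vec[vocabulary.index(value)] = 1.0
    return vec


def fuse_features(image_features, kingdom, phylum):
    """Concatenate CNN features with one-hot encoded metadata (late fusion)."""
    metadata = np.concatenate([one_hot(kingdom, KINGDOMS), one_hot(phylum, PHYLA)])
    return np.concatenate([np.asarray(image_features, dtype=np.float32), metadata])


# Stand-in for the pooled output of a pretrained CNN (a real vector is much longer).
fake_image_features = np.random.default_rng(0).normal(size=8)
fused = fuse_features(fake_image_features, kingdom="Animalia", phylum="Chordata")
print(fused.shape)  # (15,) = 8 image features + 3 kingdoms + 4 phyla
```

An unknown or missing value maps to an all-zero block, so the classifier can still run when a user leaves a metadata field empty.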

---

## Objectives: Improving on a Simple CNN

1. Load and preprocess the iNaturalist dataset.
2. Apply **advanced image augmentation** using `albumentations` to increase performance.
3. Build a model that combines:
    - CNN features from a **pretrained TensorFlow/Keras model** (EfficientNetB3).
    - Metadata from the iNaturalist database, which provides **images, categories, and annotations**.
4. Train using **Class-Balanced Loss (CB Loss)** for imbalanced species data.
5. Implement **Monte Carlo Dropout** for uncertainty estimation.
6. Test predictions with confidence and uncertainty output.
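
Objective 4 can be made concrete with a small sketch. Class-Balanced Loss, as commonly described, weights each class by the inverse of its "effective number" of samples, `(1 - beta**n) / (1 - beta)`, so rare species contribute more to the loss. The function names, the `beta` default, and the example counts below are illustrative assumptions, not values taken from this notebook.

```python
import numpy as np


def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class weights from the effective number of samples (counts must be >= 1)."""
    counts = np.asarray(samples_per_class, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = 1.0 / effective_num
    # Normalize so the weights sum to the number of classes.
    return weights / weights.sum() * len(counts)


def class_balanced_cross_entropy(probs, labels, weights, eps=1e-12):
    """Mean cross-entropy where each sample is scaled by the weight of its true class."""
    probs = np.asarray(probs, dtype=np.float64)
    labels = np.asarray(labels)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(np.mean(weights[labels] * -np.log(true_class_probs + eps)))


# Hypothetical image counts for a common, an uncommon, and a rare species.
counts = [1000, 100, 10]
weights = class_balanced_weights(counts)

probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
labels = np.array([0, 2])
loss = class_balanced_cross_entropy(probs, labels, weights)
print(weights, loss)
```

With `beta` close to 1 the weights approach inverse class frequency; with `beta = 0` every class gets the same weight, which recovers ordinary cross-entropy.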
 
---



## Loading and Preparing Data

```python
import os
import cv2
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import json
import tensorflow as tf
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Flatten, Activation, Dropout, GlobalAveragePooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import optimizers, applications
from tensorflow.keras.callbacks import ModelCheckpoint, LearningRateScheduler, TensorBoard, EarlyStopping
from tensorflow.keras import backend as K
import albumentations as A
from albumentations.core.composition import OneOf
from albumentations.pytorch import ToTensorV2


# Load the iNaturalist 2019 training annotations
ann_file = '../input/inaturalist-2019-fgvc6/train2019.json'
with open(ann_file) as data_file:
    train_anns = json.load(data_file)

train_anns_df = pd.DataFrame(train_anns['annotations'])[['image_id', 'category_id']]
train_img_df = pd.DataFrame(train_anns['images'])[['id', 'file_name']].rename(columns={'id': 'image_id'})
# Category table, keyed by category_id (assumes a COCO-style `id` field),
# so its taxonomy fields can be used as metadata later
train_cat_df = pd.DataFrame(train_anns['categories']).rename(columns={'id': 'category_id'})
# pd.merge joins two frames at a time: images + annotations on image_id, then categories on category_id
df_train_file_cat = (
    train_img_df
    .merge(train_anns_df, on='image_id')
    .merge(train_cat_df, on='category_id', how='left')
)
df_train_file_cat['category_id'] = df_train_file_cat['category_id'].astype(str)
df_train_file_cat.head()

# Example of images for category_id = 400
img_names = df_train_file_cat[df_train_file_cat['category_id']=='400']['file_name'][:30]

plt.figure(figsize=[15,15])
i = 1
for img_name in img_names:
    img = cv2.imread("../input/inaturalist-2019-fgvc6/train_val2019/%s" % img_name)[...,[2, 1, 0]]
    plt.subplot(6, 5, i)
    plt.imshow(img)
    i += 1
plt.show()

# Load the iNaturalist 2019 validation annotations
valid_ann_file = '../input/inaturalist-2019-fgvc6/val2019.json'
with open(valid_ann_file) as data_file:
    valid_anns = json.load(data_file)

valid_anns_df = pd.DataFrame(valid_anns['annotations'])[['image_id', 'category_id']]
valid_img_df = pd.DataFrame(valid_anns['images'])[['id', 'file_name']].rename(columns={'id': 'image_id'})
# The category table is shared across splits, so the one loaded with the training annotations is reused
valid_cat_df = pd.DataFrame(train_anns['categories']).rename(columns={'id': 'category_id'})
# Two-step merge: images + annotations on image_id, then categories on category_id
df_valid_file_cat = (
    valid_img_df
    .merge(valid_anns_df, on='image_id')
    .merge(valid_cat_df, on='category_id', how='left')
)
df_valid_file_cat['category_id'] = df_valid_file_cat['category_id'].astype(str)
df_valid_file_cat.head()

```

## Augmentation

Data augmentation is a technique used in machine learning, especially in computer vision, to artificially increase the size and variety of a training dataset by creating modified versions of existing images.

This helps the model generalize better and become less sensitive to specific positions, lighting, angles, or distortions in the input data.

Typical image augmentations include:

- Resizing and cropping
- Flipping and rotating
- Brightness and contrast adjustments
- Shifting, scaling, and zooming
- Noise, blurring, and distortion

By teaching the model to handle these variations, augmentation improves its ability to recognize patterns in new, unseen images.

```python
# Advanced image augmentation: random crops, flips, brightness/contrast changes, and shifts
from tensorflow.keras.applications.efficientnet import preprocess_input
from tensorflow.keras.utils import Sequence, to_categorical


# Custom Albumentations transform for EfficientNet preprocessing
class EfficientNetPreprocess(A.ImageOnlyTransform):
    def __init__(self, p=1.0):
        super().__init__(p=p)

    def apply(self, image, **params):
        # Keras EfficientNet rescales pixels inside the model, so this mainly casts to float32
        return preprocess_input(image.astype(np.float32))

# Training transform (no ToTensorV2: Keras expects channels-last NumPy arrays, not PyTorch tensors)
train_transform = A.Compose([
    # Positional (height, width) form; newer albumentations releases may expect size=(300, 300)
    A.RandomResizedCrop(300, 300, scale=(0.8, 1.0), p=1.0),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.ShiftScaleRotate(shift_limit=0.2, scale_limit=0.2, rotate_limit=20, p=0.5),
    EfficientNetPreprocess(),
])

# Validation transform: deterministic resize only
valid_transform = A.Compose([
    A.Resize(300, 300),
    EfficientNetPreprocess(),
])

# Custom data generator that applies an Albumentations transform to every image
class AlbumentationsDataGenerator(Sequence):
    def __init__(self, dataframe, directory, transform, num_classes, batch_size=32, shuffle=True):
        super().__init__()
        self.dataframe = dataframe.reset_index(drop=True)
        self.directory = directory
        self.transform = transform
        self.num_classes = num_classes  # labels are assumed to be ids in 0..num_classes-1
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.indices = np.arange(len(self.dataframe))
        self.on_epoch_end()

    def __len__(self):
        # Number of batches per epoch (the last batch may be smaller)
        return int(np.ceil(len(self.dataframe) / self.batch_size))

    def __getitem__(self, index):
        batch_indices = self.indices[index * self.batch_size:(index + 1) * self.batch_size]
        batch = [self.dataframe.iloc[i] for i in batch_indices]
        return self._load_batch(batch)

    def on_epoch_end(self):
        if self.shuffle:
            np.random.shuffle(self.indices)

    def _load_batch(self, batch):
        X = []
        y = []
        for record in batch:
            image_path = os.path.join(self.directory, record["file_name"])
            image = cv2.imread(image_path)
            if image is None:
                raise FileNotFoundError(f"Could not read image: {image_path}")
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            augmented = self.transform(image=image)
            X.append(augmented['image'])
            y.append(int(record["category_id"]))  # category_id is stored as a string

        X = np.stack(X)
        y = to_categorical(y, num_classes=self.num_classes)
        return X, y

# Both generators must share the same label space, so the class count comes from the training set
num_classes = df_train_file_cat['category_id'].nunique()

# Create generators
train_generator = AlbumentationsDataGenerator(
    dataframe=df_train_file_cat,
    directory="../input/inaturalist-2019-fgvc6/train_val2019",
    transform=train_transform,
    num_classes=num_classes,
    batch_size=32,
    shuffle=True
)

valid_generator = AlbumentationsDataGenerator(
    dataframe=df_valid_file_cat,
    directory="../input/inaturalist-2019-fgvc6/train_val2019",
    transform=valid_transform,
    num_classes=num_classes,
    batch_size=32,
    shuffle=False
)

```

## Training with the Model

EfficientNetB3 is part of the EfficientNet family, a series of convolutional neural networks designed to balance accuracy and efficiency.
It is a good fit here because:

- **Pretrained on ImageNet:** it has already learned rich visual features from over a million images.
- **Optimized scaling:** EfficientNet scales width, depth, and resolution in a balanced way, so it can handle complex images without wasting resources.
- **Lightweight but powerful:** compared with older architectures such as ResNet or VGG, EfficientNetB3 reaches higher ImageNet accuracy while using fewer parameters and less computation.
- **Transfer-learning friendly:** it can be fine-tuned for your own dataset, even with limited labeled data.

This makes EfficientNetB3 well suited to modern computer vision tasks that need a model that is both fast and accurate.

```python
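img_size = 300  # matches the 300x300 images produced by the augmentation transforms
nb_classes = df_train_file_cat['category_id'].nunique()

# Pretrained backbone, frozen so that only the new head is trained at first
base_model = applications.EfficientNetB3(
    weights='imagenet', include_top=False, input_shape=(img_size, img_size, 3)
)
base_model.trainable = False

# Custom classification head
# (GlobalAveragePooling2D, already imported, is a lighter alternative to Flatten)
x = base_model.output
x = Flatten()(x)
x = Dense(1024, activation="relu")(x)
x = Dropout(0.5)(x)
predictions = Dense(nb_classes, activation="softmax")(x)
model_final = Model(inputs=base_model.input, outputs=predictions)

# `lr` is now `learning_rate`; the legacy `decay` argument is dropped
# because current Keras optimizers no longer accept it
model_final.compile(
    optimizer=optimizers.RMSprop(learning_rate=0.0001),
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
```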
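
The `Dropout(0.5)` layer in the head is what Monte Carlo Dropout relies on: dropout stays active at prediction time, the same input is passed through the network several times, and the spread of the outputs serves as an uncertainty estimate. The sketch below shows only the aggregation step in NumPy. `mc_dropout_predict` and the toy `toy_forward` network are hypothetical stand-ins, and predictive entropy is one of several possible uncertainty measures.

```python
import numpy as np


def mc_dropout_predict(stochastic_forward, x, n_passes=20):
    """Summarize several stochastic forward passes.

    `stochastic_forward(x)` must return class probabilities of shape
    (batch, classes) with dropout still active, so repeated calls differ.
    """
    samples = np.stack([np.asarray(stochastic_forward(x)) for _ in range(n_passes)])
    mean_probs = samples.mean(axis=0)          # average over the passes
    predicted_class = mean_probs.argmax(axis=1)
    confidence = mean_probs.max(axis=1)
    # Predictive entropy: higher values mean the model is less certain.
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12), axis=1)
    return predicted_class, confidence, entropy


# Toy stand-in for a network with dropout: a fixed linear layer with randomly dropped inputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))


def toy_forward(x, drop_rate=0.5):
    mask = rng.random(x.shape) >= drop_rate
    logits = (x * mask / (1.0 - drop_rate)) @ W
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


x = rng.normal(size=(5, 4))
classes, confidence, entropy = mc_dropout_predict(toy_forward, x, n_passes=50)
print(classes.shape, confidence.shape, entropy.shape)  # (5,) (5,) (5,)
```

With the Keras model, the stochastic forward pass could be `lambda batch: model_final(batch, training=True).numpy()`, since calling a Keras model with `training=True` keeps its Dropout layers active.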