In this experiment, we visualize the uncertainty of a deep neural network by
using dropout as a means of Bayesian approximation [[1]](https://arxiv.org/abs/1506.02142).
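
The idea in [1] is to keep dropout switched on at prediction time and run the same input through the network several times; the spread of the resulting predictions then serves as an uncertainty estimate. A minimal sketch of that procedure follows. It uses a toy single-layer softmax classifier in plain Python rather than this notebook's `Network` class, and the names `stochastic_forward`, `mc_dropout_predict`, `toy_weights` and `n_passes` are illustrative assumptions, not part of this project.

```python
import math
import random
import statistics


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def stochastic_forward(x, weights, dropout_rate, rng):
    """One forward pass of a toy linear classifier with dropout left ON.

    Each input feature is dropped with probability `dropout_rate`; the
    survivors are rescaled by 1 / (1 - dropout_rate) (inverted dropout).
    """
    keep = 1.0 - dropout_rate
    dropped = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    logits = [sum(w * xi for w, xi in zip(row, dropped)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(x, weights, dropout_rate=0.2, n_passes=100, seed=0):
    """Return per-class mean and standard deviation over stochastic passes."""
    rng = random.Random(seed)
    samples = [stochastic_forward(x, weights, dropout_rate, rng)
               for _ in range(n_passes)]
    per_class = list(zip(*samples))  # transpose: one tuple per class
    mean = [statistics.fmean(c) for c in per_class]
    std = [statistics.pstdev(c) for c in per_class]
    return mean, std


# Hypothetical 3-class model with 4 input features (illustration only).
toy_weights = [[1.0, 0.5, -0.5, 0.0],
               [0.0, 1.0, 0.5, -0.5],
               [-0.5, 0.0, 1.0, 0.5]]
toy_input = [2.0, 1.0, 0.5, 0.1]

mean_probs, std_probs = mc_dropout_predict(toy_input, toy_weights)
print("mean:", [round(p, 3) for p in mean_probs])
print("std: ", [round(s, 3) for s in std_probs])
```

The mean over passes plays the role of the prediction, while a large standard deviation for a class signals that the model's answer depends heavily on which units happened to be dropped.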

We are going to use the [monkey species dataset from Kaggle](https://www.kaggle.com/slothkong/10-monkey-species).
If you want to run this notebook, copy it to your Colab, get your Kaggle API key,
upload it to Colab, download the Kaggle monkey dataset and start cracking.

# Download Dataset from Kaggle

In [None]:
! pip install -q kaggle
from google.colab import files

This is where you upload your Kaggle API JSON file.
Steps: go to your Kaggle account and look for the API token generation section.
Expire your previous tokens (if you have any) and then create a new token. You can
also create a new token without expiring the previous ones if you wish.
This will download a `kaggle.json` file. Upload this file in the following step.

In [None]:
# Run this cell and press Browse to upload the kaggle.json file
files.upload()
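
Before moving the token into place, it can help to confirm that the uploaded file is what the Kaggle CLI expects. The helper below is a hedged sketch: `check_kaggle_token` is a hypothetical name, and it assumes the token file is a JSON object with non-empty `username` and `key` entries.

```python
import json
import os


def check_kaggle_token(path="kaggle.json"):
    """Return a list of problems found in a Kaggle API token file (empty if none)."""
    if not os.path.isfile(path):
        return [f"{path} not found"]
    try:
        with open(path, encoding="utf-8") as fh:
            token = json.load(fh)
    except json.JSONDecodeError:
        return [f"{path} is not valid JSON"]
    if not isinstance(token, dict):
        return [f"{path} should contain a JSON object"]
    # Assumption: the token file holds a "username" and a "key" entry.
    return [f"missing or empty field: {name}"
            for name in ("username", "key") if not token.get(name)]


print(check_kaggle_token() or "token file looks fine")
```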

### Download local copy of monkey dataset

In [None]:
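# Move the uploaded token to where the Kaggle CLI looks for it and restrict its permissions
! mkdir -p ~/.kaggle
! mv kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
# Download and extract the dataset
! kaggle datasets download -d slothkong/10-monkey-species
! mkdir -p monkey_dataset
! unzip 10-monkey-species.zip -d monkey_dataset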

In [None]:
import tensorflow as tf

# Disable eager execution so that K.function() with learning_phase works properly
tf.compat.v1.disable_eager_execution()

# Preparation

In [1]:
from network import Network
from plotting import plot_grid, visualize_probdist
from dataset import Dataset

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
IMAGE_SHAPE = (224, 224, 3)
BATCH_SIZE = 32

train_dir = 'monkey_dataset/training/training'
test_dir = 'monkey_dataset/validation/validation'

# The column names and string fields in the CSV file are padded with spaces, which we need to strip.
monkey_labels = pd.read_csv("monkey_dataset/monkey_labels.txt")
monkey_labels.columns = [c.strip() for c in monkey_labels.columns]
monkey_labels['Label'] = monkey_labels['Label'].str.strip()
monkey_labels['Common Name'] = monkey_labels['Common Name'].str.strip()

### Plot Training and Testing Data Count Distribution

In [None]:
plt.figure(figsize=(10,4))

# Draw both bar series on the same axis and label them so they can be told apart
x = range(len(monkey_labels))
plt.bar(x, monkey_labels['Train Images'], label='Training')
plt.bar(x, monkey_labels['Validation Images'], label='Testing')
plt.xticks(x, monkey_labels['Common Name'], rotation=90)
plt.xlabel('Monkey Types')
plt.ylabel('Count')
plt.title('Training and Testing Data Distribution')
plt.legend()
plt.show()

# Visualize Monkeys in the Dataset

In [None]:
# [Class Folder Name]: [Common Name]

In [None]:
plot_grid(rows=2, cols=5, figsize=(16,8),
          image_root_path=train_dir, labels=monkey_labels, data_shape=IMAGE_SHAPE[:2])

# Training

We train on a subset of the monkey classes and then see how the model behaves on classes it was never trained on (testing out-of-distribution uncertainty). This is not as extreme as testing on completely different-looking samples (such as objects or other animals).
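
One common way to turn a predicted class distribution into a single uncertainty score is its entropy: a confident prediction has entropy near zero, while a prediction spread evenly over K classes reaches the maximum of log(K). The sketch below is an assumption about how such a score could be computed, not necessarily the measure used later in this notebook; `predictive_entropy` and the example probability vectors are made up for illustration.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Hypothetical softmax outputs for a 5-class model (illustration only).
confident = [0.96, 0.01, 0.01, 0.01, 0.01]  # sharply peaked on one class
uncertain = [0.2, 0.2, 0.2, 0.2, 0.2]       # spread evenly over all classes

print(f"confident: {predictive_entropy(confident):.3f} nats")
print(f"uncertain: {predictive_entropy(uncertain):.3f} nats")
print(f"maximum for 5 classes: {math.log(5):.3f} nats")
```

If the model expresses its uncertainty well, images of unseen monkey classes would be expected to score closer to the maximum than images of the five training classes.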

In [7]:
# Select the monkey classes to train on
class_filter = ['n0', 'n1', 'n2', 'n3', 'n4']
# Map class index (0-4) -> common name, taken from the first five rows of the dataframe
label_mapping = monkey_labels['Common Name'][:5].to_dict()

In [8]:
train_dataset = Dataset(path=train_dir, target_size=IMAGE_SHAPE[:2], dataset_type=Dataset.TRAIN,
                        batch_size=BATCH_SIZE,
                        class_filter=class_filter,
                        label_mapping=label_mapping)

val_dataset = Dataset(path=train_dir, target_size=IMAGE_SHAPE[:2], dataset_type=Dataset.VAL,
                        batch_size=BATCH_SIZE,
                        class_filter=class_filter,
                        label_mapping=label_mapping)

test_dataset = Dataset(path=test_dir, target_size=IMAGE_SHAPE[:2], dataset_type=Dataset.TEST,
                       batch_size=BATCH_SIZE,
                       class_filter=class_filter,
                       label_mapping=label_mapping)

Found 443 images belonging to 5 classes.
Found 0 images belonging to 5 classes.
Found 137 images belonging to 5 classes.


In [9]:
net = Network(input_shape=IMAGE_SHAPE, dropout_rate=0.2, num_classes=len(class_filter))

In [10]:
net.train_model(train_dataset, val_dataset, epochs=2)

Epoch 1/2
14/13 - 31s - loss: 1.7165 - accuracy: 0.2122
Epoch 2/2
14/13 - 29s - loss: 1.5480 - accuracy: 0.3341


In [23]:
y_prob_preds = net.model.predict(test_dataset.datagenerator)
y_preds = np.argmax(y_prob_preds, axis=1)

label_test = test_dataset.datagenerator.classes

In [25]:
from sklearn.metrics import classification_report
print(classification_report(label_test, y_preds))

              precision    recall  f1-score   support

           0       0.79      0.73      0.76        26
           1       0.42      0.36      0.38        28
           2       0.71      0.37      0.49        27
           3       0.40      1.00      0.57        30
           4       0.00      0.00      0.00        26

    accuracy                           0.50       137
   macro avg       0.46      0.49      0.44       137
weighted avg       0.46      0.50      0.44       137



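
The per-class numbers in the report follow from simple counts: precision is the share of predictions for a class that are correct, and recall is the share of that class's samples that were found. The sketch below recomputes both in plain Python on made-up labels (`y_true_demo` and `y_pred_demo` are illustrative, not this notebook's data). It also shows why a class that is never predicted ends up with a precision of 0, which appears to be what happened to class 4 in the report.

```python
from collections import Counter


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one class; 0.0 when the denominator is zero."""
    pairs = Counter(zip(y_true, y_pred))
    true_pos = pairs[(label, label)]
    predicted = sum(n for (_, p), n in pairs.items() if p == label)
    actual = sum(n for (t, _), n in pairs.items() if t == label)
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Made-up labels: class 2 exists in the data but is never predicted.
y_true_demo = [0, 0, 1, 1, 2, 2]
y_pred_demo = [0, 1, 1, 1, 1, 0]

for label in sorted(set(y_true_demo)):
    p, r = precision_recall(y_true_demo, y_pred_demo, label)
    print(f"class {label}: precision={p:.2f} recall={r:.2f}")
```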
