# Intro
In this notebook I explore some data quality issues in the dataset, using the images that turned out to be the most challenging for the classifier. To let the classifier make excuses for its mistakes (and to use those excuses as clues to data quality), I use Guided GradCAM.

# Plan
1. [Preparing everything necessary](#Preparing-everything-necessary)
2. [Inspecting False Positives](#Inspecting-False-Positives)
  * [Confusion matrix](#Confusion-matrix)
  * [Visualizations](#Visualizations)
    * [Helper functions](#Helper-functions)
      * [Guided Backprop routines](#Guided-Backprop-routines)
      * [Functions checking distribution of labels for challenging images](#Functions-checking-distribution-of-labels-for-challenging-images)
      * [Functions for plotting mistakes alongside with correct ones for comparison](#Functions-for-plotting-mistakes-alongside-with-correct-ones-for-comparison)
    * [Fish](#Fish)
      * [Visual comparison of class representatives with challenging images](#Visual-comparison-of-class-representatives-with-challenging-images)
      * [Distribution of labels in Fish-false-positive images](#Distribution-of-labels-in-Fish-false-positive-images)
    * [Exploring dark images](#Exploring-dark-images)
    * [Flower](#Flower)
      * [Visual comparison of class representatives with challenging images](#Visual-comparison-of-class-representatives-with-challenging-images)
      * [Distribution of labels in Flower-false-positive images](#Distribution-of-labels-in-Flower-false-positive-images)
    * [Sugar](#Sugar)
      * [Visual comparison of class representatives with challenging images](#Visual-comparison-of-class-representatives-with-challenging-images)
      * [Distribution of labels in Sugar-false-positive images](#Distribution-of-labels-in-Sugar-false-positive-images)
    * [Gravel](#Gravel)
      * [Visual comparison of class representatives with challenging images](#Visual-comparison-of-class-representatives-with-challenging-images)
      * [Distribution of labels in Gravel-false-positive images](#Distribution-of-labels-in-Gravel-false-positive-images)
3. [Conclusion](#Conclusion)

# Preparing everything necessary

## Libraries

In [None]:
import os, glob
import random
from sklearn.model_selection import train_test_split
import cv2
import numpy as np
import pandas as pd
import multiprocessing
from copy import deepcopy
from sklearn.metrics import precision_recall_curve, auc, multilabel_confusion_matrix
import tensorflow.keras as keras
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.utils import Sequence
import matplotlib.pyplot as plt
from IPython.display import Image
from tqdm import tqdm_notebook as tqdm
import itertools
import tensorflow as tf
from tensorflow.python.framework import ops
from tensorflow.keras import backend as K
from numpy.random import seed
seed(10)
from tensorflow import set_random_seed
set_random_seed(10)
%matplotlib inline

In [None]:
test_imgs_folder = '../input/understanding_cloud_organization/test_images/'
train_imgs_folder = '../input/understanding_cloud_organization/train_images/'
num_cores = multiprocessing.cpu_count()
image_width, image_height = 224, 224

## Data Generators

In [None]:
train_df = pd.read_csv('../input/understanding_cloud_organization/train.csv')
train_df_orig = pd.read_csv('../input/understanding_cloud_organization/train.csv')

In [None]:
train_df = train_df[~train_df['EncodedPixels'].isnull()]
train_df['Image'] = train_df['Image_Label'].map(lambda x: x.split('_')[0])
train_df['Class'] = train_df['Image_Label'].map(lambda x: x.split('_')[1])
classes = train_df['Class'].unique()
train_df = train_df.groupby('Image')['Class'].agg(set).reset_index()
for class_name in classes:
    train_df[class_name] = train_df['Class'].map(lambda x: 1 if class_name in x else 0)

In [None]:
# dictionary for fast access to ohe vectors
img_2_ohe_vector = {img:vec for img, vec in zip(train_df['Image'], train_df.iloc[:, 2:].values)}

## Stratified split into train/val

In [None]:
train_imgs, val_imgs = train_test_split(train_df['Image'].values, 
                                        test_size=0.2, 
                                        stratify=train_df['Class'].map(lambda x: str(sorted(list(x)))), # sorting present classes in lexicographical order, just to be sure
                                        random_state=10)

## Generator class

In [None]:
class DataGenenerator(Sequence):
    def __init__(self, images_list=None, folder_imgs=train_imgs_folder, 
                 batch_size=32, shuffle=True, augmentation=None,
                 resized_height=224, resized_width=224, num_channels=3):
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.augmentation = augmentation
        if images_list is None:
            self.images_list = os.listdir(folder_imgs)
        else:
            self.images_list = deepcopy(images_list)
        self.folder_imgs = folder_imgs
        self.len = len(self.images_list) // self.batch_size
        self.resized_height = resized_height
        self.resized_width = resized_width
        self.num_channels = num_channels
        self.num_classes = 4
        self.is_test = not 'train' in folder_imgs
        if not shuffle and not self.is_test:
            self.labels = [img_2_ohe_vector[img] for img in self.images_list[:self.len*self.batch_size]]

    def __len__(self):
        return self.len
    
    def on_epoch_end(self):  # Keras calls this Sequence hook after every epoch
        if self.shuffle:
            random.shuffle(self.images_list)

    def __getitem__(self, idx):
        current_batch = self.images_list[idx * self.batch_size: (idx + 1) * self.batch_size]
        X = np.empty((self.batch_size, self.resized_height, self.resized_width, self.num_channels))
        y = np.empty((self.batch_size, self.num_classes))

        for i, image_name in enumerate(current_batch):
            path = os.path.join(self.folder_imgs, image_name)
            # cv2.resize expects dsize as (width, height)
            img = cv2.resize(cv2.imread(path), (self.resized_width, self.resized_height)).astype(np.float32)
            if self.augmentation is not None:
                augmented = self.augmentation(image=img)
                img = augmented['image']
            X[i, :, :, :] = img/255.0
            if not self.is_test:
                y[i, :] = img_2_ohe_vector[image_name]
        return X, y

    def get_labels(self):
        if self.shuffle:
            images_current = self.images_list[:self.len*self.batch_size]
            labels = [img_2_ohe_vector[img] for img in images_current]
        else:
            labels = self.labels
        return np.array(labels)

## Generator instances

In [None]:
data_generator_val = DataGenenerator(val_imgs, shuffle=False)

## Val predictions and ground truth

In [None]:
model = load_model('../input/clouds-classifier-files/classifier_epoch_45_val_pr_auc_0.8344173287108075.h5')
y_pred = model.predict_generator(data_generator_val, workers=num_cores)
y_true = data_generator_val.get_labels()
model_class_names =  ['Fish', 'Flower', 'Sugar', 'Gravel']

## Mask routines

In [None]:
# helper functions
# credits: https://www.kaggle.com/artgor/segmentation-in-pytorch-using-convenient-tools?scriptVersionId=20202006
def rle_decode(mask_rle: str = '', shape: tuple = (1400, 2100)):
    '''
    Decode rle encoded mask.
    
    :param mask_rle: run-length as string formatted (start length)
    :param shape: (height, width) of array to return 
    Returns numpy array, 1 - mask, 0 - background
    '''
    s = mask_rle.split()
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0:][::2], s[1:][::2])]
    starts -= 1
    ends = starts + lengths
    img = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):
        img[lo:hi] = 1
    return img.reshape(shape, order='F')

def make_mask(df, image_label, shape: tuple = (1400, 2100)):
    """
    Create mask based on df, image name and shape.
    """
    df = df.set_index('Image_Label')
    encoded_mask = df.loc[image_label, 'EncodedPixels']
    mask = np.zeros((shape[0], shape[1]), dtype=np.float32)
    if not pd.isnull(encoded_mask):
        mask = rle_decode(encoded_mask, shape)
            
    # cv2.resize expects dsize as (width, height)
    return cv2.resize(mask, (image_width, image_height))

val_masks_np = np.empty((len(model_class_names), len(val_imgs), image_height, image_width))
for class_i, class_name in enumerate(tqdm(model_class_names)):
    for img_i, img_name in enumerate(val_imgs):
        mask = make_mask(train_df_orig, img_name + '_' + class_name)
        val_masks_np[class_i][img_i] = mask
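
To make the mask format concrete, here is a minimal sketch of the same decoding on a tiny 3x4 grid. The helper name `rle_decode_small` and the toy RLE string are made up for illustration; the 1-indexed starts and the column-major (`order='F'`) layout mirror what `rle_decode` does above.

```python
import numpy as np

def rle_decode_small(mask_rle, shape):
    """Decode a 1-indexed, column-major RLE string into a binary mask."""
    tokens = np.asarray(mask_rle.split(), dtype=int)
    # even positions are starts (converted to 0-indexed), odd positions are run lengths
    starts, lengths = tokens[0::2] - 1, tokens[1::2]
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    # order='F' fills the mask column by column
    return flat.reshape(shape, order='F')

# "2 3 8 2": a run of 3 pixels starting at pixel 2, then a run of 2 starting at pixel 8
toy_mask = rle_decode_small('2 3 8 2', shape=(3, 4))
print(toy_mask)
```

Because the runs go down the columns, the first run wraps from the bottom of column 0 into the top of column 1.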

# Inspecting False Positives

## Confusion matrix

To better understand which classes are difficult for the classifier, let's create a confusion matrix. 
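
As a quick reminder of what `multilabel_confusion_matrix` returns, here is a small sketch on made-up labels for two classes (the arrays `toy_true` and `toy_pred` are illustrative only). Each class gets its own 2x2 "class vs. the rest" matrix laid out as `[[TN, FP], [FN, TP]]`.

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

# rows are images, columns are classes (multi-label, so rows may contain several 1s)
toy_true = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
toy_pred = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])

toy_matrices = multilabel_confusion_matrix(toy_true, toy_pred)
for class_idx, matrix in enumerate(toy_matrices):
    tn, fp, fn, tp = matrix.ravel()
    print(f'class {class_idx}: TN={tn} FP={fp} FN={fn} TP={tp}')
```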

In [None]:
# inspired by similar plots in PLAsTiCC Astronomical Classification
def plot_confusion_matrix(y_true, y_pred,
                          cmap=plt.cm.Blues,
                          normalize=False
                         ):
    confusion_matrices = multilabel_confusion_matrix(y_true, y_pred)
    for class_name, conf_matrix in zip(model_class_names, confusion_matrices):
        plt.figure()
        if normalize:
            conf_matrix = np.divide(conf_matrix, conf_matrix.sum(axis=1).repeat(2).reshape(2, 2))
        
        plt.imshow(conf_matrix, cmap=cmap)
        plt.colorbar()
        tick_marks = np.arange(-.5, 1.51, 0.5)
        plt.xticks(tick_marks, ['', 'the rest', '', class_name])
        plt.yticks(tick_marks, ['', 'the rest', '', class_name])
        plt.title(f'Confusion matrix: {class_name} vs. All')

        fmt = '.2f'
        thresh = conf_matrix.max() / 2.
        for i, j in itertools.product(range(conf_matrix.shape[0]), range(conf_matrix.shape[1])):
            plt.text(j, i, format(conf_matrix[i, j], fmt),
                     horizontalalignment="center",
                     color="white" if conf_matrix[i, j] > thresh else "black")

        plt.ylabel('True label')
        plt.xlabel('Predicted label')
        plt.tight_layout()

In [None]:
def get_probability_for_precision_threshold(precision, thresholds, precision_threshold):
    # concise, although it unnecessarily passes through all the values
    probability_threshold = [thres for prec, thres in zip(precision, thresholds) if prec >= precision_threshold][0]
    return probability_threshold

def get_probability_for_recall_threshold(recall, thresholds, recall_threshold):
    i = len(thresholds) - 1
    probability_threshold_recall = None
    while probability_threshold_recall is None:
        next_threshold = thresholds[i]
        next_recall = recall[i]
        if next_recall >= recall_threshold:
            probability_threshold_recall = next_threshold 
        i -= 1
    return probability_threshold_recall

class_precision_2_probability = [dict() for _ in range(len(model_class_names))]
class_recall_2_probability = [dict() for _ in range(len(model_class_names))]
for class_i in tqdm(range(len(model_class_names))):
    # the PR curve has to be recomputed for every class
    precision, recall, thresholds = precision_recall_curve(y_true[:, class_i], y_pred[:, class_i])
    for thres in np.arange(10, 100):
        class_prob_prec = get_probability_for_precision_threshold(precision, thresholds, thres/100)
        class_precision_2_probability[class_i][thres] = class_prob_prec
        class_prob_recall = get_probability_for_recall_threshold(recall, thresholds, thres/100)
        class_recall_2_probability[class_i][thres] = class_prob_recall

In [None]:
y_pred_prec_90 = y_pred.copy()
for class_i in range(len(model_class_names)):
    y_pred_prec_90[:, class_i] = np.where(y_pred_prec_90[:, class_i] >= class_precision_2_probability[class_i][90], 1.0, 0.0)
plot_confusion_matrix(y_true, y_pred_prec_90)

For Flower and Sugar the classifier retrieves more than half of all instances while maintaining 90% precision. This is not the case for Gravel and Fish; in other words, Gravel and Fish are more easily confused with other classes.
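
The "recall at 90% precision" reading can be sketched on toy scores as follows. The labels and scores are invented for illustration; the idea is to pick the lowest probability threshold whose precision reaches the target and then read off the recall at that threshold.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

toy_labels = np.array([0, 0, 1, 0, 1, 1, 1, 1])
toy_scores = np.array([0.1, 0.3, 0.35, 0.6, 0.65, 0.7, 0.8, 0.9])

precision, recall, thresholds = precision_recall_curve(toy_labels, toy_scores)

target_precision = 0.9
# precision has one more entry than thresholds, so drop the last one to align them;
# argmax on a boolean array returns the first True (and 0 if there is none at all)
idx = int(np.argmax(precision[:-1] >= target_precision))
chosen_threshold = thresholds[idx]
recall_at_threshold = recall[idx]
print(chosen_threshold, recall_at_threshold)
```

A class for which this recall stays low at the chosen threshold is one the classifier cannot separate cleanly from the rest.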

## Visualizations

### Helper functions

#### Guided Backprop routines

False positives might be due to wrong labels or to bias in the classifier. To check for potential classifier bias, I use Guided GradCAM, which is simply GradCAM multiplied by the result of guided backprop (i.e. it focuses only on the pixels that stimulate the net).
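
Before the actual Keras routines, here is a NumPy-only sketch of that multiplication. Everything in it is a stand-in: random arrays replace the real activations, gradients and guided backprop output, the 7x7x16 feature-map shape is assumed, and nearest-neighbour upsampling via `np.kron` replaces the `cv2.resize` used in the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.random((7, 7, 16))        # stand-in for conv layer activations
gradients = rng.normal(size=(7, 7, 16))      # stand-in for d(class score)/d(activations)
guided_backprop = rng.random((224, 224))     # stand-in for the guided backprop saliency

# GradCAM: channel weights are the spatially averaged gradients
weights = gradients.mean(axis=(0, 1))
# weighted sum over channels, keeping only positive evidence
cam = np.maximum(feature_maps @ weights, 0)
cam = cam / (cam.max() + 1e-10)

# upsample 7x7 -> 224x224 by repeating every cell 32x32 times
cam_upsampled = np.kron(cam, np.ones((32, 32)))

# Guided GradCAM: coarse class-specific map times fine-grained pixel saliency
guided_grad_cam = cam_upsampled * guided_backprop
print(guided_grad_cam.shape)
```

The coarse map decides where the class evidence is, and the guided backprop map decides which pixels inside that region matter.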

In [None]:
layer_name='conv5_block16_concat'
def build_guided_model():
    """Function returning modified model.
    
    Changes gradient function for all ReLu activations
    according to Guided Backpropagation.
    """
    if "GuidedBackProp" not in ops._gradient_registry._registry:
        @ops.RegisterGradient("GuidedBackProp")
        def _GuidedBackProp(op, grad):
            dtype = op.inputs[0].dtype
            return grad * tf.cast(grad > 0., dtype) * \
                   tf.cast(op.inputs[0] > 0., dtype)

    g = tf.get_default_graph()
    with g.gradient_override_map({'Relu': 'GuidedBackProp'}):
        new_model = load_model('../input/clouds-classifier-files/classifier_epoch_45_val_pr_auc_0.8344173287108075.h5')
    return new_model

guided_model = build_guided_model()

def get_guided_backprop_fn(input_model, layer_name):
    """Guided Backpropagation method for visualizing input saliency."""
    input_imgs = input_model.input
    layer_output = input_model.get_layer(layer_name).output
    grads = K.gradients(layer_output, input_imgs)[0]
    backprop_fn = K.function([input_imgs, K.learning_phase()], [grads])
    def guided_backprop(images):
        gb = backprop_fn([images, 0])[0]
        # Same normalization as in:
        # https://github.com/fchollet/keras/blob/master/examples/conv_filter_visualization.py
        gb = np.mean(gb, axis=3)
        img_means = np.mean(gb, axis=(1, 2))
        img_means = np.repeat(img_means, gb.shape[1]*gb.shape[2]).reshape(gb.shape)
        img_std = np.std(gb, axis=(1, 2))
        img_std = np.repeat(img_std, gb.shape[1]*gb.shape[2]).reshape(gb.shape) + 1e-5
        gb = np.divide((gb - img_means), img_std)
        # ensure std is 0.1
        gb *= 0.1
        gb += 0.5
        # clip to [0, 1]
        gb = np.clip(gb, 0, 1)
        return gb
        
    return guided_backprop

guided_backprop_fn = get_guided_backprop_fn(guided_model, layer_name)

In [None]:
# gradcam functions source/inspiration: https://github.com/eclique/keras-gradcam/blob/master/grad_cam.py

# for per image grad_cam
gradient_functions = []
for class_i in range(len(model_class_names)):
    y_c = model.output[0, class_i]
    conv_output = model.get_layer(layer_name).output
    grads = K.gradients(y_c, conv_output)[0]
    gradient_functions.append(K.function([model.input], [conv_output, grads]))

# for batch gradcam
gradient_fns = []
for class_i in range(len(model_class_names)):
    class_predictions = tf.slice(model.output, [0, class_i], [-1, 1])
    conv_layer_output = model.get_layer(layer_name).output
    grads = K.gradients(class_predictions, conv_layer_output)[0]
    gradient_fns.append(K.function([model.input, K.learning_phase()], [conv_layer_output, grads]))
    
def grad_cam(image, class_i):
    """GradCAM method for visualizing input saliency."""
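    output, grads_val = gradient_functions[class_i]([image])
    output, grads_val = output[0, :], grads_val[0, :, :, :]
    weights = np.mean(grads_val, axis=(0, 1))
    cam = np.dot(output, weights)
    # Process CAM
    # interpolation must be passed by keyword: the third positional argument of cv2.resize is dst
    cam = cv2.resize(cam, (image_width, image_height), interpolation=cv2.INTER_LINEAR)
    cam = np.maximum(cam, 0)
    cam = cam / (cam.max() + 1e-10)  # epsilon guards against an all-zero map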
    return cam


def grad_cam_batch(images, class_i):
    """GradCAM method for visualizing input saliency.
    Same as grad_cam but processes multiple images in one run."""
    conv_output, grads_val = gradient_fns[class_i]([images, 0])    
    weights = np.mean(grads_val, axis=(1, 2))
    cams = np.einsum('ijkl,il->ijk', conv_output, weights)
    
    # Process CAMs
    new_cams = np.empty((images.shape[0], image_height, image_width))
    for i in range(images.shape[0]):
        cam_i = cams[i] - cams[i].mean()
        cam_i = (cam_i + 1e-10) / (np.linalg.norm(cam_i, 2) + 1e-10)
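        # interpolation must be passed by keyword: the third positional argument of cv2.resize is dst
        new_cams[i] = cv2.resize(cam_i, (image_width, image_height), interpolation=cv2.INTER_LINEAR)
        new_cams[i] = np.maximum(new_cams[i], 0)
        new_cams[i] = new_cams[i] / (new_cams[i].max() + 1e-10)  # epsilon guards against an all-zero map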
    return new_cams

In [None]:
def guided_grad_cam_batch(imgs_batch, class_i):
    guided_backprop = guided_backprop_fn(imgs_batch)
    gradcams_batch = grad_cam_batch(imgs_batch, class_i)
    return gradcams_batch*guided_backprop

In [None]:
def generate_guided_gradcam_masks(model=model, model_class_names=model_class_names):
    print('Reading images..')
    imgs_np = np.empty((len(val_imgs), image_height, image_width, 3))
    for img_i, img_name in enumerate(tqdm(val_imgs)):
        img_path = os.path.join(train_imgs_folder, img_name)
        imgs_np[img_i, :, :, :] = cv2.resize(cv2.imread(img_path), (image_height, image_width)).astype(np.float32)/255.0
    
    gradcam_masks_np = np.empty((len(model_class_names),) + imgs_np.shape[:3], np.float32)
    batch_size = 32
    # ceil division avoids an empty trailing batch when the image count is a multiple of batch_size
    num_batches = int(np.ceil(imgs_np.shape[0] / batch_size))
    print('Generating Guided GradCAMs')
    for batch_i in tqdm(range(num_batches)):
        imgs_batch = imgs_np[batch_i*batch_size: (batch_i + 1)*batch_size]
        for class_i, class_name in enumerate(model_class_names):
            guided_gradcams_batch = guided_grad_cam_batch(imgs_batch, class_i)
            gradcam_masks_np[class_i][batch_i*batch_size: (batch_i + 1)*batch_size] = guided_gradcams_batch
    return gradcam_masks_np

min_size = 2000
guided_gradcam_masks_np = generate_guided_gradcam_masks()

#### Functions checking distribution of labels for challenging images

In [None]:
def barplot_per_classes(imgs_indices, class_i, img_list=val_imgs, fp=True):
    labels_np = np.array([img_2_ohe_vector[img_list[img_idx]] for img_idx in imgs_indices])
    class_counts = labels_np.sum(axis=0)
    plt.figure(figsize=(10,7))
    plt.bar(model_class_names, class_counts)
    plt.xlabel('Class', fontsize=12)
    plt.ylabel('Count of images', fontsize=12)
    plt.title(f"Distribution of classes for {model_class_names[class_i]} false {'positives' if fp else 'negatives'}", fontsize=17)

#### Functions for plotting mistakes alongside with correct ones for comparison

In [None]:
def get_img(img_idx, imgs_list=val_imgs, folder=train_imgs_folder):
    img_name = imgs_list[img_idx]
    img_path = os.path.join(folder, img_name)
    return cv2.resize(cv2.imread(img_path), (image_height, image_width)).astype(np.float32)/255., img_name

def visualize_and_compare_to_base(img_indices_base, img_indices_wrong, masks_val_class, ggrad_cam_class, class_name=None, fp_classes=[]):
    assert(len(img_indices_base) == 8)
    fig, axes = plt.subplots(4, 4, figsize=(15,15))
    for subplot_i, img_idx_good in enumerate(img_indices_base):
        ax = axes[subplot_i//4, subplot_i%4]
        ax.axis('off')
        img, img_name = get_img(img_idx_good)
        ax.imshow(img)
        ax.imshow(masks_val_class[img_idx_good], alpha=0.2)
        ax.set_title(f'{img_name} ({class_name})')
        
    for subplot_ii, img_idx_bad in enumerate(img_indices_wrong):
        fp_ax = axes[(subplot_i + subplot_ii + 1)//4, (subplot_i + subplot_ii + 1)%4]
        img, img_name = get_img(img_idx_bad)
        fp_ax.set_title(f"{img_name} ({', '.join(fp_classes[subplot_ii])})")
        fp_ax.imshow(img)
        fp_ax.imshow(ggrad_cam_class[img_idx_bad], cmap='jet', alpha=0.2)
        plt.setp(fp_ax.spines.values(), color='red') 
    plt.suptitle('Two rows of reference class images followed by challenging images', fontsize=18)
        
def get_fp_tp_fn_indices(class_i, threshold_probability):
    y_pred_prec = y_pred.copy()
    y_pred_prec[:, class_i] = np.where(y_pred_prec[:, class_i] >= threshold_probability,
                                       1.0, 0.0)
    fp_idx = [idx for idx, (label, pred) in enumerate(zip(y_true[:, class_i], y_pred_prec[:, class_i])) if label == 0 and pred == 1]
    tp_idx = [idx for idx, (label, pred) in enumerate(zip(y_true[:, class_i], y_pred_prec[:, class_i])) if label == 1 and pred == 1]
    fn_idx = [idx for idx, (label, pred) in enumerate(zip(y_true[:, class_i], y_pred_prec[:, class_i])) if label == 1 and pred == 0]
    return tp_idx, fp_idx, fn_idx

def visualize_and_compare_fp(class_i, precision_level=90):
    threshold_probability = class_precision_2_probability[class_i][precision_level]
    tp_idx, fp_idx, _ = get_fp_tp_fn_indices(class_i, threshold_probability)
    # random.sample raises ValueError if asked for more items than available
    indices_fp = random.sample(fp_idx, min(8, len(fp_idx)))
    fp_true_classes = [np.array(model_class_names)[y_true[idx_fp] == 1] for idx_fp in indices_fp]
    visualize_and_compare_to_base(random.sample(tp_idx, 8), indices_fp,
                                  val_masks_np[class_i], guided_gradcam_masks_np[class_i],
                                  model_class_names[class_i], fp_true_classes)

### Fish

#### Visual comparison of class representatives with challenging images

In [None]:
class_i = model_class_names.index('Fish')
visualize_and_compare_fp(class_i=class_i, precision_level=90)

Two interesting observations:
   1. it seems that some Fish-like patterns are not labeled,
   2. Fish might have a higher ratio of black images (based on 5264e81.jpg).

#### Distribution of labels in Fish-false-positive images

Let's now check whether some label prevails in the images where the classifier "saw" Fish.

In [None]:
threshold_probability = class_precision_2_probability[class_i][90]
tp_idx, fp_idx, _ = get_fp_tp_fn_indices(class_i, threshold_probability)
barplot_per_classes(fp_idx, class_i)

So Fish probably gets lost in Gravel more easily than in the other classes. There are too few FP images for a reliable conclusion, though.

### Exploring dark images

In [None]:
# let's define a dark image as one with at least a third (~33%) of its pixels being black
dark_imgs_count = [0] * len(model_class_names)
img_threshold_area = image_width*image_height/3

def count_dark_ones(imgs_list):
    global dark_imgs_count
    for img_idx, img_name in enumerate(tqdm(imgs_list)):
        img, img_name = get_img(img_idx, imgs_list=imgs_list)
        if np.sum(img[:, :, 0] == 0) >= img_threshold_area:
            labels = img_2_ohe_vector[img_name]
            for class_i in labels.nonzero()[0]:
                dark_imgs_count[class_i] += 1

print('Processing train..')
count_dark_ones(train_imgs)
print('Processing val..')
count_dark_ones(val_imgs)

In [None]:
fig, ax = plt.subplots(figsize=(7, 7))
# credits: https://stackoverflow.com/questions/6170246/how-do-i-use-matplotlib-autopct
def make_autopct(values):
    def my_autopct(pct):
        total = sum(values)
        val = int(round(pct*total/100.0))
        return '{p:.0f}%  ({v:d})'.format(p=pct,v=val)
    return my_autopct
ax.pie(dark_imgs_count, labels=model_class_names, autopct=make_autopct(dark_imgs_count), shadow=True, startangle=90)
ax.axis('equal')
ax.set_title('Dark Images')

There is indeed a bias in the training data, as half of the dark images contain Fish. However, there are only 12 such dark images in the initial train dataset. Let's check how many dark images there are in the test set.

In [None]:
dark_imgs_test_count = 0
print('Processing test..')

# list the folder once: calling os.listdir on every iteration is slow and its order is not guaranteed
test_imgs = os.listdir(test_imgs_folder)
for img_idx, img_name in enumerate(tqdm(test_imgs)):
    img, img_name = get_img(img_idx, imgs_list=test_imgs, folder=test_imgs_folder)
    if np.sum(img[:, :, 0] == 0) >= img_threshold_area:
        dark_imgs_test_count += 1

In [None]:
print(f'There are {dark_imgs_test_count} dark images in test.')

As there are just 7 dark images in test, I'll ignore the issue for the time being.
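
For reference, the dark-image criterion used in this section can be sketched on synthetic arrays. The helper `is_dark` and the three test images are made up for illustration; like the cells above, it only looks at the first channel and counts pixels that are exactly zero.

```python
import numpy as np

def is_dark(img, min_black_fraction=1/3):
    """True if at least `min_black_fraction` of the first-channel pixels are exactly 0."""
    black_pixels = np.sum(img[:, :, 0] == 0)
    return black_pixels >= img.shape[0] * img.shape[1] * min_black_fraction

bright = np.full((224, 224, 3), 0.5, dtype=np.float32)

half_black = bright.copy()
half_black[:112] = 0.0       # top half of the image is black

quarter_black = bright.copy()
quarter_black[:56] = 0.0     # only the top quarter is black

print(is_dark(bright), is_dark(half_black), is_dark(quarter_black))
```

Note that the exact-zero test only catches fully black regions, not merely dim ones.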

### Flower

#### Visual comparison of class representatives with challenging images

In [None]:
class_i = model_class_names.index('Flower')
visualize_and_compare_fp(class_i=class_i, precision_level=90)

I'd say that some of the images not labeled as Flower do show a Flower pattern.

#### Distribution of labels in Flower-false-positive images

Now, let's check the distribution of classes in Flower FPs.

In [None]:
threshold_probability = class_precision_2_probability[class_i][90]
tp_idx, fp_idx, _ = get_fp_tp_fn_indices(class_i, threshold_probability)
barplot_per_classes(fp_idx, class_i)

So it seems that Flower is usually just missed, in contrast to Fish, which seemed to be confused with Gravel in addition to the misses.

### Sugar

#### Visual comparison of class representatives with challenging images

In [None]:
class_i = model_class_names.index('Sugar')
visualize_and_compare_fp(class_i=class_i, precision_level=90)

#### Distribution of labels in Sugar-false-positive images

In [None]:
threshold_probability = class_precision_2_probability[class_i][90]
tp_idx, fp_idx, _ = get_fp_tp_fn_indices(class_i, threshold_probability)
barplot_per_classes(fp_idx, class_i)

Sugar seems to be sometimes confused with Gravel and Fish. If our theory about subjective confusion between class pairs holds, then the distribution of Gravel false positives should contain more Fish and Sugar.
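
One way to look at such pairwise confusion for all classes at once is a small "FP class vs. true labels" matrix. The sketch below uses invented labels and predictions, and the helper `fp_label_matrix` is hypothetical (it is not part of this notebook); a symmetric pattern, with large counts at both `[a, b]` and `[b, a]`, would support the idea that two classes are mutually confused.

```python
import numpy as np

class_names = ['Fish', 'Flower', 'Sugar', 'Gravel']

def fp_label_matrix(y_true, y_pred_binary):
    """Row c holds the counts of true labels among images that are false positives for class c."""
    n_classes = y_true.shape[1]
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for c in range(n_classes):
        fp_mask = (y_true[:, c] == 0) & (y_pred_binary[:, c] == 1)
        matrix[c] = y_true[fp_mask].sum(axis=0)
    return matrix

# toy multi-label ground truth and thresholded predictions (5 images, 4 classes)
toy_true = np.array([[0, 0, 0, 1],
                     [0, 0, 1, 1],
                     [1, 0, 0, 0],
                     [1, 0, 1, 0],
                     [0, 1, 0, 0]])
toy_pred = np.array([[1, 0, 0, 1],
                     [1, 0, 1, 1],
                     [1, 0, 0, 1],
                     [1, 0, 1, 1],
                     [0, 1, 0, 0]])

toy_fp_matrix = fp_label_matrix(toy_true, toy_pred)
for name, row in zip(class_names, toy_fp_matrix):
    print(f'{name:>6} FPs carry labels: {dict(zip(class_names, row.tolist()))}')
```

In this toy data Fish false positives are labeled Gravel and Gravel false positives are labeled Fish, which is the symmetric pattern the theory predicts.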

### Gravel

#### Visual comparison of class representatives with challenging images

In [None]:
class_i = model_class_names.index('Gravel')
visualize_and_compare_fp(class_i=class_i, precision_level=90)

#### Distribution of labels in Gravel-false-positive images

In [None]:
threshold_probability = class_precision_2_probability[class_i][90]
tp_idx, fp_idx, _ = get_fp_tp_fn_indices(class_i, threshold_probability)
barplot_per_classes(fp_idx, class_i)

Indeed! Gravel seems to be more easily confused with Fish and also with Sugar. The number of inspected Gravel FP images is rather small. Yet for Fish FPs we also saw a larger count of Gravel-labeled images.

# Conclusion
In later stages it might be worth investing time in cleaning the data: pseudo-labeling the data and/or manually checking some labels, especially for the Gravel class, since it seems hard to distinguish Sugar from Gravel, and likely Fish from Gravel.

Black images don't seem to be of significant concern, as there are not too many of them.