<a href="https://colab.research.google.com/github/aubricot/computer_vision_with_eol_images/blob/master/classification_for_image_tagging/rating/inspect_train_results.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Determine confidence threshold for Image Rating Classification Models 
---
*Last Updated 17 December 2022*   
Choose which trained model and confidence threshold values to use for classifying EOL image ratings. Pick threshold values that maximize coverage and minimize error.

First, choose the best models trained in [rating_train.ipynb](https://colab.research.google.com/github/aubricot/computer_vision_with_eol_images/blob/master/classification_for_image_tagging/rating/rating_train.ipynb). Then, run this notebook. 

Run 500 images per class (image ratings 1-5) through the best models chosen in rating_train.ipynb to validate model performance. Plot histograms of true and false predictions per class at binned confidence intervals to find the best performance by class and confidence threshold. (This is helpful because not all models learn every class equally well.)
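As a rough illustration of the binning step described above, the sketch below counts true and false predictions in equal-width confidence bins using plain Python. The sample data, the function name `bin_by_confidence`, and the 0.5-1.0 confidence range are assumptions made for this example; the notebook itself draws these counts with matplotlib histograms.

```python
# Hypothetical (confidence, is_correct) pairs for one class; not real model output.
predictions = [(0.95, True), (0.91, True), (0.62, False), (0.58, True),
               (0.55, False), (0.99, True), (0.71, True), (0.52, False)]

def bin_by_confidence(predictions, n_bins=5, low=0.5, high=1.0):
    """Count true and false predictions in equal-width confidence bins."""
    width = (high - low) / n_bins
    true_counts = [0] * n_bins
    false_counts = [0] * n_bins
    for conf, is_correct in predictions:
        # Clamp the index so values at or beyond the edges land in the first/last bin
        idx = min(max(int((conf - low) / width), 0), n_bins - 1)
        if is_correct:
            true_counts[idx] += 1
        else:
            false_counts[idx] += 1
    return true_counts, false_counts

true_counts, false_counts = bin_by_confidence(predictions)
for idx, (n_true, n_false) in enumerate(zip(true_counts, false_counts)):
    print("bin {}: {} true, {} false".format(idx, n_true, n_false))
```

Bins where false predictions outnumber true ones are candidates for exclusion when a threshold is chosen.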

***Models were trained in Python 2 and TF 1 in December 2020: MobileNet SSD v2 (Run 18, trained on 'good' and 'bad' classes) was trained for 12 hours to 10 epochs with Batch Size=16, Lr=0.001, Dropout=0.2. Inception v3 was trained for 12 hours to 10 epochs with Batch Size=32, Lr=0.001, Dropout=0 (Run 20, trained on 'good' and 'bad' classes). Inception v3 was trained for 4 hours to 15 epochs with Batch Size=64, Lr=0.1, Dropout=0 (Run 6, trained on numerical rating classes 1-5).***

Notes:   
* Run code blocks by pressing play button in brackets on left
* Before you start: change the runtime to "GPU" with "High RAM"
* Change parameters using form fields on right (find details at corresponding lines of code by searching '#@param')

## Installs & Imports
---

In [None]:
#@title Choose where to save results & set up directory structure
# Use dropdown menu on right
save = "in Colab runtime (files deleted after each session)" #@param ["in my Google Drive", "in Colab runtime (files deleted after each session)"]
print("Saving results ", save)

# Mount google drive to export file(s)
if 'Google Drive' in save:
    from google.colab import drive
    drive.mount('/content/drive', force_remount=True)

# Type in the path to your working directory in form field to right
import os
basewd = "/content/drive/MyDrive/train/tf2" #@param ["/content/drive/MyDrive/train/tf2"] {allow-input: true}
if not os.path.exists(basewd):
    os.makedirs(basewd)

# Folder where inspect results outputs will be saved
results_folder = "inspect_resul" #@param ["inspect_resul"] {allow-input: true}
cwd = basewd + '/' + results_folder
if not os.path.exists(cwd):
    os.makedirs(cwd)
print("\nWorking directory set to: \n", cwd)

# Enter image classes of interest in form field
filters = ["1", "2", "3", "4", "5"] #@param ["[\"1\", \"2\", \"3\", \"4\", \"5\"]"] {type:"raw", allow-input: true}

# Folder where image metadata was saved in rating_preprocessing.ipynb
data_folder = "pre-processing/image_data" #@param ["pre-processing/image_data"] {allow-input: true}
data_wd = basewd + '/' + data_folder
if not os.path.exists(data_wd):
    !pip3 install --upgrade gdown
    os.makedirs(data_wd)
    print("\nDownload image bundles for rating classes 1-5...\n")
    %cd $data_wd
    file_ids = ['1XbINEyYbCkVlnsOlniobpvBPnYfBt5lT', '1ovMMh6U4biqmYzLt3bguonQa9GMfA901', \
                '1-OYbexPMJlPKTLCmj_zHW9LCQ1wjywqU', '1-OY_Bxoi7OeKM8VrrQHyqPt46TrhQ6kx', \
                '1-NJVPsKEuPCHFdcl-mDXsYOWZNDIqOdi', '1-KNgpvgBvf8mjeIFuHGxd7UIWMH-ssmB', \
                '1CVWiGCGGdPoWa4jsqZz6H_0zb_KKa2Qq']
    for file_id in file_ids:
        !gdown $file_id
print("\nImage metadata directory set to: \n", data_wd)

# Folder where saved models were stored in rating_train.ipynb
models_folder = "saved_models" #@param ["saved_models"] {allow-input: true}
models_wd = basewd + '/' + models_folder
if not os.path.exists(models_wd):
    os.makedirs(models_wd)
    print("\nDownloading pre-trained EOL models for training attempts 06, 18, 20...\n")
    %cd $models_wd
    file_ids = ['1v-Qq2699o7SV4DH3s0Gr3_m1uPzeh3bA', '1L-WqfuoQtPgqJzU8tDKjgsZC98M-68w9', '1-7gwnHoqTseAuBxsafow9bkRiGeuy7yB']
    outfnames = ['06.zip', '18.zip', '20.zip']
    for idx, file_id in enumerate(file_ids):
        file_download_link = "https://docs.google.com/uc?export=download&id=" + file_id
        outfname = outfnames[idx]
        outfolder = outfnames[idx].split('.')[0]
        !mkdir $outfolder
        !gdown $file_id
        !unzip $outfname -d .
        outfpath = 'content/drive/MyDrive/summer20/classification/rating/saved_models/' + outfolder + '/*'
        !mv -v $outfpath $outfolder
        !rm -r content #is this safe if connected to google drive?
        !rm -r $outfname

print("\nSaved models directory set to: \n", models_wd)

In [None]:
# For working with data
import itertools
import os
import numpy as np
import pandas as pd
# Suppress pandas setting with copy warning
pd.options.mode.chained_assignment = None  # default='warn'

# For downloading and displaying images
import matplotlib.pyplot as plt
from PIL import Image, ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
%matplotlib inline

# For measuring inference time
import time

# For image classification and training
import tensorflow as tf

# Define functions

# To read in EOL formatted data files
def read_datafile(fpath, sep="\t", header=0, disp_head=True, lineterminator='\n', encoding='latin1', dtype=None):
    try:
        df = pd.read_csv(fpath, sep=sep, header=header, lineterminator=lineterminator, encoding=encoding, dtype=dtype)
        if disp_head:
            print("Data header: \n", df.head())
    except FileNotFoundError as e:
        raise Exception("File not found: Enter the path to your file in form field and re-run").with_traceback(e.__traceback__)
    
    return df

# List filenames of all images used for training/testing models
def list_train_images(imclasses):
    # Get image class bundle filenames
    all_filenames = [imclass + '_download_7k.txt' for imclass in imclasses] 
    print('Image class bundles used for training/testing models: \n', all_filenames)
    # Make combined list all image ratings from bundles
    used_images = []
    for fn in all_filenames: 
        df = pd.read_csv(fn, index_col=None, header=1, sep='\n')
        df.columns = ['link']
        used_images.append(df)
    used_images = pd.concat(used_images, axis=0, ignore_index=True)
    print('\nNo. image ratings used for training/testing: {}\n{}'.format(len(used_images),
                                                                       used_images.head()))

    return used_images

# Remove all images used for training/testing from EOL bundle
def remove_used_images(df, used_images, dataset):
    print("\nTotal image ratings available for {}: {}".format(dataset, len(df)))
    if 'object_url' in df:
        df.rename(columns={'object_url':'obj_url'}, inplace=True)
    condition = df['obj_url'].isin(used_images['link'])
    df.drop(df[condition].index, inplace = True)
    unused_images = df.copy()
    print("\nTotal un-used image ratings available for {}: {}".format(dataset, len(unused_images)))

    return unused_images

# Make master unused image dataset for ratings and exemplars
def make_master_unused_df(ratings, exemplars):
    # Reformat image ratings to match exemplars
    df1 = ratings[["obj_with_overall_rating", "obj_url", "overall_rating", "ancestry"]].copy()
    df1.rename(columns={"obj_with_overall_rating": "obj_id"}, inplace=True)
    # Reformat image exemplars to match ratings
    df2 = exemplars[["target_id", "obj_url", "ancestry"]].copy()
    df2.rename(columns={"object_url":"obj_url", "target_id": "obj_id"}, inplace=True)
    df2["overall_rating"] = 5
    # Merge ratings and exemplars
    unused_images = pd.concat([df1, df2])
    print("\nMaster un-used image ratings for validation (ratings + exemplars): {}".format(len(unused_images)))

    return unused_images

## Build validation dataset (Only run once)
---
Build a dataset of image ratings for images not previously seen by the models.  
This removes any image ratings in the EOL user-generated rating and exemplar files that were already used in the 7k training/testing datasets.
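The filtering idea can be sketched without pandas as a membership check against the set of image URLs already used for training/testing. The URLs, the record layout, and the helper name `keep_unseen` are hypothetical and only for illustration; the notebook applies the same logic to tab-separated EOL files with `DataFrame.isin`.

```python
# Hypothetical example data; the real inputs are tab-separated EOL files.
used_links = {
    "https://example.org/images/1.jpg",
    "https://example.org/images/3.jpg",
}
ratings = [
    {"obj_url": "https://example.org/images/1.jpg", "overall_rating": 5},
    {"obj_url": "https://example.org/images/2.jpg", "overall_rating": 2},
    {"obj_url": "https://example.org/images/3.jpg", "overall_rating": 4},
    {"obj_url": "https://example.org/images/4.jpg", "overall_rating": 1},
]

def keep_unseen(records, used_links):
    """Return only the records whose URL was not used for training/testing."""
    return [rec for rec in records if rec["obj_url"] not in used_links]

unseen = keep_unseen(ratings, used_links)
print("Available: {}, unseen: {}".format(len(ratings), len(unseen)))
```

Keeping the used URLs in a set makes each membership check constant time, which matters when the rating files are large.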

In [None]:
# Find images with ratings that were not used for training or testing models 
%cd $data_wd

# Get list of images used for 7k training/testing datasets
imclasses = filters
used_images = list_train_images(imclasses)

# Remove images already used for training/testing from EOL rating dataset
df = read_datafile("image_ratings.txt", disp_head=False)
unused_ratings = remove_used_images(df, used_images, "Ratings")
unused_ratings.to_csv('unused_image_ratings_foreval.txt', sep="\t", index=False, header=True)

# Remove images already used for training/testing from EOL exemplar dataset (used to supplement rating=5)
df = read_datafile("images_selected_as_exemplar.txt", disp_head=False)
unused_exemplars = remove_used_images(df, used_images, "Exemplars")
unused_exemplars.to_csv('unused_image_exemplars_foreval.txt', sep="\t", index=False, header=True)

# Make master unused images dataset for ratings and exemplars
unused_images = make_master_unused_df(unused_ratings, unused_exemplars)
unused_images.to_csv('unused_images_foreval_master.txt', sep="\t", index=False, header=True)

## Run images through models and validate predictions (Run 1x for each trained model)
---
Selected models from rating_train.ipynb   
* Run 06: Inception v3 (trained on numerical rating classes 1-5)
* Run 18: Mobilenet SSD v2 (trained on 'good' and 'bad' classes)
* Run 20: Inception v3 (trained on 'good' and 'bad' classes)

In [None]:
# Define functions

# Define start and stop indices in EOL bundle for running inference   
def set_start_stop(run, df):
    # Use a window of consecutive rows starting at a random index
    # (50 rows for a tiny test subset, 500 rows otherwise)
    N = len(df)
    size = 50 if "tiny subset" in run else 500
    # Pick start so that the full window fits whenever N allows it
    start = np.random.choice(a=max(N - size, 0) + 1, size=1)[0]
    stop = start + size
    
    return start, stop

# Load saved model from directory
def load_saved_model(models_wd, TRAIN_SESS_NUM, module_selection):
    # Load trained model from path
    saved_model_path = models_wd + '/' + TRAIN_SESS_NUM
    model = tf.keras.models.load_model(saved_model_path)
    # Get name and image size for model type
    handle_base, pixels = module_selection

    return model, pixels, handle_base

# Get info about model based on training attempt number
def get_model_info(TRAIN_SESS_NUM):
    # Session 18
    if int(TRAIN_SESS_NUM) == 18:
        module_selection = ("mobilenet_v2_1.0_224", 224)
        dataset_labels = ['bad', 'good'] # Classes aggregated after attempt 7: 1/2 -> bad, 4/5 -> good
    # Session 20
    elif int(TRAIN_SESS_NUM) == 20:
        module_selection = ("inception_v3", 299)
        dataset_labels = ['bad', 'good'] # Classes aggregated after attempt 7: 1/2 -> bad, 4/5 -> good
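    # Session 6
    elif int(TRAIN_SESS_NUM) == 6:
        module_selection = ("inception_v3", 299)
        dataset_labels = ['1', '2', '3', '4', '5'] # Before aggregating classes
    # Any other session number has no saved model in this notebook
    else:
        raise ValueError("Unknown training session number: {} (expected 06, 18 or 20)".format(TRAIN_SESS_NUM))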

    return module_selection, dataset_labels

# Set filename for saving classification results
def set_outfpath(true_imclass):
    outfpath = cwd + '/ratings_' + TRAIN_SESS_NUM + '_' + true_imclass + '.csv'

    return outfpath

# Load in image from URL
# Modified from https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/guide/saved_model.ipynb#scrollTo=JhVecdzJTsKE
def image_from_url(url, fn):
    file = tf.keras.utils.get_file(fn, url) # Cached under fn; an existing cached file with the same name is reused
    disp_img = tf.keras.preprocessing.image.load_img(file)
    image = tf.keras.preprocessing.image.load_img(file, target_size=[pixels, pixels])
    image = tf.keras.preprocessing.image.img_to_array(image)
    image = tf.keras.applications.mobilenet_v2.preprocess_input(
        image[tf.newaxis,...])

    return image, disp_img

# Get info from predictions to display on images
def get_predict_info(predictions, i, stop, start):
    # Get info from predictions
    label_num = np.argmax(predictions[0], axis=-1)
    conf = predictions[0][label_num]
    im_class = dataset_labels[label_num]
    # Display progress message after each image
    print("Completed for {} of {} files".format(i+1, format(stop-start, '.0f')))
    
    return label_num, conf, im_class

# Make placeholder lists to fill for each rating class
def make_placeholders():
    filenames = []
    confidences = []
    true_imclasses = []
    det_imclasses = []
    ancestries = []

    return filenames, confidences, true_imclasses, det_imclasses, ancestries
    
# Add values for each image to placeholder list
def record_results(fn, conf, true_imclass, det_imclass, ancestry):
    filenames.append(fn)
    confidences.append(conf)
    true_imclasses.append(true_imclass)
    det_imclasses.append(str(det_imclass))
    ancestries.append(ancestry)
    results = [filenames, confidences, true_imclasses, det_imclasses, ancestries]

    return results

# Export results
def export_results(results, outfpath):
    results = pd.DataFrame(results)
    results = results.transpose()
    results.to_csv(outfpath, index=False, header=("filename", "confidence", 
                                                     "true_id", "det_id", "ancestry"))
    print("\nClassification predictions for image class {} being saved to : \n{}\n".format(
          true_imclass, outfpath))

In [None]:
#@title Run inference for chosen Training Session Number (06, 18, 20) and dataset size
%cd $cwd

# Choose training attempt number to inspect results for
TRAIN_SESS_NUM = "18" #@param ["20", "18", "06"] {allow-input: true}

# Test pipeline with a tiny subset instead of the full 500 images per class?
run = "test with tiny subset" #@param ["test with tiny subset", "for 500 images"]
print("Run: ", run)

# Load saved model
module_selection, dataset_labels = get_model_info(TRAIN_SESS_NUM)
print("Loading saved model ", module_selection)
model, pixels, handle_base = load_saved_model(models_wd, TRAIN_SESS_NUM, module_selection)

# Run inference for each image class to compare known versus predicted ratings
true_imclasses = filters
for true_imclass in true_imclasses:
    print("Running inference for class: {}\n".format(true_imclass))
    # Set filename for saving classification results
    outfpath = set_outfpath(true_imclass)

    # Make placeholder lists to record values for each image
    filenames, confidences, true_imclasses, det_imclasses, ancestries = make_placeholders()

    # Load subset of validation images df for this image class
    df = unused_images.copy()
    df = df[df.overall_rating==int(true_imclass)]

    # Run 500 random EOL bundle images through trained model
    start, stop = set_start_stop(run, df)
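    for i, (_, row) in enumerate(df.iloc[start:stop].iterrows()):
        try:
            # Read in image from url (take values from the sliced row, not by index label)
            url = row['obj_url']
            # Unique cache filename per class and row; get_file reuses cached files by name
            fn = '{}_{}.jpg'.format(true_imclass, start + i)
            img, disp_img = image_from_url(url, fn)
            ancestry = row['ancestry']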
        
            # Image classification
            start_time = time.time() # Record inference time
            predictions = model.predict(img, batch_size=1)
            label_num, conf, det_imclass = get_predict_info(predictions, i, stop, start)
            end_time = time.time()
            print("Inference time: {} sec".format(format(end_time-start_time, '.2f')))

            # Record results in placeholder lists to inspect results in next step
            results = record_results(url, conf, true_imclass, str(det_imclass), ancestry)

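        except Exception as e:
            # Skip images that fail to download or classify, but report why
            print("Skipping image {}: {}".format(i, e))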

    # Combine to df and export results
    export_results(results, outfpath)

In [None]:
#@title Aggregate model outputs for numerical classes into 'bad' (1-2) and 'good' (4-5)

# Combine prediction files created in codeblock above
base = 'ratings_' + TRAIN_SESS_NUM + '_'
imclasses = filters
all_filenames = [base + imclass + '.csv' for imclass in imclasses]
all_predictions = pd.concat([pd.read_csv(f, sep=',', header=0, na_filter = False) for f in all_filenames])
print("No. Images: {}\n".format(len(all_predictions)))
print("Model predictions for Training Attempt {}, {} with numeric classes:\n{}".format(\
      TRAIN_SESS_NUM, handle_base, all_predictions[['filename', 'true_id', 'det_id']].head()))
print("\n\nAggregating numeric class predictions to 'bad' or 'good'...\n")

# Aggregate numerical rating classes into 'bad' (1-2) and 'good' (4-5)
c0 = "bad" #@param {type:"string"}
c1 = "good" #@param {type:"string"}
imclasses = [c0, c1]

# Use .loc instead of chained assignment so the changes are applied to all_predictions itself
for col in ['true_id', 'det_id']:
    # Cast to object so numeric columns can hold the string labels
    all_predictions[col] = all_predictions[col].astype(object)
    # Predictions of 1 or 2 -> 'bad'
    all_predictions.loc[all_predictions[col].isin([1, 2]), col] = c0
    # Predictions of 4 or 5 -> 'good'
    all_predictions.loc[all_predictions[col].isin([4, 5]), col] = c1

# Remove predictions of 3
all_predictions = all_predictions[all_predictions.det_id!=3]
all_predictions = all_predictions[all_predictions.true_id!=3]

print("\nImage ratings aggregated into {} (1-2) and {} (4-5):\n{}".format(c0, c1, all_predictions[['filename', 'true_id', 'det_id']].head()))

## Plot prediction error and confidence for each class (Run 1x for each trained model)
---   
Use these histograms to find a confidence threshold value to optimize dataset coverage and accuracy
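To make the coverage/accuracy trade-off concrete, here is a minimal sketch that scans candidate thresholds and returns the lowest one whose error rate stays under a target. The sample data, the function names, and the 25% error target are assumptions for illustration only; they are not values used for the EOL models.

```python
# Hypothetical (confidence, is_correct) pairs; not real model output.
predictions = [(0.95, True), (0.91, True), (0.62, False), (0.58, True),
               (0.55, False), (0.99, True), (0.71, True), (0.52, False)]

def coverage_and_error(predictions, thresh):
    """Coverage = share of predictions kept; error = share of kept ones that are wrong."""
    kept = [is_correct for conf, is_correct in predictions if conf >= thresh]
    if not kept:
        return 0.0, 0.0
    coverage = len(kept) / len(predictions)
    error = kept.count(False) / len(kept)
    return coverage, error

def lowest_threshold_under(predictions, max_error, candidates):
    """Return (thresh, coverage, error) for the lowest candidate meeting max_error."""
    for thresh in sorted(candidates):
        coverage, error = coverage_and_error(predictions, thresh)
        if coverage > 0 and error <= max_error:
            return thresh, coverage, error
    return None  # No candidate meets the target

result = lowest_threshold_under(predictions, max_error=0.25,
                                candidates=[0.5, 0.6, 0.7, 0.8, 0.9])
print(result)
```

Because coverage can only shrink as the threshold rises, the lowest threshold that meets the error target also keeps the most data among the candidates that meet it.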

In [None]:
# Define functions

# Calculate prediction accuracy
def get_accuracy(obs, all_vals):
    # obs = no. correct predictions, all_vals = no. correct + incorrect predictions
    if obs:
        accuracy = format((obs/all_vals), '.2f')
    else:
        accuracy = format(0, '.2f')
    
    return accuracy

# Validate predictions by image class (and optionally, by taxon)
def validate_predict(df, inspect_by_taxon, taxon):
    # If inspecting for taxon-specific images only
    if inspect_by_taxon:
        df = df.loc[df.ancestry.str.contains(taxon, case=False, na=False)]
        print("Inspecting results for {}:\n{}".format(taxon, df.head()))
    
    # Validate predictions
    # Check where true ratings and model-determined classes match
    df['det'] = (df['true_id'] == df['det_id'])
    tru = df.loc[df.det, :] # True ID
    fal = df.loc[~df.det, :] # False ID

    return tru, fal, taxon

# Plot results by image class
def plot_predict_x_conf(tru, fal, thresh, imclasses):
    # Break up predictions by image class and confidence values
    c0, c1 = imclasses
    # Check how many true/false predictions are at each confidence value
    # Class 0 - 'Bad'
    c0t = tru.loc[tru['true_id'] == c0, :] # True dets
    c0f = fal.loc[fal['true_id'] == c0, :] # False dets
    # Class 1 - 'Good'
    c1t = tru.loc[tru['true_id'] == c1, :] 
    c1f = fal.loc[fal['true_id'] == c1, :] 
    
    # Plot parameters to make 1 subplot per image class
    kwargs = dict(alpha=0.5, bins=15)
    fig, axes = plt.subplots(len(imclasses), figsize=(10, 10), constrained_layout=True)
    fig.suptitle('Prediction Confidence by Class\n Overall Accuracy: {}'.format(
                  get_accuracy(len(tru), (len(tru)+len(fal)))))
    
    # Make subplots
    # Class 0 - 'Bad'
    # True predictions
    axes[0].hist(c0t['confidence'], color='y', label='True Det', **kwargs)
    # False predictions
    axes[0].hist(c0f['confidence'], color='r', label='False Det', **kwargs)
    axes[0].set_title("{} (n={} images)\n Accuracy: {}".format(imclasses[0], 
                      len(c0t)+len(c0f), get_accuracy(len(c0t), (len(c0t)+len(c0f)))))
    axes[0].legend();

    # Class 1 - 'Good'
    # True predictions
    axes[1].hist(c1t['confidence'], color='y', label='True Det', **kwargs)
    # False predictions
    axes[1].hist(c1f['confidence'], color='r', label='False Det', **kwargs)
    axes[1].set_title("{} (n={} images)\n Accuracy: {}".format(imclasses[1], 
                      len(c1t)+len(c1f), get_accuracy(len(c1t), (len(c1t)+len(c1f)))))
    axes[1].legend();

    # Add Y-axis labels
    for ax in fig.get_axes():
        ax.set(ylabel='Freq (# imgs)')
        if thresh:
            ax.axvline(thresh, color='k', linestyle='dashed', linewidth=1)

    return fig

# To save the figure
def save_figure(fig, taxon, TRAIN_SESS_NUM=TRAIN_SESS_NUM, handle_base=handle_base):
    # Make filename
    if taxon: # If for a specific taxon
        if 'plant' in taxon:
            handle_base = handle_base + '_plantae'
        elif 'anim' in taxon:
            handle_base = handle_base + '_animalia'

    outfpath = TRAIN_SESS_NUM + '_' + handle_base + '.png'
    fig.savefig(outfpath)
    print("Histograms saved to ", outfpath)

    return outfpath

### Plot histograms of accuracy for each image class
Use these plots to determine confidence thresholds or class predictions to keep or filter out during post-processing. 

For example, we found that model predictions for "bad" had high accuracy at all confidence levels, but predictions for "good" had low accuracy at all confidence levels. We used these plots to decide to keep all predictions for "bad" and discard all predictions for "good".
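One way such a decision could be applied during post-processing is with a per-class rule table, sketched below. The `class_rules` dictionary, the function name `filter_predictions`, and the sample predictions are hypothetical; this is an assumption about how the filter might look, not the post-processing code used for EOL.

```python
# Hypothetical rule table: a number is a minimum confidence, None discards the class.
class_rules = {"bad": 0.0, "good": None}

def filter_predictions(predictions, class_rules):
    """Keep (det_id, confidence) pairs that satisfy the rule for their predicted class."""
    kept = []
    for det_id, conf in predictions:
        min_conf = class_rules.get(det_id)  # Classes missing from the table are discarded
        if min_conf is not None and conf >= min_conf:
            kept.append((det_id, conf))
    return kept

# Hypothetical model output as (predicted class, confidence)
predictions = [("bad", 0.97), ("good", 0.88), ("bad", 0.54), ("good", 0.99)]
kept = filter_predictions(predictions, class_rules)
print(kept)
```

Raising the value for a class in the rule table turns the same function into a per-class confidence threshold filter.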

In [None]:
#@title (Optional: inspect for specific taxon and/or add a confidence threshold line)

# Load combined prediction results from above
df = all_predictions.copy()

# Optional: Inspect predictions for taxon-specific images only?
inspect_by_taxon = False #@param {type:"boolean"}
taxon = "" #@param {type:"string"}

# Optional: Draw threshold value to help choose optimal balance b/w maximizing useful data and minimizing error
thresh = 0 #@param {type:"number"}

# Validate predictions by image class (optionally, by taxon)
tru, fal, taxon = validate_predict(df, inspect_by_taxon, taxon)

# Plot result accuracy by image class (optionally, with confidence threshold line)
fig = plot_predict_x_conf(tru, fal, thresh, imclasses)

# Export histograms
save_figure(fig, taxon)