<div class = "alert alert-block alert-success">
    
# <span style='color:black'> Model Evaluation and False Positive Reduction

<div class = "alert alert-block alert-success">

# <span style='color:Blue'>This notebook shows how to use the object detection evaluation module (objdeteval.py) to evaluate the output of the nightingale_parallel module, as saved and formatted in the nightingale_api.ipynb Jupyter Notebook. **Note: because this example uses the very few detections and groundtruth annotations provided with the sample data, the curves will look somewhat irregular.**
    
# <span style='color:purple'> An exciting feature of Nightingale's *objdeteval* module is that it saves False Positives to Nightingale-formatted groundtruth files. These can easily be merged with the original groundtruth CSV so that the network can be retrained on confusers that are not labeled in the original dataset.

<div class = "alert alert-block alert-success">

# <span style='color:Blue'> First, we'll set up and run the evaluation code

In [1]:
import numpy as np

# Evaluation inputs
det_file = 'my_results.csv'
gt_file = '../Sample_NITF/omittedimage_groundtruth.csv'
iou_thresh = 0.2
score_thresh = [0.01,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]
## PRIORI3 ##
class_list = ['class1', 'class2', 'class3']  # must match the class names used in gt_file
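The `score_thresh` description in the usage notes below suggests `np.arange(0.0,1.1,0.1)`. Because 0.1 has no exact binary floating-point representation, `np.arange` can yield values such as `0.30000000000000004`, which can then leak into labels built with `str()` (for example the `@conf=` tick labels in the peak-F1 plot) and possibly into output file names, depending on how objdeteval formats them. A minimal sketch of building a clean threshold list with the same values as the cell above, assuming two-decimal thresholds are all you need:

```python
import numpy as np

# np.arange derives each value from a floating-point step, so entries can carry
# representation error (the 4th value is typically not exactly 0.3)
raw = np.arange(0.0, 1.1, 0.1)
print(raw[3] == 0.3)

# Rounding to two decimals gives clean, label-friendly thresholds
clean = np.round(np.linspace(0.0, 1.0, 11), 2)
clean[0] = 0.01  # start just above zero, matching the hand-written score_thresh list
score_thresh = clean.tolist()
print(score_thresh)
```

Converting with `.tolist()` keeps `score_thresh` a plain Python list of floats, like the hand-written one.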

In [2]:
from objdeteval import Eval

## The usage for the Eval function is as follows:

**Eval(gt_file,
     det_file,
     iou_thresh,
     class_list,
     score_thresh=array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ]))**

***Evaluates a detection file against a groundtruth file for oriented (quadrilateral) bounding boxes.
Returns precision, recall, F1-Score, True-Positives, False-Positives, False-Negatives,
and a confusion matrix***

**Usage: p,r,f1,tp,fp,fn,confm = Eval(gt_file, det_file, iou_thresh, class_list, score_thresh)**

**gt_file** : str; path to a Nightingale-formatted groundtruth csv:
              IMID,xLF,yLF,xRF,yRF,xRB,yRB,xLB,yLB,class
              
**det_file** : str; path to a Nightingale-formatted detection csv:
               geometry,class,conf,image_name,id,Background Score,Class #1 Score,Class #2 Score, etc...

**iou_thresh** : float; The minimum Intersection-Over-Union for a detection to be considered a
             True-Positive

**score_thresh** : iterable; a list or array of confidence scores to evaluate against. E.g., 
               score_thresh = np.arange(0.0,1.1,0.1)
               
**class_list** : list; A list of strings corresponding to the classes in the groundtruth (does
             not include the 'background' class). E.g., class_list=['class1','class2','class3']
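`iou_thresh` is compared against the Intersection-Over-Union of a detection and a groundtruth box. For the four-corner boxes in the groundtruth format (`xLF,yLF,...,xLB,yLB`), a minimal pure-Python sketch is shown below. It assumes both quadrilaterals are convex and that the corners LF, RF, RB, LB trace the box perimeter in order, and it uses Sutherland-Hodgman clipping plus the shoelace formula. objdeteval may compute IoU differently (for example with a geometry library), so treat this as illustrative; `signed_area`, `as_ccw`, `clip_polygon`, `quad_iou`, and `row_to_quad` are hypothetical helpers.

```python
def signed_area(poly):
    """Shoelace formula: positive for counter-clockwise vertex order."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def as_ccw(poly):
    """Return the polygon with counter-clockwise vertex order."""
    return list(poly) if signed_area(poly) >= 0 else list(poly)[::-1]


def clip_polygon(subject, clipper):
    """Sutherland-Hodgman: clip `subject` to the convex, CCW polygon `clipper`."""
    def inside(p, a, b):
        # True if p is on or to the left of the directed edge a -> b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Point where segment p -> q crosses the infinite line through a and b
        (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / den
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

    output = list(subject)
    for i in range(len(clipper)):
        if not output:
            break  # polygons do not overlap
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, output = output, []
        prev = inp[-1]
        for cur in inp:
            if inside(cur, a, b):
                if not inside(prev, a, b):
                    output.append(intersect(prev, cur, a, b))
                output.append(cur)
            elif inside(prev, a, b):
                output.append(intersect(prev, cur, a, b))
            prev = cur
    return output


def quad_iou(q1, q2):
    """IoU of two convex quadrilaterals given as [(x, y), ...] corner lists."""
    p1, p2 = as_ccw(q1), as_ccw(q2)
    overlap = clip_polygon(p1, p2)
    inter = abs(signed_area(overlap)) if len(overlap) >= 3 else 0.0
    union = abs(signed_area(p1)) + abs(signed_area(p2)) - inter
    return inter / union if union > 0 else 0.0


def row_to_quad(row):
    """Corners of one groundtruth row in LF, RF, RB, LB order (assumed to trace the perimeter)."""
    return [(row['xLF'], row['yLF']), (row['xRF'], row['yRF']),
            (row['xRB'], row['yRB']), (row['xLB'], row['yLB'])]


sq1 = [(0, 0), (2, 0), (2, 2), (0, 2)]
sq2 = [(1, 1), (3, 1), (3, 3), (1, 3)]
print(quad_iou(sq1, sq2))  # overlap area 1, union 7, so about 0.1429
```

Normalizing both polygons to counter-clockwise order means the sketch works whether corners are listed clockwise or not, which matters for image coordinates where the y axis points down.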

In [3]:
[p,r,f1,tp,fp,fn,confm] = Eval(gt_file=gt_file,det_file=det_file,iou_thresh=iou_thresh,score_thresh=score_thresh,class_list=class_list)

converted txt files to arrays

done, computing p r and f1
done


## The outputs for p, r, f1, tp, fp, & fn will be numpy arrays with a shape equal to ***(len(score_thresh),len(class_list))***
## The confusion matrix will have a shape equal to ***(len(class_list)+1,len(class_list),len(score_thresh))***, where the first dimension is one longer because it includes the "background" class
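Assuming `Eval` uses the standard definitions (precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = harmonic mean of the two), the metric arrays can be recomputed element-wise from the count arrays, since all six share the `(len(score_thresh), len(class_list))` shape. Entries with a zero denominator become NaN, which is presumably why the averaging cell below uses `np.nanmean`. This is a sketch for checking your understanding, not objdeteval's own code; `prf1_from_counts` is a hypothetical helper, and the toy arrays use new names so they do not overwrite `p`, `r`, or `f1`.

```python
import numpy as np

def prf1_from_counts(tp, fp, fn):
    """Element-wise precision, recall and F1 from count arrays.

    Entries with a zero denominator become NaN. F1 is written as
    2TP / (2TP + FP + FN), which equals the harmonic mean of precision
    and recall wherever both are defined.
    """
    tp, fp, fn = (np.asarray(a, dtype=float) for a in (tp, fp, fn))
    with np.errstate(divide='ignore', invalid='ignore'):
        p = np.where(tp + fp > 0, tp / (tp + fp), np.nan)
        r = np.where(tp + fn > 0, tp / (tp + fn), np.nan)
        f1 = np.where(2 * tp + fp + fn > 0, 2 * tp / (2 * tp + fp + fn), np.nan)
    return p, r, f1

# Toy counts: 2 score thresholds (rows) x 2 classes (columns)
tp_toy = np.array([[5, 0], [3, 0]])
fp_toy = np.array([[5, 0], [1, 2]])
fn_toy = np.array([[0, 0], [2, 4]])
p_chk, r_chk, f1_chk = prf1_from_counts(tp_toy, fp_toy, fn_toy)
print(p_chk)                      # class 2 at the first threshold is NaN (no detections)
print(np.nanmean(p_chk, axis=1))  # per-threshold mean that skips NaN classes
```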

In [4]:
p.shape,r.shape,f1.shape,tp.shape,fp.shape,fn.shape,confm.shape

((11, 3), (11, 3), (11, 3), (11, 3), (11, 3), (11, 3), (4, 3, 11))

In [5]:
# average each metric across classes at every score threshold (nanmean skips NaN entries)
p_mean = np.nanmean(p,axis=1)
r_mean = np.nanmean(r,axis=1)
f1_mean = np.nanmean(f1,axis=1)

## The below cells are used to plot the results

In [6]:
import matplotlib.pyplot as plt
from matplotlib import cm
import matplotlib.patheffects as pe

In [3]:
##############################
## Plot the average metrics ##
##############################

fig,ax = plt.subplots(1,2,figsize=(20,20))

ax[0].plot(r_mean,p_mean,linewidth=4,c='r',path_effects=[pe.Stroke(linewidth=6, foreground='k'), pe.Normal()])
ax[0].plot(r_mean,p_mean,'o',c='k',mew=5)
ax[0].set_ylabel('Precision',size=20)
ax[0].set_xlabel('Recall',size=20)
ax[0].axis('square');
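# note: this "AP" is the mean precision over the sampled thresholds, a coarse stand-in for the area under the PR curve
ax[0].set_title('PR Curve (AP = '+str(np.round(np.nanmean(p_mean),decimals=3))+')',size=20);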
ax[0].minorticks_on()
ax[0].grid()
ax[0].grid(which='minor', linestyle=':', linewidth='0.1', color='black')
ax[0].set_xlim(-0.01,1.01);
ax[0].set_ylim(-0.01,1.01);
ax[0].tick_params(labelsize=22)

ax[1].plot(score_thresh,p_mean,linewidth=4,c='r',path_effects=[pe.Stroke(linewidth=6, foreground='k'), pe.Normal()])
ax[1].plot(score_thresh,r_mean,linewidth=4,c='b',path_effects=[pe.Stroke(linewidth=6, foreground='k'), pe.Normal()])
ax[1].plot(score_thresh,f1_mean,linewidth=4,c='m',path_effects=[pe.Stroke(linewidth=6, foreground='k'), pe.Normal()])
ax[1].legend(['Precision','Recall','F1'],fontsize=20,loc='lower left')
ax[1].set_xlabel('Confidence Score',size=20);
ax[1].set_ylabel('Value',size=20);
ax[1].axis('square');
ax[1].set_title('Average Metrics vs Confidence',size=20);
ax[1].minorticks_on()
ax[1].grid()
ax[1].grid(which='minor', linestyle=':', linewidth='0.1', color='black')
ax[1].set_xlim(-0.01,1.01);
ax[1].set_ylim(-0.01,1.01);
ax[1].tick_params(labelsize=22)
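The "AP" in the PR-curve title is `np.nanmean(p_mean)`, the mean precision over the sampled score thresholds. A more common definition of average precision is the area under the precision-recall curve. Below is a minimal sketch of the all-point interpolated AP used in PASCAL VOC-style evaluation, assuming the curve is described only by the sampled (recall, precision) pairs; `average_precision` is a hypothetical helper, not part of objdeteval. With only 11 thresholds and the tiny sample dataset, treat the result as a rough summary.

```python
import numpy as np

def average_precision(recall, precision):
    """Area under a PR curve using all-point interpolation (PASCAL VOC style).

    Points where recall or precision is NaN are dropped first.
    """
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    keep = ~(np.isnan(recall) | np.isnan(precision))
    recall, precision = recall[keep], precision[keep]
    if recall.size == 0:
        return float('nan')
    order = np.argsort(recall)                    # sort by increasing recall
    r = np.concatenate(([0.0], recall[order], [1.0]))
    p = np.concatenate(([0.0], precision[order], [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]      # envelope: precision non-increasing in recall
    steps = np.where(r[1:] != r[:-1])[0]          # indices where recall changes
    return float(np.sum((r[steps + 1] - r[steps]) * p[steps + 1]))

# Toy curve ordered like the notebook's: recall falls as the threshold rises
toy_recall = [1.0, 0.8, 0.5, 0.2]
toy_precision = [0.4, 0.6, 0.9, 1.0]
print(average_precision(toy_recall, toy_precision))  # about 0.73
```

In this notebook you could compare `average_precision(r_mean, p_mean)` with the title value, or compute it per class with `r[:, cl]` and `p[:, cl]`.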

In [4]:
#######################################
## Plot the class-specific PR curves ##
#######################################

fig0,ax0 = plt.subplots(figsize=(10,10))

w = plt.get_cmap('gist_rainbow')  # matplotlib.cm.get_cmap is deprecated/removed in newer Matplotlib
linsty = ['solid','dotted','dashed']*20
for cl in np.arange(0,len(class_list)):
    ax0.plot(r[:,cl],p[:,cl],color=w(cl/(len(class_list)-1)),path_effects=[pe.Stroke(linewidth=6, foreground='k'), pe.Normal()],linestyle = linsty[cl],linewidth=4)
ax0.legend(class_list,loc='right',bbox_to_anchor=(1.35, 0.5),fontsize=14)
ax0.set_xlabel('Recall',size=20);
ax0.set_ylabel('Precision',size=20);
ax0.axis('square');
ax0.set_title('PR Class Breakdown',size=20);
ax0.minorticks_on()
ax0.grid()
ax0.grid(which='minor', linestyle=':', linewidth='0.1', color='black')
ax0.set_xlim(-0.01,1.01);
ax0.set_ylim(-0.01,1.01);
ax0.tick_params(labelsize=22)

In [5]:
#####################################################################
## Plot the class-specific F1, Precision, and Recall vs Confidence ##
#####################################################################

fig0,ax0 = plt.subplots(1,3,figsize=(25,25))

w = plt.get_cmap('gist_rainbow')  # matplotlib.cm.get_cmap is deprecated/removed in newer Matplotlib
linsty = ['solid','dotted','dashed']*20
for cl in np.arange(0,len(class_list)):
    ax0[0].plot(score_thresh,f1[:,cl],color=w(cl/(len(class_list)-1)),path_effects=[pe.Stroke(linewidth=6, foreground='k'), pe.Normal()],linestyle = linsty[cl],linewidth=4)
ax0[0].set_xlabel('Confidence Score',size=20);
ax0[0].set_ylabel('Value',size=20);
ax0[0].axis('square');
ax0[0].set_title('F1 Class Breakdown',size=20);
ax0[0].minorticks_on()
ax0[0].grid()
ax0[0].grid(which='minor', linestyle=':', linewidth='0.1', color='black')
ax0[0].set_xlim(-0.01,1.01);
ax0[0].set_ylim(-0.01,1.01);
ax0[0].tick_params(labelsize=22)


for cl in np.arange(0,len(class_list)):
    ax0[1].plot(score_thresh,r[:,cl],color=w(cl/(len(class_list)-1)),path_effects=[pe.Stroke(linewidth=6, foreground='k'), pe.Normal()],linestyle = linsty[cl],linewidth=4)
ax0[1].set_xlabel('Confidence Score',size=20);
ax0[1].axis('square');
ax0[1].set_title('Recall Class Breakdown',size=20);
ax0[1].minorticks_on()
ax0[1].grid()
ax0[1].grid(which='minor', linestyle=':', linewidth='0.1', color='black')
ax0[1].set_xlim(-0.01,1.01);
ax0[1].set_ylim(-0.01,1.01);
ax0[1].tick_params(labelsize=22)


for cl in np.arange(0,len(class_list)):
    ax0[2].plot(score_thresh,p[:,cl],color=w(cl/(len(class_list)-1)),path_effects=[pe.Stroke(linewidth=6, foreground='k'), pe.Normal()],linestyle = linsty[cl],linewidth=4)
ax0[2].legend(class_list,loc='right',bbox_to_anchor=(1.42, 0.5),fontsize=14)
ax0[2].set_xlabel('Confidence Score',size=20);
ax0[2].axis('square');
ax0[2].set_title('Precision Class Breakdown',size=20);
ax0[2].minorticks_on()
ax0[2].grid()
ax0[2].grid(which='minor', linestyle=':', linewidth='0.1', color='black')
ax0[2].set_xlim(-0.01,1.01);
ax0[2].set_ylim(-0.01,1.01);
ax0[2].tick_params(labelsize=22)

In [1]:
######################################################
## Show the confusion matrix where confidence = 0.5 ##
######################################################
import pandas as pd
pd.DataFrame(confm[:,:,5].astype(int),columns=class_list,index=['Background']+class_list)
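The cell above hard-codes slice 5, which is the 0.5 entry of `score_thresh`. If you change the threshold list, looking the slice up by value is less fragile. A minimal sketch, assuming only the `(len(class_list)+1, len(class_list), len(score_thresh))` shape described earlier and the same row labels used above; `confusion_at` is a hypothetical helper, and the random toy matrix is for illustration only. In this notebook the call would be `confusion_at(confm, score_thresh, class_list, 0.5)`.

```python
import numpy as np
import pandas as pd

def confusion_at(confm, score_thresh, class_list, conf):
    """Confusion-matrix slice (as a DataFrame) for the tested threshold nearest `conf`."""
    thresholds = np.asarray(score_thresh, dtype=float)
    t = int(np.argmin(np.abs(thresholds - conf)))  # index of the closest tested threshold
    return pd.DataFrame(confm[:, :, t].astype(int),
                        columns=list(class_list),
                        index=['Background'] + list(class_list))

# Toy example: 3 classes, 11 thresholds, random counts
toy_thresh = [0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
toy_confm = np.random.default_rng(0).integers(0, 10, size=(4, 3, 11))
print(confusion_at(toy_confm, toy_thresh, ['class1', 'class2', 'class3'], 0.5))
```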

## The cell below plots F1, precision, and recall for each category at the confidence score that gives that category's peak F1. Use it to decide which confidence scores to threshold the detector at for optimized F1 performance (see the description of the "conf" parameter in the nightingale_api notebook).

In [6]:
peak_f1s = []
optimized_scores = []
p_f1s = []
r_f1s = []
score_class_list = []

index=[]
for col in range(0,f1.shape[1]):
    try:
        index.append(np.nanargmax(f1[:,col]))
    except ValueError:  # raised when the whole column is NaN (no valid F1 values)
        index.append(0)
    
index = np.array(index)
for cl in range(0,len(class_list)):
    peak_f1s.append(f1[index[cl],cl])
    p_f1s.append(p[index[cl],cl])
    r_f1s.append(r[index[cl],cl])
    optimized_scores.append(score_thresh[index[cl]])
    score_class_list.append(class_list[cl]+'\n@conf='+str(optimized_scores[cl]))
    
fig_break, ax_break = plt.subplots(figsize = (10,10))
width = 0.25
x = np.arange(0,len(class_list),dtype=np.int32)

ax_break.bar(x-width,peak_f1s,width=width,label='f1')
ax_break.bar(x,p_f1s,width=width, label='Precision')
ax_break.bar(x+width,r_f1s,width=width, label='Recall')

ax_break.set_xticks(x)
ax_break.set_xticklabels(class_list)
ax_break.legend()
ax_break.set_title('Peak F1 Performance by Category',weight='bold',size=12);

ax_break.set_ylim([0,1])

ax_break.set_yticks(np.arange(0.0,1.05,0.05))
ylabels = np.ndarray.tolist(np.array(np.round(ax_break.get_yticks(),decimals=2),dtype='str'))
ax_break.set_yticklabels(ylabels,weight='bold',size=12);
ax_break.set_ylabel('Score',weight='bold',size=12);
ax_break.set_xticklabels(score_class_list,weight='bold',size=12);

ax_break.grid(axis='y')
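Once you have a per-class threshold from the peak-F1 plot, you can apply it to a detection table. A minimal pandas sketch, assuming the detection CSV has the `class` and `conf` columns listed in the `det_file` description; `filter_by_class_threshold` is a hypothetical helper, the toy rows are made up, and in this notebook the thresholds could come from `dict(zip(class_list, optimized_scores))`. Classes missing from the dict are dropped, a design choice you may want to change.

```python
import pandas as pd

def filter_by_class_threshold(dets, thresholds):
    """Keep detections whose `conf` meets the threshold for their `class`."""
    per_row = dets['class'].map(thresholds)  # NaN for classes not in the dict
    # Comparisons against NaN are False, so unknown classes are dropped
    return dets[dets['conf'] >= per_row].reset_index(drop=True)

# Hypothetical detections; in practice use pd.read_csv(det_file)
dets = pd.DataFrame({'class': ['class1', 'class1', 'class2', 'class3'],
                     'conf':  [0.35, 0.80, 0.45, 0.90]})
thresholds = {'class1': 0.4, 'class2': 0.4, 'class3': 0.9}
kept = filter_by_class_threshold(dets, thresholds)
print(kept)
```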

<div class = "alert alert-block alert-success">
    
# <span style='color:Green'> *False-Positive Feedback (FPF)*
    
# <span style='color:purple'> Now that we've evaluated our model, we've covered most of what Nightingale has to offer! But the output of the Eval function also gives us information we can use to improve the model by reducing False Positives.

# <span style='color:blue'> Notice that there is a new folder in our Test&Evaluate folder called "FP_Files"

In [12]:
ls

evaluate.ipynb  FP_Files/  my_results.csv  objdeteval.py  __pycache__/


<div class = "alert alert-block alert-success">

# <span style='color:blue'> The folder contains CSV files of the False Positives detected at each confidence score threshold we tested:

In [13]:
ls FP_Files/

False_Positives_score_0.01_and_up.csv  False_Positives_score_0.6_and_up.csv
False_Positives_score_0.1_and_up.csv   False_Positives_score_0.7_and_up.csv
False_Positives_score_0.2_and_up.csv   False_Positives_score_0.8_and_up.csv
False_Positives_score_0.3_and_up.csv   False_Positives_score_0.9_and_up.csv
False_Positives_score_0.4_and_up.csv   False_Positives_score_1.0_and_up.csv
False_Positives_score_0.5_and_up.csv
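Each file name encodes its minimum score. A small sketch that parses the threshold and maps it to the file path, assuming the `False_Positives_score_<threshold>_and_up.csv` naming shown in the listing; `fp_files_by_threshold` is a hypothetical helper.

```python
import re
from pathlib import Path

FP_NAME = re.compile(r'False_Positives_score_([0-9.]+)_and_up\.csv$')

def fp_files_by_threshold(folder='FP_Files'):
    """Map threshold (float) -> path for every False Positive CSV in `folder`."""
    files = {}
    for path in Path(folder).glob('False_Positives_score_*_and_up.csv'):
        match = FP_NAME.search(path.name)
        if match:
            files[float(match.group(1))] = path
    return dict(sorted(files.items()))  # ordered by increasing threshold
```

In this notebook, `fp_files_by_threshold()[0.8]` should point at the same file as the `gt_path_FP` used in the merge step.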


<div class = "alert alert-block alert-success">

# <span style='color:blue'> Using Pandas, we can easily merge our False Positive labels with the sample Nightingale-formatted groundtruth file (omittedimage_groundtruth.csv) in Nightingale/Inference/Sample_NITF. 
    
# <span style='color:blue'> Let's make a new CSV that includes both the original annotations and all False Positives that had a score >= 0.8

In [15]:
import pandas as pd

In [16]:
gt_path_True = '../Sample_NITF/omittedimage_groundtruth.csv'
gt_path_FP = 'FP_Files/False_Positives_score_0.8_and_up.csv'
# read the original groundtruth and the False Positive labels, then stack them into one dataframe
df_True = pd.read_csv(gt_path_True)
df_FP = pd.read_csv(gt_path_FP)
gt_dataframe = pd.concat([df_True,df_FP])
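`pd.concat` aligns on column names, so if the two files' headers differ (for example a misspelled column), the result silently gains NaN-filled columns. A minimal sketch of a stricter merge, assuming both files use the Nightingale groundtruth columns; `merge_gt_and_fp` is a hypothetical helper, and the toy rows and the `FALSE_class1` label are made up for illustration. Dropping exact duplicates guards against merging the same FP file twice.

```python
import pandas as pd

def merge_gt_and_fp(df_true, df_fp):
    """Stack groundtruth and False Positive rows after checking their columns match."""
    if set(df_true.columns) != set(df_fp.columns):
        diff = set(df_true.columns) ^ set(df_fp.columns)
        raise ValueError(f'Column mismatch between files: {sorted(diff)}')
    merged = pd.concat([df_true, df_fp], ignore_index=True)
    return merged.drop_duplicates(ignore_index=True)

cols = ['IMID', 'xLF', 'yLF', 'xRF', 'yRF', 'xRB', 'yRB', 'xLB', 'yLB', 'class']
toy_true = pd.DataFrame([['img1', 0, 0, 10, 0, 10, 10, 0, 10, 'class1']], columns=cols)
toy_fp = pd.DataFrame([['img1', 50, 50, 60, 50, 60, 60, 50, 60, 'FALSE_class1']], columns=cols)
merged = merge_gt_and_fp(toy_true, toy_fp)
print(merged['class'].value_counts())
```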

<div class = "alert alert-block alert-success">

# <span style='color:blue'> See that the new training data includes our original categories and new FALSE categories. 

In [2]:
gt_dataframe['class'].unique()

<div class = "alert alert-block alert-success">

# <span style='color:blue'> Save your combined groundtruth and False Positive dataframe as a new CSV

In [19]:
gt_dataframe.to_csv('../Sample_NITF/omittedimage_groundtruth_Plus_FP.csv',index=False)

<div class = "alert alert-block alert-success">

# <span style='color:green'> The resulting file can be used for future model training. When you convert your data to a TensorFlow Record (Training Notebook 2), Nightingale will automatically include examples of the "FALSE" categories in the training set by chipping image areas that contain False Positives and re-labeling the False Positives as background. When you retrain your model, it should be more robust to False Positives (higher precision). 

<div class = "alert alert-block alert-success">

# <span style='color:green'> Try going back to the Training notebooks and think about how you would apply this to the "PRIORI_TRAIN" csv you created. What would be a good strategy for collecting FPs and retraining the model? Answer: run inference on your training data, collect the False Positives, make a new groundtruth file that includes them, and retrain the model.  

<div class = "alert alert-block alert-success">

# <span style='color:green'> That about covers it! 
    
# <span style='color:blue'> Good luck, and thanks for checking out <span style='color:purple'>*Nightingale*</span>