# Model Performance Statistics
This example notebook computes statistics for model performance: box plots, the mean and standard deviation of each metric, and t-test p-values. Run an experiment multiple times, then use this notebook to analyze the distribution of the resulting metrics.

## Specify Model Performance
Train each model several times (for example, 10 times) on the same synthetic data, then copy the resulting metrics into the cell below. Enter each metric as a list in which each element is the value from one run.
1. In the baseline performance section, copy the baseline model's metrics.
2. In the new performance section, copy the metrics of the model you want to compare against the baseline.
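Copying numbers by hand gets error-prone as the number of runs grows. As a minimal sketch, assume each run's metrics are available as a dictionary. The `runs` structure, its keys and the `to_metric_lists` helper are hypothetical and not produced by any particular training script. Under that assumption, the per-metric lists can be built by transposing the per-run records. The three records reuse the first three baseline runs from the next cell.

```python
# Hypothetical per-run records; adapt to however your runs report their metrics.
runs = [
    {"mAP": 0.025, "mAP50": 0.052, "mAR": 0.077},
    {"mAP": 0.016, "mAP50": 0.033, "mAR": 0.070},
    {"mAP": 0.011, "mAP50": 0.023, "mAR": 0.038},
]

def to_metric_lists(records):
    """Turn a list of per-run dicts into one list per metric, keeping run order."""
    metrics = {}
    for record in records:
        for name, value in record.items():
            metrics.setdefault(name, []).append(value)
    return metrics

lists = to_metric_lists(runs)
print(lists["mAP"], lists["mAP50"], lists["mAR"])
```

The resulting lists have the same shape as `mean_ap_base`, `mean_ap_50_base` and `mean_ar_base`, so they can be assigned to those names directly.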

```python
# baseline performance
base_dataset_name = "v6"
mean_ap_base = [0.025,0.016,0.011,0.015,0.020,0.032,0.046,0.019,0.013,0.028]
mean_ap_50_base = [0.052,0.033,0.023,0.031,0.046,0.079,0.118,0.044,0.032,0.082]
mean_ar_base = [0.077,0.070,0.038,0.048,0.086,0.114,0.192,0.072,0.059,0.113]

# new performance
new_dataset_name = "v6"
mean_ap_new = [0.035,0.037,0.055,0.060,0.069,0.109,0.055,0.069,0.050,0.090]
mean_ap_50_new = [0.074,0.079,0.126,0.138,0.174,0.307,0.117,0.162,0.097,0.234]
mean_ar_new = [0.113,0.095,0.139,0.167,0.164,0.303,0.140,0.199,0.164,0.217]
```

## Baseline and New Model Statistics

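```python
# indices = {0: "val mAP", 1: "test mAP", 2: "val mAP50", 3: "test mAP50", 4: "val mAR", 5: "test mAR"}
# Row labels for the statistics table and trace names for the box plot,
# in the order the metrics are passed to collection() and performance_plot()
indices = {0: "base mAP", 1: "new mAP", 2: "base mAP50", 3: "new mAP50", 4: "base mAR", 5: "new mAR"}
```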

```python
import pandas as pd

def collection(mean_ap_base, mean_ap_50_base, mean_ar_base, mean_ap_new, mean_ap_50_new, mean_ar_new):
    # Order must match the keys of `indices`: base/new pairs for mAP, mAP50, mAR
    runs = [mean_ap_base, mean_ap_new, mean_ap_50_base, mean_ap_50_new, mean_ar_base, mean_ar_new]
    # describe() returns count, mean, std, min, quartiles and max; keep only mean and std
    rows = [pd.Series(values).describe().loc[["mean", "std"]] for values in runs]
    # DataFrame.append was removed in pandas 2.0, so build the frame in one step
    performance = pd.DataFrame(rows)
    # Replace the default 0..5 row index with readable labels
    performance = performance.rename(index=indices)
    return performance
```


```python
performance = collection(mean_ap_base, mean_ap_50_base, mean_ar_base, mean_ap_new, mean_ap_50_new, mean_ar_new)
performance
```

| metric | mean | std |
|---|---|---|
| base mAP | 0.0225 | 0.010638 |
| new mAP | 0.0629 | 0.022811 |
| base mAP50 | 0.054 | 0.029978 |
| new mAP50 | 0.1508 | 0.072918 |
| base mAR | 0.0869 | 0.044411 |
| new mAR | 0.1701 | 0.059212 |
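The `std` column comes from `Series.describe()`, which reports the sample standard deviation (divisor n − 1), not the population standard deviation (divisor n). With only 10 runs the difference is noticeable. As a quick check, the standard library's `statistics` module reproduces the baseline mAP row:

```python
import statistics

mean_ap_base = [0.025, 0.016, 0.011, 0.015, 0.020, 0.032, 0.046, 0.019, 0.013, 0.028]

mean = statistics.mean(mean_ap_base)
sample_std = statistics.stdev(mean_ap_base)       # divisor n - 1, same as pandas describe()
population_std = statistics.pstdev(mean_ap_base)  # divisor n, always slightly smaller

print(f"mean = {mean:.4f}")
print(f"sample std = {sample_std:.6f}, population std = {population_std:.6f}")
```

The sample value agrees with the 0.010638 reported in the table. The population value is smaller because it divides by 10 instead of 9.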


## Model Performance Box Plot
The box plots below compare the distribution of each metric across runs for the baseline model (red) and the new model (teal).

```python
import plotly.graph_objects as go

def performance_plot(title, mean_ap_base, mean_ap_50_base, mean_ar_base, mean_ap_new, mean_ap_50_new, mean_ar_new):
    # Trace names come from `indices`, so the trace order must match its keys
    names = list(indices.values())
    fig = go.Figure(layout=go.Layout(title=go.layout.Title(text=title)))
    # Fixed y-range keeps plots comparable across experiments; widen it if a metric exceeds 0.5
    fig.update_yaxes(range=[0, 0.5])
    # Baseline runs in red, new-model runs in teal
    fig.add_trace(go.Box(y=mean_ap_base, name=names[0], marker_color='indianred'))
    fig.add_trace(go.Box(y=mean_ap_new, name=names[1], marker_color='lightseagreen'))
    fig.add_trace(go.Box(y=mean_ap_50_base, name=names[2], marker_color='indianred'))
    fig.add_trace(go.Box(y=mean_ap_50_new, name=names[3], marker_color='lightseagreen'))
    fig.add_trace(go.Box(y=mean_ar_base, name=names[4], marker_color='indianred'))
    fig.add_trace(go.Box(y=mean_ar_new, name=names[5], marker_color='lightseagreen'))
    return fig
```

```python
fig = performance_plot(f"Baseline ({base_dataset_name}) vs New ({new_dataset_name}) dataset", mean_ap_base, mean_ap_50_base, mean_ar_base, mean_ap_new, mean_ap_50_new, mean_ar_new)
fig.show()
```
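Each box is drawn from the quartiles of the runs. By Plotly's default, the whiskers reach the most extreme point within 1.5 × IQR of the box, and any point beyond that is drawn separately as an outlier. The sketch below computes those numbers with numpy's default linear interpolation, so it approximates what the plot shows. Plotly's own quartile method may give slightly different values. The `box_summary` helper is hypothetical and used only for illustration.

```python
import numpy as np

mean_ap_base = [0.025, 0.016, 0.011, 0.015, 0.020, 0.032, 0.046, 0.019, 0.013, 0.028]

def box_summary(values):
    """Quartiles and 1.5*IQR outliers, using numpy's default percentile interpolation."""
    arr = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(arr, [25, 50, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = arr[(arr < lower) | (arr > upper)].tolist()
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}

summary = box_summary(mean_ap_base)
print(summary)
```

For the baseline mAP, the run at 0.046 lies above the upper fence. That single run inflates the baseline standard deviation in the statistics table.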

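## P-value
A p-value is the probability of observing a difference between the two models at least as large as the one measured, assuming the null hypothesis is true (here, that the baseline and new models have the same mean metric). P-values range from 0 to 1. A low p-value means the observed difference would be unlikely if the two models performed equally well, so it is evidence against the null hypothesis. It is not the probability that the results occurred by chance. By convention, a significance level of 0.05 is common: if p < 0.05, the difference is called statistically significant.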
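The definition above can be computed directly with a permutation test. Note that this is a different test from the t-test used in this notebook. Under the null hypothesis, the "base" and "new" labels are interchangeable. The test therefore shuffles the labels many times and counts how often the shuffled difference in means is at least as large as the observed one. The helper name, the 10,000 permutations and the fixed seed are illustrative choices.

```python
import numpy as np

mean_ap_base = [0.025, 0.016, 0.011, 0.015, 0.020, 0.032, 0.046, 0.019, 0.013, 0.028]
mean_ap_new = [0.035, 0.037, 0.055, 0.060, 0.069, 0.109, 0.055, 0.069, 0.050, 0.090]

def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference of means."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = abs(b.mean() - a.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled runs to the two groups
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[len(a):].mean() - shuffled[:len(a)].mean())
        if diff >= observed:
            count += 1
    # Adding 1 to both counts includes the observed labelling and keeps p above 0
    return (count + 1) / (n_permutations + 1)

p_mAP = permutation_p_value(mean_ap_base, mean_ap_new)
print(f"mAP permutation p_value = {p_mAP:.4f}")
```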

```python
from scipy import stats

# ttest_ind defaults to equal_var=True, i.e. Student's two-sample t-test
t2, p_value = stats.ttest_ind(mean_ap_base, mean_ap_new)
print("mAP:")
print(f"p_value = {p_value: .4f}")
t2, p_value = stats.ttest_ind(mean_ap_50_base, mean_ap_50_new)
print("mAP@IOU50:")
print(f"p_value = {p_value: .4f}")
t2, p_value = stats.ttest_ind(mean_ar_base, mean_ar_new)
print("mAR:")
print(f"p_value = {p_value: .4f}")
```

mAP:
p_value =  0.0001
mAP@IOU50:
p_value =  0.0011
mAR:
p_value =  0.0023
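`stats.ttest_ind` assumes by default that both groups have equal variances. In the statistics table, the new model's standard deviations are more than twice the baseline's for mAP and mAP50, so that assumption is questionable. Welch's t-test (`equal_var=False`) drops the assumption and is a reasonable cross-check. The sketch below repeats the metric lists so it runs on its own. Its p-values will differ somewhat from the ones printed above, and no particular values are claimed here.

```python
from scipy import stats

mean_ap_base = [0.025, 0.016, 0.011, 0.015, 0.020, 0.032, 0.046, 0.019, 0.013, 0.028]
mean_ap_50_base = [0.052, 0.033, 0.023, 0.031, 0.046, 0.079, 0.118, 0.044, 0.032, 0.082]
mean_ar_base = [0.077, 0.070, 0.038, 0.048, 0.086, 0.114, 0.192, 0.072, 0.059, 0.113]
mean_ap_new = [0.035, 0.037, 0.055, 0.060, 0.069, 0.109, 0.055, 0.069, 0.050, 0.090]
mean_ap_50_new = [0.074, 0.079, 0.126, 0.138, 0.174, 0.307, 0.117, 0.162, 0.097, 0.234]
mean_ar_new = [0.113, 0.095, 0.139, 0.167, 0.164, 0.303, 0.140, 0.199, 0.164, 0.217]

pairs = {
    "mAP": (mean_ap_base, mean_ap_new),
    "mAP@IOU50": (mean_ap_50_base, mean_ap_50_new),
    "mAR": (mean_ar_base, mean_ar_new),
}

welch_p_values = {}
for name, (base, new) in pairs.items():
    # equal_var=False switches ttest_ind from Student's to Welch's t-test
    t_stat, p_value = stats.ttest_ind(base, new, equal_var=False)
    welch_p_values[name] = p_value
    print(f"{name}: t = {t_stat:.3f}, p_value = {p_value:.4f}")
```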
