In [1]:
import numpy as np

# Definitions:


- PPV (precision) = TP / (TP + FP) = 6 / (6 + 30) = 0.167
- FPR (pollution rate) = FP / (FP + TN) = 30 / (30 + 8) = 0.789
- Recall (TPR) = TP / (TP + FN) = 6 / (6 + 2) = 0.750
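
The arithmetic in these definitions can be checked directly. This is a minimal sketch, assuming the example counts implied by the formulas (TP = 6, FP = 30, TN = 8, FN = 2); the variable names are illustrative only.

```python
# Example confusion-matrix counts implied by the formulas above (assumed).
tp, fp, tn, fn = 6, 30, 8, 2

precision = tp / (tp + fp)            # PPV
false_positive_rate = fp / (fp + tn)  # FPR
recall = tp / (tp + fn)               # TPR

print(round(precision, 3), round(false_positive_rate, 3), round(recall, 3))
# prints: 0.167 0.789 0.75
```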


**ROC AUC is computed from the ROC curve (FPR vs. TPR), while PR AUC is computed from the precision-recall curve (TPR vs. PPV).**
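
To make the two curves concrete, the sketch below sweeps a decision threshold over a handful of scores and collects one point per threshold for each curve. The labels and scores are hypothetical, invented purely for illustration, and the names `roc_points` and `pr_points` are assumptions rather than part of this notebook.

```python
# Hypothetical labels and classifier scores, invented for illustration only.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

roc_points, pr_points = [], []
for threshold in sorted(set(scores), reverse=True):
    # Predict positive whenever the score reaches the threshold.
    preds = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    roc_points.append((fp / (fp + tn), tp / (tp + fn)))  # (FPR, TPR)
    pr_points.append((tp / (tp + fn), tp / (tp + fp)))   # (recall, PPV)

print(roc_points[0], roc_points[-1])
print(pr_points[0], pr_points[-1])
```

Both curves share TPR as one axis; they differ only in whether the other axis is normalized by the actual negatives (FPR) or by the predicted positives (PPV).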

###### Imbalanced cases are labeled with numbers (1-5)
###### Balanced cases are labeled with letters (A-E)

In [9]:
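def ppv(tp, fp):
    # Positive predictive value (precision): share of predicted positives that are correct.
    return tp / (tp + fp)


def fpr(fp, tn):
    # False positive rate: share of actual negatives that are predicted positive.
    return fp / (fp + tn)


def tpr(tp, fn):
    # True positive rate (recall): share of actual positives that are detected.
    return tp / (tp + fn)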

In [10]:
# case 1: pos & neg perfect
tp1 = 10
fp1 = 0
tn1 = 90
fn1 = 0

ppv1 = ppv(tp1, fp1)
fpr1 = fpr(fp1, tn1)
tpr1 = tpr(tp1, fn1)

ppv1, fpr1, tpr1

(1.0, 0.0, 1.0)

In [29]:
# case A: pos & neg perfect
tp_a = 10
fp_a = 0
tn_a = 9
fn_a = 0

ppv_a = ppv(tp_a, fp_a)
fpr_a = fpr(fp_a, tn_a)
tpr_a = tpr(tp_a, fn_a)

ppv_a, fpr_a, tpr_a

(1.0, 0.0, 1.0)

- No difference: all three metrics are identical in the imbalanced case (1) and the balanced case (A).

In [12]:
# case 2: pos: rnd, neg: perfect
tp2 = 5
fp2 = 5
tn2 = 90
fn2 = 0

ppv2 = ppv(tp2, fp2)
fpr2 = fpr(fp2, tn2)
tpr2 = tpr(tp2, fn2)

ppv2, fpr2, tpr2

(0.5, 0.05263157894736842, 1.0)

In [13]:
# case B: pos: rnd, neg: perfect
tp_b = 5
fp_b = 5
tn_b = 9
fn_b = 0

ppv_b = ppv(tp_b, fp_b)
fpr_b = fpr(fp_b, tn_b)
tpr_b = tpr(tp_b, fn_b)

ppv_b, fpr_b, tpr_b

(0.5, 0.35714285714285715, 1.0)



- Precision is 50%, but it is not fully representative, because every negative prediction is correct.

- **Moreover, comparing case B with case 2 shows how misleading FPR is in case 2: precision and recall are identical, yet FPR drops from 0.357 to 0.053 only because there are many more true negatives.**
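
The gap between case 2 and case B can be reproduced by holding TP and FP fixed and varying only the number of true negatives. This is a small sketch; the helper name `rates` and the extra TN value of 900 are assumptions added for illustration.

```python
def rates(tp, fp, tn):
    """Return (PPV, FPR) for the given confusion-matrix counts."""
    return tp / (tp + fp), fp / (fp + tn)

# Same TP and FP as in cases B and 2; only TN grows (900 is hypothetical).
results = {tn: rates(5, 5, tn) for tn in (9, 90, 900)}
for tn, (ppv_val, fpr_val) in results.items():
    print(f"TN={tn:>3}  PPV={ppv_val:.3f}  FPR={fpr_val:.3f}")
```

PPV stays at 0.5 in every row, while FPR shrinks as TN grows, even though the classifier makes exactly the same mistakes on the positive side.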



In [14]:
# case 3: pos: perfect, neg: mostly correct (80 TN, 10 FN)
tp3 = 10
fp3 = 0
tn3 = 80
fn3 = 10

ppv3 = ppv(tp3, fp3)
fpr3 = fpr(fp3, tn3)
tpr3 = tpr(tp3, fn3)

ppv3, fpr3, tpr3

(1.0, 0.0, 0.5)

In [19]:
# case C: pos: perfect, neg: 50-50
tp_c = 10
fp_c = 0
tn_c = 4
fn_c = 5

ppv_c = ppv(tp_c, fp_c)
fpr_c = fpr(fp_c, tn_c)
tpr_c = tpr(tp_c, fn_c)

ppv_c, fpr_c, tpr_c

(1.0, 0.0, 0.6666666666666666)

- We cannot observe a significant difference between these two cases (TPR of 0.5 vs. 0.667).

In [20]:
# case 4: pos: perfect, neg: 50-50 
tp4 = 10
fp4 = 0
tn4 = 45
fn4 = 45

ppv4 = ppv(tp4, fp4)
fpr4 = fpr(fp4, tn4)
tpr4 = tpr(tp4, fn4)

ppv4, fpr4, tpr4

(1.0, 0.0, 0.18181818181818182)

In [24]:
# case D: pos: perfect, neg: 50-50 (same counts as case C)
tp_d = 10
fp_d = 0
tn_d = 4
fn_d = 5

ppv_d = ppv(tp_d, fp_d)
fpr_d = fpr(fp_d, tn_d)
tpr_d = tpr(tp_d, fn_d)

ppv_d, fpr_d, tpr_d

(1.0, 0.0, 0.6666666666666666)

- We observe a significant difference between the balanced and imbalanced datasets w.r.t. recall (0.667 vs. 0.182), which can lead to a misreading of an algorithm's performance in the imbalanced case if one relies on ROC AUC plots.

In [26]:
# case 5: pos: perfect, neg: bad
tp5 = 10
fp5 = 0
tn5 = 1
fn5 = 89

ppv5 = ppv(tp5, fp5)
fpr5 = fpr(fp5, tn5)
tpr5 = tpr(tp5, fn5)

ppv5, fpr5, tpr5

(1.0, 0.0, 0.10101010101010101)

In [28]:
# case E: pos: perfect, neg: bad
tp_e = 10
fp_e = 0
tn_e = 1
fn_e = 8

ppv_e = ppv(tp_e, fp_e)
fpr_e = fpr(fp_e, tn_e)
tpr_e = tpr(tp_e, fn_e)

ppv_e, fpr_e, tpr_e

(1.0, 0.0, 0.5555555555555556)

- Again, the difference is clear: recall is 0.101 in case 5 vs. 0.556 in case E.
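
For a side-by-side view, the ten confusion matrices used above can be tabulated in one pass. The sketch re-declares the counts from the cells above; the `cases` dictionary and the `metrics` helper are illustrative names, not part of the original cells.

```python
# (tp, fp, tn, fn) for each case, copied from the cells above.
cases = {
    "1": (10, 0, 90, 0),  "A": (10, 0, 9, 0),
    "2": (5, 5, 90, 0),   "B": (5, 5, 9, 0),
    "3": (10, 0, 80, 10), "C": (10, 0, 4, 5),
    "4": (10, 0, 45, 45), "D": (10, 0, 4, 5),
    "5": (10, 0, 1, 89),  "E": (10, 0, 1, 8),
}

def metrics(tp, fp, tn, fn):
    """Return (PPV, FPR, TPR) for one confusion matrix."""
    return tp / (tp + fp), fp / (fp + tn), tp / (tp + fn)

table = {name: metrics(*counts) for name, counts in cases.items()}
print("case   PPV    FPR    TPR")
for name, (p, f, t) in table.items():
    print(f"{name:>4}  {p:.3f}  {f:.3f}  {t:.3f}")
```

Each numbered row sits next to its lettered counterpart, so the pairs that diverge (FPR for 2 vs. B, TPR for 4 vs. D and 5 vs. E) are easy to spot.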

## Conclusion:
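The ROC curve plots FPR against TPR (recall), and on imbalanced data both values can be misleadingly low: FPR in case 2, and recall in cases 4 and 5, compared with their balanced counterparts. This is why ROC AUC cannot be considered a good evaluation metric/plot for imbalanced datasets.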