## <center> P-value calculation for JIP models </center>

This Notebook calculates p-values for the predictions made by the JIP models on the test datasets.
Before executing this Notebook, make sure that all 6 artefact models have been trained and tested with the provided code, in three steps:
    
1. Preprocess all datasets (train and test) using the following command:
```bash
python JIP.py --mode preprocess --device <cuda_id> --datatype train
```  
and   
```bash
python JIP.py --mode preprocess --device <cuda_id> --datatype test
```
2. Train all 6 models using the following command:
```bash
python JIP.py --mode train --device <cuda_id> --datatype train \
                 --noise_type <noise_model> --store_data
```
3. Perform the testing as follows:
```bash
python JIP.py --mode testIOOD --device <cuda_id> --datatype test \
                 --noise_type <noise_model> --store_data
```


Once this is finished, everything is set up to run the Notebook.

#### Import necessary libraries

In [1]:
import warnings
warnings.filterwarnings('ignore')

import os
import math
from itertools import zip_longest  # required by grouper() below
import numpy as np
from sklearn.metrics import confusion_matrix

# -- Grouper from https://stackoverflow.com/questions/8991506/iterate-an-iterator-by-chunks-of-n-in-python -- #
def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

#### Set necessary directories
Specify the `train_base` and `test_base` directories. These are the full paths to the `output` folders inside the JIP `train_dirs` and `test_dirs` folders, for instance: `../JIP/train_dirs/output` and `../JIP/test_dirs/output`.

In [2]:
# Set the base path to JIP/train_dirs/output folder
train_base = '<path>/JIP/train_dirs/output/'
# Set the base path to JIP/test_dirs/output folder
test_base = '<path>/JIP/test_dirs/output/'
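
The loading cell further down reads one `accuracy_detailed_test.npy` file per artefact from the `results`, `testID_results` and `testOOD_results` folders. A minimal sketch for checking up front that these files exist, assuming exactly that folder layout; the helper names `expected_result_files` and `missing_result_files` are hypothetical and not part of JIP:

```python
import os

# Artefact names and sub-folders as used by the loading cell of this notebook.
ARTEFACTS = ['blur', 'ghosting', 'motion', 'noise', 'resolution', 'spike']
RESULT_FILE = 'accuracy_detailed_test.npy'


def expected_result_files(train_base, test_base, artefacts=ARTEFACTS):
    """List every result file the notebook will try to load (hypothetical helper)."""
    paths = []
    for artefact in artefacts:
        paths.append(os.path.join(train_base, artefact, 'results', RESULT_FILE))
        paths.append(os.path.join(test_base, artefact, 'testID_results', RESULT_FILE))
        paths.append(os.path.join(test_base, artefact, 'testOOD_results', RESULT_FILE))
    return paths


def missing_result_files(train_base, test_base, artefacts=ARTEFACTS):
    """Return the expected result files that do not exist on disk."""
    return [p for p in expected_result_files(train_base, test_base, artefacts)
            if not os.path.isfile(p)]
```

Calling `missing_result_files(train_base, test_base)` should give an empty list once the three steps above have finished for every artefact.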

#### Load data

In [3]:
artefacts = ['blur', 'ghosting', 'motion', 'noise', 'resolution', 'spike']

data = dict()
for artefact in artefacts:
    # Load data
    dl = np.load(os.path.join(train_base, artefact, 'results', 'accuracy_detailed_test.npy'))
    ID = np.load(os.path.join(test_base, artefact, 'testID_results', 'accuracy_detailed_test.npy'))
    OOD = np.load(os.path.join(test_base, artefact, 'testOOD_results', 'accuracy_detailed_test.npy'))
    
    # Create one-hot vectors from the predicted values (same step for all three sets)
    for results in (dl, ID, OOD):
        for idx, a in enumerate(results):
            b = np.zeros_like(a[1])
            b[a[1].argmax()] = 1
            a[1] = b
            results[idx] = a
        
    # Save data in dictionary
    data['test_dl-' + artefact] = dl
    data['test_ID-' + artefact] = ID
    data['test_OOD-' + artefact] = OOD
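
The one-hot step above keeps only the position of the highest score in each prediction vector. A small stand-alone illustration on a made-up score vector; the values are invented for this example, and plain lists are used instead of NumPy arrays to keep it dependency-free:

```python
def one_hot_argmax(scores):
    """Return a one-hot list with a 1 at the position of the largest score.

    Ties are resolved in favour of the first maximum, as numpy.argmax does.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    return [1 if i == best else 0 for i in range(len(scores))]


# Made-up class scores for five quality labels.
example_scores = [0.05, 0.10, 0.60, 0.20, 0.05]
print(one_hot_argmax(example_scores))  # [0, 0, 1, 0, 0]
```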

#### Transform data into Confusion Matrix

In [4]:
# Transform the data into right format for calculations
y_yhats = dict()
# Loop through all data sets
for k, v in data.items():
    y_yhats[k] = dict()
    y_yhats[k]['prediction'] = list()
    y_yhats[k]['ground_truth'] = list()
    # Change the format of y_yhats --> split in prediction and GT
    for y_yhat in v:
        y_yhats[k]['prediction'].append(y_yhat[1])
        y_yhats[k]['ground_truth'].append(y_yhat[0])
    y_yhats[k]['prediction'] = np.array(y_yhats[k]['prediction'])
    y_yhats[k]['ground_truth'] = np.array(y_yhats[k]['ground_truth'])

#### Calculate p-values for all test sets, assuming that bad quality images (labels 1, 2 and 3, and optionally 4) are really rejected by each classifier

In [5]:
print('NOTE: In every confusion matrix: x-axis --> predicted, y-axis --> actual.')

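def p_test(p, y, n):
    '''Return the probability of at least y successes in y + n trials
    if the success probability under the null hypothesis is p.
    y is the number of successful, n the number of unsuccessful results.'''
    prob = 0
    combined = y + n
    for times in np.arange(y, combined+1, 1):
        term = p**times * (1-p)**(combined-times)
        coeff = math.comb(combined, times)  # binomial coefficient
        prob += term * coeff
    return prob

def find_p_p_test(y, n):
    '''Return the largest p (in steps of 0.01) whose p-value is at most 0.05,
    together with that p-value. Return None if no such p exists.'''
    for p in np.arange(1, 0, -0.01):
        #print(p,p_test(p,y,n))
        p_value = p_test(p, y, n)
        if p_value <= 0.05:
            return round(p, 2), round(p_value, 2)
    return None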

# Loop through the transformed data and calculate everything
for test_name, results in y_yhats.items():
    confusion = confusion_matrix(results['ground_truth'].argmax(axis=1),
                                 results['prediction'].argmax(axis=1),
                                 labels=[0, 1, 2, 3, 4])
    print('\n{}:'.format(test_name))
    print('Confusion Matrix')
    print(confusion)
    
    # Compress the confusion matrix
    #print()
    #print(confusion[:3, :3]) #TP
    #print(confusion[3:, 3:]) #TN
    #print(confusion[3:, :3]) #FP
    #print(confusion[:3, 3:]) #FN
    compr_confusion = np.array([[confusion[:3, :3].sum(), confusion[:3, 3:].sum()],
                               [confusion[3:, :3].sum(), confusion[3:, 3:].sum()]])
    print('\nCompressed Confusion Matrix')
    print(compr_confusion)
    
    print('\nsensitivity: (probability, p-value) for which we can reject H0: {}'.format(find_p_p_test(confusion[:3, :3].sum(), confusion[:3, 3:].sum())))
    print('specificity: (probability, p-value) for which we can reject H0: {}\n'.format(find_p_p_test(confusion[3:, 3:].sum(), confusion[3:, :3].sum())))

NOTE: In every confusion matrix: x-axis --> predicted, y-axis --> actual.

test_dl-blur:
Confusion Matrix
[[195  53   2  33   5]
 [  8 176  21  11   7]
 [ 39  33 157  14  15]
 [  1  16  32  59  19]
 [ 18  22  53  21 190]]

Compressed Confusion Matrix
[[684  85]
 [142 289]]

sensitivity: (probability, p-value) for which we can reject H0: (0.86, 0.01)
specificity: (probability, p-value) for which we can reject H0: (0.63, 0.04)


test_ID-blur:
Confusion Matrix
[[ 0  0  0  0  0]
 [ 0  0  0  0  0]
 [ 0 23  9 35 13]
 [ 0  6  5 18 19]
 [ 0  0  2  0 30]]

Compressed Confusion Matrix
[[32 48]
 [13 67]]

sensitivity: (probability, p-value) for which we can reject H0: (0.3, 0.04)
specificity: (probability, p-value) for which we can reject H0: (0.75, 0.04)


test_OOD-blur:
Confusion Matrix
[[ 0  0  0  0  0]
 [ 0  0  0  0  0]
 [ 0  0  0  0  0]
 [ 0 12  2 40 42]
 [ 0 15  9 92 12]]

Compressed Confusion Matrix
[[  0   0]
 [ 38 186]]

sensitivity: (probability, p-value) for which we can reject H0: Non
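
`p_test` computes the upper tail of a binomial distribution, that is the probability of at least `y` successes in `y + n` trials, and `find_p_p_test` looks for the largest `p` on a 0.01 grid for which this tail is at most 0.05. A dependency-free sketch of the same idea on tiny inputs; the names `binom_upper_tail` and `largest_rejectable_p` are hypothetical, and the integer grid steps are an assumption chosen here to avoid floating-point drift:

```python
import math


def binom_upper_tail(p, y, n):
    """P(X >= y) for X ~ Binomial(y + n, p): y successes, n failures observed."""
    total = y + n
    return sum(math.comb(total, k) * p**k * (1 - p)**(total - k)
               for k in range(y, total + 1))


def largest_rejectable_p(y, n, alpha=0.05, steps=100):
    """Largest p on the grid 1, ..., 1/steps whose upper tail is <= alpha.

    Returns (p, tail), or None when no grid value satisfies the condition,
    for example when y == 0 (the tail is then always 1).
    """
    for k in range(steps, 0, -1):
        p = k / steps
        tail = binom_upper_tail(p, y, n)
        if tail <= alpha:
            return p, tail
    return None


# 1 success and 1 failure under p = 0.5: P(X >= 1) = 1 - 0.25 = 0.75
print(binom_upper_tail(0.5, 1, 1))
# No successes at all: nothing can be rejected.
print(largest_rejectable_p(0, 5))
```

When a group contains no samples, as in the first row `[0 0]` of the compressed `test_OOD-blur` matrix, the tail equals 1 for every `p`, so no value can be rejected and the search returns `None`.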

In [6]:
# Loop through the transformed data and calculate everything
for test_name, results in y_yhats.items():
    confusion = confusion_matrix(results['ground_truth'].argmax(axis=1),
                                 results['prediction'].argmax(axis=1),
                                 labels=[0, 1, 2, 3, 4])
    print('\n{}:'.format(test_name))
    print('Confusion Matrix')
    print(confusion)
    
    # Compress the confusion matrix
    #print()
    #print(confusion[:4, :4]) #TP
    #print(confusion[4:, 4:]) #TN
    #print(confusion[4:, :4]) #FP
    #print(confusion[:4, 4:]) #FN
    compr_confusion = np.array([[confusion[:4, :4].sum(), confusion[:4, 4:].sum()],
                               [confusion[4:, :4].sum(), confusion[4:, 4:].sum()]])
    print('\nCompressed Confusion Matrix')
    print(compr_confusion)
    
    print('\nsensitivity: (probability, p-value) for which we can reject H0: {}'.format(find_p_p_test(confusion[:4, :4].sum(), confusion[:4, 4:].sum())))
    print('specificity: (probability, p-value) for which we can reject H0: {}\n'.format(find_p_p_test(confusion[4:, 4:].sum(), confusion[4:, :4].sum())))


test_dl-blur:
Confusion Matrix
[[195  53   2  33   5]
 [  8 176  21  11   7]
 [ 39  33 157  14  15]
 [  1  16  32  59  19]
 [ 18  22  53  21 190]]

Compressed Confusion Matrix
[[850  46]
 [114 190]]

sensitivity: (probability, p-value) for which we can reject H0: (0.93, 0.01)
specificity: (probability, p-value) for which we can reject H0: (0.57, 0.03)


test_ID-blur:
Confusion Matrix
[[ 0  0  0  0  0]
 [ 0  0  0  0  0]
 [ 0 23  9 35 13]
 [ 0  6  5 18 19]
 [ 0  0  2  0 30]]

Compressed Confusion Matrix
[[96 32]
 [ 2 30]]

sensitivity: (probability, p-value) for which we can reject H0: (0.67, 0.03)
specificity: (probability, p-value) for which we can reject H0: (0.81, 0.04)


test_OOD-blur:
Confusion Matrix
[[ 0  0  0  0  0]
 [ 0  0  0  0  0]
 [ 0  0  0  0  0]
 [ 0 12  2 40 42]
 [ 0 15  9 92 12]]

Compressed Confusion Matrix
[[ 54  42]
 [116  12]]

sensitivity: (probability, p-value) for which we can reject H0: (0.47, 0.04)
specificity: (probability, p-value) for which we can reject H0:
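
The loops in `In [5]` and `In [6]` differ only in where the 5x5 confusion matrix is split into a 2x2 one: after the first three labels in `In [5]`, after the first four in `In [6]`. A stand-alone sketch of that compression, using the `test_ID-blur` matrix printed above and plain lists instead of NumPy; the function name `compress_confusion` is hypothetical:

```python
def compress_confusion(confusion, split):
    """Collapse a square confusion matrix into 2x2 blocks at index `split`.

    Rows are actual labels, columns are predicted labels. Labels below
    `split` form the first group, the remaining labels the second group.
    """
    def block(rows, cols):
        return sum(confusion[r][c] for r in rows for c in cols)

    size = len(confusion)
    first, second = range(split), range(split, size)
    return [[block(first, first), block(first, second)],
            [block(second, first), block(second, second)]]


# test_ID-blur confusion matrix as printed in the notebook output.
test_id_blur = [[0, 0, 0, 0, 0],
                [0, 0, 0, 0, 0],
                [0, 23, 9, 35, 13],
                [0, 6, 5, 18, 19],
                [0, 0, 2, 0, 30]]

print(compress_confusion(test_id_blur, 3))  # [[32, 48], [13, 67]]
print(compress_confusion(test_id_blur, 4))  # [[96, 32], [2, 30]]
```

Both results match the compressed matrices reported for `test_ID-blur` in the two output listings.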