# **Rejection Evaluation**
***

## **Seed 55**
***

We trained a ResNet-50 on the RSNAPneumonia dataset. This is a binary classification task in which each chest X-ray image is classified as either sick or normal. Let's take a look at the results dataframe:

In [2]:
import pandas as pd
output_path = r"C:\Computer_Vision\Medical_Images_With_Rejection\OurSolution\failure_detection_benchmark\outputs\RSNAPneumonia\resnet50\dropout_all_layers_autolr_paper\seed_55\failure_detection\scores_df.csv"
scores_df = pd.read_csv(output_path)
scores_df.head()

| Unnamed: 0 | Targets | Predictions | IsCorrect | Probas | Threshold | Baseline | doctor_alpha | mcmc_soft_scores | mcmc_predictions | mcmc_probas | ... | Laplace_threshold | Laplace_score | Laplace_probas | TrustScore | ConfidNet_scores | SWAG_probas | SWAG_targets | SWAG_predictions | SWAG_threshold | SWAG_score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | False | True | 0.069185 | 0.256104 | 0.930815 | -0.147838 | 0.910226 | False | 0.089774 | ... | 0.270832 | 0.91082 | 0.08918 | 1.170258 | 0.883999 | 0.202565 | 0 | False | 0.365878 | 0.797435 |
| 1 | 0 | False | True | 0.04234 | 0.256104 | 0.95766 | -0.08825 | 0.966602 | False | 0.033398 | ... | 0.270832 | 0.931371 | 0.068629 | 1.159251 | 0.929127 | 0.047159 | 0 | False | 0.365878 | 0.952841 |
| 2 | 0 | False | True | 0.119516 | 0.256104 | 0.880484 | -0.266567 | 0.880902 | False | 0.119098 | ... | 0.270832 | 0.855339 | 0.144661 | 1.57555 | 0.786745 | 0.091072 | 0 | False | 0.365878 | 0.908928 |
| 3 | 0 | False | True | 0.020617 | 0.256104 | 0.979383 | -0.042083 | 0.97586 | False | 0.02414 | ... | 0.270832 | 0.95601 | 0.04399 | 1.306924 | 0.96305 | 0.058561 | 0 | False | 0.365878 | 0.941439 |
| 4 | 1 | True | True | 0.501353 | 0.256104 | 0.501353 | -0.999985 | 0.450166 | True | 0.450166 | ... | 0.270832 | 0.501297 | 0.501297 | 1.012178 | 0.499037 | 0.400292 | 1 | True | 0.365878 | 0.400292 |


Let's see which columns we have here:

In [3]:
scores_df.columns

Index(['Targets', 'Predictions', 'IsCorrect', 'Probas', 'Threshold',
       'Baseline', 'doctor_alpha', 'mcmc_soft_scores', 'mcmc_predictions',
       'mcmc_probas', 'mcmc_entropy_scores', 'Laplace_targets',
       'Laplace_predictions', 'Laplace_threshold', 'Laplace_score',
       'Laplace_probas', 'TrustScore', 'ConfidNet_scores', 'SWAG_probas',
       'SWAG_targets', 'SWAG_predictions', 'SWAG_threshold', 'SWAG_score'],
      dtype='object')

The dataframe consists of the following columns:
* Targets - The ground-truth (GT) classes.
* Predictions - The predicted classes.
* IsCorrect - Whether the prediction matches the target.
* Probas - The predicted probability of the positive class.
* Threshold - The decision threshold that yields a false positive rate (FPR) as close as possible to 0.2.
* Baseline - The baseline softmax confidence score.
* doctor_alpha - A scorer that quantifies the likelihood of a sample being misclassified, which is why its values are negative (as stated in the article).
* mcmc_soft_scores, mcmc_entropy_scores - From the paper: "MC-dropout (MC): Gal & Ghahramani (2016) showed that training a neural network with dropout regularization (Srivastava et al., 2014) produces a Bayesian approximation of the posterior, where the approximation is obtained by Monte-Carlo sampling of the network’s parameters i.e. by applying dropout at test-time and averaging the outputs over several inference passes. The confidence in the prediction can then be approximated by the negative entropy of the outputs; or by taking the softmax confidence score on the averaged outputs."
* mcmc_predictions - The predictions made with Monte-Carlo sampling of the network's parameters.
* mcmc_probas - The predicted probabilities obtained with Monte-Carlo sampling.
* Laplace_targets - The targets for the Laplace method. They are the same as the general targets.
* Laplace_predictions - The predictions made by Laplace.
* Laplace_threshold - For a binary classification problem, the Laplace scorer computes the ROC curve on the validation set and searches for the threshold that yields a false positive rate as close as possible to 0.2. That threshold is stored in this column.
* Laplace_score - The Laplace confidence score.
* Laplace_probas - The probabilities of the Laplace predictions (classifying 0 or 1).
* TrustScore - The score produced by the TrustScore scorer.
* ConfidNet_scores - The scores produced by ConfidNet.
* SWAG_probas - The probabilities of the SWAG predictions (classifying 0 or 1).
* SWAG_targets - The targets for the SWAG method. They are the same as the general targets.
* SWAG_predictions - The predictions made by SWAG.
* SWAG_threshold - For a binary classification problem, the SWAG scorer computes the ROC curve on the validation set and searches for the threshold that yields a false positive rate as close as possible to 0.2. That threshold is stored in this column.
* SWAG_score - The SWAG confidence scores.
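The threshold columns described above all follow the same idea: pick the decision threshold whose false positive rate on the validation set is closest to 0.2. Below is a minimal sketch of that search in plain Python. It is not the benchmark's implementation (which, as described above, works from the ROC curve); the function name `threshold_for_target_fpr`, the strict `>` decision rule, and the toy data are assumptions made for illustration.

```python
def threshold_for_target_fpr(targets, probas, target_fpr=0.2):
    """Return (threshold, fpr) with the FPR closest to target_fpr.

    Assumption: a sample is predicted positive when its probability
    is strictly greater than the threshold.
    """
    # Probabilities assigned to the truly negative samples.
    negatives = [p for t, p in zip(targets, probas) if t == 0]
    if not negatives:
        raise ValueError("need at least one negative sample to compute an FPR")

    best = None  # (gap to target, threshold, fpr)
    # Every distinct probability is a candidate threshold.
    for thr in sorted(set(probas)):
        fpr = sum(p > thr for p in negatives) / len(negatives)
        gap = abs(fpr - target_fpr)
        # Keep the first (lowest) threshold that achieves the smallest gap.
        if best is None or gap < best[0]:
            best = (gap, thr, fpr)
    return best[1], best[2]


# Toy validation set (made up for illustration): 5 negatives, 3 positives.
toy_targets = [0, 0, 0, 0, 0, 1, 1, 1]
toy_probas = [0.05, 0.10, 0.20, 0.30, 0.60, 0.40, 0.70, 0.90]
toy_threshold, toy_fpr = threshold_for_target_fpr(toy_targets, toy_probas)
print(toy_threshold, toy_fpr)
```

On the toy data, only one of the five negatives (0.60) lies above the selected threshold, so the FPR is exactly 0.2. On real validation data the FPR will usually only approximate the target.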

We used the Baseline scorer, doctor_alpha, MC-softmax, MC-entropy, Laplace, TrustScore, ConfidNet, and SWAG (8 in total). The ones missing compared to the paper are DUQ (which we failed to train due to insufficient CUDA memory) and the ensemble (we used only 1 seed, so an ensemble is not applicable).
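To make the simplest of these scores concrete, here is a minimal sketch of how they relate to the predicted positive-class probability. The formulas are inferred from the five rows shown in the dataframe preview, not taken from the benchmark code. They reproduce the `Baseline`, `SWAG_score` and `doctor_alpha` values in those rows. The function names are hypothetical, and the negative-entropy helper uses the natural logarithm as an assumption, since the `mcmc_entropy_scores` column is not visible in the preview.

```python
import math


def predicted_class_confidence(p_pos, threshold):
    """Probability assigned to the predicted class.

    Assumption: the prediction is positive when p_pos > threshold.
    """
    return p_pos if p_pos > threshold else 1.0 - p_pos


def doctor_alpha_score(p_pos):
    """Negated ratio (1 - g) / g, where g is the sum of squared class probabilities.

    The ratio grows with the chance of misclassification, so it is negated
    to make higher values mean higher confidence.
    """
    g = p_pos ** 2 + (1.0 - p_pos) ** 2
    return -(1.0 - g) / g


def negative_entropy(p_pos):
    """Negative entropy of a binary distribution (natural log, an assumption)."""
    probs = [p_pos, 1.0 - p_pos]
    return sum(p * math.log(p) for p in probs if p > 0.0)


# Values copied from row 0 of the preview: Probas and Threshold.
print(predicted_class_confidence(0.069185, 0.256104))  # compare with Baseline
print(doctor_alpha_score(0.069185))                    # compare with doctor_alpha
```

Because `Probas` is rounded to six decimals in the preview, the recomputed values match the table only up to rounding.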
