# Results from `1_vs_all` datasets

At first glance, performance is the same (up to insignificant variation) for the 1_vs_all classification of any of the 10 antigens. Accuracy is 89%, but the data are imbalanced 1:9, so a classifier that always predicts the negative class would already reach 90%. The other metrics are much weaker: balanced accuracy is 59%, precision 43%, recall 20%, and AUC 73%.

Which metric to use?

https://neptune.ai/blog/balanced-accuracy
```
F1 score doesn’t care about how many true negatives are being classified. When working on an imbalanced dataset that demands attention on the negatives, Balanced Accuracy does better than F1. In cases where positives are as important as negatives, balanced accuracy is a better metric for this than F1. F1 is a great scoring metric for imbalanced data when more attention is needed on the positives. 
```
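To make the difference between these metrics concrete, here is a minimal sketch that computes them from a confusion matrix. The counts are hypothetical, chosen only to mimic a 1:9 imbalance with about 20% recall; they are not taken from the actual runs, and `binary_metrics` is an illustrative helper, not part of `NegativeClassOptimization`.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute common binary classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        # F1 of the positive class; true negatives do not appear in it.
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts: 1,000 positives and 9,000 negatives (1:9 imbalance).
metrics = binary_metrics(tp=200, fn=800, fp=270, tn=8730)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts accuracy is high because the many true negatives dominate it, while balanced accuracy and F1 expose the low recall.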


A potentially insightful question is whether the random classifiers picked similar 3-mers as important features. This notebook is the place to check that.
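One possible way to quantify "similar 3-mers" is the Jaccard overlap between the top-k features of each pair of classifiers. The sketch below assumes the feature importances are available as a mapping from 3-mer to score per antigen; the values, the helper names, and the choice of k are all hypothetical and only illustrate the comparison.

```python
from itertools import combinations


def top_k_features(importances: dict, k: int) -> set:
    """Return the k features with the highest importance score."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:k])


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical importances for two antigens (not real results).
importances_by_antigen = {
    "3RAJ": {"CAR": 0.30, "ARD": 0.25, "YYG": 0.20, "GSS": 0.05},
    "1OB1": {"CAR": 0.28, "YYG": 0.22, "WFA": 0.21, "ARD": 0.04},
}

top = {ag: top_k_features(imp, k=3) for ag, imp in importances_by_antigen.items()}
overlaps = {(a, b): jaccard(top[a], top[b]) for a, b in combinations(top, 2)}
print(overlaps)
```

An overlap close to 1 for every pair would suggest that the classifiers rely on the same 3-mers regardless of the antigen.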

In [1]:
from pathlib import Path
import json
import re
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import NegativeClassOptimization.config as config

In [9]:
def antigen_from_dir_name(dir_name) -> str:
    """Extract the antigen name from a directory named `fit_<antigen>_vs_all_dataset_instruction`."""
    return (
        re.search(
            r"fit_(.*)_vs_all_dataset_instruction",
            dir_name
        ).groups()[0]
    )


BASE_PATH = Path(config.IMMUNE_ML_BASE_PATH) / "1_vs_all_analysis_1_out"


dfs = []
fit_paths = list(BASE_PATH.glob("fit_*_vs_all_dataset_instruction"))
for path in fit_paths:
    df_i = pd.read_csv(path / "binder_all_assessment_performances.csv")
    assert df_i.shape[0] == 1
    
    ag = antigen_from_dir_name(path.name)
    df_i['Antigen'] = ag
    dfs.append(df_i)

df = pd.concat(dfs, axis=0)
# balanced_accuracy = (recall + specificity) / 2, so specificity can be recovered from it.
df["specificity"] = 2*df["balanced_accuracy"] - df["recall"]
df[["Antigen", "balanced_accuracy", "accuracy", "recall", "precision", "auc", "f1_macro", "specificity"]]

|   | Antigen | balanced_accuracy | accuracy | recall | precision | auc | f1_macro | specificity |
|---|---------|-------------------|----------|--------|-----------|-----|----------|-------------|
| 0 | 3RAJ | 0.587568 | 0.89432 | 0.204877 | 0.431416 | 0.732646 | 0.610404 | 0.970259 |
| 0 | 1OB1 | 0.58753 | 0.894368 | 0.204733 | 0.431811 | 0.733259 | 0.610393 | 0.970328 |
| 0 | 1NSN | 0.588244 | 0.894692 | 0.205933 | 0.435135 | 0.732928 | 0.611378 | 0.970555 |
| 0 | 1ADQ | 0.587733 | 0.894501 | 0.205021 | 0.43312 | 0.732693 | 0.610698 | 0.970444 |
| 0 | 3VRL | 0.587173 | 0.894225 | 0.204109 | 0.430321 | 0.732313 | 0.609913 | 0.970238 |
| 0 | 1WEJ | 0.587776 | 0.894311 | 0.205357 | 0.431467 | 0.733702 | 0.610626 | 0.970196 |
| 0 | 5E94 | 0.588021 | 0.894559 | 0.205597 | 0.433809 | 0.733613 | 0.611049 | 0.970444 |
| 0 | 1H0D | 0.587935 | 0.894521 | 0.205453 | 0.433418 | 0.73416 | 0.610932 | 0.970418 |
| 0 | 2YPV | 0.587498 | 0.894463 | 0.204541 | 0.432633 | 0.73317 | 0.610417 | 0.970455 |
| 0 | 1FBI | 0.587493 | 0.894416 | 0.204589 | 0.432208 | 0.732752 | 0.610382 | 0.970396 |
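
As a quick sanity check on the `specificity` column, the sketch below recomputes it for the 3RAJ row and then tests whether the reported accuracy is roughly what recall and specificity imply. The prevalence of exactly 0.1 is an assumption based on the stated 1:9 imbalance; the true class ratio in the test split may differ slightly.

```python
# Values copied from the 3RAJ row of the table.
balanced_accuracy = 0.587568
recall = 0.204877
accuracy = 0.89432

# balanced_accuracy = (recall + specificity) / 2, solved for specificity.
specificity = 2 * balanced_accuracy - recall

# Assumption: positives make up exactly 10% of the data (1:9 imbalance).
prevalence = 0.1
expected_accuracy = prevalence * recall + (1 - prevalence) * specificity

print(f"specificity:       {specificity:.6f}")
print(f"expected accuracy: {expected_accuracy:.4f} (reported: {accuracy:.4f})")
```

The two accuracy values agree to within about 0.001, which is consistent with the reported metrics and with a class ratio close to 1:9.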
