## Execution of ABROCA code
This notebook demonstrates several ways to run the ABROCA code, depending on the output you need.\
Change the file paths and parameters below to match your own data before running.

In [None]:
import abroca
import pandas as pd
import numpy as np

#### Running on Binary Attributes
The simplest way to run ABROCA: two values of one attribute are compared with each other. \
Omit bootstrap if you do not want bootstraps, and set getGraph = False if you do not want a graph.

In [None]:
df = pd.read_csv("/Users/shent/Desktop/summer23/fairness/abroca_boot/lens_merged_recon34_10.csv")
 
df.name="Eedi_Small_LENS"   # the name of the output folder
actual="correct"
predicted="probability_correct"
bootstrap=10    # customize 

demographic="PremiumPupil"

bin1=1
bin2=0

abroca_val=abroca.ABROCA(df, demographic, actual, predicted, bin1, bin2, bootstrap=bootstrap, getGraph=True) 
print(f"{abroca_val} (ABROCA value ({bin1} vs {bin2}) for {df.name} on {demographic})")
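
To make the returned number easier to interpret: ABROCA (absolute between-ROC area) is the area between the ROC curves of two groups. The block below is a minimal NumPy sketch of that idea. It is not the `abroca` module's implementation, whose internals are not shown in this notebook. The helper names `roc_points` and `abroca_sketch`, the grid size, and the toy data are assumptions made for illustration.

```python
import numpy as np

def roc_points(y_true, y_score):
    """Return (fpr, tpr) for binary labels and scores; needs both classes present."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    order = np.argsort(-y_score, kind="mergesort")  # sort by descending score
    y_true, y_score = y_true[order], y_score[order]
    # keep one ROC point per distinct score so tied scores move together
    distinct = np.where(np.diff(y_score))[0]
    idx = np.r_[distinct, y_true.size - 1]
    tps = np.cumsum(y_true)[idx]      # true positives at each threshold
    fps = (1 + idx) - tps             # false positives at each threshold
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr

def abroca_sketch(y_true, y_score, group, bin1, bin2, n_grid=1001):
    """Area between the ROC curves of group == bin1 and group == bin2."""
    y_true, y_score, group = map(np.asarray, (y_true, y_score, group))
    grid = np.linspace(0.0, 1.0, n_grid)  # shared false-positive-rate grid
    curves = []
    for value in (bin1, bin2):
        mask = group == value
        fpr, tpr = roc_points(y_true[mask], y_score[mask])
        curves.append(np.interp(grid, fpr, tpr))
    gap = np.abs(curves[0] - curves[1])
    # trapezoidal rule over the shared grid
    return float(np.sum((gap[1:] + gap[:-1]) / 2 * np.diff(grid)))

# toy data (hypothetical): the model ranks group "a" well and group "b" poorly
y = np.array([1, 1, 0, 0, 1, 1, 0, 0])
score = np.array([0.9, 0.8, 0.2, 0.1, 0.1, 0.2, 0.8, 0.9])
group = np.array(["a"] * 4 + ["b"] * 4)
print(abroca_sketch(y, score, group, "a", "b"))
```

A value near 0 means the two groups' ROC curves nearly coincide; larger values mean the model's ranking quality differs more between the groups.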

#### Multi-Classification Attributes
Use this for attributes with more than two values to compare.\
\
For a pairwise comparison, run the code block above with bin1 and bin2 set to the two values.\
For 1-vs-all or 1-vs-others tests, run the code blocks below.\
Set bin2 to "all" or "other" accordingly.

In [None]:
# These are all the same as above
df = pd.read_csv("/Users/shent/Desktop/summer23/fairness/madd_boot/MAP_medium_recon_meta.csv")

df.name="MAP_medium" # the name of the output folder
actual="correct"
predicted="probability_correct"
bootstrap=10    # customize 

demographic="STUDENT_ETHNIC_GRD_KEY"

In [None]:
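# customize here
# bin1 keeps its value from the binary example above; set it to the class of interest in `demographic`
bin2="other"  # bin2 = "other" or bin2 = "all"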

In [None]:
# This is also the same as binary
abroca_val=abroca.ABROCA(df, demographic, actual, predicted, bin1, bin2, bootstrap=bootstrap, getGraph=True) 
print(f"{abroca_val} (ABROCA value ({bin1} vs {bin2}) for {df.name} on {demographic})")
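
The three comparison modes can be pictured as row masks. The sketch below assumes that "other" means every row outside bin1 and that "all" means the full dataset including bin1. This notebook does not spell out how the `abroca` module interprets these values, so treat that reading as an assumption. `comparison_masks` is a hypothetical helper, not part of the module.

```python
import numpy as np

def comparison_masks(groups, bin1, bin2):
    """Return boolean masks (focal, reference) over the rows of `groups`."""
    groups = np.asarray(groups)
    focal = groups == bin1
    if isinstance(bin2, str) and bin2 == "all":
        # assumption: 1-vs-all compares bin1 against every row, bin1 included
        reference = np.ones(groups.shape, dtype=bool)
    elif isinstance(bin2, str) and bin2 == "other":
        # assumption: 1-vs-others compares bin1 against every row outside bin1
        reference = ~focal
    else:
        # pairwise: compare bin1 against one specific value
        reference = groups == bin2
    return focal, reference

groups = ["a", "b", "c", "a"]  # toy attribute values (hypothetical)
for mode in ("b", "other", "all"):
    focal, reference = comparison_masks(groups, "a", mode)
    print(mode, focal.tolist(), reference.tolist())
```

Under this reading, an attribute value that is literally named "all" or "other" would collide with the keywords, which is worth checking in your own data.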

#### Aggregation and Bootstrapping the Aggregations

The code below shows how to aggregate the per-class ABROCA values (by summing them) and how to bootstrap the aggregated result (the sum).\
Customize bin2 and the number of bootstraps.

In [None]:
# These are all the same as above
df = pd.read_csv("/Users/shent/Desktop/summer23/fairness/madd_boot/MAP_medium_recon_meta.csv")

df.name="MAP_medium" # the name of the output folder
actual="correct"
predicted="probability_correct"
bootstrap=10    # customize 

demographic="STUDENT_ETHNIC_GRD_KEY"


In [None]:

#customize here
bin2="other"  # bin2 = "other" or bin2 = "all"

In [None]:
# get the ABROCA value for each class and sum them up
num_classes=df[demographic].nunique()
classes=df[demographic].unique()
abroca_val=0
for i in range(num_classes):
    bin1 = classes[i]
    # add this class's ABROCA value to the running total
    abroca_val+=abroca.ABROCA(df, demographic, actual, predicted, bin1, bin2, bootstrap=False, getGraph=False)

print(f"{abroca_val} (Sum of ABROCA values for {df.name} on {demographic})")


In [None]:
num_boot=100 # customize number of bootstraps

# Shuffle the unique classes and map them back onto the column for each bootstrap (permutation).
# For each bootstrap (permutation), calculate the sum of all ABROCA values and append it to a list.
# From that list, calculate the p-value of the actual sum.

abrocas=[]
for j in range(num_boot):
    np.random.shuffle(classes) 

    # Create a dictionary to map the shuffled labels to the original labels 
    shuffle_mapping = {original_label: shuffled_label for original_label, shuffled_label in zip(df[demographic].unique(), classes)} 
    df[demographic] = df[demographic].map(shuffle_mapping)

    if j%20==0:
        print(f"bootstrap{j}")

    abroca_sum=0
    for k in range(num_classes):
        bin1 = classes[k]
        abroca_val_boot=abroca.ABROCA(df, demographic, actual, predicted, bin1, bin2, bootstrap=False, getGraph=False)
        abroca_sum+=abroca_val_boot
    
    abrocas.append(abroca_sum)

p=len([x for x in abrocas if x > abroca_val])/len(abrocas)
print(f"Test statistic for aggregated abroca={abroca_val} in permutations: p={p}")
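
The p-value above is the share of permuted sums that exceed the observed sum. For comparison, a common way to build such a null distribution is to permute the demographic column across rows, which changes which rows belong to each group. Relabelling whole classes, by contrast, keeps each group's rows together. The sketch below shows the row-level variant with a toy statistic standing in for the summed ABROCA. `permutation_p_value`, `summed_gap`, and the toy data are hypothetical and not part of the `abroca` module.

```python
import numpy as np

def permutation_p_value(statistic, labels, n_permutations=100, seed=0):
    """Row-level permutation test: returns (observed statistic, p-value)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = statistic(labels)
    # recompute the statistic with labels shuffled across rows
    null = np.array([statistic(rng.permutation(labels))
                     for _ in range(n_permutations)])
    # share of permuted statistics at least as large as the observed one
    p = float(np.mean(null >= observed))
    return observed, p

# toy data (hypothetical): one value per row and two groups
values = np.array([0.0] * 20 + [10.0] * 20)
labels = np.array(["a"] * 20 + ["b"] * 20)

def summed_gap(permuted_labels):
    """Stand-in for the summed ABROCA: sum over classes of |class mean - overall mean|."""
    overall = values.mean()
    return float(sum(abs(values[permuted_labels == c].mean() - overall)
                     for c in np.unique(permuted_labels)))

observed, p = permutation_p_value(summed_gap, labels, n_permutations=200)
print(f"observed={observed}, p={p}")
```

To adapt this to the notebook, `statistic` would wrap the summing loop above and receive the shuffled column. Note that this sketch counts ties (`>=`), whereas the cell above uses a strict `>`.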
