## Applying the bias scan tool on a BERT disinformation classifier
In this notebook, the bias scan tool is applied on a BERT-based disinformation classifier. The bias scan tool is based on an implementation of the k-means Hierarchical Bias Aware Clustering (HBAC) method\*. The python script `./helper_functions.py` contains functions that execute the bias scan. A conceptual description how the bias scan works, including the rationale why k-means is chosen as a clustering algorithm and paramater choices, can be found in the [bias scan tool report](https://github.com/NGO-Algorithm-Audit/Bias_scan/blob/master/Bias_scan_tool_report.pdf).

The classifier is used to make predictions on the Twitter15\*\* data set. Details on pre-processing steps performed on this dataset are provided in the `../data/Twitter_dataset/Twitter_preprocessing.ipynb` notebook. Details on training the BERT disinformation classifier is provided in the `../case_studies/BERT_disinformation_classifier/BERT_Twitter_classifier.ipynb` notebook.

This notebook is structured as follows:
1. Load data and pre-processing
2. Bias scan using k-means clustering
3. Clustering results
4. Statistical testing of inter-cluster difference 

\* Misztal-Radecka, Indurkya, *Information Processing and Management*. Bias-Aware Hierarchical Clustering for detecting the discriminated groups of users in recommendation systems (2021).

\*\* Liu, Xiaomo and Nourbakhsh, Armineh and Li, Quanzhi and Fang, Rui and Shah, Sameena, *Proceedings of the 24th ACM International on Conference on Information and Knowledge Management* (2015) [[link to dataset]](https://www.dropbox.com/s/7ewzdrbelpmrnxu/rumdetect2017.zip?dl=0&file_subpath=%2Frumor_detection_acl2017)

### Overview of notebook:
1. Load data and pre-processing
2. Sensitivity testing of HBAC scan in `HBAC_BERT_disinformation_classifier.ipynb`

### Load libraries

In [1]:
import random
import warnings
import numpy as np
import pandas as pd
import seaborn as sns

# IPython
from IPython.display import Markdown, display

# matplotlib
import matplotlib.pyplot as plt

# helper functions
from helper_functions import *

# welch's t-test
import scipy.stats as stats

warnings.filterwarnings('ignore')

### 1. Load data and pre-processing

In [2]:
# read data
path = '../classifiers/BERT_disinformation_classifier/pred_BERT.csv'
df_pred = pd.read_csv(path)

# change column names
df_pred.columns = ['tweet','predicted_class','true_class']

# Calculating absolute errors
df_pred['errors'] = abs(df_pred['predicted_class'] - df_pred['true_class'])

# Calculate FP errors
FP_condition = (df_pred['predicted_class'] == 1) & (df_pred['true_class'] == 0)
df_pred['FP_errors'] = np.where(FP_condition, 1, 0)

# Calculate FN errors
FN_condition = (df_pred['predicted_class'] == 0) & (df_pred['true_class'] == 1)
df_pred['FN_errors'] = np.where(FN_condition, 1, 0)

df_pred.head()

Unnamed: 0,tweet,predicted_class,true_class,errors,FP_errors,FN_errors
0,"it was a long, dark day in ottawa. a timeline ...",0,0,0,0,0
1,hewlett-packard will split into two companies ...,0,0,0,0,0
2,white house lit in rainbow colors after high c...,0,0,0,0,0
3,breaking: 10 reportedly shot dead at paris hq ...,0,0,0,0,0
4,steve jobs was adopted. his biological father ...,1,1,0,0,0


In [3]:
df_pred.shape

(423, 6)

#### Add features to data

In [4]:
path_feat = '../data/Twitter_dataset/twitter1516_final.csv'
df_feat = pd.read_csv(path_feat)
df_feat.head(2)

Unnamed: 0,label,tweet_id,tweet,length,#URLs,#mentions,#hashs,verified,#followers,user_engagement,sentiment_score
0,1,489800427152879616,malaysia airlines says it lost contact with pl...,95,2,0,0,1,15375121,72.567469,-0.3182
1,1,560474897013415936,for just $1 you can get a free jr. frosty with...,118,1,1,0,1,3673898,55.294333,0.8398


#### Data cleaning

In [5]:
df_full = pd.merge(df_pred, df_feat, on=['tweet'])

# Drop duplicate rows in merged dataframe
df_full = df_full.drop_duplicates('tweet', keep='last')

# remove certain columns
df_full = df_full.drop(columns=['tweet_id','label'])

# features dataframe
features = df_full.drop(['tweet','predicted_class', 'true_class', 'errors', 'FP_errors', 'FN_errors'], axis=1)

df_full.head()

Unnamed: 0,tweet,predicted_class,true_class,errors,FP_errors,FN_errors,length,#URLs,#mentions,#hashs,verified,#followers,user_engagement,sentiment_score
0,"it was a long, dark day in ottawa. a timeline ...",0,0,0,0,0,84,2,0,1,1,1828600,102.622934,-0.4767
1,hewlett-packard will split into two companies ...,0,0,0,0,0,93,1,0,0,1,45988032,82.510668,0.0
2,white house lit in rainbow colors after high c...,0,0,0,0,0,89,1,0,0,1,16023476,155.112226,0.0
3,breaking: 10 reportedly shot dead at paris hq ...,0,0,0,0,0,80,1,0,0,1,3629813,601.280035,-0.6486
35,steve jobs was adopted. his biological father ...,1,1,0,0,0,87,0,0,0,0,3479057,253.795131,0.0


In [6]:
df_full.shape

(413, 14)

#### Data initialization

In [7]:
full_data = init_dataset(df_full,features)
full_data.head()

Unnamed: 0,predicted_class,true_class,errors,FP_errors,FN_errors,length,#URLs,#mentions,#hashs,verified,#followers,user_engagement,sentiment_score,clusters,new_clusters
0,0,0,0,0,0,-0.341015,0.990826,-0.319494,0.628861,0.57269,-0.352927,0.54874,-0.626901,0,-1
1,0,0,0,0,0,0.056461,-0.445003,-0.319494,-0.596232,0.57269,3.792159,0.266703,0.481597,0,-1
2,0,0,0,0,0,-0.120195,-0.445003,-0.319494,-0.596232,0.57269,0.979494,1.284804,0.481597,0,-1
3,0,0,0,0,0,-0.517671,-0.445003,-0.319494,-0.596232,0.57269,-0.183854,7.541476,-1.02663,0,-1
35,1,1,0,0,0,-0.208523,-1.880832,-0.319494,-0.596232,-1.746144,-0.198005,2.668648,0.481597,0,-1


### 2. Sensitivity testing
Counting average difference and p-value for each feature for 9x9x2=162 hyperparameter configurations.  

In [8]:
feat_ls = ['verified','#URLs','user_engagement','length','#hashs','#mentions','sentiment_score','#followers']
diff_ls = 8*[0]
p_val_ls = 8*[0]
df_FP_results = pd.DataFrame(
    {'feature': feat_ls,
     'difference': diff_ls,
     'p-value': p_val_ls
    })
counter_FP_cluster = 0
df_FN_results = pd.DataFrame(
    {'feature': feat_ls,
     'difference': diff_ls,
     'p-value': p_val_ls
    })
counter_FN_cluster = 0

# initial cluster split of k-means clustering
n_clusters_ls = [2,3]

# minimal splittable cluster size and acceptable cluster size between 5 and 45 per 5 
split_cluster_size_ls = np.linspace(5,45,9)
acc_cluster_size_ls = np.linspace(5,45,9)

for n in n_clusters_ls:
    
    print("number of cluster at start k-means cluster: ", n)
    
    # Clustering algorithms parameters
    clustering_paramaters = {
    "n_clusters": n,
    "init": "k-means++",
    "n_init": 20,
    "max_iter": 300
    }
    
    for i in split_cluster_size_ls:
        for j in acc_cluster_size_ls:
            
            # minimal splittable cluster size
            split_cluster_size = i
            print("minimal splittable cluster size: ", split_cluster_size)

            # minimal acceptable cluster size
            acc_cluster_size = j
            print("minimal acceptable cluster size: ", acc_cluster_size)
            
            # FPR bias scan
            df_FP = HBAC_bias_scan(full_data, 'FP', split_cluster_size, acc_cluster_size, clustering_paramaters)

            # FNR bias scan
            df_FN = HBAC_bias_scan(full_data, 'FN', split_cluster_size, acc_cluster_size, clustering_paramaters)
            
            ## iterate through identified clusters found in FPR scan
            ## printing amount of bias, number of elements in cluster and difference in feature means, 
            ## including statistical test results.
            n_clusters_FP = df_FP['clusters'].nunique()
            for c in range(0,n_clusters_FP):
                
                # get bias of cluster
                bias_FP = round(bias_acc(df_FP, 'FP', c, "clusters"), 2)
                
                if bias_FP > 0: 
                    # counter
                    counter_FP_cluster += 1
                    
                    # get cluster
                    cluster_FP = df_FP[df_FP['clusters']==c]
#                     print(f"cluster {c} has bias (FPR): " + str(bias_FP))
#                     print(f"#elements in cluster {c}:", len(cluster_FP))

                    # discriminated cluster
                    discriminated_cluster_FP = full_data[full_data['clusters']==c].drop(columns=['predicted_class', 'true_class', 'errors','clusters', 'new_clusters', 'FP_errors', 'FN_errors'])
                    not_discriminated_FP = full_data[full_data['clusters']!=c].drop(columns=['predicted_class', 'true_class', 'errors','clusters', 'new_clusters', 'FP_errors', 'FN_errors'])

                    cluster_analysis_FP = stat_df(full_data, discriminated_cluster_FP, not_discriminated_FP)
                    df_FP_results['difference'] = cluster_analysis_FP['difference'] + df_FP_results['difference']
                    df_FP_results['p-value'] = cluster_analysis_FP['p-value'] + df_FP_results['p-value']  
#                     print(cluster_analysis_FP)
                else:
                    continue
                
            ## iterate through identified clusters found in FNR scan
            ## printing amount of bias, number of elements in cluster and difference in feature means, 
            ## including statistical test results.          
            n_clusters_FN = df_FN['clusters'].nunique()
            for d in range(0,n_clusters_FN):
                
                # get bias of cluster
                bias_FN = round(bias_acc(df_FN, 'FN', d, "clusters"), 2)
                
                if bias_FN > 0: 
                    # counter
                    counter_FN_cluster += 1
                    
                    # get cluster
                    cluster_FN = df_FN[df_FN['clusters']==d]
#                     print(f"cluster {d} has bias (FNR): " + str(bias_FN))
#                     print(f"#elements in cluster {d}:", len(cluster_FN))

                    # discriminated cluster
                    discriminated_cluster_FN = full_data[full_data['clusters']==d].drop(columns=['predicted_class', 'true_class', 'errors','clusters', 'new_clusters', 'FP_errors', 'FN_errors'])
                    not_discriminated_FN = full_data[full_data['clusters']!=d].drop(columns=['predicted_class', 'true_class', 'errors','clusters', 'new_clusters', 'FP_errors', 'FN_errors'])

                    cluster_analysis_FN = stat_df(full_data, discriminated_cluster_FN, not_discriminated_FN)
                    df_FN_results['difference'] = cluster_analysis_FN['difference'] + df_FN_results['difference']
                    df_FN_results['p-value'] = cluster_analysis_FN['p-value'] + df_FN_results['p-value']
#                     print(cluster_analysis_FN)
                else: 
                    continue

number of cluster at start k-means cluster:  2
minimal splittable cluster size:  5.0
minimal acceptable cluster size:  5.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  5.0
minimal acceptable cluster size:  10.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  5.0
minimal acceptable cluster size:  15.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  5.0
minimal acceptable cluster size:  20.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  5.0
minimal acceptable cluster size:  25.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  5.0
minimal acceptable cluster size:  30.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  5.0
minimal ac

done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  35.0
minimal acceptable cluster size:  5.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  35.0
minimal acceptable cluster size:  10.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  35.0
minimal acceptable cluster size:  15.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  35.0
minimal acceptable cluster size:  20.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  35.0
minimal acceptable cluster size:  25.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  35.0
minimal acceptable cluster size:  30.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  35.0
minimal 

done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  20.0
minimal acceptable cluster size:  5.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  20.0
minimal acceptable cluster size:  10.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  20.0
minimal acceptable cluster size:  15.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  20.0
minimal acceptable cluster size:  20.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  20.0
minimal acceptable cluster size:  25.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  20.0
minimal acceptable cluster size:  30.0
bias FP is:  0.8862275449101796
done
bias FN is:  0.7195121951219512
done
minimal splittable cluster size:  20.0
minimal 

done
bias FN is:  0.7195121951219512
done


In [9]:
df_FP_results['difference'] = df_FP_results['difference']/counter_FP_cluster
df_FP_results['p-value'] = df_FP_results['p-value']/counter_FP_cluster
print("#cluster FPR scan: ", counter_FP_cluster)
df_FP_results

#cluster FPR scan:  2974


Unnamed: 0,feature,difference,p-value
0,verified,-0.08418,0.0
1,#URLs,0.02973,0.0
2,user_engagement,0.30086,0.01156
3,length,-0.28022,0.0259
4,#hashs,-0.16345,0.07969
5,#mentions,0.02607,0.1245
6,sentiment_score,-0.11428,0.24404
7,#followers,-0.08117,0.54327


In [10]:
df_FN_results['difference'] = df_FN_results['difference']/counter_FN_cluster
df_FN_results['p-value'] = df_FN_results['p-value']/counter_FN_cluster
print("#cluster FNR scan: ", counter_FN_cluster)
df_FN_results

#cluster FNR scan:  2506


Unnamed: 0,feature,difference,p-value
0,verified,-0.76465,0.0
1,#URLs,-0.17086,0.0
2,user_engagement,0.17602,0.00314
3,length,-0.27959,0.01789
4,#hashs,0.01235,0.05865
5,#mentions,-0.11344,0.09821
6,sentiment_score,-0.2079,0.20123
7,#followers,0.01376,0.45017
