### Testing the Swedish Social Security Agency's Fairness Procedure

This notebook runs the 2017 dataset we obtained through the fairness procedure outlined by the Swedish Social Security Agency. The agency refused to provide evidence that this procedure was ever actually deployed.

Author: Gabriel Geiger <br>


In [1]:
import pandas as pd
import os

BASE_PATH = os.getcwd() + "/"
RAW_DATA_PATH = BASE_PATH + "raw_data/"
PROCESSED_DATA_PATH = BASE_PATH + "processed_data/"

### Load Data

Loads data from an Excel file in which each sheet covers one demographic category. To comply with GDPR, the ISF disclosed two versions of the dataset: one with the risk score reported to 2 decimals and a small number of rows removed, and another with the risk score reported to 1 decimal and no rows removed. We use the 1-decimal version because the precise risk score is not necessary for this analysis, so we prefer the version with no rows removed.

In [2]:
"""
Load the raw data stored in an Excel file.
@param filename: The name of the file to load (data_english or data_swedish, including the file extension)
@param path: The path to the folder containing the file (default is RAW_DATA_PATH)

@return tables: A dictionary where each key is a sheet name (e.g. "Gender 1 Decimal") and each value is a DataFrame of the corresponding table.
"""
def load_data(filename, path = RAW_DATA_PATH) -> dict[str, pd.DataFrame] :
  print("Loading data from {f}... \n".format(f=path))
  tables = {}

  excel = pd.ExcelFile(path + filename)
  sheet_names = excel.sheet_names

  for sheet_name in sheet_names :

    # We only want the tables with 1 decimal (i.e. no rows removed)
    if "1" not in sheet_name :
      continue

    df = excel.parse(sheet_name)

    tables[sheet_name] = df
    print("Table '{t}' loaded with shape {s}".format(t=sheet_name,s=df.shape))

  return tables

raw_tables = load_data("data_english.xlsx")

Loading data from c:\Users\gabri\Desktop\Sweden_Fairness_v2/raw_data/... 

Table 'Gender 1 Decimal' loaded with shape (6129, 4)
Table 'Income 1 Decimal' loaded with shape (6129, 4)
Table 'Education 1 Decimal' loaded with shape (6129, 4)
Table 'Foreign 1 Decimal' loaded with shape (6129, 5)
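
The sheet filter in `load_data` keeps any sheet whose name contains the character "1". A minimal sketch of that rule on a plain list of names: the four 1-decimal names are taken from the output above, while the "2 Decimals" names are hypothetical stand-ins, since the workbook's other sheet names are not shown in this notebook.

```python
# The first four names appear in the output above. The "2 Decimals" names are
# hypothetical placeholders for the second version of the dataset.
sheet_names = [
    "Gender 1 Decimal",
    "Income 1 Decimal",
    "Education 1 Decimal",
    "Foreign 1 Decimal",
    "Gender 2 Decimals",  # hypothetical
    "Income 2 Decimals",  # hypothetical
]

# Same rule as load_data: keep a sheet only if "1" occurs in its name.
kept = [name for name in sheet_names if "1" in name]
print(kept)
```

Because the rule is a plain substring match, it would also keep any other sheet whose name happens to contain a "1", so it relies on the workbook's naming staying as it is.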


### Data Processing 

We apply a few small processing steps to make the analysis easier. 

- We convert education into a binary categorical variable: low (no university degree) and high (has a university degree).
- We split income into lower and higher groups at the median income; incomes equal to the median count as higher.
- We merge the result labels into "Errors Found" and "No Errors Found." The raw data contains a few other, less common labels, all of which are counted as "Errors Found."
- We treat the selection methods 'high risk', 'high high risk' and 'follow-up control' as a single 'algorithm' group. All of these selection methods involve being selected by the algorithm: 'high high risk' covers the highest risk scores, and 'follow-up control' covers people who were previously selected by the algorithm and are checked again. In the code, this group is every row whose selection method is not 'Random'.

In [3]:
"""
This function runs some basic preprocessing steps on the data. 

@param tables: A dictionary where each key is a category and each value is a DataFrame
@return tables: The same dictionary, modified in place, with every DataFrame formatted for the analysis
"""

def process_data(tables:dict) -> dict[str, pd.DataFrame] : 

    # Split education into a high and low education
    ed_table = tables["Education 1 Decimal"]
    ed_table["Education Level"] = ed_table["Education"].apply(
        lambda e: "Low Education" if e <= 3.0 else "High Education"
    )
    tables["Education 1 Decimal"] = ed_table

    # Split Income into a high and low income based on the median. 
    income_table = tables["Income 1 Decimal"]
    median_income = income_table["Income"].median()

    income_table["Income Level"] = income_table["Income"].apply(
        lambda i: "High Income" if i >= median_income else "Low Income"
    )
    tables["Income 1 Decimal"] = income_table

    # Merge labels into "No Errors Found" and "Errors Found"
    for key,table in tables.items() : 

        table["Result"] = table["Result"].apply(
            lambda r : "No Errors Found" if r == "No Errors Found" else "Errors Found"
        )

        tables[key] = table
    
    return tables

tables = process_data(raw_tables) 
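
To make the three transformations concrete, here is a minimal sketch that applies the same rules to a toy table. All values, including the result labels "Too much paid" and "Other", are invented for illustration and are not drawn from the dataset.

```python
import pandas as pd

# Toy data, invented purely for illustration (not from the real dataset).
toy = pd.DataFrame({
    "Income": [100, 200, 300, 400],
    "Education": [1.0, 3.0, 4.0, 5.0],
    "Result": ["No Errors Found", "Too much paid", "No Errors Found", "Other"],
})

# Median split: rows at or above the median count as "High Income".
median_income = toy["Income"].median()
toy["Income Level"] = toy["Income"].apply(
    lambda i: "High Income" if i >= median_income else "Low Income"
)

# Education codes of 3.0 or lower count as "Low Education",
# mirroring the threshold used in process_data.
toy["Education Level"] = toy["Education"].apply(
    lambda e: "Low Education" if e <= 3.0 else "High Education"
)

# Every label other than "No Errors Found" collapses into "Errors Found".
toy["Result"] = toy["Result"].apply(
    lambda r: "No Errors Found" if r == "No Errors Found" else "Errors Found"
)

print(toy)
```

In this toy table the median income is 250, so the first two rows fall into "Low Income" and the last two into "High Income".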

### First Step of the Swedish Social Security Agency's Fairness Procedure

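The first step of the fairness procedure compares each group's share of the algorithm-selected sample with its share of the random sample. A group fails if its share has at least doubled, at least halved, or changed by 30 percentage points or more.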

In [None]:

"""
Runs the first step of the agency's own fairness evaluation for one demographic column and returns one row per class.
"""
def run_fk_test_1(table:pd.DataFrame,column) : 

    unique_values = table[column].unique()

    results_dict = {
        "category":[],
        "class":[],
        "percentage_random":[],
        "percentage_algorithm":[],
        "percentage_difference":[],
        "over_under_representation":[],
        "pass_fail":[]
    }

    for category in unique_values : 

        # Split by algorithm or random sample
        algorithm_sample = table[table["Selection Method"] != "Random"]
        random_sample = table[table["Selection Method"] == "Random"]

        # Restrict algorithm and random sample by class (e.g. men)
        n_class_algo = algorithm_sample[algorithm_sample[column] == category]
        n_class_random = random_sample[random_sample[column] == category]

        # Get the percentage of the class in the algorithm and random sample
        proportion_random = (len(n_class_random) / len(random_sample)) * 100 
        proportion_algo =  (len(n_class_algo) / len(algorithm_sample)) * 100 

        pass_test = True 

        results_dict['category'].append(column)
        results_dict['class'].append(category)
        results_dict['percentage_random'].append(round(proportion_random,2))
        results_dict['percentage_algorithm'].append(round(proportion_algo,2))
        results_dict['percentage_difference'].append(round(proportion_algo - proportion_random,2))
        results_dict['over_under_representation'].append(round(proportion_algo / proportion_random,2))
        
        # Check if the proportion of a class has doubled 
        if proportion_algo >= (2 * proportion_random) : 
            pass_test = False 
        
        # Check if the proportion of a class has halved 
        if proportion_algo <= (0.5 * proportion_random) : 
            pass_test = False 
        
        # Check if the proportion of a class has changed by 30 percentage points
        if abs(proportion_algo - proportion_random) >= 30 : 
            pass_test = False 
        
        results_dict['pass_fail'].append(pass_test)
    
    return pd.DataFrame(results_dict)

results_dfs = []

for category, table in tables.items() : 

    category_name = table.columns[-1]

    df = run_fk_test_1(table,category_name)

    results_dfs.append(df)

final_df = pd.concat(results_dfs, ignore_index=True)

# Make sure the output folder exists before writing
os.makedirs("results", exist_ok=True)
final_df.to_excel("results/fk_results_step_1.xlsx")

final_df

,category,class,percentage_random,percentage_algorithm,percentage_difference,over_under_representation,pass_fail
0,Gender,M,43.94,32.49,-11.45,0.74,True
1,Gender,K,56.06,67.51,11.45,1.2,True
2,Income Level,Low Income,25.69,50.83,25.13,1.98,True
3,Income Level,High Income,74.31,49.17,-25.13,0.66,True
4,Education Level,Low Education,52.72,78.57,25.85,1.49,True
5,Education Level,High Education,47.28,21.43,-25.85,0.45,False
6,Foreign Background,0,76.31,56.87,-19.45,0.75,True
7,Foreign Background,1,23.69,43.13,19.45,1.82,True
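
To see why High Education is the only class that fails in the table above, the three thresholds can be applied by hand to the reported percentages. The sketch below is a simplified restatement of the checks in `run_fk_test_1`; the helper name `step_one_passes` is introduced here for illustration only, and it works on the rounded values from the table rather than the unrounded proportions.

```python
def step_one_passes(pct_random: float, pct_algorithm: float) -> bool:
    """Return True if a group passes all three step-one thresholds."""
    doubled = pct_algorithm >= 2 * pct_random
    halved = pct_algorithm <= 0.5 * pct_random
    shifted = abs(pct_algorithm - pct_random) >= 30
    return not (doubled or halved or shifted)

# Percentages copied from the results table above.
print(step_one_passes(47.28, 21.43))  # High Education -> False
print(step_one_passes(25.69, 50.83))  # Low Income -> True
```

High Education fails because 21.43 is below half of 47.28 (23.64). Low Income passes narrowly: its share would need to reach 51.38 to count as doubled, and it stands at 50.83.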


### Second Step of the Swedish Social Security Agency's Fairness Procedure 

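For each group, the agency compares the group's share of the risk-based sample with its share of the true positives detected by the model, i.e. the algorithm-selected cases in which errors were found. If the two shares differ by more than 10 percentage points, the group fails the test (the code below also counts a difference of exactly 10 percentage points as a failure).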

In [None]:
"""
Runs the second step of the agency's own fairness evaluation 
"""
def run_fk_test_2(table:pd.DataFrame,column) : 

    unique_values = table[column].unique()

    results_dict = {
        "category":[],
        "class":[],
        "percentage_algorithm":[],
        "percentage_mistakes_algorithm":[],
        "percentage_difference":[],
        "pass_fail":[]
    }

    for category in unique_values : 

        # Restrict to algorithm sample 
        algorithm_sample = table[table["Selection Method"] != "Random"]

        # Get the number of people from the class in that sample 
        n_class_algo = len(algorithm_sample[algorithm_sample[column] == category])
        
        # Keep only algorithm-selected cases where errors were found, then count those belonging to the class
        errors_algo = algorithm_sample[algorithm_sample["Result"] != "No Errors Found"]
        n_errors_class = len(errors_algo[errors_algo[column] == category])

        # Proportion of class in the algo sample 
        prop_class_algo = round((n_class_algo / len(algorithm_sample)) * 100,2)

        # Proportion of class in errors in the algo sample 
        prop_class_errors = round((n_errors_class / len(errors_algo)) * 100,2)

        results_dict['category'].append(column)
        results_dict['class'].append(category)
        results_dict['percentage_algorithm'].append(prop_class_algo)
        results_dict['percentage_mistakes_algorithm'].append(prop_class_errors)
        results_dict['percentage_difference'].append(round(prop_class_errors - prop_class_algo,2))

        pass_test = True 
        
        # Fail if the class's share of people flagged by the algorithm and its share of errors found differ by 10 percentage points or more
        if abs(prop_class_algo - prop_class_errors) >= 10 : 
            pass_test = False 
        
        results_dict['pass_fail'].append(pass_test)

    return pd.DataFrame(results_dict)

results_dfs = []

for category, table in tables.items() : 

    category_name = table.columns[-1]

    df = run_fk_test_2(table,category_name)

    results_dfs.append(df)

final_df = pd.concat(results_dfs, ignore_index=True)

# Make sure the output folder exists before writing
os.makedirs("results", exist_ok=True)
final_df.to_excel("results/fk_results_step_2.xlsx")

final_df

,category,class,percentage_algorithm,percentage_mistakes_algorithm,percentage_difference,pass_fail
0,Gender,M,32.49,34.31,1.82,True
1,Gender,K,67.51,65.69,-1.82,True
2,Income Level,Low Income,50.83,52.96,2.13,True
3,Income Level,High Income,49.17,47.04,-2.13,True
4,Education Level,Low Education,78.57,81.23,2.66,True
5,Education Level,High Education,21.43,18.77,-2.66,True
6,Foreign Background,0,56.87,52.44,-4.43,True
7,Foreign Background,1,43.13,47.56,4.43,True
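
The second-step check can likewise be restated on the reported percentages. This is a minimal sketch using the same inclusive threshold as `run_fk_test_2`; the helper name `step_two_passes` is introduced here for illustration, and the last pair of values is a hypothetical case added only to show what a failure looks like.

```python
def step_two_passes(pct_algorithm: float, pct_errors: float) -> bool:
    """Return True if the two shares differ by less than 10 percentage points."""
    return abs(pct_algorithm - pct_errors) < 10

cases = {
    # Values copied from the results table above.
    "Foreign Background = 1": (43.13, 47.56),
    "Low Education": (78.57, 81.23),
    # Hypothetical values, not from the dataset, to illustrate a failure.
    "Hypothetical group": (40.0, 52.0),
}

for label, (pct_algorithm, pct_errors) in cases.items():
    gap = round(pct_errors - pct_algorithm, 2)
    print(label, gap, step_two_passes(pct_algorithm, pct_errors))
```

The largest gap in the table above is 4.43 percentage points (Foreign Background), well inside the 10-point threshold, which is why every class passes this step.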
