# Anti-discrimination examples
### References:
#### [1] Sara Hajian and Josep Domingo-Ferrer, "A methodology for direct and indirect discrimination prevention in data mining", IEEE Transactions on Knowledge and Data Engineering, Vol. 25, no. 7, pp. 1445-1459, Jun 2013. DOI: https://doi.org/10.1109/TKDE.2012.72

## 1. Direct and indirect discrimination prevention

In [1]:
from antiDiscrimination.src.algorithms.anti_discrimination import Anti_discrimination
from antiDiscrimination.src.entities.dataset_DataFrame import Dataset_DataFrame
from antiDiscrimination.src.utils import utils
import pandas as pd

### Adult data set (45222 records)

#### The path to the CSV file containing the data set is given below.

In [2]:
path_csv = "./antiDiscrimination/input_datasets/adult_anti_discrimination.csv"
data_frame = utils.read_dataframe_from_csv(path_csv)

#### 1.1 The data set is loaded from a DataFrame passed as a parameter

In [3]:
dataset = Dataset_DataFrame(data_frame)
dataset.description()

Loading dataset
Dataset loaded: ./antiDiscrimination/input_datasets/adult_anti_discrimination.csv
Records loaded: 45222
Dataset: ./antiDiscrimination/input_datasets/adult_anti_discrimination.csv
Dataset head:


| | age | workclass | education | marital-status | occupation | relationship | race | sex | hours-per-week | native-country | prediction |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | young | Private | 11th | Never-married | Machine-op-inspct | Own-child | Black | Male | full_time | United-States | <=50K |
| 1 | old | Private | HS-grad | Married-civ-spouse | Farming-fishing | Husband | White | Male | full_time | United-States | <=50K |
| 2 | young | Local-gov | Assoc-acdm | Married-civ-spouse | Protective-serv | Husband | White | Male | full_time | United-States | >50K |
| 3 | old | Private | Some-college | Married-civ-spouse | Machine-op-inspct | Husband | Black | Male | full_time | United-States | >50K |
| 4 | old | Private | 10th | Never-married | Other-service | Not-in-family | White | Male | part_time | United-States | <=50K |



Dataset description:
Data set: ./antiDiscrimination/input_datasets/adult_anti_discrimination.csv
Records: 45222
Attributes:


| | Name |
|---|---|
| 0 | age |
| 1 | workclass |
| 2 | education |
| 3 | marital-status |
| 4 | occupation |
| 5 | relationship |
| 6 | race |
| 7 | sex |
| 8 | hours-per-week |
| 9 | native-country |





#### 1.2 Set the anti-discrimination parameters:
* min_support and min_confidence: minimum support and confidence a rule needs to be considered frequent
* alfa: discrimination threshold used to classify rules as directly or indirectly discriminatory
* DI: predetermined discriminatory items, given as (attribute, discriminatory value) pairs
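
To make these parameters concrete, the sketch below computes the support, confidence and extended lift (elift) of a single classification rule with plain pandas. It follows the elift measure described in [1] (confidence of the rule with the discriminatory item divided by the confidence without it); the toy records and the helper names `matches`, `support`, `confidence` and `elift` are assumptions made for this illustration and are not part of the `antiDiscrimination` package, whose internals may differ.

```python
import pandas as pd

# Toy records invented for this illustration (not taken from the Adult data set).
toy = pd.DataFrame({
    "sex": ["female", "female", "female", "male", "male", "male", "male", "female"],
    "workclass": ["Private"] * 6 + ["Local-gov"] * 2,
    "prediction": ["<=50K", "<=50K", ">50K", "<=50K", ">50K", ">50K", ">50K", "<=50K"],
})


def matches(df, items):
    """Boolean mask of the rows containing every (attribute, value) item."""
    mask = pd.Series(True, index=df.index)
    for attribute, value in items:
        mask &= df[attribute] == value
    return mask


def support(df, items):
    """Fraction of all records that contain the items."""
    return float(matches(df, items).mean())


def confidence(df, premise, consequent):
    """Fraction of the records matching the premise that also match the consequent."""
    covered = df[matches(df, premise)]
    return float(matches(covered, consequent).mean()) if len(covered) else 0.0


def elift(df, discriminatory, context, consequent):
    """Confidence with the discriminatory items divided by confidence without them.

    Assumes the context-only rule has non-zero confidence.
    """
    return (confidence(df, discriminatory + context, consequent)
            / confidence(df, context, consequent))


alfa = 1.2
discriminatory = [("sex", "female")]       # an item from DI
context = [("workclass", "Private")]       # non-discriminatory part of the premise
consequent = [("prediction", "<=50K")]     # class item

rule_support = support(toy, discriminatory + context + consequent)
rule_confidence = confidence(toy, discriminatory + context, consequent)
rule_elift = elift(toy, discriminatory, context, consequent)

# Under this reading, the rule is flagged when its elift reaches the threshold.
is_discriminatory = rule_elift >= alfa
print(rule_support, rule_confidence, rule_elift, is_discriminatory)
```

In this reading, a rule would only be examined at all if `rule_support` and `rule_confidence` reach `min_support` and `min_confidence`, and it would count as discriminatory when its elift is at least `alfa`.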

In [4]:
min_support = 0.02
min_confidence = 0.1
alfa = 1.25
DI = [("age","young"), ("sex","female")]
anonymization_scheme = Anti_discrimination(dataset, min_support, min_confidence, alfa, DI)

#### 1.3 Launch the direct and indirect anti-discrimination process

In [5]:
anonymization_scheme.calculate_anonymization()

Anonymizing anti-discrimination via direct & indirect discrimination detection
Alfa = 1.25
Calculating FR rules...


FR Rules: 6573
PD Rules: 1516
PND Rules: 5057
Total FR = PD + PND: 6573
Calculating RR and non_RR rules...


RR Rules: 0
Indirect alfa-discriminatory rules: 0
non RR Rules: 5057
Calculating MR and PR rules...


MR Rules: 246
PR Rules: 1270
Anonymizing anti-discrimination via direct & indirect discrimination detection
Calculating impacts...


Anonymizing...




#### 1.4 Calculate discrimination metrics and estimate information loss 
Metrics calculated: 
* DDPD: direct discrimination prevention degree
* DDPP: direct discrimination protection preservation
* IDPD: indirect discrimination prevention degree
* IDPP: indirect discrimination protection preservation
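
As a rough illustration of what these four metrics measure, the sketch below expresses them with Python sets of rules. It is a set-based reading of the descriptions in [1]: a prevention degree is the share of originally discriminatory rules that are no longer discriminatory after the transformation, and a protection preservation is the share of originally protective rules that remain protective. The function names, the rule identifiers and the convention for an empty original set are all assumptions made for this example; the actual computation inside `calculate_metrics()` is not shown here and may differ.

```python
def prevention_degree(flagged_before, flagged_after):
    """Share of the originally flagged rules that are no longer flagged."""
    if not flagged_before:
        # Assumption: with nothing to prevent, prevention is treated as complete.
        return 1.0
    return len(flagged_before - flagged_after) / len(flagged_before)


def protection_preservation(protective_before, protective_after):
    """Share of the originally protective rules that are still protective."""
    if not protective_before:
        # Assumption: with nothing to preserve, preservation is treated as complete.
        return 1.0
    return len(protective_before & protective_after) / len(protective_before)


# Hypothetical rule identifiers, before and after the transformation.
mr_before = {"r1", "r2", "r3", "r4"}        # directly discriminatory rules
mr_after = {"r4", "r9"}
pr_before = {"p1", "p2", "p3", "p4", "p5"}  # protective rules
pr_after = {"p1", "p2", "p3", "p4", "r1"}

ddpd = prevention_degree(mr_before, mr_after)        # 3 of 4 rules fixed
ddpp = protection_preservation(pr_before, pr_after)  # 4 of 5 rules kept
print(ddpd, ddpp)
```

IDPD and IDPP would follow the same pattern, applied to the indirectly discriminatory (redlining) rules and the remaining non-redlining rules instead.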

In [6]:
anti_discrimination_metrics = anonymization_scheme.calculate_metrics()
anti_discrimination_metrics.description()

Calculating metrics on anonymized dataset...
Calculating FR rules...


FR Rules: 6853
PD Rules: 1624
PND Rules: 5229
Total FR = PD + PND: 6853
Calculating RR and non_RR rules...


RR Rules: 3
Indirect alfa-discriminatory rules: 3
non RR Rules: 5226
Calculating MR and PR rules...


MR Rules: 12
PR Rules: 1612
DDPD: 1.00
DDPP: 0.98
IDPD: 1.00
IDPP: 0.81


#### 1.5 The anonymized data set can be saved to a CSV file

In [7]:
anonymization_scheme.save_anonymized_dataset("./antiDiscrimination/output_datasets/adult_anti_discrimination_anom.csv")

'Dataset saved: ./antiDiscrimination/output_datasets/adult_anti_discrimination_anom.csv'

#### 1.6 The anonymized data set can be converted to a DataFrame

In [8]:
df_anonymized = anonymization_scheme.anonymized_dataset_to_dataframe()
df_anonymized.head()

| | age | workclass | education | marital-status | occupation | relationship | race | sex | hours-per-week | native-country | prediction |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | young | Private | 11th | Never-married | Machine-op-inspct | Own-child | Black | Male | full_time | United-States | <=50K |
| 1 | old | Private | HS-grad | Married-civ-spouse | Farming-fishing | Husband | White | Male | full_time | United-States | <=50K |
| 2 | young | Local-gov | Assoc-acdm | Married-civ-spouse | Protective-serv | Husband | White | Male | full_time | United-States | >50K |
| 3 | old | Private | Some-college | Married-civ-spouse | Machine-op-inspct | Husband | Black | Male | full_time | United-States | <=50K |
| 4 | old | Private | 10th | Never-married | Other-service | Not-in-family | White | Male | part_time | United-States | <=50K |


## 2. Adult data set (45222 records) with several alfa values
The same process is repeated for alfa values of 1.0, 1.1 and 1.2, and the resulting metrics are compared.

In [2]:
path_csv = "./antiDiscrimination/input_datasets/adult_anti_discrimination.csv"
data_frame = utils.read_dataframe_from_csv(path_csv)

In [3]:
# data_frame was already read from path_csv in the previous cell
dataset = Dataset_DataFrame(data_frame)
metrics = []
min_support = 0.02
min_confidence = 0.1
DI = [("age","young"), ("sex","female")]
alfa_list = [1.0, 1.1, 1.2]
# Run the whole process once per alfa value and keep the metrics of each run
for alfa in alfa_list:
    anonymization_scheme = Anti_discrimination(dataset, min_support, min_confidence, alfa, DI)
    anonymization_scheme.calculate_anonymization()
    metrics.append(anonymization_scheme.calculate_metrics())

Loading dataset
Dataset loaded: ./antiDiscrimination/input_datasets/adult_anti_discrimination.csv
Records loaded: 45222
Anonymizing anti-discrimination via direct & indirect discrimination detection
Alfa = 1.0
Calculating FR rules...


FR Rules: 6573
PD Rules: 1516
PND Rules: 5057
Total FR = PD + PND: 6573
Calculating RR and non_RR rules...


RR Rules: 33
Indirect alfa-discriminatory rules: 55
non RR Rules: 5024
Calculating MR and PR rules...


MR Rules: 1458
PR Rules: 58
Anonymizing anti-discrimination via direct & indirect discrimination detection
Calculating impacts...


Anonymizing...


Calculating metrics on anonymized dataset...
Calculating FR rules...


FR Rules: 7061
PD Rules: 1679
PND Rules: 5382
Total FR = PD + PND: 7061
Calculating RR and non_RR rules...


RR Rules: 190
Indirect alfa-discriminatory rules: 62467
non RR Rules: 5192
Calculating MR and PR rules...


MR Rules: 383
PR Rules: 1296
Anonymizing anti-discrimination via direct & indirect discrimination detection
Alfa = 1.1
Calculating FR rules...


FR Rules: 6573
PD Rules: 1516
PND Rules: 5057
Total FR = PD + PND: 6573
Calculating RR and non_RR rules...


RR Rules: 0
Indirect alfa-discriminatory rules: 0
non RR Rules: 5057
Calculating MR and PR rules...


MR Rules: 459
PR Rules: 1057
Anonymizing anti-discrimination via direct & indirect discrimination detection
Calculating impacts...


Anonymizing...


Calculating metrics on anonymized dataset...
Calculating FR rules...


FR Rules: 6889
PD Rules: 1670
PND Rules: 5219
Total FR = PD + PND: 6889
Calculating RR and non_RR rules...


RR Rules: 0
Indirect alfa-discriminatory rules: 0
non RR Rules: 5219
Calculating MR and PR rules...


MR Rules: 0
PR Rules: 1670
Anonymizing anti-discrimination via direct & indirect discrimination detection
Alfa = 1.2
Calculating FR rules...


FR Rules: 6573
PD Rules: 1516
PND Rules: 5057
Total FR = PD + PND: 6573
Calculating RR and non_RR rules...


RR Rules: 0
Indirect alfa-discriminatory rules: 0
non RR Rules: 5057
Calculating MR and PR rules...


MR Rules: 337
PR Rules: 1179
Anonymizing anti-discrimination via direct & indirect discrimination detection
Calculating impacts...


Anonymizing...


Calculating metrics on anonymized dataset...
Calculating FR rules...


FR Rules: 6912
PD Rules: 1637
PND Rules: 5275
Total FR = PD + PND: 6912
Calculating RR and non_RR rules...


RR Rules: 1
Indirect alfa-discriminatory rules: 1
non RR Rules: 5274
Calculating MR and PR rules...


MR Rules: 11
PR Rules: 1626


In [4]:
from IPython.display import display

# Build one row per alfa value, with each metric rounded to two decimals
table = []
for alfa, m in zip(alfa_list, metrics):
    row = [alfa, round(m.DDPD, 2), round(m.DDPP, 2), round(m.IDPD, 2), round(m.IDPP, 2)]
    table.append(row)
df = pd.DataFrame(table, columns=["Alfa", "DDPD", "DDPP", "IDPD", "IDPP"])
display(df)

| | Alfa | DDPD | DDPP | IDPD | IDPP |
|---|---|---|---|---|---|
| 0 | 1.0 | 0.8 | 0.45 | 1.0 | 0.62 |
| 1 | 1.1 | 1.0 | 0.98 | 1.0 | 0.64 |
| 2 | 1.2 | 1.0 | 0.98 | 1.0 | 0.75 |
