# AI Automation for AI Fairness
When AI models contribute to high-impact decisions such as whether or not someone gets a loan, we want them to be fair. Unfortunately, in current practice, AI models are often optimized primarily for accuracy, with little consideration for fairness. This notebook gives a hands-on example of how AI Automation can help build AI models that are both accurate and fair. It is written for data scientists who have some familiarity with Python. No prior knowledge of AI Automation or AI Fairness is required; we will introduce the relevant concepts as we get to them.

Bias in data leads to bias in models. AI models are increasingly consulted for consequential decisions about people, in domains including lending, hiring and retention, criminal justice, medicine, and more. Often, the model is trained on past decisions made by humans. If the decisions used for training were discriminatory, then your trained model will be too, unless you are careful. Being careful about bias is something you should do as a data scientist. Fortunately, you do not have to grapple with this issue alone. You can consult others about ethics. You can also ask yourself how your AI model may affect your (or your institution's) reputation. And ultimately, you must follow applicable laws and regulations.

AI Fairness can be measured via several metrics, and you need to select the appropriate metrics based on the circumstances. For illustration purposes, this notebook uses one particular fairness metric called disparate impact. Disparate impact is defined as the ratio of the rate of favorable outcome for the unprivileged group to that of the privileged group. To make this definition more concrete, consider the case where a favorable outcome means getting a loan, the unprivileged group is women, and the privileged group is men. Then if your AI model were to let women get a loan in 30% of the cases and men in 60% of the cases, the disparate impact would be 30% / 60% = 0.5, indicating a gender bias towards men. The ideal value for disparate impact is 1, and you could define fairness for this metric as a band around 1, e.g., from 0.8 to 1.25.
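
The definition above can be sketched in a few lines of plain Python. This is only an illustration of the formula, not the implementation used by AIF360 or Lale; the helper names `disparate_impact_ratio` and `is_fair` and the toy data are assumptions made for this example.

```python
def disparate_impact_ratio(y_pred, group, favorable=1, privileged=1):
    """Favorable-outcome rate of the unprivileged group divided by
    that of the privileged group."""
    def favorable_rate(in_privileged_group):
        outcomes = [y for y, g in zip(y_pred, group)
                    if (g == privileged) == in_privileged_group]
        # Raises ZeroDivisionError if the group is empty.
        return sum(y == favorable for y in outcomes) / len(outcomes)
    # Also undefined if the privileged group has no favorable outcomes.
    return favorable_rate(False) / favorable_rate(True)


def is_fair(di, lower=0.8):
    """Check whether di lies in the symmetric band [lower, 1 / lower]."""
    return lower <= di <= 1 / lower


# Toy data mirroring the loan example: 10 women (group 0), 3 approved;
# 10 men (group 1), 6 approved.
y_pred = [1] * 3 + [0] * 7 + [1] * 6 + [0] * 4
group = [0] * 10 + [1] * 10

di = disparate_impact_ratio(y_pred, group)
print(f"disparate impact {di:.2f}, fair: {is_fair(di)}")
```

With `lower=0.8` the upper bound is `1 / 0.8 = 1.25`, which matches the example band in the text.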

To get the best performance out of your AI model, you must experiment with its configuration. This means searching a high-dimensional space where some options are categorical, some are continuous, and some are even conditional. No configuration is optimal for all domains, let alone all metrics, and searching them all by hand is impossible. In fact, in a high-dimensional space, even exhaustively enumerating all the valid combinations soon becomes impractical. Fortunately, you can use tools to automate the search, thus making you more productive at finding good models quickly. These productivity and quality improvements compound when you have to repeat the search.

AI Automation is a technology that assists data scientists in building AI models by automating some of the tedious steps. One AI automation technique is algorithm selection, which automatically chooses among alternative algorithms for a particular task. Another AI automation technique is hyperparameter tuning, which automatically configures the arguments of AI algorithms. You can use AI automation to optimize for a variety of metrics. This notebook shows you how to use AI automation to optimize for both accuracy and fairness as measured by disparate impact.
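
To make algorithm selection and hyperparameter tuning concrete, here is a minimal sketch using plain random search over a small conditional space. It is deliberately simpler than a dedicated optimizer, and everything in it is hypothetical: the two algorithm names, their hyperparameter ranges, and `toy_score`, which stands in for a real cross-validated metric.

```python
import math
import random


def sample_config(rng):
    """Draw one configuration: pick an algorithm, then only its own arguments."""
    algorithm = rng.choice(["logistic_regression", "decision_tree"])  # categorical
    if algorithm == "logistic_regression":
        # Conditional and continuous: C only exists for this algorithm.
        return {"algorithm": algorithm, "C": 10 ** rng.uniform(-3, 3)}
    # Conditional and integer-valued: max_depth only exists for trees.
    return {"algorithm": algorithm, "max_depth": rng.randint(1, 10)}


def toy_score(config):
    """Hypothetical stand-in for a cross-validated score; higher is better."""
    if config["algorithm"] == "logistic_regression":
        return 0.74 - 0.01 * abs(math.log10(config["C"]))
    return 0.72 - 0.005 * abs(config["max_depth"] - 4)


def random_search(max_evals, seed=0):
    """Evaluate max_evals random configurations and keep the best one."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(max_evals)]
    return max(trials, key=toy_score)


best = random_search(max_evals=10)
print(best, round(toy_score(best), 3))
```

Swapping `toy_score` for a fairness-aware metric changes what "best" means without changing the search loop, which is the idea the rest of this notebook relies on.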

This Jupyter notebook uses the following open-source Python libraries. AIF360 is a collection of fairness metrics and bias mitigation algorithms. The pandas and scikit-learn libraries support data analysis and machine learning with data structures and a comprehensive collection of AI algorithms. The hyperopt library implements both algorithm selection and hyperparameter tuning for AI automation. And Lale is a library for semi-automated data science; this notebook uses Lale as the backbone for putting the other libraries together.

Our starting point is a dataset and a task. For illustration purposes, we picked credit-g, also known as the German Credit dataset. Each row describes a person using several features that may help evaluate them as a potential loan applicant. The task is to classify people into either good or bad credit risks. We load the version of the dataset from OpenML along with some fairness metadata.

In [1]:
from lale.lib.aif360 import fetch_creditg_df
all_X, all_y, fairness_info = fetch_creditg_df(preprocess=True)

# 
To see what the dataset looks like, we can use off-the-shelf functionality from pandas for inspecting a few rows. The creditg dataset has a single label column, class, to be predicted.

In [2]:
import pandas as pd
pd.options.display.max_columns = None
pd.concat([all_y, all_X], axis=1)

Unnamed: 0,class,checking_status_0<=X<200,checking_status_<0,checking_status_>=200,checking_status_no checking,credit_history_all paid,credit_history_critical/other existing credit,credit_history_delayed previously,credit_history_existing paid,credit_history_no credits/all paid,purpose_business,purpose_domestic appliance,purpose_education,purpose_furniture/equipment,purpose_new car,purpose_other,purpose_radio/tv,purpose_repairs,purpose_retraining,purpose_used car,savings_status_100<=X<500,savings_status_500<=X<1000,savings_status_<100,savings_status_>=1000,savings_status_no known savings,employment_1<=X<4,employment_4<=X<7,employment_<1,employment_>=7,employment_unemployed,other_parties_co applicant,other_parties_guarantor,other_parties_none,property_magnitude_car,property_magnitude_life insurance,property_magnitude_no known property,property_magnitude_real estate,other_payment_plans_bank,other_payment_plans_none,other_payment_plans_stores,housing_for free,housing_own,housing_rent,job_high qualif/self emp/mgmt,job_skilled,job_unemp/unskilled non res,job_unskilled resident,own_telephone_none,own_telephone_yes,foreign_worker_no,foreign_worker_yes,duration,credit_amount,installment_commitment,residence_since,age,existing_credits,num_dependents,sex
0,1,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,6.0,1169.0,4.0,4.0,1.0,2.0,1.0,1.0
1,0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,48.0,5951.0,2.0,2.0,0.0,1.0,1.0,0.0
2,1,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,12.0,2096.0,2.0,3.0,1.0,1.0,2.0,1.0
3,1,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,42.0,7882.0,2.0,4.0,1.0,1.0,2.0,1.0
4,0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,24.0,4870.0,3.0,4.0,1.0,2.0,2.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,1,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,12.0,1736.0,3.0,4.0,1.0,1.0,1.0,0.0
996,1,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,30.0,3857.0,4.0,4.0,1.0,1.0,1.0,1.0
997,1,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,12.0,804.0,4.0,4.0,1.0,1.0,1.0,1.0
998,0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,1.0,45.0,1845.0,4.0,4.0,0.0,1.0,1.0,1.0


In [3]:
fairness_info

{'favorable_labels': [1],
 'protected_attributes': [{'feature': 'sex', 'reference_group': [1]},
  {'feature': 'age', 'reference_group': [1]}]}

In [4]:
import lale.pretty_print
lale.pretty_print.ipython_display(fairness_info)

```python
{
    "favorable_labels": [1],
    "protected_attributes": [
        {"feature": "sex", "reference_group": [1]},
        {"feature": "age", "reference_group": [1]},
    ],
}
```

# 
A best practice for any machine-learning experiment is to split the data into a training set and a hold-out set. Doing so helps detect and prevent over-fitting. The fairness information induces groups in the dataset by outcomes and by privileged groups. We want the distribution of these groups to be similar for the training set and the hold-out set. Therefore, we split the data in a stratified way.
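
The idea behind a stratified split can be sketched in plain Python: form one stratum per combination of outcome and group membership, then split each stratum with the same proportion. This is an illustration under simplifying assumptions (a single binary protected attribute, toy data), not the implementation of `fair_stratified_train_test_split`; the name `stratified_split` is made up for this example.

```python
import random
from collections import defaultdict


def stratified_split(labels, groups, test_size=0.33, seed=42):
    """Split row indices so each (label, group) stratum keeps its share."""
    strata = defaultdict(list)
    for index, key in enumerate(zip(labels, groups)):
        strata[key].append(index)
    rng = random.Random(seed)
    train, test = [], []
    for key in sorted(strata):
        indices = strata[key]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy data: label 1 = favorable outcome, group 1 = privileged.
labels = [1] * 60 + [0] * 40
groups = ([1] * 40 + [0] * 20) + ([1] * 20 + [0] * 20)

train_idx, test_idx = stratified_split(labels, groups)
print(len(train_idx), len(test_idx))
```

Because every stratum is split separately, the share of favorable outcomes within each group stays roughly the same in both parts, up to rounding.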

In [5]:
from lale.lib.aif360 import fair_stratified_train_test_split
train_X, test_X, train_y, test_y = fair_stratified_train_test_split(
    all_X, all_y, **fairness_info, test_size=0.33, random_state=42)

# 
Let's use the disparate_impact metric to measure how biased the training data and the test data are. At 0.75 and 0.73, both fall below the 0.8 lower bound of the example fairness band and are far from the ideal value of 1.0.

In [6]:
from lale.lib.aif360 import disparate_impact
disparate_impact_scorer = disparate_impact(**fairness_info)
print("disparate impact of training data {:.2f}, test data {:.2f}".format(
    disparate_impact_scorer.scoring(X=train_X, y_pred=train_y),
    disparate_impact_scorer.scoring(X=test_X, y_pred=test_y)))

disparate impact of training data 0.75, test data 0.73


# 
Before we look at how to train a classifier that is optimized for both accuracy and disparate impact, we will set a baseline by training a pipeline that is optimized only for accuracy. For this purpose, we import a few operators from scikit-learn and Lale: Project picks a subset of the feature columns, OneHotEncoder turns categoricals into numbers, ConcatFeatures combines sets of feature columns, and the classifiers LR, PrejudiceRemover, and logisticaix360 make predictions. Tree is imported here for use in a later pipeline.

In [7]:
from lale.lib.lale import Project
from sklearn.preprocessing import OneHotEncoder
from lale.lib.lale import ConcatFeatures
from lale.lib.aif360 import PrejudiceRemover
from sklearn.linear_model import LogisticRegression as LR
from lale.lib.aif360 import logisticaix360
from sklearn.tree import DecisionTreeClassifier as Tree

In [8]:
import lale
lale.wrap_imported_operators()
prep_to_numbers = (
    (Project(columns={"type": "string"}) >> OneHotEncoder(handle_unknown="ignore"))
    & Project(columns={"type": "number"})
    ) >> ConcatFeatures
planned_orig = prep_to_numbers >> (
    PrejudiceRemover(**fairness_info) | logisticaix360(**fairness_info) | LR
)

In [9]:
# Accuracy
from lale.lib.lale import Hyperopt
best_estimator = planned_orig.auto_configure(
    train_X, train_y, optimizer=Hyperopt, verbose=True, cv=3, max_evals=10)

100%|███████████████████████████████████████████████| 10/10 [04:14<00:00, 25.48s/trial, best loss: -0.7418388319453343]


In [10]:
best_estimator.pretty_print(ipython_display=True, show_imports=False)

```python
project_0 = Project(columns={"type": "string"})
one_hot_encoder = OneHotEncoder(handle_unknown="ignore")
project_1 = Project(columns={"type": "number"})
lr = LR(
    fit_intercept=False,
    intercept_scaling=0.5926196753055677,
    max_iter=876,
    tol=0.007753646034635036,
)
pipeline = (
    ((project_0 >> one_hot_encoder) & project_1) >> ConcatFeatures() >> lr
)
```

# 
Among LR | PrejudiceRemover | logisticaix360, logistic regression (LR) gives the best result for the accuracy scorer.

In [11]:
import sklearn.metrics
accuracy_scorer = sklearn.metrics.make_scorer(sklearn.metrics.accuracy_score)
print(f'accuracy {accuracy_scorer(best_estimator, test_X, test_y):.1%}')

accuracy 73.0%


# 
However, we would like our model to be not just accurate but also fair. We can use the same disparate_impact_scorer from before to evaluate the fairness of best_estimator.

In [12]:
print(f'disparate impact {disparate_impact_scorer(best_estimator, test_X, test_y):.2f}')

disparate impact 0.76


In [13]:
planned_fairer = LR | PrejudiceRemover(**fairness_info) | logisticaix360(**fairness_info)

In [14]:
from lale.lib.aif360 import accuracy_and_disparate_impact
combined_scorer = accuracy_and_disparate_impact(**fairness_info)

In [15]:
combined_scorer

<lale.lib.aif360.util._AccuracyAndDisparateImpact at 0x1e961db8ca0>

In [16]:
from lale.lib.aif360 import FairStratifiedKFold
fair_cv = FairStratifiedKFold(**fairness_info, n_splits=3)

In [17]:
trained_fairer = planned_fairer.auto_configure(
    train_X, train_y, optimizer=Hyperopt, cv=fair_cv, verbose=True,
    max_evals=10, scoring=combined_scorer, best_score=1.0)


  0%|                                                                           | 0/10 [00:00<?, ?trial/s, best loss=?]

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.



 40%|███████████████████▏                            | 4/10 [01:13<01:21, 13.57s/trial, best loss: 0.30000133461456324]

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.



 50%|████████████████████████                        | 5/10 [01:51<01:53, 22.60s/trial, best loss: 0.30000133461456324]

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.



100%|███████████████████████████████████████████████| 10/10 [02:55<00:00, 17.50s/trial, best loss: 0.30000133461456324]


In [18]:
print(f'accuracy {accuracy_scorer(trained_fairer, test_X, test_y):.1%}')
print(f'disparate impact {disparate_impact_scorer(trained_fairer, test_X, test_y):.2f}')
#trained_fairer.visualize()

accuracy 70.0%
disparate impact 1.00


In [19]:
trained_fairer.pretty_print(ipython_display=True, show_imports=False)

```python
pipeline = logisticaix360(
    favorable_labels=[1],
    protected_attributes=[
        {"feature": "sex", "reference_group": [1]},
        {"feature": "age", "reference_group": [1]},
    ],
    lambda0=28061.60432549893,
    lambda1=19696.767526190495,
)
```

# 
Among LR | PrejudiceRemover | logisticaix360, logisticaix360 works best for the combined accuracy and disparate impact scorer.

In [20]:
from lale.lib.aif360 import DisparateImpactRemover
lale.pretty_print.ipython_display(
    DisparateImpactRemover.hyperparam_schema('repair_level'))

```python
{
    "description": "Repair amount from 0 = none to 1 = full.",
    "type": "number",
    "minimum": 0,
    "maximum": 1,
    "default": 1,
}
```

In [21]:
# Disparate impact
di_remover = DisparateImpactRemover(
    **fairness_info, preparation=prep_to_numbers)

planned_fairer = (
    (di_remover >> (Tree | LR))
    | PrejudiceRemover(**fairness_info)
    | logisticaix360(**fairness_info)
)


In [22]:

from lale.lib.aif360 import disparate_impact
combined_scorer1 = disparate_impact(**fairness_info)

In [23]:
from lale.lib.aif360 import FairStratifiedKFold
fair_cv1 = FairStratifiedKFold(**fairness_info, n_splits=3)

In [24]:
trained_fairer1 = planned_fairer.auto_configure(
    train_X, train_y, optimizer=Hyperopt, cv=fair_cv1, verbose=True,
    max_evals=10, scoring=combined_scorer1, best_score=1.0)


  0%|                                                                           | 0/10 [00:00<?, ?trial/s, best loss=?]

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.



 40%|█████████████████████████▌                                      | 4/10 [01:17<01:53, 18.98s/trial, best loss: 0.0]

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.



 50%|████████████████████████████████                                | 5/10 [01:40<01:43, 20.63s/trial, best loss: 0.0]

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.



100%|███████████████████████████████████████████████████████████████| 10/10 [02:28<00:00, 14.89s/trial, best loss: 0.0]


In [25]:
print(f'accuracy {accuracy_scorer(trained_fairer1, test_X, test_y):.1%}')
print(f'disparate impact {disparate_impact_scorer(trained_fairer1, test_X, test_y):.2f}')

accuracy 70.0%
disparate impact 1.00


In [26]:
trained_fairer1.pretty_print(ipython_display=True, show_imports=False)

```python
pipeline = logisticaix360(
    favorable_labels=[1],
    protected_attributes=[
        {"feature": "sex", "reference_group": [1]},
        {"feature": "age", "reference_group": [1]},
    ],
    lambda0=28061.60432549893,
    lambda1=19696.767526190495,
)
```

# 
With max_evals=10, logisticaix360 works best for the disparate impact scorer among (DisparateImpactRemover >> (Tree | LR)) | PrejudiceRemover | logisticaix360.

In [27]:
from lale.lib.aif360 import average_odds_difference
combined_scorer2 = average_odds_difference(**fairness_info)

In [28]:
from lale.lib.aif360 import FairStratifiedKFold
fair_cv2 = FairStratifiedKFold(**fairness_info, n_splits=3)

In [29]:
planned_fairer = LR | PrejudiceRemover(**fairness_info) | logisticaix360(**fairness_info)

In [30]:
trained_fairer2 = planned_fairer.auto_configure(
    train_X, train_y, optimizer=Hyperopt, cv=fair_cv2, verbose=True,
    max_evals=10, scoring=combined_scorer2, best_score=1.0)

100%|███████████████████████████████████████████████████████████████| 10/10 [01:13<00:00,  7.32s/trial, best loss: 1.0]


# 
Average odds difference is the mean of two differences between the unprivileged and privileged groups: the difference in false positive rate (false positives / actual negatives) and the difference in true positive rate (true positives / actual positives). The ideal value is 0.
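
As a worked illustration of this definition, the sketch below computes the metric by hand on toy data. It assumes binary labels with 1 as the favorable label and a single binary protected attribute; the names `rate` and `average_odds_diff` are made up for this example and are not part of Lale or AIF360.

```python
def rate(y_true, y_pred, actual, predicted=1):
    """Share of rows with true label `actual` that were predicted `predicted`."""
    hits = [p == predicted for t, p in zip(y_true, y_pred) if t == actual]
    # Raises ZeroDivisionError if no row has true label `actual`.
    return sum(hits) / len(hits)


def average_odds_diff(y_true, y_pred, group, privileged=1):
    """Mean of the FPR gap and the TPR gap, unprivileged minus privileged."""
    def rates(is_privileged):
        rows = [(t, p) for t, p, g in zip(y_true, y_pred, group)
                if (g == privileged) == is_privileged]
        truths, preds = zip(*rows)
        fpr = rate(truths, preds, actual=0)  # false positives / actual negatives
        tpr = rate(truths, preds, actual=1)  # true positives / actual positives
        return fpr, tpr
    fpr_u, tpr_u = rates(False)
    fpr_p, tpr_p = rates(True)
    return ((fpr_u - fpr_p) + (tpr_u - tpr_p)) / 2


# Toy data: the first four rows are unprivileged, the last four privileged.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]

aod = average_odds_diff(y_true, y_pred, group)
print(f"average odds difference {aod:.2f}")
```

A negative value means the unprivileged group has lower rates of favorable predictions on average, and a value of 0 means the two groups have matching rates.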

In [31]:
average_odds_difference_scorer = average_odds_difference(**fairness_info)

In [32]:
print(f'average_odds_difference_scorer {average_odds_difference_scorer(trained_fairer2, test_X, test_y):.1%}')


average_odds_difference_scorer 0.0%


In [33]:
trained_fairer2.pretty_print(ipython_display=True, show_imports=False)


```python
pipeline = PrejudiceRemover(
    favorable_labels=[1],
    protected_attributes=[
        {"feature": "sex", "reference_group": [1]},
        {"feature": "age", "reference_group": [1]},
    ],
    eta=27283.69166514559,
)
```

# 
With max_evals=10, PrejudiceRemover works best for the average odds difference scorer among LR | PrejudiceRemover | logisticaix360.

In [34]:
from lale.lib.aif360 import average_abs_odds_difference
from lale.lib.lale import Hyperopt
combined_scorer9 = average_abs_odds_difference(**fairness_info)
from lale.lib.aif360 import FairStratifiedKFold
fair_cv9 = FairStratifiedKFold(**fairness_info, n_splits=3)
planned_fairer9 = LR | PrejudiceRemover(**fairness_info) | logisticaix360(**fairness_info)
trained_fairer9 = planned_fairer9.auto_configure(
    train_X, train_y, optimizer=Hyperopt, cv=fair_cv9, verbose=True,
    max_evals=10, scoring=combined_scorer9, best_score=1.0)
average_abs_odds_difference_scorer = average_abs_odds_difference(**fairness_info)
print(f'average_abs_odds_difference_scorer {average_abs_odds_difference_scorer(trained_fairer9, test_X, test_y):.1%}')
trained_fairer9.pretty_print(ipython_display=True, show_imports=False)


100%|█████████████████████████████████████████████████| 10/10 [01:13<00:00,  7.37s/trial, best loss: 0.790953120953121]
average_abs_odds_difference_scorer 14.2%


```python
pipeline = LR(
    fit_intercept=False,
    intercept_scaling=0.3240599822843736,
    max_iter=839,
    solver="newton-cg",
    tol=0.009200093064280898,
)
```

# 
LR works best for the average absolute odds difference scorer among LR | PrejudiceRemover | logisticaix360.

In [35]:
from lale.lib.aif360 import equal_opportunity_difference
combined_scorer3 = equal_opportunity_difference(**fairness_info)

In [36]:
combined_scorer3

<lale.lib.aif360.util._ScorerFactory at 0x1e961dd3a30>

In [37]:
from lale.lib.aif360 import FairStratifiedKFold
fair_cv3 = FairStratifiedKFold(**fairness_info, n_splits=3)

In [38]:
planned_fairer = LR | PrejudiceRemover(**fairness_info) | logisticaix360(**fairness_info)

In [39]:
# Note: this reuses the name trained_fairer2, overwriting the earlier model.
trained_fairer2 = planned_fairer.auto_configure(
    train_X, train_y, optimizer=Hyperopt, cv=fair_cv3, verbose=True,
    max_evals=10, scoring=combined_scorer3, best_score=1.0)

100%|████████████████████████████████████████████████| 10/10 [01:13<00:00,  7.33s/trial, best loss: 0.9943589743589744]


# 
Equal opportunity difference is the difference in true positive rates between the unprivileged and privileged groups. The ideal value is 0.
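
A small hand computation may help here. The sketch assumes binary labels with 1 as the favorable label and a single binary protected attribute; `true_positive_rate` and `equal_opportunity_diff` are illustrative names, not library functions.

```python
def true_positive_rate(y_true, y_pred, favorable=1):
    """Share of truly favorable rows that were also predicted favorable."""
    preds = [p for t, p in zip(y_true, y_pred) if t == favorable]
    return sum(p == favorable for p in preds) / len(preds)


def equal_opportunity_diff(y_true, y_pred, group, privileged=1):
    """TPR of the unprivileged group minus TPR of the privileged group."""
    def tpr_for(is_privileged):
        rows = [(t, p) for t, p, g in zip(y_true, y_pred, group)
                if (g == privileged) == is_privileged]
        return true_positive_rate(*zip(*rows))
    return tpr_for(False) - tpr_for(True)


# Toy data: each group has four truly creditworthy applicants.
# The model approves 2 of 4 in group 0 and 3 of 4 in group 1.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1, 0, 1]
group = [0] * 5 + [1] * 5

eod = equal_opportunity_diff(y_true, y_pred, group)
print(f"equal opportunity difference {eod:.2f}")
```

Only rows whose true label is favorable enter the computation, so the false positive in the privileged group does not affect the result.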

In [40]:
equal_opportunity_difference_scorer = equal_opportunity_difference(**fairness_info)
print(f'equal_opportunity_difference_scorer {equal_opportunity_difference_scorer(trained_fairer2, test_X, test_y):.1%}')

equal_opportunity_difference_scorer 0.0%


In [41]:
trained_fairer2.pretty_print(ipython_display=True, show_imports=False)

```python
pipeline = LR(
    dual=True,
    fit_intercept=False,
    intercept_scaling=0.45366534380629886,
    max_iter=196,
    multi_class="ovr",
    solver="liblinear",
    tol=0.0002093662203197455,
)
```

# 
With max_evals=10, LR works best for the equal opportunity difference scorer among LR | PrejudiceRemover | logisticaix360.

In [42]:
from lale.lib.aif360 import r2_and_disparate_impact
combined_scorer4 = r2_and_disparate_impact(**fairness_info)

In [43]:
from lale.lib.aif360 import FairStratifiedKFold
fair_cv4 = FairStratifiedKFold(**fairness_info, n_splits=3)

In [44]:
planned_fairer4 = LR | PrejudiceRemover(**fairness_info) | logisticaix360(**fairness_info)

In [45]:
trained_fairer4 = planned_fairer4.auto_configure(
    train_X, train_y, optimizer=Hyperopt, cv=fair_cv4, verbose=True,
    max_evals=10, scoring=combined_scorer4, best_score=1.0)

  0%|                                                                           | 0/10 [00:00<?, ?trial/s, best loss=?]

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.

unexpected result -0.42948717948717885 for r2 -0.42948717948717907

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.



 10%|████▌                                        | 1/10 [00:22<03:26, 22.99s/trial, best loss: 2.2685489775901924e+38]

unexpected result -0.4267515923566876 for r2 -0.4267515923566878

unexpected result -0.42948717948717885 for r2 -0.42948717948717907



 20%|█████████▊                                       | 2/10 [00:28<01:40, 12.57s/trial, best loss: 1.4285753171103486]

unexpected result -0.4267515923566876 for r2 -0.4267515923566878

unexpected result -0.42948717948717885 for r2 -0.42948717948717907



 30%|██████████████▋                                  | 3/10 [00:33<01:04,  9.27s/trial, best loss: 1.4285753171103486]

unexpected result -0.4267515923566876 for r2 -0.4267515923566878

unexpected result -0.42948717948717885 for r2 -0.42948717948717907



 40%|███████████████████▌                             | 4/10 [00:34<00:35,  5.88s/trial, best loss: 1.4285753171103486]

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.

unexpected result -0.42948717948717885 for r2 -0.42948717948717907

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.



 50%|████████████████████████▌                        | 5/10 [00:58<01:03, 12.66s/trial, best loss: 1.4285753171103486]

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.

unexpected result -0.42948717948717885 for r2 -0.42948717948717907

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.



 60%|█████████████████████████████▍                   | 6/10 [01:22<01:05, 16.29s/trial, best loss: 1.4285753171103486]

unexpected result -0.4267515923566876 for r2 -0.4267515923566878

unexpected result -0.42948717948717885 for r2 -0.42948717948717907



 70%|██████████████████████████████████▎              | 7/10 [01:23<00:33, 11.22s/trial, best loss: 1.4285753171103486]

unexpected result -0.4267515923566876 for r2 -0.4267515923566878

unexpected result -0.42948717948717885 for r2 -0.42948717948717907



 90%|████████████████████████████████████████████     | 9/10 [01:29<00:06,  6.75s/trial, best loss: 1.4285753171103486]

unexpected result -0.4267515923566876 for r2 -0.4267515923566878

unexpected result -0.42948717948717885 for r2 -0.42948717948717907



100%|████████████████████████████████████████████████| 10/10 [01:30<00:00,  9.04s/trial, best loss: 1.4285753171103486]


In [46]:
r2_and_disparate_impact_scorer = r2_and_disparate_impact(**fairness_info)
print(f'r2_and_disparate_impact_scorer {r2_and_disparate_impact_scorer(trained_fairer4, test_X, test_y):.1%}')

r2_and_disparate_impact_scorer -42.9%


In [47]:
trained_fairer4.pretty_print(ipython_display=True, show_imports=False)

```python
pipeline = logisticaix360(
    favorable_labels=[1],
    protected_attributes=[
        {"feature": "sex", "reference_group": [1]},
        {"feature": "age", "reference_group": [1]},
    ],
    lambda0=28061.60432549893,
    lambda1=19696.767526190495,
)
```

# 
Among LR | PrejudiceRemover | logisticaix360, logisticaix360 works best for the combined R2 and disparate impact scorer.

In [48]:
from lale.lib.aif360 import statistical_parity_difference
combined_scorer5 = statistical_parity_difference(**fairness_info)

In [49]:
combined_scorer5

<lale.lib.aif360.util._ScorerFactory at 0x1e96439dac0>

In [50]:
from lale.lib.aif360 import FairStratifiedKFold
fair_cv5 = FairStratifiedKFold(**fairness_info, n_splits=3)

In [51]:
planned_fairer5 = LR | PrejudiceRemover(**fairness_info) | logisticaix360(**fairness_info)

In [52]:
trained_fairer5 = planned_fairer5.auto_configure(
    train_X, train_y, optimizer=Hyperopt, cv=fair_cv5, verbose=True,
    max_evals=10, scoring=combined_scorer5, best_score=1.0)

100%|███████████████████████████████████████████████████████████████| 10/10 [01:10<00:00,  7.09s/trial, best loss: 1.0]


In [53]:
statistical_parity_difference_scorer = statistical_parity_difference(**fairness_info)
print(f'statistical_parity_difference_scorer {statistical_parity_difference_scorer(trained_fairer5, test_X, test_y):.1%}')

statistical_parity_difference_scorer 0.0%


In [54]:
trained_fairer5.pretty_print(ipython_display=True, show_imports=False)

```python
pipeline = PrejudiceRemover(
    favorable_labels=[1],
    protected_attributes=[
        {"feature": "sex", "reference_group": [1]},
        {"feature": "age", "reference_group": [1]},
    ],
    eta=27283.69166514559,
)
```

# 
With max_evals=10, PrejudiceRemover works best for the statistical parity difference scorer among LR | PrejudiceRemover | logisticaix360.

In [55]:
from lale.lib.aif360 import symmetric_disparate_impact
combined_scorer6 = symmetric_disparate_impact(**fairness_info)

In [56]:
from lale.lib.aif360 import FairStratifiedKFold
fair_cv6 = FairStratifiedKFold(**fairness_info, n_splits=3)

In [57]:
planned_fairer6 = LR | PrejudiceRemover(**fairness_info) | logisticaix360(**fairness_info)

In [58]:
trained_fairer6 = planned_fairer6.auto_configure(
    train_X, train_y, optimizer=Hyperopt, cv=fair_cv6, verbose=True,
    max_evals=10, scoring=combined_scorer6, best_score=1.0)

  0%|                                                                           | 0/10 [00:00<?, ?trial/s, best loss=?]

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.



 40%|█████████████████████████▌                                      | 4/10 [00:28<00:28,  4.83s/trial, best loss: 0.0]

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.



 50%|████████████████████████████████                                | 5/10 [00:46<00:48,  9.67s/trial, best loss: 0.0]

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.

there are 0 positives in the privileged group

there are 0 positives in the unprivileged group

The metric disparate_impact is ill-defined and returns 0.0. Check your fairness configuration. The set of predicted labels is {0}.



100%|███████████████████████████████████████████████████████████████| 10/10 [01:11<00:00,  7.18s/trial, best loss: 0.0]


In [59]:
symmetric_disparate_impact_scorer = symmetric_disparate_impact(**fairness_info)
print(f'symmetric_disparate_impact_scorer {symmetric_disparate_impact_scorer(trained_fairer6, test_X, test_y):.1%}')

symmetric_disparate_impact_scorer 100.0%


In [60]:
trained_fairer6.pretty_print(ipython_display=True, show_imports=False)

```python
pipeline = logisticaix360(
    favorable_labels=[1],
    protected_attributes=[
        {"feature": "sex", "reference_group": [1]},
        {"feature": "age", "reference_group": [1]},
    ],
    lambda0=28061.60432549893,
    lambda1=19696.767526190495,
)
```

#  
Among the choices LR | PrejudiceRemover | logisticaix360, the logisticaix360 model works best for the above scorer.
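
To make the scorer and the warnings in the search log more concrete, here is a minimal sketch of how disparate impact and a symmetric variant can be computed from two favorable-outcome rates. The function names are hypothetical and not part of Lale. The sketch assumes that the symmetric variant inverts values above 1 so that the result always lies between 0 and 1, and it mirrors the log above by returning 0.0 when the ratio is ill-defined.

```python
def disparate_impact_sketch(rate_unprivileged, rate_privileged):
    """Ratio of favorable-outcome rates (hypothetical helper, not Lale's API)."""
    if rate_privileged == 0:
        # Ill-defined ratio: no favorable predictions in the privileged group.
        # Returning 0.0 mirrors the behavior reported in the log above.
        return 0.0
    return rate_unprivileged / rate_privileged


def symmetric_disparate_impact_sketch(rate_unprivileged, rate_privileged):
    """Assumption: values above 1 are inverted so that 1.0 is the best score."""
    di = disparate_impact_sketch(rate_unprivileged, rate_privileged)
    return di if di <= 1.0 else 1.0 / di


# Loan example from the introduction: women 30%, men 60%.
di_loans = disparate_impact_sketch(0.30, 0.60)
# Swapping the groups gives a ratio of 2.0, which the symmetric variant maps back to 0.5.
sdi_swapped = symmetric_disparate_impact_sketch(0.60, 0.30)
# A model that predicts only the unfavorable label gives both groups a rate of 0.
sdi_all_negative = symmetric_disparate_impact_sketch(0.0, 0.0)
```

Under these assumptions, a bias in either direction lowers the symmetric score, so an optimizer cannot improve it by overcorrecting in favor of the unprivileged group.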

In [61]:
from lale.lib.aif360 import theil_index
combined_scorer7 = theil_index(**fairness_info)

In [62]:
from lale.lib.aif360 import FairStratifiedKFold
fair_cv7 = FairStratifiedKFold(**fairness_info, n_splits=3)

In [63]:
planned_fairer7 = LR | PrejudiceRemover(**fairness_info) | logisticaix360(**fairness_info)

In [64]:
trained_fairer7 = planned_fairer7.auto_configure(
    train_X, train_y, optimizer=Hyperopt, cv=fair_cv7, verbose=True,
    max_evals=10, scoring=combined_scorer7, best_score=1.0)

100%|████████████████████████████████████████████████| 10/10 [01:16<00:00,  7.62s/trial, best loss: 0.1776649221048826]


# 
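The Theil index measures inequality in how benefits are allocated across individuals. A value of 0 indicates a perfectly equal allocation, and larger values indicate more inequality.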
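
As a rough illustration of what this metric computes, the sketch below implements the Theil index by hand with NumPy. It assumes the convention used in AIF360, where the benefit of individual i is b_i = ŷ_i - y_i + 1 and the index is the mean of (b_i / μ) · ln(b_i / μ), with μ the mean benefit and 0 · ln 0 treated as 0. The function name is hypothetical, and this is not the scorer implementation used in the cells of this notebook.

```python
import numpy as np


def theil_index_sketch(y_true, y_pred):
    """Hand-rolled Theil index over per-individual benefits (illustrative only)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Assumed benefit definition: 0 for a false negative, 1 for a correct
    # prediction, 2 for a false positive.
    benefit = y_pred - y_true + 1.0
    # Note: the mean benefit must be nonzero for the ratio to be defined.
    ratio = benefit / benefit.mean()
    # Treat 0 * log(0) as 0 by only evaluating the log where ratio > 0.
    terms = np.zeros_like(ratio)
    positive = ratio > 0
    terms[positive] = ratio[positive] * np.log(ratio[positive])
    return float(terms.mean())


# Every prediction is correct, so everyone receives the same benefit.
theil_equal = theil_index_sketch([1, 0, 1, 0], [1, 0, 1, 0])
# One false negative and one false positive make the benefits unequal.
theil_unequal = theil_index_sketch([1, 0, 1, 0], [0, 0, 1, 1])
```

In this sketch, identical benefits give an index of 0, and any mix of false positives and false negatives pushes the index above 0.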

In [65]:
theil_index_scorer = theil_index(**fairness_info)
print(f'theil_index_scorer {theil_index_scorer(trained_fairer7, test_X, test_y):.1%}')

theil_index_scorer 120.4%


In [66]:
trained_fairer7.pretty_print(ipython_display=True, show_imports=False)

```python
pipeline = PrejudiceRemover(
    favorable_labels=[1],
    protected_attributes=[
        {"feature": "sex", "reference_group": [1]},
        {"feature": "age", "reference_group": [1]},
    ],
    eta=27283.69166514559,
)
```

# 
Among the choices LR | PrejudiceRemover | logisticaix360, the PrejudiceRemover model works best for the Theil index scorer.

In [67]:

from lale.lib.aif360 import false_omission_rate_difference
combined_scorer8 = false_omission_rate_difference(**fairness_info)

In [68]:
from lale.lib.aif360 import FairStratifiedKFold
fair_cv7 = FairStratifiedKFold(**fairness_info, n_splits=3)

In [69]:
planned_fairer7 = LR | PrejudiceRemover(**fairness_info) | logisticaix360(**fairness_info)

In [70]:
from lale.lib.lale import Hyperopt
trained_fairer8 = planned_fairer7.auto_configure(
    train_X, train_y, optimizer=Hyperopt, cv=fair_cv7,verbose=True,
    max_evals=10, scoring=combined_scorer8, best_score=1.0)

100%|███████████████████████████████████████████████████████████████| 10/10 [01:17<00:00,  7.79s/trial, best loss: 1.0]


In [71]:
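false_omission_rate_difference_scorer = false_omission_rate_difference(**fairness_info)
# Caution: this call scores trained_fairer7 (the Theil index run),
# not trained_fairer8 from the search in the previous cell.
print(f'false_omission_rate_difference {false_omission_rate_difference_scorer(trained_fairer7, test_X, test_y):.1%}')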

false_omission_rate_difference -19.7%


In [72]:
trained_fairer8.pretty_print(ipython_display=True, show_imports=False)

```python
pipeline = logisticaix360(
    favorable_labels=[1],
    protected_attributes=[
        {"feature": "sex", "reference_group": [1]},
        {"feature": "age", "reference_group": [1]},
    ],
    lambda0=28061.60432549893,
    lambda1=19696.767526190495,
)
```

# 
Among the choices LR | PrejudiceRemover | logisticaix360, the logisticaix360 model works best for the above scorer.

In [73]:
from lale.lib.aif360 import consistency_score


In [74]:
fairness_info

{'favorable_labels': [1],
 'protected_attributes': [{'feature': 'sex', 'reference_group': [1]},
  {'feature': 'age', 'reference_group': [1]}]}

In [75]:
combined_scorer10 = consistency_score(**fairness_info)

In [76]:
from lale.lib.aif360 import FairStratifiedKFold
fair_cv10 = FairStratifiedKFold(**fairness_info, n_splits=3)

In [77]:
planned_fairer10 = LR | PrejudiceRemover(**fairness_info) | logisticaix360(**fairness_info)

In [78]:
from lale.lib.lale import Hyperopt
from sklearn.neighbors import NearestNeighbors
trained_fairer10 = planned_fairer10.auto_configure(
    train_X, train_y, optimizer=Hyperopt, cv=fair_cv10,verbose=True,
    max_evals=10, scoring=combined_scorer10, best_score=1.0)

100%|███████████████████████████████████████████████████████████████| 10/10 [00:51<00:00,  5.14s/trial, best loss: 0.0]


In [79]:
consistency_scorer = consistency_score(**fairness_info)
print(f'consistency_scorer {consistency_scorer(trained_fairer10, test_X, test_y):.1%}')

consistency_scorer 100.0%


In [85]:
trained_fairer10.pretty_print(ipython_display=True, show_imports=False)

```python
pipeline = PrejudiceRemover(
    favorable_labels=[1],
    protected_attributes=[
        {"feature": "sex", "reference_group": [1]},
        {"feature": "age", "reference_group": [1]},
    ],
    eta=27283.69166514559,
)
```

In [None]:
class MyClass:
    def __init__(self, a):
        self.a = a

    def add_one_to_a(self):
        self.a += 1


def test_method_add_one_to_a():
    initial_a = 1
    instance = MyClass(a=1)
    assert instance.a == initial_a  # we expect this to be 1
    instance.add_one_to_a()  # instance.a is now 2
    assert instance.a == initial_a + 1  # we expect this to be 2

In [105]:
class MyClass:
    def __init__(self, a):
        self.a = a

    def add_one_to_a(self):
        self.a += 1
        print(self.a)

    def test_method_add_one_to_a(self):
        initial_a = 1
        instance = MyClass(a=1)
        assert instance.a == initial_a  # we expect this to be 1
        instance.add_one_to_a()  # instance.a is now 2
        assert instance.a == initial_a + 1  # we expect this to be 2

In [106]:
a=MyClass(6)
a.add_one_to_a()
a.test_method_add_one_to_a()


7
2


In [96]:
def test_method_add_one_to_a():
    initial_a = 1
    instance = MyClass(a=1)
    assert instance.a == initial_a  # we expect this to be 1
    instance.add_one_to_a()  # instance.a is now 2
    assert instance.a == initial_a + 1  # we expect this to be 2

# 
With max_evals=10, the PrejudiceRemover model works best for the consistency score scorer among the choices LR | PrejudiceRemover | logisticaix360.


# 
Note: The best model depends on the value of max_evals, so a different value may select a different model.