# Applying the principle of Ceteris paribus

<a href="https://colab.research.google.com/drive/17u12_U3BuNBk5ySuT54AXloKCn1LPAHQ" target="_blank">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">
</a>

Return to the [castle](https://github.com/Nkluge-correa/TeenyTinyCastle).

A possible criticism of fairness measures based only on statistical variations between groups and subgroups is that "_correlation is not causation._"

Think about it: can we say that individual $X$ (belonging to the _unprivileged group_) was assigned to the negative class _because_ the classifier more often favors the privileged group for the positive class? If "_correlation is not causation_", we cannot. Statistical fairness indices (_Statistical Parity, Equalized Odds, Equal Opportunity_, etc.) say something about the "_population_" and not the "_individual_."
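
To make the "population, not individual" point concrete, here is a minimal sketch of one such index, the statistical parity difference. The data frame `toy` and its values are hypothetical and only serve as an illustration; they are not taken from COMPAS.

```python
import pandas as pd

# Hypothetical predictions (not COMPAS): 1 = positive class, 0 = negative class
toy = pd.DataFrame({
    "group": ["privileged"] * 4 + ["unprivileged"] * 4,
    "prediction": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Rate of positive predictions inside each group
positive_rate = toy.groupby("group")["prediction"].mean()

# Statistical parity difference: unprivileged rate minus privileged rate
spd = positive_rate["unprivileged"] - positive_rate["privileged"]
print(positive_rate.to_dict(), spd)
```

The result is a single number describing the two groups as a whole. It says nothing about why any particular row received its prediction.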

Another class of fairness metrics (and interpretability tools) exists to address this issue. In ML Fairness, these methods are called "[Causality-Based Fairness](https://www.frontiersin.org/articles/10.3389/fdata.2022.892837/full)", and in ML Interpretability (XAI), we call these "_[What-if models](https://ema.drwhy.ai/ceterisParibus.html#ref-ICEbox)_" or "_Individual Conditional Expectations_."

<img src="https://static.propublica.org/projects/algorithmic-bias/assets/img/generated/opener-b-crop-1200*675-00796e.jpg" width="600"/>

Source: [ProPublica](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing).

**Correctional Offender Management Profiling for Alternative Sanctions** (`COMPAS`) is a case management and decision support tool developed and owned by Northpointe (now [Equivant](https://www.equivant.com/)) and used by U.S. courts to assess the likelihood of a defendant becoming a recidivist. In short, the `COMPAS` software uses an algorithm to assess potential recidivism risk. Northpointe created risk scales for general and **violent recidivism**, and for **pretrial misconduct**.

> **Note**: all datasets and models related to the course and repo are in the Hub. 🤗

A general critique of the use of proprietary software such as COMPAS is that since the algorithms it uses are **trade secrets**, they cannot be examined by the public and affected parties, which may violate due process. Another general criticism of machine-learning-based algorithms is that since they are data-dependent, the software will likely yield biased results if the data used in training is biased.

To test this critique, we will create a classifier from scratch and train it on the COMPAS dataset.
> **Note: For another interpretability/fairness analysis of a classifier trained on the COMPAS dataset, go to [this notebook](https://github.com/Nkluge-correa/TeenyTinyCastle/blob/master/ML-Explainability/Tabular/fairness_xai_COMPAS.ipynb).**



In [6]:
!pip install datasets -q

from datasets import load_dataset

# Load the datasets from the hub
dataset = load_dataset('AiresPucrs/COMPAS', split="train")

# Turn the datasets into a pandas.DataFrame
df = dataset.to_pandas()

After downloading our dataset, we drop the raw scores and categories that the original algorithm produced, keeping only a binary `label` derived from `score_text`. We will only use a subset of the features from the original dataset in this tutorial. Also, for simplicity, we are merging the `Low` and `Medium` labels to turn this classification task into a binary problem (fairness analyses are more straightforward in these cases).

> **Note:** `High` risk samples represent only 25% of our dataset, and these are precisely the cases we want to distinguish better.

In [7]:
import pandas as pd

# Create a binary label column: 0 = High risk, 1 = Low/Medium risk
df['label'] = df['score_text'].apply(lambda x: 0 if x == 'High' else 1)

# Select the features we will use
features = ['sex', 'age_cat', 'race',
        'juv_fel_count', 'juv_misd_count',
        'juv_other_count', 'priors_count',
        'days_b_screening_arrest', 'c_days_from_compas',
        'c_charge_degree', 'is_recid', 'is_violent_recid',
        'label']

df = df[features].dropna()
df.reset_index(inplace=True, drop=True)

with pd.option_context('display.max_columns', None):
    display(df)

| | sex | age_cat | race | juv_fel_count | juv_misd_count | juv_other_count | priors_count | days_b_screening_arrest | c_days_from_compas | c_charge_degree | is_recid | is_violent_recid | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Male | Greater than 45 | Other | 0 | 0 | 0 | 0 | -1.0 | 1.0 | (F3) | 0 | 0 | 1 |
| 1 | Male | Greater than 45 | Other | 0 | 0 | 0 | 0 | -1.0 | 1.0 | (F3) | 0 | 0 | 1 |
| 2 | Male | 25 - 45 | African-American | 0 | 0 | 0 | 0 | -1.0 | 1.0 | (F3) | 1 | 1 | 1 |
| 3 | Male | Less than 25 | African-American | 0 | 0 | 1 | 4 | -1.0 | 1.0 | (F3) | 1 | 0 | 1 |
| 4 | Male | Less than 25 | African-American | 0 | 0 | 1 | 4 | -1.0 | 1.0 | (F3) | 1 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 17014 | Female | 25 - 45 | African-American | 0 | 0 | 0 | 5 | -1.0 | 1.0 | (M1) | 0 | 0 | 1 |
| 17015 | Male | Greater than 45 | Other | 0 | 0 | 0 | 0 | -1.0 | 1.0 | (F2) | 0 | 0 | 1 |
| 17016 | Female | 25 - 45 | African-American | 0 | 0 | 0 | 3 | -1.0 | 1.0 | (M1) | 0 | 0 | 1 |
| 17017 | Female | Less than 25 | Hispanic | 0 | 0 | 0 | 2 | -2.0 | 2.0 | (F3) | 1 | 0 | 1 |


Now, let us see how our target is related to the sensitive attributes of our dataset (`age_cat`, `race`, and `sex`).

In [8]:
high_risk = []
low_risk = []

for element in list(df['sex'].unique()):
    # Count labels once per group; `.get` avoids a KeyError if a class is absent
    counts = df[df['sex'] == element]['label'].value_counts()
    high_risk.append(counts.get(0, 0))  # label 0 = High risk
    low_risk.append(counts.get(1, 0))   # label 1 = Low risk

import plotly.graph_objects as go

fig = go.Figure(data=[
    go.Bar(name='High Risk', x=list(df['sex'].unique()), y=high_risk),
    go.Bar(name='Low Risk', x=list(df['sex'].unique()), y=low_risk)
])

fig.update_layout(
    barmode='group',
    template='plotly_dark',
    xaxis_title="<b>Sex</b>",
    yaxis_title="<b>Risk by Sex</b>",
    title='Distribution of <i>Risk Scores</i> by "Sex"',
    paper_bgcolor='rgba(0, 0, 0, 0)',
    plot_bgcolor='rgba(0, 0, 0, 0)',
    )
fig.show()

high_risk = []
low_risk = []

for element in list(df['age_cat'].unique()):
    # Count labels once per group; `.get` avoids a KeyError if a class is absent
    counts = df[df['age_cat'] == element]['label'].value_counts()
    high_risk.append(counts.get(0, 0))  # label 0 = High risk
    low_risk.append(counts.get(1, 0))   # label 1 = Low risk

fig = go.Figure(data=[
    go.Bar(name='High Risk', x=list(df['age_cat'].unique()), y=high_risk),
    go.Bar(name='Low Risk', x=list(df['age_cat'].unique()), y=low_risk)
])

fig.update_layout(
    barmode='group',
    template='plotly_dark',
    xaxis_title="<b>Age</b>",
    yaxis_title="<b>Risk by Age</b>",
    title='Distribution of <i>Risk Scores</i> by "Age"',
    paper_bgcolor='rgba(0, 0, 0, 0)',
    plot_bgcolor='rgba(0, 0, 0, 0)',
    )
fig.show()

high_risk = []
low_risk = []

for element in list(df['race'].unique()):
    # Count labels once per group; `.get` avoids a KeyError if a class is absent
    counts = df[df['race'] == element]['label'].value_counts()
    high_risk.append(counts.get(0, 0))  # label 0 = High risk
    low_risk.append(counts.get(1, 0))   # label 1 = Low risk

fig = go.Figure(data=[
    go.Bar(name='High Risk', x=list(df['race'].unique()), y=high_risk),
    go.Bar(name='Low Risk', x=list(df['race'].unique()), y=low_risk)
])

fig.update_layout(
    barmode='group',
    template='plotly_dark',
    xaxis_title="<b>Race</b>",
    yaxis_title="<b>Risk by Race</b>",
    title='Distribution of <i>Risk Scores</i> by "Race"',
    paper_bgcolor='rgba(0, 0, 0, 0)',
    plot_bgcolor='rgba(0, 0, 0, 0)',
    )
fig.show()
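
The bar charts above show raw counts, which also reflect how large each group is. As a complementary view, the sketch below computes the share of each label inside every group. The frame `toy` is hypothetical; on the real data, the same `pd.crosstab` call could be applied to `df['race']` and `df['label']`.

```python
import pandas as pd

# Hypothetical data: group "A" is twice as large as group "B"
toy = pd.DataFrame({
    "race": ["A", "A", "A", "A", "B", "B"],
    "label": [0, 0, 1, 1, 0, 1],  # 0 = high risk, 1 = low risk (as in this notebook)
})

# Raw number of high-risk samples per group
high_risk_counts = toy[toy["label"] == 0]["race"].value_counts()

# Share of each label within every group (each row sums to 1)
rates = pd.crosstab(toy["race"], toy["label"], normalize="index")
high_risk_rate = rates[0]

print(high_risk_counts.to_dict(), high_risk_rate.to_dict())
```

In this toy case, group "A" holds more high-risk samples in absolute terms, yet both groups have the same high-risk rate, which is why counts and rates are worth reading side by side.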

Samples with the features `African-American`, `Male`, and `25 - 45` account for the bulk of this dataset's high-risk samples. Let us see if our sensitive attributes correlate with our label. If they do, this is already a sign that our future model could inherit these biases against a specific unprivileged class.

> **Note: To be able to calculate correlations, let us transform all categorical values into numbers.**

In [9]:
from sklearn.preprocessing import LabelEncoder
import plotly.express as px

corr_df = df.copy()

le = LabelEncoder()

# Encode every non-numeric column as integer codes (public API instead of `_get_numeric_data`)
for column in df.select_dtypes(exclude='number').columns:
    corr_df[column] = le.fit_transform(corr_df[column])

fig = px.imshow(corr_df.corr(numeric_only=True).values,
                labels=dict(x="Features", y="Features"),
                x=list(corr_df.columns),
                y=list(corr_df.columns),
                text_auto=True
                )

fig.update_xaxes(side='top')

fig.update_layout(template='plotly_dark',
                  title='Correlation Matrix',
                  coloraxis_showscale=False,
                  paper_bgcolor='rgba(0, 0, 0, 0)',
                  plot_bgcolor='rgba(0, 0, 0, 0)')
fig.show()

According to the correlation scores, `race` has an alarming 0.22 correlation with our label. Let us see how this will impact our future model.
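
One caveat when reading this number: `LabelEncoder` maps categories to integers in sorted order, so a Pearson correlation computed on a nominal feature such as `race` depends on that arbitrary ordering. A common alternative for two categorical variables is Cramér's V. The sketch below is not part of the original analysis; the helper `cramers_v` and the toy data are hypothetical, and no bias correction is applied.

```python
import numpy as np
import pandas as pd

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V between two categorical variables (0 = no association, 1 = perfect)."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: outer product of the marginals
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    k = min(observed.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))

# Hypothetical data, only to exercise the function
toy = pd.DataFrame({
    "race": ["A", "A", "B", "B", "C", "C"],
    "label": [0, 0, 1, 1, 0, 1],
})
v = cramers_v(toy["race"], toy["label"])
print(round(v, 3))
```

Unlike the encoded correlation, this value does not change if the categories are renamed or reordered.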

To deal with the classification problem, we will create a `RandomForestClassifier` using `scikit-learn`.

In [10]:
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.compose import make_column_transformer
from sklearn.metrics import confusion_matrix
from sklearn.pipeline import make_pipeline

import plotly.express as px

# Seed is set for reproducibility purposes
seed = 42

# Define features and labels
X, y = df.drop(columns='label'), df['label']

# Perform the train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=seed
)

# Pre-process all features
# `handle_unknown='ignore'` avoids errors on categories that were not seen during training
preprocess = make_column_transformer(
    (StandardScaler(), ['juv_fel_count', 'juv_misd_count', 'juv_other_count', 'priors_count', 'days_b_screening_arrest', 'c_days_from_compas']),
    (OneHotEncoder(handle_unknown='ignore'), ['sex', 'age_cat', 'race', 'c_charge_degree', 'is_recid', 'is_violent_recid']))

# Create an instance of a `RandomForestClassifier` (seeded so that results are reproducible)
model_rf = make_pipeline(
    preprocess,
    RandomForestClassifier(max_depth=3, n_estimators=500, random_state=seed))

# Fit the forest!
model_rf.fit(X_train, y_train.values.ravel())

# Evaluate
score = model_rf.score(X_test, y_test.values.ravel())
print(f'Accuracy (Random Forest): {score * 100:.2f} %')

# Plot results as a confusion matrix
preds = model_rf.predict(X_test)
matrix = confusion_matrix(y_test.values.ravel(), preds)

fig = px.imshow(matrix,
                labels=dict(x="Predicted", y="True label"),
                x=['High', 'Low'],
                y=['High', 'Low'],
                text_auto=True
                )

fig.update_xaxes(side='top')

fig.update_layout(template='plotly_dark',
                  title='Confusion Matrix (Random Forest Model)',
                  coloraxis_showscale=False,
                  paper_bgcolor='rgba(0, 0, 0, 0)',
                  plot_bgcolor='rgba(0, 0, 0, 0)')

fig.show()

Accuracy (Random Forest): 77.20 %
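
Accuracy alone can be hard to read on an imbalanced target. As a rough reference point, and assuming the test split keeps roughly the 25%/75% balance noted earlier, a baseline that always predicts the majority class would already score about 75%. The sketch below computes that baseline on toy labels (`y_toy` is hypothetical, not the COMPAS test set).

```python
import numpy as np

# Hypothetical labels mirroring the class balance: 25% high risk (0), 75% low risk (1)
y_toy = np.array([0] * 25 + [1] * 75)

# A "model" that always predicts the most frequent class
majority_class = np.bincount(y_toy).argmax()
baseline_preds = np.full_like(y_toy, majority_class)

baseline_accuracy = (baseline_preds == y_toy).mean()
print(f"Majority-class baseline accuracy: {baseline_accuracy:.2f}")
```

Comparing a model's accuracy against this kind of baseline, together with the confusion matrix, gives a better sense of how well the high-risk class is actually being distinguished.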


## Ceteris paribus & Counterfactual Fairness

_Ceteris paribus_ is a principle (and Latin phrase) meaning “_all else unchanged_”. Think of it this way:

> "_if everything else was the same, minus this change, then what?_"

Ceteris paribus methods focus on evaluating the effect of a selected explanatory variable in terms of changes in a model's prediction induced by changes in the variable's values, i.e., _what would be the model prediction if this single variable is different?_ The main goal of this methodology is to understand how changes in the values of the variable affect the model's predictions.
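
The idea described above can be sketched in a few lines of plain Python. This is a minimal, model-agnostic illustration: `cp_profile`, `toy_predict`, and the sample values are hypothetical names and data, and `toy_predict` is a made-up scoring rule standing in for a trained model.

```python
from typing import Any, Callable, Dict, List

def cp_profile(predict: Callable[[Dict[str, Any]], float],
               sample: Dict[str, Any],
               feature: str,
               values: List[Any]) -> Dict[Any, float]:
    """Return the prediction for each copy of `sample` where only `feature` changes."""
    profile = {}
    for value in values:
        counterfactual = {**sample, feature: value}  # copy; every other feature is kept
        profile[value] = predict(counterfactual)
    return profile

# Hypothetical scoring rule standing in for a trained model
def toy_predict(row: Dict[str, Any]) -> float:
    return min(1.0, 0.1 + 0.05 * row["priors_count"])

sample = {"age_cat": "25 - 45", "priors_count": 4}
profile = cp_profile(toy_predict, sample, "priors_count", [0, 4, 10])
print(profile)
```

Each entry of `profile` answers one "what if" question about the same individual, and the original `sample` is left untouched.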

This methodology has _appeal for the field of XAI_ as it can explain a single classification by _not relying on statistical evaluations of an entire population_, but rather on the _causal influence_ that certain _features_ have on the _classification of a model_.

A CP profile is nothing more than a profile of how the classification of a model varies in response to a change in a single explanatory variable/feature. And this can be used directly to measure something called [Counterfactual Fairness](https://arxiv.org/abs/1703.06856).

Counterfactual Fairness has a very intuitive and simple definition of fairness:

> An algorithm is said to be counterfactually fair if, _and only if_, the probability of a given prediction for individual $X$, a member of group $a$, would be unchanged even if we lived in a world where individual $X$ belonged to group $b$.

We could relax the "unchanged" aspect to something like, "_the individual would still be classified for the same class as before._"
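
For reference, the definition can be written formally in the notation of the paper linked above. Note the change of notation: in the formula, $A$ is the protected attribute, $X$ denotes the remaining observed features (not the individual), $U$ the latent background variables of a causal model, and $\hat{Y}$ the predictor. The predictor is counterfactually fair if, for every outcome $y$ and every attainable value $a'$ of $A$:

$$
P\left(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x, A = a\right) = P\left(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x, A = a\right)
$$

The relaxed version mentioned above would only ask that the most probable class be the same on both sides, rather than the full distribution.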

Now, let us implement a function that produces a CP profile for us. Then, we will use this function to evaluate if the trained model is _counterfactually fair_.

In [11]:
import plotly.graph_objects as go

def make_cp_profile(df, feature_name, sample, model):
    """
    Generate a Ceteris-paribus profile plot for a given feature.

    Parameters:
        df (pandas.DataFrame): The data used to collect the unique values
            of the feature.
        feature_name (str): The name of the feature for which the profile
            plot will be generated.
        sample (pandas.Series): The data point to be used for generating
            the plot.
        model (object): The fitted model used to generate the predictions
            (it must implement `predict_proba`).

    Returns:
        None. The Ceteris-paribus profile plot is displayed via `fig.show()`.

    """
    # Build one copy of the sample per unique value of the chosen feature,
    # changing only that feature and keeping everything else fixed
    # (this avoids chained assignment, which is unreliable in recent pandas)
    feature_values = list(df[feature_name].unique())
    sample_features = pd.DataFrame([sample] * len(feature_values)).reset_index(drop=True)
    sample_features[feature_name] = feature_values

    # Restore the original column dtypes (a row taken as a Series is `object`)
    sample_features = sample_features.astype(df[sample_features.columns].dtypes.to_dict())

    preds = model.predict_proba(sample_features)

    scores = []
    colors = []

    for i in range(len(preds)):

        if preds[i][0] > preds[i][1]:

            scores.append(-abs(preds[i][0] * 100))
            colors.append('red')

        else:

            scores.append(preds[i][1] * 100)
            colors.append('green')

    sample_features['model_score'] = scores
    sample_features['colors'] = colors

    fig = go.Figure(go.Bar(
        x=sample_features['model_score'],
        y=sample_features[feature_name],
        orientation='h',
        marker_color = sample_features.colors))

    fig.update_xaxes(ticksuffix = "%",
                    griddash='dash')

    fig.add_annotation(text="Positive Class",
                  xref="paper", yref="paper",
                  x=.9, y=1, showarrow=False)

    fig.add_annotation(text="Negative Class",
                  xref="paper", yref="paper",
                  x=.1, y=1, showarrow=False)

    fig.update_layout(
        xaxis=dict(
            tickmode='linear',
            tick0=0,
            dtick=10
        ),
        xaxis_range=[-100,100],
        template='plotly_dark',
        title_text=f'Ceteris-paribus Profile (Feature --> {feature_name})',
        paper_bgcolor='rgba(0, 0, 0, 0)',
        plot_bgcolor='rgba(0, 0, 0, 0)'

    )

    return  fig.show()

The above function takes as input the dataset used for training, an explanatory variable (feature), a sample to be evaluated, and a model trained on the dataset.

We use the dataset to collect all the unique values each variable can take. So, when we choose a variable (e.g., "_race_") and a sample (e.g., subject $X$), we create copies of that sample, each differing only in the value of that one variable, so that together they cover every value the variable can take. If our chosen variable were "_sex_", and the dataset contained only the values "_Male_" and "_Female_", we would have only two samples: one where $X$ is "_Male_", and one where $X$ is "_Female_".

With all possible variations of a single variable for the same sample, _we score each variation with the trained model_. For simplicity's sake, the above function will generate a _red bar_ if the sample was assigned to the _negative class_ (here, high risk) and a _green bar_ for the _positive class_ (low risk). Now, let us choose a sample to evaluate.

In [12]:
print("Sample 1: \n\n", X_test.iloc[12])

Sample 1: 

 sex                                    Male
age_cat                             25 - 45
race                       African-American
juv_fel_count                             2
juv_misd_count                            0
juv_other_count                           0
priors_count                             18
days_b_screening_arrest                -1.0
c_days_from_compas                      1.0
c_charge_degree                        (M1)
is_recid                                  1
is_violent_recid                          0
Name: 5838, dtype: object


Now, let us evaluate this sample, varying each selected feature over its possible unique values, for "_race_", "_priors_count_", and "_age_cat_".

In [13]:
make_cp_profile(X, 'race', X_test.iloc[12], model_rf)
make_cp_profile(X, 'priors_count', X_test.iloc[12], model_rf)
make_cp_profile(X, 'age_cat', X_test.iloc[12], model_rf)

Using the principle of _Ceteris paribus_, we can arrive at an interpretation very similar to the one in [our other notebook that uses the COMPAS dataset](https://github.com/Nkluge-correa/TeenyTinyCastle/blob/master/ML-Explainability/Tabular/fairness_xai_COMPAS.ipynb), but by using a completely different tool.

According to the generated graphs, when we talk about the features:

- **_race_**: the sample is only classified as high-risk if the value is "_African-American_". The sample with the highest probability for low risk is "_Caucasian_".
- **_priors_count_**: if the sample has more than _6 priors_, it is classified as high-risk.
- **_age_cat_**: only the category "_Greater than 45_" was classified for the low-risk class.

This interpretation is specific to this sample and shows how, "_if individual X were Caucasian_," all else being equal, they would be classified as low-risk rather than high.

Since race is a sensitive/protected attribute, and changing it alone flips the prediction for this individual, this model cannot be considered "fair" according to the definition of Counterfactual Fairness.
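
The check above concerns a single sample. One way to get a broader picture is to repeat it over many samples and measure how often the predicted class changes when only the protected attribute is altered. The sketch below is an assumption-laden illustration: `flip_rate`, `toy_predict`, and `toy_X` are hypothetical, and this input-level check is only a proxy, since the formal definition of Counterfactual Fairness relies on a causal model of how the attribute affects the other features.

```python
import pandas as pd

def flip_rate(predict, X: pd.DataFrame, feature: str, values) -> float:
    """Share of rows whose predicted class changes for at least one value of `feature`."""
    baseline = predict(X)
    flipped = pd.Series(False, index=X.index)
    for value in values:
        X_cf = X.copy()
        X_cf[feature] = value  # change only the chosen feature
        flipped |= predict(X_cf) != baseline
    return float(flipped.mean())

# Hypothetical classifier standing in for a trained model
def toy_predict(X: pd.DataFrame) -> pd.Series:
    return ((X["priors_count"] > 3) | (X["race"] == "B")).astype(int)

toy_X = pd.DataFrame({
    "race": ["A", "B", "A", "B"],
    "priors_count": [0, 0, 5, 5],
})
rate = flip_rate(toy_predict, toy_X, "race", ["A", "B"])
print(rate)
```

With the notebook's objects, a call such as `flip_rate(model_rf.predict, X_test, 'race', X['race'].unique())` would follow the same pattern. A rate of zero would mean that no prediction in the evaluated set depends on the attribute when everything else is held fixed.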

**Note: One can ask: "_Can we freeze all features representing a sample and change only one when the sample is a person?_" For a sociological and metaphysical discussion of this subject, we refer the reader to [this study](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9099231/).**

----

Return to the [castle](https://github.com/Nkluge-correa/TeenyTinyCastle).

