# Fairness processors

In this notebook we showcase the different fairness processors we have implemented through a simple use case in which we debias the *BERT* model.

Fairness processors can be classified according to the part of the machine learning pipeline they are introduced in:

1. Pre processors: if they are introduced before the model has been trained.
1. In processors: if they are introduced during the process of training the model.
1. Post processors: if they are introduced after the training step.
1. Intra processors: additionally, we speak of *intra processors* when referring to fairness methods that do not modify a model's parameters. This notion overlaps with that of post processors and can be deemed equivalent.

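To showcase these methods, we run them on the IMDb data set without further tuning, as this notebook is only intended to serve as a proof of concept. Note that several methods modify the loss function (for instance, through reweighting or regularization terms), so the reported losses are not directly comparable across methods.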

# Imports

In [1]:
# Standard libraries
import sys
import os

# Pytorch
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torch.optim import AdamW

# Hugging face
from transformers import (
    BertForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)
from datasets import (
    load_dataset,
    Dataset
)


# Custom imports: when running from a local clone of the repository,
# add its root folder to the path so that FairLangProc can be imported
LOCAL = True
if LOCAL:
    ROOT_PATH = os.path.abspath(os.path.join(os.path.dirname(__file__), "..")) \
        if "__file__" in globals() else os.path.abspath("..")
    sys.path.insert(0, ROOT_PATH)

# Preliminaries

In [None]:
# Use GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

# Load BERT
def get_bert():
    return BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
        )

TOKENIZER = AutoTokenizer.from_pretrained('bert-base-uncased')
BERT = get_bert()
HIDDEN_DIM_BERT = BERT.config.hidden_size

# Download data set, tokenize
imdb = load_dataset("imdb")

def tokenize_function(example):
    return TOKENIZER(
        example["text"],
        padding="max_length",
        truncation=True,
        max_length=128
        )

dataset = imdb.map(tokenize_function, batched=True)
dataset.set_format(
    type="torch", columns=["input_ids", "attention_mask", "label"]
    )

# Train test split
train_dataset = dataset["train"]
val_dataset = dataset["test"]

# Trainer configuration
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=1,
    fp16=True,
    save_safetensors=False, 
    weight_decay=0.1,
    logging_dir="./logs",
    logging_steps=10,
)

cuda


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


## Base model

As a baseline, we fine-tune *BERT* for one epoch with a `Trainer`:

In [None]:
trainer = Trainer(
    model=BERT,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    optimizers=(
        AdamW(BERT.parameters(), lr=1e-5, weight_decay=0.1),
        None
        )
)

trainer.train()
results = trainer.evaluate()
print(results)

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 0.3273        | 0.293338        |


{'eval_loss': 0.2933380901813507, 'eval_runtime': 82.3371, 'eval_samples_per_second': 303.63, 'eval_steps_per_second': 9.498, 'epoch': 1.0}


# Pre processors

Pre processors are methods that only affect the model's inputs and do not change its parameters. We have implemented:

1. Counterfactual Data Augmentation (CDA).
1. BLIND debiasing.
1. Projection based debiasing.

## CDA

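CDA augments the data by swapping words that carry information about the sensitive attribute (e.g. feminine vs. masculine words) with their counterparts. This procedure is implemented by the `CDA` function, which takes a batch of examples (a dictionary of columns, such as `imdb['train'][:]`) together with a dictionary of word pairs, and returns the augmented batch as a dictionary that can be turned back into a Hugging Face `Dataset`.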
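The swapping step can be sketched in plain Python as follows. This is a minimal illustration under our own assumptions, not the `CDA` implementation: the helper `counterfactual_augment` is hypothetical. It makes the word mapping bidirectional, matches whole words case-insensitively, and appends a swapped copy only of those examples that contain at least one listed word.

```python
import re


def counterfactual_augment(batch, pairs, text_column="text"):
    """Hypothetical CDA sketch: return `batch` plus a word-swapped copy of
    every example whose text contains at least one word from `pairs`.

    batch: dict of columns (lists), e.g. {"text": [...], "label": [...]}
    pairs: dict mapping one term of each pair to the other, e.g. {"he": "she"}
    """
    # Make the mapping bidirectional (he -> she and she -> he)
    mapping = {}
    for a, b in pairs.items():
        mapping[a.lower()] = b.lower()
        mapping[b.lower()] = a.lower()

    # Whole-word, case-insensitive match of any term in the mapping
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, mapping)) + r")\b", re.IGNORECASE
    )

    def swap(match):
        word = match.group(0)
        new = mapping[word.lower()]
        # Keep an initial capital letter (e.g. "He" -> "She")
        return new.capitalize() if word[0].isupper() else new

    # Start from a copy of the original columns
    out = {col: list(values) for col, values in batch.items()}
    for i, text in enumerate(batch[text_column]):
        flipped = pattern.sub(swap, text)
        if flipped != text:  # only augment examples that mention a listed word
            for col, values in batch.items():
                out[col].append(flipped if col == text_column else values[i])
    return out


batch = {"text": ["He is a great actor.", "A dull plot."], "label": [1, 0]}
augmented = counterfactual_augment(batch, {"he": "she", "actor": "actress"})
print(augmented["text"])
```

Because unchanged examples are not duplicated in this sketch, the augmented set is smaller than twice the original whenever some examples mention none of the listed words.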

In [None]:
from FairLangProc.algorithms.preprocessors import CDA

gendered_pairs = [
    ('he', 'she'),
    ('him', 'her'),
    ('his', 'hers'),
    ('actor', 'actress'),
    ('priest', 'nun'),
    ('father', 'mother'),
    ('dad', 'mom'),
    ('daddy', 'mommy'),
    ('waiter', 'waitress'),
    ('James', 'Jane')
    ]

cda_train = Dataset.from_dict(
        CDA(imdb['train'][:], pairs = dict(gendered_pairs))
)

train_CDA = cda_train.map(tokenize_function, batched=True)
train_CDA.set_format(
    type="torch", columns=["input_ids", "attention_mask", "label"]
)

# Check differences
print(f"Length of original train data set: {len(train_dataset)}")
print(f"Length of CDA augmented train data set: {len(cda_train)}")

# Train model
CDAModel = get_bert()

trainer = Trainer(
    model=CDAModel,
    args=training_args,
    train_dataset=train_CDA,
    eval_dataset=val_dataset,
    optimizers=(
        AdamW(CDAModel.parameters(), lr=2e-5, weight_decay=0.01),
        None
        )
)

trainer.train()
results = trainer.evaluate()
print(results)

Length of original train data set: 25000
Length of CDA augmented train data set: 39684


| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 0.1953        | 0.272441        |


{'eval_loss': 0.27244114875793457, 'eval_runtime': 39.2718, 'eval_samples_per_second': 636.589, 'eval_steps_per_second': 19.912, 'epoch': 1.0}


## BLIND debiasing

BLIND debiasing incorporates an auxiliary classifier, $g_{B}$, which is tasked with predicting whether the base model will succeed at the task for a given training instance. Each training instance is then reweighted according to the probability that $g_{B}$ assigns to the base model performing the task correctly, and the loss is modified accordingly:

$$     \mathcal{L}_{BLIND} = \left(1 - \sigma \left( g_{B}(h; \theta_{B} ) \right) \right)^{\gamma} \mathcal{L}^{task}(\hat{y}, y), $$

where $h$ is the hidden representation of the input, $\theta_{B}$ are the parameters of $g_{B}$, $\sigma$ turns the output of $g_{B}$ into a probability of success, and $\gamma$ is a hyper-parameter. Instances that the base model is likely to get right are therefore down-weighted.
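A minimal sketch of the reweighted loss follows. It assumes that $g_{B}$ outputs one success logit per example and that $\sigma$ is the logistic sigmoid. With a two-logit head such as `BLINDClassifier` in the cell below, the softmax probability of the success class plays the same role. The function `blind_loss` is hypothetical and only illustrates the formula; it is not the `BLINDTrainer` code.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def blind_loss(task_losses, success_logits, gamma=2.0):
    """Hypothetical sketch of the BLIND loss, averaged over a batch.

    task_losses:    per-example task losses L^task(y_hat, y)
    success_logits: per-example outputs g_B(h; theta_B) of the success detector
    gamma:          hyper-parameter; gamma = 0 recovers the plain task loss
    """
    total = 0.0
    for loss, logit in zip(task_losses, success_logits):
        p_success = sigmoid(logit)            # probability the base model succeeds
        weight = (1.0 - p_success) ** gamma   # likely successes get small weights
        total += weight * loss
    return total / len(task_losses)


# Same task loss, different predicted chances of success
print(blind_loss([0.7, 0.7], [3.0, -3.0]))
```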

The implementation of BLIND is given by the `BLINDModel` abstract class, which requires the implementation of three abstract methods:

1. `_get_loss`: sets the `self.loss_fct` attribute to the desired loss.
1. `_loss`: computes the value of `self.loss_fct` for a training instance.
1. `_get_embedding`: retrieves the hidden representation of a given input.

We have implemented `BLINDModelForClassification` to handle classification tasks. It sets the loss function to the usual cross-entropy loss and only requires the definition of `_get_embedding`. The cell below uses the `BLINDTrainer` class in the same way, subclassing it for the *BERT* model, which shows how little code is needed:

In [None]:
from FairLangProc.algorithms.preprocessors import BLINDTrainer

BLINDModel = get_bert()
BLINDClassifier = nn.Sequential(
      nn.Linear(HIDDEN_DIM_BERT, HIDDEN_DIM_BERT),
      nn.ReLU(),
      nn.Linear(HIDDEN_DIM_BERT, 2)
)

class BLINDBERTTrainer(BLINDTrainer):
    def _get_embedding(self, inputs):
        return self.model.bert(
            input_ids = inputs.get("input_ids"),
            attention_mask = inputs.get("attention_mask"),
            token_type_ids = inputs.get("token_type_ids")
            ).last_hidden_state[:,0,:]
    
trainer = BLINDBERTTrainer(
    blind_model = BLINDClassifier,
    blind_optimizer = lambda x: AdamW(x, lr=1e-5, weight_decay=0.1),
    temperature = 1.0,
    gamma = 2.0,
    alpha = 1.0,
    model = BLINDModel,
    args = training_args,
    train_dataset = train_dataset,
    eval_dataset = val_dataset,
    optimizers=(
        AdamW(BLINDModel.parameters(), lr=1e-5, weight_decay=0.1),
        None
        )
)

trainer.train()
results = trainer.evaluate()
print(results)

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 0.137576      | 0.079221        |


{'eval_loss': 0.07922087609767914, 'eval_runtime': 197.1266, 'eval_samples_per_second': 126.822, 'eval_steps_per_second': 3.967, 'epoch': 1.0}


## Projection based debiasing

Projection based debiasing identifies a bias subspace by performing PCA on the differences between the hidden representations of counterfactual pairs of words or sentences. The hidden representation of a given input is then debiased by projecting it onto the bias-free subspace, i.e. the orthogonal complement of the bias subspace:

$$    h_{proj} = h - \sum_{i = 1}^{n_{bias} } \langle h, v_i \rangle \, v_i,$$

where $v_1, \dots, v_{n_{bias}}$ are the first $n_{bias}$ principal components, which form an orthonormal basis of the bias subspace.
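The two steps can be sketched in plain Python. Both helpers are hypothetical.

- `bias_direction` estimates a single bias direction ($n_{bias} = 1$) by power iteration on $\sum_i d_i d_i^{\top}$, where $d_i$ are the counterfactual differences. This amounts to PCA on the zero-mean set $\{\pm d_i\}$.
- `project_out` applies the projection formula above.

The toy embeddings are made up for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bias_direction(emb_A, emb_B, n_iter=50):
    """Hypothetical sketch: top principal direction of the differences
    d_i = h_A,i - h_B,i between counterfactual embeddings.

    Power iteration on C = sum_i d_i d_i^T (PCA on the zero-mean set {+d_i, -d_i}).
    """
    diffs = [[a - b for a, b in zip(ha, hb)] for ha, hb in zip(emb_A, emb_B)]
    dim = len(diffs[0])
    v = diffs[0]  # start inside the span of the differences (assumed non-zero)
    for _ in range(n_iter):
        # w = C v, computed as sum_i <d_i, v> d_i without forming C
        w = [0.0] * dim
        for d in diffs:
            c = dot(d, v)
            for j in range(dim):
                w[j] += c * d[j]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return v


def project_out(h, basis):
    """h_proj = h - sum_i <h, v_i> v_i, for an orthonormal basis {v_i}."""
    out = list(h)
    for v in basis:
        c = dot(h, v)
        out = [x - c * y for x, y in zip(out, v)]
    return out


# Toy 3-d "embeddings" of counterfactual pairs (e.g. "he"/"she") that differ
# mostly along the first coordinate
emb_A = [[1.0, 0.2, 0.0], [2.0, 0.0, 1.0], [0.5, 0.3, 0.1]]
emb_B = [[-1.0, 0.2, 0.1], [0.0, 0.1, 1.0], [-1.5, 0.3, 0.0]]
v = bias_direction(emb_A, emb_B)
h_proj = project_out([3.0, 4.0, 5.0], [v])
print([round(x, 3) for x in v], [round(x, 3) for x in h_proj])
```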

The implementation of projection based debiasing is given by the `SentDebiasModel` abstract class, which requires the implementation of three abstract methods:

1. `_get_loss`: sets the `self.loss_fct` attribute to the desired loss.
1. `_loss`: computes the value of `self.loss_fct` for a training instance.
1. `_get_embedding`: retrieves the hidden representation of a given input.

We have implemented `SentDebiasForSequenceClassification` to handle classification tasks. It sets the loss function to the usual cross-entropy loss and only requires the definition of `_get_embedding`. Below we implement a custom class for the *BERT* model, which shows how easy our class is to use:

In [None]:
from FairLangProc.algorithms.preprocessors import (
    SentDebiasForSequenceClassification
)

gendered_pairs = [('he', 'she'), ('his', 'hers'), ('monk', 'nun')]

model = get_bert()

class SentDebiasBert(SentDebiasForSequenceClassification):        
    def _get_embedding(
            self,
            input_ids,
            attention_mask = None,
            token_type_ids = None
            ):
        return self.model.bert(
            input_ids,
            attention_mask = attention_mask,
            token_type_ids = token_type_ids
            ).last_hidden_state[:,0,:]


EmbedModel = SentDebiasBert(
    model = model,
    config = None,
    tokenizer = TOKENIZER,
    word_pairs = gendered_pairs,
    n_components = 1,
    n_labels = 2
)

trainer = Trainer(
    model=EmbedModel,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    optimizers=(
        AdamW(EmbedModel.parameters(), lr=1e-5, weight_decay=0.1),
        None
        )
)

trainer.train()
results = trainer.evaluate()
print(results)

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 0.3073        | 0.288937        |


{'eval_loss': 0.2889372706413269, 'eval_runtime': 82.5018, 'eval_samples_per_second': 303.024, 'eval_steps_per_second': 9.479, 'epoch': 1.0}


# In processors

In processors are those methods that change the way the model is trained. In particular we have implemented:

1. ADELE (adapter based debiasing).
1. Selective updating.
1. Regularizers.

## ADELE

The ADELE method adopts an adapter-based approach: an adapter layer is inserted after each feed-forward network (FFN) layer of the transformer architecture. These layers take the form:

$$\text{Adapter}(h, r) = U \cdot g(D \cdot h) + r,$$

where $h$ is the hidden state, $D$ is a down-projection matrix, $g$ is an activation function, $U$ is an up-projection matrix and $r$ is the residual connection. The adapter's inner dimension is smaller than that of the FFN, so it compresses the data and provides an information bottleneck. The aim is that bias information gets discarded once the adapter is trained on counterfactually augmented data (here, the CDA data set from the previous section).
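The adapter equation can be illustrated with a toy forward pass. The helper names, the ReLU activation and the dimensions ($d = 4$, bottleneck $m = 2$) are assumptions chosen for illustration only.

```python
def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def relu(x):
    return [max(0.0, xi) for xi in x]


def adapter(h, r, D, U, g=relu):
    """Hypothetical bottleneck adapter: Adapter(h, r) = U . g(D . h) + r.

    D: m x d down-projection (m < d is the bottleneck)
    U: d x m up-projection back to the model dimension
    r: residual, added back unchanged
    """
    z = g(matvec(D, h))   # compress to m dimensions
    up = matvec(U, z)     # expand back to d dimensions
    return [u + ri for u, ri in zip(up, r)]


h = [1.0, 2.0, 3.0, 4.0]           # d = 4
D = [[1.0, 0.0, 0.0, 0.0],         # m = 2
     [0.0, 0.0, 0.0, -1.0]]
U = [[0.5, 0.0],
     [0.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]
print(adapter(h, h, D, U))  # here the residual r is simply h -> [1.5, 2.0, 3.0, 4.0]
```

A common design choice for adapters is to initialise $U$ at (or near) zero. A freshly inserted adapter then simply passes the residual $r$ through and does not disturb the pre-trained model at the start of training.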

In [None]:
from adapters import AdapterTrainer
from FairLangProc.algorithms.inprocessors import DebiasAdapter

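# Wrap BERT with a debiasing adapter ("seq_bn": sequential bottleneck adapter).
# The instance gets its own name so that the DebiasAdapter class is not shadowed.
debias_adapter = DebiasAdapter(
    model = get_bert(),
    adapter_config = "seq_bn"
    )
AdeleModel = debias_adapter.get_model()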

trainer = AdapterTrainer(
    model=AdeleModel,
    args=training_args,
    train_dataset=train_CDA,
    eval_dataset=val_dataset,
    optimizers=(
        AdamW(AdeleModel.parameters(),lr=1e-5, weight_decay=0.1),
        None
        )
)

trainer.train()
results = trainer.evaluate()
print(results)

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 0.6613        | 0.651091        |


{'eval_loss': 0.6510912775993347, 'eval_runtime': 98.7479, 'eval_samples_per_second': 253.17, 'eval_steps_per_second': 7.919, 'epoch': 1.0}


## Selective updating

Selective updating fine-tunes only a subset of the model's parameters. The `selective_unfreezing` function freezes all of the model's parameters except those specified by their names. In the cell below, only the attention sublayers (`attention.self` and `attention.output`) are left trainable.
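The name-based freezing can be sketched as below. `selective_unfreeze` is a hypothetical stand-in for `selective_unfreezing`. It assumes that a parameter stays trainable when its name contains one of the given fragments. It only relies on `named_parameters()` and `requires_grad`, which `torch.nn.Module` provides, and is demonstrated on a small stub with BERT-style parameter names.

```python
def selective_unfreeze(model, name_fragments):
    """Hypothetical sketch: freeze every parameter except those whose name
    contains one of `name_fragments`; return the names left trainable.

    Only needs `model.named_parameters()` yielding (name, param) pairs whose
    params have a `requires_grad` attribute, as torch.nn.Module provides.
    """
    trainable = []
    for name, param in model.named_parameters():
        param.requires_grad = any(frag in name for frag in name_fragments)
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Minimal stand-ins for a model and its parameters (for illustration only)
class _Param:
    def __init__(self):
        self.requires_grad = True


class _StubModel:
    def __init__(self, names):
        self._params = {name: _Param() for name in names}

    def named_parameters(self):
        return iter(self._params.items())


stub = _StubModel([
    "bert.embeddings.word_embeddings.weight",
    "bert.encoder.layer.0.attention.self.query.weight",
    "bert.encoder.layer.0.attention.output.dense.weight",
    "bert.encoder.layer.0.intermediate.dense.weight",
    "classifier.weight",
])
print(selective_unfreeze(stub, ["attention.self", "attention.output"]))
```

In this sketch, the freshly initialised `classifier` head is frozen as well. Adding `"classifier"` to the list of fragments would keep it trainable.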

In [None]:
from FairLangProc.algorithms.inprocessors import selective_unfreezing

FrozenBert = get_bert()
selective_unfreezing(FrozenBert, ["attention.self", "attention.output"])

trainer = Trainer(
    model=FrozenBert,
    args=training_args,
    train_dataset=train_CDA,
    eval_dataset=val_dataset,
    optimizers=(
        AdamW(FrozenBert.parameters(), lr=1e-5, weight_decay=0.1),
        None
        )
)

trainer.train()
results = trainer.evaluate()
print(results)

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 0.3278        | 0.318286        |


{'eval_loss': 0.3182864487171173, 'eval_runtime': 74.9231, 'eval_samples_per_second': 333.675, 'eval_steps_per_second': 10.437, 'epoch': 1.0}


## Regularizers

The idea of regularizers is to modify the original task loss by adding a new term:

$$
\mathcal{L}_{reg} = \mathcal{L}^{task} + \lambda \mathcal{R},
$$

where $\mathcal{R}$ is a term that aims to debias the LLM and $\lambda > 0$ controls its strength. In particular, we have implemented the Entropy Attention Regularizer (EAR) and a projection-based regularizer. Here we showcase the EAR regularizer through the `EARModel` class:
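Here is a sketch of how the EAR term interacts with the task loss. It assumes, as in the original EAR proposal as we understand it, that $\mathcal{R}$ is the negative of the average entropy of the attention distributions. The helpers are hypothetical and simply average over all given attention rows, ignoring layers and heads. Since higher entropy lowers the loss, the regularized loss can become negative, which is consistent with the negative losses reported in the run below.

```python
import math


def attention_entropy(row, eps=1e-12):
    """Shannon entropy (in nats) of one attention distribution."""
    return -sum(a * math.log(a + eps) for a in row)


def ear_loss(task_loss, attention_rows, reg_strength=0.01):
    """Hypothetical sketch: L_reg = L_task + lambda * R, with
    R = -(average entropy of the given attention rows)."""
    mean_entropy = sum(attention_entropy(r) for r in attention_rows) / len(attention_rows)
    return task_loss + reg_strength * (-mean_entropy)


peaked = [[0.97, 0.01, 0.01, 0.01]]   # attention focused on a single token
uniform = [[0.25, 0.25, 0.25, 0.25]]  # attention spread over all tokens
print(ear_loss(0.3, peaked, reg_strength=0.5))
print(ear_loss(0.3, uniform, reg_strength=0.5))  # lower, and negative
```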

In [None]:
from FairLangProc.algorithms.inprocessors import EARModel

model = get_bert()

EARRegularizer = EARModel(
    model = model,
    ear_reg_strength = 0.01
)

trainer = Trainer(
    model=EARRegularizer,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    optimizers=(
        AdamW(EARRegularizer.parameters(), lr=1e-5, weight_decay=0.1),
        None
        )
)

trainer.train()
results = trainer.evaluate()
print(results)

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | -0.2502       | -0.283381       |


{'eval_loss': -0.28338149189949036, 'eval_runtime': 189.3683, 'eval_samples_per_second': 132.018, 'eval_steps_per_second': 4.13, 'epoch': 1.0}


# Intra processors

Intra processors are methods that are applied after the model has been trained and that do not change the model's parameters. There is some overlap between intra processors and more traditional post processors.

We have implemented:

1. Diff pruning.
1. Entropy Attention Temperature (EAT) scaling.

## Diff pruning

Diff pruning is a modular technique that freezes the model's parameters and trains a sparse set of parameters, $\delta$, added on top of the original ones. These parameters are decomposed as $\delta = m \odot w$, where $m$ is a sparsity mask and $w$ contains the values of the update. The new parameters are trained on a loss that:

1. Learns the task at hand with $$\mathcal{L}^{task}$$
1. Learns to debias $$\mathcal{L}^{debias} = \left(\frac{\sum_{x_A \in X^A} \phi (E(x_A))}{|X^A |} - \frac{\sum_{x_B \in X^B} \phi (E(x_B))}{|X^B|} \right)^2$$ where $\phi$ is a kernel and $X^i, i \in \{A,B\}$ are sets of words with demographic information.
1. Promotes sparsity with $$\mathcal{L}^{0} = \sum_{i=1}^{|\delta_{\rho}|} \sigma\left( \log \alpha_{\rho, i} - \log\left(- \frac{\gamma}{\zeta}\right) \right)$$

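In the last term, $\alpha_{\rho}$ are the learnable parameters of the (hard concrete) distribution from which the mask $m$ is sampled, and $\gamma < 0 < 1 < \zeta$ are the limits of its stretched support. The total loss is a weighted sum of the three terms,

$$\mathcal{L} = \mathcal{L}^{task} + \lambda_{bias} \mathcal{L}^{debias} + \lambda_{sparse} \mathcal{L}^{0},$$

where the weights correspond to the `lambda_bias` and `lambda_sparse` arguments in the example below.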
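The debiasing and sparsity terms can be written out in plain Python as follows. This is a sketch, not the `DiffPrunBERT` code, and it makes three assumptions:

- the kernel is applied per example;
- the square in $\mathcal{L}^{debias}$ is read as a squared Euclidean norm;
- the defaults $\gamma = -0.1$, $\zeta = 1.1$ are values commonly used for hard concrete masks, not the library's settings.

The default weights mirror `lambda_bias` and `lambda_sparse` from the cell below.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def expected_l0(log_alpha, gamma=-0.1, zeta=1.1):
    """L^0 = sum_i sigma(log alpha_i - log(-gamma / zeta)), with gamma < 0 < 1 < zeta."""
    shift = math.log(-gamma / zeta)
    return sum(sigmoid(la - shift) for la in log_alpha)


def debias_loss(emb_A, emb_B, kernel=lambda e: e):
    """Squared Euclidean distance between the mean kernel features of groups A and B."""
    feats_A = [kernel(e) for e in emb_A]
    feats_B = [kernel(e) for e in emb_B]
    dim = len(feats_A[0])
    mean_A = [sum(f[j] for f in feats_A) / len(feats_A) for j in range(dim)]
    mean_B = [sum(f[j] for f in feats_B) / len(feats_B) for j in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mean_A, mean_B))


def total_loss(task, debias, l0, lambda_bias=0.5, lambda_sparse=1e-5):
    """Weighted sum of the task, debiasing and sparsity terms."""
    return task + lambda_bias * debias + lambda_sparse * l0


# Toy values: 2-d embeddings for two words of each group, three mask parameters
l_debias = debias_loss([[1.0, 0.0], [0.8, 0.2]], [[0.0, 0.0], [0.2, 0.2]])
l_0 = expected_l0([2.0, -1.0, 0.0])
print(total_loss(0.3, l_debias, l_0))
```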

We have implemented this method through the `DiffPrunedDebiasing` class, which requires the implementation of the abstract method `_get_embedding` to compute the embedding of a given input.
We provide `DiffPrunBERT`, which applies this method to the BERT model.

In [None]:
from FairLangProc.algorithms.intraprocessors import DiffPrunBERT

gendered_pairs = [
    ("manager", "manageress"),
    ("nephew", "niece"),
    ("prince", "princess"),
    ("baron", "baroness"),
    ("father", "mother"),
    ("stepsons", "stepdaughters"),
    ("boyfriend", "girlfriend"),
    ("fiances", "fiancees"),
    ("shepherd", "shepherdess"),
    ("beau", "belle"),
    ("males", "females"),
    ("hunter", "huntress"),
    ("grandfathers", "grandmothers"),
    ("daddies", "mummies"),
    ("step-son", "step-daughter"),
    ("masters", "mistresses"),
    ("nephews", "nieces"),
    ("brother", "sister"),
    ("grandfather", "grandmother"),
    ("priest", "priestess")
]

tokens_male = [words[0] for words in gendered_pairs]
tokens_female = [words[1] for words in gendered_pairs]

inputs_male = TOKENIZER(
    tokens_male, padding = True, return_tensors = "pt"
    )
inputs_female = TOKENIZER(
    tokens_female, padding = True, return_tensors = "pt"
    )


def normalize_by_column(x: torch.Tensor, eps: float = 1e-8):
    mean = x.mean(dim=0, keepdim=True)
    std = x.std(dim=0, keepdim=True)
    return (x - mean) / (std + eps)

original_model = get_bert()

ModularDebiasingBERT = DiffPrunBERT(
    head = original_model.classifier,
    encoder = original_model.bert,
    loss_fn = torch.nn.CrossEntropyLoss(),
    input_ids_A = inputs_male,
    input_ids_B = inputs_female,
    bias_kernel = normalize_by_column,
    upper = 10,
    lower = -0.001,
    lambda_bias = 0.5,
    lambda_sparse = 0.00001
)

trainer = Trainer(
    model=ModularDebiasingBERT,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    optimizers=(
        AdamW(ModularDebiasingBERT.parameters(), lr=1e-5, weight_decay=0.1),
        None
        )
)

trainer.train()
results = trainer.evaluate()
print(results)

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 1.5619        | 1.519466        |


{'eval_loss': 1.5194658041000366, 'eval_runtime': 163.0023, 'eval_samples_per_second': 153.372, 'eval_steps_per_second': 4.797, 'epoch': 1.0}


## EAT

Entropy-based Attention Temperature (EAT) scaling modifies the distribution of the attention scores with a temperature-related parameter, $\beta \in [0, \infty)$:

$$\text{Attention}_{\beta} (\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \text{softmax} \left(\frac{\beta \mathbf{Q} \mathbf{K}^{\top}}{\sqrt{d_k}} \right) \mathbf{V}.$$

Values $\beta < 1$ flatten the attention distribution (increasing its entropy), values $\beta > 1$ sharpen it, and $\beta = 1$ recovers standard attention. We have implemented EAT scaling through the `add_EAT_hook` function, which only requires a model and the value of $\beta$. Since the model's parameters are not changed, the cell below evaluates the model directly, without any training.
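The effect of $\beta$ can be checked on a single row of attention scores. The scores and helper names are hypothetical. The sketch only applies the scaling from the formula above before the softmax and reports the entropy of the resulting distribution.

```python
import math


def softmax_with_temperature(scores, beta):
    """softmax(beta * s) for one row of scores s = q.k / sqrt(d_k)."""
    scaled = [beta * s for s in scores]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


scores = [2.0, 1.0, 0.1]  # hypothetical attention scores for one query
for beta in (0.0, 0.5, 1.0, 1.5):
    p = softmax_with_temperature(scores, beta)
    print(beta, [round(x, 3) for x in p], round(entropy(p), 3))
```

As $\beta$ grows, the distribution concentrates on the highest score and its entropy decreases. At $\beta = 0$ it is uniform.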

In [None]:
from FairLangProc.algorithms.intraprocessors import add_EAT_hook

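# Reuse the BERT model fine-tuned in the base model section.
# This is the same object, so the hook is added to BERT in place.
EATBert = BERT
beta = 1.5

add_EAT_hook(model=EATBert, beta=beta)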

trainer = Trainer(
    model=EATBert,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    optimizers=(
        AdamW(EATBert.parameters(), lr=1e-5, weight_decay=0.1),
        None
        )
)

results = trainer.evaluate()
print(results)

{'eval_loss': 0.798125147819519, 'eval_model_preparation_time': 0.0021, 'eval_runtime': 40.3474, 'eval_samples_per_second': 619.619, 'eval_steps_per_second': 19.382}
