# Fairness processors

In this notebook we showcase the fairness processors we have implemented through a simple use case in which we debias the *BERT* model.

Fairness processors can be classified according to the part of the machine learning pipeline they are introduced in:

1. Pre processors: if they are introduced before the model has been trained.
1. In processors: if they are introduced during the process of training the model.
1. Post processors: if they are introduced after the training step.
1. Intra processors: additionally, we speak of *intra processors* when referring to fairness methods that do not modify a model's parameters. This notion overlaps with that of post processors and can be deemed equivalent.

To showcase these methods we run them on the IMDb data set without further analysis, as this notebook is only intended to serve as a proof of concept.

# Imports

In [1]:
# Standard libraries
import sys
import os

# NLP libraries
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torch.optim import AdamW

from transformers import (
    BertForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)
from datasets import (
    load_dataset,
    Dataset
)


# Custom imports
project_path = os.path.abspath(os.path.join(os.getcwd(), ".."))
sys.path.append(project_path)


from FairnessDatasets.FairnessDatasets import BiasDataLoader

# Preliminaries

Use GPU if available:

In [2]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


Load `BERT`:

In [3]:
def get_bert():
    return BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

TOKENIZER = AutoTokenizer.from_pretrained('bert-base-uncased')
BERT = get_bert()
HIDDEN_DIM_BERT = BERT.config.hidden_size

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Download the data set and tokenize it:

In [4]:
imdb = load_dataset("imdb")

def tokenize_function(example):
    return TOKENIZER(example["text"], padding="max_length", truncation=True, max_length=128)

dataset = imdb.map(tokenize_function, batched=True)
dataset.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])

Train/test split (the IMDb test split is used as the validation set):

In [5]:
train_dataset = dataset["train"]
val_dataset = dataset["test"]

Trainer configuration:

In [6]:
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.1,
    logging_dir="./logs",
    logging_steps=10,
)

## Base model

As a baseline, we first fine-tune *BERT* without any debiasing, using a Hugging Face `Trainer`:

In [7]:
trainer = Trainer(
    model=BERT,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    optimizers=(AdamW(BERT.parameters(), lr=1e-5, weight_decay=0.1), None)
)

trainer.train()

| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 0.2897 | 0.295473 |
| 2 | 0.2647 | 0.296338 |
| 3 | 0.2296 | 0.371886 |


TrainOutput(global_step=4689, training_loss=0.24466333005519328, metrics={'train_runtime': 3780.6748, 'train_samples_per_second': 19.838, 'train_steps_per_second': 1.24, 'total_flos': 4933332288000000.0, 'train_loss': 0.24466333005519328, 'epoch': 3.0})

In [8]:
results = trainer.evaluate()
print(results)

{'eval_loss': 0.37188637256622314, 'eval_runtime': 268.5077, 'eval_samples_per_second': 93.107, 'eval_steps_per_second': 5.821, 'epoch': 3.0}
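The evaluation above only reports the loss. A minimal sketch of an accuracy metric follows; it assumes the standard `compute_metrics` hook of the Hugging Face `Trainer`, which receives the model's logits and the label ids, and it is not part of our library:

```python
import numpy as np

def compute_metrics(eval_pred):
    """Accuracy for a classifier evaluated by a Hugging Face Trainer."""
    logits, labels = eval_pred  # EvalPrediction unpacks into (predictions, label_ids)
    preds = np.argmax(logits, axis=-1)  # predicted class for each example
    return {"accuracy": float(np.mean(preds == labels))}
```

It would be passed as `Trainer(..., compute_metrics=compute_metrics)`, after which `trainer.evaluate()` also reports `eval_accuracy`.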


# Pre processors

Pre processors are methods that only affect the model's inputs and do not change the model's parameters. We have implemented:

1. Counterfactual Data Augmentation (CDA).
1. BLIND debiasing.
1. Projection based debiasing.

## CDA

CDA augments the data by swapping words that carry information about the sensitive attribute with their counterfactual counterparts (e.g. masculine vs. feminine words such as *he* and *she*). This procedure is implemented by the `transform_batch` function, which is applied to a Hugging Face data set.
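To make the idea concrete, here is a minimal sketch of two-sided CDA on a batch in the Hugging Face `{'text': [...], 'label': [...]}` layout. It is a hypothetical stand-in (`cda_augment` and its helpers are illustrative names), not the code of `transform_batch`: it swaps whole words in both directions, keeps a leading capital letter, and appends a counterfactual copy only when at least one word was swapped.

```python
import re

def build_swap_map(pairs):
    """Bidirectional lookup: each word maps to its counterfactual."""
    swap = {}
    for a, b in pairs:
        swap[a.lower()] = b.lower()
        swap[b.lower()] = a.lower()
    return swap

def swap_words(text, swap):
    """Replace every whole word found in `swap` by its counterpart."""
    def repl(match):
        word = match.group(0)
        target = swap.get(word.lower())
        if target is None:
            return word
        # Keep a leading capital letter, e.g. "He" -> "She"
        return target.capitalize() if word[0].isupper() else target
    return re.sub(r"\b\w+\b", repl, text)

def cda_augment(batch, pairs):
    """Return the original examples plus their counterfactual copies."""
    swap = build_swap_map(pairs)
    texts, labels = [], []
    for text, label in zip(batch["text"], batch["label"]):
        texts.append(text)
        labels.append(label)
        flipped = swap_words(text, swap)
        if flipped != text:  # only add a copy if something was swapped
            texts.append(flipped)
            labels.append(label)  # the task label is unchanged
    return {"text": texts, "label": labels}
```

A one-to-one mapping is ambiguous for words such as *her*, which can correspond to either *him* or *his*; resolving this would require part-of-speech information.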

We apply CDA to the IMDb train and test splits using a list of gendered word pairs:

In [9]:
from FairnessProcessors.Preprocessors.Augmentation import transform_batch

gendered_pairs = [
    ('he', 'she'),
    ('him', 'her'),
    ('his', 'hers'),
    ('actor', 'actress'),
    ('priest', 'nun'),
    ('father', 'mother'),
    ('dad', 'mom'),
    ('daddy', 'mommy'),
    ('waiter', 'waitress'),
    ('James', 'Jane')
    ]

cda_train = Dataset.from_dict(
    transform_batch(imdb['train'][:], pairs = dict(gendered_pairs))
)
cda_test = Dataset.from_dict(
    transform_batch(imdb['test'][:], pairs = dict(gendered_pairs))
)

train_CDA = cda_train.map(tokenize_function, batched=True)
train_CDA.set_format(
    type="torch", columns=["input_ids", "attention_mask", "label"]
)

val_CDA = cda_test.map(tokenize_function, batched=True)
val_CDA.set_format(
    type="torch", columns=["input_ids", "attention_mask", "label"]
)

Compare the sizes of the original and augmented data sets:

In [None]:
print(f"Length of original train data set: {len(imdb['train'])}")
print(f"Length of CDA augmented train data set: {len(cda_train)}")

print(f"Length of original test data set: {len(imdb['test'])}")
print(f"Length of CDA augmented test data set: {len(cda_test)}")

Train a fresh *BERT* model on the augmented data:

In [None]:
cda_model = get_bert()

trainer = Trainer(
    model=cda_model,
    args=training_args,
    train_dataset=train_CDA,
    eval_dataset=val_CDA,
    optimizers=(AdamW(cda_model.parameters(), lr=2e-5, weight_decay=0.01), None)
)

trainer.train()



In [None]:
results = trainer.evaluate()
print(results)

## BLIND debiasing

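BLIND debiasing incorporates an auxiliary classifier, $g_{B}$, which is tasked with predicting whether the base model will succeed at the task for a given training instance. Each training instance is then reweighted according to the probability that $g_{B}$ assigns to the base model performing the task correctly, so that instances the base model is confidently expected to solve are down-weighted. The loss is modified accordingly:

$$\mathcal{L}_{BLIND} = \left(1 - \sigma \left( g_{B}(h; \theta_{B} ) \right) \right)^{\gamma} \mathcal{L}^{task}(\hat{y}, y),$$

where $h$ is the hidden representation of the input, $\sigma$ is the sigmoid function, $\theta_{B}$ are the parameters of $g_{B}$ and $\gamma \geq 0$ is a hyper-parameter that controls how strongly confidently solved instances are down-weighted ($\gamma = 0$ recovers the unweighted task loss).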
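The reweighting can be sketched in PyTorch as follows. This is a minimal sketch under our own assumptions, not the internals of `BLINDModel`: the success detector `SuccessDetector` is a hypothetical single linear layer, it is trained with a separate binary cross-entropy loss whose target is whether the main model's prediction was correct, and the weights are detached so that they only rescale the task loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SuccessDetector(nn.Module):
    """Hypothetical auxiliary head g_B: hidden representation -> success logit."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, 1)

    def forward(self, h):
        return self.linear(h).squeeze(-1)

def blind_losses(h, logits, labels, detector, gamma=2.0):
    """Return (weighted task loss, detector loss) for one batch."""
    # Per-example task loss L^task(y_hat, y)
    task_loss = F.cross_entropy(logits, labels, reduction="none")
    # The detector reads h but should not shape it
    success_logit = detector(h.detach())
    # (1 - sigma(g_B(h)))^gamma, detached so it only acts as a weight
    weights = (1.0 - torch.sigmoid(success_logit)).pow(gamma).detach()
    blind_loss = (weights * task_loss).mean()
    # Detector target: 1 where the main model's prediction is correct
    success = (logits.argmax(dim=-1) == labels).float()
    detector_loss = F.binary_cross_entropy_with_logits(success_logit, success)
    return blind_loss, detector_loss
```

Setting `gamma=0.0` makes every weight equal to one, so `blind_loss` reduces to the plain mean cross-entropy.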

The implementation of BLIND is given by the `BLINDModel` abstract class, which requires the implementation of three abstract methods:

1. `_get_loss`: sets the `self.loss_fct` attribute to the desired loss.
1. `_loss`: computes the value of `self.loss_fct` for a training instance.
1. `_get_embedding`: retrieves the hidden representation of a given input.

We have implemented `BLINDModelForClassification` to handle classification tasks; it sets the loss function to the usual cross-entropy loss and only requires the definition of `_get_embedding`. Below we implement a custom class for the *BERT* model, which shows how little code is needed to use our class:

In [None]:
from FairnessProcessors.Preprocessors import BLINDModelForClassification

model = get_bert()

class BLINDBERT(BLINDModelForClassification):
    def _get_embedding(self, input_ids=None, attention_mask=None, token_type_ids=None):
        # Use the final-layer [CLS] token as the hidden representation h
        return self.model.bert(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
        ).last_hidden_state[:, 0, :]


BLIND = BLINDBERT(
    model = model,
    config = None,
    base_loss = nn.CrossEntropyLoss(),
    alpha=1.0,
    gamma=2.0,
    temperature=1.0,
    size_average=True,
    hidden_dim = HIDDEN_DIM_BERT
)


In [None]:
trainer = Trainer(
    model=BLIND,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

trainer.train()

In [None]:
results = trainer.evaluate()
print(results)

## Projection based debiasing

Projection based debiasing identifies a bias subspace by performing PCA on the differences between the hidden representations of counterfactual pairs of words or sentences. The hidden representation $h$ of a given input is then debiased by projecting it onto the bias-free subspace, i.e. the orthogonal complement of the bias subspace:

$$    h_{proj} = h - \sum_{i = 1}^{n_{bias} } \langle h, v_i \rangle \, v_i,$$

where $v_1, \dots, v_{n_{bias}}$ are the orthonormal principal directions spanning the bias subspace.
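Both steps can be sketched with NumPy. This is an illustrative sketch with hypothetical function names, not the code of `SentDebiasModel`. Plain PCA centers the data, which would remove the bias direction itself when all differences point the same way; as in Sent-Debias, each pair is therefore centered around its own mean, so the rows passed to the SVD are $\pm(a - b)/2$ and already have zero mean.

```python
import numpy as np

def bias_subspace(emb_a, emb_b, n_components=1):
    """Estimate bias directions from paired embeddings of shape (n_pairs, dim)."""
    half_diffs = (emb_a - emb_b) / 2.0
    # Centering each pair on its own mean gives rows +d/2 and -d/2,
    # so the stacked matrix has zero mean and PCA reduces to an SVD.
    centered = np.vstack([half_diffs, -half_diffs])
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components]  # orthonormal rows v_1, ..., v_k

def project_out(h, v):
    """h_proj = h - sum_i <h, v_i> v_i for a batch h of shape (batch, dim)."""
    return h - (h @ v.T) @ v
```

After the projection, `project_out(h, v) @ v.T` is numerically zero: no component of the debiased representation lies along the estimated bias directions.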

The implementation of projection based debiasing is given by the `SentDebiasModel` abstract class, which requires the implementation of three abstract methods:

1. `_get_loss`: sets the `self.loss_fct` attribute to the desired loss.
1. `_loss`: computes the value of `self.loss_fct` for a training instance.
1. `_get_embedding`: retrieves the hidden representation of a given input.

We have implemented `SentDebiasForSequenceClassification` to handle classification tasks; it sets the loss function to the usual cross-entropy loss and only requires the definition of `_get_embedding`. Below we implement a custom class for the *BERT* model, which shows how little code is needed to use our class:

In [None]:
from FairnessProcessors.Preprocessors.ProjectionBased import SentDebiasForSequenceClassification
gendered_pairs = BiasDataLoader('WinoBias', config = 'pairs', format = 'raw')

model = get_bert()

class SentDebiasBert(SentDebiasForSequenceClassification):

    # Optional: the parent class already uses cross-entropy; shown here for completeness
    def _loss(self, logits, labels):
        loss = nn.CrossEntropyLoss()
        return loss(logits, labels)

    # Use the final-layer [CLS] token as the hidden representation h
    def _get_embedding(self, input_ids, attention_mask=None, token_type_ids=None):
        return self.model.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
        ).last_hidden_state[:, 0, :]


SentDebias = SentDebiasBert(
    model = model,
    config = None,
    tokenizer = TOKENIZER,
    word_pairs = gendered_pairs,
    n_components = 1
)


In [None]:
trainer = Trainer(
    model=SentDebias,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

trainer.train()

In [None]:
results = trainer.evaluate()
print(results)

# In processors

In processors are methods that change the way the model is trained. In particular, we have implemented:

1. ADELE (adapter based debiasing).
1. Selective updating.
1. Regularizers.

## ADELE

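The ADELE method adopts an adapter-based approach: an adapter layer is inserted after each feed-forward network (FFN) layer of the transformer architecture. These layers take the form:

$$\text{Adapter}(h, r) = U \cdot g(D \cdot h) + r,$$

where $D$ projects the hidden state $h$ down to a smaller dimension, $g$ is a non-linear activation function, $U$ projects the result back up to the hidden dimension and $r$ is the residual connection. Because the adapter is narrower than the corresponding FFN, it compresses the representation and acts as an information bottleneck. Only the adapter parameters are trained while the original transformer weights stay frozen, with the aim that the bias information is discarded after carefully training the model.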
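A bottleneck adapter of this form can be sketched as a small PyTorch module. The class name and the choice of ReLU for $g$ are our assumptions, and `nn.Linear` adds bias terms that the formula leaves implicit. Note also that the `DebiasAdapter` call below uses a `'lora'` configuration, so the library's adapters may differ from this sketch.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Adapter(h, r) = U g(D h) + r, with bottleneck_dim < hidden_dim."""
    def __init__(self, hidden_dim, bottleneck_dim):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # D: compress
        self.activation = nn.ReLU()                        # g: non-linearity
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # U: expand back

    def forward(self, h, r):
        # h: output of the FFN sub-layer, r: residual stream
        return self.up(self.activation(self.down(h))) + r
```

Initializing $U$ to zero makes a freshly inserted adapter return the residual unchanged, a common choice so that adding adapters does not perturb the pretrained model at the start of training.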

In [None]:
from FairnessProcessors.Inprocessors.AdapterBased import DebiasAdapter

ADELE = DebiasAdapter(
    get_bert(),
    config = 'lora'
)


In [None]:
trainer = Trainer(
    model=ADELE,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

trainer.train()

In [None]:
results = trainer.evaluate()
print(results)

## Selective updating

Selective updating trains only a subset of the model's parameters. The `selective_unfreezing` function freezes all of the model's parameters except those specified by (parts of) their names.
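The freezing logic can be sketched with `named_parameters` and `requires_grad`. `freeze_all_but` is a hypothetical stand-in that assumes substring matching on parameter names, which is consistent with, but not confirmed by, the patterns used below.

```python
import torch.nn as nn

def freeze_all_but(model: nn.Module, patterns):
    """Freeze every parameter whose name contains none of `patterns`.

    Returns the names of the parameters left trainable.
    """
    trainable = []
    for name, param in model.named_parameters():
        # Substring match, e.g. "attention.self" matches
        # "bert.encoder.layer.0.attention.self.query.weight"
        param.requires_grad = any(pattern in name for pattern in patterns)
        if param.requires_grad:
            trainable.append(name)
    return trainable
```

Under a substring rule like this one, the patterns used in the cell below would leave `classifier.weight` and `classifier.bias` frozen at their random initialization; adding `"classifier"` to the list would keep the new head trainable, so it is worth checking how `selective_unfreezing` treats it.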

In [None]:
from FairnessProcessors.Inprocessors import selective_unfreezing

FrozenBert = get_bert()
selective_unfreezing(FrozenBert, ["attention.self", "attention.output"])


In [None]:
trainer = Trainer(
    model=FrozenBert,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

trainer.train()

In [None]:
results = trainer.evaluate()
print(results)

## Regularizers

The idea of regularizers is to modify the original task loss by adding a new term:

$$
\mathcal{L}_{reg} = \mathcal{L}^{task} + \lambda \mathcal{R},
$$

where $\mathcal{R}$ is a term that aims to debias the LLM and $\lambda$ controls its strength. In particular, we have implemented the Entropy-based Attention Regularizer (EAR) and a projection based regularizer. EAR penalizes attention distributions with low entropy, discouraging the model from focusing on a few specific tokens such as identity terms. Here we showcase EAR through the `EARModel` class:
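A simplified version of the regularization term can be written in a few lines of PyTorch. This sketch assumes attention tensors shaped `(batch, heads, query, key)`, such as those returned by Hugging Face models called with `output_attentions=True`; it averages the entropy over layers, heads and tokens, ignores padding, and is not the exact formulation used by `EARModel`.

```python
import torch

def attention_entropy_penalty(attentions, eps=1e-12):
    """R = -(mean attention entropy) over layers, heads and query tokens.

    attentions: sequence of tensors shaped (batch, heads, query, key) whose
    last dimension sums to one.
    """
    layer_entropies = []
    for att in attentions:
        # Entropy of each query token's distribution over the keys
        ent = -(att * torch.log(att + eps)).sum(dim=-1)  # (batch, heads, query)
        layer_entropies.append(ent.mean())
    # Negative sign: minimizing R maximizes the attention entropy
    return -torch.stack(layer_entropies).mean()

def regularized_loss(task_loss, attentions, lam=0.01):
    """L_reg = L_task + lambda * R."""
    return task_loss + lam * attention_entropy_penalty(attentions)
```

Uniform attention over $k$ keys attains the maximum entropy $\log k$ and hence the lowest penalty, while one-hot attention yields a penalty of zero.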

In [None]:
from FairnessProcessors.Inprocessors import EARModel

model = get_bert()

EARRegularizer = EARModel(
     model = model,
     ear_reg_strength = 0.01
)


In [None]:
trainer = Trainer(
    model=EARRegularizer,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

trainer.train()



In [None]:
results = trainer.evaluate()
print(results)

# Intra processors

Intra processors are methods that are applied after training has been done and that do not change the model's original parameters. There is a certain overlap between intra processors and more traditional post processors.

We have implemented:

1. Diff pruning.
1. Entropy Attention Temperature (EAT) scaling.

## Diff pruning

Diff pruning is a modular technique which freezes the model's parameters and trains a sparse set of parameters, $\delta$, added on top of the original ones. These parameters are decomposed as $\delta = m \odot w$, where $m$ is a sparsity mask and $w$ contains the values of the update. The new parameters are trained on a loss that:

1. Learns the task at hand with $\mathcal{L}^{task}$.
1. Learns to debias with $$\mathcal{L}^{debias} = \left\| \frac{\sum_{x_A \in X^A} \phi (E(x_A))}{|X^A |} - \frac{\sum_{x_B \in X^B} \phi (E(x_B))}{|X^B|} \right\|^2,$$ where $E$ is the encoder, $\phi$ is a kernel feature map and $X^i, i \in \{A,B\}$ are sets of words with demographic information.
1. Promotes sparsity with $$\mathcal{L}^{0} = \sum_{i=1}^{|\delta_{\rho}|} \sigma\left( \log \alpha_{\rho, i} - \log\left(- \frac{\gamma}{\zeta}\right) \right),$$ where $\log \alpha_{\rho, i}$ are the learnable parameters of the hard-concrete distribution from which the mask is sampled, and $\gamma < 0$ and $\zeta > 1$ are its stretch limits.

The total loss function is given by the sum of the previous three terms.
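Both regularization terms can be sketched in PyTorch. The debiasing term uses the kernel trick, expanding the squared distance between the two mean feature maps into mean kernel values so that $\phi$ never has to be computed explicitly. The Gaussian kernel, its bandwidth and the stretch limits $\gamma = -0.1$, $\zeta = 1.1$ (common defaults for the hard-concrete distribution) are our assumptions, and the function names are hypothetical.

```python
import math
import torch

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) for all pairs of rows."""
    return torch.exp(-torch.cdist(x, y).pow(2) / (2 * sigma ** 2))

def mmd_debias_loss(emb_a, emb_b, sigma=1.0):
    """Squared distance between the mean feature maps of the two groups.

    ||mean phi(A) - mean phi(B)||^2 = mean k(A, A) + mean k(B, B) - 2 mean k(A, B).
    emb_a, emb_b: encoder outputs E(x) for the words in X^A and X^B, (n, dim).
    """
    k_aa = rbf_kernel(emb_a, emb_a, sigma).mean()
    k_bb = rbf_kernel(emb_b, emb_b, sigma).mean()
    k_ab = rbf_kernel(emb_a, emb_b, sigma).mean()
    return k_aa + k_bb - 2 * k_ab

def expected_l0(log_alpha, gamma=-0.1, zeta=1.1):
    """Sparsity term: sum_i sigmoid(log alpha_i - log(-gamma / zeta))."""
    return torch.sigmoid(log_alpha - math.log(-gamma / zeta)).sum()
```

Following the text, the three terms would simply be summed, e.g. `task_loss + mmd_debias_loss(emb_a, emb_b) + expected_l0(log_alpha)`.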

We have implemented this method through the `DiffPrunedDebiasing` class which requires the implementation of two abstract methods:

1. `_get_embedding`: computes the embedding of a given input.
1. `_get_encoder`: sets the `self.encoder` attribute that computes the hidden representation of an input.

We provide `DiffPrunningBERT`, which applies this method to the BERT model.

In [None]:
from FairnessProcessors.IntraProcessors.ModularDebiasing import DiffPrunningBERT

gendered_pairs = BiasDataLoader('WinoBias', config = 'pairs', format = 'raw')

# Split the (male, female) word pairs into the two demographic word sets X^A and X^B
tokens_male = [words[0] for words in gendered_pairs]
tokens_female = [words[1] for words in gendered_pairs]

# Tokenize each word set as a padded batch of PyTorch tensors
inputs_male = TOKENIZER(tokens_male, padding = True, return_tensors = "pt")
inputs_female = TOKENIZER(tokens_female, padding = True, return_tensors = "pt")

In [None]:
BERT = get_bert()

ModularDebiasingBERT = DiffPrunningBERT(
    base_model = BERT,
    input_ids_A = inputs_male,
    input_ids_B = inputs_female
)


In [None]:
trainer = Trainer(
    model=ModularDebiasingBERT,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

trainer.train()

In [None]:
results = trainer.evaluate()
print(results)

## EAT

Entropy-based Attention Temperature (EAT) scaling modifies the distribution of the attention scores with a temperature-related parameter, $\beta \in [0, \infty)$:

$$\text{Attention}_{\beta} (\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \text{softmax} \left(\frac{\beta \mathbf{Q} \mathbf{K}^\top}{\sqrt{d_k}} \right) \mathbf{V}.$$

Values $\beta > 1$ sharpen the attention distribution, values $\beta < 1$ flatten it, and $\beta = 0$ makes it uniform.

We have implemented EAT scaling through the `add_EAT_hook` function, which only requires a model and the $\beta$ parameter.
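The effect of $\beta$ can be checked with a plain PyTorch implementation of the scaled attention formula. `scaled_attention` and `attention_entropy` are illustrative helpers, not part of our library.

```python
import math
import torch

def scaled_attention(q, k, v, beta=1.0):
    """Attention_beta(Q, K, V) = softmax(beta * Q K^T / sqrt(d_k)) V.

    q, k, v: tensors of shape (..., seq_len, d_k). Returns the output and the
    attention weights so that their distribution can be inspected.
    """
    d_k = q.size(-1)
    scores = beta * (q @ k.transpose(-2, -1)) / math.sqrt(d_k)
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

def attention_entropy(weights):
    """Entropy of each query's attention distribution over the keys."""
    return -(weights * torch.log(weights.clamp_min(1e-12))).sum(dim=-1)
```

Since $\beta \mathbf{Q} \mathbf{K}^\top = (\beta \mathbf{Q}) \mathbf{K}^\top$, the same effect could be obtained in a Hugging Face BERT model by scaling the output of each `attention.self.query` projection with a forward hook, e.g. `layer.attention.self.query.register_forward_hook(lambda module, inputs, output: output * beta)`. This is one possible mechanism, not necessarily the one used by `add_EAT_hook`.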

In [None]:
from FairnessProcessors.IntraProcessors.WeightRedistribution import add_EAT_hook

BERT = get_bert()
beta = 1.5


In [None]:
new_bert = add_EAT_hook(
    model = BERT,
    beta = beta
)

In [None]:
trainer = Trainer(
    model=BERT,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

trainer.train()
