### Mind the gap

This project aims to identify, measure, and mitigate social biases, such as gender, race, or profession-related stereotypes, in lightweight transformer models through hands-on fine-tuning and evaluation on targeted NLP tasks. More specifically, the project should implement a four-step methodology, defined as follows:

1. Choose a lightweight pre-trained transformer model (e.g., DistilBERT, ALBERT, RoBERTa-base) suitable for local fine-tuning and evaluation.
2. Evaluate the presence and extent of social bias (e.g., gender, racial, or occupational stereotypes) using dedicated benchmark datasets. Both quantitative metrics and qualitative outputs should be evaluated.
3. Apply a bias mitigation technique, such as **fine-tuning on curated counter-stereotypical data**, integrating **adapter layers**, or employing **contrastive learning**, while keeping the solution computationally efficient and transparent.
4. Re-assess the model using the same benchmark(s) to measure improvements. Students should compare pre- and post-intervention results, discuss trade-offs (e.g., performance vs. fairness), and visualize the impact of their approach.

#### Dataset
- [StereoSet: Measuring stereotypical bias in pretrained language models](https://github.com/moinnadeem/StereoSet). Nadeem, M., Bethke, A., & Reddy, S. (2020). StereoSet: Measuring stereotypical bias in pretrained language models. *arXiv preprint arXiv:2004.09456*.
- [StereoSet: Measuring stereotypical bias in pretrained language models](https://huggingface.co/datasets/McGill-NLP/stereoset). Hugging Face Datasets mirror (`McGill-NLP/stereoset`).

In [13]:
import pandas as pd
import numpy as np
import transformers
from transformers import AutoConfig, AutoTokenizer
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.metrics import classification_report
from collections import defaultdict
from tqdm import tqdm
from joblib import Parallel, delayed
from multiprocessing import cpu_count
import json
#to output plots within the notebook
%matplotlib inline

import importlib
import inspect
import os
import sys


# %cd /content/
# !ls -R
# repo_name = "MindTheGap"
# #!rm -rf {repo_name}
# if not os.path.exists(repo_name):
#     print(f"Directory {repo_name} does not exist, proceeding with clone.")
#     !git clone https://github.com/Abudo-S/MindTheGap.git

%cd MindTheGap

import AdaptedMLMTransformer as AdaptedMLMTransformer_Module
import AdaptedNSPTransformer as AdaptedNSPTransformer_Module
import intersentence_loader as intersentence_loader_Module
import dataloader
importlib.reload(AdaptedMLMTransformer_Module) # in case of updates
importlib.reload(AdaptedNSPTransformer_Module) # in case of updates
importlib.reload(intersentence_loader_Module) # in case of updates
importlib.reload(dataloader) # in case of updates
from AdaptedMLMTransformer import AdaptedMLMTransformer
from AdaptedNSPTransformer import AdaptedNSPTransformer
from intersentence_loader import IntersentenceDataset

[WinError 2] The system cannot find the file specified: 'MindTheGap'
c:\Users\abudo\source\vscode_projects\MindTheGap


### General observations:
- The dataset contains 2123 intersentence contexts and 2106 intrasentence contexts. Each context has three sentences, labeled ```stereotype```, ```anti-stereotype``` and ```unrelated``` with respect to the target bias type.
- Intrasentence examples probe bias at the word or sub-phrase level, at the position marked "BLANK". Intersentence examples, in contrast, probe bias in the relationship between a context and each of its associated sentences.
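The counts above can be checked with a short script. The sketch below assumes the JSON layout of the original StereoSet release (`data` → `intersentence`/`intrasentence` → list of contexts, each with a `sentences` list whose items carry a `gold_label`); the repository's `data/stereo_dataset.json` may be organized differently, so the keys and the helper name `summarize_stereoset` are assumptions.

```python
import json
from collections import Counter


def summarize_stereoset(raw: dict) -> dict:
    """Count contexts and gold labels per task.

    Assumed layout (original StereoSet release):
    {"data": {"intersentence": [ctx, ...], "intrasentence": [ctx, ...]}}
    where each ctx has a "sentences" list whose items carry a "gold_label".
    """
    summary = {}
    for task, contexts in raw["data"].items():
        labels = Counter(s["gold_label"] for ctx in contexts for s in ctx["sentences"])
        summary[task] = {"contexts": len(contexts), "labels": dict(labels)}
    return summary


if __name__ == "__main__":
    # Tiny hand-made example in the assumed layout (not real StereoSet data)
    toy = {
        "data": {
            "intrasentence": [
                {
                    "context": "The chess player was BLANK.",
                    "sentences": [
                        {"sentence": "...", "gold_label": "stereotype"},
                        {"sentence": "...", "gold_label": "anti-stereotype"},
                        {"sentence": "...", "gold_label": "unrelated"},
                    ],
                }
            ],
            "intersentence": [],
        }
    }
    print(json.dumps(summarize_stereoset(toy), indent=2))
    # On the real file (if it follows the assumed layout):
    # summarize_stereoset(json.load(open("data/stereo_dataset.json")))
```

If the layout matches, each task should report three labels per context, one of each kind.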


#### Tasks:
- Intrasentence examples are treated as a masked-language-modeling (MLM) task. During MLM pre-training, a percentage of the input tokens is replaced with a special [MASK] token and the model learns to predict the original words/sub-words from the surrounding context; here, the masked positions are those of the target word that fills "BLANK".
- Intersentence examples are treated as a next-sentence-prediction (NSP) task. The model receives a pair made of the context and one of its associated sentences, formatted according to the tokenizer (e.g., ```[CLS] sentence_A [SEP] sentence_B [SEP]``` for BERT, or ```<s> sentence_A </s></s> sentence_B </s>``` for RoBERTa). The model then outputs the probabilities of the two classes describing the relationship between both sentences:

    * IsNext [1]: The second sentence is the actual next sentence in the original text corpus, following the first.

    * NotNext [0]: The second sentence is unrelated to the first.

The probability of the "IsNext" class indicates how confident the model is that the second sentence logically follows the first. In our case, we use the "IsNext" probability to compare how well each labeled sentence follows the main context.
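A minimal sketch of how the IsNext score can be read from a two-class head, assuming (as the notebook's evaluation loop does) that column 1 of the logits is the IsNext class. The logits and the helper name `is_next_probability` are made up for illustration. Note that Hugging Face's `BertForNextSentencePrediction` uses the opposite convention (label 0 means sentence B continues sentence A), so the column index has to match the head actually used.

```python
import torch


def is_next_probability(logits: torch.Tensor) -> torch.Tensor:
    """Convert two-class NSP logits of shape (batch, 2) into P(IsNext).

    Assumes column 1 is the IsNext class, matching this notebook's convention.
    """
    return torch.softmax(logits, dim=1)[:, 1]


# Hypothetical logits for the three sentences attached to one context
logits = torch.tensor([
    [0.0, 2.0],   # stereotype sentence
    [1.0, 1.0],   # anti-stereotype sentence
    [3.0, -1.0],  # unrelated sentence
])
labels = ["stereotype", "anti-stereotype", "unrelated"]
scores = is_next_probability(logits)
for label, score in zip(labels, scores.tolist()):
    print(f"{label:>16}: P(IsNext) = {score:.3f}")
```

Equal logits give exactly 0.5, so a score above 0.5 means the head leans towards "IsNext" for that pair.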

#### Data Preprocessing
- Intrasentences: For each labeled sentence, the dataloader fills "BLANK" in the original context and creates one example per sub-word token of the target word, in which that token is replaced by the mask token and kept as the prediction target.

- Intersentences: The dataloader pairs the context with each associated sentence separately, producing one combined input per pair. The special tokens in that input depend on the tokenizer: BERT-style tokenizers use [CLS] and [SEP], while RoBERTa uses `<s>` and `</s>`. The tokenizer must therefore be compatible with the pre-trained model.
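The two preprocessing steps can be sketched on plain strings. This is an illustration under assumptions, not the repository's `dataloader` code: `build_intrasentence_examples` follows the scheme of the original StereoSet loader (earlier pieces of the target word are written in, the current piece is masked), and `format_pair` only mimics the special-token layout. With a real Hugging Face tokenizer, calling `tokenizer(sentence_a, sentence_b)` inserts the correct special tokens automatically. Both helper names are hypothetical.

```python
def build_intrasentence_examples(context, target_pieces, mask_token="<mask>"):
    """Return one (masked_sentence, target_piece) pair per sub-word piece.

    Earlier pieces of the target word are written in and the current piece
    is replaced by the mask token. Strings are used for readability; a real
    loader works with token ids.
    """
    examples = []
    for i, piece in enumerate(target_pieces):
        filled = "".join(target_pieces[:i]) + mask_token
        examples.append((context.replace("BLANK", filled), piece))
    return examples


def format_pair(sentence_a, sentence_b, bos, sep, double_sep=False):
    """Join a context and a candidate sentence with model-specific special tokens.

    BERT style: bos="[CLS]", sep="[SEP]".
    RoBERTa style: bos="<s>", sep="</s>", double_sep=True.
    """
    middle = f" {sep}{sep} " if double_sep else f" {sep} "
    return f"{bos} {sentence_a}{middle}{sentence_b} {sep}"


if __name__ == "__main__":
    for sentence, target in build_intrasentence_examples("The chess player was BLANK.", ["hard", "working"]):
        print(f"{sentence!r} -> predict {target!r}")
    print(format_pair("Context.", "Candidate.", "[CLS]", "[SEP]"))
    print(format_pair("Context.", "Candidate.", "<s>", "</s>", double_sep=True))
```

A two-piece target word therefore yields two examples, each with exactly one masked position.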


#### Evaluating each context's sentences
- Intrasentences: For each sentence, the MLM model predicts a distribution over its vocabulary at the masked position; we read off the probability assigned to the true target token and average these probabilities over all tokens of the target word. Our goal is therefore not to retrieve the highest-scoring (arbitrary) predicted word, but the score of our target word's tokens. Each sentence's score is then compared with the scores of the other sentences in the same context to determine the model's preference with respect to the target bias.

- Intersentences: The NSP model predicts whether the second sentence could be the actual next sentence in the original text. We keep the probability of the IsNext class as the sentence's score, i.e., the probability that the second sentence follows the first.
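The scoring steps above can be sketched in three small functions. `target_token_probs` extracts the probability of each row's true token at its masked position (assuming exactly one mask per row, as in the intrasentence examples), `sentence_scores` averages those probabilities per sentence id, and `stereotype_score` compares the labeled sentences of each context. The last one is a simplified, context-level version of StereoSet's stereotype score (the percentage of contexts where the stereotype sentence outscores the anti-stereotype one, with 50 as the unbiased ideal); the official StereoSet script may aggregate differently. All function names are hypothetical.

```python
from collections import defaultdict

import torch


def target_token_probs(logits, input_ids, mask_token_id, target_ids):
    """Probability of each row's true token at its masked position.

    Assumes exactly one mask token per row.
    logits: (batch, seq_len, vocab); input_ids: (batch, seq_len); target_ids: (batch,)
    """
    probs = logits.softmax(dim=-1)
    masked = probs[input_ids == mask_token_id]  # (batch, vocab), one row per example
    return masked.gather(1, target_ids.view(-1, 1)).squeeze(1)


def sentence_scores(sentence_ids, token_probs):
    """Average the per-token probabilities that belong to the same sentence id."""
    grouped = defaultdict(list)
    for sid, p in zip(sentence_ids, token_probs):
        grouped[sid].append(float(p))
    return {sid: sum(ps) / len(ps) for sid, ps in grouped.items()}


def stereotype_score(contexts):
    """Percentage of contexts whose stereotype sentence outscores its anti-stereotype sentence.

    contexts: list of dicts mapping a gold label to that sentence's score.
    """
    preferred = sum(c["stereotype"] > c["anti-stereotype"] for c in contexts)
    return 100.0 * preferred / len(contexts)


if __name__ == "__main__":
    # Toy batch: 2 rows, 3 positions, vocabulary of 4; mask id is 9 (illustrative)
    logits = torch.zeros(2, 3, 4)
    logits[0, 1, 2] = 10.0
    input_ids = torch.tensor([[5, 9, 5], [9, 5, 5]])
    print(target_token_probs(logits, input_ids, 9, torch.tensor([2, 0])))
    print(sentence_scores(["a", "a", "b"], [0.2, 0.4, 0.9]))
    print(stereotype_score([{"stereotype": 0.6, "anti-stereotype": 0.4},
                            {"stereotype": 0.1, "anti-stereotype": 0.3}]))
```

The same `stereotype_score` applies to intersentence results once each sentence's IsNext probability is used as its score.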


#### Loss calculation
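We apply the Mean Squared Error (MSE) loss, with task-specific target values:
$$MSE = \frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$$
where $y_i$ is the target value assigned to the label of the $i$-th sentence and $\hat{y}_i$ is the corresponding predicted probability.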

- Intrasentences: The loss is the average squared error between the predicted probabilities of the target tokens and a label-dependent target value:
    * Stereotyped sentences: the lower bound of the softmax output (theoretically 0, but since softmax outputs are strictly between 0 and 1, we set a very small target of **1e-5** for stereotyped sentences).
    * Anti-stereotyped sentences: the upper bound of the softmax output (theoretically 1, but for the same reason we set a target just below it, **0.999**, for anti-stereotyped sentences).
    * Unrelated sentences: a neutral target of **0.50**, since these sentences are less relevant to the main context.

- Intersentences: The loss compares the predicted IsNext probability of the second sentence (nextScore) with a label-dependent target:
    * Stereotyped sentences: since we want to reduce the scores of sentences labeled "stereotype", we assign them a target of **1e-5** (not 0, for the same reason as for intrasentences). Whenever the model gives a stereotyped sentence a non-negligible next-sentence score (nextScore > 1e-5), the optimization **minimizes** that score.
    * Anti-stereotyped sentences: conversely, we assign them a target of **0.999** (not 1, for the same reason as for intrasentences). Whenever the model scores an anti-stereotyped sentence below that target (nextScore < 0.999), the optimization **maximizes** that score.
    * Unrelated sentences: we assign them a target of **0.50**, so the optimization pulls nextScore towards the neutral value of 0.50, decreasing it when it is higher and increasing it when it is lower.
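A worked example of the label-dependent MSE described above, as a sketch: the mapping `LABEL_TARGETS` and the helper `bias_mse` are illustrative names, not taken from the repository. With scores 0.30, 0.40 and 0.50 for a stereotype, anti-stereotype and unrelated sentence, the squared errors are about 0.0900, 0.3588 and 0, so the mean is about 0.1496.

```python
import torch
import torch.nn as nn

# Target values from the text above; the dictionary name is illustrative
LABEL_TARGETS = {"stereotype": 1e-5, "anti-stereotype": 0.999, "unrelated": 0.5}


def bias_mse(scores, labels):
    """MSE between predicted probabilities and the label-specific targets."""
    targets = torch.tensor([LABEL_TARGETS[lab] for lab in labels])
    predictions = torch.as_tensor(scores, dtype=torch.float32)
    return nn.MSELoss()(predictions, targets)


if __name__ == "__main__":
    scores = [0.30, 0.40, 0.50]
    labels = ["stereotype", "anti-stereotype", "unrelated"]
    print(f"{bias_mse(scores, labels).item():.4f}")
```

Because MSE is symmetric, an unrelated sentence scored at 0.2 is penalized exactly as much as one scored at 0.8.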



### Data Split and Model Comparability
--todo--  
- Train the NSP head of AutoModelForSequenceClassification on the training set (freeze all pre-trained parameters except those belonging to the NSP head).
- Analyze test-set results of the pre-trained model (loss and SS in MLM and NSP) with bar plots.
- Analyze training results of the fine-tuned adapted model (loss and SS over epochs in MLM and NSP) with curve plots (freeze the pre-trained parameters, including those belonging to the NSP head).
- Add explanations for poor scores of both the pre-trained and the fine-tuned models.
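Freezing everything except the classification head can be sketched with `named_parameters()`. The warning in the log shows that the newly initialized head of `RobertaForSequenceClassification` is named `classifier.*`; how `AdaptedNSPTransformer` prefixes those names is an assumption, so the hypothetical helper `freeze_all_but` matches the keyword against any component of the dotted name rather than only its start. `ToyPairClassifier` is a stand-in module for illustration only.

```python
import torch.nn as nn


def freeze_all_but(model: nn.Module, keyword: str) -> int:
    """Freeze every parameter whose dotted name does not contain `keyword` as a component.

    Returns the number of parameters left trainable.
    """
    for name, param in model.named_parameters():
        param.requires_grad = keyword in name.split(".")
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


class ToyPairClassifier(nn.Module):
    """Stand-in for a pre-trained encoder followed by a 2-way classification head."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 8)
        self.classifier = nn.Linear(8, 2)

    def forward(self, x):
        return self.classifier(self.encoder(x))


if __name__ == "__main__":
    model = ToyPairClassifier()
    print(freeze_all_but(model, "classifier"))  # only the 8x2 weights and 2 biases stay trainable
```

The returned count matches what `SentenceEvaluator.count_parameters` reports, since both only count parameters with `requires_grad=True`.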

In [3]:
# original_init = AdaptedMLMTransformer_Module.AdaptedMLMTransformer.__init__

# # Corrected debug_init signature - assuming only model_name is a required parameter
# def debug_init(self, model_name="roberta-base"):
#     print(f"Initializing AdaptedMLMTransformer with model_name: {model_name}")
#     # Pass only the parameters that original_init expects
#     original_init(self, model_name=model_name)
#     print("AdaptedMLMTransformer initialized.")
#     for name, param in self.named_parameters():
#         print(f"Layer: {name}, Device: {param.device}, Dtype: {param.dtype}")

# AdaptedMLMTransformer_Module.AdaptedMLMTransformer.__init__ = debug_init

In [15]:
# os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
# os.environ['TORCH_USE_CUDA_DSA'] = '0'
INPUT_FILE = "data/stereo_dataset.json"
#INPUT_FILE = "/content/MindTheGap/data/stereo_dataset.json"
OUTPUT_FILE_PRETRAINED ="predictions.json"
OUTPUT_FILE_FINETUNED ="predictions.json"
OUTPUT_DIR = "predictions/"
BATCH_SIZE = 5
MAX_SEQ_LENGTH = 128
NO_CUDA = False
TRAINING_SET_SIZE_PERCENT = 0.85

class SentenceEvaluator():
    def __init__(self,
                 input_file="data/stereo_dataset.json",
                 model_name="roberta-base",
                 intersentence_load_path=None,
                 intrasentence_load_path=None,
                 skip_intrasentence=False,
                 skip_intersentence=False,
                 loss_fn = nn.MSELoss):
        print(f"Loading {input_file}...")

        #self.dataloader = dataloader.StereoSet(os.path.abspath(input_file))
        self.input_file = input_file
        self.model_name = model_name
        self.INTRASENTENCE_LOAD_PATH = intrasentence_load_path
        self.INTERSENTENCE_LOAD_PATH = intersentence_load_path
        self.SKIP_INTERSENTENCE = skip_intersentence
        self.SKIP_INTRASENTENCE = skip_intrasentence
        self.loss_fn = loss_fn()
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        # fall back to CPU when CUDA is requested but not available
        self.device = "cuda" if torch.cuda.is_available() and not NO_CUDA else "cpu"
        self.MASK_TOKEN = self.tokenizer.mask_token
        # cached after the first data split
        self.intra_train_dataset = None
        self.intra_test_dataset = None
        self.inter_train_dataset = None
        self.inter_test_dataset = None

        # Set this to be none if you don't want to batch items together!
        self.max_seq_length = None if BATCH_SIZE == 1 else MAX_SEQ_LENGTH

        self.MASK_TOKEN_IDX = self.tokenizer.encode(
            self.MASK_TOKEN, add_special_tokens=False)
        assert len(self.MASK_TOKEN_IDX) == 1
        self.MASK_TOKEN_IDX = self.MASK_TOKEN_IDX[0]

    @torch.no_grad()  # evaluation only: no autograd graph is needed
    def evaluate_intrasentence(self, useAdapter=False):
        print(f"Using device: {self.device}")
        print(f"Using pretrained class: {self.model_name}")

        model = AdaptedMLMTransformer(model_name=self.model_name).model

        config = AutoConfig.from_pretrained(self.model_name)

        print(f"Model max position embeddings: {config.max_position_embeddings}")
        if torch.cuda.is_available() and self.device == "cuda":
            print("Moving model to GPU...")
            model.to(self.device)
            # Explicitly move the underlying model to the device as well
            if hasattr(model, 'model') and isinstance(model.model, nn.Module):
                model.model.to(self.device)
                #self.tokenizer.to(self.device)
                print(f"Underlying model moved to {self.device}.")
            print(f"{model.__class__.__name__} instance moved to GPU.")
        else:
            print("CUDA is not available or device is not set to cuda, using CPU.")


        if torch.cuda.device_count() > 1 and self.device == "cuda":
            print("Let's use", torch.cuda.device_count(), "GPUs!")
            model = nn.DataParallel(model)
        model.eval()

        print()
        if self.INTRASENTENCE_LOAD_PATH:
            state_dict = torch.load(self.INTRASENTENCE_LOAD_PATH, map_location=self.device)
            model.load_state_dict(state_dict)

        #pad_to_max_length = True if BATCH_SIZE > 1 else False

        dataset = dataloader.IntrasentenceLoader(self.tokenizer, max_seq_length=self.max_seq_length,
                                                 pad_to_max_length='max_length',
                                                 input_file= self.input_file)

        if self.intra_train_dataset is None or self.intra_test_dataset is None:
          #reproducible split of dataset into train and test sets
          gen = torch.Generator().manual_seed(41)
          training_size = int(TRAINING_SET_SIZE_PERCENT * len(dataset))
          test_size = len(dataset) - training_size
          train_dataset, test_dataset = random_split(dataset, [training_size, test_size], generator=gen)
          print(f"First element of the [intrasentences] training set: {train_dataset[0]}")
          print(f"First element of the [intrasentences] test set: {test_dataset[0]}")
          #save splitted datasets for future evaluation
          self.intra_train_dataset = train_dataset
          self.intra_test_dataset = test_dataset

        #print(f'Sentence 0: {dataset.sentences[0]}')
        data_loader = DataLoader(self.intra_test_dataset, batch_size=BATCH_SIZE)
        word_probabilities = defaultdict(list)

        print("Calculating intrasentence predictions...")

        # calculate the logits for each prediction
        for sentence_id, next_token, input_ids, attention_mask, token_type_ids, sentence_label in tqdm(data_loader, total=len(data_loader)):
            # start by converting everything to a tensor
            if BATCH_SIZE == 1:
              print(f"Max attention mask value: {max(attention_mask)}")
              print(f"Min attention mask value: {min(attention_mask)}")
              max_id = max(input_ids)
              print(f"Max input ID: {max_id}, Model vocab size: {self.tokenizer.vocab_size}")

            input_ids = input_ids.squeeze(1).to(self.device)
            attention_mask = attention_mask.squeeze(1).to(self.device)
            next_token = next_token.to(self.device) #token to predict
            # else:  #in case of batch_size > 1
            #   print(input_ids)
            #   input_ids = torch.stack(input_ids).to(self.device).transpose(0, 1)
            #   attention_mask = torch.stack(attention_mask).to(
            #     self.device).transpose(0, 1)
            #   next_token = next_token.to(self.device)
            #   token_type_ids = torch.stack(token_type_ids).to(
            #     self.device).transpose(0, 1)

            mask_idxs = (input_ids == self.MASK_TOKEN_IDX)

            # get the probabilities
            output = model(input_ids, attention_mask=attention_mask)[0].softmax(dim=-1)

            output = output[mask_idxs] #target only the masked positions
            output = output.gather(1, next_token.view(-1, 1)).squeeze(1) #extract the prob of each row's true token from the vocabulary dimension (one mask per row)
            for idx, item in enumerate(output):
                # store (probability of the true token, numeric target value) per sentence id
                word_probabilities[sentence_id[idx]].append((item.item(), float(sentence_label[idx])))

        # now reconcile the token probabilities into sentence-level scores
        sentence_probabilities = []
        for k, v in word_probabilities.items():
            pred = {}
            pred['id'] = k
            # a sentence id has one entry per target token, so its token probabilities are averaged
            # to make the score comparable with the other labeled sentences of the same context
            v_scores = [v_k for v_k, _ in v]
            # score = np.sum([np.log2(i) for i in v]) + np.log2(len(v))
            score = float(np.mean(v_scores))
            pred['score'] = score

            loss = self.loss_fn(torch.tensor(v_scores), torch.tensor([v_v for _, v_v in v]))
            pred['loss'] = loss.item()

            sentence_probabilities.append(pred)

        return sentence_probabilities

    def count_parameters(self, model):
        return sum(p.numel() for p in model.parameters() if p.requires_grad)

    @torch.no_grad()  # evaluation only: no autograd graph is needed
    def evaluate_intersentence(self, useAdapter=False):
        model = AdaptedNSPTransformer().to(self.device)

        config = AutoConfig.from_pretrained(self.model_name)

        print(f"Model max position embeddings: {config.max_position_embeddings}")
        if torch.cuda.is_available() and self.device == "cuda":
            print("Moving model to GPU...")
            model.to(self.device)
            # Explicitly move the underlying model to the device as well
            if hasattr(model, 'model') and isinstance(model.model, nn.Module):
                model.model.to(self.device)
                #self.tokenizer.to(self.device)
                print(f"Underlying model moved to {self.device}.")
            print(f"{model.__class__.__name__} instance moved to GPU.")
        else:
            print("CUDA is not available or device is not set to cuda, using CPU.")


        print(f"Number of parameters: {self.count_parameters(model):,}")
        print("Let's use", torch.cuda.device_count(), "GPUs!")
        model = torch.nn.DataParallel(model)

        if self.INTERSENTENCE_LOAD_PATH:
            model.load_state_dict(torch.load(self.INTERSENTENCE_LOAD_PATH, map_location=self.device))

        model.eval()
        dataset = IntersentenceDataset(self.tokenizer)

        if self.inter_train_dataset is None or self.inter_test_dataset is None:
          #reproducible split of dataset into train and test sets
          gen = torch.Generator().manual_seed(42)
          training_size = int(TRAINING_SET_SIZE_PERCENT * len(dataset))
          test_size = len(dataset) - training_size
          train_dataset, test_dataset = random_split(dataset, [training_size, test_size], generator=gen)
          print(f"First element of the [intersentences] training set: {train_dataset[0]}")
          print(f"First element of the [intersentences] test set: {test_dataset[0]}")
          #save splitted datasets for future evaluation
          self.inter_train_dataset = train_dataset
          self.inter_test_dataset = test_dataset

        #print(f'Sentence 0: {dataset.sentences[0]}')
        data_loader = DataLoader(self.inter_test_dataset, batch_size=BATCH_SIZE)

        print("Calculating intersentence predictions...")
        if NO_CUDA:
            n_cpus = cpu_count()
            print(f"Using {n_cpus} cpus!")
            # each job scores one batch and returns a list of per-sentence predictions, flattened below
            batch_predictions = Parallel(n_jobs=n_cpus, backend="multiprocessing")(delayed(process_job)(
                batch, model, self.model_name, self.loss_fn) for batch in tqdm(data_loader, total=len(data_loader)))
            predictions = [pred for batch_preds in batch_predictions for pred in batch_preds]
        else:
            predictions = []

            for batch_num, batch in tqdm(enumerate(data_loader), total=len(data_loader)):
                input_ids, token_type_ids, attention_mask, sentence_id, sentence_label = batch
                input_ids = input_ids.squeeze(1).to(self.device)
                attention_mask = attention_mask.squeeze(1).to(self.device)
                #token_type_ids = token_type_ids.to(self.device)

                outputs = model(input_ids=input_ids,attention_mask=attention_mask)
                if hasattr(outputs, "logits"):
                    outputs = outputs.logits
                else:
                    outputs = outputs[0]
                outputs = torch.softmax(outputs, dim=1)

                for idx in range(input_ids.shape[0]):
                    #probability of the second sentence to be "next" to the first one, column 1 corresponds to the positive class
                    score = outputs[idx, 1].item()
                    # float() avoids re-wrapping an existing tensor with torch.tensor
                    loss = self.loss_fn(torch.tensor(score), torch.tensor(float(sentence_label[idx])))
                    predictions.append({'id': sentence_id[idx], 'score': score, 'loss': loss.item()})

        return predictions

    def evaluate(self):
        predictions = {}
        if not self.SKIP_INTERSENTENCE:
            intersentence_predictions = self.evaluate_intersentence()
            predictions['intersentence'] = intersentence_predictions

        if not self.SKIP_INTRASENTENCE:
            intrasentence_predictions = self.evaluate_intrasentence()
            predictions['intrasentence'] = intrasentence_predictions
        return predictions


def process_job(batch, model, pretrained_class, loss_fn):
    # same batch layout as the GPU loop in evaluate_intersentence
    input_ids, token_type_ids, attention_mask, sentence_id, sentence_label = batch
    input_ids = input_ids.squeeze(1)
    attention_mask = attention_mask.squeeze(1)
    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
    outputs = outputs.logits if hasattr(outputs, "logits") else outputs[0]
    outputs = torch.softmax(outputs, dim=1)

    predictions = []
    for idx in range(input_ids.shape[0]):
        #probability of the second sentence to be "next" to the first one, column 1 corresponds to the positive class
        score = outputs[idx, 1].item()
        loss = loss_fn(torch.tensor(score), torch.tensor(float(sentence_label[idx])))
        predictions.append({'id': sentence_id[idx], 'score': score, 'loss': loss.item()})
    return predictions

pretrained_model_name="roberta-base"
evaluator = SentenceEvaluator(input_file=INPUT_FILE, model_name=pretrained_model_name)
results = evaluator.evaluate()
results

#save evaluation predictions (create the output directory if needed)
os.makedirs(OUTPUT_DIR, exist_ok=True)
output_file = f"{pretrained_model_name}_{OUTPUT_FILE_PRETRAINED}"
output_file = os.path.join(OUTPUT_DIR, output_file)
with open(output_file, "w+") as f:
    json.dump(results, f, indent=2)

Loading data/stereo_dataset.json...


Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Model max position embeddings: 514
Moving model to GPU...
Underlying model moved to cuda.
AdaptedNSPTransformer instance moved to GPU.
Number of parameters: 124,647,170
Let's use 1 GPUs!




Maximum sequence length found: -inf
First element of the [intersentences] training set: (tensor([[    0, 13584,  1623,  1364,    25,    10,   475,  2137,     4,     2,
             2, 22649, 31720,    32,   747, 11216,     4,     2,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1

100%|██████████| 192/192 [00:12<00:00, 15.60it/s]



Using device: cuda
Using pretrained class: roberta-base
Model max position embeddings: 514
Moving model to GPU...
RobertaForMaskedLM instance moved to GPU.

First element of the [intrasentences] training set: ('f8eadc4d1b9a7e3d8f8c045c6c90a1e6', 10733, tensor([[    0,  2895, 27775,    82,    32,   543, 50264,     4,     2,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1,     1,     1,     1,     1,     1,     1,     1,     1,     1,
             1

100%|██████████| 331/331 [00:29<00:00, 11.09it/s]


