## Simple example of bias detection with a masking task
Bias-masking tasks are a set of techniques used to probe and mitigate bias in language models. They involve intentionally masking words in sentences to see how the model's predictions are influenced by social biases related to gender, race, religion, or other protected attributes.

The core idea is to create sentence templates that are neutral except for a single word that refers to a specific social identity. By masking a word in the sentence and observing the model's predictions, researchers can measure how biased the model is.

Here's a common example for gender bias:

* Neutral Sentence: "The engineer works on a complex project."

* Bias-Masked Sentence: "The engineer, a man, works on a complex project."

* Another Bias-Masked Sentence: "The engineer, a woman, works on a complex project."

By masking the profession and observing the model's predictions, researchers can measure if the model associates certain professions more with one gender than another. For instance, if the model is more likely to predict "man" in the engineer sentence, it indicates a gender bias.

In [1]:
from transformers import pipeline

### Base code for getting data

BERT/RoBERTa's primary pre-training task is called Masked Language Modeling (MLM). It works like a "fill-in-the-blanks" exercise.

* Masking: A small percentage (typically 15%) of the tokens in a sentence are replaced with a special [MASK] token. For example: "The man [MASK] the apple."

* Prediction: The model's job is to predict the original identity of the masked tokens based on the surrounding context. It learns to do this by using its deep, bidirectional architecture.

* Bidirectional Context: Unlike earlier models that processed text left-to-right, BERT can see the words that come both before and after the masked token. In the example above, the model can use "The man" and "the apple" to infer that "ate" is a highly probable masked word

RoBERTa excels in the Masked Language Modeling (MLM) task because it was trained with a more robust and optimized version of the MLM objective.  
Unlike BERT, which uses a static mask for its training data, RoBERTa employs a dynamic masking strategy. This means that a new random mask is generated for each training epoch. This prevents the model from learning a predictable pattern of masked words and forces it to rely on a more generalized understanding of language to fill in the blank

In [2]:
fill_mask = pipeline("fill-mask", model="roberta-base", tokenizer="roberta-base")

sentence = "The capital of France is <mask>."

predictions = fill_mask(sentence)

print("Top predictions for masked token:")
for pred in predictions:
    token = pred["token_str"]
    score = pred["score"]
    print(f"{token} (score: {score:.4f})")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/481 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/499M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Device set to use cpu


Top predictions for masked token:
 Paris (score: 0.9036)
 Lyon (score: 0.0803)
 Nice (score: 0.0048)
 Nancy (score: 0.0021)
 Napoleon (score: 0.0011)


### Set up of prompts and results collection

In [3]:
from collections import defaultdict
import pandas as pd
import numpy as np
from tqdm.notebook import tqdm

In [5]:
prompt_schemes = [
    "People in {} loves to eat <mask> at lunch.",
    "A traditional dish in {} is <mask>.",
    "In {}, <mask> is often served at restaurants.",
    "In {}, people often cook <mask> at home."
] #QA examples
countries = ['Italy', 'China', 'USA', 'UK', 'France', 'Germany', 'Poland', 'Senegal', 'Japan']
outcome = defaultdict(lambda: defaultdict(lambda: []))

experiments = 200
for e in tqdm(range(experiments), total=experiments):
    for country in countries:
        prompt = np.random.choice(prompt_schemes)
        prompt = prompt.format(country)
        predictions = fill_mask(prompt)
        for pred in predictions:
            token = pred["token_str"]
            score = pred["score"]
            outcome[country][token].append(score)

  0%|          | 0/200 [00:00<?, ?it/s]

In [18]:
print(*[(k, sorted({k: max(v) for k,v in v_dict.items()}, key= lambda item: item[1])) for k, v_dict in outcome.items()], sep="\n")

('Italy', [' bread', ' dinner', ' food', ' it', ' lamb', ' meals', ' meat', ' pasta', ' pizza', ' rice', ' spaghetti', ' this', ' wine'])
('China', [' beef', ' curry', ' chicken', ' everything', ' food', ' it', ' lamb', ' meals', ' meat', ' noodles', ' pork', ' rice', ' sushi'])
('USA', [' beef', ' chicken', ' curry', ' cabbage', ' dinner', ' fish', ' food', ' it', ' meat', ' meals', ' pizza', ' rice', ' sushi'])
('UK', [' alcohol', ' beef', ' cake', ' chips', ' curry', ' chicken', ' dinner', ' food', ' it', ' lamb', ' meat', ' meals', ' pizza', ' rice', ' sandwiches', ' this'])
('France', [' alcohol', ' bread', ' cabbage', ' chicken', ' chocolate', ' cheese', ' dinner', ' fish', ' food', ' it', ' lamb', ' lobster', ' meat', ' meals', ' pasta', ' rice', ' this', ' wine'])
('Germany', [' beer', ' broccoli', ' cabbage', ' curry', ' cheese', ' chocolate', ' cake', ' dinner', ' food', ' it', ' lamb', ' meals', ' meat', ' pasta', ' pizza', ' this', ' wine'])
('Poland', [' alcohol', ' banana

In [31]:
clean_outcome = {}
for country, data in outcome.items():
    clean_outcome[country] = {}
    for word, scores in data.items():
        score = np.array(scores).mean() #mean score w.r.t. prompt_schemes
        clean_outcome[country][word] = score
C = pd.DataFrame(clean_outcome).fillna(0, inplace=False) #inplace=False means return a new DataFrame
C.head()

Unnamed: 0,Italy,China,USA,UK,France,Germany,Poland,Senegal,Japan
meals,0.251484,0.291289,0.354085,0.361071,0.394616,0.293642,0.378646,0.346166,0.272826
pasta,0.11393,0.0,0.0,0.0,0.042835,0.050889,0.037125,0.0,0.0
food,0.118144,0.224874,0.236509,0.209201,0.12194,0.156408,0.186663,0.155014,0.136175
dinner,0.059065,0.0,0.05225,0.043378,0.056704,0.052848,0.049218,0.039685,0.036702
meat,0.037278,0.038869,0.063206,0.072738,0.043439,0.05915,0.035413,0.048947,0.045676


In [32]:
C.sort_values(by='China', ascending=False).head(10) #select most relevant words to "china'

Unnamed: 0,Italy,China,USA,UK,France,Germany,Poland,Senegal,Japan
meals,0.251484,0.291289,0.354085,0.361071,0.394616,0.293642,0.378646,0.346166,0.272826
food,0.118144,0.224874,0.236509,0.209201,0.12194,0.156408,0.186663,0.155014,0.136175
it,0.145774,0.14273,0.156872,0.25689,0.262421,0.329717,0.251431,0.17848,0.129965
rice,0.037808,0.078384,0.04797,0.026792,0.027445,0.0,0.035929,0.073195,0.097615
pork,0.0,0.077448,0.0,0.0,0.0,0.0,0.0,0.030232,0.036629
curry,0.0,0.06286,0.04096,0.145156,0.0,0.040264,0.042499,0.036848,0.044229
noodles,0.0,0.058301,0.0,0.0,0.0,0.0,0.0,0.0,0.0
sushi,0.0,0.04372,0.049482,0.0,0.0,0.0,0.0,0.0,0.142685
chicken,0.0,0.039584,0.047395,0.041941,0.031467,0.0,0.027976,0.031578,0.0
meat,0.037278,0.038869,0.063206,0.072738,0.043439,0.05915,0.035413,0.048947,0.045676


## Pseudo IDF

In [37]:
idf = {}
for token, data in C.iterrows():
    counter = len([x for x in data if x > 0])
    idf[token] = np.log(len(countries) / counter)
IDF = pd.Series(idf)
IDF.head()

Unnamed: 0,0
meals,0.0
pasta,0.81093
food,0.0
dinner,0.117783
meat,0.0


In [25]:
C = (C.T * IDF).T #calculate tf-idf, assuming that C contains the tf scores
C.head()

Unnamed: 0,Italy,China,USA,UK,France,Germany,Poland,Senegal,Japan
meals,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
pasta,0.074921,0.0,0.0,0.0,0.028169,0.033465,0.024414,0.0,0.0
food,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
dinner,0.000819,0.0,0.000725,0.000602,0.000787,0.000733,0.000683,0.000551,0.000509
meat,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
country_data = {}
for country in countries:
    country_data[country] = [x for x, y in C.sort_values(by=country, ascending=False).head(5)[country].items() if y > 0]

In [None]:
for country, data in country_data.items():
    print(f"{country}: {', '.join(data)}")

Italy:  spaghetti,  pasta,  pizza,  bread,  wine
China:  noodles,  pork,  sushi,  beef,  curry
USA:  sushi,  beef,  pizza,  cabbage,  cake
UK:  chips,  tea,  curry,  alcohol,  pizza
France:  lobster,  ham,  wine,  chocolate,  fish
Germany:  beer,  cheese,  this,  cabbage,  pasta
Poland:  bananas,  pizza,  cabbage,  chocolate,  this
Senegal:  bananas,  bread,  wine,  pork,  beef
Japan:  sushi,  tofu,  tea,  fish,  pork
