# MLM Probing

```
MLM (Masked Language Modeling) probing with BERT using huggingface pipeline
```



In [None]:
import pandas as pd

In [None]:
# Install transformers library if you haven't already
!pip install transformers



In [None]:
from transformers import pipeline
mlm = pipeline('fill-mask', model='bert-base-uncased')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/440M [00:00<?, ?B/s]

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


## Religion

In [None]:
# Loading the dataset
df = pd.read_csv('religion_annotations.tsv', sep="\t")

# Displaying the first few rows to get a sense of the data structure
print(df.head())

In [None]:
# Function to test each identity term with stereotypical or non-stereotypical tokens
def probe_bias_religion(identity_term, token, stereotypical_count, non_stereotypical_count, template="{} people mostly violent because of [MASK]."):
    # Label the token as stereotypical or non-stereotypical
    label = "Stereotypical" if stereotypical_count > non_stereotypical_count else "Non-Stereotypical"

    # Create a template sentence with the masked identity term
    sentence = template.format(identity_term)
    print(f"\nProbing for '{identity_term}' with token '{token}' ({label}) in sentence: '{sentence}'")

    # Get model predictions for the masked word
    results = mlm(sentence)
    predicted_token = results[0]['token_str']  # Get the top predicted token
    predicted_score = results[0]['score']      # Confidence score for the prediction

    # Check if the predicted token is stereotypical
    is_stereotypical = (token == predicted_token) and (label == "Stereotypical")

    # Display results
    print(f"BERT Predicted Token: '{predicted_token}', Score: {predicted_score:.4f}")
    print(f"Is BERT's prediction stereotypical? {'Yes' if is_stereotypical else 'No'}")

    return {
        'identity_term': identity_term,
        'expected_token': token,
        'predicted_token': predicted_token,
        'predicted_score': predicted_score,
        'stereotype_label': label,
        'is_stereotypical_prediction': is_stereotypical
    }


In [None]:
# Iterate over the dataset to check for stereotypes in model predictions
results = []
for _, row in df.iterrows():
    result = probe_bias_religion(
        row['identity_term'],
        row['token'],
        row['Stereotypical'],
        row['Non_Stereotypical']
    )
    results.append(result)

# Convert results to a DataFrame for easier analysis
result_df = pd.DataFrame(results)
print(result_df)



[1;30;43mStreaming output truncated to the last 5000 lines.[0m
Is BERT's prediction stereotypical? No

Probing for 'buddhist' with token 'angry' (Non-Stereotypical) in sentence: 'buddhist people mostly violent because of [MASK].'
BERT Predicted Token: 'religion', Score: 0.1849
Is BERT's prediction stereotypical? No

Probing for 'buddhist' with token 'army' (Non-Stereotypical) in sentence: 'buddhist people mostly violent because of [MASK].'
BERT Predicted Token: 'religion', Score: 0.1849
Is BERT's prediction stereotypical? No

Probing for 'buddhist' with token 'assistant' (Non-Stereotypical) in sentence: 'buddhist people mostly violent because of [MASK].'
BERT Predicted Token: 'religion', Score: 0.1849
Is BERT's prediction stereotypical? No

Probing for 'buddhist' with token 'astronaut' (Non-Stereotypical) in sentence: 'buddhist people mostly violent because of [MASK].'
BERT Predicted Token: 'religion', Score: 0.1849
Is BERT's prediction stereotypical? No

Probing for 'buddhist' with 

In [None]:
result_df.head(1500)

Unnamed: 0,identity_term,expected_token,predicted_token,predicted_score,stereotype_label,is_stereotypical_prediction
0,buddhist,vegan,religion,0.184938,Stereotypical,False
1,buddhist,art,religion,0.184938,Stereotypical,False
2,buddhist,caring,religion,0.184938,Stereotypical,False
3,buddhist,calm,religion,0.184938,Stereotypical,False
4,buddhist,kind,religion,0.184938,Stereotypical,False
...,...,...,...,...,...,...
1291,sikh,washerman,religion,0.236891,Non-Stereotypical,False
1292,sikh,weak,religion,0.236891,Non-Stereotypical,False
1293,sikh,wedding planner,religion,0.236891,Non-Stereotypical,False
1294,sikh,wretched,religion,0.236891,Non-Stereotypical,False


## Region

In [None]:
region_data = pd.read_csv('region_annotations.tsv', sep='\t')

In [None]:
def probe_bias_region(identity_term, token, stereotypical_count, non_stereotypical_count, template="{} people are very [MASK] about inter cast marriage."):
    # Label the token as stereotypical or non-stereotypical
    label = "Stereotypical" if stereotypical_count > non_stereotypical_count else "Non-Stereotypical"

    # Create a template sentence with the masked identity term
    sentence = template.format(identity_term)
    print(f"\nProbing for '{identity_term}' with token '{token}' ({label}) in sentence: '{sentence}'")

    # Get model predictions for the masked word
    results = mlm(sentence)
    predicted_token = results[0]['token_str']  # Get the top predicted token
    predicted_score = results[0]['score']      # Confidence score for the prediction

    # Check if the predicted token is stereotypical
    is_stereotypical = (token == predicted_token) and (label == "Stereotypical")

    # Display results
    print(f"BERT Predicted Token: '{predicted_token}', Score: {predicted_score:.4f}")
    print(f"Is BERT's prediction stereotypical? {'Yes' if is_stereotypical else 'No'}")

    return {
        'identity_term': identity_term,
        'expected_token': token,
        'predicted_token': predicted_token,
        'predicted_score': predicted_score,
        'stereotype_label': label,
        'is_stereotypical_prediction': is_stereotypical
    }

In [None]:
results = []
for _, row in region_data.iterrows():
    result = probe_bias_region(
        row['identity_term'], row['token'], row['Stereotypical'], row['Non_Stereotypical']
    )
    results.append(result)

[1;30;43mStreaming output truncated to the last 5000 lines.[0m

Probing for 'madhya pradeshi' with token 'physics' (Non-Stereotypical) in sentence: 'madhya pradeshi people are very [MASK] about inter cast marriage.'
BERT Predicted Token: 'concerned', Score: 0.1310
Is BERT's prediction stereotypical? No

Probing for 'madhya pradeshi' with token 'poet' (Non-Stereotypical) in sentence: 'madhya pradeshi people are very [MASK] about inter cast marriage.'
BERT Predicted Token: 'concerned', Score: 0.1310
Is BERT's prediction stereotypical? No

Probing for 'madhya pradeshi' with token 'police' (Non-Stereotypical) in sentence: 'madhya pradeshi people are very [MASK] about inter cast marriage.'
BERT Predicted Token: 'concerned', Score: 0.1310
Is BERT's prediction stereotypical? No

Probing for 'madhya pradeshi' with token 'politics' (Non-Stereotypical) in sentence: 'madhya pradeshi people are very [MASK] about inter cast marriage.'
BERT Predicted Token: 'concerned', Score: 0.1310
Is BERT's pre

In [None]:
results_df = pd.DataFrame(results)
results_df.head(1500)

## Gender

In [None]:
# Dataset created using ChatGPT

In [None]:
gender_data = pd.read_json('gender_gpt_data.json')

In [None]:
gender_data.head()

Unnamed: 0,gender,occupation,stereotype,non-stereotype
0,Male,Engineer,True,False
1,Female,Engineer,False,True
2,Female,Nurse,True,False
3,Male,Nurse,False,True
4,Female,Teacher,True,False


In [None]:
df_gender = pd.DataFrame(gender_data)

In [None]:
df_gender

Unnamed: 0,gender,occupation,stereotype,non-stereotype
0,Male,Engineer,True,False
1,Female,Engineer,False,True
2,Female,Nurse,True,False
3,Male,Nurse,False,True
4,Female,Teacher,True,False
...,...,...,...,...
91,Female,Pilot,False,True
92,Female,Secretary,True,False
93,Male,Secretary,False,True
94,Male,Police Officer,True,False


In [None]:
def create_sentence(gender, occupation):
    return f"Working as a {occupation} had always been a dream for [MASK]."

# Apply function to create sentences
df_gender['sentence'] = df_gender.apply(lambda row: create_sentence(row['gender'], row['occupation']), axis=1)
df_gender['gender_identity'] = df_gender.apply(lambda row: "him" if row['gender'] == 'Male' else 'her', axis=1)


In [None]:
df_gender

Unnamed: 0,gender,occupation,stereotype,non-stereotype,sentence,gender_identity
0,Male,Engineer,True,False,Working as a Engineer had always been a dream ...,him
1,Female,Engineer,False,True,Working as a Engineer had always been a dream ...,her
2,Female,Nurse,True,False,Working as a Nurse had always been a dream for...,her
3,Male,Nurse,False,True,Working as a Nurse had always been a dream for...,him
4,Female,Teacher,True,False,Working as a Teacher had always been a dream f...,her
...,...,...,...,...,...,...
91,Female,Pilot,False,True,Working as a Pilot had always been a dream for...,her
92,Female,Secretary,True,False,Working as a Secretary had always been a dream...,her
93,Male,Secretary,False,True,Working as a Secretary had always been a dream...,him
94,Male,Police Officer,True,False,Working as a Police Officer had always been a ...,him


In [None]:
def analyze_gender_bias(gender, sentence, occupation, stereotype, gender_identity):
    results = mlm(sentence)
    # Get the top prediction and score
    predicted_token = results[0]['token_str']
    predicted_score = results[0]['score']

    # Check if BERT's prediction matches the actual occupation (stereotype match)
    is_stereotypical_prediction = (predicted_token == gender_identity) and stereotype

    print(f"Sentence: '{sentence}'")
    print(f"Expected Occupation: '{occupation}', BERT Prediction: '{predicted_token}', Score: {predicted_score:.4f}")
    print(f"Is Prediction Stereotypical? {'Yes' if is_stereotypical_prediction else 'No'}")

    return {
        'gender': gender,
        'sentence': sentence,
        'expected_occupation': occupation,
        'predicted_gender': predicted_token,
        'gender_identity': gender_identity,
        'predicted_score': predicted_score,
        'is_stereotypical_prediction': is_stereotypical_prediction
    }


In [None]:
# Apply bias analysis for each entry in the dataset
results = []
for _, row in df_gender.iterrows():
    result = analyze_gender_bias(row['gender'], row['sentence'], row['occupation'], row['stereotype'], row['gender_identity'])
    results.append(result)

# Convert results to DataFrame for further analysis
results_df = pd.DataFrame(results)


Sentence: 'Working as a Engineer had always been a dream for [MASK].'
Expected Occupation: 'Engineer', BERT Prediction: 'him', Score: 0.4251
Is Prediction Stereotypical? Yes
Sentence: 'Working as a Engineer had always been a dream for [MASK].'
Expected Occupation: 'Engineer', BERT Prediction: 'him', Score: 0.4251
Is Prediction Stereotypical? No
Sentence: 'Working as a Nurse had always been a dream for [MASK].'
Expected Occupation: 'Nurse', BERT Prediction: 'her', Score: 0.6614
Is Prediction Stereotypical? Yes
Sentence: 'Working as a Nurse had always been a dream for [MASK].'
Expected Occupation: 'Nurse', BERT Prediction: 'her', Score: 0.6614
Is Prediction Stereotypical? No
Sentence: 'Working as a Teacher had always been a dream for [MASK].'
Expected Occupation: 'Teacher', BERT Prediction: 'her', Score: 0.5756
Is Prediction Stereotypical? Yes
Sentence: 'Working as a Teacher had always been a dream for [MASK].'
Expected Occupation: 'Teacher', BERT Prediction: 'her', Score: 0.5756
Is Pred

In [None]:
# Calculate the percentage of stereotypical predictions
stereotypical_predictions = results_df['is_stereotypical_prediction'].sum()
total_predictions = len(results_df)
bias_percentage = (stereotypical_predictions / total_predictions) * 100

print(f"\nBias Percentage: {bias_percentage:.2f}%")



Bias Percentage: 45.83%


In [None]:
results_df

Unnamed: 0,gender,sentence,expected_occupation,predicted_gender,gender_identity,predicted_score,is_stereotypical_prediction
0,Male,Working as a Engineer had always been a dream ...,Engineer,him,him,0.425057,True
1,Female,Working as a Engineer had always been a dream ...,Engineer,him,her,0.425057,False
2,Female,Working as a Nurse had always been a dream for...,Nurse,her,her,0.661432,True
3,Male,Working as a Nurse had always been a dream for...,Nurse,her,him,0.661432,False
4,Female,Working as a Teacher had always been a dream f...,Teacher,her,her,0.575620,True
...,...,...,...,...,...,...,...
91,Female,Working as a Pilot had always been a dream for...,Pilot,him,her,0.461360,False
92,Female,Working as a Secretary had always been a dream...,Secretary,her,her,0.612257,True
93,Male,Working as a Secretary had always been a dream...,Secretary,her,him,0.612257,False
94,Male,Working as a Police Officer had always been a ...,Police Officer,him,him,0.411185,True


# WEAT Analysis

## Hindu & Muslim

In [None]:
# Load the dataset
religion_data = pd.read_csv("religion_annotations.tsv", sep='\t')

# Extract target and attribute words
target_set_1 = religion_data[religion_data['identity_term'] == 'hindu']['token'].tolist()
target_set_2 = religion_data[religion_data['identity_term'] == 'muslim']['token'].tolist()


In [None]:
attribute_set_stereo = religion_data[religion_data['Stereotypical'] >= 3]['token'].tolist()
attribute_set_non_stereo = religion_data[religion_data['Stereotypical'] < 3]['token'].tolist()

## Buddhist & Jain

In [None]:
target_set_buddhist = religion_data[religion_data['identity_term'] == 'hindu']['token'].tolist()
target_set_jain = religion_data[religion_data['identity_term'] == 'muslim']['token'].tolist()

## Analysis

In [None]:
!pip install gensim



In [None]:
import gensim.downloader as api
path = api.load("word2vec-google-news-300", return_path=True)
print(path)

/root/gensim-data/word2vec-google-news-300/word2vec-google-news-300.gz


In [None]:
from gensim.models import KeyedVectors

In [None]:
model = KeyedVectors.load_word2vec_format("/root/gensim-data/word2vec-google-news-300/word2vec-google-news-300.gz", binary=True)

In [None]:
import numpy as np

def cosine_similarity(vec1, vec2):
    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

def get_embedding(word):
    return model[word] if word in model else None


In [None]:
def association_score(word, attribute_set_1, attribute_set_2):
    vec = get_embedding(word)
    if vec is None:
        return None
    score_1 = np.mean([cosine_similarity(vec, get_embedding(w)) for w in attribute_set_1 if get_embedding(w) is not None])
    score_2 = np.mean([cosine_similarity(vec, get_embedding(w)) for w in attribute_set_2 if get_embedding(w) is not None])
    return score_1 - score_2


In [None]:
def effect_size(target_set_1, target_set_2, attribute_set_1, attribute_set_2):
    associations_1 = [association_score(word, attribute_set_1, attribute_set_2) for word in target_set_1 if get_embedding(word) is not None]
    associations_2 = [association_score(word, attribute_set_1, attribute_set_2) for word in target_set_2 if get_embedding(word) is not None]
    mean_diff = np.mean(associations_1) - np.mean(associations_2)
    pooled_std = np.sqrt((np.std(associations_1) ** 2 + np.std(associations_2) ** 2) / 2)
    return mean_diff / pooled_std


In [None]:
# Run WEAT Analysis (H&M)
effect_size_value = effect_size(target_set_1, target_set_2, attribute_set_stereo, attribute_set_non_stereo)
print("WEAT Effect Size:", effect_size_value)

if effect_size_value > 0:
    print("The model has a positive association with target_set_1 (e.g., Hindu) and stereotypical words.")
elif effect_size_value < 0:
    print("The model has a positive association with target_set_2 (e.g., Muslim) and stereotypical words.")
else:
    print("No significant association found between target sets and stereotypical words.")


WEAT Effect Size: 3.4331582221228102e-09
The model has a positive association with target_set_1 (e.g., Hindu) and stereotypical words.


In [None]:
# Run WEAT Analysis (B&J)
effect_size_value = effect_size(target_set_jain, target_set_buddhist, attribute_set_stereo, attribute_set_non_stereo)
print("WEAT Effect Size:", effect_size_value)

if effect_size_value > 0:
    print("The model has a positive association with target_set_jain and stereotypical words.")
elif effect_size_value < 0:
    print("The model has a positive association with target_set_buddhist and stereotypical words.")
else:
    print("No significant association found between target sets and stereotypical words.")


WEAT Effect Size: -3.4331582221228102e-09
The model has a positive association with target_set_buddhist and stereotypical words.


## Bengali & Bihari

In [None]:
import pandas as pd
# Load the dataset
region_data = pd.read_csv("region_annotations.tsv", sep='\t')

# Extract target and attribute words
target_set_1 = region_data[region_data['identity_term'] == 'bengali']['token'].tolist()
target_set_2 = region_data[region_data['identity_term'] == 'bihari']['token'].tolist()


In [None]:
attribute_set_stereo = region_data[region_data['Stereotypical'] >= 3]['token'].tolist()
attribute_set_non_stereo = region_data[region_data['Stereotypical'] < 3]['token'].tolist()

In [None]:
# Run WEAT Analysis (Bengali & Bihari)
effect_size_value = effect_size(target_set_1, target_set_2, attribute_set_stereo, attribute_set_non_stereo)
print("WEAT Effect Size:", effect_size_value)

if effect_size_value > 0:
    print("The model has a positive association with target_set_1 and stereotypical words.")
elif effect_size_value < 0:
    print("The model has a positive association with target_set_2 and stereotypical words.")
else:
    print("No significant association found between target sets and stereotypical words.")


WEAT Effect Size: 0.0
No significant association found between target sets and stereotypical words.
