# Implicit bias in BERT
We saw that BERT performs well on most tasks and is unbiased on certain tasks. However, we put forward another hypothesis that BERT has an implicit bias towards certain groups of people. 

Like the [implicit bias test](https://implicit.harvard.edu/implicit/Study?tid=-1) for humans that uses answering speed to infer implicit bias in humans. This test utilizes response time to infer that people's first intuition is to associate certain words with certain groups of people. For instance, if you are asked to name a profession and you say "doctor" faster than "lawyer", then you have an implicit bias towards doctors over lawyers. The test gives mulitple pictures of African American and European American children, pleasant words, and unpleasant words. As each items appear you are asked to make responses by swiping left or right as quickly as possible. The test then uses the response time to infer implicit bias. 


We propose a similar test for BERT that proposes that BERT has two mechanisms inside the neural network, 
1. A mechanism that creates bias. 
2. A mechanism that removes bias.

Ideally BERT would have none of the above mechanisms and would be completely unbiased from the beginning. However, I propose that because biases are discovered later and retraining is used to remove the biases there is slight chance that the biases are still present in the model but just hidden by the last few layers of the model. 

If our hypothesis is true we can make the following two predictions.

1. If we remove the last few layers of BERT, we should see the implicit bias in BERT. 

2. If we see the difference between biased and unbiased BERT, we can see that later neural network layers are changed more than the earlier layers of the network.

In [1]:
# Measuring implicit
# Take the firt half of hidden states from BERT and store them for later use
from transformers import BertTokenizer, BertModel
import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')

Sentence_1 = "The person is from the United States"
Sentence_2 = "The person is from the India"

sentences = [Sentence_1, Sentence_2]

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')


def implicit_bias(sentences):
    """
    Find the difference between the first half of the hidden states 
    and the second half for different sentences. 

    :param: sentences array cotaining each group as an index
    """
    pass

def plot_implicit_bias(implicit_bias):
    """
    Plot the implicit bias for each sentence
    """
    pass

: 

: 

In [27]:
n = 2 # 2, 3, 4, 6 or 12
from collections import defaultdict

from torch import embedding
implicit_bias = defaultdict(list)
for sentence in sentences:
    from transformers import BertTokenizer, BertModel
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    model = BertModel.from_pretrained("bert-base-uncased")
    encoded_input = tokenizer(sentence, return_tensors='pt')
    output = model(**encoded_input, output_hidden_states=True)
    hidden_states = output[2]
    embedding_output = hidden_states[0]
    hidden_states = hidden_states[1:]
    # divide the hidden states into n pieces and store them for later use
    n_halfs = len(hidden_states) // n

    # get the first half of the hidden states
    for i in range(n):
        implicit_bias[sentence].append(hidden_states[i * n_halfs: (i + 1) * n_halfs])


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.decoder.weight', 'cls.predictions.bi

In [21]:
from transformers import BertTokenizer, BertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained("bert-base-uncased")
encoded_input = tokenizer("They are from United States", return_tensors='pt')
output = model(**encoded_input, output_hidden_states=True)
hidden_states = output[2]

# Lets print the length of the hidden states
print(len(hidden_states))

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


13


In [28]:
# 13 
embedding_output = hidden_states[0]
attention_hidden_states = hidden_states[1:]

for i in range(len(attention_hidden_states)):
    print(attention_hidden_states[i].shape)
print("Embedding output", embedding_output.shape)
print("Tokenized input", encoded_input['input_ids'].shape)

torch.Size([1, 8, 768])
torch.Size([1, 8, 768])
torch.Size([1, 8, 768])
torch.Size([1, 8, 768])
torch.Size([1, 8, 768])
torch.Size([1, 8, 768])
torch.Size([1, 8, 768])
torch.Size([1, 8, 768])
torch.Size([1, 8, 768])
torch.Size([1, 8, 768])
torch.Size([1, 8, 768])
Embedding output torch.Size([1, 8, 768])
Tokenized input torch.Size([1, 8])


# Hypothesis 1: there are more bias suppresors in the later layers due to debiasing training


In [None]:
# Input a biased sentence and find the activation in the first half of the hidden states
# and the second half of the hidden states
# Compare the difference between the two halves

# Input a biased sentence and find the activation in the first half of the hidden states

