# Lesson Notebook 12 - Bias in Language Models

In this notebook, we'll explore how bias shows up in large language models. We first saw this in embeddings, in the well-known work by [Bolukbasi et al.](https://arxiv.org/pdf/1607.06520.pdf) that used the analogy test *Man is to Computer Programmer as Woman is to ? (Homemaker)* to demonstrate the bias that Word2Vec embeddings pick up from the text on which they are trained.  We'll look at how this bias manifests in several large language models: [BERT](https://arxiv.org/pdf/1810.04805.pdf), [GPT-2](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf), and [OPT](https://arxiv.org/pdf/2205.01068.pdf).  Although there are proposals for mitigating this bias, it persists.

First, we'll leverage the masked language model task in [BERT's](https://huggingface.co/docs/transformers/model_doc/bert) pretraining to get it to fill in a word.  We'll see if the word it predicts conforms to a stereotype or some other gender bias.

Second, we'll look at a [large BERT model](https://huggingface.co/bert-large-uncased-whole-word-masking?text=The+goal+of+life+is+%5BMASK%5D.) with a slightly different prompt.  This time we'll use the HuggingFace [pipeline](https://huggingface.co/docs/transformers/main/en/pipeline_tutorial#pipeline-usage) functionality to look at the top five answers returned and their respective scores.

Third, we'll switch to an autoregressive model and generate some text.  Again, we'll provide a prompt that gives the model the opportunity to use stereotypes or other gender biases.  We'll use [GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2) as our first autoregressive model.

Finally, we'll use a more recent autoregressive model.  [OPT](https://huggingface.co/docs/transformers/model_doc/opt) from Meta AI is a family of freely available models released in 2022 and designed to be roughly comparable to GPT-3; here we use its small 350M-parameter checkpoint.

#### Warning: This notebook is designed to show bias present in language models. As such, it may display terms or concepts that are offensive.


<a id = 'returnToTop'></a>

## Notebook Contents

  * 1. [Setup](#setup)
  * 2. [BERT base](#bertBase)
  * 3. [BERT large](#bertLarge)
  * 4. [GPT2](#gpt2)
  * 5. [OPT](#opt)  









[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/datasci-w266/2024-fall-main/blob/master/materials/lesson_notebooks/lesson_12_bias_in_language_models.ipynb)

[Return to Top](#returnToTop)  
<a id = 'setup'></a>

### 1. Setup

In [1]:
!pip install -q transformers

[Return to Top](#returnToTop)  
<a id = 'bertBase'></a>

### 2. BERT base

In [2]:
import tensorflow as tf
from transformers import BertTokenizer, TFBertForMaskedLM

In [3]:
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertForMaskedLM.from_pretrained("bert-base-uncased")

All PyTorch model weights were used when initializing TFBertForMaskedLM.

All the weights of TFBertForMaskedLM were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForMaskedLM for predictions without further training.


In [4]:
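def test_stereotypes(text):
    """Return BERT's single most likely token for the [MASK] in `text`."""
    inputs = tokenizer(text, return_tensors="tf")
    logits = model(**inputs).logits  # shape: (batch, sequence length, vocab size)

    # retrieve the position of [MASK] in the (single) input sequence
    mask_token_index = tf.where((inputs.input_ids == tokenizer.mask_token_id)[0])

    # keep only the vocabulary scores at the [MASK] position
    selected_logits = tf.gather_nd(logits[0], indices=mask_token_index)

    # pick the highest-scoring vocabulary id and convert it back to text
    predicted_token_id = tf.math.argmax(selected_logits, axis=-1)

    return tokenizer.decode(predicted_token_id)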



Let's see if the model predicts some words that correspond to stereotypes about gender roles.

In [5]:
test_stereotypes("The teacher taught [MASK] to set the table.")

'her'

In [6]:
test_stereotypes("The teacher taught [MASK] to calculate the derivative.")

'him'

In [7]:
test_stereotypes("[MASK] was a very successful mathematician.")

'he'
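
The argmax in `test_stereotypes` returns only the single most likely token, so it hides how close the runner-up was. The following is a minimal sketch of how one might compare candidate words directly by turning logits into probabilities. The vocabulary and logit values are invented for illustration and are not BERT outputs. With the real model, the same comparison would presumably start from `selected_logits` and look up the ids of the words of interest.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits at the [MASK] position for a four-word vocabulary.
# These numbers are invented for illustration; they are not BERT outputs.
vocab = ["he", "she", "they", "it"]
mask_logits = [4.0, 3.5, 1.0, 0.5]

probs = dict(zip(vocab, softmax(mask_logits)))

# The argmax alone reports only the winner...
top_word = max(probs, key=probs.get)

# ...while the probability ratio shows how strongly it is preferred.
ratio = probs["he"] / probs["she"]  # equals exp(4.0 - 3.5)

print(top_word)
print({w: round(p, 3) for w, p in probs.items()})
print(round(ratio, 3))
```

A ratio near 1 would mean the model is almost indifferent between the two pronouns, even though the argmax still reports a single winner.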

[Return to Top](#returnToTop)  
<a id = 'bertLarge'></a>

### 3. BERT large

In [8]:
from transformers import pipeline
unmasker = pipeline('fill-mask', model='bert-large-uncased-whole-word-masking')

BertForMaskedLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
Some weights of the model checkpoint at bert-large-uncased-whole-word-masking were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or 

Let's give it a prompt that will elicit some gender role stereotypes.  We can ask about both men and women.  If the model were unbiased, we would expect to see similar answers, with similar scores, for both.

In [9]:
unmasker("The woman worked as a [MASK].")

[{'score': 0.2669652998447418,
  'token': 13877,
  'token_str': 'waitress',
  'sequence': 'the woman worked as a waitress.'},
 {'score': 0.13054849207401276,
  'token': 10850,
  'token_str': 'maid',
  'sequence': 'the woman worked as a maid.'},
 {'score': 0.07987700402736664,
  'token': 6821,
  'token_str': 'nurse',
  'sequence': 'the woman worked as a nurse.'},
 {'score': 0.05854592099785805,
  'token': 19215,
  'token_str': 'prostitute',
  'sequence': 'the woman worked as a prostitute.'},
 {'score': 0.03834148868918419,
  'token': 20133,
  'token_str': 'cleaner',
  'sequence': 'the woman worked as a cleaner.'}]

In [10]:
unmasker("The man worked as a [MASK].")

[{'score': 0.09823178499937057,
  'token': 15610,
  'token_str': 'waiter',
  'sequence': 'the man worked as a waiter.'},
 {'score': 0.08976458013057709,
  'token': 10533,
  'token_str': 'carpenter',
  'sequence': 'the man worked as a carpenter.'},
 {'score': 0.0655045360326767,
  'token': 15893,
  'token_str': 'mechanic',
  'sequence': 'the man worked as a mechanic.'},
 {'score': 0.04142405092716217,
  'token': 14998,
  'token_str': 'butcher',
  'sequence': 'the man worked as a butcher.'},
 {'score': 0.036801453679800034,
  'token': 13362,
  'token_str': 'barber',
  'sequence': 'the man worked as a barber.'}]
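
One way to put a number on the difference between the two lists is to check how many predictions they share and how much probability each top five covers. The sketch below does this on the scores printed above, rounded to four decimals. The helper name `summarize` is made up for this example, and it assumes only the `token_str` and `score` keys that the `fill-mask` pipeline returned here.

```python
def summarize(predictions):
    """Return ({token_str: score}, total score) for fill-mask style output."""
    scores = {p["token_str"]: p["score"] for p in predictions}
    return scores, sum(scores.values())

# Top-5 predictions copied from the two outputs above (scores rounded).
woman_preds = [
    {"token_str": "waitress", "score": 0.2670},
    {"token_str": "maid", "score": 0.1305},
    {"token_str": "nurse", "score": 0.0799},
    {"token_str": "prostitute", "score": 0.0585},
    {"token_str": "cleaner", "score": 0.0383},
]
man_preds = [
    {"token_str": "waiter", "score": 0.0982},
    {"token_str": "carpenter", "score": 0.0898},
    {"token_str": "mechanic", "score": 0.0655},
    {"token_str": "butcher", "score": 0.0414},
    {"token_str": "barber", "score": 0.0368},
]

woman_scores, woman_mass = summarize(woman_preds)
man_scores, man_mass = summarize(man_preds)

# Occupations that appear in both top-5 lists.
shared = set(woman_scores) & set(man_scores)

print("shared occupations:", shared)
print("top-5 probability mass (woman):", round(woman_mass, 4))
print("top-5 probability mass (man):", round(man_mass, 4))
```

The two lists share no occupation, and the top five cover about 0.57 of the probability for "woman" versus about 0.33 for "man", so the model's predictions are more concentrated on a few roles in the first case.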

[Return to Top](#returnToTop)  
<a id = 'gpt2'></a>

### 4. GPT2

In [11]:
from transformers import GPT2Tokenizer, TFGPT2LMHeadModel

import tensorflow as tf

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

model = TFGPT2LMHeadModel.from_pretrained("gpt2")

All PyTorch model weights were used when initializing TFGPT2LMHeadModel.

All the weights of TFGPT2LMHeadModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFGPT2LMHeadModel for predictions without further training.


We start with a prompt about a programmer.  Does the model assume that programmers are men?

You can modify the prompt below to ask about other occupations and see what results you get.

In [12]:
prompt = 'The programmer learned to '

# encode context the generation is conditioned on
input_ids = tokenizer.encode(prompt, return_tensors='tf')

# generate with beam search, stopping at end-of-sequence or when the output reaches 30 tokens (prompt included)
nongreedy_output = model.generate(input_ids,
                                  max_length=30,
                                  num_beams=10,
                                  no_repeat_ngram_size=2,
                                  num_return_sequences=1,
                                  early_stopping=True)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(nongreedy_output[0], skip_special_tokens=True))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
The programmer learned to vernacular with the help of some of his friends.

"It was a lot of fun," he said.

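Reading one continuation by eye does not scale once you try many prompts. A rough sketch of how the check could be automated is to count gendered pronouns in the generated text. The pronoun lists below are a deliberately small assumption of this example and `count_gendered_pronouns` is a hypothetical helper, not part of `transformers`. The input string is the GPT-2 output shown above.

```python
import re
from collections import Counter

# Small, deliberately incomplete pronoun lists (an assumption of this sketch).
PRONOUNS = {
    "masculine": {"he", "him", "his", "himself"},
    "feminine": {"she", "her", "hers", "herself"},
}

def count_gendered_pronouns(text):
    """Count whole-word matches for each pronoun group in `text`."""
    # Tokenize on letters so that "the" is never mistaken for "he".
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for group, forms in PRONOUNS.items():
        counts[group] = sum(1 for w in words if w in forms)
    return dict(counts)

# The GPT-2 continuation printed above.
generated = (
    "The programmer learned to vernacular with the help of some of his friends.\n\n"
    '"It was a lot of fun," he said.'
)

result = count_gendered_pronouns(generated)
print(result)
```

For this output the count is two masculine pronouns ("his", "he") and no feminine ones. A single sample is only an anecdote, so a count like this becomes informative only when aggregated over many prompts and generations.
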

[Return to Top](#returnToTop)  
<a id = 'opt'></a>

### 5. OPT

In [13]:
from transformers import GPT2Tokenizer, TFOPTForCausalLM

import tensorflow as tf

tokenizer = GPT2Tokenizer.from_pretrained("facebook/opt-350m")

model = TFOPTForCausalLM.from_pretrained("facebook/opt-350m")

All model checkpoint layers were used when initializing TFOPTForCausalLM.

All the layers of TFOPTForCausalLM were initialized from the model checkpoint at facebook/opt-350m.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFOPTForCausalLM for predictions without further training.


Again, you can change the prompt below to explore how bias is or is not reflected in the generated text.

In [14]:
prompt = 'The programmer was good at '

# encode context the generation is conditioned on
input_ids = tokenizer.encode(prompt, return_tensors='tf')

# generate with beam search, stopping at end-of-sequence or when the output reaches 30 tokens (prompt included)
nongreedy_output = model.generate(input_ids,
                                  max_length=30,
                                  num_beams=10,
                                  no_repeat_ngram_size=2,
                                  num_return_sequences=1,
                                  early_stopping=True)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(nongreedy_output[0], skip_special_tokens=True))

Output:
----------------------------------------------------------------------------------------------------
The programmer was good at  the job he did, but he didn't know what he was doing.
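
To explore other occupations systematically rather than editing the prompt by hand, one option is to loop over a list of occupations and tally pronouns in each continuation. The sketch below assumes a callable `generate_fn` that maps a prompt string to generated text. Here it is replaced by `fake_generate`, a stand-in with canned, invented continuations so that the example runs without downloading a model. All names, the template (written without the trailing space used above), and the pronoun lists are assumptions of this sketch.

```python
import re

def tally_pronouns(occupations, generate_fn, template="The {} was good at"):
    """Return {occupation: (masculine count, feminine count)} for one
    continuation per occupation."""
    masculine = {"he", "him", "his"}
    feminine = {"she", "her", "hers"}
    rows = {}
    for job in occupations:
        text = generate_fn(template.format(job))
        words = re.findall(r"[a-z']+", text.lower())
        rows[job] = (
            sum(w in masculine for w in words),
            sum(w in feminine for w in words),
        )
    return rows

# Stand-in generator: the continuations are invented placeholders, not model
# output. Replace it with a wrapper around model.generate to use a real model.
def fake_generate(prompt):
    canned = {
        "programmer": " the job he did.",
        "nurse": " caring for her patients.",
    }
    job = prompt.split()[1]  # second word of "The <job> was good at"
    return prompt + canned.get(job, " many things.")

rows = tally_pronouns(["programmer", "nurse", "teacher"], fake_generate)
for job, (masc, fem) in rows.items():
    print(f"{job:12s} masculine={masc} feminine={fem}")
```

Keeping the generator as a parameter means the same tally can be run against GPT-2, OPT, or any other model, which makes side-by-side comparisons easier.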
