# Bias in BERT

In [1]:
from transformers import pipeline

In [2]:
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
results = fill_mask("The nurse needed a drink because [MASK] was tired after a long day's work at the hospital.")
results

# BERT gives 'she' the highest probability here (about 96%).

[{'score': 0.9641985893249512,
  'token': 2016,
  'token_str': 'she',
  'sequence': "the nurse needed a drink because she was tired after a long day's work at the hospital."},
 {'score': 0.02249247021973133,
  'token': 2002,
  'token_str': 'he',
  'sequence': "the nurse needed a drink because he was tired after a long day's work at the hospital."},
 {'score': 0.0014032530598342419,
  'token': 1045,
  'token_str': 'i',
  'sequence': "the nurse needed a drink because i was tired after a long day's work at the hospital."},
 {'score': 0.0012861485593020916,
  'token': 2009,
  'token_str': 'it',
  'sequence': "the nurse needed a drink because it was tired after a long day's work at the hospital."},
 {'score': 0.0006937936996109784,
  'token': 3071,
  'token_str': 'everyone',
  'sequence': "the nurse needed a drink because everyone was tired after a long day's work at the hospital."}]

In [3]:
# Here I replaced 'nurse' with 'doctor' to see the effect on BERT's prediction
results = fill_mask("The doctor needed a drink because [MASK] was tired after a long day's work at the hospital.")
results

# BERT now fills [MASK] with 'he', with a probability score of about 93%

[{'score': 0.9312541484832764,
  'token': 2002,
  'token_str': 'he',
  'sequence': "the doctor needed a drink because he was tired after a long day's work at the hospital."},
 {'score': 0.04491017013788223,
  'token': 2016,
  'token_str': 'she',
  'sequence': "the doctor needed a drink because she was tired after a long day's work at the hospital."},
 {'score': 0.0022652619518339634,
  'token': 1045,
  'token_str': 'i',
  'sequence': "the doctor needed a drink because i was tired after a long day's work at the hospital."},
 {'score': 0.0021235072053968906,
  'token': 2009,
  'token_str': 'it',
  'sequence': "the doctor needed a drink because it was tired after a long day's work at the hospital."},
 {'score': 0.0010061501525342464,
  'token': 3071,
  'token_str': 'everyone',
  'sequence': "the doctor needed a drink because everyone was tired after a long day's work at the hospital."}]

In [4]:
# Now I change the sentence to see how BERT reacts.
results = fill_mask("We had a meeting with our company receptionist and [MASK] was not happy.")
results

# BERT returned an 88% probability that the company receptionist is female ('she')
# and a 2% probability that the company receptionist is male ('he').

[{'score': 0.8818803429603577,
  'token': 2016,
  'token_str': 'she',
  'sequence': 'we had a meeting with our company receptionist and she was not happy.'},
 {'score': 0.029698221012949944,
  'token': 1045,
  'token_str': 'i',
  'sequence': 'we had a meeting with our company receptionist and i was not happy.'},
 {'score': 0.01622086390852928,
  'token': 2002,
  'token_str': 'he',
  'sequence': 'we had a meeting with our company receptionist and he was not happy.'},
 {'score': 0.008252806030213833,
  'token': 3071,
  'token_str': 'everyone',
  'sequence': 'we had a meeting with our company receptionist and everyone was not happy.'},
 {'score': 0.002857775893062353,
  'token': 2009,
  'token_str': 'it',
  'sequence': 'we had a meeting with our company receptionist and it was not happy.'}]

In [5]:
# Now if I change receptionist to president...
results = fill_mask("We had a meeting with our company president and [MASK] was not happy.")
results

# BERT returns a 93% probability that the company president is male ('he')
# and a 6% probability that the company president is female ('she')

[{'score': 0.9263390898704529,
  'token': 2002,
  'token_str': 'he',
  'sequence': 'we had a meeting with our company president and he was not happy.'},
 {'score': 0.05635721608996391,
  'token': 2016,
  'token_str': 'she',
  'sequence': 'we had a meeting with our company president and she was not happy.'},
 {'score': 0.0031763985753059387,
  'token': 1045,
  'token_str': 'i',
  'sequence': 'we had a meeting with our company president and i was not happy.'},
 {'score': 0.0009640392381697893,
  'token': 2009,
  'token_str': 'it',
  'sequence': 'we had a meeting with our company president and it was not happy.'},
 {'score': 0.0006586564122699201,
  'token': 3071,
  'token_str': 'everyone',
  'sequence': 'we had a meeting with our company president and everyone was not happy.'}]

* At least in the examples I've looked at, lower-skilled and lower-paid jobs are more readily linked to women.

* Higher-paid and higher-skilled jobs are more readily linked to men.

* And certain professions are linked more directly to men than to women.
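
To make the comparison above easier to read, the 'he' and 'she' scores can be pulled out of each result list and printed side by side. This is only a sketch: `pronoun_scores` and `observed` are hypothetical names, and the scores are copied (rounded) from the outputs above rather than recomputed by the model.

```python
def pronoun_scores(results, pronouns=("he", "she")):
    """Return the score of each pronoun from a fill-mask result list.

    A pronoun that is missing from the list gets 0.0, which only means
    "not among the returned predictions", not "zero probability".
    """
    by_token = {r["token_str"]: r["score"] for r in results}
    return {p: by_token.get(p, 0.0) for p in pronouns}


# Scores copied (rounded) from the notebook outputs above.
observed = {
    "nurse": [{"token_str": "she", "score": 0.9642}, {"token_str": "he", "score": 0.0225}],
    "doctor": [{"token_str": "he", "score": 0.9313}, {"token_str": "she", "score": 0.0449}],
    "receptionist": [{"token_str": "she", "score": 0.8819}, {"token_str": "he", "score": 0.0162}],
    "president": [{"token_str": "he", "score": 0.9263}, {"token_str": "she", "score": 0.0564}],
}

for job, results in observed.items():
    scores = pronoun_scores(results)
    leaning = max(scores, key=scores.get)  # pronoun with the higher score
    print(f"{job:>12}: he={scores['he']:.3f}  she={scores['she']:.3f}  -> {leaning}")
```

The `results` lists returned by `fill_mask` in the cells above have the same shape, so they can be passed to `pronoun_scores` directly.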

What's the problem here? Most people now apply for jobs online, and in many cases resumes are filtered by AI systems. Filtering like this can be a downstream task built on a BERT-based model.

So where the model has a strong association between gender and certain professions, that bias can carry over into the downstream task, for example by favoring men for certain types of employment.

Where does this bias come from?

### What was BERT Trained On?
* English Wikipedia -> 2.5 billion words
* BookCorpus -> 800 million words

### What Tasks was BERT Trained On?
* Masked Language Model (MLM)
* Next Sentence Prediction (NSP)

### Masked Language Model (MLM)
* Requires BERT to predict masked-out words
* Example: BERT is conceptually [MASK] and empirically powerful. (BERT should predict "simple".)
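
A minimal sketch of how one MLM training example could be built from raw text. It makes two simplifying assumptions: the sentence is split on whitespace (BERT actually uses WordPiece tokens), and the word to hide is picked by hand (BERT picks about 15% of tokens at random). `mask_word` is a hypothetical helper, not part of any library.

```python
MASK = "[MASK]"


def mask_word(sentence, word):
    """Hide the first occurrence of `word`; the hidden word becomes the label."""
    tokens = sentence.split()    # assumption: whitespace tokens, not WordPiece
    index = tokens.index(word)   # raises ValueError if the word is not present
    masked = tokens.copy()
    masked[index] = MASK
    return " ".join(masked), word


text = "BERT is conceptually simple and empirically powerful."
model_input, label = mask_word(text, "simple")
print(model_input)  # BERT is conceptually [MASK] and empirically powerful.
print(label)        # simple
```

Note that the label comes from the text itself, so nobody has to annotate anything by hand.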

### Next Sentence Prediction (NSP)
* Asks: does the second sentence follow immediately after the first?
* Example: "BERT is conceptually simple and empirically powerful." followed by "It obtains new state-of-the-art results on 11 NLP tasks."
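
A rough sketch of how NSP training pairs could be built from an ordered list of sentences: about half the time the true next sentence is kept, otherwise it is swapped for a different sentence. `make_nsp_pairs` is a hypothetical helper, the last two sentences are made-up filler, and drawing the "wrong" sentence from the same short list is a simplification (BERT samples it from the whole corpus).

```python
import random


def make_nsp_pairs(sentences, rng):
    """Build (sentence_a, sentence_b, is_next) examples; needs at least 3 sentences."""
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            # Positive example: the sentence that really comes next.
            pairs.append((sentences[i], sentences[i + 1], True))
        else:
            # Negative example: any sentence except the current one and its true successor.
            candidates = [s for j, s in enumerate(sentences) if j not in (i, i + 1)]
            pairs.append((sentences[i], rng.choice(candidates), False))
    return pairs


sentences = [
    "BERT is conceptually simple and empirically powerful.",
    "It obtains new state-of-the-art results on 11 NLP tasks.",
    "The model is pre-trained on raw text.",          # made-up filler sentence
    "It can then be fine-tuned for a specific task.",  # made-up filler sentence
]

pairs = make_nsp_pairs(sentences, random.Random(0))  # seeded for repeatability
for first, second, is_next in pairs:
    print(is_next, "|", first, "->", second)
```

As with MLM, the `is_next` label comes for free from the order of the sentences in the text.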

When would I ever need either of these tasks, and why are they useful?

Why MLM and NSP?
* BERT gets a good understanding of the English language.
* Many ML tasks need labeled data, which is hard to come by because someone has to put together a dataset with the associated labels.
* Here we don't need any labeled data. We can train on raw text.