Loading the Model
=================

First we need to choose a model to characterize. Here we use [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased), a masked language model: it is trained to predict the word hidden behind the token `[MASK]` in its input. Because the model fills in the blank with whatever its training text makes likely, we can engineer prompts that let us explore its gender biases. But first we have to load the model.

In [1]:
from transformers import DistilBertTokenizer, DistilBertForMaskedLM

import torch

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")

model = DistilBertForMaskedLM.from_pretrained("distilbert-base-uncased")



We will want to produce a more organized set of functions to systematically examine the biases of our model, but we can easily demonstrate how it will work:

In [2]:
# here is an example of an OK prompt.
inputs = tokenizer("The lumberjack's sex is either 'male' or 'female'. The lumberjack's sex is [MASK].", return_tensors="pt")

with torch.no_grad():
    # The model takes the token IDs and returns one vector of scores per input
    # position, with one score per vocabulary entry (shape: batch x tokens x vocab).
    # These scores are logits: unnormalized log-probabilities. Applying softmax
    # turns them into probabilities; it is the multi-class analogue of the
    # logistic function used in logistic regression.
    logits = model(**inputs).logits

# We're only interested in the model's substitution for the [MASK].
# Here we use the tokenizer associated with the model to find where the Mask token ID is
# in our prompt. 
mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]

# We will pull the answer out from the output of the model by looking
# in the same place in the output.
# We just find the maximum logit for the mask token position in this 
# example and then look at that token.
predicted_token_id = logits[0, mask_token_index].argmax(axis=-1)
# The tokenizer can transform our token id back into a string
tokenizer.decode(predicted_token_id)





'male'

However, this doesn't let us appreciate the statistical character of the model's biases. We will always get the same answer with the method above, because the model is deterministic and we always pass in the same phrase. There are several ways to address this. We could, for example, craft multiple similar prompts and average (or otherwise aggregate) the results. Instead, let's convert the model's logits into probabilities with the softmax function, which exponentiates the logits and then divides each one by their sum so that they add up to 1.

We can then use NumPy's multinomial sampler, `np.random.multinomial`, to draw samples from that distribution.
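To make these two steps concrete, here is a minimal sketch on a made-up five-token vocabulary. The tokens and logit values are invented for illustration and do not come from the model. It also shows one way to avoid a floating-point pitfall: float32 probabilities can sum to slightly more than 1, which `multinomial` rejects, so the sketch casts to float64 and renormalizes first. It uses NumPy's `default_rng` generator (rather than the legacy `np.random.multinomial`) only so the draw is reproducible.

```python
import numpy as np

# Hypothetical toy vocabulary and logits, standing in for one row of model(...).logits.
vocab = ["male", "female", "unknown", "bisexual", "opposite"]
logits = np.array([3.0, 2.5, 1.0, 0.2, -0.5], dtype=np.float32)

def softmax(x):
    # Subtract the max before exponentiating for numerical stability;
    # softmax is shift-invariant, so the result is unchanged.
    z = np.exp(x - x.max())
    return z / z.sum()

probs = softmax(logits).astype(np.float64)
probs /= probs.sum()  # renormalize in float64 so the sum is 1 up to double rounding

rng = np.random.default_rng(0)
counts = rng.multinomial(10, probs)  # one count per vocabulary entry, summing to 10
samples = [tok for tok, c in zip(vocab, counts) for _ in range(c)]
print(counts, samples)
```

The highest logit always gets the highest probability, but every token keeps some mass, which is why repeated draws produce a mix of answers rather than the single argmax.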

In [3]:
from torch.nn import functional as F
from torch import tensor
import numpy as np

# Softmax over the vocabulary at the mask position gives one probability per token.
probabilities = F.softmax(logits[0, mask_token_index], dim=-1)[0]

# np.random.multinomial raises a ValueError when the probabilities other than the
# last sum to more than 1, which float32 rounding can cause. Casting to float64 and
# renormalizing avoids this; scaling everything by a constant such as 0.9999 would
# instead hand the leftover mass to the last token in the vocabulary.
probabilities = probabilities.double().numpy()
probabilities /= probabilities.sum()

# The multinomial function returns an array of counts for each possible outcome,
# which we translate into a list of tokens.
token_counts = np.random.multinomial(10, probabilities)
output = []
for index in np.where(token_counts)[0]:
    decoded = tokenizer.decode(tensor([index]))
    output.extend([decoded] * token_counts[index])
output

['female',
 'female',
 'male',
 'male',
 'male',
 'unknown',
 'opposite',
 'obtained',
 'vocalist',
 'bisexual']

Note that the results are not all "male" or "female". This makes sense: there are many ways to fill in the mask that are more or less consistent with the prompt. One way to handle this is to group masculine and feminine words together, lump everything else into "other", and report the probabilities of masculine, feminine and other as our final result.

In [4]:
def translate_word(word):
    # Map a predicted token to 'male', 'female' or 'other'.
    if word in ['male', 'masculine', 'man', 'boy']:
        return 'male'
    if word in ['female', 'feminine', 'woman', 'girl']:
        return 'female'
    return 'other'

def translate_words(words):
    return [translate_word(w) for w in words]

translate_words(output)

['female',
 'female',
 'male',
 'male',
 'male',
 'other',
 'other',
 'other',
 'other',
 'other']
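Sampling is not the only option. Because the softmax already assigns a probability to every token, one could instead sum the probability mass that falls into each category, which removes sampling noise entirely. Below is a minimal sketch on a hypothetical toy distribution (tokens and probabilities are invented). `category_probabilities` is a hypothetical helper, and `translate_word` is re-declared with the same word lists as above so the snippet runs on its own.

```python
from collections import defaultdict

MALE_WORDS = {"male", "masculine", "man", "boy"}
FEMALE_WORDS = {"female", "feminine", "woman", "girl"}

def translate_word(word):
    # Same grouping as the notebook's translate_word.
    if word in MALE_WORDS:
        return "male"
    if word in FEMALE_WORDS:
        return "female"
    return "other"

def category_probabilities(token_probs):
    # token_probs maps a decoded token to its probability (assumed to sum to 1).
    # Sum the probability mass of every token that falls into each category.
    totals = defaultdict(float)
    for token, p in token_probs.items():
        totals[translate_word(token)] += p
    return dict(totals)

# Hypothetical distribution over a tiny vocabulary.
toy = {"male": 0.40, "man": 0.05, "female": 0.30, "woman": 0.05, "unknown": 0.20}
print(category_probabilities(toy))
```

With the real model, the dictionary could be built by pairing the softmax vector with the vocabulary (for example via `tokenizer.convert_ids_to_tokens`); that step is an assumption here and is not shown.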

With that function written we can begin our experiment in earnest with a little more work:

In [8]:
import pandas as pd

prompt_template = "The {}'s sex is either 'male' or 'female'. The {}'s sex is [MASK]."

def sample_gender(profession, n_samples=50):
    # Fill the template with the profession and run the model on it.
    prompt = tokenizer(prompt_template.format(profession, profession), return_tensors="pt")
    mask_token_index = (prompt.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**prompt).logits
    # Probabilities at the mask position, renormalized in float64 so that
    # np.random.multinomial accepts them.
    probabilities = F.softmax(logits[0, mask_token_index], dim=-1)[0].double().numpy()
    probabilities /= probabilities.sum()
    token_counts = np.random.multinomial(n_samples, probabilities)
    # Expand the counts into one row per sample, mapped to male/female/other.
    category = []
    for index in np.where(token_counts)[0]:
        decoded = tokenizer.decode(tensor([index]))
        category.extend([translate_word(decoded)] * token_counts[index])
    return pd.DataFrame({'profession': profession, 'category': category})

def sample_genders(professions):
    return pd.concat([sample_gender(profession) for profession in professions], axis=0)

In [9]:
from pandasql import sqldf
q = lambda s: sqldf(s, globals());
results = sample_genders('plumber,cook,chef,nurse,teacher,firefighter,maid,secretary,assistant,flight attendant'.split(','))
results

|     | profession | category |
|-----|------------|----------|
| 0 | plumber | other |
| 1 | plumber | female |
| 2 | plumber | female |
| 3 | plumber | female |
| 4 | plumber | female |
| ... | ... | ... |
| 45 | flight attendant | other |
| 46 | flight attendant | other |
| 47 | flight attendant | other |
| 48 | flight attendant | other |


In [12]:
q("""select profession,
 1.0*sum(category='other')/count(*) as p_other,
 1.0*sum(category='male')/count(*) as p_male,
 1.0*sum(category='female')/count(*) as p_female
 from results group by profession""")

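|     | profession | p_other | p_male | p_female |
|-----|------------|---------|--------|----------|
| 0 | assistant | 0.3 | 0.3 | 0.4 |
| 1 | chef | 0.26 | 0.32 | 0.42 |
| 2 | cook | 0.38 | 0.3 | 0.32 |
| 3 | firefighter | 0.16 | 0.36 | 0.48 |
| 4 | flight attendant | 0.22 | 0.4 | 0.38 |
| 5 | maid | 0.26 | 0.46 | 0.28 |
| 6 | nurse | 0.34 | 0.32 | 0.34 |
| 7 | plumber | 0.36 | 0.22 | 0.42 |
| 8 | secretary | 0.46 | 0.34 | 0.2 |
| 9 | teacher | 0.38 | 0.28 | 0.34 |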
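The same kind of table can be produced with pandas alone, without pandasql. Here is a minimal sketch using `pd.crosstab` with `normalize="index"`, which divides each row by its own total. The small `toy_results` frame is made up so the snippet runs standalone; with the notebook's data it would be replaced by `results`.

```python
import pandas as pd

# Hypothetical stand-in for the `results` DataFrame produced by sample_genders.
toy_results = pd.DataFrame({
    "profession": ["maid"] * 4 + ["plumber"] * 4,
    "category": ["male", "male", "female", "other",
                 "female", "female", "male", "other"],
})

# One row per profession; each cell is the fraction of that profession's samples
# falling into the category, so every row sums to 1.
table = pd.crosstab(toy_results["profession"], toy_results["category"], normalize="index")
print(table)
```

Because the normalization uses each group's actual row count, it keeps working if `n_samples` changes, whereas a hard-coded divisor would have to be edited by hand.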


These results are interesting and often run counter to our intuitions. For example, the probability of a male result for "maid" is considerably higher than that of a female result (0.46 versus 0.28 in the run shown), while "plumber" and "firefighter" lean female. Why might this be? Recall that these models are trained on real text. For professions whose gender is culturally assumed, there is little reason to write a sentence specifying the gender _unless_ the writer wants to emphasize that it defies expectations. We might want to engineer our prompt to account for this.
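Before reading too much into any single row, it helps to remember that each proportion in the table comes from only 50 samples. Below is a rough sketch of how noisy that is, using the binomial standard error sqrt(p(1 - p)/n) and a normal-approximation 95% interval (crude at this sample size). It also computes the standard error of the male/female gap for "maid". Because both proportions come from the same multinomial draw, their counts are negatively correlated, and the gap's variance is (p1 + p2 - (p1 - p2)^2)/n. The helper names are hypothetical.

```python
import math

def binomial_se(p, n):
    # Standard error of a proportion estimated from n independent samples.
    return math.sqrt(p * (1 - p) / n)

def diff_se(p1, p2, n):
    # Standard error of p1_hat - p2_hat when both proportions come from the same
    # n multinomial samples: Var = (p1 + p2 - (p1 - p2)**2) / n.
    return math.sqrt((p1 + p2 - (p1 - p2) ** 2) / n)

n = 50
p_male, p_female = 0.46, 0.28  # the "maid" row from the table above
for label, p in [("p_male", p_male), ("p_female", p_female)]:
    # Normal-approximation 95% interval; only a rough guide at n = 50.
    print(f"maid {label}: {p:.2f} +/- {1.96 * binomial_se(p, n):.2f}")

z = (p_male - p_female) / diff_se(p_male, p_female, n)
print(f"gap in standard errors: {z:.2f}")
```

The gap comes out at roughly 1.5 standard errors, so with 50 samples per profession it is suggestive rather than conclusive. Raising `n_samples`, as the later cells do, shrinks these errors in proportion to 1/sqrt(n).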

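I'll let you take it from here, but you might want to look at a prompt like "The nurse was sitting down but wanted to stand up. [MASK] stood up." and filter for outputs like "he", "she" and "they" (the model is uncased, so its tokens are lowercase). This would reflect the implicit bias in the gender/profession relationship and, hopefully, not be subject to the effect we see above. The cells below try a related prompt, "The {profession} nodded [MASK] head.", and tally the possessive pronouns "his", "her" and "their".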
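Filtering for a handful of target words amounts to conditioning: take the probability of each target at the mask position and divide by their total. Below is a minimal sketch with a hypothetical helper and invented probabilities (real values would come from the softmax at the `[MASK]` position). The normalization is the same idea as dividing pronoun counts by the total number of pronoun samples. The helper also reports how much mass fell outside the target set, since a large leftover means the prompt is not steering the model toward pronouns.

```python
def conditional_on_targets(token_probs, targets):
    # Keep only the target tokens and rescale them so they sum to 1.
    # Also return the probability mass that fell outside the target set.
    mass = {t: token_probs.get(t, 0.0) for t in targets}
    total = sum(mass.values())
    if total == 0:
        raise ValueError("none of the target tokens received any probability")
    return {t: m / total for t, m in mass.items()}, 1.0 - total

# Invented probabilities, for illustration only.
toy = {"her": 0.60, "his": 0.30, "their": 0.02, "its": 0.05, "a": 0.03}
cond, leftover = conditional_on_targets(toy, ["his", "her", "their"])
print(cond, leftover)
```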

In [40]:
import pandas as pd

def translate_pronoun(pn):
    # Keep the possessive pronouns we care about; lump everything else together.
    if pn in ['his', 'her', 'their']:
        return pn
    return 'other'


def sample_gender_by_pronoun(profession, n_samples=50):
    prompt_template = "The {} nodded [MASK] head."
    prompt = tokenizer(prompt_template.format(profession), return_tensors="pt")
    mask_token_index = (prompt.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**prompt).logits
    # Renormalize in float64 so np.random.multinomial accepts the probabilities.
    probabilities = F.softmax(logits[0, mask_token_index], dim=-1)[0].double().numpy()
    probabilities /= probabilities.sum()
    token_counts = np.random.multinomial(n_samples, probabilities)
    category = []
    actual = []
    for index in np.where(token_counts)[0]:
        decoded = tokenizer.decode(tensor([index]))
        actual.extend([decoded] * token_counts[index])
        category.extend([translate_pronoun(decoded)] * token_counts[index])
    return pd.DataFrame({'profession': profession, 'category': category, 'actual': actual})

def sample_genders_by_pronoun(professions, n_samples=50):
    return pd.concat([sample_gender_by_pronoun(profession, n_samples) for profession in professions], axis=0)

In [46]:
results = sample_genders_by_pronoun('actress,actor,plumber,cook,chef,nurse,teacher,firefighter,maid,secretary,assistant,flight attendant,stewardess,designer'.split(','),n_samples=5000)
results

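|     | profession | category | actual |
|-----|------------|----------|--------|
| 0 | actress | other | , |
| 1 | actress | other | . |
| 2 | actress | other | a |
| 3 | actress | other | a |
| 4 | actress | other | a |
| ... | ... | ... | ... |
| 4995 | designer | other | toward |
| 4996 | designer | other | changed |
| 4997 | designer | other | true |
| 4998 | designer | other | lecturer |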


In [47]:
q("select actual, sum(1) as count from results group by actual order by count desc limit 20")

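|     | actual | count |
|-----|--------|-------|
| 0 | his | 41263 |
| 1 | her | 26861 |
| 2 | its | 1404 |
| 3 | their | 150 |
| 4 | a | 98 |
| 5 | the | 38 |
| 6 | my | 25 |
| 7 | in | 19 |
| 8 | to | 17 |
| 9 | your | 12 |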


In [48]:
q("""select profession,
  1.0*sum(category='his')/sum(category in ('his','her','their')) as p_his,
  1.0*sum(category='her')/sum(category in ('his','her','their')) as p_her,
  1.0*sum(category='their')/sum(category in ('his','her','their')) as p_their,
  1.0*sum(category='other')/sum(1) as p_raw_other
  from results 
  group by profession
  """)

|     | profession | p_his | p_her | p_their | p_raw_other |
|-----|------------|-------|-------|---------|-------------|
| 0 | actor | 0.991729 | 0.007666 | 0.000605 | 0.0086 |
| 1 | actress | 0.009664 | 0.990336 | 0.0 | 0.0066 |
| 2 | assistant | 0.748867 | 0.247425 | 0.003708 | 0.0292 |
| 3 | chef | 0.968466 | 0.030524 | 0.001011 | 0.0106 |
| 4 | cook | 0.769423 | 0.225995 | 0.004582 | 0.0398 |
| 5 | designer | 0.883141 | 0.114136 | 0.002723 | 0.045 |
| 6 | firefighter | 0.9438 | 0.054368 | 0.001833 | 0.0178 |
| 7 | flight attendant | 0.732397 | 0.264143 | 0.00346 | 0.0172 |
| 8 | maid | 0.005241 | 0.99355 | 0.001209 | 0.0078 |
| 9 | nurse | 0.0161 | 0.983699 | 0.000201 | 0.0062 |
