# WEAT Tutorial: Measuring Gender Bias Step-by-Step

**Goal:** Measure whether a language model associates male and female words differently with career and family words.

**The Big Idea:**  
If a model is unbiased, "man" and "woman" should be equally close to "career" words.  
If there's bias, one gender will be systematically closer to certain concepts.

---

## Step 1: Install and Import

We only need a few simple tools.

In [2]:
# Install if needed (%pip installs into the environment of the running kernel)
%pip install transformers torch numpy


In [3]:
import torch
import numpy as np
from transformers import AutoTokenizer, AutoModel

# Use CPU or GPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using: {device}")


Using: cpu


## Step 2: Load a Simple Model

We'll use BERT (`bert-base-uncased`), a popular pretrained language model.

In [4]:
# Load BERT
model_name = 'bert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).to(device)
model.eval()  # Put in evaluation mode

print("Model loaded!")

Model loaded!


## Step 3: Define Our Test Words

**A deliberately tiny example** (published WEAT tests use longer word lists):
- **Target Set 1 (Male):** man, he
- **Target Set 2 (Female):** woman, she  
- **Attribute Set 1 (Career):** work, salary
- **Attribute Set 2 (Family):** home, family

In [5]:
male_words = ['man', 'he']
female_words = ['woman', 'she']
career_words = ['work', 'salary']
family_words = ['home', 'family']
print("Test words defined!")

Test words defined!


## Step 4: Get Word Embeddings

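An **embedding** is a vector (a list of numbers) that represents a word. BERT's embeddings are contextual, so feeding it a single word gives the vector for that word in isolation.

Let's get the embedding for one word first to see what it looks like: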

In [6]:
# Example: Get embedding for "man"
word = "man"

# Step 1: Convert word to numbers that BERT understands
inputs = tokenizer(word, return_tensors='pt').to(device)
print(f"Input IDs: {inputs['input_ids']}")

# Step 2: Pass through BERT
with torch.no_grad():  # Don't calculate gradients (we're not training)
    outputs = model(**inputs)

# Step 3: Pull the embedding out of the model output
# last_hidden_state has shape [1, sequence_length, 768]
# Position 0 is [CLS] and the last position is [SEP]; position 1 is the word itself
embedding = outputs.last_hidden_state[0, 1, :]

print(f"\nEmbedding shape: {embedding.shape}")
print(f"First 10 numbers: {embedding[:10].cpu().numpy()}")
print(f"\nThis is a vector of {len(embedding)} numbers representing 'man'")

Input IDs: tensor([[ 101, 2158,  102]])

Embedding shape: torch.Size([768])
First 10 numbers: [-0.30896628 -0.10018251 -0.25786588 -0.7900988   0.21146199  0.36500245
  0.7095498  -0.2389967  -0.23208368 -1.283527  ]

This is a vector of 768 numbers representing 'man'


Now let's make a simple function to get embeddings for any word:

In [7]:
def get_embedding(word):
    """Get the embedding vector for a word."""
    inputs = tokenizer(word, return_tensors='pt').to(device)
    with torch.no_grad():
        outputs = model(**inputs)
    # Take position 1 (the first wordpiece after [CLS]) and convert to numpy
    embedding = outputs.last_hidden_state[0, 1, :].cpu().numpy()
    return embedding

# Test it
man_emb = get_embedding('man')
woman_emb = get_embedding('woman')

print(f"man embedding: {man_emb[:5]}...")
print(f"woman embedding: {woman_emb[:5]}...")

man embedding: [-0.30896628 -0.10018251 -0.25786588 -0.7900988   0.21146199]...
woman embedding: [-0.6725328  -0.47752842 -0.02868308 -0.71078324  0.1911808 ]...
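
`get_embedding` reads position 1 only. That covers the whole word when the tokenizer keeps it as a single piece, as it does for "man" (IDs 101, 2158, 102 above). If a word were split into several wordpieces, position 1 would hold only the first piece. One common alternative, shown here as an assumption rather than as this notebook's method, is to average every position between [CLS] and [SEP]. The sketch below uses plain Python lists and made-up numbers so it runs without the model; `mean_pool_wordpieces` is a hypothetical helper name.

```python
def mean_pool_wordpieces(token_vectors):
    """Average the vectors between the first ([CLS]) and last ([SEP]) positions.

    token_vectors: list of equal-length lists, one per token position.
    """
    pieces = token_vectors[1:-1]  # drop [CLS] and [SEP]
    if not pieces:
        raise ValueError("expected at least one wordpiece between [CLS] and [SEP]")
    dim = len(pieces[0])
    return [sum(vec[i] for vec in pieces) / len(pieces) for i in range(dim)]

# Made-up 3-dimensional "hidden states" for a word split into two wordpieces
toy_hidden = [
    [9.0, 9.0, 9.0],  # [CLS]
    [1.0, 2.0, 3.0],  # first wordpiece
    [3.0, 4.0, 5.0],  # second wordpiece
    [7.0, 7.0, 7.0],  # [SEP]
]
pooled = mean_pool_wordpieces(toy_hidden)
print(pooled)
```

Inside the notebook, the same idea could be written as `outputs.last_hidden_state[0, 1:-1, :].mean(dim=0)`. For a single-piece word this gives the same vector as position 1.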


## Step 5: Measure Similarity Between Words

**Cosine similarity** measures the angle between two embeddings, ignoring their lengths:  
- 1.0 = same direction (most similar)  
- 0.0 = perpendicular (unrelated)  
- -1.0 = opposite directions

Formula: similarity = (A · B) / (‖A‖ × ‖B‖)

### Worked Example (Do This by Hand ✍️)
Grab a pen and paper and work through each step yourself. This is basic vector math, and you should be able to calculate it by hand.

$$
A =
\begin{bmatrix}
1 \\
2
\end{bmatrix},
\quad
B =
\begin{bmatrix}
2 \\
4
\end{bmatrix}
$$

**Dot product**

$$
A \cdot B
=
(1 \times 2) + (2 \times 4)
=
10
$$

**Magnitudes**

$$
\lVert A \rVert
=
\sqrt{1^2 + 2^2}
=
\sqrt{5}
$$

$$
\lVert B \rVert
=
\sqrt{2^2 + 4^2}
=
\sqrt{20}
$$

**Cosine similarity**

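$$
\text{CosineSimilarity}(A, B)
=
\frac{10}{\sqrt{5} \cdot \sqrt{20}}
=
\frac{10}{\sqrt{100}}
=
1.0
$$

The result is 1.0 because B is just 2 × A, so both vectors point in the same direction.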
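
To check the hand calculation, here is a minimal sketch that uses only the standard library. The helper name `cosine_by_hand` and the extra vector `C` are illustrative additions, not part of the notebook.

```python
import math

def cosine_by_hand(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

A = [1, 2]
B = [2, 4]   # the vectors from the worked example
C = [-2, 1]  # perpendicular to A: 1*(-2) + 2*1 = 0

sim_ab = cosine_by_hand(A, B)
sim_ac = cosine_by_hand(A, C)
print(f"cos(A, B) = {sim_ab:.4f}")
print(f"cos(A, C) = {sim_ac:.4f}")
```

The first value should match the 1.0 computed by hand (up to floating-point rounding), and the perpendicular pair should come out as 0.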

---


In [9]:
def cosine_similarity(vec1, vec2):
    """Calculate how similar two vectors are."""
    dot_product = np.dot(vec1, vec2)
    magnitude1 = np.linalg.norm(vec1)
    magnitude2 = np.linalg.norm(vec2)
    return dot_product / (magnitude1 * magnitude2)

# Test: How similar is "work" to "man" and to "woman"?
man_emb = get_embedding('man')
woman_emb = get_embedding('woman')
work_emb = get_embedding('work')


sim_man_work = cosine_similarity(man_emb, work_emb)
sim_woman_work = cosine_similarity(woman_emb, work_emb)


print(f"Similarity between 'man' and 'work': {sim_man_work:.2f}")
print(f"Similarity between 'woman' and 'work': {sim_woman_work:.2f}")

Similarity between 'man' and 'work': 0.70
Similarity between 'woman' and 'work': 0.65


## Step 6: Calculate Association for ONE Word

**Association** = How much closer is a word to career vs. family?

For example, for "man":  
1. Calculate average similarity to career words (work, salary)  
2. Calculate average similarity to family words (home, family)  
3. Association = career_similarity - family_similarity

**Positive association** = closer to career  
**Negative association** = closer to family
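
Before running this on BERT, it can help to see the three steps on vectors small enough to reason about. The sketch below uses made-up 2-D vectors in which "career" points roughly along the x-axis and "family" roughly along the y-axis. All names and numbers are illustrative assumptions, not model outputs.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def association(target, set1, set2):
    """Mean cosine similarity to set1 minus mean cosine similarity to set2."""
    mean1 = sum(cosine(target, v) for v in set1) / len(set1)
    mean2 = sum(cosine(target, v) for v in set2) / len(set2)
    return mean1 - mean2

# Made-up 2-D vectors: "career" lies near the x-axis, "family" near the y-axis
career = [[1.0, 0.0], [0.9, 0.1]]
family = [[0.0, 1.0], [0.1, 0.9]]

leans_career = association([0.8, 0.2], career, family)
leans_family = association([0.2, 0.8], career, family)
balanced = association([1.0, 1.0], career, family)
print(f"{leans_career:+.3f} {leans_family:+.3f} {balanced:+.3f}")
```

A target near the x-axis gets a positive association, its mirror image gets the same value with the opposite sign, and a target on the diagonal lands at zero.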

In [11]:
# Let's calculate association for "man"
target_word = 'man'

# Get embedding for our target word
target_emb = get_embedding(target_word)
print(f"Analyzing: {target_word}\n")

# Step 1: Get embeddings for career words
career_embeddings = [get_embedding(w) for w in career_words]
print(f"Career words: {career_words}")

# Step 2: Calculate similarity to each career word
career_sims = [cosine_similarity(target_emb, career_emb) for career_emb in career_embeddings]
print(f"Similarities to career words: {[f'{s:.2f}' for s in career_sims]}")

# Step 3: Average similarity to career
avg_career_sim = np.mean(career_sims)
print(f"Average similarity to career: {avg_career_sim:.2f}\n")

# Step 4: Same for family words
family_embeddings = [get_embedding(w) for w in family_words]
print(f"Family words: {family_words}")
family_sims = [cosine_similarity(target_emb, family_emb) for family_emb in family_embeddings]
print(f"Similarities to family words: {[f'{s:.2f}' for s in family_sims]}")
avg_family_sim = np.mean(family_sims)
print(f"Average similarity to family: {avg_family_sim:.2f}\n")

# Step 5: Calculate association
association = avg_career_sim - avg_family_sim
print("=" * 50)
print(f"Association for '{target_word}': {association:.2f}")
if association > 0:
    print(f"→ '{target_word}' is closer to CAREER words")
else:
    print(f"→ '{target_word}' is closer to FAMILY words")

Analyzing: man

Career words: ['work', 'salary']
Similarities to career words: ['0.70', '0.57']
Average similarity to career: 0.64

Family words: ['home', 'family']
Similarities to family words: ['0.44', '0.70']
Average similarity to family: 0.57

Association for 'man': 0.06
→ 'man' is closer to CAREER words


Now let's make a function to do this for any word:

In [12]:
def calculate_association(target_word, attribute_set1, attribute_set2):
    """Calculate how associated a word is with set1 vs set2."""
    
    # Get target embedding
    target_emb = get_embedding(target_word)
    
    # Get embeddings for both attribute sets
    set1_embeddings = [get_embedding(w) for w in attribute_set1]
    set2_embeddings = [get_embedding(w) for w in attribute_set2]
    
    # Calculate average similarity to each set
    avg_sim_set1 = np.mean([cosine_similarity(target_emb, emb) for emb in set1_embeddings])
    avg_sim_set2 = np.mean([cosine_similarity(target_emb, emb) for emb in set2_embeddings])
    
    # Association = difference
    association = avg_sim_set1 - avg_sim_set2
    
    return association

# Test for all our gender words
print("Associations (Career - Family):\n")
for word in male_words + female_words:
    assoc = calculate_association(word, career_words, family_words)
    print(f"{word:10s}: {assoc:+.3f}")

Associations (Career - Family):

man       : +0.064
he        : +0.013
woman     : +0.054
she       : -0.004


## Step 7: The Full WEAT Score

Now we calculate WEAT for **groups** of words:

**WEAT Question:** Do male words associate more with career than female words do?

**Steps:**
1. Calculate association for each male word
2. Calculate association for each female word  
3. Compare the averages
4. Standardize by dividing by the standard deviation of the associations across all male and female words (this gives the effect size)
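
The four steps can be condensed into one small function. This is a minimal sketch using only the standard library; `weat_effect_size` is a hypothetical name, and the inputs are the per-word associations printed at the end of Step 6, rounded to three decimals, so the result will differ slightly from one computed on unrounded values.

```python
import statistics

def weat_effect_size(assoc_x, assoc_y):
    """(mean of X associations - mean of Y associations) / sample std over X and Y."""
    difference = statistics.mean(assoc_x) - statistics.mean(assoc_y)
    # Sample standard deviation over both groups, like np.std(..., ddof=1)
    pooled_std = statistics.stdev(assoc_x + assoc_y)
    return difference / pooled_std

# Per-word associations (career - family), rounded to three decimals
male_assoc = [0.064, 0.013]
female_assoc = [0.054, -0.004]

effect = weat_effect_size(male_assoc, female_assoc)
print(f"Effect size from rounded inputs: {effect:.2f}")
```

Swapping the two arguments flips the sign, which is a quick way to confirm which direction a positive score refers to.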

In [13]:
# Step 1: Get associations for all male words
print("Male word associations (Career - Family):")
male_associations = []
for word in male_words:
    assoc = calculate_association(word, career_words, family_words)
    male_associations.append(assoc)
    print(f"  {word}: {assoc:+.3f}")

print(f"\nAverage for male words: {np.mean(male_associations):.3f}\n")

# Step 2: Get associations for all female words
print("Female word associations (Career - Family):")
female_associations = []
for word in female_words:
    assoc = calculate_association(word, career_words, family_words)
    female_associations.append(assoc)
    print(f"  {word}: {assoc:+.3f}")

print(f"\nAverage for female words: {np.mean(female_associations):.3f}")

Male word associations (Career - Family):
  man: +0.064
  he: +0.013

Average for male words: 0.038

Female word associations (Career - Family):
  woman: +0.054
  she: -0.004

Average for female words: 0.025


In [14]:
# Step 3: Calculate the difference
mean_male = np.mean(male_associations)
mean_female = np.mean(female_associations)
difference = mean_male - mean_female

print(f"\nMean association for male words: {mean_male:+.3f}")
print(f"Mean association for female words: {mean_female:+.3f}")
print(f"Difference (male - female): {difference:+.3f}")

# Step 4: Calculate effect size (standardized difference)
all_associations = male_associations + female_associations
std_dev = np.std(all_associations, ddof=1)  # ddof=1 for sample std

effect_size = difference / std_dev

print(f"\nStandard deviation: {std_dev:.3f}")
print("=" * 60)
print(f"WEAT EFFECT SIZE: {effect_size:.3f}")
print("=" * 60)


Mean association for male words: +0.038
Mean association for female words: +0.025
Difference (male - female): +0.013

Standard deviation: 0.033
WEAT EFFECT SIZE: 0.405


## Step 8: Interpret the Result

**What does the effect size mean?**

- **Positive effect size:** Male words are MORE associated with career than female words
- **Negative effect size:** Female words are MORE associated with career than male words  
- **Close to 0:** No measurable difference on this test (which is not the same as proving the model has no bias)

**Rule of thumb** (Cohen's d):
- 0.2 = small effect
- 0.5 = medium effect  
- 0.8 = large effect
- >1.0 = very large effect

In [15]:
print("\n" + "="*60)
print("INTERPRETATION")
print("="*60)

if abs(effect_size) < 0.2:
    magnitude = "VERY SMALL or NO"
elif abs(effect_size) < 0.5:
    magnitude = "SMALL"
elif abs(effect_size) < 0.8:
    magnitude = "MEDIUM"
else:
    magnitude = "LARGE"

if effect_size > 0:
    direction = "Male words are more associated with CAREER"
else:
    direction = "Female words are more associated with CAREER"

print(f"\nEffect size: {effect_size:.3f}")
print(f"Magnitude: {magnitude}")
print(f"Direction: {direction}")
print(f"\nThis suggests the model has {magnitude} gender bias in")
print("associating gender with career vs. family concepts.")
print("="*60)


INTERPRETATION

Effect size: 0.405
Magnitude: SMALL
Direction: Male words are more associated with CAREER

This suggests the model has SMALL gender bias in
associating gender with career vs. family concepts.
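
An effect size built from two words per group should be read with caution. WEAT results are usually reported together with a permutation test, which asks how often a random regrouping of the same words would produce a difference at least as large. The sketch below is one way to do that on the rounded per-word associations from Step 6. The function name is hypothetical, and counting ties as "at least as large" is a convention chosen here.

```python
from itertools import combinations

def permutation_p_value(assoc_x, assoc_y):
    """One-sided exact permutation test on per-word associations.

    Returns the fraction of equal-size splits of the pooled words whose
    statistic (sum of X associations minus sum of Y associations) is at
    least as large as the observed one.
    """
    pooled = assoc_x + assoc_y
    n_x = len(assoc_x)

    def statistic(x_indices):
        x_sum = sum(pooled[i] for i in x_indices)
        y_sum = sum(pooled[i] for i in range(len(pooled)) if i not in x_indices)
        return x_sum - y_sum

    observed = statistic(tuple(range(n_x)))
    splits = list(combinations(range(len(pooled)), n_x))
    # Small tolerance so the observed split counts as a tie with itself
    at_least_as_large = sum(statistic(s) >= observed - 1e-12 for s in splits)
    return at_least_as_large / len(splits)

# Per-word associations (career - family), rounded to three decimals
male_assoc = [0.064, 0.013]
female_assoc = [0.054, -0.004]

p_value = permutation_p_value(male_assoc, female_assoc)
print(f"p-value: {p_value:.3f}")
```

With four words there are only six ways to split them into two pairs, so the p-value can never fall below 1/6. Longer word lists are needed before a result like this can be called statistically reliable.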
