# Pseudo-Log-Likelihood (PLL) Tutorial

**Goal:** Determine whether a language model finds stereotypical sentences more "natural" than counter-stereotypical ones.

**The Big Idea:**  
If a model is unbiased, it should find these equally likely:
- "She is a nurse" vs "He is a nurse"
- "He is an engineer" vs "She is an engineer"

If there's bias, the model assigns higher probability (finds more "natural") to stereotypical sentences.

**Connection to Masked Token Prediction:**  
Remember masked token prediction? We masked ONE word and got its probability.  
PLL does this for EVERY token in the sentence, then combines the log probabilities to measure overall sentence likelihood.

---

## Step 1: Setup

Same as before - we'll use BERT for masked language modeling.

In [14]:
# Install if needed (%pip installs into the environment of the running kernel)
%pip install transformers torch numpy matplotlib


In [15]:
import torch
import numpy as np
import math
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Use CPU or GPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using: {device}")

Using: cpu


In [16]:
# Load BERT
model_name = 'bert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name).to(device)
model.eval()

print("Model loaded!")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Model loaded!


## Step 2: Understanding PLL - The Core Idea

**Question:** How "likely" is the sentence "She is a nurse"?

**PLL Method:**
1. Mask "She" → Ask: How likely is "She" given the rest?
2. Mask "is" → Ask: How likely is "is" given the rest?
3. Mask "a" → Ask: How likely is "a" given the rest?
4. Mask "nurse" → Ask: How likely is "nurse" given the rest?
5. Average all these log probabilities

**Higher PLL = More "natural" sentence to the model**
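
Written as a formula, the steps above amount to the following. This is a sketch in notation that is not from the original: $W = (w_1, \dots, w_n)$ is the tokenized sentence without `[CLS]` and `[SEP]`, and $W_{\setminus t}$ is $W$ with token $t$ replaced by `[MASK]`.

$$
\mathrm{PLL}_{\text{avg}}(W) = \frac{1}{n} \sum_{t=1}^{n} \log P\left(w_t \mid W_{\setminus t}\right)
$$

For "She is a nurse", $n = 4$. PLL is often defined as the plain sum, without the $1/n$ factor. This tutorial averages, which keeps sentences of different lengths comparable.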

Let's see this step by step!

## Step 3: Calculate Probability for ONE Masked Position

First, let's understand how to get the probability for one word.

In [17]:
# Original sentence
sentence = "She is a nurse"

print(f"Original sentence: {sentence}")
print("\nLet's calculate the probability of each word given the others...\n")

# First, tokenize to see what we're working with
tokens = tokenizer.tokenize(sentence)
print(f"Tokens: {tokens}")

# Get token IDs (with special tokens [CLS] and [SEP])
token_ids = tokenizer.encode(sentence, add_special_tokens=True)
print(f"Token IDs: {token_ids}")
print(f"Decoded: {[tokenizer.decode([tid]) for tid in token_ids]}")

Original sentence: She is a nurse

Let's calculate the probability of each word given the others...

Tokens: ['she', 'is', 'a', 'nurse']
Token IDs: [101, 2016, 2003, 1037, 6821, 102]
Decoded: ['[CLS]', 'she', 'is', 'a', 'nurse', '[SEP]']


In [18]:
# Example: Mask the word "nurse" and see its probability
print("Example: Masking 'nurse'\n")

# Position of "nurse" is index 4 (after [CLS], she, is, a)
nurse_position = 4

# Create masked version
masked_ids = token_ids.copy()
masked_ids[nurse_position] = tokenizer.mask_token_id

print(f"Original: {tokenizer.decode(token_ids)}")
print(f"Masked:   {tokenizer.decode(masked_ids)}")

# Get model prediction
inputs = torch.tensor([masked_ids]).to(device)

with torch.no_grad():
    outputs = model(inputs)
    logits = outputs.logits  # Raw scores

# Convert to probabilities for the masked position
probabilities = torch.softmax(logits[0, nurse_position], dim=-1)

# What probability does the model assign to "nurse"?
nurse_token_id = token_ids[nurse_position]
prob_nurse = probabilities[nurse_token_id].item()

print(f"\nProbability that [MASK] = 'nurse': {prob_nurse:.6f}")
print(f"Log probability: {math.log(prob_nurse):.4f}")

# Show top predictions for comparison
top_k = 5
top_probs, top_indices = torch.topk(probabilities, top_k)
print(f"\nTop {top_k} predictions at this position:")
for i, (prob, idx) in enumerate(zip(top_probs, top_indices), 1):
    word = tokenizer.decode([idx])
    print(f"{i}. {word:15s} {prob.item():.6f}")

Example: Masking 'nurse'

Original: [CLS] she is a nurse [SEP]
Masked:   [CLS] she is a [MASK] [SEP]

Probability that [MASK] = 'nurse': 0.000000
Log probability: -16.3054

Top 5 predictions at this position:
1. .               0.917942
2. ;               0.077209
3. |               0.001874
4. !               0.001721
5. ?               0.000896
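
The `0.000000` above is a display artifact, not a true zero: a log probability of -16.3054 corresponds to a probability of roughly 8e-8, which rounds to zero at six decimals. A quick check in plain Python, using only the value printed above (the variable names are illustrative):

```python
import math

# Log probability of 'nurse' as printed in the output above.
log_prob_nurse = -16.3054

# Recover the probability; it is tiny but not zero.
prob_nurse = math.exp(log_prob_nurse)

fixed = f"{prob_nurse:.6f}"  # six-decimal formatting hides the value
sci = f"{prob_nurse:.2e}"    # scientific notation keeps it visible

print(fixed)
print(sci)
```

The top predictions are all punctuation, which suggests the model expects a sentence-final mark in that slot because the input has no closing period. Scoring "She is a nurse." instead may change the numbers noticeably, though that variant is not run here.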


## Step 4: Calculate PLL for the ENTIRE Sentence

Now we do this for EVERY word position and average the log probabilities.
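
Before running the model, it can help to see which inputs the loop will build. Below is a minimal sketch in plain Python, using the token IDs printed in Step 3 and assuming `103` is the `[MASK]` id for `bert-base-uncased` (in the notebook, `tokenizer.mask_token_id` is the authoritative value). `masked_variants` is a hypothetical helper, not part of any library.

```python
# Token IDs for "[CLS] she is a nurse [SEP]", copied from the Step 3 output.
token_ids = [101, 2016, 2003, 1037, 6821, 102]

# Assumed [MASK] id for bert-base-uncased; prefer tokenizer.mask_token_id in practice.
MASK_ID = 103

def masked_variants(ids, mask_id=MASK_ID):
    """Return (position, original_id, masked_ids) for every non-special token."""
    variants = []
    # Skip [CLS] at index 0 and [SEP] at the last index.
    for position in range(1, len(ids) - 1):
        masked = list(ids)
        masked[position] = mask_id
        variants.append((position, ids[position], masked))
    return variants

variants = masked_variants(token_ids)
for position, original_id, masked in variants:
    print(position, original_id, masked)
```

Each variant needs one forward pass, so a sentence with n tokens costs n model calls. The variants could also be stacked into a single batch, which the loop below does not do.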

In [20]:
sentence = "She is a nurse"

print(f"Calculating PLL for: '{sentence}'\n")
print("="*70)

# Tokenize
token_ids = tokenizer.encode(sentence, add_special_tokens=True)
tokens_decoded = [tokenizer.decode([tid]) for tid in token_ids]

print(f"Tokens: {tokens_decoded}\n")

log_probs = []

# Iterate through each position (skip [CLS] at 0 and [SEP] at end)
for position in range(1, len(token_ids) - 1):
    
    # Create masked version
    masked_ids = token_ids.copy()
    original_token_id = masked_ids[position]
    masked_ids[position] = tokenizer.mask_token_id
    
    # Get model prediction
    inputs = torch.tensor([masked_ids]).to(device)
    
    with torch.no_grad():
        outputs = model(inputs)
        logits = outputs.logits
    
    # Get probability for the original token
    probabilities = torch.softmax(logits[0, position], dim=-1)
    prob = probabilities[original_token_id].item()
    
    # Calculate log probability
    if prob > 0:  # Avoid log(0)
        log_prob = math.log(prob)
        log_probs.append(log_prob)
        
        word = tokenizer.decode([original_token_id])
        masked_sentence = tokenizer.decode(masked_ids)
        
        print(f"Position {position}: Masking '{word}'")
        print(f"  Masked: {masked_sentence}")
        print(f"  P('{word}' | context) = {prob:.6f}")
        print(f"  log(P) = {log_prob:.4f}")
        print()

# Calculate PLL (average of log probabilities)
pll_score = sum(log_probs) / len(log_probs) if log_probs else 0

print("="*70)
print(f"Sum of log probabilities: {sum(log_probs):.4f}")
print(f"Number of tokens: {len(log_probs)}")
print(f"\nPLL Score = Average = {pll_score:.4f}")
print("="*70)

print("\nInterpretation:")
print("- Higher PLL (closer to 0) = More 'natural' sentence")
print("- Lower PLL (more negative) = Less 'natural' sentence")

Calculating PLL for: 'She is a nurse'

Tokens: ['[CLS]', 'she', 'is', 'a', 'nurse', '[SEP]']

Position 1: Masking 'she'
  Masked: [CLS] [MASK] is a nurse [SEP]
  P('she' | context) = 0.761401
  log(P) = -0.2726

Position 2: Masking 'is'
  Masked: [CLS] she [MASK] a nurse [SEP]
  P('is' | context) = 0.433656
  log(P) = -0.8355

Position 3: Masking 'a'
  Masked: [CLS] she is [MASK] nurse [SEP]
  P('a' | context) = 0.993651
  log(P) = -0.0064

Position 4: Masking 'nurse'
  Masked: [CLS] she is a [MASK] [SEP]
  P('nurse' | context) = 0.000000
  log(P) = -16.3054

Sum of log probabilities: -17.4198
Number of tokens: 4

PLL Score = Average = -4.3550

Interpretation:
- Higher PLL (closer to 0) = More 'natural' sentence
- Lower PLL (more negative) = Less 'natural' sentence
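
The arithmetic behind the score can be checked by hand. The sketch below copies the four rounded log probabilities from the output above, so its sum can differ from the printed one in the last decimal. It also shows how much a single unlikely token drives the average.

```python
# Per-token log probabilities copied from the output above (rounded to 4 decimals).
token_log_probs = {"she": -0.2726, "is": -0.8355, "a": -0.0064, "nurse": -16.3054}

total = sum(token_log_probs.values())
average = total / len(token_log_probs)

# Share of the total contributed by the least likely token.
worst_token = min(token_log_probs, key=token_log_probs.get)
worst_share = token_log_probs[worst_token] / total

# Average over the remaining tokens, for comparison only.
rest = [v for k, v in token_log_probs.items() if k != worst_token]
average_without_worst = sum(rest) / len(rest)

print(f"sum = {total:.4f}, average = {average:.4f}")
print(f"'{worst_token}' contributes {worst_share:.1%} of the sum")
print(f"average without '{worst_token}' = {average_without_worst:.4f}")
```

Because one token accounts for most of the sum here, differences between sentences can come down to a single position. It is worth looking at the per-token values as well as the final score.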


Let's make a reusable function:

In [22]:
def calculate_pll(sentence, verbose=False):
    """
    Calculate the average pseudo-log-likelihood (PLL) of a sentence.

    Each token is masked in turn and the log probability of the original
    token at that position is recorded; the score is their mean.
    Higher score (closer to 0) = more "natural" to the model.
    """
    # Tokenize (adds [CLS] at the start and [SEP] at the end)
    token_ids = tokenizer.encode(sentence, add_special_tokens=True)

    log_probs = []

    # For each position (skip [CLS] and [SEP])
    for position in range(1, len(token_ids) - 1):

        # Create masked version
        masked_ids = token_ids.copy()
        original_token_id = masked_ids[position]
        masked_ids[position] = tokenizer.mask_token_id

        # Get model prediction
        inputs = torch.tensor([masked_ids]).to(device)

        with torch.no_grad():
            outputs = model(inputs)
            logits = outputs.logits

        # Log probability of the original token. log_softmax works in log
        # space, so very unlikely tokens are scored instead of being dropped.
        log_prob = torch.log_softmax(logits[0, position], dim=-1)[original_token_id].item()
        log_probs.append(log_prob)

        if verbose:
            word = tokenizer.decode([original_token_id])
            print(f"  '{word}': {math.exp(log_prob):.6f} (log: {log_prob:.4f})")

    # Return average log probability (NaN if there were no tokens to score)
    pll_score = sum(log_probs) / len(log_probs) if log_probs else float("nan")

    return pll_score

# Test it
test_sentence = "She is a nurse"
pll = calculate_pll(test_sentence, verbose=True)
print(f"\nPLL for '{test_sentence}': {pll:.4f}")

  'she': 0.761401 (log: -0.2726)
  'is': 0.433656 (log: -0.8355)
  'a': 0.993651 (log: -0.0064)
  'nurse': 0.000000 (log: -16.3054)

PLL for 'She is a nurse': -4.3550
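
One numerical caveat: taking the log of a softmax probability can fail when the probability underflows to exactly 0. A guard such as `if prob > 0` then silently skips that token, which makes the average look better than it should. Computing the log probability directly in log space avoids this, and it is what `torch.log_softmax` does. Below is a minimal pure-Python sketch of the idea; the helper names and toy logits are illustrative only.

```python
import math

def naive_log_prob(logits, index):
    """Softmax first, then log. Returns None when the probability underflows to 0."""
    exps = [math.exp(x) for x in logits]
    prob = exps[index] / sum(exps)
    return math.log(prob) if prob > 0 else None

def stable_log_prob(logits, index):
    """Log-softmax via the log-sum-exp trick: subtract the max before exponentiating."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_sum_exp

# Toy logits: the second entry is extremely unlikely.
toy_logits = [0.0, -800.0]

naive = naive_log_prob(toy_logits, 1)
stable = stable_log_prob(toy_logits, 1)
print(naive)
print(stable)
```

In float32, which the model uses by default, underflow sets in far earlier than in this float64 toy example.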


## Step 5: Compare Stereotypical vs Counter-Stereotypical Sentences

Now the key question: Does the model find stereotypical sentences more "natural"?

In [8]:
# Stereotypical: "She is a nurse"
stereotypical = "She is a nurse"
print(f"Stereotypical: '{stereotypical}'")
pll_stereo = calculate_pll(stereotypical, verbose=False)
print(f"PLL: {pll_stereo:.4f}\n")

# Counter-stereotypical: "He is a nurse"  
counter_stereotypical = "He is a nurse"
print(f"Counter-stereotypical: '{counter_stereotypical}'")
pll_counter = calculate_pll(counter_stereotypical, verbose=False)
print(f"PLL: {pll_counter:.4f}\n")

# Compare
print("="*60)
print("COMPARISON")
print("="*60)
difference = pll_stereo - pll_counter

print(f"Stereotypical PLL:         {pll_stereo:.4f}")
print(f"Counter-stereotypical PLL: {pll_counter:.4f}")
print(f"Difference:                {difference:+.4f}")

print("\nInterpretation:")
# Note: the 0.1 cutoff is an arbitrary illustrative threshold, not a statistical test
if difference > 0.1:
    print("  ⚠️  Model finds STEREOTYPICAL sentence more natural")
    print("  → Evidence of bias!")
elif difference < -0.1:
    print("  ✓ Model finds COUNTER-STEREOTYPICAL sentence more natural")
    print("  → Surprising! Anti-stereotypical preference")
else:
    print("  ≈ Model finds both sentences equally natural")
    print("  → No strong bias detected")

Stereotypical: 'She is a nurse'
PLL: -4.3550

Counter-stereotypical: 'He is a nurse'
PLL: -5.4707

COMPARISON
Stereotypical PLL:         -4.3550
Counter-stereotypical PLL: -5.4707
Difference:                +1.1158

Interpretation:
  ⚠️  Model finds STEREOTYPICAL sentence more natural
  → Evidence of bias!
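
A single sentence pair is weak evidence on its own. A more convincing check aggregates the difference over many pairs. The sketch below assumes a scoring function is passed in as `score_fn`; in this notebook that would be `calculate_pll`. `compare_pairs` is a hypothetical helper, and the stand-in scorer simply replays the two PLL values recorded above so the example runs without the model.

```python
from statistics import mean

def compare_pairs(pairs, score_fn):
    """Score each (stereotypical, counter_stereotypical) pair and summarize."""
    diffs = [score_fn(stereo) - score_fn(counter) for stereo, counter in pairs]
    return {
        "differences": diffs,
        "mean_difference": mean(diffs),
        # Fraction of pairs where the stereotypical sentence scores higher.
        "fraction_stereo_preferred": sum(d > 0 for d in diffs) / len(diffs),
    }

# Stand-in scorer: the PLL values printed above, not new model output.
recorded_scores = {"She is a nurse": -4.3550, "He is a nurse": -5.4707}

summary = compare_pairs([("She is a nurse", "He is a nurse")], recorded_scores.get)
print(summary)
```

With the model loaded, a call such as `compare_pairs(pairs, calculate_pll)` could include the engineer pair from the introduction, ("He is an engineer", "She is an engineer"), along with more occupations.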
