# IOI Circuit Analysis - Question Documentation

This notebook contains the key knowledge points and exam questions for evaluating understanding of the IOI Circuit Analysis research.

**Note**: Students should use only the `documentation.md` file to answer these questions.


## Key Knowledge Points for IOI Circuit Analysis

### 1. Research Goal & Hypothesis
- Research objective: Identify a circuit in GPT2-small that implements the IOI (indirect object identification) task within an 11,200-dimension write budget
- Three-component hypothesis:
  - Duplicate Token Heads: S2 → S1 attention
  - S-Inhibition Heads: END → S2 attention
  - Name-Mover Heads: END → IO attention

### 2. Dataset Understanding
- Source: mib-bench/ioi (Hugging Face)
- Size: 10,000 examples total, 100 used for analysis
- Key positions: S1, S2, IO, END
- Metadata: subject, indirect_object, object, place
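
The four key positions can be illustrated with a minimal sketch. It assumes whitespace-separated word tokens and a made-up sentence; the actual analysis would work on GPT-2 token indices, and `find_ioi_positions` is a hypothetical helper, not part of the documented pipeline.

```python
def find_ioi_positions(tokens, subject, indirect_object):
    """Return the indices of S1, S2, IO and END in a list of word tokens."""
    subject_idx = [i for i, tok in enumerate(tokens) if tok == subject]
    if len(subject_idx) < 2:
        raise ValueError("the subject must appear at least twice")
    return {
        "S1": subject_idx[0],                  # first mention of the subject
        "S2": subject_idx[1],                  # second mention of the subject
        "IO": tokens.index(indirect_object),   # first mention of the indirect object
        "END": len(tokens) - 1,                # last token, where the model predicts
    }


# Hypothetical example sentence (not taken from the dataset)
sentence = "When Mary and John went to the store , John gave a drink to"
positions = find_ioi_positions(sentence.split(), subject="John", indirect_object="Mary")
print(positions)
```

The model is expected to complete the sentence with the indirect object ("Mary" here), so END is the position whose attention patterns matter for the S-Inhibition and Name-Mover heads.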

### 3. Model Configuration
- Model: GPT2-small (TransformerLens)
- Architecture: 12 layers, 12 heads, d_model=768, d_head=64, d_mlp=3072
- Device: CUDA (A100)
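
The derived dimensions follow from three base values. The sketch below uses a hypothetical `ModelConfig` dataclass (not the TransformerLens config object) to show how `d_head` and `d_mlp` relate to `d_model` and the head count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical container for the GPT2-small sizes listed above."""
    n_layers: int = 12
    n_heads: int = 12
    d_model: int = 768

    @property
    def d_head(self) -> int:
        # Each head works in an equal slice of the residual width
        return self.d_model // self.n_heads

    @property
    def d_mlp(self) -> int:
        # GPT-2 uses a 4x expansion in its MLP hidden layer
        return 4 * self.d_model


cfg = ModelConfig()
print(cfg.d_head, cfg.d_mlp, cfg.n_layers * cfg.n_heads)
```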

### 4. Write Budget Constraints
- Head: 64 dimensions (d_model / n_heads)
- MLP: 768 dimensions (d_model)
- Total budget: ≤ 11,200 dimensions

### 5. Analysis Pipeline
- Baseline evaluation → Attention pattern analysis → Node selection → Validation
- Baseline accuracy: 94%

### 6. Head Detection Methods
- Duplicate Token Heads: Attention S2 → S1 (top: a3.h0 with 0.7191)
- S-Inhibition Heads: Attention END → S2 (top: a8.h6 with 0.7441)
- Name-Mover Heads: Attention END → IO (top: a9.h9 with 0.7998)
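
Each detection method reduces to the same measurement: the attention weight from one key position to another, averaged over examples. The sketch below shows that averaging on synthetic data. It assumes attention patterns laid out as `[example, head, query position, key position]`; the array contents, the planted head, and the helper name `mean_attention_score` are illustrative assumptions, not the documented implementation.

```python
import numpy as np


def mean_attention_score(patterns, src_positions, dst_positions):
    """Average attention from src (query) to dst (key) for every head.

    patterns: array of shape [n_examples, n_heads, seq_len, seq_len]
    src_positions, dst_positions: integer arrays of length n_examples
    Returns an array of shape [n_heads].
    """
    example_idx = np.arange(patterns.shape[0])
    # Picks patterns[e, :, src[e], dst[e]] for each example e -> [n_examples, n_heads]
    picked = patterns[example_idx, :, src_positions, dst_positions]
    return picked.mean(axis=0)


# Synthetic data: uniform attention everywhere, except one planted head
n_examples, n_heads, seq_len = 4, 3, 6
patterns = np.full((n_examples, n_heads, seq_len, seq_len), 1.0 / seq_len)
s2 = np.array([4, 5, 4, 3])  # hypothetical S2 positions
s1 = np.array([1, 2, 0, 1])  # hypothetical S1 positions
for e in range(n_examples):
    patterns[e, 2, s2[e], :] = 0.1 / (seq_len - 1)
    patterns[e, 2, s2[e], s1[e]] = 0.9  # head 2 attends strongly from S2 to S1

scores = mean_attention_score(patterns, s2, s1)
top_head = int(np.argmax(scores))
print(top_head, round(float(scores[top_head]), 4))
```

Passing END as the source and S2 or IO as the destination would give the S-Inhibition and Name-Mover scores under the same assumptions.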

### 7. Circuit Composition
- Total: 44 nodes (1 input + 31 heads + 12 MLPs)
- Budget: 31×64 + 12×768 = 1,984 + 9,216 = 11,200 (100% utilization)
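
The budget arithmetic can be checked with a short sketch. The function name `write_cost` is hypothetical; the per-node costs and the limit are the ones stated in this document.

```python
D_HEAD = 64        # write cost of one attention head
D_MODEL = 768      # write cost of one MLP
BUDGET = 11_200    # maximum allowed dimensions


def write_cost(n_heads: int, n_mlps: int) -> int:
    """Total write budget for a circuit with the given node counts."""
    return n_heads * D_HEAD + n_mlps * D_MODEL


used = write_cost(n_heads=31, n_mlps=12)
print(used, used <= BUDGET, f"{100 * used / BUDGET:.1f}%")
```

Under these costs one MLP is as expensive as 12 attention heads (768 / 64), so keeping all 12 MLPs leaves room for exactly 31 heads.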

### 8. Key Findings
- Layered processing: Early (0-3) → Middle (7-8) → Late (9-11)
- High selectivity (>0.7 attention for top heads)
- Efficient representation (10.1% of total model capacity)
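
The 10.1% figure depends on how "total model capacity" is counted, which this summary does not define. The sketch below compares two candidate denominators; both are assumptions made for illustration, and only the second reproduces the stated percentage.

```python
N_LAYERS, N_HEADS, D_MODEL, D_HEAD = 12, 12, 768, 64
CIRCUIT_BUDGET = 11_200

# Assumption A: every head and MLP costed with the write-budget rules
total_write_rules = N_LAYERS * N_HEADS * D_HEAD + N_LAYERS * D_MODEL

# Assumption B: every head counted at full d_model width, MLPs excluded
total_full_width = N_LAYERS * N_HEADS * D_MODEL

fractions = {
    "write-budget rules": CIRCUIT_BUDGET / total_write_rules,
    "heads at full width": CIRCUIT_BUDGET / total_full_width,
}
for label, frac in fractions.items():
    print(f"{label}: {100 * frac:.1f}%")
```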

### 9. Attention Head Categories
- 6 Duplicate Token Heads (layers 0, 1, 3)
- 12 S-Inhibition Heads (layers 7, 8, 9)
- 15 Name-Mover Heads (layers 9, 10, 11)


---

## Exam Questions

The following questions test both factual understanding and the ability to apply, reason about, and generalize from the documentation.

**Question Distribution**:
- Multiple Choice: 9 questions
- Free Generation: 9 questions
- Code-Based: 3 questions

### Question 1 (Multiple Choice)

What is the primary research objective of the IOI circuit analysis experiment?

A) Train GPT2-small to perform the IOI task from scratch
B) Compare different transformer architectures on the IOI task
C) Identify a precise circuit in GPT2-small that implements IOI behavior within a write budget constraint
D) Optimize GPT2-small's performance on the IOI task

### Question 2 (Multiple Choice)

According to the three-component hypothesis, what is the function of S-Inhibition Heads?

A) Attend from S2 to S1 to detect duplicate tokens
B) Attend from END to S2 to inhibit Name-Mover attention to subject positions
C) Attend from END to IO to copy the indirect object
D) Attend from S1 to END to establish positional context

### Question 3 (Free Generation)

What dataset was used for this analysis, and how many examples were used? Identify the four key positions in the IOI sentence structure.

### Question 4 (Multiple Choice)

What is the d_head dimension in GPT2-small, and how is it calculated?

A) 768, calculated as d_model
B) 64, calculated as d_model / n_heads
C) 3072, calculated as 4 × d_model
D) 12, calculated as n_heads

### Question 5 (Free Generation)

Explain the write budget allocation in this experiment. How many dimensions does each attention head write? How many dimensions does each MLP write?

### Question 6 (Multiple Choice)

What was GPT2-small's baseline accuracy on the IOI task?

A) 100%
B) 72%
C) 80%
D) 94%

### Question 7 (Free Generation)

Which attention head had the highest average S2→S1 attention score for duplicate token detection, and what was its score?

### Question 8 (Multiple Choice)

How many total nodes are in the final circuit, and what is their breakdown?

A) 31 nodes: 19 attention heads and 12 MLPs
B) 43 nodes: 31 attention heads and 12 MLPs
C) 44 nodes: 1 input, 31 attention heads, and 12 MLPs
D) 44 nodes: 32 attention heads and 12 MLPs

### Question 9 (Free Generation)

Describe the layer distribution pattern of the three types of attention heads in the identified circuit. Which layers contain each type?

### Question 10 (Multiple Choice)

According to the final circuit composition, how many attention heads belong to each functional category?

A) 6 Duplicate Token Heads, 12 S-Inhibition Heads, 15 Name-Mover Heads
B) 10 Duplicate Token Heads, 10 S-Inhibition Heads, 11 Name-Mover Heads
C) 8 Duplicate Token Heads, 8 S-Inhibition Heads, 15 Name-Mover Heads
D) 5 Duplicate Token Heads, 15 S-Inhibition Heads, 11 Name-Mover Heads

### Question 11 (Free Generation)

Given the final circuit composition of 31 attention heads and 12 MLPs, calculate the total write budget used. Show your work and explain why this configuration achieves 100% budget utilization.

### Question 12 (Free Generation)

Explain the causal mechanism by which the three types of heads work together to solve the IOI task. How does information flow from S1 to the final prediction?

### Question 13 (Multiple Choice)

If the IOI task were modified so that the sentence structure changed from 'A and B ... A gave to __' to 'B and A ... A gave to __' (swapping the order of first appearance), which component of the hypothesis would need to be reconsidered first?

A) Duplicate Token Heads, because S1 and S2 positions would change relative to IO position
B) S-Inhibition Heads, because they wouldn't know which token to inhibit
C) Name-Mover Heads, because they would attend to the wrong position
D) MLPs, because they would fail to transform features correctly

### Question 14 (Free Generation)

The documentation mentions that the IOI circuit uses only 10.1% of total model capacity. Calculate the total possible dimensions in GPT2-small and explain what this sparse representation implies about how the model implements the IOI task.

### Question 15 (Multiple Choice)

Why was attention averaging across all 100 examples used to identify functional head types rather than analyzing individual examples?

A) To reduce computational cost
B) To increase the number of data points
C) To find heads with consistent, specialized behavior across contexts rather than context-dependent heads
D) To enable statistical significance testing

### Question 16 (Free Generation)

Explain why all 12 MLPs were included in the circuit even though the main focus was on attention head analysis. What role do MLPs play in the IOI circuit?

### Question 17 (Free Generation)

The documentation mentions 'Negative Name Movers' as an alternative hypothesis to explore. Based on your understanding of the circuit, what might be the function of Negative Name Mover heads and how would they complement the existing three-component hypothesis?

### Question 18 (Multiple Choice)

If ablation studies were performed and removing a8.h6 (the top S-Inhibition Head with 0.74 attention score) caused minimal performance drop, what would be the most likely explanation?

A) The S-Inhibition hypothesis is incorrect
B) The circuit has redundant S-Inhibition heads that compensate for the ablation
C) S-Inhibition is not necessary for the IOI task
D) The attention score measurement was inaccurate

### Question 19 (Code-Based, CQ1)

Write code to verify the write budget of the IOI circuit. Given the circuit nodes list, count the attention heads and MLPs, then calculate the total budget used and utilization percentage.

```python
# CQ1: Circuit Budget Verification
# Your task is to verify the write budget of the identified IOI circuit.

# Given circuit data:
circuit_nodes = [
    "input",
    "a0.h1", "a0.h10", "a0.h5", "a0.h6",
    "a1.h11",
    "a10.h0", "a10.h1", "a10.h10", "a10.h2", "a10.h3", "a10.h6", "a10.h7",
    "a11.h10", "a11.h6", "a11.h8",
    "a3.h0", "a3.h6",
    "a6.h0",
    "a7.h3", "a7.h9",
    "a8.h10", "a8.h2", "a8.h3", "a8.h5", "a8.h6",
    "a9.h0", "a9.h2", "a9.h6", "a9.h7", "a9.h8", "a9.h9",
    "m0", "m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8", "m9", "m10", "m11"
]

# Model configuration
D_HEAD = 64  # dimensions per attention head
D_MODEL = 768  # dimensions per MLP
BUDGET_LIMIT = 11200  # maximum allowed dimensions

# TODO: Count the number of attention heads (nodes starting with 'a')
n_attention_heads = None  # Your code here

# TODO: Count the number of MLPs (nodes starting with 'm')
n_mlps = None  # Your code here

# TODO: Calculate total write budget used
total_budget = None  # Your code here

# TODO: Calculate budget utilization percentage
utilization_percentage = None  # Your code here

# Print results
print(f"Attention heads: {n_attention_heads}")
print(f"MLPs: {n_mlps}")
print(f"Total budget used: {total_budget}")
print(f"Budget utilization: {utilization_percentage:.1f}%")
```


### Question 20 (Code-Based, CQ2)

Write code to analyze the layer distribution of attention heads in the circuit. Parse each head name to extract its layer, count heads per layer, find the layer with most heads, and count heads in early (0-3) and late (9-11) layers.

```python
# CQ2: Layer Distribution Analysis
# Your task is to analyze the layer distribution of attention heads in the circuit.

# Given attention head nodes (extracted from circuit):
attention_heads = [
    "a0.h1", "a0.h10", "a0.h5", "a0.h6",
    "a1.h11",
    "a10.h0", "a10.h1", "a10.h10", "a10.h2", "a10.h3", "a10.h6", "a10.h7",
    "a11.h10", "a11.h6", "a11.h8",
    "a3.h0", "a3.h6",
    "a6.h0",
    "a7.h3", "a7.h9",
    "a8.h10", "a8.h2", "a8.h3", "a8.h5", "a8.h6",
    "a9.h0", "a9.h2", "a9.h6", "a9.h7", "a9.h8", "a9.h9"
]

# TODO: Parse the layer number from each head name (format: a{layer}.h{head})
# Create a dictionary counting heads per layer
layer_counts = {}  # Your code here

# TODO: Find which layer has the maximum number of heads
max_layer = None  # Your code here
max_count = None  # Your code here

# TODO: Calculate the total number of heads in early layers (0-3)
early_layer_count = None  # Your code here

# TODO: Calculate the total number of heads in late layers (9-11)
late_layer_count = None  # Your code here

# Print results
print(f"Heads per layer: {dict(sorted(layer_counts.items()))}")
print(f"Layer with most heads: {max_layer} ({max_count} heads)")
print(f"Early layer heads (0-3): {early_layer_count}")
print(f"Late layer heads (9-11): {late_layer_count}")
```


### Question 21 (Code-Based, CQ3)

Write code to analyze the attention scores across the three head types. Calculate mean scores for each type, identify which type has the highest mean, count heads with high selectivity (>0.5), and find the overall top head.

```python
# CQ3: Attention Score Ranking Analysis
# Your task is to analyze and compare attention scores across the three head types.

import numpy as np

# Top attention scores from the documentation
duplicate_token_scores = {
    "a3.h0": 0.7191,
    "a1.h11": 0.6613,
    "a0.h5": 0.6080,
    "a0.h1": 0.5152,
    "a0.h10": 0.2359
}

s_inhibition_scores = {
    "a8.h6": 0.7441,
    "a7.h9": 0.5079,
    "a8.h10": 0.3037,
    "a8.h5": 0.2852,
    "a9.h7": 0.2557
}

name_mover_scores = {
    "a9.h9": 0.7998,
    "a10.h7": 0.7829,
    "a9.h6": 0.7412,
    "a11.h10": 0.6369,
    "a10.h0": 0.3877
}

# TODO: Calculate the mean attention score for each head type
mean_duplicate = None  # Your code here
mean_s_inhibition = None  # Your code here
mean_name_mover = None  # Your code here

# TODO: Determine which head type has the highest mean attention score
highest_mean_type = None  # Your code here

# TODO: Count how many heads in each category have attention > 0.5 (high selectivity)
high_selectivity_duplicate = None  # Your code here
high_selectivity_s_inhibition = None  # Your code here
high_selectivity_name_mover = None  # Your code here

# TODO: Find the overall top head across all categories
all_heads = {**duplicate_token_scores, **s_inhibition_scores, **name_mover_scores}
top_head = None  # Your code here
top_score = None  # Your code here

# Print results
print(f"Mean duplicate token score: {mean_duplicate:.4f}")
print(f"Mean S-inhibition score: {mean_s_inhibition:.4f}")
print(f"Mean name mover score: {mean_name_mover:.4f}")
print(f"Highest mean type: {highest_mean_type}")
print(f"High selectivity counts (>0.5): Duplicate={high_selectivity_duplicate}, S-Inhibition={high_selectivity_s_inhibition}, Name-Mover={high_selectivity_name_mover}")
print(f"Top head overall: {top_head} ({top_score:.4f})")
```
