# IOI Circuit Analysis Exam

## Overview

This exam assesses your understanding of the IOI (Indirect Object Identification) circuit analysis research documented in this repository. You should answer all questions based solely on the information provided in the documentation.

**Instructions:**
- Read each question carefully
- Answer based on the documentation provided
- For code questions, complete the TODOs in the code cells
- Show your work and reasoning where applicable

## Key Knowledge Points

Before attempting the exam, ensure you understand these core concepts from the documentation:

### 1. Task Definition & Structure
- **IOI Task**: Predict the indirect object at the end of a sentence with two names, one repeated
- **Key Positions**: 
  - S1: First mention of subject
  - S2: Second mention of subject  
  - IO: Indirect object
  - END: Final token position
- **Example**: "As Carl and Maria left the consulate, Carl gave a fridge to ___" → Answer: Maria
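The position labels can be made concrete on the example sentence. This is a minimal sketch with two assumptions. First, a simple word/punctuation split stands in for GPT-2's BPE tokenizer. Second, a hypothetical `<bos>` token sits at position 0. With the real tokenizer, a word such as "consulate" may split into several tokens, which would shift the later indices. The helper `ioi_positions` is illustrative and not part of the documented code.

```python
import re

# Illustrative only: a word/punctuation split stands in for GPT-2's BPE tokenizer,
# and "<bos>" is a hypothetical beginning-of-sequence token at position 0.
prompt = "As Carl and Maria left the consulate, Carl gave a fridge to"
tokens = ["<bos>"] + re.findall(r"\w+|[^\w\s]", prompt)

def ioi_positions(tokens, subject, io):
    """Hypothetical helper: S1/S2 are the two subject mentions, IO the other name, END the last token."""
    subject_idx = [i for i, tok in enumerate(tokens) if tok == subject]
    s1, s2 = subject_idx[:2]
    return {"S1": s1, "S2": s2, "IO": tokens.index(io), "END": len(tokens) - 1}

print(ioi_positions(tokens, subject="Carl", io="Maria"))
# → {'S1': 2, 'S2': 9, 'IO': 4, 'END': 13}
```

END is the position of "to". The model's prediction at END is the next token, which should be the IO name.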

### 2. Three-Component Hypothesis
1. **Duplicate Token Heads**: Attend from S2 → S1, detect token duplication
2. **S-Inhibition Heads**: Attend from END → S2, inhibit subject attention
3. **Name-Mover Heads**: Attend from END → IO, copy indirect object to output
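Each component is characterized by where its heads attend (query position → key position). The sketch below shows how a per-head score of that kind could be computed: average the attention weight from a query position to a key position over examples. It assumes attention patterns have already been extracted into a NumPy array. The array layout and the function names `mean_attention_score` and `top_head` are hypothetical, and the documentation does not show its actual scoring code. The synthetic data are uniform noise rather than softmax-normalized rows. They only exercise the indexing.

```python
import numpy as np

def mean_attention_score(patterns, query_pos, key_pos):
    """Mean attention weight from query_pos to key_pos, per (layer, head).

    patterns: (n_examples, n_layers, n_heads, seq_len, seq_len) array where
              patterns[b, l, h, q, k] is how much query q attends to key k.
    query_pos, key_pos: (n_examples,) integer arrays, e.g. S2 and S1 indices
              for Duplicate Token Heads, or END and IO for Name-Mover Heads.
    """
    batch = np.arange(patterns.shape[0])
    # Advanced indexing selects patterns[b, :, :, query_pos[b], key_pos[b]];
    # the example axis comes first, giving shape (n_examples, n_layers, n_heads).
    per_example = patterns[batch, :, :, query_pos, key_pos]
    return per_example.mean(axis=0)

def top_head(scores):
    """Name and score of the highest-scoring head, using the aL.hH naming."""
    layer, head = np.unravel_index(np.argmax(scores), scores.shape)
    return f"a{layer}.h{head}", float(scores[layer, head])

# Synthetic check: low random weights, with a3.h0 forced to attend S2 -> S1.
rng = np.random.default_rng(0)
n_examples, n_layers, n_heads, seq_len = 8, 12, 12, 14
patterns = rng.uniform(0.0, 0.1, size=(n_examples, n_layers, n_heads, seq_len, seq_len))
s2 = np.full(n_examples, 9)
s1 = np.full(n_examples, 2)
patterns[:, 3, 0, 9, 2] = 0.9
name, score = top_head(mean_attention_score(patterns, s2, s1))
print(name, round(score, 2))  # → a3.h0 0.9
```

Swapping in END/S2 or END/IO indices would give the corresponding scores for the S-Inhibition and Name-Mover patterns.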

### 3. Model Architecture (GPT2-small)
- 12 layers, 12 heads per layer
- d_model = 768, d_head = 64
- Each attention head writes 64 dimensions
- Each MLP writes 768 dimensions
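The per-head figure follows from splitting d_model evenly across the heads in a layer. As a result, one layer's 12 heads together write as many dimensions as a single MLP. A quick arithmetic check using only the numbers listed above:

```python
# GPT2-small sizes as listed above.
n_layers, n_heads, d_model = 12, 12, 768

d_head = d_model // n_heads              # 768 / 12 = 64 dims written per head
per_layer_head_dims = n_heads * d_head   # 12 heads x 64 = 768 dims per layer
total_heads = n_layers * n_heads         # 144 heads in the whole model

print(d_head, per_layer_head_dims, total_heads)  # → 64 768 144
```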

### 4. Write Budget Constraints
- Total budget limit: ≤ 11,200 dimensions
- Final circuit: 31 heads (1,984 dims) + 12 MLPs (9,216 dims) = 11,200 dims (100% utilization)
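The budget is a weighted count of circuit components: 64 dimensions per head and 768 per MLP. The following minimal sketch assumes exactly that accounting. The function name `write_dims` is illustrative.

```python
DIMS_PER_HEAD = 64   # d_head
DIMS_PER_MLP = 768   # d_model
BUDGET_LIMIT = 11_200

def write_dims(n_heads: int, n_mlps: int) -> int:
    """Total residual-stream dimensions written by a set of heads and MLPs."""
    return n_heads * DIMS_PER_HEAD + n_mlps * DIMS_PER_MLP

total = write_dims(31, 12)
print(total, total <= BUDGET_LIMIT, f"{total / BUDGET_LIMIT:.0%}")  # → 11200 True 100%
```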

### 5. Key Findings
- **Baseline accuracy**: 94% on 100 IOI examples
- **Top heads identified** (average attention score along each component's characteristic query → key pattern):
  - Duplicate Token: a3.h0 (0.7191)
  - S-Inhibition: a8.h6 (0.7441)
  - Name-Mover: a9.h9 (0.7998)
- **Layered processing**: Early layers detect patterns, middle layers inhibit, late layers predict
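The layered ordering can be read directly from the head names, assuming the `aL.hH` convention means layer L, head H (so a3.h0 is layer 3, head 0). The sketch below checks that the three top heads appear in increasing layer order. The helper `parse_head` is hypothetical.

```python
import re

top_heads = {"Duplicate Token": "a3.h0", "S-Inhibition": "a8.h6", "Name-Mover": "a9.h9"}

def parse_head(name):
    """Split an 'aL.hH' head name into (layer, head) integers."""
    match = re.fullmatch(r"a(\d+)\.h(\d+)", name)
    if match is None:
        raise ValueError(f"unexpected head name: {name!r}")
    return int(match.group(1)), int(match.group(2))

layers = [parse_head(h)[0] for h in top_heads.values()]
print(layers, layers == sorted(layers))  # → [3, 8, 9] True
```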

### 6. Analysis Insights
- Circuit exhibits clear functional specialization
- Redundancy across multiple heads per category
- Only 10.1% of total model capacity used (11,200 of 110,592 dimensions), indicating a sparse subcircuit
- Strong empirical support for three-component hypothesis
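The 10.1% figure is 11,200 divided by the 110,592 total quoted in Question 13. The documentation does not show how that total is derived, though it equals 12 × 12 heads × 768, which counts each head at full d_model width. The sketch below reproduces the documented ratio. It then compares it with a denominator built from the same per-component accounting as the write budget (64 per head, 768 per MLP). Under that accounting, the circuit's share is considerably larger.

```python
circuit_dims = 31 * 64 + 12 * 768          # 11,200 (the write budget above)

# Denominator quoted in the documentation (Question 13).
documented_total = 110_592                 # equals 12 layers x 12 heads x 768
print(f"{circuit_dims / documented_total:.1%}")   # → 10.1%

# Same per-component accounting as the write budget: 64 dims per head, 768 per MLP.
full_model_writes = 12 * 12 * 64 + 12 * 768       # 9,216 + 9,216 = 18,432
print(f"{circuit_dims / full_model_writes:.1%}")  # → 60.8%
```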


---

# Exam Questions

## Part 1: Factual Understanding

### Question 1

**Type**: Multiple Choice

What is the primary goal of the IOI (Indirect Object Identification) task?

**Options:**
A. To predict the indirect object at the end of a sentence where two names appear, with one name repeated.
B. To identify which subject performed an action in a sentence.
C. To determine the direct object being transferred in a sentence.
D. To classify whether a sentence contains duplicate tokens.


**Your Answer:** [Select A, B, C, or D]

### Question 2

**Type**: Multiple Choice

In the example sentence 'As Carl and Maria left the consulate, Carl gave a fridge to ___', what are the positions S1, S2, and IO respectively?

**Options:**
A. S1 = position 2 (first 'Carl'), S2 = position 9 (second 'Carl'), IO = position 4 ('Maria')
B. S1 = position 4 ('Maria'), S2 = position 9 (second 'Carl'), IO = position 2 (first 'Carl')
C. S1 = position 2 (first 'Carl'), S2 = position 4 ('Maria'), IO = position 9 (second 'Carl')
D. S1 = position 9 (second 'Carl'), S2 = position 2 (first 'Carl'), IO = position 13 ('to')


**Your Answer:** [Select A, B, C, or D]

### Question 3

**Type**: Free Generation

Describe the three functional components hypothesized to comprise the IOI circuit, including what positions they attend from/to and their proposed function.

**Your Answer:**

[Write your answer here]

### Question 4

**Type**: Multiple Choice

How many dimensions does each attention head write to the residual stream in GPT2-small?

**Options:**
A. 64 dimensions
B. 768 dimensions
C. 3,072 dimensions
D. 12 dimensions


**Your Answer:** [Select A, B, C, or D]

### Question 5

**Type**: Multiple Choice

What was the baseline accuracy of GPT2-small on the IOI task (100-example sample)?

**Options:**
A. 94.00%
B. 87.50%
C. 99.00%
D. 72.00%


**Your Answer:** [Select A, B, C, or D]

### Question 6

**Type**: Multiple Choice

Which attention head showed the highest average attention from S2 to S1 (Duplicate Token Head behavior)?

**Options:**
A. a3.h0 (0.7191)
B. a1.h11 (0.6613)
C. a8.h6 (0.7441)
D. a9.h9 (0.7998)


**Your Answer:** [Select A, B, C, or D]

### Question 7

**Type**: Multiple Choice

Which attention head showed the highest average attention from END to IO (Name-Mover Head behavior)?

**Options:**
A. a9.h9 (0.7998)
B. a10.h7 (0.7829)
C. a9.h6 (0.7412)
D. a8.h6 (0.7441)


**Your Answer:** [Select A, B, C, or D]

### Question 8

**Type**: Multiple Choice

How many total nodes (including input, attention heads, and MLPs) are in the final identified circuit?

**Options:**
A. 44 nodes (1 input + 31 heads + 12 MLPs)
B. 43 nodes (31 heads + 12 MLPs)
C. 31 nodes (attention heads only)
D. 55 nodes (1 input + 31 heads + 12 MLPs + 11 additional)


**Your Answer:** [Select A, B, C, or D]

### Question 9

**Type**: Free Generation

Explain why the identified circuit exhibits 'layered processing' and what functional role each layer group plays in the IOI task.

**Your Answer:**

[Write your answer here]

### Question 10

**Type**: Free Generation

The documentation states that the circuit achieves 100% budget utilization with exactly 11,200 dimensions. If the researchers wanted to add 5 more attention heads to the circuit, how many MLPs would they need to remove to stay within budget? Show your calculation.

**Your Answer:**

[Write your answer here]

### Question 11

**Type**: Multiple Choice

Why does the documentation suggest that finding multiple heads per functional category indicates 'robustness through redundancy'?

**Options:**
A. Multiple heads performing similar functions provide backup pathways, so if one head fails or is ablated, others can compensate, making the circuit more resilient to perturbations.
B. Having more heads increases the total budget utilization, making the circuit more efficient.
C. Redundant heads allow the model to process multiple sentences in parallel.
D. Multiple heads with the same function indicate that the circuit was overfit to the training data.


**Your Answer:** [Select A, B, C, or D]

### Question 12

**Type**: Free Generation

Based on the methodology described, propose a concrete experiment to test whether the S-Inhibition Heads are causally necessary for the IOI task. What would you measure and what result would confirm their necessity?

**Your Answer:**

[Write your answer here]

### Question 13

**Type**: Multiple Choice

The circuit uses only 11,200 of 110,592 possible dimensions (10.1% of total model capacity). What does this suggest about how GPT2-small implements the IOI behavior?

**Options:**
A. IOI is implemented by a relatively sparse, specialized subcircuit rather than requiring the full model capacity, suggesting modular functional organization.
B. The model is inefficient and wastes 90% of its capacity on irrelevant computations.
C. IOI is a simple task that doesn't require sophisticated neural processing.
D. The remaining 90% of capacity is used for error correction and robustness.


**Your Answer:** [Select A, B, C, or D]

### Question 14

**Type**: Free Generation

Suppose you found a new attention head a5.h7 that shows strong attention from END to S1 (not S2). How would you classify this head relative to the three hypothesized categories, and what function might it serve?

**Your Answer:**

[Write your answer here]

---

## Part 2: Code-Based Analysis

The following questions require you to write Python code to verify or analyze aspects of the IOI circuit. Complete the TODOs in each code cell.

### Question 15 (CQ1)

Write code to verify the budget constraint calculation. Given the final circuit composition (31 attention heads and 12 MLPs), compute the total number of dimensions written and verify that it equals 11,200. Also calculate what percentage of the budget is used by attention heads and what percentage by MLPs.

```python
# TODO: Calculate the budget for the IOI circuit
# Given information:
# - 31 attention heads in the circuit
# - 12 MLPs in the circuit
# - Each attention head writes: d_model / n_heads = 768 / 12 = 64 dimensions
# - Each MLP writes: d_model = 768 dimensions

# TODO: Calculate total dimensions written by attention heads
n_heads = 31
dims_per_head = 64
total_head_dims = # YOUR CODE HERE

# TODO: Calculate total dimensions written by MLPs
n_mlps = 12
dims_per_mlp = 768
total_mlp_dims = # YOUR CODE HERE

# TODO: Calculate total budget
total_budget = # YOUR CODE HERE

# TODO: Calculate percentages
head_percentage = # YOUR CODE HERE
mlp_percentage = # YOUR CODE HERE

# Print results
print(f"Attention heads: {total_head_dims} dimensions ({head_percentage:.2f}%)")
print(f"MLPs: {total_mlp_dims} dimensions ({mlp_percentage:.2f}%)")
print(f"Total budget: {total_budget} dimensions")
print(f"Budget constraint (≤11,200): {'PASS' if total_budget <= 11200 else 'FAIL'}")
```

### Question 16 (CQ2)

Write code to analyze the layer distribution of the 31 attention heads in the circuit. Count how many heads are in each layer (0-11) and identify which layer has the most heads. Then determine whether later layers contain more heads than early layers by comparing layers 0-3 with layers 9-11.

```python
# TODO: Analyze the layer distribution of attention heads in the circuit
# The circuit contains these attention heads (from the documentation):
circuit_heads = [
    "a0.h1", "a0.h10", "a0.h5", "a0.h6",
    "a1.h11",
    "a3.h0", "a3.h6",
    "a6.h0",
    "a7.h3", "a7.h9",
    "a8.h10", "a8.h2", "a8.h3", "a8.h5", "a8.h6",
    "a9.h0", "a9.h2", "a9.h6", "a9.h7", "a9.h8", "a9.h9",
    "a10.h0", "a10.h1", "a10.h10", "a10.h2", "a10.h3", "a10.h6", "a10.h7",
    "a11.h10", "a11.h6", "a11.h8"
]

# TODO: Count heads per layer
layer_counts = {}
for head in circuit_heads:
    # Extract layer number from head name (e.g., "a0.h1" -> layer 0)
    layer = # YOUR CODE HERE
    if layer not in layer_counts:
        layer_counts[layer] = 0
    layer_counts[layer] += 1

# TODO: Find the layer with most heads
max_layer = # YOUR CODE HERE
max_count = # YOUR CODE HERE

# TODO: Calculate early vs late layer distribution
# Early layers: 0-3, Late layers: 9-11
early_layers_count = # YOUR CODE HERE
late_layers_count = # YOUR CODE HERE

# Print results
print("Layer distribution:")
for layer in sorted(layer_counts.keys()):
    print(f"  Layer {layer}: {layer_counts[layer]} heads")
print(f"\nLayer with most heads: Layer {max_layer} ({max_count} heads)")
print(f"\nEarly layers (0-3): {early_layers_count} heads")
print(f"Late layers (9-11): {late_layers_count} heads")
# Report the trend, handling a tie explicitly
if late_layers_count > early_layers_count:
    trend = "More heads in later layers"
elif late_layers_count < early_layers_count:
    trend = "More heads in early layers"
else:
    trend = "Same number of heads in early and late layers"
print(f"Trend: {trend}")
```

### Question 17 (CQ3)

Write code to compute the budget of a minimal IOI circuit that keeps only the top-1 head from each functional category (Duplicate Token, S-Inhibition, Name-Mover) plus all 12 MLPs. Calculate the total budget used, how much of the 11,200-dimension limit remains unused, and how many dimensions this saves compared to the full circuit.

```python
# TODO: Calculate budget for a minimal IOI circuit
# Minimal circuit composition:
# - Top Duplicate Token Head: a3.h0
# - Top S-Inhibition Head: a8.h6
# - Top Name-Mover Head: a9.h9
# - All 12 MLPs (m0 through m11)

# Constants
DIMS_PER_HEAD = 64
DIMS_PER_MLP = 768
BUDGET_LIMIT = 11200
FULL_CIRCUIT_BUDGET = 11200  # From the documentation

# TODO: Calculate minimal circuit budget
minimal_heads = 3  # One from each category
minimal_mlps = 12
minimal_budget = # YOUR CODE HERE

# TODO: Calculate remaining budget (relative to BUDGET_LIMIT)
budget_remaining = # YOUR CODE HERE

# TODO: Calculate savings compared to full circuit
budget_saved = # YOUR CODE HERE

# Print results
print("Minimal Circuit Composition:")
print(f"  - Attention heads: {minimal_heads} heads × {DIMS_PER_HEAD} dims = {minimal_heads * DIMS_PER_HEAD} dims")
print(f"  - MLPs: {minimal_mlps} MLPs × {DIMS_PER_MLP} dims = {minimal_mlps * DIMS_PER_MLP} dims")
print(f"\nMinimal circuit budget: {minimal_budget} dimensions")
print(f"Budget remaining: {budget_remaining} dimensions")
print(f"Budget saved vs full circuit: {budget_saved} dimensions")
print(f"\nBudget constraint (≤11,200): {'PASS' if minimal_budget <= BUDGET_LIMIT else 'FAIL'}")
```

---

## Submission Instructions

1. Complete all questions to the best of your ability
2. For multiple choice questions, clearly indicate your selected answer (A, B, C, or D)
3. For free generation questions, provide clear, concise explanations based on the documentation
4. For code questions, ensure your code runs without errors and produces the expected output
5. Save your completed notebook

## Grading Criteria

- **Factual Understanding (40%)**: Correct recall of information from documentation
- **Applied Reasoning (40%)**: Ability to analyze, generalize, and apply concepts
- **Code Implementation (20%)**: Correctness and clarity of code solutions

Good luck!
