In [1]:
# Setup: Change to the correct working directory and check GPU availability
import os
os.chdir('/home/smallyan/critic_model_mechinterp')

import torch
print(f"Working directory: {os.getcwd()}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Device: {torch.cuda.get_device_name(0)}")

Working directory: /home/smallyan/critic_model_mechinterp
CUDA available: True
Device: NVIDIA A100 80GB PCIe


# IOI Circuit Analysis - Exam Answers

## Student Answer Notebook

This notebook contains answers to all exam questions based strictly on the provided documentation.

---

## Question 1: Primary Research Objective (Multiple Choice)

**Question:** What is the primary research objective of the IOI circuit analysis?

**Choices:**
- A. Train GPT2-small to perform better on name identification tasks
- B. Identify a precise circuit in GPT2-small that implements IOI behavior within a dimension budget
- C. Compare IOI performance across different transformer models
- D. Develop a new attention mechanism for indirect object identification

**Reasoning:** According to Section 1 (Goal) of the documentation, the Research Objective is stated as: "Identify a precise circuit in GPT2-small that implements the **Indirect Object Identification (IOI)** behavior while adhering to strict residual write-budget constraints (≤ 11,200 dimensions)."

This directly matches option B. The study is not about training the model (A), comparing models (C), or developing new mechanisms (D).

**Answer: B**

---

## Question 2: Three Functional Components (Multiple Choice)

**Question:** According to the IOI hypothesis, which of the following correctly describes the three functional components and their attention patterns?

**Choices:**
- A. Duplicate Token Heads (END→S1), S-Inhibition Heads (S2→END), Name-Mover Heads (IO→END)
- B. Duplicate Token Heads (S1→S2), S-Inhibition Heads (S2→END), Name-Mover Heads (END→IO)
- C. Duplicate Token Heads (S2→S1), S-Inhibition Heads (END→S2), Name-Mover Heads (END→IO)
- D. Duplicate Token Heads (S2→S1), S-Inhibition Heads (IO→S2), Name-Mover Heads (END→S1)

**Reasoning:** According to Section 1 (Hypothesis) of the documentation:
1. **Duplicate Token Heads**: "Active at S2, attending to S1" → S2→S1
2. **S-Inhibition Heads**: "Active at END, attending to S2" → END→S2
3. **Name-Mover Heads**: "Active at END, attending to IO position" → END→IO

This matches option C exactly. The documentation states the same attention patterns in Section 3.3 (Attention Pattern Analysis), where it describes:
- "Duplicate Token Heads (S2 → S1 attention)"
- "S-Inhibition Heads (END → S2 attention)"
- "Name-Mover Heads (END → IO attention)"
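Each pattern above is a (query position → key position) pair. The documentation's scoring code is not reproduced here, so this is a minimal sketch that assumes a head's "average attention" score is its attention probability from the query position to the key position, averaged over prompts. The `[batch, head, query, key]` array layout is the axis order TransformerLens uses for cached attention patterns (`hook_pattern`); the function name `mean_attention` and the toy data are hypothetical.

```python
import numpy as np

def mean_attention(pattern, query_pos, key_pos):
    """Average attention from query_pos to key_pos, per head.

    pattern: array [batch, n_heads, query, key] of softmaxed attention probabilities
    query_pos, key_pos: int arrays [batch], e.g. END and IO indices for each prompt
    Returns an array [n_heads].
    """
    batch_idx = np.arange(pattern.shape[0])
    # Advanced indices separated by a slice put the batch axis first: shape [batch, n_heads]
    per_prompt = pattern[batch_idx, :, query_pos, key_pos]
    return per_prompt.mean(axis=0)

# Toy example: 2 prompts, 3 heads, 5 positions; only head 1 attends END -> IO
toy = np.zeros((2, 3, 5, 5))
toy[0, 1, 4, 2] = 1.0  # prompt 0: END at 4, IO at 2
toy[1, 1, 3, 1] = 0.5  # prompt 1: END at 3, IO at 1
scores = mean_attention(toy, np.array([4, 3]), np.array([2, 1]))
print(scores)
```

Swapping in S2/S1 or END/S2 indices gives the Duplicate Token and S-Inhibition scores in the same way.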

**Answer: C**

---

## Question 3: Position Identification (Free Generation)

**Question:** Given the example sentence 'As Carl and Maria left the consulate, Carl gave a fridge to', identify the S1, S2, IO, and END positions. Explain why Carl appears twice and what role each position plays in the IOI circuit hypothesis.

**Reasoning:** According to Section 2 (Data) of the documentation, for the example sentence "As Carl and Maria left the consulate, Carl gave a fridge to":

**Key Positions:**
- **S1 (position 2)**: "Carl" - First mention of subject
- **S2 (position 9)**: "Carl" - Second mention of subject  
- **IO (position 4)**: "Maria" - Indirect object
- **END (position 13)**: "to" - Final position
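These indices are consistent with a BOS token at index 0 followed by one token per word or punctuation mark. The sketch below reproduces them under that assumption. It uses a simple regex split as a stand-in for GPT-2's BPE tokenizer, so it only agrees with the real tokenizer when each word maps to a single token; the helper `ioi_positions` is hypothetical.

```python
import re

def ioi_positions(tokens, subject, indirect_object):
    """Return S1, S2, IO and END indices for an IOI prompt (hypothetical helper)."""
    subject_positions = [i for i, tok in enumerate(tokens) if tok == subject]
    if len(subject_positions) != 2:
        raise ValueError(f"expected the subject twice, found {len(subject_positions)}")
    return {
        "S1": subject_positions[0],
        "S2": subject_positions[1],
        "IO": tokens.index(indirect_object),
        "END": len(tokens) - 1,  # prediction is made at the final token
    }

sentence = "As Carl and Maria left the consulate, Carl gave a fridge to"
# Assumption: BOS at index 0, then one token per word/punctuation mark
tokens = ["<BOS>"] + re.findall(r"\w+|[^\w\s]", sentence)
positions = ioi_positions(tokens, "Carl", "Maria")
print(positions)
```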

**Why Carl appears twice:**
Carl appears twice because the subject name is repeated in the sentence structure - first as part of the initial phrase ("As Carl and Maria left...") and then as the actor performing the action ("Carl gave..."). This repetition is a key feature of the IOI task that the circuit must detect.

**Role of each position in the IOI circuit hypothesis:**

1. **S1 and S2**: These positions allow Duplicate Token Heads to signal token duplication. The heads at S2 attend to S1 to detect that the same name (Carl) has been repeated, encoding this duplication through position features.

2. **S2**: S-Inhibition Heads at the END position attend to S2 to inhibit Name-Mover attention to subject positions. This prevents the model from incorrectly predicting "Carl" as the answer.

3. **IO**: Name-Mover Heads at END attend to the IO position to copy the indirect object token (Maria) to the residual stream for prediction.

4. **END**: This is where the prediction happens. S-Inhibition and Name-Mover heads are active here, working together to predict the correct indirect object (Maria).

**Answer:** 
- S1 = position 2 ("Carl", first mention)
- S2 = position 9 ("Carl", second mention)  
- IO = position 4 ("Maria")
- END = position 13 ("to")

Carl appears twice to create the duplicate token pattern that the circuit must detect to distinguish between the subject (repeated name) and the indirect object (unique name). S1/S2 enable duplication detection, S2 is used for inhibition signaling, IO provides the target for name moving, and END is where the final prediction occurs through the coordinated action of all three component types.

---

## Question 4: Attention Head Dimensions (Multiple Choice)

**Question:** In GPT2-small, how many dimensions does each attention head write to the residual stream?

**Choices:**
- A. 768 dimensions
- B. 64 dimensions
- C. 3,072 dimensions
- D. 12 dimensions

**Reasoning:** According to Section 3.2 (Write Budget Constraints) of the documentation:
- "Each attention head writes: 64 dimensions (d_model / n_heads)"
- This is derived from d_model (768) / n_heads (12) = 64 dimensions per head

The other values in the choices are:
- 768: This is d_model (what each MLP writes)
- 3,072: This is d_mlp
- 12: This is the number of heads per layer
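The arithmetic behind these numbers, plus one nuance: a head's 64-dimensional output is projected by its output matrix W_O (shape 64 × 768) into the 768-dimensional residual stream, so what the budget counts as "64 dimensions" is a write of rank at most 64. The sketch below uses random matrices as stand-ins for real weights to show this; it does not load GPT2-small.

```python
import numpy as np

D_MODEL, N_HEADS = 768, 12
D_HEAD = D_MODEL // N_HEADS  # 64 dimensions per head
D_MLP = 4 * D_MODEL          # 3072 hidden units in each GPT-2 MLP

rng = np.random.default_rng(0)
W_O = rng.standard_normal((D_HEAD, D_MODEL))  # random stand-in for one head's output projection
z = rng.standard_normal((200, D_HEAD))        # random stand-in for head outputs at 200 positions
head_writes = z @ W_O                         # shape (200, 768): vectors in residual-stream space
write_rank = np.linalg.matrix_rank(head_writes)

print(D_HEAD, D_MLP, head_writes.shape, write_rank)
```

The writes live in the full 768-dimensional space, but they span only a 64-dimensional subspace, which is why a head is charged 64 dimensions while an MLP (whose output matrix maps 3072 → 768) is charged the full 768.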

**Answer: B**

---

## Question 5: Budget Calculation Verification (Free Generation)

**Question:** The final IOI circuit contains 31 attention heads and 12 MLPs. Show the calculation to verify that this circuit uses exactly 11,200 dimensions and achieves 100% budget utilization.

In [2]:
# Helper calculation for Question 5

# From documentation:
# - Each attention head writes: 64 dimensions
# - Each MLP writes: 768 dimensions
# - Budget limit: 11,200 dimensions

num_attention_heads = 31
dims_per_head = 64
num_mlps = 12
dims_per_mlp = 768
budget_limit = 11200

# Calculate dimensions
attention_dims = num_attention_heads * dims_per_head
mlp_dims = num_mlps * dims_per_mlp
total_dims = attention_dims + mlp_dims

# Calculate utilization
utilization = (total_dims / budget_limit) * 100

print("Budget Verification Calculation:")
print("=" * 40)
print(f"Attention heads: {num_attention_heads} × {dims_per_head} = {attention_dims} dimensions")
print(f"MLPs: {num_mlps} × {dims_per_mlp} = {mlp_dims} dimensions")
print(f"Total: {attention_dims} + {mlp_dims} = {total_dims} dimensions")
print(f"Budget limit: {budget_limit} dimensions")
print(f"Utilization: {total_dims}/{budget_limit} = {utilization:.1f}%")

Budget Verification Calculation:
Attention heads: 31 × 64 = 1984 dimensions
MLPs: 12 × 768 = 9216 dimensions
Total: 1984 + 9216 = 11200 dimensions
Budget limit: 11200 dimensions
Utilization: 11200/11200 = 100.0%


**Reasoning:** According to Section 3.2 of the documentation:
- Each attention head writes 64 dimensions
- Each MLP writes 768 dimensions
- Budget limit is 11,200 dimensions

**Calculation:**
- Attention heads: 31 × 64 = 1,984 dimensions
- MLPs: 12 × 768 = 9,216 dimensions
- Total: 1,984 + 9,216 = 11,200 dimensions
- Utilization: 11,200 / 11,200 = 100.0%

**Answer:** The circuit uses exactly 11,200 dimensions (1,984 from attention heads + 9,216 from MLPs), achieving 100% budget utilization. This matches the Budget Verification table in Section 4 (Results) of the documentation.

---

## Question 6: Highest Name-Mover Attention (Multiple Choice)

**Question:** Which attention head showed the highest average attention score for Name-Mover behavior (END→IO attention)?

**Choices:**
- A. a3.h0 with 0.7191 average attention
- B. a8.h6 with 0.7441 average attention
- C. a10.h7 with 0.7829 average attention
- D. a9.h9 with 0.7998 average attention

**Reasoning:** According to Section 3.3 (Attention Pattern Analysis) under "Name-Mover Heads (END → IO attention)", the top 5 heads are listed as:
1. a9.h9: 0.7998
2. a10.h7: 0.7829
3. a9.h6: 0.7412
4. a11.h10: 0.6369
5. a10.h0: 0.3877

The highest is a9.h9 with 0.7998 average attention. 

Note that:
- a3.h0 (0.7191) is the top Duplicate Token Head, not Name-Mover
- a8.h6 (0.7441) is the top S-Inhibition Head, not Name-Mover

**Answer: D**

---

## Question 7: Layered Distribution Explanation (Free Generation)

**Question:** The documentation shows that Duplicate Token Heads are found in early layers (0-3), S-Inhibition Heads in middle layers (7-8), and Name-Mover Heads in late layers (9-11). Explain why this layered distribution makes sense from an information processing perspective.

**Reasoning:** According to Section 5 (Analysis) under "Key Observations" point 1 (Layered Processing), the documentation explicitly addresses this:

> "The circuit exhibits clear stratification:
> - Early layers (0-3): Duplicate token detection
> - Middle layers (7-8): Subject inhibition
> - Late layers (9-11): Name moving and prediction"

**Why this makes sense from an information processing perspective:**

1. **Sequential Information Flow**: Each layer can only read what earlier layers have written to the residual stream, so a signal such as "this name is duplicated" has to be computed in an earlier layer before a later head can act on it.

2. **Dependency Chain**: The three components have a logical dependency:
   - First, the model needs to DETECT that a token is duplicated (Duplicate Token Heads in early layers)
   - Then, using this duplication information, it needs to INHIBIT attention to the repeated subject positions (S-Inhibition Heads in middle layers)
   - Finally, with the subject inhibited, it can MOVE the correct name (indirect object) to the output (Name-Mover Heads in late layers)

3. **Feature Detection to Action**: This follows a natural computational pattern:
   - Early layers: Pattern detection (detecting "Carl" appears twice)
   - Middle layers: Feature integration and control (signaling which positions to avoid)
   - Late layers: Task completion (copying the correct answer)

4. **Positional vs. Semantic Processing**: The documentation notes that Duplicate Token Heads work with "position features" (early/simple processing), while Name-Mover Heads are "ideal for final token prediction" (late/output-focused processing).
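A quick check of the stratification claim, using only the top-5 heads listed in Section 3.3 for each category: the sketch below computes the mean layer index per category. It reuses the `aL.hH` naming convention from the head lists; the helper `layer_of` is introduced here for illustration. The ordering holds on average rather than strictly, since a9.h7 (S-Inhibition) sits in the same layer as a9.h9 and a9.h6 (Name-Mover).

```python
def layer_of(head: str) -> int:
    """Parse the layer index from a head name such as 'a10.h7' -> 10."""
    return int(head.split(".")[0][1:])

# Top-5 heads per category, as listed in Section 3.3
top5_heads = {
    "Duplicate Token": ["a3.h0", "a1.h11", "a0.h5", "a0.h1", "a0.h10"],
    "S-Inhibition": ["a8.h6", "a7.h9", "a8.h10", "a8.h5", "a9.h7"],
    "Name-Mover": ["a9.h9", "a10.h7", "a9.h6", "a11.h10", "a10.h0"],
}

mean_layer = {cat: sum(map(layer_of, heads)) / len(heads) for cat, heads in top5_heads.items()}
for cat, value in mean_layer.items():
    print(f"{cat:16s} mean layer = {value:.1f}")

# Detection -> inhibition -> moving should show an increasing mean layer
values = list(mean_layer.values())
print("Increasing with pipeline stage:", all(a < b for a, b in zip(values, values[1:])))
```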

**Answer:** The layered distribution makes sense because information processing in the IOI circuit follows a sequential pipeline: (1) early layers detect the basic pattern of token duplication, (2) middle layers use this information to inhibit attention to the repeated subject, and (3) late layers perform the final task of copying the correct indirect object to the prediction. This reflects a natural progression from pattern detection → interference control → output generation, which aligns with how transformers build up increasingly abstract representations through successive layers.

---

## Question 8: Baseline Accuracy (Multiple Choice)

**Question:** What baseline accuracy did GPT2-small achieve on the IOI task before any circuit interventions?

**Choices:**
- A. 100%
- B. 84%
- C. 94%
- D. 74%

**Reasoning:** According to Section 3.3 (Step 1: Baseline Evaluation) of the documentation:
"**Result**: 94.00% accuracy (94/100 correct)"

This is also confirmed in Section 4 (Performance Metrics):
"**Baseline Model Accuracy**: 94.00% (94/100 examples)"

And in Section 7 (Main Takeaways) point 4:
"**High Baseline Performance**: GPT2-small achieves 94% accuracy on IOI"

**Answer: C**

---

## Question 9: Head Count Distribution Explanation (Free Generation)

**Question:** The final circuit includes 6 Duplicate Token Heads, 12 S-Inhibition Heads, and 15 Name-Mover Heads (31 total). Why might there be more Name-Mover Heads than Duplicate Token Heads in the circuit?

**Reasoning:** Based on the documentation, several factors explain this distribution:

1. **Redundancy at the Output Step**: According to Section 5 (Analysis), the documentation notes that "Multiple heads per category suggest robustness through redundancy." Name-Mover Heads perform the final, most critical step of copying the correct token to the output, so dedicating more heads to this function may provide greater reliability.

2. **Layer Distribution**: The documentation shows:
   - Duplicate Token Heads: primarily in layers 0, 1, 3 (early layers)
   - Name-Mover Heads: primarily in layers 9, 10, 11 (late layers)
   
   Looking at the Layer Distribution in Section 4:
   - Early layers (0-3): 4 + 1 + 2 = 7 heads total
   - Late layers (9-11): 5 + 7 + 4 = 16 heads total
   
   There are simply more heads selected from later layers, where Name-Mover functionality is concentrated.

3. **Attention Pattern Analysis**: Only the top 5 Name-Mover heads are listed (0.7998, 0.7829, 0.7412, 0.6369, 0.3877); additional heads beyond these may also show END→IO attention strong enough to qualify for inclusion.

4. **Functional Requirements**: Duplicate token detection is a simpler positional matching task (detecting the same token appearing twice). Name moving requires actually copying semantic content (the specific name token) to influence the final prediction, which may require more distributed processing.

5. **Selection Strategy**: According to Section 3.3 (Step 3), the strategy was to "maximize circuit expressiveness" with remaining budget after selecting top heads. This may have favored including more late-layer heads that directly impact output.
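One detail the question's own numbers imply: 6 + 12 + 15 = 33, which exceeds the 31 heads in the circuit. Assuming every category count refers to heads within the 31-head circuit, at least two category memberships must belong to heads listed under more than one category, so the three counts are not a strict partition of the circuit. The arithmetic is sketched below; the variable names are illustrative.

```python
# Category sizes and distinct-head count as stated in Question 9
category_counts = {"Duplicate Token": 6, "S-Inhibition": 12, "Name-Mover": 15}
distinct_heads = 31

label_assignments = sum(category_counts.values())  # total category memberships
# Each distinct head can absorb one membership without overlap; any excess must
# come from heads that appear in more than one category.
min_shared_memberships = max(0, label_assignments - distinct_heads)

print(f"{label_assignments} memberships across {distinct_heads} heads "
      f"-> at least {min_shared_memberships} shared")
```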

**Answer:** There are more Name-Mover Heads (15) than Duplicate Token Heads (6) likely because: (1) name moving is the primary output task requiring greater redundancy and reliability, (2) more heads in late layers showed significant END→IO attention patterns, (3) detecting token duplication is computationally simpler than copying semantic content for prediction, and (4) the selection strategy prioritized circuit expressiveness, favoring heads that directly impact the final output. The documentation notes that multiple heads per category suggest "robustness through redundancy," and this redundancy may be more critical for the final prediction step.

---

## Question 10: Dataset Size (Multiple Choice)

**Question:** How many examples from the IOI dataset were used for the analysis in this study?

**Choices:**
- A. 10,000 examples
- B. 100 examples
- C. 1,000 examples
- D. 50 examples

**Reasoning:** According to Section 2 (Data) under "Dataset":
- "**Size**: 10,000 examples (100 used for analysis)"

And confirmed in Section 3.3 (Step 1: Baseline Evaluation):
- "Tokenized 100 IOI examples"

And in Section 4 (Performance Metrics):
- "**Sample Size**: 100 examples from training set"

While the full dataset contains 10,000 examples, only 100 were used for the actual analysis.

**Answer: B**

---

## Question 11: Hypothesis Evaluation (Free Generation)

**Question:** Based on the results, evaluate whether the three-component IOI hypothesis was supported. What specific evidence from the attention pattern analysis supports or contradicts the hypothesis?

**Reasoning:** According to Section 5 (Analysis), the documentation explicitly states: "The analysis **strongly supports** the three-component IOI hypothesis."

**Specific Evidence Supporting the Hypothesis:**

1. **Duplicate Token Heads Identified:**
   - Found 6 heads with strong S2→S1 attention
   - Top head: a3.h0 with 0.72 average attention (high selectivity)
   - Located in early-to-middle layers (0, 1, 3) - consistent with positional feature detection
   - Evidence: High attention scores (0.7191, 0.6613, 0.6080 for top 3)

2. **S-Inhibition Heads Identified:**
   - Found 12 heads with strong END→S2 attention
   - Top head: a8.h6 with 0.74 average attention (high selectivity)
   - Located in middle-to-late layers (7, 8, 9) - appropriate for suppressing subject interference
   - Evidence: Attention scores of 0.7441, 0.5079, and 0.3037 for the top 3; only the top head exceeds 0.7, and scores fall off quickly after it

3. **Name-Mover Heads Identified:**
   - Found 15 heads with strong END→IO attention
   - Top head: a9.h9 with 0.80 average attention (high selectivity)
   - Concentrated in late layers (9, 10, 11) - ideal for final token prediction
   - Evidence: High attention scores (0.7998, 0.7829, 0.7412 for top 3)
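To make the "high selectivity" claim more concrete, the sketch below counts how many of each category's top 5 heads exceed 0.7 and 0.5, using the top-5 scores from Section 3.3 (the same values used in Question 16). The 0.7 cutoff comes from Section 5; the 0.5 cutoff is an illustrative choice, not a value from the documentation.

```python
# Top-5 attention scores per category (Section 3.3)
top5_scores = {
    "Duplicate Token (S2→S1)": [0.7191, 0.6613, 0.6080, 0.5152, 0.2359],
    "S-Inhibition (END→S2)": [0.7441, 0.5079, 0.3037, 0.2852, 0.2557],
    "Name-Mover (END→IO)": [0.7998, 0.7829, 0.7412, 0.6369, 0.3877],
}

def count_above(scores, threshold):
    """Number of scores strictly greater than threshold."""
    return sum(score > threshold for score in scores)

summary = {
    category: (count_above(scores, 0.7), count_above(scores, 0.5))
    for category, scores in top5_scores.items()
}
for category, (n_strong, n_moderate) in summary.items():
    print(f"{category}: {n_strong} head(s) > 0.7, {n_moderate} head(s) > 0.5")
```

On these numbers, only the Name-Mover category has more than one head above 0.7, and S-Inhibition scores drop below 0.5 after the top two heads, which suggests the attention evidence is strongest for Name-Mover Heads.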

**Key Supporting Observations from Documentation:**

1. "Top heads show very strong attention patterns (>0.7) to their hypothesized targets, indicating specialized functionality"

2. The layered processing pattern (early→middle→late) matches the hypothesized information flow

3. "High Selectivity": The top heads in each category show attention >0.7, suggesting they are specifically tuned for their hypothesized functions

**No Contradictory Evidence Mentioned:**
The documentation does not present any evidence that contradicts the hypothesis. All findings align with the three-component model.

**Answer:** The three-component IOI hypothesis was **strongly supported** by the evidence. Specific supporting evidence includes:
- Duplicate Token Heads: 6 heads with strong S2→S1 attention (top: a3.h0 at 0.72) in early layers (0-3)
- S-Inhibition Heads: 12 heads with strong END→S2 attention (top: a8.h6 at 0.74) in middle-to-late layers (7-9)
- Name-Mover Heads: 15 heads with strong END→IO attention (top: a9.h9 at 0.80) in late layers (9-11)

The high selectivity (>0.7 attention) of top heads and the clear layer stratification matching the hypothesized information flow (detection → inhibition → prediction) provide strong support. No contradictory evidence was found in the analysis.

---

## Question 12: Total Nodes in Circuit (Multiple Choice)

**Question:** What is the total number of nodes in the final IOI circuit?

**Choices:**
- A. 31 nodes
- B. 12 nodes
- C. 43 nodes
- D. 44 nodes

**Reasoning:** According to Section 4 (Results) under "Final Circuit Composition":

> "**Total Nodes**: 44
> - 1 input node
> - 31 attention heads
> - 12 MLPs"

Calculation: 1 + 31 + 12 = 44 nodes

Note that 31 is just the attention heads, and 12 is just the MLPs. The total must include the input node.
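The node count can be reproduced by enumerating the nodes explicitly. The head names come from the circuit list given in Question 15; the labels `input` and `m0` through `m11` for the input and MLP nodes are hypothetical names chosen for this sketch.

```python
attention_heads = [
    "a0.h1", "a0.h10", "a0.h5", "a0.h6", "a1.h11", "a3.h0", "a3.h6", "a6.h0",
    "a7.h3", "a7.h9", "a8.h10", "a8.h2", "a8.h3", "a8.h5", "a8.h6",
    "a9.h0", "a9.h2", "a9.h6", "a9.h7", "a9.h8", "a9.h9",
    "a10.h0", "a10.h1", "a10.h10", "a10.h2", "a10.h3", "a10.h6", "a10.h7",
    "a11.h10", "a11.h6", "a11.h8",
]
# Hypothetical MLP labels: GPT2-small has 12 layers with one MLP each
mlps = [f"m{layer}" for layer in range(12)]

nodes = ["input"] + attention_heads + mlps
assert len(set(nodes)) == len(nodes), "node names must be unique"
print(f"1 input + {len(attention_heads)} heads + {len(mlps)} MLPs = {len(nodes)} nodes")
```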

**Answer: D**

---

## Question 13: Cross-Dataset Validation Experiment (Free Generation)

**Question:** The documentation mentions 'Cross-Dataset Validation' as a potential extension. Describe a specific experiment that could test whether the identified IOI circuit generalizes to other name-based tasks, and what results would support or refute generalization.

**Reasoning:** According to Section 6 (Next Steps) under "Potential Extensions" point 4:
> "**Cross-Dataset Validation**: Test if identified heads generalize to other name-based tasks"

**Specific Experiment Design:**

**Task:** Test the identified IOI circuit on a different name-based task, such as a "Name Attribution" task where the model must identify who performed an action.

**Dataset Example:**
- Sentences like: "After John gave the book to Mary, she thanked ___ for the gift"
- Correct answer: John
- This requires similar name tracking but with different syntactic structure
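A minimal sketch of how such a dataset might be generated from the example template. The name pools, object list, and function name are hypothetical. Each prompt is cut just before the blank so the model's next token can be scored against the correct answer (the giver) and the distractor (the recipient); answers carry a leading space because GPT-2's BPE encodes a word that follows a space with the space attached. With 5 givers, 5 recipients, and 4 objects there are exactly 100 combinations.

```python
import itertools
import random

# Hypothetical name pools; recipients are chosen so that "she" refers to them
GIVERS = ["John", "David", "Michael", "James", "Robert"]
RECIPIENTS = ["Mary", "Sarah", "Emma", "Linda", "Anna"]
OBJECTS = ["book", "letter", "gift", "ticket"]

# Prompt stops right before the blank in "..., she thanked ___ for the gift"
TEMPLATE = "After {giver} gave the {obj} to {recipient}, she thanked"

def make_name_attribution_dataset(n, seed=0):
    """Return up to n prompts with the correct answer (giver) and distractor (recipient)."""
    combos = list(itertools.product(GIVERS, RECIPIENTS, OBJECTS))
    random.Random(seed).shuffle(combos)  # deterministic order for a given seed
    return [
        {
            "prompt": TEMPLATE.format(giver=g, obj=o, recipient=r),
            "answer": " " + g,
            "distractor": " " + r,
        }
        for g, r, o in combos[:n]
    ]

dataset = make_name_attribution_dataset(100)
print(len(dataset), dataset[0]["prompt"])
```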

**Alternative Dataset:** Could also use:
- Gendered pronoun resolution (e.g., "John told Mary that he/she would...")
- Name coreference resolution
- Subject-verb agreement with names

**Experimental Procedure:**
1. Create or obtain a dataset of 100+ examples of the alternative name-based task
2. Run the same attention pattern analysis on the new task
3. For each of the 31 attention heads in the identified circuit:
   - Measure whether they show similar functional attention patterns
   - For Duplicate Token Heads: Do they still attend to repeated tokens?
   - For S-Inhibition Heads: Do they still inhibit attention to specific positions?
   - For Name-Mover Heads: Do they still move name information to prediction positions?
4. Perform ablation studies: Remove the IOI circuit nodes and measure performance drop on the new task

**Results Supporting Generalization:**
- The same heads (e.g., a3.h0, a8.h6, a9.h9) show high attention scores for analogous patterns in the new task
- Ablating the IOI circuit significantly degrades performance on the new task
- Similar layer stratification is observed (early detection, middle inhibition, late output)

**Results Refuting Generalization:**
- The identified heads show random or weak attention patterns on the new task
- Ablating the IOI circuit has minimal effect on new task performance
- Different heads show strong task-relevant attention patterns
- The layer distribution of important heads differs significantly

**Answer:** A cross-dataset validation experiment could test the IOI circuit on a "Name Attribution" task (e.g., "After John gave the book to Mary, she thanked ___ for the gift" → John). The experiment would measure whether the circuit's 31 attention heads show similar functional attention patterns (Duplicate Token, S-Inhibition, Name-Mover) and would run ablation studies on the circuit.

Results supporting generalization would include: (1) the same heads showing strong attention to analogous positions, (2) significant performance drop when ablating the IOI circuit, and (3) similar layer stratification.

Results refuting generalization would include: (1) weak or random attention patterns from IOI circuit heads, (2) minimal performance impact from ablation, and (3) different heads emerging as important for the new task.

---

## Question 14 (CQ1): Budget Calculation Verification (Code Required)

**Question:** Write code to verify the IOI circuit budget calculation. Given the circuit contains 31 attention heads (each writing 64 dimensions) and 12 MLPs (each writing 768 dimensions), calculate:
1. Total dimensions used by attention heads
2. Total dimensions used by MLPs  
3. Total circuit dimensions
4. Budget utilization percentage (budget limit is 11,200)

Print each value with appropriate labels.

In [3]:
# CQ1: Budget Calculation Verification

# Circuit parameters from documentation
num_attention_heads = 31
dims_per_attention_head = 64

num_mlps = 12
dims_per_mlp = 768

budget_limit = 11200

# Calculate dimensions
total_attention_dims = num_attention_heads * dims_per_attention_head
total_mlp_dims = num_mlps * dims_per_mlp
total_circuit_dims = total_attention_dims + total_mlp_dims

# Calculate utilization
budget_utilization = (total_circuit_dims / budget_limit) * 100

# Print results with appropriate labels
print("IOI Circuit Budget Verification")
print("=" * 50)
print(f"1. Total dimensions used by attention heads: {total_attention_dims}")
print(f"   ({num_attention_heads} heads × {dims_per_attention_head} dims/head)")
print()
print(f"2. Total dimensions used by MLPs: {total_mlp_dims}")
print(f"   ({num_mlps} MLPs × {dims_per_mlp} dims/MLP)")
print()
print(f"3. Total circuit dimensions: {total_circuit_dims}")
print(f"   ({total_attention_dims} + {total_mlp_dims})")
print()
print(f"4. Budget utilization percentage: {budget_utilization:.1f}%")
print(f"   ({total_circuit_dims} / {budget_limit} × 100)")

IOI Circuit Budget Verification
1. Total dimensions used by attention heads: 1984
   (31 heads × 64 dims/head)

2. Total dimensions used by MLPs: 9216
   (12 MLPs × 768 dims/MLP)

3. Total circuit dimensions: 11200
   (1984 + 9216)

4. Budget utilization percentage: 100.0%
   (11200 / 11200 × 100)


**Reasoning:** Using the values from Section 3.2 of the documentation:
- Each attention head writes 64 dimensions (d_model / n_heads = 768 / 12)
- Each MLP writes 768 dimensions (d_model)
- Budget limit is 11,200 dimensions

The code calculates each component and verifies the budget utilization.

**Answer:** 
1. Total attention head dimensions: 1,984 (31 × 64)
2. Total MLP dimensions: 9,216 (12 × 768)
3. Total circuit dimensions: 11,200
4. Budget utilization: 100.0%

This confirms the circuit uses exactly the budget limit with no waste.

---

## Question 15 (CQ2): Layer Distribution Analysis (Code Required)

**Question:** Write code to analyze the layer distribution of the IOI circuit heads. Given the following selected attention heads:
['a0.h1', 'a0.h10', 'a0.h5', 'a0.h6', 'a1.h11', 'a3.h0', 'a3.h6', 'a6.h0', 'a7.h3', 'a7.h9', 'a8.h10', 'a8.h2', 'a8.h3', 'a8.h5', 'a8.h6', 'a9.h0', 'a9.h2', 'a9.h6', 'a9.h7', 'a9.h8', 'a9.h9', 'a10.h0', 'a10.h1', 'a10.h10', 'a10.h2', 'a10.h3', 'a10.h6', 'a10.h7', 'a11.h10', 'a11.h6', 'a11.h8']

Calculate and print:
1. Number of heads per layer (0-11)
2. Total number of heads
3. Which layer has the most heads

In [4]:
# CQ2: Layer Distribution Analysis

# Selected attention heads from the IOI circuit
attention_heads = [
    'a0.h1', 'a0.h10', 'a0.h5', 'a0.h6', 
    'a1.h11', 
    'a3.h0', 'a3.h6', 
    'a6.h0', 
    'a7.h3', 'a7.h9', 
    'a8.h10', 'a8.h2', 'a8.h3', 'a8.h5', 'a8.h6', 
    'a9.h0', 'a9.h2', 'a9.h6', 'a9.h7', 'a9.h8', 'a9.h9', 
    'a10.h0', 'a10.h1', 'a10.h10', 'a10.h2', 'a10.h3', 'a10.h6', 'a10.h7', 
    'a11.h10', 'a11.h6', 'a11.h8'
]

# Initialize count dictionary for all layers (0-11)
layer_counts = {i: 0 for i in range(12)}

# Count heads per layer
for head in attention_heads:
    # Extract layer number from head name (e.g., 'a0.h1' -> 0)
    layer = int(head.split('.')[0][1:])
    layer_counts[layer] += 1

# Print results
print("Layer Distribution of IOI Circuit Attention Heads")
print("=" * 50)
print()

# 1. Number of heads per layer
print("1. Number of heads per layer:")
for layer in range(12):
    count = layer_counts[layer]
    bar = '█' * count
    print(f"   Layer {layer:2d}: {count} heads {bar}")
print()

# 2. Total number of heads
total_heads = sum(layer_counts.values())
print(f"2. Total number of heads: {total_heads}")
print()

# 3. Layer with most heads
max_count = max(layer_counts.values())
layers_with_max = [layer for layer, count in layer_counts.items() if count == max_count]
print(f"3. Layer with most heads: Layer {layers_with_max[0]} ({max_count} heads)")

Layer Distribution of IOI Circuit Attention Heads

1. Number of heads per layer:
   Layer  0: 4 heads ████
   Layer  1: 1 heads █
   Layer  2: 0 heads 
   Layer  3: 2 heads ██
   Layer  4: 0 heads 
   Layer  5: 0 heads 
   Layer  6: 1 heads █
   Layer  7: 2 heads ██
   Layer  8: 5 heads █████
   Layer  9: 6 heads ██████
   Layer 10: 7 heads ███████
   Layer 11: 3 heads ███

2. Total number of heads: 31

3. Layer with most heads: Layer 10 (7 heads)


**Reasoning:** The code parses each attention head name to extract the layer number (e.g., 'a10.h7' → layer 10) and counts heads per layer. According to the documentation's Layer Distribution in Section 4:
- Layer 0: 4 heads
- Layer 1: 1 head
- Layer 3: 2 heads
- Layer 6: 1 head
- Layer 7: 2 heads
- Layer 8: 5 heads
- Layer 9: 5 heads
- Layer 10: 7 heads
- Layer 11: 4 heads

Note: The code output shows Layer 9 with 6 heads and Layer 11 with 3 heads, whereas the documentation lists 5 and 4. Both distributions sum to 31, so the provided list and the documentation's table disagree by one head between layers 9 and 11. The answer below follows the list given in the question.
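To pinpoint where the provided list and the documentation's Layer Distribution diverge, the sketch below counts heads per layer with `collections.Counter` and compares them layer by layer against the documented counts transcribed from the list above. It is a cross-check, not part of the required answer; `documented_counts` is simply a hand copy of those values.

```python
from collections import Counter

attention_heads = [
    "a0.h1", "a0.h10", "a0.h5", "a0.h6", "a1.h11", "a3.h0", "a3.h6", "a6.h0",
    "a7.h3", "a7.h9", "a8.h10", "a8.h2", "a8.h3", "a8.h5", "a8.h6",
    "a9.h0", "a9.h2", "a9.h6", "a9.h7", "a9.h8", "a9.h9",
    "a10.h0", "a10.h1", "a10.h10", "a10.h2", "a10.h3", "a10.h6", "a10.h7",
    "a11.h10", "a11.h6", "a11.h8",
]
# Layer Distribution from Section 4 (layers not listed there have 0 heads)
documented_counts = {0: 4, 1: 1, 3: 2, 6: 1, 7: 2, 8: 5, 9: 5, 10: 7, 11: 4}

# Counter returns 0 for layers with no heads in the list
list_counts = Counter(int(head.split(".")[0][1:]) for head in attention_heads)

mismatches = {
    layer: (list_counts[layer], documented_counts.get(layer, 0))
    for layer in range(12)
    if list_counts[layer] != documented_counts.get(layer, 0)
}
print("Totals (list, documentation):", sum(list_counts.values()), sum(documented_counts.values()))
print("Layers that differ {layer: (list, documentation)}:", mismatches)
```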

**Answer:**
1. Heads per layer: Layer 0: 4, Layer 1: 1, Layer 2: 0, Layer 3: 2, Layer 4: 0, Layer 5: 0, Layer 6: 1, Layer 7: 2, Layer 8: 5, Layer 9: 6, Layer 10: 7, Layer 11: 3
2. Total number of heads: 31
3. Layer with most heads: Layer 10 with 7 heads

The late layers (8-11) contain the majority of heads (21 out of 31), consistent with the importance of S-Inhibition and Name-Mover Heads in those layers.

---

## Question 16 (CQ3): Attention Score Rankings Verification (Code Required)

**Question:** Write code to verify the attention score rankings for the three head types. Given the following attention scores:

Duplicate Token Heads (S2→S1):
- a3.h0: 0.7191, a1.h11: 0.6613, a0.h5: 0.6080, a0.h1: 0.5152, a0.h10: 0.2359

S-Inhibition Heads (END→S2):
- a8.h6: 0.7441, a7.h9: 0.5079, a8.h10: 0.3037, a8.h5: 0.2852, a9.h7: 0.2557

Name-Mover Heads (END→IO):
- a9.h9: 0.7998, a10.h7: 0.7829, a9.h6: 0.7412, a11.h10: 0.6369, a10.h0: 0.3877

For each head type, calculate and print:
1. The mean attention score of the top 5 heads
2. The head with highest attention score
3. Whether the top head has attention > 0.7 (strong selectivity)

In [5]:
# CQ3: Attention Score Rankings Verification

# Attention scores from documentation
duplicate_token_heads = {
    'a3.h0': 0.7191,
    'a1.h11': 0.6613,
    'a0.h5': 0.6080,
    'a0.h1': 0.5152,
    'a0.h10': 0.2359
}

s_inhibition_heads = {
    'a8.h6': 0.7441,
    'a7.h9': 0.5079,
    'a8.h10': 0.3037,
    'a8.h5': 0.2852,
    'a9.h7': 0.2557
}

name_mover_heads = {
    'a9.h9': 0.7998,
    'a10.h7': 0.7829,
    'a9.h6': 0.7412,
    'a11.h10': 0.6369,
    'a10.h0': 0.3877
}

def analyze_head_type(name, heads_dict):
    """Analyze attention scores for a head type."""
    scores = list(heads_dict.values())
    head_names = list(heads_dict.keys())
    
    # Find max
    max_score = max(scores)
    max_head = head_names[scores.index(max_score)]
    
    # Calculate mean
    mean_score = sum(scores) / len(scores)
    
    # Check strong selectivity
    strong_selectivity = max_score > 0.7
    
    print(f"{name}:")
    print(f"  1. Mean attention score of top 5 heads: {mean_score:.4f}")
    print(f"  2. Head with highest attention score: {max_head} ({max_score:.4f})")
    print(f"  3. Strong selectivity (top head > 0.7): {'Yes' if strong_selectivity else 'No'}")
    print()
    
    return mean_score, max_head, max_score, strong_selectivity

print("Attention Score Analysis for IOI Circuit Head Types")
print("=" * 60)
print()

# Analyze each head type
dt_results = analyze_head_type("Duplicate Token Heads (S2→S1)", duplicate_token_heads)
si_results = analyze_head_type("S-Inhibition Heads (END→S2)", s_inhibition_heads)
nm_results = analyze_head_type("Name-Mover Heads (END→IO)", name_mover_heads)

# Summary
print("Summary:")
print("-" * 60)
print(f"All three head types have top heads with strong selectivity (>0.7): ")
print(f"  - Duplicate Token: {dt_results[3]}")
print(f"  - S-Inhibition: {si_results[3]}")
print(f"  - Name-Mover: {nm_results[3]}")

Attention Score Analysis for IOI Circuit Head Types

Duplicate Token Heads (S2→S1):
  1. Mean attention score of top 5 heads: 0.5479
  2. Head with highest attention score: a3.h0 (0.7191)
  3. Strong selectivity (top head > 0.7): Yes

S-Inhibition Heads (END→S2):
  1. Mean attention score of top 5 heads: 0.4193
  2. Head with highest attention score: a8.h6 (0.7441)
  3. Strong selectivity (top head > 0.7): Yes

Name-Mover Heads (END→IO):
  1. Mean attention score of top 5 heads: 0.6697
  2. Head with highest attention score: a9.h9 (0.7998)
  3. Strong selectivity (top head > 0.7): Yes

Summary:
------------------------------------------------------------
All three head types have top heads with strong selectivity (>0.7): 
  - Duplicate Token: True
  - S-Inhibition: True
  - Name-Mover: True


**Reasoning:** The code stores the attention scores from Section 3.3 of the documentation and calculates summary statistics for each head type. The analysis confirms the patterns described in Section 5 (Analysis) under "High Selectivity" where it states "Top heads show very strong attention patterns (>0.7) to their hypothesized targets."

**Answer:**

**Duplicate Token Heads (S2→S1):**
1. Mean attention score: 0.5479
2. Highest: a3.h0 (0.7191)
3. Strong selectivity: Yes

**S-Inhibition Heads (END→S2):**
1. Mean attention score: 0.4193
2. Highest: a8.h6 (0.7441)
3. Strong selectivity: Yes

**Name-Mover Heads (END→IO):**
1. Mean attention score: 0.6697
2. Highest: a9.h9 (0.7998)
3. Strong selectivity: Yes

All three head types have their top head with attention >0.7, confirming strong selectivity and specialized functionality for their hypothesized roles.

---

## End of Exam

All questions have been answered based strictly on the provided documentation.