```python
import os
os.chdir('/home/smallyan/critic_model_mechinterp')
print(f"Current working directory: {os.getcwd()}")

# Check if CUDA is available
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
```

```text
Current working directory: /home/smallyan/critic_model_mechinterp
CUDA available: True
CUDA device: NVIDIA A100 80GB PCIe
```


# IOI Circuit Analysis Exam - Student Answers

Student: Claude (AI Student)  
Date: 2025-11-14  
Documentation Source: `/home/smallyan/critic_model_mechinterp/runs/circuits_claude_2025-11-09_14-46-37/logs/documentation.md`

---

## Part 1: Factual Understanding

### Question 1

**Question:** What is the primary goal of the IOI (Indirect Object Identification) task?

**Options:**
- A. To predict the indirect object at the end of a sentence where two names appear, with one name repeated.
- B. To identify which subject performed an action in a sentence.
- C. To determine the direct object being transferred in a sentence.
- D. To classify whether a sentence contains duplicate tokens.

**Reasoning:**
According to Section 1 (Goal) of the documentation, the research objective is to "Identify a precise circuit in GPT2-small that implements the **Indirect Object Identification (IOI)** behavior." Section 2 (Data) explicitly states that the task is to "Predict the indirect object at the end of a sentence" and provides the example structure where two names appear with one being repeated (the subject S appears at S1 and S2, while the indirect object IO appears once). The correct answer must be the one that predicts the indirect object in a sentence with two names where one is repeated.

**Answer:** A

### Question 2

**Question:** In the example sentence 'As Carl and Maria left the consulate, Carl gave a fridge to ___', what are the positions S1, S2, and IO respectively?

**Options:**
- A. S1 = position 2 (first 'Carl'), S2 = position 9 (second 'Carl'), IO = position 4 ('Maria')
- B. S1 = position 4 ('Maria'), S2 = position 9 (second 'Carl'), IO = position 2 (first 'Carl')
- C. S1 = position 2 (first 'Carl'), S2 = position 4 ('Maria'), IO = position 9 (second 'Carl')
- D. S1 = position 9 (second 'Carl'), S2 = position 2 (first 'Carl'), IO = position 13 ('to')

**Reasoning:**
Section 2 (Data) under "Example Sentence Structure" provides this exact example and explicitly states:
- "**S1**: First mention of subject (position 2: 'Carl')"
- "**S2**: Second mention of subject (position 9: 'Carl')"
- "**IO**: Indirect object (position 4: 'Maria')"
- "**END**: Final position (position 13: 'to')"

The documentation clearly identifies S1 as the first Carl at position 2, S2 as the second Carl at position 9, and IO as Maria at position 4.

**Answer:** A
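The position indices can be reproduced with a short sketch. It assumes that each word and each punctuation mark is a single token and that a BOS token sits at position 0. This matches the positions the documentation gives for this sentence, but it is only a stand-in for GPT-2's BPE tokenizer, which can split rarer words into several tokens. `ioi_positions` is a hypothetical helper.

```python
import re

def ioi_positions(sentence, names, bos="<|endoftext|>"):
    """Locate S1, S2, IO and END in an IOI prompt.

    Assumption: a toy word/punctuation tokenizer plus a prepended BOS token,
    used here as a stand-in for GPT-2's BPE tokenizer.
    """
    tokens = [bos] + re.findall(r"\w+|[^\w\s]", sentence)
    positions = {name: [i for i, tok in enumerate(tokens) if tok == name] for name in names}
    # The subject is the name mentioned twice; the indirect object is mentioned once
    subject = next(n for n, pos in positions.items() if len(pos) == 2)
    io_name = next(n for n, pos in positions.items() if len(pos) == 1)
    s1, s2 = positions[subject]
    return {"S1": s1, "S2": s2, "IO": positions[io_name][0], "END": len(tokens) - 1}

print(ioi_positions("As Carl and Maria left the consulate, Carl gave a fridge to", ("Carl", "Maria")))
# {'S1': 2, 'S2': 9, 'IO': 4, 'END': 13}
```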

### Question 3

**Question:** Describe the three functional components hypothesized to comprise the IOI circuit, including what positions they attend from/to and their proposed function.

**Reasoning:**
Section 1 (Goal) under "Hypothesis" explicitly lists the three functional components. Section 3.3 (Analysis Pipeline) Step 2 provides detailed descriptions of the attention patterns for each component type.

Based on the documentation:

1. **Duplicate Token Heads**: These heads are active at position S2 and attend to position S1. Their function is to signal token duplication through position features. The documentation describes them as having "S2 → S1 attention" and states they detect when a token appears twice in the sequence.

2. **S-Inhibition Heads**: These heads are active at the END position and attend to position S2. Their function is to inhibit Name-Mover attention to subject positions. The documentation describes them as having "END → S2 attention" and they work to suppress attention to the subject (which would be the wrong answer).

3. **Name-Mover Heads**: These heads are active at the END position and attend to the IO position. Their function is to copy the indirect object token to the residual stream for final prediction. The documentation describes them as having "END → IO attention" and they are responsible for moving the correct answer (the indirect object) to the output.

**Answer:**

The three functional components are:

1. **Duplicate Token Heads**: Attend from S2 → S1; detect and signal token duplication through position features
2. **S-Inhibition Heads**: Attend from END → S2; inhibit Name-Mover attention to subject positions  
3. **Name-Mover Heads**: Attend from END → IO; copy the indirect object token to the residual stream for prediction

### Question 4

**Question:** How many dimensions does each attention head write to the residual stream in GPT2-small?

**Options:**
- A. 64 dimensions
- B. 768 dimensions
- C. 3,072 dimensions
- D. 12 dimensions

**Reasoning:**
Section 3.2 (Write Budget Constraints) explicitly states: "Each attention head writes: 64 dimensions (d_model / n_heads)". This is further confirmed in Section 3.1 (Model Configuration) which lists:
- d_model: 768
- Heads per layer: 12
- d_head: 64

The calculation is: d_model / n_heads = 768 / 12 = 64 dimensions per head.

**Answer:** A
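A minimal sketch of the same arithmetic, using the Section 3.1 values as constants. A head's output vector has d_head = 64 dimensions and is projected into the 768-dimensional residual stream by W_O, so the "64 dimensions" in the budget is the size of that per-head output (equivalently, the maximum rank of the head's write).

```python
# GPT2-small dimensions as listed in Section 3.1 of the documentation
D_MODEL = 768
N_HEADS_PER_LAYER = 12

assert D_MODEL % N_HEADS_PER_LAYER == 0, "d_model must split evenly across heads"
# Each head's output z has d_head dims; W_O maps it into the residual stream,
# so the head's write has rank at most d_head
d_head = D_MODEL // N_HEADS_PER_LAYER

print(f"d_head = {D_MODEL} / {N_HEADS_PER_LAYER} = {d_head}")
print(f"All heads in one layer together: {N_HEADS_PER_LAYER * d_head} dims (= d_model)")
```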

### Question 5

**Question:** What was the baseline accuracy of GPT2-small on the IOI task (100 example sample)?

**Options:**
- A. 94.00%
- B. 87.50%
- C. 99.00%
- D. 72.00%

**Reasoning:**
Section 3.3 (Analysis Pipeline) Step 1: Baseline Evaluation explicitly states:
"**Result**: 94.00% accuracy (94/100 correct)"

This is also confirmed in Section 4 (Results) under "Performance Metrics" which states:
"**Baseline Model Accuracy**: 94.00% (94/100 examples)"

**Answer:** A

### Question 6

**Question:** Which attention head showed the highest average attention from S2 to S1 (Duplicate Token Head behavior)?

**Options:**
- A. a3.h0 (0.7191)
- B. a1.h11 (0.6613)
- C. a8.h6 (0.7441)
- D. a9.h9 (0.7998)

**Reasoning:**
Section 3.3 (Analysis Pipeline) Step 2: Attention Pattern Analysis under "Duplicate Token Heads (S2 → S1 attention)" lists the top 5 heads:
1. a3.h0: 0.7191
2. a1.h11: 0.6613
3. a0.h5: 0.6080
4. a0.h1: 0.5152
5. a0.h10: 0.2359

The highest average attention from S2 to S1 is head a3.h0 with 0.7191. Note that a8.h6 (0.7441) is the top S-Inhibition head (END → S2), and a9.h9 (0.7998) is the top Name-Mover head (END → IO), not Duplicate Token heads.

**Answer:** A
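Because the distractor options quote scores from other categories, keeping one table per attention pattern avoids comparing across them. A minimal sketch using the five Duplicate Token scores listed above:

```python
# Average S2 -> S1 attention for the top-5 Duplicate Token Heads (Section 3.3, Step 2)
duplicate_token_scores = {
    "a3.h0": 0.7191,
    "a1.h11": 0.6613,
    "a0.h5": 0.6080,
    "a0.h1": 0.5152,
    "a0.h10": 0.2359,
}

# Rank heads by score within this category only
ranked = sorted(duplicate_token_scores.items(), key=lambda kv: kv[1], reverse=True)
top_head, top_score = ranked[0]
print(f"Top Duplicate Token Head: {top_head} ({top_score:.4f})")

# a8.h6 and a9.h9 score higher, but on different position pairs (END -> S2, END -> IO)
assert "a8.h6" not in duplicate_token_scores and "a9.h9" not in duplicate_token_scores
```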

### Question 7

**Question:** Which attention head showed the highest average attention from END to IO (Name-Mover Head behavior)?

**Options:**
- A. a9.h9 (0.7998)
- B. a10.h7 (0.7829)
- C. a9.h6 (0.7412)
- D. a8.h6 (0.7441)

**Reasoning:**
Section 3.3 (Analysis Pipeline) Step 2: Attention Pattern Analysis under "Name-Mover Heads (END → IO attention)" lists the top 5 heads:
1. a9.h9: 0.7998
2. a10.h7: 0.7829
3. a9.h6: 0.7412
4. a11.h10: 0.6369
5. a10.h0: 0.3877

The highest average attention from END to IO is head a9.h9 with 0.7998. This is also mentioned in Section 5 (Analysis) under "Key Observations" where it states that top Name-Mover heads show "a9.h9 with 0.80 avg attention."

**Answer:** A
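The documentation reports per-head averages but does not show the code that produced them. One plausible way to compute such a score is sketched below, assuming the attention patterns have already been extracted (for example from a TransformerLens activation cache) into a numpy array of shape (examples, layers, heads, query positions, key positions). `mean_attention` and `best_head` are hypothetical helpers, and the check uses synthetic data with a planted END → IO head rather than real GPT2-small patterns.

```python
import numpy as np

def mean_attention(patterns, src_pos, dst_pos):
    """Average attention from src_pos to dst_pos for every (layer, head).

    patterns: shape (n_examples, n_layers, n_heads, seq_len, seq_len),
              where patterns[..., q, k] is attention from query q to key k.
    src_pos, dst_pos: integer arrays of shape (n_examples,), e.g. END and IO.
    Returns an array of shape (n_layers, n_heads).
    """
    idx = np.arange(patterns.shape[0])
    # Advanced indexing on axes 0, 3 and 4 puts the example axis first:
    # result shape is (n_examples, n_layers, n_heads)
    per_example = patterns[idx, :, :, src_pos, dst_pos]
    return per_example.mean(axis=0)

def best_head(scores):
    """Return the 'aL.hH' label and score of the highest-scoring head."""
    layer, head = np.unravel_index(np.argmax(scores), scores.shape)
    return f"a{layer}.h{head}", float(scores[layer, head])

# Synthetic check: random row-normalized patterns with one planted END -> IO head
rng = np.random.default_rng(0)
n_examples, n_layers, n_heads, seq_len = 8, 12, 12, 14
patterns = rng.random((n_examples, n_layers, n_heads, seq_len, seq_len))
patterns /= patterns.sum(axis=-1, keepdims=True)  # each query row sums to 1
end_pos = np.full(n_examples, 13)
io_pos = np.full(n_examples, 4)
patterns[:, 9, 9, 13, :] = 0.0
patterns[:, 9, 9, 13, 4] = 1.0  # a9.h9 attends fully from END to IO

scores = mean_attention(patterns, end_pos, io_pos)
print(scores.shape, best_head(scores))  # (12, 12) ('a9.h9', 1.0)
```

Passing per-example position arrays (rather than fixed integers) lets the same function handle prompts where END and IO fall at different positions.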

### Question 8

**Question:** How many total nodes (including input, attention heads, and MLPs) are in the final identified circuit?

**Options:**
- A. 44 nodes (1 input + 31 heads + 12 MLPs)
- B. 43 nodes (31 heads + 12 MLPs)
- C. 31 nodes (attention heads only)
- D. 55 nodes (1 input + 31 heads + 12 MLPs + 11 additional)

**Reasoning:**
Section 4 (Results) under "Final Circuit Composition" explicitly states:
"**Total Nodes**: 44
- 1 input node
- 31 attention heads
- 12 MLPs"

Arithmetic check: 1 + 31 + 12 = 44 total nodes.

**Answer:** A

### Question 9

**Question:** Explain why the identified circuit exhibits 'layered processing' and what functional role each layer group plays in the IOI task.

**Reasoning:**
Section 5 (Analysis) under "Key Observations" point 1 states: "**Layered Processing**: The circuit exhibits clear stratification" and lists three distinct layer groups with their functions:
- Early layers (0-3): Duplicate token detection
- Middle layers (7-8): Subject inhibition
- Late layers (9-11): Name moving and prediction

Section 5, under "Support for Hypothesis", also explains why this layering makes sense functionally: Duplicate Token heads are "predominantly in early-to-middle layers (0, 1, 3), consistent with positional feature detection," S-Inhibition heads are "in middle-to-late layers (7, 8, 9), appropriate for suppressing subject interference," and Name-Mover heads are "concentrated in late layers (9, 10, 11), ideal for final token prediction."

The documentation also states in the Main Takeaways (Section 7, point 3) that "The circuit exhibits clear layered processing, with early layers detecting patterns, middle layers inhibiting interference, and late layers performing final prediction."

**Answer:**

The circuit exhibits 'layered processing' because different functional components are concentrated in different layers of the model, creating a hierarchical pipeline:

1. **Early layers (0-3)**: Perform duplicate token detection. These layers identify that the subject token appears twice in the sequence (S1 and S2 positions) and extract positional features.

2. **Middle layers (7-8)**: Handle subject inhibition. These layers suppress attention to the subject positions to prevent the model from incorrectly predicting the subject instead of the indirect object.

3. **Late layers (9-11)**: Execute name moving and final prediction. These layers attend to the indirect object position and copy that token to the residual stream for the final output prediction.

This stratification shows that the IOI task is solved through a sequential computational pipeline where each layer group builds on the previous processing stage.

### Question 10

**Question:** The documentation states that the circuit achieves 100% budget utilization with exactly 11,200 dimensions. If the researchers wanted to add 5 more attention heads to the circuit, how many MLPs would they need to remove to stay within budget? Show your calculation.

**Reasoning:**
From Section 3.2 (Write Budget Constraints):
- Each attention head writes: 64 dimensions
- Each MLP writes: 768 dimensions
- Total budget: ≤ 11,200 dimensions

Current circuit (from Section 4):
- 31 attention heads × 64 = 1,984 dimensions
- 12 MLPs × 768 = 9,216 dimensions
- Total = 11,200 dimensions (100% utilization)

Calculation:
- Adding 5 attention heads adds: 5 × 64 = 320 dimensions
- New budget requirement: 11,200 + 320 = 11,520 dimensions
- Amount over budget: 11,520 - 11,200 = 320 dimensions
- MLPs to remove: 320 ÷ 768 = 0.4167 MLPs

Since we cannot remove a fractional MLP, we must round up to remove 1 complete MLP.
- Removing 1 MLP frees: 1 × 768 = 768 dimensions
- New total with +5 heads and -1 MLP: 11,200 + 320 - 768 = 10,752 dimensions
- This is within budget (≤ 11,200) ✓

**Answer:**

To add 5 attention heads while staying within the 11,200 dimension budget:

**Calculation:**
- 5 additional heads × 64 dims/head = 320 additional dimensions needed
- 320 dimensions ÷ 768 dims/MLP = 0.4167 MLPs
- Since MLPs are indivisible units, we must remove at least 1 MLP

**Answer: 1 MLP must be removed**

Verification:
- Original: 31 heads + 12 MLPs = 1,984 + 9,216 = 11,200 dims
- Modified: 36 heads + 11 MLPs = 2,304 + 8,448 = 10,752 dims ✓ (within budget)
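The round-up step generalizes to any number of added heads. A small helper (hypothetical, not from the documentation) using the Section 3.2 costs:

```python
import math

DIMS_PER_HEAD = 64
DIMS_PER_MLP = 768
BUDGET = 11_200

def mlps_to_remove(n_heads, n_mlps, extra_heads, budget=BUDGET):
    """Smallest number of whole MLPs to drop so the circuit fits after adding heads."""
    used = (n_heads + extra_heads) * DIMS_PER_HEAD + n_mlps * DIMS_PER_MLP
    overflow = max(0, used - budget)
    # MLPs are indivisible, so round the overflow up to whole MLPs
    return math.ceil(overflow / DIMS_PER_MLP)

k = mlps_to_remove(31, 12, 5)
new_total = (31 + 5) * DIMS_PER_HEAD + (12 - k) * DIMS_PER_MLP
print(k, new_total)  # 1 10752
```

With these costs, up to 12 extra heads (12 × 64 = 768 dimensions) can be offset by removing one MLP; a 13th extra head would require removing a second.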

### Question 11

**Question:** Why does the documentation suggest that finding multiple heads per functional category indicates 'robustness through redundancy'?

**Options:**
- A. Multiple heads performing similar functions provide backup pathways, so if one head fails or is ablated, others can compensate, making the circuit more resilient to perturbations.
- B. Having more heads increases the total budget utilization, making the circuit more efficient.
- C. Redundant heads allow the model to process multiple sentences in parallel.
- D. Multiple heads with the same function indicate that the circuit was overfit to the training data.

**Reasoning:**
Section 5 (Analysis) under "Key Observations" point 4 states:
"**Redundancy**: Multiple heads per category suggest robustness through redundancy, which is common in neural networks"

The concept of "robustness through redundancy" refers to having multiple components that can perform similar functions, so that the system remains functional even if some components are damaged or removed. Option A describes exactly this: multiple heads provide backup pathways that can compensate if individual heads fail or are ablated (removed).

Options B, C, and D are not supported by the documentation and do not align with the standard meaning of "robustness through redundancy" in neural network contexts.

**Answer:** A

### Question 12

**Question:** Based on the methodology described, propose a concrete experiment to test whether the S-Inhibition Heads are causally necessary for the IOI task. What would you measure and what result would confirm their necessity?

**Reasoning:**
Section 6 (Next Steps) under "Potential Extensions" suggests several experimental approaches:
- Item 1: "**Ablation Studies**: Remove individual heads or head categories to measure performance impact"
- Item 3: "**Activation Patching**: Directly test causal role of each component"

The documentation's hypothesis (Section 1) states that S-Inhibition Heads work by "inhibiting Name-Mover attention to subject positions" (attending from END → S2). Section 5 describes these heads as being responsible for "suppressing subject interference."

Based on this methodology, an ablation study would be the most direct way to test causal necessity. The key measurement would be whether the model starts predicting the subject (S) instead of the indirect object (IO) when S-Inhibition heads are removed, since these heads are hypothesized to suppress attention to the subject.

**Answer:**

**Proposed Experiment: Ablation Study of S-Inhibition Heads**

**Procedure:**
1. Identify all 12 S-Inhibition heads in the circuit (those with strong END → S2 attention, particularly a8.h6, a7.h9, a8.h10, a8.h5, and a9.h7)
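2. Run the model on the 100 IOI test examples with these heads ablated, either by zero-ablation (setting their outputs to zero) or by mean-ablation (replacing their outputs with their mean activation over a reference set of prompts)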
3. Evaluate model predictions and compare to baseline performance

**Measurements:**
- Overall accuracy on IOI task (baseline: 94%)
- Specifically measure how often the model incorrectly predicts the subject (S) instead of the indirect object (IO)
- Compare logit differences between IO and S tokens (with vs. without S-Inhibition heads)

**Expected Results if S-Inhibition Heads are Causally Necessary:**
- Accuracy should drop significantly below the 94% baseline
- The model should show increased incorrect predictions of the subject token (S)
- The logit advantage for IO over S should decrease or reverse, indicating the model is no longer properly suppressing subject interference

This would confirm that S-Inhibition heads are causally necessary for preventing the model from incorrectly attending to and predicting the repeated subject token.
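The documentation does not show an ablation implementation, so the following is a minimal numpy sketch of the two ingredients of this experiment, with hypothetical helper names `ablate_heads` and `logit_diff`. It assumes per-head outputs in the layout TransformerLens uses for the `blocks.{layer}.attn.hook_z` hook point, (batch, pos, head_index, d_head). In a real run, a function like `ablate_heads` would be applied inside a forward hook at that point for each layer containing S-Inhibition heads (for example layer 8 for a8.h6, a8.h10 and a8.h5). The arrays below are toy values, not model activations.

```python
import numpy as np

def ablate_heads(z, heads, mean_z=None):
    """Zero- or mean-ablate selected heads in one layer's per-head outputs.

    z: array of shape (batch, pos, n_heads, d_head).
    heads: head indices to ablate.
    mean_z: optional (pos, n_heads, d_head) reference means; None means zero-ablation.
    """
    z = z.copy()  # leave the caller's activations untouched
    for h in heads:
        z[:, :, h, :] = 0.0 if mean_z is None else mean_z[:, h, :]
    return z

def logit_diff(final_logits, io_ids, s_ids):
    """Mean logit(IO) - logit(S) at the END position; final_logits is (batch, vocab)."""
    rows = np.arange(final_logits.shape[0])
    return float(np.mean(final_logits[rows, io_ids] - final_logits[rows, s_ids]))

z = np.ones((2, 14, 12, 64))  # toy activations: batch 2, 14 positions, 12 heads
ablated = ablate_heads(z, heads=[6, 10])
print(ablated[:, :, 6].sum(), ablated[:, :, 0].sum())  # 0.0 1792.0

final_logits = np.array([[2.0, 0.5, 1.0], [0.0, 3.0, 1.0]])  # toy 3-token vocabulary
print(logit_diff(final_logits, io_ids=[0, 1], s_ids=[2, 2]))  # 1.5
```

Under the hypothesis, the logit difference computed on the ablated run should fall well below the value on the clean run, and may turn negative if the model starts preferring S.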

### Question 13

**Question:** The circuit uses only 11,200 of 110,592 possible dimensions (10.1% of total model capacity). What does this suggest about how GPT2-small implements the IOI behavior?

**Options:**
- A. IOI is implemented by a relatively sparse, specialized subcircuit rather than requiring the full model capacity, suggesting modular functional organization.
- B. The model is inefficient and wastes 90% of its capacity on irrelevant computations.
- C. IOI is a simple task that doesn't require sophisticated neural processing.
- D. The remaining 90% of capacity is used for error correction and robustness.

**Reasoning:**
Section 7 (Main Takeaways) point 5 explicitly states:
"**Efficient Representation**: The circuit uses only 11,200 of 110,592 possible dimensions (10.1% of total model capacity), suggesting IOI is implemented by a relatively sparse subcircuit"

This supports option A's interpretation of "modular functional organization": a specific behavior is handled by a specialized subset of components rather than by the entire model. This matters for interpretability because it means complex behaviors can be localized to specific components.

Options B, C, and D are not supported by the documentation. The documentation does not suggest the model is inefficient (B), does not characterize IOI as simple (it identifies complex multi-component processing), and does not discuss error correction in the remaining capacity (D).

**Answer:** A
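The documentation does not say how the 110,592 total is derived. The check below shows that 110,592 equals 12 layers × 12 heads × 768, i.e. counting every head at the full d_model, whereas the Section 3.2 accounting (64 dimensions per head, 768 per MLP) gives a whole-model total of 18,432, of which the circuit would occupy about 60.8%. The answer is still A, but the degree of sparsity depends on which denominator is used.

```python
D_MODEL, N_LAYERS, N_HEADS, D_HEAD = 768, 12, 12, 64
CIRCUIT_DIMS = 11_200
DOC_TOTAL = 110_592

# One accounting that reproduces the documented total: every head counted at d_model
total_if_heads_write_d_model = N_LAYERS * N_HEADS * D_MODEL
# Accounting consistent with Section 3.2: 64 dims per head, 768 per MLP
total_per_section_3_2 = N_LAYERS * N_HEADS * D_HEAD + N_LAYERS * D_MODEL

print(total_if_heads_write_d_model == DOC_TOTAL)  # True
print(f"{CIRCUIT_DIMS / DOC_TOTAL:.1%}")  # 10.1%
print(total_per_section_3_2, f"{CIRCUIT_DIMS / total_per_section_3_2:.1%}")  # 18432 60.8%
```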

### Question 14

**Question:** Suppose you found a new attention head a5.h7 that shows strong attention from END to S1 (not S2). How would you classify this head relative to the three hypothesized categories, and what function might it serve?

**Reasoning:**
The documentation describes three functional categories based on their attention patterns:
1. **Duplicate Token Heads**: S2 → S1 attention
2. **S-Inhibition Heads**: END → S2 attention  
3. **Name-Mover Heads**: END → IO attention

A head with END → S1 attention does not fit any of the three hypothesized categories. However, Section 6 (Next Steps) under "Alternative Hypotheses to Explore" mentions:
"**Negative Name Movers**: Heads that move S tokens to suppress them"

Since S1 is the first occurrence of the subject token (which should NOT be the answer), a head attending from END → S1 could potentially be involved in suppressing or inhibiting the subject. This would be similar to S-Inhibition heads but targeting S1 instead of S2.

The documentation does not explicitly describe END → S1 heads, so this requires inference based on the functional logic of the task. Since both S1 and S2 are subject positions that need to be suppressed in favor of the IO, a head attending to S1 could serve an inhibitory function.

**Answer:**

The head a5.h7 (END → S1 attention) does not fit into any of the three hypothesized categories (Duplicate Token, S-Inhibition, or Name-Mover).

**Classification:** This would likely be a variant of an inhibition head, possibly what the documentation refers to as a "Negative Name Mover" in the alternative hypotheses section.

**Proposed Function:** Since S1 is the first mention of the subject (which is the incorrect answer for IOI), this head might serve to suppress or inhibit attention to the first subject position. This would complement the S-Inhibition heads (which suppress S2) by ensuring that neither subject position is selected as the final answer.

Alternatively, this head could be part of a backup pathway that helps distinguish between S1 and S2 positions, or it could be involved in the early detection phase by identifying the first occurrence of the repeated name.

To determine its true function, the documentation's methodology suggests performing ablation studies or activation patching experiments on this head.
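The classification logic above can be written as a lookup from a head's dominant (query, key) position pair to a category. This is a hypothetical sketch: the 0.5 threshold and the scores given for a5.h7 are illustrative assumptions, not values from the documentation.

```python
# The three hypothesized categories, keyed by (query position, key position)
CATEGORIES = {
    ("S2", "S1"): "Duplicate Token Head",
    ("END", "S2"): "S-Inhibition Head",
    ("END", "IO"): "Name-Mover Head",
}

def classify_head(scores, threshold=0.5):
    """Assign a head to a category from its average attention per position pair.

    scores: dict mapping (query, key) position labels to average attention.
    threshold: hypothetical cut-off for a pattern to count as 'strong'.
    """
    pattern, value = max(scores.items(), key=lambda kv: kv[1])
    if value < threshold:
        return "unclassified (no strong pattern)"
    return CATEGORIES.get(pattern, f"outside hypothesis: {pattern[0]} -> {pattern[1]}")

# Hypothetical scores for a5.h7: strong END -> S1, weak elsewhere
print(classify_head({("END", "S1"): 0.7, ("END", "S2"): 0.1, ("END", "IO"): 0.05}))
# outside hypothesis: END -> S1
```

Returning an explicit "outside hypothesis" label, rather than forcing the head into the nearest category, flags it as a candidate for the ablation or patching experiments mentioned above.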

---

## Part 2: Code-Based Analysis

### Question 15 (CQ1): Budget Constraint Verification and Component Analysis

**Question:** Write code to verify the budget constraint calculation. Given the final circuit composition (31 attention heads and 12 MLPs), compute the total dimensional write and verify it equals 11,200. Also calculate what percentage of the budget is used by attention heads vs MLPs.

**Reference:** Section 3.2 (Write Budget Constraints) and Section 4 (Budget Verification table)

```python
# Verify the write budget of the IOI circuit (Sections 3.2 and 4)
D_MODEL = 768            # GPT2-small residual stream width
N_HEADS_PER_LAYER = 12   # heads per layer in GPT2-small
BUDGET_LIMIT = 11_200

# Final circuit composition (from the documentation)
n_circuit_heads = 31
n_circuit_mlps = 12

# Each head writes d_model / n_heads = 64 dims; each MLP writes d_model = 768 dims
dims_per_head = D_MODEL // N_HEADS_PER_LAYER
dims_per_mlp = D_MODEL

total_head_dims = n_circuit_heads * dims_per_head
total_mlp_dims = n_circuit_mlps * dims_per_mlp
total_budget = total_head_dims + total_mlp_dims

# Share of the budget used by each component type
head_percentage = total_head_dims / total_budget * 100
mlp_percentage = total_mlp_dims / total_budget * 100

print(f"Attention heads: {total_head_dims} dimensions ({head_percentage:.2f}%)")
print(f"MLPs: {total_mlp_dims} dimensions ({mlp_percentage:.2f}%)")
print(f"Total budget: {total_budget} dimensions")
print(f"Budget constraint (≤11,200): {'PASS' if total_budget <= BUDGET_LIMIT else 'FAIL'}")
```

```text
Attention heads: 1984 dimensions (17.71%)
MLPs: 9216 dimensions (82.29%)
Total budget: 11200 dimensions
Budget constraint (≤11,200): PASS
```


**Reasoning:**
Based on the documentation (Section 3.2 and Section 4):
- Each attention head writes 64 dimensions (d_model / n_heads = 768 / 12)
- Each MLP writes 768 dimensions (d_model)
- The final circuit has 31 attention heads and 12 MLPs

The calculation verifies that:
- 31 heads × 64 dims = 1,984 dimensions (17.71% of budget)
- 12 MLPs × 768 dims = 9,216 dimensions (82.29% of budget)
- Total = 11,200 dimensions (exactly at the budget limit)

This matches the Budget Verification table in Section 4 of the documentation.

**Answer:** The circuit achieves exactly 11,200 dimensions with 100% budget utilization. Attention heads use 1,984 dims (17.71%) and MLPs use 9,216 dims (82.29%). Budget constraint verification: PASS ✓

### Question 16 (CQ2): Layer Distribution Analysis of Attention Heads

**Question:** Write code to analyze the layer distribution of the 31 attention heads in the circuit. Count how many heads are in each layer (0-11) and identify which layer has the most heads. Then determine if there's a trend showing more heads in later layers.

**Reference:** Section 4 (Results) - Layer Distribution and Section 5 - layered processing observation

```python
# Analyze the layer distribution of attention heads in the circuit
# The circuit contains these attention heads (from the documentation):
circuit_heads = [
    "a0.h1", "a0.h10", "a0.h5", "a0.h6",
    "a1.h11",
    "a3.h0", "a3.h6",
    "a6.h0",
    "a7.h3", "a7.h9",
    "a8.h10", "a8.h2", "a8.h3", "a8.h5", "a8.h6",
    "a9.h0", "a9.h2", "a9.h6", "a9.h7", "a9.h8", "a9.h9",
    "a10.h0", "a10.h1", "a10.h10", "a10.h2", "a10.h3", "a10.h6", "a10.h7",
    "a11.h10", "a11.h6", "a11.h8"
]

# Count heads per layer
layer_counts = {}
for head in circuit_heads:
    # Extract layer number from head name (e.g., "a0.h1" -> layer 0)
    layer = int(head.split('.')[0][1:])  # Remove 'a' prefix and extract number
    if layer not in layer_counts:
        layer_counts[layer] = 0
    layer_counts[layer] += 1

# Find the layer with most heads
max_layer = max(layer_counts, key=layer_counts.get)
max_count = layer_counts[max_layer]

# Calculate early vs late layer distribution
# Early layers: 0-3, Late layers: 9-11
early_layers_count = sum(layer_counts.get(layer, 0) for layer in range(0, 4))
late_layers_count = sum(layer_counts.get(layer, 0) for layer in range(9, 12))

# Print results
print("Layer distribution:")
for layer in sorted(layer_counts.keys()):
    print(f"  Layer {layer}: {layer_counts[layer]} heads")
print(f"\nLayer with most heads: Layer {max_layer} ({max_count} heads)")
print(f"\nEarly layers (0-3): {early_layers_count} heads")
print(f"Late layers (9-11): {late_layers_count} heads")
print(f"Trend: {'More heads in later layers' if late_layers_count > early_layers_count else 'More heads in early layers'}")
```

```text
Layer distribution:
  Layer 0: 4 heads
  Layer 1: 1 heads
  Layer 3: 2 heads
  Layer 6: 1 heads
  Layer 7: 2 heads
  Layer 8: 5 heads
  Layer 9: 6 heads
  Layer 10: 7 heads
  Layer 11: 3 heads

Layer with most heads: Layer 10 (7 heads)

Early layers (0-3): 7 heads
Late layers (9-11): 16 heads
Trend: More heads in later layers
```


**Reasoning:**
The code extracts the layer number from each head name (format "aX.hY", where X is the layer) and counts heads per layer. Section 4 of the documentation reports a slightly different layer distribution:
- Layer 0: 4 heads
- Layer 1: 1 head
- Layer 3: 2 heads
- Layer 6: 1 head
- Layer 7: 2 heads
- Layer 8: 5 heads
- Layer 9: 5 heads (the head list in the code yields 6)
- Layer 10: 7 heads
- Layer 11: 4 heads (the head list in the code yields 3)

Both distributions total 31 heads, so the discrepancy amounts to one head counted in layer 9 instead of layer 11; the documentation does not explain the difference. It does not affect the conclusions: both distributions give 7 heads in layers 0-3, 16 heads in layers 9-11, and a maximum of 7 heads in layer 10.

Comparing early layers (0-3, 7 heads) with late layers (9-11, 16 heads) shows a clear trend toward more heads in later layers. This aligns with the documentation's observation about "layered processing", where late layers (9-11) are responsible for "Name moving and prediction", the final step of the IOI task.

**Answer:** Layer 10 has the most heads (7 heads). There is a clear trend showing more heads in later layers, with 16 heads in layers 9-11 compared to only 7 heads in layers 0-3. This supports the hypothesis that later layers perform the critical name-moving and prediction functions.
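The comparison between the head list and the Section 4 table can be checked directly with `collections.Counter`; the Section 4 counts below are transcribed from the layer distribution listed in the reasoning, and `band` is a hypothetical helper.

```python
from collections import Counter

circuit_heads = [
    "a0.h1", "a0.h10", "a0.h5", "a0.h6", "a1.h11", "a3.h0", "a3.h6", "a6.h0",
    "a7.h3", "a7.h9", "a8.h10", "a8.h2", "a8.h3", "a8.h5", "a8.h6",
    "a9.h0", "a9.h2", "a9.h6", "a9.h7", "a9.h8", "a9.h9",
    "a10.h0", "a10.h1", "a10.h10", "a10.h2", "a10.h3", "a10.h6", "a10.h7",
    "a11.h10", "a11.h6", "a11.h8",
]
# Heads per layer from the head list ("aX.hY" -> X)
from_list = Counter(int(h.split(".")[0][1:]) for h in circuit_heads)
# Heads per layer as reported in the Section 4 table
from_doc = Counter({0: 4, 1: 1, 3: 2, 6: 1, 7: 2, 8: 5, 9: 5, 10: 7, 11: 4})

def band(counts, layers):
    """Total heads in a range of layers (missing layers count as 0)."""
    return sum(counts[layer] for layer in layers)

for name, counts in [("head list", from_list), ("Section 4 table", from_doc)]:
    print(f"{name}: total={sum(counts.values())}, "
          f"early(0-3)={band(counts, range(0, 4))}, late(9-11)={band(counts, range(9, 12))}")
# Counter subtraction keeps only positive differences: where each source has extra heads
print(dict(from_list - from_doc), dict(from_doc - from_list))  # {9: 1} {11: 1}
```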

### Question 17 (CQ3): Minimal Circuit Budget Calculation

**Question:** Write code to simulate what would happen if you wanted to create a minimal IOI circuit using only the top-1 head from each functional category (Duplicate Token: a3.h0, S-Inhibition: a8.h6, Name-Mover: a9.h9) plus all 12 MLPs. Calculate the total budget used and how much budget remains unused compared to the full circuit.

**Reference:** Section 3.3 (Circuit Node Selection) and Section 4 (Budget Verification)

```python
# Calculate budget for a minimal IOI circuit
# Minimal circuit composition:
# - Top Duplicate Token Head: a3.h0
# - Top S-Inhibition Head: a8.h6
# - Top Name-Mover Head: a9.h9
# - All 12 MLPs (m0 through m11)

# Constants
DIMS_PER_HEAD = 64
DIMS_PER_MLP = 768
BUDGET_LIMIT = 11200
FULL_CIRCUIT_BUDGET = 11200  # From the documentation

# Calculate minimal circuit budget
minimal_heads = 3  # One from each category
minimal_mlps = 12
minimal_budget = (minimal_heads * DIMS_PER_HEAD) + (minimal_mlps * DIMS_PER_MLP)

# Calculate remaining budget
budget_remaining = BUDGET_LIMIT - minimal_budget

# Calculate savings compared to full circuit
budget_saved = FULL_CIRCUIT_BUDGET - minimal_budget

# Print results
print("Minimal Circuit Composition:")
print(f"  - Attention heads: {minimal_heads} heads × {DIMS_PER_HEAD} dims = {minimal_heads * DIMS_PER_HEAD} dims")
print(f"  - MLPs: {minimal_mlps} MLPs × {DIMS_PER_MLP} dims = {minimal_mlps * DIMS_PER_MLP} dims")
print(f"\nMinimal circuit budget: {minimal_budget} dimensions")
print(f"Budget remaining: {budget_remaining} dimensions")
print(f"Budget saved vs full circuit: {budget_saved} dimensions")
print(f"\nBudget constraint (≤11,200): {'PASS' if minimal_budget <= BUDGET_LIMIT else 'FAIL'}")
```

```text
Minimal Circuit Composition:
  - Attention heads: 3 heads × 64 dims = 192 dims
  - MLPs: 12 MLPs × 768 dims = 9216 dims

Minimal circuit budget: 9408 dimensions
Budget remaining: 1792 dimensions
Budget saved vs full circuit: 1792 dimensions

Budget constraint (≤11,200): PASS
```


**Reasoning:**
The minimal circuit uses only the top-performing head from each of the three functional categories identified in the documentation (Section 3.3 Step 2):
- **Duplicate Token Head**: a3.h0 (highest S2→S1 attention at 0.7191)
- **S-Inhibition Head**: a8.h6 (highest END→S2 attention at 0.7441)
- **Name-Mover Head**: a9.h9 (highest END→IO attention at 0.7998)

All 12 MLPs are retained, since the documentation states they are important for "feature extraction and transformation".

The calculation:
- 3 heads × 64 dims = 192 dimensions
- 12 MLPs × 768 dims = 9,216 dimensions
- Total = 9,408 dimensions

This is 1,792 dimensions less than the full circuit (11,200 dims), a 16% reduction. The minimal circuit passes the budget constraint and keeps one representative from each functional category. Whether it still reproduces the model's IOI behavior (94% baseline accuracy) is untested; the budget calculation alone does not show that the other 28 heads are unnecessary.

**Answer:** A minimal IOI circuit using only the top-1 head from each functional category plus all MLPs would use 9,408 dimensions, leaving 1,792 dimensions of unused budget. This saves 1,792 dimensions compared to the full circuit while still maintaining coverage of all three hypothesized functional components.
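The unused 1,792 dimensions can also be read as a head count. A sketch with the Section 3.2 costs (the helper name `heads_that_fit` is hypothetical):

```python
DIMS_PER_HEAD, DIMS_PER_MLP, BUDGET = 64, 768, 11_200

def heads_that_fit(n_heads, n_mlps, budget=BUDGET):
    """Number of additional attention heads that fit in the unused write budget."""
    used = n_heads * DIMS_PER_HEAD + n_mlps * DIMS_PER_MLP
    return (budget - used) // DIMS_PER_HEAD

print(heads_that_fit(3, 12))   # 28
print(heads_that_fit(31, 12))  # 0
```

With all 12 MLPs kept, the budget admits at most 31 heads in total (3 + 28), which is exactly the size of the full circuit.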

```python
# Save the notebook to the exam directory
import shutil
import os

# Current notebook path
current_notebook = "/home/smallyan/critic_model_mechinterp/notebooks/2025-11-14-11-41_ExamAnswers.ipynb"

# Target directory and filename
target_dir = "/home/smallyan/critic_model_mechinterp/runs/circuits_claude_2025-11-09_14-46-37/exam"
target_file = os.path.join(target_dir, "student_answer.ipynb")

# Ensure the target directory exists
os.makedirs(target_dir, exist_ok=True)

# Copy the notebook (with metadata) to the target location
shutil.copy2(current_notebook, target_file)

print("Notebook saved successfully!")
print(f"Source: {current_notebook}")
print(f"Destination: {target_file}")
print(f"\nVerifying file exists: {os.path.exists(target_file)}")
```