# Sarcasm Circuit Exam - GPT2-Small

This exam tests your understanding of the sarcasm detection circuit identified in GPT2-small, as described in the accompanying research documentation.

## Instructions

1. Answer all questions based solely on the provided documentation
2. For multiple choice questions, provide the letter (A, B, C, or D)
3. For free generation questions, provide clear, concise answers
4. For code questions, complete the TODO sections in the provided stubs

---

## Key Knowledge Points

The sarcasm circuit research covers the following key concepts:

### 1. **Circuit Discovery Method**
- Differential activation analysis on paired sarcastic/literal examples
- L2 norm measurement of activation differences
- Budget-constrained component selection
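
The two selection ideas above can be illustrated with a minimal sketch. It rests on assumptions not confirmed by the documentation: each example is assumed to be already reduced to a fixed-size vector per component, the differential is taken as the mean per-pair L2 norm, and selection is assumed to be greedy by descending differential. The helper names (`differential`, `write_cost`, `select_components`) and the 900-dimension budget are hypothetical and chosen only for illustration.

```python
import math

D_MODEL = 768  # write cost of an MLP or the input embedding in GPT2-small
D_HEAD = 64    # write cost of a single attention head


def differential(sarcastic, literal):
    """Mean L2 norm of (sarcastic - literal) over paired examples.

    Each argument is a list of equal-length vectors, one per example,
    already reduced to a fixed size (an assumption of this sketch).
    """
    norms = [
        math.sqrt(sum((s - l) ** 2 for s, l in zip(s_vec, l_vec)))
        for s_vec, l_vec in zip(sarcastic, literal)
    ]
    return sum(norms) / len(norms)


def write_cost(name):
    """Heads are named like 'a11.h8'; every other component costs d_model."""
    return D_HEAD if name.startswith("a") else D_MODEL


def select_components(scores, budget):
    """Greedy pass: highest differential first, skipping what no longer fits."""
    selected, used = [], 0
    for name, _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        cost = write_cost(name)
        if used + cost <= budget:
            selected.append(name)
            used += cost
    return selected, used


# Differentials quoted in this exam; the 900-dimension budget is a toy value.
scores = {"m2": 32.47, "m11": 22.30, "a11.h8": 3.33, "a11.h0": 2.74}
selected, used = select_components(scores, budget=900)
print(selected, used)
```

With the toy budget, m2 is taken first, m11 no longer fits, and the two Layer 11 heads fill most of the remainder. A greedy pass of this kind tends to fill the budget, which is consistent with the limitation noted later that the minimal circuit is likely smaller.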

### 2. **Circuit Architecture**
- 54 total components (1 input, 10 MLPs, 43 attention heads)
- Write budget: 11,200 dimensions (100% utilization)
- Three-stage hierarchical processing

### 3. **Key Components**
- m2 (Layer 2 MLP): Primary sarcasm detector (32.47 differential)
- m11 (Layer 11 MLP): Final pre-output processing (22.30 differential)
- a11.h8, a11.h0: Output integration heads (3.33, 2.74 differential)
- Excluded: m3, m4 (minimal differential < 6.5)

### 4. **Three-Stage Processing**
- **Stage 1 (L0-L2)**: Early incongruity detection via m2
- **Stage 2 (L3-L7)**: Signal propagation and refinement
- **Stage 3 (L8-L11)**: Final integration into output

### 5. **Linguistic Features**
- Discourse markers: "Oh", "Wow", "Just"
- Positive sentiment words in negative contexts
- Contradiction between literal and contextual meaning

### 6. **Mechanistic Insights**
- Sarcasm detection happens early (Layer 2, not the middle layers)
- MLPs dominate the write budget (7,680 dims vs. 2,752 dims for attention)
- Pattern matching rather than semantic reversal

### 7. **Comparison to IOI Circuit**
- Different primary mechanism (sarcasm: MLP; IOI: attention)
- Different key layers (sarcasm: early; IOI: late)
- Different circuit density (sarcasm: dense; IOI: sparse)

### 8. **Limitations and Future Work**
- No causal validation via ablation
- Small dataset (40 examples, 5 analyzed in detail)
- Synthetic data rather than real-world sarcasm
- Selection maximizes the budget, so the minimal circuit is likely smaller

---

## Section 1: Multiple Choice Questions

### Question 1

What is the primary computational mechanism used by GPT2-small for sarcasm detection, according to the documented circuit?

A) Attention-based information routing across sequence positions  
B) Late-layer sentiment polarity reversal  
C) Early-layer MLP-based incongruity detection  
D) Distributed gradient computation across all 12 layers

**Your answer:**

### Question 2

Which component shows the most dominant differential activation in the sarcasm circuit, and what is its approximate differential activation value?

A) a11.h8 (Layer 11, Head 8) with differential ~3.33  
B) m2 (Layer 2 MLP) with differential ~32.47  
C) m11 (Layer 11 MLP) with differential ~22.30  
D) a4.h11 (Layer 4, Head 11) with differential ~1.40

**Your answer:**

### Question 3

What is the total write budget utilized by the documented sarcasm circuit?

A) 8,448 dimensions (75% of budget)  
B) 9,600 dimensions (86% of budget)  
C) 10,240 dimensions (91% of budget)  
D) 11,200 dimensions (100% of budget)

**Your answer:**

### Question 5

Which two MLP layers were excluded from the circuit due to minimal differential activation?

A) m0 and m1  
B) m3 and m4  
C) m5 and m6  
D) m10 and m11

**Your answer:**

### Question 7

What method was used to identify the components included in the sarcasm circuit?

A) Gradient-based attribution analysis  
B) Systematic ablation testing with behavioral metrics  
C) Differential activation analysis on paired examples  
D) Linear probing with supervised sarcasm classifiers

**Your answer:**

### Question 9

How many attention heads were included in the final circuit, and which layer contains the most important attention heads?

A) 43 heads total, with the most important in Layer 11  
B) 101 heads total, with the most important in Layer 4  
C) 54 heads total, with the most important in Layer 2  
D) 19 heads total, with the most important in Layer 6

**Your answer:**

### Question 11

In the differential activation analysis method, activations were averaged over which dimension to handle variable-length inputs?

A) Batch dimension  
B) Sequence position dimension  
C) Model dimension (d_model)  
D) Head dimension (d_head)

**Your answer:**

### Question 13

What is a key limitation of using differential activation (L2 norm of activation differences) as the selection criterion for circuit components?

A) It cannot handle variable-length sequences  
B) It requires expensive gradient computation  
C) It only works for attention mechanisms, not MLPs  
D) High differential activation does not guarantee causal importance

**Your answer:**

### Question 15

According to the documentation, what is the dimension of each attention head's output in GPT2-small?

A) 768 dimensions  
B) 128 dimensions  
C) 64 dimensions  
D) 32 dimensions

**Your answer:**

## Section 2: Free Generation Questions

### Question 4

The documentation describes a three-stage hierarchical process for sarcasm detection. Describe each stage, identify the key components involved, and explain the computational function performed at each stage.

**Your answer:**

### Question 6

Compare the sarcasm circuit to the Indirect Object Identification (IOI) circuit along four dimensions: primary mechanism, key layer, circuit size, and relative importance of attention vs. MLPs. What does this comparison suggest about linguistic task processing in transformers?

**Your answer:**

### Question 8

Explain the key linguistic features that distinguish sarcastic from literal sentences in the dataset. How might these features enable Layer 2 MLP to detect incongruity?

**Your answer:**

### Question 10

The original hypothesis predicted that middle layers (L4-L7) would be the primary detection site, but empirical evidence showed Layer 2 as the primary detector. Explain this discrepancy and what it reveals about the mechanistic difference between the predicted and actual sarcasm processing.

**Your answer:**

### Question 12

The circuit uses 10 MLPs (7,680 dims) versus 43 attention heads (2,752 dims). Given the budget-constrained selection algorithm described in the documentation, explain why this distribution occurred and what it implies about the relative importance of MLPs vs. attention for sarcasm detection.

**Your answer:**

### Question 14

Based on the documented circuit structure and the exclusion of m3 and m4, propose a hypothesis for why these specific middle layers might show minimal differential activation. What experiments would you conduct to test this hypothesis?

**Your answer:**

### Question 16

The documentation mentions that the circuit hasn't been validated with ablation testing. Design a systematic ablation experiment to test the sufficiency and necessity of the identified circuit components. Your design should address both individual component importance and potential interaction effects.

**Your answer:**

## Section 3: Code Questions

Complete the following code exercises to verify claims made in the documentation.

### Code Question 1 (CQ1): Write Budget Verification

Write code to verify the write budget calculation for the documented circuit. Given the circuit composition (1 input embedding, 10 MLPs, 43 attention heads) and the dimension specifications (d_model=768, d_head=64), compute the total write cost and verify it matches the documented 11,200 dimensions.

```python
# Code Question 1: Write Budget Verification

# Given specifications
d_model = 768  # Dimension for input embedding and MLPs
d_head = 64    # Dimension for attention heads

# Circuit composition
num_input = 1
num_mlps = 10
num_attention_heads = 43

# TODO: Calculate the total write cost
# Hint: total_cost = (input_cost) + (mlp_cost) + (attention_head_cost)
total_write_cost = 0  # Replace with your calculation

# TODO: Verify if it matches the documented budget
documented_budget = 11200
matches = False  # Replace with your verification

print(f"Calculated total write cost: {total_write_cost}")
print(f"Documented budget: {documented_budget}")
print(f"Budget matches documentation: {matches}")
```

### Code Question 2 (CQ2): Differential Activation Percentage Verification

The documentation claims m2 is approximately 45% stronger than m11 in differential activation. Write code to verify this claim by computing the percentage difference between m2's differential (32.47) and m11's differential (22.30), and check if it's approximately 45%.

```python
# Code Question 2: Differential Activation Percentage Verification

# Given differential activation values
m2_diff = 32.47
m11_diff = 22.30

# TODO: Calculate the percentage by which m2 is stronger than m11
# Hint: percentage_stronger = ((m2_diff - m11_diff) / m11_diff) * 100
percentage_stronger = 0.0  # Replace with your calculation

# TODO: Check if it's approximately 45% (within ±2% tolerance)
claimed_percentage = 45.0
tolerance = 2.0
approximately_correct = False  # Replace with your verification

print(f"m2 differential: {m2_diff}")
print(f"m11 differential: {m11_diff}")
print(f"Percentage stronger: {percentage_stronger:.2f}%")
print(f"Claimed percentage: {claimed_percentage}%")
print(f"Approximately correct (±{tolerance}%): {approximately_correct}")
```

### Code Question 3 (CQ3): Attention Head Distribution Verification

The circuit includes attention heads distributed across layers. Write code to verify the documented distribution: 9 heads in early layers (L0-L3), 19 heads in middle layers (L4-L7), and 15 heads in late layers (L8-L11). Parse the provided list of attention head components and compute the actual distribution to verify these claims.

```python
# Code Question 3: Attention Head Distribution Verification

# Sample attention head components based on the documentation (43 entries).
# Treat this as a representative sample rather than the verified circuit listing.
# Format: "a{layer}.h{head}"
attention_heads = [
    "a11.h8", "a11.h0", "a4.h11", "a9.h3", "a6.h11", "a8.h5",
    "a9.h10", "a5.h3", "a10.h5", "a11.h3", "a0.h2", "a1.h5",
    "a2.h8", "a3.h1", "a4.h3", "a4.h7", "a5.h9", "a5.h11",
    "a6.h2", "a6.h5", "a6.h8", "a7.h1", "a7.h4", "a7.h10",
    "a8.h0", "a8.h3", "a8.h9", "a9.h1", "a9.h5", "a9.h8",
    "a10.h0", "a10.h2", "a10.h8", "a10.h11", "a11.h1", "a11.h5",
    "a0.h7", "a1.h2", "a2.h3", "a3.h6", "a7.h9", "a8.h11", "a11.h9"
]

# TODO: Parse the layer number from each attention head component
# Hint: Extract the number between 'a' and '.h' (e.g., "a11.h8" -> layer 11)

# TODO: Count heads in each layer range
early_layers_count = 0   # L0-L3
middle_layers_count = 0  # L4-L7
late_layers_count = 0    # L8-L11

# TODO: Verify against documented distribution
documented_early = 9
documented_middle = 19
documented_late = 15

distribution_matches = False  # Replace with your verification

print(f"Early layers (L0-L3): {early_layers_count} heads (documented: {documented_early})")
print(f"Middle layers (L4-L7): {middle_layers_count} heads (documented: {documented_middle})")
print(f"Late layers (L8-L11): {late_layers_count} heads (documented: {documented_late})")
print(f"Distribution matches documentation: {distribution_matches}")
```