# Sarcasm Circuit Analysis - Assessment Questions

This assessment evaluates your understanding of the sarcasm circuit analysis documentation for GPT2-small.

**Instructions**: 
- Answer all questions based on the documentation provided
- For multiple choice questions, select the correct answer
- For free generation questions, provide detailed explanations
- For code questions, complete the code to produce the expected output


## Key Knowledge Points

The following key concepts from the documentation are tested in this assessment:

### 1. Goal and Setup
- Task: Identify sarcasm detection circuit in GPT2-small
- Model specifications: 12 layers, 12 attention heads per layer
- Dimensions: d_model = 768, d_head = 64
- Write budget constraint: ≤ 11,200 dimensions
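
As a rough illustration of how a write budget of this kind is counted, the sketch below assumes that the input embedding and each MLP write `d_model` dimensions while each attention head writes `d_head` dimensions (the cost rule stated in Code Question CQ1). The function name `write_cost` and the toy composition are hypothetical and deliberately do not reproduce the circuit under assessment.

```python
D_MODEL = 768    # residual stream width of GPT2-small
D_HEAD = 64      # per-head dimension of GPT2-small
BUDGET = 11_200  # write budget constraint from the documentation


def write_cost(num_inputs: int, num_mlps: int, num_heads: int) -> int:
    """Total write cost, assuming inputs and MLPs cost d_model and heads cost d_head."""
    return (num_inputs + num_mlps) * D_MODEL + num_heads * D_HEAD


# Hypothetical toy circuit: 1 input, 2 MLPs, 3 attention heads
toy_cost = write_cost(num_inputs=1, num_mlps=2, num_heads=3)
print(f"Toy circuit cost: {toy_cost} dimensions (within budget: {toy_cost <= BUDGET})")
```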

### 2. Dataset Characteristics
- 40 total examples (20 sarcastic, 20 literal)
- Synthetic paired data structure
- Key linguistic features: discourse markers, positive sentiment words, negative situational context

### 3. Methodology
- Differential activation analysis
- L2 (Euclidean) norm of the difference between class means: ||mean_sarc - mean_lit||_2
- Normalization: activations averaged over sequence positions to handle variable-length inputs
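
The measurement above can be sketched in a few lines of plain Python. This is a minimal illustration, not the study's implementation: it assumes each example's activations are first averaged over sequence positions, then averaged over examples within each class, and that the two class means are compared with a Euclidean norm. The helper names and the tiny hand-written activations are hypothetical stand-ins for real model activations.

```python
import math


def column_mean(rows):
    """Average a list of equal-length vectors element-wise."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def differential_activation(sarc_examples, lit_examples):
    """Return ||mean_sarc - mean_lit||_2 for one component.

    Each example is a list of per-position activation vectors, so
    examples may have different sequence lengths.
    """
    # Average over sequence positions: one vector per example
    sarc_vecs = [column_mean(example) for example in sarc_examples]
    lit_vecs = [column_mean(example) for example in lit_examples]
    # Average over examples: one vector per class
    mean_sarc = column_mean(sarc_vecs)
    mean_lit = column_mean(lit_vecs)
    # Euclidean (L2) norm of the difference between class means
    return math.sqrt(sum((s - l) ** 2 for s, l in zip(mean_sarc, mean_lit)))


# Hypothetical toy activations (2 dimensions, variable sequence lengths)
sarc = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0]]]
lit = [[[0.0, 0.0], [0.0, 1.0]], [[2.0, 1.5], [2.0, 1.5], [2.0, 1.5]]]
print(differential_activation(sarc, lit))
```

Averaging over positions before comparing classes means a long sentence does not contribute more than a short one, which is the point of the normalization step.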

### 4. Circuit Composition
- 54 total components (1 input + 10 MLPs + 43 attention heads)
- 100% budget utilization (11,200 dimensions)
- Key finding: m2 as primary sarcasm detector

### 5. Three-Stage Mechanistic Model
- Stage 1 (L0-L2): Early detection
- Stage 2 (L3-L7): Distributed propagation
- Stage 3 (L8-L11): Final integration
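
For reference, the stage boundaries listed above can be written as a small lookup. This is only a convenience sketch; the `STAGES` table and the `stage_of` helper are hypothetical names, and the layer ranges are copied from the list above.

```python
# Layer ranges for the three-stage model (GPT2-small has layers 0-11)
STAGES = {
    "Stage 1: early detection": range(0, 3),          # L0-L2
    "Stage 2: distributed propagation": range(3, 8),  # L3-L7
    "Stage 3: final integration": range(8, 12),       # L8-L11
}


def stage_of(layer: int) -> str:
    """Return the stage name that a layer index belongs to."""
    for name, layers in STAGES.items():
        if layer in layers:
            return name
    raise ValueError(f"GPT2-small has layers 0-11, got {layer}")


print(stage_of(2))
print(stage_of(3))
```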

### 6. Key Insights
- Sarcasm detection happens early (L2), not gradually
- MLPs dominate over attention heads
- Different from IOI circuit (MLP-dominant vs attention-dominant)


---
## Assessment Questions

### Question 1 (Multiple Choice)

What is the dimension of a single attention head (d_head) in GPT2-small as used in this sarcasm circuit analysis?

A. 32 dimensions
B. 128 dimensions
C. 768 dimensions
D. 64 dimensions


### Question 2 (Multiple Choice)

How many total components are included in the identified sarcasm detection circuit?

A. 44 components (1 input + 12 MLPs + 31 attention heads)
B. 54 components (1 input + 10 MLPs + 43 attention heads)
C. 64 components (1 input + 12 MLPs + 51 attention heads)
D. 24 components (1 input + 10 MLPs + 13 attention heads)


### Question 3 (Multiple Choice)

Which MLP layer is identified as the primary sarcasm detector with the highest differential activation?

A. m0 (Layer 0 MLP) with 7.33 average differential activation
B. m2 (Layer 2 MLP) with 32.47 average differential activation
C. m11 (Layer 11 MLP) with 22.30 average differential activation
D. m5 (Layer 5 MLP) with 7.79 average differential activation


### Question 4 (Free Generation)

Which two MLP layers were excluded from the sarcasm circuit? Explain why they were excluded based on the documentation.

**Your Answer:**


### Question 5 (Free Generation)

The initial hypothesis suggested that sarcasm detection follows a three-stage process: sentiment encoding → incongruity detection → meaning reversal. How did the empirical findings revise this understanding? Explain the key differences between the original hypothesis and the observed mechanism.

**Your Answer:**


### Question 6 (Multiple Choice)

How does the sarcasm circuit differ from the Indirect Object Identification (IOI) circuit in terms of the dominant component type?

A. Both circuits are equally balanced between MLPs and attention
B. IOI circuit is MLP-dominant while sarcasm circuit is attention-dominant
C. Both circuits are primarily attention-dominant
D. Sarcasm circuit is MLP-dominant while IOI circuit is attention-dominant


### Question 7 (Free Generation)

Identify the two most important attention heads in the sarcasm circuit based on differential activation. What is their interpreted function according to the documentation?

**Your Answer:**


### Question 8 (Free Generation)

Based on the documentation, explain the key linguistic features that characterize sarcastic sentences in the dataset. How does the combination of these features create the contradiction that the circuit must detect?

**Your Answer:**


### Question 9 (Free Generation)

If you wanted to include all 12 MLPs and all 144 attention heads (12 layers × 12 heads) in a circuit for GPT2-small, calculate the total write cost and state whether your total includes the input embedding. Would this exceed the 11,200 dimension budget? Show your calculation.

**Your Answer:**


### Question 10 (Multiple Choice)

According to the revised mechanistic model, what is the primary function of the middle layers (L3-L7) in the sarcasm circuit?

A. Primary incongruity detection - identifying contradictions between sentiment and context
B. Sentiment encoding - detecting and encoding literal sentiment words
C. Distributed propagation - refining and routing the sarcasm signal across sequence positions
D. Meaning reversal - flipping sentiment polarity when sarcasm is detected


### Question 11 (Multiple Choice)

What normalization technique was used to handle variable-length inputs when computing differential activations?

A. Max pooling over sequence positions
B. Averaged activations over sequence positions (mean over sequence dimension)
C. Used only the last token's activation
D. Padded all sequences to the same length


### Question 12 (Free Generation)

The documentation lists several limitations of the study. Why is the distinction between 'differential activation' and 'causal importance' considered a significant limitation? Explain what additional experiments would be needed to establish causal importance.

**Your Answer:**


### Code Question CQ1

Write code to verify the write budget calculation for the sarcasm circuit. Given the circuit composition (1 input embedding, 10 MLPs, 43 attention heads), compute the total write cost and verify it matches the documented 11,200 dimension budget.

Your code should:
1. Define the dimension sizes (d_model=768 for input/MLPs, d_head=64 for attention heads)
2. Calculate the individual costs for each component type
3. Calculate the total write cost
4. Print the breakdown and verify it equals 11,200

Expected output should show the individual costs and confirm the total equals 11,200.


```python
# TODO: Verify the write budget calculation
# Define dimension sizes
d_model = None  # TODO: Set to 768
d_head = None   # TODO: Set to 64

# Circuit composition
num_input = None        # TODO: Set to 1
num_mlps = None         # TODO: Set to 10
num_attention_heads = None  # TODO: Set to 43

# Calculate individual costs
input_cost = None       # TODO: Calculate
mlp_cost = None         # TODO: Calculate
attn_cost = None        # TODO: Calculate

# Calculate total
total_cost = None       # TODO: Calculate

# Print results
print(f"Input embedding cost: {input_cost} dimensions")
print(f"MLP cost ({num_mlps} MLPs): {mlp_cost} dimensions")
print(f"Attention head cost ({num_attention_heads} heads): {attn_cost} dimensions")
print(f"Total write cost: {total_cost} dimensions")
print(f"Budget verification: {total_cost == 11200}")
```

### Code Question CQ2

Write code to analyze the distribution of the 43 attention heads in the sarcasm circuit across the 12 layers (0-11). 

Given the list of attention heads in the circuit (from the JSON file or as provided below), your code should:
1. Parse the attention head names to extract layer numbers
2. Count the number of heads per layer
3. Group layers into three stages (these boundaries differ from the three-stage mechanistic model, which assigns L3 to Stage 2):
   - Early (L0-L3)
   - Middle (L4-L7)
   - Late (L8-L11)
4. Print the count per layer and the total per stage
5. Verify the stage totals match the documentation: Early=9, Middle=19, Late=15

Attention heads in circuit: ['a11.h8', 'a11.h0', 'a4.h11', 'a9.h3', 'a6.h11', 'a8.h5', 'a9.h10', 'a5.h3', 'a10.h5', 'a11.h3', 'a3.h9', 'a10.h9', 'a4.h9', 'a4.h7', 'a3.h11', 'a8.h7', 'a7.h8', 'a6.h0', 'a4.h0', 'a2.h8', 'a5.h4', 'a8.h10', 'a5.h7', 'a4.h1', 'a6.h8', 'a5.h2', 'a11.h11', 'a6.h7', 'a8.h4', 'a3.h2', 'a8.h8', 'a2.h5', 'a6.h4', 'a7.h9', 'a7.h3', 'a4.h3', 'a2.h2', 'a3.h6', 'a6.h5', 'a11.h4', 'a2.h3', 'a8.h2', 'a1.h0']


```python
# TODO: Analyze attention head distribution by layer
attention_heads = ['a11.h8', 'a11.h0', 'a4.h11', 'a9.h3', 'a6.h11', 'a8.h5', 
                   'a9.h10', 'a5.h3', 'a10.h5', 'a11.h3', 'a3.h9', 'a10.h9', 
                   'a4.h9', 'a4.h7', 'a3.h11', 'a8.h7', 'a7.h8', 'a6.h0', 
                   'a4.h0', 'a2.h8', 'a5.h4', 'a8.h10', 'a5.h7', 'a4.h1', 
                   'a6.h8', 'a5.h2', 'a11.h11', 'a6.h7', 'a8.h4', 'a3.h2', 
                   'a8.h8', 'a2.h5', 'a6.h4', 'a7.h9', 'a7.h3', 'a4.h3', 
                   'a2.h2', 'a3.h6', 'a6.h5', 'a11.h4', 'a2.h3', 'a8.h2', 'a1.h0']

# TODO: Initialize a counter for each layer (0-11)
layer_counts = None  # Use a dictionary or list

# TODO: Parse each attention head name to extract layer number
# Example: 'a11.h8' -> layer 11
for head in attention_heads:
    # TODO: Extract layer number from head name
    layer = None  # Parse the layer number
    # TODO: Increment the count for this layer
    pass

# TODO: Print count per layer
print("Heads per layer:")
for layer in range(12):
    # TODO: Print layer and count
    pass

# TODO: Calculate stage totals
early_total = None    # Layers 0-3
middle_total = None   # Layers 4-7
late_total = None     # Layers 8-11

print(f"\nStage totals:")
print(f"Early (L0-L3): {early_total} heads")
print(f"Middle (L4-L7): {middle_total} heads")
print(f"Late (L8-L11): {late_total} heads")

# TODO: Verify against documentation
print(f"\nVerification:")
print(f"Early matches doc (9): {early_total == 9}")
print(f"Middle matches doc (19): {middle_total == 19}")
print(f"Late matches doc (15): {late_total == 15}")
```

### Code Question CQ3

Write code to analyze the relative contribution of MLPs versus attention heads to the sarcasm circuit in terms of dimensions.

Your code should:
1. Calculate the total dimensions from MLPs (10 MLPs × 768 dims)
2. Calculate the total dimensions from attention heads (43 heads × 64 dims)
3. Calculate the percentage of total circuit dimensions (excluding input) contributed by:
   - MLPs
   - Attention heads
4. Print the results and explain what this ratio tells us about whether the circuit is "MLP-dominant" or "attention-dominant"

Note: Exclude the input embedding (768 dims) from the percentage calculation since it's always required.

Expected conclusion: The circuit should be MLP-dominant with MLPs contributing ~74% of non-input dimensions.


```python
# TODO: Analyze MLP vs attention head contribution
# Define dimensions and counts
d_model = 768
d_head = 64
num_mlps = 10
num_attention_heads = 43

# TODO: Calculate total dimensions from each component type
mlp_dims = None         # TODO: Calculate
attn_dims = None        # TODO: Calculate

# TODO: Calculate total non-input dimensions
total_non_input = None  # TODO: Calculate (exclude input embedding)

# TODO: Calculate percentages
mlp_percentage = None   # TODO: Calculate
attn_percentage = None  # TODO: Calculate

# Print results
print(f"MLP dimensions: {mlp_dims}")
print(f"Attention head dimensions: {attn_dims}")
print(f"Total non-input dimensions: {total_non_input}")
print(f"\nPercentage contribution:")
print(f"MLPs: {mlp_percentage:.1f}%")
print(f"Attention heads: {attn_percentage:.1f}%")

# TODO: Determine if circuit is MLP-dominant or attention-dominant
if mlp_percentage > attn_percentage:
    print("\nConclusion: Circuit is MLP-dominant")
else:
    print("\nConclusion: Circuit is attention-dominant")
```