# Week 5 Exercise: Circuits and Mechanistic Analysis

In this exercise, you'll gain hands-on experience with:
- Implementing path patching to trace information flow
- Finding and analyzing induction circuits (K-composition)
- Analyzing binding circuits (Q-composition)
- Testing token vs concept induction mechanisms
- Automated circuit discovery methods
- Circuit minimality and faithfulness testing

## Setup

Install required libraries:

In [None]:
!pip install transformers torch numpy matplotlib einops -q

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt
from transformers import AutoModelForCausalLM, AutoTokenizer
from einops import rearrange
import warnings
warnings.filterwarnings('ignore')

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Load GPT-2 small
model_name = "gpt2"
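# Request the eager attention implementation so that output_attentions=True
# returns attention weights (recent transformers releases default to SDPA,
# which does not return them).
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")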
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = model.to(device)
model.eval()

print(f"\nModel: {model_name}")
print(f"Number of layers: {model.config.n_layer}")
print(f"Number of heads per layer: {model.config.n_head}")

## Part 1: Path Patching Implementation

Path patching lets us trace information flow from one component to another.
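
To make the idea concrete before working with GPT-2, here is a minimal sketch on a toy two-component residual model. Everything in it (`W_A`, `W_B`, `w_out`, `run`, `path_patch_toy`) is hypothetical and chosen for illustration only. The point it shows is that path patching changes what a downstream component *reads* from an upstream one, while leaving the upstream component's direct contribution to the output untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W_A = rng.normal(size=(d, d))  # hypothetical upstream component A
W_B = rng.normal(size=(d, d))  # hypothetical downstream component B
w_out = rng.normal(size=d)     # hypothetical readout direction


def run(x, a_for_b=None):
    """Toy residual model: out = w_out . (x + A(x) + B(x + A(x))).

    If a_for_b is given, B reads that value in place of A's output on x.
    The direct path from A to the output still uses A's output on x.
    """
    a = W_A @ x
    b_input = x + (a if a_for_b is None else a_for_b)
    b = W_B @ b_input
    return float(w_out @ (x + a + b))


def path_patch_toy(x_clean, x_corrupt):
    """Effect of the A -> B path: feed B the clean A output on a corrupted run."""
    a_clean = W_A @ x_clean
    baseline = run(x_corrupt)
    patched = run(x_corrupt, a_for_b=a_clean)
    return patched - baseline


x_clean = rng.normal(size=d)
x_corrupt = rng.normal(size=d)
effect = path_patch_toy(x_clean, x_corrupt)
print(f"A -> B path effect: {effect:.4f}")
```

Because the toy model is linear, the measured effect equals `w_out @ W_B @ W_A @ (x_clean - x_corrupt)`, which is a handy sanity check. In a real transformer the same idea is usually implemented with forward hooks rather than an extra function argument.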

In [None]:
def get_attention_output(model, input_ids, layer_idx, head_idx):
    """
    Extract the output of a specific attention head.
    
    Args:
        model: The transformer model
        input_ids: Input token IDs [batch_size, seq_len]
        layer_idx: Which layer
        head_idx: Which head in that layer
    
    Returns:
        head_output: [batch_size, seq_len, head_dim] tensor
    """
    with torch.no_grad():
        outputs = model(input_ids, output_attentions=True, output_hidden_states=True)
        
        # Get attention weights for the layer
        # Shape: [batch_size, num_heads, seq_len, seq_len]
        attention_weights = outputs.attentions[layer_idx]
        
        # Get the residual stream entering this block and apply the block's
        # first LayerNorm: in GPT-2 the attention projection reads ln_1(x), not x
        layer = model.transformer.h[layer_idx]
        hidden_states = layer.ln_1(outputs.hidden_states[layer_idx])
        
        # Get the specific head's attention pattern
        head_attention = attention_weights[:, head_idx, :, :]  # [batch, seq_len, seq_len]
        
        # Compute values (simplified for educational purposes: the output
        # projection c_proj is not applied, so the result stays in head space)
        
        # Project hidden states to get values
        qkv = layer.attn.c_attn(hidden_states)
        qkv = qkv.split(model.config.n_embd, dim=2)
        values = qkv[2]  # [batch, seq_len, n_embd]
        
        # Reshape to separate heads
        batch_size, seq_len, _ = values.shape
        head_dim = model.config.n_embd // model.config.n_head
        values = values.view(batch_size, seq_len, model.config.n_head, head_dim)
        
        # Get this head's values
        head_values = values[:, :, head_idx, :]  # [batch, seq_len, head_dim]
        
        # Apply attention: output = attention_weights @ values
        head_output = torch.bmm(head_attention, head_values)  # [batch, seq_len, head_dim]
        
    return head_output


def path_patch(model, clean_input, corrupted_input, 
               source_layer, source_head, 
               target_layer, target_head,
               composition_type='K'):
    """
    Perform path patching from source head to target head.
    
    Args:
        model: The transformer model
        clean_input: Clean input token IDs
        corrupted_input: Corrupted input token IDs
        source_layer: Source attention layer
        source_head: Source attention head
        target_layer: Target attention layer
        target_head: Target attention head
        composition_type: 'Q', 'K', or 'V' for query/key/value composition
    
    Returns:
        effect: The causal effect of this path
    """
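    # This is a simplified implementation for educational purposes.
    # It only compares the source head's output on the two inputs; target_layer,
    # target_head, and composition_type are accepted but not used. A full
    # implementation would require custom forward hooks that patch the source
    # output into the target head's Q, K, or V input only.
    # Note: clean_input and corrupted_input must tokenize to the same length.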
    
    # Get source head outputs for clean and corrupted
    clean_source_output = get_attention_output(model, clean_input, source_layer, source_head)
    corrupted_source_output = get_attention_output(model, corrupted_input, source_layer, source_head)
    
    # Compute the difference
    source_diff = clean_source_output - corrupted_source_output
    
    # Measure effect (simplified metric: L2 norm of difference)
    effect = source_diff.norm().item()
    
    return effect


# Test path patching
clean_text = "When Mary and John went to the store, Mary gave a drink to"
corrupted_text = "When Mary and John went to the store, Alice gave a drink to"

clean_input = tokenizer(clean_text, return_tensors="pt").input_ids.to(device)
corrupted_input = tokenizer(corrupted_text, return_tensors="pt").input_ids.to(device)

# Test path from layer 2, head 0 to layer 6, head 3
effect = path_patch(model, clean_input, corrupted_input, 
                   source_layer=2, source_head=0,
                   target_layer=6, target_head=3,
                   composition_type='K')

print(f"Path effect (layer 2, head 0 → layer 6, head 3): {effect:.4f}")

## Part 2: Finding Induction Circuits

Let's identify the induction circuit using systematic search.
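
As a reference point for the two scores used in this part, the sketch below computes them on hand-built attention matrices with NumPy. The helper names `previous_token_score` and `induction_score` and the repeated-block setup (a block of `period` tokens repeated twice) are assumptions made for this illustration; they are not the functions defined in the cell that follows.

```python
import numpy as np


def previous_token_score(attn):
    """Mean attention from each position i >= 1 to position i - 1."""
    attn = np.asarray(attn, dtype=float)
    if attn.shape[0] < 2:
        return 0.0
    # The first subdiagonal holds the entries attn[i, i - 1].
    return float(np.diagonal(attn, offset=-1).mean())


def induction_score(attn, period):
    """Mean attention from each token in the second copy of a repeated block
    to the token that followed its first occurrence.

    attn is [2 * period, 2 * period] for a block of `period` tokens repeated twice.
    """
    attn = np.asarray(attn, dtype=float)
    rows = np.arange(period, 2 * period)
    cols = rows - period + 1
    return float(attn[rows, cols].mean())


# An idealized previous-token head on a 6-token sequence.
seq_len = 6
prev_attn = np.zeros((seq_len, seq_len))
prev_attn[0, 0] = 1.0
prev_attn[np.arange(1, seq_len), np.arange(seq_len - 1)] = 1.0

# An idealized induction head on a 3-token block repeated twice.
period = 3
ind_attn = np.zeros((2 * period, 2 * period))
ind_attn[np.arange(period), 0] = 1.0  # first copy: nothing to copy from yet
second_copy = np.arange(period, 2 * period)
ind_attn[second_copy, second_copy - period + 1] = 1.0

print(previous_token_score(prev_attn))    # 1.0
print(induction_score(ind_attn, period))  # 1.0
```

Real heads will score somewhere between 0 and 1 on both measures, and using repeated random token blocks rather than natural text helps separate induction from ordinary language-modeling behavior.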

In [None]:
def detect_previous_token_heads(model, tokenizer, test_texts):
    """
    Find attention heads that attend from each token to the previous token.
    
    Args:
        model: The transformer model
        tokenizer: Tokenizer
        test_texts: List of test strings
    
    Returns:
        scores: [n_layers, n_heads] array of previous-token scores
    """
    n_layers = model.config.n_layer
    n_heads = model.config.n_head
    
    scores = np.zeros((n_layers, n_heads))
    
    for text in test_texts:
        inputs = tokenizer(text, return_tensors="pt").to(device)
        
        with torch.no_grad():
            outputs = model(**inputs, output_attentions=True)
            attentions = outputs.attentions
        
        for layer_idx in range(n_layers):
            # Get attention weights [batch, n_heads, seq_len, seq_len]
            attn = attentions[layer_idx][0]  # Remove batch dim
            
            for head_idx in range(n_heads):
                head_attn = attn[head_idx]  # [seq_len, seq_len]
                
                # Check if this head attends to the previous token:
                # the first subdiagonal holds attention from position i to i-1
                if head_attn.shape[0] > 1:
                    prev_token_attn = torch.diagonal(head_attn, offset=-1).mean().item()
                else:
                    prev_token_attn = 0.0
                scores[layer_idx, head_idx] += prev_token_attn
    
    # Average over test texts
    scores /= len(test_texts)
    
    return scores


def detect_induction_heads(model, tokenizer, test_patterns):
    """
    Find attention heads that implement induction (pattern copying).
    
    Args:
        model: The transformer model
        tokenizer: Tokenizer
        test_patterns: List of (pattern, continuation) tuples
    
    Returns:
        scores: [n_layers, n_heads] array of induction scores
    """
    n_layers = model.config.n_layer
    n_heads = model.config.n_head
    
    scores = np.zeros((n_layers, n_heads))
    
    for pattern, continuation in test_patterns:
        # Create induction test: [A][B] ... [A] -> should attend to B.
        # The leading space makes both copies of the pattern tokenize the same
        # way under GPT-2's BPE ("Mary" and " Mary" are different tokens).
        text = f" {pattern} {continuation} ... {pattern}"
        inputs = tokenizer(text, return_tensors="pt").to(device)
        
        with torch.no_grad():
            outputs = model(**inputs, output_attentions=True)
            attentions = outputs.attentions
        
        # The continuation starts right after the first copy of the pattern
        pattern_tokens = tokenizer.tokenize(f" {pattern}")
        continuation_pos = len(pattern_tokens)
        
        for layer_idx in range(n_layers):
            attn = attentions[layer_idx][0]
            
            for head_idx in range(n_heads):
                head_attn = attn[head_idx]
                
                # Check if final position attends to the continuation token
                # This is a simplified heuristic
                if head_attn.shape[0] > continuation_pos + 1:
                    induction_score = head_attn[-1, continuation_pos].item()
                    scores[layer_idx, head_idx] += induction_score
    
    scores /= len(test_patterns)
    
    return scores


# Test on example patterns
test_texts = [
    "The quick brown fox jumps over",
    "When Mary and John went to",
    "In the beginning there was"
]

test_patterns = [
    ("Mary", "and"),
    ("the", "quick"),
    ("cat", "sat")
]

print("Finding previous-token heads...")
prev_token_scores = detect_previous_token_heads(model, tokenizer, test_texts)

print("\nFinding induction heads...")
induction_scores = detect_induction_heads(model, tokenizer, test_patterns)

# Find top heads
print("\nTop 5 Previous-Token Heads:")
prev_token_flat = prev_token_scores.flatten()
top_prev_indices = np.argsort(prev_token_flat)[-5:][::-1]
for idx in top_prev_indices:
    layer = idx // model.config.n_head
    head = idx % model.config.n_head
    print(f"  Layer {layer}, Head {head}: {prev_token_scores[layer, head]:.4f}")

print("\nTop 5 Induction Heads:")
induction_flat = induction_scores.flatten()
top_induction_indices = np.argsort(induction_flat)[-5:][::-1]
for idx in top_induction_indices:
    layer = idx // model.config.n_head
    head = idx % model.config.n_head
    print(f"  Layer {layer}, Head {head}: {induction_scores[layer, head]:.4f}")

In [None]:
# Visualize the scores
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Previous-token heads
im1 = axes[0].imshow(prev_token_scores, aspect='auto', cmap='viridis')
axes[0].set_xlabel('Head')
axes[0].set_ylabel('Layer')
axes[0].set_title('Previous-Token Head Scores')
plt.colorbar(im1, ax=axes[0])

# Induction heads
im2 = axes[1].imshow(induction_scores, aspect='auto', cmap='viridis')
axes[1].set_xlabel('Head')
axes[1].set_ylabel('Layer')
axes[1].set_title('Induction Head Scores')
plt.colorbar(im2, ax=axes[1])

plt.tight_layout()
plt.show()

## Part 3: K-Composition Analysis for Induction

Test if induction heads use K-composition with previous-token heads.
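
A complementary, weights-only view of composition can be sketched as follows, in the style of the composition scores used in the transformer circuits literature. The function name `composition_scores`, the shape convention, and the random toy matrices are assumptions of this sketch. It measures how strongly a source head's output matrix overlaps with what a later head's query, key, or value pathway reads, using Frobenius norms.

```python
import numpy as np


def composition_scores(W_Q, W_K, W_V, W_O, W_V_src, W_O_src):
    """Weight-based Q/K/V composition scores from a source head to a later head.

    Shape convention (an assumption of this sketch): W_Q, W_K, W_V, W_V_src are
    [d_head, d_model]; W_O, W_O_src are [d_model, d_head]; vectors are columns.
    """
    W_OV_src = W_O_src @ W_V_src   # what the source head writes to the residual
    W_QK = W_Q.T @ W_K             # attention score = x_query^T W_QK x_key
    W_OV = W_O @ W_V

    def ratio(product, left, right):
        # Frobenius norm of the product relative to the norms of its factors
        return float(np.linalg.norm(product)
                     / (np.linalg.norm(left) * np.linalg.norm(right)))

    return {
        "Q": ratio(W_QK.T @ W_OV_src, W_QK, W_OV_src),
        "K": ratio(W_QK @ W_OV_src, W_QK, W_OV_src),
        "V": ratio(W_OV @ W_OV_src, W_OV, W_OV_src),
    }


rng = np.random.default_rng(0)
d_model, d_head = 8, 2


def reads_only(dims):
    """Random [d_head, d_model] matrix that is zero outside the given columns."""
    W = np.zeros((d_head, d_model))
    W[:, dims] = rng.normal(size=(d_head, len(dims)))
    return W


# Source head writes only into residual dimensions 0-1.
W_V_src = rng.normal(size=(d_head, d_model))
W_O_src = reads_only([0, 1]).T
# Target head: keys read dims 0-1, queries read dims 2-3, values read dims 4-5.
W_K, W_Q, W_V = reads_only([0, 1]), reads_only([2, 3]), reads_only([4, 5])
W_O = rng.normal(size=(d_model, d_head))

scores = composition_scores(W_Q, W_K, W_V, W_O, W_V_src, W_O_src)
print(scores)  # only the K entry is nonzero in this construction
```

The toy matrices are built so that the source head writes exactly where the target head's keys read, which is the signature expected of a previous-token head feeding an induction head. On real GPT-2 weights the scores are never this clean, and LayerNorm between blocks is ignored by this kind of analysis.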

In [None]:
def test_composition_type(model, clean_input, corrupted_input,
                         source_layer, source_head,
                         target_layer, target_head):
    """
    Test Q, K, and V composition separately to determine composition type.
    
    Returns:
        Dict with scores for each composition type
    """
    results = {}
    
    for comp_type in ['Q', 'K', 'V']:
        effect = path_patch(model, clean_input, corrupted_input,
                          source_layer, source_head,
                          target_layer, target_head,
                          composition_type=comp_type)
        results[comp_type] = effect
    
    return results


# Test composition between top previous-token head and top induction head
# Using the top heads we found earlier
prev_idx = top_prev_indices[0]
prev_layer = prev_idx // model.config.n_head
prev_head = prev_idx % model.config.n_head

ind_idx = top_induction_indices[0]
ind_layer = ind_idx // model.config.n_head
ind_head = ind_idx % model.config.n_head

print(f"Testing composition: Layer {prev_layer} Head {prev_head} → Layer {ind_layer} Head {ind_head}")

composition_results = test_composition_type(
    model, clean_input, corrupted_input,
    prev_layer, prev_head,
    ind_layer, ind_head
)

print("\nComposition Type Scores:")
for comp_type, score in composition_results.items():
    print(f"  {comp_type}-composition: {score:.4f}")

# Visualize
plt.figure(figsize=(8, 5))
comp_types = list(composition_results.keys())
comp_scores = list(composition_results.values())
plt.bar(comp_types, comp_scores, color=['#ff7f0e', '#2ca02c', '#1f77b4'])
plt.ylabel('Path Effect')
plt.title('Composition Type Analysis for Induction Circuit')
plt.grid(axis='y', alpha=0.3)
plt.show()

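print("\nExpected: K-composition should have the highest score for induction circuits.")
print("Note: the simplified path_patch above ignores composition_type, so the three bars")
print("are identical until you implement hook-based patching of the Q, K, and V inputs.")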

## Part 4: Binding Circuit Analysis (Q-Composition)

Test Q-composition for attribute-entity binding.

In [None]:
def test_binding_circuit(model, tokenizer):
    """
    Test for binding circuits using attribute-entity examples.
    """
    # Clean: correct binding
    clean_text = "The tall person and the short person walked. The tall person sat."
    
    # Corrupted: swapped attributes
    corrupted_text = "The short person and the tall person walked. The tall person sat."
    
    clean_input = tokenizer(clean_text, return_tensors="pt").input_ids.to(device)
    corrupted_input = tokenizer(corrupted_text, return_tensors="pt").input_ids.to(device)
    
    # Search for binding heads
    n_layers = model.config.n_layer
    n_heads = model.config.n_head
    
    binding_scores = np.zeros((n_layers, n_heads))
    
    # Test Q-composition for all head pairs
    print("Searching for binding circuit (this may take a moment)...")
    
    for source_layer in range(n_layers // 2):  # Early layers
        for source_head in range(n_heads):
            for target_layer in range(source_layer + 1, n_layers):  # Later layers
                for target_head in range(n_heads):
                    effect = path_patch(model, clean_input, corrupted_input,
                                      source_layer, source_head,
                                      target_layer, target_head,
                                      composition_type='Q')
                    
                    binding_scores[target_layer, target_head] += effect
    
    return binding_scores


# Find binding circuit
binding_scores = test_binding_circuit(model, tokenizer)

# Visualize
plt.figure(figsize=(10, 6))
plt.imshow(binding_scores, aspect='auto', cmap='plasma')
plt.xlabel('Head')
plt.ylabel('Layer')
plt.title('Binding Circuit Scores (Q-Composition)')
plt.colorbar(label='Path Effect')
plt.show()

# Top binding heads
print("\nTop 5 Binding Heads:")
binding_flat = binding_scores.flatten()
top_binding_indices = np.argsort(binding_flat)[-5:][::-1]
for idx in top_binding_indices:
    layer = idx // model.config.n_head
    head = idx % model.config.n_head
    print(f"  Layer {layer}, Head {head}: {binding_scores[layer, head]:.4f}")

## Part 5: Token vs Concept Induction

Test the dual-route model: token-level vs concept-level induction.

In [None]:
def test_token_vs_concept_induction(model, tokenizer):
    """
    Compare token-level and concept-level induction.
    """
    # Token induction test: exact token repetition
    token_test = "When Alice and Bob went to the store, Alice gave it to"
    
    # Concept induction test: semantic association
    # (Note: This is simplified - real test would involve edited model weights)
    concept_test = "When Paris and London are cities, Paris is the capital of"
    
    token_input = tokenizer(token_test, return_tensors="pt").to(device)
    concept_input = tokenizer(concept_test, return_tensors="pt").to(device)
    
    with torch.no_grad():
        # Token route prediction
        token_outputs = model(**token_input, output_attentions=True, output_hidden_states=True)
        token_logits = token_outputs.logits[0, -1, :]
        token_probs = torch.softmax(token_logits, dim=-1)
        token_top = torch.topk(token_probs, 5)
        
        # Concept route prediction
        concept_outputs = model(**concept_input, output_attentions=True, output_hidden_states=True)
        concept_logits = concept_outputs.logits[0, -1, :]
        concept_probs = torch.softmax(concept_logits, dim=-1)
        concept_top = torch.topk(concept_probs, 5)
    
    print("Token Induction Test:")
    print(f"  Input: {token_test}")
    print("  Top predictions:")
    for prob, idx in zip(token_top.values, token_top.indices):
        token_text = tokenizer.decode([idx])
        print(f"    {token_text}: {prob:.4f}")
    
    print("\nConcept Induction Test:")
    print(f"  Input: {concept_test}")
    print("  Top predictions:")
    for prob, idx in zip(concept_top.values, concept_top.indices):
        token_text = tokenizer.decode([idx])
        print(f"    {token_text}: {prob:.4f}")
    
    # Analyze which layers contribute most to each route
    print("\nAnalyzing route contributions...")
    
    # Attention contribution (token route)
    token_attn_contribution = 0
    for layer_attn in token_outputs.attentions:
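        # Sum attention weights (simplified metric). Each attention row sums to 1,
        # so this total equals n_heads * seq_len per layer and mostly reflects
        # prompt length rather than which route the model used.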
        token_attn_contribution += layer_attn.sum().item()
    
    concept_attn_contribution = 0
    for layer_attn in concept_outputs.attentions:
        concept_attn_contribution += layer_attn.sum().item()
    
    print(f"Token route attention sum: {token_attn_contribution:.2f}")
    print(f"Concept route attention sum: {concept_attn_contribution:.2f}")


test_token_vs_concept_induction(model, tokenizer)

## Part 6: Automated Circuit Discovery (ACDC-style)

Implement a simplified version of automated circuit discovery.
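
The core loop of this kind of discovery can be sketched without a model at all. The example below is a greedy edge-pruning routine in the spirit of ACDC, run on a hypothetical four-edge graph with a made-up additive metric. The names `acdc_prune`, `toy_metric`, and `edge_weights` are illustrative assumptions; the published algorithm scores edges with a divergence between output distributions rather than a sum of weights.

```python
def acdc_prune(edges, metric, threshold):
    """Greedily remove edges whose removal costs less than `threshold`.

    edges: list of (source, target) pairs, ordered from the output backwards.
    metric: callable taking a frozenset of kept edges and returning a score
            where higher means closer to the full model's behavior.
    """
    kept = set(edges)
    current = metric(frozenset(kept))
    for edge in edges:
        trial = kept - {edge}
        score = metric(frozenset(trial))
        # Prune the edge permanently if the metric barely moves.
        if current - score < threshold:
            kept = trial
            current = score
    return kept


# Hypothetical graph: two routes from "in" to "out", one of them negligible.
edge_weights = {
    ("a", "out"): 1.0,
    ("b", "out"): 0.01,
    ("in", "a"): 0.8,
    ("in", "b"): 0.02,
}


def toy_metric(kept_edges):
    """Made-up metric: each kept edge contributes its weight independently."""
    return sum(edge_weights[e] for e in kept_edges)


circuit_toy = acdc_prune(list(edge_weights), toy_metric, threshold=0.05)
print(sorted(circuit_toy))
```

With a threshold of 0.05 only the route through `a` survives. In a real setting each call to the metric is a forward pass with the removed edges patched to corrupted activations, so the number of edges tested dominates the cost.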

In [None]:
def automated_circuit_discovery(model, clean_input, corrupted_input, threshold=0.1):
    """
    Simplified ACDC algorithm for finding circuits.
    
    Args:
        model: The transformer model
        clean_input: Clean input
        corrupted_input: Corrupted input
        threshold: Minimum path effect to include in circuit
    
    Returns:
        circuit_edges: List of (source, target, effect) tuples
    """
    n_layers = model.config.n_layer
    n_heads = model.config.n_head
    
    circuit_edges = []
    
    print("Running automated circuit discovery...")
    print(f"Testing {n_layers * n_heads} components...")
    
    # Test candidate edges (simplified: targets at most two layers after the source)
    for source_layer in range(n_layers - 1):
        for source_head in range(n_heads):
            for target_layer in range(source_layer + 1, min(source_layer + 3, n_layers)):
                for target_head in range(n_heads):
                    # Test this edge
                    effect = path_patch(model, clean_input, corrupted_input,
                                      source_layer, source_head,
                                      target_layer, target_head,
                                      composition_type='K')  # Test K-composition
                    
                    if effect > threshold:
                        circuit_edges.append((
                            (source_layer, source_head),
                            (target_layer, target_head),
                            effect
                        ))
    
    # Sort by effect size
    circuit_edges.sort(key=lambda x: x[2], reverse=True)
    
    return circuit_edges


# Run circuit discovery
circuit = automated_circuit_discovery(model, clean_input, corrupted_input, threshold=0.5)

print(f"\nDiscovered {len(circuit)} edges in the circuit:")
print("\nTop 10 edges:")
for i, (source, target, effect) in enumerate(circuit[:10]):
    src_layer, src_head = source
    tgt_layer, tgt_head = target
    print(f"{i+1}. L{src_layer}H{src_head} → L{tgt_layer}H{tgt_head}: {effect:.4f}")

In [None]:
# Visualize the discovered circuit
def visualize_circuit(circuit_edges, n_layers, n_heads, top_k=20):
    """
    Visualize circuit as a graph.
    """
    # Create adjacency matrix
    adj_matrix = np.zeros((n_layers * n_heads, n_layers * n_heads))
    
    for source, target, effect in circuit_edges[:top_k]:
        src_idx = source[0] * n_heads + source[1]
        tgt_idx = target[0] * n_heads + target[1]
        adj_matrix[src_idx, tgt_idx] = effect
    
    # Plot
    plt.figure(figsize=(12, 10))
    plt.imshow(adj_matrix, cmap='YlOrRd', aspect='auto')
    plt.colorbar(label='Path Effect')
    plt.xlabel(f'Target Component (Layer * {n_heads} + Head)')
    plt.ylabel(f'Source Component (Layer * {n_heads} + Head)')
    plt.title(f'Circuit Connectivity (Top {top_k} edges)')
    
    # Add grid lines between layers
    for layer in range(1, n_layers):
        plt.axhline(y=layer * n_heads - 0.5, color='white', linewidth=0.5, alpha=0.5)
        plt.axvline(x=layer * n_heads - 0.5, color='white', linewidth=0.5, alpha=0.5)
    
    plt.tight_layout()
    plt.show()


visualize_circuit(circuit, model.config.n_layer, model.config.n_head, top_k=20)

## Part 7: Circuit Minimality Testing

Test if each component is necessary for the circuit.
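
What an actual ablation-based necessity score looks like can be shown on a toy additive model, where each component contributes a vector of logits and ablating a component means subtracting its contribution. The function `necessity_by_ablation` and the contribution matrix are hypothetical, and zero-ablation is used here only because it is the simplest choice; mean or resample ablation are common alternatives.

```python
import numpy as np


def necessity_by_ablation(contribs, target):
    """Drop in target probability when each component is zero-ablated.

    contribs: [n_components, vocab_size] array of per-component logit
              contributions; the full logits are their sum (toy assumption).
    target:   index of the expected output token.
    """
    contribs = np.asarray(contribs, dtype=float)

    def target_prob(logits):
        z = np.exp(logits - logits.max())  # numerically stable softmax
        return z[target] / z.sum()

    full_logits = contribs.sum(axis=0)
    baseline = target_prob(full_logits)
    drops = np.array([baseline - target_prob(full_logits - c) for c in contribs])
    return drops, baseline


# Three hypothetical components over a 3-token vocabulary; token 0 is the target.
toy_contribs = [
    [3.0, 0.0, 0.0],  # strongly supports the target
    [0.1, 0.0, 0.0],  # weakly supports the target
    [0.0, 1.0, 0.0],  # supports a competitor
]
drops, baseline = necessity_by_ablation(toy_contribs, target=0)
print(f"baseline p(target) = {baseline:.3f}")
print("probability drop per ablated component:", np.round(drops, 3))
```

A component with a large positive drop is necessary in this sense, one near zero is a candidate for removal from the circuit, and a negative value means the component was working against the target.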

In [None]:
def test_circuit_minimality(model, circuit_edges, clean_input, target_output):
    """
    Test if each edge in the circuit is necessary.
    
    Args:
        model: The model
        circuit_edges: List of circuit edges
        clean_input: Input to test
        target_output: Expected output token
    
    Returns:
        necessity_scores: How much performance drops when each edge is removed
    """
    necessity_scores = []
    
    # Baseline: full model performance (clean_input is a tensor of token IDs,
    # so it is passed positionally rather than unpacked with **)
    with torch.no_grad():
        baseline_outputs = model(clean_input)
        baseline_logits = baseline_outputs.logits[0, -1, :]
        baseline_prob = torch.softmax(baseline_logits, dim=-1)[target_output].item()
    
    print(f"Baseline probability for target: {baseline_prob:.4f}")
    print("\nTesting necessity of each edge...")
    
    # Test ablating each edge (simplified: we'll use a proxy metric)
    for i, (source, target, effect) in enumerate(circuit_edges[:10]):  # Test top 10
        # In a full implementation, we would actually ablate this edge
        # Here we use the effect size as a proxy for necessity
        necessity = effect / baseline_prob if baseline_prob > 0 else 0
        necessity_scores.append(necessity)
        
        src_layer, src_head = source
        tgt_layer, tgt_head = target
        print(f"Edge {i+1}: L{src_layer}H{src_head} → L{tgt_layer}H{tgt_head}")
        print(f"  Necessity score: {necessity:.4f}")
    
    return necessity_scores


# Test minimality
# For this example, let's assume we want the model to predict "John"
target_token = tokenizer.encode(" John")[0]

necessity_scores = test_circuit_minimality(model, circuit, clean_input, target_token)

# Visualize
plt.figure(figsize=(10, 5))
plt.bar(range(len(necessity_scores)), necessity_scores)
plt.xlabel('Edge Index')
plt.ylabel('Necessity Score')
plt.title('Circuit Edge Necessity')
plt.grid(axis='y', alpha=0.3)
plt.show()

## Part 8: Circuit Faithfulness Testing

Test if the circuit works on out-of-distribution examples.
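
One common way to put a number on faithfulness is the fraction of the full model's performance that the circuit alone recovers, measured relative to a fully ablated baseline. The sketch below assumes you already have three scalar metric values (for example, logit differences); the function name `faithfulness` and the example numbers are illustrative only.

```python
def faithfulness(full_metric, circuit_metric, ablated_metric):
    """Fraction of the full model's performance recovered by the circuit.

    full_metric:    metric with the whole model intact
    circuit_metric: metric with everything outside the circuit ablated
    ablated_metric: metric with the circuit ablated as well (the floor)
    """
    gap = full_metric - ablated_metric
    if gap == 0:
        raise ValueError("full and ablated metrics are equal; score is undefined")
    return (circuit_metric - ablated_metric) / gap


# Illustrative numbers, not measurements from GPT-2.
print(faithfulness(full_metric=3.0, circuit_metric=2.4, ablated_metric=0.0))
```

A value near 1 means the circuit accounts for most of the behavior on that dataset, and values well below 1 suggest missing components. Computing the same score on held-out prompt templates gives the out-of-distribution check described above.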

In [None]:
def test_circuit_faithfulness(model, circuit_edges, test_cases):
    """
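    Test if the discovered circuit generalizes to new examples.
    
    Note: this simplified version scores the full model's predictions;
    circuit_edges is accepted but not used. A stricter test would ablate
    everything outside the circuit before scoring.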
    
    Args:
        model: The model
        circuit_edges: Discovered circuit
        test_cases: List of (input, expected_output) pairs
    
    Returns:
        success_rate: Fraction of test cases where circuit produces correct output
    """
    successes = 0
    
    print("Testing circuit faithfulness on new examples...\n")
    
    for i, (test_input, expected) in enumerate(test_cases):
        inputs = tokenizer(test_input, return_tensors="pt").to(device)
        
        with torch.no_grad():
            outputs = model(**inputs)
            logits = outputs.logits[0, -1, :]
            predicted = torch.argmax(logits).item()
            predicted_token = tokenizer.decode([predicted])
        
        success = expected.lower() in predicted_token.lower()
        if success:
            successes += 1
        
        print(f"Test {i+1}:")
        print(f"  Input: {test_input}")
        print(f"  Expected: {expected}")
        print(f"  Predicted: {predicted_token}")
        print(f"  Status: {'✓' if success else '✗'}")
        print()
    
    success_rate = successes / len(test_cases)
    print(f"Overall success rate: {success_rate:.2%}")
    
    return success_rate


# Test cases for induction
test_cases = [
    ("When Alice and Bob went shopping, Alice bought", "Bob"),
    ("The cat and the dog played together. The cat chased", "dog"),
    ("During the meeting, Sarah and Tom disagreed. Sarah said", "Tom"),
    ("In Paris and London, I visited Paris first and then", "London"),
    ("The red car and blue car raced. The red car won and", "blue")
]

faithfulness_score = test_circuit_faithfulness(model, circuit, test_cases)

## Part 9: Comparing Circuit Architectures

Compare induction and binding circuits.

In [None]:
# Summary comparison
print("Circuit Architecture Comparison\n")
print("=" * 60)

print("\nInduction Circuit:")
print("  Primary Composition: K-composition")
print("  Components: 2 heads (previous-token + induction)")
print("  Layer span: Typically 2-6 layers apart")
print("  Function: Pattern copying / in-context learning")
print("  Key insight: Modifies what gets attended TO (keys)")

print("\nBinding Circuit:")
print("  Primary Composition: Q-composition")
print("  Components: 3-5 heads (attribute + entity + binding + query)")
print("  Layer span: Distributed across 4-8 layers")
print("  Function: Attribute-entity association")
print("  Key insight: Modifies WHERE to attend FROM (queries)")

print("\nConcept Induction (Dual-Route):")
print("  Primary Composition: Both attention (token) and MLP (concept)")
print("  Components: Attention heads + MLP layers")
print("  Layer span: Full model")
print("  Function: Pattern copying with semantic fallback")
print("  Key insight: Redundant circuits for robustness")

print("\n" + "=" * 60)

## Part 10: Your Project Circuit Analysis

Template for analyzing circuits for your concept.

In [None]:
# Template for your project
print("Week 5 Project Template: Circuit Discovery for Your Concept\n")

print("1. Define your concept and create test cases")
MY_CONCEPT = "[Your concept here]"
my_test_cases = [
    # (input, expected_output)
    # Add your test cases
]

print("\n2. Create clean and corrupted examples for path patching")
my_clean_examples = [
    # Examples where your concept is present
]

my_corrupted_examples = [
    # Examples where your concept is absent/altered
]

print("\n3. Run automated circuit discovery")
# Use the functions above to find your circuit

print("\n4. Determine composition type (Q/K/V)")
# Test composition channels

print("\n5. Test minimality (ablation)")
# Remove components and measure impact

print("\n6. Test faithfulness (out-of-distribution)")
# Validate on new examples

print("\n7. Compare to known circuit types")
# Is your circuit more like induction, binding, or something novel?

print("\n8. Document your findings")
# Create circuit diagram, write mechanistic explanation

## Summary and Next Steps

In this exercise, you've learned to:
- Implement path patching to trace information flow
- Find induction circuits using systematic search
- Analyze composition types (Q/K/V)
- Compare different circuit architectures
- Test circuit minimality and faithfulness

For your project:
1. Use these techniques to discover the circuit for your concept
2. Create detailed circuit diagrams
3. Test thoroughly on diverse examples
4. Compare your circuit to known patterns
5. Provide mechanistic explanations

The goal is to move from "this component is important" (Week 4) to "this is exactly how the model computes my concept" (Week 5).