# Session 2: Architectural Prerequisites
## Why Next-Token Prediction Fails

**Production LLM Deployment: Risk Characterization Before Failure**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Javihaus/Production_LLM_Deployment/blob/main/sessions/session_02_architectural_prerequisites/notebook.ipynb)

---

**Learning Objectives:**
1. Distinguish between discrete and continuous representations
2. Identify pattern matching vs computational processes
3. Recognize biological existence proofs for specialized reasoning
4. Apply Lake & Baroni's compositional generalization framework

## Setup

In [None]:
!pip install -q anthropic numpy pandas matplotlib seaborn

import anthropic
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from typing import List, Dict, Tuple
import json
import time

plt.style.use('seaborn-v0_8-whitegrid')
%matplotlib inline

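import os

try:
    from google.colab import userdata
    api_key = userdata.get('ANTHROPIC_API_KEY')
except Exception:
    # Not running in Colab, or the Colab secret is unavailable:
    # fall back to an environment variable.
    api_key = os.environ.get('ANTHROPIC_API_KEY')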

client = anthropic.Anthropic(api_key=api_key)
print("Setup complete!")

## Part 1: Discrete vs Continuous Representations

LLMs operate on discrete tokens. This is fundamental to their architecture and determines what they can and cannot do.

| Task Type | Representation | LLM Fit |
|-----------|---------------|----------|
| Text completion | Discrete (tokens) | Excellent |
| Classification | Discrete (categories) | Good |
| Arithmetic | Continuous (magnitudes) | Poor |
| Temporal reasoning | Continuous (durations) | Very Poor |

In [None]:
def test_discrete_vs_continuous():
    """Demonstrate LLM performance on discrete vs continuous tasks."""
    
    tests = [
        # Discrete tasks (should work well)
        {
            "type": "discrete",
            "name": "Category classification",
            "prompt": "Classify this as POSITIVE or NEGATIVE: 'I love this product!'\nAnswer:",
            "expected": "POSITIVE"
        },
        {
            "type": "discrete",
            "name": "Pattern completion",
            "prompt": "Complete the sequence: A, B, C, D, ___",
            "expected": "E"
        },
        # Continuous tasks (may fail)
        {
            "type": "continuous",
            "name": "Novel arithmetic",
            "prompt": "Calculate: 847 * 293 = ?\nAnswer with just the number:",
            "expected": "248171"
        },
        {
            "type": "continuous",
            "name": "Time calculation",
            "prompt": "If it's 9:47 AM and I wait 3 hours and 38 minutes, what time is it?\nAnswer with just the time:",
            "expected": "1:25 PM"
        },
        {
            "type": "continuous",
            "name": "Duration comparison",
            "prompt": "Which is longer: 2 hours 45 minutes or 175 minutes?\nAnswer with just the one that is longer:",
            "expected": "175 minutes"
        }
    ]
    
    results = []
    
    print("=" * 60)
    print("DISCRETE VS CONTINUOUS TASK PERFORMANCE")
    print("=" * 60)
    
    for test in tests:
        response = client.messages.create(
            model="claude-sonnet-4-5-20250929",
            max_tokens=50,
            messages=[{"role": "user", "content": test["prompt"]}]
        )
        
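        answer = response.content[0].text.strip()
        # Match whole words only, so a short expected value such as "E" is not
        # found inside an unrelated word, and ignore punctuation so that
        # "248,171" still matches "248171".
        normalized = " ".join(answer.lower().translate(str.maketrans("", "", ",.!?*")).split())
        correct = f" {test['expected'].lower()} " in f" {normalized} "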
        
        results.append({
            "type": test["type"],
            "name": test["name"],
            "expected": test["expected"],
            "got": answer[:50],
            "correct": correct
        })
        
        print(f"\n{test['type'].upper()}: {test['name']}")
        print(f"  Expected: {test['expected']}")
        print(f"  Got: {answer[:50]}")
        print(f"  Correct: {'YES' if correct else 'NO'}")
        
        time.sleep(0.5)
    
    # Summary
    df = pd.DataFrame(results)
    print("\n" + "=" * 60)
    print("SUMMARY")
    print("=" * 60)
    
    for task_type in ["discrete", "continuous"]:
        subset = df[df["type"] == task_type]
        accuracy = subset["correct"].mean()
        print(f"{task_type.capitalize()} tasks: {accuracy:.1%} accuracy")
    
    return df

results_df = test_discrete_vs_continuous()
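
The expected answers for the continuous tasks above can be derived rather than computed by hand. The sketch below recomputes them with the standard library only; the helper names `add_duration` and `longer_duration` are illustrative and not part of the notebook.

```python
from datetime import datetime, timedelta


def add_duration(start: str, hours: int, minutes: int) -> str:
    """Add a duration to a 12-hour clock time such as '9:47 AM'."""
    start_time = datetime.strptime(start, "%I:%M %p")
    end_time = start_time + timedelta(hours=hours, minutes=minutes)
    # Build the string by hand so the hour has no leading zero on any platform.
    hour_12 = end_time.hour % 12 or 12
    suffix = "AM" if end_time.hour < 12 else "PM"
    return f"{hour_12}:{end_time.minute:02d} {suffix}"


def longer_duration(a_minutes: int, b_minutes: int) -> int:
    """Return the longer of two durations expressed in minutes."""
    return max(a_minutes, b_minutes)


product = 847 * 293
arrival = add_duration("9:47 AM", hours=3, minutes=38)
longer = longer_duration(2 * 60 + 45, 175)

print(product)              # 248171
print(arrival)              # 1:25 PM
print(f"{longer} minutes")  # 175 minutes
```

Deriving the ground truth this way keeps the `expected` fields honest if the prompts are later changed.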

## Part 2: Pattern Matching vs Computation

LLMs excel at pattern matching against training data. They struggle with genuine computation that requires:
- Maintaining intermediate state
- Applying rules systematically
- Handling novel inputs
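
As a point of contrast, the three requirements above can be seen in a minimal sketch of schoolbook addition. The function name `add_decimal_strings` is an illustrative assumption; the point is that the carry is explicit intermediate state, the same column rule is applied at every step, and the procedure works for inputs it has never seen.

```python
def add_decimal_strings(a: str, b: str) -> str:
    """Add two non-negative integers given as digit strings, column by column."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)

    carry = 0    # intermediate state passed from one column to the next
    digits = []
    for digit_a, digit_b in zip(reversed(a), reversed(b)):
        total = int(digit_a) + int(digit_b) + carry  # same rule for every column
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))


print(add_decimal_strings("7847", "2938"))  # 10785
```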

In [None]:
def test_pattern_vs_computation():
    """Test whether model is pattern matching or computing."""
    
    # Test 1: Common vs Novel arithmetic
    arithmetic_tests = [
        # Common (likely in training data)
        {"prompt": "What is 2 + 2?", "expected": "4", "type": "common"},
        {"prompt": "What is 10 * 10?", "expected": "100", "type": "common"},
        # Novel (unlikely in training data)
        {"prompt": "What is 7847 + 2938?", "expected": "10785", "type": "novel"},
        {"prompt": "What is 6291 - 4873?", "expected": "1418", "type": "novel"},
    ]
    
    print("=" * 60)
    print("PATTERN MATCHING VS COMPUTATION: ARITHMETIC")
    print("=" * 60)
    
    results = []
    for test in arithmetic_tests:
        response = client.messages.create(
            model="claude-sonnet-4-5-20250929",
            max_tokens=20,
            messages=[{"role": "user", "content": f"{test['prompt']} Answer with just the number."}]
        )
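        answer = response.content[0].text.strip()
        # Compare against whole tokens so that "4" does not match "14", and
        # drop thousands separators so that "10,785" matches "10785".
        tokens = [token.strip(".!?*") for token in answer.replace(",", "").split()]
        correct = test["expected"] in tokens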
        
        results.append({
            "type": test["type"],
            "prompt": test["prompt"],
            "expected": test["expected"],
            "got": answer,
            "correct": correct
        })
        
        print(f"\n{test['type'].upper()}: {test['prompt']}")
        print(f"  Expected: {test['expected']}, Got: {answer}, Correct: {'YES' if correct else 'NO'}")
        
        time.sleep(0.3)
    
    return pd.DataFrame(results)

arithmetic_df = test_pattern_vs_computation()
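
The arithmetic test returns per-prompt results but does not summarize them by group. A minimal sketch of such a summary follows; `accuracy_by_type` is a hypothetical helper, and the rows passed to it here are placeholders that show the input shape, not real model results.

```python
from collections import defaultdict
from typing import Dict, List


def accuracy_by_type(records: List[dict]) -> Dict[str, float]:
    """Return the fraction of correct answers for each value of 'type'."""
    outcomes = defaultdict(list)
    for record in records:
        outcomes[record["type"]].append(bool(record["correct"]))
    return {task_type: sum(flags) / len(flags) for task_type, flags in outcomes.items()}


# Placeholder rows to show the shape of the input; NOT real model results.
placeholder_records = [
    {"type": "common", "correct": True},
    {"type": "common", "correct": True},
    {"type": "novel", "correct": True},
    {"type": "novel", "correct": False},
]
summary = accuracy_by_type(placeholder_records)
print(summary)
```

With the notebook's own data, the call would be `accuracy_by_type(arithmetic_df.to_dict("records"))`. With only two prompts per group, the resulting fractions are coarse.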

In [None]:
def test_compositional_generalization():
    """Test Lake & Baroni style compositional generalization."""
    
    # Define primitives
    print("=" * 60)
    print("COMPOSITIONAL GENERALIZATION (Lake & Baroni)")
    print("=" * 60)
    
    # Training-like examples (primitives)
    training_prompt = """Learn these definitions:
- 'dax' means 'jump'
- 'wif' means 'twice'
- 'lug' means 'walk'
- 'zup' means 'opposite direction'

Examples:
- 'dax' = jump
- 'wif dax' = jump jump
- 'lug' = walk
"""
    
    # Test novel compositions
    composition_tests = [
        {"input": "wif lug", "expected": "walk walk", "difficulty": "easy"},
        {"input": "wif wif dax", "expected": "jump jump jump jump", "difficulty": "medium"},
        {"input": "zup dax", "expected": "opposite of jump / jump backwards", "difficulty": "hard"},
    ]
    
    results = []
    
    for test in composition_tests:
        prompt = f"""{training_prompt}
Now interpret: '{test['input']}'
What actions does this represent?"""
        
        response = client.messages.create(
            model="claude-sonnet-4-5-20250929",
            max_tokens=100,
            messages=[{"role": "user", "content": prompt}]
        )
        
        answer = response.content[0].text.strip()
        
        print(f"\n{test['difficulty'].upper()}: '{test['input']}'")
        print(f"  Expected: {test['expected']}")
        print(f"  Got: {answer[:100]}")
        
        results.append({
            "difficulty": test["difficulty"],
            "input": test["input"],
            "response": answer
        })
        
        time.sleep(0.5)
    
    return results

comp_results = test_compositional_generalization()
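
For comparison, a rule-based interpreter for the same mini-language applies the definitions systematically to any composition. This is a sketch under stated assumptions: the name `interpret` is illustrative, phrases are taken to be zero or more modifiers followed by one primitive, and the handling of `zup` is a guess, since the prompt only defines it as "opposite direction".

```python
from typing import List

PRIMITIVES = {"dax": "jump", "lug": "walk"}


def interpret(phrase: str) -> List[str]:
    """Interpret modifiers followed by one primitive, e.g. 'wif wif dax'."""
    *modifiers, primitive = phrase.split()
    actions = [PRIMITIVES[primitive]]
    # Apply modifiers from the one closest to the primitive outwards.
    for modifier in reversed(modifiers):
        if modifier == "wif":
            actions = actions * 2
        elif modifier == "zup":
            # Assumption: 'zup' tags each action with a reversed direction.
            actions = [f"{action} (opposite direction)" for action in actions]
        else:
            raise ValueError(f"Unknown modifier: {modifier!r}")
    return actions


print(interpret("wif lug"))      # ['walk', 'walk']
print(interpret("wif wif dax"))  # ['jump', 'jump', 'jump', 'jump']
print(interpret("zup dax"))      # ['jump (opposite direction)']
```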

## Part 3: Architectural Prerequisites Framework

Use this framework to assess whether your task's requirements match LLM capabilities.

In [None]:
class ArchitecturalPrerequisiteAnalyzer:
    """Framework for analyzing architectural prerequisites."""
    
    PREREQUISITES = {
        "discrete_representation": {
            "description": "Task can be represented with discrete symbols/tokens",
            "llm_provides": True,
            "examples": ["text classification", "named entity recognition", "translation"]
        },
        "continuous_representation": {
            "description": "Task requires continuous magnitude representation",
            "llm_provides": False,
            "examples": ["precise arithmetic", "physical simulation", "temporal duration"]
        },
        "state_maintenance": {
            "description": "Task requires maintaining and updating state over time",
            "llm_provides": False,
            "examples": ["tracking inventory", "game state", "evolving constraints"]
        },
        "compositional_generalization": {
            "description": "Task requires systematic combination of primitives",
            "llm_provides": False,
            "examples": ["novel algorithm design", "proof construction", "planning"]
        },
        "pattern_completion": {
            "description": "Task involves completing familiar patterns",
            "llm_provides": True,
            "examples": ["text completion", "code completion", "style transfer"]
        },
        "retrieval": {
            "description": "Task requires recalling facts from training data",
            "llm_provides": "Partial",
            "examples": ["factual Q&A", "definitions", "general knowledge"]
        }
    }
    
    def analyze(self, task_name: str, required_prerequisites: List[str]) -> Dict:
        """Analyze whether LLM can handle the task."""
        
        analysis = {
            "task": task_name,
            "prerequisites": [],
            "gaps": [],
            "llm_fit": None,
            "recommendation": None
        }
        
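        for prereq in required_prerequisites:
            if prereq not in self.PREREQUISITES:
                # Fail loudly: a misspelled name would otherwise be skipped
                # and could make the task look like a better fit than it is.
                raise ValueError(f"Unknown prerequisite: {prereq!r}")

            prereq_info = self.PREREQUISITES[prereq]
            provides = prereq_info["llm_provides"]

            analysis["prerequisites"].append({
                "name": prereq,
                "llm_provides": provides,
                "description": prereq_info["description"]
            })

            # "Partial" is deliberately not counted as a gap.
            if provides is False:
                analysis["gaps"].append(prereq)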
        
        # Determine overall fit
        if len(analysis["gaps"]) == 0:
            analysis["llm_fit"] = "Good"
            analysis["recommendation"] = "LLM-only approach likely sufficient. Proceed with testing."
        elif len(analysis["gaps"]) == 1:
            analysis["llm_fit"] = "Partial"
            analysis["recommendation"] = f"Consider hybrid architecture to address: {analysis['gaps'][0]}"
        else:
            analysis["llm_fit"] = "Poor"
            analysis["recommendation"] = f"Hybrid architecture required. Gaps: {', '.join(analysis['gaps'])}"
        
        return analysis
    
    def print_analysis(self, analysis: Dict):
        """Print formatted analysis."""
        print("=" * 60)
        print(f"ARCHITECTURAL ANALYSIS: {analysis['task']}")
        print("=" * 60)
        
        print("\nRequired Prerequisites:")
        for prereq in analysis["prerequisites"]:
            provides = prereq["llm_provides"]
            status = "YES" if provides is True else ("PARTIAL" if provides == "Partial" else "NO")
            print(f"  - {prereq['name']}: LLM provides = {status}")
        
        print(f"\nArchitectural Gaps: {analysis['gaps'] if analysis['gaps'] else 'None'}")
        print(f"LLM Fit: {analysis['llm_fit']}")
        print(f"\nRecommendation: {analysis['recommendation']}")


# Example analyses
analyzer = ArchitecturalPrerequisiteAnalyzer()

# Example 1: Document summarization (good fit)
analysis1 = analyzer.analyze(
    "Document Summarization",
    ["discrete_representation", "pattern_completion"]
)
analyzer.print_analysis(analysis1)

print("\n")

# Example 2: Medical scheduling (poor fit)
analysis2 = analyzer.analyze(
    "Medical Appointment Scheduling",
    ["continuous_representation", "state_maintenance", "pattern_completion"]
)
analyzer.print_analysis(analysis2)
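
When the analysis reports a gap such as `state_maintenance`, one hybrid option is to keep the state in ordinary program code and limit the LLM to producing structured commands. The sketch below illustrates that idea for the "tracking inventory" example; the `Inventory` class and the command format are hypothetical, and the commands are written by hand rather than produced by a model.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Inventory:
    """Holds stock levels in program state rather than in a prompt."""
    stock: Dict[str, int] = field(default_factory=dict)

    def apply(self, command: dict) -> int:
        """Apply a structured command and return the item's new stock level."""
        item, quantity = command["item"], command["quantity"]
        current = self.stock.get(item, 0)
        if command["action"] == "add":
            updated = current + quantity
        elif command["action"] == "remove":
            if quantity > current:
                raise ValueError(f"Cannot remove {quantity} {item}; only {current} in stock")
            updated = current - quantity
        else:
            raise ValueError(f"Unknown action: {command['action']!r}")
        self.stock[item] = updated
        return updated


# Hypothetical commands, written by hand here; in a hybrid system an LLM
# might produce them from free-text requests.
inventory = Inventory()
inventory.apply({"action": "add", "item": "gloves", "quantity": 40})
remaining = inventory.apply({"action": "remove", "item": "gloves", "quantity": 15})
print(remaining)  # 25
```

Because every update goes through `apply`, invalid operations are rejected deterministically instead of depending on the model to track the running total.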

## Part 4: Biological Existence Proofs

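Biological nervous systems handle timing with dedicated mechanisms. This shows that specialized timing machinery is achievable, and it suggests that general pattern matching alone may not be enough for temporal reasoning. None of the systems below works by pattern matching.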

In [None]:
# Tabulate biological timing mechanisms

biological_systems = {
    "System": [
        "Hippocampal Time Cells",
        "Cerebellar Timing",
        "Interval Timing (Dopaminergic)",
        "Circadian Rhythms"
    ],
    "Time Scale": [
        "Seconds to minutes",
        "Milliseconds",
        "Seconds to hours",
        "~24 hours"
    ],
    "Mechanism": [
        "Sequential neural firing",
        "Parallel fiber delays",
        "Dopamine accumulation",
        "Transcription-translation loops"
    ],
    "Pattern Matching?": [
        "No - computational",
        "No - physical delay lines",
        "No - accumulator model",
        "No - molecular oscillator"
    ]
}

df_bio = pd.DataFrame(biological_systems)

print("=" * 80)
print("BIOLOGICAL TIMING MECHANISMS: Existence Proofs for Specialized Processing")
print("=" * 80)
print()
print(df_bio.to_string(index=False))
print()
print("Key Insight: None of these systems work by pattern matching on symbolic")
print("representations. They use dedicated computational mechanisms.")
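
To make the "accumulator model" entry concrete, here is a toy sketch of the general pacemaker-accumulator idea: pulses arrive at a fixed period and are counted, and intervals are compared by their counts. The function names and the 100 ms pulse period are arbitrary assumptions, and this is not a model of dopaminergic circuitry.

```python
def count_pulses(duration_ms: int, pulse_period_ms: int = 100) -> int:
    """Count pacemaker pulses accumulated over an interval (toy model)."""
    accumulator = 0
    elapsed = 0
    while elapsed + pulse_period_ms <= duration_ms:
        elapsed += pulse_period_ms
        accumulator += 1  # the count itself is the magnitude representation
    return accumulator


def is_longer(duration_a_ms: int, duration_b_ms: int) -> bool:
    """Compare two intervals by their accumulated pulse counts."""
    return count_pulses(duration_a_ms) > count_pulses(duration_b_ms)


print(count_pulses(2000))     # 20
print(is_longer(3000, 2000))  # True
```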

## Part 5: Exercise - Analyze Your Deployment Scenario

In [None]:
# YOUR EXERCISE: Analyze your deployment scenario

analyzer = ArchitecturalPrerequisiteAnalyzer()

# Fill in your scenario
my_analysis = analyzer.analyze(
    "YOUR TASK NAME HERE",  # e.g., "Customer Support Chatbot"
    [
        # Uncomment the prerequisites your task requires:
        "discrete_representation",
        "pattern_completion",
        # "continuous_representation",
        # "state_maintenance",
        # "compositional_generalization",
        # "retrieval",
    ]
)

analyzer.print_analysis(my_analysis)

## Key Takeaways

1. **LLMs operate on discrete tokens.** Tasks that require continuous magnitude representation, such as precise arithmetic or duration comparison, are prone to failure.

2. **Pattern matching ≠ computation.** High accuracy on common examples doesn't guarantee accuracy on novel inputs.

3. **Biology provides existence proofs.** Reliable temporal reasoning requires specialized mechanisms, not pattern matching.

4. **Analyze prerequisites before building.** Understanding architectural fit prevents wasted effort.

5. **Gaps indicate hybrid architecture needs.** When LLMs can't provide a prerequisite, add specialized components.

---

**Homework:** Complete the architectural analysis for your deployment scenario.

**Next Session:** Experimental Design I—Constructing Diagnostic Scenarios