# HGM Core Concepts Workshop

A hands-on exploration of AI agent memory and context management using Hierarchical Graph Memory (HGM) concepts.

---

## What You Will Learn

This workshop teaches the foundational concepts that enable AI agents to **remember**, **learn**, and **adapt**. By the end, you'll understand:

| Concept | What It Is | Why It Matters |
|---------|------------|----------------|
| **Context Engineering** | Techniques for managing what information an agent "sees" | Agents have limited context windows; smart selection is critical |
| **Agent State Management** | Tracking focus, sessions, and conversation episodes | Enables personalized, coherent interactions over time |
| **Agentic RAG** | Retrieval-Augmented Generation with active memory promotion | Goes beyond static retrieval to dynamically reorganize knowledge |
| **Hierarchical Memory** | Three-tier architecture with temperature-based placement | Balances speed vs. storage, like human working/long-term memory |

---

## How This Workshop Is Structured

Each section follows a consistent pattern:
1. **Concept Definition** - What is this concept and why does it exist?
2. **Comparison to Literature** - How does this relate to established patterns?
3. **Algorithm Explanation** - The underlying algorithms with pseudocode
4. **Code Implementation** - Working code you can modify and experiment with
5. **Demo Output** - See the concepts in action

---

## Prerequisites

- **Python 3.10+**
- **NumPy** (the only external dependency)
- Basic understanding of AI/LLM concepts

**No external databases, Redis, or API keys needed!** Everything runs in-memory for workshop portability.

---

## Key Insight

> Traditional RAG systems are **passive** - for a given query, they retrieve the same documents regardless of conversation state.
> 
> HGM is **active** - it reorganizes memories based on what's relevant NOW, promoting frequently-accessed knowledge and letting unused knowledge fade.

This mirrors how human memory works: information you use often stays accessible, while rarely-used knowledge becomes harder to recall.

---

## Context Engineering Landscape

Before diving into HGM specifics, the next section situates these concepts within the broader **context engineering** landscape, comparing to:

- **Memory Streams** (Generative Agents paper)
- **MemGPT** (Virtual context management)
- **Reflexion** (Self-reflection patterns)
- **RAPTOR** (Recursive summarization)
- Traditional **RAG** approaches

This will help you understand **what's new** in HGM and **why** each design decision was made.

In [None]:
# Section 1: Setup - Only standard library + numpy
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Optional
import uuid
import re
import time
import numpy as np

# Utility for timestamps
def utcnow() -> datetime:
    return datetime.now(timezone.utc)

print("Setup complete!")
print(f"NumPy version: {np.__version__}")
print("Dependencies: numpy, dataclasses, enum, datetime, uuid, re")

---

## Context Engineering: Where HGM Fits

### What is Context Engineering?

**Context Engineering** is the discipline of designing and managing the information that flows into an AI model's context window. Because context windows are limited (roughly 4K-200K tokens, depending on the model), what you include, and what you leave out, directly affects response quality.

> "Context engineering is the new prompt engineering" — Andrej Karpathy

### The Evolution of Context Management

| Era | Approach | Limitation |
|-----|----------|------------|
| **Prompt Engineering** | Hand-crafted prompts with examples | Static, doesn't scale |
| **RAG v1** | Retrieve docs → stuff into context | No prioritization, token waste |
| **RAG v2** | Chunking + reranking | Still passive, no learning |
| **Agentic RAG** | Active retrieval with memory | Where HGM operates |
| **Context Engineering** | Holistic context curation | Full system design |

### Established Patterns vs HGM Implementation

| Established Pattern | Description | HGM Implementation |
|---------------------|-------------|-------------------|
| **Sliding Window** | Keep last N tokens | Episode boundaries + focus decay |
| **Retrieval-Augmented Generation** | Fetch relevant docs at query time | Three-tier memory with promotion |
| **Memory Streams** (Park et al.) | Time-weighted memory with importance | Temperature scoring (weighted sum of recency, relevance, frequency, and other factors) |
| **Reflection** | Summarize and compress memories | Episode summaries + hierarchy paths |
| **Working Memory** | Limited capacity, fast access store | Hot tier with token-budget LRU |
| **Long-term Memory** | Persistent knowledge storage | Cold tier (PostgreSQL) |
| **Semantic Memory** | Facts and concepts | SEMANTIC memory type |
| **Episodic Memory** | Events and experiences | EPISODIC memory type + Episode tracking |
| **Procedural Memory** | Skills and how-to knowledge | PROCEDURAL memory type |

### Key Concepts from the Literature

#### 1. Memory Streams (Stanford/Google - Generative Agents)
The influential "Generative Agents" paper introduced **memory streams** with:
- **Recency**: Recent memories are more important
- **Importance**: Some memories matter more
- **Relevance**: Context-dependent retrieval

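**HGM's approach**: Temperature scoring combines recency and relevance with access frequency, entity overlap, and agent match in a single metric with tunable weights. There is no separate importance score.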

📄 Paper: [Generative Agents: Interactive Simulacra of Human Behavior](https://arxiv.org/abs/2304.03442) (Park et al., 2023)
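
For comparison with the temperature formula used later in this workshop, here is a minimal sketch of a three-signal ranking in the spirit of memory streams. It is an illustration only: the function name `stream_score`, the equal default weights, and the per-hour decay factor of 0.995 are assumptions chosen for this sketch, and `importance` and `relevance` are assumed to be already scaled to [0, 1].

```python
def stream_score(
    hours_since_access: float,
    importance: float,      # assumed already scaled to [0, 1]
    relevance: float,       # assumed already scaled to [0, 1]
    decay: float = 0.995,   # illustrative per-hour decay factor
    weights: tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted sum of recency, importance, and relevance."""
    recency = decay ** hours_since_access
    w_rec, w_imp, w_rel = weights
    return w_rec * recency + w_imp * importance + w_rel * relevance

# A fresh but weakly relevant memory compared with an old, highly relevant one
fresh = stream_score(hours_since_access=1, importance=0.5, relevance=0.2)
old = stream_score(hours_since_access=200, importance=0.5, relevance=0.9)
print(f"fresh={fresh:.3f} old={old:.3f}")
```

With these inputs the older memory still ranks higher, because its relevance outweighs the recency it has lost.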

#### 2. Reflexion (Shinn et al.)
Self-reflection pattern where agents learn from mistakes:
- Store failed attempts
- Retrieve relevant failures
- Avoid repeating mistakes

**HGM's approach**: Pattern graph stores learned strategies with effectiveness scores that improve over time.

📄 Paper: [Reflexion: Language Agents with Verbal Reinforcement Learning](https://arxiv.org/abs/2303.11366) (Shinn et al., 2023)

#### 3. MemGPT (Packer et al.)
Virtual context management mimicking OS memory:
- Main context = RAM (limited)
- External storage = Disk (unlimited)
- Explicit page in/out

**HGM's approach**: Three-tier architecture (Hot/Warm/Cold) with automatic promotion/demotion based on temperature.

📄 Paper: [MemGPT: Towards LLMs as Operating Systems](https://arxiv.org/abs/2310.08560) (Packer et al., 2023)

#### 4. RAPTOR (Sarthi et al.)
Recursive summarization for better retrieval:
- Build tree of summaries
- Retrieve at appropriate abstraction level

**HGM's approach**: Hierarchy paths (e.g., "tech/python/async") enable retrieval at different abstraction levels.

📄 Paper: [RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval](https://arxiv.org/abs/2401.18059) (Sarthi et al., 2024)
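
Retrieval at different abstraction levels can be sketched with prefix matching on slash-separated paths. The helper names `ancestors` and `under` are hypothetical and not part of HGM; the sketch only shows how a path such as "tech/python/async" relates to its broader levels.

```python
def ancestors(path: str) -> list[str]:
    """Return the path and all of its ancestors, most specific first."""
    parts = [p for p in path.split("/") if p]
    return ["/".join(parts[:i]) for i in range(len(parts), 0, -1)]

def under(path: str, prefix: str) -> bool:
    """True if `path` equals `prefix` or lies beneath it in the hierarchy."""
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")

paths = ["tech/python/async", "tech/python", "tech/rust", "ops/deployment"]

print(ancestors("tech/python/async"))
print([p for p in paths if under(p, "tech/python")])  # narrow query
print([p for p in paths if under(p, "tech")])         # broad query
```

Matching on a full path segment (note the trailing "/") avoids treating "tech/pythonic" as a child of "tech/python".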

#### 5. ACE - Agentic Context Engineering (Stanford)
Framework for dynamically evolving contexts that enable self-improvement:
- Contexts as "evolving playbooks" that accumulate strategies
- Generation, reflection, and curation cycle
- Addresses "brevity bias" and "context collapse"

**HGM's approach**: Similar philosophy of treating context as dynamic rather than static, with pattern graphs that learn and improve over time.

📄 Paper: [Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models](https://arxiv.org/abs/2510.04618) (Zhang et al., 2025)

#### 6. Cognitive Architectures (ACT-R, SOAR)
Classical AI architectures modeling human cognition:
- Declarative vs procedural memory
- Activation-based retrieval
- Goal-directed behavior

**HGM's approach**: Memory types (SEMANTIC, EPISODIC, PROCEDURAL, EMOTIONAL) directly inspired by cognitive science.

📄 Resources: [ACT-R](http://act-r.psy.cmu.edu/) | [SOAR](https://soar.eecs.umich.edu/)

### HGM's Unique Contributions

| Innovation | What's New | Why It Matters |
|------------|------------|----------------|
| **Temperature Scoring** | Unified 5-factor metric | Single score for all tier/ranking decisions |
| **Automatic Promotion** | Access patterns drive organization | System improves without explicit training |
| **Pattern Graph** | Keyword → Strategy mapping | Bypass LLM for known patterns |
| **Mode Selection** | Route to optimal response strategy | Cost/latency optimization |
| **Agent Context** | Per-agent focus tracking | Multi-agent personalization |

### Research References & Further Reading

| Paper | Year | Key Contribution | Link |
|-------|------|------------------|------|
| **Generative Agents** | 2023 | Memory streams, reflection, importance scoring | [arXiv:2304.03442](https://arxiv.org/abs/2304.03442) |
| **MemGPT** | 2023 | Virtual context management, paging | [arXiv:2310.08560](https://arxiv.org/abs/2310.08560) |
| **Reflexion** | 2023 | Self-reflection, learning from failures | [arXiv:2303.11366](https://arxiv.org/abs/2303.11366) |
| **RAPTOR** | 2024 | Recursive summarization trees | [arXiv:2401.18059](https://arxiv.org/abs/2401.18059) |
| **ACE** | 2025 | Evolving context playbooks, self-improving agents | [arXiv:2510.04618](https://arxiv.org/abs/2510.04618) |
| **RAG Survey** | 2023 | Comprehensive RAG techniques overview | [arXiv:2312.10997](https://arxiv.org/abs/2312.10997) |
| **Self-RAG** | 2023 | Self-reflective retrieval augmentation | [arXiv:2310.11511](https://arxiv.org/abs/2310.11511) |
| **LangChain** | 2022+ | Practical RAG implementation patterns | [docs.langchain.com](https://docs.langchain.com) |
| **LlamaIndex** | 2022+ | Data framework for LLM applications | [docs.llamaindex.ai](https://docs.llamaindex.ai) |
| **ACT-R** | 1993+ | Cognitive architecture, activation-based memory | [act-r.psy.cmu.edu](http://act-r.psy.cmu.edu/) |
| **SOAR** | 1987+ | Cognitive architecture, procedural learning | [soar.eecs.umich.edu](https://soar.eecs.umich.edu/) |

---

## Section 2: Core Data Models - Memory Types

### The Concept

Human memory isn't a single system—cognitive scientists have identified distinct memory types that serve different purposes. HGM adopts this model for AI agents, classifying memories to enable smarter retrieval and organization.

### Why This Matters for AI Agents

Without memory classification:
- ❌ A fact and a preference get treated identically
- ❌ Instructions get buried alongside random observations
- ❌ The agent can't prioritize procedural knowledge when executing tasks

With memory classification:
- ✅ Agent knows to retrieve PROCEDURAL memories when asked "how to"
- ✅ EMOTIONAL memories (preferences) influence response style
- ✅ EPISODIC memories provide conversation continuity
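
As a rough illustration of the first point, a query can be routed to a preferred memory type with a few keyword rules. The `route_query` function and its regular expressions are hypothetical, chosen only for this sketch; a real system would more likely use a classifier or the LLM itself. The enum is redefined here so the snippet runs on its own.

```python
import re
from enum import Enum

class MemoryType(str, Enum):
    SEMANTIC = "semantic"
    EPISODIC = "episodic"
    PROCEDURAL = "procedural"
    EMOTIONAL = "emotional"

# Ordered rules: the first matching pattern wins (illustrative only)
_RULES = [
    (re.compile(r"\bhow (do|to|can)\b|\bsteps?\b", re.I), MemoryType.PROCEDURAL),
    (re.compile(r"\b(yesterday|last (week|time|meeting)|earlier)\b", re.I), MemoryType.EPISODIC),
    (re.compile(r"\b(prefer|like|dislike|favorite)\b", re.I), MemoryType.EMOTIONAL),
]

def route_query(query: str) -> MemoryType:
    """Pick the memory type to search first; fall back to SEMANTIC."""
    for pattern, memory_type in _RULES:
        if pattern.search(query):
            return memory_type
    return MemoryType.SEMANTIC

for q in ["How do I deploy the app?", "What did we decide yesterday?", "What is Python?"]:
    print(f"{q!r} -> {route_query(q).value}")
```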

### The Four Memory Types

| Type | Cognitive Basis | AI Agent Use Case | Example |
|------|-----------------|-------------------|---------|
| **SEMANTIC** | Declarative facts, world knowledge | Reference information, definitions | "Python is a programming language" |
| **EPISODIC** | Events tied to time/place | Conversation history, meeting notes | "We discussed the API design yesterday" |
| **PROCEDURAL** | How-to knowledge, skills | Instructions, workflows, recipes | "To deploy: run docker compose up" |
| **EMOTIONAL** | Feelings, preferences | User preferences, sentiment | "User prefers concise explanations" |

### What the Code Demonstrates

1. **MemoryType enum** - Classification system for memories
2. **Memory dataclass** - The core unit with embedding, metadata, and temperature
3. **create_memory()** - Factory function with deterministic mock embeddings

### Mental Model

Think of memory types like filing cabinets:
- 📚 **SEMANTIC** = Encyclopedia cabinet (facts to look up)
- 📅 **EPISODIC** = Journal cabinet (events to recall)
- 📋 **PROCEDURAL** = Instruction manual cabinet (steps to follow)
- ❤️ **EMOTIONAL** = Personal notes cabinet (preferences to honor)

In [None]:
# Memory type classification
class MemoryType(str, Enum):
    """Classification based on cognitive memory types."""
    SEMANTIC = "semantic"      # Facts, concepts
    EPISODIC = "episodic"      # Events, experiences
    PROCEDURAL = "procedural"  # How-to, patterns
    EMOTIONAL = "emotional"    # Sentiment, preferences

@dataclass
class Memory:
    """Core memory unit with embedding and metadata."""
    id: str
    content: str
    embedding: np.ndarray
    memory_type: MemoryType
    
    # Temporal tracking
    created_at: float  # Unix timestamp
    accessed_at: float
    access_count: int = 0
    
    # Classification
    hierarchy_path: str = ""  # e.g., "tech/python/async"
    entity_ids: list[str] = field(default_factory=list)
    
    # Temperature (computed dynamically)
    temperature: float = 0.5
    
    @property
    def token_estimate(self) -> int:
        """Rough token count (~4 chars per token)."""
        return len(self.content) // 4

print("Memory types:", [t.value for t in MemoryType])

In [None]:
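# Helper to create memories with deterministic mock embeddings
import hashlib

def stable_seed(text: str) -> int:
    """32-bit seed derived from a SHA-256 digest of the text.

    Built-in hash() is salted per process for strings, so it would give
    different embeddings after every kernel restart.
    """
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "big")

def create_memory(
    content: str,
    memory_type: MemoryType = MemoryType.SEMANTIC,
    hierarchy_path: str = "",
    entity_ids: list[str] | None = None,
    hours_ago: float = 0,
    access_count: int = 1,
) -> Memory:
    """
    Create a memory with a deterministic embedding derived from the content.

    In production, embeddings come from models like OpenAI's text-embedding-3-small.
    Here we use content-based seeding for reproducible demos; the vectors are
    random, so similarity between them carries no semantic meaning.
    """
    now = utcnow().timestamp()

    # Deterministic embedding from content (384 dimensions like all-MiniLM-L6-v2)
    rng = np.random.default_rng(stable_seed(content))
    embedding = rng.standard_normal(384).astype(np.float32)
    embedding = embedding / np.linalg.norm(embedding)  # Normalize to unit vector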
    
    return Memory(
        id=str(uuid.uuid4()),
        content=content,
        embedding=embedding,
        memory_type=memory_type,
        created_at=now - (hours_ago * 3600),
        accessed_at=now - (hours_ago * 3600),
        access_count=access_count,
        hierarchy_path=hierarchy_path,
        entity_ids=entity_ids or [],
    )

# Create sample memories for the workshop
sample_memories = [
    create_memory(
        "Python is a versatile programming language with dynamic typing and extensive libraries.",
        MemoryType.SEMANTIC,
        "tech/python",
        ["python", "programming"],
        hours_ago=2,
        access_count=5,
    ),
    create_memory(
        "We discussed the API design for the authentication module in yesterday's meeting.",
        MemoryType.EPISODIC,
        "projects/api",
        ["api", "authentication", "meeting"],
        hours_ago=24,
        access_count=2,
    ),
    create_memory(
        "To deploy the application: run `docker compose up -d` then `./scripts/migrate.sh`",
        MemoryType.PROCEDURAL,
        "ops/deployment",
        ["docker", "deployment", "devops"],
        hours_ago=48,
        access_count=10,
    ),
    create_memory(
        "User prefers concise, technical explanations over verbose ones.",
        MemoryType.EMOTIONAL,
        "preferences/style",
        ["preference", "communication"],
        hours_ago=72,
        access_count=3,
    ),
]

print(f"Created {len(sample_memories)} sample memories:")
for m in sample_memories:
    print(f"  [{m.memory_type.value:10}] {m.content[:50]}...")
    print(f"                  Path: {m.hierarchy_path}, Entities: {m.entity_ids}")

---

## Section 3: Agent State Management

### The Concept

**Agent State Management** is about maintaining the "mental state" of an AI agent across interactions. This includes:

- **Focus** - What the agent is currently "thinking about" (embedding, entities, topic hierarchy)
- **Session** - The current interaction period (turn count, timing)
- **Episodes** - Coherent conversation segments grouped by topic

### Why This Matters for AI Agents

Consider a coding assistant:

| Without State Management | With State Management |
|-------------------------|----------------------|
| Every message is isolated | Agent remembers you were debugging auth |
| Repeats the same questions | Knows your preferences from past sessions |
| Can't track topic changes | Detects when you switch from coding to deployment |
| Generic responses | Personalized to your role and context |

### Core Components

#### 1. Focus Tracking (Context Engineering)
The agent's **focus** determines which memories are most relevant RIGHT NOW:
- `focus_embedding` - Vector representation of current topic
- `focus_entities` - Key concepts being discussed (e.g., {"python", "async", "error handling"})
- `focus_hierarchy_path` - Position in knowledge hierarchy (e.g., "tech/python/async")

#### 2. Episode Management
**Episodes** are coherent segments of conversation. Detecting episode boundaries helps:
- Summarize completed topics
- Archive relevant context
- Reset focus for new topics
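
One simple way to detect an episode boundary, sketched below, is to compare each new message embedding with the current focus embedding and flag a boundary when cosine similarity drops below a threshold. The function name `is_episode_boundary` and the threshold of 0.3 are assumptions for illustration; this section does not specify how HGM detects boundaries. Plain lists stand in for embedding vectors so the snippet needs only the standard library.

```python
import math
from typing import Optional

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_episode_boundary(
    focus: Optional[list[float]],
    message: list[float],
    threshold: float = 0.3,  # assumed cut-off; tune on real conversations
) -> bool:
    """Flag a topic switch when the new message drifts away from the focus."""
    if focus is None:
        return False  # no focus yet, so there is nothing to drift from
    return cosine(focus, message) < threshold

# Toy 3-d "embeddings": an on-topic message, then a topic switch
focus = [1.0, 0.1, 0.0]
print(is_episode_boundary(focus, [0.9, 0.2, 0.0]))  # similar direction
print(is_episode_boundary(focus, [0.0, 0.1, 1.0]))  # nearly orthogonal
```

When a boundary is flagged, the agent would end the current episode and start a new one before updating its focus.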

#### 3. Session Tracking
- `turn_count` - How many interactions in this session
- `last_interaction` - For timeout and recency calculations

### What the Code Demonstrates

1. **AgentContext** dataclass - Complete state for a single agent
2. **Episode** dataclass - Conversation segment with topic summary
3. **update_focus()** - How focus changes as conversation evolves
4. **Demo** - Two agents (researcher, coder) with different contexts

### Mental Model

Think of an agent like a person in a meeting:
- 🎯 **Focus** = What they're currently paying attention to
- 🗣️ **Episode** = The current agenda item being discussed
- 📊 **Session** = The entire meeting
- 🧠 **Context** = Everything they know + their role

In [None]:
@dataclass
class Episode:
    """A coherent episode of interaction (like a conversation topic)."""
    id: str
    started_at: float
    ended_at: float | None = None
    topic_summary: str = ""
    key_entities: list[str] = field(default_factory=list)
    memory_ids: list[str] = field(default_factory=list)
    turn_count: int = 0

@dataclass
class AgentContext:
    """Per-agent context for memory operations and focus tracking."""
    agent_id: str
    agent_role: str  # "researcher", "coder", "planner", etc.
    session_id: str
    
    # Focus tracking - what the agent is currently "thinking about"
    focus_embedding: np.ndarray | None = None
    focus_entities: set[str] = field(default_factory=set)
    focus_hierarchy_path: str = ""
    
    # Session state
    turn_count: int = 0
    last_interaction: float = field(default_factory=lambda: utcnow().timestamp())
    
    # Episode tracking
    current_episode: Episode | None = None
    episode_history: list[Episode] = field(default_factory=list)
    
    def update_focus(
        self,
        embedding: np.ndarray | None = None,
        entities: set[str] | None = None,
        hierarchy_path: str | None = None,
    ) -> None:
        """Update the agent's current focus (context engineering)."""
        if embedding is not None:
            self.focus_embedding = embedding
        if entities is not None:
            self.focus_entities = entities
        if hierarchy_path is not None:
            self.focus_hierarchy_path = hierarchy_path
        self.last_interaction = utcnow().timestamp()
        self.turn_count += 1
    
    def start_episode(self, topic: str = "") -> Episode:
        """Start a new episode (topic segment)."""
        if self.current_episode:
            self.end_episode()
        
        self.current_episode = Episode(
            id=str(uuid.uuid4()),
            started_at=utcnow().timestamp(),
            topic_summary=topic,
            key_entities=list(self.focus_entities),
        )
        return self.current_episode
    
    def end_episode(self) -> Episode | None:
        """End current episode and archive it."""
        if not self.current_episode:
            return None
        
        episode = self.current_episode
        episode.ended_at = utcnow().timestamp()
        self.episode_history.append(episode)
        self.current_episode = None
        return episode

def create_agent(agent_id: str, role: str = "general") -> AgentContext:
    """Factory to create an agent context."""
    return AgentContext(
        agent_id=agent_id,
        agent_role=role,
        session_id=str(uuid.uuid4()),
    )

print("AgentContext defined with focus tracking and episode management")

In [None]:
# Demo: Create two agents with different roles and contexts

# Researcher agent focusing on ML
researcher = create_agent("researcher-001", "researcher")
np.random.seed(42)
ml_embedding = np.random.randn(384).astype(np.float32)
ml_embedding = ml_embedding / np.linalg.norm(ml_embedding)

researcher.update_focus(
    embedding=ml_embedding,
    entities={"machine learning", "neural networks", "deep learning"},
    hierarchy_path="research/ai/ml",
)
researcher.start_episode("Exploring ML architectures")

# Coder agent focusing on API development
coder = create_agent("coder-001", "coder")
np.random.seed(123)
api_embedding = np.random.randn(384).astype(np.float32)
api_embedding = api_embedding / np.linalg.norm(api_embedding)

coder.update_focus(
    embedding=api_embedding,
    entities={"api", "authentication", "fastapi"},
    hierarchy_path="projects/api/auth",
)
coder.start_episode("Implementing auth endpoints")

print("Agent States Comparison:")
print("=" * 60)

for agent in [researcher, coder]:
    print(f"\n{agent.agent_role.upper()} ({agent.agent_id}):")
    print(f"  Focus entities: {agent.focus_entities}")
    print(f"  Focus path: {agent.focus_hierarchy_path}")
    print(f"  Turn count: {agent.turn_count}")
    print(f"  Current episode: {agent.current_episode.topic_summary}")
    print(f"  Has focus embedding: {agent.focus_embedding is not None}")

---

## Section 4: Temperature Scoring System

### The Concept

**Temperature** is a unified metric (0.0 to 1.0) that represents how "hot" or relevant a memory is RIGHT NOW. It combines multiple signals into a single score that determines:

1. Which **tier** a memory belongs to (Hot, Warm, Cold)
2. Whether to **promote** or **demote** a memory
3. **Ranking** within search results

### Why This Matters for AI Agents

The challenge: An agent might have 100,000 memories, but only ~4,000 tokens fit in context. How do you pick the RIGHT memories?

**Naive approaches fail:**
- ❌ Most recent? Misses relevant older knowledge
- ❌ Most accessed? Ignores current context
- ❌ Semantic similarity only? Misses entity connections

**Temperature scoring succeeds by combining all signals:**
- ✅ Recent AND relevant AND frequently accessed = BLAZING hot
- ✅ Old but highly relevant to current query = Still gets promoted
- ✅ Recent but irrelevant = Cools down quickly

### The Five Scoring Factors

| Factor | Weight | What It Captures | How It's Calculated |
|--------|--------|------------------|---------------------|
| **Recency** | 30% | Time since last access | Exponential decay (24h half-life) |
| **Frequency** | 15% | How often accessed | Access count / max_count |
| **Relevance** | 35% | Semantic match to focus | Cosine similarity of embeddings |
| **Entity Overlap** | 15% | Keyword/concept match | Fraction of focus entities found in the memory |
| **Agent Match** | 5% | Same agent/role | Binary boost (fixed at a neutral 0.5 in this demo) |

### The Formula

```
temperature = 0.30 × recency + 0.15 × frequency + 0.35 × relevance + 0.15 × entity_overlap + 0.05 × agent_match
```
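
To make the formula concrete, the sketch below plugs in hand-picked factor values for a single memory. The input numbers are illustrative assumptions, not output from HGM; only the weights and the 24-hour half-life come from the table above.

```python
# Weights from the scoring table above
WEIGHTS = {
    "recency": 0.30,
    "frequency": 0.15,
    "relevance": 0.35,
    "entity_overlap": 0.15,
    "agent_match": 0.05,
}

def recency(hours_since_access: float, half_life_hours: float = 24.0) -> float:
    """Exponential decay: 1.0 now, 0.5 after one half-life, 0.25 after two."""
    return 0.5 ** (hours_since_access / half_life_hours)

def temperature(factors: dict[str, float]) -> float:
    """Weighted sum of the five factors, each expected in [0, 1]."""
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)

# Hypothetical memory: accessed 24h ago, 10 of a maximum 100 accesses,
# strong semantic match, 2 of 3 focus entities present, neutral agent match
factors = {
    "recency": recency(24.0),   # 0.5
    "frequency": 10 / 100,      # 0.1
    "relevance": 0.8,
    "entity_overlap": 2 / 3,
    "agent_match": 0.5,
}
print(f"temperature = {temperature(factors):.3f}")
```

With these inputs the terms are 0.150 + 0.015 + 0.280 + 0.100 + 0.025 = 0.570, which places the memory in the 0.50-0.70 band.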

### Temperature Zones

| Zone | Range | Meaning | Storage Tier |
|------|-------|---------|--------------|
| 🔥 BLAZING | ≥0.85 | Critical for current task | Hot (in-memory) |
| 🌡️ HOT | 0.70-0.85 | Highly relevant, working memory | Hot (in-memory) |
| ☀️ WARM | 0.50-0.70 | Recently used, session cache | Warm (Redis) |
| 🌤️ COOLING | 0.30-0.50 | Fading relevance | Warm → Cold |
| ❄️ COLD | 0.10-0.30 | Archived, long-term storage | Cold (Postgres) |
| 🧊 FROZEN | <0.10 | Deep archive, rarely accessed | Cold (archive) |

### What the Code Demonstrates

1. **TemperatureConfig** - Tunable weights and thresholds
2. **TemperatureScorer** - The 5-factor computation
3. **compute_breakdown()** - See contribution of each factor
4. **Demo** - Sample memories scored against a "Python programming" focus context

### Mental Model

Think of temperature like attention:
- 🔥 **Hot** = Top of mind, actively thinking about
- ☀️ **Warm** = In the back of your head, easily recalled
- ❄️ **Cold** = Stored away, requires effort to remember

In [None]:
class TemperatureZone(str, Enum):
    """Temperature zones for tier placement."""
    BLAZING = "blazing"   # >0.85 - Critical
    HOT = "hot"           # 0.70-0.85 - Working memory
    WARM = "warm"         # 0.50-0.70 - Session cache
    COOLING = "cooling"   # 0.30-0.50 - Fading
    COLD = "cold"         # 0.10-0.30 - Archived
    FROZEN = "frozen"     # <0.10 - Deep archive

@dataclass
class TemperatureConfig:
    """Configuration for temperature computation."""
    # Component weights (must sum to 1.0)
    weight_recency: float = 0.30
    weight_frequency: float = 0.15
    weight_relevance: float = 0.35
    weight_entity: float = 0.15
    weight_agent: float = 0.05
    
    # Decay parameters
    recency_half_life_hours: float = 24.0  # Recency factor halves every 24 hours
    max_access_count: int = 100
    
    # Zone thresholds
    blazing_threshold: float = 0.85
    hot_threshold: float = 0.70
    warm_threshold: float = 0.50
    cooling_threshold: float = 0.30
    cold_threshold: float = 0.10

class TemperatureScorer:
    """Computes memory temperature for tier placement decisions."""
    
    def __init__(self, config: TemperatureConfig | None = None):
        self.config = config or TemperatureConfig()
    
    def compute(
        self,
        memory: Memory,
        focus_embedding: np.ndarray | None = None,
        focus_entities: set[str] | None = None,
        current_time: float | None = None,
    ) -> float:
        """Compute temperature for a memory based on 5 factors."""
        if current_time is None:  # explicit check so a timestamp of 0.0 is respected
            current_time = utcnow().timestamp()
        
        # 1. RECENCY: Exponential decay based on last access
        hours_ago = (current_time - memory.accessed_at) / 3600.0
        recency = np.power(0.5, hours_ago / self.config.recency_half_life_hours)
        
        # 2. FREQUENCY: How often is this memory accessed?
        frequency = min(memory.access_count / self.config.max_access_count, 1.0)
        
        # 3. RELEVANCE: Cosine similarity to current focus
        if focus_embedding is not None:
            dot = np.dot(memory.embedding, focus_embedding)
            norm_m = np.linalg.norm(memory.embedding)
            norm_f = np.linalg.norm(focus_embedding)
            # Map cosine similarity [-1, 1] to [0, 1]
            relevance = float(np.clip((dot / (norm_m * norm_f + 1e-8) + 1) / 2, 0, 1))
        else:
            relevance = 0.5  # Default when no focus
        
        # 4. ENTITY OVERLAP: fraction of focus entities that this memory mentions
        if focus_entities and memory.entity_ids:
            memory_entities = set(memory.entity_ids)
            intersection = len(memory_entities & focus_entities)
            entity_score = intersection / len(focus_entities)
        else:
            entity_score = 0.0
        
        # 5. AGENT MATCH: Would check agent_id match in full implementation
        agent_score = 0.5  # Neutral for this demo
        
        # Weighted combination
        temperature = (
            self.config.weight_recency * recency +
            self.config.weight_frequency * frequency +
            self.config.weight_relevance * relevance +
            self.config.weight_entity * entity_score +
            self.config.weight_agent * agent_score
        )
        
        return float(np.clip(temperature, 0.0, 1.0))
    
    def compute_breakdown(
        self,
        memory: Memory,
        focus_embedding: np.ndarray | None = None,
        focus_entities: set[str] | None = None,
    ) -> dict:
        """Compute temperature with full factor breakdown."""
        current_time = utcnow().timestamp()
        hours_ago = (current_time - memory.accessed_at) / 3600.0
        
        recency = np.power(0.5, hours_ago / self.config.recency_half_life_hours)
        frequency = min(memory.access_count / self.config.max_access_count, 1.0)
        
        if focus_embedding is not None:
            dot = np.dot(memory.embedding, focus_embedding)
            norm_m = np.linalg.norm(memory.embedding)
            norm_f = np.linalg.norm(focus_embedding)
            relevance = float(np.clip((dot / (norm_m * norm_f + 1e-8) + 1) / 2, 0, 1))
        else:
            relevance = 0.5
        
        if focus_entities and memory.entity_ids:
            memory_entities = set(memory.entity_ids)
            intersection = len(memory_entities & focus_entities)
            entity_score = intersection / len(focus_entities)
        else:
            entity_score = 0.0
        
        return {
            "recency": recency,
            "frequency": frequency,
            "relevance": relevance,
            "entity_overlap": entity_score,
            "hours_ago": hours_ago,
        }
    
    def get_zone(self, temperature: float) -> TemperatureZone:
        """Get temperature zone for a value."""
        if temperature >= self.config.blazing_threshold:
            return TemperatureZone.BLAZING
        elif temperature >= self.config.hot_threshold:
            return TemperatureZone.HOT
        elif temperature >= self.config.warm_threshold:
            return TemperatureZone.WARM
        elif temperature >= self.config.cooling_threshold:
            return TemperatureZone.COOLING
        elif temperature >= self.config.cold_threshold:
            return TemperatureZone.COLD
        else:
            return TemperatureZone.FROZEN
    
    def should_promote_to_hot(self, temperature: float) -> bool:
        """Should this memory be promoted to hot tier?"""
        return temperature >= self.config.hot_threshold
    
    def should_demote_from_hot(self, temperature: float) -> bool:
        """Should this memory be demoted from hot tier?"""
        return temperature < self.config.warm_threshold

# Create global scorer
scorer = TemperatureScorer()
print("TemperatureScorer ready!")
print(f"Weights: recency={scorer.config.weight_recency}, frequency={scorer.config.weight_frequency}, "
      f"relevance={scorer.config.weight_relevance}, entity={scorer.config.weight_entity}, "
      f"agent={scorer.config.weight_agent}")

In [None]:
# Demo: Score memories with different focus contexts

# Create focus embedding for "Python programming"
# (seeded from a SHA-256 digest; built-in hash() is salted per process for strings)
import hashlib

focus_seed = int.from_bytes(hashlib.sha256(b"Python programming").digest()[:4], "big")
python_focus = np.random.default_rng(focus_seed).standard_normal(384).astype(np.float32)
python_focus = python_focus / np.linalg.norm(python_focus)
python_entities = {"python", "programming", "code"}

print("Temperature Scoring Demo")
print("Focus: 'Python programming'")
print(f"Focus entities: {python_entities}")
print("=" * 70)

for memory in sample_memories:
    temp = scorer.compute(
        memory=memory,
        focus_embedding=python_focus,
        focus_entities=python_entities,
    )
    breakdown = scorer.compute_breakdown(
        memory=memory,
        focus_embedding=python_focus,
        focus_entities=python_entities,
    )
    zone = scorer.get_zone(temp)
    promote = scorer.should_promote_to_hot(temp)
    
    # Update memory's temperature
    memory.temperature = temp
    
    print(f"\n[{memory.memory_type.value.upper():10}] Temperature: {temp:.3f} | Zone: {zone.value:8}")
    print(f"  Content: {memory.content[:50]}...")
    print(f"  Breakdown:")
    print(f"    Recency:  {breakdown['recency']:.3f} ({breakdown['hours_ago']:.1f}h ago)")
    print(f"    Frequency: {breakdown['frequency']:.3f} ({memory.access_count} accesses)")
    print(f"    Relevance: {breakdown['relevance']:.3f}")
    print(f"    Entity:    {breakdown['entity_overlap']:.3f} (overlap: {set(memory.entity_ids) & python_entities})")
    print(f"  -> Promote to HOT: {promote}")

---

## Section 5: Hot Tier (Working Memory)

### The Concept

The **Hot Tier** is the agent's "working memory"—the fastest, most expensive tier designed for sub-millisecond access to the most critical memories. In production, this is implemented in Rust with SIMD acceleration.

### Why This Matters for AI Agents

LLM context windows are limited (4K-200K tokens). The hot tier ensures:

| Constraint | Hot Tier Solution |
|------------|-------------------|
| Limited context window | **Token budget** - Stays within limits automatically |
| Slow retrieval = slow responses | **<1ms access** - No perceptible delay |
| Irrelevant context = poor answers | **LRU eviction** - Old/unused memories make room for new |
| Need fast similarity search | **SIMD vectorization** - Batch cosine similarity in NumPy |

### Key Features

#### 1. Token-Based Eviction
Unlike traditional caches that evict by count, the hot tier evicts by **token count**:
- Max budget: ~8,000 tokens (configurable)
- When full: Evict **Least Recently Used** memories
- Why tokens? LLM context is measured in tokens, not memory count
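
The eviction rule above can be sketched in a few lines. This is a minimal illustration, assuming an `OrderedDict` to track recency; the workshop's `HotTier` below tracks timestamps in a plain dict instead. `TokenBudgetCache` and its method names are hypothetical.

```python
from collections import OrderedDict

class TokenBudgetCache:
    """Hypothetical illustration: evict by total tokens, not entry count."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0
        self._items: OrderedDict[str, int] = OrderedDict()  # id -> token count

    def put(self, key: str, tokens: int) -> list[str]:
        evicted = []
        if key in self._items:            # replacing: release the old tokens first
            self.used -= self._items.pop(key)
        while self._items and self.used + tokens > self.max_tokens:
            old_key, old_tokens = self._items.popitem(last=False)  # oldest first
            self.used -= old_tokens
            evicted.append(old_key)
        self._items[key] = tokens
        self.used += tokens
        return evicted

    def touch(self, key: str) -> None:
        self._items.move_to_end(key)      # mark as most recently used

cache = TokenBudgetCache(max_tokens=100)
cache.put("a", 40)
cache.put("b", 40)
cache.touch("a")                # "b" is now the least recently used
evicted = cache.put("c", 40)    # 120 > 100, so one entry must go
print(evicted)                  # ['b']
```

Three 40-token entries do not fit in a 100-token budget, so the entry that was touched least recently is dropped, regardless of how many entries the cache holds.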

#### 2. Vectorized Similarity Search
For fast retrieval, we compute similarity against ALL hot memories at once:
```python
# Batch cosine similarity (vectorized)
similarities = (embeddings / norms) @ query_vector
```
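This is O(n · d) for n memories with d-dimensional embeddings, but the constant factor is small because NumPy runs the normalization and the matrix-vector product in optimized native code that can use SIMD instructions. The snippet assumes `query_vector` is already unit-length.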
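
To check that the batched form gives the same numbers as a per-memory loop, here is a small sketch on random data. The array sizes and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.standard_normal((5, 384)).astype(np.float32)  # 5 stored memories
query = rng.standard_normal(384).astype(np.float32)

# Batched: normalize rows and the query, then one matrix-vector product
norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
query_unit = query / np.linalg.norm(query)
batched = (embeddings / norms) @ query_unit

# Reference: one cosine similarity per memory in a Python loop
looped = np.array([
    np.dot(e, query) / (np.linalg.norm(e) * np.linalg.norm(query))
    for e in embeddings
])

print(np.allclose(batched, looped, atol=1e-5))
```

Both paths compute the same cosine similarities; the batched version simply moves the loop out of Python and into NumPy.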

#### 3. Entity Boost
Memories matching the current focus entities get a relevance boost:
- +10% per matching entity, applied multiplicatively to the similarity score
- Max +30% boost, with the boosted score capped at 1.0
- Why? Keywords indicate topical relevance beyond embeddings
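
A short worked example of the boost arithmetic, using the same formula as `HotTier.scan()` below. The helper name `boosted_similarity` is hypothetical.

```python
def boosted_similarity(similarity: float, overlap: int) -> float:
    """+10% per matching entity, capped at +30%; result capped at 1.0."""
    boost = min(0.3, overlap * 0.1)
    return min(1.0, similarity * (1 + boost))

for overlap in (0, 1, 2, 5):
    print(overlap, round(boosted_similarity(0.60, overlap), 3))
# 0 0.6
# 1 0.66
# 2 0.72
# 5 0.78
```

Five matching entities give the same result as three because the boost saturates at +30%. Because the boost multiplies the score, it also pushes a negative similarity further from zero; whether that matters depends on the embeddings in use.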

### Hot Tier vs Traditional Cache

| Traditional Cache | Hot Tier |
|------------------|----------|
| Key-value lookup | Similarity search |
| Fixed-size entries | Variable-size (tokens) |
| Count-based eviction | Token-budget eviction |
| Binary hit/miss | Relevance-ranked results |

### What the Code Demonstrates

1. **HotMemory** - Optimized dataclass for hot tier storage
2. **HotTier.put()** - Add with automatic LRU eviction
3. **HotTier.scan()** - Vectorized similarity search with entity boost
4. **Demo** - Add memories, observe eviction, scan with query

### Mental Model

Think of the hot tier like a whiteboard in a meeting:
- 📝 **Limited space** - Can only fit so much
- 🔄 **Erase old stuff** - Make room for new relevant info
- ⚡ **Instantly visible** - Everyone can see it immediately
- 🎯 **Most important items** - Only key points go on the board

In [None]:
@dataclass
class HotMemory:
    """Hot tier memory - optimized for fast access."""
    id: str
    content: str
    embedding: np.ndarray
    memory_type: MemoryType
    
    accessed_at: float
    access_count: int
    entity_ids: list[str]
    hierarchy_path: str
    token_count: int
    temperature: float = 0.0

@dataclass
class ScanResult:
    """Result from hot tier scan."""
    memories: list[HotMemory]
    scores: list[float]
    total_scanned: int
    scan_time_ms: float

class HotTier:
    """Working memory tier using in-memory storage with LRU eviction."""
    
    def __init__(self, agent_id: str, max_tokens: int = 8000):
        self.agent_id = agent_id
        self.max_tokens = max_tokens
        self._memories: dict[str, HotMemory] = {}
        self._token_count = 0
        self._access_order: dict[str, float] = {}  # LRU tracking
        self._scan_count = 0
        self._eviction_count = 0
    
    def put(self, memory: HotMemory) -> list[str]:
        """Add memory, evicting LRU if over token budget. Returns evicted IDs."""
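        evicted = []
        
        # Re-adding an existing ID replaces the entry; release its tokens
        # first so the budget is not double-counted
        existing = self._memories.pop(memory.id, None)
        if existing is not None:
            self._token_count -= existing.token_count
            self._access_order.pop(memory.id, None)
        
        # Evict until we have room (a single memory larger than max_tokens
        # is still stored once the tier is empty)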
        while (self._token_count + memory.token_count > self.max_tokens 
               and self._memories):
            evicted_id = self._evict_lru()
            if evicted_id:
                evicted.append(evicted_id)
        
        self._memories[memory.id] = memory
        self._token_count += memory.token_count
        self._access_order[memory.id] = utcnow().timestamp()
        
        return evicted
    
    def get(self, memory_id: str) -> HotMemory | None:
        """Get memory by ID and update access time."""
        memory = self._memories.get(memory_id)
        if memory:
            self._access_order[memory_id] = utcnow().timestamp()
            memory.access_count += 1
        return memory
    
    def scan(
        self,
        query_embedding: np.ndarray,
        focus_entities: set[str] | None = None,
        limit: int = 10,
        min_similarity: float = 0.0,
    ) -> ScanResult:
        """Scan hot tier for relevant memories using vectorized similarity."""
        start = time.perf_counter()
        self._scan_count += 1
        
        if not self._memories:
            return ScanResult([], [], 0, 0.0)
        
        memory_list = list(self._memories.values())
        
        # Batch cosine similarity (vectorized for speed)
        embeddings = np.array([m.embedding for m in memory_list])
        query_norm = query_embedding / (np.linalg.norm(query_embedding) + 1e-8)
        mem_norms = np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-8
        similarities = (embeddings / mem_norms) @ query_norm
        
        # Entity boost: +10% per matching entity, max +30%
        if focus_entities:
            for i, mem in enumerate(memory_list):
                overlap = len(set(mem.entity_ids) & focus_entities)
                if overlap > 0:
                    boost = min(0.3, overlap * 0.1)
                    similarities[i] = min(1.0, similarities[i] * (1 + boost))
        
        # Combine similarity with temperature (70% similarity + 30% temperature)
        final_scores = [
            0.7 * sim + 0.3 * mem.temperature
            for sim, mem in zip(similarities, memory_list)
        ]
        
        # Filter and sort
        scored = [
            (mem, score, sim)
            for mem, score, sim in zip(memory_list, final_scores, similarities)
            if sim >= min_similarity
        ]
        scored.sort(key=lambda x: x[1], reverse=True)
        top = scored[:limit]
        
        elapsed_ms = (time.perf_counter() - start) * 1000
        
        return ScanResult(
            memories=[m for m, _, _ in top],
            scores=[s for _, s, _ in top],
            total_scanned=len(memory_list),
            scan_time_ms=elapsed_ms,
        )
    
    def _evict_lru(self) -> str | None:
        """Evict least recently used memory."""
        if not self._access_order:
            return None
        lru_id = min(self._access_order, key=self._access_order.get)
        memory = self._memories.pop(lru_id, None)
        if memory:
            self._token_count -= memory.token_count
            self._access_order.pop(lru_id)
            self._eviction_count += 1
        return lru_id
    
    def stats(self) -> dict:
        return {
            "memory_count": len(self._memories),
            "token_count": self._token_count,
            "max_tokens": self.max_tokens,
            "utilization": f"{self._token_count / self.max_tokens:.1%}",
            "scan_count": self._scan_count,
            "eviction_count": self._eviction_count,
        }

print("HotTier defined with LRU eviction and vectorized similarity search")

In [None]:
# Demo: Hot tier operations

# Helper to convert Memory to HotMemory
def to_hot_memory(m: Memory) -> HotMemory:
    return HotMemory(
        id=m.id,
        content=m.content,
        embedding=m.embedding,
        memory_type=m.memory_type,
        accessed_at=m.accessed_at,
        access_count=m.access_count,
        entity_ids=m.entity_ids,
        hierarchy_path=m.hierarchy_path,
        token_count=m.token_estimate,
        temperature=m.temperature,
    )

# Create hot tier with small budget to show eviction
hot = HotTier(agent_id="demo-agent", max_tokens=500)

print("Adding memories to Hot Tier (max 500 tokens):")
print("-" * 60)

for m in sample_memories:
    hot_mem = to_hot_memory(m)
    evicted = hot.put(hot_mem)
    print(f"Added: {m.content[:40]}... ({hot_mem.token_count} tokens)")
    if evicted:
        print(f"  -> Evicted {len(evicted)} memory(ies) to make room")

print(f"\nHot Tier Stats: {hot.stats()}")

# Scan hot tier
print("\n" + "=" * 60)
print("Scanning Hot Tier for: 'Python programming'")
print("=" * 60)

result = hot.scan(
    query_embedding=python_focus,
    focus_entities={"python", "programming"},
    limit=5,
)

print(f"\nScan time: {result.scan_time_ms:.3f}ms")
print(f"Scanned: {result.total_scanned} memories")
print("\nResults:")
for mem, score in zip(result.memories, result.scores):
    print(f"  [{score:.3f}] {mem.content[:50]}...")

---

## Section 6: Agentic RAG - Active Recall with Promotion

### The Concept

**Agentic RAG** (Retrieval-Augmented Generation) goes beyond passive document retrieval. Instead of just fetching relevant content, the system **actively reorganizes** its memory based on what's being accessed.

Key behaviors:
- **Promotion**: Frequently accessed cold memories get promoted to warmer tiers
- **Demotion**: Unused hot memories cool down and get demoted
- **Self-Organization**: The system learns what's important through usage patterns

### Why This Matters for AI Agents

Traditional RAG is **static**: the same query always searches the same indexes. Agentic RAG is **dynamic**:

| Traditional RAG | Agentic RAG |
|----------------|-------------|
| Search once, return results | Search + reorganize for next time |
| All memories equally accessible | Hot memories are faster to access |
| No learning from access patterns | Frequently used = promoted |
| Cold start every query | Warmed-up context from recent queries |

### The Three-Tier Architecture

```
         ┌─────────────────┐
         │    HOT TIER     │  ← <1ms access (Rust in-memory)
         │   Working Set   │     Most relevant RIGHT NOW
         └────────┬────────┘
         ▲ promote │ demote ▼
         ┌────────┴────────┐
         │    WARM TIER    │  ← <50ms access (Redis)
         │  Session Cache  │     Recently accessed
         └────────┬────────┘
         ▲ promote │ demote ▼
         ┌────────┴────────┐
         │    COLD TIER    │  ← <200ms access (PostgreSQL)
         │   Long-term     │     Full knowledge base
         └─────────────────┘
```

### Tier Movement Rules

| Current Tier | Temperature | Action |
|--------------|-------------|--------|
| Cold | ≥0.70 | **Promote to Hot** |
| Cold | ≥0.50 and <0.70 | Promote to Warm |
| Warm | ≥0.70 | **Promote to Hot** |
| Warm | <0.50 | Demote to Cold |
| Hot | ≥0.50 and <0.70 | **Demote to Warm** |
| Hot | <0.50 | Demote to Cold |
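
The rules can be sketched as a function of temperature alone, mirroring `_get_target_tier` in the workshop code: the target tier depends only on the new temperature, so a hot memory that drops below 0.50 goes straight to cold. The names `target_tier`, `movement`, and `TIER_RANK` are hypothetical helpers for this illustration.

```python
TIER_RANK = {"cold": 0, "warm": 1, "hot": 2}

def target_tier(temperature: float, hot: float = 0.70, warm: float = 0.50) -> str:
    if temperature >= hot:
        return "hot"
    if temperature >= warm:
        return "warm"
    return "cold"

def movement(source: str, temperature: float) -> str:
    """Describe the tier change for a memory currently in `source`."""
    target = target_tier(temperature)
    if TIER_RANK[target] > TIER_RANK[source]:
        return f"promote {source} -> {target}"
    if TIER_RANK[target] < TIER_RANK[source]:
        return f"demote {source} -> {target}"
    return f"stay in {source}"

for source, temp in [("cold", 0.75), ("cold", 0.55), ("warm", 0.40),
                     ("hot", 0.60), ("hot", 0.40)]:
    print(f"{source:4} @ {temp:.2f}: {movement(source, temp)}")
```

Comparing tier ranks is one way to classify a move as a promotion or a demotion without enumerating every source and target pair.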

### The Recall Flow

1. **Query arrives** → Compute query embedding
2. **Search all tiers** (Hot, Warm, Cold); a production system can query them in parallel, while the simulation in this section scans them in turn
3. **Score each result** using temperature formula
4. **Promote/demote** based on new temperatures
5. **Return ranked results** with hottest first

### What the Code Demonstrates

1. **ThreeTierMemory** - Simulated three-tier system
2. **recall()** - Active recall with automatic tier movement
3. **TierPlacement** - Tracks promotions and demotions
4. **Demo** - Query triggers promotion of relevant memories

### Mental Model

Think of Agentic RAG like a library with a reading desk:
- 📚 **Cold** = Books on the shelves (all knowledge)
- 📖 **Warm** = Books you've pulled out recently (session)
- 📝 **Hot** = Open books on your desk (working set)

When you reference a book often, it stays on your desk. When you stop using it, it goes back to the shelf.

In [None]:
@dataclass
class TierPlacement:
    """Result of tier placement decision."""
    memory_id: str
    source_tier: str
    target_tier: str
    temperature: float
    promoted: bool
    demoted: bool

class ThreeTierMemory:
    """Simulated three-tier memory system with automatic promotion/demotion."""
    
    def __init__(self, agent_id: str):
        self.agent_id = agent_id
        self.scorer = TemperatureScorer()
        
        # Three tiers (in production: Hot=Rust memory, Warm=Redis, Cold=Postgres)
        self.hot: dict[str, Memory] = {}
        self.warm: dict[str, Memory] = {}
        self.cold: dict[str, Memory] = {}
        
        # Thresholds for tier movement
        self.hot_threshold = 0.70
        self.warm_threshold = 0.50
        self.cold_threshold = 0.30  # not consulted by _get_target_tier in this simulation
    
    def store(self, memory: Memory) -> str:
        """Store memory - new memories start in warm tier."""
        self.warm[memory.id] = memory
        return "warm"
    
    def recall(
        self,
        query_embedding: np.ndarray,
        focus_entities: set[str],
        limit: int = 10,
    ) -> tuple[list[Memory], list[TierPlacement]]:
        """
        Active recall with automatic tier promotion/demotion.
        
        Returns:
            - List of recalled memories (sorted by relevance)
            - List of tier placements (promotions/demotions)
        """
        all_memories = []
        placements = []
        
        # Collect all memories with their source tier
        for tier_name, tier in [("hot", self.hot), ("warm", self.warm), ("cold", self.cold)]:
            for mem in tier.values():
                mem._source_tier = tier_name
                all_memories.append(mem)
        
        # Score all memories
        scored = []
        for mem in all_memories:
            temp = self.scorer.compute(
                memory=mem,
                focus_embedding=query_embedding,
                focus_entities=focus_entities,
            )
            mem.temperature = temp
            scored.append((mem, temp))
        
        # Sort by temperature (relevance)
        scored.sort(key=lambda x: x[1], reverse=True)
        top_memories = [m for m, _ in scored[:limit]]
        
        # Handle tier movements
        for mem, temp in scored:
            source = getattr(mem, '_source_tier', 'cold')
            target = self._get_target_tier(temp)
            
            if source != target:
                self._move_memory(mem, source, target)
                placements.append(TierPlacement(
                    memory_id=mem.id,
                    source_tier=source,
                    target_tier=target,
                    temperature=temp,
                    promoted=(target == "hot" and source != "hot") or 
                             (target == "warm" and source == "cold"),
                    demoted=(target == "cold" and source != "cold") or 
                            (target == "warm" and source == "hot"),
                ))
        
        return top_memories, placements
    
    def _get_target_tier(self, temperature: float) -> str:
        if temperature >= self.hot_threshold:
            return "hot"
        elif temperature >= self.warm_threshold:
            return "warm"
        else:
            return "cold"
    
    def _move_memory(self, memory: Memory, source: str, target: str):
        """Move memory between tiers."""
        tiers = {"hot": self.hot, "warm": self.warm, "cold": self.cold}
        tiers[source].pop(memory.id, None)
        tiers[target][memory.id] = memory
    
    def stats(self) -> dict:
        return {
            "hot": len(self.hot),
            "warm": len(self.warm),
            "cold": len(self.cold),
            "total": len(self.hot) + len(self.warm) + len(self.cold),
        }

print("ThreeTierMemory with active recall and promotion defined")

In [None]:
# Demo: Active recall with tier promotion

tiers = ThreeTierMemory(agent_id="demo-agent")

# Store sample memories (all start in warm tier)
print("Storing memories (all start in WARM tier):")
print("-" * 60)
for m in sample_memories:
    tier = tiers.store(m)
    print(f"  [{tier}] {m.content[:50]}...")

print(f"\nInitial tier distribution: {tiers.stats()}")

# Simulate recall with Python focus
print("\n" + "=" * 70)
print("ACTIVE RECALL: 'Python programming' query")
print("=" * 70)

recalled, placements = tiers.recall(
    query_embedding=python_focus,
    focus_entities={"python", "programming", "code"},
    limit=5,
)

print(f"\nRecalled {len(recalled)} memories (sorted by relevance):")
for m in recalled:
    zone = scorer.get_zone(m.temperature)
    print(f"  [{m.temperature:.3f}] [{zone.value:8}] {m.content[:45]}...")

print(f"\nTier movements ({len(placements)} changes):")
for p in placements:
    direction = "PROMOTED" if p.promoted else ("DEMOTED" if p.demoted else "MOVED")
    print(f"  {direction}: {p.source_tier} -> {p.target_tier} (temp: {p.temperature:.3f})")

print(f"\nFinal tier distribution: {tiers.stats()}")

---

## Section 7: Pattern Graph System

### The Concept

The **Pattern Graph** is a knowledge structure that connects **entities** (keywords, concepts) to **patterns** (learned response strategies). Unlike vector search alone, it enables:

- **Graph-based retrieval**: Find patterns via connected keywords
- **Semantic expansion**: "eigenvalues" connects to "linear algebra" patterns
- **Fast keyword matching**: average-case O(1) dictionary lookup by entity

### Why This Matters for AI Agents

Embedding-based search finds semantically similar content, but misses explicit keyword connections:

| Query | Embedding Search | Pattern Graph |
|-------|------------------|---------------|
| "deploy" | Finds similar deployment docs | Finds exact "deploy" pattern with known strategy |
| "JWT auth" | Finds auth-related content | Finds specific JWT pattern via both keywords |
| "eigenvalues" | Finds math content | Follows graph: eigenvalues → linear algebra → patterns |

### Graph Structure

```
         ┌────────────┐
         │  ENTITIES  │  ← Keywords extracted from queries/patterns
         │ (keywords) │
         └─────┬──────┘
               │ edges (strength weighted)
               ▼
         ┌────────────┐
         │  PATTERNS  │  ← Trigger + Strategy pairs
         │ (responses)│
         └────────────┘
```

### Algorithms Used

#### 1. Keyword Extraction
```python
def extract_keywords(text):
    words = re.findall(r"[a-zA-Z][a-zA-Z0-9_]*", text.lower())
    return [w for w in words if w not in STOPWORDS and len(w) >= 3]
```
- **Regex tokenization**: Extract word-like tokens
- **Stopword filtering**: Remove common words (the, is, what, how, etc.)
- **Length threshold**: Keep only words ≥3 characters
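
A quick usage sketch of the extraction step. The stopword set here is a small illustrative subset; the workshop's `PatternGraph.STOPWORDS` is longer.

```python
import re

# Illustrative subset; the workshop's PatternGraph.STOPWORDS is longer
STOPWORDS = {"how", "do", "i", "my", "to", "the", "is", "what"}

def extract_keywords(text: str) -> list[str]:
    words = re.findall(r"[a-zA-Z][a-zA-Z0-9_]*", text.lower())
    return [w for w in words if w not in STOPWORDS and len(w) >= 3]

keywords = extract_keywords("How do I deploy my app to production?")
print(keywords)  # ['deploy', 'app', 'production']
```

Note that "app" survives because it has exactly three characters and is not a stopword, while "do" and "to" are removed by both filters.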

#### 2. Entity-Pattern Linking
Each entity connects to patterns with a **strength** weight (0.0-1.0):
- In this workshop's demo, keywords longer than 4 characters get higher strength (0.9), using length as a rough proxy for rarity
- Shorter keywords get lower strength (0.7)
- Strength = confidence that this keyword indicates the pattern
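
As a worked example, the sketch below applies the demo's length rule and multiplies each strength by a pattern effectiveness of 0.9 (the value used for the deployment pattern later in this section). The helper name `link_strength` is hypothetical.

```python
def link_strength(keyword: str) -> float:
    # Mirrors the demo's rule: length > 4 is a cheap proxy for rarity
    return 0.9 if len(keyword) > 4 else 0.7

effectiveness = 0.9  # e.g. the deployment pattern in the demo
scores = {kw: link_strength(kw) * effectiveness
          for kw in ("deploy", "app", "jwt", "production")}

for kw, score in scores.items():
    print(f"{kw:10} strength={link_strength(kw):.1f} score={score:.2f}")
```

Short keywords such as "app" and "jwt" end up with a score of about 0.63, while longer ones such as "deploy" reach about 0.81. A real system would more likely derive strength from corpus frequency than from word length.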

#### 3. Graph Traversal for Pattern Matching
```python
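def find_patterns(keywords, entity_to_patterns, patterns):
    """entity_to_patterns: keyword -> [(pattern_id, strength)]; patterns: id -> dict."""
    results = {}
    for keyword in keywords:
        for pattern_id, strength in entity_to_patterns.get(keyword, []):
            score = strength * patterns[pattern_id]["effectiveness"]
            # Keep the best score when several keywords reach the same pattern
            results[pattern_id] = max(results.get(pattern_id, 0.0), score)
    return sorted(results.items(), key=lambda item: item[1], reverse=True)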
```
- **O(k × p)** where k=keywords, p=average patterns per keyword
- **Aggregation**: Same pattern found via multiple keywords keeps max score
- **Ranking**: Patterns sorted by score
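
The max-score aggregation is easiest to see on a tiny graph. The data below is a hypothetical mini-graph made up for this illustration, not the workshop's full pattern set.

```python
# Hypothetical mini-graph: entity -> [(pattern_id, strength)]
entity_to_patterns = {
    "jwt": [("pat_auth", 0.7)],
    "authentication": [("pat_auth", 0.9)],
    "deploy": [("pat_deploy", 0.9)],
}
effectiveness = {"pat_auth": 0.85, "pat_deploy": 0.9}

def find_patterns(keywords: list[str]) -> list[tuple[str, float]]:
    best: dict[str, float] = {}
    for kw in keywords:
        for pattern_id, strength in entity_to_patterns.get(kw, []):
            score = strength * effectiveness[pattern_id]
            best[pattern_id] = max(best.get(pattern_id, 0.0), score)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

ranked = find_patterns(["jwt", "authentication", "setup"])
print(ranked)
```

Both "jwt" (0.7 × 0.85) and "authentication" (0.9 × 0.85) reach `pat_auth`, and only the larger score, roughly 0.765, is kept. The unknown keyword "setup" is skipped without error.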

### What the Code Demonstrates

1. **PatternGraph** - Entity-pattern graph structure
2. **extract_keywords()** - Tokenization with stopword filtering
3. **link_entity_to_pattern()** - Creating weighted edges
4. **find_patterns()** - Graph traversal for pattern retrieval
5. **Demo** - Build graph, query with different inputs

### Mental Model

Think of the pattern graph like a library card catalog:
- 📇 **Entity cards** = Subject index cards (keywords)
- 📘 **Pattern cards** = Book location cards (strategies)
- 🔗 **Links** = "See also" references between cards
- 🔍 **Search** = Follow index cards to find relevant books

In [None]:
@dataclass
class PatternMatch:
    """Result of a pattern search."""
    pattern_id: str
    trigger: str
    strategy: str
    effectiveness: float
    relevance: float
    matched_via: list[str] = field(default_factory=list)

class PatternGraph:
    """Entity-pattern graph for intelligent response routing."""
    
    # Stopwords to filter during keyword extraction
    STOPWORDS = frozenset({
        # Articles, pronouns
        "a", "an", "the", "i", "me", "my", "we", "you", "your", "it", "this", "that",
        # Common verbs
        "is", "are", "was", "were", "be", "been", "have", "has", "had",
        "do", "does", "did", "will", "would", "can", "could", "should",
        # Prepositions, conjunctions
        "to", "of", "in", "for", "on", "with", "at", "by", "from",
        "and", "but", "or", "if", "as", "so",
        # Question words
        "what", "how", "why", "when", "where", "who", "which",
        # Action words
        "let", "lets", "about", "tell", "explain", "help", "show", "make",
    })
    
    def __init__(self):
        self._entities: dict[str, int] = {}  # entity -> access count
        self._patterns: dict[str, dict] = {}  # pattern_id -> pattern data
        self._entity_to_patterns: dict[str, list[tuple[str, float]]] = {}  # entity -> [(pattern_id, strength)]
        self._query_count = 0
    
    def add_entity(self, name: str) -> None:
        """Add an entity node."""
        name_lower = name.lower()
        if name_lower not in self._entities:
            self._entities[name_lower] = 0
        self._entities[name_lower] += 1
    
    def add_pattern(
        self,
        pattern_id: str,
        trigger: str,
        strategy: str,
        effectiveness: float = 0.7,
    ) -> None:
        """Add a pattern node."""
        self._patterns[pattern_id] = {
            "id": pattern_id,
            "trigger": trigger,
            "strategy": strategy,
            "effectiveness": effectiveness,
        }
    
    def link_entity_to_pattern(
        self,
        entity: str,
        pattern_id: str,
        strength: float = 0.8,
    ) -> None:
        """Link an entity to a pattern with given strength."""
        entity_lower = entity.lower()
        if entity_lower not in self._entity_to_patterns:
            self._entity_to_patterns[entity_lower] = []
        self._entity_to_patterns[entity_lower].append((pattern_id, strength))
    
    def find_patterns(
        self,
        keywords: list[str],
        limit: int = 10,
    ) -> list[PatternMatch]:
        """Find patterns matching keywords via graph traversal."""
        self._query_count += 1
        results: dict[str, PatternMatch] = {}
        
        for keyword in keywords:
            kw_lower = keyword.lower()
            if kw_lower in self._entity_to_patterns:
                for pattern_id, strength in self._entity_to_patterns[kw_lower]:
                    if pattern_id in self._patterns:
                        pattern = self._patterns[pattern_id]
                        score = strength * pattern["effectiveness"]
                        
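                        existing = results.get(pattern_id)
                        if existing is None:
                            results[pattern_id] = PatternMatch(
                                pattern_id=pattern_id,
                                trigger=pattern["trigger"],
                                strategy=pattern["strategy"],
                                effectiveness=pattern["effectiveness"],
                                relevance=score,
                                matched_via=[kw_lower],
                            )
                        else:
                            # Reached again via another keyword: keep the max
                            # score and record every keyword that matched
                            existing.relevance = max(existing.relevance, score)
                            if kw_lower not in existing.matched_via:
                                existing.matched_via.append(kw_lower)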
        
        sorted_results = sorted(results.values(), key=lambda m: -m.relevance)
        return sorted_results[:limit]
    
    def extract_keywords(self, text: str) -> list[str]:
        """Extract meaningful keywords from text."""
        words = re.findall(r"[a-zA-Z][a-zA-Z0-9_]*", text.lower())
        return [w for w in words if w not in self.STOPWORDS and len(w) >= 3]
    
    def stats(self) -> dict:
        return {
            "entities": len(self._entities),
            "patterns": len(self._patterns),
            "edges": sum(len(v) for v in self._entity_to_patterns.values()),
            "queries": self._query_count,
        }

print("PatternGraph defined with entity-pattern linking")

In [None]:
# Build pattern graph with sample patterns

graph = PatternGraph()

# Define patterns (simulating learned response strategies)
patterns = [
    {
        "id": "pat_deploy",
        "trigger": "How to deploy the application?",
        "strategy": "Use docker compose: `docker compose up -d && ./scripts/migrate.sh`",
        "keywords": ["deploy", "deployment", "docker", "application", "production"],
        "effectiveness": 0.9,
    },
    {
        "id": "pat_auth",
        "trigger": "How does authentication work?",
        "strategy": "JWT tokens via /api/v1/auth/login endpoint. Tokens expire in 24h.",
        "keywords": ["authentication", "auth", "login", "jwt", "token", "security"],
        "effectiveness": 0.85,
    },
    {
        "id": "pat_python",
        "trigger": "What is Python?",
        "strategy": "Python is a versatile, dynamically-typed language popular for web, data science, and automation.",
        "keywords": ["python", "programming", "language", "coding"],
        "effectiveness": 0.8,
    },
    {
        "id": "pat_ml",
        "trigger": "How to train a neural network?",
        "strategy": "Define model architecture, prepare data, set loss function and optimizer, iterate through epochs.",
        "keywords": ["neural", "network", "train", "machine", "learning", "deep", "model"],
        "effectiveness": 0.75,
    },
]

print("Building Pattern Graph:")
print("-" * 60)

for p in patterns:
    graph.add_pattern(p["id"], p["trigger"], p["strategy"], p["effectiveness"])
    for kw in p["keywords"]:
        graph.add_entity(kw)
        # Longer keywords get higher strength
        strength = 0.9 if len(kw) > 4 else 0.7
        graph.link_entity_to_pattern(kw, p["id"], strength)
    print(f"  Added: {p['trigger'][:40]}... ({len(p['keywords'])} keywords)")

print(f"\nGraph stats: {graph.stats()}")

In [None]:
# Query pattern graph with various queries

test_queries = [
    "How do I deploy my app to production?",
    "Tell me about Python programming",
    "JWT authentication setup",
    "deep learning training process",
    "What's the weather like?",  # No match expected
]

print("Pattern Graph Queries")
print("=" * 70)

for query in test_queries:
    keywords = graph.extract_keywords(query)
    matches = graph.find_patterns(keywords, limit=3)
    
    print(f"\nQuery: {query}")
    print(f"Keywords: {keywords}")
    
    if matches:
        for m in matches:
            print(f"  [{m.relevance:.3f}] {m.trigger[:45]}...")
            print(f"           Strategy: {m.strategy[:50]}...")
    else:
        print("  No patterns found")

---

## Section 8: Pattern Relevance Scoring

### The Concept

Once patterns are retrieved, we need to **rank** them by relevance to the query. **Pattern Scoring** combines multiple signals to produce a robust relevance score that's better than any single signal alone.

### Why This Matters for AI Agents

Each scoring signal has weaknesses:

| Signal Alone | Problem |
|--------------|---------|
| Embedding similarity | Misses keyword-level matches |
| Keyword overlap | Misses semantic similarity |
| Topic matching | Too coarse-grained |
| Structure matching | Ignores content |

Combined scoring is more robust because weaknesses of one signal are compensated by strengths of another.

### Algorithms Used

#### 1. Cosine Similarity (Semantic Matching)
```python
def cosine_similarity(a, b):
    dot = np.dot(a, b)
    # Small epsilon avoids division by zero for all-zero vectors
    return dot / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
```
- Range: [-1, 1], mapped to [0, 1] via `(cos + 1) / 2`
- Captures semantic meaning via embeddings
- Weight: **40%** (highest because most robust)
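
The mapping to [0, 1] can be checked on three easy cases. This sketch uses the same epsilon guard and clipping as `PatternScorer.score()` below; the function name `cosine_01` is hypothetical.

```python
import numpy as np

def cosine_01(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    """Cosine similarity rescaled from [-1, 1] to [0, 1]."""
    cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))
    return float(np.clip((cos + 1) / 2, 0.0, 1.0))

a = np.array([1.0, 0.0])
same = cosine_01(a, a)                          # identical direction, close to 1.0
orthogonal = cosine_01(a, np.array([0.0, 1.0])) # unrelated, exactly 0.5
opposite = cosine_01(a, -a)                     # opposite direction, close to 0.0
print(same, orthogonal, opposite)
```

One consequence of the rescaling is that unrelated vectors score 0.5 rather than 0, which is also the neutral value the scorer uses when no embeddings are supplied.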

#### 2. Jaccard Similarity (Keyword Overlap)
```python
def jaccard(set_a, set_b):
    intersection = len(set_a & set_b)
    union = len(set_a | set_b)
    return intersection / union if union else 0.0  # two empty sets share nothing
```
- Range: [0, 1]
- Captures exact keyword matches
- Weight: **15%**
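
A worked example with the keyword sets that a deployment question and a deployment trigger might produce. The sets are written out by hand for illustration.

```python
def jaccard(set_a: set[str], set_b: set[str]) -> float:
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

query_kw = {"deploy", "application", "production"}
trigger_kw = {"deploy", "application"}

overlap = jaccard(query_kw, trigger_kw)
print(round(overlap, 3))        # 0.667
print(jaccard(set(), set()))    # 0.0
```

Two shared keywords out of three distinct ones give 2/3. Extra words in the query lower the score even when every trigger keyword is matched, which is one reason this signal carries a modest weight.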

#### 3. Topic Overlap (Long Word Matching)
```python
def topic_overlap(query_words, pattern_words):
    query_topics = {w for w in query_words if len(w) >= 5}
    pattern_topics = {w for w in pattern_words if len(w) >= 5}
    matches = query_topics & pattern_topics
    return min(1.0, len(matches) * 0.5)
```
- Long words (≥5 chars) are more likely to be meaningful topics
- Each shared topic word adds 0.5, so two or more matches saturate the score at 1.0
- Weight: **20%**

#### 4. Structure Similarity (Question Pattern Matching)
```python
PATTERNS = [r'^what is\b', r'^how to\b', r'^why\b', ...]

def structure_similarity(query, pattern_trigger):
    query_pattern = find_matching_pattern(query)
    trigger_pattern = find_matching_pattern(pattern_trigger)
    if query_pattern is None or trigger_pattern is None:
        return 0.0  # No structure match
    if query_pattern == trigger_pattern:
        return 1.0  # Same question type
    return 0.3      # Different question types
```
- Matches question forms: "what is X" vs "how to Y"
- Weight: **10%**

#### 5. Orchestrator Keyword Boost
```python
def keyword_boost(boost_keywords, pattern_trigger):
    # Case-insensitive substring match against the trigger text
    matches = sum(1 for kw in boost_keywords if kw.lower() in pattern_trigger.lower())
    return min(1.0, matches * 0.3)
```
- External system can boost specific keywords
- Weight: **15%**

### The Combined Formula

```
relevance = 0.40 × semantic + 0.15 × keyword + 0.20 × topic + 0.10 × structure + 0.15 × boost
```
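
To make the formula concrete, here is the arithmetic for one query and trigger pair. The factor values were worked out by hand from the scoring rules in this section, assuming no embeddings and no boost keywords are supplied.

```python
WEIGHTS = {"semantic": 0.40, "keyword": 0.15, "topic": 0.20,
           "structure": 0.10, "boost": 0.15}

# Query:   "How do I deploy my application to production?"
# Trigger: "How to deploy the application?"
factors = {
    "semantic": 0.5,    # neutral default when embeddings are missing
    "keyword": 2 / 3,   # {deploy, application} shared out of 3 distinct keywords
    "topic": 1.0,       # two shared long words: min(1.0, 2 * 0.5)
    "structure": 0.3,   # "how do" vs "how to": different question forms
    "boost": 0.0,       # no orchestrator keywords
}

relevance = sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)
print(f"{relevance:.3f}")  # 0.530
```

The contributions are 0.20 + 0.10 + 0.20 + 0.03 + 0.00. Because the weights sum to 1.0 and every factor lies in [0, 1], the combined relevance also stays in [0, 1].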

### What the Code Demonstrates

1. **PatternScorer** - Multi-factor scoring engine
2. **score()** - Computes all factors and weighted combination
3. **ScoredPattern** - Result with full breakdown
4. **Demo** - Score patterns with detailed factor visualization

### Mental Model

Think of pattern scoring like evaluating job candidates:
- 📊 **Semantic** = Overall qualifications (big picture fit)
- 🔑 **Keywords** = Required skills checklist
- 🎯 **Topics** = Domain expertise
- 📝 **Structure** = Communication style match
- ⭐ **Boost** = Referral bonus

In [None]:
@dataclass
class ScoredPattern:
    """Pattern with detailed relevance score breakdown."""
    pattern_id: str
    trigger: str
    strategy: str
    relevance: float
    
    # Score breakdown
    semantic_similarity: float
    keyword_overlap: float
    topic_overlap: float
    structure_similarity: float

class PatternScorer:
    """Multi-factor pattern relevance scoring."""
    
    QUESTION_PATTERNS = [
        r'^what is\b', r'^what are\b', r'^how to\b', r'^how do\b',
        r'^why\b', r'^explain\b', r'^describe\b', r'^tell me about\b',
    ]
    
    def __init__(
        self,
        weight_semantic: float = 0.40,
        weight_keyword: float = 0.15,
        weight_topic: float = 0.20,
        weight_structure: float = 0.10,
        weight_boost: float = 0.15,
    ):
        self.weight_semantic = weight_semantic
        self.weight_keyword = weight_keyword
        self.weight_topic = weight_topic
        self.weight_structure = weight_structure
        self.weight_boost = weight_boost
        self.stopwords = PatternGraph.STOPWORDS
    
    def score(
        self,
        query: str,
        pattern: dict,
        query_embedding: np.ndarray | None = None,
        pattern_embedding: np.ndarray | None = None,
        boost_keywords: list[str] | None = None,
    ) -> ScoredPattern:
        """Score a pattern for relevance to query."""
        trigger = pattern.get("trigger", "")
        strategy = pattern.get("strategy", "")
        
        # 1. Semantic similarity (embedding-based)
        if query_embedding is not None and pattern_embedding is not None:
            dot = np.dot(query_embedding, pattern_embedding)
            norm_q = np.linalg.norm(query_embedding)
            norm_p = np.linalg.norm(pattern_embedding)
            semantic = float(np.clip((dot / (norm_q * norm_p + 1e-8) + 1) / 2, 0, 1))
        else:
            semantic = 0.5
        
        # 2. Keyword overlap (Jaccard similarity)
        query_kw = self._extract_keywords(query)
        trigger_kw = self._extract_keywords(trigger)
        keyword_overlap = self._jaccard(query_kw, trigger_kw)
        
        # 3. Topic overlap (5+ char words for better signal)
        query_topics = {w for w in query_kw if len(w) >= 5}
        trigger_topics = {w for w in trigger_kw if len(w) >= 5}
        topic_intersection = query_topics & trigger_topics
        topic_overlap = min(1.0, len(topic_intersection) * 0.5) if topic_intersection else 0.0
        
        # 4. Structure similarity (question pattern matching)
        structure = self._structure_similarity(query.lower(), trigger.lower())
        
        # 5. Keyword boost from orchestrator
        boost = 0.0
        if boost_keywords:
            matches = sum(1 for kw in boost_keywords if kw.lower() in trigger.lower())
            boost = min(1.0, matches * 0.3)
        
        # Weighted combination
        relevance = (
            self.weight_semantic * semantic +
            self.weight_keyword * keyword_overlap +
            self.weight_topic * topic_overlap +
            self.weight_structure * structure +
            self.weight_boost * boost
        )
        
        return ScoredPattern(
            pattern_id=pattern.get("id", ""),
            trigger=trigger,
            strategy=strategy,
            relevance=relevance,
            semantic_similarity=semantic,
            keyword_overlap=keyword_overlap,
            topic_overlap=topic_overlap,
            structure_similarity=structure,
        )
    
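    def _extract_keywords(self, text: str) -> set[str]:
        """Return lowercase alphabetic words (3+ letters) that are not stopwords."""
        words = re.findall(r'\b[a-zA-Z]{3,}\b', text.lower())
        return {w for w in words if w not in self.stopwords}
    
    def _jaccard(self, set_a: set, set_b: set) -> float:
        """Jaccard similarity |A & B| / |A | B|; 0.0 if either set is empty."""
        if not set_a or not set_b:
            return 0.0
        intersection = len(set_a & set_b)
        union = len(set_a | set_b)
        return intersection / union if union > 0 else 0.0
    
    def _structure_similarity(self, query: str, trigger: str) -> float:
        """Compare question forms: 1.0 if both match the same pattern,
        0.3 if both match different patterns, 0.0 if either matches none."""
        query_pattern = None
        trigger_pattern = None
        
        # No early exit: if several patterns match, the last one in
        # QUESTION_PATTERNS wins for each text.
        for pattern in self.QUESTION_PATTERNS:
            if re.search(pattern, query):
                query_pattern = pattern
            if re.search(pattern, trigger):
                trigger_pattern = pattern
        
        if query_pattern and trigger_pattern:
            return 1.0 if query_pattern == trigger_pattern else 0.3
        return 0.0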

pattern_scorer = PatternScorer()
print("PatternScorer ready with multi-factor scoring")

In [None]:
# Demo: Score patterns with detailed breakdown

query = "How do I deploy my application to production?"

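# Generate a stand-in query embedding. The seed uses zlib.crc32 instead of
# hash(), since str hashes are salted per Python process and would yield
# different vectors after each kernel restart.
import zlib

np.random.seed(zlib.crc32(query.encode("utf-8")))
query_emb = np.random.randn(384).astype(np.float32)
query_emb = query_emb / np.linalg.norm(query_emb)

print(f"Query: {query}")
print("=" * 70)

for p in patterns:
    # Generate pattern embedding (seeded the same way for reproducibility)
    np.random.seed(zlib.crc32(p["trigger"].encode("utf-8")))
    pattern_emb = np.random.randn(384).astype(np.float32)
    pattern_emb = pattern_emb / np.linalg.norm(pattern_emb)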
    
    scored = pattern_scorer.score(
        query=query,
        pattern=p,
        query_embedding=query_emb,
        pattern_embedding=pattern_emb,
        boost_keywords=["deploy", "production"],
    )
    
    print(f"\nPattern: {scored.trigger[:50]}...")
    print(f"  TOTAL RELEVANCE: {scored.relevance:.3f}")
    print(f"  Breakdown:")
    print(f"    Semantic:  {scored.semantic_similarity:.3f} (x{pattern_scorer.weight_semantic})")
    print(f"    Keyword:   {scored.keyword_overlap:.3f} (x{pattern_scorer.weight_keyword})")
    print(f"    Topic:     {scored.topic_overlap:.3f} (x{pattern_scorer.weight_topic})")
    print(f"    Structure: {scored.structure_similarity:.3f} (x{pattern_scorer.weight_structure})")

---

## Section 9: Mode Selection

### The Concept

**Mode Selection** determines HOW the agent should respond based on available context. Different modes have different costs, latencies, and capabilities.

### Why This Matters for AI Agents

Not every query needs the same response strategy:

| Query Type | Inefficient | Efficient |
|------------|-------------|-----------|
| "What is X?" (known pattern) | Full LLM reasoning | Direct pattern response |
| "Help me debug" (needs context) | Generic answer | Agent with memory retrieval |
| "Analyze and summarize all" | Single LLM call | Multi-step workflow |
| "Quick fact check" | Complex pipeline | Fast lookup + response |

### The Four Response Modes

| Mode | Use Case | Cost | Latency | Description |
|------|----------|------|---------|-------------|
| **FAST** | Simple queries with context | Low | <500ms | Single LLM call with retrieved memories |
| **AGENT** | Exploration needed | Medium | 1-10s | Multi-step ReAct agent with tool use |
| **PATTERN_DIRECT** | High-confidence pattern | Very Low | <100ms | Return pattern strategy without LLM |
| **WORKFLOW** | Complex multi-step tasks | High | 10s+ | DAG-based workflow execution |

### The Decision Algorithm

```python
def select_mode(query, memories, patterns):
    # Priority 1: Complex tasks → WORKFLOW (is_complex is defined below)
    if is_complex(query):
        return WORKFLOW
    
    # Priority 2: High-confidence pattern → PATTERN_DIRECT
    best_pattern = max(patterns, key=lambda p: p.relevance, default=None)
    if best_pattern is not None and best_pattern.relevance >= 0.7:
        return PATTERN_DIRECT
    
    # Priority 3: Have memories → FAST
    if len(memories) >= 1:
        return FAST
    
    # Default: Need exploration → AGENT
    return AGENT
```

### Decision Factors

#### Complex Task Detection
```python
COMPLEX_KEYWORDS = [
    "analyze", "workflow", "process", "multiple", "steps",
    "first then", "summarize all", "compare all", "review all"
]

def is_complex(query):
    return any(kw in query.lower() for kw in COMPLEX_KEYWORDS)
```
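
Substring matching is simple, but it has two blind spots worth knowing about: a keyword such as "process" also fires inside "preprocessing", and a phrase such as "first then" only matches when the two words are adjacent. The sketch below shows one possible alternative based on word-boundary regular expressions. `is_complex_strict` and the `first ... then` pattern are illustrative assumptions, not part of the workshop's `ModeSelector`.

```python
import re

COMPLEX_KEYWORDS = [
    "analyze", "workflow", "process", "multiple", "steps",
    "first then", "summarize all", "compare all", "review all",
]

def is_complex(query: str) -> bool:
    """Substring check, as used in the workshop."""
    q = query.lower()
    return any(kw in q for kw in COMPLEX_KEYWORDS)

# Hypothetical stricter variant: whole-word matches only, plus a pattern
# that allows other words between "first" and "then".
_WORD_PATTERNS = [
    re.compile(r"\b" + re.escape(kw) + r"\b")
    for kw in COMPLEX_KEYWORDS
    if kw != "first then"
]
_SEQUENCE_PATTERN = re.compile(r"\bfirst\b.*\bthen\b")

def is_complex_strict(query: str) -> bool:
    """Word-boundary check (illustrative alternative)."""
    q = query.lower()
    return bool(_SEQUENCE_PATTERN.search(q)) or any(p.search(q) for p in _WORD_PATTERNS)

for q in [
    "Explain preprocessing in one sentence",
    "First load the file, then clean it",
    "Analyze the logs",
]:
    print(f"{q!r}: substring={is_complex(q)}, strict={is_complex_strict(q)}")
```

The stricter variant trades recall for precision: "analyzed" or "processing" no longer match, so a real implementation would probably add stemming or list explicit word variants.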

#### Pattern Confidence Threshold
- Threshold: 0.7 (configurable)
- Above threshold: Pattern is reliable enough to use directly
- Below threshold: Pattern is suggestive but needs LLM verification
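
To get a feel for what it takes to cross the threshold, the following sketch recomputes the weighted relevance from the `PatternScorer` using its default weights. The component scores are made-up illustrative values, not measured output, and `relevance` here is a standalone helper rather than the class method.

```python
# Default PatternScorer weights (they sum to 1.0).
WEIGHTS = {
    "semantic": 0.40,
    "keyword": 0.15,
    "topic": 0.20,
    "structure": 0.10,
    "boost": 0.15,
}
PATTERN_DIRECT_THRESHOLD = 0.7

def relevance(components: dict[str, float]) -> float:
    """Weighted sum of per-factor scores, each expected in [0, 1]."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

# Hypothetical component scores for two candidate patterns.
strong_match = {"semantic": 0.9, "keyword": 0.5, "topic": 0.5, "structure": 1.0, "boost": 0.6}
weak_match = {"semantic": 0.5, "keyword": 0.2, "topic": 0.0, "structure": 0.3, "boost": 0.0}

for name, comps in [("strong", strong_match), ("weak", weak_match)]:
    score = relevance(comps)
    verdict = "PATTERN_DIRECT" if score >= PATTERN_DIRECT_THRESHOLD else "needs LLM"
    print(f"{name}: relevance={score:.3f} -> {verdict}")
```

One consequence of these weights: when no embeddings are supplied the scorer uses a neutral semantic score of 0.5, so the highest reachable relevance is 0.40 × 0.5 + 0.60 = 0.80. Crossing 0.7 without embeddings therefore needs near-perfect keyword, topic, structure, and boost signals.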

#### Memory Sufficiency
- Minimum: 1 memory (configurable)
- Presence of memories means we have context to answer
- No memories = need to explore (AGENT mode)

### What the Code Demonstrates

1. **ResponseMode** enum - The four available modes
2. **ModeDecision** - Result with mode, reason, and confidence
3. **ModeSelector** - Decision logic implementation
4. **Demo** - Different contexts trigger different modes

### Mental Model

Think of mode selection like choosing transportation:
- 🚀 **PATTERN_DIRECT** = Take the shortcut (you know the way)
- 🚗 **FAST** = Drive direct (have a map)
- 🗺️ **AGENT** = Need GPS navigation (exploring)
- 🚂 **WORKFLOW** = Multi-leg journey (complex trip)

In [None]:
class ResponseMode(str, Enum):
    """Available response modes."""
    FAST = "fast"               # Single LLM call with context
    AGENT = "agent"             # Multi-step ReAct agent
    PATTERN_DIRECT = "pattern"  # Direct pattern response (no LLM)
    WORKFLOW = "workflow"       # DAG-based workflow execution

@dataclass
class ModeDecision:
    """Result of mode selection."""
    mode: ResponseMode
    reason: str
    confidence: float = 1.0
    
    # Context flags
    has_memories: bool = False
    has_patterns: bool = False
    
    # For pattern_direct mode
    direct_response: str | None = None

class ModeSelector:
    """Selects optimal response mode based on available context."""
    
    COMPLEX_TASK_KEYWORDS = [
        "analyze", "workflow", "process", "multiple", "steps",
        "first then", "summarize all", "compare all", "review all",
    ]
    
    def __init__(
        self,
        min_memories_for_fast: int = 1,
        pattern_direct_threshold: float = 0.7,
    ):
        self.min_memories_for_fast = min_memories_for_fast
        self.pattern_direct_threshold = pattern_direct_threshold
    
    def select(
        self,
        query: str,
        memories: list | None = None,
        patterns: list[ScoredPattern] | None = None,
        conversation_turns: int = 0,
    ) -> ModeDecision:
        """Select optimal response mode."""
        memories = memories or []
        patterns = patterns or []
        
        query_lower = query.lower()
        has_memories = len(memories) >= self.min_memories_for_fast
        has_patterns = len(patterns) > 0
        
        # 1. Check for complex multi-step task
        is_complex = any(kw in query_lower for kw in self.COMPLEX_TASK_KEYWORDS)
        if is_complex:
            return ModeDecision(
                mode=ResponseMode.WORKFLOW,
                reason="complex_task_detected",
                confidence=0.8,
                has_memories=has_memories,
                has_patterns=has_patterns,
            )
        
        # 2. Check for high-confidence pattern
        if patterns:
            best = max(patterns, key=lambda p: p.relevance)
            if best.relevance >= self.pattern_direct_threshold:
                return ModeDecision(
                    mode=ResponseMode.PATTERN_DIRECT,
                    reason="high_relevance_pattern",
                    confidence=best.relevance,
                    has_memories=has_memories,
                    has_patterns=True,
                    direct_response=best.strategy,
                )
        
        # 3. Check for sufficient memories
        if has_memories:
            return ModeDecision(
                mode=ResponseMode.FAST,
                reason="memories_found",
                confidence=0.9,
                has_memories=True,
                has_patterns=has_patterns,
            )
        
        # 4. Default to agent mode (needs exploration)
        return ModeDecision(
            mode=ResponseMode.AGENT,
            reason="insufficient_context",
            confidence=0.6,
            has_memories=has_memories,
            has_patterns=has_patterns,
        )

mode_selector = ModeSelector()
print("ModeSelector ready!")

In [None]:
# Demo: Mode selection with different contexts

test_cases = [
    {
        "name": "Simple query with memories",
        "query": "What is Python?",
        "memories": sample_memories[:2],
        "patterns": [],
    },
    {
        "name": "Query with high-confidence pattern",
        "query": "How to deploy?",
        "memories": [],
        "patterns": [ScoredPattern("p1", "How to deploy?", "Use docker compose up -d", 0.85, 0.8, 0.7, 0.6, 1.0)],
    },
    {
        "name": "Complex multi-step task",
        "query": "Analyze all the data and then summarize the findings in multiple steps",
        "memories": sample_memories,
        "patterns": [],
    },
    {
        "name": "No context available",
        "query": "Random question about something obscure",
        "memories": [],
        "patterns": [],
    },
]

print("Mode Selection Demo")
print("=" * 70)

for case in test_cases:
    decision = mode_selector.select(
        query=case["query"],
        memories=case["memories"],
        patterns=case["patterns"],
    )
    
    print(f"\n{case['name']}:")
    print(f"  Query: {case['query'][:50]}...")
    print(f"  Context: {len(case['memories'])} memories, {len(case['patterns'])} patterns")
    print(f"  -> Mode: {decision.mode.value.upper()}")
    print(f"     Reason: {decision.reason}")
    print(f"     Confidence: {decision.confidence:.2f}")
    if decision.direct_response:
        print(f"     Direct response: {decision.direct_response[:40]}...")

---

## Section 9B: LLM-Driven Orchestration

### The Concept

In production HGM, the **Orchestrator** uses an LLM to make intelligent decisions that go beyond rule-based logic. The LLM provides:

1. **Contextual Temperature Adjustment** - Understanding *why* a memory is relevant, not just similarity scores
2. **Semantic Promotion Decisions** - Promoting memories based on reasoning, not just thresholds
3. **Dynamic Keyword Extraction** - Identifying important concepts the pattern graph might miss
4. **Mode Selection Reasoning** - Explaining why a particular response mode was chosen

### Why LLM-Driven Decisions Matter

Rule-based systems have limitations:

| Rule-Based | LLM-Driven |
|------------|------------|
| Fixed thresholds (temp > 0.7 → promote) | Contextual reasoning ("this is relevant because...") |
| Keyword matching only | Semantic understanding of concepts |
| Binary decisions | Nuanced confidence with explanations |
| Can't handle novel situations | Generalizes to new contexts |

### The Orchestrator Pattern

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           ORCHESTRATOR FLOW                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   USER QUERY                                                                 │
│       │                                                                      │
│       ▼                                                                      │
│   ┌─────────────────────────────────────────────┐                           │
│   │            INITIAL RETRIEVAL                 │                           │
│   │  • Pattern graph lookup                      │                           │
│   │  • Three-tier memory search                  │                           │
│   │  • Temperature scoring (rule-based)          │                           │
│   └─────────────────────────────────────────────┘                           │
│       │                                                                      │
│       ▼                                                                      │
│   ┌─────────────────────────────────────────────┐                           │
│   │         LLM ORCHESTRATOR ANALYSIS            │                           │
│   │  • Evaluate memory relevance (0-1)           │                           │
│   │  • Suggest temperature adjustments           │                           │
│   │  • Extract additional keywords               │                           │
│   │  • Recommend response mode                   │                           │
│   │  • Provide reasoning                         │                           │
│   └─────────────────────────────────────────────┘                           │
│       │                                                                      │
│       ▼                                                                      │
│   ┌─────────────────────────────────────────────┐                           │
│   │         ADJUSTED DECISIONS                   │                           │
│   │  • Apply LLM temperature adjustments         │                           │
│   │  • Promote/demote based on LLM reasoning     │                           │
│   │  • Use LLM-selected mode                     │                           │
│   └─────────────────────────────────────────────┘                           │
│       │                                                                      │
│       ▼                                                                      │
│   RESPONSE GENERATION                                                        │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

### What the Code Demonstrates

1. **LLMOrchestrator** - LLM-driven decision logic on top of a pluggable `BaseLLM` backend (OpenAI, Anthropic, or mock)
2. **analyze_memories()** - LLM evaluates memory relevance with reasoning
3. **should_promote / should_demote** - Flags on each `MemoryAnalysis` that recommend tier movements
4. **Temperature adjustment** - Combining rule-based scores with LLM judgment
5. **Demo** - See how LLM reasoning improves decisions
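
The temperature adjustment in item 4 is a weighted blend. In the orchestrator code, the rule-based temperature gets 60% weight, the LLM relevance gets 40%, and the LLM's suggested adjustment is added before clamping to [0, 1]. The standalone sketch below reproduces that arithmetic; `blend_temperature` and `tier_flags` are hypothetical helper names, and the input values are illustrative.

```python
PROMOTE_THRESHOLD = 0.70  # promote when the blended score crosses this upward
DEMOTE_THRESHOLD = 0.50   # demote when it crosses this downward

def blend_temperature(rule_temp: float, llm_relevance: float, adjustment: float = 0.0) -> float:
    """60/40 blend of rule-based and LLM scores, plus adjustment, clamped to [0, 1]."""
    blended = 0.6 * rule_temp + 0.4 * llm_relevance + adjustment
    return min(1.0, max(0.0, blended))

def tier_flags(rule_temp: float, adjusted: float) -> tuple[bool, bool]:
    """Return (should_promote, should_demote) based on threshold crossings."""
    promote = adjusted >= PROMOTE_THRESHOLD and rule_temp < PROMOTE_THRESHOLD
    demote = adjusted < DEMOTE_THRESHOLD and rule_temp >= DEMOTE_THRESHOLD
    return promote, demote

# Illustrative case: a lukewarm memory that the LLM judges highly relevant.
rule_temp, llm_relevance, adjustment = 0.55, 0.90, 0.20
adjusted = blend_temperature(rule_temp, llm_relevance, adjustment)
print(f"adjusted={adjusted:.2f}, flags={tier_flags(rule_temp, adjusted)}")
```

Here 0.6 × 0.55 + 0.4 × 0.90 + 0.20 = 0.89, which crosses the 0.70 promotion threshold that the rule-based score alone did not reach.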

### Configuration

Set environment variables for real LLM integration:
```bash
export OPENAI_API_KEY="sk-..."        # For OpenAI
export ANTHROPIC_API_KEY="sk-ant-..." # For Anthropic
```

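Or run without API keys to use the mock LLM, which demonstrates the pattern at no cost. The real providers also need the optional `openai` or `anthropic` Python package installed; the mock path needs neither.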

In [None]:
# LLM Orchestrator for intelligent decision-making
import os
import json as json_module
from abc import ABC, abstractmethod

# =============================================================================
# LLM INTERFACE
# =============================================================================

class BaseLLM(ABC):
    """Abstract base class for LLM providers."""
    
    @abstractmethod
    def complete(self, prompt: str, system: str = "") -> str:
        """Generate a completion from the LLM."""
        pass

class MockLLM(BaseLLM):
    """
    Mock LLM for demonstration without API keys.
    Uses heuristics to simulate LLM-like reasoning.
    """
    
    def complete(self, prompt: str, system: str = "") -> str:
        # Parse the prompt to understand what's being asked
        prompt_lower = prompt.lower()
        
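        # Simulate memory relevance analysis. MEMORY_ANALYSIS_PROMPT opens with
        # "Analyze the relevance of these memories", so match that phrasing too.
        analysis_markers = ("analyze the relevance", "evaluate the relevance", "analyze these memories")
        if any(marker in prompt_lower for marker in analysis_markers):
            return self._mock_memory_analysis(prompt)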
        
        # Simulate promotion suggestions
        if "suggest which memories should be promoted" in prompt_lower:
            return self._mock_promotion_suggestion(prompt)
        
        # Simulate mode selection
        if "select the best response mode" in prompt_lower:
            return self._mock_mode_selection(prompt)
        
        # Default response
        return json_module.dumps({"response": "Mock LLM response", "confidence": 0.5})
    
    def _mock_memory_analysis(self, prompt: str) -> str:
        """Simulate memory relevance analysis."""
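        # Inspect only the query and memory listing (simplified heuristic). The
        # rest of the template holds an example JSON that always mentions "deploy".
        prompt = prompt.split("For each memory", 1)[0]
        analyses = []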
        
        # Look for memory content patterns in the prompt
        if "python" in prompt.lower():
            analyses.append({
                "memory_index": 0,
                "relevance": 0.85,
                "reasoning": "Directly discusses Python programming concepts",
                "suggested_temperature_adjustment": 0.15
            })
        if "deploy" in prompt.lower():
            analyses.append({
                "memory_index": 2,
                "relevance": 0.90,
                "reasoning": "Contains deployment instructions matching the query intent",
                "suggested_temperature_adjustment": 0.20
            })
        if "api" in prompt.lower() or "auth" in prompt.lower():
            analyses.append({
                "memory_index": 1,
                "relevance": 0.75,
                "reasoning": "Related to API and authentication topics",
                "suggested_temperature_adjustment": 0.10
            })
        
        # Default analysis if no specific matches
        if not analyses:
            analyses = [{
                "memory_index": 0,
                "relevance": 0.50,
                "reasoning": "Moderate topical relevance",
                "suggested_temperature_adjustment": 0.0
            }]
        
        return json_module.dumps({"analyses": analyses})
    
    def _mock_promotion_suggestion(self, prompt: str) -> str:
        """Simulate promotion suggestions."""
        suggestions = []
        
        if "deploy" in prompt.lower() and "cold" in prompt.lower():
            suggestions.append({
                "memory_id": "deployment_memory",
                "action": "promote_to_hot",
                "reasoning": "Deployment knowledge is critical for current query"
            })
        
        return json_module.dumps({"suggestions": suggestions})
    
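    def _mock_mode_selection(self, prompt: str) -> str:
        """Simulate mode selection."""
        # Inspect only the query line and the reported pattern relevance: the
        # prompt template itself always contains words like "pattern" and "high".
        query_match = re.search(r"^Query:\s*(.*)$", prompt, re.MULTILINE)
        query_text = (query_match.group(1) if query_match else prompt).lower()
        relevance_match = re.search(r"\(relevance:\s*(\d+(?:\.\d+)?)\)", prompt)
        pattern_relevance = float(relevance_match.group(1)) if relevance_match else 0.0
        
        if "analyze" in query_text or "multiple" in query_text:
            return json_module.dumps({
                "mode": "WORKFLOW",
                "confidence": 0.85,
                "reasoning": "Query requires multi-step analysis"
            })
        elif pattern_relevance >= 0.7:
            return json_module.dumps({
                "mode": "PATTERN_DIRECT",
                "confidence": 0.90,
                "reasoning": "High-confidence pattern match available"
            })
        else:
            return json_module.dumps({
                "mode": "FAST",
                "confidence": 0.75,
                "reasoning": "Sufficient context available for direct response"
            })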

class OpenAILLM(BaseLLM):
    """OpenAI API integration."""
    
    def __init__(self, model: str = "gpt-4o-mini"):
        self.model = model
        self.api_key = os.environ.get("OPENAI_API_KEY")
        if not self.api_key:
            raise ValueError("OPENAI_API_KEY not set")
    
    def complete(self, prompt: str, system: str = "") -> str:
        try:
            import openai
            client = openai.OpenAI(api_key=self.api_key)
            
            messages = []
            if system:
                messages.append({"role": "system", "content": system})
            messages.append({"role": "user", "content": prompt})
            
            response = client.chat.completions.create(
                model=self.model,
                messages=messages,
                temperature=0.1,  # Low temperature for consistent decisions
                response_format={"type": "json_object"}
            )
            return response.choices[0].message.content
        except Exception as e:
            print(f"OpenAI API error: {e}")
            return json_module.dumps({"error": str(e)})

class AnthropicLLM(BaseLLM):
    """Anthropic API integration."""
    
    def __init__(self, model: str = "claude-3-5-haiku-20241022"):
        self.model = model
        self.api_key = os.environ.get("ANTHROPIC_API_KEY")
        if not self.api_key:
            raise ValueError("ANTHROPIC_API_KEY not set")
    
    def complete(self, prompt: str, system: str = "") -> str:
        try:
            import anthropic
            client = anthropic.Anthropic(api_key=self.api_key)
            
            response = client.messages.create(
                model=self.model,
                max_tokens=1024,
                system=system or "You are a memory orchestration assistant. Respond in JSON format.",
                messages=[{"role": "user", "content": prompt}]
            )
            return response.content[0].text
        except Exception as e:
            print(f"Anthropic API error: {e}")
            return json_module.dumps({"error": str(e)})

def get_llm(provider: str = "auto") -> BaseLLM:
    """
    Get an LLM instance based on available API keys.
    
    Args:
        provider: "openai", "anthropic", "mock", or "auto" (try real APIs first)
    """
    if provider == "mock":
        return MockLLM()
    
    if provider == "auto":
        # Try OpenAI first
        if os.environ.get("OPENAI_API_KEY"):
            try:
                return OpenAILLM()
            except Exception:
                pass
        
        # Try Anthropic
        if os.environ.get("ANTHROPIC_API_KEY"):
            try:
                return AnthropicLLM()
            except Exception:
                pass
        
        # Fall back to mock
        print("No API keys found. Using MockLLM for demonstration.")
        return MockLLM()
    
    if provider == "openai":
        return OpenAILLM()
    elif provider == "anthropic":
        return AnthropicLLM()
    else:
        return MockLLM()

print("LLM providers defined: MockLLM, OpenAILLM, AnthropicLLM")
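
One practical wrinkle: the orchestrator code that follows calls `json.loads` on the raw completion. OpenAI's JSON mode (`response_format={"type": "json_object"}`) returns bare JSON, but a model that is only *asked* for JSON may wrap it in a Markdown code fence or add surrounding prose, which makes parsing fail and triggers the rule-based fallback. The helper below is a minimal sketch of a more forgiving parser. `parse_json_response` is a hypothetical name and is not used by the workshop code.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep this block fence-safe

_FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL | re.IGNORECASE)

def parse_json_response(text: str) -> dict:
    """Best-effort extraction of a JSON object from an LLM completion.

    Tries, in order: the text as-is, the contents of a fenced block,
    and the span from the first '{' to the last '}'.
    Returns {} if nothing parses to a JSON object.
    """
    candidates = [text.strip()]
    fenced = _FENCED_BLOCK.search(text)
    if fenced:
        candidates.append(fenced.group(1))
    start, end = text.find("{"), text.rfind("}")
    if 0 <= start < end:
        candidates.append(text[start:end + 1])

    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return {}

wrapped = "Here you go:\n" + FENCE + 'json\n{"mode": "FAST", "confidence": 0.8}\n' + FENCE
print(parse_json_response(wrapped))
```

Returning an empty dict keeps the caller's `.get(...)` defaults working; a stricter design might raise instead so that the fallback path is taken explicitly.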

In [None]:
# =============================================================================
# LLM ORCHESTRATOR
# =============================================================================

@dataclass
class MemoryAnalysis:
    """LLM analysis of a memory's relevance."""
    memory_id: str
    original_temperature: float
    llm_relevance: float
    adjusted_temperature: float
    reasoning: str
    should_promote: bool
    should_demote: bool

@dataclass 
class OrchestratorDecision:
    """Complete orchestrator decision with LLM reasoning."""
    mode: ResponseMode
    mode_reasoning: str
    memory_analyses: list[MemoryAnalysis]
    extracted_keywords: list[str]
    confidence: float
    
class LLMOrchestrator:
    """
    Orchestrator that uses LLM reasoning for intelligent decisions.
    
    This mirrors the HGM Mind's orchestrator which combines rule-based
    retrieval with LLM-driven analysis for better decisions.
    """
    
    SYSTEM_PROMPT = """You are a memory orchestration system for an AI agent.
Your job is to analyze memories and make intelligent decisions about:
1. How relevant each memory is to the current query
2. Whether memories should be promoted (moved to faster storage) or demoted
3. What keywords/concepts are important in the query
4. What response mode the agent should use

Always respond in valid JSON format."""

    MEMORY_ANALYSIS_PROMPT = """Analyze the relevance of these memories to the user's query.

Query: {query}

Memories to analyze:
{memories}

For each memory, provide:
1. relevance: A score from 0.0 to 1.0
2. reasoning: Why this memory is or isn't relevant (1-2 sentences)
3. suggested_temperature_adjustment: How much to adjust the temperature (-0.3 to +0.3)

Respond in JSON format:
{{
    "analyses": [
        {{
            "memory_index": 0,
            "relevance": 0.85,
            "reasoning": "Directly addresses the deployment question",
            "suggested_temperature_adjustment": 0.15
        }}
    ],
    "extracted_keywords": ["deploy", "docker", "production"],
    "overall_assessment": "The memories contain relevant deployment information"
}}"""

    MODE_SELECTION_PROMPT = """Select the best response mode for this query.

Query: {query}

Available context:
- Memories found: {memory_count}
- Best pattern match: {best_pattern} (relevance: {pattern_relevance})
- Query complexity indicators: {complexity_indicators}

Available modes:
- FAST: Single LLM call with retrieved context (best for simple queries with good context)
- AGENT: Multi-step exploration (best when context is insufficient)
- PATTERN_DIRECT: Use pattern strategy directly (best for high-confidence pattern matches)
- WORKFLOW: Multi-step DAG execution (best for complex multi-part tasks)

Respond in JSON:
{{
    "mode": "FAST",
    "confidence": 0.85,
    "reasoning": "Sufficient context available and query is straightforward"
}}"""
    
    def __init__(self, llm: BaseLLM | None = None, temperature_scorer: TemperatureScorer | None = None):
        self.llm = llm or get_llm("auto")
        self.scorer = temperature_scorer or TemperatureScorer()
        self._is_mock = isinstance(self.llm, MockLLM)
    
    def analyze_memories(
        self,
        query: str,
        memories: list[Memory],
        focus_embedding: np.ndarray | None = None,
        focus_entities: set[str] | None = None,
    ) -> list[MemoryAnalysis]:
        """
        Use LLM to analyze memory relevance and suggest temperature adjustments.
        
        This combines rule-based temperature scoring with LLM reasoning.
        """
        analyses = []
        
        # First, compute rule-based temperatures
        for mem in memories:
            rule_temp = self.scorer.compute(
                memory=mem,
                focus_embedding=focus_embedding,
                focus_entities=focus_entities,
            )
            mem.temperature = rule_temp
        
        # Format memories for LLM
        memory_descriptions = []
        for i, mem in enumerate(memories):
            memory_descriptions.append(
                f"{i}. [{mem.memory_type.value}] (temp: {mem.temperature:.2f}) {mem.content[:100]}..."
            )
        
        # Get LLM analysis
        prompt = self.MEMORY_ANALYSIS_PROMPT.format(
            query=query,
            memories="\n".join(memory_descriptions)
        )
        
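        try:
            response = self.llm.complete(prompt, self.SYSTEM_PROMPT)
            result = json_module.loads(response)
            # Providers report API failures as {"error": ...}; raise so that the
            # rule-based fallback below runs instead of returning no analyses.
            if "error" in result:
                raise RuntimeError(result["error"])
            
            for analysis in result.get("analyses", []):
                idx = analysis.get("memory_index", 0)
                if isinstance(idx, int) and 0 <= idx < len(memories):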
                    mem = memories[idx]
                    llm_relevance = analysis.get("relevance", 0.5)
                    adjustment = analysis.get("suggested_temperature_adjustment", 0.0)
                    
                    # Combine rule-based and LLM temperatures (weighted average)
                    # LLM gets 40% weight, rule-based gets 60%
                    adjusted_temp = 0.6 * mem.temperature + 0.4 * llm_relevance
                    adjusted_temp = min(1.0, max(0.0, adjusted_temp + adjustment))
                    
                    analyses.append(MemoryAnalysis(
                        memory_id=mem.id,
                        original_temperature=mem.temperature,
                        llm_relevance=llm_relevance,
                        adjusted_temperature=adjusted_temp,
                        reasoning=analysis.get("reasoning", ""),
                        should_promote=adjusted_temp >= 0.70 and mem.temperature < 0.70,
                        should_demote=adjusted_temp < 0.50 and mem.temperature >= 0.50,
                    ))
        except Exception as e:
            print(f"LLM analysis error: {e}")
            # Fall back to rule-based only (discard any partial LLM results)
            analyses = []
            for mem in memories:
                analyses.append(MemoryAnalysis(
                    memory_id=mem.id,
                    original_temperature=mem.temperature,
                    llm_relevance=mem.temperature,
                    adjusted_temperature=mem.temperature,
                    reasoning="Rule-based scoring (LLM unavailable)",
                    should_promote=mem.temperature >= 0.70,
                    should_demote=mem.temperature < 0.50,
                ))
        
        return analyses
    
    def select_mode(
        self,
        query: str,
        memories: list[Memory],
        patterns: list[ScoredPattern],
    ) -> tuple[ResponseMode, str, float]:
        """
        Use LLM to select the optimal response mode with reasoning.
        """
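        # Pick the highest-relevance pattern; the list is not assumed to be sorted
        best_pattern = max(patterns, key=lambda p: p.relevance) if patterns else None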
        
        # Detect complexity indicators
        complexity_indicators = []
        query_lower = query.lower()
        if any(kw in query_lower for kw in ["analyze", "compare", "summarize all"]):
            complexity_indicators.append("analysis_required")
        if any(kw in query_lower for kw in ["step by step", "workflow", "process"]):
            complexity_indicators.append("multi_step")
        if len(query.split()) < 5:
            complexity_indicators.append("short_query")
        
        prompt = self.MODE_SELECTION_PROMPT.format(
            query=query,
            memory_count=len(memories),
            best_pattern=best_pattern.trigger[:50] if best_pattern else "None",
            pattern_relevance=f"{best_pattern.relevance:.2f}" if best_pattern else "N/A",
            complexity_indicators=", ".join(complexity_indicators) or "none"
        )
        
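        try:
            response = self.llm.complete(prompt, self.SYSTEM_PROMPT)
            result = json_module.loads(response)
            # Treat a provider-reported {"error": ...} as a failure (use fallback)
            if "error" in result:
                raise RuntimeError(result["error"])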
            
            mode_str = result.get("mode", "FAST").upper()
            mode_map = {
                "FAST": ResponseMode.FAST,
                "AGENT": ResponseMode.AGENT,
                "PATTERN_DIRECT": ResponseMode.PATTERN_DIRECT,
                "PATTERN": ResponseMode.PATTERN_DIRECT,
                "WORKFLOW": ResponseMode.WORKFLOW,
            }
            mode = mode_map.get(mode_str, ResponseMode.FAST)
            reasoning = result.get("reasoning", "")
            confidence = result.get("confidence", 0.5)
            
            return mode, reasoning, confidence
            
        except Exception as e:
            print(f"LLM mode selection error: {e}")
            # Fall back to rule-based
            if best_pattern and best_pattern.relevance >= 0.7:
                return ResponseMode.PATTERN_DIRECT, "High-confidence pattern (fallback)", 0.7
            elif memories:
                return ResponseMode.FAST, "Memories available (fallback)", 0.6
            else:
                return ResponseMode.AGENT, "Insufficient context (fallback)", 0.5
    
    def orchestrate(
        self,
        query: str,
        memories: list[Memory],
        patterns: list[ScoredPattern],
        focus_embedding: np.ndarray | None = None,
        focus_entities: set[str] | None = None,
    ) -> OrchestratorDecision:
        """
        Full orchestration: analyze memories, select mode, return decision.
        """
        # Analyze memories with LLM
        analyses = self.analyze_memories(query, memories, focus_embedding, focus_entities)
        
        # Select mode with LLM
        mode, mode_reasoning, confidence = self.select_mode(query, memories, patterns)
        
        # Extract keywords from analysis (would come from LLM in full implementation)
        keywords = list(focus_entities) if focus_entities else []
        
        return OrchestratorDecision(
            mode=mode,
            mode_reasoning=mode_reasoning,
            memory_analyses=analyses,
            extracted_keywords=keywords,
            confidence=confidence,
        )

# Create orchestrator (will auto-detect available LLM)
orchestrator = LLMOrchestrator()
print(f"LLMOrchestrator initialized (using {'MockLLM' if orchestrator._is_mock else 'Real LLM'})")

In [None]:
# =============================================================================
# DEMO: LLM-DRIVEN ORCHESTRATION
# =============================================================================

print("=" * 70)
print("LLM-DRIVEN ORCHESTRATION DEMO")
print("=" * 70)

# Test query
test_query = "How do I deploy my Python application to production?"

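# Create focus context (mock embedding seeded from the query text).
# Note: Python salts str hashes per process, so this embedding is stable
# within one session but differs between runs unless PYTHONHASHSEED is set.
np.random.seed(hash(test_query) % 2**32)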
query_embedding = np.random.randn(384).astype(np.float32)
query_embedding = query_embedding / np.linalg.norm(query_embedding)
query_entities = {"deploy", "python", "production", "application"}

print(f"\nQuery: {test_query}")
print(f"Focus entities: {query_entities}")

# Get orchestrator decision
decision = orchestrator.orchestrate(
    query=test_query,
    memories=sample_memories,
    patterns=[],  # Would normally come from pattern graph
    focus_embedding=query_embedding,
    focus_entities=query_entities,
)

print("\n" + "-" * 70)
print("MEMORY ANALYSES (LLM-Enhanced)")
print("-" * 70)

for analysis in decision.memory_analyses:
    # Find the memory
    mem = next((m for m in sample_memories if m.id == analysis.memory_id), None)
    if mem:
        print(f"\n[{mem.memory_type.value.upper()}] {mem.content[:50]}...")
        print(f"  Original temperature: {analysis.original_temperature:.3f}")
        print(f"  LLM relevance score:  {analysis.llm_relevance:.3f}")
        print(f"  Adjusted temperature: {analysis.adjusted_temperature:.3f}")
        print(f"  Reasoning: {analysis.reasoning}")
        
        if analysis.should_promote:
            print(f"  → PROMOTE to hot tier")
        elif analysis.should_demote:
            print(f"  → DEMOTE from hot tier")

print("\n" + "-" * 70)
print("MODE SELECTION (LLM-Enhanced)")
print("-" * 70)
print(f"\nSelected mode: {decision.mode.value.upper()}")
print(f"Confidence: {decision.confidence:.2f}")
print(f"Reasoning: {decision.mode_reasoning}")

# Compare with rule-based decision
print("\n" + "-" * 70)
print("COMPARISON: Rule-Based vs LLM-Enhanced")
print("-" * 70)

rule_decision = mode_selector.select(
    query=test_query,
    memories=sample_memories,
    patterns=[],
)

print(f"\nRule-based mode:  {rule_decision.mode.value.upper()} ({rule_decision.reason})")
print(f"LLM-enhanced mode: {decision.mode.value.upper()} ({decision.mode_reasoning})")

# Show temperature adjustment impact
print("\n" + "-" * 70)
print("TEMPERATURE ADJUSTMENT IMPACT")
print("-" * 70)

print("\nMemory | Rule-Based | LLM-Adjusted | Delta")
print("-" * 50)
for analysis in decision.memory_analyses:
    mem = next((m for m in sample_memories if m.id == analysis.memory_id), None)
    if mem:
        delta = analysis.adjusted_temperature - analysis.original_temperature
        direction = "↑" if delta > 0 else "↓" if delta < 0 else "="
        print(f"{mem.memory_type.value[:8]:8} | {analysis.original_temperature:.3f}      | {analysis.adjusted_temperature:.3f}        | {direction} {abs(delta):.3f}")

---

## Section 9C: Agent Chat Continuity

### The Concept

Each LLM call is **stateless**: the model retains nothing from previous calls. So how do AI agents maintain coherent, continuous conversations?

The answer: **store each message with a label**, then rebuild the context on every turn from **hot tier memories + matched patterns**.

### Memory Labels

HGM uses three labels to categorize what's stored in hot memory:

| Label | What It Stores | Purpose |
|-------|----------------|--------|
| `USER_QUERY` | User's exact message | Track what user asked |
| `AGENT_THOUGHT` | Agent's reasoning/response | Track decisions made |
| `PATTERN` | Learned strategy reference | Apply known solutions |

### Why Labels Matter

| Without Labels | With Labels |
|---------------|------------|
| All memories treated equally | Context type preserved |
| Can't distinguish user vs agent | Clear conversation flow |
| Patterns mixed with chat | Strategies clearly marked |
| Harder to prioritize recall | Intelligent filtering |
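
As a rough illustration of the right-hand column, the sketch below tags each stored item with a label and then filters on it. The `Label` enum, the `Item` tuple, and the `by_label` helper are illustrative stand-ins, not the workshop's own `MemoryLabel` and `LabeledMemory` classes.

```python
from enum import Enum
from typing import NamedTuple


class Label(str, Enum):
    # Illustrative stand-in for the workshop's MemoryLabel enum.
    USER_QUERY = "user_query"
    AGENT_THOUGHT = "agent_thought"
    PATTERN = "pattern"


class Item(NamedTuple):
    label: Label
    content: str


store = [
    Item(Label.USER_QUERY, "Deploy my Python app"),
    Item(Label.AGENT_THOUGHT, "User needs containerization"),
    Item(Label.PATTERN, "kubernetes_deployment -> k8s manifest"),
    Item(Label.USER_QUERY, "Use Kubernetes"),
]


def by_label(items: list[Item], label: Label) -> list[str]:
    """Return the content of every item carrying the given label."""
    return [item.content for item in items if item.label is label]


user_side = by_label(store, Label.USER_QUERY)   # only what the user said
strategies = by_label(store, Label.PATTERN)     # only learned strategies
print(user_side)
print(strategies)
```

Without the label field, the same store would be a flat list of strings, and separating user turns from strategies would require parsing the text itself.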

### The Context Assembly Formula

For each user message, HGM assembles context with labels:

```
┌─────────────────────────────────────────────────────────────┐
│                    AGENT CONTEXT WINDOW                      │
├─────────────────────────────────────────────────────────────┤
│  [SYSTEM PROMPT]                                             │
│  [HOT MEMORIES]                                              │
│     • [USER_QUERY] "Deploy my Python app"                   │
│     • [AGENT_THOUGHT] "User needs containerization"         │
│     • [USER_QUERY] "Use Kubernetes"                         │
│     • [AGENT_THOUGHT] "Switching to K8s approach"           │
│  [MATCHED PATTERNS]                                          │
│     • [PATTERN] kubernetes_deployment → k8s manifest        │
│  [EPISODE CONTEXT] Topic: deployment, Entities: k8s, python │
│  [USER MESSAGE] Current query                                │
└─────────────────────────────────────────────────────────────┘
```
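
The layout above can be sketched as a small rendering function. This is a minimal illustration, assuming each hot memory arrives as a `(label, text)` pair; the `build_context_window` name, its parameters, and the placeholder system prompt are hypothetical and not part of the workshop code.

```python
def build_context_window(
    system_prompt: str,
    hot_memories: list[tuple[str, str]],
    patterns: list[str],
    episode_context: str,
    user_message: str,
) -> str:
    """Render the context sections in the same order as the diagram."""
    lines = ["[SYSTEM PROMPT]", f"  {system_prompt}", "[HOT MEMORIES]"]
    lines += [f"  - [{label}] {text}" for label, text in hot_memories]
    lines.append("[MATCHED PATTERNS]")
    lines += [f"  - [PATTERN] {pattern}" for pattern in patterns]
    lines.append(f"[EPISODE CONTEXT] {episode_context}")
    lines.append(f"[USER MESSAGE] {user_message}")
    return "\n".join(lines)


window = build_context_window(
    system_prompt="You are a deployment assistant.",  # placeholder text
    hot_memories=[
        ("USER_QUERY", "Deploy my Python app"),
        ("AGENT_THOUGHT", "User needs containerization"),
    ],
    patterns=["kubernetes_deployment -> k8s manifest"],
    episode_context="Topic: deployment, Entities: k8s, python",
    user_message="What about secrets?",
)
print(window)
```

Keeping the current user message last means the model reads the accumulated context before the question it has to answer.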

### Turn-Based Continuity Flow

```
TURN 1: User → "Help me deploy my app"
   ┌─ STORE ─────────────────────────────────────────────┐
   │  [USER_QUERY] "Help me deploy my app"               │
   │  [AGENT_THOUGHT] "User needs deployment guidance"   │
   │  [PATTERN] deployment → docker_strategy             │
   └─────────────────────────────────────────────────────┘
   Agent → "I'll help you deploy using Docker..."

TURN 2: User → "Use Kubernetes instead"
   ┌─ RECALL from HOT ───────────────────────────────────┐
   │  [USER_QUERY] "Help me deploy my app"  ← Turn 1     │
   │  [AGENT_THOUGHT] "User needs deployment guidance"   │
   └─────────────────────────────────────────────────────┘
   Agent knows we're discussing deployment!

TURN 3: User → "What about secrets?"  (no explicit K8s mention)
   ┌─ RECALL from HOT ───────────────────────────────────┐
   │  [USER_QUERY] "Use Kubernetes instead"  ← Turn 2    │
   │  [PATTERN] kubernetes → active context              │
   └─────────────────────────────────────────────────────┘
   Agent infers "secrets" = Kubernetes secrets!
```

### What the Code Demonstrates

1. **MemoryLabel enum** - Classification for context types
2. **Labeled storage** - Each memory tagged with its type
3. **Context assembly** - Labels enable intelligent recall
4. **Successive messages** - Short follow-ups without breaking context
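
The recall step in the cell that follows ranks each stored memory by a blend of recency and relevance: recency drops by 0.1 per turn (floored at 0.1), and the final score is `0.4 * recency + 0.6 * relevance`. The standalone helper below reproduces that arithmetic for two hypothetical memories; the `recall_score` name and the relevance values are made up for illustration.

```python
def recall_score(current_turn: int, memory_turn: int, relevance: float) -> float:
    """Blend recency and relevance using the 0.4 / 0.6 weights from ChatAgent."""
    recency = max(0.1, 1.0 - 0.1 * (current_turn - memory_turn))
    return 0.4 * recency + 0.6 * relevance


# A memory from the previous turn with modest similarity to the query...
recent = recall_score(current_turn=5, memory_turn=4, relevance=0.2)
# ...versus a memory four turns old that matches the query more closely.
older = recall_score(current_turn=5, memory_turn=1, relevance=0.5)

print(f"recent={recent:.2f} older={older:.2f}")  # recent=0.48 older=0.54
```

Because relevance carries the larger weight, a closely matching older memory can outrank a fresher but less related one, which is what lets a short follow-up pull in the right earlier turn.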


In [None]:
# =============================================================================
# AGENT CHAT CONTINUITY WITH LABELS
# =============================================================================

class MemoryLabel(str, Enum):
    """
    Labels for categorizing memories in the hot tier.
    These enable intelligent context assembly and filtering.
    """
    USER_QUERY = "user_query"       # User's message
    AGENT_THOUGHT = "agent_thought" # Agent's reasoning/response
    PATTERN = "pattern"             # Matched strategy reference

@dataclass
class LabeledMemory:
    """A memory with an explicit label for context type."""
    label: MemoryLabel
    content: str
    turn_number: int
    timestamp: datetime
    embedding: np.ndarray = field(default_factory=lambda: np.zeros(384, dtype=np.float32))  # match the float32 embeddings used elsewhere
    temperature: float = 1.0
    
    def to_context_string(self) -> str:
        """Format for inclusion in LLM context."""
        return f"[{self.label.value.upper()}] {self.content}"

@dataclass
class ConversationTurn:
    """A single turn in a conversation with labeled memories."""
    turn_number: int
    user_message: str
    agent_response: str
    memories_recalled: list[str] = field(default_factory=list)
    patterns_matched: list[str] = field(default_factory=list)

class ChatAgent:
    """
    Chat agent demonstrating continuity via labeled hot memory + patterns.
    
    Key innovation: Each memory is labeled (USER_QUERY, AGENT_THOUGHT, PATTERN)
    enabling intelligent context reconstruction for successive messages.
    """
    
    def __init__(self, agent_id: str):
        self.agent_id = agent_id
        self.context = create_agent(agent_id, "assistant")
        self.hot_tier = HotTier(agent_id, max_tokens=4000)
        self.pattern_graph = graph  # Use existing pattern graph
        self.labeled_memories: list[LabeledMemory] = []
        self.conversation_history: list[ConversationTurn] = []
        self.turn_counter = 0
        self.scorer = TemperatureScorer()
    
    def store_labeled_memory(
        self,
        label: MemoryLabel,
        content: str,
        keywords: list[str] | None = None
    ) -> LabeledMemory:
        """
        Store a memory with an explicit label.
        This is the KEY function for labeled context management.
        """
        # Generate embedding
        np.random.seed(hash(content) % 2**32)
        embedding = np.random.randn(384).astype(np.float32)
        embedding = embedding / np.linalg.norm(embedding)
        
        # Create labeled memory
        labeled_mem = LabeledMemory(
            label=label,
            content=content,
            turn_number=self.turn_counter,
            timestamp=datetime.now(),
            embedding=embedding,
            temperature=1.0,  # Fresh memories are hot
        )
        self.labeled_memories.append(labeled_mem)
        
        # Also store in hot tier for fast retrieval
        hot_mem = create_memory(
            content=f"[{label.value.upper()}] {content}",
            memory_type=MemoryType.EPISODIC,
            hierarchy_path=f"conversation/{label.value}",
            entity_ids=keywords or [],
            hours_ago=0,
            access_count=1,
        )
        self.hot_tier.put(to_hot_memory(hot_mem))
        
        return labeled_mem
    
    def recall_labeled_context(self, query: str, limit: int = 8) -> list[LabeledMemory]:
        """
        Recall relevant labeled memories for context assembly.
        Returns memories sorted by relevance, preserving labels.
        """
        if not self.labeled_memories:
            return []
        
        # Generate query embedding
        np.random.seed(hash(query) % 2**32)
        query_emb = np.random.randn(384).astype(np.float32)
        query_emb = query_emb / np.linalg.norm(query_emb)
        
        # Score memories by recency + relevance
        scored = []
        for mem in self.labeled_memories:
            # Recency: recent memories score higher
            recency = 1.0 - (self.turn_counter - mem.turn_number) * 0.1
            recency = max(0.1, recency)
            
            # Relevance: cosine similarity to query
            relevance = float(np.dot(query_emb, mem.embedding))
            
            # Combined score
            score = 0.4 * recency + 0.6 * relevance
            scored.append((score, mem))
        
        # Sort by score descending, return top memories
        scored.sort(key=lambda x: x[0], reverse=True)
        return [mem for _, mem in scored[:limit]]
    
    def assemble_context(self, query: str) -> dict:
        """
        Assemble full context from labeled memories + patterns.
        This reconstructs conversation state for each turn.
        """
        keywords = self.pattern_graph.extract_keywords(query)
        
        # 1. Recall labeled memories (ranked by recency + relevance)
        labeled_mems = self.recall_labeled_context(query)
        
        # Separate by label for structured context
        user_queries = [m for m in labeled_mems if m.label == MemoryLabel.USER_QUERY]
        agent_thoughts = [m for m in labeled_mems if m.label == MemoryLabel.AGENT_THOUGHT]
        
        # 2. Get matched patterns
        patterns = self.pattern_graph.find_patterns(keywords, limit=3)
        pattern_strategies = [
            f"{p.trigger}: {p.strategy}" 
            for p in patterns if p.relevance > 0.5
        ]
        
        # 3. Build context string with labels
        context_items = []
        for mem in labeled_mems:
            context_items.append(mem.to_context_string())
        
        for strategy in pattern_strategies:
            context_items.append(f"[PATTERN] {strategy}")
        
        return {
            "labeled_memories": labeled_mems,
            "user_queries": [m.content for m in user_queries],
            "agent_thoughts": [m.content for m in agent_thoughts],
            "patterns": pattern_strategies,
            "keywords": keywords,
            "context_string": "\n".join(context_items),
        }
    
    def process_message(self, user_message: str) -> tuple[str, dict]:
        """
        Process a user message with full labeled context management.
        
        Steps:
        1. Assemble context from previous labeled memories
        2. Store user message as [USER_QUERY]
        3. Generate response (simulated)
        4. Store response as [AGENT_THOUGHT]
        5. Store any matched patterns as [PATTERN]
        """
        self.turn_counter += 1
        
        # Assemble context from previous turns
        context = self.assemble_context(user_message)
        keywords = context["keywords"]
        
        # Store user message with label
        self.store_labeled_memory(
            MemoryLabel.USER_QUERY,
            user_message,
            keywords
        )
        
        # Update agent focus
        np.random.seed(hash(user_message) % 2**32)
        focus_emb = np.random.randn(384).astype(np.float32)
        self.context.update_focus(
            embedding=focus_emb / np.linalg.norm(focus_emb),
            entities=set(keywords),
        )
        
        # Generate response (simulated LLM call)
        response, thought = self._generate_response(user_message, context)
        
        # Store agent thought with label
        self.store_labeled_memory(
            MemoryLabel.AGENT_THOUGHT,
            thought,
            keywords
        )
        
        # Store matched patterns as labeled memories
        for pattern in context["patterns"][:1]:  # Top pattern only
            self.store_labeled_memory(
                MemoryLabel.PATTERN,
                pattern,
                keywords
            )
        
        # Record conversation turn
        self.conversation_history.append(ConversationTurn(
            turn_number=self.turn_counter,
            user_message=user_message,
            agent_response=response,
            memories_recalled=[m.to_context_string() for m in context["labeled_memories"][:3]],
            patterns_matched=context["patterns"][:2],
        ))
        
        return response, context
    
    def _generate_response(self, query: str, context: dict) -> tuple[str, str]:
        """
        Simulate response generation. Returns (response, thought).
        In production, this calls the LLM with assembled context.
        """
        # Check for pattern-based response
        if context["patterns"]:
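            # Patterns are formatted as "trigger: strategy". Split only once so
            # a strategy that itself contains ": " is kept whole.
            top_pattern = context["patterns"][0]
            strategy = top_pattern.split(": ", 1)[1] if ": " in top_pattern else top_pattern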
            thought = f"Using pattern strategy: {strategy}"
            return f"Based on our approach: {strategy}", thought
        
        # Check for continuity from previous user queries
        if context["user_queries"]:
            prev_query = context["user_queries"][0]
            thought = f"Continuing from previous query about: {prev_query[:40]}"
            return f"Building on what we discussed: {prev_query[:50]}...", thought
        
        thought = "Starting new conversation thread"
        return "I'll help you with that. What aspect would you like to explore?", thought
    
    def show_memory_state(self):
        """Display current labeled memory state."""
        print(f"\n{'=' * 60}")
        print(f"LABELED MEMORY STATE (Turn {self.turn_counter})")
        print(f"{'=' * 60}")
        
        for label in MemoryLabel:
            mems = [m for m in self.labeled_memories if m.label == label]
            print(f"\n[{label.value.upper()}] ({len(mems)} memories)")
            for m in mems[-3:]:  # Show last 3 of each type
                preview = m.content[:50] + "..." if len(m.content) > 50 else m.content
                print(f"  Turn {m.turn_number}: {preview}")

# Create chat agent
chat_agent = ChatAgent("demo-chat-agent")
print("ChatAgent with labeled memory created")
print(f"Hot tier capacity: {chat_agent.hot_tier.max_tokens} tokens")
print(f"\nMemory labels available:")
for label in MemoryLabel:
    print(f"  • {label.name} = {label.value!r}")


In [None]:
# =============================================================================
# TURN-BASED CONTINUITY DEMO: SUCCESSIVE MESSAGES
# =============================================================================
# This demonstrates how users can send short follow-up messages
# without breaking context - the key benefit of labeled memories.

print("=" * 70)
print("TURN-BASED CONTINUITY DEMONSTRATION")
print("=" * 70)
print("\nWatch how the agent maintains context across successive short messages.")
print("Each message builds on previous context without explicit repetition.\n")

# Simulate a multi-turn conversation with short follow-ups
conversation = [
    "How do I deploy my Python application?",  # Turn 1: Initial context
    "Can you show me the Docker approach?",     # Turn 2: Builds on "deploy"
    "What about environment variables?",        # Turn 3: Short follow-up
    "And secrets?",                             # Turn 4: Very short - relies on context
    "Show me the full Dockerfile",              # Turn 5: Assumes Docker context
]

for i, message in enumerate(conversation, 1):
    print(f"\n{'─' * 70}")
    print(f"TURN {i}")
    print(f"{'─' * 70}")
    print(f"\n👤 User: {message}")
    
    response, context = chat_agent.process_message(message)
    
    print(f"\n🤖 Agent: {response}")
    
    print(f"\n📊 Context Assembly (with labels):")
    print(f"   Keywords extracted: {context['keywords']}")
    
    # Show recalled labeled memories
    if context['labeled_memories']:
        print(f"   \n   Recalled memories ({len(context['labeled_memories'])}):")
        for mem in context['labeled_memories'][:4]:
            label_str = mem.label.value.upper()
            content_preview = mem.content[:45] + "..." if len(mem.content) > 45 else mem.content
            print(f"      [{label_str}] Turn {mem.turn_number}: {content_preview}")
    
    # Show patterns
    if context['patterns']:
        print(f"   \n   Matched patterns:")
        for pat in context['patterns'][:2]:
            pat_preview = pat[:55] + "..." if len(pat) > 55 else pat
            print(f"      [PATTERN] {pat_preview}")
    
    print(f"\n   Hot tier: {chat_agent.hot_tier.stats()['memory_count']} memories")

# Show final memory state
chat_agent.show_memory_state()

print("\n" + "=" * 70)
print("CONTINUITY ANALYSIS")
print("=" * 70)
print(f"\nTotal turns: {len(conversation)}")
print(f"Total labeled memories: {len(chat_agent.labeled_memories)}")
print(f"\nBreakdown by label:")
for label in MemoryLabel:
    count = len([m for m in chat_agent.labeled_memories if m.label == label])
    print(f"   {label.value.upper()}: {count}")

print("\n✅ Key Observations:")
print("   • Short messages like 'And secrets?' work because of context")
print("   • [USER_QUERY] labels track what user asked")
print("   • [AGENT_THOUGHT] labels preserve reasoning")
print("   • [PATTERN] labels apply learned strategies")
print("   • Agent doesn't need explicit topic repetition")


In [None]:
# =============================================================================
# CROSS-SESSION CONTINUITY (Pattern Persistence)
# =============================================================================
# Labels also enable cross-session memory:
# - [USER_QUERY] and [AGENT_THOUGHT] are session-scoped (hot tier)
# - [PATTERN] can persist across sessions (cold tier)

print("\n" + "=" * 70)
print("CROSS-SESSION CONTINUITY")
print("=" * 70)

print("""
WITHIN A SESSION:
  • Hot tier maintains [USER_QUERY] and [AGENT_THOUGHT] memories
  • Recent messages stay accessible via labeled recall
  • Context builds naturally turn by turn

ACROSS SESSIONS:
  • [PATTERN] memories persist in cold tier
  • Learned strategies survive session restart
  • User preferences become permanent patterns

EXAMPLE:
┌─ Session 1 ────────────────────────────────────────────────────┐
│  User: "I prefer kubectl over helm for Kubernetes"            │
│  ↓                                                             │
│  Store: [USER_QUERY] "I prefer kubectl over helm"             │
│  Store: [AGENT_THOUGHT] "User prefers CLI over helm charts"   │
│  Create: [PATTERN] kubernetes_preference → "use kubectl"      │
│  ↓                                                             │
│  Archive [PATTERN] to COLD tier for persistence               │
└────────────────────────────────────────────────────────────────┘

┌─ Session 2 (next day) ─────────────────────────────────────────┐
│  User: "How do I check my deployments?"                       │
│  ↓                                                             │
│  Recall from COLD: [PATTERN] kubernetes_preference → kubectl  │
│  ↓                                                             │
│  Agent response: "Use kubectl get deployments..."             │
│  (Agent remembers user prefers kubectl!)                       │
└────────────────────────────────────────────────────────────────┘
""")

# Demonstrate pattern creation for preference
print("Creating a preference pattern for cross-session persistence...")
graph.add_pattern(
    "pat_user_pref_kubectl",
    "User prefers kubectl",
    "When discussing Kubernetes, prefer kubectl commands over helm charts",
    effectiveness=0.9,
)
graph.add_entity("kubectl")
graph.add_entity("preference")
graph.link_entity_to_pattern("kubectl", "pat_user_pref_kubectl", 0.9)
graph.link_entity_to_pattern("preference", "pat_user_pref_kubectl", 0.7)

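print("Pattern created: [PATTERN] pat_user_pref_kubectl")
print("In a full deployment this pattern would be persisted in the cold tier;")
print("in this in-memory workshop it lives in the pattern graph for the current session.")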
print(f"\nPattern graph now has {graph.stats()['patterns']} patterns")

print("\n" + "─" * 70)
print("KEY INSIGHT: Labels Enable Smart Persistence")
print("─" * 70)
print("""
  [USER_QUERY]    → Ephemeral (hot tier only)
  [AGENT_THOUGHT] → Ephemeral (hot tier only)  
  [PATTERN]       → Persistent (stored in cold tier)

This separation allows the system to:
  ✓ Keep conversations fast (hot tier for recent context)
  ✓ Learn permanently (patterns persist in cold tier)
  ✓ Forget appropriately (old queries expire naturally)
""")


---

## Section 10: Complete Pipeline Demo

### The Concept

This section ties everything together into a **complete query processing pipeline**. Watch how a single query flows through all HGM components from input to response decision.

### The End-to-End Flow

```
                           USER QUERY
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  STEP 1: KEYWORD EXTRACTION                                 │
│  ───────────────────────────                                │
│  Input:  "How do I deploy my Python application?"           │
│  Output: ["deploy", "python", "application"]                │
│  Algorithm: Regex tokenization + stopword filtering         │
└────────────────────────────────┬────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────┐
│  STEP 2: PATTERN GRAPH LOOKUP                               │
│  ─────────────────────────────                              │
│  Input:  Keywords from Step 1                               │
│  Output: Pattern matches with relevance scores              │
│  Algorithm: Graph traversal via entity→pattern edges        │
└────────────────────────────────┬────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────┐
│  STEP 3: MEMORY RECALL (Agentic RAG)                        │
│  ────────────────────────────────────                       │
│  Input:  Query embedding + focus entities                   │
│  Output: Relevant memories + tier movements                 │
│  Algorithm: Parallel tier search + temperature scoring      │
│             Automatic promotion/demotion                    │
└────────────────────────────────┬────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────┐
│  STEP 4: PATTERN SCORING                                    │
│  ────────────────────────                                   │
│  Input:  Pattern matches + query embedding                  │
│  Output: Ranked patterns with detailed breakdowns           │
│  Algorithm: Multi-factor weighted scoring                   │
│             (semantic + keyword + topic + structure)        │
└────────────────────────────────┬────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────┐
│  STEP 5: MODE SELECTION                                     │
│  ───────────────────────                                    │
│  Input:  Memories + scored patterns + query                 │
│  Output: Response mode (FAST/AGENT/PATTERN_DIRECT/WORKFLOW) │
│  Algorithm: Priority-based decision tree                    │
└────────────────────────────────┬────────────────────────────┘
                                 │
                                 ▼
                          RESPONSE GENERATION
                     (Mode-specific execution)
```
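
Step 1 names its algorithm as regex tokenization plus stopword filtering. A minimal sketch of that idea follows; the stopword list and the `extract_keywords_sketch` name are assumptions for illustration, and the workshop's own `extract_keywords` method may differ in both.

```python
import re

# Hypothetical stopword list, kept short for the example.
STOPWORDS = {
    "how", "do", "i", "my", "the", "a", "an", "to", "is", "of",
    "and", "what", "about", "me", "can", "you",
}


def extract_keywords_sketch(query: str) -> list[str]:
    """Lowercase, tokenize with a regex, drop stopwords and duplicates."""
    tokens = re.findall(r"[a-z0-9_]+", query.lower())
    seen: set[str] = set()
    keywords: list[str] = []
    for token in tokens:
        if token in STOPWORDS or token in seen:
            continue
        seen.add(token)
        keywords.append(token)  # preserve first-seen order
    return keywords


keywords = extract_keywords_sketch("How do I deploy my Python application?")
print(keywords)  # ['deploy', 'python', 'application']
```

These keywords are what Step 2 uses to enter the pattern graph, so an overly aggressive stopword list would silently remove entry points.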

### Why This Architecture Works

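1. **Parallel Paths**: Pattern lookup (Step 2) and memory recall (Step 3) both depend only on the query and its Step 1 keywords, so they can run concurrently
2. **Progressive Filtering**: Each step reduces the search space
3. **Multi-Signal Decisions**: Mode selection weighs recalled memories, scored patterns, and the query together rather than relying on any single signal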
4. **Adaptive Behavior**: Tier movements improve future queries
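
The first point can be sketched with the standard library's thread pool. The two functions below are stubs standing in for pattern lookup and memory recall, and `run_steps_2_and_3` is a hypothetical name; the workshop's `HGMPipeline` runs these steps one after the other.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup_patterns(keywords: list[str]) -> list[str]:
    # Stub for Step 2: pretend only "deploy" has a linked pattern.
    return [f"pattern:{k}" for k in keywords if k == "deploy"]


def recall_memories(keywords: list[str]) -> list[str]:
    # Stub for Step 3: pretend every keyword recalls one memory.
    return [f"memory:{k}" for k in keywords]


def run_steps_2_and_3(keywords: list[str]) -> tuple[list[str], list[str]]:
    """Submit both lookups at once and wait for both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        patterns_future = pool.submit(lookup_patterns, keywords)
        memories_future = pool.submit(recall_memories, keywords)
        return patterns_future.result(), memories_future.result()


patterns, memories = run_steps_2_and_3(["deploy", "python"])
print(patterns)
print(memories)
```

Since recall also moves memories between tiers, a real concurrent version would need to guard that shared state; the stubs here are read-only, so no locking is shown.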

### What the Code Demonstrates

1. **HGMPipeline** class - Orchestrates all components
2. **process_query()** - Full trace through all steps
3. **Step-by-step output** - See each component's contribution
4. **Two example queries** - Different paths through the system

### Key Observations to Note

- **Same query, different context**: Results change based on agent focus
- **Promotion effects**: Relevant cold memories get promoted
- **Mode varies**: Simple queries → FAST, complex → WORKFLOW
- **Pattern shortcuts**: High-confidence patterns bypass LLM

In [None]:
class HGMPipeline:
    """Complete HGM query processing pipeline."""
    
    def __init__(self):
        self.graph = graph
        self.tiers = tiers
        self.pattern_scorer = pattern_scorer
        self.mode_selector = mode_selector
        self.temp_scorer = scorer
    
    def process_query(
        self,
        query: str,
        agent_context: AgentContext,
    ) -> dict:
        """Process a query through the full pipeline."""
        results = {"query": query, "steps": []}
        
        # Step 1: Extract keywords
        keywords = self.graph.extract_keywords(query)
        results["steps"].append({
            "step": "1. Keyword Extraction",
            "keywords": keywords,
        })
        
        # Step 2: Pattern graph lookup
        pattern_matches = self.graph.find_patterns(keywords, limit=5)
        results["steps"].append({
            "step": "2. Pattern Graph Lookup",
            "matches_found": len(pattern_matches),
            "top_pattern": pattern_matches[0].trigger[:40] if pattern_matches else None,
        })
        
        # Step 3: Memory recall with promotion
        np.random.seed(hash(query) % 2**32)
        query_emb = np.random.randn(384).astype(np.float32)
        query_emb = query_emb / np.linalg.norm(query_emb)
        
        recalled, placements = self.tiers.recall(
            query_embedding=query_emb,
            focus_entities=set(keywords),
            limit=5,
        )
        results["steps"].append({
            "step": "3. Memory Recall",
            "recalled": len(recalled),
            "promotions": sum(1 for p in placements if p.promoted),
            "demotions": sum(1 for p in placements if p.demoted),
        })
        
        # Step 4: Score patterns
        scored_patterns = []
        for pm in pattern_matches:
            pattern_dict = {"id": pm.pattern_id, "trigger": pm.trigger, "strategy": pm.strategy}
            np.random.seed(hash(pm.trigger) % 2**32)
            pattern_emb = np.random.randn(384).astype(np.float32)
            pattern_emb = pattern_emb / np.linalg.norm(pattern_emb)
            
            scored = self.pattern_scorer.score(
                query=query,
                pattern=pattern_dict,
                query_embedding=query_emb,
                pattern_embedding=pattern_emb,
            )
            scored_patterns.append(scored)
        
        results["steps"].append({
            "step": "4. Pattern Scoring",
            "scored": len(scored_patterns),
            "top_score": f"{max(p.relevance for p in scored_patterns):.3f}" if scored_patterns else "N/A",
        })
        
        # Step 5: Mode selection
        decision = self.mode_selector.select(
            query=query,
            memories=recalled,
            patterns=scored_patterns,
            conversation_turns=agent_context.turn_count,
        )
        results["steps"].append({
            "step": "5. Mode Selection",
            "mode": decision.mode.value,
            "reason": decision.reason,
            "confidence": f"{decision.confidence:.2f}",
        })
        
        # Update agent context
        agent_context.update_focus(
            embedding=query_emb,
            entities=set(keywords),
        )
        
        results["decision"] = decision
        return results

pipeline = HGMPipeline()
print("HGMPipeline ready!")

In [None]:
# Run complete pipeline with test queries

test_queries = [
    "How do I deploy my Python application?",
    "Tell me about machine learning training",
]

print("=" * 70)
print("FULL PIPELINE DEMONSTRATION")
print("=" * 70)

for query in test_queries:
    print(f"\n{'='*70}")
    print(f"Query: {query}")
    print("=" * 70)
    
    result = pipeline.process_query(query, researcher)
    
    for step in result["steps"]:
        print(f"\n{step['step']}:")
        for k, v in step.items():
            if k != "step":
                print(f"  {k}: {v}")
    
    print(f"\n>>> FINAL DECISION: {result['decision'].mode.value.upper()}")
    print(f"    Reason: {result['decision'].reason}")
    if result['decision'].direct_response:
        print(f"    Response: {result['decision'].direct_response[:60]}...")

---

## Section 11: Visualizations

### The Concept

Visualizations help us understand the **state of the system** at any point. Since NumPy is the workshop's only external dependency, we use ASCII-based visualizations, which need no plotting library and render in any terminal or notebook.

### What These Visualizations Show

#### 1. Temperature Distribution
See how memories are distributed across temperature zones:
- Which memories are "hot" (actively relevant)?
- Which are "cold" (archived)?
- How does the distribution change after queries?
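
To make the last question concrete, here is a minimal sketch that counts memories per zone before and after a query. The zone cut-offs (`0.7` and `0.4`) and the helper names `zone_of` and `zone_histogram` are assumptions for illustration only; the workshop's `scorer.get_zone()` defines the real boundaries.

```python
from collections import Counter

# Hypothetical zone lower bounds, highest first (not the workshop's real values).
ZONE_THRESHOLDS = [("hot", 0.7), ("warm", 0.4), ("cold", 0.0)]

def zone_of(temperature: float) -> str:
    """Return the first zone whose lower bound the temperature meets."""
    for name, lower_bound in ZONE_THRESHOLDS:
        if temperature >= lower_bound:
            return name
    return "cold"  # Negative or otherwise out-of-range temperatures

def zone_histogram(temperatures: list[float]) -> dict[str, int]:
    """Count how many temperatures fall into each zone."""
    counts = Counter(zone_of(t) for t in temperatures)
    return {name: counts.get(name, 0) for name, _ in ZONE_THRESHOLDS}

# Made-up temperatures before and after a query that warms two memories.
before = [0.82, 0.55, 0.31, 0.12]
after = [0.82, 0.74, 0.45, 0.12]

print(zone_histogram(before))  # {'hot': 1, 'warm': 1, 'cold': 2}
print(zone_histogram(after))   # {'hot': 2, 'warm': 1, 'cold': 1}
```

Comparing the two histograms shows at a glance whether a query pulled memories toward the hot zone.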

#### 2. Tier Utilization
Monitor the three-tier memory system:
- Is the hot tier full? (May need a larger budget)
- Are memories being promoted effectively?
- Is the cold tier growing? (May need an archival policy)
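
As a rough sketch of these checks, the snippet below computes how much of a hot-tier token budget is in use and how memories are split across tiers. The function names, the per-memory token counts, and the budget of 1000 tokens are all hypothetical; they are not taken from `ThreeTierMemory`.

```python
def hot_tier_utilization(token_counts: list[int], token_budget: int) -> float:
    """Fraction of the hot tier's token budget currently in use."""
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    return sum(token_counts) / token_budget

def tier_shares(counts: dict[str, int]) -> dict[str, float]:
    """Share of all memories held by each tier (all zeros if there are none)."""
    total = sum(counts.values()) or 1  # Avoid division by zero when empty
    return {tier: n / total for tier, n in counts.items()}

# Made-up numbers: three hot memories against a 1000-token budget.
utilization = hot_tier_utilization([120, 300, 450], token_budget=1000)
print(f"Hot tier budget used: {utilization:.0%}")  # Hot tier budget used: 87%

shares = tier_shares({"hot": 3, "warm": 5, "cold": 12})
print(shares)
```

A utilization that stays close to 100% suggests the budget is too small for the workload, while a steadily rising cold share suggests an archival policy is worth adding.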

#### 3. Pattern Graph Structure
Understand the knowledge graph:
- How many entities and patterns exist?
- How connected is the graph?
- Which entities link to which patterns?
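
One way to quantify "how connected" is sketched below. It assumes the entity index is a plain dict that maps each entity to a list of `(pattern_id, strength)` pairs; the `connectivity` helper and the sample data are illustrative and not part of `PatternGraph`.

```python
def connectivity(entity_to_patterns: dict[str, list[tuple[str, float]]]) -> dict[str, float]:
    """Summarize a bipartite entity -> pattern index."""
    entities = len(entity_to_patterns)
    # Distinct pattern ids reachable from any entity.
    patterns = len({pid for links in entity_to_patterns.values() for pid, _ in links})
    edges = sum(len(links) for links in entity_to_patterns.values())
    max_edges = entities * patterns  # Every entity linked to every pattern
    return {
        "entities": entities,
        "patterns": patterns,
        "edges": edges,
        "avg_patterns_per_entity": edges / entities if entities else 0.0,
        "density": edges / max_edges if max_edges else 0.0,
    }

# Made-up index with three entities and two patterns.
index = {
    "deploy": [("pat_deploy", 0.9), ("pat_docker", 0.6)],
    "python": [("pat_deploy", 0.7)],
    "docker": [("pat_docker", 0.8)],
}

stats = connectivity(index)
print(f"edges={stats['edges']} density={stats['density']:.2f}")  # edges=4 density=0.67
```

A density near 0 means most entities point to a single pattern, so keyword lookups return few candidates; a density near 1 means keywords barely discriminate between patterns.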

### Why Visualizations Matter

In production systems, these metrics help with:
- **Capacity planning**: Hot tier token budget sizing
- **Performance tuning**: Temperature threshold adjustment
- **Debugging**: Why did a memory not get retrieved?
- **Monitoring**: System health dashboards

In [None]:
# ASCII visualizations (no matplotlib needed)

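def ascii_bar(value: float, width: int = 30, char: str = "#") -> str:
    """Create an ASCII progress bar for a value expected in [0, 1]."""
    # Clamp so an out-of-range value cannot overflow or underflow the bar.
    filled = int(max(0.0, min(1.0, value)) * width)
    return f"[{char * filled}{' ' * (width - filled)}] {value:.1%}"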

def visualize_temperatures(memories: list[Memory]):
    """Visualize temperature distribution."""
    print("\nTemperature Distribution:")
    print("-" * 60)
    
    for m in sorted(memories, key=lambda x: x.temperature, reverse=True):
        zone = scorer.get_zone(m.temperature)
        bar = ascii_bar(m.temperature, 20)
        print(f"  {zone.value:8} {bar} {m.content[:25]}...")

def visualize_tiers(tier_system: ThreeTierMemory):
    """Visualize tier utilization."""
    stats = tier_system.stats()
    total = stats["total"] or 1
    
    print("\nTier Utilization:")
    print("-" * 60)
    
    for tier_name in ["hot", "warm", "cold"]:
        count = stats[tier_name]
        pct = count / total
        bar = ascii_bar(pct, 20)
        print(f"  {tier_name.upper():5} {bar} ({count} memories)")

def visualize_pattern_graph(g: PatternGraph):
    """Visualize pattern graph structure."""
    stats = g.stats()
    
    print("\nPattern Graph Structure:")
    print("-" * 60)
    print(f"  Entities: {stats['entities']}")
    print(f"  Patterns: {stats['patterns']}")
    print(f"  Edges:    {stats['edges']}")
    print(f"  Queries:  {stats['queries']}")
    
    print("\n  Sample Entity -> Pattern Links:")
    for entity, links in list(g._entity_to_patterns.items())[:5]:
        pattern_ids = [pid for pid, _ in links]
        print(f"    {entity} -> {pattern_ids}")

# Run visualizations
visualize_temperatures(sample_memories)
visualize_tiers(tiers)
visualize_pattern_graph(graph)

---

## Section 12: Summary & Exercises

### Workshop Recap

You've learned the core concepts that power intelligent AI agent memory systems:

| Concept | What You Learned | Key Algorithm |
|---------|------------------|---------------|
| **Memory Types** | SEMANTIC, EPISODIC, PROCEDURAL, EMOTIONAL classification | Cognitive science-based categorization |
| **Agent State** | Focus tracking, episodes, sessions | Context engineering patterns |
| **Temperature** | 5-factor scoring for relevance | Weighted linear combination with exponential decay |
| **Hot Tier** | Working memory with token-based eviction | LRU eviction + vectorized similarity |
| **Agentic RAG** | Active recall with promotion/demotion | Three-tier architecture with temperature thresholds |
| **Pattern Graph** | Entity-pattern knowledge structure | Graph traversal with strength-weighted edges |
| **Pattern Scoring** | Multi-factor relevance computation | Cosine similarity + Jaccard + structure matching |
| **Mode Selection** | Response strategy routing | Priority-based decision tree |

### Key Formulas

**Temperature Scoring:**
```
temperature = 0.30 × recency + 0.15 × frequency + 0.35 × relevance + 0.15 × entity_overlap + 0.05 × agent_match

where:
  recency = 0.5^(hours_ago / 24)  // Exponential decay
  frequency = access_count / max_count
  relevance = (cosine_similarity + 1) / 2  // Normalized to [0,1]
  entity_overlap = |memory_entities ∩ focus_entities| / |focus_entities|
```
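
The formula can be checked by hand with a small worked example. The sketch below is a direct transcription of the weights and factor definitions above, not the workshop's `TemperatureScorer`. Because the formula does not define `agent_match`, the sketch takes it as an input in [0, 1], which is an assumption, and all input values are made up.

```python
WEIGHTS = {
    "recency": 0.30,
    "frequency": 0.15,
    "relevance": 0.35,
    "entity_overlap": 0.15,
    "agent_match": 0.05,
}

def temperature_breakdown(hours_ago, access_count, max_count, cosine_similarity,
                          memory_entities, focus_entities, agent_match):
    """Return (temperature, per-factor values) for one memory."""
    factors = {
        "recency": 0.5 ** (hours_ago / 24),           # Halves every 24 hours
        "frequency": access_count / max_count if max_count else 0.0,
        "relevance": (cosine_similarity + 1) / 2,     # Map [-1, 1] to [0, 1]
        "entity_overlap": (len(memory_entities & focus_entities) / len(focus_entities)
                           if focus_entities else 0.0),
        "agent_match": agent_match,                   # Assumed to be given in [0, 1]
    }
    total = sum(WEIGHTS[name] * value for name, value in factors.items())
    return total, factors

temp, factors = temperature_breakdown(
    hours_ago=24, access_count=3, max_count=10, cosine_similarity=0.6,
    memory_entities={"error", "connection", "retry", "backoff"},
    focus_entities={"error", "debugging", "fix", "handling"},
    agent_match=1.0,
)
# 0.30*0.5 + 0.15*0.3 + 0.35*0.8 + 0.15*0.25 + 0.05*1.0 = 0.5625
print(f"{temp:.2f}")  # 0.56
```

Returning the per-factor values alongside the total makes it easier to see which factor kept a memory from being retrieved.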

**Pattern Relevance Scoring:**
```
relevance = 0.40 × semantic + 0.15 × keyword + 0.20 × topic + 0.10 × structure + 0.15 × boost

where:
  semantic = cosine_similarity(query_emb, pattern_emb)
  keyword = jaccard(query_keywords, pattern_keywords)
  topic = min(1.0, |long_word_matches| × 0.5)
  structure = 1.0 if same_question_pattern else 0.3 if both_have_patterns else 0.0
```
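
The same kind of worked example applies to pattern relevance. This sketch transcribes the weights and factor definitions above and is not the workshop's scorer. The formula does not define `boost` or how long-word matches are counted, so the sketch takes both as inputs; those inputs, the helper names, and the `"how_do_i"` question-pattern label are assumptions.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def structure_score(query_pattern, stored_pattern) -> float:
    """1.0 for the same question pattern, 0.3 if both have one, else 0.0."""
    if query_pattern and stored_pattern:
        return 1.0 if query_pattern == stored_pattern else 0.3
    return 0.0

def pattern_relevance(query_emb, pattern_emb, query_keywords, pattern_keywords,
                      long_word_matches, query_pattern, stored_pattern, boost):
    semantic = cosine(query_emb, pattern_emb)
    keyword = jaccard(query_keywords, pattern_keywords)
    topic = min(1.0, long_word_matches * 0.5)
    structure = structure_score(query_pattern, stored_pattern)
    return (0.40 * semantic + 0.15 * keyword + 0.20 * topic
            + 0.10 * structure + 0.15 * boost)

score = pattern_relevance(
    query_emb=np.array([1.0, 0.0]),
    pattern_emb=np.array([0.6, 0.8]),          # Cosine similarity 0.6
    query_keywords={"deploy", "python", "app"},
    pattern_keywords={"deploy", "python", "docker"},  # Jaccard 2/4 = 0.5
    long_word_matches=1,                        # topic = 0.5
    query_pattern="how_do_i",
    stored_pattern="how_do_i",                  # structure = 1.0
    boost=0.0,
)
# 0.40*0.6 + 0.15*0.5 + 0.20*0.5 + 0.10*1.0 + 0.15*0.0 = 0.515
print(f"{score:.3f}")  # 0.515
```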

### Architecture Benefits

| Benefit | How HGM Achieves It |
|---------|---------------------|
| **Speed** | Hot tier with SIMD-accelerated similarity (<1ms) |
| **Intelligence** | Pattern graph enables learned response strategies |
| **Adaptability** | Temperature-based self-organization |
| **Scalability** | Three tiers handle different access patterns |
| **Personalization** | Per-agent context and focus tracking |
| **Efficiency** | Mode selection minimizes unnecessary LLM calls |

### Comparison: HGM vs Traditional RAG

| Aspect | Traditional RAG | HGM |
|--------|-----------------|-----|
| Retrieval | Static vector search | Dynamic three-tier search |
| Organization | Fixed indexes | Self-organizing via temperature |
| Learning | None | Pattern graph grows from interactions |
| Routing | Always LLM | Mode selection (sometimes no LLM needed) |
| Context | Global | Per-agent focus tracking |
| Latency | Consistent | Tiered (hot=fast, cold=slower) |

### Further Reading
- **Cognitive Science**: Tulving's memory taxonomy, working memory models
- **Vector Search**: FAISS, Qdrant, pgvector for production embeddings
- **Graph Databases**: Neo4j, Memgraph for scaled pattern graphs

In [None]:
# ============================================================================
# HANDS-ON EXERCISES
# ============================================================================
# These exercises help you internalize the concepts by modifying and
# experimenting with the code. Each exercise is printed as commented-out code:
# copy it into a new cell, uncomment it, and run it.

print("""
╔══════════════════════════════════════════════════════════════════════════════╗
║                           WORKSHOP EXERCISES                                  ║
╚══════════════════════════════════════════════════════════════════════════════╝

EXERCISE 1: MEMORY TYPES AND TEMPERATURE
─────────────────────────────────────────
Goal: Understand how memory type affects retrieval in different contexts.

Task: Create a PROCEDURAL memory about error handling, then observe how its
temperature changes when you query with different focus entities.

Expected Observation: The memory should get a higher temperature when the focus
includes entities it is tagged with, such as "error".

# Uncomment and run:
# error_memory = create_memory(
#     content="When encountering a connection error, first check network status, "
#             "then verify credentials, finally retry with exponential backoff.",
#     memory_type=MemoryType.PROCEDURAL,
#     hierarchy_path="ops/error_handling",
#     entity_ids=["error", "connection", "retry", "backoff"],
#     hours_ago=12,
#     access_count=3,
# )
#
# import zlib
#
# # Score with error-related focus. zlib.crc32 gives the same seed on every
# # run; the built-in hash() of a str changes between Python processes.
# np.random.seed(zlib.crc32(b"error handling debugging"))
# error_focus = np.random.randn(384).astype(np.float32)
# error_focus = error_focus / np.linalg.norm(error_focus)
# error_entities = {"error", "debugging", "fix", "handling"}
#
# temp_error = scorer.compute(error_memory, error_focus, error_entities)
# print(f"Temperature with error focus: {temp_error:.3f}")
#
# # Score with unrelated focus (cooking)
# np.random.seed(zlib.crc32(b"cooking recipes food"))
# cooking_focus = np.random.randn(384).astype(np.float32)
# cooking_focus = cooking_focus / np.linalg.norm(cooking_focus)
# cooking_entities = {"cooking", "recipe", "food"}
#
# temp_cooking = scorer.compute(error_memory, cooking_focus, cooking_entities)
# print(f"Temperature with cooking focus: {temp_cooking:.3f}")
# print(f"Difference: {temp_error - temp_cooking:.3f}")


EXERCISE 2: EPISODE BOUNDARIES AND TOPIC DRIFT
──────────────────────────────────────────────
Goal: Understand how episodes help track topic changes in conversations.

Task: Simulate a conversation that starts with ML, then switches to cooking.
Measure the entity drift between episodes.

Expected Observation: A large entity drift indicates a topic change, which should
trigger a new episode. The two entity sets here share no entities, so the drift
should be 1.0.

# Uncomment and run:
# def entity_drift(entities_a: set, entities_b: set) -> float:
#     '''Measure how different two entity sets are (0 = same, 1 = completely different).'''
#     if not entities_a and not entities_b:
#         return 0.0
#     intersection = len(entities_a & entities_b)
#     union = len(entities_a | entities_b)
#     return 1.0 - (intersection / union) if union > 0 else 0.0
#
# # Episode 1: Machine Learning discussion
# ml_agent = create_agent("research-agent", "researcher")
# ml_agent.update_focus(
#     entities={"neural_network", "training", "gradient_descent", "pytorch"},
#     hierarchy_path="research/ml/deep_learning",
# )
# ml_agent.start_episode("Discussing neural network training")
# ml_entities = ml_agent.focus_entities.copy()
#
# # Episode 2: Cooking discussion
# cooking_entities = {"recipe", "ingredients", "cooking", "temperature"}
#
# # Measure drift
# drift = entity_drift(ml_entities, cooking_entities)
# print(f"Entity drift: {drift:.3f}")
# print(f"Should trigger new episode: {drift > 0.7}")  # Threshold for new episode


EXERCISE 3: PATTERN GRAPH EXTENSION
───────────────────────────────────
Goal: Learn how to add domain-specific patterns to the graph.

Task: Add patterns for a domain you're interested in (e.g., DevOps, data science,
cooking). Test that they can be retrieved via keyword matching.

# Uncomment and run:
# # Add your own pattern
# graph.add_pattern(
#     "pat_kubernetes",
#     "How to deploy to Kubernetes?",
#     "Use kubectl apply -f deployment.yaml or helm install for complex deployments.",
#     effectiveness=0.85,
# )
#
# # Link keywords
# for keyword in ["kubernetes", "k8s", "deploy", "kubectl", "helm", "pod"]:
#     graph.add_entity(keyword)
#     graph.link_entity_to_pattern(keyword, "pat_kubernetes", strength=0.85)
#
# # Test retrieval
# test_query = "How do I deploy my app to k8s?"
# keywords = graph.extract_keywords(test_query)
# matches = graph.find_patterns(keywords, limit=3)
# print(f"Query: {test_query}")
# print(f"Keywords: {keywords}")
# for m in matches:
#     print(f"  [{m.relevance:.3f}] {m.trigger}")


EXERCISE 4: TEMPERATURE WEIGHT TUNING
────────────────────────────────────
Goal: Understand how weight changes affect temperature computation.

Task: Experiment with different TemperatureConfig weights. What happens when
you increase weight_recency vs weight_relevance? Keep the five weights summing
to 1.0, as the configs below do, so the scores stay comparable.

# Uncomment and run:
# # Default weights
# default_config = TemperatureConfig()
# default_scorer = TemperatureScorer(default_config)
#
# # Recency-heavy config (prioritizes recent memories)
# recency_config = TemperatureConfig(
#     weight_recency=0.50,  # Increased from 0.30
#     weight_frequency=0.10,
#     weight_relevance=0.25,  # Decreased from 0.35
#     weight_entity=0.10,
#     weight_agent=0.05,
# )
# recency_scorer = TemperatureScorer(recency_config)
#
# # Relevance-heavy config (prioritizes semantic match)
# relevance_config = TemperatureConfig(
#     weight_recency=0.15,  # Decreased
#     weight_frequency=0.10,
#     weight_relevance=0.55,  # Increased significantly
#     weight_entity=0.15,
#     weight_agent=0.05,
# )
# relevance_scorer = TemperatureScorer(relevance_config)
#
# # Compare on same memory
# test_memory = sample_memories[2]  # The old deployment memory
# print(f"Memory: {test_memory.content[:50]}...")
# print(f"Age: {(utcnow().timestamp() - test_memory.accessed_at) / 3600:.1f} hours")
# print()
# print(f"Default scorer:   {default_scorer.compute(test_memory, python_focus, python_entities):.3f}")
# print(f"Recency-heavy:    {recency_scorer.compute(test_memory, python_focus, python_entities):.3f}")
# print(f"Relevance-heavy:  {relevance_scorer.compute(test_memory, python_focus, python_entities):.3f}")


EXERCISE 5: MODE SELECTOR ENHANCEMENT
────────────────────────────────────
Goal: Understand mode selection by adding a new mode.

Task: Add a CLARIFICATION mode that triggers when the query is ambiguous
(e.g., very short, contains "?" but no clear topic).

# Uncomment and run:
# class EnhancedResponseMode(str, Enum):
#     FAST = "fast"
#     AGENT = "agent"
#     PATTERN_DIRECT = "pattern"
#     WORKFLOW = "workflow"
#     CLARIFICATION = "clarification"  # NEW!
#
# class EnhancedModeSelector(ModeSelector):
#     def select(self, query, memories=None, patterns=None, conversation_turns=0):
#         # Check for ambiguous queries first
#         if self._is_ambiguous(query):
#             return ModeDecision(
#                 mode=EnhancedResponseMode.CLARIFICATION,
#                 reason="ambiguous_query",
#                 confidence=0.7,
#                 has_memories=bool(memories),
#                 has_patterns=bool(patterns),
#             )
#         # Fall back to parent logic
#         return super().select(query, memories, patterns, conversation_turns)
#     
#     def _is_ambiguous(self, query: str) -> bool:
#         '''Detect ambiguous queries.'''
#         # Very short queries
#         if len(query.split()) <= 2:
#             return True
#         # Questions with no clear topic keywords
#         keywords = graph.extract_keywords(query)
#         if len(keywords) == 0:
#             return True
#         return False
#
# enhanced_selector = EnhancedModeSelector()
#
# # Use a new name so the pipeline demo's test_queries list is not overwritten.
# clarification_queries = [
#     "Help?",  # Should trigger CLARIFICATION
#     "What?",  # Should trigger CLARIFICATION
#     "How do I deploy my Python app?",  # Should NOT trigger CLARIFICATION
# ]
#
# for q in clarification_queries:
#     decision = enhanced_selector.select(q, sample_memories[:1], [])
#     print(f"'{q}' -> {decision.mode.value} ({decision.reason})")

═══════════════════════════════════════════════════════════════════════════════
Copy the exercises above into new cells one at a time, uncomment them, and run them to practice!
═══════════════════════════════════════════════════════════════════════════════
""")