# Module 3: Building AI Agents - Foundations Lab

## 🎯 Learning Objectives
By the end of this lab, you will:
1. **Understand agent orchestration** - How agents coordinate tools, memory, and decision-making
2. **Implement practical tools** - Content search and follow-up question generation
3. **Integrate episodic memory** - How agents remember and use conversation history
4. **Experience the difference** between single-step LLM applications and agentic applications

## 🏗️ What We're Building
A **Personal Assistant ChatBot Agent** for our course website that can:
- Search through course content to answer questions
- Generate thoughtful follow-up questions to enhance learning
- Remember conversation history for better context
- Make intelligent decisions about when to use which capabilities

## ⏱️ Lab Timeline (60 minutes)
- **Section 1**: Setup & Data Loading (10 min)
- **Section 2**: Tool Implementation (20 min) 
- **Section 3**: Agent Orchestration (15 min)
- **Section 4**: Memory Integration (15 min)

---

# Section 1: Setup & Data Loading (10 minutes)

First, let's set up our environment and understand how our course content was prepared for the agent.

In [None]:
# Install required packages (run this cell first)
!pip install boto3 faiss-cpu numpy pandas beautifulsoup4 html2text --quiet

print("✅ Packages installed successfully!")

In [1]:
# Import all required libraries
import json
import boto3
import numpy as np
import faiss
import pandas as pd
from datetime import datetime
from typing import List, Dict, Any, Optional
from dataclasses import dataclass
import os

print("📚 Libraries imported successfully!")

📚 Libraries imported successfully!


## 🔧 Configuration

Set up AWS Bedrock connection and file paths:

In [2]:
# Configuration
AWS_REGION = "us-west-2"  # Change if you prefer a different region
EMBEDDINGS_FILE = "../embeddings/course_embeddings.json"
EMBEDDING_MODEL = "amazon.titan-embed-text-v2:0"
LLM_MODEL = "anthropic.claude-3-5-sonnet-20241022-v2:0"

print(f"🌎 AWS Region: {AWS_REGION}")
print(f"📁 Embeddings file: {EMBEDDINGS_FILE}")
print(f"🧠 LLM Model: {LLM_MODEL}")

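# Initialize AWS Bedrock client
# Note: creating the client makes no network call, so missing or invalid
# credentials surface on the first invoke_model request, not here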
try:
    bedrock_client = boto3.client("bedrock-runtime", region_name=AWS_REGION)
    print("✅ Connected to AWS Bedrock successfully!")
except Exception as e:
    print(f"❌ Failed to connect to AWS Bedrock: {e}")
    print("Please ensure your AWS credentials are configured correctly")

🌎 AWS Region: us-west-2
📁 Embeddings file: ../embeddings/course_embeddings.json
🧠 LLM Model: anthropic.claude-3-5-sonnet-20241022-v2:0
✅ Connected to AWS Bedrock successfully!


## 📊 Understanding Our Course Content Data

Before we build our agent, let's understand how our course content was prepared. The embeddings were created using the script at `../course_embeddings_generator.py`.

**How the embeddings were generated:**
1. **HTML Extraction**: Converted HTML pages to clean text using `html2text`
2. **Semantic Chunking**: Split content by `<section>` elements for logical units
3. **Vectorization**: Created embeddings using AWS Bedrock Titan Embeddings
4. **Storage**: Saved as structured JSON with metadata
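
As a rough sketch of the stored layout, inferred only from the fields this notebook reads (the generator script may store more), a one-chunk file could be assembled like this. The `build_embeddings_file` helper and the toy 4-dimensional vector are hypothetical; the real embeddings have 1024 dimensions.

```python
import json
from datetime import datetime
from typing import Any, Dict, List


def build_embeddings_file(chunks: List[Dict[str, Any]], files: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: wrap chunk records in a metadata + chunks layout."""
    return {
        "metadata": {
            "processed_files": files,
            "chunk_count": len(chunks),
            "total_words": sum(c["word_count"] for c in chunks),
            "embedding_dimension": len(chunks[0]["embedding"]) if chunks else 0,
            "created_at": datetime.now().isoformat(),
        },
        "chunks": chunks,
    }


text = "Large Language Models are trained on vast amounts of text data."
sample_chunk = {
    "title": "Understanding Large Language Models",
    "source": "llm.html",
    "content": text,
    "word_count": len(text.split()),
    "embedding": [0.1, 0.2, 0.3, 0.4],  # toy vector, not a real Titan embedding
}

data = build_embeddings_file([sample_chunk], ["llm.html"])
roundtrip = json.loads(json.dumps(data))  # confirm the structure is JSON-serializable
print(roundtrip["metadata"]["chunk_count"], roundtrip["metadata"]["embedding_dimension"])
```

Keeping the text, its source, and its vector in one record means a search hit can be traced back to a page without a second lookup.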

Let's load and explore this data:

In [3]:
# Load the pre-generated embeddings
def load_course_embeddings(file_path: str) -> Dict[str, Any]:
    """Load embeddings and metadata from JSON file"""
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
        print(f"✅ Loaded embeddings from {file_path}")
        return data
    except FileNotFoundError:
        print(f"❌ Embeddings file not found: {file_path}")
        print("Please ensure you've run the embedding generation script first")
        return {}
    except Exception as e:
        print(f"❌ Error loading embeddings: {e}")
        return {}

# Load the data
embeddings_data = load_course_embeddings(EMBEDDINGS_FILE)

if embeddings_data:
    metadata = embeddings_data['metadata']
    chunks = embeddings_data['chunks']
    
    print(f"\n📈 Content Statistics:")
    print(f"   📄 Files processed: {len(metadata['processed_files'])}")
    print(f"   📝 Content chunks: {metadata['chunk_count']}")
    print(f"   📊 Total words: {metadata['total_words']:,}")
    print(f"   🧠 Embedding dimension: {metadata['embedding_dimension']}")
    print(f"   🕐 Created: {metadata['created_at'][:19]}")
    
    print(f"\n📚 Sample content chunks:")
    for i, chunk in enumerate(chunks[:3]):
        print(f"   {i+1}. {chunk['title']} ({chunk['word_count']} words) - {chunk['source']}")

✅ Loaded embeddings from ../embeddings/course_embeddings.json

📈 Content Statistics:
   📄 Files processed: 6
   📝 Content chunks: 43
   📊 Total words: 18,800
   🧠 Embedding dimension: 1024
   🕐 Created: 2025-05-25T07:43:34

📚 Sample content chunks:
   1. Welcome to AI Foundations (38 words) - index.html
   2. Understanding Generative AI (358 words) - index.html
   3. The Evolution of Artificial Intelligence (277 words) - index.html


## 🔍 Creating the Search Index

Now let's create a FAISS (Facebook AI Similarity Search) index from our embeddings. This will enable fast semantic search over our course content.
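
The index in this section relies on one identity: the inner product of two unit-length vectors equals their cosine similarity. The pure-Python sketch below checks that on two toy vectors. The helper names are illustrative, and FAISS performs the same normalization in bulk on NumPy arrays.

```python
import math
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity computed directly from its definition."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


doc = [3.0, 4.0, 0.0]
query = [1.0, 2.0, 2.0]

ip_of_normalized = dot(l2_normalize(doc), l2_normalize(query))
print(f"inner product of unit vectors: {ip_of_normalized:.6f}")
print(f"cosine similarity:             {cosine(doc, query):.6f}")
```

Both lines print the same value, which is why an inner-product index over normalized vectors ranks results by cosine similarity.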

In [4]:
def create_search_index(embeddings_data: Dict[str, Any]) -> tuple:
    """
    Create FAISS index for fast similarity search
    
    Returns:
        tuple: (faiss_index, chunks_list)
    """
    if not embeddings_data:
        return None, []
    
    chunks = embeddings_data['chunks']
    
    # Extract embeddings as numpy array
    embeddings_matrix = np.array([chunk['embedding'] for chunk in chunks], dtype=np.float32)
    
    # Create FAISS index (using Inner Product for cosine similarity)
    dimension = embeddings_matrix.shape[1]
    index = faiss.IndexFlatIP(dimension)  # Inner Product index
    
    # Normalize embeddings for cosine similarity
    faiss.normalize_L2(embeddings_matrix)
    
    # Add embeddings to index
    index.add(embeddings_matrix)
    
    print(f"✅ Created FAISS index with {index.ntotal} vectors")
    print(f"📐 Vector dimension: {dimension}")
    
    return index, chunks

# Create the search index
search_index, content_chunks = create_search_index(embeddings_data)

if search_index:
    print(f"\n🎯 Search index ready! We can now find relevant content for any query.")
else:
    print("❌ Failed to create search index")

✅ Created FAISS index with 43 vectors
📐 Vector dimension: 1024

🎯 Search index ready! We can now find relevant content for any query.


## 🧪 Quick Search Test

Let's test our search index with a simple query to make sure everything works:

In [5]:
def quick_search_test(query: str, top_k: int = 2):
    """Test the search functionality with a sample query"""
    
    if not search_index:
        print("❌ Search index not available")
        return
    
    # Create query embedding
    try:
        response = bedrock_client.invoke_model(
            modelId=EMBEDDING_MODEL,
            body=json.dumps({"inputText": query})
        )
        query_embedding = json.loads(response['body'].read())['embedding']
        
        # Convert to numpy and normalize
        query_vector = np.array([query_embedding], dtype=np.float32)
        faiss.normalize_L2(query_vector)
        
        # Search
        scores, indices = search_index.search(query_vector, top_k)
        
        print(f"🔍 Search results for: '{query}'")
        print("-" * 50)
        
        for i, (score, idx) in enumerate(zip(scores[0], indices[0])):
            if idx < 0:  # FAISS pads with -1 when fewer than top_k matches exist
                continue
            chunk = content_chunks[idx]
            print(f"\n{i+1}. **{chunk['title']}** (Score: {score:.3f})")
            print(f"   Source: {chunk['source']}")
            print(f"   Preview: {chunk['content'][:150]}...")
        
    except Exception as e:
        print(f"❌ Search test failed: {e}")

# Test with a course-related query
quick_search_test("What are the key characteristics of large language models?")

🔍 Search results for: 'What are the key characteristics of large language models?'
--------------------------------------------------

1. **Module 1: Understanding Large Language Models** (Score: 0.685)
   Source: llm.html
   Preview: ## Module 1: Understanding Large Language Models

Large Language Models (LLMs) are sophisticated AI systems, trained on vast amounts of text data, tha...

2. **6. LLM Evolution & Architectural Advances** (Score: 0.561)
   Source: llm.html
   Preview: ## 6\. LLM Evolution & Architectural Advances

#### Early LLM Development (2017-2022)

The modern Large Language Model era began with the 2017 paper "...


**🎉 Section 1 Complete!**

You now have:
- ✅ Course content loaded and indexed for search
- ✅ Understanding of how embeddings enable semantic search  
- ✅ A working search system ready for agent integration

---

# Section 2: Tool Implementation (20 minutes)

Now let's build the two tools our agent will use. Remember: **tools are how agents interact with the world beyond just generating text.**

## 🛠️ Tool Design Principles

Good agent tools should:
1. **Do one job well** - Clear, focused purpose
2. **Have clean interfaces** - Easy for agents to understand and use
3. **Provide useful outputs** - Structured, informative results
4. **Handle errors gracefully** - Helpful error messages
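
A minimal sketch of these four principles applied to a toy tool. `word_count_tool` is hypothetical and not part of this lab; it only illustrates the shape: one job, a typed string-in/string-out interface, a structured result, and an error message instead of an exception.

```python
def word_count_tool(text: str) -> str:
    """Hypothetical tool: count the words in a piece of text."""
    # Handle errors gracefully: return a readable message instead of raising
    if not isinstance(text, str) or not text.strip():
        return "Error: Text cannot be empty"

    # Do one job well, and report it in a predictable format
    count = len(text.split())
    return f"Word count: {count}"


print(word_count_tool("agents coordinate tools and memory"))
print(word_count_tool("   "))
```

Returning errors as strings keeps the agent loop simple, since every tool result can be passed to the LLM or the user without special handling.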

## 🔍 Tool 1: Content Search

This tool searches our course content using semantic similarity. It's a **retrieval tool** - it finds and returns existing information.
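
Before a retrieval tool hands results to the agent, it usually drops weak matches. The sketch below shows that filtering step on hard-coded scores. `filter_hits` and the sample titles are illustrative, and the 0.3 default mirrors the cutoff used in this lab. It also skips index `-1`, which FAISS returns as padding when fewer than `k` neighbors exist.

```python
from typing import List, Sequence, Tuple


def filter_hits(
    scores: Sequence[float],
    indices: Sequence[int],
    titles: Sequence[str],
    min_score: float = 0.3,
) -> List[Tuple[str, float]]:
    """Keep (title, score) pairs that are valid and above the relevance cutoff."""
    hits = []
    for score, idx in zip(scores, indices):
        if idx < 0:  # padding entry, no real neighbor
            continue
        if score > min_score:
            hits.append((titles[idx], score))
    return hits


titles = ["Prompt Engineering Overview", "Prompt Engineering Techniques", "Course Logistics"]
hits = filter_hits([0.62, 0.55, 0.12, 0.0], [0, 1, 2, -1], titles)
print(hits)
```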

In [6]:
@dataclass
class SearchResult:
    """Clean data structure for search results"""
    content: str
    title: str
    source: str
    relevance_score: float

def search_content_tool(query: str, max_results: int = 3) -> str:
    """
    Search course content using semantic similarity
    
    Args:
        query: The search query
        max_results: Maximum number of results to return
        
    Returns:
        Formatted search results as a string
    """
    
    if not search_index or not content_chunks:
        return "Error: Search index not available"
    
    if not query.strip():
        return "Error: Search query cannot be empty"
    
    try:
        # Create query embedding
        response = bedrock_client.invoke_model(
            modelId=EMBEDDING_MODEL,
            body=json.dumps({"inputText": query})
        )
        query_embedding = json.loads(response['body'].read())['embedding']
        
        # Convert to numpy and normalize for cosine similarity
        query_vector = np.array([query_embedding], dtype=np.float32)
        faiss.normalize_L2(query_vector)
        
        # Search for similar content
        scores, indices = search_index.search(query_vector, max_results)
        
        # Format results
        results = []
        for score, idx in zip(scores[0], indices[0]):
            if score > 0.3:  # Only include reasonably relevant results
                chunk = content_chunks[idx]
                results.append(SearchResult(
                    content=(chunk['content'][:500] + "...") if len(chunk['content']) > 500 else chunk['content'],
                    title=chunk['title'],
                    source=chunk['source'],
                    relevance_score=float(score)
                ))
        
        if not results:
            return f"No relevant content found for query: '{query}'"
        
        # Format output for the agent
        output = f"Found {len(results)} relevant content sections for '{query}':\n\n"
        
        for i, result in enumerate(results, 1):
            output += f"{i}. **{result.title}** (Relevance: {result.relevance_score:.3f})\n"
            output += f"   Source: {result.source}\n"
            output += f"   Content: {result.content}\n\n"
        
        return output
        
    except Exception as e:
        return f"Error searching content: {str(e)}"

print("✅ Content search tool implemented!")

✅ Content search tool implemented!


### 🧪 Test Tool 1: Content Search

In [7]:
# Test the content search tool
print("🔍 Testing Content Search Tool\n")

test_query = "What is prompt engineering?"
search_result = search_content_tool(test_query, max_results=2)

print(f"Query: {test_query}")
print("=" * 50)
print(search_result)

🔍 Testing Content Search Tool

Query: What is prompt engineering?
==================================================
Found 2 relevant content sections for 'What is prompt engineering?':

1. **1. Prompt Engineering Overview** (Relevance: 0.620)
   Source: prompts.html
   Content: ## 1\. Prompt Engineering Overview

Input (Prompt) AI Model (Processing) Output (Response)

### 1.1 What are Prompts?

A **prompt** is the input you provide to an AI system to elicit a specific output. Think of it as the interface between human intent and AI capability—they're how we communicate what we want the model to do.

In technical terms, a **prompt is a sequence of tokens (words, characters, or subwords) that provides context and instructions** to a language model.

**Simple Prompt:** "W...

2. **3. Prompt Engineering Techniques** (Relevance: 0.549)
   Source: prompts.html
   Content: ## 3\. Prompt Engineering Techniques

Beyond fundamental principles, prompt engineering includes specialized techniques that can significantly enhance model performance fo

## 🤔 Tool 2: Follow-up Question Generator

This tool generates relevant follow-up questions to enhance learning. It's a **generative tool** - it creates new content using an LLM. This demonstrates the **"LLM-as-tool"** pattern.
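
One way to picture the LLM-as-tool pattern is to treat the model as any `prompt -> text` function wrapped behind a tool interface. The sketch below uses a stub in place of a real model call so it runs offline. `make_followup_tool` and `fake_llm` are hypothetical names, not part of this lab.

```python
from typing import Callable


def make_followup_tool(llm: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap any prompt -> text function as a tool with validation and error handling."""

    def followup_tool(topic: str) -> str:
        if not topic.strip():
            return "Error: Topic cannot be empty"
        prompt = f"Generate 3 follow-up questions about: {topic}"
        try:
            return llm(prompt).strip()
        except Exception as e:  # surface model failures as a readable message
            return f"Error generating follow-up questions: {e}"

    return followup_tool


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; simply echoes the prompt."""
    return f"[model output for: {prompt}]"


tool = make_followup_tool(fake_llm)
print(tool("agent memory"))
```

Injecting the model as a parameter also makes the tool easy to test, because the stub can be swapped for the real Bedrock call later.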

In [20]:
def generate_followup_questions_tool(current_topic: str, conversation_context: str = "") -> str:
    """
    Generate relevant follow-up questions to enhance learning
    
    Args:
        current_topic: The topic being discussed
        conversation_context: Recent conversation for context
        
    Returns:
        Formatted follow-up questions
    """
    
    if not current_topic.strip():
        return "Error: Topic cannot be empty for question generation"
    
    try:
        # Create prompt for follow-up question generation
        prompt = f"""You are an educational assistant helping students explore Generative AI concepts more deeply.

Based on the current topic '{current_topic}' and this conversation context:
{conversation_context}

Generate 3 thoughtful follow-up questions that would help a student:
1. Deepen their understanding of this topic
2. Connect it to other course concepts (LLMs, prompt engineering, agents)
3. Apply it practically or think about real-world implications

Make the questions specific, engaging, and educational. Format as:
🤔 Question 1: ...
🤔 Question 2: ...
🤔 Question 3: ...

Only return the questions, nothing else."""

        # Create the request body
        request_body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 300,
            "messages": [
                {
                    "role": "user",
                    "content": prompt
                }
            ],
            "temperature": 0.7
        }

        # Call Claude for question generation
        response = bedrock_client.invoke_model(
            modelId=LLM_MODEL,
            body=json.dumps(request_body),
            contentType='application/json'
        )
        
        response_body = json.loads(response['body'].read())
        questions = response_body['content'][0]['text'].strip()
        
        return questions
        
    except Exception as e:
        return f"Error generating follow-up questions: {str(e)}"

print("✅ Follow-up question generator tool implemented!")

✅ Follow-up question generator tool implemented!


### 🧪 Test Tool 2: Follow-up Question Generator

In [21]:
# Test the follow-up question generator
print("🤔 Testing Follow-up Question Generator\n")

test_topic = "agent memory systems"
test_context = "We just discussed how agents use working memory and episodic memory"

followup_questions = generate_followup_questions_tool(test_topic, test_context)

print(f"Topic: {test_topic}")
print(f"Context: {test_context}")
print("=" * 50)
print(followup_questions)

🤔 Testing Follow-up Question Generator

Topic: agent memory systems
Context: We just discussed how agents use working memory and episodic memory
==================================================
🤔 Question 1: How does the interplay between working memory and episodic memory in AI agents differ from traditional computer memory systems, and what unique challenges arise when trying to implement human-like memory mechanisms in artificial systems?

🤔 Question 2: In what ways could advanced prompt engineering techniques be used to better simulate episodic memory in LLMs, and how might this improve an agent's ability to maintain context consistency across long conversations?

🤔 Question 3: Consider a real-world AI assistant helping multiple users throughout a day - what are the ethical and practical implications of implementing different types of memory systems, especially regarding privacy and information retention?


## 🔬 Tool Comparison Exercise

Let's compare our two tools to understand different tool patterns:

In [12]:
print("🔬 Tool Pattern Analysis\n")

comparison_data = {
    "Aspect": ["Purpose", "Pattern", "Input", "Processing", "Output", "Deterministic"],
    "Content Search": [
        "Find existing information",
        "Retrieval tool", 
        "Query string",
        "Vector similarity search",
        "Existing content + sources",
        "Yes (same query = same results)"
    ],
    "Follow-up Generator": [
        "Create new learning content",
        "Generative tool (LLM-as-tool)",
        "Topic + context", 
        "LLM reasoning + generation",
        "New questions tailored to topic",
        "No (creative variation each time)"
    ]
}

df = pd.DataFrame(comparison_data)
print(df.to_string(index=False))

print("\n💡 Key Insight: Agents become powerful by combining different types of tools!")
print("   - Retrieval tools provide accurate, factual information")
print("   - Generative tools provide creative, contextual assistance")

🔬 Tool Pattern Analysis

       Aspect                  Content Search               Follow-up Generator
      Purpose       Find existing information       Create new learning content
      Pattern                  Retrieval tool     Generative tool (LLM-as-tool)
        Input                    Query string                   Topic + context
   Processing        Vector similarity search        LLM reasoning + generation
       Output      Existing content + sources   New questions tailored to topic
Deterministic Yes (same query = same results) No (creative variation each time)

💡 Key Insight: Agents become powerful by combining different types of tools!
   - Retrieval tools provide accurate, factual information
   - Generative tools provide creative, contextual assistance


**🎉 Section 2 Complete!**

You now have:
- ✅ Two complementary tools with different patterns (retrieval + generative)
- ✅ Understanding of clean tool interface design
- ✅ Experience with LLM-as-tool patterns
- ✅ Working tools ready for agent integration

---

# Section 3: Agent Orchestration (15 minutes)

Now for the exciting part: building an **agent** that intelligently orchestrates our tools! 

## 🧠 What Makes This an Agent?

Unlike a simple LLM application, our agent will:
1. **Decide** which tools to use based on the user's request
2. **Coordinate** multiple tools in sequence when needed
3. **Reason** about when to offer follow-up questions
4. **Adapt** its behavior based on context

This is **orchestration** - the agent acts as the conductor of an orchestra of tools.
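
Stripped of the LLM calls, the orchestration idea reduces to a registry of tools plus a routing decision. The sketch below is a simplified, keyword-based version with stub tools. `fake_search`, `fake_followups`, `choose_tool`, and `run` are illustrative names, and a production agent might let the LLM make the routing decision instead.

```python
from typing import Callable, Dict


def fake_search(query: str) -> str:
    """Stub retrieval tool."""
    return f"search results for '{query}'"


def fake_followups(topic: str) -> str:
    """Stub generative tool."""
    return f"follow-up questions about '{topic}'"


# Tool registry: name -> callable
TOOLS: Dict[str, Callable[[str], str]] = {
    "search_content": fake_search,
    "generate_followup_questions": fake_followups,
}

FOLLOWUP_TRIGGERS = ("follow up", "what else", "what next", "more questions")


def choose_tool(user_input: str) -> str:
    """Plan: pick a tool name based on simple keyword triggers."""
    text = user_input.lower()
    if any(trigger in text for trigger in FOLLOWUP_TRIGGERS):
        return "generate_followup_questions"
    return "search_content"


def run(user_input: str) -> str:
    """Act: look up the chosen tool in the registry and call it."""
    name = choose_tool(user_input)
    return f"[{name}] {TOOLS[name](user_input)}"


print(run("What is prompt engineering?"))
print(run("What else should I know?"))
```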

In [22]:
class CourseAssistantAgent:
    """
    A personal assistant agent for course content
    
    This agent demonstrates the core pattern of agentic applications:
    - Observe: Analyze user input and current context
    - Plan: Decide which tools to use and in what order
    - Act: Execute the plan and generate a response
    """
    
    def __init__(self):
        # Tool registry - the agent's available capabilities
        self.tools = {
            "search_content": search_content_tool,
            "generate_followup_questions": generate_followup_questions_tool
        }
        
        # Initialize without memory for now (we'll add this in Section 4)
        self.memory = None
        
        print("🤖 Course Assistant Agent initialized!")
        print(f"   Available tools: {list(self.tools.keys())}")
    
    def _call_llm(self, prompt: str, max_tokens: int = 500, temperature: float = 0.1) -> str:
        """Helper method to call Claude via Bedrock"""
        try:
            # Create the request body
            request_body = {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens,
                "messages": [
                    {
                        "role": "user",
                        "content": prompt
                    }
                ],
                "temperature": temperature
            }

            # Call Claude via Bedrock with the assembled request
            response = bedrock_client.invoke_model(
                modelId=LLM_MODEL,
                body=json.dumps(request_body),
                contentType='application/json'
            )
            
            response_body = json.loads(response['body'].read())
            return response_body['content'][0]['text'].strip()
        except Exception as e:
            return f"Error calling LLM: {str(e)}"
    
    def _should_offer_followup_questions(self, user_input: str, response: str) -> bool:
        """Decide if we should offer follow-up questions"""
        
        # Check if user explicitly wants follow-up questions
        followup_triggers = [
            "follow up", "what else", "what next", "more questions", 
            "dig deeper", "learn more", "what should i know"
        ]
        
        if any(trigger in user_input.lower() for trigger in followup_triggers):
            return True
        
        # For now, keep it simple - don't auto-offer 
        # (In a more sophisticated agent, we could use LLM reasoning here)
        return False
    
    def decide_and_act(self, user_input: str) -> str:
        """
        The main agent decision cycle: Observe → Plan → Act
        
        Args:
            user_input: The user's question or request
            
        Returns:
            Agent's response
        """
        
        if not user_input.strip():
            return "I'd be happy to help! Please ask me a question about the course content."
        
        # OBSERVE: Analyze the user's input
        print(f"🔍 Agent observing: '{user_input}'")
        
        # PLAN: Decide what to do
        followup_triggers = ["follow up", "what else", "what next", "more questions"]
        wants_followup = any(trigger in user_input.lower() for trigger in followup_triggers)
        
        if wants_followup:
            print("📋 Agent planning: User wants follow-up questions, will generate them")
            # No memory yet (added in Section 4), so use a generic placeholder topic
            # and pass the user's request as the only context
            topic = "the current topic we've been discussing"
            result = self.tools["generate_followup_questions"](topic, user_input)
            return f"Here are some follow-up questions to explore further:\n\n{result}"
        
        # Default plan: Search for content first, then respond
        print("📋 Agent planning: Will search content and provide comprehensive response")
        
        # ACT: Execute the plan
        
        # Step 1: Search for relevant content
        print("⚡ Agent acting: Searching course content...")
        search_results = self.tools["search_content"](user_input, max_results=3)
        
        # Step 2: Generate response using search results
        print("⚡ Agent acting: Generating response with found content...")
        response_prompt = f"""You are a helpful course assistant for an AI/ML education program.

A student asked: "{user_input}"

Here's relevant content from the course materials:
{search_results}

Based on this content, provide a clear, helpful answer to the student's question. 
Focus on the key concepts and make it educational. If the search results don't 
contain relevant information, say so politely and offer to help with other topics.

Keep your response concise but complete (2-3 paragraphs maximum)."""
        
        main_response = self._call_llm(response_prompt, max_tokens=400)
        
        # Step 3: Decide if we should offer follow-up questions
        if self._should_offer_followup_questions(user_input, main_response):
            print("⚡ Agent acting: Adding follow-up questions...")
            followup_questions = self.tools["generate_followup_questions"](user_input)
            return f"{main_response}\n\n---\n💡 **Want to explore further?**\n{followup_questions}"
        else:
            # Subtly offer follow-up option without being pushy
            return f"{main_response}\n\n💡 *Want me to suggest follow-up questions? Just ask!*"

# Create our agent
agent = CourseAssistantAgent()
print("\n✅ Agent ready for testing!")

🤖 Course Assistant Agent initialized!
   Available tools: ['search_content', 'generate_followup_questions']

✅ Agent ready for testing!


## 🧪 Test the Agent: Basic Interaction

In [None]:
print("🤖 Testing Agent - Basic Question\n")

user_question = "What are the main differences between LLMs and AI agents?"
print(f"User: {user_question}")
print("\n" + "="*60 + "\n")

response = agent.decide_and_act(user_question)
print(f"Agent: {response}")

## 🧪 Test the Agent: Follow-up Request

In [None]:
print("🤖 Testing Agent - Follow-up Questions\n")

followup_request = "What else should I know about prompt engineering?"
print(f"User: {followup_request}")
print("\n" + "="*60 + "\n")

response = agent.decide_and_act(followup_request)
print(f"Agent: {response}")

## 🎯 Interactive Demo: Try Your Own Questions!

Now you can interact with the agent directly. Try different types of questions to see how it chooses tools:

In [None]:
def interactive_agent_demo():
    """Interactive demo for testing the agent"""
    print("🎮 Interactive Agent Demo")
    print("=" * 40)
    print("Ask questions about:")
    print("  • LLM fundamentals")
    print("  • Prompt engineering techniques")
    print("  • Agent concepts and memory")
    print("  • Or request 'follow up questions' or 'what else should I know?'")
    print("\nType 'quit' to exit\n")
    
    while True:
        user_input = input("You: ").strip()
        
        if user_input.lower() in ['quit', 'exit', 'stop']:
            print("👋 Thanks for testing the agent!")
            break
        
        if not user_input:
            continue
        
        print(f"\nAgent: {agent.decide_and_act(user_input)}")
        print("\n" + "-"*60 + "\n")

# Uncomment the line below to run the interactive demo
# interactive_agent_demo()

print("💡 Uncomment the line above to try the interactive demo!")
print("   Or run this cell and test specific questions in the cells below.")

**🔍 Understanding Agent Decision-Making**

Notice how the agent:
1. **Analyzes** the user's input (Observe)
2. **Decides** which tools to use (Plan)
3. **Coordinates** tool execution (Act)
4. **Adapts** behavior based on context

This is the core difference between agents and simple LLM applications!

**🎉 Section 3 Complete!**

You now have:
- ✅ A working agent that orchestrates multiple tools
- ✅ Understanding of the agent decision cycle (Observe → Plan → Act)
- ✅ Experience with tool coordination and context-aware responses
- ✅ A foundation ready for memory integration

---

# Section 4: Memory Integration (15 minutes)

The final piece: giving our agent **memory**! This transforms it from a stateless system into one that can maintain context across multiple interactions.

## 🧠 Why Memory Matters

Without memory, each interaction is isolated:
- ❌ Cannot answer "What did we just discuss?"
- ❌ Cannot resolve "Can you elaborate on that previous point?"
- ❌ Cannot learn from past mistakes or preferences

With memory, agents become context-aware:
- ✅ Remembers conversation history
- ✅ Builds on previous discussions
- ✅ Provides continuity across interactions

## 💾 Implementing Episodic Memory

We'll implement simple **episodic memory** - storing and retrieving conversation history.
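
At its simplest, episodic memory is an append-only log of turns plus a way to read back the most recent ones. The sketch below is a hypothetical variation, not the class defined in this lab: it uses `collections.deque` with `maxlen` so old turns are discarded automatically, whereas the lab's class keeps the full history and slices it on read.

```python
from collections import deque


class BoundedMemory:
    """Hypothetical variation: keep only the last `max_turns` conversation turns."""

    def __init__(self, max_turns: int = 3):
        # deque drops the oldest item once maxlen is reached
        self.turns = deque(maxlen=max_turns)

    def store(self, user: str, agent: str) -> None:
        self.turns.append({"user": user, "agent": agent})

    def context(self) -> str:
        """Format the retained turns as prompt-ready text."""
        return "\n".join(
            f"User: {t['user']}\nAgent: {t['agent']}" for t in self.turns
        )


memory = BoundedMemory(max_turns=2)
for i in range(1, 4):
    memory.store(f"question {i}", f"answer {i}")

print(memory.context())
```

The trade-off is that a bounded window caps prompt size but forgets older turns entirely, while a full log allows summaries over the whole conversation.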

In [None]:
class EpisodicMemory:
    """
    Simple episodic memory for storing conversation history
    
    In production, this might use a database, but for learning
    purposes, we'll use in-memory storage.
    """
    
    def __init__(self):
        self.conversations = []  # List of conversation turns
        print("🧠 Episodic memory initialized")
    
    def store_interaction(self, user_input: str, agent_response: str):
        """
        Store a conversation turn
        
        Args:
            user_input: What the user said
            agent_response: How the agent responded
        """
        interaction = {
            "timestamp": datetime.now().isoformat(),
            "user": user_input,
            "agent": agent_response,
            "turn_number": len(self.conversations) + 1
        }
        
        self.conversations.append(interaction)
        print(f"💾 Stored interaction #{interaction['turn_number']}")
    
    def get_recent_context(self, max_turns: int = 3) -> str:
        """
        Get recent conversation history for context
        
        Args:
            max_turns: Maximum number of recent turns to include
            
        Returns:
            Formatted conversation context
        """
        if not self.conversations:
            return "No previous conversation history."
        
        # Get the most recent turns
        recent = self.conversations[-max_turns:]
        
        context = "Recent conversation history:\n"
        for turn in recent:
            # Truncate long responses for context
            agent_preview = (turn['agent'][:100] + "...") if len(turn['agent']) > 100 else turn['agent']
            context += f"Turn {turn['turn_number']} - User: {turn['user']}\n"
            context += f"Turn {turn['turn_number']} - Agent: {agent_preview}\n\n"
        
        return context
    
    def get_conversation_summary(self) -> str:
        """Get a summary of the entire conversation"""
        if not self.conversations:
            return "No conversations yet."
        
        total_turns = len(self.conversations)
        topics = []
        
        # Extract key topics mentioned (simple keyword extraction)
        ai_terms = ['llm', 'prompt', 'agent', 'memory', 'tool', 'embedding', 'model']
        
        for turn in self.conversations:
            user_text = turn['user'].lower()
            for term in ai_terms:
                if term in user_text and term not in topics:
                    topics.append(term)
        
        return f"Conversation summary: {total_turns} turns, topics discussed: {', '.join(topics) if topics else 'general questions'}"

print("✅ Episodic memory class implemented!")

## 🔄 Updating the Agent with Memory

Now let's enhance our agent to use memory in its decision-making:

In [None]:
class MemoryEnabledAgent(CourseAssistantAgent):
    """
    Enhanced agent with episodic memory capabilities
    
    This agent demonstrates how memory integration transforms
    agent behavior from stateless to stateful.
    """
    
    def __init__(self):
        super().__init__()
        self.memory = EpisodicMemory()
        print("🧠 Memory-enabled agent initialized!")
    
    def decide_and_act(self, user_input: str) -> str:
        """
        Enhanced decision cycle with memory integration
        
        Key difference from base agent: includes conversation context
        in reasoning and stores interactions for future use.
        """
        
        if not user_input.strip():
            return "I'd be happy to help! Please ask me a question about the course content."
        
        # OBSERVE: Analyze input AND retrieve relevant memory
        print(f"🔍 Agent observing: '{user_input}'")
        conversation_context = self.memory.get_recent_context(max_turns=3)
        print(f"🧠 Agent remembering: {len(self.memory.conversations)} previous turns")
        
        # PLAN: Enhanced planning with memory context
        followup_triggers = ["follow up", "what else", "what next", "more questions", "elaborate", "tell me more"]
        context_triggers = ["we discussed", "you mentioned", "earlier", "before", "that previous"]
        
        wants_followup = any(trigger in user_input.lower() for trigger in followup_triggers)
        references_context = any(trigger in user_input.lower() for trigger in context_triggers)
        
        if wants_followup:
            print("📋 Agent planning: User wants follow-up questions, using conversation context")
            
            # Without any history there is no topic to build on, so ask for one
            # and return early instead of wrapping the request in the
            # "Here are some follow-up questions" header used below.
            if not self.memory.conversations:
                result = "I'd be happy to generate follow-up questions! What topic would you like to explore further?"
                self.memory.store_interaction(user_input, result)
                return result
            
            # Use the most recent user message as the topic, with memory as context
            last_topic = self.memory.conversations[-1]['user']
            result = self.tools["generate_followup_questions"](last_topic, conversation_context)
            
            # Store this interaction
            self.memory.store_interaction(user_input, result)
            return f"Here are some follow-up questions based on our conversation:\n\n{result}"
        
        # Default plan: Search + respond with memory context
        print("📋 Agent planning: Will search content and provide response with conversation context")
        
        # ACT: Execute plan with memory integration
        
        # Step 1: Search for relevant content
        print("⚡ Agent acting: Searching course content...")
        search_results = self.tools["search_content"](user_input, max_results=3)
        
        # Step 2: Generate response with memory context
        print("⚡ Agent acting: Generating response with content and conversation context...")
        
        response_prompt = f"""You are a helpful course assistant for an AI/ML education program.

Student's current question: "{user_input}"

Conversation context:
{conversation_context}

Relevant course content:
{search_results}

Provide a helpful answer that:
1. Addresses the current question using the course content
2. References previous conversation when relevant
3. Builds on topics we've already discussed
4. Maintains conversational continuity

If the question references previous discussion, acknowledge that connection.
Keep your response clear and educational (2-3 paragraphs maximum)."""
        
        main_response = self._call_llm(response_prompt, max_tokens=400)
        
        # Step 3: Store interaction in memory
        self.memory.store_interaction(user_input, main_response)
        
        # Step 4: Decide on follow-up offer (enhanced with memory)
        if references_context or len(self.memory.conversations) > 2:
            # For ongoing conversations, be more proactive about follow-ups
            return f"{main_response}\n\n💡 *Want me to suggest follow-up questions based on our discussion? Just ask!*"
        else:
            return f"{main_response}\n\n💡 *Want me to suggest follow-up questions? Just ask!*"

# Create the memory-enabled agent
memory_agent = MemoryEnabledAgent()
print("\n✅ Memory-enabled agent ready!")

## 🧪 Compare: Agent With vs Without Memory

Let's see the difference memory makes in a multi-turn conversation:

In [None]:
print("🔬 Memory Comparison Demo\n")
print("="*70)

# Simulate a conversation sequence
conversation = [
    "What is prompt engineering?",
    "Can you elaborate on that?",
    "What techniques did we just discuss?"
]

print("🤖 WITHOUT MEMORY (Original Agent):")
print("-" * 40)

for i, question in enumerate(conversation, 1):
    print(f"\nTurn {i} - User: {question}")
    response = agent.decide_and_act(question)
    # Show truncated response for comparison
    short_response = response[:150] + "..." if len(response) > 150 else response
    print(f"Turn {i} - Agent: {short_response}")

print("\n" + "="*70)
print("🧠 WITH MEMORY (Memory-Enabled Agent):")
print("-" * 40)

for i, question in enumerate(conversation, 1):
    print(f"\nTurn {i} - User: {question}")
    response = memory_agent.decide_and_act(question)
    # Show truncated response for comparison
    short_response = response[:150] + "..." if len(response) > 150 else response
    print(f"Turn {i} - Agent: {short_response}")

print("\n" + "="*70)
print(f"📊 Memory Summary: {memory_agent.memory.get_conversation_summary()}")

## 🧪 Test Memory Functionality

Let's test specific memory features:

In [None]:
print("🧪 Testing Memory Features\n")

# Test 1: Context reference
print("Test 1: Referencing previous conversation")
test_question = "You mentioned techniques earlier - can you give me more details?"
response = memory_agent.decide_and_act(test_question)
print(f"Response: {response[:200]}...\n")

# Test 2: Follow-up request
print("Test 2: Requesting follow-up questions")
followup_request = "What else should I know about this topic?"
response = memory_agent.decide_and_act(followup_request)
print(f"Response: {response[:200]}...\n")

# Test 3: Memory inspection
print("Test 3: Current memory state")
print(f"Total conversations stored: {len(memory_agent.memory.conversations)}")
print(f"Memory summary: {memory_agent.memory.get_conversation_summary()}")

print("\n📈 Recent context being used:")
print(memory_agent.memory.get_recent_context(max_turns=2))

## 🎯 Final Interactive Demo: Memory-Enabled Agent

Try a conversation with the memory-enabled agent to see how it maintains context:

In [None]:
def memory_agent_demo():
    """Interactive demo with the memory-enabled agent"""
    print("🧠 Memory-Enabled Agent Demo")
    print("=" * 40)
    print("Try these to see memory in action:")
    print("  • Ask a question about AI concepts")
    print("  • Follow up with 'tell me more about that'")
    print("  • Reference previous discussion: 'you mentioned...'")
    print("  • Request: 'what else should I know?'")
    print("\nType 'memory' to see conversation history")
    print("Type 'quit' to exit\n")
    
    while True:
        user_input = input("You: ").strip()
        
        if user_input.lower() in ['quit', 'exit', 'stop']:
            print("\n📊 Final conversation summary:")
            print(memory_agent.memory.get_conversation_summary())
            print("👋 Thanks for testing the memory-enabled agent!")
            break
        
        if user_input.lower() == 'memory':
            print("\n🧠 Current Memory State:")
            print(memory_agent.memory.get_recent_context())
            continue
        
        if not user_input:
            continue
        
        print(f"\nAgent: {memory_agent.decide_and_act(user_input)}")
        print("\n" + "-"*60 + "\n")

# Uncomment to run the interactive demo
# memory_agent_demo()

print("💡 Uncomment the line above to try the memory-enabled agent!")
print("   Notice how it references previous conversations and builds context.")

## 🔍 Understanding Memory Integration

**Key Memory Design Decisions:**

1. **Developer-Controlled Retrieval**: `decide_and_act` always injects the recent context into the prompt rather than letting the LLM decide when to retrieve memory. This keeps the behavior predictable and the implementation simple.

2. **Automatic Storage**: Every interaction is stored by `decide_and_act` itself, so recording a turn never depends on the caller or the LLM remembering to do it.

3. **Context Window Management**: Only the most recent turns are passed to the LLM (`max_turns=3` in `decide_and_act`), which keeps the prompt within token limits and focused on the current topic.
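
The context-window decision can be made a little more concrete. Below is a minimal sketch of a context builder that caps both the number of turns and the total size. It is not the lab's `get_recent_context`: the function name `build_context`, the `'agent'` key for the stored response, and the use of a character budget as a rough stand-in for tokens are all assumptions made for illustration.

```python
from typing import Dict, List

def build_context(conversations: List[Dict[str, str]],
                  max_turns: int = 3,
                  max_chars: int = 1200) -> str:
    """Format the most recent turns in chronological order within a size budget."""
    # Guard max_turns <= 0 explicitly: conversations[-0:] would return everything.
    recent = conversations[-max_turns:] if max_turns > 0 else []
    if not recent:
        return "No previous conversation."

    kept: List[str] = []
    used = 0
    # Walk backwards so the newest turns survive when the budget runs out.
    for turn in reversed(recent):
        entry = f"User: {turn['user']}\nAssistant: {turn['agent']}"
        # Always keep the newest turn, even if it alone exceeds the budget.
        if kept and used + len(entry) > max_chars:
            break
        kept.append(entry)
        used += len(entry)

    return "\n\n".join(reversed(kept))

history = [{"user": f"q{i}", "agent": f"a{i}"} for i in range(5)]
print(build_context(history, max_turns=2))
```

A character budget is only an approximation. If you need a hard guarantee, count tokens with the tokenizer that matches your model.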

**Production Considerations:**
- Real agents might use databases for persistent memory
- Advanced systems could use LLM-controlled memory retrieval
- Semantic memory (facts/preferences) could be added alongside episodic memory
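
As a rough illustration of the first point, here is a sketch of episodic memory backed by SQLite from the Python standard library. The class name `SQLiteEpisodicMemory`, the table layout, and the `get_recent_turns` method are hypothetical and do not come from the lab's code. Only `store_interaction` mirrors a method name the agent already calls.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Dict, List

class SQLiteEpisodicMemory:
    """Hypothetical episodic memory that persists turns in a SQLite database."""

    def __init__(self, db_path: str = ":memory:"):
        # Pass a file path instead of ":memory:" to survive kernel restarts.
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS turns ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "ts TEXT NOT NULL, "
            "user_text TEXT NOT NULL, "
            "agent_text TEXT NOT NULL)"
        )
        self.conn.commit()

    def store_interaction(self, user_input: str, agent_response: str) -> None:
        """Append one turn with a UTC timestamp."""
        self.conn.execute(
            "INSERT INTO turns (ts, user_text, agent_text) VALUES (?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), user_input, agent_response),
        )
        self.conn.commit()

    def get_recent_turns(self, max_turns: int = 3) -> List[Dict[str, str]]:
        """Return up to max_turns of the latest turns, oldest first."""
        rows = self.conn.execute(
            "SELECT user_text, agent_text FROM turns ORDER BY id DESC LIMIT ?",
            (max_turns,),
        ).fetchall()
        # Rows arrive newest-first; reverse them into chronological order.
        return [{"user": u, "agent": a} for u, a in reversed(rows)]

memory = SQLiteEpisodicMemory()
memory.store_interaction("What is an agent?", "An agent coordinates tools and memory.")
print(memory.get_recent_turns())
```

Parameterized queries (`?` placeholders) keep user text from being interpreted as SQL, which matters once the stored content comes from real users.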

**🎉 Section 4 Complete!**

You now have:
- ✅ A complete agent with episodic memory
- ✅ Understanding of memory integration patterns
- ✅ Experience with stateful vs stateless AI applications
- ✅ Hands-on knowledge of agent orchestration, tools, and memory

---

# 🎯 Lab Summary & Key Takeaways

Congratulations! You've built a complete AI agent from scratch. Let's review what you've accomplished:

## 🏗️ What You Built

1. **Content Search Tool** - Semantic search using embeddings and FAISS
2. **Follow-up Question Generator** - LLM-as-tool for educational enhancement
3. **Agent Orchestration** - Decision logic for tool coordination
4. **Episodic Memory** - Conversation history for context continuity

## 🧠 Key Concepts Learned

### Agent vs LLM Application
- **LLM Application**: Single-step input → output processing
- **Agent**: Multi-step reasoning with tools, memory, and adaptive behavior

### Tool Patterns
- **Retrieval Tools**: Access external information (search, databases)
- **Generative Tools**: Create new content using LLMs
- **Clean Interfaces**: Well-structured inputs/outputs for reliability
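
One way to picture a "clean interface" is a small wrapper that gives every tool a name, a description, and a consistent string return value, including on failure. This is a sketch only: the `Tool` dataclass and the stub functions `fake_search` and `broken_tool` are hypothetical and stand in for the lab's real tools.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class Tool:
    """Hypothetical wrapper: every tool takes arguments and returns a string."""
    name: str
    description: str
    func: Callable[..., str]

    def __call__(self, *args, **kwargs) -> str:
        try:
            return self.func(*args, **kwargs)
        except Exception as exc:
            # Report failures as text so the agent can react instead of crashing.
            return f"[{self.name} failed: {exc}]"

# Stub implementations, for illustration only.
def fake_search(query: str, max_results: int = 3) -> str:
    return f"{max_results} results for '{query}'"

def broken_tool(query: str) -> str:
    raise RuntimeError("index not loaded")

tools: Dict[str, Tool] = {
    tool.name: tool
    for tool in (
        Tool("search_content", "Retrieve relevant course content", fake_search),
        Tool("broken", "Always fails, to show error handling", broken_tool),
    )
}

print(tools["search_content"]("agents", max_results=2))
print(tools["broken"]("agents"))
```

Because the registry is still a dict of callables, it can be indexed the same way the lab's agent indexes `self.tools`.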

### Memory Types
- **Working Memory**: Current context and reasoning
- **Episodic Memory**: Conversation history and past interactions
- **Integration Strategy**: Developer-controlled vs. LLM-controlled retrieval

### Orchestration Patterns
- **Observe**: Analyze input and retrieve relevant context
- **Plan**: Decide which tools to use and in what order
- **Act**: Execute the plan and generate responses
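
The three steps above can be sketched as plain functions, which makes the routing logic easy to test without calling an LLM. This is a simplified illustration, not the lab's `decide_and_act`: the function names, the `STUB_TOOLS` dictionary, and the string-in/string-out tool signature are assumptions. The trigger phrases are the ones used by the memory-enabled agent.

```python
from typing import Callable, Dict, List

FOLLOWUP_TRIGGERS = ("follow up", "what else", "what next",
                     "more questions", "elaborate", "tell me more")

def observe(user_input: str, history: List[str]) -> dict:
    """OBSERVE: normalize the input and note whether any history exists."""
    return {"text": user_input.strip().lower(), "has_history": bool(history)}

def plan(observation: dict) -> List[str]:
    """PLAN: map the observation to an ordered list of tool names."""
    if not observation["text"]:
        return ["ask_for_question"]
    if any(trigger in observation["text"] for trigger in FOLLOWUP_TRIGGERS):
        if observation["has_history"]:
            return ["generate_followup_questions"]
        return ["ask_for_topic"]
    return ["search_content", "respond"]

def act(steps: List[str], tools: Dict[str, Callable[[str], str]], payload: str) -> str:
    """ACT: run each step in order, feeding each output into the next step."""
    result = payload
    for step in steps:
        result = tools[step](result)
    return result

def run_turn(user_input: str, history: List[str],
             tools: Dict[str, Callable[[str], str]]) -> str:
    steps = plan(observe(user_input, history))
    # Follow-up questions are about the previous user message, not the request.
    payload = history[-1] if steps == ["generate_followup_questions"] else user_input
    reply = act(steps, tools, payload)
    if user_input.strip():
        history.append(user_input)
    return reply

# Stub tools standing in for the real search and LLM calls.
STUB_TOOLS: Dict[str, Callable[[str], str]] = {
    "search_content": lambda query: f"[content about: {query}]",
    "respond": lambda context: f"Answer based on {context}",
    "generate_followup_questions": lambda topic: f"Follow-ups for: {topic}",
    "ask_for_topic": lambda _: "What topic would you like to explore?",
    "ask_for_question": lambda _: "Please ask a question.",
}

history: List[str] = []
print(run_turn("What is prompt engineering?", history, STUB_TOOLS))
print(run_turn("Can you elaborate on that?", history, STUB_TOOLS))
```

Keeping `plan` free of side effects means new routing rules can be covered with ordinary unit tests before any model call is involved.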

## 🚀 Looking Forward

Your agent demonstrates the fundamental patterns of modern AI applications. In production systems, you might see:

- **More sophisticated memory systems** (vector databases, semantic memory)
- **Advanced orchestration frameworks** (LangChain, CrewAI)
- **Multi-agent collaboration** (specialized agents working together)
- **Human-in-the-loop systems** (approval workflows, escalation)

## 🛠️ Extension Ideas

Want to enhance your agent further? Try:

1. **Add more tools** (calculator, web search, code executor)
2. **Implement user preferences** (learning style, preferred topics)
3. **Add error recovery** (retry logic, fallback strategies)
4. **Create specialized agents** (different personas for different subjects)
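
For the error-recovery idea, a common starting point is retry with exponential backoff plus a fallback value. The helper below is a sketch under stated assumptions: `call_with_retry` is a hypothetical name, the delays are arbitrary, and it catches every `Exception` for brevity. In real code you would narrow that to the transient errors your client actually raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retry(func: Callable[[], T],
                    fallback: T,
                    max_attempts: int = 3,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func up to max_attempts times, doubling the wait after each failure.

    Returns fallback if every attempt raises.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            # Illustration only: catch specific, retryable errors in real code.
            if attempt == max_attempts:
                break
            # Waits base_delay, then 2x, then 4x, and so on.
            sleep(base_delay * 2 ** (attempt - 1))
    return fallback
```

Inside the agent, this could wrap the LLM call, for example `call_with_retry(lambda: self._call_llm(response_prompt, max_tokens=400), fallback="Sorry, I couldn't generate a response right now.")`. The `sleep` parameter exists so tests can inject a no-op instead of actually waiting.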

---

**🎉 Congratulations on completing the AI Agents Foundations Lab!**

You now understand the core principles behind AI agents and have hands-on experience with the patterns that power modern agentic applications.
