# Module 3: Building AI Agents - Foundations Lab

## 🎯 Learning Objectives
By the end of this lab, you will:
1. **Understand agent orchestration** - How agents coordinate tools, memory, and decision-making
2. **Experience the transformation** from workflow to autonomous agent
3. **Implement practical tools** - Content search and intelligent follow-up generation
4. **Integrate episodic memory** - How agents remember and use conversation history
5. **Recognize LLM-controlled decision-making** - The key to true agency

## 🏗️ What We're Building
A **Personal Assistant ChatBot** that evolves from a simple workflow into an autonomous agent. It will:
- Search through course content to answer questions
- Remember conversation history for better context
- **Autonomously decide** when to generate thoughtful follow-up questions
- Make intelligent decisions about tool usage

## ⏱️ Lab Timeline (100 minutes)
- **Section 1**: Setup & Data Loading (10 min)
- **Section 2**: Content Search Tool (15 min) 
- **Section 3**: Agent Orchestration (20 min)
- **Section 4**: Memory Integration (15 min)
- **Section 5**: LLM-Controlled Follow-up Questions (25 min)
- **Section 6**: Workflow vs Agent Reflection (15 min)

---

# Section 1: Setup & Data Loading (10 minutes)

First, let's set up our environment and understand how our course content was prepared for the agent.

In [4]:
# MyBinder users: set your credentials here (do NOT share real keys)
#import os
# os.environ['AWS_ACCESS_KEY_ID'] = 'YOUR_ACCESS_KEY'
# os.environ['AWS_SECRET_ACCESS_KEY'] = 'YOUR_SECRET_KEY'
# os.environ['AWS_DEFAULT_REGION'] = 'us-west-2'  # or your region

# Install required packages (uncomment these lines, then run this cell first)
# !pip install -r ../../requirements.txt --quiet
# !conda install -y conda-forge::faiss-cpu --quiet

print("✅ Packages installed successfully!")

✅ Packages installed successfully!


In [33]:
# Import all required libraries
import json
import boto3
import numpy as np
import faiss
import pandas as pd
from datetime import datetime
from typing import List, Dict, Any, Optional
from dataclasses import dataclass
import os

print("📚 Libraries imported successfully!")
print("🕐 Lab start time:", datetime.now().strftime("%H:%M:%S"))

📚 Libraries imported successfully!
🕐 Lab start time: 19:33:53


## 🔧 Configuration

Set up AWS Bedrock connection and file paths:

In [34]:
# Configuration
AWS_REGION = "us-west-2"  # Change if you prefer a different region
EMBEDDINGS_FILE = "../embeddings/course_embeddings.json"
EMBEDDING_MODEL = "amazon.titan-embed-text-v2:0"
LLM_MODEL = "anthropic.claude-3-5-sonnet-20241022-v2:0"

print(f"🌎 AWS Region: {AWS_REGION}")
print(f"📁 Embeddings file: {EMBEDDINGS_FILE}")
print(f"🧠 LLM Model: {LLM_MODEL}")

# Initialize AWS Bedrock client.
# Note: creating the client does not call Bedrock itself, so invalid credentials
# or missing model access typically surface only on the first invoke_model request.
try:
    bedrock_client = boto3.client("bedrock-runtime", region_name=AWS_REGION)
    print("✅ Connected to AWS Bedrock successfully!")
except Exception as e:
    print(f"❌ Failed to connect to AWS Bedrock: {e}")
    print("Please ensure your AWS credentials are configured correctly")

🌎 AWS Region: us-west-2
📁 Embeddings file: ../embeddings/course_embeddings.json
🧠 LLM Model: anthropic.claude-3-5-sonnet-20241022-v2:0
✅ Connected to AWS Bedrock successfully!


## 📊 Understanding Our Course Content Data

Before we build our agent, let's understand how our course content was prepared. The embeddings were created using the script at `../course_embeddings_generator.py`.

**How the embeddings were generated:**
1. **HTML Extraction**: Converted HTML pages to clean text using `html2text`
2. **Semantic Chunking**: Split content by `<section>` elements for logical units
3. **Vectorization**: Created embeddings using AWS Bedrock Titan Embeddings
4. **Storage**: Saved as structured JSON with metadata

Let's load and explore this data:

In [35]:
# Load the pre-generated embeddings
def load_course_embeddings(file_path: str) -> Dict[str, Any]:
    """Load embeddings and metadata from JSON file"""
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
        print(f"✅ Loaded embeddings from {file_path}")
        return data
    except FileNotFoundError:
        print(f"❌ Embeddings file not found: {file_path}")
        print("Please ensure you've run the embedding generation script first")
        return {}
    except Exception as e:
        print(f"❌ Error loading embeddings: {e}")
        return {}

# Load the data
embeddings_data = load_course_embeddings(EMBEDDINGS_FILE)

if embeddings_data:
    metadata = embeddings_data['metadata']
    chunks = embeddings_data['chunks']
    
    print(f"\n📈 Content Statistics:")
    print(f"   📄 Files processed: {len(metadata['processed_files'])}")
    print(f"   📝 Content chunks: {metadata['chunk_count']}")
    print(f"   📊 Total words: {metadata['total_words']:,}")
    print(f"   🧠 Embedding dimension: {metadata['embedding_dimension']}")
    print(f"   🕐 Created: {metadata['created_at'][:19]}")
    
    print(f"\n📚 Sample content chunks:")
    for i, chunk in enumerate(chunks[:3]):
        print(f"   {i+1}. {chunk['title']} ({chunk['word_count']} words) - {chunk['source']}")

✅ Loaded embeddings from ../embeddings/course_embeddings.json

📈 Content Statistics:
   📄 Files processed: 6
   📝 Content chunks: 43
   📊 Total words: 18,800
   🧠 Embedding dimension: 1024
   🕐 Created: 2025-05-25T07:43:34

📚 Sample content chunks:
   1. Welcome to AI Foundations (38 words) - index.html
   2. Understanding Generative AI (358 words) - index.html
   3. The Evolution of Artificial Intelligence (277 words) - index.html
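
Before indexing, it can help to confirm that the loaded JSON has the shape the rest of the lab expects. The sketch below checks only the keys that the cells in this lab actually read (`metadata`, `chunks`, and the per-chunk `title`, `source`, `content`, `word_count`, `embedding`). The helper name `validate_embeddings_data` and the tiny sample record are illustrative assumptions, not part of the course repository.

```python
from typing import Any, Dict, List

# Keys read by the lab cells; the real file may contain more.
REQUIRED_CHUNK_KEYS = ("title", "source", "content", "word_count", "embedding")


def validate_embeddings_data(data: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems (empty list means OK)."""
    problems: List[str] = []
    if "metadata" not in data or "chunks" not in data:
        return ["top-level 'metadata' and 'chunks' keys are required"]

    expected_dim = data["metadata"].get("embedding_dimension")
    for i, chunk in enumerate(data["chunks"]):
        missing = [k for k in REQUIRED_CHUNK_KEYS if k not in chunk]
        if missing:
            problems.append(f"chunk {i}: missing keys {missing}")
            continue
        if expected_dim is not None and len(chunk["embedding"]) != expected_dim:
            problems.append(
                f"chunk {i}: embedding has {len(chunk['embedding'])} values, "
                f"expected {expected_dim}"
            )
    return problems


# Hypothetical two-chunk sample with 3-dimensional embeddings.
sample = {
    "metadata": {"embedding_dimension": 3},
    "chunks": [
        {"title": "A", "source": "a.html", "content": "text",
         "word_count": 1, "embedding": [0.1, 0.2, 0.3]},
        {"title": "B", "source": "b.html", "content": "text",
         "word_count": 1, "embedding": [0.1, 0.2]},  # wrong length on purpose
    ],
}
print(validate_embeddings_data(sample))
```

Running a check like this right after loading would surface a truncated or mismatched embedding before it reaches the index, where the failure is harder to trace.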


## 🔍 Creating the Search Index

Now let's create a FAISS (Facebook AI Similarity Search) index from our embeddings. This will enable fast semantic search over our course content.
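
The cell that follows builds an inner-product index over L2-normalized vectors. The reason this yields cosine similarity can be checked with a few lines of plain Python. The helper names and toy vectors here are illustrative assumptions, and FAISS itself is not involved.

```python
import math
from typing import List


def l2_normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return v if norm == 0 else [x / norm for x in v]


def inner_product(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Textbook definition: dot(a, b) / (|a| * |b|)."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


a = [3.0, 4.0]
b = [4.0, 3.0]

via_normalized_ip = inner_product(l2_normalize(a), l2_normalize(b))
via_definition = cosine_similarity(a, b)

# Both routes give the same number (24/25), up to floating-point rounding.
print(round(via_normalized_ip, 6), round(via_definition, 6))
```

Because of this equivalence, the scores printed by the search cells can be read as cosine similarities, where values closer to 1 indicate a closer semantic match.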

In [36]:
def create_search_index(embeddings_data: Dict[str, Any]) -> tuple:
    """
    Create FAISS index for fast similarity search
    
    Returns:
        tuple: (faiss_index, chunks_list)
    """
    if not embeddings_data:
        return None, []
    
    chunks = embeddings_data['chunks']
    
    # Extract embeddings as numpy array
    embeddings_matrix = np.array([chunk['embedding'] for chunk in chunks], dtype=np.float32)
    
    # Create FAISS index (using Inner Product for cosine similarity)
    dimension = embeddings_matrix.shape[1]
    index = faiss.IndexFlatIP(dimension)  # Inner Product index
    
    # Normalize embeddings for cosine similarity
    faiss.normalize_L2(embeddings_matrix)
    
    # Add embeddings to index
    index.add(embeddings_matrix)
    
    print(f"✅ Created FAISS index with {index.ntotal} vectors")
    print(f"📐 Vector dimension: {dimension}")
    
    return index, chunks

# Create the search index
search_index, content_chunks = create_search_index(embeddings_data)

if search_index:
    print(f"\n🎯 Search index ready! We can now find relevant content for any query.")
else:
    print("❌ Failed to create search index")

✅ Created FAISS index with 43 vectors
📐 Vector dimension: 1024

🎯 Search index ready! We can now find relevant content for any query.


## 🧪 Quick Search Test

Let's test our search index with a simple query to make sure everything works:

In [37]:
def quick_search_test(query: str, top_k: int = 2):
    """Test the search functionality with a sample query"""
    
    if not search_index:
        print("❌ Search index not available")
        return
    
    # Create query embedding
    try:
        response = bedrock_client.invoke_model(
            modelId=EMBEDDING_MODEL,
            body=json.dumps({"inputText": query})
        )
        query_embedding = json.loads(response['body'].read())['embedding']
        
        # Convert to numpy and normalize
        query_vector = np.array([query_embedding], dtype=np.float32)
        faiss.normalize_L2(query_vector)
        
        # Search
        scores, indices = search_index.search(query_vector, top_k)
        
        print(f"🔍 Search results for: '{query}'")
        print("-" * 50)
        
        for i, (score, idx) in enumerate(zip(scores[0], indices[0])):
            chunk = content_chunks[idx]
            print(f"\n{i+1}. **{chunk['title']}** (Score: {score:.3f})")
            print(f"   Source: {chunk['source']}")
            print(f"   Preview: {chunk['content'][:150]}...")
        
    except Exception as e:
        print(f"❌ Search test failed: {e}")

# Test with a course-related query
quick_search_test("What are the key characteristics of large language models?")

🔍 Search results for: 'What are the key characteristics of large language models?'
--------------------------------------------------

1. **Module 1: Understanding Large Language Models** (Score: 0.685)
   Source: llm.html
   Preview: ## Module 1: Understanding Large Language Models

Large Language Models (LLMs) are sophisticated AI systems, trained on vast amounts of text data, tha...

2. **6. LLM Evolution & Architectural Advances** (Score: 0.561)
   Source: llm.html
   Preview: ## 6\. LLM Evolution & Architectural Advances

#### Early LLM Development (2017-2022)

The modern Large Language Model era began with the 2017 paper "...


**🎉 Section 1 Complete!**

You now have:
- ✅ Course content loaded and indexed for search
- ✅ Understanding of how embeddings enable semantic search
- ✅ A working search system ready for agent integration

---

# Section 2: Content Search Tool Implementation (15 minutes)

Now let's build our first tool - the content search capability. This demonstrates the **retrieval tool pattern** where we find and return existing information.

## 🛠️ Tool Design Principles

Good agent tools should:
1. **Do one job well** - Clear, focused purpose
2. **Have clean interfaces** - Easy for agents to understand and use
3. **Provide useful outputs** - Structured, informative results
4. **Handle errors gracefully** - Helpful error messages

In [38]:
@dataclass
class SearchResult:
    """Clean data structure for search results"""
    content: str
    title: str
    source: str
    relevance_score: float

def search_content_tool(query: str, max_results: int = 3) -> str:
    """
    Search course content using semantic similarity
    
    Args:
        query: The search query
        max_results: Maximum number of results to return
        
    Returns:
        Formatted search results as a string
    """
    
    if not search_index or not content_chunks:
        return "Error: Search index not available"
    
    if not query.strip():
        return "Error: Search query cannot be empty"
    
    try:
        # Create query embedding
        response = bedrock_client.invoke_model(
            modelId=EMBEDDING_MODEL,
            body=json.dumps({"inputText": query})
        )
        query_embedding = json.loads(response['body'].read())['embedding']
        
        # Convert to numpy and normalize for cosine similarity
        query_vector = np.array([query_embedding], dtype=np.float32)
        faiss.normalize_L2(query_vector)
        
        # Search for similar content
        scores, indices = search_index.search(query_vector, max_results)
        
        # Format results
        results = []
        for score, idx in zip(scores[0], indices[0]):
            if score > 0.3:  # Only include reasonably relevant results
                chunk = content_chunks[idx]
                results.append(SearchResult(
                    content=chunk['content'][:500] + "..." if len(chunk['content']) > 500 else chunk['content'],
                    title=chunk['title'],
                    source=chunk['source'],
                    relevance_score=float(score)
                ))
        
        if not results:
            return f"No relevant content found for query: '{query}'"
        
        # Format output for the agent
        output = f"Found {len(results)} relevant content sections for '{query}':\n\n"
        
        for i, result in enumerate(results, 1):
            output += f"{i}. **{result.title}** (Relevance: {result.relevance_score:.3f})\n"
            output += f"   Source: {result.source}\n"
            output += f"   Content: {result.content}\n\n"
        
        return output
        
    except Exception as e:
        return f"Error searching content: {str(e)}"

print("✅ Content search tool implemented!")

✅ Content search tool implemented!
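
The filtering step inside `search_content_tool` can be isolated and exercised without Bedrock or FAISS. The sketch below is a simplified stand-in, not the tool itself: `filter_hits` is a hypothetical helper, the scores and indices are made up, and it additionally skips the `-1` index that FAISS uses to pad results when fewer than `k` neighbours are available.

```python
from typing import List, Tuple


def filter_hits(
    scores: List[float],
    indices: List[int],
    min_score: float = 0.3,
) -> List[Tuple[int, float]]:
    """Keep (index, score) pairs that are valid and above the threshold."""
    kept = []
    for score, idx in zip(scores, indices):
        if idx < 0:
            # Padding entry: no neighbour was found for this slot.
            continue
        if score <= min_score:
            # Same cut-off as the tool, which keeps score > 0.3.
            continue
        kept.append((idx, float(score)))
    return kept


# Made-up search output: one strong hit, one weak hit, one padding entry.
hits = filter_hits(scores=[0.62, 0.21, -1.0], indices=[7, 12, -1])
print(hits)
```

The 0.3 cut-off is a tunable choice rather than a fixed rule. Raising it trades recall for precision, so it is worth revisiting if the tool reports "No relevant content found" for reasonable queries.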


## 🧪 Test the Content Search Tool

In [39]:
# Test the content search tool
print("🔍 Testing Content Search Tool\n")

test_queries = [
    "What is prompt engineering?",
    "How do LLMs work?",
    "What makes agents different from simple LLM applications?"
]

for query in test_queries:
    print(f"Query: {query}")
    print("=" * 60)
    result = search_content_tool(query, max_results=2)
    print(result[:300] + "..." if len(result) > 300 else result)
    print("\n")

🔍 Testing Content Search Tool

Query: What is prompt engineering?
Found 2 relevant content sections for 'What is prompt engineering?':

1. **1. Prompt Engineering Overview** (Relevance: 0.620)
   Source: prompts.html
   Content: ## 1\. Prompt Engineering Overview

Input (Prompt) AI Model (Processing) Output (Response)

### 1.1 What are Prompts?

A **prompt** is th...


Query: How do LLMs work?
Found 2 relevant content sections for 'How do LLMs work?':

1. **Module 1: Understanding Large Language Models** (Relevance: 0.520)
   Source: llm.html
   Content: ## Module 1: Understanding Large Language Models

Large Language Models (LLMs) are sophisticated AI systems, trained on vast amounts of ...


Query: What makes agents different from simple LLM applications?
Found 2 relevant content sections for 'What makes agents different from simple LLM applications?':

1. **From LLMs to Agents: Why Go Further?** (Relevance: 0.696)
   Source: agents.html
   Content: ## From LLMs to Agents: Why Go 

**🎉 Section 2 Complete!**

You now have:
- ✅ A working content search tool with clean interface design
- ✅ Understanding of retrieval tool patterns
- ✅ Error handling and structured output formatting
- ✅ Foundation ready for agent integration

---

# Section 3: Agent Orchestration (20 minutes)

Now let's build our first **agent** that can intelligently use the search tool! 

## 🧠 Agent Fundamentals

Our agent will implement the core decision cycle:
1. **Observe**: Analyze user input and current context
2. **Plan**: Decide what actions to take
3. **Act**: Execute the plan using available tools

**Key insight**: For now, the orchestration logic will be **developer-defined**. We'll see how this evolves later in the lab!

In [40]:
class CourseAssistantAgent:
    """
    A personal assistant agent for course content
    
    This agent demonstrates the core pattern of AI applications:
    - Observe: Analyze user input
    - Plan: Decide which tools to use
    - Act: Execute the plan and generate a response
    """
    
    def __init__(self):
        # Tool registry - the agent's available capabilities
        self.tools = {
            "search_content": search_content_tool
        }
        
        # Initialize without memory for now (we'll add this in Section 4)
        self.memory = None
        
        print("🤖 Course Assistant Agent initialized!")
        print(f"   Available tools: {list(self.tools.keys())}")
    
    def _call_llm(self, prompt: str, max_tokens: int = 500, temperature: float = 0.1) -> str:
        """Helper method to call Claude via Bedrock"""
        try:
            # Create the request body
            request_body = {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens,
                "messages": [
                    {
                        "role": "user",
                        "content": prompt
                    }
                ],
                "temperature": temperature
            }

            response = bedrock_client.invoke_model(
                modelId=LLM_MODEL,
                body=json.dumps(request_body),
                contentType='application/json'
            )
            
            response_body = json.loads(response['body'].read())
            return response_body['content'][0]['text'].strip()
        except Exception as e:
            return f"Error calling LLM: {str(e)}"
    
    def decide_and_act(self, user_input: str) -> str:
        """
        The main agent decision cycle: Observe → Plan → Act
        
        Args:
            user_input: The user's question or request
            
        Returns:
            Agent's response
        """
        
        if not user_input.strip():
            return "I'd be happy to help! Please ask me a question about the course content."
        
        # OBSERVE: Analyze the user's input
        print(f"🔍 Agent observing: '{user_input}'")
        
        # PLAN: For now, simple plan - always search first, then respond
        print("📋 Agent planning: Will search content and provide comprehensive response")
        
        # ACT: Execute the plan
        
        # Step 1: Search for relevant content
        print("⚡ Agent acting: Searching course content...")
        search_results = self.tools["search_content"](user_input, max_results=3)
        
        # Step 2: Generate response using search results
        print("⚡ Agent acting: Generating response with found content...")
        response_prompt = f"""You are a helpful course assistant for an AI/ML education program.

A student asked: "{user_input}"

Here's relevant content from the course materials:
{search_results}

Based on this content, provide a clear, helpful answer to the student's question. 
Focus on the key concepts and make it educational. If the search results don't 
contain relevant information, say so politely and offer to help with other topics.

Keep your response concise but complete (2-3 paragraphs maximum)."""
        
        main_response = self._call_llm(response_prompt, max_tokens=400)
        
        return main_response

# Create our agent
agent = CourseAssistantAgent()
print("\n✅ Agent ready for testing!")

🤖 Course Assistant Agent initialized!
   Available tools: ['search_content']

✅ Agent ready for testing!


## 🧪 Test the Agent

In [42]:
print("🤖 Testing Basic Agent\n")

test_questions = [
    "What are the main differences between LLMs and AI agents?",
    "Can you explain what prompt engineering is?",
    "How do embeddings work in AI applications?"
]

for question in test_questions:
    print(f"User: {question}")
    print("\n" + "="*60 + "\n")
    
    response = agent.decide_and_act(question)
    print(f"Agent: {response}")
    print("\n" + "-"*60 + "\n")

🤖 Testing Basic Agent

User: What are the main differences between LLMs and AI agents?


🔍 Agent observing: 'What are the main differences between LLMs and AI agents?'
📋 Agent planning: Will search content and provide comprehensive response
⚡ Agent acting: Searching course content...
⚡ Agent acting: Generating response with found content...
Agent: Based on the course materials, I can explain the key differences between LLMs and AI agents:

The main distinction is that LLMs are primarily language processing systems that can understand and generate text, reason about information, and answer questions - but they operate within specific boundaries. They lack persistent memory, can't access external tools/data, and don't take autonomous actions. Think of an LLM as a sophisticated language processor that responds to prompts but doesn't actively do anything beyond text generation.

AI agents, on the other hand, are more dynamic systems that build upon LLMs by adding action-orient

## 🔍 Understanding Current Agent Behavior

**Notice the pattern:** Our agent currently follows a **predictable workflow**:
1. Always searches for content
2. Always generates a response based on search results
3. Never varies from this pattern

**This is "intelligent workflow automation" rather than autonomous decision-making.**

The orchestration logic is **hardcoded by us** (the developers), not dynamically determined by the LLM based on context.
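
To make the point concrete, the fixed pipeline can be reduced to a few lines with stubbed dependencies. Everything in this sketch (`fake_search`, `fake_llm`, `run_workflow`) is a hypothetical stand-in for the real tool and the Bedrock call. It only illustrates that the call order is written by the developer and never varies with the input.

```python
from typing import Callable, List


def fake_search(query: str) -> str:
    """Stand-in for search_content_tool: returns canned text."""
    return f"[course content related to: {query}]"


def fake_llm(prompt: str) -> str:
    """Stand-in for the Claude call: reports the prompt length."""
    return f"(answer based on a {len(prompt)}-character prompt)"


def run_workflow(
    user_input: str,
    search: Callable[[str], str],
    llm: Callable[[str], str],
    trace: List[str],
) -> str:
    """Developer-defined orchestration: the same two steps, every time."""
    trace.append("search")
    context = search(user_input)
    trace.append("respond")
    return llm(f"Question: {user_input}\nContext: {context}")


trace_a: List[str] = []
trace_b: List[str] = []
run_workflow("What is prompt engineering?", fake_search, fake_llm, trace_a)
run_workflow("Thanks, that's all!", fake_search, fake_llm, trace_b)

# The recorded steps are identical even though the second input needs no search.
print(trace_a, trace_b)
```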

**🎉 Section 3 Complete!**

You now have:
- ✅ A working agent that orchestrates tool usage
- ✅ Understanding of the agent decision cycle (Observe → Plan → Act)
- ✅ Experience with developer-controlled orchestration patterns
- ✅ Foundation ready for memory integration

---

# Section 4: Memory Integration (15 minutes)

Now let's give our agent **memory**! This transforms it from a stateless system into one that can maintain context across multiple interactions.

## 🧠 Why Memory Matters

Without memory, each interaction is isolated, and the agent cannot handle requests such as:
- ❌ "What did we just discuss?"
- ❌ "Can you elaborate on that previous point?"
- ❌ Any question that builds on previous conversations

With memory, the agent becomes context-aware. It:
- ✅ Remembers conversation history
- ✅ Builds on previous discussions
- ✅ Provides continuity across interactions

## 💾 Implementing Episodic Memory

We'll implement simple **episodic memory** - storing and retrieving conversation history.

In [49]:
class EpisodicMemory:
    """
    Simple episodic memory for storing conversation history
    
    In production, this might use a database, but for learning
    purposes, we'll use in-memory storage.
    """
    
    def __init__(self):
        self.conversations = []  # List of conversation turns
        print("🧠 Episodic memory initialized")
    
    def store_interaction(self, user_input: str, agent_response: str):
        """
        Store a conversation turn
        
        Args:
            user_input: What the user said
            agent_response: How the agent responded
        """
        interaction = {
            "timestamp": datetime.now().isoformat(),
            "user": user_input,
            "agent": agent_response,
            "turn_number": len(self.conversations) + 1
        }
        
        self.conversations.append(interaction)
        print(f"💾 Stored interaction #{interaction['turn_number']}")
    
    def get_recent_context(self, max_turns: int = 3) -> str:
        """
        Get recent conversation history for context
        
        Args:
            max_turns: Maximum number of recent turns to include
            
        Returns:
            Formatted conversation context
        """
        if not self.conversations:
            return "No previous conversation history."
        
        # Get the most recent turns
        recent = self.conversations[-max_turns:]
        
        context = "Recent conversation history:\n"
        for turn in recent:
            context += f"Turn {turn['turn_number']} - User: {turn['user']}\n"
            context += f"Turn {turn['turn_number']} - Agent: {turn['agent']}\n\n"
        
        return context
    
    def get_conversation_summary(self) -> str:
        """Get a summary of the entire conversation"""
        if not self.conversations:
            return "No conversations yet."
        
        total_turns = len(self.conversations)
        topics = []
        
        # Extract key topics mentioned (simple keyword extraction)
        ai_terms = ['llm', 'prompt', 'agent', 'memory', 'tool', 'embedding', 'model']
        
        for turn in self.conversations:
            user_text = turn['user'].lower()
            for term in ai_terms:
                if term in user_text and term not in topics:
                    topics.append(term)
        
        return f"Conversation summary: {total_turns} turns, topics discussed: {', '.join(topics) if topics else 'general questions'}"

print("✅ Episodic memory class implemented!")

✅ Episodic memory class implemented!
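
The list in `EpisodicMemory` grows without limit, which is fine for a short lab session. For longer sessions, one possible variation is to cap the history with `collections.deque`. The class below, `BoundedEpisodicMemory`, is an illustrative assumption rather than part of the lab code, and it keeps only the fields needed to show the idea.

```python
from collections import deque
from typing import Deque, Dict


class BoundedEpisodicMemory:
    """Keeps only the most recent `max_turns` conversation turns."""

    def __init__(self, max_turns: int = 3):
        # deque(maxlen=...) silently drops the oldest item when full.
        self.turns: Deque[Dict[str, object]] = deque(maxlen=max_turns)
        self.total_turns = 0  # counts every turn ever stored

    def store_interaction(self, user_input: str, agent_response: str) -> None:
        self.total_turns += 1
        self.turns.append({
            "turn_number": self.total_turns,
            "user": user_input,
            "agent": agent_response,
        })

    def get_recent_context(self) -> str:
        if not self.turns:
            return "No previous conversation history."
        lines = ["Recent conversation history:"]
        for turn in self.turns:
            lines.append(f"Turn {turn['turn_number']} - User: {turn['user']}")
            lines.append(f"Turn {turn['turn_number']} - Agent: {turn['agent']}")
        return "\n".join(lines)


memory = BoundedEpisodicMemory(max_turns=2)
for n in range(1, 4):
    memory.store_interaction(f"question {n}", f"answer {n}")

# Only turns 2 and 3 remain; turn 1 was evicted.
print(memory.get_recent_context())
```

The trade-off is that evicted turns are gone for good, so a summary of older turns would have to be stored separately if long-range recall matters.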


## 🔄 Enhanced Agent with Memory

In [47]:
class MemoryEnabledAgent(CourseAssistantAgent):
    """
    Enhanced agent with episodic memory capabilities
    
    This agent demonstrates how memory integration transforms
    agent behavior from stateless to stateful.
    """
    
    def __init__(self):
        super().__init__()
        self.memory = EpisodicMemory()
        print("🧠 Memory-enabled agent initialized!")
    
    def decide_and_act(self, user_input: str) -> str:
        """
        Enhanced decision cycle with memory integration
        
        Key difference from base agent: includes conversation context
        in reasoning and stores interactions for future use.
        """
        
        if not user_input.strip():
            return "I'd be happy to help! Please ask me a question about the course content."
        
        # OBSERVE: Analyze input AND retrieve relevant memory
        print(f"🔍 Agent observing: '{user_input}'")
        conversation_context = self.memory.get_recent_context(max_turns=3)
        print(f"🧠 Agent remembering: {len(self.memory.conversations)} previous turns")
        
        # PLAN: Same simple plan for now, but with memory context
        print("📋 Agent planning: Will search content and provide response with conversation context")
        
        # ACT: Execute plan with memory integration
        
        # Step 1: Search for relevant content
        print("⚡ Agent acting: Searching course content...")
        search_results = self.tools["search_content"](user_input, max_results=3)
        
        # Step 2: Generate response with memory context
        print("⚡ Agent acting: Generating response with content and conversation context...")
        
        response_prompt = f"""You are a helpful course assistant for an AI/ML education program.

Student's current question: "{user_input}"

Conversation context:
{conversation_context}

Relevant course content:
{search_results}

Provide a helpful answer that:
1. Addresses the current question using the course content
2. References previous conversation when relevant
3. Builds on topics we've already discussed
4. Maintains conversational continuity

If the question references previous discussion, acknowledge that connection.
Keep your response clear and educational (2-3 paragraphs maximum)."""
        
        main_response = self._call_llm(response_prompt, max_tokens=400)
        
        # Step 3: Store interaction in memory
        self.memory.store_interaction(user_input, main_response)
        
        return main_response

# Create the memory-enabled agent
memory_agent = MemoryEnabledAgent()
print("\n✅ Memory-enabled agent ready!")

🤖 Course Assistant Agent initialized!
   Available tools: ['search_content']
🧠 Episodic memory initialized
🧠 Memory-enabled agent initialized!

✅ Memory-enabled agent ready!


## 🧪 Test Memory Functionality

In [48]:
print("🧪 Testing Memory-Enabled Agent\n")

# Simulate a conversation sequence
conversation = [
    "What is prompt engineering?",
    "Can you elaborate on that?",
    "What techniques did we just discuss?",
    "How does this relate to what we talked about earlier?"
]

for i, question in enumerate(conversation, 1):
    print(f"Turn {i} - User: {question}")
    print("-" * 40)
    response = memory_agent.decide_and_act(question)
    print(f"Agent: {response[:200]}..." if len(response) > 200 else f"Agent: {response}")
    print("\n" + "="*60 + "\n")

print(f"📊 Final Memory Summary: {memory_agent.memory.get_conversation_summary()}")

🧪 Testing Memory-Enabled Agent

Turn 1 - User: What is prompt engineering?
----------------------------------------
🔍 Agent observing: 'What is prompt engineering?'
🧠 Agent remembering: 0 previous turns
📋 Agent planning: Will search content and provide response with conversation context
⚡ Agent acting: Searching course content...
⚡ Agent acting: Generating response with content and conversation context...
💾 Stored interaction #1
Agent: Let me explain prompt engineering in a clear and accessible way.

Prompt engineering is the art and science of effectively communicating with AI systems through carefully crafted inputs called prompts...


Turn 2 - User: Can you elaborate on that?
----------------------------------------
🔍 Agent observing: 'Can you elaborate on that?'
🧠 Agent remembering: 1 previous turns
📋 Agent planning: Will search content and provide response with conversation context
⚡ Agent acting: Searching course content...
⚡ Agent acting: Generati

**🎉 Section 4 Complete!**

You now have:
- ✅ An agent with episodic memory
- ✅ Understanding of memory integration patterns
- ✅ Experience with stateful conversation management
- ✅ Foundation ready for the next transformation

---

# Section 5: LLM-Controlled Follow-up Questions (25 minutes)

Now comes the exciting transformation! We'll add a new capability - **follow-up question generation** - but this time, we'll let the **LLM decide** when to use it.

## 🎯 The Key Insight

**So far:** Our agent has followed **developer-defined workflows** (search → respond)  
**Now:** The agent will make **autonomous decisions** about when to generate follow-up questions

This is the transformation from **"Intelligent Workflow"** to **"Autonomous Agent"**!

## 🤔 Tool 2: Follow-up Question Generator

This tool generates relevant follow-up questions to enhance learning. It's a **generative tool** that demonstrates the **"LLM-as-tool"** pattern.

In [None]:
def generate_followup_questions_tool(current_topic: str, conversation_context: str = "") -> str:
    """
    Generate relevant follow-up questions to enhance learning
    
    Args:
        current_topic: The topic being discussed
        conversation_context: Recent conversation for context
        
    Returns:
        Formatted follow-up questions
    """
    
    if not current_topic.strip():
        return "Error: Topic cannot be empty for question generation"
    
    try:
        # Create prompt for follow-up question generation
        prompt = f"""You are an educational assistant helping students explore AI and machine learning concepts more deeply.

Based on the current topic '{current_topic}' and this conversation context:
{conversation_context}

Generate 3 thoughtful follow-up questions that would help a student:
1. Deepen their understanding of this topic
2. Connect it to other course concepts (LLMs, prompt engineering, agents)
3. Apply it practically or think about real-world implications

Make the questions specific, engaging, and educational. Format as:
🤔 Question 1: ...
🤔 Question 2: ...
🤔 Question 3: ...

Only return the questions, nothing else."""

        # Create the request body
        request_body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 300,
            "messages": [
                {
                    "role": "user",
                    "content": prompt
                }
            ],
            "temperature": 0.7
        }

        # Call Claude for question generation
        response = bedrock_client.invoke_model(
            modelId=LLM_MODEL,
            body=json.dumps(request_body),
            contentType='application/json'
        )
        
        response_body = json.loads(response['body'].read())
        questions = response_body['content'][0]['text'].strip()
        
        return questions
        
    except Exception as e:
        return f"Error generating follow-up questions: {str(e)}"

print("✅ Follow-up question generator tool implemented!")
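
The request body built in this tool has the same shape as the one in `_call_llm`. One way to avoid the duplication is a small builder function. This is a sketch under the assumption that both call sites keep using the single-user-message format shown in this lab. `build_claude_request` is a hypothetical name, and the function only constructs JSON without contacting Bedrock.

```python
import json


def build_claude_request(
    prompt: str,
    max_tokens: int = 500,
    temperature: float = 0.1,
) -> str:
    """Serialize a single-turn request body in the format used in this lab."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return json.dumps(body)


# The follow-up tool would pass its own settings (300 tokens, temperature 0.7).
payload = build_claude_request("Generate 3 questions", max_tokens=300, temperature=0.7)
decoded = json.loads(payload)
print(decoded["max_tokens"], decoded["messages"][0]["role"])
```

The returned string could be passed as the `body` argument of `invoke_model`, which keeps the low temperature for answers and the higher one for question generation visible at each call site.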

## 🧪 Test the Follow-up Question Tool

In [None]:
# Test the follow-up question generator
print("🤔 Testing Follow-up Question Generator\n")

test_topic = "agent memory systems"
test_context = "We just discussed how agents use working memory and episodic memory"

followup_questions = generate_followup_questions_tool(test_topic, test_context)

print(f"Topic: {test_topic}")
print(f"Context: {test_context}")
print("=" * 50)
print(followup_questions)

## 🧠 LLM-Controlled Decision Making

Here's where the magic happens! Instead of hardcoded `if/else` logic, we'll let Claude decide when follow-up questions would be valuable.
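
A practical detail when an LLM returns a YES/NO decision is that the reply may carry extra punctuation or words (for example `Yes.`). The sketch below shows one tolerant way to parse such a reply and to fall back to NO when the reply is unclear. `parse_yes_no` and the sample replies are illustrative assumptions, and no model is called.

```python
import re


def parse_yes_no(reply: str, default: bool = False) -> bool:
    """Map a free-text model reply to a boolean decision.

    Looks at the first word only, ignoring case and punctuation.
    Anything other than a leading YES or NO yields `default`.
    """
    match = re.match(r"\s*([A-Za-z]+)", reply)
    if not match:
        return default
    first_word = match.group(1).upper()
    if first_word == "YES":
        return True
    if first_word == "NO":
        return False
    return default


for reply in ["YES", "yes.", "No, not now", "Error calling LLM: timeout", ""]:
    print(repr(reply), "->", parse_yes_no(reply))
```

Defaulting to NO means a failed or ambiguous decision call simply skips the follow-up questions, so the main answer is still delivered.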

In [None]:
class AutonomousAgent(MemoryEnabledAgent):
    """
    Truly autonomous agent with LLM-controlled decision making
    
    Key difference: The LLM decides when and how to use tools,
    rather than following predetermined patterns.
    """
    
    def __init__(self):
        super().__init__()
        # Add the follow-up questions tool
        self.tools["generate_followup_questions"] = generate_followup_questions_tool
        print("🚀 Autonomous agent initialized!")
        print(f"   Available tools: {list(self.tools.keys())}")
    
    def _should_offer_followup_questions(self, user_input: str, agent_response: str, conversation_context: str) -> bool:
        """
        LLM decides whether to offer follow-up questions
        
        This is the key transformation - decision making by LLM, not hardcoded logic!
        """
        
        try:
            decision_prompt = f"""You are helping decide when to offer educational follow-up questions.
            
User asked: "{user_input}"
Agent responded: "{agent_response[:200]}..."
Conversation context: {conversation_context}

Should I offer follow-up questions? Consider:
- Was the question clearly answered with good content?
- Is the user exploring/learning vs seeking quick facts?
- Would deeper questions enhance understanding?
- Is this a natural learning moment?
- Are we early enough in the conversation that questions would be welcome?

Respond with just: YES or NO"""
            
            decision = self._call_llm(decision_prompt, max_tokens=10, temperature=0.1).strip().upper()
            print(f"🤖 LLM Decision: {decision}")
            # Accept replies such as "YES." or "YES, because..." instead of requiring an exact match
            return decision.startswith("YES")
            
        except Exception as e:
            print(f"⚠️ Decision LLM failed: {e}, defaulting to NO")
            return False  # Graceful degradation
    
    def decide_and_act(self, user_input: str) -> str:
        """
        Autonomous decision cycle with LLM-controlled orchestration
        
        The LLM now participates in deciding the workflow!
        """
        
        if not user_input.strip():
            return "I'd be happy to help! Please ask me a question about the course content."
        
        # OBSERVE: Analyze input and retrieve memory
        print(f"🔍 Agent observing: '{user_input}'")
        conversation_context = self.memory.get_recent_context(max_turns=3)
        print(f"🧠 Agent remembering: {len(self.memory.conversations)} previous turns")
        
        # Check for explicit follow-up requests first
        followup_triggers = ["follow up", "what else", "what next", "more questions", "dig deeper"]
        explicit_followup = any(trigger in user_input.lower() for trigger in followup_triggers)
        
        if explicit_followup:
            print("📋 Agent planning: User explicitly requested follow-up questions")
            if self.memory.conversations:
                last_topic = self.memory.conversations[-1]['user']
                result = self.tools["generate_followup_questions"](last_topic, conversation_context)
                self.memory.store_interaction(user_input, result)
                return f"Here are some follow-up questions based on our conversation:\n\n{result}"
            else:
                return "I'd be happy to generate follow-up questions! What topic would you like to explore further?"
        
        # PLAN: Standard workflow - search and respond
        print("📋 Agent planning: Will search content, respond, then LLM will decide about follow-up questions")
        
        # ACT: Execute standard workflow
        
        # Step 1: Search for relevant content
        print("⚡ Agent acting: Searching course content...")
        search_results = self.tools["search_content"](user_input, max_results=3)
        
        # Step 2: Generate main response
        print("⚡ Agent acting: Generating main response...")
        response_prompt = f"""You are a helpful course assistant for an AI/ML education program.

Student's current question: "{user_input}"

Conversation context:
{conversation_context}

Relevant course content:
{search_results}

Provide a helpful answer that:
1. Addresses the current question using the course content
2. References previous conversation when relevant
3. Builds on topics we've already discussed
4. Maintains conversational continuity

Keep your response clear and educational (2-3 paragraphs maximum)."""
        
        main_response = self._call_llm(response_prompt, max_tokens=400)
        
        # Step 3: LLM DECIDES whether to offer follow-up questions
        print("🤖 Agent consulting LLM: Should I offer follow-up questions?")
        should_offer_followup = self._should_offer_followup_questions(user_input, main_response, conversation_context)
        
        final_response = main_response
        
        if should_offer_followup:
            print("⚡ Agent acting: Generating follow-up questions based on LLM decision...")
            followup_questions = self.tools["generate_followup_questions"](user_input, conversation_context)
            final_response = f"{main_response}\n\n---\n💡 **Want to explore further?**\n{followup_questions}"
        
        # Step 4: Store interaction in memory
        self.memory.store_interaction(user_input, final_response)
        
        return final_response

# Create the autonomous agent
autonomous_agent = AutonomousAgent()
print("\n✅ Autonomous agent ready!")
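
`_should_offer_followup_questions` turns a free-form model reply into a boolean. Replies do not always come back as a bare `YES` or `NO`, so one option is to normalize the first word before comparing and to fall back to a default for anything else. This is a sketch under that assumption; the helper name `parse_yes_no` is hypothetical and not part of the lab.

```python
def parse_yes_no(raw_decision: str, default: bool = False) -> bool:
    """Map a free-form LLM reply to a boolean, falling back to `default`."""
    words = raw_decision.strip().split()
    if not words:
        return default
    # Drop surrounding punctuation, quotes, and markdown emphasis from the first word
    first_word = words[0].strip(".,!:;\"'*").upper()
    if first_word == "YES":
        return True
    if first_word == "NO":
        return False
    return default

for reply in ["YES", "Yes.", "no, this was a quick fact", "Maybe", ""]:
    print(repr(reply), parse_yes_no(reply))
```

Defaulting to `False` matches the graceful-degradation choice in the agent: when the decision is unclear, no follow-up questions are offered.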

## 🧪 Test Autonomous Decision Making

In [None]:
print("🚀 Testing Autonomous Agent\n")

test_questions = [
    "What is the difference between agents and simple LLM applications?",
    "How do vector embeddings enable semantic search?",
    "What is 2+2?",  # Simple factual question
    "Can you explain how agent memory works in more detail?"
]

for i, question in enumerate(test_questions, 1):
    print(f"Test {i} - User: {question}")
    print("\n" + "="*70 + "\n")
    
    response = autonomous_agent.decide_and_act(question)
    print(f"Agent: {response}")
    print("\n" + "-"*70 + "\n")

print(f"📊 Conversation Summary: {autonomous_agent.memory.get_conversation_summary()}")

## 🔍 What Just Happened?

**Key Transformation:** The agent now uses **LLM reasoning** to decide when to offer follow-up questions!

- **Before:** Hardcoded logic (`if user_wants_followup:...`)
- **Now:** LLM evaluates context and makes autonomous decisions

**Notice:**
- The LLM considers conversation context, question complexity, and learning value
- Decisions aren't predetermined - they emerge from reasoning
- The agent adapts its behavior based on the situation

**This is the essence of autonomous agency!**

## 🎯 Interactive Demo: Experience True Agency

Try conversing with the autonomous agent to see LLM-controlled decision making in action:

In [None]:
def autonomous_agent_demo():
    """Interactive demo with the autonomous agent"""
    print("🚀 Autonomous Agent Demo")
    print("=" * 40)
    print("Ask questions and watch the agent make autonomous decisions:")
    print("  • Complex learning questions (likely to trigger follow-ups)")
    print("  • Simple factual queries (likely won't trigger follow-ups)")
    print("  • Explicitly request: 'what else should I know?'")
    print("\nType 'memory' to see conversation history")
    print("Type 'quit' to exit\n")
    
    while True:
        user_input = input("You: ").strip()
        
        if user_input.lower() in ['quit', 'exit', 'stop']:
            print("\n📊 Final conversation summary:")
            print(autonomous_agent.memory.get_conversation_summary())
            print("👋 Thanks for testing the autonomous agent!")
            break
        
        if user_input.lower() == 'memory':
            print("\n🧠 Current Memory State:")
            print(autonomous_agent.memory.get_recent_context())
            continue
        
        if not user_input:
            continue
        
        print(f"\nAgent: {autonomous_agent.decide_and_act(user_input)}")
        print("\n" + "-"*60 + "\n")

# Uncomment to run the interactive demo
# autonomous_agent_demo()

print("💡 Uncomment the line above to try the autonomous agent!")
print("   Watch for the LLM decision messages to see autonomous reasoning in action.")

**🎉 Section 5 Complete!**

You now have:
- ✅ Experience with LLM-controlled decision making
- ✅ A truly autonomous agent that reasons about tool usage
- ✅ Understanding of the workflow → agent transformation
- ✅ Hands-on experience with emergent vs predetermined behavior

---

# Section 6: Workflow vs Agent Reflection (15 minutes)

Let's reflect on the transformation you just experienced - from hardcoded workflow to autonomous agent.

## 🔍 What We Built: The Evolution

You've built **three different systems** in this lab, each with increasing levels of agency:

## 📊 The Agency Spectrum: Your Journey

| Stage | System | Decision Making | Tool Usage | Behavior |
|-------|--------|----------------|------------|----------|
| **1** | Basic Agent (Section 3) | Developer-defined workflow | Always search → respond | Predictable, reliable |
| **2** | Memory Agent (Section 4) | Developer-defined + context | Search → respond + memory | Context-aware, predictable |
| **3** | Autonomous Agent (Section 5) | **LLM-controlled decisions** | **Dynamic tool selection** | **Adaptive, emergent** |

## 🎯 The Key Transformation

**The critical difference isn't the tools or memory - it's WHO makes the decisions:**

### **Intelligent Workflow (Stages 1-2):**
```python
# Developer writes the decision logic
if explicit_followup_request:
    generate_followup_questions()
else:
    search_and_respond()
```

### **Autonomous Agent (Stage 3):**
```python
# LLM makes the decision based on context
should_offer = llm_decides_based_on_context(user_input, response, history)
if should_offer:
    generate_followup_questions()
```
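
The two snippets above are schematic. A runnable version of the same contrast can be sketched with a stand-in for the model call. Everything named here (`workflow_should_offer`, `agent_should_offer`, `fake_llm`) is hypothetical, and `fake_llm` only imitates a model so the example runs offline.

```python
from typing import Callable

def workflow_should_offer(user_input: str) -> bool:
    """Developer-controlled: a fixed keyword rule makes the decision."""
    triggers = ["follow up", "what else", "dig deeper"]
    return any(trigger in user_input.lower() for trigger in triggers)

def agent_should_offer(user_input: str, llm: Callable[[str], str]) -> bool:
    """LLM-controlled: the model's reply makes the decision."""
    prompt = f'User asked: "{user_input}"\nOffer follow-up questions? Reply YES or NO.'
    return llm(prompt).strip().upper().startswith("YES")

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: declines for trivial arithmetic."""
    return "NO" if "2+2" in prompt else "YES"

for question in ["Explain agent memory", "What is 2+2?"]:
    print(question, workflow_should_offer(question), agent_should_offer(question, fake_llm))
```

The keyword rule never fires for either question, while the model-backed decision can differ per input. That difference is the point of the comparison, not the specific behavior of the stub.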

## 🧠 Key Insights from Your Journey

### **1. Agency is a Spectrum**
There's no binary "agent vs not-agent" - it's about the degree of autonomous decision-making.

### **2. Both Approaches Have Value**
- **Intelligent Workflows**: Predictable, debuggable, reliable - perfect for many production use cases
- **Autonomous Agents**: Adaptive, context-aware, capable of handling novel situations

### **3. The Foundation Matters**
Tools, memory, and orchestration patterns are essential for BOTH workflows and agents. You learned the building blocks that power all modern AI applications.

### **4. LLM Orchestration is Powerful**
When you let the LLM participate in decision-making, you get emergent behaviors that you didn't explicitly program.

## 🚀 Looking Forward: Production Considerations

**In real-world applications, you might choose:**

### **Intelligent Workflows When:**
- Predictability is crucial (financial transactions, healthcare)
- Debugging and auditing are essential
- Performance and cost optimization are priorities
- The workflow is well-defined and stable

### **Autonomous Agents When:**
- Handling diverse, unpredictable user needs
- Personalization and adaptation are important
- The problem space is complex and evolving
- You want emergent capabilities

### **Hybrid Approaches (Common in Production):**
- Critical decisions: Developer-controlled
- Creative decisions: LLM-controlled
- Safety nets: Always developer-controlled fallbacks
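
One way to picture the hybrid pattern is a decision function with three layers: a developer-written rule that runs first, an LLM-backed decision for everything else, and a developer-controlled fallback if the model call fails. The sketch below is illustrative only. `BLOCKED_TOPICS`, `hybrid_should_offer`, and both stub deciders are assumptions, not part of the lab.

```python
from typing import Callable

# Hypothetical topics where the developer rule always wins
BLOCKED_TOPICS = ("payment", "diagnosis")

def hybrid_should_offer(user_input: str, llm_decide: Callable[[str], bool]) -> bool:
    """Combine a fixed rule, an LLM-backed decision, and a safe fallback."""
    # Critical decision: developer-controlled, checked before any model call
    if any(topic in user_input.lower() for topic in BLOCKED_TOPICS):
        return False
    # Creative decision: delegated to the LLM-backed callable
    try:
        return bool(llm_decide(user_input))
    except Exception:
        # Safety net: any failure falls back to the conservative answer
        return False

def always_yes(user_input: str) -> bool:
    """Stub decider that always says yes."""
    return True

def failing_llm(user_input: str) -> bool:
    """Stub decider that simulates a failed model call."""
    raise RuntimeError("simulated model outage")

print(hybrid_should_offer("Tell me about agents", always_yes))
print(hybrid_should_offer("Process my payment", always_yes))
print(hybrid_should_offer("Tell me about agents", failing_llm))
```

Putting the rule before the model call keeps the critical path predictable and avoids paying for an LLM call whose answer would be overridden anyway.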

## 🎉 What You've Accomplished

**Foundational Skills:**
- ✅ Tool design and integration patterns
- ✅ Memory systems and context management
- ✅ Agent orchestration and decision cycles
- ✅ Error handling and graceful degradation

**Advanced Concepts:**
- ✅ LLM-controlled decision making
- ✅ Emergent vs predetermined behavior
- ✅ The spectrum of agency in AI systems
- ✅ Production considerations for different approaches

**Real-World Readiness:**
- ✅ Understanding when to use workflows vs agents
- ✅ Ability to build both patterns effectively
- ✅ Foundation for advanced agent architectures
- ✅ Critical thinking about agency and autonomy

## 🌟 The Bigger Picture

You've experienced firsthand the evolution from **programmed behavior** to **emergent intelligence**. This mirrors the broader transformation happening in AI:

- **Traditional Software**: Explicit programming for every scenario
- **Intelligent Workflows**: LLMs handle language understanding, developers handle logic
- **Autonomous Agents**: LLMs participate in reasoning about what to do next

**You're now equipped to build AI applications across this entire spectrum!**

---

**🎉 Congratulations on completing the AI Agents Foundations Lab!**

You've not just learned to build agents - you've experienced the transformation from workflow automation to autonomous intelligence. This understanding will serve you well as AI systems continue to evolve toward greater autonomy and capability.