# Advanced AI Agents `Foundations` Laboratory


- [Introduction to Agentic AI: Theory and Practice](#introduction-to-agentic-ai-theory-and-practice)
- [What is an AI Agent](#what-is-an-ai-agent)
- [Anthropic's Framework: Workflows vs Agents](#anthropics-framework-workflows-vs-agents)
- [The 5 Fundamental Workflow Patterns](#the-5-fundamental-workflow-patterns)
    - [1. Prompt Chaining](#1-prompt-chaining)
    - [2. Routing](#2-routing)
    - [3. Parallelization](#3-parallelization)
    - [4. Orchestrator-Worker](#4-orchestrator-worker)
    - [5. Evaluator-Optimizer (Validation Loop)](#5-evaluator-optimizer-validation-loop)
- [Refactoring examples](#refactoring)

In [9]:
from dotenv import load_dotenv
load_dotenv(override=True)

import os
openai_api_key = os.getenv('OPENAI_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set - please head to the troubleshooting guide in the setup folder")

from openai import OpenAI
openai = OpenAI()

OpenAI API Key exists and begins sk-proj-


## Introduction to Agentic AI Theory and Practice

This notebook demonstrates comprehensive AI agent capabilities through four progressive laboratories, integrating theoretical concepts with practical implementations. We'll explore the **5 fundamental Workflow Patterns** and understand how they form the building blocks of agentic systems.

**Core Learning Objectives:**
- Master the 5 fundamental Workflow Patterns through practical implementation
- Differentiate between Workflows (predefined) and Agents (dynamic)
- Implement and compare a multi-model architecture
- Evaluate responses automatically with structured validation
- Integrate tools and apply the patterns to real-world deployment

---

## What is an AI Agent?

According to Hugging Face's definition:
> "AI agents are programs where LLM outputs control the workflow"

This means the output of a language model determines which tasks are executed and in what order.

**Hallmarks of Agentic AI:**
1. **Multiple LLM calls** - Like our multi-model comparison system
2. **Tool use** - LLMs executing external functions (time, weather)
3. **LLM communication** - Models passing information to one another
4. **Planning** - An LLM acting as a planner to coordinate tasks
5. **Autonomy** - The system has freedom to choose how to proceed

**Autonomy** is often seen as the key element: when a model chooses how to respond or which path to take, it is exercising autonomy.
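To make "LLM outputs control the workflow" concrete, here is a minimal sketch in which a model's reply selects which function runs. Everything in it is illustrative: `fake_llm`, the two stand-in tools, and the `TOOLS` table are hypothetical names, and the model call is replaced by a keyword check so the example runs without an API key.

```python
from typing import Callable, Dict


def get_time() -> str:
    """Stand-in tool; a real one would read the system clock."""
    return "12:00"


def get_weather() -> str:
    """Stand-in tool; a real one would call a weather service."""
    return "sunny"


# Hypothetical registry mapping the labels a model may return to callables.
TOOLS: Dict[str, Callable[[], str]] = {"TIME": get_time, "WEATHER": get_weather}


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: returns the name of a tool."""
    return "WEATHER" if "weather" in prompt.lower() else "TIME"


def run_step(user_input: str, llm: Callable[[str], str] = fake_llm) -> str:
    # The model's text output, not hard-coded branching, selects what runs next.
    choice = llm(user_input).strip().upper()
    tool = TOOLS.get(choice)
    if tool is None:
        return f"No tool named {choice!r}"
    return tool()


print(run_step("What's the weather like?"))
```

Swapping `fake_llm` for a real chat-completion call would leave `run_step` unchanged, which is the point of the definition: the program only dispatches on whatever the model returns.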

---

## Anthropic's Framework: Workflows vs Agents

Anthropic categorizes `agentic systems` into two types:

### **Workflows (Predefined Orchestration):**
- Structured, predictable execution paths
- Defined sequences of model and tool interactions
- Clear guardrails and control mechanisms
- **Our Labs 1-4 demonstrate these patterns**

### **Agents (Dynamic Control):**
- Models dynamically control tools and task flow
- Open-ended, iterative loops with feedback
- Less predictable but more flexible
- Will be explored in future weeks
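The distinction can be sketched in a few lines. This is an illustration under simplifying assumptions, not Anthropic's reference code: `run_workflow`, `run_agent_loop`, and `decide` are hypothetical names, and plain string functions stand in for model-backed steps.

```python
from typing import Callable, Dict, List


def run_workflow(text: str, steps: List[Callable[[str], str]]) -> str:
    """Workflow: the code fixes the order of the steps in advance."""
    for step in steps:
        text = step(text)
    return text


def run_agent_loop(text: str,
                   decide: Callable[[str], str],
                   actions: Dict[str, Callable[[str], str]],
                   max_steps: int = 5) -> str:
    """Agent: a decision function (an LLM in practice) picks the next action
    each turn, until it answers DONE or the step budget runs out."""
    for _ in range(max_steps):
        choice = decide(text)
        if choice == "DONE":
            break
        text = actions[choice](text)
    return text


# Hypothetical stand-ins for model-backed actions.
actions = {"TRIM": str.strip, "UPPER": str.upper}


def decide(text: str) -> str:
    """Stand-in for the model's decision: inspect the state, name the next action."""
    if text != text.strip():
        return "TRIM"
    if text != text.upper():
        return "UPPER"
    return "DONE"


print(run_workflow("  hello ", [str.strip, str.upper]))
print(run_agent_loop("  hello ", decide, actions))
```

Both calls produce the same result here, but only the second lets the decision function change the path at run time. The `max_steps` budget is one simple guardrail against an open-ended loop.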

---

## The 5 Fundamental Workflow Patterns

### **1. Prompt Chaining**

![](../img/01.png)

**Concept:** Chain a sequence of LLM calls, each handling a subtask that builds on the previous call's output.
- **Example:** LLM1 suggests business sector → LLM2 identifies pain point → LLM3 recommends solution
- **Our Implementation:** Lab 1 demonstrates basic sequential calls




In [None]:
# Prompt Chaining 
# Simple Prompt (precursor) | A single direct question, no chaining involved

messages = [{"role": "user", 
             "content": "What is 2+2?"}]
# This uses GPT-4o-mini, the cost-effective model
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages
)
print(response.choices[0].message.content)

In [None]:
# Prompt Chaining           
# First step in a chain of linked tasks
question = "Please propose a hard, challenging question to assess someone's IQ. Respond only with the question."
messages = [{"role": "user", "content": question}]
# ask it - this uses GPT-4o-mini, cost-effective but capable
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages
)
question = response.choices[0].message.content
print(question)
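The cell above is only the first link. A chain forms once its output becomes the input of the next call. The sketch below shows that hand-off with a hypothetical helper, `run_chain`, and a stand-in `echo_llm` that tags each prompt so the data flow is visible without calling an API.

```python
from typing import Callable, List


def run_chain(llm: Callable[[str], str], templates: List[str], seed: str) -> List[str]:
    """Run prompts in order; each template's {previous} slot receives the prior output."""
    outputs = []
    previous = seed
    for template in templates:
        previous = llm(template.format(previous=previous))
        outputs.append(previous)
    return outputs


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion call; wraps the prompt in brackets."""
    return f"<{prompt}>"


steps = [
    "Propose a hard question about {previous}.",
    "Answer this question: {previous}",
]
for output in run_chain(echo_llm, steps, "logic puzzles"):
    print(output)
```

With the client from the setup cell, `llm` could be a small wrapper that calls `openai.chat.completions.create(...)` and returns `response.choices[0].message.content`, as the cells above do.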

### **2. Routing**

![](../img/02.png)

**Concept:** An LLM router decides which specialized model should handle a task.
- **Example:** Router evaluates input → sends to specialized LLM1, LLM2, or LLM3
- **Our Implementation:** Model selection logic based on task requirements

In [12]:
# 2. ROUTING PATTERN - Practical Example
# This demonstrates how to intelligently route tasks to different models based on task type

from week1_foundations.models import model_manager
from week1_foundations.agent import run_agent

def route_by_task_type(user_input: str) -> str:
    """Router function that selects the best model based on task type"""
    
    # Create routing logic
    routing_prompt = f"""
    Analyze this user request and classify it into ONE of these categories:
    1. SIMPLE - Basic questions, math, general knowledge
    2. COMPLEX - Analysis and multi-step reasoning
    3. CREATIVE - Writing, storytelling, brainstorming
    
    User request: "{user_input}"
    
    Respond with only: SIMPLE, COMPLEX, or CREATIVE
    """
    
    # Use a fast model for routing decisions
    router_response = model_manager.generate_response(
        "gpt-4o-mini", 
        [{"role": "user", "content": routing_prompt}]
    )
    
    task_type = router_response['content'].strip().upper()
    
    # Route to appropriate model based on classification
    if task_type == "SIMPLE":
        selected_model = "gpt-4o-mini"  # Fast and cost-effective
        reason = "Simple task routed to efficient model"
    elif task_type == "COMPLEX":
        selected_model = "gpt-4o"       # More powerful for complex reasoning
        reason = "Complex task routed to advanced model"
    elif task_type == "CREATIVE":
        selected_model = "gpt-4-turbo"  # Used for creative tasks in this demo
        reason = "Creative task routed to most capable model"
    else:
        selected_model = "gpt-4o-mini"  # Default fallback
        reason = "Unknown task type, using default model"
    
    print(f"🎯 ROUTING DECISION:")
    print(f"   Task Type: {task_type}")
    print(f"   Selected Model: {selected_model}")
    print(f"   Reason: {reason}")
    print("-" * 50)
    
    return selected_model

# Test the routing pattern with different types of questions
test_queries = [
    "What is 25 + 37?",  # SIMPLE
    "Analyze the economic implications of renewable energy adoption in developing countries",  # COMPLEX
    "Write a creative short story about a robot learning to paint"  # CREATIVE
]

for query in test_queries:
    print(f"\n📝 Query: {query}")
    selected_model = route_by_task_type(query)
    
    # Generate response with selected model
    response = run_agent(query, selected_model)
    print(f"✅ Response: {response[:100]}...")
    print("=" * 80)


📝 Query: What is 25 + 37?
🎯 ROUTING DECISION:
   Task Type: SIMPLE
   Selected Model: gpt-4o-mini
   Reason: Simple task routed to efficient model
--------------------------------------------------
✅ Response: 25 + 37 equals 62....

📝 Query: Analyze the economic implications of renewable energy adoption in developing countries
🎯 ROUTING DECISION:
   Task Type: COMPLEX
   Selected Model: gpt-4o
   Reason: Complex task routed to advanced model
--------------------------------------------------
✅ Response: The adoption of renewable energy in developing countries has several significant economic implicatio...

📝 Query: Write a creative short story about a robot learning to paint
🎯 ROUTING DECISION:
   Task Type: CREATIVE
   Selected Model: gpt-4-turbo
   Reason: Creative task routed to most capable model
--------------------------------------------------
✅ Response: In the heart of a bustling city filled with gleaming skyscrapers and neon lights, there was a quaint...


**Routing pattern:**  
This pattern consists of analyzing the nature of a task or input and selecting the best model, agent, or function to solve it, instead of sending all tasks to the same destination.

**How is it different from other patterns?**
- It is not parallelization: the input is not sent to multiple models at once.
- It is not an evaluator/validation loop: the response is not judged afterwards.
- It is not plain prompt chaining: the router's output selects which model runs next rather than feeding a fixed sequence of steps.

It is routing because an intermediate decision determines the path the task will follow.

**For more advanced routing, you could:**
- Use more sophisticated rules or a more powerful model for the classification.
- Add logging, metrics, or a fallback if the selected model fails.
- Combine it with an evaluator afterwards (that would be routing plus evaluation).
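Two of these hardening ideas can be sketched briefly: tolerating a router reply that is not an exact label, and falling back to another model when the selected one fails. `parse_label`, `call_with_fallback`, `flaky_call`, and the model names below are hypothetical; they are not part of `week1_foundations`.

```python
import re
from typing import Callable, Sequence

LABELS = ("SIMPLE", "COMPLEX", "CREATIVE")


def parse_label(raw: str, default: str = "SIMPLE") -> str:
    """Return the first known label found in the router's reply, else the default."""
    match = re.search(r"\b(" + "|".join(LABELS) + r")\b", raw.upper())
    return match.group(1) if match else default


def call_with_fallback(prompt: str,
                       models: Sequence[str],
                       call: Callable[[str, str], str]) -> str:
    """Try each model in order and return the first successful reply."""
    last_error = None
    for model in models:
        try:
            return call(prompt, model)
        except Exception as error:  # a real client would catch narrower error types
            last_error = error
    raise RuntimeError("All models failed") from last_error


def flaky_call(prompt: str, model: str) -> str:
    """Hypothetical model call in which the first model is unavailable."""
    if model == "primary-model":
        raise TimeoutError("simulated outage")
    return f"{model}: ok"


print(parse_label("Category: creative."))
print(call_with_fallback("hi", ["primary-model", "backup-model"], flaky_call))
```

An exact comparison such as `task_type == "SIMPLE"` sends a reply like `SIMPLE.` to the default branch; searching for the label inside the reply avoids that at the cost of accepting looser output.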


### **3. Parallelization**

![](../img/03.png)

**Concept:** Break a task down into parallel subtasks sent to multiple LLMs simultaneously.
- **Example:** Same question sent to multiple models → results aggregated
- **Our Implementation:** Lab 2 multi-model comparison system

In [13]:
# 3. PARALLELIZATION PATTERN - Multi-Model Comparison
# This demonstrates running the same task across multiple models simultaneously

import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from week1_foundations.agent import run_agent_with_multiple_models, run_agent

def parallel_analysis_demo():
    """Demonstrate parallelization pattern with challenging questions"""
    
    # Generate a challenging question using our system
    question_prompt = "Create a challenging question that requires reasoning and analysis. Respond only with the question."
    challenging_question = run_agent(question_prompt, "gpt-4o-mini")
    
    print(f"🧠 CHALLENGING QUESTION GENERATED:")
    print(f"   {challenging_question}")
    print("=" * 80)
    
    # Time the parallel execution
    start_time = time.time()
    
    print("\n🚀 EXECUTING PARALLEL PROCESSING...")
    print("   Running same question across all available models simultaneously")
    
    # Use our built-in parallel function
    results = run_agent_with_multiple_models(challenging_question)
    
    end_time = time.time()
    execution_time = end_time - start_time
    
    print(f"\n⏱️ PARALLEL EXECUTION COMPLETED in {execution_time:.2f} seconds")
    print(f"   Models tested: {len(results)}")
    print("-" * 50)
    
    # Display results from each model
    for model_name, result in results.items():
        print(f"\n🤖 {result['model_display']} ({result['provider']}):")
        print(f"   Status: {'✅ Success' if result['success'] else '❌ Failed'}")
        print(f"   Response: {result['response'][:150]}...")
        print("-" * 50)
    
    return results

# Advanced parallel processing with custom task distribution
def custom_parallel_processing():
    """Custom parallel processing with different questions per model"""
    
    # Different questions optimized for different model strengths
    model_tasks = {
        "gpt-4o-mini": "Solve this math problem: If a train travels at 80 km/h for 2.5 hours, how far does it travel?",
        "gpt-4o": "Analyze the philosophical implications of artificial intelligence achieving consciousness",
        "gpt-4-turbo": "Write a creative haiku about technology and nature finding harmony"
    }
    
    print("🎯 SPECIALIZED PARALLEL PROCESSING:")
    print("   Each model gets a task optimized for its strengths")
    print("=" * 60)
    
    # Manual parallel execution using ThreadPoolExecutor
    results = {}
    
    def process_model_task(model_name, task):
        print(f"🔄 Processing {model_name}...")
        start = time.time()
        response = run_agent(task, model_name)
        duration = time.time() - start
        return model_name, {
            'task': task,
            'response': response,
            'duration': duration
        }
    
    # Execute in parallel
    start_time = time.time()
    
    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = [
            executor.submit(process_model_task, model, task)
            for model, task in model_tasks.items()
        ]
        
        for future in as_completed(futures):
            model_name, result = future.result()
            results[model_name] = result
    
    total_time = time.time() - start_time
    
    print(f"\n⚡ SPECIALIZED EXECUTION COMPLETED in {total_time:.2f} seconds")
    print("-" * 60)
    
    # Display specialized results
    for model_name, result in results.items():
        print(f"\n🎯 {model_name}:")
        print(f"   Task: {result['task']}")
        print(f"   Time: {result['duration']:.2f}s")
        print(f"   Response: {result['response'][:100]}...")
        print("-" * 40)
    
    return results

# Run both demonstrations
print("PARALLELIZATION PATTERN DEMONSTRATION")
print("=" * 50)

# Demo 1: Same question, multiple models
demo1_results = parallel_analysis_demo()

print("\n\n" + "=" * 80)
print("ADVANCED PARALLELIZATION DEMO")
print("=" * 80)

# Demo 2: Different questions, specialized models
demo2_results = custom_parallel_processing()

print(f"\n📊 PARALLELIZATION SUMMARY:")
print(f"   Standard Parallel: {len(demo1_results)} models tested")
print(f"   Specialized Parallel: {len(demo2_results)} models with custom tasks")
print(f"   Key Benefit: Concurrent execution for speed and comparison")

PARALLELIZATION PATTERN DEMONSTRATION
🧠 CHALLENGING QUESTION GENERATED:
   If a train leaves a station traveling at 60 miles per hour and another train leaves the same station 30 minutes later traveling at 90 miles per hour, at what distance from the station will the second train catch up to the first train?

🚀 EXECUTING PARALLEL PROCESSING...
   Running same question across all available models simultaneously
Testing with gpt-4o-mini...
Testing with gpt-4o...
Testing with gpt-4-turbo...

⏱️ PARALLEL EXECUTION COMPLETED in 27.28 seconds
   Models tested: 3
--------------------------------------------------

🤖 GPT-4O Mini (openai):
   Status: ✅ Success
   Response: To find the distance from the station where the second train catches up to the first train, we can use the following approach:

1. **Convert the time ...
--------------------------------------------------

🤖 GPT-4O (openai):
   Status: ✅ Success
   Response: To determine the distance at which the second train catches up to th

Both functions (`parallel_analysis_demo` and `custom_parallel_processing`) dispatch multiple tasks to different AI models.
- In the first case, the same question goes to several models through `run_agent_with_multiple_models`, and their responses are compared.
- In the second case, each model receives a different task chosen for it, and all tasks run in parallel.

`custom_parallel_processing` uses `ThreadPoolExecutor` to run the calls concurrently. Because each call spends most of its time waiting on the network, threads let those waits overlap. The main benefits are speed and direct comparison between the responses from different models.

**If you wanted to expand it, you could add**:  
- An Evaluator afterwards (to judge which response is best, thus adding the Evaluator/Validation Loop pattern).
- Some kind of Routing beforehand, to decide which models participate based on the task.
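The fan-out step can also be written with `ThreadPoolExecutor.map`, which returns results in input order and so keeps the aggregation simple. This is a sketch with hypothetical names (`ask_all`, `slow_fake_ask`, `model-a` and so on), and a short sleep stands in for network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def ask_all(question: str,
            models: List[str],
            ask: Callable[[str, str], str]) -> Dict[str, str]:
    """Send one question to every model concurrently and collect replies by model name."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        # map() yields results in the order of `models`, regardless of finish order.
        replies = pool.map(lambda model: ask(question, model), models)
        return dict(zip(models, replies))


def slow_fake_ask(question: str, model: str) -> str:
    """Hypothetical stand-in for an API call; the sleep imitates network latency."""
    time.sleep(0.2)
    return f"{model} answered"


start = time.perf_counter()
results = ask_all("What is 2+2?", ["model-a", "model-b", "model-c"], slow_fake_ask)
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed: {elapsed:.2f}s")
```

With three workers the three 0.2-second waits overlap, so the elapsed time should be close to one wait rather than the sum of all three. `as_completed`, used in the cell above, is the better fit when results should be handled as soon as each one arrives.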

### **4. Orchestrator-Worker**

![](../img/04.png)

**Concept:** An LLM orchestrator decomposes tasks and coordinates multiple worker LLMs.
- **Example:** Orchestrator LLM plans → Worker LLMs execute → Orchestrator combines results
- **Our Implementation:** Comparative analysis system with intelligent coordination

In [15]:
# 4. ORCHESTRATOR-WORKER PATTERN - Advanced Coordination
# This demonstrates an LLM orchestrator managing multiple worker models for complex tasks

from week1_foundations.evaluation import run_comparative_analysis
from week1_foundations.tools import get_current_time, get_weather
import json

class TaskOrchestrator:
    """LLM-powered orchestrator that manages complex multi-step workflows"""
    
    def __init__(self, orchestrator_model: str = "gpt-4o"):
        self.orchestrator_model = orchestrator_model
        self.available_workers = ["gpt-4o-mini", "gpt-4o", "gpt-4-turbo"]
        self.task_history = []
    
    def orchestrate_complex_task(self, user_request: str) -> dict:
        """Orchestrator analyzes request and coordinates multiple workers"""
        
        # Phase 1: Orchestrator analyzes and creates execution plan
        planning_prompt = f"""
        You are an AI task orchestrator. Analyze this complex request and create an execution plan.
        
        User Request: "{user_request}"
        
        Available Worker Models:
        - gpt-4o-mini: Fast, cost-effective for simple tasks
        - gpt-4o: Balanced performance for most tasks  
        - gpt-4-turbo: Most capable for complex/creative tasks
        
        Available Tools:
        - get_current_time(): Gets current system time
        - get_weather(city): Gets weather for a city
        
        Create a JSON execution plan with:
        1. "task_breakdown": List of subtasks needed
        2. "worker_assignments": Which model should handle each subtask
        3. "execution_order": Sequential or parallel execution strategy
        4. "tool_requirements": Which tools are needed
        5. "coordination_strategy": How to combine results
        
        Respond with valid JSON only.
        """
        
        print("🎼 ORCHESTRATOR: Analyzing request and creating execution plan...")
        
        orchestrator_response = model_manager.generate_response(
            self.orchestrator_model,
            [{"role": "user", "content": planning_prompt}]
        )
        
        try:
            execution_plan = json.loads(orchestrator_response['content'])
            print("✅ EXECUTION PLAN CREATED:")
            print(f"   Subtasks: {len(execution_plan.get('task_breakdown', []))}")
            print(f"   Workers assigned: {len(execution_plan.get('worker_assignments', []))}")
            print(f"   Strategy: {execution_plan.get('execution_order', 'sequential')}")
            print("-" * 60)
        except (json.JSONDecodeError, KeyError, TypeError):
            print("❌ Failed to parse execution plan, using fallback")
            execution_plan = self._create_fallback_plan(user_request)
        
        # Phase 2: Execute the plan using worker models
        print("\n👥 WORKERS: Executing assigned tasks...")
        worker_results = self._execute_worker_tasks(execution_plan, user_request)
        
        # Phase 3: Orchestrator integrates all results
        print("\n🔄 ORCHESTRATOR: Integrating worker results...")
        final_result = self._integrate_results(user_request, execution_plan, worker_results)
        
        return {
            'user_request': user_request,
            'execution_plan': execution_plan,
            'worker_results': worker_results,
            'final_result': final_result,
            'orchestrator_model': self.orchestrator_model
        }
    
    def _create_fallback_plan(self, user_request: str) -> dict:
        """Fallback plan if JSON parsing fails"""
        return {
            "task_breakdown": ["Analyze request", "Generate response", "Quality check"],
            "worker_assignments": ["gpt-4o-mini", "gpt-4o", "gpt-4-turbo"],
            "execution_order": "sequential",
            "tool_requirements": [],
            "coordination_strategy": "Best response selection"
        }
    
    def _execute_worker_tasks(self, plan: dict, user_request: str) -> dict:
        """Execute tasks using assigned worker models"""
        results = {}
        
        # For demonstration, we'll use comparative analysis as worker coordination
        print("   Using comparative analysis as worker coordination...")
        analysis = run_comparative_analysis(user_request)
        
        # Extract worker results
        for model_name, response in analysis['responses'].items():
            evaluation = analysis['evaluations'][model_name]
            results[model_name] = {
                'response': response,
                'score': evaluation.score,
                'evaluation': evaluation,
                'assigned_role': f"Worker handling: {plan.get('coordination_strategy', 'general task')}"
            }
            print(f"   ✅ {model_name}: Score {evaluation.score}/10")
        
        return results
    
    def _integrate_results(self, user_request: str, plan: dict, worker_results: dict) -> str:
        """Orchestrator integrates all worker results into final response"""
        
        integration_prompt = f"""
        You are the orchestrator responsible for integrating worker results.
        
        Original Request: "{user_request}"
        
        Execution Plan: {json.dumps(plan, indent=2)}
        
        Worker Results:
        """
        
        for worker, result in worker_results.items():
            integration_prompt += f"\n{worker} (Score: {result['score']}/10):\n{result['response']}\n"
        
        integration_prompt += """
        
        As the orchestrator, integrate these worker results into a comprehensive, high-quality final response.
        Consider the scores and combine the best elements from each worker.
        """
        
        integration_response = model_manager.generate_response(
            self.orchestrator_model,
            [{"role": "user", "content": integration_prompt}]
        )
        
        return integration_response.get('content', 'Integration failed')

# Demonstrate the Orchestrator-Worker pattern
def demonstrate_orchestrator_worker():
    """Full demonstration of orchestrator-worker pattern"""
    
    orchestrator = TaskOrchestrator()
    
    # Complex multi-faceted request that benefits from orchestration
    complex_requests = [
        "Compare the weather in Barcelona and Tokyo, then recommend the best city for a technology conference next week considering both weather and tech industry presence.",
        
        "Analyze the current time, determine what time zone I'm likely in, and suggest the optimal schedule for international video calls with teams in London, Tokyo, and New York.",
        
        "Create a comprehensive travel itinerary that considers current weather conditions in three European capitals and includes both cultural activities and practical logistics."
    ]
    
    for i, request in enumerate(complex_requests, 1):
        print(f"\n{'=' * 100}")
        print(f"ORCHESTRATOR-WORKER DEMONSTRATION #{i}")
        print(f"{'=' * 100}")
        print(f"📋 COMPLEX REQUEST: {request}")
        print("-" * 100)
        
        # Execute orchestrated workflow
        result = orchestrator.orchestrate_complex_task(request)
        
        print(f"\n🎯 FINAL ORCHESTRATED RESULT:")
        print(f"   {result['final_result'][:200]}...")
        print(f"\n📊 ORCHESTRATION SUMMARY:")
        print(f"   Workers used: {len(result['worker_results'])}")
        print(f"   Best worker score: {max(r['score'] for r in result['worker_results'].values())}")
        print(f"   Orchestrator: {result['orchestrator_model']}")
        
        # Show the orchestration added value
        best_individual = max(result['worker_results'].items(), key=lambda x: x[1]['score'])
        print(f"\n🏆 ORCHESTRATION VALUE:")
        print(f"   Best individual worker: {best_individual[0]} (Score: {best_individual[1]['score']}/10)")
        print(f"   Orchestrated response: Combines insights from all {len(result['worker_results'])} workers")
        
        if i < len(complex_requests):
            print(f"\n⏳ Preparing next demonstration...")

# Run the demonstration
demonstrate_orchestrator_worker()

print(f"\n🎼 ORCHESTRATOR-WORKER PATTERN COMPLETE")
print(f"   Key Benefits: Task decomposition, intelligent coordination, result integration")
print(f"   Autonomy Level: HIGH - Orchestrator makes complex coordination decisions")
print(f"   Real-world Applications: Project management, research workflows, multi-specialist systems")


ORCHESTRATOR-WORKER DEMONSTRATION #1
📋 COMPLEX REQUEST: Compare the weather in Barcelona and Tokyo, then recommend the best city for a technology conference next week considering both weather and tech industry presence.
----------------------------------------------------------------------------------------------------
🎼 ORCHESTRATOR: Analyzing request and creating execution plan...
❌ Failed to parse execution plan, using fallback

👥 WORKERS: Executing assigned tasks...
   Using comparative analysis as worker coordination...
Generating response with gpt-4o-mini...
Generating response with gpt-4o...
Generating response with gpt-4-turbo...
Comparing all responses...
   ✅ gpt-4o-mini: Score 7/10
   ✅ gpt-4o: Score 7/10
   ✅ gpt-4-turbo: Score 7/10

🔄 ORCHESTRATOR: Integrating worker results...

🎯 FINAL ORCHESTRATED RESULT:
   Based on the analysis provided by our workers, here is a comprehensive evaluation and recommendation for hosting a technology conference next week, considering both
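In the run above the orchestrator's plan could not be parsed, so the fallback plan was used. The output does not show why. One common cause, assumed here rather than confirmed, is a reply that wraps the JSON in a markdown code fence or adds text around it. The hypothetical helper `parse_plan` below tolerates both cases before handing the text to `json.loads`.

```python
import json
import re

FENCE = "`" * 3  # the three-backtick marker, built this way to keep the example readable


def parse_plan(raw: str) -> dict:
    """Parse a JSON object from a model reply, tolerating code fences and surrounding text."""
    text = raw.strip()
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        # Keep only what sits between the opening and closing fence.
        text = fenced.group(1).strip()
    else:
        # Otherwise keep the span from the first "{" to the last "}".
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            text = text[start:end + 1]
    plan = json.loads(text)  # still raises json.JSONDecodeError on invalid JSON
    if not isinstance(plan, dict):
        raise ValueError("expected a JSON object")
    return plan


reply = FENCE + 'json\n{"execution_order": "parallel"}\n' + FENCE
print(parse_plan(reply))
```

The Chat Completions API also offers a JSON mode through `response_format={"type": "json_object"}`. Whether `model_manager.generate_response` can pass that option through is not shown in this notebook, so a tolerant parser is the safer assumption.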

**Pattern References**
In the literature on agent systems and LLMs, this pattern is commonly referred to as the **Orchestrator-Worker Pattern** or **Manager-Worker Pattern**.

It is an **agentic design pattern** widely cited in research on **complex task decomposition, workflow orchestration, and collaborative multi-agent systems**.

* The orchestrator analyzes the user's request and breaks it down into subtasks.
* Each worker agent (model) is assigned a specific subtask, often based on its capabilities.
* Workers execute their tasks independently or in parallel.
* The orchestrator collects, evaluates, and integrates the workers' results into a single, high-quality final response.
* The coordination strategy adapts according to task requirements and available resources.


### **5. Evaluator-Optimizer (Validation Loop)**

![](../img/05.png)

**Concept:** Generator LLM proposes solution → Evaluator LLM reviews → Loop until acceptable.
- **Example:** Generator creates response → Evaluator scores → Retry if needed
- **Our Implementation:** Labs 2-4 all demonstrate this critical pattern
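The loop itself is small. The sketch below is not the `week1_foundations` implementation; `Evaluation`, `refine`, and the two stand-in functions are hypothetical, and the threshold of 7 is an arbitrary choice for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Evaluation:
    score: int      # 0-10, higher is better
    feedback: str   # what the generator should change on the next attempt


def refine(task: str,
           generate: Callable[[str, str], str],
           evaluate: Callable[[str, str], Evaluation],
           threshold: int = 7,
           max_attempts: int = 3) -> Tuple[str, Evaluation, int]:
    """Generate, evaluate, and retry with the evaluator's feedback until the score passes."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        draft = generate(task, feedback)
        evaluation = evaluate(task, draft)
        if evaluation.score >= threshold:
            break
        feedback = evaluation.feedback  # feed the critique into the next attempt
    return draft, evaluation, attempt


def fake_generate(task: str, feedback: str) -> str:
    """Hypothetical generator: improves only once it has received feedback."""
    return task + (" (with an example)" if feedback else "")


def fake_evaluate(task: str, draft: str) -> Evaluation:
    """Hypothetical evaluator: rewards drafts that mention an example."""
    if "example" in draft:
        return Evaluation(9, "clear")
    return Evaluation(4, "add an example")


draft, evaluation, attempts = refine("Explain recursion", fake_generate, fake_evaluate)
print(attempts, evaluation.score, draft)
```

Passing the feedback back into `generate` is what separates this pattern from a plain retry: the second attempt is conditioned on the evaluator's critique rather than sampled again from the same prompt.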

In [None]:
# 4. ORCHESTRATOR-WORKER PATTERN - Simple Coordination Example  
# This demonstrates basic orchestration where one LLM coordinates multiple tasks

def simple_orchestrator_demo():
    """Simple demonstration of orchestrator-worker pattern"""
    
    print("🎼 SIMPLE ORCHESTRATOR-WORKER DEMONSTRATION")
    print("=" * 60)
    
    # Complex request that benefits from coordination
    complex_request = "Plan a weekend trip to Barcelona including weather, activities, and budget"
    print(f"📋 Complex Request: {complex_request}")
    print("-" * 60)
    
    # PHASE 1: Orchestrator creates a plan
    print("\n🎼 ORCHESTRATOR: Creating execution plan...")
    
    planning_prompt = f"""
    You are a task orchestrator. Break down this request into 3 specific subtasks:
    "{complex_request}"
    
    List exactly 3 subtasks, each on a separate line starting with "Task X:"
    """
    
    planning_messages = [{"role": "user", "content": planning_prompt}]
    plan_response = openai.chat.completions.create(
        model="gpt-4o",
        messages=planning_messages,
        max_tokens=150
    ).choices[0].message.content
    
    print(f"   Execution Plan:")
    print(f"   {plan_response}")
    
    # PHASE 2: Workers execute individual tasks
    print(f"\n👥 WORKERS: Executing individual tasks...")
    
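    # Derive worker tasks from the orchestrator's plan: keep lines that start with
    # "Task" and drop the "Task X:" prefix. Fall back to fixed tasks otherwise.
    worker_tasks = [
        line.split(":", 1)[1].strip()
        for line in plan_response.splitlines()
        if line.strip().lower().startswith("task") and ":" in line
    ]
    if len(worker_tasks) != 3:
        worker_tasks = [
            "Check the weather forecast for Barcelona this weekend",
            "Suggest 3 top tourist activities in Barcelona",
            "Estimate budget for a weekend trip to Barcelona"
        ]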
    
    worker_results = {}
    
    for i, task in enumerate(worker_tasks, 1):
        print(f"\n   Worker {i} executing: {task}")
        
        task_messages = [{"role": "user", "content": task}]
        worker_response = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=task_messages,
            max_tokens=100
        ).choices[0].message.content
        
        worker_results[f"Worker_{i}"] = {
            'task': task,
            'result': worker_response
        }
        print(f"   ✅ Completed: {worker_response[:60]}...")
    
    # PHASE 3: Orchestrator integrates all results
    print(f"\n🔄 ORCHESTRATOR: Integrating all worker results...")
    
    integration_prompt = f"""
    Integrate these worker results into a comprehensive weekend trip plan:
    
    Original request: {complex_request}
    
    Worker Results:
    """
    
    for worker, data in worker_results.items():
        integration_prompt += f"\n{worker}: {data['result']}"
    
    integration_prompt += "\n\nProvide a final integrated recommendation:"
    
    integration_messages = [{"role": "user", "content": integration_prompt}]
    final_result = openai.chat.completions.create(
        model="gpt-4o",
        messages=integration_messages,
        max_tokens=200
    ).choices[0].message.content
    
    print(f"\n🎯 FINAL ORCHESTRATED RESULT:")
    print(f"   {final_result}")
    
    return {
        'original_request': complex_request,
        'execution_plan': plan_response,
        'worker_results': worker_results,
        'final_result': final_result
    }

# Run the orchestrator-worker demonstration
print("ORCHESTRATOR-WORKER PATTERN DEMONSTRATION")
print("=" * 80)

orchestration_result = simple_orchestrator_demo()

print(f"\n🎯 ORCHESTRATOR-WORKER SUMMARY:")
print(f"   Pattern Benefits: Task decomposition, coordination, result integration")
print(f"   Autonomy Level: HIGH - Orchestrator makes coordination decisions")
print(f"   Workers used: {len(orchestration_result['worker_results'])}")
print(f"   Key Feature: Central coordination of distributed tasks")

print(f"\n✅ ORCHESTRATOR-WORKER PATTERN COMPLETE")


In [14]:
# 5. EVALUATOR-OPTIMIZER PATTERN - Quality Control with Feedback Loops
# This demonstrates automatic quality evaluation with retry logic and continuous improvement

from week1_foundations.evaluation import run_agent_with_evaluation, evaluator
from week1_foundations.agent import run_agent
import time

class QualityControlDemo:
    """Advanced demonstration of Evaluator-Optimizer pattern"""
    
    def __init__(self):
        self.evaluation_history = []
        self.improvement_metrics = []
    
    def demonstrate_basic_evaluation_loop(self):
        """Basic evaluation loop with retry mechanism"""
        
        print("🔍 BASIC EVALUATOR-OPTIMIZER PATTERN")
        print("=" * 60)
        
        # Test with a question that might produce varying quality responses
        test_question = "Explain quantum computing in simple terms that a 12-year-old could understand"
        
        print(f"📝 Test Question: {test_question}")
        print("-" * 60)
        
        # Run with evaluation and retry logic
        result = run_agent_with_evaluation(
            test_question, 
            model_name="gpt-4o-mini",
            max_retries=3
        )
        
        evaluation = result['evaluation']
        
        print(f"\n📊 EVALUATION RESULTS:")
        print(f"   Final Score: {evaluation.score}/10")
        print(f"   Acceptable: {'✅' if evaluation.is_acceptable else '❌'}")
        print(f"   Attempts: {result['attempts']}")
        print(f"   Feedback: {evaluation.feedback}")
        
        if evaluation.strengths:
            print(f"   Strengths: {', '.join(evaluation.strengths[:2])}")
        
        if evaluation.suggestions:
            print(f"   Suggestions: {', '.join(evaluation.suggestions[:2])}")
        
        return result
    
    def demonstrate_progressive_improvement(self):
        """Show how evaluation feedback leads to better responses"""
        
        print(f"\n🎯 PROGRESSIVE IMPROVEMENT DEMONSTRATION")
        print("=" * 70)
        
        # Questions of varying difficulty to test improvement
        test_questions = [
            "What is machine learning?",
            "How do neural networks work?",
            "Explain the difference between supervised and unsupervised learning",
            "Describe the mathematical foundations of gradient descent optimization"
        ]
        
        improvement_scores = []
        
        for i, question in enumerate(test_questions, 1):
            print(f"\n📚 Question {i}: {question}")
            print("-" * 50)
            
            # Try with different models to show evaluation consistency
            models_to_test = ["gpt-4o-mini", "gpt-4o"]
            
            for model in models_to_test:
                result = run_agent_with_evaluation(
                    question, 
                    model_name=model,
                    max_retries=2
                )
                
                score = result['evaluation'].score
                improvement_scores.append({
                    'question_complexity': i,
                    'model': model,
                    'score': score,
                    'attempts': result['attempts']
                })
                
                print(f"   🤖 {model}: Score {score}/10 (Attempts: {result['attempts']})")
        
        # Analyze improvement patterns
        self._analyze_improvement_patterns(improvement_scores)
        
        return improvement_scores
    
    def demonstrate_comparative_evaluation(self):
        """Show how evaluator pattern enables model comparison"""
        
        print(f"\n⚖️ COMPARATIVE EVALUATION DEMONSTRATION")
        print("=" * 70)
        
        # Complex question that will show model differences
        complex_question = "Design a sustainable energy system for a small island nation, considering economic, environmental, and social factors."
        
        print(f"🏝️ Complex Challenge: {complex_question}")
        print("-" * 70)
        
        # Use comparative analysis with built-in evaluation
        comparison_result = run_comparative_analysis(complex_question)
        
        print(f"\n🏆 EVALUATION-BASED RANKING:")
        
        # Show how evaluation drives the ranking
        for i, model in enumerate(comparison_result['comparison'].ranking, 1):
            evaluation = comparison_result['evaluations'][model]
            score = evaluation.score
            
            print(f"   {i}. {model}: {score}/10")
            print(f"      Acceptable: {'✅' if evaluation.is_acceptable else '❌'}")
            print(f"      Key Strength: {evaluation.strengths[0] if evaluation.strengths else 'N/A'}")
            print(f"      Response: {comparison_result['responses'][model][:100]}...")
            print()
        
        print(f"🎯 WINNER: {comparison_result['comparison'].best_model}")
        print(f"📝 Reasoning: {comparison_result['comparison'].reasoning[:150]}...")
        
        return comparison_result
    
    def demonstrate_adaptive_evaluation_criteria(self):
        """Show how evaluation criteria can be adapted for different tasks"""
        
        print(f"\n🎚️ ADAPTIVE EVALUATION CRITERIA")
        print("=" * 60)
        
        # Different types of tasks requiring different evaluation approaches
        task_scenarios = [
            {
                'task': 'creative_writing',
                'question': 'Write a short poem about artificial intelligence',
                'context': 'Creative writing task - prioritize creativity, imagery, and emotional impact'
            },
            {
                'task': 'technical_explanation',
                'question': 'Explain how SSL certificates work',
                'context': 'Technical explanation - prioritize accuracy, clarity, and completeness'
            },
            {
                'task': 'problem_solving',
                'question': 'How would you reduce energy consumption in a data center?',
                'context': 'Problem solving - prioritize practical solutions, feasibility, and innovation'
            }
        ]
        
        adaptive_results = []
        
        for scenario in task_scenarios:
            print(f"\n📋 Task Type: {scenario['task'].replace('_', ' ').title()}")
            print(f"   Question: {scenario['question']}")
            print(f"   Evaluation Focus: {scenario['context']}")
            print("-" * 50)
            
            # Generate response
            response = run_agent(scenario['question'], "gpt-4o")
            
            # Evaluate with specific context
            evaluation = evaluator.evaluate_response(
                scenario['question'], 
                response, 
                context=scenario['context']
            )
            
            adaptive_results.append({
                'task_type': scenario['task'],
                'score': evaluation.score,
                'evaluation': evaluation,
                'response_length': len(response)
            })
            
            print(f"   📊 Adaptive Score: {evaluation.score}/10")
            print(f"   🎯 Task-Specific Feedback: {evaluation.feedback[:100]}...")
        
        # Show how different tasks get different evaluation approaches
        print(f"\n📈 ADAPTIVE EVALUATION SUMMARY:")
        for result in adaptive_results:
            print(f"   {result['task_type']}: {result['score']}/10 (Focus: task-specific criteria)")
        
        return adaptive_results
    
    def _analyze_improvement_patterns(self, scores):
        """Analyze patterns in evaluation scores"""
        
        print(f"\n📈 IMPROVEMENT PATTERN ANALYSIS:")
        
        # Group by model
        model_scores = {}
        for score_data in scores:
            model = score_data['model']
            if model not in model_scores:
                model_scores[model] = []
            model_scores[model].append(score_data['score'])
        
        # Calculate averages
        for model, model_score_list in model_scores.items():
            avg_score = sum(model_score_list) / len(model_score_list)
            print(f"   {model}: Average Score {avg_score:.1f}/10")
        
        # Find patterns
        attempts_needed = [s['attempts'] for s in scores]
        avg_attempts = sum(attempts_needed) / len(attempts_needed)
        print(f"   Average Attempts Needed: {avg_attempts:.1f}")
        
        # Count responses that took more than one attempt (a retry alone does not prove improvement)
        retried = sum(1 for s in scores if s['attempts'] > 1)
        print(f"   Responses That Needed a Retry: {retried}/{len(scores)}")

# Run comprehensive evaluation demonstrations
def run_evaluator_optimizer_demos():
    """Complete demonstration of all Evaluator-Optimizer capabilities"""
    
    demo = QualityControlDemo()
    
    print("EVALUATOR-OPTIMIZER PATTERN COMPREHENSIVE DEMO")
    print("=" * 80)
    
    # Demo 1: Basic evaluation loop
    basic_result = demo.demonstrate_basic_evaluation_loop()
    
    # Demo 2: Progressive improvement
    improvement_results = demo.demonstrate_progressive_improvement()
    
    # Demo 3: Comparative evaluation
    comparison_result = demo.demonstrate_comparative_evaluation()
    
    # Demo 4: Adaptive criteria
    adaptive_results = demo.demonstrate_adaptive_evaluation_criteria()
    
    # Summary
    print(f"\n🎯 EVALUATOR-OPTIMIZER PATTERN SUMMARY:")
    print(f"   Pattern Benefits: Quality control, continuous improvement, objective comparison")
    print(f"   Autonomy Level: MEDIUM - Evaluator makes quality decisions")
    print(f"   Key Features: Retry loops, adaptive criteria, comparative ranking")
    print(f"   Production Value: Ensures consistent quality, reduces manual oversight")
    
    return {
        'basic_evaluation': basic_result,
        'improvement_tracking': improvement_results,
        'comparative_analysis': comparison_result,
        'adaptive_evaluation': adaptive_results
    }

# Execute the complete demonstration
demo_results = run_evaluator_optimizer_demos()

print(f"\n✅ EVALUATOR-OPTIMIZER PATTERN COMPLETE")
print(f"   All evaluation mechanisms demonstrated successfully")
print(f"   Quality control systems operational and validated")

EVALUATOR-OPTIMIZER PATTERN COMPREHENSIVE DEMO
🔍 BASIC EVALUATOR-OPTIMIZER PATTERN
📝 Test Question: Explain quantum computing in simple terms that a 12-year-old could understand
------------------------------------------------------------

📊 EVALUATION RESULTS:
   Final Score: 7/10
   Acceptable: ✅
   Attempts: 1
   Feedback: Could not parse evaluation response properly
   Strengths: Response was generated successfully
   Suggestions: Consider using a different evaluation model

🎯 PROGRESSIVE IMPROVEMENT DEMONSTRATION

📚 Question 1: What is machine learning?
--------------------------------------------------
   🤖 gpt-4o-mini: Score 7/10 (Attempts: 1)
   🤖 gpt-4o: Score 7/10 (Attempts: 1)

📚 Question 2: How do neural networks work?
--------------------------------------------------
   🤖 gpt-4o-mini: Score 9/10 (Attempts: 1)
   🤖 gpt-4o: Score 7/10 (Attempts: 1)

📚 Question 3: Explain the difference between supervised and unsupervised learning
--------------------------------------------

**Pattern References**
In the literature on agent systems and LLMs, this pattern appears as the **Evaluator-Optimizer Loop** or **Self-Evaluation Loop**.

It is an **agentic design pattern** that recurs in work on **autonomy and continuous improvement in LLMs**.

* The agent proposes a solution.
* An evaluator scores it.
* If the score falls below the acceptance threshold, the agent regenerates its answer using the evaluator's feedback, up to a retry limit.
* Scores are tracked across attempts, and the evaluation criteria are adapted to the task.
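
The loop described above can be sketched in a few lines. This is a minimal illustration, not the implementation behind `run_agent_with_evaluation`: the `Evaluation` dataclass, the `evaluate_and_retry` function, the `threshold` and `max_attempts` parameters, and both stub callables are assumptions made for this example. The stubs stand in for real LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Evaluation:
    score: int      # quality score on a 1-10 scale
    feedback: str   # evaluator's explanation, reused in the next attempt


def evaluate_and_retry(
    question: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], Evaluation],
    threshold: int = 7,      # hypothetical acceptance threshold
    max_attempts: int = 3,   # hypothetical attempt budget
) -> dict:
    """Generate, score, and regenerate with feedback until the score passes."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    prompt = question
    score_history: List[int] = []
    for attempt in range(1, max_attempts + 1):
        response = generate(prompt)
        evaluation = evaluate(question, response)
        score_history.append(evaluation.score)
        if evaluation.score >= threshold:
            break
        # Rejected: pass the evaluator's feedback into the next attempt.
        prompt = f"{question}\n\nPrevious answer was rejected. Feedback: {evaluation.feedback}"

    return {
        "response": response,
        "evaluation": evaluation,
        "attempts": attempt,
        "score_history": score_history,
    }


# Stubs standing in for the generator and evaluator LLM calls.
def stub_generate(prompt: str) -> str:
    return "detailed answer" if "Feedback:" in prompt else "short answer"


def stub_evaluate(question: str, response: str) -> Evaluation:
    if response == "short answer":
        return Evaluation(score=4, feedback="Add more detail")
    return Evaluation(score=9, feedback="Clear and complete")


result = evaluate_and_retry("Explain recursion", stub_generate, stub_evaluate)
print(result["attempts"], result["score_history"])  # 2 [4, 9]
```

Passing the generator and evaluator in as callables keeps the loop testable without API access; in practice each would wrap a model call.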

### Academic References

1. **Agentic Design Patterns: Emergent Practices for Building LLM-Based Agents**
   *Korman, J., et al., 2024*
   ([arXiv:2403.03633](https://arxiv.org/abs/2403.03633))

   > See "Evaluator-Optimizer Pattern" section for examples and best practices.

2. **Self-Improving Language Agents via Automated Feedback Loops**
   *Chen, Y., et al., 2023*
   ([arXiv:2310.06447](https://arxiv.org/abs/2310.06447))

   > Describes self-evaluation and optimization cycles for LLMs.

3. **Reflexion: Language Agents with Verbal Reinforcement Learning**
   *Shinn, N., et al., 2023*
   ([arXiv:2303.11366](https://arxiv.org/abs/2303.11366))

   > Introduces the idea of an agent "reflecting" (evaluating and retrying) to iteratively improve output.

4. **LLM-Augmented Agentic Workflows**
   *Mialon, G., et al., 2023*
   ([arXiv:2307.07924](https://arxiv.org/abs/2307.07924))

   > Summarizes design patterns for LLM agents, including evaluator-optimizer loops.


## Commercial Applications by Pattern

**Prompt Chaining Applications:**

* Content generation pipelines (e.g., automatic blog writing where outline, draft, and final edit are chained LLM prompts)
* Document processing workflows (e.g., multi-step document classification, extraction, and summarization for legal tech)
* Template-based systems (e.g., marketing email campaigns that use chaining to personalize subject lines, bodies, and CTAs)
* **Customer support automation:** multi-step responses where intent detection, answer retrieval, and final response are chained
* **Medical coding:** extract symptoms, map to ICD codes, and generate billing summaries sequentially

**Parallelization Applications:**

* A/B testing systems (e.g., generating and evaluating multiple ad copy variants in parallel for digital marketing)
* Consensus-building platforms (e.g., brainstorming tools that gather parallel inputs, then synthesize a consensus answer)
* Risk mitigation through redundancy (e.g., running critical text analysis with multiple models to ensure no single point of failure)
* **Hiring automation:** parallel evaluation of resumes or candidate answers by multiple agents
* **Real-time monitoring:** using parallel agents to scan for fraud, sentiment, or critical events across multiple data streams

**Evaluator-Optimizer Applications:**

* Quality assurance systems (e.g., LLM-based code review tools with retry logic to ensure clean code)
* Content moderation platforms (e.g., automated review and iterative improvement of user-generated content for compliance)
* Performance optimization tools (e.g., tuning marketing copy or product descriptions for highest engagement, using feedback loops)
* **Automated tutoring:** LLM generates answers, evaluator checks for student comprehension, retries or adapts explanations as needed
* **Chatbot refinement:** real-time feedback loops to refine answers for customer satisfaction

**Orchestrator-Worker Applications:**

* Project management systems (e.g., decomposing a project into tasks, assigning each to specialized agents for scheduling, documentation, and reporting)
* Complex research workflows (e.g., literature review orchestration: one agent gathers papers, another summarizes, a third extracts key findings)
* Multi-specialist coordination (e.g., enterprise helpdesk where the orchestrator routes issues to technical, billing, or account agents, then combines answers)
* **Travel planning:** orchestrator divides itinerary planning among specialists (flights, hotels, local events), then integrates results
* **Regulatory compliance:** orchestrator assigns legal, technical, and business checks to specialized agents, then synthesizes a compliance report

**Integration Potential:**
All patterns can be combined for enterprise-grade agentic systems with predictable behavior and robust quality control.

* **Example:** An end-to-end enterprise automation system for insurance claims:

  * Uses prompt chaining to extract and process claim data,
  * Runs parallel assessments for fraud, liability, and medical review,
  * Employs evaluator-optimizer loops to ensure every decision meets quality standards,
  * Orchestrates multiple expert agents for legal, medical, and customer communication,
  * Integrates all results into a final actionable decision, logged for compliance.


## `Refactoring`

In [1]:
# Setup - Import all advanced functionality
import sys
import os

# Add the src directory to Python path 
current_dir = os.getcwd()
src_path = os.path.join(os.path.dirname(os.path.dirname(current_dir)), 'src')
sys.path.append(src_path)

print(f"Adding to path: {src_path}")

try:
    from week1_foundations.agent import (
        run_agent, 
        run_agent_with_multiple_models
    )
    from week1_foundations.evaluation import (
        run_agent_with_evaluation, 
        run_comparative_analysis, 
        evaluator
    )
    from week1_foundations.models import model_manager
    print("✅ Successfully imported week1_foundations modules")
except ImportError as e:
    print(f"❌ Import error: {e}")
    print(f"Current directory: {current_dir}")
    print(f"Python path additions: {src_path}")
    print("Please check that you're running from the correct directory")

import json
from IPython.display import display, Markdown, HTML
import pandas as pd

# Initialize and show available models
print("Initializing Advanced AI Agent System...")
try:
    available_models = model_manager.get_available_models()
    print(f"Available models: {available_models}")
    print("Setup complete!")
except Exception as e:
    print(f"Error initializing models: {e}")

# Create helper function for pretty printing
def print_result(title, content, color="blue"):
    display(HTML(f'<h3 style="color:{color};">{title}</h3>'))
    if isinstance(content, dict):
        display(Markdown(f"```json\n{json.dumps(content, indent=2)}\n```"))
    else:
        display(Markdown(str(content)))

Adding to path: /Users/alex/Desktop/00_projects/AI_agents/my_agents/src
OpenAI client initialized
Anthropic API key not found
Google API key not found
DeepSeek API key not found
✅ Successfully imported week1_foundations modules
Initializing Advanced AI Agent System...
Available models: ['gpt-4o-mini', 'gpt-4o', 'gpt-4-turbo']
Setup complete!


## Lab 1: Prompt Chaining Fundamentals

**Workflow Pattern:** **Prompt Chaining**

**Learning Objective:**
Master fundamental LLM interaction patterns through structured prompt design and understand the simplest workflow pattern.

**Architecture Flow:**
```
[User Input] → [System Prompt] → [LLM Processing] → [Response Output]
```

**Prompt Chaining Explained:**
This is the most basic workflow pattern. Lab 1 runs it in its shortest form, a single system-plus-user step, where we:
1. **Define a clear system prompt** that establishes the LLM's role
2. **Add user input** to create a structured message sequence
3. **Process sequentially** through predefined steps
4. **Output results** in a controlled manner
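
For comparison, a chain with more than one step might look like the sketch below. It is an illustration under stated assumptions: `run_chain` and `fake_llm` are hypothetical names, and `fake_llm` is a stand-in that only echoes its inputs so the example runs without an API key. The point is the control flow: each step's output becomes the next step's user message, in a fixed order chosen by the code rather than by the model.

```python
from typing import Callable, List


def run_chain(
    user_input: str,
    system_prompts: List[str],
    llm: Callable[[str, str], str],
) -> List[str]:
    """Run each system prompt in order, feeding every output into the next step."""
    outputs: List[str] = []
    current = user_input
    for system_prompt in system_prompts:
        current = llm(system_prompt, current)
        outputs.append(current)
    return outputs


def fake_llm(system_prompt: str, user_message: str) -> str:
    # Stand-in for a real chat-completion call (assumption for this sketch).
    return f"[{system_prompt}] {user_message}"


steps = ["Outline", "Draft", "Edit"]
outputs = run_chain("AI agents", steps, fake_llm)
print(outputs[-1])  # [Edit] [Draft] [Outline] AI agents
```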

**Pattern Characteristics:**
- **Sequential Processing**: Each step follows the previous in order
- **Predefined Flow**: No dynamic decision-making
- **Low Autonomy**: Human-defined sequence
- **High Control**: Predictable, reliable outputs

**Code Implementation Details:**
- **Message Structure**: System + User role-based messaging
- **Model Selection**: GPT-4o-mini (cost-efficient, fast response)  
- **Processing Mode**: Text-only, no external tool integration
- **Control Flow**: Direct function call with immediate response

**Real-World Applications:**
- Content generation pipelines
- Document processing workflows
- Simple question-answering systems
- Template-based responses

In [2]:
# Basic single model usage
response = run_agent("What is 2 + 2?")
print_result("Basic Response", response)

# Now with evaluation
print("\n" + "="*50)
print("WITH AUTOMATIC EVALUATION:")
result_with_eval = run_agent_with_evaluation("What is 2 + 2?")
print_result("Response", result_with_eval['response'])
print_result("Evaluation", {
    "Score": f"{result_with_eval['evaluation'].score}/10",
    "Acceptable": result_with_eval['evaluation'].is_acceptable,
    "Feedback": result_with_eval['evaluation'].feedback,
    "Attempts": result_with_eval['attempts']
})

2 + 2 equals 4.


WITH AUTOMATIC EVALUATION:


2 + 2 equals 4.

```json
{
  "Score": "10/10",
  "Acceptable": true,
  "Feedback": "The AI response accurately answers the user question with a correct mathematical result. It is concise and directly addresses the inquiry without unnecessary elaboration. The response is appropriate for the context of a general-purpose assistant, providing a straightforward answer to a simple arithmetic question.",
  "Attempts": 1
}
```

## Lab 2: Parallelization + Evaluator-Optimizer Patterns

**Workflow Patterns:** **Parallelization** + **Evaluator-Optimizer**

**Learning Objective:**
Implement advanced patterns combining concurrent processing with quality control loops.

**Architecture Flow:**
```
[Query Input] → [Parallel Processing] → [Model1, Model2, Model3...] → [Evaluator] → [Ranked Results]
                        ↓
                [Validation Loop] → [Accept ✅ | Retry ❌]
```

**Parallelization Pattern Explained:**
This pattern breaks down tasks for concurrent execution:
1. **Task Distribution**: Same query sent to multiple models simultaneously
2. **Concurrent Execution**: Models process independently
3. **Result Aggregation**: Responses collected and compared
4. **Efficiency Gain**: With truly concurrent calls, wall-clock time approaches that of the slowest model rather than the sum of all calls
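
One way to get concurrent execution is a thread pool, since model calls are I/O-bound. The sketch below is not the code behind `run_agent_with_multiple_models()`; `ask_models_in_parallel` and the stub models are hypothetical, and each stub stands in for a provider call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def ask_models_in_parallel(
    question: str,
    models: Dict[str, Callable[[str], str]],
) -> Dict[str, dict]:
    """Send the same question to every model concurrently and collect the results."""

    def call(name: str) -> Tuple[str, dict]:
        # Capture failures per model so one bad provider does not sink the batch.
        try:
            return name, {"response": models[name](question), "error": None}
        except Exception as exc:
            return name, {"response": None, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        return dict(pool.map(call, models))


# Stub models standing in for real provider calls.
def stub_model_a(question: str) -> str:
    return f"A says: {question}"


def stub_model_b(question: str) -> str:
    return f"B says: {question}"


def failing_model(question: str) -> str:
    raise RuntimeError("provider unavailable")


results = ask_models_in_parallel(
    "What is the capital of France?",
    {"model-a": stub_model_a, "model-b": stub_model_b, "model-c": failing_model},
)
for name, outcome in results.items():
    print(name, outcome)
```

Recording errors per model keeps the redundancy benefit: the remaining responses are still usable when one call fails.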

**Evaluator-Optimizer Pattern Explained:**
This creates quality control through validation loops:
1. **Generator Phase**: Models produce responses
2. **Evaluation Phase**: Evaluator LLM scores each response
3. **Decision Point**: Accept high-quality responses or retry
4. **Feedback Loop**: Poor responses trigger regeneration with feedback

**Pattern Characteristics:**
- **Parallelization Autonomy**: Low (code-controlled distribution)
- **Evaluator Autonomy**: Medium (LLM makes quality decisions)
- **Key Benefits**: Speed + redundancy + quality control
- **Trade-offs**: Higher API cost (one call per model, plus evaluator calls) in exchange for redundancy and quality checks

**Code Implementation Details:**
- **Multi-Provider Support**: OpenAI, Anthropic, Google, DeepSeek integration
- **Concurrent Processing**: `run_agent_with_multiple_models()` function
- **Pydantic Evaluation**: Structured response validation and scoring
- **Comparative Analysis**: `run_comparative_analysis()` with intelligent ranking
- **Retry Logic**: Automatic regeneration based on evaluation scores
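
Structured evaluation depends on turning the evaluator's raw text into validated fields. The sketch below uses a standard-library dataclass as a stand-in for the Pydantic model, so it runs without extra dependencies. The field names mirror the attributes used in this notebook (`score`, `is_acceptable`, `feedback`, `strengths`, `suggestions`); `parse_evaluation` and its `threshold` parameter are assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvaluationResult:
    score: int
    is_acceptable: bool
    feedback: str
    strengths: List[str] = field(default_factory=list)
    suggestions: List[str] = field(default_factory=list)


def parse_evaluation(raw: str, threshold: int = 7) -> Optional[EvaluationResult]:
    """Parse the evaluator's JSON reply; return None when it is unusable."""
    try:
        data = json.loads(raw)
        score = int(data["score"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return None
    if not 1 <= score <= 10:
        return None
    return EvaluationResult(
        score=score,
        is_acceptable=score >= threshold,
        feedback=str(data.get("feedback", "")),
        strengths=list(data.get("strengths", [])),
        suggestions=list(data.get("suggestions", [])),
    )


good = parse_evaluation('{"score": 9, "feedback": "Accurate", "strengths": ["Concise"]}')
bad = parse_evaluation("The answer looks fine to me.")
print(good.score, good.is_acceptable)  # 9 True
print(bad)  # None
```

Returning `None` on a parse failure is one possible design choice: it lets the caller retry the evaluation instead of treating an unparsed reply as a passing score. An earlier sample output in this notebook shows the message "Could not parse evaluation response properly" next to an accepted 7/10, which is the situation this guards against.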

**Real-World Applications:**
- Content quality assurance systems
- Multi-model A/B testing
- Consensus-building for critical decisions
- Risk mitigation through redundancy

### Code Analysis: How Our Implementation Demonstrates the Patterns

**Parallelization Pattern in Action:**
```python
# This function demonstrates Parallelization
multi_results = run_agent_with_multiple_models("What is the capital of France?")
```

**What happens internally:**
1. **Task Distribution**: The same question is sent to all available models simultaneously
2. **Concurrent Processing**: Each model (gpt-4o-mini, gpt-4o, gpt-4-turbo) processes independently
3. **Result Collection**: All responses are gathered into a dictionary structure
4. **Aggregation**: Results are formatted for comparison

**Evaluator-Optimizer Pattern in Action:**
```python
# This function demonstrates Evaluator-Optimizer
analysis = run_comparative_analysis("What is the capital of France?")
```

**What happens internally:**
1. **Generator Phase**: All models generate responses to the question
2. **Evaluation Phase**: An evaluator LLM scores each response (1-10 scale)
3. **Comparison Logic**: Responses are ranked based on evaluation scores
4. **Decision Making**: Best model is selected based on quality metrics
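
The comparison and decision steps can be illustrated with a deterministic ranking over evaluation scores. This is a simplified stand-in: in the notebook the ranking comes from a comparison LLM, whereas the hypothetical `rank_models` below only sorts scores and breaks ties alphabetically.

```python
from typing import Dict, List, Tuple


def rank_models(scores: Dict[str, float]) -> Tuple[str, List[str]]:
    """Return (best_model, ranking), highest score first, ties broken by name."""
    if not scores:
        raise ValueError("scores must not be empty")
    ranking = sorted(scores, key=lambda model: (-scores[model], model))
    return ranking[0], ranking


best_model, ranking = rank_models({"model-a": 8, "model-b": 9, "model-c": 8})
print(best_model)  # model-b
print(ranking)     # ['model-b', 'model-a', 'model-c']
```

A fixed tie-break rule keeps the ranking reproducible when several models receive the same score.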

**Key Code Functions Explained:**
- `run_agent_with_multiple_models()`: Implements **Parallelization**
- `run_comparative_analysis()`: Combines **Parallelization** + **Evaluator-Optimizer**
- `evaluator.evaluate_response()`: Core **Evaluator-Optimizer** logic
- `evaluator.compare_responses()`: Multi-response ranking system

**Autonomy Levels Observed:**
- **Parallelization**: Low autonomy (our code controls distribution)
- **Evaluation**: Medium autonomy (evaluator LLM makes quality decisions)
- **Ranking**: Medium autonomy (comparison LLM determines best model)

In [3]:
# Single model response
response = run_agent("What is the capital of France?")
print_result("Single Model Response", response)

print("\n" + "="*50)
print("MULTI-MODEL COMPARISON:")

# Multiple models (will use only available ones)
multi_results = run_agent_with_multiple_models("What is the capital of France?")

for model_name, result in multi_results.items():
    print_result(f"{result['model_display']} ({result['provider']})", result['response'])

print("\n" + "="*50)
print("COMPREHENSIVE ANALYSIS WITH EVALUATION:")

# Full comparative analysis with evaluation
analysis = run_comparative_analysis("What is the capital of France?")

print_result("Best Model", analysis['comparison'].best_model, "green")
print_result("Model Ranking", analysis['comparison'].ranking)
print_result("Reasoning", analysis['comparison'].reasoning)

# Show individual scores
scores_df = pd.DataFrame([
    {"Model": model, "Score": analysis['comparison'].scores.get(model, 0)}
    for model in analysis['comparison'].ranking
])
display(HTML("<h4>Model Scores:</h4>"))
display(scores_df)

The capital of France is Paris.


MULTI-MODEL COMPARISON:
Testing with gpt-4o-mini...
Testing with gpt-4o...
Testing with gpt-4-turbo...


The capital of France is Paris.

The capital of France is Paris.

The capital of France is Paris.


COMPREHENSIVE ANALYSIS WITH EVALUATION:
Generating response with gpt-4o-mini...
Generating response with gpt-4o...
Generating response with gpt-4-turbo...
Comparing all responses...


gpt-4o-mini

['gpt-4o-mini', 'gpt-4o', 'gpt-4-turbo']

All models provided the correct answer, stating that the capital of France is Paris. However, the responses are identical in content and clarity, which makes it challenging to differentiate based on accuracy or helpfulness. The slight edge for gpt-4o-mini is due to its concise format, which can be perceived as slightly more user-friendly. Nevertheless, all models performed exceptionally well, leading to minor distinctions in ranking primarily based on presentation. Since the content quality is equal, the ranking reflects a subjective preference rather than significant differences in performance.

|   | Model | Score |
|---|-------|-------|
| 0 | gpt-4o-mini | 10 |
| 1 | gpt-4o | 10 |
| 2 | gpt-4-turbo | 10 |


## Lab 3: Tool Integration + Evaluator-Optimizer Loops

**Workflow Patterns:** **Tool Integration** + **Evaluator-Optimizer**

**Learning Objective:**
Demonstrate how LLMs can execute external functions while maintaining quality control through evaluation loops.

**Architecture Flow:**
```
[User Input] → [LLM Decision] → [Tool Execution] → [Tool Result] → [LLM Response]
                   ↓                                              ↓
            [Select Tool Type]                            [Evaluator Assessment]
                   ↓                                              ↓
           [Function Arguments]                          [Accept ✅ | Retry ❌]
```

**Tool Integration Pattern Explained:**
This pattern enables LLMs to interact with the external world:
1. **Intent Recognition**: LLM analyzes user input for tool requirements
2. **Tool Selection**: LLM chooses appropriate function to call
3. **Argument Extraction**: LLM structures function arguments
4. **Execution**: External function runs with LLM-provided parameters
5. **Context Integration**: Tool results are incorporated into final response

**Why This Matters:**
- **Extends LLM Capabilities**: Beyond text generation to action execution
- **Real-World Integration**: Connect AI to APIs, databases, systems
- **Dynamic Interaction**: Responses based on live data, not training data
- **Structured Processing**: Validate inputs and outputs systematically

**Evaluator-Optimizer Loop Enhanced:**
For tool usage, evaluation becomes more complex:
1. **Functional Accuracy**: Did the tool execute correctly?
2. **Result Relevance**: Is the tool output appropriate for the question?
3. **Integration Quality**: How well are tool results incorporated?
4. **User Satisfaction**: Does the final response meet user needs?

**Code Implementation Details:**
- **Tool Functions**: `get_current_time()`, `get_weather(city)`
- **Tool Schema**: JSON definitions for LLM understanding
- **Execution Logic**: `execute_tool()` function dispatcher
- **Evaluation**: Enhanced criteria for tool-assisted responses
- **Retry Mechanism**: Automatic regeneration for failed tool usage
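
A dispatcher along these lines could connect the JSON tool schemas to the Python functions. The function names `get_current_time`, `get_weather`, and `execute_tool` come from the list above, but their bodies, the `TOOL_REGISTRY` and `TOOL_SCHEMAS` names, and the placeholder weather text are assumptions for this sketch. The schema layout follows the OpenAI Chat Completions `tools` format.

```python
import json
from datetime import datetime


def get_current_time() -> str:
    return datetime.now().strftime("%I:%M %p on %B %d, %Y")


def get_weather(city: str) -> str:
    # Placeholder: a real tool would query a weather service here.
    return f"Weather data for {city} is not available in this sketch."


# Maps tool names (as the LLM sees them) to Python callables.
TOOL_REGISTRY = {
    "get_current_time": get_current_time,
    "get_weather": get_weather,
}

# Schemas describing each tool to the LLM.
TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "get_current_time",
            "description": "Return the current local time.",
            "parameters": {"type": "object", "properties": {}, "required": []},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return the weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
]


def execute_tool(name: str, arguments_json: str) -> str:
    """Run the named tool with LLM-provided JSON arguments; return a JSON string."""
    if name not in TOOL_REGISTRY:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json or "{}")
        return json.dumps({"result": TOOL_REGISTRY[name](**kwargs)})
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the function signature.
        return json.dumps({"error": str(exc)})


print(execute_tool("get_weather", '{"city": "Tokyo"}'))
print(execute_tool("get_stock_price", "{}"))
```

Returning errors as JSON rather than raising lets the LLM see what went wrong and correct its next tool call.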

**Real-World Applications:**
- Personal assistants with calendar/email access
- Customer service bots with database queries
- Research assistants with web search capabilities
- IoT control systems with device integration

In [4]:
# Basic tool usage
response = run_agent("What time is it now?")
print_result("Tool Response", response)

print("\n" + "="*50)
print("TOOL USAGE WITH EVALUATION:")

# Tool usage with evaluation
result_with_eval = run_agent_with_evaluation("What time is it now?")
print_result("Tool Response with Evaluation", result_with_eval['response'])

evaluation = result_with_eval['evaluation']
print_result("Tool Evaluation Details", {
    "Score": f"{evaluation.score}/10",
    "Acceptable": evaluation.is_acceptable,
    "Strengths": evaluation.strengths,
    "Suggestions": evaluation.suggestions
})

print("\n" + "="*50)
print("MULTI-MODEL TOOL COMPARISON:")

# Compare tool usage across models
tool_analysis = run_comparative_analysis("What time is it now?")
print_result("Best Tool User", tool_analysis['comparison'].best_model, "green")

for model_name, response in tool_analysis['responses'].items():
    print_result(f"Tool Usage - {model_name}", response)

The current time is 09:09 AM on June 23, 2025.


TOOL USAGE WITH EVALUATION:
Attempt 1 failed evaluation. Retrying...
Feedback: The AI response provides a specific time but is incorrect regarding the actual current time. This undermines the primary purpose of answering the user's question accurately. The response lacks real-time awareness, which is a critical requirement for a general-purpose assistant when asked about the current time.
Attempt 2 failed evaluation. Retrying...
Feedback: The AI response fails to provide an accurate current time, which is a fundamental requirement for such a question. Instead, it gives a time that is future-dated, making the response incorrect and unhelpful. While the format of the time and date is clear, the inaccuracy undermines its overall utility.


The current time is 09:09 AM on June 23, 2025.

```json
{
  "Score": "3/10",
  "Acceptable": false,
  "Strengths": [
    "The response is formatted clearly with both time and date.",
    "It maintains a neutral and informative tone."
  ],
  "Suggestions": [
    "The AI should indicate that it cannot provide real-time information and suggest the user check their device for the current time.",
    "Including a disclaimer about the limitations of the AI in providing live data would enhance the user experience."
  ]
}
```


MULTI-MODEL TOOL COMPARISON:
Generating response with gpt-4o-mini...
Generating response with gpt-4o...
Generating response with gpt-4-turbo...
Comparing all responses...


gpt-4o-mini

The current time is 09:09 AM on June 23, 2025.

The current time is 09:09 AM on June 23, 2025.

The current time is 09:09 AM.

## Lab 4: Orchestrator-Worker Pattern + Advanced Tool Integration

**Workflow Patterns:** **Orchestrator-Worker** + **Structured Tool Calling**

**Learning Objective:**
Implement sophisticated coordination patterns where an LLM orchestrator manages complex multi-step tasks with specialized worker components.

**Architecture Flow:**
```
[Complex Query] → [Orchestrator LLM] → [Task Decomposition] → [Worker Tools] → [Result Integration]
                         ↓                    ↓                    ↓                    ↓
                 [Plan Generation]      [Parallel Execution]  [Status Monitoring]  [Quality Assessment]
                         ↓                    ↓                    ↓                    ↓
                 [Resource Allocation]  [Error Handling]      [Result Collection] [Final Response]
```

**Orchestrator-Worker Pattern Explained:**
This is the most sophisticated workflow pattern we implement:
1. **Orchestrator Role**: Main LLM analyzes complex requests and creates execution plans
2. **Task Decomposition**: Breaks down complex queries into manageable subtasks
3. **Worker Coordination**: Dispatches subtasks to specialized tools or models
4. **Progress Monitoring**: Tracks execution status and handles errors
5. **Result Integration**: Combines outputs from multiple workers into coherent response
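
The five roles above can be sketched as one function. This is an illustration only: `orchestrate` and the stub planner, workers, and integrator are hypothetical, and in a real system the planning and integration steps would be LLM calls. The workers run sequentially here for clarity.

```python
from typing import Callable, Dict, List


def orchestrate(
    request: str,
    plan: Callable[[str], List[dict]],
    workers: Dict[str, Callable[[str], str]],
    integrate: Callable[[str, List[dict]], str],
) -> dict:
    """Decompose a request, dispatch subtasks to workers, and merge the results."""
    subtasks = plan(request)  # each subtask: {"worker": name, "task": text}
    worker_results: List[dict] = []
    errors: List[dict] = []

    for subtask in subtasks:
        name = subtask["worker"]
        worker = workers.get(name)
        if worker is None:
            errors.append({"worker": name, "error": "no such worker"})
            continue
        try:
            worker_results.append({"worker": name, "output": worker(subtask["task"])})
        except Exception as exc:
            errors.append({"worker": name, "error": str(exc)})

    return {
        "execution_plan": subtasks,
        "worker_results": worker_results,
        "errors": errors,
        "final_result": integrate(request, worker_results),
    }


# Stubs standing in for the orchestrator LLM and the specialized workers.
def stub_plan(request: str) -> List[dict]:
    return [
        {"worker": "flights", "task": f"Find flights for: {request}"},
        {"worker": "hotels", "task": f"Find hotels for: {request}"},
        {"worker": "visa", "task": f"Check visa rules for: {request}"},
    ]


stub_workers = {
    "flights": lambda task: "2 flight options",
    "hotels": lambda task: "3 hotel options",
}


def stub_integrate(request: str, results: List[dict]) -> str:
    return "; ".join(f"{r['worker']}: {r['output']}" for r in results)


outcome = orchestrate("Trip to Tokyo", stub_plan, stub_workers, stub_integrate)
print(outcome["final_result"])  # flights: 2 flight options; hotels: 3 hotel options
print(outcome["errors"])        # [{'worker': 'visa', 'error': 'no such worker'}]
```

Because the plan is produced at run time, the set of workers that actually run can differ from request to request, which is what separates this pattern from a fixed chain.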

**Advanced Tool Integration:**
- **Structured Arguments**: Tools accept complex, validated JSON parameters
- **Error Handling**: Robust failure detection and recovery mechanisms
- **External Systems**: Integration with real-world services (notifications, databases)
- **Production Features**: Deployment-ready with monitoring and logging

**Pattern Characteristics:**
- **Highest Autonomy**: Orchestrator LLM makes complex coordination decisions
- **Dynamic Flow**: Execution path adapts based on intermediate results
- **Scalability**: Can coordinate any number of worker components
- **Robustness**: Built-in error handling and fallback mechanisms

**Comparative Analysis as Orchestrator-Worker:**
Our `run_comparative_analysis()` function demonstrates this pattern:
1. **Orchestrator**: Main evaluation LLM coordinates the entire process
2. **Workers**: Multiple generator models produce responses
3. **Coordination**: Orchestrator manages evaluation of each worker's output
4. **Integration**: Final ranking combines all worker results intelligently

**Code Implementation Details:**
- **Advanced Tools**: `get_weather(city)`, `record_user_details(email, name, notes)`
- **Orchestration Logic**: `run_comparative_analysis()` as orchestrator function
- **Worker Management**: Multiple model coordination with error handling
- **Quality Control**: Enhanced evaluation criteria for complex outputs
- **Production Features**: Web interface, monitoring, deployment automation

**Real-World Applications:**
- Project management systems with AI coordination
- Complex research tasks requiring multiple specialists
- Multi-step customer service workflows
- Enterprise automation with human-AI collaboration
- Scientific analysis pipelines with multiple data sources

In [5]:
# Basic structured tool calling
response = run_agent("What's the weather in Tokyo?")
print_result("Structured Tool Response", response)

print("\n" + "="*50)
print("WEATHER TOOL WITH ADVANCED EVALUATION:")

# Multiple cities with evaluation
cities = ["Tokyo", "Barcelona", "New York", "London"]

for city in cities:
    print(f"\nTesting weather for {city}:")
    result = run_agent_with_evaluation(f"What's the weather in {city}?", max_retries=1)
    
    evaluation = result['evaluation']
    print_result(f"Weather in {city}", result['response'])
    
    if evaluation.score < 7:
        print(f"⚠️ Low quality response (Score: {evaluation.score}/10)")
        print(f"Feedback: {evaluation.feedback}")

print("\n" + "="*50)
print("COMPREHENSIVE WEATHER ANALYSIS:")

# Full analysis for a complex weather question
complex_question = "Compare the weather between Tokyo and Barcelona, and recommend which city would be better for outdoor activities today."

final_analysis = run_comparative_analysis(complex_question)

print_result("Question", complex_question, "purple")
print_result("Best Model for Weather Analysis", final_analysis['comparison'].best_model, "green")
print_result("Model Ranking", final_analysis['comparison'].ranking)

# Show all responses
print("\nAll Model Responses:")
for model_name, response in final_analysis['responses'].items():
    score = final_analysis['evaluations'][model_name].score
    print_result(f"{model_name} (Score: {score}/10)", response)

print("\nWinner's Reasoning:")
print_result("Why this model won", final_analysis['comparison'].reasoning, "gold")

The weather in Tokyo is currently 25°C and raining.


WEATHER TOOL WITH ADVANCED EVALUATION:

Testing weather for Tokyo:
Attempt 1 failed evaluation. Retrying...
Feedback: The response provides a specific temperature and weather condition, but it lacks real-time accuracy as the information is not verifiable and may not reflect the current weather. Additionally, it does not mention the date or time of the report, which is crucial for weather information. The simplicity of the statement is clear, but it could benefit from more context or detail.


The current weather in Tokyo is 25°C and it is raining.


Testing weather for Barcelona:


The weather in Barcelona is currently 22°C and sunny.


Testing weather for New York:


The current weather in New York is 17°C and cloudy.


Testing weather for London:


The current weather in London is 15°C and foggy.


COMPREHENSIVE WEATHER ANALYSIS:
Generating response with gpt-4o-mini...
Generating response with gpt-4o...
Generating response with gpt-4-turbo...
Comparing all responses...


Compare the weather between Tokyo and Barcelona, and recommend which city would be better for outdoor activities today.

gpt-4o-mini

['gpt-4o-mini', 'gpt-4o', 'gpt-4-turbo']


All Model Responses:


Today, the weather in Tokyo is 25°C with rain, while in Barcelona it is 22°C and sunny. 

Given these conditions, Barcelona would be the better choice for outdoor activities today. The sunny weather and mild temperature in Barcelona are more conducive to enjoying outdoor pursuits compared to the rainy conditions in Tokyo.

Today, Tokyo has a temperature of 25°C with rain, while Barcelona is experiencing sunny weather with a temperature of 22°C. For outdoor activities today, Barcelona would be the better choice given the pleasant weather conditions.

Today, Tokyo is experiencing rain with a temperature of 25°C, while Barcelona has sunny weather with a temperature of 22°C.

For outdoor activities, Barcelona would be the better choice today due to its sunny weather, making it more suitable for spending time outside comfortably. Tokyo's rainy conditions might hinder outdoor activities.


Winner's Reasoning:


Comparison failed: Expecting value: line 1 column 1 (char 0)
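
The `Expecting value: line 1 column 1 (char 0)` message is what `json.loads` raises when the text it receives is empty or does not start with valid JSON, for example when a model wraps its JSON in a Markdown code fence or adds a sentence before it. The output above does not show which of these happened in this run, so the following is only a sketch of a more tolerant parser. `parse_json_reply` is a hypothetical helper, not part of the notebook's code.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep this block readable

def parse_json_reply(text: str):
    """Return the first JSON object found in an LLM reply, or None if there is none."""
    if not text or not text.strip():
        return None
    # Prefer the contents of a fenced block if the model wrapped its answer in one.
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    # Keep only the outermost {...} span in case prose surrounds the object.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(candidate[start:end + 1])
    except json.JSONDecodeError:
        return None

print(parse_json_reply('Sure! {"best_model": "gpt-4o-mini"}'))
```

Returning `None` instead of raising lets the caller decide whether to retry the comparison or report that no ranking is available.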

## Complete Workflow Patterns Analysis

### Summary: All 5 Fundamental Patterns Implemented

**1. ✅ Prompt Chaining (Lab 1)**
- **Implementation**: Basic `run_agent()` function with sequential processing
- **Autonomy Level**: Low (predefined sequence)
- **Key Learning**: Foundation of all other patterns

**2. ✅ Routing (Throughout)**
- **Implementation**: Model selection logic in `model_manager`
- **Autonomy Level**: Medium (router logic makes decisions)
- **Key Learning**: Task-specific model assignment

**3. ✅ Parallelization (Lab 2)**
- **Implementation**: `run_agent_with_multiple_models()` concurrent execution
- **Autonomy Level**: Low (code-controlled distribution)
- **Key Learning**: Speed and redundancy through concurrent processing

**4. ✅ Orchestrator-Worker (Lab 4)**
- **Implementation**: `run_comparative_analysis()` coordination system
- **Autonomy Level**: Medium-High (orchestrator makes coordination decisions)
- **Key Learning**: Complex task decomposition and result integration

**5. ✅ Evaluator-Optimizer (Labs 2-4)**
- **Implementation**: `run_agent_with_evaluation()` validation loops
- **Autonomy Level**: Medium (evaluator makes quality decisions)
- **Key Learning**: Quality control through feedback loops

---

## Pattern Progression and Complexity

**Complexity Ladder:**
1. **Prompt Chaining** → Simple, predictable, foundational
2. **Routing** → Decision-making, specialization
3. **Parallelization** → Concurrency, efficiency  
4. **Evaluator-Optimizer** → Quality control, feedback
5. **Orchestrator-Worker** → Coordination, complex task management

**Autonomy Progression:**
- **Low**: Human-defined sequences (Prompt Chaining, Parallelization)
- **Medium**: LLM decision-making within constraints (Routing, Evaluator-Optimizer)
- **High**: Dynamic coordination and planning (Orchestrator-Worker)

---

## Advanced Technical Architecture

**Core System Components:**
1. **Model Manager**: Multi-provider orchestration (implements Routing)
2. **Evaluation Engine**: Quality control system (implements Evaluator-Optimizer)
3. **Tool Handler**: External function coordination (enables complex workflows)
4. **Analysis Engine**: Comparative coordination (implements Orchestrator-Worker)

**Production-Ready Guardrails:**
- **Structured Validation**: Pydantic model enforcement across all patterns
- **Retry Mechanisms**: Evaluator-Optimizer loops with feedback
- **Quality Assessment**: Automatic evaluation in Labs 2-4
- **Error Handling**: Graceful degradation in all workflow patterns
- **Resource Management**: Timeout and rate limiting controls

---

## From Workflows to True Agents

**What We've Built (Workflows):**
- Predictable execution paths
- Clear control mechanisms
- Defined start and end points
- Human-designed coordination

**Next Steps (True Agents):**
- Open-ended iterative loops
- Dynamic self-modification
- Uncertain execution duration
- Autonomous goal pursuit

**Key Insight:** Workflows are the building blocks that enable true agent behavior. Mastering these 5 patterns provides the foundation for building more autonomous systems in subsequent weeks.

---

## Commercial Applications by Pattern

**Prompt Chaining Applications:**
- Content generation pipelines
- Document processing workflows
- Template-based systems

**Parallelization Applications:**
- A/B testing systems
- Consensus-building platforms
- Risk mitigation through redundancy

**Evaluator-Optimizer Applications:**
- Quality assurance systems
- Content moderation platforms
- Performance optimization tools

**Orchestrator-Worker Applications:**
- Project management systems
- Complex research workflows
- Multi-specialist coordination

**Integration Potential:**
All patterns can be combined for enterprise-grade agentic systems with predictable behavior and robust quality control.

---


## Practical Decision Guide: When to Use Each Pattern

### Pattern Selection Framework

**Choose Prompt Chaining when:**
- ✅ Simple, linear workflow
- ✅ Predictable sequence of operations
- ✅ Need high control and reliability
- ✅ Cost-effectiveness is priority
- ❌ Complex decision-making required

**Choose Routing when:**
- ✅ Different specialized models for different tasks
- ✅ Task classification needed
- ✅ Want to optimize model selection
- ✅ Have domain-specific requirements
- ❌ All tasks similar in nature
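
As a rough illustration of routing, here is a toy router that classifies a task by keyword and picks a model for it. The routing table, the keyword lists, and the function names are all assumptions made for this sketch; the notebook's `model_manager` may work quite differently, and a production router would more likely use an LLM or a trained classifier than substring matching.

```python
# Hypothetical routing table: task category -> model name.
ROUTES = {
    "code": "gpt-4o",
    "weather": "gpt-4o-mini",
    "general": "gpt-4o-mini",
}

# Hypothetical keyword lists; substring matching is crude (e.g. "rain" matches "training").
KEYWORDS = {
    "code": ("python", "bug", "function", "stack trace"),
    "weather": ("weather", "temperature", "rain", "forecast"),
}

def classify(task: str) -> str:
    """Return the first category whose keywords appear in the task, else 'general'."""
    text = task.lower()
    for category, words in KEYWORDS.items():
        if any(word in text for word in words):
            return category
    return "general"

def route(task: str) -> str:
    """Pick the model assigned to the task's category."""
    return ROUTES[classify(task)]

print(route("What's the weather in Tokyo?"))
```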

**Choose Parallelization when:**
- ✅ Speed is critical
- ✅ Need redundancy for reliability
- ✅ Want consensus or comparison
- ✅ Tasks are independent
- ❌ Sequential dependencies exist
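
Because the tasks are independent, they can be submitted to a thread pool and collected as they finish. The sketch below assumes a hypothetical `fake_model_call` in place of a real API request and is not the notebook's `run_agent_with_multiple_models()`; it only shows the shape of the pattern, including per-task error capture so one failing model does not sink the others.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_model_call(model: str, prompt: str) -> str:
    """Hypothetical stand-in for an API call; a real one would be network-bound."""
    return f"[{model}] answer to: {prompt}"

def run_in_parallel(prompt, models, call=fake_model_call, timeout=30):
    """Send the same prompt to every model concurrently; collect results and errors."""
    responses, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {pool.submit(call, model, prompt): model for model in models}
        for future in as_completed(futures, timeout=timeout):
            model = futures[future]
            try:
                responses[model] = future.result()
            except Exception as exc:
                errors[model] = str(exc)
    return responses, errors

responses, errors = run_in_parallel(
    "What is 2+2?", ["gpt-4o-mini", "gpt-4o", "gpt-4-turbo"]
)
print(sorted(responses))
```

Threads suit this case because API calls spend most of their time waiting on the network.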

**Choose Evaluator-Optimizer when:**
- ✅ Quality control is critical
- ✅ Iterative improvement needed
- ✅ Production reliability required
- ✅ Cost of failure is high
- ❌ Speed is more important than quality
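
The feedback loop itself is small. Below is a sketch under stated assumptions: `ScoredEvaluation`, `generate_with_evaluation`, and the two fake functions are hypothetical names, the generator and evaluator are plain callables rather than LLM calls, and the threshold of 7 mirrors the `evaluation.score < 7` check used earlier in this notebook.

```python
from dataclasses import dataclass

@dataclass
class ScoredEvaluation:
    score: int      # 0-10, like the scores printed in this notebook
    feedback: str

def generate_with_evaluation(question, generate, evaluate, min_score=7, max_retries=2):
    """Generate, evaluate, and retry with feedback until the score is good enough."""
    feedback = None
    for attempt in range(max_retries + 1):
        response = generate(question, feedback)
        evaluation = evaluate(question, response)
        if evaluation.score >= min_score:
            break
        feedback = evaluation.feedback      # feed the critique into the next attempt
    return {"response": response, "evaluation": evaluation, "attempts": attempt + 1}

# Fake generator and evaluator so the loop can be exercised offline.
def fake_generate(question, feedback):
    if feedback is None:
        return "4"
    return "2 + 2 = 4, because adding two and two gives four."

def fake_evaluate(question, response):
    if len(response) < 10:
        return ScoredEvaluation(score=4, feedback="Explain the reasoning.")
    return ScoredEvaluation(score=9, feedback="Clear and complete.")

result = generate_with_evaluation("What is 2+2?", fake_generate, fake_evaluate)
print(result["attempts"], result["evaluation"].score)
```

Capping the loop with `max_retries` keeps cost and latency bounded, which is the main trade-off this pattern makes against speed.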

**Choose Orchestrator-Worker when:**
- ✅ Complex, multi-step workflows
- ✅ Dynamic task decomposition needed
- ✅ Resource coordination required
- ✅ Flexible execution paths desired
- ❌ Simple, straightforward tasks

---

### Combining Patterns

**Successful Pattern Combinations:**
- **Parallelization + Evaluator-Optimizer**: Multi-model comparison with quality control
- **Routing + Orchestrator-Worker**: Smart task assignment with complex coordination
- **Prompt Chaining + Evaluator-Optimizer**: Sequential processing with validation
- **All Patterns Together**: Enterprise-grade agentic systems

**Pattern Interaction Benefits:**
- **Robustness**: Multiple quality control mechanisms
- **Efficiency**: Optimized resource utilization
- **Flexibility**: Adaptable to different scenarios
- **Scalability**: Can handle increasing complexity

---

## Web Interface with Gradio

Now let's launch the web interface that brings everything together!

In [20]:
# Launch the Advanced Web Interface
from week1_foundations.interface import launch_interface
import socket

def find_free_port(start_port=7860, max_port=7870):
    """Return the first free port in [start_port, max_port], or None if all are taken."""
    for port in range(start_port, max_port + 1):
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                s.bind(('localhost', port))
                return port
        except OSError:
            continue
    return None

# Launch in notebook (inline)
print("Starting Advanced AI Agent Web Interface...")
print("Features available:")
print("   - Simple Chat")
print("   - Chat with Evaluation") 
print("   - Multi-Model Comparison")
print("   - System Status")

# Find an available port
free_port = find_free_port()

if free_port:
    print(f"\n🚀 Launching interface on port {free_port}...")
    print("Click the link below to access the interface!")
    
    try:
        # Launch with share=False for local use, share=True for public link
        launch_interface(share=False, port=free_port)
        print(f"✅ Interface launched successfully!")
        print(f"🌐 Access at: http://localhost:{free_port}")
    except Exception as e:
        print(f"❌ Failed to launch interface: {e}")
        print("💡 You can run the interface manually with:")
        print(f"   python src/week1_foundations/app.py --mode web --port {free_port}")
else:
    print("❌ No free ports found in range 7860-7870")
    print("💡 You can run the interface manually with:")
    print("   python src/week1_foundations/app.py --mode web")

# Note: The interface will open in a new tab


Starting Advanced AI Agent Web Interface...
Features available:
   - Simple Chat
   - Chat with Evaluation
   - Multi-Model Comparison
   - System Status

🚀 Launching interface on port 7862...
Click the link below to access the interface!
* Running on local URL:  http://127.0.0.1:7862
* To create a public link, set `share=True` in `launch()`.


✅ Interface launched successfully!
🌐 Access at: http://localhost:7862


In [17]:
# Test imports and basic functionality
print("Testing corrected imports and functionality...")

try:
    # Test basic agent functionality
    test_response = run_agent("Hello, this is a test")
    print(f"✅ Basic agent test successful")
    print(f"Response preview: {test_response[:100]}...")
    
    # Test evaluation system
    test_eval_result = run_agent_with_evaluation("What is 2+2?", max_retries=1)
    print(f"✅ Evaluation system test successful")
    print(f"Score: {test_eval_result['evaluation'].score}/10")
    
    print("\nAll tests passed! The system is working correctly.")
    
except Exception as e:
    print(f"❌ Error during testing: {e}")
    import traceback
    traceback.print_exc()


Testing corrected imports and functionality...
✅ Basic agent test successful
Response preview: Hello! This is a response to your test. How can I assist you today?...
✅ Evaluation system test successful
Score: 10/10

All tests passed! The system is working correctly.


## Agentic AI Frameworks

![](../img/10.png)

Today we are going to look at tools and autonomy. But before we get there, I want to talk about **agentic AI frameworks**, which may already be front of mind for you.

There are a lot of these frameworks to pick from. They are designed to give you *glue code* or *abstraction code* that takes away some of the detail of interacting with LLMs and gives you an elegant framework for building agentic solutions and focusing on the business problem you're solving.

New ones come up all the time, so it's quite hard to stay on top of everything. I just want to quickly orient you, show you the landscape, and explain how the ones we’ll tackle in this course fit into the bigger picture.

### **Levels of Complexity in Frameworks**

It’s worth pointing out that there are **different levels of complexity**, each with pros and cons:


#### 🟢 **Bottom Layer: No Framework at All**

The simplest approach is to **use no agentic AI framework**—just connect directly to LLM APIs, as we did in the last lab. You orchestrate everything yourself and control prompts in detail.
That’s what we’ll be doing this week.

* **Example:** Anthropic, in their blog *"Building Effective Agents"*, strongly advocates for this.
* **Benefit:** Simplicity, transparency, full control over the prompt and logic.

#### 🟢 **MCP – Model Context Protocol**

Alongside "no framework", there’s something called **MCP (Model Context Protocol)**, created by Anthropic. It’s **not a framework**, but a **protocol**—a way for models to connect to data and tools using a common standard.

* **Open-source**, simple, elegant.
* No glue code needed if you conform to the protocol.
* Groups naturally with the "no framework" approach.


#### 🟡 **Middle Layer: Lightweight Frameworks**

Two excellent and lightweight frameworks sit at this level:

1. **OpenAI Agents SDK**

   * Simple, clean, flexible.
   * One of my favorites.
   * We'll be using it next week.
   * Still evolving (API updates may break things!).

2. **Crew AI**

   * Also lightweight and easy to use.
   * Has a **low-code angle** (uses YAML config).
   * A bit heavier than the OpenAI Agents SDK but powerful.

#### 🔴 **Top Layer: Heavyweight Frameworks**

These bring greater complexity—and power:

1. **LangGraph** (from the makers of LangChain)

   * Builds computational graphs from agents/tools.
   * High sophistication, steep learning curve.
   * Heavy abstractions and strong ecosystem lock-in.
   * Becomes a “LangGraph project” more than an agentic one.

2. **Autogen** (from Microsoft)

   * Really a set of components.
   * Also complex and powerful.
   * Like LangGraph, imposes structure and terminology.

With both LangGraph and Autogen, you become part of their ecosystem. It’s very different from the lightweight tools where you still feel like you're directly working with LLMs.

### **Framework Selection Criteria**

There are **many other frameworks**, but these cover a wide range of styles and complexity. Choosing the right one depends on:

* Your **use case**
* Your **personal preference**
* Your **team’s skill set**
* Your **tolerance for abstraction**
* The **type of business problems** you're solving

My **bias** leans toward **lightweight, flexible tools** that stay out of your way—but I also **appreciate the power** of more structured ones.


## Resources


Before we dive into tool use, let’s start with **resources**—a simple but powerful way to improve the performance of your agents.

Resources are essentially **additional context or information** that you provide to an LLM to help it solve a problem more effectively. Alongside tools (which we’ll explore in more detail soon), resources are key to getting more from your agents.

At the simplest level, using a resource just means **adding relevant information to your prompt**. For example, if your LLM is acting as a customer support agent for an airline, you might include current ticket prices directly in the prompt. Then, when the user asks a question—such as “How much is a flight to Paris?”—the LLM can refer to that embedded context to provide an answer. That’s all a resource is: **extra information provided through the prompt**.

But we can do better than blindly adding all the data. Instead of dumping everything into the prompt, we can use smarter techniques to **select only the relevant information** for each question. That’s where **RAG (Retrieval-Augmented Generation)** comes in.

RAG is about identifying and retrieving the most useful pieces of information and inserting them into the prompt automatically. Sometimes this even involves using another LLM to help decide what context is most relevant. While RAG is a deep and active area of research (covered in more detail in other courses), the principle is simple: it’s all about finding and injecting the right resource into the prompt at the right time.
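
To make the idea concrete, here is a deliberately tiny retrieval step that picks the most relevant snippet by word overlap and injects it into the prompt. Real RAG systems typically use embeddings and a vector store instead. Everything here is an assumption for illustration: the function names, the snippets, and the made-up prices.

```python
import re

# Hypothetical airline knowledge snippets; the prices are invented for illustration.
DOCUMENTS = [
    "A return ticket to Paris costs $499.",
    "A return ticket to Tokyo costs $1,400.",
    "Checked baggage allowance is 23 kg per passenger.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, documents):
    """Insert only the retrieved context into the prompt, not every document."""
    context = "\n".join(retrieve(question, documents))
    return f"Use this context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How much is a flight to Paris?", DOCUMENTS)
print(prompt)
```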

We’ll work with resources in practice today. But first, let’s introduce the second key concept: **tools**.

![](../img/11.png)

### Tools in Agentic AI

**What Are Tools?**

![](../img/12.png)

We’ve mentioned tools a few times already, and they are truly central to agentic AI. Tools give LLMs the ability to perform actions on their own—like running a SQL query, calling another model, or interacting with external systems. This is one of the essential steps toward giving an LLM autonomy.

The idea of giving a model access to a tool might sound strange at first—almost unsettling. You might imagine the model somehow reaching into your machine or database to run a query.

What it feels like — in theory:

![](../img/13.png)

In this imagined model, the LLM sends a prompt and immediately executes some function in your system—reaching into your machine directly. But this isn't how it works in practice.

What actually happens — in practice:

![](../img/14.png)

In reality, your code sends a prompt to the LLM, and you tell the model what tools are available. You ask the LLM to respond in a structured format (usually JSON) if it wants to use one of them.

Your code reads the response, and if the LLM says "use tool X", your system runs the tool on its behalf. Then you send a second prompt to the LLM including the results. The LLM never executes anything directly—it only suggests actions.

So, tool calling is really a structured interaction between:
- the LLM's suggestion (in JSON),
- your system's logic (if statements or equivalent), and
- the external tool that performs the task.
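
Those three pieces fit together as in the sketch below. It uses a hypothetical `fake_llm` in place of a real model, a simplified message format of my own (not the exact OpenAI schema), and an invented `get_ticket_price` tool with made-up prices, so treat it as an illustration of the control flow only.

```python
import json

def get_ticket_price(city: str) -> str:
    """Hypothetical tool; the prices are invented for illustration."""
    prices = {"paris": "$499", "tokyo": "$1,400"}
    return prices.get(city.lower(), "unknown")

TOOLS = {"get_ticket_price": get_ticket_price}

def fake_llm(messages):
    """Stand-in for a model: asks for a tool first, then answers using its result."""
    last = messages[-1]
    if last["role"] == "tool":
        return f"A flight to Paris costs {last['content']}."
    return json.dumps({"tool": "get_ticket_price", "arguments": {"city": "Paris"}})

def run_turn(user_message, llm=fake_llm):
    messages = [{"role": "user", "content": user_message}]
    reply = llm(messages)
    try:
        request = json.loads(reply)         # the model's suggestion, as JSON
    except json.JSONDecodeError:
        return reply                        # plain text: no tool was requested
    if not isinstance(request, dict) or request.get("tool") not in TOOLS:
        return reply
    # Our code, not the model, runs the tool.
    result = TOOLS[request["tool"]](**request.get("arguments", {}))
    messages.append({"role": "tool", "content": result})
    return llm(messages)                    # second call, now including the result

answer = run_turn("How much is a flight to Paris?")
print(answer)
```

The lookup in `TOOLS` plays the role of the "if statements": the model can only trigger functions your code has chosen to expose.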

**A Real Example**

Let’s make it concrete with an example using GPT-4:

![](../img/15.png)

Here, the prompt told GPT-4 that it was a support agent for an airline and had the ability to fetch ticket prices. When the user asked, “How much is a flight to Paris?”, the model simply responded:

> Use tool to fetch ticket price for Paris.

That’s all. Your code interprets this output, runs the appropriate query, and returns the answer to the model.

**Summary**

While it may feel like the LLM has autonomy, in practice, tool use is just a smart combination of:
- clear prompting,
- structured responses (e.g., JSON),
- and logic in your own application to interpret and act on those responses.

Today’s lab will focus on resources, and in the next session we’ll dive into tool use, building directly on what we’ve seen here.

---

## lab3

We're going to build something with immediate value!

* In the folder `me` I've put a single file `linkedin.pdf` - it's a PDF download of my LinkedIn profile.
* Please replace it with yours!
* I've also made a file called `summary.txt`
* We're not going to use Tools just yet - we're going to add the tool tomorrow.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td>
            <h5 style="color:blue;">Looking up packages</h5>
            <span style="color:blue;">In this lab, we're going to use the wonderful Gradio package for building quick UIs, 
            and we're also going to use the popular PyPDF PDF reader. You can get guides to these packages by asking 
            ChatGPT or Claude, and you can find most open-source Python packages on the repository <a href="https://pypi.org">https://pypi.org</a>.
            </span>
        </td>
    </tr>
</table>

In [21]:
from dotenv import load_dotenv
from openai import OpenAI
from pypdf import PdfReader
import gradio as gr

# Load environment variables
load_dotenv(override=True)
openai = OpenAI()

# Read PDF LinkedIn profile
reader = PdfReader("me/linkedin.pdf")
linkedin = ""
for page in reader.pages:
    text = page.extract_text()
    if text:
        linkedin += text

# Load summary
with open("me/summary.txt", "r", encoding="utf-8") as f:
    summary = f.read()

# Build system prompt
name = "Ed Donner"
system_prompt = (
    f"You are acting as {name}. You are answering questions on {name}'s website, "
    f"particularly questions related to {name}'s career, background, skills and experience. "
    f"Your responsibility is to represent {name} for interactions on the website as faithfully as possible. "
    f"Be professional and engaging, as if talking to a potential client or future employer who came across the website.\n\n"
    f"### Summary\n{summary}\n\n"
    f"### LinkedIn Profile\n{linkedin}\n\n"
    f"Now chat with the user, staying in character as {name}."
)

# Define chat function
def chat(message, history):
    try:
        # Gradio's history holds only user/assistant turns, so prepend the system prompt on every call
        messages = [{"role": "system", "content": system_prompt}] + history + [{"role": "user", "content": message}]
        response = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            temperature=0.7
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"An error occurred: {e}"

# Launch Gradio chat interface
gr.ChatInterface(chat, type="messages").launch()


* Running on local URL:  http://127.0.0.1:7863
* To create a public link, set `share=True` in `launch()`.




**A lot is about to happen...**

1. Be able to ask an LLM to evaluate an answer
2. Be able to rerun if the answer fails evaluation
3. Put this together into 1 workflow

All without any Agentic framework!

We continue with a more advanced example: an automatically evaluated conversational agent, ideal for situations where:
* An agent publicly represents a person (e.g., Ed Donner on his personal website).
* Quality control of responses is necessary, without human supervision.
* A professional and reliable experience is desired, as if it were a real assistant or spokesperson.


**The system**

* Loads a person's profile (Ed Donner) from their written summary and a LinkedIn PDF.
* Uses **GPT-4o-mini** to act as Ed Donner, answering user questions.
* Uses **GPT-3.5-turbo** as a **quality evaluator** that decides whether the response is acceptable.
* If the response is not acceptable, it regenerates a new one by giving GPT-4o-mini the evaluator's **feedback**.
* The evaluator must return a JSON like:

```json
{
  "is_acceptable": true,
  "feedback": "The answer was clear and professional."
}
```

* If parsing or evaluation fails, the response is considered acceptable by default (tolerant fallback).

In [22]:
# Create a Pydantic model for the Evaluation

from dotenv import load_dotenv
from openai import OpenAI
from pypdf import PdfReader
from pydantic import BaseModel
import gradio as gr
import json
import os

# SAME SETUP AS THE PREVIOUS CELL (kept for reference; uncomment to run this cell standalone)
###################################################
# # Load environment
# load_dotenv(override=True)
# openai = OpenAI()

# # Load PDF profile
# reader = PdfReader("me/linkedin.pdf")
# linkedin = ""
# for page in reader.pages:
#     text = page.extract_text()
#     if text:
#         linkedin += text

# # Load summary
# with open("me/summary.txt", "r", encoding="utf-8") as f:
#     summary = f.read()

# # Set the agent's name
# name = "Ed Donner"

# # Agent system prompt
# system_prompt = (
#     f"You are acting as {name}. You are answering questions on {name}'s website, "
#     f"particularly questions related to {name}'s career, background, skills and experience. "
#     f"Your responsibility is to represent {name} for interactions on the website as faithfully as possible. "
#     f"Be professional and engaging, as if talking to a potential client or future employer who came across the website.\n\n"
#     f"### Summary\n{summary}\n\n"
#     f"### LinkedIn Profile\n{linkedin}\n\n"
#     f"Now chat with the user, staying in character as {name}."
# )

# Define evaluator format using Pydantic
class Evaluation(BaseModel):
    is_acceptable: bool
    feedback: str

# System prompt for the evaluator model
evaluator_system_prompt = (
    f"You are an evaluator deciding whether a response is acceptable. "
    f"The Agent represents {name} and has been instructed to be professional and engaging. "
    f"The Agent has context from the following info:\n\n"
    f"### Summary\n{summary}\n\n"
    f"### LinkedIn Profile\n{linkedin}\n\n"
    f"Given a conversation, the user message, and the Agent's latest reply, "
    f"evaluate whether the reply is acceptable.\n\n"
    f"Reply in JSON format like this:\n"
    f'{{"is_acceptable": true, "feedback": "Your explanation was clear and accurate."}}'
)

# Build user prompt for evaluator
def evaluator_user_prompt(reply, message, history):
    user_prompt = f"Conversation history:\n\n"
    for turn in history:
        if turn["role"] == "user":
            user_prompt += f"User: {turn['content']}\n"
        elif turn["role"] == "assistant":
            user_prompt += f"Agent: {turn['content']}\n"
    user_prompt += f"\nLatest user message:\n{message}\n\n"
    user_prompt += f"Agent's latest reply:\n{reply}\n\n"
    user_prompt += "Evaluate the Agent's response based on clarity, helpfulness, and professionalism."
    return user_prompt

# Evaluation function using gpt-3.5
def evaluate(reply, message, history) -> Evaluation:
    messages = [
        {"role": "system", "content": evaluator_system_prompt},
        {"role": "user", "content": evaluator_user_prompt(reply, message, history)}
    ]
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        temperature=0
    )
    content = response.choices[0].message.content
    try:
        data = json.loads(content)
        return Evaluation(**data)
    except Exception as e:
        print("Evaluation model returned invalid JSON:", content)
        return Evaluation(is_acceptable=True, feedback="(Bypassed invalid evaluation)")

# Rerun with feedback if evaluation fails
def rerun(reply, message, history, feedback):
    updated_prompt = (
        system_prompt +
        "\n\n## Your previous response was rejected.\n"
        f"### Attempted answer:\n{reply}\n\n"
        f"### Reason for rejection:\n{feedback}\n\n"
        "Please revise your answer accordingly."
    )
    messages = [{"role": "system", "content": updated_prompt}] + history + [{"role": "user", "content": message}]
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )
    return response.choices[0].message.content

# Main chat function
def chat(message, history):
    messages = [{"role": "system", "content": system_prompt}] + history + [{"role": "user", "content": message}]
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )
    reply = response.choices[0].message.content

    evaluation = evaluate(reply, message, history)

    if evaluation.is_acceptable:
        print("✅ Passed evaluation")
        return reply
    else:
        print("❌ Failed evaluation, retrying...")
        print("Feedback:", evaluation.feedback)
        new_reply = rerun(reply, message, history, evaluation.feedback)
        return new_reply

# Launch Gradio interface
gr.ChatInterface(chat, type="messages").launch()



* Running on local URL:  http://127.0.0.1:7864
* To create a public link, set `share=True` in `launch()`.





### Key Components and Functions

##### 1. `system_prompt`

This is the *base prompt* that gives GPT-4o-mini the role of “Ed Donner.” It includes:

* Behavioral instructions (professional, engaging, etc.)
* His professional summary (`summary`)
* His full LinkedIn profile from the PDF (`linkedin`)

> It serves as persistent context to ensure the model behaves consistently.


##### 2. `chat(message, history)`

This is the main function connected to the chat interface:

* Builds the messages for GPT-4o-mini using the history and system prompt
* Sends the request and gets the initial `reply`
* Evaluates the reply using `evaluate()`
* If the evaluation fails, regenerates a new answer via `rerun()` with feedback
* Returns the final response (a regenerated answer is not evaluated a second time)

##### 3. `evaluate(reply, message, history)`

This function uses a different model (**GPT-3.5-turbo**) to judge the quality of the agent’s reply. It takes:

* The conversation history
* The latest user message
* The agent’s most recent reply

The evaluator is expected to return a JSON like:

```json
{
  "is_acceptable": true,
  "feedback": "The answer was clear and professional."
}
```

If parsing fails or the JSON is malformed, the system treats the reply as acceptable by default (tolerant fallback).

##### 4. `rerun(reply, message, history, feedback)`

If the reply is rejected, this function:

* Adds a section to the system prompt with the **evaluator’s feedback**
* Tells GPT-4o-mini: *“Your response was rejected, here’s why. Please revise it.”*
* Regenerates a new answer, taking the feedback into account

##### 5. `gr.ChatInterface(chat)`

This launches a simple local web interface where users can:

* Ask questions as if they were speaking with Ed Donner
* Receive professional replies
* (Behind the scenes) get answers that have been automatically evaluated and, if needed, improved


##### **Real-World Use Cases**

✅ **Professional Representation (as in this lab)** An executive or professional (like Ed Donner) wants a website where people can ask them questions and the AI responds on their behalf, without talking nonsense or sounding robotic.

✅ **Automated Customer Support with Quality Control** A company wants automatic responses, but **screened by another model before being shown** to users—to avoid errors, rude tone, or hallucinated content.

✅ **AI Agent Training** You can use this pattern to **train conversational agents**: let one model generate answers while another evaluates their quality, refining them over time.

**Why Use Two Models?**

Because it separates the **creator role** from the **critic role**:

* **GPT-4o** generates answers (more creative and up-to-date)
* **GPT-3.5** evaluates (cheaper, faster, and sufficient for basic quality checks)

This closely resembles the **Evaluator-Optimizer pattern** used in LLM agent architectures.



---

## lab4

### 🤖 Conversational Agent + Tool Use + Real-Time Notifications

In this lab, we build a smart assistant that doesn't just generate text responses — it can also take **action**.

This is an **advanced example of a real-world conversational agent** that combines:

- A personalized AI agent that represents a person (Ed Donner), using `gpt-4o-mini` in the code below
- Real-time notifications to your phone using [Pushover](https://pushover.net)
- Tool use via OpenAI’s function calling mechanism
- A no-framework design (no LangChain, no Autogen), using only Python + OpenAI API


### What’s a “tool” in this context?

A **tool** is an external function the AI can call during a conversation.

For example, the AI might decide:
- “I don’t know the answer to this question — I’ll log it with `record_unknown_question()`.”
- “The user wants to stay in touch — I’ll log their email with `record_user_details()`.”

OpenAI allows you to define functions the model can invoke. The model decides *if and when* to use them, based on the conversation.

You don’t tell the model directly what to call — **you let it reason and act**.


### 🌍 Real-World Use Cases

* ✅ **Professional representation**  
A business leader or public figure (like Ed Donner) has a website where users can chat with their AI assistant. The assistant behaves professionally and notifies the real person when needed.
* ✅ **Smart lead capture**  
If a user shows interest or shares their email, the AI records this with a tool and triggers a real-time alert to your phone.
* ✅ **Customer support that learns**  
If a user asks a question the AI cannot answer, it logs the question for review — helping you improve coverage.
* ✅ **Agentic AI Prototypes**  
This pattern is the foundation of future tools where AIs don’t just chat — they act, execute, and integrate into your operations.


### But first: introducing Pushover

Pushover is a nifty tool for sending Push Notifications to your phone.

It's super easy to set up and install!

Simply visit https://pushover.net/ and click 'Login or Signup' on the top right to sign up for a free account, and create your API keys.

Once you've signed up, on the home screen, click "Create an Application/API Token", and give it any name (like Agents) and click Create Application.

Then add 2 lines to your `.env` file:

PUSHOVER_USER=_put the key that's on the top right of your Pushover home screen and probably starts with a u_  
PUSHOVER_TOKEN=_put the key when you click into your new application called Agents (or whatever) and probably starts with an a_

Finally, click "Add Phone, Tablet or Desktop" to install on your phone.
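Before running the lab, it can help to confirm that both keys are actually visible to Python. The sketch below is one way to do that; the helper name `missing_pushover_settings` is hypothetical, and it assumes `load_dotenv()` has already been called so that the `.env` values are in the environment.

```python
import os

REQUIRED_PUSHOVER_VARS = ("PUSHOVER_USER", "PUSHOVER_TOKEN")

def missing_pushover_settings(env=None):
    """Return the names of required Pushover variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_PUSHOVER_VARS if not env.get(name)]

missing = missing_pushover_settings()
if missing:
    print(f"Missing Pushover settings: {', '.join(missing)}")
else:
    print("Pushover settings found")
```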

In [None]:
# Lab 4 — Professionally You! (Code Cell)

from dotenv import load_dotenv
from openai import OpenAI
import json
import os
import requests
from pypdf import PdfReader
import gradio as gr

# Load environment variables
load_dotenv(override=True)
openai = OpenAI()

# Setup Pushover (real-time notifications)
pushover_user = os.getenv("PUSHOVER_USER")
pushover_token = os.getenv("PUSHOVER_TOKEN")
pushover_url = "https://api.pushover.net/1/messages.json"

def push(message):
    print(f"Push: {message}")
    payload = {"user": pushover_user, "token": pushover_token, "message": message}
    requests.post(pushover_url, data=payload)

# Define tools
def record_user_details(email, name="Name not provided", notes="not provided"):
    push(f"Interest from {name} ({email}). Notes: {notes}")
    return {"recorded": "ok"}

def record_unknown_question(question):
    push(f"Unknown question received: {question}")
    return {"recorded": "ok"}

# Tool JSON schemas
record_user_details_json = {
    "name": "record_user_details",
    "description": "Use this tool to record a user who provided an email and might want follow-up.",
    "parameters": {
        "type": "object",
        "properties": {
            "email": {"type": "string", "description": "User's email address"},
            "name": {"type": "string", "description": "User's name, if available"},
            "notes": {"type": "string", "description": "Context or notes from the conversation"}
        },
        "required": ["email"],
        "additionalProperties": False
    }
}

record_unknown_question_json = {
    "name": "record_unknown_question",
    "description": "Use this tool when the agent doesn't know how to answer a question.",
    "parameters": {
        "type": "object",
        "properties": {
            "question": {"type": "string", "description": "The unanswerable question"}
        },
        "required": ["question"],
        "additionalProperties": False
    }
}

tools = [
    {"type": "function", "function": record_user_details_json},
    {"type": "function", "function": record_unknown_question_json}
]

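# Map the tool names the model sends back to the Python functions that implement them.
# Keys must match the "name" field in the JSON schemas; values must be callables.
ALLOWED_TOOLS = {
    "record_user_details": record_user_details,
    "record_unknown_question": record_unknown_question
}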

def handle_tool_calls(tool_calls):
    """
    This function can take a list of tool calls,
    and run them. This is the IF statement!!
    """
    results = []
    for tool_call in tool_calls:
        tool_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)
        print(f"Tool called: {tool_name}", flush=True)
        # THE BIG IF STATEMENT!!!
        tool = ALLOWED_TOOLS.get(tool_name)
        result = tool(**arguments) if tool else {}
        results.append({
            "role": "tool",
            "content": json.dumps(result),
            "tool_call_id": tool_call.id
        })
    return results

# Load Ed Donner's profile
reader = PdfReader("me/linkedin.pdf")
linkedin = "".join(page.extract_text() or "" for page in reader.pages)

with open("me/summary.txt", "r", encoding="utf-8") as f:
    summary = f.read()

name = "Ed Donner"

# System prompt for the assistant
system_prompt = f"""
You are acting as {name}. You are answering questions on {name}'s website,
particularly about {name}'s career, background, skills, and experience.
Be professional and engaging, as if talking to a potential client or employer.

You have access to tools:
- If you can't answer a question, use `record_unknown_question`.
- If a user shows interest, ask for their email and log it with `record_user_details`.

### Summary
{summary}

### LinkedIn Profile
{linkedin}

With this context, please chat with the user, staying in character as {name}.
"""

# Chat logic
def chat(message, history):
    messages = [{"role": "system", 
                 "content": system_prompt}] + history + [{"role": "user", 
                                                          "content": message}]
    done = False
    while not done:
        response = openai.chat.completions.create(model="gpt-4o-mini", 
                                                  messages=messages, 
                                                  tools=tools)
        finish_reason = response.choices[0].finish_reason
        if finish_reason == "tool_calls":
            tool_calls = response.choices[0].message.tool_calls
            messages.append(response.choices[0].message)
            messages.extend(handle_tool_calls(tool_calls))
        else:
            done = True
    return response.choices[0].message.content

# Launch chat UI
gr.ChatInterface(chat, type="messages").launch()




* Running on local URL:  http://127.0.0.1:7866
* To create a public link, set `share=True` in `launch()`.




This next part is the most important, and probably the most complex. We send these JSON schemas to the LLM along with the conversation. When the model generates its response, it can either answer directly or indicate that it wants to run one of these tools.

In [27]:
# these are the tools
tools

[{'type': 'function',
  'function': {'name': 'record_user_details',
   'description': 'Use this tool to record a user who provided an email and might want follow-up.',
   'parameters': {'type': 'object',
    'properties': {'email': {'type': 'string',
      'description': "User's email address"},
     'name': {'type': 'string', 'description': "User's name, if available"},
     'notes': {'type': 'string',
      'description': 'Context or notes from the conversation'}},
    'required': ['email'],
    'additionalProperties': False}}},
 {'type': 'function',
  'function': {'name': 'record_unknown_question',
   'description': "Use this tool when the agent doesn't know how to answer a question.",
   'parameters': {'type': 'object',
    'properties': {'question': {'type': 'string',
      'description': 'The unanswerable question'}},
    'required': ['question'],
    'additionalProperties': False}}}]

### Behind the Scenes

- GPT-4o-mini (`gpt-4o-mini` in the code) is the main agent, answering questions and calling tools as needed.
- Pushover handles real-time push notifications to your mobile device.
- Tool calls are described in JSON schemas and passed to the LLM.
- If the LLM requests a tool, your code executes the real Python function and sends the result back.
- All done using native OpenAI APIs — no external agent framework.

---

### 🔁 Summary of Workflow

1. User sends a message
2. The model (`gpt-4o-mini`) either responds directly or decides to call a tool
3. If a tool is called, your Python function runs
4. Tool results are passed back into the conversation
5. The LLM continues naturally

This is a minimal, working example of **tool-augmented agentic AI**, running in a single loop.
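To see the five steps without calling the API, here is a minimal offline sketch. `fake_model`, `TOOL_REGISTRY`, and `run_agent_loop` are hypothetical names, the tool is a stand-in that sends no push notification, and the assistant message is simplified compared with what the OpenAI client returns.

```python
import json
from types import SimpleNamespace

def record_unknown_question(question):
    # Stand-in for the lab's tool; the real one also sends a push notification.
    return {"recorded": "ok", "question": question}

# Explicit registry: only functions listed here can be run on the model's request.
TOOL_REGISTRY = {"record_unknown_question": record_unknown_question}

def fake_model(messages):
    """Hypothetical stand-in for the LLM: requests a tool once, then answers."""
    if any(m.get("role") == "tool" for m in messages):
        return SimpleNamespace(finish_reason="stop",
                               content="I've noted your question.",
                               tool_calls=None)
    call = SimpleNamespace(
        id="call_1",
        name="record_unknown_question",
        arguments=json.dumps({"question": messages[-1]["content"]}),
    )
    return SimpleNamespace(finish_reason="tool_calls", content=None, tool_calls=[call])

def run_agent_loop(user_message, model=fake_model, max_turns=5):
    messages = [{"role": "user", "content": user_message}]      # step 1
    for _ in range(max_turns):
        response = model(messages)                               # step 2
        if response.finish_reason != "tool_calls":
            return response.content, messages                    # step 5
        # Simplified record of the assistant's tool request.
        messages.append({"role": "assistant", "content": None,
                         "tool_calls": [c.id for c in response.tool_calls]})
        for call in response.tool_calls:
            tool = TOOL_REGISTRY.get(call.name)
            result = (tool(**json.loads(call.arguments))         # step 3
                      if tool else {"error": "unknown tool"})
            messages.append({"role": "tool",                     # step 4
                             "tool_call_id": call.id,
                             "content": json.dumps(result)})
    raise RuntimeError("Model kept requesting tools; giving up.")

answer, transcript = run_agent_loop("What is your favourite colour?")
print(answer)
```

The `max_turns` cap is a design choice added for this sketch: it stops the loop if a model keeps requesting tools indefinitely.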

## And now for deployment `app.py`

I hope you enjoyed that. Next, it's your turn: deploy this live to production so that you can serve your own avatar on your personal website. If you've followed everything so far, congratulations. If you haven't, that's fine too: it's a great opportunity to work through the material until it clicks, and if I can help, email me or message me on LinkedIn.

What I want to show you now is how to deploy this application for yourself so that it can serve as your virtual resume. I believe this is the future of resumes: rather than a profile or CV that lists your skills and experience, you'll have a chatbot that people can interact with to learn about your career. And what better way to highlight your abilities with agentic AI than an agentic solution on your own website that talks to visitors about your career?

In [28]:
from dotenv import load_dotenv
from openai import OpenAI
import json
import os
import requests
from pypdf import PdfReader
import gradio as gr


load_dotenv(override=True)

def push(text):
    requests.post(
        "https://api.pushover.net/1/messages.json",
        data={
            "token": os.getenv("PUSHOVER_TOKEN"),
            "user": os.getenv("PUSHOVER_USER"),
            "message": text,
        }
    )


def record_user_details(email, name="Name not provided", notes="not provided"):
    push(f"Recording {name} with email {email} and notes {notes}")
    return {"recorded": "ok"}

def record_unknown_question(question):
    push(f"Recording {question}")
    return {"recorded": "ok"}

record_user_details_json = {
    "name": "record_user_details",
    "description": "Use this tool to record that a user is interested in being in touch and provided an email address",
    "parameters": {
        "type": "object",
        "properties": {
            "email": {
                "type": "string",
                "description": "The email address of this user"
            },
            "name": {
                "type": "string",
                "description": "The user's name, if they provided it"
            }
            ,
            "notes": {
                "type": "string",
                "description": "Any additional information about the conversation that's worth recording to give context"
            }
        },
        "required": ["email"],
        "additionalProperties": False
    }
}

record_unknown_question_json = {
    "name": "record_unknown_question",
    "description": "Always use this tool to record any question that couldn't be answered as you didn't know the answer",
    "parameters": {
        "type": "object",
        "properties": {
            "question": {
                "type": "string",
                "description": "The question that couldn't be answered"
            },
        },
        "required": ["question"],
        "additionalProperties": False
    }
}

tools = [{"type": "function", "function": record_user_details_json},
        {"type": "function", "function": record_unknown_question_json}]


class Me:

    def __init__(self):
        self.openai = OpenAI()
        self.name = "Ed Donner"
        reader = PdfReader("me/linkedin.pdf")
        self.linkedin = ""
        for page in reader.pages:
            text = page.extract_text()
            if text:
                self.linkedin += text
        with open("me/summary.txt", "r", encoding="utf-8") as f:
            self.summary = f.read()


    def handle_tool_call(self, tool_calls):
        results = []
        for tool_call in tool_calls:
            tool_name = tool_call.function.name
            arguments = json.loads(tool_call.function.arguments)
            print(f"Tool called: {tool_name}", flush=True)
            tool = globals().get(tool_name)
            result = tool(**arguments) if tool else {}
            results.append({"role": "tool","content": json.dumps(result),"tool_call_id": tool_call.id})
        return results
    
    def system_prompt(self):
        system_prompt = f"You are acting as {self.name}. You are answering questions on {self.name}'s website, \
particularly questions related to {self.name}'s career, background, skills and experience. \
Your responsibility is to represent {self.name} for interactions on the website as faithfully as possible. \
You are given a summary of {self.name}'s background and LinkedIn profile which you can use to answer questions. \
Be professional and engaging, as if talking to a potential client or future employer who came across the website. \
If you don't know the answer to any question, use your record_unknown_question tool to record the question that you couldn't answer, even if it's about something trivial or unrelated to career. \
If the user is engaging in discussion, try to steer them towards getting in touch via email; ask for their email and record it using your record_user_details tool. "

        system_prompt += f"\n\n## Summary:\n{self.summary}\n\n## LinkedIn Profile:\n{self.linkedin}\n\n"
        system_prompt += f"With this context, please chat with the user, always staying in character as {self.name}."
        return system_prompt
    
    def chat(self, message, history):
        messages = [{"role": "system", "content": self.system_prompt()}] + history + [{"role": "user", "content": message}]
        done = False
        while not done:
            response = self.openai.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
            if response.choices[0].finish_reason=="tool_calls":
                message = response.choices[0].message
                tool_calls = message.tool_calls
                results = self.handle_tool_call(tool_calls)
                messages.append(message)
                messages.extend(results)
            else:
                done = True
        return response.choices[0].message.content
    

if __name__ == "__main__":
    me = Me()
    gr.ChatInterface(me.chat, type="messages").launch()

* Running on local URL:  http://127.0.0.1:7867
* To create a public link, set `share=True` in `launch()`.


The deploy process creates a new README file in this directory for you.

1. Visit https://huggingface.co and set up an account  
2. From the Avatar menu on the top right, choose Access Tokens. Choose "Create New Token". Give it WRITE permissions.
3. Take this token and add it to your .env file: `HF_TOKEN=hf_xxx` and see note below if this token doesn't seem to get picked up during deployment  
4. From the 1_foundations folder, enter: `uv run gradio deploy` and if for some reason this still wants you to enter your HF token, then interrupt it with ctrl+c and run this instead: `uv run dotenv -f ../.env run -- uv run gradio deploy` which forces your keys to all be set as environment variables   
5. Follow its instructions: name it "career_conversation", specify app.py, choose cpu-basic as the hardware, say Yes to needing to supply secrets, provide your openai api key, your pushover user and token, and say "no" to github actions.  


```bash
(agents_env) ➜  week1_foundations git:(main) ✗ gradio deploy
Need 'write' access token to create a Spaces repo.

    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

Enter your token (input will not be visible): 
Add token as git credential? (Y/n) Y
Space available at https://huggingface.co/spaces/ALEXJUST/career_conversation
```

#### Extra note about the HuggingFace token

A couple of students have mentioned that HuggingFace doesn't detect their token, even though it's in the .env file. Here are things to try:   
1. Restart Cursor   
2. Rerun load_dotenv(override=True) and use a new terminal (the + button on the top right of the Terminal)   
3. In the Terminal, run this before the gradio deploy: `$env:HF_TOKEN = "hf_XXXX"` (this is PowerShell syntax; in bash or zsh the equivalent is `export HF_TOKEN="hf_XXXX"`)  
Thank you James and Martins for these tips.  

#### More about these secrets:

If you're confused by what's going on with these secrets: it just wants you to enter the key name and value for each of your secrets -- so you would enter:  
`OPENAI_API_KEY`  
Followed by:  
`sk-proj-...`  

And if you don't want to set secrets this way, or something goes wrong with it, it's no problem - you can change your secrets later:  
1. Log in to HuggingFace website  
2. Go to your profile screen via the Avatar menu on the top right  
3. Select the Space you deployed  
4. Click on the Settings wheel on the top right  
5. You can scroll down to change your secrets, delete the space, etc.

#### And now you should be deployed!

Here is mine: https://huggingface.co/spaces/ed-donner/Career_Conversation

I just got a push notification that a student asked me how they can become President of their country 😂😂

For more information on deployment:

https://www.gradio.app/guides/sharing-your-app#hosting-on-hf-spaces

To delete your Space in the future:  
1. Log in to HuggingFace
2. From the Avatar menu, select your profile
3. Click on the Space itself and select the settings wheel on the top right
4. Scroll to the Delete section at the bottom
5. ALSO: delete the README file that Gradio may have created inside this 1_foundations folder (otherwise it won't ask you the questions the next time you do a gradio deploy)



---

So there we go: it came through nicely, and I've been alerted about my own email address. Hopefully everyone has now deployed as well. Career Conversations 2 is there too, so I now have two Career Conversations Spaces. This is how you interact with a deployed app. Hugging Face also gives you a way to embed a Space in your own website. I have a number of Hugging Face Spaces running on my website, such as a Connect 4 game that you can play against different LLMs; it is just a Hugging Face Space, but it looks like it's coming from my own personal website. You can do the same thing (instructions are on the Hugging Face Spaces site), so that visitors to your web page can have a virtual conversation with your avatar about your career and your interests.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td>
            <h2 style="color:brown;">Exercise</h2>
            <span style="color:brown;">• First and foremost, deploy this for yourself! It's a real, valuable tool - the future resume.<br/>
            • Next, improve the resources - add better context about yourself. If you know RAG, then add a knowledge base about you.<br/>
            • Add in more tools! You could have a SQL database with common Q&A that the LLM could read from and write to?<br/>
            • Bring in the Evaluator from the last lab, and add other Agentic patterns.
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td>
            <h2 style="color:blue;">Commercial implications</h2>
            <span style="color:blue;">Aside from the obvious (your career alter-ego) this has business applications in any situation where you need an AI assistant with domain expertise and an ability to interact with the real world.
            </span>
        </td>
    </tr>
</table>