# Workshop 3: Building AI Agents — Foundations

**Last week we covered:**
- How LLMs work at a high level
- Prompt engineering techniques
- Getting structured output from LLMs

**Today we will:**
- Discover the fundamental limits of LLMs
- Learn how function calling bridges those limits
- Build a restaurant finder agent — step by step — from keyword matching all the way to a full reasoning loop
- Connect it to our real project


In [1]:
# ── Setup & Imports ───────────────────────────────────────────
from openai import OpenAI
from utils.display import output_box, llm_response, separator, compare_table, heading

client = OpenAI()

def generate(prompt: str, temperature: float = 0) -> str:
    """Send a prompt to the LLM and return the text response."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user",   "content": prompt},
        ],
        temperature=temperature,
    )
    return response.choices[0].message.content.strip()


---
## Part 1 — The Gap: What LLMs Can and Cannot Do

Let's start with a simple question. What is `1847 × 293`?

With Python we can get the answer immediately:


In [5]:
# Python does math with certainty
print(f"Python says: {1847 * 293:,}")


Python says: 541,171


Now let's ask the LLM the same thing.


In [11]:
num1, num2 = 1847, 293

response = generate(f"What is {num1} * {num2}? Respond only with the number.")
llm_answer = int(response.replace(",", "").strip())
correct    = num1 * num2

llm_response(response, label="LLM's Answer")
output_box(
    f"LLM said:  {llm_answer:,}\n"
    f"Correct:   {correct:,}\n"
    f"Off by:    {abs(llm_answer - correct):,}",
    label="Comparison",
    style="warning",
)


### What just happened?

LLMs are **language models**: they predict the most likely next token, not the correct answer. When the model sees `1847 × 293`, it guesses a number that *looks* like a plausible six-digit product because it has seen millions of similar examples. But it never runs the multiplication algorithm.
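To make "the multiplication algorithm" concrete, here is a minimal sketch of schoolbook long multiplication: one partial product per digit of the second number, then a sum. The helper name `long_multiply` is made up for this illustration and assumes non-negative integers.

```python
def long_multiply(a: int, b: int) -> tuple[list[int], int]:
    """Multiply a by b (b >= 0) using one partial product per digit of b."""
    partials = []
    place = 1  # 1 for the ones digit, 10 for the tens digit, ...
    while b > 0:
        b, digit = divmod(b, 10)          # peel off the lowest digit of b
        partials.append(a * digit * place)
        place *= 10
    return partials, sum(partials)

partials, total = long_multiply(1847, 293)
print(partials)  # [5541, 166230, 369400]
print(total)     # 541171
```

Every step here is deterministic. A token predictor has no guarantee of carrying out these steps, which is why its answer can be close but wrong.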

Let's look at a few other examples:


In [12]:
separator("Can the LLM list files on your computer?")
llm_response(generate("List all the files in the current directory."), label="LLM on file listing")

separator("Does the LLM know the current time?")
llm_response(generate("What is the exact current time right now?"), label="LLM on current time")

separator("Can the LLM check today's weather?")
llm_response(generate("What is the current temperature in Boston right now?"), label="LLM on weather")


> **Key Insight:** In every case the LLM produced text that *looks like* an answer — but it never actually performed the action. LLMs can only generate text. They cannot run calculations, read files, check the time, or fetch live data.


---
## Part 2 — But LLMs Can Plan

We just saw that LLMs can't *do* things. But look closely at those responses. When asked about files, the LLM described *exactly* how to list them. When asked for the time, it told you the Python code that would get it.

Let's make that explicit:


In [13]:
separator("How would you calculate 1847 × 293?")
llm_response(generate(f"How would you calculate {num1} * {num2}? Be specific."), label="LLM describes math")

separator("How would you list files?")
llm_response(generate("What Python code would list all files in the current directory?"), label="LLM describes file listing")

separator("How would you get the current time?")
llm_response(generate("What Python code would get the current time?"), label="LLM describes time")


> **Key Insight:** LLMs are *excellent* at understanding tasks and describing what needs to be done.

> **The Big Idea:** What if instead of asking LLMs to **do** things, we ask them to tell us **what needs to be done** — and then *we* execute it?


---
## Part 3 — Bridging the Gap: Function Calling

The idea: give the LLM a set of functions it can choose from. Ask it to output which function to call (as text). Then we execute that function.

Let's prove this with the simplest possible example — arithmetic:


In [14]:
# ── Define simple arithmetic tools ────────────────────────────
def add_ints(x: int, y: int) -> int:
    """Add two integers."""
    return x + y

def multiply_ints(x: int, y: int) -> int:
    """Multiply two integers."""
    return x * y

def divide_ints(x: int, y: int) -> float:
    """Divide two integers."""
    return x / y

# Quick smoke-test
output_box(
    f"add_ints(5, 3)      = {add_ints(5, 3)}\n"
    f"multiply_ints(5, 3) = {multiply_ints(5, 3)}\n"
    f"divide_ints(9, 3)   = {divide_ints(9, 3)}",
    label="Tools work ✓",
    style="success",
)


In [15]:
# ── Ask the LLM which function to call ────────────────────────
prompt = f"""You have access to these functions:
- add_ints(x, y): Adds two integers
- multiply_ints(x, y): Multiplies two integers
- divide_ints(x, y): Divides two integers

Respond with ONLY the function call needed to answer this query:
"What is the product of {num1} and {num2}?"
"""

function_call = generate(prompt)
llm_response(function_call, label="LLM generates a function call (it's just text!)")


In [16]:
# ── We execute it ─────────────────────────────────────────────
tools = {"add_ints": add_ints, "multiply_ints": multiply_ints, "divide_ints": divide_ints}
# Demo only: eval() executes arbitrary Python, and passing `tools` as globals does not sandbox it
result = eval(function_call, tools)

output_box(
    f"Function call: {function_call}\n"
    f"Result:        {result:,}\n"
    f"Correct:       {num1 * num2:,}",
    label="Execution Result",
    style="success",
)
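`eval` keeps the demo short, but it will run whatever text the model returns. One possible alternative, sketched below under the assumption that tool calls are always a single `name(literal, ...)` expression, is to parse the text with the standard `ast` module and dispatch through an allow-list. The names `run_tool_call` and `ALLOWED_TOOLS` are hypothetical, not part of the workshop code.

```python
import ast

def add_ints(x: int, y: int) -> int:
    """Add two integers."""
    return x + y

def multiply_ints(x: int, y: int) -> int:
    """Multiply two integers."""
    return x * y

ALLOWED_TOOLS = {"add_ints": add_ints, "multiply_ints": multiply_ints}

def run_tool_call(call_text: str, tools: dict):
    """Run text such as "multiply_ints(1847, 293)" without handing it to eval()."""
    call = ast.parse(call_text.strip(), mode="eval").body
    # Accept only `name(...)`: no attribute access, no computed callee
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError(f"Not a plain function call: {call_text!r}")
    if call.func.id not in tools:
        raise ValueError(f"Unknown tool: {call.func.id!r}")
    if any(kw.arg is None for kw in call.keywords):
        raise ValueError("**kwargs unpacking is not supported")
    # literal_eval accepts only literals (numbers, strings, lists, ...),
    # so an argument cannot smuggle in executable code
    args = [ast.literal_eval(arg) for arg in call.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return tools[call.func.id](*args, **kwargs)

print(run_tool_call("multiply_ints(1847, 293)", ALLOWED_TOOLS))  # 541171
```

The trade-off is that nested calls such as `multiply_ints(add_ints(1, 2), 3)` are rejected, since their arguments are not literals. For a one-call-at-a-time agent that restriction is usually acceptable.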


In [17]:
# ── Complete end-to-end flow ───────────────────────────────────
user_query = f"Multiply {num1} and {num2}"

separator("Step 1 — User query")
output_box(user_query, label="User")

separator("Step 2 — LLM decides what to call")
call = generate(f"""Functions: add_ints(x,y), multiply_ints(x,y), divide_ints(x,y)
Query: {user_query}
Respond with ONLY the exact function call.""")
llm_response(call, label="LLM Output (still just text)")

separator("Step 3 — System executes")
result = eval(call, tools)
output_box(f"{call} → {result:,}", label="Execution")

separator("Step 4 — LLM formats the final answer")
final = generate(f'The user asked: "{user_query}". The result is {result}. Respond naturally.')
llm_response(final, label="Final Response")


### What We Just Discovered

1. **LLMs are text generators** — they output strings, nothing else
2. **LLMs can't directly run things** — no calculations, files, or live data
3. **LLMs understand tasks** — they know *what* needs to be done
4. **Function calling bridges the gap** — LLM writes instructions → system executes → LLM sees result → responds

**The pattern:**
```
User Query → LLM (thinks) → Function Call (text) → System Executes → Result → LLM Responds
```


---
## Part 4 — A Real Scenario: Restaurant Finder

The arithmetic example was clean but trivial. Real-world tasks are messier — and they often need **multiple steps** to answer one question.

Let's build something more interesting: a restaurant finder that can answer questions like:
- *"What's the wait at Olive Garden?"*
- *"Which restaurant has the best rating?"*
- *"What's the best Italian place within 10 minutes of me?"*

An LLM alone cannot answer these — it doesn't know **current** wait times, live ratings, or your location. But if we give it the right tools, it can figure out exactly what to call.

First, let's set up our data and tools.


In [19]:
# ── Restaurant data (simulated real-time) ─────────────────────
# In a real system this would hit live APIs. Here we use hardcoded data.

RESTAURANTS = {
    "Olive Garden":  {"wait": 25, "rating": 3.8, "distance": 1.2, "cuisine": "Italian",  "price": "$$"},
    "Sushi Palace":  {"wait": 10, "rating": 4.5, "distance": 2.0, "cuisine": "Japanese", "price": "$$$"},
    "Burger Barn":   {"wait":  5, "rating": 3.5, "distance": 0.8, "cuisine": "American", "price": "$"},
    "Taj Mahal":     {"wait": 15, "rating": 4.7, "distance": 3.5, "cuisine": "Indian",   "price": "$$"},
    "La Maison":     {"wait": 40, "rating": 4.9, "distance": 4.2, "cuisine": "French",   "price": "$$$"},
}

# ── Tool functions ─────────────────────────────────────────────
def list_restaurants() -> list[str]:
    """Return all available restaurant names."""
    return list(RESTAURANTS.keys())

def get_wait_time(restaurant: str) -> int:
    """Get current wait time in minutes."""
    return RESTAURANTS.get(restaurant, {}).get("wait", -1)

def get_rating(restaurant: str) -> float:
    """Get star rating (1–5)."""
    return RESTAURANTS.get(restaurant, {}).get("rating", -1)

def get_distance(restaurant: str) -> float:
    """Get distance in miles from current location."""
    return RESTAURANTS.get(restaurant, {}).get("distance", -1)

def get_cuisine(restaurant: str) -> str:
    """Get cuisine type."""
    return RESTAURANTS.get(restaurant, {}).get("cuisine", "Unknown")

def get_price_range(restaurant: str) -> str:
    """Get price range: $, $$, or $$$."""
    return RESTAURANTS.get(restaurant, {}).get("price", "?")

def calculate_total_time(wait: int, distance: float, speed_mph: float = 30.0) -> float:
    """Calculate total time = wait time + travel time (distance / speed * 60 min)."""
    travel_minutes = (distance / speed_mph) * 60
    return round(wait + travel_minutes, 1)


**Our tools at a glance:**

| Function | What it does |
|---|---|
| `list_restaurants()` | All restaurant names |
| `get_wait_time(restaurant)` | Current wait in minutes |
| `get_rating(restaurant)` | Star rating 1–5 |
| `get_distance(restaurant)` | Miles from your location |
| `get_cuisine(restaurant)` | Cuisine type |
| `get_price_range(restaurant)` | `$, $$, or $$$`|
| `calculate_total_time(wait, distance)` | Wait + travel time combined |

Let's verify they all work:


In [20]:
compare_table(
    rows=[
        ("list_restaurants()",               str(list_restaurants())),
        ("get_wait_time('Olive Garden')",     f"{get_wait_time('Olive Garden')} min"),
        ("get_rating('Sushi Palace')",        f"{get_rating('Sushi Palace')} ★"),
        ("get_distance('Burger Barn')",       f"{get_distance('Burger Barn')} miles"),
        ("get_cuisine('Taj Mahal')",          get_cuisine("Taj Mahal")),
        ("get_price_range('La Maison')",      get_price_range("La Maison")),
        ("calculate_total_time(25, 1.2)",     f"{calculate_total_time(25, 1.2)} min"),
    ],
    headers=("Function Call", "Result"),
)


| Function Call | Result |
|---|---|
| `list_restaurants()` | `['Olive Garden', 'Sushi Palace', 'Burger Barn', 'Taj Mahal', 'La Maison']` |
| `get_wait_time('Olive Garden')` | 25 min |
| `get_rating('Sushi Palace')` | 4.5 ★ |
| `get_distance('Burger Barn')` | 0.8 miles |
| `get_cuisine('Taj Mahal')` | Indian |
| `get_price_range('La Maison')` | `$$$` |
| `calculate_total_time(25, 1.2)` | 27.4 min |


---
### Manual Walkthroughs — Building Intuition

Before we write any agent code, let's manually think through three queries of increasing complexity. This is important — the friction you feel doing this by hand is *exactly* what the agent will solve.

---
#### Query 1 — Simple: *"What's the wait at Olive Garden?"*

To answer this, we need exactly one tool.


In [21]:
restaurant = "Olive Garden"

wait = get_wait_time(restaurant)

output_box(
    f"get_wait_time('{restaurant}') → {wait} minutes",
    label="Query 1: What's the wait at Olive Garden?",
    style="success",
)


---
#### Query 2 — Medium: *"Which restaurant has the shortest wait right now?"*

Now we need to check **every** restaurant, look up its wait, and find the minimum.


In [22]:
separator("Query 2: Which restaurant has the shortest wait?")

results = []
for name in list_restaurants():
    wait = get_wait_time(name)
    results.append((name, wait))
    print(f"  {name:<18} → {wait} min wait")

best_name, best_wait = min(results, key=lambda x: x[1])

output_box(
    f"Winner: {best_name} ({best_wait} min wait)",
    label="Result",
    style="success",
)


  Olive Garden       → 25 min wait
  Sushi Palace       → 10 min wait
  Burger Barn        → 5 min wait
  Taj Mahal          → 15 min wait
  La Maison          → 40 min wait


---
#### Query 3 — Complex: *"What's the best Italian place within 10 minutes of me?"*

This needs multiple filters across multiple tools, in order:
1. Get each restaurant's cuisine — keep only Italian ones
2. For each Italian restaurant, calculate total time (wait + travel)
3. Filter to those with total time ≤ 10 minutes
4. Rank by rating among the ones that remain


In [25]:
separator("Query 3: Best Italian place within 10 minutes?")

MAX_TOTAL_MINUTES = 10
candidates = []

for name in list_restaurants():
    cuisine = get_cuisine(name)
    if cuisine != "Italian":
        continue

    wait     = get_wait_time(name)
    distance = get_distance(name)
    total    = calculate_total_time(wait, distance)
    rating   = get_rating(name)

    print(f"  {name}: cuisine={cuisine}, wait={wait}m, dist={distance}mi, total={total}m, rating={rating}★")

    if total <= MAX_TOTAL_MINUTES:
        candidates.append((name, total, rating))

if candidates:
    best = max(candidates, key=lambda x: x[2])
    output_box(
        f"Winner: {best[0]}\nTotal time: {best[1]} min  |  Rating: {best[2]} ★",
        label="Result",
        style="success",
    )
else:
    output_box("No Italian restaurants within 10 minutes.", style="warning")


  Olive Garden: cuisine=Italian, wait=25m, dist=1.2mi, total=27.4m, rating=3.8★


That took a lot of hardcoded steps — and we wrote those steps ourselves. What if we had 50 different query types? We'd need 50 different blocks of code.

Let's see if we can use an LLM to help instead.

In [26]:
TOOL_DESCRIPTIONS = """
Available tools:
- list_restaurants() — returns all restaurant names
- get_wait_time(restaurant) — current wait in minutes
- get_rating(restaurant) — star rating (1–5)
- get_distance(restaurant) — miles from your location
- get_cuisine(restaurant) — cuisine type (e.g., Italian, Japanese)
- get_price_range(restaurant) — price range ($, $$, $$$)
- calculate_total_time(wait, distance) — wait + travel time combined
"""

test_queries = [
    "What's the wait at Olive Garden?",
    "Which restaurant has the shortest wait?",
    "What's the best Italian place within 10 minutes?",
]

for query in test_queries:
    plan = generate(f"""
{TOOL_DESCRIPTIONS}

User query: {query}

What tool calls are needed, in order? List them one per line.
""")
    separator(f'Query: "{query}"')
    llm_response(plan, label="LLM's Plan")


The LLM can plan the right steps. The question is: can we automate that planning and execution? Let's build toward that.
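One practical detail: `TOOL_DESCRIPTIONS` is maintained by hand, so it can drift from the actual functions. A possible way to avoid that, sketched here with the standard `inspect` module, is to build the description from each function's signature and docstring. `describe_tools` is a hypothetical helper, and the two functions are trimmed copies with a placeholder body where the real lookup would go.

```python
import inspect

def get_wait_time(restaurant: str) -> int:
    """Get current wait time in minutes."""
    return 25  # placeholder body; only the signature and docstring matter here

def calculate_total_time(wait: int, distance: float, speed_mph: float = 30.0) -> float:
    """Calculate total time = wait time + travel time."""
    return round(wait + (distance / speed_mph) * 60, 1)

def describe_tools(tools: list) -> str:
    """Build a prompt-ready tool list from signatures and docstring summaries."""
    lines = ["Available tools:"]
    for fn in tools:
        doc = inspect.getdoc(fn)
        summary = doc.splitlines()[0] if doc else "(no description)"
        lines.append(f"- {fn.__name__}{inspect.signature(fn)}: {summary}")
    return "\n".join(lines)

print(describe_tools([get_wait_time, calculate_total_time]))
```

With this approach, adding a tool means writing one function with a good docstring; the prompt text follows automatically.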


---
## Part 5 — Building the Agent

We'll build four versions, each fixing a flaw in the previous one.

---
### Agent v1 — Keyword Matching


In [27]:
def restaurant_agent_v1(query: str) -> str:
    """
    Agent v1: Uses keyword matching to route queries.
    Simple, but extremely fragile.
    """
    q = query.lower()

    # Find restaurant name (if any)
    restaurant = next((r for r in list_restaurants() if r.lower() in q), None)

    if "wait" in q and restaurant:
        wait = get_wait_time(restaurant)
        return f"Wait at {restaurant}: {wait} minutes"

    elif "rating" in q and restaurant:
        rating = get_rating(restaurant)
        return f"Rating of {restaurant}: {rating} ★"

    elif "fastest" in q or "shortest wait" in q:
        results = [(r, get_wait_time(r)) for r in list_restaurants()]
        best = min(results, key=lambda x: x[1])
        return f"Shortest wait: {best[0]} ({best[1]} min)"

    else:
        return "I don't understand that request."


In [28]:
# Things v1 gets right
compare_table(
    rows=[
        ("What's the wait at Olive Garden?",    restaurant_agent_v1("What's the wait at Olive Garden?")),
        ("Sushi Palace rating?",                 restaurant_agent_v1("Sushi Palace rating?")),
        ("Which has the shortest wait?",         restaurant_agent_v1("Which has the shortest wait?")),
    ],
    headers=("Query", "v1 Result"),
)


| Query | v1 Result |
|---|---|
| What's the wait at Olive Garden? | Wait at Olive Garden: 25 minutes |
| Sushi Palace rating? | Rating of Sushi Palace: 4.5 ★ |
| Which has the shortest wait? | Shortest wait: Burger Barn (5 min) |


In [29]:
# Rephrased queries: the first still works because it contains "wait"; the other two break v1
compare_table(
    rows=[
        ("How long will I wait at Olive Garden?",       restaurant_agent_v1("How long will I wait at Olive Garden?")),
        ("Is Sushi Palace well reviewed?",              restaurant_agent_v1("Is Sushi Palace well reviewed?")),
        ("Where can I get seated quickly?",             restaurant_agent_v1("Where can I get seated quickly?")),
    ],
    headers=("Query", "v1 Result"),
)


| Query | v1 Result |
|---|---|
| How long will I wait at Olive Garden? | Wait at Olive Garden: 25 minutes |
| Is Sushi Palace well reviewed? | I don't understand that request. |
| Where can I get seated quickly? | I don't understand that request. |


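> **The problem:** v1 only matches exact keywords such as `"wait"`, `"rating"`, and `"fastest"`. The first query above still works because it happens to contain "wait", but real users don't reliably use those words, and any rephrasing that drops them falls through to "I don't understand that request."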

---
### Agent v2 — LLM Understanding

What if we let the LLM *classify* the query instead of matching keywords?


In [30]:
def understand_query(query: str) -> str:
    """Use the LLM to classify what kind of information the user wants."""
    return generate(
        f"""Classify this restaurant query into one of these categories:
- check_wait: user wants the wait time at a specific restaurant
- check_rating: user wants the rating of a specific restaurant
- find_shortest_wait: user wants the restaurant with the shortest wait
- unknown: doesn't match any above

Query: {query}
Respond with the category name only.""",
    )

def extract_restaurant(query: str) -> str:
    """Extract the restaurant name from a query."""
    return generate(
        f"""Available restaurants: {', '.join(list_restaurants())}

Extract the restaurant name from this query, or respond 'none' if no specific restaurant is mentioned.
Query: {query}
Respond with just the restaurant name or 'none'.""",
    )


In [31]:
def restaurant_agent_v2(query: str) -> str:
    """
    Agent v2: LLM understands intent; system executes.
    Handles natural language — but only one tool at a time.
    """
    intent     = understand_query(query)
    restaurant = extract_restaurant(query)

    print(f"  Intent: {intent}")
    print(f"  Restaurant: {restaurant}")

    if intent == "check_wait" and restaurant != "none":
        wait = get_wait_time(restaurant)
        return f"Wait at {restaurant}: {wait} minutes"

    elif intent == "check_rating" and restaurant != "none":
        rating = get_rating(restaurant)
        return f"Rating of {restaurant}: {rating} ★"

    elif intent == "find_shortest_wait":
        results = [(r, get_wait_time(r)) for r in list_restaurants()]
        best = min(results, key=lambda x: x[1])
        return f"Shortest wait: {best[0]} ({best[1]} min)"

    else:
        return "I couldn't understand that request."
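v2 compares the raw LLM output against fixed strings (`intent == "check_wait"`, `restaurant != "none"`), so stray quotes, a trailing period, or different capitalization would send a correct classification to the fallback branch. A small normalization step is one way to guard against that. The helpers below are hypothetical and only a sketch; they are not part of the workshop code.

```python
from typing import Optional

ALLOWED_INTENTS = {"check_wait", "check_rating", "find_shortest_wait", "unknown"}

def _clean(raw: str) -> str:
    """Trim whitespace plus wrapping quotes, backticks, and periods; lowercase."""
    return raw.strip().strip("`'\".").lower()

def normalize_intent(raw: str) -> str:
    """Map raw classifier text onto a known label, falling back to 'unknown'."""
    cleaned = _clean(raw)
    return cleaned if cleaned in ALLOWED_INTENTS else "unknown"

def normalize_restaurant(raw: str, known: list) -> Optional[str]:
    """Return the canonical restaurant name, or None if nothing matches."""
    cleaned = _clean(raw)
    for name in known:
        if name.lower() == cleaned:
            return name
    return None

print(normalize_intent(" Check_Wait.\n"))                                      # check_wait
print(normalize_restaurant("'olive garden'", ["Olive Garden", "Taj Mahal"]))  # Olive Garden
```

Anything that does not match a known label becomes `"unknown"` or `None`, so the if/else branches only ever see values they were written for.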


In [32]:
# The same rephrased queries, including the two that v1 could not handle
separator("Rephrased queries from the v1 test")

for query in [
    "How long will I wait at Olive Garden?",
    "Is Sushi Palace well reviewed?",
    "Where can I get seated quickly?",
]:
    print(f"\nQuery: '{query}'")
    result = restaurant_agent_v2(query)
    output_box(result, label="v2 Result", style="success")



Query: 'How long will I wait at Olive Garden?'
  Intent: check_wait
  Restaurant: Olive Garden



Query: 'Is Sushi Palace well reviewed?'
  Intent: check_rating
  Restaurant: Sushi Palace



Query: 'Where can I get seated quickly?'
  Intent: find_shortest_wait
  Restaurant: none


In [33]:
# But v2 fails on anything needing multiple tools
separator("A query that requires multiple steps")

query = "What's the best Italian place within 10 minutes of me?"
print(f"Query: '{query}'")
result = restaurant_agent_v2(query)
output_box(result, label="v2 Result", style="error")


Query: 'What's the best Italian place within 10 minutes of me?'
  Intent: unknown
  Restaurant: Olive Garden


> **The problem:** v2 understands *language* but still follows rigid if/else branches. Answering "best Italian within 10 minutes" requires calling `get_cuisine`, `get_wait_time`, `get_distance`, `calculate_total_time`, and `get_rating` in a specific order. v2 has no branch for that.

---
### Agent v3 — LLM Picks the Tool

What if we stop writing branches altogether and let the LLM decide *which function to call*?


In [34]:
TOOLS_V3 = {
    "list_restaurants": list_restaurants,
    "get_wait_time": get_wait_time,
    "get_rating": get_rating,
    "get_distance": get_distance,
    "get_cuisine": get_cuisine,
    "get_price_range": get_price_range,
    "calculate_total_time": calculate_total_time,
}

def restaurant_agent_v3(query: str) -> str:
    """
    Agent v3: LLM selects AND generates the function call; system executes.
    Flexible — but still only one tool call per query.
    """
    tool_call = generate(
        f"""{TOOL_DESCRIPTIONS}
User query: {query}

Respond with ONLY a single function call (e.g., get_wait_time('Olive Garden')).
If multiple calls are needed, give the first one only.""",
    )

    print(f"  LLM chose: {tool_call}")

    # Strip surrounding backticks if the LLM adds them (a language tag such as "python" would remain)
    clean = tool_call.strip().strip("`").strip()

    try:
        result = eval(clean, TOOLS_V3)
        return f"{clean} → {result}"
    except Exception as e:
        return f"Error executing '{clean}': {e}"
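`str.strip("`")` removes only backtick characters, so a reply wrapped in a fenced block with a language tag would still start with `python` and fail to evaluate. A slightly more thorough cleanup, sketched here with a regular expression, handles both fenced and inline forms. `strip_code_fence` is a hypothetical helper name.

```python
import re

# `{3} means three backticks; an optional language tag may follow the opening fence
FENCE_RE = re.compile(r"^`{3}[\w+-]*\s*\n(.*?)\n?`{3}\s*$", re.DOTALL)

def strip_code_fence(text: str) -> str:
    """Return the code inside a fenced block, or the text with stray backticks removed."""
    text = text.strip()
    match = FENCE_RE.match(text)
    if match:
        return match.group(1).strip()
    return text.strip("`").strip()

fence = "`" * 3
print(strip_code_fence(f"{fence}python\nget_wait_time('Olive Garden')\n{fence}"))
print(strip_code_fence("`get_rating('Sushi Palace')`"))
```

Both calls print the bare function call, which is the form the execution step expects.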


In [35]:
# v3 picks the right tool without any hardcoded branches
compare_table(
    rows=[
        (q, restaurant_agent_v3(q)) for q in [
            "How long will I wait at Olive Garden?",
            "Is Sushi Palace well reviewed?",
            "What kind of food does Taj Mahal serve?",
            "How far is Burger Barn?",
        ]
    ],
    headers=("Query", "v3 Result"),
)


  LLM chose: get_wait_time('Olive Garden')
  LLM chose: get_rating('Sushi Palace')
  LLM chose: get_cuisine('Taj Mahal')
  LLM chose: get_distance('Burger Barn')


| Query | v3 Result |
|---|---|
| How long will I wait at Olive Garden? | `get_wait_time('Olive Garden')` → 25 |
| Is Sushi Palace well reviewed? | `get_rating('Sushi Palace')` → 4.5 |
| What kind of food does Taj Mahal serve? | `get_cuisine('Taj Mahal')` → Indian |
| How far is Burger Barn? | `get_distance('Burger Barn')` → 0.8 |


> **The remaining problem:** Consider answering *"What's the wait at Olive Garden?"* in full. The correct sequence is:
> 1. `get_wait_time('Olive Garden')` → 25
> 2. Use that result in a natural-language response
>
> That's manageable with one call. But *"What's the best Italian place within 10 minutes?"* needs 7+ calls — and each call's *arguments depend on the result of a previous call*. v3 can only make one.

---
### Agent v4 — The Loop

The fix is simple: let the agent keep going. After each tool call, feed the result back to the LLM and ask "what's next?" Keep looping until it says it's done.


In [36]:
def restaurant_agent_v4(query: str, max_steps: int = 10) -> str:
    """
    Agent v4: LLM reasons and acts in a loop until the query is answered.
    This is the ReAct pattern.
    """
    print(f"Query: {query}")
    separator()

    history = []  # Running log of "tool_call → result" strings

    for step in range(1, max_steps + 1):
        # Build the prompt from scratch each step, including full history
        history_text = "\n".join(f"  Step {i+1}: {entry}" for i, entry in enumerate(history))

        prompt = f"""You are answering this restaurant query: {query}

{TOOL_DESCRIPTIONS}

Steps taken so far:
{history_text if history_text else "  (none yet)"}

What is the NEXT single tool call needed?
If the query is fully answered, respond with exactly: DONE
Otherwise respond with a single function call only (no explanation, no code fences)."""

        response = generate(prompt).strip().strip("`")

        print(f"  Step {step}: {response}")

        if "DONE" in response.upper():
            break

        try:
            result = eval(response, TOOLS_V3)
            history.append(f"{response} → {result}")
        except Exception as e:
            history.append(f"{response} → ERROR: {e}")

    # Generate a natural-language answer from the accumulated results
    steps_summary = "\n".join(f"  {entry}" for entry in history)
    answer = generate(
        f"""The user asked: "{query}"
Results from tool calls:
{steps_summary}

Write a clear, concise answer to the user's question.""",
    )

    separator()
    output_box(answer, label="Final Answer", style="success")
    return answer


In [37]:
# Simple query: one tool call is enough, though the recorded run also calls list_restaurants() first
restaurant_agent_v4("What's the wait at Olive Garden?")


Query: What's the wait at Olive Garden?


  Step 1: list_restaurants()
  Step 2: get_wait_time('Olive Garden')
  Step 3: DONE


'The wait at Olive Garden is currently 25 minutes.'

In [38]:
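# Complex query: requires multiple steps and dependent arguments.
# Note: the recorded run below stops at max_steps=10 without ever calling calculate_total_time,
# so the final answer is unverified. The manual walkthrough put Olive Garden at 27.4 min total, over the limit.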
restaurant_agent_v4("What's the best Italian place within 10 minutes of me?")


Query: What's the best Italian place within 10 minutes of me?


  Step 1: list_restaurants()
  Step 2: get_cuisine('Olive Garden')
  Step 3: get_distance('Olive Garden')
  Step 4: get_wait_time('Olive Garden')
  Step 5: get_rating('Olive Garden')
  Step 6: get_distance('Sushi Palace')
  Step 7: get_cuisine('Sushi Palace')
  Step 8: get_distance('Burger Barn')
  Step 9: get_cuisine('Burger Barn')
  Step 10: get_distance('Taj Mahal')


'The best Italian place within 10 minutes of you is Olive Garden, which is 1.2 miles away. It has a rating of 3.8 and a wait time of about 25 minutes.'

In [39]:
# Bonus: a query none of the earlier versions could handle
# (the recorded run stops at max_steps=10, before get_rating('La Maison') is called)
restaurant_agent_v4("Which restaurant has the best rating-to-wait-time ratio?")


Query: Which restaurant has the best rating-to-wait-time ratio?


  Step 1: list_restaurants()
  Step 2: get_wait_time('Olive Garden')
  Step 3: get_rating('Olive Garden')
  Step 4: get_wait_time('Sushi Palace')
  Step 5: get_rating('Sushi Palace')
  Step 6: get_wait_time('Burger Barn')
  Step 7: get_rating('Burger Barn')
  Step 8: get_wait_time('Taj Mahal')
  Step 9: get_rating('Taj Mahal')
  Step 10: get_wait_time('La Maison')


'To find the best rating-to-wait-time ratio, we can calculate the ratio for each restaurant:\n\n1. **Olive Garden**: Rating 3.8, Wait time 25 minutes → Ratio = 3.8 / 25 = 0.152\n2. **Sushi Palace**: Rating 4.5, Wait time 10 minutes → Ratio = 4.5 / 10 = 0.45\n3. **Burger Barn**: Rating 3.5, Wait time 5 minutes → Ratio = 3.5 / 5 = 0.7\n4. **Taj Mahal**: Rating 4.7, Wait time 15 minutes → Ratio = 4.7 / 15 = 0.313\n5. **La Maison**: Rating not provided, Wait time 40 minutes → Ratio cannot be calculated.\n\nThe restaurant with the best rating-to-wait-time ratio is **Burger Barn** with a ratio of **0.7**.'
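In the run above, the division was done by the LLM while writing the final answer, and La Maison was left out because the loop ended before its rating was fetched. In keeping with the earlier point that arithmetic belongs in code, the ratio could instead be computed by a tool. The sketch below assumes a hypothetical helper `best_rating_per_minute` and uses a trimmed copy of the workshop data.

```python
# Trimmed copy of the workshop's RESTAURANTS data (wait in minutes, star rating)
RESTAURANTS = {
    "Olive Garden": {"wait": 25, "rating": 3.8},
    "Sushi Palace": {"wait": 10, "rating": 4.5},
    "Burger Barn":  {"wait": 5,  "rating": 3.5},
    "Taj Mahal":    {"wait": 15, "rating": 4.7},
    "La Maison":    {"wait": 40, "rating": 4.9},
}

def best_rating_per_minute(data: dict) -> tuple[str, dict]:
    """Return the restaurant with the highest rating / wait ratio, plus all ratios."""
    ratios = {name: info["rating"] / info["wait"] for name, info in data.items()}
    best = max(ratios, key=ratios.get)
    return best, ratios

best, ratios = best_rating_per_minute(RESTAURANTS)
for name, ratio in ratios.items():
    print(f"{name:<14} {ratio:.3f}")
print("Best:", best)  # Best: Burger Barn
```

The winner matches the LLM's answer here, but this version covers all five restaurants and does not depend on the model's arithmetic.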

---
## Part 6 — The ReAct Pattern

Let's look at the evolution we just built:

| Version | Approach | Strengths | Weakness |
|---|---|---|---|
| v1 | Keyword matching | Fast, no API calls | Breaks on any rephrasing |
| v2 | LLM classifies intent | Handles natural language | Rigid if/else branches |
| v3 | LLM picks the tool | No hardcoded branches | Only one tool call |
| v4 | LLM loops until done | Multi-step, flexible | One LLM call per step; can hit `max_steps` before finishing |

**v4 has a name in AI research: ReAct (Reasoning and Acting)**

It was introduced in this [paper (Yao et al., 2022)](https://arxiv.org/abs/2210.03629). The loop looks like this:

```
┌──────────────────────────────┐
│         User Query           │
└──────────────┬───────────────┘
               ▼
         ┌──────────┐
         │  REASON  │  ← What do I need to know next?
         └────┬─────┘
              ▼
         ┌──────────┐
         │   ACT    │  ← Call the right tool
         └────┬─────┘
              ▼
         ┌──────────┐
         │ OBSERVE  │  ← What did the tool return?
         └────┬─────┘
              ▼
     ┌────────────────┐
     │  Task done?    │
     │  No → repeat   │
     │  Yes → answer  │
     └────────────────┘
```

The key property: **no hardcoded pipelines**. The LLM dynamically decides what to do next at every step, based on everything it has seen so far.
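To see the loop mechanics without any API calls, here is a minimal sketch in which a scripted function stands in for the LLM. Everything in it is an assumption made for illustration: `run_loop`, `scripted_policy`, and the three-entry `WAITS` table are not part of the workshop code. The loop also reports whether it finished or simply ran out of steps, which the recorded v4 runs show can happen.

```python
WAITS = {"Olive Garden": 25, "Sushi Palace": 10, "Burger Barn": 5}

def get_wait_time(restaurant: str) -> int:
    """Toy tool: look up a wait time in minutes (-1 if unknown)."""
    return WAITS.get(restaurant, -1)

TOOLS = {"get_wait_time": get_wait_time}

def scripted_policy(history: list):
    """Stand-in for the LLM: look up each restaurant once, then signal done with None."""
    seen = {args[0] for _, args, _ in history}
    for name in WAITS:
        if name not in seen:
            return "get_wait_time", (name,)
    return None

def run_loop(decide, tools: dict, max_steps: int = 10):
    """Reason, act, observe until decide() returns None or the step budget runs out."""
    history = []
    for _ in range(max_steps):
        action = decide(history)               # REASON: choose the next action
        if action is None:                     # the policy says the task is complete
            return history, True
        name, args = action
        result = tools[name](*args)            # ACT: run the chosen tool
        history.append((name, args, result))   # OBSERVE: record what came back
    return history, False                      # budget exhausted before "done"

history, finished = run_loop(scripted_policy, TOOLS, max_steps=10)
print(len(history), finished)  # 3 True
history, finished = run_loop(scripted_policy, TOOLS, max_steps=2)
print(len(history), finished)  # 2 False
```

Returning the `finished` flag lets the caller decide what to do with partial results, for example warning the user instead of presenting an answer built on incomplete data.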

For more detail: [ReAct Prompting Guide](https://www.promptingguide.ai/techniques/react) | [Anthropic on Building Effective Agents](https://www.anthropic.com/engineering/building-effective-agents)


In [40]:
output_box(
    """An agent is just a loop:
  1. LLM reasons about what to do next
  2. System executes the chosen action
  3. Result feeds back into the next reasoning step
  4. Repeat until the goal is met

The LLM provides the intelligence.
The tools provide the capabilities.
The loop provides the flexibility.""",
    label="The Core Idea",
    style="info",
)


---
## Part 7 — What's Next

Today we built agents with *toy tools*: `get_wait_time`, `get_rating`, `get_distance`. These work with hardcoded dictionaries, but the pattern is exactly the same when the tools connect to real data.

**Our real project: an AI-powered Analytics Assistant**

Instead of a restaurant dictionary, users will have a dataset like this:

```
date       | product  | category    | price | quantity | region
-----------|----------|-------------|-------|----------|--------
2024-01-01 | Laptop   | Electronics | 1200  | 2        | North
2024-01-02 | Mouse    | Accessories | 25    | 10       | South
...
```

And instead of `get_wait_time`, we'll have tools like:

```python
load_csv(filepath)
show_data(n_rows)
filter_rows(condition)
group_and_aggregate(group_by, column, operation)
create_plot(plot_type, **kwargs)
```

The user asks: *"What's the average price by category?"*
The agent: loads the data, groups by category, calculates mean, returns results — automatically.
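As a rough preview, the "average price by category" step could look something like the sketch below. It uses only the standard library, takes the rows as an explicit argument (an assumption; the real tool signature above has no `rows` parameter), and adds one hypothetical row so that a group has more than one member. The actual Workshop 4 implementation may differ.

```python
from collections import defaultdict
from statistics import mean

ROWS = [
    {"date": "2024-01-01", "product": "Laptop", "category": "Electronics",
     "price": 1200, "quantity": 2, "region": "North"},
    {"date": "2024-01-02", "product": "Mouse", "category": "Accessories",
     "price": 25, "quantity": 10, "region": "South"},
    # Hypothetical row, added only so one category has two entries
    {"date": "2024-01-03", "product": "Keyboard", "category": "Accessories",
     "price": 75, "quantity": 4, "region": "North"},
]

OPERATIONS = {"mean": mean, "sum": sum, "min": min, "max": max, "count": len}

def group_and_aggregate(rows: list, group_by: str, column: str, operation: str) -> dict:
    """Group rows by one column and aggregate another with a named operation."""
    if operation not in OPERATIONS:
        raise ValueError(f"Unsupported operation: {operation!r}")
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[column])
    return {key: OPERATIONS[operation](values) for key, values in groups.items()}

print(group_and_aggregate(ROWS, "category", "price", "mean"))
```

Restricting `operation` to a fixed set of names keeps the tool easy to describe to the LLM and avoids executing arbitrary expressions.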

**Before next week, think about:**
1. What 3–5 queries would you ask your own data?
2. What tools would you need to answer them?
3. How granular should those tools be — very broad, very specific, or composable?

We'll build it together in Workshop 4.
