# [STARTER] Udaplay Project

## Part 02 - Agent

In this part of the project, you'll use your VectorDB to be part of your Agent as a tool.

You're building UdaPlay, an AI Research Agent for the video game industry. The agent will:
1. Answer questions using internal knowledge (RAG)
2. Search the web when needed
3. Maintain conversation state
4. Return structured outputs
5. Store useful information for future use

### Setup

In [190]:
# Only needed for Udacity workspace

import importlib.util
import sys

# Check if 'pysqlite3' is available before importing
if importlib.util.find_spec("pysqlite3") is not None:
    import pysqlite3
    sys.modules['sqlite3'] = sys.modules.pop('pysqlite3')

In [191]:
# TODO: Import the necessary libs
# For example: 
# import os
# import json
# import chromadb
# from chromadb.utils import embedding_functions
# from tavily import TavilyClient
# from dotenv import load_dotenv
from openai import OpenAI
# from typing import Optional


# from lib.evaluation import EvaluationReport

# # For Embeddings:
# import numpy as np



import os
import chromadb
from chromadb.utils import embedding_functions
from lib.agents import Agent
from lib.llm import LLM
from lib.messages import UserMessage, SystemMessage, ToolMessage, AIMessage
from lib.tooling import tool
from dotenv import load_dotenv
from typing import List, Optional, Dict, Any
from tavily import TavilyClient
from datetime import datetime
from lib.messages import BaseMessage

In [192]:
# TODO: Load environment variables
load_dotenv()

# Validate required API keys with helpful error messages

# OPEN AI KEY
openai_api_key = os.getenv('OPENAI_API_KEY')
if not openai_api_key:
    raise ValueError(
        'OPENAI_API_KEY not found in environment variables. '
        'Please create a .env file with OPENAI_API_KEY="your_key"'
    )

# TAVILY API
tavily_api_key = os.getenv('TAVILY_API_KEY')
if not tavily_api_key:
    raise ValueError(
        'TAVILY_API_KEY not found in environment variables. '
        'Please create a .env file with TAVILY_API_KEY="your_key"'
    )
try:
    tavily_client = TavilyClient(api_key=tavily_api_key)
except Exception as e:
    raise ValueError(f'Failed to initialize TAVILY client: {str(e)}')


# Initialize clients with error handling
# try:
#     client = OpenAI(
#         base_url="https://openai.vocareum.com/v1",
#         api_key=openai_api_key,
#     )
# except Exception as e:
#     raise ValueError(f'Failed to initialize OpenAI client: {str(e)}')

# try:
#     tavily_client = TavilyClient(api_key=tavily_api_key)
# except Exception as e:
#     raise ValueError(f'Failed to initialize Tavily client: {str(e)}')

print('API clients initialized successfully!')

API clients initialized successfully!


### Tools

Build at least 3 tools:
- retrieve_game: To search the vector DB
- evaluate_retrieval: To assess the retrieval performance
- game_web_search: If no good, search the web


#### Retrieve Game Tool

In [193]:
# TODO: Create retrieve_game tool
# It should use chroma client and collection you created
# chroma_client = chromadb.PersistentClient(path="chromadb")
# collection = chroma_client.get_collection("udaplay")
# Tool Docstring:
#    Semantic search: Finds most results in the vector DB
#    args:
#    - query: a question about game industry. 
#
#    You'll receive results as list. Each element contains:
#    - Platform: like Game Boy, Playstation 5, Xbox 360...)
#    - Name: Name of the Game
#    - YearOfRelease: Year when that game was released for that platform
#    - Description: Additional details about the game

# Define Chroma DB
chroma_client = chromadb.PersistentClient(path="chromadb")

# Define Embedding Function
embedding_fn = embedding_functions.OpenAIEmbeddingFunction(api_base="https://openai.vocareum.com/v1",api_key=openai_api_key)

# Define Collection
collection = chroma_client.get_collection("udaplay", embedding_function=embedding_fn)

 
@tool
def retrieve_game(query: str, n_results: int = 5):
    """
    Perform semantic search query and return list of game info dicts.

    Args:
        query (str): Query string about the game industry.
        n_results (int): Number of top results to return.

    Returns:
        List[dict]: List of games where each dict contains:
            - Platform
            - Name
            - YearOfRelease
            - Description
    """
    # Ensure single query wrapped in list
    query_texts = [query]

    results = collection.query(
        query_texts=query_texts,
        n_results=n_results,
        include=['documents', 'distances', 'metadatas']
    )

    # Extract the first (and only) query results sublists
    metadatas = results.get('metadatas', [[]])[0]

    # Build the list of game dictionaries with requested fields
    games = []
    for metadata in metadatas:
        game_info = {
            "Platform": metadata.get("Platform"),
            "Name": metadata.get("Name"),
            "YearOfRelease": metadata.get("YearOfRelease"),
            "Description": metadata.get("Description")
        }
        games.append(game_info)

    print(f"Retrieved {len(games)} games from vector database")
    
    return games

In [194]:
# Testing game retrieaval
results = retrieve_game.__call__(query="Best RPG games on Game Boy", n_results=3)

# Display results array
results


Retrieved 3 games from vector database


[{'Platform': 'Game Boy Color',
  'Name': 'Pokémon Gold and Silver',
  'YearOfRelease': 1999,
  'Description': 'Second-generation Pokémon games introducing new regions, Pokémon, and gameplay mechanics.'},
 {'Platform': 'Game Boy Advance',
  'Name': 'Pokémon Ruby and Sapphire',
  'YearOfRelease': 2002,
  'Description': 'Third-generation Pokémon games set in the Hoenn region, featuring new Pokémon and double battles.'},
 {'Platform': 'Nintendo 64',
  'Name': 'Super Mario 64',
  'YearOfRelease': 1996,
  'Description': "A groundbreaking 3D platformer that set new standards for the genre, featuring Mario's quest to rescue Princess Peach."}]

#### Evaluate Retrieval Tool

In [195]:
from pydantic import BaseModel, Field
from lib.llm import LLM
from lib.parsers import PydanticOutputParser


# Start LLM
llm = LLM(model = "gpt-4o",api_key=openai_api_key)

# Define Evaluation Model
class EvaluationReport(BaseModel):
    useful: bool = Field(description="Whether the documents are useful to answer the question")
    description: str = Field(description="Detailed explanation about the evaluation result")

In [196]:
class RetrievalEvaluator:
    """
    Tool to evaluate if retrieved documents are sufficient to answer the query.
    Uses an LLM as a judge with structured output parsing.
    """

    def __init__(self, llm):
        self.llm = llm
        self.parser = PydanticOutputParser(model_class=EvaluationReport)

    def evaluate_report(self, question: str, retrieved_docs: list[str]) -> EvaluationReport:
        """
        Evaluate the usability of retrieved documents for answering the question.

        Args:
            question (str): Original user question.
            retrieved_docs (list[str]): List of retrieved documents (texts).

        Returns:
            EvaluationReport: Structured evaluation with usefulness and explanation.
        """

        # Compose prompt to instruct the LLM judge
        prompt = f"""
        Your task is to evaluate if the documents are enough to respond to the query.

        User Query:
        {question}

        Retrieved Documents:
        {chr(10).join(f"- {doc}" for doc in retrieved_docs)}

        Please answer in JSON format with the following fields:
        - useful: true if the documents are sufficient to answer the question, false otherwise
        - description: provide a detailed explanation to justify your assessment, so that it is possible to decide whether to accept or reject these documents for answering the query.
        """

        # Invoke LLM with prompt, asking for structured output
        response = self.llm.invoke(input=prompt, response_format=EvaluationReport)

        # Parse the LLM response into EvaluationReport, with fallback for errors
        try:
            evaluation = self.parser.parse(response)
        except Exception as e:
            # Fallback: if parsing fails, return a default negative evaluation with explanation
            evaluation = EvaluationReport(
                useful=False,
                description=f"Failed to parse LLM evaluation response: {e}. Raw response: {response.content}"
            )

        return evaluation

In [197]:
@tool
def evaluate_retrieval(question: str, retrieved_docs: list[str]) -> dict:
    """
    Based on the user's question and the list of retrieved documents,
    analyze the usability of the documents to respond to that question.

    Args:
        question (str): Original user question.
        retrieved_docs (list[str]): Retrieved documents most similar to the user query.

    Returns:
        dict: {
            "useful": bool,           # Whether the documents are useful to answer the question
            "description": str        # Detailed explanation about the evaluation result
        }
    """
    evaluator = RetrievalEvaluator(llm)

    evaluation_report = evaluator.evaluate_report(question, retrieved_docs)

    print(f"Evaluation: LLM decision is useful: '{evaluation_report.useful}' and '{evaluation_report.description}")

    # Convert EvaluationReport Pydantic model to dict and return relevant fields
    return {
        "useful": evaluation_report.useful,
        "description": evaluation_report.description
    }

In [198]:
# Test Evaluation retreival
sample_question = "What are the most popular RPG games on Game Boy?"
sample_docs = [
    "[Game Boy Color] Pokémon Gold and Silver (1999) - Second-generation Pokémon games introducing new regions, Pokémon, and gameplay mechanics.",
    "[Game Boy Advance] Pokémon Ruby and Sapphire (2002) - Third-generation Pokémon games set in the Hoenn region, featuring new Pokémon and double battles."
]

report = evaluate_retrieval.__call__(sample_question, sample_docs)

# print(report)

Evaluation: LLM decision is useful: 'False' and 'The user query specifically asks for the most popular RPG games on the Game Boy, which refers to the original Game Boy system. The retrieved documents, however, mention games from the Game Boy Color and Game Boy Advance, which are different systems. While Pokémon Gold and Silver are indeed popular RPGs on the Game Boy Color, they do not directly answer the query about the original Game Boy. Additionally, Pokémon Ruby and Sapphire are for the Game Boy Advance, which is not relevant to the query. Therefore, the documents are not sufficient to fully address the user's question about the most popular RPG games on the original Game Boy system.


#### Game Web Search Tool

In [199]:
# TODO: Create game_web_search tool
# Please use Tavily client to search the web
# Tool Docstring:
#    Semantic search: Finds most results in the vector DB
#    args:
#    - question: a question about game industry. 

@tool
def web_search(query: str, search_depth: str = "advanced") -> Dict:
    """
    Search the web using Tavily API
    args:
        query (str): Search query
        search_depth (str): Type of search - 'basic' or 'advanced' (default: advanced)
    """

    print(f"Tool: Performing web search for query: '{query}'")
    try:
        # Perform the search
        search_result = tavily_client.search(
            query=query,
            search_depth=search_depth,
            include_answer=True,
            include_raw_content=False,
            include_images=False
        )
        
        # Format the results
        formatted_results = {
            "answer": search_result.get("answer", ""),
            "results": search_result.get("results", []),
            "search_metadata": {
                "timestamp": datetime.now().isoformat(),
                "query": query
            }
        }
    except Exception as e:
        print(f"Error during Tavily search: {e}")
        return "Web search failed."
    
    return formatted_results 

In [200]:
# Testing
result = web_search.__call__(query = "Who is the current prime minister in Canada?")
print(result)

Tool: Performing web search for query: 'Who is the current prime minister in Canada?'
{'answer': 'As of 2026, the current prime minister of Canada is Mark Carney. He was elected in 2025 and leads the Liberal Party.', 'results': [{'url': 'https://www.pm.gc.ca/en/about', 'title': 'About | Prime Minister of Canada', 'content': "Mark Carney is Canada's 24th Prime Minister. He was elected Leader of the Liberal Party of Canada and sworn in as Prime Minister in March 2025.", 'score': 0.9995695, 'raw_content': None}, {'url': 'https://en.wikipedia.org/wiki/Prime_Minister_of_Canada', 'title': 'Prime Minister of Canada', 'content': 'The prime minister is supported by the Prime Minister\'s Office "Office of the Prime Minister (Canada)") and heads the Privy Council Office "Privy Council Office (Canada)"). The prime minister also selects individuals for appointment as governor general, provincial lieutenant governors "Lieutenant Governor (Canada)"), territorial commissioners, as well as to the Senat

### Agent

In [201]:
# TODO: Create your Agent abstraction using StateMachine
# Equip with an appropriate model
# Craft a good set of instructions 
# Plug all Tools you developed

tools = [retrieve_game, evaluate_retrieval, web_search]

# Test the tools individually to verify they work before agent integration
print("#### Testing retrieve_game_info:")
print("################################")
results = retrieve_game("Nintendo games")
# print(f"Retrieved {len(results)} results")

print("\n#### Testing evaluate_retrieval:")
print("#################################")
evaluation = evaluate_retrieval("What Nintendo games are available?", results)
print(f"Evaluation result: {evaluation}")

if not evaluation['useful']:
    print("\n#### Testing web_search:")
    print("########################")
    web_result = web_search.__call__(query = "Which are the 3 most popular Nintendo Switch games")
    print(f"{web_result}")
    # print(f"Web search result preview: {web_result}")





#### Testing retrieve_game_info:
################################
Retrieved 5 games from vector database

#### Testing evaluate_retrieval:
#################################
Evaluation: LLM decision is useful: 'False' and 'The retrieved documents provide a limited selection of Nintendo games across various platforms, including the Game Boy Color, Nintendo 64, Wii, Super Nintendo Entertainment System (SNES), and Game Boy Advance. While these documents offer a glimpse into some popular and iconic Nintendo games, they are not comprehensive enough to fully answer the query, 'What Nintendo games are available?' 

**Reasons for Insufficiency:**
1. **Limited Scope:** The documents only cover a small subset of Nintendo's extensive game library. Nintendo has released hundreds of games across multiple platforms, including the Nintendo Switch, Nintendo DS, and more, which are not represented in the retrieved documents.
2. **Platform Diversity:** The query is broad and could encompass games from al

In [202]:
# TODO: Create your Agent abstraction using StateMachine
# Equip with an appropriate model
# Craft a good set of instructions 
# Plug all Tools you developed

udaplay_agent = Agent(
    model_name="gpt-4o-mini",
    instructions=(
            "You are a web-aware assistant that can search for update information "
            "For each query, you must use your retrieve_game tool to get relevant information from the vector database to answer the question. "
            "After you receive the results of that tool, use the evaluate_retrieval tool to see if the results are sufficient to answer the question. "
            "If they are sufficient and useful is true, then return the answer. "
            "If they are insufficient and useful is false, fall back to using the web_search tool to look up the query on the internet. "
            "The web search results can then be used to draft your final answer." 
            "Always cite your sources and explain any discrepancies found.\n"
    ),
    tools = tools
)

print('UdaPlay Agent instantiated successfully!')
print(f'Model: {udaplay_agent.model_name}')
print(f'Number of tools: {len(udaplay_agent.tools)}')
print(f'Available tools: {[tool.name for tool in udaplay_agent.tools]}')

UdaPlay Agent instantiated successfully!
Model: gpt-4o-mini
Number of tools: 3
Available tools: ['retrieve_game', 'evaluate_retrieval', 'web_search']


In [203]:
def print_messages(messages: List[BaseMessage]):
    for m in messages:
        print(f" -> (role = {m.role}, content = {m.content}, tool_calls = {getattr(m, 'tool_calls', None)})")

In [204]:
queries = [
    "When was Pokémon Gold and Silver released?",
    "Which one was the first 3D platformer Mario game?",
    "Was Mortal Kombat X released for Playstation 5?"
]

for i, query in enumerate(queries):
    print(f"Query: {query}")
    run = simple_agent.invoke(query, session_id = i)
    messages = run.get_final_state()['messages']
    print_messages(messages)



Query: When was Pokémon Gold and Silver released?
[StateMachine] Starting: __entry__
[StateMachine] Executing step: message_prep
[StateMachine] Executing step: llm_processor
[StateMachine] Terminating: __termination__
 -> (role = system, content = You are a web-aware assistant that can search for update information For each query, you must use your retrieve_game tool to get relevant information from the vector database to answer the question. After you receive the results of that tool, use the evaluate_retrieval tool to see if the results are sufficient to answer the question. If they are deemed sufficient and useful is true, then return the answer. If they are deemed insufficient and useful is false, fall back to using the web_search tool to look up the query on the internet. The web search results can then be used to draft your final answer.Always cite your sources and explain any discrepancies found.
, tool_calls = None)
 -> (role = user, content = When was Pokémon Gold and Silver r

In [205]:
# Demonstrate Let's prove our agent kept short term memory of the session
for i in range(len(queries)):
    print(f"Session {i}:")
    query = "What have we talked about so far?"
    run = simple_agent.invoke(query, session_id = i)
    print(run.get_final_state()["messages"][-1].content)

Session 0:
[StateMachine] Starting: __entry__
[StateMachine] Executing step: message_prep
[StateMachine] Executing step: llm_processor
[StateMachine] Terminating: __termination__
You have repeatedly asked about the release date of Pokémon Gold and Silver, and I have consistently provided the information that they were released for the Game Boy Color in 1999.
Session 1:
[StateMachine] Starting: __entry__
[StateMachine] Executing step: message_prep
[StateMachine] Executing step: llm_processor
[StateMachine] Terminating: __termination__
We've repeatedly discussed the first 3D platformer Mario game. Each time, I confirmed that *Super Mario 64*, released in 1996 for the Nintendo 64, is the first 3D platformer in the Mario series. It introduced significant innovations in 3D gameplay and open-world design. Your questions have focused on verifying this information multiple times.
Session 2:
[StateMachine] Starting: __entry__
[StateMachine] Executing step: message_prep
[StateMachine] Executing 

### (Optional) Advanced

In [206]:
# TODO: Update your agent with long-term memory
# TODO: Convert the agent to be a state machine, with the tools being pre-defined nodes