<center>
    <p style="text-align:center">
        <img alt="phoenix logo" src="https://storage.googleapis.com/arize-phoenix-assets/assets/phoenix-logo-light.svg" width="200"/>
        <br>
        <a href="https://arize.com/docs/phoenix/">Docs</a>
        |
        <a href="https://github.com/Arize-ai/phoenix">GitHub</a>
        |
        <a href="https://arize-ai.slack.com/join/shared_invite/zt-2w57bhem8-hq24MB6u7yE_ZF_ilOYSBw#/shared-invite/email">Community</a>
    </p>
</center>
<h1 align="center">Agno Travel Agent Tracing Project</h1>

We will create a simple travel agent powered by the Agno framework and OpenAI models. We'll begin by installing the necessary OpenInference packages and setting up tracing with Arize Phoenix.

Next, we'll define a set of basic tools that provide destination information, estimate trip budgets, and suggest local activities.

With these tools in place, we'll build and run the base agent, then view the resulting traces in Phoenix to understand how the agent reasons and uses its tools.

Finally, we'll work through the Evals tutorials:
- Configure a core LLM and run a built-in eval
- Configure a custom-endpoint LLM
- Create a custom eval
- Code evals

You will need Arize Phoenix installed in your terminal (`pip install arize-phoenix`), an OpenAI API key, a free [Tavily](https://auth.tavily.com/) API key, and, for the custom-model section, a Fireworks API key.

Ensure you have `phoenix serve` running in your terminal prior to running the following cells. 

## Set up keys and dependencies

In [None]:
%pip install -qqqqqq arize-phoenix-otel arize-phoenix-evals agno openai openinference-instrumentation-agno openinference-instrumentation-openai httpx

In [None]:
import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = globals().get("OPENAI_API_KEY") or getpass(
    "🔑 Enter your OpenAI API Key: "
)
os.environ["TAVILY_API_KEY"] = globals().get("TAVILY_API_KEY") or getpass(
    "🔑 Enter your Tavily API Key: "
)

## Set up tracing

In [None]:
from phoenix.otel import register

tracer_provider = register(project_name="agno_travel_agent", auto_instrument=True)

## Define tools

First, we'll define a few helper functions to support our tools. In particular, we'll use Tavily Search to help the tools gather general information about each destination.

In [None]:
# --- Helper functions for tools ---
import httpx


def _search_api(query: str) -> str | None:
    """Try Tavily search first, fall back to None."""
    tavily_key = os.getenv("TAVILY_API_KEY")
    if not tavily_key:
        return None
    try:
        resp = httpx.post(
            "https://api.tavily.com/search",
            json={
                "api_key": tavily_key,
                "query": query,
                "max_results": 3,
                "search_depth": "basic",
                "include_answer": True,
            },
            timeout=8,
        )
        data = resp.json()
        answer = data.get("answer") or ""
        snippets = [r.get("content", "") for r in data.get("results", [])]
        combined = " ".join([answer] + snippets).strip()
        return combined[:400] if combined else None
    except Exception:
        return None


def _compact(text: str, limit: int = 200) -> str:
    """Compact text for cleaner outputs."""
    cleaned = " ".join(text.split())
    return cleaned if len(cleaned) <= limit else cleaned[:limit].rsplit(" ", 1)[0]

Our agent will have access to three tools, which we'll continue to enhance in upcoming labs:

1. Essential Info â€“ Provides key travel details about the destination, such as weather and general conditions.

2. Budget Basics â€“ Offers insights into travel costs and helps plan budgets based on selected activities.

3. Local Flavor â€“ Recommends unique local experiences and cultural highlights.

In [None]:
from agno.tools import tool


@tool
def essential_info(destination: str) -> str:
    """Get basic travel info (weather, best time, attractions, etiquette)."""
    q = f"{destination} travel essentials weather best time top attractions etiquette"
    s = _search_api(q)
    if s:
        return f"{destination} essentials: {_compact(s)}"

    return f"{destination} is a popular travel destination. Expect local culture, cuisine, and landmarks worth exploring."


@tool
def budget_basics(destination: str, duration: str) -> str:
    """Summarize travel cost categories."""
    q = f"{destination} travel budget average daily costs {duration}"
    s = _search_api(q)
    if s:
        return f"{destination} budget ({duration}): {_compact(s)}"
    return f"Budget for {duration} in {destination} depends on lodging, meals, transport, and attractions."


@tool
def local_flavor(destination: str, interests: str = "local culture") -> str:
    """Suggest authentic local experiences."""
    q = f"{destination} authentic local experiences {interests}"
    s = _search_api(q)
    if s:
        return f"{destination} {interests}: {_compact(s)}"
    return f"Explore {destination}'s unique {interests} through markets, neighborhoods, and local eateries."
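All three tools follow the same pattern: build a search query, return the search summary if one comes back, and otherwise return a generic fallback sentence. The sketch below isolates that pattern for `essential_info` with the search function injected as a parameter, so it can be exercised without Tavily or Agno. `make_essential_info`, `fake_search_hit`, and `fake_search_miss` are hypothetical names for illustration, and the `_compact` step is omitted for brevity.

```python
from typing import Callable, Optional


def make_essential_info(search: Callable[[str], Optional[str]]) -> Callable[[str], str]:
    """Build an essential_info-style tool around an injectable search function."""

    def essential_info(destination: str) -> str:
        q = f"{destination} travel essentials weather best time top attractions etiquette"
        s = search(q)
        if s:
            # Search succeeded: return its summary.
            return f"{destination} essentials: {s}"
        # Search failed or returned nothing: fall back to a generic sentence.
        return (
            f"{destination} is a popular travel destination. Expect local culture, "
            "cuisine, and landmarks worth exploring."
        )

    return essential_info


# Hypothetical stand-ins for _search_api: one that "finds" something, one that fails.
def fake_search_hit(query: str) -> Optional[str]:
    return "Best visited in spring or autumn."


def fake_search_miss(query: str) -> Optional[str]:
    return None


print(make_essential_info(fake_search_hit)("Tokyo"))
print(make_essential_info(fake_search_miss)("Tokyo"))
```

Injecting the search function, rather than calling `_search_api` directly, is one way to unit-test tool logic offline; the notebook's tools call the helper directly, which keeps the tutorial simpler.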

## Define agent

Next, we'll construct our agent. The Agno framework makes this process straightforward by allowing us to easily define key parameters such as the model, instructions, and tools.

In [None]:
from agno.agent import Agent
from agno.models.openai import OpenAIChat

trip_agent = Agent(
    name="TripPlanner",
    role="AI Travel Assistant",
    model=OpenAIChat(id="gpt-4.1"),
    instructions=(
        "You are a friendly and knowledgeable travel planner. "
        "Combine multiple tools to create a trip plan including essentials, budget, and local flavor. "
        "Keep the tone natural, clear, and under 1000 words."
    ),
    markdown=True,
    tools=[essential_info, budget_basics, local_flavor],
)

## Run agent

Finally, we are ready to run our agent! Run this cell to see an example in action.

In [None]:
# --- Example usage ---
destination = "Tokyo"
duration = "5 days"
interests = "food, culture"

query = f"""
Plan a {duration} trip to {destination}.
Focus on {interests}.
Include essential info, budget breakdown, and local experiences.
"""
trip_agent.print_response(query, stream=True)

In [None]:
from phoenix.client import Client

client = Client()
spans_df = client.spans.get_spans_dataframe(project_identifier="agno_travel_agent")
agent_spans = spans_df[spans_df["span_kind"] == "AGENT"]
agent_spans

# Run evals with built-in eval templates & an OpenAI Model

Let's first use a standard OpenAI config for our LLM and a built-in template for our first eval.

In [None]:
from phoenix.evals.llm import LLM

llm = LLM(
    provider="openai",
    model="gpt-4o",
    client="openai",
)

In [None]:
from phoenix.evals.metrics import CorrectnessEvaluator

correctness_eval = CorrectnessEvaluator(llm=llm)

print(correctness_eval.describe())

In [None]:
from phoenix.evals import bind_evaluator, evaluate_dataframe
from phoenix.trace import suppress_tracing

bound_evaluator = bind_evaluator(
    evaluator=correctness_eval,
    input_mapping={
        "input": "attributes.input.value",
        "output": "attributes.output.value",
    },
)

with suppress_tracing():
    results_df = evaluate_dataframe(agent_spans, [bound_evaluator])
print(results_df)
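The `input_mapping` passed to `bind_evaluator` tells the evaluator which span column feeds each of its inputs: the evaluator's `input` comes from `attributes.input.value` and its `output` from `attributes.output.value`. The plain-Python sketch below mimics that effect on two invented rows; `toy_mapping`, `toy_rows`, and `evaluator_inputs` are hypothetical names, and this is not how Phoenix implements the binding internally.

```python
# Hypothetical illustration of what input_mapping accomplishes; this is not
# Phoenix's implementation. Keys are the evaluator's input names, values are
# the span-dataframe columns that supply them.
toy_mapping = {
    "input": "attributes.input.value",
    "output": "attributes.output.value",
}

# Two invented rows, shaped like rows of agent_spans (column name -> value).
toy_rows = [
    {
        "span_kind": "AGENT",
        "attributes.input.value": "Plan a 5 days trip to Tokyo.",
        "attributes.output.value": "Day 1: Asakusa and Senso-ji.",
    },
    {
        "span_kind": "AGENT",
        "attributes.input.value": "Plan a 3 days trip to Lisbon.",
        "attributes.output.value": "Day 1: Alfama and a fado show.",
    },
]


def evaluator_inputs(row: dict[str, str], mapping: dict[str, str]) -> dict[str, str]:
    """Pick out and rename the columns an evaluator needs for one row."""
    return {name: row[column] for name, column in mapping.items()}


payloads = [evaluator_inputs(row, toy_mapping) for row in toy_rows]
for payload in payloads:
    print(payload)
```

Columns not named in the mapping (such as `span_kind` here) never reach the evaluator.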

In [None]:
from phoenix.evals.utils import to_annotation_dataframe

evaluations = to_annotation_dataframe(dataframe=results_df)

Client().spans.log_span_annotations_dataframe(dataframe=evaluations)

# Run evals with built-in eval templates with a Custom Model

Let's now configure a custom LLM, here a Qwen model served through Fireworks' OpenAI-compatible endpoint, and re-run the same built-in correctness evaluator with it.

In [None]:
import os
from getpass import getpass

os.environ["FIREWORKS_API_KEY"] = globals().get("FIREWORKS_API_KEY") or getpass(
    "🔑 Enter your Fireworks API Key: "
)

In [None]:
from phoenix.evals.llm import LLM

custom_llm = LLM(
    provider="openai",
    model="accounts/fireworks/models/qwen3-235b-a22b-instruct-2507",
    base_url="https://api.fireworks.ai/inference/v1",
    api_key=os.environ.get("FIREWORKS_API_KEY"),
)

In [None]:
from phoenix.evals import bind_evaluator, evaluate_dataframe
from phoenix.evals.metrics import CorrectnessEvaluator
from phoenix.trace import suppress_tracing

correctness_eval = CorrectnessEvaluator(llm=custom_llm)

bound_evaluator = bind_evaluator(
    evaluator=correctness_eval,
    input_mapping={
        "input": "attributes.input.value",
        "output": "attributes.output.value",
    },
)

with suppress_tracing():
    results_df = evaluate_dataframe(agent_spans, [bound_evaluator])
print(results_df)

In [None]:
evaluations = to_annotation_dataframe(dataframe=results_df)
Client().spans.log_span_annotations_dataframe(dataframe=evaluations)

# Create a Custom Evaluator 

Let's now create a custom evaluator that uses the custom LLM config we just created. We'll keep the same idea of correctness but add more application-specific context to it.

In [None]:
CUSTOM_CORRECTNESS_TEMPLATE = """You are an expert evaluator judging whether a travel planner agent's response is correct. The agent is a friendly travel planner that must combine multiple tools to create a trip plan with: (1) essential info, (2) budget breakdown, and (3) local flavor/experiences.

CORRECT - The response:
- Accurately addresses the user's destination, duration, and stated interests
- Includes essential travel info (e.g., weather, best time to visit, key attractions, etiquette) for the destination
- Includes a budget or cost breakdown appropriate to the destination and trip duration
- Includes local experiences, cultural highlights, or authentic recommendations matching the user's interests
- Is factually accurate, logically consistent, and helpful for planning the trip
- Uses precise, travel-appropriate terminology

INCORRECT - The response contains any of:
- Factual errors about the destination, costs, or local info
- Missing essential info when the user asked for a full trip plan
- Missing or irrelevant budget information for the given destination/duration
- Missing or generic local experiences that do not match the user's interests
- Wrong destination, duration, or interests addressed
- Contradictions, misleading statements, or unhelpful/off-topic content

[BEGIN DATA]
************
[User Input]:
{{input}}

************
[Travel Plan]:
{{output}}
************
[END DATA]

Focus on factual accuracy and completeness of the trip plan (essentials, budget, local flavor). Is the output correct or incorrect?"""
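The custom template marks its variables with double braces (`{{input}}`, `{{output}}`). When the evaluator runs, each row's mapped input and output are filled into those slots before the prompt reaches the judge LLM. The sketch below shows that substitution with a minimal regex-based renderer, an assumption for illustration only (Phoenix's own template handling may differ), applied to a shortened template. In the notebook you could pass `CUSTOM_CORRECTNESS_TEMPLATE` instead to preview a full judge prompt.

```python
import re


# Minimal mustache-style renderer, assumed for illustration only: it replaces
# each {{name}} with the matching value and fails loudly on a missing variable.
def render(template: str, variables: dict[str, str]) -> str:
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"missing template variable: {key}")
        return variables[key]

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


mini_template = "[User Input]:\n{{input}}\n[Travel Plan]:\n{{output}}"
prompt = render(
    mini_template,
    {"input": "Plan a 5 days trip to Tokyo.", "output": "Day 1: Asakusa."},
)
print(prompt)
```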

In [None]:
from phoenix.evals import ClassificationEvaluator

custom_correctness_evaluator = ClassificationEvaluator(
    name="custom_correctness",
    llm=custom_llm,
    prompt_template=CUSTOM_CORRECTNESS_TEMPLATE,
    choices={"correct": 1, "incorrect": 0},
)
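The `choices` argument pairs each label the judge may return with a numeric score, here `1` for `correct` and `0` for `incorrect`. With binary scores like these, the mean score is simply the fraction of runs judged correct. The sketch below shows that arithmetic on invented labels; it does not read `results_df`, whose exact column layout is not shown in this notebook.

```python
# Same label-to-score pairing as the evaluator's choices argument.
toy_choices = {"correct": 1, "incorrect": 0}

# Invented labels, standing in for what the judge might return per agent span.
labels = ["correct", "incorrect", "correct", "correct"]

scores = [toy_choices[label] for label in labels]
pass_rate = sum(scores) / len(scores)  # mean of 0/1 scores = fraction correct
print(f"{pass_rate:.0%} of runs judged correct")
```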

In [None]:
bound_evaluator = bind_evaluator(
    evaluator=custom_correctness_evaluator,
    input_mapping={
        "input": "attributes.input.value",
        "output": "attributes.output.value",
    },
)

with suppress_tracing():
    results_df = evaluate_dataframe(agent_spans, [bound_evaluator])
print(results_df)

In [None]:
evaluations = to_annotation_dataframe(dataframe=results_df)
Client().spans.log_span_annotations_dataframe(dataframe=evaluations)