[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://drive.google.com/drive/folders/1IrwoNrb3AWLAhAqjlAkJNYa39p9eT9ui?usp=sharing)

# Flotorch Agent Goal Accuracy Evaluation (Travel Planner Agent Use Case)

This notebook demonstrates how to measure and analyze the **agent goal accuracy** of a **Flotorch ADK agent** (configured as a **Travel Planner Agent** that generates travel plans with real-time pricing and best options) using the **Flotorch Eval** framework.

The evaluation relies on **OpenTelemetry Traces** generated during the agent's run to assess whether the agent successfully accomplished the user's true goal.

---

## Key Concepts

* **Travel Planner Agent**: An agent designed to generate travel plans with real-time pricing and best options.
* **OpenTelemetry Traces**: Detailed records of the agent's execution steps (spans) used to analyze goal accomplishment.
* **AgentGoalAccuracy**: A Flotorch Eval metric that evaluates whether the agent successfully accomplished the user's true goal. The evaluation metric used is **agent_goal_accuracy**.

---

### Architecture Overview

![Workflow Diagram](diagrams/05_AgentGoalAccuracy_Workflow_Diagram.drawio.png)
*Figure 2: Detailed workflow diagram showing the step-by-step process of agent goal accuracy evaluation from agent execution through trace collection to metric computation.*

---

## Requirements

* Flotorch account with configured models.
* Valid Flotorch API key and gateway base URL.
* Agent configured with OpenTelemetry tracing enabled.

---

## Agent Setup in Flotorch Console

**Important**: Before running this notebook, you need to create an agent in the Flotorch Console. This section provides step-by-step instructions on how to set up the agent.

### Step 1: Access Flotorch Console

1. **Log in to Flotorch Console**:
   - Navigate to your Flotorch Console (e.g., `https://dev-console.flotorch.cloud`)
   - Ensure you have the necessary permissions to create agents

2. **Navigate to Agents Section**:
   - Click on **"Agents"** in the left sidebar
   - You should see the "Agent Builder" option selected

### Step 2: Create New Agent

1. **Click "Create FloTorch Agent"**:
   - Look for the blue **"+ Create FloTorch Agent"** button in the top right corner
   - Click it to start creating a new agent

2. **Agent Configuration**:
   - **Agent Name**: Choose a unique name for your agent (e.g., `travel-planner-agent`)
     - **Important**: The name should only contain alphanumeric characters and dashes (a-z, A-Z, 0-9, -)
     - **Note**: Copy this agent name - you'll need to use it in the `agent_name` variable later
   - **Description** (Optional): Add a description if desired
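
The naming rule above can be checked before you paste the name into the notebook. The sketch below is a minimal, hypothetical helper (`is_valid_agent_name` is not part of Flotorch). It encodes only the character rule stated in this step; the console may enforce further constraints, such as length or uniqueness, that are not modeled here.

```python
import re

# Allowed characters per the rule above: a-z, A-Z, 0-9, and dashes.
_AGENT_NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def is_valid_agent_name(name: str) -> bool:
    """Return True if `name` is non-empty and uses only letters, digits, and dashes."""
    return bool(_AGENT_NAME_PATTERN.fullmatch(name))


print(is_valid_agent_name("travel-planner-agent"))  # True
print(is_valid_agent_name("travel planner agent"))  # False (contains spaces)
```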

### Step 3: Configure Agent Details

After creating the agent, you'll be directed to the agent configuration page. Configure the following:

#### Required Configuration:

1. **Model** (`* Model`):
   - **Required**: Select a model from the available options
   - Example: `gpt-model` or any available model from your Flotorch gateway
   - Click the edit icon to configure

2. **Agent Details** (`* Agent Details`):
   - **Required**: Configure agent details
   - **System Prompt**: Copy and paste the following system prompt:

You are the Travel Planner Agent. Your purpose is to help users create comprehensive travel plans with real-time pricing and best options. Use available tools to search for flight options, hotel recommendations, weather information, and local attractions to build detailed itineraries.

Guidelines:
Always use web_search to find current flight prices, hotel rates, and travel information.
Use weather tools to provide accurate weather forecasts for the travel dates.
Create detailed daily itineraries that include activities, dining, and sightseeing options.
Calculate and present estimated total costs including flights, hotels, meals, and activities.
Provide multiple options when possible to help users make informed decisions.
Present information in a clear, organized format with dates, times, and locations.
Always verify current pricing and availability using search tools before making recommendations.

   - **Goal**: Copy and paste the following goal:
   
To deliver comprehensive travel plans with real-time pricing, detailed itineraries, and best options by using search tools to gather current flight information, hotel recommendations, weather forecasts, and local attractions, ensuring users receive accurate, up-to-date travel planning assistance.


#### Optional Configuration:

1. **Tools**:
   - Tools will be added programmatically via the notebook (see Section 8)
   - You can leave this as "Not Configured" in the console

2. **Input Schema**:
   - Optional: Leave as "Not Configured" for this use case

3. **Output Schema**:
   - Optional: Leave as "Not Configured" for this use case

### Step 4: Publish the Agent

1. **Review Configuration**:
   - Ensure the Model and Agent Details are configured correctly
   - Verify the System Prompt and Goal are set

2. **Publish Agent**:
   - After configuration, click **"Publish"** or **"Make a revision"** to publish the agent
   - Once published, the agent will have a version number (e.g., v1)

3. **Note the Agent Name**:
   - **Important**: Copy the exact agent name you used when creating the agent
   - You will need to replace `<your_agent_name>` in the `agent_name` variable in Section 2.1 (Global Provider Models and Agent Configuration)

### Step 5: Update Notebook Configuration

1. **Update Agent Name**:
   - Navigate to Section 2.1 in this notebook
   - Find the `agent_name` variable
   - Replace `<your_agent_name>` with the exact agent name you created in the console

**Example**:
- If you created an agent named `travel-planner-agent` in the console
- Set `agent_name = "travel-planner-agent"` in the notebook
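
A common slip is to run the notebook with a `<your_...>` placeholder still in place. The following sketch shows one way to catch that early. The helper name `find_unset_placeholders` and the dictionary layout are illustrative assumptions, not part of the Flotorch SDK.

```python
import re

# Matches values that consist entirely of an angle-bracket placeholder, e.g. "<your_agent_name>".
_PLACEHOLDER = re.compile(r"<[^<>]+>")


def find_unset_placeholders(config: dict) -> list:
    """Return the keys whose values are empty or still look like `<...>` placeholders."""
    return [
        key
        for key, value in config.items()
        if not str(value).strip() or _PLACEHOLDER.fullmatch(str(value).strip())
    ]


config = {
    "agent_name": "travel-planner-agent",
    "app_name": "<your_app_name>",
    "user_id": "agent-evaluation-user-05",
}
print(find_unset_placeholders(config))  # ['app_name']
```

Running a check like this right after the configuration cell makes a forgotten placeholder fail fast instead of surfacing later as an agent lookup error.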

### Summary of Required vs Optional Settings

| Setting | Required/Optional | Value |
|---------|------------------|-------|
| **Agent Name** | **Required** | Choose a unique name (copy it for notebook) |
| **Model** | **Required** | Select from available models |
| **System Prompt** | **Required** | Use the system prompt provided above |
| **Goal** | **Required** | Use the goal provided above |
| **Tools** | **Optional** | Will be added via notebook code |
| **Input Schema** | **Optional** | Can leave as "Not Configured" |
| **Output Schema** | **Optional** | Can leave as "Not Configured" |

**Note**: The tools (Knowledge Base, Web Search, Weather, News) will be added to the agent programmatically in the notebook code, so you don't need to configure them manually in the console.

---


## 1. Setup and Installation

### Purpose
Install the necessary packages for the Flotorch Evaluation framework required for agent goal accuracy evaluation.

### Key Components
- **`flotorch-eval`**: Flotorch evaluation framework with all dependencies for agent goal accuracy metrics


In [None]:
# Install Flotorch Eval packages
# flotorch-eval: Flotorch evaluation framework with all dependencies

%pip install flotorch-eval==2.0.0b1 flotorch[adk]==3.1.0b1

## 2. Authentication and Credentials

### Purpose
Configure your Flotorch API credentials and gateway URL for authentication.

### Key Components
This cell configures the essential authentication and connection parameters:

**Authentication Parameters**:

| Parameter | Description | Example |
|-----------|-------------|---------|
| `FLOTORCH_API_KEY` | Your API authentication key (found in your Flotorch Console). Entered with `getpass` so it is not displayed in the notebook | `sk_...` |
| `FLOTORCH_BASE_URL` | Your Flotorch gateway endpoint URL | `https://dev-gateway.flotorch.cloud` |

**Note**: Use secure credential management in production environments.
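
As one possible approach to the note above, credentials can be read from environment variables and prompted for only when they are missing. This is a minimal sketch: `load_credential` is a hypothetical helper, and reusing the notebook's variable names as environment variable names is an assumption rather than a Flotorch convention.

```python
import getpass
import os
from typing import Mapping, Optional


def load_credential(name: str, env: Optional[Mapping[str, str]] = None, secret: bool = True) -> str:
    """Return the value of `name` from the environment, prompting only if it is unset."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if value:
        return value
    # Fall back to an interactive prompt; getpass hides the typed value.
    prompt = f"Enter {name}: "
    return (getpass.getpass(prompt) if secret else input(prompt)).strip()


# Example with an explicit mapping, so no prompt is triggered:
example_env = {"FLOTORCH_BASE_URL": "https://dev-gateway.flotorch.cloud"}
print(load_credential("FLOTORCH_BASE_URL", env=example_env, secret=False))
```

Passing `env` explicitly keeps the function easy to test. In the notebook you would call it without `env` so that it reads `os.environ`.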


In [None]:
import getpass  # Prompts for input without echoing it in the notebook

# Authentication for Flotorch access.
# GetPassWarning is issued as a warning and is not normally raised as an exception,
# so validate the entered values directly instead of relying on try/except.
FLOTORCH_API_KEY = getpass.getpass("Paste your API key here: ").strip()
if FLOTORCH_API_KEY:
    print("✓ FLOTORCH_API_KEY set successfully")
else:
    print("✗ FLOTORCH_API_KEY not set")

FLOTORCH_BASE_URL = input("Paste your Flotorch Base URL here: ").strip()  # e.g., https://dev-gateway.flotorch.cloud
if FLOTORCH_BASE_URL:
    print(f"✓ FLOTORCH_BASE_URL set: {FLOTORCH_BASE_URL}")
else:
    print("✗ FLOTORCH_BASE_URL not set")

if FLOTORCH_API_KEY and FLOTORCH_BASE_URL:
    print("✓ All credentials configured successfully!")

### 2.1. Global Provider Models and Agent Configuration

### Purpose
Define available models from the Flotorch gateway and configure agent-specific parameters.

### Key Components

**Global Provider Models**: These are the available models from the Flotorch gateway that can be used for evaluation and agent operations:

| Model Variable | Model Name | Description |
|----------------|------------|-------------|
| `MODEL_CLAUDE_HAIKU` | `flotorch/flotorch-claude-haiku-4-5` | Claude Haiku model via Flotorch gateway |
| `MODEL_CLAUDE_SONNET` | `flotorch/flotorch-claude-sonnet-3-5-v2` | Claude Sonnet model via Flotorch gateway |
| `MODEL_AWS_NOVA_PRO` | `flotorch/flotorch-aws-nova-pro` | AWS Nova Pro model via Flotorch gateway |
| `MODEL_AWS_NOVA_LITE` | `flotorch/flotorch-aws-nova-lite` | AWS Nova Lite model via Flotorch gateway |
| `MODEL_AWS_NOVA_MICRO` | `flotorch/flotorch-aws-nova-micro` | AWS Nova Micro model via Flotorch gateway |

**Agent Configuration Parameters**:

| Parameter | Description | Example |
|-----------|-------------|---------|
| `default_evaluator` | The LLM model used for evaluation (can use MODEL_* variables above) | `MODEL_CLAUDE_SONNET` or `flotorch/flotorch-model` |
| `agent_name` | The name of your Flotorch ADK agent | `travel-planner-agent` |
| `app_name` | The application name identifier | `agent-evaluation-app-name_05` |
| `user_id` | The user identifier | `agent-evaluation-user-05` |


In [None]:
# ============================================================================
# Global Provider Models (Flotorch Gateway Models)
# ============================================================================
# These models are available from the Flotorch gateway and can be used
# for evaluation, agent operations, and other tasks.

MODEL_CLAUDE_HAIKU = "flotorch/flotorch-claude-haiku-4-5"
MODEL_CLAUDE_SONNET = "flotorch/flotorch-claude-sonnet-3-5-v2"
MODEL_AWS_NOVA_PRO = "flotorch/flotorch-aws-nova-pro"
MODEL_AWS_NOVA_LITE = "flotorch/flotorch-aws-nova-lite"
MODEL_AWS_NOVA_MICRO = "flotorch/flotorch-aws-nova-micro"

print("✓ Global provider models defined")

# The LLM model used for evaluation.
# Can be modified to use any MODEL_* constant above (e.g., MODEL_CLAUDE_SONNET, MODEL_AWS_NOVA_PRO)
# You can use your own models from Flotorch Console as well
default_evaluator = MODEL_CLAUDE_HAIKU

agent_name = "<your_agent_name>"  # Name of your Flotorch ADK agent, e.g., travel-planner-agent
app_name = "<your_app_name>"      # Application name identifier, e.g., agent-evaluation-app-name_05
user_id = "<your_user_id>"        # User identifier, e.g., agent-evaluation-user-05

print("✓ Agent configuration parameters defined")


## 3. Import Required Libraries

### Purpose
Import all required components for evaluating the Travel Planner Agent goal accuracy using Flotorch Eval.

### Key Components
- **`AgentEvaluator`**: Core client for agent evaluation orchestration and trace fetching
- **`AgentGoalAccuracy`**: Flotorch Eval metric that evaluates whether the agent successfully accomplished the user's true goal
- **`FlotorchADKAgent`**: Creates and configures Flotorch ADK agents with tracing
- **`FlotorchADKSession`**: Manages agent sessions for multi-turn conversations
- **`Runner`**: Executes agent queries and coordinates the agent execution flow
- **`types`**: Google ADK types for creating message content and handling agent events
- **`pandas`**: Data manipulation and display for formatted results tables
- **`display`**: IPython display utility for rendering formatted outputs

In [None]:
# Required imports
# Flotorch Eval components
from flotorch_eval.agent_eval.core.client import AgentEvaluator
from flotorch_eval.agent_eval.metrics.llm_evaluators import AgentGoalAccuracy

# Flotorch ADK components
from flotorch.adk.agent import FlotorchADKAgent
from flotorch.adk.sessions import FlotorchADKSession

# Google ADK components
from google.adk.runners import Runner
from google.genai import types

# Utilities
import pandas as pd
from IPython.display import display

print("✓ Imported necessary libraries successfully")

## 4. Travel Planner Agent Setup

### Purpose
Set up the Travel Planner Agent with OpenTelemetry tracing enabled to capture detailed execution data for agent goal accuracy evaluation.

### Key Components
1. **FlotorchADKAgent** (`agent_client`):
   - Initializes the agent for travel planning tasks
   - Configures `tracer_config` with `enabled: True` and `sampling_rate: 1` to capture 100% of traces
   - Essential for evaluation as traces contain complete goal accomplishment information
2. **FlotorchADKSession** (`session_service`): Manages agent sessions for multi-turn conversations
3. **Runner** (`runner`): Executes agent queries and coordinates the agent execution flow

These components work together to run the Travel Planner Agent and generate OpenTelemetry traces for agent goal accuracy analysis.
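
Because evaluation depends on the trace being captured, it can help to validate the tracing settings before constructing the agent. The sketch below builds a dictionary with the same three keys used in this notebook. `make_tracer_config` is a hypothetical helper, and treating `sampling_rate` as a fraction between 0 and 1 is an assumption inferred from `1` meaning 100%.

```python
def make_tracer_config(endpoint: str, sampling_rate: float = 1.0, enabled: bool = True) -> dict:
    """Build a tracer_config dict with the keys used in this notebook, after basic checks."""
    if not endpoint.startswith(("http://", "https://")):
        raise ValueError(f"endpoint must be an HTTP(S) URL, got {endpoint!r}")
    # Assumption: sampling_rate is a fraction, where 1 means every trace is sampled.
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError(f"sampling_rate must be between 0 and 1, got {sampling_rate}")
    return {"enabled": enabled, "endpoint": endpoint, "sampling_rate": sampling_rate}


tracer_config = make_tracer_config("https://dev-observability.flotorch.cloud/v1/traces")
print(tracer_config["sampling_rate"])  # 1.0
```

For evaluation runs, keep the sampling rate at 1: if a run is not sampled, there is no trace for the metric to analyze.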

In [None]:
# Initialize Flotorch ADK Agent with tracing enabled
agent_client = FlotorchADKAgent(
    agent_name=agent_name,
    base_url=FLOTORCH_BASE_URL,
    api_key=FLOTORCH_API_KEY,
    tracer_config={
        "enabled": True,                                                   # Enable tracing so the run can be evaluated
        "endpoint": "https://dev-observability.flotorch.cloud/v1/traces",  # Dev observability OTLP HTTP endpoint; replace with the endpoint for your environment
        "sampling_rate": 1                                                 # Sample 100% of traces
    }
)
agent = agent_client.get_agent()

# Initialize session service
session_service = FlotorchADKSession(
    api_key=FLOTORCH_API_KEY,
    base_url=FLOTORCH_BASE_URL,
)

# Create the ADK Runner to execute agent queries
runner = Runner(
    agent=agent,
    app_name=app_name,
    session_service=session_service
)

print("✓ Agent, session service, and runner initialized successfully")

## 5. Helper Function for Running a Query

### Purpose
Define a helper function that executes a single-turn query with the agent and extracts the final response. The agent execution is automatically traced for agent goal accuracy evaluation.

### Functionality
The `run_single_turn` function:
- Accepts a `Runner`, query string, session ID, and user ID as parameters
- Creates a user message using Google ADK types
- Executes the query through the runner
- Iterates through events to find and return the final agent response
- Returns a fallback message if no response is found

This function simplifies the process of running queries and ensures trace generation during execution.
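
To see the event-scanning logic in isolation, the sketch below runs it against small stand-in classes instead of a live agent. `FakePart`, `FakeContent`, and `FakeEvent` are illustrative stand-ins that mimic only the attributes the helper reads (`is_final_response()`, `content.parts`, `text`); they are not Google ADK types. Unlike the notebook's helper, this variant joins every text part of the final event rather than returning only the first one.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class FakePart:
    text: Optional[str] = None


@dataclass
class FakeContent:
    parts: List[FakePart] = field(default_factory=list)


@dataclass
class FakeEvent:
    content: Optional[FakeContent] = None
    final: bool = False

    def is_final_response(self) -> bool:
        return self.final


def extract_final_text(events: Iterable[FakeEvent], fallback: str = "No response from agent.") -> str:
    """Return the joined text of the first final event that carries text, else the fallback."""
    for event in events:
        if event.is_final_response() and event.content and event.content.parts:
            texts = [part.text for part in event.content.parts if part.text]
            if texts:
                return "".join(texts)
    return fallback


events = [
    FakeEvent(content=FakeContent([FakePart("searching flights...")])),
    FakeEvent(content=FakeContent([FakePart("Day 1: "), FakePart("Arrive in Singapore")]), final=True),
]
print(extract_final_text(events))  # Day 1: Arrive in Singapore
```

Skipping parts without text avoids returning `None` when the first part of a final event is not a text part.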

In [None]:
def run_single_turn(runner: Runner, query: str, session_id: str, user_id: str) -> str:
    """
    Execute a single-turn query with the agent and return the final response.
    The agent execution is traced automatically.
    """
    content = types.Content(role="user", parts=[types.Part(text=query)])
    events = runner.run(user_id=user_id, session_id=session_id, new_message=content)

    # Extract the final response
    for event in events:
        if event.is_final_response() and event.content and event.content.parts:
            return event.content.parts[0].text
    return "No response from agent."

print("✓ Helper function defined successfully")

## 6. Define Query

### Purpose
Define the sample travel planning query that will be executed by the Travel Planner Agent to generate OpenTelemetry traces for agent goal accuracy evaluation.

### Key Components
- **`query`**: A sample travel planning request that will be processed by the agent
  - This query will trigger the agent to generate a comprehensive travel plan with multiple components (flights, hotels, itinerary, costs)
  - The query will test the agent's ability to accomplish the user's true goal (planning a complete trip)
  - The execution will be automatically traced to capture goal accomplishment information
  - The agent's response will be evaluated for goal accuracy using the AgentGoalAccuracy metric to assess whether the agent successfully accomplished the user's true goal
  - Example: "Plan a 5-day trip to Singapore from Bangalore in March, including flight options, hotel recommendations, daily itinerary, and estimated total cost."

The query can be modified to test different travel planning scenarios and evaluate goal accuracy for various types of travel-related requests.
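
One way to vary the scenario systematically is to generate queries from a template. The sketch below is illustrative only: `QUERY_TEMPLATE` and `build_queries` are hypothetical names, and the extra destination is an arbitrary example rather than a tested scenario.

```python
from itertools import product

# Template derived from the sample query used in this notebook.
QUERY_TEMPLATE = (
    "Plan a {days}-day trip to {destination} from {origin} in {month}, including flight options, "
    "hotel recommendations, daily itinerary, and estimated total cost."
)


def build_queries(origins, destinations, days_options, month):
    """Return one query per (origin, destination, days) combination, skipping same-city trips."""
    return [
        QUERY_TEMPLATE.format(days=days, destination=destination, origin=origin, month=month)
        for origin, destination, days in product(origins, destinations, days_options)
        if origin != destination
    ]


queries = build_queries(["Bangalore"], ["Singapore", "Tokyo"], [5], "March")
for q in queries:
    print(q)
```

Each generated query can then be run in its own session, so that every scenario produces a separate trace to evaluate.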


In [None]:
# Define the query that will be executed to generate traces
query = "Plan a 5-day trip to Singapore from Bangalore in March, including flight options, hotel recommendations, daily itinerary, and estimated total cost."

print(f"✓ Query defined: {query}")

## 7. Run the Query and Get Trace ID

### Purpose
Execute a sample travel planning query with the Travel Planner Agent to generate OpenTelemetry traces that contain goal accomplishment data for evaluation.

### Process
1. **Create Session**: Initialize a new session for the agent interaction
2. **Execute Query**: Run a sample travel planning query (e.g., "Plan a 5-day trip to Singapore from Bangalore in March, including flight options, hotel recommendations, daily itinerary, and estimated total cost.") through the agent
3. **Retrieve Trace IDs**: Extract the generated trace IDs from the agent client
4. **Display Results**: Print the agent response and trace ID for verification

The execution automatically generates OpenTelemetry traces that record goal accomplishment information, which will be used for agent goal accuracy evaluation.

In [None]:
# Create a new session
session = await runner.session_service.create_session(
    app_name=app_name,
    user_id=user_id,
)
print(f"Session created: {session.id}")

response = run_single_turn(
    runner=runner,
    query=query,
    session_id=session.id,
    user_id=user_id
)

# Retrieve the generated trace IDs
trace_ids = agent_client.get_tracer_ids()
print("Agent Response:")
print(response[:200] + "..." if len(response) > 200 else response)
print(f"Found {len(trace_ids)} trace(s). First trace ID: {trace_ids[0] if trace_ids else 'N/A'}")

print(f"✓ Query execution completed successfully")

## 8. Agent Goal Accuracy Evaluation with Flotorch Eval

### Purpose
Initialize the `AgentEvaluator`, fetch the OpenTelemetry trace, and run the `AgentGoalAccuracy` metric to evaluate goal accomplishment. The evaluation metric **agent_goal_accuracy** provides detailed assessment of whether the Travel Planner Agent successfully accomplished the user's true goal.

### Key Components
1. **AgentGoalAccuracy**: Initializes the agent goal accuracy metric that will analyze trace data
2. **AgentEvaluator** (`client`):
   - Connects to the Flotorch gateway using API credentials
   - Configured with a default evaluator model
   - Provides methods to fetch and evaluate traces
3. **Trace Fetching**: Retrieves the complete trace data using the trace ID generated during agent execution

The fetched trace contains detailed information about goal accomplishment, which will be analyzed by the AgentGoalAccuracy metric to compute the agent_goal_accuracy score.
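
Trace export is often asynchronous, so a trace may not be retrievable the instant the agent run finishes. Whether that applies to your Flotorch environment is an assumption here, as is the idea that a missing trace shows up as an empty result or an exception. Under those assumptions, a generic retry wrapper could look like this (`fetch_with_retry` is a hypothetical helper):

```python
import time
from typing import Any, Callable, Optional


def fetch_with_retry(
    fetch: Callable[[str], Any],
    trace_id: str,
    attempts: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Any]:
    """Call `fetch(trace_id)` until it returns a truthy value or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            result = fetch(trace_id)
        except Exception as exc:  # Broad on purpose: the client's exception types are not known here.
            print(f"Attempt {attempt}/{attempts} failed: {exc}")
            result = None
        if result:
            return result
        if attempt < attempts:
            sleep(delay_seconds)  # Wait before the next attempt.
    return None
```

In this notebook it would be used as `traces = fetch_with_retry(client.fetch_traces, trace_ids[0])`, with the function returning `None` if the trace never becomes available.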

In [None]:
# Initialize the AgentGoalAccuracy metric
metrics = [AgentGoalAccuracy()]

# Initialize the AgentEvaluator client
client = AgentEvaluator(
    api_key=FLOTORCH_API_KEY,
    base_url=FLOTORCH_BASE_URL,
    default_evaluator=default_evaluator
)

traces = None

if trace_ids:
    # Fetch the trace data from the Flotorch gateway
    traces = client.fetch_traces(trace_ids[0])
    print(f"✓ Trace fetched successfully")
else:
    print("✗ No trace IDs found to fetch.")

## 9. Run Evaluation

### Purpose
Execute the agent goal accuracy evaluation by processing the fetched OpenTelemetry trace using the AgentGoalAccuracy metric to assess goal accomplishment.

### Process
- Calls `client.evaluate()` with the trace data and AgentGoalAccuracy metric
- The evaluator processes the trace to analyze goal accomplishment
- Computes the **agent_goal_accuracy** metric which includes:
  - Accuracy score (0.0 to 1.0) indicating whether the agent successfully accomplished the user's true goal
  - Detailed evaluation explanation including:
    - User goal summary
    - Agent perception summary
    - Execution path analysis
    - Final outcome evaluation
    - Overall conclusion
- Returns evaluation results with trajectory ID and metric scores

This step generates the agent goal accuracy analysis that will be displayed in the next section.

In [None]:
if 'traces' in locals() and traces:
    # Evaluate the trace using the AgentGoalAccuracy metric
    results = await client.evaluate(
        trace=traces,
        metrics=metrics
    )

    print("✓ Evaluation completed successfully!")
else:
    print("Cannot evaluate: No traces were available.")

## 10. Display and Interpret Results

### Purpose
Define helper functions to format and display the evaluation output clearly, showing the agent_goal_accuracy metric results in a readable format.

### Functionality
The `display_metrics` function:
- Extracts the `agent_goal_accuracy` metric from evaluation results
- Formats the accuracy score and evaluation details
- Creates a structured display showing:
  - Goal Accuracy Score (0.0 to 1.0)
  - Detailed evaluation explanation including user goal summary, agent perception, execution path analysis, and overall conclusion
- Uses pandas DataFrame with styled formatting for clean presentation

This function provides a user-friendly way to visualize agent goal accuracy metrics.
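
The display helper expects the metric's details to arrive as a JSON string. If the payload shape varies (for example, an already-parsed dictionary or plain text), a defensive parser avoids a `json.JSONDecodeError` at display time. The sketch below is a hypothetical helper, and the alternative payload shapes it handles are assumptions, not documented Flotorch Eval behavior.

```python
import json
from typing import Any, Dict


def parse_details(raw: Any) -> Dict[str, Any]:
    """Normalize a details payload (dict, JSON string, or plain text) into a dict."""
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, str):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            # Not JSON: keep the text as-is under a single key.
            return {"details": raw}
        return parsed if isinstance(parsed, dict) else {"details": parsed}
    return {} if raw is None else {"details": raw}


print(parse_details('{"overall_conclusion": "Goal met"}'))  # {'overall_conclusion': 'Goal met'}
print(parse_details("free-form explanation"))               # {'details': 'free-form explanation'}
```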

In [None]:
import json
import pandas as pd
from IPython.display import display

def display_metrics(result):
    """
    Display only the 'agent_goal_accuracy' metric in a clean multiline table.
    """
    print(f"Trajectory ID: {result.trajectory_id}")
    print(f"Timestamp    : {result.timestamp}\n")

    # Find the metric
    metric = next((m for m in result.scores
                   if m.name == "agent_goal_accuracy"), None)
    if not metric:
        print("Metric 'agent_goal_accuracy' not found.")
        return

    # The details field contains a JSON string under key "details"
    raw_details = metric.details.get("details", "{}")
    parsed = json.loads(raw_details)

    # Convert dict to multiline text
    details_str = "\n".join(f"{k}: {v}" for k, v in parsed.items())

    # DataFrame for display
    df = pd.DataFrame([{
        "Metric": metric.name,
        "Score": f"{metric.score:.2f}",
        "Details": details_str
    }])

    # Display with multiline support
    display(
        df.style.set_properties(
            subset=["Details"],
            **{"white-space": "pre-wrap", "text-align": "left"}
        )
    )

print("✓ Display metrics function defined successfully")


## 11. View Agent Goal Accuracy Results

### Purpose
Display the agent goal accuracy evaluation results in a formatted table showing the complete assessment for the Travel Planner Agent.

### Output
The displayed table includes:
- **Metric**: The evaluation metric name (agent_goal_accuracy)
- **Score**: The goal accuracy score (0.0 to 1.0)
- **Details**: Comprehensive evaluation showing:
  - User goal summary
  - Agent perception summary
  - Execution path analysis
  - Final outcome evaluation
  - Overall conclusion

This visualization helps identify goal accomplishment issues and optimize the agent's travel planning capabilities.

In [None]:
if 'results' in locals():
    display_metrics(results)
else:
    print("No results object found. Please run Sections 7 through 9 first.")

### Interpreting the Agent Goal Accuracy Results

The **agent_goal_accuracy** metric is a vital tool for quality monitoring of the Travel Planner Agent:

* **Accuracy Score (0.0 to 1.0)**: Indicates whether the agent successfully accomplished the user's true goal:
    * **1.0**: Perfect goal accomplishment - agent fully understood and completed the user's goal
    * **0.5 to below 1.0**: Good goal accomplishment with minor gaps or incomplete aspects
    * **Below 0.5**: Poor goal accomplishment - agent failed to understand or complete the user's goal
* **Evaluation Details**: Provides a detailed explanation of:
    * **User Goal Summary**: What the user actually wanted to accomplish
    * **Agent Perception Summary**: How the agent understood the task
    * **Execution Path Analysis**: Whether the agent's actions aligned with the goal
    * **Final Outcome Evaluation**: Whether the final output meets the user's requirements
    * **Overall Conclusion**: Comprehensive assessment of goal accomplishment

For a Travel Planner Agent, understanding agent goal accuracy helps identify:
- **Goal understanding gaps**: If the agent misinterprets travel planning requirements (e.g., missing flight options, hotel recommendations, or itinerary details)
- **Completeness issues**: If the agent provides incomplete travel plans (e.g., missing pricing, itinerary gaps, or incomplete recommendations)
- **Quality concerns**: If the agent's travel plans don't meet user expectations (e.g., unrealistic pricing, poor itinerary structure, or missing critical information)
- **Overall effectiveness**: Monitor goal accomplishment to ensure the agent delivers comprehensive and accurate travel plans with real-time pricing and best options
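
For dashboards or logs, it can be convenient to turn the numeric score into one of the bands described above. The sketch below is a hypothetical helper. Its cutoffs (1.0 and 0.5) follow the bands in this section, but how the evaluator itself treats boundary values is not documented here, so treat the thresholds as an assumption.

```python
def describe_goal_accuracy(score: float) -> str:
    """Map an agent_goal_accuracy score in [0.0, 1.0] to a coarse interpretation band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    if score >= 1.0:
        return "perfect"
    if score >= 0.5:
        return "good, with minor gaps"
    return "poor"


for example_score in (1.0, 0.7, 0.2):  # Illustrative values, not evaluation results.
    print(example_score, "->", describe_goal_accuracy(example_score))
```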

## 12. Summary of Agent Goal Accuracy Evaluation Notebook

This notebook demonstrates a methodology for evaluating the goal accuracy of a **Flotorch ADK Agent** (configured as a **Travel Planner Agent** that generates travel plans with real-time pricing and best options) using the **Flotorch Eval framework**.

**Use Case**: Travel Planner Agent - Generates travel plans with real-time pricing and best options.

**Evaluation Metric**: agent_goal_accuracy

## Core Process

### 1. Setup and Instrumentation
- Configure a `FlotorchADKAgent` for travel planning tasks.
- Enable **OpenTelemetry Tracing** via the `tracer_config`.
- This instrumentation allows detailed capture of the complete agent trajectory and goal accomplishment.

### 2. Execution and Data Generation
- Run a sample travel planning query through the agent using the **Runner**.
- This automatically generates an **Agent Trajectory** in the form of OpenTelemetry traces.
- The trace records the complete execution path, including:
  - Goal understanding and interpretation
  - Travel plan generation and recommendations
  - Pricing and option selection
  - Step-by-step agent operations

### 3. Evaluation
- Use the `AgentEvaluator` client along with the specialized **AgentGoalAccuracy** metric from `flotorch-eval`.
- The evaluator processes the trace data to compute goal accuracy statistics using the **agent_goal_accuracy** metric.

### 4. Analysis
- The notebook displays a thorough goal accuracy assessment, including:
  - **Accuracy Score** (0.0 to 1.0)
  - **Evaluation Details** explaining goal accomplishment including:
    - User goal summary
    - Agent perception summary
    - Execution path analysis
    - Final outcome evaluation
    - Overall conclusion

## Purpose and Benefits

This evaluation provides **actionable quality metrics** that help developers:

- Identify goal accomplishment issues in the Travel Planner Agent  
- Optimize travel plan generation to better meet user requirements  
- Track quality trends over time  
- Ensure the Travel Planner Agent delivers **comprehensive and accurate travel plans** with real-time pricing and best options
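
Tracking quality over time requires aggregating scores from several evaluation runs. The following sketch shows one simple aggregation. `summarize_scores` is a hypothetical helper, the 0.5 pass threshold is an assumption, and the sample scores are made-up placeholders rather than real evaluation results.

```python
from statistics import mean
from typing import Dict, Sequence


def summarize_scores(scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Summarize agent_goal_accuracy scores collected across multiple runs."""
    if not scores:
        raise ValueError("scores must not be empty")
    passed = sum(1 for score in scores if score >= threshold)
    return {
        "runs": len(scores),
        "mean": mean(scores),
        "min": min(scores),
        "pass_rate": passed / len(scores),
    }


placeholder_scores = [1.0, 0.5, 0.0]  # Placeholder values for illustration only.
print(summarize_scores(placeholder_scores))
```

Storing one such summary per agent version or per day gives a lightweight baseline for spotting regressions.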