[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://drive.google.com/drive/folders/1IrwoNrb3AWLAhAqjlAkJNYa39p9eT9ui?usp=sharing)

# Flotorch Agent Tool Call Accuracy Evaluation (Live Weather Agent Use Case)

This notebook demonstrates how to measure and analyze the **tool call accuracy** of a **Flotorch ADK agent** (configured as a **Live Weather Agent** that delivers real-time weather forecasts and data via API integration) using the **Flotorch Eval** framework.

The evaluation relies on **OpenTelemetry Traces** generated during the agent's run to assess the accuracy and appropriateness of tool usage decisions.

---

## Key Concepts

* **Live Weather Agent**: An agent designed to deliver real-time weather forecasts and data via API integration.
* **OpenTelemetry Traces**: Detailed records of the agent's execution steps (spans) used to analyze tool call decisions and accuracy.
* **ToolCallAccuracy**: A Flotorch Eval metric that evaluates the accuracy and appropriateness of tool usage decisions. The evaluation metric used is **toolcall_accuracy**.

---
### Architecture Overview

![Workflow Diagram](diagrams/04_ToolCallAccuracy_Workflow_Diagram.drawio.png)
*Figure 2: Detailed workflow diagram showing the step-by-step process of tool call accuracy evaluation from agent execution through trace collection to metric computation.*

---


## Requirements

* Flotorch account with configured models.
* Valid Flotorch API key and gateway base URL.
* Agent configured with OpenTelemetry tracing enabled.
* External API access for weather data (e.g., Open-Meteo API).

---

## Agent Setup in Flotorch Console

**Important**: Before running this notebook, you need to create an agent in the Flotorch Console. This section provides step-by-step instructions on how to set up the agent.

### Step 1: Access Flotorch Console

1. **Log in to Flotorch Console**:
   - Navigate to your Flotorch Console (e.g., `https://dev-console.flotorch.cloud`)
   - Ensure you have the necessary permissions to create agents

2. **Navigate to Agents Section**:
   - Click on **"Agents"** in the left sidebar
   - You should see the "Agent Builder" option selected

### Step 2: Create New Agent

1. **Click "Create FloTorch Agent"**:
   - Look for the blue **"+ Create FloTorch Agent"** button in the top right corner
   - Click it to start creating a new agent

2. **Agent Configuration**:
   - **Agent Name**: Choose a unique name for your agent (e.g., `toolcall-agent`)
     - **Important**: The name should only contain alphanumeric characters and dashes (a-z, A-Z, 0-9, -)
     - **Note**: Copy this agent name - you'll need to use it in the `agent_name` variable later
   - **Description** (Optional): Add a description if desired
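
The naming rule above can be checked locally before you type the name into the console. The snippet below is a minimal sketch; `is_valid_agent_name` is a hypothetical helper and not part of the Flotorch SDK, and it only encodes the character rule quoted above (the console may enforce further constraints such as length or uniqueness).

```python
import re

# Pattern mirroring the console rule quoted above: letters, digits, and dashes only.
_AGENT_NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def is_valid_agent_name(name: str) -> bool:
    """Return True if `name` uses only a-z, A-Z, 0-9, and '-' (hypothetical helper)."""
    return bool(_AGENT_NAME_PATTERN.fullmatch(name))


print(is_valid_agent_name("toolcall-agent"))  # dashes are allowed
print(is_valid_agent_name("toolcall_agent"))  # underscores are not
```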

### Step 3: Configure Agent Details

After creating the agent, you'll be directed to the agent configuration page. Configure the following:

#### Required Configuration:

1. **Model** (`* Model`):
   - **Required**: Select a model from the available options
   - Example: `gpt-model` or any available model from your Flotorch gateway
   - Click the edit icon to configure

2. **Agent Details** (`* Agent Details`):
   - **Required**: Configure agent details
   - **System Prompt**: Copy and paste the following system prompt:

You are a helpful assistant. You need to call the get_weather tool when the user asks about the weather.

Available tools:
1. get_weather


   - **Goal**: Copy and paste the following goal:
   
You are a helpful assistant. You need to call the get_weather tool when the user asks about the weather.


#### Optional Configuration:

1. **Tools**:
   - Tools will be added programmatically via the notebook (the tool is defined in Section 4 and attached to the agent in Section 5)
   - You can leave this as "Not Configured" in the console

2. **Input Schema**:
   - Optional: Leave as "Not Configured" for this use case

3. **Output Schema**:
   - Optional: Leave as "Not Configured" for this use case

### Step 4: Publish the Agent

1. **Review Configuration**:
   - Ensure the Model and Agent Details are configured correctly
   - Verify the System Prompt and Goal are set

2. **Publish Agent**:
   - After configuration, click **"Publish"** or **"Make a revision"** to publish the agent
   - Once published, the agent will have a version number (e.g., v1)

3. **Note the Agent Name**:
   - **Important**: Copy the exact agent name you used when creating the agent
   - You will need to replace `<your_agent_name>` in the `agent_name` variable in Section 2.1 (Global Provider Models and Agent Configuration)

### Step 5: Update Notebook Configuration

1. **Update Agent Name**:
   - Navigate to Section 2.1 in this notebook
   - Find the `agent_name` variable
   - Replace `<your_agent_name>` with the exact agent name you created in the console

**Example**:
- If you created an agent named `toolcall-agent` in the console
- Set `agent_name = "toolcall-agent"` in the notebook

### Summary of Required vs Optional Settings

| Setting | Required/Optional | Value |
|---------|------------------|-------|
| **Agent Name** | **Required** | Choose a unique name (copy it for notebook) |
| **Model** | **Required** | Select from available models |
| **System Prompt** | **Required** | Use the system prompt provided above |
| **Goal** | **Required** | Use the goal provided above |
| **Tools** | **Optional** | Will be added via notebook code |
| **Input Schema** | **Optional** | Can leave as "Not Configured" |
| **Output Schema** | **Optional** | Can leave as "Not Configured" |

**Note**: The `get_weather` tool is added to the agent programmatically in the notebook code, so you don't need to configure it manually in the console.

---


## 1. Setup and Installation

### Purpose
Install the Flotorch Eval framework and the Flotorch ADK package required for tool call accuracy evaluation.

### Key Components
- **`flotorch-eval`**: Flotorch evaluation framework with all dependencies for tool call accuracy metrics
- **`flotorch[adk]`**: Flotorch SDK with the ADK extra, used to create and run the agent


In [None]:
# Install the Flotorch Eval and Flotorch ADK packages
# flotorch-eval: evaluation framework; flotorch[adk]: Flotorch SDK with the ADK extra

%pip install flotorch-eval==2.0.0b1 flotorch[adk]==3.1.0b1

## 2. Authentication and Credentials

### Purpose
Configure your Flotorch API credentials and gateway URL for authentication.

### Key Components
This cell configures the essential authentication and connection parameters:

**Authentication Parameters**:

| Parameter | Description | Example |
|-----------|-------------|---------|
| `FLOTORCH_API_KEY` | Your API authentication key (found in your Flotorch Console). Securely entered using `getpass` to avoid displaying in the notebook | `sk_...` |
| `FLOTORCH_BASE_URL` | Your Flotorch gateway endpoint URL | `https://dev-console.flotorch.cloud` |

**Note**: Use secure credential management in production environments.
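
As one possible way to follow the note above outside an interactive session, the sketch below reads a credential from an environment variable and only falls back to a hidden prompt when the variable is missing. `load_credential` is a hypothetical helper, and using environment variables named `FLOTORCH_API_KEY` or `FLOTORCH_BASE_URL` is an assumption of this example, not a convention confirmed by the Flotorch SDK.

```python
import getpass
import os
from typing import Callable, Mapping


def load_credential(
    name: str,
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return the credential from `env` if present, otherwise prompt for it."""
    value = env.get(name, "").strip()
    if not value:
        # Fall back to a prompt that does not echo the typed value.
        value = prompt(f"Enter {name}: ").strip()
    if not value:
        raise ValueError(f"{name} was not provided")
    return value


# Demonstration with an injected mapping so nothing is prompted for.
demo_env = {"FLOTORCH_BASE_URL": "https://dev-console.flotorch.cloud"}
print(load_credential("FLOTORCH_BASE_URL", env=demo_env))
```

In a real run you would call `load_credential("FLOTORCH_API_KEY")` without the `env` argument so that the process environment is consulted first.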


In [None]:
import getpass  # Prompt for the API key without echoing it in the notebook

# Authentication for Flotorch access
FLOTORCH_API_KEY = getpass.getpass("Paste your API key here: ").strip()
if FLOTORCH_API_KEY:
    print("✓ FLOTORCH_API_KEY set successfully")
else:
    print("✗ FLOTORCH_API_KEY not set")

# Flotorch gateway endpoint, e.g. https://dev-console.flotorch.cloud
FLOTORCH_BASE_URL = input("Paste your Flotorch Base URL here: ").strip()
print(f"✓ FLOTORCH_BASE_URL set: {FLOTORCH_BASE_URL}")

if FLOTORCH_API_KEY and FLOTORCH_BASE_URL:
    print("✓ All credentials configured successfully!")
else:
    print("✗ Credentials are incomplete; re-run this cell before continuing.")

### 2.1. Global Provider Models and Agent Configuration

### Purpose
Define available models from the Flotorch gateway and configure agent-specific parameters.

### Key Components

**Global Provider Models**: These are the available models from the Flotorch gateway that can be used for evaluation and agent operations:

| Model Variable | Model Name | Description |
|----------------|------------|-------------|
| `MODEL_CLAUDE_HAIKU` | `flotorch/flotorch-claude-haiku-4-5` | Claude Haiku model via Flotorch gateway |
| `MODEL_CLAUDE_SONNET` | `flotorch/flotorch-claude-sonnet-3-5-v2` | Claude Sonnet model via Flotorch gateway |
| `MODEL_AWS_NOVA_PRO` | `flotorch/flotorch-aws-nova-pro` | AWS Nova Pro model via Flotorch gateway |
| `MODEL_AWS_NOVA_LITE` | `flotorch/flotorch-aws-nova-lite` | AWS Nova Lite model via Flotorch gateway |
| `MODEL_AWS_NOVA_MICRO` | `flotorch/flotorch-aws-nova-micro` | AWS Nova Micro model via Flotorch gateway |

**Agent Configuration Parameters**:

| Parameter | Description | Example |
|-----------|-------------|---------|
| `default_evaluator` | The LLM model used for evaluation (can use MODEL_* variables above) | `MODEL_CLAUDE_SONNET` or `flotorch/flotorch-model` |
| `agent_name` | The name of your Flotorch ADK agent | `toolcall-agent` |
| `app_name` | The application name identifier | `agent-evaluation-app-name_04` |
| `user_id` | The user identifier | `agent-evaluation-user-04` |


In [None]:
# ============================================================================
# Global Provider Models (Flotorch Gateway Models)
# ============================================================================
# These models are available from the Flotorch gateway and can be used
# for evaluation, agent operations, and other tasks.

MODEL_CLAUDE_HAIKU = "flotorch/flotorch-claude-haiku-4-5"
MODEL_CLAUDE_SONNET = "flotorch/flotorch-claude-sonnet-3-5-v2"
MODEL_AWS_NOVA_PRO = "flotorch/flotorch-aws-nova-pro"
MODEL_AWS_NOVA_LITE = "flotorch/flotorch-aws-nova-lite"
MODEL_AWS_NOVA_MICRO = "flotorch/flotorch-aws-nova-micro"

print("✓ Global provider models defined")

# The LLM model used for evaluation.
# Can be modified to use any MODEL_* constant above (e.g., MODEL_CLAUDE_SONNET, MODEL_AWS_NOVA_PRO)
# You can use your own models from Flotorch Console as well
default_evaluator = MODEL_CLAUDE_HAIKU

agent_name = "<your_agent_name>"  # Name of your Flotorch ADK agent, e.g. toolcall-agent
app_name = "<your_app_name>"      # Application name identifier, e.g. agent-evaluation-app-name_04
user_id = "<your_user_id>"        # User identifier, e.g. agent-evaluation-user-04

print("✓ Agent configuration parameters defined")


## 3. Import Required Libraries

### Purpose
Import all required components for evaluating the Live Weather Agent tool call accuracy using Flotorch Eval.

### Key Components
- **`AgentEvaluator`**: Core client for agent evaluation orchestration and trace fetching
- **`ToolCallAccuracy`**: Flotorch Eval metric that evaluates the accuracy and appropriateness of tool usage decisions
- **`FlotorchADKAgent`**: Creates and configures Flotorch ADK agents with custom tools and tracing
- **`FlotorchADKSession`**: Manages agent sessions for multi-turn conversations
- **`Runner`**: Executes agent queries and coordinates the agent execution flow
- **`FunctionTool`**: Wraps Python functions as tools that can be used by the agent
- **`types`**: Google ADK types for creating message content and handling agent events
- **`pandas`**: Data manipulation and display for formatted results tables
- **`display`**: IPython display utility for rendering formatted outputs

In [None]:
# Required imports
# Flotorch Eval components
from flotorch_eval.agent_eval.core.client import AgentEvaluator
from flotorch_eval.agent_eval.metrics.llm_evaluators import ToolCallAccuracy

# Flotorch ADK components
from flotorch.adk.agent import FlotorchADKAgent
from flotorch.adk.sessions import FlotorchADKSession

# Google ADK components
from google.adk.runners import Runner
from google.adk.tools import FunctionTool
from google.genai import types

# Utilities
import pandas as pd
from IPython.display import display

print("✓ Imported necessary libraries successfully")

## 4. Live Weather Agent Setup

### Purpose
Define the custom `get_weather` tool that the Live Weather Agent calls to fetch real-time weather data. The agent, session service, and runner that use this tool are initialized in Section 5 with OpenTelemetry tracing enabled.

### Custom Tool: Weather API Integration

The Live Weather Agent uses a custom tool (`get_weather`) that integrates with external APIs to retrieve real-time weather forecasts and data. This tool:
- Accepts a city name as input
- Uses the Open-Meteo Geocoding API to convert the city name to latitude/longitude
- Fetches real-time weather data from the Open-Meteo Weather API
- Returns structured weather information including temperature, wind speed, and humidity
- Raises a `ValueError` when the city cannot be found

The tool is wrapped as a `FunctionTool` that can be used by the agent to deliver real-time weather forecasts and data via API integration.

In [None]:
import requests
from typing import Dict, Any

def get_weather(city_name: str) -> Dict[str, Any]:
    """Return latitude, longitude, and current weather for a given city name
       using Open-Meteo's free Geocoding + Weather APIs.

    Args:
        city_name: The name of the city to get weather for

    Returns:
        A dictionary containing city information and current weather data
    """

    # --- Step 1: Geocode city name to lat/lon ---
    geo_url = "https://geocoding-api.open-meteo.com/v1/search"
    geo_params = {
        "name": city_name,
        "count": 1,
        "language": "en",
        "format": "json"
    }

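    geo_response = requests.get(geo_url, params=geo_params, timeout=10)
    geo_response.raise_for_status()  # Fail early on HTTP errors
    geo_res = geo_response.json()

    # A missing or empty "results" list means the city was not matched
    if not geo_res.get("results"):
        raise ValueError(f"City '{city_name}' not found")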

    city = geo_res["results"][0]
    lat = city["latitude"]
    lon = city["longitude"]

    # --- Step 2: Fetch real-time weather using lat/lon ---
    weather_url = "https://api.open-meteo.com/v1/forecast"
    weather_params = {
        "latitude": lat,
        "longitude": lon,
        "current": "temperature_2m,wind_speed_10m,relative_humidity_2m"
    }

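    weather_response = requests.get(weather_url, params=weather_params, timeout=10)
    weather_response.raise_for_status()  # Fail early on HTTP errors
    weather_res = weather_response.json()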
    current_weather = weather_res.get("current", {})

    # --- Return neatly structured result ---
    result = {
        "city": city["name"],
        "country": city.get("country"),
        "latitude": lat,
        "longitude": lon,
        "weather": current_weather
    }

    return result


# Register the custom tool - FunctionTool will automatically infer the schema from the function
tools = [FunctionTool(get_weather)]

print("✓ Weather tool defined and registered successfully")
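
The tool above raises an exception when a city is not found or a request fails. If you would rather hand the agent a structured error message, one option is to wrap the tool so that failures are returned as a dictionary. The sketch below is illustrative only: `with_error_payload` and `fake_weather` are hypothetical names, the `status`/`error` keys are an assumed shape, and whether `FunctionTool` infers the schema correctly from a wrapped function has not been verified here.

```python
import functools
from typing import Any, Callable, Dict


def with_error_payload(tool: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, Any]]:
    """Wrap a tool so failures come back as a dictionary instead of an exception."""

    @functools.wraps(tool)  # keep the name and docstring of the wrapped tool
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            return tool(*args, **kwargs)
        except Exception as exc:  # broad on purpose: network errors, bad JSON, unknown city
            return {"status": "error", "error": f"{type(exc).__name__}: {exc}"}

    return wrapper


# Stand-in tool used only to demonstrate the wrapper without network access.
@with_error_payload
def fake_weather(city_name: str) -> Dict[str, Any]:
    """Pretend weather lookup that only knows one city."""
    if city_name != "Hyderabad":
        raise ValueError(f"City '{city_name}' not found")
    return {"city": city_name, "weather": {}}


print(fake_weather("Hyderabad"))
print(fake_weather("Atlantis"))
```

Returning an error payload lets the model explain the failure to the user, at the cost of hiding the exception from the notebook; which behavior is preferable depends on how you want failed tool calls to appear in the trace.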


## 5. Agent and Runner Initialization

### Purpose
Set up the Flotorch ADK Agent and Runner with OpenTelemetry tracing enabled to capture detailed execution data for tool call accuracy evaluation.

### Key Components
1. **FlotorchADKAgent** (`agent_client`):
   - Initializes the agent with custom weather API tools
   - Configures `tracer_config` with `enabled: True` and `sampling_rate: 1` to capture 100% of traces
   - Essential for evaluation as traces contain tool call decision information
2. **FlotorchADKSession** (`session_service`): Manages agent sessions for multi-turn conversations
3. **Runner** (`runner`): Executes agent queries and coordinates the agent execution flow

These components work together to run the Live Weather Agent and generate OpenTelemetry traces for tool call accuracy analysis.

In [None]:
# Initialize Flotorch ADK Agent with tracing enabled
agent_client = FlotorchADKAgent(
    agent_name=agent_name,
    custom_tools=tools,
    base_url=FLOTORCH_BASE_URL,
    api_key=FLOTORCH_API_KEY,
    tracer_config={
        "enabled": True,                                                   # Enable tracing for toolcall accuracy measurement
        "endpoint": "https://dev-observability.flotorch.cloud/v1/traces",  # Dev observability OTLP HTTP endpoint (used by QA)
        "sampling_rate": 1                                                 # Sample 100% of traces
    }
)
agent = agent_client.get_agent()

# Initialize session service
session_service = FlotorchADKSession(
    api_key=FLOTORCH_API_KEY,
    base_url=FLOTORCH_BASE_URL,
)

# Create the ADK Runner to execute agent queries
runner = Runner(
    agent=agent,
    app_name=app_name,
    session_service=session_service
)

print("✓ Agent, session service, and runner initialized successfully")

## 6. Helper Function for Running a Query

### Purpose
Define a helper function that executes a single-turn query with the agent and extracts the final response. The agent execution is automatically traced for tool call accuracy evaluation.

### Functionality
The `run_single_turn` function:
- Accepts a `Runner`, query string, session ID, and user ID as parameters
- Creates a user message using Google ADK types
- Executes the query through the runner
- Iterates through events to find and return the final agent response
- Returns a fallback message if no response is found

This function simplifies the process of running queries and ensures trace generation during execution.

In [None]:
def run_single_turn(runner: Runner, query: str, session_id: str, user_id: str) -> str:
    """
    Execute a single-turn query with the agent and return the final response.
    The agent execution is traced automatically.
    """
    content = types.Content(role="user", parts=[types.Part(text=query)])
    events = runner.run(user_id=user_id, session_id=session_id, new_message=content)

    # Extract the final response
    for event in events:
        if event.is_final_response() and event.content and event.content.parts:
            return event.content.parts[0].text
    return "No response from agent."

print("✓ Helper function defined successfully")

## 7. Define Query

### Purpose
Define the sample weather query that will be executed by the Live Weather Agent to generate OpenTelemetry traces for tool call accuracy evaluation.

### Key Components
- **`query`**: A sample weather question that will be processed by the agent
  - This query will trigger the agent to use weather API tools to fetch real-time weather data
  - The query will test the agent's ability to make appropriate tool call decisions (e.g., selecting the correct weather API tool, passing appropriate parameters)
  - The execution will be automatically traced to capture tool call decisions and execution
  - The tool calls will be evaluated for accuracy and appropriateness using the ToolCallAccuracy metric
  - Example: "What is the weather in Hyderabad?"

The query can be modified to test different weather scenarios and evaluate tool call accuracy for various types of weather-related questions.
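
To vary the scenarios, it can help to keep a small list of queries together with whether a tool call is expected, including at least one query where the agent should not call `get_weather`. The sketch below is only an illustration: `QueryCase` and the example queries are hypothetical and are not used elsewhere in this notebook.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QueryCase:
    """One illustrative query plus whether a weather tool call is expected."""
    query: str
    expects_tool_call: bool


# Hypothetical examples; replace them with queries that matter for your agent.
QUERY_CASES: List[QueryCase] = [
    QueryCase("What is the weather in Hyderabad?", True),
    QueryCase("How windy is it in Chicago right now?", True),
    QueryCase("Compare the current temperature in Paris and Tokyo.", True),
    QueryCase("What is the capital of France?", False),
]

for case in QUERY_CASES:
    label = "tool call expected" if case.expects_tool_call else "no tool call expected"
    print(f"{case.query} -> {label}")
```

Each query can be assigned to `query` in turn and run through the remaining sections to compare the resulting scores.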


In [None]:
# Define the query that will be run to generate traces
query = "What is the weather in Hyderabad?"

print(f"✓ Query defined: {query}")

## 8. Run the Query and Get Trace ID

### Purpose
Execute a sample query with the Live Weather Agent to generate OpenTelemetry traces that contain tool call decision data for evaluation.

### Process
1. **Create Session**: Initialize a new session for the agent interaction
2. **Execute Query**: Run a sample query (e.g., "What is the weather in Hyderabad?") through the agent
3. **Retrieve Trace IDs**: Extract the generated trace IDs from the agent client
4. **Display Results**: Print the agent response and trace ID for verification

The execution automatically generates OpenTelemetry traces that record tool call decisions and execution information, which will be used for tool call accuracy evaluation.

In [None]:
# Create a new session
session = await runner.session_service.create_session(
    app_name=app_name,
    user_id=user_id,
)
print(f"Session created: {session.id}")

response = run_single_turn(
    runner=runner,
    query=query,
    session_id=session.id,
    user_id=user_id
)

# Retrieve the generated trace IDs
trace_ids = agent_client.get_tracer_ids()
print(trace_ids)
print("Agent Response:")
print(response[:200] + "..." if len(response) > 200 else response)
print(f"Found {len(trace_ids)} trace(s). First trace ID: {trace_ids[0] if trace_ids else 'N/A'}")

print("✓ Query execution completed successfully")

## 9. Tool Call Accuracy Evaluation with Flotorch Eval

### Purpose
Initialize the `AgentEvaluator`, fetch the OpenTelemetry trace, and run the `ToolCallAccuracy` metric to evaluate tool call decision accuracy. The evaluation metric **toolcall_accuracy** provides detailed assessment of tool usage decisions for the Live Weather Agent.

### Key Components
1. **ToolCallAccuracy**: Initializes the tool call accuracy metric that will analyze trace data
2. **AgentEvaluator** (`client`):
   - Connects to the Flotorch gateway using API credentials
   - Configured with a default evaluator model
   - Provides methods to fetch and evaluate traces
3. **Trace Fetching**: Retrieves the complete trace data using the trace ID generated during agent execution

The fetched trace contains detailed information about tool call decisions and execution, which will be analyzed by the ToolCallAccuracy metric to compute the toolcall_accuracy score.

In [None]:
# Initialize the ToolCallAccuracy metric
metrics = [ToolCallAccuracy()]

# Initialize the AgentEvaluator client
client = AgentEvaluator(
    api_key=FLOTORCH_API_KEY,
    base_url=FLOTORCH_BASE_URL,
    default_evaluator=default_evaluator
)

traces = None
if trace_ids:
    # Fetch the trace data from the Flotorch gateway
    traces = client.fetch_traces(trace_ids[0])
    print("✓ Trace fetched successfully")
else:
    print("✗ No trace IDs found to fetch.")

## 10. Run Evaluation

### Purpose
Execute the tool call accuracy evaluation by processing the fetched OpenTelemetry trace using the ToolCallAccuracy metric to assess tool usage decisions.

### Process
- Calls `client.evaluate()` with the trace data and ToolCallAccuracy metric
- The evaluator processes the trace to analyze tool call decisions and execution
- Computes the **toolcall_accuracy** metric which includes:
  - Accuracy score (0.0 to 1.0) indicating how appropriate tool usage was
  - Detailed evaluation explanation of tool call decisions
  - Assessment of tool selection, parameter accuracy, and timing
- Returns evaluation results with trajectory ID and metric scores

This step generates the tool call accuracy analysis that will be displayed in the next section.

In [None]:
if 'traces' in locals() and traces:
    # Evaluate the trace using the ToolCallAccuracy metric
    results = await client.evaluate(
        trace=traces,
        metrics=metrics
    )

    print("✓ Evaluation completed successfully!")
else:
    print("Cannot evaluate: No traces were available.")

## 11. Display and Interpret Results

### Purpose
Define helper functions to format and display the evaluation output clearly, showing the toolcall_accuracy metric results in a readable format.

### Functionality
The `display_metrics` function:
- Extracts the `toolcall_accuracy` metric from evaluation results
- Formats the accuracy score and evaluation details
- Creates a structured display showing:
  - Tool Call Accuracy Score (0.0 to 1.0)
  - Detailed evaluation explanation
- Uses pandas DataFrame with styled formatting for clean presentation

This function provides a user-friendly way to visualize tool call accuracy metrics.

In [None]:
def display_metrics(result):
    """
    Display tool call accuracy metrics in a formatted table.
    """
    # Find the toolcall_accuracy metric
    metric = next((m for m in result.scores if m.name == "toolcall_accuracy"), None)
    if not metric:
        print("No toolcall_accuracy metric found.")
        return

    # Extract metric details (fall back to an empty dict if none were returned)
    d = metric.details or {}

    # Get the details string (which contains the evaluation explanation)
    details_text = d.get("details", "No details available.")

    # Format the details string with better readability
    details = f"Score: {metric.score:.2f} / 1.0\n\nEvaluation Details:\n{details_text}"

    # Create DataFrame for display
    df = pd.DataFrame([{
        "Metric": metric.name.replace("_", " ").title(),
        "Score": f"{metric.score:.2f}",
        "Details": details
    }])

    # Display DataFrame with multiline support
    display(df.style.set_properties(
        subset=['Details'],
        **{'white-space': 'pre-wrap', 'text-align': 'left'}
    ))

print("✓ Display metrics function defined successfully")

## 12. View Tool Call Accuracy Results

### Purpose
Display the tool call accuracy evaluation results in a formatted table showing the complete assessment for the Live Weather Agent.

### Output
The displayed table includes:
- **Metric**: The evaluation metric name (toolcall_accuracy)
- **Score**: The tool call accuracy score (0.0 to 1.0)
- **Details**: Comprehensive evaluation showing:
  - Accuracy score out of 1.0
  - Detailed explanation of tool call decisions
  - Assessment of tool selection, parameter accuracy, and timing appropriateness

This visualization helps identify tool usage issues and optimize the agent's tool call decisions.

In [None]:
if 'results' in locals():
    display_metrics(results)
else:
    print("No results object found. Please run Sections 9 and 10 first.")

### Interpreting the Tool Call Accuracy Results

The **toolcall_accuracy** metric helps you monitor the quality of the Live Weather Agent's tool usage:

* **Accuracy Score (0.0 to 1.0)**: Indicates how appropriate and accurate the agent's tool usage decisions were:
    * **1.0**: Perfect tool call accuracy - tool selection, parameters, and timing were all appropriate
    * **0.5 to below 1.0**: Good accuracy with minor issues in tool selection or parameter formatting
    * **Below 0.5**: Poor accuracy - incorrect tool selection, wrong parameters, or inappropriate timing
* **Evaluation Details**: Provides a detailed explanation of:
    * **Tool Selection Appropriateness**: Whether the agent selected the correct tool for the task (e.g., using `get_weather` for weather queries)
    * **Parameter Accuracy and Formatting**: Whether tool parameters were correctly formatted and accurate (e.g., city name properly extracted and passed)
    * **Timing and Necessity**: Whether the tool call was made at the right time and was necessary for the task
    * **Overall Quality**: Comprehensive assessment of tool usage decision quality
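
The score bands above can be turned into a small labeling helper, for example when you want to flag low-scoring runs automatically. This is a minimal sketch: `interpret_score` is a hypothetical function, the thresholds (1.0 and 0.5) come from the list above, and the exact handling of boundary values is an assumption of this example.

```python
def interpret_score(score: float) -> str:
    """Map a toolcall_accuracy score to a coarse label (hypothetical helper)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    if score == 1.0:
        return "perfect"
    if score >= 0.5:
        return "good with minor issues"
    return "poor"


for example in (1.0, 0.75, 0.2):
    print(f"{example:.2f} -> {interpret_score(example)}")
```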

For a Live Weather Agent, understanding tool call accuracy helps identify:
- **Tool selection issues**: If the agent uses incorrect tools or fails to use tools when needed
- **Parameter formatting problems**: If city names or other parameters are incorrectly extracted or formatted
- **Timing optimization**: If tool calls are made unnecessarily or at inappropriate times
- **Overall reliability**: Monitor accuracy to ensure the agent delivers accurate and reliable real-time weather data via proper API integration

## 13. Summary of Agent Tool Call Accuracy Evaluation Notebook

This notebook demonstrates a workflow for evaluating the tool call accuracy of a **Flotorch ADK Agent** (configured as a **Live Weather Agent** that delivers real-time weather forecasts and data via API integration) using the **Flotorch Eval framework**.

**Use Case**: Live Weather Agent - Delivers real-time weather forecasts and data via API integration.

**Evaluation Metric**: toolcall_accuracy

## Core Process

### 1. Setup and Instrumentation
- Configure a `FlotorchADKAgent` with custom weather API tools (e.g., Open-Meteo weather API integration).
- Enable **OpenTelemetry Tracing** via the `tracer_config`.
- This instrumentation allows detailed capture of tool call decisions and execution information.

### 2. Execution and Data Generation
- Run a sample query through the agent using the **Runner**.
- This automatically generates an **Agent Trajectory** in the form of OpenTelemetry traces.
- The trace records tool call decisions and execution, including:
  - Tool selection decisions
  - Parameter accuracy
  - Tool execution results
  - Step-by-step agent operations

### 3. Evaluation
- Use the `AgentEvaluator` client along with the specialized **ToolCallAccuracy** metric from `flotorch-eval`.
- The evaluator processes the trace data to compute tool call accuracy statistics using the **toolcall_accuracy** metric.

### 4. Analysis
- The notebook displays a thorough tool call accuracy assessment, including:
  - **Accuracy Score** (0.0 to 1.0)
  - **Evaluation Details** explaining tool usage decisions
  - Assessment of tool selection, parameter accuracy, and timing

## Purpose and Benefits

This evaluation provides **actionable quality metrics** that help developers:

- Identify tool call accuracy issues in the Live Weather Agent  
- Optimize tool usage decisions, particularly API parameter formatting  
- Track quality trends over time  
- Ensure the Live Weather Agent delivers **accurate and reliable real-time weather data** via proper API integration
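
To track quality trends over time, one simple approach is to store the score from each evaluation run and summarize the collection periodically. The sketch below assumes you collect `(run label, score)` pairs yourself; `summarize_scores`, the record layout, and the sample values are hypothetical placeholders rather than output from this notebook.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Each record is (run label, toolcall_accuracy score).
ScoreRecord = Tuple[str, float]


def summarize_scores(records: List[ScoreRecord], threshold: float = 0.5) -> Dict[str, object]:
    """Summarize scores: count, mean, minimum, and runs scoring below `threshold`."""
    if not records:
        return {"count": 0, "mean": None, "min": None, "below_threshold": []}
    scores = [score for _, score in records]
    return {
        "count": len(scores),
        "mean": mean(scores),
        "min": min(scores),
        "below_threshold": [label for label, score in records if score < threshold],
    }


# Placeholder values for demonstration only, not real evaluation results.
history = [("run-1", 1.0), ("run-2", 0.5), ("run-3", 0.0)]
print(summarize_scores(history))
```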