[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://drive.google.com/drive/folders/1IrwoNrb3AWLAhAqjlAkJNYa39p9eT9ui?usp=sharing)

# Flotorch Agent Trajectory Evaluation (Intelligent Research Assistant Use Case)

This notebook demonstrates how to measure and analyze the **trajectory evaluation** of a **Flotorch ADK agent** (configured as an **Intelligent Research Assistant** that performs web-based research and synthesizes information using search APIs) using the **Flotorch Eval** framework.

The evaluation relies on **OpenTelemetry Traces** generated during the agent's run to assess the overall quality and effectiveness of the agent's trajectory through LLM-based evaluation.

---

## Key Concepts

* **Intelligent Research Assistant**: An agent designed to perform web-based research and synthesize information using search APIs.
* **OpenTelemetry Traces**: Detailed records of the agent's execution steps (spans) used to analyze the complete agent trajectory.
* **TrajectoryEvalWithLLM**: A Flotorch Eval metric that uses LLM-based evaluation to assess the overall quality and effectiveness of agent trajectories. The evaluation metric used is **trajectory_evaluation_with_llm**.
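
To make "trajectory" concrete, the sketch below treats a trace as a flat list of spans, each pointing at its parent. It then rebuilds the ordered, nested sequence of steps that an evaluator reads as the trajectory. The `Span` fields are a simplified, hypothetical shape. They are not the Flotorch or OTLP export schema, and real spans carry many more attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Simplified, hypothetical span record (not the real OTLP schema)."""
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str
    start_ns: int  # start time in nanoseconds, as OpenTelemetry records it


def build_trajectory(spans: list[Span]) -> list[str]:
    """Return span names depth-first in start-time order, indented by nesting depth."""
    children: dict[Optional[str], list[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    trajectory: list[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        # Visit sibling spans in the order they started
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ns):
            trajectory.append("  " * depth + span.name)
            walk(span.span_id, depth + 1)

    walk(None, 0)  # start from root spans (no parent)
    return trajectory


spans = [
    Span("c", "a", "tool: web_search", 30),
    Span("a", None, "agent run", 10),
    Span("b", "a", "llm call", 20),
    Span("d", "a", "llm call", 40),
]
print("\n".join(build_trajectory(spans)))
```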

---
### Architecture Overview

![Workflow Diagram](diagrams/02_TrajectoryEvalWithLLM_Workflow_Diagram.drawio.png)
*Figure 2: Detailed workflow diagram showing the step-by-step trajectory evaluation process, from agent execution through trace collection to metric computation.*

---
## Requirements

* Flotorch account with configured models.
* Valid Flotorch API key and gateway base URL.
* Agent configured with OpenTelemetry tracing enabled.
* External API access for web search (e.g., Google Custom Search API).

---

## Agent Setup in Flotorch Console

**Important**: Before running this notebook, you need to create an agent in the Flotorch Console. This section provides step-by-step instructions on how to set up the agent.

### Step 1: Access Flotorch Console

1. **Log in to Flotorch Console**:
   - Navigate to your Flotorch Console (e.g., `https://dev-console.flotorch.cloud`)
   - Ensure you have the necessary permissions to create agents

2. **Navigate to Agents Section**:
   - Click on **"Agents"** in the left sidebar
   - You should see the "Agent Builder" option selected

### Step 2: Create New Agent

1. **Click "Create FloTorch Agent"**:
   - Look for the blue **"+ Create FloTorch Agent"** button in the top right corner
   - Click it to start creating a new agent

2. **Agent Configuration**:
   - **Agent Name**: Choose a unique name for your agent (e.g., `research-agent`)
     - **Important**: The name should only contain alphanumeric characters and dashes (a-z, A-Z, 0-9, -)
     - **Note**: Copy this agent name - you'll need to use it in the `agent_name` variable later
   - **Description** (Optional): Add a description if desired
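
You can check the naming rule locally before typing the name into the console. This minimal sketch encodes only the character rule stated above. The console may enforce other constraints, such as length limits, that are not checked here.

```python
import re

# Letters, digits, and dashes only, per the naming rule above
AGENT_NAME_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def is_valid_agent_name(name: str) -> bool:
    """Return True if the name uses only a-z, A-Z, 0-9, and '-'."""
    return AGENT_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_agent_name("research-agent"))  # True
print(is_valid_agent_name("research_agent"))  # False (underscore not allowed)
```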

### Step 3: Configure Agent Details

After creating the agent, you'll be directed to the agent configuration page. Configure the following:

#### Required Configuration:

1. **Model** (`* Model`):
   - **Required**: Select a model from the available options
   - Example: `gpt-model` or any available model from your Flotorch gateway
   - Click the edit icon to configure

2. **Agent Details** (`* Agent Details`):
   - **Required**: Configure agent details
   - **System Prompt**: Copy and paste the following system prompt:

     > You are a helpful assistant. You need to call the web_search tool whenever the user asks about anything. You need to give more context about the data the user requires. Give the context in bullet points, with a minimum of 20 bullet points.
     >
     > Available Tools: web_search


   - **Goal**: Copy and paste the following goal:

     > You are a helpful assistant. You need to call the web_search tool whenever the user asks about anything.


#### Optional Configuration:

1. **Tools**:
   - Tools will be added programmatically via the notebook (see Sections 4 and 5)
   - You can leave this as "Not Configured" in the console

2. **Input Schema**:
   - Optional: Leave as "Not Configured" for this use case

3. **Output Schema**:
   - Optional: Leave as "Not Configured" for this use case

### Step 4: Publish the Agent

1. **Review Configuration**:
   - Ensure the Model and Agent Details are configured correctly
   - Verify the System Prompt and Goal are set

2. **Publish Agent**:
   - After configuration, click **"Publish"** or **"Make a revision"** to publish the agent
   - Once published, the agent will have a version number (e.g., v1)

3. **Note the Agent Name**:
   - **Important**: Copy the exact agent name you used when creating the agent
   - You will need to replace `<your_agent_name>` in the `agent_name` variable in Section 2.1 (Global Provider Models and Agent Configuration)

### Step 5: Update Notebook Configuration

1. **Update Agent Name**:
   - Navigate to Section 2.1 in this notebook
   - Find the `agent_name` variable
   - Replace `<your_agent_name>` with the exact agent name you created in the console

**Example**:
- If you created an agent named `research-agent` in the console
- Set `agent_name = "research-agent"` in the notebook

### Summary of Required vs Optional Settings

| Setting | Required/Optional | Value |
|---------|------------------|-------|
| **Agent Name** | **Required** | Choose a unique name (copy it for notebook) |
| **Model** | **Required** | Select from available models |
| **System Prompt** | **Required** | Use the system prompt provided above |
| **Goal** | **Required** | Use the goal provided above |
| **Tools** | **Optional** | Will be added via notebook code |
| **Input Schema** | **Optional** | Can leave as "Not Configured" |
| **Output Schema** | **Optional** | Can leave as "Not Configured" |

**Note**: The `web_search` tool will be added to the agent programmatically in the notebook code, so you don't need to configure it manually in the console.

---


## 1. Setup and Installation

### Purpose
Install the necessary packages for the Flotorch Evaluation framework required for agent trajectory evaluation.

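### Key Components
- **`flotorch-eval`**: Flotorch evaluation framework with all dependencies for trajectory metrics
- **`flotorch[adk]`**: Flotorch SDK with the Google ADK integration (`FlotorchADKAgent`, `FlotorchADKSession`)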


In [None]:
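# Install Flotorch packages
# flotorch-eval: Flotorch evaluation framework with all dependencies
# flotorch[adk]: Flotorch SDK with the Google ADK integration
# (the requirement is quoted so shells such as zsh don't expand the brackets)

%pip install flotorch-eval==2.0.0b1 "flotorch[adk]==3.1.0b1"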
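
After the install (and a kernel restart, if needed), a quick check with the standard library's `importlib.metadata` confirms which versions the kernel actually sees. The distribution names come from the `%pip` line above. This is a convenience sketch, not part of the Flotorch tooling.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages: list[str]) -> dict[str, str]:
    """Map each distribution name to its installed version, or 'not installed'."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result


print(installed_versions(["flotorch-eval", "flotorch"]))
```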

## 2. Authentication and Credentials

### Purpose
Configure your Flotorch API credentials and gateway URL for authentication.

### Key Components
This cell configures the essential authentication and connection parameters:

**Authentication Parameters**:

| Parameter | Description | Example |
|-----------|-------------|---------|
| `FLOTORCH_API_KEY` | Your API authentication key (found in your Flotorch Console). Securely entered using `getpass` to avoid displaying in the notebook | `sk_...` |
| `FLOTORCH_BASE_URL` | Your Flotorch gateway endpoint URL | `https://dev-gateway.flotorch.cloud` |

**Note**: Use secure credential management in production environments.
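
A common production pattern is to read the key from an environment variable and prompt only when it is missing. This sketch assumes you export a variable named `FLOTORCH_API_KEY` yourself. The Flotorch SDK is not known to read that variable automatically, so the value is still passed explicitly, as in the cell below.

```python
import getpass
import os


def get_secret(env_var: str, prompt: str) -> str:
    """Read a secret from the environment, falling back to a hidden interactive prompt."""
    value = os.environ.get(env_var, "").strip()
    if value:
        return value
    return getpass.getpass(prompt).strip()


# Hypothetical usage: export FLOTORCH_API_KEY before starting Jupyter, then
# FLOTORCH_API_KEY = get_secret("FLOTORCH_API_KEY", "Paste your API key here: ")
```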


In [None]:
import getpass  # Prompt for secrets without echoing them in terminals/notebooks

# Authentication for Flotorch access
# (getpass.GetPassWarning is issued as a warning, not raised, so check the value instead)
FLOTORCH_API_KEY = getpass.getpass("Paste your API key here: ").strip()
if FLOTORCH_API_KEY:
    print("✓ FLOTORCH_API_KEY set successfully")
else:
    print("✗ FLOTORCH_API_KEY not set")

FLOTORCH_BASE_URL = input("Paste your Flotorch Base URL here: ").strip()  # e.g. https://dev-gateway.flotorch.cloud
print(f"✓ FLOTORCH_BASE_URL set: {FLOTORCH_BASE_URL}")

if FLOTORCH_API_KEY and FLOTORCH_BASE_URL:
    print("✓ All credentials configured successfully!")

### 2.1. Global Provider Models and Agent Configuration

### Purpose
Define available models from the Flotorch gateway and configure agent-specific parameters.

### Key Components

**Global Provider Models**: These are the available models from the Flotorch gateway that can be used for evaluation and agent operations:

| Model Variable | Model Name | Description |
|----------------|------------|-------------|
| `MODEL_CLAUDE_HAIKU` | `flotorch/flotorch-claude-haiku-4-5` | Claude Haiku model via Flotorch gateway |
| `MODEL_CLAUDE_SONNET` | `flotorch/flotorch-claude-sonnet-3-5-v2` | Claude Sonnet model via Flotorch gateway |
| `MODEL_AWS_NOVA_PRO` | `flotorch/flotorch-aws-nova-pro` | AWS Nova Pro model via Flotorch gateway |
| `MODEL_AWS_NOVA_LITE` | `flotorch/flotorch-aws-nova-lite` | AWS Nova Lite model via Flotorch gateway |
| `MODEL_AWS_NOVA_MICRO` | `flotorch/flotorch-aws-nova-micro` | AWS Nova Micro model via Flotorch gateway |

**Agent Configuration Parameters**:

| Parameter | Description | Example |
|-----------|-------------|---------|
| `default_evaluator` | The LLM model used for evaluation (can use MODEL_* variables above) | `MODEL_CLAUDE_SONNET` or `flotorch/flotorch-model` |
| `agent_name` | The name of your Flotorch ADK agent | `research-agent` |
| `app_name` | The application name identifier | `agent-evaluation-app-name_02` |
| `user_id` | The user identifier | `agent-evaluation-user-02` |


In [None]:
# ============================================================================
# Global Provider Models (Flotorch Gateway Models)
# ============================================================================
# These models are available from the Flotorch gateway and can be used
# for evaluation, agent operations, and other tasks.

MODEL_CLAUDE_HAIKU = "flotorch/flotorch-claude-haiku-4-5"
MODEL_CLAUDE_SONNET = "flotorch/flotorch-claude-sonnet-3-5-v2"
MODEL_AWS_NOVA_PRO = "flotorch/flotorch-aws-nova-pro"
MODEL_AWS_NOVA_LITE = "flotorch/flotorch-aws-nova-lite"
MODEL_AWS_NOVA_MICRO = "flotorch/flotorch-aws-nova-micro"

print("✓ Global provider models defined")

# The LLM model used for evaluation.
# Can be modified to use any MODEL_* constant above (e.g., MODEL_CLAUDE_SONNET, MODEL_AWS_NOVA_PRO)
# You can use your own models from Flotorch Console as well
default_evaluator = MODEL_CLAUDE_HAIKU

agent_name = "<your_agent_name>"  # The name of your Flotorch ADK agent, e.g. "research-agent"
app_name = "<your_app_name>"      # The application name identifier, e.g. "agent-evaluation-app-name_02"
user_id = "<your_user_id>"        # The user identifier, e.g. "agent-evaluation-user-02"

print("✓ Agent configuration parameters defined")
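
A forgotten `<...>` placeholder usually surfaces later as a confusing gateway error. A small check can catch it right after this cell. `find_unset_placeholders` is a hypothetical helper. In the notebook, you would pass it `agent_name`, `app_name`, and `user_id` from the cell above.

```python
def find_unset_placeholders(config: dict[str, str]) -> list[str]:
    """Return the keys whose values are empty or still look like '<...>' placeholders."""
    return [
        key
        for key, value in config.items()
        if not value.strip() or (value.startswith("<") and value.endswith(">"))
    ]


# Example with literal values; in the notebook use the real variables instead
print(find_unset_placeholders({"agent_name": "<your_agent_name>", "app_name": "my-app", "user_id": ""}))
```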


## 3. Import Required Libraries

### Purpose
Import all required components for evaluating the Intelligent Research Assistant trajectory using Flotorch Eval.

### Key Components
- **`AgentEvaluator`**: Core client for agent evaluation orchestration and trace fetching
- **`TrajectoryEvalWithLLM`**: Flotorch Eval metric that uses LLM-based evaluation to assess overall trajectory quality
- **`FlotorchADKAgent`**: Creates and configures Flotorch ADK agents with custom tools and tracing
- **`FlotorchADKSession`**: Manages agent sessions for multi-turn conversations
- **`Runner`**: Executes agent queries and coordinates the agent execution flow
- **`FunctionTool`**: Wraps Python functions as tools that can be used by the agent
- **`types`**: Google ADK types for creating message content and handling agent events
- **`pandas`**: Data manipulation and display for formatted results tables
- **`display`**: IPython display utility for rendering formatted outputs

In [None]:
# Required imports
# Flotorch Eval components
from flotorch_eval.agent_eval.core.client import AgentEvaluator
from flotorch_eval.agent_eval.metrics.llm_evaluators import TrajectoryEvalWithLLM

# Flotorch ADK components
from flotorch.adk.agent import FlotorchADKAgent
from flotorch.adk.sessions import FlotorchADKSession

# Google ADK components
from google.adk.runners import Runner
from google.adk.tools import FunctionTool
from google.genai import types

# Utilities
import pandas as pd
from IPython.display import display

print("✓ Imported necessary libraries successfully")

## 4. Intelligent Research Assistant Setup

### Purpose
Define the custom web search tool that the Intelligent Research Assistant uses for web-based research. The agent itself, with OpenTelemetry tracing enabled, is initialized in Section 5.

### Custom Tool: Web Search API Integration

The Intelligent Research Assistant uses a custom tool (`web_search`) that integrates with external search APIs to retrieve and synthesize information. This tool:
- Accepts a search query string as input
- Uses Google Custom Search API to perform web searches
- Retrieves top search results with titles, snippets, and links
- Returns formatted search results for information synthesis
- Handles errors gracefully with fallback responses

The tool is wrapped as a `FunctionTool` that can be used by the agent to perform web-based research and synthesize information using search APIs.
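
Separating the parsing step from the HTTP call lets you test the tool without an API key. The sketch below mirrors the formatting loop in `web_search`, using a plain `-` bullet, and runs against a hand-written sample response. `format_search_results` is a hypothetical helper that the notebook does not define.

```python
def format_search_results(data: dict, max_results: int = 5) -> str:
    """Turn a Custom Search JSON response into the text block the agent reads."""
    items = data.get("items", [])
    if not items:
        return "No results found."
    blocks = []
    for item in items[:max_results]:
        # Each Custom Search result item carries a title, snippet, and link
        blocks.append(
            f"- {item.get('title', '')}\n{item.get('snippet', '')}\n{item.get('link', '')}"
        )
    return "\n\n".join(blocks)


sample = {
    "items": [
        {"title": "Example page", "snippet": "A short summary.", "link": "https://example.com"}
    ]
}
print(format_search_results(sample))
```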

In [None]:
import requests

def web_search(query: str) -> str:
    """Perform a Google Custom Search and return the top results as text."""
    api_key = "<GOOGLE_API_KEY>"  # Enter your Google API key
    cse_id = "<GOOGLE_CSE_ID>"    # Enter your Google Custom Search Engine ID (cx)

    url = "https://www.googleapis.com/customsearch/v1"
    params = {
        "key": api_key,
        "cx": cse_id,
        "q": query,
        "num": 5,  # top 5 results
    }

    # Return a readable fallback instead of raising, so the agent can still respond.
    # The exception text is not echoed because it can contain the request URL (and API key).
    try:
        response = requests.get(url, params=params, timeout=10)
        response.raise_for_status()
        data = response.json()
    except (requests.RequestException, ValueError) as e:
        return f"Web search failed ({type(e).__name__})."

    if "items" not in data:
        return "No results found."

    results = []
    for item in data["items"]:
        title = item.get("title", "")
        snippet = item.get("snippet", "")
        link = item.get("link", "")
        results.append(f"🔹 {title}\n{snippet}\n{link}")

    # Combine results into a single text output
    return "\n\n".join(results)

# --- Wrap as ADK Tool ---
tools = [FunctionTool(func=web_search)]

print("✓ Web search tool defined and registered successfully")

## 5. Agent and Runner Initialization

### Purpose
Set up the Flotorch ADK Agent and Runner with OpenTelemetry tracing enabled to capture detailed execution data for trajectory evaluation.

### Key Components
1. **FlotorchADKAgent** (`agent_client`):
   - Initializes the agent with custom web search tools
   - Configures `tracer_config` with `enabled: True` and `sampling_rate: 1` to capture 100% of traces
   - Essential for evaluation as traces contain complete trajectory information
2. **FlotorchADKSession** (`session_service`): Manages agent sessions for multi-turn conversations
3. **Runner** (`runner`): Executes agent queries and coordinates the agent execution flow

These components work together to run the Intelligent Research Assistant and generate OpenTelemetry traces for trajectory evaluation analysis.
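
A `sampling_rate` of `1` keeps every run. This document does not describe how Flotorch applies the value internally. As an assumption, the sketch below shows the ratio-based rule used by OpenTelemetry's `TraceIdRatioBased` sampler, which compares the low 64 bits of the trace ID against `rate × 2^64`. At `1`, every trace passes. Trajectory evaluation needs this, because a dropped trace cannot be fetched later.

```python
import random

TRACE_ID_LIMIT = (1 << 64) - 1  # the decision uses the low 64 bits of the trace ID


def should_sample(trace_id: int, rate: float) -> bool:
    """Ratio-based sampling decision in the style of OpenTelemetry's TraceIdRatioBased."""
    bound = round(rate * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


# Random 128-bit IDs stand in for real trace IDs
sample_ids = [random.getrandbits(128) for _ in range(10_000)]
for rate in (1, 0.25, 0):
    kept = sum(should_sample(t, rate) for t in sample_ids)
    print(f"sampling_rate={rate}: kept {kept} of {len(sample_ids)} traces")
```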

In [None]:
# Initialize Flotorch ADK Agent with tracing enabled
agent_client = FlotorchADKAgent(
    agent_name=agent_name,
    custom_tools=tools,
    base_url=FLOTORCH_BASE_URL,
    api_key=FLOTORCH_API_KEY,
    tracer_config={
        "enabled": True,                                                   # Enable tracing for trajectory evaluation
        "endpoint": "https://dev-observability.flotorch.cloud/v1/traces",  # Dev observability OTLP/HTTP traces endpoint
        "sampling_rate": 1                                                 # Sample 100% of traces
    }
)
agent = agent_client.get_agent()

# Initialize session service
session_service = FlotorchADKSession(
    api_key=FLOTORCH_API_KEY,
    base_url=FLOTORCH_BASE_URL,
)

# Create the ADK Runner to execute agent queries
runner = Runner(
    agent=agent,
    app_name=app_name,
    session_service=session_service
)

print("✓ Agent, session service, and runner initialized successfully")

## 6. Helper Function for Running a Query

### Purpose
Define a helper function that executes a single-turn query with the agent and extracts the final response. The agent execution is automatically traced for trajectory evaluation.

### Functionality
The `run_single_turn` function:
- Accepts a `Runner`, query string, session ID, and user ID as parameters
- Creates a user message using Google ADK types
- Executes the query through the runner
- Iterates through events to find and return the final agent response
- Returns a fallback message if no response is found

This function simplifies the process of running queries and ensures trace generation during execution.
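
`run_single_turn` returns only the first text part of the final event. You can try a variant that joins every text part offline, against stand-in event objects. The `Fake*` classes below are hypothetical stubs that expose only the attributes the helper reads (`is_final_response()` and `content.parts[i].text`). They are not the real ADK `Event` type.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class FakePart:
    text: Optional[str] = None


@dataclass
class FakeContent:
    parts: list[FakePart] = field(default_factory=list)


@dataclass
class FakeEvent:
    final: bool
    content: Optional[FakeContent] = None

    def is_final_response(self) -> bool:
        return self.final


def final_text(events: Iterable) -> str:
    """Join all text parts of the first final-response event (same checks as run_single_turn)."""
    for event in events:
        if event.is_final_response() and event.content and event.content.parts:
            return "".join(p.text for p in event.content.parts if p.text)
    return "No response from agent."


events = [
    FakeEvent(final=False, content=FakeContent([FakePart("thinking...")])),
    FakeEvent(final=True, content=FakeContent([FakePart("Part one. "), FakePart("Part two.")])),
]
print(final_text(events))
```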

In [None]:
def run_single_turn(runner: Runner, query: str, session_id: str, user_id: str) -> str:
    """
    Execute a single-turn query with the agent and return the final response.
    The agent execution is traced automatically.
    """
    content = types.Content(role="user", parts=[types.Part(text=query)])
    events = runner.run(user_id=user_id, session_id=session_id, new_message=content)

    # Extract the final response
    for event in events:
        if event.is_final_response() and event.content and event.content.parts:
            return event.content.parts[0].text
    return "No response from agent."

print("✓ Helper function defined successfully")

## 7. Define Query

### Purpose
Define the sample query that will be executed by the Intelligent Research Assistant agent to generate OpenTelemetry traces for trajectory evaluation.

### Key Components
- **`query`**: A sample research question that will be processed by the agent
  - This query will trigger the agent to perform web-based research using search APIs and synthesize information
  - The execution will be automatically traced to capture the complete agent trajectory
  - The trajectory will be evaluated using LLM-based assessment to measure quality and effectiveness
  - Example: "Tell me about Google ADK?"

The query can be modified to test different research scenarios and evaluate trajectory quality for various types of questions.

In [None]:
# Define the research query that will be sent to the agent

query = "Tell me about Google ADK?"

print(f"✓ Query defined: {query}")

## 8. Run the Query and Get Trace ID

### Purpose
Execute a sample query with the Intelligent Research Assistant to generate OpenTelemetry traces that contain trajectory data for evaluation.

### Process
1. **Create Session**: Initialize a new session for the agent interaction
2. **Execute Query**: Run a sample query through the agent
3. **Retrieve Trace IDs**: Extract the generated trace IDs from the agent client
4. **Display Results**: Print the agent response and trace ID for verification

The execution automatically generates OpenTelemetry traces that record the complete agent trajectory, which will be used for trajectory evaluation.
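
Before fetching, you can confirm that the values returned by `get_tracer_ids()` look like trace IDs. The check below encodes the W3C Trace Context format that OpenTelemetry uses: 16 bytes rendered as 32 lowercase hex characters, with all zeros invalid. It is an assumption that Flotorch returns IDs in exactly this hex form, so verify it against your own output.

```python
import re

TRACE_ID_RE = re.compile(r"[0-9a-f]{32}")


def is_valid_trace_id(trace_id: str) -> bool:
    """True for a W3C/OpenTelemetry-style trace ID: 32 lowercase hex chars, not all zeros."""
    return bool(TRACE_ID_RE.fullmatch(trace_id)) and trace_id != "0" * 32


print(is_valid_trace_id("4bf92f3577b34da6a3ce929d0e0e4736"))  # True
print(is_valid_trace_id("0" * 32))                            # False
```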

In [None]:
# Create a new session
session = await runner.session_service.create_session(
    app_name=app_name,
    user_id=user_id,
)
print(f"Session created: {session.id}")

response = run_single_turn(
    runner=runner,
    query=query,
    session_id=session.id,
    user_id=user_id
)

# Retrieve the generated trace IDs
trace_ids = agent_client.get_tracer_ids()

print("Agent Response:")
print(response[:200] + "..." if len(response) > 200 else response)
print(f"Found {len(trace_ids)} trace(s). First trace ID: {trace_ids[0] if trace_ids else 'N/A'}")

print("✓ Query execution completed successfully")

## 9. Trajectory Evaluation with Flotorch Eval

### Purpose
Initialize the `AgentEvaluator`, fetch the OpenTelemetry trace, and run the `TrajectoryEvalWithLLM` metric to evaluate overall trajectory quality. The evaluation metric **trajectory_evaluation_with_llm** provides comprehensive assessment of the Intelligent Research Assistant's trajectory.

### Key Components
1. **TrajectoryEvalWithLLM**: Initializes the trajectory evaluation metric that uses LLM-based evaluation to assess trajectory quality
2. **AgentEvaluator** (`client`):
   - Connects to the Flotorch gateway using API credentials
   - Configured with a default evaluator model
   - Provides methods to fetch and evaluate traces
3. **Trace Fetching**: Retrieves the complete trace data using the trace ID generated during agent execution

The fetched trace contains detailed information about the complete agent trajectory, which will be analyzed by the TrajectoryEvalWithLLM metric to compute the trajectory_evaluation_with_llm score.

In [None]:
# Initialize the TrajectoryEvalWithLLM metric
metrics = [TrajectoryEvalWithLLM()]

# Initialize the AgentEvaluator client
client = AgentEvaluator(
    api_key=FLOTORCH_API_KEY,
    base_url=FLOTORCH_BASE_URL,
    default_evaluator=default_evaluator
)

traces = None
if trace_ids:
    # Fetch the trace data from the Flotorch gateway
    traces = client.fetch_traces(trace_ids[0])
    print("✓ Trace fetched successfully")
else:
    print("✗ No trace IDs found to fetch.")

## 10. Run Evaluation

### Purpose
Execute the trajectory evaluation by processing the fetched OpenTelemetry trace using the TrajectoryEvalWithLLM metric to assess overall trajectory quality.

### Process
- Calls `client.evaluate()` with the trace data and TrajectoryEvalWithLLM metric
- The evaluator processes the trace to analyze the complete agent trajectory
- Computes the **trajectory_evaluation_with_llm** metric which includes:
  - Quality score (0.0 to 1.0) indicating overall trajectory effectiveness
  - Detailed LLM-based evaluation explanation of trajectory quality
  - Assessment of tool usage, reasoning, and response quality
- Returns evaluation results with trajectory ID and metric scores

This step generates the trajectory evaluation analysis that will be displayed in the next section.
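
Section 9 fetches only `trace_ids[0]`. If a run produces several trace IDs, one option is to loop over all of them. This minimal sketch reuses the two calls the notebook already makes: `client.fetch_traces` and `await client.evaluate(trace=..., metrics=...)`. `evaluate_all` and `FakeClient` are hypothetical names. The fake client exists only so the sketch runs without credentials.

```python
import asyncio
from typing import Any


async def evaluate_all(client: Any, trace_ids: list[str], metrics: list) -> dict[str, Any]:
    """Fetch and evaluate each trace ID in turn; results are keyed by trace ID."""
    results = {}
    for trace_id in trace_ids:
        trace = client.fetch_traces(trace_id)  # same call as in Section 9
        if not trace:
            continue  # skip IDs the gateway could not return
        results[trace_id] = await client.evaluate(trace=trace, metrics=metrics)
    return results


class FakeClient:
    """Hypothetical stand-in exposing only the two methods used above."""

    def fetch_traces(self, trace_id: str) -> dict:
        return {"trace_id": trace_id}

    async def evaluate(self, trace: dict, metrics: list) -> str:
        await asyncio.sleep(0)  # stands in for waiting on the evaluator LLM
        return f"evaluated {trace['trace_id']} with {len(metrics)} metric(s)"


# In the notebook (event loop already running):
#     results_by_trace = await evaluate_all(client, trace_ids, metrics)
# In a plain script:
#     asyncio.run(evaluate_all(FakeClient(), ["t1", "t2"], ["m"]))
```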

In [None]:
if 'traces' in locals() and traces:
    # Evaluate the trace using the TrajectoryEvalWithLLM metric
    results = await client.evaluate(
        trace=traces,
        metrics=metrics
    )

    print("✓ Evaluation completed successfully!")
else:
    print("Cannot evaluate: No traces were available.")

## 11. Display and Interpret Results

### Purpose
Define helper functions to format and display the evaluation output clearly, showing the trajectory_evaluation_with_llm metric results in a readable format.

### Functionality
The `display_metrics` function:
- Extracts the `trajectory_evaluation` metric from evaluation results
- Formats the quality score and evaluation details
- Creates a structured display showing:
  - Trajectory Quality Score (0.0 to 1.0)
  - Detailed LLM-based evaluation explanation
- Uses pandas DataFrame with styled formatting for clean presentation

This function provides a user-friendly way to visualize trajectory evaluation metrics.
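
`display_metrics` renders only the `trajectory_evaluation` entry. If you add more metrics to the `metrics` list, you can build one row per entry in `result.scores` the same way and pass the rows to `pd.DataFrame(...)`. The `Score` and `Result` dataclasses are hypothetical stand-ins. They mimic only the attributes `display_metrics` already reads (`name`, `score`, and `details`), and the demo values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Score:
    """Hypothetical stand-in for one entry of result.scores."""
    name: str
    score: float
    details: dict = field(default_factory=dict)


@dataclass
class Result:
    scores: list[Score]


def score_rows(result) -> list[dict]:
    """Flatten every metric in result.scores into plain dict rows."""
    return [
        {
            "Metric": m.name.replace("_", " ").title(),
            "Score": round(m.score, 2),
            "Details": m.details.get("details", "No details available."),
        }
        for m in result.scores
    ]


demo = Result(scores=[Score("trajectory_evaluation", 0.857, {"details": "Tool use was appropriate."})])
print(score_rows(demo))
```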

In [None]:
def display_metrics(result):
    """
    Display trajectory evaluation metrics in a formatted table.
    """
    # Find the trajectory_evaluation metric
    metric = next((m for m in result.scores if m.name == "trajectory_evaluation"), None)
    if not metric:
        print("No trajectory_evaluation metric found.")
        return

    # Extract metric details
    d = metric.details

    # Get the details string (which contains the evaluation explanation)
    details_text = d.get("details", "No details available.")

    # Format the details string with better readability
    details = f"Score: {metric.score:.2f} / 1.0\n\nEvaluation Details:\n{details_text}"

    # Create DataFrame for display
    df = pd.DataFrame([{
        "Metric": metric.name.replace("_", " ").title(),
        "Score": f"{metric.score:.2f}",
        "Details": details
    }])

    # Display DataFrame with multiline support
    display(df.style.set_properties(
        subset=['Details'],
        **{'white-space': 'pre-wrap', 'text-align': 'left'}
    ))

print("✓ Display metrics function defined successfully")

## 12. View Trajectory Evaluation Results

### Purpose
Display the trajectory evaluation results in a formatted table showing the complete assessment for the Intelligent Research Assistant.

### Output
The displayed table includes:
- **Metric**: The evaluation metric name (trajectory_evaluation)
- **Score**: The trajectory quality score (0.0 to 1.0)
- **Details**: Comprehensive evaluation showing:
  - Quality score out of 1.0
  - Detailed LLM-based explanation of trajectory quality
  - Assessment of tool usage, reasoning, and response effectiveness

This visualization helps identify trajectory quality issues and optimize the agent's research and synthesis capabilities.

In [None]:
if 'results' in locals():
    display_metrics(results)
else:
    print("No results object found. Please run Sections 9 and 10 first.")

### Interpreting the Trajectory Evaluation Results

The **trajectory_evaluation_with_llm** metric is a vital tool for quality monitoring of the Intelligent Research Assistant:

* **Quality Score (0.0 to 1.0)**: Indicates the overall effectiveness and quality of the agent's trajectory:
    * **1.0**: Excellent trajectory - effective tool usage, sound reasoning, and high-quality responses
    * **0.5 to below 1.0**: Good trajectory with minor issues in tool usage or response quality
    * **Below 0.5**: Poor trajectory - ineffective tool usage, poor reasoning, or low-quality responses
* **Evaluation Details**: Provides a detailed LLM-based explanation of:
    * **Tool Usage Effectiveness**: Whether tools were used appropriately and effectively (e.g., web search queries)
    * **Reasoning Quality**: Assessment of the agent's reasoning and decision-making process
    * **Response Quality**: Evaluation of the final response's accuracy, completeness, and usefulness
    * **Overall Trajectory Assessment**: Comprehensive evaluation of the complete agent execution path
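
You can apply the score bands programmatically, for example when tracking scores across runs. This sketch assumes the bands are contiguous: exactly 1.0 is excellent, anything from 0.5 up to (but not including) 1.0 is good, and anything below 0.5 is poor.

```python
def score_band(score: float) -> str:
    """Map a trajectory score in [0.0, 1.0] to the interpretation bands above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score == 1.0:
        return "excellent"
    if score >= 0.5:
        return "good"
    return "poor"


for s in (1.0, 0.95, 0.5, 0.45, 0.0):
    print(s, score_band(s))
```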

For an Intelligent Research Assistant, understanding trajectory evaluation helps identify:
- **Tool usage issues**: If web search tools are used ineffectively or queries are poorly formulated
- **Reasoning problems**: If the agent's research and synthesis process has logical gaps
- **Response quality**: If the synthesized information is incomplete, inaccurate, or poorly presented
- **Overall effectiveness**: Monitor trajectory quality to ensure the agent delivers accurate and comprehensive research results via proper API integration

## 13. Summary of Agent Trajectory Evaluation Notebook

This notebook demonstrates a methodology for evaluating the trajectory quality of a **Flotorch ADK Agent** (configured as an **Intelligent Research Assistant** that performs web-based research and synthesizes information using search APIs) using the **Flotorch Eval framework**.

**Use Case**: Intelligent Research Assistant - Performs web-based research and synthesizes information using search APIs.

**Evaluation Metric**: trajectory_evaluation_with_llm

## Core Process

### 1. Setup and Instrumentation
- Configure a `FlotorchADKAgent` with custom web search tools (e.g., Google Custom Search API integration).
- Enable **OpenTelemetry Tracing** via the `tracer_config`.
- This instrumentation allows detailed capture of the complete agent trajectory and decision-making process.

### 2. Execution and Data Generation
- Run a sample query through the agent using the **Runner**.
- This automatically generates an **Agent Trajectory** in the form of OpenTelemetry traces.
- The trace records the complete execution path, including:
  - Tool usage decisions (web search queries)
  - LLM interactions and reasoning
  - Information synthesis and response generation
  - Step-by-step agent operations

### 3. Evaluation
- Use the `AgentEvaluator` client along with the specialized **TrajectoryEvalWithLLM** metric from `flotorch-eval`.
- The evaluator processes the trace data to compute trajectory quality statistics using the **trajectory_evaluation_with_llm** metric.

### 4. Analysis
- The notebook displays a thorough trajectory evaluation assessment, including:
  - **Quality Score** (0.0 to 1.0)
  - **LLM-based Evaluation Details** explaining trajectory quality
  - Assessment of tool usage, reasoning, and response effectiveness

## Purpose and Benefits

This evaluation provides **actionable quality metrics** that help developers:

- Identify trajectory quality issues in the Intelligent Research Assistant  
- Optimize tool usage decisions, particularly web search query formulation  
- Track quality trends over time  
- Ensure the Intelligent Research Assistant delivers **accurate and comprehensive research results** via proper API integration