## Evaluate a Pydantic AI weather agent
This tutorial will show you how to evaluate Pydantic AI agents using DeepEval's dataset iterator.


### Install dependencies:

In [None]:
!pip install pydantic-ai -U deepeval --quiet

### Set your OpenAI API key:

In [None]:
import os
os.environ['OPENAI_API_KEY'] = "<your-openai-api-key>"
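
Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the helper below reads the key from the environment and only prompts for it when it is missing. `ensure_api_key` is a hypothetical name introduced here for illustration; it is not part of DeepEval or Pydantic AI.

```python
import os
from getpass import getpass


def ensure_api_key(name="OPENAI_API_KEY", prompt=getpass):
    """Return the key stored in the environment, prompting once if it is missing.

    Hypothetical helper for illustration; `prompt` is injectable so the
    function can be exercised without an interactive terminal.
    """
    key = os.environ.get(name)
    if not key:
        # Ask the user and store the value for the rest of the session.
        key = prompt(f"Enter {name}: ")
        os.environ[name] = key
    return key


# In a notebook you would call: ensure_api_key()
```

Calling `ensure_api_key()` in place of the assignment above keeps the secret out of the saved notebook.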

### Hyperparameters

Hyperparameters are the settings that control the behavior of an LLM application. They can include the model, temperature, max tokens, or even static prompts (for example, the system prompt). One of the main aims of evaluation is to find the best set of hyperparameters for a given agent.

For this application, the model is the hyperparameter we vary.


In [None]:
hyperparameter_model = "gpt-4o"
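
If you later want to compare more than one setting, the single value above can be generalized to a grid of candidate configurations. The sketch below is one way to enumerate such a grid with the standard library. The names `search_space` and `expand_grid` and the listed values are assumptions made for illustration, not recommendations or part of either library.

```python
from itertools import product

# Hypothetical search space; the values are illustrative only.
search_space = {
    "model": ["gpt-4o", "gpt-4o-mini"],
    "temperature": [0.0, 0.7],
}


def expand_grid(space):
    """Return one dict per combination of hyperparameter values."""
    names = list(space)
    combos = product(*(space[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


configs = expand_grid(search_space)
for config in configs:
    print(config)
```

Each resulting dict could then drive one evaluation run, for example by building a fresh agent from `config["model"]`.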

### Create a Pydantic AI agent

This is the same example as the one in the [Pydantic AI docs](https://ai.pydantic.dev/examples/weather-agent/). A user can ask for the weather in multiple cities; the agent uses the `get_lat_lng` tool to get the latitude and longitude of each location, then uses the `get_weather` tool to get the weather.

In [None]:
from __future__ import annotations as _annotations

import asyncio
from dataclasses import dataclass
from typing import Any

from httpx import AsyncClient
from pydantic import BaseModel

from pydantic_ai import Agent, RunContext


@dataclass
class Deps:
    client: AsyncClient


weather_agent = Agent(
    hyperparameter_model,
    instructions='Be concise, reply with one sentence.',
    deps_type=Deps,
    retries=2,
)


class LatLng(BaseModel):
    lat: float
    lng: float


@weather_agent.tool
async def get_lat_lng(ctx: RunContext[Deps], location_description: str) -> LatLng:
    """Get the latitude and longitude of a location.

    Args:
        ctx: The context.
        location_description: A description of a location.
    """
    # NOTE: the response here will be random, and is not related to the location description.
    r = await ctx.deps.client.get(
        'https://demo-endpoints.pydantic.workers.dev/latlng',
        params={'location': location_description},
    )
    r.raise_for_status()
    return LatLng.model_validate_json(r.content)


@weather_agent.tool
async def get_weather(ctx: RunContext[Deps], lat: float, lng: float) -> dict[str, Any]:
    """Get the weather at a location.

    Args:
        ctx: The context.
        lat: Latitude of the location.
        lng: Longitude of the location.
    """
    # NOTE: the responses here will be random, and are not related to the lat and lng.
    temp_response, descr_response = await asyncio.gather(
        ctx.deps.client.get(
            'https://demo-endpoints.pydantic.workers.dev/number',
            params={'min': 10, 'max': 30},
        ),
        ctx.deps.client.get(
            'https://demo-endpoints.pydantic.workers.dev/weather',
            params={'lat': lat, 'lng': lng},
        ),
    )
    temp_response.raise_for_status()
    descr_response.raise_for_status()
    return {
        'temperature': f'{temp_response.text} °C',
        'description': descr_response.text,
    }


async def run_agent(input_query: str):
    async with AsyncClient() as client:
        deps = Deps(client=client)
        result = await weather_agent.run(
            input_query, deps=deps
        )
        return result.output

await run_agent("What is the weather like in London and in Wiltshire?")  # test run the agent

### Evaluate the agent

To evaluate Pydantic AI agents, use DeepEval's Pydantic AI `Agent` wrapper to supply metrics.


> (Pro Tip) View your agent's traces and publish test runs on [Confident AI](https://www.confident-ai.com/). You also get an in-house dataset editor and more advanced tools to monitor and eventually improve your agent's performance. Get your API key [here](https://app.confident-ai.com/).

The code below instruments the application.


In [None]:
# optional
from deepeval.integrations.pydantic_ai import instrument_pydantic_ai
instrument_pydantic_ai(api_key="<your-confident-api-key>")


### Dataset

To evaluate the agent, we need a dataset. You can create your own dataset or use one from [Confident AI](https://www.confident-ai.com/docs/llm-evaluation/dataset-management/create-goldens).


In [None]:
from deepeval.dataset import EvaluationDataset

dataset = EvaluationDataset()
dataset.pull(alias="weather_agent_queries", public=True)


### Create a metric to evaluate the agent.

DeepEval provides state-of-the-art, ready-to-use [metrics](https://deepeval.com/docs/metrics-introduction) to evaluate the agent. For this example, we will use the `AnswerRelevancyMetric`.

> [!NOTE]
> You can only run end-to-end evals on metrics that evaluate the input and actual output of your Pydantic AI agent.



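Using DeepEval's Pydantic AI `Agent` wrapper, you can supply metrics to the agent. The cell below redefines `weather_agent` and its tools with this wrapper, and `run_agent` now accepts a list of metrics.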

In [None]:
from deepeval.integrations.pydantic_ai import Agent
from deepeval.metrics import BaseMetric

weather_agent = Agent(
    hyperparameter_model,
    instructions='Be concise, reply with one sentence.',
    deps_type=Deps,
    retries=2,
)


class LatLng(BaseModel):
    lat: float
    lng: float


@weather_agent.tool
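async def get_lat_lng(ctx: RunContext[Deps], location_description: str) -> LatLng:
    """Get the latitude and longitude of a location."""
    # NOTE: the response here will be random, and is not related to the location description.
    r = await ctx.deps.client.get(
        'https://demo-endpoints.pydantic.workers.dev/latlng',
        params={'location': location_description},
    )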
    r.raise_for_status()
    return LatLng.model_validate_json(r.content)


@weather_agent.tool
async def get_weather(ctx: RunContext[Deps], lat: float, lng: float) -> dict[str, Any]:
    """Get the weather at a location."""
    # NOTE: the responses here will be random, and are not related to the lat and lng.
    temp_response, descr_response = await asyncio.gather(
        ctx.deps.client.get(
            'https://demo-endpoints.pydantic.workers.dev/number',
            params={'min': 10, 'max': 30},
        ),
        ctx.deps.client.get(
            'https://demo-endpoints.pydantic.workers.dev/weather',
            params={'lat': lat, 'lng': lng},
        ),
    )
    temp_response.raise_for_status()
    descr_response.raise_for_status()
    return {
        'temperature': f'{temp_response.text} °C',
        'description': descr_response.text,
    }

async def run_agent(input_query: str, metrics: list[BaseMetric]):
    async with AsyncClient() as client:
        deps = Deps(client=client)
        result = await weather_agent.run(
            input_query, deps=deps, metrics=metrics
        )
        return result.output

### Use the dataset iterator to evaluate the agent.

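Use the dataset iterator (from the dataset pulled earlier from Confident AI) to evaluate the agent. For each golden, the loop schedules `run_agent` as an asyncio task on the golden's input and hands that task to `dataset.evaluate`.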

In [None]:
from deepeval.metrics import AnswerRelevancyMetric

for golden in dataset.evals_iterator():
    task = asyncio.create_task(run_agent(
        golden.input,
        metrics=[AnswerRelevancyMetric(threshold=0.7, model="gpt-4o", include_reason=True)],
    ))
    dataset.evaluate(task)

Try changing hyperparameters and see how the agent performs.
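
When comparing runs across hyperparameter values, it helps to reduce each run to a couple of summary numbers. The sketch below assumes you have collected the per-test metric scores yourself and that a score counts as passing when it is at or above the threshold. `summarize` is a hypothetical helper, not a DeepEval function, and the scores shown are placeholders rather than real results.

```python
from statistics import mean


def summarize(scores, threshold=0.7):
    """Return the mean score and the fraction of scores at or above the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    passed = sum(1 for score in scores if score >= threshold)
    return {"mean": mean(scores), "pass_rate": passed / len(scores)}


# Placeholder scores for illustration only; these are not real evaluation results.
runs = {
    "config_a": [0.9, 0.8, 0.6, 0.7],
    "config_b": [0.5, 0.9, 0.4, 0.6],
}
for name, scores in runs.items():
    print(name, summarize(scores))
```

Looking at the pass rate alongside the mean shows whether a configuration is consistently above the threshold or only good on average.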