# Evaluate Semantic Kernel Azure AI Agents in Azure AI Foundry

## Objective

This sample demonstrates how to evaluate an AI agent (Azure AI Agent Service) on these important aspects of your agentic workflow:

- Intent Resolution: Measures how well the agent identifies the user’s request, including how well it scopes the user’s intent, asks clarifying questions, and reminds end users of its scope of capabilities.
- Tool Call Accuracy: Evaluates the agent's ability to select the appropriate tools and process the correct parameters from previous steps.
- Task Adherence: Measures how well the agent’s response adheres to its assigned tasks, according to its system message and prior steps.

## Time
You can expect to complete this sample in approximately 20 minutes.

## Prerequisites

### Packages
- `semantic-kernel` installed (`pip install semantic-kernel`)
- `azure-ai-evaluation` SDK installed

Before running the sample, install the required packages:
```bash
pip install semantic-kernel azure-ai-projects azure-identity azure-ai-evaluation
```
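
To confirm the install succeeded before opening the notebook, a small check along these lines may help. It is a minimal sketch using only the standard library; the module names are the import names assumed to correspond to the pip packages above, and `missing_modules` is a hypothetical helper, not part of any SDK.

```python
from importlib.util import find_spec

# Import names (assumed to match the pip packages listed above).
REQUIRED_MODULES = [
    "semantic_kernel",
    "azure.ai.projects",
    "azure.identity",
    "azure.ai.evaluation",
]


def missing_modules(modules):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in modules:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises when a parent package (e.g. `azure`) is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```

An empty result only shows that the modules can be located; it does not verify package versions.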

### Azure Resources
- An Azure OpenAI resource with a deployment configured
- An Azure AI Foundry project

### Environment Variables

- For **Foundry Agent service**:
  - **`AZURE_AI_AGENT_ENDPOINT`** – Endpoint of your Azure AI Foundry project.
  - **`AZURE_AI_AGENT_MODEL_DEPLOYMENT_NAME`** – Deployment name of the model used by the Foundry Agent.

- For **evaluating agents**:
  - **`AZURE_OPENAI_ENDPOINT`** – Azure OpenAI endpoint used for evaluation.
  - **`AZURE_OPENAI_API_KEY`** – Azure OpenAI API key used for evaluation.
  - **`AZURE_OPENAI_CHAT_DEPLOYMENT_NAME`** – Deployment name of the chat model used for evaluation.
  - **`AZURE_OPENAI_API_VERSION`** – Azure OpenAI API version used for evaluation (e.g., `2024-05-01-preview`).

- For **Azure AI Foundry** (Bonus, used to upload the evaluation results):
  - **`AZURE_AI_AGENT_ENDPOINT`** – Endpoint of your Azure AI Foundry project (the same variable used by the Foundry Agent service above).
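
The variables above can be checked up front so that a missing value surfaces before any Azure call is made. The following is a minimal sketch: `find_missing` and the group labels are hypothetical names introduced for illustration, and the check only looks at the process environment (it does not read values that a settings class might load from a `.env` file).

```python
import os

# Variable names copied from the list above, grouped by purpose.
REQUIRED_ENV_VARS = {
    "Foundry Agent service": [
        "AZURE_AI_AGENT_ENDPOINT",
        "AZURE_AI_AGENT_MODEL_DEPLOYMENT_NAME",
    ],
    "Evaluation": [
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_CHAT_DEPLOYMENT_NAME",
        "AZURE_OPENAI_API_VERSION",
    ],
}


def find_missing(env, required=REQUIRED_ENV_VARS):
    """Map each group to the variables that are unset or empty in `env`."""
    missing = {}
    for group, names in required.items():
        absent = [name for name in names if not env.get(name)]
        if absent:
            missing[group] = absent
    return missing


# Report, rather than raise, so the cell is safe to run in any environment.
for group, names in find_missing(os.environ).items():
    print(f"{group}: missing {', '.join(names)}")
```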

### Create an Azure AI Agent with a plugin - [reference](https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/agent-types/azure-ai-agent?pivots=programming-language-python)

In [None]:
from typing import Annotated

from azure.identity import DefaultAzureCredential

from semantic_kernel.agents import AzureAIAgent, AzureAIAgentSettings
from semantic_kernel.functions import kernel_function


# Define a sample plugin for the sample
class MenuPlugin:
    """A sample Menu Plugin used for the concept sample."""

    @kernel_function(description="Provides a list of specials from the menu.")
    def get_specials(self) -> Annotated[str, "Returns the specials from the menu."]:
        return """
        Special Soup: Clam Chowder
        Special Salad: Cobb Salad
        Special Drink: Chai Tea
        """

    @kernel_function(description="Provides the price of the requested menu item.")
    def get_item_price(
        self, menu_item: Annotated[str, "The name of the menu item."]
    ) -> Annotated[str, "Returns the price of the menu item."]:
        _ = menu_item  # This is just to simulate a function that uses the input.
        return "$9.99"


# Create an agent
creds = DefaultAzureCredential()
project_client = AzureAIAgent.create_client(credential=creds)

deployment_name = AzureAIAgentSettings().model_deployment_name
agent_definition = await project_client.agents.create_agent(
    model=deployment_name,
    name="Host",
    instructions="Answer questions about the menu.",
)

agent = AzureAIAgent(
    client=project_client,
    definition=agent_definition,
    plugins=[MenuPlugin()],
)

### Invoke the agent

In [None]:
USER_INPUTS = [
    "Hello",
    "What is the special soup?",
    "What is the special drink?",
    "How much is it?",
    "Thank you",
]

thread = None
for user_input in USER_INPUTS:
    print(f"## User: {user_input}")
    response = await agent.get_response(messages=user_input, thread=thread)
    print(f"## {response.name}: {response.content}")
    thread = response.thread

### Converter: Get data from agent

In [None]:
from azure.ai.evaluation import AIAgentConverter
from azure.ai.projects import AIProjectClient

# Print the thread ID for reference
print(thread.id)

# The AIAgentConverter requires a sync project client
ai_agent_settings = AzureAIAgentSettings()
sync_project_client = AIProjectClient(
    endpoint=ai_agent_settings.endpoint,
    credential=DefaultAzureCredential(),
)

converter = AIAgentConverter(sync_project_client)

file_name = "evaluation_data.jsonl"
# Save the agent thread data to a JSONL file (all turns)
evaluation_data = converter.prepare_evaluation_data([thread.id], filename=file_name)
# print(json.dumps(evaluation_data, indent=4))
len(evaluation_data)  # number of turns in the thread
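
Before running the evaluators, it can be useful to look at what was written to the JSONL file. The sketch below uses only the standard library and makes no assumption about the record schema: it lists the top-level keys of each record. `load_jsonl` and `summarize_records` are hypothetical helpers, and the file name is the one used in the cell above.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Parse a JSONL file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                records.append(json.loads(line))
    return records


def summarize_records(records):
    """Return one (index, sorted top-level keys) tuple per record."""
    return [(index, sorted(record)) for index, record in enumerate(records)]


path = Path("evaluation_data.jsonl")
if path.exists():  # Only inspect the file if the converter has produced it.
    for index, keys in summarize_records(load_jsonl(path)):
        print(index, keys)
```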

### Setting up evaluator

We use the following evaluators to assess the aspects relevant to agent quality:

- [Intent resolution](https://aka.ms/intentresolution-sample): measures the extent to which an agent identifies the correct intent from a user query. Scale: integer 1-5. Higher is better.
- [Tool call accuracy](https://aka.ms/toolcallaccuracy-sample): evaluates the agent’s ability to select the appropriate tools and process the correct parameters from previous steps. Scale: float 0-1. Higher is better.
- [Task adherence](https://aka.ms/taskadherence-sample): measures the extent to which an agent’s final response adheres to the task based on its system message and a user query. Scale: integer 1-5. Higher is better.
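
Because the three evaluators report on two different scales, comparing them side by side is easier after mapping the 1-5 scores onto 0-1. The sketch below shows one simple linear mapping; `normalize_likert` is a hypothetical helper, and the scores in `example_scores` are made-up values for illustration, not results from this sample.

```python
def normalize_likert(score, low=1, high=5):
    """Linearly map an integer score on [low, high] onto [0, 1]."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside [{low}, {high}]")
    return (score - low) / (high - low)


# Made-up scores for illustration only.
example_scores = {"intent_resolution": 4, "task_adherence": 3}
tool_call_accuracy_score = 0.5  # Already on the 0-1 scale.

comparable = {name: normalize_likert(value) for name, value in example_scores.items()}
comparable["tool_call_accuracy"] = tool_call_accuracy_score

for name, value in sorted(comparable.items()):
    print(f"{name}: {value:.2f}")
```

The linear mapping is a presentation choice for quick comparison; it does not make the scores statistically equivalent.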


In [None]:
from pprint import pprint

from azure.ai.evaluation import (
    AzureOpenAIModelConfiguration,
    IntentResolutionEvaluator,
    TaskAdherenceEvaluator,
    ToolCallAccuracyEvaluator,
)

from semantic_kernel.connectors.ai.open_ai import AzureOpenAISettings

azure_openai_settings = AzureOpenAISettings()
if not azure_openai_settings.endpoint:
    raise ValueError("Azure OpenAI endpoint is not set in the environment variables.")
if not azure_openai_settings.api_key:
    raise ValueError("Azure OpenAI API key is not set in the environment variables.")
if not azure_openai_settings.chat_deployment_name:
    raise ValueError("Azure OpenAI chat deployment name is not set in the environment variables.")


model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=str(azure_openai_settings.endpoint),
    api_key=azure_openai_settings.api_key.get_secret_value(),
    api_version=azure_openai_settings.api_version,
    azure_deployment=azure_openai_settings.chat_deployment_name,
)

intent_resolution = IntentResolutionEvaluator(model_config=model_config)
tool_call_accuracy = ToolCallAccuracyEvaluator(model_config=model_config)
task_adherence = TaskAdherenceEvaluator(model_config=model_config)

### Run Evaluator

In [None]:
from azure.ai.evaluation import evaluate

response = evaluate(
    data=file_name,
    evaluators={
        "tool_call_accuracy": tool_call_accuracy,
        "intent_resolution": intent_resolution,
        "task_adherence": task_adherence,
    },
    azure_ai_project=ai_agent_settings.endpoint,
)
pprint(f"AI Foundry URL: {response.get('studio_url')}")
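
The result can also be summarized locally before opening the portal. The sketch below assumes the returned object behaves like a dictionary with `metrics` and `rows` entries alongside the `studio_url` used above; that shape is an assumption to verify against your installed SDK version. `summarize_result` is a hypothetical helper, and `sample_result` contains made-up values.

```python
def summarize_result(result):
    """Build printable summary lines from an evaluation result mapping."""
    metrics = result.get("metrics") or {}
    lines = [f"{name}: {value}" for name, value in sorted(metrics.items())]
    lines.append(f"rows: {len(result.get('rows') or [])}")
    lines.append(f"studio_url: {result.get('studio_url') or 'not available'}")
    return lines


# Made-up result for illustration; real metric names and values will differ.
sample_result = {
    "metrics": {"example_metric_b": 0.5, "example_metric_a": 4.0},
    "rows": [{}, {}],
    "studio_url": None,
}

for line in summarize_result(sample_result):
    print(line)
```

In the notebook, the same helper could be called with the `response` returned by `evaluate`.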

## Inspect results on Azure AI Foundry

Open the AI Foundry URL printed above to view the evaluation results in Azure AI Foundry. The visualizations show the scores and the reasoning behind them, which helps you quickly identify and fix issues with your agent.