## Azure AI Evaluation Example

This notebook demonstrates how to configure and use the Azure AI Evaluation SDK to evaluate an agent with a critic agent. The setup covers loading environment variables, configuring the model, and running two kinds of evaluation: `evaluate`, with an explicit list of evaluators, and `auto_evaluate`, where the critic agent selects the evaluators for each thread.

- **Environment Setup:** Loads credentials and endpoints from a `.env` file.
- **Model Configuration:** Sets up the Azure OpenAI model configuration.
- **Critic Agent:** Instantiates a critic agent for evaluation tasks.
- **Evaluation:** Runs `evaluate` with an explicit evaluator list and `auto_evaluate`, where the critic agent picks evaluators per thread, against the specified agent and project.

Refer to the code cells below for implementation details.

In [1]:
import os
from azure.ai.evaluation import AzureOpenAIModelConfiguration, AzureAIProject
from azure.ai.evaluation._agents._critic_agent import CriticAgent
from dotenv import load_dotenv


[INFO] Could not import SKAgentConverter. Please install the dependency with `pip install semantic-kernel`.


In [2]:
load_dotenv("bingtool.env")

True
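`load_dotenv` returning `True` only means the file was found and parsed; it does not guarantee that every variable the later cells read is present. A minimal sketch of a pre-flight check follows. The helper `missing_env_vars` and the `REQUIRED_VARS` tuple are hypothetical names introduced here; the variable names themselves are the ones this notebook reads (`AZURE_OPENAI_API_VERSION` is left out because the notebook supplies a default for it).

```python
import os

# Environment variables read by the cells in this notebook.
REQUIRED_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT",
    "PROJECT_ENDPOINT",
)


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Report anything missing before building the model configuration.
missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this right after `load_dotenv` surfaces a misnamed or empty variable early, instead of as a `None` endpoint passed into the SDK.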

In [3]:
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    azure_deployment=os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
    api_version=os.environ.get("AZURE_OPENAI_API_VERSION", "2024-02-01")
)
# Create the critic agent that will run the evaluators with this model configuration.
critic_agent = CriticAgent(model_config=model_config)

Class CriticAgent: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class IntentResolutionEvaluator: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class ToolCallAccuracyEvaluator: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class TaskAdherenceEvaluator: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


In [4]:
azure_ai_project = {
    "azure_endpoint": os.environ.get("PROJECT_ENDPOINT"),
}


In [5]:
# Run a fixed set of evaluators against every thread fetched for this agent.
critic_agent.evaluate(
    agent_id="asst_8LTtd9HiRi1tABmOIergHYPX",
    azure_ai_project=azure_ai_project,
    evaluators=["IntentResolution", "TaskAdherence"],
)

Class AIAgentConverter: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class FDPAgentDataRetriever: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class AIAgentDataRetriever: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


Fetched 5 threads for agent asst_8LTtd9HiRi1tABmOIergHYPX.


[{'thread_id': 'thread_e7W3LwwfjMFRhKBp5fKuZway',
  'results': {'IntentResolution': {'intent_resolution': 5.0,
    'intent_resolution_result': 'pass',
    'intent_resolution_threshold': 3,
    'intent_resolution_reason': "The user asked for tools or features to monitor user activity in Microsoft Teams for business. The agent provided a comprehensive, accurate list of relevant tools and features, briefly describing each, fully resolving the user's intent with clear and actionable information."},
   'TaskAdherence': {'task_adherence': 3.0,
    'task_adherence_result': 'pass',
    'task_adherence_threshold': 3,
    'task_adherence_reason': 'The assistant provided a relevant, concise answer, included a free trial link, and offered a follow-up question. However, it failed to use the required bing_custom_search tool and did not cite URLs as mandated by the system, which is a significant rule violation.'}}},
 {'thread_id': 'thread_QGcquQg5WyxQNhDCsFFLO9p9',
  'results': {'IntentResolution': {
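In the first thread above, `TaskAdherence` passes with a score exactly equal to its threshold, while the reason text describes a significant rule violation. The sketch below flags such borderline passes for manual review. It assumes only the result shape visible in the output (a list of dicts with `thread_id` and `results`, where each evaluator reports `<metric>` and `<metric>_threshold`); `borderline_scores` and `sample` are hypothetical names, and `sample` is the first thread's output with the reason strings omitted.

```python
def borderline_scores(results, margin=0.0):
    """Return (thread_id, evaluator, score, threshold) for scores at or just above the threshold."""
    flagged = []
    for thread in results:
        for evaluator, metrics in thread.get("results", {}).items():
            for key, threshold in metrics.items():
                if not key.endswith("_threshold"):
                    continue
                # "task_adherence_threshold" -> "task_adherence"
                metric = key[: -len("_threshold")]
                score = metrics.get(metric)
                if score is None:
                    continue
                if 0 <= score - threshold <= margin:
                    flagged.append((thread.get("thread_id"), evaluator, score, threshold))
    return flagged


# First thread from the output above, with the reason strings left out.
sample = [
    {
        "thread_id": "thread_e7W3LwwfjMFRhKBp5fKuZway",
        "results": {
            "IntentResolution": {
                "intent_resolution": 5.0,
                "intent_resolution_result": "pass",
                "intent_resolution_threshold": 3,
            },
            "TaskAdherence": {
                "task_adherence": 3.0,
                "task_adherence_result": "pass",
                "task_adherence_threshold": 3,
            },
        },
    }
]

for row in borderline_scores(sample):
    print(row)
```

With the default `margin=0.0`, only scores that sit exactly on the threshold are reported; a larger margin widens the band.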

In [6]:

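import nest_asyncio

# Jupyter already runs an event loop; nest_asyncio lets asyncio code run inside it.
nest_asyncio.apply()

# No evaluators are passed here: the critic agent selects them for each thread.
critic_agent.auto_evaluate(
    agent_id="asst_8LTtd9HiRi1tABmOIergHYPX",
    azure_ai_project=azure_ai_project,
)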

Fetched 5 threads for agent asst_8LTtd9HiRi1tABmOIergHYPX.
Selected evaluators: ['IntentResolution'] for thread thread_e7W3LwwfjMFRhKBp5fKuZway
Selected evaluators: ['TaskAdherence', 'IntentResolution'] for thread thread_QGcquQg5WyxQNhDCsFFLO9p9
Selected evaluators: ['TaskAdherence', 'IntentResolution'] for thread thread_7aqCEIf45IBjjvBeCNV5CNSf
Selected evaluators: ['TaskAdherence', 'IntentResolution'] for thread thread_BptOsITtns2D9x5bCRWImlwO
Selected evaluators: ['TaskAdherence', 'IntentResolution'] for thread thread_HJbWlH063heMgE297bstenx6


[{'justification': "The agent was asked about tools or features to monitor user activity in Microsoft Teams for business solutions. The response provided a comprehensive list of relevant features and tools, explained their purposes, and offered to provide further assistance. Since there were no tool calls or specific instructions/constraints to follow, the primary evaluation goal is to assess whether the agent understood and resolved the user's intent holistically and helpfully.",
  'distinct_assessments': {'IntentResolution': "This evaluator will assess whether the agent correctly understood the user's request for monitoring tools/features in Microsoft Teams, provided a relevant and complete answer, and addressed any implicit needs for business solutions."},
  'thread_id': 'thread_e7W3LwwfjMFRhKBp5fKuZway',
  'results': {'IntentResolution': {'intent_resolution': 5.0,
    'intent_resolution_result': 'pass',
    'intent_resolution_threshold': 3,
    'intent_resolution_reason': "The user
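Because `auto_evaluate` may run a different set of evaluators on each thread, a per-evaluator summary is easier to read than the raw list. The following is a rough sketch, assuming the list returned by either method has the `results` layout shown in the outputs above, with each evaluator reporting a key that ends in `_result` and holds `'pass'` on success. `pass_rates` is a hypothetical helper, not part of the SDK.

```python
from collections import defaultdict


def pass_rates(results):
    """Map each evaluator name to the fraction of its verdicts equal to 'pass'."""
    counts = defaultdict(lambda: [0, 0])  # evaluator -> [passed, total]
    for thread in results:
        for evaluator, metrics in thread.get("results", {}).items():
            for key, value in metrics.items():
                if key.endswith("_result"):
                    counts[evaluator][1] += 1
                    if value == "pass":
                        counts[evaluator][0] += 1
    return {name: passed / total for name, (passed, total) in counts.items()}
```

Threads on which an evaluator was not selected simply do not contribute to that evaluator's total, so the rates stay comparable between `evaluate` and `auto_evaluate` runs.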