# Batch evaluation with your own data 
The following sample shows the basic way to evaluate a Generative AI application in your development environment with the Azure AI evaluation SDK.

> ✨ ***Note*** <br>
> Please check the reference document before you get started - https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/evaluate-sdk

## 🔨 Current Support and Limitations (as of 2025-01-14) 
- Check the region support for the Azure AI Evaluation SDK. https://learn.microsoft.com/en-us/azure/ai-studio/concepts/evaluation-metrics-built-in?tabs=warning#region-support

### Region support for evaluations
| Region              | Hate and Unfairness, Sexual, Violent, Self-Harm, XPIA, ECI (Text) | Groundedness (Text) | Protected Material (Text) | Hate and Unfairness, Sexual, Violent, Self-Harm, Protected Material (Image) |
|---------------------|------------------------------------------------------------------|---------------------|----------------------------|----------------------------------------------------------------------------|
| North Central US    | no                                                               | no                  | no                         | yes                                                                        |
| East US 2           | yes                                                              | yes                 | yes                        | yes                                                                        |
| Sweden Central      | yes                                                              | yes                 | yes                        | yes                                                                        |
| US North Central    | yes                                                              | no                  | yes                        | yes                                                                        |
| France Central      | yes                                                              | yes                 | yes                        | yes                                                                        |
| Switzerland West    | yes                                                              | no                  | no                         | yes                                                                        |

### Region support for adversarial simulation
| Region            | Adversarial Simulation (Text) | Adversarial Simulation (Image) |
|-------------------|-------------------------------|---------------------------------|
| UK South          | yes                           | no                              |
| East US 2         | yes                           | yes                             |
| Sweden Central    | yes                           | yes                             |
| US North Central  | yes                           | yes                             |
| France Central    | yes                           | no                              |


## ✔️ Pricing and billing
- Effective 1/14/2025, Azure AI Safety Evaluations are no longer free in public preview. They are billed based on consumption as follows:

| Service Name              | Safety Evaluations       | Price Per 1K Tokens (USD) |
|---------------------------|--------------------------|---------------------------|
| Azure Machine Learning    | Input pricing for 3P     | $0.02                     |
| Azure Machine Learning    | Output pricing for 3P    | $0.06                     |
| Azure Machine Learning    | Input pricing for 1P     | $0.012                    |
| Azure Machine Learning    | Output pricing for 1P    | $0.012                    |
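
As a rough illustration of consumption-based billing, the sketch below multiplies token counts by the per-1K prices from the table above. The function name, the `tier` argument, and the example token counts are assumptions made for this sketch; the `1P` and `3P` labels are copied from the table as-is. Treat the result as an estimate, not a billing statement.

```python
# Prices per 1K tokens (USD), copied from the pricing table above.
PRICE_PER_1K_TOKENS = {
    ("3P", "input"): 0.02,
    ("3P", "output"): 0.06,
    ("1P", "input"): 0.012,
    ("1P", "output"): 0.012,
}


def estimate_safety_eval_cost(input_tokens, output_tokens, tier="3P"):
    """Return an estimated cost in USD (hypothetical helper, not an SDK API)."""
    input_cost = input_tokens / 1000 * PRICE_PER_1K_TOKENS[(tier, "input")]
    output_cost = output_tokens / 1000 * PRICE_PER_1K_TOKENS[(tier, "output")]
    return round(input_cost + output_cost, 6)


# Hypothetical workload: 50,000 input tokens and 10,000 output tokens.
print(estimate_safety_eval_cost(50_000, 10_000, tier="3P"))
```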


In [26]:
import json
import os
import pathlib
from pprint import pprint

import pandas as pd
from azure.ai.evaluation import (
    CoherenceEvaluator,
    ContentSafetyEvaluator,
    F1ScoreEvaluator,
    FluencyEvaluator,
    GroundednessEvaluator,
    GroundednessProEvaluator,
    RelevanceEvaluator,
    RetrievalEvaluator,
    SimilarityEvaluator,
    evaluate,
)
from azure.ai.ml import MLClient
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import (
    Evaluation,
    EvaluatorConfiguration,
    InputData,
)
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv

load_dotenv(override=True)

True

## 🚀 Run evaluators locally and upload the results to the cloud (azure.ai.evaluation.evaluate)

In [46]:
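credential = DefaultAzureCredential()

azure_ai_project_endpoint = os.environ.get("AZURE_AI_PROJECT_ENDPOINT")
project_resource_id = os.environ.get("AZURE_AI_PROJECT_RESOURCE_ID")
if not azure_ai_project_endpoint or not project_resource_id:
    raise ValueError(
        "AZURE_AI_PROJECT_ENDPOINT and AZURE_AI_PROJECT_RESOURCE_ID must be set."
    )

# Resource ID layout: /subscriptions/<id>/resourceGroups/<name>/...
resource_id_parts = project_resource_id.split("/")
subscription_id = resource_id_parts[2]
resource_group_name = resource_id_parts[4]
# Assumes an endpoint of the form https://<host>/api/projects/<project-name>
project_name = azure_ai_project_endpoint.split("/")[5]

azure_ai_project_dict = {
    "subscription_id": subscription_id,
    "resource_group_name": resource_group_name,
    "project_name": project_name,
}

# Reuse the credential and endpoint defined above
azure_ai_project_client = AIProjectClient(
    credential=credential,
    endpoint=azure_ai_project_endpoint,
)
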
model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "api_key": os.environ.get("AZURE_OPENAI_API_KEY"),
    "azure_deployment": os.environ.get("AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"),
    "api_version": os.environ.get("AZURE_OPENAI_API_VERSION"),
    "type": "azure_openai",
}
azure_ai_project_dict

{'subscription_id': '3d4d3dd0-79d4-40cf-a94e-b4154812c6ca',
 'resource_group_name': 'rg-hyokeunchoi-7652',
 'project_name': 'hyo-ai-foundry-pjt1'}
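
The index-based `split("/")` in the cell above depends on the exact position of each segment. A slightly more defensive variant, sketched below, looks segments up by name instead. The helper name `parse_project_resource_id` and the example resource ID are hypothetical and only illustrate the idea; adapt the expected segments to the resource ID your project actually uses.

```python
def parse_project_resource_id(resource_id):
    """Extract subscription and resource group from an ARM-style resource ID.

    Hypothetical helper for illustration; it is not part of any Azure SDK.
    """
    if not resource_id:
        raise ValueError("AZURE_AI_PROJECT_RESOURCE_ID must be set.")
    parts = [p for p in resource_id.split("/") if p]
    # Pair each segment name with the value that follows it.
    segments = {parts[i].lower(): parts[i + 1] for i in range(0, len(parts) - 1, 2)}
    try:
        return {
            "subscription_id": segments["subscriptions"],
            "resource_group_name": segments["resourcegroups"],
        }
    except KeyError as exc:
        raise ValueError(f"Unexpected resource ID format: missing {exc}") from exc


# Made-up resource ID used only to demonstrate the parsing.
example_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/my-rg/providers/Microsoft.CognitiveServices"
    "/accounts/my-account/projects/my-project"
)
print(parse_project_resource_id(example_id))
```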

## 🚀 Generate response dataset with Azure OpenAI
- Use your models to generate answers based on the query data set. These response records serve as the seed for creating assessments. By customizing your prompts, you can produce text tailored to your domain.
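
One way to customize the prompt is to prepend a domain-specific system message to each query. The sketch below is a minimal example under that assumption; `build_messages` and the sample system prompt are hypothetical names introduced here, not part of the notebook or the SDK.

```python
# Hypothetical domain prompt; replace it with instructions for your own service.
CONTOSO_SYSTEM_PROMPT = (
    "You are a support assistant for Contoso products. Answer concisely."
)


def build_messages(query, system_prompt=None):
    """Build a chat message list, optionally prefixed with a system prompt."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": query})
    return messages


print(build_messages("How can I reset my Contoso device?", CONTOSO_SYSTEM_PROMPT))
```

The returned list has the same shape as the `messages` argument passed to `client.chat.completions.create` in the response-generation cell of this notebook.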

In [4]:
from openai import AzureOpenAI

aoai_api_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
aoai_api_key = os.getenv("AZURE_OPENAI_API_KEY")
aoai_api_version = os.getenv("AZURE_OPENAI_API_VERSION")
aoai_deployment_name = os.getenv("AZURE_OPENAI_CHAT_DEPLOYMENT_NAME")

try:
    client = AzureOpenAI(
        azure_endpoint=aoai_api_endpoint,
        api_key=aoai_api_key,
        api_version=aoai_api_version,
    )

    print("=== Initialized AzureOpenAI client ===")
    print(f"AZURE_OPENAI_ENDPOINT={aoai_api_endpoint}")
    print(f"AZURE_OPENAI_API_VERSION={aoai_api_version}")
    print(f"AZURE_OPENAI_DEPLOYMENT_NAME={aoai_deployment_name}")

except (ValueError, TypeError) as e:
    print(e)

=== Initialized AzureOpenAI client ===
AZURE_OPENAI_ENDPOINT=https://aoai-services1.openai.azure.com/
AZURE_OPENAI_API_VERSION=2025-03-01-preview
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o-mini


## Case1. When you have all the data (Query + Response + Ground Truth)
- If you have all the data, you can evaluate the model or service with the following steps.
- This assumes you have already uploaded the data file (CSV) to Azure Blob Storage.

In [5]:
from azure.storage.blob import BlobServiceClient


def upload_evaldata_to_blob(save_path, blob_name, container_name="eval-container"):
    """Upload data/<blob_name> to Blob Storage, then download it to save_path."""
    # Retrieve the storage connection string from environment variables
    blob_conn_str = os.getenv("AZURE_STORAGE_BLOB_CONNECTION_STRING")
    if not blob_conn_str:
        raise ValueError("AZURE_STORAGE_BLOB_CONNECTION_STRING must be set.")

    blob_service_client = BlobServiceClient.from_connection_string(blob_conn_str)

    # Create the container on first use
    container_client = blob_service_client.get_container_client(container_name)
    if not container_client.exists():
        container_client.create_container()

    # Upload the local file, using the file name as the blob name
    with open(f"data/{blob_name}", "rb") as data:
        container_client.upload_blob(blob_name, data, overwrite=True)
    blob_client = container_client.get_blob_client(blob_name)

    # Download the CSV from Azure Blob Storage and save it locally
    with open(save_path, "wb") as f:
        blob_data = blob_client.download_blob()
        blob_data.readinto(f)
    print(f"Downloaded blob data to {save_path}")

In [6]:
blob_name = "eval_all_data.csv"
container_name = "eval-container"
save_path = "case1_temp_data.csv"
upload_evaldata_to_blob(save_path, blob_name, container_name)
df = pd.read_csv(save_path)
df.head(5)

Downloaded blob data to case1_temp_data.csv


| Unnamed: 0 | query | context | response | ground_truth |
|------------|-------|---------|----------|--------------|
| 0 | What is the warranty period for Contoso products? | Customer service inquiry | The warranty period for Contoso products typic... | The warranty period for Contoso products is on... |
| 1 | How can I reset my Contoso device? | Technical support request | To reset your Contoso device, press and hold t... | Press and hold the power button for ten second... |
| 2 | Where can I find the user manual for my Contos... | Product documentation inquiry | You can find the user manual for your Contoso ... | The user manual is available on the Contoso we... |
| 3 | What should I do if my Contoso device won't tu... | Troubleshooting issue | If your Contoso device won't turn on, try char... | Charge the device for 30 minutes if it won't t... |
| 4 | How do I contact Contoso customer support? | Customer service inquiry | You can contact Contoso customer support by ca... | Contact customer support via phone or live cha... |


In [7]:
# Create a jsonl file from the query data
outname = "case1_all_data.jsonl"

outdir = "./data"
if not os.path.exists(outdir):
    os.mkdir(outdir)

input_path = os.path.join(outdir, outname)
df.to_json(input_path, orient="records", lines=True, force_ascii=False)
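
Before evaluating, it can help to confirm that every JSONL record actually carries the fields the evaluators will read. The sketch below is one hedged way to do that with the standard library only; `find_incomplete_records`, `REQUIRED_KEYS`, and the two sample records are assumptions made for this example (the key names mirror the columns shown in the table above).

```python
import json
import os
import tempfile

# Column names taken from the sample data shown above.
REQUIRED_KEYS = ("query", "context", "response", "ground_truth")


def find_incomplete_records(jsonl_path, required_keys=REQUIRED_KEYS):
    """Return (line_number, missing_keys) for records lacking a required value."""
    problems = []
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            missing = sorted(k for k in required_keys if record.get(k) in (None, ""))
            if missing:
                problems.append((line_number, missing))
    return problems


# Demo with two hypothetical records; the second one has an empty response.
sample_records = [
    {"query": "q1", "context": "c1", "response": "r1", "ground_truth": "g1"},
    {"query": "q2", "context": "c2", "response": "", "ground_truth": "g2"},
]
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    for record in sample_records:
        tmp.write(json.dumps(record) + "\n")

problems = find_incomplete_records(tmp.name)
os.remove(tmp.name)
print(problems)
```

In the notebook, the same check could be run as `find_incomplete_records(input_path)` once the JSONL file has been written.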

## Case2. When you have only query and ground truth data (Query + Ground Truth)
- If you have only the queries and ground truth, you can generate the responses first and then evaluate the model or service with the following steps.
- This assumes you have already uploaded the data file (CSV) to Azure Blob Storage.

In [8]:
blob_name = "eval_all_data.csv"
container_name = "eval-container"
save_path = "case2_temp_data.csv"
upload_evaldata_to_blob(save_path, blob_name, container_name)
df = pd.read_csv(save_path)
df.head(5)

Downloaded blob data to case2_temp_data.csv


| Unnamed: 0 | query | context | response | ground_truth |
|------------|-------|---------|----------|--------------|
| 0 | What is the warranty period for Contoso products? | Customer service inquiry | The warranty period for Contoso products typic... | The warranty period for Contoso products is on... |
| 1 | How can I reset my Contoso device? | Technical support request | To reset your Contoso device, press and hold t... | Press and hold the power button for ten second... |
| 2 | Where can I find the user manual for my Contos... | Product documentation inquiry | You can find the user manual for your Contoso ... | The user manual is available on the Contoso we... |
| 3 | What should I do if my Contoso device won't tu... | Troubleshooting issue | If your Contoso device won't turn on, try char... | Charge the device for 30 minutes if it won't t... |
| 4 | How do I contact Contoso customer support? | Customer service inquiry | You can contact Contoso customer support by ca... | Contact customer support via phone or live cha... |


In [9]:
# Create a jsonl file from the query data
outname = "case2_temp_data.jsonl"

outdir = "./data"
if not os.path.exists(outdir):
    os.mkdir(outdir)

query_path = os.path.join(outdir, outname)
df.to_json(query_path, orient="records", lines=True, force_ascii=False)

In [10]:
import tqdm

# Final JSONL file with the generated responses added;
# it is used as the evaluation input.
input_path = "./data/case2_query_response_data.jsonl"

# Opening in "w" mode truncates any previous output before writing.
with open(query_path, "r", encoding="utf-8") as infile, open(
    input_path, "w", encoding="utf-8"
) as outfile:

    for idx, line in enumerate(infile):
        print(f"=== Processing line {idx} ===")
        data = json.loads(line)
        resp = client.chat.completions.create(
            model=aoai_deployment_name,
            messages=[{"role": "user", "content": data["query"]}],
            temperature=0,
        )

        response_text = resp.choices[0].message.content
        print(response_text)
        data["response"] = response_text

        outfile.write(json.dumps(data, ensure_ascii=False) + "\n")

=== Processing line 0 ===
The warranty period for Contoso products can vary depending on the specific product and its category. Typically, electronics might have a one-year warranty, while other products could have different terms. For the most accurate and detailed information, it's best to check the warranty policy provided with the product or visit the official Contoso website. If you have a specific product in mind, I can help you look for more detailed information!
=== Processing line 1 ===
To reset your Contoso device, the steps may vary depending on the type of device you have (e.g., smartphone, tablet, computer). Here are general instructions for resetting common types of devices:

### For Windows PC:
1. **Open Settings**: Click on the Start menu and select the gear icon to open Settings.
2. **Update & Security**: Click on "Update & Security."
3. **Recovery**: Select "Recovery" from the left sidebar.
4. **Reset this PC**: Under "Reset this PC," click on "Get started."
5. **Choo

## Evaluate the model / service with the prepared data
- Make sure you have already run Case 1 or Case 2 to prepare the input data.

In [11]:
# input_path is set by Case 1 or Case 2 above; make sure it points to the case you ran
output_path = "./data/cloud_evaluation_output.json"


# https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/flow-evaluate-sdk
retrieval_evaluator = RetrievalEvaluator(model_config)
fluency_evaluator = FluencyEvaluator(model_config)
groundedness_evaluator = GroundednessEvaluator(model_config)
relevance_evaluator = RelevanceEvaluator(model_config)
coherence_evaluator = CoherenceEvaluator(model_config)
similarity_evaluator = SimilarityEvaluator(model_config)

column_mapping = {
    "query": "${data.query}",
    "ground_truth": "${data.ground_truth}",
    "response": "${data.response}",
    "context": "${data.context}",
}
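
Each `${data.<column>}` expression in `column_mapping` points an evaluator input at a column of the input data. The sketch below illustrates that idea with a small stand-alone resolver. It is an assumption-laden illustration written for this document, not the SDK's actual implementation; `resolve_column_mapping` and the sample row are hypothetical.

```python
import re

# Matches expressions of the form ${data.<column>}.
_PLACEHOLDER = re.compile(r"^\$\{data\.([^}]+)\}$")


def resolve_column_mapping(column_mapping, row):
    """Map evaluator input names to values from one data row (illustrative only)."""
    resolved = {}
    for evaluator_input, expression in column_mapping.items():
        match = _PLACEHOLDER.match(expression)
        if match is None:
            raise ValueError(f"Unsupported mapping expression: {expression}")
        column = match.group(1)
        if column not in row:
            raise KeyError(f"Column '{column}' not found in data row")
        resolved[evaluator_input] = row[column]
    return resolved


# Hypothetical row shaped like one line of the JSONL input file.
sample_row = {
    "query": "How can I reset my Contoso device?",
    "context": "Technical support request",
    "response": "Press and hold the power button.",
    "ground_truth": "Press and hold the power button for ten seconds.",
}
sample_mapping = {
    "query": "${data.query}",
    "response": "${data.response}",
}
print(resolve_column_mapping(sample_mapping, sample_row))
```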

In [29]:
import datetime

result = evaluate(
    evaluation_name=f"evaluation_local_upload_cloud_{datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}",
    data=input_path,
    evaluators={
        "groundedness": groundedness_evaluator,
        "retrieval": retrieval_evaluator,
        "relevance": relevance_evaluator,
        "coherence": coherence_evaluator,
        "fluency": fluency_evaluator,
        "similarity": similarity_evaluator,
    },
    evaluator_config={
        "groundedness": {"column_mapping": column_mapping},
        "retrieval": {"column_mapping": column_mapping},
        "relevance": {"column_mapping": column_mapping},
        "coherence": {"column_mapping": column_mapping},
        "fluency": {"column_mapping": column_mapping},
        "similarity": {"column_mapping": column_mapping},
    },
    output_path=output_path,
)

[2025-06-20 09:31:56 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_groundedness_20250620_093155_936922, log path: /home/azureuser/.promptflow/.runs/azure_ai_evaluation_evaluators_groundedness_20250620_093155_936922/logs.txt
[2025-06-20 09:31:56 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_relevance_20250620_093155_963768, log path: /home/azureuser/.promptflow/.runs/azure_ai_evaluation_evaluators_relevance_20250620_093155_963768/logs.txt


[2025-06-20 09:31:56 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_retrieval_20250620_093155_947665, log path: /home/azureuser/.promptflow/.runs/azure_ai_evaluation_evaluators_retrieval_20250620_093155_947665/logs.txt
[2025-06-20 09:31:56 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_coherence_20250620_093155_975697, log path: /home/azureuser/.promptflow/.runs/azure_ai_evaluation_evaluators_coherence_20250620_093155_975697/logs.txt
[2025-06-20 09:31:56 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_fluency_20250620_093155_995709, log path: /home/azureuser/.promptflow/.runs/azure_ai_evaluation_evaluators_fluency_20250620_093155_995709/logs.txt
[2025-06-20 09:31:56 +0000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_similarity_20250620_093156_039147, log path

2025-06-20 09:32:00 +0000  493403 execution.bulk     INFO     Finished 2 / 20 lines.
2025-06-20 09:32:00 +0000  493403 execution.bulk     INFO     Average execution time for completed lines: 2.1 seconds. Estimated time for incomplete lines: 37.8 seconds.
2025-06-20 09:32:00 +0000  493403 execution.bulk     INFO     Finished 4 / 20 lines.
2025-06-20 09:32:00 +0000  493403 execution.bulk     INFO     Average execution time for completed lines: 1.1 seconds. Estimated time for incomplete lines: 17.6 seconds.
2025-06-20 09:32:00 +0000  493403 execution.bulk     INFO     Finished 6 / 20 lines.
2025-06-20 09:32:00 +0000  493403 execution.bulk     INFO     Average execution time for completed lines: 0.76 seconds. Estimated time for incomplete lines: 10.64 seconds.
2025-06-20 09:32:01 +0000  493403 execution.bulk     INFO     Finished 8 / 20 lines.
2025-06-20 09:32:01 +0000  493403 execution.bulk     INFO     Average execution time for completed lines: 0.58 seconds. Estimated time for incomplet