# Custom Evaluators with Azure AI Foundry

This notebook demonstrates how to evaluate data with built-in and custom evaluators and send the results to [Azure AI Foundry](https://learn.microsoft.com/en-us/azure/ai-studio/what-is-ai-studio).

### Prerequisites

- An Azure subscription.
- An Azure AI Foundry workspace.
- An Azure AI Foundry project.
- An Azure OpenAI resource.

### Install the required packages

```bash
pip install -r requirements.txt
```

### Create the following environment variables or add them to a `.env` file

```bash
AZURE_OPENAI_ENDPOINT=<your-azure-openai-endpoint>
AZURE_OPENAI_API_KEY=<your-azure-openai-api-key>
AZURE_OPENAI_DEPLOYMENT=<your-azure-openai-deployment>
AZURE_OPENAI_API_VERSION=<your-azure-openai-api-version>
AZURE_SUBSCRIPTION_ID=<your-azure-subscription-id>
AZURE_RESOURCE_GROUP=<your-azure-resource-group>
AZURE_AI_FOUNDRY_PROJECT=<your-azure-ai-foundry-project>
```
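
Before running the notebook, it can help to confirm that every variable listed above is actually set. The following is a minimal sketch that assumes the variable names from the list above; the helper name `missing_env_vars` is hypothetical and not part of any SDK.

```python
import os

# Variable names taken from the list above.
REQUIRED_ENV_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_RESOURCE_GROUP",
    "AZURE_AI_FOUNDRY_PROJECT",
]


def missing_env_vars(names=REQUIRED_ENV_VARS, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]
```

Calling `missing_env_vars()` after `load_dotenv()` returns an empty list when everything is configured, and otherwise the names that still need a value.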

### References

- [Azure AI Foundry](https://learn.microsoft.com/en-us/azure/ai-studio/what-is-ai-studio)
- [Evaluate your Generative AI application locally with the Azure AI Evaluation SDK](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/evaluate-sdk#evaluating-direct-and-indirect-attack-jailbreak-vulnerability)

In [1]:
!pip install -r requirements.txt



## Imports

In [2]:
# Import necessary libraries
import os
from dotenv import load_dotenv
from azure.identity import DefaultAzureCredential
from promptflow.core import AzureOpenAIModelConfiguration
from promptflow.tracing import start_trace

if "AZURE_OPENAI_API_KEY" not in os.environ:
    # Load environment variables from the .env file
    load_dotenv()

# Start a trace session and print a URL where the trace can be viewed
start_trace()

## Setup Credentials and Configuration

In [3]:
import os
from dotenv import load_dotenv

# Load any variables from the .env file that are not already set
load_dotenv()

# Azure AI Foundry project used by the service-backed evaluators and for logging results
azure_ai_project = {
    "subscription_id": os.getenv("AZURE_SUBSCRIPTION_ID"),
    "resource_group_name": os.getenv("AZURE_RESOURCE_GROUP"),
    "project_name": os.getenv("AZURE_AI_FOUNDRY_PROJECT"),
}

# Dictionary-style model configuration passed to the built-in evaluators
model_config = {
    "api_key": os.getenv("AZURE_OPENAI_API_KEY"),
    "azure_endpoint": os.getenv("AZURE_OPENAI_ENDPOINT"),
    "azure_deployment": os.getenv("AZURE_OPENAI_DEPLOYMENT"),
}

# promptflow model configuration passed to the custom FriendlinessEvaluator
configuration = AzureOpenAIModelConfiguration(
    azure_deployment=os.getenv("AZURE_OPENAI_DEPLOYMENT"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
)

credential = DefaultAzureCredential()

In [4]:
print(azure_ai_project)

{'subscription_id': '0debeb64-562c-44d8-9966-110450d5f9ed', 'resource_group_name': 'paramount-data-analytics', 'project_name': 'paramountda01'}
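
Printing configuration like this is useful for debugging, but dictionaries such as `model_config` also carry an API key. The following is a minimal sketch of a masking helper for safer printing. The names `mask_secrets` and `secret_markers` are hypothetical and not part of the Azure SDK, and the example values are placeholders.

```python
def mask_secrets(config, secret_markers=("key", "secret", "token", "password")):
    """Return a copy of config with values of secret-looking keys replaced by '***'."""
    masked = {}
    for name, value in config.items():
        # A key counts as secret when its name contains one of the markers.
        is_secret = any(marker in name.lower() for marker in secret_markers)
        masked[name] = "***" if is_secret and value else value
    return masked


# Placeholder values for illustration only.
example = {"api_key": "not-a-real-key", "azure_deployment": "gpt-4o"}
print(mask_secrets(example))
```

Matching on key names is a deliberately simple heuristic; it will miss secrets stored under unrelated names.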


## Groundedness Evaluator and Groundedness Pro Evaluator

In [5]:
from azure.ai.evaluation import GroundednessProEvaluator, GroundednessEvaluator

# Initialize the Groundedness and Groundedness Pro evaluators
groundedness_eval = GroundednessEvaluator(model_config)
groundedness_pro_eval = GroundednessProEvaluator(azure_ai_project=azure_ai_project, credential=credential)

query_response = dict(
    query="Which tent is the most waterproof?",
    context="The Alpine Explorer Tent is the second most water-proof of all tents available.",
    response="The Alpine Explorer Tent is the most waterproof."
)

# Run the Groundedness evaluator on a query, context, and response
groundedness_score = groundedness_eval(
    **query_response
)
print(groundedness_score)

groundedness_pro_score = groundedness_pro_eval(
    **query_response
)
print(groundedness_pro_score)



{'groundedness': 3.0, 'gpt_groundedness': 3.0, 'groundedness_reason': 'The response attempts to answer the query but contains incorrect information that contradicts the context.', 'groundedness_result': 'pass', 'groundedness_threshold': 3}
{'groundedness_pro_reason': '\'The Alpine Explorer Tent is the most waterproof.\' is ungrounded because "The Alpine Explorer Tent is the second most water-proof of all tents available." Thus, it is not the most waterproof. It\'s contradiction.', 'groundedness_pro_label': False, 'groundedness_pro_score': 1, 'groundedness_pro_threshold': 5, 'groundedness_pro_result': 'pass'}


## Friendliness Evaluator

In [6]:
from friendliness.friendliness import FriendlinessEvaluator

friendliness_eval = FriendlinessEvaluator(configuration)

friendliness_score = friendliness_eval(response="I will not apologize for my behavior!")
print(friendliness_score)

{'score': 1, 'reason': 'The response is defensive and lacks warmth or approachability.'}
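
The source of `FriendlinessEvaluator` is not shown in this notebook. As a rough illustration of the general shape of a custom evaluator, the sketch below defines a simple code-based one: a class whose `__call__` takes keyword inputs and returns a dictionary of metrics. The class name `ResponseLengthEvaluator` and its metric names are hypothetical, and the real `FriendlinessEvaluator` presumably prompts a model rather than counting words.

```python
class ResponseLengthEvaluator:
    """Hypothetical code-based evaluator that reports the length of a response."""

    def __init__(self, max_words=50):
        # Responses longer than max_words are flagged as exceeding the limit.
        self.max_words = max_words

    def __call__(self, *, response: str, **kwargs):
        # Extra keyword inputs (for example query or context) are accepted and ignored.
        word_count = len(response.split())
        return {
            "word_count": word_count,
            "within_limit": word_count <= self.max_words,
        }


length_eval = ResponseLengthEvaluator(max_words=5)
print(length_eval(response="I will not apologize for my behavior!"))
```

Because it is just a callable that returns a dictionary, an instance like this could in principle sit in the `evaluators` dictionary of `evaluate()` next to the built-in evaluators.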


## Evaluate with both built-in and custom evaluators
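
The `evaluate()` call in this section uses column mappings such as `${data.query}` and `${target.response}`: the `data.` prefix refers to a column of the input JSONL file, and the `target.` prefix refers to a field returned by the target. The sketch below illustrates that idea with plain dictionaries. `resolve_mapping` is a hypothetical helper for illustration only, not the SDK's implementation, and the sample row reuses the tent example from earlier in the notebook.

```python
import re

# Matches references of the form ${data.<column>} or ${target.<field>}.
_REFERENCE = re.compile(r"^\$\{(data|target)\.(\w+)\}$")


def resolve_mapping(column_mapping, data_row, target_output):
    """Build evaluator keyword arguments from one data row and one target output."""
    sources = {"data": data_row, "target": target_output}
    resolved = {}
    for argument, reference in column_mapping.items():
        match = _REFERENCE.match(reference)
        if match is None:
            raise ValueError(f"Unsupported reference: {reference!r}")
        source_name, field = match.groups()
        resolved[argument] = sources[source_name][field]
    return resolved


# Illustrative inputs based on the groundedness example above.
row = {
    "query": "Which tent is the most waterproof?",
    "context": "The Alpine Explorer Tent is the second most water-proof of all tents available.",
}
target_output = {"response": "The Alpine Explorer Tent is the most waterproof."}
mapping = {
    "response": "${target.response}",
    "context": "${data.context}",
    "query": "${data.query}",
}
print(resolve_mapping(mapping, row, target_output))
```

In this sketch, a reference to a column that the row does not contain raises a `KeyError`.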

In [None]:
import pathlib

from azure.ai.evaluation import evaluate
from azure.ai.evaluation import (
    ContentSafetyEvaluator,
    RelevanceEvaluator,
    CoherenceEvaluator,
    GroundednessEvaluator,
    FluencyEvaluator,
    SimilarityEvaluator,
)
from model_endpoint import ModelEndpoint


content_safety_evaluator = ContentSafetyEvaluator(
    azure_ai_project=azure_ai_project, credential=DefaultAzureCredential()
)
relevance_evaluator = RelevanceEvaluator(model_config)
coherence_evaluator = CoherenceEvaluator(model_config)
groundedness_evaluator = GroundednessEvaluator(model_config)
fluency_evaluator = FluencyEvaluator(model_config)
similarity_evaluator = SimilarityEvaluator(model_config)

# Printed for reference only; evaluate() below reads "./data/data.jsonl"
path = str(pathlib.Path.cwd()) + "/data.jsonl"

print(path)

results = evaluate(
    evaluation_name="Eval-Run-" + model_config["azure_deployment"].title(),
    data="./data/data.jsonl",
    target=ModelEndpoint(model_config),

    evaluators={
        "content_safety": content_safety_evaluator,
        "coherence": coherence_evaluator,
        "relevance": relevance_evaluator,
        "groundedness": groundedness_evaluator,
        "fluency": fluency_evaluator,
        "similarity": similarity_evaluator,
        "friendliness": friendliness_eval,  # custom evaluator
    },
    # Map each evaluator's inputs to data columns (${data.*}) and target outputs (${target.*})
    evaluator_config={
        "content_safety": {"column_mapping": {"query": "${data.query}", "response": "${target.response}"}},
        "coherence": {"column_mapping": {"response": "${target.response}", "query": "${data.query}"}},
        "relevance": {
            "column_mapping": {"response": "${target.response}", "context": "${data.context}", "query": "${data.query}"}
        },
        "groundedness": {
            "column_mapping": {
                "response": "${target.response}",
                "context": "${data.context}",
                "query": "${data.query}",
            }
        },
        "fluency": {
            "column_mapping": {"response": "${target.response}", "context": "${data.context}", "query": "${data.query}"}
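        },
        # NOTE: SimilarityEvaluator also expects a `ground_truth` input. This mapping does not
        # supply one, which is likely the cause of the UserError reported in the run log.
        "similarity": {
            "column_mapping": {"response": "${target.response}", "context": "${data.context}", "query": "${data.query}"}
        },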
        "friendliness": {
            "column_mapping": {"response": "${target.response}", "context": "${data.context}", "query": "${data.query}"
            }
        }
    },
    # Optionally provide your Azure AI project information to track the evaluation results in your Azure AI project
    azure_ai_project=azure_ai_project,
    # Optionally provide an output path for a JSON dump of the metric summary, row-level data, and the Azure AI project URL
    output_path="./results.jsonl",

)







c:\Users\akaruparti\Documents\genai-evals\genai-evals/data.jsonl
{'api_key': '<redacted>', 'azure_endpoint': 'https://paramountda2801552698.openai.azure.com/', 'azure_deployment': 'gpt-4o', 'type': 'azure_openai', 'api_version': '2024-02-15-preview'}


[2025-04-07 16:06:45 -0400][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_None_20250407_160636_765081, log path: C:\Users\akaruparti\.promptflow\.runs\azure_ai_evaluation_evaluators_None_20250407_160636_765081\logs.txt


2025-04-07 16:06:45 -0400   23208 execution.bulk     INFO     Current thread is not main thread, skip signal handler registration in BatchEngine.
2025-04-07 16:06:45 -0400   23208 execution.bulk     INFO     Current system's available memory is 1999.9921875MB, memory consumption of current process is 214.47265625MB, estimated available worker count is 1999.9921875/214.47265625 = 9
2025-04-07 16:06:45 -0400   23208 execution.bulk     INFO     Set process count to 4 by taking the minimum value among the factors of {'default_worker_count': 4, 'row_count': 4, 'estimated_worker_count_based_on_memory_usage': 9}.
2025-04-07 16:06:51 -0400   23208 execution.bulk     INFO     Process name(SpawnProcess-5)-Process id(36640)-Line number(0) start execution.
2025-04-07 16:06:51 -0400   23208 execution.bulk     INFO     Process name(SpawnProcess-4)-Process id(19316)-Line number(1) start execution.
2025-04-07 16:06:51 -0400   23208 execution.bulk     INFO     Process name(SpawnProcess-7)-Process id(23

[2025-04-07 16:14:54 -0400][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_similarity_20250407_161454_069071, log path: C:\Users\akaruparti\.promptflow\.runs\azure_ai_evaluation_evaluators_similarity_20250407_161454_069071\logs.txt
[2025-04-07 16:14:54 -0400][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_relevance_20250407_161454_064557, log path: C:\Users\akaruparti\.promptflow\.runs\azure_ai_evaluation_evaluators_relevance_20250407_161454_064557\logs.txt
[2025-04-07 16:14:54 -0400][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_groundedness_20250407_161454_058930, log path: C:\Users\akaruparti\.promptflow\.runs\azure_ai_evaluation_evaluators_groundedness_20250407_161454_058930\logs.txt
[2025-04-07 16:14:54 -0400][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run azure_ai_evaluation_evaluators_friendliness_202504

2025-04-07 16:14:54 -0400   23208 execution.bulk     INFO     Current thread is not main thread, skip signal handler registration in BatchEngine.
2025-04-07 16:14:55 -0400   23208 execution.bulk     INFO     Finished 4 / 4 lines.
2025-04-07 16:14:55 -0400   23208 execution.bulk     INFO     Average execution time for completed lines: 0.07 seconds. Estimated time for incomplete lines: 0.0 seconds.
2025-04-07 16:14:55 -0400   23208 execution          ERROR    4/4 flow run failed, indexes: [3,1,0,2], exception of index 3: (UserError) SimilarityEvaluator: Either 'conversation' or individual inputs must be provided.

Run name: "azure_ai_evaluation_evaluators_similarity_20250407_161454_069071"
Run status: "Completed"
Start time: "2025-04-07 16:14:54.241290-04:00"
Duration: "0:00:01.739604"
Output path: "C:\Users\akaruparti\.promptflow\.runs\azure_ai_evaluation_evaluators_similarity_20250407_161454_069071"

2025-04-07 16:14:56 -0400   23208 execution.bulk     INFO     Finished 1 / 4 lines.
20