# Quality Evaluators with the Azure AI Evaluation SDK
The following sample shows the basics of evaluating a generative AI application in your local development environment with the Azure AI Evaluation SDK.

> ✨ ***Note*** <br>
> Before you get started, review the reference documentation: https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/evaluate-sdk

## Prerequisites
Configure a Python virtual environment with Python 3.10 or later (these steps use VS Code with the Python extension):
 1. Open the Command Palette (Ctrl+Shift+P).
 1. Search for Python: Create Environment.
 1. Select Venv or Conda and choose where to create the new environment.
 1. Select the Python interpreter version: 3.10 or later.

To install the required packages, run:

```bash
pip install -r requirements.txt
```
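
Before running the notebook, you may want to confirm that the interpreter and packages are in place. The sketch below uses hypothetical helpers that are not part of the SDK, and the package list is an assumption inferred from the imports used later in this notebook rather than read from `requirements.txt`, so adjust it to match that file.

```python
import sys
from importlib import metadata
from typing import Iterable, List, Tuple

# Assumed package list, inferred from this notebook's imports (not read from requirements.txt)
REQUIRED_PACKAGES = (
    "azure-ai-evaluation",
    "azure-ai-projects",
    "azure-identity",
    "openai",
    "python-dotenv",
    "pandas",
)


def python_version_ok(min_version: Tuple[int, int] = (3, 10)) -> bool:
    """Return True if the running interpreter is at least min_version."""
    return sys.version_info[:2] >= min_version


def missing_packages(names: Iterable[str] = REQUIRED_PACKAGES) -> List[str]:
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python >= 3.10:", python_version_ok())
    print("Missing packages:", missing_packages() or "none")
```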

## Set up your environment
Clone the repository to your local machine.

```bash
git clone https://github.com/hyogrin/Azure_OpenAI_samples.git
```

Create a `.env` file based on the `.env-sample` file, copy it to the folder that contains this notebook, and update the variables with your own values.
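
A missing variable in `.env` otherwise surfaces later as a `None` value passed to the SDK. As a sketch, the hypothetical helper below (not part of the SDK) reports which of the variable names read by this notebook's setup cell are unset or empty. It assumes `.env` has already been loaded into the process environment, for example with `load_dotenv()`.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variable names read by this notebook's setup cell
REQUIRED_ENV_VARS = (
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_RESOURCE_GROUP_NAME",
    "AZURE_PROJECT_NAME",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
)


def missing_env_vars(
    names: Iterable[str] = REQUIRED_ENV_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    print("Missing environment variables:", missing_env_vars() or "none")
```

Raising an error when the list is non-empty, right after `load_dotenv()`, makes configuration mistakes fail fast instead of deep inside an evaluator call.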

## 🔨 Current Support and Limitations (as of 2025-01-12) 
- Check the region support for the Azure AI Evaluation SDK. https://learn.microsoft.com/en-us/azure/ai-studio/concepts/evaluation-metrics-built-in?tabs=warning#region-support

| Region | Hate and Unfairness, Sexual, Violent, Self-Harm, XPIA | Groundedness Pro | Protected Material |
| - | - | - | - |
| UK South | Will be deprecated 12/1/24 | no | no |
| East US 2 | yes | yes | yes |
| Sweden Central | yes | yes | no |
| North Central US | yes | no | no |
| France Central | yes | no | no |
| Switzerland West | yes | no | no |
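
If you deploy to several regions, the table above can be encoded as a lookup so a notebook fails fast on an unsupported region. This is a sketch under stated assumptions: the data is copied from the table (as of 2025-01-12) and will go stale, so re-check the linked region-support page; the capability keys and the `supports` helper are illustrative names, and treating UK South as unsupported for safety metrics is an assumption based on its "Will be deprecated 12/1/24" entry.

```python
from typing import Dict

# Illustrative capability keys, one per column of the region-support table
SAFETY = "content_safety"  # Hate and Unfairness, Sexual, Violent, Self-Harm, XPIA
GROUNDEDNESS_PRO = "groundedness_pro"
PROTECTED_MATERIAL = "protected_material"

# Copied from the table (as of 2025-01-12); keys are region names without spaces, lowercased
REGION_SUPPORT: Dict[str, Dict[str, bool]] = {
    # Listed as "Will be deprecated 12/1/24"; treated as unsupported here (assumption)
    "uksouth": {SAFETY: False, GROUNDEDNESS_PRO: False, PROTECTED_MATERIAL: False},
    "eastus2": {SAFETY: True, GROUNDEDNESS_PRO: True, PROTECTED_MATERIAL: True},
    "swedencentral": {SAFETY: True, GROUNDEDNESS_PRO: True, PROTECTED_MATERIAL: False},
    "northcentralus": {SAFETY: True, GROUNDEDNESS_PRO: False, PROTECTED_MATERIAL: False},
    "francecentral": {SAFETY: True, GROUNDEDNESS_PRO: False, PROTECTED_MATERIAL: False},
    "switzerlandwest": {SAFETY: True, GROUNDEDNESS_PRO: False, PROTECTED_MATERIAL: False},
}


def normalize_region(region: str) -> str:
    """Turn display names such as 'East US 2' into 'eastus2'."""
    return region.replace(" ", "").lower()


def supports(region: str, capability: str) -> bool:
    """Return True if the region supports the capability; unknown regions return False."""
    return REGION_SUPPORT.get(normalize_region(region), {}).get(capability, False)
```

Unknown regions deliberately return `False`, so a region that is missing from the table is treated as unsupported rather than silently accepted.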

In [None]:
import pandas as pd
import os
import json

from pprint import pprint
from azure.ai.evaluation import evaluate
from azure.ai.evaluation import RelevanceEvaluator
from azure.ai.evaluation import GroundednessEvaluator, GroundednessProEvaluator
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import (
    Evaluation,
    Dataset,
    EvaluatorConfiguration,
    ConnectionType,
    EvaluationSchedule,
    RecurrenceTrigger,
    ApplicationInsightsConfiguration
)
import pathlib

from azure.ai.evaluation import (
    ContentSafetyEvaluator,
    IndirectAttackEvaluator,
)
from azure.ai.evaluation.simulator import (
    AdversarialSimulator,
    AdversarialScenario,
    AdversarialScenarioJailbreak,
    IndirectAttackSimulator,
)
from openai import AzureOpenAI
from model_endpoint import ModelEndpoint

load_dotenv(override=True)

In [5]:
# DefaultAzureCredential tries several sign-in methods in turn (environment variables, managed identity, Azure CLI, and others)
credential = DefaultAzureCredential()

# Initialize the Azure AI project and Azure OpenAI connection settings from your environment variables
azure_ai_project = {
    "subscription_id": os.environ.get("AZURE_SUBSCRIPTION_ID"),
    "resource_group_name": os.environ.get("AZURE_RESOURCE_GROUP_NAME"),
    "project_name": os.environ.get("AZURE_PROJECT_NAME"),
}

azure_openai_deployment = os.environ.get("AZURE_OPENAI_DEPLOYMENT_NAME")
azure_openai_endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT")
azure_openai_key = os.environ.get("AZURE_OPENAI_API_KEY")  # no trailing comma, which would turn the key into a tuple
azure_openai_api_version = os.environ.get("AZURE_OPENAI_API_VERSION")

In [6]:
query_response = dict(
    query="Which tent is the most waterproof?",
    context="The Alpine Explorer Tent is the most water-proof of all tents available.",
    response="The Alpine Explorer Tent is the most waterproof."
)

# Multi-turn conversation; the "context" field on assistant turns holds the grounding text
conversation = {"messages": [
    {"content": "Which tent is the most waterproof?", "role": "user"},
    {"content": "The Alpine Explorer Tent is the most waterproof", "role": "assistant",
     "context": "From our product list, the Alpine Explorer Tent is the most waterproof. The Adventure Dining Table has higher weight."},
    {"content": "How much does it cost?", "role": "user"},
    {"content": "$120.", "role": "assistant", "context": "The Alpine Explorer Tent is $120."},
]}

## 🧪 AI-assisted content safety evaluator
- Combines the individual safety evaluators (ViolenceEvaluator, SexualEvaluator, SelfHarmEvaluator, and HateUnfairnessEvaluator) into a single output with the combined metrics for a query and response pair or for a multi-turn conversation.
- Direct attack (jailbreak) evaluation is a comparative measurement that uses the content safety evaluators as a control.
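
The combined output is a flat dictionary with one set of keys per harm category. As a minimal sketch, assuming the per-category scores are returned under keys such as `violence_score` on a 0 to 7 severity scale (confirm the exact key names against the printed result for your SDK version), the hypothetical helpers below compute a defect rate over many results and the difference between a control run and a jailbreak-injected run, which is the kind of comparative measurement described above. The defect threshold of 4 is also an assumption.

```python
from typing import Iterable, Mapping

# Assumed key prefixes for the four harm categories combined by ContentSafetyEvaluator
HARM_CATEGORIES = ("violence", "sexual", "self_harm", "hate_unfairness")


def is_defect(result: Mapping[str, object], threshold: float = 4) -> bool:
    """Return True if any '<category>_score' value is at or above the threshold."""
    for category in HARM_CATEGORIES:
        score = result.get(f"{category}_score")
        if isinstance(score, (int, float)) and score >= threshold:
            return True
    return False


def defect_rate(results: Iterable[Mapping[str, object]], threshold: float = 4) -> float:
    """Fraction of results flagged as defects (0.0 for an empty input)."""
    results = list(results)
    if not results:
        return 0.0
    return sum(is_defect(r, threshold) for r in results) / len(results)


def jailbreak_delta(
    control: Iterable[Mapping[str, object]],
    attacked: Iterable[Mapping[str, object]],
    threshold: float = 4,
) -> float:
    """Defect rate of the jailbreak-injected run minus the control defect rate."""
    return defect_rate(attacked, threshold) - defect_rate(control, threshold)
```

A positive delta suggests the jailbreak injection increased harmful output compared with the control run.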

In [None]:
contentsafety_eval = ContentSafetyEvaluator(credential=credential, azure_ai_project=azure_ai_project)

query_response = dict(
    query="Which tent is the most waterproof?",
    context="The Alpine Explorer Tent is the most water-proof of all tents available.",
    response="The Alpine Explorer Tent is the most waterproof."
)

contentsafety_score = contentsafety_eval(
    **query_response
)
print(contentsafety_score)

contentsafety_conv_score = contentsafety_eval(conversation=conversation)
print(contentsafety_conv_score)

## 🧪 AI-assisted indirect attack evaluator
- Evaluates a query and response pair or a multi-turn conversation for indirect attacks and explains its reasoning.
- Indirect attacks, also known as cross-domain prompt injection attacks (XPIA), occur when jailbreak instructions are injected into the context of a document or other source, which can cause the model to behave in altered, unexpected ways.
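
The evaluator reports whether an indirect attack was detected and, if so, which kind. As a sketch, assuming the result contains an overall `xpia_label` plus one key per attack category (`xpia_manipulated_content`, `xpia_intrusion`, `xpia_information_gathering`), which you should confirm against the printed `indirect_attack_score` for your SDK version, the hypothetical helpers below list the categories that fired. They accept either booleans or `"true"`/`"false"` strings, since the value type is also an assumption.

```python
from typing import List, Mapping

# Assumed per-category keys in the IndirectAttackEvaluator result ("xpia_<category>")
XPIA_CATEGORIES = ("manipulated_content", "intrusion", "information_gathering")


def _as_bool(value: object) -> bool:
    """Interpret booleans, None, or 'true'/'false' strings as a boolean."""
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return bool(value)


def triggered_xpia_categories(result: Mapping[str, object]) -> List[str]:
    """Return the XPIA categories flagged in an evaluator result."""
    return [c for c in XPIA_CATEGORIES if _as_bool(result.get(f"xpia_{c}", False))]


def is_indirect_attack(result: Mapping[str, object]) -> bool:
    """True if the overall label or any individual category is flagged."""
    return _as_bool(result.get("xpia_label", False)) or bool(triggered_xpia_categories(result))
```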


In [None]:
indirect_attack_eval = IndirectAttackEvaluator(credential=credential, azure_ai_project=azure_ai_project)

query_response = dict(
    query="Which tent is the most waterproof?",
    context="The Alpine Explorer Tent is the most water-proof of all tents available.",
    response="The Alpine Explorer Tent is the most waterproof."
)

indirect_attack_score = indirect_attack_eval(
    **query_response
)
print(indirect_attack_score)

indirect_attack_conv_score = indirect_attack_eval(conversation=conversation)
print(indirect_attack_conv_score)

## 🧪 Use AdversarialSimulator to generate adversarial content
- Test that your generative AI application does not generate protected material (for example, copyrighted content). The following example pairs an AdversarialSimulator with the protected material scenario to prompt your model to respond with material that is protected by intellectual property laws.
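
The simulator drives your application through an async `target` callback: it passes a dictionary with a `messages` list and expects the same shape back with the assistant reply appended, as the `callback` function below does. Before spending simulator calls, you can exercise that contract offline. In this sketch, `echo_callback` is a hypothetical stand-in for your application and `run_one_turn` is a hypothetical driver that imitates a single turn; neither calls Azure, and this is not how the simulator is implemented internally.

```python
import asyncio
from typing import Any, Callable, Dict, List, Optional


async def echo_callback(
    messages: Dict[str, List[Dict]],
    stream: bool = False,
    session_state: Any = None,
    context: Optional[Dict[str, Any]] = None,
) -> dict:
    """Hypothetical stand-in target with the same signature as the notebook's callback."""
    query = messages["messages"][-1]["content"]
    # Append the reply in the same shape the notebook's callback uses
    messages["messages"].append({"content": f"echo: {query}", "role": "assistant", "context": ""})
    return {"messages": messages["messages"], "stream": stream, "session_state": session_state, "context": context}


async def run_one_turn(target: Callable, user_prompt: str) -> dict:
    """Imitate one turn: send a single user message, then check the reply's shape."""
    result = await target({"messages": [{"content": user_prompt, "role": "user"}]})
    last = result["messages"][-1]
    if last.get("role") != "assistant" or not isinstance(last.get("content"), str):
        raise ValueError("target must append an assistant message with string content")
    return result


def run_one_turn_sync(target: Callable, user_prompt: str) -> dict:
    """Script-only wrapper; inside Jupyter, use `await run_one_turn(...)` instead."""
    return asyncio.run(run_one_turn(target, user_prompt))
```

Once the notebook's real `callback` is defined, `await run_one_turn(callback, "Which tent is the most waterproof?")` in a cell checks it the same way, at the cost of one real call to your Azure OpenAI deployment.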

In [9]:
from typing import List, Dict, Optional, Any

def call_to_your_ai_application(query: str) -> str:
    # Replace this with the logic that calls your own application.
    # In production code, wrap the call in try/except to handle API errors.
    # Prefer the deployment name loaded in the setup cell; fall back to AZURE_OPENAI_DEPLOYMENT
    deployment = azure_openai_deployment or os.environ.get("AZURE_OPENAI_DEPLOYMENT")
    # Get a client handle for the model
    client = AzureOpenAI(
        azure_endpoint=azure_openai_endpoint,
        api_version=azure_openai_api_version,
        api_key=azure_openai_key,
    )
    completion = client.chat.completions.create(
        model=deployment,
        messages=[
            {
                "role": "user",
                "content": query,
            }
        ],
        max_tokens=800,
        temperature=0.7,
        top_p=0.95,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None,
        stream=False,
    )
    # Return the text of the first choice; change this to return the response from your application
    return completion.choices[0].message.content


async def callback(
    messages: Dict[str, List[Dict]],
    stream: bool = False,
    session_state: Any = None,
    context: Optional[Dict[str, Any]] = None,
) -> dict:
    messages_list = messages["messages"]
    # Get the last message from the user
    latest_message = messages_list[-1]
    query = latest_message["content"]

    # Call the model
    response = call_to_your_ai_application(query)

    formatted_response = {
        "content": response,
        "role": "assistant",
        "context": "",
    }
    
    messages["messages"].append(formatted_response)
    return {"messages": messages["messages"], "stream": stream, "session_state": session_state, "context": context}


In [None]:
# initialize the adversarial simulator
protected_material_simulator = AdversarialSimulator(azure_ai_project=azure_ai_project, credential=credential)

outputs = await protected_material_simulator(
    scenario=AdversarialScenario.ADVERSARIAL_CONTENT_PROTECTED_MATERIAL,
    max_conversation_turns=1,  # define the number of conversation turns
    max_simulation_results=3,  # define the number of simulation results
    target=callback,  # define the target model callback
)

In [None]:
# Manually convert the data to JSON lines format
result = "\n".join([json.dumps(item) for item in outputs])
print(result)
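
Simulator outputs are multi-turn conversations, while evaluators such as `ContentSafetyEvaluator` can also score flat query/response pairs. The sketch below uses hypothetical helper names and assumes each output item has the `{"messages": [...]}` shape printed above. It pairs each user message with the assistant reply that immediately follows it and writes the pairs to a JSON Lines file; a JSONL file like this can be passed as the `data` argument of `evaluate()`.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def to_query_response_pairs(conversations: Iterable[Dict]) -> List[Dict[str, str]]:
    """Pair each user message with the assistant message that immediately follows it."""
    pairs = []
    for conversation in conversations:
        messages = conversation.get("messages", [])
        for current, following in zip(messages, messages[1:]):
            if current.get("role") == "user" and following.get("role") == "assistant":
                pairs.append({"query": current.get("content", ""), "response": following.get("content", "")})
    return pairs


def write_jsonl(records: Iterable[Dict], path: str) -> Path:
    """Write one JSON object per line and return the file path."""
    out = Path(path)
    with out.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return out
```

For example, `write_jsonl(to_query_response_pairs(outputs), "simulator_qr.jsonl")` saves the simulated turns in a form that maps directly onto the evaluators' `query` and `response` inputs.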

In [None]:
# initialize the adversarial simulator
qna_simulator = AdversarialSimulator(azure_ai_project=azure_ai_project, credential=credential)

outputs = await qna_simulator(
    scenario=AdversarialScenario.ADVERSARIAL_QA,
    max_conversation_turns=1,  # define the number of conversation turns
    max_simulation_results=3,  # define the number of simulation results
    target=callback,  # define the target model callback
)

In [None]:
# Manually convert the data to JSON lines format
result = "\n".join([json.dumps(item) for item in outputs])
print(result)