# Cloud evaluation: Evaluating AI app data remotely in the cloud 

## Objective

This tutorial provides a step-by-step guide on how to evaluate data generated by AI applications or LLMs remotely in the cloud. 

This tutorial uses the following Azure AI services:

- [Azure AI Safety Evaluation](https://aka.ms/azureaistudiosafetyeval)
- [azure-ai-evaluation](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/evaluate-sdk)

## Time

You should expect to spend 20 minutes running this sample. 

## About this example

This example demonstrates cloud evaluation of query and response pairs that were generated by an AI app or an LLM. You need access to Azure OpenAI credentials and an Azure AI project. **To create data to use in your own evaluation, see the [`Simulator` documentation](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/simulator-interaction-data).** This example demonstrates:

- Single-instance, triggered cloud evaluation on a test dataset (to be used for pre-deployment evaluation of an AI application).

## Before you begin
### Prerequisites
- Have an Azure OpenAI deployment of a GPT model that supports `chat completion`, for example `gpt-4`.
- Log in to your Azure subscription by running `az login`.
- Have test data that you want to evaluate, including the user queries and responses (and optionally context or ground truth) from your AI application. See [data requirements for our built-in evaluators](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/evaluate-sdk#data-requirements-for-built-in-evaluators). Alternatively, if you want to simulate data against your application endpoints using the Azure AI Evaluation SDK, see our samples on simulation.
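
As a rough illustration of the test data described above, the sketch below writes a small JSON Lines file with one query-response pair per line. The rows, the file name `sample_eval_data.jsonl`, and the helpers `write_jsonl` and `read_jsonl` are hypothetical; the column names (`query`, `response`, `context`, `ground_truth`) are an assumption based on the inputs mentioned above, so check the data requirements page for the evaluators you plan to run.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical rows; the column names are an assumption, not a fixed schema.
sample_rows = [
    {
        "query": "What is the capital of France?",
        "response": "Paris is the capital of France.",
        "context": "France is a country in Western Europe. Its capital is Paris.",
        "ground_truth": "Paris",
    },
    {
        "query": "Which planet is known as the Red Planet?",
        "response": "Mars is often called the Red Planet.",
        "context": "Mars appears red because of iron oxide on its surface.",
        "ground_truth": "Mars",
    },
]


def write_jsonl(rows, path):
    """Write one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dictionaries."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Written to a temporary directory so the sketch does not overwrite real data.
sample_path = Path(tempfile.mkdtemp()) / "sample_eval_data.jsonl"
write_jsonl(sample_rows, sample_path)
print(f"Wrote {len(read_jsonl(sample_path))} rows to {sample_path}")
```

Reading the file back right after writing it is a cheap way to confirm that every line is valid JSON before the data is used anywhere else.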

### Installation

Install the following packages required to execute this notebook. 

In [None]:
%pip install -U azure-identity
%pip install -U azure-ai-projects
%pip install -U azure-ai-evaluation

In [None]:
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.projects.models import (
    Evaluation,
    Dataset,
    EvaluatorConfiguration,
    ConnectionType,
)
from azure.ai.evaluation import F1ScoreEvaluator, ViolenceEvaluator

### Configuration

Set the following variables for use in this notebook:

In [None]:
# Connection string for your Azure AI project. At the moment, it should be in the format
# "<Region>.api.azureml.ms;<AzureSubscriptionId>;<ResourceGroup>;<HubName>", for example:
# eastus2.api.azureml.ms;xxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxxx;rg-sample;sample-project-eastus2
azure_ai_connection_string = "<your-connection-string>"
azure_openai_deployment = "<your-azure-openai-deployment>"  # Deployment on your AOAI resource; it must be an AOAI GPT model
azure_openai_api_version = "<your-azure-openai-api-version>"
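
Because the connection string packs four semicolon-separated fields into one value, a quick local check can catch a missing field or an unreplaced placeholder before any call to Azure is made. The helper below is a hypothetical sketch and not part of any Azure SDK; the field names in `ProjectConnection` are illustrative labels for the four positions described in the configuration comment.

```python
from typing import NamedTuple


class ProjectConnection(NamedTuple):
    """The four positional fields of the connection string (hypothetical helper type)."""

    host: str
    subscription_id: str
    resource_group: str
    name: str  # fourth field; the configuration comment labels it <HubName>


def parse_connection_string(conn_str: str) -> ProjectConnection:
    """Split a semicolon-separated connection string and reject obvious mistakes."""
    parts = [part.strip() for part in conn_str.split(";")]
    if len(parts) != 4 or not all(parts):
        raise ValueError("expected four non-empty fields separated by ';'")
    if any(part.startswith("<") and part.endswith(">") for part in parts):
        raise ValueError("a placeholder value was not replaced")
    return ProjectConnection(*parts)


# Example with a made-up subscription ID.
example = "eastus2.api.azureml.ms;00000000-0000-0000-0000-000000000000;rg-sample;sample-project-eastus2"
print(parse_connection_string(example).resource_group)
```

This only checks the shape of the string. Whether the values point at a real project is still determined when the client connects.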

### Connect to your Azure OpenAI deployment
To evaluate your LLM-generated data remotely in the cloud, we must connect to your Azure OpenAI deployment. This deployment must be a GPT model that supports `chat completion`, such as `gpt-4`. To find the value for `conn_str`, go to the "Project Overview" page of your Azure AI project and copy the connection string.

In [None]:
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str=azure_ai_connection_string,
)

In [None]:
# Connect to your AOAI resource; the deployment must be an AOAI GPT model
default_connection = project_client.connections.get_default(connection_type=ConnectionType.AZURE_OPEN_AI)
model_config = default_connection.to_evaluator_model_config(
    deployment_name=azure_openai_deployment, api_version=azure_openai_api_version
)

### Data
The following code demonstrates how to upload the data for evaluation to your Azure AI project. Below we use `evaluate_test_data.jsonl`, an example of LLM-generated data in the query-response format expected by the Azure AI Evaluation SDK. For your use case, upload data in the same format, which can be generated using the [`Simulator`](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/simulator-interaction-data) from the Azure AI Evaluation SDK.

Alternatively, if you already have a dataset for evaluation, you can use it by supplying either the link to the dataset in your [registry](https://ml.azure.com/registries) or its dataset ID.

In [None]:
# Upload data for evaluation
data_id, _ = project_client.upload_file("./evaluate_test_data.jsonl")
# To use an existing dataset instead, replace the line above with one of the following lines
# data_id = "azureml://registries/<registry>/data/<dataset>/versions/<version>"
# data_id = "<dataset_id>"
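
Before uploading, it can help to check locally that every line of the file is a JSON object with the columns your evaluators will read. The function below is a hypothetical helper, not part of the Azure SDK, and the required column names passed to it are an assumption that you should adjust to your own data.

```python
import json


def find_jsonl_problems(lines, required_columns):
    """Return a list of readable problems; an empty list means no problems were found.

    `lines` is any iterable of strings, for example an open text file.
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are ignored
        try:
            row = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append(f"line {line_number}: invalid JSON ({error.msg})")
            continue
        if not isinstance(row, dict):
            problems.append(f"line {line_number}: expected a JSON object")
            continue
        missing = sorted(set(required_columns) - set(row))
        if missing:
            problems.append(f"line {line_number}: missing columns {missing}")
    return problems


# Hypothetical usage against the file uploaded above:
# with open("./evaluate_test_data.jsonl", encoding="utf-8") as handle:
#     for problem in find_jsonl_problems(handle, ["query", "response"]):
#         print(problem)
```

Collecting all problems instead of stopping at the first one makes it easier to fix a file in a single pass.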

### Configure Evaluators to Run
The code below demonstrates how to configure the evaluators you want to run. In this example, we use the `F1ScoreEvaluator`, `RelevanceEvaluator`, and `ViolenceEvaluator`, but cloud evaluation supports every evaluator supported by [Azure AI Evaluation](https://learn.microsoft.com/en-us/azure/ai-studio/concepts/evaluation-metrics-built-in?tabs=warning), and any of them can be configured here. You can either import the classes from the SDK and reference them with the `.id` property, or find the fully formed `id` of the evaluator in the AI Studio registry of evaluators and use it directly.

In [None]:
# The id for each evaluator can be found in your AI Studio registry; see the documentation for more information
# init_params configures the evaluator, for example the model used to perform the evaluation
# data_mapping maps the columns of your dataset to the input names required by the evaluator
evaluators = {
    "f1_score": EvaluatorConfiguration(
        id=F1ScoreEvaluator.id,
    ),
    "relevance": EvaluatorConfiguration(
        id="azureml://registries/azureml-staging/models/Relevance-Evaluator/versions/4",
        init_params={"model_config": model_config},
        data_mapping={"query": "${data.Input}", "response": "${data.Output}"},
    ),
    "violence": EvaluatorConfiguration(
        id=ViolenceEvaluator.id,
        init_params={"azure_ai_project": project_client.scope},
        data_mapping={"query": "${data.Input}", "response": "${data.Output}"},
    ),
}
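
To make the role of `data_mapping` more concrete, the sketch below resolves expressions such as `${data.Input}` against a single dataset row in plain Python. It is only an illustration of the idea and makes no claim about how the service itself performs the mapping; the function `apply_data_mapping` and the sample row are hypothetical.

```python
import re

# Matches expressions such as "${data.Input}" and captures the column name.
_DATA_REFERENCE = re.compile(r"^\$\{data\.([^}]+)\}$")


def apply_data_mapping(row, data_mapping):
    """Build the evaluator inputs for one dataset row (illustration only)."""
    inputs = {}
    for evaluator_input, expression in data_mapping.items():
        match = _DATA_REFERENCE.match(expression)
        if match is None:
            raise ValueError(f"unsupported expression: {expression!r}")
        column = match.group(1)
        if column not in row:
            raise KeyError(f"dataset row has no column named {column!r}")
        inputs[evaluator_input] = row[column]
    return inputs


# A hypothetical row whose columns are named "Input" and "Output".
row = {"Input": "What is the capital of France?", "Output": "Paris."}
mapping = {"query": "${data.Input}", "response": "${data.Output}"}
print(apply_data_mapping(row, mapping))
```

Read this way, the keys of `data_mapping` are the input names the evaluator expects and the values name the dataset columns that supply them, so a mapping that points at a column missing from your file is worth catching before the evaluation is submitted.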

### Create cloud evaluation
Below we demonstrate how to trigger a single-instance cloud evaluation on a test dataset. This can be used for pre-deployment testing of your AI application.

Here we pass in the `data_id` of the dataset we would like to evaluate.

In [None]:
evaluation = Evaluation(
    display_name="Cloud Evaluation",
    description="Cloud Evaluation of dataset",
    data=Dataset(id=data_id),
    evaluators=evaluators,
)

# Create evaluation
evaluation_response = project_client.evaluations.create(
    evaluation=evaluation,
)

In [None]:
# Get evaluation
get_evaluation_response = project_client.evaluations.get(evaluation_response.id)

print("----------------------------------------------------------------")
print("Created evaluation, evaluation ID: ", get_evaluation_response.id)
print("Evaluation status: ", get_evaluation_response.status)
print("AI Foundry Portal URI: ", get_evaluation_response.properties["AiStudioEvaluationUri"])
print("----------------------------------------------------------------")
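
The status printed above is a snapshot taken at the moment of the call, and a cloud evaluation may still be in progress at that point. One way to wait for it is a small polling loop such as the sketch below. The helper `wait_for_terminal_status` is hypothetical, and the status strings in the commented usage are assumptions, so check the values your evaluation actually reports before relying on them.

```python
import time


def wait_for_terminal_status(
    get_status,
    terminal_statuses,
    poll_seconds=30.0,
    timeout_seconds=3600.0,
    sleep=time.sleep,
):
    """Call get_status() until it returns a terminal status or the timeout elapses."""
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal_statuses:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"status is still {status!r} after {waited:.0f} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Hypothetical usage with the objects from this notebook; the status strings
# below are assumptions, not documented values.
# final_status = wait_for_terminal_status(
#     lambda: project_client.evaluations.get(evaluation_response.id).status,
#     terminal_statuses={"Completed", "Failed", "Canceled"},
# )
# print("Final evaluation status: ", final_status)
```

Passing the status lookup in as a callable keeps the loop independent of the client, which also makes it easy to try out with a fake status source.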