# Remote Evaluations: Evaluating in the Cloud 

## Objective

This tutorial provides a step-by-step guide on how to evaluate data generated by LLMs remotely in the cloud. 

This tutorial uses the following Azure AI services:

- [Azure AI Safety Evaluation](https://aka.ms/azureaistudiosafetyeval)
- [azure-ai-evaluation](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/evaluate-sdk)

## Time

You should expect to spend 20 minutes running this sample. 

## About this example

This example demonstrates the remote evaluation of query and response pairs that were generated by an LLM. You need Azure OpenAI credentials and an Azure AI project to run it. **To create data for your own evaluation, see the [simulator documentation](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/simulator-interaction-data).** This example demonstrates:

- Single-instance, triggered Remote Evaluation (to be used for pre-deployment evaluation of LLMs)

## Before you begin
### Prerequisites
- [Have an online deployment on Azure OpenAI Studio supporting `chat completion` such as `gpt-4`](https://learn.microsoft.com/en-us/azure/machine-learning/concept-endpoints-online?view=azureml-api-2)
- You may also want to evaluate data generated by your own LLM. To learn how to generate such data with the Azure AI Evaluation SDK, see our samples on simulation.

### Installation

Install the following packages required to execute this notebook. 

In [None]:
# %pip uninstall azure-ai-projects azure-ai-ml azure-ai-evaluation
# %pip install azure-identity
# %pip install azure-ai-projects
# %pip install azure-ai-evaluation

In [None]:
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from azure.ai.projects.models import Evaluation, Dataset, EvaluatorConfiguration, ConnectionType
from azure.ai.evaluation import F1ScoreEvaluator, RelevanceEvaluator, ViolenceEvaluator

### Connect to your Azure OpenAI deployment
To evaluate your LLM-generated data remotely in the cloud, first connect to your Azure OpenAI deployment. The deployment must be a GPT model that supports `chat completion`, such as `gpt-4`. To find the connection string, open the "Project Overview" page of your Azure AI project.

In [None]:
# At the moment, the connection string should be in the format
# "<Region>.api.azureml.ms;<AzureSubscriptionId>;<ResourceGroup>;<HubName>"
# Ex: eastus2.api.azureml.ms;xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx;rg-sample;sample-project-eastus2
project_client = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(),
    conn_str="<connection_string>",
)
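
The connection string format noted in the cell above has four semicolon-separated fields. As a minimal sketch, the string can be checked locally before it is passed to `AIProjectClient.from_connection_string`. The helper name `split_connection_string` and the dictionary keys it returns are hypothetical and are not part of any Azure SDK.

```python
def split_connection_string(conn_str: str) -> dict:
    """Split a connection string of the form
    "<Region>.api.azureml.ms;<AzureSubscriptionId>;<ResourceGroup>;<HubName>"
    into its four fields, raising ValueError if the shape is wrong.

    Hypothetical helper for illustration only; it performs a purely local
    format check and does not contact Azure.
    """
    fields = [field.strip() for field in conn_str.split(";")]
    if len(fields) != 4 or not all(fields):
        raise ValueError(
            "Expected 4 non-empty fields separated by ';', "
            f"got {len(fields)} field(s)"
        )
    host, subscription_id, resource_group, hub_name = fields
    return {
        "host": host,
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "hub_name": hub_name,
    }
```

Calling `split_connection_string("<connection_string>")` with the unfilled placeholder raises `ValueError`, which catches a forgotten substitution before any network call is made.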

In [None]:
deployment_name = "<deployment_name>"
api_version = "<api_version>"
default_connection = project_client.connections.get_default(connection_type=ConnectionType.AZURE_OPEN_AI)
model_config = default_connection.to_evaluator_model_config(deployment_name=deployment_name, api_version=api_version)

### Data
The following code demonstrates how to upload the data for evaluation to your Azure AI project. Below we use `evaluate_test_data.jsonl`, an example of LLM-generated data in the query-response format expected by the Azure AI Evaluation SDK. For your use case, upload data in the same format, which can be generated using the [`Simulator`](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/simulator-interaction-data) from the Azure AI Evaluation SDK.

Alternatively, if you already have a dataset for evaluation, you can use it by looking up its link in your [registry](https://ml.azure.com/registries) or by finding its dataset ID.
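
To make the query-response format concrete, here is a minimal sketch that writes a small JSON Lines file using only the standard library. The row contents, the `SAMPLE_ROWS` name, and the `write_jsonl` helper are made up for illustration. The column names (`query`, `response`, `ground_truth`) are an assumption about what the evaluators used in this tutorial read, so check them against the evaluator documentation for your SDK version.

```python
import json
from pathlib import Path

# Hypothetical example rows; one JSON object per line in the output file.
# Column names are assumed, not taken from evaluate_test_data.jsonl.
SAMPLE_ROWS = [
    {
        "query": "What is the capital of France?",
        "response": "Paris is the capital of France.",
        "ground_truth": "Paris",
    },
    {
        "query": "How many days are in a leap year?",
        "response": "A leap year has 366 days.",
        "ground_truth": "366",
    },
]


def write_jsonl(rows, path):
    """Write each dict in `rows` as one JSON object per line and return the path."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return path
```

For example, `write_jsonl(SAMPLE_ROWS, "./my_eval_data.jsonl")` (a hypothetical file name) produces a two-line file that could stand in for `evaluate_test_data.jsonl`.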

In [None]:
# Upload data for evaluation
data_id = project_client.upload_file("./evaluate_test_data.jsonl")
# To use an existing dataset, replace the line above with one of the following lines
# data_id = "azureml://registries/<registry_name>/data/<dataset_name>/versions/1"
# data_id = "<dataset_id>"
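
Because the evaluation runs remotely, a malformed data file may only surface as a failed run. As a hedged sketch, the file can be checked locally before upload. The `find_missing_columns` helper is hypothetical, and which columns each evaluator needs is an assumption you should confirm against the evaluator documentation.

```python
import json


def find_missing_columns(jsonl_path, required_columns):
    """Return a list of (line_number, missing_columns) for rows that lack
    any of `required_columns`. Blank lines are skipped.

    Hypothetical pre-upload check; it only inspects the local file.
    """
    required = set(required_columns)
    problems = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue
            row = json.loads(line)
            missing = required - set(row)
            if missing:
                problems.append((line_number, sorted(missing)))
    return problems


# Example usage (the column set is an assumption, not an SDK guarantee):
# problems = find_missing_columns(
#     "./evaluate_test_data.jsonl", {"query", "response", "ground_truth"}
# )
# if problems:
#     raise ValueError(f"Rows with missing columns: {problems}")
```

An empty list means every non-blank row contains all of the requested columns.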

### Evaluate in the Cloud with Remote Evaluation
Below we demonstrate how to trigger a single-instance Remote Evaluation in the cloud. This can be used for pre-deployment testing of an LLM.

Here we pass in the `data_id` we would like to use for the evaluation and an `EvaluatorConfiguration` for each evaluator we would like to include. Below we demonstrate how to use the `F1ScoreEvaluator`, the `RelevanceEvaluator`, and the `ViolenceEvaluator`.

In [None]:
evaluation = Evaluation(
    display_name="Remote Evaluation",
    description="Evaluation of dataset",
    data=Dataset(id=data_id),
    evaluators={
        "f1_score": EvaluatorConfiguration(
            id=F1ScoreEvaluator.id,
        ),
        "relevance": EvaluatorConfiguration(
            id=RelevanceEvaluator.id,
            init_params={"model_config": model_config},
        ),
        "violence": EvaluatorConfiguration(
            id=ViolenceEvaluator.id,
            init_params={"azure_ai_project": project_client.scope},
        ),
    },
)

# Create evaluation
evaluation_response = project_client.evaluations.create(
    evaluation=evaluation,
)

In [None]:
# Get evaluation
get_evaluation_response = project_client.evaluations.get(evaluation_response.id)

print("----------------------------------------------------------------")
print("Created evaluation, evaluation ID: ", get_evaluation_response.id)
print("Evaluation status: ", get_evaluation_response.status)
print("AI Foundry Portal URI: ", get_evaluation_response.properties["AiFoundryPortalUri"])
print("----------------------------------------------------------------")
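
The cell above fetches the evaluation once, so the printed status may still be an intermediate one. A minimal polling sketch follows. The `wait_for_terminal_status` helper is hypothetical, and the terminal status strings shown in the usage comment are assumptions; check the values your service actually returns.

```python
import time


def wait_for_terminal_status(
    get_status,
    terminal_statuses,
    poll_seconds=30.0,
    timeout_seconds=3600.0,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Call `get_status()` repeatedly until it returns a value contained in
    `terminal_statuses`, then return that value.

    Raises TimeoutError if no terminal status is seen within
    `timeout_seconds`. `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal_statuses:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Last observed status: {status!r}")
        sleep(poll_seconds)


# Example usage with the objects from this notebook. The status strings
# below are assumed, not confirmed by the SDK documentation:
# final_status = wait_for_terminal_status(
#     lambda: project_client.evaluations.get(evaluation_response.id).status,
#     terminal_statuses={"Completed", "Failed", "Canceled"},
# )
# print("Final evaluation status: ", final_status)
```

Passing the status lookup as a callable keeps the loop independent of the client, so it can be exercised without any Azure resources.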