# Evaluation with Data
In this notebook, we introduce built-in evaluators and guide you through creating your own custom evaluators. We'll cover both code-based and prompt-based custom evaluators. Finally, we'll demonstrate how to use the `evaluate` API to assess data using these evaluators.


In [None]:
# Remove any old installation first.
# Older versions of promptflow shipped as a single package;
# it has since been split into several packages.
! pip uninstall -y promptflow promptflow-cli promptflow-azure promptflow-core promptflow-devkit promptflow-tools promptflow-evals

# Install packages in this order
! pip install promptflow-evals

In [None]:
#! pip install azure_ai_ml --extra-index-url https://pkgs.dev.azure.com/azure-sdk/public/_packaging/azure-sdk-for-python/pypi/simple/

# Dependencies needed for some of the notebooks
#! pip install azure-cli
#! pip install bs4
#! pip install ipykernel

Expected environment variables:

```
AZURE_OPENAI_API_KEY
AZURE_OPENAI_API_VERSION
AZURE_OPENAI_DEPLOYMENT
AZURE_OPENAI_ENDPOINT
```

In [None]:
from dotenv import load_dotenv

load_dotenv()  # take environment variables from .env.
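
A missing variable tends to surface only later, as a less obvious error when the model configuration is built. The following is a minimal sketch of an up-front check for the four expected variables listed above. The helper name `missing_env_vars` is hypothetical and not part of promptflow.

```python
import os

# The four variables this notebook expects (see the list above).
REQUIRED_ENV_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_ENDPOINT",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_ENV_VARS)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All expected environment variables are set.")
```

Passing `environ` explicitly is only a convenience for testing the helper without touching the real environment.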

## 0. Prepare eval dataset

In [None]:
dataset_path_hf_eval = "dataset/hf.eval.jsonl"
dataset_path_ft_eval = "dataset/ft.eval.jsonl"

In [None]:
! python ../format.py \
    --input $dataset_path_hf_eval \
    --input-type jsonl \
    --output $dataset_path_ft_eval \
    --output-format eval

## 1. Built-in Evaluators

The table below lists all the built-in evaluators we support. In the following sections, we will select a few of these evaluators to demonstrate how to use them.

| Category       | Namespace                                        | Evaluator Class           | Notes                                             |
|----------------|--------------------------------------------------|---------------------------|---------------------------------------------------|
| Quality        | promptflow.evals.evaluators                      | GroundednessEvaluator     |                                                   |
|                |                                                  | RelevanceEvaluator        |                                                   |
|                |                                                  | CoherenceEvaluator        |                                                   |
|                |                                                  | FluencyEvaluator          |                                                   |
|                |                                                  | SimilarityEvaluator       |                                                   |
|                |                                                  | F1ScoreEvaluator          |                                                   |
| Content Safety | promptflow.evals.evaluators.content_safety       | ViolenceEvaluator         |                                                   |
|                |                                                  | SexualEvaluator           |                                                   |
|                |                                                  | SelfHarmEvaluator         |                                                   |
|                |                                                  | HateUnfairnessEvaluator   |                                                   |
| Composite      | promptflow.evals.evaluators                      | QAEvaluator               | Built on top of individual quality evaluators.    |
|                |                                                  | ChatEvaluator             | Similar to QAEvaluator but designed for evaluating chat messages. |
|                |                                                  | ContentSafetyEvaluator    | Built on top of individual content safety evaluators. |
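
To give a sense of what an overlap-based metric such as an F1 score measures, here is a minimal sketch of a token-overlap F1 between an answer and a reference. It is an illustration under simple assumptions (lowercasing and whitespace tokenization), not the library's implementation, and `token_f1` is a hypothetical name.

```python
from collections import Counter


def token_f1(answer: str, ground_truth: str) -> float:
    """Token-overlap F1 between two strings (whitespace tokens, lowercased)."""
    answer_tokens = answer.lower().split()
    truth_tokens = ground_truth.lower().split()
    if not answer_tokens or not truth_tokens:
        return 0.0
    # The multiset intersection counts each shared token only as often
    # as it appears in both strings.
    overlap = sum((Counter(answer_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(answer_tokens)  # shared tokens / answer length
    recall = overlap / len(truth_tokens)      # shared tokens / reference length
    return 2 * precision * recall / (precision + recall)


print(token_f1("the alpine explorer tent", "alpine explorer tent"))
```

The score rewards answers that cover the reference tokens (recall) without adding many extra ones (precision).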



### 1.1 Quality Evaluator

In [None]:
import os
from promptflow.core import AzureOpenAIModelConfiguration

# Read connection settings from the expected environment variables
azure_endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT")
api_key = os.environ.get("AZURE_OPENAI_API_KEY")
azure_deployment = os.environ.get("AZURE_OPENAI_DEPLOYMENT")
api_version = os.environ.get("AZURE_OPENAI_API_VERSION")

# f-strings print "None" instead of raising TypeError when a variable is unset
print(f"azure_endpoint={azure_endpoint}")
print(f"azure_deployment={azure_deployment}")
print(f"api_version={api_version}")

# Initialize Azure OpenAI Connection
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=azure_endpoint,
    api_key=api_key,
    azure_deployment=azure_deployment,
    api_version=api_version,
)

In [None]:
from promptflow.evals.evaluators import RelevanceEvaluator

# Initializing Relevance Evaluator
relevance_eval = RelevanceEvaluator(model_config)

In [None]:
# Running Relevance Evaluator on a single input row
relevance_score = relevance_eval(
    answer="The Alpine Explorer Tent is the most waterproof.",
    context="From our product list,"
    " the alpine explorer tent is the most waterproof."
    " The Adventure Dining Table has higher weight.",
    question="Which tent is the most waterproof?",
)

In [None]:
print(relevance_score)

## 3. Using Evaluate API to evaluate with data

In previous sections, we walked you through how to use built-in evaluators to evaluate a single row and how to define your own custom evaluators. Now, we will show you how to use these evaluators with the powerful `evaluate` API to assess an entire dataset.

First, let's take a peek at what the data looks like.

In [None]:
import pandas as pd

data_path = "data.jsonl"

df = pd.read_json(data_path, lines=True)
df

Now we invoke the `evaluate` API with the relevance evaluator we initialized earlier.

The `evaluator_config` argument supplies a column mapping: it maps the `truth` column of the dataset to `ground_truth`, the input name that evaluators accept.

In [None]:
from promptflow.evals.evaluate import evaluate

result = evaluate(
    data=data_path,  # the same file previewed above
    evaluators={
        "relevance": relevance_eval
    },
    # column mapping
    evaluator_config={
        "default": {
            "ground_truth": "${data.truth}"
        }
    }
)
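
To make the `${data.truth}` syntax concrete, the sketch below resolves such references against a single row. It only illustrates the idea of column mapping, assuming each reference has the form `${data.<column>}` and that unmapped columns pass through under their own names. It is not how `evaluate` is implemented, and `resolve_column_mapping` is a hypothetical helper.

```python
import re

# Matches references of the assumed form ${data.<column>}.
_DATA_REF = re.compile(r"^\$\{data\.([^}]+)\}$")


def resolve_column_mapping(row: dict, mapping: dict) -> dict:
    """Build evaluator keyword arguments from a data row and a column mapping."""
    kwargs = dict(row)  # assumption: unmapped columns keep their own names
    for evaluator_arg, reference in mapping.items():
        match = _DATA_REF.match(reference)
        if match is None:
            raise ValueError(f"Unsupported reference: {reference!r}")
        column = match.group(1)
        if column not in row:
            raise KeyError(f"Column {column!r} not found in row")
        # Expose the dataset column under the name the evaluator expects.
        kwargs[evaluator_arg] = row[column]
    return kwargs


row = {
    "question": "Which tent is the most waterproof?",
    "truth": "The Alpine Explorer Tent",
}
print(resolve_column_mapping(row, {"ground_truth": "${data.truth}"}))
```

Raising on an unknown column or a malformed reference makes a typo in the mapping fail early rather than silently passing nothing to the evaluator.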


Finally, let's check the results produced by the evaluate API.

In [None]:
from IPython.display import display, JSON

display(JSON(result))

In [None]:
# Check the results using Azure AI Studio UI
print(result["studio_url"])
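
If the run was not logged to an Azure AI project, the `studio_url` entry may be empty, so it can be worth reading the result defensively. The sketch below assumes the result is a dictionary with `metrics`, `rows`, and `studio_url` keys. The helper `summarize_result`, the metric name, and the stand-in values are all hypothetical and only show the shape of the output.

```python
def summarize_result(result: dict) -> str:
    """Format aggregate metrics and the Studio link from an evaluate-style result."""
    lines = []
    for name, value in sorted(result.get("metrics", {}).items()):
        lines.append(f"{name}: {value}")
    lines.append(f"rows evaluated: {len(result.get('rows', []))}")
    # Fall back to a readable message when no URL is present.
    lines.append(f"studio_url: {result.get('studio_url') or 'not available'}")
    return "\n".join(lines)


# Stand-in data with made-up values, not real evaluation output.
example_result = {
    "metrics": {"relevance.gpt_relevance": 4.0},
    "rows": [{"inputs.question": "Which tent is the most waterproof?"}],
    "studio_url": None,
}
print(summarize_result(example_result))
```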