# Evaluate Metrics
This notebook evaluates retrieval-augmented generation (RAG) metrics using the IBM watsonx.governance SDK. It takes data containing the contexts, question, answer, and (optionally) the ground truth, and computes the metrics from them. The metric results are visualized using `ModelInsights`.

This notebook should be run in a Python 3.10 or later runtime environment.
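
The runtime requirement above can be checked before installing anything. The following is a minimal sketch using only the standard library; the helper name `meets_minimum` is made up for this example and is not part of the SDK.

```python
import sys

MIN_VERSION = (3, 10)  # minimum runtime stated for this notebook


def meets_minimum(version_info, minimum=MIN_VERSION):
    """Return True when the (major, minor) pair is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the result instead of raising, so the cell never aborts the notebook.
if meets_minimum(sys.version_info):
    print("Python version OK:", sys.version.split()[0])
else:
    print("This notebook expects Python 3.10 or later; found", sys.version.split()[0])
```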

## Learning goals

- Compute RAG metrics using `MetricsEvaluator`.
- Visualize the metric results using `ModelInsights`.

## Contents

- [Step 1 - Install libraries](#install)
- [Step 2 - Configuration](#configuration)
- [Step 3 - Evaluate Metrics](#evaluate)
- [Step 4 - Display the results](#display)





## Install libraries<a name="install"></a>

Install the required packages.

In [None]:
%pip install "ibm-watsonx-gov[metrics,visualization]"
import warnings
warnings.filterwarnings("ignore")

Note: you may need to restart the kernel to use updated packages.

## Configuration <a name="configuration"></a>

### Configure your watsonx.governance credentials

In [None]:
from ibm_watsonx_gov.config import Credentials

credentials = Credentials(
    url="<EDIT_THIS>",
    api_key="<EDIT_THIS>",
    service_instance_id="<EDIT_THIS>",

    # Uncomment the following attributes when using watsonx.governance
    # username="<EDIT_THIS>",
    # version="<EDIT_THIS>",
    # disable_ssl="<EDIT_THIS>",
)
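
Rather than pasting secrets into the notebook, the credential values could be read from environment variables. The sketch below is one possible approach, not something the SDK prescribes: the environment variable names (`WXG_URL`, `WXG_API_KEY`, `WXG_SERVICE_INSTANCE_ID`) and the helper `credential_kwargs_from_env` are hypothetical, while the keyword names `url`, `api_key`, and `service_instance_id` are taken from the cell above.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
ENV_TO_FIELD = {
    "WXG_URL": "url",
    "WXG_API_KEY": "api_key",
    "WXG_SERVICE_INSTANCE_ID": "service_instance_id",
}


def credential_kwargs_from_env(environ=os.environ):
    """Collect credential fields from the environment, failing early if any is unset."""
    missing = [name for name in ENV_TO_FIELD if not environ.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(sorted(missing))}")
    return {field: environ[name] for name, field in ENV_TO_FIELD.items()}
```

Assuming the variables are set, the resulting dictionary should be usable as `Credentials(**credential_kwargs_from_env())`, since it carries the same keyword names as the cell above.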

## Evaluate Metrics<a name="evaluate"></a>

In this step, the input to the RAG application, its generated output, and the retrieved contexts are loaded. To get started, use the sample CSV dataset provided with this notebook. To evaluate the output of your own RAG application instead, load its dataset in the following cell and update the configuration to read the corresponding columns.

### Loading your test data

In this step, you can load your own RAG dataset into `input_df` and update the configuration object in the next step to match the column names in the dataset.

In [None]:
import pandas as pd
input_df = pd.read_csv("https://raw.githubusercontent.com/IBM/ibm-watsonx-gov/refs/heads/samples/notebooks/data/rag/rag_with_ground_truth.csv")
input_df.head()
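
When loading your own dataset, it can help to confirm that the expected columns are present before configuring the evaluator. This is a minimal sketch: the column names are the ones the sample dataset is configured with later in this notebook, and the helper `find_missing_columns` is a made-up name for illustration.

```python
REQUIRED_COLUMNS = ["question", "contexts", "answer"]  # names used with the sample dataset
OPTIONAL_COLUMNS = ["ground_truth"]  # only needed for reference-based metrics


def find_missing_columns(available, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent, preserving their order."""
    present = set(available)
    return [name for name in required if name not in present]


# Example with a plain list of column names; prints ['contexts'].
print(find_missing_columns(["question", "answer"]))
```

With the DataFrame from the cell above, the same check would be `find_missing_columns(input_df.columns)`; an empty list means every required column was found.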

### Configure the evaluator

Once the dataset is loaded, configure the evaluator to specify which columns of the dataset to use and which metrics to evaluate.

#### Dataset columns

To configure the evaluator for the dataset, set `question_field` to the name of the column that contains the questions, `context_fields` to a list of the column names that contain the contexts, and `output_fields` to a list with the column that contains the generated answer. Optionally, add the ground truth column name under `reference_fields`.
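
For a dataset with different column names, the field mapping described above can be assembled in one place. The helper below is a hypothetical sketch (the name `build_field_mapping` is not part of the SDK); it only builds a plain dictionary whose keys mirror the keyword arguments used in the configuration cell of this notebook, and it leaves out `reference_fields` when no ground truth column is given.

```python
def build_field_mapping(question, contexts, answer, ground_truth=None):
    """Build the column-role mapping as a dict of configuration keyword arguments."""
    contexts = list(contexts)
    mapping = {
        "input_fields": [question, *contexts],  # everything the RAG application received
        "question_field": question,
        "context_fields": contexts,
        "output_fields": [answer],
    }
    if ground_truth is not None:
        # The ground truth is optional, so the key is added only when provided.
        mapping["reference_fields"] = [ground_truth]
    return mapping


# Example with made-up column names from a custom dataset.
print(build_field_mapping("query", ["passage_1", "passage_2"], "response"))
```

Assuming the configuration accepts these keywords as shown in the next cell, the result could be passed as `GenAIConfiguration(**mapping, task_type=TaskType.RAG)`.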

#### Metrics

You can provide the metrics to evaluate in the `metrics` list. The metric evaluation method (that is, the technique used to compute the metric) can be set through the `method` value when creating the metric object. You can also provide metric groups under `metric_groups`, which evaluates all the metrics belonging to the specified groups.

In [None]:
from ibm_watsonx_gov.config import GenAIConfiguration
from ibm_watsonx_gov.metrics import (
    AnswerRelevanceMetric,
    AnswerSimilarityMetric,
    ContextRelevanceMetric,
    FaithfulnessMetric,
)
from ibm_watsonx_gov.entities.enums import TaskType, MetricGroup

question_field = "question"
context_field = "contexts"

config = GenAIConfiguration(
    input_fields=[question_field, context_field],
    question_field=question_field,
    context_fields=[context_field],
    output_fields=["answer"],
    reference_fields=["ground_truth"],
    task_type=TaskType.RAG,
)

metrics = [
    ContextRelevanceMetric(),
    FaithfulnessMetric(),
    AnswerSimilarityMetric(),
    AnswerRelevanceMetric(),
]

metric_groups = [
    MetricGroup.RETRIEVAL_QUALITY,
]

### Run the metrics evaluation

Create a `MetricsEvaluator` instance using the configuration and credentials, then compute the desired metrics by invoking its `evaluate()` method.

In [None]:
from ibm_watsonx_gov.clients.api_client import APIClient
from ibm_watsonx_gov.evaluators import MetricsEvaluator

evaluator = MetricsEvaluator(
    api_client=APIClient(credentials=credentials),
    configuration=config,
)

evaluation_result = evaluator.evaluate(
    data=input_df,
    metrics=metrics,
    metric_groups=metric_groups
)

## Display the results  <a name="display"></a>

### Display the result table

Now that the evaluation is done, display a table containing each record with its metric values.

In [None]:
evaluator.display_table()
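
Alongside the per-record table, a quick aggregate per metric is often useful. The sketch below works on plain dictionaries and does not use the SDK; the scores and the metric column names (`faithfulness`, `answer_relevance`) are made up purely for illustration, and the helper `summarize` is hypothetical.

```python
from statistics import mean

# Made-up per-record scores for illustration only; real values come from the evaluation.
records = [
    {"faithfulness": 0.9, "answer_relevance": 0.8},
    {"faithfulness": 0.5, "answer_relevance": 0.6},
    {"faithfulness": 0.7, "answer_relevance": 1.0},
]


def summarize(rows):
    """Compute min, mean, and max for every metric name found in `rows`."""
    names = sorted({name for row in rows for name in row})
    summary = {}
    for name in names:
        # Skip records where the metric is missing or was not computed.
        values = [row[name] for row in rows if row.get(name) is not None]
        if values:
            summary[name] = {"min": min(values), "mean": mean(values), "max": max(values)}
    return summary


for metric, stats in summarize(records).items():
    print(metric, stats)
```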

### `ModelInsights` visualization

Finally, the evaluation result is displayed interactively using `ModelInsights`.

Notes:
- The interactive visualization is supported only in JupyterLab.
- If the diagram is not displayed correctly or has missing components, refresh the browser page.

In [None]:
%matplotlib ipympl
evaluator.display_insights()