# Evaluate Metrics
This notebook evaluates RAG metrics using the IBM watsonx.governance SDK. It takes data containing the contexts, question, answer, and (optionally) ground truth, computes the metrics, and visualizes the results using `ModelInsights`.

Run this notebook in a Python 3.10 or later runtime environment.

## Learning goals

- Compute RAG metrics using `evaluate_metrics`.
- Visualize the metric results using `ModelInsights`.

## Contents

- [Step 1 - Install libraries](#install)
- [Step 2 - Configuration](#configuration)
- [Step 3 - Evaluate Metrics](#evaluate)
- [Step 4 - Display the results](#display)





## Install libraries<a name="install"></a>

Install the required packages.

In [None]:
%pip install "ibm-watsonx-gov[metrics,visualization]"
import warnings
warnings.filterwarnings("ignore")

Note: you may need to restart the kernel to use updated packages.

## Configuration <a name="configuration"></a>

### Configure your watsonx.governance credentials

In [None]:
from ibm_watsonx_gov.config import Credentials

credentials = Credentials(
    url="<EDIT_THIS>",
    api_key="<EDIT_THIS>",
    service_instance_id="<EDIT_THIS>",

    # Uncomment the following attributes when using watsonx.governance
    # username="<EDIT_THIS>",
    # version="<EDIT_THIS>",
    # disable_ssl="<EDIT_THIS>",
)
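
If you would rather not paste secrets into the notebook, the values can be read from environment variables instead. The following is a minimal sketch under stated assumptions: the variable names `WXG_URL`, `WXG_API_KEY`, and `WXG_SERVICE_INSTANCE_ID` and the helper `credentials_kwargs_from_env` are hypothetical and not part of the SDK. Only the argument names `url`, `api_key`, and `service_instance_id` come from the cell above.

```python
import os

# Hypothetical environment variable names (an assumption, not an SDK
# convention), mapped to the argument names used in the cell above.
ENV_TO_ARG = {
    "WXG_URL": "url",
    "WXG_API_KEY": "api_key",
    "WXG_SERVICE_INSTANCE_ID": "service_instance_id",
}


def credentials_kwargs_from_env(env=None):
    """Collect credential values from environment variables.

    Returns a dict keyed by credential argument name. Raises KeyError
    naming every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in ENV_TO_ARG if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[name] for name, arg in ENV_TO_ARG.items()}
```

With the variables set, the result could then be unpacked into the constructor, for example `credentials = Credentials(**credentials_kwargs_from_env())`.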

## Evaluate Metrics<a name="evaluate"></a>

This step loads the input to the RAG application, its generated output, and the retrieved contexts. To get started, use the sample CSV dataset provided with this notebook. To evaluate the output of your own RAG application instead, load its dataset in the following cell and update the configuration to read the corresponding columns.

### Loading your test data

Load your RAG dataset into `input_df`. If you use your own data, update the configuration object in the next step to match the column names in your dataset.

In [None]:
import pandas as pd
input_df = pd.read_csv("https://raw.githubusercontent.com/IBM/ibm-watsonx-gov/refs/heads/samples/notebooks/data/rag/rag_with_ground_truth.csv")
input_df.head()

### Configure the evaluator

Once the dataset is loaded, configure the evaluator to specify which columns of the dataset to use and which metrics to evaluate.

#### Dataset columns

To configure the evaluator for the dataset, set `question_field` to the name of the column that contains the questions, `context_fields` to a list of the column names that contain the contexts, and `output_fields` to the column that contains the generated answer. Optionally, add the ground truth column name under `reference_fields`.
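
Before building the configuration, it can help to confirm that every column you plan to reference actually exists in the dataset. The helper below is a hypothetical sketch and not part of the SDK. It works on plain column-name sequences, so it could be called with `input_df.columns`.

```python
def missing_columns(available, question_field, context_fields,
                    output_fields, reference_fields=()):
    """Return the configured column names that are absent from the dataset.

    `available` is any iterable of column names. The result preserves the
    order question, contexts, outputs, references.
    """
    required = [question_field, *context_fields, *output_fields, *reference_fields]
    present = set(available)
    return [name for name in required if name not in present]
```

For the column names used in this notebook, the call would be `missing_columns(input_df.columns, "question", ["contexts"], ["answer"], ["ground_truth"])`. An empty list means every referenced column was found.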

#### Metrics

In the `metrics` list, select the metrics to evaluate and the method used to compute each one.

In [None]:
from ibm_watsonx_gov.config import GenAIConfiguration
from ibm_watsonx_gov.metrics import ContextRelevanceMetric, FaithfulnessMetric, AnswerSimilarityMetric
from ibm_watsonx_gov.entities.enums import TaskType

question_field = "question"
context_field = "contexts"

config = GenAIConfiguration(
    input_fields=[question_field, context_field],
    question_field=question_field,
    context_fields=[context_field],
    output_fields=["answer"],
    reference_fields=["ground_truth"],
    task_type=TaskType.RAG,
)

metrics = [
    ContextRelevanceMetric(method="sentence_bert_mini_lm"),
    FaithfulnessMetric(method="token_k_precision"),
    FaithfulnessMetric(method="sentence_bert_mini_lm"),
    AnswerSimilarityMetric(method="token_recall"),
    AnswerSimilarityMetric(method="sentence_bert_mini_lm"),
]
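
The method names above suggest two families of scoring: token overlap (`token_k_precision`, `token_recall`) and sentence-embedding similarity (`sentence_bert_mini_lm`). As rough intuition for the first family only, the sketch below computes simple token-overlap precision and recall. It is an illustrative assumption, not the SDK's implementation: the tokenizer, the function names, and the exact formulas are hypothetical and may differ from what the SDK computes.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_precision(candidate, source):
    """Fraction of candidate tokens that also occur in the source text."""
    candidate_tokens = tokenize(candidate)
    source_tokens = set(tokenize(source))
    if not candidate_tokens:
        return 0.0
    return sum(t in source_tokens for t in candidate_tokens) / len(candidate_tokens)


def token_recall(candidate, reference):
    """Fraction of reference tokens that also occur in the candidate text."""
    reference_tokens = tokenize(reference)
    candidate_tokens = set(tokenize(candidate))
    if not reference_tokens:
        return 0.0
    return sum(t in candidate_tokens for t in reference_tokens) / len(reference_tokens)
```

Under these definitions, precision asks how much of the answer is supported by the context (a faithfulness-style question), while recall asks how much of the ground truth the answer covers (a similarity-style question).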

### Run the metrics evaluation

Pass the credentials, configuration, metrics, and the RAG dataset to `evaluate_metrics()` to start the evaluation process.

In [None]:
from ibm_watsonx_gov.evaluate import evaluate_metrics

evaluation_result = evaluate_metrics(
    credentials=credentials,
    configuration=config,
    metrics=metrics,
    data=input_df,
    output_format="dataframe",
)

## Display the results  <a name="display"></a>

### Display the result table

Now that the evaluation is done, display a table containing each record with its metric values.

In [None]:
from ibm_watsonx_gov.visualizations import display_table
display_table(evaluation_result)

### `ModelInsights` visualization

Finally, display the evaluation result interactively using `ModelInsights`.

In [None]:
%matplotlib ipympl
from ibm_watsonx_gov.visualizations import ModelInsights

model_insights = ModelInsights(configuration=config, metrics=metrics)
model_insights.display_metrics(metrics_result=evaluation_result)