# Evaluate Metrics
This notebook evaluates retrieval-augmented generation (RAG) metrics with the IBM watsonx.governance SDK. The input data must contain the contexts, question, and answer for each record, and can optionally include the ground truth. The metric results can be visualized with `ModelInsights`.

Run this notebook in a Python 3.10 or later runtime environment.
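
One way to confirm the runtime requirement before installing anything is to compare the interpreter version against the minimum. This is a minimal sketch; the helper name `meets_minimum_python` is illustrative and not part of the SDK.

```python
import sys


def meets_minimum_python(version_info=None, minimum=(3, 10)):
    """Return True if the (major, minor, ...) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only major and minor so patch levels do not matter.
    return tuple(version_info[:2]) >= tuple(minimum)


if not meets_minimum_python():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} detected; "
        "this notebook expects 3.10 or later."
    )
```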

## Learning goals

- Compute RAG metrics using `evaluate_metrics`.
- Visualize the metric results using `ModelInsights`.

## Contents

- [Step 1 - Install libraries](#install)
- [Step 2 - Configuration](#configuration)
- [Step 3 - Evaluate Metrics](#evaluate)
- [Step 4 - Display the results](#display)





## Install libraries<a name="install"></a>

Install the required packages

In [None]:
!pip install "ibm-watsonx-gov[metrics,visualization]"
import warnings
warnings.filterwarnings("ignore")

Note: you may need to restart the kernel to use updated packages.

## Configuration <a name="configuration"></a>

Configure your watsonx.governance credentials


In [None]:
from ibm_watsonx_gov.config import Credentials

credentials = Credentials(
    url="<EDIT_THIS>",
    api_key="<EDIT_THIS>",
    service_instance_id="<EDIT_THIS>",

    # Uncomment the following attributes when using watsonx.governance
    # username="<EDIT_THIS>",
    # version="<EDIT_THIS>",
    # disable_ssl="<EDIT_THIS>",
)
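
To avoid hard-coding secrets in the notebook, the values can be read from environment variables instead. The sketch below only builds the keyword arguments for `url`, `api_key`, and `service_instance_id`; the environment variable names and the helper `credentials_kwargs_from_env` are hypothetical and not defined by the SDK.

```python
import os

# Hypothetical environment variable names; choose whatever fits your setup.
ENV_TO_ARGUMENT = {
    "WXG_URL": "url",
    "WXG_API_KEY": "api_key",
    "WXG_SERVICE_INSTANCE_ID": "service_instance_id",
}


def credentials_kwargs_from_env(environ=None):
    """Collect Credentials keyword arguments from environment variables.

    Raises KeyError naming every missing variable, so a misconfigured
    environment fails early rather than during evaluation.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in ENV_TO_ARGUMENT if not environ.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {arg: environ[name] for name, arg in ENV_TO_ARGUMENT.items()}


# Usage with the Credentials class imported in the cell above:
# credentials = Credentials(**credentials_kwargs_from_env())
```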

## Evaluate Metrics<a name="evaluate"></a>

#### Read RAG data from a file or from an application invoking the model and generating responses.

Here we read the data containing the application input and output from a file.

In [None]:
import pandas as pd
input_df = pd.read_csv("https://raw.githubusercontent.com/IBM/ibm-watsonx-gov/refs/heads/samples/notebooks/data/rag/rag_with_ground_truth.csv")
input_df.head()
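
Before configuring the metrics, it can help to confirm that the data has the columns this notebook refers to (`question`, `contexts`, `answer`, and `ground_truth`). The following is a small sketch with an illustrative helper, `missing_columns`, that is not part of the SDK; it accepts any iterable of column names, such as `input_df.columns`.

```python
# Column names used by the configuration in this notebook.
REQUIRED_COLUMNS = ["question", "contexts", "answer", "ground_truth"]


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names absent from `columns`, in order."""
    present = set(columns)
    return [name for name in required if name not in present]


# With the DataFrame loaded above:
# missing = missing_columns(input_df.columns)
# if missing:
#     raise ValueError(f"Input data is missing columns: {missing}")
```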

Create metrics configuration.

In [None]:
from ibm_watsonx_gov.config import GenAIConfiguration
from ibm_watsonx_gov.metrics import ContextRelevanceMetric, FaithfulnessMetric, AnswerSimilarityMetric
from ibm_watsonx_gov.entities.enums import TaskType

question_field = "question"
context_field = "contexts"

config = GenAIConfiguration(
    input_fields=[question_field, context_field],
    question_field=question_field,
    context_fields=[context_field],
    output_fields=["answer"],
    reference_fields=["ground_truth", "answer"],
    task_type=TaskType.RAG,
)

metrics = [
    ContextRelevanceMetric(method="sentence_bert_mini_lm"),
    FaithfulnessMetric(method="token_k_precision"),
    FaithfulnessMetric(method="sentence_bert_mini_lm"),
    AnswerSimilarityMetric(method="token_recall"),
    AnswerSimilarityMetric(method="sentence_bert_mini_lm"),
]
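
The `token_recall` and `token_k_precision` method names suggest token-overlap scores. The sketch below is a generic illustration of token recall and token precision. It is not the SDK's implementation, and the SDK's tokenization and exact definitions may differ. The function names and the whitespace tokenizer are assumptions made for this example.

```python
from collections import Counter


def _tokens(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def token_recall(answer, reference):
    """Fraction of reference tokens that also appear in the answer."""
    ref = Counter(_tokens(reference))
    ans = Counter(_tokens(answer))
    if not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both texts.
    overlap = sum((ref & ans).values())
    return overlap / sum(ref.values())


def token_precision(answer, source):
    """Fraction of answer tokens that also appear in the source text."""
    ans = Counter(_tokens(answer))
    src = Counter(_tokens(source))
    if not ans:
        return 0.0
    overlap = sum((ans & src).values())
    return overlap / sum(ans.values())
```

Under this reading, recall against the ground truth rewards answers that cover the reference, while precision against the contexts rewards answers whose wording is supported by the retrieved text.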

Run the metrics evaluation

In [None]:
from ibm_watsonx_gov.evaluate import evaluate_metrics

evaluation_result = evaluate_metrics(
    credentials=credentials,
    configuration=config,
    metrics=metrics,
    data=input_df,
    output_format="dataframe",
)

## Display the results  <a name="display"></a>

Display the evaluation result as a table.

In [None]:
from ibm_watsonx_gov.visualizations import display_table
display_table(evaluation_result)

Display the model insights, which are based on the thresholds specified in the metrics configuration.

In [None]:
%matplotlib ipympl
from ibm_watsonx_gov.visualizations import ModelInsights

model_insights = ModelInsights(configuration=config, metrics=metrics)
model_insights.display_metrics(metrics_result=evaluation_result)
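
To illustrate what a threshold-based view conveys, the sketch below flags records whose score falls below a lower limit. The scores, the threshold value, and the helper `flag_below_threshold` are hypothetical and are not taken from the SDK or from this notebook's results.

```python
def flag_below_threshold(scores, threshold):
    """Return the indices of records whose score is below `threshold`."""
    return [i for i, score in enumerate(scores) if score < threshold]


# Hypothetical per-record scores and lower-limit threshold, for illustration only.
example_scores = [0.91, 0.42, 0.78, 0.65]
violations = flag_below_threshold(example_scores, threshold=0.7)
print(violations)  # prints [1, 3]
```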