# RAG Evaluation Basic Demo

This notebook shows a basic demo of evaluating a retrieval-augmented generation (RAG) pipeline with the [Ragas](https://github.com/explodinggradients/ragas) framework.

## Set up the evaluation environment

```python
import os
from pathlib import Path

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_correctness, context_precision
```

By default, Ragas calls OpenAI models to compute the metrics, so this demo requires an OpenAI API key.

```python
key_path = Path.home() / ".openai.key"
try:
    # Read the key from a plain-text file, dropping the trailing newline
    os.environ["OPENAI_API_KEY"] = key_path.read_text().strip()
except OSError as err:
    print(
        "Could not read your OpenAI API key. To run the RAG evaluation, "
        f"save the key in plain text as {key_path}: {err}"
    )
```
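If you prefer to export `OPENAI_API_KEY` in your shell rather than keep a key file, a small variation can check the environment first and only fall back to `~/.openai.key`. This is a minimal sketch; `resolve_openai_key` is a hypothetical helper, not part of Ragas or LlamaIndex.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_openai_key(env: Mapping[str, str], key_path: Path) -> Optional[str]:
    """Hypothetical helper: return an API key from the environment, else from key_path."""
    # A key already exported in the shell (or by a secrets manager) takes priority
    existing = env.get("OPENAI_API_KEY")
    if existing:
        return existing
    try:
        # Same convention as the cell above: the key in plain text on one line
        key = key_path.read_text().strip()
    except OSError as err:
        print(f"Could not read an OpenAI API key from {key_path}: {err}")
        return None
    # Treat an empty file the same as a missing one
    return key or None


key = resolve_openai_key(os.environ, Path.home() / ".openai.key")
if key:
    os.environ["OPENAI_API_KEY"] = key
```

Keeping the lookup free of side effects and setting `os.environ` only at the call site makes the helper easy to test with a plain dict.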

## Set up the data samples used to evaluate the RAG pipeline

In the `data_samples` structure below, the four lists are aligned by index: each **question** is paired with the **answer** a RAG pipeline might have returned, the **contexts** (passages) it retrieved, and a reference **ground_truth** answer. Try changing the answers to see how that affects the scores in the next section.

```python
data_samples = {
    'question': [
        'When was the first super bowl?',
        'Who won the most super bowls?'
    ],
    'answer': [
        'The first superbowl was held on Jan 15, 1967',
        'The most super bowls have been won by The New England Patriots'
    ],
    'contexts': [
        [
            'The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles,'
        ],
        [
            'The Green Bay Packers...Green Bay, Wisconsin.',
            'The Packers compete...Football Conference'
        ]
    ],
    'ground_truth': [
        'The first superbowl was held on January 15, 1967',
        'The New England Patriots have won the Super Bowl a record six times'
    ]
}
```
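Because the columns are matched up by position, a mismatch in list lengths, or a bare string where **contexts** expects a list of passages, is easy to introduce while editing the samples. The sketch below checks those two conditions and illustrates how row *i* is assembled from element *i* of every column; `to_rows` and `REQUIRED_COLUMNS` are hypothetical names used only for illustration.

```python
from typing import Dict, List

# Column names used by the data_samples dict in this notebook
REQUIRED_COLUMNS = ("question", "answer", "contexts", "ground_truth")


def to_rows(samples: Dict[str, list]) -> List[dict]:
    """Hypothetical helper: validate a column-oriented sample dict and return one dict per row."""
    missing = [c for c in REQUIRED_COLUMNS if c not in samples]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    lengths = {c: len(samples[c]) for c in REQUIRED_COLUMNS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"columns have different lengths: {lengths}")
    for i, ctx in enumerate(samples["contexts"]):
        # Each row needs a list of retrieved passages, even when there is only one
        if not isinstance(ctx, list) or not all(isinstance(p, str) for p in ctx):
            raise TypeError(f"contexts[{i}] must be a list of strings")
    # Row i is built from element i of every column
    return [dict(zip(REQUIRED_COLUMNS, values))
            for values in zip(*(samples[c] for c in REQUIRED_COLUMNS))]


# A one-row example in the same shape as data_samples
example = {
    "question": ["When was the first super bowl?"],
    "answer": ["The first superbowl was held on Jan 15, 1967"],
    "contexts": [["The First AFL–NFL World Championship Game was played on January 15, 1967."]],
    "ground_truth": ["The first superbowl was held on January 15, 1967"],
}
rows = to_rows(example)
print(rows[0]["question"], "->", rows[0]["answer"])
```

Applied to `data_samples` from the cell above, `to_rows(data_samples)` should return two rows without raising.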

## Now evaluate the RAG pipeline

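```python
# Convert the column-oriented dict into a Hugging Face Dataset
dataset = Dataset.from_dict(data_samples)

# Score every sample with the three Ragas metrics and view the results as a DataFrame
score = evaluate(dataset, metrics=[faithfulness, context_precision, answer_correctness])
score.to_pandas()
```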
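Two of the three metrics reduce to simple ratios once an LLM has judged the individual pieces. According to the Ragas documentation, **faithfulness** is the fraction of claims in the answer that the retrieved contexts support, and **context_precision** averages precision@k over the ranks *k* that hold a relevant context, so relevant passages ranked near the top score higher. The sketch below reproduces only that final arithmetic: it assumes the per-claim and per-context verdicts, which Ragas obtains from an LLM, are already available as booleans, and the function names are hypothetical rather than Ragas APIs.

```python
from typing import Sequence


def faithfulness_score(claim_supported: Sequence[bool]) -> float:
    """Fraction of the answer's claims that the retrieved contexts support."""
    if not claim_supported:
        # No claims to check: this sketch returns NaN (undefined)
        return float("nan")
    return sum(claim_supported) / len(claim_supported)


def context_precision_score(context_relevant: Sequence[bool]) -> float:
    """Average of precision@k over the ranks k that hold a relevant context."""
    total_relevant = sum(context_relevant)
    if total_relevant == 0:
        return 0.0
    hits = 0
    weighted = 0.0
    for k, relevant in enumerate(context_relevant, start=1):
        if relevant:
            hits += 1
            # precision@k = relevant contexts in the top k / k
            weighted += hits / k
    return weighted / total_relevant


print(faithfulness_score([True, True, False]))
print(context_precision_score([True, False, True]))
```

For instance, contexts judged `[True, False, True]` give (1/1 + 2/3) / 2 ≈ 0.83, while a ranking with no relevant context scores 0.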


| | question | answer | contexts | ground_truth | faithfulness | context_precision | answer_correctness |
|---|---|---|---|---|---|---|---|
| 0 | When was the first super bowl? | The first superbowl was held on Jan 15, 1967 | [The First AFL–NFL World Championship Game was... | The first superbowl was held on January 15, 1967 | 0.0 | 1.0 | 0.749093 |
| 1 | Who won the most super bowls? | The most super bowls have been won by The New ... | [The Green Bay Packers...Green Bay, Wisconsin.... | The New England Patriots have won the Super Bo... | NaN | 0.0 | 0.731086 |
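The **answer_correctness** column combines two signals. As described in the Ragas documentation, an LLM sorts the statements in the answer and the ground truth into true positives (TP), false positives (FP), and false negatives (FN), from which a factual F1 of `TP / (TP + 0.5 * (FP + FN))` is computed; this is blended with the cosine similarity of the answer and ground-truth embeddings using default weights of 0.75 and 0.25. The sketch below shows only that arithmetic, with hypothetical function names and the statement counts and embedding vectors supplied by hand; weights and details may differ between Ragas versions.

```python
import math
from typing import Sequence, Tuple


def factual_f1(tp: int, fp: int, fn: int) -> float:
    """F1 over classified statements: TP / (TP + 0.5 * (FP + FN))."""
    denom = tp + 0.5 * (fp + fn)
    return tp / denom if denom else 0.0


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def answer_correctness_score(
    tp: int,
    fp: int,
    fn: int,
    answer_emb: Sequence[float],
    truth_emb: Sequence[float],
    weights: Tuple[float, float] = (0.75, 0.25),  # assumed defaults: (factual, semantic)
) -> float:
    """Weighted average of factual F1 and semantic similarity."""
    f1 = factual_f1(tp, fp, fn)
    sim = cosine_similarity(answer_emb, truth_emb)
    w_f1, w_sim = weights
    return (w_f1 * f1 + w_sim * sim) / (w_f1 + w_sim)


# One correct statement, one ground-truth fact missing, identical toy embeddings
print(answer_correctness_score(1, 0, 1, [1.0, 0.0], [1.0, 0.0]))
```

Because the factual part carries most of the weight, an answer that omits facts from the ground truth can score well below 1 even when its embedding is very close to the reference.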
