# Basic RAG Eval

Example of how to evaluate a RAG system using the [ragas](https://github.com/explodinggradients/ragas) evaluation tool. Uses `gpt-3.5` as the default model for evaluation.

Tutorial adapted from: https://docs.ragas.io/en/stable/getstarted/evaluation.html#get-started-evaluation

<a href="https://colab.research.google.com/github/dair-ai/maven-pe-for-llms-8/blob/main/demos/session-3/rag-eval.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%capture
# update or install the necessary libraries
!pip install --upgrade openai
!pip install --upgrade python-dotenv
!pip install --upgrade langchain
!pip install datasets
!pip install ragas
!pip install pandas

In [8]:
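import os

import pandas as pd
from dotenv import load_dotenv

# load variables from a local .env file into the environment
load_dotenv()

# fail early with a clear message if the API key is missing
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")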

Get some data using the `datasets` library:

In [3]:
from datasets import load_dataset

fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")
fiqa_eval


DatasetDict({
    baseline: Dataset({
        features: ['question', 'ground_truths', 'answer', 'contexts'],
        num_rows: 30
    })
})

A summary of the information you need to run the evaluation:

- question: These are the questions your RAG pipeline will be evaluated on.

- answer: The answer generated from the RAG pipeline and given to the user.

- contexts: The contexts that were passed to the LLM to answer the question.

- ground_truths: The ground-truth answers to the questions (only required if you use `context_recall`).
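As a quick illustration of that schema, the sketch below checks that a single record has the expected fields before it is handed to the evaluator. `validate_record`, `REQUIRED_FIELDS`, and `OPTIONAL_FIELDS` are hypothetical helpers written for this note and are not part of ragas; the field names mirror the dataset features shown above, and the record itself is a toy example invented for illustration.

```python
# Hypothetical helpers (not part of ragas): sanity-check one evaluation record.
REQUIRED_FIELDS = {
    "question": str,   # the question asked
    "answer": str,     # the answer generated by the RAG pipeline
    "contexts": list,  # the retrieved passages passed to the LLM
}
OPTIONAL_FIELDS = {
    "ground_truths": list,  # only needed when context_recall is used
}


def validate_record(record, need_ground_truths=False):
    """Return a list of problems found in `record` (an empty list means OK)."""
    expected = dict(REQUIRED_FIELDS)
    if need_ground_truths:
        expected.update(OPTIONAL_FIELDS)

    problems = []
    for field, expected_type in expected.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field} should be a {expected_type.__name__}")
    return problems


# Toy record, invented for illustration only.
record = {
    "question": "What is a third-party cheque?",
    "answer": "A cheque endorsed by the payee to someone else.",
    "contexts": ["A third-party cheque is one signed over to another person."],
}

print(validate_record(record))                           # no ground truth required
print(validate_record(record, need_ground_truths=True))  # context_recall requires it
```

A check like this is cheap to run over every row and can catch a missing column before any LLM calls are made.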

In [14]:
fiqa_eval["baseline"].to_pandas().head()

| | question | ground_truths | answer | contexts |
|---|---|---|---|---|
| 0 | How to deposit a cheque issued to an associate... | [Have the check reissued to the proper payee.J... | \nThe best way to deposit a cheque issued to a... | [Just have the associate sign the back and the... |
| 1 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | \nYes, you can send a money order from USPS as... | [Sure you can. You can fill in whatever you w... |
| 2 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | \nYes, it is possible to have one EIN doing bu... | [You're confusing a lot of things here. Compan... |
| 3 | Applying for and receiving business credit | ["I'm afraid the great myth of limited liabili... | \nApplying for and receiving business credit c... | [Set up a meeting with the bank that handles y... |
| 4 | 401k Transfer After Business Closure | [You should probably consult an attorney. Howe... | \nIf your employer has closed and you need to ... | [The time horizon for your 401K/IRA is essenti... |


In [20]:
fiqa_eval["baseline"].to_pandas().head()[:1].to_dict()

{'question': {0: 'How to deposit a cheque issued to an associate in my business into my business account?'},
 'ground_truths': {0: array(["Have the check reissued to the proper payee.Just have the associate sign the back and then deposit it.  It's called a third party cheque and is perfectly legal.  I wouldn't be surprised if it has a longer hold period and, as always, you don't get the money if the cheque doesn't clear. Now, you may have problems if it's a large amount or you're not very well known at the bank.  In that case you can have the associate go to the bank and endorse it in front of the teller with some ID.  You don't even technically have to be there.  Anybody can deposit money to your account if they have the account number. He could also just deposit it in his account and write a cheque to the business."],
        dtype=object)},
 'answer': {0: '\nThe best way to deposit a cheque issued to an associate in your business into your business account is to open a business acco

### Evaluation

![alt text](ragas.png)

Retriever: ragas offers `context_precision` and `context_recall`, which measure the performance of your retrieval system.

Generator (LLM): ragas offers `faithfulness`, which measures hallucinations, and `answer_relevancy`, which measures how directly the answer addresses the question.

The main metrics:

- `faithfulness` - the factual consistency of the answer with the retrieved context.

- `answer_relevancy` - a measure of how relevant the answer is to the question.

- `context_precision` - a measure of how relevant the retrieved context is to the question. It reflects the quality of the retrieval pipeline.

- `context_recall` - measures the ability of the retriever to retrieve all the necessary information needed to answer the question.
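To make the two ratio-style metrics more concrete, here is a simplified sketch. It assumes, following the ragas documentation, that faithfulness is the fraction of claims in the answer that are supported by the retrieved context, and that context recall is the fraction of ground-truth statements that can be attributed to the context. In ragas the per-statement verdicts come from an LLM; here they are hand-labelled booleans invented for illustration, and `supported_fraction` is a hypothetical helper, not a ragas function.

```python
def supported_fraction(verdicts):
    """Fraction of True verdicts; returns 0.0 for an empty list."""
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)


# Hand-labelled toy verdicts (assumed, not produced by ragas).
# Each entry answers: "is this answer claim supported by the context?"
answer_claims_supported = [True, True, False]
# Each entry answers: "can this ground-truth statement be found in the context?"
truth_statements_found = [True, True, True, False]

faithfulness_score = supported_fraction(answer_claims_supported)
context_recall_score = supported_fraction(truth_statements_found)

print(round(faithfulness_score, 4))    # 2 of 3 claims supported
print(round(context_recall_score, 4))  # 3 of 4 statements found
```

Both scores fall between 0 and 1, with higher values being better, which matches the range of the scores reported by the evaluation run in this notebook.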

In [21]:
# the selected metrics
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)

In [22]:
from ragas import evaluate

result = evaluate(
    fiqa_eval["baseline"].select(range(3)),  # evaluate only the first 3 rows to keep the run short
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
    ],
)

result

evaluating with [context_precision]
100%|██████████| 1/1 [00:11<00:00, 11.70s/it]

evaluating with [faithfulness]
100%|██████████| 1/1 [01:06<00:00, 66.56s/it]

evaluating with [answer_relevancy]
100%|██████████| 1/1 [00:07<00:00,  7.25s/it]

evaluating with [context_recall]
100%|██████████| 1/1 [02:40<00:00, 160.93s/it]


{'context_precision': 0.3333, 'faithfulness': 0.9444, 'answer_relevancy': 0.9756, 'context_recall': 0.9524}

Analysis: convert the result to a DataFrame to inspect the per-question scores.

In [24]:
df = result.to_pandas()
df

| | question | ground_truths | answer | contexts | context_precision | faithfulness | answer_relevancy | context_recall |
|---|---|---|---|---|---|---|---|---|
| 0 | How to deposit a cheque issued to an associate... | [Have the check reissued to the proper payee.J... | \nThe best way to deposit a cheque issued to a... | [Just have the associate sign the back and the... | 1.0 | 1.0 | 0.982502 | 0.857143 |
| 1 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | \nYes, you can send a money order from USPS as... | [Sure you can. You can fill in whatever you w... | 0.0 | 0.833333 | 0.995249 | 1.0 |
| 2 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | \nYes, it is possible to have one EIN doing bu... | [You're confusing a lot of things here. Compan... | 0.0 | 1.0 | 0.94892 | 1.0 |
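The aggregate scores printed by `evaluate` appear to be the mean of the per-question scores in this table. The sketch below recomputes those means from the values shown and lists the rows that fall below a cutoff. The `THRESHOLD` value of 0.5 is an arbitrary assumption chosen for illustration, not a ragas recommendation.

```python
from statistics import mean

# Per-question scores copied from the table above (rows 0, 1, 2).
scores = {
    "context_precision": [1.0, 0.0, 0.0],
    "faithfulness": [1.0, 0.833333, 1.0],
    "answer_relevancy": [0.982502, 0.995249, 0.94892],
    "context_recall": [0.857143, 1.0, 1.0],
}

# Mean of each metric over the three questions, rounded to 4 decimals.
averages = {name: round(mean(values), 4) for name, values in scores.items()}
print(averages)

# Row indices that score below an arbitrary cutoff (assumption for illustration).
THRESHOLD = 0.5
low_rows = {
    name: [i for i, value in enumerate(values) if value < THRESHOLD]
    for name, values in scores.items()
}
print(low_rows)
```

In this run only `context_precision` has rows below the cutoff (rows 1 and 2), which suggests the retrieval step is the first place to look when improving this pipeline.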

