# RAG Evaluation Test Set Generation

This example shows how to use the [Ragas](https://github.com/explodinggradients/ragas) framework to generate a **test set** that can be used to evaluate the quality of a RAG pipeline. We then use the Python [langchain](https://python.langchain.com/docs/get_started/introduction) library to run some requests through this pipeline and we evaluate the quality of the results.

**Requirements:**
- You will need an OpenAI API key, which you can create at https://platform.openai.com/api-keys. A free trial account will suffice, but will be limited to a small number of requests.
- After obtaining this key, store it in plain text in the `~/.openai.key` file in your home directory.
- (Optional) Upload some PDF files into the `source_documents` subfolder under this notebook. We have already provided some sample PDFs, but feel free to replace these with your own.

## Set up the RAG workflow environment

In [1]:
from datasets import Dataset 
from getpass import getpass
from langchain.chains import RetrievalQA
from langchain.document_loaders.pdf import PyPDFDirectoryLoader
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import EmbeddingsFilter
from langchain.schema import HumanMessage
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
import numpy as np
import os
from pathlib import Path

from ragas import evaluate
from ragas.metrics import faithfulness, answer_correctness, context_precision
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

Set up some helper functions:

In [2]:
def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )

Make sure other necessary items are in place:

In [3]:
try:
    # read_text() opens and closes the key file for us
    os.environ["OPENAI_API_KEY"] = (Path.home() / ".openai.key").read_text().strip()
except Exception as err:
    print(f"Could not read your OpenAI key. Please make sure this is available in plain text under your home directory in ~/.openai.key: {err}")

# Look for the source_documents folder and make sure there is at least 1 pdf file here
documents_path = "./source_documents"
if not os.path.isdir(documents_path):
    # Check this first: listing a missing folder would raise FileNotFoundError
    print(f"ERROR: The {documents_path} subfolder must exist under this notebook")
elif not any(name.lower().endswith(".pdf") for name in os.listdir(documents_path)):
    print(f"ERROR: The {documents_path} subfolder must contain at least one .pdf file")
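
The imports above include `getpass`, which the cells never use. One possible use, sketched here as an assumption rather than as part of the original workflow, is to fall back to an interactive prompt when the key file is missing or empty. The helper name `load_openai_key` and the demo key values are hypothetical.

```python
import tempfile
from getpass import getpass
from pathlib import Path


def load_openai_key(key_path=None, prompt=getpass):
    """Return the key stored in key_path, or ask for it if the file is missing or empty.

    Hypothetical helper: the default path mirrors the ~/.openai.key convention
    used in this notebook.
    """
    key_path = Path(key_path) if key_path else Path.home() / ".openai.key"
    try:
        key = key_path.read_text().strip()
    except OSError:
        key = ""
    if not key:
        # getpass hides the typed key so it does not end up in the notebook output
        key = prompt("OpenAI API key: ").strip()
    return key


# Demonstration against a temporary directory, with a stand-in for the prompt
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.key"
    demo_path.write_text("  sk-hypothetical-demo-key\n")
    demo_key = load_openai_key(demo_path)
    missing_key = load_openai_key(Path(tmp) / "missing.key", prompt=lambda _: "typed-key")

print(demo_key, missing_key)
```

In the notebook this would be used as `os.environ["OPENAI_API_KEY"] = load_openai_key()`.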

## Generate a synthetic test set

Start by loading in the documents we'll be using to augment our RAG generations.

In [4]:
loader = PyPDFDirectoryLoader(documents_path)
documents = loader.load()
for document in documents:
    document.metadata['file_name'] = document.metadata['source']

Now use OpenAI to generate a test set from the data in these documents.

In [5]:
# Create generator with openai models
generator = TestsetGenerator.with_openai()

# Generate the testset
testset = generator.generate_with_langchain_docs(
    documents, 
    test_size=10, 
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25}
)

Preview the test dataset so far.

In [6]:
testset.to_pandas()

| | question | contexts | ground_truth | evolution_type | episode_done |
|---|---|---|---|---|---|
| 0 | What is the mission of the Vector Institute an... | [4 \nAnnual Report 2021–22 Vector Institute\n ... | The mission of the Vector Institute is to lead... | simple | True |
| 1 | How does Vector aim to ensure safe and secure ... | [ in Vector’s leading work \nin making health ... | Vector aims to ensure safe and secure research... | simple | True |
| 2 | What is the purpose of Vector's annual AI Summ... | [28 \n MAKING CAREER \nCONNECTIONS FASTER WITH... | The purpose of Vector's annual AI Summit and C... | simple | True |
| 3 | How did the addition of full-time AI Engineeri... | [36 \n \n \n CREATING IMPACT \nTHROUGH APPLIC... | The addition of full-time AI Engineering resou... | simple | True |
| 4 | What skills did participants gain in the Vecto... | [37 \nAnnual Report 2021–22 Vector Institute\n... | Participants in the Vector bootcamp gained new... | simple | True |
| 5 | Who provides funding for the Vector Institute ... | [4 \nAnnual Report 2021–22 Vector Institute\n ... | The Vector Institute is funded by the Governme... | reasoning | True |
| 6 | How does Vector's leading work in making healt... | [ in Vector’s leading work \nin making health ... | Vector's leading work in making health data mo... | reasoning | True |
| 7 | "What is the Vector Institute's mission in dee... | [4 \nAnnual Report 2021–22 Vector Institute\n ... | The Vector Institute's mission is to lead Onta... | multi_context | True |
| 8 | How does Vector's approach to data governance ... | [ in Vector’s leading work \nin making health ... | Vector's approach to data governance contribut... | multi_context | True |
| 9 | How does the Vector Institute contribute to AI... | [4 \nAnnual Report 2021–22 Vector Institute\n ... | The Vector Institute contributes to AI-based i... | reasoning | True |
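
As a quick sanity check, the `evolution_type` column in the preview above can be tallied against the `distributions` argument passed to the generator. The sketch below copies the ten values by hand from the preview rather than reading them from `testset`, so it runs on its own. It makes no claim about how Ragas rounds the requested shares internally.

```python
from collections import Counter

# evolution_type values copied from rows 0-9 of the preview above
evolution_types = [
    "simple", "simple", "simple", "simple", "simple",
    "reasoning", "reasoning", "multi_context", "multi_context", "reasoning",
]

# Shares requested in the generate_with_langchain_docs call
requested = {"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25}

counts = Counter(evolution_types)
for name, share in requested.items():
    observed = counts[name] / len(evolution_types)
    print(f"{name}: requested {share:.2f}, observed {observed:.2f} ({counts[name]} questions)")
```

With only ten questions, a 25% share cannot be met exactly, so some deviation between requested and observed shares is expected.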


# Now, start the RAG pipeline!

## Generate answers for all the questions in our test set

Go through the embedding, storage and retrieval steps.

In [7]:
# Split the documents into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=100)
chunks = text_splitter.split_documents(documents)
print(f"Number of text chunks: {len(chunks)}")

# Define the embeddings model
print("Setting up the embeddings model...")
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

Number of text chunks: 569
Setting up the embeddings model...


Iterate over the questions in our synthetic test set and run each one through the RAG pipeline to see what answers are returned.

In [8]:
dataset = testset.to_dataset()
answers = np.empty(len(dataset), dtype=object)
llm = ChatOpenAI()

# Build the vector store and retriever once, outside the loop, so the chunks
# are embedded a single time instead of once per question
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 20})

# Set up the question-answering chain with our llm (OpenAI), without reranking
qa = RetrievalQA.from_chain_type(llm=llm,
        chain_type="stuff",
        retriever=retriever)

for index, row in enumerate(dataset):
    query = row["question"]

    # Retrieve the most relevant context for the query and generate an answer
    answer = qa.run(query=query)
    print(f"Result {index}\nQuestion: {query}\nAnswer: {answer}\n")

    # Store the result
    answers[index] = answer

    # Let's skip the reranking in this example to keep it fast.
    # Remove the triple quotes around the following lines to see how reranking will affect the results.
    """
    embeddings_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
    compression_retriever = ContextualCompressionRetriever(
        base_compressor=embeddings_filter, base_retriever=retriever
    )
    compressed_docs = compression_retriever.get_relevant_documents(query)
    pretty_print_docs(compressed_docs)

    reranked_qa = RetrievalQA.from_chain_type(llm=llm,
        chain_type="stuff",
        retriever=compression_retriever)
    answer = reranked_qa.run(query=query)
    answers[index] = answer
    print(f"Reranked result {index}\nQuestion: {query}\nAnswer: {answer}\n")
    """

Result 0
Question: What is the mission of the Vector Institute and how does it relate to deep learning and machine learning?
Answer: The mission of the Vector Institute is to drive excellence and leadership in Canada's knowledge, creation, and use of AI to foster economic growth and improve the lives of Canadians. The institute is focused on advancing the frontiers of AI knowledge through research, education, and engineering new applications of AI in various industries. They work with a team of applied machine learning experts to push the boundaries of AI innovation and provide learning opportunities for students, postdoctoral fellows, and recent graduates in machine learning and deep learning. Additionally, Vector Institute is committed to developing programming for Black and Indigenous students to expand career pathways in AI for historically underrepresented groups, aiming towards greater inclusion in the field of AI.

Result 1
Question: How does Vector aim to ensure safe and secure
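
The retrieval step in the cell above keeps the `k` closest chunks for each question, and the optional `EmbeddingsFilter` drops chunks whose similarity to the query falls below `similarity_threshold`. The toy sketch below shows both ideas with cosine similarity on hand-made two-dimensional vectors. It is an assumption-laden illustration: the vectors and the `top_k` helper are hypothetical, real embeddings have far more dimensions, and the FAISS index may rank by a different distance measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=20, threshold=None):
    """Return up to k (similarity, index) pairs, most similar first.

    If threshold is given, pairs scoring below it are dropped before truncation.
    """
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if threshold is not None:
        scored = [pair for pair in scored if pair[0] >= threshold]
    return scored[:k]


# Toy "embeddings": one query and four chunks
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]

print(top_k(query, docs, k=2))                   # the two closest chunks
print(top_k(query, docs, k=20, threshold=0.76))  # only chunks above the threshold
```

A high `k` with no threshold passes more text to the LLM, which can help recall but also adds loosely related context to the prompt.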

Add the list of answers into our original dataset. Now we have a complete test set that is ready for evaluation.

In [9]:
dataset = dataset.add_column("answer", answers)

## Evaluate the results

In [10]:
# Preview the final dataset
dataset.to_pandas()

| | question | contexts | ground_truth | evolution_type | episode_done | answer |
|---|---|---|---|---|---|---|
| 0 | What is the mission of the Vector Institute an... | [4 \nAnnual Report 2021–22 Vector Institute\n ... | The mission of the Vector Institute is to lead... | simple | True | The mission of the Vector Institute is to driv... |
| 1 | How does Vector aim to ensure safe and secure ... | [ in Vector’s leading work \nin making health ... | Vector aims to ensure safe and secure research... | simple | True | Vector aims to ensure safe and secure research... |
| 2 | What is the purpose of Vector's annual AI Summ... | [28 \n MAKING CAREER \nCONNECTIONS FASTER WITH... | The purpose of Vector's annual AI Summit and C... | simple | True | The purpose of Vector's annual AI Summit and C... |
| 3 | How did the addition of full-time AI Engineeri... | [36 \n \n \n CREATING IMPACT \nTHROUGH APPLIC... | The addition of full-time AI Engineering resou... | simple | True | The addition of new full-time AI Engineering r... |
| 4 | What skills did participants gain in the Vecto... | [37 \nAnnual Report 2021–22 Vector Institute\n... | Participants in the Vector bootcamp gained new... | simple | True | Participants in the Vector bootcamp on Privacy... |
| 5 | Who provides funding for the Vector Institute ... | [4 \nAnnual Report 2021–22 Vector Institute\n ... | The Vector Institute is funded by the Governme... | reasoning | True | The Vector Institute is funded by the Governme... |
| 6 | How does Vector's leading work in making healt... | [ in Vector’s leading work \nin making health ... | Vector's leading work in making health data mo... | reasoning | True | Vector's leading work in making health data mo... |
| 7 | "What is the Vector Institute's mission in dee... | [4 \nAnnual Report 2021–22 Vector Institute\n ... | The Vector Institute's mission is to lead Onta... | multi_context | True | The Vector Institute's mission in deep learnin... |
| 8 | How does Vector's approach to data governance ... | [ in Vector’s leading work \nin making health ... | Vector's approach to data governance contribut... | multi_context | True | Vector's approach to data governance aims to m... |
| 9 | How does the Vector Institute contribute to AI... | [4 \nAnnual Report 2021–22 Vector Institute\n ... | The Vector Institute contributes to AI-based i... | reasoning | True | The Vector Institute contributes to AI-based i... |


Run the evaluation query to score the results. In this evaluation, we are looking at the following metrics:
- **Faithfulness**: The generated answer is regarded as faithful if all the claims that are made in the answer can be inferred from the given context(s).
- **Context Precision**: Did our retriever return good results that matched the question it was being asked?
- **Answer Correctness**: Was the generated answer actually correct? Was it complete?
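
The faithfulness definition above boils down to a ratio: the number of claims in the answer that the context supports, divided by the total number of claims. The sketch below computes that ratio from hand-supplied verdicts. In Ragas the claims are extracted and judged by an LLM, so the `faithfulness_score` helper and the verdict lists here are purely hypothetical stand-ins.

```python
def faithfulness_score(claim_supported):
    """Fraction of claims marked as supported by the context.

    claim_supported is a list of booleans, one per claim found in the answer.
    """
    if not claim_supported:
        raise ValueError("at least one claim is needed to compute a score")
    return sum(claim_supported) / len(claim_supported)


# Hypothetical verdicts: four of five claims can be inferred from the context
mostly_faithful = faithfulness_score([True, True, True, True, False])

# Hypothetical verdicts: only one of eight claims is supported
mostly_unsupported = faithfulness_score([True] + [False] * 7)

print(mostly_faithful, mostly_unsupported)
```

Read this way, a low faithfulness score means the answer contains many statements that go beyond the supplied context, even if those statements happen to be true.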

In [11]:
score = evaluate(dataset, metrics=[faithfulness,context_precision,answer_correctness])
score.to_pandas()

| | question | contexts | ground_truth | evolution_type | episode_done | answer | faithfulness | context_precision | answer_correctness |
|---|---|---|---|---|---|---|---|---|---|
| 0 | What is the mission of the Vector Institute an... | [4 \nAnnual Report 2021–22 Vector Institute\n ... | The mission of the Vector Institute is to lead... | simple | True | The mission of the Vector Institute is to driv... | 0.8 | 1.0 | 0.787471 |
| 1 | How does Vector aim to ensure safe and secure ... | [ in Vector’s leading work \nin making health ... | Vector aims to ensure safe and secure research... | simple | True | Vector aims to ensure safe and secure research... | 0.833333 | 1.0 | 0.421652 |
| 2 | What is the purpose of Vector's annual AI Summ... | [28 \n MAKING CAREER \nCONNECTIONS FASTER WITH... | The purpose of Vector's annual AI Summit and C... | simple | True | The purpose of Vector's annual AI Summit and C... | 0.125 | 1.0 | 0.804268 |
| 3 | How did the addition of full-time AI Engineeri... | [36 \n \n \n CREATING IMPACT \nTHROUGH APPLIC... | The addition of full-time AI Engineering resou... | simple | True | The addition of new full-time AI Engineering r... | 1.0 | 1.0 | 0.621038 |
| 4 | What skills did participants gain in the Vecto... | [37 \nAnnual Report 2021–22 Vector Institute\n... | Participants in the Vector bootcamp gained new... | simple | True | Participants in the Vector bootcamp on Privacy... | 1.0 | 1.0 | 0.743473 |
| 5 | Who provides funding for the Vector Institute ... | [4 \nAnnual Report 2021–22 Vector Institute\n ... | The Vector Institute is funded by the Governme... | reasoning | True | The Vector Institute is funded by the Governme... | 0.75 | 1.0 | 0.748992 |
| 6 | How does Vector's leading work in making healt... | [ in Vector’s leading work \nin making health ... | Vector's leading work in making health data mo... | reasoning | True | Vector's leading work in making health data mo... | 0.666667 | 1.0 | 0.491426 |
| 7 | "What is the Vector Institute's mission in dee... | [4 \nAnnual Report 2021–22 Vector Institute\n ... | The Vector Institute's mission is to lead Onta... | multi_context | True | The Vector Institute's mission in deep learnin... | 0.714286 | 1.0 | 0.708225 |
| 8 | How does Vector's approach to data governance ... | [ in Vector’s leading work \nin making health ... | Vector's approach to data governance contribut... | multi_context | True | Vector's approach to data governance aims to m... | 0.666667 | 1.0 | 0.715637 |
| 9 | How does the Vector Institute contribute to AI... | [4 \nAnnual Report 2021–22 Vector Institute\n ... | The Vector Institute contributes to AI-based i... | reasoning | True | The Vector Institute contributes to AI-based i... | 0.428571 | 1.0 | 0.821558 |
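
Per-row scores are easier to act on once they are summarized. The sketch below copies the `faithfulness` and `answer_correctness` columns by hand from the results above (`context_precision` is 1.0 in every row, so it is left out), then computes the mean of each metric and the row with the weakest faithfulness. In the notebook the same numbers could be read from `score.to_pandas()` instead; the variable names here are hypothetical.

```python
from statistics import mean

# Values copied from the evaluation results above, rows 0-9
faithfulness_scores = [0.8, 0.833333, 0.125, 1.0, 1.0,
                       0.75, 0.666667, 0.714286, 0.666667, 0.428571]
answer_correctness_scores = [0.787471, 0.421652, 0.804268, 0.621038, 0.743473,
                             0.748992, 0.491426, 0.708225, 0.715637, 0.821558]

mean_faithfulness = mean(faithfulness_scores)
mean_answer_correctness = mean(answer_correctness_scores)

# Index of the row whose answer is least supported by its contexts
weakest_row = min(range(len(faithfulness_scores)), key=faithfulness_scores.__getitem__)

print(f"Mean faithfulness: {mean_faithfulness:.3f}")
print(f"Mean answer correctness: {mean_answer_correctness:.3f}")
print(f"Lowest faithfulness: row {weakest_row} ({faithfulness_scores[weakest_row]})")
```

Rows that score low on one metric but high on another are good candidates for manual inspection, since they show where the metrics disagree about the same answer.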
