# RAG Evaluation Test Set Generation

This example shows how to use the [Ragas](https://github.com/explodinggradients/ragas) framework to generate a **test set** for evaluating the quality of a retrieval-augmented generation (RAG) pipeline. We then use the Python [langchain](https://python.langchain.com/docs/get_started/introduction) library to run the test questions through this pipeline and evaluate the quality of the results.

**Requirements:**
- You will need an OpenAI access key, which requires a paid account you can sign up for at https://platform.openai.com/signup.
- After obtaining this key, store it in plain text in the `~/.openai.key` file in your home directory.
- (Optional) Upload some PDF files into the `source_documents` subfolder under this notebook. We have already provided some sample PDFs, but feel free to replace these with your own.

## Set up the RAG workflow environment

In [1]:
from datasets import Dataset 
from getpass import getpass
from langchain.chains import RetrievalQA
from langchain.document_loaders.pdf import PyPDFDirectoryLoader
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import EmbeddingsFilter
from langchain.schema import HumanMessage
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
import numpy as np
import os
from pathlib import Path

from ragas import evaluate
from ragas.metrics import faithfulness, answer_correctness, context_precision
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

Set up some helper functions:

In [2]:
def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )

Make sure other necessary items are in place:

In [3]:
try:
    # Read the key from ~/.openai.key and expose it to the OpenAI client libraries
    os.environ["OPENAI_API_KEY"] = (Path.home() / ".openai.key").read_text().strip()
except OSError as err:
    print(f"Could not read your OpenAI key. Please make sure this is available in plain text under your home directory in ~/.openai.key: {err}")

# Look for the source_documents folder and make sure it contains at least one PDF file
documents_path = "./source_documents"
if not os.path.isdir(documents_path):
    print(f"ERROR: The {documents_path} subfolder must exist under this notebook")
elif not any(name.lower().endswith(".pdf") for name in os.listdir(documents_path)):
    print(f"ERROR: The {documents_path} subfolder must contain at least one .pdf file")

## Generate a synthetic test set

Start by loading the documents we'll be using to augment our RAG generations.

In [4]:
loader = PyPDFDirectoryLoader(documents_path)
documents = loader.load()
for document in documents:
    document.metadata['file_name'] = document.metadata['source']

Now use OpenAI to generate a test set from the data in these documents. This takes 2-3 minutes.

In [5]:
# Create generator with openai models
generator = TestsetGenerator.with_openai()

# Generate the testset
testset = generator.generate_with_langchain_docs(
    documents, 
    test_size=10, 
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25}
)

embedding nodes:   0%|          | 0/118 [00:00<?, ?it/s]

Generating:   0%|          | 0/10 [00:00<?, ?it/s]
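
The `distributions` argument above gives fractions, while `test_size` is a whole number, so the fractions have to be turned into per-type question counts somehow. The sketch below shows one common way to do that, the largest-remainder rule. The helper name `allocate_counts` is hypothetical, and whether Ragas rounds this way internally is an assumption; the code only illustrates the arithmetic.

```python
import math

def allocate_counts(test_size, distributions):
    """Split test_size into whole-number counts using the largest-remainder rule."""
    if not math.isclose(sum(distributions.values()), 1.0):
        raise ValueError("distribution fractions must sum to 1")
    # Ideal (possibly fractional) share for each question type
    exact = {name: test_size * frac for name, frac in distributions.items()}
    counts = {name: math.floor(value) for name, value in exact.items()}
    leftover = test_size - sum(counts.values())
    # Hand the remaining slots to the largest fractional parts (ties keep insertion order)
    by_remainder = sorted(exact, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts

counts = allocate_counts(10, {"simple": 0.5, "reasoning": 0.25, "multi_context": 0.25})
print(counts)
```

With ten questions, the two 25% types cannot both receive 2.5 questions, so one of them ends up with three and the other with two.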

Preview the test dataset so far.

In [6]:
testset.to_pandas()

index,question,contexts,ground_truth,evolution_type,episode_done
0,What is the mission of the Vector Institute an...,[4 \nAnnual Report 2021–22 Vector Institute\n ...,The mission of the Vector Institute is to lead...,simple,True
1,How does Vector aim to ensure safe and secure ...,[ in Vector’s leading work \nin making health ...,Vector aims to ensure safe and secure research...,simple,True
2,What opportunities are available for AI-skille...,[28 \n MAKING CAREER \nCONNECTIONS FASTER WITH...,"On Vector's Digital Talent Hub, AI-skilled tal...",simple,True
3,What skills did participants gain in the Vecto...,[37 \nAnnual Report 2021–22 Vector Institute\n...,Participants in the Vector bootcamp gained new...,simple,True
4,How did the AI Engineering team contribute to ...,[36 \n \n \n CREATING IMPACT \nTHROUGH APPLIC...,The AI Engineering team contributed to creatin...,simple,True
5,How does the Vector Institute contribute to AI...,[4 \nAnnual Report 2021–22 Vector Institute\n ...,The Vector Institute contributes to AI-based i...,reasoning,True
6,How can the development and scaling of innovat...,[ in Vector’s leading work \nin making health ...,By fostering practical and enforceable framewo...,reasoning,True
7,What is the Vector Institute's contribution to...,[24 \nAnnual Report 2021–22 Vector Institute\n...,"The Vector Institute is helping to attract, de...",multi_context,True
8,How does Vector's leading work in health data ...,[ in Vector’s leading work \nin making health ...,Vector's leading work in health data accessibi...,multi_context,True
9,How does the Vector Institute contribute to th...,[4 \nAnnual Report 2021–22 Vector Institute\n ...,The Vector Institute contributes to the growth...,reasoning,True


# Now, start the RAG pipeline!

## Generate answers for all the questions in our test set

Go through the embedding, storage and retrieval steps.
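
The retrieval step mentioned above boils down to ranking stored chunk vectors by how close they are to the query vector and keeping the top `k`. The following is a minimal sketch of that idea in plain Python. The three-dimensional vectors and the function names are made up for illustration, and cosine similarity is an assumption: the FAISS store used in this notebook may rank by a different distance measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical toy "embeddings"; real embedding models return far more dimensions
store = [
    ("mission statement", [0.9, 0.1, 0.0]),
    ("health data", [0.1, 0.9, 0.1]),
    ("talent hub", [0.0, 0.2, 0.9]),
]
results = top_k([1.0, 0.2, 0.0], store, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```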

In [7]:
# Split the documents into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=250, chunk_overlap=100)
chunks = text_splitter.split_documents(documents)
print(f"Number of text chunks: {len(chunks)}")

# Define the embeddings model
print("Setting up the embeddings model...")
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

Number of text chunks: 569
Setting up the embeddings model...
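
To get a feel for what `chunk_size=250` and `chunk_overlap=100` mean, here is a simplified sliding-window chunker. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to split on separators such as paragraph and line breaks; the hypothetical `chunk_text` helper only illustrates how consecutive chunks share text.

```python
def chunk_text(text, chunk_size=250, chunk_overlap=100):
    """Cut text into fixed-size windows where neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks

# A 600-character dummy string stands in for a document page
sample = "".join(str(i % 10) for i in range(600))
chunks = chunk_text(sample)
print(len(chunks), [len(c) for c in chunks])
```

A larger overlap reduces the chance that a sentence relevant to a question is cut in half at a chunk boundary, at the cost of storing and embedding more chunks.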


Iterate over the questions in our synthetic test set and run each of them through the RAG pipeline to see what answers get returned. (This also takes 2-3 minutes.)

In [8]:
dataset = testset.to_dataset()
answers = np.empty(len(dataset), dtype=object)
llm = ChatOpenAI()

# Embed the chunks and build the vector store once, then reuse it for every query
vectorstore = FAISS.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 20})

# Run the queries via our llm (OpenAI) without reranking
qa = RetrievalQA.from_chain_type(llm=llm,
        chain_type="stuff",
        retriever=retriever)

for index, row in enumerate(dataset):
    query = row["question"]

    # Retrieve the most relevant context from the vector store and generate an answer
    answer = qa.run(query=query)
    print(f"Result {index}\nQuestion: {query}\nAnswer: {answer}\n")

    # Store the result
    answers[index] = answer

    # Let's skip the reranking in this example to keep it fast.
    # Remove the triple quotes around the following lines to see how reranking will affect the results.
    """
    embeddings_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
    compression_retriever = ContextualCompressionRetriever(
        base_compressor=embeddings_filter, base_retriever=retriever
    )
    compressed_docs = compression_retriever.get_relevant_documents(query)
    pretty_print_docs(compressed_docs)

    reranked_qa = RetrievalQA.from_chain_type(llm=llm,
        chain_type="stuff",
        retriever=compression_retriever)
    answer = reranked_qa.run(query=query)
    answers[index] = answer
    print(f"Reranked result {index}\nQuestion: {query}\nAnswer: {answer}\n")
    """


Result 0
Question: What is the mission of the Vector Institute and how does it relate to AI-based innovation and deep learning and machine learning?
Answer: The mission of the Vector Institute is to develop programming for Black and Indigenous students, postdoctoral fellows, and recent graduates to build research opportunities and expand career pathways in AI for historically underrepresented groups. This mission is aimed at fostering diversity and inclusion in the field of AI. Additionally, the institute provides infrastructure and software engineering expertise to support world-leading research, enabling researchers to push the frontiers of AI innovation. Vector also actively contributes expertise and insights on policy issues related to AI adoption to support the best interests of Ontarians and Canadians. Through initiatives like the Smart Health Initiative, Vector is demonstrating the power of machine learning in improving the quality and efficiency of health services. Additionally

Result 9
Question: How does the Vector Institute contribute to the growth and productivity of Canada by focusing on deep learning and machine learning, and what role does it play in supporting Canadian industry and public institutions in the use of AI?
Answer: The Vector Institute contributes to the growth and productivity of Canada by focusing on deep learning and machine learning through various initiatives. It provides learning and work experience in these fields to Black and Indigenous students, postdoctoral fellows, recent graduates, and other historically underrepresented groups. By expanding opportunities for talent in Canada, Vector helps in fostering economic growth and improving the lives of Canadians.

Additionally, Vector equips Canadian businesses to accelerate AI adoption and drive business value. It supports organizations in various sectors to apply AI effectively, ultimately strengthening the adoption and application of AI across the Canadian industry. Vector's collabor
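
The cell above relies on two ideas that are easy to picture in isolation: the optional filter drops retrieved chunks whose similarity to the query falls below a threshold (0.76 in the disabled block), and `chain_type="stuff"` places all remaining chunks into a single prompt. The sketch below illustrates both with hypothetical helpers and made-up scores. The prompt wording is invented and is not LangChain's actual template, and treating the threshold as inclusive is an assumption.

```python
def filter_by_threshold(scored_chunks, similarity_threshold=0.76):
    """Keep only (text, score) pairs whose score reaches the threshold."""
    return [(text, score) for text, score in scored_chunks if score >= similarity_threshold]

def build_stuffed_prompt(question, chunk_texts):
    """Concatenate every chunk into a single prompt (the "stuff" strategy)."""
    context = "\n\n".join(chunk_texts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical retrieval results with made-up similarity scores
scored = [
    ("chunk about the mission", 0.91),
    ("chunk about parking", 0.42),
    ("chunk about AI talent", 0.80),
]
kept = filter_by_threshold(scored)
prompt = build_stuffed_prompt("What is the mission?", [text for text, _ in kept])
print(prompt)
```

Because stuffing sends every retrieved chunk to the model at once, a large `k` with long chunks can exceed the model's context window; filtering first keeps the prompt shorter.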

Add the list of answers into our original dataset. Now we have a complete test set that is ready for evaluation.

In [9]:
dataset = dataset.add_column("answer", answers)

## Evaluate the results

In [10]:
# Preview the final test set
dataset.to_pandas()

index,question,contexts,ground_truth,evolution_type,episode_done,answer
0,What is the mission of the Vector Institute an...,[4 \nAnnual Report 2021–22 Vector Institute\n ...,The mission of the Vector Institute is to lead...,simple,True,The mission of the Vector Institute is to deve...
1,How does Vector aim to ensure safe and secure ...,[ in Vector’s leading work \nin making health ...,Vector aims to ensure safe and secure research...,simple,True,Vector aims to ensure safe and secure research...
2,What opportunities are available for AI-skille...,[28 \n MAKING CAREER \nCONNECTIONS FASTER WITH...,"On Vector's Digital Talent Hub, AI-skilled tal...",simple,True,Opportunities available for AI-skilled talent ...
3,What skills did participants gain in the Vecto...,[37 \nAnnual Report 2021–22 Vector Institute\n...,Participants in the Vector bootcamp gained new...,simple,True,Participants in the Vector bootcamp on Privacy...
4,How did the AI Engineering team contribute to ...,[36 \n \n \n CREATING IMPACT \nTHROUGH APPLIC...,The AI Engineering team contributed to creatin...,simple,True,The AI Engineering team contributed to creatin...
5,How does the Vector Institute contribute to AI...,[4 \nAnnual Report 2021–22 Vector Institute\n ...,The Vector Institute contributes to AI-based i...,reasoning,True,The Vector Institute contributes to AI-based i...
6,How can the development and scaling of innovat...,[ in Vector’s leading work \nin making health ...,By fostering practical and enforceable framewo...,reasoning,True,By fostering practical and enforceable framewo...
7,What is the Vector Institute's contribution to...,[24 \nAnnual Report 2021–22 Vector Institute\n...,"The Vector Institute is helping to attract, de...",multi_context,True,The Vector Institute plays a significant role ...
8,How does Vector's leading work in health data ...,[ in Vector’s leading work \nin making health ...,Vector's leading work in health data accessibi...,multi_context,True,Vector's leading work in health data accessibi...
9,How does the Vector Institute contribute to th...,[4 \nAnnual Report 2021–22 Vector Institute\n ...,The Vector Institute contributes to the growth...,reasoning,True,The Vector Institute contributes to the growth...


Run the evaluation query to score the results. In this evaluation, we are looking at the following metrics:
- **Faithfulness**: The generated answer is regarded as faithful if all the claims that are made in the answer can be inferred from the given context(s).
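- **Context Precision**: Are the contexts that are relevant to the question ranked near the top? Note that in this notebook the `contexts` column comes from the generated test set, not from our retriever, so this score describes the test set contexts rather than the retrieval step.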
- **Answer Correctness**: Was the generated answer actually correct? Was it complete?
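
To make the first two metrics more concrete, the sketch below computes them from yes/no judgements. It assumes the definitions given in the Ragas documentation: faithfulness as the share of answer claims supported by the context, and context precision as the mean of precision@k over the ranks that hold a relevant context. In Ragas the judgements themselves come from LLM calls; here they are hypothetical hard-coded booleans, and the function names are ours.

```python
def faithfulness_score(claim_supported):
    """Fraction of answer claims judged to be supported by the context."""
    if not claim_supported:
        return 0.0
    return sum(claim_supported) / len(claim_supported)

def context_precision_score(relevance):
    """Mean of precision@k over the ranks k that hold a relevant context."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision@k at this rank
    return precision_sum / hits if hits else 0.0

# Hypothetical verdicts: 6 of 7 claims supported; contexts at ranks 1 and 3 relevant
print(faithfulness_score([True] * 6 + [False]))
print(context_precision_score([True, False, True]))
```

A single relevant context at rank 1 gives a context precision of 1.0, while pushing relevant contexts further down the ranking lowers the score.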

In [11]:
score = evaluate(dataset, metrics=[faithfulness, context_precision, answer_correctness])
score.to_pandas()

Evaluating:   0%|          | 0/30 [00:00<?, ?it/s]

index,question,contexts,ground_truth,evolution_type,episode_done,answer,faithfulness,context_precision,answer_correctness
0,What is the mission of the Vector Institute an...,[4 \nAnnual Report 2021–22 Vector Institute\n ...,The mission of the Vector Institute is to lead...,simple,True,The mission of the Vector Institute is to deve...,0.0,1.0,0.768787
1,How does Vector aim to ensure safe and secure ...,[ in Vector’s leading work \nin making health ...,Vector aims to ensure safe and secure research...,simple,True,Vector aims to ensure safe and secure research...,1.0,1.0,0.405552
2,What opportunities are available for AI-skille...,[28 \n MAKING CAREER \nCONNECTIONS FASTER WITH...,"On Vector's Digital Talent Hub, AI-skilled tal...",simple,True,Opportunities available for AI-skilled talent ...,0.857143,1.0,0.815894
3,What skills did participants gain in the Vecto...,[37 \nAnnual Report 2021–22 Vector Institute\n...,Participants in the Vector bootcamp gained new...,simple,True,Participants in the Vector bootcamp on Privacy...,1.0,1.0,0.74863
4,How did the AI Engineering team contribute to ...,[36 \n \n \n CREATING IMPACT \nTHROUGH APPLIC...,The AI Engineering team contributed to creatin...,simple,True,The AI Engineering team contributed to creatin...,0.5,1.0,0.842657
5,How does the Vector Institute contribute to AI...,[4 \nAnnual Report 2021–22 Vector Institute\n ...,The Vector Institute contributes to AI-based i...,reasoning,True,The Vector Institute contributes to AI-based i...,0.75,1.0,0.672653
6,How can the development and scaling of innovat...,[ in Vector’s leading work \nin making health ...,By fostering practical and enforceable framewo...,reasoning,True,By fostering practical and enforceable framewo...,1.0,1.0,0.863141
7,What is the Vector Institute's contribution to...,[24 \nAnnual Report 2021–22 Vector Institute\n...,"The Vector Institute is helping to attract, de...",multi_context,True,The Vector Institute plays a significant role ...,1.0,1.0,0.781068
8,How does Vector's leading work in health data ...,[ in Vector’s leading work \nin making health ...,Vector's leading work in health data accessibi...,multi_context,True,Vector's leading work in health data accessibi...,0.777778,1.0,0.670394
9,How does the Vector Institute contribute to th...,[4 \nAnnual Report 2021–22 Vector Institute\n ...,The Vector Institute contributes to the growth...,reasoning,True,The Vector Institute contributes to the growth...,0.625,1.0,0.367064
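
Per-question scores are easier to interpret once they are averaged. The following sketch copies the three metric columns from the table above into plain Python and computes the mean of each metric overall and per `evolution_type`. The variable names are ours; inside the notebook, a pandas `groupby("evolution_type")` on `score.to_pandas()` would likely be a more direct route.

```python
from collections import defaultdict
from statistics import mean

# (evolution_type, faithfulness, context_precision, answer_correctness), copied from the table above
rows = [
    ("simple", 0.0, 1.0, 0.768787),
    ("simple", 1.0, 1.0, 0.405552),
    ("simple", 0.857143, 1.0, 0.815894),
    ("simple", 1.0, 1.0, 0.74863),
    ("simple", 0.5, 1.0, 0.842657),
    ("reasoning", 0.75, 1.0, 0.672653),
    ("reasoning", 1.0, 1.0, 0.863141),
    ("multi_context", 1.0, 1.0, 0.781068),
    ("multi_context", 0.777778, 1.0, 0.670394),
    ("reasoning", 0.625, 1.0, 0.367064),
]
metric_names = ["faithfulness", "context_precision", "answer_correctness"]

# Mean of each metric over all ten questions
overall = {name: mean(row[i + 1] for row in rows) for i, name in enumerate(metric_names)}

# Mean of each metric within each evolution type
by_type = defaultdict(list)
for evolution_type, *scores in rows:
    by_type[evolution_type].append(scores)
per_type = {
    evolution_type: {name: mean(col) for name, col in zip(metric_names, zip(*score_rows))}
    for evolution_type, score_rows in by_type.items()
}

print(overall)
for evolution_type, means in per_type.items():
    print(evolution_type, means)
```

With only ten questions, and as few as two per evolution type, these averages are rough indicators; a larger `test_size` would give more stable numbers.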
