# Using Ragas to Evaluate the RAG Application

## Dependencies and API Keys

In [1]:
#!pip install -qU ragas==0.2.10

In [3]:
#!pip install -qU langchain-community==0.3.14 langchain-openai==0.2.14 unstructured==0.16.12 langgraph==0.2.61 langchain-qdrant==0.2.0

We'll also need to provide our API keys.

First, the OpenAI key, which covers both our LLM and our embedding model!

In [1]:
import os
from getpass import getpass
os.environ["OPENAI_API_KEY"] = getpass("Please enter your OpenAI API key!")

In [2]:
# Import the necessary Libraries
import json
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_openai.embeddings import OpenAIEmbeddings


## Generating Synthetic Test Data

We will be using Ragas to build out a set of synthetic test questions, references, and reference contexts. This is useful because it gives us a repeatable way to measure how well our system is performing.


### Data Preparation

Let's start by collecting our data into a useful pile!

In [5]:
# Upload Dataset-10k.zip, then unzip it into the dataset/ folder with the -d option
!unzip Dataset-10k.zip -d dataset

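Next, let's load our data into a familiar LangChain format. The `DirectoryLoader` cell below targets HTML files; the `PyPDFDirectoryLoader` cell after it loads PDFs and overwrites `docs`.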

In [8]:
from langchain_community.document_loaders import DirectoryLoader

path = "data/"
loader = DirectoryLoader(path, glob="*.html")
docs = loader.load()

In [8]:
# Provide pdf_folder_location
pdf_folder_location = "dataset"  # the folder Dataset-10k.zip was unzipped into
# Load the directory to pdf_loader
pdf_loader = PyPDFDirectoryLoader(pdf_folder_location)
docs = pdf_loader.load()


### Knowledge Graph Based Synthetic Generation

Ragas uses a knowledge-graph-based approach to create data. This is useful because it lets us create complex queries with little effort. The added test-set complexity lets us evaluate harder problems more effectively, since systems tend to perform well on simple evaluation tasks.
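
As a rough picture of the idea (a toy sketch, not Ragas's internal implementation), we can think of the knowledge graph as document chunks that get linked whenever they share enough entities. A multi-hop question can then be built from a linked pair. The chunk names, entity sets, and the 0.15 threshold below are all invented for illustration.

```python
from itertools import combinations

# Hypothetical chunks with hand-written entity sets (illustrative only).
nodes = {
    "chunk_a": {"Note 1", "leases", "accounting policies"},
    "chunk_b": {"Note 5", "leases", "operating lease liabilities"},
    "chunk_c": {"Azure", "cloud revenue"},
}

def jaccard(a: set, b: set) -> float:
    """Overlap score: shared items divided by all distinct items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_edges(nodes: dict, threshold: float = 0.15) -> list:
    """Link every pair of nodes whose entity overlap meets the threshold."""
    edges = []
    for (name_a, ents_a), (name_b, ents_b) in combinations(nodes.items(), 2):
        score = jaccard(ents_a, ents_b)
        if score >= threshold:
            edges.append((name_a, name_b, round(score, 2)))
    return edges

edges = build_edges(nodes)
print(edges)  # [('chunk_a', 'chunk_b', 0.2)]
```

Only `chunk_a` and `chunk_b` end up connected, so only that pair would be a candidate for a question that needs both chunks to answer.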

Let's start by defining our `generator_llm` (which will generate our questions, summaries, and more), and our `generator_embeddings` which will be useful in building our graph.

### Abstracted SDG

The above method is the full process - but we can shortcut that using the provided abstractions!

This will generate our knowledge graph under the hood, and will - from there - generate our personas and scenarios to construct our queries.



In [13]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [14]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(docs, testset_size=5)

Applying HeadlinesExtractor:   0%|          | 0/107 [00:00<?, ?it/s]

unable to apply transformation: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4.1 in organization org-raBMZk0ComSs6jAO0zww7ARn on tokens per min (TPM): Limit 30000, Used 29237, Requested 1721. Please try again in 1.916s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
unable to apply transformation: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4.1 in organization org-raBMZk0ComSs6jAO0zww7ARn on tokens per min (TPM): Limit 30000, Used 29044, Requested 1931. Please try again in 1.95s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
unable to apply transformation: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4.1 in organization org-raBMZk0ComSs6jAO0zww7ARn on tokens per min (TPM): Limit 30000, Used 29347, Requested 1745. Please try again in 2.184s. 

Applying HeadlineSplitter:   0%|          | 0/135 [00:00<?, ?it/s]

unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node
unable to ap

Applying SummaryExtractor:   0%|          | 0/176 [00:00<?, ?it/s]

unable to apply transformation: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4.1 in organization org-raBMZk0ComSs6jAO0zww7ARn on tokens per min (TPM): Limit 30000, Used 29247, Requested 1742. Please try again in 1.978s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
unable to apply transformation: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4.1 in organization org-raBMZk0ComSs6jAO0zww7ARn on tokens per min (TPM): Limit 30000, Used 29507, Requested 1663. Please try again in 2.34s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
unable to apply transformation: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4.1 in organization org-raBMZk0ComSs6jAO0zww7ARn on tokens per min (TPM): Limit 30000, Used 29487, Requested 1743. Please try again in 2.46s. V

Applying CustomNodeFilter:   0%|          | 0/60 [00:00<?, ?it/s]

Node 7edd94b5-028c-4970-bfac-1a543071dc7b does not have a summary. Skipping filtering.
Node 10673cec-02d9-47e2-a2ac-6bdc16e01adc does not have a summary. Skipping filtering.
Node afc75a8d-1431-48ee-8f67-2047ee87c23e does not have a summary. Skipping filtering.
Node 4f626d84-d51a-4386-ac40-53902a0778d6 does not have a summary. Skipping filtering.
Node d86d7b15-8647-4213-ae2f-ffceb9a550fc does not have a summary. Skipping filtering.
Node 8a451db6-d1b3-4ff3-bc33-cd3f6175f485 does not have a summary. Skipping filtering.
Node fe311487-13d4-4681-89c9-1f0cd238001a does not have a summary. Skipping filtering.
Node 5b979cd2-cccd-4bf1-b5c6-3e133e91a523 does not have a summary. Skipping filtering.
Node 4e3802da-60eb-415a-8bbf-1e593e30ff1c does not have a summary. Skipping filtering.
Node 43c7be9c-3041-439a-8799-e8b4c0b16a38 does not have a summary. Skipping filtering.
Node 579f69c4-949f-4b65-9664-2afa48e0982c does not have a summary. Skipping filtering.
Node 1f341758-cd32-4d48-8c9f-a7d38bac8c9b d

Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%|          | 0/296 [00:00<?, ?it/s]

Property 'summary_embedding' already exists in node '326b85'. Skipping!
Property 'summary_embedding' already exists in node 'af42a5'. Skipping!
Property 'summary_embedding' already exists in node 'f8d7f9'. Skipping!
Property 'summary_embedding' already exists in node 'f64cee'. Skipping!
Property 'summary_embedding' already exists in node '2884b6'. Skipping!
Property 'summary_embedding' already exists in node '40e777'. Skipping!
Property 'summary_embedding' already exists in node 'f38695'. Skipping!
Property 'summary_embedding' already exists in node '0dc75b'. Skipping!
Property 'summary_embedding' already exists in node '7e90b1'. Skipping!
Property 'summary_embedding' already exists in node '47d16f'. Skipping!
Property 'summary_embedding' already exists in node 'dd114d'. Skipping!
Property 'summary_embedding' already exists in node '0d3190'. Skipping!
Property 'summary_embedding' already exists in node 'da6c84'. Skipping!
Property 'summary_embedding' already exists in node '98061f'. Sk

Applying [CosineSimilarityBuilder, OverlapScoreBuilder]:   0%|          | 0/2 [00:00<?, ?it/s]

unable to apply transformation: Node 4f47a0c1-38a5-4f1d-952a-9aa237b937c3 has no summary_embedding


Generating personas:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Scenarios:   0%|          | 0/2 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/6 [00:00<?, ?it/s]
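
The `429` lines in the output above are rate-limit responses: the requests went over the tokens-per-minute limit for the model. A common general remedy is to retry with exponential backoff. The sketch below uses only the standard library; `RateLimitError` and `call_with_backoff` are hypothetical names made up for this example, not part of Ragas or the OpenAI client.

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 error."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(); on RateLimitError wait, then retry with a doubled delay."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injectable so the helper can be exercised without actually waiting.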

In [15]:
dataset.to_pandas()

Unnamed: 0,user_input,reference_contexts,reference,synthesizer_name
0,"What is the significance of WASHINGTON, D.C. i...",[UNITED STATES SECURITIES AND EXCHANGE COMMISS...,"WASHINGTON, D.C. is referenced as the location...",single_hop_specifc_query_synthesizer
1,"me need know what Rule 405 is for, like why th...",[Indicate by check mark if the registrant is a...,Rule 405 in this context is used to define if ...,single_hop_specifc_query_synthesizer
2,"what all them documents say about December 31,...",[Documents incorporated by reference: Portions...,Portions of IBM’s Annual Report to Stockholder...,single_hop_specifc_query_synthesizer
3,"Wut is Note 1 and Note 5 about, and how do the...","[<1-hop>\n\nAs of December 31, 2023 , our oper...","Note 1 relates to accounting policies, includi...",multi_hop_specific_query_synthesizer
4,How do the disclosures in Note 1 and Note 6 of...,[<1-hop>\n\nDuring the years ended December 31...,Note 1 of the Notes to Consolidated Financial ...,multi_hop_specific_query_synthesizer
5,what is Note 15 and Note 4 talk about in the a...,[<1-hop>\n\nDuring the years ended December 31...,Note 4 of the Notes to Consolidated Financial ...,multi_hop_specific_query_synthesizer


## LangChain RAG


### R - Retrieval

Let's start with building our retrieval pipeline, which will involve loading the same data we used to create our synthetic test set above.

In [12]:
path = "data/"
loader = DirectoryLoader(path, glob="*.html")
docs = loader.load()

In [16]:
# Provide pdf_folder_location
pdf_folder_location = "dataset"
# Load the directory to pdf_loader
pdf_loader = PyPDFDirectoryLoader(pdf_folder_location)
docs = pdf_loader.load()


Now that we have our data loaded, let's split it into chunks!

In [17]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
split_documents = text_splitter.split_documents(docs)
len(split_documents)

3657
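
To see what `chunk_size=500` and `chunk_overlap=20` mean, here is a simplified sketch. It cuts fixed-size character windows, whereas `RecursiveCharacterTextSplitter` first tries to split on separators such as paragraph and line breaks, so real chunks will usually be shorter and fall on more natural boundaries. The helper name `sliding_window_chunks` is made up for this example.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 20) -> list:
    """Cut text into fixed-size windows; each window starts with the
    last chunk_overlap characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

sample = "".join(str(i % 10) for i in range(1200))  # 1200 characters of dummy text
chunks = sliding_window_chunks(sample)
print([len(c) for c in chunks])           # [500, 500, 240]
print(chunks[0][-20:] == chunks[1][:20])  # True: the 20 overlapping characters
```

The overlap gives a sentence that straddles a chunk boundary a chance to appear intact in at least one chunk.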

Next up, we'll need to provide an embedding model that we can use to construct our vector store.

In [18]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

Now we can build our in-memory Qdrant vector store.

In [19]:
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

client = QdrantClient(":memory:")

client.create_collection(
    collection_name="10-k reports",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

vector_store = QdrantVectorStore(
    client=client,
    collection_name="10-k reports",
    embedding=embeddings,
)
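
The `size=1536` above matches the vector length produced by `text-embedding-3-small`, and `Distance.COSINE` means chunks are ranked by the angle between vectors rather than by their magnitude. A minimal pure-Python sketch of that score, using tiny made-up vectors instead of real embeddings:

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```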

We can now add our documents to our vector store.

In [20]:
_ = vector_store.add_documents(documents=split_documents)

Let's define our retriever.

In [21]:
retriever = vector_store.as_retriever(search_kwargs={"k": 5})
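
With `k` set to 5, the retriever returns the five chunks that score highest against the query. Conceptually that is a top-k selection over similarity scores, which can be sketched as follows. The scores and chunk ids are made up, and a real vector store would use an index rather than scoring every chunk.

```python
import heapq

def top_k(scores: dict, k: int = 5) -> list:
    """Return the k chunk ids with the highest similarity score, best first."""
    return heapq.nlargest(k, scores, key=scores.get)

# Hypothetical similarity scores for seven chunks.
scores = {"c1": 0.12, "c2": 0.87, "c3": 0.45, "c4": 0.91, "c5": 0.30, "c6": 0.66, "c7": 0.05}
print(top_k(scores))  # ['c4', 'c2', 'c6', 'c3', 'c5']
```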

Now we can produce a node for retrieval!

In [22]:
def retrieve(state):
  retrieved_docs = retriever.invoke(state["question"])
  return {"context" : retrieved_docs}

### Augmented

Let's create a simple RAG prompt!

In [23]:
from langchain.prompts import ChatPromptTemplate

RAG_PROMPT = """\
You are a helpful assistant who answers questions based on provided context. You must only use the provided context, and cannot use your own knowledge.

### Question
{question}

### Context
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)
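
To check what the model will actually receive, we can fill the same template with plain `str.format`. This only approximates `ChatPromptTemplate`, which additionally wraps the filled text in a chat message. The question and the two context chunks below are made up.

```python
RAG_PROMPT = """\
You are a helpful assistant who answers questions based on provided context. You must only use the provided context, and cannot use your own knowledge.

### Question
{question}

### Context
{context}
"""

# Made-up retrieved chunks, joined with blank lines between them.
chunks = ["First retrieved chunk.", "Second retrieved chunk."]
rendered = RAG_PROMPT.format(question="What is Rule 405?", context="\n\n".join(chunks))
print(rendered)
```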

### Generation

We'll also need an LLM to generate responses - we'll use `gpt-4o-mini` to avoid using the same model as our judge model.

In [24]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

Then we can create a `generate` node!

In [25]:
def generate(state):
  docs_content = "\n\n".join(doc.page_content for doc in state["context"])
  messages = rag_prompt.format_messages(question=state["question"], context=docs_content)
  response = llm.invoke(messages)
  return {"response" : response.content}

### Building RAG Graph with LangGraph

Let's create some state for our LangGraph RAG graph!

In [26]:
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict
from langchain_core.documents import Document

class State(TypedDict):
  question: str
  context: List[Document]
  response: str

Now we can build our simple graph!


In [27]:
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
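
Each node receives the current state and returns only the keys it wants to set, and the graph merges that partial update into the state before calling the next node. The sketch below mimics that flow in plain Python. It is a mental model rather than how LangGraph is implemented, and `run_sequence`, `fake_retrieve`, and `fake_generate` are names invented for this example.

```python
def run_sequence(nodes, state):
    """Run nodes in order, merging each node's partial update into the state."""
    state = dict(state)  # copy so the caller's dict is left untouched
    for node in nodes:
        state.update(node(state))
    return state

def fake_retrieve(state):
    # Stand-in for the retrieval node: sets only the "context" key.
    return {"context": [f"chunk about {state['question']}"]}

def fake_generate(state):
    # Stand-in for the generation node: reads "context", sets "response".
    return {"response": f"answer built from {len(state['context'])} chunk(s)"}

final = run_sequence([fake_retrieve, fake_generate], {"question": "Azure"})
print(final)
# {'question': 'Azure', 'context': ['chunk about Azure'], 'response': 'answer built from 1 chunk(s)'}
```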

Let's do a test to make sure it's doing what we'd expect.

In [28]:
response = graph.invoke({"question" : "What are Microsoft investment in Azure"})

In [29]:
response["response"]

"Microsoft's investment in Azure encompasses several key areas:\n\n1. **Infrastructure and Platform Services**: Azure revenue is primarily driven by infrastructure-as-a-service and platform-as-a-service offerings, which are consumption-based, alongside per-user services like Enterprise Mobility + Security.\n\n2. **AI Capabilities**: Microsoft is focused on enhancing its AI offerings within Azure, providing customers with tools to optimize their businesses and adapt to challenges. This investment in AI aims to unlock value from digital expenditures and drive innovation.\n\n3. **Azure Virtual Desktop and Windows 365**: These services together achieved over $1 billion in annual revenue for the first time, reflecting significant investment and growth in cloud-based solutions.\n\n4. **Sustainability Investments**: Microsoft is investing in more efficient datacenters, clean energy, and sustainability initiatives. Through its Climate Innovation Fund, the company has allocated over $700 millio

## Evaluating the App with Ragas

Now we can finally do our evaluation!

We'll start by running the queries we generated using SDG above through our application to get contexts and responses.

In [30]:
for test_row in dataset:
  response = graph.invoke({"question" : test_row.eval_sample.user_input})
  test_row.eval_sample.response = response["response"]
  test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]

In [31]:
dataset.to_pandas()

Unnamed: 0,user_input,retrieved_contexts,reference_contexts,response,reference,synthesizer_name
0,"What is the significance of WASHINGTON, D.C. i...",[UNITED STATES\nSECURITIES AND EXCHANGE COMMIS...,[UNITED STATES SECURITIES AND EXCHANGE COMMISS...,"The significance of WASHINGTON, D.C. in the co...","WASHINGTON, D.C. is referenced as the location...",single_hop_specifc_query_synthesizer
1,"me need know what Rule 405 is for, like why th...",[4. The registrant’s other certifying officer(...,[Indicate by check mark if the registrant is a...,The context provided does not explicitly menti...,Rule 405 in this context is used to define if ...,single_hop_specifc_query_synthesizer
2,"what all them documents say about December 31,...",[Item 6. [Reserved]\nItem 7. Management’s Disc...,[Documents incorporated by reference: Portions...,"The documents mention December 31, 2023, in th...",Portions of IBM’s Annual Report to Stockholder...,single_hop_specifc_query_synthesizer
3,"Wut is Note 1 and Note 5 about, and how do the...",[reference.\nThe instruments defining the righ...,"[<1-hop>\n\nAs of December 31, 2023 , our oper...",Note 1 and Note 5 discuss aspects related to l...,"Note 1 relates to accounting policies, includi...",multi_hop_specific_query_synthesizer
4,How do the disclosures in Note 1 and Note 6 of...,[the Notes to Consolidated Financial Statement...,[<1-hop>\n\nDuring the years ended December 31...,The disclosures in Note 1 and Note 6 of the No...,Note 1 of the Notes to Consolidated Financial ...,multi_hop_specific_query_synthesizer
5,what is Note 15 and Note 4 talk about in the a...,[For additional information about each line it...,[<1-hop>\n\nDuring the years ended December 31...,Note 4 and Note 15 in the annual report relate...,Note 4 of the Notes to Consolidated Financial ...,multi_hop_specific_query_synthesizer


Then we can convert that table into an `EvaluationDataset`, which will make the process of evaluation smoother.

In [32]:
from ragas import EvaluationDataset

evaluation_dataset = EvaluationDataset.from_pandas(dataset.to_pandas())

We'll need to select a judge model - in this case we're using the same model that was used to generate our synthetic data.

In [33]:
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1"))

Next up - we simply evaluate on our desired metrics!

In [34]:
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness, ResponseRelevancy, ContextEntityRecall, NoiseSensitivity
from ragas import evaluate, RunConfig

custom_run_config = RunConfig(timeout=360)

result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), ResponseRelevancy(), ContextEntityRecall(), NoiseSensitivity()],
    llm=evaluator_llm,
    run_config=custom_run_config
)
result

Evaluating:   0%|          | 0/36 [00:00<?, ?it/s]

Exception raised in Job[29]: RateLimitError(Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4.1 in organization org-raBMZk0ComSs6jAO0zww7ARn on tokens per min (TPM): Limit 30000, Used 29208, Requested 1839. Please try again in 2.094s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}})
Exception raised in Job[11]: RateLimitError(Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4.1 in organization org-raBMZk0ComSs6jAO0zww7ARn on tokens per min (TPM): Limit 30000, Used 29670, Requested 1669. Please try again in 2.678s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}})
Exception raised in Job[7]: RateLimitError(Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4.1 in organization org-raBMZk0ComSs6jAO0zww7ARn on tokens per min (TPM): Limit 30000, Used 29363, Reques

{'context_recall': 0.2778, 'faithfulness': 0.3958, 'factual_correctness': 0.3083, 'answer_relevancy': 0.6308, 'context_entity_recall': 0.1656, 'noise_sensitivity_relevant': 0.0370}
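
To help read scores like these: several of the metrics boil down to a ratio of judge verdicts. Context recall is roughly the share of claims in the reference that can be attributed to the retrieved contexts, and faithfulness is roughly the share of claims in the response that the retrieved contexts support. The sketch below shows only that arithmetic. The verdict lists are invented, and in Ragas the claims and verdicts come from the judge LLM.

```python
def supported_fraction(verdicts):
    """Share of claims judged as supported (True); None when there are no claims."""
    if not verdicts:
        return None
    return sum(verdicts) / len(verdicts)

# Hypothetical judge verdicts for a single sample.
reference_claims_found_in_context = [True, False, False]
response_claims_backed_by_context = [True, True, False, False]

context_recall = supported_fraction(reference_claims_found_in_context)
faithfulness = supported_fraction(response_claims_backed_by_context)
print(round(context_recall, 4), round(faithfulness, 4))  # 0.3333 0.5
```

Read this way, a low context recall points at retrieval (the needed chunks were not fetched), while a low faithfulness points at generation (the answer says things the fetched chunks do not support).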