# 📊 RAG System Evaluation

**Purpose:**  
Evaluate the end-to-end Retrieval-Augmented Generation (RAG) pipeline using Ragas metrics (Faithfulness, Answer Relevancy, Context Precision & Recall) on a fixed set of professor-supplied questions.

**Contents:**
1. Environment setup & imports  
2. Instantiate production RAG generator (placeholder)  
3. Load and configure vector retriever  
4. Define evaluation questions & gold references  
5. Build Ragas `SingleTurnSample`s  
6. Configure and run Ragas evaluation  
7. Display & save results
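As a rough illustration of what two of these scores mean, the sketch below computes them from hand-labelled verdicts. It is a simplification: in Ragas the per-claim and per-chunk verdicts come from the judge LLM, and the function names here are hypothetical helpers, not part of the Ragas API. Faithfulness is the fraction of claims in the response that the retrieved context supports (Context Recall is the same ratio taken over claims in the reference), and Context Precision averages precision@k over the ranks that hold a relevant chunk.

```python
def faithfulness_score(claim_supported: list[bool]) -> float:
    """Fraction of response claims that the retrieved context supports."""
    if not claim_supported:
        raise ValueError("need at least one claim")
    return sum(claim_supported) / len(claim_supported)


def context_precision_score(chunk_relevant: list[bool]) -> float:
    """Mean of precision@k, taken at each rank k that holds a relevant chunk."""
    hits = 0
    precisions = []
    for k, relevant in enumerate(chunk_relevant, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Hand-labelled verdicts for illustration only (not taken from a real run).
print(faithfulness_score([True, False]))             # one of two claims supported
print(context_precision_score([True, False, True]))  # relevant chunks at ranks 1 and 3
```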



In [None]:
# Run once (uncomment) to install the packages this notebook imports:
# !pip install -qU ragas langchain-openai langchain-chroma langchain-core python-dotenv



In [5]:
import sys, pathlib
PROJECT_ROOT = pathlib.Path().resolve().parent      # parent of notebooks/
SRC_DIR = PROJECT_ROOT / "src"
sys.path.append(str(SRC_DIR))                       # now “src” is importable

In [None]:
# ─────────────────────────────────────────────────────────────────────
# 0. Setup imports and paths
# ─────────────────────────────────────────────────────────────────────
import sys, pathlib
from dotenv import load_dotenv

# The notebook lives in notebooks/, a sibling of src/, so add src/ to sys.path
PROJECT_ROOT = pathlib.Path().resolve().parent
sys.path.append(str(PROJECT_ROOT / "src"))

load_dotenv()  # loads OPENAI_API_KEY or other credentials

True

In [2]:

# ─────────────────────────────────────────────────────────────────────
# 1. Imports for RAG evaluation
# ─────────────────────────────────────────────────────────────────────
from ragas import SingleTurnSample, EvaluationDataset, evaluate
from ragas.metrics import Faithfulness, AnswerRelevancy, ContextPrecision, ContextRecall
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI


In [None]:
# ─────────────────────────────────────────────────────────────────────
# 2. Instantiate your RAG generator model (placeholder)
# ─────────────────────────────────────────────────────────────────────
# TODO: replace this with your actual RAGChatbot or RetrievalQA chain
# e.g. from rag_chatbot import RAGChatbot
# chatbot = RAGChatbot(model="gpt-3.5-turbo")
chatbot = None  # <-- placeholder for your RAG pipeline instance

# ─────────────────────────────────────────────────────────────────────
# 3. Instantiate the judge LLM wrapper
# ─────────────────────────────────────────────────────────────────────
# GPT-4 will act as the judge for faithfulness, relevancy, etc.
judge_llm = LangchainLLMWrapper(
    ChatOpenAI(
        model="gpt-4",
        temperature=0
    )
)

In [4]:

# ─────────────────────────────────────────────────────────────────────
# 4. Define your evaluation questions + gold references
# ─────────────────────────────────────────────────────────────────────
evaluation_data = [
    {
        "question": "What is tuition cost for the program?",
        "reference": "Tuition for the MS in Applied Data Science program: $5,967 per course/$71,604 total tuition"
    },
    {
        "question": "What scholarships are available?",
        "reference": "The Data Science Institute Scholarship, MS in Applied Data Science Alumni Scholarship"
    },
    {
        "question": "What are the deadlines for the in-person program?",
        "reference": (
            "November 7, 2024 – Priority Application Deadline; "
            "December 4, 2024 – Scholarship Priority Deadline; "
            "January 21, 2025 – International Application Deadline; "
            "March 4, 2025 – Second Priority Application Deadline; "
            "May 6, 2025 – Third Priority Application Deadline; "
            "June 23, 2025 – Final Application Deadline"
        )
    }
]


In [None]:
# ─────────────────────────────────────────────────────────────────────
# 5. Define rag_query function (placeholder)
# ─────────────────────────────────────────────────────────────────────
def rag_query(question: str):
    """
    Placeholder for your actual RAG pipeline call.
    Should return:
      - answer: str        ← model-generated answer
      - contexts: list[str] ← list of retrieved chunk texts
    """
    # TODO: replace the following two lines with your RAG pipeline call
    answer = "<YOUR_MODEL_GENERATED_ANSWER>"  
    contexts = ["<CHUNK TEXT 1>", "<CHUNK TEXT 2>", "..."]  
    return answer, contexts
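One way to fill in the placeholder is sketched below. It is a minimal sketch under stated assumptions, not the project's actual pipeline: `make_rag_query` is a hypothetical helper, the prompt wording is made up, and it assumes a retriever whose `invoke(question)` returns objects with a `page_content` attribute plus a chat model whose `invoke(prompt)` returns an object with a `content` attribute (the interface LangChain retrievers and chat models expose).

```python
from typing import Any, Callable


def make_rag_query(
    retriever: Any, llm: Any, k: int = 4
) -> Callable[[str], tuple[str, list[str]]]:
    """Build a rag_query(question) -> (answer, contexts) function.

    Assumes retriever.invoke(question) returns objects with `.page_content`
    and llm.invoke(prompt) returns an object with `.content`.
    """

    def rag_query(question: str) -> tuple[str, list[str]]:
        # Keep at most k retrieved chunks and extract their raw text.
        docs = retriever.invoke(question)[:k]
        contexts = [doc.page_content for doc in docs]
        # Hypothetical prompt; replace with the production prompt template.
        prompt = (
            "Answer the question using only the context below.\n\n"
            "Context:\n" + "\n---\n".join(contexts)
            + f"\n\nQuestion: {question}"
        )
        answer = llm.invoke(prompt).content
        return answer, contexts

    return rag_query
```

Returning the same chunk texts that were placed in the prompt matters here: Faithfulness and the context metrics are only meaningful if `retrieved_contexts` is exactly what the generator saw.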


In [None]:
# ─────────────────────────────────────────────────────────────────────
# 6. Build Ragas samples from evaluation_data
# ─────────────────────────────────────────────────────────────────────
samples = []
for item in evaluation_data:
    q = item["question"]
    gt = item["reference"]
    
    # call your rag_query (currently placeholder)
    answer, retrieved_contexts = rag_query(q)
    
    samples.append(
        SingleTurnSample(
            user_input=q,
            retrieved_contexts=retrieved_contexts,
            response=answer,
            reference=gt,
        )
    )

dataset = EvaluationDataset(samples=samples)

# ─────────────────────────────────────────────────────────────────────
# 7. Define metrics for evaluation
# ─────────────────────────────────────────────────────────────────────
metrics = [
    Faithfulness(),
    AnswerRelevancy(),
    ContextPrecision(),
    ContextRecall(),
]

# ─────────────────────────────────────────────────────────────────────
# 8. Run Ragas evaluation
# ─────────────────────────────────────────────────────────────────────
results = evaluate(
    dataset=dataset,
    metrics=metrics,
    llm=judge_llm
)


Evaluating:   0%|          | 0/12 [00:00<?, ?it/s]

Exception raised in Job[9]: OutputParserException(Invalid json output: {
    "question": "<YOUR_MODEL_GENERATED_QUESTION>",
    "noncommittal": <1_OR_0>
}
For troubleshooting, visit: https://python.langchain.com/docs/troubleshooting/errors/OUTPUT_PARSING_FAILURE )
Prompt fix_output_format failed to parse output: The output parser failed to parse the output including retries.
Prompt fix_output_format failed to parse output: The output parser failed to parse the output including retries.
Prompt fix_output_format failed to parse output: The output parser failed to parse the output including retries.
Prompt response_relevance_prompt failed to parse output: The output parser failed to parse the output including retries.
Prompt fix_output_format failed to parse output: The output parser failed to parse the output including retries.
Prompt fix_output_format failed to parse output: The output parser failed to parse the output including retries.
Prompt fix_output_format failed to parse output: 
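The parse failures above most likely come from sending the unfilled `rag_query` placeholders to the judge, which then echoes template text instead of valid JSON. A small guard run before `evaluate` can catch this without spending judge calls. The sketch below is a hypothetical helper, not part of Ragas; it assumes each row is a dict with the `response` and `retrieved_contexts` keys used by `SingleTurnSample`.

```python
import re

# Matches template stubs such as "<YOUR_MODEL_GENERATED_ANSWER>" or "<CHUNK TEXT 1>".
PLACEHOLDER = re.compile(r"^<[^<>]+>$")


def find_placeholder_rows(rows: list[dict]) -> list[int]:
    """Return indices of rows whose response or contexts are still template stubs."""
    flagged = []
    for i, row in enumerate(rows):
        texts = [row["response"], *row["retrieved_contexts"]]
        if any(PLACEHOLDER.match(t.strip()) or t.strip() == "..." for t in texts):
            flagged.append(i)
    return flagged


rows = [
    {"response": "<YOUR_MODEL_GENERATED_ANSWER>",
     "retrieved_contexts": ["<CHUNK TEXT 1>", "<CHUNK TEXT 2>", "..."]},
    {"response": "The total tuition is $71,604.",
     "retrieved_contexts": ["$5,967 per course/$71,604 total tuition"]},
]
print(find_placeholder_rows(rows))  # prints [0]
```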

In [None]:
# ─────────────────────────────────────────────────────────────────────
# 9. Convert to DataFrame, display, and save
# ─────────────────────────────────────────────────────────────────────
df = results.to_pandas()
display(df)
df.to_csv("rag_eval_results.csv", index=False)


| | user_input | retrieved_contexts | response | reference | faithfulness | answer_relevancy | context_precision | context_recall |
|---|---|---|---|---|---|---|---|---|
| 0 | What is tuition cost for the program? | `[<CHUNK TEXT 1>, <CHUNK TEXT 2>, ...]` | `<YOUR_MODEL_GENERATED_ANSWER>` | Tuition for the MS in Applied Data Science pro... | 0.0 | NaN | 0.0 | NaN |
| 1 | What scholarships are available? | `[<CHUNK TEXT 1>, <CHUNK TEXT 2>, ...]` | `<YOUR_MODEL_GENERATED_ANSWER>` | The Data Science Institute Scholarship, MS in ... | 0.0 | NaN | 0.0 | 0.0 |
| 2 | What are the deadlines for the in-person program? | `[<CHUNK TEXT 1>, <CHUNK TEXT 2>, ...]` | `<YOUR_MODEL_GENERATED_ANSWER>` | November 7, 2024 – Priority Application Deadli... | 0.0 | NaN | 0.0 | 0.0 |
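Some cells in the table above are unscored (empty or NaN) because the judge output could not be parsed, so a plain average would either fail or hide how many rows were dropped. The sketch below reports both the mean over scored rows and the number of unscored rows for one metric column. `summarize_metric` is a hypothetical helper; the input lists are copied from the `context_recall` and `answer_relevancy` columns above.

```python
import math


def summarize_metric(values: list[float]) -> tuple[float, int]:
    """Return (mean of scored rows, number of unscored rows) for one metric column."""
    scored = [v for v in values if v is not None and not math.isnan(v)]
    missing = len(values) - len(scored)
    mean = sum(scored) / len(scored) if scored else math.nan
    return mean, missing


context_recall = [math.nan, 0.0, 0.0]
answer_relevancy = [math.nan, math.nan, math.nan]

print(summarize_metric(context_recall))    # mean over the two scored rows, one missing
print(summarize_metric(answer_relevancy))  # no scored rows, so the mean is NaN
```

With a DataFrame, a column can be passed in as `df["context_recall"].tolist()`.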


In [None]:
import pathlib

# Adjust this to the real directory you used when building the index:
CHROMA_DIR = pathlib.Path("../data/chroma_index").expanduser().resolve()
print("Looking for Chroma files in:", CHROMA_DIR)
print("Exists?", CHROMA_DIR.exists())
if CHROMA_DIR.is_dir():  # iterdir() raises if the directory is missing
    print("Contents:", list(CHROMA_DIR.iterdir())[:5])


In [16]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
import pathlib

# Open the same directory
CHROMA_DIR = pathlib.Path("../data/chroma_index").resolve()
emb = OpenAIEmbeddings()

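# No collection_name is given, so LangChain opens (or creates) its default
# "langchain" collection; we only need the handle to list what else is stored here
vectordb_client = Chroma(
    persist_directory=str(CHROMA_DIR),
    embedding_function=emb,
)

# Inspect the underlying Chroma client (_client is a private attribute and may change)
collections = vectordb_client._client.list_collections()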
print("Collections in this directory:", collections)


Collections in this directory: [Collection(name=uchicago_ms_applied_ds_header_chunks), Collection(name=langchain)]


---

# V1 Code from here

In [None]:
from ragas import SingleTurnSample, EvaluationDataset

samples = [
    SingleTurnSample(
        user_input="What is tuition cost for the program?",
        retrieved_contexts=[
            "Tuition for the MS in Applied Data Science program: $5,967 per course/$71,604 total tuition"
        ],
        response="The total tuition for the MS in Applied Data Science program is $71,604.",
        reference="Tuition for the MS in Applied Data Science program: $5,967 per course/$71,604 total tuition"
    ),
    SingleTurnSample(
        user_input="What scholarships are available?",
        retrieved_contexts=[
            "The Data Science Institute Scholarship, MS in Applied Data Science Alumni Scholarship"
        ],
        response="Available scholarships include the Data Science Institute Scholarship and the Alumni Scholarship.",
        reference="The Data Science Institute Scholarship, MS in Applied Data Science Alumni Scholarship"
    ),
    SingleTurnSample(
        user_input="What are the deadlines for the in-person program?",
        retrieved_contexts=[
            "November 7, 2024 – Priority Application Deadline\nDecember 4, 2024 – Scholarship Priority Deadline\nJanuary 21, 2025 – International Deadline..."
        ],
        response="The deadlines include November 7, December 4, January 21, March 4, May 6, and June 23.",
        reference="November 7, 2024 – Priority Application Deadline\nDecember 4, 2024 – Scholarship Priority Deadline..."
    ),
]
evaluation_dataset = EvaluationDataset(samples=samples)


In [None]:
from ragas import evaluate
from ragas.metrics import Faithfulness, AnswerRelevancy, ContextPrecision, ContextRecall
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4"))

metrics = [
    Faithfulness(llm=evaluator_llm),
    AnswerRelevancy(llm=evaluator_llm),
    ContextPrecision(llm=evaluator_llm),
    ContextRecall(llm=evaluator_llm)
]

evaluate(dataset=evaluation_dataset, metrics=metrics)

In [None]:
from ragas.metrics import Faithfulness, AnswerRelevancy, ContextPrecision, ContextRecall
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI
import os
from dotenv import load_dotenv

load_dotenv()  # Loads variables from .env

# Optional: confirm the key is loaded without printing its value
print("OPENAI_API_KEY loaded:", bool(os.getenv("OPENAI_API_KEY")))

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4"))

In [3]:
from ragas import evaluate

metrics = [
    Faithfulness(llm=evaluator_llm),
    AnswerRelevancy(llm=evaluator_llm),
    ContextPrecision(llm=evaluator_llm),
    ContextRecall(llm=evaluator_llm)
]

results = evaluate(dataset=evaluation_dataset, metrics=metrics)


Evaluating:   0%|          | 0/12 [00:00<?, ?it/s]

In [4]:
import pandas as pd

df_results = results.to_pandas()
print(df_results)


                                          user_input  \
0              What is tuition cost for the program?   
1                   What scholarships are available?   
2  What are the deadlines for the in-person program?   

                                  retrieved_contexts  \
0  [Tuition for the MS in Applied Data Science pr...   
1  [The Data Science Institute Scholarship, MS in...   
2  [November 7, 2024 – Priority Application Deadl...   

                                            response  \
0  The total tuition for the MS in Applied Data S...   
1  Available scholarships include the Data Scienc...   
2  The deadlines include November 7, December 4, ...   

                                           reference  faithfulness  \
0  Tuition for the MS in Applied Data Science pro...           1.0   
1  The Data Science Institute Scholarship, MS in ...           0.0   
2  November 7, 2024 – Priority Application Deadli...           0.5   

   answer_relevancy  context_precision  cont