In [8]:
import os
import sys
import pandas as pd
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    context_precision,
    answer_relevancy,
    faithfulness,
    context_recall,
    answer_correctness, 
)
from langchain_core.messages import SystemMessage

sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))

from backend.src.pipelines.rag import RAGPipeline
from backend.src.utils import get_llm_1 

rag = RAGPipeline()
llm = get_llm_1()

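# --- 1. TEST DATA ---
test_questions = [
    "What is the first law of thermodynamics?",
    "Define adiabatic process.",
    "What is the Carnot engine and its efficiency?"
]

# Reference answers (textbook excerpts), one per question.
ground_truths = [
    # NOTE: this excerpt states the First Law only in its opening sentence; the rest
    # motivates the Second Law, which may pull down answer_correctness for Q1.
    ["""The First Law of Thermodynamics is the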
principle of conservation of energy. Common
experience shows that there are many
conceivable processes that are perfectly
allowed by the First Law and yet are never
observed. For example, nobody has ever seen
a book lying on a table jumping to a height by
itself. But such a thing would be possible if
the principle of conservation of energy were
the only restriction. The table could cool
spontaneously, converting some of its internal
energy into an equal amount of mechanical
energy of the book, which would then hop to a
height with potential energy equal to the
mechanical energy it acquired. But this never
happens. Clearly, some additional basic
principle of nature forbids the above, even
though it satisfies the energy conservation
principle. This principle, which disallows
many phenomena consistent with the First
Law of Thermodynamics is known as the
Second Law of Thermodynamics."""],
    ["""In an adiabatic process, the system is insulated
from the surroundings and heat absorbed or
released is zero. From Eq. (11.1), we see that
work done by the gas results in decrease in its
internal energy (and hence its temperature for
an ideal gas). We quote without proof (the result
that you will learn in higher courses) that for
an adiabatic process of an ideal gas,
P V^γ = const    (11.13)
where γ is the ratio of specific heats (ordinary
or molar) at constant pressure and at constant
volume:
γ = Cp / Cv
Thus if an ideal gas undergoes a change in
its state adiabatically from (P1, V1) to (P2, V2):
P1 V1^γ = P2 V2^γ    (11.14)
Figure 11.8 shows the P-V curves of an ideal
gas for two adiabatic processes connecting two
isotherms.
"""],
    # NOTE: this excerpt stops before the Carnot efficiency formula, so an answer
    # that states it may be scored as unsupported by this ground truth.
    ["""Suppose we have a hot reservoir at temperature
T1 and a cold reservoir at temperature T2. What
is the maximum efficiency possible for a heat
engine operating between the two reservoirs and
what cycle of processes should be adopted to
achieve the maximum efficiency? Sadi Carnot,
a French engineer, first considered this question
in 1824. Interestingly, Carnot arrived at the
correct answer, even though the basic concepts
of heat and thermodynamics had yet to be firmly
established.
We expect the ideal engine operating between
two temperatures to be a reversible engine.
Irreversibility is associated with dissipative
effects, as remarked in the preceding section,
and lowers efficiency. A process is reversible if
it is quasi-static and non-dissipative. We have
seen that a process is not quasi-static if it
involves finite temperature difference between
the system and the reservoir. This implies that
in a reversible heat engine operating between
two temperatures, heat should be absorbed
(from the hot reservoir) isothermally and
released (to the cold reservoir) isothermally. We
thus have identified two steps of the reversible
heat engine: isothermal process at temperature
T1 absorbing heat Q1 from the hot reservoir, and
another isothermal process at temperature T2
releasing heat Q2 to the cold reservoir. To
complete a cycle, we need to take the system
from temperature T1 to T2 and then back from
temperature T2 to T1. Which processes should
we employ for this purpose that are reversible?
A little reflection shows that we can only adopt
reversible adiabatic processes for these
purposes, which involve no heat flow from any
reservoir. If we employ any other process that is
not adiabatic, say an isochoric process, to take
the system from one temperature to another, we
shall need a series of reservoirs in the
temperature range T2 to T1 to ensure that at each
stage the process is quasi-static. (Remember
again that for a process to be quasi-static and
reversible, there should be no finite temperature
difference between the system and the reservoir.)
But we are considering a reversible engine that
operates between only two temperatures. Thus
adiabatic processes must bring about the
temperature change in the system from T1 to T2
and T2 to T1 in this engine.
[Fig. 11.9: Carnot cycle for a heat engine with an
ideal gas as the working substance.]
A reversible heat engine operating between
two temperatures is called a Carnot engine. We
have just argued that such an engine must have
the following sequence of steps constituting one
cycle, called the Carnot cycle, shown in
Fig. 11.9. We have taken the working substance
of the Carnot engine to be an ideal gas.
"""]
]

# --- 2. RUN EXPERIMENT ---
results_rag = {"question": [], "answer": [], "contexts": [], "ground_truth": []}
results_no_rag = {"question": [], "answer": [], "contexts": [], "ground_truth": []}

print("Running Evaluation... This may take a moment.")

for i, question in enumerate(test_questions):
    print(f"Processing Q{i+1}: {question}")
    
    # --- A. WITH RAG ---
    # 1. Retrieve context. RAGAS expects each sample's "contexts" to be a list of
    #    strings; wrapping it as [context] assumes retrieve_context returns one string.
    context = rag.retrieve_context(question)

    # 2. Generate an answer grounded in the retrieved context.
    #    A plain string input is sent to the chat model as a user (human) message.
    prompt_rag = (
        "Answer the question based on the context below.\n\n"
        f"Context: {context}\n\n"
        f"Question: {question}"
    )
    ans_rag = llm.invoke(prompt_rag).content

    results_rag["question"].append(question)
    results_rag["answer"].append(ans_rag)
    results_rag["contexts"].append([context])
    results_rag["ground_truth"].append(ground_truths[i][0])

    # --- B. WITHOUT RAG (raw LLM) ---
    # Same model and question, but no retrieved context.
    ans_no_rag = llm.invoke(question).content

    results_no_rag["question"].append(question)
    results_no_rag["answer"].append(ans_no_rag)
    results_no_rag["contexts"].append([""])  # placeholder: this baseline retrieves nothing
    results_no_rag["ground_truth"].append(ground_truths[i][0])

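# --- 3. EVALUATE WITH RAGAS ---
ds_rag = Dataset.from_dict(results_rag)
ds_no_rag = Dataset.from_dict(results_no_rag)

metrics_rag = [
    context_precision,
    faithfulness,
    answer_relevancy,
    context_recall,
    answer_correctness,
]

# The LLM-only run has empty contexts, so the context-based metrics
# (context_precision, context_recall, faithfulness) are not meaningful for it;
# only answer_relevancy and answer_correctness compare like with like.
metrics_no_rag = [
    context_precision,
    faithfulness,
    answer_relevancy,
    context_recall,
    answer_correctness,
]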

print("\nEvaluating RAG Pipeline...")
scores_rag = evaluate(
    ds_rag,
    metrics=metrics_rag,
    llm=llm,
    embeddings=rag.embeddings
)

print("\nEvaluating LLM-Only Performance...")
scores_no_rag = evaluate(
    ds_no_rag,
    metrics=metrics_no_rag,
    llm=llm,
    embeddings=rag.embeddings
)

# --- 4. DISPLAY RESULTS ---
print("\n" + "="*50)
print("📊 RAG PIPELINE PERFORMANCE")
print("="*50)
print(scores_rag)

print("\n" + "="*50)
print("🧠 LLM-ONLY PERFORMANCE")
print("="*50)
print(scores_no_rag)

print("\n" + "="*50)
print("🆚 COMPARISON: RAG vs NO-RAG ANSWERS")
print("="*50)

df_comparison = pd.DataFrame({
    "Question": test_questions,
    "RAG Answer": results_rag["answer"],
    "No-RAG Answer": results_no_rag["answer"],
    "Ground Truth": [gt[0] for gt in ground_truths]
})

# Save to CSV for later analysis
df_comparison.to_csv("rag_evaluation_results.csv", index=False)
print("\nDetailed results saved to 'rag_evaluation_results.csv'")

Running Evaluation... This may take a moment.
Processing Q1: What is the first law of thermodynamics?
Processing Q2: Define adiabatic process.
Processing Q3: What is the Carnot engine and its efficiency?

Evaluating RAG Pipeline...


Evaluating: 100%|██████████| 15/15 [03:00<00:00, 12.00s/it]



Evaluating LLM-Only Performance...


Evaluating: 100%|██████████| 15/15 [03:00<00:00, 12.00s/it]



📊 RAG PIPELINE PERFORMANCE
{'context_precision': 1.0000, 'faithfulness': 0.9333, 'answer_relevancy': 0.9132, 'context_recall': 1.0000, 'answer_correctness': 0.3389}

🧠 LLM-ONLY PERFORMANCE
{'context_precision': 0.5000, 'faithfulness': 0.4659, 'answer_relevancy': nan, 'context_recall': 1.0000, 'answer_correctness': nan}

🆚 COMPARISON: RAG vs NO-RAG ANSWERS

Detailed results saved to 'rag_evaluation_results.csv'
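The two score dictionaries printed above are easier to read as per-metric differences. Below is a minimal sketch, assuming the scores are available as plain `{metric_name: float}` dicts (the values are copied from the printed output); `compare_scores` is a hypothetical helper, not part of RAGAS. A metric that is NaN or missing in either run is reported as not comparable instead of producing a meaningless difference.

```python
import math


def compare_scores(with_rag: dict, without_rag: dict) -> dict:
    """Return {metric: with_rag - without_rag}, or None when a score is missing or NaN."""
    deltas = {}
    for metric in sorted(set(with_rag) | set(without_rag)):
        a = with_rag.get(metric)
        b = without_rag.get(metric)
        if a is None or b is None or math.isnan(a) or math.isnan(b):
            deltas[metric] = None  # not comparable
        else:
            deltas[metric] = round(a - b, 4)
    return deltas


# Scores as printed by the evaluation cell (renamed to avoid clobbering notebook variables).
rag_scores = {"context_precision": 1.0, "faithfulness": 0.9333, "answer_relevancy": 0.9132,
              "context_recall": 1.0, "answer_correctness": 0.3389}
no_rag_scores = {"context_precision": 0.5, "faithfulness": 0.4659,
                 "answer_relevancy": float("nan"), "context_recall": 1.0,
                 "answer_correctness": float("nan")}

for metric, delta in compare_scores(rag_scores, no_rag_scores).items():
    shown = "n/a" if delta is None else f"{delta:+.4f}"
    print(f"{metric:<20} {shown}")
```

With these numbers `context_recall` shows no difference (1.0 in both runs) even though the LLM-only run received an empty context, which suggests the context-based metrics say little about that baseline.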
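The RAG run scores high on faithfulness and relevancy but only 0.3389 on `answer_correctness`. As we understand the RAGAS 0.1.x implementation (check the documentation for your installed version), `answer_correctness` is a weighted average of a statement-level F1 (answer statements classified as TP, FP or FN against the ground truth) and an embedding similarity, with default weights 0.75 and 0.25. The sketch below reproduces only that final arithmetic; the statement counts are hypothetical and would normally come from the judge LLM.

```python
def answer_correctness(tp: int, fp: int, fn: int, similarity: float,
                       weights: tuple = (0.75, 0.25)) -> float:
    """Sketch of the answer_correctness combination step (assumed RAGAS 0.1.x defaults)."""
    # TP: answer statements supported by the ground truth
    # FP: answer statements absent from the ground truth
    # FN: ground-truth statements missing from the answer
    denom = tp + 0.5 * (fp + fn)
    f1 = tp / denom if denom > 0 else 0.0
    w_f1, w_sim = weights
    # Weighted average of factual F1 and semantic similarity
    return (w_f1 * f1 + w_sim * similarity) / (w_f1 + w_sim)


# Hypothetical counts: the same 3 correct statements, judged against a short
# ground truth (1 missed statement) and a long textbook passage (15 missed).
print(round(answer_correctness(tp=3, fp=0, fn=1, similarity=0.9), 4))
print(round(answer_correctness(tp=3, fp=0, fn=15, similarity=0.9), 4))
```

Under this formula, long passages used as ground truth contribute many FN statements that a concise answer never covers, so a correct answer can still score low; shorter, answer-shaped references are one way to reduce that effect.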
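Text extracted from a PDF, like the ground-truth passages in this cell, often carries hard line breaks and, when UTF-8 bytes are mis-decoded as Windows-1252, mojibake such as `Î³` for `γ`. A minimal cleaning sketch, assuming that specific mis-decoding; `fix_mojibake` and `clean_pdf_text` are hypothetical helpers. A string that does not round-trip cleanly (already correct, or mixed with characters outside Windows-1252) is returned unchanged.

```python
import re


def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mis-decoded as Windows-1252 (e.g. 'Î³' -> 'γ')."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of mojibake (or already clean): leave it alone.
        return text


def clean_pdf_text(text: str) -> str:
    """Repair mojibake, then collapse line breaks and runs of spaces into single spaces."""
    text = fix_mojibake(text)
    return re.sub(r"\s+", " ", text).strip()


print(clean_pdf_text("P V Î³\n = const  (11.13)"))
```

In the notebook this could be applied before building the datasets, e.g. `ground_truths = [[clean_pdf_text(gt[0])] for gt in ground_truths]`.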
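The cell appends `[context]` for each sample, which matches the RAGAS layout only if `rag.retrieve_context` returns a single string; if it returned a list of chunks or document objects, the dataset would be malformed. A small pre-flight check can catch this before `Dataset.from_dict`. This is a sketch: `check_ragas_columns` is a hypothetical helper, and the column names are taken from this cell and assumed to match the installed RAGAS version's schema.

```python
REQUIRED_COLUMNS = ("question", "answer", "contexts", "ground_truth")


def check_ragas_columns(data: dict) -> int:
    """Validate a dict of columns before Dataset.from_dict(); return the row count."""
    missing = [col for col in REQUIRED_COLUMNS if col not in data]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    # Every column must have one entry per question.
    n_rows = len(data["question"])
    for col in REQUIRED_COLUMNS:
        if len(data[col]) != n_rows:
            raise ValueError(f"column {col!r} has {len(data[col])} rows, expected {n_rows}")

    # Each sample's contexts must be a list of strings, e.g. ["chunk 1", "chunk 2"].
    for i, ctx in enumerate(data["contexts"]):
        if not isinstance(ctx, list) or not all(isinstance(c, str) for c in ctx):
            raise TypeError(f"row {i}: contexts must be a list of str, got {type(ctx).__name__}")
    return n_rows


sample = {"question": ["q1"], "answer": ["a1"], "contexts": [["retrieved chunk"]],
          "ground_truth": ["reference answer"]}
print(check_ragas_columns(sample))
```

Calling `check_ragas_columns(results_rag)` just before `Dataset.from_dict(results_rag)` fails fast with a readable message instead of an error deep inside the evaluation run.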
