# 🧠 RAGAS‑Enabled LangChain‑Groq Chat Evaluation

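This notebook shows how to:
1. **Engage in a multi-turn chat** using LangChain + Groq  
2. **Capture** (question, retrieved context, LLM answer) tuples  
3. **Evaluate** them with RAGAS using key metrics:  
   - Context Recall / Context Precision  
   - Faithfulness  
   - Answer Relevancy / Factual Correctness  

Context Recall, Context Precision and Factual Correctness need a ground-truth reference answer, so this demo, which has none, runs only Faithfulness and Answer Relevancy.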


User → LangChain + Groq → answer (+ context chunks) stored → collect Q/A/C tuples → format into a RAGAS dataset → `evaluate()` → metrics: `context_recall`, `faithfulness`, `answer_relevancy` …
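The flow above boils down to one record per chat turn. The sketch below shows that record and its mapping onto the column names RAGAS 0.2 uses for single-turn samples (`user_input`, `retrieved_contexts`, `response`, optional `reference`). `ChatTurnRecord` is a hypothetical helper, not a LangChain or RAGAS class, and the answer string is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChatTurnRecord:
    """One captured chat turn (hypothetical helper, not a library class)."""
    question: str
    answer: str
    contexts: list[str] = field(default_factory=list)
    reference: Optional[str] = None  # ground-truth answer, if one exists

    def to_ragas_row(self) -> dict:
        """Map to RAGAS single-turn column names; omit `reference` when absent."""
        row = {
            "user_input": self.question,
            "retrieved_contexts": list(self.contexts),
            "response": self.answer,
        }
        if self.reference is not None:
            row["reference"] = self.reference
        return row


turn = ChatTurnRecord(question="What is the moon made of?", answer="<model answer>")
print(turn.to_ragas_row())
```

Keeping `reference` optional mirrors this notebook, which has no ground truth; reference-based metrics only become usable once it is filled in.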


In [None]:
# Cell 1: Install dependencies
!pip install -q langchain langchain-groq ragas langchain-openai


In [None]:
# Cell 2: Imports & Setup
from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnableWithMessageHistory
from langchain_core.chat_history import InMemoryChatMessageHistory
from IPython.display import Markdown, display

# RAGAS imports
from ragas import SingleTurnSample, EvaluationDataset, evaluate
from ragas.metrics import LLMContextRecall, ContextPrecision, Faithfulness, AnswerRelevancy, FactualCorrectness
from ragas.llms import LangchainLLMWrapper

In [None]:
# Cell 3: Initialize the chat LLM (Groq) and a first RAGAS evaluator
from google.colab import userdata
llm = ChatGroq(model="llama-3.3-70b-versatile", api_key=userdata.get('GROQ_API_KEY'), temperature=0)
llm_eval = LangchainLLMWrapper(llm)  # reuse the chat LLM as evaluator (replaced in the next cell)

In [None]:
# Cell 3b: Use OpenAI as the RAGAS evaluator instead (overrides llm_eval from Cell 3)
import os
from langchain_openai import ChatOpenAI
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY_')
llm_eval = LangchainLLMWrapper(
    ChatOpenAI(model="gpt-4o-mini", temperature=0.2, api_key=os.environ["OPENAI_API_KEY"])  # or gpt-3.5-turbo
)

In [None]:
# Cell 4: Build Chat Pipeline
system_msg = SystemMessagePromptTemplate.from_template("You are a helpful assistant.")
human_msg = HumanMessagePromptTemplate.from_template("{input}")
chat_prompt = ChatPromptTemplate.from_messages([system_msg, MessagesPlaceholder("history"), human_msg])

# One in-memory chat history per session_id (replaces the deprecated ConversationBufferMemory)
session_store = {}

def get_session_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in session_store:
        session_store[session_id] = InMemoryChatMessageHistory()
    return session_store[session_id]

chat_chain = chat_prompt | llm
chatbot = RunnableWithMessageHistory(chat_chain, get_session_history,
                                     input_messages_key="input", history_messages_key="history")
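Conceptually, each `chatbot.invoke(...)` call looks up the history for the given `session_id`, injects it into the `history` placeholder, calls the model, and appends both the user turn and the reply to that history. The library-free sketch below imitates that bookkeeping with a hypothetical `fake_model` in place of the Groq call; it illustrates the idea and is not LangChain's actual implementation.

```python
from collections import defaultdict

# session_id -> list of (role, content) tuples; stands in for per-session chat histories
store: dict[str, list[tuple[str, str]]] = defaultdict(list)


def fake_model(messages: list[tuple[str, str]]) -> str:
    """Hypothetical stand-in for the LLM: reports how many messages it received."""
    return f"reply #{len(messages)}"


def invoke_with_history(user_input: str, session_id: str) -> str:
    history = store[session_id]
    # system prompt + prior turns + new user message, mirroring chat_prompt's layout
    messages = [("system", "You are a helpful assistant.")] + history + [("human", user_input)]
    reply = fake_model(messages)
    # persist both sides of the turn so the next call in this session sees them
    history.append(("human", user_input))
    history.append(("ai", reply))
    return reply


print(invoke_with_history("What is the moon made of?", "session1"))  # reply #2
print(invoke_with_history("How does gravity differ?", "session1"))   # reply #4
print(invoke_with_history("Hello", "session2"))                       # reply #2
```

The growing message count shows why the second question in a session can rely on the first: the model receives the whole prior exchange, while a different `session_id` starts fresh.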

In [None]:
# Cell 5: Run chat and collect data
session_id = "session1"
records = []

user_inputs = ["What is the moon made of?", "How does gravity on the moon differ from Earth?"]
for user in user_inputs:
    resp = chatbot.invoke({"input": user}, config={"configurable": {"session_id": session_id}})
    answer = resp.content
    # No retriever in this demo, so contexts stay empty; with a RAG pipeline,
    # store the retrieved chunk texts (a list of strings) here
    contexts = []
    records.append({"question": user, "answer": answer, "contexts": contexts})
    display(Markdown(f"**User:** {user}\n\n**Assistant:** {answer}"))

**User:** What is the moon made of?

**Assistant:** The Moon is a fascinating and complex celestial body, and its composition is still a subject of ongoing research and study. However, based on various scientific investigations, including NASA's Apollo missions, lunar meteorites, and remote sensing data, here's a general overview of what the Moon is made of:

1. **Crust:** The Moon's crust is composed of a variety of rock types, including:
	* Feldspathic rocks (rich in feldspar minerals)
	* Pyroxene-rich rocks
	* Olivine-rich rocks
	* Basalts (similar to those found on Earth)
2. **Mantle:** The Moon's mantle is thought to be composed of:
	* Olivine (a magnesium-iron silicate mineral)
	* Pyroxene (a group of minerals rich in iron, magnesium, and calcium)
	* Garnet (a mineral rich in iron, magnesium, and calcium)
3. **Core:** The Moon is believed to have a small, partially molten core with a radius of about 350 kilometers (220 miles). The core is thought to be composed of:
	* Iron (about 85% of the core's mass)
	* Sulfur (about 10% of the core's mass)
	* Nickel (about 5% of the core's mass)
4. **Other components:** The Moon's surface also contains:
	* Regolith (a layer of fragmented rock and soil created by meteorite impacts)
	* Water ice (trapped in permanently shadowed craters near the poles)
	* Helium-3 (a rare isotope of helium, which could be a valuable resource for future lunar missions)

The Moon's composition is thought to have been shaped by a combination of processes, including:

* The giant impact hypothesis: A massive collision between the Earth and a Mars-sized object, which is believed to have formed the Moon about 4.5 billion years ago.
* Magmatic differentiation: The separation of magma into distinct layers, which led to the formation of the Moon's crust and mantle.
* Meteorite impacts: The constant bombardment of the Moon's surface by meteorites, which has shaped its geology and created the regolith.

These are the main components that make up the Moon, but there's still much to be learned about our closest celestial neighbor!

**User:** How does gravity on the moon differ from Earth?

**Assistant:** Gravity on the Moon is significantly weaker than on Earth. Here's a comparison:

* **Earth's surface gravity:** 9.8 meters per second squared (m/s²) or 32.2 feet per second squared (ft/s²)
* **Moon's surface gravity:** 1.62 m/s² or 5.3 ft/s²

This means that the Moon's surface gravity is about **one-sixth** (1/6) of the Earth's surface gravity. This difference is due to the Moon's:

1. **Smaller mass:** The Moon has a much smaller mass than the Earth, which results in a weaker gravitational pull.
2. **Smaller radius:** The Moon's radius is about one-quarter (1/4) of the Earth's radius, which also contributes to the weaker gravity.

The effects of the Moon's lower gravity are quite interesting:

* **Jumping:** On the Moon, you could jump about **six times higher** than on Earth. If you can jump 1 meter (3.3 feet) on Earth, you could jump about 6 meters (19.7 feet) on the Moon.
* **Weight:** If you weigh 100 pounds (45 kilograms) on Earth, you would weigh only about **17 pounds (7.7 kilograms)** on the Moon.
* **Object behavior:** Objects on the Moon would fall more slowly and travel farther when thrown or dropped, due to the weaker gravity.
* **Spacecraft operations:** The lower gravity on the Moon requires spacecraft to adjust their trajectories and landing procedures to ensure a safe and successful touchdown.

The reduced gravity on the Moon also has implications for:

* **Human exploration:** Prolonged exposure to the Moon's low gravity could affect the human body, particularly the muscles, bones, and cardiovascular system.
* **Lunar base construction:** Building structures on the Moon would require innovative designs and materials to withstand the lower gravity and lack of atmospheric pressure.
* **Resource utilization:** The Moon's gravity could influence the extraction and processing of resources, such as water ice, which is thought to be present in some lunar craters.

The unique gravity environment on the Moon presents both challenges and opportunities for scientific research, exploration, and potential human settlement.
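The loop in Cell 5 leaves `contexts` empty, which makes context-based metrics such as Faithfulness hard to interpret. If a retrieval step is added, its chunks are what belong in `contexts`. A minimal sketch, assuming a hypothetical keyword-overlap `retrieve` function over a tiny in-memory `CORPUS` (both illustrative; a real pipeline would typically query a vector store):

```python
import re

# Hypothetical mini-corpus standing in for real document chunks
CORPUS = [
    "The Moon's surface gravity is about 1.62 m/s^2, roughly one-sixth of Earth's.",
    "The lunar highland crust is dominated by feldspar-rich anorthosite.",
    "Mare basalts fill large impact basins on the Moon's near side.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase word set used for overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, k: int = 2) -> list[str]:
    """Return up to k chunks sharing the most words with the query (ties keep corpus order)."""
    q = tokenize(query)
    scored = [(len(q & tokenize(chunk)), i, chunk) for i, chunk in enumerate(CORPUS)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    # Drop chunks with zero overlap, so fewer than k may be returned
    return [chunk for score, _, chunk in scored[:k] if score > 0]


print(retrieve("How does gravity on the moon differ from Earth?"))
```

Inside the Cell 5 loop, `contexts = retrieve(user)` would then replace the empty list, giving Faithfulness actual text to check claims against.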

In [None]:
# Cell 6: Convert the collected records into a RAGAS EvaluationDataset
samples = [
    {
        "user_input": r["question"],
        "retrieved_contexts": r["contexts"],  # list of context strings
        "response": r["answer"],              # RAGAS 0.2 names the model answer "response"
        # No "reference" column: this demo has no ground-truth answers
    }
    for r in records
]
print(samples)
dataset = EvaluationDataset.from_list(samples)

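Before calling `evaluate()`, it can help to sanity-check the rows: an empty `retrieved_contexts` list, as in this demo, leaves Faithfulness with nothing to verify claims against. A minimal plain-Python sketch; `check_samples` is a hypothetical helper, not part of RAGAS.

```python
def check_samples(samples: list[dict]) -> list[str]:
    """Return warnings for rows that may produce misleading RAGAS scores."""
    warnings = []
    for i, row in enumerate(samples):
        if not row.get("response", "").strip():
            warnings.append(f"row {i}: empty response")
        contexts = row.get("retrieved_contexts")
        if not isinstance(contexts, list) or not all(isinstance(c, str) for c in contexts):
            warnings.append(f"row {i}: retrieved_contexts must be a list of strings")
        elif not contexts:
            warnings.append(f"row {i}: no retrieved contexts; context-based scores are not meaningful")
    return warnings


demo = [
    {"user_input": "q1", "retrieved_contexts": [], "response": "a1"},
    {"user_input": "q2", "retrieved_contexts": ["chunk"], "response": ""},
]
for w in check_samples(demo):
    print(w)
```

Run against this notebook's `samples`, it would flag every row for missing contexts, which is worth keeping in mind when reading the Faithfulness result.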

In [None]:
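# Cell 6b: Choose metrics
# FactualCorrectness, LLMContextRecall and ContextPrecision need a ground-truth "reference"
# column, which this demo does not collect, so they stay disabled.
# AnswerRelevancy also uses an embeddings model in addition to the evaluator LLM.
metrics = [
    # FactualCorrectness(),
    # LLMContextRecall(),
    # ContextPrecision(),
    Faithfulness(),
    AnswerRelevancy()
]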
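The commented-out metrics are disabled because they depend on a `reference` column. The sketch below encodes the columns each metric needs as a plain dictionary, based on our reading of the RAGAS 0.2 metric documentation (verify against the installed version), and keeps only the metrics every collected row can support. `REQUIRED_COLUMNS` and `usable_metrics` are hypothetical helpers; note that AnswerRelevancy additionally needs an embeddings model, which is not a dataset column.

```python
# Columns each metric reads, per our reading of the RAGAS 0.2 docs (assumption; check your version)
REQUIRED_COLUMNS = {
    "Faithfulness": {"user_input", "response", "retrieved_contexts"},
    "AnswerRelevancy": {"user_input", "response"},
    "LLMContextRecall": {"user_input", "retrieved_contexts", "reference"},
    "ContextPrecision": {"user_input", "retrieved_contexts", "reference"},
    "FactualCorrectness": {"response", "reference"},
}


def usable_metrics(rows: list[dict]) -> list[str]:
    """Names of metrics whose required columns are present in every row."""
    present = set.intersection(*(set(r) for r in rows)) if rows else set()
    return [name for name, cols in REQUIRED_COLUMNS.items() if cols <= present]


rows = [{"user_input": "q", "retrieved_contexts": [], "response": "a"}]
print(usable_metrics(rows))  # ['Faithfulness', 'AnswerRelevancy']
```

Adding a `reference` key to each row would unlock all five metrics, matching the full list commented out in Cell 7.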

In [None]:
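# Cell 7: Run RAGAS evaluation
# Full metric set, once real retrieved contexts and a "reference" column are available:
# metrics = [LLMContextRecall(), ContextPrecision(), Faithfulness(), AnswerRelevancy(), FactualCorrectness()]

# No `embeddings` argument is passed, so AnswerRelevancy uses RAGAS's default embedding model
result = evaluate(
    dataset,
    metrics=metrics,
    llm=llm_eval,
    show_progress=True
)

print(result)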


{'faithfulness': 0.4194, 'answer_relevancy': 0.9497}
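To help interpret these numbers: RAGAS defines Faithfulness as the fraction of claims in the response that the retrieved context supports, and Answer Relevancy as the mean cosine similarity between the embedding of the original question and embeddings of questions an LLM generates back from the response. The sketch below reproduces only that final arithmetic with hypothetical judge verdicts and toy 2-d vectors; the LLM claim extraction, question generation and embedding steps are omitted. Because this demo passes no retrieved contexts, the 0.4194 Faithfulness score above should not be read as a measure of how well the answers are grounded.

```python
import math


def faithfulness(verdicts: list[bool]) -> float:
    """Fraction of response claims judged supported by the context."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def answer_relevancy(question_vec: list[float], generated_vecs: list[list[float]]) -> float:
    """Mean cosine similarity between the question and back-generated questions."""
    return sum(cosine(question_vec, g) for g in generated_vecs) / len(generated_vecs)


# Hypothetical judge verdicts: 3 of 4 claims supported
print(faithfulness([True, True, False, True]))  # 0.75

# Toy "embeddings": one identical and one orthogonal generated question
print(answer_relevancy([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # 0.5
```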
