<a href="https://colab.research.google.com/github/Giri-Shankar/rag-vs-llm/blob/main/Deep_Learning_(RAG_and_LLM).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install all required libraries
# - langchain_community: PDF loaders, document loaders, Chroma wrapper
# - chromadb: vector database for storing embeddings
# - sentence-transformers: similarity scoring (hallucination evaluation)
# - pypdf: PDF text extraction (used by PyPDFLoader)
# - langchain-openai: OpenAI LLM + embeddings wrapper
# - langchain: core RAG framework
# - langchainhub: LangChain Hub client (not used by the cells below)

!pip install langchain_community langchainhub chromadb langchain langchain-openai sentence-transformers pypdf


Collecting langchain_community
  Downloading langchain_community-0.4.1-py3-none-any.whl.metadata (3.0 kB)
Collecting langchainhub
  Downloading langchainhub-0.1.21-py3-none-any.whl.metadata (659 bytes)
Collecting chromadb
  Downloading chromadb-1.3.5-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.2 kB)
Collecting langchain-openai
  Downloading langchain_openai-1.1.0-py3-none-any.whl.metadata (2.6 kB)
Collecting pypdf
  Downloading pypdf-6.4.1-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain-classic<2.0.0,>=1.0.0 (from langchain_community)
  Downloading langchain_classic-1.0.0-py3-none-any.whl.metadata (3.9 kB)
Collecting requests<3.0.0,>=2.32.5 (from langchain_community)
  Downloading requests-2.32.5-py3-none-any.whl.metadata (4.9 kB)
Collecting dataclasses-json<0.7.0,>=0.6.7 (from langchain_community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting packaging<25,>=23.2 (from langchainhub)
  Downloading packaging-24.2-py3-no

In [None]:
# Load API keys securely from Google Colab's userdata storage.
# Make sure you have saved your "OpenRouterAPIKey" secret in the Colab
# Secrets panel (key icon in the left sidebar) and enabled notebook access.

from google.colab import userdata
import os

# Set environment variables so LangChain/OpenAI libraries can detect your OpenRouter key
os.environ['OPENAI_API_KEY'] = userdata.get('OpenRouterAPIKey')

# Set the base URL for OpenRouter API endpoint
os.environ['OPENAI_API_BASE'] = 'https://openrouter.ai/api/v1'


In [None]:
# This cell opens a file-picker so you can upload PDFs from your computer.
# The uploaded files will appear in the Colab runtime and can be accessed by name.

from google.colab import files

uploaded = files.upload()  # Allows you to upload one or multiple PDF files


Saving OS interview questions.pdf to OS interview questions (1).pdf


In [None]:
# Load PDFs using LangChain's PyPDFLoader.
# This converts your uploaded PDF into a list of Document objects (one per page).

from langchain_community.document_loaders import PyPDFLoader

# Automatically picks the first uploaded file
file_path = list(uploaded.keys())[0]

# Initialize the PDF loader
loader = PyPDFLoader(file_path)

# Load the PDF into LangChain Document objects
docs = loader.load()

print("Number of pages loaded:", len(docs))

# Display the content of the second page (index 1) as a preview
docs[1]


Number of pages loaded: 38


Document(metadata={'producer': 'Skia/PDF m85', 'creator': 'Chromium', 'creationdate': '2021-06-11T07:54:25+00:00', 'moddate': '2021-06-11T07:54:25+00:00', 'source': 'OS interview questions (1).pdf', 'total_pages': 38, 'page': 1, 'page_label': '2'}, page_content="Basic OS Interview Questions\n1.\xa0\xa0\xa0Why is the operating system important?\n2.\xa0\xa0\xa0What's the main purpose of an OS? What are the diﬀerent types of OS?\n3.\xa0\xa0\xa0What are the benefits of a multiprocessor system?\n4.\xa0\xa0\xa0What is RAID structure in OS? What are the diﬀerent levels of RAID configuration?\n5.\xa0\xa0\xa0What is GUI?\n6.\xa0\xa0\xa0What is a Pipe and when it is used?\n7.\xa0\xa0\xa0What are the diﬀerent kinds of operations that are possible on semaphore?\n8.\xa0\xa0\xa0What is a bootstrap program in OS?\n9.\xa0\xa0\xa0Explain demand paging?\n10.\xa0\xa0\xa0What do you mean by RTOS?\n11.\xa0\xa0\xa0What do you mean by process synchronization?\n12.\xa0\xa0\xa0What is IPC? What are the diﬀeren

In [None]:
# RecursiveCharacterTextSplitter splits long PDF pages into smaller chunks.
# These chunks are used later for embedding + retrieval in the RAG pipeline.

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Create a text splitter:
# - chunk_size: maximum number of characters per chunk
# - chunk_overlap: repeated characters between chunks to preserve context across boundaries
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)

# Split all pages into smaller, retrieval-friendly text chunks
splits = text_splitter.split_documents(docs)

print("Total number of chunks created:", len(splits))
splits[0]  # Show the first chunk as a preview


Total number of chunks created: 54


Document(metadata={'producer': 'Skia/PDF m85', 'creator': 'Chromium', 'creationdate': '2021-06-11T07:54:25+00:00', 'moddate': '2021-06-11T07:54:25+00:00', 'source': 'OS interview questions (1).pdf', 'total_pages': 38, 'page': 0, 'page_label': '1'}, page_content='Operating System Interview\nQuestions\nTo view the live version of the\npage, click here.\n© Copyright by Interviewbit')
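The interaction between `chunk_size` and `chunk_overlap` can be illustrated with a simplified, fixed-width sliding window. This is an assumption-level sketch, not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to break on separators (by default `"\n\n"`, `"\n"`, `" "`, then `""`) so that chunks follow paragraph and word boundaries, whereas the hypothetical `sliding_window_chunks` below ignores separators entirely and only shows how consecutive chunks share `chunk_overlap` characters.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical helper: fixed-size character windows sharing chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window already reaches the end of the text
            break
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(text, chunk_size=10, chunk_overlap=4)
print(chunks)
```

Each chunk repeats the last 4 characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.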

In [None]:
# Print the total number of text chunks produced after splitting the PDF.
# This re-checks the chunk count before the chunks are embedded.

print("Number of chunks generated:", len(splits))


Number of chunks generated: 54


In [None]:
# Import dependencies for embeddings and vector store creation
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
import os
from google.colab import userdata

# ---------------------------------------------------------------
# Ensure the OpenRouter API key and base URL are in the environment
# (same secret as above, stored in the Colab Secrets panel)
# ---------------------------------------------------------------
os.environ['OPENAI_API_KEY'] = userdata.get('OpenRouterAPIKey')
os.environ['OPENAI_API_BASE'] = 'https://openrouter.ai/api/v1'

# ---------------------------------------------------------------
# Create a Chroma vector store with OpenAI embeddings via OpenRouter.
# No persist_directory is given, so the embeddings live in memory
# and are lost when the Colab runtime restarts.
# ---------------------------------------------------------------
vectorstore = Chroma.from_documents(
    documents=splits,                     # The text chunks we created earlier
    embedding=OpenAIEmbeddings(           # Embedding model for vector database
        openai_api_key=os.environ['OPENAI_API_KEY'],
        openai_api_base=os.environ['OPENAI_API_BASE']
    )
)

# Show vectorstore info
vectorstore


<langchain_community.vectorstores.chroma.Chroma at 0x7e9094d7f1d0>

In [None]:
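# This cell deletes every chunk from the current Chroma collection so you can rebuild the vector store.
# Re-running Chroma.from_documents in the same session can append duplicate chunks
# to the existing collection, so clear it first when:
# - You upload a new PDF
# - You change chunking settings
# - You want a clean vector database
# Skip this cell on a normal top-to-bottom run: it empties the collection, and the
# retriever would return no context until the store is rebuilt.

try:
    # Delete every chunk whose 'page' metadata is >= 0 (all PyPDFLoader chunks carry a page number)
    vectorstore._collection.delete(where={'page': {'$gte': 0}})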
    print("✔️ Chroma collection deleted successfully.")
    print("➡️ Re-run the cells starting from the 'splits' creation to rebuild the vector store.")
except Exception as e:
    print("⚠️ No existing collection found or deletion failed.")
    print("Error:", e)


✔️ Chroma collection deleted successfully.
➡️ Re-run the cells starting from the 'splits' creation to rebuild the vector store.


In [None]:
# This prints the total number of embedded chunks stored inside the Chroma vector database.
# Use this to confirm whether your embeddings were created successfully.

print("Number of embedded documents in Chroma:", vectorstore._collection.count())


Number of embedded documents in Chroma: 54


In [None]:
# Convert the Chroma vector store into a retriever.
# The retriever is responsible for:
# - Accepting a user query
# - Finding the most similar chunks from the PDF (the 4 nearest by default)
# - Passing those chunks to the LLM for grounded answering

retriever = vectorstore.as_retriever()

print("Retriever initialized successfully.")


Retriever initialized successfully.
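Conceptually, the retriever embeds the query and returns the stored chunks closest to it; `as_retriever()` returns the top 4 unless you pass `search_kwargs={"k": ...}`. The sketch below shows only that ranking step, using toy 3-dimensional vectors and cosine similarity as an illustrative assumption. It is not Chroma's implementation, which uses an approximate (HNSW) index and, unless configured otherwise, L2 distance. The function name `top_k_by_cosine` and the vectors are hypothetical.

```python
import numpy as np

def top_k_by_cosine(query_vec, doc_vecs, k=4):
    """Return the indices of the k document vectors most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)                          # unit-length query
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)     # unit-length rows
    scores = d @ q                                                     # cosine similarity per document
    # argsort is ascending, so reverse it for highest-first and keep the first k
    return [int(i) for i in np.argsort(scores)[::-1][:k]]

# Hypothetical "embeddings" for five chunks
doc_vecs = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.7, 0.7, 0.0],
])
query = np.array([1.0, 0.05, 0.0])
print(top_k_by_cosine(query, doc_vecs, k=2))
```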


In [None]:
# This cell suppresses noisy deprecation warnings coming from Jupyter internals.
# It does NOT affect your model, embeddings, or RAG pipeline.
# It simply keeps the notebook output clean and readable.

import warnings

warnings.filterwarnings(
    'ignore',
    category=DeprecationWarning,
    module='jupyter_client'
)

print("✔️ Deprecation warnings from jupyter_client are now suppressed.")




In [None]:
# Import LangChain's ChatPromptTemplate for constructing structured prompts
from langchain_core.prompts import ChatPromptTemplate

# ---------------------------------------------------------------
# Define the RAG prompt template:
# - The system message tells the model HOW to answer
# - The human message contains the actual question + retrieved context
# - {question} and {context} will be dynamically filled during inference
# ---------------------------------------------------------------
prompt = ChatPromptTemplate.from_messages([
    (
        "system",
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved context to answer the question. "
        "If you don't know the answer, just say that you don't know. "
        "Use three sentences maximum and keep the answer concise."
    ),
    (
        "human",
        "Question: {question}\nContext: {context}"
    )
])

print("RAG prompt template created successfully.")


RAG prompt template created successfully.


In [None]:
from langchain_openai import ChatOpenAI

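# Initialize the LLM. max_tokens caps the length of each answer, which limits OpenRouter credit usage.
# The prompt asks for three sentences at most, so 1000 tokens leaves ample headroom.
# No model argument is passed, so the wrapper's default model name is sent to OpenRouter;
# pass model="..." to choose a model explicitly.
llm = ChatOpenAI(max_tokens=1000)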

print("LLM initialized successfully using OpenRouter settings with max_tokens set to 1000.")

LLM initialized successfully using OpenRouter settings with max_tokens set to 1000.


In [None]:
# Import RunnablePassthrough:
# It forwards its input unchanged. In the RAG chain it passes the user's
# question straight into the prompt, alongside the retrieved context.

from langchain_core.runnables import RunnablePassthrough

# Import StrOutputParser:
# This converts the model's output message (an AIMessage object)
# into a plain string that we can print, store, or evaluate.

from langchain_core.output_parsers import StrOutputParser

print("Runnable and output parser imported successfully.")


Runnable and output parser imported successfully.


In [None]:
# This helper function takes a list of LangChain Document objects
# and converts them into a single string.
#
# Why do we need this?
# - The retriever returns multiple document chunks.
# - The LLM expects the context as plain text, not objects.
# - This function joins the text content of each chunk cleanly.

def format_docs(docs):
    return "\n".join(doc.page_content for doc in docs)

print("Document formatting function created successfully.")


Document formatting function created successfully.


In [None]:
# ---------------------------------------------------------------
# Build the RAG pipeline (Retriever → Prompt → LLM → Parser)
#
# The flow works like this:
# 1. The user provides a question.
# 2. The retriever fetches the most relevant document chunks.
# 3. format_docs converts those chunks into plain text.
# 4. The prompt is filled with {question} and {context}.
# 5. The LLM generates an answer grounded in the retrieved context.
# 6. StrOutputParser converts the LLM output into a clean string.
# ---------------------------------------------------------------

rag_chain = (
    {
        "context": retriever | format_docs,   # Retrieve docs → format them
        "question": RunnablePassthrough()     # Pass the question unchanged
    }
    | prompt        # Fill the prompt template
    | llm           # Generate answer using the model
    | StrOutputParser()   # Convert output to plain text
)

print("RAG chain constructed successfully.")


RAG chain constructed successfully.
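In LCEL, the dict at the head of `rag_chain` is coerced into a `RunnableParallel`: both branches receive the same input (the question string), and their outputs are collected into the `{"context": ..., "question": ...}` dict that fills the prompt. The plain-Python sketch below mimics that data flow without LangChain. `Doc`, `fake_retriever`, `fill_prompt`, and `fake_llm` are hypothetical stand-ins; only the human message of the prompt is reproduced, and the "parser" is simply `str`.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    # Hypothetical stand-in for LangChain's Document (only page_content is used)
    page_content: str

def fake_retriever(question: str) -> list[Doc]:
    # Hypothetical retriever: always returns the same two chunks
    return [Doc("A thread is a basic unit of CPU utilization."),
            Doc("Threads share code and data with their process.")]

def format_docs(docs):
    # Same joining logic as the notebook's format_docs
    return "\n".join(doc.page_content for doc in docs)

def fill_prompt(inputs: dict) -> str:
    # Mirrors the human message of the RAG prompt template
    return f"Question: {inputs['question']}\nContext: {inputs['context']}"

def fake_llm(prompt_text: str) -> str:
    # Hypothetical model: answers with the first line of the context
    return prompt_text.split("Context: ", 1)[1].splitlines()[0]

def run_chain(question: str) -> str:
    # Both dict branches receive the SAME input: the question string
    inputs = {
        "context": format_docs(fake_retriever(question)),  # retriever | format_docs
        "question": question,                               # RunnablePassthrough()
    }
    # prompt -> llm -> parser
    return str(fake_llm(fill_prompt(inputs)))

print(run_chain("What is a thread?"))
```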


In [None]:
rag_chain.invoke("What is today's date")

In [None]:
# -------------------------------------------------------------------------
# This string holds CSV data: one question and its ground-truth answer per row.
# The next cell parses it with csv.DictReader(StringIO(...)).
#
# Why store ground truth this way?
# - Easy to modify inside the notebook
# - Can be exported
# - Provides reference answers for the RAG vs LLM hallucination evaluation
# -------------------------------------------------------------------------

ground_truth_string = '''
question,ground_truth
What is a Bootstrap Program?,"The document defines a bootstrap program as the first code executed when the system starts, stored entirely in the boot block at a fixed disk location, and responsible for locating the kernel and loading it into main memory before execution begins."
What is RTOS?,"A Real-Time Operating System is described specifically as an OS used for applications where data must be processed within a fixed and small measure of time, and examples given include air-traffic control systems, anti-lock braking systems, and heart pacemakers."
What is process synchronization?,"Process synchronization is defined as the mechanism used to coordinate processes that share resources to maintain data consistency, achieved through mutual exclusion between independent and cooperative processes."
What is IPC?,"Interprocess Communication is defined as a mechanism where processes exchange data using OS-approved methods such as pipes, message queues, semaphores, sockets, shared memory, and signals."
What is the difference between main memory and secondary memory?,"Main memory is defined as volatile read-write memory accessed directly by the processing unit, whereas secondary memory is non-volatile storage requiring data to be transferred to primary memory before CPU access and accessed only via I/O channels."
What is virtual memory?,"Virtual memory is described as a memory management technique that creates the illusion of a very large main memory by storing programs as pages and managing them through paging or segmentation."
What is a thread?,"A thread is defined as a basic unit of CPU utilization that consists of a program counter, thread identifier, stack, and a set of registers, enabling improved performance through parallelism and reduced context-switching overhead."
What is FCFS?,"FCFS is described as a scheduling algorithm in which the process that arrives first is executed first, following a strictly non-preemptive order."
What is thrashing?,"Thrashing occurs when excessive paging operations significantly reduce CPU utilization because the system spends more time swapping pages than executing processes due to improper memory allocation."
What is a deadlock?,"A deadlock is defined as a state where processes wait indefinitely for resources held by one another, arising only when the four necessary conditions—mutual exclusion, hold and wait, no preemption, and circular wait—are simultaneously present."
What is spooling?,"Spooling is defined as the technique of storing data temporarily for devices that operate at different speeds, such as printers, allowing the OS to manage data flow independently of the device's processing rate."
What is a time-sharing system?,"A time-sharing system is described as an OS that allows multiple users to share computer resources simultaneously by rapidly switching the CPU among them to provide interactive response times."
What is context switching?,"Context switching is defined specifically as saving the state of a currently running process and loading the saved state of another process, allowing multitasking by the OS."
What is SMP (Symmetric Multiprocessing)?,"SMP is defined as an architecture in which multiple processors share a single operating system and memory, and all processors are treated equally and run in parallel."
Explain the difference between kernel and OS?,"The kernel is defined as the core component of the system responsible for low-level operations like scheduling, memory management, and device handling, whereas the operating system includes the kernel plus all supporting system software and utilities."
'''

print("Ground truth string loaded successfully.")


Ground truth string loaded successfully.


In [None]:
# -------------------------------------------------------------------------
# Convert the multi-line CSV string (ground_truth_string) into structured data.
# Using Python's `csv` module ensures correct parsing even when quoted fields contain commas.
# -------------------------------------------------------------------------

import csv
from io import StringIO

ground_truth_data = []

# Create CSV reader from string
csv_reader = csv.DictReader(StringIO(ground_truth_string.strip()))

# Each row will automatically contain: {"question": ..., "ground_truth": ...}
for row in csv_reader:
    ground_truth_data.append({
        "question": row["question"].strip(),
        "ground_truth": row["ground_truth"].strip()
    })

# Preview a few entries
print("Structured ground truth data created successfully:")
print(ground_truth_data[:3])   # show first 3 items


Structured ground truth data created successfully:
[{'question': 'What is a Bootstrap Program?', 'ground_truth': 'The document defines a bootstrap program as the first code executed when the system starts, stored entirely in the boot block at a fixed disk location, and responsible for locating the kernel and loading it into main memory before execution begins.'}, {'question': 'What is RTOS?', 'ground_truth': 'A Real-Time Operating System is described specifically as an OS used for applications where data must be processed within a fixed and small measure of time, and examples given include air-traffic control systems, anti-lock braking systems, and heart pacemakers.'}, {'question': 'What is process synchronization?', 'ground_truth': 'Process synchronization is defined as the mechanism used to coordinate processes that share resources to maintain data consistency, achieved through mutual exclusion between independent and cooperative processes.'}]
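Every ground-truth answer contains commas, which is why the cell relies on `csv.DictReader` rather than splitting each line on `,`: fields wrapped in double quotes are kept whole. A small self-contained check, using an illustrative one-row sample in the same two-column format (not one of the notebook's rows verbatim):

```python
import csv
from io import StringIO

# Illustrative sample in the same format as ground_truth_string
sample = '''question,ground_truth
What is IPC?,"Processes exchange data via pipes, sockets, and shared memory."
'''

# csv.DictReader honours the double quotes, so the answer stays one field
parsed_rows = list(csv.DictReader(StringIO(sample)))
print(parsed_rows[0]["ground_truth"])

# A naive comma split breaks the same line into four pieces
naive = sample.splitlines()[1].split(",")
print(len(naive))
```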


In [None]:
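# -------------------------------------------------------------------------
# This cell generates:
#   1. RAG answers via rag_chain (retrieval → context → LLM)
#   2. "LLM (with context)" answers: the same retriever, prompt, and model,
#      called step by step instead of through rag_chain
#
# Because both paths send essentially the same messages to the same model,
# differences between the two answers reflect generation randomness, not retrieval.
#
# Results are stored in `for_analysis` for hallucination evaluation and comparison.
# -------------------------------------------------------------------------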

for_analysis = []

for item in ground_truth_data:  # include ALL questions (no skipping)
    question = item['question']
    ground_truth = item['ground_truth']

    print("\n==============================================")
    print("QUESTION:", question)
    print("GROUND TRUTH:", ground_truth)

    # ---------------------------------------------------
    # (1) RAG Answer → rag_chain handles retrieval + prompt + LLM
    # ---------------------------------------------------
    rag_answer = rag_chain.invoke(question)
    print("RAG Answer:", rag_answer)

    # ---------------------------------------------------
    # (2) LLM Answer with Context (manual RAG-style call)
    # ---------------------------------------------------
    retrieved_docs = retriever.invoke(question)
    formatted_context = format_docs(retrieved_docs)

    # Build messages for ChatOpenAI manually
    llm_messages = prompt.format_messages(
        question=question,
        context=formatted_context
    )

    # Generate answer from LLM (same model as RAG step)
    llm_answer = llm.invoke(llm_messages).content
    print("LLM (with context) Answer:", llm_answer)

    # Store everything for evaluation
    for_analysis.append({
        "Question": question,
        "Ground Truth": ground_truth,
        "RAG Answer": rag_answer,
        "LLM (with context) Answer": llm_answer,
    })

print("\n✔️ All answers generated and stored in `for_analysis`.")



QUESTION: What is a Bootstrap Program?
GROUND TRUTH: The document defines a bootstrap program as the first code executed when the system starts, stored entirely in the boot block at a fixed disk location, and responsible for locating the kernel and loading it into main memory before execution begins.
RAG Answer: A bootstrap program in an operating system is the initial program that initializes the operating system during startup. It is the first code executed when a computer system boots up, loading the OS through a bootstrapping process. The bootstrap program is crucial for locating the kernel, loading it into memory, and starting the execution of the OS.
LLM (with context) Answer: A bootstrap program in an operating system is the initial program that initializes the OS during startup. It is the first code executed when the computer system starts up, loading the OS through a bootstrapping process. The bootstrap program is crucial for locating and loading the kernel into main memory b

In [None]:
# -------------------------------------------------------------------------
# IMPORTS
# -------------------------------------------------------------------------

from sentence_transformers import SentenceTransformer, util
import pandas as pd
import numpy as np
import re

# -------------------------------------------------------------------------
# Load SentenceTransformer embedding model
# Used for semantic similarity scoring
# -------------------------------------------------------------------------

model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')

# -------------------------------------------------------------------------
# Base Evaluation Function
# Computes:
#   1. semantic similarity score
#   2. exact match (0/1)
#   3. plain hallucination score (1 - similarity)
# -------------------------------------------------------------------------

def evaluate(ground_truth, prediction):
    # Semantic similarity using embeddings
    emb_gt = model.encode(ground_truth, convert_to_tensor=True)
    emb_pred = model.encode(prediction, convert_to_tensor=True)
    similarity = float(util.cos_sim(emb_gt, emb_pred))

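    # Simple exact match (almost always 0 for free-form, multi-sentence answers)
    exact = 1 if prediction.strip().lower() == ground_truth.strip().lower() else 0

    # Baseline hallucination score (1 - similarity); the scoring loop below
    # discards it and uses the combined hallucination_score() instead
    baseline_hallucination = 1 - similarity

    return similarity, exact, baseline_hallucination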

# -------------------------------------------------------------------------
# Advanced Hallucination Detector
# Incorporates:
#   - semantic drift (1 - similarity)
#   - unsupported content: share of the answer's unique words that never appear
#     in the ground truth (no stopword filtering or stemming, so paraphrases count too)
#
# final score = 0.6*(1 - similarity) + 0.4*(unsupported_ratio)
# -------------------------------------------------------------------------

def hallucination_score(ground_truth, answer, similarity):
    # Tokenize into words
    gt_tokens = set(re.findall(r"\w+", ground_truth.lower()))
    ans_tokens = set(re.findall(r"\w+", answer.lower()))

    # Words appearing in the answer but NOT in the ground truth
    unsupported = ans_tokens - gt_tokens

    # Ratio of unsupported tokens in the answer
    unsupported_ratio = (len(unsupported) / len(ans_tokens)) if len(ans_tokens) > 0 else 0

    # Combined hallucination score (0–1 scale, higher is worse)
    score = 0.6*(1 - similarity) + 0.4*unsupported_ratio
    return score, unsupported_ratio

# -------------------------------------------------------------------------
# Apply scoring to ALL question–answer pairs
# Generates a DataFrame with:
#   sim_rag, sim_llm
#   ex_rag, ex_llm
#   hall_rag, hall_llm
#   unsupported_rag, unsupported_llm
# -------------------------------------------------------------------------

rows = []

for row in for_analysis:
    # Evaluate RAG
    sim_rag, ex_rag, _ = evaluate(row["Ground Truth"], row["RAG Answer"])

    # Evaluate LLM with context
    sim_llm, ex_llm, _ = evaluate(row["Ground Truth"], row["LLM (with context) Answer"])

    # Compute advanced hallucination scores
    hall_rag, unsup_rag = hallucination_score(
        row["Ground Truth"],
        row["RAG Answer"],
        sim_rag
    )

    hall_llm, unsup_llm = hallucination_score(
        row["Ground Truth"],
        row["LLM (with context) Answer"],
        sim_llm
    )

    # Store results for DataFrame
    rows.append({
        "question": row["Question"],
        "sim_rag": sim_rag,
        "sim_llm": sim_llm,
        "ex_rag": ex_rag,
        "ex_llm": ex_llm,
        "hall_rag": hall_rag,
        "hall_llm": hall_llm,
        "unsupported_rag": unsup_rag,
        "unsupported_llm": unsup_llm
    })

# Create final results table
df = pd.DataFrame(rows)

print("✔️ Evaluation complete — DataFrame created:")
df


✔️ Evaluation complete — DataFrame created:


,question,sim_rag,sim_llm,ex_rag,ex_llm,hall_rag,hall_llm,unsupported_rag,unsupported_llm
0,What is a Bootstrap Program?,0.928359,0.929304,0,0,0.237579,0.222417,0.486486,0.45
1,What is RTOS?,0.809098,0.821102,0,0,0.298541,0.32473,0.46,0.543478
2,What is process synchronization?,0.905188,0.855818,0,0,0.294725,0.366509,0.594595,0.7
3,What is IPC?,0.78334,0.772284,0,0,0.390866,0.40046,0.652174,0.659574
4,What is the difference between main memory and...,0.884665,0.896326,0,0,0.363938,0.356544,0.736842,0.735849
5,What is virtual memory?,0.921704,0.920703,0,0,0.313644,0.337774,0.666667,0.72549
6,What is a thread?,0.908232,0.91456,0,0,0.330061,0.307264,0.6875,0.64
7,What is FCFS?,0.923996,0.919737,0,0,0.347643,0.341491,0.755102,0.733333
8,What is thrashing?,0.889805,0.926398,0,0,0.354352,0.327495,0.720588,0.708333
9,What is a deadlock?,0.916296,0.922271,0,0,0.28295,0.26886,0.581818,0.555556
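The combined score can be checked by hand. The sketch below restates the cosine similarity that `util.cos_sim` computes for a pair of vectors and the 0.6/0.4 weighting from `hallucination_score`, then reproduces row 0's `hall_rag` from its `sim_rag` and `unsupported_rag` values. The helper names `cosine_similarity` and `combined_score` are hypothetical. It also explains the absence of F labels in the rows shown: a score below 0.15 requires `0.4 * unsupported_ratio < 0.15`, i.e. an unsupported ratio under 0.375 even at perfect similarity, and every unsupported ratio in the table is 0.45 or higher.

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(a, b) = a·b / (|a| |b|), the value util.cos_sim returns for a pair of vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def combined_score(similarity, unsupported_ratio):
    # Same weighting as hallucination_score(): 0.6 * semantic drift + 0.4 * unsupported ratio
    return 0.6 * (1 - similarity) + 0.4 * unsupported_ratio

print(cosine_similarity([1, 0], [1, 1]))  # about 0.7071

# Row 0, RAG answer: sim_rag = 0.928359, unsupported_rag = 0.486486
print(round(combined_score(0.928359, 0.486486), 6))  # 0.237579 (hall_rag in the table)

# Even a perfect-similarity answer lands in P if half its words are unsupported
print(combined_score(1.0, 0.5))
```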


In [None]:
# -------------------------------------------------------------------------
# Automatic Hallucination Labeling
#
# Based on final hallucination score:
#   - F (Fully Grounded)       → score < 0.15
#   - P (Partially Grounded)   → 0.15 ≤ score < 0.35
#   - H (Hallucinated)         → score ≥ 0.35
#
# These thresholds are heuristic cut-offs for the combined metric
# (similarity + unsupported content ratio); adjust them if the labels look too strict or lenient.
# -------------------------------------------------------------------------

def auto_label_hall(score):
    if score < 0.15:
        return "F"   # Fully grounded (very low hallucination)
    elif score < 0.35:
        return "P"   # Partially grounded (moderate hallucination)
    else:
        return "H"   # Hallucinated (high hallucination)

# Apply labels to both RAG and LLM hallucination scores
df["label_rag"] = df["hall_rag"].apply(auto_label_hall)
df["label_llm"] = df["hall_llm"].apply(auto_label_hall)

print("✔️ Automatic hallucination labels (F/P/H) added to DataFrame.")
df[["question", "label_rag", "label_llm"]]


✔️ Automatic hallucination labels (F/P/H) added to DataFrame.


,question,label_rag,label_llm
0,What is a Bootstrap Program?,P,P
1,What is RTOS?,P,P
2,What is process synchronization?,P,H
3,What is IPC?,H,H
4,What is the difference between main memory and...,H,H
5,What is virtual memory?,P,P
6,What is a thread?,P,P
7,What is FCFS?,P,P
8,What is thrashing?,H,P
9,What is a deadlock?,P,P
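To experiment with different cut-offs, the same rule can be expressed as a sorted threshold list searched with the standard-library `bisect` module. This is an alternative sketch, not the notebook's function; `THRESHOLDS`, `LABELS`, and `label_by_threshold` are hypothetical names. Scores exactly on a boundary go to the higher-risk label, matching the `<` comparisons in `auto_label_hall`.

```python
from bisect import bisect_right

# Cut-offs from auto_label_hall: F < 0.15 <= P < 0.35 <= H
THRESHOLDS = [0.15, 0.35]
LABELS = ["F", "P", "H"]

def label_by_threshold(score: float) -> str:
    # bisect_right counts how many cut-offs are <= score, which indexes the label
    return LABELS[bisect_right(THRESHOLDS, score)]

# Two scores from the table (hall_rag for rows 0 and 8) plus both boundaries
print([label_by_threshold(s) for s in (0.237579, 0.354352, 0.15, 0.35)])
```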


In [None]:
# -------------------------------------------------------------------------
# Compute percentage distribution of F / P / H labels automatically.
#
# For a given column (label_rag or label_llm), this function returns:
#   - Fully Grounded %
#   - Partially Grounded %
#   - Hallucinated %
#
# This makes it easy to compare RAG and LLM hallucination behavior.
# -------------------------------------------------------------------------

def stats_auto(col):
    # (df[col] == label).mean() is the fraction of rows carrying that label
    return {
        "Fully Grounded %": float(round((df[col] == "F").mean() * 100, 2)),
        "Partially Grounded %": float(round((df[col] == "P").mean() * 100, 2)),
        "Hallucinated %": float(round((df[col] == "H").mean() * 100, 2))
    }

# Compute the label distribution for each answer type
stats_rag = stats_auto("label_rag")
stats_llm = stats_auto("label_llm")

print("📊 RAG Auto Labels:", stats_rag)
print("📊 LLM Auto Labels:", stats_llm)


📊 RAG Auto Labels: {'Fully Grounded %': 0.0, 'Partially Grounded %': 73.33, 'Hallucinated %': 26.67}
📊 LLM Auto Labels: {'Fully Grounded %': 0.0, 'Partially Grounded %': 66.67, 'Hallucinated %': 33.33}
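These percentages are shares of all 15 questions in `ground_truth_data`; the DataFrame previews above show only the first 10 rows. For example, 26.67% corresponds to 4 of 15 RAG answers labelled H (and 73.33% to 11 labelled P), while 33.33% corresponds to 5 of 15 LLM answers. A minimal stdlib sketch of the same computation, using hypothetical label lists reconstructed from those counts (`label_percentages` is an illustrative name, not part of the notebook):

```python
from collections import Counter

def label_percentages(labels):
    """Share of each F/P/H label, in percent, rounded to two decimals."""
    counts = Counter(labels)  # missing labels count as 0
    n = len(labels)
    return {label: round(counts[label] / n * 100, 2) for label in ("F", "P", "H")}

# Hypothetical label lists matching the counts implied by the percentages above
rag_labels = ["P"] * 11 + ["H"] * 4
llm_labels = ["P"] * 10 + ["H"] * 5
print(label_percentages(rag_labels))
print(label_percentages(llm_labels))
```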
