**Run 19 July - RAG

RAG for geetha vahini.pdf




# Part 1: (Aggressive) Installation.

IMPORTANT: follow these steps precisely:
1.	Run this code block.
2.	After it completes, go to Runtime -> Disconnect and delete runtime.
3.	Once the runtime restarts, RUN THIS CODE BLOCK AGAIN.
4.	After this block finishes its second run, you can safely proceed to the next sections.

In [1]:
# --- Installation Block ---
print("Starting library installations and upgrades for RAG...")

# Aggressively uninstall to ensure a clean slate for core libraries
# This helps prevent conflicts with pre-installed Colab packages or previous runs.
!pip uninstall -y torch torchvision torchaudio transformers accelerate bitsandbytes trl peft datasets xformers langchain langchain-community pypdf chromadb sentence-transformers

# Clear relevant caches
# Clearing bitsandbytes cache can help resolve issues with 4-bit quantization.
print("Clearing bitsandbytes cache...")
!rm -rf ~/.cache/bitsandbytes
# Clearing Hugging Face cache can help if model downloads were corrupted.
print("Clearing Hugging Face cache...")
!rm -rf ~/.cache/huggingface/hub/*

# Install PyTorch and Torchvision specifically for CUDA 12.1 (common in Colab)
# This is crucial for GPU acceleration and compatibility.
print("Installing PyTorch and Torchvision for CUDA 12.1...")
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install core Hugging Face libraries
# `transformers`: For loading LLMs and tokenizers.
# `accelerate`: For efficient model loading and GPU usage.
# `bitsandbytes`: For 4-bit quantization (memory efficiency).
# `trl`: For Transformer Reinforcement Learning (though not directly used in Phase 1, it's common for fine-tuning).
# `peft`: Parameter-Efficient Fine-Tuning (for LoRA adapters, used later).
# `datasets`: For handling datasets efficiently.
print("Installing transformers, accelerate, bitsandbytes, trl, peft, datasets...")
!pip install transformers accelerate bitsandbytes "trl==0.8.6" peft datasets

# Install RAG-specific libraries
# `langchain`: Orchestration framework for building LLM applications.
# `langchain-community`: Contains various integrations, including PDF loaders and vector stores.
# `pypdf`: Library for reading PDF files.
# `chromadb`: The specific vector database we'll use.
# `sentence-transformers`: For the embedding model.
print("Installing langchain, langchain-community, pypdf, chromadb, sentence-transformers...")
!pip install langchain langchain-community pypdf chromadb sentence-transformers

# xformers is optional, uncomment if you want to try it. It can optimize attention mechanisms
# but sometimes causes installation/compatibility issues. Not strictly necessary for basic RAG.
# !pip install xformers

print("\nLibrary installation complete.")
print("IMPORTANT: Please follow the instructions above about restarting the runtime.")


Starting library installations and upgrades for RAG...
Found existing installation: torch 2.5.1+cu121
Uninstalling torch-2.5.1+cu121:
  Successfully uninstalled torch-2.5.1+cu121
Found existing installation: torchvision 0.20.1+cu121
Uninstalling torchvision-0.20.1+cu121:
  Successfully uninstalled torchvision-0.20.1+cu121
Found existing installation: torchaudio 2.5.1+cu121
Uninstalling torchaudio-2.5.1+cu121:
  Successfully uninstalled torchaudio-2.5.1+cu121
Found existing installation: transformers 4.53.2
Uninstalling transformers-4.53.2:
  Successfully uninstalled transformers-4.53.2
Found existing installation: accelerate 1.9.0
Uninstalling accelerate-1.9.0:
  Successfully uninstalled accelerate-1.9.0
Found existing installation: bitsandbytes 0.46.1
Uninstalling bitsandbytes-0.46.1:
  Successfully uninstalled bitsandbytes-0.46.1
Found existing installation: trl 0.8.6
Uninstalling trl-0.8.6:
  Successfully uninstalled trl-0.8.6
Found existing installation: peft 0.16.0
Uninstalling pe

# Part 2: Phase 1:

Step Description	Component/Model Used	Vendor/Library/Database Used
Phase 1: Knowledge Ingestion (Building Your Searchable Library)
1. Document Loading & Text Extraction	N/A (Text extraction logic)	PyPDFLoader (from langchain-community)
2. Text Chunking	N/A (Text splitting algorithm)	RecursiveCharacterTextSplitter (from langchain.text_splitter)
3. Embedding (Vectorization)	Embedding Model: BAAI/bge-small-en-v1.5 (a Sentence-BERT model)	Hugging Face (model hub), sentence-transformers library
4. Vector Store (Storage & Indexing)	N/A (Database technology)	ChromaDB (from langchain-community.vectorstores)


In [1]:
import os
from google.colab import drive # For Google Drive mounting
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

# --- Configuration ---
# Ensure this DATA_PATH matches the folder where your PDF is located in Google Drive
DATA_PATH = "/content/drive/MyDrive/fpdata/rag_geetha_vahini"
PDF_FILE_NAME = "geetha_vahini.pdf"
PDF_FILE_PATH = os.path.join(DATA_PATH, PDF_FILE_NAME)
CHROMA_DB_PATH = os.path.join(DATA_PATH, "chroma_db") # Path to store Chroma DB persistently

# --- Mount Google Drive (if not already mounted in this session) ---
print("Mounting Google Drive...")
if not os.path.exists('/content/drive/MyDrive'):
    try:
        drive.mount('/content/drive')
        print("Google Drive mounted successfully!")
    except Exception as e:
        print(f"Error mounting Google Drive: {e}")
        print("Please ensure you are running this in a Google Colab environment and authorize Drive access.")
        exit()
else:
    print("Google Drive already mounted.")

# Ensure the RAG data directory exists
os.makedirs(DATA_PATH, exist_ok=True)
print(f"Ensured RAG data directory exists: {DATA_PATH}")

# --- Check for PDF existence ---
if not os.path.exists(PDF_FILE_PATH):
    print(f"Error: PDF file '{PDF_FILE_NAME}' not found at {PDF_FILE_PATH}.")
    print("Please ensure your 'Geetha Vahini.pdf' is uploaded to this path.")
    exit()
else:
    print(f"PDF file '{PDF_FILE_NAME}' found. Proceeding.")

# --- Step 1: Load PDF Document ---
print(f"\nStep 1: Loading PDF document from: {PDF_FILE_PATH}")
try:
    loader = PyPDFLoader(PDF_FILE_PATH)
    documents = loader.load()
    print(f"Loaded {len(documents)} pages from the PDF.")
except Exception as e:
    print(f"Error loading PDF: {e}")
    print("Please ensure the PDF file is valid and accessible.")
    exit()

# --- Step 2: Split Text into Chunks ---
print("\nStep 2: Splitting documents into chunks...")
# Recommended chunk_size and chunk_overlap for general QA
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # Max characters per chunk
    chunk_overlap=200,    # Overlap between chunks to maintain context
    length_function=len,
    add_start_index=True, # Adds metadata about where the chunk came from
)
chunks = text_splitter.split_documents(documents)
print(f"Created {len(chunks)} text chunks.")
if chunks:
    print(f"Example chunk (first 200 chars): {chunks[0].page_content[:200]}...")
    print(f"Source page for example chunk: {chunks[0].metadata.get('page', 'N/A')}")
else:
    print("No chunks were created. Check PDF content or chunking parameters.")

# --- Step 3 & 4: Create Embeddings and Build Vector Store (ChromaDB) ---
print("\nStep 3 & 4: Creating embeddings and building/loading vector store (this may take some time)...")
# Choose an embedding model. 'BAAI/bge-small-en-v1.5' is a good balance of quality and speed.
embedding_model_name = "BAAI/bge-small-en-v1.5"
embeddings = SentenceTransformerEmbeddings(model_name=embedding_model_name)

try:
    # Check if a Chroma DB already exists at the path to avoid re-embedding
    if os.path.exists(CHROMA_DB_PATH) and os.listdir(CHROMA_DB_PATH):
        print(f"Loading existing Chroma DB from {CHROMA_DB_PATH}...")
        vectorstore = Chroma(persist_directory=CHROMA_DB_PATH, embedding_function=embeddings)
        print("Existing Chroma DB loaded.")
    else:
        print(f"Creating new Chroma DB and persisting to {CHROMA_DB_PATH}...")
        if not chunks:
            print("Cannot create vector store: No chunks available. Please check previous steps.")
            exit()
        vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=CHROMA_DB_PATH)
        vectorstore.persist() # Explicitly persist the database to disk
        print("New Chroma DB created and persisted.")
except Exception as e:
    print(f"Error creating/loading ChromaDB: {e}")
    print("Please check your disk space, permissions, or ensure the embedding model loaded correctly.")
    exit()

print("\nPhase 1: Knowledge Ingestion complete. Vector store is ready for retrieval!")


Mounting Google Drive...
Google Drive already mounted.
Ensured RAG data directory exists: /content/drive/MyDrive/fpdata/rag_geetha_vahini
PDF file 'geetha_vahini.pdf' found. Proceeding.

Step 1: Loading PDF document from: /content/drive/MyDrive/fpdata/rag_geetha_vahini/geetha_vahini.pdf
Loaded 146 pages from the PDF.

Step 2: Splitting documents into chunks...
Created 508 text chunks.
Example chunk (first 200 chars): GEETHA VAHINI
(The Divine Gospel)
by
Bhagawan Sri Sathya Sai Baba
SRI SATHYA SAI BOOKS & PUBLICATIONS TRUST
Prasanthi Nilayam - 515 134
Anantapur District, Andhra Pradesh, India.
Grams: BOOK TRUST    ...
Source page for example chunk: 0

Step 3 & 4: Creating embeddings and building/loading vector store (this may take some time)...


  embeddings = SentenceTransformerEmbeddings(model_name=embedding_model_name)


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/52.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/743 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/133M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/366 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

Creating new Chroma DB and persisting to /content/drive/MyDrive/fpdata/rag_geetha_vahini/chroma_db...
New Chroma DB created and persisted.

Phase 1: Knowledge Ingestion complete. Vector store is ready for retrieval!


  vectorstore.persist() # Explicitly persist the database to disk


# Part 3: Fine tuned Llama adapters moved from previous to rag folder

In [4]:
import os
import shutil
from google.colab import drive

# --- Configuration ---
# Original path where your fine-tuned adapters were saved (from your fine-tuning step)
SOURCE_ADAPTERS_BASE_PATH = "/content/drive/MyDrive/fpdata/geetha_vahini"
FINE_TUNED_ADAPTERS_FOLDER_NAME = "llama3_8b_qa_finetuned_adapters_standard_hf"
SOURCE_ADAPTERS_PATH = os.path.join(SOURCE_ADAPTERS_BASE_PATH, FINE_TUNED_ADAPTERS_FOLDER_NAME)

# Destination path for the RAG project
DESTINATION_RAG_DATA_PATH = "/content/drive/MyDrive/fpdata/rag_geetha_vahini"
DESTINATION_ADAPTERS_PATH = os.path.join(DESTINATION_RAG_DATA_PATH, FINE_TUNED_ADAPTERS_FOLDER_NAME)


# --- Mount Google Drive (if not already mounted in this session) ---
print("Mounting Google Drive...")
if not os.path.exists('/content/drive/MyDrive'):
    try:
        drive.mount('/content/drive')
        print("Google Drive mounted successfully!")
    except Exception as e:
        print(f"Error mounting Google Drive: {e}")
        print("Please ensure you are running this in a Google Colab environment and authorize Drive access.")
        exit()
else:
    print("Google Drive already mounted.")

# --- Check if source adapters exist ---
print(f"\nChecking for source adapters at: {SOURCE_ADAPTERS_PATH}")
if not os.path.exists(SOURCE_ADAPTERS_PATH):
    print(f"Error: Source adapters directory not found at {SOURCE_ADAPTERS_PATH}.")
    print("Please ensure your fine-tuned adapters are located there.")
    exit()
else:
    print("Source adapters directory found.")

# --- Create destination directory if it doesn't exist ---
os.makedirs(DESTINATION_RAG_DATA_PATH, exist_ok=True)
print(f"Ensured destination RAG data directory exists: {DESTINATION_RAG_DATA_PATH}")

# --- Copy Adapters ---
print(f"\nAttempting to copy adapters from:\n  {SOURCE_ADAPTERS_PATH}\nTo:\n  {DESTINATION_ADAPTERS_PATH}")

try:
    # Remove destination if it already exists to avoid errors with shutil.copytree
    if os.path.exists(DESTINATION_ADAPTERS_PATH):
        print(f"Removing existing directory at {DESTINATION_ADAPTERS_PATH} to ensure a clean copy...")
        shutil.rmtree(DESTINATION_ADAPTERS_PATH)

    shutil.copytree(SOURCE_ADAPTERS_PATH, DESTINATION_ADAPTERS_PATH)
    print("\nSuccessfully copied fine-tuned LoRA adapters!")

    # Verify contents of the copied directory
    if os.path.exists(DESTINATION_ADAPTERS_PATH) and os.path.isdir(DESTINATION_ADAPTERS_PATH):
        copied_contents = os.listdir(DESTINATION_ADAPTERS_PATH)
        print(f"Contents of copied directory '{FINE_TUNED_ADAPTERS_FOLDER_NAME}': {copied_contents}")
        if "adapter_config.json" in copied_contents and "adapter_model.safetensors" in copied_contents:
            print("Verification: Essential adapter files found in the destination.")
        else:
            print("Warning: Essential adapter files (adapter_config.json or adapter_model.safetensors) not found after copy.")
    else:
        print("Verification: Destination directory not found after copy operation.")

except Exception as e:
    print(f"\nAn error occurred during copying: {e}")
    print("Please check file paths, permissions, and disk space.")

print("\nAdapter copying process complete.")


Mounting Google Drive...
Google Drive already mounted.

Checking for source adapters at: /content/drive/MyDrive/fpdata/geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf
Source adapters directory found.
Ensured destination RAG data directory exists: /content/drive/MyDrive/fpdata/rag_geetha_vahini

Attempting to copy adapters from:
  /content/drive/MyDrive/fpdata/geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf
To:
  /content/drive/MyDrive/fpdata/rag_geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf
Removing existing directory at /content/drive/MyDrive/fpdata/rag_geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf to ensure a clean copy...

Successfully copied fine-tuned LoRA adapters!
Contents of copied directory 'llama3_8b_qa_finetuned_adapters_standard_hf': ['adapter_model.safetensors', 'chat_template.jinja', 'special_tokens_map.json', 'tokenizer_config.json', 'adapter_config.json', 'README.md', 'tokenizer.json']
Verification: Essential adapter files fou

# Part 2: Phase 2

In [2]:
##Make sure model adapters and unit test file is accessble

import os
import torch
import json
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
from huggingface_hub import login
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma
from google.colab import drive

# --- Configuration ---
DATA_PATH = "/content/drive/MyDrive/fpdata/rag_geetha_vahini"
FINE_TUNED_ADAPTERS_FOLDER_NAME = "llama3_8b_qa_finetuned_adapters_standard_hf"
FINE_TUNED_ADAPTERS_PATH = os.path.join(DATA_PATH, FINE_TUNED_ADAPTERS_FOLDER_NAME)
MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"
CHROMA_DB_PATH = os.path.join(DATA_PATH, "chroma_db")

UNIT_TEST_FILE_NAME = "unit_test_passage_questions_clean.jsonl"
UNIT_TEST_FILE_PATH = os.path.join(DATA_PATH, UNIT_TEST_FILE_NAME)

# --- Mount Google Drive ---
print("Mounting Google Drive...")
if not os.path.exists('/content/drive/MyDrive'):
    try:
        drive.mount('/content/drive')
        print("Google Drive mounted successfully!")
    except Exception as e:
        print(f"Error mounting Google Drive: {e}")
        print("Please ensure you are running this in a Google Colab environment and authorize Drive access.")
        exit()
else:
    print("Google Drive already mounted.")

# --- Check for Unit Test File Existence ---
print(f"\nChecking for existence of Unit Test File: {UNIT_TEST_FILE_PATH}")
if os.path.exists(UNIT_TEST_FILE_PATH):
    print(f"SUCCESS: Unit test file '{UNIT_TEST_FILE_NAME}' found at {UNIT_TEST_FILE_PATH}.")
    try:
        with open(UNIT_TEST_FILE_PATH, 'r', encoding='utf-8') as f:
            first_line = f.readline()
            json.loads(first_line) # Try to parse the first line to check if it's valid JSON
        print("SUCCESS: Unit test file appears to be valid JSONL format.")
    except Exception as e:
        print(f"WARNING: Unit test file found, but could not parse as JSONL. Error: {e}")
else:
    print(f"ERROR: Unit test file '{UNIT_TEST_FILE_NAME}' NOT found at {UNIT_TEST_FILE_PATH}.")
    print("Please ensure it is uploaded to your RAG data directory.")


# --- Hugging Face Login ---
print("\nAttempting Hugging Face Hub login...")
try:
    login()
    print("SUCCESS: Hugging Face login successful!")
except Exception as e:
    print(f"ERROR: Hugging Face login failed: {e}")
    print("Please ensure you have accepted the Llama 3 license and pasted a valid token.")


# --- Check for Fine-tuned Adapters Existence ---
print(f"\nChecking for existence of Fine-tuned Adapters: {FINE_TUNED_ADAPTERS_PATH}")
if os.path.exists(os.path.join(FINE_TUNED_ADAPTERS_PATH, "adapter_config.json")) and \
   os.path.exists(os.path.join(FINE_TUNED_ADAPTERS_PATH, "adapter_model.safetensors")):
    print("SUCCESS: Fine-tuned adapter files found.")
else:
    print(f"ERROR: Fine-tuned adapter files NOT found in {FINE_TUNED_ADAPTERS_PATH}.")
    print("Please ensure they were copied/extracted correctly.")


# --- Attempt to Load Base Model ---
print(f"\nAttempting to load Base Model: {MODEL_NAME}...")
try:
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    base_model_loaded = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True,
    )
    tokenizer_loaded = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
    print("SUCCESS: Base model and tokenizer loaded.")
    # Clean up to free memory if this is just a check
    del base_model_loaded
    del tokenizer_loaded
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
except Exception as e:
    print(f"ERROR: Could not load Base Model. This might be due to HF login, network issues, or model availability. Error: {e}")


# --- Attempt to Load ChromaDB Vector Store ---
print(f"\nChecking for existence and attempting to load ChromaDB from: {CHROMA_DB_PATH}")
try:
    if os.path.exists(CHROMA_DB_PATH) and os.listdir(CHROMA_DB_PATH):
        # A lightweight embedding function is sufficient for just loading the DB structure
        temp_embeddings = SentenceTransformerEmbeddings(model_name="BAAI/bge-small-en-v1.5")
        vectorstore_loaded = Chroma(persist_directory=CHROMA_DB_PATH, embedding_function=temp_embeddings)
        # Try a dummy query to ensure it's functional
        count = vectorstore_loaded._collection.count()
        print(f"SUCCESS: ChromaDB found and loaded. Contains {count} documents.")
        # Clean up
        del vectorstore_loaded
    else:
        print(f"ERROR: ChromaDB directory not found or is empty at {CHROMA_DB_PATH}.")
        print("Please ensure Phase 1 (Knowledge Ingestion) was completed successfully.")
except Exception as e:
    print(f"ERROR: Could not load ChromaDB. Error: {e}")
    print("This might indicate corruption or an issue during its creation in Phase 1.")

print("\n--- All accessibility checks complete. ---")


Mounting Google Drive...
Google Drive already mounted.

Checking for existence of Unit Test File: /content/drive/MyDrive/fpdata/rag_geetha_vahini/unit_test_passage_questions_clean.jsonl
SUCCESS: Unit test file 'unit_test_passage_questions_clean.jsonl' found at /content/drive/MyDrive/fpdata/rag_geetha_vahini/unit_test_passage_questions_clean.jsonl.
SUCCESS: Unit test file appears to be valid JSONL format.

Attempting Hugging Face Hub login...


VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

SUCCESS: Hugging Face login successful!

Checking for existence of Fine-tuned Adapters: /content/drive/MyDrive/fpdata/rag_geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf
SUCCESS: Fine-tuned adapter files found.

Attempting to load Base Model: meta-llama/Meta-Llama-3-8B-Instruct...


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

SUCCESS: Base model and tokenizer loaded.

Checking for existence and attempting to load ChromaDB from: /content/drive/MyDrive/fpdata/rag_geetha_vahini/chroma_db


  temp_embeddings = SentenceTransformerEmbeddings(model_name="BAAI/bge-small-en-v1.5")
  vectorstore_loaded = Chroma(persist_directory=CHROMA_DB_PATH, embedding_function=temp_embeddings)


SUCCESS: ChromaDB found and loaded. Contains 508 documents.

--- All accessibility checks complete. ---


In [1]:
import os
import torch
import json
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
from huggingface_hub import login
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.documents import Document # Import Document type for clarity in type hints
from typing import Any, List, Mapping, Optional, Dict
from google.colab import drive

# --- Configuration ---
DATA_PATH = "/content/drive/MyDrive/fpdata/rag_geetha_vahini" # This is the RAG-specific data path (UPDATED)
FINE_TUNED_ADAPTERS_FOLDER_NAME = "llama3_8b_qa_finetuned_adapters_standard_hf"
FINE_TUNED_ADAPTERS_PATH = os.path.join(DATA_PATH, FINE_TUNED_ADAPTERS_FOLDER_NAME)
MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"
CHROMA_DB_PATH = os.path.join(DATA_PATH, "chroma_db") # Path to stored Chroma DB

# Input file for questions
UNIT_TEST_FILE_NAME = "unit_test_passage_questions_clean.jsonl"
UNIT_TEST_FILE_PATH = os.path.join(DATA_PATH, UNIT_TEST_FILE_NAME)

# Output file for generated answers
GENERATED_ANSWERS_OUTPUT_FILE = os.path.join(DATA_PATH, "geetha_vahini_rag_generated_answers.jsonl")


# --- Mount Google Drive (if not already mounted in this session) ---
print("Mounting Google Drive...")
if not os.path.exists('/content/drive/MyDrive'):
    try:
        drive.mount('/content/drive')
        print("Google Drive mounted successfully!")
    except Exception as e:
        print(f"Error mounting Google Drive: {e}")
        print("Please ensure you are running this in a Google Colab environment and authorize Drive access.")
        exit()
else:
    print("Google Drive already mounted.")

# --- Hugging Face Login (REQUIRED for Llama 3) ---
print("\nLogging into Hugging Face Hub...")
try:
    login()
    print("Hugging Face login successful!")
except Exception as e:
    print(f"Hugging Face login failed: {e}")
    print("Please ensure you have accepted the Llama 3 license and pasted a valid token.")
    exit()

# --- Load Base Model with 4-bit Quantization ---
print(f"\nLoading base model: {MODEL_NAME} with 4-bit quantization...")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left" # For inference

print("Base model and tokenizer loaded.")

# --- Load Fine-tuned LoRA Adapters ---
print(f"Loading LoRA adapters from: {FINE_TUNED_ADAPTERS_PATH}...")
try:
    # Check if adapter files exist before attempting to load
    if not os.path.exists(os.path.join(FINE_TUNED_ADAPTERS_PATH, "adapter_config.json")) or \
       not os.path.exists(os.path.join(FINE_TUNED_ADAPTERS_PATH, "adapter_model.safetensors")):
        raise FileNotFoundError(f"Adapter files not found in {FINE_TUNED_ADAPTERS_PATH}")

    model = PeftModel.from_pretrained(base_model, FINE_TUNED_ADAPTERS_PATH)
    print("LoRA adapters loaded.")
    # Merge adapters for inference
    print("Merging LoRA adapters into base model for inference...")
    model = model.merge_and_unload()
    print("Adapters merged.")
except Exception as e:
    print(f"Error loading or merging LoRA adapters: {e}")
    print("Please ensure the fine-tuned adapters folder exists and contains 'adapter_config.json' and 'adapter_model.safetensors'.")
    exit()

# Set model to evaluation mode
model.eval()

# --- Initialize Embeddings and Vector Store for Retriever ---
print("\nInitializing embedding model and loading vector store for retriever...")
try:
    embedding_model_name = "BAAI/bge-small-en-v1.5"
    embeddings = SentenceTransformerEmbeddings(model_name=embedding_model_name)

    # Check if Chroma DB directory exists and is not empty
    if not os.path.exists(CHROMA_DB_PATH) or not os.listdir(CHROMA_DB_PATH):
        raise FileNotFoundError(f"Chroma DB not found or is empty at {CHROMA_DB_PATH}. Please run Phase 1 first.")

    vectorstore = Chroma(persist_directory=CHROMA_DB_PATH, embedding_function=embeddings)
    retriever = vectorstore.as_retriever(search_kwargs={"k": 3}) # Retrieve top 3 relevant chunks
    print("Retriever initialized.")
except Exception as e:
    print(f"Error initializing retriever or loading vector store: {e}")
    print("Please ensure the Chroma DB was successfully created/persisted in Phase 1.")
    exit()

# --- Define RAG Prompt Template ---
# This prompt guides the LLM to use the retrieved context
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that answers questions based ONLY on the provided context. If the answer is not in the context, state that you don't know.\n\nContext: {context}"),
    ("user", "{input}")
])

# --- Standalone LLM Generation Function ---
def generate_llm_response_func(messages_list: List[Any]) -> str:
    """
    Generates a response from the Llama 3 model given a list of chat messages.
    Converts LangChain Message objects to dictionary format expected by tokenizer.apply_chat_template.
    """
    formatted_messages = []
    for msg in messages_list:
        # LangChain's message objects have a 'type' (e.g., 'system', 'human', 'ai') and 'content' attribute.
        if hasattr(msg, 'type') and hasattr(msg, 'content'):
            formatted_messages.append({"role": msg.type, "content": msg.content})
        elif isinstance(msg, dict) and "role" in msg and "content" in msg:
            # Fallback if it's already a dict in the correct format
            formatted_messages.append(msg)
        else:
            # This case should ideally not be hit if the chain is constructed correctly.
            # If it is, it means a non-message object or incorrectly formatted dict was passed.
            print(f"WARNING: Unexpected message format in generate_llm_response_func: {type(msg)} - {msg}")
            # Attempt to convert to a user message, but this indicates an upstream issue
            formatted_messages.append({"role": "user", "content": str(msg)})


    input_ids = tokenizer.apply_chat_template(
        formatted_messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            input_ids=input_ids,
            max_new_tokens=150,
            do_sample=True,
            top_p=0.9,
            temperature=0.7,
            pad_token_id=tokenizer.pad_token_id,
        )

    decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=False)

    assistant_start_tag = "<|start_header_id|>assistant<|end_header_id|>\n"
    start_index = decoded_output.find(assistant_start_tag)

    if start_index != -1:
        generated_answer = decoded_output[start_index + len(assistant_start_tag):].strip()
        generated_answer = generated_answer.replace("<|eot_id|>", "").strip()
    else:
        generated_answer = "Could not parse assistant's response."
    return generated_answer


# --- Construct the RAG Chain with explicit combine_documents_chain ---
# This chain will take {'context': List[Document], 'input': str} and produce the final answer string
combine_docs_and_generate_chain = (
    # Step 1: Format the context documents into a single string for the prompt
    # The 'context' here is a list of LangChain Document objects from the retriever.
    # We need to extract their page_content and join them.
    {
        "context": lambda x: "\n\n".join([doc.page_content for doc in x["context"]]),
        "input": lambda x: x["input"] # Pass the original input (question) through
    }
    # Step 2: Pass the formatted context and input to the ChatPromptTemplate to get List[BaseMessage]
    | prompt
    # Step 3: Pass the List[BaseMessage] to our custom generation function
    | RunnableLambda(generate_llm_response_func)
    # Step 4: Ensure the final output is a string
    | StrOutputParser()
)

# The full RAG chain combines the retriever with the combine_docs_and_generate_chain.
# create_retrieval_chain ensures the output is a dictionary with 'answer' and 'context'.
rag_chain = create_retrieval_chain(retriever, combine_docs_and_generate_chain)


# --- Load Questions from File and Perform Inference ---
print(f"\nLoading questions from: {UNIT_TEST_FILE_PATH}")
questions_data = []
if not os.path.exists(UNIT_TEST_FILE_PATH):
    print(f"Error: Unit test file not found at {UNIT_TEST_FILE_PATH}.")
    print("Please ensure 'unit_test_passage_questions_clean.jsonl' is in your RAG data directory.")
    exit()

try:
    with open(UNIT_TEST_FILE_PATH, 'r', encoding='utf-8') as f:
        for line in f:
            questions_data.append(json.loads(line))
    print(f"Loaded {len(questions_data)} questions from '{UNIT_TEST_FILE_NAME}'.")
except Exception as e:
    print(f"Error loading questions from {UNIT_TEST_FILE_PATH}: {e}")
    exit()

print(f"\nStarting RAG inference for {len(questions_data)} questions...")
generated_results = []

with open(GENERATED_ANSWERS_OUTPUT_FILE, 'w', encoding='utf-8') as f_out:
    for i, entry in enumerate(questions_data):
        doc_id = entry.get('id', f"unknown_id_{i+1}")
        question = entry.get('question', 'No question provided.')

        print(f"\n--- Processing Question {i+1} (ID: {doc_id}) ---")
        print(f"Question: {question}")

        try:
            # Clear CUDA cache before each RAG query
            if torch.cuda.is_available():
                torch.cuda.empty_cache()

            # Invoke the RAG chain with the user's query
            # rag_chain.invoke now correctly returns a dictionary with 'answer' and 'context'
            response = rag_chain.invoke({"input": question})

            generated_answer = response.get('answer', 'No answer generated.')
            retrieved_context = response.get('context', [])

            print(f"Generated Answer: {generated_answer}")
            print("\n--- Retrieved Context (First 200 chars of each chunk) ---")
            if retrieved_context:
                for j, doc in enumerate(retrieved_context):
                    print(f"Chunk {j+1} (Page {doc.metadata.get('page', 'N/A')}): {doc.page_content[:200]}...")
            else:
                print("No context retrieved.")
            print("----------------------------------------------------------\n")

            # Store results for output file
            generated_results.append({
                "id": doc_id,
                "question": question,
                "generated_answer": generated_answer
            })
            json.dump(generated_results[-1], f_out)
            f_out.write('\n')

        except Exception as e:
            print(f"An error occurred during RAG inference for ID {doc_id}: {e}")
            generated_results.append({
                "id": doc_id,
                "question": question,
                "generated_answer": f"ERROR: {e}"
            })
            json.dump(generated_results[-1], f_out)
            f_out.write('\n')

print(f"\nPhase 2: RAG inference complete. All generated answers saved to: {GENERATED_ANSWERS_OUTPUT_FILE}")


Mounting Google Drive...
Google Drive already mounted.

Logging into Hugging Face Hub...


VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

Hugging Face login successful!

Loading base model: meta-llama/Meta-Llama-3-8B-Instruct with 4-bit quantization...


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Base model and tokenizer loaded.
Loading LoRA adapters from: /content/drive/MyDrive/fpdata/rag_geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf...
LoRA adapters loaded.
Merging LoRA adapters into base model for inference...




Adapters merged.

Initializing embedding model and loading vector store for retriever...


  embeddings = SentenceTransformerEmbeddings(model_name=embedding_model_name)
  vectorstore = Chroma(persist_directory=CHROMA_DB_PATH, embedding_function=embeddings)


Retriever initialized.

Loading questions from: /content/drive/MyDrive/fpdata/rag_geetha_vahini/unit_test_passage_questions_clean.jsonl
Error loading questions from /content/drive/MyDrive/fpdata/rag_geetha_vahini/unit_test_passage_questions_clean.jsonl: Expecting value: line 2 column 1 (char 1)

Starting RAG inference for 23 questions...

--- Processing Question 1 (ID: 1) ---
Question: What do the terms Niraakaara, Para, and Parabrahmamreveal about the nature of the Eternal, and how does this contrast with physical identification?


The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Generated Answer: Based on the provided context, I'll do my best to answer your question.

--- Retrieved Context (First 200 chars of each chunk) ---
Chunk 1 (Page 64): these two and there is no Prapancha or Universe any more.
The Form is conceived and controlled by the Name. The
Roopa is dependent on the Name. So if you reason out
which is more lasting, you will fin...
Chunk 2 (Page 133): Geetha Vahini 262 Geetha Vahini
CHAPTER XXV
“
Krishna! You say that those who recognise the
world as mere world cannot claim to know the
Vedas. They must recognise it as God,
Paramatma. The world is a...
Chunk 3 (Page 89): quickly enlighten me about the seventh attribute of the
Saguna-nirakaara (God with qualities but without form).”
“Yes, the seventh is: Aadhithya-varnam, with the
Splendour of the Sun as His Complexion...
----------------------------------------------------------


--- Processing Question 2 (ID: 2) ---
Question: How does understanding the etymological roots of Brahmam and Purusha dee