# **WiseBot: Philosophical Inquiry Agent**

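This notebook implements WiseBot, an AI chatbot that offers philosophical and religious insights. It uses retrieval-augmented generation (RAG): verses from three holy texts (the Bhagavad Gita, the Quran, and the Bible) are searched semantically, and the most relevant ones are passed to a large language model (Google Gemini) that composes the answer.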

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
!pip install sentence_transformers
!pip install faiss-cpu
import os
import pickle
import re
import logging
import pandas as pd
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer
import google.generativeai as genai
from google.colab import userdata

Collecting faiss-cpu
  Downloading faiss_cpu-1.13.2-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (7.6 kB)
Downloading faiss_cpu-1.13.2-cp310-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (23.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.8/23.8 MB 62.2 MB/s eta 0:00:00
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.13.2



All support for the `google.generativeai` package has ended. It will no longer be receiving 
updates or bug fixes. Please switch to the `google.genai` package as soon as possible.
See README for more details:

https://github.com/google-gemini/deprecated-generative-ai-python/blob/main/README.md



## Pre-embedding and FAISS Index Setup
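The cell below embeds verses in batches of 32 and appends one metadata entry per verse in the same order, so that FAISS row `i` can later be mapped back to `metadata[i]`. A minimal sketch of that alignment, assuming a hypothetical `fake_encode` stand-in for `SentenceTransformer.encode` (so it runs without downloading `all-MiniLM-L6-v2`) and an illustrative `batched_embed` helper:

```python
# Hypothetical stand-in for SentenceTransformer.encode: maps each text to a
# tiny 2-d vector (length, number of spaces) so the sketch runs offline.
def fake_encode(texts):
    return [[float(len(t)), float(t.count(" "))] for t in texts]


def batched_embed(datasets, encode, batch_size=2):
    """Embed verses in fixed-size batches while keeping metadata row-aligned."""
    embeddings, metadata = [], []
    for book, records in datasets.items():
        for start in range(0, len(records), batch_size):
            batch = records[start:start + batch_size]
            embeddings.extend(encode([r["verse"] for r in batch]))
            # One metadata entry per embedding, appended in the same order,
            # so embeddings[i] always describes metadata[i].
            metadata.extend({"book": book, "verse": r["verse"]} for r in batch)
    return embeddings, metadata


datasets = {
    "Gita": [{"verse": "a b"}, {"verse": "c"}, {"verse": "d e f"}],
    "Quran": [{"verse": "g"}],
}
embeddings, metadata = batched_embed(datasets, fake_encode)
print(len(embeddings), len(metadata))  # 4 4
print(metadata[2])                     # {'book': 'Gita', 'verse': 'd e f'}
```

`SentenceTransformer.encode` already accepts a `batch_size` argument, so in the notebook the manual loop mainly serves to keep the metadata list in step with the embedding rows.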

In [None]:
DRIVE_PATH = '/content/drive/MyDrive/WiseBot'
print(f"Google Drive path set to: {DRIVE_PATH}")

def load_datasets():
    """Load datasets from CSV files."""
    gita_df = pd.read_csv(os.path.join(DRIVE_PATH, 'gita.csv'))
    quran_df = pd.read_csv(os.path.join(DRIVE_PATH, 'quran.csv'))
    bible_df = pd.read_csv(os.path.join(DRIVE_PATH, 'bible.csv'))
    return gita_df, quran_df, bible_df

def generate_embeddings(datasets, model, batch_size=32):
    """Generate embeddings for all verses using batch processing."""
    embeddings = []
    metadata = []

    for book, verses_list in datasets.items():
        for i in range(0, len(verses_list), batch_size):
            batch = verses_list[i : i + batch_size]
            verses = [entry["verse"] for entry in batch]
            batch_embeddings = model.encode(verses)
            embeddings.extend(batch_embeddings)
            metadata.extend([
                {
                    "book": book,
                    "verse": entry["verse"],
                }
                for entry in batch
            ])

    return np.asarray(embeddings, dtype=np.float32), metadata  # FAISS expects float32 vectors

def create_faiss_index(embeddings):
    """Create and return a FAISS index."""
    if embeddings.size == 0:
        return None

    embedding_dim = embeddings.shape[1]
    index = faiss.IndexFlatL2(embedding_dim)
    index.add(embeddings)
    return index

def generate_and_save_embeddings(datasets, model):
    """Generate and save embeddings and metadata."""
    embeddings, metadata = generate_embeddings(datasets, model)
    with open(os.path.join(DRIVE_PATH, 'embeddings.pkl'), 'wb') as f:
        pickle.dump(embeddings, f)
    with open(os.path.join(DRIVE_PATH, 'metadata.pkl'), 'wb') as f:
        pickle.dump(metadata, f)
    print("Embeddings and metadata saved.")
    return embeddings, metadata # Return embeddings for index creation

def create_and_save_index(embeddings):
    """Create and save FAISS index."""
    index = create_faiss_index(embeddings)
    with open(os.path.join(DRIVE_PATH, 'index.pkl'), 'wb') as f:
        pickle.dump(index, f)
    print("FAISS index saved.")

def pre_embed_and_save():
    """Pre-calculate embeddings and save to file, or load if already present."""
    os.makedirs(DRIVE_PATH, exist_ok=True) # Ensure the directory exists

    # Check if pre-existing files are available in DRIVE_PATH
    embeddings_path = os.path.join(DRIVE_PATH, 'embeddings.pkl')
    metadata_path = os.path.join(DRIVE_PATH, 'metadata.pkl')
    index_path = os.path.join(DRIVE_PATH, 'index.pkl')

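    if os.path.exists(embeddings_path) and os.path.exists(metadata_path) and os.path.exists(index_path):
        # The files are only confirmed to exist here; they are actually loaded
        # later by load_preembedded_data() in the configuration cell.
        print("Loading pre-existing embeddings, metadata, and FAISS index from Google Drive...")
        print("Pre-embedding files loaded/confirmed.")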
    else:
        print("Pre-embedding files not found in Google Drive. Generating new embeddings and FAISS index...")
        model = SentenceTransformer('all-MiniLM-L6-v2')
        gita_df, quran_df, bible_df = load_datasets()
        datasets = {
            "Gita": gita_df.to_dict('records'),
            "Quran": quran_df.to_dict('records'),
            "Bible": bible_df.to_dict('records')
        }
        embeddings, _ = generate_and_save_embeddings(datasets, model) # Get embeddings for index creation
        create_and_save_index(embeddings)
        print("Pre-embedding completed.")

if __name__ == "__main__":
    pre_embed_and_save()

Google Drive path set to: /content/drive/MyDrive/WiseBot
Loading pre-existing embeddings, metadata, and FAISS index from Google Drive...
Pre-embedding files loaded/confirmed.
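`faiss.IndexFlatL2` performs exact, brute-force nearest-neighbour search: `index.search(q, k)` compares the query with every stored vector and returns `(D, I)`, the squared L2 distances and row ids of the `k` closest ones. For small data the same semantics can be reproduced in NumPy, which can be handy for sanity-checking the index. The helper `flat_l2_search` below is hypothetical and not part of FAISS:

```python
import numpy as np


def flat_l2_search(database, queries, k):
    """Exact k-nearest-neighbour search by squared L2 distance.

    Returns (D, I) shaped (n_queries, k), nearest first, like
    faiss.IndexFlatL2.search. Unlike FAISS, which pads missing results
    with id -1, this sketch simply clamps k to the database size.
    """
    database = np.asarray(database, dtype=np.float32)
    queries = np.asarray(queries, dtype=np.float32)
    # Squared distance ||q - x||^2 for every (query, database row) pair.
    diffs = queries[:, None, :] - database[None, :, :]
    sq_dists = np.sum(diffs * diffs, axis=-1)
    k = min(k, database.shape[0])
    idx = np.argsort(sq_dists, axis=1, kind="stable")[:, :k]
    dist = np.take_along_axis(sq_dists, idx, axis=1)
    return dist, idx


db = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
D, I = flat_l2_search(db, [[0.9, 0.1]], k=2)
print(I)  # [[1 0]]
print(D)  # squared distances, roughly 0.02 and 0.82
```

The row ids in `I` are what the notebook looks up in `metadata_dict` to recover the book and verse text.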


## Global Configuration, Logging, and Data Initialization
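A logger's own level filters records before any handler sees them. With `logger.setLevel(logging.WARNING)`, every `logger.info(...)` call in this notebook is discarded and `app.log` receives only warnings and errors; `logging.INFO` or lower is needed for the info messages to be written. A small in-memory check, using an illustrative `ListHandler` class and `logged_messages` helper (neither is part of the notebook):

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted log messages in memory (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


def logged_messages(level):
    """Log one INFO and one WARNING record at the given logger level."""
    logger = logging.getLogger(f"wisebot_demo_{level}")
    logger.setLevel(level)
    logger.propagate = False  # keep the demo out of the root logger
    logger.handlers.clear()
    handler = ListHandler()
    logger.addHandler(handler)
    logger.info("Semantic search performed")
    logger.warning("Invalid book choice input detected")
    return handler.messages


print(logged_messages(logging.WARNING))  # ['Invalid book choice input detected']
print(logged_messages(logging.INFO))     # ['Semantic search performed', 'Invalid book choice input detected']
```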

In [None]:
DRIVE_PATH = '/content/drive/MyDrive/WiseBot'

# Ensure the DRIVE_PATH exists
os.makedirs(DRIVE_PATH, exist_ok=True)

# Define CACHE_FILE path
CACHE_FILE = os.path.join(DRIVE_PATH, 'cache.pkl')

# Configure logging
log_file_name = os.path.join(DRIVE_PATH, 'app.log')
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)  # INFO, so the logger.info(...) calls below are actually written

handler = logging.FileHandler(log_file_name, mode='a')
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)

# Clear existing handlers to prevent duplicate logs if cell is run multiple times
if logger.handlers:
    logger.handlers.clear()
logger.addHandler(handler)

logger.info("Application logging setup and initiated.")
handler.flush()

def load_cache():
    """Loads the cache from a pickle file."""
    if os.path.exists(CACHE_FILE):
        try:
            with open(CACHE_FILE, 'rb') as f:
                cache = pickle.load(f)
            logger.info(f"Cache loaded from {CACHE_FILE}")
            handler.flush()
            return cache
        except Exception as e:
            logger.error(f"Error loading cache from {CACHE_FILE}: {e}")
            handler.flush()
            return {}
    logger.info("Cache file not found. Starting with empty cache.")
    handler.flush()
    return {}

def save_cache(cache):
    """Saves the cache to a pickle file."""
    try:
        with open(CACHE_FILE, 'wb') as f:
            pickle.dump(cache, f)
        logger.info(f"Cache saved to {CACHE_FILE}")
        handler.flush()
    except Exception as e:
        logger.error(f"Error saving cache to {CACHE_FILE}: {e}")
        handler.flush()

def load_preembedded_data():
    """Load pre-embedded data from files."""
    global DRIVE_PATH
    try:
        with open(os.path.join(DRIVE_PATH, 'embeddings.pkl'), 'rb') as f:
            embeddings = pickle.load(f)
        with open(os.path.join(DRIVE_PATH, 'metadata.pkl'), 'rb') as f:
            metadata = pickle.load(f)
        with open(os.path.join(DRIVE_PATH, 'index.pkl'), 'rb') as f:
            index = pickle.load(f)
        logger.info("Pre-embedded data loaded successfully.")
        handler.flush()
        metadata_dict = {i: entry for i, entry in enumerate(metadata)}
        return embeddings, metadata_dict, index
    except Exception as e:
        logger.error(f"Error loading pre-embedded data: {e}")
        handler.flush()
        raise

# Initialize global variables
cache = load_cache()
embeddings, metadata_dict, index = load_preembedded_data()
model_encoder = SentenceTransformer('all-MiniLM-L6-v2')

# --- Gemini API Configuration ---
# Fetch API key from Colab Secrets
try:
    GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
    genai.configure(api_key=GOOGLE_API_KEY)
    gemini_model = genai.GenerativeModel('gemini-2.5-flash-lite') # Using a free, fast Gemini model

except Exception as e:
    print(f"Error configuring Gemini API. Make sure GOOGLE_API_KEY is set in Colab secrets: {e}")
    gemini_model = None # Set to None if API key isn't configured
# --- End Gemini API Configuration ---

# --- Out-of-Scope Keywords for API Quota Saving ---
OUT_OF_SCOPE_KEYWORDS = [
    "hello", "hlo", "hllo", "hii", "hi", "how are you", "what's up", "hey", # Greetings
    "time", "date", "weather", "news", "current events", # General knowledge/current affairs
    "what is the capital", "how many", # Factual/general questions
    "tell me about yourself", "your name", "who made you", # Questions about the bot itself
    "joke", "story", "fun fact", # Entertainment
    "exit", "quit", "bye", "goodbye" # Explicit exit commands (should already be handled but for redundancy)
]
# --- End Out-of-Scope Keywords ---

print("Setup complete: Logging configured, cache and pre-embedded data loaded, SentenceTransformer model initialized.")

Setup complete: Logging configured, cache and pre-embedded data loaded, SentenceTransformer model initialized.
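`OUT_OF_SCOPE_KEYWORDS` is used to reject questions before any API call. A plain substring test such as `keyword in question.lower()` lets short keywords fire inside unrelated words: `"hi"` occurs in "philosophy" and "this", and `"story"` in "history". A sketch comparing substring matching with word-boundary matching via `re.search(r"\b...\b")`, using a small hypothetical subset of the keyword list:

```python
import re

# Hypothetical subset of OUT_OF_SCOPE_KEYWORDS, for illustration only.
KEYWORDS = ["hi", "hey", "story", "time", "how are you"]


def substring_match(question, keywords):
    """Keywords found anywhere in the text, even inside longer words."""
    q = question.lower()
    return [k for k in keywords if k in q]


def word_match(question, keywords):
    """Keywords found only as whole words or phrases (\\b word boundaries)."""
    q = question.lower()
    return [k for k in keywords if re.search(rf"\b{re.escape(k)}\b", q)]


q = "What does philosophy say about the history of this world?"
print(substring_match(q, KEYWORDS))  # ['hi', 'story']
print(word_match(q, KEYWORDS))       # []
print(word_match("hi there", KEYWORDS))  # ['hi']
```

Word boundaries only remove accidental hits inside longer words; a genuinely philosophical question such as "what is time?" still matches the keyword `time`, so the list itself may need pruning.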


## Interactive Question Answering Logic
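The prompt in the cell below asks Gemini for one block per book in the form `According to ...:` / `Verse:` / `Meaning:` / `Example:`. A sketch of a parser that captures all four fields by name, assuming the model follows that format exactly (no code fences or extra lines around the blocks); `BLOCK_RE` and `parse_answer_blocks` are illustrative names, not part of the notebook:

```python
import re

# Assumes every block uses exactly the four labelled lines requested in the prompt.
BLOCK_RE = re.compile(
    r"^According to (?P<book>[^:\n]+):[ \t]*\n"
    r"Verse: (?P<verse>.*?)\n"
    r"Meaning: (?P<meaning>.*?)\n"
    r"Example: (?P<example>.*?)(?=\nAccording to |\Z)",
    re.DOTALL | re.MULTILINE,
)


def parse_answer_blocks(answer):
    """Return one dict per 'According to ...' block, or [] if none match."""
    return [
        {field: text.strip() for field, text in match.groupdict().items()}
        for match in BLOCK_RE.finditer(answer.strip())
    ]


sample = (
    "According to Gita:\nVerse: V1\nMeaning: M1\nExample: E1\n"
    "According to Quran:\nVerse: V2\nMeaning: M2\nExample: E2"
)
for block in parse_answer_blocks(sample):
    print(block["book"], "|", block["verse"], "|", block["example"])
# Gita | V1 | E1
# Quran | V2 | E2
```

If the list comes back empty, printing the raw answer, as `_parse_llm_response_and_print` does, remains a reasonable fallback.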

In [None]:
def get_llm_response(question, grouped_context_verses):
    """Gets a response from the Gemini API for a given question and context."""
    global gemini_model

    if gemini_model is None:
        return "Error: Gemini API model is not initialized. Please ensure GOOGLE_API_KEY is set and the model is initialized in the global setup."

    formatted_context_parts = []
    for book, verses in grouped_context_verses.items():
        if verses:
            formatted_context_parts.append(f"From {book}: {', '.join(verses)}")
        else:
            formatted_context_parts.append(f"From {book}: No specific verses found.")
    formatted_context = "\n".join(formatted_context_parts)

    prompt = f"""Your primary role is to act as a WiseBot, providing philosophical or religious insights based on holy texts.

--- INSTRUCTIONS ---

1.  **Scope Determination**:
    *   **IF** the user's question is a greeting, casual conversation (e.g., 'hello', 'how are you'), about current events, weather, time, general knowledge (e.g., 'who is the president?'), or any topic *not* related to philosophy, religion, spirituality, ethics, or the teachings in the holy books:
        *   **THEN**, you *must* respond with the exact phrase: "This question is out of scope for my purpose." and provide no further answer.
    *   **OTHERWISE** (if the question is within scope), proceed to the next step.

2.  **Answering based on Context**:
    *   The 'Context' below provides verses grouped by holy book. It will either show specific verses (e.g., "From Gita: verse1, verse2") or explicitly state "No specific verses found.".
    *   You *must* generate a *separate answer block* for *each* holy book mentioned in the provided 'Context', strictly following the format below.

    a.  **For Books with Specific Verses (e.g., "From Gita: verse1, verse2")**:
        *   Your answer for that specific book *must be based solely on the provided verses for that book*.
        *   From the verses provided for that book, select *only the single most relevant verse* to the user's question to include in your answer block.

    b.  **For Books with No Specific Verses (e.g., "From Quran: No specific verses found.")**:
        *   You should generate a *general philosophical answer* to the 'Question' *as it relates to the broader teachings or philosophical stance of that holy book*, even without specific verses directly from the context. This answer should reflect the general spirit of the book on the topic.

3.  **Strict Output Format for EACH Answer Block**:
    *   Each answer block *must* adhere strictly to the following 4-line format. No deviations, no extra lines, no missing lines:
        ```
        According to [Book Name]:
        Verse: [Either the single most relevant verse from the context OR a general philosophical statement if no verses were found]
        Meaning: [The philosophical meaning or interpretation, based on the provided context (if verses exist), or on the general teachings of the book (if no verses exist), explained in simple, understandable language.]
        Example: [A short, real-life example that clearly illustrates the meaning of the verse/philosophical statement.]
        ```

--- QUESTION AND CONTEXT ---
Question: {question}
Context from the selected holy books: {formatted_context}
"""

    try:
        logger.info(f"Attempting to call Gemini API with model 'gemini-2.5-flash-lite' for question: {question[:50]}...")
        handler.flush()
        response = gemini_model.generate_content(prompt)
        generated_content = response.text
        logger.info(f"Gemini API call successful.")
        handler.flush()
        return generated_content
    except Exception as e:
        logger.error(f"Error calling Gemini API for question: {question[:50]}... - {e}")
        handler.flush()
        return f"Error: Gemini API call failed: {e}"

# Refactored function to get only the user's question and handle immediate exit
def _get_user_question_and_handle_exit():
    """Get user's question and handle immediate exit commands."""
    question = input("You: ").strip()
    if question.lower() in ["exit", "quit", "bye"]:
        return None # Signal exit
    return question

# New function to get book choices and handle invalid input/exit
def _get_book_choices_and_handle_exit():
    """Get user's book choices, handling invalid input and exit commands."""
    book_map = {
        1: "Gita",
        2: "Quran",
        3: "Bible"
    }

    # The book selection menu is printed by ask_question() just before this
    # function is called, so only the input prompt is shown here.

    while True:
        choices_input = input("Enter your choices (e.g., 1,2 for Gita and Quran): ")

        if choices_input.strip().lower() in ["exit", "quit", "bye"]:
            return None # Signal exit

        selected_book_names = []
        has_error = False # Flag to track any input error

        try:
            raw_choices = choices_input.split(',')
            if not raw_choices or all(not c.strip() for c in raw_choices):
                has_error = True
            else:
                for choice_str in raw_choices:
                    choice_str = choice_str.strip()
                    if not choice_str:
                        continue # Skip empty strings from extra commas
                    choice = int(choice_str)
                    if choice in book_map:
                        selected_book_names.append(book_map[choice])
                    else:
                        has_error = True
                        break # Break and re-prompt if any choice is invalid

        except ValueError:
            has_error = True

        if has_error or not selected_book_names:
            # Print a single, consistent error message for all invalid cases.
            print("Invalid input format.")
            logger.warning(f"Invalid book choice input detected: {choices_input}")
            handler.flush()
            # The loop continues, prompting for input again.
        else:
            logger.info(f"User selected books: {selected_book_names}")
            handler.flush()
            return list(dict.fromkeys(selected_book_names)) # Remove duplicates

def _perform_semantic_search(question, model_encoder, index):
    """Performs semantic search using the query and FAISS index."""
    query_embedding = model_encoder.encode(question)
    D, I = index.search(query_embedding.reshape(1, -1), k=50)
    logger.info(f"Semantic search performed for question: {question[:50]}...")
    handler.flush()
    return D, I

def _group_context_verses(search_indices, selected_book_names, metadata_dict):
    """Groups relevant verses by book from search results."""
    grouped_context_verses = {book: [] for book in selected_book_names}
    verse_counts = {book: 0 for book in selected_book_names}

    for idx in search_indices:
        metadata_entry = metadata_dict.get(idx, {})
        if metadata_entry and metadata_entry.get('book') in selected_book_names:
            book = metadata_entry['book']
            verse = metadata_entry.get('verse', '')
            if verse and verse_counts[book] < 3:
                if verse not in grouped_context_verses[book]:
                    grouped_context_verses[book].append(verse)
                    verse_counts[book] += 1
    logger.info(f"Grouped context verses for selected books: {grouped_context_verses}")
    handler.flush()
    return grouped_context_verses

def _parse_llm_response_and_print(answer):
    """Parses the LLM response and prints it to the console."""
    if answer.strip().startswith("This question is out of scope"):
        print(answer)
        logger.info(f"LLM response: {answer.strip()}")
        handler.flush()
        return

    pattern = r"According to (.*?):\nVerse: (.*?)\nMeaning: (.*?)(?=(?:\nAccording to | *$))"
    matches = re.findall(pattern, answer, re.DOTALL)

    if matches:
        for match in matches:
            book_name = match[0].strip()
            verse = match[1].strip()
            meaning = match[2].strip()

            print(f"According to {book_name}:")
            print(f"Verse: {verse}")
            print(f"Meaning: {meaning}")
            logger.info(f"Printed answer for {book_name}")
            handler.flush()
    else:
        print(answer)
        logger.warning(f"LLM response did not match expected pattern: {answer[:100]}...")
        handler.flush()

def process_and_answer(question, selected_book_names, embeddings, metadata_dict, index, model_encoder, cache):
    """Process user input, perform search, and generate answer for all selected books."""
    cache_key = (question, tuple(sorted(selected_book_names)))
    if cache_key in cache:
        logger.info(f"Serving answer from cache for question: {question[:50]}...")
        handler.flush()
        answer = cache[cache_key]
    else:
        # Out-of-scope keyword filtering happens earlier, in ask_question().
        D, I = _perform_semantic_search(question, model_encoder, index)
        grouped_context_verses = _group_context_verses(I[0], selected_book_names, metadata_dict)
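        answer = get_llm_response(question, grouped_context_verses)
        # Only cache successful responses; caching an "Error: ..." string would
        # replay the failure for this question on every later run.
        if not answer.startswith("Error:"):
            cache[cache_key] = answer
            save_cache(cache)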

    _parse_llm_response_and_print(answer)

def ask_question(): # No parameters needed, relies on globals
    """Ask questions, relying on globally loaded data and model."""
    print("I am WiseBot, your philosophical companion.")
    print("Ask me any question and gain Wisdom!")

    while True:
        question = _get_user_question_and_handle_exit() # Get question first
        if question is None: # User exited from question prompt
            logger.info("User quit the application.")
            handler.flush()
            break

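        # Check for out-of-scope keywords immediately after getting the question.
        # Word boundaries (\b) stop short keywords such as "hi" from matching
        # inside longer words like "philosophy" or "this".
        question_lower = question.lower()
        is_out_of_scope_by_keywords = any(
            re.search(rf"\b{re.escape(keyword)}\b", question_lower)
            for keyword in OUT_OF_SCOPE_KEYWORDS
        )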

        if is_out_of_scope_by_keywords:
            print("This question is out of scope for my purpose.")
            logger.info(f"Question identified as out of scope by keywords: {question[:50]}...")
            handler.flush()
            continue # Loop back to ask a new question, bypassing book choice

        # Print book selection menu once a valid question is entered
        print("Choose books by entering numbers separated by commas (e.g., 1,2 for Gita and Quran):")
        print("1. Gita")
        print("2. Quran")
        print("3. Bible")

        # If question is in scope, proceed to get book choices
        selected_book_names = _get_book_choices_and_handle_exit()
        if selected_book_names is None: # User exited from book choice prompt
            logger.info("User quit the application during book choice.")
            handler.flush()
            break
        # _get_book_choices_and_handle_exit() re-prompts until it receives a valid,
        # non-empty selection, so selected_book_names is never empty at this point.

        process_and_answer(question, selected_book_names, embeddings, metadata_dict, index, model_encoder, cache)

if __name__ == "__main__":
    logger.info("Application started.")
    handler.flush()
    ask_question()
    logger.info("Application ended.")
    handler.flush()
    handler.close()


I am WiseBot, your philosophical companion.
Ask me any question and gain Wisdom!
You: what is pain
Choose books by entering numbers separated by commas (e.g., 1,2 for Gita and Quran):
1. Gita
2. Quran
3. Bible
Enter your choices (e.g., 1,2 for Gita and Quran): 1
According to Gita:
Verse: It is said the fruit of actions performed in the mode of passion result in pain, while those performed in the mode of ignorance result in darkness.
Meaning: Pain arises not just from external circumstances, but also from our own actions, particularly those driven by passion (intense desire and attachment) or ignorance (lack of wisdom and understanding). These actions lead to suffering.
Example: A student who cheats on an exam out of passion for a good grade, or out of ignorance about the value of learning, might face the pain of guilt, potential expulsion, or a lack of genuine knowledge if caught.
You: what is the meaning of life
Choose books by entering numbers separated by commas (e.g., 1,2 for Gita and Quran):
1. Gita
2. Quran
3. Bible
Invalid input format.
Enter your choices (e.g., 1,2 for Gita and Quran): 1,2,3
According to Gita:
Verse: I am the Supreme Goal of all living beings, and I am also their Sustainer, Master, Witness, Abode, Shelter, and Friend. I am the Origin, End, and Resting Place of creation; I am the Repository and Eternal Seed.
Meaning: Life's meaning is found in recognizing and connecting with the divine, which is the source and sustainer of all existence. The ultimate purpose is to realize this divine connection.
Example: A person dedicates their life to selfless service and spiritual practice, seeking to understand and align themselves with a higher purpose beyond their individual desires.
According to Quran:
Verse: And nothing is the life of this world but a play and a passing delight; and the life in the hereafter is by far the better for all who are conscious of God.
Meaning: The true meaning and purpose of life lies not in the temporary pleasures of this world, but in preparing for the eterna