# DocuAsk - Stage 2 Interview Task - RAG System

## Initialization & Configuration
Ensure the `ipykernel` package is installed so that Jupyter notebooks can run.

In [1]:
%pip install -r requirements.txt

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting sentence-transformers (from -r requirements.txt (line 3))
  Downloading sentence_transformers-4.1.0-py3-none-any.whl.metadata (13 kB)
Collecting llama_index (from -r requirements.txt (line 4))
  Downloading llama_index-0.12.44-py3-none-any.whl.metadata (12 kB)
Collecting llama-index-llms-google-genai (from -r requirements.txt (line 5))
  Downloading llama_index_llms_google_genai-0.2.2-py3-none-any.whl.metadata (3.0 kB)
Collecting google-generativeai (from -r requirements.txt (line 6))
  Downloading google_generativeai-0.8.5-py3-none-any.whl.metadata (3.9 kB)
Collecting llama-index-llms-ollama (from -r requirements.txt (line 7))
  Downloading llama_index_llms_ollama-0.6.2-py3-none-any.whl.metadata (3.6 kB)
Collecting llama-index-embeddings_huggingface (from -r requirements.txt (line 8))
  Downloading llama_index_embeddings_huggingface-0.5.5-py3-none-any.whl.metadata (458 bytes)
Collecting transformers<5.




In [2]:
# Allow nested asyncio event loops so async code can run inside Jupyter
import nest_asyncio
nest_asyncio.apply()

This exercise uses Gemini to perform RAG.

To opt for a locally-hosted LLM, uncomment the code for Ollama below and tweak the variables for the LLM Configuration.
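
As a rough illustration of what tweaking the variables might involve, the sketch below groups the settings per backend. The `LLMConfig` dataclass, the `pick_config` helper, and the Ollama model name `llama3.1` are illustrative assumptions and not part of this notebook; substitute whichever model is available locally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMConfig:
    backend: str   # "gemini" or "ollama"
    model: str     # model identifier passed to the LLM wrapper
    delay: float   # seconds to sleep between requests


def pick_config(use_local: bool) -> LLMConfig:
    """Return settings for a local Ollama model or the hosted Gemini model."""
    if use_local:
        # Hypothetical model name. A local server has no API rate limit,
        # so this sketch assumes no delay is needed between requests.
        return LLMConfig(backend="ollama", model="llama3.1", delay=0.0)
    # Values taken from the LLM Configuration cell of this notebook.
    return LLMConfig(backend="gemini", model="gemini-2.0-flash", delay=7.0)


config = pick_config(use_local=False)
print(config.backend, config.model, config.delay)
```

Keeping the model name and the delay together makes it harder to switch the backend while leaving a rate-limit delay that no longer applies.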

In [None]:
# LLM Configuration

llm = "gemini-2.0-flash" 
api_key = "<API_KEY>" # Gemini API Key
delay = 7  # Delay in seconds between requests to prevent rate limiting
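
Rather than pasting the key into the notebook, it could be read from the environment. The sketch below is one possible approach; the helper name `load_api_key` and the default variable name `GEMINI_API_KEY` are assumptions made for illustration.

```python
import os


def load_api_key(var_name: str = "GEMINI_API_KEY") -> str:
    """Read the API key from an environment variable, failing loudly if unset."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running.")
    return key


# Usage (uncomment once the variable is set in the shell or kernel environment):
# api_key = load_api_key()
```

Failing early with a clear message tends to be easier to debug than an authentication error raised on the first request.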

In [5]:
# Gemini

from llama_index.core import Settings
from llama_index.llms.google_genai import GoogleGenAI
from llama_index.embeddings.google_genai import GoogleGenAIEmbedding

# Configure the embedding model used to build and load the index
Settings.embed_model = GoogleGenAIEmbedding(api_key=api_key)
# Configure the LLM with the API key set in the LLM Configuration cell
Settings.llm = GoogleGenAI(model=llm, api_key=api_key, generate_kwargs={"max_output_tokens": 1}) # cap output at one token: the answer letter (A, B, C, or D)

In [6]:
# Ollama

# from llama_index.llms.ollama import Ollama
# from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Settings.llm = Ollama(model=llm, request_timeout=100000)
# Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

## Indexing Data

In [7]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

PDF_DIR = 'data/5_estate_planning/Lessons/'
INDEX_DIR = 'index'

# 1. Load all PDF files from the specified directory
try:
    documents = SimpleDirectoryReader(PDF_DIR).load_data()
    print(f"Loaded {len(documents)} documents from {PDF_DIR}")
except Exception as e:
    print(f"Error loading documents: {e}")
    documents = [] # Ensure documents is an empty list if loading fails

# 2. Create a VectorStoreIndex from the loaded documents
if documents:
    print("Creating VectorStoreIndex...")
    index = VectorStoreIndex.from_documents(documents)
    print("VectorStoreIndex created successfully.")

    # 3. Save the index to disk
    print(f"Saving index to disk in directory '{INDEX_DIR}'...")
    index.storage_context.persist(persist_dir=INDEX_DIR)
    print("Index saved successfully.")
else:
    print("No documents loaded, index creation skipped.")

Loaded 140 documents from data/5_estate_planning/Lessons/
Creating VectorStoreIndex...
VectorStoreIndex created successfully.
Saving index to disk in directory 'index'...
Index saved successfully.
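
Embedding 140 documents on every run can be slow and uses API quota, so it may be worth checking for a persisted index first. The helper below is a minimal sketch: `index_is_persisted` and `EXPECTED_FILES` are hypothetical names, and the file list is only what the load log later in this notebook shows, so other LlamaIndex versions or vector stores may persist different files.

```python
import os

# File names observed in this notebook's load log; treat them as an assumption.
EXPECTED_FILES = ("docstore.json", "index_store.json")


def index_is_persisted(index_dir: str) -> bool:
    """Return True if index_dir exists and contains the expected store files."""
    return os.path.isdir(index_dir) and all(
        os.path.isfile(os.path.join(index_dir, name)) for name in EXPECTED_FILES
    )


if index_is_persisted("index"):
    print("Persisted index found; the indexing step can be skipped.")
else:
    print("No persisted index found; run the indexing cell.")
```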


## RAG & Evaluation

In [9]:
import json
import os
import re
import time

from llama_index.core import StorageContext, load_index_from_storage
from llama_index.core.response import Response


# Define the directory where the index is saved
INDEX_DIR = 'index'

# Define the path to the questions JSON file
QUESTIONS_FILE = 'data/processed/5_estate_planning_questions.json'


# Load the index
print(f"Attempting to load index from {INDEX_DIR}")
if not os.path.exists(INDEX_DIR):
    # Fail early instead of silently reusing an 'index' left over from an earlier cell
    raise FileNotFoundError(f"Index directory '{INDEX_DIR}' not found. Run the indexing cell first.")
try:
    storage_context = StorageContext.from_defaults(persist_dir=INDEX_DIR)
    index = load_index_from_storage(storage_context)
    print(f"Successfully loaded index from {INDEX_DIR} using StorageContext")
except Exception as e:
    print(f"An unexpected error occurred while loading index: {e}")
    print("Could not load the index. Please ensure the index is created and saved correctly.")
    raise  # re-raise so the cell stops without shutting down the kernel


# Read the questions and expected answers from the JSON file
try:
    with open(QUESTIONS_FILE, 'r', encoding='utf-8') as f:
        questions_data = json.load(f)
    print(f"Successfully loaded questions from {QUESTIONS_FILE}")
except FileNotFoundError:
    print(f"Error: Questions file not found at {QUESTIONS_FILE}")
    raise  # re-raise so the cell stops without shutting down the kernel
except json.JSONDecodeError:
    print(f"Error: Could not decode JSON from {QUESTIONS_FILE}")
    raise

# At this point, 'index' holds the loaded index and 'questions_data' maps each
# chapter key to its list of questions and expected answers.
# Each question is now answered through a query engine built on the index.

query_engine = index.as_query_engine()

correct_answers = 0

for chapter_key, questions_list in questions_data.items():
    for question_entry in questions_list:

        question_text = question_entry['question']
        expected_answer = question_entry['answer']
        # print(f"Type of question_entry: {type(question_entry)}") # Keep this commented out or remove

        # Construct the query with explicit instructions
        query_text = f"""\
        You are an agent designed to answer a multiple choice question over a set of given documents.
        You must respond with ONLY the letter of the correct answer: A, B, C, or D.
        Do NOT include any explanation, reasoning, or extra text.
        If you include anything other than A, B, C, or D, your answer will be considered invalid. \n
        Question: {question_text}
        """
        # Get the response from the RAG system
        response: Response = query_engine.query(query_text)
        # Wait between requests to avoid hitting API rate limits
        time.sleep(delay)

        # Print details for analysis
        print("-" * 20)
        print(f"Question: {question_text}")
        print(f"Expected Answer: {expected_answer}")
        # print("Retrieved Context:")
        # for node in response.source_nodes:
        #     print(node.text)
        #     print("-" * 10)
        print(f"Raw LLM Response: {str(response)}")
        print("-" * 20) # Ensure this is consistent

        # Extract the predicted answer: a standalone letter A-D. The word boundaries
        # stop letters inside words (e.g. the "A" in "ANSWER") from matching.
        match = re.search(r'\b[A-D]\b', str(response).strip().upper())
        predicted_answer = match.group(0) if match else None

        # Compare predicted and expected answers
        if predicted_answer == expected_answer:
            correct_answers += 1

# Calculate and print accuracy
total_questions = sum(len(questions_list) for questions_list in questions_data.values())
accuracy = (correct_answers / total_questions) * 100 if total_questions > 0 else 0

print("\nValidation Results:")
print(f"Total Questions: {total_questions}")
print(f"Correct Answers: {correct_answers}")
print(f"Accuracy: {accuracy:.2f}%")
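
The scoring step depends on how the answer letter is pulled out of the raw reply. A bare character class such as `[A-D]` matches the first such letter anywhere in an upper-cased string, including inside a word. The sketch below takes a stricter line that follows the prompt's own rule: a reply counts only if it consists of the letter alone, optionally wrapped in parentheses or followed by a period or colon. The names `extract_choice` and `accuracy` are hypothetical helpers, not part of the notebook.

```python
import re
from typing import Optional, Sequence

# The whole reply must be one letter A-D, optionally as "(B)", "B." or "B:".
_CHOICE = re.compile(r"\(?([A-Da-d])\)?[.:]?")


def extract_choice(raw: str) -> Optional[str]:
    """Return the upper-cased answer letter, or None if the reply is invalid."""
    m = _CHOICE.fullmatch(raw.strip())
    return m.group(1).upper() if m else None


def accuracy(predicted: Sequence[Optional[str]], expected: Sequence[str]) -> float:
    """Percentage of questions whose predicted letter equals the expected one."""
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return 100.0 * correct / len(expected)


# Made-up replies for illustration only; these are not results from the notebook.
replies = ["B", "(c)", "The answer is A", ""]
expected = ["B", "C", "A", "D"]
predicted = [extract_choice(r) for r in replies]
print(predicted, f"{accuracy(predicted, expected):.2f}%")
```

Treating anything other than a lone letter as invalid undercounts replies that wrap the correct answer in extra words, but it avoids crediting a letter that happened to appear in unrelated text.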

Attempting to load index from index
Loading llama_index.core.storage.kvstore.simple_kvstore from index\docstore.json.
Loading llama_index.core.storage.kvstore.simple_kvstore from index\index_store.json.
Successfully loaded index from index using StorageContext
Successfully loaded questions from data/processed/5_estate_planning_questions.json
--------------------
Question: 1. In Estate Planning there are many views and approaches taken. Which of the following is not true? 

A. It is common for many Malaysians to avoid estate planning 

B. Writing a Will is adequate for estate planning purposes 

C. People generally tend to avoid estate planning because it is a complex subject. 

D. Preservation of the estate is a key objective in estate planning
Expected Answer: B
Raw LLM Response: B

--------------------
--------------------
Question: 2. The various steps in the process of estate planning ensure that all issues are covered. Which of the following sequence in the process is correct? 

A