## Installation
Install required packages before running the notebook.


In [None]:
!pip install -r requirements.txt



## Mr. HelpMate AI
This notebook consolidates the functionality of the Mr. HelpMate AI repository into a single interactive notebook.

## Imports and Dependencies
Import all necessary libraries and modules.


In [21]:
import json
import pickle
import re

import google.generativeai as genai
from chromadb import PersistentClient
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.documents import Document
from sentence_transformers import CrossEncoder, SentenceTransformer



## Search with Cache
Define a function that performs semantic search backed by an on-disk cache, plus a cross-encoder reranker for the retrieved chunks.


In [22]:


import os
import pickle

# Absolute path to the on-disk query cache; adjust for your environment
CACHE_FILE = "/home/mobiledairy/Mr.HelpMate/Mr.HelpMate-AI/cache/search_cache.pkl"

def search_with_cache(query, embedder, collection, top_k=5):
    cache = {}

    # Handle missing or corrupted cache file gracefully
    if os.path.exists(CACHE_FILE) and os.path.getsize(CACHE_FILE) > 0:
        try:
            with open(CACHE_FILE, "rb") as f:
                cache = pickle.load(f)
        except (EOFError, pickle.UnpicklingError):
            print("Warning: Cache file is empty or corrupted. Rebuilding cache.")
            cache = {}
    else:
        print("Cache file does not exist or is empty. Creating a new one.")

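    # Return from cache if available; the key includes top_k so a result
    # cached for one top_k is never returned for a different one
    key = (query, top_k)
    if key in cache:
        return cache[key]

    # Perform embedding and retrieval
    q_embed = embedder.encode(query).tolist()
    results = collection.query(query_embeddings=[q_embed], n_results=top_k)
    docs = results["documents"][0]

    # Update and save cache, creating the cache directory if it is missing
    cache[key] = docs
    os.makedirs(os.path.dirname(CACHE_FILE), exist_ok=True)
    with open(CACHE_FILE, "wb") as f:
        pickle.dump(cache, f)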

    return docs


def rerank(query, docs, top_n=3):
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    pairs = [[query, doc] for doc in docs]
    scores = reranker.predict(pairs)
    reranked = sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)
    return reranked[:top_n]
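
The cache file is rewritten in full on every miss, so an interrupted write can leave a truncated pickle behind (the case the `EOFError` handler in `search_with_cache` recovers from). A minimal sketch of one way to reduce that risk: write to a temporary file in the same directory, then swap it into place with `os.replace`. The helper names `load_cache` and `save_cache` are hypothetical and not part of the original notebook.

```python
import os
import pickle
import tempfile


def load_cache(path):
    """Return the cached dict, or an empty dict if the file is missing or unreadable."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except (FileNotFoundError, EOFError, pickle.UnpicklingError):
        return {}


def save_cache(cache, path):
    """Write the cache to a temporary file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(cache, f)
        # The rename replaces the old file in one step, so readers see
        # either the old cache or the new one, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With these helpers, the body of `search_with_cache` could call `load_cache(CACHE_FILE)` at the top and `save_cache(cache, CACHE_FILE)` after a miss.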


In [23]:

import google.generativeai as genai
from dotenv import load_dotenv
import os

# Load the .env file
load_dotenv()

api_key = os.getenv("GENERATIVE_AI_API_KEY")
if not api_key:
    raise ValueError("API key not found. Please ensure it is set in the .env file.")

genai.configure(api_key=api_key)

def generate_answer(query, top_chunks):
    context = "\n\n".join([chunk for chunk, _ in top_chunks])
    prompt = f"""
You are Mr. HelpMate AI, a smart assistant built to help people understand complex insurance policy documents. These documents are often long and full of hidden clauses that buyers usually miss.

Your job is to carefully read the provided context and answer the user’s specific question clearly and accurately.

You must:
- Provide a concise and clear answer to the question.
- Mention the section heading (e.g., "PART IV - BENEFITS, Article 2 - Death Benefits Payable") where the information was found.
- Include the page number(s) where this content appears in the policy.
- If the answer spans multiple sections, mention all relevant headings and pages.
- Use simple, professional language that helps the user understand their policy without legal jargon.

---

Context:
{context}

Question: {query}

---

Example 1:
Question: Does this policy provide coverage for death due to an accident?

Answer: Yes, the policy includes coverage for accidental death.

Found in:
Section: PART IV - BENEFITS, Section B - Member Accidental Death and Dismemberment Insurance  
Article: Article 3 - Benefits Payable  
Pages: 49–51

---

Example 2:
Question: Is there a suicide exclusion clause in this insurance policy?

Answer: Yes, the policy does not pay a benefit if the insured dies by suicide within the first two years.

Found in:
Section: PART V - GENERAL EXCLUSIONS  
Article: Article 1 - Suicide Exclusion  
Pages: 52

---

Example 3:
Question: Are pre-existing conditions covered in this plan?

Answer: No, the policy excludes coverage for pre-existing conditions during the first 12 months.

Found in:
Section: PART III - LIMITATIONS  
Article: Article 2 - Pre-existing Condition Limitation  
Pages: 35–36

---

Now, based on the given context and user question, follow the above format to respond appropriately.

"""
    model = genai.GenerativeModel("gemini-2.0-pro-exp-02-05")
    response = model.generate_content(prompt)
    return response.text
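
The prompt asks for section headings and page numbers, but `context` is built from chunk text alone, so the model can only cite what happens to appear inside a chunk. A minimal sketch of labeling each chunk with its source before joining, assuming the retrieval step were extended to return a metadata dict per chunk. The helper `format_context` and the metadata keys `section` and `page` are hypothetical; the notebook does not currently store them in the collection.

```python
def format_context(chunks):
    """Join chunks into one context string, prefixing each with a source label.

    Each item is a (text, metadata) pair. `section` and `page` are
    hypothetical metadata keys used only for this illustration.
    """
    blocks = []
    for text, metadata in chunks:
        label = metadata.get("section", "unknown section")
        if "page" in metadata:
            label += f", page {metadata['page']}"
        blocks.append(f"[Source: {label}]\n{text.strip()}")
    return "\n\n".join(blocks)


# Illustrative input shaped like the retrieved chunks shown in this notebook.
example = [
    (
        "a Grace Period of 31 days will be allowed for payment of premium.",
        {"section": "PART II - POLICY ADMINISTRATION, Section B - Premiums", "page": 1},
    ),
    ("Premium Rates Article 2", {"section": "TABLE OF CONTENTS"}),
]
print(format_context(example))
```

Putting the label in the context gives the model something concrete to quote, rather than relying on headers that may or may not fall inside a 500-character chunk.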


In [24]:

def clean_and_group_sections(text):
    # Group by parts like "PART I - DEFINITIONS", "PART IV - BENEFITS", etc.
    section_pattern = r"(PART [IVXL]+ - .+?)(?=PART [IVXL]+ -|\Z)"
    matches = re.findall(section_pattern, text, re.DOTALL)
    return matches if matches else [text]

def load_and_split(pdf_path):
    # Load the PDF passed by the caller rather than a hard-coded path
    loader = PyPDFLoader(pdf_path)
    docs = loader.load()

    # Merge all pages to capture full structured content
    full_text = "\n".join(doc.page_content for doc in docs)
    sections = clean_and_group_sections(full_text)

    # Wrap each section into a Document with metadata
    section_docs = [
        Document(page_content=section.strip(), metadata={"section": f"Part {i+1}"})
        for i, section in enumerate(sections)
    ]

    # Chunk semantically for RAG using LangChain's splitter
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    return splitter.split_documents(section_docs)
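
To see what `clean_and_group_sections` does with its pattern, the sketch below runs the same regular expression on a small made-up string (the sample text is a toy input, not taken from the policy). It shows two behaviors worth knowing: any text before the first `PART ... -` heading is dropped, and the Roman numeral can be read from the heading itself instead of being inferred from the match position.

```python
import re

# Same pattern as in clean_and_group_sections.
SECTION_PATTERN = r"(PART [IVXL]+ - .+?)(?=PART [IVXL]+ -|\Z)"

# Toy input for illustration only.
sample = (
    "Cover page text\n"
    "PART I - DEFINITIONS\nFirst section body.\n"
    "PART II - POLICY ADMINISTRATION\nSecond section body.\n"
)

sections = re.findall(SECTION_PATTERN, sample, re.DOTALL)

for section in sections:
    heading = section.splitlines()[0]
    # Read the numeral from the heading rather than from the match index.
    numeral = re.match(r"PART ([IVXL]+) - ", section).group(1)
    print(f"numeral={numeral!r} heading={heading!r} length={len(section)}")
```

Because the pattern matches every occurrence of a `PART ... -` heading, a heading that repeats (for example in a table of contents or a running page header) starts a new match each time, so the match index does not necessarily equal the part number.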


## ChromaDB Setup
Initialize ChromaDB and the sentence embedder, and upload chunks.


In [25]:

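def setup_chroma(chunks):
    client = PersistentClient(path="cache/chroma_db")
    collection = client.get_or_create_collection("insurance_docs")
    embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")

    # Embed the chunks with the same model used at query time; if only
    # documents are passed, Chroma embeds them with its own default
    # embedding function, which would not match the query embeddings
    texts = [chunk.page_content for chunk in chunks]
    embeddings = embedder.encode(texts).tolist()
    ids = [str(i) for i in range(len(texts))]

    # upsert overwrites existing ids, so re-running this cell is safe
    collection.upsert(documents=texts, embeddings=embeddings, ids=ids)

    return collection, embedder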
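
For larger documents, writes to the collection can be grouped into fixed-size batches so that each call stays a manageable size. A minimal sketch of the batching logic only, with no database calls; the helper `batched` and the batch size of 3 are hypothetical choices for illustration.

```python
def batched(items, batch_size):
    """Yield (start_index, batch) pairs covering `items` in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield start, items[start:start + batch_size]


texts = [f"chunk {i}" for i in range(7)]
for start, batch in batched(texts, batch_size=3):
    # Ids stay stable across batches because they derive from the global index.
    ids = [str(start + offset) for offset in range(len(batch))]
    # In the notebook, one collection write per batch would go here.
    print(ids, batch)
```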


## PDF Loading and Chunking
Load the PDF document and split it into manageable text chunks.


In [26]:
pdf_path = "/home/mobiledairy/Mr.HelpMate/Mr.HelpMate-AI/data/Principal-Sample-Life-Insurance-Policy.pdf"
chunks = load_and_split(pdf_path)

In [27]:
collection, embedder = setup_chroma(chunks)

## Execute Queries
Run each query: search, rerank, and generate answers.
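
The three query cells below repeat the same search, rerank, and generate steps. A minimal sketch of folding them into one helper: `run_query` is a hypothetical name, and the three stages are passed in as callables so the example runs with stand-in functions instead of the real models.

```python
def run_query(query, search, rerank, generate, top_n=3, preview_chars=300):
    """Run search -> rerank -> generate for one query and print a short report."""
    docs = search(query)
    top_chunks = rerank(query, docs)[:top_n]

    print("=" * 100)
    print(f"Query: {query}\n")
    for rank, (text, score) in enumerate(top_chunks, 1):
        print(f"[{rank}] Score: {score:.4f}\n{text[:preview_chars]}...\n")

    answer = generate(query, top_chunks)
    print(answer)
    return answer


# Stand-in stages so the sketch runs without models or an API key.
def fake_search(query):
    return ["alpha", "beta"]


def fake_rerank(query, docs):
    # Longer text scores higher; purely illustrative.
    ranked = sorted(docs, key=len, reverse=True)
    return [(doc, float(len(doc))) for doc in ranked]


def fake_generate(query, top_chunks):
    return f"{len(top_chunks)} chunks used"


run_query("What is the grace period for premium payment?",
          fake_search, fake_rerank, fake_generate)
```

In this notebook the real stages could be wired in as `lambda q: search_with_cache(q, embedder, collection)`, `rerank`, and `generate_answer`.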


In [32]:
# ---------------------------
# Query 1
# ---------------------------
query1 = "What is the grace period for premium payment?"
print("=" * 100)
print(f"Query 1: {query1}\n")

docs1 = search_with_cache(query1, embedder, collection)
reranked1 = rerank(query1, docs1)

print("🔍 Top 3 Retrieved Chunks for Query 1:\n")
for i, (text, score) in enumerate(reranked1[:3], 1):
    print(f"[{i}] Score: {score:.4f}\n{text[:300]}...\n")

print("Final Generated Answer for Query 1:\n")
response1 = generate_answer(query1, reranked1[:3])
print(response1)
print("=" * 100 + "\n\n")


Query 1: What is the grace period for premium payment?

🔍 Top 3 Retrieved Chunks for Query 1:

[1] Score: 8.5812
will be due on the first of each Insurance Month.  Except for the first premium, a Grace Period of 
31 days will be allowed for payment of premium.  "Grace Period" means the first 31- day period 
following a premium due date.  The Group Policy will remain in fo rce until the end of the Grace...

[2] Score: 2.5242
This policy has been updated effective  January 1, 2014 
 
 
PART II - POLICY ADMINISTRATION 
GC 6004 Section B - Premiums, Page 1  
 
Section B - Premiums 
 
 
Article 1 - Payment Responsibility; Due Dates; Grace Period...

[3] Score: 1.8549
Dependent Rights Article 9 
 
 Policy Interpretation Article 10 
 
 Electronic Transactions Article 11 
 
 
 Section B – Premium 
 
 
 Payment Responsibility; Due Dates; Grace Period Article 1 
 Premium Rates Article 2 
 Premium Rate Changes Article 3 
 Premium Amount Article 4...

Final Generated Answer for Query 1:

Answer: T

In [33]:
# ---------------------------
# Query 2
# ---------------------------
query2 = "Is suicide covered under the policy?"
print("=" * 100)
print(f"Query 2: {query2}\n")

docs2 = search_with_cache(query2, embedder, collection)
reranked2 = rerank(query2, docs2)

print("🔍 Top 3 Retrieved Chunks for Query 2:\n")
for i, (text, score) in enumerate(reranked2[:3], 1):
    print(f"[{i}] Score: {score:.4f}\n{text[:300]}...\n")

print("Final Generated Answer for Query 2:\n")
response2 = generate_answer(query2, reranked2[:3])
print(response2)
print("=" * 100 + "\n\n")


Query 2: Is suicide covered under the policy?

🔍 Top 3 Retrieved Chunks for Query 2:

[1] Score: -4.7528
This policy has been updated effective  January 1, 2014 
 
      PART IV - BENEFITS 
GC 6015  Section B - Member Accidental Death and 
Dismemberment Insurance, Page 6 
 
 
 
a. willful self-injury or self-destruction, while sane or insane; or...

[2] Score: -7.3011
This policy has been updated effective January 1, 2014 
 
 
 
GC 6001 TABLE OF CONTENTS, PAGE 2  
 
 
 
 Section A – Eligibility 
 
 
 Member Life Insurance Article 1 
 
 Member Accidental Death and Dismemberment Insurance Article 2 
 
 Dependent Life Insurance Article 3...

[3] Score: -7.3558
This policy has been updated effective  January 1, 2014 
 
 
PART IV - BENEFITS 
GC 6013  Section A - Member Life Insurance, Page 2  
 
Member's death, the Death B enefits Payable may be withheld until additional information has 
been received or the trial has been held....

Final Generated Answer for Query 2:

Based on the informat
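
The reranker scores printed for these queries are unbounded and can be negative (all three chunks for Query 2 score below zero), which is consistent with raw model outputs rather than probabilities. A minimal sketch, assuming the scores behave like logits (an assumption, not something the notebook states): a logistic function maps them into the range (0, 1), which makes it easier to flag weak matches before they reach the prompt. The `0.5` cutoff is a hypothetical value chosen for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def is_confident(score, cutoff=0.5):
    """Return True when the squashed score clears the (hypothetical) cutoff."""
    return sigmoid(score) >= cutoff


# Scores copied from the Query 1 and Query 2 outputs in this notebook.
for score in [8.5812, 2.5242, -4.7528]:
    print(f"{score:>8.4f} -> {sigmoid(score):.4f} confident={is_confident(score)}")
```

The mapping is monotonic, so it does not change the ranking; it only makes a fixed cutoff easier to reason about.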

In [34]:
# ---------------------------
# Query 3
# ---------------------------
query3 = "Can the policyholder surrender the policy early?"
print("=" * 100)
print(f"Query 3: {query3}\n")

docs3 = search_with_cache(query3, embedder, collection)
reranked3 = rerank(query3, docs3)

print("🔍 Top 3 Retrieved Chunks for Query 3:\n")
for i, (text, score) in enumerate(reranked3[:3], 1):
    print(f"[{i}] Score: {score:.4f}\n{text[:300]}...\n")

print("Final Generated Answer for Query 3:\n")
response3 = generate_answer(query3, reranked3[:3])
print(response3)
print("=" * 100 + "\n\n")


Query 3: Can the policyholder surrender the policy early?

🔍 Top 3 Retrieved Chunks for Query 3:

[1] Score: 1.2513
Policyholder relocates to a state where this Group Policy is not marketed, by giving the 
Policyholder 31 days advanced notice in Writing. 
 
 
Article 4 - Policyholder Responsibility to Members 
 
If this Group Policy terminates for any reason, the Policyholder must:...

[2] Score: -6.1964
extension. 
 
In actual practice, benefits under this Group Policy will be payable sooner, provided The 
Principal receives complete and proper proof of loss.  Further, if a claim is not payable or cannot 
be processed, The Principal will submit a detailed explanation of the basis for its denial....

[3] Score: -8.5765
by an officer of The Principal.  
 
The Principal reserves the right to change this Group Policy as follows: 
 
a. Any or all provisions of this Group Policy may be amended or changed at any time, 
including retroactive changes, to the extent necessary to meet the requir