
In [7]:
!pip install pdfplumber tiktoken openai chromadb sentence_transformers



In [8]:
import pdfplumber
from pathlib import Path
import pandas as pd
from operator import itemgetter
import json
import tiktoken
import openai
from sentence_transformers import CrossEncoder, util
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
import chromadb

In [9]:
pdf_path = '/content/Principal-Sample-Life-Insurance-Policy.pdf'
with pdfplumber.open(pdf_path) as pdf:
    # Inspect a single page (index 6, i.e. page 7) to see what the extractor returns
    single_page = pdf.pages[6]
    text = single_page.extract_text()
    tables = single_page.extract_tables()
    print(text)

Section A – Eligibility
Member Life Insurance Article 1
Member Accidental Death and Dismemberment Insurance Article 2
Dependent Life Insurance Article 3
Section B - Effective Dates
Member Life Insurance Article 1
Member Accidental Death and Dismemberment Insurance Article 2
Dependent Life Insurance Article 3
Section C - Individual Terminations
Member Life Insurance Article 1
Member Accidental Death and Dismemberment Insurance Article 2
Dependent Life Insurance Article 3
Termination for Fraud Article 4
Coverage While Outside of the United States Article 5
Section D - Continuation
Member Life Insurance Article 1
Dependent Insurance - Developmentally Disabled or
Physically Handicapped Children Article 2
Section E - Reinstatement
Reinstatement Article 1
Federal Required Family and Medical Leave Act (FMLA) Article 2
Reinstatement of Coverage for a Member or Dependent When
Coverage Ends due to Living Outside of the United States Article 3
Section F - Individual Purchase Rights
Member Life In

In [10]:
def check_bboxes(word, bbox):
    """
    Helper function to check if a word's bbox is within a table's bbox.
    """
    word_bbox = (word['x0'], word['top'], word['x1'], word['bottom'])
    return (word_bbox[0] >= bbox[0] and word_bbox[2] <= bbox[2] and
            word_bbox[1] >= bbox[1] and word_bbox[3] <= bbox[3])

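def extract_text_from_pdf(pdf_path):
    """
    Extract each page's text (tables serialized as JSON strings) and return a
    DataFrame with one row per page: Page_Number, Heading, Text.
    """
    full_text = []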
    with pdfplumber.open(pdf_path) as pdf:
        for p, page in enumerate(pdf.pages, start=1):
            page_no = f"Page {p}"
            text = page.extract_text()

            # Extract heading (if text exists)
            heading = text.split('\n')[0].strip() if text else None

            tables = page.find_tables()
            table_bboxes = [i.bbox for i in tables]
            tables = [{'table': i.extract(), 'top': i.bbox[1]} for i in tables]
            non_table_words = [word for word in page.extract_words() if not any(
                check_bboxes(word, table_bbox) for table_bbox in table_bboxes)]
            lines = []

            for cluster in pdfplumber.utils.cluster_objects(non_table_words + tables, itemgetter('top'), tolerance=5):
                if 'text' in cluster[0]:
                    try:
                        lines.append(' '.join([i['text'] for i in cluster]))
                    except KeyError:
                        pass
                elif 'table' in cluster[0]:
                    lines.append(json.dumps(cluster[0]['table']))

            full_text.append([page_no, heading, " ".join(lines)])

    # Convert the extracted data to a DataFrame
    df = pd.DataFrame(full_text, columns=['Page_Number', 'Heading', 'Text'])

    return df
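
The table filtering in `extract_text_from_pdf` relies on `check_bboxes` treating a word as table content only when its box lies entirely inside the table's box. A minimal sketch of that behaviour follows; the coordinates and words are hypothetical and chosen only for illustration.

```python
def check_bboxes(word, bbox):
    """Return True if the word's box lies entirely inside bbox (x0, top, x1, bottom)."""
    word_bbox = (word['x0'], word['top'], word['x1'], word['bottom'])
    return (word_bbox[0] >= bbox[0] and word_bbox[2] <= bbox[2] and
            word_bbox[1] >= bbox[1] and word_bbox[3] <= bbox[3])

# Hypothetical table box and words (same key names pdfplumber uses for words)
table_bbox = (50, 100, 300, 200)
inside_word = {'text': 'Benefit', 'x0': 60, 'top': 110, 'x1': 100, 'bottom': 122}
straddling_word = {'text': 'Article', 'x0': 40, 'top': 110, 'x1': 80, 'bottom': 122}
above_word = {'text': 'Heading', 'x0': 60, 'top': 80, 'x1': 110, 'bottom': 92}

words = [inside_word, straddling_word, above_word]

# Same filter as in extract_text_from_pdf: keep words that are not inside any table
non_table_words = [w for w in words if not check_bboxes(w, table_bbox)]
print([w['text'] for w in non_table_words])
```

A word that only partly overlaps the table (the straddling one here) is kept as ordinary text, because the containment test requires all four edges to fall inside the table box.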

In [11]:
df = extract_text_from_pdf(pdf_path)

In [12]:
df.head(10)

Unnamed: 0,Page_Number,Heading,Text
0,Page 1,DOROTHEA GLAUSE S655,DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/...
1,Page 2,This page left blank intentionally,This page left blank intentionally
2,Page 3,POLICY RIDER,POLICY RIDER GROUP INSURANCE POLICY NO: S655 C...
3,Page 4,This page left blank intentionally,This page left blank intentionally
4,Page 5,PRINCIPAL LIFE INSURANCE COMPANY,PRINCIPAL LIFE INSURANCE COMPANY (called The P...
5,Page 6,TABLE OF CONTENTS,TABLE OF CONTENTS PART I - DEFINITIONS PART II...
6,Page 7,Section A – Eligibility,Section A – Eligibility Member Life Insurance ...
7,Page 8,Section A - Member Life Insurance,Section A - Member Life Insurance Schedule of ...
8,Page 9,P ART I - DEFINITIONS,P ART I - DEFINITIONS When used in this Group ...
9,Page 10,T he legally recognized union of two eligible ...,T he legally recognized union of two eligible ...


In [13]:
df['Text'][9]

'T he legally recognized union of two eligible individuals of the same sex established according to law. Civil Union Partner For two persons to establish a Civil Union in Rhode Island, it shall be necessary that they satisfy all of the following criteria: a. not be a party to another Civil Union or marriage in Rhode Island; b. be of the same sex and therefore be excluded from the marriage laws of Rhode Island or any other state; c. be at least 18 years of age; d. not be related to the other proposed party to the Civil Union. NOTE: For the purposes of this Group Policy, the term "spouse" will include Civil Union Partner, except as otherwise provided in this Group Policy. Date of Issue The date this Group Policy is placed in force: November 1, 2007. Dependent a. A Member\'s spouse, if that spouse: (1) is legally married to the Member; and (2) is not in the Armed Forces of any country; and (3) is not insured under this Group Policy as a Member. A Member\'s spouse will also include a Civil

In [14]:
df['Text_Length'] = df['Text'].apply(lambda x: len(x.split(' ')))
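
`Text_Length` counts the pieces produced by `split(' ')`, which is not quite the same as the whitespace-based `split()` used later when chunking. A small illustration on a hypothetical string shows where the two counts diverge:

```python
# Hypothetical text with a double space, a newline, and a trailing space
sample = "Member  Life Insurance\nArticle 1 "

by_space = sample.split(' ')      # splits on every single space, keeps empty strings
by_whitespace = sample.split()    # splits on any run of whitespace, drops empty strings

print(len(by_space), by_space)
print(len(by_whitespace), by_whitespace)
```

For the page-level filter this difference is small, but it means `Text_Length` is an approximate word count rather than an exact one.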

In [15]:
df.head(10)

Unnamed: 0,Page_Number,Heading,Text,Text_Length
0,Page 1,DOROTHEA GLAUSE S655,DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/...,30
1,Page 2,This page left blank intentionally,This page left blank intentionally,5
2,Page 3,POLICY RIDER,POLICY RIDER GROUP INSURANCE POLICY NO: S655 C...,230
3,Page 4,This page left blank intentionally,This page left blank intentionally,5
4,Page 5,PRINCIPAL LIFE INSURANCE COMPANY,PRINCIPAL LIFE INSURANCE COMPANY (called The P...,110
5,Page 6,TABLE OF CONTENTS,TABLE OF CONTENTS PART I - DEFINITIONS PART II...,153
6,Page 7,Section A – Eligibility,Section A – Eligibility Member Life Insurance ...,176
7,Page 8,Section A - Member Life Insurance,Section A - Member Life Insurance Schedule of ...,171
8,Page 9,P ART I - DEFINITIONS,P ART I - DEFINITIONS When used in this Group ...,387
9,Page 10,T he legally recognized union of two eligible ...,T he legally recognized union of two eligible ...,251


In [16]:
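# Keep pages with at least 10 words; copy so later column assignments modify an independent DataFrame
df = df.loc[df['Text_Length'] >= 10].copy()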

In [17]:
df.head(10)

Unnamed: 0,Page_Number,Heading,Text,Text_Length
0,Page 1,DOROTHEA GLAUSE S655,DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/...,30
2,Page 3,POLICY RIDER,POLICY RIDER GROUP INSURANCE POLICY NO: S655 C...,230
4,Page 5,PRINCIPAL LIFE INSURANCE COMPANY,PRINCIPAL LIFE INSURANCE COMPANY (called The P...,110
5,Page 6,TABLE OF CONTENTS,TABLE OF CONTENTS PART I - DEFINITIONS PART II...,153
6,Page 7,Section A – Eligibility,Section A – Eligibility Member Life Insurance ...,176
7,Page 8,Section A - Member Life Insurance,Section A - Member Life Insurance Schedule of ...,171
8,Page 9,P ART I - DEFINITIONS,P ART I - DEFINITIONS When used in this Group ...,387
9,Page 10,T he legally recognized union of two eligible ...,T he legally recognized union of two eligible ...,251
10,Page 11,(2) has been placed with the Member or spouse ...,(2) has been placed with the Member or spouse ...,299
11,Page 12,An institution that is licensed as a Hospital ...,An institution that is licensed as a Hospital ...,352


In [19]:
df['Metadata'] = df.apply(
    lambda x: {
        'Section': (x['Heading'][:20] if x['Heading'] else ''),
        'Page_No.': x['Page_Number']
    },
    axis=1
)
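
The metadata built above keeps only the first 20 characters of each heading, which is why `Section` values appear cut off in later outputs. A minimal sketch of the same rule as a standalone function; `build_metadata` and `max_len` are hypothetical names introduced here for illustration.

```python
def build_metadata(heading, page_number, max_len=20):
    """Mirror the lambda above: truncate the heading and fall back to '' when it is missing."""
    return {
        'Section': heading[:max_len] if heading else '',
        'Page_No.': page_number,
    }

print(build_metadata('Section A – Eligibility', 'Page 7'))
print(build_metadata(None, 'Page 2'))
```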

In [20]:
def chunk_text(text, chunk_size=300, overlap_size=50):
    # Split the text into individual words
    words = text.split()
    chunks = []

    # Iterate over the words to create chunks with overlap
    for i in range(0, len(words), chunk_size - overlap_size):
        # Create a chunk from the current position
        chunk = " ".join(words[i:i + chunk_size])
        chunks.append(chunk)

    return chunks
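
With `chunk_size=300` and `overlap_size=50`, the window advances 250 words at a time, so a page of 251 words produces a second chunk containing a single word. The sketch below reproduces that arithmetic on synthetic words and shows one possible variant that stops once a chunk reaches the end of the text. The `drop_covered_tail` flag is a hypothetical addition, not something the notebook uses.

```python
def chunk_words(words, chunk_size=300, overlap_size=50, drop_covered_tail=False):
    """Sliding-window chunking over a list of words.

    drop_covered_tail (hypothetical option): stop as soon as a chunk reaches the
    end of the text, so no trailing chunk made only of overlap words is emitted.
    """
    step = chunk_size - overlap_size
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if drop_covered_tail and start + chunk_size >= len(words):
            break
    return chunks

# Synthetic 251-word page
words = [f"w{i}" for i in range(251)]

default_lengths = [len(c) for c in chunk_words(words)]
trimmed_lengths = [len(c) for c in chunk_words(words, drop_covered_tail=True)]

print(default_lengths)   # window starts at 0 and 250
print(trimmed_lengths)   # second window skipped: its words are already in the first chunk
```

Whether to drop such tails is a design choice: they add near-empty documents to the index, but dropping them changes chunk counts and IDs.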


In [21]:
df['Chunks'] = df['Text'].apply(lambda x: chunk_text(x))

# Flatten the DataFrame to have one row per chunk
chunked_df = df.explode('Chunks').reset_index(drop=True)

# Add an identifier to each chunk to keep track of the page and chunk number
chunked_df['Chunk_ID'] = chunked_df.index + 1

In [22]:
chunked_df.head(20)

Unnamed: 0,Page_Number,Heading,Text,Text_Length,Metadata,Chunks,Chunk_ID
0,Page 1,DOROTHEA GLAUSE S655,DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/...,30,"{'Section': 'DOROTHEA GLAUSE S655', 'Page_No.'...",DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/...,1
1,Page 3,POLICY RIDER,POLICY RIDER GROUP INSURANCE POLICY NO: S655 C...,230,"{'Section': 'POLICY RIDER', 'Page_No.': 'Page 3'}",POLICY RIDER GROUP INSURANCE POLICY NO: S655 C...,2
2,Page 5,PRINCIPAL LIFE INSURANCE COMPANY,PRINCIPAL LIFE INSURANCE COMPANY (called The P...,110,"{'Section': 'PRINCIPAL LIFE INSUR', 'Page_No.'...",PRINCIPAL LIFE INSURANCE COMPANY (called The P...,3
3,Page 6,TABLE OF CONTENTS,TABLE OF CONTENTS PART I - DEFINITIONS PART II...,153,"{'Section': 'TABLE OF CONTENTS', 'Page_No.': '...",TABLE OF CONTENTS PART I - DEFINITIONS PART II...,4
4,Page 7,Section A – Eligibility,Section A – Eligibility Member Life Insurance ...,176,"{'Section': 'Section A – Eligibil', 'Page_No.'...",Section A – Eligibility Member Life Insurance ...,5
5,Page 8,Section A - Member Life Insurance,Section A - Member Life Insurance Schedule of ...,171,"{'Section': 'Section A - Member L', 'Page_No.'...",Section A - Member Life Insurance Schedule of ...,6
6,Page 9,P ART I - DEFINITIONS,P ART I - DEFINITIONS When used in this Group ...,387,"{'Section': 'P ART I - DEFINITION', 'Page_No.'...",P ART I - DEFINITIONS When used in this Group ...,7
7,Page 9,P ART I - DEFINITIONS,P ART I - DEFINITIONS When used in this Group ...,387,"{'Section': 'P ART I - DEFINITION', 'Page_No.'...",f. Continence - the ability to voluntarily con...,8
8,Page 10,T he legally recognized union of two eligible ...,T he legally recognized union of two eligible ...,251,"{'Section': 'T he legally recogni', 'Page_No.'...",T he legally recognized union of two eligible ...,9
9,Page 10,T he legally recognized union of two eligible ...,T he legally recognized union of two eligible ...,251,"{'Section': 'T he legally recogni', 'Page_No.'...",2,10


In [23]:
# chunked_df['Metadata'] = chunked_df.apply(
#     lambda x: {
#         'Section': (x['Heading'][:20] if x['Heading'] else ''),
#         'Page_No.': x['Page_Number'],
#         'Chunk_ID':x['Chunk_ID']
#     },
#     axis=1
# )

In [24]:
filepath = "/content/OPEN_AI_KEY.txt"

# Read the key and strip surrounding whitespace (a trailing newline would corrupt the key)
with open(filepath, "r") as f:
    openai.api_key = f.read().strip()
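
Reading the key from a text file works in Colab, but an environment variable is a common alternative that avoids keeping the key on disk. A minimal sketch, assuming the variable is named `OPENAI_API_KEY`; `load_api_key` and the demo file are hypothetical and only for illustration.

```python
import os
import tempfile
from pathlib import Path

def load_api_key(env_var="OPENAI_API_KEY", key_file=None):
    """Return the key from the environment if set, otherwise from key_file."""
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    if key_file is not None and Path(key_file).is_file():
        # strip() removes the trailing newline most editors add
        return Path(key_file).read_text().strip()
    raise RuntimeError(f"No API key found in ${env_var} or in {key_file!r}")

# Demo with a throwaway file and a placeholder value (not a real key)
with tempfile.TemporaryDirectory() as tmp:
    key_path = Path(tmp) / "key.txt"
    key_path.write_text("sk-example-not-a-real-key\n")
    print(load_api_key(env_var="HELPMATE_DEMO_UNSET", key_file=key_path))
```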

In [25]:
chroma_data_path = '/kaggle/working/'
client = chromadb.PersistentClient(path=chroma_data_path)

In [26]:
embedding_function = OpenAIEmbeddingFunction(api_key=openai.api_key, model_name='text-embedding-ada-002')
insurance_collection = client.get_or_create_collection(name='InsurancePolicyDoc', embedding_function=embedding_function)

In [27]:
documents_list = chunked_df["Chunks"].tolist()
metadata_list = chunked_df['Metadata'].tolist()

In [28]:
insurance_collection.add(
    documents=documents_list,
    ids=[str(i) for i in range(len(documents_list))],
    metadatas=metadata_list
)

In [29]:
cache_collection = client.get_or_create_collection(name='Insurance_Cache', embedding_function=embedding_function)
cache_collection.peek()

{'ids': [],
 'embeddings': array([], dtype=float64),
 'documents': [],
 'uris': None,
 'included': ['metadatas', 'documents', 'embeddings'],
 'data': None,
 'metadatas': []}

In [30]:
query_1=input()

 what types of coverage does this policy include?
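
The typed query keeps its leading space, and the later cells use the raw query string as the cache entry's ID. One possible alternative is to derive the ID from a normalized form of the query, so that differences in case or spacing map to the same entry. This is a sketch only; `cache_id` is a hypothetical helper and the notebook itself does not normalize queries.

```python
import hashlib

def cache_id(query):
    """Build a stable ID from the query: lowercase, collapse whitespace, then hash."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

raw = " what types of coverage does this policy include?"
variant = "What types of coverage does this policy include?  "

# Both spellings of the question map to the same ID
print(cache_id(raw) == cache_id(variant))
```

This only affects the stored ID; whether two differently worded questions count as the same is still decided by embedding distance.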


In [32]:
# Query the cache collection to check if the results are already stored
cache_results = cache_collection.query(
    query_texts=query_1,
    n_results=1
)

# Print the results from the cache query for debugging
print(cache_results)

# Query the main collection for the top 10 results (this exploratory cell always runs the query; the distance check is added in the following cells)
results = insurance_collection.query(
    query_texts=query_1,
    n_results=10
)

{'ids': [[]], 'embeddings': None, 'documents': [[]], 'uris': None, 'included': ['metadatas', 'documents', 'distances'], 'data': None, 'metadatas': [[]], 'distances': [[]]}


In [33]:
import json

threshold = 0.2

results_df_1 = pd.DataFrame()

# Query the cache collection to check if the results are already stored
cache_results = cache_collection.query(
    query_texts=query_1,
    n_results=1
)

# Print the results from the cache query for debugging
print(cache_results)

# Check if the cache is empty or if the distance exceeds the threshold
if not cache_results['distances'][0] or cache_results['distances'][0][0] > threshold:
    # Query the main collection for the top 10 results
    results = insurance_collection.query(
        query_texts=query_1,
        n_results=10
    )

    # Prepare keys and values for storing in cache
    cache_data = {}
    for key, val in results.items():
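        # Skip absent fields and 'included', which is a flat list of field names rather than per-result data
        if val is None or key == 'included':
            continue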
        # Adjust the loop to match the actual number of items in val
        for i in range(min(len(val[0]), 10)):  # Ensure you only loop over existing items
            cache_data[f"{key}_{i}"] = val[0][i]

    # Flatten the metadata for storage in ChromaDB
    flat_cache_data = {}
    for k, v in cache_data.items():
        if isinstance(v, dict):
            # Convert the dictionary to a JSON string
            flat_cache_data[k] = json.dumps(v)
        else:
            flat_cache_data[k] = v

    # Store the query in cache
    cache_collection.add(
        documents=[query_1],
        ids=[query_1],  # Alternatively, you can use a unique ID
        metadatas=flat_cache_data
    )

    print("Not found in cache. Found in main collection.")

    # Convert the results to a DataFrame
    result_dict = {
        'Metadatas_1': results['metadatas'][0],
        'Documents_1': results['documents'][0],
        'Distances_1': results['distances'][0],
        'IDs': results['ids'][0]
    }
    results_df_1 = pd.DataFrame.from_dict(result_dict)

# If the distance is within the threshold, retrieve results from the cache
elif cache_results['distances'][0][0] <= threshold:
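    # Extract data from the cache
    cache_result_dict = cache_results['metadatas'][0][0]

    # Rebuild each column by result index so that IDs, documents, distances,
    # and metadata stay row-aligned (the stored dict's key order is not the result order)
    n_cached = sum(key.startswith('ids_') for key in cache_result_dict)
    ids = [cache_result_dict[f'ids_{i}'] for i in range(n_cached)]
    documents = [cache_result_dict[f'documents_{i}'] for i in range(n_cached)]
    distances = [cache_result_dict[f'distances_{i}'] for i in range(n_cached)]
    metadatas = [cache_result_dict[f'metadatas_{i}'] for i in range(n_cached)]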

    print("Found in cache!")

    # Convert the cache data to a DataFrame
    results_df_1 = pd.DataFrame({
        'IDs_1': ids,
        'Documents_1': documents,
        'Distances_1': distances,
        'Metadatas_1': metadatas
    })

# Display the DataFrame with results
#print(results_df_1)


{'ids': [[]], 'embeddings': None, 'documents': [[]], 'uris': None, 'included': ['metadatas', 'documents', 'distances'], 'data': None, 'metadatas': [[]], 'distances': [[]]}
Not found in cache. Found in main collection.
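
The caching step stores every per-result field under keys such as `ids_0` or `documents_3` inside one flat metadata dictionary. A minimal sketch of that round trip follows, assuming results shaped like the query output above. `flatten_results` and `unflatten_results` are hypothetical helper names, not part of ChromaDB, and the sample values are made up. Reading the values back by index, rather than in dictionary order, keeps the columns aligned row by row.

```python
import json

PER_RESULT_KEYS = ("ids", "documents", "distances", "metadatas")

def flatten_results(results, keys=PER_RESULT_KEYS):
    """Turn {'ids': [[...]], ...} into a flat dict like {'ids_0': ..., 'documents_0': ...}."""
    flat = {}
    for key in keys:
        values = results.get(key)
        if values is None:
            continue
        for i, item in enumerate(values[0]):
            # Nested dicts cannot be stored as metadata values, so serialize them
            flat[f"{key}_{i}"] = json.dumps(item) if isinstance(item, dict) else item
    return flat

def unflatten_results(flat, keys=PER_RESULT_KEYS):
    """Rebuild per-field lists in result order by looking keys up by index."""
    n = sum(1 for k in flat if k.startswith("ids_"))
    return {key: [flat[f"{key}_{i}"] for i in range(n)] for key in keys}

# Made-up results in the same shape as a collection query response
results = {
    "ids": [["32", "84"]],
    "documents": [["first chunk", "second chunk"]],
    "distances": [[0.43, 0.46]],
    "metadatas": [[{"Page_No.": "Page 23"}, {"Page_No.": "Page 60"}]],
    "included": ["metadatas", "documents", "distances"],
    "embeddings": None,
}

flat = flatten_results(results)
restored = unflatten_results(flat)
print(restored["ids"], restored["distances"])
```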


In [34]:
import json

threshold = 0.2

results_df_1 = pd.DataFrame()

# Query the cache collection to check if the results are already stored
cache_results = cache_collection.query(
    query_texts=query_1,
    n_results=1
)

# Print the results from the cache query for debugging
print(cache_results)

# Check if the cache is empty or if the distance exceeds the threshold
if not cache_results['distances'][0] or cache_results['distances'][0][0] > threshold:
    # Query the main collection for the top 10 results
    results = insurance_collection.query(
        query_texts=query_1,
        n_results=10
    )

    # Prepare keys and values for storing in cache
    cache_data = {}
    for key, val in results.items():
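        # Skip absent fields and 'included', which is a flat list of field names rather than per-result data
        if val is None or key == 'included':
            continue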
        # Adjust the loop to match the actual number of items in val
        for i in range(min(len(val[0]), 10)):  # Ensure you only loop over existing items
            cache_data[f"{key}_{i}"] = val[0][i]

    # Flatten the metadata for storage in ChromaDB
    flat_cache_data = {}
    for k, v in cache_data.items():
        if isinstance(v, dict):
            # Convert the dictionary to a JSON string
            flat_cache_data[k] = json.dumps(v)
        else:
            flat_cache_data[k] = v

    # Store the query in cache
    cache_collection.add(
        documents=[query_1],
        ids=[query_1],  # Alternatively, you can use a unique ID
        metadatas=flat_cache_data
    )

    print("Not found in cache. Found in main collection.")

    # Convert the results to a DataFrame
    result_dict = {
        'Metadatas_1': results['metadatas'][0],
        'Documents_1': results['documents'][0],
        'Distances_1': results['distances'][0],
        'IDs': results['ids'][0]
    }
    results_df_1 = pd.DataFrame.from_dict(result_dict)

# If the distance is within the threshold, retrieve results from the cache
elif cache_results['distances'][0][0] <= threshold:
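    # Extract data from the cache
    cache_result_dict = cache_results['metadatas'][0][0]

    # Rebuild each column by result index so that IDs, documents, distances,
    # and metadata stay row-aligned (the stored dict's key order is not the result order)
    n_cached = sum(key.startswith('ids_') for key in cache_result_dict)
    ids = [cache_result_dict[f'ids_{i}'] for i in range(n_cached)]
    documents = [cache_result_dict[f'documents_{i}'] for i in range(n_cached)]
    distances = [cache_result_dict[f'distances_{i}'] for i in range(n_cached)]
    metadatas = [cache_result_dict[f'metadatas_{i}'] for i in range(n_cached)]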

    print("Found in cache!")

    # Convert the cache data to a DataFrame
    results_df_1 = pd.DataFrame({
        'IDs_1': ids,
        'Documents_1': documents,
        'Distances_1': distances,
        'Metadatas_1': metadatas
    })

# Display the DataFrame with results
#print(results_df_1)


{'ids': [[' what types of coverage does this policy include?']], 'embeddings': None, 'documents': [[' what types of coverage does this policy include?']], 'uris': None, 'included': ['metadatas', 'documents', 'distances'], 'data': None, 'metadatas': [[{'distances_7': 0.4418503940105438, 'included_5': 'a', 'included_7': 'a', 'distances_9': 0.4433935284614563, 'documents_1': '"Automobile" means a four-wheel passenger vehicle, station wagon, pick-up truck, or van-type vehicle, but excludes recreational-type vehicles such as a "dune-buggy" or an "all-terrain" vehicle. The term "Seat Belt" means a factory-installed device that forms an occupant restraint and injury avoidance system. Article 5 - Loss of Use or Paralysis Benefit This policy has been updated effective January 1, 2014 PART IV - BENEFITS GC 6015 Section B - Member Accidental Death and Dismemberment Insurance, Page 3', 'metadatas_8': '{"Section": "POLICY RIDER", "Page_No.": "Page 3"}', 'included_2': 't', 'ids_2': '52', 'metadatas_

In [35]:
results_df_1.head()

Unnamed: 0,IDs_1,Documents_1,Distances_1,Metadatas_1
0,52,"""Automobile"" means a four-wheel passenger vehi...",0.44185,"{""Section"": ""POLICY RIDER"", ""Page_No."": ""Page 3""}"
1,1,dependent on the Member for principal support....,0.443394,"{""Section"": ""a . A licensed Docto"", ""Page_No.""..."
2,3,POLICY RIDER GROUP INSURANCE POLICY NO: S655 C...,0.433825,"{""Section"": ""Exposure"", ""Page_No."": ""Page 55""}"
3,64,"coverage, benefits, and participation privileg...",0.4336,"{""Section"": ""A Member's insurance"", ""Page_No.""..."
4,11,"under this Article 1, and less any Accelerated...",0.442223,"{""Page_No."": ""Page 16"", ""Section"": ""PART II - ..."


In [36]:
query_2=input()


   what documentation is required when filing a claim?


In [37]:
cache_results = cache_collection.query(
    query_texts=query_2,
    n_results=1
)
cache_results

{'ids': [[' what types of coverage does this policy include?']],
 'embeddings': None,
 'documents': [[' what types of coverage does this policy include?']],
 'uris': None,
 'included': ['metadatas', 'documents', 'distances'],
 'data': None,
 'metadatas': [[{'distances_9': 0.4433935284614563,
    'distances_8': 0.4422228932380676,
    'distances_6': 0.4380476474761963,
    'distances_3': 0.4335998296737671,
    'ids_1': '86',
    'distances_1': 0.43088415265083313,
    'distances_2': 0.43138954043388367,
    'documents_7': "a . A licensed Doctor of Medicine (M.D.) or Osteopathy (D.O.); or b. any other licensed health care practitioner that state law requires be recognized as a Physician under this Group Policy. The term Physician does not include the Member, an employee of the Member, a business or professional partner or associate of the Member, any person who has a financial affiliation or business interest with the Member, anyone related to the Member by blood or marriage, or anyone 
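
The lookup above returns the first stored question even though the new question is different: a nearest-neighbour query with `n_results=1` always returns the closest cached entry, however far away it is. The distance threshold is what decides whether that entry counts as a hit. A minimal sketch of that decision, with hypothetical distance values; `is_cache_hit` is a helper name introduced here, not part of the notebook.

```python
def is_cache_hit(cache_results, threshold=0.2):
    """A hit requires at least one cached entry whose distance is within the threshold."""
    distances = cache_results["distances"][0]
    return bool(distances) and distances[0] <= threshold

# Hypothetical query responses, reduced to the field the check needs
empty_cache = {"distances": [[]]}        # nothing stored yet
unrelated = {"distances": [[0.41]]}      # closest entry is a different question
repeated = {"distances": [[0.0]]}        # same question asked again

for name, response in [("empty", empty_cache), ("unrelated", unrelated), ("repeated", repeated)]:
    print(name, is_cache_hit(response))
```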

In [38]:
threshold = 0.2

results_df_2 = pd.DataFrame()

# Query the cache collection to check if the results are already stored
cache_results = cache_collection.query(
    query_texts=query_2,
    n_results=1
)

# Print the results from the cache query for debugging
print(cache_results)

# Check if the cache is empty or if the distance exceeds the threshold
if not cache_results['distances'][0] or cache_results['distances'][0][0] > threshold:
    # Query the main collection for the top 10 results
    results = insurance_collection.query(
        query_texts=query_2,
        n_results=10
    )

    # Prepare keys and values for storing in cache
    cache_data = {}
    for key, val in results.items():
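        # Skip absent fields and 'included', which is a flat list of field names rather than per-result data
        if val is None or key == 'included':
            continue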
        # Adjust the loop to match the actual number of items in val
        for i in range(min(len(val[0]), 10)):  # Ensure you only loop over existing items
            cache_data[f"{key}_{i}"] = val[0][i]

    # Flatten the metadata for storage in ChromaDB
    flat_cache_data = {}
    for k, v in cache_data.items():
        if isinstance(v, dict):
            # Convert the dictionary to a JSON string
            flat_cache_data[k] = json.dumps(v)
        else:
            flat_cache_data[k] = v

    # Store the query in cache
    cache_collection.add(
        documents=[query_2],
        ids=[query_2],  # Alternatively, you can use a unique ID
        metadatas=flat_cache_data
    )

    print("Not found in cache. Found in main collection.")

    # Convert the results to a DataFrame
    result_dict = {
        'Metadatas_2': results['metadatas'][0],
        'Documents_2': results['documents'][0],
        'Distances_2': results['distances'][0],
        'IDs': results['ids'][0]
    }
    results_df_2 = pd.DataFrame.from_dict(result_dict)

# If the distance is within the threshold, retrieve results from the cache
elif cache_results['distances'][0][0] <= threshold:
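    # Extract data from the cache
    cache_result_dict = cache_results['metadatas'][0][0]

    # Rebuild each column by result index so that IDs, documents, distances,
    # and metadata stay row-aligned (the stored dict's key order is not the result order)
    n_cached = sum(key.startswith('ids_') for key in cache_result_dict)
    ids = [cache_result_dict[f'ids_{i}'] for i in range(n_cached)]
    documents = [cache_result_dict[f'documents_{i}'] for i in range(n_cached)]
    distances = [cache_result_dict[f'distances_{i}'] for i in range(n_cached)]
    metadatas = [cache_result_dict[f'metadatas_{i}'] for i in range(n_cached)]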

    print("Found in cache!")

    # Convert the cache data to a DataFrame
    results_df_2 = pd.DataFrame({
        'IDs_2': ids,
        'Documents_2': documents,
        'Distances_2': distances,
        'Metadatas_2': metadatas
    })

{'ids': [[' what types of coverage does this policy include?']], 'embeddings': None, 'documents': [[' what types of coverage does this policy include?']], 'uris': None, 'included': ['metadatas', 'documents', 'distances'], 'data': None, 'metadatas': [[{'documents_8': 'POLICY RIDER GROUP INSURANCE POLICY NO: S655 COVERAGE: Life EMPLOYER: RHODE ISLAND JOHN DOE Effective on the later of the Date of Issue of this Group Policy or March 1, 2005, the following will apply to your Policy: From time to time The Principal may offer or provide certain employer groups who apply for coverage with The Principal a Financial Services Hotline and Grief Support Services or any other value added service for the employees of that employer group. In addition, The Principal may arrange for third party service providers (i.e., optometrists, health clubs), to provide discounted goods and services to those employer groups who apply for coverage with The Principal or who become insureds/enrollees of The Principal

In [39]:
results_df_2

Unnamed: 0,Metadatas_2,Documents_2,Distances_2,IDs
0,"{'Section': 'Section D - Claim Pr', 'Page_No.'...",Section D - Claim Procedures Article 1 - Notic...,0.366993,95
1,"{'Page_No.': 'Page 62', 'Section': 'A claimant...",of loss has been filed and before the appeal p...,0.378093,98
2,"{'Page_No.': 'Page 62', 'Section': 'A claimant...",A claimant may request an appeal of a claim de...,0.384601,97
3,"{'Section': 'Section D - Claim Pr', 'Page_No.'...",will be considered to be met when the appropri...,0.388429,96
4,"{'Page_No.': 'Page 18', 'Section': 'c . a copy...",c . a copy of the form which contains the stat...,0.409103,23
5,"{'Page_No.': 'Page 29', 'Section': 'Insurance ...",by The Principal. A Member must submit Proof o...,0.414384,40
6,"{'Section': 'f . claim requiremen', 'Page_No.'...","f . claim requirements listed in PART IV, Sect...",0.419817,83
7,"{'Section': 'Section B - Effectiv', 'Page_No.'...",to an individual policy; or (2) were eligible ...,0.420368,38
8,"{'Page_No.': 'Page 23', 'Section': 'Section C ...",or d. fails to pay premium in accordance with ...,0.437295,32
9,"{'Section': 'Scheduled Benefit in', 'Page_No.'...",Benefit amounts due to a request by the Member...,0.439123,44


In [40]:
query_3 = input()


   what happens if i miss a payment?


In [41]:
cache_results = cache_collection.query(
    query_texts=query_3,
    n_results=1
)
cache_results

{'ids': [['   what documentation is required when filing a claim?']],
 'embeddings': None,
 'documents': [['   what documentation is required when filing a claim?']],
 'uris': None,
 'included': ['metadatas', 'documents', 'distances'],
 'data': None,
 'metadatas': [[{'included_8': 's',
    'included_4': 'd',
    'distances_2': 0.3846011757850647,
    'metadatas_9': '{"Section": "Scheduled Benefit in", "Page_No.": "Page 31"}',
    'documents_5': 'by The Principal. A Member must submit Proof of Good Health: (1) If insurance for which a Member contributes a part of premium is requested more than 31 days after the date the Member is eligible including any insurance the Member refuses and later requests. (2) If insurance is requested under this Group Policy by a Member that was eligible under the Prior Policy, but elected to waive coverage under the Prior Policy. (3) If a Member has failed to provide required Proof of Good Health or has been refused insurance under this Group Policy at any 

In [42]:
threshold = 0.2

results_df_3 = pd.DataFrame()

# Query the cache collection to check if the results are already stored
cache_results = cache_collection.query(
    query_texts=query_3,
    n_results=1
)

# Print the results from the cache query for debugging
#print(cache_results)

# Check if the cache is empty or if the distance exceeds the threshold
if not cache_results['distances'][0] or cache_results['distances'][0][0] > threshold:
    # Query the main collection for the top 10 results
    results = insurance_collection.query(
        query_texts=query_3,
        n_results=10
    )

    # Prepare keys and values for storing in cache
    cache_data = {}
    for key, val in results.items():
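        # Skip absent fields and 'included', which is a flat list of field names rather than per-result data
        if val is None or key == 'included':
            continue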
        # Adjust the loop to match the actual number of items in val
        for i in range(min(len(val[0]), 10)):  # Ensure you only loop over existing items
            cache_data[f"{key}_{i}"] = val[0][i]

    # Flatten the metadata for storage in ChromaDB
    flat_cache_data = {}
    for k, v in cache_data.items():
        if isinstance(v, dict):
            # Convert the dictionary to a JSON string
            flat_cache_data[k] = json.dumps(v)
        else:
            flat_cache_data[k] = v

    # Store the query in cache
    cache_collection.add(
        documents=[query_3],
        ids=[query_3],  # Alternatively, you can use a unique ID
        metadatas=flat_cache_data
    )

    print("Not found in cache. Found in main collection.")

    # Convert the results to a DataFrame
    result_dict = {
        'Metadatas_3': results['metadatas'][0],
        'Documents_3': results['documents'][0],
        'Distances_3': results['distances'][0],
        'IDs': results['ids'][0]
    }
    results_df_3 = pd.DataFrame.from_dict(result_dict)

# If the distance is within the threshold, retrieve results from the cache
elif cache_results['distances'][0][0] <= threshold:
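    # Extract data from the cache
    cache_result_dict = cache_results['metadatas'][0][0]

    # Rebuild each column by result index so that IDs, documents, distances,
    # and metadata stay row-aligned (the stored dict's key order is not the result order)
    n_cached = sum(key.startswith('ids_') for key in cache_result_dict)
    ids = [cache_result_dict[f'ids_{i}'] for i in range(n_cached)]
    documents = [cache_result_dict[f'documents_{i}'] for i in range(n_cached)]
    distances = [cache_result_dict[f'distances_{i}'] for i in range(n_cached)]
    metadatas = [cache_result_dict[f'metadatas_{i}'] for i in range(n_cached)]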

    print("Found in cache!")

    # Convert the cache data to a DataFrame
    results_df_3 = pd.DataFrame({
        'IDs_3': ids,
        'Documents_3': documents,
        'Distances_3': distances,
        'Metadatas_3': metadatas
    })

Not found in cache. Found in main collection.


In [43]:
results_df_3

Unnamed: 0,Metadatas_3,Documents_3,Distances_3,IDs
0,"{'Page_No.': 'Page 23', 'Section': 'Section C ...",or d. fails to pay premium in accordance with ...,0.436712,32
1,"{'Section': 'f . claim requiremen', 'Page_No.'...","Settlement of Proceeds provisions of PART IV, ...",0.46005,84
2,"{'Section': 'Section C - Policy T', 'Page_No.'...",Section C - Policy Termination Article 1 - Fai...,0.465014,31
3,"{'Section': 'f . claim requiremen', 'Page_No.'...","f . claim requirements listed in PART IV, Sect...",0.47508,83
4,"{'Section': 'Section B - Premiums', 'Page_No.'...",Section B - Premiums Article 1 - Payment Respo...,0.479925,26
5,"{'Page_No.': 'Page 24', 'Section': 'T he Princ...",T he Principal may terminate the Policyholder'...,0.483909,33
6,"{'Page_No.': 'Page 47', 'Section': 'M ember's ...",insurance and recorded by the Policyholder or ...,0.485125,71
7,"{'Section': 'b . on any date the ', 'Page_No.'...",may result in certain administrative fees bein...,0.486457,29
8,"{'Page_No.': 'Page 48', 'Section': 'c . If a b...",c . If a beneficiary dies at the same time or ...,0.488092,72
9,"{'Section': 'b . on any date the ', 'Page_No.'...",b . on any date the definition of Member or De...,0.491183,28


In [44]:
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

In [46]:
cross_inputs_1 = [[query_1, response] for response in results_df_1['Documents_1']]
cross_rerank_scores_1 = cross_encoder.predict(cross_inputs_1)
cross_rerank_scores_1

array([-5.0158772, -2.547558 , -2.123281 , -2.1723297, -3.6183167,
       -5.11115  , -1.9306291, -1.8970758, -9.388777 , -8.101702 ],
      dtype=float32)
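
The scores above are unbounded and mostly negative, which is consistent with the cross-encoder returning raw logits (an assumption; the notebook does not state it). Only the ordering matters for reranking, but a logistic sigmoid can map the values into the 0 to 1 range for easier reading. A minimal sketch using the first five values printed above; `to_probability` is a hypothetical helper, not part of `sentence_transformers`:

```python
import math

def to_probability(logit):
    """Map a raw relevance logit to the (0, 1) range with a logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-logit))

# First five scores copied from the output above.
scores = [-5.0158772, -2.547558, -2.123281, -2.1723297, -3.6183167]
probabilities = [to_probability(s) for s in scores]

# The sigmoid is monotonic, so the ranking is the same before and after.
ranking_by_logit = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
ranking_by_prob = sorted(range(len(scores)), key=lambda i: probabilities[i], reverse=True)
print(ranking_by_logit == ranking_by_prob)
```

Because the transform preserves order, sorting by `Reranked_scores` gives the same top 3 either way.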

In [47]:
results_df_1['Reranked_scores'] = cross_rerank_scores_1
results_df_1

Unnamed: 0,IDs_1,Documents_1,Distances_1,Metadatas_1,Reranked_scores
0,52,"""Automobile"" means a four-wheel passenger vehi...",0.44185,"{""Section"": ""POLICY RIDER"", ""Page_No."": ""Page 3""}",-5.015877
1,1,dependent on the Member for principal support....,0.443394,"{""Section"": ""a . A licensed Docto"", ""Page_No.""...",-2.547558
2,3,POLICY RIDER GROUP INSURANCE POLICY NO: S655 C...,0.433825,"{""Section"": ""Exposure"", ""Page_No."": ""Page 55""}",-2.123281
3,64,"coverage, benefits, and participation privileg...",0.4336,"{""Section"": ""A Member's insurance"", ""Page_No.""...",-2.17233
4,11,"under this Article 1, and less any Accelerated...",0.442223,"{""Page_No."": ""Page 16"", ""Section"": ""PART II - ...",-3.618317
5,86,This policy has been updated effective January...,0.430884,"{""Page_No."": ""Page 11"", ""Section"": ""(2) has be...",-5.11115
6,79,state or federal law. Article 5 - Coverage Whi...,0.418285,"{""Page_No."": ""Page 43"", ""Section"": ""Any indivi...",-1.930629
7,20,a . A licensed Doctor of Medicine (M.D.) or Os...,0.438048,"{""Section"": ""a. be actively engag"", ""Page_No.""...",-1.897076
8,14,TABLE OF CONTENTS PART I - DEFINITIONS PART II...,0.434248,"{""Page_No."": ""Page 51"", ""Section"": ""Coverage D...",-9.388777
9,21,a. be actively engaged in business for profit ...,0.43139,"{""Page_No."": ""Page 6"", ""Section"": ""TABLE OF CO...",-8.101702


In [48]:
top_3_semantic_1 = results_df_1.sort_values(by='Distances_1')
top_3_semantic_1[:3]

Unnamed: 0,IDs_1,Documents_1,Distances_1,Metadatas_1,Reranked_scores
6,79,state or federal law. Article 5 - Coverage Whi...,0.418285,"{""Page_No."": ""Page 43"", ""Section"": ""Any indivi...",-1.930629
5,86,This policy has been updated effective January...,0.430884,"{""Page_No."": ""Page 11"", ""Section"": ""(2) has be...",-5.11115
9,21,a. be actively engaged in business for profit ...,0.43139,"{""Page_No."": ""Page 6"", ""Section"": ""TABLE OF CO...",-8.101702


In [49]:
top_3_rerank_1 = results_df_1.sort_values(by='Reranked_scores', ascending=False)
top_3_rerank_1[:3]

Unnamed: 0,IDs_1,Documents_1,Distances_1,Metadatas_1,Reranked_scores
7,20,a . A licensed Doctor of Medicine (M.D.) or Os...,0.438048,"{""Section"": ""a. be actively engag"", ""Page_No.""...",-1.897076
6,79,state or federal law. Article 5 - Coverage Whi...,0.418285,"{""Page_No."": ""Page 43"", ""Section"": ""Any indivi...",-1.930629
2,3,POLICY RIDER GROUP INSURANCE POLICY NO: S655 C...,0.433825,"{""Section"": ""Exposure"", ""Page_No."": ""Page 55""}",-2.123281


In [50]:
cross_inputs_2 = [[query_2, response] for response in results_df_2['Documents_2']]
cross_rerank_scores_2 = cross_encoder.predict(cross_inputs_2)
results_df_2['Reranked_scores'] = cross_rerank_scores_2

In [51]:
top_3_semantic_2 = results_df_2.sort_values(by='Distances_2')
top_3_semantic_2[:3]

Unnamed: 0,Metadatas_2,Documents_2,Distances_2,IDs,Reranked_scores
0,"{'Section': 'Section D - Claim Pr', 'Page_No.'...",Section D - Claim Procedures Article 1 - Notic...,0.366993,95,-0.474058
1,"{'Page_No.': 'Page 62', 'Section': 'A claimant...",of loss has been filed and before the appeal p...,0.378093,98,-2.255105
2,"{'Page_No.': 'Page 62', 'Section': 'A claimant...",A claimant may request an appeal of a claim de...,0.384601,97,-0.586972


In [53]:
top_3_rerank_2 = results_df_2.sort_values(by='Reranked_scores', ascending=False)
top_3_rerank_2[:3]

Unnamed: 0,Metadatas_2,Documents_2,Distances_2,IDs,Reranked_scores
0,"{'Section': 'Section D - Claim Pr', 'Page_No.'...",Section D - Claim Procedures Article 1 - Notic...,0.366993,95,-0.474058
2,"{'Page_No.': 'Page 62', 'Section': 'A claimant...",A claimant may request an appeal of a claim de...,0.384601,97,-0.586972
1,"{'Page_No.': 'Page 62', 'Section': 'A claimant...",of loss has been filed and before the appeal p...,0.378093,98,-2.255105


In [54]:
cross_inputs_3 = [[query_3, response] for response in results_df_3['Documents_3']]
cross_rerank_scores_3 = cross_encoder.predict(cross_inputs_3)

In [56]:
results_df_3['Reranked_scores'] = cross_rerank_scores_3
top_3_semantic_3 = results_df_3.sort_values(by='Distances_3')
top_3_semantic_3[:3]

Unnamed: 0,Metadatas_3,Documents_3,Distances_3,IDs,Reranked_scores
0,"{'Page_No.': 'Page 23', 'Section': 'Section C ...",or d. fails to pay premium in accordance with ...,0.436712,32,-9.93795
1,"{'Section': 'f . claim requiremen', 'Page_No.'...","Settlement of Proceeds provisions of PART IV, ...",0.46005,84,-5.53019
2,"{'Section': 'Section C - Policy T', 'Page_No.'...",Section C - Policy Termination Article 1 - Fai...,0.465014,31,-7.30039


In [57]:
top_3_rerank_3 = results_df_3.sort_values(by='Reranked_scores', ascending=False)
top_3_rerank_3[:3]

Unnamed: 0,Metadatas_3,Documents_3,Distances_3,IDs,Reranked_scores
8,"{'Page_No.': 'Page 48', 'Section': 'c . If a b...",c . If a beneficiary dies at the same time or ...,0.488092,72,-5.348609
1,"{'Section': 'f . claim requiremen', 'Page_No.'...","Settlement of Proceeds provisions of PART IV, ...",0.46005,84,-5.53019
6,"{'Page_No.': 'Page 47', 'Section': 'M ember's ...",insurance and recorded by the Policyholder or ...,0.485125,71,-6.560678


In [58]:
top_3_RAG_1 = top_3_rerank_1[["Documents_1", "Metadatas_1"]][:3]
top_3_RAG_2 = top_3_rerank_2[["Documents_2", "Metadatas_2"]][:3]
top_3_RAG_3 = top_3_rerank_3[["Documents_3", "Metadatas_3"]][:3]
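
The three queries above repeat the same steps: build (query, document) pairs, score them, and keep the three highest-scoring rows. One way to fold that into a single helper is sketched below. The names `rerank` and `overlap_score` are hypothetical, and `overlap_score` is only a stand-in scorer so the example runs without downloading a model.

```python
def rerank(query, documents, score_fn, top_k=3):
    """Return (position, score) pairs for the top_k documents, highest score first."""
    pairs = [[query, doc] for doc in documents]
    scores = score_fn(pairs)
    ranked = sorted(enumerate(scores), key=lambda item: item[1], reverse=True)
    return [(position, float(score)) for position, score in ranked[:top_k]]

def overlap_score(pairs):
    """Stand-in scorer for illustration only: counts words shared by query and document."""
    return [len(set(q.lower().split()) & set(d.lower().split())) for q, d in pairs]

# Made-up documents, not taken from the policy.
docs = [
    "grace period for premium payment",
    "beneficiary change form",
    "what happens if a premium payment is missed",
]
top = rerank("what happens if i miss a payment", docs, overlap_score, top_k=2)
print(top)
```

In the notebook, `score_fn` would be `cross_encoder.predict` and `documents` would be something like `results_df_3['Documents_3'].tolist()`. The returned positions are positional, so they would be used with `.iloc` rather than with the DataFrame index labels.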

In [60]:
# Define the function that generates the response. The prompt passes the user query and the top 3 results to the model.

def generate_response(query, top_3_RAG):
    """
    Generate a response with the OpenAI chat completions API (gpt-4o-mini), based on the user query and the retrieved information.
    """
    messages = [
        {"role": "system", "content": "You are a helpful assistant in the insurance domain who can effectively answer user queries about insurance policies and documents."},
        {"role": "user", "content": f"""
            You are a helpful assistant in the insurance domain who can effectively answer user queries about insurance policies and documents.
            You have a question asked by the user in '{query}' and you have some search results from a corpus of insurance documents in the dataframe '{top_3_RAG}'. These search results are essentially one page of an insurance document that may be relevant to the user query.

            The dataframe column whose name starts with 'Documents' contains the actual text from the policy document, and the column whose name starts with 'Metadatas' contains the section name ('Section') and source page ('Page_No.'). The text inside the document may also contain tables in the format of a list of lists where each of the nested lists indicates a row.

            Use the documents in '{top_3_RAG}' to answer the query '{query}'. Frame an informative answer and also, use the dataframe to return the relevant policy names and page numbers as citations.

            Follow the guidelines below when performing the task:
            1. Try to provide relevant/accurate numbers if available.
            2. You don’t necessarily have to use all the information in the dataframe. Only choose information that is relevant.
            3. If the document text has tables with relevant information, please reformat the table and return the final information in a tabular format.
            4. Use the metadata column in the dataframe to retrieve and cite the section name(s) and page number(s) as citation.
            5. If you can't provide the complete answer, please also provide any information that will help the user to search specific sections in the relevant cited documents.
            6. You are a customer-facing assistant, so do not provide any information on internal workings, just answer the query directly.

            The generated response should answer the query directly addressing the user and avoiding additional information. If you think that the query is not relevant to the document, reply that the query is irrelevant. Provide the final response as a well-formatted and easily readable text along with the citation. Provide your complete response first with all information, and then provide the citations.

            ### Few-Shot Examples

            ### Example 1: Basic Query about Coverage
            **Query:**
            What does the policy say about coverage for accidental death?

            **Top 3 RAG Results:**
            - **Document 1:** "This policy provides coverage for accidental death. The insured amount for accidental death is 200% of the base coverage amount if the death occurs within 90 days of the accident..."
            - **Document 2:** "Accidental death benefits are payable under this policy if the insured dies as a result of an accident. The benefit amount equals double the coverage amount, provided the death is a direct result of the accident and occurs within a specified time frame..."
            - **Document 3:** "In the event of accidental death, the policy pays an additional benefit, which is equal to twice the original coverage amount. This benefit is contingent on the death occurring within 180 days from the date of the accident..."

            **Response:**
            The policy provides coverage for accidental death, where the benefit amount is typically 200% of the base coverage. The death must occur as a direct result of an accident and within a specified period, which varies between 90 to 180 days depending on the policy.
            **Citations:**
            Document 1: Policy X, Page 5
            Document 2: Policy Y, Page 12
            Document 3: Policy Z, Page 7

            ### Example 2: Query about Exclusions
            **Query:**
            Are there any exclusions for pre-existing conditions in this policy?

            **Top 3 RAG Results:**
            - **Document 1:** "This policy excludes coverage for any conditions that were diagnosed or treated within 12 months prior to the policy's start date. However, if the condition remains stable for 24 months after the policy's start date, it may be eligible for coverage..."
            - **Document 2:** "Pre-existing conditions are generally not covered under this policy unless explicitly stated otherwise. Any condition that has shown symptoms or required medical attention in the 12 months before the policy start date is excluded..."
            - **Document 3:** "Exclusions apply to pre-existing conditions if they were present within a 12-month window before the policy commencement. After a waiting period of 24 months, these conditions may be reconsidered for coverage..."

            **Response:**
            The policy excludes coverage for pre-existing conditions that were diagnosed or treated within 12 months before the policy start date. However, if the condition remains stable and does not require treatment for 24 months after the policy start date, it may be eligible for coverage.
            **Citations:**
            Document 1: Policy X, Page 8
            Document 2: Policy Y, Page 15
            Document 3: Policy Z, Page 10

            ### Example 3: Query about Beneficiaries
            **Query:**
            How can I update the beneficiary for my life insurance policy?

            **Top 3 RAG Results:**
            - **Document 1:** "To update the beneficiary, the policyholder must submit a written request to the insurance company. The request should include the policy number, current beneficiary, and the new beneficiary's details..."
            - **Document 2:** "Beneficiary changes can be made by filling out the 'Beneficiary Change Form,' which must be signed by the policyholder and submitted to the insurance provider. The change takes effect upon the company's receipt and acceptance of the form..."
            - **Document 3:** "The policyholder has the right to change the beneficiary at any time by providing written notice to the insurer. The change will be recorded and acknowledged by the insurer upon receipt..."

            **Response:**
            To update the beneficiary of your life insurance policy, you need to submit a written request or complete a 'Beneficiary Change Form' provided by your insurer. This request must be signed by you as the policyholder, and the change will take effect once the insurance company has received and acknowledged it.
            **Citations:**
            Document 1: Policy X, Page 3
            Document 2: Policy Y, Page 6
            Document 3: Policy Z, Page 4

            ### Example 4: Complex Query about Premium Payments
            **Query:**
            What are the options if I miss a premium payment?

            **Top 3 RAG Results:**
            - **Document 1:** "If a premium payment is missed, the policyholder typically has a 30-day grace period to make the payment without penalty. If payment is not received within this period, the policy may lapse, and coverage could be lost..."
            - **Document 2:** "The policy includes a 30-day grace period for missed payments. During this time, coverage remains in force. If payment is not made by the end of the grace period, the policyholder may request reinstatement of the policy, subject to underwriting..."
            - **Document 3:** "In the event of a missed premium, a 30-day grace period is granted. If the premium is not paid within this period, the policyholder may choose to reinstate the policy, which may require proof of insurability and payment of overdue premiums..."

            **Response:**
            If you miss a premium payment, your policy provides a 30-day grace period during which you can make the payment without losing coverage. If the payment is not made within this period, the policy may lapse. However, you may have the option to reinstate the policy by providing proof of insurability and paying the overdue premiums.
            **Citations:**
            Document 1: Policy X, Page 10
            Document 2: Policy Y, Page 11
            Document 3: Policy Z, Page 9
        """},
    ]

    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )

    return response.choices[0].message.content.split('\n')
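
Interpolating a DataFrame into an f-string inserts its printed representation, and pandas shortens long cells with `...` in that representation by default, so the model may only see the first few words of each chunk. A possible alternative is to render the retrieved rows as plain text before building the prompt. The sketch below assumes each metadata value is a dict with `Page_No.` and `Section` keys, as in the displays above; `format_context` is a hypothetical helper and the sample rows are placeholders, not text from the policy.

```python
def format_context(rows):
    """Render (document_text, metadata) pairs as numbered, untruncated context blocks."""
    blocks = []
    for number, (text, metadata) in enumerate(rows, start=1):
        page = metadata.get("Page_No.", "unknown page")
        section = metadata.get("Section", "unknown section")
        blocks.append(f"[Document {number}] ({page}; section: {section})\n{text}")
    return "\n\n".join(blocks)

# Placeholder rows for illustration only.
rows = [
    ("First placeholder chunk of policy text.", {"Page_No.": "Page 48", "Section": "example section A"}),
    ("Second placeholder chunk of policy text.", {"Page_No.": "Page 47"}),
]
context = format_context(rows)
print(context)
```

With the notebook's data this could be called as `format_context(zip(top_3_RAG_3['Documents_3'], top_3_RAG_3['Metadatas_3']))` and the result interpolated in place of `{top_3_RAG}`. If a metadata column holds JSON strings rather than dicts (which may be the case for rows loaded from the cache), each value would need `json.loads` first.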

In [61]:
# Generate the response - For Query 1

response = generate_response(query_1, top_3_RAG_1)
print("Query 1: ","\n",query_1,"\n_________________________________________________________________________________________________________________\n_________________________________________________________________________________________________________________\n")
# Print the response
print("\n".join(response))

Query 1:  
  what types of coverage does this policy include? 
_________________________________________________________________________________________________________________
_________________________________________________________________________________________________________________

The policy includes various types of coverage, although the specific details regarding coverage types were not explicitly mentioned in the documents retrieved. Based on the provided information, it is suggested that you look into the following sections of the policy documents, which may offer insights on the coverage types included:

1. **Coverage Details**: It appears that there may be a section dedicated to coverage types, which is typically outlined in the policy. The documents discuss policies but do not explicitly list the coverage contents.

2. **Exposure Section**: 
   - It may describe the risk exposures covered under the policy, which can also indicate the type of coverage included.

To get

In [62]:
# Generate the response - For Query 2

response = generate_response(query_2, top_3_RAG_2)
print("Query 2: ","\n",query_2,"\n_________________________________________________________________________________________________________________\n_________________________________________________________________________________________________________________\n")
# Print the response
print("\n".join(response))

Query 2:  
    what documentation is required when filing a claim? 
_________________________________________________________________________________________________________________
_________________________________________________________________________________________________________________

To file a claim, specific documentation is typically required to support your request and establish the basis for the claim. While the exact requirements may vary depending on the policy and the nature of the claim, common documentation required includes:

1. **Proof of Loss**: This is a formal statement providing details about the loss incurred. It serves as a foundation for your claim.
2. **Claim Form**: A claim form needs to be completed, which provides necessary details about the claim and the claimant.
3. **Supporting Documents**: This may include receipts, medical records, photographs, or any other evidence that substantiates the loss or damage.
4. **Policy Information**: A copy of the in

In [63]:
top_3_RAG_3['Metadatas_3']

Unnamed: 0,Metadatas_3
8,"{'Page_No.': 'Page 48', 'Section': 'c . If a b..."
1,"{'Section': 'f . claim requiremen', 'Page_No.'..."
6,"{'Page_No.': 'Page 47', 'Section': 'M ember's ..."


In [64]:
# Generate the response - For Query 3

response = generate_response(query_3, top_3_RAG_3)
print("Query 3: ","\n",query_3,"\n_________________________________________________________________________________________________________________\n_________________________________________________________________________________________________________________\n")
# Print the response
print("\n".join(response))

Query 3:  
    what happens if i miss a payment? 
_________________________________________________________________________________________________________________
_________________________________________________________________________________________________________________

If you miss a payment on your insurance policy, you typically have a 30-day grace period to make the payment without incurring penalties. During this grace period, your coverage remains active. If you fail to make the payment within this timeframe, your policy may lapse, meaning that your coverage is lost. In such an event, you may still have the option to reinstate the policy, which would generally require you to provide proof of insurability and pay any overdue premiums.

**Citations:**  
- Document 1: Policy X, Page 10  
- Document 2: Policy Y, Page 11  
- Document 3: Policy Z, Page 9
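
The citations in the response above are identical to the citations of few-shot Example 4 in the prompt, while the retrieved metadata shown in `In [63]` lists Page 48 and Page 47. A simple post-check can flag cited pages that do not appear in the retrieved metadata. This is a minimal sketch under the assumption that citations use the form "Page N" and that metadata values are dicts with a `Page_No.` key; `unsupported_pages` is a hypothetical helper, and only the two page numbers visible in the output above are included.

```python
import re

def unsupported_pages(response_text, metadatas):
    """Return page labels cited in the response that are absent from the retrieved metadata."""
    cited = set(re.findall(r"Page\s+\d+", response_text))
    retrieved = {m.get("Page_No.") for m in metadatas}
    return sorted(cited - retrieved)

# Citation lines copied from the response above.
response_text = (
    "- Document 1: Policy X, Page 10\n"
    "- Document 2: Policy Y, Page 11\n"
    "- Document 3: Policy Z, Page 9"
)
# Only the page numbers visible in the In [63] output; the third is elided there.
metadatas = [{"Page_No.": "Page 48"}, {"Page_No.": "Page 47"}]

flagged = unsupported_pages(response_text, metadatas)
print(flagged)
```

A non-empty result suggests the model copied citations from the prompt examples rather than from the retrieved chunks, so the answer should not be shown as-is.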
