#### Project Background

Retrieval-augmented generation (RAG) is an AI framework for improving the quality of LLM-generated responses by grounding the model on external sources of knowledge to supplement the LLM’s internal representation of information. Implementing RAG in an LLM-based question answering system has two main benefits: It ensures that the model has access to the most current, reliable facts, and that users have access to the model’s sources, ensuring that its claims can be checked for accuracy and ultimately trusted.


#### Problem Statement

The goal of the project is to build a robust generative search system capable of effectively and accurately answering questions from a policy document using the RAG framework described above


#### Dataset
The data used for the project is a single long life insurance policy document

In [3]:
# Install all the required libraries

!pip install -U -q pdfplumber tiktoken openai chromaDB 

In [4]:
# Import all the required Libraries

import pdfplumber
from pathlib import Path
import pandas as pd
from operator import itemgetter
import json
import tiktoken
import openai
import chromadb
import warnings
warnings.filterwarnings("ignore")

## 1. Embedding Layer

Here the  PDF document needs to be effectively processed, cleaned, and chunked for the embeddings

We will be using pdfplumber to read and process the PDF files.

pdfplumber allows for better parsing of the PDF file as it can read various elements of the PDF apart from the plain text, such as, tables, images, etc. It also offers wide functionaties and visual debugging features to help with advanced preprocessing as well.

####   Reading a single PDF file and exploring it through pdfplumber

In [8]:
# Open the PDF file
with pdfplumber.open(r'C:/Users/hp/Downloads/policy_documents/Principal-Sample-Life-Insurance-Policy.pdf') as pdf:

    # Get one of the pages from the PDF and examine it
    single_page = pdf.pages[6]

    # Extract text from the first page
    text = single_page.extract_text()

    # Extract tables from the first page
    tables = single_page.extract_tables()

    # Print the extracted text
    print(text)

Section A – Eligibility
Member Life Insurance Article 1
Member Accidental Death and Dismemberment Insurance Article 2
Dependent Life Insurance Article 3
Section B - Effective Dates
Member Life Insurance Article 1
Member Accidental Death and Dismemberment Insurance Article 2
Dependent Life Insurance Article 3
Section C - Individual Terminations
Member Life Insurance Article 1
Member Accidental Death and Dismemberment Insurance Article 2
Dependent Life Insurance Article 3
Termination for Fraud Article 4
Coverage While Outside of the United States Article 5
Section D - Continuation
Member Life Insurance Article 1
Dependent Insurance - Developmentally Disabled or
Physically Handicapped Children Article 2
Section E - Reinstatement
Reinstatement Article 1
Federal Required Family and Medical Leave Act (FMLA) Article 2
Reinstatement of Coverage for a Member or Dependent When
Coverage Ends due to Living Outside of the United States Article 3
Section F - Individual Purchase Rights
Member Life In

In [9]:
# View the table in the page, if any

tables

[]

#### Extracting text from multiple PDFsAlthough this step is not required for this particular project as we are using a single pdf file, however it is still ideal to have this as the project can be scaled further to ingest more pdfs

In [11]:
# Define the path where all pdf documents are present

pdf_path = 'C:/Users/hp/Downloads/policy_documents'

In [12]:
# Function to check whether a word is present in a table or not for segregation of regular text and tables

def check_bboxes(word, table_bbox):
    # Check whether word is inside a table bbox.
    l = word['x0'], word['top'], word['x1'], word['bottom']
    r = table_bbox
    return l[0] > r[0] and l[1] > r[1] and l[2] < r[2] and l[3] < r[3]

In [13]:
# Function to extract text from a PDF file.
# 1. Declare a variable p to store the iteration of the loop that will help us store page numbers alongside the text
# 2. Declare an empty list 'full_text' to store all the text files
# 3. Use pdfplumber to open the pdf pages one by one
# 4. Find the tables and their locations in the page
# 5. Extract the text from the tables in the variable 'tables'
# 6. Extract the regular words by calling the function check_bboxes() and checking whether words are present in the table or not
# 7. Use the cluster_objects utility to cluster non-table and table words together so that they retain the same chronology as in the original PDF
# 8. Declare an empty list 'lines' to store the page text
# 9. If a text element in present in the cluster, append it to 'lines', else if a table element is present, append the table
# 10. Append the page number and all lines to full_text, and increment 'p'
# 11. When the function has iterated over all pages, return the 'full_text' list

def extract_text_from_pdf(pdf_path):
    p = 0
    full_text = []


    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            page_no = f"Page {p+1}"
            text = page.extract_text()

            tables = page.find_tables()
            table_bboxes = [i.bbox for i in tables]
            tables = [{'table': i.extract(), 'top': i.bbox[1]} for i in tables]
            non_table_words = [word for word in page.extract_words() if not any(
                [check_bboxes(word, table_bbox) for table_bbox in table_bboxes])]
            lines = []

            for cluster in pdfplumber.utils.cluster_objects(non_table_words + tables, itemgetter('top'), tolerance=5):

                if 'text' in cluster[0]:
                    try:
                        lines.append(' '.join([i['text'] for i in cluster]))
                    except KeyError:
                        pass

                elif 'table' in cluster[0]:
                    lines.append(json.dumps(cluster[0]['table']))


            full_text.append([page_no, " ".join(lines)])
            p +=1

    return full_text

*Now that we have defined the function for extracting the text and tables from a PDF, let's iterate and call this function for the insurance PDF and store them in a list.*

In [15]:
# Define the directory containing the PDF files
pdf_directory = Path(pdf_path)

# Initialize an empty list to store the extracted texts and document names
data = []

# Loop through all files in the directory
for pdf_path in pdf_directory.glob("*.pdf"):

    # Process the PDF file
    print(f"...Processing {pdf_path.name}")

    # Call the function to extract the text from the PDF
    extracted_text = extract_text_from_pdf(pdf_path)

    # Convert the extracted list to a PDF, and add a column to store document names
    extracted_text_df = pd.DataFrame(extracted_text, columns=['Page No.', 'Page_Text'])
    extracted_text_df['Document Name'] = pdf_path.name

    # Append the extracted text and document name to the list
    data.append(extracted_text_df)

    # Print a message to indicate progress
    print(f"Finished processing {pdf_path.name}")

# Print a message to indicate all PDFs have been processed
print("All PDFs have been processed.")

...Processing Principal-Sample-Life-Insurance-Policy.pdf
Finished processing Principal-Sample-Life-Insurance-Policy.pdf
All PDFs have been processed.


In [16]:
# Concatenate all the DFs in the list 'data' together

insurance_pdfs_data = pd.concat(data, ignore_index=True)

In [17]:
insurance_pdfs_data

Unnamed: 0,Page No.,Page_Text,Document Name
0,Page 1,DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/...,Principal-Sample-Life-Insurance-Policy.pdf
1,Page 2,This page left blank intentionally,Principal-Sample-Life-Insurance-Policy.pdf
2,Page 3,POLICY RIDER GROUP INSURANCE POLICY NO: S655 C...,Principal-Sample-Life-Insurance-Policy.pdf
3,Page 4,This page left blank intentionally,Principal-Sample-Life-Insurance-Policy.pdf
4,Page 5,PRINCIPAL LIFE INSURANCE COMPANY (called The P...,Principal-Sample-Life-Insurance-Policy.pdf
...,...,...,...
59,Page 60,I f a Dependent who was insured dies during th...,Principal-Sample-Life-Insurance-Policy.pdf
60,Page 61,Section D - Claim Procedures Article 1 - Notic...,Principal-Sample-Life-Insurance-Policy.pdf
61,Page 62,A claimant may request an appeal of a claim de...,Principal-Sample-Life-Insurance-Policy.pdf
62,Page 63,This page left blank intentionally,Principal-Sample-Life-Insurance-Policy.pdf


In [18]:
# Check one of the extracted page texts to ensure that the text has been correctly read

insurance_pdfs_data.Page_Text[2]

'POLICY RIDER GROUP INSURANCE POLICY NO: S655 COVERAGE: Life EMPLOYER: RHODE ISLAND JOHN DOE Effective on the later of the Date of Issue of this Group Policy or March 1, 2005, the following will apply to your Policy: From time to time The Principal may offer or provide certain employer groups who apply for coverage with The Principal a Financial Services Hotline and Grief Support Services or any other value added service for the employees of that employer group. In addition, The Principal may arrange for third party service providers (i.e., optometrists, health clubs), to provide discounted goods and services to those employer groups who apply for coverage with The Principal or who become insureds/enrollees of The Principal. While The Principal has arranged these goods, services and/or third party provider discounts, the third party service providers are liable to the applicants/insureds/enrollees for the provision of such goods and/or services. The Principal is not responsible for the

In [19]:
# Let's also check the length of all the texts as there might be some empty pages or pages with very few words that we can drop

insurance_pdfs_data['Text_Length'] = insurance_pdfs_data['Page_Text'].apply(lambda x: len(x.split(' ')))

In [20]:
insurance_pdfs_data['Text_Length']

0      30
1       5
2     230
3       5
4     110
     ... 
59    285
60    418
61    322
62      5
63      8
Name: Text_Length, Length: 64, dtype: int64

In [21]:
# Retain only the rows with a text length of at least 10

insurance_pdfs_data = insurance_pdfs_data.loc[insurance_pdfs_data['Text_Length'] >= 10]
insurance_pdfs_data

Unnamed: 0,Page No.,Page_Text,Document Name,Text_Length
0,Page 1,DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/...,Principal-Sample-Life-Insurance-Policy.pdf,30
2,Page 3,POLICY RIDER GROUP INSURANCE POLICY NO: S655 C...,Principal-Sample-Life-Insurance-Policy.pdf,230
4,Page 5,PRINCIPAL LIFE INSURANCE COMPANY (called The P...,Principal-Sample-Life-Insurance-Policy.pdf,110
5,Page 6,TABLE OF CONTENTS PART I - DEFINITIONS PART II...,Principal-Sample-Life-Insurance-Policy.pdf,153
6,Page 7,Section A – Eligibility Member Life Insurance ...,Principal-Sample-Life-Insurance-Policy.pdf,176
7,Page 8,Section A - Member Life Insurance Schedule of ...,Principal-Sample-Life-Insurance-Policy.pdf,171
8,Page 9,P ART I - DEFINITIONS When used in this Group ...,Principal-Sample-Life-Insurance-Policy.pdf,387
9,Page 10,T he legally recognized union of two eligible ...,Principal-Sample-Life-Insurance-Policy.pdf,251
10,Page 11,(2) has been placed with the Member or spouse ...,Principal-Sample-Life-Insurance-Policy.pdf,299
11,Page 12,An institution that is licensed as a Hospital ...,Principal-Sample-Life-Insurance-Policy.pdf,352


In [22]:
# Store the metadata for each page in a separate column

insurance_pdfs_data['Metadata'] = insurance_pdfs_data.apply(lambda x: {'Policy_Name': x['Document Name'][:-4], 'Page_No.': x['Page No.']}, axis=1)

In [23]:
insurance_pdfs_data

Unnamed: 0,Page No.,Page_Text,Document Name,Text_Length,Metadata
0,Page 1,DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/...,Principal-Sample-Life-Insurance-Policy.pdf,30,{'Policy_Name': 'Principal-Sample-Life-Insuran...
2,Page 3,POLICY RIDER GROUP INSURANCE POLICY NO: S655 C...,Principal-Sample-Life-Insurance-Policy.pdf,230,{'Policy_Name': 'Principal-Sample-Life-Insuran...
4,Page 5,PRINCIPAL LIFE INSURANCE COMPANY (called The P...,Principal-Sample-Life-Insurance-Policy.pdf,110,{'Policy_Name': 'Principal-Sample-Life-Insuran...
5,Page 6,TABLE OF CONTENTS PART I - DEFINITIONS PART II...,Principal-Sample-Life-Insurance-Policy.pdf,153,{'Policy_Name': 'Principal-Sample-Life-Insuran...
6,Page 7,Section A – Eligibility Member Life Insurance ...,Principal-Sample-Life-Insurance-Policy.pdf,176,{'Policy_Name': 'Principal-Sample-Life-Insuran...
7,Page 8,Section A - Member Life Insurance Schedule of ...,Principal-Sample-Life-Insurance-Policy.pdf,171,{'Policy_Name': 'Principal-Sample-Life-Insuran...
8,Page 9,P ART I - DEFINITIONS When used in this Group ...,Principal-Sample-Life-Insurance-Policy.pdf,387,{'Policy_Name': 'Principal-Sample-Life-Insuran...
9,Page 10,T he legally recognized union of two eligible ...,Principal-Sample-Life-Insurance-Policy.pdf,251,{'Policy_Name': 'Principal-Sample-Life-Insuran...
10,Page 11,(2) has been placed with the Member or spouse ...,Principal-Sample-Life-Insurance-Policy.pdf,299,{'Policy_Name': 'Principal-Sample-Life-Insuran...
11,Page 12,An institution that is licensed as a Hospital ...,Principal-Sample-Life-Insurance-Policy.pdf,352,{'Policy_Name': 'Principal-Sample-Life-Insuran...


In [24]:
insurance_pdfs_data['Metadata'][0]

{'Policy_Name': 'Principal-Sample-Life-Insurance-Policy', 'Page_No.': 'Page 1'}

In [25]:
max(insurance_pdfs_data['Text_Length'])

462

This concludes the chunking aspect also, as we can see that mostly the pages contain few hundred words, maximum going upto 500. So, we don't need to chunk the documents further; we can perform the embeddings on individual pages. This strategy makes sense for 2 reasons:
1. The way insurance documents are generally structured, we will not have a lot of extraneous information in a page, and all the text pieces in that page will likely be interrelated.
2. We want to have larger chunk sizes to be able to pass appropriate context to the LLM during the generation layer.

## 2. Search Layer

### 2.1 Generate and Store Embeddings using OpenAI and ChromaDB

In this section, we will embed the pages in the dataframe through OpenAI's `text-embedding-ada-002` model, and store them in a ChromaDB collection.

In [29]:
# read the API key
from config import open_api_key
openai.api_key=open_api_key

In [30]:
# Import the OpenAI Embedding Function into chroma
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

In [31]:
# Define the path where chroma collections will be stored

chroma_data_path = "C:/Users/hp/Downloads/ChromaDB_Data"

In [32]:
# Call PersistentClient()

client = chromadb.PersistentClient(path=chroma_data_path)

In [33]:
# Set up the embedding function using the OpenAI embedding model

model = "text-embedding-ada-002"
embedding_function = OpenAIEmbeddingFunction(api_key=openai.api_key, model_name=model)

In [34]:
# Initialise a collection in chroma and pass the embedding_function to it so that it used OpenAI embeddings to embed the documents

insurance_collection = client.get_or_create_collection(name='RAG_on_Insurance', embedding_function=embedding_function)

In [35]:
# Convert the page text and metadata from your dataframe to lists to be able to pass it to chroma

documents_list = insurance_pdfs_data["Page_Text"].tolist()
metadata_list = insurance_pdfs_data['Metadata'].tolist()

In [36]:
# Add the documents and metadata to the collection alongwith generic integer IDs. You can also feed the metadata information as IDs by combining the policy name and page no.

insurance_collection.add(
    documents= documents_list,
    ids = [str(i) for i in range(0, len(documents_list))],
    metadatas = metadata_list
)

In [37]:
print(insurance_collection.count())

60


In [38]:
# Let's take a look at the first few entries in the collection

insurance_collection.get(
    ids = ['0','1','2'],
    include = ['embeddings', 'documents', 'metadatas']
)

{'ids': ['0', '1', '2'],
 'embeddings': array([[-2.24625990e-02,  1.87449213e-02, -2.73151454e-02, ...,
         -3.69158834e-02,  2.89750565e-03, -1.26286899e-03],
        [-1.32036684e-02,  8.89394712e-03, -4.63569537e-03, ...,
         -1.57016590e-02, -4.11756810e-05,  7.26064527e-03],
        [-1.24170724e-02,  1.34976832e-02, -2.81754183e-03, ...,
         -2.98062786e-02, -1.01497807e-02,  9.69234388e-03]],
       shape=(3, 1536)),
 'documents': ['DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/01/2014 711 HIGH STREET GEORGE RI 02903 GROUP POLICY FOR: RHODE ISLAND JOHN DOE ALL MEMBERS Group Member Life Insurance Print Date: 07/16/2014',
  'POLICY RIDER GROUP INSURANCE POLICY NO: S655 COVERAGE: Life EMPLOYER: RHODE ISLAND JOHN DOE Effective on the later of the Date of Issue of this Group Policy or March 1, 2005, the following will apply to your Policy: From time to time The Principal may offer or provide certain employer groups who apply for coverage with The Principal a Financial 

### 2.2 Implementing the Cache mechanism

In [227]:
cache_collection = client.get_or_create_collection(name='Insurance_Cache', embedding_function=embedding_function)

In [229]:
cache_collection.peek()

{'ids': ['what are the conditions for premium rate changes?',
  'what are the premium rates for the members insured?',
  'What is the name of the policyholder and when was this policy issued?'],
 'embeddings': array([[-0.00315954, -0.0032998 , -0.00205001, ...,  0.00330879,
          0.00689093, -0.0048553 ],
        [-0.01630243, -0.00665511,  0.01582092, ..., -0.02362133,
         -0.0023869 , -0.01185194],
        [-0.00161785, -0.00643846,  0.01907839, ..., -0.00274853,
         -0.02089538, -0.00258065]], shape=(3, 1536)),
 'documents': ['what are the conditions for premium rate changes?',
  'what are the premium rates for the members insured?',
  'What is the name of the policyholder and when was this policy issued?'],
 'uris': None,
 'included': ['metadatas', 'documents', 'embeddings'],
 'data': None,
 'metadatas': [{'documents0': "b . on any date the definition of Member or Dependent is changed; and c. on any date the Policyholder's business, as specified on the Policyholder ap

### 2.3 Semantic Search with Cache

In this section, we will perform a semantic search of a query in the collections embeddings to get several top semantically similar results.

In [231]:
# Read the user query

query = input()

 what are the premium rates for the members insured?


In [233]:
# Searh the Cache collection first
# Query the collection against the user query 

cache_results = cache_collection.query(
    query_texts=query,
    n_results=1
)

In [235]:
cache_results

{'ids': [['what are the premium rates for the members insured?']],
 'embeddings': None,
 'documents': [['what are the premium rates for the members insured?']],
 'uris': None,
 'included': ['metadatas', 'documents', 'distances'],
 'data': None,
 'metadatas': [[{'distances8': '0.3694944381713867',
    'documents6': "(6) If, on the date a Member becomes eligible for any increased or additional Scheduled Benefit amount, fewer than five Members are insured. (7) To make effective any Scheduled Benefit amounts for the Member that are, initially or through later increases, in excess of: - $10,000 for Members who are under age 65; and - $10,000 for Members who are age 65 or over but under age 70; and - *$10,000 for Members who are age 70 or over. *If a Member is insured under this Group Policy on its Date of Issue and this insurance replaces insurance in force on the day immediately before the Date of Issue: the lesser of the amount shown above or the amount for which the Member was insured un

In [237]:
# Implementing Cache in Semantic Search

# Set a threshold for cache search
threshold = 0.2

ids = []
documents = []
distances = []
metadatas = []
results_df = pd.DataFrame()


# If the distance is greater than the threshold, then return the results from the main collection.

if cache_results['distances'][0] == [] or cache_results['distances'][0][0] > threshold:
      # Query the collection against the user query and return the top 10 results
      results = insurance_collection.query(
      query_texts=query,
      n_results=10
      )

      # Store the query in cache_collection as document w.r.t to ChromaDB so that it can be embedded and searched against later
      # Store retrieved text, ids, distances and metadatas in cache_collection as metadatas, so that they can be fetched easily if a query indeed matches to a query in cache
      Keys = []
      Values = []

      for key, val in results.items():
        if val is None:
          continue
        if key != 'included':    
          for i in range(10):
            Keys.append(str(key)+str(i))
            Values.append(str(val[0][i]))

    

      cache_collection.add(
          documents= [query],
          ids = [query],
          metadatas = dict(zip(Keys, Values))
      )

      print("Not found in cache. Found in main collection.")

      result_dict = {'Metadatas': results['metadatas'][0], 'Documents': results['documents'][0], 'Distances': results['distances'][0], "IDs":results["ids"][0]}
      results_df = pd.DataFrame.from_dict(result_dict)
      results_df


# If the distance is, however, less than the threshold, you can return the results from cache

elif cache_results['distances'][0][0] <= threshold:
      cache_result_dict = cache_results['metadatas'][0][0]

      # Loop through each inner list and then through the dictionary
      for key, value in cache_result_dict.items():
          if 'ids' in key:
              ids.append(value)
          elif 'documents' in key:
              documents.append(value)
          elif 'distances' in key:
              distances.append(value)
          elif 'metadatas' in key:
              metadatas.append(value)

      print("Found in cache!")

      # Create a DataFrame
      results_df = pd.DataFrame({
        'IDs': ids,
        'Documents': documents,
        'Distances': distances,
        'Metadatas': metadatas
      })


Found in cache!


In [239]:
results_df

Unnamed: 0,IDs,Documents,Distances,Metadatas
0,19,"(6) If, on the date a Member becomes eligible for any increased or additional Scheduled Benefit amount, fewer than five Members are insured. (7) To make effective any Scheduled Benefit amounts for the Member that are, initially or through later increases, in excess of: - $10,000 for Members who are under age 65; and - $10,000 for Members who are age 65 or over but under age 70; and - *$10,000 for Members who are age 70 or over. *If a Member is insured under this Group Policy on its Date of Issue and this insurance replaces insurance in force on the day immediately before the Date of Issue: the lesser of the amount shown above or the amount for which the Member was insured under the replaced insurance. f. Effective Date for Benefit Changes Due to Change in Insurance Class (1) A change in the Member's Scheduled Benefit amount because of a change in the Member's insurance class for which Proof of Good Health is not required (see e. above) will normally be effective on the date of chan...",0.3694944381713867,"{'Page_No.': 'Page 53', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}"
1,27,"PART IV - BENEFITS Section A - Member Life Insurance Article 1 - Schedule of Insurance Subject to the Effective Date provisions of PART III, Section B, and the qualifying provisions of this Section A, the Scheduled Benefit for an insured Member will be based on his or her class: Class *Scheduled Benefit ALL MEMBERS $10,000 However, if a Member has received any payments under the Accelerated Benefits provision as described in Section A, Article 7, the Scheduled Benefit will be reduced by the amount of such payment. *The Scheduled Benefit is subject to the Proof of Good Health requirements as shown in PART III, Section B, Article 1. Because of the Proof of Good Health requirements, the amount of insurance approved by The Principal may be different than the Scheduled Benefit. If the approved amount of insurance is different than the Scheduled Benefit, the approved amount will apply. For the age(s) shown below, the amount of a Member's insurance will be the percentage of the Scheduled ...",0.3672015964984894,"{'Page_No.': 'Page 35', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}"
2,50,"Section B - Member Accidental Death and Dismemberment Insurance Article 1 - Schedule of Insurance Subject to the Effective Date provisions of PART III, Section B, and the qualifying provisions of this Section B, the Scheduled Benefit for an insured Member will be based on his or her class: Class *Scheduled Benefit ALL MEMBERS $10,000 *The Scheduled Benefit is subject to the Proof of Good Health requirements as shown in PART III, Section B, Article 1. Because of the Proof of Good Health requirements, the amount of insurance approved by The Principal may be different than the Scheduled Benefit. If the approved amount of insurance is different than the Scheduled Benefit, the approved amount will apply. For the age(s) shown below, the amount of a Member's insurance will be the percentage of the Scheduled Benefit (or approved amount, if applicable) as shown below. Age % of Scheduled Benefit (or approved amount, whichever applies) Age 70 but less than age 75 65% Age 75 and over 45% Artic...",0.3395887315273285,"{'Page_No.': 'Page 22', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}"
3,14,"T he Principal may terminate the Policyholder's coverage on any premium due date if the Policyholder relocates to a state where this Group Policy is not marketed, by giving the Policyholder 31 days advanced notice in Writing. Article 4 - Policyholder Responsibility to Members If this Group Policy terminates for any reason, the Policyholder must: a. notify each Member of the effective date of the termination; and b. refund or otherwise account to each Member all contributions received or withheld from Members for premiums not actually paid to The Principal. This policy has been updated effective January 1, 2014 PART II - POLICY ADMINISTRATION GC 6005 Section C - Policy Termination, Page 2",0.2943752706050873,"{'Page_No.': 'Page 46', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}"
4,43,"a. be actively engaged in business for profit within the meaning of the Internal Revenue Code, or be established as a legitimate nonprofit corporation within the meaning of the Internal Revenue Code; and b. make at least the level of premium contributions required for insurance on its eligible Members. The Policyholder must: (1) contribute at least 50% of the required premium for all Members (including disabled Members, if any); and c. if the Member is to contribute part of the premium, maintain the following participation percentages with respect to eligible employees and Dependents, excluding those for whom Proof of Good Health is not satisfactory to The Principal: (1) Employees: - at least 75% of all eligible employees must enroll; (2) Dependents: - maintain a Dependent participation of at least 75% of eligible Dependents; and d. if the Member is to contribute no part of the premium, 100% of eligible employees and Dependents must enroll. Article 4 - Policy Incontestability In th...",0.3701924681663513,"{'Policy_Name': 'Principal-Sample-Life-Insurance-Policy', 'Page_No.': 'Page 17'}"
5,32,"Section B - Premiums Article 1 - Payment Responsibility; Due Dates; Grace Period The Policyholder is responsible for collection and payment of all premiums due while this Group Policy is in force. Payments must be sent to the home office of The Principal in Des Moines, Iowa. The first premium is due on the Date of Issue of this Group Policy. Each premium thereafter will be due on the first of each Insurance Month. Except for the first premium, a Grace Period of 31 days will be allowed for payment of premium. ""Grace Period"" means the first 31-day period following a premium due date. The Group Policy will remain in force until the end of the Grace Period, unless the Group Policy has been terminated by notice as described in PART II, Section C. The Policyholder will be liable for payment of the premium for the time this Group Policy remains in force during the Grace Period. Article 2 - Premium Rates The premium rate(s) for each Member insured for Life Insurance will be: a. Member Life...",0.3244076371192932,"{'Page_No.': 'Page 30', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}"
6,18,"b . on any date the definition of Member or Dependent is changed; and c. on any date the Policyholder's business, as specified on the Policyholder application, is changed; and d. on any date that a schedule of insurance or class of insured Members is changed; and e. on any premium due date, if the Policyholder has been receiving a multiple policy discount rate and the Policyholder drops below the minimum number of coverages to receive such discount rate; and f. on any date the premium contribution required of Members is changed; and g. with respect to Member Life Insurance, on any Policy Anniversary, if the average age, average Scheduled Benefit amount, or the male/female distribution for then insured Members has changed since the last Policy Anniversary; and h. on any Policy Anniversary, if the volume of insurance for then insured Members has increased or decreased by more than 25% since the last Policy Anniversary. If the Policyholder has other group insurance with The Principal,...",0.3016231656074524,"{'Page_No.': 'Page 21', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}"
7,21,"Section C - Individual Terminations Article 1 - Member Life Insurance A Member's insurance under this Group Policy will terminate on the earliest of: a. the date this Group Policy is terminated; or b. the date the last premium is paid for the Member's insurance; or c. any date desired, if requested by the Member before that date; or d. the date the Member ceases to be a Member as defined in PART I; or e. the date the Member ceases to be in a class for which Member Life Insurance is provided; or f. the date the Member retires; or g. the date the Member ceases Active Work. Article 2 - Member Accidental Death and Dismemberment Insurance A Member's Accidental Death and Dismemberment Insurance under this Group Policy will terminate on the earliest of: a. the date his or her Member Life Insurance ceases; or b. the date Member Accidental Death and Dismemberment Insurance is removed from this Group Policy; or c. the date the last premium is paid for the Member's Accidental Death and Dismem...",0.2897762358188629,"{'Page_No.': 'Page 52', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}"
8,17,"The number of Members insured for Dependent Life Insurance will be multiplied by the premium rate then in effect. To ensure accurate premium calculations, the Policyholder is responsible for reporting to The Principal, the following information during the stated time periods: a. Members who are eligible to become insured are to be reported during the month prior to or during the month that coverage becomes effective. b. Members whose coverage has terminated are to be reported within a month of the date coverage terminated. c. Changes in Member insurance class are to be reported within a month of the date that the change in insurance class took place. If a Member is added or a present Member's insurance is increased or terminated on other than the first of an Insurance Month, premium for that Member will be adjusted and applied as if the change were to take place on the first of the next following Insurance Month. Article 5 - Contributions from Members Members are not required to co...",0.3475748300552368,"{'Page_No.': 'Page 20', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}"
9,49,"(1) only one Accelerated Benefit payment will be made during the Member's lifetime; and (2) the amount requested must be at least $5,000; and (3) in no event will payment exceed the lesser of: - 75% of the Member Life Insurance benefit in force on the date of the request; or - $250,000. The Accelerated Benefit payment will be made in a lump sum. d. Effect on Member Life Insurance Benefits If an Accelerated Benefit is paid, the Member Life Insurance Benefit otherwise payable upon the Member's death will be reduced by any Accelerated Benefit payment. e. Premium Waiver Period A premium waiver period will be established on the date The Principal pays an Accelerated Benefit to a Member. This period will end on the earlier of the Member's death or the date two years after the date of the Accelerated Benefit. During a premium waiver period: (1) there will be no Member Life and Member Accidental Death and Dismemberment Insurance and Dependent Life Insurance premium charge for the Member; a...",0.3690225481986999,"{'Policy_Name': 'Principal-Sample-Life-Insurance-Policy', 'Page_No.': 'Page 24'}"


### 2.4 ReRanking Block

In [241]:
# Import the CrossEncoder library from sentence_transformers
from sentence_transformers import CrossEncoder, util

In [243]:
# Initialise the cross encoder model

cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-12-v2')

In [245]:
scores = cross_encoder.predict([['Does the insurance cover diabetic patients?', 'The insurance policy covers some pre-existing conditions including diabetes, heart diseases, etc. The policy does not howev'],
                                ['Does the insurance cover diabetic patients?', 'The premium rates for various age groups are given as follows. Age group (<18 years): Premium rate']])

In [247]:
scores

array([  4.460841, -11.19713 ], dtype=float32)

In [249]:
# Input (query, response) pairs for each of the top 10 responses received from the semantic search to the cross encoder
# Generate the cross_encoder scores for these pairs

cross_inputs = [[query, response] for response in results_df['Documents']]
cross_rerank_scores = cross_encoder.predict(cross_inputs)

In [251]:
cross_rerank_scores

array([ 3.3479323 , -1.3688796 , -2.0769591 , -7.3283377 ,  0.38655895,
        2.8565874 ,  1.6918321 , -3.7101178 ,  2.3693707 ,  0.5782056 ],
      dtype=float32)

In [253]:
# Store the rerank_scores in results_df

results_df['Reranked_scores'] = cross_rerank_scores

In [255]:
results_df

Unnamed: 0,IDs,Documents,Distances,Metadatas,Reranked_scores
0,19,"(6) If, on the date a Member becomes eligible for any increased or additional Scheduled Benefit amount, fewer than five Members are insured. (7) To make effective any Scheduled Benefit amounts for the Member that are, initially or through later increases, in excess of: - $10,000 for Members who are under age 65; and - $10,000 for Members who are age 65 or over but under age 70; and - *$10,000 for Members who are age 70 or over. *If a Member is insured under this Group Policy on its Date of Issue and this insurance replaces insurance in force on the day immediately before the Date of Issue: the lesser of the amount shown above or the amount for which the Member was insured under the replaced insurance. f. Effective Date for Benefit Changes Due to Change in Insurance Class (1) A change in the Member's Scheduled Benefit amount because of a change in the Member's insurance class for which Proof of Good Health is not required (see e. above) will normally be effective on the date of chan...",0.3694944381713867,"{'Page_No.': 'Page 53', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}",3.347932
1,27,"PART IV - BENEFITS Section A - Member Life Insurance Article 1 - Schedule of Insurance Subject to the Effective Date provisions of PART III, Section B, and the qualifying provisions of this Section A, the Scheduled Benefit for an insured Member will be based on his or her class: Class *Scheduled Benefit ALL MEMBERS $10,000 However, if a Member has received any payments under the Accelerated Benefits provision as described in Section A, Article 7, the Scheduled Benefit will be reduced by the amount of such payment. *The Scheduled Benefit is subject to the Proof of Good Health requirements as shown in PART III, Section B, Article 1. Because of the Proof of Good Health requirements, the amount of insurance approved by The Principal may be different than the Scheduled Benefit. If the approved amount of insurance is different than the Scheduled Benefit, the approved amount will apply. For the age(s) shown below, the amount of a Member's insurance will be the percentage of the Scheduled ...",0.3672015964984894,"{'Page_No.': 'Page 35', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}",-1.36888
2,50,"Section B - Member Accidental Death and Dismemberment Insurance Article 1 - Schedule of Insurance Subject to the Effective Date provisions of PART III, Section B, and the qualifying provisions of this Section B, the Scheduled Benefit for an insured Member will be based on his or her class: Class *Scheduled Benefit ALL MEMBERS $10,000 *The Scheduled Benefit is subject to the Proof of Good Health requirements as shown in PART III, Section B, Article 1. Because of the Proof of Good Health requirements, the amount of insurance approved by The Principal may be different than the Scheduled Benefit. If the approved amount of insurance is different than the Scheduled Benefit, the approved amount will apply. For the age(s) shown below, the amount of a Member's insurance will be the percentage of the Scheduled Benefit (or approved amount, if applicable) as shown below. Age % of Scheduled Benefit (or approved amount, whichever applies) Age 70 but less than age 75 65% Age 75 and over 45% Artic...",0.3395887315273285,"{'Page_No.': 'Page 22', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}",-2.076959
3,14,"T he Principal may terminate the Policyholder's coverage on any premium due date if the Policyholder relocates to a state where this Group Policy is not marketed, by giving the Policyholder 31 days advanced notice in Writing. Article 4 - Policyholder Responsibility to Members If this Group Policy terminates for any reason, the Policyholder must: a. notify each Member of the effective date of the termination; and b. refund or otherwise account to each Member all contributions received or withheld from Members for premiums not actually paid to The Principal. This policy has been updated effective January 1, 2014 PART II - POLICY ADMINISTRATION GC 6005 Section C - Policy Termination, Page 2",0.2943752706050873,"{'Page_No.': 'Page 46', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}",-7.328338
4,43,"a. be actively engaged in business for profit within the meaning of the Internal Revenue Code, or be established as a legitimate nonprofit corporation within the meaning of the Internal Revenue Code; and b. make at least the level of premium contributions required for insurance on its eligible Members. The Policyholder must: (1) contribute at least 50% of the required premium for all Members (including disabled Members, if any); and c. if the Member is to contribute part of the premium, maintain the following participation percentages with respect to eligible employees and Dependents, excluding those for whom Proof of Good Health is not satisfactory to The Principal: (1) Employees: - at least 75% of all eligible employees must enroll; (2) Dependents: - maintain a Dependent participation of at least 75% of eligible Dependents; and d. if the Member is to contribute no part of the premium, 100% of eligible employees and Dependents must enroll. Article 4 - Policy Incontestability In th...",0.3701924681663513,"{'Policy_Name': 'Principal-Sample-Life-Insurance-Policy', 'Page_No.': 'Page 17'}",0.386559
5,32,"Section B - Premiums Article 1 - Payment Responsibility; Due Dates; Grace Period The Policyholder is responsible for collection and payment of all premiums due while this Group Policy is in force. Payments must be sent to the home office of The Principal in Des Moines, Iowa. The first premium is due on the Date of Issue of this Group Policy. Each premium thereafter will be due on the first of each Insurance Month. Except for the first premium, a Grace Period of 31 days will be allowed for payment of premium. ""Grace Period"" means the first 31-day period following a premium due date. The Group Policy will remain in force until the end of the Grace Period, unless the Group Policy has been terminated by notice as described in PART II, Section C. The Policyholder will be liable for payment of the premium for the time this Group Policy remains in force during the Grace Period. Article 2 - Premium Rates The premium rate(s) for each Member insured for Life Insurance will be: a. Member Life...",0.3244076371192932,"{'Page_No.': 'Page 30', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}",2.856587
6,18,"b . on any date the definition of Member or Dependent is changed; and c. on any date the Policyholder's business, as specified on the Policyholder application, is changed; and d. on any date that a schedule of insurance or class of insured Members is changed; and e. on any premium due date, if the Policyholder has been receiving a multiple policy discount rate and the Policyholder drops below the minimum number of coverages to receive such discount rate; and f. on any date the premium contribution required of Members is changed; and g. with respect to Member Life Insurance, on any Policy Anniversary, if the average age, average Scheduled Benefit amount, or the male/female distribution for then insured Members has changed since the last Policy Anniversary; and h. on any Policy Anniversary, if the volume of insurance for then insured Members has increased or decreased by more than 25% since the last Policy Anniversary. If the Policyholder has other group insurance with The Principal,...",0.3016231656074524,"{'Page_No.': 'Page 21', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}",1.691832
7,21,"Section C - Individual Terminations Article 1 - Member Life Insurance A Member's insurance under this Group Policy will terminate on the earliest of: a. the date this Group Policy is terminated; or b. the date the last premium is paid for the Member's insurance; or c. any date desired, if requested by the Member before that date; or d. the date the Member ceases to be a Member as defined in PART I; or e. the date the Member ceases to be in a class for which Member Life Insurance is provided; or f. the date the Member retires; or g. the date the Member ceases Active Work. Article 2 - Member Accidental Death and Dismemberment Insurance A Member's Accidental Death and Dismemberment Insurance under this Group Policy will terminate on the earliest of: a. the date his or her Member Life Insurance ceases; or b. the date Member Accidental Death and Dismemberment Insurance is removed from this Group Policy; or c. the date the last premium is paid for the Member's Accidental Death and Dismem...",0.2897762358188629,"{'Page_No.': 'Page 52', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}",-3.710118
8,17,"The number of Members insured for Dependent Life Insurance will be multiplied by the premium rate then in effect. To ensure accurate premium calculations, the Policyholder is responsible for reporting to The Principal, the following information during the stated time periods: a. Members who are eligible to become insured are to be reported during the month prior to or during the month that coverage becomes effective. b. Members whose coverage has terminated are to be reported within a month of the date coverage terminated. c. Changes in Member insurance class are to be reported within a month of the date that the change in insurance class took place. If a Member is added or a present Member's insurance is increased or terminated on other than the first of an Insurance Month, premium for that Member will be adjusted and applied as if the change were to take place on the first of the next following Insurance Month. Article 5 - Contributions from Members Members are not required to co...",0.3475748300552368,"{'Page_No.': 'Page 20', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}",2.369371
9,49,"(1) only one Accelerated Benefit payment will be made during the Member's lifetime; and (2) the amount requested must be at least $5,000; and (3) in no event will payment exceed the lesser of: - 75% of the Member Life Insurance benefit in force on the date of the request; or - $250,000. The Accelerated Benefit payment will be made in a lump sum. d. Effect on Member Life Insurance Benefits If an Accelerated Benefit is paid, the Member Life Insurance Benefit otherwise payable upon the Member's death will be reduced by any Accelerated Benefit payment. e. Premium Waiver Period A premium waiver period will be established on the date The Principal pays an Accelerated Benefit to a Member. This period will end on the earlier of the Member's death or the date two years after the date of the Accelerated Benefit. During a premium waiver period: (1) there will be no Member Life and Member Accidental Death and Dismemberment Insurance and Dependent Life Insurance premium charge for the Member; a...",0.3690225481986999,"{'Policy_Name': 'Principal-Sample-Life-Insurance-Policy', 'Page_No.': 'Page 24'}",0.578206


In [211]:
# Return the top 3 results from semantic search

top_3_semantic = results_df.sort_values(by='Distances')
top_3_semantic[:3]

Unnamed: 0,IDs,Documents,Distances,Metadatas,Reranked_scores
0,18,"a. be actively engaged in business for profit within the meaning of the Internal Revenue Code, or be established as a legitimate nonprofit corporation within the meaning of the Internal Revenue Code; and b. make at least the level of premium contributions required for insurance on its eligible Members. The Policyholder must: (1) contribute at least 50% of the required premium for all Members (including disabled Members, if any); and c. if the Member is to contribute part of the premium, maintain the following participation percentages with respect to eligible employees and Dependents, excluding those for whom Proof of Good Health is not satisfactory to The Principal: (1) Employees: - at least 75% of all eligible employees must enroll; (2) Dependents: - maintain a Dependent participation of at least 75% of eligible Dependents; and d. if the Member is to contribute no part of the premium, 100% of eligible employees and Dependents must enroll. Article 4 - Policy Incontestability In th...",0.3210233449935913,"{'Page_No.': 'Page 23', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}",-9.70567
8,14,"(6) If, on the date a Member becomes eligible for any increased or additional Scheduled Benefit amount, fewer than five Members are insured. (7) To make effective any Scheduled Benefit amounts for the Member that are, initially or through later increases, in excess of: - $10,000 for Members who are under age 65; and - $10,000 for Members who are age 65 or over but under age 70; and - *$10,000 for Members who are age 70 or over. *If a Member is insured under this Group Policy on its Date of Issue and this insurance replaces insurance in force on the day immediately before the Date of Issue: the lesser of the amount shown above or the amount for which the Member was insured under the replaced insurance. f. Effective Date for Benefit Changes Due to Change in Insurance Class (1) A change in the Member's Scheduled Benefit amount because of a change in the Member's insurance class for which Proof of Good Health is not required (see e. above) will normally be effective on the date of chan...",0.3658180832862854,"{'Page_No.': 'Page 17', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}",-6.970013
7,3,"TABLE OF CONTENTS PART I - DEFINITIONS PART II - POLICY ADMINISTRATION Section A – Contract Entire Contract Article 1 Policy Changes Article 2 Policyholder Eligibility Requirements Article 3 Policy Incontestability Article 4 Individual Incontestability Article 5 Information to be Furnished Article 6 Certificates Article 7 Assignments Article 8 Dependent Rights Article 9 Policy Interpretation Article 10 Electronic Transactions Article 11 Section B – Premium Payment Responsibility; Due Dates; Grace Period Article 1 Premium Rates Article 2 Premium Rate Changes Article 3 Premium Amount Article 4 Contributions from Members Article 5 Section C - Policy Termination Failure to Pay Premium Article 1 Termination Rights of the Policyholder Article 2 Termination Rights of The Principal Article 3 Policyholder Responsibility to Members Article 4 Section D - Policy Renewal Renewal Article 1 PART III - INDIVIDUAL REQUIREMENTS AND RIGHTS This policy has been updated effective January 1, 2014 GC 600...",0.3705097436904907,"{'Page_No.': 'Page 31', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}",-2.53408


In [257]:
# Return the top 3 results after reranking
pd.set_option("display.max_colwidth", None)
top_3_rerank = results_df.sort_values(by='Reranked_scores', ascending=False)
top_3_rerank[:3]

Unnamed: 0,IDs,Documents,Distances,Metadatas,Reranked_scores
0,19,"(6) If, on the date a Member becomes eligible for any increased or additional Scheduled Benefit amount, fewer than five Members are insured. (7) To make effective any Scheduled Benefit amounts for the Member that are, initially or through later increases, in excess of: - $10,000 for Members who are under age 65; and - $10,000 for Members who are age 65 or over but under age 70; and - *$10,000 for Members who are age 70 or over. *If a Member is insured under this Group Policy on its Date of Issue and this insurance replaces insurance in force on the day immediately before the Date of Issue: the lesser of the amount shown above or the amount for which the Member was insured under the replaced insurance. f. Effective Date for Benefit Changes Due to Change in Insurance Class (1) A change in the Member's Scheduled Benefit amount because of a change in the Member's insurance class for which Proof of Good Health is not required (see e. above) will normally be effective on the date of change. However, if the Member is not Actively at Work on the date a Scheduled Benefit change would otherwise be effective, the Scheduled Benefit change will not be in force until the date the Member returns to Active Work. Any decrease in Scheduled Benefit amounts due to a change in a Member's insurance class will be effective on the date of the change, whether or not the Member is Actively at Work. Any termination of Scheduled Benefit amounts due to a change in the Member's insurance class will be effective on the date of the change, whether or not the Member is Actively at Work. (2) A change in a Member's Scheduled Benefit amount because of a change in the Member's insurance class for which Proof of Good Health is required (see e. above) will be effective on the later of: - the date the change would have been effective if Proof of Good Health had not been required; or - the date Proof of Good Health is approved by The Principal. g. Effective Date for Benefit Changes Due to Change by Policy Amendment (1) A change in the Member's Scheduled Benefit amount because of a change in the Schedule of Insurance (as described in PART IV, Section A) by amendment to this Group Policy for which Proof of Good Health is not required (see e. above) will be effective on the date of change. However, if the Member is not Actively at Work on the date an increase in the Scheduled Benefit would otherwise be effective, the This policy has been updated effective January 1, 2014 PART III - INDIVIDUAL REQUIREMENTS AND RIGHTS GC 6007 Section B - Effective Dates, Page 3",0.3694944381713867,"{'Page_No.': 'Page 53', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}",3.347932
5,32,"Section B - Premiums Article 1 - Payment Responsibility; Due Dates; Grace Period The Policyholder is responsible for collection and payment of all premiums due while this Group Policy is in force. Payments must be sent to the home office of The Principal in Des Moines, Iowa. The first premium is due on the Date of Issue of this Group Policy. Each premium thereafter will be due on the first of each Insurance Month. Except for the first premium, a Grace Period of 31 days will be allowed for payment of premium. ""Grace Period"" means the first 31-day period following a premium due date. The Group Policy will remain in force until the end of the Grace Period, unless the Group Policy has been terminated by notice as described in PART II, Section C. The Policyholder will be liable for payment of the premium for the time this Group Policy remains in force during the Grace Period. Article 2 - Premium Rates The premium rate(s) for each Member insured for Life Insurance will be: a. Member Life Insurance $0.210 for each $1,000 of insurance in force. b. Member Accidental Death and Dismemberment Insurance $0.025 for each $1,000 of Member Life Insurance in force. c. Dependent Life Insurance $1.46 for each Member insured for Dependent Life Insurance. If the Policyholder has at least two other eligible group insurance policies underwritten by The Principal, as determined by The Principal, the Policyholder may be eligible for a multiple policy discount. Article 3 - Premium Rate Changes The Principal may change a premium rate: a. on any premium due date, if the initial premium rate has then been in force 24 months or more and if Written notice is given to the Policyholder at least 31 days before the date of change; or This policy has been updated effective January 1, 2014 PART II - POLICY ADMINISTRATION GC 6004 Section B - Premiums, Page 1",0.3244076371192932,"{'Page_No.': 'Page 30', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}",2.856587
8,17,"The number of Members insured for Dependent Life Insurance will be multiplied by the premium rate then in effect. To ensure accurate premium calculations, the Policyholder is responsible for reporting to The Principal, the following information during the stated time periods: a. Members who are eligible to become insured are to be reported during the month prior to or during the month that coverage becomes effective. b. Members whose coverage has terminated are to be reported within a month of the date coverage terminated. c. Changes in Member insurance class are to be reported within a month of the date that the change in insurance class took place. If a Member is added or a present Member's insurance is increased or terminated on other than the first of an Insurance Month, premium for that Member will be adjusted and applied as if the change were to take place on the first of the next following Insurance Month. Article 5 - Contributions from Members Members are not required to contribute a part of the premium for their Member insurance under this Group Policy. Members are required to contribute a part of the premium for their Dependent's insurance under this Group Policy. This policy has been updated effective January 1, 2014 PART II - POLICY ADMINISTRATION GC 6004 Section B - Premiums, Page 3",0.3475748300552368,"{'Page_No.': 'Page 20', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}",2.369371


In [265]:
pd.set_option('display.max_colwidth', None)
top_3_rerank[:3]

Unnamed: 0,IDs,Documents,Distances,Metadatas,Reranked_scores
0,19,"(6) If, on the date a Member becomes eligible for any increased or additional Scheduled Benefit amount, fewer than five Members are insured. (7) To make effective any Scheduled Benefit amounts for the Member that are, initially or through later increases, in excess of: - $10,000 for Members who are under age 65; and - $10,000 for Members who are age 65 or over but under age 70; and - *$10,000 for Members who are age 70 or over. *If a Member is insured under this Group Policy on its Date of Issue and this insurance replaces insurance in force on the day immediately before the Date of Issue: the lesser of the amount shown above or the amount for which the Member was insured under the replaced insurance. f. Effective Date for Benefit Changes Due to Change in Insurance Class (1) A change in the Member's Scheduled Benefit amount because of a change in the Member's insurance class for which Proof of Good Health is not required (see e. above) will normally be effective on the date of change. However, if the Member is not Actively at Work on the date a Scheduled Benefit change would otherwise be effective, the Scheduled Benefit change will not be in force until the date the Member returns to Active Work. Any decrease in Scheduled Benefit amounts due to a change in a Member's insurance class will be effective on the date of the change, whether or not the Member is Actively at Work. Any termination of Scheduled Benefit amounts due to a change in the Member's insurance class will be effective on the date of the change, whether or not the Member is Actively at Work. (2) A change in a Member's Scheduled Benefit amount because of a change in the Member's insurance class for which Proof of Good Health is required (see e. above) will be effective on the later of: - the date the change would have been effective if Proof of Good Health had not been required; or - the date Proof of Good Health is approved by The Principal. g. Effective Date for Benefit Changes Due to Change by Policy Amendment (1) A change in the Member's Scheduled Benefit amount because of a change in the Schedule of Insurance (as described in PART IV, Section A) by amendment to this Group Policy for which Proof of Good Health is not required (see e. above) will be effective on the date of change. However, if the Member is not Actively at Work on the date an increase in the Scheduled Benefit would otherwise be effective, the This policy has been updated effective January 1, 2014 PART III - INDIVIDUAL REQUIREMENTS AND RIGHTS GC 6007 Section B - Effective Dates, Page 3",0.3694944381713867,"{'Page_No.': 'Page 53', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}",3.347932
5,32,"Section B - Premiums Article 1 - Payment Responsibility; Due Dates; Grace Period The Policyholder is responsible for collection and payment of all premiums due while this Group Policy is in force. Payments must be sent to the home office of The Principal in Des Moines, Iowa. The first premium is due on the Date of Issue of this Group Policy. Each premium thereafter will be due on the first of each Insurance Month. Except for the first premium, a Grace Period of 31 days will be allowed for payment of premium. ""Grace Period"" means the first 31-day period following a premium due date. The Group Policy will remain in force until the end of the Grace Period, unless the Group Policy has been terminated by notice as described in PART II, Section C. The Policyholder will be liable for payment of the premium for the time this Group Policy remains in force during the Grace Period. Article 2 - Premium Rates The premium rate(s) for each Member insured for Life Insurance will be: a. Member Life Insurance $0.210 for each $1,000 of insurance in force. b. Member Accidental Death and Dismemberment Insurance $0.025 for each $1,000 of Member Life Insurance in force. c. Dependent Life Insurance $1.46 for each Member insured for Dependent Life Insurance. If the Policyholder has at least two other eligible group insurance policies underwritten by The Principal, as determined by The Principal, the Policyholder may be eligible for a multiple policy discount. Article 3 - Premium Rate Changes The Principal may change a premium rate: a. on any premium due date, if the initial premium rate has then been in force 24 months or more and if Written notice is given to the Policyholder at least 31 days before the date of change; or This policy has been updated effective January 1, 2014 PART II - POLICY ADMINISTRATION GC 6004 Section B - Premiums, Page 1",0.3244076371192932,"{'Page_No.': 'Page 30', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}",2.856587
8,17,"The number of Members insured for Dependent Life Insurance will be multiplied by the premium rate then in effect. To ensure accurate premium calculations, the Policyholder is responsible for reporting to The Principal, the following information during the stated time periods: a. Members who are eligible to become insured are to be reported during the month prior to or during the month that coverage becomes effective. b. Members whose coverage has terminated are to be reported within a month of the date coverage terminated. c. Changes in Member insurance class are to be reported within a month of the date that the change in insurance class took place. If a Member is added or a present Member's insurance is increased or terminated on other than the first of an Insurance Month, premium for that Member will be adjusted and applied as if the change were to take place on the first of the next following Insurance Month. Article 5 - Contributions from Members Members are not required to contribute a part of the premium for their Member insurance under this Group Policy. Members are required to contribute a part of the premium for their Dependent's insurance under this Group Policy. This policy has been updated effective January 1, 2014 PART II - POLICY ADMINISTRATION GC 6004 Section B - Premiums, Page 3",0.3475748300552368,"{'Page_No.': 'Page 20', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}",2.369371


In [267]:
top_3_RAG = top_3_rerank[["Documents", "Metadatas"]][:3]

In [269]:

top_3_RAG

Unnamed: 0,Documents,Metadatas
0,"(6) If, on the date a Member becomes eligible for any increased or additional Scheduled Benefit amount, fewer than five Members are insured. (7) To make effective any Scheduled Benefit amounts for the Member that are, initially or through later increases, in excess of: - $10,000 for Members who are under age 65; and - $10,000 for Members who are age 65 or over but under age 70; and - *$10,000 for Members who are age 70 or over. *If a Member is insured under this Group Policy on its Date of Issue and this insurance replaces insurance in force on the day immediately before the Date of Issue: the lesser of the amount shown above or the amount for which the Member was insured under the replaced insurance. f. Effective Date for Benefit Changes Due to Change in Insurance Class (1) A change in the Member's Scheduled Benefit amount because of a change in the Member's insurance class for which Proof of Good Health is not required (see e. above) will normally be effective on the date of change. However, if the Member is not Actively at Work on the date a Scheduled Benefit change would otherwise be effective, the Scheduled Benefit change will not be in force until the date the Member returns to Active Work. Any decrease in Scheduled Benefit amounts due to a change in a Member's insurance class will be effective on the date of the change, whether or not the Member is Actively at Work. Any termination of Scheduled Benefit amounts due to a change in the Member's insurance class will be effective on the date of the change, whether or not the Member is Actively at Work. (2) A change in a Member's Scheduled Benefit amount because of a change in the Member's insurance class for which Proof of Good Health is required (see e. above) will be effective on the later of: - the date the change would have been effective if Proof of Good Health had not been required; or - the date Proof of Good Health is approved by The Principal. g. Effective Date for Benefit Changes Due to Change by Policy Amendment (1) A change in the Member's Scheduled Benefit amount because of a change in the Schedule of Insurance (as described in PART IV, Section A) by amendment to this Group Policy for which Proof of Good Health is not required (see e. above) will be effective on the date of change. However, if the Member is not Actively at Work on the date an increase in the Scheduled Benefit would otherwise be effective, the This policy has been updated effective January 1, 2014 PART III - INDIVIDUAL REQUIREMENTS AND RIGHTS GC 6007 Section B - Effective Dates, Page 3","{'Page_No.': 'Page 53', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}"
5,"Section B - Premiums Article 1 - Payment Responsibility; Due Dates; Grace Period The Policyholder is responsible for collection and payment of all premiums due while this Group Policy is in force. Payments must be sent to the home office of The Principal in Des Moines, Iowa. The first premium is due on the Date of Issue of this Group Policy. Each premium thereafter will be due on the first of each Insurance Month. Except for the first premium, a Grace Period of 31 days will be allowed for payment of premium. ""Grace Period"" means the first 31-day period following a premium due date. The Group Policy will remain in force until the end of the Grace Period, unless the Group Policy has been terminated by notice as described in PART II, Section C. The Policyholder will be liable for payment of the premium for the time this Group Policy remains in force during the Grace Period. Article 2 - Premium Rates The premium rate(s) for each Member insured for Life Insurance will be: a. Member Life Insurance $0.210 for each $1,000 of insurance in force. b. Member Accidental Death and Dismemberment Insurance $0.025 for each $1,000 of Member Life Insurance in force. c. Dependent Life Insurance $1.46 for each Member insured for Dependent Life Insurance. If the Policyholder has at least two other eligible group insurance policies underwritten by The Principal, as determined by The Principal, the Policyholder may be eligible for a multiple policy discount. Article 3 - Premium Rate Changes The Principal may change a premium rate: a. on any premium due date, if the initial premium rate has then been in force 24 months or more and if Written notice is given to the Policyholder at least 31 days before the date of change; or This policy has been updated effective January 1, 2014 PART II - POLICY ADMINISTRATION GC 6004 Section B - Premiums, Page 1","{'Page_No.': 'Page 30', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}"
8,"The number of Members insured for Dependent Life Insurance will be multiplied by the premium rate then in effect. To ensure accurate premium calculations, the Policyholder is responsible for reporting to The Principal, the following information during the stated time periods: a. Members who are eligible to become insured are to be reported during the month prior to or during the month that coverage becomes effective. b. Members whose coverage has terminated are to be reported within a month of the date coverage terminated. c. Changes in Member insurance class are to be reported within a month of the date that the change in insurance class took place. If a Member is added or a present Member's insurance is increased or terminated on other than the first of an Insurance Month, premium for that Member will be adjusted and applied as if the change were to take place on the first of the next following Insurance Month. Article 5 - Contributions from Members Members are not required to contribute a part of the premium for their Member insurance under this Group Policy. Members are required to contribute a part of the premium for their Dependent's insurance under this Group Policy. This policy has been updated effective January 1, 2014 PART II - POLICY ADMINISTRATION GC 6004 Section B - Premiums, Page 3","{'Page_No.': 'Page 20', 'Policy_Name': 'Principal-Sample-Life-Insurance-Policy'}"


## 3. Generation Layer

### Retrieval Augmented Generation

Now that we have the final top search results, we can pass it to an GPT 3.5 along with the user query and a well-engineered prompt, to generate a direct answer to the query along with citations, rather than returning whole pages/chunks.

In [271]:
# Define the function to generate the response. Provide a comprehensive prompt that passes the user query and the top 3 results to the model

def generate_response(query, top_3_RAG):
    """
    Generate a response using GPT-3.5's ChatCompletion based on the user query and retrieved information.
    """
    messages = [
                {"role": "system", "content":  "You are a helpful assistant in the insurance domain who can effectively answer user queries about insurance policies and documents."},
                {"role": "user", "content": f"""You are a helpful assistant in the insurance domain who can effectively answer user queries about insurance policies and documents.
                                                You have a question asked by the user in '{query}' and you have some search results from a corpus of insurance documents in the dataframe '{top_3_RAG}'. These search results are essentially one page of an insurance document that may be relevant to the user query.

                                                The column 'documents' inside this dataframe contains the actual text from the policy document and the column 'metadata' contains the policy name and source page. The text inside the document may also contain tables in the format of a list of lists where each of the nested lists indicates a row.

                                                Use the documents in '{top_3_RAG}' to answer the query '{query}'. Frame an informative answer and also, use the dataframe to return the relevant policy names and page numbers as citations.

                                                Follow the guidelines below when performing the task.
                                                1. Try to provide relevant/accurate numbers if available.
                                                2. You don’t have to necessarily use all the information in the dataframe. Only choose information that is relevant.
                                                3. If the document text has tables with relevant information, please reformat the table and return the final information in a tabular in format.
                                                3. Use the Metadatas columns in the dataframe to retrieve and cite the policy name(s) and page numbers(s) as citation.
                                                4. If you can't provide the complete answer, please also provide any information that will help the user to search specific sections in the relevant cited documents.
                                                5. You are a customer facing assistant, so do not provide any information on internal workings, just answer the query directly.

                                                The generated response should answer the query directly addressing the user and avoiding additional information. If you think that the query is not relevant to the document, reply that the query is irrelevant. Provide the final response as a well-formatted and easily readable text along with the citation. Provide your complete response first with all information, and then provide the citations.
            **Citations:**  
            Document 1: Policy X, Page 5  
            Document 2: Policy Y, Page 12  
            Document 3: Policy Z, Page 7

          

        """},

                                                
              ]

    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages
    )

    return response.choices[0].message.content.split('\n')

In [273]:
# Generate the response

response = generate_response(query, top_3_RAG)

In [275]:
# Print the response

print("\n".join(response))

The premium rates for the members insured are as follows:

- Member Life Insurance: $0.210 for each $1,000 of insurance in force
- Member Accidental Death and Dismemberment Insurance: $0.025 for each $1,000 of Member Life Insurance in force
- Dependent Life Insurance: $1.46 for each Member insured for Dependent Life Insurance

To get more detailed information or verify the data, please refer to the relevant sections in the following policy document:
- Policy Name: Principal-Sample-Life-Insurance-Policy
- Page Number: Page 30

**Citations:**  
Document: Principal-Sample-Life-Insurance-Policy  
Page: 30
