## <font color = red> Introduction and System Design

The goal of this project is to build a robust generative search system that can effectively and accurately answer questions about a policy document. This approach is known as Retrieval-Augmented Generation (RAG).

The RAG pipeline will consist of the following three layers:

1. **Embedding Layer**- The first step in the pipeline is to build the vector store. This step involves ingesting the documents, processing them to create individual chunks and passing these to an embedding model to create individual vector representations of the text.

  Focus:
  - chunking strategy
  - choice of embedding model


2. **Search and Rank Layer**- The second layer in the pipeline is the search and rank layer, which will perform a semantic similarity search on the knowledge bank based on the query and retrieve the top results. The output of this layer is the top K closest documents or chunks for the query and their indices.

  Focus:
  - design 3 test queries
  - embed the queries and search the vector DB against each of them; implement a cache mechanism
  - implement a re-ranking block using a cross-encoder model of choice

3. **Generation Layer**- The last layer is the generation layer. It receives the top search results from the previous layer, together with the original user query, and passes them to the LLM inside a well-constructed prompt. These inputs allow the LLM to generate a coherent answer to the user query that is grounded in the relevant chunks stored in the knowledge base.

  Focus:
  - make the final prompt exhaustive and accurate, and include few-shot examples
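To make the three layers concrete, here is a minimal, self-contained sketch of the same flow. It is a toy under stated assumptions: a bag-of-words counter stands in for the embedding model, a Python list stands in for the vector store, and `build_prompt` only assembles the text that would be sent to the LLM. None of these helper names come from the project; the real pipeline below uses OpenAI embeddings, ChromaDB and a cross-encoder.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: bag-of-words counts (the notebook uses OpenAI embeddings)
    return Counter(word.strip(".,?!").lower() for word in text.split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_store(chunks):
    # Layer 1: embed every chunk once and keep (chunk, vector) pairs as a tiny "vector store"
    return [(chunk, embed(chunk)) for chunk in chunks]


def search(store, query, top_k=2):
    # Layer 2: rank the stored chunks by similarity to the query and keep the top_k
    query_vector = embed(query)
    ranked = sorted(store, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(query, retrieved):
    # Layer 3: assemble the text that would be sent to the LLM
    context = "\n".join(f"- {chunk}" for chunk in retrieved)
    return f"Answer using only these excerpts:\n{context}\n\nQuestion: {query}"


# Made-up example sentences, not taken from the policy document
store = build_store([
    "The premium is due on the first day of each month.",
    "A beneficiary may be changed by written request.",
    "Coverage ends when the member retires.",
])
question = "How can a beneficiary be changed?"
print(build_prompt(question, search(store, question, top_k=1)))
```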

In [1]:
# Install all the required libraries

!pip install -U -q pdfplumber tiktoken openai chromaDB sentence-transformers

In [2]:
# Import all the required Libraries

import pdfplumber
from pathlib import Path
import pandas as pd
from operator import itemgetter
import json
import tiktoken
import openai
import chromadb

In [3]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


## <font color = red> 1. Embedding Layer

We will be using [pdfplumber](https://pypi.org/project/pdfplumber/) to read and process the PDF files.

In this layer, the PDF document is processed, cleaned, and chunked for embedding. The choice of chunking strategy has a large impact on the quality of the retrieved results.

Another important aspect in the embedding layer is the choice of the embedding model. Chunks can be embedded using the OpenAI embedding model or any model from the SentenceTransformers library on HuggingFace.

In [4]:
# Function to check whether a word is present in a table or not for segregation of regular text and tables

def check_bboxes(word, table_bbox):
    # Check whether word is inside a table bbox.
    l = word['x0'], word['top'], word['x1'], word['bottom']
    r = table_bbox
    return l[0] > r[0] and l[1] > r[1] and l[2] < r[2] and l[3] < r[3]
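As a quick illustration of the containment test, the snippet below runs an equivalent copy of `check_bboxes` (only the variable names differ) on two hypothetical words. It assumes pdfplumber's convention that a table bbox is `(x0, top, x1, bottom)`; the coordinates are made up.

```python
def check_bboxes(word, table_bbox):
    # Equivalent to the notebook's function: True if the word lies strictly inside the table bbox
    x0, top, x1, bottom = word['x0'], word['top'], word['x1'], word['bottom']
    t_x0, t_top, t_x1, t_bottom = table_bbox
    return x0 > t_x0 and top > t_top and x1 < t_x1 and bottom < t_bottom


# Hypothetical coordinates in PDF points; 'top'/'bottom' are measured from the top of the page
table_bbox = (50, 100, 550, 300)  # (x0, top, x1, bottom)
inside = {'text': 'Premium', 'x0': 60, 'top': 120, 'x1': 110, 'bottom': 132}
outside = {'text': 'Article', 'x0': 60, 'top': 320, 'x1': 110, 'bottom': 332}

print(check_bboxes(inside, table_bbox), check_bboxes(outside, table_bbox))  # True False
```

Because the comparisons are strict, a word whose edge lies exactly on the table border is treated as outside the table.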

In [5]:
# Function to extract text from a PDF file by reading multiple pages in a document, extracting text from them using appropriate preprocessing, and storing them in a dataframe.

'''
 Structure of the function:
  1. Declare a variable p to store the iteration of the loop that will help us store page numbers alongside the text
  2. Declare an empty list 'full_text' to store the text of all pages
  3. Use pdfplumber to open the PDF and go through its pages one by one
  4. Find the tables and their locations on the page
  5. Extract the content of each table, along with its vertical position, into the variable 'tables'
  6. Extract the regular (non-table) words by calling check_bboxes() to check whether each word lies inside a table
  7. Use the cluster_objects utility to cluster non-table words and tables by vertical position so that they retain the same order as in the original PDF
  8. Declare an empty list 'lines' to store the page text
  9. If a cluster starts with a text element, join its words and append them to 'lines'; if it starts with a table element, append the table as JSON
  10. Append the page number and all lines to full_text, and increment 'p'
  11. When the function has iterated over all pages, return the 'full_text' list

'''

def extract_text_from_pdf(pdf_path):
    p = 0
    full_text = []


    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            page_no = f"Page {p+1}"

            tables = page.find_tables()
            table_bboxes = [i.bbox for i in tables]
            tables = [{'table': i.extract(), 'top': i.bbox[1]} for i in tables]
            non_table_words = [word for word in page.extract_words() if not any(
                [check_bboxes(word, table_bbox) for table_bbox in table_bboxes])]
            lines = []

            for cluster in pdfplumber.utils.cluster_objects(non_table_words + tables, itemgetter('top'), tolerance=5):

                if 'text' in cluster[0]:
                    try:
                        lines.append(' '.join([i['text'] for i in cluster]))
                    except KeyError:
                        pass

                elif 'table' in cluster[0]:
                    lines.append(json.dumps(cluster[0]['table']))


            full_text.append([page_no, " ".join(lines)])
            p +=1

    return full_text
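Step 7 is the subtle part: tables and loose words are merged back into reading order by their `top` coordinate. The sketch below is a simplified re-implementation of that idea, not pdfplumber's own code: the sorted distinct `top` values are grouped whenever a value is within `tolerance` of the previous one, and objects keep their original order inside a group. The words and the table are hypothetical.

```python
import json


def cluster_by_top(objects, tolerance=5):
    """Group objects into rows by their 'top' coordinate (simplified sketch of the idea behind cluster_objects)."""
    # 1. Walk the sorted distinct 'top' values; a value within `tolerance` of the previous one joins its group
    groups = []
    for value in sorted({obj['top'] for obj in objects}):
        if groups and value - groups[-1][-1] <= tolerance:
            groups[-1].append(value)
        else:
            groups.append([value])
    group_index = {value: i for i, group in enumerate(groups) for value in group}

    # 2. Bucket the objects by group, keeping their original order inside each group
    clusters = [[] for _ in groups]
    for obj in objects:
        clusters[group_index[obj['top']]].append(obj)
    return clusters


# Hypothetical words and one table, each carrying its 'top' coordinate
words = [
    {'text': 'Section', 'top': 100.0},
    {'text': 'F', 'top': 101.5},
    {'text': 'Article', 'top': 130.0},
    {'text': '1', 'top': 131.0},
]
table = {'table': [['Age', 'Rate'], ['30', '0.10']], 'top': 160.0}

# Same text/table branching as in extract_text_from_pdf
lines = []
for cluster in cluster_by_top(words + [table]):
    if 'text' in cluster[0]:
        lines.append(' '.join(obj['text'] for obj in cluster))
    else:
        lines.append(json.dumps(cluster[0]['table']))
print(lines)
```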

In [6]:
# Define the directory containing the PDF files
pdf_path = "/content/drive/MyDrive/upgrad/HelpMate AI Codes/"
pdf_directory = Path(pdf_path)

# Initialize an empty list to store the extracted texts and document names
data = []

# Loop through all files in the directory
for pdf_path in pdf_directory.glob("*.pdf"):

    # Process the PDF file
    print(f"...Processing {pdf_path.name}")

    # Call the function to extract the text from the PDF
    extracted_text = extract_text_from_pdf(pdf_path)

    # Convert the extracted list to a DataFrame, and add a column to store document names
    extracted_text_df = pd.DataFrame(extracted_text, columns=['Page No.', 'Page_Text'])
    extracted_text_df['Document Name'] = pdf_path.name

    # Append the extracted text and document name to the list
    data.append(extracted_text_df)

    # Print a message to indicate progress
    print(f"Finished processing {pdf_path.name}")

# Print a message to indicate all PDFs have been processed
print("All PDFs have been processed.")

...Processing Principal-Sample-Life-Insurance-Policy.pdf
Finished processing Principal-Sample-Life-Insurance-Policy.pdf
All PDFs have been processed.


In [7]:
# Concatenate all the DFs in the list 'data' together
insurance_pdfs_data = pd.concat(data, ignore_index=True)

# Let's also check the length of all the texts as there might be some empty pages or pages with very few words that we can drop
insurance_pdfs_data['Text_Length'] = insurance_pdfs_data['Page_Text'].apply(lambda x: len(x.split(' ')))

# Retain only the rows with a text length of at least 10
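# .copy() makes this an independent DataFrame, so adding the Metadata column below does not trigger SettingWithCopyWarning
insurance_pdfs_data = insurance_pdfs_data.loc[insurance_pdfs_data['Text_Length'] >= 10].copy()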

# Store the metadata for each page in a separate column
insurance_pdfs_data['Metadata'] = insurance_pdfs_data.apply(lambda x: {'Policy_Name': x['Document Name'][:-4], 'Page_No.': x['Page No.']}, axis=1)

#### Chunking Strategy for document text

This also settles the chunking strategy. Most pages contain a few hundred words, with a maximum of about 500, so we don't need to chunk the documents further and can embed individual pages instead. This strategy makes sense for two reasons:
1. Given how insurance documents are generally structured, a page rarely contains much extraneous information, and the text on a page is likely to be interrelated.
2. Larger chunks let us pass sufficient context to the LLM in the generation layer.
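If some pages had been much longer, page-level chunks would have needed splitting. The hypothetical helper below, which is not used in this notebook, shows one common alternative: fixed-size word windows with an overlap, so that text near a boundary appears in both neighbouring chunks. It counts whitespace-separated words as a rough proxy; a token-based limit (for example using `tiktoken`) would be more precise for the embedding model.

```python
def chunk_words(text, max_words=500, overlap=50):
    """Split text into windows of at most max_words words, each sharing `overlap` words with the previous one."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    if len(words) <= max_words:
        return [' '.join(words)]  # short pages stay whole, as in the page-level strategy
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(' '.join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


long_page = ' '.join(f"w{i}" for i in range(1200))
chunks = chunk_words(long_page)
print(len(chunks), chunks[1].split()[0])  # 3 w450
```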

#### Choice of Embedding Model

In this section, we will embed the pages in the dataframe through OpenAI's `text-embedding-ada-002` model, and store them in a ChromaDB collection.

#### Generate and Store Embeddings using OpenAI and ChromaDB


In [8]:
# Set the API key
filepath = "/content/drive/MyDrive/upgrad/HelpMate AI Codes/"

# initialise with the key
from google.colab import userdata
openai.api_key = userdata.get('OpenAI_API_Key')

In [9]:
# Import the OpenAI Embedding Function into chroma

from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

In [10]:
# Define the path where chroma collections will be stored

chroma_data_path = '/content/drive/MyDrive/upgrad/HelpMate AI Codes/ChromaDB_Data'

In [11]:
import chromadb

In [12]:
# Call PersistentClient() with the path defined above so that the collections are saved to Google Drive

client = chromadb.PersistentClient(path=chroma_data_path)

In [13]:
# Set up the embedding function using the OpenAI embedding model

model = "text-embedding-ada-002"
embedding_function = OpenAIEmbeddingFunction(api_key=openai.api_key, model_name=model)

In [14]:
# Initialise a collection in chroma and pass the embedding_function to it so that it uses OpenAI embeddings to embed the documents

insurance_collection = client.get_or_create_collection(name='RAG_on_Insurance', embedding_function=embedding_function)

In [15]:
# Convert the page text and metadata from your dataframe to lists to be able to pass it to chroma

documents_list = insurance_pdfs_data["Page_Text"].tolist()
metadata_list = insurance_pdfs_data['Metadata'].tolist()

In [16]:
# Add the documents and metadata to the collection along with generic integer IDs. You can also build the IDs from the metadata by combining the policy name and page number.

insurance_collection.add(
    documents= documents_list,
    ids = [str(i) for i in range(0, len(documents_list))],
    metadatas = metadata_list
)
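The comment above suggests building IDs from the metadata instead of generic integers. A minimal sketch, assuming the metadata dictionaries have the same `Policy_Name` and `Page_No.` keys as the `Metadata` column; `make_ids` is a hypothetical helper:

```python
def make_ids(metadatas):
    """Build one ID per page from its policy name and page number, e.g. 'Policy_Page 1'."""
    ids = [f"{m['Policy_Name']}_{m['Page_No.']}" for m in metadatas]
    # Collections need unique IDs, so fail loudly if two pages map to the same ID
    if len(set(ids)) != len(ids):
        raise ValueError("policy name + page number does not give unique IDs")
    return ids


# Hypothetical metadata in the same shape as the notebook's 'Metadata' column
example_metadata = [
    {'Policy_Name': 'Principal-Sample-Life-Insurance-Policy', 'Page_No.': 'Page 1'},
    {'Policy_Name': 'Principal-Sample-Life-Insurance-Policy', 'Page_No.': 'Page 2'},
]
print(make_ids(example_metadata))
# ['Principal-Sample-Life-Insurance-Policy_Page 1', 'Principal-Sample-Life-Insurance-Policy_Page 2']
```

Such IDs could be passed as `ids=make_ids(metadata_list)` to `insurance_collection.add`. Because they are stable across runs, re-running the cell with the collection's `upsert` method would update existing entries rather than trying to add them again.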

In [17]:
# Let's take a look at the first few entries in the collection

insurance_collection.get(
    ids = ['0','1','2'],
    include = ['embeddings', 'documents', 'metadatas']
)

{'ids': ['0', '1', '2'],
 'embeddings': [[-0.022469419986009598,
   0.01871146820485592,
   -0.02729734405875206,
   ...
(output truncated)

## <font color = red> 2. Search Layer with Caching

In this section, we perform a semantic search of each query against the collection's embeddings to retrieve the top semantically similar results.
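The distances reported by ChromaDB depend on the collection's distance function. Assuming the default (squared L2, since the collections here are created without an `hnsw:space` setting) and unit-length embeddings (OpenAI states that its embeddings are normalised to length 1), squared L2 distance and cosine similarity are related by $d = 2 - 2\cos\theta$. Under those assumptions, the distances of about 0.35 seen for query 1 correspond to a cosine similarity of roughly 0.83, and the cache threshold of 0.2 in `rag_function` corresponds to a cosine similarity of 0.9. The toy check below verifies the identity on two small vectors:

```python
import math


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def normalize(v):
    # Scale a vector to unit length, as OpenAI embeddings already are
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Two toy 3-d vectors standing in for embeddings
a = normalize([0.2, 0.9, 0.1])
b = normalize([0.3, 0.7, 0.4])

# For unit vectors: squared L2 distance == 2 - 2 * cosine similarity
print(squared_l2(a, b), 2 - 2 * cosine_similarity(a, b))

# Converting a distance threshold into the equivalent cosine-similarity threshold
distance_threshold = 0.2
print(1 - distance_threshold / 2)  # 0.9
```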

First, we design at least three queries against which to test the system. To do this, we skim through the document and come up with queries whose answers can be found in the policy document.

Next, we embed the queries and search the ChromaDB vector database against each of them. Implementing a cache mechanism is also mandatory.
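The caching idea used in `rag_function` (store each new query with its results, and reuse them when a later query lands within a distance threshold) can be sketched independently of ChromaDB. In this toy version, `SemanticCache`, `get_results`, the 2-d vectors and the fake search function are all hypothetical; squared L2 distance on unit vectors is assumed, mirroring the threshold check in the notebook.

```python
class SemanticCache:
    """Toy semantic cache: reuse stored results when a new query embeds close to a cached query."""

    def __init__(self, embed, threshold=0.2):
        self.embed = embed          # maps text -> list of floats (assumed unit length)
        self.threshold = threshold  # largest squared L2 distance that still counts as a hit
        self.entries = []           # list of (query_vector, results) pairs

    @staticmethod
    def _distance(a, b):
        # Squared L2 distance between two vectors
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def lookup(self, query):
        vector = self.embed(query)
        best = min(self.entries, key=lambda entry: self._distance(vector, entry[0]), default=None)
        if best is not None and self._distance(vector, best[0]) <= self.threshold:
            return best[1]
        return None

    def store(self, query, results):
        self.entries.append((self.embed(query), results))


def get_results(query, cache, search):
    # Serve from the cache on a hit; otherwise search and remember the results
    cached = cache.lookup(query)
    if cached is not None:
        return cached, "cache"
    results = search(query)
    cache.store(query, results)
    return results, "search"


# Hypothetical pre-computed unit vectors standing in for real embeddings
toy_vectors = {
    "Can I cancel my policy?": [1.0, 0.0],
    "How do I cancel the policy?": [0.96, 0.28],
    "Who receives the death benefit?": [0.0, 1.0],
}
cache = SemanticCache(embed=toy_vectors.__getitem__, threshold=0.2)
fake_search = lambda q: [f"pages retrieved for: {q}"]

first, source1 = get_results("Can I cancel my policy?", cache, fake_search)
second, source2 = get_results("How do I cancel the policy?", cache, fake_search)
third, source3 = get_results("Who receives the death benefit?", cache, fake_search)
print(source1, source2, source3)  # search cache search
```

The second query is a paraphrase of the first; its squared L2 distance to the cached vector is 0.08, below the threshold, so the stored results are reused without searching again.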

Finally, we implement the re-ranking block, for which we can choose from a range of cross-encoder models on HuggingFace.

In [64]:
# Defining the 3 test queries for the RAG pipeline

query1 = "Can you opt out after signing a Life Insurance policy?"
query2 = "What is the order of inheritance if there is no will left behind?"
query3 = "What are some cases in which the beneficiaries may be denied?"

In [19]:
# Initialising a cache mechanism
cache_collection = client.get_or_create_collection(name='Insurance_Cache', embedding_function=embedding_function)

cache_collection.peek()

{'ids': [],
 'embeddings': [],
 'metadatas': [],
 'documents': [],
 'uris': None,
 'data': None,
 'included': ['embeddings', 'metadatas', 'documents']}

### For query 1:

In [24]:
query = query1

In [80]:

def rag_function(query, threshold=0.2, n_results=8):
    # Search the cache for the closest previously seen query
    cache_results = cache_collection.query(
        query_texts=[query],
        n_results=1
    )

    # Cache miss: the cache is empty, or the closest cached query is farther away than the threshold
    if cache_results['distances'][0] == [] or cache_results['distances'][0][0] > threshold:
        # Query the main collection against the user query and return the top n_results pages
        results = insurance_collection.query(
            query_texts=[query],
            n_results=n_results
        )

        # Store the query in cache_collection as a document so that it can be embedded and searched against later.
        # Store the retrieved ids, documents, distances and metadatas as the cache entry's metadata
        # (keys 'ids0', 'documents0', ...) so they can be fetched directly when a later query matches.
        keys = []
        values = []
        for field in ['ids', 'documents', 'distances', 'metadatas']:
            for i, val in enumerate(results[field][0]):
                keys.append(f"{field}{i}")
                values.append(str(val))

        cache_collection.add(
            documents=[query],
            ids=[query],  # alternatively, use str(cache_collection.count()) to assign integer-like IDs 0, 1, 2, ...
            metadatas=[dict(zip(keys, values))]
        )

        print("Not found in cache. Found in main collection.")

        result_dict = {'Metadatas': results['metadatas'][0], 'Documents': results['documents'][0],
                       'Distances': results['distances'][0], 'IDs': results['ids'][0]}
        return pd.DataFrame.from_dict(result_dict)

    # Cache hit: rebuild the results from the metadata stored with the cached query
    cache_result_dict = cache_results['metadatas'][0][0]
    n_cached = sum(1 for key in cache_result_dict if key.startswith('ids'))

    print("Found in cache!")

    # Look values up by index so the order does not depend on how the metadata keys are returned
    results_df = pd.DataFrame({
        'IDs': [cache_result_dict[f'ids{i}'] for i in range(n_cached)],
        'Documents': [cache_result_dict[f'documents{i}'] for i in range(n_cached)],
        'Distances': [float(cache_result_dict[f'distances{i}']) for i in range(n_cached)],
        'Metadatas': [cache_result_dict[f'metadatas{i}'] for i in range(n_cached)],
    })
    return results_df

In [82]:
results_df1 = rag_function(query1)

Found in cache!


In [83]:
results_df1

|   | IDs | Documents | Distances | Metadatas |
|---|---|---|---|---|
| 0 | 39 | Section F - Individual Purchase Rights Article... | 0.3490327988712673 | {'Page_No.': 'Page 42', 'Policy_Name': 'Princi... |
| 1 | 40 | Any individual policy issued will then be in f... | 0.3653254983095474 | {'Page_No.': 'Page 43', 'Policy_Name': 'Princi... |
| 2 | 32 | Section C - Individual Terminations Article 1 ... | 0.3776018350016346 | {'Page_No.': 'Page 35', 'Policy_Name': 'Princi... |
| 3 | 33 | A Member's insurance under this Group Policy f... | 0.3790099999761553 | {'Page_No.': 'Page 36', 'Policy_Name': 'Princi... |
| 4 | 15 | c . a copy of the form which contains the stat... | 0.3798865974907497 | {'Page_No.': 'Page 18', 'Policy_Name': 'Princi... |
| 5 | 44 | M ember's death, the Death Benefits Payable ma... | 0.38001075696808 | {'Page_No.': 'Page 47', 'Policy_Name': 'Princi... |
| 6 | 41 | (4) Premium will be based on the Dependent's a... | 0.3827679786125416 | {'Page_No.': 'Page 44', 'Policy_Name': 'Princi... |
| 7 | 23 | PART III - INDIVIDUAL REQUIREMENTS AND RIGHTS ... | 0.3851396443806584 | {'Page_No.': 'Page 26', 'Policy_Name': 'Princi... |


In [84]:
results_df2 = rag_function(query2)

Found in cache!


In [85]:
results_df3 = rag_function(query3)

Found in cache!


### Re-Ranking with a Cross Encoder

Re-ranking the results obtained from the semantic search can sometimes significantly improve the relevance of the retrieved results. This is often done by passing the query, paired with each of the retrieved responses, into a cross-encoder that scores the relevance of each response with respect to the query.
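Stripped of the model, re-ranking boils down to: score each (query, candidate) pair, sort by score, keep the top k. The sketch below uses a hypothetical word-overlap scorer purely as a stand-in for a cross-encoder, so it runs without downloading a model.

```python
def rerank(query, candidates, score_fn, top_k=3):
    """Score every (query, candidate) pair and return the top_k candidates, best first."""
    scored = [(score_fn(query, candidate), candidate) for candidate in candidates]
    # Sort on the score only, so ties keep their original (semantic-search) order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored[:top_k]]


def overlap_score(query, text):
    # Hypothetical stand-in for a cross-encoder: number of shared lowercase words
    return len(set(query.lower().split()) & set(text.lower().split()))


candidates = [
    "premium is due monthly",
    "the beneficiary may be changed",
    "coverage ends at retirement",
]
print(rerank("when may the beneficiary be changed", candidates, overlap_score, top_k=1))
# ['the beneficiary may be changed']
```

With the cross-encoder loaded below, a scorer could be written as `lambda q, c: float(cross_encoder.predict([[q, c]])[0])`, although scoring all pairs in a single `predict` call is more efficient.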

In [41]:
# Import the CrossEncoder library from sentence_transformers
from sentence_transformers import CrossEncoder, util

In [42]:
# Initialise the cross encoder model
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')



In [78]:
def cross_encoder_step(query, results_df, top_k=3):
  # Pair the query with each of the top 8 responses received from the semantic search
  # and generate a cross-encoder relevance score for every pair
  cross_inputs = [[query, response] for response in results_df['Documents']]
  cross_rerank_scores = cross_encoder.predict(cross_inputs)

  # Store the rerank scores in results_df
  results_df['Reranked_scores'] = cross_rerank_scores

  # Keep the top_k results after re-ranking (highest cross-encoder score first)
  top_3_rerank = results_df.sort_values(by='Reranked_scores', ascending=False)
  top_3_RAG = top_3_rerank[["Documents", "Metadatas"]][:top_k]

  print("Query:", query)

  return top_3_RAG

In [79]:
top_3_RAG1 = cross_encoder_step(query1,results_df1)
top_3_RAG1

Query: Can you opt out after signing a Life Insurance policy?


|   | Documents | Metadatas |
|---|---|---|
| 0 | Section F - Individual Purchase Rights Article... | {'Page_No.': 'Page 42', 'Policy_Name': 'Princi... |
| 5 | M ember's death, the Death Benefits Payable ma... | {'Page_No.': 'Page 47', 'Policy_Name': 'Princi... |
| 6 | (4) Premium will be based on the Dependent's a... | {'Page_No.': 'Page 44', 'Policy_Name': 'Princi... |


In [86]:
top_3_RAG2 = cross_encoder_step(query2,results_df2)
top_3_RAG2

Query: What is the order of inheritance if there is no will left behind?


|   | Documents | Metadatas |
|---|---|---|
| 0 | c . If a beneficiary dies at the same time or ... | {'Page_No.': 'Page 48', 'Policy_Name': 'Princi... |
| 1 | M ember's death, the Death Benefits Payable ma... | {'Page_No.': 'Page 47', 'Policy_Name': 'Princi... |
| 5 | (1) marriage or establishment of a Civil Union... | {'Page_No.': 'Page 32', 'Policy_Name': 'Princi... |


In [87]:
top_3_RAG3 = cross_encoder_step(query3,results_df3)
top_3_RAG3

Query: What are some cases in which the beneficiaries may be denied?


|   | Documents | Metadatas |
|---|---|---|
| 0 | M ember's death, the Death Benefits Payable ma... | {'Page_No.': 'Page 47', 'Policy_Name': 'Princi... |
| 1 | c . If a beneficiary dies at the same time or ... | {'Page_No.': 'Page 48', 'Policy_Name': 'Princi... |
| 3 | I f a Dependent who was insured dies during th... | {'Page_No.': 'Page 60', 'Policy_Name': 'Princi... |


## <font color = red> 3. Generation Layer

Now that we have the final top search results, we can pass them to GPT-3.5 along with the user query and a well-engineered prompt, to generate a direct answer to the query with citations rather than returning whole pages or chunks.

In the generation layer, the final prompt that we design is the major component. We need to make sure that the prompt is exhaustive in its instructions, and the relevant information is correctly passed to the prompt. We may also choose to provide some few-shot examples in an attempt to improve the LLM output.
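One way to add the few-shot examples mentioned above is to insert example user/assistant turns before the real question, and to pass the retrieved pages as numbered sources so that citations are unambiguous. This is a sketch, not the prompt used in `generate_response`: `build_messages`, the record keys and the placeholder few-shot strings are assumptions, and real few-shot examples would have to be written from the policy document itself.

```python
def build_messages(query, records, few_shot=()):
    """Assemble chat messages: instructions, optional few-shot turns, then numbered sources and the query."""
    # Number each retrieved page so the model can cite it as [1], [2], ...
    sources = "\n\n".join(
        f"[{i}] {record['Policy_Name']}, {record['Page_No.']}\n{record['text']}"
        for i, record in enumerate(records, start=1)
    )
    messages = [{
        "role": "system",
        "content": (
            "You are a helpful assistant in the insurance domain. Answer only from the numbered sources, "
            "cite them as [n] with the policy name and page number, and say so if the sources do not "
            "contain the answer."
        ),
    }]
    # Few-shot examples are added as earlier user/assistant turns
    for example_question, example_answer in few_shot:
        messages.append({"role": "user", "content": example_question})
        messages.append({"role": "assistant", "content": example_answer})
    messages.append({"role": "user", "content": f"Sources:\n{sources}\n\nQuestion: {query}"})
    return messages


# Placeholder inputs; real page text and few-shot pairs would come from the policy document
example_records = [{"text": "<page text>", "Policy_Name": "Principal-Sample-Life-Insurance-Policy", "Page_No.": "Page 42"}]
example_few_shot = [("<example question>", "<example answer citing [1]>")]
example_messages = build_messages("Can you opt out after signing a Life Insurance policy?", example_records, example_few_shot)
print([m["role"] for m in example_messages])  # ['system', 'user', 'assistant', 'user']
```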

In [88]:
# Define the function to generate the response. Provide a comprehensive prompt that passes the user query and the top 3 results to the model

def generate_response(query, results_df):
    """
    Generate a response using GPT-3.5's ChatCompletion based on the user query and retrieved information.
    """
    # Serialise the full text of the retrieved pages passed in as results_df. Interpolating a DataFrame
    # directly into an f-string would only pass its truncated printed form (long text is cut off with "...").
    search_results = results_df[["Documents", "Metadatas"]].to_json(orient="records")

    messages = [
                {"role": "system", "content":  "You are a helpful assistant in the insurance domain who can effectively answer user queries about insurance policies and documents."},
                {"role": "user", "content": f"""You are a helpful assistant in the insurance domain who can effectively answer user queries about insurance policies and documents.
                                                You have a question asked by the user in '{query}' and you have some search results from a corpus of insurance documents, given as JSON records: {search_results}. Each record is one page of an insurance document that may be relevant to the user query.

                                                The field 'Documents' in each record contains the actual text from the policy document and the field 'Metadatas' contains the policy name and source page. The text inside the document may also contain tables in the format of a list of lists where each of the nested lists indicates a row.

                                                Use the documents in the search results to answer the query '{query}'. Frame an informative answer and also use the 'Metadatas' field to return the relevant policy names and page numbers as citations.

                                                Follow the guidelines below when performing the task.
                                                1. Try to provide relevant/accurate numbers if available.
                                                2. You don’t have to necessarily use all the information in the search results. Only choose information that is relevant.
                                                3. If the document text has tables with relevant information, please reformat the table and return the final information in a tabular format.
                                                4. Use the 'Metadatas' field to retrieve and cite the policy name(s) and page number(s) as citations.
                                                5. If you can't provide the complete answer, please also provide any information that will help the user to search specific sections in the relevant cited documents.
                                                6. You are a customer facing assistant, so do not provide any information on internal workings, just answer the query directly.

                                                The generated response should answer the query directly, addressing the user and avoiding additional information. If you think that the query is not relevant to the document, reply that the query is irrelevant. Provide the final response as well-formatted, easily readable text: give your complete answer first, with all information, and then provide the citations.
                                                """},
              ]

    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages
    )
    # Return the answer as a list of lines
    return response.choices[0].message.content.split('\n')

## <font color = red> Retrieval Augmented Generation Pipeline

In [92]:
# Read the user query

query = query1
results_df=rag_function(query)

if results_df is None:
  print("No results found")
else:
  top_3_RAG = cross_encoder_step(query, results_df)
  response = generate_response(query, top_3_RAG)
  print("\n".join(response))

Found in cache!
Query: Can you opt out after signing a Life Insurance policy?
If you have signed a Life Insurance policy, typically you can opt out within a certain period known as the "free look period." During this time frame, which is usually around 15 to 30 days after signing the policy, you have the option to cancel the policy and receive a refund of the premiums paid. However, once this free look period expires, opting out may not be as straightforward and could be subject to penalties or restrictions.

In the given document excerpts:

- The document does not directly address opt-out procedures after signing a Life Insurance policy.
- It is advisable to refer to the specific Terms and Conditions section of your Life Insurance policy document for detailed information on the opt-out process and any associated terms, conditions, or penalties.

Citations:
- Policy Name: Principle Policy
- Relevant Pages for Further Reference: Page 42, Page 44, Page 47


In [65]:
# Read the user query

query = query2
results_df=rag_function(query)

if results_df is None:
  print("No results found")
else:
  top_3_RAG = cross_encoder_step(query, results_df)
  response = generate_response(query, top_3_RAG)
  print("\n".join(response))

Found in cache!
Query: What is the order of inheritance if there is no will left behind?
The order of inheritance when there is no will left behind typically follows intestacy laws, where the estate is distributed based on legal guidelines. The specific order may vary depending on the jurisdiction, but generally, it prioritizes immediate family members such as spouses, children, parents, and siblings.

Citations:
1. Policy Name: Principle of Inheritance
   Page Number: Page 48

2. Policy Name: Principle of Inheritance
   Page Number: Page 47


In [90]:
# Read the user query

query = query3
results_df=rag_function(query)

if results_df is None:
  print("No results found")
else:
  top_3_RAG = cross_encoder_step(query, results_df)
  response = generate_response(query, top_3_RAG)
  print("\n".join(response))

Found in cache!
Query: What are some cases in which the beneficiaries may be denied?
In some cases, beneficiaries may be denied benefits when:

1. If a beneficiary dies at the same time or before the insured member.
2. If a Dependent who was insured dies during the same accident as the insured member.

Please refer to the specific policy document sections for detailed information and exceptions.

Citations:  
1. Policy Name: Principal Life Insurance Policy  
   Page No.: Page 48

2. Policy Name: Principal Life Insurance Policy  
   Page No.: Page 60
