# Meet InsuraBot

**Meet InsuraBot - Your Intelligent Insurance Assistant**


Introducing InsuraBot, your trusted companion for all things insurance-related. Designed to streamline your insurance inquiries and policy needs, InsuraBot simplifies the complex world of insurance and provides personalized assistance just for you.

# Leveraging 'RAG' (Retrieval-Augmented Generation) for the bot


InsuraBot uses the **RAG (Retrieval-Augmented Generation) architecture**, integrating **OpenAI's LLM** (Large Language Model), GPT-3.5. It uses **FAISS for vector database management** and **OpenAI embeddings** to represent the text. Instead of fine-tuning the model, InsuraBot uses a **'similarity' retrieval mechanism** to ground its answers in domain documents loaded from PDF and CSV files. **Prompt engineering is handled through templates**, and **LangChain's chain** function orchestrates the integration and communication among the components.
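The retrieve, augment, and generate flow described above can be sketched in plain Python. This is a toy illustration only: a word-overlap score stands in for OpenAI embeddings and FAISS, the passages are short paraphrased placeholders, and `fake_llm` is a hypothetical placeholder for the GPT-3.5 call. None of these names come from LangChain.

```python
import re

def words(text):
    # Lowercase word set; punctuation is ignored.
    return set(re.findall(r"[a-z]+", text.lower()))

def score(question, passage):
    # Toy relevance score: number of shared words (stand-in for embedding similarity).
    return len(words(question) & words(passage))

def retrieve(question, passages, k=2):
    # Retrieval: keep the k passages with the highest score.
    return sorted(passages, key=lambda p: score(question, p), reverse=True)[:k]

def build_prompt(question, context):
    # Augmentation: insert the retrieved text into the prompt.
    joined = "\n\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

def fake_llm(prompt_text):
    # Generation: hypothetical placeholder for the chat model call.
    return f"[answer generated from a {len(prompt_text)}-character prompt]"

passages = [
    "Debris removal expenses are covered when a peril insured against causes the loss.",
    "Trees felled by windstorm or hail are covered up to a stated limit.",
    "The named insured and resident relatives are insureds under the policy.",
]
question = "Is debris removal covered?"
context = retrieve(question, passages, k=1)
print(fake_llm(build_prompt(question, context)))
```

In the real bot, each of these three steps is replaced by a LangChain component: the FAISS retriever, the prompt template, and the OpenAI chat model.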

### Why We Chose RAG over Fine-Tuning the OpenAI Model

# InsuraBot Architecture Diagram

<img src="https://github.com/SandeepKamakaze/insurabot/blob/main/insurabot_architecture.png?raw=true" width=80%>

# Implementation

The implementation consists of five steps:

- Setting up environment for LLM, Vector DB and Data Loading
- Loading the data
- Managing storage of vectors and chat history
- Prompt Engineering
- LangChain Chain 

### Installing the Necessary Libraries for the RAG Model

In [1]:
!pip install langchain==0.1.16
!pip install pypdf==4.2.0
!pip install -U langchain-openai
!pip install faiss-cpu



1. `langchain==0.1.16`: This library provides a framework for orchestrating and integrating various language-related tools and technologies, enabling streamlined development of natural language processing pipelines.

2. `pypdf==4.2.0`: This library parses and extracts text from PDF files. Here, the PDF policy document serves as the knowledge source that the bot retrieves from.

3. `langchain-openai`: This package extends LangChain's capabilities by integrating the OpenAI API, allowing seamless interaction with OpenAI's language models for tasks such as text generation and retrieval.

4. `faiss-cpu`: This library implements efficient similarity search and clustering of dense vectors, commonly used for vector database management and retrieval tasks, such as those involved in natural language understanding and retrieval-based conversational systems.

In [2]:
import os
import json
from pprint import pprint

# OpenAI embeddings and chat model (imported once, from the langchain-openai package)
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
# Document loaders and the FAISS vector store live in langchain-community
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.document_loaders.csv_loader import CSVLoader
from langchain_community.vectorstores import FAISS
from langchain.prompts import load_prompt

# Setting up environment for LLM, Vector DB and Data Loading

### Paths for the Prompt Template, Vector DB, and Data Sources

In [3]:
prompt_template = "template_insurance.json"
faiss_index = "faiss_vector_store"
pdf_source = "Home_insurance_sample.pdf"

1. `prompt_template = "template_insurance.json"`: This parameter specifies the JSON file containing the template used for prompt engineering. Swapping in a different template file changes the response style of the bot, making its interactions with users adaptable and customizable.

2. `faiss_index = "faiss_vector_store"`: This parameter sets the folder where the vector store created by FAISS is saved. It can be changed to suit your storage layout; saving the index lets you reuse the embeddings without recomputing them.

3. `pdf_source = "Home_insurance_sample.pdf"`: This parameter specifies the PDF file that supplies the bot's unstructured text data. Point it at a different PDF to give the bot a different knowledge source.

### Setting Up OpenAI API Environment

In [4]:
# Replace with your key
os.environ['OPENAI_API_KEY'] = "Enter-your-OpenAI-Key"
embeddings = OpenAIEmbeddings()

This sets the OpenAI API key as an environment variable through the `os.environ` dictionary, then creates an instance of the `OpenAIEmbeddings` class (imported earlier from `langchain_openai`). The instance calls OpenAI's embedding model to convert our text data into vectors.
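Each embedding is a list of floats, and texts with similar meaning end up with vectors pointing in similar directions. The sketch below uses hand-made 3-dimensional vectors (hypothetical; real OpenAI embeddings have far more dimensions) to show cosine similarity, one common way to compare embeddings. The vector store used in this notebook ranks results with its own distance metric.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors standing in for real embeddings.
question_vec = [0.9, 0.1, 0.0]
debris_page_vec = [0.8, 0.2, 0.1]
liability_page_vec = [0.0, 0.3, 0.9]

print(round(cosine_similarity(question_vec, debris_page_vec), 3))
print(round(cosine_similarity(question_vec, liability_page_vec), 3))
```

The debris page scores much higher than the liability page, which is what lets a similarity search pick it for a debris-related question.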


# Setting Up Data Sources and Loading Data

In [5]:
pdf_loader = PyPDFLoader(pdf_source)
pdf_data = pdf_loader.load()

In [6]:
type(pdf_data)

list

`PyPDFLoader` initializes a PDF loader for the file at `pdf_source`, and `load()` extracts its text into a list of `Document` objects, one per page.

## Domain Specific Data for the RAG Bot

The PDF data for a sample home insurance policy is sourced from the website https://www.iii.org/, ensuring compliance with ethical and legal standards.

In [7]:
pdf_data[:2]

[Document(page_content='HOMEOWNERS\nHO 00 03 10 00\nHO 00 03 10 00 Copyright, Insurance Services Office, Inc., 1999 Page 1 of 22HOMEOWNERS 3 – SPECIAL FORM\nAGREEMENT\nWe will provide the insurance described in this policy\nin return for the premium and compliance with allapplicable provisions of this policy.\nDEFINITIONS\nA.In this policy, "you" and "your" refer to the "named\ninsured" shown in the Declarations and the spouseif a resident of the same household. "We", "us"and "our" refer to the Company providing this in-surance.\nB.In addition, certain words and phrases are definedas follows:\n1."Aircraft Liability", "Hovercraft Liability", "Motor\nVehicle Liability" and "Watercraft Liability",subject to the provisions in b. below, mean the\nfollowing:\na.Liability for "bodily injury" or "property dam-age" arising out of the:\n(1)Ownership of such vehicle or craft by an"insured";\n(2)Maintenance, occupancy, operation,use, loading or unloading of such vehi-cle or craft by any person;\n(

# Managing storage of vectors and chat history

## Embedding Data and Storing Vectors in a Vector Store (Using FAISS, Facebook AI Similarity Search)

In [8]:
data = pdf_data

# Create embeddings for the docs and index them in FAISS
vectors = FAISS.from_documents(data, embeddings)
# Save the index to the folder configured in faiss_index
vectors.save_local(faiss_index)
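Conceptually, the vector store keeps one vector per document and answers a query by finding the stored vectors closest to the query vector. LangChain's FAISS wrapper uses an exact ("flat") Euclidean-distance index by default. The pure-Python class below is a hypothetical, brute-force stand-in that mimics that behaviour on toy 2-D vectors; it is not FAISS itself.

```python
import math

class FlatL2Index:
    """Hypothetical brute-force index: exact search by Euclidean (L2) distance."""

    def __init__(self):
        self.vectors = []
        self.docs = []

    def add(self, vector, doc):
        # Store the vector alongside the document it was computed from.
        self.vectors.append(vector)
        self.docs.append(doc)

    def search(self, query, k):
        # Compare the query with every stored vector (no approximation),
        # then return the k nearest documents with their distances.
        scored = [(math.dist(query, v), d) for v, d in zip(self.vectors, self.docs)]
        scored.sort(key=lambda pair: pair[0])
        return [(d, dist) for dist, d in scored[:k]]

toy_index = FlatL2Index()
toy_index.add([0.0, 1.0], "page about debris removal")
toy_index.add([1.0, 0.0], "page about liability")
toy_index.add([0.1, 0.9], "page about trees")
print(toy_index.search([0.0, 0.95], k=2))
```

Real FAISS performs the same nearest-neighbour search with optimized native code, which matters once the store holds thousands of vectors.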

## Setting Up Chat History

In [9]:
chat_history = []
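`chat_history` is a plain list of `(question, answer)` tuples that grows by one entry per turn. Before the chain can use it, the history is flattened into text. The helper below is a hypothetical sketch of roughly how LangChain's default formatter renders it ("Human:" / "Assistant:" lines), not its exact code; it uses a separate `example_history` list so it does not touch the notebook's `chat_history`.

```python
def format_chat_history(history):
    # Render each (question, answer) turn as a pair of labelled lines.
    lines = []
    for question, answer in history:
        lines.append(f"Human: {question}")
        lines.append(f"Assistant: {answer}")
    return "\n".join(lines)

# Placeholder turns; in the notebook, get_response appends real ones.
example_history = []
example_history.append(("Is debris removal covered?", "<answer 1>"))
example_history.append(("What about trees?", "<answer 2>"))
print(format_chat_history(example_history))
```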

## Setting up Conversational Retrieval Chain using LangChain

In this step, a Conversational Retrieval Chain is set up with LangChain. The `ConversationalRetrievalChain.from_llm()` function connects an OpenAI chat model to the document retriever, so that responses are grounded in the documents retrieved for each question and in the conversation so far. LangChain handles the communication between these components, which keeps the setup compact.
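`ConversationalRetrievalChain` works in two stages: when there is chat history, it first asks the LLM to rewrite the new question as a standalone question, then retrieves documents for that standalone question and answers from them. The sketch below mimics that control flow with hypothetical stub functions (`stub_condense`, `stub_search`, `stub_answer`) in place of the LLM and the retriever.

```python
def run_conversational_chain(question, history, condense, search, answer):
    # Stage 1: with prior turns, rewrite the question so it stands alone
    # (e.g. "What about trees?" -> "Is tree removal covered?").
    standalone = condense(question, history) if history else question
    # Stage 2: retrieve documents for the standalone question and answer from them.
    docs = search(standalone)
    return answer(standalone, docs)

# Hypothetical stubs standing in for the LLM and the retriever.
def stub_condense(question, history):
    last_question, _ = history[-1]
    return f"{question} (follow-up to: {last_question})"

def stub_search(query):
    return [f"document matching '{query}'"]

def stub_answer(query, docs):
    return f"Answer to '{query}' using {len(docs)} document(s)"

print(run_conversational_chain("What about trees?", [], stub_condense, stub_search, stub_answer))
print(run_conversational_chain("What about trees?", [("Is debris removal covered?", "...")],
                               stub_condense, stub_search, stub_answer))
```

The condensing step is why a follow-up such as "What about trees?" can still retrieve the right pages.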

In [10]:
openai_api_key = os.getenv("OPENAI_API_KEY")

## Choosing the desired Template for Prompt Engineering

We employ various prompt templates to generate responses in different styles according to the user's preference. These prompts are integrated into LangChain's chain during response retrieval. 

The `combine_docs_chain_kwargs` parameter in the chain function facilitates this process.
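The template file uses LangChain's serialized prompt format: a JSON object with `_type`, `input_variables`, and a `template` string whose `{context}` and `{question}` placeholders are filled at query time. The sketch below writes a shortened, hypothetical template in that format to a temporary file and fills it with Python's `str.format`, which mirrors the default f-string style substitution of LangChain prompt templates. In the notebook, `load_prompt` performs the loading step.

```python
import json
import os
import tempfile

# Hypothetical, shortened template in the same JSON shape as template_insurance.json.
example_template = {
    "_type": "prompt",
    "input_variables": ["context", "question"],
    "template": (
        "System: You are a chatbot for a home insurance company. "
        "Answer using only the provided context.\n"
        "Context: {context}\n"
        "Question: {question}"
    ),
}

# Write the template to a temporary JSON file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "template_example.json")
    with open(path, "w") as f:
        json.dump(example_template, f)
    with open(path) as f:
        loaded = json.load(f)

# Fill the placeholders the way a prompt template does at query time.
filled = loaded["template"].format(
    context="Debris removal is covered when a peril insured against causes the loss.",
    question="Is debris removal covered?",
)
print(filled)
```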

In [11]:
prompt = load_prompt(prompt_template)

In [12]:
with open(prompt_template) as f:
    prompt_data = json.load(f)
pprint(prompt_data)

{'_type': 'prompt',
 'input_variables': ['context', 'question'],
 'template': 'System: You are a chatbot for home insurance company. You need '
             'to provide responses to users who prompts insurance questions to '
             'you. Be polite and provide answers based on the provided context '
             'and chat_history only. Use only the provided data and not prior '
             'knowledge. \n'
             ' Human: Take a deep breath and do the following step by step: \n'
             ' 1. Read the context and chat_history below \n'
             ' 2. Answer the question with detail using the provided Insurance '
             'information \n'
             ' 3. If a user question seem like it requires previous prompts or '
             'the responses, make sure the user question can be answered from '
             'chat history first and then go to context. \n'
             ' 4. Make sure to nicely format the output in a three paragraph '
             'answer and try to

## Setting up the Chain Function 

In [13]:
retriever = vectors.as_retriever(search_type="similarity", search_kwargs={"k":6, "include_metadata":True, "score_threshold":0.6})

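First, we create a retriever from the vector store. It specifies how documents are fetched for each question: `search_type="similarity"` ranks the stored pages by how similar their embeddings are to the question's embedding, `k=6` returns up to six pages, and `score_threshold` is intended to filter out weak matches.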
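The `k` and `score_threshold` settings can be pictured as "rank, then cut". The sketch below uses made-up similarity scores (higher means more similar) and keeps at most `k` documents scoring at least the threshold. It is a hypothetical illustration; how a particular vector store interprets `score_threshold` (as a minimum similarity or a maximum distance) depends on the store and its distance metric.

```python
def top_k_with_threshold(scored_docs, k, threshold):
    # Sort by score, best first, keep only docs at or above the threshold,
    # and return at most k of them.
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    return [doc for doc, score in ranked if score >= threshold][:k]

# Hypothetical (document, similarity score) pairs.
scored_sections = [
    ("definitions section", 0.42),
    ("debris removal section", 0.91),
    ("trees section", 0.77),
    ("conditions section", 0.61),
]
print(top_k_with_threshold(scored_sections, k=6, threshold=0.6))
print(top_k_with_threshold(scored_sections, k=2, threshold=0.6))
```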

In [14]:
chain = ConversationalRetrievalChain.from_llm(llm=ChatOpenAI(temperature=0.3,model_name='gpt-3.5-turbo', openai_api_key=openai_api_key), 
                                                retriever=retriever,return_source_documents=False,verbose=False,chain_type="stuff",
                                                max_tokens_limit=4097, combine_docs_chain_kwargs={"prompt": prompt})

Then, we create the chain. It combines the LLM, which generates the answers, with the retriever, which supplies context from the vector store.

- The chain uses the OpenAI LLM `gpt-3.5-turbo` with `temperature=0.3`.
- It uses similarity retrieval, as most RAG-based bots do.
- `max_tokens_limit=4097` caps the total tokens of the retrieved documents stuffed into the prompt (`chain_type="stuff"`); the structure of the response itself is set by the prompt template.
- The prompt template is passed through `combine_docs_chain_kwargs`, so the LLM follows it when generating responses from the retrieved context.
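With `chain_type="stuff"`, all retrieved documents are concatenated ("stuffed") into the `{context}` slot of a single prompt, and `max_tokens_limit` drops documents from the end of the retrieved list until the total fits. The sketch below approximates that behaviour, using a word count as a hypothetical stand-in for the real tokenizer; it is not LangChain's implementation.

```python
def stuff_documents(docs, max_tokens_limit):
    # Hypothetical token counter: whitespace-separated words approximate tokens.
    counts = [len(doc.split()) for doc in docs]
    num_docs = len(docs)
    total = sum(counts)
    # Drop documents from the end (the least relevant ones) until under the limit.
    while num_docs > 0 and total > max_tokens_limit:
        num_docs -= 1
        total -= counts[num_docs]
    # Concatenate the remaining documents into one context string.
    return "\n\n".join(docs[:num_docs])

retrieved = [
    "debris removal is covered",              # 4 words
    "trees felled by windstorm are covered",  # 6 words
    "conditions apply to all coverages",      # 5 words
]
print(stuff_documents(retrieved, max_tokens_limit=10))
```

Because retrievers return the most similar documents first, trimming from the end keeps the most relevant context.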
                                                                    

In [15]:
def get_response(user_prompt):
    # The chain's inputs are the question and the running chat history;
    # the system instructions reach the model through the prompt template.
    result = chain.invoke({
        "question": user_prompt,
        "chat_history": chat_history
    })

    # Store the (question, answer) pair so follow-up questions have context
    chat_history.append((result["question"], result["answer"]))

    # Expect a JSON answer with a "response" field; fall back to the raw text otherwise
    try:
        parsed = json.loads(result["answer"])
        response = parsed.get("response", result["answer"]) if isinstance(parsed, dict) else result["answer"]
    except json.JSONDecodeError:
        response = result["answer"]

    print("\nChatbot Response:\n")
    print(response)
    return response

# InsuraBot Functionality 

To test the chatbot within the code, replace the question in the function call to receive a relevant response.

In [16]:
get_response("What is the policy says about Debris Removal?")


Chatbot Response:

Debris Removal coverage under the policy states that the insurance company will pay for reasonable expenses for the removal of debris of covered property if a Peril Insured Against causes the loss. This expense is included in the limit of liability that applies to the damaged property. Additionally, if the amount for the actual damage to the property plus the debris removal expense exceeds the limit of liability, an additional 5% of that limit is available for such expense. The policy also covers the removal of fallen trees due to specific perils up to a certain limit.


In [17]:
get_response("How much the policy give for damage of trees?")


Chatbot Response:

The policy covers the removal of fallen trees due to specific perils up to $1,000 for your liability assumed by contract or agreement. This includes the removal of your tree(s) felled by the peril of Windstorm or Hail or Weight of Ice, Snow or Sleet, or a neighbor's tree(s) felled by a Peril Insured Against under Coverage C. The limit is $1,000 for any one loss, regardless of the number of fallen trees. No more than $500 of this limit will be paid for the removal of any one tree. This coverage is additional insurance.


In [18]:
get_response("Can my family also be covered for the insurance?")


Chatbot Response:

Coverage under the insurance policy extends to you, residents of your household who are your relatives, other persons under the age of 21 and in the care of the named insured, and students enrolled in school full time who were residents of your household before moving out to attend school. The policy also includes the legal representative of a deceased person named in the Declarations or the spouse, if a resident of the same household, with respect to the premises and property covered under the policy at the time of death. This coverage is subject to the terms and conditions outlined in the policy.
