# Retrieval-augmented generation (RAG)

A retrieval-augmented generation (RAG) pipeline has two components:

- Retriever: an embedding model retrieves the documents most similar to the query.
- Generator: an LLM generates answers or summaries based on the retrieved context.
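
To make the two roles concrete, here is a minimal sketch in plain Python. It is not how LangChain works internally: word overlap stands in for embedding similarity, and the "generator" step only assembles the prompt an LLM would receive. The function names and the three-passage corpus are made up for illustration.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a crude stand-in for an embedding model."""
    return set(text.lower().split())


def retrieve(query, corpus, k=2):
    """Retriever role: return the k passages sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda passage: len(query_words & tokenize(passage)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages):
    """Generator role (input side): assemble the text an LLM would be asked to complete."""
    context = "\n".join(passages)
    return f"Context: {context}\nQuestion: {question}\nAnswer:"


# Hypothetical mini-corpus, loosely based on the insurance material used later.
corpus = [
    "A proposal form is the application for insurance.",
    "The policy document is the evidence of the insurance contract.",
    "Premium receipts acknowledge payment.",
]
top = retrieve("what is the policy document", corpus, k=1)
prompt_text = build_prompt("What is the policy document?", top)
print(prompt_text)
```

The rest of the notebook replaces each stand-in with a real component: an embedding model and vector store for `retrieve`, and a chat model for the generation step.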

In [1]:
from langchain.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader

import torch
import os
from tqdm.auto import tqdm

In [2]:
import os
from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings

# Step 1: Load environment variables from .env file
load_dotenv(".env")

# Step 2: Retrieve Azure OpenAI environment variables
AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")
AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY")
OPENAI_API_VERSION = os.getenv("OPENAI_API_VERSION")
MODEL_NAME = os.getenv("MODEL_NAME")

# Generator: chat model that writes the answers
llm = AzureChatOpenAI(
    model="gpt-4o-mini",
    azure_deployment="gpt-4o-mini",    # Your Azure deployment name
    api_version="2024-08-01-preview",  # Your API version (or pass OPENAI_API_VERSION)
    api_key=AZURE_OPENAI_API_KEY,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    temperature=0,
)

# Retriever: embedding model used to index and search the documents
embedding = AzureOpenAIEmbeddings(
    model="text-embedding-3-large"
)
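
Because the clients above depend on values loaded from `.env`, it can help to check for missing settings before making any API call. The helper below is a hypothetical addition, not part of the original notebook; it only inspects a dictionary, so it runs without any Azure credentials.

```python
def missing_settings(names, env):
    """Return the names that are absent or empty in the given mapping."""
    return [name for name in names if not env.get(name)]


required = ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY", "OPENAI_API_VERSION"]

# Example mapping for illustration; the endpoint value is a placeholder.
example_env = {
    "AZURE_OPENAI_ENDPOINT": "https://example.invalid",
    "AZURE_OPENAI_API_KEY": "",
}
missing = missing_settings(required, example_env)
print(missing)
```

In the notebook itself, the same check would be `missing_settings(required, os.environ)` right after `load_dotenv(".env")`.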

In [4]:
# llm.invoke("Hello, world!")

# Part 1: Data Preprocessing | Vector Store | Retriever

## 1.1 Data Preprocessing

In [5]:
## STEP 1: Document loaders
folder_path = './docs/'
pdf_list = []
for filename in os.listdir(folder_path):
    if filename.endswith('.pdf'):
        pdf_list.append(folder_path + filename)

documents = []
pdf_loaders = [PyPDFLoader(pdf) for pdf in pdf_list]
for loader in pdf_loaders:
    documents.extend(loader.load())

In [6]:
documents

[Document(metadata={'source': './docs/m3-f2.pdf', 'page': 0}, page_content='DIPLOMA IN INSURANCE SERVICES\nMODULE - 3\nNotesInsurance Documents\nPractice of Life Insurance\n 202.0 INTRODUCTION\nDocuments are necessary to evidence the existence of a\ncontract. In life insurance several documents are in vogue.The documents stand as a proof of the contract between theinsurer and the insured. The major documents in vogue inlife insurance are premium receipt, insurance policy,endorsements etc.\n2.1 OBJECTIVEAfter going through this lesson you will be able to\nzRecall the various documents used in life insurance\nzLearn the utility of each document.\n2.2 NEED FOR INSURANCE DOCUMENTATIONLife insurance is a legally enforceable contract between two\nparties both of whom are legally qualified to contract. It istherefore, necessary that the terms and conditions of theagreement must be suitably documented in a manner thatwould make it clear that both parties to the contract are Ad-idem i.e., of th

In [7]:
len(documents)  # number of pages loaded

13

In [8]:
## STEP 2: Document transformers
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap = 100
)

docs = text_splitter.split_documents(documents) 

In [9]:
len(docs)  # number of chunks after splitting the pages

32
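
The effect of `chunk_size` and `chunk_overlap` can be seen with a simplified splitter. This sketch cuts at fixed character offsets only; `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraph and line breaks, so its chunks will differ. The function name and the alphabet sample are illustrative assumptions.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=100):
    """Cut text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.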

## 1.2 Vector Store

In [10]:
## STEP 3: Text embedding models
# The embedding model was already created above as `embedding`.

## STEP 4: Vector stores
vector_path = 'vectordb_path'
db_file_name = 'insurance'

vectordb = FAISS.from_documents(
        documents = docs, 
        embedding = embedding)

vectordb.save_local(
    os.path.join(vector_path, db_file_name)
)
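
Conceptually, a vector store keeps one vector per chunk and answers a query by returning the chunks whose vectors lie closest to the query vector. The sketch below does this by exhaustive search with Euclidean distance over toy 2-D vectors. The vectors and texts are invented for illustration, real embeddings are far higher-dimensional, and how FAISS organises and searches its index is not shown here.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query_vec, index, k=2):
    """index is a list of (vector, text) pairs; return the k closest texts."""
    ranked = sorted(index, key=lambda item: l2_distance(query_vec, item[0]))
    return [text for _, text in ranked[:k]]


# Toy index: made-up vectors standing in for chunk embeddings.
index = [
    ([1.0, 0.0], "proposal form"),
    ([0.0, 1.0], "policy document"),
    ([0.9, 0.1], "application for insurance"),
]
hits = nearest([1.0, 0.0], index, k=2)
print(hits)
```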

In [11]:
## STEP 5: Retrievers
vectordb = FAISS.load_local(
        folder_path = os.path.join(vector_path, db_file_name),
        embeddings  = embedding,
        allow_dangerous_deserialization=True
    ) 

retriever = vectordb.as_retriever()

## 1.3 Retriever

In [16]:
# Testing
retriever.invoke("What is Proposal form?")

[Document(metadata={'source': './docs/m3-f2.pdf', 'page': 1}, page_content='(1) at the stage of proposal, which if accepted result into a\npolicy,\n(2) during the duration of the policy where several alterations\nmay become necessary\n(3) at the end of the policy contract when insurer pays the\nfinal claim.\n2.2.1  Documents needed at the stage of the proposal2.2.2 Proposal form  is the basic format which is filled in by\nthe proposer who wants to take an insurance policy. It can bedefined as  the application for insurance.\nA proposal form has three portions(1) The first gives details about the proposer , his name,\naddress, occupation, the details about the type of insurancethat he wants to take and the name of the nominee towhom the money is payable in case the policyholder doesnot survive to take the maturity amount.\n(2) The second portion relates to the details of the insurance\npolicy  that the proposer already possesses, the present\nhealth conditions and the personal history o

# Part 2: LLM

## 2.1 Prompt Engineering

In [17]:
# Sample prompt: a template with placeholders to fill in later
from langchain.prompts import PromptTemplate

template = "Tell me a {adjective} joke about {content}."
prompt_template = PromptTemplate.from_template(template)
prompt_template

PromptTemplate(input_variables=['adjective', 'content'], input_types={}, partial_variables={}, template='Tell me a {adjective} joke about {content}.')

In [18]:
prompt_template.format(adjective="funny", content="chickens")
# As discussed, this is not a good prompt

'Tell me a funny joke about chickens.'

In [19]:
from langchain.prompts import PromptTemplate

template = """
You are the chatbot and the face of our Insurance Company. Your job is to provide helpful, accurate, and friendly information to prospective and current clients about our insurance services and policies.
Your responsibility is to answer questions only and only related to our insurance offerings. 
Anything unrelated should be responded to with a polite reminder that your primary role is to assist with insurance-related inquiries.
MUST only use the following pieces of context to answer the question at the end. If the answers are not in the context or you are not sure of the answer, just say that you don't know — do not try to make up an answer.

When encountering abusive, offensive, or harmful language, such as fuck, bitch, etc, just politely ask the users to maintain appropriate behavior.

Always make sure to elaborate your response and use a warm, professional, and reassuring tone to reflect the trust and reliability of our brand. Never answer with any unfinished response.

Context : {context}
Question: {question}
Answer:"""

prompt = PromptTemplate(
    template = template, 
    input_variables=["context", "question"]
)

# `context` is filled with the text of the documents returned by the retriever
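
The placeholder mechanism can be illustrated with plain `str.format`, which uses the same `{name}` syntax as the template above. This is a sketch of the idea rather than of `PromptTemplate` itself; the `fill` helper and the shortened template are assumptions made for the example.

```python
# Shortened version of the template's last three lines.
template = "Context : {context}\nQuestion: {question}\nAnswer:"


def fill(template, **values):
    """Substitute named placeholders; str.format raises KeyError if one is missing."""
    return template.format(**values)


filled = fill(
    template,
    context="The policy document is evidence of the insurance contract.",
    question="What is a policy document?",
)
print(filled)
```

Leaving out a value, for example `fill(template, context="...")`, fails with a `KeyError` instead of sending a half-filled prompt to the model.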

## 2.2 Chain = Prompt + Model

In [20]:
chain = prompt | llm

# Invoke with insurance-related topic
result = chain.invoke({
    "context": "Policy document is the evidence of the insurance contract and is a detailed document which mentions all the terms and conditions of the insurance", 
    "question": "What is Policy document"
})

In [21]:
result

AIMessage(content='A policy document is a crucial piece of evidence for your insurance contract. It is a detailed document that outlines all the terms and conditions of your insurance coverage. This document serves as a reference for both you and the insurance company, ensuring that all parties are clear on the specifics of the policy, including coverage limits, exclusions, and the responsibilities of both the insurer and the insured. If you have any further questions about your policy document or need assistance with understanding any part of it, feel free to ask!', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 101, 'prompt_tokens': 230, 'total_tokens': 331, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_b705f0c291', 

In [22]:
print(result.content)

A policy document is a crucial piece of evidence for your insurance contract. It is a detailed document that outlines all the terms and conditions of your insurance coverage. This document serves as a reference for both you and the insurance company, ensuring that all parties are clear on the specifics of the policy, including coverage limits, exclusions, and the responsibilities of both the insurer and the insured. If you have any further questions about your policy document or need assistance with understanding any part of it, feel free to ask!


In [45]:
# Use an output parser if you only want the response text as a string
from langchain_core.output_parsers import StrOutputParser
# Output parser for the response
output_parser = StrOutputParser()

# Chain setup with model and output parser
chain = prompt | llm | output_parser

# Invoke with insurance-related topic
result = chain.invoke({
    "context": "Policy document is the evidence of the insurance contract and is a detailed document which mentions all the terms and conditions of the insurance", 
    "question": "What is Policy document"
})

In [40]:
print(result)

A policy document is a crucial piece of evidence for your insurance contract. It serves as a detailed record that outlines all the terms and conditions associated with your insurance coverage. This document provides important information regarding what is covered, any exclusions, the duration of the policy, and the responsibilities of both the insurer and the insured. It is essential to review your policy document carefully to understand your coverage and ensure that it meets your needs. If you have any further questions about your policy or need assistance with anything else related to our insurance services, feel free to ask!


# Part 3: RAG (Retriever + LLM)

In [23]:
question = input("What is your question?")

In [24]:
question

'What is Policy document'

In [25]:
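# Use a new name so the chunk list `docs` from Part 1 is not overwritten
retrieved_docs = retriever.invoke(question)
# Combine the retrieved documents into a single string
context = "\n-------------------------------\n".join(d.page_content for d in retrieved_docs)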
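
One possible variation is to label each passage with its source and page, using the `metadata` fields visible in the retriever output earlier, so an answer can be traced back to the PDF. The `Doc` class below is a hypothetical stand-in that mirrors only the `page_content` and `metadata` attributes; `format_context` is likewise an illustrative helper, not a LangChain function.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in with the two attributes used in this notebook."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_context(docs, separator="\n-------------------------------\n"):
    """Join passages, prefixing each with its source file and page number."""
    parts = []
    for d in docs:
        source = d.metadata.get("source", "unknown")
        page = d.metadata.get("page", "?")
        parts.append(f"[{source}, page {page}]\n{d.page_content}")
    return separator.join(parts)


sample_docs = [
    Doc("Proposal form is the application for insurance.",
        {"source": "./docs/m3-f2.pdf", "page": 1}),
    Doc("Policy document is the evidence of the contract."),
]
labelled = format_context(sample_docs)
print(labelled)
```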

In [37]:
print(context)

9. Which one of the following statements is correct?
a. IRDA has prescribed proposal forms for all insurers.b. Renewal premium cannot be paid without the renewal
notice.
c. Both (a) and (b) statements are correct.d. Both (a) and (b) statements are wrong.
10. Which one of the following statements is correct?
a. If policy document is lost the insurance contract
becomes void.
b. The family history appears in the personal statement.c. Both (a) and (b) statements are correct.d. Both (a) and (b) statements are wrong.
2.7 ANSWERS TO INTEXT QUESTIONS
2.1
1. At the time of taking the policy, For any endorsement
and at the time of claim
2. Proposal form means where all the particulars of an
-------------------------------
This is possible provided all the terms and conditions, rights
and duties - privileges and obligations are properly documentedin terms which can be clearly interpreted in a court of law.Between  two human beings sometime silence means an2
INSURANCE DOCUMENTS
-------------------

In [27]:
# Use an output parser if you only want the response text as a string
from langchain_core.output_parsers import StrOutputParser
# Output parser for the response
output_parser = StrOutputParser()

# Chain setup with model and output parser
rag_chain = prompt | llm | output_parser

rag_result = rag_chain.invoke({
    "context": context,
    "question": question
})
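
The retrieve, join, and generate steps above can be wrapped in one function. The sketch below assumes only that the retriever and the chain each expose an `invoke` method, as they do in this notebook; the `Fake*` and `Echo*` classes are stubs invented so the example runs without any API access.

```python
def answer_question(question, retriever, chain,
                    separator="\n-------------------------------\n"):
    """Retrieve passages, join them into one context string, and run the chain."""
    retrieved = retriever.invoke(question)
    context = separator.join(d.page_content for d in retrieved)
    return chain.invoke({"context": context, "question": question})


# --- Stubs for illustration only; they are not LangChain classes ---
class FakeDoc:
    def __init__(self, page_content):
        self.page_content = page_content


class FakeRetriever:
    def invoke(self, query):
        return [FakeDoc("Policy document is the evidence of the insurance contract.")]


class EchoChain:
    def invoke(self, inputs):
        return f"Q: {inputs['question']} | C: {inputs['context']}"


reply = answer_question("What is Policy document", FakeRetriever(), EchoChain())
print(reply)
```

With the real objects, the call would be `answer_question(question, retriever, rag_chain)`.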

In [28]:
print(rag_result)

The policy document is a comprehensive and detailed document that serves as evidence of the insurance contract between the insurer and the insured. It outlines all the terms and conditions of the insurance agreement, making it clear that the insured is purchasing the right to receive a sum of money at a future date, contingent upon the insured fulfilling their obligation to pay the premiums as scheduled.

The preamble of the policy document clarifies that the policy is issued subject to the conditions and privileges printed on the back of the policy. Additionally, any endorsements placed on the policy are considered part of the contract, and the statements provided in the proposal form are foundational to the agreement.

The policy document includes essential information such as the identification number of the policy, the name of the policyholder, the date of commencement, the beneficiary's name and address, the type of policy, and the details regarding premium payments. It is crucial