# FSI Use-case: Insurance CoPilot 

In this example we will explore how GenAI can be used to improve customer experience in a Contact Center setting. 

## Use-case details
Imagine a scenario where customers call an insurance company's helpline to ask about their insurance policy. Currently, an agent has to find the right policy for the customer, open the document, and scan through page after page to find the part that answers the customer's question. In the meantime, the customer has to wait on the call.

With Generative AI, we can improve the customer's experience on the call by making it easy for the Contact Center agent to find the right policy and to ask a chatbot questions that are answered from that policy.

## Solution

As discussed in Chapter 4, we will implement this use-case using the Retrieval Augmented Generation (RAG) architecture. 

![architecture diagram](./contact-center-workflow.png)
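
To make the flow in the diagram concrete, here is a minimal, dependency-free sketch of the three RAG steps: retrieve, augment, generate. Everything in it is a stand-in for illustration: `retrieve` ranks passages by shared words rather than by embeddings, `fake_llm` is a hypothetical placeholder for a real model call, and the sample passages are invented. The real components are built step by step in the rest of this notebook.

```python
import re

# Invented sample passages standing in for indexed policy chunks
PASSAGES = [
    "Accidental damage to underground pipes and cables is covered.",
    "Claims must be reported within 30 days of the incident.",
    "The policy excludes damage caused by wear and tear.",
]

def tokenize(text):
    """Lowercase a string and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, passages, k=1):
    """Step 1: rank passages by the number of words shared with the question."""
    q = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]

def augment(question, context):
    """Step 2: build a prompt that contains the retrieved context."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"

def fake_llm(prompt):
    """Step 3 placeholder: a real system would call an LLM here."""
    return f"[model answer based on a prompt of {len(prompt)} characters]"

question = "Is accidental damage to pipes covered?"
context = retrieve(question, PASSAGES)
prompt = augment(question, context)
print(prompt)
print(fake_llm(prompt))
```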

In [1]:
# Installing required libraries
!pip3 install -r requirements.txt

Collecting boto3 (from -r requirements.txt (line 1))
  Using cached boto3-1.34.84-py3-none-any.whl (139 kB)
Collecting langchain (from -r requirements.txt (line 2))
  Using cached langchain-0.1.16-py3-none-any.whl (817 kB)
Collecting chromadb (from -r requirements.txt (line 3))
  Using cached chromadb-0.4.24-py3-none-any.whl (525 kB)
Collecting sentence-transformers (from -r requirements.txt (line 4))
  Using cached sentence_transformers-2.6.1-py3-none-any.whl (163 kB)
Collecting tiktoken (from -r requirements.txt (line 5))
  Using cached tiktoken-0.6.0-cp311-cp311-macosx_10_9_x86_64.whl (999 kB)
Collecting pypdf (from -r requirements.txt (line 6))
  Using cached pypdf-4.2.0-py3-none-any.whl (290 kB)
Collecting botocore<1.35.0,>=1.34.84 (from boto3->-r requirements.txt (line 1))
  Using cached botocore-1.34.84-py3-none-any.whl (12.1 MB)
Collecting jmespath<2.0.0,>=0.7.1 (from boto3->-r requirements.txt (line 1))
  Using cached jmespath-1.0.1-py3-none-any.whl (20 kB)
Collecting s3transf

## Indexing the policy documents 

As the first step, we prepare the documents and index them in a vector database. In this example we use a local ChromaDB vector database, but for a full deployment you can use other open-source options such as OpenSearch or pgvector for PostgreSQL, or proprietary ones such as Pinecone.
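
Conceptually, a vector database stores one embedding vector per chunk and returns the chunks whose vectors are closest to the query vector. The sketch below illustrates that idea with a tiny in-memory index and cosine similarity. The `TinyVectorIndex` class and the three-dimensional vectors are hypothetical and hand-picked for illustration; production vector databases work with embeddings from a real model and typically use approximate nearest-neighbor indexes instead of a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class TinyVectorIndex:
    """Hypothetical in-memory index: stores (vector, text) pairs and scans them linearly."""

    def __init__(self):
        self.items = []

    def add(self, vector, text):
        self.items.append((vector, text))

    def search(self, query_vector, k=2):
        # Score every stored vector against the query, highest similarity first
        scored = [(cosine_similarity(query_vector, v), t) for v, t in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

index = TinyVectorIndex()
index.add([1.0, 0.0, 0.0], "accidental damage cover")
index.add([0.0, 1.0, 0.0], "claims procedure")
index.add([0.9, 0.1, 0.0], "damage to pipes and cables")

results = index.search([1.0, 0.0, 0.0], k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```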

In [2]:
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter, TokenTextSplitter

from langchain_community.llms import Bedrock
from langchain_community.embeddings import HuggingFaceEmbeddings

import os
import boto3


For this example we are using the following models:

* LLM: Llama 2 13b Chat model available through Amazon Bedrock
* Embedding: Sentence Transformers embedding available through HuggingFace

You can modify this example and call your favorite LLM and Embedding model as well. 

In [3]:
DEFAULT_MODEL_ID = "meta.llama2-13b-chat-v1"

def get_llm(model_id=DEFAULT_MODEL_ID, aws_region="us-east-1"):
    bedrock = boto3.client(service_name='bedrock-runtime', region_name=aws_region)
    llm = Bedrock(
        model_id=model_id,
        region_name=aws_region,
        client=bedrock
    )
    return llm


def get_embedding_model():
    return HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

Let's write a utility function that performs all the steps needed to index the documents:

* loads PDF files from a directory (or a single PDF file),
* splits them into chunks,
* embeds the chunks and stores them in a local ChromaDB vector database through the interface provided by LangChain.

In this example we use a publicly available sample insurance policy from the AXA Insurance website, available [here](http://www.axainsurance.com/home/policy-wording/policywording_153.pdf).
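
The splitting step uses overlapping chunks so that text near a chunk boundary also appears in the neighboring chunk. Below is a rough sketch of the idea on a list of words. The `split_with_overlap` helper is hypothetical and splits on whitespace; the `TokenTextSplitter` used in this notebook counts model tokens (via `tiktoken`) rather than words, so its chunk boundaries will differ.

```python
def split_with_overlap(words, chunk_size, chunk_overlap):
    """Split a list of words into chunks of at most chunk_size words,
    where consecutive chunks share chunk_overlap words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

words = "we will pay for accidental damage to cables and underground pipes".split()
chunks = split_with_overlap(words, chunk_size=5, chunk_overlap=2)
for chunk in chunks:
    print(" ".join(chunk))
```

Note that the overlap has to be smaller than the chunk size; otherwise the window would never move forward.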

In [4]:
def index_docs(file_path, embeddings, chunk_size=1000):
    # Load PDFs from a directory, or a single PDF file
    if os.path.isdir(file_path):
        loader = DirectoryLoader(file_path, loader_cls=PyPDFLoader, show_progress=True)
    else:
        loader = PyPDFLoader(file_path)

    print("Loading documents from path:", file_path)
    documents = loader.load()

    print("Splitting documents. chunk_size=", chunk_size)
    # For splitting by characters, use CharacterTextSplitter class
    # text_splitter = CharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=50)
    # For splitting by tokens, use TokenTextSplitter class
    text_splitter = TokenTextSplitter(chunk_size=chunk_size, chunk_overlap=50)

    # Split every loaded page into chunks of at most chunk_size tokens
    texts = text_splitter.split_documents(documents)

    print("Calculating embedding and storing in vector db")
    db = Chroma.from_documents(texts, embeddings)
    return db


Now let's load and index the documents in our local folder.

In [5]:
CHUNK_SIZE = 1000

db = index_docs(
    file_path="files/",
    embeddings=get_embedding_model(),
    chunk_size = CHUNK_SIZE, 
)


Loading documents from path: files/


100%|██████████| 1/1 [00:01<00:00,  1.82s/it]


Splitting documents. chunk_size= 1000
Calculating embedding and storing in vector db


## Querying the policy documents

In [6]:
def search_docs(user_query, db, k=2):
    # Obtain the top-k most similar chunks from the vector DB
    docs = db.similarity_search_with_score(user_query, k=k)

    # For each retrieved chunk, capture the text and source metadata for displaying references
    docs = [{"content": doc.page_content, "source": doc.metadata["source"]} for doc, _score in docs]

    return docs
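
`similarity_search_with_score` returns `(document, score)` pairs, and `search_docs` discards the score. With Chroma the score is a distance, so lower values mean closer matches. One optional refinement, sketched below, is to drop weak matches before they reach the prompt. The `filter_by_distance` helper, the `max_distance` value, and the `FakeDoc` stand-in are all hypothetical; a sensible threshold depends on the embedding model and distance metric and would need to be tuned on real queries.

```python
from dataclasses import dataclass, field

@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document (illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def filter_by_distance(results, max_distance):
    """Keep (document, distance) pairs whose distance is at most max_distance."""
    return [(doc, dist) for doc, dist in results if dist <= max_distance]

# Invented results in the (document, distance) shape returned by the vector store
results = [
    (FakeDoc("Accidental damage to cables...", {"source": "files/policy.pdf"}), 0.42),
    (FakeDoc("How to make a claim...", {"source": "files/policy.pdf"}), 1.31),
]

kept = filter_by_distance(results, max_distance=1.0)
print([doc.page_content for doc, _ in kept])
```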

In [7]:
QNA_PROMPT = """
\n\nHuman: You are a financial AI, an artificial intelligence developed to answer questions about finances and investments. 
Use the following documents and the information contained therein to answer the following question and provide relevant information: "{question}".

The text of the document is within the <text></text> XML tags: {documents}

Use these documents to formulate your own answer to the question "{question}", as if you were directly answering the question. Ensure that your answer is correct and does not contain any information that cannot be directly taken from the documents. Do not directly cite the document or metadata.

Assistant:"""

HUMAN_PROMPT = "Human:"

def ask_llm(user_query, docs, llm):
    qna_prompt = QNA_PROMPT.format(question = user_query, documents=docs)
    print("QNA Prompt: ",qna_prompt)
    answer = llm.invoke(
        input=qna_prompt,
        stop=[HUMAN_PROMPT]
    )
    return answer
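
The prompt tells the model that the document text sits inside `<text></text>` tags, while `ask_llm` passes the list of dictionaries unchanged, so the model sees its Python representation. If you want the prompt to match its own description, one option is to render the chunks explicitly before formatting the prompt. The `format_docs` helper below is a hypothetical sketch, not part of the original notebook; it assumes the `content`/`source` dictionaries produced by `search_docs`.

```python
def format_docs(docs):
    """Wrap each retrieved chunk in <text> tags, one block per chunk."""
    blocks = []
    for doc in docs:
        content = doc["content"].strip()
        blocks.append(f"<text>\n{content}\n</text>")
    return "\n".join(blocks)

# Invented chunks in the shape returned by search_docs
docs = [
    {"content": " Accidental damage to cables and pipes. ", "source": "files/policy.pdf"},
    {"content": "Costs of finding the source of the damage.", "source": "files/policy.pdf"},
]

formatted = format_docs(docs)
print(formatted)
```

The result could then be passed as `documents=format_docs(docs)` when calling `QNA_PROMPT.format`.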

In [8]:
user_query = "What type of accidental breakage is covered in Home Insurance?"

# First search the vector DB to retrieve chunks of relevant information from the documents.
docs = search_docs(user_query, db)

# Ask the question to LLM, by augmenting the query with relevant chunks of information from documents.
llm = get_llm()
answer = ask_llm(user_query, docs, llm)

print("Answer: ", answer)
for doc in docs:
    print("# Source: " + doc["source"].split("/")[-1])
    print(doc["content"][:2000].replace("#", "") + "...")   # display only first 2k chars for brevity



QNA Prompt:  


Human: You are a financial AI, an artificial intelligence developed to answer questions about finances and investments. 
Use the following documents and the information contained therein to answer the following question and provide relevant information: "What type of accidental breakage is covered in Home Insurance?".

The text of the document is within the <text></text> XML tags: [{'content': ' \nADH 15. 10a \n6 Accidental Damage (optional extra)  \nYour policy schedule will show if you have chosen this section.  \n \n7. Accidental damage to cables, drain inspection \ncovers and underground drains, pipes or tanks \nproviding services to or from the home  and for \nwhich you are responsible.  \n \nWe will also pay up to the limit for any one claim \nfor necessary and reasonable costs that you \nincur in finding the source of the damage to the \nhome .  This includes reinstating any wall, floor, \nceiling, drive, fence or path removed or damaged \nduring the search  \n \