# Personalized Question-Answering System for BFSI Policy Documents
This notebook demonstrates a customer-centric application that provides personalized question answering for BFSI (banking, financial services, and insurance) customers. The primary goal is to enhance the customer experience by letting customers upload their policy documents, such as terms & conditions, and obtain instant answers to their questions about those policies. The application uses Generative AI models to analyze the uploaded documents and respond to the user's queries.

## Setting up the dependencies and prerequisites

This demonstration requires the following dependencies:
* **openai**: The official Python client for the OpenAI API. It is used here to call OpenAI's embedding and language models.
* **chromadb**: Chroma is an open-source embedding database that allows developers to build Python or JavaScript LLM apps with memory. Here it stores the document embeddings that are searched at query time.
* **langchain**: LangChain is a Python framework for building applications on top of large language models. Here it connects the PDF loader, the vector store, and the OpenAI models so that questions can be answered from the uploaded document.
* **tiktoken**: Tiktoken is a fast BPE tokeniser for use with OpenAI's models. It is used to encode and decode text data for use with OpenAI models.
* **pypdf**: PyPDF is a Python library for working with PDF files. It allows developers to extract text and metadata from PDF files.

In [None]:
!pip install openai chromadb langchain tiktoken
!pip install pypdf

**What is ChromaDB and how is it used with OpenAI?**

* ChromaDB is an open-source embedding (vector) database that is designed to provide efficient, scalable, and flexible ways to store and search embeddings.
* It is used in combination with OpenAI to query text files. Chroma provides a convenient wrapper around OpenAI's embedding API, which runs remotely on OpenAI's servers and requires an API key.
* The `openai` Python package is required to use the OpenAI embedding models. It can be installed with `pip install openai`.
* Chroma stores the embeddings and returns the entries most similar to a query embedding. Used on its own, Chroma defaults to a Sentence Transformers model to encode text into embeddings; in this notebook, LangChain is expected to supply OpenAI embeddings instead.
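To make the idea of "store embeddings, then search them" concrete, here is a minimal sketch of an in-memory vector store. It is not the Chroma API: `ToyVectorStore`, `embed`, and `cosine` are hypothetical names, and the bag-of-words "embedding" is only a stand-in for a real embedding model.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store for illustration; not the Chroma API."""

    def __init__(self):
        self._items = []  # list of (text, vector) pairs

    def add(self, text):
        self._items.append((text, embed(text)))

    def query(self, question, k=1):
        # Rank stored texts by similarity to the question and keep the top k.
        q = embed(question)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add("lounge access is included with the premium travel card")
store.add("mutual fund returns fluctuate with market conditions")
top = store.query("why do my mutual fund returns fluctuate", k=1)
```

A real embedding database follows the same add-then-query pattern, but uses learned embeddings and an approximate nearest-neighbour index rather than a full scan.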

**Brief overview of the BPE tokenizer**

* Byte-Pair Encoding (BPE) is a subword-based tokenization algorithm used by OpenAI's GPT-3.5 Turbo.
* It splits text into smaller chunks called tokens, which are then mapped to integer IDs that can be fed to an NLP model.
* BPE ensures that the most common words are represented in the vocabulary as a single token while the rare words are broken down into two or more subword tokens.
* The BPE algorithm works by searching for the most frequent pair of existing tokens (by “pair,” here we mean two consecutive tokens in a word) at any step during the tokenizer training, and then merging them.
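The merge step described above can be sketched in a few lines. This is a toy illustration of one BPE training step on a made-up word-frequency table, not the tiktoken implementation; `most_frequent_pair` and `merge_pair` are hypothetical helper names.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return max(pairs, key=pairs.get)


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged


# Toy corpus (made up): each word is a tuple of characters mapped to its frequency.
corpus = {
    tuple("hug"): 10,
    tuple("pug"): 5,
    tuple("pun"): 12,
    tuple("bun"): 4,
    tuple("hugs"): 5,
}
first_pair = most_frequent_pair(corpus)
corpus_after_merge = merge_pair(corpus, first_pair)
```

In this corpus the pair `("u", "g")` occurs 20 times, more than any other pair, so it is merged first and `hug` becomes the two symbols `h` and `ug`. Training repeats this step until the vocabulary reaches the desired size.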

## Connecting drive
This step is optional. Here we mount Google Drive in the notebook environment so that files stored in Drive can be accessed directly from this notebook.

In [None]:
from google.colab import drive
drive.mount('/content/gDrive')

Mounted at /content/gDrive


## Importing dependencies and prerequisites

In [None]:
import os
from langchain.document_loaders import PyPDFLoader
from langchain.indexes import VectorstoreIndexCreator

**`os` module**
* The `os` module is a standard Python module that provides a way to interact with the operating system. It contains functions for working with files and directories, reading and setting environment variables, and executing system commands. Here we will use `os.environ`, a dictionary-like object that holds the process's environment variables.

**Document Loader**
* LangChain can process documents from various sources through its `document_loaders` module. Document loaders load data from a source and make it available for use with language models. LangChain provides loaders for many document types, ranging from PDFs and emails to websites and YouTube videos. Here we will use the PDF loader.

**Indexes**
* `Indexes` enable LangChain to connect to various data sources, such as databases, APIs, and other external tools, and use them as inputs for language models. LangChain provides standard, extendable interfaces for each module, including indexes, and also provides external integrations and end-to-end implementations for off-the-shelf use. By enabling language models to interact with APIs, LangChain allows them to access up-to-date information and take actions.
* `VectorstoreIndexCreator` is used to create an index for querying text documents. It splits the documents into smaller chunks, embeds them, and stores them so that relevant passages can be retrieved efficiently. It is designed to work with various vector databases, such as DuckDB, Chroma, FAISS, Milvus, and Pinecone.
* `VectorstoreIndexCreator` simplifies creating and managing indexes for question-answering tasks in LangChain. It wraps text splitting and embedding behind a single call, so these steps do not have to be written separately.

## Open AI configuration

In [None]:
OPENAI_API_KEY = "your-api-key"
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

Here we set the OpenAI API key as an environment variable in Python.
* The first line assigns the API key to a variable called `OPENAI_API_KEY`. Replace the placeholder with your own key.
* The second line uses the `os` module to set the environment variable `OPENAI_API_KEY` to the value of that variable.

The OpenAI client and LangChain read the key from this environment variable, so it does not have to be passed to each call. Note that in this cell the key is still written in the notebook itself, so avoid sharing or committing a notebook that contains a real key.
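One way to keep the key out of the notebook entirely is to read it from the environment and prompt for it only when it is missing. The sketch below is an assumption about how this could be done, not part of the original demonstration; `load_api_key` is a hypothetical helper name.

```python
import os
from getpass import getpass


def load_api_key(env_var="OPENAI_API_KEY", prompt=getpass):
    """Return the key from the environment, prompting only if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        # Prompt without echoing the key, then store it for this process only.
        key = prompt(f"Enter {env_var}: ")
        os.environ[env_var] = key
    return key
```

Calling `load_api_key()` in place of the hard-coded assignment leaves `os.environ["OPENAI_API_KEY"]` set for the rest of the session without the key ever appearing in a saved cell.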

## Reading Documents and Mapping

In [None]:
# Document loader
loader = PyPDFLoader("/content/gDrive/MyDrive/29th July workshop/Product Catalog and Terms & Conditions Documentation.pdf")
# Index that wraps above steps
index = VectorstoreIndexCreator().from_loaders([loader])

* Here we use the `PyPDFLoader` and `VectorstoreIndexCreator` components of LangChain to create an index for querying text documents.

* First we create a `PyPDFLoader` object that loads a PDF document from a specific file path.

* Then we create an index over the document using the `VectorstoreIndexCreator` component. Its `from_loaders` method takes a list of loaders, which in this case contains only the `PyPDFLoader` object created previously. The `VectorstoreIndexCreator` splits the PDF document into smaller chunks and creates an index that enables efficient retrieval of information from the indexed document.
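The chunking step can be illustrated with a simple character-based splitter. This is a minimal sketch, not LangChain's text splitter; `split_text` is a hypothetical helper, and the chunk size and overlap values are illustrative assumptions rather than the library's defaults.

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character chunks that overlap with their neighbours."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters standing in for a page of policy text
chunks = split_text(sample, chunk_size=12, chunk_overlap=4)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.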

## Question-Answering

***Explanation of the Prompt***

Here we describe the customer and the product with which the customer is facing an issue.

* I am Emily, a frequent traveller, and I recently applied for your **premium travel credit card**.

* I was attracted to the card's travel benefits, including airport lounge access and travel insurance, which align perfectly with my lifestyle.

* After receiving the credit card, I used it to book a flight and hotel for an upcoming vacation.


Here we describe the problem the customer faced.

* Unfortunately, during my trip, I faced a problem with my credit card.
* When I tried to pay at a local restaurant, the card was declined, even though I knew I had available credit.
* I felt embarrassed and worried, as I depended on the card for my travel expenses.

In [None]:
question = """I am Emily, a frequent traveller, and I recently applied for your premium travel credit card.
I was attracted to the card's travel benefits, including airport lounge access and travel insurance, which align perfectly with my lifestyle.
After receiving the credit card, I used it to book a flight and hotel for an upcoming vacation.
Unfortunately, during my trip, I faced a problem with my credit card.
When I tried to pay at a local restaurant, the card was declined, even though I knew I had available credit.
I felt embarrassed and worried, as I depended on the card for my travel expenses.
"""
index.query(question)

" I'm sorry to hear that you faced an issue with your credit card. Our customer support team was able to look into the matter promptly and found that there was a temporary transaction block placed on your card as a security measure, suspecting potential fraudulent activity due to the sudden international transactions during your trip. The representative immediately lifted the transaction block on your credit card and apologized for any inconvenience caused. They also informed you that you should have received a notification about the transaction block before your trip, but due to an oversight, the notification may not have reached you. To ensure a smooth experience for the rest of your trip, the customer support representative offered you assistance with any other issues you might encounter during your travel. They also reiterated the card's travel benefits and reminded you of the terms and conditions specified in our credit card agreement. I hope this helps."

* To query the document, we use the index created by the `VectorstoreIndexCreator`. The index holds the data and structure needed to efficiently retrieve the passages relevant to the provided question. This process is called data retrieval. The image below illustrates how a document is stored in a database.

![Data Retrieval](https://drive.google.com/uc?export=view&id=1LknznxS-TR3bI40cFTHh3xUI2n9MTCEQ)



* The query(question) method is called on the index object, passing the question as the input. This triggers the search process within the index to find relevant documents or information related to the question.

* The specific implementation and behavior of the query() method may vary depending on the details of the VectorstoreIndexCreator and the underlying indexing mechanism being used. However, in general, the query() method performs a search within the index based on the provided question and returns the relevant results.
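After the relevant chunks are found, a question-answering pipeline typically places them in a prompt together with the question and sends that prompt to the language model. The sketch below shows this general pattern only; it is not LangChain's internal prompt. `build_prompt` is a hypothetical helper, and the two chunks are made-up placeholder text, not content from the PDF.

```python
def build_prompt(question, retrieved_chunks):
    """Combine retrieved chunks and the question into a single prompt string."""
    context = "\n\n".join(chunk.strip() for chunk in retrieved_chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


# Placeholder chunks (made up for illustration).
chunks = [
    "A temporary block may be placed on the card when unusual activity is detected.",
    "Cardholders can contact customer support to lift a transaction block.",
]
prompt = build_prompt("Why was my card declined abroad?", chunks)
```

Because the model only sees the chunks that were retrieved, the quality of the answer depends on the retrieval step returning the right passages.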

***Explanation of the prompt:***

Customer and product description:

* Hi! I am John, an existing investor with your financial institution.
* I have been holding mutual fund investments with your firm for several years, and I value the advice and services provided by your team.
* Recently, I noticed some fluctuations in the performance of one of the mutual funds in my portfolio.

Problem description:

* I became concerned about the recent performance of the mutual fund, so I decided to seek clarification from your financial advisors.
* I was unsure about the reasons behind the fluctuations and wanted to understand if any action was required on my part.

In [None]:
# Question-answering
question = """Hi! I am John, an existing investor with your financial institution.
I have been holding mutual fund investments with your firm for several years, and I value the advice and services provided by your team.
Recently, I noticed some fluctuations in the performance of one of the mutual funds in my portfolio.
I became concerned about the recent performance of the mutual fund, so I decided to seek clarification from your financial advisors.
I was unsure about the reasons behind the fluctuations and wanted to understand if any action was required on my part.
"""
index.query(question)

' Hi John, thank you for being an existing investor with our financial institution. We value your loyalty and appreciate your trust in our services. We understand your concern about the recent performance of the mutual fund in your portfolio. We recommend scheduling a consultation with one of our financial advisors. During this meeting, we will discuss your financial goals, investment objectives, and any other relevant information to determine how our advisory services can best assist you. Please note that our advisory services are not suitable for everyone, and the final decision to engage in our services should be made after careful consideration.'