# Introduction
Companies all over the world are in constant communication with their clients to solve problems
that involve their products. After a while, some patterns will begin to emerge. Customer service
managers looking to make efficient use of their agents' time, will compile a list of Frequently
Asked Questions (FAQs) and put it up on their website. They will then have this as a first level of
support before the more complex queries can be shifted to a human agent. However, it doesn’t
always work out that way.
Why:
1. If you have many products or a big FAQ, no one will take their time to comb through it to
find the specific question related to their issue.
2. People like it when they talk to others or feel like they have talked to others.
Enter Large Language Models (LLMs). This project seeks to leverage the power of LLMs to
produce human-like natural responses to questions by fine-tuning an LLM using proprietary
data so that it can chat with customers and handle their most frequent queries. Customers can
interact with the model via chat, type their questions and have them answered immediately.
This system will be a huge time and resource saver for companies.

FAQ Chatbot is built with these core frameworks and modules:

- [**Streamlit**](https://streamlit.io/) - To create the web app UI and interactivity.
- [**Google PaLM**](https://ai.google/discover/palm2/) - LLM.
- [**Instructor Embeddings**](https://instructor-embedding.github.io/) - Used to create vector embeddings for the proprietary documents and the user queries.
- [**FAISS**](https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/) - Facebook AI Similarity Search, a vector database to store word embeddings.
- [**Langchain**](https://www.langchain.com/) - A Python Library for developing applications powered by LLM's.
- [**Dataset**](https://huggingface.co/datasets/clips/mfaq ) - Obtained from the Pivdenny bank FAQs via HuggingFace.

## 📈 **Future Roadmap**

Some potential features for future releases:

- User account system.
- Customise the prompt template and model hyperparameters
- Ability to create multiple knowledgebases.

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('banking.csv')
df.head()

Unnamed: 0,question,response
0,A transfer has not been credited to the card. ...,"If you do not receive the funds, we recommend ..."
1,Am I eligible for a loan from Pivdenny Bank if...,"Yes, if the borrower’s income supported by doc..."
2,Are the terms of the grace credit period appli...,Yes. The payables on the additional card are a...
3,"Are there any additional features in the ""Clie...","Additional features of the ""Client Bank"" inclu..."
4,Are there any additional features in the Clien...,Additional features of the Client Bank include...


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 273 entries, 0 to 272
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   question  273 non-null    object
 1   response  273 non-null    object
dtypes: object(2)
memory usage: 4.4+ KB


- The data consists of a list of frequently asked questions and their standard corresponding answers. Agents deal with these questions and their variations 80% of the time. 
- Dues to the structured and repetitive nature of this task, this is a prime candidate for automation using AI, specifically, RAG(Retrieval-Augmented Generation).
- **RAG** is the process of optimising the output of a LLM so that it references a knowledge base outside of its training data sets thereby generating a richer and better resposnse for a specific use case.

![Project design](project_design.png)

The first step is to connect to our LLM of choice. Here we will use Google PaLM as it is completely free but the code can be swapped out for any other LLM.

In [10]:
from langchain_google_genai import ChatGoogleGenerativeAI

# api_key = 'AIzaSyAQcMXAEW1UFdXuq5N_ucj8mRp_80WP0gc'
# the temperature variable decides how creative the model can be, 0 is not and 1 is very
llm = ChatGoogleGenerativeAI(model='models/text-bison-002', google_api_key=api_key, temperature=1)

In [11]:
# Testing our LLM
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
poem_prompt = PromptTemplate.from_template('write a haiku about AI.')
prompt_chain = LLMChain(llm=llm, prompt=poem_prompt)

In [30]:
import google.generativeai as genai

genai.configure(api_key = api_key)
model = genai.GenerativeModel('gemini-1.5-pro')

In [None]:
response = model.generate_content("List 5 planets each with an interesting fact")
print(response.text)

In [25]:
print(genai.list_tuned_models)

<function list_tuned_models at 0x0000014F9DB28280>


In [1]:
from langchain_community.llms import GooglePalm

plamllm = GooglePalm(google_api_key=api_key, temperature=0.9)

NameError: name 'api_key' is not defined

In [34]:
haiku = plamllm('write a haiku about ai')
print(haiku)

  warn_deprecated(


**AI, our future friend**
**A helping hand, a guiding light**
**A brighter tomorrow**


In [1]:
# load the proprietary data
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.document_loaders import GithubFileLoader

# loader = CSVLoader(file_path='banking.csv', source_column='question', encoding='latin-1')

loader = GithubFileLoader(
        repo="atonui/pds",  # the repo name
        # access_token='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',
        github_api_url="https://api.github.com",
        file_extension = '.csv',
        file_filter=lambda file_path: file_path.endswith(
            ".csv"
            ),  # load all csv files.
    )

data = loader.load()

In [2]:
loader_new = GithubFileLoader(
        repo="atonui/pds",  # the repo name
        # access_token='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',
        github_api_url="https://api.github.com",
        file_extension = '.csv',
    )

data_new = loader.load()

In [3]:
data_new[7].page_content



In [4]:
for item in range(0, len(data_new), 1):
    if data_new[item].metadata['path'] == 'banking.csv':
        doc_data_new = data_new[item]
        print(item)

7


In [6]:
doc_data_new.type

'Document'

In [7]:
for item in range(0, len(data), 1):
    if data[item].metadata['path'] == 'banking.csv':
        doc_data = data[item]
        print(item)

7


In [8]:
doc_data



In [9]:
data[7].type

'Document'

In [None]:
data[7]



In [None]:
data[7].page_content



https://python.langchain.com/docs/integrations/document_loaders/github/ 

In [None]:
# create embeddings
from langchain_community.embeddings import HuggingFaceInstructEmbeddings
from InstructorEmbedding import INSTRUCTOR
from langchain_community.vectorstores import FAISS # to create vector database

instructor_embeddings = HuggingFaceInstructEmbeddings(
    query_instruction="Represent the query for retrieval:"
)

vectordb = FAISS.from_documents(documents=doc_data_new, embedding = instructor_embeddings)

load INSTRUCTOR_Transformer
max_seq_length  512


AttributeError: 'tuple' object has no attribute 'page_content'

In [None]:
from sklearn.metrics.pairwise import cosine_similarity
sentence_a = ['Great expectations make disappointed men.']
sentence_b = ['Hope has two beautiful daughters.']

model = INSTRUCTOR('hkunlp/instructor-large')

embeddings_a = model.encode(sentence_a)
embeddings_b = model.encode(sentence_b)

print(cosine_similarity(embeddings_a, embeddings_b))

load INSTRUCTOR_Transformer
max_seq_length  512
[[0.8053013]]


In [None]:
sentences_a = [['Represent the Art sentence: ','Parton energy loss in QCD matter'], 
               ['Represent the Financial statement: ','The Federal Reserve on Wednesday raised its benchmark interest rate.']]
sentences_b = [['Represent the Science sentence: ','The Chiral Phase Transition in Dissipative Dynamics'],
               ['Represent the Financial statement: ','The funds rose less than 0.5 per cent on Friday']]
embeddings_a = model.encode(sentences_a)
embeddings_b = model.encode(sentences_b)
similarities = cosine_similarity(embeddings_a,embeddings_b)
print(similarities)

[[0.8090116  0.7284529 ]
 [0.6770725  0.81411076]]


In [None]:
retriever = vectordb.as_retriever() # creates an embedding and compares it with the vector database and returns similar embeddings - comparable to the cursor object in SQLite
rdocs = retriever.get_relevant_documents('Some money I deposited has not been moved to the card. What should I do?')
rdocs

[Document(page_content='question: A transfer has not been credited to the card. What should I do?\nresponse: If you do not receive the funds, we recommend contacting the support team support@portmone.com or calling 044 200-09-02. You can also contact the issuing bank to check the details of the authorisation and clarify why the funds have not been credited to the account.', metadata={'source': 'A transfer has not been credited to the card. What should I do?', 'row': 0}),
 Document(page_content="question: Funds from the sender's card were written off twice. What should I do?\nresponse: If the funds in your account are written off twice, this is a bank error. In this case, the funds will be automatically returned to your account.\r\nIf funds are not returned to your card, be sure to contact the help desk of the bank that issued your card, as well as the support service support@portmone.com or call 044Â\xa0200-09-02.", metadata={'source': "Funds from the sender's card were written off twi

In [None]:
from langchain.prompts import PromptTemplate

prompt_template = """Given the following context and a question, generate an answer based on this context only.
In the answer try to provide as much text as possible from "response" section on the source document without making up anything.
If the answer is not found in the context, kindly state "I do not know." Do not try to make up an answer.
CONTEXT: {context}
QUESTION: {question}
"""
PROMPT = PromptTemplate(
    template = prompt_template, input_variables=['context', 'question']
)

In [None]:
from langchain.chains import RetrievalQA
# from langchain_community import RetrievalAQ

chain = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type = 'stuff',
            retriever = retriever,
            input_key = 'query',
            return_source_documents=True,
            chain_type_kwargs={'prompt':PROMPT}
            )

In [None]:
chain('do you have a limit on card transactions?')

  warn_deprecated(


{'query': 'do you have a limit on card transactions?',
 'result': 'response: Yes, there are limits for "Payment" services:\r\n- maximum amount of one transfer â€“ UAH 50,000;- maximum amount of transfers per day â€“ UAH 150,000;No limits are established on transfers between accounts belonging to the same person.',
 'source_documents': [Document(page_content='question: Are there any limits on transactions?\nresponse: Yes, there are limits for "Payment" services:\r\n- maximum amount of one transfer â\x80\x93 UAH 50,000;- maximum amount of transfers per day â\x80\x93 UAH 150,000;No limits are established on transfers between accounts belonging to the same person.The "Bill Payment" service may be subject to minimum/maximum payment limits depending on the type and provider of the service paid by the client. During entry of the amount, the system will advise you concerning the amount eligible for payment.', metadata={'source': 'Are there any limits on transactions?', 'row': 5}),
  Document(p

In [None]:
chain('Some money I deposited has not been moved to the card. What should I do?')

{'query': 'Some money I deposited has not been moved to the card. What should I do?',
 'result': 'response: If money deposited to your card has not been credited yet, call our hotline +380 44 200-09-02 or fill out a form on our website.',
 'source_documents': [Document(page_content='question: A transfer has not been credited to the card. What should I do?\nresponse: If you do not receive the funds, we recommend contacting the support team support@portmone.com or calling 044 200-09-02. You can also contact the issuing bank to check the details of the authorisation and clarify why the funds have not been credited to the account.', metadata={'source': 'A transfer has not been credited to the card. What should I do?', 'row': 0}),
  Document(page_content="question: Funds from the sender's card were written off twice. What should I do?\nresponse: If the funds in your account are written off twice, this is a bank error. In this case, the funds will be automatically returned to your account.\r