# *The Intern Academy*
## *Done By:*
## *1. Yash Ajit Paddalwar*
## *2. Chaitanya Gajanan Yadav*
## *3. Sarvesh Gore*


# *TASK 3:*

Problem Statement:

Banking today is largely automated. However, many people run into problems with online banking,
and many are unaware of the correct safety measures for their accounts. The task is to create a
bot, using Natural Language Processing and deep learning, that can answer various banking-related
questions, such as your bank ID, transaction details, security services, loan policies, and
account status.

The objectives are:

1. To create a mobile bot app that can answer all your banking-related
queries.

2. To keep you updated on your transaction activity so that any kind of fraud can be
avoided.

3. To alert you if any suspicious activity is found in your account.

# DONE BY SARVESH GORE

In [1]:
## Let's import all the libraries
import pandas as pd
import numpy as np
import pickle
import operator

In [2]:
## Import the scikit-learn tools for preprocessing, classification, and similarity
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split as tts
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder as LE
from sklearn.metrics.pairwise import cosine_similarity

In [4]:
## Importing random and nltk
import random
import nltk

In [8]:
## NLTK (Natural Language Toolkit) is a set of libraries for Natural Language Processing. It is a platform for building Python programs that process natural language, and it is commonly used for Q&A systems and chatbots.
## The random module provides functions for generating random numbers and making random selections.
## Stemming is a technique that reduces words to their base form by removing affixes, much like cutting the branches of a tree back to its stem. For example, the stem of the words eating, eats, and eaten is eat.
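
To make the idea of stemming concrete, here is a minimal sketch of suffix stripping. It is a toy with a made-up three-rule suffix list (`SUFFIXES` and `toy_stem` are hypothetical names), not the Lancaster algorithm that the notebook uses, which applies a much larger rule table.

```python
# Toy suffix-stripping stemmer: an illustration only, NOT the Lancaster algorithm.
SUFFIXES = ("ing", "en", "s")  # hypothetical, deliberately tiny rule set


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


stems = [toy_stem(w) for w in ["eating", "eats", "eaten", "eat"]]
print(stems)  # ['eat', 'eat', 'eat', 'eat']
```

All four word forms collapse to the same token, which is what lets a later vectorizer treat them as one feature.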

In [5]:
from nltk.stem.lancaster import LancasterStemmer

In [6]:

## NLTK provides the LancasterStemmer class, which implements the Lancaster stemming algorithm for the words we want to stem.

In [7]:
stemmer = LancasterStemmer()

In [8]:
def cleanup(sentence):
    word_tok = nltk.word_tokenize(sentence)
    stemmed_words = [stemmer.stem(w) for w in word_tok]

    return ' '.join(stemmed_words)

In [9]:
le = LE()

In [10]:
tfv = TfidfVectorizer(min_df=1, stop_words='english')

In [11]:
data = pd.read_csv('https://raw.githubusercontent.com/MrJay10/banking-faq-bot/master/BankFAQs.csv')
questions = data['Question'].values

# DONE BY CHAITANYA GAJANAN YADAV

In [12]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\yash\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [13]:
X = []
for question in questions:
    X.append(cleanup(question))

In [14]:
tfv.fit(X)
le.fit(data['Class'])

LabelEncoder()
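
As a rough illustration of what the fitted label encoder does with the `Class` column, the sketch below maps class names to integer ids in sorted order and back again. The class names are invented for the example and the code is plain Python, not scikit-learn itself.

```python
# Sketch of label encoding; the class names below are made up for this example.
labels = ["loans", "cards", "security", "cards", "loans"]

# Sorted unique class names get consecutive integer ids.
classes = sorted(set(labels))                # ['cards', 'loans', 'security']
to_id = {name: i for i, name in enumerate(classes)}

encoded = [to_id[name] for name in labels]   # comparable to le.transform(...)
decoded = [classes[i] for i in encoded]      # comparable to le.inverse_transform(...)

print(encoded)  # [1, 0, 2, 0, 1]
```

The classifier is trained on the integer ids, and the decode step is what turns a prediction back into a readable class name.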

# DONE BY YASH AJIT PADDALWAR

In [15]:
## Training Data

In [16]:
X = tfv.transform(X)
y = le.transform(data['Class'])

In [17]:
trainx, testx, trainy, testy = tts(X, y, test_size=.25, random_state=42)

In [18]:
model = SVC(kernel='linear')
model.fit(trainx, trainy)
print("SVC:", model.score(testx, testy))


SVC: 0.927437641723356
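
The score printed above is the mean accuracy on the held-out 25% of questions, that is, the fraction whose class was predicted correctly. A minimal sketch of that calculation, using invented labels and a hypothetical `accuracy` helper, could look like this:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Invented labels for illustration: 6 of the 8 predictions are correct.
y_true = [0, 1, 2, 1, 0, 2, 1, 1]
y_pred = [0, 1, 2, 0, 0, 2, 1, 2]
print(accuracy(y_true, y_pred))  # 0.75
```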


In [19]:
def get_max5(arr):
    ixarr = []
    for ix, el in enumerate(arr):
        ixarr.append((el, ix))
    ixarr.sort()

    ixs = []
    for i in ixarr[-5:]:
        ixs.append(i[1])

    return ixs[::-1]
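
To show what `get_max5` returns, the sketch below repeats the function on a small invented list of scores and compares it with a more compact equivalent. The `top5_compact` name is hypothetical.

```python
def get_max5(arr):
    # Same logic as the notebook cell: pair each value with its index,
    # sort ascending, keep the last five, and return indices best-first.
    ixarr = []
    for ix, el in enumerate(arr):
        ixarr.append((el, ix))
    ixarr.sort()

    ixs = []
    for i in ixarr[-5:]:
        ixs.append(i[1])

    return ixs[::-1]


def top5_compact(arr):
    # Hypothetical one-line equivalent: sort indices by (value, index), descending.
    return sorted(range(len(arr)), key=lambda i: (arr[i], i), reverse=True)[:5]


scores = [0.1, 0.9, 0.3, 0.7, 0.5, 0.2]  # invented similarity scores
print(get_max5(scores))      # [1, 3, 4, 2, 5]
print(top5_compact(scores))  # [1, 3, 4, 2, 5]
```

The result is a list of positions, not values, so it can be used to look up the matching questions and answers.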

In [20]:
## Chatbot for banking related queries

In [26]:
def chat():
    cnt = 0
    print("PRESS Q to QUIT")
    print("TYPE \"DEBUG\" to Display Debugging statements.")
    print("TYPE \"STOP\" to Stop Debugging statements.")
    print("TYPE \"TOP5\" to Display 5 most relevant results")
    print("TYPE \"CONF\" to Display the most confident result")
    print()
    print()
    DEBUG = False
    TOP5 = False

    print("Bot: Hi, Welcome to our bank!")
    while True:
        usr = input("You: ")

        if usr.lower() == 'yes':
            print("Bot: Yes!")
            continue

        if usr.lower() == 'no':
            print("Bot: No?")
            continue

        if usr == 'DEBUG':
            DEBUG = True
            print("Debugging mode on")
            continue
            
        if usr == 'STOP':
            DEBUG = False
            print("Debugging mode off")
            continue

        if usr == 'Q':
            print("Bot: It was good to be of help.")
            break

        if usr == 'TOP5':
            TOP5 = True
            print("Will display 5 most relevant results now")
            continue

        if usr == 'CONF':
            TOP5 = False
            print("Only the most relevant result will be displayed")
            continue

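        t_usr = tfv.transform([cleanup(usr.strip().lower())])
        # inverse_transform expects a 1-D array, so decode the whole prediction
        # array first and then take its first element. Indexing with [0] before
        # decoding passes a scalar and raises "ValueError: y should be a 1d array".
        class_ = le.inverse_transform(model.predict(t_usr))[0]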
        questionset = data[data['Class']==class_]
        
        if DEBUG:
            print("Question classified under category:", class_)
            print("{} Questions belong to this class".format(len(questionset)))

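        cos_sims = []
        for question in questionset['Question']:
            # Stem the stored question the same way as the user input, since
            # tfv was fitted on stemmed text.
            sims = cosine_similarity(tfv.transform([cleanup(question)]), t_usr)
            # cosine_similarity returns a 1x1 array here; keep only the scalar.
            cos_sims.append(sims[0][0])
            
        ind = cos_sims.index(max(cos_sims))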

        if DEBUG:
            question = questionset["Question"][questionset.index[ind]]
            print("Assuming you asked: {}".format(question))

        if not TOP5:
            print("Bot:", data['Answer'][questionset.index[ind]])
        else:
            inds = get_max5(cos_sims)
            for ix in inds:
                print("Question: "+data['Question'][questionset.index[ix]])
                print("Answer: "+data['Answer'][questionset.index[ix]])
                print('-'*50)

        print("\n"*2)
        outcome = input("Was this answer helpful? Yes/No: ").lower().strip()
        if outcome == 'yes':
            cnt = 0
        elif outcome == 'no':
            inds = get_max5(cos_sims)
            sugg_choice = input("Bot: Do you want me to suggest you questions ? Yes/No: ").lower()
            if sugg_choice == 'yes':
                q_cnt = 1
                for ix in inds:
                    print(q_cnt,"Question: "+data['Question'][questionset.index[ix]])
                    # print("Answer: "+data['Answer'][questionset.index[ix]])
                    print('-'*50)
                    q_cnt += 1
                num = int(input("Please enter the question number you find most relevant: "))
                print("Bot: ", data['Answer'][questionset.index[inds[num-1]]])


chat()

PRESS Q to QUIT
TYPE "DEBUG" to Display Debugging statements.
TYPE "STOP" to Stop Debugging statements.
TYPE "TOP5" to Display 5 most relevant results
TYPE "CONF" to Display the most confident result


Bot: Hi, Welcome to our bank!


ValueError: y should be a 1d array, got an array of shape () instead.
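
Once a class is predicted, the chat loop picks the stored question that is most similar to the user's input by cosine similarity. The sketch below illustrates that ranking step in plain Python. It is a simplification that uses raw word counts rather than the TF-IDF weights of the notebook, and the FAQ strings and the `cosine` helper are invented for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two texts, using raw word counts as vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


# Invented FAQ questions standing in for one predicted class.
faq = [
    "how do i block my card",
    "what is the loan interest rate",
    "how do i reset my password",
]
query = "block my card"

scores = [cosine(q, query) for q in faq]
best = scores.index(max(scores))
print(faq[best])  # how do i block my card
```

Because each score is a plain float, `max` and `index` behave predictably, which is also why it helps to store scalars rather than 1x1 arrays in the similarity list.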

# *THANK YOU!*