# Project - Natural Language Processing Project-1: PART-2
 by ARYAN JAIN

#### SUMMARY

<b>Context:</b>
     Great Learning has an academic support department that receives numerous support requests every day throughout the year. Teams are spread across geographies and try to provide support round the year. Sometimes, due to heavy workload, certain request resolutions are delayed, impacting the company's business. Some of the requests are very generic, and a proper resolution procedure delivered to the user can solve the problem. The company is looking to design an automation which can interact with the user, understand the problem, and display the resolution procedure if the request is generic, or redirect the request to a human support executive if the request is complex or not in its database.
    
    
<b>Data Description:</b>    
    A sample corpus is attached for your reference. Please enhance/add more data to the corpus using your linguistics skills.
    

<b>Domain:</b>
 Customer support
    
    
<b>Objectives:</b>
    Design a Python-based, interactive, semi-rule-based chatbot which can do the following:
        - Start the chat session with a greeting and ask what the user is looking for.
        - Accept dynamic text-based questions from the user and reply with a relevant answer from the designed corpus.
        - End the chat session only if the user requests it; otherwise ask what the user is looking for.
        The loop continues until the user asks to end it.

Let's import the required libraries.

## Importing required libraries

In [1]:
import tensorflow as tf

import nltk
from nltk.stem.lancaster import LancasterStemmer  # Lancaster stemmer
stemmer = LancasterStemmer() # Lancaster stemmer

import numpy as np
import random

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
from sklearn.metrics import accuracy_score,f1_score,recall_score,precision_score, confusion_matrix
from sklearn.model_selection import train_test_split

In [2]:
# Importing the corpus, including the custom tags I have added to the JSON file provided as part of the problem statement

import json

with open('GL Bot.json') as file:
    Org_Corpus = json.load(file)

# Display corpus file
print(Org_Corpus)

{'intents': [{'tag': 'Intro', 'patterns': ['hi', 'how are you', 'is anyone there', 'hello', 'whats up', 'hey', 'yo', 'listen', 'please help me', 'i am learner from', 'i belong to', 'aiml batch', 'aifl batch', 'i am from', 'my pm is', 'blended', 'online', 'i am from', 'hey ya', 'talking to you for first time'], 'responses': ['Hello! how can i help you ?'], 'context_set': ''}, {'tag': 'Exit', 'patterns': ['thank you', 'thanks', 'cya', 'see you', 'later', 'see you later', 'goodbye', 'i am leaving', 'have a Good day', 'you helped me', 'thanks a lot', 'thanks a ton', 'you are the best', 'great help', 'too good', 'you are a good learning buddy'], 'responses': ['I hope I was able to assist you, Good Bye'], 'context_set': ''}, {'tag': 'Olympus', 'patterns': ['olympus', 'explain me how olympus works', 'I am not able to understand olympus', 'olympus window not working', 'no access to olympus', 'unable to see link in olympus', 'no link visible on olympus', 'whom to contact for olympus', 'lot of p

## Tokenize

In [3]:
# Next step is to Tokenize

In [4]:
# Download "punkt" if missing
nltk.download('punkt')

# Extract data
W = [] # Tokens 
L = [] # Identified Tags or Labels
doc_x = [] # Tokenised words
doc_y = [] # Tags or Labels

# Tokenize the messages (the patterns in the corpus); lower-casing is applied later, in the stemming step
for intent in Org_Corpus['intents']:
    for pattern in intent['patterns']:
        w_temp = nltk.word_tokenize(pattern)
        W.extend(w_temp)
        doc_x.append(w_temp)
        doc_y.append(intent["tag"])
    
    # Add the tag if it is not already in the list
    if intent['tag'] not in L:
        L.append(intent['tag'])

[nltk_data] Downloading package punkt to /Users/aryan/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [5]:
# Performing Stemming

In [6]:
# Stemming: lower-case each token and reduce it to its root form, skipping "?"
W = [stemmer.stem(w.lower()) for w in W if w != "?"]
W = sorted(set(W)) # Unique stemmed words, sorted
L = sorted(L) # Sorted list of tags or labels

In [7]:
# Sorted Words
W

['a',
 'abl',
 'access',
 'act',
 'ad',
 'adam',
 'aifl',
 'aiml',
 'alexnet',
 'am',
 'an',
 'anyon',
 'ar',
 'art',
 'backward',
 'bad',
 'bag',
 'bas',
 'batch',
 'bay',
 'belong',
 'best',
 'blend',
 'bloody',
 'boost',
 'bot',
 'buddy',
 'centroid',
 'class',
 'clust',
 'cnn',
 'competit',
 'comput',
 'connect',
 'contact',
 'convolv',
 'cre',
 'cross',
 'cv',
 'cya',
 'day',
 'decomposit',
 'deep',
 'did',
 'diffult',
 'do',
 'eig',
 'ensembl',
 'epoch',
 'explain',
 'fac',
 'first',
 'for',
 'forest',
 'forward',
 'from',
 'funct',
 'good',
 'goodby',
 'googlenet',
 'grady',
 'gre',
 'hat',
 'hav',
 'hel',
 'hello',
 'help',
 'hey',
 'hi',
 'hid',
 'hour',
 'how',
 'hyp',
 'i',
 'imagenet',
 'imput',
 'in',
 'intellig',
 'is',
 'jerk',
 'jok',
 'k-means',
 'kernel',
 'knn',
 'lat',
 'lay',
 'learn',
 'leav',
 'lenet',
 'link',
 'list',
 'log',
 'lot',
 'machin',
 'me',
 'ml',
 'my',
 'naiv',
 'nam',
 'nb',
 'net',
 'network',
 'neur',
 'no',
 'not',
 'of',
 'olymp',
 'olyp',
 'o

In [8]:
len(W)

185

In [9]:
#Sorted Tags
L

['Bot',
 'Computer Vision',
 'Exit',
 'Intro',
 'NN',
 'Olympus',
 'Profane',
 'SL',
 'Ticket',
 'USL']

## Creating Bag of Words (BOW)

In [10]:
# Creating Bag of words (BOW)

In [11]:
Train = [] # Create list for Training data 
Target = [] # Create list for Target data 

out_empty = [0 for _ in range(len(L))]

# Loop to create a binary bag of words: 1 if the vocabulary word occurs in the pattern, else 0
for x, doc in enumerate(doc_x):
    bag = []

    w_temp = [stemmer.stem(w.lower()) for w in doc]

    for w in W:
        if w in w_temp:
            bag.append(1)
        else:
            bag.append(0)

    output_row = out_empty[:]
    output_row[L.index(doc_y[x])] = 1 

    Train.append(bag) # List
    Target.append(output_row) # List

In [12]:
# Let's convert the above lists to numpy arrays for further processing
Train = np.array(Train) 
Target = np.array(Target)

In [13]:
Train[:5]

array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  

In [14]:
Target[:5]

array([[0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]])

In [15]:
# Each row above is the one-hot encoding of the tag 'Intro' (index 3 in the sorted tag list L)
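The encoding used for `Train` and `Target` can be illustrated on a toy example. This is a minimal sketch, not the notebook's code: `vocab` and `tags` are small hypothetical stand-ins for `W` and `L`, and the helper names `bag_of_words` and `one_hot` are introduced here for illustration only.

```python
# Toy stand-ins for the notebook's W (stemmed vocabulary) and L (sorted tags).
vocab = ["cnn", "hello", "hi", "olymp", "thank"]
tags = ["Exit", "Intro", "Olympus"]

def bag_of_words(tokens, vocab):
    """Return a binary vector: 1 if the vocabulary word occurs in tokens, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocab]

def one_hot(tag, tags):
    """Return a vector with a single 1 at the position of tag in tags."""
    row = [0] * len(tags)
    row[tags.index(tag)] = 1
    return row

# A repeated token still yields 1: the vector records presence, not frequency.
bow = bag_of_words(["hi", "hi", "olymp"], vocab)
target = one_hot("Intro", tags)
print(bow)     # [0, 0, 1, 1, 0]
print(target)  # [0, 1, 0]
```

Because the vector only records presence, word order and repetition in the user's message do not affect the model input.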

In [16]:
Train.shape

(172, 185)

In [17]:
# 172 pattern samples in total, each a 185-dimensional bag-of-words vector

In [18]:
X_train, X_test, y_train, y_test = train_test_split(Train, Target, test_size=0.2, random_state=7)
X_train.shape,y_train.shape,X_test.shape,y_test.shape

((137, 185), (137, 10), (35, 185), (35, 10))
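The 137/35 split follows from the sample count and `test_size=0.2`. The arithmetic below is a small sketch, assuming the test count is rounded up when `test_size` is given as a fraction, which is consistent with the shapes printed above.

```python
import math

n_samples = 172   # rows in Train, from Train.shape
test_size = 0.2   # fraction passed to train_test_split

n_test = math.ceil(test_size * n_samples)  # 34.4 rounds up to 35
n_train = n_samples - n_test               # remaining rows go to training
print(n_train, n_test)  # 137 35
```

With only 35 test rows spread over 10 tags, each tag has a handful of test examples at most, so any test accuracy reported on this split is a rough estimate.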

In [19]:
# I will try a Neural Network classifier

In [20]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Activation , Dropout

model_NN = Sequential()
model_NN.add(Dense(128, input_dim = len(X_train[0]), kernel_initializer='uniform')) # 185 inputs (vocabulary size)
model_NN.add(BatchNormalization())
model_NN.add(Activation('relu'))
model_NN.add(Dense(64,  kernel_initializer='uniform'))
model_NN.add(BatchNormalization())
model_NN.add(Activation('relu'))
model_NN.add(Dense(32,  kernel_initializer='uniform'))
model_NN.add(BatchNormalization())
model_NN.add(Activation('relu'))
model_NN.add(Dense(10,  kernel_initializer='uniform'))
model_NN.add(Activation('softmax'))
model_NN.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

In [21]:
model_NN.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 128)               23808     
_________________________________________________________________
batch_normalization (BatchNo (None, 128)               512       
_________________________________________________________________
activation (Activation)      (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                8256      
_________________________________________________________________
batch_normalization_1 (Batch (None, 64)                256       
_________________________________________________________________
activation_1 (Activation)    (None, 64)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 32)                2
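The parameter counts in the summary can be checked by hand. The sketch below derives them from the layer sizes in the model definition rather than from the printed summary; it assumes a Dense layer has one weight per input-output pair plus one bias per unit, and that a BatchNormalization layer keeps four values per unit (scale, offset, moving mean, moving variance). The helper names are illustrative.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

def batchnorm_params(n_units):
    """Scale, offset, moving mean and moving variance for each unit."""
    return 4 * n_units

n_inputs = 185          # vocabulary size, len(W)
hidden = [128, 64, 32]  # hidden Dense layers, each followed by BatchNormalization
n_classes = 10          # number of tags, len(L)

counts = []
prev = n_inputs
for units in hidden:
    counts.append(dense_params(prev, units))
    counts.append(batchnorm_params(units))
    prev = units
counts.append(dense_params(prev, n_classes))  # output layer, no batch norm

print(counts)       # [23808, 512, 8256, 256, 2080, 128, 330]
print(sum(counts))  # 35370
```

The first value, 23808, matches the `dense` row of the summary and confirms that the network takes 185 inputs.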

In [22]:
model_NN.fit(X_train, y_train,           
          validation_data=(X_test, y_test),
          epochs=100,
          batch_size=32)

[Training log for epochs 1/100 onward; per-epoch loss and accuracy values are not preserved in this export]

<tensorflow.python.keras.callbacks.History at 0x7fdf3c67deb0>

In [23]:
# Next, let's try Random Forest

In [24]:
from sklearn.ensemble import RandomForestClassifier
model_RF = RandomForestClassifier(n_estimators=100,criterion='gini')
model_RF.fit(X_train, y_train)

pred = model_RF.predict(X_test)
print("Test Accuracy:",accuracy_score(y_test, pred))

Test Accuracy: 0.2571428571428571
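One likely contributor to the low score is the target format. `y_train` is one-hot encoded, so the forest is fitted as a multi-output problem, and `accuracy_score` on 2-D indicator targets counts a row as correct only when every column matches (subset accuracy). The sketch below illustrates that metric on hypothetical predictions; the data and helper names are made up for illustration and are not taken from the notebook's run.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of rows where every column matches (exact-match accuracy)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)

def to_class_index(rows):
    """Collapse one-hot rows to the index of their largest entry."""
    return [max(range(len(r)), key=r.__getitem__) for r in rows]

y_true = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
# Hypothetical multi-output predictions: two rows are all zeros because
# no single output column was confident enough to fire.
y_pred = [[0, 1, 0], [0, 0, 0], [0, 0, 0], [0, 1, 0]]

print(subset_accuracy(y_true, y_pred))  # 0.5
print(to_class_index(y_true))           # [1, 0, 2, 1]
```

Fitting the forest on integer labels instead, for example `np.argmax(y_train, axis=1)`, would make it a single multi-class problem and may give a fairer comparison with the neural network. That variant was not run here, so no score is claimed for it.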


In [25]:
# The neural network gave a better result than the Random Forest (test accuracy 0.257), so it is used for the chatbot

In [26]:
# General outline of how this works:
    # The message from the user is first converted into a Bag of Words
    # The BOW will then be passed to our model to predict the tag
    # Once we get the tag, we can choose any response from it to send back to the user

In [27]:
# Define function to convert message into Bag of Words

def Get_BOW(message, W):
    Test = []
    bow = [0 for _ in range(len(W))] 
    msg_words = nltk.word_tokenize(message)
    msg_words = [stemmer.stem(word.lower()) for word in msg_words]

    for words in msg_words:
        for i, w in enumerate(W):
            if w == words:
                bow[i] = 1
    Test.append(bow)       # needed to convert to the shape needed by NN 
    return np.array(Test)

In [28]:
# Define chat function for interaction

def chat():
    print("Chat with Ramos (type: stop to quit)")
    print("If answer is not right (type: *)")
    while True:
        inp = input("\n\nYou: ")
        if inp.strip() == "*":
            print("BOT: Please rephrase your question and try again")
            continue  # do not pass "*" to the model
        if inp.strip().lower() in ("stop", "quit"):  # accept the advertised "stop" as well as "quit"
            break

        # Predict the tag with the highest softmax probability
        results = model_NN.predict(Get_BOW(inp, W))
        results_index = np.argmax(results)
        tag = L[results_index]

        # Look up the responses stored for the predicted tag
        for tg in Org_Corpus["intents"]:
            if tg['tag'] == tag:
                responses = tg['responses']

        print(random.choice(responses))
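The problem statement asks for requests that are not in the corpus to be redirected to a human, whereas `chat()` always answers with the top-scoring tag. One possible approach is a confidence threshold on the softmax output. This is a minimal sketch under stated assumptions: `pick_tag` and the threshold of 0.5 are hypothetical and not part of the original solution, and the probability lists stand in for a row of `model_NN.predict(...)`.

```python
def pick_tag(probabilities, tags, threshold=0.5):
    """Return the most probable tag, or None when the model is not confident."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    if probabilities[best] < threshold:
        return None  # caller can hand the request to a human
    return tags[best]

tags = ["Exit", "Intro", "Olympus"]  # toy stand-in for the tag list L

confident = pick_tag([0.05, 0.90, 0.05], tags)
uncertain = pick_tag([0.30, 0.36, 0.34], tags)
print(confident)  # Intro
print(uncertain)  # None
```

Inside `chat()`, a `None` result could print a message that the request is being passed to a support executive. A suitable threshold would need to be tuned on held-out questions.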

In [29]:
Get_BOW('Hello', W).shape

(1, 185)

In [30]:
chat()

Chat with Ramos (type: stop to quit)
If answer is not right (type: *)


You: Hi
Hello! how can i help you ?


You: CNN
Link: Computer Vision wiki 


You: RNN
Link: Supervised Learning wiki 


You: recommendation systems
Link: Unsupervised Learning wiki 


You: thank you
I hope I was able to assist you, Good Bye


You: quit


In [31]:
# NOTE: this solution is based on the details available at the link below:

    #  https://www.mygreatlearning.com/blog/basics-of-building-an-artificial-intelligence-chatbot/

# END OF PART 2