• DOMAIN: Customer support

• CONTEXT: Great Learning has an academic support department that receives numerous support requests every day throughout the 
year. Teams are spread across geographies and try to provide support all year round. Sometimes a heavy workload delays the 
resolution of certain requests, which affects the company's business. Some requests are generic enough that delivering a 
standard resolution procedure to the user solves the problem. The company wants to design an automation that can 
interact with the user, understand the problem, and either display the resolution procedure (if the request is generic) or redirect the request 
to a human support executive if the request is complex or not in its database.

• DATA DESCRIPTION: A sample corpus is attached for your reference. Please enhance/add more data to the corpus using your linguistics 
skills.

• PROJECT OBJECTIVE: Design a Python-based, interactive, semi-rule-based chatbot that can do the following: 
1. Start the chat session with a greeting and ask what the user is looking for.
2. Accept dynamic text-based questions from the user and reply with a relevant answer from the designed corpus. 
3. End the chat session only if the user asks to end it; otherwise, ask again what the user is looking for. The loop continues until the user ends it.
Please use the sample chatbot demo video for reference.
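
The three steps above can be sketched as a small session loop. This is only a minimal outline under stated assumptions: `run_session`, `reply_fn`, and the `EXIT_WORDS` set are hypothetical names that do not come from the assignment, and `reply_fn` stands in for whatever corpus lookup or model the bot ends up using.

```python
# Hypothetical set of phrases treated as a request to end the chat.
EXIT_WORDS = {"quit", "exit", "bye", "goodbye"}


def run_session(reply_fn, read=input, write=print):
    """Greet the user, answer questions, and stop only when asked to."""
    # Step 1: start the session with a greeting.
    write("Bot: Hello! What are you looking for?")
    while True:
        # Step 2: accept free-text input from the user.
        text = read("You: ").strip()
        # Step 3: end only on an explicit request to end the chat.
        if text.lower() in EXIT_WORDS:
            write("Bot: Goodbye!")
            break
        write(f"Bot: {reply_fn(text)}")
        write("Bot: What else are you looking for?")
```

Passing `read` and `write` as parameters is a design choice made here so the loop can be exercised with scripted input instead of a live console.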

• EVALUATION: The GL evaluator will rephrase questions on the topics described in DATA DESCRIPTION in different ways 
and check whether the bot gives relevant replies.

In [1]:
from google.colab import drive

drive.mount('/content/drive')

Mounted at /content/drive


In [4]:
# Standard library
import json
import random

# Third-party
import nltk
import numpy as np
import torch
import torch.nn as nn
from nltk.stem.porter import PorterStemmer
from torch.utils.data import Dataset, DataLoader

In [5]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [6]:
with open('/content/drive/MyDrive/GL+Bot.json', 'r') as f:
  intents = json.load(f)
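
The rest of the notebook reads `intents['intents']` and, for each entry, the keys `tag`, `patterns`, and `responses`. The file itself is not shown, so the snippet below is only an illustration of a structure compatible with that usage; the sample tags and sentences and the helper `validate_intents` are made up.

```python
import json

# Made-up sample in the shape the later cells expect; not the real GL+Bot.json.
SAMPLE = """
{
  "intents": [
    {"tag": "greeting",
     "patterns": ["Hi", "Hello there"],
     "responses": ["Hello! How can I help you?"]},
    {"tag": "goodbye",
     "patterns": ["Bye", "Thanks, that is all"],
     "responses": ["Glad I could help. Goodbye!"]}
  ]
}
"""


def validate_intents(data):
    """Return a list of problems found; an empty list means the shape is usable."""
    problems = []
    for i, intent in enumerate(data.get("intents", [])):
        for key in ("tag", "patterns", "responses"):
            if key not in intent:
                problems.append(f"intent {i} is missing '{key}'")
    if not data.get("intents"):
        problems.append("no intents found")
    return problems


sample_intents = json.loads(SAMPLE)
print(validate_intents(sample_intents))  # -> []
```

Running a check like this right after loading can catch a malformed corpus entry before it surfaces as a `KeyError` in a later cell.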

In [7]:
def tokenize(sentence):
  return nltk.word_tokenize(sentence)

In [8]:
stemmer = PorterStemmer()
def stem(word):
  return stemmer.stem(word.lower()) 

In [10]:
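def bag_of_words(tokenized_sentence, corpus):
  """Return a 0/1 vector marking which vocabulary words occur in the sentence."""
  # Stem the input tokens so they match the stemmed vocabulary
  sentence_stems = {stem(w) for w in tokenized_sentence}

  bag = np.zeros(len(corpus), dtype=np.float32)
  for idx, w in enumerate(corpus):
    if w in sentence_stems:
      bag[idx] = 1.0

  return bag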
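
To make the encoding concrete, here is a dependency-free sketch of the same idea on a toy vocabulary. It uses plain lists instead of NumPy arrays and lowercasing as a stand-in for the Porter stemmer, so `toy_bag_of_words` and `toy_vocab` are illustrative names, not part of the notebook.

```python
def toy_bag_of_words(tokens, vocab):
    """Return a 0/1 vector with one slot per vocabulary word."""
    # Lowercasing stands in for stemming in this toy version.
    seen = {t.lower() for t in tokens}
    return [1.0 if word in seen else 0.0 for word in vocab]


toy_vocab = ["bye", "hello", "help", "thank", "you"]
print(toy_bag_of_words(["Hello", ",", "thank", "you"], toy_vocab))
# -> [0.0, 1.0, 0.0, 1.0, 1.0]
print(toy_bag_of_words(["vvbk"], toy_vocab))
# -> [0.0, 0.0, 0.0, 0.0, 0.0]  (no known word: the vector is all zeros)
```

Note that word order and repetition are discarded: only the presence of each vocabulary word is recorded.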


In [11]:
corpus = []
tags = []
xy = []

for intent in intents['intents']:
  tag = intent['tag']
  tags.append(tag)
  for pattern in intent['patterns']:
    w = tokenize(pattern)
    corpus.extend(w)
    xy.append((w, tag))


In [12]:
# Punctuation tokens to exclude from the vocabulary
punctuations = ['?','!','.',',']

In [13]:
corpus = [stem(w) for w in corpus if w not in punctuations]
corpus = sorted(set(corpus))
tags = sorted(set(tags))

In [14]:
X_train = []
y_train = []

for (pattern_sentence, tag) in xy:
  bag = bag_of_words(pattern_sentence, corpus)
  X_train.append(bag)

  label = tags.index(tag)
  y_train.append(label)

X_train = np.array(X_train)
y_train = np.array(y_train)

In [15]:
class ChatDataset(Dataset):
  def __init__(self):
    self.n_samples = len(X_train)
    self.x_data = X_train
    self.y_data = y_train

  def __getitem__(self, index):
    return self.x_data[index], self.y_data[index]

  def __len__(self):
    return self.n_samples

In [16]:
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.l1 = nn.Linear(input_size, hidden_size) 
        self.l2 = nn.Linear(hidden_size, hidden_size) 
        self.l3 = nn.Linear(hidden_size, num_classes)
        self.relu = nn.ReLU()
    
    def forward(self, x):
        out = self.l1(x)
        out = self.relu(out)
        out = self.l2(out)
        out = self.relu(out)
        out = self.l3(out)
        # no activation and no softmax at the end
        return out
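
The final layer returns raw scores (logits) because `nn.CrossEntropyLoss`, which this notebook uses as the training criterion, applies log-softmax to its input internally. The following pure-Python sketch reproduces that computation for a single sample; the helper names and the example logits are illustrative only.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability of the correct class, computed from logits."""
    return -math.log(softmax(logits)[target_index])


example_logits = [2.0, 0.5, -1.0]
probs = softmax(example_logits)
print([round(p, 3) for p in probs])
print(round(cross_entropy(example_logits, 0), 3))
```

Adding a softmax layer inside `forward` would therefore apply the normalization twice during training; softmax is only needed at inference time, to turn the scores into a confidence value.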

In [17]:
#Hyperparameters
num_epochs = 1000
batch_size = 8
learning_rate = 0.001
input_size = len(X_train[0])
hidden_size = 8
output_size = len(tags)

In [18]:
dataset = ChatDataset()
train_loader = DataLoader(dataset=dataset, batch_size=batch_size, shuffle=True, num_workers=0)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [19]:
# Move the model to the same device as the training batches
model = NeuralNet(input_size, hidden_size, output_size).to(device)

In [20]:
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

In [21]:
# Train the model
for epoch in range(num_epochs):
    for (words, labels) in train_loader:
        words = words.to(device)
        labels = labels.to(dtype=torch.long).to(device)
        
        # Forward pass
        outputs = model(words)
        # If the labels were one-hot encoded, they would first need converting to class indices:
        # labels = torch.max(labels, 1)[1]
        loss = criterion(outputs, labels)
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
    # Log the loss of the last batch every 100 epochs
    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

Epoch [100/1000], Loss: 0.2594
Epoch [200/1000], Loss: 0.0033
Epoch [300/1000], Loss: 0.0016
Epoch [400/1000], Loss: 0.0007
Epoch [500/1000], Loss: 0.0001
Epoch [600/1000], Loss: 0.0001
Epoch [700/1000], Loss: 0.0000
Epoch [800/1000], Loss: 0.0000
Epoch [900/1000], Loss: 0.0000
Epoch [1000/1000], Loss: 0.0000


In [22]:
data = {
"model_state": model.state_dict(),
"input_size": input_size,
"hidden_size": hidden_size,
"output_size": output_size,
"corpus": corpus,
"tags": tags
}

In [23]:
FILE = "data.pth"
torch.save(data, FILE)

In [24]:
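# Reload the checkpoint from disk rather than reusing the in-memory dict
data = torch.load(FILE, map_location=device)
input_size = data["input_size"]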
hidden_size = data["hidden_size"]
output_size = data["output_size"]
corpus = data['corpus']
tags = data['tags']
model_state = data["model_state"]

In [25]:
model = NeuralNet(input_size, hidden_size, output_size).to(device)
model.load_state_dict(model_state)
model.eval()

NeuralNet(
  (l1): Linear(in_features=154, out_features=8, bias=True)
  (l2): Linear(in_features=8, out_features=8, bias=True)
  (l3): Linear(in_features=8, out_features=8, bias=True)
  (relu): ReLU()
)
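
The layer sizes printed above imply a small model. As a quick check, each `nn.Linear` layer holds `in_features * out_features` weights plus `out_features` biases, so the total can be worked out by hand; `linear_params` below is just a helper for this arithmetic.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


# (in_features, out_features) for l1, l2, l3 as printed above
layers = [(154, 8), (8, 8), (8, 8)]
per_layer = [linear_params(i, o) for i, o in layers]
print(per_layer, sum(per_layer))
# -> [1240, 72, 72] 1384
```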

In [26]:
bot_name = "Olympus"
print("Let's chat! (type 'quit' to exit)")
while True:
    # sentence = "do you use credit cards?"
    sentence = input("You: ")
    # Accept "quit" regardless of case or surrounding whitespace
    if sentence.strip().lower() == "quit":
        break

    sentence = tokenize(sentence)
    X = bag_of_words(sentence, corpus)
    X = X.reshape(1, X.shape[0])
    X = torch.from_numpy(X).to(device)

    with torch.no_grad():  # inference only, no gradients needed
        output = model(X)
    _, predicted = torch.max(output, dim=1)

    tag = tags[predicted.item()]

    probs = torch.softmax(output, dim=1)
    prob = probs[0][predicted.item()]
    if prob.item() > 0.75:
        for intent in intents['intents']:
            if tag == intent["tag"]:
                print(f"{bot_name}: {random.choice(intent['responses'])}")
    else:
        print(f"{bot_name}: I do not understand...")

Let's chat! (type 'quit' to exit)
You: vvbk
Olympus: Hello! how can i help you ?
You: hi
Olympus: Hello! how can i help you ?
You: thanks
Olympus: I hope I was able to assist you, Good Bye
You: quit
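
In the session above, the nonsense input `vvbk` still received a greeting. A likely explanation is that none of its tokens are in the vocabulary, so the bag-of-words vector is all zeros and the network's output is determined by its bias terms alone, which can still clear the 0.75 threshold. One possible guard, sketched here with plain lists and a hypothetical helper `choose_tag`, is to fall back whenever the input vector contains no known word:

```python
def choose_tag(bag, probs, tags, threshold=0.75):
    """Return the predicted tag, or None when the bot should fall back."""
    # No vocabulary word matched: the prediction carries no information.
    if not any(bag):
        return None
    best = max(range(len(probs)), key=lambda i: probs[i])
    # Keep the existing confidence check for inputs with known words.
    return tags[best] if probs[best] > threshold else None


demo_tags = ["goodbye", "greeting", "thanks"]
print(choose_tag([0.0, 0.0, 0.0], [0.05, 0.90, 0.05], demo_tags))  # -> None
print(choose_tag([0.0, 1.0, 0.0], [0.05, 0.90, 0.05], demo_tags))  # -> greeting
print(choose_tag([0.0, 1.0, 0.0], [0.30, 0.40, 0.30], demo_tags))  # -> None
```

A `None` result is a natural place to hand the request over to a human support executive, as the context describes.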
