# Generate a Chatbot Using a Multi-Layer Neural Network

## Muzammil Mushtaq

### Objective:

> As a data scientist, you are required to build a chatbot for a renowned hospital that provides first-class assistance during the hospital's peak hours.

### Content:

> For reference, you are given a dataset of query/response pairs. Make the necessary changes to the dataset to build an interactive chatbot, then build a deep learning model that generates a response to an individual's query to support the hospital's customer care department.

In [1]:
# Read the JSON file
import json

# A context manager closes the file handle once the data is loaded
with open('intents.json') as f:
    data = json.load(f)
print(data['intents'])

[{'tag': 'greeting', 'patterns': ['Hi there', 'How are you', 'Is anyone there?', 'Hey', 'Hola', 'Hello', 'Good day'], 'responses': ['Hello, thanks for asking', 'Good to see you again', 'Hi there, how can I help?'], 'context': ['']}, {'tag': 'goodbye', 'patterns': ['Bye', 'See you later', 'Goodbye', 'Nice chatting to you, bye', 'Till next time'], 'responses': ['See you!', 'Have a nice day', 'Bye! Come back again soon.'], 'context': ['']}, {'tag': 'thanks', 'patterns': ['Thanks', 'Thank you', "That's helpful", 'Awesome, thanks', 'Thanks for helping me'], 'responses': ['Happy to help!', 'Any time!', 'My pleasure'], 'context': ['']}, {'tag': 'noanswer', 'patterns': [], 'responses': ["Sorry, can't understand you", 'Please give me more info', 'Not sure I understand'], 'context': ['']}, {'tag': 'options', 'patterns': ['How you could help me?', 'What you can do?', 'What help you provide?', 'How you can be helpful?', 'What support is offered'], 'responses': ['I can guide you through Adverse dru

> The JSON file `intents.json` contains a single list, `intents`. Each element of the list is a dictionary with the keys "tag", "patterns", "responses", and "context". The tag names the type of conversation, the patterns are questions customers might ask, and the responses are the bot's possible answers.
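
To make that structure concrete, here is a minimal sketch that parses a two-intent excerpt in the same shape as `intents.json`. The strings are copied from the output above; the variable names `raw`, `sample`, and `summary` are illustrative and not part of the notebook.

```python
import json

# A two-intent excerpt in the same shape as intents.json
# (values taken from the printed output; the excerpt itself is illustrative).
raw = '''
{"intents": [
  {"tag": "greeting",
   "patterns": ["Hi there", "How are you", "Hello"],
   "responses": ["Hello, thanks for asking", "Hi there, how can I help?"],
   "context": [""]},
  {"tag": "noanswer",
   "patterns": [],
   "responses": ["Not sure I understand"],
   "context": [""]}
]}
'''

sample = json.loads(raw)

# Each list element is one intent dictionary with the same four keys.
# Map every tag to (number of patterns, number of responses).
summary = {
    intent["tag"]: (len(intent["patterns"]), len(intent["responses"]))
    for intent in sample["intents"]
}
print(summary)
```

An intent with an empty `patterns` list, such as `noanswer`, contributes no training examples, so a classifier trained on the patterns has no data from which to learn that tag.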

## Data Preprocessing

> Read the data and split it into tags and patterns.

> Then tokenize each pattern and convert it to a numeric bag-of-words vector. A Porter stemmer helper (`stem`) is also defined, but the cell below never calls it, so the vocabulary holds lowercased, unstemmed tokens.


In [3]:
import json

import nltk
import numpy as np  # needed by bag_of_words below

nltk.download('punkt')
from nltk.stem.porter import PorterStemmer
stemmer = PorterStemmer()

with open('intents.json','r') as f:
    intents = json.load(f)
    
all_words = []
tags = []
xy = []

def tokenize(sentence):
    """
    split sentence into array of words/tokens
    a token can be a word or punctuation character, or number
    """
    return nltk.word_tokenize(sentence)


def bag_of_words(tokenized_sentence, words):
    """
    return bag of words array:
    1 for each known word that exists in the sentence, 0 otherwise
    example:
    sentence = ["hello", "how", "are", "you"]
    words = ["hi", "hello", "I", "you", "bye", "thank", "cool"]
    bag   = [  0 ,    1 ,    0 ,   1 ,    0 ,    0 ,      0]
    """
    # tokens are used as-is (stem() is not applied here); a set gives fast membership tests
    sentence_words = set(tokenized_sentence)
    # initialize bag with 0 for each word
    bag = np.zeros(len(words), dtype=np.float32)
    for idx, w in enumerate(words):
        if w in sentence_words: 
            bag[idx] = 1

    return bag


def stem(word):
    """
    stemming = find the root form of the word
    examples:
    words = ["organize", "organizes", "organizing"]
    words = [stem(w) for w in words]
    -> ["organ", "organ", "organ"]
    """
    return stemmer.stem(word.lower())


print (100*'*','\n')

for intent in intents['intents']:
    tag = intent['tag']
    tags.append(tag.lower())
    for pattern in intent['patterns']:
        w = tokenize(pattern.lower())
        all_words.extend(w)
        xy.append((w, tag))

ignore_words = ['?', '!', '.', ',']

all_words = [w for w in all_words if w not in ignore_words]
# remove duplicates and sort
all_words = sorted(set(all_words))
tags = sorted(set(tags))

#print (all_words,tags)
print(len(xy), "patterns",'\n','\n')
print(len(tags), "tags:", tags,'\n','\n')
print(len(all_words), "unique stemmed words:", all_words,'\n','\n')

**************************************************************************************************** 

47 patterns 
 

14 tags: ['adverse_drug', 'blood_pressure', 'blood_pressure_search', 'goodbye', 'greeting', 'hospital_search', 'noanswer', 'options', 'pharmacy_search', 'search_blood_pressure_by_patient_id', 'search_hospital_by_params', 'search_hospital_by_type', 'search_pharmacy_by_name', 'thanks'] 
 

90 unique stemmed words: ["'s", 'a', 'adverse', 'all', 'anyone', 'are', 'awesome', 'be', 'behavior', 'blood', 'by', 'bye', 'can', 'causing', 'chatting', 'check', 'could', 'data', 'day', 'details', 'do', 'dont', 'drug', 'drugs', 'entry', 'find', 'for', 'give', 'good', 'goodbye', 'have', 'hello', 'help', 'helpful', 'helping', 'hey', 'hi', 'history', 'hola', 'hospital', 'how', 'i', 'id', 'is', 'later', 'list', 'load', 'locate', 'log', 'looking', 'lookup', 'management', 'me', 'module', 'nearby', 'next', 'nice', 'of', 'offered', 'open', 'patient', 'pharmacies', 'pharmacy', 'pressure', 'prov

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\User\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


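> The output shows 14 tags (the different customer issues), 47 patterns (possible customer questions), and 90 unique words. Although the printed label says "stemmed", the words were never passed through `stem`, which is why `help`, `helpful`, and `helping` appear as separate entries.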
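
The vocabulary printed above lists `help`, `helpful`, and `helping` separately, so each inflected form gets its own position in the bag-of-words vector. The sketch below illustrates this with a pure-Python variant of the encoding. `bag_of_words_list` and the five-word `vocabulary` are hypothetical stand-ins for the notebook's `bag_of_words` and its 90-word `all_words`.

```python
def bag_of_words_list(tokens, vocabulary):
    """Return a 0/1 list: 1.0 where the vocabulary word occurs in tokens."""
    present = set(tokens)
    return [1.0 if word in present else 0.0 for word in vocabulary]


# Hypothetical mini-vocabulary; the real one has 90 entries.
vocabulary = ["hello", "help", "helpful", "helping", "pharmacy"]

vec_help = bag_of_words_list(["can", "you", "help", "me"], vocabulary)
vec_helping = bag_of_words_list(["thanks", "for", "helping"], vocabulary)

print(vec_help)     # only the "help" slot is set
print(vec_helping)  # only the "helping" slot is set
```

The two vectors share no active position, so the network has to learn each inflected form independently. Applying a stemmer to both the vocabulary and the input would collapse such forms into one slot, at the cost of a different vocabulary size than the 90 reported here.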

## Create training dataset and NN hyperparameters

In [4]:
import numpy as np
# create training data
X_train = []
y_train = []
for (pattern_sentence, tag) in xy:
    # X: bag of words for each pattern_sentence
    bag = bag_of_words(pattern_sentence, all_words)
    X_train.append(bag)
    # y: PyTorch CrossEntropyLoss needs only class labels, not one-hot
    label = tags.index(tag)
    y_train.append(label)

X_train = np.array(X_train)
y_train = np.array(y_train)
#print (X_train, y_train)
# Hyper-parameters 
num_epochs = 1000
batch_size = 8
learning_rate = 0.001
input_size = len(X_train[0])
hidden_size = 8
output_size = len(tags)
print('Input size of data : ',input_size,'\n', 'Output size of data : ', output_size)

Input size of data :  90 
 Output size of data :  14


### Create a Neural Network with 2 Hidden Layers

> Input size is 90 (depends on the unique words)

> Output size is 14 (depends on the total tags)

> Activation Function is rectified linear unit (ReLU)
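
As a rough sanity check on model size, the trainable parameters can be counted by hand. The sketch assumes the three fully connected layers defined in the next cell (input to hidden, hidden to hidden, hidden to output) and the `hidden_size = 8` set in the hyperparameters. `linear_params` is an illustrative helper, not a PyTorch function.

```python
def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


input_size, hidden_size, output_size = 90, 8, 14

# (inputs, outputs) for each fully connected layer
layers = [
    (input_size, hidden_size),
    (hidden_size, hidden_size),
    (hidden_size, output_size),
]

per_layer = [linear_params(n_in, n_out) for n_in, n_out in layers]
total = sum(per_layer)
print(per_layer, total)  # [728, 72, 126] 926
```

With 926 parameters and only 47 training patterns, the network has enough capacity to memorize the training set, which is worth keeping in mind when reading the training loss.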

In [5]:
import torch
import torch.nn as nn


class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.l1 = nn.Linear(input_size, hidden_size) 
        self.l2 = nn.Linear(hidden_size, hidden_size) 
        self.l3 = nn.Linear(hidden_size, num_classes)
        self.relu = nn.ReLU()
    
    def forward(self, x):
        out = self.l1(x)
        out = self.relu(out)
        out = self.l2(out)
        out = self.relu(out)
        out = self.l3(out)
        # no activation and no softmax at the end
        return out

## Train Model

> Wrap the training arrays in a `ChatDataset`.

> Load the data in shuffled mini-batches with `DataLoader`.

> Use the GPU if one is available, with cross-entropy as the loss function and Adam as the optimizer.

> Train the model for 1000 epochs.

> Save the trained weights and metadata to `data.pth`.
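
`nn.CrossEntropyLoss` takes raw logits and an integer class index, and applies log-softmax internally, which is why the network ends without a softmax layer. The pure-Python sketch below computes the same per-sample quantity for one example. `cross_entropy` and the three-class `logits` are illustrative and not taken from the notebook.

```python
import math


def cross_entropy(logits, label):
    """Negative log-softmax of the logit at index `label`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


logits = [2.0, 0.5, -1.0]  # hypothetical scores for three classes

loss_correct = cross_entropy(logits, 0)  # label matches the largest logit
loss_wrong = cross_entropy(logits, 2)    # label matches the smallest logit

print(f"{loss_correct:.4f} {loss_wrong:.4f}")
```

The loss is small when the labeled class already has the largest logit and grows as the logit for the labeled class falls behind the others. With the default settings, PyTorch averages this value over the samples in a batch.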

In [6]:
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

class ChatDataset(Dataset):

    def __init__(self):
        self.n_samples = len(X_train)
        self.x_data = X_train
        self.y_data = y_train

    # support indexing such that dataset[i] can be used to get i-th sample
    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    # we can call len(dataset) to return the size
    def __len__(self):
        return self.n_samples

dataset = ChatDataset()
train_loader = DataLoader(dataset=dataset,
                          batch_size=batch_size,
                          shuffle=True,
                          num_workers=0)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = NeuralNet(input_size, hidden_size, output_size).to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
for epoch in range(num_epochs):
    for (words, labels) in train_loader:
        words = words.to(device)
        labels = labels.to(dtype=torch.long).to(device)
        
        # Forward pass
        outputs = model(words)
        # if y would be one-hot, we must apply
        # labels = torch.max(labels, 1)[1]
        loss = criterion(outputs, labels)
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
    if (epoch+1) % 100 == 0:
        print (f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')


print(f'final loss: {loss.item():.4f}')

data = {
"model_state": model.state_dict(),
"input_size": input_size,
"hidden_size": hidden_size,
"output_size": output_size,
"all_words": all_words,
"tags": tags
}

FILE = "data.pth"
torch.save(data, FILE)

print(f'training complete. file saved to {FILE}')

Epoch [100/1000], Loss: 0.4261
Epoch [200/1000], Loss: 0.0325
Epoch [300/1000], Loss: 0.0107
Epoch [400/1000], Loss: 0.0058
Epoch [500/1000], Loss: 0.0028
Epoch [600/1000], Loss: 0.0010
Epoch [700/1000], Loss: 0.0010
Epoch [800/1000], Loss: 0.0005
Epoch [900/1000], Loss: 0.0002
Epoch [1000/1000], Loss: 0.0001
final loss: 0.0001
training complete. file saved to data.pth
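
To put the log in context, the number of optimizer steps follows from the dataset and batch sizes. This is plain arithmetic on the values reported above (47 patterns, batch size 8, 1000 epochs), assuming the `DataLoader` default of keeping the short final batch.

```python
import math

n_samples, batch_size, num_epochs = 47, 8, 1000

# DataLoader keeps the incomplete final batch unless drop_last=True is set
batches_per_epoch = math.ceil(n_samples / batch_size)
last_batch_size = n_samples - (batches_per_epoch - 1) * batch_size
total_steps = batches_per_epoch * num_epochs

print(batches_per_epoch, last_batch_size, total_steps)  # 6 7 6000
```

The value printed every 100 epochs is the loss of the last mini-batch processed in that epoch, not an average over all six batches, so it can fluctuate with the shuffle order.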


### Testing the NN Model

> Reload the saved NN Model (data.pth)

> Run the NN Model for evaluation test

> Create the Bot Employee

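> Convert the outputs to probabilities with softmax. If the probability of the predicted tag exceeds 0.75, reply with a random response for that tag; otherwise reply "I do not understand...".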
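
The confidence threshold can be illustrated without PyTorch. In this sketch, `softmax` and `decide` are hypothetical helpers, and the logits are made-up values for three of the tags. Only the 0.75 threshold comes from the notebook.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, tags, threshold=0.75):
    """Return the predicted tag, or None when the model is not confident."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] > threshold:
        return tags[best]
    return None


tags = ["goodbye", "greeting", "thanks"]

confident = decide([0.1, 4.0, 0.2], tags)  # one logit dominates
unsure = decide([1.0, 1.2, 0.9], tags)     # logits are close together

print(confident, unsure)
```

When one logit clearly dominates, its probability passes the threshold and the tag is returned. When the logits are close, no class reaches 0.75 and the caller falls back to the "I do not understand..." reply.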

In [7]:
import random


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

with open('intents.json', 'r') as json_data:
    intents = json.load(json_data)

FILE = "data.pth"
# map_location lets a checkpoint saved on a GPU load on a CPU-only machine
data = torch.load(FILE, map_location=device)

input_size = data["input_size"]
hidden_size = data["hidden_size"]
output_size = data["output_size"]
all_words = data['all_words']
tags = data['tags']
model_state = data["model_state"]

model = NeuralNet(input_size, hidden_size, output_size).to(device)
model.load_state_dict(model_state)
model.eval()

bot_name = "Hospital Customer Care Service "
print("Let's chat! (type 'quit' to exit)")
while True:
    # sentence = "do you use credit cards?"
    sentence = input("You: ")
    if sentence == "quit":
        break

    # lowercase the input to match the lowercased training patterns
    sentence = tokenize(sentence.lower())
    X = bag_of_words(sentence, all_words)
    X = X.reshape(1, X.shape[0])
    X = torch.from_numpy(X).to(device)

    output = model(X)
    _, predicted = torch.max(output, dim=1)

    tag = tags[predicted.item()]

    probs = torch.softmax(output, dim=1)
    prob = probs[0][predicted.item()]
    if prob.item() > 0.75:
        for intent in intents['intents']:
            if tag == intent["tag"]:
                print(f"{bot_name}: {random.choice(intent['responses'])}")
    else:
        print(f"{bot_name}: I do not understand...")

Let's chat! (type 'quit' to exit)
You: hello
Hospital Customer Care Service : Hi there, how can I help?
You: how can you help me?
Hospital Customer Care Service : I can guide you through Adverse drug reaction list, Blood pressure tracking, Hospitals and Pharmacies
You: how to logging into blood pressure results
Hospital Customer Care Service : Navigating to Blood Pressure module
You: okay
Hospital Customer Care Service : I do not understand...
You: okay I understand, but where is the pharmacy
Hospital Customer Care Service : Please provide pharmacy name
You: okay thank you goodbye
Hospital Customer Care Service : My pleasure
You: quit


### Conclusion

> The goal was a hospital chatbot that provides first-class assistance during the hospital's peak hours, built from a dataset of query/response pairs.

> The data was preprocessed into tags, patterns, and responses. The patterns were tokenized and converted to bag-of-words vectors; a `stem` helper was defined but not applied in the run shown.

> I built a neural network with 2 hidden layers and trained it with these hyperparameters: 1000 epochs, batch size 8, and learning rate 0.001. The input size is 90 (the number of unique words in the patterns) and the output size is 14 (the number of tags).

> The training loss falls to about 0.0001 by epoch 1000. This is measured on the training data only, so it does not show how well the model generalizes. At inference time, the bot answers only when the softmax probability of the predicted tag exceeds 0.75; otherwise it replies "I do not understand...".