> **Copyright (c) 2020 Skymind Holdings Berhad**<br><br>
> **Copyright (c) 2021 Skymind Education Group Sdn. Bhd.**<br>
<br>
Licensed under the Apache License, Version 2.0 (the "License");
<br>you may not use this file except in compliance with the License.
<br>You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0/
<br>
<br>Unless required by applicable law or agreed to in writing, software
<br>distributed under the License is distributed on an "AS IS" BASIS,
<br>WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
<br>See the License for the specific language governing permissions and
<br>limitations under the License.
<br>
<br>
**SPDX-License-Identifier: Apache-2.0**
<br>

# Introduction

This hands-on demonstrates how to create a chatbot with PyTorch from scratch. It is divided into several parts that you can run in order, and you can also edit the "intents.json" file, which serves as the dataset.

# What will we accomplish?

1. Learn how chatbots work
2. Get an overview of a chatbot structure
3. Learn implementations of PyTorch

# Instructions
Read the guide in each section and run the cells in order.

# Part 1: Data Preprocessing

## Important (Dataset)
Our dataset is the "intents.json" file. It contains the user's intents and the possible queries (questions) that the user might ask. If you are familiar with JSON files, you should have no problem adding queries along with their tags and the answers the chatbot should give.

The JSON file contains four keys: intents, tag, patterns and responses.

"intents" is the top-level list; each entry in it holds a tag, patterns and responses. You don't have to change anything at this level.

To add a query, you must add a tag, its patterns and its responses.

The tag is the label that we give to the query, for example:

patterns: "hello", "hey", "hi"

tag: "greetings"

Patterns like "hello" clearly fall under the greetings tag. You also need to add a response so that the bot is able to answer.

For example, in the responses you can add "Hi how may I help you".

Reminder: intents.json must be valid JSON in this structure; otherwise the file cannot be loaded and the model cannot be trained.
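
For reference, the structure described above can be sketched in Python and checked before training. This is only an illustration under assumptions: the example entry reuses the greetings sample from this section rather than the contents of the real file, and `validate_intents` is a hypothetical helper that is not part of this hands-on.

```python
import json

# Illustrative content only: the real intents.json has its own tags and responses.
example = {
    "intents": [
        {
            "tag": "greetings",
            "patterns": ["hello", "hey", "hi"],
            "responses": ["Hi how may I help you"],
        }
    ]
}


def validate_intents(data):
    """Hypothetical helper: return a list of problems found in an intents dict."""
    intents = data.get("intents") if isinstance(data, dict) else None
    if not isinstance(intents, list):
        return ["top-level key 'intents' must be a list"]
    problems = []
    for i, intent in enumerate(intents):
        if not isinstance(intent, dict):
            problems.append(f"intent {i}: must be an object")
            continue
        if not isinstance(intent.get("tag"), str):
            problems.append(f"intent {i}: 'tag' must be a string")
        for key in ("patterns", "responses"):
            value = intent.get(key)
            if not isinstance(value, list) or not value:
                problems.append(f"intent {i}: '{key}' must be a non-empty list")
    return problems


# Round-trip through JSON text to mimic reading the file from disk.
text = json.dumps(example, indent=2)
problems = validate_intents(json.loads(text))
print(problems or "intents structure looks valid")
```

Running a check like this on your own file after editing it can catch a missing key or an empty list before the training step fails.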

## Data Preprocessing
The next section preprocesses the data. The steps are tokenization, stemming and building a bag of words. All of the methods are defined in the cell below.

If you have not installed nltk, install it first. If you have just installed nltk and are running it for the first time, you might need to download punkt as well. Just uncomment the relevant lines and run them, and the download will start automatically.

In [1]:
# !pip install nltk
import numpy as np
import nltk

# You might need to download punkt if you are running this for the first time
# nltk.download('punkt')
# Newer NLTK releases may ask for 'punkt_tab' instead:
# nltk.download('punkt_tab')

from nltk.stem.porter import PorterStemmer
stemmer = PorterStemmer()


def tokenize(sentence):
    """
    split sentence into array of words/tokens
    a token can be a word or punctuation character, or number
    """
    return nltk.word_tokenize(sentence)


def stem(word):
    """
    stemming = find the root form of the word
    examples:
    words = ["organize", "organizes", "organizing"]
    words = [stem(w) for w in words]
    -> ["organ", "organ", "organ"]
    """
    return stemmer.stem(word.lower())


def bag_of_words(tokenized_sentence, words):
    """
    return bag of words array:
    1 for each known word that exists in the sentence, 0 otherwise
    example:
    sentence = ["hello", "how", "are", "you"]
    words = ["hi", "hello", "I", "you", "bye", "thank", "cool"]
    bag   = [  0 ,    1 ,    0 ,   1 ,    0 ,    0 ,      0]
    """
    # stem each word
    sentence_words = [stem(word) for word in tokenized_sentence]
    # initialize bag with 0 for each word
    bag = np.zeros(len(words), dtype=np.float32)
    for idx, w in enumerate(words):
        if w in sentence_words: 
            bag[idx] = 1

    return bag
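
To trace the three steps on a concrete input without installing NLTK, here is a simplified sketch. It is a stand-in built on assumptions: `simple_tokenize` uses a regular expression rather than `nltk.word_tokenize`, and `simple_stem` only lowercases instead of applying the Porter stemmer, so its results can differ from the functions above on inflected words. All three function names are hypothetical.

```python
import re


def simple_tokenize(sentence):
    """Stand-in tokenizer (assumption): words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def simple_stem(word):
    """Stand-in for stemming (assumption): lowercase only, no suffix stripping."""
    return word.lower()


def simple_bag_of_words(tokens, vocabulary):
    """Return 1.0 for each vocabulary word present in the tokens, else 0.0."""
    present = {simple_stem(t) for t in tokens}
    return [1.0 if w in present else 0.0 for w in vocabulary]


vocabulary = ["bye", "hello", "hi", "thank", "you"]
tokens = simple_tokenize("Hello, how are you?")
bag = simple_bag_of_words(tokens, vocabulary)
print(tokens)  # ['Hello', ',', 'how', 'are', 'you', '?']
print(bag)     # [0.0, 1.0, 0.0, 0.0, 1.0]
```

The vector has one position per vocabulary word, so its length always equals the vocabulary size regardless of how long the sentence is. That fixed length is what lets it serve as the input of a fully connected network.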

# Part 2: Neural Network Model
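Here we define our neural network model. It is a feed-forward network with three linear layers, that is, two hidden layers followed by an output layer, with ReLU activations in between.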

In [2]:
import torch.nn as nn


class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.l1 = nn.Linear(input_size, hidden_size) 
        self.l2 = nn.Linear(hidden_size, hidden_size) 
        self.l3 = nn.Linear(hidden_size, num_classes)
        self.relu = nn.ReLU()
    
    def forward(self, x):
        out = self.l1(x)
        out = self.relu(out)
        out = self.l2(out)
        out = self.relu(out)
        out = self.l3(out)
        # no activation and no softmax at the end
        return out
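
As a quick size check, the number of trainable parameters can be worked out by hand: each `nn.Linear(in, out)` layer holds `in * out` weights plus `out` biases. The sketch below uses the sizes from the training run in Part 3 (81 inputs, hidden size 8, 8 classes); `linear_params` is a hypothetical helper introduced only for this arithmetic.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


input_size, hidden_size, num_classes = 81, 8, 8  # values printed during training

layers = [
    ("l1", input_size, hidden_size),
    ("l2", hidden_size, hidden_size),
    ("l3", hidden_size, num_classes),
]
total = 0
for name, n_in, n_out in layers:
    count = linear_params(n_in, n_out)
    total += count
    print(f"{name}: {count} parameters")
print(f"total: {total} parameters")  # 656 + 72 + 72 = 800
```

Most of the parameters sit in the first layer, because its input width equals the vocabulary size. Adding words to intents.json therefore grows the model mainly through `l1`.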


# Part 3: Training the Model
The next step is to train the model. This is the most crucial step: every time you make changes to the intents.json file, you need to train the model again for the changes to take effect when you run the chatbot. Once the model is trained, it is saved as data.pth.

In [3]:
import numpy as np
import json

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# from nltk_utils import bag_of_words, tokenize, stem
# from model import NeuralNet

with open('intents.json', 'r') as f:
    intents = json.load(f)

all_words = []
tags = []
xy = []
# loop through each sentence in our intents patterns
for intent in intents['intents']:
    tag = intent['tag']
    # add to tag list
    tags.append(tag)
    for pattern in intent['patterns']:
        # tokenize each word in the sentence
        w = tokenize(pattern)
        # add to our words list
        all_words.extend(w)
        # add to xy pair
        xy.append((w, tag))

# stem and lower each word
ignore_words = ['?', '.', '!']
all_words = [stem(w) for w in all_words if w not in ignore_words]
# remove duplicates and sort
all_words = sorted(set(all_words))
tags = sorted(set(tags))

print(len(xy), "patterns")
print(len(tags), "tags:", tags)
print(len(all_words), "unique stemmed words:", all_words)

# create training data
X_train = []
y_train = []
for (pattern_sentence, tag) in xy:
    # X: bag of words for each pattern_sentence
    bag = bag_of_words(pattern_sentence, all_words)
    X_train.append(bag)
    # y: PyTorch CrossEntropyLoss needs only class labels, not one-hot
    label = tags.index(tag)
    y_train.append(label)

X_train = np.array(X_train)
y_train = np.array(y_train)

# Hyper-parameters 
num_epochs = 1000
batch_size = 8
learning_rate = 0.001
input_size = len(X_train[0])
hidden_size = 8
output_size = len(tags)
print(input_size, output_size)


class ChatDataset(Dataset):

    def __init__(self):
        self.n_samples = len(X_train)
        self.x_data = X_train
        self.y_data = y_train

    # support indexing such that dataset[i] can be used to get i-th sample
    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    # we can call len(dataset) to return the size
    def __len__(self):
        return self.n_samples


dataset = ChatDataset()
train_loader = DataLoader(dataset=dataset,
                          batch_size=batch_size,
                          shuffle=True,
                          num_workers=0)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = NeuralNet(input_size, hidden_size, output_size).to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
for epoch in range(num_epochs):
    for (words, labels) in train_loader:
        words = words.to(device)
        labels = labels.to(dtype=torch.long).to(device)
        
        # Forward pass
        outputs = model(words)
        # if y would be one-hot, we must apply
        # labels = torch.max(labels, 1)[1]
        loss = criterion(outputs, labels)
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

print(f'final loss: {loss.item():.4f}')

data = {
    "model_state": model.state_dict(),
    "input_size": input_size,
    "hidden_size": hidden_size,
    "output_size": output_size,
    "all_words": all_words,
    "tags": tags
}

FILE = "data.pth"
torch.save(data, FILE)

print(f'training complete. file saved to {FILE}')


44 patterns
8 tags: ['delivery', 'funny', 'goodbye', 'greeting', 'items', 'payments', 'price', 'thanks']
81 unique stemmed words: ["'s", 'a', 'accept', 'and', 'anyon', 'anyth', 'are', 'buy', 'bye', 'can', 'card', 'cash', 'cost', 'credit', 'day', 'deliveri', 'do', 'doe', 'for', 'funni', 'get', 'good', 'goodby', 'greet', 'have', 'hello', 'help', 'here', 'hey', 'hi', 'how', 'hye', 'i', 'in', 'is', 'item', 'joke', 'kind', 'know', 'later', 'long', 'lot', 'make', 'mastercard', 'me', 'much', 'my', 'of', 'onli', 'or', 'pay', 'payment', 'paypal', 'plushi', 'price', 'product', 'sale', 'see', 'sell', 'shall', 'ship', 'shop', 'someth', 'store', 't-shirt', 'take', 'tell', 'thank', 'that', 'the', 'there', 'they', 'thi', 'time', 'to', 'wassup', 'what', 'when', 'which', 'with', 'you']
81 8
Epoch [100/1000], Loss: 0.4613
Epoch [200/1000], Loss: 0.0157
Epoch [300/1000], Loss: 0.0034
Epoch [400/1000], Loss: 0.0016
Epoch [500/1000], Loss: 0.0010
Epoch [600/1000], Loss: 0.0005
Epoch [700/1000], Loss: 0.000
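
The model returns raw scores (logits) with no softmax because `nn.CrossEntropyLoss` applies log-softmax internally and takes the class index as the target. For a single sample, that computation can be sketched in plain Python as follows. The logit values are made up for illustration, and `cross_entropy` is a hypothetical helper, not PyTorch's implementation.

```python
import math


def cross_entropy(logits, target_index):
    """Negative log-softmax of the target class for one sample."""
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


logits = [2.0, 0.5, -1.0]  # made-up scores for three classes
print(f"loss if the true class is 0: {cross_entropy(logits, 0):.4f}")
print(f"loss if the true class is 2: {cross_entropy(logits, 2):.4f}")
```

The loss is small when the highest score belongs to the true class and large otherwise, which is why the printed training loss approaches zero as the network learns to rank the correct tag first.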

# Part 4: Testing the Model
Finally, we can run the model and test it ourselves. Once you run the cell, an input prompt appears where we can enter our queries; type "quit" to stop. Make sure you have gone through intents.json so that you have a general idea of what to ask the chatbot and it is able to answer.

In [None]:
import random
import json
import torch
# from model import NeuralNet
# from nltk_utils import bag_of_words, tokenize

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

with open('intents.json', 'r') as json_data:
    intents = json.load(json_data)

FILE = "data.pth"
# map_location lets a model saved on a GPU be loaded on a CPU-only machine
data = torch.load(FILE, map_location=device)

input_size = data["input_size"]
hidden_size = data["hidden_size"]
output_size = data["output_size"]
all_words = data['all_words']
tags = data['tags']
model_state = data["model_state"]

model = NeuralNet(input_size, hidden_size, output_size).to(device)
model.load_state_dict(model_state)
model.eval()

bot_name = "Chat Bot"
print("Welcome, How may I serve you?")
while True:
    # sentence = "do you use credit cards?"
    sentence = input("You: ")
    if sentence == "quit":
        break

    sentence = tokenize(sentence)
    X = bag_of_words(sentence, all_words)
    X = X.reshape(1, X.shape[0])
    X = torch.from_numpy(X).to(device)

    # gradients are not needed at inference time
    with torch.no_grad():
        output = model(X)
    _, predicted = torch.max(output, dim=1)

    tag = tags[predicted.item()]

    probs = torch.softmax(output, dim=1)
    prob = probs[0][predicted.item()]
    if prob.item() > 0.75:
        for intent in intents['intents']:
            if tag == intent["tag"]:
                print(f"{bot_name}: {random.choice(intent['responses'])}")
    else:
        print(f"{bot_name}: I do not understand...")

Welcome, How may I serve you?
You: Hi
Chat Bot: Hey :-)
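
The confidence check in the loop above can be illustrated without PyTorch. This sketch applies softmax to made-up scores and uses the same 0.75 threshold. The tag subset and score values are illustrative assumptions, and `softmax` and `choose_tag` are hypothetical helpers.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_tag(scores, tags, threshold=0.75):
    """Return the most likely tag, or None if its probability is too low."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return tags[best] if probs[best] > threshold else None


tags = ["goodbye", "greeting", "thanks"]
print(choose_tag([0.1, 4.0, 0.2], tags))  # one score dominates -> greeting
print(choose_tag([1.0, 1.2, 1.1], tags))  # scores are close -> None
```

When no tag clears the threshold, the chatbot falls back to "I do not understand..." instead of guessing. Lowering the threshold makes the bot answer more often, at the cost of more wrong answers.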


# Summary
This hands-on shows a simple way of making a chatbot. You may notice that the model sometimes fails to understand a query and cannot provide a response. This could be due to the lack of data in the dataset and the vanishing gradient problem in our model.

Nevertheless, this can be a starting point for creating chatbots, and we can improve this chatbot by implementing an RNN and better data preprocessing.

# Contributors
Author
Pahvindran Raj