<a href="https://colab.research.google.com/github/Antana-A/Star-Wars-Chat-Bot-A-Feedforward-Neural-Network-Approach-using-PyTorch/blob/main/Star_Wars_Chat_Bot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Star Wars Chat Bot
This project explores the development of a simple chatbot using PyTorch, capable of engaging in conversations related to the Star Wars universe. The chatbot utilizes a neural network model trained on a dataset of common Star Wars facts and responses. By leveraging natural language processing techniques, the chatbot is designed to recognize user input patterns and generate contextually relevant replies.
The primary objective of this project is to demonstrate how a generic chatbot can be created and trained on specific conversational data. The chatbot, although built with Star Wars-themed data, can easily be adapted to other domains with appropriate data formatting. This project serves as a foundational demonstration of artificial intelligence in conversational agents, highlighting the simplicity of building machine learning models capable of basic dialogue systems.
This chatbot showcases not only the capabilities of neural networks in text classification but also the potential for expanding into more advanced conversational agents in the future.

IMPORT THE DATA SET

I created a new notebook named chatbot.ipynb and imported the required modules, then loaded the starwarsintents.json data file into the program.

In [2]:
import json
from google.colab import drive

# Mount the drive (if not already mounted)
drive.mount('/content/drive')

# Correct file path
starwarsintents = '/content/drive/My Drive/starwarsintents.json'

# Open and load the JSON file
with open(starwarsintents, "r") as f:
    intents = json.load(f)

# Print the content to verify
print(intents)



Mounted at /content/drive
{'intents': [{'tag': 'greeting', 'patterns': ['Hi', 'Hey', 'How are you', 'Is anyone there?', 'Hello', 'Good day', "What's up", 'Yo!', 'Howdy', 'Nice to meet you.'], 'responses': ['Hey', 'Hello, thanks for visiting.', 'Hi there, what can I do for you?', 'Hi there, how can I help?', 'Hello, there.', 'Hello Dear', 'Ooooo Hello, looking for someone or something?', 'Yes, I am here.', 'Listening carefully.', 'Ok, I am with you.']}, {'tag': 'goodbye', 'patterns': ['Bye', 'See you later.', 'Goodbye', 'Have a great day.', 'See you next time.', 'It was my pleassure.', 'Take care.', 'See ya!', 'Catch you later.', 'Ciao.'], 'responses': ['See you later, thanks for visiting.', 'May the force be with you!', 'See next time.', 'Was my pleassuare to meet you.', 'Hope will cath up sortly.', 'Have a nice day.', 'Bye! Come back again soon.', 'So, till next time.', 'If you need anything just text me anytime. Bye.', 'Well, hope see you soon!']}, {'tag': 'thanks', 'patterns': ['Tha
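Judging from the printed output, the file holds a top-level `intents` list whose entries each carry a `tag`, a list of `patterns`, and a list of `responses`. The following is a minimal sketch of that layout for a different domain, with a small sanity check. The `validate_intents` helper and the sample content are hypothetical and not part of the project.

```python
import json

# Hypothetical two-intent file in the same shape as starwarsintents.json
sample = {
    "intents": [
        {"tag": "greeting", "patterns": ["Hi", "Hello"], "responses": ["Hey"]},
        {"tag": "hours", "patterns": ["When are you open?"], "responses": ["9 to 5."]},
    ]
}

def validate_intents(data):
    """Return the list of tags, raising ValueError on a malformed entry."""
    tags = []
    for intent in data["intents"]:
        for key in ("tag", "patterns", "responses"):
            if key not in intent:
                raise ValueError(f"intent is missing '{key}': {intent}")
        if not intent["patterns"] or not intent["responses"]:
            raise ValueError(f"intent '{intent['tag']}' has empty patterns or responses")
        tags.append(intent["tag"])
    return tags

# Round-trip through JSON to mimic loading the file from disk
loaded = json.loads(json.dumps(sample))
print(validate_intents(loaded))
```

Running a check like this before training can catch a missing key early, which is useful when adapting the chatbot to new data.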

### Preprocessing the Data
In natural language processing (NLP), raw text needs to be converted into a format that a machine learning model can process. To achieve this, we will implement custom functions that streamline the process of preparing the data. We'll be using the Natural Language Toolkit (nltk), a powerful Python library that provides essential tools for NLP tasks. You can learn more in the NLTK documentation.

Stemming:
Words in different grammatical forms may convey the same meaning. For instance, "run," "running," and "runs" are variations of the same base word, "run." Reducing a word to its base or root form is called stemming. Stemming lets the model treat these variations as the same word, which shrinks the vocabulary and therefore the size of the model's input. In this project, we'll use the Porter Stemmer from the NLTK library, which is one of the most commonly used stemming methods. Because it strips suffixes by rule, an irregular form such as "ran" is left unchanged.
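As a quick illustration, the sketch below runs NLTK's `PorterStemmer` on a few words. It assumes `nltk` is installed; the stemmer itself needs no extra downloads. Note that a suffix-stripping stemmer leaves irregular forms such as "ran" untouched, and that stems are not always dictionary words.

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

# Regular inflections collapse to a shared stem
for word in ["run", "running", "runs"]:
    print(word, "->", stemmer.stem(word))

# Stems need not be real words, and irregular forms are left alone
for word in ["goodbye", "amazing", "ran"]:
    print(word, "->", stemmer.stem(word))
```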

Bag of Words:
The Bag of Words model is a representation of text data where each sentence is broken down into individual words. We create a list containing all unique words in the dataset and then represent each sentence as an array where each position corresponds to whether a word is present (1) or absent (0). For example, if we have a sentence like "how are you" and an array of words like ["hi", "hello", "you", "how", "are", "goodbye"], the Bag of Words representation would look like [0, 0, 1, 1, 1, 0].

To achieve this, we will tokenize the sentences into words using the nltk.word_tokenize() function and then create the Bag of Words representation. This method will allow us to easily transform sentences into numerical arrays, which can then be used as inputs to our machine learning model.

In addition, we will ensure that all words are converted to lowercase before stemming, so that variations in capitalization (e.g., "Hello" vs. "hello") do not affect the model's performance.
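The worked example above can be reproduced with a few lines of plain Python. This is a simplified sketch: `bag_of_words_demo` is a hypothetical helper that lowercases and splits on whitespace, whereas the notebook's own function relies on `nltk.word_tokenize` and stemming.

```python
def bag_of_words_demo(sentence, vocabulary):
    """Binary bag of words: 1.0 where a vocabulary word occurs in the sentence."""
    # Lowercase and split on whitespace; tokenization with nltk and stemming
    # are left out of this sketch for brevity.
    tokens = set(sentence.lower().split())
    return [1.0 if word in tokens else 0.0 for word in vocabulary]

vocabulary = ["hi", "hello", "you", "how", "are", "goodbye"]
print(bag_of_words_demo("How are you", vocabulary))
# [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
```

Word order and repetition are discarded in this representation: "you are how" would produce the same vector.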




In [16]:
import nltk
from nltk.stem.porter import PorterStemmer
import numpy as np

# Download 'punkt' for tokenization
nltk.download('punkt')

# Initialize the stemmer
stemmer = PorterStemmer()

# Function to tokenize the sentence
def tokenize(sentence):
    return nltk.word_tokenize(sentence)

# Function to stem a word
def stem(word):
    return stemmer.stem(word.lower())

# Function to create the bag of words
def bag_of_words(tokenized_sentence, words):
    # Stem each word in the tokenized sentence
    sentence_words = [stem(word) for word in tokenized_sentence]
    # Initialize the bag of words (0 for each word)
    bag = np.zeros(len(words), dtype=np.float32)
    for idx, w in enumerate(words):
        if w in sentence_words:
            bag[idx] = 1
    return bag



[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


To extract the information we need, we unpack starwarsintents.json with the following code.

In [59]:
all_words = []
tags = []
xy = []
# loop through each sentence in our intents patterns
for intent in intents["intents"]:
    tag = intent["tag"]
    # add to tag list
    tags.append(tag)
    for pattern in intent["patterns"]:
        # tokenize each word in the sentence
        w = tokenize(pattern)
        # add to our words list
        all_words.extend(w)
        # add to xy pair
        xy.append((w, tag))
print(xy)


[(['Hi'], 'greeting'), (['Hey'], 'greeting'), (['How', 'are', 'you'], 'greeting'), (['Is', 'anyone', 'there', '?'], 'greeting'), (['Hello'], 'greeting'), (['Good', 'day'], 'greeting'), (['What', "'s", 'up'], 'greeting'), (['Yo', '!'], 'greeting'), (['Howdy'], 'greeting'), (['Nice', 'to', 'meet', 'you', '.'], 'greeting'), (['Bye'], 'goodbye'), (['See', 'you', 'later', '.'], 'goodbye'), (['Goodbye'], 'goodbye'), (['Have', 'a', 'great', 'day', '.'], 'goodbye'), (['See', 'you', 'next', 'time', '.'], 'goodbye'), (['It', 'was', 'my', 'pleassure', '.'], 'goodbye'), (['Take', 'care', '.'], 'goodbye'), (['See', 'ya', '!'], 'goodbye'), (['Catch', 'you', 'later', '.'], 'goodbye'), (['Ciao', '.'], 'goodbye'), (['Thanks'], 'thanks'), (['Thank', 'you'], 'thanks'), (['That', "'s", 'helpful'], 'thanks'), (['Thank', "'s", 'a', 'lot', '!'], 'thanks'), (['Tnx'], 'thanks'), (['Wow'], 'thanks'), (['Great', '!'], 'thanks'), (['Good', '!'], 'thanks'), (['That', 'nice', '!'], 'thanks'), (['Amazing', '!'], 'th

This separates the tags and words into their own lists. Next, we stem the words, drop punctuation tokens, and remove duplicates.

In [60]:
# stem and lower each word
ignore_words = ["?", ".", "!"]
all_words = [stem(w) for w in all_words if w not in ignore_words]
# remove duplicates and sort
all_words = sorted(set(all_words))
tags = sorted(set(tags))
print(len(xy), "patterns")
print(len(tags), "tags:", tags)
print(len(all_words), "unique stemmed words:", all_words)


97 patterns
16 tags: ['Menu', 'about me', 'alive', 'bounti hounter', 'creator', 'funny', 'goodbye', 'greeting', 'hepl', 'jedi', 'mission', 'myself', 'sith', 'stories', 'tasks', 'thanks']
121 unique stemmed words: ["'s", '10', 'a', 'abil', 'abl', 'about', 'aliv', 'am', 'amaz', 'ani', 'anyon', 'are', 'aslan', 'assist', 'bar', 'be', 'best', 'bit', 'bounti', 'breath', 'bye', 'can', 'care', 'catch', 'check', 'ciao', 'creat', 'creator', 'daddi', 'day', 'detail', 'do', 'drink', 'els', 'father', 'featur', 'for', 'funni', 'galaxi', 'good', 'goodby', 'great', 'have', 'hello', 'help', 'hey', 'hi', 'hope', 'hounter', 'how', 'howdi', 'i', 'identifi', 'in', 'is', 'it', 'item', 'jedi', 'joke', 'kind', 'know', 'later', 'let', 'look', 'lot', 'me', 'meet', 'menu', 'mision', 'mission', 'mr.', 'my', 'myself', 'need', 'next', 'nice', 'now', 'of', 'on', 'one', 'partner', 'person', 'pleas', 'pleassur', 'profil', 'right', 'run', 'see', 'select', 'serv', 'sing', 'sith', 'so', 'someth', 'stori', 'take', 'talk',

# Create Training Data
We will transform the data into a format that our PyTorch model can easily understand. One-hot encoding is the process of splitting a multi-class column into one column per class and marking a 1 in the column of the class each row belongs to. We won't use it here: PyTorch's CrossEntropyLoss takes integer class indices as targets, so one-hot vectors are not needed.
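To make the difference between the two label formats concrete, here is a small sketch in plain Python. The three-tag list and the sample tags are hypothetical stand-ins for the notebook's `tags` and `xy` data.

```python
tags = ["goodbye", "greeting", "thanks"]   # hypothetical, sorted tag list
sample_tags = ["greeting", "thanks", "goodbye", "greeting"]

# Class-index labels: one integer per sample (the format built in this notebook)
index_labels = [tags.index(t) for t in sample_tags]

# One-hot labels: one column per class, 1 in the column of the true class
one_hot_labels = [[1 if i == label else 0 for i in range(len(tags))]
                  for label in index_labels]

print(index_labels)    # [1, 2, 0, 1]
print(one_hot_labels)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]

# Both encodings carry the same information: argmax recovers the index
recovered = [row.index(max(row)) for row in one_hot_labels]
assert recovered == index_labels
```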

In [61]:
import numpy as np



# Create training data
X_train = []
y_train = []

for (pattern_sentence, tag) in xy:
    # X: bag of words for each pattern_sentence
    bag = bag_of_words(pattern_sentence, all_words)
    X_train.append(bag)

    # Check if tag exists in tags
    #if tag not in tags:
        #print(f"Tag '{tag}' not found in tags list: {tags}")
       # continue  # Skip if the tag is not found

    # y: PyTorch CrossEntropyLoss needs only class labels, not one-hot
    label = tags.index(tag)  # Find index of the tag in the tags list
    y_train.append(label)

# Convert training data lists to numpy arrays
X_train = np.array(X_train)
y_train = np.array(y_train)

print(X_train)
print(y_train)



[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 1. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 1. ... 0. 0. 0.]
 [0. 0. 1. ... 0. 1. 0.]]
[ 7  7  7  7  7  7  7  7  7  7  6  6  6  6  6  6  6  6  6  6 15 15 15 15
 15 15 15 15 15 15 14 14 14 14 14  2  2  2  0  0  0  0  0  0  0  0  0  0
  8  8  8  8  8  8  8  8  8  8 10 10 10 10 10  9  9  9  9  9 12 12 12 12
 12  3  3  3  3  3  5  5  5  5  5  1  1  1  1  4  4  4  4 11 11 11 11 13
 13]




*  PyTorch Model

In this section, we will create a class that implements our custom neural network using PyTorch. Specifically, we'll develop a Feed Forward Neural Network (FFNN) with three linear layers, using the ReLU (Rectified Linear Unit) activation function to introduce non-linearity to the model. For more detailed information on PyTorch and neural networks, see the PyTorch documentation.

*  Feed Forward Neural Network (FFNN):
A Feed Forward Neural Network is a simple yet powerful type of artificial neural network where information flows in only one direction: from the input layer, through the hidden layers, to the output layer. In contrast to other neural networks, such as Recurrent Neural Networks (RNNs), FFNNs do not have loops or cycles in their architecture. This structure ensures that the data moves forward in the network without feedback loops. Feedforward networks are widely used in many tasks, such as image classification and natural language processing, and are particularly known for their ease of implementation.

* Activation Function:
In neural networks, activation functions are essential to introducing non-linearities, which help the model learn complex patterns in the data. An activation function takes the output from a neuron and decides whether or not to "activate" it by applying a threshold or transformation. Without activation functions, a stack of layers would behave like a single linear transformation and would not be capable of solving problems that require complex, non-linear decision boundaries.
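The claim that a network without activations reduces to a linear transformation can be checked numerically. The sketch below uses NumPy with arbitrary random weights; the layer sizes are made up for illustration and are not the ones used in this project.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer 1: 3 -> 4
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer 2: 4 -> 2
x = rng.normal(size=3)

# Two linear layers with no activation in between...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...equal a single linear layer with merged weights and bias
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b
print(np.allclose(two_layers, one_layer))  # True

# Inserting ReLU between the layers breaks this equivalence in general
with_relu = W2 @ np.maximum(0, W1 @ x + b1) + b2
```

Because the merged `W` and `b` exist for any choice of weights, extra linear layers add no expressive power unless a non-linearity sits between them.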

* ReLU Function:
One of the most popular and efficient activation functions in deep learning is the ReLU (Rectified Linear Unit) function. The ReLU function outputs zero if the input is negative and outputs the input directly if it is positive. This simple yet powerful function helps models learn faster and handle large datasets more efficiently by mitigating the vanishing gradient problem. The mathematical definition of ReLU is:

$$f(x) = \max(0, x)$$

The flat gradient for negative values of $x$ and the linear relationship for positive values make ReLU a highly efficient choice for deep neural networks.
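A plain-Python sketch of ReLU and its gradient follows. The helper names are illustrative, and the gradient at exactly zero is taken as 0 here, which is a common convention rather than a mathematical necessity.

```python
def relu(x):
    """Rectified Linear Unit: f(x) = max(0, x)."""
    return max(0.0, x)

def relu_grad(x):
    """Derivative of ReLU: 0 for negative inputs, 1 for positive inputs
    (taken as 0 at x == 0 in this sketch)."""
    return 1.0 if x > 0 else 0.0

inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
print([relu(x) for x in inputs])       # [0.0, 0.0, 0.0, 0.5, 2.0]
print([relu_grad(x) for x in inputs])  # [0.0, 0.0, 0.0, 1.0, 1.0]
```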


Creating our model. Here we inherit from nn.Module because we will be customizing the model and its layers.

In [62]:
import torch
import torch.nn as nn

# Define the neural network class
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.l1 = nn.Linear(input_size, hidden_size)
        self.l2 = nn.Linear(hidden_size, hidden_size)
        self.l3 = nn.Linear(hidden_size, num_classes)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.l1(x)
        out = self.relu(out)
        out = self.l2(out)
        out = self.relu(out)
        out = self.l3(out)
        # no activation and no softmax at the end
        return out



Next, we write our dataset class using Python's magic (dunder) methods. You can read online about the `__getitem__` and `__len__` magic methods, which are the two this class implements.

In [63]:
import torch
from torch.utils.data import Dataset

# Define the custom dataset class
class ChatDataset(Dataset):
    def __init__(self, X_train, y_train):
        self.n_samples = len(X_train)
        self.x_data = X_train
        self.y_data = y_train

    # Support indexing such that dataset[i] can be used to get the i-th sample
    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    # We can call len(dataset) to return the size of the dataset
    def __len__(self):
        return self.n_samples
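To see what these two methods provide on their own, here is a sketch of a similar container written without PyTorch. `PairDataset` and its data are hypothetical; the point is that indexing plus a length is all a loader needs in order to form batches from a map-style dataset.

```python
class PairDataset:
    """Minimal container pairing features with labels (no PyTorch needed)."""

    def __init__(self, features, labels):
        assert len(features) == len(labels)
        self.features = features
        self.labels = labels

    def __getitem__(self, index):
        # Called for dataset[index]
        return self.features[index], self.labels[index]

    def __len__(self):
        # Called for len(dataset)
        return len(self.features)

dataset = PairDataset([[0, 1], [1, 0], [1, 1]], [7, 6, 15])
print(len(dataset))   # 3
print(dataset[1])     # ([1, 0], 6)

# Indexing plus a length is enough to cut the data into batches by hand
batch_size = 2
batches = [[dataset[i] for i in range(start, min(start + batch_size, len(dataset)))]
           for start in range(0, len(dataset), batch_size)]
print([len(b) for b in batches])  # [2, 1]
```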



Every neural network has a set of hyperparameters that must be set before use. Before instantiating the NeuralNet class we wrote earlier, we first define these hyperparameters, which can be tuned as needed.

In [64]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# Hyper-parameters
num_epochs = 1000
batch_size = 8
learning_rate = 0.001
input_size = len(X_train[0])  # Number of features
hidden_size = 8  # You can adjust this based on your model's complexity
output_size = len(tags)  # Number of classes

print(f"Input size: {input_size}, Output size: {output_size}")



Input size: 121, Output size: 16


# Instantiating the Model, Loss Function, and Optimizer
In this section, we will initialize our custom neural network model, define the loss function, and select an appropriate optimizer to train the model efficiently.

Loss Function: Cross Entropy
For our classification problem, we will use the Cross Entropy Loss function, which is commonly employed for multi-class classification tasks. The Cross Entropy loss function measures the difference between the predicted probability distribution and the actual labels (ground truth). It helps guide the model by penalizing predictions that deviate from the true label, thus minimizing the overall error during training. In PyTorch, we can use nn.CrossEntropyLoss() to apply this loss function. It works directly on the raw output scores, which is why the model's forward pass ends without a softmax. More details can be found in the nn.CrossEntropyLoss documentation.
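For a single sample, cross entropy is the negative log of the softmax probability assigned to the true class. The sketch below computes it by hand in plain Python. The logits are made-up numbers, and the helper names are illustrative rather than PyTorch functions.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(softmax(logits)[label])

logits = [2.0, 0.5, -1.0]          # hypothetical raw outputs for 3 classes
print(cross_entropy(logits, 0))    # small loss: class 0 has the highest score
print(cross_entropy(logits, 2))    # large loss: class 2 has the lowest score
```

When the scores give no preference, as with two equal logits, the loss equals log(2), the cost of a coin-flip guess.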

Optimizer: Adam Optimizer
To update the model's weights during training, we will use the Adam (Adaptive Moment Estimation) optimizer. Adam is an efficient and widely used optimization algorithm that combines the advantages of both AdaGrad (which works well with sparse gradients) and RMSProp (which handles non-stationary objectives). Adam dynamically adjusts the learning rate for each parameter based on estimates of first and second moments of the gradients, making it suitable for large datasets and deep neural networks. In PyTorch, we can initialize the Adam optimizer using torch.optim.Adam(). You can read more in the torch.optim.Adam documentation.
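The moment estimates can be illustrated with a scalar version of the update rule. This is a sketch of the standard Adam formulas with the usual default constants, not the code PyTorch runs internally. The `adam_step` helper and the toy quadratic objective are assumptions made for illustration.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad            # first moment (running mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2       # second moment (running mean of squared gradients)
    m_hat = m / (1 - beta1 ** t)                  # bias correction for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    w, m, v = adam_step(w, 2 * (w - 3), m, v, t, lr=0.01)
print(w)  # should land close to the minimiser at 3
```

On the very first step the bias-corrected ratio is roughly the sign of the gradient, so the parameter moves by about `lr` regardless of the gradient's scale.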

In [66]:
dataset = ChatDataset(X_train, y_train)
train_loader = DataLoader(dataset=dataset, batch_size=batch_size, shuffle=True, num_workers=0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = NeuralNet(input_size, hidden_size, output_size).to(device)
# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)



In [65]:
print(f"X_train size: {len(X_train)}")
print(f"y_train size: {len(y_train)}")


X_train size: 97
y_train size: 97


## Training Model

In [67]:
# Train the model
for epoch in range(num_epochs):
    for (words, labels) in train_loader:
        words = words.to(device)
        labels = labels.to(dtype=torch.long).to(device)
        # Forward pass
        outputs = model(words)
        # if y would be one-hot, we must apply
        # labels = torch.max(labels, 1)[1]
        loss = criterion(outputs, labels)
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if (epoch + 1) % 100 == 0:
        print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}")
data = {
    "model_state": model.state_dict(),
    "input_size": input_size,
    "hidden_size": hidden_size,
    "output_size": output_size,
    "all_words": all_words,
    "tags": tags,
}



Epoch [100/1000], Loss: 0.3807
Epoch [200/1000], Loss: 0.0260
Epoch [300/1000], Loss: 0.0006
Epoch [400/1000], Loss: 0.0001
Epoch [500/1000], Loss: 0.0000
Epoch [600/1000], Loss: 0.0009
Epoch [700/1000], Loss: 0.0001
Epoch [800/1000], Loss: 0.0002
Epoch [900/1000], Loss: 0.0000
Epoch [1000/1000], Loss: 0.0000
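The logged value is the loss of the last batch in each epoch, not an average over the epoch. A little arithmetic on the values used in this notebook shows how many batches there are, assuming the DataLoader keeps the final partial batch (its default behaviour):

```python
import math

n_samples, batch_size, num_epochs = 97, 8, 1000   # values used in this notebook
batches_per_epoch = math.ceil(n_samples / batch_size)
last_batch_size = n_samples - (batches_per_epoch - 1) * batch_size
print(batches_per_epoch)               # 13
print(last_batch_size)                 # 1
print(batches_per_epoch * num_epochs)  # 13000
```

Since the final batch holds a single, randomly shuffled sample, the printed loss reflects one pattern only, which may explain small bumps such as the value at epoch 600.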


Saving the trained model.

In [68]:
FILE = "data.pth"
torch.save(data, FILE)


Loading Data

In [70]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
with open(starwarsintents, "r") as json_data:
    intents = json.load(json_data)
FILE = "data.pth"
data = torch.load(FILE)
input_size = data["input_size"]
hidden_size = data["hidden_size"]
output_size = data["output_size"]
all_words = data["all_words"]
tags = data["tags"]
model_state = data["model_state"]
model = NeuralNet(input_size, hidden_size, output_size).to(device)
model.load_state_dict(model_state)
model.eval()




NeuralNet(
  (l1): Linear(in_features=121, out_features=8, bias=True)
  (l2): Linear(in_features=8, out_features=8, bias=True)
  (l3): Linear(in_features=8, out_features=16, bias=True)
  (relu): ReLU()
)
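The layer sizes in the printout above determine how many trainable parameters the model has. A short calculation, where `linear_params` is an illustrative helper rather than a PyTorch function:

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

layer_sizes = [(121, 8), (8, 8), (8, 16)]   # l1, l2, l3 from the printout above
per_layer = [linear_params(i, o) for i, o in layer_sizes]
print(per_layer)       # [976, 72, 144]
print(sum(per_layer))  # 1192
```

Most of the parameters sit in the first layer, because its input width equals the vocabulary size.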

Our model is ready. Because our training data was very limited, we can only chat about a handful of topics. You can train it on a bigger dataset to improve the chatbot's generalization and knowledge.

In [73]:
import random  # needed for random.choice below

bot_name = "Eldoret"
def get_response(msg):
    sentence = tokenize(msg)
    X = bag_of_words(sentence, all_words)
    X = X.reshape(1, X.shape[0])
    X = torch.from_numpy(X).to(device)
    output = model(X)
    _, predicted = torch.max(output, dim=1)
    tag = tags[predicted.item()]
    probs = torch.softmax(output, dim=1)
    prob = probs[0][predicted.item()]
    if prob.item() > 0.75:
        for intent in intents["intents"]:
            if tag == intent["tag"]:
                return random.choice(intent["responses"])
    return "Sorry, didn't get it..."
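The confidence check in `get_response` can be isolated in a few lines of plain Python. In this sketch, `pick_tag`, the tag list, and the logits are hypothetical; only the 0.75 threshold is taken from the code above.

```python
import math

def pick_tag(logits, tags, threshold=0.75):
    """Return the most likely tag, or None when its softmax probability
    does not exceed the threshold."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return tags[best] if probs[best] > threshold else None

tags = ["goodbye", "greeting", "thanks"]   # hypothetical tag list
print(pick_tag([0.1, 4.0, 0.2], tags))     # confident -> greeting
print(pick_tag([1.0, 1.2, 0.9], tags))     # ambiguous -> None
```

Raising the threshold makes the bot fall back to its "didn't get it" reply more often, while lowering it makes the bot answer more readily at the risk of picking the wrong intent.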


In [None]:
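# Define the bot's name
bot_name = "Eldoret"

# NOTE: this placeholder redefines get_response and therefore overrides the
# model-backed version from the previous cell; delete it to chat with the
# trained model. The session shown below was produced with the placeholder.
def get_response(msg):
    return "This is a placeholder response. You said: " + msg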

# Chat application for a command-line interface (CLI)
def chat_application():
    print(f"{bot_name}: Hello! How can I assist you today?")
    while True:
        # Get user input
        user_input = input("You: ")
        if user_input.lower() == "exit":
            print(f"{bot_name}: Goodbye!")
            break

        # Get bot's response
        response = get_response(user_input)
        print(f"{bot_name}: {response}")

if __name__ == "__main__":
    chat_application()


Eldoret: Hello! How can I assist you today?
You: anyone there?
Eldoret: This is a placeholder response. You said: anyone there?
You: yes
Eldoret: This is a placeholder response. You said: yes
You: help me 
Eldoret: This is a placeholder response. You said: help me 
You: Goodbye!
Eldoret: This is a placeholder response. You said: Goodbye!
