## Introduction

This project develops a chatbot that can hold conversations about mental health and offer support. We trained the model on a publicly available Kaggle dataset. The objective was to implement the chatbot without using Large Language Models (LLMs), so the chatbot selects its responses by classifying the intent of the user's message. We used core Natural Language Processing (NLP) techniques to pre-process the data and build the model, and we applied transfer learning by using a pre-trained model for the word embeddings. The chatbot model is built on an LSTM architecture, since LSTMs are well suited to sequential data and have shown promise in capturing context and dependencies in natural language processing tasks.

## Technologies Used

We used the following technologies to develop our model:

- Natural Language Toolkit (nltk) in Python for pre-processing text data
- PyTorch for model development
- GloVe for word embedding
- Transfer learning to implement the embedding layer

In [None]:
# Importing the required libraries
import json
import random
import nltk
from nltk.tokenize import word_tokenize
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from torch.nn.utils.rnn import pad_sequence
from torchtext.vocab import GloVe

## Dataset Description

We used a publicly available dataset from Kaggle. It contains conversational data related to mental health in JSON format and is comparatively small. The data is organized as a list of intent objects, each with a tag, a list of patterns, and a list of responses. The tag is the label for a particular intent. The patterns are the training data: example messages (questions and statements) that a user might send to the chatbot. The responses are the pre-defined answers from which the chatbot picks its reply.

The dataset is available at:

https://www.kaggle.com/datasets/elvis23/mental-health-conversational-data
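
To make the structure concrete, the following minimal sketch parses a small sample in the same shape as the dataset and counts the patterns and responses under each tag. The sample is illustrative and abbreviated (it is not the full file, and the `morning` response is a placeholder), and the helper name `summarize_intents` is hypothetical.

```python
import json

# Illustrative sample in the same shape as the dataset (not the full file).
sample_json = """
{
    "intents": [
        {
            "tag": "greeting",
            "patterns": ["Hi", "Hey", "Hello"],
            "responses": ["Hi there. What brings you here today?"]
        },
        {
            "tag": "morning",
            "patterns": ["Good morning"],
            "responses": ["(placeholder response)"]
        }
    ]
}
"""

def summarize_intents(data):
    """Return {tag: (number of patterns, number of responses)}."""
    summary = {}
    for intent in data["intents"]:
        summary[intent["tag"]] = (len(intent["patterns"]), len(intent["responses"]))
    return summary

sample = json.loads(sample_json)
summary = summarize_intents(sample)
for tag, (n_patterns, n_responses) in summary.items():
    print(f"{tag}: {n_patterns} patterns, {n_responses} responses")
```

A summary like this is a quick way to spot intents with very few patterns, which matters for a small dataset.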

In [None]:
# Importing the dataset
with open('./data/data.json', 'r') as file:
  data = json.load(file)

# Displaying the data in the json file in a readable format
pretty_data = json.dumps(data, indent=4)
print(pretty_data)

{
    "intents": [
        {
            "tag": "greeting",
            "patterns": [
                "Hi",
                "Hey",
                "Is anyone there?",
                "Hi there",
                "Hello",
                "Hey there",
                "Howdy",
                "Hola",
                "Bonjour",
                "Konnichiwa",
                "Guten tag",
                "Ola"
            ],
            "responses": [
                "Hello there. Tell me how are you feeling today?",
                "Hi there. What brings you here today?",
                "Hi there. How are you feeling today?",
                "Great to see you. How do you feel currently?",
                "Hello there. Glad to see you're back. What's going on in your world right now?"
            ]
        },
        {
            "tag": "morning",
            "patterns": [
                "Good morning"
            ],
            "responses": [
                "Good morning. I hope you had a

## Data Preprocessing

Conversational text data needs to be pre-processed and formatted appropriately before it is fed into the model. We use the following steps in data pre-processing:

- Tokenization
- Word embedding
- Indexing the output labels
- Add padding
- Convert into tensors
- Creating the dataset
- Creating data loaders
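
The steps above can be sketched end to end on a toy example using only the Python standard library. This is a simplified illustration, not the notebook's implementation: the vocabulary `toy_vocab`, the whitespace tokenizer, and the `batches` helper are assumptions, and plain lists stand in for tensors, the dataset, and the data loader.

```python
# Toy vocabulary standing in for the pre-trained embedding vocabulary (assumption).
# Index 0 is reserved for padding in this sketch.
toy_vocab = {"i": 1, "feel": 2, "sad": 3, "hello": 4, "there": 5}
examples = [("Hello there", "greeting"), ("I feel sad", "sad"), ("Hello", "greeting")]
PAD = 0

def tokenize(text):
    # Whitespace split as a stand-in for a real word tokenizer.
    return text.lower().split()

def encode(text):
    # Map tokens to vocabulary indices; unknown tokens are skipped.
    return [toy_vocab[t] for t in tokenize(text) if t in toy_vocab]

# Tokenization and index lookup
sequences = [encode(text) for text, _ in examples]

# Indexing the output labels in a deterministic (sorted) order
label_to_index = {label: i for i, label in enumerate(sorted({label for _, label in examples}))}
labels = [label_to_index[label] for _, label in examples]

# Padding on the right to the length of the longest sequence
max_len = max(len(seq) for seq in sequences)
padded = [seq + [PAD] * (max_len - len(seq)) for seq in sequences]

def batches(inputs, targets, batch_size):
    # Plain-list stand-in for a dataset plus data loader (no shuffling).
    for start in range(0, len(inputs), batch_size):
        yield inputs[start:start + batch_size], targets[start:start + batch_size]

for x_batch, y_batch in batches(padded, labels, batch_size=2):
    print(x_batch, y_batch)
```

Every row of `padded` has the same length, which is what allows the rows to be stacked into a single tensor later.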

In [None]:
# Downloading the nltk tokenizer
# We use the nltk Python library to process our text dataset
# punkt is a pre-trained tokenizer model
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [None]:
# Extracting patterns, tags, and responses
# The patterns and tags are the training inputs and labels for the model
# The responses for each tag are kept so that a reply can be produced once the intent is classified
patterns = []
tags = []
tag_to_response = {}

for intent in data['intents']:
    for pattern in intent['patterns']:
        patterns.append(pattern.lower())
        tags.append(intent['tag'])
    tag_to_response[intent['tag']] = intent['responses']

tokenized_patterns = []

# Iterating over each pattern and tokenize it
# Here we use word tokenization to tokenize words in the patterns
for pattern in patterns:
    tokenized_patterns.append(word_tokenize(pattern))

# Displaying the tokenized patterns
print(tokenized_patterns)



In [None]:
# Building the vocabulary from patterns
# Token generator that yields each token list from the iterator passed to it
# (not used by the cells shown below, which take the vocabulary from GloVe)
def yield_tokens(data_iter):
    for tokens in data_iter:
        yield tokens

## Word Embedding

We used GloVe (Global Vectors for Word Representation) for word embedding. GloVe is an unsupervised learning algorithm for obtaining vector representations of words. Through transfer learning, we load the parameters of the pre-trained GloVe model and use them as the embedding layer of our model.
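
As a rough illustration of what a pre-trained embedding provides, the sketch below parses a few lines in the GloVe text layout (a word followed by its vector components) into a word-to-index map and a vector table, then looks tokens up. The three lines and their 4-dimensional values are made up for the example, and `load_vectors` and `embed` are hypothetical helpers. The names `stoi` and `vectors` mirror the attributes used on the `glove` object in this notebook.

```python
# Three made-up lines in the GloVe text layout: a word followed by its vector components.
glove_text = """the 0.1 0.2 0.3 0.4
hello 0.5 0.1 -0.2 0.0
sad -0.3 0.7 0.2 0.1"""

def load_vectors(text):
    """Parse GloVe-style text into (word -> row index, list of vectors)."""
    stoi = {}
    vectors = []
    for line in text.splitlines():
        word, *values = line.split()
        stoi[word] = len(vectors)
        vectors.append([float(v) for v in values])
    return stoi, vectors

stoi, vectors = load_vectors(glove_text)

def embed(tokens):
    """Look up the vector of each known token; the table is read-only (frozen)."""
    return [vectors[stoi[t]] for t in tokens if t in stoi]

print(stoi)
print(embed(["hello", "world", "sad"]))  # "world" is not in the toy vocabulary and is skipped
```

Because the table is only read and never updated, this corresponds to using the embedding layer with frozen weights.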

In [None]:
# Here we use GloVe pre-trained embeddings
glove = GloVe(name='6B', dim=100)

In [None]:
patterns_indices = []

# Iterate over each list of tokens in tokenized_patterns
for tokens in tokenized_patterns:
    indices = []

    # Iterate over each token in the current list of tokens
    for token in tokens:
        if token in glove.stoi:
            index = glove.stoi[token]
            indices.append(index)

    patterns_indices.append(indices)

print(patterns_indices)

[[11083], [7942], [14, 1544, 63, 188], [11083, 63], [13075], [7942, 63], [63848], [77598], [80508], [], [221575, 6647], [39458], [219, 766], [219, 1502], [219, 1738], [219, 364], [12538], [253, 81, 168], [10926], [5695, 112629], [75369], [4862, 12538], [12538, 127], [6769, 25759, 143], [3124], [5551, 81], [12, 9, 8347], [3124, 10, 0, 275], [73, 81, 191, 181], [], [936, 181], [38, 32, 81, 188], [102, 32, 81, 188], [38, 81, 32, 188], [1361, 285, 56, 59, 4961, 2], [102, 14, 392, 311, 188], [102, 189, 41, 580, 81, 188], [102, 9, 392, 311, 188], [1361, 285, 59, 4961], [102, 86, 81, 88, 188], [38, 955, 81, 188], [197, 35, 81, 116, 188], [197, 35, 81, 955, 188], [192, 311, 14], [41, 913, 311, 2], [41, 242, 21], [94, 81, 275, 285, 188], [455, 285, 7, 823, 3832], [86, 81, 275, 188], [102, 86, 81, 88, 10, 285, 188], [41, 408, 280], [41, 408, 275], [280, 285, 3832], [41, 913, 2518, 10678], [41, 913, 100, 10678], [41, 998, 135], [41, 998, 5279], [41, 913, 5279], [41, 998, 100, 10678], [41, 998, 39

In [None]:
# Initialize an empty list to store the tensor sequences
tensor_sequences = []

# Iterate over each sequence in patterns_indices
for seq in patterns_indices:
    # dtype is set explicitly so that empty sequences also become integer tensors
    tensor_seq = torch.tensor(seq, dtype=torch.long)
    tensor_sequences.append(tensor_seq)

# Padding all sequences to the same length with padding_value as 0
patterns_padded = pad_sequence(tensor_sequences, batch_first=True, padding_value=0)

# Output the result
print(patterns_padded)

tensor([[11083,     0,     0,  ...,     0,     0,     0],
        [ 7942,     0,     0,  ...,     0,     0,     0],
        [   14,  1544,    63,  ...,     0,     0,     0],
        ...,
        [  102,     9,     0,  ...,     0,     0,     0],
        [  102,     9,     0,  ...,     0,     0,     0],
        [ 2333,   118, 13866,  ...,     0,     0,     0]])
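
One point worth noting about the padding value: index 0 is also a real vocabulary index (the printed index lists contain 0 inside a pattern, for example `[3124, 10, 0, 275]`), so padded positions are indistinguishable from that token. Patterns whose tokens are all missing from the vocabulary (the `[]` entries) become rows of padding only. A possible remedy, sketched below with plain lists and made-up values, is to reserve a dedicated padding index by prepending an all-zero row to the embedding table and shifting every token index up by one. This is an assumption about how one might handle it, not what the notebook does. PyTorch's `nn.Embedding` also accepts a `padding_idx` argument for this purpose.

```python
# Toy embedding table (made-up values); row i is the vector for vocabulary index i.
vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
sequences = [[2, 0], [1], []]  # index 0 is a real token here; [] has no known tokens

PAD_INDEX = 0

# Prepend an all-zero row so that index 0 means "padding" only.
shifted_vectors = [[0.0] * len(vectors[0])] + vectors

# Shift every real token index up by one to match the new table.
shifted_sequences = [[i + 1 for i in seq] for seq in sequences]

# Pad on the right to the longest sequence.
max_len = max(len(seq) for seq in shifted_sequences)
padded = [seq + [PAD_INDEX] * (max_len - len(seq)) for seq in shifted_sequences]

print(padded)  # [[3, 1], [2, 0], [0, 0]]
```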


In [None]:
# Convert tags (labels) to indices
# Sorting makes the tag-to-index mapping the same on every run
unique_tags = sorted(set(tags))

tag_to_index = {}

idx = 0
for tag in unique_tags:
    tag_to_index[tag] = idx
    idx += 1

tags_indices = []

for tag in tags:
    tags_indices.append(tag_to_index[tag])

In [None]:
# Create the Dataset and DataLoaders
dataset = TensorDataset(patterns_padded, torch.tensor(tags_indices))
dataloader = DataLoader(dataset, batch_size=64, shuffle=True)

## Model Implementation

### Model Architecture

Our chatbot model uses the following architecture:

- One embedding layer
- Two LSTM layers (stacked)
- One fully connected layer
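
To show how data flows through these three layers, here is a minimal shape-tracing sketch. The sizes are toy values and the embedding matrix is random rather than pre-trained, so it only illustrates tensor shapes, not the trained model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy sizes (assumptions for illustration only).
vocab_size, embedding_dim, hidden_dim, num_classes = 10, 4, 8, 3

# Random stand-in for the pre-trained embedding matrix.
pretrained = torch.randn(vocab_size, embedding_dim)

embedding = nn.Embedding.from_pretrained(pretrained, freeze=True)
lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers=2, batch_first=True)
fc = nn.Linear(hidden_dim, num_classes)

# Two padded index sequences: (batch=2, seq_len=5)
batch = torch.tensor([[1, 2, 3, 0, 0], [4, 5, 0, 0, 0]])

embedded = embedding(batch)                # (2, 5, 4)
outputs, (hidden, cell) = lstm(embedded)   # outputs: (2, 5, 8); hidden: (2, 2, 8)
logits = fc(hidden[-1])                    # (2, 3), one score per class

print(embedded.shape, outputs.shape, hidden.shape, logits.shape)
```

`hidden` has one entry per LSTM layer, so `hidden[-1]` selects the final hidden state of the top layer, which the fully connected layer maps to one score per intent.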

In [None]:
# Chatbot model implementation
class ChatbotModel(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, output_dim, pretrained_embedding, num_layers=2):
        super(ChatbotModel, self).__init__()
        # Pre-trained GloVe vectors, frozen so they are not updated during training
        self.embedding = nn.Embedding.from_pretrained(pretrained_embedding, freeze=True)
        # Stacked LSTM; num_layers is passed in rather than read from a global variable
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, batch_first=True, num_layers=num_layers)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = self.embedding(x)
        x, (hidden, cell) = self.lstm(x)
        # hidden[-1] is the final hidden state of the last LSTM layer
        x = self.fc(hidden[-1])
        return x

embedding_dim = 100
hidden_dim = 400
num_layers = 2
output_dim = len(tag_to_index)
pretrained_embedding = torch.FloatTensor(glove.vectors)

# Initializing the model
model = ChatbotModel(embedding_dim, hidden_dim, output_dim, pretrained_embedding, num_layers=num_layers)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

In [None]:
# Training loop
num_epochs = 150  # Number of epochs

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0

    for pattern_batch, tag_batch in dataloader:

        optimizer.zero_grad()

        outputs = model(pattern_batch)  # Forward pass
        loss = criterion(outputs, tag_batch)  # Computing loss

        loss.backward()  # Backward pass
        optimizer.step()  # Updating weights

        running_loss += loss.item() * pattern_batch.size(0)

    # Calculate average loss over the epoch
    epoch_loss = running_loss / len(dataloader.dataset)
    print(f"Epoch {epoch+1}/{num_epochs}, Loss: {epoch_loss:.4f}")

print("Training complete")

Epoch 1/150, Loss: 4.8921
Epoch 2/150, Loss: 4.2983
Epoch 3/150, Loss: 4.1823
Epoch 4/150, Loss: 4.1340
Epoch 5/150, Loss: 4.1234
Epoch 6/150, Loss: 4.1129
Epoch 7/150, Loss: 4.1084
Epoch 8/150, Loss: 4.0982
Epoch 9/150, Loss: 4.0879
Epoch 10/150, Loss: 4.0475
Epoch 11/150, Loss: 3.9997
Epoch 12/150, Loss: 3.9463
Epoch 13/150, Loss: 3.8491
Epoch 14/150, Loss: 3.7918
Epoch 15/150, Loss: 3.7528
Epoch 16/150, Loss: 3.7500
Epoch 17/150, Loss: 3.7222
Epoch 18/150, Loss: 3.7135
Epoch 19/150, Loss: 3.6999
Epoch 20/150, Loss: 3.6359
Epoch 21/150, Loss: 3.6249
Epoch 22/150, Loss: 3.6012
Epoch 23/150, Loss: 3.5682
Epoch 24/150, Loss: 3.5108
Epoch 25/150, Loss: 3.4846
Epoch 26/150, Loss: 3.4555
Epoch 27/150, Loss: 3.4028
Epoch 28/150, Loss: 3.3506
Epoch 29/150, Loss: 3.3294
Epoch 30/150, Loss: 3.2764
Epoch 31/150, Loss: 3.2309
Epoch 32/150, Loss: 3.1803
Epoch 33/150, Loss: 3.1314
Epoch 34/150, Loss: 3.0993
Epoch 35/150, Loss: 3.0821
Epoch 36/150, Loss: 3.1030
Epoch 37/150, Loss: 3.0030
Epoch 38/1

In [None]:
# Function to predict tag from input pattern
def predict_tag(model, tokenizer, vocab, text):
    model.eval()
    tokens = tokenizer(text.lower())
    indices = [vocab[token] for token in tokens if token in vocab]
    # If no token is in the vocabulary there is nothing to feed the model
    if not indices:
        return None
    tensor = torch.LongTensor(indices).unsqueeze(0)
    with torch.no_grad():
        output = model(tensor)
    _, predicted = torch.max(output, 1)
    # tag_to_index keeps insertion order, so position i holds the tag with index i
    tag = list(tag_to_index.keys())[predicted.item()]
    return tag

## Model Evaluation

We evaluate the model qualitatively, based on the responses the chatbot gives. We do not measure performance quantitatively; users can test the model by chatting with it.
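
The response step of the chat loop is simple: once a tag is predicted, one of its pre-defined responses is picked at random, and a fallback message is used when the tag is unknown. The sketch below isolates that logic. The helper `choose_response` and the toy mapping are hypothetical. The response strings and the fallback text are taken from this notebook.

```python
import random

# Toy mapping in the same shape as tag_to_response (abbreviated).
toy_tag_to_response = {
    "greeting": [
        "Hi there. What brings you here today?",
        "Hi there. How are you feeling today?",
    ],
}

FALLBACK = "I'm sorry, I don't understand."

def choose_response(tag, tag_to_response, rng=random):
    """Pick a random pre-defined response for the tag, or the fallback message."""
    responses = tag_to_response.get(tag)
    if not responses:
        return FALLBACK
    return rng.choice(responses)

print(choose_response("greeting", toy_tag_to_response))
print(choose_response("unknown-tag", toy_tag_to_response))
```

Passing a seeded `random.Random` instance as `rng` makes the choice repeatable, which can help when comparing chatbot transcripts between runs.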

In [None]:
# Testing the model
def test_model(model, tokenizer, vocab):
    print("Testing the model...")
    while True:
        text = input("You: ")
        if text.lower() == 'exit':
            print("Chatbot session ended.")
            break
        tag = predict_tag(model, tokenizer, vocab, text)
        if tag in tag_to_response:
            responses = tag_to_response[tag]
            response = random.choice(responses)
            print(f"Chatbot: {response}")
        else:
            print("Chatbot: I'm sorry, I don't understand.")

test_model(model, word_tokenize, glove.stoi)

Testing the model...
You: Hello
Chatbot: Hi there. What brings you here today?
You: What is your name?
Chatbot: You can call me Pandora.
You: Can you help me?
Chatbot: Of course. Feel free to ask me anything. I'll do my best to answer you
You: I need your help.
Chatbot: I am sorry to hear that. What is the reason behind this?
You: I have a problem.
Chatbot: Don't let the little worries bring you down. What's the worse that can happen?
You: I feel sad.
Chatbot: I am sorry to hear that. What is the reason behind this?
You: I lost my job last month.
Chatbot: Can you tell me more about this feeling?
You: I feel sad and stressed.
Chatbot: I am sorry to hear that. What is the reason behind this?
You: I think I have depression.
Chatbot: I understand that it can be scary. Tell me more about it.
You: What is depression?
Chatbot: A mental health disorder characterised by persistently depressed mood or loss of interest in activities, causing significant impairment in daily life.
You: How can I be