# Introduction

Dataset link: https://www.kaggle.com/datasets/elvis23/mental-health-conversational-data/data


Building chatbots capable of providing emotional support to individuals experiencing anxiety and depression has become a key focus in the field of artificial intelligence. A crucial component in developing such chatbots is a well-structured dataset, which serves as the foundation for training models to comprehend and respond empathetically to user messages.

The dataset available here is a comprehensive collection of conversations related to mental health. It encompasses various conversation types, including basic exchanges, frequently asked questions about mental health, classical therapy discussions, and general advice given to individuals facing anxiety and depression. The primary objective of this dataset is to facilitate the training of a chatbot model that emulates a therapist, capable of providing empathetic and supportive responses to those seeking emotional solace.

To train the model effectively, the dataset incorporates the concept of "intents." Each intent represents the underlying purpose behind a user's message. For example, if a user expresses sadness, the associated intent would be "sad." Each intent is accompanied by a set of patterns, which are example messages aligning with the specific intent, as well as corresponding responses that the chatbot should generate based on that intent. Through defining multiple intents and their respective patterns and responses, the model learns to identify user intents and generate relevant and compassionate replies.
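
As a rough illustration of that structure, the sketch below builds a single intent entry as a Python dict and looks up the tag for a message by exact pattern match. The example strings and the `find_tag` helper are made up for illustration and are not taken from the dataset; a trained model replaces the exact match with a learned classifier.

```python
# Illustrative intent entry; the example strings are hypothetical, not copied from the dataset
intents = {
    "intents": [
        {
            "tag": "sad",
            "patterns": ["I feel sad", "I am feeling down"],
            "responses": ["I'm sorry to hear that. Do you want to talk about it?"],
        }
    ]
}

def find_tag(message, intents):
    """Return the tag whose patterns contain the message (case-insensitive), else None."""
    for intent in intents["intents"]:
        if message.lower() in (p.lower() for p in intent["patterns"]):
            return intent["tag"]
    return None

print(find_tag("i feel sad", intents))
```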

By utilizing this dataset, researchers and developers can train chatbot models to better understand and support individuals coping with anxiety and depression. The goal is to create a virtual conversational agent that can offer emotional guidance, provide helpful insights, and alleviate some of the challenges faced by those seeking mental health support.

## Data Preparation

1.  Load the dataset into a suitable data structure (e.g., a Pandas DataFrame).
2.  Examine the dataset to understand its structure and distribution.
3.  Preprocess the data by removing unnecessary characters, converting text to lowercase, and handling any missing values.
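
The preprocessing step could look roughly like the sketch below. The `clean_text` helper is hypothetical, and which characters count as "unnecessary" is an assumption: here everything except ASCII letters, digits, apostrophes, and spaces is dropped, which would also strip non-English letters.

```python
import re

def clean_text(text):
    """Lowercase the text, drop unwanted characters, and collapse whitespace."""
    if text is None:  # treat a missing value as an empty string
        return ""
    text = text.lower()
    # Assumption: keep only letters, digits, apostrophes, and whitespace
    text = re.sub(r"[^a-z0-9'\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("  Is anyone THERE?? "))
```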

## Imports

In [9]:
import kagglehub
import pandas as pd
import plotly.graph_objects as go
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.metrics import classification_report

## Loading the data

In [2]:
# Download latest version
path = kagglehub.dataset_download("elvis23/mental-health-conversational-data")

print("Path to dataset files:", path)

Using Colab cache for faster access to the 'mental-health-conversational-data' dataset.
Path to dataset files: /kaggle/input/mental-health-conversational-data


In [3]:
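import json
import os

# Build the file path from the download location instead of hard-coding it
with open(os.path.join(path, 'intents.json'), 'r') as f:
    data = json.load(f)

# Each entry under 'intents' becomes one row: tag, patterns, responses
df = pd.DataFrame(data['intents'])
df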

|     | tag | patterns | responses |
| --- | --- | --- | --- |
| 0 | greeting | [Hi, Hey, Is anyone there?, Hi there, Hello, H... | [Hello there. Tell me how are you feeling toda... |
| 1 | morning | [Good morning] | [Good morning. I hope you had a good night's s... |
| 2 | afternoon | [Good afternoon] | [Good afternoon. How is your day going?] |
| 3 | evening | [Good evening] | [Good evening. How has your day been?] |
| 4 | night | [Good night] | [Good night. Get some proper sleep, Good night... |
| ... | ... | ... | ... |
| 75 | fact-28 | [What do I do if I'm worried about my mental h... | [The most important thing is to talk to someon... |
| 76 | fact-29 | [How do I know if I'm unwell?] | [If your beliefs , thoughts , feelings or beha... |
| 77 | fact-30 | [How can I maintain social connections? What i... | [A lot of people are alone right now, but we d... |
| 78 | fact-31 | [What's the difference between anxiety and str... | [Stress and anxiety are often used interchange... |


## Exploring the data

The code below expands the intents DataFrame so that every pattern gets its own row, paired with its tag and the full list of responses for that tag. The result is a training-ready DataFrame with one example message per row.

In [5]:
# Collect one entry per pattern, keeping the tag and response list alongside it
dic = {"tag": [], "patterns": [], "responses": []}
for _, row in df.iterrows():
    for pattern in row['patterns']:
        dic['tag'].append(row['tag'])
        dic['patterns'].append(pattern)
        dic['responses'].append(row['responses'])

df = pd.DataFrame.from_dict(dic)
df

|     | tag | patterns | responses |
| --- | --- | --- | --- |
| 0 | greeting | Hi | [Hello there. Tell me how are you feeling toda... |
| 1 | greeting | Hey | [Hello there. Tell me how are you feeling toda... |
| 2 | greeting | Is anyone there? | [Hello there. Tell me how are you feeling toda... |
| 3 | greeting | Hi there | [Hello there. Tell me how are you feeling toda... |
| 4 | greeting | Hello | [Hello there. Tell me how are you feeling toda... |
| ... | ... | ... | ... |
| 227 | fact-29 | How do I know if I'm unwell? | [If your beliefs , thoughts , feelings or beha... |
| 228 | fact-30 | How can I maintain social connections? What if... | [A lot of people are alone right now, but we d... |
| 229 | fact-31 | What's the difference between anxiety and stress? | [Stress and anxiety are often used interchange... |
| 230 | fact-32 | What's the difference between sadness and depr... | [Sadness is a normal reaction to a loss, disap... |
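
As an aside, pandas can do the same per-pattern expansion with `DataFrame.explode`. The sketch below uses a tiny stand-in DataFrame with hypothetical rows rather than the real dataset, so it only shows the shape of the result.

```python
import pandas as pd

# Tiny stand-in for the intents DataFrame; the rows are hypothetical
raw = pd.DataFrame({
    "tag": ["greeting", "sad"],
    "patterns": [["Hi", "Hello"], ["I feel sad"]],
    "responses": [["Hello there."], ["I'm here for you."]],
})

# explode() emits one row per list element and repeats the other columns
flat = raw.explode("patterns").reset_index(drop=True)
print(flat)
```

Each pattern ends up on its own row, and the `responses` column still holds the full list for that tag.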


In [6]:
df['tag'].unique()

array(['greeting', 'morning', 'afternoon', 'evening', 'night', 'goodbye',
       'thanks', 'no-response', 'neutral-response', 'about', 'skill',
       'creation', 'name', 'help', 'sad', 'stressed', 'worthless',
       'depressed', 'happy', 'casual', 'anxious', 'not-talking', 'sleep',
       'scared', 'death', 'understand', 'done', 'suicide', 'hate-you',
       'hate-me', 'default', 'jokes', 'repeat', 'wrong', 'stupid',
       'location', 'something-else', 'friends', 'ask', 'problem',
       'no-approach', 'learn-more', 'user-agree', 'meditation',
       'user-meditation', 'pandora-useful', 'user-advice',
       'learn-mental-health', 'mental-health-fact', 'fact-1', 'fact-2',
       'fact-3', 'fact-5', 'fact-6', 'fact-7', 'fact-8', 'fact-9',
       'fact-10', 'fact-11', 'fact-12', 'fact-13', 'fact-14', 'fact-15',
       'fact-16', 'fact-17', 'fact-18', 'fact-19', 'fact-20', 'fact-21',
       'fact-22', 'fact-23', 'fact-24', 'fact-25', 'fact-26', 'fact-27',
       'fact-28', 'fact-29', '

Next, analyze the distribution of intents in the dataset. The bar plot below, built with Plotly, shows the intents on the x-axis and the number of patterns for each intent on the y-axis 👇

In [8]:
intent_counts = df['tag'].value_counts()
fig = go.Figure(data=[go.Bar(x=intent_counts.index, y=intent_counts.values)])
fig.update_layout(title='Distribution of Intents', xaxis_title='Intents', yaxis_title='Count')
fig.show()

In [12]:
df.columns

Index(['tag', 'patterns', 'responses'], dtype='object')

## Training and building the model

In [13]:
X = df["patterns"]
y = df["tag"]

In [15]:
# Hold out 25% of the patterns for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    train_size=0.75,
    random_state=42
)

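🔹 What is TfidfVectorizer?

TF–IDF stands for Term Frequency – Inverse Document Frequency. It measures how important a word is in a document compared to the entire dataset.

`TfidfVectorizer` is a class in scikit-learn (`sklearn.feature_extraction.text`) that converts a collection of text documents into a matrix of numerical features based on TF–IDF values.

The inverse document frequency of a term $t$ is

$$\mathrm{IDF}(t) = \log\left(\frac{N}{1 + \mathrm{df}(t)}\right)$$

where $N$ is the total number of documents and $\mathrm{df}(t)$ is the number of documents that contain $t$.

Vectorize the text data using TF-IDF 👇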

In [16]:
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)
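
To make the IDF idea concrete, here is a small worked sketch of the textbook form IDF(t) = log(N / (1 + df(t))) on a few hypothetical messages. The `idf` helper is illustrative only. `TfidfVectorizer` will produce different numbers, because by default it uses a smoothed IDF variant and L2-normalizes each row.

```python
import math

def idf(term, documents):
    """IDF(t) = log(N / (1 + df(t))), where df(t) counts documents containing the term."""
    n_docs = len(documents)
    doc_freq = sum(1 for doc in documents if term in doc.lower().split())
    return math.log(n_docs / (1 + doc_freq))

# Hypothetical messages, not taken from the dataset
docs = ["i feel sad", "i feel anxious", "i feel stressed", "good morning"]

print(idf("feel", docs))     # appears in 3 of 4 documents: log(4 / 4) = 0.0
print(idf("morning", docs))  # appears in 1 of 4 documents: log(4 / 2), about 0.693
```

A word that shows up in almost every message carries little weight, while a rarer word gets a higher score.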


🔹 What is SVC?

SVC stands for Support Vector Classifier.

It comes from Support Vector Machines (SVMs), a powerful supervised machine learning algorithm.

SVC is used for classification problems (binary or multi-class).

🔹 How SVM works (intuition)

SVM tries to find the best boundary (hyperplane) that separates different classes in your data.

It maximizes the margin (distance between the boundary and the nearest data points from each class, called support vectors).

If data isn’t linearly separable, it can use a kernel trick to map data into a higher-dimensional space where separation is possible. `SVC()` with no arguments uses the RBF kernel by default.

Train a Support Vector Machine (SVM) classifier 👇

In [17]:
model = SVC()
model.fit(X_train_vec, y_train)
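
The idea of mapping data into a higher-dimensional space can be shown with a toy sketch. The XOR-style points below cannot be split by a straight line in two dimensions, but adding the product of the two coordinates as a third feature makes a flat boundary possible. Note the assumptions: the feature map is written out explicitly and the hyperplane is hand-picked for illustration, whereas a real kernel computes similarities without building the mapped features, and `SVC` learns its boundary from data.

```python
# XOR-style data: no straight line in (x1, x2) separates the two classes
points = [(0, 0), (1, 1), (0, 1), (1, 0)]
labels = [0, 0, 1, 1]

def feature_map(x1, x2):
    """Map a 2-D point to 3-D by adding the product x1 * x2 as a third coordinate."""
    return (x1, x2, x1 * x2)

def linear_rule(z):
    """Hand-picked hyperplane in the mapped space: z1 + z2 - 2*z3 - 0.5 > 0 means class 1."""
    z1, z2, z3 = z
    return 1 if z1 + z2 - 2 * z3 - 0.5 > 0 else 0

predictions = [linear_rule(feature_map(x1, x2)) for x1, x2 in points]
print(predictions)
```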

In [18]:
y_pred = model.predict(X_test_vec)

## Evaluate the model's performance

In [19]:
report = classification_report(y_test, y_pred, output_dict=True, zero_division=0)

In [20]:
# Keep only the entries whose value is a dict of metrics; this drops the scalar 'accuracy' entry
report = {label: dict(metrics) for label, metrics in report.items() if isinstance(metrics, dict)}

In [21]:
labels = list(report.keys())
evaluation_metrics = ['precision', 'recall', 'f1-score']
# For each metric, collect its score for every label in the same order as `labels`
metric_scores = {metric: [report[label][metric] for label in labels] for metric in evaluation_metrics}

In [22]:
fig = go.Figure()
for metric in evaluation_metrics:
    fig.add_trace(go.Bar(name=metric, x=labels, y=metric_scores[metric]))

fig.update_layout(title='Intent Prediction Model Performance',
                  xaxis_title='Intent',
                  yaxis_title='Score',
                  barmode='group')

fig.show()
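
For readers unfamiliar with the three plotted metrics, the sketch below computes them by hand for one label on a few hypothetical predictions. The `precision_recall_f1` helper is illustrative and is not part of scikit-learn. Returning 0 when a denominator is empty mirrors the `zero_division=0` setting used above.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Compute precision, recall, and F1 for a single label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical true and predicted intents
y_true = ["sad", "sad", "greeting", "sad"]
y_pred = ["sad", "greeting", "greeting", "sad"]

print(precision_recall_f1(y_true, y_pred, "sad"))
```

Here every message predicted as "sad" really was "sad" (precision 1.0), but one of the three "sad" messages was missed (recall 2/3).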

## Model Deployment

1.  Once satisfied with the model's performance, deploy the intent prediction model in a chatbot framework.
2.  Use the trained model to predict intents from user input in real time.
3.  Implement a response generation mechanism that returns relevant and empathetic responses based on the predicted intents.
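
One possible way to package the model for deployment is to bundle the vectorizer and classifier into a single scikit-learn pipeline and serialize it. This is a minimal sketch under assumptions: the toy training data is hypothetical, and `pickle` is only one of several serialization options.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy training data standing in for the real patterns and tags
texts = ["hello there", "good morning", "i feel sad", "i am so sad today"]
tags = ["greeting", "greeting", "sad", "sad"]

# Bundling the vectorizer and classifier keeps them in sync at prediction time
pipeline = make_pipeline(TfidfVectorizer(), SVC())
pipeline.fit(texts, tags)

# Serialize to bytes and restore, as a deployed service might do from a file
restored = pickle.loads(pickle.dumps(pipeline))
print(restored.predict(["feeling sad"]))
```

With a pipeline, the serving code can pass raw text straight to `predict` without calling the vectorizer separately.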

# Interface to chat with the model

In [24]:
import ipywidgets as widgets
from IPython.display import display
import random

def get_response(tag):
    """Finds a response for a given tag."""
    responses = df[df['tag'] == tag]['responses'].values
    # Every row for a tag carries the same response list, so the first row is enough
    if len(responses) > 0:
        # Select a random response from the list
        return random.choice(responses[0])
    return "I'm not sure how to respond to that."

def predict_intent(text):
    """Predicts the intent of the input text and returns a response."""
    text_vec = vectorizer.transform([text])
    predicted_tag = model.predict(text_vec)[0]
    response = get_response(predicted_tag)
    return f"Predicted Intent: {predicted_tag}\nResponse: {response}"

# Create an input text area
text_input = widgets.Textarea(
    value='',
    placeholder='Enter your message here...',
    description='Your Message:',
    disabled=False,
    layout=widgets.Layout(width='50%', height='100px')
)

# Create an output area
output_area = widgets.Output()

# Function to handle button click
def on_button_click(b):
    with output_area:
        output_area.clear_output()
        if text_input.value:
            result = predict_intent(text_input.value)
            print(result)
        else:
            print("Please enter a message.")

# Create a button
button = widgets.Button(description="Get Response")
button.on_click(on_button_click)

# Display the widgets
display(text_input, button, output_area)


**How to use the interface:**

1.  Type your message in the text box above.
2.  Click the "Get Response" button.
3.  The predicted intent and a corresponding response will appear below the button.

## Conclusion

Please upvote if you learned even a little 🥰