# Smart Conversational Agent for Mental Health Support

## Design Approach: Rule-Based

### Overview

This project aims to create a smart conversational agent designed to provide mental health support. The agent is built using a rule-based approach, allowing it to offer empathetic conversations, coping strategies, and resources for users experiencing stress, anxiety, or depression. Additionally, the agent can guide users to professional help if needed.

### Table of Contents

1. [Introduction](#introduction)
2. [Objectives](#objectives)
3. [Technologies Used](#technologies-used)
4. [Rule-Based Design](#rule-based-design)
5. [Implementation](#implementation)
6. [Dataset and Preprocessing](#dataset-and-preprocessing)
    1. [Text Normalization](#text-normalization)
    2. [Cosine Similarity Calculation](#cosine-similarity-calculation)
    3. [TF-IDF](#tf-idf)
7. [Testing Chatbot](#testing-chatbot)
8. [Conclusion](#conclusion)

### Introduction

Mental health is a crucial aspect of overall well-being, and many people experience stress, anxiety, or depression at some point in their lives. While professional help is essential, immediate support and coping strategies can be beneficial. This project aims to bridge that gap by providing a conversational agent capable of offering support and guidance.

### Objectives

- To provide empathetic conversation to users experiencing mental health issues.
- To offer coping strategies and resources.
- To guide users to professional help when necessary.

### Technologies Used

- Python
- Natural Language Processing (NLP)
- Scikit-learn
- Pandas
- NLTK (Natural Language Toolkit)

### Rule-Based Design

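The agent uses a rule-based approach: responses are generated from predefined rules and patterns rather than learned by a model. In this implementation, the rule is a lookup: the user's query is compared with a fixed set of questions, and the stored answer to the closest question is returned. The agent can therefore only handle queries that resemble a question in the dataset.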
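
As an illustration of predefined rules and patterns, the sketch below checks a message against a few regular-expression rules and returns a canned reply for the first one that matches. The rule list, the reply texts, and the `match_rule` helper are hypothetical and are not part of this project's code.

```python
import re

# Hypothetical rules: (compiled pattern, canned reply). Not taken from the project.
# Rules are checked in order, so the most urgent pattern comes first.
RULES = [
    (re.compile(r"\b(suicide|kill myself|end my life)\b"),
     "I'm really sorry you're feeling this way. Please contact a local "
     "emergency service or a crisis line right now."),
    (re.compile(r"\b(hello|hi|hey)\b"),
     "Hello! How are you feeling today?"),
]


def match_rule(message):
    """Return the reply of the first rule whose pattern matches, else None."""
    text = message.lower()
    for pattern, reply in RULES:
        if pattern.search(text):
            return reply
    return None
```

A caller could try `match_rule` first and fall back to the similarity-based lookup only when it returns `None`.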

### Implementation

In [29]:
import pandas as pd
import nltk 
import numpy as np
import re
from nltk.stem import wordnet                                  # to perform lemmatization
from sklearn.feature_extraction.text import CountVectorizer    # to perform bow
from sklearn.feature_extraction.text import TfidfVectorizer    # to perform tfidf
from nltk import pos_tag                                       # for parts of speech
from sklearn.metrics import pairwise_distances                 # to perform cosine similarity
from nltk import word_tokenize                                 # to create tokens
from nltk.corpus import stopwords                              # for stop words

In [30]:
path_to_csv = 'Dataset/mentalhealth.csv'
df = pd.read_csv(path_to_csv, nrows = 20)
df.head()

| | Question_ID | Questions | Answers |
|---|---|---|---|
| 0 | 1590140 | What does it mean to have a mental illness? | Mental illnesses are health conditions that di... |
| 1 | 2110618 | Who does mental illness affect? | Mental illness does can affect anyone, regardl... |
| 2 | 9434130 | What are some of the warning signs of mental i... | Symptoms of mental health disorders vary depen... |
| 3 | 7657263 | Can people with mental illness recover? | When healing from mental illness, early identi... |
| 4 | 1619387 | What should I do if I know someone who appears... | We encourage those with symptoms to talk to th... |


In [31]:
df.isnull().sum()

Question_ID    0
Questions      0
Answers        0
dtype: int64

### Dataset and Preprocessing

For this project, we will use a predefined set of questions and answers related to mental health support. The data will be preprocessed using text normalization techniques to ensure consistency and accuracy in responses.

In [49]:
sample_text="Ahmad Makki is working very hard on nlp project"
s=word_tokenize(sample_text)
s

['Ahmad', 'Makki', 'is', 'working', 'very', 'hard', 'on', 'nlp', 'project']

In [50]:
lemma=wordnet.WordNetLemmatizer()
lemma.lemmatize('booked',pos='v')

'book'

In [54]:
pos_tag(nltk.word_tokenize(sample_text))

[('Ahmad', 'NNP'),
 ('Makki', 'NNP'),
 ('is', 'VBZ'),
 ('working', 'VBG'),
 ('very', 'RB'),
 ('hard', 'RB'),
 ('on', 'IN'),
 ('nlp', 'NNS'),
 ('project', 'NN')]

In [64]:
stop=stopwords.words('english')
print(stop)

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', ...]

#### Text Normalization

Text normalization involves converting text to a standard format, including lowercasing, removing punctuation, and lemmatization.
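
The first two steps can be sketched with the standard library alone. This is a minimal illustration rather than the project's function: `basic_normalize` is a hypothetical helper, and lemmatization is left out because it needs NLTK's WordNet data.

```python
import re


def basic_normalize(text):
    """Lowercase the text and drop every character except a-z and spaces."""
    lowered = str(text).lower()
    letters_only = re.sub(r"[^ a-z]", "", lowered)
    # split() with no arguments also collapses repeated spaces
    return " ".join(letters_only.split())


print(basic_normalize("I'm in Depression!"))  # im in depression
```

Removing punctuation this way also deletes apostrophes, so "I'm" becomes the single token "im" rather than two words.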

```python
def text_normalization(text):
    # Placeholder only: the full implementation is defined in the next cell
    return text.lower()
```

In [65]:
# function that performs text normalization steps and returns the lemmatized tokens as a sentence

def text_normalization(text):
    text = str(text).lower()                        # text to lower case
    spl_char_text = re.sub(r'[^ a-z]','',text)      # removing special characters
    tokens = nltk.word_tokenize(spl_char_text)      # word tokenizing
    lema = wordnet.WordNetLemmatizer()              # initializing lemmatization
    tags_list = pos_tag(tokens,tagset=None)         # parts of speech
    lema_words = []                                 # empty list 
    for token,pos_token in tags_list:               # lemmatize according to POS
        if pos_token.startswith('V'):               # Verb
            pos_val = 'v'
        elif pos_token.startswith('J'):             # Adjective
            pos_val = 'a'
        elif pos_token.startswith('R'):             # Adverb
            pos_val = 'r'
        else:
            pos_val = 'n'                           # Noun
            
        lema_token = lema.lemmatize(token,pos_val)

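        # As written, this keeps only stop words, which is why the outputs below
        # contain words such as 'be', 'very', and 'on'. Use `not in stop` to drop
        # stop words and keep the content words instead.
        if lema_token in stop:
            lema_words.append(lema_token)           # appending the lemmatized token into a list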
    
    return " ".join(lema_words) 

In [74]:
text_normalization(sample_text)  # example

'be very on'

In [76]:
df['lemmatized_text']=df['Questions'].apply(text_normalization)
df.head(4)

| | Question_ID | Questions | Answers | lemmatized_text |
|---|---|---|---|---|
| 0 | 1590140 | What does it mean to have a mental illness? | Mental illnesses are health conditions that di... | what do it to have a |
| 1 | 2110618 | Who does mental illness affect? | Mental illness does can affect anyone, regardl... | who do |
| 2 | 9434130 | What are some of the warning signs of mental i... | Symptoms of mental health disorders vary depen... | what be some of the of |
| 3 | 7657263 | Can people with mental illness recover? | When healing from mental illness, early identi... | can with |


In [81]:
cv=CountVectorizer()
X=cv.fit_transform(df['lemmatized_text']).toarray()

In [97]:
features=cv.get_feature_names_out()
df_bot=pd.DataFrame(X,columns=features)
df_bot.head(4)

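| | about | after | and | be | before | between | can | do | for | have | ... | or | should | some | the | this | to | what | where | who | with |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |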


In [98]:
Question = 'What treatment options are available'                           # example
Question_lemma = text_normalization(Question)                               # clean text
Question_bot = cv.transform([Question_lemma]).toarray()                     # applying bow
Question_bot

array([[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 1, 0, 0, 0]], dtype=int64)

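#### Cosine Similarity Calculation

Cosine similarity measures the angle between two vectors: a score of 1 means they point in the same direction, and for word-count vectors a score of 0 means they share no terms. Here it scores the user's query vector against every question in the dataset, and the answer to the highest-scoring question is returned. `pairwise_distances` with `metric='cosine'` returns the cosine distance, so the similarity is 1 minus that value.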

In [100]:
cosine_sim=1-pairwise_distances(df_bot,Question_bot,metric='cosine')
cosine_sim

array([[0.31622777],
       [0.        ],
       [0.5       ],
       [0.        ],
       [0.23570226],
       [0.        ],
       [1.        ],
       [0.31622777],
       [0.70710678],
       [0.        ],
       [0.31622777],
       [0.        ],
       [0.40824829],
       [0.25      ],
       [0.        ],
       [0.        ],
       [0.        ],
       [0.70710678],
       [0.        ],
       [0.        ]])

In [101]:
df['Similarity']=cosine_sim

In [105]:
simi_score=pd.DataFrame(df,columns=['Answers','Similarity'])
simi_score.head()

| | Answers | Similarity |
|---|---|---|
| 0 | Mental illnesses are health conditions that di... | 0.316228 |
| 1 | Mental illness does can affect anyone, regardl... | 0.0 |
| 2 | Symptoms of mental health disorders vary depen... | 0.5 |
| 3 | When healing from mental illness, early identi... | 0.0 |
| 4 | We encourage those with symptoms to talk to th... | 0.235702 |


In [107]:
simi_score_Decending=simi_score.sort_values(by='Similarity',ascending=False)
simi_score_Decending.head()

| | Answers | Similarity |
|---|---|---|
| 6 | Different treatment options are available for ... | 1.0 |
| 17 | There are many types of mental health professi... | 0.707107 |
| 8 | There are many types of mental health professi... | 0.707107 |
| 2 | Symptoms of mental health disorders vary depen... | 0.5 |
| 12 | The best source of information regarding medic... | 0.408248 |


In [109]:
threshold=0.1                                                               # keep only answers with similarity above this value
df_threshold=simi_score_Decending[simi_score_Decending['Similarity']>threshold]
df_threshold.head()

| | Answers | Similarity |
|---|---|---|
| 6 | Different treatment options are available for ... | 1.0 |
| 17 | There are many types of mental health professi... | 0.707107 |
| 8 | There are many types of mental health professi... | 0.707107 |
| 2 | Symptoms of mental health disorders vary depen... | 0.5 |
| 12 | The best source of information regarding medic... | 0.408248 |


In [111]:
index_value=cosine_sim.argmax()
index_value

6

In [113]:
df['Answers'].loc[index_value]

'Different treatment options are available for individuals with mental illness.'

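#### TF-IDF

TF-IDF weights each term by how rare it is across the questions, so words that appear in many questions contribute less to the similarity score than they do in the bag-of-words model.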

In [135]:
Question1 = 'What treatment options are available'

In [136]:
# using tf-idf

tfidf = TfidfVectorizer()                                             # initializing tf-idf
x_tfidf = tfidf.fit_transform(df['lemmatized_text']).toarray()        # fitting on the questions and converting the result to an array

In [137]:
Question_lemma1 = text_normalization(Question1)
Question_tfidf = tfidf.transform([Question_lemma1]).toarray()         # applying tf-idf

In [138]:
# one column per unique word in the data, holding that word's tf-idf score for each question

df_tfidf = pd.DataFrame(x_tfidf,columns = tfidf.get_feature_names_out()) 
df_tfidf.head()

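| | about | after | and | be | before | between | can | do | for | have | ... | or | should | some | the | this | to | what | where | who | with |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.392029 | 0.0 | 0.550307 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.367085 | 0.325401 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.580211 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.814466 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.321859 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.478821 | 0.347908 | 0.0 | 0.0 | 0.248876 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.440977 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.897519 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.282658 | 0.0 | 0.396779 | ... | 0.0 | 0.396779 | 0.0 | 0.327977 | 0.0 | 0.264673 | 0.234618 | 0.0 | 0.396779 | 0.0 |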


In [139]:
cos = 1-pairwise_distances(df_tfidf,Question_tfidf,metric='cosine')                     # applying cosine similarity
cos

array([[0.19904882],
       [0.        ],
       [0.40685646],
       [0.        ],
       [0.14351684],
       [0.        ],
       [1.        ],
       [0.20934186],
       [0.56647821],
       [0.        ],
       [0.20934186],
       [0.        ],
       [0.22245129],
       [0.22913146],
       [0.        ],
       [0.        ],
       [0.        ],
       [0.6372619 ],
       [0.        ],
       [0.        ]])

In [141]:
df['similarity_tfidf'] = cos                                                    # creating a new column 
df_simi_tfidf = pd.DataFrame(df, columns=['Answers','similarity_tfidf'])        # taking similarity value of responses for the question we took
df_simi_tfidf.head()

| | Answers | similarity_tfidf |
|---|---|---|
| 0 | Mental illnesses are health conditions that di... | 0.199049 |
| 1 | Mental illness does can affect anyone, regardl... | 0.0 |
| 2 | Symptoms of mental health disorders vary depen... | 0.406856 |
| 3 | When healing from mental illness, early identi... | 0.0 |
| 4 | We encourage those with symptoms to talk to th... | 0.143517 |


In [143]:
df_simi_tfidf_sort = df_simi_tfidf.sort_values(by='similarity_tfidf', ascending=False)            # sorting the values
df_simi_tfidf_sort.head()

| | Answers | similarity_tfidf |
|---|---|---|
| 6 | Different treatment options are available for ... | 1.0 |
| 17 | There are many types of mental health professi... | 0.637262 |
| 8 | There are many types of mental health professi... | 0.566478 |
| 2 | Symptoms of mental health disorders vary depen... | 0.406856 |
| 13 | Create a plan for switching to a different tre... | 0.229131 |


In [145]:
threshold = 0.1                                                                                   # keep only answers with similarity greater than 0.1
df_threshold = df_simi_tfidf_sort[df_simi_tfidf_sort['similarity_tfidf'] > threshold] 
df_threshold.head()

| | Answers | similarity_tfidf |
|---|---|---|
| 6 | Different treatment options are available for ... | 1.0 |
| 17 | There are many types of mental health professi... | 0.637262 |
| 8 | There are many types of mental health professi... | 0.566478 |
| 2 | Symptoms of mental health disorders vary depen... | 0.406856 |
| 13 | Create a plan for switching to a different tre... | 0.229131 |


In [146]:
index_value1 = cos.argmax()                                                   # returns the index number of highest value
index_value1

6

In [147]:
df['Answers'].loc[index_value1]                                               # returns the text at that index

'Different treatment options are available for individuals with mental illness.'

### Testing Chatbot


In [152]:
# defining a function that returns response to query using bow

def chat_bow(text):
    lemma = text_normalization(text) # calling the function to perform text normalization
    bow = cv.transform([lemma]).toarray() # applying bow
    cosine_value = 1- pairwise_distances(df_bot,bow, metric = 'cosine' )
    index_value = cosine_value.argmax() # getting index value 
    return df['Answers'].loc[index_value]

In [153]:
chat_bow('can you prevent mental health problems')

'When healing from mental illness, early identification and treatment are of vital importance. '

In [155]:
chat_bow("I'm in depression")

'It is important to be as involved and engaged in the treatment process as possible.'

In [156]:
# Chatbot function
def chatbot():
    print("Hello! I'm a chatbot. Ask me anything.")
    while True:
        user_input = input("You: ")
        if user_input.lower() in ['exit', 'quit', 'bye']:
            print("Chatbot: Goodbye!")
            break
        response = chat_bow(user_input)
        print("Chatbot:", response)

# Running the chatbot
chatbot()

Hello! I'm a chatbot. Ask me anything.

You: I'm in depression

Chatbot: It is important to be as involved and engaged in the treatment process as possible.

You: i want medical assistance

Chatbot: Mental illnesses are health conditions that disrupt a person's thoughts, emotions, relationships, and daily functioning.

You: where i can get medical assistance?

Chatbot: family member, friend, clergy, healthcare provider, or other professionals

You: i wish i can do suicide

Chatbot: Mental illness does can affect anyone, regardless of gender, age, income, social status, ethnicity, religion, sexual orientation, or background.

You: bye

Chatbot: Goodbye!


In [157]:
# defining a function that returns response to query using tf-idf

def chat_tfidf(text):
    lemma = text_normalization(text) # calling the function to perform text normalization
    tf = tfidf.transform([lemma]).toarray() # applying tf-idf
    cos = 1-pairwise_distances(df_tfidf,tf,metric='cosine') # applying cosine similarity
    index_value = cos.argmax() # getting index value 
    return df['Answers'].loc[index_value]

In [160]:
# Chatbot function
def chatbot():
    print("Hello! I'm a chatbot. Ask me anything.")
    while True:
        user_input = input("You: ")
        if user_input.lower() in ['exit', 'quit', 'bye']:
            print("Chatbot: Goodbye!")
            break
        response = chat_tfidf(user_input)
        print("Chatbot:", response)

# Running the chatbot
chatbot()

Hello! I'm a chatbot. Ask me anything.

You: i think i need medical assistance

Chatbot: Mental illnesses are health conditions that disrupt a person's thoughts, emotions, relationships, and daily functioning.

You: how do i see a counsellor

Chatbot: Visit Healthfinder.gov to learn more.

You: exit

Chatbot: Goodbye!


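### Conclusion

This smart conversational agent aims to provide immediate support for mental health issues through empathetic conversation and coping strategies. It does so by matching each query against a fixed set of questions, using bag-of-words or TF-IDF vectors with cosine similarity, and returning the stored answer. It is not a replacement for professional help, but it can serve as a first point of contact for users seeking immediate assistance.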