# Final Project: Simple Mental Health Chat-Bot

## Team Muggles - Notebook

In [9]:
import pandas as pd
import nltk 
import numpy as np
import re

from nltk.stem import wordnet                                  # to perform lemmatization
from sklearn.feature_extraction.text import CountVectorizer    # to build bag-of-words vectors
from sklearn.feature_extraction.text import TfidfVectorizer    # to build TF-IDF vectors
from nltk import pos_tag                                       # for part-of-speech tagging
from sklearn.metrics import pairwise_distances                 # to compute cosine distance
from nltk import word_tokenize                                 # to create tokens
from nltk.corpus import stopwords                              # for stop words

In [10]:
# Load the dataset
df = pd.read_csv(r"D:\Third Semester\Software Technologies\Mental_Health_FAQ_1.csv")  # raw string so the backslashes are not treated as escapes
df.head()

Unnamed: 0,Question_ID,Questions,Answers
0,1590140,What does it mean to have a mental illness?,Mental illnesses are health conditions that di...
1,2110618,Who does mental illness affect?,It is estimated that mental illness affects 1 ...
2,6361820,What causes mental illness?,It is estimated that mental illness affects 1 ...
3,9434130,What are some of the warning signs of mental i...,Symptoms of mental health disorders vary depen...
4,7657263,Can people with mental illness recover?,"When healing from mental illness, early identi..."


In [11]:
df.isnull().sum()

Question_ID    0
Questions      0
Answers        0
dtype: int64

In [20]:
# Clean and normalize a piece of text from the dataset
def preprocess_text(text):

    # Replace punctuation and non-ASCII characters with spaces
    text = re.sub('[%s]' % re.escape(r"""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""), ' ', text)
    text = re.sub(r'[^\x00-\x7f]', ' ', text)

    # Collapse runs of whitespace into a single space
    text = re.sub(r'\s+', ' ', text)

    # Lemmatizer to reduce words to their dictionary base form
    lemmatizer = wordnet.WordNetLemmatizer()

    # Stemmer applied after lemmatization (e.g. "illness" becomes "ill")
    stemmer = nltk.stem.porter.PorterStemmer()

    # List of English stop words
    stopwords_lst = stopwords.words('english')

    # Keep letters only
    text = re.sub('[^a-zA-Z]', ' ', text)

    # Lowercase and tokenize on whitespace
    tokens = text.lower().split()

    # Remove stop words
    meaningful_words = [w for w in tokens if w not in stopwords_lst]

    # Drop tokens of three characters or fewer
    meaningful_words = [w for w in meaningful_words if len(w) > 3]

    # Lemmatize, then stem, each remaining token
    lemmitize_words = [stemmer.stem(lemmatizer.lemmatize(w)) for w in meaningful_words]
    # lemmitize_words = [lemmatizer.lemmatize(w) for w in meaningful_words]  # alternative: lemmatization only

    return ' '.join(lemmitize_words)

In [21]:
df['lemmatized_questions'] = df['Questions'].apply(preprocess_text)   # clean text
df.head(5)

Unnamed: 0,Question_ID,Questions,Answers,lemmatized_questions
0,1590140,What does it mean to have a mental illness?,Mental illnesses are health conditions that di...,mean mental ill
1,2110618,Who does mental illness affect?,It is estimated that mental illness affects 1 ...,mental ill affect
2,6361820,What causes mental illness?,It is estimated that mental illness affects 1 ...,caus mental ill
3,9434130,What are some of the warning signs of mental i...,Symptoms of mental health disorders vary depen...,warn sign mental ill
4,7657263,Can people with mental illness recover?,"When healing from mental illness, early identi...",peopl mental ill recov


## Bag-of-Words

In [24]:
cv = CountVectorizer()                                  # initialize the count vectorizer
X_bow = cv.fit_transform(df['lemmatized_questions']).toarray()

In [26]:
# get the vocabulary (all unique tokens) learned from the data
features = cv.get_feature_names_out()
df_bow = pd.DataFrame(X_bow, columns = features)
df_bow.head()

Unnamed: 0,addict,adhd,adult,advanc,affect,alcohol,allow,antidepress,antisoci,anxieti,...,treatment,trial,type,unwel,vape,warn,work,worri,young,youth
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
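
To make the table above concrete, here is a minimal pure-Python sketch of what a count vectorizer computes: a sorted vocabulary and one row of term counts per document. The three toy strings are taken from the `lemmatized_questions` column shown earlier. The helper name `bag_of_words` is hypothetical and not part of the notebook, and scikit-learn's `CountVectorizer` applies its own tokenization rules that this sketch does not reproduce.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts vocabulary[j] in docs[i].

    Hypothetical helper for illustration; tokens are split on whitespace only.
    """
    tokenized = [doc.split() for doc in docs]
    # Sorted vocabulary of every distinct token across all documents
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        # Counter returns 0 for terms that do not occur in this document
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


docs = ["mean mental ill", "mental ill affect", "caus mental ill"]
vocab, X = bag_of_words(docs)
print(vocab)
for row in X:
    print(row)
```

Each column corresponds to one vocabulary term, so a row is mostly zeros once the vocabulary grows, which is what the `df_bow` preview shows.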


In [27]:
example_bow = 'What treatment options are available'  

In [29]:
question_lemma = preprocess_text(example_bow)                               # clean text
question_bow = cv.transform([question_lemma]).toarray() 

In [30]:
# cosine similarity between every FAQ question and the example question (1 - cosine distance)
cosine_value = 1 - pairwise_distances(df_bow, question_bow, metric='cosine')
print(cosine_value)

[[0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [1.        ]
 [0.25819889]
 [0.        ]
 [0.        ]
 [0.25819889]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.25819889]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.  
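
The scores above can be read with the cosine formula in mind: the dot product of two vectors divided by the product of their lengths. The sketch below is a pure-Python illustration on made-up count vectors, not the notebook's data; `cosine_similarity` here is a hypothetical helper, and returning 0.0 for an all-zero vector is a choice of this sketch. The `partial` vector shows one combination (one shared term, three query terms, five document terms) that yields 1/sqrt(15), the same value as the 0.25819889 rows; the actual rows were not inspected, so this is only a plausible reading.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption of this sketch: a vector with no terms matches nothing
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical count vectors over an 8-term vocabulary
query = [1, 1, 1, 0, 0, 0, 0, 0]      # three distinct terms
same = [1, 1, 1, 0, 0, 0, 0, 0]       # identical terms
partial = [1, 0, 0, 1, 1, 1, 1, 0]    # five terms, one shared with the query
unrelated = [0, 0, 0, 0, 0, 0, 0, 1]  # no shared terms

print(round(cosine_similarity(query, same), 8))
print(round(cosine_similarity(query, partial), 8))  # 0.25819889
print(cosine_similarity(query, unrelated))          # 0.0
```

Because only shared terms contribute to the dot product, any FAQ question with no token in common with the query scores exactly 0, which explains why most rows are zero.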

In [31]:
df['similarity_bow'] = cosine_value                                         # store the cosine similarities as a new column

In [33]:
simiscores = pd.DataFrame(df, columns=['Answers','similarity_bow'])         # keep each answer alongside its similarity to the example question
simiscores

Unnamed: 0,Answers,similarity_bow
0,Mental illnesses are health conditions that di...,0.0
1,It is estimated that mental illness affects 1 ...,0.0
2,It is estimated that mental illness affects 1 ...,0.0
3,Symptoms of mental health disorders vary depen...,0.0
4,"When healing from mental illness, early identi...",0.0
...,...,...
93,Sorting out if you are drinking too much can b...,0.0
94,"Cannabis smoke, for example, contains cancer-c...",0.0
95,You can't. But you can influence their capacit...,0.0
96,Cannabidiol or CBD is a naturally occurring co...,0.0


In [34]:
simscoresDescending = simiscores.sort_values(by = 'similarity_bow', ascending=False)          # sorting the values
simscoresDescending.head()

Unnamed: 0,Answers,similarity_bow
7,Just as there are different types of medicatio...,1.0
8,Since beginning treatment is a big step for in...,0.258199
11,Beginning treatment is a big step for individu...,0.258199
17,Mental health conditions are often treated wit...,0.258199
0,Mental illnesses are health conditions that di...,0.0


In [35]:
threshold = 0.2                                                                     # keep only rows whose similarity is greater than 0.2
df_threshold = simscoresDescending[simscoresDescending['similarity_bow'] > threshold] 
df_threshold

Unnamed: 0,Answers,similarity_bow
7,Just as there are different types of medicatio...,1.0
8,Since beginning treatment is a big step for in...,0.258199
11,Beginning treatment is a big step for individu...,0.258199
17,Mental health conditions are often treated wit...,0.258199


In [36]:
index_value = cosine_value.argmax()         # index number of highest value
index_value

7

In [37]:
df['Answers'].loc[index_value]   

'Just as there are different types of medications for physical illness, different treatment options are available for individuals with mental illness. Treatment works differently for different people. It is important to find what works best for you or your child.'

## TF-IDF 

In [40]:
# Using TF-IDF
tfidf = TfidfVectorizer()                                             # initialize the TF-IDF vectorizer
X_tfidf = tfidf.fit_transform(df['lemmatized_questions']).toarray()        # fit on the questions and convert to a dense array

In [41]:
# one column per vocabulary term, holding that term's TF-IDF weight in each question
df_tfidf = pd.DataFrame(X_tfidf, columns = tfidf.get_feature_names_out()) 
df_tfidf.head()

Unnamed: 0,addict,adhd,adult,advanc,affect,alcohol,allow,antidepress,antisoci,anxieti,...,treatment,trial,type,unwel,vape,warn,work,worri,young,youth
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.750519,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.600266,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
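
Unlike the raw counts of the bag-of-words table, the weights here depend on how rare a term is across the corpus. As a rough sketch of how such weights arise, the pure-Python code below follows the default `TfidfVectorizer` settings as documented by scikit-learn (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, then L2 normalization of each row). The helper name `tfidf_rows` is hypothetical, and the toy corpus is only three strings from the `lemmatized_questions` column, so the numbers will not match the table above.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return (vocabulary, rows) of L2-normalized TF-IDF weights.

    Hypothetical helper for illustration; tokens are split on whitespace only.
    """
    tokenized = [doc.split() for doc in docs]
    vocabulary = sorted({t for toks in tokenized for t in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed inverse document frequency
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        weights = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(w * w for w in weights))
        # Scale each row to unit length; an empty document stays all zeros
        rows.append([w / norm if norm else 0.0 for w in weights])
    return vocabulary, rows


docs = ["mean mental ill", "mental ill affect", "caus mental ill"]
vocab, rows = tfidf_rows(docs)
second = dict(zip(vocab, rows[1]))
print(second["affect"] > second["mental"])  # rarer term gets the larger weight
```

In the toy corpus "mental" and "ill" occur in every document while "affect" occurs in one, so "affect" carries the largest weight in its row, the same pattern as the `affect` column in the preview.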


In [42]:
question_lemma = preprocess_text(example_bow)                               # clean text
question_tfidf = tfidf.transform([question_lemma]).toarray() 

In [43]:
# cosine similarity between every FAQ question and the example question (1 - cosine distance)
cosine_value_tfidf = 1 - pairwise_distances(df_tfidf, question_tfidf, metric='cosine')
print(cosine_value_tfidf)

[[0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [1.        ]
 [0.21481132]
 [0.        ]
 [0.        ]
 [0.21481132]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.24196901]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.        ]
 [0.  

In [44]:
df['similarity_tfidf'] = cosine_value_tfidf                                                    # creating a new column 
df_simi_tfidf = pd.DataFrame(df, columns=['Answers','similarity_tfidf'])        # taking similarity value of responses for the question we took
df_simi_tfidf

Unnamed: 0,Answers,similarity_tfidf
0,Mental illnesses are health conditions that di...,0.0
1,It is estimated that mental illness affects 1 ...,0.0
2,It is estimated that mental illness affects 1 ...,0.0
3,Symptoms of mental health disorders vary depen...,0.0
4,"When healing from mental illness, early identi...",0.0
...,...,...
93,Sorting out if you are drinking too much can b...,0.0
94,"Cannabis smoke, for example, contains cancer-c...",0.0
95,You can't. But you can influence their capacit...,0.0
96,Cannabidiol or CBD is a naturally occurring co...,0.0


In [45]:
df_simi_tfidf_sort = df_simi_tfidf.sort_values(by='similarity_tfidf', ascending=False)            # sorting the values
df_simi_tfidf_sort.head(10)

Unnamed: 0,Answers,similarity_tfidf
7,Just as there are different types of medicatio...,1.0
17,Mental health conditions are often treated wit...,0.241969
8,Since beginning treatment is a big step for in...,0.214811
11,Beginning treatment is a big step for individu...,0.214811
0,Mental illnesses are health conditions that di...,0.0
63,"If possible, bring up your concerns with the p...",0.0
72,Using cannabis has the potential for benefits ...,0.0
71,If you need to talk to someone or you aren’t s...,0.0
70,Someone else’s illness is not your fault. You ...,0.0
69,We naturally want to help a loved one who isn’...,0.0


In [46]:
threshold = 0.2                                                                                  # keep only rows whose similarity is greater than 0.2
df_threshold = df_simi_tfidf_sort[df_simi_tfidf_sort['similarity_tfidf'] > threshold] 
df_threshold

Unnamed: 0,Answers,similarity_tfidf
7,Just as there are different types of medicatio...,1.0
17,Mental health conditions are often treated wit...,0.241969
8,Since beginning treatment is a big step for in...,0.214811
11,Beginning treatment is a big step for individu...,0.214811


## Chatbot Implementation

In [47]:
def get_best_answer(query, faq_df, tfidf_matrix, vectorizer):
    # Preprocess the input query
    processed_query = preprocess_text(query)
    
    # Transform the query into the same TF-IDF space
    query_tfidf = vectorizer.transform([processed_query]).toarray()
    
    # Compute cosine similarity between the query and the FAQ dataset
    similarity_scores = 1 - pairwise_distances(tfidf_matrix, query_tfidf, metric='cosine')  # cosine similarity = 1 - cosine distance
    
    # Get the index of the highest similarity score
    best_match_index = similarity_scores.argmax()
    
    # Return the question and answer with the highest similarity score
    best_question = faq_df.iloc[best_match_index]['Questions']
    best_answer = faq_df.iloc[best_match_index]['Answers']
    
    return best_question, best_answer


def mental_health_chatbot(faq_df, tfidf_matrix, vectorizer):
    print("Welcome to the Mental Health Support Chatbot.")
    print("Please note that this is not a substitute for professional help.")
    print("If you are in crisis, please contact a professional immediately.")
    print('-' * 50)
    print("Please type your question or type 'exit' to quit.")
    print('=' * 50)
    
    while True:
        user_query = input("You: ")
        
        if user_query.lower() in ["exit", "quit"]:
            print("Chatbot: Thank you for chatting. Remember to take care of yourself!")
            break
        
        best_question, best_answer = get_best_answer(user_query, faq_df, tfidf_matrix, vectorizer)
        print('-' * 30)
        
        print(f"Chatbot: {best_answer}")
        print('='* 30)
        
if __name__ == "__main__":
    # Load and preprocess the FAQ dataset
    faq_df = pd.read_csv('mental_health_faq.csv')
    faq_df['text'] = faq_df['Questions'] + ' ' + faq_df['Answers']
    faq_df['lemmatized_text'] = faq_df['text'].apply(preprocess_text)
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(faq_df['lemmatized_text'])
    
    # Start the chatbot
    mental_health_chatbot(faq_df, tfidf_matrix, vectorizer)


Welcome to the Mental Health Support Chatbot.
Please note that this is not a substitute for professional help.
If you are in crisis, please contact a professional immediately.
--------------------------------------------------
Please type your question or type 'exit' to quit.
You: What treatment options are available?
------------------------------
Chatbot: Just as there are different types of medications for physical illness, different treatment options are available for individuals with mental illness. Treatment works differently for different people. It is important to find what works best for you or your child.
You: how do i see a counsellor?
------------------------------
Chatbot: You can find directories of counsellors through their professional organizations. 
 Registered Clinical Counsellors: visit the BC Association of Clinical Counsellors 
 Canadian Certified Counsellors: visit the Canadian Counselling and Psychotherapy Association 
 Canadian Professional Counsellors: visit t

You: What are the signs of having sleep disorder?
------------------------------
Chatbot: Taking care of your physical health is also good for your mental health. It's more important than ever to keep yourself healthy. 
 Try to eat as well as you can. It may be easier to reach for unhealthier comfort foods and snacks while you spend more time at home, but try to keep a balanced approach. When you stock up on groceries, don’t ignore fresh fruit and vegetables—we still have everything we need to prepare food. Now that we're advised to limit the amount of time we spend in public spaces like grocery stores, this is a great time to try out new fruits and vegetables that keep at home for longer periods of time. 
 If it's safer for you to stay home or you are in self-isolation, reach out for help. Many grocery stores and meal prep services offer safe, no-contact delivery. You can also ask family or friends to bring you groceries, or look for local COVID-19 support groups on social media. It's
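
The last exchange illustrates a limitation of taking `argmax` unconditionally: some row is always returned, however low its score, so a question the FAQ does not cover still gets an answer. One possible mitigation, sketched below, reuses the 0.2 threshold from the earlier cells and falls back to a neutral message when no score exceeds it. The names `pick_answer` and `FALLBACK` and the fallback wording are hypothetical and not part of the notebook, and a threshold of 0.2 was only checked against a single example question, so it would likely need tuning.

```python
FALLBACK = "Sorry, I could not find a relevant answer. Please try rephrasing your question."


def pick_answer(scores, answers, threshold=0.2, fallback=FALLBACK):
    """Return the best-scoring answer, or a fallback if no score exceeds the threshold.

    Hypothetical helper: `scores` is a flat list of similarities and
    `answers` is the list of answers in the same order.
    """
    if not scores:
        return fallback
    # Index of the highest similarity score (first one wins on ties)
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best_index] <= threshold:
        return fallback
    return answers[best_index]


answers = ["Answer about treatment", "Answer about counsellors", "Answer about sleep"]
print(pick_answer([1.0, 0.25, 0.0], answers))   # confident match
print(pick_answer([0.05, 0.1, 0.0], answers))   # nothing above the threshold
```

Inside `get_best_answer`, the same check could be applied to `similarity_scores.ravel().tolist()` before indexing into `faq_df`.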