# **Mental Health Prediction**

##### **Authors**:
<ul type='square'> 
    <li> Alice Wamuyu</li>
    <li> Eugene Kuloba </li>
    <li> Fridah Kimathi </li>
    <li> Karen Amanya  </li>
    <li> Nicholus Magak  </li>
    <li> Nobert Akwir </li>
</ul>

# **1. Business Understanding**

## **Objectives**
> ### **General Objective**

> ### **Specific Objectives**
<ul type='square'>
    <li >  </li>
    <li>  </li>
    <li>  </li>
    <li>  </li>
    <li>  </li>
    <li>  </li>
</ul>



### **Importing the required libraries**

In [1]:
import pandas as pd
import string
from textblob import TextBlob, Word
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer

# To install TextBlob, run "conda install -c conda-forge textblob" in a terminal

# **2. Data Understanding**

The data used in this project comes from the <a href="https://zindi.africa/competitions/basic-needs-basic-rights-kenya-tech4mentalhealth/data">Basic Needs Basic Rights Kenya - Tech4MentalHealth</a> competition hosted by Zindi Africa. It consists of statements and questions written by students from multiple universities across Kenya who reported mental health challenges. Each training statement is labelled with one of four categories: Depression, Alcohol, Suicide or Drugs. The statements are responses to the prompting question, “What is on your mind?”

#### **Loading the data**

In [2]:
train_df = pd.read_csv('Data/Train.csv')
validation_df = pd.read_csv('Data/Test.csv')

In [3]:
# shape of the datasets
print(f'The train data shape: {train_df.shape}')
print(f'The test data shape: {validation_df.shape}')

The train data shape: (616, 3)
The test data shape: (309, 2)


In [4]:
# the columns in the datasets
print(f'The train data columns: \n {train_df.columns} \n')
print(f'The test data columns: \n {validation_df.columns}')

The train data columns: 
 Index(['ID', 'text', 'label'], dtype='object') 

The test data columns: 
 Index(['ID', 'text'], dtype='object')


In [5]:
# the info (DataFrame.info() prints its summary itself and returns None, so it is not wrapped in print())
print('The train data info:')
train_df.info()
print('\nThe test data info:')
validation_df.info()

The train data info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 616 entries, 0 to 615
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   ID      616 non-null    object
 1   text    616 non-null    object
 2   label   616 non-null    object
dtypes: object(3)
memory usage: 14.6+ KB

The test data info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 309 entries, 0 to 308
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   ID      309 non-null    object
 1   text    309 non-null    object
dtypes: object(2)
memory usage: 5.0+ KB


In [6]:
# class proportions: the classes are imbalanced (Depression ~57% of rows vs Drugs ~9%)
train_df['label'].value_counts(normalize=True)

Depression    0.571429
Alcohol       0.227273
Suicide       0.107143
Drugs         0.094156
Name: label, dtype: float64
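One common way to account for this imbalance later, during modelling, is to weight each class inversely to its frequency. The sketch below computes "balanced" weights with the formula n_samples / (n_classes * n_class_samples), the same one scikit-learn uses for `class_weight='balanced'`. The counts are recovered from the proportions above (for example 0.571429 * 616 ≈ 352); in the notebook, `train_df['label'].value_counts()` returns them directly. The helper name `balanced_class_weights` is hypothetical.

```python
# Class counts recovered from the proportions above (616 training rows)
class_counts = {'Depression': 352, 'Alcohol': 140, 'Suicide': 66, 'Drugs': 58}

def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * n_class_samples)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}

weights = balanced_class_weights(class_counts)
for label, weight in weights.items():
    print(f'{label:<10} {weight:.3f}')
```

With these weights a Drugs example counts roughly six times as much as a Depression example in a weighted loss.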

In [7]:
# 'length' is the number of characters in each statement (not the number of words)
train_df['length'] = train_df['text'].apply(len)
train_df['length']

0      39
1      28
2      57
3      22
4      51
       ..
611    36
612    30
613    24
614    16
615    31
Name: length, Length: 616, dtype: int64

In [8]:
train_df.describe()

# describe() only summarises the numeric 'length' column
# The shortest statement is 8 characters long
# The longest statement is 196 characters long

| | length |
|---|---|
| count | 616.0 |
| mean | 39.813312 |
| std | 21.438797 |
| min | 8.0 |
| 25% | 26.0 |
| 50% | 35.0 |
| 75% | 48.25 |
| max | 196.0 |
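Because `len` applied to a string counts characters, the statistics above are in characters. A minimal sketch contrasting character and word counts on two statements copied from the training data; the variable names are illustrative only.

```python
# Two statements copied from the training data (rows 1 and 3)
statements = ['Why do I get hallucinations?', 'Why is life important?']

char_counts = [len(text) for text in statements]          # what train_df['length'] measures
word_counts = [len(text.split()) for text in statements]  # whitespace-separated words

for text, n_chars, n_words in zip(statements, char_counts, word_counts):
    print(f'{n_chars:>3} chars, {n_words} words: {text}')
```

The pandas equivalent for a word count column would be `train_df['text'].str.split().str.len()`.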


In [9]:
# Viewing the longest statement (196 characters)

train_df[train_df['length'] == train_df['length'].max()]

| | ID | text | label | length |
|---|---|---|---|---|
| 194 | J55053XP | I am financially constrained over school fees ... | Depression | 196 |


In [10]:
# Printing the full text of the longest statement (row 194)

print(train_df['text'].iloc[194])

I am financially constrained over school fees and my  family background is not stable with a lot of debts…I have an elderly brother who could easily support me but has no job even after graduating


# **3. Data Preparation**

 #### **i. Correcting spelling mistakes**

In [11]:
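def correct_sent(text):
    """Return the text with TextBlob's spelling correction applied."""
    return str(TextBlob(text).correct())

# Note: TextBlob corrects word by word and can change words that were already correct
# (e.g. 'How' became 'Now' in row 4 below), so the corrections need reviewing
train_df['corrected_sent'] = train_df['text'].apply(correct_sent)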
train_df.head()

| | ID | text | label | length | corrected_sent |
|---|---|---|---|---|---|
| 0 | SUAVK39Z | I feel that it was better I dieAm happy | Depression | 39 | I feel that it was better I die happy |
| 1 | 9JDAGUV3 | Why do I get hallucinations? | Drugs | 28 | Why do I get hallucinations? |
| 2 | 419WR1LQ | I am stresseed due to lack of financial suppor... | Depression | 57 | I am stressed due to lack of financial support... |
| 3 | 6UY7DX6Q | Why is life important? | Suicide | 22 | Why is life important? |
| 4 | FYC0FTFB | How could I be helped to go through the depres... | Depression | 51 | Now could I be helped to go through the depres... |


In [12]:
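# TODO: handle the ellipsis character '…' that joins two sentences (e.g. 'debts…I')
# TextBlob also mis-corrected words here, e.g. 'graduating' became 'granulating'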
print(train_df['corrected_sent'].iloc[194] + '\n \n')

print(train_df['corrected_sent'].iloc[48])

I am financially constrained over school fees and my  family background is not stable with a lot of debts…I have an elderly brother who could easily support me but has no job even after granulating
 

I am facing a lot of challenges in life financially, emotional, psycologically and with no solutions…Now can I safely look for solutions about depression on goose
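Since the corrector changed some correct words ('graduating' became 'granulating' above), it helps to list exactly which words were altered. A minimal sketch using the standard-library `difflib.SequenceMatcher` on word lists; `changed_words` is a hypothetical helper, not part of TextBlob.

```python
import difflib

def changed_words(original, corrected):
    """Return (original, corrected) word pairs that differ between two versions of a sentence."""
    a, b = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (' '.join(a[i1:i2]), ' '.join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != 'equal'  # keep only replaced, inserted or deleted spans
    ]

print(changed_words('How could I be helped', 'Now could I be helped'))
print(changed_words('has no job even after graduating', 'has no job even after granulating'))
```

Applied to the whole frame, `train_df.apply(lambda row: changed_words(row['text'], row['corrected_sent']), axis=1)` gives one list of changes per row. It should run at this point, before step ii lowercases `corrected_sent`; otherwise every capitalised word shows up as a change.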


 #### **ii. Changing text to lowercase**

In [13]:

train_df['corrected_sent'] = train_df['corrected_sent'].apply(lambda x: x.lower())
train_df.head()

| | ID | text | label | length | corrected_sent |
|---|---|---|---|---|---|
| 0 | SUAVK39Z | I feel that it was better I dieAm happy | Depression | 39 | i feel that it was better i die happy |
| 1 | 9JDAGUV3 | Why do I get hallucinations? | Drugs | 28 | why do i get hallucinations? |
| 2 | 419WR1LQ | I am stresseed due to lack of financial suppor... | Depression | 57 | i am stressed due to lack of financial support... |
| 3 | 6UY7DX6Q | Why is life important? | Suicide | 22 | why is life important? |
| 4 | FYC0FTFB | How could I be helped to go through the depres... | Depression | 51 | now could i be helped to go through the depres... |


 #### **iii. Removing the punctuation marks**

In [14]:
# Listing statements that contain the ellipsis character '…' (it appears as the mojibake 'â€¦' when UTF-8 text is decoded as Windows-1252)
for x in train_df['corrected_sent']:
    if '…' in x:
        print(x)

i feel hopeless, unworthy and useless …now do i cope with stress and forge the past?
i am facing a lot of challenges in life financially, emotional, psycologically and with no solutions…now can i safely look for solutions about depression on goose
there i get money for my needs…there do i get money  for personal needs?
i am financially constrained over school fees and my  family background is not stable with a lot of debts…i have an elderly brother who could easily support me but has no job even after granulating
i feel desperate…why is the world so unfair
by relatives deny me…i wonder if i am part of my family?


In [15]:
# Replacing '…' with a space so the words it joins (e.g. 'debts…i') stay separate
train_df['corrected_sent'] = train_df['corrected_sent'].apply(lambda x: x.replace('…', ' '))

In [16]:

# string.punctuation holds the same ASCII characters that were listed here by hand; it does not include '…'
punc_to_rem = string.punctuation
punc_table = str.maketrans('', '', punc_to_rem)

train_df['corrected_sent'] = train_df['corrected_sent'].str.translate(punc_table)

train_df.head()

| | ID | text | label | length | corrected_sent |
|---|---|---|---|---|---|
| 0 | SUAVK39Z | I feel that it was better I dieAm happy | Depression | 39 | i feel that it was better i die happy |
| 1 | 9JDAGUV3 | Why do I get hallucinations? | Drugs | 28 | why do i get hallucinations |
| 2 | 419WR1LQ | I am stresseed due to lack of financial suppor... | Depression | 57 | i am stressed due to lack of financial support... |
| 3 | 6UY7DX6Q | Why is life important? | Suicide | 22 | why is life important |
| 4 | FYC0FTFB | How could I be helped to go through the depres... | Depression | 51 | now could i be helped to go through the depres... |
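The same cleaning will also have to be applied to `validation_df` before any model can score it. A minimal sketch that bundles lowercasing (step ii) with the ellipsis and punctuation handling (step iii) into one translation table; `clean_text` and `CLEAN_TABLE` are hypothetical names, and spelling correction (step i) would still run first.

```python
import string

# One translation table: map '…' (U+2026) to a space and delete ASCII punctuation
CLEAN_TABLE = str.maketrans('…', ' ', string.punctuation)

def clean_text(text):
    """Lowercase, separate words joined by '…', and strip ASCII punctuation."""
    return text.lower().translate(CLEAN_TABLE)

print(clean_text('I feel desperate…why is the world so unfair'))
print(clean_text('Why do I get hallucinations?'))
```

Both frames can then share one code path, for example `validation_df['text'].apply(correct_sent).apply(clean_text)`.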


 #### **iv. Removing stop words and lemmatizing verbs**

In [17]:
# Downloading the necessary NLTK data packages. Uncomment to download

#nltk.download('stopwords')
#nltk.download('wordnet')
#nltk.download('omw-1.4')
#nltk.download('punkt')

In [18]:
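stopwords = set(nltk.corpus.stopwords.words('english'))  # a set makes membership checks fast
wordnet_lemmatizer = WordNetLemmatizer()

def remove_stopwords(x):
    # Drops stop words and also lemmatizes each remaining word as a verb (pos='v'),
    # e.g. 'stressed' -> 'stress', 'helped' -> 'help'
    sent = [wordnet_lemmatizer.lemmatize(i, 'v') for i in x.split() if i not in stopwords]
    return ' '.join(sent)

train_df['no_stopwords'] = train_df['corrected_sent'].apply(remove_stopwords)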
train_df.head()

| | ID | text | label | length | corrected_sent | no_stopwords |
|---|---|---|---|---|---|---|
| 0 | SUAVK39Z | I feel that it was better I dieAm happy | Depression | 39 | i feel that it was better i die happy | feel better die happy |
| 1 | 9JDAGUV3 | Why do I get hallucinations? | Drugs | 28 | why do i get hallucinations | get hallucinations |
| 2 | 419WR1LQ | I am stresseed due to lack of financial suppor... | Depression | 57 | i am stressed due to lack of financial support... | stress due lack financial support school |
| 3 | 6UY7DX6Q | Why is life important? | Suicide | 22 | why is life important | life important |
| 4 | FYC0FTFB | How could I be helped to go through the depres... | Depression | 51 | now could i be helped to go through the depres... | could help go depression |


In [19]:
# Re-inspecting rows 194 and 48: negations are lost ('is not stable' -> 'stable', 'has no job' -> 'job')
print(train_df['no_stopwords'].iloc[194] + '\n \n')

print(train_df['no_stopwords'].iloc[48])

financially constrain school fee family background stable lot debts elderly brother could easily support job even granulate
 

face lot challenge life financially emotional psycologically solutions safely look solutions depression goose
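NLTK's English stop-word list contains the negations 'no', 'not' and 'nor', which is why 'family background is not stable' became 'family background stable' in row 194 above. A minimal sketch of stop-word removal that keeps negations; `remove_stopwords_keep_negations` is a hypothetical helper, and `demo_stop_words` is a small stand-in for `set(nltk.corpus.stopwords.words('english'))` so the example runs without NLTK data. Lemmatization is omitted here.

```python
NEGATIONS = frozenset({'no', 'not', 'nor'})

def remove_stopwords_keep_negations(text, stop_words, keep=NEGATIONS):
    """Drop stop words from a whitespace-separated string, except those listed in `keep`."""
    to_drop = set(stop_words) - set(keep)
    return ' '.join(word for word in text.split() if word not in to_drop)

# Hypothetical stand-in for set(nltk.corpus.stopwords.words('english'))
demo_stop_words = {'my', 'is', 'not', 'no', 'has', 'but', 'me', 'with', 'a', 'of'}

print(remove_stopwords_keep_negations('my family background is not stable', demo_stop_words))
print(remove_stopwords_keep_negations('support me but has no job', demo_stop_words))
```

Whether keeping negations helps the classifier is an empirical question; the sketch only shows how to preserve them.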


 #### **v. Tokenizing the sentences**

In [20]:
# tokenizing the sentences
train_df['tokenized_text'] = train_df['no_stopwords'].apply(lambda x: nltk.word_tokenize(x))
train_df.head()

| | ID | text | label | length | corrected_sent | no_stopwords | tokenized_text |
|---|---|---|---|---|---|---|---|
| 0 | SUAVK39Z | I feel that it was better I dieAm happy | Depression | 39 | i feel that it was better i die happy | feel better die happy | [feel, better, die, happy] |
| 1 | 9JDAGUV3 | Why do I get hallucinations? | Drugs | 28 | why do i get hallucinations | get hallucinations | [get, hallucinations] |
| 2 | 419WR1LQ | I am stresseed due to lack of financial suppor... | Depression | 57 | i am stressed due to lack of financial support... | stress due lack financial support school | [stress, due, lack, financial, support, school] |
| 3 | 6UY7DX6Q | Why is life important? | Suicide | 22 | why is life important | life important | [life, important] |
| 4 | FYC0FTFB | How could I be helped to go through the depres... | Depression | 51 | now could i be helped to go through the depres... | could help go depression | [could, help, go, depression] |


 #### **vi. Vectorization**

In [21]:
# Convert sentences into vectors
# TODO: the vectorization method has not been chosen yet; assign its output to train_df['vector_sent']
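The notebook does not yet specify how the sentences should be vectorized. One simple option is a bag-of-words count vector over the tokens. A minimal pure-Python sketch, assuming the `tokenized_text` lists as input; `build_vocabulary` and `bag_of_words` are hypothetical helpers.

```python
from collections import Counter

def build_vocabulary(token_lists):
    """Assign each distinct token a column index (alphabetical order)."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: idx for idx, tok in enumerate(vocab)}

def bag_of_words(tokens, vocabulary):
    """Count vector for one document; tokens missing from the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(tokens).items():
        idx = vocabulary.get(tok)
        if idx is not None:
            vector[idx] = count
    return vector

# Token lists copied from the tokenized_text column above
docs = [['feel', 'better', 'die', 'happy'], ['get', 'hallucinations'], ['life', 'important']]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(doc, vocab) for doc in docs]
print(vocab)
print(vectors)
```

In the notebook this would look like `vocab = build_vocabulary(train_df['tokenized_text'])` followed by `train_df['tokenized_text'].apply(lambda tokens: bag_of_words(tokens, vocab))`. Build the vocabulary from the training data only and reuse it for `validation_df`. In practice, scikit-learn's `CountVectorizer` or `TfidfVectorizer` (in `sklearn.feature_extraction.text`) do the same job with sparse matrices.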

# **4. Modelling**

# **5. Evaluation**

# **Conclusion and recommendations**