# Stress Detection using Python
Stress detection with machine learning from text strings.

The dataset for this task contains data posted on subreddits related to mental health. 

This dataset contains various mental health problems shared by people about their life.

This dataset is labelled as 0 and 1, where 0 indicates no stress and 1 indicates stress.

The data is: stress.csv

In [11]:
# Install if needed
#!pip3 install nltk
#!pip3 install wordcloud
#!pip3 install matplotlib
!pip3 install wordcloud

Collecting wordcloud
  Using cached wordcloud-1.8.1.tar.gz (220 kB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Using legacy 'setup.py install' for wordcloud, since package 'wheel' is not installed.
Installing collected packages: wordcloud
  Running setup.py install for wordcloud: started
  Running setup.py install for wordcloud: finished with status 'error'


  error: subprocess-exited-with-error
  
  Running setup.py install for wordcloud did not run successfully.
  exit code: 1
  
  [20 lines of output]
  running install
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-3.10
  creating build\lib.win-amd64-3.10\wordcloud
  copying wordcloud\color_from_image.py -> build\lib.win-amd64-3.10\wordcloud
  copying wordcloud\tokenization.py -> build\lib.win-amd64-3.10\wordcloud
  copying wordcloud\wordcloud.py -> build\lib.win-amd64-3.10\wordcloud
  copying wordcloud\wordcloud_cli.py -> build\lib.win-amd64-3.10\wordcloud
  copying wordcloud\_version.py -> build\lib.win-amd64-3.10\wordcloud
  copying wordcloud\__init__.py -> build\lib.win-amd64-3.10\wordcloud
  copying wordcloud\__main__.py -> build\lib.win-amd64-3.10\wordcloud
  copying wordcloud\stopwords -> build\lib.win-amd64-3.10\wordcloud
  copying wordcloud\DroidSansMono.ttf -> build\lib.win-amd64-3.10\wordcloud
  UPDATING build\lib.win-amd64-3.10\wordcloud/_

In [1]:
import pandas as pd
import numpy as np

In [2]:
data = pd.read_csv("stress_csv.csv", sep=';')
print(data.head())

                                                text  stress_label
0  He said he had not felt that way before, sugge...             1
1  Hey there r/assistance, Not sure if this is th...             0
2  My mom then hit me with the newspaper and it s...             1
3  until i met my new boyfriend, he is amazing, h...             1
4  October is Domestic Violence Awareness Month a...             1


In [3]:
# Check for NA/Null
print(data.isnull().sum())

text            0
stress_label    0
dtype: int64


# Prepare the text column
NLTK is a leading platform for building Python programs to work with human language data.
It includes a "stopword" module - [https://www.nltk.org](https://www.nltk.org)

Prepare the text column of this dataset to clean the text column with stopwords, links, special symbols and language errors:

In [4]:
import nltk
import re
nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
from nltk.corpus import stopwords
import string
stopword=set(stopwords.words('english'))

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\TueHellstern\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [5]:
def clean(text):
    text = str(text).lower()
    text = re.sub('\[.*?\]', '', text)
    text = re.sub('https?://\S+|www\.\S+', '', text)
    text = re.sub('<.*?>+', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub('\n', '', text)
    text = re.sub('\w*\d\w*', '', text)
    text = [word for word in text.split(' ') if word not in stopword]
    text=" ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]
    text=" ".join(text)
    return text

data["text"] = data["text"].apply(clean)

Have a look at the most used words by the people sharing about their life problems on social media by visualizing a word cloud of the text column:

In [13]:
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

text = " ".join(i for i in data.text)
stopwords = set(STOPWORDS)
wordcloud = WordCloud(stopwords=stopwords, 
                      background_color="white").generate(text)

plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

ModuleNotFoundError: No module named 'wordcloud'

# Stress Detection Model
The label column in this dataset contains labels as 0 and 1. 0 means no stress, and 1 means stress. 

Use Stress and No stress labels instead of 1 and 0. 

Prepare this column accordingly and select the text and label columns for the process of training a machine learning model:

In [16]:
data["stress_label"] = data["stress_label"].map({0: "No Stress", 1: "Stress"})
data = data[["text", "stress_label"]]
print(data.head())

                                                text stress_label
0  said felt way sugget go rest trigger ahead you...          NaN
1  hey rassist sure right place post goe  im curr...          NaN
2  mom hit newspap shock would know dont like pla...          NaN
3  met new boyfriend amaz kind sweet good student...          NaN
4  octob domest violenc awar month domest violenc...          NaN


## Split this dataset into training and test sets:

In [17]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

x = np.array(data["text"])
y = np.array(data["stress_label"])

cv = CountVectorizer()
X = cv.fit_transform(x)
xtrain, xtest, ytrain, ytest = train_test_split(X, y, 
                                                test_size=0.33, 
                                                random_state=42)

This task is based on the problem of binary classification, I will be using the Bernoulli Naive Bayes algorithm, which is one of the best algorithms for binary classification problems. 

Train the stress detection model:

In [18]:
from sklearn.naive_bayes import BernoulliNB

model = BernoulliNB()
model.fit(xtrain, ytrain)

ValueError: Input contains NaN

# Text examples

- People need to take care of their mental health
- Sometime I feel like I need some help

In [61]:
# No Stress
user = 'People need to take care of their mental health'
data = cv.transform([user]).toarray()
output = model.predict(data)
print(output)

['No Stress']


In [62]:
# Stress
user =  
data = cv.transform([user]).toarray()
output = model.predict(data)
print(output)

['Stress']


In [63]:
# User imput
user = input("Enter a Text: ")
data = cv.transform([user]).toarray()
output = model.predict(data)
print(output)

Enter a Text:  My partner and me are splitting up
['No Stress']


This is how you can train a stress detection model to detect stress from social media posts. 

This machine learning model can be improved by feeding it with more data.