<h2><b>Stress Detection with Machine Learning</b></h2>
<h4><b>Author:</b> Data Science @ Georgia Tech</h4>
<p><b>Reference:</b> <a href="https://medium.com/coders-camp/225-machine-learning-projects-with-python-44d6ea8ace18">Medium</a></p>

<b>Welcome to the Stress Detection self-guided project!</b>

Stress, anxiety, and depression are common precursors to more serious mental health problems.

In this project, we will train a machine learning model that detects stress from text.

The modules we will be using in this project are imported for your convenience.

In [None]:
import pandas as pd
import numpy as np
import nltk
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

Read the dataset and display its first five rows.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Dataset Import</b></font></summary>
  <pre>
    <code style="display: block;">
      # Solution
      data = pd.read_csv("file location for the csv file")
      print(data.head())
    </code>
  </pre>
</details>

Check how many null values exist in each column.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Null Values</b></font></summary>
  <pre>
    <code style="display: block;">
      # Solution code
      print(data.isnull().sum())
    </code>
  </pre>
</details>
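
To see what the null check reports, here is a minimal sketch on a made-up three-row frame. The `toy` DataFrame and its values are assumptions for illustration only, not part of the project dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing text and one missing label.
toy = pd.DataFrame({
    "text": ["I can't sleep", None, "Great day"],
    "label": [1, 0, np.nan],
})

# isnull() marks each missing cell as True; sum() counts them per column.
null_counts = toy.isnull().sum()
print(null_counts)
```

A column with a count of zero has no missing values. Rows with missing text could be dropped with `dropna` before cleaning, if any appear.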

Now we are going to clean the text column by removing stopwords, links, special symbols, and words that contain digits, and then stemming the remaining words.

The code for doing this is given below.

In [None]:
import string
from nltk.corpus import stopwords

# Download the NLTK stopword list.
nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
stopword = set(stopwords.words('english'))

def clean(text):
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                 # text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)   # links
    text = re.sub(r'<.*?>+', '', text)                  # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', ' ', text)                     # newlines become spaces
    text = re.sub(r'\w*\d\w*', '', text)                # words containing digits
    # Drop stopwords, then stem the remaining words.
    words = [word for word in text.split() if word not in stopword]
    words = [stemmer.stem(word) for word in words]
    return " ".join(words)

data["text"] = data["text"].apply(clean)
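
To make the regular-expression steps easier to follow, here is a simplified sketch that applies only the pattern-based part of the cleaning to one sample string. The function name `strip_noise` and the sample sentence are assumptions for illustration. Stopword removal and stemming are left out so the snippet runs without the NLTK download.

```python
import re
import string

def strip_noise(text):
    """Apply only the regex-based cleaning steps (no stopwords, no stemming)."""
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                 # text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)   # links
    text = re.sub(r'<.*?>+', '', text)                  # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', ' ', text)                     # newlines become spaces
    text = re.sub(r'\w*\d\w*', '', text)                # words containing digits
    return text

# Hypothetical sample containing a link, punctuation, a digit, and a bracketed tag.
sample = "Visit https://example.com NOW! I slept 3 hours [edit]"
cleaned = strip_noise(sample)
print(cleaned.split())
```

Splitting the result on whitespace shows which words survive. The link, the bracketed tag, the punctuation, and the digit are all gone.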

Create the wordcloud of the text column.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Wordcloud Solution</b></font></summary>
  <pre>
    <code style="display: block;">
      # Wordcloud Solution
      import matplotlib.pyplot as plt
      from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
      text = " ".join(i for i in data.text)
      # Use a separate name so the nltk `stopwords` module is not overwritten.
      wc_stopwords = set(STOPWORDS)
      wordcloud = WordCloud(stopwords=wc_stopwords,
                            background_color="white").generate(text)
      plt.figure(figsize=(15,10))
      plt.imshow(wordcloud, interpolation='bilinear')
      plt.axis("off")
      plt.show()
    </code>
  </pre>
</details>

For ease of interpretation, convert the label column, which consists of 1s and 0s, into the categorical values "Stress" and "No Stress", respectively.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Label Conversion</b></font></summary>
  <pre>
    <code style="display: block;">
      # Label conversion solution
      data["label"] = data["label"].map({0: "No Stress", 1: "Stress"})
      data = data[["text", "label"]]
      print(data.head())
    </code>
  </pre>
</details>
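
One detail of `map` is worth knowing before relying on it: any value missing from the dictionary becomes `NaN`. The sketch below shows this on a made-up frame. The `toy` DataFrame and the stray value `2` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical labels, including one unexpected value (2).
toy = pd.DataFrame({"label": [1, 0, 1, 2]})

# Values found in the dictionary are translated; anything else becomes NaN.
toy["label_name"] = toy["label"].map({0: "No Stress", 1: "Stress"})
print(toy)
```

If the real label column holds only 0s and 1s, no `NaN` values appear after the conversion.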

Convert the text into word-count features, then split the data into training and testing sets.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Train-Test Split</b></font></summary>
  <pre>
    <code style="display: block;">
      # Dataset splitting solution
      x = np.array(data["text"])
      y = np.array(data["label"])
      cv = CountVectorizer()
      X = cv.fit_transform(x)
      xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.33, random_state=42)
    </code>
  </pre>
</details>
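
The following sketch shows what `CountVectorizer` produces, using three made-up sentences. The names `docs`, `cv_demo`, and `X_demo` are assumptions for illustration. Each row of the resulting matrix is one sentence, and each column counts one vocabulary word.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical mini corpus.
docs = ["i feel stress", "i feel fine", "stress stress everywhere"]

cv_demo = CountVectorizer()
X_demo = cv_demo.fit_transform(docs)   # sparse matrix: documents x vocabulary

# vocabulary_ maps each learned word to its column index.
vocab = sorted(cv_demo.vocabulary_)
print(vocab)
print(X_demo.toarray())
```

With the default settings, single-character tokens such as "i" are ignored, so they do not appear in the vocabulary.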

We will use the Bernoulli Naive Bayes algorithm, a simple and fast baseline for binary classification that models each word as either present in or absent from a text.

Fit the Bernoulli Naive Bayes classifier to the training data.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Bernoulli Naive Bayes</b></font></summary>
  <pre>
    <code style="display: block;">
      # Bernoulli Naive Bayes classifier solution
      from sklearn.naive_bayes import BernoulliNB
      model = BernoulliNB()
      model.fit(xtrain, ytrain)
    </code>
  </pre>
</details>
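
After fitting, it helps to check how well the model does on data it has not seen. The sketch below runs the whole flow on a tiny made-up corpus and reports accuracy with `accuracy_score`. The sentences, labels, and variable names are assumptions for illustration, and the score on such a small sample says nothing about the real dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy corpus: four stressed and four relaxed sentences.
texts = [
    "i am anxious about exams", "work makes me overwhelmed",
    "i cannot sleep from worry", "deadlines leave me exhausted",
    "i feel calm today", "the weekend was relaxing",
    "i am happy and rested", "a peaceful walk outside",
]
labels = ["Stress"] * 4 + ["No Stress"] * 4

X_toy = CountVectorizer().fit_transform(texts)
xtr, xte, ytr, yte = train_test_split(
    X_toy, labels, test_size=0.25, random_state=42, stratify=labels
)

clf = BernoulliNB().fit(xtr, ytr)
preds = clf.predict(xte)

# Fraction of held-out sentences that were classified correctly.
acc = accuracy_score(yte, preds)
print(acc)
```

In the project itself, the same call would be `accuracy_score(ytest, model.predict(xtest))`.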

Try the model out on a few sentences of your own about mental health.

In [None]:
# Write your code here.


<details>
  <summary>Click for solution: <font color="sky blue"><b>Example 1</b></font></summary>
  <pre>
    <code style="display: block;">
      # Example 1 solution
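      user = input("Enter a Text: ")
      # Clean the input the same way as the training text, and use a new
      # variable name so the `data` DataFrame is not overwritten.
      features = cv.transform([clean(user)]).toarray()
      output = model.predict(features)
      print(output)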
    </code>
  </pre>
</details>

<details>
  <summary>Click for solution: <font color="sky blue"><b>Example 2</b></font></summary>
  <pre>
    <code style="display: block;">
      # Example 2 solution
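      user = input("Enter a Text: ")
      # Clean the input the same way as the training text, and use a new
      # variable name so the `data` DataFrame is not overwritten.
      features = cv.transform([clean(user)]).toarray()
      output = model.predict(features)
      print(output)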
    </code>
  </pre>
</details>
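
As an alternative to typing sentences one at a time, the vectorizer and classifier can be bundled with scikit-learn's `make_pipeline`, so that raw strings go in and labels come out. This is a sketch on a made-up corpus, not the project's solution. The training sentences and the name `stress_pipeline` are assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical training sentences.
train_texts = [
    "i feel anxious and overwhelmed",
    "deadlines make me anxious",
    "i am overwhelmed and exhausted",
    "i feel calm and happy",
    "today was relaxing and happy",
    "i am calm and rested",
]
train_labels = ["Stress"] * 3 + ["No Stress"] * 3

# The pipeline vectorizes the text and then applies the classifier.
stress_pipeline = make_pipeline(CountVectorizer(), BernoulliNB())
stress_pipeline.fit(train_texts, train_labels)

# Predict several sentences in one call.
new_texts = ["anxious and overwhelmed", "calm and happy"]
predictions = stress_pipeline.predict(new_texts)
for sentence, label in zip(new_texts, predictions):
    print(sentence, "->", label)
```

Because the pipeline stores the fitted vocabulary together with the model, new text always receives the same transformation as the training text.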

## **Summary**

**Congratulations on completing the Stress Detection Project!**

We hope you have learned more about detecting stress. You can even use this project to check whether the language you use suggests that you are stressed.