# Data for Good: predicting suicidal behavior likelihood among Reddit users using Deep Learning

*Deep Learning and Reinforcement Learning (part of the IBM Machine Learning Professional Certificate) - Course Project.*

>*No one is useless in this world who lightens the burdens of another.*  
― **Charles Dickens**

<img src='https://www.discover-norway.no/upload/images/-development/header/desktop/kul_munch/edvard%20munch%20the%20scream%201893_munchmmuseet.jpg'></img>

## Table of contents
1. [Introduction: the project](#project)  
2. [Methodology](#methodology)  
3. [Data Understanding](#data)  
  3.1. [Data Cleaning](#cleaning)  
  3.2. [Exploratory Data Analysis](#eda)  
  3.3. [Data Preparation](#preparation)  
4. [Model Development: Recurrent Neural Network](#model)  
  4.1. [...](#kmeans)  
  4.2. [...](#hac)  
  4.3. [...](#dbscan)  
5. [Results](#results)  
6. [Discussion](#discussion)  
7. [Conclusion](#conclusion)  
  7.1. [Project Summary](#summary)  
  7.2. [Outcome of the Analysis](#outcome)  
  7.3. [Potential Developments](#developments)

## 1. Introduction: the project <a name=project></a>

Data for good means using Data Science and Machine Learning tools outside the for-profit sector to help non-profits, NGOs, and other organizations or individuals leverage the power of data for good causes and improve the lives of others.

There are many ways to use the power of Data Science for good: data can help solve social and environmental problems, enhance community security, and improve people's health. Nowadays, social media, forums and news aggregation websites are widely used, with people sharing many details about their lives. Some people also use the internet to share very serious issues, as a cry for help.  
**The scope of this project is to use the content created by the users themselves, in an online community, to analyze underlying mental health issues and to predict the likelihood of suicidal behavior among anonymized users.**

The algorithm, if successful, can be used for **targeted suicide intervention:** identifying the people at highest risk of self-harm or suicide, so that help and support can be provided in a timely and sustained manner.

## 2. Methodology <a name=methodology></a>

Based on the project requirements, I'll follow a **predictive analytics approach that aims to classify user posts into the correct category.**

To deliver reliable results, I'll follow the <a href='https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining'>**Cross-Industry Standard Process for Data Mining (CRISP-DM)**</a>, which consists of the following steps:  
1. **Business Understanding** (see the Introduction section)  
2. **Data Understanding**: data cleaning and exploratory data analysis.  
3. **Data Preparation**: transform data into a usable dataset for modeling.  
4. **Modeling**: I'll build 3 Deep Learning models, using the Recurrent Neural Network class:  
   4.1. Simple Recurrent Neural Network, with an additional Dense layer to output the predicted classification.  
   4.2. Long Short-Term Memory (LSTM) Networks  
   4.3. Gated Recurrent Unit (GRU) Networks  
5. **Evaluation**: model performance will be evaluated using the following metrics: loss function, Jaccard Index (Accuracy Score), Confusion Matrix, Classification Report, Precision/Recall and F1-Score, as well as by visually inspecting the model results. Curves such as ROC or precision-recall may be added, although I am not sure they can be used for multi-class classification.
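
As a rough illustration of the evaluation step, the sketch below computes a confusion matrix, accuracy, and per-class precision, recall and F1 in plain Python. The five class names are those found in the dataset's `Label` column. The `y_true` and `y_pred` lists are invented toy values, not model output, and the helper names are hypothetical. In the project itself a library such as scikit-learn would likely do this work.

```python
from collections import Counter

# Class names as they appear in the dataset's Label column (order is arbitrary here).
LABELS = ["Supportive", "Indicator", "Ideation", "Behavior", "Attempt"]

# Hypothetical toy labels, invented for illustration only (not real model output).
y_true = ["Ideation", "Ideation", "Behavior", "Attempt", "Supportive", "Indicator"]
y_pred = ["Ideation", "Behavior", "Behavior", "Attempt", "Supportive", "Ideation"]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_scores(matrix, labels):
    """Precision, recall and F1 for each class, computed one-vs-rest."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


matrix = confusion_matrix(y_true, y_pred, LABELS)
accuracy = sum(matrix[i][i] for i in range(len(LABELS))) / len(y_true)
scores = per_class_scores(matrix, LABELS)
macro_f1 = sum(s["f1"] for s in scores.values()) / len(LABELS)

print(f"accuracy = {accuracy:.3f}, macro F1 = {macro_f1:.3f}")
```

Macro-averaging gives every class the same weight, which may matter if some of the five categories turn out to be rare.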

---

Data source: https://www.kaggle.com/datasets/thedevastator/c-ssrs-labeled-suicidality-in-500-anonymized-red  
https://zenodo.org/record/2667859#.Y9aqCXZBw2z

In [2]:
import pandas as pd
data = pd.read_csv(r'500_anonymized_Reddit_users_posts_labels.csv')

In [3]:
data

| | User | Post | Label |
|---|---|---|---|
| 0 | user-0 | ['Its not a viable option, and youll be leavin... | Supportive |
| 1 | user-1 | ['It can be hard to appreciate the notion that... | Ideation |
| 2 | user-2 | ['Hi, so last night i was sitting on the ledge... | Behavior |
| 3 | user-3 | ['I tried to kill my self once and failed badl... | Attempt |
| 4 | user-4 | ['Hi NEM3030. What sorts of things do you enjo... | Ideation |
| ... | ... | ... | ... |
| 495 | user-495 | ['Its not the end, it just feels that way. Or ... | Supportive |
| 496 | user-496 | ['It was a skype call, but she ended it and Ve... | Indicator |
| 497 | user-497 | ['That sounds really weird.Maybe you were Dist... | Supportive |
| 498 | user-498 | ['Dont know there as dumb as it sounds I feel ... | Attempt |
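
Each `Post` value in the preview looks like the string form of a Python list. Assuming that holds for every row, a minimal sketch for turning a cell back into a real list could use `ast.literal_eval` from the standard library. The function name `parse_posts`, the example string and the `Posts_list` column name are hypothetical.

```python
import ast


def parse_posts(raw):
    """Turn a stringified list such as "['a', 'b']" into a list of strings.

    Assumes the cell is a valid Python list literal; a cell that cannot be
    parsed falls back to a single-element list holding the raw text.
    """
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return [raw]
    if isinstance(value, list):
        return [str(item) for item in value]
    return [str(value)]


# Hypothetical cell content, shaped like the preview (not taken from the dataset).
example = "['First post text.', 'Second post text.']"
posts = parse_posts(example)
print(len(posts), posts[0])

# With the DataFrame loaded earlier, the same function could be applied as:
# data["Posts_list"] = data["Post"].apply(parse_posts)
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` for text that comes from user-generated content.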


In [6]:
data.loc[26:27]

| | User | Post | Label |
|---|---|---|---|
| 26 | user-26 | ['So your place could use a cleaning, I dont t... | Indicator |
| 27 | user-27 | ['Thanks for the effort, but you missed the po... | Ideation |


In [9]:
data.loc[76:78]

| | User | Post | Label |
|---|---|---|---|
| 76 | user-76 | ['Haha, as they say you can do anything you pu... | Supportive |
| 77 | user-77 | ['Then lets explore that, because self-hate is... | Behavior |
| 78 | user-78 | ['I really dont think its that limited. Ive l... | Indicator |


In [7]:
data.shape

(500, 3)

In [15]:
data_1 = pd.read_csv(r'500_Reddit_users_posts_labels.csv')

In [16]:
data_1

| | User | Post | Label |
|---|---|---|---|
| 0 | user-0 | ['Its not a viable option, and youll be leavin... | Supportive |
| 1 | user-1 | ['It can be hard to appreciate the notion that... | Ideation |
| 2 | user-2 | ['Hi, so last night i was sitting on the ledge... | Behavior |
| 3 | user-3 | ['I tried to kill my self once and failed badl... | Attempt |
| 4 | user-4 | ['Hi NEM3030. What sorts of things do you enjo... | Ideation |
| ... | ... | ... | ... |
| 495 | user-495 | ['Its not the end, it just feels that way. Or ... | Supportive |
| 496 | user-496 | ['It was a skype call, but she ended it and Ve... | Indicator |
| 497 | user-497 | ['That sounds really weird.Maybe you were Dist... | Supportive |
| 498 | user-498 | ['Dont know there as dumb as it sounds I feel ... | Attempt |


In [17]:
data_1.loc[26:27]

| | User | Post | Label |
|---|---|---|---|
| 26 | user-26 | ['So your place could use a cleaning, I dont t... | Indicator |
| 27 | user-27 | ['Thanks for the effort, but you missed the po... | Ideation |


In [18]:
data_1.loc[76:78]

| | User | Post | Label |
|---|---|---|---|
| 76 | user-76 | ['Haha, as they say you can do anything you pu... | Supportive |
| 77 | user-77 | ['Then lets explore that, because self-hate is... | Behavior |
| 78 | user-78 | ['I really dont think its that limited. Ive l... | Indicator |


In [19]:
data_1.loc[497:]

| | User | Post | Label |
|---|---|---|---|
| 497 | user-497 | ['That sounds really weird.Maybe you were Dist... | Supportive |
| 498 | user-498 | ['Dont know there as dumb as it sounds I feel ... | Attempt |
| 499 | user-499 | ['&gt;It gets better, trust me.Ive spent long ... | Behavior |


In [20]:
data_1.shape

(500, 3)
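
The two files are inspected above by eye, a few rows at a time. A programmatic comparison may be more reliable, and the sketch below shows one possible way to list the row positions where two CSV files disagree, using only the standard library. The in-memory files are hypothetical stand-ins for the two datasets, and the function names are illustrative.

```python
import csv
import io


def read_rows(handle):
    """Read every data row (header skipped) from an open CSV file handle."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header line
    return list(reader)


def differing_rows(rows_a, rows_b):
    """Return the 0-based positions where the two row lists disagree.

    Positions that exist in only one of the lists count as differences.
    """
    longest = max(len(rows_a), len(rows_b))
    return [
        i for i in range(longest)
        if i >= len(rows_a) or i >= len(rows_b) or rows_a[i] != rows_b[i]
    ]


# Hypothetical in-memory stand-ins for the two CSV files.
file_a = io.StringIO(
    "User,Post,Label\nuser-0,text a,Supportive\nuser-1,text b,Ideation\n"
)
file_b = io.StringIO(
    "User,Post,Label\nuser-0,text a,Supportive\nuser-1,text B,Ideation\n"
)

diffs = differing_rows(read_rows(file_a), read_rows(file_b))
print(diffs)
```

For the real files, each handle would come from `open(path, newline="", encoding="utf-8")`. With both DataFrames already loaded, `data.equals(data_1)` gives a quick yes/no answer to the same question.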