Twitter Depression Detection

Overview

Social media platforms such as Twitter, Instagram, and Facebook play a dominant role in day-to-day life, and their popularity increased significantly during the Covid-19 pandemic. Studies show that people have been more likely to share their feelings and emotions on Twitter since the pandemic began. Positive emotions are not commonly associated with higher life satisfaction; negative emotions, however, are more likely to express a person's true feelings.

Depression is one of the most common mental disorders, and it is more than just feeling sad. Signs of depression include a lack of interest in daily activities, significant weight loss or gain, insomnia or excessive sleeping, lack of energy, inability to concentrate, feelings of worthlessness or excessive guilt and, in severe cases, recurrent thoughts of death or suicide. Fortunately, depression is treatable. Treatment usually combines therapy and antidepressant medication.

Background and Motivation

Large volumes of data that can be retrieved from social media platforms such as Twitter can potentially provide valuable insights into human behaviour and emotions.
Twitter is one of the most common platforms for people to share their emotions and opinions. This content could be used to better understand their mental health and wellbeing, their everyday decision-making, and their perceptions of their quality of life.
Depression is a common mental disorder that can lead to suicide. More than 300 million people worldwide suffer from depression every year.

Goals

The goal of this project is to implement supervised machine learning techniques in order to detect tweets containing depressive characteristics.

Datasets

We need two types of datasets: one with tweets containing depressive characteristics, obtained from the Twitter API, and one with random tweets.

More than 20K tweets were mined using the Twitter API and the Tweepy library. The raw data retrieved from Twitter can be found here.
Random tweets were extracted from Kaggle datasets.

The processed dataset used for training the machine learning algorithms can be found here.
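
As a rough illustration of how the two sources might be combined into one labeled, balanced dataset, the sketch below uses only the Python standard library. The function name `build_balanced_dataset`, the label convention (1 for depressive, 0 for random), and the sample tweets are assumptions made for illustration; they are not taken from the project code.

```python
import random


def build_balanced_dataset(depressive, random_tweets, seed=42):
    """Return shuffled (text, label) pairs with equal counts per class.

    Label 1 marks tweets from the depressive set and 0 marks random
    tweets (an assumed convention).
    """
    rng = random.Random(seed)
    n = min(len(depressive), len(random_tweets))
    # Downsample the larger class so both classes contribute n examples.
    pos = [(text, 1) for text in rng.sample(depressive, n)]
    neg = [(text, 0) for text in rng.sample(random_tweets, n)]
    rows = pos + neg
    rng.shuffle(rows)
    return rows


# Made-up example tweets, for illustration only.
depressive = ["i feel so empty lately", "can't sleep, can't eat, nothing matters"]
random_tweets = ["great game last night", "new phone arrived today", "coffee time"]

dataset = build_balanced_dataset(depressive, random_tweets)
print(len(dataset))
```

Downsampling the larger class is only one way to balance the data; class weights or oversampling are alternatives.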

Data Science Pipeline:

Data Collection: Balanced dataset collected from the Twitter API and a Kaggle dataset.
Data Preprocessing: Data cleaning, exploration, processing, annotation, and analysis via NLP libraries.
EDA and Feature Selection: CountVectorizer, TF-IDF, spaCy word embedding model.
Model Selection: Logistic Regression, Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Decision Tree Classifier, Random Forest Classifier, Neural Network, LSTM.
Model Training: Scikit-Learn.
Inference: F1-score, confusion matrix, and ROC-AUC used to evaluate the models.
Model Deployment: Deployment on AWS or Heroku.
Data Product: Flask-based web application.
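
The feature extraction, training, and evaluation stages above can be sketched with scikit-learn as follows. This is a minimal sketch, not the project's actual training code: it picks one combination from the lists above (TF-IDF features with logistic regression), and the tiny corpus and labels are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus; 1 = depressive characteristics, 0 = random tweet.
train_texts = [
    "i feel hopeless and worthless",
    "i cannot sleep and i feel so tired and empty",
    "nothing matters anymore i feel so alone",
    "i feel guilty and sad all the time",
    "watching the football game with friends tonight",
    "just got a new phone and i love it",
    "great coffee and sunny weather this morning",
    "excited about the concert this weekend",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorizer and classifier are chained so raw text goes in directly.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

test_texts = ["i feel so empty and hopeless", "love this sunny weekend with friends"]
test_labels = [1, 0]
predicted = model.predict(test_texts)
# Probability of the positive class, needed for ROC-AUC.
scores = model.predict_proba(test_texts)[:, 1]

print("F1:", f1_score(test_labels, predicted))
print("Confusion matrix:\n", confusion_matrix(test_labels, predicted))
print("ROC-AUC:", roc_auc_score(test_labels, scores))
```

Swapping `LogisticRegression()` for another scikit-learn classifier leaves the rest of the sketch unchanged, which makes it straightforward to compare the candidate models on the same metrics.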

Usage

Clone this repository

git clone https://github.com/miladrezazadeh/twitter_depression_detection.git

Create a virtual environment

python3 -m venv env

Activate the virtual environment

source env/bin/activate

Install the required libraries

Use the package manager pip to install the required libraries.

pip install -r requirements.txt

Download en_core_web_lg from spaCy

python -m spacy download en_core_web_lg

Clean the dataset

python clean.py <file_name>
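
The contents of clean.py are not shown here. As an illustration of what a tweet-cleaning step commonly involves, the sketch below lowercases the text and strips URLs, mentions, hashtag symbols, and non-letter characters using only the standard library. The function name `clean_tweet` and the specific rules are assumptions, not the project's implementation.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, mentions, and non-letter characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")        # keep the hashtag word, drop the symbol
    text = text.replace("'", "")         # "can't" becomes "cant"
    text = NON_LETTER_RE.sub(" ", text)  # drops digits, punctuation, emoji
    return " ".join(text.split())        # collapse repeated whitespace


print(clean_tweet("@friend I can't sleep again... #insomnia https://t.co/abc123"))
# prints: i cant sleep again insomnia
```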

Train the best model

python train.py <file_name> <model_name>
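
How train.py resolves `<model_name>` is not documented here. One common pattern is a small registry that maps names to scikit-learn estimators, sketched below. Only the name "SVM" appears in this README; the other keys, the `MODEL_FACTORIES` dictionary, and the `make_model` helper are hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical name-to-estimator mapping; only "SVM" appears in this README.
MODEL_FACTORIES = {
    "LR": lambda: LogisticRegression(max_iter=1000),
    "SVM": lambda: SVC(probability=True),
    "KNN": lambda: KNeighborsClassifier(),
    "DT": lambda: DecisionTreeClassifier(random_state=0),
    "RF": lambda: RandomForestClassifier(random_state=0),
}


def make_model(name):
    """Return a fresh, unfitted estimator for the given model name."""
    try:
        return MODEL_FACTORIES[name.upper()]()
    except KeyError:
        choices = ", ".join(sorted(MODEL_FACTORIES))
        raise ValueError(f"Unknown model {name!r}; choose from: {choices}") from None


model = make_model("SVM")
print(type(model).__name__)
```

Using factories rather than shared instances means each call returns a new estimator, so one training run cannot leak fitted state into another.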

Predict

python predict.py <tweet.txt> SVM

Run the Flask Application

Start the Flask web server: python app.py
The server will start at http://127.0.0.1:5000 (if port 5000 is not already in use).
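
For readers unfamiliar with Flask, the sketch below shows the general shape of a prediction endpoint. It is not the project's app.py: the `/predict` route, the JSON field `tweet`, and the keyword-based `predict_label` placeholder (standing in for the trained model) are all assumptions made for illustration.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_label(text):
    """Placeholder for the trained model: a keyword rule, for illustration only."""
    keywords = ("hopeless", "worthless", "empty")
    return int(any(word in text.lower() for word in keywords))


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"tweet": "some text"}.
    payload = request.get_json(silent=True) or {}
    tweet = payload.get("tweet", "")
    if not tweet:
        return jsonify(error="Field 'tweet' is required."), 400
    return jsonify(tweet=tweet, depressive=predict_label(tweet))


# Exercise the route without starting a server.
with app.test_client() as client:
    response = client.post("/predict", json={"tweet": "I feel hopeless"})
    print(response.status_code, response.get_json())
```

The test client sends the request in-process, so the route can be checked without occupying port 5000.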

License

This repository is licensed under the MIT License.
