# Predicting Anxiety and Depression Based on Social Media Use Patterns

The relationship between social media use and mental health outcomes is not fully understood.
Given that social media is widespread among many different demographics, it is critical to understand this relationship and the effects social media has on mental health in order to properly treat mental disorders and improve quality of life for those suffering from them.

Because "mental health" is a broad and loosely defined concept, this project focuses on two of the most common mental health disorders: anxiety and depression. It uses several machine learning classifiers to examine the relationship between these disorders and individual social media use, and to build models capable of predicting depression and anxiety.

### The impact of mental illness

Mental illness impacts individuals, and by extension society, in a variety of ways.

According to the World Health Organization (WHO):
- Mental health conditions can cause difficulties in all aspects of life, including relationships with family, friends and community.
- In 2019, 970 million people globally were living with a mental disorder, with anxiety and depression the most common. This means approximately **1 in 8** people suffer from some form of mental illness.
- In 2019, 301 million people were living with an anxiety disorder including 58 million children and adolescents.
- In 2019, 280 million people were living with depression, including 23 million children and adolescents.

The high prevalence of both social media use and mental health disorders combined with the currently limited understanding of the way they interact makes further analysis of the subject crucial to public health.

# Project goals

Despite the prevalence of these disorders, detection and diagnosis still pose a significant challenge due to several factors:
- Diagnosis of anxiety and depression is made according to the self-reported feelings of the patient.

  This means that factors such as the individual's personal feelings about mental health, their willingness to accept help, and social stigma all play a part in the detection of these disorders in a way that is not present with physical conditions.
-  Disease comorbidity

   The existence of two or more mental health disorders in an individual is common, and those with one type of mental disorder often develop other types of mental disorders.
   Moreover, many disorders share similar symptoms, making it difficult to identify the primary condition.

The primary goal of this project is to create a predictive model for detection of anxiety and depression based on individual social media use patterns.
Through this, I hope to create another tool for individuals and health care professionals to use in the difficult task of mental health diagnosis.

The secondary goal of this project is to map the relationship between mental health and social media, and to identify healthy and unhealthy social media use patterns using machine learning tools. 

# Exploratory Data Analysis
### The Dataset

The dataset used in this project consists of 482 responses to a survey conducted on Bangladeshi citizens.
The first 8 questions are designed to capture the demographics and social media use patterns of the participants.
The last 12 questions are designed to collect various mental health indicators for the participants, and responses are given on a Likert scale (a low score of 1 indicates that the participant "strongly disagrees" with the statement, and a high score of 5 means the participant "strongly agrees").

The dataset is available at: https://docs.google.com/spreadsheets/d/1lWFIL7h0F7xtmJHNPJX7ttPkO4v9j3xQ2E9Qb1wjek4/edit?usp=sharing

### Importing and loading the data

In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [8]:
data = pd.read_csv('formresponses.csv')
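
Before preprocessing, it can help to sanity-check the loaded responses. The sketch below is an illustration and not part of the original notebook: `check_likert` is a hypothetical helper, and the toy frame with columns `Q13` and `Q18` stands in for the real survey data. For each indicator column it counts how many values are missing or fall outside the 1 to 5 Likert range.

```python
import pandas as pd

def check_likert(df, columns, low=1, high=5):
    """Count missing and out-of-range values per column (hypothetical helper)."""
    report = {}
    for col in columns:
        # Coerce to numbers so that non-numeric answers are counted as missing.
        values = pd.to_numeric(df[col], errors="coerce")
        missing = int(values.isna().sum())
        out_of_range = int(((values < low) | (values > high)).sum())
        report[col] = {"missing": missing, "out_of_range": out_of_range}
    return pd.DataFrame(report).T

# Toy stand-in for the survey responses (values are made up for illustration).
toy = pd.DataFrame({
    "Q13": [1, 5, 3, None],
    "Q18": [2, 6, 4, 5],
})
likert_report = check_likert(toy, ["Q13", "Q18"])
print(likert_report)
```

On the real data, the same helper could be called with the list of indicator column names once they are known.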

### Data Preprocessing

Since we know what each survey question measures, we will begin with manual feature selection (a simple form of dimensionality reduction) and remove the columns that are irrelevant to our analysis.

Let us examine our variables:

In [12]:
list(data.columns)

['Timestamp',
 '1. What is your age?',
 '2. Gender',
 '3. Relationship Status',
 '4. Occupation Status',
 '5. What type of organizations are you affiliated with?',
 '6. Do you use social media?',
 '7. What social media platforms do you commonly use?',
 '8. What is the average time you spend on social media every day?',
 '9. How often do you find yourself using Social media without a specific purpose?',
 '10. How often do you get distracted by Social media when you are busy doing something?',
 "11. Do you feel restless if you haven't used Social media in a while?",
 '12. On a scale of 1 to 5, how easily distracted are you?',
 '13. On a scale of 1 to 5, how much are you bothered by worries?',
 '14. Do you find it difficult to concentrate on things?',
 '15. On a scale of 1-5, how often do you compare yourself to other successful people through the use of social media?',
 '16. Following the previous question, how do you feel about these comparisons, generally speaking?',
 '17. How often do

Since the main focus of this project is an analysis related to depression and anxiety, we will remove all mental health indicators that are unrelated to these two conditions. The symptoms we require indicators for, and the questions that are relevant to them, are as follows:

<u>Depression:</u>
- a depressed mood (question 18)
- a loss of pleasure or interest in activities (question 19)
- poor concentration (questions 10, 12, 14)
- feelings of excessive guilt or low self-worth (question 15)
- hopelessness about the future (no relevant questions)
- thoughts about dying or suicide (no relevant questions)
- disrupted sleep (question 20)
- changes in appetite or weight (no relevant questions)
- feeling very tired or low in energy (question 18)

<u>Anxiety:</u>
- excessive fear or worry about a specific situation or a broad range of everyday situations (question 13)
- poor concentration (questions 10, 12, 14)
- feeling irritable, tense or restless (question 11)
- experiencing nausea or abdominal distress (no relevant questions)
- having heart palpitations (no relevant questions)
- sweating, trembling or shaking (no relevant questions)
- disrupted sleep (question 20)
- having a sense of impending danger, panic or doom (question 13)

(symptoms for both disorders are according to the WHO: https://www.who.int/news-room/fact-sheets/detail/depression, https://www.who.int/news-room/fact-sheets/detail/anxiety-disorders)
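
As a rough sketch, the symptom-to-question mapping above can be recorded in two dictionaries. This is only an illustration: the variable names and the helper `questions_used` are assumptions, and the symptom labels are shortened. Writing the mapping down this way makes it easy to list which indicator questions (9 to 20) are not tied to any symptom.

```python
# Symptom -> survey question numbers, transcribed from the two lists above.
depression_questions = {
    "depressed mood": [18],
    "loss of pleasure or interest": [19],
    "poor concentration": [10, 12, 14],
    "guilt or low self-worth": [15],
    "disrupted sleep": [20],
    "tired or low in energy": [18],
}
anxiety_questions = {
    "excessive fear or worry": [13],
    "poor concentration": [10, 12, 14],
    "irritable, tense or restless": [11],
    "disrupted sleep": [20],
    "sense of impending danger": [13],
}

def questions_used(mapping):
    """Collect every question number referenced by a symptom mapping."""
    return {q for qs in mapping.values() for q in qs}

# Questions 9 to 20 are the twelve mental health indicators in the survey.
indicator_questions = set(range(9, 21))
used = questions_used(depression_questions) | questions_used(anxiety_questions)
unused = sorted(indicator_questions - used)
print(unused)  # [9, 16, 17]
```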

We can see that all the questions that are mental health indicators are relevant to our analysis except questions 9, 16 and 17.
In addition, the first column holds a timestamp for when the participant took the survey, which we don't need.
So, we will remove these columns from our dataframe.

In [14]:
# drop the timestamp and the indicators unrelated to depression or anxiety (`columns=` already implies axis=1)
data = data.drop(columns=['Timestamp', '9. How often do you find yourself using Social media without a specific purpose?',
                          '16. Following the previous question, how do you feel about these comparisons, generally speaking?',
                          '17. How often do you look to seek validation from features of social media?'])

We will also rename the columns for the sake of simplicity:

In [16]:
data.rename(columns = {'1. What is your age?':'Age','2. Gender':'Sex','3. Relationship Status':'Relationship Status',
                       '4. Occupation Status':'Occupation',
                       '5. What type of organizations are you affiliated with?':'Affiliations',
                       '6. Do you use social media?':'Social Media User?',
                       '7. What social media platforms do you commonly use?':'Platforms Used',
                       '8. What is the average time you spend on social media every day?':'Time Spent',
                       '10. How often do you get distracted by Social media when you are busy doing something?':'Distracted by SM',
                       "11. Do you feel restless if you haven't used Social media in a while?":'Restlessness',
                       '12. On a scale of 1 to 5, how easily distracted are you?':'Easily Distracted',
                       '13. On a scale of 1 to 5, how much are you bothered by worries?':'Anxious',
                       '14. Do you find it difficult to concentrate on things?':'Difficulty Concentrating',
                       '15. On a scale of 1-5, how often do you compare yourself to other successful people through the use of social media?':'SM Comparison',
                       '18. How often do you feel depressed or down?':'Depressed',
                       '19. On a scale of 1 to 5, how frequently does your interest in daily activities fluctuate?':'Loss of Interest in Activities',
                       '20. On a scale of 1 to 5, how often do you face issues regarding sleep?':'Sleep Issues' },inplace=True)

### Depression and Anxiety scores
In order to train machine learning models on our data, we must first quantify the likelihood of depression and anxiety in our participants. Since the answers for each indicator are on a scale of 1 to 5, we will do this by simply summing the answers related to each condition and adding the results as the columns "Depression score" and "Anxiety score". We will also double the weight of questions 13 and 18, since they capture the primary symptom of each condition and are therefore the most direct indicators of anxiety and depression, respectively.

In [35]:
# create lists of indicator columns for each condition, following the WHO symptoms specified above

Depression = ['Depressed', 'Loss of Interest in Activities', 'Distracted by SM','Easily Distracted', 'Difficulty Concentrating', 'SM Comparison', 'Sleep Issues']
Anxiety = ['Anxious', 'Restlessness', 'Distracted by SM','Easily Distracted', 'Difficulty Concentrating', 'Sleep Issues']

# sum up scores for depression and anxiety
data['Depression score'] = data[Depression].sum(axis=1)
data['Anxiety score'] = data[Anxiety].sum(axis=1)

# double the weight of questions 13 and 18 by adding them to the score columns again.
data['Depression score'] += data['Depressed']
data['Anxiety score'] += data['Anxious']

data.head(5)

| | Age | Sex | Relationship Status | Occupation | Affiliations | Social Media User? | Platforms Used | Time Spent | Distracted by SM | Restlessness | Easily Distracted | Anxious | Difficulty Concentrating | SM Comparison | Depressed | Loss of Interest in Activities | Sleep Issues | Depression score | Anxiety score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21.0 | Male | In a relationship | University Student | University | Yes | Facebook, Twitter, Instagram, YouTube, Discord... | Between 2 and 3 hours | 3 | 2 | 5 | 2 | 5 | 2 | 5 | 4 | 5 | 34 | 24 |
| 1 | 21.0 | Female | Single | University Student | University | Yes | Facebook, Twitter, Instagram, YouTube, Discord... | More than 5 hours | 3 | 2 | 4 | 5 | 4 | 5 | 5 | 4 | 5 | 35 | 28 |
| 2 | 21.0 | Female | Single | University Student | University | Yes | Facebook, Instagram, YouTube, Pinterest | Between 3 and 4 hours | 2 | 1 | 2 | 5 | 4 | 3 | 4 | 2 | 5 | 26 | 24 |
| 3 | 21.0 | Female | Single | University Student | University | Yes | Facebook, Instagram | More than 5 hours | 2 | 1 | 3 | 5 | 3 | 5 | 4 | 3 | 2 | 26 | 21 |
| 4 | 21.0 | Female | Single | University Student | University | Yes | Facebook, Instagram, YouTube | Between 2 and 3 hours | 5 | 4 | 4 | 5 | 5 | 3 | 4 | 4 | 1 | 30 | 29 |
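
As a check on the scoring rule, the sketch below recomputes both scores for the first participant in the output above. `weighted_score` is a hypothetical helper, not part of the original notebook; it should be equivalent to summing the indicators and then adding the primary indicator once more, as done in the cell above.

```python
import pandas as pd

depression_cols = ['Depressed', 'Loss of Interest in Activities', 'Distracted by SM',
                   'Easily Distracted', 'Difficulty Concentrating', 'SM Comparison', 'Sleep Issues']
anxiety_cols = ['Anxious', 'Restlessness', 'Distracted by SM',
                'Easily Distracted', 'Difficulty Concentrating', 'Sleep Issues']

def weighted_score(df, indicators, primary, weight=2):
    """Sum the indicator columns, counting the primary indicator `weight` times."""
    weights = pd.Series(1, index=indicators)
    weights.loc[primary] = weight
    # Multiply each column by its weight, then sum across columns for each row.
    return df[indicators].mul(weights, axis=1).sum(axis=1)

# Indicator answers of participant 0, copied from the output above.
first = pd.DataFrame([{
    'Distracted by SM': 3, 'Restlessness': 2, 'Easily Distracted': 5, 'Anxious': 2,
    'Difficulty Concentrating': 5, 'SM Comparison': 2, 'Depressed': 5,
    'Loss of Interest in Activities': 4, 'Sleep Issues': 5,
}])

dep = weighted_score(first, depression_cols, primary='Depressed')
anx = weighted_score(first, anxiety_cols, primary='Anxious')
print(int(dep.iloc[0]), int(anx.iloc[0]))  # 34 24
```

Keeping the weight as a parameter makes it straightforward to test how sensitive later results are to the choice of doubling the primary indicator.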


Now that our participants' levels of depression and anxiety are quantified, we will begin visualizing the data.