# Exploring the Relationship between Social Media and Mental Well-being 

The study consists of 7 variables and 12 Likert-scale questions designed to measure the frequency or intensity of various aspects of mental health. Responses range from 1 (low frequency or intensity) to 5 (high frequency or intensity). This project investigates the potential correlation between the amount of time an individual spends on social media and their mental health.

## Data Description

**Variables**

1. Age
2. Gender
3. Relationship Status
4. Occupation Status
5. Affiliated Organizations
6. Social Media Platforms Used
7. Time spent - social media use, in hours
8. Questions

**Below are the questions from the questionnaire used to gauge the intensity of participants' mental health symptoms:**

1. Purposeless use of Social Media [ADHD] - Question 9
2. Distracted by Social Media [ADHD] - Question 10
3. Restlessness if Social Media not used [Anxiety] - Question 11
4. Ease of Distraction [ADHD] - Question 12
5. Bothered by worries [Anxiety] - Question 13
6. Difficulty in concentrating [ADHD] - Question 14
7. Comparison of self to peers [Self Esteem] - Question 15
8. Feelings about above comparison [Self Esteem] - Question 16
9. Validation sought from Social Media [Self Esteem] - Question 17
10. Feelings of Depression [Depression] - Question 18
11. Fluctuation of interest [Depression] - Question 19
12. Sleep Issues [Depression] - Question 20
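
As a small sketch of the grouping listed above, the question-to-category mapping can be written as a lookup table and tallied. The name `QUESTION_CATEGORY` is introduced here for illustration only and is not part of the notebook.

```python
from collections import Counter

# Question number -> symptom category, copied from the list above.
# QUESTION_CATEGORY is a hypothetical name used only in this sketch.
QUESTION_CATEGORY = {
    9: "ADHD", 10: "ADHD", 11: "Anxiety", 12: "ADHD",
    13: "Anxiety", 14: "ADHD", 15: "Self Esteem", 16: "Self Esteem",
    17: "Self Esteem", 18: "Depression", 19: "Depression", 20: "Depression",
}

# Count how many questions feed each category.
questions_per_category = Counter(QUESTION_CATEGORY.values())
print(questions_per_category)
```

The tally gives 4 ADHD questions, 2 Anxiety questions, and 3 each for Self Esteem and Depression, so the categories do not carry equal weight in any summed score.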

# Importing libraries and loading datasets

In [27]:
#import necessary libraries
import pandas as pd

In [28]:
#loading project dataset (raw string so the backslash in the Windows path is not read as an escape sequence)
data=pd.read_csv(r"C:\Questionnaire  (Responses) - Form Responses 1.csv")



In [29]:
# display all columns
pd.set_option("display.max_columns",None)

In [30]:
#inspect first few rows
data.head()

Unnamed: 0,Timestamp,1. What is your age?,2. Gender,3. Relationship Status,4. Occupation Status,5. What type of organizations are you affiliated with?,6. Do you use social media?,7. What social media platforms do you commonly use?,8. What is the average time you spend on social media every day?,9. How often do you find yourself using Social media without a specific purpose?,10. How often do you get distracted by Social media when you are busy doing something?,11. Do you feel restless if you haven't used Social media in a while?,"12. On a scale of 1 to 5, how easily distracted are you?","13. On a scale of 1 to 5, how much are you bothered by worries?",14. Do you find it difficult to concentrate on things?,"15. On a scale of 1-5, how often do you compare yourself to other successful people through the use of social media?","16. Following the previous question, how do you feel about these comparisons, generally speaking?",17. How often do you look to seek validation from features of social media?,18. How often do you feel depressed or down?,"19. On a scale of 1 to 5, how frequently does your interest in daily activities fluctuate?","20. On a scale of 1 to 5, how often do you face issues regarding sleep?"
0,4/18/2022 19:18:47,21.0,Male,In a relationship,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",Between 2 and 3 hours,5,3,2,5,2,5,2,3,2,5,4,5
1,4/18/2022 19:19:28,21.0,Female,Single,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",More than 5 hours,4,3,2,4,5,4,5,1,1,5,4,5
2,4/18/2022 19:25:59,21.0,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube, Pinterest",Between 3 and 4 hours,3,2,1,2,5,4,3,3,1,4,2,5
3,4/18/2022 19:29:43,21.0,Female,Single,University Student,University,Yes,"Facebook, Instagram",More than 5 hours,4,2,1,3,5,3,5,1,2,4,3,2
4,4/18/2022 19:33:31,21.0,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube",Between 2 and 3 hours,3,5,4,4,5,5,3,3,3,4,4,1


In [31]:
data.shape

(481, 21)

# Data Cleaning and Preprocessing

We start by renaming the columns to shorter labels that are easier to work with.

In [32]:
data.rename(columns = {'1. What is your age?':'Age','2. Gender':'Sex','3. Relationship Status':'Relationship Status',
                       '4. Occupation Status':'Occupation',
                       '5. What type of organizations are you affiliated with?':'Affiliations',
                       '6. Do you use social media?':'Social Media User?',
                       '7. What social media platforms do you commonly use?':'Platforms Used',
                       '8. What is the average time you spend on social media every day?':'Time Spent',
                       '9. How often do you find yourself using Social media without a specific purpose?':'ADHD Q1',
                       '10. How often do you get distracted by Social media when you are busy doing something?':'ADHD Q2',
                       "11. Do you feel restless if you haven't used Social media in a while?":'Anxiety Q1',
                       '12. On a scale of 1 to 5, how easily distracted are you?':'ADHD Q3',
                       '13. On a scale of 1 to 5, how much are you bothered by worries?':'Anxiety Q2',
                       '14. Do you find it difficult to concentrate on things?':'ADHD Q4',
                       '15. On a scale of 1-5, how often do you compare yourself to other successful people through the use of social media?':'Self Esteem Q1',
                       '16. Following the previous question, how do you feel about these comparisons, generally speaking?':'Self Esteem Q2',
                       '17. How often do you look to seek validation from features of social media?':'Self Esteem Q3',
                       '18. How often do you feel depressed or down?':'Depression Q1',
                       '19. On a scale of 1 to 5, how frequently does your interest in daily activities fluctuate?':'Depression Q2',
                       '20. On a scale of 1 to 5, how often do you face issues regarding sleep?':'Depression Q3' },inplace=True)

Columns are then rearranged so that questions from the same category sit next to each other.

In [33]:
column_order = [
    'Timestamp',
    'Age',
    'Sex',
    'Relationship Status',
    'Occupation',
    'Affiliations',
    'Social Media User?',
    'Platforms Used',
    'Time Spent',
    'ADHD Q1',
    'ADHD Q2',
    'ADHD Q3',
    'ADHD Q4',
    'Anxiety Q1',
    'Anxiety Q2',
    'Self Esteem Q1',
    'Self Esteem Q2',
    'Self Esteem Q3',
    'Depression Q1',
    'Depression Q2',
    'Depression Q3'
]

In [34]:
data = data[column_order]

Checking for null values or duplicates

In [35]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 481 entries, 0 to 480
Data columns (total 21 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Timestamp            481 non-null    object 
 1   Age                  481 non-null    float64
 2   Sex                  481 non-null    object 
 3   Relationship Status  481 non-null    object 
 4   Occupation           481 non-null    object 
 5   Affiliations         451 non-null    object 
 6   Social Media User?   481 non-null    object 
 7   Platforms Used       481 non-null    object 
 8   Time Spent           481 non-null    object 
 9   ADHD Q1              481 non-null    int64  
 10  ADHD Q2              481 non-null    int64  
 11  ADHD Q3              481 non-null    int64  
 12  ADHD Q4              481 non-null    int64  
 13  Anxiety Q1           481 non-null    int64  
 14  Anxiety Q2           481 non-null    int64  
 15  Self Esteem Q1       481 non-null    int

The 'Affiliations' column has 451 non-null records, fewer than the 481 rows in the dataset. It is assumed that the missing values indicate individuals who are not affiliated with any organization.
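
A minimal sketch of how the missing entries could be counted and, if desired, labelled explicitly. The toy frame and the `"None"` label are assumptions made for illustration; the notebook itself leaves the missing values as they are.

```python
import pandas as pd

# Toy stand-in for the real column; two of the four entries are missing.
toy = pd.DataFrame({"Affiliations": ["University", None, "Company", None]})

# Count missing values per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Optional: make the "no affiliation" assumption explicit with a label.
# The label "None" is a hypothetical choice, not taken from the notebook.
toy["Affiliations"] = toy["Affiliations"].fillna("None")
```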

In [36]:
data.duplicated().sum()

0

There are no duplicates to be found.

# Data Transformation


### Age Group

In [37]:
float_entries = data[data['Age'] % 1 != 0]['Age']
float_entries

382    26.7
Name: Age, dtype: float64

Because of this single fractional entry, pandas stores the 'Age' column as float64.
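
When a float column is cast to an integer type, the way the fractional part is handled matters. The sketch below uses toy ages (26.7 mirrors the entry found above) to contrast a plain cast with rounding first; which behavior is preferable for this dataset is a judgment call.

```python
import pandas as pd

# Toy ages; 26.7 mirrors the single fractional entry in the dataset.
ages = pd.Series([21.0, 26.7, 35.0])

# A plain cast drops the fractional part.
truncated = ages.astype("int64")

# Rounding before the cast moves to the nearest whole number instead.
rounded = ages.round().astype("int64")

print(truncated.tolist(), rounded.tolist())
```

With a plain cast, 26.7 becomes 26; with rounding first, it becomes 27.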

In [38]:
#convert Age column from float64 to int64 (the cast truncates, so 26.7 becomes 26)
data['Age'] = data['Age'].astype('int64')

In [39]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 481 entries, 0 to 480
Data columns (total 21 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   Timestamp            481 non-null    object
 1   Age                  481 non-null    int64 
 2   Sex                  481 non-null    object
 3   Relationship Status  481 non-null    object
 4   Occupation           481 non-null    object
 5   Affiliations         451 non-null    object
 6   Social Media User?   481 non-null    object
 7   Platforms Used       481 non-null    object
 8   Time Spent           481 non-null    object
 9   ADHD Q1              481 non-null    int64 
 10  ADHD Q2              481 non-null    int64 
 11  ADHD Q3              481 non-null    int64 
 12  ADHD Q4              481 non-null    int64 
 13  Anxiety Q1           481 non-null    int64 
 14  Anxiety Q2           481 non-null    int64 
 15  Self Esteem Q1       481 non-null    int64 
 16  Self Est

### Sex Group

In [40]:
data['Sex'].unique()

array(['Male', 'Female', 'Nonbinary ', 'Non-binary', 'NB', 'unsure ',
       'Trans', 'Non binary ', 'There are others???'], dtype=object)

Individuals who answered "There are others???" are deemed to have not taken the questionnaire seriously. As a result, all entries with this response will be excluded.

In [41]:
# Drop entries with "There are others???" in the 'Sex' column (copy so later edits do not act on a view)
data = data[data['Sex'] != 'There are others???'].copy()

The remaining entries in the 'Sex' column other than 'Male' and 'Female' are grouped into an 'Others' category.

In [42]:
#Grouping unique entries as 'Others' category (restricted to the 'Sex' column so no other column is touched)
other_labels = ['Non-binary', 'Nonbinary ', 'NB', 'unsure ', 'Non binary ', 'Trans']
data['Sex'] = data['Sex'].replace(other_labels, 'Others')

In [43]:
data['Sex'].unique()

array(['Male', 'Female', 'Others'], dtype=object)

All unique entries that fall under the 'Others' category are successfully grouped.
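
An alternative sketch of the same grouping that does not require listing every spelling: strip stray whitespace, then keep only the two labels that are not grouped. This is an illustration on a toy series built from the unique values shown above, not the notebook's own method.

```python
import pandas as pd

# Toy series built from the unique labels printed earlier.
sex = pd.Series(["Male", "Female", "Nonbinary ", "Non-binary",
                 "NB", "unsure ", "Trans", "Non binary "])

# Remove leading/trailing whitespace so 'unsure ' and 'unsure' compare equal.
cleaned = sex.str.strip()

# Keep 'Male' and 'Female'; everything else becomes 'Others'.
grouped = cleaned.where(cleaned.isin(["Male", "Female"]), "Others")
print(grouped.unique())
```

This approach also catches spellings that were not seen in advance, at the cost of silently grouping any unexpected value.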

### Scoring Adjustment

The scoring of the 'Self Esteem Q2' column needs a slight adjustment. The question was:

"Following the previous question, how do you feel about these comparisons, generally speaking?".

Given the nature of this question, the response scale is assumed to be as follows:

Very Negative - 1

Slightly Negative - 2

Neutral - 3

Slightly Positive - 4

Very Positive - 5

In this project, we are only exploring the negative aspects of mental health. Hence, Neutral and Positive answers are not relevant, and they will be converted to 0.

In [44]:
#setting scores of 3,4 and 5 to 0.
data['Self Esteem Q2'] = data['Self Esteem Q2'].replace([3, 4, 5], 0)

In [45]:
data.head()

Unnamed: 0,Timestamp,Age,Sex,Relationship Status,Occupation,Affiliations,Social Media User?,Platforms Used,Time Spent,ADHD Q1,ADHD Q2,ADHD Q3,ADHD Q4,Anxiety Q1,Anxiety Q2,Self Esteem Q1,Self Esteem Q2,Self Esteem Q3,Depression Q1,Depression Q2,Depression Q3
0,4/18/2022 19:18:47,21,Male,In a relationship,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",Between 2 and 3 hours,5,3,5,5,2,2,2,0,2,5,4,5
1,4/18/2022 19:19:28,21,Female,Single,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",More than 5 hours,4,3,4,4,2,5,5,1,1,5,4,5
2,4/18/2022 19:25:59,21,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube, Pinterest",Between 3 and 4 hours,3,2,2,4,1,5,3,0,1,4,2,5
3,4/18/2022 19:29:43,21,Female,Single,University Student,University,Yes,"Facebook, Instagram",More than 5 hours,4,2,3,3,1,5,5,1,2,4,3,2
4,4/18/2022 19:33:31,21,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube",Between 2 and 3 hours,3,5,4,5,4,5,3,0,3,4,4,1


The values in 'Self Esteem Q2' have been adjusted as intended.
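
Rather than checking the adjustment by eye, the set of remaining values can be tested directly. The sketch below runs the same `replace` on a toy series covering every possible answer.

```python
import pandas as pd

# One toy answer for each point on the 1-5 scale.
raw = pd.Series([1, 2, 3, 4, 5])

# Same adjustment as in the notebook: neutral and positive answers become 0.
adjusted = raw.replace([3, 4, 5], 0)

# After the adjustment only 0, 1 and 2 should remain.
only_expected_values = set(adjusted.unique()) <= {0, 1, 2}
print(adjusted.tolist(), only_expected_values)
```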

### Summation of scores for different aspects of mental wellbeing

The questionnaire measures 4 aspects of mental wellbeing:

1. Attention Deficit Hyperactivity Disorder (ADHD)
2. Anxiety
3. Self Esteem
4. Depression
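
As a worked example of how the category scores relate to the individual answers, the sketch below sums the first respondent's adjusted answers (copied from the table above) by category prefix. Selecting columns with `filter(like=...)` is a choice made for this illustration.

```python
import pandas as pd

# First respondent's answers after the 'Self Esteem Q2' adjustment.
row = pd.DataFrame([{
    "ADHD Q1": 5, "ADHD Q2": 3, "ADHD Q3": 5, "ADHD Q4": 5,
    "Anxiety Q1": 2, "Anxiety Q2": 2,
    "Self Esteem Q1": 2, "Self Esteem Q2": 0, "Self Esteem Q3": 2,
    "Depression Q1": 5, "Depression Q2": 4, "Depression Q3": 5,
}])

# Each category score is the row-wise sum of the columns sharing its prefix.
categories = ["ADHD", "Anxiety", "Self Esteem", "Depression"]
scores = {c: int(row.filter(like=c + " Q").sum(axis=1).iloc[0]) for c in categories}

# The total is the sum of the four category scores.
total = sum(scores.values())
print(scores, total)
```

For this respondent the category scores are 18, 4, 4 and 14, which add up to 40.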

In [47]:
# Group columns by categories
adhd_columns = ['ADHD Q1', 'ADHD Q2', 'ADHD Q3', 'ADHD Q4']
anxiety_columns = ['Anxiety Q1', 'Anxiety Q2']
self_esteem_columns = ['Self Esteem Q1', 'Self Esteem Q2', 'Self Esteem Q3']
depression_columns = ['Depression Q1', 'Depression Q2', 'Depression Q3']

# Sum the scores for each group
data['ADHD Score'] = data[adhd_columns].sum(axis=1)
data['Anxiety Score'] = data[anxiety_columns].sum(axis=1)
data['Self Esteem Score'] = data[self_esteem_columns].sum(axis=1)
data['Depression Score'] = data[depression_columns].sum(axis=1)

# Create a 'Total Score' column
data['Total Score'] = data[['ADHD Score', 'Anxiety Score', 'Self Esteem Score', 'Depression Score']].sum(axis=1)


In [48]:
data.head()

Unnamed: 0,Timestamp,Age,Sex,Relationship Status,Occupation,Affiliations,Social Media User?,Platforms Used,Time Spent,ADHD Q1,ADHD Q2,ADHD Q3,ADHD Q4,Anxiety Q1,Anxiety Q2,Self Esteem Q1,Self Esteem Q2,Self Esteem Q3,Depression Q1,Depression Q2,Depression Q3,ADHD Score,Anxiety Score,Self Esteem Score,Depression Score,Total Score
0,4/18/2022 19:18:47,21,Male,In a relationship,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",Between 2 and 3 hours,5,3,5,5,2,2,2,0,2,5,4,5,18,4,4,14,40
1,4/18/2022 19:19:28,21,Female,Single,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",More than 5 hours,4,3,4,4,2,5,5,1,1,5,4,5,15,7,7,14,43
2,4/18/2022 19:25:59,21,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube, Pinterest",Between 3 and 4 hours,3,2,2,4,1,5,3,0,1,4,2,5,11,6,4,11,32
3,4/18/2022 19:29:43,21,Female,Single,University Student,University,Yes,"Facebook, Instagram",More than 5 hours,4,2,3,3,1,5,5,1,2,4,3,2,12,6,8,9,35
4,4/18/2022 19:33:31,21,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube",Between 2 and 3 hours,3,5,4,5,4,5,3,0,3,4,4,1,17,9,6,9,41


In [49]:
# drop the 12 individual question columns (positions 9 to 20), then the timestamp
data.drop(columns=data.columns[9:21], inplace=True)
data.drop(columns=['Timestamp'], inplace=True)

In [56]:
data.head()

Unnamed: 0,Age,Sex,Relationship Status,Occupation,Affiliations,Social Media User?,Platforms Used,Time Spent,ADHD Score,Anxiety Score,Self Esteem Score,Depression Score,Total Score
0,21,Male,In a relationship,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",Between 2 and 3 hours,18,4,4,14,40
1,21,Female,Single,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",More than 5 hours,15,7,7,14,43
2,21,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube, Pinterest",Between 3 and 4 hours,11,6,4,11,32
3,21,Female,Single,University Student,University,Yes,"Facebook, Instagram",More than 5 hours,12,6,8,9,35
4,21,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube",Between 2 and 3 hours,17,9,6,9,41


In [58]:
data['Total Score'].max()

55

The Total Score column indicates the extent to which an individual experiences negative mental health symptoms. The highest total score observed in this dataset is 55; because 'Self Esteem Q2' can contribute at most 2 after the adjustment, the highest possible score is 57. A score near this maximum signifies that the individual is definitely experiencing negative symptoms in some aspect of mental health.
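
Note that 55 is the largest value returned by `max()` on this sample. As a quick check on the scale's ceiling, the per-question maximums can be added up, assuming every question tops out at 5 except 'Self Esteem Q2', whose remaining non-zero values after the adjustment are 1 and 2.

```python
# Highest attainable value per question after the 'Self Esteem Q2' adjustment.
# Every question is scored 1-5 except 'Self Esteem Q2', which now tops out at 2.
max_per_category = {
    "ADHD": 4 * 5,             # four questions
    "Anxiety": 2 * 5,          # two questions
    "Self Esteem": 5 + 2 + 5,  # Q1, adjusted Q2, Q3
    "Depression": 3 * 5,       # three questions
}

theoretical_max = sum(max_per_category.values())
print(max_per_category, theoretical_max)
```

The per-category ceilings (20, 10, 12, 15) match the `max` row of the summary statistics shown later, and they add up to 57.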

### Adding an 'Outcome' column



A participant scoring 3 out of 5 on every question (12 questions, scoring 3 on each except for a score of 2 on self-esteem question #2, totaling 35) suggests slight to moderate symptoms in all aspects of mental health, but these may not be severe or frequent.
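
The baseline figure in the paragraph above can be reproduced with a line of arithmetic. The variable names are introduced here for illustration.

```python
# Eleven questions answered with 3, plus a 2 on 'Self Esteem Q2'.
n_questions = 12
baseline = 3 * (n_questions - 1) + 2
print(baseline)
```

This gives 35, which coincides with the mean total score (about 35.1) reported in the summary statistics further down.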

Thus, we set a threshold score of 42, at or above which an individual is very likely suffering severely and frequently from some symptoms, prompting a strong recommendation for a mental health checkup.

An Outcome of 0 means the individual is not confirmed to be experiencing severe mental health symptoms, so a checkup is not deemed necessary.

An Outcome of 1 indicates that the individual is very likely experiencing severe negative mental health symptoms, and a checkup is recommended.
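
To make the boundary behavior of the rule explicit, here is a vectorized sketch on toy scores around the cut-off. It treats a score of exactly 42 as Outcome 1, which matches the `>=` comparison used in the notebook cell; `THRESHOLD` is a name introduced for this example.

```python
import pandas as pd

THRESHOLD = 42  # hypothetical constant name; the value comes from the text above

# Toy total scores chosen to sit on both sides of the threshold.
total_scores = pd.Series([35, 41, 42, 43, 55])

# Boolean comparison cast to 0/1 gives the Outcome flag without a Python-level loop.
outcome = (total_scores >= THRESHOLD).astype(int)
print(outcome.tolist())
```

Scores of 35 and 41 map to 0, while 42, 43 and 55 map to 1.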

In [64]:
# Create the 'Outcome' column
data['Outcome'] = data['Total Score'].apply(lambda x: 1 if x >= 42 else 0)

In [65]:
data.head()

Unnamed: 0,Age,Sex,Relationship Status,Occupation,Affiliations,Social Media User?,Platforms Used,Time Spent,ADHD Score,Anxiety Score,Self Esteem Score,Depression Score,Total Score,Outcome
0,21,Male,In a relationship,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",Between 2 and 3 hours,18,4,4,14,40,0
1,21,Female,Single,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",More than 5 hours,15,7,7,14,43,1
2,21,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube, Pinterest",Between 3 and 4 hours,11,6,4,11,32,0
3,21,Female,Single,University Student,University,Yes,"Facebook, Instagram",More than 5 hours,12,6,8,9,35,0
4,21,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube",Between 2 and 3 hours,17,9,6,9,41,0


In [66]:
data.describe()

Unnamed: 0,Age,ADHD Score,Anxiety Score,Self Esteem Score,Depression Score,Total Score,Outcome
count,480.0,480.0,480.0,480.0,480.0,480.0,480.0
mean,26.14375,13.460417,6.147917,5.854167,9.639583,35.102083,0.266667
std,9.923621,3.898302,2.08709,2.464695,3.104528,9.262581,0.442678
min,13.0,4.0,2.0,2.0,3.0,12.0,0.0
25%,21.0,11.0,5.0,4.0,7.75,29.0,0.0
50%,22.0,14.0,6.0,6.0,10.0,36.0,0.0
75%,26.0,16.0,8.0,8.0,12.0,42.0,1.0
max,91.0,20.0,10.0,12.0,15.0,55.0,1.0


In [69]:
#check final shape of dataset
data.shape

(480, 14)

In [70]:
#save final dataset to a new CSV file
data.to_csv('sm_final.csv',index=False)

In [73]:
data['Occupation'].value_counts(normalize=True)*100

Occupation
University Student    60.833333
Salaried Worker       27.291667
School Student        10.208333
Retired                1.666667
Name: proportion, dtype: float64

University students make up about 61% of the sample, while retired individuals make up less than 2%.
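
The percentages above can be converted back into head counts using the 480 rows of the final dataset. This is a sketch of the arithmetic only; the dictionary simply restates the printed output.

```python
# Shares (in percent) as printed by value_counts(normalize=True) * 100.
shares = {
    "University Student": 60.833333,
    "Salaried Worker": 27.291667,
    "School Student": 10.208333,
    "Retired": 1.666667,
}

n_rows = 480  # rows in the final dataset

# Convert each percentage back to a whole number of respondents.
counts = {group: round(n_rows * pct / 100) for group, pct in shares.items()}
print(counts, sum(counts.values()))
```

This works out to 292 university students, 131 salaried workers, 49 school students and 8 retired respondents, so any conclusion about the retired group rests on very few people.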