# A Data-Driven Study on Social Media Usage and its Effects on Users

## Introduction

In this study, I use a publicly available Kaggle dataset to analyze how social media usage relates to productivity, burnout, and stress.

The first step of this study is a general Exploratory Data Analysis (EDA) of the dataset. Specific areas of interest, identified from an initial look at the data, include:

1. Understanding how social media usage and burnout are related.
2. Understanding how social media usage and changes in stress levels are related.
3. Understanding how social media usage and changes in productivity are related.
4. Understanding how social media usage before bed affects sleep quality.
5. Understanding which age groups are more prone to the effects of social media.
6. Understanding which gender is more prone to the effects of social media.
7. Understanding which social media platforms impact users the most.
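
As a rough starting point for questions 1 to 3, the relationship between usage and each outcome could be summarized with a correlation coefficient. The sketch below is only an illustration: the column names come from the dataset, but the helper `usage_correlations`, the choice of Pearson correlation, and the toy rows are my own assumptions and not real results.

```python
import pandas as pd

# Outcome columns from the dataset that relate to questions 1 to 3.
OUTCOMES = [
    "days_feeling_burnout_per_month",
    "stress_level",
    "actual_productivity_score",
]


def usage_correlations(
    df: pd.DataFrame, usage_col: str = "daily_social_media_time"
) -> pd.Series:
    """Pearson correlation between social media time and each outcome.

    Hypothetical helper. DataFrame.corr drops missing values pairwise,
    so rows with a NaN in one column still count for the other columns.
    """
    corr = df[[usage_col, *OUTCOMES]].corr()
    return corr.loc[OUTCOMES, usage_col]


# Toy rows for illustration only; these are NOT taken from the dataset.
toy = pd.DataFrame(
    {
        "daily_social_media_time": [1.0, 2.0, 3.0, 4.0, 5.0],
        "days_feeling_burnout_per_month": [2, 5, 9, 14, 20],
        "stress_level": [3.0, 4.0, None, 7.0, 8.0],
        "actual_productivity_score": [8.0, 7.0, 6.5, 5.0, 3.0],
    }
)

result = usage_correlations(toy)
print(result)
```

A correlation only describes a linear association; it does not show that social media usage causes any of these outcomes.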

Based on the insights from the analysis, I will determine which points are relevant to the creation of a machine learning model that predicts user outcomes and behaviors associated with social media usage.

The dataset used for this study is publicly available on Kaggle: https://www.kaggle.com/datasets/mahdimashayekhi/social-media-vs-productivity?resource=download

## Exploratory Data Analysis

In [2]:
import pandas as pd
import numpy as np

In [7]:
df = pd.read_csv("social_media_vs_productivity.csv")
df

index,age,gender,job_type,daily_social_media_time,social_platform_preference,number_of_notifications,work_hours_per_day,perceived_productivity_score,actual_productivity_score,stress_level,sleep_hours,screen_time_before_sleep,breaks_during_work,uses_focus_apps,has_digital_wellbeing_enabled,coffee_consumption_per_day,days_feeling_burnout_per_month,weekly_offline_hours,job_satisfaction_score
0,56,Male,Unemployed,4.180940,Facebook,61,6.753558,8.040464,7.291555,4.0,5.116546,0.419102,8,False,False,4,11,21.927072,6.336688
1,46,Male,Health,3.249603,Twitter,59,9.169296,5.063368,5.165093,7.0,5.103897,0.671519,7,True,True,2,25,0.000000,3.412427
2,32,Male,Finance,,Twitter,57,7.910952,3.861762,3.474053,4.0,8.583222,0.624378,0,True,False,3,17,10.322044,2.474944
3,60,Female,Unemployed,,Facebook,59,6.355027,2.916331,1.774869,6.0,6.052984,1.204540,1,False,False,0,4,23.876616,1.733670
4,25,Male,IT,,Telegram,66,6.214096,8.868753,,7.0,5.405706,1.876254,1,False,True,1,30,10.653519,9.693060
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
29995,34,Female,Health,1.877297,Facebook,59,10.226358,3.348512,3.465815,8.0,5.480462,1.412655,9,False,False,4,5,21.776927,
29996,39,Male,Health,4.437784,Instagram,46,4.692862,8.133213,6.659294,8.0,3.045393,0.148936,3,False,False,1,29,4.111370,6.155613
29997,42,Male,Education,17.724981,TikTok,64,10.915036,8.611005,8.658912,5.0,5.491520,1.224296,10,False,False,1,2,1.888315,6.285237
29998,20,Female,Education,3.796634,Instagram,56,6.937410,7.767076,6.895583,8.0,6.816069,0.234483,1,False,False,2,9,12.511871,7.854711


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30000 entries, 0 to 29999
Data columns (total 19 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   age                             30000 non-null  int64  
 1   gender                          30000 non-null  object 
 2   job_type                        30000 non-null  object 
 3   daily_social_media_time         27235 non-null  float64
 4   social_platform_preference      30000 non-null  object 
 5   number_of_notifications         30000 non-null  int64  
 6   work_hours_per_day              30000 non-null  float64
 7   perceived_productivity_score    28386 non-null  float64
 8   actual_productivity_score       27635 non-null  float64
 9   stress_level                    28096 non-null  float64
 10  sleep_hours                     27402 non-null  float64
 11  screen_time_before_sleep        27789 non-null  float64
 12  breaks_during_work              

In [10]:
df.shape

(30000, 19)
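
The `df.info()` output shows that several columns have fewer than 30000 non-null values (for example, `daily_social_media_time` has 27235, so 2765 values are missing). A per-column summary makes this easier to read. The following is a minimal sketch; `missing_summary` is a hypothetical helper, and the toy frame only mimics the shape of the real data.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most missing first."""
    counts = df.isna().sum()
    summary = pd.DataFrame(
        {
            "missing": counts,
            "percent": (counts / len(df) * 100).round(2),
        }
    )
    # Keep only the columns that actually have missing values.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy rows for illustration only; on the real data, call missing_summary(df).
toy = pd.DataFrame(
    {
        "age": [56, 46, 32, 60],
        "daily_social_media_time": [4.18, 3.25, None, None],
        "stress_level": [4.0, 7.0, 4.0, None],
    }
)

summary = missing_summary(toy)
print(summary)
```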

In [9]:
df.describe()

statistic,age,daily_social_media_time,number_of_notifications,work_hours_per_day,perceived_productivity_score,actual_productivity_score,stress_level,sleep_hours,screen_time_before_sleep,breaks_during_work,coffee_consumption_per_day,days_feeling_burnout_per_month,weekly_offline_hours,job_satisfaction_score
count,30000.0,27235.0,30000.0,30000.0,28386.0,27635.0,28096.0,27402.0,27789.0,30000.0,30000.0,30000.0,30000.0,27270.0
mean,41.486867,3.113418,59.958767,6.990792,5.510488,4.951805,5.514059,6.500247,1.025568,4.9922,1.9993,15.557067,10.360655,4.964901
std,13.835221,2.074813,7.723772,1.997736,2.02347,1.883378,2.866344,1.464004,0.653355,3.173737,1.410047,9.252956,7.280415,2.121194
min,18.0,0.0,30.0,0.0,2.000252,0.296812,1.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,30.0,1.639566,55.0,5.643771,3.757861,3.373284,3.0,5.493536,0.52849,2.0,1.0,8.0,4.541872,3.36358
50%,41.0,3.025913,60.0,6.990641,5.525005,4.951742,6.0,6.49834,1.006159,5.0,2.0,16.0,10.013677,4.951049
75%,53.0,4.368917,65.0,8.354725,7.265776,6.526342,8.0,7.504143,1.477221,8.0,3.0,24.0,15.300809,6.581323
max,65.0,17.973256,90.0,12.0,8.999376,9.846258,10.0,10.0,3.0,10.0,10.0,31.0,40.964769,10.0
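
In the summary above, the maximum of `daily_social_media_time` (about 17.97 hours) is far above its 75th percentile (about 4.37 hours), which suggests the column may contain extreme values worth inspecting. One common way to flag such values is the 1.5 × IQR rule. The sketch below assumes that rule is appropriate here; `iqr_outlier_mask` is a hypothetical helper and the toy values are illustrative only.

```python
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask that is True where a value lies outside the IQR fences.

    Series.quantile ignores NaN, and comparisons against NaN are False,
    so missing values are never flagged.
    """
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Toy values for illustration only; on the real data, pass
# df["daily_social_media_time"] instead.
toy = pd.Series(
    [1.6, 2.4, 3.0, 3.1, 4.4, 17.7, None], name="daily_social_media_time"
)

mask = iqr_outlier_mask(toy)
print(toy[mask])
```

Flagged rows are candidates for a closer look, not necessarily errors; whether to keep, cap, or drop them depends on the later modeling goals.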
