# Project 3

- **Dataset(s) to be used:** [link](https://www.kaggle.com/datasets/waqi786/mental-health-and-technology-usage-dataset)
- **Analysis question:** What is the relationship between daily screen time and mental health status?
- **Columns that will (likely) be used:**
  - Screen_Time_Hours (hours)
  - Mental_Health_Status (categorical)
  - Stress_Level (categorical: Low, Medium, High)
  - Sleep_Hours (hours)
  
- (If you're using multiple datasets) **Columns to be used to merge/join them:**
  - This dataset is self-contained, so no merging is needed
- **Hypothesis**: Increased daily screen time correlates with poorer mental health status.
- **Site URL:** [URL from Publish section](https://cic-project3.readthedocs.io/en/latest/)

## Analyzing the Impact of Screen Time on Mental Health

### Introduction

In today’s digital age, screen time has become a significant aspect of daily life. As technology continues to evolve, it influences how we communicate, entertain ourselves, and even work. However, there is growing concern about the potential impact of excessive screen time on mental health. Increased screen time, particularly through social media and digital devices, has been linked to higher stress levels, anxiety, and poorer sleep quality.

This analysis seeks to explore the relationship between daily screen time and mental health by analyzing a dataset that includes self-reported mental health status, stress levels, and daily screen usage. By examining these data, the goal is to uncover insights that might inform healthier screen time habits and provide a deeper understanding of how technology use affects mental well-being.

### Dataset Overview

The dataset contains 10,000 participant records. The columns most relevant to this analysis are:

- User_ID: A unique identifier for each participant.
- Age: The age of the participant.
- Screen_Time_Hours: The average number of hours the participant spends on screens daily.
- Mental_Health_Status: A self-reported mental health status ("Poor," "Fair," "Good," or "Excellent").
- Stress_Level: A self-reported stress level recorded as a category ("Low," "Medium," or "High").
- Sleep_Hours: The number of hours the participant sleeps per night.

The dataset also includes Gender, Technology_Usage_Hours, Social_Media_Usage_Hours, Gaming_Hours, Physical_Activity_Hours, Support_Systems_Access, Work_Environment_Impact, and Online_Support_Usage. These columns are not the focus of this analysis.

### Research Question:
What is the relationship between daily screen time and mental health status?

### Hypothesis:
Increased daily screen time is associated with poorer mental health status.

To test this hypothesis, we first inspect and clean the data. We then compare **daily screen time** across **mental health status** categories using visualizations and a statistical test.

We start by importing the necessary libraries and loading the dataset to prepare for analysis.

```python
import pandas as pd

df = pd.read_csv('mental_health2024.csv')

df.head()
```

| | User_ID | Age | Gender | Technology_Usage_Hours | Social_Media_Usage_Hours | Gaming_Hours | Screen_Time_Hours | Mental_Health_Status | Stress_Level | Sleep_Hours | Physical_Activity_Hours | Support_Systems_Access | Work_Environment_Impact | Online_Support_Usage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | USER-00001 | 23 | Female | 6.57 | 6.0 | 0.68 | 12.36 | Good | Low | 8.01 | 6.71 | No | Negative | Yes |
| 1 | USER-00002 | 21 | Male | 3.01 | 2.57 | 3.74 | 7.61 | Poor | High | 7.28 | 5.88 | Yes | Positive | No |
| 2 | USER-00003 | 51 | Male | 3.04 | 6.14 | 1.26 | 3.16 | Fair | High | 8.04 | 9.81 | No | Negative | No |
| 3 | USER-00004 | 25 | Female | 3.84 | 4.48 | 2.59 | 13.08 | Excellent | Medium | 5.62 | 5.28 | Yes | Negative | Yes |
| 4 | USER-00005 | 53 | Male | 1.2 | 0.56 | 0.29 | 12.63 | Good | Low | 5.55 | 4.0 | No | Positive | Yes |


Before starting the analysis, we check whether any values are missing. Missing values can bias summary statistics or cause errors in later steps, so any that appear must be handled first.

```python
print(df.isnull().sum())
```

```
User_ID                     0
Age                         0
Gender                      0
Technology_Usage_Hours      0
Social_Media_Usage_Hours    0
Gaming_Hours                0
Screen_Time_Hours           0
Mental_Health_Status        0
Stress_Level                0
Sleep_Hours                 0
Physical_Activity_Hours     0
Support_Systems_Access      0
Work_Environment_Impact     0
Online_Support_Usage        0
dtype: int64
```
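No values are missing here, so nothing needs to be handled. If a future version of the data did contain gaps, one simple option is to drop the rows that lack any column the analysis relies on. The sketch below uses a small made-up DataFrame (`toy`) in place of the real data. Dropping rows is only one strategy; imputation may be preferable when many rows are affected.

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for the real dataset (hypothetical values)
toy = pd.DataFrame({
    'Screen_Time_Hours': [10.0, np.nan, 3.0, 13.0],
    'Mental_Health_Status': ['Good', 'Poor', None, 'Excellent'],
    'Sleep_Hours': [8.0, 7.0, 8.0, np.nan],
})

# Count missing values per column, as in the check above
missing_counts = toy.isnull().sum()
print(missing_counts)

# Drop rows missing any column needed for the screen time vs. mental health comparison;
# a missing Sleep_Hours value alone does not remove the row
analysis_cols = ['Screen_Time_Hours', 'Mental_Health_Status']
toy_clean = toy.dropna(subset=analysis_cols)
print(toy_clean)
```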


After checking for missing values, we need to confirm that each column has the appropriate data type. For example, numerical columns should be of type int or float.

```python
print(df.dtypes)
```

```
User_ID                      object
Age                           int64
Gender                       object
Technology_Usage_Hours      float64
Social_Media_Usage_Hours    float64
Gaming_Hours                float64
Screen_Time_Hours           float64
Mental_Health_Status         object
Stress_Level                 object
Sleep_Hours                 float64
Physical_Activity_Hours     float64
Support_Systems_Access       object
Work_Environment_Impact      object
Online_Support_Usage         object
dtype: object
```


Columns such as Mental_Health_Status, Stress_Level, Support_Systems_Access, Work_Environment_Impact, and Online_Support_Usage are stored as object (string) type. Each has only a few unique values, so converting them to the categorical type can reduce memory usage and speed up some operations.

```python
df['Mental_Health_Status'] = df['Mental_Health_Status'].astype('category')
df['Stress_Level'] = df['Stress_Level'].astype('category')
df['Support_Systems_Access'] = df['Support_Systems_Access'].astype('category')
df['Work_Environment_Impact'] = df['Work_Environment_Impact'].astype('category')
df['Online_Support_Usage'] = df['Online_Support_Usage'].astype('category')
```


```python
print(df.dtypes)
```

```
User_ID                       object
Age                            int64
Gender                        object
Technology_Usage_Hours       float64
Social_Media_Usage_Hours     float64
Gaming_Hours                 float64
Screen_Time_Hours            float64
Mental_Health_Status        category
Stress_Level                category
Sleep_Hours                  float64
Physical_Activity_Hours      float64
Support_Systems_Access      category
Work_Environment_Impact     category
Online_Support_Usage        category
dtype: object
```
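You can check the memory claim with `memory_usage(deep=True)`. It reports the bytes each column actually uses, including the strings held inside object columns. The sketch below compares a made-up column of 10,000 repeated labels in object form and in categorical form. Exact byte counts vary with the pandas version, so only the comparison is meaningful.

```python
import pandas as pd

# Made-up column: 10,000 values cycling through four labels
labels = ['Good', 'Poor', 'Fair', 'Excellent']
status_obj = pd.Series(labels * 2500, dtype='object')
status_cat = status_obj.astype('category')

# deep=True includes the string objects themselves, not just the pointers to them
obj_bytes = int(status_obj.memory_usage(deep=True))
cat_bytes = int(status_cat.memory_usage(deep=True))

print(f"object:   {obj_bytes:,} bytes")
print(f"category: {cat_bytes:,} bytes")
```

The categorical version stores each distinct label once and keeps a small integer code per row. That is where the saving comes from when a column has few unique values.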


We should check the unique values in each of these columns after the conversion to ensure that the categories have been identified correctly. This will also help us spot any issues like unexpected or missing categories.

```python
print(df['Mental_Health_Status'].unique())
print(df['Stress_Level'].unique())
print(df['Support_Systems_Access'].unique())
print(df['Work_Environment_Impact'].unique())
print(df['Online_Support_Usage'].unique())
```

```
['Good', 'Poor', 'Fair', 'Excellent']
Categories (4, object): ['Excellent', 'Fair', 'Good', 'Poor']
['Low', 'High', 'Medium']
Categories (3, object): ['High', 'Low', 'Medium']
['No', 'Yes']
Categories (2, object): ['No', 'Yes']
['Negative', 'Positive', 'Neutral']
Categories (3, object): ['Negative', 'Neutral', 'Positive']
['Yes', 'No']
Categories (2, object): ['No', 'Yes']
```
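The output shows that `astype('category')` sorts categories alphabetically. This places Excellent before Fair, Good, and Poor, and it leaves Stress_Level unordered. Both columns have a natural order, so an ordered `pd.CategoricalDtype` with an explicit category list may be more convenient. Sorting and `groupby` results then follow that order, and comparisons such as `>= 'Good'` become possible. For Plotly Express charts, the `category_orders` argument can set the axis order explicitly. The sketch below assumes the orderings Poor < Fair < Good < Excellent and Low < Medium < High, and it runs on a small made-up frame.

```python
import pandas as pd

# Made-up rows using the same labels as the dataset
toy = pd.DataFrame({
    'Mental_Health_Status': ['Good', 'Poor', 'Fair', 'Excellent', 'Good'],
    'Stress_Level': ['Low', 'High', 'High', 'Medium', 'Low'],
})

# Explicit, ordered category lists (assumed ordering: worst to best, lowest to highest)
health_dtype = pd.CategoricalDtype(['Poor', 'Fair', 'Good', 'Excellent'], ordered=True)
stress_dtype = pd.CategoricalDtype(['Low', 'Medium', 'High'], ordered=True)

toy['Mental_Health_Status'] = toy['Mental_Health_Status'].astype(health_dtype)
toy['Stress_Level'] = toy['Stress_Level'].astype(stress_dtype)

print(toy['Mental_Health_Status'].cat.categories.tolist())

# Ordered categoricals support comparisons against a category label
good_or_better = toy['Mental_Health_Status'] >= 'Good'
print(good_or_better.tolist())
```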


Next, we should check for duplicate rows. Duplicates can skew the results of our analysis and should be removed.

```python
print(df.duplicated().sum())
```

```
0
```
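The check above only finds rows that are identical in every column. User_ID is meant to be unique, so it may also be worth checking for repeated IDs; `duplicated(subset=...)` detects these even when other columns differ. The sketch below uses made-up rows. If duplicates were found, `drop_duplicates` keeps the first occurrence by default.

```python
import pandas as pd

# Made-up rows: USER-00002 appears twice with different Age values
toy = pd.DataFrame({
    'User_ID': ['USER-00001', 'USER-00002', 'USER-00002', 'USER-00003'],
    'Age': [23, 21, 22, 51],
})

full_row_dupes = toy.duplicated().sum()              # rows identical in every column
id_dupes = toy.duplicated(subset='User_ID').sum()    # repeated User_ID values
print(full_row_dupes, id_dupes)

# Keep the first record for each User_ID (default keep='first')
deduped = toy.drop_duplicates(subset='User_ID')
print(deduped)
```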


No duplicate rows were found, so none need to be removed. Next, a quick summary of the numeric columns shows their central tendencies (mean, median, etc.) and helps identify any extreme outliers.

```python
print(df.describe())
```

```
                Age  Technology_Usage_Hours  Social_Media_Usage_Hours  \
count  10000.000000            10000.000000              10000.000000   
mean      41.518600                6.474341                  3.972321   
std       13.920217                3.169022                  2.313707   
min       18.000000                1.000000                  0.000000   
25%       29.000000                3.760000                  1.980000   
50%       42.000000                6.425000                  3.950000   
75%       54.000000                9.212500                  5.990000   
max       65.000000               12.000000                  8.000000   

       Gaming_Hours  Screen_Time_Hours   Sleep_Hours  Physical_Activity_Hours  
count  10000.000000       10000.000000  10000.000000             10000.000000  
mean       2.515598           7.975765      6.500724                 5.003860  
std        1.446748           4.042608      1.450933                 2.905044  
min        0.000000   
```

The summary gives an overview of participants' lifestyle habits. The average age is about 41.5 years, with a range of 18 to 65. On average, participants report 6.47 hours of technology use and 7.98 hours of screen time per day. Social media use averages 3.97 hours and gaming 2.52 hours. Sleep averages 6.5 hours per night, below the 7 or more hours commonly recommended for adults. Physical_Activity_Hours averages 5.0 hours with substantial variability (standard deviation 2.9). These varied habits provide a basis for exploring how screen time, physical activity, and sleep relate to mental health status and stress.
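The summary above lets you spot extreme values by eye. A common rule of thumb makes this systematic: it flags values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The sketch below applies that rule to a made-up screen time series. The 1.5 multiplier is a convention, not a property of this dataset, and flagged values should be inspected rather than dropped automatically.

```python
import pandas as pd

# Made-up daily screen time values (hours); 30.0 is a deliberate outlier
screen = pd.Series([2.5, 4.0, 6.1, 7.9, 8.2, 9.5, 11.0, 13.4, 30.0])

q1, q3 = screen.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Boolean mask of values outside the IQR fences
is_outlier = (screen < lower) | (screen > upper)
print(f"fences: [{lower:.2f}, {upper:.2f}]")
print(screen[is_outlier].tolist())
```

On the real data, `df['Screen_Time_Hours']` (or any other numeric column) would take the place of `screen`.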

We can use box plots to compare the distribution of daily screen time across mental health status categories.

```python
import plotly.express as px

fig = px.box(df, x='Mental_Health_Status', y='Screen_Time_Hours', 
             title='Screen Time Distribution by Mental Health Status',
             labels={'Mental_Health_Status': 'Mental Health Status', 
                     'Screen_Time_Hours': 'Screen Time (hours)'},
             color='Mental_Health_Status')
fig.show()
```


The box plot shows no clear relationship between daily screen time and mental health status. The screen time distributions for the four categories (Good, Poor, Fair, and Excellent) look similar in range, median, and spread. This does not support the hypothesis that increased screen time is associated with poorer mental health. Analyzing stress levels and sleep duration may provide additional insight into the research question.
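A small table of per-group statistics can back up the visual comparison. It gives the count, quartiles, median, and mean for each mental health category. The sketch below shows the `groupby`/`agg` pattern on made-up data; on the real data, `df` would take the place of `toy`.

```python
import pandas as pd

# Made-up screen time values for two categories
toy = pd.DataFrame({
    'Mental_Health_Status': ['Good', 'Good', 'Good', 'Poor', 'Poor', 'Poor'],
    'Screen_Time_Hours': [4.0, 8.0, 12.0, 3.0, 9.0, 15.0],
})

# Named aggregation: one output column per statistic
summary = (toy.groupby('Mental_Health_Status')['Screen_Time_Hours']
              .agg(count='count',
                   q1=lambda s: s.quantile(0.25),
                   median='median',
                   q3=lambda s: s.quantile(0.75),
                   mean='mean'))
print(summary)
```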

```python
# px.bar draws one bar segment per row, so average the data first;
# otherwise each bar shows the total screen time for its category
avg_screen = (df.groupby('Mental_Health_Status', observed=True)['Screen_Time_Hours']
                .mean()
                .reset_index())

fig = px.bar(avg_screen, x='Mental_Health_Status', y='Screen_Time_Hours',
             title='Average Screen Time by Mental Health Status',
             labels={'Mental_Health_Status': 'Mental Health Status',
                     'Screen_Time_Hours': 'Average Screen Time (hours)'},
             color='Mental_Health_Status')
fig.show()
```

The chart compares the average daily screen time for each mental health category (Good, Poor, Fair, and Excellent). Averaging before plotting matters. Passing the raw DataFrame to `px.bar` stacks every row within a category, so the bars show per-category totals of around 20K hours rather than averages. The box plot shows similar distributions, so the group means are expected to be close. Any differences therefore need a statistical test before they can count as support for the hypothesis. Stress levels and sleep duration could also give deeper insight into the connection between screen time and mental health.

A violin plot is another good option to visualize the distribution of Screen Time across different Mental Health Status categories. It combines aspects of a box plot and a density plot, showing the distribution and frequency of data points.

```python
# Violin plot of Screen Time by Mental Health Status
fig = px.violin(df, x='Mental_Health_Status', y='Screen_Time_Hours', 
                title='Screen Time Distribution by Mental Health Status',
                labels={'Mental_Health_Status': 'Mental Health Status', 
                        'Screen_Time_Hours': 'Screen Time (hours)'},
                color='Mental_Health_Status')
fig.show()
```


The violin plot illustrates the distribution of screen time across different mental health statuses: Good, Poor, Fair, and Excellent. Each category shows a similar range of screen time (0 to 15 hours daily), but the shape of the distributions varies, indicating differences in how screen time is spread within each group. For example, the "Poor" mental health category appears to have greater variability, while others are more concentrated. Despite similar medians, the density differences suggest subtle trends that require further statistical analysis to confirm.
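Levene's test can check formally whether the Poor group really has more spread than the others. Its null hypothesis is that all groups have equal variances. `scipy.stats.levene` uses the median-centered (Brown-Forsythe) variant by default, which is robust to non-normal data. The sketch below uses two made-up samples with deliberately different spreads. On the real data, each argument would be the Screen_Time_Hours values for one mental health category.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Made-up samples: same mean (8 hours), different spread
narrow = rng.normal(loc=8.0, scale=1.0, size=500)
wide = rng.normal(loc=8.0, scale=4.0, size=500)

# center='median' is SciPy's default (the Brown-Forsythe variant)
result = stats.levene(narrow, wide, center='median')
print(f"W = {result.statistic:.2f}, p = {result.pvalue:.3g}")

if result.pvalue < 0.05:
    print("Variances differ at the 5% level")
else:
    print("No evidence that variances differ")
```

A significant result would also matter for the ANOVA below, which assumes roughly equal variances across groups.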

Next, a scatter plot of screen time against sleep hours shows whether heavier screen use is associated with less sleep.

```python
# Scatter plot of Screen Time vs Sleep Hours
fig = px.scatter(df, x='Screen_Time_Hours', y='Sleep_Hours', color='Sleep_Hours',
                 labels={'Screen_Time_Hours': 'Daily Screen Time (hours)', 'Sleep_Hours': 'Sleep Hours'},
                 title='Screen Time vs Sleep Hours')
fig.update_traces(marker=dict(size=12))
fig.show()
```


The plot shows the relationship between **daily screen time** and **sleep hours**. The points span roughly 2 to 15 hours of screen time and 4 to 9 hours of sleep, and they are spread fairly evenly, which suggests no strong linear relationship. The color scale encodes sleep hours (the same variable as the y-axis), not point density. With 10,000 large markers, many points also overlap. There may be slightly fewer long sleepers (8–9 hours) among individuals with very high screen time (>12 hours). This weak pattern should be checked with correlation coefficients or regression analysis to better understand any underlying trends or non-linear relationships.
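SciPy can compute the suggested correlation coefficients. Pearson's r measures linear association. Spearman's rho works on ranks, so it captures any monotonic trend, including non-linear ones. `linregress` gives the slope in hours of sleep per hour of screen time. The sketch below generates made-up data with a small built-in negative slope, so the direction of the result is known in advance. On the real data, `df['Screen_Time_Hours']` and `df['Sleep_Hours']` would be passed instead.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Made-up data: sleep drops by 0.1 h per hour of screen time, plus noise
screen = rng.uniform(1, 15, size=1000)
sleep = 8.0 - 0.1 * screen + rng.normal(0, 0.8, size=1000)

pearson_r, pearson_p = stats.pearsonr(screen, sleep)
spearman_rho, spearman_p = stats.spearmanr(screen, sleep)
fit = stats.linregress(screen, sleep)

print(f"Pearson r = {pearson_r:.3f} (p = {pearson_p:.3g})")
print(f"Spearman rho = {spearman_rho:.3f} (p = {spearman_p:.3g})")
print(f"slope = {fit.slope:.3f} h sleep per h screen, R^2 = {fit.rvalue**2:.3f}")
```

With 10,000 rows, even a very weak correlation can have a small p-value. The size of r or rho therefore matters more than its significance.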

Next, we check whether mean screen time (continuous) differs across Mental Health Status categories (categorical). A one-way ANOVA tests this by comparing the group means.

```python
import scipy.stats as stats

# One-way ANOVA: does mean screen time differ between mental health groups?
# Note: only the Poor, Good, and Excellent groups are passed in; 'Fair' is not included
anova_results = stats.f_oneway(df['Screen_Time_Hours'][df['Mental_Health_Status'] == 'Poor'],
                               df['Screen_Time_Hours'][df['Mental_Health_Status'] == 'Good'],
                               df['Screen_Time_Hours'][df['Mental_Health_Status'] == 'Excellent'])

print(f"ANOVA Results for Screen Time and Mental Health Status: F-value = {anova_results.statistic}, p-value = {anova_results.pvalue}")
```

```
ANOVA Results for Screen Time and Mental Health Status: F-value = 1.0072677716530465, p-value = 0.3652648239136278
```


The ANOVA finds no statistically significant difference in average screen time between the Poor, Good, and Excellent groups (F-value = 1.007, p-value = 0.365). The Fair group was not included in this test. Because the p-value is greater than 0.05, we cannot reject the null hypothesis that the group means are equal. In other words, the data provide no evidence that screen time differs between these groups. A non-significant result does not prove that the means are identical, but with groups this large any real differences are likely small. Further analysis may explore relationships with other factors such as stress or sleep duration.
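The test above passes only the Poor, Good, and Excellent groups. To make sure every category is included, you can build the list of groups with `groupby` and unpack it into `f_oneway`. Screen time may not be normally distributed within each group, so the rank-based Kruskal-Wallis test (`scipy.stats.kruskal`) is a common non-parametric cross-check. The sketch below runs both tests on made-up data in which all four groups come from the same distribution; on the real data, `df` would take the place of `toy`.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
statuses = ['Poor', 'Fair', 'Good', 'Excellent']

# Made-up data: every group drawn from the same distribution
toy = pd.DataFrame({
    'Mental_Health_Status': rng.choice(statuses, size=2000),
    'Screen_Time_Hours': rng.uniform(1, 15, size=2000),
})

# One array of screen time values per category; observed=True skips
# categories with no rows when the column has a categorical dtype
groups = [g.to_numpy() for _, g in
          toy.groupby('Mental_Health_Status', observed=True)['Screen_Time_Hours']]

anova = stats.f_oneway(*groups)
kruskal = stats.kruskal(*groups)

print(f"{len(groups)} groups compared")
print(f"ANOVA: F = {anova.statistic:.3f}, p = {anova.pvalue:.3f}")
print(f"Kruskal-Wallis: H = {kruskal.statistic:.3f}, p = {kruskal.pvalue:.3f}")
```

If both tests agree, the conclusion depends less on the normality assumption behind ANOVA.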

### Conclusion
In conclusion, this analysis examined the relationship between **daily screen time** and **mental health status**. It used visualizations and a one-way ANOVA to determine whether screen time varies significantly across mental health categories. The ANOVA (F-value = 1.007, p-value = 0.365) found no statistically significant difference in screen time between the Poor, Good, and Excellent groups. The plots also show similar screen time distributions across all four categories, including Fair. Intuition may suggest a connection between screen time and mental health, but the findings from this dataset do not support a strong, direct relationship. Other factors, such as stress levels, sleep, or individual behavioral differences, may be more closely tied to mental health outcomes, although this analysis did not test them. Further research is needed to explore these factors and to consider non-linear relationships or other variables that might better explain how screen time affects mental health.