<a href="https://colab.research.google.com/github/alyaarslan/dsa210project/blob/main/Final_Report.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Final Report: Analyzing the Relationship Between Screen Time and Sleep Quality**
## **1. Introduction**

This project investigates the relationship between daily screen time, sleep duration, and self-reported sleep quality. Using three months of personal data collected through Samsung Health, the goal was to identify trends, test statistical relationships, and apply machine learning to predict sleep quality.

The analysis was performed using Python and includes data preprocessing, exploratory data analysis (EDA), statistical hypothesis testing, and a classification model built with scikit-learn.

## **2. Dataset Overview**

### **2.1 Sleep Dataset**
The sleep dataset contains the following:

**date**: the day of observation

**sleeptime_min**: total sleep duration (converted from hours to minutes)

**sleep_quality**: self-reported sleep quality (0–100 scale)

**weekday**: derived from the date column

### **2.2 Screen Time Dataset**
The screen time dataset includes:

**date**: the day of observation

**screen_time_min**: total screen time in minutes

### **2.3 Merging Datasets**
Both datasets were merged on the date column to form a single dataframe with sleep and screen activity aligned day-by-day. Rows with missing values were dropped.
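The merge step can be sketched as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, and the use of an inner join is an assumption, since the report only states that the datasets were merged on the date column.

```python
import pandas as pd

# Hypothetical sample rows; the real data came from Samsung Health.
sleep = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "sleeptime_min": [450, 390, None],  # one missing measurement
    "sleep_quality": [88, 72, 80],
})
screen = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]),
    "screen_time_min": [310, 275, 290, 400],
})

# Assumption: an inner join, so only days present in both datasets are kept.
merged = sleep.merge(screen, on="date", how="inner")

# Drop any day where at least one measurement is missing.
merged = merged.dropna().reset_index(drop=True)
```

Here 2024-01-04 is discarded by the join (no sleep record) and 2024-01-03 by `dropna()` (missing sleep duration), leaving two complete days.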

## **3. Feature Engineering**
Additional variables were created to enable better analysis:

**quality_group**: A categorical variable derived from sleep quality:

- **Poor**: sleep quality < 70

- **Good**: 70 ≤ sleep quality < 85

- **Excellent**: sleep quality ≥ 85

**extreme_sleep**: A binary flag where:

- 1 = sleep duration < 6.5 hours or > 10.5 hours

- 0 = sleep within the normal range

**is_weekend**: A binary flag to mark Saturday and Sunday

**Weekday Encoding**: The weekday column was one-hot encoded into 7 separate binary features.
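The features described above could be derived roughly as shown here. This is a sketch under assumptions: the sample rows are invented, and `pd.cut` and `pd.get_dummies` are one possible way to implement the thresholds and the encoding, not necessarily the one used in the project.

```python
import pandas as pd

# Hypothetical rows: a Friday, Saturday, Sunday and Monday.
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-01-07", "2024-01-08"]),
    "sleeptime_min": [380, 480, 640, 450],
    "sleep_quality": [65, 85, 70, 92],
})

# quality_group: Poor < 70, Good in [70, 85), Excellent >= 85.
# right=False makes each bin closed on the left, matching those thresholds.
df["quality_group"] = pd.cut(
    df["sleep_quality"],
    bins=[0, 70, 85, 101],
    labels=["Poor", "Good", "Excellent"],
    right=False,
)

# extreme_sleep: 1 if sleep is shorter than 6.5 h or longer than 10.5 h.
hours = df["sleeptime_min"] / 60
df["extreme_sleep"] = ((hours < 6.5) | (hours > 10.5)).astype(int)

# is_weekend: pandas numbers weekdays Monday=0 ... Sunday=6.
df["weekday"] = df["date"].dt.day_name()
df["is_weekend"] = df["date"].dt.dayofweek.isin([5, 6]).astype(int)

# One-hot encode the weekday column into binary indicator columns.
df = pd.get_dummies(df, columns=["weekday"], prefix="weekday", dtype=int)
```

Note that `get_dummies` only creates columns for weekdays that occur in the data, so this four-row sample yields four indicator columns, whereas three months of data yield all seven.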

## **4. Exploratory Data Analysis (EDA)**
### **4.1 Time Series Plots**
Line plots were created to show daily changes in:

- Sleep duration

- Sleep quality

- Screen time

These visualizations show that all three measures vary substantially from day to day, with no immediately obvious pattern. Some drops in sleep quality followed days with high screen time, but not consistently.

### **4.2 Distributions**
Histograms were plotted for all key variables to understand their distributions:

- Sleep time clustered around 7–9 hours

- Sleep quality was left-skewed, with most values concentrated in the 80–90 range and a tail toward lower scores

- Screen time showed a bimodal distribution
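The shape of a distribution can be checked numerically as well as visually. The sketch below uses made-up sleep quality scores and pandas' sample skewness: a negative value indicates a longer tail toward low values, a positive value a longer tail toward high values.

```python
import pandas as pd

# Hypothetical scores: mostly high, with one low outlier.
quality = pd.Series([88, 85, 90, 87, 86, 89, 60, 84], name="sleep_quality")

# Sample skewness: negative means the tail extends toward low values.
skewness = quality.skew()
direction = "left-skewed" if skewness < 0 else "right-skewed"
```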

### **4.3 Box Plots**
Box plots of sleep duration by weekday were created. These revealed:

- Slightly longer sleep durations on Fridays and Sundays

- More variability (outliers) on Wednesdays and Saturdays
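The quantities a box plot draws can also be tabulated directly. The following sketch computes the median and interquartile range (IQR) of sleep duration per weekday; the values are invented for illustration and only two weekdays are included.

```python
import pandas as pd

# Hypothetical sleep durations for two weekdays.
df = pd.DataFrame({
    "weekday": ["Friday", "Friday", "Friday", "Wednesday", "Wednesday", "Wednesday"],
    "sleeptime_min": [480, 500, 520, 400, 450, 560],
})

grouped = df.groupby("weekday")["sleeptime_min"]

# Median is the box's center line; IQR is the height of the box.
summary = pd.DataFrame({
    "median": grouped.median(),
    "iqr": grouped.quantile(0.75) - grouped.quantile(0.25),
})
```

A larger IQR for a weekday corresponds to a taller box, that is, more night-to-night variability on that day.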

## **5. Correlation and Hypothesis Testing**
Pearson correlation coefficients were calculated between screen time, sleep time, and sleep quality. None of these pairwise correlations was statistically significant, so the data show no evidence of a linear relationship between the variables.

**Additional Test**:
A specific hypothesis was also tested: that extreme sleep durations (too short or too long) are associated with lower sleep quality.

- Pearson r = -0.554

- p-value < 0.0000001

This result is statistically significant and supports the hypothesis.
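A test of this kind can be sketched with SciPy. The data below are synthetic and generated only for illustration, so the numbers will not match the reported result. The sketch also assumes that the correlation is computed between a binary flag and a continuous score, in which case Pearson's r coincides with the point-biserial correlation.

```python
import numpy as np
from scipy import stats

# Synthetic data (assumption): nights flagged as extreme score about 10 points lower.
rng = np.random.default_rng(0)
extreme_sleep = rng.integers(0, 2, size=90)
sleep_quality = 85 - 10 * extreme_sleep + rng.normal(0, 5, size=90)

# Pearson correlation and its two-sided p-value.
r, p_value = stats.pearsonr(extreme_sleep, sleep_quality)

# For a binary variable, the point-biserial correlation gives the same value.
r_pb, p_pb = stats.pointbiserialr(extreme_sleep, sleep_quality)
```

A negative r with a small p-value indicates that flagged nights tend to have lower quality scores; it does not by itself establish a causal effect.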

## **6. Sleep Quality Group Analysis**
Sleep records were grouped by quality_group, and the average screen time and sleep duration were calculated within each group. The analysis shows that higher-quality sleep tends to coincide with moderate sleep durations. Poor sleep is associated with shorter sleep and, surprisingly, lower screen time, suggesting that other external factors may be involved.
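A group comparison like this one can be expressed as a single pandas `groupby`. The rows below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import pandas as pd

# Hypothetical records, already labeled with a quality group.
df = pd.DataFrame({
    "quality_group": ["Poor", "Good", "Good", "Excellent", "Excellent"],
    "screen_time_min": [200, 300, 320, 280, 260],
    "sleeptime_min": [360, 450, 470, 480, 500],
})

# Mean screen time and sleep duration per group, ordered from worst to best.
group_means = (
    df.groupby("quality_group")[["screen_time_min", "sleeptime_min"]]
    .mean()
    .reindex(["Poor", "Good", "Excellent"])
)
```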

## **7. Machine Learning: Predicting Sleep Quality**
**Objective**:
Train a classification model to predict whether sleep quality is "Poor", "Good", or "Excellent" based on:

- sleeptime_min

- screen_time_min

- extreme_sleep

- is_weekend

- Weekday encoding

**Model**: RandomForestClassifier from scikit-learn

**Training/test split**: 75/25

**Evaluation metrics**: accuracy, classification report, confusion matrix, feature importance
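The setup above can be sketched as follows. This is an illustration under assumptions rather than the project's code: the data are randomly generated, the weekday indicator columns are omitted for brevity, and the hyperparameters (`n_estimators=100`, `random_state=42`) are placeholders.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for roughly three months of daily records.
rng = np.random.default_rng(42)
n = 90
df = pd.DataFrame({
    "sleeptime_min": rng.integers(300, 660, size=n),
    "screen_time_min": rng.integers(120, 480, size=n),
    "is_weekend": rng.integers(0, 2, size=n),
})
hours = df["sleeptime_min"] / 60
df["extreme_sleep"] = ((hours < 6.5) | (hours > 10.5)).astype(int)
# Random labels with few "Poor" examples, mimicking class imbalance.
df["quality_group"] = rng.choice(["Poor", "Good", "Excellent"], size=n, p=[0.1, 0.4, 0.5])

X = df[["sleeptime_min", "screen_time_min", "extreme_sleep", "is_weekend"]]
y = df["quality_group"]

# 75/25 training/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluation: accuracy, confusion matrix, and feature importances.
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=["Poor", "Good", "Excellent"])
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)
```

Because the labels here are random, the resulting accuracy carries no meaning; the sketch only shows how the pieces fit together.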

**Results**:

**Accuracy**: 45%

**Top Features**:

- sleeptime_min

- screen_time_min

- extreme_sleep

- weekday indicators

Some classes, particularly "Poor", were never predicted by the model. This triggered a scikit-learn warning, because precision (and therefore the F1-score) is undefined for a class with zero predictions. The underlying cause is that the dataset contains very few "Poor" examples, which limits what the model can learn about that class.
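The situation can be reproduced on a tiny example. In the sketch below, with made-up labels, "Poor" occurs once in the true labels but is never predicted. Passing `zero_division=0` to `classification_report` reports the undefined precision as 0 without emitting the warning.

```python
from sklearn.metrics import classification_report

# Hypothetical labels: "Poor" appears in y_true but never in y_pred.
y_true = ["Good", "Excellent", "Poor", "Good", "Excellent", "Excellent"]
y_pred = ["Good", "Excellent", "Good", "Excellent", "Excellent", "Good"]

report = classification_report(
    y_true,
    y_pred,
    labels=["Poor", "Good", "Excellent"],
    zero_division=0,   # report undefined metrics as 0 instead of warning
    output_dict=True,  # return a nested dict rather than a formatted string
)
```

Silencing the warning does not address the imbalance itself. Possible mitigations, not evaluated in this project, include passing `stratify=y` to `train_test_split` so that both splits contain every class, or setting `class_weight="balanced"` on the `RandomForestClassifier`.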

## **8. Conclusion**
This project combined personal sleep and screen time data to analyze patterns and build a predictive model for sleep quality. While no significant linear relationship was found between screen time and sleep quality, extreme sleep durations were clearly associated with lower sleep quality.

The machine learning model reached 45% accuracy and struggled with rare classes because of class imbalance. Overall, the project demonstrates how personal behavior data can be used for exploratory analysis and basic prediction.








