Akash Sahadevan, Louise Filkorn, Shreyansh Misra

# Milestone 2

## 1. Project Definition

Our project explores how daily screen time relates to personal health, well-being, and productivity, focusing on physical activity (step count), mood, focus, and mental fatigue. As smartphones and other devices take up more of our day, understanding these correlations can help identify healthier digital habits and inform strategies for better time management and well-being.

This project interests our team because we each experience different levels of device usage for work, study, and leisure, yet notice distinct effects on our energy, motivation, and mental clarity. By collecting and analyzing data over six weeks, we aim to uncover patterns showing how our screen time behaviors correlate with both objective measures (like step count) and subjective measures (like mood, productivity, and fatigue).

## 2. Data Processing

Our dataset includes Date, Screen Time (Phone), Screen Time (Laptop), Step Count, Sleep Duration, Mood (AM), Mood (PM), Mental Fatigue (AM), Mental Fatigue (PM), and Productivity. We tracked step count and sleep automatically using Apple Health on our iPhones, while the subjective measures (mood, mental fatigue, and productivity) were recorded manually in a spreadsheet. Our independent variable, screen time, was also tracked automatically on our iPhones and MacBooks through the Screen Time app.

Manually recorded data was first entered and stored in Google Sheets, then exported as a CSV file for analysis. Automatically collected data followed a similar process but required additional handling. Screen time data was exported directly from the Screen Time app in CSV format, while data from Apple Health was exported as an XML file. We used a Python script to convert the Apple Health XML export into a CSV, ensuring consistent structure and formatting across all datasets. Once all data sources were standardized, we imported them into a Jupyter Notebook and used the Pandas library for cleaning, merging, and preparing the final integrated dataset for analysis.
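The XML-to-CSV step can be sketched as follows. This is a minimal illustration rather than the contents of our `convert_xml_csv.py`: the function name `health_xml_to_csv` is hypothetical, and the sketch assumes the Apple Health export stores each measurement as a `<Record>` element with `type`, `startDate`, `endDate`, and `value` attributes.

```python
import csv
import xml.etree.ElementTree as ET


def health_xml_to_csv(xml_path, csv_path, record_type):
    """Write every <Record> of one type to a CSV and return the number of rows written.

    Assumes each record carries startDate, endDate, and value attributes.
    """
    fields = ["startDate", "endDate", "value"]
    count = 0
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(fields)
        # iterparse streams the file, so a large export is not loaded all at once
        for _, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag == "Record" and elem.get("type") == record_type:
                writer.writerow([elem.get(k, "") for k in fields])
                count += 1
            # Free the element once its attributes have been read
            elem.clear()
    return count
```

A call such as `health_xml_to_csv("export.xml", "steps_raw.csv", "HKQuantityTypeIdentifierStepCount")` would then produce one row per step record; both file names here are placeholders.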

#### Figure 1: Data Pipeline

In [1]:
from IPython.display import Image
Image(url="img/pipeline.png")

Because our investigation had three participants, we labeled each dataset with its participant's name before integration so that trends could be examined both collectively and per person. With all datasets cleaned and standardized, we were able to proceed with our preliminary analysis.

In [2]:
# %run apple_health_export/convert_xml_csv.py

In [3]:
# %run apple_health_export/get_steps.py
# %run apple_health_export/get_sleep.py

```
data/
├── shrey/
│   ├── manual_shrey.csv
│   ├── steps.csv
│   ├── sleep.csv
│   └── screen.csv
├── akash/
│   ├── manual_akash.csv
│   ├── steps.csv
│   ├── sleep.csv
│   └── screen.csv
└── louise/
    ├── manual_louise.csv
    ├── steps.csv
    ├── sleep.csv
    └── screen.csv
```

In [9]:
import os

import pandas as pd

base_path = "data"
participants = ["shrey", "akash", "louise"]

# CSV file for each data source; {name} is filled in with the participant's name
sources = {
    "manual": "manual_{name}.csv",
    "steps": "steps.csv",
    "sleep": "sleep.csv",
    "screen": "screen.csv",
}


def load_source(filename):
    """Load one source for every participant, label each row, and stack the results."""
    frames = []
    for name in participants:
        path = os.path.join(base_path, name, filename.format(name=name))
        df = pd.read_csv(path)
        df["name"] = name  # label rows so per-person trends stay recoverable
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


df_manual_all = load_source(sources["manual"])
df_steps_all = load_source(sources["steps"])
df_sleep_all = load_source(sources["sleep"])
df_screen_all = load_source(sources["screen"])

print("Manual:", df_manual_all.shape)
print("Steps:", df_steps_all.shape)
print("Sleep:", df_sleep_all.shape)
print("Screen:", df_screen_all.shape)

Manual: (42, 7)
Steps: (6655, 3)
Sleep: (1540, 3)
Screen: (42, 4)
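The steps and sleep tables are much longer than the manual and screen time tables, so they need to be aligned before analysis. One possible way to merge the four tables is sketched below. It assumes that every table has a `Date` column in the same format, that the steps and sleep tables hold one row per participant per day with the value in `Total Value`, and that the manual log defines the study window. The function `merge_daily` and the toy frames are illustrative only and do not contain our real data.

```python
import pandas as pd


def merge_daily(manual, steps, sleep, screen):
    """Join the four per-source tables into one row per participant per day."""
    keys = ["Date", "name"]
    # Rename the generic "Total Value" columns so they stay distinguishable after merging
    steps = steps.rename(columns={"Total Value": "Step Count"})
    sleep = sleep.rename(columns={"Total Value": "Sleep Duration"})
    merged = manual
    for other in (screen, steps, sleep):
        # Left joins keep only the days present in the manual log
        merged = merged.merge(other, on=keys, how="left")
    return merged


# Toy frames standing in for df_manual_all, df_steps_all, df_sleep_all, df_screen_all
manual = pd.DataFrame({
    "Date": ["2023-12-01", "2023-12-02"],
    "name": ["akash", "akash"],
    "Productivity": [3, 4],
})
screen = pd.DataFrame({
    "Date": ["2023-12-01", "2023-12-02"],
    "name": ["akash", "akash"],
    "Screen Time (Phone)": [4.5, 3.0],
})
steps = pd.DataFrame({
    "Date": ["2023-11-30", "2023-12-01", "2023-12-02"],  # first day is outside the window
    "name": ["akash"] * 3,
    "Total Value": [5000, 8000, 9500],
})
sleep = pd.DataFrame({
    "Date": ["2023-12-01", "2023-12-02"],
    "name": ["akash", "akash"],
    "Total Value": [7.0, 6.5],
})

merged = merge_daily(manual, steps, sleep, screen)
print(merged)
```

If the date formats differ between sources, converting each `Date` column with `pd.to_datetime` before merging would be a sensible first step.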


In [11]:
df_sleep_all

| | Date | Total Value | name |
|---|---|---|---|
| 0 | 2023-08-21 | 6.456944 | shrey |
| 1 | 2023-08-22 | 7.009167 | shrey |
| 2 | 2023-08-23 | 7.602222 | shrey |
| 3 | 2023-08-24 | 7.551111 | shrey |
| 4 | 2023-08-25 | 7.736111 | shrey |
| ... | ... | ... | ... |
| 1535 | 2023-12-31 | 4.971389 | louise |
| 1536 | 2024-01-01 | 1.997500 | louise |
| 1537 | 2024-01-02 | 5.005000 | louise |
| 1538 | 2024-01-04 | 7.251667 | louise |


In [5]:
# analysis

# how our step count changes with day of the week
# how our step count has increased or decreased over time

# sleep vs day of week
# sleep over time

# how each of our screen times compare (split between phone and laptop)
# screen time by day of week
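The day-of-week comparisons planned above could be computed along these lines. This is a sketch under assumptions: `mean_by_weekday` is a hypothetical helper, the table is assumed to have `Date` and `name` columns like `df_sleep_all`, and the values in the toy frame are made up for illustration.

```python
import pandas as pd

DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def mean_by_weekday(df, value_col, date_col="Date"):
    """Return a weekday x participant table holding the mean of value_col."""
    dates = pd.to_datetime(df[date_col])
    grouped = df.groupby([df["name"], dates.dt.day_name()])[value_col].mean()
    # Move participants into columns, then put the weekdays in calendar order
    table = grouped.unstack(level=0)
    return table.reindex(DAY_ORDER)


# Toy data (not our measurements): two Mondays and one Tuesday for one participant
toy = pd.DataFrame({
    "Date": ["2023-08-21", "2023-08-22", "2023-08-28"],
    "Total Value": [6.0, 7.5, 8.0],
    "name": ["shrey", "shrey", "shrey"],
})

weekday_table = mean_by_weekday(toy, "Total Value")
print(weekday_table)
```

Weekdays with no observations appear as `NaN` rows, which keeps the table shape the same for every variable and makes side-by-side plots easier.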

In [6]:
# how all the variables affect screen time
# correlation matrix
# 
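A correlation matrix over the numeric columns could be produced as sketched below. The helper `correlation_matrix` is hypothetical, the toy values are invented, and the column names are taken from our variable list on the assumption that the merged table uses them unchanged.

```python
import pandas as pd


def correlation_matrix(df, method="pearson"):
    """Pairwise correlations between all numeric columns of df."""
    return df.select_dtypes(include="number").corr(method=method)


# Toy data (not our measurements) in which more screen time goes with fewer steps
toy = pd.DataFrame({
    "name": ["louise"] * 4,
    "Screen Time (Phone)": [2.0, 4.0, 6.0, 8.0],
    "Step Count": [9000, 7000, 5000, 3000],
    "Productivity": [5, 4, 3, 2],
})

corr = correlation_matrix(toy)
print(corr.round(2))
```

Because mood, fatigue, and productivity are self-rated scores, passing `method="spearman"` may be a better fit than the default. With only a few weeks of data per person, any coefficient should be read as a rough indication rather than a firm result.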
