# Exploratory Data Analysis – Diet & Sleep Study

This notebook begins the exploratory analysis of how dietary factors (e.g., macronutrient intake, fiber, meal timing) relate to sleep quality (e.g., deep sleep %, total sleep hours). The data are self-tracked: diet logs from MyFitnessPal and sleep metrics from a smartwatch.



In [10]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Optional: prettier plots (set_theme is the current name; sns.set is an alias for it)
sns.set_theme(style="whitegrid")

# Optional: make charts display inline in notebooks
%matplotlib inline


In [14]:
# Load the datasets
diet_df = pd.read_csv('data/diet_data_sample.csv')
sleep_df = pd.read_csv('data/sleep_data_sample.csv')

# Show first few rows
print("Diet Data:")
display(diet_df.head())

print("\nSleep Data:")
display(sleep_df.head())


Diet Data:


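| | Date | Calories | Protein (g) | Fat (g) | Carbohydrates (g) | Fiber (g) | Sugar (g) | Last Meal Hour |
|---|---|---|---|---|---|---|---|---|
| 0 | 2025-01-01 | 2349 | 88 | 80 | 311 | 28 | 46 | 22 |
| 1 | 2025-01-02 | 2674 | 102 | 63 | 272 | 26 | 43 | 19 |
| 2 | 2025-01-03 | 2273 | 61 | 44 | 228 | 22 | 55 | 18 |
| 3 | 2025-01-04 | 1928 | 69 | 92 | 241 | 31 | 29 | 21 |
| 4 | 2025-01-05 | 2037 | 92 | 53 | 265 | 25 | 46 | 20 |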



Sleep Data:


| | Date | Total Sleep (hrs) | Sleep Efficiency (%) | Deep Sleep (%) | REM Sleep (%) | Light Sleep (%) | Resting Heart Rate | Natural Wakeup |
|---|---|---|---|---|---|---|---|---|
| 0 | 2025-01-01 | 6.62 | 97.5 | 11.1 | 26.6 | 62.3 | 53.0 | True |
| 1 | 2025-01-02 | 8.06 | 85.0 | 14.7 | 19.6 | 65.7 | 55.7 | True |
| 2 | 2025-01-03 | 7.55 | 87.7 | 13.8 | 23.4 | 62.8 | 58.8 | True |
| 3 | 2025-01-04 | 6.59 | 90.5 | 15.1 | 25.8 | 59.1 | 55.1 | False |
| 4 | 2025-01-05 | 7.6 | 90.7 | 21.8 | 13.7 | 64.6 | 56.7 | True |
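
`pd.read_csv` does not parse dates unless asked, so `Date` arrives as plain strings. A minimal sketch of parsing it at load time follows. It inlines the first two diet rows shown above so that it runs without the CSV files, and the name `diet_sample` is only illustrative.

```python
import io

import pandas as pd

# First two diet rows from the preview above, inlined so the sketch is self-contained.
sample_csv = io.StringIO(
    "Date,Calories,Protein (g),Fat (g),Carbohydrates (g),Fiber (g),Sugar (g),Last Meal Hour\n"
    "2025-01-01,2349,88,80,311,28,46,22\n"
    "2025-01-02,2674,102,63,272,26,43,19\n"
)

# parse_dates converts the Date column to datetime64 while reading.
diet_sample = pd.read_csv(sample_csv, parse_dates=["Date"])

print(diet_sample.dtypes["Date"])
# With a real datetime column, the .dt accessor becomes available (e.g. weekday names).
print(diet_sample["Date"].dt.day_name().tolist())
```

Applying the same `parse_dates=["Date"]` argument to both real files would likely make a later date-based merge and any time series plots easier, though that has not been done in the cells here.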


In [12]:
# Check structure and basic statistics

print("📋 Diet Data Info:")
diet_df.info()
print("\n📊 Diet Data Summary:")
display(diet_df.describe())

print("\n📋 Sleep Data Info:")
sleep_df.info()
print("\n📊 Sleep Data Summary:")
display(sleep_df.describe())


📋 Diet Data Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 83 entries, 0 to 82
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   Date               83 non-null     object
 1   Calories           83 non-null     int64 
 2   Protein (g)        83 non-null     int64 
 3   Fat (g)            83 non-null     int64 
 4   Carbohydrates (g)  83 non-null     int64 
 5   Fiber (g)          83 non-null     int64 
 6   Sugar (g)          83 non-null     int64 
 7   Last Meal Hour     83 non-null     int64 
dtypes: int64(7), object(1)
memory usage: 5.3+ KB

📊 Diet Data Summary:


| | Calories | Protein (g) | Fat (g) | Carbohydrates (g) | Fiber (g) | Sugar (g) | Last Meal Hour |
|---|---|---|---|---|---|---|---|
| count | 83.0 | 83.0 | 83.0 | 83.0 | 83.0 | 83.0 | 83.0 |
| mean | 2230.939759 | 87.445783 | 69.771084 | 250.253012 | 29.421687 | 53.216867 | 20.216867 |
| std | 266.469343 | 13.812554 | 15.278173 | 35.03129 | 8.720764 | 15.875313 | 1.465441 |
| min | 1686.0 | 60.0 | 31.0 | 158.0 | 4.0 | 19.0 | 18.0 |
| 25% | 2028.0 | 77.0 | 58.0 | 227.5 | 23.5 | 43.0 | 19.0 |
| 50% | 2228.0 | 87.0 | 70.0 | 255.0 | 29.0 | 53.0 | 20.0 |
| 75% | 2345.5 | 100.0 | 79.5 | 272.0 | 35.5 | 62.5 | 22.0 |
| max | 2857.0 | 118.0 | 102.0 | 325.0 | 55.0 | 108.0 | 22.0 |



📋 Sleep Data Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 83 entries, 0 to 82
Data columns (total 8 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Date                  83 non-null     object 
 1   Total Sleep (hrs)     83 non-null     float64
 2   Sleep Efficiency (%)  83 non-null     float64
 3   Deep Sleep (%)        83 non-null     float64
 4   REM Sleep (%)         83 non-null     float64
 5   Light Sleep (%)       83 non-null     float64
 6   Resting Heart Rate    83 non-null     float64
 7   Natural Wakeup        83 non-null     bool   
dtypes: bool(1), float64(6), object(1)
memory usage: 4.7+ KB

📊 Sleep Data Summary:


| | Total Sleep (hrs) | Sleep Efficiency (%) | Deep Sleep (%) | REM Sleep (%) | Light Sleep (%) | Resting Heart Rate |
|---|---|---|---|---|---|---|
| count | 83.0 | 83.0 | 83.0 | 83.0 | 83.0 | 83.0 |
| mean | 7.518313 | 87.189157 | 18.821687 | 22.263855 | 58.907229 | 60.513253 |
| std | 0.978381 | 4.8773 | 5.183582 | 4.83689 | 6.997487 | 4.633436 |
| min | 4.85 | 74.5 | 5.6 | 10.9 | 41.4 | 53.0 |
| 25% | 6.92 | 83.9 | 14.8 | 19.4 | 54.45 | 57.35 |
| 50% | 7.56 | 87.2 | 18.9 | 22.4 | 59.6 | 60.1 |
| 75% | 8.185 | 90.0 | 23.15 | 25.25 | 63.25 | 63.2 |
| max | 9.77 | 98.4 | 30.9 | 35.2 | 76.3 | 72.8 |
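
Beyond `info()` and `describe()`, a couple of quick consistency checks may be worth running before any modelling. The sketch below uses only the five sleep rows previewed earlier, inlined so that it runs on its own. It checks that the three sleep-stage percentages add up to roughly 100 and that no date appears twice. The 0.5-point tolerance and the names `sleep_sample`, `stage_total` and `bad_rows` are assumptions made for illustration.

```python
import io

import pandas as pd

# The five sleep rows from the preview, inlined so the sketch is self-contained.
sleep_csv = io.StringIO(
    "Date,Total Sleep (hrs),Sleep Efficiency (%),Deep Sleep (%),REM Sleep (%),"
    "Light Sleep (%),Resting Heart Rate,Natural Wakeup\n"
    "2025-01-01,6.62,97.5,11.1,26.6,62.3,53.0,True\n"
    "2025-01-02,8.06,85.0,14.7,19.6,65.7,55.7,True\n"
    "2025-01-03,7.55,87.7,13.8,23.4,62.8,58.8,True\n"
    "2025-01-04,6.59,90.5,15.1,25.8,59.1,55.1,False\n"
    "2025-01-05,7.6,90.7,21.8,13.7,64.6,56.7,True\n"
)
sleep_sample = pd.read_csv(sleep_csv)

# Deep + REM + light sleep should cover (almost) the whole night.
stage_cols = ["Deep Sleep (%)", "REM Sleep (%)", "Light Sleep (%)"]
stage_total = sleep_sample[stage_cols].sum(axis=1)

TOLERANCE = 0.5  # assumed allowance for rounding, in percentage points
bad_rows = sleep_sample[(stage_total - 100).abs() > TOLERANCE]

# Each night should appear exactly once.
duplicate_dates = int(sleep_sample["Date"].duplicated().sum())

print("Rows whose stages do not sum to ~100%:", len(bad_rows))
print("Duplicated dates:", duplicate_dates)
```

Running the same checks on the full `sleep_df` would show whether the smartwatch export is internally consistent across all 83 nights.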


## 🔄 Next Steps

Here’s what I’ll work on next:

- Merge the diet and sleep datasets on `Date`
- Filter out days with missing values and days when an alarm was used (`Natural Wakeup` is `False`)
- Create new features (e.g., "Hours Before Sleep" from last meal time)
- Generate visualizations:
  - Time series trends
  - Correlation heatmap
  - Scatter plots and box plots
- Conduct first hypothesis test:
  - Fiber intake vs. deep sleep %
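
The merge, filter and feature steps listed above could look roughly like the following sketch. It runs on the five preview rows from each file (inlined here) instead of the full CSVs. The data contain no bedtime column, so `ASSUMED_BEDTIME_HOUR` is a purely hypothetical constant standing in for a real bedtime; the variable names are illustrative as well.

```python
import io

import pandas as pd

# Five preview rows from each dataset, inlined so the sketch is self-contained.
diet_csv = io.StringIO(
    "Date,Calories,Protein (g),Fat (g),Carbohydrates (g),Fiber (g),Sugar (g),Last Meal Hour\n"
    "2025-01-01,2349,88,80,311,28,46,22\n"
    "2025-01-02,2674,102,63,272,26,43,19\n"
    "2025-01-03,2273,61,44,228,22,55,18\n"
    "2025-01-04,1928,69,92,241,31,29,21\n"
    "2025-01-05,2037,92,53,265,25,46,20\n"
)
sleep_csv = io.StringIO(
    "Date,Total Sleep (hrs),Sleep Efficiency (%),Deep Sleep (%),REM Sleep (%),"
    "Light Sleep (%),Resting Heart Rate,Natural Wakeup\n"
    "2025-01-01,6.62,97.5,11.1,26.6,62.3,53.0,True\n"
    "2025-01-02,8.06,85.0,14.7,19.6,65.7,55.7,True\n"
    "2025-01-03,7.55,87.7,13.8,23.4,62.8,58.8,True\n"
    "2025-01-04,6.59,90.5,15.1,25.8,59.1,55.1,False\n"
    "2025-01-05,7.6,90.7,21.8,13.7,64.6,56.7,True\n"
)
diet = pd.read_csv(diet_csv, parse_dates=["Date"])
sleep = pd.read_csv(sleep_csv, parse_dates=["Date"])

# Inner join keeps only dates present in both files;
# validate raises an error if a date is duplicated on either side.
merged = diet.merge(sleep, on="Date", how="inner", validate="one_to_one")

# Drop rows with missing values, then keep only nights without an alarm.
clean = merged.dropna()
clean = clean[clean["Natural Wakeup"]].copy()

# Hypothetical fixed bedtime (23:00); replace with a real bedtime column if one exists.
ASSUMED_BEDTIME_HOUR = 23
clean["Hours Before Sleep"] = ASSUMED_BEDTIME_HOUR - clean["Last Meal Hour"]

print(clean[["Date", "Last Meal Hour", "Hours Before Sleep", "Deep Sleep (%)"]])
```

An inner join is used so that a day missing from either source is dropped instead of producing half-empty rows; an outer join with `indicator=True` would be the alternative if the goal were to audit which days are missing.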

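
For the fiber vs. deep sleep test, one option that needs nothing beyond NumPy is a permutation test on the Pearson correlation. This is only a sketch of one possible approach, not a decision about which test the analysis will use. The arrays hold the five preview rows, so the resulting numbers are not meaningful; the function name `permutation_corr_test` and its defaults are hypothetical.

```python
import numpy as np

# Fiber (g) and Deep Sleep (%) from the five preview rows; far too few for real inference.
fiber = np.array([28, 26, 22, 31, 25], dtype=float)
deep_sleep = np.array([11.1, 14.7, 13.8, 15.1, 21.8])


def permutation_corr_test(x, y, n_permutations=5000, seed=0):
    """Two-sided permutation test for the Pearson correlation of x and y."""
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(x, y)[0, 1]
    hits = 0
    for _ in range(n_permutations):
        # Shuffling y breaks any pairing with x, simulating "no association".
        shuffled = rng.permutation(y)
        if abs(np.corrcoef(x, shuffled)[0, 1]) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (hits + 1) / (n_permutations + 1)
    return observed, p_value


r, p = permutation_corr_test(fiber, deep_sleep)
print(f"Pearson r = {r:.3f}, permutation p = {p:.3f}")
```

On the full dataset the inputs would be the `Fiber (g)` and `Deep Sleep (%)` columns of the merged, filtered table. A rank-based statistic could be swapped in if outliers (such as the 108 g sugar day or the 4 g fiber day in the summaries) turn out to matter.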