# Exploratory Data Analysis – Diet & Sleep Study

This notebook starts the exploratory analysis of how dietary factors (e.g., macronutrient intake, fiber, meal timing) relate to sleep quality (e.g., deep sleep %, total sleep hours). The data are self-tracked, collected with MyFitnessPal and a smartwatch.



In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Optional: apply seaborn's whitegrid theme to all plots
sns.set_theme(style="whitegrid")

# Optional: make charts display inline in notebooks
%matplotlib inline


In [5]:
# Load the datasets
diet_df = pd.read_csv('data/diet_data_sample.csv')
sleep_df = pd.read_csv('data/sleep_data_sample.csv')

# Show first few rows
print("Diet Data:")
display(diet_df.head())

print("\nSleep Data:")
display(sleep_df.head())


Diet Data:


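| | date | calories | protein | fiber | carbs | fat | sugar | last_meal_time | sleep_hours | deep_sleep_pct | rem_sleep_pct | efficiency | resting_hr |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024-01-01 | 1835 | 93 | 19 | 250 | 70 | 29 | 20:00 | 6.2 | 21 | 19 | 88 | 56 |
| 1 | 2024-01-02 | 2191 | 65 | 36 | 274 | 61 | 40 | 19:00 | 8.2 | 17 | 20 | 94 | 57 |
| 2 | 2024-01-03 | 1716 | 96 | 38 | 188 | 55 | 40 | 19:00 | 8.4 | 16 | 22 | 91 | 63 |
| 3 | 2024-01-04 | 2400 | 93 | 15 | 238 | 73 | 52 | 18:00 | 8.3 | 17 | 27 | 91 | 65 |
| 4 | 2024-01-05 | 2038 | 121 | 18 | 200 | 59 | 47 | 19:00 | 8.0 | 19 | 22 | 95 | 70 |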



Sleep Data:


| | date | total_sleep_hours | sleep_efficiency | deep_sleep_pct | rem_sleep_pct | resting_hr | alarm_used |
|---|---|---|---|---|---|---|---|
| 0 | 2025-01-02 | 7.5 | 92 | 18 | 22 | 60 | False |
| 1 | 2025-01-03 | 6.8 | 88 | 15 | 25 | 62 | False |
| 2 | 2025-01-04 | 5.5 | 80 | 12 | 20 | 65 | True |
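The previews above hint at a problem for the planned merge: the diet rows start on 2024-01-01, while all three sleep rows fall in January 2025. Below is a quick sketch, assuming both `date` columns hold ISO `YYYY-MM-DD` strings as shown, that parses the dates and counts how many days the two files share. `summarize_date_coverage` is a hypothetical helper, not a pandas function, and the toy frames contain only the dates visible in the previews.

```python
import pandas as pd


def summarize_date_coverage(diet: pd.DataFrame, sleep: pd.DataFrame,
                            col: str = "date") -> dict:
    """Parse the date column of both frames and report ranges and overlap."""
    diet_dates = pd.to_datetime(diet[col])
    sleep_dates = pd.to_datetime(sleep[col])
    shared = sorted(set(diet_dates) & set(sleep_dates))  # dates present in both
    return {
        "diet_range": (diet_dates.min(), diet_dates.max()),
        "sleep_range": (sleep_dates.min(), sleep_dates.max()),
        "n_shared_dates": len(shared),
    }


# Toy frames holding only the dates shown in the two previews.
diet_preview = pd.DataFrame(
    {"date": pd.date_range("2024-01-01", periods=5).strftime("%Y-%m-%d")}
)
sleep_preview = pd.DataFrame({"date": ["2025-01-02", "2025-01-03", "2025-01-04"]})

coverage = summarize_date_coverage(diet_preview, sleep_preview)
print(coverage["n_shared_dates"])  # the previews share no dates
```

On the full data this would be `summarize_date_coverage(diet_df, sleep_df)`. A shared-date count of 0 would mean that a date-based merge finds no matching days until the date ranges are reconciled.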


In [6]:
# Check structure and basic statistics

print("📋 Diet Data Info:")
diet_df.info()
print("\n📊 Diet Data Summary:")
display(diet_df.describe())

print("\n📋 Sleep Data Info:")
sleep_df.info()
print("\n📊 Sleep Data Summary:")
display(sleep_df.describe())


📋 Diet Data Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   date            30 non-null     object 
 1   calories        30 non-null     int64  
 2   protein         30 non-null     int64  
 3   fiber           30 non-null     int64  
 4   carbs           30 non-null     int64  
 5   fat             30 non-null     int64  
 6   sugar           30 non-null     int64  
 7   last_meal_time  30 non-null     object 
 8   sleep_hours     30 non-null     float64
 9   deep_sleep_pct  30 non-null     int64  
 10  rem_sleep_pct   30 non-null     int64  
 11  efficiency      30 non-null     int64  
 12  resting_hr      30 non-null     int64  
dtypes: float64(1), int64(10), object(2)
memory usage: 3.2+ KB

📊 Diet Data Summary:


| | calories | protein | fiber | carbs | fat | sugar | sleep_hours | deep_sleep_pct | rem_sleep_pct | efficiency | resting_hr |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 30.0 | 30.0 | 30.0 | 30.0 | 30.0 | 30.0 | 30.0 | 30.0 | 30.0 | 30.0 | 30.0 |
| mean | 2137.833333 | 103.933333 | 25.5 | 230.033333 | 72.866667 | 39.633333 | 7.326667 | 20.0 | 22.2 | 89.7 | 62.166667 |
| std | 269.959395 | 26.13484 | 8.858076 | 35.525108 | 15.919481 | 11.484572 | 0.789558 | 2.852706 | 3.111713 | 2.961244 | 5.017784 |
| min | 1608.0 | 64.0 | 12.0 | 180.0 | 50.0 | 20.0 | 6.1 | 15.0 | 18.0 | 85.0 | 55.0 |
| 25% | 1965.75 | 86.75 | 18.25 | 197.25 | 61.0 | 29.0 | 6.8 | 17.25 | 20.0 | 87.0 | 58.0 |
| 50% | 2180.5 | 101.5 | 25.5 | 238.0 | 69.5 | 40.0 | 7.35 | 21.0 | 21.0 | 89.5 | 63.0 |
| 75% | 2371.25 | 129.5 | 33.75 | 250.75 | 87.0 | 48.0 | 8.0 | 22.0 | 25.0 | 92.0 | 65.75 |
| max | 2592.0 | 146.0 | 40.0 | 296.0 | 97.0 | 60.0 | 8.4 | 24.0 | 28.0 | 95.0 | 70.0 |



📋 Sleep Data Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   date               3 non-null      object 
 1   total_sleep_hours  3 non-null      float64
 2   sleep_efficiency   3 non-null      int64  
 3   deep_sleep_pct     3 non-null      int64  
 4   rem_sleep_pct      3 non-null      int64  
 5   resting_hr         3 non-null      int64  
 6   alarm_used         3 non-null      bool   
dtypes: bool(1), float64(1), int64(4), object(1)
memory usage: 279.0+ bytes

📊 Sleep Data Summary:


| | total_sleep_hours | sleep_efficiency | deep_sleep_pct | rem_sleep_pct | resting_hr |
|---|---|---|---|---|---|
| count | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 |
| mean | 6.6 | 86.666667 | 15.0 | 22.333333 | 62.333333 |
| std | 1.014889 | 6.110101 | 3.0 | 2.516611 | 2.516611 |
| min | 5.5 | 80.0 | 12.0 | 20.0 | 60.0 |
| 25% | 6.15 | 84.0 | 13.5 | 21.0 | 61.0 |
| 50% | 6.8 | 88.0 | 15.0 | 22.0 | 62.0 |
| 75% | 7.15 | 90.0 | 16.5 | 23.5 | 63.5 |
| max | 7.5 | 92.0 | 18.0 | 25.0 | 65.0 |
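The `info()` output also shows that the diet file already carries sleep metrics (`sleep_hours`, `efficiency`, `deep_sleep_pct`, `rem_sleep_pct`, `resting_hr`) that overlap with the sleep file, partly under different names. Below is one way to merge the two files without silently dropping either version. It assumes, without confirmation from the data, that `total_sleep_hours` and `sleep_efficiency` measure the same quantities as `sleep_hours` and `efficiency`. The rename map, the `_watch` suffix and `merge_diet_sleep` are all hypothetical. An outer merge with `indicator=True` keeps every day and records whether it came from one file or both, and the suffixes keep the two readings of each shared metric side by side.

```python
import pandas as pd

# Hypothetical mapping: assumes these sleep-file columns measure the same
# quantities as the diet file's sleep_hours / efficiency columns.
SLEEP_RENAME = {"total_sleep_hours": "sleep_hours", "sleep_efficiency": "efficiency"}


def merge_diet_sleep(diet: pd.DataFrame, sleep: pd.DataFrame) -> pd.DataFrame:
    """Outer-merge on parsed dates, keeping both versions of shared sleep metrics."""
    diet = diet.assign(date=pd.to_datetime(diet["date"]))
    sleep = sleep.rename(columns=SLEEP_RENAME).assign(
        date=lambda d: pd.to_datetime(d["date"])
    )
    return diet.merge(
        sleep,
        on="date",
        how="outer",                   # keep days that appear in only one file
        suffixes=("_diet", "_watch"),  # shared columns get a source suffix
        indicator=True,                # adds _merge: left_only / right_only / both
    )


# Toy frames: the first preview row of each file, trimmed to a few columns.
diet_row = pd.DataFrame({"date": ["2024-01-01"], "fiber": [19], "sleep_hours": [6.2],
                         "efficiency": [88], "deep_sleep_pct": [21]})
sleep_row = pd.DataFrame({"date": ["2025-01-02"], "total_sleep_hours": [7.5],
                          "sleep_efficiency": [92], "deep_sleep_pct": [18],
                          "alarm_used": [False]})

merged = merge_diet_sleep(diet_row, sleep_row)
print(merged["_merge"].value_counts())
```

On the real frames, `merge_diet_sleep(diet_df, sleep_df)["_merge"].value_counts()` shows how many days matched; a `both` count of 0 means the two files never overlap in time.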


## 🔄 Next Steps

Here’s what I’ll work on next:

- Merge diet and sleep datasets based on date
- Filter out days with missing values or days when an alarm was used
- Create new features (e.g., "Hours Before Sleep", the gap between the last meal time and bedtime)
- Generate visualizations:
  - Time series trends
  - Correlation heatmap
  - Scatter plots and box plots
- Conduct first hypothesis test:
  - Fiber intake vs. deep sleep %
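The filtering and feature steps could look roughly like this. Neither file, as previewed, contains a bedtime, so the "Hours Before Sleep" sketch assumes a hypothetical `bedtime` column in `HH:MM` format (for example, from the smartwatch export). The toy frame pairs the first three diet-preview meal times with the three sleep-preview `alarm_used` flags, and the bedtimes are made-up values for illustration only. `hours_before_sleep` and `filter_analysis_days` are hypothetical helpers. A bedtime that is earlier on the clock than the last meal is treated as falling after midnight.

```python
import pandas as pd


def hours_before_sleep(last_meal: pd.Series, bedtime: pd.Series) -> pd.Series:
    """Hours between the last meal and bedtime, both given as 'HH:MM' strings.

    If bedtime is earlier on the clock than the meal (e.g. 00:30 vs 19:00),
    it is assumed to fall after midnight, so 24 hours are added.
    """
    meal = pd.to_timedelta(last_meal + ":00")  # "20:00" -> 20 hours
    bed = pd.to_timedelta(bedtime + ":00")
    gap = (bed - meal) / pd.Timedelta(hours=1)  # float hours, may be negative
    return gap.where(gap >= 0, gap + 24)


def filter_analysis_days(df: pd.DataFrame) -> pd.DataFrame:
    """Drop days with any missing value, then days when an alarm was used."""
    complete = df.dropna()
    return complete[~complete["alarm_used"].astype(bool)]


toy = pd.DataFrame({
    "last_meal_time": ["20:00", "19:00", "19:00"],
    "bedtime": ["23:30", "00:30", "22:00"],  # hypothetical, illustrative only
    "alarm_used": [False, False, True],
})
toy["hours_before_sleep"] = hours_before_sleep(toy["last_meal_time"], toy["bedtime"])
analysis = filter_analysis_days(toy)
print(analysis)
```

Here the gaps come out as 3.5, 5.5 and 3.0 hours, and the third day is dropped because an alarm was used.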
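For the first hypothesis test (fiber intake vs. deep sleep %), a rank-based test is one reasonable option with only 30 days and no guarantee of linearity or normality; it is a possible approach, not a settled plan. The sketch computes Spearman's rho as the Pearson correlation of average ranks and derives a two-sided p-value by repeatedly shuffling one variable. `spearman_permutation_test` is a hypothetical helper that uses only NumPy and pandas; `scipy.stats.spearmanr` would give the same rho directly.

```python
import numpy as np
import pandas as pd


def spearman_permutation_test(x, y, n_permutations: int = 10_000, seed: int = 0):
    """Spearman's rho between x and y with a two-sided permutation p-value."""
    # Average ranks handle ties; Pearson r on ranks is Spearman's rho.
    x_rank = pd.Series(x, dtype=float).rank().to_numpy()
    y_rank = pd.Series(y, dtype=float).rank().to_numpy()
    observed = np.corrcoef(x_rank, y_rank)[0, 1]

    # Null distribution: break any x-y link by shuffling y's ranks.
    rng = np.random.default_rng(seed)
    null = np.array([
        np.corrcoef(x_rank, rng.permutation(y_rank))[0, 1]
        for _ in range(n_permutations)
    ])

    # Two-sided p-value with the +1 correction so it is never exactly zero.
    p_value = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_permutations + 1)
    return observed, p_value
```

Because the diet file already contains both columns, this could be run as `rho, p = spearman_permutation_test(diet_df["fiber"], diet_df["deep_sleep_pct"])`. Shuffling assumes the days are exchangeable, which nightly self-tracked data may satisfy only approximately.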
