# Deconstructing the Fitbit Sleep Score

In this project I will use several machine learning models to better understand how Fitbit computes the sleep score it provides to its users.

First, I will import some sleep score data and visualise it in different ways. Afterwards, I will apply machine learning models to the data to find patterns and, ultimately, to predict them.

In [2]:
# Import all relevant libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

In [3]:
# Import the sleep data
sleep_score = pd.read_csv('sleep_score.csv')
sleep_stats = pd.read_csv('sleep_stats.csv')

In [6]:
# Inspect the sleep score DataFrame
sleep_score.head()

Unnamed: 0,timestamp,overall_score,composition_score,revitalization_score,duration_score,deep_sleep_in_minutes,resting_heart_rate,restlessness
0,2020-07-02T06:23:30Z,86,21,22,43,90,59,0.059426
1,2020-07-01T06:03:30Z,77,21,21,35,125,61,0.091463
2,2020-06-30T05:57:00Z,78,20,22,36,79,60,0.058201
3,2020-06-29T06:05:00Z,76,20,22,34,75,61,0.067885
4,2020-06-28T09:20:30Z,82,20,20,42,126,62,0.097103


In [7]:
# Inspect the sleep stats DataFrame
sleep_stats.head()

Unnamed: 0,Sleep,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8
0,Start Time,End Time,Minutes Asleep,Minutes Awake,Number of Awakenings,Time in Bed,Minutes REM Sleep,Minutes Light Sleep,Minutes Deep Sleep
1,2020-07-01 10:05PM,2020-07-02 6:23AM,456,42,37,498,94,271,91
2,2020-06-30 9:43PM,2020-07-01 6:03AM,412,88,32,500,79,208,125
3,2020-06-29 10:03PM,2020-06-30 5:57AM,412,61,26,473,91,242,79
4,2020-06-28 11:24PM,2020-06-29 6:05AM,342,59,26,401,71,196,75


Something went wrong with the import here: the real column headers ended up in the first row of data. Let's fix that.

In [11]:
# Promote the first row to column headers, then drop that row from the data
sleep_stats.columns = sleep_stats.iloc[0]
sleep_stat = sleep_stats.drop(sleep_stats.index[0])
sleep_stat

Unnamed: 0,Start Time,End Time,Minutes Asleep,Minutes Awake,Number of Awakenings,Time in Bed,Minutes REM Sleep,Minutes Light Sleep,Minutes Deep Sleep
1,2020-07-01 10:05PM,2020-07-02 6:23AM,456,42,37,498,94,271,91
2,2020-06-30 9:43PM,2020-07-01 6:03AM,412,88,32,500,79,208,125
3,2020-06-29 10:03PM,2020-06-30 5:57AM,412,61,26,473,91,242,79
4,2020-06-28 11:24PM,2020-06-29 6:05AM,342,59,26,401,71,196,75
5,2020-06-27 10:42PM,2020-06-28 9:20AM,530,108,39,638,98,305,127
...,...,...,...,...,...,...,...,...,...
318,2019-07-12 11:11PM,2019-07-13 7:05AM,423,51,28,474,89,263,71
319,2019-07-11 9:58PM,2019-07-12 8:23AM,540,85,30,625,114,324,102
320,2019-07-10 9:43PM,2019-07-11 7:32AM,525,64,31,589,93,322,110
321,2019-07-09 9:12PM,2019-07-10 7:31AM,536,83,38,619,124,336,76
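
As an alternative to renaming the columns after the fact, the header row can be selected at import time. The sketch below assumes the CSV starts with a title line containing only "Sleep" followed by the real header line, which is what the output above suggests; the in-memory `raw` text is a hypothetical stand-in for `sleep_stats.csv`. A side effect of this approach is that pandas infers numeric dtypes for the minute columns instead of leaving them as strings.

```python
import io
import pandas as pd

# Hypothetical stand-in for sleep_stats.csv: a title row ("Sleep") above the real header
raw = io.StringIO(
    "Sleep,,,\n"
    "Start Time,End Time,Minutes Asleep,Minutes Awake\n"
    "2020-07-01 10:05PM,2020-07-02 6:23AM,456,42\n"
    "2020-06-30 9:43PM,2020-07-01 6:03AM,412,88\n"
)

# header=1 uses the second line of the file as column names and skips the line above it
stats = pd.read_csv(raw, header=1)

print(stats.columns.tolist())
print(stats.dtypes)
```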


In [12]:
# Check for NaNs
sleep_score.isna().any(), sleep_stat.isna().any()

(timestamp                False
 overall_score            False
 composition_score        False
 revitalization_score     False
 duration_score           False
 deep_sleep_in_minutes    False
 resting_heart_rate       False
 restlessness             False
 dtype: bool,
 0
 Start Time              False
 End Time                False
 Minutes Asleep          False
 Minutes Awake           False
 Number of Awakenings    False
 Time in Bed             False
 Minutes REM Sleep        True
 Minutes Light Sleep      True
 Minutes Deep Sleep       True
 dtype: bool)

The sleep_stat DataFrame has NaNs in the three sleep stage columns (REM, light and deep sleep), so we will drop the rows with missing values.

In [14]:
# Drop rows with missing values
sleep_stat.dropna(axis=0, inplace=True)
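
Before dropping rows it can help to know how many are affected. The following is a minimal sketch on a hypothetical three-row DataFrame (`df` and its values are made up for illustration, not taken from the real data); it counts the NaNs per column and reports how many rows `dropna` removes.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of sleep_stat with one incomplete night
df = pd.DataFrame({
    "Minutes Asleep": [456, 412, 300],
    "Minutes REM Sleep": [94, 79, np.nan],
    "Minutes Deep Sleep": [91, 125, np.nan],
})

missing_per_column = df.isna().sum()  # number of NaNs in each column
cleaned = df.dropna(axis=0)           # new DataFrame without the incomplete rows
rows_dropped = len(df) - len(cleaned)

print(missing_per_column)
print(f"Dropped {rows_dropped} of {len(df)} rows")
```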

I want to merge the two DataFrames on the date. As of now, there are slight differences in how the times are displayed. The sleep_stat DataFrame has both start and end times, but what we care about are the end dates (those are always the dates on which the sleep score is provided). I will drop the start time column, extract the dates so that they are in the same format across the two DataFrames and then merge the DataFrames on the date.

In [None]:
# Drop start time column from sleep_stat
sleep_stat.drop(columns='Start Time', inplace=True)

In [22]:
# Separate date into new column
sleep_stat['Date'] = sleep_stat['End Time'].apply(lambda x: x[:10])
sleep_score['Date'] = sleep_score.timestamp.apply(lambda x: x[:10])
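
Slicing the first ten characters works because both columns happen to start with a `YYYY-MM-DD` date. A slightly more defensive variant, sketched here on two hypothetical rows copied from the tables above, parses the strings with `pd.to_datetime` first, so a malformed value raises an error instead of silently producing a wrong key. The format string for `End Time` is an assumption based on the values shown.

```python
import pandas as pd

# Two sample rows in the formats shown above (illustrative only)
stats = pd.DataFrame({"End Time": ["2020-07-02 6:23AM", "2020-07-01 6:03AM"]})
score = pd.DataFrame({"timestamp": ["2020-07-02T06:23:30Z", "2020-07-01T06:03:30Z"]})

# Parse each column, then format both as the same YYYY-MM-DD string
stats["Date"] = pd.to_datetime(
    stats["End Time"], format="%Y-%m-%d %I:%M%p"
).dt.strftime("%Y-%m-%d")
score["Date"] = pd.to_datetime(score["timestamp"], utc=True).dt.strftime("%Y-%m-%d")

print(stats["Date"].tolist())
print(score["Date"].tolist())
```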

In [24]:
# Merge the two DataFrames
joined_sleep = sleep_stat.merge(sleep_score, on='Date', how='left')
joined_sleep.head()

Unnamed: 0,End Time,Minutes Asleep,Minutes Awake,Number of Awakenings,Time in Bed,Minutes REM Sleep,Minutes Light Sleep,Minutes Deep Sleep,Date,timestamp,overall_score,composition_score,revitalization_score,duration_score,deep_sleep_in_minutes,resting_heart_rate,restlessness
0,2020-07-02 6:23AM,456,42,37,498,94,271,91,2020-07-02,2020-07-02T06:23:30Z,86.0,21.0,22.0,43.0,90.0,59.0,0.059426
1,2020-07-01 6:03AM,412,88,32,500,79,208,125,2020-07-01,2020-07-01T06:03:30Z,77.0,21.0,21.0,35.0,125.0,61.0,0.091463
2,2020-06-30 5:57AM,412,61,26,473,91,242,79,2020-06-30,2020-06-30T05:57:00Z,78.0,20.0,22.0,36.0,79.0,60.0,0.058201
3,2020-06-29 6:05AM,342,59,26,401,71,196,75,2020-06-29,2020-06-29T06:05:00Z,76.0,20.0,22.0,34.0,75.0,61.0,0.067885
4,2020-06-28 9:20AM,530,108,39,638,98,305,127,2020-06-28,2020-06-28T09:20:30Z,82.0,20.0,20.0,42.0,126.0,62.0,0.097103
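
Because this is a left join, any night in sleep_stat without a matching sleep score gets NaN in the score columns, which is also why the integer scores now display as floats. A minimal sketch for checking this, using hypothetical miniature DataFrames, relies on the `indicator` and `validate` parameters of `merge`. Note that `validate='many_to_one'` raises a `MergeError` if a date occurs more than once in the score table, which may or may not hold for the real data.

```python
import pandas as pd

# Hypothetical miniatures: the third night has no sleep score
stats = pd.DataFrame({
    "Date": ["2020-07-02", "2020-07-01", "2020-06-30"],
    "Minutes Asleep": [456, 412, 412],
})
score = pd.DataFrame({
    "Date": ["2020-07-02", "2020-07-01"],
    "overall_score": [86, 77],
})

# indicator=True adds a "_merge" column telling where each row came from
merged = stats.merge(
    score, on="Date", how="left", indicator=True, validate="many_to_one"
)
unmatched = merged[merged["_merge"] == "left_only"]

print(merged)
print(f"{len(unmatched)} night(s) without a sleep score")
```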


To get the merged DataFrame into the order I prefer, I will drop the End Time and timestamp columns and rearrange the remaining columns so that Date comes first.

In [33]:
# Drop redundant columns
sleep_data = joined_sleep.drop(columns=['End Time', 'timestamp'])

In [34]:
# Get a list of the columns
cols = sleep_data.columns.tolist()
cols

['Minutes Asleep',
 'Minutes Awake',
 'Number of Awakenings',
 'Time in Bed',
 'Minutes REM Sleep',
 'Minutes Light Sleep',
 'Minutes Deep Sleep',
 'Date',
 'overall_score',
 'composition_score',
 'revitalization_score',
 'duration_score',
 'deep_sleep_in_minutes',
 'resting_heart_rate',
 'restlessness']

In [35]:
# Rearrange the columns
new_cols = [cols[7]] + cols[:7] + cols[8:]
new_cols

['Date',
 'Minutes Asleep',
 'Minutes Awake',
 'Number of Awakenings',
 'Time in Bed',
 'Minutes REM Sleep',
 'Minutes Light Sleep',
 'Minutes Deep Sleep',
 'overall_score',
 'composition_score',
 'revitalization_score',
 'duration_score',
 'deep_sleep_in_minutes',
 'resting_heart_rate',
 'restlessness']

In [36]:
# Reorder the DataFrame
sleep = sleep_data[new_cols]
sleep.head()

Unnamed: 0,Date,Minutes Asleep,Minutes Awake,Number of Awakenings,Time in Bed,Minutes REM Sleep,Minutes Light Sleep,Minutes Deep Sleep,overall_score,composition_score,revitalization_score,duration_score,deep_sleep_in_minutes,resting_heart_rate,restlessness
0,2020-07-02,456,42,37,498,94,271,91,86.0,21.0,22.0,43.0,90.0,59.0,0.059426
1,2020-07-01,412,88,32,500,79,208,125,77.0,21.0,21.0,35.0,125.0,61.0,0.091463
2,2020-06-30,412,61,26,473,91,242,79,78.0,20.0,22.0,36.0,79.0,60.0,0.058201
3,2020-06-29,342,59,26,401,71,196,75,76.0,20.0,22.0,34.0,75.0,61.0,0.067885
4,2020-06-28,530,108,39,638,98,305,127,82.0,20.0,20.0,42.0,126.0,62.0,0.097103
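
The index-based reordering above depends on Date sitting at position 7. A position-independent variant, sketched here on a hypothetical one-row DataFrame, builds the new order by name instead.

```python
import pandas as pd

# Hypothetical one-row stand-in for sleep_data
df = pd.DataFrame({
    "Minutes Asleep": [456],
    "Date": ["2020-07-02"],
    "overall_score": [86.0],
})

# Put "Date" first and keep every other column in its existing order
ordered = df[["Date"] + [c for c in df.columns if c != "Date"]]

print(ordered.columns.tolist())
```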
