# Gather Fitbit Data from API

Using a Python module (fitbit) that streamlines requests to the Fitbit API, the following code requests the data Fitbit has recorded for these metrics:

- steps walked
- floors climbed
- resting heart rate
- exercise data
- sleep data

Data are gathered for each metric, processed, and combined into a single tidy dataframe.

In [1]:
import fitbit
import pandas as pd 
import datetime
import time

In [2]:
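key = 'XXXXXX'
secret = 'XXXXXXXXXXXXXXXXXXXXX'
access_token_ = 'XXXXXXXXXXXXXXXXXXXXXXXXXX'
refresh_token_ = 'XXXXXXXXXXXXXXXXXXXXXXXXXX'

# authenticated client used for every request below
authd_client = fitbit.Fitbit(key, secret,
                             access_token=access_token_,
                             refresh_token=refresh_token_)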

All data from Fitbit are returned in JSON format. Some responses can be read directly into a pandas dataframe; others are nested and need to be flattened first. This helper function unnests the JSON data so it can be stored and read in a tabular format.

In [5]:
def flatten_json(nested_json, exclude=('',)):
    """Flatten json object with nested keys into a single level.
        Args:
            nested_json: A nested json object.
            exclude: Keys to exclude from output.
        Returns:
            A flat dict whose keys are the nested key paths joined by underscores.
    """
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            # descend into each key, skipping any excluded ones
            for a in x:
                if a not in exclude:
                    flatten(x[a], name + a + '_')
        elif isinstance(x, list):
            # list items are keyed by their position
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            # leaf value: drop the trailing underscore from the key path
            out[name[:-1]] = x

    flatten(nested_json)
    return out
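
To make the flattening concrete, here is a minimal sketch of what it does to one nested record. The `flatten_record` function is a condensed stand-in for the helper above, and the `sample` record is invented for illustration; it is only loosely shaped like a day of heart rate data and is not an actual Fitbit response.

```python
def flatten_record(nested, prefix=""):
    """Condensed stand-in for the helper above: join nested keys with underscores."""
    flat = {}
    if isinstance(nested, dict):
        for key, value in nested.items():
            flat.update(flatten_record(value, prefix + key + "_"))
    elif isinstance(nested, list):
        # list items are keyed by their position in the list
        for position, value in enumerate(nested):
            flat.update(flatten_record(value, prefix + str(position) + "_"))
    else:
        # leaf value: drop the trailing underscore from the key path
        flat[prefix[:-1]] = nested
    return flat


# Hypothetical record (assumed shape, not real API output)
sample = {
    "dateTime": "2018-02-10",
    "value": {
        "restingHeartRate": 59,
        "heartRateZones": [{"name": "Fat Burn", "minutes": 30}],
    },
}

flat = flatten_record(sample)
print(flat)
```

Each nested path becomes a single column name, such as `value_restingHeartRate`, which is the form the heart rate cell later selects.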

## Import step data

In [6]:
fit_statsSteps = authd_client.time_series('activities/steps', 
                                          base_date = '2018-02-10', 
                                          end_date='2019-10-26')

time_list = []
val_list = []

for i in fit_statsSteps['activities-steps']:
    val_list.append(i['value'])
    time_list.append(i['dateTime'])

stepsdf = pd.DataFrame({'Steps':val_list, 'dateTime':time_list})
stepsdf.columns = ['steps', 'date']
stepsdf = stepsdf.set_index(stepsdf['date'])

stepsdf

date (index),steps,date
2018-02-10,9430,2018-02-10
2018-02-11,3780,2018-02-11
2018-02-12,15749,2018-02-12
2018-02-13,16648,2018-02-13
2018-02-14,14360,2018-02-14
...,...,...
2019-10-22,5152,2019-10-22
2019-10-23,13279,2019-10-23
2019-10-24,11843,2019-10-24
2019-10-25,8662,2019-10-25


## Import floors data

In [7]:
fit_statsFloors = authd_client.time_series('activities/floors', 
                                           base_date = '2018-02-10', 
                                           end_date='2019-10-26')

time_list = []
val_list = []

for i in fit_statsFloors['activities-floors']:
    val_list.append(i['value'])
    time_list.append(i['dateTime'])

floorsdf = pd.DataFrame({'Floors':val_list, 'dateTime':time_list})
floorsdf.columns = ['floors', 'date']
floorsdf = floorsdf.set_index(floorsdf['date'])

floorsdf

date (index),floors,date
2018-02-10,2,2018-02-10
2018-02-11,2,2018-02-11
2018-02-12,37,2018-02-12
2018-02-13,43,2018-02-13
2018-02-14,32,2018-02-14
...,...,...
2019-10-22,2,2019-10-22
2019-10-23,37,2019-10-23
2019-10-24,38,2019-10-24
2019-10-25,26,2019-10-25
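
The steps and floors cells follow the same pattern, so the shared logic could be pulled into one helper. The sketch below is one possible way to do that, not part of the original notebook: `series_to_frame` is a hypothetical name, and the `fake_response` dict is invented test data that assumes each entry carries a `dateTime` and a `value` key, as the loops above do.

```python
import pandas as pd


def series_to_frame(response, resource_key, column):
    """Turn a daily time series response into a dataframe indexed by date."""
    records = response[resource_key]
    frame = pd.DataFrame({
        # to_numeric handles values whether they arrive as strings or numbers
        column: pd.to_numeric([r["value"] for r in records]),
        "date": [r["dateTime"] for r in records],
    })
    # keep 'date' as a column too, matching the frames built above
    return frame.set_index("date", drop=False)


# Invented response for illustration (assumed shape, not real API output)
fake_response = {
    "activities-steps": [
        {"dateTime": "2018-02-10", "value": "9430"},
        {"dateTime": "2018-02-11", "value": "3780"},
    ]
}

frame = series_to_frame(fake_response, "activities-steps", "steps")
print(frame)
```

With a helper like this, the floors cell would reduce to a single call with `'activities-floors'` and `'floors'` as arguments.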


## Import sleep data

Sleep data cannot be requested for periods longer than 100 days at a time. Thus, I created two lists of start and end dates covering the time period (2018-02-10:2019-10-26) in three-month (just under 100-day) periods. Python's built-in zip then pairs each start date with its end date. A for loop requests each range, parses the JSON into a pandas dataframe, and adds the result to a final, complete dataframe covering every date in the total range.

In [8]:
# create a range of ~3 month (under 100 day) periods over the data period
date_ranges = pd.date_range(start='2018-01-31', end='2019-11-01', freq=pd.offsets.MonthEnd(3))

# pair each period start with the following period end
start_dates = date_ranges[0:7]
end_dates = date_ranges[1:8]

sleep_frames = []

for (a, b) in zip(start_dates, end_dates):

    fit_statsSleep = authd_client.time_series('sleep', 
                                              base_date = a, 
                                              end_date= b)
    
    sleep_frames.append(pd.DataFrame([flatten_json(x) for x in fit_statsSleep['sleep']]))
    # inserts a small delay between requests to lighten load on server
    time.sleep(1)

# DataFrame.append was removed in pandas 2.0, so combine the pieces with pd.concat
sleepdf = pd.concat(sleep_frames)

# trim to only desired columns of data
sleepdf = sleepdf[['dateOfSleep','startTime','endTime','duration','efficiency',
                   'minutesAsleep','minutesAwake','timeInBed']]

# clean up date and column names
sleepdf = sleepdf.rename(columns={'dateOfSleep': 'date',
                            'startTime':'sleep_start_time',
                            'endTime':'sleep_end_time',
                            'duration':'sleep_duration',
                            'minutesAsleep':'asleep_min',
                            'minutesAwake':'awake_min'})

sleepdf = sleepdf.set_index(sleepdf['date'])

sleepdf

date (index),date,sleep_start_time,sleep_end_time,sleep_duration,efficiency,asleep_min,awake_min,timeInBed
2018-04-30,2018-04-30,2018-04-29T22:35:00.000,2018-04-30T06:36:30.000,28860000,96,463,18,481
2018-04-29,2018-04-29,2018-04-29T13:41:00.000,2018-04-29T15:06:30.000,5100000,89,76,9,85
2018-04-28,2018-04-28,2018-04-27T23:42:30.000,2018-04-28T09:28:30.000,35160000,96,549,22,586
2018-04-27,2018-04-27,2018-04-27T19:37:30.000,2018-04-27T20:43:30.000,3960000,95,62,3,66
2018-04-27,2018-04-27,2018-04-26T22:00:00.000,2018-04-27T06:20:30.000,30000000,97,487,13,500
...,...,...,...,...,...,...,...,...
2019-08-04,2019-08-04,2019-08-03T22:07:00.000,2019-08-04T10:08:00.000,43260000,93,670,51,721
2019-08-03,2019-08-03,2019-08-02T22:56:00.000,2019-08-03T09:14:00.000,37080000,97,597,21,618
2019-08-02,2019-08-02,2019-08-01T23:34:30.000,2019-08-02T08:46:00.000,33060000,92,508,43,551
2019-08-01,2019-08-01,2019-07-31T22:59:30.000,2019-08-01T09:55:30.000,39360000,95,623,33,656
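
The month-end ranges above share their boundary dates, since each end date is also the next start date. As an alternative, the windows could be generated so that they never overlap. This is a minimal sketch under assumptions: `date_windows` is a hypothetical helper, and the 90-day default is simply a conservative choice below the 100-day limit mentioned above.

```python
from datetime import date, timedelta


def date_windows(start, end, max_days=90):
    """Split [start, end] into consecutive, non-overlapping windows of at most max_days days."""
    windows = []
    cursor = start
    while cursor <= end:
        # a window of max_days days, counting both endpoints
        window_end = min(cursor + timedelta(days=max_days - 1), end)
        windows.append((cursor, window_end))
        # the next window starts the day after this one ends
        cursor = window_end + timedelta(days=1)
    return windows


windows = date_windows(date(2018, 2, 10), date(2019, 10, 26))
for window_start, window_end in windows:
    print(window_start.isoformat(), window_end.isoformat())
```

Each pair could then be passed as `base_date` and `end_date` in the request loop, which would avoid fetching the boundary days twice.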


## Import heart rate data

In [9]:
df = authd_client.time_series('activities/heart', 
                              base_date = '2018-02-10', 
                              end_date='2019-10-26')

hrdf = pd.DataFrame([flatten_json(x) for x in df['activities-heart']])

# clean up date and column names
hrdf = hrdf[['dateTime','value_restingHeartRate']]
hrdf = hrdf.rename(columns={'dateTime': 'date',
                            'value_restingHeartRate':'rest_avg_hr'})

hrdf = hrdf.set_index(hrdf['date'])

hrdf

date (index),date,rest_avg_hr
2018-02-10,2018-02-10,59.0
2018-02-11,2018-02-11,62.0
2018-02-12,2018-02-12,62.0
2018-02-13,2018-02-13,60.0
2018-02-14,2018-02-14,60.0
...,...,...
2019-10-22,2019-10-22,58.0
2019-10-23,2019-10-23,55.0
2019-10-24,2019-10-24,55.0
2019-10-25,2019-10-25,56.0


## Combine Data

In [10]:
dfs = [stepsdf, floorsdf, hrdf, sleepdf]
dfs = [df.set_index('date') for df in dfs]
fitbit_df = pd.DataFrame().join(dfs, how="outer")
fitbit_df

date (index),steps,floors,rest_avg_hr,sleep_start_time,sleep_end_time,sleep_duration,efficiency,asleep_min,awake_min,timeInBed
2018-02-10,9430,2,59.0,2018-02-09T22:14:30.000,2018-02-10T09:12:00.000,39420000.0,98.0,643.0,14.0,657.0
2018-02-11,3780,2,62.0,2018-02-10T22:19:00.000,2018-02-11T09:19:30.000,39600000.0,93.0,615.0,44.0,660.0
2018-02-12,15749,37,62.0,,,,,,,
2018-02-13,16648,43,60.0,2018-02-12T22:34:30.000,2018-02-13T06:21:30.000,28020000.0,93.0,434.0,33.0,467.0
2018-02-14,14360,32,60.0,2018-02-13T21:52:00.000,2018-02-14T06:24:30.000,30720000.0,98.0,502.0,10.0,512.0
...,...,...,...,...,...,...,...,...,...,...
2019-10-23,13279,37,55.0,2019-10-22T22:26:00.000,2019-10-23T06:35:30.000,29340000.0,94.0,461.0,28.0,489.0
2019-10-24,11843,38,55.0,,,,,,,
2019-10-25,8662,26,56.0,2019-10-24T22:16:30.000,2019-10-25T06:26:00.000,29340000.0,95.0,466.0,23.0,489.0
2019-10-26,12028,3,57.0,2019-10-25T22:55:00.000,2019-10-26T08:12:00.000,33420000.0,95.0,529.0,28.0,557.0
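
The empty sleep columns on dates such as 2018-02-12 come from the outer join: every date present in any frame is kept, and frames with no record for that date contribute missing values. The sketch below shows this on two small frames built from the first three rows of the table above, joining only two frames for brevity.

```python
import pandas as pd

# first three days of steps, taken from the combined table above
steps = pd.DataFrame({
    "date": ["2018-02-10", "2018-02-11", "2018-02-12"],
    "steps": [9430, 3780, 15749],
}).set_index("date")

# sleep has no record for 2018-02-12
sleep = pd.DataFrame({
    "date": ["2018-02-10", "2018-02-11"],
    "asleep_min": [643, 615],
}).set_index("date")

# an outer join keeps all three dates; the missing sleep value becomes NaN
combined = steps.join(sleep, how="outer")
print(combined)
```

The missing value also explains why the sleep columns appear as floats (643.0) in the combined table: an integer column that gains a NaN is converted to float.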


In [11]:
fitbit_df.to_csv('fitbit_clean.csv', index=True)