# Overview
My Fitbit history data was exported from my Fitbit Dashboard.  
The goal is to use this data to understand personal fitness habits and trends.

Date Range of Data
* I purchased my first device on Tuesday, August 20, 2019.
* I purchased my second device on Monday, June 21, 2021.
* I exported my data on Saturday, July 17, 2021.

For now, the data I will work with is from 2020 as it is the only complete year of data available. 

Directory Structure of the Fitbit Data:    
/Physical Activity

From here I create separate directories for each data set type.
* Steps
* Distance
* Calories
* etc.

There are many separate JSON files for each data set. 
For example, one steps filename is steps-2019-08-21.json.
* I collect all files with this name structure and move them into a new, separate directory.
* Then, I load each individual JSON file into its own dataframe.
* Then, I append each of these dataframes to a list.
* Finally, I concatenate that list into a single dataframe.

This process is repeated for all different data sets/directories.
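The load-and-concatenate steps can be sketched as one reusable function instead of a copy of the loop per directory. This is a minimal sketch, not the notebook's exact code: the helper name `load_fitbit_dataset` is hypothetical, and the two sample files are made-up records in the `dateTime`/`value` layout that the later cells assume.

```python
import json
import os
import tempfile

import pandas as pd


def load_fitbit_dataset(directory: str) -> pd.DataFrame:
    """Hypothetical helper: read every JSON file in `directory` into one DataFrame."""
    frames = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".json"):  # skip anything that is not an export file
            frames.append(pd.read_json(os.path.join(directory, name)))
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# Made-up sample files in the assumed layout: a list of {dateTime, value} records
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "steps-2020-01-01.json": [{"dateTime": "01/01/20 08:00:00", "value": "120"}],
        "steps-2020-01-02.json": [
            {"dateTime": "01/02/20 08:00:00", "value": "95"},
            {"dateTime": "01/02/20 08:01:00", "value": "40"},
        ],
    }
    for fname, records in samples.items():
        with open(os.path.join(tmp, fname), "w") as fh:
            json.dump(records, fh)
    df = load_fitbit_dataset(tmp)

print(df)
```

With a helper like this, each data set becomes one call, e.g. `df_steps = load_fitbit_dataset("steps_data")`.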



## Resources

https://medium.com/analytics-vidhya/exploring-your-fitbit-sleep-data-with-python-pandas-and-seaborn-in-jupyter-notebook-a997f17c3a42

https://towardsdatascience.com/formating-and-visualizing-time-series-data-ba0b2548f27b

https://github.com/CoreyMSchafer/code_snippets/blob/master/Python/Pandas/10-Datetime-Timeseries/Pandas-Demo.ipynb

https://github.com/soumilshah1995/Data-Analysis-Over-10-years-of-hourly-energy-consumption-data-from-PJM-in-Megawatts/blob/master/Exp3%20.ipynb


# Import Packages

In [None]:
import os
import glob
import shutil

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pandas.plotting import register_matplotlib_converters

from IPython.core.interactiveshell import InteractiveShell

# Display the output of every expression in a cell, not just the last one
InteractiveShell.ast_node_interactivity = "all"

# Register pandas date converters so matplotlib can plot datetime indexes
register_matplotlib_converters()

%matplotlib inline

# Data Wrangling

In [None]:
os.getcwd()

In [None]:
os.chdir('c:\\Users\\Brandi\\Documents\\my_fitbit_data\\Physical_Activity')
os.getcwd()

In [None]:
# Run only once.
# List the unique file-name prefixes (file names without the date suffix).

# filenames = os.listdir()

# print(f"Parsing {len(filenames)} files for unique types.")
# unique_filenames = set()
# for f in filenames:
#     unique_filenames.add(f.split("-")[0])
# print(f"Found {len(unique_filenames)} unique types.")
# for name in sorted(unique_filenames):
#     print(name)

In [None]:
# Run only once.
# For each data set, create a new directory (relative path)
# and move the similarly named files into it.

# prefix_to_dir = {
#     "steps": "steps_data",
#     "distance": "distance_data",
#     "calories": "calories_data",
#     "heart_rate": "heart_rate_data",
#     "lightly": "lightly_active_minutes_data",
#     "moderately": "moderately_active_minutes_data",
#     "sedentary": "sedentary_minutes_data",
#     "very_active": "very_active_minutes_data",
#     "time_in_heart": "time_in_heart_rate_zones_data",
# }
# for prefix, dir_name in prefix_to_dir.items():
#     dest_dir = os.path.join(".", dir_name)
#     os.makedirs(dest_dir, exist_ok=True)
#     for file in glob.glob(f"{prefix}*"):
#         shutil.move(file, dest_dir)

# Load Data

In [None]:
# Load every JSON file into a list of dataframes
dfs = []
for file in os.listdir("steps_data"):
    dfs.append(pd.read_json(f"steps_data/{file}"))

# Concatenate the list into one dataframe
df_steps = pd.concat(dfs)

In [None]:
dfs = []
for file in os.listdir("distance_data"):
    dfs.append(pd.read_json(f"distance_data/{file}"))

df_distance = pd.concat(dfs)

In [None]:
dfs = []
for file in os.listdir("calories_data"):
    dfs.append(pd.read_json(f"calories_data/{file}"))

df_calories = pd.concat(dfs)

# Cleaning Data

In [None]:
# Make sure dateTime is a datetime type (a no-op if pd.read_json already parsed it),
# then use it as the index and sort chronologically
df_calories["dateTime"] = pd.to_datetime(df_calories["dateTime"])
df_calories.set_index("dateTime", drop=True, inplace=True)
df_calories.sort_index(inplace=True)

df_distance["dateTime"] = pd.to_datetime(df_distance["dateTime"])
df_distance.set_index("dateTime", drop=True, inplace=True)
df_distance.sort_index(inplace=True)

df_steps["dateTime"] = pd.to_datetime(df_steps["dateTime"])
df_steps.set_index("dateTime", drop=True, inplace=True)
df_steps.sort_index(inplace=True)

In [None]:
# create new columns from datetime index for visualizations
df_calories["year"] = df_calories.index.year
df_calories["month"] = df_calories.index.month
df_calories["day"] = df_calories.index.day
df_calories['weekday'] = df_calories.index.dayofweek
df_calories['weekday_name'] = df_calories.index.day_name()

df_distance["year"] = df_distance.index.year
df_distance["month"] = df_distance.index.month
df_distance["day"] = df_distance.index.day
df_distance['weekday'] = df_distance.index.dayofweek
df_distance['weekday_name'] = df_distance.index.day_name()

df_steps["year"] = df_steps.index.year
df_steps["month"] = df_steps.index.month
df_steps["day"] = df_steps.index.day
df_steps['weekday'] = df_steps.index.dayofweek
df_steps['weekday_name'] = df_steps.index.day_name()

In [None]:
df_steps['weekday_name'] = pd.Categorical(df_steps['weekday_name'], categories=
    ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday', 'Sunday'],
    ordered=True)

df_calories['weekday_name'] = pd.Categorical(df_calories['weekday_name'], categories=
    ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday', 'Sunday'],
    ordered=True)

df_distance['weekday_name'] = pd.Categorical(df_distance['weekday_name'], categories=
    ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday', 'Sunday'],
    ordered=True)

In [None]:
#Converting distance from cm to miles (approximate)
df_distance["value"] = df_distance["value"] / 160934

In [None]:
#checking min and max dates via index
df_calories.first_valid_index()
df_calories.last_valid_index()

df_distance.first_valid_index()
df_distance.last_valid_index()

df_steps.first_valid_index()
df_steps.last_valid_index()

In [None]:
#checking datatypes
df_steps.dtypes
df_distance.dtypes
df_calories.dtypes

In [None]:
df_steps.head(5)
df_calories.head(5)
df_distance.head(5)

# Validating Data from Fitbit
Comparing the exported JSON data with the Fitbit online dashboard shows that the two do not match exactly, but they are close.

## Exported June Data
* total steps: 168,846
* distance: 79.704413 miles
* calories: 68,798.59

## Dashboard June Data
* total steps: 169,084
* distance: 80 miles
* calories: 68,835
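To put the gap in numbers, this sketch computes how far each exported June total falls below the dashboard figure, using only the totals listed above. The dashboard's 80 miles looks rounded to a whole mile, so the distance comparison is the coarsest of the three.

```python
# June 2021 totals copied from the two lists above
exported = {"steps": 168_846, "distance_miles": 79.704413, "calories": 68_798.59}
dashboard = {"steps": 169_084, "distance_miles": 80, "calories": 68_835}

# Percent by which each exported total falls below the dashboard value
pct_diff = {
    key: 100 * (dashboard[key] - exported[key]) / dashboard[key]
    for key in exported
}

for key, pct in pct_diff.items():
    print(f"{key}: export is {pct:.2f}% below the dashboard")
```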


In [None]:
# Slice June 2021 from the sorted DatetimeIndex (the end date includes the whole day)
drange_steps = df_steps.loc['2021-06-01':'2021-06-30']
drange_distance = df_distance.loc['2021-06-01':'2021-06-30']
drange_calories = df_calories.loc['2021-06-01':'2021-06-30']

# The slice covers a single month, so a plain sum gives the monthly total
june_steps_total = drange_steps["value"].sum()
june_distance_total = drange_distance["value"].sum()
june_calories_total = drange_calories["value"].sum()

june_steps_total
june_distance_total
june_calories_total

# Merging and Correlation

In [None]:
#create data frames to merge
resampled_steps = pd.DataFrame(df_steps["value"].resample("D").sum())
resampled_calories = pd.DataFrame(df_calories["value"].resample("D").sum())
resampled_distance = pd.DataFrame(df_distance["value"].resample("D").sum())

resampled_steps.rename(columns= {'value':'steps'},inplace=True)
resampled_calories.rename(columns= {'value':'calories'},inplace=True)
resampled_distance.rename(columns= {'value':'distance'},inplace=True)

In [None]:
#merge calorie, step, distance data
combined_steps_calories = pd.merge(resampled_steps, resampled_calories, on=["dateTime"])
combined_all = pd.merge(combined_steps_calories, resampled_distance, on=["dateTime"])


In [None]:
combined_all.head()

In [None]:
# Pair plot for a quick correlation check
sns.set_theme(style="ticks")
sns.pairplot(combined_all)
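The pair plot gives a visual read of the relationships; `DataFrame.corr` puts a number on them. The sketch below uses a few made-up daily rows with the same column names as `combined_all`; on the real data the equivalent call would be `combined_all[["steps", "calories", "distance"]].corr()`.

```python
import pandas as pd

# Made-up daily totals standing in for combined_all
sample = pd.DataFrame(
    {
        "steps": [4000, 8000, 12000, 6000, 10000],
        "calories": [2000, 2300, 2650, 2150, 2500],
        "distance": [1.8, 3.6, 5.4, 2.7, 4.5],
    },
    index=pd.date_range("2020-01-01", periods=5, freq="D", name="dateTime"),
)

# Pairwise Pearson correlation coefficients between the three columns
corr = sample.corr(method="pearson")
print(corr.round(3))
```

Values near +1 indicate a strong positive linear relationship; in this made-up sample distance is exactly proportional to steps.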

In [None]:
sns.regplot(x="steps", y="calories", data=combined_all, scatter_kws={"color":"black"}, line_kws={"color":"blue"})
plt.title("Steps vs Calories") 
plt.xlabel("Steps") 
plt.ylabel("Calories")
plt.show()

# Resample Hourly Data to Create Daily Data

In [None]:
daily_steps = df_steps['value'].resample('D').sum()
daily_calories = df_calories['value'].resample('D').sum()
daily_distance = df_distance['value'].resample('D').sum()
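`resample('D').sum()` assigns each timestamped reading to its calendar day and adds up each day's readings. A minimal sketch on made-up readings, including one day with no readings at all:

```python
import pandas as pd

# Made-up intraday step counts on 2020-03-01 and 2020-03-03 (nothing on 2020-03-02)
index = pd.to_datetime([
    "2020-03-01 08:00", "2020-03-01 12:30", "2020-03-01 18:15",
    "2020-03-03 07:45", "2020-03-03 21:00",
])
steps = pd.Series([500, 1200, 800, 300, 2000], index=index, name="value")

# One bin per calendar day; an empty day sums to 0 by default (min_count=0)
daily = steps.resample("D").sum()

# With min_count=1, an empty day becomes NaN instead of 0
daily_min1 = steps.resample("D").sum(min_count=1)

print(daily)
print(daily_min1)
```

The default turns days when the tracker recorded nothing into zero-activity days, which pulls averages down; `min_count=1` keeps those days distinguishable as missing.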

In [None]:
daily_steps.describe()
daily_calories.describe()
daily_distance.describe()

In [None]:
#Date of max steps
daily_steps.idxmax()

#max step count
daily_steps[daily_steps.idxmax()]

In [None]:
daily_steps['2021-06'].plot(kind="bar")
plt.axhline(daily_steps['2021-06'].mean(), color='green')
plt.title("Total Daily Steps in June 2021")
plt.xlabel("Date")
plt.ylabel("Steps")

In [None]:
daily_calories['2021-06'].plot(kind="bar")
plt.axhline(daily_calories['2021-06'].mean(), color='green')
plt.title("Total Daily Calories in June 2021")
plt.xlabel("Date")
plt.ylabel("Calories")

In [None]:
daily_distance['2021-06'].plot(kind="bar")
plt.axhline(daily_distance['2021-06'].mean(), color='green')
plt.title("Total Daily Distance in June 2021")
plt.xlabel("Date")
plt.ylabel("Miles")

In [None]:
daily_steps['2020'].plot()
plt.axhline(daily_steps['2020'].mean(), color='green')
plt.title("Total Steps in 2020")
plt.xlabel("Date")
plt.ylabel("Steps")

In [None]:
daily_calories['2020'].plot()
plt.axhline(daily_calories['2020'].mean(), color='green')
plt.title("Total Calories in 2020")
plt.xlabel("Date")
plt.ylabel("Calories")

In [None]:
daily_distance['2020'].plot()
plt.axhline(daily_distance['2020'].mean(), color='green')
plt.title("Total Distance in 2020")
plt.xlabel("Miles")
plt.ylabel("Calories")

In [None]:
combined_all.plot()

In [None]:
combined_all["year"] = combined_all.index.year
combined_all["month"] = combined_all.index.month
combined_all['weekday_name'] = combined_all.index.day_name()
combined_all['weekday_name'] = pd.Categorical(combined_all['weekday_name'], categories=
    ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday', 'Sunday'],
    ordered=True)

# Weekday Averages

In [None]:
# Average daily totals for each weekday in 2020
g = combined_all.loc['2020'].groupby(["weekday_name"], observed=False)
avg_weekday_steps = g.aggregate({"steps": "mean"})
avg_weekday_distance = g.aggregate({"distance": "mean"})
avg_weekday_calories = g.aggregate({"calories": "mean"})

In [None]:
avg_weekday_steps.plot(kind="bar")
plt.title("Average Steps")
plt.xlabel("Day")
plt.ylabel("Steps")
plt.show()

In [None]:
avg_weekday_distance.plot(kind="bar")
plt.title("Average Distance")
plt.xlabel("Day")
plt.ylabel("Distance")
plt.show()

In [None]:
avg_weekday_calories.plot(kind="bar")
plt.title("Average Calories")
plt.xlabel("Day")
plt.ylabel("Calories")
plt.show()

# Monthly Averages

In [None]:
# Average daily totals for each month in 2020
g = combined_all.loc['2020'].groupby(["month"])
avg_monthly_steps = g.aggregate({"steps": "mean"})
avg_monthly_distance = g.aggregate({"distance": "mean"})
avg_monthly_calories = g.aggregate({"calories": "mean"})

In [None]:
avg_monthly_steps.plot(kind="bar")
plt.title("Average Steps per Month")
plt.xlabel("Month")
plt.ylabel("Steps")
plt.show()

In [None]:
avg_monthly_distance.plot(kind="bar")
plt.title("Average Distance per Month")
plt.xlabel("Month")
plt.ylabel("Distance")
plt.show()

In [None]:
avg_monthly_calories.plot(kind="bar")
plt.title("Average Calories per Month")
plt.xlabel("Month")
plt.ylabel("Calories")
plt.show()

# Export Data 
Exporting finalized and cleaned data for future analysis and projects.

In [None]:
combined_all.to_csv('combined_all.csv')

# Insight from 2020 Data

Correlations
* Steps, distance, and calories are positively correlated, as expected.

Weekly Data
* I was most active on Sundays.
* Average data suggests I tended to have a mid-week slump

Monthly Data
* April was my most active month
* My Fitbit appears to have been dead for most of November, so little data is available for that month.
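One way to quantify the November gap, assuming that a day with zero recorded steps means the device was not tracking (daily totals from `resample('D').sum()` are 0 for days with no readings), is to count zero-step days per month. A sketch on made-up numbers; on the real data the input would be `daily_steps.loc['2020']`.

```python
import pandas as pd

# Made-up daily step totals; the zeros mimic days when the tracker recorded nothing
days = pd.date_range("2020-10-29", "2020-11-05", freq="D")
daily_steps = pd.Series([9000, 8500, 7000, 0, 0, 0, 0, 6200], index=days)

# Count zero-step days per calendar month as a rough gauge of missing data
zero_days_per_month = (daily_steps == 0).groupby(daily_steps.index.month).sum()
print(zero_days_per_month)
```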
