David Giacobbi  
Gonzaga University  
CPSC 222, Spring 2022

# Exploratory Data Analysis: Apple Watch Fitness Data

## Introduction

During my first year of college, fitness has been a critical part of my daily routine. In a place where nearly everything is within walking distance and I have the free time to work out on a regular schedule, I realized that fitness has a large impact on my daily habits. Moreover, I received an Apple Watch over Christmas break, so detailed and accurate fitness data was available for the Spring 2022 semester.

Fitness data covers a broad range of measurements, so I wanted to focus my project specifically on how certain attributes of a day affect my health and workout activity. Visualizing and testing trends could help answer the following questions:

* Does the type of weather affect how much I work out?
* Is my workout intensity influenced by the weather?
* How does the day of the week affect how active I am?
* Am I more active on the weekends, when I have more free time, or does a structured weekday schedule have a greater effect?

Answering these questions requires more than the data from my Apple Watch alone. In addition to the CSV files exported from my Apple Watch, I will use an open-source API to create a JSON file of the weather.

Through this exploratory data analysis, I hope to get a better understanding of what influences my fitness habits. Information like this could help me adjust my daily schedule to work out more effectively and find places in my schedule where my fitness could improve. Furthermore, a focused analysis such as this one could suggest more general conclusions about how the workout mindset is influenced by both the weather and the time of the week.

### Load the Data

To perform an extensive analysis of my Apple Watch data, I need to gather data from several sources so that I can see what influences my fitness activity and what trends have emerged over the past semester. The following datasets will be loaded and cleaned for further analysis:

1. Apple Daily Health data (12/26/2021 - 4/12/2022): CSV file
1. Apple Workout data (12/26/2021 - 4/12/2022): CSV file
1. MeteoStat Daily Weather data: JSON file
1. Days of the Week data: created from `daily_health_data.csv`
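As a rough illustration of how the fourth dataset could be derived from the dates alone, here is a minimal sketch. The sample dates, the ISO date format, and the column names `Day of Week` and `Weekend` are assumptions for illustration and are not taken from the actual files.

```python
import pandas as pd

# Made-up sample dates standing in for the "Date" column of daily_health_data.csv
# (the real file's date format may differ)
sample_dates = pd.Series(["2021-12-26", "2021-12-27", "2022-01-01"])

# Convert the strings to datetimes so weekday information is available
dates = pd.to_datetime(sample_dates)

# Build a small DataFrame indexed by the original date strings
days_df = pd.DataFrame(
    {"Day of Week": dates.dt.day_name().values},
    index=pd.Index(sample_dates, name="Date"),
)

# dayofweek counts Monday as 0, so 5 and 6 are Saturday and Sunday
days_df["Weekend"] = (dates.dt.dayofweek >= 5).values

print(days_df)
```

A weekend flag like this would make it straightforward to compare weekday and weekend activity later on.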

#### Loading the CSV Files

The Apple Watch data needs to be loaded into `pandas` DataFrames so that it can be properly cleaned for analysis. Since this is the DataFrame that the other datasets will be built around, it is indexed by date. This takes only a few lines of code:

```python
import pandas as pd

daily_health_df = pd.read_csv("daily_health_data.csv", index_col="Date")

print(daily_health_df.columns)
```

```
Index(['Calories', 'Exercise Time (min)', 'Stand Hours', 'Flights Climbed',
       'Heart Rate', 'Max Heart Rate', 'Avg Heart Rate', 'Rest Heart Rate',
       'Step Count', 'Distance (mi) '],
      dtype='object')
```


The data in the CSV file above was exported with an iPhone application called [Health Auto Export](https://apps.apple.com/us/app/health-auto-export-json-csv/id1115567069). Using an Apple shortcut, Health Auto Export generated an extensive CSV file of various health attributes. Each attribute listed above will be either cleaned or dropped, depending on its relevance to the analysis. The attributes below are the ones that will be kept for further analysis.

* **Calories**: total active calories burned (kcal)
* **Exercise Time**: total exercise time (min)
* **Flights Climbed**: total flights of stairs climbed
* **Max Heart Rate**: highest heart rate reached (bpm)
* **Avg Heart Rate**: average heart rate throughout day (bpm)
* **Rest Heart Rate**: resting heart rate (bpm)
* **Step Count**: total steps taken
* **Distance**: total active distance covered (mi)
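The selection described above could be sketched roughly as follows. The two data rows are made-up sample values, and the names `sample_df`, `keep_cols`, and `cleaned_df` are hypothetical; only the column headers come from the output printed earlier. Note that the exported header `'Distance (mi) '` carries a trailing space, so the sketch strips whitespace from the headers first.

```python
import io
import pandas as pd

# Made-up two-row sample that reuses the column headers printed above,
# including the trailing space in "Distance (mi) "
sample_csv = io.StringIO(
    "Date,Calories,Exercise Time (min),Stand Hours,Flights Climbed,"
    "Heart Rate,Max Heart Rate,Avg Heart Rate,Rest Heart Rate,"
    "Step Count,Distance (mi) \n"
    "2021-12-26,500,30,12,5,80,150,85,60,8000,4.1\n"
    "2021-12-27,650,45,14,8,82,165,88,59,10500,5.3\n"
)
sample_df = pd.read_csv(sample_csv, index_col="Date")

# Strip stray whitespace so "Distance (mi) " becomes "Distance (mi)"
sample_df.columns = sample_df.columns.str.strip()

# Keep only the attributes listed above; the rest are dropped
keep_cols = [
    "Calories", "Exercise Time (min)", "Flights Climbed", "Max Heart Rate",
    "Avg Heart Rate", "Rest Heart Rate", "Step Count", "Distance (mi)",
]
cleaned_df = sample_df[keep_cols]

print(cleaned_df.columns.tolist())
```

Selecting by an explicit list of names keeps the kept attributes visible in one place, which makes it easy to revisit the choice later.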
  
  
  
The next Apple Watch CSV file is workout-centered: it contains data on individual workout sessions. The date cannot be used as this file's index because several workouts may be logged on the same day. However, the start time of each workout is unique and can be used as the index for this dataset.

```python
workout_df = pd.read_csv("workouts_data.csv", index_col="Start")

print(workout_df.columns)
```

```
Index(['Type', 'End', 'Duration', 'Total Energy (kcal)',
       'Active Energy (kcal)', 'Max Heart Rate (bpm)', 'Avg Heart Rate (bpm)',
       'Distance (mi)', 'Avg Speed(mi/hr)', 'Step Count (count)',
       'Step Cadence (spm)', 'Swimming Stroke Count (count)',
       'Swim Stoke Cadence (spm)', 'Flights Climbed (count)',
       'Elevation Ascended (ft)', 'Elevation Descended (ft)'],
      dtype='object')
```
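The reasoning behind indexing by start time can be checked with a short sketch like the one below. The three workouts are made-up sample rows, and the timestamp and `Duration` formats are assumptions, since the real export may format them differently.

```python
import io
import pandas as pd

# Made-up sample: two workouts share a calendar day but start at different times
sample_csv = io.StringIO(
    "Type,Start,End,Duration\n"
    "Running,2022-01-10 08:00,2022-01-10 08:30,30\n"
    "Strength,2022-01-10 17:00,2022-01-10 17:45,45\n"
    "Walking,2022-01-11 12:00,2022-01-11 12:20,20\n"
)

# parse_dates=True asks pandas to parse the index column as datetimes
sample_workouts = pd.read_csv(sample_csv, index_col="Start", parse_dates=True)

# Start timestamps are unique, so they work as an index
start_is_unique = sample_workouts.index.is_unique

# Dropping the time of day leaves repeated dates, so the date alone would not
date_is_unique = sample_workouts.index.normalize().is_unique

print(start_is_unique, date_is_unique)  # True False
```

Running the same `is_unique` check on the real `workout_df.index` would confirm whether the assumption holds for the full dataset.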


This workout data was also exported with the iPhone app [Health Auto Export](https://apps.apple.com/us/app/health-auto-export-json-csv/id1115567069). The Apple shortcut used to create this CSV file did not let me choose which attributes to include, so the file contains more attributes than the analysis needs. As a result, only about half of them will be used. Below are the attributes that will be kept for further analysis:

* **Type**: type of workout completed
* **Duration**: total time of workout (min)
* **Total Energy**: total calories burned in workout (kcal)
* **Max Heart Rate**: highest heart rate reached (bpm)
* **Avg Heart Rate**: average heart rate during the workout (bpm)