# **Introduction: Exploring Apple Health Workout Data 📊**

## **Overview**
This notebook performs an **Exploratory Data Analysis (EDA)** of workout data exported from Apple Health. The goal is to uncover trends, patterns, and insights from metrics such as **distance, duration, heart rate, calories burned**, and other fitness-related statistics.

By analyzing this data, I aim to answer key questions about workout performance, health trends, and external factors such as device software version, outside temperature, and route elevation.

---

## **Key Questions**
The analysis is guided by the following questions:

1. **How does aging affect workouts?**
   - Analyze trends in **running speed**, **heart rate**, and **calories burned** over time.

2. **Are there differences in workout metrics based on the Apple Watch source version?**
   - Compare **distance, speed, and heart rate** across the values of the `Source_version` column.

3. **Does a diverse workout routine improve health trends?**
   - Assess how the variety of workouts impacts overall fitness metrics like **calories burned** and **heart rate**.

4. **How does outside temperature affect running performance?**
   - Explore the relationship between **temperature** and **running speed**. This requires bringing in external temperature data.

5. **When was I most active based on StepCount?**
   - Identify periods of peak activity using **step count trends**.

6. **How does elevation impact running performance?**
   - Examine **route elevation data** to understand its influence on running pace and visualize routes on a map.
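
As one illustration of how these questions might translate into pandas, the sketch below compares metrics per watch software version (question 2). The column names match those in the `workout_summary.csv` loaded in this notebook; the sample rows and the `summarize_by_version` helper are hypothetical and only serve to show the grouping pattern.

```python
import pandas as pd

# Hypothetical sample rows; the real data comes from output/workout_summary.csv.
sample = pd.DataFrame({
    "Source_version": ["8.5", "8.5", "10.1", "10.1"],
    "DistanceWalkingRunning_sum_mi": [2.0, 3.0, 2.5, 3.5],
    "HeartRate_average_count/min": [150.0, 160.0, 140.0, None],
})


def summarize_by_version(df: pd.DataFrame) -> pd.DataFrame:
    """Return the mean and non-null count of each metric per Source_version."""
    metrics = ["DistanceWalkingRunning_sum_mi", "HeartRate_average_count/min"]
    return df.groupby("Source_version")[metrics].agg(["mean", "count"])


by_version = summarize_by_version(sample)
print(by_version)
```

Reporting the non-null count next to the mean matters here because some metrics are only recorded for a small share of workouts, so a mean may rest on very few rows.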

---

## **Data Source**
- The data was exported from the **Apple Health** app as an `export.xml` file.
- Workout statistics include metrics such as:
   - **Distance** (mi/km)
   - **Duration** (minutes)
   - **Calories burned** (active and basal energy)
   - **Heart rate** (average, min, max). This is still missing for workouts recorded on older Apple Watch versions and needs to be added.
   - **Step count**
   - **Running power, ground contact time, and stride length**

- Route data (GPX) may be used for advanced mapping and elevation analysis.
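
For readers unfamiliar with the export, the sketch below shows one way the `Workout` elements of `export.xml` could be read into a DataFrame. It is not the script that produced `workout_summary.csv`. The inline XML is a made-up stand-in for the real file, the attribute names follow what Apple Health exports typically contain, and `workouts_to_frame` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Tiny inline stand-in for export.xml; all values are invented for illustration.
SAMPLE_XML = """<HealthData>
  <Workout workoutActivityType="HKWorkoutActivityTypeRunning"
           duration="30.5" durationUnit="min"
           sourceName="Apple Watch" sourceVersion="8.5"
           startDate="2022-05-01 07:00:00 -0500"
           endDate="2022-05-01 07:30:30 -0500"/>
  <Workout workoutActivityType="HKWorkoutActivityTypeCycling"
           duration="45.0" durationUnit="min"
           sourceName="Apple Watch" sourceVersion="8.5"
           startDate="2022-05-02 18:00:00 -0500"
           endDate="2022-05-02 18:45:00 -0500"/>
</HealthData>"""


def workouts_to_frame(root: ET.Element) -> pd.DataFrame:
    """Collect the attributes of every <Workout> element into one row each."""
    rows = [dict(workout.attrib) for workout in root.iter("Workout")]
    df = pd.DataFrame(rows)
    # XML attributes are always strings, so convert numeric fields explicitly.
    df["duration"] = pd.to_numeric(df["duration"])
    return df


# For a real export, ET.parse("export.xml").getroot() would replace this line.
root = ET.fromstring(SAMPLE_XML)
workouts = workouts_to_frame(root)
print(workouts[["workoutActivityType", "duration", "sourceVersion"]])
```

Real exports can be several hundred megabytes, in which case a streaming parser such as `ET.iterparse` may be a better fit than loading the whole tree.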

---

## **Goals of This Notebook**
1. Clean and prepare the workout data for analysis.
2. Perform visualizations and statistical analysis to answer the outlined questions.
3. Highlight insights and trends that showcase fitness progress and influencing factors.

---

## **Output**
By the end of this notebook, we will:
- Visualize workout metrics and trends.
- Identify patterns based on time, source version, and workout diversity.
- Explore advanced route data and elevation mapping.



In [7]:
# Import the libraries used for data handling and plotting
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
my_workouts = pd.read_csv('output/workout_summary.csv')

In [5]:
my_workouts.describe()

| | Duration (min) | IndoorWorkout | ActiveEnergyBurned_sum_Cal | BasalEnergyBurned_sum_Cal | PausedDuration_mins | AverageMETs | DistanceCycling_sum_mi | DistanceWalkingRunning_sum_mi | DistanceSwimming_sum_yd | SwimmingStrokeCount_sum_count | ... | RunningVerticalOscillation_maximum_cm | RunningSpeed_average_mi/hr | RunningSpeed_minimum_mi/hr | RunningSpeed_maximum_mi/hr | RunningStrideLength_average_m | RunningStrideLength_minimum_m | RunningStrideLength_maximum_m | HeartRate_average_count/min | HeartRate_minimum_count/min | HeartRate_maximum_count/min |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 699.0 | 696.0 | 697.0 | 665.0 | 699.0 | 657.0 | 4.0 | 311.0 | 6.0 | 6.0 | ... | 21.0 | 21.0 | 21.0 | 21.0 | 21.0 | 21.0 | 21.0 | 83.0 | 83.0 | 83.0 |
| mean | 33.868744 | 0.208333 | 295.72356 | 54.52525 | 0.548941 | 7.982494 | 0.1029251 | 2.40059 | 833.333333 | 326.166667 | ... | 10.82381 | 6.400204 | 4.5341 | 7.50013 | 1.031756 | 0.868571 | 1.166667 | 125.533888 | 96.096386 | 149.831325 |
| std | 54.603344 | 0.406408 | 138.356434 | 23.07681 | 2.650898 | 3.171258 | 0.1131527 | 0.841849 | 258.19889 | 96.63626 | ... | 0.254764 | 0.264544 | 1.020154 | 0.488318 | 0.037542 | 0.06452 | 0.076376 | 30.371957 | 30.098057 | 28.515216 |
| min | 0.0 | 0.0 | 0.041244 | 0.129883 | 0.0 | 1.3 | 2.79294e-10 | 0.012913 | 500.0 | 189.0 | ... | 10.3 | 5.98898 | 2.37821 | 6.76247 | 0.979882 | 0.75 | 1.06 | 78.2016 | 58.0 | 89.0 |
| 25% | 23.463122 | 0.0 | 220.405 | 39.7456 | 0.0 | 4.87238 | 0.0425451 | 2.02216 | 625.0 | 257.5 | ... | 10.6 | 6.21733 | 3.73263 | 7.14057 | 0.999251 | 0.83 | 1.12 | 101.872 | 76.0 | 129.0 |
| 50% | 29.875018 | 0.0 | 308.143 | 49.8284 | 0.0 | 9.64053 | 0.0744396 | 2.63678 | 1000.0 | 385.0 | ... | 10.8 | 6.41435 | 4.68994 | 7.43313 | 1.03739 | 0.87 | 1.16 | 115.301 | 84.0 | 154.0 |
| 75% | 42.039918 | 0.0 | 351.186 | 71.951 | 0.02 | 10.7624 | 0.1348196 | 3.01045 | 1000.0 | 385.75 | ... | 11.0 | 6.55316 | 5.3005 | 7.67137 | 1.0557 | 0.91 | 1.2 | 158.315 | 106.0 | 176.0 |
| max | 1432.133333 | 1.0 | 651.749 | 208.195 | 53.48 | 12.8434 | 0.262821 | 4.40626 | 1000.0 | 397.0 | ... | 11.4 | 7.1055 | 6.04167 | 8.94708 | 1.12846 | 0.98 | 1.33 | 184.194 | 176.0 | 195.0 |
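
The summary above reports a minimum `Duration (min)` of 0 and a maximum of about 1432 minutes (close to 24 hours), which may be recording artifacts rather than real workouts. A minimal sketch of one way to filter them follows. The 1-minute and 300-minute bounds and the `drop_implausible_durations` helper are assumptions chosen for illustration, not values taken from this notebook.

```python
import pandas as pd


def drop_implausible_durations(
    df: pd.DataFrame,
    min_minutes: float = 1.0,    # assumed lower bound, adjust as needed
    max_minutes: float = 300.0,  # assumed upper bound, adjust as needed
) -> pd.DataFrame:
    """Keep only rows whose duration lies within [min_minutes, max_minutes]."""
    keep = df["Duration (min)"].between(min_minutes, max_minutes)
    return df.loc[keep].copy()


# Hypothetical rows mimicking the extremes seen in the summary statistics.
sample = pd.DataFrame({"Duration (min)": [0.0, 29.9, 42.0, 1432.13]})
cleaned = drop_implausible_durations(sample)
print(cleaned["Duration (min)"].tolist())
```

It is worth inspecting the dropped rows before discarding them, since a very long duration could also be a workout that was simply never stopped on the watch.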


In [6]:
my_workouts.columns

Index(['ActivityType', 'Duration (min)', 'Source', 'Source_version',
       'StartDate', 'EndDate', 'IndoorWorkout', 'ActiveEnergyBurned_sum_Cal',
       'BasalEnergyBurned_sum_Cal', 'PausedDuration_mins', 'AverageMETs',
       'DistanceCycling_sum_mi', 'DistanceWalkingRunning_sum_mi',
       'DistanceSwimming_sum_yd', 'SwimmingStrokeCount_sum_count',
       'StepCount_sum_count', 'RunningGroundContactTime_average_ms',
       'RunningGroundContactTime_minimum_ms',
       'RunningGroundContactTime_maximum_ms', 'RunningPower_average_W',
       'RunningPower_minimum_W', 'RunningPower_maximum_W',
       'RunningVerticalOscillation_average_cm',
       'RunningVerticalOscillation_minimum_cm',
       'RunningVerticalOscillation_maximum_cm', 'RunningSpeed_average_mi/hr',
       'RunningSpeed_minimum_mi/hr', 'RunningSpeed_maximum_mi/hr',
       'RunningStrideLength_average_m', 'RunningStrideLength_minimum_m',
       'RunningStrideLength_maximum_m', 'HeartRate_average_count/min',
       'HeartRa

In [11]:
my_workouts.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 699 entries, 0 to 698
Data columns (total 34 columns):
 #   Column                                 Non-Null Count  Dtype  
---  ------                                 --------------  -----  
 0   ActivityType                           699 non-null    object 
 1   Duration (min)                         699 non-null    float64
 2   Source                                 699 non-null    object 
 3   Source_version                         699 non-null    object 
 4   StartDate                              699 non-null    object 
 5   EndDate                                699 non-null    object 
 6   IndoorWorkout                          696 non-null    float64
 7   ActiveEnergyBurned_sum_Cal             697 non-null    float64
 8   BasalEnergyBurned_sum_Cal              665 non-null    float64
 9   PausedDuration_mins                    699 non-null    float64
 10  AverageMETs                            657 non-null    float64
 11  Distan
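
The non-null counts differ widely between columns; in the `describe()` output, for example, the heart rate columns have 83 values out of 699 rows. The sketch below shows one way to rank columns by their share of missing values before deciding which metrics can support an analysis. The sample frame and the `missing_share` helper are hypothetical.

```python
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Return the fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


# Hypothetical rows; in the notebook this would be called on my_workouts.
sample = pd.DataFrame({
    "Duration (min)": [30.0, 25.0, 40.0, 35.0],
    "AverageMETs": [9.6, None, 10.1, 8.0],
    "HeartRate_average_count/min": [None, None, None, 150.0],
})
share = missing_share(sample)
print(share)
```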

### How does aging affect workout performance?

In [12]:
# first let's convert dates to datetime format
my_workouts['EndDate'] = pd.to_datetime(my_workouts['EndDate'])
my_workouts['StartDate'] = pd.to_datetime(my_workouts['StartDate'])
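
With the dates parsed, a natural next step for the aging question is to average a metric per calendar year. The sketch below is one possible approach under stated assumptions: the sample timestamps imitate the Apple Health format with a UTC offset (the actual format in `workout_summary.csv` is not shown here), and `yearly_mean` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical rows; timestamps imitate Apple Health's "date time offset" style.
sample = pd.DataFrame({
    "StartDate": [
        "2021-03-01 07:00:00 -0500",
        "2021-09-10 18:30:00 -0400",
        "2023-04-02 08:15:00 -0400",
    ],
    "RunningSpeed_average_mi/hr": [6.0, 6.5, 6.75],
})


def yearly_mean(df: pd.DataFrame, column: str) -> pd.Series:
    """Average one metric per calendar year of the workout start date."""
    # utc=True converts mixed UTC offsets into a single timezone-aware dtype.
    start = pd.to_datetime(df["StartDate"], utc=True)
    return df.groupby(start.dt.year)[column].mean()


speed_by_year = yearly_mean(sample, "RunningSpeed_average_mi/hr")
print(speed_by_year)
```

Grouping by year in UTC can shift a workout near midnight on December 31 into the neighboring year; for a multi-year trend that effect is usually negligible.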
