# Apple Health Extractor

This code parses your Apple Health export data, creates multiple CSV files, and runs some simple data checks and data analysis.

Enjoy! 

--------

## Extract Data and Export to CSVs from Apple Health's Export.xml

* Command-line tool to process Apple Health's `export.xml` file.
* Creates a separate CSV file for each data type.
* Original source: https://github.com/tdda/applehealthdata
* Depending on the size of your Apple Health data, this script may take several minutes to complete.
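The parser script itself comes from tdda/applehealthdata and is not reproduced here. As a rough illustration of the idea only, the sketch below streams `Record` elements out of `export.xml` with `xml.etree.ElementTree.iterparse` and writes one CSV per record type. The function names, the prefix-stripping rule, and the column list (copied from the `columns` output further down) are assumptions, not the script's actual code; `Workout`, `ActivitySummary` and `Me` elements are simply skipped.

```python
import csv
import os
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: file names drop these HealthKit prefixes (e.g. StepCount.csv)
PREFIXES = ("HKQuantityTypeIdentifier", "HKCategoryTypeIdentifier")
# Column order matching the CSVs loaded later in this notebook
FIELDS = ["sourceName", "sourceVersion", "device", "type", "unit",
          "creationDate", "startDate", "endDate", "value"]


def short_type(hk_type):
    """Strip a known HealthKit prefix from a record type name."""
    for prefix in PREFIXES:
        if hk_type.startswith(prefix):
            return hk_type[len(prefix):]
    return hk_type


def split_records(xml_source, out_dir):
    """Stream <Record> elements into one CSV per type; return row counts."""
    os.makedirs(out_dir, exist_ok=True)
    files, writers = {}, {}
    counts = defaultdict(int)
    try:
        for _, elem in ET.iterparse(xml_source, events=("end",)):
            if elem.tag != "Record":
                continue  # skip Workout, ActivitySummary, Me, child elements
            row = dict(elem.attrib)
            rtype = short_type(row.get("type", ""))
            row["type"] = rtype
            if rtype not in writers:
                path = os.path.join(out_dir, f"{rtype}.csv")
                files[rtype] = open(path, "w", newline="", encoding="utf-8")
                writers[rtype] = csv.DictWriter(
                    files[rtype], fieldnames=FIELDS, extrasaction="ignore")
                writers[rtype].writeheader()
            writers[rtype].writerow(row)  # missing fields are written as ""
            counts[rtype] += 1
            elem.clear()  # drop the processed element's children and attributes
    finally:
        for f in files.values():
            f.close()
    return dict(counts)
```

Called as `split_records("data/export.xml", "data")`, it would return a dictionary of row counts per record type, similar in spirit to the "Record types" summary the real script prints.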

**NOTE: There are currently a few minor errors, caused by additional data in Apple Health exports, that still require some updates.**

## Setup and Usage

* Export your data from the Health app on your phone.
* Unzip `export.zip` into this directory and rename the extracted folder to `data`.
* The project directory should then contain the file `data/export.xml`.
* Run the parser from inside the project (as in the notebook cell below) or from the command line.
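The unzip-and-rename step can also be scripted. This is a sketch using only the standard library; `unpack_export` is a hypothetical helper, not part of this project, and it assumes the archive's top-level folder is named `apple_health_export` (that name is not stated above, so check your own zip).

```python
import zipfile
from pathlib import Path


def unpack_export(zip_path, project_dir="."):
    """Extract export.zip into project_dir and rename its folder to data/."""
    project = Path(project_dir)
    target = project / "data"
    if target.exists():
        raise FileExistsError(f"{target} already exists; move it aside first")
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(project)
    # Assumption: the archive unpacks into a folder called apple_health_export
    (project / "apple_health_export").rename(target)
    xml_path = target / "export.xml"
    if not xml_path.is_file():
        raise FileNotFoundError(f"expected {xml_path} after unpacking")
    return xml_path
```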

```python
# Alternative if export.xml sits in the current directory:
# %run -i 'apple-health-data-parser' 'export.xml'
%run -i "apple-health-data-parser" "data/export.xml"
```

```
Reading data from data/export.xml . . . done
Unexpected node of type ExportDate.

Tags:
ActivitySummary: 1323
ExportDate: 1
Me: 1
Record: 2248249
Workout: 373

Fields:
HKCharacteristicTypeIdentifierBiologicalSex: 1
HKCharacteristicTypeIdentifierBloodType: 1
HKCharacteristicTypeIdentifierCardioFitnessMedicationsUse: 1
HKCharacteristicTypeIdentifierDateOfBirth: 1
HKCharacteristicTypeIdentifierFitzpatrickSkinType: 1
activeEnergyBurned: 1323
activeEnergyBurnedGoal: 1323
activeEnergyBurnedUnit: 1323
appleExerciseTime: 1323
appleExerciseTimeGoal: 1323
appleMoveTime: 1323
appleMoveTimeGoal: 1323
appleStandHours: 1323
appleStandHoursGoal: 1323
creationDate: 2248622
dateComponents: 1323
device: 2176842
duration: 373
durationUnit: 373
endDate: 2248622
sourceName: 2248622
sourceVersion: 2232606
startDate: 2248622
type: 2248249
unit: 2231795
value: 2248220
workoutActivityType: 373

Record types:
ActiveEnergyBurned: 938351
AppleExerciseTime: 23543
AppleStandHour: 15556
AppleStandTime: 62337
AppleWa
```

-----

# Apple Health Data Check and Simple Data Analysis

```python
import numpy as np
import pandas as pd
import glob
```
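`glob` is imported but not used in the cells shown. One way to use it, sketched here as a hypothetical `load_all` helper, is to load every CSV the parser wrote into a dictionary keyed by record type, so each section below could pull its DataFrame from one place.

```python
import glob
import os

import pandas as pd


def load_all(data_dir="data"):
    """Read every CSV in data_dir into a dict keyed by file stem (record type)."""
    frames = {}
    for path in sorted(glob.glob(os.path.join(data_dir, "*.csv"))):
        name = os.path.splitext(os.path.basename(path))[0]
        frames[name] = pd.read_csv(path)
    return frames
```

For example, `{name: len(df) for name, df in load_all().items()}` gives a quick row count per record type.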

----

## Weight

```python
weight = pd.read_csv("data/BodyMass.csv")
```

```python
weight.tail()
```

| | sourceName | sourceVersion | device | type | unit | creationDate | startDate | endDate | value |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Health | 10.1.1 | | BodyMass | lb | 2016-12-04 18:13:07 -0400 | 2016-12-04 18:13:07 -0400 | 2016-12-04 18:13:07 -0400 | 110 |
| 1 | Leeor | 13.3 | | BodyMass | lb | 2020-01-03 15:56:51 -0400 | 2020-01-03 15:56:51 -0400 | 2020-01-03 15:56:51 -0400 | 127 |
| 2 | Health | 15.6.1 | | BodyMass | lb | 2023-02-08 13:58:15 -0400 | 2023-02-08 13:58:00 -0400 | 2023-02-08 13:58:00 -0400 | 135 |


```python
weight.describe()
```

| | device | value |
|---|---|---|
| count | 0.0 | 3.0 |
| mean | | 124.0 |
| std | | 12.767145 |
| min | | 110.0 |
| 25% | | 118.5 |
| 50% | | 127.0 |
| 75% | | 131.0 |
| max | | 135.0 |
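With only three weigh-ins, `describe()` says little. A small sketch (the `tidy_weight` name is hypothetical) that parses the timestamps, sorts by time and adds a kilogram column using the exact factor 1 lb = 0.45359237 kg; it assumes `unit` is either `lb` or `kg`, as in the rows above (all `lb`).

```python
import pandas as pd

LB_TO_KG = 0.45359237  # exact by definition


def tidy_weight(df):
    """Return weight rows sorted by time with a value_kg column added."""
    out = df.copy()
    # Parse "2016-12-04 18:13:07 -0400"; utc=True tolerates mixed offsets
    out["startDate"] = pd.to_datetime(
        out["startDate"], format="%Y-%m-%d %H:%M:%S %z", utc=True)
    # Assumption: only lb and kg appear; any other unit becomes NaN
    factor = out["unit"].map({"lb": LB_TO_KG, "kg": 1.0})
    out["value_kg"] = (out["value"] * factor).round(1)
    return out.sort_values("startDate")[
        ["startDate", "sourceName", "value", "unit", "value_kg"]]
```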


----

## Steps

```python
steps = pd.read_csv("data/StepCount.csv")
```

```python
len(steps)
```

214203

```python
steps.columns
```

```
Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')
```

```python
steps.describe()
```

| | value |
|---|---|
| count | 214203.0 |
| mean | 110.116581 |
| std | 161.562952 |
| min | 1.0 |
| 25% | 20.0 |
| 50% | 50.0 |
| 75% | 124.0 |
| max | 13383.0 |


```python
steps.tail()
```

| | sourceName | sourceVersion | device | type | unit | creationDate | startDate | endDate | value |
|---|---|---|---|---|---|---|---|---|---|
| 214198 | Leeor’s Apple Watch | 9.6.2 | `<<HKDevice: 0x28379f840>, name:Apple Watch, ma...` | StepCount | count | 2023-09-15 14:11:06 -0400 | 2023-09-15 13:59:54 -0400 | 2023-09-15 14:08:32 -0400 | 202 |
| 214199 | Leeor | 16.6.1 | `<<HKDevice: 0x28379fd90>, name:iPhone, manufac...` | StepCount | count | 2023-09-15 14:11:13 -0400 | 2023-09-15 14:00:11 -0400 | 2023-09-15 14:07:16 -0400 | 181 |
| 214200 | Leeor’s Apple Watch | 9.6.2 | `<<HKDevice: 0x28379f840>, name:Apple Watch, ma...` | StepCount | count | 2023-09-15 14:22:38 -0400 | 2023-09-15 14:10:22 -0400 | 2023-09-15 14:20:09 -0400 | 266 |
| 214201 | Leeor’s Apple Watch | 9.6.2 | `<<HKDevice: 0x28379f840>, name:Apple Watch, ma...` | StepCount | count | 2023-09-15 14:44:59 -0400 | 2023-09-15 14:32:43 -0400 | 2023-09-15 14:33:06 -0400 | 19 |
| 214202 | Leeor | 16.6.1 | `<<HKDevice: 0x28379fd90>, name:iPhone, manufac...` | StepCount | count | 2023-09-15 14:24:14 -0400 | 2023-09-15 14:13:04 -0400 | 2023-09-15 14:19:51 -0400 | 249 |


```python
# Raw total of all StepCount samples, summed across every source
steps.value.sum()
```

23587302
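The raw sum above adds up every sample from every source. The tail shows both the iPhone (`Leeor`) and `Leeor’s Apple Watch` logging steps over overlapping minutes, so this total very likely counts many steps twice; the Health app itself de-duplicates overlapping sources before showing totals. A sketch for inspecting the overlap: daily totals per source, plus a crude per-day estimate that keeps the larger single-source total. Both function names are hypothetical, and the max-per-day rule is an assumption for illustration, not Apple's de-duplication logic.

```python
import pandas as pd


def daily_steps_by_source(df):
    """Daily step totals, one column per sourceName."""
    # The first 10 characters of startDate are the local calendar date
    day = df["startDate"].str[:10].rename("day")
    return (df.groupby([day, "sourceName"])["value"].sum()
              .unstack(fill_value=0))


def rough_daily_steps(df):
    """Crude de-duplication: keep the larger single-source total for each day."""
    return daily_steps_by_source(df).max(axis=1)
```

Comparing `rough_daily_steps(steps).sum()` with the raw sum gives a feel for how much double counting there is.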

-------

## Stand Count

```python
stand = pd.read_csv("data/AppleStandHour.csv")
```

```python
len(stand)
```

15556

```python
stand.columns
```

```
Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')
```

```python
stand.describe()
```

| | unit |
|---|---|
| count | 0.0 |
| mean | |
| std | |
| min | |
| 25% | |
| 50% | |
| 75% | |
| max | |


```python
stand.tail()
```

| | sourceName | sourceVersion | device | type | unit | creationDate | startDate | endDate | value |
|---|---|---|---|---|---|---|---|---|---|
| 15551 | Leeor’s Apple Watch | 9.6.2 | `<<HKDevice: 0x28377f2a0>, name:Apple Watch, ma...` | AppleStandHour | | 2023-09-15 11:00:35 -0400 | 2023-09-15 10:00:00 -0400 | 2023-09-15 11:00:00 -0400 | HKCategoryValueAppleStandHourIdle |
| 15552 | Leeor’s Apple Watch | 9.6.2 | `<<HKDevice: 0x28377f2a0>, name:Apple Watch, ma...` | AppleStandHour | | 2023-09-15 11:56:21 -0400 | 2023-09-15 11:00:00 -0400 | 2023-09-15 12:00:00 -0400 | HKCategoryValueAppleStandHourStood |
| 15553 | Leeor’s Apple Watch | 9.6.2 | `<<HKDevice: 0x28377f2a0>, name:Apple Watch, ma...` | AppleStandHour | | 2023-09-15 12:41:22 -0400 | 2023-09-15 12:00:00 -0400 | 2023-09-15 13:00:00 -0400 | HKCategoryValueAppleStandHourStood |
| 15554 | Leeor’s Apple Watch | 9.6.2 | `<<HKDevice: 0x28377f2a0>, name:Apple Watch, ma...` | AppleStandHour | | 2023-09-15 13:09:26 -0400 | 2023-09-15 13:00:00 -0400 | 2023-09-15 14:00:00 -0400 | HKCategoryValueAppleStandHourStood |
| 15555 | Leeor’s Apple Watch | 9.6.2 | `<<HKDevice: 0x28377f2a0>, name:Apple Watch, ma...` | AppleStandHour | | 2023-09-15 14:01:03 -0400 | 2023-09-15 14:00:00 -0400 | 2023-09-15 15:00:00 -0400 | HKCategoryValueAppleStandHourStood |
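`describe()` only summarises numeric columns, and in this file the only numeric one is the completely empty `unit` column; the stand data itself lives in `value` as category strings. A sketch (hypothetical `stand_summary` helper) that counts those strings instead and tallies hours marked `HKCategoryValueAppleStandHourStood` per day, taking the local date from the `startDate` string:

```python
import pandas as pd

STOOD = "HKCategoryValueAppleStandHourStood"


def stand_summary(df):
    """Return (overall category counts, stood hours per local day)."""
    counts = df["value"].value_counts()
    stood = df[df["value"] == STOOD]
    # First 10 characters of startDate = local calendar date
    per_day = stood.groupby(stood["startDate"].str[:10]).size()
    return counts, per_day
```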


------

## Resting Heart Rate (HR)

```python
restingHR = pd.read_csv("data/RestingHeartRate.csv")
```

```python
len(restingHR)
```

1225

```python
restingHR.describe()
```

| | device | value |
|---|---|---|
| count | 0.0 | 1225.0 |
| mean | | 61.378776 |
| std | | 4.857802 |
| min | | 47.0 |
| 25% | | 58.0 |
| 50% | | 61.0 |
| 75% | | 65.0 |
| max | | 84.0 |
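The overall summary hides any trend over time. A sketch of a monthly summary (the `monthly_mean` name is hypothetical) that groups on the `YYYY-MM` prefix of the local `startDate` string, so differing UTC offsets in the export do not shift readings between months:

```python
import pandas as pd


def monthly_mean(df, column="value"):
    """Mean, min, max and count of a numeric record per YYYY-MM."""
    month = df["startDate"].str[:7].rename("month")
    return df.groupby(month)[column].agg(["mean", "min", "max", "count"])
```

For example, `monthly_mean(restingHR).tail(12)` would show the last twelve months with data.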


---

## Walking Heart Rate (HR) Average

```python
walkingHR = pd.read_csv("data/WalkingHeartRateAverage.csv")
```

```python
len(walkingHR)
```

1181

```python
walkingHR.describe()
```

| | device | value |
|---|---|---|
| count | 0.0 | 1181.0 |
| mean | | 100.460627 |
| std | | 10.749494 |
| min | | 72.0 |
| 25% | | 93.0 |
| 50% | | 100.0 |
| 75% | | 107.0 |
| max | | 163.0 |
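Resting and walking heart rate have similar row counts here (1,225 and 1,181), so one way to compare them is to join their daily averages on the local date. A sketch with hypothetical helper names; days missing either measurement are dropped, and several samples on one day are averaged (both choices are assumptions).

```python
import pandas as pd


def daily_mean(df):
    """Average value per local calendar day (first 10 chars of startDate)."""
    return df.groupby(df["startDate"].str[:10].rename("day"))["value"].mean()


def walking_minus_resting(walking, resting):
    """Join daily means on day; keep only days that have both measurements."""
    joined = pd.concat(
        {"walking": daily_mean(walking), "resting": daily_mean(resting)},
        axis=1, join="inner")
    joined["difference"] = joined["walking"] - joined["resting"]
    return joined
```

Usage: `walking_minus_resting(walkingHR, restingHR).describe()`.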


---

## Heart Rate Variability (HRV)

```python
hrv = pd.read_csv("data/HeartRateVariabilitySDNN.csv")
```

```python
len(hrv)
```

3256

```python
hrv.columns
```

```
Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')
```

```python
hrv.describe()
```

| | value |
|---|---|
| count | 3256.0 |
| mean | 52.767602 |
| std | 19.636088 |
| min | 11.4652 |
| 25% | 39.70645 |
| 50% | 50.02415 |
| 75% | 62.723525 |
| max | 181.086 |


```python
hrv.tail()
```

| | sourceName | sourceVersion | device | type | unit | creationDate | startDate | endDate | value |
|---|---|---|---|---|---|---|---|---|---|
| 3251 | Leeor’s Apple Watch | 9.6.2 | `<<HKDevice: 0x283770640>, name:Apple Watch, ma...` | HeartRateVariabilitySDNN | ms | 2023-09-14 13:42:40 -0400 | 2023-09-14 13:41:38 -0400 | 2023-09-14 13:42:38 -0400 | 39.5751 |
| 3252 | Leeor’s Apple Watch | 9.6.2 | `<<HKDevice: 0x283770640>, name:Apple Watch, ma...` | HeartRateVariabilitySDNN | ms | 2023-09-14 17:58:22 -0400 | 2023-09-14 17:57:20 -0400 | 2023-09-14 17:58:19 -0400 | 57.1949 |
| 3253 | Leeor’s Apple Watch | 9.6.2 | `<<HKDevice: 0x283770640>, name:Apple Watch, ma...` | HeartRateVariabilitySDNN | ms | 2023-09-14 22:06:13 -0400 | 2023-09-14 22:05:12 -0400 | 2023-09-14 22:06:11 -0400 | 39.6889 |
| 3254 | Leeor’s Apple Watch | 9.6.2 | `<<HKDevice: 0x283770640>, name:Apple Watch, ma...` | HeartRateVariabilitySDNN | ms | 2023-09-15 09:46:15 -0400 | 2023-09-15 09:45:13 -0400 | 2023-09-15 09:46:13 -0400 | 81.9324 |
| 3255 | Leeor’s Apple Watch | 9.6.2 | `<<HKDevice: 0x283770640>, name:Apple Watch, ma...` | HeartRateVariabilitySDNN | ms | 2023-09-15 14:28:07 -0400 | 2023-09-15 14:27:04 -0400 | 2023-09-15 14:28:04 -0400 | 48.1281 |
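HRV (SDNN, in ms) swings a lot between readings; the tail above ranges from about 40 to 82 ms within two days. A smoothed series is often easier to read, so here is a sketch (hypothetical `hrv_rolling` helper) of a time-based rolling mean over daily averages. The 7-day window is an arbitrary choice.

```python
import pandas as pd


def hrv_rolling(df, window="7D"):
    """Daily mean SDNN, then a time-based rolling mean over `window`."""
    # Local calendar date from the startDate string, as a naive datetime
    day = pd.to_datetime(df["startDate"].str[:10], format="%Y-%m-%d").rename("day")
    daily = df.groupby(day)["value"].mean()  # sorted DatetimeIndex
    return daily.rolling(window, min_periods=1).mean()
```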


-------

## VO2 Max

```python
vo2max = pd.read_csv("data/VO2Max.csv")
```

```python
len(vo2max)
```

290

```python
vo2max.describe()
```

| | device | value |
|---|---|---|
| count | 0.0 | 290.0 |
| mean | | 43.27899 |
| std | | 1.698037 |
| min | | 38.25 |
| 25% | | 42.205 |
| 50% | | 43.11 |
| 75% | | 44.47 |
| max | | 47.0 |
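With 290 readings, a single number for the average change per year can complement the summary above. A sketch (hypothetical `vo2max_trend` helper) that fits a straight line with `numpy.polyfit`; treating the trend as linear is an assumption, and the slope is in the file's `unit` per year.

```python
import numpy as np
import pandas as pd


def vo2max_trend(df):
    """Least-squares slope of `value` against time, in units per year."""
    when = pd.to_datetime(df["startDate"], format="%Y-%m-%d %H:%M:%S %z", utc=True)
    # Elapsed time since the first reading, in (Julian) years
    years = (when - when.min()).dt.total_seconds() / (365.25 * 24 * 3600)
    slope, _intercept = np.polyfit(years, df["value"], 1)
    return slope
```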


----

## Blood Pressure

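```python
diastolic = pd.read_csv("data/BloodPressureDiastolic.csv")
systolic = pd.read_csv("data/BloodPressureSystolic.csv")
```

```
FileNotFoundError: [Errno 2] No such file or directory: 'data/BloodPressureDiastolic.csv'
```

This export produced no `BloodPressureDiastolic.csv`, so the cell above fails. The two `describe()` cells below were not executed in this run; their outputs come from an earlier run that had blood pressure data.

```python
diastolic.describe()
```

| | device | value |
|---|---|---|
| count | 0.0 | 29.0 |
| mean | | 65.586207 |
| std | | 5.0816 |
| min | | 55.0 |
| 25% | | 63.0 |
| 50% | | 67.0 |
| 75% | | 69.0 |
| max | | 76.0 |

```python
systolic.describe()
```

| | device | value |
|---|---|---|
| count | 0.0 | 29.0 |
| mean | | 113.206897 |
| std | | 8.973689 |
| min | | 95.0 |
| 25% | | 106.0 |
| 50% | | 112.0 |
| 75% | | 122.0 |
| max | | 128.0 |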
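Because the parser only writes CSVs for record types present in the export, reading a file that was never written fails, as it did above. A sketch of a tolerant loader plus a way to pair systolic and diastolic readings; `read_optional` and `pair_blood_pressure` are hypothetical helpers, and pairing on an identical `startDate` is an assumption worth checking against your own data.

```python
from pathlib import Path

import pandas as pd


def read_optional(path):
    """Return the CSV as a DataFrame, or None if the parser did not create it."""
    path = Path(path)
    if not path.is_file():
        print(f"{path} not found; the export may have no records of this type")
        return None
    return pd.read_csv(path)


def pair_blood_pressure(systolic, diastolic):
    """Join systolic and diastolic readings that share a startDate."""
    return pd.merge(
        systolic[["startDate", "value"]].rename(columns={"value": "systolic"}),
        diastolic[["startDate", "value"]].rename(columns={"value": "diastolic"}),
        on="startDate", how="inner")


systolic = read_optional("data/BloodPressureSystolic.csv")
diastolic = read_optional("data/BloodPressureDiastolic.csv")
if systolic is not None and diastolic is not None:
    print(pair_blood_pressure(systolic, diastolic).describe())
```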


------

## Sleep

```python
sleep = pd.read_csv("data/SleepAnalysis.csv")
```

```python
sleep.tail()
```

| | sourceName | sourceVersion | device | type | unit | creationDate | startDate | endDate | value |
|---|---|---|---|---|---|---|---|---|---|
| 431 | Clock | 50.0 | `<<HKDevice: 0x28378eda0>, name:iPhone, manufac...` | SleepAnalysis | | 2018-12-10 10:40:21 -0400 | 2018-12-10 07:10:28 -0400 | 2018-12-10 07:34:32 -0400 | HKCategoryValueSleepAnalysisInBed |
| 432 | Clock | 50.0 | `<<HKDevice: 0x28378eda0>, name:iPhone, manufac...` | SleepAnalysis | | 2018-12-10 10:40:21 -0400 | 2018-12-10 07:35:20 -0400 | 2018-12-10 07:45:08 -0400 | HKCategoryValueSleepAnalysisInBed |
| 433 | Clock | 50.0 | `<<HKDevice: 0x28378eda0>, name:iPhone, manufac...` | SleepAnalysis | | 2018-12-10 10:40:21 -0400 | 2018-12-10 08:08:04 -0400 | 2018-12-10 08:09:28 -0400 | HKCategoryValueSleepAnalysisInBed |
| 434 | Clock | 50.0 | `<<HKDevice: 0x28378eda0>, name:iPhone, manufac...` | SleepAnalysis | | 2018-12-10 10:40:21 -0400 | 2018-12-10 08:09:44 -0400 | 2018-12-10 08:21:52 -0400 | HKCategoryValueSleepAnalysisInBed |
| 435 | Clock | 50.0 | `<<HKDevice: 0x28378eda0>, name:iPhone, manufac...` | SleepAnalysis | | 2018-12-10 10:40:21 -0400 | 2018-12-10 08:22:40 -0400 | 2018-12-10 10:40:11 -0400 | HKCategoryValueSleepAnalysisInBed |


```python
sleep.describe()
```

| | sourceVersion | unit |
|---|---|---|
| count | 436.0 | 0.0 |
| mean | 49.887615 | |
| std | 2.346674 | |
| min | 1.0 | |
| 25% | 50.0 | |
| 50% | 50.0 | |
| 75% | 50.0 | |
| max | 50.0 | |
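As with the stand data, `describe()` here only covers the numeric `sourceVersion` and the empty `unit` column, which say nothing about sleep. A sketch (hypothetical `in_bed_hours` helper) that totals segment durations per night from the `startDate`/`endDate` pairs, attributing each segment to the local date on which it ends; that attribution rule and the default category filter are assumptions. The rows shown above are all `HKCategoryValueSleepAnalysisInBed` from the Clock app.

```python
import pandas as pd

FMT = "%Y-%m-%d %H:%M:%S %z"


def in_bed_hours(df, category="HKCategoryValueSleepAnalysisInBed"):
    """Sum segment durations (hours) of one sleep category per local wake date."""
    seg = df[df["value"] == category]
    start = pd.to_datetime(seg["startDate"], format=FMT, utc=True)
    end = pd.to_datetime(seg["endDate"], format=FMT, utc=True)
    hours = (end - start).dt.total_seconds() / 3600
    # Group by the local date in the endDate string (the morning of waking)
    wake_day = seg["endDate"].str[:10].rename("night_ending")
    return hours.groupby(wake_day).sum()
```

For the five segments in the tail above alone, this adds up to 11,095 seconds, a little over 3 hours, for 2018-12-10.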
