# Apple Health Extractor

This code will parse your Apple Health export data, create multiple CSV and do some simple data checks and data analysis. 

Enjoy! 

--------

## Extract Data and Export to CSVs from Apple Health's Export.xml

* Command Line Tool to Process apple health's export.xml file 
* Create multiple CSV files for each data type. 
* Original Source: https://github.com/tdda/applehealthdata
* Based on the size of your Apple Health Data, this script may take several minutes to complete.

**NOTE: Currently there are a few minror errors based on additional data from Apple Health that require some updates.** 

## Setup and Usage NOTE

* Export your data from Apple Health App on your phone. 
* Unzip export.zip into this directory and rename to data. 
* Inside your directory there should be a directory and file here: /data/export.xml
* Run inside project or in the command line.

In [1]:
# %run -i 'apple-health-data-parser' 'export.xml' 
%run -i 'apple-health-data-parser' 'export.xml' 

Reading data from export.xml . . . done
Unexpected node of type ExportDate.

Tags:
ActivitySummary: 686
ExportDate: 1
Me: 1
Record: 1142965
Workout: 106

Fields:
HKCharacteristicTypeIdentifierBiologicalSex: 1
HKCharacteristicTypeIdentifierBloodType: 1
HKCharacteristicTypeIdentifierDateOfBirth: 1
HKCharacteristicTypeIdentifierFitzpatrickSkinType: 1
activeEnergyBurned: 686
activeEnergyBurnedGoal: 686
activeEnergyBurnedUnit: 686
appleExerciseTime: 686
appleExerciseTimeGoal: 686
appleStandHours: 686
appleStandHoursGoal: 686
creationDate: 1143071
dateComponents: 686
device: 1125552
duration: 106
durationUnit: 106
endDate: 1143071
sourceName: 1143071
sourceVersion: 1138201
startDate: 1143071
totalDistance: 106
totalDistanceUnit: 106
totalEnergyBurned: 106
totalEnergyBurnedUnit: 106
type: 1142965
unit: 1133858
value: 1142954
workoutActivityType: 106

Record types:
ActiveEnergyBurned: 525528
AppleExerciseTime: 11599
AppleStandHour: 9073
AppleStandTime: 4813
BasalEnergyBurned: 100290
BodyFatPer

-----

# Apple Health Data Check and Simple Data Analysis

In [2]:
import numpy as np
import pandas as pd
import glob

----

# Weight

In [5]:
weight = pd.read_csv("BodyMass.csv")

In [6]:
weight.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
176,Mi Fit,201907081918,,BodyMass,kg,2020-07-02 07:52:37 +0530,2020-07-02 07:52:31 +0530,2020-07-02 07:52:31 +0530,88.8
177,Mi Fit,201907081918,,BodyMass,kg,2020-07-04 09:09:36 +0530,2020-07-04 09:09:25 +0530,2020-07-04 09:09:25 +0530,90.9
178,Mi Fit,201907081918,,BodyMass,kg,2020-07-05 09:03:03 +0530,2020-07-04 09:10:52 +0530,2020-07-04 09:10:52 +0530,89.4
179,Mi Fit,201907081918,,BodyMass,kg,2020-07-05 09:03:03 +0530,2020-07-05 09:02:55 +0530,2020-07-05 09:02:55 +0530,88.9
180,Mi Fit,201907081918,,BodyMass,kg,2020-07-06 08:33:11 +0530,2020-07-06 08:33:05 +0530,2020-07-06 08:33:05 +0530,88.3


In [7]:
weight.describe()

Unnamed: 0,device,value
count,0.0,181.0
mean,,88.637569
std,,0.806861
min,,84.2
25%,,88.3
50%,,88.6
75%,,89.1
max,,90.9


----

## Steps

In [8]:
steps = pd.read_csv("StepCount.csv")

In [9]:
len(steps)

174943

In [10]:
steps.columns

Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')

In [11]:
steps.describe()

Unnamed: 0,value
count,174943.0
mean,82.619207
std,214.041698
min,1.0
25%,17.0
50%,40.0
75%,90.0
max,43109.0


In [12]:
steps.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
174938,Shashank's iPhone,13.5.1,"<<HKDevice: 0x28273ef80>, name:iPhone, manufac...",StepCount,count,2020-07-06 08:33:25 +0530,2020-07-06 08:32:44 +0530,2020-07-06 08:32:47 +0530,4
174939,Shashank's iPhone,13.5.1,"<<HKDevice: 0x28273f020>, name:iPhone, manufac...",StepCount,count,2020-07-06 08:45:07 +0530,2020-07-06 08:34:02 +0530,2020-07-06 08:35:22 +0530,75
174940,Shashank's iPhone,13.5.1,"<<HKDevice: 0x28273f0c0>, name:iPhone, manufac...",StepCount,count,2020-07-06 09:28:26 +0530,2020-07-06 09:27:42 +0530,2020-07-06 09:28:02 +0530,17
174941,Shashank's iPhone,13.5.1,"<<HKDevice: 0x28273f160>, name:iPhone, manufac...",StepCount,count,2020-07-06 09:32:56 +0530,2020-07-06 09:29:15 +0530,2020-07-06 09:32:12 +0530,101
174942,Shashank's iPhone,13.5.1,"<<HKDevice: 0x28273f200>, name:iPhone, manufac...",StepCount,count,2020-07-06 09:55:59 +0530,2020-07-06 09:44:57 +0530,2020-07-06 09:45:22 +0530,29


In [13]:
# total all-time steps
steps.value.sum()

14453652

-------

## Stand Count

In [14]:
stand = pd.read_csv("AppleStandHour.csv")

In [15]:
len(stand)

9073

In [16]:
stand.columns

Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')

In [17]:
stand.describe()

Unnamed: 0,unit
count,0.0
mean,
std,
min,
25%,
50%,
75%,
max,


In [18]:
stand.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
9068,Shashank’s Apple Watch,6.1.3,"<<HKDevice: 0x2826a4be0>, name:Apple Watch, ma...",AppleStandHour,,2020-07-05 19:01:23 +0530,2020-07-05 19:00:00 +0530,2020-07-05 20:00:00 +0530,HKCategoryValueAppleStandHourStood
9069,Shashank’s Apple Watch,6.1.3,"<<HKDevice: 0x2826a4cd0>, name:Apple Watch, ma...",AppleStandHour,,2020-07-05 20:11:10 +0530,2020-07-05 20:00:00 +0530,2020-07-05 21:00:00 +0530,HKCategoryValueAppleStandHourStood
9070,Shashank’s Apple Watch,6.1.3,"<<HKDevice: 0x2826a4dc0>, name:Apple Watch, ma...",AppleStandHour,,2020-07-05 21:04:22 +0530,2020-07-05 21:00:00 +0530,2020-07-05 22:00:00 +0530,HKCategoryValueAppleStandHourStood
9071,Shashank’s Apple Watch,6.1.3,"<<HKDevice: 0x2826a4eb0>, name:Apple Watch, ma...",AppleStandHour,,2020-07-05 22:01:13 +0530,2020-07-05 22:00:00 +0530,2020-07-05 23:00:00 +0530,HKCategoryValueAppleStandHourStood
9072,Shashank’s Apple Watch,6.1.3,"<<HKDevice: 0x2826a4fa0>, name:Apple Watch, ma...",AppleStandHour,,2020-07-05 23:35:54 +0530,2020-07-05 23:00:00 +0530,2020-07-06 00:00:00 +0530,HKCategoryValueAppleStandHourStood


------

## Resting Heart Rate (HR)

In [19]:
restingHR = pd.read_csv("RestingHeartRate.csv")

In [20]:
len(restingHR)

645

In [21]:
restingHR.describe()

Unnamed: 0,device,value
count,0.0,645.0
mean,,69.809302
std,,5.422455
min,,50.0
25%,,67.0
50%,,69.0
75%,,72.0
max,,98.0


---

## Walking Heart Rate (HR) Average

In [22]:
walkingHR = pd.read_csv("WalkingHeartRateAverage.csv")

In [23]:
len(walkingHR)

539

In [24]:
walkingHR.describe()

Unnamed: 0,device,value
count,0.0,539.0
mean,,99.084416
std,,11.996546
min,,72.5
25%,,91.0
50%,,97.0
75%,,104.0
max,,143.0


---

## Heart Rate Variability (HRV)

In [25]:
hrv = pd.read_csv("HeartRateVariabilitySDNN.csv")

In [26]:
len(hrv)

1687

In [27]:
hrv.columns

Index(['sourceName', 'sourceVersion', 'device', 'type', 'unit', 'creationDate',
       'startDate', 'endDate', 'value'],
      dtype='object')

In [28]:
hrv.describe()

Unnamed: 0,value
count,1687.0
mean,33.308511
std,13.458962
min,7.32718
25%,23.76185
50%,31.0815
75%,40.1322
max,160.64


In [27]:
hrv.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
1211,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c0c8dc50>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2018-04-30 04:23:46 +0800,2018-04-30 04:22:40 +0800,2018-04-30 04:23:45 +0800,12.5996
1212,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c0684920>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2018-04-30 06:23:48 +0800,2018-04-30 06:22:47 +0800,2018-04-30 06:23:48 +0800,32.791
1213,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c0c9a5e0>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2018-04-30 08:24:10 +0800,2018-04-30 08:23:05 +0800,2018-04-30 08:24:10 +0800,22.8008
1214,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c06932e0>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2018-04-30 12:37:02 +0800,2018-04-30 12:35:57 +0800,2018-04-30 12:37:02 +0800,110.704
1215,Mark’s Apple Watch,4.3,"<<HKDevice: 0x1c0697c00>, name:Apple Watch, ma...",HeartRateVariabilitySDNN,ms,2018-04-30 17:20:02 +0800,2018-04-30 17:18:57 +0800,2018-04-30 17:20:01 +0800,37.1214


-------

## VO2 Max

In [28]:
vo2max = pd.read_csv("data/VO2Max.csv")

In [29]:
len(vo2max)

143

In [30]:
vo2max.describe()

Unnamed: 0,sourceVersion,device,value
count,0.0,0.0,143.0
mean,,,51.085681
std,,,1.900692
min,,,48.0084
25%,,,49.3646
50%,,,51.0986
75%,,,52.3505
max,,,55.0978


----

## Blood Pressure

In [31]:
diastolic = pd.read_csv("data/BloodPressureDiastolic.csv")
systolic = pd.read_csv("data/BloodPressureSystolic.csv")

In [32]:
diastolic.describe()

Unnamed: 0,device,value
count,0.0,29.0
mean,,65.586207
std,,5.0816
min,,55.0
25%,,63.0
50%,,67.0
75%,,69.0
max,,76.0


In [33]:
systolic.describe()

Unnamed: 0,device,value
count,0.0,29.0
mean,,113.206897
std,,8.973689
min,,95.0
25%,,106.0
50%,,112.0
75%,,122.0
max,,128.0


------

## Sleep

In [34]:
sleep = pd.read_csv("data/SleepAnalysis.csv")

In [35]:
sleep.tail()

Unnamed: 0,sourceName,sourceVersion,device,type,unit,creationDate,startDate,endDate,value
1807,AutoSleep,5.1.20,,SleepAnalysis,,2018-04-28 11:34:16 +0800,2018-04-28 10:23:00 +0800,2018-04-28 10:47:00 +0800,HKCategoryValueSleepAnalysisAsleep
1808,AutoSleep,5.1.20,,SleepAnalysis,,2018-04-29 08:17:12 +0800,2018-04-29 00:27:00 +0800,2018-04-29 08:12:00 +0800,HKCategoryValueSleepAnalysisInBed
1809,AutoSleep,5.1.20,,SleepAnalysis,,2018-04-29 08:17:12 +0800,2018-04-29 00:27:00 +0800,2018-04-29 08:12:00 +0800,HKCategoryValueSleepAnalysisAsleep
1810,AutoSleep,5.1.20,,SleepAnalysis,,2018-04-30 10:04:58 +0800,2018-04-30 00:45:00 +0800,2018-04-30 08:43:00 +0800,HKCategoryValueSleepAnalysisInBed
1811,AutoSleep,5.1.20,,SleepAnalysis,,2018-04-30 10:04:58 +0800,2018-04-30 00:45:00 +0800,2018-04-30 08:43:00 +0800,HKCategoryValueSleepAnalysisAsleep


In [36]:
sleep.describe()

Unnamed: 0,device,unit
count,0.0,0.0
mean,,
std,,
min,,
25%,,
50%,,
75%,,
max,,
