## Extract 2019 Step Counts from Apple iOS Health Data App

Thanks to https://medium.com/better-programming/analyze-your-icloud-health-data-with-pandas-dd5e963e902f for helping me understand Apple's iOS Health app and how its exported records are structured.

In [1]:
#pip install xmltodict

In [2]:
import pandas as pd
import xmltodict

#export.xml is not included here, to protect my personal information
with open('export.xml', 'r') as xml_data:
    input_data = xmltodict.parse(xml_data.read())

In [3]:
the_records = input_data['HealthData']['Record']
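For readers without an export file, the shape of `the_records` can be illustrated on a tiny hand-written sample. The sketch below uses the standard library's `xml.etree.ElementTree` rather than `xmltodict`, and adds the `@` prefix to attribute names by hand so the keys look like the ones in this notebook. The sample XML and the `parse_records` helper are assumptions made up for illustration, not part of a real export.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-record sample; a real export.xml has more attributes.
SAMPLE_XML = """<HealthData>
  <Record type="HKQuantityTypeIdentifierStepCount" unit="count"
          creationDate="2016-01-04 19:21:15 -0400" value="12"/>
  <Record type="HKQuantityTypeIdentifierFlightsClimbed" unit="count"
          creationDate="2016-01-04 20:21:51 -0400" value="1"/>
</HealthData>"""


def parse_records(xml_text):
    """Return one dict per <Record>, with '@'-prefixed attribute names."""
    root = ET.fromstring(xml_text)
    return [
        {"@" + name: value for name, value in record.attrib.items()}
        for record in root.iter("Record")
    ]


sample_records = parse_records(SAMPLE_XML)
print(sample_records[0]["@type"])   # HKQuantityTypeIdentifierStepCount
print(sample_records[0]["@value"])  # 12 (still a string at this point)
```

Every attribute arrives as a string, which is why the dates and values need converting before any arithmetic.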

In [4]:
df = pd.DataFrame(the_records)
df.columns

Index(['@type', '@sourceName', '@sourceVersion', '@device', '@unit',
       '@creationDate', '@startDate', '@endDate', '@value', 'MetadataEntry'],
      dtype='object')

In [5]:
#List the available record types; I chose StepCount
df['@type'].unique()

array(['HKQuantityTypeIdentifierStepCount',
       'HKQuantityTypeIdentifierDistanceWalkingRunning',
       'HKQuantityTypeIdentifierFlightsClimbed',
       'HKCategoryTypeIdentifierSleepAnalysis'], dtype=object)

In [6]:
#Keep only the step count records
steps = df[df['@type'] == 'HKQuantityTypeIdentifierStepCount']

In [7]:
df['@creationDate'][:10]

0    2016-01-04 19:21:15 -0400
1    2016-01-04 20:21:51 -0400
2    2016-01-04 20:21:51 -0400
3    2016-01-04 21:21:15 -0400
4    2016-01-04 23:51:15 -0400
5    2016-01-05 09:57:59 -0400
6    2016-01-05 11:02:43 -0400
7    2016-01-05 11:02:43 -0400
8    2016-01-05 11:51:15 -0400
9    2016-01-05 11:51:15 -0400
Name: @creationDate, dtype: object

In [8]:
df['@startDate'][:10]

0    2016-01-04 19:11:51 -0400
1    2016-01-04 19:17:34 -0400
2    2016-01-04 19:24:20 -0400
3    2016-01-04 21:15:57 -0400
4    2016-01-04 22:56:52 -0400
5    2016-01-05 09:19:16 -0400
6    2016-01-05 09:54:25 -0400
7    2016-01-05 10:44:05 -0400
8    2016-01-05 11:28:49 -0400
9    2016-01-05 11:38:28 -0400
Name: @startDate, dtype: object

In [9]:
df['@endDate'][:10]

0    2016-01-04 19:17:34 -0400
1    2016-01-04 19:17:52 -0400
2    2016-01-04 19:24:56 -0400
3    2016-01-04 21:16:46 -0400
4    2016-01-04 23:01:09 -0400
5    2016-01-05 09:19:16 -0400
6    2016-01-05 09:57:59 -0400
7    2016-01-05 10:44:38 -0400
8    2016-01-05 11:32:42 -0400
9    2016-01-05 11:38:49 -0400
Name: @endDate, dtype: object
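The three date columns share one layout: date, time, then a UTC offset. As a minimal sketch using only the standard library, a single value in that layout can be parsed as follows. The constant name `HEALTH_DATE_FORMAT` is chosen for this example only.

```python
from datetime import datetime, timedelta

# Layout of the timestamps shown above: date, time, then a UTC offset.
HEALTH_DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"  # name chosen for this sketch

parsed = datetime.strptime("2016-01-04 19:21:15 -0400", HEALTH_DATE_FORMAT)
print(parsed.isoformat())                          # 2016-01-04T19:21:15-04:00
print(parsed.utcoffset() == timedelta(hours=-4))   # True
```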

In [10]:
#The date columns are strings, so convert them to datetime objects.
#Work on an explicit copy of the filtered rows so pandas does not raise
#SettingWithCopyWarning when the columns are reassigned.
steps = steps.copy()

steps['@creationDate'] = pd.to_datetime(steps['@creationDate'])
steps['@startDate'] = pd.to_datetime(steps['@startDate'])
steps['@endDate'] = pd.to_datetime(steps['@endDate'])

In [11]:
#Convert the step values from strings to numbers, then check the dtypes
steps['@value'] = pd.to_numeric(steps['@value'])
steps.dtypes

@type                                             object
@sourceName                                       object
@sourceVersion                                    object
@device                                           object
@unit                                             object
@creationDate     datetime64[ns, pytz.FixedOffset(-240)]
@startDate        datetime64[ns, pytz.FixedOffset(-240)]
@endDate          datetime64[ns, pytz.FixedOffset(-240)]
@value                                             int64
MetadataEntry                                     object
dtype: object
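Filtering a DataFrame and then assigning columns on the result is the pattern that can trigger pandas' SettingWithCopyWarning. A minimal sketch of one common way to avoid the ambiguity, taking an explicit copy of the filtered rows, is shown below on a made-up three-row table. The names `small` and `small_steps` and the shortened type labels are assumptions for illustration.

```python
import pandas as pd

# Hypothetical miniature of the records table.
small = pd.DataFrame({
    "@type": ["StepCount", "FlightsClimbed", "StepCount"],
    "@value": ["12", "1", "30"],
})

# .copy() makes the filtered rows an independent DataFrame, so the
# assignment below cannot be mistaken for a write into `small`.
small_steps = small[small["@type"] == "StepCount"].copy()
small_steps["@value"] = pd.to_numeric(small_steps["@value"])

print(small_steps["@value"].sum())  # 42
print(small["@value"].iloc[0])      # 12 (the original column still holds strings)
```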

In [12]:
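#Sum the steps recorded at each creation timestamp, then total them per day
count_steps = steps.groupby('@creationDate')['@value'].sum()
days = count_steps.resample('D').sum()
#Average the daily totals within each month
months = days.resample('M').mean()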
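The aggregation has two stages: total the steps per day, then average those daily totals per month. Because `resample('D').sum()` gives 0 for a day with no records, days without data pull the monthly average down. The plain-Python sketch below reproduces that arithmetic on made-up data, assuming the month lies fully inside the recorded date range. The sample records and variable names are hypothetical.

```python
import calendar
from collections import defaultdict
from datetime import date

# Hypothetical (day, step count) records; several records can share a day.
sample = [
    (date(2019, 1, 1), 100),
    (date(2019, 1, 1), 50),
    (date(2019, 1, 3), 310),
]

# Stage 1: total steps per calendar day.
daily_totals = defaultdict(int)
for day, count in sample:
    daily_totals[day] += count

# Stage 2: total steps per month, divided by the number of days in that
# month, so days without any record count as zero steps.
monthly_totals = defaultdict(int)
for day, total in daily_totals.items():
    monthly_totals[(day.year, day.month)] += total

monthly_mean = {
    (year, month): total / calendar.monthrange(year, month)[1]
    for (year, month), total in monthly_totals.items()
}
print(monthly_mean)
```

Here January has 460 steps spread over 31 days, so the monthly figure is 460 / 31 even though only two days have records.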

In [13]:
#Filter by year with months.index.year
#(months.index.month filters by month in the same way)

avg_steps_month = months[(months.index.year == 2019)]
avg_steps_month

@creationDate
2019-01-31 00:00:00-04:00    1367.838710
2019-02-28 00:00:00-04:00    2053.642857
2019-03-31 00:00:00-04:00    1274.483871
2019-04-30 00:00:00-04:00    2701.566667
2019-05-31 00:00:00-04:00    1115.580645
2019-06-30 00:00:00-04:00    2439.700000
2019-07-31 00:00:00-04:00    5511.645161
2019-08-31 00:00:00-04:00    1088.451613
2019-09-30 00:00:00-04:00    1577.133333
2019-10-31 00:00:00-04:00    1689.935484
2019-11-30 00:00:00-04:00    1743.866667
2019-12-31 00:00:00-04:00    1514.129032
Freq: M, Name: @value, dtype: float64

In [14]:
avg_steps_month.plot()



[Figure: line plot of avg_steps_month, the average daily steps for each month of 2019]

In [32]:
#Label each 2019 monthly average with its month name, ready for the csv export
#(named month_names so it does not overwrite the months Series above)
month_names = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
               'August', 'September', 'October', 'November', 'December']

frame = pd.DataFrame(avg_steps_month)
frame.index = month_names
frame

                @value
January    1367.838710
February   2053.642857
March      1274.483871
April      2701.566667
May        1115.580645
June       2439.700000
July       5511.645161
August     1088.451613
September  1577.133333
October    1689.935484
November   1743.866667
December   1514.129032


In [34]:
#Scale down the values (one unit per 200 steps, rounded up) for easier bar graph creation
import math

frame['@value'] = [math.ceil(v / 200) for v in frame['@value']]
frame

           @value
January         7
February       11
March           7
April          14
May             6
June           13
July           28
August          6
September       8
October         9
November        9
December        8
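The scaling step divides each monthly average by 200 and rounds up, so 1367.84 becomes 7 and 5511.65 becomes 28. A small standalone sketch of that arithmetic follows. The helper name `scale_down` and its `step` parameter are introduced here for illustration only.

```python
import math


def scale_down(values, step=200):
    """Divide each value by `step` and round up to the next whole number."""
    return [math.ceil(value / step) for value in values]


# January, February and July averages from the table above.
print(scale_down([1367.83871, 2053.642857, 5511.645161]))  # [7, 11, 28]
```

Rounding up rather than to the nearest integer means any month with at least one step gets a bar of height 1 or more.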


In [39]:
#Export the dataframe to a csv file (no header row) for echoAR to use the values
frame.to_csv("echoar_metadata.csv", header=False)
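With `header=False`, each line of the file is just a month name and its scaled value separated by a comma. How echoAR consumes the file is not covered in this notebook, so the sketch below only shows how such a header-less two-column file can be read back with the standard library. The `csv_text` string stands in for the first rows of the file and is an assumption for illustration.

```python
import csv
import io

# Stand-in for the contents of echoar_metadata.csv (first rows only).
csv_text = "January,7\nFebruary,11\nMarch,7\n"

# Each row is [month, height]; there is no header row to skip.
reader = csv.reader(io.StringIO(csv_text))
bars = {month: int(height) for month, height in reader}
print(bars)  # {'January': 7, 'February': 11, 'March': 7}
```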