# Glucose Data Science
---
Goal: The goal of this experiment is to see whether I can predict which days I exercised based on that day's
blood glucose readings.
Stretch Goal: Predict the type of exercise.

### Technology:
- Apple HealthKit
- Dexcom Continuous Glucose Monitor
- Data science

### High Level Objectives:
- Gather Data
- Clean Up Data
- Inspect Data
- ML with Data

### Background Information:
 I have __Type 1 Diabetes__, an autoimmune disease that causes my body to attack the insulin-producing
 cells in my pancreas. Insulin is what helps your body use the energy (glucose) in the food you eat. To manage
 this disease I use some extremely cool pieces of technology. One is a continuous glucose monitor (CGM) called a
 __Dexcom__. I wear this CGM on my body, and it measures the amount of glucose in my interstitial fluid (the
 fluid between cells) every 5 minutes. That data is then sent to my phone (even allowing me to view my real-time
 glucose on my Apple Watch!) as well as to my insulin pump. This data, along with other information, is used to help
 manage my diabetes.

 In my personal experience, when I exercise I tend to have higher insulin sensitivity, which usually means
 better blood glucose numbers throughout the day. Blood glucose is something the body normally regulates on its
 own with insulin, but as a type 1 diabetic I have to manage it manually.

 The goal of this exercise is to see if I can pull that correlation out of the data, and possibly use a
 machine learning model to predict which days I exercised based only on my glucose data.

# Gather Data
---
The source of the exercise data will be Apple HealthKit. HealthKit automatically has my exercise data in it from when I
track workouts. It can also be configured to sync with the Dexcom iOS app. So now let's see if we can download and load
that data.

## Export data from phone:
- Open the Health app
- Tap your profile icon
- Scroll to the bottom and tap "Export All Health Data"
- This may take a few minutes
- Download the zip
- Expand the zip
- This data is pretty sensitive, so don't commit it to git!
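
The "expand zip" step can also be done from Python. This is a minimal sketch, assuming the archive was saved as `export.zip` (a hypothetical file name) and that it contains an `export.xml` somewhere inside; the helper name `extract_health_export` is my own and not part of any library.

```python
import zipfile
from pathlib import Path


def extract_health_export(zip_path, dest_dir="."):
    """Unpack a Health export archive and return the path to export.xml.

    Raises FileNotFoundError if the archive contains no export.xml.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        names = archive.namelist()
    # Search for export.xml instead of hard-coding the folder name
    # (assumption: the file keeps this name inside the archive).
    for name in names:
        if Path(name).name == "export.xml":
            return dest / name
    raise FileNotFoundError("export.xml not found in " + str(zip_path))
```

Something like `xml_path = extract_health_export("export.zip")` would then give the path to load. Adding both the zip and the extracted folder to `.gitignore` is an easy way to keep the data out of git.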


In [1]:
import sys
print("The version of python is: ", sys.version)

The version of python is:  3.6.8 (v3.6.8:3c6b436a57, Dec 24 2018, 02:04:31) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]


# Let's import this data into pandas

In [2]:
import pandas as pd
import xmltodict
input_path = './apple_health_export/export.xml'
with open(input_path, 'r') as xml_file:
    input_data = xmltodict.parse(xml_file.read())

# available health records
# thanks to this article for explaining:
#   https://medium.com/better-programming/analyze-your-icloud-health-data-with-pandas-dd5e963e902f
records_list = input_data['HealthData']['Record']
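
The export file can get large, and `xmltodict.parse` builds the whole document in memory first. As an alternative, here is a hedged sketch that streams the `Record` elements with the standard library's `iterparse`. The helper name `iter_records` and its `record_type` parameter are hypothetical, and note that the resulting keys have no `@` prefix (`type` instead of `@type`), unlike the `xmltodict` output used in the rest of this notebook.

```python
import xml.etree.ElementTree as ET


def iter_records(xml_source, record_type=None):
    """Yield the attributes of each <Record> element as a plain dict.

    xml_source can be a file path or a binary file object.
    record_type optionally keeps only records whose "type" attribute matches.
    """
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "Record":
            continue
        if record_type is None or elem.get("type") == record_type:
            # Copy the attributes before the element is cleared below.
            yield dict(elem.attrib)
        # Drop the element's attributes and children to limit memory use.
        elem.clear()
```

The generator can be passed straight to pandas, for example `pd.DataFrame(iter_records(input_path, 'HKQuantityTypeIdentifierBloodGlucose'))`.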

In [3]:
df = pd.DataFrame(records_list)

In [4]:
df.columns

Index(['@type', '@sourceName', '@unit', '@creationDate', '@startDate',
       '@endDate', '@value', 'MetadataEntry', '@sourceVersion', '@device',
       'HeartRateVariabilityMetadataList'],
      dtype='object')

In [5]:
# Show All data types
df['@type'].unique()

array(['HKQuantityTypeIdentifierBloodGlucose',
       'HKQuantityTypeIdentifierBodyMassIndex',
       'HKQuantityTypeIdentifierHeight',
       'HKQuantityTypeIdentifierBodyMass',
       'HKQuantityTypeIdentifierHeartRate',
       'HKQuantityTypeIdentifierBodyFatPercentage',
       'HKQuantityTypeIdentifierLeanBodyMass',
       'HKQuantityTypeIdentifierStepCount',
       'HKQuantityTypeIdentifierDistanceWalkingRunning',
       'HKQuantityTypeIdentifierBasalEnergyBurned',
       'HKQuantityTypeIdentifierActiveEnergyBurned',
       'HKQuantityTypeIdentifierFlightsClimbed',
       'HKQuantityTypeIdentifierAppleExerciseTime',
       'HKQuantityTypeIdentifierDistanceCycling',
       'HKQuantityTypeIdentifierDistanceSwimming',
       'HKQuantityTypeIdentifierSwimmingStrokeCount',
       'HKQuantityTypeIdentifierRestingHeartRate',
       'HKQuantityTypeIdentifierVO2Max',
       'HKQuantityTypeIdentifierWalkingHeartRateAverage',
       'HKQuantityTypeIdentifierEnvironmentalAudioExposure',
     

In [6]:
# show all source devices
df['@device'].unique()

array([nan,
       '<<HKDevice: 0x2821663f0>, name:Apple Watch, manufacturer:Apple, model:Watch, hardware:Watch3,2, software:5.0>',
       '<<HKDevice: 0x282166300>, name:Apple Watch, manufacturer:Apple, model:Watch, hardware:Watch3,2, software:5.0>',
       ...,
       '<<HKDevice: 0x28214a800>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch5,4, software:7.2>',
       '<<HKDevice: 0x28214a350>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch5,4, software:7.2>',
       '<<HKDevice: 0x28214a300>, name:Apple Watch, manufacturer:Apple Inc., model:Watch, hardware:Watch5,4, software:7.3>'],
      dtype=object)
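
Many of these device strings differ only in the `HKDevice` memory address, so the unique list is longer than the number of real devices. Below is a small sketch for splitting a string into its fields, assuming the layout shown in the output above (`key:value` pairs separated by a comma and a space). The helper name `parse_device` is hypothetical.

```python
def parse_device(device):
    """Split an '@device' string into a dict of its key:value fields.

    Returns an empty dict for missing values (NaN) or unexpected layouts.
    """
    if not isinstance(device, str):
        return {}
    # Skip the leading "<<HKDevice: 0x...>" part.
    _, sep, rest = device.partition(">, ")
    if not sep:
        return {}
    fields = {}
    # Split on ", " (comma + space) so values such as "Watch3,2" stay intact.
    for part in rest.rstrip(">").split(", "):
        key, colon, value = part.partition(":")
        if colon:
            fields[key.strip()] = value.strip()
    return fields
```

Applied to the column with `df['@device'].map(parse_device)`, this gives one dict per record, which makes it easier to group by `hardware` or `software` instead of the raw string.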

### Before working with glucose data, let's try a quick experiment with step counts, similar to the guide above.

In [9]:
# Parse the timestamp columns before slicing so the step-count subset gets them too.
# The explicit format from the Medium article did not work for me:
# format = '%Y-%m-%d %H:%M:%S %z'
# so let pandas infer the format instead.
for col in ['@creationDate', '@startDate', '@endDate']:
    df[col] = pd.to_datetime(df[col])
# Take an explicit copy so the assignment below does not trigger SettingWithCopyWarning.
step_counts = df[df['@type'] == 'HKQuantityTypeIdentifierStepCount'].copy()
step_counts['@value'] = pd.to_numeric(step_counts['@value'])


In [12]:
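# Sum only the numeric step values per creation timestamp, then roll up to daily totals.
step_counts_by_creation = step_counts.groupby('@creationDate')[['@value']].sum()
by_day = step_counts_by_creation['@value'].resample('D').sum()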
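
To make the daily roll-up easier to follow, here is the same idea on a tiny made-up sample. The three rows below are invented for illustration and are not from my export; only the column names match the real data.

```python
import pandas as pd

# Made-up records in the same shape as the step-count rows.
sample = pd.DataFrame({
    "@creationDate": [
        "2021-01-01 08:00:00 -0800",
        "2021-01-01 18:30:00 -0800",
        "2021-01-03 09:15:00 -0800",
    ],
    "@value": ["100", "250", "40"],
})
# The export stores everything as strings, so convert both columns first.
sample["@creationDate"] = pd.to_datetime(sample["@creationDate"])
sample["@value"] = pd.to_numeric(sample["@value"])

# resample needs a datetime index; "D" buckets the rows by calendar day.
daily = sample.set_index("@creationDate")["@value"].resample("D").sum()
print(daily)
```

The two records on January 1 are added together, and January 2, which has no records, still shows up as a day with a total of 0.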

### Days with most steps

In [13]:
by_day.sort_values(ascending=False)[:10]

@creationDate
2018-10-03 00:00:00-08:00    60522
2019-07-05 00:00:00-08:00    53743
2017-06-04 00:00:00-08:00    52050
2018-07-20 00:00:00-08:00    52013
2016-06-02 00:00:00-08:00    47451
2018-11-29 00:00:00-08:00    46813
2017-10-17 00:00:00-08:00    46415
2016-04-17 00:00:00-08:00    44828
2018-07-19 00:00:00-08:00    44784
2016-06-03 00:00:00-08:00    44388
Name: @value, dtype: int64