## My Heart Counts Machine Learning Infrastructure
This Jupyter notebook demonstrates how to use the My Heart Counts machine learning infrastructure.
    
# Architecture ER diagram

![My Heart Counts ML pipeline architecture](tutorial_docs/MyHeartCounts_ML_Pipeline_architecture.png)




# Usage

pip install MyHeartCounts

The MyHeartCounts Python package will set up access to Synapse and Amazon Web Services. 

Inputs:
1. Plain text file with Synapse credentials: the username on the first line and the password on the second. For example:

alijaved<br>
thisismypassword
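
A malformed credentials file is an easy mistake to make, so it can help to check the file before handing its path to the package. The sketch below is only an illustration: `read_credentials` is a hypothetical helper, not part of MyHeartCounts, and it assumes exactly the two-line layout described above.

```python
from pathlib import Path


def read_credentials(path):
    """Return (username, password) from a two-line plain text file.

    Hypothetical helper for illustration; not part of the MyHeartCounts package.
    """
    text = Path(path).read_text(encoding="utf-8")
    # Drop blank lines so a trailing newline is not counted as a third line.
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) != 2:
        raise ValueError(f"expected 2 non-empty lines, found {len(lines)}")
    username, password = lines
    # Only the username is stripped; the password is returned as written.
    return username.strip(), password
```

Calling `read_credentials('synapseAccess.txt')` before creating the MyHeartCounts object would raise a `ValueError` early if the file does not have the expected shape.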



In [6]:
#import libraries
from MyHeartCounts import MyHeartCounts
##############################

#Initialize a MyHeartCounts object
MHC = MyHeartCounts(user_password_file_path = 'synapseAccess.txt')
#Rev up your engine!! -- Setting up of cache and other administrative scripts
MHC.start()
print('MHC ML Infrastructure up and running...')

MHC ML Infrastructure up and running...


The MHC object holds metadata on all users. The list of users can be accessed as MHC.Users. Let us look at the data for the first user.

In [26]:
print('User health code is '+MHC.Users[0].healthCode)
print('User gender code is '+MHC.Users[0].gender)
print('User date of birth code is '+str(MHC.Users[0].dob))

User health code is GP5EmY5Yph_vvTqqBQhKU4A_
User gender code is female
User date of birth code is 1800-10-05 18:00:00


Note that the date of birth is 1800-10-05. This is a deliberate placeholder: missing data is filled with known values so that it can be recognized. These values can later be replaced using machine learning predictions.
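
Before using the date of birth as a feature, the placeholder has to be recognized and treated as missing. The following is a minimal sketch of one way to do that. It assumes that any date of birth in the year 1800 is a placeholder, which is inferred only from the single value shown above; `dob_or_none` and `PLACEHOLDER_DOB_YEAR` are hypothetical names, not part of the package.

```python
from datetime import datetime

# Assumption: any date of birth in the year 1800 is the placeholder, based on
# the single example value shown above (1800-10-05 18:00:00).
PLACEHOLDER_DOB_YEAR = 1800


def dob_or_none(dob):
    """Return None for a placeholder date of birth, else the date unchanged."""
    if dob is None or dob.year == PLACEHOLDER_DOB_YEAR:
        return None
    return dob


print(dob_or_none(datetime(1800, 10, 5, 18, 0)))  # placeholder, prints None
print(dob_or_none(datetime(1985, 3, 14)))  # prints 1985-03-14 00:00:00
```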

## Study

Users participate in studies. A study is an object containing the information for a particular study or table, for example the 6-minute walk test, mindset, HealthKit workout collector and HealthKit sleep collector.

We can load as many studies as we like. For now, let us load the HealthKit workout collector data.

In [7]:
#load a study
MHC.loadStudy(studyName = 'HealthKitWorkoutCollector',studyTable = 'syn3560095')

An MHC object can hold multiple studies, which can be accessed as MHC.Studies[i], where i is an index into the list of studies. The study name and table name can be accessed as MHC.Studies[0].studyName and MHC.Studies[0].studyTable respectively.<br>
The observations of a study are stored in a list, so the first observation can be accessed as MHC.Studies[0].observations[0].<br>
Let us try accessing a study...

In [18]:
print('Study '+MHC.Studies[0].studyName +' is extracted from '+MHC.Studies[0].studyTable +' and has a total of '+str(len(MHC.Studies[0].observations))+ ' observations.')

Study HealthKitWorkoutCollector is extracted from syn3560095 and has a total of 125766 observations.
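
Once several studies are loaded, indexing by position becomes easy to get wrong. Below is a sketch of looking a study up by name instead. `find_study` is a hypothetical helper, and the stand-in objects only mimic the `studyName`, `studyTable` and `observations` attributes used above; with the real package you would pass `MHC.Studies`.

```python
from types import SimpleNamespace


def find_study(studies, study_name):
    """Return the first study whose studyName matches, or None if absent."""
    for study in studies:
        if study.studyName == study_name:
            return study
    return None


# Stand-in objects for illustration only; they are not real study objects.
demo_studies = [
    SimpleNamespace(studyName="HealthKitWorkoutCollector",
                    studyTable="syn3560095", observations=[]),
]

study = find_study(demo_studies, "HealthKitWorkoutCollector")
print(study.studyTable)  # prints syn3560095
```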


The data is synced with Synapse and Amazon Web Services at the call to MHC.start(), and each study is synced at the call to MHC.loadStudy. The loaded data is parsed, cleaned and ready for ML. Let us look at the first observation in the HealthKit workout collector study.

In [19]:
print(MHC.Studies[0].observations[0])

recordId                              PkQPo4F7CxCXd32HcfGTHpkE
appVersion                            version 2.4.1, build 926
phoneInfo              Unknown iPhone [iPhone13,2]; iOS/15.2.1
uploadDate                                          2022-02-05
healthCode                be595b1c-fbad-42e6-8f77-e10ace28ac51
externalId                                                 NaN
dataGroups                                                 NaN
createdOn                                        1644047761945
createdOnTimeZone                                          NaN
userSharingScope                     ALL_QUALIFIED_RESEARCHERS
validationErrors                                           NaN
substudyMemberships                          |cardiovascular=|
dayInStudy                                              1882.0
data.csv                                            87971663.0
rawData                                             87971667.0
rawMetadata                                         879
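
The `createdOn` value in this observation looks like a Unix timestamp in milliseconds. That is an assumption rather than something the package documents here, but it is consistent with the `uploadDate` shown in the same record. A small sketch of converting such a value to a UTC datetime using only the standard library (`epoch_ms_to_utc` is a hypothetical helper name):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms_to_utc(ms):
    """Convert milliseconds since the Unix epoch to a timezone-aware datetime."""
    # Integer timedelta arithmetic avoids float rounding on the milliseconds.
    return EPOCH + timedelta(milliseconds=int(ms))


created = epoch_ms_to_utc(1644047761945)
print(created.date())  # prints 2022-02-05
```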

## Machine Learning example
We will demonstrate a simplistic machine learning use case by predicting user gender from height and weight. This simple model will not have much predictive power, but it serves our purpose of demonstration. <br>

First, create the inputs and labels.

In [38]:
#Features array
inputs = []
#Training/ testing labels
labels = []

#Algorithms use digits, so change female to 0 and male to 1
gender_dict = {}
gender_dict['female'] = 0
gender_dict['male'] = 1

#loop through all users to get data. If we are using data from a study then we will need to extract user data from that study
for user in MHC.Users:
    #We can only train on data we know the labels for, so skip users with an
    #unknown gender or with height/weight values of 10 or less.
    if user.gender in gender_dict and user.weight>10 and user.height>10:
        #create a feature vector of height and weight
        input_observation = []
        input_observation.append(float(user.height))
        input_observation.append(float(user.weight))
        
        #add to our data
        inputs.append(input_observation)
        labels.append(gender_dict[user.gender])

import numpy as np
from collections import Counter
inputs = np.asarray(inputs)
labels = np.asarray(labels)

print('Number of observations in data are '+str(len(inputs)))
print('Gender distribution is '+str(Counter(labels)))

Number of observations in data are 35047
Gender distribution is Counter({1: 25912, 0: 9135})


Let us train a random forest algorithm on this data to see the accuracy.

In [47]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier


#first create a train test split
X_train, X_test, y_train, y_test = train_test_split(inputs, labels, test_size=0.33, random_state=42)
#initialize a random forest classifier
clf = RandomForestClassifier(max_depth=2, random_state=0)
#fit the data
clf.fit(X_train, y_train)
#test accuracy on test data
meanAccuracy = clf.score(X_test,y_test)
print('Mean Accuracy over test samples is '+str(meanAccuracy))

Mean Accuracy over test samples is 0.8537956078160125
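
The labels are imbalanced, so the accuracy is easier to interpret next to a majority-class baseline, that is, the accuracy of always predicting the most common label. The sketch below computes that baseline from the label counts printed earlier. `majority_baseline` is a hypothetical helper, and the figure is computed over all observations, not only the test split, so it is only an approximate reference point.

```python
from collections import Counter


def majority_baseline(label_counts):
    """Accuracy obtained by always predicting the most frequent label."""
    total = sum(label_counts.values())
    return max(label_counts.values()) / total


# Counts copied from the gender distribution printed earlier.
counts = Counter({1: 25912, 0: 9135})
baseline = majority_baseline(counts)
print(round(baseline, 3))  # prints 0.739
```

Against a baseline of about 0.739, the test accuracy of about 0.854 shows that height and weight carry some signal, although the model is still far from reliable.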


This simplistic example trains a random forest on user height and weight to predict gender. Using different studies, we can fill in the missing values and combine tables to have fun!

Best of luck!!!!!