# Predictions of WG, CV and night PPGR from dietary and personal features

In this notebook I build a predictive model for three nighttime glycemic measures: wake-up glucose (WG), the coefficient of variation (CV) of glucose during the night, and the postprandial glycemic response (PPGR) during the night.
The predictive features are daily nutritional data, personal data (age, gender, BMI, waist circumference) and blood tests:
- CRP
- lipid profile: triglycerides, HDL, LDL, total cholesterol, cholesterol/HDL and triglycerides/HDL
- creatinine (kidney function)
- AST, ALT, GGT and alkaline phosphatase (liver function)
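The two lipid ratios in this list are not separate lab columns in the blood-test output further down, which contains `bt__total_cholesterol`, `bt__hdl_cholesterol` and `bt__triglycerides`, so they would have to be derived. A minimal sketch, assuming those column names; the helper `add_lipid_ratios` and the new columns `bt__chol_hdl_ratio` and `bt__tg_hdl_ratio` are hypothetical names:

```python
import numpy as np
import pandas as pd


def add_lipid_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with cholesterol/HDL and triglycerides/HDL ratio columns (hypothetical names)."""
    out = df.copy()
    # An HDL of 0 would produce inf; treat it as missing instead
    hdl = out["bt__hdl_cholesterol"].replace(0, np.nan)
    out["bt__chol_hdl_ratio"] = out["bt__total_cholesterol"] / hdl
    out["bt__tg_hdl_ratio"] = out["bt__triglycerides"] / hdl
    return out


# Made-up values for two hypothetical participants
toy = pd.DataFrame(
    {
        "bt__total_cholesterol": [200.0, 180.0],
        "bt__hdl_cholesterol": [50.0, 60.0],
        "bt__triglycerides": [150.0, 90.0],
    },
    index=pd.Index(["A", "B"], name="RegistrationCode"),
)
ratios = add_lipid_ratios(toy)
print(ratios[["bt__chol_hdl_ratio", "bt__tg_hdl_ratio"]])
```

Applied to per-person averages, this gives a ratio of means, which in general differs from the mean of per-visit ratios; either choice is defensible for a first model.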

## Imports

In [2]:
import datetime

import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr, spearmanr
from statsmodels.stats.multitest import fdrcorrection

from LabData.DataLoaders.CGMLoader import CGMLoader
from LabData.DataLoaders.DietLoggingLoader import DietLoggingLoader
from LabData.DataLoaders.SubjectLoader import SubjectLoader
from LabData.DataLoaders.BodyMeasuresLoader import BodyMeasuresLoader
from LabData.DataLoaders.BloodTestsLoader import BloodTestsLoader

%matplotlib inline
matplotlib.style.use('ggplot')

# Loaders for the CGM and diet-logging data (used for the targets and nutritional features)
cgml = CGMLoader()
dll = DietLoggingLoader()
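The imports of `spearmanr` and `fdrcorrection` point to screening many feature/target correlations with a false-discovery-rate correction. As a sketch of what `fdrcorrection` computes with its default `method='indep'` (the Benjamini-Hochberg procedure), here is a hand-rolled NumPy version; the name `benjamini_hochberg` is hypothetical, and its output should match `fdrcorrection(pvals, alpha=0.05)` up to floating-point error:

```python
import numpy as np


def benjamini_hochberg(pvals, alpha=0.05):
    """Return (rejected, adjusted p-values) from the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # p_(i) * m / i for the sorted p-values
    ranked = p[order] * m / np.arange(1, m + 1)
    # Make adjusted values monotone by taking the running minimum from the largest p-value down
    adjusted_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted_sorted = np.minimum(adjusted_sorted, 1.0)
    # Put the adjusted values back in the original order
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted <= alpha, adjusted


rejected, adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(rejected, adjusted)
```

With p-values 0.01, 0.04, 0.03 and 0.20, the adjusted values are 0.04, 0.0533, 0.0533 and 0.20, so only the first test is rejected at alpha = 0.05.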

## Preparing predictive features

### Personal features of PNP3 participants

Age and gender are in the SubjectLoader. Gender map: 1 - male, 0 - female. 
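Because `participants` is indexed by (`RegistrationCode`, `Date`), a participant could in principle have more than one row. A minimal sketch that reduces it to one row of age and gender per person, assuming the `age` and `gender` columns shown in `participants.head()`; `personal_features` is a hypothetical helper:

```python
import pandas as pd


def personal_features(participants: pd.DataFrame) -> pd.DataFrame:
    """One row per RegistrationCode with age and gender (1 = male, 0 = female)."""
    flat = participants.reset_index().sort_values("Date")
    # first() takes the earliest non-missing value per column for each participant
    first = flat.groupby("RegistrationCode")[["age", "gender"]].first()
    return first.dropna()


# Toy frame with the same (RegistrationCode, Date) index layout; values are made up
idx = pd.MultiIndex.from_tuples(
    [
        ("A", pd.Timestamp("2018-01-01")),
        ("A", pd.Timestamp("2018-06-01")),
        ("B", pd.Timestamp("2017-10-05")),
    ],
    names=["RegistrationCode", "Date"],
)
toy_participants = pd.DataFrame({"age": [48.0, 48.0, 57.0], "gender": [1.0, 1.0, 0.0]}, index=idx)
personal = personal_features(toy_participants)
print(personal)
```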

The body measurements include systolic and diastolic blood pressure, weight, BMI, hip and waist circumference, and height.
From the blood tests I can use everything except the glycemic markers: HbA1c, fructosamine, fasting glucose and insulin.

Columns to exclude from the blood tests: 'bt__hba1c', 'bt__glucose', 'bt__fructosamine', 'bt__insulin'

Since the measurements were taken several times, I average all non-zero, non-NaN values per person. This simple aggregation first tests whether the features are predictive at all: if they are not, a more sophisticated approach is unlikely to help. If the predictions work well, I can then refine this step.
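To make the zero-versus-NaN point concrete: pandas' `mean()` skips NaN by default, but a zero is counted as a real value. A small sketch with made-up values for a hypothetical `feature` column:

```python
import numpy as np
import pandas as pd

# Made-up repeated measurements; 0 stands for a missing value
df = pd.DataFrame({
    "RegistrationCode": ["A", "A", "A", "B", "B"],
    "feature": [0.0, 1.0, 3.0, 2.0, np.nan],
})

# Zero counted as a value: A -> (0 + 1 + 3) / 3
naive = df.groupby("RegistrationCode")["feature"].mean()
# Zero replaced by NaN and skipped: A -> (1 + 3) / 2
clean = df.replace(0, np.nan).groupby("RegistrationCode")["feature"].mean()
print(pd.DataFrame({"naive": naive, "zeros_as_nan": clean}))
```

The trade-off is that a genuine measurement of exactly zero is discarded as well, so this assumes zero is never a valid value for any of the features.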

In [3]:
sl = SubjectLoader()
participants = sl.get_data(study_ids=3).df

In [6]:
bml = BodyMeasuresLoader()
body_meas = bml.get_data(study_ids=3).df

In [79]:
btl = BloodTestsLoader()
blood_tests = btl.get_data(study_ids=3).df

In [5]:
participants.head()

| RegistrationCode | Date | StudyTypeID | city | country | gender | us_state | yob | StudyTypeID2 | StudyTypeID3 | age |
|---|---|---|---|---|---|---|---|---|---|---|
| 111527 | 2017-10-05 09:04:27+03:00 | 3 |  | IL | 0.0 |  | 1971.0 |  |  | 48.0 |
| 117111 | 2018-02-13 22:55:37+02:00 | 3 |  | IL | 1.0 |  | 1971.0 |  |  | 48.0 |
| 126092 | 2018-07-08 12:17:18+03:00 | 3 |  | IL | 0.0 |  | 1962.0 |  |  | 57.0 |
| 12752 | 2018-07-11 18:27:38+03:00 | 3 |  | IL | 1.0 |  | 1962.0 |  |  | 57.0 |
| 128811 | 2017-10-22 09:03:50+03:00 | 3 |  | IL | 1.0 |  | 1971.0 |  |  | 48.0 |


In [6]:
participants.reset_index('Date').index.nunique()

248

In [131]:
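def calc_mean_per_person(df):
    """Average each feature over all visits per participant.

    Zeros are treated as missing values, features missing for more than 30
    participants are dropped, and participants with any remaining NaN are dropped.
    """
    # Drop columns that are entirely empty
    df = df.dropna(axis=1, how='all')
    # Zeros would bias the mean downwards, so treat them as missing; NaNs are skipped by mean()
    df = df.replace(0, np.nan)
    df = df.reset_index()
    # Average over visits; numeric_only skips any non-numeric columns
    df_means = df.drop(columns='Date').groupby('RegistrationCode').mean(numeric_only=True)
    # Drop features that are missing for more than 30 participants
    n_missing = df_means.isnull().sum()
    df_means = df_means.drop(columns=n_missing[n_missing > 30].index)
    # Keep only participants with a complete feature vector
    df_means = df_means.dropna(axis=0, how='any')

    return df_means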
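The filtering at the end of `calc_mean_per_person` uses an absolute threshold of 30 missing participants (roughly 12% of the 248 PNP3 participants). A sketch of the same two-stage filter with the threshold as a parameter, on toy data; `filter_missing` and `max_missing_persons` are hypothetical names:

```python
import numpy as np
import pandas as pd


def filter_missing(df_means: pd.DataFrame, max_missing_persons: int = 30) -> pd.DataFrame:
    """Drop features missing for more than max_missing_persons people, then people with any NaN left."""
    n_missing = df_means.isna().sum()
    # Stage 1: keep only sufficiently complete features
    kept = df_means.loc[:, n_missing <= max_missing_persons]
    # Stage 2: keep only participants with no missing values in the remaining features
    return kept.dropna(axis=0, how="any")


toy_means = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0],
        "b": [1.0, np.nan, 3.0, 4.0],
        "c": [np.nan, np.nan, np.nan, 4.0],
    },
    index=pd.Index(["P1", "P2", "P3", "P4"], name="RegistrationCode"),
)
filtered = filter_missing(toy_means, max_missing_persons=1)
print(filtered)
```

Filtering columns before rows matters: if the rows were filtered first, the sparse feature `c` alone would remove three of the four participants.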

In [132]:
body_meas_means = calc_mean_per_person(body_meas)

In [133]:
blood_tests_means = calc_mean_per_person(blood_tests)

In [134]:
blood_tests_means = blood_tests_means.drop(columns=['bt__hba1c', 'bt__glucose', 'bt__fructosamine', 'bt__insulin'])

In [135]:
blood_tests_means.columns

Index(['bt__creatinine', 'bt__mchc', 'bt__crp_hs', 'bt__hdl_cholesterol',
       'bt__rdw', 'bt__lymphocytes_%', 'bt__monocytes_%', 'bt__rbc',
       'bt__hemoglobin', 'bt__triglycerides', 'bt__ast_got', 'bt__mch',
       'bt__alt_gpt', 'bt__mean_platelet_volume', 'bt__eosinophils_%',
       'bt__wbc', 'bt__basophils_%', 'bt__total_cholesterol', 'bt__mcv',
       'bt__neutrophils_%', 'bt__crp_synthetic', 'bt__platelets', 'bt__hct',
       'bt__ldl_cholesterol', 'bt__tsh', 'bt__albumin'],
      dtype='object')

In [137]:
blood_tests_means.shape

(229, 26)

In [139]:
body_meas_means.shape

(239, 12)

In [140]:
bm_bt = pd.merge(body_meas_means, blood_tests_means, on='RegistrationCode')

In [141]:
bm_bt.shape

(226, 38)
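The inner merge keeps 226 participants out of 239 (body measurements) and 229 (blood tests). Since each frame has one row per `RegistrationCode`, 13 body-measurement participants have no blood-test means and 3 blood-test participants have no body-measurement means. A sketch for listing who is dropped, comparing the indices directly; `membership_report` is a hypothetical helper, called on the real data as `membership_report(body_meas_means, blood_tests_means)`:

```python
import pandas as pd


def membership_report(left: pd.DataFrame, right: pd.DataFrame) -> dict:
    """Split participant IDs into those in both frames and those in only one (compared on the index)."""
    return {
        "both": sorted(left.index.intersection(right.index)),
        "left_only": sorted(left.index.difference(right.index)),
        "right_only": sorted(right.index.difference(left.index)),
    }


left = pd.DataFrame({"x": [1, 2, 3]}, index=pd.Index(["A", "B", "C"], name="RegistrationCode"))
right = pd.DataFrame({"y": [4, 5, 6, 7]}, index=pd.Index(["B", "C", "D", "E"], name="RegistrationCode"))
report = membership_report(left, right)
print(report)
```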

In [142]:
bm_bt

| RegistrationCode | weight | body_fat | hips | sitting_blood_pressure_diastolic | bmi | height | trunk_fat | bmr | waist | sitting_blood_pressure_pulse_rate | ... | bt__basophils_% | bt__total_cholesterol | bt__mcv | bt__neutrophils_% | bt__crp_synthetic | bt__platelets | bt__hct | bt__ldl_cholesterol | bt__tsh | bt__albumin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 111527 | 72.500000 | 40.722223 | 111.000000 | 88.60 | 31.204312 | 152.444444 | 37.477779 | 1335.555556 | 90.500000 | 66.200000 | ... | 0.350000 | 168.750000 | 82.000000 | 60.333333 | 1.766526 | 336.500000 | 38.000000 | 91.500000 | 1.210000 | 4.766667 |
| 117111 | 108.637498 | 32.628571 | 118.333333 | 89.25 | 34.287811 | 178.000000 | 35.085715 | 2189.857143 | 116.166667 | 85.250000 | ... | 0.650000 | 291.666667 | 92.000000 | 50.825000 | 0.672996 | 347.000000 | 46.425000 | 201.000000 | 1.840000 | 5.000000 |
| 126092 | 59.944444 | 33.275000 | 98.000000 | 94.50 | 23.415799 | 160.000000 | 30.125000 | 1200.375000 | 80.666667 | 70.000000 | ... | 0.350000 | 179.000000 | 88.833333 | 53.050000 | 0.451477 | 244.333333 | 39.116667 | 96.000000 | 1.600000 | 4.866667 |
| 12752 | 95.355555 | 26.412500 | 113.000000 | 97.75 | 27.268596 | 187.000000 | 29.037500 | 2049.625000 | 100.333333 | 60.750000 | ... | 0.366667 | 213.750000 | 93.666667 | 57.466667 | 0.437412 | 232.000000 | 44.616667 | 148.500000 | 0.590000 | 4.833333 |
| 130279 | 86.320000 | 26.522222 | 104.875000 | 77.00 | 31.145671 | 166.500000 | 28.711111 | 1839.333333 | 101.750000 | 68.200000 | ... | 0.333333 | 173.000000 | 95.666667 | 53.483333 | -0.540084 | 209.500000 | 46.166667 | 99.750000 | 1.030000 | 4.633333 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 988899 | 128.533335 | 39.388888 | 114.500000 | 105.40 | 41.442556 | 176.111111 | 38.622223 | 2413.777778 | 129.250000 | 77.400000 | ... | 0.366667 | 215.750000 | 85.200000 | 58.460000 | 0.957806 | 177.600000 | 45.740000 | 136.250000 | 1.510000 | 5.133333 |
| 991569 | 110.900002 | 31.100000 | 121.000000 | 89.50 | 35.398514 | 177.000000 | 33.549999 | 2291.000000 | 114.000000 | 73.500000 | ... | 0.550000 | 237.000000 | 87.000000 | 52.050000 | -0.350211 | 256.500000 | 42.150000 | 159.000000 | 1.500000 | 5.100000 |
| 992638 | 74.580000 | 26.450000 | 94.500000 | 93.00 | 26.424319 | 168.000000 | 29.410000 | 1598.100000 | 93.500000 | 82.200000 | ... | 0.316667 | 205.250000 | 91.333333 | 62.583333 | -0.387131 | 148.833333 | 50.400000 | 115.500000 | 1.626667 | 5.200000 |
| 997427 | 96.324999 | 45.787500 | 120.000000 | 84.50 | 34.538706 | 167.000000 | 44.212500 | 1617.500000 | 102.333333 | 72.666667 | ... | 0.500000 | 239.000000 | 86.000000 | 46.475000 | 0.556962 | 305.750000 | 44.025000 | 148.666667 | 1.160000 | 5.050000 |
