# Week 02 Assignment glucose level data

Welcome to week two of the course Programming 1. You will learn time-related data wrangling with pandas and visualization with Bokeh. This week focuses on missing data: you will preprocess the glucose JSON file, impute the missing values by interpolation, and then analyze the result visually. Learning outcomes:

- load a json dataset 
- typecast the columns of the pandas DataFrame to appropriate data types
- inspect the dataset for quality and metadata information
- add a column with interpolated data in Pandas DataFrame
- perform visual analysis

The assignment consists of 6 parts:

- [part 1: load the data](#0)
     - [Exercise 1.1](#ex-11)
- [part 2: prepare for inspection](#1)
     - [Exercise 2.1](#ex-21)
- [part 3: inspect the data](#2)
     - [Exercise 3.1](#ex-31)
- [part 4: interpolate the data](#3)
     - [Exercise 4.1](#ex-41)
- [part 5: visualize the data](#4)
     - [Exercise 5.1](#ex-51)
- [part 6: Challenge](#5)
     - [Exercise 6.1](#ex-61)

Parts 1 to 5 are mandatory; part 6 is optional (bonus).
To pass the assignment you need a score of 60%.


<a name='0'></a>
## Part 1: Load the data

Instructions: Load the json datafile `glucose.json` into a pandas dataframe. Check your dataframe with a `.head()` to compare with the expected outcome

<details>    
<summary>
    <font size="3" color="darkgreen"><b>Hints</b></font>
</summary>
<ul><li>json.load() method reads a file, pd.read_json converts it to a Pandas DataFrame</li>
    <li>when loading into a Pandas DataFrame use records orientation </li>
</ul>
</details>

<a name='ex-11'></a>
### Code your solution

In [340]:
import numpy as np
import pandas as pd
import json

# CODE YOUR SOLUTION HERE


In [341]:
from io import StringIO

#pd.read_json('C:/Data_Science_for_Life_Sciences_MASTER/programming1/BFVM19PROG1-main/data/glucose.json', lines=True)
#pd.read_json('C:/Data_Science_for_Life_Sciences_MASTER/programming1/BFVM19PROG1-main/data/glucose.json', typ='series', orient='records')
# these calls did not work on this file: json.load() below returns a str, which suggests that the file
# stores the records as a JSON-encoded string that has to be decoded first (pd.read_json itself accepts a path)

# read the data
# open the file and decode its content; the context manager closes the file again
with open(r'C:/Data_Science_for_Life_Sciences_MASTER/programming1/BFVM19PROG1-main/data/glucose.json') as f:
    data = json.load(f)
# make a dataframe from the decoded string
# (StringIO avoids the deprecation warning that newer pandas versions raise for literal JSON strings)
df = pd.read_json(StringIO(data))
print(df.head())
print(df.tail())

       ID              time  recordtype glucose
0  2845.0  2019-04-25 00:08           1     109
1  2850.0  2019-04-25 00:50           1        
2  2877.0  2019-04-25 07:02           1     123
3  2881.0  2019-04-25 07:34           1     158
4  2886.0  2019-04-25 08:19           1        
               ID              time  recordtype glucose
131  1.614305e+19  2019-04-25 22:47           0    None
132  1.614305e+19  2019-04-25 23:02           0    None
133  1.614305e+19  2019-04-25 23:18           0    None
134  1.614305e+19  2019-04-25 23:33           0    None
135  1.614305e+19  2019-04-25 23:48           0    None
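The two-step decoding can be tried on a small stand-in file. The sketch below assumes, as the `json.load` call in the solution suggests, that the file stores the records as a JSON-encoded string; the records, the file name `glucose_demo.json` and the temporary directory are made up for illustration.

```python
import json
import tempfile
from io import StringIO
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for glucose.json: the records are serialized twice,
# so the top-level JSON value in the file is a string (an assumption about the real file).
records = [
    {"ID": 2845, "time": "2019-04-25 00:08", "recordtype": 1, "glucose": "109"},
    {"ID": 2850, "time": "2019-04-25 00:50", "recordtype": 1, "glucose": "  "},
]
tmp_path = Path(tempfile.mkdtemp()) / "glucose_demo.json"
tmp_path.write_text(json.dumps(json.dumps(records)))

# First decode: json.load returns the inner JSON document as a str.
with open(tmp_path) as f:
    inner = json.load(f)

# Second decode: wrap the str in StringIO so pandas treats it as file-like input.
demo_df = pd.read_json(StringIO(inner), orient="records", convert_dates=False)
print(demo_df)
```

If the real file turns out to hold plain JSON records instead, passing the path directly to `pd.read_json` would be the simpler route.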


In [342]:
# check the data types
df.dtypes
# redefine the ID column as int64
# caution: IDs around 1.6e19 exceed the int64 maximum (about 9.22e18), so this cast overflows for those rows
df['ID'] = df['ID'].astype('int64')
# check the head and tail of the data to see whether the redefinition was successful
print(df.head())
print(df.tail())

     ID              time  recordtype glucose
0  2845  2019-04-25 00:08           1     109
1  2850  2019-04-25 00:50           1        
2  2877  2019-04-25 07:02           1     123
3  2881  2019-04-25 07:34           1     158
4  2886  2019-04-25 08:19           1        
                      ID              time  recordtype glucose
131 -9223372036854775808  2019-04-25 22:47           0    None
132 -9223372036854775808  2019-04-25 23:02           0    None
133 -9223372036854775808  2019-04-25 23:18           0    None
134 -9223372036854775808  2019-04-25 23:33           0    None
135 -9223372036854775808  2019-04-25 23:48           0    None
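The value -9223372036854775808 in the tail is the smallest number int64 can hold, which points to an overflow during the cast rather than to a real ID. A minimal sketch of a range check before casting, using toy IDs and the hypothetical names `fits_int64` and `ids_safe`:

```python
import numpy as np
import pandas as pd

INT64_MAX = np.iinfo(np.int64).max  # 9223372036854775807

# Toy IDs: two ordinary ones and one of the magnitude seen in the tail of the data.
ids = pd.Series([2845.0, 2850.0, 1.614305e19])

# Flag values that cannot be represented as int64 before casting.
fits_int64 = ids.abs() <= INT64_MAX

# Mask the out-of-range values and use the nullable Int64 dtype,
# so they become <NA> instead of an arbitrary overflowed number.
ids_safe = ids.where(fits_int64).astype("Int64")
print(ids_safe)
```

Whether masking is appropriate depends on what the large IDs mean; the point here is only that the cast should not silently change them.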


#### Expected outcome: 

<a name='1'></a>
## Part 2: Prepare the data

Check the datatypes of your dataframe. The `glucose` field should be an integer, the `time` field should have a datetime format. If the datatypes are different you should typecast them to the right format.
Make sure that your dataset is sorted by the time column


<details>    
<summary>
    <font size="3" color="darkgreen"><b>Hints</b></font>
</summary>
<ul><li>use the astype() method or pandas.to_datetime() for instance</li>
    <li>make sure that the empty spaces are filled with NaN. Use errors='coerce'</li>
    <li>set_index(), sort_index() and reset_index() are helpful to sort on index</li>
</ul>
</details>

<a name='ex-21'></a>
### Code your solution

In [343]:
# CODE YOUR SOLUTION HERE
# check the datatypes:
print(df.dtypes)

# check for missing values per column
print(df.isnull().sum())
# check what the missing values look like
print(df.glucose.unique())

ID             int64
time          object
recordtype     int64
glucose       object
dtype: object
ID             0
time           0
recordtype     0
glucose       82
dtype: int64
['109' '  ' '123' '158' '139' '151' '129' '161' '184' '178' '121' '111'
 '106' '91' '97' '86' '114' '118' '115' '105' '107' '120' '137' '134'
 '125' '122' '124' '127' '126' '128' '110' '116' '130' None]


In [344]:
# typecast the time column to datetime
df['time'] = pd.to_datetime(df['time'])

# typecast the glucose values to floats; errors='coerce' turns the blank strings into NaN in the same step
df.glucose = pd.to_numeric(df.glucose, errors='coerce')
# show the head of the dataframe to confirm that the missing values are now NaN
df.head()

     ID                time  recordtype  glucose
0  2845 2019-04-25 00:08:00           1    109.0
1  2850 2019-04-25 00:50:00           1      NaN
2  2877 2019-04-25 07:02:00           1    123.0
3  2881 2019-04-25 07:34:00           1    158.0
4  2886 2019-04-25 08:19:00           1      NaN


In [345]:
# Check the datatypes
df.dtypes

ID                     int64
time          datetime64[ns]
recordtype             int64
glucose              float64
dtype: object
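The assignment asks for an integer `glucose` column, but a NumPy integer column cannot hold NaN, which is why the coerced column ends up as float64. A small sketch of the coercion on toy values, with the nullable `Int64` dtype shown as one possible way to keep integers alongside missing values (the names `raw`, `as_float` and `as_int` are illustrative):

```python
import pandas as pd

# Toy values mirroring the raw column: digit strings, a blank string and None.
raw = pd.Series(["109", "  ", "123", None], dtype="object")

# errors='coerce' maps everything that is not a number to NaN (dtype float64).
as_float = pd.to_numeric(raw, errors="coerce")

# Optional: nullable integers keep the values integral and the gaps as <NA>.
as_int = as_float.astype("Int64")
print(as_float.dtype, as_int.dtype)
```

Since interpolation produces fractional values anyway, keeping the working column as float64 is a reasonable choice for the later parts.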

#### Expected outcome: 

In [346]:
# sort the dataframe by the time column
# set the time column as index
df = df.set_index(keys=['time'])
# sort the index
df = df.sort_index()
# move time back to a regular column
df = df.reset_index()
print(df.head())
print(df.tail())

                 time                   ID  recordtype  glucose
0 2019-04-25 00:08:00                 2845           1    109.0
1 2019-04-25 00:14:00 -9223372036854775808           0      NaN
2 2019-04-25 00:29:00 -9223372036854775808           0      NaN
3 2019-04-25 00:44:00 -9223372036854775808           0      NaN
4 2019-04-25 00:50:00                 2850           1      NaN
                   time                   ID  recordtype  glucose
131 2019-04-25 23:02:00 -9223372036854775808           0      NaN
132 2019-04-25 23:18:00 -9223372036854775808           0      NaN
133 2019-04-25 23:31:00                 3062           1    111.0
134 2019-04-25 23:33:00 -9223372036854775808           0      NaN
135 2019-04-25 23:48:00 -9223372036854775808           0      NaN
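Sorting via the index works; as an alternative, `sort_values` sorts on the column directly and leaves the column order unchanged. A short sketch on toy data:

```python
import pandas as pd

demo = pd.DataFrame({
    "time": pd.to_datetime(["2019-04-25 07:02", "2019-04-25 00:08", "2019-04-25 00:50"]),
    "glucose": [123.0, 109.0, None],
})

# sort_values orders the rows by the column; reset_index(drop=True) renumbers them.
demo_sorted = demo.sort_values("time").reset_index(drop=True)
print(demo_sorted)
```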


<a name='2'></a>
## Part 3: Inspect the data

Now that the data is prepared, inspect it to become familiar with its contents. You are required to do the following:

- inspect the percentage missing data for glucose
- what is the relationship between recordtype and glucose value?
- what is the relationship between ID and glucose value?

Code the solutions to your answers. Create meaningful overviews or statistics

<details>    
<summary>
    <font size="3" color="darkgreen"><b>Hints</b></font>
</summary>
<ul><li>In the week 01 assignment some functions were explained to inspect missing values</li>
    <li>In the week 01 assignment some functions were explained to group by value</li>
</ul>
</details>

<a name='ex-31'></a>
### Code your solution

In [347]:
#CODE YOUR SOLUTION HERE
## percentage of missing data for glucose
perc = df.glucose.isnull().sum()/len(df)
print(perc)
perc = perc * 100
print(round(perc, 2), '%', 'of the glucose data is missing')

0.6176470588235294
61.76 % of the glucose data is missing


#### Expected outcome percentage missing data
0.6176470588235294

In [348]:
# relationship between recordtype and glucose value
## inspect the possible values for the recordtype variable
print(df.recordtype.unique())

## inspect the possible values for the glucose variable
print(df.glucose.unique())

## check which glucose values are assigned to which recordtype
df.groupby(['recordtype'])['glucose'].unique()
# check for which IDs the recordtype is 0
df[df.recordtype == 0]['ID'].unique()
# check for which IDs the recordtype is 1
df[df.recordtype == 1]['ID'].unique()
# check whether this 'strange' ID has recordtype 1 as well
#print(df[df.ID == -9223372036854775808]['recordtype'].unique())
# check whether the IDs are unique (only this last expression is displayed as cell output)
df.groupby(['ID'])['time'].nunique()

[1 0]
[109.  nan 123. 158. 139. 151. 129. 161. 184. 178. 121. 111. 106.  91.
  97.  86. 114. 118. 115. 105. 107. 120. 137. 134. 125. 122. 124. 127.
 126. 128. 110. 116. 130.]


ID
-9223372036854775808    82
 2845                    1
 2850                    1
 2877                    1
 2881                    1
 2886                    1
 2899                    1
 2909                    1
 2916                    1
 2922                    1
 2925                    1
 2927                    1
 2929                    1
 2932                    1
 2935                    1
 2937                    1
 2940                    1
 2942                    1
 2944                    1
 2947                    1
 2949                    1
 2951                    1
 2954                    1
 2957                    1
 2960                    1
 2969                    1
 2971                    1
 2974                    1
 2976                    1
 2979                    1
 2981                    1
 2983                    1
 2985                    1
 2987                    1
 2990                    1
 2992                    1
 2994                    

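Recordtype 0 only ever has NaN as glucose value. After the int64 cast, all recordtype 0 rows share the ID -9223372036854775808. This is the smallest value int64 can represent, so it is an overflow artifact of casting IDs around 1.6e19 rather than a real identifier, and it does not show whether these rows originally had the same ID. For recordtype 1, measured values as well as NaN occur. IDs are unique except for this overflowed value, which appears 82 times.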
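The relationship between recordtype and missing glucose values can also be summarized in a single table. A minimal sketch on toy data (the real counts will differ):

```python
import pandas as pd

demo = pd.DataFrame({
    "recordtype": [1, 0, 0, 1, 1, 0],
    "glucose": [109.0, None, None, None, 123.0, None],
})

# Rows: recordtype; columns: whether glucose is missing; cells: row counts.
missing_by_type = pd.crosstab(demo["recordtype"], demo["glucose"].isna())

# Share of missing glucose values within each recordtype.
missing_share = demo["glucose"].isna().groupby(demo["recordtype"]).mean()
print(missing_by_type)
print(missing_share)
```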

In [349]:
#relationship between ID and glucose value
df.groupby(['glucose'])['ID'].unique()
#df.groupby(['ID'])['glucose'].unique()

glucose
86.0                             [2951]
91.0                             [2940]
97.0                       [2942, 2949]
105.0                            [2974]
106.0    [2937, 2944, 2947, 3034, 3058]
107.0                            [2979]
109.0                      [2845, 2976]
110.0                            [3026]
111.0                [2935, 2981, 3062]
114.0                      [2954, 3055]
115.0                            [2971]
116.0                            [3031]
118.0                      [2960, 2969]
120.0                            [2983]
121.0                      [2932, 3001]
122.0                      [2994, 3005]
123.0                      [2877, 3003]
124.0                      [2996, 3008]
125.0                            [2992]
126.0                      [3010, 3047]
127.0                            [2999]
128.0                      [3012, 3029]
129.0                [2916, 2957, 2990]
130.0                            [3050]
134.0                           

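The measured glucose values range from 86.0 to 184.0. Each ID with a measurement occurs only once and therefore carries exactly one glucose value, while the same glucose value can occur for several IDs.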

<a name='3'></a>
## Part 4: Interpolate the data

A lot of data is missing. Use interpolation to fill the missing values and store the result in a new column. Argue your choice: select an interpolation method that suits the nature of the data and explain why. Note that your interpolated values can differ from the example below, depending on the method.

<details>    
<summary>
    <font size="3" color="darkgreen"><b>Hints</b></font>
</summary>
<ul><li>use the pandas.DataFrame.interpolate() method</li>
</ul>
</details>

<a name='ex-41'></a>
### Code your solution

In [350]:
#CODE YOUR SOLUTION HERE

from bokeh.io import output_notebook
from bokeh.plotting import figure, show
output_notebook()

## check the glucose data distribution
# keep only the rows with a measured (non-NaN) glucose value
df_1 = df[df['glucose'].notna()]
data = df_1.glucose.to_numpy()

# make histogram
hist, edges = np.histogram(data, bins=70)
p = figure()
p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], line_color="white")

show(p)

The histogram shows values between roughly 80 and 185 mg/dl. According to the literature, normal glucose levels lie between 80 and 100 mg/dl before a meal, between 170 and 200 mg/dl after a meal, and between 120 and 140 mg/dl 2 to 3 hours after the meal. The observed values cover all three scenarios, which gives an expected range of 80 to 200 mg/dl.

Assumption 1: there are no diabetic patients in this dataset, so glucose values stay within 80 to 200 mg/dl.
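The range assumption can be checked numerically as well as by eye. A small sketch on toy values that include the observed minimum and maximum; the bounds `LOW` and `HIGH` are taken from Assumption 1:

```python
import pandas as pd

# Toy sample of measured values, including the observed minimum (86) and maximum (184).
glucose = pd.Series([109.0, 123.0, 158.0, 86.0, 184.0])

# Bounds taken from Assumption 1 (mg/dl).
LOW, HIGH = 80, 200

in_range = glucose.between(LOW, HIGH)
print(glucose.describe()[["min", "max"]])
print("all values within assumed range:", bool(in_range.all()))
```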

In [351]:
# visualize the glucose values over time as scatter plot
p = figure(x_axis_type="datetime", width=800, height=350)
p.dot(df_1.time, df_1.glucose, size = 20)

# add axis title and define font size
p.title.text="Glucose level for several patients in one day"
p.title.text_font_size="17px"

# add x axis label and define the font size
p.xaxis.axis_label="time in hours"
p.xaxis.axis_label_text_font_size="17px"

# add y axis label and define the font size
p.yaxis.axis_label="glucose level in mg/dl"
p.yaxis.axis_label_text_font_size="17px"

# define legend text font and legend location
#p.legend.label_text_font="times"
#p.legend.location = "top_left"

show(p)

# create a line plot on same variables to get an idea of the trendline over time
pl = figure(x_axis_type="datetime", width=800, height=350)
pl.line(df_1.time, df_1.glucose)

# add axis title and define font size
pl.title.text="Glucose level for several patients in one day"
pl.title.text_font_size="17px"

# add x axis label and define the font size
pl.xaxis.axis_label="time in hours"
pl.xaxis.axis_label_text_font_size="17px"

# add y axis label and define the font size
pl.yaxis.axis_label="glucose level in mg/dl"
pl.yaxis.axis_label_text_font_size="17px"

# define legend text font and legend location
#pl.legend.label_text_font="times"
#pl.legend.location = "top_left"

show(pl)

Over the day there are several peaks and local minima. The periodicity that example curves of blood glucose levels over a day show cannot be seen in this dataset. Therefore the following assumption does not hold:

Assumption 2: the values are periodic, with peaks around breakfast, lunch and dinner time.

For the interpolation, neither a single linear trend nor a periodic (sine, cosine) trend can be assumed. A higher-order polynomial would fit best.

The data density is low in the first half of the day and very high in the second half, which makes it hard to eyeball a fitting polynomial. Several piecewise interpolation procedures as well as the methods pad and nearest are therefore tested and visualized:

In [352]:
# create additional column with interpolated data
# note: the SciPy-based methods below use the index values as x, so with the default integer index
# the rows are treated as equally spaced, regardless of their actual time stamps
# pad: fills each NaN with the last preceding measured value
#df['interpolated'] = df.glucose.interpolate(method = 'pad')
# nearest: fills each NaN with the closest measured value (still only repetitions of given points)
#df['interpolated'] = df.glucose.interpolate(method = 'nearest')

# spline: between the given points the missing values follow a polynomial of predefined order
#df['interpolated'] = df.glucose.interpolate(method = 'spline', order = 1).clip(lower=79,upper=201)
#df['interpolated'] = df.glucose.interpolate(method = 'spline', order = 3).clip(lower=79,upper=201)
#df['interpolated'] = df.glucose.interpolate(method = 'spline', order = 5).clip(lower=79,upper=201)
#df['interpolated'] = df.glucose.interpolate(method = 'piecewise_polynomial') # same result as spline order 1
# chosen method: similar to the linear approaches but a bit more curved
df['interpolated'] = df.glucose.interpolate(method = 'pchip')
df

                   time                   ID  recordtype  glucose  interpolated
0   2019-04-25 00:08:00                 2845           1    109.0    109.000000
1   2019-04-25 00:14:00 -9223372036854775808           0      NaN    109.006983
2   2019-04-25 00:29:00 -9223372036854775808           0      NaN    109.029115
3   2019-04-25 00:44:00 -9223372036854775808           0      NaN    109.068168
4   2019-04-25 00:50:00                 2850           1      NaN    109.125918
..                  ...                  ...         ...      ...           ...
131 2019-04-25 23:02:00 -9223372036854775808           0      NaN    106.185185
132 2019-04-25 23:18:00 -9223372036854775808           0      NaN    107.481481
133 2019-04-25 23:31:00                 3062           1    111.0    111.000000
134 2019-04-25 23:33:00 -9223372036854775808           0      NaN    117.851852
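One way to support the choice of method with numbers is a hold-out check: hide a few measured values, fill them again, and compare with the truth. The sketch below is an illustration under assumptions, not part of the assignment: `holdout_mae` is a hypothetical helper, and the toy series lies on a straight line in time, so time-weighted linear interpolation wins by construction. On the real data the ranking may differ, and further candidates such as pchip (which needs SciPy) could be added to the dictionary.

```python
import numpy as np
import pandas as pd


def holdout_mae(series, fill, n_holdout=2, seed=0):
    """Hide some interior measured values, refill them with `fill`, return the mean absolute error."""
    # positions of measured values, excluding the first and last one (kept as anchors)
    known_pos = np.flatnonzero(series.notna().to_numpy())[1:-1]
    rng = np.random.default_rng(seed)
    hidden = rng.choice(known_pos, size=min(n_holdout, len(known_pos)), replace=False)
    masked = series.copy()
    masked.iloc[hidden] = np.nan
    filled = fill(masked)
    return float((filled.iloc[hidden] - series.iloc[hidden]).abs().mean())


# Toy series with irregular time steps; values rise by 0.5 mg/dl per minute.
times = pd.to_datetime([
    "2019-04-25 00:00", "2019-04-25 00:10", "2019-04-25 01:00", "2019-04-25 01:30",
    "2019-04-25 03:00", "2019-04-25 03:05", "2019-04-25 04:00",
])
minutes = np.asarray((times - times[0]) / pd.Timedelta(minutes=1), dtype=float)
toy = pd.Series(100 + 0.5 * minutes, index=times)
toy.iloc[[1, 5]] = np.nan  # two genuine gaps

candidates = {
    "time-weighted linear": lambda s: s.interpolate(method="time"),
    "forward fill": lambda s: s.ffill(),
}
scores = {name: holdout_mae(toy, fill) for name, fill in candidates.items()}
print(scores)
```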


In [353]:
# visualize the glucose values over time as scatter plot
p_in = figure(x_axis_type="datetime", width=800, height=350)
p_in.dot(df.time, df.interpolated, size = 20)

# add axis title and define font size
p_in.title.text="Glucose level for several patients in one day"
p_in.title.text_font_size="17px"

# add x axis label and define the font size
p_in.xaxis.axis_label="time in hours"
p_in.xaxis.axis_label_text_font_size="17px"

# add y axis label and define the font size
p_in.yaxis.axis_label="glucose level in mg/dl"
p_in.yaxis.axis_label_text_font_size="17px"

# define legend text font and legend location
#p_all.legend.label_text_font="times"
#p_all.legend.location = "top_left"

show(p_in)

# create a line plot on same variables to get an idea of the trendline over time
pl_in = figure(x_axis_type="datetime", width=800, height=350)
pl_in.line(df.time, df.interpolated)

# add axis title and define font size
pl_in.title.text="Glucose level for several patients in one day"
pl_in.title.text_font_size="17px"

# add x axis label and define the font size
pl_in.xaxis.axis_label="time in hours"
pl_in.xaxis.axis_label_text_font_size="17px"

# add y axis label and define the font size
pl_in.yaxis.axis_label="glucose level in mg/dl"
pl_in.yaxis.axis_label_text_font_size="17px"

# define legend text font and legend location
#p_in.legend.label_text_font="times"
#p_in.legend.location = "top_left"

show(pl_in)

The pchip method was chosen because it connects the measured points with smooth curves, which seems more realistic than a linear approach. I preferred it over the spline method with odd order > 1 because pchip shows an increasing trend between the first and second measured points, which seems more likely than a nearly constant value. This makes sense in particular because the glucose level is lowest after waking up and normally increases monotonically from then until the after-breakfast peak.
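One point worth checking for any chosen method is whether it accounts for the irregular time steps. With the default integer index, pandas treats the rows as equally spaced; with a datetime index, `method='time'` weights by the actual time differences. The sketch below shows the difference on toy data. It uses the linear `'time'` method to stay free of SciPy; according to the pandas documentation the SciPy-based methods such as `'pchip'` use the numerical index values, so a datetime index is expected to have a similar effect there, which is an assumption to verify on the real data. `limit_area='inside'` is shown as an option to avoid filling values after the last measurement, such as row 134 in the output above.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "time": pd.to_datetime([
        "2019-04-25 00:00", "2019-04-25 00:10", "2019-04-25 01:00", "2019-04-25 01:15",
    ]),
    "glucose": [100.0, np.nan, 160.0, np.nan],
})

# Default integer index: the gap is filled as if the rows were equally spaced.
by_row = demo["glucose"].interpolate(method="linear")

# Datetime index: method='time' weights by the actual time differences.
# limit_area='inside' leaves values after the last measurement as NaN.
by_time = (
    demo.set_index("time")["glucose"]
    .interpolate(method="time", limit_area="inside")
    .reset_index(drop=True)
)
print(pd.DataFrame({"by_row": by_row, "by_time": by_time}))
```

The missing value ten minutes after the first measurement is filled with 130 by row position but with 110 when the time stamps are taken into account.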

#### Example outcome

<a name='4'></a>
## Part 5: Plot the data

Create a plot with the original data and the interpolated data. Consider what the best representation is for visualisation of actual values and modelled/imputed values. An example of such a plot is given below. This plot however is not considered the best practice. 

<details>    
<summary>
    <font size="3" color="darkgreen"><b>Hints</b></font>
</summary>
<ul><li>figure(x_axis_type='datetime') automatically makes nice labels of the datetime data</li>
</ul>
</details>

<a name='ex-51'></a>
### Code your solution

In [354]:
from bokeh.io import output_notebook
from bokeh.plotting import figure, show
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource


In [355]:
#CODE YOUR SOLUTION HERE

#define a variable to indicate whether glucose data is interpolated or measured data
#False means there is a measured value, true means the value is interpolated
df['meas_inter'] = df.glucose.isnull()
#print(df)

# define plot size and x axis type
p_all = figure(x_axis_type="datetime", width=800, height=350)

# colour the glucose values based on whether they are measured or interpolated
p_all.circle(df[df['meas_inter'] == False]['time'], 
          df[df['meas_inter'] == False]['glucose'],
         size = 7, legend_label = 'measured glucose values', line_width = 0)

p_all.circle(df[df['meas_inter'] == True]['time'], 
          df[df['meas_inter'] == True]['interpolated'],
         size = 7, fill_color = 'red', legend_label = 'interpolated glucose values', line_width = 0)

# add axis title and define font size
p_all.title.text="Glucose level for several patients in one day"
p_all.title.text_font_size="17px"

# add x axis label and define the font size
p_all.xaxis.axis_label="time in hours"
p_all.xaxis.axis_label_text_font_size="17px"

# add y axis label and define the font size
p_all.yaxis.axis_label="glucose level in mg/dl"
p_all.yaxis.axis_label_text_font_size="17px"

# define legend text font and legend location
p_all.legend.label_text_font="times"
p_all.legend.location = "top_left"

# show the plot
show(p_all)
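A possible refinement of the representation is to make imputed values visually weaker than measured ones: a dashed line for the modelled course, hollow markers for imputed points and filled markers for measurements. The following is a sketch on toy data, assuming a Bokeh version that accepts `width` and `height` on `figure`; the column name `is_imputed` and the styling choices are illustrative, not prescribed by the assignment.

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

demo = pd.DataFrame({
    "time": pd.to_datetime([
        "2019-04-25 00:08", "2019-04-25 00:14", "2019-04-25 00:50", "2019-04-25 07:02",
    ]),
    "glucose": [109.0, np.nan, np.nan, 123.0],
})
demo["interpolated"] = demo.set_index("time")["glucose"].interpolate(method="time").to_numpy()
demo["is_imputed"] = demo["glucose"].isna()

fig = figure(x_axis_type="datetime", width=800, height=350,
             title="Measured and interpolated glucose (toy data)",
             x_axis_label="time", y_axis_label="glucose level in mg/dl")

# Interpolated course as a dashed line: it is a model, not a set of observations.
fig.line("time", "interpolated", source=ColumnDataSource(demo),
         line_dash="dashed", color="gray", legend_label="interpolated course")
# Imputed points as hollow markers, measured points as filled markers.
fig.scatter("time", "interpolated", source=ColumnDataSource(demo[demo["is_imputed"]]),
            marker="circle", size=7, fill_color="white", line_color="red",
            legend_label="imputed values")
fig.scatter("time", "glucose", source=ColumnDataSource(demo[~demo["is_imputed"]]),
            marker="circle", size=7, color="navy", legend_label="measured values")
fig.legend.location = "top_left"
# show(fig) from bokeh.plotting would render the figure; it is omitted here so the sketch runs headless.
```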

<a name='5'></a>
## Part 6: Challenge

It might even be interesting to introduce a widget in which you can select different interpolation methods.
1. Can you improve the interpolation by choosing another method?
2. Can you add a rolling mean line?
3. Can you improve the plot by making it interactive?

<a name='ex-61'></a>
### Code your solution

In [356]:
#CODE YOUR SOLUTION HERE

# can interpolation be improved by choosing a different method?
df['interpolated_2'] = df.glucose.interpolate(method = 'spline', order = 1).clip(lower=79,upper=201)
#print(df)

# define plot size and x axis type
p_all2 = figure(x_axis_type="datetime", width=800, height=350)

# colour the glucose values based on whether they are measured or interpolated
p_all2.circle(df[df['meas_inter'] == False]['time'], 
          df[df['meas_inter'] == False]['glucose'],
         size = 7, legend_label = 'measured glucose values', line_width = 0)

p_all2.circle(df[df['meas_inter'] == True]['time'], 
          df[df['meas_inter'] == True]['interpolated_2'],
         size = 7, fill_color = 'red', legend_label = 'interpolated glucose values', line_width = 0)

# add axis title and define font size
p_all2.title.text="Glucose level for several patients in one day"
p_all2.title.text_font_size="17px"

# add x axis label and define the font size
p_all2.xaxis.axis_label="time in hours"
p_all2.xaxis.axis_label_text_font_size="17px"

# add y axis label and define the font size
p_all2.yaxis.axis_label="glucose level in mg/dl"
p_all2.yaxis.axis_label_text_font_size="17px"

# define legend text font and legend location
p_all2.legend.label_text_font="times"
p_all2.legend.location = "top_left"

# show the plot
show(p_all2)

This linear alternative is reasonable, too. Both versions are taken into account for the next steps.

In [357]:
# add a rolling mean line
# calculate the rolling mean over a window of 2 consecutive rows
rolling_windows = df.interpolated.rolling(2, min_periods=1)
rolling_mean = rolling_windows.mean()

# define plot size and x axis type
p_rol = figure(x_axis_type="datetime", width=800, height=350)

# colour the glucose values based on whether they are measured or interpolated
p_rol.circle(df[df['meas_inter'] == False]['time'], 
          df[df['meas_inter'] == False]['glucose'],
         size = 7, legend_label = 'measured glucose values', line_width = 0)

p_rol.circle(df[df['meas_inter'] == True]['time'], 
          df[df['meas_inter'] == True]['interpolated'],
         size = 7, fill_color = 'red', legend_label = 'interpolated glucose values', line_width = 0)

p_rol.line(df.time, rolling_mean, line_color = 'black')

# add axis title and define font size
p_rol.title.text="Glucose level for several patients in one day"
p_rol.title.text_font_size="17px"

# add x axis label and define the font size
p_rol.xaxis.axis_label="time in hours"
p_rol.xaxis.axis_label_text_font_size="17px"

# add y axis label and define the font size
p_rol.yaxis.axis_label="glucose level in mg/dl"
p_rol.yaxis.axis_label_text_font_size="17px"

# define legend text font and legend location
p_rol.legend.label_text_font="times"
p_rol.legend.location = "top_left"

show(p_rol)

# rolling mean for second interpolation method
rolling_windows2 = df.interpolated_2.rolling(2, min_periods=1)
rolling_mean2 = rolling_windows2.mean()

# define plot size and x axis type
p1 = figure(x_axis_type="datetime", width=800, height=350)

# colour the glucose values based on whether they are measured or interpolated
p1.circle(df[df['meas_inter'] == False]['time'], 
          df[df['meas_inter'] == False]['glucose'],
         size = 7, legend_label = 'measured glucose values', line_width = 0)

p1.circle(df[df['meas_inter'] == True]['time'], 
          df[df['meas_inter'] == True]['interpolated_2'],
         size = 7, fill_color = 'red', legend_label = 'interpolated glucose values', line_width = 0)

p1.line(df.time, rolling_mean2, line_color = 'black')

# add axis title and define font size
p1.title.text="Glucose level for several patients in one day"
p1.title.text_font_size="17px"

# add x axis label and define the font size
p1.xaxis.axis_label="time in hours"
p1.xaxis.axis_label_text_font_size="17px"

# add y axis label and define the font size
p1.yaxis.axis_label="glucose level in mg/dl"
p1.yaxis.axis_label_text_font_size="17px"

# define legend text font and legend location
p1.legend.label_text_font="times"
p1.legend.location = "top_left"

show(p1)
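A window of 2 rows smooths very little, and because the rows are irregularly spaced it covers a different time span at every point. pandas also supports time-based windows on a datetime index. A minimal sketch on toy data; the window length of 30 minutes is an arbitrary choice for illustration:

```python
import pandas as pd

times = pd.to_datetime([
    "2019-04-25 00:00", "2019-04-25 00:10", "2019-04-25 00:20", "2019-04-25 01:00",
])
values = pd.Series([100.0, 110.0, 120.0, 160.0], index=times)

# Count-based window: always the current and the previous row, whatever their distance in time.
by_rows = values.rolling(2, min_periods=1).mean()

# Time-based window: all rows within the last 30 minutes (requires a sorted DatetimeIndex).
by_time = values.rolling("30min").mean()
print(pd.DataFrame({"by_rows": by_rows, "by_time": by_time}))
```

In the last row the count-based mean averages two values that lie 40 minutes apart, while the time-based mean only uses the value inside its 30 minute window.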

In [358]:
# improve plot by making it interactive
## I would like to have two tabs for the two interpolation methods
## and checkboxes to choose between measured glucose values, interpolated glucose values and the rolling mean line
from bokeh.models import ColumnDataSource, Tabs
try:
    from bokeh.models import TabPanel as Panel  # Bokeh >= 3.0 renamed Panel to TabPanel
except ImportError:
    from bokeh.models import Panel  # Bokeh 2.x
import panel as pn
pn.extension()

# prepare the plots
# plot 1
# Isolate the data for the measured and the interpolated values
measured_data = df[df['meas_inter'] == False]
interpolated_data = df[df['meas_inter'] == True]
#measured_data
#interpolated_data

# Create a ColumnDataSource object for the measured and for the interpolated values
measured_cds = ColumnDataSource(measured_data)
interpolated_cds = ColumnDataSource(interpolated_data)
# Create and configure the figure
fig_ipol1 = figure(x_axis_type='datetime',
             height=300, width=600,
             title='Patient Glucose levels over a day',
             x_axis_label='Time', y_axis_label='glucose level',
             toolbar_location=None)

# add data points to plot using the ColumnDataSource objects
fig_ipol1.scatter(x = 'time', y = 'glucose', source = measured_cds, marker = 'circle', size = 7, fill_color = 'blue', line_width = 0)
fig_ipol1.scatter(x = 'time', y = 'interpolated', source = interpolated_cds, marker = 'circle', size = 7, fill_color = 'red', line_width = 0)
fig_ipol1.line(x = df.time, y = rolling_mean, line_color = 'black')
# Show the plot
show(fig_ipol1)

# plot 2
# Create and configure the figure
fig_ipol2 = figure(x_axis_type='datetime',
             height=300, width=600,
             title='Patient Glucose levels over a day',
             x_axis_label='Time', y_axis_label='glucose level',
             toolbar_location=None)

# add data points to plot using the ColumnDataSource objects
fig_ipol2.scatter(x = 'time', y = 'glucose',source = measured_cds, marker = 'circle', size = 7, fill_color = 'blue', line_width = 0)
fig_ipol2.scatter(x = 'time', y = 'interpolated_2', source = interpolated_cds, marker = 'circle', size = 7, fill_color = 'red', line_width = 0)
fig_ipol2.line(x = df.time, y = rolling_mean2, line_color = 'black')
# Show the plot
show(fig_ipol2)


# create the tabs
tab_ipol1 = Panel(child = fig_ipol1, title = 'Interpolation method pchip')
tab_ipol2 = Panel(child = fig_ipol2, title = 'Interpolation method spline with order 1')

# create tabbed plot with Tabs object
ipol_tabs = Tabs(tabs = [tab_ipol1, tab_ipol2])

show(ipol_tabs)

# make the widget
glucose_panel = pn.panel(ipol_tabs)
glucose_panel
# make the checkboxes
checkbox_group = pn.widgets.CheckBoxGroup(name = 'CheckBox Group', value = ['measured glucose values'], options = ['measured glucose values', 'interpolated glucose values'])
checkbox_group

#ip = pn.interact(glucose_panel, checkbox_group)





In [359]:
# interactive plot without rolling mean (see code chunk below for interactive plot with rolling mean)

# define function which is called to refresh the plot based on ticked boxes
def plot_glucose(df = df, value_type = ['measured glucose values']):
    
    # Isolate the data for the measured and the interpolated values
    measured_data = df[df['meas_inter'] == False]
    interpolated_data = df[df['meas_inter'] == True]


    # Create a ColumnDataSource object for the measured and for the interpolated values
    measured_cds = ColumnDataSource(measured_data)
    interpolated_cds = ColumnDataSource(interpolated_data)
    
    # define and configure figure
    fig = figure(x_axis_type='datetime',
             height=300, width=600,
             title='Patient Glucose levels over a day',
             x_axis_label='Time', y_axis_label='glucose level',
             toolbar_location=None)
    
    # add data points
    for item in value_type:
        if item == 'measured glucose values':
            fig.scatter(x = 'time', y = 'glucose', source = measured_cds, marker = 'circle', size = 7, fill_color = 'blue', line_width = 0)
        if item == 'interpolated glucose values':
            fig.scatter(x = 'time', y = 'interpolated', source = interpolated_cds, marker = 'circle', size = 7, fill_color = 'red', line_width = 0)

    # Create and configure the figure
    fig_ipol2 = figure(x_axis_type='datetime',
             height=300, width=600,
             title='Patient Glucose levels over a day',
             x_axis_label='Time', y_axis_label='glucose level',
             toolbar_location=None)
    
    # add data point
    for item in value_type:
        if item == 'measured glucose values':
            fig_ipol2.scatter(x = 'time', y = 'glucose', source = measured_cds, marker = 'circle', size = 7, fill_color = 'blue', line_width = 0)
        if item == 'interpolated glucose values':
            fig_ipol2.scatter(x = 'time', y = 'interpolated_2', source = interpolated_cds, marker = 'circle', size = 7, fill_color = 'red', line_width = 0)

    # create the tabs
    tab_ipol1 = Panel(child = fig, title = 'Interpolation method pchip')
    tab_ipol2 = Panel(child = fig_ipol2, title = 'Interpolation method spline with order 1')

    # create tabbed plot with Tabs object
    ipol_tabs = Tabs(tabs = [tab_ipol1, tab_ipol2])

    return ipol_tabs


# make the checkboxes
checkbox_group = pn.widgets.CheckBoxGroup(name = 'CheckBox Group', value = ['measured glucose values'], options = ['measured glucose values', 'interpolated glucose values'])
checkbox_group

# connect checkboxes and tabbed plot
ip = pn.interact(plot_glucose, value_type = checkbox_group)

# design grid
grid = pn.GridSpec(sizing_mode='stretch_both', max_height=800)
grid[0,0] = ip[1]
grid[0,1] = pn.pane.DataFrame(df)
grid[1,0] = ip[0]

grid.show()

Launching server at http://localhost:59452


<bokeh.server.server.Server at 0x21ce3c65e20>

In [360]:
# interactive plot with rolling mean
# def plot function which is called to refresh the visualization based on checkboxes ticked
def plot_glucose(df = df, value_type = ['measured glucose values']):
    
    # Isolate the data for the measured and the interpolated values
    measured_data = df[df['meas_inter'] == False]
    interpolated_data = df[df['meas_inter'] == True]


    # Create a ColumnDataSource object for the measured and for the interpolated values
    measured_cds = ColumnDataSource(measured_data)
    interpolated_cds = ColumnDataSource(interpolated_data)
    
    # define and configure the figure
    fig = figure(x_axis_type='datetime',
             height=300, width=600,
             title='Patient Glucose levels over a day',
             x_axis_label='Time', y_axis_label='glucose level',
             toolbar_location=None)
    
    # add data points
    for item in value_type:
        if item == 'measured glucose values':
            fig.scatter(x = 'time', y = 'glucose', source = measured_cds, marker = 'circle', size = 7, fill_color = 'blue', line_width = 0)
        if item == 'interpolated glucose values':
            fig.scatter(x = 'time', y = 'interpolated', source = interpolated_cds, marker = 'circle', size = 7, fill_color = 'red', line_width = 0)
        if item == 'all values rolling mean':
            fig.scatter(x = 'time', y = 'interpolated', source = measured_cds, marker = 'circle', size = 7, fill_color = 'blue', line_width = 0)
            fig.scatter(x = 'time', y = 'interpolated', source = interpolated_cds, marker = 'circle', size = 7, fill_color = 'red', line_width = 0)
            fig.line(x = df.time, y = rolling_mean, line_color = 'black')
            

    # define and configure the figure
    fig_ipol2 = figure(x_axis_type='datetime',
             height=300, width=600,
             title='Patient Glucose levels over a day',
             x_axis_label='Time', y_axis_label='glucose level',
             toolbar_location=None)
    
    # add data points
    for item in value_type:
        if item == 'measured glucose values':
            fig_ipol2.scatter(x = 'time', y = 'glucose', source = measured_cds, marker = 'circle', size = 7, fill_color = 'blue', line_width = 0)
        if item == 'interpolated glucose values':
            fig_ipol2.scatter(x = 'time', y = 'interpolated_2', source = interpolated_cds, marker = 'circle', size = 7, fill_color = 'red', line_width = 0)
        if item == 'all values rolling mean':
            fig_ipol2.scatter(x = 'time', y = 'interpolated_2', source = measured_cds, marker = 'circle', size = 7, fill_color = 'blue', line_width = 0)
            fig_ipol2.scatter(x = 'time', y = 'interpolated_2', source = interpolated_cds, marker = 'circle', size = 7, fill_color = 'red', line_width = 0)
            fig_ipol2.line(x = df.time, y = rolling_mean2, line_color = 'black')

    # create the tabs
    tab_ipol1 = Panel(child = fig, title = 'Interpolation method pchip')
    tab_ipol2 = Panel(child = fig_ipol2, title = 'Interpolation method spline with order 1')

    # create tabbed plot with Tabs object
    ipol_tabs = Tabs(tabs = [tab_ipol1, tab_ipol2])

    return ipol_tabs


# make the checkboxes
checkbox_group = pn.widgets.CheckBoxGroup(name = 'CheckBox Group', value = ['measured glucose values'], options = ['measured glucose values', 'interpolated glucose values', 'all values rolling mean'])
checkbox_group

# connect checkboxes and tabbed plot
ip = pn.interact(plot_glucose, value_type = checkbox_group)

# design grid
grid = pn.GridSpec(sizing_mode='stretch_both', max_height=800)
grid[0,0] = ip[1]
grid[0,1] = pn.pane.DataFrame(df)
grid[1,0] = ip[0]

grid.show()

Launching server at http://localhost:59453


<bokeh.server.server.Server at 0x21ce3d27bb0>
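Rebuilding the figures in Python on every checkbox change needs a running server. A lighter alternative, sketched here on toy data, is to let Bokeh itself toggle the glyphs: with `legend.click_policy = "hide"` a click on a legend entry hides or shows the corresponding glyph in the browser. The helper `make_tab`, the toy columns `linear` and `ffill`, and the import fallback for the `Panel` to `TabPanel` rename are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource, Tabs
from bokeh.plotting import figure

try:
    from bokeh.models import TabPanel  # Bokeh >= 3.0
except ImportError:
    from bokeh.models import Panel as TabPanel  # Bokeh 2.x name

demo = pd.DataFrame({
    "time": pd.to_datetime([
        "2019-04-25 00:08", "2019-04-25 00:14", "2019-04-25 00:50", "2019-04-25 07:02",
    ]),
    "glucose": [109.0, np.nan, np.nan, 123.0],
})
# Two toy fill strategies standing in for the two interpolation methods.
demo["linear"] = demo["glucose"].interpolate(method="linear")
demo["ffill"] = demo["glucose"].ffill()


def make_tab(column, title):
    """Build one tab with measured points and the filled course for `column`."""
    fig = figure(x_axis_type="datetime", width=600, height=300, title=title,
                 x_axis_label="Time", y_axis_label="glucose level")
    source = ColumnDataSource(demo)
    fig.scatter("time", "glucose", source=source, size=7, color="navy",
                legend_label="measured glucose values")
    fig.line("time", column, source=source, color="red",
             legend_label="filled glucose values")
    # Clicking a legend entry hides or shows the matching glyph, no server needed.
    fig.legend.click_policy = "hide"
    return TabPanel(child=fig, title=title)


tabs = Tabs(tabs=[make_tab("linear", "linear"), make_tab("ffill", "forward fill")])
# show(tabs) from bokeh.plotting would render the tabbed layout; omitted so the sketch runs headless.
```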