# Physio summary

In this notebook, I will take the preprocessed physio data for subject `005` and summarise it by condition.

One thing we observed while validating the preprocessed data is that my timestamps are 1 hour earlier than those in the reports.  This is corrected below.  We need to stay aware of this offset when using the preprocessed files (in a next round, we could perhaps include the correction in the preprocessing itself).

In [1]:
from datetime import datetime, timedelta
import pandas as pd
import numpy as np
import os

In [2]:
datadir = "/Users/jokedurnez/Documents/projects/projectsOngoing/accounts/Data/CAFE/Physio/"
subject = 'WI_AMP_005'

infosheetfile = os.path.join(datadir,
                         'Preliminary Physio Wristband Data for Mollie',
                         "Physio AMP_Subject_Info_Sheet.xlsx")
subdir = os.path.join(datadir,
                      'Preliminary Physio Wristband Data for Mollie',
                      subject)
outdir = os.path.join(datadir,'preprocessed',subject)

In [3]:
infosheet = pd.read_excel(infosheetfile,index_col='Subject ID')

In [4]:
# extract info for this subject
subjectinfo = infosheet.loc[subject]
subjectinfo

Female                                                       1
Age(months)                                               58.8
Wristband                                               A005C7
HR data                                                      Y
Date                                                    180322
Video_Start_Time                                      16:31:13
Physio_Start_DateTime                                 16:24:57
Physio_Duration                                       00:37:30
Cond1                                                     NoTV
Cond2                                                       TL
Cond3                                                    Psych
Baseline_Start_Time                            00:03:28.896000
Cond1_Start_Time                               00:08:28.996000
Cond2_Start_Time                               00:15:29.390000
Cond3_Start_Time                               00:23:17.536000
Cond3_End_Time                                 00:30:46

In [5]:
# Helper functions to convert the time-of-day values from the info sheet
# into timedelta offsets and to build the start datetime.
# There is probably a more efficient way, but it works :)

def time_to_timedelta(ts):
    # include the hours so that offsets of an hour or more are not truncated
    offset = timedelta(hours=ts.hour,
                       minutes=ts.minute,
                       seconds=ts.second,
                       microseconds=ts.microsecond)
    return offset

def get_startdatetime(subjectinfo):
    date = datetime.strptime(str(int(subjectinfo.Date)),'%y%m%d')
    time = subjectinfo.Video_Start_Time
    date = date.replace(hour=time.hour,
                        minute=time.minute,
                        second=time.second,
                        microsecond=time.microsecond)
    return date

**Note** I'm on an airplane right now and not entirely sure how to line up these start times with the timestamps in the preprocessed data.  I'm assuming the baseline starts at `Video_Start_Time` + `Baseline_Start_Time`, which seems consistent with the example preprocessed data.  This still needs to be verified!
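
To make the assumption concrete, here is the arithmetic for this subject, using the values from the info sheet printed above. The second computation, anchored on `Physio_Start_DateTime`, is only a hypothetical alternative reading shown for comparison; nothing in the data so far says it is the right one.

```python
from datetime import datetime, timedelta

# Values copied from the info sheet for WI_AMP_005 (Date 180322).
video_start = datetime(2018, 3, 22, 16, 31, 13)    # Video_Start_Time
physio_start = datetime(2018, 3, 22, 16, 24, 57)   # Physio_Start_DateTime
baseline_offset = timedelta(minutes=3, seconds=28,
                            microseconds=896000)   # Baseline_Start_Time

# Assumption used in this notebook: offsets are relative to the video start.
baseline_start = video_start + baseline_offset
print(baseline_start)       # 2018-03-22 16:34:41.896000

# Hypothetical alternative: offsets relative to the physio recording start.
alt_baseline_start = physio_start + baseline_offset
print(alt_baseline_start)   # 2018-03-22 16:28:25.896000
```

The two readings differ by 6 minutes and 16 seconds, so comparing either candidate against a known event in the preprocessed data should be enough to tell them apart.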

Now we extract the conditions and their start times from this table.

In [6]:
startdatetime = get_startdatetime(subjectinfo)

cond_dict = {
    "NoTV": "No",
    "TL": "Child",
    "Psych": "Adult",
    "WoF": "Adult"
}
times = {}

# BL
times['BL'] = {}
times['BL']['start'] = startdatetime + \
    time_to_timedelta(subjectinfo.Baseline_Start_Time)
times['BL']['end'] = startdatetime + \
    time_to_timedelta(subjectinfo.Cond1_Start_Time)

# COND 1

cond = cond_dict[subjectinfo['Cond1']]
times[cond] = {}
times[cond]['start'] = startdatetime + \
    time_to_timedelta(subjectinfo.Cond1_Start_Time)
times[cond]['end'] = startdatetime + \
    time_to_timedelta(subjectinfo.Cond2_Start_Time)

# COND 2

cond = cond_dict[subjectinfo['Cond2']]
times[cond] = {}
times[cond]['start'] = startdatetime + \
    time_to_timedelta(subjectinfo.Cond2_Start_Time)
times[cond]['end'] = startdatetime + \
    time_to_timedelta(subjectinfo.Cond3_Start_Time)

# COND 3

cond = cond_dict[subjectinfo['Cond3']]
times[cond] = {}
times[cond]['start'] = startdatetime + \
    time_to_timedelta(subjectinfo.Cond3_Start_Time)
times[cond]['end'] = startdatetime + \
    time_to_timedelta(subjectinfo.Cond3_End_Time)

times

{'BL': {'start': datetime.datetime(2018, 3, 22, 16, 34, 41, 896000),
  'end': datetime.datetime(2018, 3, 22, 16, 39, 41, 996000)},
 'No': {'start': datetime.datetime(2018, 3, 22, 16, 39, 41, 996000),
  'end': datetime.datetime(2018, 3, 22, 16, 46, 42, 390000)},
 'Child': {'start': datetime.datetime(2018, 3, 22, 16, 46, 42, 390000),
  'end': datetime.datetime(2018, 3, 22, 16, 54, 30, 536000)},
 'Adult': {'start': datetime.datetime(2018, 3, 22, 16, 54, 30, 536000),
  'end': datetime.datetime(2018, 3, 22, 17, 1, 59, 642000)}}
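
A quick sanity check on these windows can catch typos in the info sheet. The sketch below copies the `times` output above and verifies that the windows are in order and do not overlap; `check_windows` is a hypothetical helper, not part of the preprocessing code.

```python
from datetime import datetime, timedelta

# Condition windows copied from the `times` output above.
times = {
    'BL': {'start': datetime(2018, 3, 22, 16, 34, 41, 896000),
           'end': datetime(2018, 3, 22, 16, 39, 41, 996000)},
    'No': {'start': datetime(2018, 3, 22, 16, 39, 41, 996000),
           'end': datetime(2018, 3, 22, 16, 46, 42, 390000)},
    'Child': {'start': datetime(2018, 3, 22, 16, 46, 42, 390000),
              'end': datetime(2018, 3, 22, 16, 54, 30, 536000)},
    'Adult': {'start': datetime(2018, 3, 22, 16, 54, 30, 536000),
              'end': datetime(2018, 3, 22, 17, 1, 59, 642000)},
}

def check_windows(times):
    """Return the duration of each window; raise if windows are reversed or overlap."""
    ordered = sorted(times.items(), key=lambda item: item[1]['start'])
    durations = {}
    previous_end = None
    for name, window in ordered:
        if window['start'] >= window['end']:
            raise ValueError("%s: empty or reversed window" % name)
        if previous_end is not None and window['start'] < previous_end:
            raise ValueError("%s: overlaps the previous window" % name)
        durations[name] = window['end'] - window['start']
        previous_end = window['end']
    return durations

durations = check_windows(times)
for name, duration in durations.items():
    print(name, duration)

# Total time covered by the four windows.
total = sum(durations.values(), timedelta())
print('total', total)
```

For this subject the baseline lasts about 5 minutes and each condition between 7 and 8 minutes.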

In [7]:
measurements = {}
variable_of_interest = {
    "ACC": "SVM",
    "EDA": "EDA_0",
    "BVP": "BVP_0",
    "TEMP": "TEMP_0",
    "HR": "HR_0",
    "IBI": "IBI"
}

summary = pd.DataFrame({})
for metric in ['ACC', 'EDA', 'TEMP', 'HR', 'IBI', 'BVP']:
    # read in preprocessed file
    preprocessed = pd.read_csv(os.path.join(
        outdir,"PHYSIO_%s_%s.csv"%(subject,metric)),
        parse_dates = ['timestamp'])
    
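    # shift the timestamps to correct the offset observed during validation
    # NOTE: the shift applied here is 2 hours, while the introduction
    # mentions a 1 hour difference; to be verified
    preprocessed['timestamp'] = preprocessed['timestamp'] + timedelta(hours=2)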
    
    # add condition to preprocessed data
    preprocessed['condition'] = None
    for condition,values in times.items():
        conditiontimes = (preprocessed.timestamp < values['end']) & \
            (preprocessed.timestamp >= values['start'])
        preprocessed.loc[conditiontimes,'condition']= condition
    
    # group by condition and summarise
    grouper = variable_of_interest[metric]
    grouped = preprocessed[[grouper,'condition']] \
        .groupby('condition') \
        .aggregate(['mean','count','median','std'])
    grouped.columns = ['mean','count','median','std']
    grouped['metric'] = metric
    
    # add to summary dataset
    summary = pd.concat([summary,grouped])
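
The labelling and grouping steps in the loop are easier to follow on a tiny example. The sketch below uses made-up numbers (not the subject's data) and hypothetical names (`toy`, `toy_times`), but applies the same half-open windows (start included, end excluded) and the same aggregations.

```python
from datetime import datetime, timedelta
import pandas as pd

# Made-up data: one sample per second for 6 seconds.
t0 = datetime(2018, 3, 22, 16, 0, 0)
toy = pd.DataFrame({
    'timestamp': [t0 + timedelta(seconds=i) for i in range(6)],
    'EDA_0': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# Made-up windows: the last sample falls outside both of them.
toy_times = {
    'BL': {'start': t0, 'end': t0 + timedelta(seconds=3)},
    'No': {'start': t0 + timedelta(seconds=3), 'end': t0 + timedelta(seconds=5)},
}

# Label each sample with the condition whose window contains it.
toy['condition'] = None
for condition, window in toy_times.items():
    in_window = (toy.timestamp >= window['start']) & \
        (toy.timestamp < window['end'])
    toy.loc[in_window, 'condition'] = condition

# Samples without a condition are dropped by groupby.
toy_summary = toy.groupby('condition')['EDA_0'] \
    .agg(['mean', 'count', 'median', 'std'])
print(toy_summary)
```

Samples recorded outside every window keep an empty condition and do not contribute to any row of the summary.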

In [10]:
summary['ID'] = subject
# write the index as well: it holds the condition labels
summary.to_csv(os.path.join(outdir,"PHYSIO_%s_summary.csv"%(subject)),
               index_label='condition')
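
One detail worth checking when saving a grouped summary: the condition labels live in the index, so whether they reach the CSV depends on the `index` argument of `to_csv`. A small sketch on made-up numbers (`toy_summary` is hypothetical), writing to memory instead of disk:

```python
import io
import pandas as pd

# Made-up summary with the condition labels in the index.
toy_summary = pd.DataFrame(
    {'mean': [2.0, 15.0], 'metric': ['EDA', 'EDA']},
    index=pd.Index(['BL', 'No'], name='condition'),
)

# With index=False the condition labels are not written.
without_index = io.StringIO()
toy_summary.to_csv(without_index, index=False)
header_without = without_index.getvalue().splitlines()[0]
print(header_without)   # mean,metric

# With the default (index=True) they become the first column.
with_index = io.StringIO()
toy_summary.to_csv(with_index)
header_with = with_index.getvalue().splitlines()[0]
print(header_with)      # condition,mean,metric
```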

In [11]:
summary

condition,mean,count,median,std,metric,ID
Adult,63.442113,14371,63.158531,6.249463,ACC,WI_AMP_005
BL,63.759993,9603,62.585941,10.656769,ACC,WI_AMP_005
Child,62.734624,14981,62.489999,3.592436,ACC,WI_AMP_005
No,64.089205,13453,62.809235,10.929416,ACC,WI_AMP_005
Adult,1.473514,1796,1.332217,0.540778,EDA,WI_AMP_005
BL,3.903582,1200,3.969448,0.965889,EDA,WI_AMP_005
Child,1.266404,1873,1.187514,0.191148,EDA,WI_AMP_005
No,4.091204,1682,4.709348,1.407558,EDA,WI_AMP_005
Adult,36.993068,1796,37.18,0.352233,TEMP,WI_AMP_005
BL,37.958667,1200,38.07,0.334753,TEMP,WI_AMP_005
