![alt text](./pageheader_rose2_babies.jpg)

# SIPPV ventilation data

#### Author: Dr Gusztav Belteki

This notebook imports all files (slow_text, slow_settings, slow_measurements) of the first service evaluation (**DG001-DG060**) and stores them as dictionaries of DataFrames. It filters slow_measurements data to remove HFOV periods. It then limits recordings to continuous PC-AC (SIPPV) periods (one per recording) using manual lookup of the recordings. After some preprocessing, it exports the data to pickle archives: *slow_measurements_sippv_1, slow_measurements_sippv_2, slow_measurements_sippv_3*. Three archives are needed because of the size of the data.

*Preprocessing done on the data:*

*  Resampling to 1 second to remove half-empty rows (parameters were retrieved in two batches each second, with half of the parameters in each batch). Resampling combines the two batches into one row.
*  Only SIPPV data are kept (this is done in part by manual lookup of ventilator settings).
*  Normalizing the relevant parameters (VTs, MVs) to body weight. Original non-normalized data are also kept.
*  Changing column names to more legible and intuitive ones.
*  Adding the set backup respiratory rate (*RR*) to the data. 
*  Marking whether leak compensation was on or off and adding this information to the DataFrames as a categorical variable.
*  Adding the recordings' names to the DataFrames.
*  Removing some unimportant columns (additional time stamps).

### Importing the necessary libraries and setting options

In [None]:
import IPython
import pandas as pd
import numpy as np
import scipy as sp
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn as sk

import os
import sys
import re
import pickle

from scipy import stats
from pandas import Series, DataFrame
from datetime import datetime, timedelta

%matplotlib inline

matplotlib.style.use('classic')
matplotlib.rcParams['figure.facecolor'] = 'w'

pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 100)

In [None]:
print("Python version: {}".format(sys.version))
print("pandas version: {}".format(pd.__version__))
print("matplotlib version: {}".format(matplotlib.__version__))
print("NumPy version: {}".format(np.__version__))
print("SciPy version: {}".format(sp.__version__))
print("IPython version: {}".format(IPython.__version__))

### Import custom functions from own module

In [None]:
from gb_loader import *
from gb_stats import *
from gb_transform import *
from gb_visualizer import *

### List and set the working directory and the directory to write out data

In [None]:
# Topic of the Notebook which will also be the name of the subfolder containing results
TOPIC = 'SIPPV_all'

# Name of the external hard drive
DRIVE = 'GUSZTI'

# Directory containing clinical and blood gas data
CWD = '/Users/guszti/ventilation_draeger'

# Directory on external drive to read the ventilation data from
DIR_READ = '/Volumes/%s/Draeger/service_evaluation_old' % DRIVE

# Directory to write results and selected images to 
if not os.path.isdir('%s/%s/%s' % (CWD, 'Analyses', TOPIC)):
    os.makedirs('%s/%s/%s' % (CWD, 'Analyses', TOPIC))
DIR_WRITE = '%s/%s/%s' % (CWD, 'Analyses', TOPIC)

# Images and raw data will be written on an external hard drive
if not os.path.isdir('/Volumes/%s/data_dump/draeger/%s' % (DRIVE, TOPIC)):
    os.makedirs('/Volumes/%s/data_dump/draeger/%s' % (DRIVE, TOPIC))
DATA_DUMP = '/Volumes/%s/data_dump/draeger/%s' % (DRIVE, TOPIC)
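
The check-then-create pattern above can also be written with `os.makedirs(..., exist_ok=True)`. The following is only a sketch: `ensure_dir` is a hypothetical helper that is not part of the notebook, and a temporary directory stands in for the real `CWD` and external drive.

```python
import os
import tempfile

def ensure_dir(*parts):
    """Join path parts, create the directory if needed, and return the path."""
    path = os.path.join(*parts)
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    return path

# Demonstration in a temporary directory instead of the real CWD / external drive
base = tempfile.mkdtemp()
dir_write = ensure_dir(base, 'Analyses', 'SIPPV_all')
dir_again = ensure_dir(base, 'Analyses', 'SIPPV_all')  # second call changes nothing
```

Returning the path from the helper keeps the path string in one place instead of repeating the format expression three times.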

In [None]:
os.chdir(CWD)
os.getcwd()

In [None]:
DIR_READ

In [None]:
DIR_WRITE

In [None]:
DATA_DUMP

### List of the recordings

In [None]:
# This is a list of all recordings

recordings = ['DG001', 'DG002_1', 'DG002_2', 'DG003', 'DG004', 'DG005_1', 'DG005_2', 'DG005_3', 
              'DG006_1', 'DG006_2', 'DG006_3', 'DG007', 'DG008', 'DG009', 'DG010', 'DG011', 
              'DG012', 'DG013', 'DG014', 'DG015', 'DG016', 'DG017', 'DG018_1', 'DG018_2', 'DG019',
              'DG020', 'DG021', 'DG022', 'DG023', 'DG024',  'DG025', 'DG026', 'DG027', 'DG028', 
              'DG029', 'DG030', 'DG031', 'DG032_1', 'DG032_2', 'DG033', 'DG034', 'DG035', 'DG036', 
              'DG037', 'DG038_1', 'DG038_2', 'DG039', 'DG040_1', 'DG040_2', 'DG041', 'DG042', 
              'DG043', 'DG044', 'DG045', 'DG046_1', 'DG046_2', 'DG047', 'DG048', 'DG049', 'DG050',
              'DG051_1', 'DG051_2', 'DG052', 'DG053', 'DG054', 'DG055', 'DG056', 'DG057', 'DG058',
              'DG059', 'DG060']

In [None]:
len(recordings)

### Import clinical details

In [None]:
clinical_details = pd.read_excel('%s/data_grabber_patient_data_combined_old.xlsx' % CWD)
clinical_details.index = clinical_details['Recording']

In [None]:
clinical_details.info()

In [None]:
current_weights = {}
for recording in recordings:
    current_weights[recording] = clinical_details.loc[recording, 'Current weight' ] / 1000
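
The division by 1000 suggests that `'Current weight'` is stored in grams and converted to kilograms here. Assuming that is the case, the same dictionary can be sketched without an explicit loop. The table and weights below are made-up toy values, not data from the study.

```python
import pandas as pd

# Toy clinical table indexed by recording; weights are invented and assumed to be in grams
clinical_demo = pd.DataFrame(
    {'Recording': ['DG001', 'DG003'], 'Current weight': [850, 1240]}
).set_index('Recording')

# Convert grams to kilograms and turn the column into a {recording: weight} dictionary
weights_kg = (clinical_demo['Current weight'] / 1000).to_dict()
```

Unlike the loop above, this version covers every row of the table rather than only the names listed in `recordings`.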

### Import ventilator modes 

In [None]:
vent_modes = {}

for recording in recordings:
    flist = os.listdir('%s/%s' % (DIR_READ, recording))
    flist = [file for file in flist if not file.startswith('.')] # There are some hidden 
    # files on the hard drive starting with '.'; this step is necessary to ignore them
    files = slow_text_finder(flist)
    # print('Loading recording %s' % recording)
    # print(files)
    fnames = ['%s/%s/%s' % (DIR_READ, recording, filename) for filename in files]
    vent_modes[recording] =  data_loader(fnames)
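
The hidden-file filter and path building used in this loop can be illustrated in isolation. This sketch does not reproduce `slow_text_finder` or `data_loader` from `gb_loader`, whose internals are not shown here; `visible_files` and the file names are hypothetical.

```python
import os
import tempfile

def visible_files(directory):
    """Return sorted names in `directory`, skipping hidden files that start with '.'."""
    return sorted(name for name in os.listdir(directory) if not name.startswith('.'))

# Toy recording folder with one hidden file and two hypothetical data files
folder = tempfile.mkdtemp()
for name in ['.DS_Store', 'demo_slow_Text.csv', 'demo_slow_Measurement.csv']:
    open(os.path.join(folder, name), 'w').close()

names = visible_files(folder)
paths = [os.path.join(folder, name) for name in names]  # full paths for a loader
```

Sorting the names makes the loading order reproducible, since `os.listdir` returns entries in arbitrary order.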

In [None]:
vent_modes_selected = {} # only important mode parameters are kept in this one

for recording in recordings:
    vent_modes_selected[recording] = vent_mode_cleaner(vent_modes[recording])

### Import ventilator settings

In [None]:
vent_settings = {}

for recording in recordings:
    flist = os.listdir('%s/%s' % (DIR_READ, recording))
    flist = [file for file in flist if not file.startswith('.')] # There are some hidden 
    # files on the hard drive starting with '.'; this step is necessary to ignore them
    files = slow_setting_finder(flist)
    # print('Loading recording %s' % recording)
    # print(files)
    fnames = ['%s/%s/%s' % (DIR_READ, recording, filename) for filename in files]
    vent_settings[recording] =  data_loader(fnames)

In [None]:
vent_settings_selected = {} # only important setting parameters are kept in this one

for recording in recordings:
    vent_settings_selected[recording] = vent_settings_cleaner(vent_settings[recording])

### Identify recordings that have SIPPV periods

In [None]:
# Identify recordings which have PC-AC mode (= SIPPV) and collect their names in a list
# Uncomment the print statement below to list the recordings without PC-AC periods
recordings_sippv = []

for recording in recordings:
    a = (vent_modes_selected[recording]['Text'])
    if ' Mode PC-AC' in a.values:
        recordings_sippv.append(recording)
    else:
        # print('%s does not contain SIPPV ventilation' % recording)
        pass
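
The membership test relies on an exact string match, including the leading space in `' Mode PC-AC'`. A minimal sketch with toy Series (the recording names and the second mode string are invented) shows the exact match next to a whitespace-tolerant variant:

```python
import pandas as pd

# Toy 'Text' columns for two hypothetical recordings
modes_demo = {
    'rec_A': pd.Series([' Mode PC-AC', ' Mode other']),
    'rec_B': pd.Series([' Mode other']),
}

target = ' Mode PC-AC'  # leading space, as in the string used in the notebook

# Exact match, equivalent to `target in series.values`
with_sippv = [name for name, text in modes_demo.items() if (text == target).any()]

# Variant that ignores surrounding whitespace in the text entries
stripped = [name for name, text in modes_demo.items()
            if (text.str.strip() == 'Mode PC-AC').any()]
```

Both lists agree here; the stripped variant would only matter if the exported text were not padded consistently.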

In [None]:
print(recordings_sippv)

In [None]:
len(recordings_sippv)

### Import ventilator parameters obtained with 1Hz sampling rate ("slow measurements")

In [None]:
slow_measurements = {}

for recording in recordings_sippv:
    
    flist = os.listdir('%s/%s' % (DIR_READ, recording))
    flist = [file for file in flist if not file.startswith('.')] # There are some hidden 
    # files on the hard drive starting with '.'; this step is necessary to ignore them
    files = slow_measurement_finder(flist)
    print('Loading recording %s' % recording)
    print(files)
    fnames = ['%s/%s/%s' % (DIR_READ, recording, filename) for filename in files]
    slow_measurements[recording] =  data_loader(fnames)

### Resample to remove half-empty rows

In [None]:
%%time

for recording in recordings_sippv:
    slow_measurements[recording] = slow_measurements[recording].resample('1S').mean()
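
To see why the mean over 1-second bins removes the half-empty rows, consider a toy frame in which each second has two rows, each holding half of the parameters. The values and column choice are illustrative only, and the lowercase `'1s'` alias is used because recent pandas versions prefer it.

```python
import numpy as np
import pandas as pd

# Two batches per second: the first carries PIP, the second carries PEEP
idx = pd.to_datetime(['2015-09-25 13:42:42.100', '2015-09-25 13:42:42.600',
                      '2015-09-25 13:42:43.100', '2015-09-25 13:42:43.600'])
raw = pd.DataFrame({'PIP': [20.0, np.nan, 21.0, np.nan],
                    'PEEP': [np.nan, 5.0, np.nan, 5.5]}, index=idx)

# mean() skips NaN, so each 1-second bin collapses into one complete row
combined = raw.resample('1s').mean()
```

Because every parameter appears only once per second in this toy frame, the mean simply returns the single available value.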

### Limit the intervals to SIPPV (PC-AC) periods

In [None]:
print('Length of the recordings in seconds BEFORE removing non-SIPPV periods: \n')
for recording in recordings_sippv:
    print('%-10s %-10.d' % (recording, len(slow_measurements[recording])))

Limit recordings to **continuous PC-AC (SIPPV)** periods using **manual lookup** of the recordings. The other recordings in **"recordings_sippv"** are PC-AC (SIPPV) throughout.

In [None]:
slow_measurements['DG001'] = slow_measurements['DG001']['2015-09-25 13:42:42':'2015-09-26 10:27:19']
slow_measurements['DG005_1'] = slow_measurements['DG005_1']['2015-10-13 08:54:08':'2015-10-13 14:48:34']
slow_measurements['DG005_2'] = slow_measurements['DG005_2']['2015-10-23 22:30:38':'2015-10-24 11:18:40']
slow_measurements['DG015'] = slow_measurements['DG015']['2015-11-30 13:16:46':'2015-11-30 17:04:37']
slow_measurements['DG016'] = slow_measurements['DG016']['2015-12-07 18:16:17':'2015-12-08 08:50:53']
slow_measurements['DG017'] = slow_measurements['DG017']['2015-12-08 14:29:23':'2015-12-09 17:45:53']
slow_measurements['DG018_1'] = slow_measurements['DG018_1']['2015-12-11 22:15:28':'2015-12-13 01:06:49']
slow_measurements['DG018_2'] = slow_measurements['DG018_2']['2015-12-17 21:27:23':'2015-12-17 22:50:37']
slow_measurements['DG022'] = slow_measurements['DG022']['2016-01-06 03:04:16':'2016-01-06 06:22:55']
slow_measurements['DG027'] = slow_measurements['DG027']['2016-01-29 12:42:06':'2016-02-01 16:38:26']
slow_measurements['DG032_1'] = slow_measurements['DG032_1']['2016-03-07 14:58:29':'2016-03-09 09:47:21']
slow_measurements['DG032_2'] = slow_measurements['DG032_2']['2016-03-24 13:45:36':'2016-03-26 02:06:39']
slow_measurements['DG034'] = slow_measurements['DG034']['2016-03-26 15:17:23':'2016-03-28 16:44:59']
slow_measurements['DG036'] = slow_measurements['DG036']['2016-04-25 15:47:09':'2016-04-25 15:49:29']
slow_measurements['DG038_1'] = slow_measurements['DG038_1']['2016-05-11 11:21:16':'2016-05-11 20:26:32']
slow_measurements['DG040_1'] = slow_measurements['DG040_1']['2016-06-09 10:00:31':'2016-06-09 15:56:34']
slow_measurements['DG040_2'] = slow_measurements['DG040_2']['2016-06-24 08:39:40':'2016-06-24 15:29:44']
slow_measurements['DG046_1'] = slow_measurements['DG046_1']['2016-07-12 15:15:02':'2016-07-12 15:16:13']
slow_measurements['DG049'] = slow_measurements['DG049']['2016-08-31 15:56:01':'2016-09-02 05:24:51']
slow_measurements['DG050'] = slow_measurements['DG050']['2016-09-05 08:48:28':'2016-09-05 11:20:57']
slow_measurements['DG053'] = slow_measurements['DG053']['2016-10-15 10:09:39':'2016-10-15 10:18:35']
slow_measurements['DG056'] = slow_measurements['DG056']['2016-11-12 15:52:36':'2016-11-13 12:02:08']
slow_measurements['DG060'] = slow_measurements['DG060']['2017-02-15 19:27:28':'2017-02-21 10:08:52']
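
The manual limits could also be kept in a lookup table and applied in a loop. This is a sketch on toy data (`rec_A` and its limits are invented), not a replacement for the values above. Note that label-based slicing on a `DatetimeIndex` includes both end points.

```python
import pandas as pd

# Toy recording with one row per second
index = pd.date_range('2015-09-25 13:42:40', periods=10, freq='1s')
demo = {'rec_A': pd.DataFrame({'PIP': range(10)}, index=index)}

# Lookup table: recording name -> (start, end) of the continuous SIPPV period
sippv_periods = {'rec_A': ('2015-09-25 13:42:42', '2015-09-25 13:42:45')}

for name, (start, end) in sippv_periods.items():
    demo[name] = demo[name].loc[start:end]  # both limits are included
```

Keeping the limits in one dictionary separates the manually curated values from the slicing logic.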


In [None]:
print('Length of the recordings in seconds AFTER removing non-SIPPV periods: \n')
for recording in recordings_sippv:
    print('%-10s %-10.d' % (recording, len(slow_measurements[recording])))

### Normalise parameters to the body weight

In [None]:
# Normalising tidal volumes to body weight (mL/kg)
for recording in recordings_sippv:
    try:
        a = slow_measurements[recording]
        a['VT_kg']       = a['5001|VT [mL]'] / current_weights[recording]
        a['VTi_kg']      = a['5001|VTi [mL]'] / current_weights[recording]
        a['VTe_kg']      = a['5001|VTe [mL]'] / current_weights[recording]
        a['VTmand_kg']   = a['5001|VTmand [mL]'] / current_weights[recording]
        a['VTspon_kg']   = a['5001|VTspon [mL]'] / current_weights[recording]
        a['VTimand_kg']  = a['5001|VTimand [mL]'] / current_weights[recording]
        a['VTemand_kg']  = a['5001|VTemand [mL]'] / current_weights[recording]
        a['VTispon_kg']  = a['5001|VTispon [mL]'] / current_weights[recording]
        a['VTespon_kg']  = a['5001|VTespon [mL]'] / current_weights[recording]
    except KeyError:
        print('%s does not have all of the parameters' % recording)

In [None]:
# Normalising minute volumes to body weight (L/min/kg)
for recording in recordings_sippv:
    try:
        a = slow_measurements[recording]
        a['MV_kg'] =      a['5001|MV [L/min]'] / current_weights[recording]
        a['MVi_kg'] =     a['5001|MVi [L/min]'] / current_weights[recording]
        a['MVe_kg'] =     a['5001|MVe [L/min]'] / current_weights[recording]
        a['MVemand_kg'] = a['5001|MVemand [L/min]'] / current_weights[recording]
        a['MVespon_kg'] = a['5001|MVespon [L/min]'] / current_weights[recording]
        a['MVleak_kg'] =  a['5001|MVleak [L/min]'] / current_weights[recording]
    except KeyError:
        print('%s does not have all of the parameters' % recording)
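
In the `try`/`except` blocks above, the first missing column stops the block, so the columns after it are not normalised for that recording. One possible alternative is to check each column separately. `normalise_to_weight` is a hypothetical helper and the numbers are toy values.

```python
import pandas as pd

def normalise_to_weight(df, columns, weight_kg):
    """Divide each available source column by body weight; report missing ones."""
    out = df.copy()
    missing = []
    for new_name, source in columns.items():
        if source in out.columns:
            out[new_name] = out[source] / weight_kg
        else:
            missing.append(source)  # skip this column but keep going
    return out, missing

# Toy frame that lacks the VTmand column
demo = pd.DataFrame({'5001|VTemand [mL]': [4.0, 5.0]})
columns = {'VTemand_kg': '5001|VTemand [mL]', 'VTmand_kg': '5001|VTmand [mL]'}
normalised, missing = normalise_to_weight(demo, columns, weight_kg=2.0)
```

The returned `missing` list names exactly which parameters were absent, which is more specific than the generic message printed above.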

### Rename columns

In [None]:
# Creating a dictionary to rename "clumsy" column names with simple ones

old = ['5001|VT [mL]', '5001|VTi [mL]', '5001|VTe [mL]', 
       '5001|VTemand [mL]', '5001|VTespon [mL]',
       '5001|VTimand [mL]', '5001|VTispon [mL]', 
       '5001|VTmand [mL]', '5001|VTspon [mL]',
       '5001|MV [L/min]', '5001|MVe [L/min]', '5001|MVemand [L/min]', '5001|MVespon [L/min]',
       '5001|MVi [L/min]', '5001|MVleak [L/min]', '5001|% MVspon [%]', '5001|% leak [%]', 
       '5001|C20/Cdyn [no unit]', '5001|Cdyn [L/bar]', '5001|E (I:E) [no unit]', '5001|E [mbar/L]', 
       '5001|EIP [mbar]', '5001|FiO2 [%]', '5001|FlowDev [L/min]', '5001|I (I:E) [no unit]', 
       '5001|I:Espon (E-Part) [no unit]', '5001|I:Espon (I-Part) [no unit]', '5001|PEEP [mbar]',
       '5001|PIP [mbar]', '5001|Pmean [mbar]', '5001|Pmin [mbar]', '5001|R [mbar/L/s]', '5001|RR [1/min]',
       '5001|RRmand [1/min]', '5001|RRspon [1/min]', '5001|Rpat [mbar/L/s]', '5001|TC [s]', '5001|TCe [s]',
       '5001|Tispon [s]', '5001|r2 [no unit]']

new = ['VT', 'VTi', 'VTe', 
       'VTemand', 'VTespon', 
       'VTimand', 'VTispon', 
       'VTmand', 'VTspon',
       'MV', 'MVe', 'MVemand', 'MVespon', 
       'MVi', 'MVleak', 'MVspon%', 'leak%', 
       'C20_Cdyn', 'Cdyn', 'E_IE', 'E', 
       'EIP', 'FiO2', 'FlowDev', 'I_IE', 
       'I_Espon_E', 'I_Espon_I', 'PEEP',
       'PIP', 'Pmean', 'Pmin', 'R', 'RR',
       'RRmand', 'RRspon', 'Rpat', 'TC', 'TCe',
       'Tispon', 'r2']

rename_dict = dict(zip(old, new))

In [None]:
# Renaming column names and removing unimportant columns 

for recording in recordings_sippv:
    try:
        slow_measurements[recording].rename(columns=rename_dict, inplace=True)
        to_delete = [par for par in list(slow_measurements[recording]) 
                     if par.startswith('5001') or par.startswith('8272')]
        slow_measurements[recording] = slow_measurements[recording].drop(to_delete, axis = 1)
    except KeyError:
        print('%s does not have all of the parameters' % recording)
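
The effect of renaming followed by dropping the leftover prefixed columns can be checked on a toy frame. The column `'5001|Unmapped [no unit]'` is hypothetical and stands for any parameter that is absent from `rename_dict`.

```python
import pandas as pd

demo = pd.DataFrame({'5001|PIP [mbar]': [20.0], '5001|PEEP [mbar]': [5.0],
                     '5001|Unmapped [no unit]': [1.0], 'VT_kg': [4.5]})
rename_demo = {'5001|PIP [mbar]': 'PIP', '5001|PEEP [mbar]': 'PEEP'}

# Keys of the mapping that are not present in the frame are ignored by rename()
renamed = demo.rename(columns=rename_demo)

# Anything still carrying a device prefix was not mapped and is dropped
leftover = [col for col in renamed.columns if col.startswith(('5001', '8272'))]
cleaned = renamed.drop(columns=leftover)
```

Since `rename` ignores mapping keys that are missing from the frame by default, a recording that lacks some parameters is renamed without raising an error.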

### Retrieving the set respiratory rate and adding it to the DataFrames

In [None]:
RR_set = {}
for recording in recordings_sippv:
    RR_set[recording] = vent_settings_selected[recording][vent_settings_selected[recording].Id == 'RR'].copy()
    RR_set[recording]['RR_set'] = RR_set[recording]['Value New']
    RR_set[recording] = RR_set[recording][['RR_set']]
    RR_set[recording] = RR_set[recording].reindex(slow_measurements[recording].index, method = 'ffill')
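
Settings are logged only when they change, so `reindex` with `method='ffill'` carries the last known value forward onto the 1-second index. A minimal sketch with invented setting changes:

```python
import pandas as pd

# Two setting changes, logged at irregular times
settings = pd.DataFrame(
    {'RR_set': [40.0, 50.0]},
    index=pd.to_datetime(['2015-09-25 13:42:41', '2015-09-25 13:42:43']))

# Regular 1-second index, as in the resampled measurements
target_index = pd.date_range('2015-09-25 13:42:40', periods=5, freq='1s')

# Each target time receives the most recent setting at or before it
aligned = settings.reindex(target_index, method='ffill')
```

Rows before the first logged setting stay empty (NaN), so it is worth checking the start of each recording for missing `RR_set` values.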

In [None]:
for recording in recordings_sippv:
    slow_measurements[recording] = pd.concat([slow_measurements[recording], RR_set[recording]],
                                            join = 'inner', axis = 1)

### Mark if leak compensation was on or not and add it to the DataFrames as a categorical variable:


**'leak_comp'** = **VTmand_kg** - **VTemand_kg**

*  If leak compensation is off, VTmand = VTemand. The targeted parameter is VTemand.
*  If leak compensation is on, VTmand > VTemand. The targeted parameter is VTmand.




In [None]:
# Create a new column in the dataframe with the amount of leak compensation
# This value is close to 0 if leak compensation was off as the targeted VT is VTemand in that case
# When leak compensation is on, 'VTmand' is the sum of 'VTemand' and the calculated expiratory leak
for recording in recordings_sippv:
    slow_measurements[recording]['leak_comp'] = slow_measurements[recording]['VTmand_kg'] - \
        slow_measurements[recording]['VTemand_kg']

In [None]:
for recording in recordings_sippv:
    if slow_measurements[recording]['leak_comp'].mean() > 0.001:
        slow_measurements[recording]['leak_comp_ON'] = 1 
    else:
        slow_measurements[recording]['leak_comp_ON'] = 0
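
The per-recording rule (mean difference above 0.001 means leak compensation was on) can be written as a small function and tried on toy data. `leak_comp_flag` is a hypothetical name and the volumes are invented.

```python
import pandas as pd

def leak_comp_flag(df, threshold=0.001):
    """Return 1 if the mean of VTmand_kg - VTemand_kg exceeds the threshold, else 0."""
    diff = df['VTmand_kg'] - df['VTemand_kg']
    return int(diff.mean() > threshold)

# Leak compensation off: mandatory and expired mandatory volumes coincide
off = pd.DataFrame({'VTmand_kg': [5.0, 5.2], 'VTemand_kg': [5.0, 5.2]})
# Leak compensation on: VTmand exceeds VTemand by the estimated leak
on = pd.DataFrame({'VTmand_kg': [5.5, 5.8], 'VTemand_kg': [5.0, 5.2]})

flag_off = leak_comp_flag(off)
flag_on = leak_comp_flag(on)
```

The flag is assigned once per recording, so this rule assumes that leak compensation was not switched during the recording.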

### Add the recording's name to the DataFrames as a categorical variable

In [None]:
for recording in recordings_sippv:
    slow_measurements[recording]['recording'] = recording 

### Drop unnecessary columns

In [None]:
for recording in recordings_sippv:
    slow_measurements[recording].drop(['Time [ms]', 'Rel.Time [s]'], axis=1, inplace=True )

### Save all processed DataFrames to pickle archives

##### The slow measurements dictionary is too large to be written to a single pickle archive in one step

In [None]:
# The final slice is open-ended so that no recording is dropped if there are more than 60
rec1 = recordings_sippv[:20]; rec2 = recordings_sippv[20:40]
rec3 = recordings_sippv[40:]

In [None]:
slow_measurements_1 = { key: value for key, value in slow_measurements.items() if key in rec1}
with open('%s/%s.pickle' % (DATA_DUMP, 'slow_measurements_sippv_1'), 'wb') as handle:
    pickle.dump(slow_measurements_1, handle, protocol=pickle.HIGHEST_PROTOCOL)

In [None]:
slow_measurements_2 = { key: value for key, value in slow_measurements.items() if key in rec2}
with open('%s/%s.pickle' % (DATA_DUMP, 'slow_measurements_sippv_2'), 'wb') as handle:
    pickle.dump(slow_measurements_2, handle, protocol=pickle.HIGHEST_PROTOCOL)

In [None]:
slow_measurements_3 = { key: value for key, value in slow_measurements.items() if key in rec3}
with open('%s/%s.pickle' % (DATA_DUMP, 'slow_measurements_sippv_3'), 'wb') as handle:
    pickle.dump(slow_measurements_3, handle, protocol=pickle.HIGHEST_PROTOCOL)
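
The three export cells follow the same pattern, which could be generalised to any number of recordings. The sketch below uses a hypothetical helper, `dump_in_chunks`, with toy data and a temporary directory, and reads the archives back to confirm that nothing is lost.

```python
import os
import pickle
import tempfile

def dump_in_chunks(data, directory, stem, chunk_size=20):
    """Write `data` to numbered pickle files holding at most `chunk_size` keys each."""
    keys = list(data)
    paths = []
    for number, start in enumerate(range(0, len(keys), chunk_size), start=1):
        chunk = {key: data[key] for key in keys[start:start + chunk_size]}
        path = os.path.join(directory, '%s_%d.pickle' % (stem, number))
        with open(path, 'wb') as handle:
            pickle.dump(chunk, handle, protocol=pickle.HIGHEST_PROTOCOL)
        paths.append(path)
    return paths

# Toy dictionary with 45 entries -> three archives (20 + 20 + 5)
demo = {'rec_%02d' % i: [i] for i in range(45)}
paths = dump_in_chunks(demo, tempfile.mkdtemp(), 'slow_measurements_demo')

# Read the archives back and merge them into one dictionary
restored = {}
for path in paths:
    with open(path, 'rb') as handle:
        restored.update(pickle.load(handle))
```

The same read-and-merge loop can be used in a downstream notebook to reassemble the full dictionary from the separate archives.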