# Vital Sign Normalisation using Z-score

## - Motivation

Several vital signs are age-dependent in the paediatric population, including heart rate (HR), respiratory rate (RR), and blood pressure (systolic, mean, and diastolic). We therefore converted the vital signs of the transported children to Z-scores so that values can be compared across ages. Reference values for HR and RR were taken from the centile charts reported in [1]. For systolic and diastolic blood pressure, we used the National High Blood Pressure Education Program's blood pressure tables, which are organised by age, gender, and height [2]. For mean blood pressure (MBP), we used the reference values for normal children reported in [3]. Blood pressure Z-scores were calculated from the reference values for children of median height, separately for boys and girls (the detailed Z-score calculation procedure can be found in the Supplementary Materials).

To the best of our knowledge, there are very few evidence-based blood pressure guidelines for neonates and infants aged 0 to 1 year. Blood pressure can vary considerably in preterm infants, depending on a variety of factors. To provide clinicians with a rough reference range, we empirically adjusted the blood pressure reference values for children aged 0 to 1 year in consultation with paediatric transport intensivists and clinicians. Following Z-score normalisation, the vital sign distributions were examined at the population level by disease group and outcome group throughout the transport.
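
For a measurement $x$ from a child in age group $a$, the normalisation can be written as follows, where $\mu_a$ and $\sigma_a$ denote the reference mean and standard deviation for that age group (the symbols are introduced here for illustration and do not come from the cited references):

$$
Z = \frac{x - \mu_a}{\sigma_a}
$$

A value of $Z = 0$ corresponds to the age-specific reference mean, and $Z = 1$ to one reference standard deviation above it, regardless of the child's age.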

## - References

[1] S. Fleming, M. Thompson, R. Stevens, C. Heneghan, A. Plüddemann, I. Maconochie, L. Tarassenko, and D. Mant, “Normal ranges of heart rate and respiratory rate in children from birth to 18 years of age: a systematic review of observational studies,” The Lancet, vol. 377, no. 9770, pp. 1011–1018, 2011.

[2] National High Blood Pressure Education Program, The fourth report on the diagnosis, evaluation, and treatment of high blood pressure in children and adolescents. US Department of Health and Human Services, National Institutes of Health . . . , 2005, no. 5.

[3] I. U. Haque and A. L. Zaritsky, “Analysis of the evidence for the lower limit of systolic and mean arterial pressure in children,” Pediatric Critical Care Medicine, vol. 8, no. 2, pp. 138–144, 2007.


In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt # plot figures
import seaborn as sns # plot figures
import plotly.express as px
from datetime import datetime
from collections import Counter
from matplotlib.ticker import MaxNLocator
import matplotlib.gridspec as gridspec
from statistics import mean, median, stdev, variance
import glob
import os
import math
import random
import time
import warnings
warnings.filterwarnings('ignore') 

In [2]:
# local file path
path=r'D:\\00_UCL_Z-score_Data_Files'

In [3]:
# read all csv files from path folder
numerical_CSV_files = glob.glob(os.path.join(path, "*.csv"))

In [4]:
# print total number of files
len(numerical_CSV_files)

1781

In [5]:
file_path = "D:\\Jupyter-workspace\\CATS_UCL_Description_Work\\Data_Prepration_and_Extraction\\"

In [6]:
df_EHR = pd.read_csv(file_path+"df_monitor_patients.csv")
print(len(df_EHR))

1767


In [7]:
file_path = "D:\\Jupyter-workspace\\CATS_UCL_Description_Work\\Vital Sign Data and File Check\\"

In [8]:
df_file_check = pd.read_csv(file_path+"Numerical_Vital_Sign_Files_Summary.csv")
print(len(df_file_check))

1781


In [9]:
df = df_EHR.merge(df_file_check, on='cats_id')
df = df.reset_index(drop=True)
print("size of: ",len(df))

size of:  1711
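
The merge above is an inner join, so only rows whose `cats_id` appears in both tables are kept (1711 rows here, from 1767 and 1781). A minimal sketch of how the unmatched IDs could be inspected, using small made-up frames; `left`, `right` and their values are hypothetical stand-ins for `df_EHR` and `df_file_check`:

```python
import pandas as pd

# Hypothetical stand-ins for df_EHR and df_file_check
left = pd.DataFrame({"cats_id": [1, 2, 3], "age_mon": [2, 30, 100]})
right = pd.DataFrame({"cats_id": [2, 3, 4],
                      "csv_files": ["2_a.csv", "3_a.csv", "4_a.csv"]})

# An outer join with indicator=True adds a "_merge" column recording whether
# each row came from the left table only, the right table only, or both
check = left.merge(right, on="cats_id", how="outer", indicator=True)
counts = check["_merge"].value_counts()
unmatched_ids = check.loc[check["_merge"] != "both", "cats_id"].tolist()

print(counts)
print(unmatched_ids)
```

Rows flagged `left_only` would be patients without a vital-sign file, and rows flagged `right_only` would be files without a patient record.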


In [10]:
df.columns

Index(['Unnamed: 0_x', 'cats_id', 'gender', 'weight_kg', 'age_mon',
       'referral_outcome_datetime', 'referring_unit', 'primary_diagnosis',
       'diagnosis_group', 'preexisting_conditions', 'ventilation_status',
       'vasoactive_agent_used', 'vasoactive_agent_name', 'inhaled_no',
       'arrive_collunit', 'depart_destunit', 'cats_interv', 'local_interv',
       'pim3', 'transport_team_in_attendance', 'referral_datetime',
       'referral_outcome', 'critical_incident', 'incident_type', 'csv_files',
       'min_session_time', 'max_session_time', 'start_datetime',
       'end_datetime', 'length_of_stay', 'length_of_icu_stay', 'deceased_flag',
       'sex', 'ethnicity_nat_code', 'ethnicity_name', 'age_at_admission',
       'age_at_death', 'Destination Unit', 'Destination Care Area',
       'Depart Collection Unit', 'Arrive Destination Unit', 'dest_is_GOSH',
       'CI_Patient_Equipment_Related', 'CI_Patient_Related',
       'CI_Ambulance_Related', 'CI_Equipment_Related',
       'CI_

## Define the z-score mean and STD

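- Centile: Z-score (one-sided, standard normal)
- 90th: 1.282
- 95th: 1.645
- 97.5th: 1.960


In [12]:
# One-sided standard-normal quantiles: 90th centile = 1.2816, 95th = 1.645, 97.5th = 1.960.
# The upper reference values below are used as 90th-centile values, so the 90th-centile
# quantile applies (the 10th-90th centile range covers the central 80% of values).
z_90th_centile = 1.2816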
# Define mean HR value for each age group
m_0_3m = 143
m_3_6m = 140
m_6_9m = 134
m_9_12m = 128
m_12_18m = 123
m_18_24m = 116
m_2_3y = 110
m_3_4y = 104
m_4_6y = 98
m_6_8y = 91
m_8_12y = 84
m_12_15y = 78
m_15_18y = 73


# Define STD for each age group: (upper reference value - mean) / z
STD_0_3m = (164 - m_0_3m) / z_90th_centile
STD_3_6m = (159 - m_3_6m) / z_90th_centile
STD_6_9m = (152 - m_6_9m) / z_90th_centile
STD_9_12m = (145 - m_9_12m) / z_90th_centile
STD_12_18m = (140 - m_12_18m) / z_90th_centile
STD_18_24m = (135 - m_18_24m) / z_90th_centile
STD_2_3y = (128 - m_2_3y) / z_90th_centile
STD_3_4y = (123 - m_3_4y) / z_90th_centile
STD_4_6y = (117 - m_4_6y) / z_90th_centile
STD_6_8y = (111 - m_6_8y) / z_90th_centile
STD_8_12y = (103 - m_8_12y) / z_90th_centile
STD_12_15y = (96 - m_12_15y) / z_90th_centile
STD_15_18y = (92 - m_15_18y) / z_90th_centile
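
The STD values rest on the assumption that the reference values are roughly normally distributed, so that an upper centile lies a fixed number of standard deviations above the mean. The link between a centile and its z value can be checked with the standard library. This is a minimal sketch: `std_from_centile` is a hypothetical helper, and treating 164 as a 90th-centile value is an assumption based on the variable name `z_90th_centile`.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# z value below which the given fraction of a standard normal distribution lies
z_90 = std_normal.inv_cdf(0.90)
z_95 = std_normal.inv_cdf(0.95)
z_975 = std_normal.inv_cdf(0.975)
print(round(z_90, 3), round(z_95, 3), round(z_975, 3))


def std_from_centile(centile_value, mean_value, centile):
    """Estimate the STD from one upper centile, assuming normally distributed values."""
    return (centile_value - mean_value) / std_normal.inv_cdf(centile)


# Example with the 0-3 month values (mean 143, upper reference value 164)
std_0_3m_example = std_from_centile(164, 143, 0.90)
print(round(std_0_3m_example, 2))
```

Computing the quantile from the centile, rather than typing it in, keeps the centile and the z value from drifting apart if a different reference centile is used later.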



## Split patients into age groups
## Assign mean and STD values to each age group
- 0-3, 3-6, 6-9, 9-12, 12-18 months
- 18-24 months, 2-3 years, 3-4 years, 4-6 years, 6-8 years, 8-12 years, 12-15 years, 15-18 years

In [13]:
for i in range(0,len(df)):
    age = df.iloc[i]['age_mon']
    if 0 <= age and age <= 3:
        df.at[i,'Z_age_group'] = "0-3m"
        df.at[i,'Z_mean_value'] = m_0_3m
        df.at[i,'Z_STD'] = STD_0_3m 
    elif 3 < age and age <= 6:
        df.at[i,'Z_age_group'] = "3-6m"
        df.at[i,'Z_mean_value'] = m_3_6m
        df.at[i,'Z_STD'] = STD_3_6m 
    elif 6 < age and age <= 9:
        df.at[i,'Z_age_group'] = "6-9m"
        df.at[i,'Z_mean_value'] = m_6_9m    
        df.at[i,'Z_STD'] = STD_6_9m 
    elif 9 < age and age <= 12:
        df.at[i,'Z_age_group'] = "9-12m"
        df.at[i,'Z_mean_value'] = m_9_12m
        df.at[i,'Z_STD'] = STD_9_12m 
    elif 12 < age and age <= 18:
        df.at[i,'Z_age_group'] = "12-18m"
        df.at[i,'Z_mean_value'] = m_12_18m  
        df.at[i,'Z_STD'] = STD_12_18m
    elif 18 < age and age <= 24:
        df.at[i,'Z_age_group'] = "18-24m"
        df.at[i,'Z_mean_value'] = m_18_24m   
        df.at[i,'Z_STD'] = STD_18_24m 
    elif 2*12 < age and age <= 3*12:
        df.at[i,'Z_age_group'] = "2-3y"
        df.at[i,'Z_mean_value'] = m_2_3y  
        df.at[i,'Z_STD'] = STD_2_3y
    elif 3*12 < age and age <= 4*12:
        df.at[i,'Z_age_group'] = "3-4y"
        df.at[i,'Z_mean_value'] = m_3_4y
        df.at[i,'Z_STD'] = STD_3_4y 
    elif 4*12 < age and age <= 6*12:
        df.at[i,'Z_age_group'] = "4-6y"
        df.at[i,'Z_mean_value'] = m_4_6y
        df.at[i,'Z_STD'] = STD_4_6y 
    elif 6*12 < age and age <= 8*12:
        df.at[i,'Z_age_group'] = "6-8y"
        df.at[i,'Z_mean_value'] = m_6_8y   
        df.at[i,'Z_STD'] = STD_6_8y    
    elif 8*12 < age and age <= 12*12:
        df.at[i,'Z_age_group'] = "8-12y"
        df.at[i,'Z_mean_value'] = m_8_12y
        df.at[i,'Z_STD'] = STD_8_12y
    elif 12*12 < age and age <= 15*12:
        df.at[i,'Z_age_group'] = "12-15y"
        df.at[i,'Z_mean_value'] = m_12_15y 
        df.at[i,'Z_STD'] = STD_12_15y
    elif 15*12 < age and age <= 18*12:
        df.at[i,'Z_age_group'] = "15-18y"
        df.at[i,'Z_mean_value'] = m_15_18y   
        df.at[i,'Z_STD'] = STD_15_18y   
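
The age-band assignment can also be expressed without an explicit loop. A minimal sketch with `pandas.cut`, using the same band edges as the loop above; the `toy` frame and its ages are made up for illustration:

```python
import pandas as pd

# Band edges in months; each band is (lower, upper], except the first, which includes 0
age_bins = [0, 3, 6, 9, 12, 18, 24, 36, 48, 72, 96, 144, 180, 216]
age_labels = ["0-3m", "3-6m", "6-9m", "9-12m", "12-18m", "18-24m", "2-3y",
              "3-4y", "4-6y", "6-8y", "8-12y", "12-15y", "15-18y"]

# Hypothetical ages in months, including one value outside the 0-216 month range
toy = pd.DataFrame({"age_mon": [0, 3, 3.5, 24, 25, 216, 300]})

# Assign each age to its band; ages outside every band become missing values
toy["Z_age_group"] = pd.cut(toy["age_mon"], bins=age_bins, labels=age_labels,
                            include_lowest=True).astype(object)
groups = toy["Z_age_group"].tolist()
print(groups)
```

`include_lowest=True` keeps an age of 0 in the first band, and ages outside 0 to 216 months are left as missing, which mirrors the loop. The mean and STD columns could then be attached by mapping the labels through a dictionary with `Series.map`.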

## Iterate over each file and, if the value is not 0, calculate the Z-score

In [None]:
# Convert the HR column (necgRate) of every numerical vital-sign file to Z-scores,
# using the mean and STD assigned to each patient's age group above.
# More columns can be added later.
monitor_CATS_ID_List = df.cats_id.to_list()
# Note: this points to the same folder as `path` above, so each input file is overwritten
writePath = 'D:\\00_UCL_Z-score_Data_Files'
for i, my_csv_file in enumerate(numerical_CSV_files):
    print('The file NO. is: ', i)
    print('File path and name:', my_csv_file)
    # File names begin with the CATS ID followed by an underscore
    file_name = os.path.basename(my_csv_file)
    cats_id = int(file_name.split('_')[0])
    print('cats_id: ', cats_id)
    # Skip files that have no matching patient record in df
    if cats_id not in monitor_CATS_ID_List:
        continue
    patient = df[df['cats_id'] == cats_id]
    # Only process files that contain HR values
    if patient.c_ecgRate.values[0] != 0:
        read_df = pd.read_csv(my_csv_file)
        read_df = read_df.reset_index(drop=True)
        Z_mean = patient.Z_mean_value.values[0]
        Z_STD = patient.Z_STD.values[0]
        # Z-score for the whole column at once: (value - mean) / STD
        read_df['necgRate'] = (read_df['necgRate'] - Z_mean) / Z_STD
        filePath = os.path.join(writePath, file_name)
        print('Writing Destination Path:', filePath)
        read_df.to_csv(filePath)
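
The heading for this step mentions skipping values equal to 0, whereas the loop checks the per-file summary value `c_ecgRate`. If zeros inside a file were meant to mark missing readings (an assumption, not something the source states), a column-wise version could leave them out of the transformation. A minimal sketch; `normalise_hr` and the toy values are hypothetical:

```python
import numpy as np
import pandas as pd


def normalise_hr(values, mean_value, std_value):
    """Return Z-scores for an HR column, treating 0 as a missing reading (assumption)."""
    values = pd.Series(values, dtype="float64")
    valid = values.notna() & (values != 0)
    # Z = (value - mean) / STD for valid readings; everything else becomes NaN
    return ((values - mean_value) / std_value).where(valid)


# Toy readings: one at the mean, one a full STD above it, one zero, one missing
toy_hr = [143, 164, 0, np.nan]
z = normalise_hr(toy_hr, mean_value=143, std_value=21.0)
print(z.tolist())
```

Leaving invalid readings as NaN, rather than converting them, avoids a zero being turned into a large negative Z-score that later analysis could mistake for a real measurement.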