![alt text](./Cerny_logo_1.jpg)

# Analysis of Cerny ventilation recordings

#### Processing blood gases

This notebook imports and processes blood gas data and exports it into a pickle archive.

The data processed and analysed in this Notebook were collected by the **Neonatal Emergency and Transport Service of the Peter Cerny Foundation**, Budapest, Hungary.

**Author: Dr Gusztav Belteki**


- Total number of recordings: **1251**
- Ventilation recordings longer than 15 minutes: **1035 cases**
- Clinical data available in **1053 cases**
- Only keep clinical data for cases where ventilation recordings (>15 minutes) are also available: **987 cases**
- Blood gases available in **927 cases**

### 1. Import the required libraries and set options

In [1]:
import IPython
import pandas as pd
import numpy as np
import scipy as sp
import matplotlib
import matplotlib.pyplot as plt

import os
import sys
import re
import pickle

from scipy import stats
from pandas import Series, DataFrame
from datetime import datetime, timedelta

%matplotlib inline
matplotlib.style.use('classic')
matplotlib.rcParams['figure.facecolor'] = 'w'

pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', 100)
pd.set_option('mode.chained_assignment', None) 

# Turn off the warning message that is given when read_excel() imports '.xlsx' files
import warnings
warnings.simplefilter("ignore")

In [2]:
print("Python version: {}".format(sys.version))
print("pandas version: {}".format(pd.__version__))
print("matplotlib version: {}".format(matplotlib.__version__))
print("NumPy version: {}".format(np.__version__))
print("SciPy version: {}".format(sp.__version__))
print("IPython version: {}".format(IPython.__version__))

Python version: 3.11.7 (main, Dec 15 2023, 12:09:04) [Clang 14.0.6 ]
pandas version: 2.1.4
matplotlib version: 3.8.0
NumPy version: 1.26.3
SciPy version: 1.11.4
IPython version: 8.20.0


### 2. List and set the working directory and the directory to write out data

In [3]:
# Name of the external hard drive
DRIVE = 'GUSZTI'

# Directory on external drive to read the clinical data from
DIR_READ = os.path.join(os.sep, 'Volumes', DRIVE, 'Fabian_new', 'fabian_patient_data_all_new')

# Path to project folder containing ventilation research results
PATH = os.path.join(os.sep, 'Users', 'guszti', 'Library', 'Mobile Documents', 'com~apple~CloudDocs', 
                            'Documents', 'Research', 'Ventilation')

# Folder to export the result of analysis
DIR_WRITE = os.path.join(PATH, 'ventilation_fabian_new', 'Analyses')
os.makedirs(DIR_WRITE, exist_ok = True)

# Folder on a USB stick to export data to and to import processed data exported by other Notebooks
DATA_DUMP = os.path.join(os.sep, 'Volumes', DRIVE, 'data_dump', 'fabian_new')
os.makedirs(DATA_DUMP, exist_ok = True)

In [4]:
DIR_READ, DIR_WRITE, DATA_DUMP

('/Volumes/GUSZTI/Fabian_new/fabian_patient_data_all_new',
 '/Users/guszti/Library/Mobile Documents/com~apple~CloudDocs/Documents/Research/Ventilation/ventilation_fabian_new/Analyses',
 '/Volumes/GUSZTI/data_dump/fabian_new')

### 3. Import clinical DataFrame from pickle archive

In [5]:
with open(os.path.join(DATA_DUMP, 'clin_df_new.pickle'), 'rb') as handle:
    clin_df = pickle.load(handle)

In [6]:
cases = sorted(clin_df.index)

In [7]:
len(cases)

1079

### 4. Import all clinical data containing blood gases

In [8]:
# import text files in a dictionary
clin_dict = {}
for fname in os.listdir(DIR_READ):
    if not fname.startswith('.'): # disregard hidden files
        #print(fname)
        # use the filenames without the .txt extension as keys
        with open(os.path.join(DIR_READ, fname), 'r', encoding='cp1252', errors='ignore') as fhandle:
            clin_dict[fname[:-4]] = fhandle.read()

In [9]:
len(clin_dict)

1199

In [10]:
clin_dict = {key: value for key, value in clin_dict.items() if key in cases }

In [11]:
len(clin_dict)

1079

In [12]:
gas_dict = {}
# Remove clinical details preceding the blood gases
for key, value in clin_dict.items():
    # For recordings starting with AT001263, 'Astrup' in the text file was changed to 'Labor'
    if 'Astrup' in value:
        gas_dict[key] = value[value.index('Astrup'):]
    elif 'Labor' in value:
        gas_dict[key] = value[value.index('Labor'):]
    else:
        print(key, 'has no blood gas')

AT001299 has no blood gas
AT001281 has no blood gas
AT001372 has no blood gas


In [13]:
len(gas_dict)

1076
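
The trimming step above can be tried on a small made-up record. This is only a minimal sketch: the helper name `trim_to_marker` and the sample text are hypothetical and are not part of the original notebook. As in the cell above, 'Astrup' is checked before 'Labor'.

```python
def trim_to_marker(text, markers=('Astrup', 'Labor')):
    """Return the text from the first marker found (checked in order), or None."""
    for marker in markers:
        if marker in text:
            return text[text.index(marker):]
    return None

# Made-up record: clinical details followed by one blood gas
sample = 'Name: X\nWeight: 3200 g\nAstrup\nTime: 0702\npH: 7.331\n'
trimmed = trim_to_marker(sample)
print(trimmed)
```

Returning `None` rather than printing makes it possible to collect the cases without a blood gas in a list and review them later.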

In [31]:
gas_dict_2 = {}

for key, value in gas_dict.items():
    gas_dict_2[key] = {}
    # Recordings before and after AT001263 have different formats and they need to be processed differently 
    if int(key[2:].lstrip('0')) < 1263:
        print(key)
        for i, gas in enumerate(value.split('Astrup')[1:]):
            gas_dict_2[key][i] = {}
            items = gas.split('\n')[1:-1]
            for item in items:
                name, value = item.split(':')
                if value.strip() == '':
                    break
                else:
                    gas_dict_2[key][i][name.strip()] = value.strip()
    else:
        print(key)
        for i, gas in enumerate(value.split('Labor')[1:]):
            gas_dict_2[key][i] = {}
            items = gas.split('\n')[1:-1]
            for item in items:
                # maxsplit=1 keeps the colon inside timestamps such as '2024-02-17 14:32'
                name, value = item.split(':', maxsplit=1)
                if value.strip() == '':
                    break
                else:
                    gas_dict_2[key][i][name.strip()] = value.strip()


In [39]:
gas_dict_2 = {}

for key, value in gas_dict.items():
    gas_dict_2[key] = {}
    # Recordings before and after AT001263 have different formats and they need to be processed differently 
    if int(key[2:].lstrip('0')) < 1263:
        #print(key)
        for i, gas in enumerate(value.split('Astrup')[1:]):
            gas_dict_2[key][i] = {}
            items = gas.split('\n')[1:-1]
            for item in items:
                name, value = item.split(':')
                if value.strip() == '':
                    break
                else:
                    gas_dict_2[key][i][name.strip()] = value.strip()
    else:
        #print(key)
        for i, gas in enumerate(value.split('Labor')[1:]):
            gas_dict_2[key][i] = {}
            items = gas.split('\n')[1:-1]
            for item in items:
                name, value = item.split(':', maxsplit=1)
                if value.strip() == '':
                    break
                else:
                    gas_dict_2[key][i][name.strip()] = value.strip()
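
The reason for `maxsplit=1` in the second branch can be seen on a single made-up block. The sketch below assumes the same 'name: value' layout as the text files; the function name `parse_gas_block` and the sample block are hypothetical.

```python
def parse_gas_block(block):
    """Parse 'name: value' lines into a dict, stopping at the first empty value."""
    result = {}
    # Drop the first and last fragments, as in the cell above
    for line in block.split('\n')[1:-1]:
        # maxsplit=1 splits at the first colon only, so '14:32' stays in the value
        name, value = line.split(':', maxsplit=1)
        if value.strip() == '':
            break
        result[name.strip()] = value.strip()
    return result

block = '\nTime: 2024-02-17 14:32\npH: 7.037\npCO2: 85.3 Hgmm\n'
parsed = parse_gas_block(block)
print(parsed)
```

Without `maxsplit`, the 'Time' line splits into three parts and the two-name unpacking raises a `ValueError`.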

In [40]:
gas_dict_2['AT001251']

{0: {'Time': '0702',
  'pH': '7.331',
  'pCO2': '46.4',
  'pO2': '64.0',
  'HCO3': '24.0',
  'ABE': '-2.3',
  'Saturatio': '90.5',
  'FiO2': '0.30',
  'Type': 'Capillaris'},
 1: {},
 2: {}}

In [41]:
gas_dict_2['AT001263']

{0: {'Time': '2024-02-17 14:32',
  'Sample site': 'KapillÃ¡ris',
  'Hypothermia': 'nem',
  'VÃ©rcukor': '7.8 mmol/l',
  'pH': '7.037',
  'pCO2': '85.3 Hgmm',
  'pO2': '47.9 Hgmm',
  'HCO3': '22.4 mmol/l',
  'BE(ecf)': '-10.24 mmol/l',
  'Lactat': '1.4 mmol/l'},
 1: {'Time': '2024-02-17 15:13',
  'Sample site': 'KapillÃ¡ris',
  'Hypothermia': 'nem',
  'VÃ©rcukor': '8.2 mmol/l',
  'pH': '7.117',
  'pCO2': '77.2 Hgmm',
  'pO2': '33.2 Hgmm',
  'HCO3': '24.9 mmol/l',
  'BE(ecf)': '-4.5 mmol/l',
  'cSO2': '43.5 %',
  'Lactat': '3.8 mmol/l'},
 2: {'Time': '2024-02-17 16:12',
  'Sample site': 'KapillÃ¡ris',
  'Hypothermia': 'nem',
  'VÃ©rcukor': '8.9 mmol/l',
  'pH': '7.065',
  'pCO2': '74.3 Hgmm',
  'pO2': '41.5 Hgmm',
  'HCO3': '16 mmol/l',
  'BE(ecf)': '-9 mmol/l',
  'cSO2': '84.4 %',
  'Lactat': '3.9 mmol/l'}}
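
The garbled accents in the output above ('KapillÃ¡ris', 'VÃ©rcukor') are the typical result of UTF-8 text being decoded as cp1252. Assuming that this is what happened to the newer files (an inference from the output, not something the notebook states), the strings can often be repaired after the fact. The helper name `repair_mojibake` is hypothetical.

```python
def repair_mojibake(text):
    """Undo UTF-8 text that was decoded as cp1252; return the input unchanged on failure."""
    try:
        return text.encode('cp1252').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not mis-decoded UTF-8 (or cannot be recovered): leave it alone
        return text

print(repair_mojibake('KapillÃ¡ris'))
print(repair_mojibake('VÃ©rcukor'))
print(repair_mojibake('pH'))
```

Because the files are read with `errors='ignore'`, bytes that are undefined in cp1252 are dropped at import, so some characters may not be recoverable this way. Reading the newer files as UTF-8 in the first place would avoid the problem, if the assumption about their encoding holds.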

In [42]:
for case in gas_dict_2:
    for gas in sorted(gas_dict_2[case].keys()):
        if gas_dict_2[case][gas] == {}:
            del gas_dict_2[case][gas]

In [43]:
gas_frames = {}
for case in gas_dict_2.keys():
    gas_frames[case] = DataFrame(gas_dict_2[case])

In [44]:
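def time_changer(rec):
    # Old-format gases only store the time of day, so take the date from the recording start
    rec_date = clin_df.loc[rec, 'Recording start'].date()
    frame = gas_frames[rec]
    for column in frame:
        # This str() is needed here because AL000665 (and only that) is interpreted as Datetime
        raw = str(frame.loc['Time', column])
        try:
            # Recordings before AT001263: time of day as 'HHMM', e.g. '0702'
            timestamp = datetime.combine(rec_date, datetime.strptime(raw, '%H%M').time())
        except ValueError:
            # Recordings from AT001263 onwards: full timestamp, e.g. '2024-02-17 14:32'
            timestamp = datetime.strptime(raw, '%Y-%m-%d %H:%M')
        frame.loc['Time', column] = timestamp

In [45]:
for case in sorted(gas_frames.keys()):
    # Recordings from AT001263 onwards store a full timestamp instead of 'HHMM'; time_changer() handles both
    time_changer(case)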

In [19]:
gas_frames['AT000935']

| | 0 | 1 | 2 |
|---|---|---|---|
| Time | 2023-01-09 02:35:00 | 2023-01-09 03:14:00 | 2023-01-09 05:44:00 |
| pH | 6.867 | 7.023 | 7.287 |
| pCO2 | 98.2 | 82.8 | 60.8 |
| pO2 | 37.6 | 45.1 | 36.3 |
| HCO3 | 17.8 | 21.5 | 28.4 |
| ABE | -15.6 | -9.4 | 0.4 |
| Saturatio | 35.5 | 57.5 | 83.1 |
| FiO2 | 0.60 | 0.90 | 0.38 |
| Type | Capillaris | Capillaris | Capillaris |


In [21]:
gas_frames['AT001285']

In [None]:
for case in sorted(gas_frames.keys()):
    try:
        gas_frames[case] = gas_frames[case].T.set_index('Time')
    except KeyError:
        # An empty frame has no 'Time' column after transposing
        print('No blood gas for %s' % case)
        del gas_frames[case]

In [None]:
len(gas_frames)

### 5. Quality control of blood gases

#### Combine all gases to a single DataFrame

In [None]:
gas_frames_all = pd.concat(gas_frames)

In [None]:
gas_frames_all

In [None]:
for column in ['pH', 'pCO2', 'pO2', 'HCO3', 'ABE', 'Saturatio', 'FiO2']:
    gas_frames_all[column] = gas_frames_all[column].astype('float')

In [None]:
gas_frames_all.columns

In [None]:
gas_frames_all.info()

In [None]:
gas_frames_all[['pH', 'pCO2', 'pO2', 'HCO3', 'ABE']].describe()
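
The newer text files store units together with the values (for example '85.3 Hgmm' or '22.4 mmol/l' in the output of AT001263 above), and `astype('float')` cannot convert such strings. One possible way around this is sketched below with made-up data; the helper name `to_number` is hypothetical. Whether the newer 'BE(ecf)' values can be pooled with 'ABE' is a clinical decision that this sketch does not make.

```python
import pandas as pd

def to_number(series):
    """Extract the leading number from strings such as '85.3 Hgmm'; anything else becomes NaN."""
    extracted = series.astype(str).str.extract(r'^\s*(-?\d+(?:\.\d+)?)', expand=False)
    return pd.to_numeric(extracted, errors='coerce')

# Made-up values mixing the old format (no units) and the new format (with units)
demo = pd.DataFrame({'pH': ['7.331', '7.037'],
                     'pCO2': ['46.4', '85.3 Hgmm'],
                     'HCO3': ['24.0', None]})
numeric = demo.apply(to_number)
print(numeric.dtypes)
```

Missing entries come out as NaN rather than raising an error, which keeps the later `describe()` call usable.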

#### Manually review outliers with impossible values in gases and correct as appropriate

Only correct trivial ones. 

- For example, "7" is sometimes mis-recognised as "2" by OCR. 
- Other times, the decimal point is clearly in the wrong place.

Only correct a value when the other values in the same blood gas are consistent with the change you are making. Otherwise, remove the clearly impossible values:

- pH < 6 or pH > 7.7
- pCO2 < 10 mmHg or > 200 mmHg
- ABE < -40 or > 50
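
The limits listed above can also be applied in one pass to flag the rows that need manual review. This is a minimal sketch on made-up data: the names `LIMITS` and `flag_impossible` are hypothetical, and the demo rows imitate a "7" misread as "2" (pH 2.331) and a misplaced decimal point (pCO2 464.0).

```python
import pandas as pd

# Plausibility limits taken from the list above: (lower, upper)
LIMITS = {'pH': (6, 7.7), 'pCO2': (10, 200), 'ABE': (-40, 50)}

def flag_impossible(df, limits=LIMITS):
    """Return a boolean DataFrame that is True where a value lies outside its limits."""
    flags = pd.DataFrame(False, index=df.index, columns=list(limits))
    for column, (low, high) in limits.items():
        flags[column] = (df[column] < low) | (df[column] > high)
    return flags

demo = pd.DataFrame({'pH': [7.331, 2.331, 7.1],
                     'pCO2': [46.4, 46.4, 464.0],
                     'ABE': [-2.3, -2.3, -5.0]})
flags = flag_impossible(demo)
# Rows with at least one impossible value
print(demo[flags.any(axis=1)])
```

The flags only point to the rows to inspect; the decision whether to correct or remove a value remains a manual one, as described above.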



In [None]:
gas_frames_all[gas_frames_all['pH'] < 6].sort_values('pH', ascending=True)

In [None]:
gas_frames_all[gas_frames_all['pH'] > 7.7].sort_values('pH', ascending=False)

In [None]:
gas_frames_all[gas_frames_all['pCO2'] > 200].sort_values('pCO2', ascending=False)

In [None]:
gas_frames_all[gas_frames_all['pCO2'] < 10].sort_values('pCO2', ascending=True)

In [None]:
gas_frames_all[gas_frames_all['ABE'] < -40].sort_values('ABE', ascending=True)

In [None]:
gas_frames_all[gas_frames_all['ABE'] > 50].sort_values('ABE', ascending=False)

### 6. Export blood gases as Excel files

In [None]:
# Save blood gases into a multi-sheet Excel file

# ExcelWriter.save() was removed in pandas 2.0; the context manager saves and closes the file
with pd.ExcelWriter(os.path.join(DIR_WRITE, 'blood_gases_new.xlsx')) as writer:
    for case in sorted(gas_frames.keys()):
        gas_frames[case].to_excel(writer, sheet_name=case)

### 7. Export processed data as pickle files

In [None]:
with open(os.path.join(DATA_DUMP, 'blood_gases_new.pickle'), 'wb') as handle:
    pickle.dump(gas_frames, handle, protocol=pickle.HIGHEST_PROTOCOL)
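
Other Notebooks can read the archive back with `pickle.load()`. The sketch below shows the round trip on a made-up dictionary written to a temporary folder, so it does not touch `DATA_DUMP`; the case name and file name are hypothetical.

```python
import os
import pickle
import tempfile

import pandas as pd

# Made-up stand-in for gas_frames: one DataFrame per case
frames = {'CASE_A': pd.DataFrame({'pH': [7.331], 'pCO2': [46.4]})}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'blood_gases_demo.pickle')
    with open(path, 'wb') as handle:
        pickle.dump(frames, handle, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, 'rb') as handle:
        restored = pickle.load(handle)

# The restored DataFrame should match the original
print(restored['CASE_A'].equals(frames['CASE_A']))
```

Pickle files should only be loaded from trusted sources, and archives written with `HIGHEST_PROTOCOL` may not be readable by older Python versions.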