**Baseline Characteristics**

We summarise the baseline characteristics of our patient cohort for each of the feature sets.

For all dynamic features and for the numerical static features we show the mean.

For certain features we show the mode instead: Ventilator Mode and Charlson score.

We ignore the calculated ratios (SpO2:FiO2 and P/F ratio) because their components are already shown.

Each statistic is compared between extubation success patients and extubation failure patients.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

**Static features**

Let's start with the static features.

In [76]:
static_features_train = '/content/drive/MyDrive/MSc_Final_Project/02_data_analysis/mimic/data_analysis/datasets/08_model_input_data/01_feature_set_1/01_lstm_data/static_data/raw_data/train_static_v3.parquet'
static_features_test = '/content/drive/MyDrive/MSc_Final_Project/02_data_analysis/mimic/data_analysis/datasets/08_model_input_data/01_feature_set_1/01_lstm_data/static_data/raw_data/test_static_v3.parquet'

static_features_train = pd.read_parquet(static_features_train)
static_features_test = pd.read_parquet(static_features_test)

In [77]:
# Combine train and test static data
static_features = pd.concat([static_features_train, static_features_test], axis=0)

static_features.head()



Unnamed: 0,subject_id,age,gender,ethnicity,weight,height,BMI
0,10001884,68,F,BLACK/AFRICAN AMERICAN,65.0,157.0,26.370238
22,10002428,80,F,WHITE,43.0,150.0,19.111111
29,10004235,47,M,BLACK/CAPE VERDEAN,127.0,183.0,37.9229
32,10010867,28,F,WHITE - BRAZILIAN,120.0,170.0,41.522491
35,10011365,73,F,WHITE,46.3,157.0,18.783723


We need to add the labels to be able to segment the patients by outcome.

In [78]:
labels = '/content/drive/MyDrive/MSc_Final_Project/02_data_analysis/mimic/data_analysis/datasets/03_annotated_set/annotation_v03.parquet'

labels = pd.read_parquet(labels)

labels.head()

Unnamed: 0,subject_id,hadm_id,stay_id,ventilation_starttime,ventilation_endtime,ventilation_itemid,ventilation_ordercategoryname,extubation_starttime,extubation_endtime,extubation_itemid,extubation_ordercategoryname,ventilation_duration,anchor_age,extubation_failure
0,10001884,26184834,37510196,2131-01-11 04:40:00,2131-01-12 17:40:00,225792,Ventilation,2131-01-12 17:40:00,2131-01-12 17:41:00,227194,Intubation/Extubation,2220.0,68,1
22,10002428,28662225,38875437,2156-04-19 20:10:00,2156-04-22 17:05:00,225792,Ventilation,2156-04-22 17:10:00,2156-04-22 17:11:00,227194,Intubation/Extubation,4135.0,80,0
29,10004235,24181354,34100191,2196-02-24 16:52:00,2196-02-27 16:28:00,225792,Ventilation,2196-02-27 16:28:00,2196-02-27 16:29:00,227194,Intubation/Extubation,4296.0,47,1
32,10004720,22081550,35009126,2186-11-12 20:29:00,2186-11-17 14:00:00,225792,Ventilation,2186-11-17 14:00:00,2186-11-17 14:01:00,227194,Intubation/Extubation,6811.0,61,1
33,10004733,27411876,39635619,2174-12-04 12:25:00,2174-12-07 16:20:00,225792,Ventilation,2174-12-07 16:20:00,2174-12-07 16:21:00,227194,Intubation/Extubation,4555.0,51,0


In [79]:
# Attach labels to the static features dataframe while keeping original columns
static_features = pd.merge(static_features, labels[['subject_id', 'extubation_failure']], on='subject_id', how='left')

static_features.head()




Unnamed: 0,subject_id,age,gender,ethnicity,weight,height,BMI,extubation_failure
0,10001884,68,F,BLACK/AFRICAN AMERICAN,65.0,157.0,26.370238,1
1,10002428,80,F,WHITE,43.0,150.0,19.111111,0
2,10004235,47,M,BLACK/CAPE VERDEAN,127.0,183.0,37.9229,1
3,10010867,28,F,WHITE - BRAZILIAN,120.0,170.0,41.522491,0
4,10011365,73,F,WHITE,46.3,157.0,18.783723,1
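The left merge above assumes that `labels` holds exactly one row per `subject_id`. The counts reported later suggest that this holds here, but a patient with several annotated stays would silently duplicate rows. A minimal sketch of how that assumption could be checked, using toy frames (`static_demo` and `labels_demo` are illustrative stand-ins, not the notebook's data) and the `validate` argument of `pd.merge`:

```python
import pandas as pd

# Toy stand-ins for the notebook's frames (values are illustrative only)
static_demo = pd.DataFrame({'subject_id': [1, 2, 3], 'age': [68, 80, 47]})
labels_demo = pd.DataFrame({
    'subject_id': [1, 2, 2, 3],          # subject 2 has two annotated rows
    'extubation_failure': [1, 0, 1, 1],
})

# validate='many_to_one' raises MergeError if the right-hand keys are not unique
try:
    pd.merge(static_demo, labels_demo, on='subject_id', how='left',
             validate='many_to_one')
    duplicates_detected = False
except pd.errors.MergeError:
    duplicates_detected = True

# One possible resolution (an assumption, not the notebook's rule):
# keep the first labelled row per subject before merging
labels_unique = labels_demo.drop_duplicates(subset='subject_id', keep='first')
merged = pd.merge(static_demo, labels_unique, on='subject_id', how='left',
                  validate='many_to_one')
```

Which row to keep when a subject appears more than once is a clinical decision; `keep='first'` is only a placeholder.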


In [80]:
static_features.columns

Index(['subject_id', 'age', 'gender', 'ethnicity', 'weight', 'height', 'BMI',
       'extubation_failure'],
      dtype='object')

Calculate the mean for weight, height and BMI.
Count the number of patients in each age group, gender and ethnicity.
Split each statistic by extubation success and failure.
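As a rough illustration of the plan above, the same kind of summary can be sketched with `groupby` and `pd.crosstab` on a toy frame. The frame `demo` and its values are made up for the example; the notebook itself builds the table with its own helper functions.

```python
import pandas as pd

# Toy data (illustrative values only)
demo = pd.DataFrame({
    'weight': [65.0, 43.0, 127.0, 120.0],
    'gender': ['F', 'F', 'M', 'F'],
    'extubation_failure': [1, 0, 1, 0],
})

# Mean and SD of a continuous variable per outcome group
weight_summary = demo.groupby('extubation_failure')['weight'].agg(['mean', 'std'])

# Counts of a categorical variable per outcome group
gender_counts = pd.crosstab(demo['gender'], demo['extubation_failure'])
```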


In [81]:
static_features_copy = static_features.copy()

In [82]:
# Categorizing age buckets correctly
static_features['age_bucket'] = pd.cut(static_features['age'],
                                       bins=[0, 44, 54, 64, 74, np.inf],
                                       labels=['≤44', '45-54', '55-64', '65-74', '≥75'],
                                       right=True)

# Categorizing ethnicity by the first word and grouping others into 'OTHER'
static_features['ethnicity_bucket'] = static_features['ethnicity'].str.strip().str.split('/|-').str[0].str.upper().replace({
    'BLACK': 'BLACK',
    'WHITE': 'WHITE',
    'ASIAN': 'ASIAN',
    'HISPANIC': 'HISPANIC',
    'HISPANIC OR LATINO': 'HISPANIC'
}).where(static_features['ethnicity'].str.strip().str.split('/|-').str[0].str.upper().isin(['BLACK', 'WHITE', 'ASIAN', 'HISPANIC']), 'OTHER')

# Split data into extubation success and failure groups
success = static_features[static_features['extubation_failure'] == 0]
failure = static_features[static_features['extubation_failure'] == 1]
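One caveat about the ethnicity bucketing above: `.str.strip()` runs before the split, so a value such as `WHITE - BRAZILIAN` appears to yield `'WHITE '` (with a trailing space) and would then fall into `OTHER`. The `.where(...)` condition is also evaluated on the un-replaced values, so `HISPANIC OR LATINO` would be sent to `OTHER` despite the `replace`. A minimal sketch of an alternative that strips after splitting is shown here; `bucket_ethnicity` is a hypothetical helper, and adopting it would change the counts reported in the table.

```python
import pandas as pd

def bucket_ethnicity(raw: pd.Series) -> pd.Series:
    """Map free-text ethnicity to ASIAN/BLACK/HISPANIC/WHITE/OTHER (hypothetical helper)."""
    first = (raw.str.upper()
                .str.split(r'[/-]').str[0]   # text before the first '/' or '-'
                .str.strip())                # strip AFTER splitting
    first = first.replace({'HISPANIC OR LATINO': 'HISPANIC'})
    known = ['ASIAN', 'BLACK', 'HISPANIC', 'WHITE']
    # Anything outside the four known groups (including missing values) becomes OTHER
    return first.where(first.isin(known), 'OTHER')

demo = pd.Series(['WHITE - BRAZILIAN', 'BLACK/CAPE VERDEAN',
                  'HISPANIC OR LATINO', 'UNKNOWN'])
buckets = bucket_ethnicity(demo)
```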

In [84]:
from scipy import stats

In [85]:
# Function to calculate mean, std, and p-value for continuous variables
def continuous_stats(variable, data):
    """Return mean and SD for all patients, for success and for failure, plus the p-value.

    Note: the per-group statistics read the global `success` and `failure` frames,
    so `data` must be the frame those two were split from.
    """
    mean_total = data[variable].mean()
    std_total = data[variable].std()

    mean_success = success[variable].mean()
    std_success = success[variable].std()

    mean_failure = failure[variable].mean()
    std_failure = failure[variable].std()

    # Independent two-sample t-test between the success and failure groups
    t_stat, p_value = stats.ttest_ind(success[variable], failure[variable])

    return [mean_total, std_total, mean_success, std_success, mean_failure, std_failure, p_value]

# Function to count categorical variables
def categorical_stats(variable, data, value):
    """Return how often `value` occurs overall, in the success group and in the failure group."""
    count_total = (data[variable] == value).sum()
    count_success = (success[variable] == value).sum()
    count_failure = (failure[variable] == value).sum()

    return [count_total, count_success, count_failure]
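The BMI row in the resulting table comes out as `inf ± nan` with an empty p-value, which suggests that at least one BMI value is infinite (possibly from a height of zero, although that is an assumption we have not verified). A minimal sketch of how non-finite values could be excluded before summarising; `summarise_finite` is a hypothetical helper and the numbers are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy import stats

def summarise_finite(success_values: pd.Series, failure_values: pd.Series):
    """Mean, SD and t-test p-value after discarding inf/NaN (hypothetical helper)."""
    s = success_values.replace([np.inf, -np.inf], np.nan).dropna()
    f = failure_values.replace([np.inf, -np.inf], np.nan).dropna()
    _, p_value = stats.ttest_ind(s, f)
    return s.mean(), s.std(), f.mean(), f.std(), p_value

# Illustrative values; one infinite BMI in the success group
bmi_success = pd.Series([26.4, 19.1, np.inf, 31.0])
bmi_failure = pd.Series([37.9, 41.5, 18.8])
mean_s, std_s, mean_f, std_f, p = summarise_finite(bmi_success, bmi_failure)
```

Dropping the affected patients changes the denominator for that row, so the number of excluded values should be reported alongside the statistic.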

In [87]:
# Create the baseline characteristics table
features = ['Weight', 'Height', 'BMI', 'Male', 'Female',
            'Age ≤44', 'Age 45-54', 'Age 55-64', 'Age 65-74', 'Age ≥75',
            'Ethnicity (Asian)', 'Ethnicity (Black)', 'Ethnicity (Hispanic)', 'Ethnicity (White)', 'Ethnicity (Other)']


In [88]:
# Compute every statistic once, then slice out the total / success / failure columns
continuous_vars = ['weight', 'height', 'BMI']
categorical_vars = [
    ('gender', 'M'), ('gender', 'F'),
    ('age_bucket', '≤44'), ('age_bucket', '45-54'), ('age_bucket', '55-64'),
    ('age_bucket', '65-74'), ('age_bucket', '≥75'),
    ('ethnicity_bucket', 'ASIAN'), ('ethnicity_bucket', 'BLACK'),
    ('ethnicity_bucket', 'HISPANIC'), ('ethnicity_bucket', 'WHITE'),
    ('ethnicity_bucket', 'OTHER'),
]

# continuous_stats returns
# [mean_total, std_total, mean_success, std_success, mean_failure, std_failure, p_value]
cont = [continuous_stats(var, static_features) for var in continuous_vars]
# categorical_stats returns [count_total, count_success, count_failure]
cat = [categorical_stats(var, static_features, value) for var, value in categorical_vars]

total_stats = [f"{c[0]:.2f} ± {c[1]:.2f}" for c in cont] + [c[0] for c in cat]
success_stats = [f"{c[2]:.2f} ± {c[3]:.2f}" for c in cont] + [c[1] for c in cat]
failure_stats = [f"{c[4]:.2f} ± {c[5]:.2f}" for c in cont] + [c[2] for c in cat]

# P-values are only computed for the continuous variables
p_values = [c[6] for c in cont] + [''] * len(cat)

In [94]:
# Ensure that all arrays have the same length
if len(total_stats) == len(success_stats) == len(failure_stats) == len(p_values):
    baseline_table = pd.DataFrame({
        'Feature': features,
        'Total Patients (mean ± SD or count)': total_stats,
        'Extubation Success (mean ± SD or count)': success_stats,
        'Extubation Failure (mean ± SD or count)': failure_stats,
        'P-value': p_values
    })

    # Display the table
    print(baseline_table)
else:
    print("Error: Arrays are not of the same length.")

                 Feature Total Patients (mean ± SD or count)  \
0                 Weight                       83.56 ± 24.20   
1                 Height                      169.20 ± 11.88   
2                    BMI                           inf ± nan   
3                   Male                                2728   
4                 Female                                1973   
5                Age ≤44                                 612   
6              Age 45-54                                 685   
7              Age 55-64                                1077   
8              Age 65-74                                1148   
9                Age ≥75                                1179   
10     Ethnicity (Asian)                                  47   
11     Ethnicity (Black)                                 483   
12  Ethnicity (Hispanic)                                 126   
13     Ethnicity (White)                                2859   
14     Ethnicity (Other)                

In [95]:
baseline_table_df = baseline_table.copy()

In [96]:
baseline_table_df

Unnamed: 0,Feature,Total Patients (mean ± SD or count),Extubation Success (mean ± SD or count),Extubation Failure (mean ± SD or count),P-value
0,Weight,83.56 ± 24.20,83.29 ± 23.61,84.13 ± 25.36,0.258888
1,Height,169.20 ± 11.88,169.34 ± 12.11,168.91 ± 11.40,0.238135
2,BMI,inf ± nan,inf ± nan,29.81 ± 12.35,
3,Male,2728,1839,889,
4,Female,1973,1318,655,
5,Age ≤44,612,459,153,
6,Age 45-54,685,464,221,
7,Age 55-64,1077,741,336,
8,Age 65-74,1148,750,398,
9,Age ≥75,1179,743,436,


Let's now add the Charlson score.

In [97]:
fs2_train_static = '/content/drive/MyDrive/MSc_Final_Project/02_data_analysis/mimic/data_analysis/datasets/08_model_input_data/02_feature_set_2/01_lstm_data/static_data/train_static.parquet'
fs2_test_static = '/content/drive/MyDrive/MSc_Final_Project/02_data_analysis/mimic/data_analysis/datasets/08_model_input_data/02_feature_set_2/01_lstm_data/static_data/test_static.parquet'

fs2_train_static = pd.read_parquet(fs2_train_static)
fs2_test_static = pd.read_parquet(fs2_test_static)

In [98]:
# Combine together
fs2_static_features = pd.concat([fs2_train_static, fs2_test_static], axis=0)

fs2_static_features.head()

Unnamed: 0,subject_id,weight,height,BMI,age_group_55-64,age_group_65-74,age_group_≤44,age_group_≥75,gender_M,ethnicity_ASIAN,ethnicity_BLACK,ethnicity_HISPANIC,ethnicity_OTHER,ethnicity_WHITE,charlson_score
0,10001884,0.307331,0.25,0.337538,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2.0
1,10002428,0.100564,0.104167,0.175384,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,4.0
2,10004235,0.890038,0.791667,0.595601,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0
3,10010867,0.824248,0.520833,0.676008,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
4,10011365,0.131579,0.25,0.168071,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,13.0


In [100]:
# Extract only subject id and charlson score
fs2_static_features = fs2_static_features[['subject_id', 'charlson_score']]

fs2_static_features.head()

Unnamed: 0,subject_id,charlson_score
0,10001884,2.0
1,10002428,4.0
2,10004235,0.0
3,10010867,0.0
4,10011365,13.0


In [101]:
# Add the relevant extubation failure label based on subject id
fs2_static_features = pd.merge(fs2_static_features, labels[['subject_id', 'extubation_failure']], on='subject_id', how='left')

fs2_static_features

Unnamed: 0,subject_id,charlson_score,extubation_failure
0,10001884,2.0,1
1,10002428,4.0,0
2,10004235,0.0,1
3,10010867,0.0,0
4,10011365,13.0,1
...,...,...,...
4696,17854152,3.0,1
4697,17859433,0.0,1
4698,17900392,6.0,0
4699,17907953,4.0,1


In [105]:
# Split into success and failure groups before calculating the Charlson score mode
fs2_success = fs2_static_features[fs2_static_features['extubation_failure'] == 0]
fs2_failure = fs2_static_features[fs2_static_features['extubation_failure'] == 1]

In [106]:
# Calculate the mode for the total, success and failure classes
total_mode = fs2_static_features['charlson_score'].mode()[0]
success_mode = fs2_success['charlson_score'].mode()[0]
failure_mode = fs2_failure['charlson_score'].mode()[0]
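Note that `Series.mode()` returns every value tied for most frequent, sorted in ascending order, so taking `[0]` reports the smallest of any tied modes. A small sketch with made-up scores shows the behaviour and how a frequency table can reveal ties:

```python
import pandas as pd

# Made-up Charlson-like scores with a tie between 0.0 and 4.0
scores = pd.Series([0.0, 0.0, 4.0, 4.0, 2.0])

all_modes = scores.mode()        # every value tied for most frequent, sorted ascending
first_mode = all_modes[0]        # the value that mode()[0] would report
counts = scores.value_counts()   # full frequency table, useful for spotting ties
```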

In [111]:
# Create a DataFrame for the new row
charlson_mode_row = pd.DataFrame({
    'Feature': ['Charlson Score (Mode)'],
    'Total Patients (mean ± SD or count)': [f'{total_mode:.1f}'],
    'Extubation Success (mean ± SD or count)': [f'{success_mode:.1f}'],
    'Extubation Failure (mean ± SD or count)': [f'{failure_mode:.1f}'],
    'P-value': ['']  # Mode does not have a p-value
})

In [108]:
baseline_table_copy = baseline_table.copy()

In [112]:
# Ensure that the columns match between the DataFrames
charlson_mode_row = charlson_mode_row.reindex(columns=baseline_table.columns)

# Concatenate the new row to the existing baseline_table
baseline_table_2 = pd.concat([baseline_table_copy, charlson_mode_row], ignore_index=True)


In [113]:
baseline_table_2

Unnamed: 0,Feature,Total Patients (mean ± SD or count),Extubation Success (mean ± SD or count),Extubation Failure (mean ± SD or count),P-value
0,Weight,83.56 ± 24.20,83.29 ± 23.61,84.13 ± 25.36,0.258888
1,Height,169.20 ± 11.88,169.34 ± 12.11,168.91 ± 11.40,0.238135
2,BMI,inf ± nan,inf ± nan,29.81 ± 12.35,
3,Male,2728,1839,889,
4,Female,1973,1318,655,
5,Age ≤44,612,459,153,
6,Age 45-54,685,464,221,
7,Age 55-64,1077,741,336,
8,Age 65-74,1148,750,398,
9,Age ≥75,1179,743,436,


In [115]:
# Save progress so far
baseline_table_2.to_csv('/content/drive/MyDrive/MSc_Final_Project/05_code_for_figures/baseline_v1.csv')

**Dynamic data**

In [117]:
dynamic_features = '/content/drive/MyDrive/MSc_Final_Project/02_data_analysis/mimic/data_analysis/datasets/05_time_series_data_extraction/feature_set_3_results/full_data_3.parquet'
dynamic_features = pd.read_parquet(dynamic_features)

dynamic_features.head()

Unnamed: 0,subject_id,charttime,itemid,valuenum
0,10001884,2131-01-12 15:00:00,223835.0,40.0
1,10001884,2131-01-12 15:00:00,224685.0,284.0
2,10001884,2131-01-12 15:00:00,224686.0,284.0
3,10001884,2131-01-12 15:00:00,224687.0,6.1
4,10001884,2131-01-12 15:00:00,224695.0,17.0


In [118]:
# Set itemid column to int
dynamic_features['itemid'] = dynamic_features['itemid'].astype(int)

In [119]:
dynamic_features['itemid'].nunique()

43

In [120]:
# Assign itemids to labels
feature_labels = '/content/drive/MyDrive/MSc_Final_Project/02_data_analysis/mimic/mimic-iv-2.2-raw-data/icu/d_items.csv'

feature_labels = pd.read_csv(feature_labels)

feature_labels.head()

Unnamed: 0,itemid,label,abbreviation,linksto,category,unitname,param_type,lownormalvalue,highnormalvalue
0,220001,Problem List,Problem List,chartevents,General,,Text,,
1,220003,ICU Admission date,ICU Admission date,datetimeevents,ADT,,Date and time,,
2,220045,Heart Rate,HR,chartevents,Routine Vital Signs,bpm,Numeric,,
3,220046,Heart rate Alarm - High,HR Alarm - High,chartevents,Alarms,bpm,Numeric,,
4,220047,Heart Rate Alarm - Low,HR Alarm - Low,chartevents,Alarms,bpm,Numeric,,


In [121]:
# Add column for feature_label based on itemid
dynamic_features = pd.merge(dynamic_features, feature_labels[['itemid', 'label']], on='itemid', how='left')

dynamic_features.head()

Unnamed: 0,subject_id,charttime,itemid,valuenum,label
0,10001884,2131-01-12 15:00:00,223835,40.0,Inspired O2 Fraction
1,10001884,2131-01-12 15:00:00,224685,284.0,Tidal Volume (observed)
2,10001884,2131-01-12 15:00:00,224686,284.0,Tidal Volume (spontaneous)
3,10001884,2131-01-12 15:00:00,224687,6.1,Minute Volume
4,10001884,2131-01-12 15:00:00,224695,17.0,Peak Insp. Pressure


In [122]:
# Rename to feature label
dynamic_features.rename(columns={'label': 'feature_label'}, inplace=True)

In [123]:
# Assign extubation failure label
dynamic_features = pd.merge(dynamic_features, labels[['subject_id', 'extubation_failure']], on='subject_id', how='left')

dynamic_features.head()

Unnamed: 0,subject_id,charttime,itemid,valuenum,feature_label,extubation_failure
0,10001884,2131-01-12 15:00:00,223835,40.0,Inspired O2 Fraction,1
1,10001884,2131-01-12 15:00:00,224685,284.0,Tidal Volume (observed),1
2,10001884,2131-01-12 15:00:00,224686,284.0,Tidal Volume (spontaneous),1
3,10001884,2131-01-12 15:00:00,224687,6.1,Minute Volume,1
4,10001884,2131-01-12 15:00:00,224695,17.0,Peak Insp. Pressure,1


In [124]:
# List all feature labels
dynamic_features['feature_label'].unique()

array(['Inspired O2 Fraction', 'Tidal Volume (observed)',
       'Tidal Volume (spontaneous)', 'Minute Volume',
       'Peak Insp. Pressure', 'Mean Airway Pressure', 'EtCO2',
       'Heart Rate', 'Respiratory Rate', 'GCS - Eye Opening',
       'GCS - Motor Response', 'O2 saturation pulseoxymetry',
       'Richmond-RAS Scale', 'Ventilator Mode',
       'Arterial Blood Pressure systolic',
       'Arterial Blood Pressure diastolic',
       'Arterial Blood Pressure mean', 'Temperature Fahrenheit',
       'Hematocrit (serum)', 'Sodium (serum)', 'Potassium (serum)',
       'Arterial O2 pressure', 'Arterial CO2 Pressure', 'PH (Arterial)',
       'Arterial Base Excess', 'Arterial O2 Saturation',
       'Ionized Calcium', 'Lactic Acid', 'Hemoglobin', 'WBC',
       'Creatinine (serum)', 'Glucose (serum)', 'Platelet Count',
       'CO2 production', 'Compliance', 'Plateau Pressure',
       'Mixed Venous O2% Sat', 'PH (Venous)', 'Venous CO2 Pressure',
       'Venous O2 Pressure', 'Cardiac Output (C

In [125]:
# Features that require mode calculation
mode_features = ['GCS - Eye Opening', 'GCS - Motor Response', 'Richmond-RAS Scale', 'Ventilator Mode']

# Split dynamic features into success and failure groups
success_dyn = dynamic_features[dynamic_features['extubation_failure'] == 0]
failure_dyn = dynamic_features[dynamic_features['extubation_failure'] == 1]

In [132]:
# Function to calculate mean, std, and p-value for continuous features
def continuous_dyn_stats(feature):
    data = dynamic_features[dynamic_features['feature_label'] == feature]
    success_data = success_dyn[success_dyn['feature_label'] == feature]['valuenum']
    failure_data = failure_dyn[failure_dyn['feature_label'] == feature]['valuenum']

    mean_total = data['valuenum'].mean()
    std_total = data['valuenum'].std()

    mean_success = success_data.mean()
    std_success = success_data.std()

    mean_failure = failure_data.mean()
    std_failure = failure_data.std()

    if len(success_data) > 1 and len(failure_data) > 1:
        _, p_value = stats.ttest_ind(success_data, failure_data, nan_policy='omit')
    else:
        p_value = np.nan

    return [mean_total, std_total, mean_success, std_success, mean_failure, std_failure, p_value]
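`continuous_dyn_stats` pools every charted measurement, so patients with more measurements carry more weight and the t-test treats repeated measurements from one patient as independent observations. One alternative, sketched here on toy data and not used in this notebook, is to average within each patient first and compare the per-patient means:

```python
import pandas as pd
from scipy import stats

# Toy long-format data for a single feature (illustrative values only)
demo = pd.DataFrame({
    'subject_id':         [1, 1, 1, 1, 2, 3, 3, 4],
    'valuenum':           [80, 82, 84, 86, 100, 60, 64, 90],
    'extubation_failure': [0, 0, 0, 0, 0, 1, 1, 1],
})

# Collapse to one value per patient so that each patient counts once
per_patient = (demo.groupby(['subject_id', 'extubation_failure'], as_index=False)['valuenum']
                   .mean())

success_means = per_patient.loc[per_patient['extubation_failure'] == 0, 'valuenum']
failure_means = per_patient.loc[per_patient['extubation_failure'] == 1, 'valuenum']

# Pooled mean (all measurements) versus mean of per-patient means
pooled_success_mean = demo.loc[demo['extubation_failure'] == 0, 'valuenum'].mean()
patient_success_mean = success_means.mean()

# The t-test now compares patients rather than individual measurements
_, p_value = stats.ttest_ind(success_means, failure_means)
```

In this toy example the pooled success mean is pulled towards patient 1, who contributes four of the five measurements.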

In [144]:
# Function to calculate the mode for categorical or ordinal dynamic features
def mode_dyn_stats(feature):
    data = dynamic_features[dynamic_features['feature_label'] == feature]['valuenum']
    success_data = success_dyn[success_dyn['feature_label'] == feature]['valuenum']
    failure_data = failure_dyn[failure_dyn['feature_label'] == feature]['valuenum']

    def safe_mode(x):
        if x.empty:
            print(f"No data available for feature: {feature}")
            return np.nan
        m = stats.mode(x, nan_policy='omit')
        mode_value = m.mode
        if isinstance(mode_value, np.ndarray):
            return mode_value[0] if mode_value.size > 0 else np.nan
        else:
            return mode_value

    mode_total = safe_mode(data)
    mode_success = safe_mode(success_data)
    mode_failure = safe_mode(failure_data)

    return [mode_total, mode_success, mode_failure]

In [145]:
# Add rows to the baseline_table
new_rows = []

for feature in dynamic_features['feature_label'].unique():
    if feature in mode_features:
        mode_values = mode_dyn_stats(feature)
        new_row = {
            'Feature': feature,
            'Total Patients (mean ± SD or count)': f'{mode_values[0]:.1f}',
            'Extubation Success (mean ± SD or count)': f'{mode_values[1]:.1f}',
            'Extubation Failure (mean ± SD or count)': f'{mode_values[2]:.1f}',
            'P-value': ''  # Mode does not have a p-value
        }
    else:
        stats_values = continuous_dyn_stats(feature)
        new_row = {
            'Feature': feature,
            'Total Patients (mean ± SD or count)': f'{stats_values[0]:.1f} ± {stats_values[1]:.1f}',
            'Extubation Success (mean ± SD or count)': f'{stats_values[2]:.1f} ± {stats_values[3]:.1f}',
            'Extubation Failure (mean ± SD or count)': f'{stats_values[4]:.1f} ± {stats_values[5]:.1f}',
            'P-value': f'{stats_values[6]:.3f}' if not pd.isna(stats_values[6]) else ''
        }

    new_rows.append(new_row)

In [146]:
# Convert new rows to a DataFrame and concatenate with baseline_table
new_rows_df = pd.DataFrame(new_rows)
baseline_table_3 = pd.concat([baseline_table_2, new_rows_df], ignore_index=True)

In [147]:
baseline_table_3

Unnamed: 0,Feature,Total Patients (mean ± SD or count),Extubation Success (mean ± SD or count),Extubation Failure (mean ± SD or count),P-value
0,Weight,83.56 ± 24.20,83.29 ± 23.61,84.13 ± 25.36,0.258888
1,Height,169.20 ± 11.88,169.34 ± 12.11,168.91 ± 11.40,0.238135
2,BMI,inf ± nan,inf ± nan,29.81 ± 12.35,
3,Male,2728,1839,889,
4,Female,1973,1318,655,
5,Age ≤44,612,459,153,
6,Age 45-54,685,464,221,
7,Age 55-64,1077,741,336,
8,Age 65-74,1148,750,398,
9,Age ≥75,1179,743,436,


In [148]:
baseline_table_3_copy = baseline_table_3.copy()

In [149]:
# Convert the 'P-value' column to numeric, forcing errors to NaN
baseline_table_3['P-value'] = pd.to_numeric(baseline_table_3['P-value'], errors='coerce')

# Replace P-values less than 0.001 with '<0.001'
baseline_table_3.loc[baseline_table_3['P-value'] < 0.001, 'P-value'] = '<0.001'

# Convert the 'P-value' column back to strings only for non-empty values
baseline_table_3['P-value'] = baseline_table_3['P-value'].apply(lambda x: str(x) if not pd.isna(x) else '')
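The cell above mixes floats and strings in one column, and the dynamic-feature p-values were already rounded to three decimals before the `< 0.001` test, while the static ones keep full precision. A minimal sketch of a single-pass formatter that could be applied to the raw p-values instead; `format_p` is a hypothetical helper, not part of the notebook.

```python
import numpy as np
import pandas as pd

def format_p(p, threshold=0.001, decimals=3):
    """Format a raw p-value for display (hypothetical helper)."""
    # Empty strings and missing values stay blank
    if p is None or p == '' or pd.isna(p):
        return ''
    p = float(p)
    if p < threshold:
        return f'<{threshold}'
    return f'{p:.{decimals}f}'

# Illustrative raw values: a float, a very small float, NaN and an empty string
raw = pd.Series([0.258888178343421, 0.0004, np.nan, ''], dtype=object)
formatted = raw.apply(format_p)
```

Applying the threshold to the unrounded value avoids classifying a p-value by its rounded display form.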



In [152]:
baseline_table_3

Unnamed: 0,Feature,Total Patients (mean ± SD or count),Extubation Success (mean ± SD or count),Extubation Failure (mean ± SD or count),P-value
0,Weight,83.56 ± 24.20,83.29 ± 23.61,84.13 ± 25.36,0.258888178343421
1,Height,169.20 ± 11.88,169.34 ± 12.11,168.91 ± 11.40,0.23813526524708875
2,BMI,inf ± nan,inf ± nan,29.81 ± 12.35,
3,Male,2728,1839,889,
4,Female,1973,1318,655,
5,Age ≤44,612,459,153,
6,Age 45-54,685,464,221,
7,Age 55-64,1077,741,336,
8,Age 65-74,1148,750,398,
9,Age ≥75,1179,743,436,


In [151]:
# Save to csv
baseline_table_3.to_csv('/content/drive/MyDrive/MSc_Final_Project/05_code_for_figures/baseline_v2.csv')

In [153]:
success.shape

(3157, 10)

In [154]:
failure.shape

(1544, 10)
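The two shapes above give the size of each outcome group. As a small worked calculation, the overall cohort size and the extubation failure rate follow directly from those row counts:

```python
# Row counts reported by success.shape and failure.shape above
n_success, n_failure = 3157, 1544

n_total = n_success + n_failure
failure_rate = n_failure / n_total

print(f"{n_total} patients, {failure_rate:.1%} extubation failure")
```

The total matches the sum of the Male and Female counts in the baseline table (2728 + 1973).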