## HDAT 9910 Capstone


Research Question 2: Weekend Effect in ICU 

Task: Investigate whether admission to the ICU at the weekend increases the risk of ICU mortality.

Objective: To develop a statistical model to estimate the effect of weekend admission to ICU on the risk of mortality. 

Question: Does admission to ICU over the weekend increase the risk of mortality? 

Study Population: MIMIC-III dataset

#### Load packages


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
import os
import concurrent.futures
warnings.filterwarnings("ignore")
pd.set_option('display.max_columns', None)


#### Load in the preprocessed MIMIC-III Dataset

In [40]:
# Forward slashes avoid backslash escape sequences (e.g. '\l') in a normal string literal
df = pd.read_csv('/Users/lukac/OneDrive/Desktop/HDAT-9910-Capstone/df_weekend_r.csv')

In [41]:
# Check the balance of the target variable
total_icu_stays = df['icustay_id'].nunique()
survivors = df.loc[df['mortality'] == 0, 'icustay_id'].nunique()
non_survivors = df.loc[df['mortality'] == 1, 'icustay_id'].nunique()

print(f"Number of ICU stays: {total_icu_stays}")
print(f"Number of survivors: {survivors}")
print(f"Number of non-survivors: {non_survivors}")

mortality_rate = (non_survivors / total_icu_stays) * 100
print(f"Mortality: {mortality_rate:.1f}%")

Number of ICU stays: 61532
Number of survivors: 37341
Number of non-survivors: 24191
Mortality: 39.3%
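The research question is phrased in terms of risk, while a logistic model reports an odds ratio. When the outcome is as common as here (about 39%), the odds ratio can sit noticeably further from 1 than the risk ratio, so crude risks by group are a useful companion to the model. A minimal sketch on hypothetical toy data, assuming `df` has one row per ICU stay and a 0/1 `weekend_admission` column (the column used in the models below):

```python
import pandas as pd

# Hypothetical toy data: one row per ICU stay, as assumed for df
toy = pd.DataFrame({
    "icustay_id": range(1, 11),
    "weekend_admission": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "mortality": [1, 0, 0, 1, 0, 0, 1, 1, 0, 0],
})

# Crude (unadjusted) mortality risk in each group (0 = weekday, 1 = weekend)
risk = toy.groupby("weekend_admission")["mortality"].mean()
risk_difference = risk.loc[1] - risk.loc[0]
risk_ratio = risk.loc[1] / risk.loc[0]

# Odds in each group, for comparison with the logistic model's odds ratio
odds = risk / (1 - risk)
odds_ratio = odds.loc[1] / odds.loc[0]

print(risk)
print(f"Risk difference: {risk_difference:.3f}")
print(f"Risk ratio: {risk_ratio:.2f}, odds ratio: {odds_ratio:.2f}")
```

On this toy data the weekend risk ratio is 1.5 while the odds ratio is 2.0, which illustrates the gap between the two measures.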


In [42]:
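# Complete-case analysis: drop every row with at least one missing value
# (the Logit model below is fitted on the 51,367 remaining observations)
df.dropna(inplace=True)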
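`dropna()` turns the analysis into a complete-case analysis on a smaller sample. If missingness differs between weekend and weekday admissions, that restriction could itself bias the comparison. A short sketch on hypothetical toy data for checking how many rows are dropped, whether the dropped share differs by `weekend_admission`, and how mortality changes; on the real data the same calls would be run on `df` before the `dropna` cell:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for df before dropna()
raw = pd.DataFrame({
    "icustay_id": [1, 2, 3, 4, 5, 6],
    "weekend_admission": [0, 1, 0, 1, 0, 1],
    "lactate_mean": [1.2, np.nan, 2.5, 3.1, np.nan, 0.9],
    "mortality": [0, 1, 0, 1, 1, 0],
})

# Flag the rows that dropna() would remove
has_missing = raw.isna().any(axis=1)
complete = raw.loc[~has_missing]

print(f"Rows dropped: {has_missing.sum()} of {len(raw)}")
# Share of rows dropped, by weekend status
print(has_missing.groupby(raw["weekend_admission"]).mean())
# Mortality before and after the complete-case restriction
print(f"Mortality before: {raw['mortality'].mean():.1%}, after: {complete['mortality'].mean():.1%}")
```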

In [56]:
df.columns

Index(['icustay_id', 'admission_type', 'admission_location', 'insurance',
       'diagnosis', 'gender', 'mortality', 'spo2_mean', 'spo2_min', 'spo2_max',
       'temperature_mean', 'temperature_min', 'temperature_max',
       'resprate_mean', 'resprate_min', 'resprate_max', 'heartrate_mean',
       'heartrate_min', 'heartrate_max', 'sysbp_mean', 'sysbp_min',
       'sysbp_max', 'diasbp_mean', 'diasbp_min', 'diasbp_max', 'glucose_mean',
       'glucose_min', 'glucose_max', 'meanarterialpressure_mean',
       'meanarterialpressure_min', 'meanarterialpressure_max',
       'neutrophil_mean', 'neutrophil_min', 'neutrophil_max',
       'creactiveprotein_mean', 'creactiveprotein_min', 'creactiveprotein_max',
       'whitebloodcell_mean', 'whitebloodcell_min', 'whitebloodcell_max',
       'partialpressureo2_mean', 'partialpressureo2_min',
       'partialpressureo2_max', 'bicarbonate_mean', 'bicarbonate_min',
       'bicarbonate_max', 'lactate_mean', 'lactate_min', 'lactate_max',
       'tropon

### Model selection and training

In [59]:
import statsmodels.api as sm

X = df[['weekend_admission']]  # Unadjusted model: weekend admission is the only predictor
y = df['mortality']  # Outcome variable

# Add a constant to X for the intercept
X = sm.add_constant(X)

# Fit the logistic regression model
model = sm.Logit(y, X).fit()

# Print the summary of the model to see the results
print(model.summary())


Optimization terminated successfully.
         Current function value: 0.687042
         Iterations 4
                           Logit Regression Results                           
Dep. Variable:              mortality   No. Observations:                51367
Model:                          Logit   Df Residuals:                    51365
Method:                           MLE   Df Model:                            1
Date:                Tue, 02 Apr 2024   Pseudo R-squ.:               0.0005658
Time:                        16:48:04   Log-Likelihood:                -35291.
converged:                       True   LL-Null:                       -35311.
Covariance Type:            nonrobust   LLR p-value:                 2.597e-10
                        coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------------
const                -0.2440      0.010    -24.299      0.000      -0.264      -0.224
weekend_
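The model above is unadjusted, so any difference in case mix between weekend and weekday admissions (admission type, illness severity) is absorbed into the weekend coefficient. One way to adjust is the statsmodels formula interface. The sketch below uses simulated data, and the covariates (`admission_type`, `lactate_mean`, both present in `df.columns`) are illustrative choices only, not a recommended adjustment set:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000

# Hypothetical simulated data using column names from df
toy = pd.DataFrame({
    "weekend_admission": rng.integers(0, 2, n),
    "admission_type": rng.choice(["EMERGENCY", "ELECTIVE", "URGENT"], n),
    "lactate_mean": rng.gamma(2.0, 1.0, n),
})
# Hypothetical true log-odds used only to simulate the outcome
lin = -1.0 + 0.15 * toy["weekend_admission"] + 0.3 * toy["lactate_mean"]
toy["mortality"] = rng.binomial(1, 1 / (1 + np.exp(-lin)))

# C() treats admission_type as categorical (first sorted level is the reference)
adjusted = smf.logit(
    "mortality ~ weekend_admission + C(admission_type) + lactate_mean",
    data=toy,
).fit(disp=False)

adj_or = np.exp(adjusted.params["weekend_admission"])
adj_ci = np.exp(adjusted.conf_int().loc["weekend_admission"])
print(f"Adjusted OR for weekend admission: {adj_or:.3f}")
print(adj_ci)
```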

In [49]:
# Odds ratio for weekend admission: exponentiate the log-odds coefficient
# from the statsmodels Logit fit (numpy is imported in the first cell)
odds_ratio = np.exp(model.params['weekend_admission'])
print(f"Odds Ratio for weekend admission: {odds_ratio}")

Odds Ratio for weekend admission: 1.1456788245345904



In [50]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

X = df[['weekend_admission']]  # Features DataFrame
y = df['mortality']  # Target series

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
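With mortality around 39% the classes are only mildly imbalanced, so a plain random split already gives similar proportions. One option, sketched here on hypothetical simulated data, is to pass `stratify=y` to `train_test_split` so the training and test sets have matching mortality rates by construction:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Hypothetical simulated weekend flag and mortality outcome
X = pd.DataFrame({"weekend_admission": rng.integers(0, 2, 1000)})
y = pd.Series(rng.binomial(1, 0.4, 1000), name="mortality")

# stratify=y preserves the mortality proportion in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Overall: {y.mean():.3f}, train: {y_train.mean():.3f}, test: {y_test.mean():.3f}")
```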


In [51]:
# Initialise the logistic regression model
model = LogisticRegression()

# Fit the model
model.fit(X_train, y_train)

LogisticRegression()

In [52]:
# Make predictions
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]  # Probabilities for ROC-AUC

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_proba)

print(f"Accuracy: {accuracy:.2f}")
print(f"ROC-AUC Score: {roc_auc:.2f}")


Accuracy: 0.55
ROC-AUC Score: 0.52
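Given the statsmodels estimates above (intercept -0.244, odds ratio 1.146, i.e. a coefficient of about 0.136), predicted mortality is roughly 0.44 for weekday and 0.47 for weekend admissions. Both fall below the 0.5 threshold used by `predict`, so the classifier likely labels every test stay a survivor, and its accuracy would then equal the survivor share of the test set. Comparing against a majority-class `DummyClassifier` makes this explicit. A sketch on hypothetical data simulated from those coefficients:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical simulated data mirroring the unadjusted estimates:
# log-odds = -0.244 + 0.136 * weekend
X = rng.integers(0, 2, size=(n, 1))
p = 1 / (1 + np.exp(-(-0.244 + 0.136 * X[:, 0])))
y = rng.binomial(1, p)

clf = LogisticRegression().fit(X, y)
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)

# For brevity both models are scored on the same toy data
acc_model = accuracy_score(y, clf.predict(X))
acc_base = accuracy_score(y, baseline.predict(X))
print(f"Logistic regression accuracy: {acc_model:.3f}")
print(f"Majority-class accuracy:      {acc_base:.3f}")
print(f"Share predicted as deaths:    {clf.predict(X).mean():.3f}")
```

Accuracy and ROC-AUC measure how well weekend status alone predicts death; they do not indicate whether the weekend effect itself is real or how large it is.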


In [53]:
# Getting the coefficient for weekend_admission
weekend_coefficient = model.coef_[0][0]
print(f"Coefficient for weekend admission: {weekend_coefficient:.4f}")


Coefficient for weekend admission: 0.1235
