# Prediction of Patient Survival in the ICU 

## Problem Statement
Accurately predicting in-hospital mortality from data available within the first 24 hours of intensive care unit (ICU) admission is essential for effective triage, early prognosis, and informed clinical decision-making. Data-driven models can help uncover the factors that influence patient outcomes, but the challenge lies in identifying the most relevant predictors in high-dimensional, incomplete ICU data. This project uses logistic regression and random forest models to estimate the risk of in-hospital mortality from data collected during the first 24 hours of ICU admission. Our objective is to identify a set of key features that accurately predict mortality risk and to build a predictive model that is practical for real-world ICU use.

## Description of the Dataset
This project uses a publicly available dataset collected by MIT’s Global Open Source Severity of Illness Score (GOSSIS) community initiative. The dataset contains clinical information from over 130,000 ICU visits recorded over the course of one year. Data were aggregated from more than 200 hospitals across multiple countries, including the United States, Argentina, Australia, New Zealand, Sri Lanka, and Brazil, giving a geographically diverse ICU population. The training file analyzed here (`training_v2.csv`) has 91,713 rows and 186 columns. The target variable is `hospital_death`, and the predictors are clinical features collected within the first 24 hours of ICU admission.

## Data Issues
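### 1) Data Missingness
Several columns contain missing values. For example, `age` is missing for 4,228 of the 91,713 encounters (about 4.6%), and the two APACHE body-system columns are each missing for 1,662 encounters.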

In [1]:
import pandas as pd

df = pd.read_csv("data/training_v2.csv")
df.head()

In [5]:
df.isna().sum()

encounter_id                      0
patient_id                        0
hospital_id                       0
hospital_death                    0
age                            4228
                               ... 
leukemia                        715
lymphoma                        715
solid_tumor_with_metastasis     715
apache_3j_bodysystem           1662
apache_2_bodysystem            1662
Length: 186, dtype: int64

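Raw counts are easier to compare once they are expressed as a fraction of the rows. The following is a minimal sketch of such a summary; the helper name `missing_summary` and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and fractions, most-missing first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "n_missing": counts,
        "frac_missing": counts / len(frame),
    })
    # Keep only columns that actually have gaps, sorted by severity.
    summary = summary[summary["n_missing"] > 0]
    return summary.sort_values("frac_missing", ascending=False)


# Toy frame standing in for the real data (values are made up for illustration).
toy = pd.DataFrame({
    "encounter_id": [1, 2, 3, 4],
    "age": [68.0, np.nan, 25.0, 81.0],
    "bmi": [22.7, np.nan, np.nan, 31.9],
})

summary = missing_summary(toy)
print(summary)
```

On the real data, `missing_summary(df)` would rank all 186 columns in the same way, which could help decide which features to impute and which to drop.
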
### 2) Data Imbalance
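The target variable is heavily imbalanced: 83,798 encounters (91.4%) ended in survival and 7,915 (8.6%) in in-hospital death, a ratio of roughly 10.6 to 1. A model that always predicts survival would therefore reach about 91.4% accuracy while identifying no deaths.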

In [6]:
df['hospital_death'].value_counts()

hospital_death
0    83798
1     7915
Name: count, dtype: int64

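One common way to compensate for this kind of imbalance is to weight each class inversely to its frequency. The sketch below computes such weights from the counts shown above, using the formula `n_samples / (n_classes * class_count)`, which is the heuristic behind scikit-learn's `class_weight="balanced"` option. The notebook does not state which imbalance strategy is used, so this is an illustration rather than the project's method; the helper name `balanced_class_weights` is hypothetical.

```python
import pandas as pd


def balanced_class_weights(counts: pd.Series) -> pd.Series:
    """Weight each class by n_samples / (n_classes * class_count)."""
    return counts.sum() / (len(counts) * counts)


# Class counts copied from the value_counts() output above.
counts = pd.Series({0: 83798, 1: 7915}, name="count")

death_rate = counts.loc[1] / counts.sum()
weights = balanced_class_weights(counts)

print(f"death rate: {death_rate:.4f}")
print(weights.round(3))
```

With these weights, each class contributes the same total weight to the loss, so the minority class is no longer dominated by the survivors.
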
### 3) Data Scaling
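Feature scales differ by orders of magnitude. Comorbidity flags such as `aids` and `diabetes_mellitus` take the values 0 or 1, `age` ranges from 16 to 89, `height` from 137.2 to 195.59, and `pre_icu_los_days` from -24.95 to 159.09. Identifier columns such as `encounter_id` (up to 131,051) are also numeric but carry no clinical meaning. The summary also exposes values that need checking: `pre_icu_los_days` has negative entries, and both APACHE death-probability columns have a minimum of -1.0, which is not a valid probability. Scale-sensitive models such as regularized logistic regression generally benefit from standardized inputs, whereas random forests are largely insensitive to feature scale.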

In [8]:
numeric_cols = df.select_dtypes(include=['float64', 'int64']).columns
df[numeric_cols].describe()

| | encounter_id | patient_id | hospital_id | hospital_death | age | bmi | elective_surgery | height | icu_id | pre_icu_los_days | ... | apache_4a_hospital_death_prob | apache_4a_icu_death_prob | aids | cirrhosis | diabetes_mellitus | hepatic_failure | immunosuppression | leukemia | lymphoma | solid_tumor_with_metastasis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 91713.0 | 91713.0 | 91713.0 | 91713.0 | 87485.0 | 88284.0 | 91713.0 | 90379.0 | 91713.0 | 91713.0 | ... | 83766.0 | 83766.0 | 90998.0 | 90998.0 | 90998.0 | 90998.0 | 90998.0 | 90998.0 | 90998.0 | 90998.0 |
| mean | 65606.07928 | 65537.131464 | 105.669262 | 0.086302 | 62.309516 | 29.185818 | 0.183736 | 169.641588 | 508.357692 | 0.835766 | ... | 0.086787 | 0.043955 | 0.000857 | 0.015693 | 0.225192 | 0.012989 | 0.026165 | 0.007066 | 0.004132 | 0.020638 |
| std | 37795.088538 | 37811.252183 | 62.854406 | 0.280811 | 16.775119 | 8.275142 | 0.387271 | 10.795378 | 228.989661 | 2.487756 | ... | 0.247569 | 0.217341 | 0.029265 | 0.124284 | 0.417711 | 0.113229 | 0.159628 | 0.083763 | 0.064148 | 0.142169 |
| min | 1.0 | 1.0 | 2.0 | 0.0 | 16.0 | 14.844926 | 0.0 | 137.2 | 82.0 | -24.947222 | ... | -1.0 | -1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 32852.0 | 32830.0 | 47.0 | 0.0 | 52.0 | 23.641975 | 0.0 | 162.5 | 369.0 | 0.035417 | ... | 0.02 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 50% | 65665.0 | 65413.0 | 109.0 | 0.0 | 65.0 | 27.654655 | 0.0 | 170.1 | 504.0 | 0.138889 | ... | 0.05 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 75% | 98342.0 | 98298.0 | 161.0 | 0.0 | 75.0 | 32.930206 | 0.0 | 177.8 | 679.0 | 0.409028 | ... | 0.13 | 0.06 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| max | 131051.0 | 131051.0 | 204.0 | 1.0 | 89.0 | 67.81499 | 1.0 | 195.59 | 927.0 | 159.090972 | ... | 0.99 | 0.97 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |

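Differences in scale like those in the summary above are often handled by standardizing the continuous columns to zero mean and unit variance. The sketch below shows one way to do this with pandas alone. The helper name `standardize`, the toy rows, and the choice of which columns count as continuous are assumptions made for illustration; the notebook does not specify its scaling method.

```python
import pandas as pd


def standardize(frame: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy with the given columns rescaled to mean 0 and std 1."""
    scaled = frame.copy()
    means = scaled[columns].mean()
    # Population standard deviation; a constant column would divide by zero.
    stds = scaled[columns].std(ddof=0)
    scaled[columns] = (scaled[columns] - means) / stds
    return scaled


# Toy rows (made-up values); the column names follow the dataset.
toy = pd.DataFrame({
    "encounter_id": [101, 102, 103, 104],
    "age": [52.0, 65.0, 75.0, 89.0],
    "height": [162.5, 170.1, 177.8, 195.59],
    "diabetes_mellitus": [0, 1, 0, 0],
})

# Identifiers and 0/1 indicators are left alone; only continuous columns are rescaled.
continuous = ["age", "height"]
scaled = standardize(toy, continuous)
print(scaled)
```

In a modeling pipeline, the means and standard deviations would normally be computed on the training split only and then reused on validation and test data, so that no information leaks from held-out rows.
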