A data analytics and machine learning project built using MIMIC-IV clinical data to predict ICU admissions and analyze patient risk patterns.
This project focuses on:
Understanding hospital admission patterns
Identifying patients at risk of ICU admission
Building predictive models for ICU demand
Creating interactive dashboards for business insights
Hospitals often struggle with:
Unexpected ICU demand
Resource allocation challenges
Late identification of high-risk patients
👉 This project aims to predict ICU admission risk early using patient-level data.
Dataset: MIMIC-IV v3.1
Source: PhysioNet
Data accessed via Google BigQuery
Includes:
Patient demographics
Hospital admissions
ICU stays
SQL (BigQuery) → Data extraction & joins
Python (Pandas, Scikit-learn) → Data processing & ML
Power BI → Dashboard & visualization
Joined: patients, admissions, icustays
Created target variable: ICU_FLAG (1 = ICU admission, 0 = No ICU)
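The project performs this join in BigQuery SQL. As a rough illustration of the same logic, here is a minimal pandas sketch. The column names (subject_id, hadm_id, stay_id, anchor_age, admittime, dischtime) follow the public MIMIC-IV schema, but the rows are made up and the exact query used in this project may differ.

```python
import pandas as pd

# Tiny made-up rows that mimic the shape of the three MIMIC-IV tables.
patients = pd.DataFrame({
    "subject_id": [1, 2, 3],
    "gender": ["F", "M", "F"],
    "anchor_age": [67, 45, 82],
})
admissions = pd.DataFrame({
    "subject_id": [1, 2, 3, 3],
    "hadm_id": [101, 102, 103, 104],
    "admittime": pd.to_datetime(["2150-01-01", "2150-02-10", "2150-03-05", "2150-06-20"]),
    "dischtime": pd.to_datetime(["2150-01-06", "2150-02-12", "2150-03-15", "2150-06-22"]),
})
icustays = pd.DataFrame({
    "hadm_id": [101, 103, 103],  # admission 103 has two ICU stays
    "stay_id": [9001, 9002, 9003],
})

# One row per hospital admission, with patient demographics attached.
df = admissions.merge(patients, on="subject_id", how="left")

# Flag an admission as ICU if it has at least one ICU stay.
# Using isin() avoids duplicating admissions that have several ICU stays.
df["ICU_FLAG"] = df["hadm_id"].isin(icustays["hadm_id"]).astype(int)

print(df[["hadm_id", "gender", "anchor_age", "ICU_FLAG"]])
```

Building the flag with a membership check rather than a direct join keeps one row per admission, which matters when a single admission has multiple ICU stays.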
Handled missing values
Removed data leakage (ICU_STAY_DAYS)
Created features: HOSPITAL_STAY_DAYS, AGE_GROUP
Encoded categorical variables
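The preprocessing steps can be sketched in pandas as follows. This is an illustration only: the sample rows, the "Unknown" fill value, and the age band cut points are assumptions, not the project's actual choices.

```python
import pandas as pd

# Made-up rows; column names mirror the ones mentioned in this README.
df = pd.DataFrame({
    "admittime": pd.to_datetime(["2150-01-01 08:00", "2150-02-10 12:00", "2150-03-05 00:00"]),
    "dischtime": pd.to_datetime(["2150-01-06 08:00", "2150-02-12 00:00", "2150-03-15 12:00"]),
    "anchor_age": [67, 45, 82],
    "gender": ["F", "M", None],
    "ICU_STAY_DAYS": [2.5, 0.0, 4.0],
    "ICU_FLAG": [1, 0, 1],
})

# Drop the leaky column: ICU stay length is only known after an ICU admission.
df = df.drop(columns=["ICU_STAY_DAYS"])

# Fill missing categories with an explicit label (one possible strategy).
df["gender"] = df["gender"].fillna("Unknown")

# Length of hospital stay in days.
df["HOSPITAL_STAY_DAYS"] = (df["dischtime"] - df["admittime"]).dt.total_seconds() / 86400

# Age bands; the cut points here are illustrative.
df["AGE_GROUP"] = pd.cut(
    df["anchor_age"],
    bins=[0, 40, 60, 80, 120],
    labels=["<40", "40-59", "60-79", "80+"],
    right=False,
)

# One-hot encode the categorical columns.
features = pd.get_dummies(
    df[["HOSPITAL_STAY_DAYS", "AGE_GROUP", "gender"]],
    columns=["AGE_GROUP", "gender"],
)
print(features.columns.tolist())
```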
Used SMOTE to balance ICU vs non-ICU classes
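SMOTE creates synthetic minority-class rows by interpolating between a minority sample and one of its nearest minority neighbours. In practice this is done with the `SMOTE` class from imbalanced-learn (via `fit_resample`), typically on the training split only. The hand-rolled function below is a simplified sketch of the core idea, not the library's implementation; `smote_sketch` and the toy data are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between neighbours."""
    rng = np.random.default_rng(seed)
    # Ask for k + 1 neighbours because each point is its own nearest neighbour.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)          # random minority rows
    neigh = idx[base, rng.integers(1, k + 1, size=n_new)]   # one of their k neighbours
    gap = rng.random((n_new, 1))                            # position along the segment
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


# Toy imbalanced data: 90 non-ICU rows vs 10 ICU rows.
rng = np.random.default_rng(42)
X_non_icu = rng.normal(0.0, 1.0, size=(90, 2))
X_icu = rng.normal(2.0, 1.0, size=(10, 2))

# Generate enough synthetic ICU rows to match the majority class.
X_new = smote_sketch(X_icu, n_new=len(X_non_icu) - len(X_icu))
X_icu_balanced = np.vstack([X_icu, X_new])
print(X_icu_balanced.shape)
```

Because every synthetic row lies on a segment between two real minority rows, the new points stay inside the region already occupied by the ICU class.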
🔹 Logistic Regression
Used for interpretability
Applied feature scaling
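A minimal sketch of scaling plus Logistic Regression, using a scikit-learn pipeline so the scaler is fitted on the training split only. The synthetic data and hyperparameters are assumptions for illustration; the actual settings in main.py may differ.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the ICU dataset.
X, y = make_classification(
    n_samples=1000, n_features=5, weights=[0.85, 0.15], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Scaling and the classifier are fitted together on the training data.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# With standardized inputs, coefficient magnitudes are roughly comparable,
# which is what makes the model easy to interpret.
coefs = model.named_steps["logisticregression"].coef_[0]
for i, c in enumerate(coefs):
    print(f"feature_{i}: {c:+.3f}")
```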
🔹 Random Forest
Used for performance comparison
| Metric | Logistic Regression | Random Forest |
| --- | --- | --- |
| ROC-AUC | ~0.70 | ~0.75 |
| Recall (ICU) | High | Moderate |
| Accuracy | Moderate | Higher |
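The three metrics compared above can be computed with scikit-learn as sketched here. The data is synthetic, so the numbers it prints will not match the project's results; the model settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the ICU dataset.
X, y = make_classification(
    n_samples=1000, n_features=5, weights=[0.85, 0.15], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, m in models.items():
    m.fit(X_train, y_train)
    proba = m.predict_proba(X_test)[:, 1]  # probability of the positive (ICU) class
    pred = m.predict(X_test)
    results[name] = {
        "roc_auc": roc_auc_score(y_test, proba),
        "recall_icu": recall_score(y_test, pred),
        "accuracy": accuracy_score(y_test, pred),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```

On imbalanced data, accuracy can look good even when few ICU cases are caught, which is why recall on the ICU class is reported separately.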
👉 Logistic Regression chosen for:
Better interpretability
Higher recall (important in healthcare)
Hospital Stay Duration
Age / Age Group
Gender
The dashboard provides:
🔹 KPIs
Total Patients
ICU Patients
ICU Rate
High-Risk Patients
Avg Hospital Stay
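The KPIs are built in Power BI. For a quick cross-check outside the dashboard, they can be approximated in pandas as below. The rows are made up, and the `RISK_SCORE` column and the 0.5 cutoff for "high risk" are assumptions, since the README does not define how high-risk patients are counted.

```python
import pandas as pd

# Made-up rows; RISK_SCORE is a hypothetical model probability column.
df = pd.DataFrame({
    "subject_id": [1, 2, 3, 4, 5],
    "ICU_FLAG": [1, 0, 0, 1, 0],
    "HOSPITAL_STAY_DAYS": [5.0, 1.5, 3.0, 10.5, 2.0],
    "RISK_SCORE": [0.81, 0.12, 0.55, 0.90, 0.30],
})
HIGH_RISK_THRESHOLD = 0.5  # assumed cutoff

kpis = {
    "total_patients": df["subject_id"].nunique(),
    "icu_patients": int(df["ICU_FLAG"].sum()),
    "icu_rate": df["ICU_FLAG"].mean(),
    "high_risk_patients": int((df["RISK_SCORE"] >= HIGH_RISK_THRESHOLD).sum()),
    "avg_hospital_stay": df["HOSPITAL_STAY_DAYS"].mean(),
}
print(kpis)
```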
🔹 Visual Insights
ICU Rate by Age Group
ICU Distribution by Gender
ICU Risk vs Hospital Stay Duration
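The aggregations behind the age-group and gender visuals can be sketched in pandas. The rows below are invented for illustration and do not reflect the actual MIMIC-IV distribution.

```python
import pandas as pd

# Invented rows, one per admission.
df = pd.DataFrame({
    "AGE_GROUP": ["<40", "<40", "40-59", "40-59", "60-79", "60-79", "80+", "80+"],
    "gender": ["F", "M", "F", "M", "F", "M", "F", "M"],
    "ICU_FLAG": [0, 0, 0, 1, 1, 0, 1, 1],
})

# ICU rate by age group: the mean of a 0/1 flag is the share of ICU admissions.
rate_by_age = df.groupby("AGE_GROUP")["ICU_FLAG"].mean()

# ICU distribution by gender: count ICU admissions per gender.
icu_by_gender = df.loc[df["ICU_FLAG"] == 1, "gender"].value_counts()

print(rate_by_age)
print(icu_by_gender)
```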
ICU admission likelihood increases with longer hospital stays
Certain age groups show higher ICU risk
Gender differences observed in ICU distribution
Dataset is highly imbalanced, requiring careful modeling
Clone the repository
Install dependencies: pip install pandas scikit-learn imbalanced-learn
Run the Python script: python main.py
Open the Power BI dashboard file (.pbix)
📁 Project
│
├── 📄 main.py # ML pipeline
├── 📄 icu_prediction_data.csv
├── 📊 dashboard.pbix # Power BI dashboard
└── 📄 README.md
Add more clinical features (labs, vitals)
Data is de-identified (MIMIC-IV)
For educational and research purposes only
Pavan R V