Apply the ensemble technique of model stacking to predict patient readmission

TienNguyen93/hospital-readmission

Hospital Readmission

This Jupyter Notebook implements model stacking, a machine learning ensemble technique, to predict hospital readmission risk for patients with diabetes. It preprocesses numerical, categorical, and text data, trains multiple base models, uses cross-validation to tune hyperparameters, and uses out-of-fold predictions in the stacking step.

Dataset:

  • 8,000 training records
    • 40 columns
    • Class balance: 60% of patients not readmitted, 40% readmitted

Preprocessing

  • Apply frequency encoding and ordinal encoding to categorical data
  • Apply StandardScaler to numerical data
  • For text data:
    • Remove stopwords
    • Lemmatize tokens
    • Compute Term Frequency-Inverse Document Frequency (TF-IDF) features
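
The categorical and numerical steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code: the column names, category values, and category ordering are hypothetical, and `standardize` only mirrors what StandardScaler does (zero mean, population standard deviation). The text steps are not shown here.

```python
from collections import Counter
from statistics import mean, pstdev


def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


def ordinal_encode(values, order):
    """Map each category to its position in an explicit, ordered list."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]


def standardize(values):
    """Rescale to zero mean and unit variance (population std, as StandardScaler uses)."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # constant columns are left unscaled
    return [(v - mu) / sigma for v in values]


# Hypothetical columns, for illustration only.
admission_type = ["emergency", "elective", "emergency", "urgent"]
age_group = ["[40-50)", "[70-80)", "[50-60)", "[70-80)"]
num_lab_procedures = [41, 59, 11, 44]

freq = frequency_encode(admission_type)
ordinal = ordinal_encode(age_group, ["[40-50)", "[50-60)", "[60-70)", "[70-80)"])
scaled = standardize(num_lab_procedures)
```

Frequency encoding suits unordered categories with many levels, while ordinal encoding only makes sense when the categories have a natural order, such as age bands.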

Models

Base models

  • Logistic Regression for text data
  • Support Vector Machine
  • Random Forest
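
Hyperparameters for base models like these can be tuned with a cross-validated search. The sketch below uses scikit-learn's `GridSearchCV` on a Random Forest. The synthetic data, the parameter grid, and the choice of five folds are assumptions made for illustration and are not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the real data, with a similar 60/40 class balance.
X, y = make_classification(
    n_samples=300, n_features=10, weights=[0.6, 0.4], random_state=0
)

# Hypothetical grid; the notebook's actual search space is not stated here.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # select the combination with the best mean AUC
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Stratified folds keep the class ratio roughly constant across splits, which matters when the classes are imbalanced.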

Meta model

  • Gradient Boosting Classifier
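
A rough sketch of how stacking with out-of-fold predictions could fit together, assuming scikit-learn. It is a simplification rather than the notebook's implementation: the data are synthetic, every base model sees the same numeric features (whereas the Logistic Regression above is described as working on text data), and all settings are illustrative defaults.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

# Synthetic stand-in data with a 60/40 class balance.
X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.6, 0.4], random_state=0
)

# Each base model is paired with the method that yields a continuous score.
base_models = [
    (LogisticRegression(max_iter=1000), "predict_proba"),
    (SVC(), "decision_function"),
    (RandomForestClassifier(n_estimators=100, random_state=0), "predict_proba"),
]


def positive_score(output):
    """Keep the positive-class column when the output has one column per class."""
    return output[:, 1] if output.ndim == 2 else output


cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold scores: each row is scored by a model that never saw it in training.
oof = np.column_stack([
    positive_score(cross_val_predict(model, X, y, cv=cv, method=method))
    for model, method in base_models
])

# The meta model learns how to combine the base models' scores.
meta_model = GradientBoostingClassifier(random_state=0).fit(oof, y)

# For new patients, refit each base model on all training data first.
for model, _ in base_models:
    model.fit(X, y)


def predict_risk(X_new):
    """Return the stacked readmission probability for each row of X_new."""
    scores = np.column_stack([
        positive_score(getattr(model, method)(X_new))
        for model, method in base_models
    ])
    return meta_model.predict_proba(scores)[:, 1]
```

Training the meta model on out-of-fold scores, rather than on scores the base models produce for their own training rows, keeps it from learning to trust overfitted base predictions.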

Feature importance

[Figure: feature importance]

AUC performance

The AUC (Area Under the Curve) score measures the model's ability to discriminate between patients who will be readmitted and those who will not. A score of 0.5 corresponds to random guessing and 1.0 to perfect separation.
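
One way to read an AUC value is as the fraction of (readmitted, not readmitted) patient pairs in which the readmitted patient receives the higher score, with ties counted as half. The function below computes that directly. It is an illustrative sketch with made-up labels and scores, not the notebook's evaluation code, and `pairwise_auc` is a hypothetical helper name.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for label, s in zip(y_true, scores) if label == 1]
    neg = [s for label, s in zip(y_true, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = readmitted, 0 = not readmitted.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1]

auc = pairwise_auc(y_true, scores)  # 4 of 6 pairs ranked correctly
```

This pairwise form is quadratic in the number of patients, so it is useful for building intuition rather than for scoring large datasets.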

Why AUC? Because it measures how well the model ranks patients by risk, and a good ranking allows hospitals to:

  • Implement targeted interventions for high-risk patients to prevent readmission.
  • Allocate resources more effectively.
  • Improve patient outcomes.

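An AUC score of 0.68 means that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen non-readmitted patient about 68% of the time. This is better than chance (0.5), but it indicates moderate rather than strong separation between the two groups (readmission vs. no readmission).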

[Figure: AUC score]