Predicting Hospital Readmissions Using Machine Learning

Abstract

This project predicts the likelihood of readmission within 30 days of initial discharge for diabetes patients using machine learning. I developed and evaluated several models, including random forests, SVC with an RBF kernel, gradient boosting, and neural networks, to identify high-risk patients so that providers can target interventions and reduce the likelihood of readmission. By analyzing admission, demographic, clinical, medication, and discharge data, I developed a predictive model that healthcare providers can use to optimize resource allocation, improve care coordination, and inform policy and practice in diabetes care and readmission prevention. The insights gained from this project demonstrate the potential of data science and machine learning in healthcare and contribute to efforts to improve patient outcomes and reduce healthcare costs.

Introduction

Hospital readmissions within 30 days of discharge are a common problem, especially for patients with chronic conditions like diabetes. Readmissions negatively impact patient outcomes, increase healthcare costs, and strain the capacity of the healthcare system. In response, Medicare established the Hospital Readmissions Reduction Program (HRRP), which financially penalizes hospitals with higher-than-expected readmission rates. To address this problem, I developed machine learning models that identify patients at risk for 30-day readmission. These models analyze key features and patterns to inform treatment plans and reduce the likelihood of readmission. With these models, I aim to address the challenge of preventable readmissions, improve patient outcomes, and reduce healthcare costs. The models can also be used in real time to predict the probability of a patient's readmission, enabling providers to modify treatment plans dynamically and optimize patient care.

Project Goals

Develop a machine learning model to predict <30-day readmission based on patient treatment and discharge data
Identify key patient and treatment features that are most predictive of readmissions
Provide recommendations to help reduce the number of preventable readmissions

Methods

Data

Source [1]

Overview

The "Diabetes 130-US hospitals for years 1999-2008" dataset from the UCI Machine Learning Repository (University of California, Irvine) represents 10 years of clinical care at 130 US hospitals and integrated delivery networks. It includes over 101,000 instances with more than 50 features covering demographic information, health history, admission and treatment information, and discharge information.

Preparation

The dataset includes three classes for readmission, which I simplified into a binary classification for this analysis: Class 0 combines patients with no readmission (NO) and patients readmitted after more than 30 days (>30), while Class 1 contains patients readmitted within 30 days (<30). The dataset underwent several preprocessing steps, including de-noising, feature engineering, and undersampling to address class imbalance. These steps were taken to improve data quality and the performance of the machine learning models used in the analysis. The data was split into training (80%) and validation (20%) sets.
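The label simplification, undersampling, and 80/20 split described above can be sketched in plain Python. This is a minimal illustration on toy rows, not the notebook's actual code: the helper names (binarize, undersample, train_valid_split), the row layout, and the choice to undersample before splitting are all assumptions made for the example.

```python
import random

def binarize(label):
    """Map the three readmission labels to a binary target: 1 only for "<30"."""
    return 1 if label == "<30" else 0

def undersample(rows, rng):
    """Randomly drop majority-class rows until both classes are the same size."""
    pos = [r for r in rows if r["target"] == 1]
    neg = [r for r in rows if r["target"] == 0]
    minority, majority = sorted((pos, neg), key=len)
    return minority + rng.sample(majority, len(minority))

def train_valid_split(rows, valid_fraction, rng):
    """Shuffle a copy of the rows and hold out a fraction for validation."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy stand-in for the dataset (hypothetical counts): 60 NO, 30 >30, 10 <30.
labels = ["NO"] * 60 + [">30"] * 30 + ["<30"] * 10
rows = [{"encounter": i, "target": binarize(label)} for i, label in enumerate(labels)]

balanced = undersample(rows, rng)
train, valid = train_valid_split(balanced, 0.2, rng)
print(len(rows), len(balanced), len(train), len(valid))
```

With pandas and scikit-learn, the same steps are usually written with a boolean comparison on the label column and sklearn.model_selection.train_test_split.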

Models

I developed several machine learning models, including Random Forest, SVC (Support Vector Classifier), LightGBM and CatBoost (gradient boosting), and a Recurrent Neural Network (RNN), and evaluated them on accuracy, precision, recall, and F1 score.
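For reference, the four evaluation metrics can be computed from true and predicted labels as in the sketch below. The function name binary_metrics and the toy labels are hypothetical; in practice scikit-learn's accuracy_score, precision_score, recall_score, and f1_score give the same quantities.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # readmitted, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # not readmitted, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # readmitted, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # not readmitted, not flagged

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels for illustration only; 1 means readmitted within 30 days.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```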

Results

Models' Performance

After training, tuning, and evaluation, all models achieved similar metrics, which suggests that I have likely extracted most of the predictive information available in the dataset. Interestingly, the simple random forest model matched the accuracy of the more complex LightGBM and CatBoost models while achieving a higher F1 score on the target class (readmission within 30 days). Because the task is to predict readmission within 30 days of initial discharge for diabetes patients, I prioritized minimizing false negatives (high-risk patients the model misses) over false positives. I therefore chose the random forest model for further feature analysis.
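One way to see the trade-off between false negatives and false positives is to count both at different decision thresholds on predicted probabilities. The sketch below uses made-up probabilities and a hypothetical helper, error_counts; the notebook may not tune the threshold at all, so treat this purely as an illustration of the trade-off.

```python
def error_counts(y_true, y_prob, threshold):
    """Count false negatives and false positives at a given probability threshold."""
    fn = sum(1 for t, p in zip(y_true, y_prob) if t == 1 and p < threshold)
    fp = sum(1 for t, p in zip(y_true, y_prob) if t == 0 and p >= threshold)
    return fn, fp

# Toy values for illustration only; 1 means readmitted within 30 days.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.45, 0.3, 0.55, 0.4, 0.2, 0.1]

for threshold in (0.5, 0.35):
    fn, fp = error_counts(y_true, y_prob, threshold)
    print(f"threshold={threshold}: false negatives={fn}, false positives={fp}")
```

Lowering the threshold in this toy case misses fewer readmitted patients at the cost of flagging more patients who would not have been readmitted.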

Feature Importances

The relationships observed between certain variables and the risk of readmission may be correlational rather than causal. For example, hospital visits, the number of diagnoses, the number of medications, and the discharge facility may all be secondary to a patient's health status, which may be the primary predictor of readmission risk. Even so, more procedures appear to be associated with a lower risk of readmission, which may be an important factor to consider when developing strategies for preventing readmissions, although more research is necessary.
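This README does not state which importance measure the notebook uses. As a model-agnostic illustration, the sketch below hand-rolls permutation importance: shuffle one feature at a time and measure how much accuracy drops. The stand-in predict function, the synthetic data, and the helper names are all assumptions; with a fitted scikit-learn random forest, the feature_importances_ attribute or sklearn.inspection.permutation_importance would typically be used instead.

```python
import random

def accuracy(predict, X, y):
    """Fraction of rows where the prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, rng):
    """Accuracy drop when each feature column is shuffled independently."""
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
        importances.append(baseline - accuracy(predict, X_shuffled, y))
    return importances

rng = random.Random(0)

# Synthetic data: feature 0 determines the label, feature 1 is pure noise.
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(400)]
y = [int(row[0] > 0) for row in X]

def predict(row):
    """Stand-in for a trained model; it only looks at feature 0."""
    return int(row[0] > 0)

importances = permutation_importance(predict, X, y, rng)
print(importances)
```

Like the importances discussed above, this measures how much the model relies on a feature, not whether the feature causes readmission.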

Discharge Facilities by Readmission Probability

Several of the important features related to the discharge facility, so I investigated this further.

My analysis revealed that discharge to a rehab facility had the highest predicted readmission probability, while discharge to home had the lowest. Additionally, transferring a patient to inpatient care at the same hospital was associated with a lower readmission probability than transferring to a different hospital. These findings demonstrate a correlation between certain variables and the risk of readmission, but do not necessarily indicate causation. Nonetheless, these insights can inform clinical decision-making and help me, as a healthcare provider, develop targeted interventions to reduce the risk of readmission for diabetes patients.
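A comparison like this can be produced by averaging the model's predicted probabilities within each discharge category. The sketch below shows the grouping step only, with placeholder category names and made-up probabilities rather than the project's results; mean_probability_by_group is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

def mean_probability_by_group(groups, probabilities):
    """Average predicted probability per group, sorted from highest to lowest."""
    buckets = defaultdict(list)
    for group, probability in zip(groups, probabilities):
        buckets[group].append(probability)
    averages = {group: mean(values) for group, values in buckets.items()}
    return dict(sorted(averages.items(), key=lambda item: item[1], reverse=True))

# Placeholder categories and probabilities, for illustration only.
groups = ["facility_a", "facility_b", "facility_a", "facility_b", "facility_c"]
probabilities = [0.25, 0.125, 0.75, 0.375, 0.5]

ranking = mean_probability_by_group(groups, probabilities)
for group, average in ranking.items():
    print(f"{group}: {average:.3f}")
```

With pandas, the same step is typically a groupby on the discharge column followed by a mean of the predicted probability column.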

Application and Value

Deployment

The machine learning model can be integrated with electronic health records (EHRs) to assess a patient's risk of readmission in real time. This integration would allow healthcare providers to intervene early with targeted interventions and informed care plans, potentially reducing the risk of readmission for diabetes patients and supporting a proactive approach to managing patient health.
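A real-time risk check could look roughly like the sketch below. Everything here is hypothetical: StubModel stands in for a trained classifier with a scikit-learn style predict_proba method, its scoring rule is arbitrary, the 0.5 threshold is a placeholder, and the EHR integration itself is not shown.

```python
from dataclasses import dataclass

class StubModel:
    """Stand-in for a trained classifier exposing a scikit-learn style predict_proba."""

    def predict_proba(self, rows):
        # One [P(class 0), P(class 1)] pair per row; the rule is arbitrary.
        results = []
        for row in rows:
            p = min(0.9, 0.1 + 0.05 * row[0])
            results.append([1 - p, p])
        return results

@dataclass
class RiskAssessment:
    probability: float  # predicted probability of readmission within 30 days
    high_risk: bool     # True when the probability meets the threshold

def assess_patient(model, features, threshold=0.5):
    """Score one patient's feature vector and flag it if it meets the threshold."""
    probability = model.predict_proba([features])[0][1]
    return RiskAssessment(probability=probability, high_risk=probability >= threshold)

model = StubModel()
flagged = assess_patient(model, [10])
not_flagged = assess_patient(model, [2])
print(flagged, not_flagged)
```

In a real deployment the feature vector would have to pass through the same preprocessing as the training data before scoring.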

Value

The machine learning model's real-time risk assessment capabilities can enable healthcare providers to develop informed care and discharge plans for high-risk patients. This targeted approach to patient care can potentially reduce the risk of readmission for diabetes patients and optimize resource allocation within healthcare systems. In addition, leveraging the predictive capabilities of the model can inform quality improvement (QI) initiatives, allowing healthcare providers to continuously improve patient outcomes and reduce healthcare costs associated with readmissions.

Conclusion

In conclusion, hospital readmissions within 30 days of discharge negatively impact patient outcomes, reduce hospital revenue through penalties under Medicare's Hospital Readmissions Reduction Program (HRRP), increase healthcare costs, and strain healthcare system capacity. The machine learning model gives healthcare providers a tool to address these challenges by identifying high-risk patients and enabling informed care and discharge plans. By leveraging the model's predictive capabilities, healthcare providers can improve patient outcomes, reduce healthcare costs, and optimize resource allocation, ultimately improving the quality of care for diabetes patients.

Reproducibility

This notebook takes about 1.3 minutes to complete on my 2018 iMac with the following specs:

OS: macOS Ventura 13.1
Processor: 3.1 GHz 6-Core Intel Core i5
Memory: 64 GB 2667 MHz DDR4

To reproduce this project, follow these steps:

Clone the project repository or download the project files from GitHub.
Install Python 3.10 on your local machine, if it is not already installed. (I used 3.10.11 specifically)
Create a new virtual environment using the venv module in Python. For example, in your terminal or command prompt, navigate to the project directory and run the following command:

python3.10 -m venv myenv

Activate the virtual environment by running the following command:

source myenv/bin/activate

On Windows, the command is:

myenv\Scripts\activate.bat

Install the required Python packages and their dependencies listed in the requirements.txt file by running the following command:

pip3.10 install -r requirements.txt

Launch Jupyter Notebook and select 'myenv' as the kernel.
Run the cells in the notebook to reproduce the results of the project.

The data in the dataset_diabetes directory has not been modified from the original downloaded from the UCI ML repo, but you may download the data directly from the link in Sources below if you would like. You can replace the directory with the download without changing any file names, and the paths will work as expected.

Sources

[1] Clore, J., Cios, K., DeShazo, J., & Strack, B. (2014). Diabetes 130-US hospitals for years 1999-2008. UCI Machine Learning Repository. https://doi.org/10.24432/C5230J

ACB-prgm/HospitalReadmissionPrediction
