# Patient Readmission Risk Prediction

## Background
Hospitals aim to minimize unplanned readmissions within 30 days of discharge, which often indicate poor patient outcomes or inadequate discharge planning. This project trains a machine learning model to predict whether a patient is at risk of readmission from features such as age, length of stay, and the number of lab procedures, other procedures, and medications.

## Goals
The goals for this project are to create:
1. Functional Flask App: Accepts input data for a patient and predicts the probability of readmission within 30 days. Includes a health check route.
2. Pre-Trained ML Model: Logistic Regression or Random Forest model trained on a healthcare dataset.
3. API Documentation: Swagger/OpenAPI or Postman documentation, with example requests and responses.
4. Tested API Functionality: Validate the API using test cases with Postman.
5. Code Documentation: Inline comments and a clear README file.


## Importing Libraries and Datasets

In [8]:
# Import libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score
import joblib
import os

In [13]:
# Set working directory to 'readmission_app'
os.chdir('/Users/davidbrici/my_portfolio/ml_portfolio/Projects/3_Machine-Learning/readmission_app')

In [16]:
# Load the dataset
pd.set_option('display.max_columns', None)
data = pd.read_csv('data/diabetic_data.csv')
data.head()

Unnamed: 0,encounter_id,patient_nbr,race,gender,age,weight,admission_type_id,discharge_disposition_id,admission_source_id,time_in_hospital,payer_code,medical_specialty,num_lab_procedures,num_procedures,num_medications,number_outpatient,number_emergency,number_inpatient,diag_1,diag_2,diag_3,number_diagnoses,max_glu_serum,A1Cresult,metformin,repaglinide,nateglinide,chlorpropamide,glimepiride,acetohexamide,glipizide,glyburide,tolbutamide,pioglitazone,rosiglitazone,acarbose,miglitol,troglitazone,tolazamide,examide,citoglipton,insulin,glyburide-metformin,glipizide-metformin,glimepiride-pioglitazone,metformin-rosiglitazone,metformin-pioglitazone,change,diabetesMed,readmitted
0,2278392,8222157,Caucasian,Female,[0-10),?,6,25,1,1,?,Pediatrics-Endocrinology,41,0,1,0,0,0,250.83,?,?,1,,,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,NO
1,149190,55629189,Caucasian,Female,[10-20),?,1,1,7,3,?,?,59,0,18,0,0,0,276.0,250.01,255,9,,,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,Up,No,No,No,No,No,Ch,Yes,>30
2,64410,86047875,AfricanAmerican,Female,[20-30),?,1,1,7,2,?,?,11,5,13,2,0,1,648.0,250,V27,6,,,No,No,No,No,No,No,Steady,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,Yes,NO
3,500364,82442376,Caucasian,Male,[30-40),?,1,1,7,2,?,?,44,1,16,0,0,0,8.0,250.43,403,7,,,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,No,Up,No,No,No,No,No,Ch,Yes,NO
4,16680,42519267,Caucasian,Male,[40-50),?,1,1,7,1,?,?,51,0,8,0,0,0,197.0,157,250,5,,,No,No,No,No,No,No,Steady,No,No,No,No,No,No,No,No,No,No,Steady,No,No,No,No,No,Ch,Yes,NO
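
The preview shows `?` in columns such as `weight` and `payer_code`, which appears to be this dataset's marker for a missing value. One way to have pandas treat it as missing at load time is the `na_values` argument of `read_csv`. The sketch below uses a made-up three-row CSV (the rows are illustrative, not taken from the dataset):

```python
import io
import pandas as pd

# Tiny made-up CSV that mimics the '?' placeholder seen in the preview.
toy_csv = io.StringIO(
    "age,weight,payer_code,time_in_hospital\n"
    "[0-10),?,?,1\n"
    "[10-20),?,MC,3\n"
    "[20-30),[50-75),?,2\n"
)

# na_values tells read_csv to parse '?' as NaN instead of a literal string.
toy = pd.read_csv(toy_csv, na_values="?")

# Count missing entries per column.
missing_counts = toy.isna().sum()
print(missing_counts)
```

With the placeholders parsed as NaN, `isna()` and `fillna()` see them like any other missing value.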


## Preprocess

In [17]:
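# Replace NaN values with the placeholder "Unknown"
# (the '?' markers shown in the preview are ordinary strings and are left unchanged)
data.fillna("Unknown", inplace=True)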

# Drop unnecessary columns
data.drop(['encounter_id', 'patient_nbr', 'weight', 'payer_code', 'medical_specialty'], axis=1, inplace=True)

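# Map 'readmitted' to binary: 1 if readmitted within 30 days ('<30'), else 0 ('>30' or 'NO')
data['readmitted'] = data['readmitted'].apply(lambda x: 1 if x == '<30' else 0)

# Retain only the required features (plus the target)
selected_features = ['age', 'num_lab_procedures', 'num_medications', 'num_procedures', 'time_in_hospital', 'readmitted']
# .copy() avoids pandas' SettingWithCopyWarning when 'age' is reassigned below
data = data[selected_features].copy()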

# Convert 'age' to a numerical value (e.g., taking the lower bound of the range)
data['age'] = data['age'].str.extract(r'(\d+)').astype(int)
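
To make the two conversions above concrete, here is a small worked example on a made-up three-row frame. The values follow the label formats visible in the data preview, and the variable name `toy` is only for illustration:

```python
import pandas as pd

# Made-up rows using the label formats shown in the data preview.
toy = pd.DataFrame({
    "age": ["[0-10)", "[10-20)", "[70-80)"],
    "readmitted": ["NO", ">30", "<30"],
})

# Only '<30' counts as the positive class; '>30' and 'NO' both become 0.
toy["readmitted"] = (toy["readmitted"] == "<30").astype(int)

# The first run of digits in '[70-80)' is the lower bound, 70.
# expand=False makes str.extract return a Series rather than a DataFrame.
toy["age"] = toy["age"].str.extract(r"(\d+)", expand=False).astype(int)

print(toy)
```

The ages become 0, 10, and 70, and only the `<30` row is labeled 1. Note that a readmission after more than 30 days is grouped with no readmission at all.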

In [18]:
# Define target and feature variables
X = data.drop('readmitted', axis=1)
y = data['readmitted']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
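
Readmissions within 30 days are the minority class (roughly 11% of the test set in the evaluation further down), so it can help to keep the class ratio identical in both splits. The sketch below shows the `stratify` argument of `train_test_split` on synthetic labels. `X_demo` and `y_demo` are placeholders, not the notebook's data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with 11% positives, similar to the imbalance in this dataset.
y_demo = np.array([1] * 110 + [0] * 890)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature

# stratify=y_demo keeps the share of positives the same in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"positive rate: train={train_rate:.3f}, test={test_rate:.3f}")
```

Applied to this notebook, that would mean adding `stratify=y` to the existing call.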


## Train model
Train a Logistic Regression model as a baseline. A Random Forest is a possible alternative that may perform better.


In [19]:
# Train a Logistic Regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
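
The Random Forest alternative mentioned above exposes the same `fit` / `predict_proba` interface, so swapping it in would be a small change. The sketch below uses synthetic data, because the real features are not reproduced here. The label rule and the hyperparameters are assumptions chosen only for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the five numeric features used in this notebook.
X_demo = rng.normal(size=(500, 5))
# Hypothetical label rule, only there to give the forest something to learn.
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 1.0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_demo, y_demo)

# Column 1 of predict_proba is the estimated probability of class 1.
proba = forest.predict_proba(X_demo[:3])[:, 1]
print(proba)
```

Whether a forest actually improves on logistic regression for this dataset would need to be checked on the held-out test set.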

In [20]:
# Evaluate the model
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

Accuracy: 0.887737054141692
              precision    recall  f1-score   support

           0       0.89      1.00      0.94     18069
           1       0.00      0.00      0.00      2285

    accuracy                           0.89     20354
   macro avg       0.44      0.50      0.47     20354
weighted avg       0.79      0.89      0.83     20354
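
In the report above, class 1 has precision and recall of 0.00: the model predicts "not readmitted" for every test case, and the 0.89 accuracy simply equals the share of class 0 in the test set (18069 of 20354). One common way to counter this kind of class imbalance is `class_weight="balanced"` in `LogisticRegression`. The sketch below compares the two settings on synthetic data with a single weak feature. The data and variable names are assumptions, not the notebook's, and the result on the real data may differ:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
n_neg, n_pos = 4500, 500  # 10% positives

# One weakly informative synthetic feature: class means differ by one std dev.
X_demo = np.concatenate(
    [rng.normal(0.0, 1.0, n_neg), rng.normal(1.0, 1.0, n_pos)]
).reshape(-1, 1)
y_demo = np.concatenate([np.zeros(n_neg, dtype=int), np.ones(n_pos, dtype=int)])

plain = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)
balanced = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_demo, y_demo)

# Recall on the minority class (measured on the training data for brevity).
recall_plain = recall_score(y_demo, plain.predict(X_demo))
recall_balanced = recall_score(y_demo, balanced.predict(X_demo))
print(f"recall for class 1: plain={recall_plain:.2f}, balanced={recall_balanced:.2f}")
```

Reweighting trades precision and overall accuracy for recall on the minority class, so the classification report is a better guide than accuracy alone when judging the result.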





In [21]:
# Save the model (create the target directory first if it does not exist)
os.makedirs('models', exist_ok=True)
joblib.dump(model, 'models/readmission_model.pkl')

['models/readmission_model.pkl']
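
As a rough illustration of the first goal, the saved model could be served from a small Flask app along the following lines. This is a sketch under assumptions, not the project's actual app: the route names `/health` and `/predict`, the JSON field names, the response key `readmission_probability`, and the `create_app` factory are all hypothetical. Only the feature list is taken from the preprocessing step above.

```python
import pandas as pd
from flask import Flask, jsonify, request

# Feature order used when the model was trained in this notebook.
FEATURES = ["age", "num_lab_procedures", "num_medications",
            "num_procedures", "time_in_hospital"]


def create_app(model):
    """Build a Flask app around any object that exposes predict_proba."""
    app = Flask(__name__)

    @app.route("/health", methods=["GET"])  # hypothetical route name
    def health():
        return jsonify({"status": "ok"})

    @app.route("/predict", methods=["POST"])  # hypothetical route name
    def predict():
        payload = request.get_json(silent=True) or {}
        missing = [f for f in FEATURES if f not in payload]
        if missing:
            return jsonify({"error": f"missing features: {missing}"}), 400
        try:
            row = [float(payload[f]) for f in FEATURES]
        except (TypeError, ValueError):
            return jsonify({"error": "all features must be numeric"}), 400
        # One-row frame with the training column names and order.
        X = pd.DataFrame([row], columns=FEATURES)
        # Column 1 of predict_proba is the probability of class 1 (readmitted <30 days).
        probability = float(model.predict_proba(X)[0][1])
        return jsonify({"readmission_probability": probability})

    return app
```

With the model file saved above, `create_app(joblib.load('models/readmission_model.pkl')).run()` would start a local development server. In this sketch `age` is expected as the lower bound of the age bracket (for example 70 for `[70-80)`), matching the preprocessing.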