# Predicting Patient Length of Stay (LoS) in Hospitals

# 1. Problem statement

Hospitals and healthcare facilities often struggle to manage resources efficiently and plan patient care because patient Length of Stay (LoS) varies widely. Accurately predicting LoS can lead to better resource allocation, improved patient management, and lower operational costs. Despite advances in medical technology and data collection, there is still a need for models that integrate diverse patient data sources to predict LoS more accurately.

# 2. Objective

To develop a machine learning regression model that predicts the Length of Stay (LoS) for patients based on their medical history, demographics, and other relevant factors. The goal is to create a model that helps hospitals forecast patient discharge times more accurately, thereby optimizing bed management and resource allocation.

# 3. Import Libraries

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import LabelEncoder
```

# 4. Load Data

```python
df = pd.read_csv("LengthOfStay.csv")
```

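```python
df.head(5)
```

|   | eid | vdate | rcount | gender | dialysisrenalendstage | asthma | irondef | pneum | substancedependence | psychologicaldisordermajor | ... | glucose | bloodureanitro | creatinine | bmi | pulse | respiration | secondarydiagnosisnonicd9 | discharged | facid | lengthofstay |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 8/29/2012 | 0 | F | 0 | 0 | 0 | 0 | 0 | 0 | ... | 192.476918 | 12.0 | 1.390722 | 30.432418 | 96 | 6.5 | 4 | 9/1/2012 | B | 3 |
| 1 | 2 | 5/26/2012 | 5+ | F | 0 | 0 | 0 | 0 | 0 | 0 | ... | 94.078507 | 8.0 | 0.943164 | 28.460516 | 61 | 6.5 | 1 | 6/2/2012 | A | 7 |
| 2 | 3 | 9/22/2012 | 1 | F | 0 | 0 | 0 | 0 | 0 | 0 | ... | 130.530524 | 12.0 | 1.06575 | 28.843812 | 64 | 6.5 | 2 | 9/25/2012 | B | 3 |
| 3 | 4 | 8/9/2012 | 0 | F | 0 | 0 | 0 | 0 | 0 | 0 | ... | 163.377028 | 12.0 | 0.906862 | 27.959007 | 76 | 6.5 | 1 | 8/10/2012 | A | 1 |
| 4 | 5 | 12/20/2012 | 0 | F | 0 | 0 | 0 | 1 | 0 | 1 | ... | 94.886654 | 11.5 | 1.242854 | 30.258927 | 67 | 5.6 | 2 | 12/24/2012 | E | 4 |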


# 4.1 Data preprocessing

```python
# Convert date columns to datetime
df['vdate'] = pd.to_datetime(df['vdate'])
df['discharged'] = pd.to_datetime(df['discharged'])
```

```python
# Drop the patient ID and the date columns; 'discharged' minus 'vdate'
# would reveal the target, lengthofstay
df = df.drop(columns=['eid', 'vdate', 'discharged'])
```
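Dropping `discharged` is more than housekeeping. On the five rows shown by `df.head(5)`, the number of days from `vdate` to `discharged` equals `lengthofstay` exactly, so keeping the column would hand the model the answer. The sketch below rebuilds only the date and target columns of those five rows by hand and checks the identity; on the full data the same comparison could be run on `df` before the drop cell.

```python
import pandas as pd

# Date and target columns of the five rows shown by df.head(5)
sample = pd.DataFrame({
    'vdate': ['8/29/2012', '5/26/2012', '9/22/2012', '8/9/2012', '12/20/2012'],
    'discharged': ['9/1/2012', '6/2/2012', '9/25/2012', '8/10/2012', '12/24/2012'],
    'lengthofstay': [3, 7, 3, 1, 4],
})

# Parse month/day/year strings explicitly
sample['vdate'] = pd.to_datetime(sample['vdate'], format='%m/%d/%Y')
sample['discharged'] = pd.to_datetime(sample['discharged'], format='%m/%d/%Y')

# Whole days between visit and discharge
stay_days = (sample['discharged'] - sample['vdate']).dt.days

# True when every row's stay equals the target, i.e. the column leaks it
leaks_target = (stay_days == sample['lengthofstay']).all()

print(stay_days.tolist())  # [3, 7, 3, 1, 4]
print(leaks_target)        # True
```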

```python
# Encode categorical variables, keeping one fitted encoder per column
# so the same mapping can be reused when making new predictions
label_encoders = {}
for col in ['rcount', 'gender', 'facid']:
    label_encoders[col] = LabelEncoder()
    df[col] = label_encoders[col].fit_transform(df[col])
```
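The Flask app in Section 6 feeds the model whatever numbers the form submits, so raw categories such as `F`, `5+`, or `B` have to be converted to the same integer codes used in training. Reusing one `LabelEncoder` for several columns keeps only the mapping of the last column fitted, which is why one encoder per column is worth keeping. The category sets below are assumptions inferred from the encoded values in `X.head(5)` (`5+` becomes 5, facility `E` becomes 4); the real sets should be read from each encoder's `classes_` after fitting on the data.

```python
from sklearn.preprocessing import LabelEncoder

# Assumed category sets (inferred, not read from the data):
# readmission counts '0' to '4' plus '5+', and facilities 'A' to 'E'
encoders = {
    'rcount': LabelEncoder().fit(['0', '1', '2', '3', '4', '5+']),
    'facid': LabelEncoder().fit(['A', 'B', 'C', 'D', 'E']),
}

# Encode raw values the way a prediction request would need them
rcount_codes = encoders['rcount'].transform(['0', '5+', '1']).tolist()
facid_codes = encoders['facid'].transform(['B', 'A', 'E']).tolist()
print(rcount_codes)  # [0, 5, 1]
print(facid_codes)   # [1, 0, 4]

# classes_ holds the learned order: the label at index i is encoded as i
print(encoders['facid'].classes_.tolist())  # ['A', 'B', 'C', 'D', 'E']
```

LabelEncoder sorts labels, so `rcount` codes happen to preserve the natural order of readmission counts; for unordered categories such as `facid`, one-hot encoding is a common alternative.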


# 5. Model selection and building

```python
# Define features and target variable
X = df.drop(columns=['lengthofstay'])
y = df['lengthofstay']
```

```python
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```


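```python
X.head(5)
```

|   | rcount | gender | dialysisrenalendstage | asthma | irondef | pneum | substancedependence | psychologicaldisordermajor | depress | psychother | ... | neutrophils | sodium | glucose | bloodureanitro | creatinine | bmi | pulse | respiration | secondarydiagnosisnonicd9 | facid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 14.2 | 140.361132 | 192.476918 | 12.0 | 1.390722 | 30.432418 | 96 | 6.5 | 4 | 1 |
| 1 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 4.1 | 136.731692 | 94.078507 | 8.0 | 0.943164 | 28.460516 | 61 | 6.5 | 1 | 0 |
| 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 8.9 | 133.058514 | 130.530524 | 12.0 | 1.06575 | 28.843812 | 64 | 6.5 | 2 | 1 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 9.4 | 138.994023 | 163.377028 | 12.0 | 0.906862 | 27.959007 | 76 | 6.5 | 1 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | ... | 9.05 | 138.634836 | 94.886654 | 11.5 | 1.242854 | 30.258927 | 67 | 5.6 | 2 | 4 |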


```python
y.head(5)
```

```
0    3
1    7
2    3
3    1
4    4
Name: lengthofstay, dtype: int64
```

```python
# Create the Random Forest Regression model
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)

# Train the model
rf_model.fit(X_train, y_train)
```
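Once trained, a Random Forest exposes `feature_importances_`, which gives a rough view of which inputs drive the predicted stay. A small helper, shown here on synthetic data where only one column carries signal, might look like the sketch below; `top_features` is a hypothetical name, and with the notebook's model it would be called as `top_features(rf_model, X_train.columns, k=10)`. Impurity-based importances can favour features with many distinct values, so `sklearn.inspection.permutation_importance` on the test set is a useful cross-check.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def top_features(model, feature_names, k=5):
    """Return the k features with the highest impurity-based importance."""
    importances = pd.Series(model.feature_importances_, index=feature_names)
    return importances.sort_values(ascending=False).head(k)

# Synthetic demo: the target depends only on 'signal', so it should rank first
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({'signal': rng.normal(size=200), 'noise': rng.normal(size=200)})
y_demo = 3 * X_demo['signal'] + rng.normal(scale=0.1, size=200)

demo_model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_demo, y_demo)
ranking = top_features(demo_model, X_demo.columns, k=2)
print(ranking)
```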


```python
# Make predictions on the testing data
y_pred = rf_model.predict(X_test)

# Evaluate the model; RMSE is in the same unit as the target (days)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Root Mean Squared Error (RMSE): {rmse:.3f} days")

# Show a sample of actual vs. predicted lengths of stay
predictions_sample = pd.DataFrame({
    'Actual': y_test,
    'Predicted': y_pred
})
print(predictions_sample.head())
```
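RMSE is the square root of the mean squared difference between actual and predicted stays, so it is expressed in days. On its own it is hard to judge; comparing it with a naive baseline that always predicts the mean stay shows whether the model adds value. The worked example below uses the five actual stays from `y.head(5)` together with hypothetical model predictions; in the notebook, the baseline would more properly be `sklearn.dummy.DummyRegressor(strategy='mean')` fitted on `y_train`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (days)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Actual stays from the y.head(5) output; the model predictions are hypothetical
y_true = [3, 7, 3, 1, 4]
y_model = [3.5, 6.0, 2.5, 1.5, 4.0]

# Naive baseline: always predict the mean stay (mean of y_true, for illustration only)
y_baseline = [np.mean(y_true)] * len(y_true)

print(round(rmse(y_true, y_model), 4))     # 0.5916
print(round(rmse(y_true, y_baseline), 4))  # 1.9596

# Same value as scikit-learn's MSE followed by a square root
print(np.isclose(rmse(y_true, y_model), np.sqrt(mean_squared_error(y_true, y_model))))  # True
```

Here the squared errors of the hypothetical model sum to 1.75, giving sqrt(1.75 / 5) ≈ 0.59 days, against about 1.96 days for the mean predictor.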


# 6. Model deployment using Flask

```python
import pickle

# Save the model to a file using pickle
model_filename = 'random_forest_model.pkl'
with open(model_filename, 'wb') as file:
    pickle.dump(rf_model, file)

print(f"Model saved to {model_filename}")
```
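Pickling the bare model leaves the Flask app to hard-code the feature order. One option, sketched below on a two-column synthetic stand-in for `X_train` (encoded `rcount` and `pulse` values from `X.head(5)`), is to pickle a dictionary holding the model together with its training column order, then confirm that the reloaded model reproduces the original predictions. The `artifact` layout is a hypothetical choice, not something the notebook defines. Pickles should be loaded with the same scikit-learn version that produced them, and only from trusted sources, since unpickling can execute arbitrary code.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Tiny stand-in for X_train / y_train, built from rows of X.head(5) and y.head(5)
X_demo = pd.DataFrame({'rcount': [0, 5, 1, 0, 0], 'pulse': [96, 61, 64, 76, 67]})
y_demo = pd.Series([3, 7, 3, 1, 4])
model = RandomForestRegressor(n_estimators=10, random_state=42).fit(X_demo, y_demo)

# Hypothetical artifact layout: the model plus the column order it was trained on
artifact = {'model': model, 'feature_names': list(X_demo.columns)}
blob = pickle.dumps(artifact)  # in the notebook this would go to a .pkl file

# Reload and check that predictions are unchanged
restored = pickle.loads(blob)
same = np.array_equal(restored['model'].predict(X_demo), model.predict(X_demo))
print(restored['feature_names'])  # ['rcount', 'pulse']
print(same)                       # True
```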


```python
from flask import Flask, request, render_template
import pickle
import pandas as pd

app = Flask(__name__)

# Load the model saved in the previous cell
with open('random_forest_model.pkl', 'rb') as file:
    model = pickle.load(file)

# Feature columns, in the same order as X during training
feature_names = ['rcount', 'gender', 'dialysisrenalendstage', 'asthma', 'irondef',
                 'pneum', 'substancedependence', 'psychologicaldisordermajor',
                 'depress', 'psychother', 'fibrosisandother', 'malnutrition', 'hemo',
                 'hematocrit', 'neutrophils', 'sodium', 'glucose', 'bloodureanitro',
                 'creatinine', 'bmi', 'pulse', 'respiration',
                 'secondarydiagnosisnonicd9', 'facid']

# Route for home page
@app.route('/')
def home():
    return render_template('index.html')

# Route for prediction
@app.route('/predict', methods=['POST'])
def predict():
    # Read each field by name so the column order does not depend on the order
    # of the inputs in index.html (assumes the input names match feature_names).
    # rcount, gender and facid must be submitted as their label-encoded integers.
    input_features = [float(request.form[name]) for name in feature_names]

    # Convert features to a one-row DataFrame for model input
    features = pd.DataFrame([input_features], columns=feature_names)

    # Predict using the model
    prediction = model.predict(features)[0]

    return render_template('index.html',
                           prediction_text=f'Predicted Length of Stay: {prediction:.2f} days')

if __name__ == '__main__':
    # debug=True is for local development only; do not enable it in production
    app.run(debug=True)
```
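The parsing step in `/predict` is the part most likely to fail in practice: a missing or misnamed form field, or a raw category such as `F` submitted instead of its encoded integer. Pulling it into a plain function makes it testable without starting the server. `form_to_features` is a hypothetical helper; because it only needs a mapping, it accepts Flask's `request.form` as well as an ordinary `dict`.

```python
import pandas as pd

# Training column order, copied from the feature list in the Flask app
FEATURE_NAMES = ['rcount', 'gender', 'dialysisrenalendstage', 'asthma', 'irondef',
                 'pneum', 'substancedependence', 'psychologicaldisordermajor',
                 'depress', 'psychother', 'fibrosisandother', 'malnutrition', 'hemo',
                 'hematocrit', 'neutrophils', 'sodium', 'glucose', 'bloodureanitro',
                 'creatinine', 'bmi', 'pulse', 'respiration',
                 'secondarydiagnosisnonicd9', 'facid']

def form_to_features(form, feature_names=FEATURE_NAMES):
    """Convert a mapping of form fields into a one-row DataFrame in training order.

    Raises ValueError listing any missing fields; float() also raises ValueError
    for non-numeric values such as an unencoded 'F' or 'B'.
    """
    missing = [name for name in feature_names if name not in form]
    if missing:
        raise ValueError(f"Missing form fields: {missing}")
    row = [float(form[name]) for name in feature_names]
    return pd.DataFrame([row], columns=feature_names)

# Hypothetical submission: every field '0' except a few values from row 0 of X.head(5)
form = {name: '0' for name in FEATURE_NAMES}
form.update({'pulse': '96', 'bmi': '30.432418', 'facid': '1'})
features = form_to_features(form)
print(features.shape)  # (1, 24)
```

Inside `predict()`, wrapping this call in `try`/`except ValueError` would let the app render an error message on the page instead of returning a server error.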


# 7. Conclusion

This notebook predicts patient Length of Stay (LoS) with a Random Forest regression model, and the approach is promising. By leveraging patient demographics, medical history, and clinical measurements, the model forecasts how many days a patient is likely to stay, which can help hospitals optimize bed management and resource allocation. Its test-set RMSE, measured in days, indicates how much the predictions can be relied on, and therefore how far the model can serve as a tool for resource planning and improved patient care.