# HOSPITAL READMISSION

Predicting hospital readmissions is an important task in healthcare analytics. The steps below build a model that flags high-risk patients, using Python with pandas and scikit-learn. Make sure the dataset is available and the required libraries are installed before proceeding.

Data Preprocessing:
Load your dataset and prepare it for analysis.

In [1]:
import pandas as pd

In [2]:
df= pd.read_csv("D:\\DATA SCIENCE ARJUN\\DS project\\hospital_readmissions.csv")

In [3]:
df

Unnamed: 0,Patient_ID,Age,Gender,Admission_Type,Diagnosis,Num_Lab_Procedures,Num_Medications,Num_Outpatient_Visits,Num_Inpatient_Visits,Num_Emergency_Visits,Num_Diagnoses,A1C_Result,Readmitted
0,1,69,Other,Emergency,Heart Disease,33,2,4,1,1,5,,Yes
1,2,32,Female,Urgent,Diabetes,81,10,4,4,1,6,,No
2,3,78,Female,Urgent,Heart Disease,75,29,4,0,3,5,Normal,No
3,4,38,Male,Elective,Diabetes,77,11,2,3,4,9,,Yes
4,5,41,Female,Urgent,Diabetes,50,25,3,4,1,3,,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,21,Female,Emergency,Heart Disease,68,10,2,3,2,9,Normal,No
996,997,43,Female,Emergency,Heart Disease,61,7,0,4,0,1,Normal,No
997,998,75,Male,Urgent,Diabetes,29,13,3,1,4,8,Normal,No
998,999,46,Other,Elective,Injury,19,20,1,4,4,1,Abnormal,No


In [6]:
df = df.dropna()  # Drop every row with at least one missing value (here, typically a missing A1C_Result)
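Judging from the displayed output, the missing values sit in `A1C_Result`, and `dropna()` discards a large share of patients: the test split later holds 114 rows, which at `test_size=0.2` implies that only roughly 570 rows survive. The sketch below, built from five rows copied from the first output, contrasts dropping those rows with keeping them under an explicit category. The label `"Not_Tested"` is a hypothetical choice, and treating a missing A1C as "not tested" is an assumption to verify against the dataset's documentation.

```python
import numpy as np
import pandas as pd

# Five rows copied from the first output above; NaN marks a missing A1C_Result.
sample = pd.DataFrame({
    "Patient_ID": [1, 2, 3, 4, 5],
    "Age": [69, 32, 78, 38, 41],
    "A1C_Result": [np.nan, np.nan, "Normal", np.nan, np.nan],
    "Readmitted": ["Yes", "No", "No", "Yes", "Yes"],
})

# Count missing values per column before deciding how to handle them.
print(sample.isna().sum())

# dropna() removes every row containing a NaN: 4 of these 5 patients.
dropped = sample.dropna()

# Alternative: keep the rows and treat a missing result as its own category.
# "Not_Tested" is a hypothetical label chosen for illustration.
filled = sample.fillna({"A1C_Result": "Not_Tested"})
print(len(dropped), len(filled))
```

Filling instead of dropping keeps every patient in the training data, at the cost of assuming the missingness itself is informative.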


In [7]:
df.head(10)

Unnamed: 0,Patient_ID,Age,Gender,Admission_Type,Diagnosis,Num_Lab_Procedures,Num_Medications,Num_Outpatient_Visits,Num_Inpatient_Visits,Num_Emergency_Visits,Num_Diagnoses,A1C_Result,Readmitted
2,3,78,Female,Urgent,Heart Disease,75,29,4,0,3,5,Normal,No
7,8,70,Female,Elective,Heart Disease,28,19,4,0,3,7,Normal,Yes
8,9,19,Male,Urgent,Infection,70,23,1,2,4,6,Normal,No
9,10,47,Male,Emergency,Injury,41,24,4,0,0,3,Abnormal,No
11,12,19,Female,Emergency,Injury,68,14,2,4,2,5,Abnormal,No
12,13,81,Female,Emergency,Heart Disease,99,2,3,4,1,7,Abnormal,Yes
14,15,38,Male,Emergency,Diabetes,56,28,3,2,1,6,Normal,No
15,16,50,Female,Emergency,Diabetes,64,5,0,4,4,8,Normal,Yes
18,19,66,Other,Emergency,Infection,62,19,1,0,1,1,Normal,No
22,23,32,Other,Elective,Injury,18,4,2,0,2,1,Normal,Yes


In [9]:
df = pd.get_dummies(df, columns=['A1C_Result'])
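`pd.get_dummies` replaces a categorical column with one indicator column per observed value, which is why `A1C_Result` becomes `A1C_Result_Abnormal` and `A1C_Result_Normal` in the next output. A minimal sketch on a toy column (an assumption, not the real data) also shows `dummy_na=True`, which adds an explicit indicator for missing values, and `dtype=int` for 0/1 output instead of booleans. Note that, as the pipeline is written further down, neither A1C dummy is listed among its features, so `ColumnTransformer` (default `remainder='drop'`) discards them before the model sees them.

```python
import pandas as pd

# Toy column with the two A1C values seen in the data plus one missing entry.
toy = pd.DataFrame({"A1C_Result": ["Normal", "Abnormal", None]})

# Default: one indicator per observed category; the missing row gets no indicator set.
default = pd.get_dummies(toy, columns=["A1C_Result"])

# dummy_na=True adds an "A1C_Result_nan" indicator; dtype=int gives 0/1 values.
with_na = pd.get_dummies(toy, columns=["A1C_Result"], dummy_na=True, dtype=int)

print(list(default.columns))
print(list(with_na.columns))
```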

In [10]:
df

Unnamed: 0,Patient_ID,Age,Gender,Admission_Type,Diagnosis,Num_Lab_Procedures,Num_Medications,Num_Outpatient_Visits,Num_Inpatient_Visits,Num_Emergency_Visits,Num_Diagnoses,Readmitted,A1C_Result_Abnormal,A1C_Result_Normal
2,3,78,Female,Urgent,Heart Disease,75,29,4,0,3,5,No,False,True
7,8,70,Female,Elective,Heart Disease,28,19,4,0,3,7,Yes,False,True
8,9,19,Male,Urgent,Infection,70,23,1,2,4,6,No,False,True
9,10,47,Male,Emergency,Injury,41,24,4,0,0,3,No,True,False
11,12,19,Female,Emergency,Injury,68,14,2,4,2,5,No,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,996,21,Female,Emergency,Heart Disease,68,10,2,3,2,9,No,False,True
996,997,43,Female,Emergency,Heart Disease,61,7,0,4,0,1,No,False,True
997,998,75,Male,Urgent,Diabetes,29,13,3,1,4,8,No,False,True
998,999,46,Other,Elective,Injury,19,20,1,4,4,1,No,True,False


In [13]:
# Split the data into features (X) and the target variable (y)
X = df.drop('Readmitted', axis=1)
y = df['Readmitted']
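`X` still contains the identifier columns `Unnamed: 0` and `Patient_ID`. Their values track the row order, so `Unnamed: 0` is most likely the index saved alongside the CSV (passing `index_col=0` to `pd.read_csv` would avoid creating it). The pipeline below happens to ignore both because they are not in its feature lists, but dropping them explicitly makes the intent clear. A minimal sketch on two rows copied from the output above, keeping only a few columns; the `_demo` names are hypothetical and chosen so the notebook's own `X` and `y` are not overwritten.

```python
import pandas as pd

# Two rows from the output above, with only a few columns kept for brevity.
frame = pd.DataFrame({
    "Unnamed: 0": [2, 7],
    "Patient_ID": [3, 8],
    "Age": [78, 70],
    "Num_Medications": [29, 19],
    "Readmitted": ["No", "Yes"],
})

# Identifiers carry no clinical signal, so drop them together with the target.
id_columns = ["Unnamed: 0", "Patient_ID"]
X_demo = frame.drop(columns=["Readmitted", *id_columns])
y_demo = frame["Readmitted"]
print(list(X_demo.columns))
```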


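Feature Engineering:
Create new features or transform existing ones to improve the model's predictive power. In this notebook the transformation is standardization: rescaling numeric columns to zero mean and unit variance.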
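The notebook itself only rescales existing columns. As an illustration of creating new features, the sketch below derives two hypothetical ones: `Total_Visits`, the sum of the three visit counts, and `Age_Group`, a coarse age band. Both names, and the band edges 40 and 65, are assumptions rather than part of the original analysis; the values are the first three patients from the output above. To use such features, they would have to be added to `numeric_features` or `categorical_features` in the pipeline.

```python
import pandas as pd

# Age and visit counts for the first three patients in the output above.
features = pd.DataFrame({
    "Age": [69, 32, 78],
    "Num_Outpatient_Visits": [4, 4, 4],
    "Num_Inpatient_Visits": [1, 4, 0],
    "Num_Emergency_Visits": [1, 1, 3],
})

# Hypothetical feature 1: total prior visits across all care settings.
visit_cols = ["Num_Outpatient_Visits", "Num_Inpatient_Visits", "Num_Emergency_Visits"]
features["Total_Visits"] = features[visit_cols].sum(axis=1)

# Hypothetical feature 2: age bands using left-closed intervals [0, 40), [40, 65), [65, 120).
features["Age_Group"] = pd.cut(
    features["Age"],
    bins=[0, 40, 65, 120],
    right=False,
    labels=["<40", "40-64", "65+"],
)
print(features[["Total_Visits", "Age_Group"]])
```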

In [36]:
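from sklearn.preprocessing import StandardScaler

numeric_column_name = 'Num_Medications'

# Create a StandardScaler instance
scaler = StandardScaler()

# StandardScaler expects 2-D input, so pass the column as a one-column DataFrame
# and flatten the (n, 1) result back into a single column.
# Note: fitting on all of X before the train/test split leaks test-set statistics;
# the pipeline in the next cell rescales this column using X_train only.
X[numeric_column_name] = scaler.fit_transform(X[[numeric_column_name]]).ravel()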
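To avoid leakage, a scaler should learn its mean and standard deviation from the training rows only and then reuse those statistics on the test rows. This is what the `Pipeline` in the next cell does automatically. A minimal sketch of the same idea done by hand, using the ten `Num_Medications` values from the `df.head(10)` output as stand-in data; the `demo_` names are hypothetical and avoid overwriting the notebook's `scaler`.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Num_Medications for the ten rows shown by df.head(10) above, as a single column.
meds = np.array([29, 19, 23, 24, 14, 2, 28, 5, 19, 4], dtype=float).reshape(-1, 1)

train, test = train_test_split(meds, test_size=0.2, random_state=42)

demo_scaler = StandardScaler()
train_scaled = demo_scaler.fit_transform(train)  # learn mean/std from training rows only
test_scaled = demo_scaler.transform(test)        # reuse those statistics on held-out rows

# The training column is centred with unit variance; the test column generally is not.
print(train_scaled.mean(), train_scaled.std())
print(test_scaled.ravel())
```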

In [46]:
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

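# Define your numeric and categorical features.
# Columns not listed here (e.g. Age, Num_Lab_Procedures, the ID columns and the
# A1C_Result dummies) are dropped, because ColumnTransformer defaults to remainder='drop'.
numeric_features = ['Num_Diagnoses', 'Num_Medications', 'Num_Outpatient_Visits']
categorical_features = ['Gender', 'Admission_Type', 'Diagnosis']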

# Preprocessing steps
numeric_transformer = Pipeline(steps=[
    ('scaler', StandardScaler())])

categorical_transformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(drop='first'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

# Define the model
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Create a pipeline for preprocessing and modeling
pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                           ('model', model)])

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the pipeline on the training data
pipeline.fit(X_train, y_train)

# Make predictions on the test data
y_pred = pipeline.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')


Accuracy: 0.5175438596491229
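An accuracy of about 0.52 is best judged against a trivial baseline. The support column of the classification report below shows the test split holds 67 "Yes" and 47 "No" rows, so a model that always answers "Yes" would score 67/114 ≈ 0.588, better than the random forest. The sketch below reproduces that baseline with scikit-learn's `DummyClassifier` on stand-in arrays that have the same class counts (features are ignored by this strategy). In the notebook itself, the equivalent check would fit `DummyClassifier` on `X_train, y_train` and score it on `X_test, y_test`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Stand-in labels with the test split's class counts: 47 "No", 67 "Yes".
y_test_demo = np.array(["No"] * 47 + ["Yes"] * 67)
X_test_demo = np.zeros((len(y_test_demo), 1))  # ignored by the constant strategy

# Always predict "Yes", the majority class in this test split.
baseline = DummyClassifier(strategy="constant", constant="Yes")
baseline.fit(X_test_demo, y_test_demo)
baseline_accuracy = accuracy_score(y_test_demo, baseline.predict(X_test_demo))
print(round(baseline_accuracy, 3))  # 0.588
```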


In [52]:

# Display confusion matrix and classification report
confusion = confusion_matrix(y_test, y_pred)
print('Confusion Matrix:\n', confusion)

classification_rep = classification_report(y_test, y_pred)
print('Classification Report:\n', classification_rep)


Confusion Matrix:
 [[28 19]
 [36 31]]
Classification Report:
               precision    recall  f1-score   support

          No       0.44      0.60      0.50        47
         Yes       0.62      0.46      0.53        67

    accuracy                           0.52       114
   macro avg       0.53      0.53      0.52       114
weighted avg       0.54      0.52      0.52       114
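In scikit-learn's confusion matrix, rows are true labels and columns are predicted labels, both in sorted order (`No`, `Yes`). The per-class figures in the report follow directly from the four counts, as the sketch below recomputes. For flagging high-risk patients the key number is recall for `Yes`: the model catches 31 of the 67 readmitted patients and misses the other 36.

```python
import numpy as np

# Confusion matrix reported above: rows = true label, columns = predicted label,
# both ordered ["No", "Yes"].
cm = np.array([[28, 19],
               [36, 31]])

tp = np.diag(cm)                  # correct predictions per class
precision = tp / cm.sum(axis=0)   # column sums: how often each class was predicted
recall = tp / cm.sum(axis=1)      # row sums: true support of each class
f1 = 2 * precision * recall / (precision + recall)
overall_accuracy = tp.sum() / cm.sum()

for label, p, r, f in zip(["No", "Yes"], precision, recall, f1):
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"accuracy={overall_accuracy:.2f}")
```

The printed values match the classification report: 0.44/0.60/0.50 for `No`, 0.62/0.46/0.53 for `Yes`, and accuracy 0.52.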

