### <span style="color:Green">***Introduction***</span>

Hospital readmission rates are a key indicator of healthcare quality and efficiency. High readmission rates can indicate suboptimal patient care, increased healthcare costs, and strain on hospital resources. Identifying factors contributing to patient readmission can help hospitals implement preventive strategies, improve patient outcomes, and optimize resource allocation.

This study explores patient demographic, clinical, and treatment-related factors to predict hospital readmissions. By leveraging machine learning regression techniques, we aim to provide insights into which factors significantly impact readmission likelihood and how hospitals can mitigate unnecessary returns.

### <span style="color:Green">***Problem Statement***</span>

Hospital readmissions pose significant challenges in healthcare management, often leading to increased costs and poor patient experiences. Understanding the underlying socio-demographic and clinical factors influencing readmission is crucial for improving care and reducing unnecessary hospital visits. This project aims to develop a predictive model that estimates the likelihood of a patient being readmitted based on various factors such as age, diagnosis, treatment history, and hospital stay characteristics.

### <span style="color:Green">***Business Statement***</span>

For healthcare institutions, reducing readmission rates is not just a regulatory requirement but a critical factor in cost reduction, resource management, and improved patient outcomes. By accurately predicting readmissions, hospitals can design targeted interventions, personalize patient care, and optimize discharge planning. A data-driven approach will empower healthcare providers to proactively identify high-risk patients, ultimately improving overall healthcare efficiency and reducing financial burdens.

### <span style="color:Green">***Objectives***</span>


1. **Develop a Predictive Model** – Build a machine learning regression model that estimates the likelihood of patient readmission based on provided patient data. The model will utilize structured healthcare data, including patient demographics, medical history, diagnosis codes, and prescribed medications.  

2. **Identify Key Features** – Determine the most influential factors contributing to hospital readmissions. Some features, such as patient demographics and medical history, are crucial for understanding general hospital intake, while others, such as the number of lab procedures, inpatient visits, and specific medication use, are critical for predicting readmission risk. Identifying these key factors will enhance model performance and provide actionable insights for healthcare providers.  

3. **Perform Exploratory Data Analysis (EDA) and Feature Visualizations** – Conduct thorough data exploration and visualization to understand the distribution of features, detect missing values, and identify patterns associated with patient readmission. EDA will help in feature selection, transformation, and engineering for better model performance.  

4. **Enhance Patient Care Strategies** – Utilize insights from features such as the number of medications, prior hospital visits, and lab test results to create personalized follow-up care plans. For instance, patients with frequent hospital visits or multiple prescribed medications may require more intensive post-discharge monitoring to reduce readmission risks.  

5. **Support Clinical Decision-Making** – Provide healthcare professionals with data-driven insights to improve patient management. By leveraging predictive analytics, medical teams can proactively identify patients at high risk of readmission and implement timely interventions.  

6. **Evaluate Model Performance** – Assess the accuracy and reliability of the regression model using key performance metrics such as RMSE, R² score, and mean absolute error (MAE). A thorough evaluation will ensure that the model is robust and generalizable to unseen patient data.  

7. **Generate Recommendations and Conclusions** – Derive meaningful insights from the feature analysis and model results to offer actionable recommendations. This may include identifying high-risk patient groups, suggesting policy changes, or refining hospital intake processes to minimize unnecessary readmissions.  

8. **Optimize Discharge Planning** – Develop strategies to improve discharge and post-discharge care by analyzing key factors such as admission type, discharge disposition, and patient demographics. These insights can help hospitals design better transition care plans to prevent readmissions.  
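Objective 6 names RMSE, R² and MAE. The helper below is a minimal NumPy sketch of what each metric measures; `regression_report` and the sample values are hypothetical. Because `readmitted` is a categorical label (`NO`, `<30`, `>30`), these metrics only make sense once it has been encoded numerically. One example is comparing a 0/1 readmission label against a predicted probability. scikit-learn's `mean_squared_error`, `mean_absolute_error` and `r2_score` compute the same quantities.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Compute RMSE, MAE and R² for numeric targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    rmse = float(np.sqrt(np.mean(residuals ** 2)))  # root mean squared error
    mae = float(np.mean(np.abs(residuals)))         # mean absolute error

    # R² = 1 - SS_res / SS_tot; undefined when y_true is constant
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true has zero variance")
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "r2": r2}

# Hypothetical example: 0/1 readmission labels vs. predicted scores
report = regression_report([0, 1, 1, 0], [0.0, 1.0, 0.5, 0.5])
print(report)  # rmse ≈ 0.354, mae = 0.25, r2 = 0.5
```

RMSE penalizes large errors more heavily than MAE, so reporting both shows whether a model's errors are concentrated in a few badly mispredicted patients.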


### <span style="color:Green">***Data Understanding***</span>

#### *Data Import and Review*

```python
import pandas as pd

# Load the encounter-level records and the ID-to-description mapping file
df1 = pd.read_csv(r"N:\Moringa\afterM\joseline 001\Diabetes_130-US_Hospitals_1999-2008\diabetic_data.csv")
df2 = pd.read_csv(r"N:\Moringa\afterM\joseline 001\Diabetes_130-US_Hospitals_1999-2008\IDS_mapping.csv")

# Preview the first three encounters
df1.head(3)
```

| | encounter_id | patient_nbr | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | ... | citoglipton | insulin | glyburide-metformin | glipizide-metformin | glimepiride-pioglitazone | metformin-rosiglitazone | metformin-pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2278392 | 8222157 | Caucasian | Female | [0-10) | ? | 6 | 25 | 1 | 1 | ... | No | No | No | No | No | No | No | No | No | NO |
| 1 | 149190 | 55629189 | Caucasian | Female | [10-20) | ? | 1 | 1 | 7 | 3 | ... | No | Up | No | No | No | No | No | Ch | Yes | >30 |
| 2 | 64410 | 86047875 | AfricanAmerican | Female | [20-30) | ? | 1 | 1 | 7 | 2 | ... | No | No | No | No | No | No | No | No | Yes | NO |


```python
df2.head(3)
```

| | admission_type_id | description |
|---|---|---|
| 0 | 1 | Emergency |
| 1 | 2 | Urgent |
| 2 | 3 | Elective |
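`df2` is a lookup table that maps numeric IDs to readable descriptions. In the UCI distribution, `IDS_mapping.csv` appears to stack three such tables (admission type, discharge disposition and admission source) one after another, separated by blank rows. A single `read_csv` call therefore uses only the first block's header as column names. The sketch below assumes that layout and splits the file into one lookup per ID column. `split_id_mapping` and the sample rows are hypothetical. On the real file, it would be called with the same path used for `df2`.

```python
import io
import pandas as pd

def split_id_mapping(source):
    """Split a stacked ID-mapping CSV into {id_column: {id: description}} lookups.

    Assumes each block begins with a '<name>_id,description' header row and
    blocks are separated by blank or comma-only rows.
    """
    raw = pd.read_csv(source, header=None, names=["key", "description"], dtype=str)
    lookups = {}
    current = None
    for key, desc in raw.itertuples(index=False):
        if pd.isna(key):
            continue  # separator row such as ","
        key = key.strip()
        if key.endswith("_id"):
            current = key  # start of a new lookup table
            lookups[current] = {}
        elif current is None:
            raise ValueError("data row found before any '<name>_id' header row")
        else:
            lookups[current][int(key)] = None if pd.isna(desc) else desc.strip()
    return lookups

# Hypothetical excerpt in the assumed layout
sample = io.StringIO(
    "admission_type_id,description\n"
    "1,Emergency\n"
    "2,Urgent\n"
    ",\n"
    "discharge_disposition_id,description\n"
    "1,Discharged to home\n"
)
lookups = split_id_mapping(sample)

# Attach readable labels to encounter rows (a stand-in for df1)
encounters = pd.DataFrame({"admission_type_id": [1, 2, 1]})
encounters["admission_type"] = encounters["admission_type_id"].map(lookups["admission_type_id"])
print(encounters)
```

Mapping IDs to descriptions this way keeps the original ID columns intact while making admission type and discharge disposition readable in later EDA plots.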


```python
df1.shape
```

```
(101766, 50)
```

The main table holds 101,766 rows (one per hospital encounter) across 50 columns.

```python
# info() gives a concise summary of the DataFrame: index range, column dtypes and non-null counts
df1.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101766 entries, 0 to 101765
Data columns (total 50 columns):
 #   Column                    Non-Null Count   Dtype 
---  ------                    --------------   ----- 
 0   encounter_id              101766 non-null  int64 
 1   patient_nbr               101766 non-null  int64 
 2   race                      101766 non-null  object
 3   gender                    101766 non-null  object
 4   age                       101766 non-null  object
 5   weight                    101766 non-null  object
 6   admission_type_id         101766 non-null  int64 
 7   discharge_disposition_id  101766 non-null  int64 
 8   admission_source_id       101766 non-null  int64 
 9   time_in_hospital          101766 non-null  int64 
 10  payer_code                101766 non-null  object
 11  medical_specialty         101766 non-null  object
 12  num_lab_procedures        101766 non-null  int64 
 13  num_procedures            101766 non-null  int64 
 14  num_
```

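This summary shows the index range, each column's data type and its non-null count. The non-null counts guide the cleaning strategy, and the data types make it easier to group columns into numeric and categorical features. Note, however, that every column listed reports 101,766 non-null values even though the `weight` column in the `df1.head(3)` preview contains `?`. Missing values in this dataset are encoded as the string `?` rather than NaN, so the non-null counts overstate how complete the data is.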
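To get a truer picture of missingness, the `?` placeholders can be converted to NaN before counting. The dtypes can then be used to split columns into numeric and non-numeric groups. The sketch below assumes `?` is the only placeholder. `missing_report`, `split_by_dtype` and the sample rows are hypothetical. On the real data, they would be called as `missing_report(df1)` and `split_by_dtype(df1)`.

```python
import numpy as np
import pandas as pd

def missing_report(df, placeholder="?"):
    """Share of missing values per column, treating `placeholder` as missing."""
    cleaned = df.replace(placeholder, np.nan)
    share = cleaned.isna().mean().sort_values(ascending=False)
    return share[share > 0]  # keep only columns with something missing

def split_by_dtype(df):
    """Return (numeric_columns, non_numeric_columns) as lists."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    non_numeric = df.select_dtypes(exclude="number").columns.tolist()
    return numeric, non_numeric

# Hypothetical rows shaped like df1
sample = pd.DataFrame({
    "race": ["Caucasian", "?", "AfricanAmerican", "Caucasian"],
    "weight": ["?", "?", "?", "[75-100)"],
    "time_in_hospital": [1, 3, 2, 4],
})
print(missing_report(sample))   # weight 0.75, race 0.25
print(split_by_dtype(sample))   # (['time_in_hospital'], ['race', 'weight'])
```

Columns where most values are missing (as `weight` appears to be in the preview) are natural candidates for dropping rather than imputing.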

```python
df1.columns
```

```
Index(['encounter_id', 'patient_nbr', 'race', 'gender', 'age', 'weight',
       'admission_type_id', 'discharge_disposition_id', 'admission_source_id',
       'time_in_hospital', 'payer_code', 'medical_specialty',
       'num_lab_procedures', 'num_procedures', 'num_medications',
       'number_outpatient', 'number_emergency', 'number_inpatient', 'diag_1',
       'diag_2', 'diag_3', 'number_diagnoses', 'max_glu_serum', 'A1Cresult',
       'metformin', 'repaglinide', 'nateglinide', 'chlorpropamide',
       'glimepiride', 'acetohexamide', 'glipizide', 'glyburide', 'tolbutamide',
       'pioglitazone', 'rosiglitazone', 'acarbose', 'miglitol', 'troglitazone',
       'tolazamide', 'examide', 'citoglipton', 'insulin',
       'glyburide-metformin', 'glipizide-metformin',
       'glimepiride-pioglitazone', 'metformin-rosiglitazone',
       'metformin-pioglitazone', 'change', 'diabetesMed', 'readmitted'],
      dtype='object')
```


The dataset's documentation describes each column in detail; see the [column description](https://archive.ics.uci.edu/dataset/296/diabetes+130-us+hospitals+for+years+1999-20). To refine our analysis, we will compute a correlation matrix to examine the relationships between the features and the target variable, `readmitted`. Because `readmitted` is categorical (`NO`, `<30`, `>30`) and many features are stored as strings, both must be encoded numerically before correlations can be computed. Based on the correlation coefficients between the independent variables and `readmitted`, we will systematically decide which features to retain and which to drop for further modeling and analysis.
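A minimal sketch of this step for the numeric columns is shown below. It assumes `readmitted` is collapsed to a binary target, with `<30` (readmitted within 30 days) as the positive class. Treating any readmission (`<30` or `>30`) as positive is an equally valid choice. The sketch also assumes the identifier columns `encounter_id` and `patient_nbr` should be excluded. `correlation_with_target` and the sample rows are hypothetical. Pearson correlation against a 0/1 target is the point-biserial correlation. It only captures linear association, so a weak coefficient does not by itself prove a feature is useless. String columns such as `race` or the medication flags would need encoding before they can be included.

```python
import pandas as pd

def correlation_with_target(df, target="readmitted", positive_labels=("<30",),
                            exclude=("encounter_id", "patient_nbr")):
    """Rank numeric features by |Pearson r| against a binary-encoded target."""
    # Assumed encoding: 1 if the label is in positive_labels, else 0
    y = df[target].isin(positive_labels).astype(int)
    # Identifier columns are numeric but carry no predictive meaning
    features = df.drop(columns=[target, *exclude], errors="ignore")
    numeric = features.select_dtypes(include="number")
    corr = numeric.corrwith(y)  # point-biserial r for a 0/1 target
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

# Hypothetical rows shaped like df1
sample = pd.DataFrame({
    "encounter_id": [10, 11, 12, 13, 14, 15],
    "number_inpatient": [0, 0, 1, 1, 1, 0],
    "num_procedures": [1, 0, 1, 0, 1, 0],
    "race": ["Caucasian"] * 6,
    "readmitted": ["NO", ">30", "<30", "<30", "<30", "NO"],
})
print(correlation_with_target(sample))
```

Sorting by absolute value keeps strongly negative correlations near the top, since they are as informative as strongly positive ones.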

