## DIABETES 130-US HOSPITALS FOR YEARS 1999-2008

### Introduction

### Improving Diabetes Care and Reducing Early Readmissions

The dataset covers ten years (1999–2008) of clinical care data from 130 U.S. hospitals and integrated delivery networks. Each record represents a hospital stay of up to 14 days for patients diagnosed with diabetes, including information on laboratory tests, medications, and other clinical details.

The primary goal is to predict early readmissions, defined as those occurring within 30 days of discharge. This problem is critical because, despite strong evidence showing the benefits of preventive and therapeutic interventions for diabetic patients, many do not receive adequate care. In hospital environments, inconsistent diabetes management often fails to ensure proper glycemic control.

This failure not only leads to increased hospital costs due to frequent readmissions but also contributes to poorer patient outcomes. Patients are at greater risk of complications, increased morbidity, and higher mortality rates. Addressing these challenges through data analysis and predictive modeling can improve patient care and reduce the financial and operational burden on healthcare systems.

## Problem statement
Hospital readmissions, especially within 30 days of discharge, pose significant challenges in diabetes management. Despite advancements in preventive and therapeutic interventions, many diabetic patients experience inadequate care, leading to frequent hospital readmissions. This issue not only increases healthcare costs but also negatively impacts patient outcomes.

Using the Diabetes 130-US Hospitals (1999–2008) dataset, this project aims to develop a binary classification model to predict whether a diabetic patient will be readmitted within 30 days. By leveraging machine learning techniques, including handling missing values, feature engineering, and model optimization, we seek to identify key factors influencing readmissions and provide actionable insights to improve diabetes care and hospital management.
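
The binary target described above can be sketched as follows. This is a minimal illustration, not the project's final preprocessing: it assumes the `readmitted` column uses the label `<30` for readmissions within 30 days (only `NO` and `>30` are visible in the sample rows), and the column name `readmitted_30d` is hypothetical.

```python
import pandas as pd

# Toy frame standing in for the real data; "<30" is an assumed label,
# only "NO" and ">30" appear in the sample rows of this notebook.
sample = pd.DataFrame({"readmitted": ["NO", ">30", "<30", "NO"]})

# 1 = readmitted within 30 days, 0 = not readmitted or readmitted later
sample["readmitted_30d"] = (sample["readmitted"] == "<30").astype(int)

print(sample)
```

Grouping `>30` with `NO` follows the stated goal of predicting early readmissions only; a three-class setup would be a different design choice.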

### Objectives  
1. Develop a classification model to predict 30-day hospital readmissions for diabetic patients.  
2. Identify key factors influencing readmission risk, such as demographics and treatment history.
3. Evaluate the model’s predictive accuracy for effective hospital resource management.  
4. Provide actionable insights to improve diabetes care and reduce readmission rates.  
5. Support data-driven decision-making for healthcare policies and patient management strategies.  



### **Limitations of the Data**  

1. **High Missing Values** – Many fields are missing or incomplete (for example, `weight` appears as `?` in the sample rows), which may impact model performance.
2. **Limited Socioeconomic Factors** – The dataset lacks critical socioeconomic variables (e.g., income, education) that could influence hospital readmissions.  
3. **Data Source Constraints** – The data were collected from 130 U.S. hospitals between 1999 and 2008, so findings may not generalize to current healthcare settings.
  


In [1]:
import pandas as pd


In [2]:
# Raw string so the backslashes in the Windows path are not read as escape sequences
df1 = pd.read_csv(r"D:\PROJECT\Diabetes_130-US_Hospitals_1999-2008\diabetic_data.csv")

In [3]:
df1.head()

Unnamed: 0,encounter_id,patient_nbr,race,gender,age,weight,admission_type_id,discharge_disposition_id,admission_source_id,time_in_hospital,...,citoglipton,insulin,glyburide-metformin,glipizide-metformin,glimepiride-pioglitazone,metformin-rosiglitazone,metformin-pioglitazone,change,diabetesMed,readmitted
0,2278392,8222157,Caucasian,Female,[0-10),?,6,25,1,1,...,No,No,No,No,No,No,No,No,No,NO
1,149190,55629189,Caucasian,Female,[10-20),?,1,1,7,3,...,No,Up,No,No,No,No,No,Ch,Yes,>30
2,64410,86047875,AfricanAmerican,Female,[20-30),?,1,1,7,2,...,No,No,No,No,No,No,No,No,Yes,NO
3,500364,82442376,Caucasian,Male,[30-40),?,1,1,7,2,...,No,Up,No,No,No,No,No,Ch,Yes,NO
4,16680,42519267,Caucasian,Male,[40-50),?,1,1,7,1,...,No,Steady,No,No,No,No,No,Ch,Yes,NO


In [5]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101766 entries, 0 to 101765
Data columns (total 50 columns):
 #   Column                    Non-Null Count   Dtype 
---  ------                    --------------   ----- 
 0   encounter_id              101766 non-null  int64 
 1   patient_nbr               101766 non-null  int64 
 2   race                      101766 non-null  object
 3   gender                    101766 non-null  object
 4   age                       101766 non-null  object
 5   weight                    101766 non-null  object
 6   admission_type_id         101766 non-null  int64 
 7   discharge_disposition_id  101766 non-null  int64 
 8   admission_source_id       101766 non-null  int64 
 9   time_in_hospital          101766 non-null  int64 
 10  payer_code                101766 non-null  object
 11  medical_specialty         101766 non-null  object
 12  num_lab_procedures        101766 non-null  int64 
 13  num_procedures            101766 non-null  int64 
 14  num_

In [12]:
df1.isnull()

Unnamed: 0,encounter_id,patient_nbr,race,gender,age,weight,admission_type_id,discharge_disposition_id,admission_source_id,time_in_hospital,...,citoglipton,insulin,glyburide-metformin,glipizide-metformin,glimepiride-pioglitazone,metformin-rosiglitazone,metformin-pioglitazone,change,diabetesMed,readmitted
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
101761,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
101762,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
101763,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
101764,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
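
The all-`False` output above agrees with `df1.info()`, which reports no null values, yet the sample rows show `?` in `weight`. A minimal sketch of how such placeholders could be counted, assuming `?` marks a missing value wherever it appears; the toy frame and its values are illustrative, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df1; the values are hypothetical and only mimic the "?" placeholder.
toy = pd.DataFrame({
    "race": ["Caucasian", "?", "AfricanAmerican", "Caucasian"],
    "weight": ["?", "?", "?", "[75-100)"],
    "time_in_hospital": [1, 3, 2, 2],
})

# isnull() finds nothing because "?" is an ordinary string
assert not toy.isnull().any().any()

# Convert the placeholder to NaN, then count missing values per column
cleaned = toy.replace("?", np.nan)
missing_counts = cleaned.isnull().sum()
missing_share = cleaned.isnull().mean()

print(missing_counts)
print(missing_share)
```

The same conversion could be done at load time by passing `na_values="?"` to `pd.read_csv`, which keeps the placeholder out of the frame from the start.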


In [4]:
# Raw string so the backslashes in the Windows path are not read as escape sequences
df2 = pd.read_csv(r"D:\PROJECT\Diabetes_130-US_Hospitals_1999-2008\IDS_mapping.csv")

In [5]:
df2.head()

Unnamed: 0,admission_type_id,description
0,1,Emergency
1,2,Urgent
2,3,Elective
3,4,Newborn
4,5,Not Available
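
As a hedged illustration of how these descriptions might be attached to the encounter data, the sketch below builds a lookup from the five rows shown and maps it onto a few `admission_type_id` codes. The frame names `mapping` and `encounters` and the new column `admission_type` are hypothetical, and the sketch assumes the id column holds integers; if the full file reads it as text, it would need converting first.

```python
import pandas as pd

# The five mapping rows shown in df2.head()
mapping = pd.DataFrame({
    "admission_type_id": [1, 2, 3, 4, 5],
    "description": ["Emergency", "Urgent", "Elective", "Newborn", "Not Available"],
})

# Codes from the first rows of df1.head(); 6 is not among the mapping rows shown here
encounters = pd.DataFrame({"admission_type_id": [6, 1, 1, 1, 1]})

# Series indexed by id, used as a lookup table
lookup = mapping.set_index("admission_type_id")["description"]
encounters["admission_type"] = encounters["admission_type_id"].map(lookup)

print(encounters)
```

Codes without a matching row come back as `NaN`, which makes gaps in the lookup easy to spot.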


In [13]:
df2.isnull()

Unnamed: 0,admission_type_id,description
0,False,False
1,False,False
2,False,False
3,False,False
4,False,False
...,...,...
62,False,False
63,False,False
64,False,False
65,False,False
