# Introduction

Diabetes is a chronic condition in which the body cannot regulate blood sugar levels properly, either because it produces too little insulin or because it cannot use insulin effectively. It is classified into two main types: **Type 1 diabetes**, an autoimmune condition that destroys insulin-producing cells, and **Type 2 diabetes**, which occurs when the body becomes resistant to insulin. Type 1 diabetes is less common, accounting for about **5-10%** of cases, while Type 2 accounts for roughly **90-95%** and is strongly linked to lifestyle factors such as obesity and physical inactivity. Common symptoms include frequent urination, excessive thirst, fatigue, blurred vision, and slow-healing wounds. If left untreated, diabetes can lead to serious complications such as heart disease, kidney failure, and nerve damage.

In the United States, diabetes is a growing public health concern, with **38.4 million people** living with the disease as of 2021. It disproportionately affects certain racial and ethnic groups: **American Indians/Alaska Natives (13.6%)**, **non-Hispanic Black adults (12.1%)**, and **Hispanic adults (11.7%)** have a higher prevalence than **Asian Americans (9.1%)** and **non-Hispanic White adults (6.9%)** [(Diabetes.org)](https://diabetes.org/about-diabetes/statistics/about-diabetes). Age is also a major factor: **29.2% of adults aged 65 and older** have diabetes, and the number of young people diagnosed is rising, with about **352,000 individuals under 20** living with the disease. In 2018 alone, **8.2 million hospital stays** involved diabetes-related conditions, with **95%** of those attributed to Type 2 diabetes [(HCUP)](https://hcup-us.ahrq.gov/reports/statbriefs/sb279-Diabetes-Inpatient-Stays-2018.jsp).

The rising prevalence of diabetes in the U.S. is driven by several factors, including **obesity, poor diet, physical inactivity, and socioeconomic disparities**. Over **40% of U.S. adults** are classified as obese, increasing their risk of developing Type 2 diabetes. Additionally, limited access to **healthy food, healthcare, and preventive education** contributes to higher rates, particularly in low-income communities. Genetic predisposition also plays a role, making certain racial and ethnic groups more vulnerable. As diabetes continues to strain the healthcare system and impact millions of lives, effective prevention and management strategies are essential to reduce complications and improve patient outcomes.




# Problem Statement

Between 2010 and 2019, there were 304 million hospitalizations among adults aged 18 and older, of which 78 million (roughly a quarter) were diabetes-associated. In both 2010 and 2019, the majority of diabetes hospitalizations occurred in the 60–69 and 70–79 age groups, followed by the ≥80 and 50–59 age groups. Females made up the greater proportion of these hospitalizations in 2010 (52.2%), whereas males did in 2019 (51.2%). In both years, most of these hospitalizations were among White patients, followed by Black, Hispanic, Asian or Pacific Islander, and Native American patients. [National Library of Medicine](https://pmc.ncbi.nlm.nih.gov/articles/PMC9698503/#:~:text=Results-,Between%202010%20and%202019%2C%20there%20were%20304%20million%20hospitalizations%20above,%E2%80%936.0)


This project uses ten years of U.S. hospital data, with an emphasis on diabetes-related characteristics, to predict the risk of early hospital readmission among diabetic patients. Anticipating early readmissions can significantly improve healthcare outcomes. Hospitals can offer targeted care, such as individualized treatment plans and closer monitoring, to high-risk patients before discharge, thereby reducing complications and readmissions. This in turn lowers medical costs and makes better use of hospital resources. Early intervention can also improve patients' quality of life and long-term well-being by preventing their health from deteriorating. Ultimately, early readmission prediction supports better healthcare practices, cost savings, and patient management.

# Objectives


1. **Develop a binary classification model to predict early patient readmissions (within 30 days of discharge)**  
   Early readmissions are a key indicator of healthcare quality and efficiency. By building a predictive model, we aim to identify high-risk patients who are likely to return to the hospital shortly after discharge. This can help healthcare providers implement targeted interventions to improve patient outcomes and reduce unnecessary hospitalizations.  

2. **Analyze how patient demographics (age, gender, race, weight) influence readmission risk**  
   Certain demographic factors may contribute to a higher likelihood of readmission.  Understanding these patterns can help tailor post-discharge care to individual patient needs.  

3. **Assess the impact of medical history and diagnosis on readmission rates**  
   Conduct an in-depth analysis of diag_1, diag_2, diag_3, and the number of diagnoses to assess how they affect readmission. By evaluating these factors, we can identify high-risk groups and recommend preventive strategies to manage their conditions effectively.  

4. **Analyze the influence of medications and medical treatment on readmission risk**  
   The medications prescribed, whether individual drugs such as insulin and metformin or combination regimens such as glimepiride-pioglitazone, can affect a patient’s likelihood of returning to the hospital. By studying patterns in medication usage, we can determine whether specific treatments contribute to better patient outcomes or pose risks that increase readmissions.  

5. **Explore how medical procedures and lab tests impact readmissions**  
   Certain medical procedures or test results may indicate higher health risks, influencing the likelihood of readmission. Evaluating the number of lab procedures, the number of procedures, and the number of medications can help identify whether intensive medical intervention correlates with a higher or lower risk of readmission.

6. **Accurately evaluate our model’s predictive performance**  
   To ensure the model provides reliable predictions, we will assess its performance using metrics such as accuracy, precision, recall, and AUC-ROC scores. A well-validated model can be used to support real-world decision-making in hospitals and healthcare settings.  

7. **Provide actionable insights for targeted interventions**  
   Beyond model predictions, this project aims to generate insights that healthcare professionals can use to improve patient care. By identifying key risk factors for readmission, hospitals can develop personalized discharge plans, enhance follow-up care, and implement policies that reduce preventable hospital returns, ultimately improving overall healthcare efficiency.  
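
As a rough illustration of the metrics named in objective 6, the sketch below computes accuracy, precision, recall, and AUC-ROC by hand. The labels and scores are hypothetical values made up for illustration, not project results, and the helper names `classification_metrics` and `roc_auc` are assumptions introduced here. In practice, a library such as scikit-learn (`accuracy_score`, `precision_score`, `recall_score`, `roc_auc_score`) would likely be used instead.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = early readmission)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when no positives are predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def roc_auc(y_true, y_score):
    """AUC-ROC as the chance that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
metrics["auc_roc"] = roc_auc(y_true, y_score)
print(metrics)
```

Because early readmissions are typically the minority class, precision, recall, and AUC-ROC tend to be more informative than accuracy alone; a model that never flags anyone can still score a high accuracy.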
 

# Data Limitations

1. **Historical Bias.** The data for this project was collected between 1999 and 2008, so it may not reflect advances in healthcare and technology that have arisen since then.

2. **Limited Socioeconomic and Behavioral Data.** The dataset lacks important social and behavioral factors (e.g., income level, diet, lifestyle choices) that influence readmissions. This limits our ability to capture the full picture of why some patients return to the hospital.

3. **Lack of Clear Documentation.** Some column names contain abbreviations with no explanation, making it difficult to interpret their exact meaning. Without proper documentation, there is a risk of misinterpreting variables, which could affect data analysis and model accuracy.

4. **Complexity of Medical Terminology.** The dataset includes various medical terms and diagnoses that may be difficult to interpret without domain expertise.


# Data Understanding

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt


In [4]:
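# Load the dataset; the path below is specific to the author's machine.
data = pd.read_csv('C:/Users/hp/Desktop/DATA NEXUS PROJECTS/Diabetes_130-US_Hospitals_1999-2008/diabetic_data.csv')
# Preview the first five rows.
data.head()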

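| | encounter_id | patient_nbr | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | ... | citoglipton | insulin | glyburide-metformin | glipizide-metformin | glimepiride-pioglitazone | metformin-rosiglitazone | metformin-pioglitazone | change | diabetesMed | readmitted |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2278392 | 8222157 | Caucasian | Female | [0-10) | ? | 6 | 25 | 1 | 1 | ... | No | No | No | No | No | No | No | No | No | NO |
| 1 | 149190 | 55629189 | Caucasian | Female | [10-20) | ? | 1 | 1 | 7 | 3 | ... | No | Up | No | No | No | No | No | Ch | Yes | >30 |
| 2 | 64410 | 86047875 | AfricanAmerican | Female | [20-30) | ? | 1 | 1 | 7 | 2 | ... | No | No | No | No | No | No | No | No | Yes | NO |
| 3 | 500364 | 82442376 | Caucasian | Male | [30-40) | ? | 1 | 1 | 7 | 2 | ... | No | Up | No | No | No | No | No | Ch | Yes | NO |
| 4 | 16680 | 42519267 | Caucasian | Male | [40-50) | ? | 1 | 1 | 7 | 1 | ... | No | Steady | No | No | No | No | No | Ch | Yes | NO |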
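
Every `weight` value in the preview is `?`, which suggests the dataset uses `?` as a placeholder for missing values. The sketch below, which rests on that assumption, measures how common the placeholder is in each column and converts it to `NaN`. It runs on a small hypothetical frame named `preview`, copied from the first three rows shown, rather than on the full `data` frame.

```python
import numpy as np
import pandas as pd

# Three rows copied from the preview; the real analysis would use `data` instead.
preview = pd.DataFrame({
    "race": ["Caucasian", "Caucasian", "AfricanAmerican"],
    "age": ["[0-10)", "[10-20)", "[20-30)"],
    "weight": ["?", "?", "?"],
})

# Fraction of '?' placeholders in each column.
placeholder_share = (preview == "?").mean()
print(placeholder_share)

# Assumption: '?' means "missing", so replace it with NaN
# to make pandas' missing-data tools (isna, dropna, fillna) usable.
cleaned = preview.replace("?", np.nan)
print(cleaned.isna().sum())
```

The same conversion could also be done at load time by passing `na_values="?"` to `pd.read_csv`.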


In [5]:
data.tail()

| | encounter_id | patient_nbr | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | ... | citoglipton | insulin | glyburide-metformin | glipizide-metformin | glimepiride-pioglitazone | metformin-rosiglitazone | metformin-pioglitazone | change | diabetesMed | readmitted |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 101761 | 443847548 | 100162476 | AfricanAmerican | Male | [70-80) | ? | 1 | 3 | 7 | 3 | ... | No | Down | No | No | No | No | No | Ch | Yes | >30 |
| 101762 | 443847782 | 74694222 | AfricanAmerican | Female | [80-90) | ? | 1 | 4 | 5 | 5 | ... | No | Steady | No | No | No | No | No | No | Yes | NO |
| 101763 | 443854148 | 41088789 | Caucasian | Male | [70-80) | ? | 1 | 1 | 7 | 1 | ... | No | Down | No | No | No | No | No | Ch | Yes | NO |
| 101764 | 443857166 | 31693671 | Caucasian | Female | [80-90) | ? | 2 | 3 | 7 | 10 | ... | No | Up | No | No | No | No | No | Ch | Yes | NO |
| 101765 | 443867222 | 175429310 | Caucasian | Male | [70-80) | ? | 1 | 1 | 7 | 6 | ... | No | No | No | No | No | No | No | No | No | NO |
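
The `readmitted` column is the natural source for the binary target described in the objectives. The sketch below shows one possible encoding on a hypothetical frame named `labels`. The rows previewed above only show `NO` and `>30`, so the `<30` label for readmission within 30 days is an assumption about the full dataset, and `early_readmit` is a column name introduced here for illustration.

```python
import pandas as pd

# Hypothetical values mirroring the `readmitted` column; '<30' is assumed
# to be the label for a readmission within 30 days of discharge.
labels = pd.DataFrame({"readmitted": ["NO", ">30", "<30", "NO", ">30"]})

# 1 = early readmission (within 30 days), 0 = no readmission or a later one.
labels["early_readmit"] = (labels["readmitted"] == "<30").astype(int)

print(labels["early_readmit"].value_counts())
print(f"Positive rate: {labels['early_readmit'].mean():.1%}")
```

Grouping `>30` with `NO` as the negative class is a modelling choice that follows the 30-day definition in the objectives; checking the positive rate early helps reveal how imbalanced the target is.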
