# Diabetes Dataset using XGBoost
This dataset contains information from over 100,000 hospital visits (also called encounters). Each row represents a single patient visit to the hospital. We’re trying to predict whether a patient will be readmitted after that visit; the `readmitted` column is our target variable.

I’m building a real-world healthcare readmission model using XGBoost and explaining its predictions with SHAP values, so that doctors and data scientists can trust what the model is saying.

A major drawback of XGBoost is that its predictions are hard to interpret directly. Regulators and stakeholders often prefer simpler models such as linear regression, because each feature’s impact is represented directly by a coefficient, which makes decisions easier to explain and justify.

### Why This Matters:
Linear regression: Each feature has a single, interpretable coefficient (also called a weight) whose sign and size give the direction and strength of its impact.

XGBoost: Uses an ensemble of many decision trees, so a feature's impact is hard to quantify directly without tools like SHAP values or partial dependence plots.

In regulated industries (such as healthcare and finance), transparency is often more important than marginal accuracy gains.
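
To make the contrast concrete, here is a minimal sketch of the idea behind SHAP values: exact Shapley values computed by brute force over all feature orderings. This is not how the `shap` library handles tree ensembles; it is only an illustration of the attribution principle. The two toy models, their coefficients, the feature values, and the all-zero reference point are hypothetical, and a single reference point is a simplification of the background data a real explainer would use.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values: average each feature's marginal
    contribution over every possible ordering of the features."""
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)       # start from the reference input
        previous = predict(current)
        for i in order:
            current[i] = x[i]          # switch feature i to its real value
            value = predict(current)
            totals[i] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]


# Hypothetical toy models (not fitted to the diabetes data).
def linear_model(f):
    # Purely additive: each feature acts through its own coefficient.
    return 0.5 * f[0] + 2.0 * f[1]


def interaction_model(f):
    # Adds an interaction term, as tree ensembles can learn.
    return 0.5 * f[0] + 2.0 * f[1] + 1.0 * f[0] * f[1]


x = [4.0, 1.0]          # hypothetical feature values for one encounter
baseline = [0.0, 0.0]   # hypothetical reference point

linear_attr = shapley_values(linear_model, x, baseline)
interaction_attr = shapley_values(interaction_model, x, baseline)

print("linear:", linear_attr)
print("interaction:", interaction_attr)
```

For the linear model, each attribution is just coefficient times feature value, which is why coefficients are enough to explain it. For the interaction model, the extra term is shared between the two features, and the attributions still sum to the difference between the prediction and the reference prediction.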

In [1]:
import pandas as pd

# Load the CSV
df = pd.read_csv("/kaggle/input/diabetes/diabetic_data.csv")

# Show basic info
print(df.shape)
df.head()

(101766, 50)


| | encounter_id | patient_nbr | race | gender | age | weight | admission_type_id | discharge_disposition_id | admission_source_id | time_in_hospital | ... | citoglipton | insulin | glyburide-metformin | glipizide-metformin | glimepiride-pioglitazone | metformin-rosiglitazone | metformin-pioglitazone | change | diabetesMed | readmitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2278392 | 8222157 | Caucasian | Female | [0-10) | ? | 6 | 25 | 1 | 1 | ... | No | No | No | No | No | No | No | No | No | NO |
| 1 | 149190 | 55629189 | Caucasian | Female | [10-20) | ? | 1 | 1 | 7 | 3 | ... | No | Up | No | No | No | No | No | Ch | Yes | >30 |
| 2 | 64410 | 86047875 | AfricanAmerican | Female | [20-30) | ? | 1 | 1 | 7 | 2 | ... | No | No | No | No | No | No | No | No | Yes | NO |
| 3 | 500364 | 82442376 | Caucasian | Male | [30-40) | ? | 1 | 1 | 7 | 2 | ... | No | Up | No | No | No | No | No | Ch | Yes | NO |
| 4 | 16680 | 42519267 | Caucasian | Male | [40-50) | ? | 1 | 1 | 7 | 1 | ... | No | Steady | No | No | No | No | No | Ch | Yes | NO |
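
The preview shows `?` in every `weight` cell and string labels such as `NO` and `>30` in `readmitted`. Below is a minimal sketch of one way to handle both. It assumes that `?` is the dataset's missing-value placeholder and that any label other than `NO` counts as a readmission; neither assumption is confirmed by the output above. The inline CSV is a tiny stand-in built from a few of the preview's columns and rows, and `readmitted_any` is a hypothetical column name.

```python
import io

import pandas as pd

# Tiny stand-in for diabetic_data.csv, copied from the preview rows.
sample_csv = io.StringIO(
    "encounter_id,race,gender,age,weight,time_in_hospital,readmitted\n"
    "2278392,Caucasian,Female,[0-10),?,1,NO\n"
    "149190,Caucasian,Female,[10-20),?,3,>30\n"
    "64410,AfricanAmerican,Female,[20-30),?,2,NO\n"
)

# Assumption: "?" marks a missing value, so parse it as NaN.
sample = pd.read_csv(sample_csv, na_values="?")

# Fraction of missing values in each column.
missing_share = sample.isna().mean()
print(missing_share)

# Assumption: any label other than "NO" counts as a readmission.
sample["readmitted_any"] = (sample["readmitted"] != "NO").astype(int)
print(sample[["readmitted", "readmitted_any"]])
```

On the full file, the same `na_values="?"` argument could be passed to the `pd.read_csv` call in the loading cell, which would make the extent of missing data in columns like `weight` visible before modeling.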
