# 💳 Credit Card Default Prediction Project
This project uses machine learning models to predict whether a customer will default on their credit card payment next month.

### ✅ Models Included:
- Logistic Regression
- Random Forest
- XGBoost

Evaluation Metrics: Confusion Matrix, Classification Report, ROC AUC

Dataset Source: [UCI Repository](https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients)

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

In [None]:
# Load dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00350/default%20of%20credit%20card%20clients.xls"
data = pd.read_excel(url, header=1)
data.rename(columns={'default payment next month': 'default'}, inplace=True)

# Drop ID column and split into features/target
X = data.drop(columns=['ID', 'default'])
y = data['default']

# Train-test split (stratified so train and test keep the same default rate)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Feature scaling for Logistic Regression
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
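
The split-and-scale step can also be written so that the scaler travels with the model. The sketch below is illustrative only: it uses a synthetic dataset from `make_classification` as a stand-in for the credit card data (the 78/22 class weights are an assumption), and the names `X_demo`, `y_demo`, and `scaled_log_reg` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption: roughly 22% positives).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.78, 0.22], random_state=42
)

# stratify keeps the positive rate similar in the train and test partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# The pipeline refits the scaler inside every CV fold, so no fold is scaled
# with statistics computed from its own validation rows.
scaled_log_reg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv_auc = cross_val_score(scaled_log_reg, X_tr, y_tr, cv=5, scoring="roc_auc")
print("Mean CV ROC AUC:", cv_auc.mean())
```

Bundling the scaler this way is a design choice, not a requirement: fitting `StandardScaler` on the training split only, as the cell above does, is already correct for a single hold-out evaluation.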

In [None]:
# Initialize models
log_reg = LogisticRegression(max_iter=1000)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
xgb = XGBClassifier(eval_metric='logloss', random_state=42)  # use_label_encoder is deprecated in recent XGBoost releases

# Train models
log_reg.fit(X_train_scaled, y_train)
rf.fit(X_train, y_train)
xgb.fit(X_train, y_train)

In [None]:
# Predictions (hard class labels for the confusion matrix and report)
y_pred_log = log_reg.predict(X_test_scaled)
y_pred_rf = rf.predict(X_test)
y_pred_xgb = xgb.predict(X_test)

# Predicted probability of default (class 1); ROC AUC needs a ranking score
y_prob_log = log_reg.predict_proba(X_test_scaled)[:, 1]
y_prob_rf = rf.predict_proba(X_test)[:, 1]
y_prob_xgb = xgb.predict_proba(X_test)[:, 1]

# Metrics
print("🔹 Logistic Regression")
print(confusion_matrix(y_test, y_pred_log))
print(classification_report(y_test, y_pred_log))
print("ROC AUC:", roc_auc_score(y_test, y_prob_log))

print("\n🔹 Random Forest")
print(confusion_matrix(y_test, y_pred_rf))
print(classification_report(y_test, y_pred_rf))
print("ROC AUC:", roc_auc_score(y_test, y_prob_rf))

print("\n🔹 XGBoost")
print(confusion_matrix(y_test, y_pred_xgb))
print(classification_report(y_test, y_pred_xgb))
print("ROC AUC:", roc_auc_score(y_test, y_prob_xgb))
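
ROC AUC measures how well a model ranks positives above negatives, so it is usually computed from continuous scores such as predicted probabilities rather than from hard 0/1 labels. The sketch below shows the difference on synthetic data; the dataset and the names `X_demo`, `y_demo`, and `clf` are assumptions for illustration, not part of this project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption: roughly 22% positives).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.78, 0.22], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

hard_labels = clf.predict(X_te)            # 0/1 decisions at the 0.5 threshold
scores = clf.predict_proba(X_te)[:, 1]     # probability of the positive class

# With hard labels the ROC curve has a single operating point, and the
# area under it reduces to balanced accuracy.
auc_from_labels = roc_auc_score(y_te, hard_labels)
balanced_acc = balanced_accuracy_score(y_te, hard_labels)

# With probabilities every threshold contributes to the curve.
auc_from_scores = roc_auc_score(y_te, scores)

print("AUC from hard labels:", auc_from_labels)
print("Balanced accuracy:   ", balanced_acc)
print("AUC from scores:     ", auc_from_scores)
```

The first two printed values coincide, which is why an AUC computed from `predict` output says little about ranking quality.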

### ✅ Next Steps (Optional)
- Use SHAP (SHapley Additive exPlanations) to interpret XGBoost results
- Deploy model with Streamlit or Flask
- Build a dashboard using Power BI or Tableau
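
Any of the deployment options above would first need the fitted model saved to disk. A minimal sketch of that step using `joblib`, assuming a small synthetic dataset and a hypothetical file name `default_model.joblib`:

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data and model (assumptions for illustration only).
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=42)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_demo, y_demo)

# Save the fitted model, then load it back as a web app would at startup.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "default_model.joblib"  # hypothetical file name
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

# The restored model should reproduce the original predictions exactly.
same_predictions = bool((restored.predict(X_demo) == model.predict(X_demo)).all())
print("Round trip preserved predictions:", same_predictions)
```

If the model depends on a fitted scaler, as Logistic Regression does here, the scaler would need to be saved alongside it.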

## 🔍 SHAP Interpretation for XGBoost
Now we use **SHAP (SHapley Additive exPlanations)** to understand how each feature contributes to the XGBoost model's predictions.
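
To make the idea of a feature contribution concrete, here is a small sketch that computes exact Shapley values by brute force for a three-feature toy function. It is not how the `shap` library computes values for tree models; `toy_model`, its coefficients, and the single all-zero baseline row are assumptions chosen to keep the arithmetic easy to follow.

```python
from itertools import combinations
from math import factorial


def toy_model(row):
    """Hypothetical scoring function with one interaction term."""
    limit, delay, bill = row
    return 0.1 * limit + 0.5 * delay + 0.2 * delay * bill


def coalition_value(model, x, baseline, subset):
    """Model output when only the features in `subset` take their real values."""
    mixed = [x[j] if j in subset else baseline[j] for j in range(len(x))]
    return model(mixed)


def exact_shapley(model, x, baseline):
    """Average each feature's marginal contribution over all coalitions."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = coalition_value(model, x, baseline, set(subset) | {i})
                without_i = coalition_value(model, x, baseline, set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


x = [2.0, 3.0, 1.0]          # the row being explained
baseline = [0.0, 0.0, 0.0]   # reference row (assumption)
phi = exact_shapley(toy_model, x, baseline)

print("Contributions:", phi)
print("Sum of contributions:", sum(phi))
print("f(x) - f(baseline):  ", toy_model(x) - toy_model(baseline))
```

The contributions add up to the gap between the prediction and the baseline output, and the interaction term is split evenly between the two features involved. That additive property is what a waterfall plot visualizes.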

In [None]:
%pip install shap

In [None]:
import shap

# Ensure input is a DataFrame for SHAP compatibility
X_test_df = pd.DataFrame(X_test, columns=X.columns)

# Create SHAP explainer
explainer = shap.Explainer(xgb)

# Compute SHAP values for the first 100 test rows
shap_values = explainer(X_test_df.iloc[:100])

In [None]:
# Summary plot: feature impact across the first 100 test rows
shap.summary_plot(shap_values, X_test_df.iloc[:100])

In [None]:
# Waterfall plot for the first prediction
shap.plots.waterfall(shap_values[0])