# 📊 Credit Risk Analytics – Loan Default Prediction

**Author:** Ankur Jha  
**Project:** QuantLake Data Analyst Internship – Final Project

## 🧩 1. Load Dataset

```python
import pandas as pd

# Replace with your actual file name after uploading to Colab / Jupyter
df = pd.read_csv("bank.csv")
df.head()
```


## 🧹 2. Data Cleaning

```python
# Count missing values per column
print(df.isnull().sum())

# Experience cannot be negative; as a simple fix, take the absolute value
df['Experience'] = df['Experience'].abs()

# Encode education as an ordinal integer. Skip this if the file already stores
# numeric codes, because mapping numbers through this dict would produce all NaN.
if not pd.api.types.is_numeric_dtype(df['Education']):
    df['Education'] = df['Education'].map({
        'Undergrad': 0,
        'Graduate': 1,
        'Advanced/Professional': 2
    })

# Store the binary flag columns as integers
binary_cols = ['Securities Account', 'CD Account', 'Online', 'CreditCard']
df[binary_cols] = df[binary_cols].astype(int)
```
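After cleaning, it can help to confirm the transformations actually took effect before plotting or modeling. The helper below, `check_cleaned`, is a hypothetical name introduced here, not part of the original notebook; it assumes the column names used above and that the four flag columns should hold only 0 and 1.

```python
import pandas as pd

# Assumed flag columns (same names as in the cleaning cell)
BINARY_COLS = ['Securities Account', 'CD Account', 'Online', 'CreditCard']

def check_cleaned(df: pd.DataFrame) -> list:
    """Hypothetical post-cleaning check: return a list of problems (empty if none)."""
    problems = []
    # Experience should be non-negative once absolute values are taken
    if (df['Experience'] < 0).any():
        problems.append('Experience has negative values')
    # Series.map turns any label missing from the mapping into NaN
    if df['Education'].isna().any():
        problems.append('Education has missing or unmapped values')
    # The flag columns are assumed to be strictly 0/1
    for col in BINARY_COLS:
        if not df[col].isin([0, 1]).all():
            problems.append(f'{col} has values other than 0/1')
    return problems

# Synthetic rows for illustration only (not project data)
sample = pd.DataFrame({
    'Experience': [3, 12],
    'Education': [0, 2],
    'Securities Account': [0, 1],
    'CD Account': [1, 0],
    'Online': [1, 1],
    'CreditCard': [0, 1],
})
print(check_cleaned(sample))  # → []
```

On the real data, calling `check_cleaned(df)` right after the cleaning cell should return an empty list; any entry points to a step that needs another look.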


## 📊 3. Exploratory Data Analysis (EDA)

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Target variable distribution
sns.countplot(x='Personal Loan', data=df)
plt.title('Loan Approval Distribution')
plt.show()

# Income vs. loan approval
sns.boxplot(x='Personal Loan', y='Income', data=df)
plt.title('Income vs Loan Approval')
plt.show()

# Correlation heatmap (numeric_only=True avoids errors on text columns in pandas >= 2.0)
plt.figure(figsize=(12, 6))
sns.heatmap(df.corr(numeric_only=True), annot=True, fmt='.2f', cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()
```
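The plots above are visual; a small table of approval rates per group puts numbers behind statements such as the CD Account insight in Section 5. The helper `approval_rate_by` is hypothetical, and it assumes the target is coded 0/1 so that its mean equals the share of approvals.

```python
import pandas as pd

def approval_rate_by(df: pd.DataFrame, feature: str, target: str = 'Personal Loan') -> pd.DataFrame:
    """Hypothetical helper: share of target == 1 and row count for each value of `feature`."""
    # observed=True keeps only categories that actually occur (relevant for binned columns)
    return df.groupby(feature, observed=True)[target].agg(rate='mean', n='count')

# Synthetic rows for illustration only (not project data)
demo = pd.DataFrame({
    'CD Account':    [0, 0, 0, 1, 1],
    'Personal Loan': [0, 0, 1, 1, 0],
})
print(approval_rate_by(demo, 'CD Account'))

# Continuous features such as Income can be binned first, here into two equal-count bands
demo['Income'] = [20, 45, 80, 150, 190]
demo['Income band'] = pd.qcut(demo['Income'], q=2, labels=['lower half', 'upper half'])
print(approval_rate_by(demo, 'Income band'))
```

The `n` column matters: a high rate in a small group is weak evidence on its own.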


## 🧠 4. Predictive Modeling (Logistic Regression)

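```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix

# Features: every column except the target
X = df.drop('Personal Loan', axis=1)
y = df['Personal Loan']

# stratify=y keeps the same share of approvals in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Scale features so the default L2 penalty treats them evenly and the solver converges reliably
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```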
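The classification report above scores hard 0/1 predictions at the default 0.5 threshold. If approvals are the minority class, two common complements are ROC AUC, computed from predicted probabilities, and `class_weight='balanced'`, which upweights the rarer class during fitting. The sketch below compares both settings on synthetic data from scikit-learn's `make_classification`, assumed here as a stand-in for the loan data; it does not reproduce the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in (about 90% class 0); not project data
X, y = make_classification(n_samples=2000, n_features=8, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

results = {}
for weighting in (None, 'balanced'):
    clf = LogisticRegression(max_iter=1000, class_weight=weighting)
    clf.fit(X_train, y_train)
    proba = clf.predict_proba(X_test)[:, 1]  # predicted P(class 1) for each test row
    results[str(weighting)] = {
        'roc_auc': roc_auc_score(y_test, proba),                   # ranking quality, threshold-free
        'recall_pos': recall_score(y_test, clf.predict(X_test)),   # share of positives caught at 0.5
    }

for name, scores in results.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Balanced weighting typically trades some precision for higher recall on the minority class; which trade-off is right depends on the business cost of missed approvals versus false ones.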


## 📌 5. Business Insights & Drivers of Default
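
The model's target is `Personal Loan` (1 = approved). In this analysis, features that raise the likelihood of approval are read as lowering default risk.

### Top 5 Drivers of Loan Default
1. **Income** – Higher income is associated with approval, i.e. lower default risk.
2. **CD Account** – Customers with a CD account are more likely to be approved.
3. **Education Level** – Higher education levels correlate with approval.
4. **Mortgage** – A high mortgage balance does not, on its own, lead to rejection.
5. **Online Banking** – Customers who use online banking are more likely to be approved.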
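One way to back a driver ranking like this with the fitted model is to sort standardized logistic-regression coefficients by magnitude. Because features are scaled first, each coefficient is on a per-standard-deviation scale, and its sign shows direction (positive raises the predicted probability of `Personal Loan` = 1). These are associations conditional on the other features, not causal effects, and correlated features can split weight between them. The function `rank_drivers` is a hypothetical helper, and the demo uses synthetic data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def rank_drivers(X: pd.DataFrame, y: pd.Series, top_n: int = 5) -> pd.Series:
    """Hypothetical helper: fit scaled logistic regression, return the top_n coefficients by |size|."""
    pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    pipe.fit(X, y)
    coefs = pd.Series(pipe[-1].coef_[0], index=X.columns)
    # Order by absolute value but keep the sign for interpretation
    return coefs.reindex(coefs.abs().sort_values(ascending=False).index).head(top_n)

# Synthetic demo (not project data): 'Income' drives the outcome, 'Noise' does not
rng = np.random.default_rng(0)
demo = pd.DataFrame({'Income': rng.normal(size=500), 'Noise': rng.normal(size=500)})
target = (demo['Income'] + 0.3 * rng.normal(size=500) > 0).astype(int)
print(rank_drivers(demo, target, top_n=2))
```

On the project data, `rank_drivers(X_train, y_train)` with the Section 4 split would give a model-based ranking to compare against the list above.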

---
✅ Use this notebook to build plots, train models, and finalize insights for your GitHub repo and project submission.
