# 🏦 Loan Data Analysis
**Dataset:** Customer banking and loan details

**Entries:** ~30,000

**Objective:** Analyze trends in income, loan behavior, and EMI details.

## 📌 1. Data Loading & Initial Exploration

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import warnings
warnings.filterwarnings('ignore')

# Load data
df = pd.read_csv('../data/test.csv')
df.head()
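
Before filling nulls, it can help to see how much is missing in each column, so that the fill strategy can be chosen column by column. A minimal sketch follows; `missing_report` is a hypothetical helper, and the toy frame is made-up data standing in for `df` (on the real data the call would be `missing_report(df)`).

```python
import numpy as np
import pandas as pd

def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of nulls per column, most-missing first."""
    counts = frame.isna().sum()
    report = pd.DataFrame({
        "n_missing": counts,
        "pct_missing": (counts / len(frame) * 100).round(1),
    })
    return report.sort_values("n_missing", ascending=False)

# Hypothetical toy frame standing in for ../data/test.csv
toy = pd.DataFrame({
    "Loan_Amount": [50000, np.nan, np.nan, 20000],
    "EMI": [1200, np.nan, 800, 500],
    "Gender": ["Male", "Female", "Female", "Male"],
})
report = missing_report(toy)
print(report)
```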

## 🧹 2. Data Cleaning

In [None]:
# Convert dates
df['DOB'] = pd.to_datetime(df['DOB'], errors='coerce')
df['Lead_Creation_Date'] = pd.to_datetime(df['Lead_Creation_Date'], errors='coerce')

# Create Age feature
df['Age'] = pd.to_datetime('today').year - df['DOB'].dt.year
df['Age'] = df['Age'].fillna(df['Age'].median())

# Basic cleaning: fill nulls.
# Assign the result back rather than calling fillna(inplace=True) on df['col']:
# that is chained assignment, which pandas 2.x only warns about (the warning is
# silenced above) and which no longer updates df under Copy-on-Write.
df = df.fillna({
    'Employer_Category2': df['Employer_Category2'].mode()[0],
    'Employer_Category1': 'Unknown',
    'Employer_Code': 'Unknown',
    'Existing_EMI': 0,
    'Loan_Amount': 0,
    'Loan_Period': 0,
    'Interest_Rate': df['Interest_Rate'].median(),
    'EMI': 0,
})

# Check cleaned data
df.info()
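
The `Age` column above subtracts birth years, so anyone whose birthday has not yet come around in the current year is counted one year too old, and the values drift each time the notebook is rerun. A sketch of age in completed years against a fixed reference date follows; `age_in_years` is a hypothetical helper, and the reference date is an assumption you would choose (for example a fixed snapshot date).

```python
import pandas as pd

def age_in_years(dob: pd.Series, as_of: pd.Timestamp) -> pd.Series:
    """Completed years between each date of birth and `as_of` (NaN where dob is NaT)."""
    years = as_of.year - dob.dt.year
    # True where the birthday has not yet occurred in the as_of year
    before_birthday = (dob.dt.month > as_of.month) | (
        (dob.dt.month == as_of.month) & (dob.dt.day > as_of.day)
    )
    return years - before_birthday.astype(int)

dob = pd.to_datetime(pd.Series(["1990-06-15", "1990-06-16", None]))
ages = age_in_years(dob, pd.Timestamp("2020-06-15"))
print(ages.tolist())  # [30.0, 29.0, nan]
```

On the notebook data this would replace the year subtraction, e.g. `df['Age'] = age_in_years(df['DOB'], snapshot)` for some fixed `snapshot` timestamp, followed by the same median fill.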

## 📊 3. Summary Visualizations & Insights

In [None]:
sns.histplot(df['Monthly_Income'], bins=50, kde=True)
plt.title('Monthly Income Distribution')
plt.xlabel('Monthly Income')
plt.ylabel('Frequency')
plt.show()

In [None]:
sns.histplot(df['Loan_Amount'], bins=50, kde=True)
plt.title('Loan Amount Distribution')
plt.xlabel('Loan Amount')
plt.ylabel('Frequency')
plt.show()
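
Because the cleaning step fills missing `Loan_Amount` values with 0, the bar at 0 in this histogram includes every lead whose amount was missing. A small sketch that reports the size of that block and keeps only positive amounts for separate summaries; `split_zero_fill` is a hypothetical helper and the toy series is invented.

```python
import pandas as pd

def split_zero_fill(amounts: pd.Series) -> tuple[float, pd.Series]:
    """Return the share of exact zeros and the strictly positive values."""
    zero_share = float((amounts == 0).mean())
    positive = amounts[amounts > 0]
    return zero_share, positive

toy_amounts = pd.Series([0, 0, 50000, 120000, 300000], name="Loan_Amount")
share, positive = split_zero_fill(toy_amounts)
print(share)              # 0.4
print(positive.median())  # 120000.0
```

The positive values could then be plotted on their own, for instance after `np.log1p`, so the zero block does not dominate the y-axis.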

In [None]:
sns.scatterplot(x='Monthly_Income', y='EMI', data=df, alpha=0.5)
plt.title('EMI vs Monthly Income')
plt.xlabel('Monthly Income')
plt.ylabel('EMI')
plt.show()

In [None]:
sns.boxplot(x='Primary_Bank_Type', y='Loan_Period', data=df)
plt.title('Loan Period by Bank Type')
plt.show()

In [None]:
sns.scatterplot(x='Age', y='Loan_Amount', data=df, alpha=0.5)
plt.title('Age vs Loan Amount')
plt.show()

# 📊 Continued: Detailed EDA of Loan Data

This notebook expands the basic analysis by exploring patterns in income, loans, age, banking behavior, and customer demographics.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import warnings
warnings.filterwarnings('ignore')

# Reload and re-clean the data so this notebook runs on its own
df = pd.read_csv('../data/test.csv')
df['DOB'] = pd.to_datetime(df['DOB'], errors='coerce')
df['Lead_Creation_Date'] = pd.to_datetime(df['Lead_Creation_Date'], errors='coerce')
df['Age'] = pd.to_datetime('today').year - df['DOB'].dt.year
df['Age'] = df['Age'].fillna(df['Age'].median())
# Assign back instead of using column-level fillna(inplace=True) (chained assignment)
df = df.fillna({'Existing_EMI': 0, 'Loan_Amount': 0, 'EMI': 0})
# Treat zero income as missing, then impute with the median of the remaining incomes
df['Monthly_Income'] = df['Monthly_Income'].replace(0, np.nan)
df['Monthly_Income'] = df['Monthly_Income'].fillna(df['Monthly_Income'].median())
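
This cell repeats part of the earlier cleaning with small differences: it imputes zero incomes but does not fill the employer columns, `Loan_Period`, or `Interest_Rate`. One way to keep both notebooks consistent is a single cleaning function. The sketch below is a hypothetical `clean_loan_data` that covers only this cell's steps (omitting the `Lead_Creation_Date` conversion for brevity), run on a made-up toy frame.

```python
import numpy as np
import pandas as pd

def clean_loan_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply this cell's cleaning steps and return a new DataFrame."""
    df = raw.copy()
    df["DOB"] = pd.to_datetime(df["DOB"], errors="coerce")
    # Approximate age: calendar-year difference from today, median-imputed
    df["Age"] = pd.Timestamp("today").year - df["DOB"].dt.year
    df["Age"] = df["Age"].fillna(df["Age"].median())
    # Missing EMI / loan fields are treated as 0, as in the cell above
    df = df.fillna({"Existing_EMI": 0, "Loan_Amount": 0, "EMI": 0})
    # Zero income is treated as missing and imputed with the median of the rest
    income = df["Monthly_Income"].replace(0, np.nan)
    df["Monthly_Income"] = income.fillna(income.median())
    return df

toy = pd.DataFrame({
    "DOB": ["1985-03-01", "not a date", "1990-07-20", "1975-11-30"],
    "Monthly_Income": [0, 1000, 3000, np.nan],
    "Existing_EMI": [np.nan, 500, 0, 200],
    "Loan_Amount": [50000, np.nan, 20000, 10000],
    "EMI": [1500, np.nan, 600, 300],
})
clean = clean_loan_data(toy)
print(clean[["Age", "Monthly_Income", "EMI"]])
```

Returning a copy keeps the raw frame untouched, so a notebook can compare raw and cleaned values side by side.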


## 💰 Income by Gender

In [None]:
sns.boxplot(x='Gender', y='Monthly_Income', data=df)
plt.title('Monthly Income Distribution by Gender')
plt.ylabel('Monthly Income')
plt.show()
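
Seaborn's box plots, with the default whisker setting (`whis=1.5`), mark as individual points any values more than 1.5 × IQR below the first quartile or above the third. To count those points per group instead of reading them off the chart, the sketch below applies the same rule; `iqr_outlier_counts` is a hypothetical helper and the incomes are invented.

```python
import pandas as pd

def iqr_outlier_counts(frame: pd.DataFrame, value: str, group: str,
                       k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] within each group."""
    def count(values: pd.Series) -> int:
        q1, q3 = values.quantile([0.25, 0.75])
        iqr = q3 - q1
        outside = (values < q1 - k * iqr) | (values > q3 + k * iqr)
        return int(outside.sum())
    return frame.groupby(group)[value].apply(count)

toy = pd.DataFrame({
    "Gender": ["Male"] * 5 + ["Female"] * 5,
    "Monthly_Income": [3000, 3200, 3400, 3600, 20000,
                       2800, 3000, 3100, 3300, 3500],
})
counts = iqr_outlier_counts(toy, "Monthly_Income", "Gender")
print(counts)
```

On the notebook data the call would be `iqr_outlier_counts(df, 'Monthly_Income', 'Gender')`.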

## 🏢 Loan Amount by Employer Category

In [None]:
sns.boxplot(x='Employer_Category1', y='Loan_Amount', data=df)
plt.xticks(rotation=45)
plt.title('Loan Amount by Employer Category')
plt.show()
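
To make the differences between employer categories concrete, the medians drawn in the box plot can be tabulated directly. A minimal sketch with a hypothetical `median_by_group` helper and invented rows; the optional `drop_zeros` flag is an assumption worth considering here, because missing loan amounts were filled with 0 and pull the medians down.

```python
import pandas as pd

def median_by_group(frame: pd.DataFrame, value: str, group: str,
                    drop_zeros: bool = False) -> pd.Series:
    """Median of `value` per `group`, highest first."""
    data = frame[frame[value] != 0] if drop_zeros else frame
    return data.groupby(group)[value].median().sort_values(ascending=False)

toy = pd.DataFrame({
    "Employer_Category1": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "Loan_Amount": [100, 200, 300, 50, 60, 400, 0, 500],
})
with_zeros = median_by_group(toy, "Loan_Amount", "Employer_Category1")
without_zeros = median_by_group(toy, "Loan_Amount", "Employer_Category1",
                                drop_zeros=True)
print(with_zeros)
print(without_zeros)
```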

## 🔄 EMI vs Loan Amount

In [None]:
sns.scatterplot(data=df, x='Loan_Amount', y='EMI', hue='Primary_Bank_Type')
plt.title('EMI vs Loan Amount by Bank Type')
plt.show()
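
The strong link between EMI and loan amount is what the standard reducing-balance EMI formula predicts: for principal $P$, monthly rate $r$ and $n$ monthly instalments, $\text{EMI} = P\,r(1+r)^n / ((1+r)^n - 1)$, which is linear in $P$ once rate and tenure are fixed, so remaining spread in the scatter would come from differing rates and tenures. The sketch assumes `Interest_Rate` is an annual percentage and `Loan_Period` is in years; neither unit is documented here, so treat `emi` as a hypothetical helper for checking the pattern, not a description of how the dataset's `EMI` column was produced.

```python
def emi(principal: float, annual_rate_pct: float, years: float) -> float:
    """Reducing-balance EMI: P * r * (1 + r)**n / ((1 + r)**n - 1)."""
    r = annual_rate_pct / 12 / 100   # monthly rate as a fraction
    n = round(years * 12)            # number of monthly instalments
    if n <= 0:
        # e.g. rows where Loan_Period was filled with 0
        raise ValueError("years must be positive")
    if r == 0:
        return principal / n         # zero interest: equal principal repayments
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)

print(round(emi(100_000, 12, 1), 2))  # 8884.88
print(round(emi(200_000, 12, 1), 2))  # 17769.76 (doubling P doubles EMI)
```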

## 🧓 Age Distribution

In [None]:
sns.histplot(df['Age'], bins=30, kde=True)
plt.title('Distribution of Customer Age')
plt.xlabel('Age')
plt.ylabel('Count')
plt.show()
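
The concentration of ages between 25 and 45 can be checked numerically as the share of customers inside that band. A sketch with a hypothetical `share_in_band` helper and made-up ages; note that `Age` was median-imputed, so all imputed rows sit at a single value.

```python
import pandas as pd

def share_in_band(ages: pd.Series, low: float, high: float) -> float:
    """Fraction of non-missing ages with low <= age <= high."""
    valid = ages.dropna()
    return float(valid.between(low, high).mean())

toy_ages = pd.Series([22, 25, 30, 45, 50])
print(share_in_band(toy_ages, 25, 45))  # 0.6
```

On the notebook data this would be `share_in_band(df['Age'], 25, 45)`.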

## 📈 Correlation Heatmap

In [None]:
plt.figure(figsize=(10,6))
sns.heatmap(df[['Monthly_Income','Loan_Amount','EMI','Interest_Rate','Age']].corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()
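
`DataFrame.corr()` computes Pearson coefficients by default, and a few extreme incomes or loan amounts can dominate them. Rank-based Spearman correlation (`method='spearman'`) is less sensitive to such outliers and can be compared with the heatmap. In the invented pair below the relationship is perfectly monotonic, yet one extreme value pulls Pearson well below 1. Rows where both `Loan_Amount` and `EMI` were filled with 0 also form a cluster at the origin that can affect either coefficient.

```python
import pandas as pd

toy = pd.DataFrame({
    "Monthly_Income": [1, 2, 3, 4, 100],  # one extreme value
    "EMI": [1, 2, 3, 4, 5],
})
pearson = toy.corr().loc["Monthly_Income", "EMI"]
spearman = toy.corr(method="spearman").loc["Monthly_Income", "EMI"]
print(f"Pearson {pearson:.2f}, Spearman {spearman:.2f}")  # Pearson 0.73, Spearman 1.00
```

On the notebook data, passing `.corr(method='spearman')` to the same heatmap call gives a side-by-side comparison.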

## 📌 Summary of Findings


- 🔹 **Monthly Income** shows a wide range with some outliers.
- 🔹 **Loan Amounts** are higher for certain employer categories.
- 🔹 **EMI and Loan Amount** are strongly correlated, as expected.
- 🔹 **Age distribution** is concentrated between 25 and 45 years, suggesting a young to mid-career borrower base.
- 🔹 **Interest Rate** shows a weak correlation with income but a moderate one with EMI.
- 🔹 **Bank Type** shifts the EMI vs. loan amount pattern only slightly.

These insights can guide financial institutions to segment and prioritize leads more effectively.
