# Task 5: Personal Loan Acceptance Prediction
### Objective:
Predict which customers are likely to accept a personal loan offer using classification models.
- **Dataset**: Bank Marketing Dataset (UCI Machine Learning Repository)



In [None]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

## Load Dataset

In [None]:
# Load the dataset
df = pd.read_csv("../data/bank.csv")

# Show basic info and first few rows
print(df.info())
df.head()

## Dataset Description
The dataset contains 4521 customer records and 17 columns, including:
- age, job, marital, education, balance
- default, housing, loan, contact, month, day, duration
- pdays, previous, poutcome
- y → target (1 = accepted loan, 0 = not accepted)



## Data Cleaning and Preparation

In [None]:
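# Ensure target column is numeric; if the file still uses 'yes'/'no'
# string labels, map them to 1/0 before casting
if df['y'].dtype == object:
    df['y'] = df['y'].str.strip().map({'yes': 1, 'no': 0})
df['y'] = df['y'].astype(int)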

# Check for missing values
df.isnull().sum()

## Exploratory Data Analysis (EDA)
Explore how features such as age, job, marital status, and education relate to loan acceptance.

#### Age Distribution by Loan Acceptance

In [None]:
sns.set(style="whitegrid")

# One plot per cell, so a single-axes figure is enough
plt.figure(figsize=(8, 5))
sns.histplot(data=df, x='age', hue='y', bins=30, kde=True, palette='pastel')
plt.title("Age Distribution by Loan Acceptance")


plt.tight_layout()
plt.show()

##### Insight:
The graph above shows:
- Most clients are aged 30–40.
- Clients around 30–35 are slightly more likely to accept.
- Overall, younger clients (25–40) account for more acceptances than older ones.
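
The histogram shows raw counts, so the largest age groups dominate the picture. A minimal sketch of how acceptance *rates* per age band could be compared instead is given here. The helper name `acceptance_rate_by_age`, the band edges, and the toy data are assumptions for illustration only; in the notebook the function would be called on `df`.

```python
import pandas as pd

def acceptance_rate_by_age(frame, edges=(18, 25, 30, 35, 40, 50, 60, 100)):
    """Acceptance rate and group size per age band (hypothetical helper)."""
    # Left-closed bands, e.g. [30, 35) holds ages 30 to 34
    bands = pd.cut(frame["age"], bins=list(edges), right=False)
    return frame.groupby(bands, observed=False)["y"].agg(rate="mean", n="size")

# Toy data for illustration only; not taken from bank.csv
toy = pd.DataFrame({
    "age": [22, 27, 31, 33, 38, 45, 52, 64],
    "y":   [1,  0,  1,  0,  0,  0,  0,  1],
})
age_rates = acceptance_rate_by_age(toy)
print(age_rates)
```

Reading the `rate` column next to `n` makes it clear whether a band has many acceptances because it is responsive or simply because it is large.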

#### Job vs Loan Acceptance

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(data=df, y='job', hue='y', palette='Set2')
plt.title("Job vs Loan Acceptance")


plt.tight_layout()
plt.show()

##### Insight:
The graph above shows:
- Management, technician, and admin. roles are heavily represented.
- Students, entrepreneurs, and the self-employed have a higher acceptance rate, even though their total counts are lower.
- Retired clients, housemaids, and blue-collar workers have low acceptance.



#### Marital Status vs Loan Acceptance

In [None]:
plt.figure(figsize=(8, 5))
sns.countplot(data=df, x='marital', hue='y', palette='Set3')
plt.title("Marital Status vs Loan Acceptance")

plt.tight_layout()
plt.show()


##### Insight:
The graph above shows:
- Most clients are married, but the acceptance rate is low in that group.
- Single people show a relatively higher acceptance rate than married or divorced.

#### Education vs Loan Acceptance

In [None]:
plt.figure(figsize=(8, 5))
sns.countplot(data=df, x='education', hue='y', palette='Set1')
plt.title("Education vs Loan Acceptance")

plt.tight_layout()
plt.show()

##### Insight:
The graph above shows:
- People with tertiary (higher) education are more likely to accept offers.
- Those with only primary education rarely accept.
- Clients with unknown education also show very low acceptance.
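
The count plots for job, marital status, and education compare bar heights, while the insights talk about acceptance rates. One way to read the rate directly is a row-normalised crosstab, sketched below. The toy data is invented for illustration; in the notebook the same calls would use `df` and any of the categorical columns.

```python
import pandas as pd

# Toy data for illustration only; not taken from bank.csv
toy = pd.DataFrame({
    "education": ["tertiary"] * 3 + ["secondary"] * 4 + ["primary"] * 2,
    "y":         [1, 1, 0,          1, 0, 0, 0,          0, 0],
})

# Row-normalised crosstab: each row sums to 1, column 1 is the acceptance rate
rates = pd.crosstab(toy["education"], toy["y"], normalize="index")
counts = toy["education"].value_counts()

summary = pd.DataFrame({"accept_rate": rates[1], "n": counts})
summary = summary.sort_values("accept_rate", ascending=False)
print(summary)
```

Keeping the group size `n` beside the rate helps avoid over-reading a high rate that comes from only a handful of customers.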

## Model Training & Testing

In [None]:

# Encode categorical variables
df_encoded = df.copy()
label_encoders = {}
for col in df_encoded.select_dtypes(include='object').columns:
    le = LabelEncoder()
    df_encoded[col] = le.fit_transform(df_encoded[col])
    label_encoders[col] = le

# Define features and target
X = df_encoded.drop('y', axis=1)
y = df_encoded['y']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
X.head()


In [None]:
y.head()
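
`LabelEncoder` gives nominal columns such as `job` arbitrary integer codes, which Logistic Regression then treats as ordered magnitudes. A possible alternative, shown here only as a sketch and not as the notebook's method, is to one-hot encode the categorical columns inside a `Pipeline` and to stratify the split so both sets keep the same share of acceptors. The synthetic data and the names `toy`, `preprocess`, and `pipe` are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the bank data (illustration only)
rng = np.random.default_rng(42)
n = 200
toy = pd.DataFrame({
    "age": rng.integers(18, 80, size=n),
    "balance": rng.normal(1500, 800, size=n),
    "job": rng.choice(["management", "student", "retired"], size=n),
    "marital": rng.choice(["single", "married", "divorced"], size=n),
})
toy["y"] = (rng.random(n) < 0.2).astype(int)

X_toy = toy.drop(columns="y")
y_toy = toy["y"]

# Scale numeric columns, one-hot encode nominal ones
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "balance"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["job", "marital"]),
])
pipe = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# stratify keeps the class ratio the same in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)
pipe.fit(X_tr, y_tr)
toy_pred = pipe.predict(X_te)
print(toy_pred[:10])
```

Because the preprocessing lives inside the pipeline, it is fitted on the training rows only, which matches how the scaler is handled in the Logistic Regression cell.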

### Logistic Regression

In [None]:
# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Logistic Regression on scaled data
lr_model = LogisticRegression(max_iter=1000)
lr_model.fit(X_train_scaled, y_train)
y_pred_lr = lr_model.predict(X_test_scaled)

### Decision Tree

In [None]:
# Decision Tree
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)
y_pred_dt = dt_model.predict(X_test)

## Evaluation Metrics

In [None]:
# Evaluate both models
print("Logistic Regression Accuracy:", round(accuracy_score(y_test, y_pred_lr) * 100, 2), "%")
print("Confusion Matrix (Logistic Regression):\n", confusion_matrix(y_test, y_pred_lr))
print("Classification Report (Logistic Regression):\n", classification_report(y_test, y_pred_lr))

print("\nDecision Tree Accuracy:", round(accuracy_score(y_test, y_pred_dt) * 100, 2), "%")
print("Confusion Matrix (Decision Tree):\n", confusion_matrix(y_test, y_pred_dt))
print("Classification Report (Decision Tree):\n", classification_report(y_test, y_pred_dt))

**Logistic Regression**
- Accuracy: 89%
- Good at identifying customers who won’t take the loan (788 of 807 non-acceptors classified correctly).
- Misses many customers who would take it (only 17 of 98 acceptors found).

**Decision Tree**
- Accuracy: 87%
- Better at finding customers who will take the loan (46 of 98 acceptors found).
- Makes more wrong guesses (64 false positives versus 19 for Logistic Regression).

---

**Suggestions:**
- Use Logistic Regression when false positives are costly and you want fewer wasted contacts.
- Use Decision Tree when you want to find more customers who might say yes.
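
The trade-off behind these suggestions can be made explicit with precision and recall for the "accepted" class. The sketch below derives them from the confusion-matrix counts used in this notebook's heatmap cells, assuming the usual scikit-learn layout `[[TN, FP], [FN, TP]]`. The helper name `positive_class_metrics` is hypothetical.

```python
def positive_class_metrics(cm):
    """Precision, recall and accuracy from a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    return {
        "precision": tp / (tp + fp),   # share of predicted acceptors that were right
        "recall": tp / (tp + fn),      # share of actual acceptors that were found
        "accuracy": (tn + tp) / (tn + fp + fn + tp),
    }

# Counts as plotted in this notebook's confusion-matrix heatmaps
cm_lr = [[788, 19], [81, 17]]
cm_dt = [[743, 64], [52, 46]]

lr_metrics = positive_class_metrics(cm_lr)
dt_metrics = positive_class_metrics(cm_dt)
for name, m in [("Logistic Regression", lr_metrics), ("Decision Tree", dt_metrics)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```

On these counts Logistic Regression has the higher precision and accuracy, while the Decision Tree has the higher recall, which is the distinction the suggestions rely on.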

In [None]:
# Logistic Regression

# Plot the confusion matrix computed from the test-set predictions
# (values from this run: [[788, 19], [81, 17]])
cm_lr = confusion_matrix(y_test, y_pred_lr)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(cm_lr, annot=True, fmt='d', cmap='Blues', ax=ax)
ax.set_title('Logistic Regression Confusion Matrix')
ax.set_xlabel('Predicted')
ax.set_ylabel('Actual')

plt.tight_layout()
plt.show()


In [None]:
# Decision Tree

# Plot the confusion matrix computed from the test-set predictions
# (values from this run: [[743, 64], [52, 46]])
cm_dt = confusion_matrix(y_test, y_pred_dt)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(cm_dt, annot=True, fmt='d', cmap='Greens', ax=ax)
ax.set_title('Decision Tree Confusion Matrix')
ax.set_xlabel('Predicted')
ax.set_ylabel('Actual')

plt.tight_layout()
plt.show()


In [None]:
# Feature importance from the Decision Tree
# (pandas, matplotlib, and seaborn are already imported in the first cell)

# Extract and plot top 10 important features
importances = dt_model.feature_importances_
feature_names = X.columns

importance_df = pd.DataFrame({
    'Feature': feature_names,
    'Importance': importances
}).sort_values(by='Importance', ascending=False)

plt.figure(figsize=(10, 6))
sns.barplot(data=importance_df.head(10), x='Importance', y='Feature', palette='viridis')
plt.title('Top 10 Important Features (Decision Tree)')
plt.xlabel('Feature Importance Score')
plt.ylabel('Feature')
plt.tight_layout()
plt.show()

importance_df.head(10)


Combining the feature importances with the EDA above, these factors matter most:

- Customers with longer call durations are more likely to say yes.
- Ages **25–40** are the most responsive group.
- Customers with a higher bank balance tend to accept more often.
- Single customers accept more often than married or divorced ones.
- Students, people in **management**, and **entrepreneurs** are more open to the offer.
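
The tree's `feature_importances_` rank features by how much they reduce impurity during training; they do not say in which direction a feature pushes the prediction, and they can favour columns with many distinct values. As a cross-check, permutation importance on held-out data could be compared with them. The sketch below uses synthetic data from `make_classification`, so the feature names `f0` to `f5` and the resulting numbers are illustrative assumptions, not results for the bank data.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration only
X_syn, y_syn = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=0,
    random_state=42,
)
cols = [f"f{i}" for i in range(X_syn.shape[1])]
X_syn = pd.DataFrame(X_syn, columns=cols)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)
tree = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_tr, y_tr)

# Shuffle each column of the test set and measure the drop in accuracy
result = permutation_importance(tree, X_te, y_te, n_repeats=10, random_state=42)

perm_df = pd.DataFrame({
    "Feature": cols,
    "Impurity": tree.feature_importances_,
    "Permutation": result.importances_mean,
}).sort_values("Permutation", ascending=False)
print(perm_df)
```

In the notebook, the same call would take `dt_model`, `X_test`, and `y_test`; features that rank high in both columns are the safer ones to build conclusions on.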

## Conclusion and Insights

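- **Decision Tree** identifies more of the customers likely to accept offers than Logistic Regression, at the cost of slightly lower overall accuracy (87% vs. 89%).
- **Single, highly educated, and younger customers** show higher acceptance rates.
- Recommendation: Use Decision Tree for targeted marketing campaigns where reaching more likely acceptors matters more than avoiding wasted contacts.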

