
**The classification goal is to predict whether the client will subscribe to a term deposit (variable y).
Apply EDA, select appropriate features manually, and choose an appropriate ML model to predict whether the customer will subscribe to the term deposit plan.**

***Dataset Used: Bank Marketing Dataset***

*Step 1: Load and prepare the dataset.*


In [None]:
from google.colab import drive
drive.mount('/content/drive')

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    classification_report,
    accuracy_score,
    confusion_matrix,
    roc_auc_score,
    RocCurveDisplay
)

data_path = '/content/drive/MyDrive/bank_marketing/bank/bank.csv'

df = pd.read_csv(data_path, sep=';')

print("Shape:", df.shape)
print("Columns:", list(df.columns))

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Shape: (4521, 17)
Columns: ['age', 'job', 'marital', 'education', 'default', 'balance', 'housing', 'loan', 'contact', 'day', 'month', 'duration', 'campaign', 'pdays', 'previous', 'poutcome', 'y']


*Step 2: Select features manually and apply EDA.*

In [None]:
# Manually selected features: three numeric columns and one categorical column
selected_features = ['age', 'balance', 'pdays', 'poutcome']
df = df[selected_features + ['y']]

print("\nSample Data:")
print(df.head())

print("\nTarget Distribution:")
print(df['y'].value_counts(normalize=True))

print("\nMissing Values:")
print(df.isnull().sum())

X = df[selected_features]
y = df['y'].map({'yes': 1, 'no': 0})  # Encode target variable

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print("\nTrain/Test Split Completed:")
print("Train size:", X_train.shape, "Test size:", X_test.shape)

# Preprocessing: scale the numeric columns, one-hot encode the categorical column
num_features = ['age', 'balance', 'pdays']
cat_features = ['poutcome']

numeric_transformer = Pipeline(steps=[
    ('scaler', StandardScaler())
])
categorical_transformer = Pipeline(steps=[
    ('encoder', OneHotEncoder(handle_unknown='ignore'))
])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, num_features),
        ('cat', categorical_transformer, cat_features)
    ]
)



Sample Data:
   age  balance  pdays poutcome   y
0   30     1787     -1  unknown  no
1   33     4789    339  failure  no
2   35     1350    330  failure  no
3   30     1476     -1  unknown  no
4   59        0     -1  unknown  no

Target Distribution:
y
no     0.88476
yes    0.11524
Name: proportion, dtype: float64

Missing Values:
age         0
balance     0
pdays       0
poutcome    0
y           0
dtype: int64

Train/Test Split Completed:
Train size: (3616, 4) Test size: (905, 4)
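
The EDA above checks the target split and missing values. One further check that may help judge a manually chosen categorical feature such as `poutcome` is the subscription rate within each of its levels. The sketch below is only an illustration: `subscription_rate_by` is a hypothetical helper, not part of the notebook, and the `toy` rows are made up rather than taken from `bank.csv`.

```python
import pandas as pd


def subscription_rate_by(frame: pd.DataFrame, column: str, target: str = "y") -> pd.Series:
    """Return the share of 'yes' outcomes within each level of `column`."""
    is_yes = frame[target].eq("yes")
    # Mean of a boolean Series per group = fraction of 'yes' in that group.
    return is_yes.groupby(frame[column]).mean().sort_values(ascending=False)


# Toy rows for illustration only; these are NOT rows from bank.csv.
toy = pd.DataFrame({
    "poutcome": ["unknown", "unknown", "failure", "success", "success", "failure"],
    "y":        ["no",      "no",      "no",      "yes",     "yes",     "yes"],
})

rates = subscription_rate_by(toy, "poutcome")
print(rates)
```

On the real data the same call would be `subscription_rate_by(df, 'poutcome')`, run before `y` is mapped to 0/1. Levels whose rate sits far from the overall 11.5% are the ones most likely to carry signal.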


*Step 3: Build a Logistic Regression model for the dataset.*

In [None]:
# Train Logistic Regression Model
model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression(max_iter=1000))
])

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]

# Evaluate Model
print("\nClassification Report:")
print(classification_report(y_test, y_pred, digits=4))

accuracy = accuracy_score(y_test, y_pred)
print("\nModel Accuracy:", round(accuracy * 100, 2), "%")

print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

roc_auc = roc_auc_score(y_test, y_proba)
print("\nROC-AUC Score:", roc_auc)




Classification Report:
              precision    recall  f1-score   support

           0     0.9017    0.9850    0.9415       801
           1     0.6000    0.1731    0.2687       104

    accuracy                         0.8917       905
   macro avg     0.7509    0.5790    0.6051       905
weighted avg     0.8670    0.8917    0.8642       905


Model Accuracy: 89.17 %

Confusion Matrix:
[[789  12]
 [ 86  18]]

ROC-AUC Score: 0.6116573033707865
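
The class 1 figures in the report can be re-derived by hand from the confusion matrix, which makes the gap between accuracy and recall easier to see. The sketch below uses plain Python with illustrative variable names. It assumes scikit-learn's layout for `confusion_matrix` (rows are actual classes, columns are predicted classes) and also computes the accuracy of always predicting "no" on the same test set, for comparison.

```python
# Confusion matrix copied from the output above (rows = actual, columns = predicted).
tn, fp = 789, 12   # actual "no":  correctly rejected, wrongly flagged
fn, tp = 86, 18    # actual "yes": missed, correctly found
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # of the predicted subscribers, how many really subscribed
recall = tp / (tp + fn)      # of the real subscribers, how many the model found
f1 = 2 * precision * recall / (precision + recall)

# Accuracy of a trivial model that predicts "no" for every client.
baseline_accuracy = (tn + fp) / total

print(f"accuracy          = {accuracy:.4f}")
print(f"precision (yes)   = {precision:.4f}")
print(f"recall (yes)      = {recall:.4f}")
print(f"f1 (yes)          = {f1:.4f}")
print(f"baseline accuracy = {baseline_accuracy:.4f}")
```

The accuracy, precision, recall and F1 values reproduce the report. The always-"no" baseline is already about 88.5% accurate on this split, so the 89.17% accuracy mostly reflects class imbalance. The model finds only 18 of the 104 actual subscribers.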
