# Bank Marketing Data - A Decision Tree Approach

Aim:
The aim of this notebook is to predict whether a client will subscribe (yes/no) to a term deposit by building a classification model with a decision tree.

In [39]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler, LabelEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn import metrics
%matplotlib inline

## Step 1: Load the data
Load the `bank.csv` data
Check the first five observations
Check if there are any null values

In [40]:
df = pd.read_csv('bank.csv')

In [41]:
df.head(5)

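| | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | deposit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 59 | admin. | married | secondary | no | 2343 | yes | no | unknown | 5 | may | 1042 | 1 | -1 | 0 | unknown | yes |
| 1 | 56 | admin. | married | secondary | no | 45 | no | no | unknown | 5 | may | 1467 | 1 | -1 | 0 | unknown | yes |
| 2 | 41 | technician | married | secondary | no | 1270 | yes | no | unknown | 5 | may | 1389 | 1 | -1 | 0 | unknown | yes |
| 3 | 55 | services | married | secondary | no | 2476 | yes | no | unknown | 5 | may | 579 | 1 | -1 | 0 | unknown | yes |
| 4 | 54 | admin. | married | tertiary | no | 184 | no | no | unknown | 5 | may | 673 | 2 | -1 | 0 | unknown | yes |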


In [42]:
df.isnull().sum()

age          0
job          0
marital      0
education    0
default      0
balance      0
housing      0
loan         0
contact      0
day          0
month        0
duration     0
campaign     0
pdays        0
previous     0
poutcome     0
deposit      0
dtype: int64
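
A zero null count does not mean every value is informative: the first rows above already show `unknown` in `contact` and `poutcome`, and `-1` in `pdays`. The sketch below counts such placeholders on a small hypothetical frame (the rows are invented; only the column names mirror `bank.csv`). Treating `-1` in `pdays` as a sentinel for "no previous contact" is an assumption here, not something this notebook verifies.

```python
import pandas as pd

# Hypothetical rows; only the column names come from bank.csv.
toy = pd.DataFrame({
    "contact": ["unknown", "cellular", "unknown"],
    "poutcome": ["unknown", "success", "failure"],
    "pdays": [-1, 90, -1],
})

# isnull() sees no missing values, because placeholders are ordinary strings/ints.
null_counts = toy.isnull().sum()

# Count the literal 'unknown' label per categorical column.
unknown_counts = (toy[["contact", "poutcome"]] == "unknown").sum()

# Count rows carrying the assumed -1 sentinel in pdays.
sentinel_count = int((toy["pdays"] == -1).sum())

print(null_counts)
print(unknown_counts)
print(sentinel_count)
```

The same three expressions could be run against `df` to see how much of the real data is placeholder values.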

In [43]:
df['day']

0         5
1         5
2         5
3         5
4         5
         ..
11157    20
11158    16
11159    19
11160     8
11161     9
Name: day, Length: 11162, dtype: int64

Summary of data
Categorical Variables:
[1] job : admin., technician, services, management, retired, blue-collar, unemployed, entrepreneur, housemaid, unknown, self-employed, student
[2] marital : married, single, divorced
[3] education: secondary, tertiary, primary, unknown
[4] default : yes, no
[5] housing : yes, no
[6] loan : yes, no
[7] deposit : yes, no **(Dependent Variable)**
[8] contact : unknown, cellular, telephone
[9] month : jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec
[10] poutcome: unknown, other, failure, success

Numerical Variables:
[1] age
[2] balance
[3] day
[4] duration
[5] campaign
[6] pdays
[7] previous
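
The split into categorical and numerical variables above can also be derived from the column dtypes instead of being typed by hand. This is a minimal sketch on a hypothetical three-row frame (invented values, column names borrowed from `bank.csv`); it assumes the categorical columns are stored as strings and the numerical ones as integers, as the outputs above suggest.

```python
import pandas as pd

# Hypothetical rows; only the column names mirror bank.csv.
toy = pd.DataFrame({
    "age": [59, 41, 33],
    "job": ["admin.", "technician", "services"],
    "balance": [2343, 1270, -50],
    "deposit": ["yes", "no", "no"],
})

# Exclude the target before listing feature columns.
features = toy.drop(columns="deposit")

# Numeric dtypes go to one list, everything else to the other.
numeric_cols = features.select_dtypes(include="number").columns.tolist()
categorical_cols = features.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(categorical_cols)
```

A hand-written list remains the safer choice when a numeric-looking column (such as `day`) should arguably be treated as categorical.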

## Step 2: Transformer

In [44]:
numeric_features = ['age', 'balance', 'day', 'duration', 'campaign', 'pdays', 'previous']
categorical_features = ['job', 'marital', 'education', 'default', 'housing', 'loan', 'contact', 'month', 'poutcome']

In [45]:
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])

In [46]:
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(drop='first', handle_unknown='ignore'))
])

In [47]:
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ])
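
To see what a combined transformer of this shape produces, the sketch below applies the same structure to a hypothetical four-row frame. The data and the name `toy_preprocessor` are invented for illustration, and `handle_unknown` is left out for brevity. Each numeric column is standardized, and each categorical column with k levels becomes k - 1 indicator columns because of `drop='first'`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical rows; only the column names mirror bank.csv.
toy = pd.DataFrame({
    "age": [30, 40, 50, 60],
    "balance": [100, 200, 300, 400],
    "marital": ["married", "single", "divorced", "married"],
    "default": ["no", "yes", "no", "no"],
})

toy_preprocessor = ColumnTransformer(transformers=[
    ("num", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
    ]), ["age", "balance"]),
    ("cat", Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
        ("onehot", OneHotEncoder(drop="first")),
    ]), ["marital", "default"]),
])

transformed = toy_preprocessor.fit_transform(toy)

# The result may be sparse or dense depending on its density; normalize to dense.
dense = transformed.toarray() if hasattr(transformed, "toarray") else np.asarray(transformed)

# 2 scaled numeric columns + (3 - 1) marital indicators + (2 - 1) default indicator.
print(dense.shape)
```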

## Step 3: Classifier

In [48]:
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(df['deposit'])
X = df.drop('deposit', axis=1)
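
`LabelEncoder` assigns integer codes in sorted order of the labels, so for a yes/no target `no` becomes 0 and `yes` becomes 1. A small sketch with a hypothetical label list makes the mapping explicit and shows how to translate predictions back to the original labels:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical target values standing in for df['deposit'].
labels = ["yes", "no", "no", "yes"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ holds the labels in sorted order; the position is the integer code.
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform maps integer codes back to the original labels.
decoded = encoder.inverse_transform([1, 0])
print(decoded)
```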

In [49]:
full_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', DecisionTreeClassifier())
])

In [50]:
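# Fit on the full dataset, then predict on the same rows.
# These are in-sample predictions and do not measure generalization.
full_pipeline.fit(X, y)
predictions = full_pipeline.predict(X)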

In [51]:
predictions

array([1, 1, 1, ..., 0, 0, 0])
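
Because the predictions above are made on the rows used for fitting, they say little about how the tree behaves on unseen clients. One common alternative is to hold out a test set with `train_test_split`, which is imported at the top of this notebook but never used. The sketch below does this on synthetic data: the frame, the noisy target rule, and the names `toy`, `y_toy`, and `model` are all hypothetical, and this is not a result for `bank.csv`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the bank data (hypothetical values).
rng = np.random.default_rng(0)
n = 400
toy = pd.DataFrame({
    "duration": rng.integers(10, 1500, size=n),
    "balance": rng.integers(-500, 5000, size=n),
    "housing": rng.choice(["yes", "no"], size=n),
})

# Hypothetical target: long calls tend to end in a deposit, with 20% label noise.
noise = rng.random(n) < 0.2
y_toy = ((toy["duration"] > 600) ^ noise).astype(int)

# Hold out 25% of the rows, preserving the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    toy, y_toy, test_size=0.25, stratify=y_toy, random_state=42
)

model = Pipeline(steps=[
    ("preprocessor", ColumnTransformer(transformers=[
        ("num", StandardScaler(), ["duration", "balance"]),
        ("cat", OneHotEncoder(drop="first"), ["housing"]),
    ])),
    ("classifier", DecisionTreeClassifier(random_state=42)),
])

# Fit on the training part only, then score both parts.
model.fit(X_train, y_train)
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

An unpruned tree typically scores much higher on the training rows than on the held-out rows; limiting `max_depth` or `min_samples_leaf` is a common way to narrow that gap.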