# Bank Marketing Success Classification

The data comes from the direct marketing campaigns of a Portuguese banking institution. The campaigns were conducted by phone. Often, more than one contact with the same client was required to assess whether the product (a bank term deposit) would be subscribed ('yes') or not ('no').

The goal of this project is to run classification algorithms to predict whether a customer will subscribe to a term deposit. The `bank-names.txt` file describes all the independent variables as well as the dependent variable.

In [9]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from dataprep.eda import plot

In [10]:
df = pd.read_csv("bank-full.csv", delimiter=";")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45211 entries, 0 to 45210
Data columns (total 17 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   age        45211 non-null  int64 
 1   job        45211 non-null  object
 2   marital    45211 non-null  object
 3   education  45211 non-null  object
 4   default    45211 non-null  object
 5   balance    45211 non-null  int64 
 6   housing    45211 non-null  object
 7   loan       45211 non-null  object
 8   contact    45211 non-null  object
 9   day        45211 non-null  int64 
 10  month      45211 non-null  object
 11  duration   45211 non-null  int64 
 12  campaign   45211 non-null  int64 
 13  pdays      45211 non-null  int64 
 14  previous   45211 non-null  int64 
 15  poutcome   45211 non-null  object
 16  y          45211 non-null  object
dtypes: int64(7), object(10)
memory usage: 5.9+ MB


# EDA

In [None]:
plot(df)

In [None]:
plot(df, 'age', 'y')

In [None]:
plot(df, 'job', 'y')

In [None]:
plot(df, 'marital', 'y')

In [None]:
plot(df, 'education', 'y')

In [None]:
plot(df, 'default', 'y')

In [None]:
plot(df, 'balance', 'y')

In [None]:
plot(df, 'housing', 'y')

In [None]:
plot(df, 'loan', 'y')

In [None]:
plot(df, 'contact', 'y')

In [None]:
plot(df, 'day', 'y')

In [None]:
plot(df, 'month', 'y')

In [None]:
plot(df, 'duration', 'y')

In [None]:
plot(df, 'campaign', 'y')

In [None]:
plot(df, 'pdays', 'y')

In [None]:
plot(df, 'previous', 'y')

In [None]:
plot(df, 'poutcome', 'y')
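
The plots above compare each feature against `y` one column at a time. As a numeric complement for the categorical columns, the share of `'yes'` outcomes per category level can be tabulated with plain pandas. This is a minimal sketch, not part of the original notebook: `subscription_rate` is a hypothetical helper, the sample rows are made up for illustration, and it assumes `y` still holds the raw `'yes'`/`'no'` strings.

```python
import pandas as pd


def subscription_rate(df, column, target="y"):
    """Share of rows with target == 'yes' for each level of `column`."""
    is_yes = df[target].eq("yes")
    # Group the boolean flags by the feature; the mean of booleans is a rate.
    return is_yes.groupby(df[column]).mean().sort_values(ascending=False)


# Made-up rows for illustration only (not taken from bank-full.csv).
sample = pd.DataFrame({
    "marital": ["married", "single", "married", "single", "divorced"],
    "y": ["no", "yes", "yes", "yes", "no"],
})
rates = subscription_rate(sample, "marital")
print(rates)
```

On the real data the same call would be `subscription_rate(df, 'job')`, and so on for each categorical column.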

# Data Cleaning

In [11]:
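from data_cleaning import (
    clean_binary_cols,
    get_cat_cols_dummies,
    month_to_numeric,
    month_to_quarters,
    was_previously_contacted,
)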

In [12]:
# binary: encode the yes/no columns as 1/0 and flag clients contacted in an earlier campaign
df = clean_binary_cols(df)
df['contacted_previously'] = df['pdays'].apply(was_previously_contacted)
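
The `data_cleaning` module is not shown in this notebook, so the following is only a guess at what the two helpers might do. It assumes the yes/no columns are `default`, `housing`, `loan`, and `y`, and that `pdays == -1` marks a client with no earlier contact, as described in the dataset documentation. The function bodies and the sample rows are hypothetical.

```python
import pandas as pd

# Assumed list of yes/no columns; the real helper may use a different set.
BINARY_COLS = ["default", "housing", "loan", "y"]


def clean_binary_cols(df, cols=BINARY_COLS):
    """Return a copy of df with 'yes'/'no' strings mapped to 1/0."""
    out = df.copy()
    for col in cols:
        out[col] = out[col].map({"yes": 1, "no": 0})
    return out


def was_previously_contacted(pdays):
    """Return 0 for the -1 sentinel (never contacted), 1 otherwise."""
    return 0 if pdays == -1 else 1


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "default": ["no", "no"],
    "housing": ["yes", "no"],
    "loan": ["no", "yes"],
    "pdays": [-1, 184],
    "y": ["no", "yes"],
})
cleaned = clean_binary_cols(sample)
cleaned["contacted_previously"] = cleaned["pdays"].apply(was_previously_contacted)
print(cleaned)
```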

In [13]:
# transformations: z-score standardization of balance and duration

df['standardized_balance'] = (df['balance'] - np.mean(df['balance'])) / np.std(df['balance'])
df['standardized_duration'] = (df['duration'] - np.mean(df['duration'])) / np.std(df['duration'])
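
Both lines above apply the z-score, z = (x - mean) / std. One detail worth knowing: `np.std` defaults to the population standard deviation (`ddof=0`), while pandas `Series.std()` defaults to the sample version (`ddof=1`), so the two are not interchangeable. The sketch below wraps the same computation in a hypothetical `standardize` helper, using made-up values, and checks it against the NumPy form.

```python
import numpy as np
import pandas as pd


def standardize(series):
    """Z-score a numeric Series using the population std (ddof=0)."""
    return (series - series.mean()) / series.std(ddof=0)


# Made-up balances for illustration only.
balance = pd.Series([2143, 29, 2, 1506, 1], name="balance")
z = standardize(balance)

# Same computation with NumPy on the raw array, as in the cell above.
values = balance.to_numpy()
manual = (values - np.mean(values)) / np.std(values)

print(np.allclose(z.to_numpy(), manual))
```

After standardization the column has mean 0 and population standard deviation 1, which puts `balance` and `duration` on a comparable scale.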

In [14]:
# extra month columns: month number and quarter derived from the month name
df['month_numeric'] = df['month'].apply(month_to_numeric)
df['quarter'] = df['month'].apply(month_to_quarters)
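
The two month helpers also live in `data_cleaning` and are not shown. A plausible version, assuming `month` holds lowercase three-letter abbreviations (as the `month_jul`, `month_jun`, ... columns further down suggest) and that quarters are calendar quarters numbered 1 to 4, could look like this. Both function bodies are hypothetical.

```python
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
MONTH_TO_NUMBER = {name: number for number, name in enumerate(MONTHS, start=1)}


def month_to_numeric(month):
    """Map a three-letter month abbreviation to 1..12."""
    return MONTH_TO_NUMBER[month.lower()]


def month_to_quarters(month):
    """Map a three-letter month abbreviation to its calendar quarter, 1..4."""
    return (month_to_numeric(month) - 1) // 3 + 1


print(month_to_numeric("may"), month_to_quarters("may"))
```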

In [15]:
# categorical: one-hot encode, dropping the original columns and the first level of each
df = get_cat_cols_dummies(df, drop_original_cols=True, drop_first=True)
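
`get_cat_cols_dummies` is another helper whose source is not shown. The sketch below is one way it might be written on top of `pd.get_dummies`, keeping the same signature as the call above. It assumes every non-numeric column is categorical, and the sample rows are made up.

```python
import pandas as pd


def get_cat_cols_dummies(df, drop_original_cols=True, drop_first=True):
    """One-hot encode all non-numeric columns of df (hypothetical version)."""
    cat_cols = df.select_dtypes(exclude="number").columns.tolist()
    # get_dummies replaces the listed columns with 0/1 indicator columns.
    encoded = pd.get_dummies(df, columns=cat_cols, drop_first=drop_first, dtype=int)
    if not drop_original_cols:
        # Put the original string columns back alongside the indicators.
        encoded = pd.concat([encoded, df[cat_cols]], axis=1)
    return encoded


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "age": [58, 44, 33, 47],
    "poutcome": ["unknown", "success", "other", "failure"],
})
encoded = get_cat_cols_dummies(sample)
print(encoded)
```

With `drop_first=True` the alphabetically first level of each column (here `failure`) becomes the implicit baseline, which avoids perfectly collinear indicator columns in linear models.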

In [16]:
df

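| | age | default | balance | housing | loan | day | duration | campaign | pdays | previous | ... | month_jul | month_jun | month_mar | month_may | month_nov | month_oct | month_sep | poutcome_other | poutcome_success | poutcome_unknown |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 58 | 0 | 2143 | 1 | 0 | 5 | 261 | 1 | -1 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 44 | 0 | 29 | 1 | 0 | 5 | 151 | 1 | -1 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 33 | 0 | 2 | 1 | 1 | 5 | 76 | 1 | -1 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 47 | 0 | 1506 | 1 | 0 | 5 | 92 | 1 | -1 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 33 | 0 | 1 | 0 | 0 | 5 | 198 | 1 | -1 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 45206 | 51 | 0 | 825 | 0 | 0 | 17 | 977 | 3 | -1 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 45207 | 71 | 0 | 1729 | 0 | 0 | 17 | 456 | 2 | -1 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 45208 | 72 | 0 | 5715 | 0 | 0 | 17 | 1127 | 5 | 184 | 3 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 45209 | 57 | 0 | 668 | 0 | 0 | 17 | 508 | 4 | -1 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |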


# Model Building