In [3]:
import pandas as pd

# Bank Marketing

## Objectives

- Predict the success of telemarketing calls for selling bank long-term deposits.

## Algorithms/Models

- Logistic Regression (Classification)
- Support Vector Machine (Classification)
- Random Forest (Classification)
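
The three model families listed above could be set up roughly as follows. This is a minimal sketch assuming scikit-learn, which the notebook does not name. The helper `build_models`, the hyperparameters, and the choice to standardize inputs for the first two models are illustrative assumptions, not the author's configuration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(random_state=0):
    """Return the three candidate classifiers, keyed by a short name.

    Hypothetical helper: names and settings are assumptions for illustration.
    """
    return {
        # Scaling helps the solver converge on features with different ranges.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        # SVMs are sensitive to feature scale, so standardize first.
        "svm": make_pipeline(StandardScaler(), SVC(random_state=random_state)),
        # Tree ensembles do not require scaled inputs.
        "random_forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
    }
```

Each entry exposes the usual `fit` and `predict` methods, so the same training and evaluation loop can be applied to all three.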

## Data Preview

In [9]:
data = pd.read_csv('bank_marketing.csv', sep=';')
data.head()

| Unnamed: 0 | age | job | marital | education | default | housing | loan | contact | month | day_of_week | ... | campaign | pdays | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 56 | housemaid | married | basic.4y | no | no | no | telephone | may | mon | ... | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 1 | 57 | services | married | high.school | unknown | no | no | telephone | may | mon | ... | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 2 | 37 | services | married | high.school | no | yes | no | telephone | may | mon | ... | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 3 | 40 | admin. | married | basic.6y | no | no | no | telephone | may | mon | ... | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 4 | 56 | services | married | high.school | no | no | yes | telephone | may | mon | ... | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
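
An `Unnamed: 0` header usually means the CSV carries a saved row index as its first column, although that is an assumption about this file. The sketch below shows one way to read such a column as the index and check the class balance of `y`. It uses a small in-memory stand-in (three rows and three columns copied from the preview) rather than the real `bank_marketing.csv`.

```python
import io

import pandas as pd

# Stand-in for bank_marketing.csv: same ';' separator, an unnamed leading
# column, and a few columns copied from the first rows of the preview.
sample = io.StringIO(
    ";age;job;y\n"
    "0;56;housemaid;no\n"
    "1;57;services;no\n"
    "2;37;services;no\n"
)

# index_col=0 uses the unnamed leading column as the row index instead of
# keeping it as a regular "Unnamed: 0" column.
preview = pd.read_csv(sample, sep=";", index_col=0)

print(preview.shape)
# Share of each target class; worth checking before choosing a metric.
print(preview["y"].value_counts(normalize=True))
```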


## Data Dictionary

- age = age (numeric)
- job = type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')
- marital = marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)
- education = education (categorical: 'basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown')
- default = has credit in default? (categorical: 'no','yes','unknown')
- housing = has housing loan? (categorical: 'no','yes','unknown')
- loan = has personal loan? (categorical: 'no','yes','unknown')
- contact = contact communication type (categorical: 'cellular','telephone') 
- month = last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
- day_of_week = last contact day of the week (categorical: 'mon','tue','wed','thu','fri')
- duration = last contact duration, in seconds (numeric). Important note: this attribute strongly affects the target (e.g., if duration=0 then y='no'). However, the duration is not known before a call is made, and once the call ends y is already known. This input should therefore be included only for benchmarking and dropped from any model intended for realistic prediction.
- campaign = number of contacts performed for this client during this campaign (numeric, includes last contact)
- pdays = number of days since the client was last contacted in a previous campaign (numeric; 999 means the client was not previously contacted)
- previous = number of contacts performed for this client before this campaign (numeric)
- poutcome = outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')
- emp.var.rate = employment variation rate - quarterly indicator (numeric)
- cons.price.idx = consumer price index - monthly indicator (numeric) 
- cons.conf.idx = consumer confidence index - monthly indicator (numeric) 
- euribor3m = euribor 3 month rate - daily indicator (numeric)
- nr.employed = number of employees - quarterly indicator (numeric)
- y = has the client subscribed to a term deposit? (binary: 'yes','no')
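
The notes on `duration`, `pdays`, and `y` suggest a few preprocessing steps, sketched below. The helper `prepare_features`, the flag column `previously_contacted`, and the use of one-hot encoding are assumptions made for illustration; the notebook does not specify its preprocessing.

```python
import pandas as pd


def prepare_features(df):
    """Return (X, y) following the notes in the data dictionary.

    Hypothetical helper: the steps are one reasonable reading of the notes.
    """
    df = df.copy()  # leave the caller's frame untouched

    # duration is only known after the call ends, so drop it for a
    # realistic predictive model (no error if it is already absent).
    df = df.drop(columns=["duration"], errors="ignore")

    # pdays == 999 is a sentinel for "not previously contacted";
    # expose that as an explicit 0/1 flag.
    df["previously_contacted"] = (df["pdays"] != 999).astype(int)

    # Encode the target: 'yes' -> 1, 'no' -> 0.
    y = (df.pop("y") == "yes").astype(int)

    # One-hot encode the remaining categorical (string) columns.
    X = pd.get_dummies(df)
    return X, y
```

Keeping `pdays` alongside the flag lets tree-based models use both; for linear models it may be preferable to replace the 999 sentinel before scaling.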

## Basic Literature

- Moro, S., Laureano, R., & Cortez, P. (2011). Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), *Proceedings of the European Simulation and Modelling Conference*. Paper presented at the 2011 European Simulation and Modelling Conference, Guimaraes, Portugal (117-121). Belgium: EUROSIS.
- Moro, S., Cortez, P., & Rita, P. (2014). A Data-Driven Approach to Predict the Success of Bank Telemarketing. *Decision Support Systems, 62*, 22-31. https://doi.org/10.1016/j.dss.2014.03.001

## Source

https://archive.ics.uci.edu/ml/datasets/Bank+Marketing