# Bank Marketing

The <b>bank-marketing.csv</b> data relates to direct marketing campaigns of a Portuguese banking institution. The campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (a bank term deposit) would be subscribed to. The ultimate goal is to predict whether the client will subscribe to a term deposit (variable y). This is a classic binary classification problem: the aim is to separate clients who will subscribe from those who won't.

Dataset reference:
- S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM'2011, pp. 117-121, Guimarães, Portugal, October, 2011. EUROSIS.

#### Variable description:

- 1 age (numeric)
- 2 job : type of job (categorical: "admin.","unknown","unemployed","management","housemaid","entrepreneur","student", "blue-collar","self-employed","retired","technician","services") 
- 3 marital : marital status (categorical: "married","divorced","single"; note: "divorced" means divorced or widowed)
- 4 education (categorical: "unknown","secondary","primary","tertiary")
- 5 default: has credit in default? (binary: "yes","no")
- 6 balance: average yearly balance, in euros (numeric) 
- 7 housing: has housing loan? (binary: "yes","no")
- 8 loan: has personal loan? (binary: "yes","no")
   
#### Related to the last contact of the current campaign:
- 9 contact: contact communication type (categorical: "unknown","telephone","cellular") 
- 10 day: last contact day of the month (numeric)
- 11 month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
- 12 duration: last contact duration, in seconds (numeric)

#### Other attributes:
- 13 campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
- 14 pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
- 15 previous: number of contacts performed before this campaign and for this client (numeric)
- 16 poutcome: outcome of the previous marketing campaign (categorical: "unknown","other","failure","success")

Output variable (desired target):
- 17 y - has the client subscribed to a term deposit? (binary: "yes","no")

### Read the dataset and answer the following questions.

In [69]:
import pandas as pd

In [70]:
bank_df=pd.read_csv(r'C:\Users\parma00c\bank-marketing.csv')

In [71]:
bank_df

Unnamed: 0,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
0,30,unemployed,married,primary,no,1787,no,no,cellular,19,oct,79,1,-1,0,unknown,no
1,33,services,married,secondary,no,4789,yes,yes,cellular,11,may,220,1,339,4,failure,no
2,35,management,single,tertiary,no,1350,yes,no,cellular,16,apr,185,1,330,1,failure,no
3,30,management,married,tertiary,no,1476,yes,yes,unknown,3,jun,199,4,-1,0,unknown,no
4,59,blue-collar,married,secondary,no,0,yes,no,unknown,5,may,226,1,-1,0,unknown,no
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4516,33,services,married,secondary,no,-333,yes,no,cellular,30,jul,329,5,-1,0,unknown,no
4517,57,self-employed,married,tertiary,yes,-3313,yes,yes,unknown,9,may,153,1,-1,0,unknown,no
4518,57,technician,married,secondary,no,295,no,no,cellular,19,aug,151,11,-1,0,unknown,no
4519,28,blue-collar,married,secondary,no,1137,no,no,cellular,6,feb,129,4,211,3,other,no


### Question

Extract all column names. Count the number of columns (using code).

In [72]:
list(bank_df.columns)

['age',
 'job',
 'marital',
 'education',
 'default',
 'balance',
 'housing',
 'loan',
 'contact',
 'day',
 'month',
 'duration',
 'campaign',
 'pdays',
 'previous',
 'poutcome',
 'y']
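
The cell above lists the column names but does not count them. A minimal sketch of the count follows, using an empty stand-in frame (the hypothetical `demo_df`) built from the names listed above so that it runs without the CSV file. On the real data the same expression would be applied to `bank_df`.

```python
import pandas as pd

# Column names copied from the list above; the empty frame is only a
# stand-in for bank_df so the snippet runs without the CSV file.
columns = ['age', 'job', 'marital', 'education', 'default', 'balance',
           'housing', 'loan', 'contact', 'day', 'month', 'duration',
           'campaign', 'pdays', 'previous', 'poutcome', 'y']
demo_df = pd.DataFrame(columns=columns)

# len() of the column index gives the number of columns;
# demo_df.shape[1] returns the same value.
n_columns = len(demo_df.columns)
print(n_columns)
```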

### Question

Is data in the correct format? By that we mean, do you see integers and floats where you expect them to be the case? If not, convert them into the correct format. Also make sure that all entries are non-null.

In [73]:
bank_df.dtypes

age           int64
job          object
marital      object
education    object
default      object
balance       int64
housing      object
loan         object
contact      object
day           int64
month        object
duration      int64
campaign      int64
pdays         int64
previous      int64
poutcome     object
y            object
dtype: object
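
The dtypes above show the numeric columns already stored as `int64`, but the cell does not check for missing values. The sketch below shows one way to run that check and, if a numeric column had been read as text, to convert it. The toy frame `demo_df` and its values are hypothetical and only illustrate the calls; they are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy frame standing in for bank_df: 'balance' is stored as
# text and one 'age' is missing, purely to illustrate the checks.
demo_df = pd.DataFrame({
    'age': [30, None, 35],
    'balance': ['1787', '4789', '1350'],
    'job': ['unemployed', 'services', 'management'],
})

# Number of missing values per column, and a single yes/no answer.
null_counts = demo_df.isnull().sum()
any_nulls = bool(demo_df.isnull().values.any())

# Convert a text column to numbers; errors='coerce' turns entries that
# cannot be parsed into NaN instead of raising an error.
demo_df['balance'] = pd.to_numeric(demo_df['balance'], errors='coerce')

print(null_counts)
print(any_nulls)
print(demo_df.dtypes)
```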

### Question

Provide general summary statistics for the entire dataset. The describe() method is what you would want to use.

In [74]:
bank_df.describe()

Unnamed: 0,age,balance,day,duration,campaign,pdays,previous
count,4521.0,4521.0,4521.0,4521.0,4521.0,4521.0,4521.0
mean,41.170095,1422.657819,15.915284,263.961292,2.79363,39.766645,0.542579
std,10.576211,3009.638142,8.247667,259.856633,3.109807,100.121124,1.693562
min,19.0,-3313.0,1.0,4.0,1.0,-1.0,0.0
25%,33.0,69.0,9.0,104.0,1.0,-1.0,0.0
50%,39.0,444.0,16.0,185.0,2.0,-1.0,0.0
75%,49.0,1480.0,21.0,329.0,3.0,-1.0,0.0
max,87.0,71188.0,31.0,3025.0,50.0,871.0,25.0


### Question

The data type of columns like job, marital, education etc. is called categorical data. List all the different categories in the job column.

In [75]:
bank_df.job.unique()

array(['unemployed', 'services', 'management', 'blue-collar',
       'self-employed', 'technician', 'entrepreneur', 'admin', 'student',
       'housemaid', 'retired', 'unknown'], dtype=object)

### Question

In one line of code, provide a count of the number of people who were unemployed and had a housing loan (housing is "yes").

In [76]:
bank_df[(bank_df['job'] == 'unemployed') & (bank_df['housing'] == 'yes')]

Unnamed: 0,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
79,40,unemployed,married,secondary,no,219,yes,no,cellular,17,nov,204,2,196,1,failure,no
152,45,unemployed,divorced,primary,yes,-249,yes,yes,unknown,1,jul,92,1,-1,0,unknown,no
239,51,unemployed,married,tertiary,no,1634,yes,no,cellular,22,jul,168,4,-1,0,unknown,no
307,38,unemployed,married,primary,no,1147,yes,yes,unknown,8,may,249,5,-1,0,unknown,no
505,31,unemployed,married,secondary,no,296,yes,no,unknown,20,may,378,3,-1,0,unknown,no
594,37,unemployed,single,tertiary,yes,0,yes,no,cellular,10,jul,108,1,-1,0,unknown,no
773,33,unemployed,married,secondary,no,2,yes,no,cellular,16,jul,132,1,-1,0,unknown,no
828,47,unemployed,married,primary,no,168,yes,no,telephone,30,jan,66,1,241,1,other,no
1065,30,unemployed,married,tertiary,no,0,yes,no,cellular,18,nov,756,1,-1,0,unknown,no
1075,26,unemployed,married,tertiary,no,454,yes,yes,unknown,23,may,28,18,-1,0,unknown,no


In [77]:
bank_df[(bank_df['job'] == 'unemployed') & (bank_df['housing'] == 'yes')].count()

age          58
job          58
marital      58
education    58
default      58
balance      58
housing      58
loan         58
contact      58
day          58
month        58
duration     58
campaign     58
pdays        58
previous     58
poutcome     58
y            58
dtype: int64
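
`count()` reports the number of non-null entries per column, so the same figure is repeated for every column. If a single number is wanted, one option is to sum the boolean mask directly. This is sketched below on a hypothetical four-row frame (`demo_df` is not the real data).

```python
import pandas as pd

# Hypothetical toy rows standing in for bank_df.
demo_df = pd.DataFrame({
    'job': ['unemployed', 'unemployed', 'services', 'unemployed'],
    'housing': ['yes', 'no', 'yes', 'yes'],
})

# Both conditions combined into one boolean mask; True counts as 1,
# so summing the mask gives the number of matching rows.
n_matches = int(((demo_df['job'] == 'unemployed') & (demo_df['housing'] == 'yes')).sum())
print(n_matches)
```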

### Question

What is the education level of a typical blue-collar worker? Explore the value_counts() method.

In [78]:
bc_df = bank_df[(bank_df['job'] == 'blue-collar')]

In [79]:
#bc_df.set_index(["education", "job"]).count(level="education")
bc_df["education"].value_counts()

secondary    524
primary      369
unknown       41
tertiary      12
Name: education, dtype: int64
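
To turn these counts into an explicit answer, the most frequent label and its share can be read off the counts. The sketch below rebuilds the counts shown above as a Series (the name `education_counts` is introduced here for illustration). On the real data, `bc_df["education"].value_counts()` would supply the same Series.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
education_counts = pd.Series(
    {'secondary': 524, 'primary': 369, 'unknown': 41, 'tertiary': 12},
    name='education',
)

# idxmax() returns the label with the largest count; dividing by the
# total gives that label's share among blue-collar workers.
most_common = education_counts.idxmax()
share = education_counts.max() / education_counts.sum()
print(most_common, round(share, 3))
```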

### Question

How many unemployed clients have an outstanding loan? Is that percentage higher than that of employed clients?

In [80]:
#bank_df["job"].value_counts(normalize=True)
bank_df["job"].value_counts()

management       969
blue-collar      946
technician       768
admin            478
services         417
retired          230
self-employed    183
entrepreneur     168
unemployed       128
housemaid        112
student           84
unknown           38
Name: job, dtype: int64
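
The job counts above do not yet answer the loan question. One possible approach is sketched below on a hypothetical toy frame, so the numbers it prints are not results for this dataset. Which job labels count as "employed" is an assumption made here: `unemployed`, `unknown`, `student` and `retired` are excluded.

```python
import pandas as pd

# Hypothetical toy rows standing in for bank_df.
demo_df = pd.DataFrame({
    'job': ['unemployed', 'unemployed', 'services', 'management', 'technician', 'unknown'],
    'loan': ['yes', 'no', 'yes', 'no', 'no', 'yes'],
})

# Assumption: these labels are treated as "not employed" for the comparison.
not_employed = ['unemployed', 'unknown', 'student', 'retired']

unemployed = demo_df[demo_df['job'] == 'unemployed']
employed = demo_df[~demo_df['job'].isin(not_employed)]

# Count of unemployed clients with a personal loan, then the share of
# loan holders in each group as a percentage.
n_unemployed_with_loan = int((unemployed['loan'] == 'yes').sum())
pct_unemployed = (unemployed['loan'] == 'yes').mean() * 100
pct_employed = (employed['loan'] == 'yes').mean() * 100

print(n_unemployed_with_loan, pct_unemployed, pct_employed)
```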

### Question

What percent of clients subscribed to the term deposit (column y)? 

In [81]:
bank_df["y"].value_counts()

no     4000
yes     521
Name: y, dtype: int64

### Question

What percent of married clients subscribed to the term deposit? Is that more or less than that for single folks? 

In [82]:
bank_df["y"].value_counts(normalize=True)

no     0.88476
yes    0.11524
Name: y, dtype: float64
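
The output above is the overall subscription rate, not a breakdown by marital status. A sketch of the per-group comparison follows, again on a hypothetical toy frame, so the printed values are illustrative only. On the real data the same expression would use `bank_df['y']` and `bank_df['marital']`.

```python
import pandas as pd

# Hypothetical toy rows standing in for bank_df.
demo_df = pd.DataFrame({
    'marital': ['married', 'married', 'married', 'married', 'single', 'single', 'divorced'],
    'y': ['no', 'yes', 'no', 'no', 'yes', 'no', 'no'],
})

# The mean of a boolean column is the fraction of True values, so this
# gives the percentage of 'yes' within each marital status.
subscribed_pct = (demo_df['y'] == 'yes').groupby(demo_df['marital']).mean() * 100
print(subscribed_pct)
```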

### Question

<p>Ask an interesting question of this data set and provide a solution to answer that.</p>
<p>The goal is to help teach your fellow students all possible questions we can collectively ask of this dataset. Your question should be as clear as possible and as short as possible. Try to avoid asking questions that are too trivial or obvious.</p>

<p>This is a bonus point question. The only way not to get full points is if you do one of the following:
<ul>
<li>You do not perform this task.
<li>You do not include a solution. 
</ul>
</p>
<p>If the solution you provide is incorrect, you will still receive full points. But you must make an honest effort to get it right.
</p>

Write your question here.

<b>Clients who are working, have no housing loan (housing is "no"), have a balance > 0 (a higher minimum could be set if needed), and currently have no outstanding personal loan can be candidates for a housing loan.

Additional filters that could be included:
    clients who were not contacted in the last 30 days
    clients who have either a telephone or a cell phone (a known means of contact)

</b>

In [83]:
# Keep clients who are not unemployed, have no housing loan, hold a positive balance, and have no personal loan.
fil1 = bank_df[(bank_df['job'] != 'unemployed') & (bank_df['housing'] == 'no') & (bank_df['balance'] > 0) & (bank_df['loan'] == 'no')]


In [84]:
# The housing, balance and loan conditions already hold for fil1, so only the job filter is repeated from here on.
fil2 = fil1[fil1['job'] != 'unknown']

In [85]:
fil3 = fil2[fil2['job'] != 'student']

In [86]:
fil4 = fil3[fil3['job'] != 'retired']

In [87]:
fil4["job"].value_counts()

management       348
technician       256
blue-collar      173
admin            121
services          81
self-employed     75
entrepreneur      54
housemaid         54
Name: job, dtype: int64
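
The four filtering steps can also be written as a single mask, which may be easier to read and extend. The sketch below uses `isin` to exclude the non-working job labels in one go. The toy frame and the names `excluded_jobs`, `mask` and `candidates` are introduced here for illustration and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy rows standing in for bank_df.
demo_df = pd.DataFrame({
    'job': ['management', 'unemployed', 'student', 'technician', 'retired', 'services'],
    'housing': ['no', 'no', 'no', 'no', 'no', 'yes'],
    'balance': [1200, 500, 300, 0, 900, 700],
    'loan': ['no', 'no', 'no', 'no', 'no', 'no'],
})

# Job labels treated as "not working", matching the filters applied above.
excluded_jobs = ['unemployed', 'unknown', 'student', 'retired']

# All conditions combined into one boolean mask.
mask = (
    ~demo_df['job'].isin(excluded_jobs)
    & (demo_df['housing'] == 'no')
    & (demo_df['balance'] > 0)
    & (demo_df['loan'] == 'no')
)
candidates = demo_df[mask]
job_counts = candidates['job'].value_counts()
print(job_counts)
```

The additional filters mentioned in the question (for example, excluding `contact == 'unknown'`) could be appended to the same mask with further `&` terms.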