# What is the Problem?

## Informal

The classification goal is to predict whether a client will make a particular cash investment with the bank.

## Formal

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

- E - A list of clients with attributes about the clients
- T - Classify if a client will make a particular cash investment with the bank
- P - Kappa (accuracy normalized against chance agreement), computed from how many clients are correctly classified as making or not making this investment (yes/no).
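
To make the performance measure concrete, here is a minimal sketch in plain Python. It assumes "Kappa" refers to Cohen's kappa, computed as `(p_o - p_e) / (1 - p_e)`, where `p_o` is the observed accuracy and `p_e` is the agreement expected by chance. The function name `cohen_kappa` and the toy labels are illustrative and are not taken from the bank data.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: accuracy corrected for agreement expected by chance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    n = len(y_true)
    # Observed agreement: plain accuracy.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: for each label, P(true = label) * P(pred = label).
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[label] * pred_counts[label] for label in true_counts) / (n * n)
    if p_e == 1.0:
        # Both lists contain one and the same class; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Toy, made-up labels: 8 "no" clients and 2 "yes" clients.
y_true = ["no"] * 8 + ["yes"] * 2
always_no = ["no"] * 10                 # never predicts an investment
one_hit = ["no"] * 8 + ["yes", "no"]    # finds one of the two "yes" clients

print(cohen_kappa(y_true, always_no))
print(cohen_kappa(y_true, one_hit))
```

In this toy example, always predicting "no" is 80% accurate yet earns a kappa of 0, while finding one of the two "yes" clients raises kappa well above 0. That gap is the reason to prefer kappa over raw accuracy when one class dominates.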


## Assumptions
- Day of the week could be important; it is probably worth trying different transforms.
- Month may be useful, since certain months might be better. I don't have a guess for which ones.
- A housing loan will be positively correlated with the target. These people have relied on the bank before for loans, and they often have good income. Most people probably have a house, so I'm interested in which people do or do not have housing loans.
- A personal loan indicates that this person probably doesn't have a large amount saved, but they may be looking to gain passive income to pay off debt.
- Those with credit in default may be less willing to invest because they have other commitments to pay off first. They might also be looking to make cash quickly to relieve the stress of a default, so earning passive income may interest them. It could go either way.
- Marital status should have a large effect because I think finances often differ between marital status groups.
- The job title will be a great attribute for indicating disposable income. It might be worth estimating the salary of each job as well, to have an additional attribute that is comparable across jobs.
- Age should be a good indicator of disposable income available to invest.
- Duration should be discarded, since the length of a call is not known until after the call has been made.
- The quarterly indicators will have little effect.
- Contact communication type probably doesn't matter.
- The outcome of the previous marketing campaign will rarely be available but will be useful when there is history. A past success makes another success more likely, and a past failure makes another failure more likely. A past success may also lead to a failure now if the client already holds an investment with the bank.
- The number of previous contacts could be useful; there might be a middling number that does best.
- Days since the client was last contacted in a previous campaign could be useful, but missing values will need to be handled. A middling number might do best.
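
Several of the assumptions above translate directly into preprocessing steps. The sketch below is one possible reading, not a settled pipeline. It assumes the column names `duration`, `pdays`, `month` and `day_of_week`, that `pdays` uses 999 as a sentinel for "not contacted in a previous campaign", and that days are stored as `mon` to `fri`. The helper `prepare_features` and the new column names are hypothetical, and the sine/cosine encoding is only one of the transforms worth trying.

```python
import numpy as np
import pandas as pd

# Assumed category orderings (lowercase three-letter abbreviations).
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
DAYS = ["mon", "tue", "wed", "thu", "fri"]


def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply a few of the assumptions listed above."""
    out = df.copy()

    # Duration is only known once the call is over, so drop it.
    out = out.drop(columns=["duration"], errors="ignore")

    # Assumption: pdays == 999 means "not contacted in a previous campaign".
    out["previously_contacted"] = out["pdays"].ne(999)
    # Replace the sentinel with NaN so it is not read as a real day count.
    out["pdays"] = out["pdays"].where(out["previously_contacted"])

    # One possible calendar transform: a position on a circle,
    # so that December sits next to January.
    month_idx = out["month"].map({m: i for i, m in enumerate(MONTHS)})
    out["month_sin"] = np.sin(2 * np.pi * month_idx / len(MONTHS))
    out["month_cos"] = np.cos(2 * np.pi * month_idx / len(MONTHS))

    # Weekdays as an integer 0..4; one-hot encoding is another option.
    out["day_idx"] = out["day_of_week"].map({d: i for i, d in enumerate(DAYS)})
    return out


# Tiny made-up frame with the same column names, for illustration only.
sample = pd.DataFrame({
    "duration": [261, 149],
    "pdays": [999, 6],
    "month": ["may", "dec"],
    "day_of_week": ["mon", "fri"],
})
features = prepare_features(sample)
print(features)
```

Keeping a separate `previously_contacted` flag means the "middling number might do best" idea can be tested on real day counts only, without the sentinel distorting the scale.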

## Why Does It Need to Be Solved?

### Motivation

We can only reach out to so many people, so we want to reach out to those who are most likely to say yes to our investment opportunity.

### Solution Benefits

This solution would be a large improvement over unprioritized calls to clients.

### How the Solution Will Be Used

The model will rank the predictions, and we will give these rankings to our salespeople so they can reach out to the right people.
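
As a rough sketch of that hand-off, the snippet below ranks clients by a predicted probability of "yes" and keeps only as many as the sales team can call. It assumes the model outputs a probability per client. The scores, the column names `client_id` and `p_yes`, and the capacity `top_k` are all made up for illustration.

```python
import pandas as pd

# Made-up scores standing in for a model's predicted probability of "yes".
scores = pd.DataFrame({
    "client_id": [101, 102, 103, 104, 105],
    "p_yes": [0.12, 0.67, 0.05, 0.67, 0.31],
})


def build_call_list(scores: pd.DataFrame, top_k: int) -> pd.DataFrame:
    """Return the top_k clients, most likely 'yes' first."""
    ranked = scores.sort_values(
        ["p_yes", "client_id"], ascending=[False, True]  # client_id breaks ties
    ).reset_index(drop=True)
    ranked["rank"] = ranked.index + 1
    return ranked.head(top_k)


call_list = build_call_list(scores, top_k=3)
print(call_list)
```

Breaking ties on `client_id` keeps the list reproducible from run to run, which matters if the same ranking is handed to several people.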

## Solving the Problem Manually

The problem would be solved manually by calling clients with a favorable age, credit history, and education. People with a high-paying job may also be given a higher priority.
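
The manual approach can be written down as a simple rule count, which also gives a baseline to compare a model against. This is only a sketch: every threshold and category set below (`FAVORABLE_AGE`, `HIGHER_EDUCATION`, `HIGH_PAYING_JOBS`) is a hypothetical placeholder, and the third client is invented.

```python
# Placeholder rules; every threshold and category set here is hypothetical.
FAVORABLE_AGE = range(30, 61)
HIGHER_EDUCATION = {"university.degree", "professional.course"}
HIGH_PAYING_JOBS = {"management", "entrepreneur"}


def manual_priority(client: dict) -> int:
    """Count how many hand-written rules a client satisfies."""
    score = 0
    score += client["age"] in FAVORABLE_AGE
    score += client["default"] == "no"  # no credit in default
    score += client["education"] in HIGHER_EDUCATION
    score += client["job"] in HIGH_PAYING_JOBS
    return score


clients = [
    {"id": "a", "age": 56, "job": "housemaid", "education": "basic.4y", "default": "no"},
    {"id": "b", "age": 57, "job": "services", "education": "high.school", "default": "unknown"},
    {"id": "c", "age": 45, "job": "management", "education": "university.degree", "default": "no"},
]

# Call the clients who satisfy the most rules first.
call_order = sorted(clients, key=manual_priority, reverse=True)
print([(c["id"], manual_priority(c)) for c in call_order])
```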

# Prepare Data

## Select Data

The work has been done for us: we should include all of the data from the bank-additional-full file, which contains all the attributes and rows available.

# Preprocess Data

## Formatting




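```python
import pandas as pd

# Show every column when a DataFrame is displayed
pd.set_option('display.max_columns', None)

# The file is semicolon-delimited, so override the default comma separator
df = pd.read_csv("data/bank-additional-full.csv", sep=";")
df.head()
```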


|  | age | job | marital | education | default | housing | loan | contact | month | day_of_week | duration | campaign | pdays | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 56 | housemaid | married | basic.4y | no | no | no | telephone | may | mon | 261 | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 1 | 57 | services | married | high.school | unknown | no | no | telephone | may | mon | 149 | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 2 | 37 | services | married | high.school | no | yes | no | telephone | may | mon | 226 | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 3 | 40 | admin. | married | basic.6y | no | no | no | telephone | may | mon | 151 | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
| 4 | 56 | services | married | high.school | no | no | yes | telephone | may | mon | 307 | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.857 | 5191.0 | no |
