# Building a Logistic Regression

Create a logistic regression based on the bank data provided. 

The data is based on the marketing campaign efforts of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).

Note that the first column of the dataset is the index.

Source: [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014


## Import the relevant libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import seaborn as sb
sb.set()

## Load the data

Load the ‘Example-bank-data.csv’ dataset.

In [7]:
raw_data = pd.read_csv('Example-bank-data.csv')
raw_data

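| | Unnamed: 0 | duration | y |
|---|---|---|---|
| 0 | 0 | 117 | no |
| 1 | 1 | 274 | yes |
| 2 | 2 | 167 | no |
| 3 | 3 | 686 | yes |
| 4 | 4 | 157 | no |
| 5 | 5 | 126 | no |
| 6 | 6 | 84 | no |
| 7 | 7 | 17 | no |
| 8 | 8 | 704 | yes |
| 9 | 9 | 185 | no |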


We want to know whether the bank marketing strategy was successful, so we need to transform the outcome variable into 0s and 1s in order to perform a logistic regression.

In [12]:
data = raw_data.copy()
data = data.drop(['Unnamed: 0'], axis = 1)
data['y'] = data['y'].map({'yes':1,'no':0})
data

| | duration | y |
|---|---|---|
| 0 | 117 | 0 |
| 1 | 274 | 1 |
| 2 | 167 | 0 |
| 3 | 686 | 1 |
| 4 | 157 | 0 |
| 5 | 126 | 0 |
| 6 | 84 | 0 |
| 7 | 17 | 0 |
| 8 | 704 | 1 |
| 9 | 185 | 0 |
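
One thing worth checking after a dictionary `map` is whether any label was left unmapped, since `Series.map` returns `NaN` for values that are not keys of the dictionary. The sketch below is an illustration only: `sample` is a hypothetical five-row frame re-typed from the preview above, not the full dataset, and `n_unmapped` is a name introduced here.

```python
import pandas as pd

# Hypothetical mini-frame: the first five rows of the preview, typed by hand.
sample = pd.DataFrame({
    "duration": [117, 274, 167, 686, 157],
    "y": ["no", "yes", "no", "yes", "no"],
})

# Same encoding as in the notebook: 'yes' -> 1, 'no' -> 0.
encoded = sample["y"].map({"yes": 1, "no": 0})

# Labels that are not keys of the dictionary (e.g. 'Yes' or ' no') become NaN,
# so a non-zero count here would signal a problem with the raw labels.
n_unmapped = int(encoded.isna().sum())

print(n_unmapped)
print(encoded.tolist())
```

If `n_unmapped` were greater than zero, the raw labels would need cleaning (for instance stripping whitespace or normalising case) before fitting the model.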


In [13]:
data.describe()

| | duration | y |
|---|---|---|
| count | 518.0 | 518.0 |
| mean | 382.177606 | 0.5 |
| std | 344.29599 | 0.500483 |
| min | 9.0 | 0.0 |
| 25% | 155.0 | 0.0 |
| 50% | 266.5 | 0.5 |
| 75% | 482.75 | 1.0 |
| max | 2653.0 | 1.0 |


### Declare the dependent and independent variables

In [15]:
y = data['y']
x1 = data['duration']

### Simple Logistic Regression

Run the regression and visualize it on a scatter plot (no need to plot the line).

In [17]:
x = sm.add_constant(x1)
my_model = sm.Logit(y,x)
my_fit = my_model.fit()
my_fit.summary()

Optimization terminated successfully.
         Current function value: 0.546118
         Iterations 7


| | | | |
|---|---|---|---|
| Dep. Variable: | y | No. Observations: | 518 |
| Model: | Logit | Df Residuals: | 516 |
| Method: | MLE | Df Model: | 1 |
| Date: | Tue, 18 Dec 2018 | Pseudo R-squ.: | 0.2121 |
| Time: | 21:44:21 | Log-Likelihood: | -282.89 |
| converged: | True | LL-Null: | -359.05 |
| | | LLR p-value: | 5.387e-35 |
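
The Pseudo R-squ. entry reported above is McFadden's pseudo R-squared, which can be reproduced from the two log-likelihoods in the same table. The snippet below is a small arithmetic check rather than part of the original exercise; the variable names are introduced here, and the inputs are the rounded values printed in the summary.

```python
# Values copied from the summary table (already rounded by statsmodels).
log_likelihood = -282.89  # Log-Likelihood of the fitted model
ll_null = -359.05         # LL-Null: log-likelihood of the intercept-only model

# McFadden's pseudo R-squared: 1 - LL(model) / LL(null).
pseudo_r2 = 1 - log_likelihood / ll_null

# Likelihood-ratio statistic behind the LLR p-value: 2 * (LL(model) - LL(null)).
llr_statistic = 2 * (log_likelihood - ll_null)

print(round(pseudo_r2, 4))
print(round(llr_statistic, 2))
```

The first value agrees with the 0.2121 in the table. The LLR p-value comes from comparing the second value against a chi-squared distribution with Df Model = 1 degree of freedom.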

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -1.7001 | 0.192 | -8.863 | 0.000 | -2.076 | -1.324 |
| duration | 0.0051 | 0.001 | 9.159 | 0.000 | 0.004 | 0.006 |
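
To read the coefficients above: the model's log-odds of subscribing are `const + coef * duration`, and the logistic function turns log-odds into a probability. The following is a minimal sketch using the rounded coefficients from the table, so its results are approximate. The helper `subscription_probability` and the constant names are introduced here for illustration; in the notebook itself, `my_fit.predict(x)` gives the fitted probabilities directly.

```python
import math

# Coefficients as rounded in the summary table above.
CONST = -1.7001
BETA_DURATION = 0.0051


def subscription_probability(duration):
    """Predicted probability of y = 1 for a given call duration."""
    log_odds = CONST + BETA_DURATION * duration
    return 1 / (1 + math.exp(-log_odds))


# Multiplicative change in the odds for each additional unit of duration.
odds_ratio_per_unit = math.exp(BETA_DURATION)

# Duration at which the predicted probability crosses 0.5 (log-odds = 0).
break_even_duration = -CONST / BETA_DURATION

for d in (117, 274, 686):  # durations taken from the data preview
    print(d, round(subscription_probability(d), 3))
print(round(break_even_duration, 1))
```

Because the `duration` coefficient is positive, the predicted probability rises with duration, and it passes 0.5 at roughly `break_even_duration`.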


In [None]:
plt.scatter(x1, y)  # duration on the x-axis, 0/1 outcome on the y-axis
plt.show()