# Building a Logistic Regression

Create a logistic regression based on the bank data provided. 

The data is based on the marketing campaign efforts of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).

Note that the first column of the dataset is the index.
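Because the first column is only a row index, one way to keep it out of the model is to have pandas use it as the DataFrame index when reading the file. A minimal sketch, assuming the CSV's first header cell is empty (which is what produces an `Unnamed: 0` column when the file is read without `index_col`). The two sample rows are copied from the `head()` output further down, and the helper name `load_bank_data` is hypothetical.

```python
import io

import pandas as pd


def load_bank_data(source) -> pd.DataFrame:
    """Read the bank CSV, treating its first (unnamed) column as the row index."""
    # index_col=0 makes pandas use the first column as the index instead of
    # turning it into an 'Unnamed: 0' feature column.
    return pd.read_csv(source, index_col=0)


# Illustrative sample in the assumed file layout (rows taken from raw_data.head()).
sample_csv = io.StringIO(
    ",interest_rate,credit,march,may,previous,duration,y\n"
    "0,1.334,0.0,1.0,0.0,0.0,117.0,no\n"
    "1,0.767,0.0,0.0,2.0,1.0,274.0,yes\n"
)

df = load_bank_data(sample_csv)
print(df.columns.tolist())
```

With a loader like this, dropping `y` later would leave only the six substantive predictors.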

Source: [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014


## Import the relevant libraries

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

## Load the data

Load the bank dataset (`Bank_data.csv`).

In [2]:
raw_data = pd.read_csv("Bank_data.csv")

We want to know whether the bank's marketing campaign succeeded, i.e. whether each client subscribed. The outcome `y` is stored as the strings `yes` and `no`, so we map it to 1 and 0, because a logistic regression needs a numeric target.

In [3]:
raw_data.head()

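|   | Unnamed: 0 | interest_rate | credit | march | may | previous | duration | y |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1.334 | 0.0 | 1.0 | 0.0 | 0.0 | 117.0 | no |
| 1 | 1 | 0.767 | 0.0 | 0.0 | 2.0 | 1.0 | 274.0 | yes |
| 2 | 2 | 4.858 | 0.0 | 1.0 | 0.0 | 0.0 | 167.0 | no |
| 3 | 3 | 4.12 | 0.0 | 0.0 | 0.0 | 0.0 | 686.0 | yes |
| 4 | 4 | 4.856 | 0.0 | 1.0 | 0.0 | 0.0 | 157.0 | no |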


In [8]:
data = raw_data.copy()
data['y'] = data['y'].map({'yes': 1, 'no': 0})
y = data['y']
data = data.drop(['y'], axis=1)
data.head()

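|   | Unnamed: 0 | interest_rate | credit | march | may | previous | duration |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1.334 | 0.0 | 1.0 | 0.0 | 0.0 | 117.0 |
| 1 | 1 | 0.767 | 0.0 | 0.0 | 2.0 | 1.0 | 274.0 |
| 2 | 2 | 4.858 | 0.0 | 1.0 | 0.0 | 0.0 | 167.0 |
| 3 | 3 | 4.12 | 0.0 | 0.0 | 0.0 | 0.0 | 686.0 |
| 4 | 4 | 4.856 | 0.0 | 1.0 | 0.0 | 0.0 | 157.0 |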
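`Series.map` silently turns any value that is not a key of the dictionary into `NaN`, so a stray label such as `Yes` or `no ` would go unnoticed until model fitting. A slightly more defensive version of the mapping step, as a sketch; the helper name `encode_target` is hypothetical and the labels are the first five `y` values shown above.

```python
import pandas as pd


def encode_target(series: pd.Series) -> pd.Series:
    """Map 'yes'/'no' labels to 1/0 and fail loudly on anything else."""
    encoded = series.map({"yes": 1, "no": 0})
    # map() yields NaN for values that are not dictionary keys (e.g. 'Yes').
    unexpected = series[encoded.isna()].unique()
    if len(unexpected) > 0:
        raise ValueError(f"Unexpected target labels: {list(unexpected)}")
    return encoded.astype(int)


labels = pd.Series(["no", "yes", "no", "yes", "no"])  # first five y values above
print(encode_target(labels).tolist())  # [0, 1, 0, 1, 0]
```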


### Declare the dependent and independent variables

In [10]:
x1 = data[data.columns]  # every remaining column, including the 'Unnamed: 0' index, is a predictor
x1

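|   | Unnamed: 0 | interest_rate | credit | march | may | previous | duration |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 1.334 | 0.0 | 1.0 | 0.0 | 0.0 | 117.0 |
| 1 | 1 | 0.767 | 0.0 | 0.0 | 2.0 | 1.0 | 274.0 |
| 2 | 2 | 4.858 | 0.0 | 1.0 | 0.0 | 0.0 | 167.0 |
| 3 | 3 | 4.120 | 0.0 | 0.0 | 0.0 | 0.0 | 686.0 |
| 4 | 4 | 4.856 | 0.0 | 1.0 | 0.0 | 0.0 | 157.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 513 | 513 | 1.334 | 0.0 | 1.0 | 0.0 | 0.0 | 204.0 |
| 514 | 514 | 0.861 | 0.0 | 0.0 | 2.0 | 1.0 | 806.0 |
| 515 | 515 | 0.879 | 0.0 | 0.0 | 0.0 | 0.0 | 290.0 |
| 516 | 516 | 0.877 | 0.0 | 0.0 | 5.0 | 1.0 | 473.0 |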


### Simple Logistic Regression

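Run the regression and visualize it on a scatter plot (no need to plot the line). Note that `x1` holds all seven remaining columns, so the model fitted below is a multiple (not a simple) logistic regression.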
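With several predictors there is no single x-axis, so one option for the scatter plot is to show the 0/1 outcome against one predictor. The sketch below uses `duration`, which is an assumption (any column of `x1` would do), and the helper `scatter_outcome` is hypothetical. It selects the non-interactive `Agg` backend so it also runs without a display; remove that line inside Jupyter, otherwise the plot will not be shown inline.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def scatter_outcome(x: pd.Series, y: pd.Series, xlabel: str):
    """Scatter a 0/1 outcome against one predictor (no fitted curve)."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.scatter(x, y, color="C0", alpha=0.5)
    ax.set_xlabel(xlabel)
    ax.set_ylabel("Subscribed (y)")
    ax.set_yticks([0, 1])
    return fig, ax


# Illustrative values: duration and y for the first five rows shown above.
duration = pd.Series([117.0, 274.0, 167.0, 686.0, 157.0])
subscribed = pd.Series([0, 1, 0, 1, 0])
fig, ax = scatter_outcome(duration, subscribed, "duration")
```

In the notebook itself, the call would be `scatter_outcome(x1['duration'], y, 'duration')`.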

In [12]:
x = sm.add_constant(x1)
reg_log = sm.Logit(y, x)
result_log = reg_log.fit()

Optimization terminated successfully.
         Current function value: 0.335737
         Iterations 7
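`Logit.fit()` maximizes the log-likelihood numerically (its default `method` is `'newton'`, i.e. Newton-Raphson), and the "Current function value" it prints is the negative log-likelihood averaged over observations: 0.335737 × 518 ≈ 173.91, matching the `Log-Likelihood` in the summary below. The sketch that follows is a from-scratch NumPy illustration of that procedure, not statsmodels' own code; the function name, the stopping tolerance, and the randomly generated toy data are assumptions, and its iteration count need not match the one statsmodels reports.

```python
import numpy as np


def fit_logit_newton(X, y, tol=1e-8, max_iter=35):
    """Maximize the logistic log-likelihood with Newton-Raphson.

    X must already contain a constant column (as sm.add_constant provides).
    Returns (beta, mean_nll, iterations).
    """
    beta = np.zeros(X.shape[1])
    for iteration in range(1, max_iter + 1):
        p = 1.0 / (1.0 + np.exp(-X @ beta))            # predicted probabilities
        gradient = X.T @ (y - p)                       # score vector
        hessian = -(X * (p * (1 - p))[:, None]).T @ X  # Hessian of the log-likelihood
        step = np.linalg.solve(hessian, gradient)
        beta = beta - step                             # Newton update
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    # Average negative log-likelihood, the quantity printed as "Current function value".
    mean_nll = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, mean_nll, iteration


# Toy data (randomly generated, for illustration only).
rng = np.random.default_rng(0)
x_toy = rng.normal(size=2000)
X_toy = np.column_stack([np.ones_like(x_toy), x_toy])
true_beta = np.array([-0.5, 1.2])
y_toy = (rng.random(2000) < 1.0 / (1.0 + np.exp(-X_toy @ true_beta))).astype(float)

beta_hat, mean_nll, iterations = fit_logit_newton(X_toy, y_toy)
print(beta_hat, mean_nll, iterations)
```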


In [13]:
result_log.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | y | No. Observations: | 518 |
| Model: | Logit | Df Residuals: | 510 |
| Method: | MLE | Df Model: | 7 |
| Date: | Thu, 17 Aug 2023 | Pseudo R-squ.: | 0.5156 |
| Time: | 16:10:47 | Log-Likelihood: | -173.91 |
| converged: | True | LL-Null: | -359.05 |
| Covariance Type: | nonrobust | LLR p-value: | 5.602e-76 |
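Several entries in this table are tied together by simple arithmetic, which makes a useful sanity check when reading it. The check below uses only numbers printed above: McFadden's pseudo R² is 1 - LL/LL-Null, the likelihood-ratio statistic behind the LLR p-value is 2(LL - LL-Null), and Df Residuals is the number of observations minus the seven slopes minus the constant.

```python
# Numbers copied from the optimizer output and the summary table above.
n_obs = 518
function_value = 0.335737   # "Current function value" (mean negative log-likelihood)
log_likelihood = -173.91
ll_null = -359.05           # log-likelihood of the intercept-only model
df_model = 7

# The optimizer reports the log-likelihood averaged over observations.
ll_from_optimizer = -function_value * n_obs
# McFadden's pseudo R-squared.
pseudo_r2 = 1 - log_likelihood / ll_null
# Likelihood-ratio statistic (chi-squared with df_model degrees of freedom).
llr = 2 * (log_likelihood - ll_null)
# Residual degrees of freedom: observations minus slopes minus the constant.
df_resid = n_obs - df_model - 1

print(round(ll_from_optimizer, 2), round(pseudo_r2, 4), round(llr, 2), df_resid)
# -173.91 0.5156 370.28 510
```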

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -0.0347 | 0.408 | -0.085 | 0.932 | -0.834 | 0.764 |
| Unnamed: 0 | -0.0004 | 0.001 | -0.461 | 0.645 | -0.002 | 0.001 |
| interest_rate | -0.7790 | 0.092 | -8.463 | 0.000 | -0.959 | -0.599 |
| credit | 2.3701 | 1.091 | 2.172 | 0.030 | 0.231 | 4.509 |
| march | -1.8120 | 0.331 | -5.468 | 0.000 | -2.461 | -1.163 |
| may | 0.1918 | 0.229 | 0.836 | 0.403 | -0.258 | 0.641 |
| previous | 1.2795 | 0.585 | 2.188 | 0.029 | 0.133 | 2.426 |
| duration | 0.0070 | 0.001 | 9.396 | 0.000 | 0.006 | 0.008 |
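Logit coefficients are changes in the log-odds of subscribing, so exponentiating them gives odds ratios: the multiplicative change in the odds per one-unit increase in a predictor, holding the others fixed. The `z` column is `coef / std err`, but statsmodels computes it from unrounded values, so recomputing it from the printed figures is only approximate, and badly so when the printed standard error is tiny (for `duration`, 0.0070 / 0.001 = 7.0 versus the printed 9.396). A short sketch using the printed coefficients; the `Unnamed: 0` row is left out because it is the row index, which its p-value of 0.645 is consistent with.

```python
import math

# (coef, std err) pairs copied from the summary table above.
coefficients = {
    "interest_rate": (-0.7790, 0.092),
    "credit": (2.3701, 1.091),
    "march": (-1.8120, 0.331),
    "may": (0.1918, 0.229),
    "previous": (1.2795, 0.585),
    "duration": (0.0070, 0.001),
}

for name, (coef, se) in coefficients.items():
    odds_ratio = math.exp(coef)  # multiplicative change in the odds per one-unit increase
    z = coef / se                # approximate Wald z from the rounded printed values
    print(f"{name:>13}: odds ratio {odds_ratio:7.3f}, z ≈ {z:6.2f}")
```

For example, each additional unit of `interest_rate` multiplies the odds of subscribing by roughly 0.46, while `credit` multiplies them by roughly 10.7.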
