# Binary Predictors in a Logistic Regression

Using the same code as in the previous exercise, find the odds of 'duration'. 

What do they tell you?

## Import the relevant libraries

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

## Load the data

Load the 'Bank-data.csv' dataset.

In [26]:
data = pd.read_csv('Bank-data.csv')

In [27]:
data = data.drop(['Unnamed: 0'],axis=1)

In [28]:
data['y'] = data['y'].map({
    'yes':1,
    'no' : 0
})

In [30]:
data.describe()

| | interest_rate | credit | march | may | previous | duration | y |
|---|---|---|---|---|---|---|---|
| count | 518.0 | 518.0 | 518.0 | 518.0 | 518.0 | 518.0 | 518.0 |
| mean | 2.835776 | 0.034749 | 0.266409 | 0.388031 | 0.127413 | 382.177606 | 0.5 |
| std | 1.876903 | 0.183321 | 0.442508 | 0.814527 | 0.333758 | 344.29599 | 0.500483 |
| min | 0.635 | 0.0 | 0.0 | 0.0 | 0.0 | 9.0 | 0.0 |
| 25% | 1.04275 | 0.0 | 0.0 | 0.0 | 0.0 | 155.0 | 0.0 |
| 50% | 1.466 | 0.0 | 0.0 | 0.0 | 0.0 | 266.5 | 0.5 |
| 75% | 4.9565 | 0.0 | 1.0 | 0.0 | 0.0 | 482.75 | 1.0 |
| max | 4.97 | 1.0 | 1.0 | 5.0 | 1.0 | 2653.0 | 1.0 |


### Declare the dependent and independent variables

Use 'duration' as the independent variable.

In [40]:
y = data['y']
x = data.drop(['y'],axis=1)

### Simple Logistic Regression

Run the regression.

In [41]:
from sklearn.preprocessing import StandardScaler

In [42]:
scaler = StandardScaler()

In [43]:
scaler.fit(x)

StandardScaler()

In [44]:
x_scaled = scaler.transform(x)

In [45]:
new_x = pd.DataFrame(x_scaled,columns=x.columns.values)
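
As a rough illustration of what the scaling step computes, the sketch below standardizes a small made-up array by hand. The `sample` values are hypothetical and not taken from the dataset. The point is that `StandardScaler` subtracts each column's mean and divides by its population standard deviation (`ddof=0`), whereas `describe()` reports the sample standard deviation (`ddof=1`).

```python
import numpy as np

# Hypothetical duration-like values, for illustration only (not from Bank-data.csv)
sample = np.array([100.0, 200.0, 300.0, 400.0, 500.0])

mean = sample.mean()
pop_std = sample.std(ddof=0)     # population standard deviation, as StandardScaler uses
sample_std = sample.std(ddof=1)  # sample standard deviation, as pandas describe() reports

# Standardized values: mean 0 and population standard deviation 1
z = (sample - mean) / pop_std

print(z)
print(pop_std, sample_std)
```

With 518 observations the two standard deviations differ only slightly, but the distinction matters when converting a coefficient on the scaled variable back to the original units.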

In [46]:
new_x 

| | interest_rate | credit | march | may | previous | duration |
|---|---|---|---|---|---|---|
| 0 | -0.800908 | -0.189737 | 1.659404 | -0.476849 | -0.382123 | -0.770947 |
| 1 | -1.103294 | -0.189737 | -0.602626 | 1.980938 | 2.616961 | -0.314503 |
| 2 | 1.078467 | -0.189737 | 1.659404 | -0.476849 | -0.382123 | -0.625583 |
| 3 | 0.684886 | -0.189737 | -0.602626 | -0.476849 | -0.382123 | 0.883298 |
| 4 | 1.077401 | -0.189737 | 1.659404 | -0.476849 | -0.382123 | -0.654656 |
| ... | ... | ... | ... | ... | ... | ... |
| 513 | -0.800908 | -0.189737 | 1.659404 | -0.476849 | -0.382123 | -0.518013 |
| 514 | -1.053163 | -0.189737 | -0.602626 | 1.980938 | 2.616961 | 1.232173 |
| 515 | -1.043563 | -0.189737 | -0.602626 | -0.476849 | -0.382123 | -0.267987 |
| 516 | -1.044630 | -0.189737 | -0.602626 | 5.667618 | 2.616961 | 0.264047 |


In [47]:
x1 = sm.add_constant(new_x)

In [48]:
x1

| | const | interest_rate | credit | march | may | previous | duration |
|---|---|---|---|---|---|---|---|
| 0 | 1.0 | -0.800908 | -0.189737 | 1.659404 | -0.476849 | -0.382123 | -0.770947 |
| 1 | 1.0 | -1.103294 | -0.189737 | -0.602626 | 1.980938 | 2.616961 | -0.314503 |
| 2 | 1.0 | 1.078467 | -0.189737 | 1.659404 | -0.476849 | -0.382123 | -0.625583 |
| 3 | 1.0 | 0.684886 | -0.189737 | -0.602626 | -0.476849 | -0.382123 | 0.883298 |
| 4 | 1.0 | 1.077401 | -0.189737 | 1.659404 | -0.476849 | -0.382123 | -0.654656 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 513 | 1.0 | -0.800908 | -0.189737 | 1.659404 | -0.476849 | -0.382123 | -0.518013 |
| 514 | 1.0 | -1.053163 | -0.189737 | -0.602626 | 1.980938 | 2.616961 | 1.232173 |
| 515 | 1.0 | -1.043563 | -0.189737 | -0.602626 | -0.476849 | -0.382123 | -0.267987 |
| 516 | 1.0 | -1.044630 | -0.189737 | -0.602626 | 5.667618 | 2.616961 | 0.264047 |


In [50]:
l_reg = sm.Logit(y,x1[['const','duration']]).fit()

Optimization terminated successfully.
         Current function value: 0.546118
         Iterations 7


In [51]:
l_reg.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | y | No. Observations: | 518.0 |
| Model: | Logit | Df Residuals: | 516.0 |
| Method: | MLE | Df Model: | 1.0 |
| Date: | Mon, 10 Aug 2020 | Pseudo R-squ.: | 0.2121 |
| Time: | 21:20:35 | Log-Likelihood: | -282.89 |
| converged: | True | LL-Null: | -359.05 |
| Covariance Type: | nonrobust | LLR p-value: | 5.387e-35 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.2537 | 0.114 | 2.228 | 0.026 | 0.030 | 0.477 |
| duration | 1.7584 | 0.192 | 9.159 | 0.000 | 1.382 | 2.135 |
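
To make the coefficients more tangible, the sketch below plugs the values from the summary table into the logistic function. The helper name `predict_proba` and the chosen z-scores are illustrative assumptions. The inputs are standardized durations, because that is what the model was fit on.

```python
import numpy as np

# Coefficients copied from the summary table above
const = 0.2537
coef_duration = 1.7584  # change in log-odds per one standard deviation of duration

def predict_proba(z):
    """Hypothetical helper: P(y = 1) for a standardized duration z."""
    log_odds = const + coef_duration * np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-log_odds))

# Standardized durations: one SD below the mean, the mean, one SD above the mean
z_values = np.array([-1.0, 0.0, 1.0])
probs = predict_proba(z_values)

print(probs)
```

Under these coefficients the predicted probability of y = 1 rises from roughly 0.18 one standard deviation below the mean duration to roughly 0.88 one standard deviation above it.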


### Find the odds of duration

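The model above was fit on standardized 'duration', so its coefficient (1.7584) is the change in log-odds per standard deviation of 'duration'. Dividing it by the standard deviation of 'duration' (about 344) gives roughly 0.0051 per unit on the original scale, which is the value exponentiated here.

In [13]:
np.exp(0.0051)

1.005113027136717

The odds ratio is about 1.0051: each additional unit of 'duration' multiplies the odds of y = 1 by roughly 1.005, an increase of about 0.5%.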
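
The arithmetic linking the coefficient on standardized 'duration' (1.7584) to the per-unit value 0.0051 can be sketched as follows. The numbers are copied from the outputs above and the variable names are illustrative. The sketch assumes the scaler divided by the population standard deviation (`ddof=0`), which is how `StandardScaler` behaves, while `describe()` reports the sample standard deviation.

```python
import numpy as np

# Values copied from the outputs above
coef_scaled = 1.7584    # coefficient on standardized duration (summary table)
sample_std = 344.29599  # std of duration from data.describe() (ddof=1)
n_obs = 518             # number of observations

# StandardScaler divides by the population std (ddof=0), so convert first
pop_std = sample_std * np.sqrt((n_obs - 1) / n_obs)

coef_per_unit = coef_scaled / pop_std        # log-odds change per unit of duration
odds_ratio_per_unit = np.exp(coef_per_unit)  # odds multiplier per unit of duration
odds_ratio_per_sd = np.exp(coef_scaled)      # odds multiplier per standard deviation

print(round(coef_per_unit, 4))
print(round(odds_ratio_per_unit, 4))
print(round(odds_ratio_per_sd, 2))
```

The per-unit odds ratio comes out at about 1.005, while a full standard deviation increase in 'duration' multiplies the odds by about 5.8. Both describe the same fitted relationship on different scales.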