# Understanding Logistic Regression Tables

Using the same code as in the previous exercise, try to interpret the summary table.

### More information about the dataset: 
Note that <i>interest rate</i> is the 3-month interbank interest rate and <i>duration</i> is the time since the last contact with a given customer. The <i>previous</i> variable shows whether the last marketing campaign was successful with this customer. <i>March</i> and <i>May</i> are Boolean variables that record when the call was made to the specific customer, and <i>credit</i> shows whether the customer has enough credit to avoid defaulting.

<i> Notes: 
    <li> the first column of the dataset is an index column; </li>
    <li> you don't need the graph for this exercise; </li>
    <li> the dataset used is much bigger </li>
</i>

## Import the relevant libraries

In [1]:
import numpy as np
import pandas as pd 
import statsmodels.api as sm

## Load the data

Load the ‘Bank_data.csv’ dataset.

In [2]:
raw_data= pd.read_csv('Bank_data.csv')


In [3]:
raw_data

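|     | Unnamed: 0 | interest_rate | credit | march | may | previous | duration | y   |
|-----|------------|---------------|--------|-------|-----|----------|----------|-----|
| 0   | 0          | 1.334         | 0.0    | 1.0   | 0.0 | 0.0      | 117.0    | no  |
| 1   | 1          | 0.767         | 0.0    | 0.0   | 2.0 | 1.0      | 274.0    | yes |
| 2   | 2          | 4.858         | 0.0    | 1.0   | 0.0 | 0.0      | 167.0    | no  |
| 3   | 3          | 4.120         | 0.0    | 0.0   | 0.0 | 0.0      | 686.0    | yes |
| 4   | 4          | 4.856         | 0.0    | 1.0   | 0.0 | 0.0      | 157.0    | no  |
| ... | ...        | ...           | ...    | ...   | ... | ...      | ...      | ... |
| 513 | 513        | 1.334         | 0.0    | 1.0   | 0.0 | 0.0      | 204.0    | no  |
| 514 | 514        | 0.861         | 0.0    | 0.0   | 2.0 | 1.0      | 806.0    | yes |
| 515 | 515        | 0.879         | 0.0    | 0.0   | 0.0 | 0.0      | 290.0    | no  |
| 516 | 516        | 0.877         | 0.0    | 0.0   | 5.0 | 1.0      | 473.0    | yes |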


In [5]:
data= raw_data.copy()

In [8]:
# Drop the index column; errors='ignore' keeps the cell safe to re-run
data= data.drop(['Unnamed: 0'], axis= 1, errors='ignore')

In [9]:
data['y']= data['y'].map({'yes':1, 'no':0})

### Declare the dependent and independent variables

Use 'duration' as the independent variable.

In [11]:
x1=data['duration']
y= data['y']

### Simple Logistic Regression

Run the regression.

In [12]:
x=sm.add_constant(x1)
reg_log= sm.Logit(y,x)
results_log= reg_log.fit()

Optimization terminated successfully.
         Current function value: 0.546118
         Iterations 7


##### Interpretation

The dependent variable is 'y'; 'duration' is the independent variable. The model is a Logit regression (logistic, in common parlance) and the method is Maximum Likelihood Estimation (MLE). The optimization converged after 7 iterations on 518 observations. The pseudo R-squared is 0.21, which falls within the "acceptable region" (values between 0.2 and 0.4 are commonly cited as a good fit). The duration variable is significant (its p-value is reported as 0.000) and its coefficient is 0.0051. The constant is also significant and equals -1.70.

In [13]:
results_log.summary()

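|                  |                  |                   |           |
|------------------|------------------|-------------------|-----------|
| Dep. Variable:   | y                | No. Observations: | 518.0     |
| Model:           | Logit            | Df Residuals:     | 516.0     |
| Method:          | MLE              | Df Model:         | 1.0       |
| Date:            | Sun, 04 Jul 2021 | Pseudo R-squ.:    | 0.2121    |
| Time:            | 12:05:13         | Log-Likelihood:   | -282.89   |
| converged:       | True             | LL-Null:          | -359.05   |
| Covariance Type: | nonrobust        | LLR p-value:      | 5.387e-35 |

|          | coef    | std err | z      | P>\|z\| | [0.025 | 0.975] |
|----------|---------|---------|--------|---------|--------|--------|
| const    | -1.7001 | 0.192   | -8.863 | 0.000   | -2.076 | -1.324 |
| duration | 0.0051  | 0.001   | 9.159  | 0.000   | 0.004  | 0.006  |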
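Two numbers in this summary can be checked by hand. The sketch below assumes the values as printed in the table (so they are rounded) and uses McFadden's definition of the pseudo R-squared, which is the one statsmodels reports. It also converts the duration coefficient into an odds ratio, which is often easier to read than the raw log-odds coefficient. The variable names are illustrative only.

```python
import math

# Values copied from the summary table (rounded as printed)
ll_model = -282.89      # Log-Likelihood of the fitted model
ll_null = -359.05       # LL-Null: log-likelihood of the intercept-only model
coef_duration = 0.0051  # coefficient of 'duration' (change in log-odds per unit)

# McFadden's pseudo R-squared: 1 - LL_model / LL_null
pseudo_r2 = 1 - ll_model / ll_null

# Odds ratio: multiplicative change in the odds of y = 1 per extra unit of duration
odds_ratio = math.exp(coef_duration)

print(round(pseudo_r2, 4))
print(round(odds_ratio, 4))
```

Under these rounded inputs, the pseudo R-squared agrees with the 0.2121 in the table, and the odds ratio is only slightly above 1: each additional unit of duration raises the odds of a "yes" by roughly half a percent.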


In [16]:
np.set_printoptions(formatter={'float': lambda x: "{0:0.2f}".format(x)})
results_log.predict()

array([0.25, 0.43, 0.30, 0.86, 0.29, 0.26, 0.22, 0.17, 0.87, 0.32, 0.55,
       0.23, 0.30, 0.39, 0.35, 0.49, 0.29, 0.25, 0.51, 0.83, 0.84, 0.59,
       0.46, 0.40, 0.45, 0.21, 0.42, 0.42, 0.58, 0.26, 0.18, 0.26, 0.86,
       0.43, 0.29, 0.33, 0.98, 0.22, 1.00, 0.63, 0.26, 0.49, 0.31, 0.28,
       0.29, 0.25, 0.20, 0.30, 0.31, 0.47, 0.87, 0.20, 0.22, 0.29, 0.26,
       0.49, 0.28, 0.20, 0.89, 0.54, 0.24, 0.25, 0.90, 0.40, 0.51, 0.46,
       0.28, 0.30, 0.87, 0.24, 0.18, 0.37, 0.47, 0.38, 0.54, 0.26, 0.41,
       0.57, 0.29, 0.21, 0.34, 0.43, 0.30, 0.46, 0.41, 0.97, 0.30, 0.53,
       0.40, 0.23, 0.36, 0.22, 0.37, 0.32, 0.77, 0.81, 0.80, 0.20, 0.98,
       0.30, 0.93, 0.34, 0.89, 0.41, 0.50, 0.50, 0.42, 0.35, 0.42, 0.52,
       0.54, 0.25, 0.31, 0.21, 0.74, 0.18, 0.25, 0.88, 0.60, 0.48, 0.23,
       0.17, 0.40, 0.99, 0.48, 0.96, 0.36, 0.59, 0.34, 0.42, 0.29, 0.33,
       0.48, 0.90, 0.46, 0.68, 0.57, 0.43, 0.43, 0.31, 0.26, 0.23, 0.84,
       0.44, 0.84, 0.65, 0.38, 0.26, 1.00, 0.26, 0.
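Each value returned by `predict()` is the logistic function applied to the linear predictor. As a rough check, the sketch below recomputes the first prediction by hand, assuming the rounded coefficients from the summary table (-1.7001 and 0.0051) and the duration of the first row (117.0). The helper name `predict_probability` is hypothetical, and because the coefficients are rounded, hand-computed values can differ from the model's output in the second decimal.

```python
import math

def predict_probability(duration, const=-1.7001, coef=0.0051):
    """Apply the logistic function to the linear predictor const + coef * duration."""
    linear_predictor = const + coef * duration
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Duration of the first observation in the dataset
p_first = predict_probability(117.0)
print(round(p_first, 2))
```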

In [17]:
results_log.pred_table()

array([[204.00, 55.00],
       [104.00, 155.00]])
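In the table returned by `pred_table()`, rows are the actual classes and columns are the predicted classes, with a default threshold of 0.5 on the predicted probabilities. The following is a minimal sketch of that counting logic in plain Python, assuming probabilities strictly above the threshold are labeled 1. The function name and the toy data are hypothetical and are not taken from `Bank_data.csv`.

```python
def confusion_matrix(actual, probabilities, threshold=0.5):
    """Count outcomes; rows are actual classes (0, 1), columns are predicted classes (0, 1)."""
    table = [[0, 0], [0, 0]]
    for y_true, p in zip(actual, probabilities):
        y_pred = 1 if p > threshold else 0
        table[y_true][y_pred] += 1
    return table

# Hypothetical toy data for illustration only
toy_actual = [0, 0, 1, 1, 1]
toy_probs = [0.25, 0.60, 0.43, 0.86, 0.98]

toy_table = confusion_matrix(toy_actual, toy_probs)
print(toy_table)
```

Raising or lowering `threshold` moves observations between the two columns, which trades false positives against false negatives.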

In [21]:
mat_conf= pd.DataFrame(results_log.pred_table())
mat_conf.columns= ['Predicted 0', 'Predicted 1']
mat_conf= mat_conf.rename(index= {0:'Actual 0', 1: 'Actual 1'})
mat_conf

|          | Predicted 0 | Predicted 1 |
|----------|-------------|-------------|
| Actual 0 | 204.0       | 55.0        |
| Actual 1 | 104.0       | 155.0       |


In [24]:
# Accuracy: correctly classified observations (204 + 155) over all 518
accuracy= 359/518
accuracy

0.693050193050193
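The ratio 359/518 is the share of correct predictions, which is usually called accuracy. Precision, in the classification-metrics sense, is a different quantity, as is recall. The sketch below derives all three from the confusion matrix of the one-variable model, assuming the usual layout (rows are actual classes, columns are predicted classes). The names `tn`, `fp`, `fn` and `tp` are introduced here for illustration.

```python
# Confusion matrix of the one-variable model: rows = actual, columns = predicted
tn, fp = 204, 55    # actual 0: predicted 0, predicted 1
fn, tp = 104, 155   # actual 1: predicted 0, predicted 1

total = tn + fp + fn + tp
accuracy = (tn + tp) / total   # share of all predictions that are correct
precision = tp / (tp + fp)     # share of predicted 1s that are actually 1
recall = tp / (tp + fn)        # share of actual 1s that the model identifies

print(round(accuracy, 4), round(precision, 4), round(recall, 4))
```

The recall of about 0.60 shows that this model misses a large share of the actual "yes" outcomes, something a single accuracy figure does not reveal.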

## Expanding the model

In [26]:
# To avoid retyping them each time, store the names of the model's estimators in a list.
estimators=['interest_rate','march','credit','previous','duration']

X1 = data[estimators]
y = data['y']

In [27]:
X = sm.add_constant(X1)
reg_logit = sm.Logit(y,X)
results_logit = reg_logit.fit()
results_logit.summary2()

Optimization terminated successfully.
         Current function value: 0.336664
         Iterations 7


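|                     |                  |                   |            |
|---------------------|------------------|-------------------|------------|
| Model:              | Logit            | Pseudo R-squared: | 0.514      |
| Dependent Variable: | y                | AIC:              | 360.7836   |
| Date:               | 2021-07-04 19:40 | BIC:              | 386.2834   |
| No. Observations:   | 518              | Log-Likelihood:   | -174.39    |
| Df Model:           | 5                | LL-Null:          | -359.05    |
| Df Residuals:       | 512              | LLR p-value:      | 1.2114e-77 |
| Converged:          | 1.0000           | Scale:            | 1.0        |
| No. Iterations:     | 7.0000           |                   |            |

|               | Coef.   | Std.Err. | z       | P>\|z\| | [0.025  | 0.975]  |
|---------------|---------|----------|---------|---------|---------|---------|
| const         | -0.0211 | 0.3113   | -0.0677 | 0.9460  | -0.6313 | 0.5891  |
| interest_rate | -0.8001 | 0.0895   | -8.9434 | 0.0000  | -0.9755 | -0.6248 |
| march         | -1.8322 | 0.3297   | -5.5563 | 0.0000  | -2.4785 | -1.1859 |
| credit        | 2.3585  | 1.0875   | 2.1688  | 0.0301  | 0.2271  | 4.4900  |
| previous      | 1.5363  | 0.5010   | 3.0666  | 0.0022  | 0.5544  | 2.5182  |
| duration      | 0.0070  | 0.0007   | 9.3810  | 0.0000  | 0.0055  | 0.0084  |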
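The AIC and BIC in this summary follow from the log-likelihood. As a sketch, assuming the standard definitions AIC = 2k - 2·LL and BIC = k·ln(n) - 2·LL, with k = 6 estimated parameters (five estimators plus the constant) and the rounded log-likelihood printed above:

```python
import math

log_likelihood = -174.39  # Log-Likelihood from the summary2() output (rounded)
n_obs = 518               # No. Observations
n_params = 6              # five estimators plus the constant

# Both criteria penalize the fit (-2 * LL) by the number of parameters
aic = 2 * n_params - 2 * log_likelihood
bic = n_params * math.log(n_obs) - 2 * log_likelihood

print(round(aic, 2), round(bic, 2))
```

Both results agree with the reported 360.7836 and 386.2834 up to the rounding of the log-likelihood. Lower values indicate a better trade-off between fit and complexity when comparing models fitted to the same data.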


In [28]:
results_logit.pred_table()

array([[218.00, 41.00],
       [30.00, 229.00]])

In [29]:
mat_conf= pd.DataFrame(results_logit.pred_table())
mat_conf.columns= ['Predicted 0', 'Predicted 1']
mat_conf= mat_conf.rename(index= {0:'Actual 0', 1: 'Actual 1'})
mat_conf

|          | Predicted 0 | Predicted 1 |
|----------|-------------|-------------|
| Actual 0 | 218.0       | 41.0        |
| Actual 1 | 30.0        | 229.0       |


#### We can see that the model improves

In [31]:
# Accuracy: correctly classified observations (218 + 229) over all 518
accuracy= (218+229)/518
accuracy

0.862934362934363
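Accuracy is one way to see the improvement. Because the one-variable model (duration only) is nested in the five-variable model and both are fitted to the same 518 observations, a likelihood-ratio comparison is another. The sketch below uses the rounded log-likelihoods from the two summaries and the closed-form chi-squared tail probability for 4 degrees of freedom, so no extra library is needed. Treat it as an illustration under those assumptions rather than part of the original exercise.

```python
import math

ll_simple = -282.89    # log-likelihood of the duration-only model
ll_expanded = -174.39  # log-likelihood of the five-variable model
extra_params = 4       # interest_rate, march, credit, previous

# Likelihood-ratio statistic: twice the gain in log-likelihood
lr_stat = 2 * (ll_expanded - ll_simple)

# Chi-squared survival function for 4 degrees of freedom:
# P(X > x) = exp(-x / 2) * (1 + x / 2)
p_value = math.exp(-lr_stat / 2) * (1 + lr_stat / 2)

print(round(lr_stat, 2), p_value)
```

The p-value is vanishingly small, which is consistent with the jump in accuracy: the four added variables improve the fit far beyond what chance alone would produce.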

In [33]:
# Correct predictions minus total observations: 71 cases (41 + 30) are misclassified
(229+218)-518

-71