<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Understanding-Logistic-Regression-Tables" data-toc-modified-id="Understanding-Logistic-Regression-Tables-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Understanding Logistic Regression Tables</a></span><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#More-information-about-the-dataset:" data-toc-modified-id="More-information-about-the-dataset:-1.0.1"><span class="toc-item-num">1.0.1&nbsp;&nbsp;</span>More information about the dataset:</a></span></li></ul></li><li><span><a href="#Import-the-relevant-libraries" data-toc-modified-id="Import-the-relevant-libraries-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Import the relevant libraries</a></span></li><li><span><a href="#Load-the-data" data-toc-modified-id="Load-the-data-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Load the data</a></span><ul class="toc-item"><li><span><a href="#Declare-the-dependent-and-independent-variables" data-toc-modified-id="Declare-the-dependent-and-independent-variables-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Declare the dependent and independent variables</a></span></li><li><span><a href="#Simple-Logistic-Regression" data-toc-modified-id="Simple-Logistic-Regression-1.2.2"><span class="toc-item-num">1.2.2&nbsp;&nbsp;</span>Simple Logistic Regression</a></span></li><li><span><a href="#Interpretation" data-toc-modified-id="Interpretation-1.2.3"><span class="toc-item-num">1.2.3&nbsp;&nbsp;</span>Interpretation</a></span></li></ul></li></ul></li></ul></div>

# Understanding Logistic Regression Tables

Using the same code as in the previous exercise, try to interpret the summary table.

### More information about the dataset: 
Note that <i>interest rate</i> indicates the 3-month interest rate between banks, and <i>duration</i> indicates the time since the last contact was made with a given consumer. The <i>previous</i> variable shows whether the last marketing campaign was successful with this customer. <i>March</i> and <i>May</i> are Boolean variables that account for when the call was made to the specific customer, and <i>credit</i> shows if the customer has enough credit to avoid defaulting.

<i>Notes:</i>
<ul>
    <li><i>the first column of the dataset is an index;</i></li>
    <li><i>you don't need the graph for this exercise;</i></li>
    <li><i>the dataset used is much bigger.</i></li>
</ul>

## Import the relevant libraries

In [1]:
import numpy as np
import pandas as pd
import scipy
import statsmodels.api as sm
import sklearn

# plotting
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set(color_codes=True)

# suppress all warnings (here, mainly to silence seaborn)
import warnings
warnings.filterwarnings('ignore')

# Jupyter notebook display settings for pandas
pd.set_option('display.float_format', '{:,.2f}'.format)  # two decimals, comma as thousands separator
pd.set_option('display.max_rows', 100)  # use None to show all rows
pd.set_option('display.max_colwidth', 100)  # max characters shown per cell

## Load the data

Load the ‘Bank-data.csv’ dataset.

In [2]:
!ls ../data/csv/

Bank-data.csv                        real_estate_price_size.csv           real_estate_price_size_year_view.csv
Example-bank-data.csv                real_estate_price_size_year.csv


In [3]:
df = pd.read_csv('../data/csv/Bank-data.csv',index_col=0)
df.head()

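|   | interest_rate | credit | march | may | previous | duration | y |
|---|---|---|---|---|---|---|---|
| 0 | 1.33 | 0.0 | 1.0 | 0.0 | 0.0 | 117.0 | no |
| 1 | 0.77 | 0.0 | 0.0 | 2.0 | 1.0 | 274.0 | yes |
| 2 | 4.86 | 0.0 | 1.0 | 0.0 | 0.0 | 167.0 | no |
| 3 | 4.12 | 0.0 | 0.0 | 0.0 | 0.0 | 686.0 | yes |
| 4 | 4.86 | 0.0 | 1.0 | 0.0 | 0.0 | 157.0 | no |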


### Declare the dependent and independent variables

Use 'duration' as the independent variable.

In [4]:
df['y'] = df['y'].map({'yes':1, 'no':0})
y = df['y']
x1 = df['duration']
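
One caveat about the encoding step above: `Series.map` with a dictionary silently turns any label that is not a key into `NaN`. That includes re-running the cell, because by then the column holds 1 and 0 rather than 'yes' and 'no'. The sketch below illustrates this on a small made-up series (the `labels` values are hypothetical, not taken from the dataset).

```python
import pandas as pd

# Hypothetical labels, including one unexpected value ('unknown')
labels = pd.Series(['yes', 'no', 'unknown', 'yes'])

# Same mapping as in the cell above
encoded = labels.map({'yes': 1, 'no': 0})

# Labels that are missing from the dictionary come back as NaN
n_unmapped = int(encoded.isna().sum())
print(encoded.tolist())
print(f"unmapped labels: {n_unmapped}")
```

Checking `df['y'].isna().sum()` after the mapping is a cheap way to confirm that every row was encoded before fitting the model.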

### Simple Logistic Regression

Run the regression.

In [5]:
x = sm.add_constant(x1)
results = sm.Logit(y,x).fit()
results.summary()

Optimization terminated successfully.
         Current function value: 0.546118
         Iterations 7


|   |   |   |   |
|---|---|---|---|
| Dep. Variable: | y | No. Observations: | 518 |
| Model: | Logit | Df Residuals: | 516 |
| Method: | MLE | Df Model: | 1 |
| Date: | Thu, 27 Dec 2018 | Pseudo R-squ.: | 0.2121 |
| Time: | 11:20:09 | Log-Likelihood: | -282.89 |
| converged: | True | LL-Null: | -359.05 |
|   |   | LLR p-value: | 5.387e-35 |

|   | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -1.7001 | 0.192 | -8.863 | 0.000 | -2.076 | -1.324 |
| duration | 0.0051 | 0.001 | 9.159 | 0.000 | 0.004 | 0.006 |
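
As a sanity check, the fit statistics in the summary header can be reproduced by hand from the two log-likelihoods. The sketch below hard-codes the rounded values printed in the summary, and its variable names are illustrative. It relies on the identity that the chi-square survival function with one degree of freedom equals `erfc(sqrt(x / 2))`, so only the standard library is needed.

```python
import math

# Values copied from the summary table (rounded to two decimals)
ll_model = -282.89   # Log-Likelihood of the fitted model
ll_null = -359.05    # LL-Null: log-likelihood of the intercept-only model

# McFadden's pseudo R-squared: 1 - LL_model / LL_null
pseudo_r2 = 1 - ll_model / ll_null

# Likelihood-ratio statistic: 2 * (LL_model - LL_null)
llr_stat = 2 * (ll_model - ll_null)

# p-value from a chi-square distribution with Df Model = 1.
# For one degree of freedom the survival function is erfc(sqrt(x / 2)).
llr_pvalue = math.erfc(math.sqrt(llr_stat / 2))

print(f"Pseudo R-squ.: {pseudo_r2:.4f}")
print(f"LLR statistic: {llr_stat:.2f}")
print(f"LLR p-value:   {llr_pvalue:.3e}")
```

Because the inputs are rounded, the p-value agrees with the reported 5.387e-35 only approximately. On the fitted object itself, statsmodels exposes the same quantities as `results.prsquared`, `results.llr` and `results.llr_pvalue`.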


### Interpretation

```
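Method: MLE  (maximum likelihood estimation)
Model:  Logit  (logistic regression)
No. Observations: 518  (the optimizer converged in 7 iterations)
Log-Likelihood: -282.89   always <= 0; closer to zero is better (compare with LL-Null: -359.05)
Pseudo R-squared:  0.21   According to McFadden  0.2 < good Pseudo R-squared < 0.4

LLR p-value:	5.387e-35  very small value: the model with the independent variable (duration)
                           fits significantly better than the intercept-only model.


coef const:    -1.7001  P>|z| = 0.000, significant. We look for a p-value that shows as 0.000.
coef duration:  0.0051  P>|z| = 0.000, significant. Positive sign: a longer duration raises the log-odds of y = 1.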

```
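
The coefficients are on the log-odds scale, which is hard to read directly. A minimal sketch of how they translate into odds and probabilities follows, using the rounded values from the coefficient table. The helper `predicted_probability` is an illustrative name, not part of statsmodels, and the two durations are taken from the first and fourth rows of `df.head()`.

```python
import math

# Coefficients copied from the summary table (rounded)
const = -1.7001
b_duration = 0.0051

def predicted_probability(duration):
    """Probability of y = 1 for a given duration under the fitted logit model."""
    log_odds = const + b_duration * duration
    return 1 / (1 + math.exp(-log_odds))

# Multiplicative change in the odds for one extra unit of duration
odds_ratio = math.exp(b_duration)
print(f"Odds ratio per unit of duration: {odds_ratio:.4f}")

# Predicted probabilities for a short and a long contact
for duration in (117, 686):
    print(f"duration = {duration}: P(y = 1) = {predicted_probability(duration):.3f}")
```

Each extra unit of duration multiplies the odds of y = 1 by about 1.005, and the predicted probability rises from roughly 0.25 at a duration of 117 to roughly 0.86 at 686. With the fitted object, `np.exp(results.params)` and `results.predict(x)` should give the same quantities at full precision.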