<blockquote>
    <h1>Exercise 5.6</h1>
    <p>We continue to consider the use of a logistic regression model to predict the probability of $\mathrm{default}$ using $\mathrm{income}$ and $\mathrm{balance}$ on the <code>Default</code> data set. In particular, we will now compute estimates for the standard errors of the $\mathrm{income}$ and $\mathrm{balance}$ logistic regression coefficients in two different ways: (1) using the bootstrap, and (2) using the standard formula for computing the standard errors in the <code>glm()</code> function. Do not forget to set a random seed before beginning your analysis.</p>
    <ol>
        <li>Using the <code>summary()</code> and <code>glm()</code> functions, determine the estimated standard errors for the coefficients associated with $\mathrm{income}$ and $\mathrm{balance}$ in a multiple logistic regression model that uses both predictors.</li>
        <li>Write a function, <code>boot.fn()</code>, that takes as input the <code>Default</code> data set as well as an index of the observations, and that outputs the coefficient estimates for $\mathrm{income}$ and $\mathrm{balance}$ in the multiple logistic regression model.</li>
        <li>Use the <code>boot()</code> function together with your <code>boot.fn()</code> function to estimate the standard errors of the logistic regression coefficients for $\mathrm{income}$ and $\mathrm{balance}$.</li>
        <li>Comment on the estimated standard errors obtained using the <code>glm()</code> function and using your bootstrap function.</li>
    </ol>
</blockquote>

In [1]:
import pandas as pd
import numpy as np

# https://stackoverflow.com/questions/34398054/ipython-notebook-cell-multiple-outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

from sklearn.utils import resample
import statsmodels.api as sm

<h3>Exercise 5.6.1</h3>
<blockquote>
    <i>Using the <code>summary()</code> and <code>glm()</code> functions, determine the estimated standard errors for the coefficients associated with $\mathrm{income}$ and $\mathrm{balance}$ in a multiple logistic regression model that uses both predictors.</i>
</blockquote>

In [2]:
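df = pd.read_csv("../../DataSets/Default/Default.csv")
# Encode the response as 0/1 so it can be used by sm.Logit.
df['default01'] = np.where(df['default'] == 'Yes', 1, 0)
# sm.Logit does not add an intercept automatically, so add a column of ones.
df.insert(0, 'Intercept', 1)
targetColumn = ['default01']
descriptiveColumns = ['Intercept', 'balance', 'income']
df_X = df[descriptiveColumns]
df_Y = df[targetColumn]
# Fit the multiple logistic regression and display the coefficient table.
model = sm.Logit(df_Y, df_X)
fitted = model.fit()
fitted.summary()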

Optimization terminated successfully.
         Current function value: 0.078948
         Iterations 10


Dep. Variable:      default01          No. Observations:   10000
Model:              Logit              Df Residuals:       9997
Method:             MLE                Df Model:           2
Date:               Sun, 12 Jan 2020   Pseudo R-squ.:      0.4594
Time:               22:24:19           Log-Likelihood:     -789.48
converged:          True               LL-Null:            -1460.3
Covariance Type:    nonrobust          LLR p-value:        4.541e-292

             coef         std err    z          P>|z|    [0.025     0.975]
Intercept    -11.5405     0.435      -26.544    0.000    -12.393    -10.688
balance      0.0056       0.000      24.835     0.000    0.005      0.006
income       2.081e-05    4.99e-06   4.174      0.000    1.1e-05    3.06e-05
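
The `std err` column is the model-based ("standard formula") estimate. For logistic regression this is usually taken to be the square root of the diagonal of $(X^\top W X)^{-1}$, where $W$ is diagonal with entries $\hat{p}_i(1 - \hat{p}_i)$. The following is a minimal NumPy sketch of that formula on synthetic data. The helper `fit_logit`, the simulated predictors, and the coefficient values in `true_beta` are illustrative assumptions, not part of the exercise or of statsmodels.

```python
import numpy as np


def fit_logit(X, y, n_iter=25, tol=1e-10):
    """Hypothetical helper: Newton-Raphson fit of a logistic regression.

    Returns the coefficient estimates and their model-based standard errors.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = p * (1.0 - p)                      # diagonal entries of W
        info = X.T @ (X * w[:, None])          # information matrix X' W X
        step = np.linalg.solve(info, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Recompute the information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, np.sqrt(np.diag(cov))


# Synthetic data (assumption): an intercept plus two standard normal predictors.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
true_beta = np.array([-1.0, 0.8, 0.3])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat, se_hat = fit_logit(X, y)
print(beta_hat)
print(se_hat)
```

On the real data, the same quantities should be available from the fitted statsmodels result as `fitted.params` and `fitted.bse`, which avoids reading rounded values off the summary table.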


<h3>Exercise 5.6.2</h3>
<blockquote>
    <i>Write a function, <code>boot.fn()</code>, that takes as input the <code>Default</code> data set as well as an index of the observations, and that outputs the coefficient estimates for $\mathrm{income}$ and $\mathrm{balance}$ in the multiple logistic regression model.</i>
</blockquote>

In [3]:
def boot_fn(df, mask=None):
    """Fit the logistic regression on df, optionally restricted to the rows
    selected by a boolean mask, and return the coefficient estimates."""
    targetColumn = ['default01']
    descriptiveColumns = ['Intercept', 'balance', 'income']
    if mask is not None:
        # Keep only the rows selected by the boolean mask.
        df = df.loc[mask]
    df_X = df[descriptiveColumns]
    df_Y = df[targetColumn]

    model = sm.Logit(df_Y, df_X)
    fitted = model.fit()
    return fitted.params

# Sanity checks: students only, non-students only, then the full data set.
boot_fn(df=df, mask=df['student'] == 'Yes')
boot_fn(df=df, mask=df['student'] == 'No')
boot_fn(df=df, mask=None)

Optimization terminated successfully.
         Current function value: 0.095624
         Iterations 10


Intercept   -11.507699
balance       0.005598
income        0.000016
dtype: float64

Optimization terminated successfully.
         Current function value: 0.071431
         Iterations 10


Intercept   -10.936996
balance       0.005818
income        0.000002
dtype: float64

Optimization terminated successfully.
         Current function value: 0.078948
         Iterations 10


Intercept   -11.540468
balance       0.005647
income        0.000021
dtype: float64
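
The exercise asks for a function that takes an index of the observations, whereas `boot_fn` above takes a boolean mask. A mask can only keep or drop each row once, so it cannot represent a bootstrap sample in which some rows appear several times. The sketch below contrasts the two on a toy table. The values in `toy` are made up for illustration, and drawing the index with `rng.integers` is one possible choice, not the notebook's method.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Default data (values are made up for illustration).
toy = pd.DataFrame({
    "balance": [100.0, 250.0, 400.0, 900.0, 1500.0],
    "income": [20000.0, 35000.0, 18000.0, 42000.0, 30000.0],
})

rng = np.random.default_rng(1)
n = len(toy)

# Bootstrap index: n row positions drawn with replacement, so repeats are allowed.
idx = rng.integers(0, n, size=n)
boot_sample = toy.iloc[idx]

# Boolean mask: each row is either kept exactly once or dropped.
mask = toy["balance"] > 200
subset = toy[mask]

print(idx)
print(len(boot_sample), len(subset))
```

An index-based variant of `boot_fn` could select `df.iloc[idx]` before fitting, which would mirror the R signature `boot.fn(data, index)` more closely.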

<h3>Exercise 5.6.3</h3>
<blockquote>
    <i>Use the <code>boot()</code> function together with your <code>boot.fn()</code> function to estimate the standard errors of the logistic regression coefficients for $\mathrm{income}$ and $\mathrm{balance}$.</i>
</blockquote>

In [4]:
# Each bootstrap sample has the same number of rows as the original data.
sample_size = int(1.0 * df.shape[0])
# Number of bootstrap replicates.
B = 1000

# One row of coefficient estimates per bootstrap replicate.
df_std_err_coef = pd.DataFrame(np.empty((B, 3)), columns=descriptiveColumns)

for r in range(B):
    # Draw rows with replacement; random_state=r makes each replicate reproducible.
    boot = resample(df, replace=True, n_samples=sample_size, random_state=r)
    df_std_err_coef.iloc[r] = boot_fn(df=boot)

average_intercept = df_std_err_coef["Intercept"].mean()
average_balance = df_std_err_coef["balance"].mean()
average_income = df_std_err_coef["income"].mean()

# Bootstrap standard error: sample standard deviation (divisor B - 1) of the replicates.
std_err_intercept = ((1/(B-1))*np.sum((df_std_err_coef['Intercept'] - average_intercept)**2))**0.5
std_err_balance = ((1/(B-1))*np.sum((df_std_err_coef['balance'] - average_balance)**2))**0.5
std_err_income = ((1/(B-1))*np.sum((df_std_err_coef['income'] - average_income)**2))**0.5

Optimization terminated successfully.
         Current function value: 0.077812
         Iterations 10
[The same convergence message, with a slightly different function value, is printed for each remaining bootstrap fit.]

In [5]:
# Bootstrap standard errors of the coefficients for the intercept, balance,
# and income, respectively.
std_err_intercept, std_err_balance, std_err_income

(0.4453142214860075, 0.00023171377272598035, 4.935330018674955e-06)
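
The hand-written formula used for the bootstrap standard errors is the sample standard deviation with divisor $B - 1$, which pandas computes with `DataFrame.std(ddof=1)`. The sketch below checks that equivalence on a stand-in table. The random values in `coefs` are made up and only mimic the shape of `df_std_err_coef`; they are not the notebook's bootstrap results.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
B = 1000

# Stand-in for df_std_err_coef: B rows of made-up coefficient estimates.
coefs = pd.DataFrame(
    rng.normal(size=(B, 3)) * np.array([0.4, 2e-4, 5e-6]),
    columns=["Intercept", "balance", "income"],
)

# Manual formula, written column-wise in the same form as the notebook cell.
manual = ((1 / (B - 1)) * ((coefs - coefs.mean()) ** 2).sum()) ** 0.5

# Built-in sample standard deviation with the same divisor.
builtin = coefs.std(ddof=1)

print(pd.DataFrame({"manual": manual, "builtin": builtin}))
```

Applied to the notebook's own table, `df_std_err_coef.std(ddof=1)` should therefore give all three standard errors in one call.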

<h3>Exercise 5.6.4</h3>
<blockquote>
    <i>Comment on the estimated standard errors obtained using the <code>glm()</code> function and using your bootstrap function.</i>
</blockquote>

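<p>The bootstrap estimates of the standard errors are very close to the model-based estimates reported by <code>sm.Logit()</code>, which plays the role of <code>glm()</code> here: about 0.445 versus 0.435 for the intercept, and about 4.94e-06 versus 4.99e-06 for $\mathrm{income}$. For $\mathrm{balance}$ the bootstrap gives about 2.32e-04, while the summary table rounds the model-based value to 0.000, so that comparison cannot be read off the table directly. The close agreement suggests that the assumptions behind the standard formula are reasonable for this data set.</p>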