## Heteroscedasticity

The homoscedasticity assumptions SLR.5 for the simple linear regression model and MLR.5 for the multiple linear regression model require that the variance of the error term is unrelated to the regressors, i.e.

$$Var(u|x_1, x_2, ..., x_k) = \sigma^2$$

Unbiasedness and consistency (Theorems 3.1, 5.1) do not depend on this assumption, but the sampling distribution (Theorems 3.2, 4.1, 5.2) does. If homoscedasticity is violated, the usual standard errors are invalid, and all inferences from *t*, *F*, and other tests based on them are unreliable. The (asymptotic) efficiency of OLS (Theorems 3.4, 5.3) also depends on homoscedasticity. Generally, homoscedasticity is difficult to justify from theory: different kinds of individuals might have different amounts of unobserved influences in ways that depend on the regressors.
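A small simulation makes this concrete. The sketch below is an illustration only: it uses **numpy** instead of a Wooldridge data set, and the error standard deviation $0.5x$ is an arbitrary choice of ours. The error variance for large values of *x* turns out many times larger than for small values, which is exactly what MLR.5 rules out.

```python
import numpy as np

# Simulated data (an assumption for illustration): the error standard
# deviation is 0.5 * x, so Var(u | x) = 0.25 * x**2 changes with x
rng = np.random.default_rng(42)
n = 10_000
x = rng.uniform(1, 10, size=n)
u = rng.normal(0.0, 0.5 * x)

# Compare the sample variance of u for small and for large values of x
var_low = u[x < 3].var()
var_high = u[x > 8].var()
print(f"Var(u | x < 3) is about {var_low:.2f}")
print(f"Var(u | x > 8) is about {var_high:.2f}")
```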

We cover three topics in this notebook. First, we show how the formula for the estimated variance-covariance matrix can be adjusted so that it does not require homoscedasticity. In this way, we can use OLS to obtain unbiased and consistent parameter estimates and still draw inference from valid standard errors and tests. Next, we present tests for the existence of heteroscedasticity. Finally, we discuss weighted least squares (WLS) as an alternative to OLS. This estimator can be more efficient in the presence of heteroscedasticity.


**Topics:**

1. Heteroscedasticity-Robust Inference  
2. Heteroscedasticity Tests  
3. Weighted Least Squares  

### 1. Heteroscedasticity-Robust Inference

Wooldridge (2019, Section 8.2) presents formulas for heteroscedasticity-robust standard errors. In **statsmodels**, an easy way to do these calculations is to make use of the argument **cov_type** in the method **fit()**. The argument **cov_type** can produce several refined versions of the White formula presented by Wooldridge (2019).

If the regression model obtained by **ols** is stored in the variable **reg**, the variance-covariance matrix can be calculated using

- **reg.fit(cov_type='nonrobust')** or **reg.fit()** for the default homoscedasticity-based standard errors.
- **reg.fit(cov_type='HC0')** for the classical version of White's robust variance-covariance matrix presented by Wooldridge (2019, Equation 8.4 in Section 8.2).
- **reg.fit(cov_type='HC1')** for a version of White's robust variance-covariance matrix with a degrees-of-freedom correction. This is the default behavior of Stata's robust option.
- **reg.fit(cov_type='HC2')** for a version with a small-sample correction based on the leverage of each observation.
- **reg.fit(cov_type='HC3')** for the refined version of White's robust variance-covariance matrix.
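These options differ only in how the squared residuals $\hat{u}_i^2$ enter the "meat" of the sandwich formula $(X'X)^{-1} X'\hat{\Omega}X (X'X)^{-1}$. The following **numpy** sketch uses the textbook weightings: $\hat{u}_i^2$ (HC0), $\frac{n}{n-k-1}\hat{u}_i^2$ (HC1), $\hat{u}_i^2/(1-h_{ii})$ (HC2) and $\hat{u}_i^2/(1-h_{ii})^2$ (HC3), where $h_{ii}$ is the leverage of observation *i*. The helper **robust_se()** and the simulated data are hypothetical and meant for illustration; on real data, the results could be compared with **reg.fit(cov_type=...).bse**.

```python
import numpy as np

def robust_se(X, y, cov_type="HC0"):
    """Hypothetical helper: OLS coefficients and sandwich standard errors.

    X must already contain a column of ones for the intercept.
    """
    n, p = X.shape                          # p = k + 1 parameters
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u2 = (y - X @ beta) ** 2                # squared OLS residuals
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverages h_ii
    if cov_type == "HC0":
        omega = u2
    elif cov_type == "HC1":
        omega = u2 * n / (n - p)            # degrees-of-freedom correction
    elif cov_type == "HC2":
        omega = u2 / (1 - h)
    elif cov_type == "HC3":
        omega = u2 / (1 - h) ** 2
    else:
        raise ValueError(f"unknown cov_type: {cov_type}")
    meat = X.T @ (omega[:, None] * X)       # sum_i omega_i * x_i x_i'
    vcov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(vcov))

# Simulated heteroscedastic data (assumption for illustration)
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(0.0, 0.3 * x)

se = {t: robust_se(X, y, t)[1] for t in ["HC0", "HC1", "HC2", "HC3"]}
for t, s in se.items():
    print(t, np.round(s, 4))
```

Because $0 < 1 - h_{ii} < 1$, the HC2 and HC3 standard errors are never smaller than HC0, and HC1 is simply HC0 scaled by $\sqrt{n/(n-k-1)}$.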

Regression tables with coefficients, standard errors, *t* statistics, and their *p* values are based on the specified method of variance-covariance estimation; the coefficient estimates themselves are the same for every option. To perform *F* tests of a joint hypothesis for an estimated model, the syntax is the same as presented in the previous notebooks.
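Conceptually, a joint test with a robust variance-covariance matrix $\hat{V}$ is a Wald test: with $q$ restrictions $R\beta = r$, the statistic $F = (R\hat{\beta} - r)'(R\hat{V}R')^{-1}(R\hat{\beta} - r)/q$ plugs in whichever $\hat{V}$ was chosen via **cov_type**. The **numpy** sketch below computes this statistic with the HC0 matrix on simulated data. The helper **robust_f_test()** is hypothetical, and we assume, without having checked the **statsmodels** source, that **f_test()** on a robust results object computes the same quantity.

```python
import numpy as np

def robust_f_test(beta, vcov, R, r):
    """Hypothetical helper: Wald-type F statistic for H0: R @ beta = r."""
    R = np.atleast_2d(R)
    diff = R @ beta - r
    W = diff @ np.linalg.solve(R @ vcov @ R.T, diff)
    return W / R.shape[0]                   # divide by the number of restrictions

# Simulated data (assumption): intercept plus three regressors,
# the last two of which have zero true coefficients
rng = np.random.default_rng(1)
n = 300
Z = rng.normal(size=(n, 3))
X = np.column_stack([np.ones(n), Z])
y = 1.0 + 0.8 * Z[:, 0] + rng.normal(0.0, 1.0 + np.abs(Z[:, 0]))

# OLS estimates and the classical White (HC0) variance-covariance matrix
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
u = y - X @ beta
vcov_hc0 = XtX_inv @ (X.T @ (u[:, None] ** 2 * X)) @ XtX_inv

# Joint test of the last two coefficients, analogous to ['black = 0', 'white = 0']
R = np.array([[0, 0, 1, 0],
              [0, 0, 0, 1]])
F_joint = robust_f_test(beta, vcov_hc0, R, np.zeros(2))

# With a single restriction, F equals the squared robust t statistic
F_single = robust_f_test(beta, vcov_hc0, R[:1], np.zeros(1))
t_single = beta[2] / np.sqrt(vcov_hc0[2, 2])
print(f"joint F = {F_joint:.4f}, single F = {F_single:.4f}, t^2 = {t_single ** 2:.4f}")
```

The single-restriction case is a useful sanity check: the robust *F* statistic must equal the square of the robust *t* statistic.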

#### Wooldridge, Example 8.2: Heteroscedasticity-Robust Inference

We use three variants to demonstrate these options. **results_default** and **results_white** use the usual standard errors and the classical White standard errors, respectively; they reproduce the standard errors reported in Wooldridge (2019). **results_refined** additionally uses the refined version (HC3).

In [1]:
# Import dependencies
import wooldridge as woo
import pandas as pd 
import statsmodels.formula.api as smf

In [2]:
# Import data set 'gpa3'
gpa3 = woo.dataWoo('gpa3')

In [3]:
# Define regression model
reg = smf.ols(formula = 'cumgpa ~ sat + hsperc + tothrs + female + black + white',
             data = gpa3, subset = gpa3['spring'] == 1)

In [4]:
# Estimate default model (only for spring data)
results_default = reg.fit()

# Print regression table
table_default = pd.DataFrame({'Betas': round(results_default.params, 4),
                     'Standard Errors': round(results_default.bse, 4),
                     't Statistics': round(results_default.tvalues, 4),
                     'p Value': round(results_default.pvalues, 4)})
print(f'Regression Table (Default SE): \n{table_default}\n')

Regression Table (Default SE): 
            Betas  Standard Errors  t Statistics  p Value
Intercept  1.4701           0.2298        6.3971   0.0000
sat        0.0011           0.0002        6.3885   0.0000
hsperc    -0.0086           0.0012       -6.9060   0.0000
tothrs     0.0025           0.0007        3.4255   0.0007
female     0.3034           0.0590        5.1412   0.0000
black     -0.1283           0.1474       -0.8705   0.3846
white     -0.0587           0.1410       -0.4165   0.6773



In [5]:
# Estimate model with White SE (only for spring data)
results_white = reg.fit(cov_type = 'HC0')

# Print regression table
table_white = pd.DataFrame({'Betas': round(results_white.params, 4),
                     'Standard Errors': round(results_white.bse, 4),
                     't Statistics': round(results_white.tvalues, 4),
                     'p Value': round(results_white.pvalues, 4)})
print(f'Regression Table (Classical White SE): \n{table_white}\n')

Regression Table (Classical White SE): 
            Betas  Standard Errors  t Statistics  p Value
Intercept  1.4701           0.2186        6.7261   0.0000
sat        0.0011           0.0002        6.0136   0.0000
hsperc    -0.0086           0.0014       -6.1001   0.0000
tothrs     0.0025           0.0007        3.4136   0.0006
female     0.3034           0.0586        5.1807   0.0000
black     -0.1283           0.1181       -1.0863   0.2774
white     -0.0587           0.1103       -0.5323   0.5945



In [6]:
# Estimate model with refined White SE (only for spring data)
results_refined = reg.fit(cov_type = 'HC3')

# Print regression table
table_refined = pd.DataFrame({'Betas': round(results_refined.params, 4),
                     'Standard Errors': round(results_refined.bse, 4),
                     't Statistics': round(results_refined.tvalues, 4),
                     'p Value': round(results_refined.pvalues, 4)})
print(f'Regression Table (Refined White SE): \n{table_refined}\n')

Regression Table (Refined White SE): 
            Betas  Standard Errors  t Statistics  p Value
Intercept  1.4701           0.2294        6.4089   0.0000
sat        0.0011           0.0002        5.8402   0.0000
hsperc    -0.0086           0.0014       -5.9341   0.0000
tothrs     0.0025           0.0007        3.3418   0.0008
female     0.3034           0.0600        5.0539   0.0000
black     -0.1283           0.1282       -1.0007   0.3170
white     -0.0587           0.1204       -0.4876   0.6258



For the *F* tests, three versions are calculated and displayed. The results do not differ much between the versions: the *F* statistics range from 0.67 to 0.75 and the *p* values from 0.47 to 0.51. This suggests that heteroscedasticity might not be a big issue in this example. To be sure, we would like to have a formal test, as discussed in the next section.

In [7]:
# Import dependencies
import wooldridge as woo
import statsmodels.formula.api as smf

In [8]:
# Import data set 'gpa3'
gpa3 = woo.dataWoo('gpa3')

In [9]:
# Define regression model
reg = smf.ols(formula = 'cumgpa ~ sat + hsperc + tothrs + female + black + white',
             data = gpa3, subset = gpa3['spring'] == 1)

In [10]:
# Define hypotheses for testing
hypotheses = ['black = 0', 'white = 0']

In [11]:
# F-test using different variance-covariance formulas
# Default VCOV:
results_default = reg.fit()
ftest_default = results_default.f_test(hypotheses)
fstat_default = ftest_default.statistic[0][0]
fpval_default = ftest_default.pvalue
print(f'F Statistic: (Default VCOV) \n{fstat_default}\n')
print(f'F Test p-value: (Default VCOV) \n{fpval_default}\n')

F Statistic: (Default VCOV) 
0.6796041956073353

F Test p-value: (Default VCOV) 
0.5074683622584049



In [12]:
# Classical White VCOV:
results_hc0 = reg.fit(cov_type = 'HC0')
ftest_hc0 = results_hc0.f_test(hypotheses)
fstat_hc0 = ftest_hc0.statistic[0][0]
fpval_hc0 = ftest_hc0.pvalue
print(f'F Statistic: (Classical White VCOV) \n{fstat_hc0}\n')
print(f'F Test p-value: (Classical White VCOV) \n{fpval_hc0}\n')

F Statistic: (Classical White VCOV) 
0.7477969818036153

F Test p-value: (Classical White VCOV) 
0.4741442714738484



In [13]:
# Refined White VCOV:
results_hc3 = reg.fit(cov_type = 'HC3')
ftest_hc3 = results_hc3.f_test(hypotheses)
fstat_hc3 = ftest_hc3.statistic[0][0]
fpval_hc3 = ftest_hc3.pvalue
print(f'F Statistic: (Refined White VCOV) \n{fstat_hc3}\n')
print(f'F Test p-value: (Refined White VCOV) \n{fpval_hc3}\n')

F Statistic: (Refined White VCOV) 
0.6724692957656578

F Test p-value: (Refined White VCOV) 
0.5110883633440992



### 2. Heteroscedasticity Tests

The Breusch-Pagan (BP) test for heteroscedasticity is easy to implement with basic OLS routines. After a model

$$y=\beta_0+\beta_1x_1+...+\beta_kx_k+u$$

is estimated, we obtain the residuals $\hat{u}_i$ for all observations $i = 1, 2, \ldots, n$. We then regress their squared values on all independent variables from the original equation. We can either look at the standard *F* test of overall significance, printed for example by the **summary()** method, or use an *LM* test by multiplying the $R^2$ from this second regression by the number of observations. Under the null hypothesis of homoscedasticity, the *LM* statistic is asymptotically $\chi^2_k$ distributed.

In **statsmodels**, this is easily done. Remember that the residuals from a regression are saved as **resid** in the results object returned by **fit()**. Their squared values can be stored in a new variable and used as the dependent variable in the second stage.

The *LM* version of the BP test is even more convenient to use with the **statsmodels** function **stats.diagnostic.het_breuschpagan()**, which directly computes the test statistic and the corresponding *p* value, as demonstrated in our example. It returns a tuple of four values: the *LM* statistic, its *p* value, the *F* statistic, and its *p* value.
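To make the mechanics of both versions explicit, the following sketch implements the BP test with plain **numpy** on simulated data in which the error standard deviation grows with the first regressor. The helper names **aux_r2()** and **breusch_pagan()** are hypothetical. *p* values could be added with **scipy.stats.chi2.sf(lm, k)** and **scipy.stats.f.sf(f, k, n - k - 1)**.

```python
import numpy as np

def aux_r2(dep, Z):
    """R-squared from an OLS regression of dep on Z (Z must include a constant)."""
    coef, *_ = np.linalg.lstsq(Z, dep, rcond=None)
    resid = dep - Z @ coef
    return 1 - (resid @ resid) / np.sum((dep - dep.mean()) ** 2)

def breusch_pagan(resid, X):
    """Hypothetical helper: return (LM, F, k); X holds the constant and k regressors."""
    n, p = X.shape
    k = p - 1
    r2 = aux_r2(resid ** 2, X)               # second-stage R-squared
    lm = n * r2                              # LM version, chi2 with k df
    f = (r2 / k) / ((1 - r2) / (n - k - 1))  # F version, F(k, n - k - 1)
    return lm, f, k

# Simulated data (assumption): the error sd grows with the first regressor
rng = np.random.default_rng(2)
n = 500
Z = rng.uniform(1, 5, size=(n, 3))
X = np.column_stack([np.ones(n), Z])
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(0.0, Z[:, 0] ** 2)

# First stage: OLS residuals
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

lm, f, k = breusch_pagan(resid, X)
print(f"LM = {lm:.2f} (chi2, {k} df), F = {f:.2f} (F({k}, {n - k - 1}))")
```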

#### Wooldridge, Example 8.4: Heteroscedasticity in a Housing Price Equation

We implement the *F* and *LM* versions of the BP test. The command **stats.diagnostic.het_breuschpagan()** simply takes the regression residuals and the regressor matrix as arguments and delivers a test statistic of *LM* = 14.09. The corresponding *p* value is smaller than 0.003, so we reject homoscedasticity at all reasonable significance levels.

The output also shows the manual implementation of a second-stage regression in which we regress the squared residuals on the independent variables. We can directly interpret the reported *F* statistic of 5.34 and its *p* value of 0.002 as the *F* version of the BP test. We can also calculate the *LM* statistic manually by multiplying the $R^2$ of this regression (about 0.16, available as **results_resid.rsquared**) by the number of observations *n* = 88.
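As a quick check of these numbers: the second-stage $R^2$ is $14.092/88 \approx 0.1601$, and with $k = 3$ regressors both versions of the test statistic follow from it:

$$LM = n \cdot R^2_{\hat{u}^2} = 88 \times 0.1601 \approx 14.09, \qquad F = \frac{R^2_{\hat{u}^2}/k}{(1-R^2_{\hat{u}^2})/(n-k-1)} = \frac{0.1601/3}{0.8399/84} \approx 5.34$$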

Example 8.5 below repeats the test for an alternative model in logarithms discussed by Wooldridge (2019) and adds the White test.

In [14]:
# Import dependencies
import wooldridge as woo
import pandas as pd 
import statsmodels.api as sm
import statsmodels.formula.api as smf
import patsy as pt

In [15]:
# Import the data set 'hprice1'
hprice1 = woo.dataWoo('hprice1')

In [16]:
# Estimate model:
reg = smf.ols(formula = 'price ~ lotsize + sqrft + bdrms',
             data = hprice1)
results = reg.fit()

# Print regression table
table_results = pd.DataFrame({'Betas': round(results.params, 4),
                     'Standard Errors': round(results.bse, 4),
                     't Statistics': round(results.tvalues, 4),
                     'p Value': round(results.pvalues, 4)})
print(f'Regression Table (Default SE): \n{table_results}\n')

Regression Table (Default SE): 
             Betas  Standard Errors  t Statistics  p Value
Intercept -21.7703          29.4750       -0.7386   0.4622
lotsize     0.0021           0.0006        3.2201   0.0018
sqrft       0.1228           0.0132        9.2751   0.0000
bdrms      13.8525           9.0101        1.5374   0.1279



In [17]:
# Automatic BP test (LM version)
y, X = pt.dmatrices('price ~ lotsize + sqrft + bdrms',
                   data = hprice1, return_type = 'dataframe')
result_bp_lm = sm.stats.diagnostic.het_breuschpagan(results.resid, X)
bp_lm_statistic = result_bp_lm[0]
bp_lm_pval = result_bp_lm[1]
print(f'BP Test LM Statistic: {bp_lm_statistic}\n')
print(f'BP Test LM p-value: {bp_lm_pval}\n')

BP Test LM Statistic: 14.092385504350242

BP Test LM p-value: 0.0027820595556890867



In [18]:
# Manual BP test (F version)
hprice1['resid_sq'] = results.resid ** 2
reg_resid = smf.ols(formula = 'resid_sq ~ lotsize + sqrft + bdrms',
                   data = hprice1)
results_resid = reg_resid.fit()
bp_F_statistic = results_resid.fvalue
bp_F_pval = results_resid.f_pvalue
print(f'BP Test F Statistic: {bp_F_statistic}\n')
print(f'BP Test p-value: {bp_F_pval}\n')

BP Test F Statistic: 5.338919363241419

BP Test p-value: 0.0020477444209360787



The White test is a variant of the BP test in which, in the second stage, we do not regress the squared first-stage residuals on the original regressors only. Instead, we add their squares and interactions or, in a special form, use the fitted values $\hat{y}$ and $\hat{y}^2$ as the only regressors. This can easily be done in a manual second-stage regression, remembering that the fitted values are stored in the regression results object as **fittedvalues**.

Conveniently, we can also use the **stats.diagnostic.het_breuschpagan()** command to calculate the *LM* version of the test, including its *p* value, automatically. All we have to do is pass a different set of second-stage regressors.
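The following **numpy** sketch computes the *LM* statistic for both forms of the White test on simulated data: the full form with the regressors, their squares and cross products, and the special form with $\hat{y}$ and $\hat{y}^2$ used in the example below. The helper **lm_stat()** and the data are hypothetical. The degrees of freedom of the $\chi^2$ distribution equal the number of non-constant second-stage regressors, which is 9 for the full form with three regressors and 2 for the special form.

```python
import numpy as np
from itertools import combinations_with_replacement

def lm_stat(resid, Z):
    """Hypothetical helper: LM = n * R^2 of u^2 on Z (Z includes a constant)."""
    u2 = resid ** 2
    coef, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    e = u2 - Z @ coef
    r2 = 1 - (e @ e) / np.sum((u2 - u2.mean()) ** 2)
    return len(u2) * r2, Z.shape[1] - 1      # statistic and its chi2 df

# Simulated data (assumption for illustration)
rng = np.random.default_rng(3)
n = 400
x = rng.uniform(1, 5, size=(n, 3))
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(0.0, x[:, 0])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
fitted = X @ beta

# Full form: regressors plus all squares and cross products
cross = [x[:, i] * x[:, j] for i, j in combinations_with_replacement(range(3), 2)]
Z_full = np.column_stack([X] + cross)
# Special form: constant, fitted values and squared fitted values
Z_special = np.column_stack([np.ones(n), fitted, fitted ** 2])

lm_full, df_full = lm_stat(resid, Z_full)
lm_special, df_special = lm_stat(resid, Z_special)
print(f"full White test:    LM = {lm_full:.2f}, df = {df_full}")
print(f"special White test: LM = {lm_special:.2f}, df = {df_special}")
```

The special form saves degrees of freedom, which matters when the model has many regressors and the full form would need many squares and cross products.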

#### Wooldridge, Example 8.5: BP and White test in the Log Housing Price Equation

We implement the BP and the White tests for a model that now contains logarithms of the dependent variable and of two independent variables. The *LM* versions of both tests do not reject the null hypothesis at conventional significance levels, with *p* values of 0.238 and 0.178, respectively.

In [19]:
# Import dependencies
import wooldridge as woo
import numpy as np
import pandas as pd 
import statsmodels.api as sm
import statsmodels.formula.api as smf
import patsy as pt

In [20]:
# Import data set 'hprice1'
hprice1 = woo.dataWoo('hprice1')

In [21]:
# Estimate regression model
reg = smf.ols(formula = 'np.log(price) ~ np.log(lotsize) + np.log(sqrft) + bdrms',
             data = hprice1)
results = reg.fit()

# Print regression table
table_results = pd.DataFrame({'Betas': round(results.params, 4),
                     'Standard Errors': round(results.bse, 4),
                     't Statistics': round(results.tvalues, 4),
                     'p Value': round(results.pvalues, 4)})
print(f'Regression Table (Default SE): \n{table_results}\n')

Regression Table (Default SE): 
                  Betas  Standard Errors  t Statistics  p Value
Intercept       -1.2970           0.6513       -1.9915   0.0497
np.log(lotsize)  0.1680           0.0383        4.3877   0.0000
np.log(sqrft)    0.7002           0.0929        7.5403   0.0000
bdrms            0.0370           0.0275        1.3424   0.1831



In [22]:
# Automatic BP test (LM version)
y, X_bp = pt.dmatrices('np.log(price) ~ np.log(lotsize) + np.log(sqrft) + bdrms',
                   data = hprice1, return_type = 'dataframe')
result_bp_lm = sm.stats.diagnostic.het_breuschpagan(results.resid, X_bp)
bp_lm_statistic = result_bp_lm[0]
bp_lm_pval = result_bp_lm[1]
print(f'BP Test LM Statistic: {bp_lm_statistic}\n')
print(f'BP Test LM p-value: {bp_lm_pval}\n')

BP Test LM Statistic: 4.223245741805276

BP Test LM p-value: 0.23834482631492962



In [23]:
# White test
X_wh = pd.DataFrame({'const':1, 'fitted_reg': results.fittedvalues,
                    'fitted_reg_sq': results.fittedvalues ** 2})
result_white = sm.stats.diagnostic.het_breuschpagan(results.resid, X_wh)
white_statistic = result_white[0]
white_pval = result_white[1]
print(f'White Test Statistic: {white_statistic}\n')
print(f'White Test p-value: {white_pval}\n')

White Test Statistic: 3.4472865468750546

White Test p-value: 0.17841494794132906



### 3. Weighted Least Squares

Weighted least squares (WLS) attempts to provide a more efficient alternative to OLS. It is a special case of generalized least squares (GLS); when the weights themselves are estimated, it becomes a feasible GLS (FGLS) estimator. Instead of the sum of squared residuals, a weighted sum of squared residuals is minimized. If the weights are inversely proportional to the error variance, the estimator is efficient. In addition, the usual formula for the variance-covariance matrix of the parameter estimates and the standard inference tools are valid.
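The efficiency claim can be illustrated with a small Monte Carlo simulation. In the sketch below (simulated data, not from Wooldridge), the error variance is proportional to *x*, so the weights $1/x$ are inversely proportional to the variance. Both estimators are unbiased, but across replications the WLS slope estimates should vary noticeably less than the OLS estimates. The helper **wls()** is a hypothetical name for a function that solves the weighted normal equations $X'WX\hat{\beta} = X'Wy$.

```python
import numpy as np

def wls(X, y, w):
    """Hypothetical helper: solve the weighted normal equations X'WX b = X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

# Simulated design (assumption): Var(u | x) = x, so efficient weights are 1 / x
rng = np.random.default_rng(4)
n, reps = 100, 2000
x = rng.uniform(1, 20, size=n)            # regressor held fixed across replications
X = np.column_stack([np.ones(n), x])
w = 1 / x

slopes_ols, slopes_wls = [], []
for _ in range(reps):
    y = 2.0 + 0.5 * x + rng.normal(0.0, np.sqrt(x))
    slopes_ols.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    slopes_wls.append(wls(X, y, w)[1])

sd_ols, sd_wls = np.std(slopes_ols), np.std(slopes_wls)
print(f"mean of WLS slope: {np.mean(slopes_wls):.4f} (true value 0.5)")
print(f"sd of OLS slope:   {sd_ols:.4f}")
print(f"sd of WLS slope:   {sd_wls:.4f}")
```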

We can obtain WLS parameter estimates by multiplying each variable in the model, including the constant, by the square root of the weight, as shown by Wooldridge (2019, Section 8.4). In **statsmodels**, it is more convenient to use the argument **weights = ...** of the function **wls()**. This provides a more concise syntax and takes care of the correct residuals, fitted values, predictions, and the like in terms of the original variables. In terms of methods and arguments, **wls()** is very similar to **ols()**.
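The equivalence of the two approaches can be checked numerically. The sketch below uses simulated data with a hypothetical income variable **inc** and compares (a) the solution of the weighted normal equations with (b) OLS on variables multiplied by $\sqrt{w_i}$. Both should give the same coefficients.

```python
import numpy as np

# Simulated data (assumption): a hypothetical income variable inc and
# an error variance proportional to inc
rng = np.random.default_rng(5)
n = 250
inc = rng.uniform(10, 100, size=n)
X = np.column_stack([np.ones(n), inc])
y = -10.0 + 0.8 * inc + rng.normal(0.0, np.sqrt(inc))
w = 1 / inc                                   # WLS weights

# (a) Weighted normal equations: (X'WX) b = X'Wy
b_normal_eq = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

# (b) OLS after multiplying every variable, including the constant, by sqrt(w)
sw = np.sqrt(w)
b_transformed, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

print("normal equations:", np.round(b_normal_eq, 4))
print("transformed OLS: ", np.round(b_transformed, 4))
```

The residuals of the transformed regression in (b) equal $\sqrt{w_i}(y_i - x_i'\hat{\beta})$ rather than residuals in the original units, which is one reason why **wls()** is more convenient.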

#### Wooldridge, Example 8.6: Financial Wealth Equation

In this example, we implement both OLS and WLS estimation for a regression of financial wealth (**nettfa**) on income (**inc**), age (**age**, entering as $(age-25)^2$), gender (**male**), and eligibility for a 401(k) pension plan (**e401k**), using the single-person households (**fsize** == 1) in the data set *401ksubs*. The OLS regression reports the classical White standard errors (HC0). Following Wooldridge (2019), we assume that the variance is proportional to the income variable **inc**. Therefore, the optimal weight is $\frac{1}{inc}$, which is passed as **wls_weight** to the **wls()** call.
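The reason for the weight $\frac{1}{inc}$ can be seen in one line. If $Var(u|inc) = \sigma^2 inc$, dividing the equation by $\sqrt{inc}$ yields a homoscedastic error, so OLS on the transformed equation is WLS with weight $\frac{1}{inc}$:

$$Var\left(\frac{u}{\sqrt{inc}}\,\middle|\,inc\right) = \frac{Var(u|inc)}{inc} = \frac{\sigma^2\, inc}{inc} = \sigma^2$$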

In [24]:
# Import dependencies
import wooldridge as woo
import pandas as pd 
import statsmodels.formula.api as smf

In [25]:
# Import the data set '401ksubs'
k401ksubs = woo.dataWoo('401ksubs')

In [26]:
# Subsetting data
k401ksubs_sub = k401ksubs[k401ksubs['fsize'] == 1]

In [27]:
# OLS (only for singles, i.e. 'fsize' == 1)
reg_ols = smf.ols(formula = 'nettfa ~ inc + I((age-25)**2) + male + e401k',
                 data = k401ksubs_sub)
results_ols = reg_ols.fit(cov_type = 'HC0')

# Print regression table
table_ols = pd.DataFrame({'Betas': round(results_ols.params, 4),
                     'Standard Errors': round(results_ols.bse, 4),
                     't Statistics': round(results_ols.tvalues, 4),
                     'p Value': round(results_ols.pvalues, 4)})
print(f'Regression Table (OLS): \n{table_ols}\n')

Regression Table (OLS): 
                      Betas  Standard Errors  t Statistics  p Value
Intercept          -20.9850           3.4909       -6.0114   0.0000
inc                  0.7706           0.0994        7.7486   0.0000
I((age - 25) ** 2)   0.0251           0.0043        5.7912   0.0000
male                 2.4779           2.0558        1.2053   0.2281
e401k                6.8862           2.2837        3.0153   0.0026



In [28]:
# WLS (only for singles, i.e. 'fsize' == 1)
wls_weight = list(1 / k401ksubs_sub['inc'])
reg_wls = smf.wls(formula = 'nettfa ~ inc + I((age-25)**2) + male + e401k',
                 weights = wls_weight, data = k401ksubs_sub)
results_wls = reg_wls.fit()

# Print regression table
table_wls = pd.DataFrame({'Betas': round(results_wls.params, 4),
                     'Standard Errors': round(results_wls.bse, 4),
                     't Statistics': round(results_wls.tvalues, 4),
                     'p Value': round(results_wls.pvalues, 4)})
print(f'Regression Table (WLS): \n{table_wls}\n')

Regression Table (WLS): 
                      Betas  Standard Errors  t Statistics  p Value
Intercept          -16.7025           1.9580       -8.5304   0.0000
inc                  0.7404           0.0643       11.5140   0.0000
I((age - 25) ** 2)   0.0175           0.0019        9.0796   0.0000
male                 1.8405           1.5636        1.1771   0.2393
e401k                5.1883           1.7034        3.0458   0.0024



We can also use heteroscedasticity-robust statistics to account for the fact that our variance function might be misspecified. Here we repeat the WLS estimation but report both non-robust and robust standard errors and *t* statistics. This replicates Wooldridge (2019, Table 8.2), with the only difference that we use the refined version of the robust SE formula (HC3). No special syntax is needed: the weights are correctly accounted for when **fit(cov_type = 'HC3')** is called on the WLS model.
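The following sketch illustrates why the weights need no special treatment. For illustration, it assumes that the true variance is proportional to $inc^2$ while the weights assume proportionality to *inc*, so the variance function is misspecified. Applying the classical White formula to the $\sqrt{w}$-transformed regressors and residuals gives the same standard errors as the weighted sandwich $(X'WX)^{-1}\left(\sum_i w_i^2\hat{u}_i^2 x_i x_i'\right)(X'WX)^{-1}$. As far as we understand, **statsmodels** works with the transformed (whitened) data in a similar way; the simulated data and variable names here are hypothetical.

```python
import numpy as np

# Simulated data (assumption): the true variance is proportional to inc**2,
# but the WLS weights 1 / inc assume proportionality to inc
rng = np.random.default_rng(6)
n = 300
inc = rng.uniform(10, 100, size=n)
X = np.column_stack([np.ones(n), inc])
y = -10.0 + 0.8 * inc + rng.normal(0.0, 0.1 * inc)
w = 1 / inc

# WLS estimates and residuals in the original units
XtWX_inv = np.linalg.inv(X.T @ (w[:, None] * X))
b = XtWX_inv @ X.T @ (w * y)
u = y - X @ b

# Non-robust WLS standard errors: sigma^2 (X'WX)^{-1}
sigma2 = np.sum(w * u ** 2) / (n - X.shape[1])
se_nonrobust = np.sqrt(np.diag(sigma2 * XtWX_inv))

# (a) Weighted sandwich: (X'WX)^{-1} [sum_i w_i^2 u_i^2 x_i x_i'] (X'WX)^{-1}
meat = X.T @ ((w ** 2 * u ** 2)[:, None] * X)
se_sandwich = np.sqrt(np.diag(XtWX_inv @ meat @ XtWX_inv))

# (b) Classical White (HC0) formula applied to the sqrt(w)-transformed data
sw = np.sqrt(w)
Xs, us = X * sw[:, None], u * sw
Xs_inv = np.linalg.inv(Xs.T @ Xs)
se_transformed = np.sqrt(np.diag(Xs_inv @ (Xs.T @ (us[:, None] ** 2 * Xs)) @ Xs_inv))

print("non-robust:             ", np.round(se_nonrobust, 4))
print("weighted sandwich:      ", np.round(se_sandwich, 4))
print("HC0 on transformed data:", np.round(se_transformed, 4))
```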

In [29]:
# Import dependencies
import wooldridge as woo
import pandas as pd 
import statsmodels.formula.api as smf

In [30]:
# Import the data set '401ksubs'
k401ksubs = woo.dataWoo('401ksubs')

# Subsetting data
k401ksubs_sub = k401ksubs[k401ksubs['fsize'] == 1]

In [31]:
# WLS (only for singles, i.e. 'fsize' == 1)
wls_weight = list(1 / k401ksubs_sub['inc'])
reg_wls = smf.wls(formula = 'nettfa ~ inc + I((age-25)**2) + male + e401k',
                 weights = wls_weight, data = k401ksubs_sub)


In [32]:
# Non-robust (default) results
results_wls = reg_wls.fit()

# Print regression table
table_default = pd.DataFrame({'Betas': round(results_wls.params, 4),
                     'Standard Errors': round(results_wls.bse, 4),
                     't Statistics': round(results_wls.tvalues, 4),
                     'p Value': round(results_wls.pvalues, 4)})
print(f'Regression Table (Non-Robust SE): \n{table_default}\n')

Regression Table (Non-Robust SE): 
                      Betas  Standard Errors  t Statistics  p Value
Intercept          -16.7025           1.9580       -8.5304   0.0000
inc                  0.7404           0.0643       11.5140   0.0000
I((age - 25) ** 2)   0.0175           0.0019        9.0796   0.0000
male                 1.8405           1.5636        1.1771   0.2393
e401k                5.1883           1.7034        3.0458   0.0024



In [33]:
# Robust results (refined White SE)
results_white = reg_wls.fit(cov_type = 'HC3')

# Print regression table
table_white = pd.DataFrame({'Betas': round(results_white.params, 4),
                     'Standard Errors': round(results_white.bse, 4),
                     't Statistics': round(results_white.tvalues, 4),
                     'p Value': round(results_white.pvalues, 4)})
print(f'Regression Table (Refined White SE): \n{table_white}\n')

Regression Table (Refined White SE): 
                      Betas  Standard Errors  t Statistics  p Value
Intercept          -16.7025           2.2482       -7.4292   0.0000
inc                  0.7404           0.0752        9.8403   0.0000
I((age - 25) ** 2)   0.0175           0.0026        6.7650   0.0000
male                 1.8405           1.3132        1.4015   0.1611
e401k                5.1883           1.5743        3.2955   0.0010



The assumption made in Example 8.6 that the variance is proportional to a regressor is usually hard to justify. Typically, we do not know the variance function and have to estimate it. The resulting feasible GLS (FGLS) estimator replaces the (allegedly) known variance function with an estimated one.

We can estimate the relation between the variance and the regressors with a linear regression that uses the log of the squared residuals from an initial OLS regression, $\log(\hat{u}^2)$, as the dependent variable. Wooldridge (2019, Section 8.4) suggests two choices of regressors:

- the regressors $x_1, x_2, \ldots, x_k$ from the original model, similar to the BP test.
- $\hat{y}$ and $\hat{y}^2$ from the original model, similar to the White test.

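As the estimated error variance, we can use $\hat{h} = \exp\left(\widehat{\log(\hat{u}^2)}\right)$, the exponential of the fitted values from this regression. Its inverse $1/\hat{h}$ can then be used as the weight in WLS estimation.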
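The four FGLS steps can be sketched with **numpy** on simulated data whose variance function is exponential in *x*, the form assumed by this procedure. The sketch uses the BP-style choice of regressors; replacing **X** in step 2 with a constant, $\hat{y}$ and $\hat{y}^2$ would give the White-style variant. The helper **ols()** and all data are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Hypothetical helper: OLS coefficients via least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data (assumption): Var(u | x) = exp(0.5 + 0.4 * x)
rng = np.random.default_rng(7)
n = 1000
x = rng.uniform(0, 5, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0.0, np.sqrt(np.exp(0.5 + 0.4 * x)))

# Step 1: initial OLS regression and its residuals
b_ols = ols(X, y)
u_hat = y - X @ b_ols

# Step 2: regress log(u_hat^2) on the regressors (BP-style choice)
g_hat = X @ ols(X, np.log(u_hat ** 2))

# Step 3: estimated variance function h_hat = exp(g_hat) and weights 1 / h_hat
h_hat = np.exp(g_hat)
w = 1 / h_hat

# Step 4: WLS with the estimated weights (as OLS on sqrt(w)-transformed data)
sw = np.sqrt(w)
b_fgls = ols(X * sw[:, None], y * sw)
print("OLS: ", np.round(b_ols, 3))
print("FGLS:", np.round(b_fgls, 3))
```

The intercept of the regression in step 2 is biased because $E[\log(e^2)] \neq 0$ for a standardized error $e$, but this only rescales all weights by the same constant, which leaves the WLS estimates unchanged.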

#### Wooldridge, Example 8.7: Demand for Cigarettes

In this example we study the relationship between daily cigarette consumption **cigs**, individual characteristics, and restaurant smoking restrictions **restaurn**. After the initial OLS regression, a BP test is performed which clearly rejects homoscedasticity (see previous section for the BP test). After the regression of log squared residuals on the regressors, the FGLS weights are calculated and used in the WLS regression. See Wooldridge (2019) for a discussion of the results.

In [34]:
# Import dependencies
import wooldridge as woo
import numpy as np
import pandas as pd 
import statsmodels.api as sm
import statsmodels.formula.api as smf
import patsy as pt

In [35]:
# Import the data set 'smoke'
smoke = woo.dataWoo('smoke')

In [36]:
# OLS regression 
reg_ols = smf.ols(formula = 'cigs ~ np.log(income) + np.log(cigpric) +'
                 'educ + age + I(age**2) + restaurn',
                 data = smoke)
results_ols = reg_ols.fit()

# Print regression table
table_ols = pd.DataFrame({'Betas': round(results_ols.params, 4),
                     'Standard Errors': round(results_ols.bse, 4),
                     't Statistics': round(results_ols.tvalues, 4),
                     'p Value': round(results_ols.pvalues, 4)})
print(f'Regression Table (OLS): \n{table_ols}\n')

Regression Table (OLS): 
                  Betas  Standard Errors  t Statistics  p Value
Intercept       -3.6398          24.0787       -0.1512   0.8799
np.log(income)   0.8803           0.7278        1.2095   0.2268
np.log(cigpric) -0.7509           5.7733       -0.1301   0.8966
educ            -0.5015           0.1671       -3.0016   0.0028
age              0.7707           0.1601        4.8132   0.0000
I(age ** 2)     -0.0090           0.0017       -5.1765   0.0000
restaurn        -2.8251           1.1118       -2.5410   0.0112



In [37]:
# BP test
y, X = pt.dmatrices('cigs ~ np.log(income) + np.log(cigpric) + educ +'
                   'age + I(age**2) + restaurn',
                   data = smoke, return_type = 'dataframe')
result_bp = sm.stats.diagnostic.het_breuschpagan(results_ols.resid, X)
bp_statistic = result_bp[0]
bp_pval = result_bp[1]
print(f'BP Test Statistic: {bp_statistic}\n')
print(f'BP Test p-value: {bp_pval}\n')

BP Test Statistic: 32.25841908120112

BP Test p-value: 1.4557794830279539e-05



In [38]:
# FGLS (estimation of the variance function)
smoke['logu2'] = np.log(results_ols.resid ** 2)
reg_fgls = smf.ols(formula = 'logu2 ~ np.log(income) + np.log(cigpric) +'
                  'educ + age + I(age**2) + restaurn', data = smoke)
results_fgls = reg_fgls.fit()

# Print regression table
table_fgls = pd.DataFrame({'Betas': round(results_fgls.params, 4),
                     'Standard Errors': round(results_fgls.bse, 4),
                     't Statistics': round(results_fgls.tvalues, 4),
                     'p Value': round(results_fgls.pvalues, 4)})
print(f'Regression Table (Variance Function): \n{table_fgls}\n')

Regression Table (Variance Function): 
                  Betas  Standard Errors  t Statistics  p Value
Intercept       -1.9207           2.5630       -0.7494   0.4538
np.log(income)   0.2915           0.0775        3.7634   0.0002
np.log(cigpric)  0.1954           0.6145        0.3180   0.7506
educ            -0.0797           0.0178       -4.4817   0.0000
age              0.2040           0.0170       11.9693   0.0000
I(age ** 2)     -0.0024           0.0002      -12.8931   0.0000
restaurn        -0.6270           0.1183       -5.2982   0.0000



In [39]:
# FGLS (WLS)
wls_weight = list(1 / np.exp(results_fgls.fittedvalues))
reg_wls = smf.wls(formula = 'cigs ~ np.log(income) + np.log(cigpric) + '
                  'educ + age + I(age**2) + restaurn', 
                  weights = wls_weight, data = smoke)
results_wls = reg_wls.fit()

# Print regression table
table_wls = pd.DataFrame({'Betas': round(results_wls.params, 4),
                     'Standard Errors': round(results_wls.bse, 4),
                     't Statistics': round(results_wls.tvalues, 4),
                     'p Value': round(results_wls.pvalues, 4)})
print(f'Regression Table (WLS): \n{table_wls}\n')

Regression Table (WLS): 
                  Betas  Standard Errors  t Statistics  p Value
Intercept        5.6355          17.8031        0.3165   0.7517
np.log(income)   1.2952           0.4370        2.9639   0.0031
np.log(cigpric) -2.9403           4.4601       -0.6592   0.5099
educ            -0.4634           0.1202       -3.8570   0.0001
age              0.4819           0.0968        4.9784   0.0000
I(age ** 2)     -0.0056           0.0009       -5.9897   0.0000
restaurn        -3.4611           0.7955       -4.3508   0.0000

