# The Frisch-Waugh-Lovell Theorem
**Author:** Ian Ho | **Date:** Sep 13, 2023 | **Python Version:** 3.10.9

In this file, I use a real-world data set to check numerically whether the FWL Theorem holds. To run the code below, first install the Python package `wooldridge` by running the following command at your Anaconda prompt.
```
pip install wooldridge
```
This package contains 111 data sets from *Introductory Econometrics: A Modern Approach* (7th edition, 2019) by Jeffrey M. Wooldridge.

In [1]:
import pandas as pd                    # for data handling
import numpy as np                     # for numerical methods and data structures
import statsmodels.formula.api as smf  # provides a way to directly specify models from formulas
import wooldridge as woo               # for data sets in Wooldridge (2019)

In [2]:
wage = woo.data('wage1')
wage

Unnamed: 0,wage,educ,exper,tenure,nonwhite,female,married,numdep,smsa,northcen,...,trcommpu,trade,services,profserv,profocc,clerocc,servocc,lwage,expersq,tenursq
0,3.10,11,2,0,0,1,0,2,1,0,...,0,0,0,0,0,0,0,1.131402,4,0
1,3.24,12,22,2,0,1,1,3,1,0,...,0,0,1,0,0,0,1,1.175573,484,4
2,3.00,11,2,0,0,0,0,2,0,0,...,0,1,0,0,0,0,0,1.098612,4,0
3,6.00,8,44,28,0,0,1,0,1,0,...,0,0,0,0,0,1,0,1.791759,1936,784
4,5.30,12,7,2,0,0,1,1,0,0,...,0,0,0,0,0,0,0,1.667707,49,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
521,15.00,16,14,2,0,1,1,2,0,0,...,0,0,0,1,1,0,0,2.708050,196,4
522,2.27,10,2,0,0,1,0,3,0,0,...,0,1,0,0,1,0,0,0.819780,4,0
523,4.67,15,13,18,0,0,1,3,0,0,...,0,0,0,0,1,0,0,1.541159,169,324
524,11.56,16,5,1,0,0,1,0,0,0,...,0,0,0,0,0,0,0,2.447551,25,1


In [3]:
# See the description of variables in this data set
woo.data('wage1', description=True)

name of dataset: wage1
no of variables: 24
no of observations: 526

+----------+---------------------------------+
| variable | label                           |
+----------+---------------------------------+
| wage     | average hourly earnings         |
| educ     | years of education              |
| exper    | years potential experience      |
| tenure   | years with current employer     |
| nonwhite | =1 if nonwhite                  |
| female   | =1 if female                    |
| married  | =1 if married                   |
| numdep   | number of dependents            |
| smsa     | =1 if live in SMSA              |
| northcen | =1 if live in north central U.S |
| south    | =1 if live in southern region   |
| west     | =1 if live in western region    |
| construc | =1 if work in construc. indus.  |
| ndurman  | =1 if in nondur. manuf. indus.  |
| trcommpu | =1 if in trans, commun, pub ut  |
| trade    | =1 if in wholesale or retail    |
| services | =1 if in services indus.  

## The OLS Model
I'm going to run the OLS model
$${\rm wage} = \beta_0 + \beta_1 \cdot {\rm educ} + \beta_2 \cdot {\rm exper} + \beta_3 \cdot {\rm female} + \beta_4 \cdot {\rm married} + e$$
on the subsample of white respondents (`nonwhite == 0`). I'll focus on the coefficient of education ($\beta_1$) to check whether the FWL Theorem holds.
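
For reference, here is a matrix statement of the result being checked. The notation is my own and is not used elsewhere in this file: let $y$ be the vector of `wage`, $X_1$ the column of `educ`, and $X_2$ the matrix holding a constant and the controls `exper`, `female`, and `married`. Then the OLS coefficient of education can be written as
$$M_2 = I - X_2 (X_2' X_2)^{-1} X_2', \qquad \hat\beta_1 = (X_1' M_2 X_1)^{-1} X_1' M_2 \, y$$
Here $M_2 y$ and $M_2 X_1$ are the residuals from regressing `wage` and `educ` on the controls. Because $M_2$ is symmetric and idempotent, $X_1' M_2 y = (M_2 X_1)'(M_2 y)$, so regressing residuals on residuals gives the same $\hat\beta_1$.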

In [4]:
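# Select a subset of the columns; .copy() makes df an independent DataFrame,
# so adding residual columns later does not touch (or warn about) `wage`
df = wage[['wage', 'educ', 'exper', 'nonwhite', 'female', 'married']].copy()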
df

Unnamed: 0,wage,educ,exper,nonwhite,female,married
0,3.10,11,2,0,1,0
1,3.24,12,22,0,1,1
2,3.00,11,2,0,0,0
3,6.00,8,44,0,0,1
4,5.30,12,7,0,0,1
...,...,...,...,...,...,...
521,15.00,16,14,0,1,1
522,2.27,10,2,0,1,0
523,4.67,15,13,0,0,1
524,11.56,16,5,0,0,1


In [5]:
# Run the OLS model
whole_model = smf.ols('wage ~ educ + exper + female + married', data=df[df['nonwhite']==0]).fit()
print(whole_model.summary())

                            OLS Regression Results                            
Dep. Variable:                   wage   R-squared:                       0.316
Model:                            OLS   Adj. R-squared:                  0.310
Method:                 Least Squares   F-statistic:                     53.90
Date:                Fri, 08 Sep 2023   Prob (F-statistic):           2.42e-37
Time:                        14:37:42   Log-Likelihood:                -1203.5
No. Observations:                 472   AIC:                             2417.
Df Residuals:                     467   BIC:                             2438.
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -2.1125      0.822     -2.570      0.0

In [6]:
print('The coefficient of education in the OLS model is {0:8.6f}.'.format(whole_model.params['educ']))

The coefficient of education in the OLS model is 0.603820.


## Three-Step Version of FWL Theorem
The FWL Theorem states that the following procedure returns the same coefficient estimate:
1. Regress `wage` on `exper`, `female`, `married`, and save the residuals as `resid1`.
2. Regress `educ` on `exper`, `female`, `married`, and save the residuals as `resid2`.
3. Regress `resid1` on `resid2`; the slope on `resid2` equals the coefficient of `educ` in the full model.
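
The three steps above can also be sketched without `statsmodels` or the `wage1` data. The snippet below is only an illustration under stated assumptions: the data are synthetic, and the names `controls`, `x`, `y`, and `ols` are hypothetical stand-ins that I introduce here, not part of this file's analysis.

```python
import numpy as np

# Synthetic data (an assumption for illustration; NOT the wage1 sample)
rng = np.random.default_rng(0)
n = 500
controls = rng.normal(size=(n, 3))  # stand-ins for exper, female, married
x = controls @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)  # stand-in for educ
y = 1.0 + 0.6 * x + controls @ np.array([0.1, -2.0, 0.5]) + rng.normal(size=n)

def ols(X, y):
    """Return the OLS coefficients of y on X (X must already include a constant)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones((n, 1))
Z = np.hstack([const, controls])  # constant + controls

# Full regression: y on constant, x, and controls; index 1 is the slope on x
beta_full = ols(np.hstack([const, x[:, None], controls]), y)[1]

# Step 1: residuals of y after regressing on the controls
resid_y = y - Z @ ols(Z, y)
# Step 2: residuals of x after regressing on the controls
resid_x = x - Z @ ols(Z, x)
# Step 3: regress the residuals of y on the residuals of x
beta_fwl = ols(np.hstack([const, resid_x[:, None]]), resid_y)[1]

print(beta_full, beta_fwl)
```

The two printed numbers should agree up to floating-point error, which is the same comparison the cells below make on the real data.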

In [7]:
step1 = smf.ols('wage ~ exper + female + married', data=df[df['nonwhite']==0]).fit()
df.insert(len(df.axes[1]), "resid1", step1.resid, True)    # insert the values of residuals into the last column of the dataframe

step2 = smf.ols('educ ~ exper + female + married', data=df[df['nonwhite']==0]).fit()
df.insert(len(df.axes[1]), "resid2", step2.resid, True)

df

Unnamed: 0,wage,educ,exper,nonwhite,female,married,resid1,resid2
0,3.10,11,2,0,1,0,-0.615141,-1.792193
1,3.24,12,22,0,1,1,-1.999432,-0.332860
2,3.00,11,2,0,0,0,-3.080094,-2.365765
3,6.00,8,44,0,0,1,-2.103263,-3.508789
4,5.30,12,7,0,0,1,-1.964240,-1.859372
...,...,...,...,...,...,...,...,...
521,15.00,16,14,0,1,1,9.941979,3.158906
522,2.27,10,2,0,1,0,-1.445141,-2.792193
523,4.67,15,13,0,0,1,-2.730298,1.521804
524,11.56,16,5,0,0,1,4.341113,2.013570


In [8]:
step3 = smf.ols('resid1 ~ resid2', data=df[df['nonwhite']==0]).fit()
print(step3.summary())

                            OLS Regression Results                            
Dep. Variable:                 resid1   R-squared:                       0.197
Model:                            OLS   Adj. R-squared:                  0.195
Method:                 Least Squares   F-statistic:                     115.4
Date:                Fri, 08 Sep 2023   Prob (F-statistic):           3.20e-24
Time:                        14:37:43   Log-Likelihood:                -1203.5
No. Observations:                 472   AIC:                             2411.
Df Residuals:                     470   BIC:                             2419.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   1.735e-15      0.143   1.21e-14      1.0

In [9]:
print('The coefficient of education in the whole model is {0:8.6f}.'.format(whole_model.params['educ']))
print('The coefficient after the three-step procedure is {0:8.6f}.'.format(step3.params['resid2']))

The coefficient of education in the whole model is 0.603820.
The coefficient after the three-step procedure is 0.603820.


## Two-Step Version of FWL Theorem
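The FWL Theorem also implies that the dependent variable does not have to be residualized. Because the residuals of `educ` are orthogonal to the controls, the following shorter procedure returns the same coefficient estimate:
1. Regress `educ` on `exper`, `female`, `married`, and save the residuals as `residual`.
2. Regress `wage` on `residual`; the slope on `residual` equals the coefficient of `educ` in the full model.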
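
A minimal sketch of the two-step version follows, again on synthetic data that I make up for illustration (the names `Z`, `x`, `y`, and `fit` are hypothetical). It also shows a caveat that is visible in the summaries in this file, where the two-step regression reports a lower R-squared than the full model: only the slope carries over, not the residuals or the fit statistics.

```python
import numpy as np

# Synthetic data (an assumption for illustration; NOT the wage1 sample)
rng = np.random.default_rng(1)
n = 400
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + two controls
x = Z @ np.array([1.0, 0.8, -0.5]) + rng.normal(size=n)     # regressor of interest
y = Z @ np.array([2.0, 1.5, -1.0]) + 0.6 * x + rng.normal(size=n)

def fit(X, y):
    """Return (coefficients, residuals) from an OLS regression of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef

# Full regression: x is placed first, so its slope is coef_full[0]
coef_full, resid_full = fit(np.column_stack([x, Z]), y)

# Step 1: residualize x on the controls
_, x_tilde = fit(Z, x)
# Step 2: regress the raw y on a constant and the residualized x
coef_two, resid_two = fit(np.column_stack([np.ones(n), x_tilde]), y)

slope_full, slope_two = coef_full[0], coef_two[1]
ssr_full, ssr_two = (resid_full ** 2).sum(), (resid_two ** 2).sum()

print(slope_full, slope_two)  # same slope up to floating-point error
print(ssr_full, ssr_two)      # the two-step residual sum of squares is larger
```

Because the second step leaves the variation explained by the controls in the residuals, the standard errors reported by the two-step regression should not be read as those of the full model.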

In [10]:
s1 = smf.ols('educ ~ exper + female + married', data=df[df['nonwhite']==0]).fit()
df.insert(len(df.axes[1]), "residual", s1.resid, True)

s2 = smf.ols('wage ~ residual', data=df[df['nonwhite']==0]).fit()
print(s2.summary())

                            OLS Regression Results                            
Dep. Variable:                   wage   R-squared:                       0.168
Model:                            OLS   Adj. R-squared:                  0.166
Method:                 Least Squares   F-statistic:                     94.90
Date:                Fri, 08 Sep 2023   Prob (F-statistic):           1.51e-20
Time:                        14:37:43   Log-Likelihood:                -1249.7
No. Observations:                 472   AIC:                             2503.
Df Residuals:                     470   BIC:                             2512.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      5.9442      0.158     37.718      0.0

In [11]:
print('The coefficient of education in the whole model is {0:8.6f}.'.format(whole_model.params['educ']))
print('The coefficient after the two-step procedure is {0:8.6f}.'.format(s2.params['residual']))

The coefficient of education in the whole model is 0.603820.
The coefficient after the two-step procedure is 0.603820.
