## Linear Regression: Model Estimation

**Functions**

`sm.OLS`

### Exercise 32
Use `sm.OLS` to regress the monthly Fama-French portfolio returns on the
market, size and value factors. Include a constant in each regression. Use only the four
extreme portfolios, that is, the 1-1, 1-5, 5-1 and 5-5 portfolios. Estimate each model with
homoskedastic standard errors and with White's heteroskedasticity-robust covariance estimator.
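
To make the difference between the two covariance estimators concrete, here is a minimal NumPy sketch that computes both by hand. It is an illustration under stated assumptions, not the notebook's solution: the data are synthetic and heteroskedastic rather than the Fama-French file, and every name (`X`, `y`, `cov_classical`, `cov_hc0`) is hypothetical. The classical estimator scales the inverse of `X'X` by the residual variance, while White's HC0 estimator sandwiches the sum of squared-residual-weighted outer products of the regressors between two copies of that inverse.

```python
import numpy as np

# Synthetic data (an assumption for illustration only): a constant plus three
# regressors, with an error variance that grows with |x1|.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal((n, 3))])
beta_true = np.array([0.5, 1.0, -0.5, 0.25])
eps = rng.standard_normal(n) * (1.0 + np.abs(X[:, 1]))
y = X @ beta_true + eps

# OLS coefficients: beta_hat = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
k = X.shape[1]

# Classical (homoskedastic) covariance: s^2 (X'X)^{-1}, s^2 = e'e / (n - k)
s2 = resid @ resid / (n - k)
cov_classical = s2 * XtX_inv

# White (HC0) covariance: (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1}
meat = (X * resid[:, None] ** 2).T @ X
cov_hc0 = XtX_inv @ meat @ XtX_inv

se_classical = np.sqrt(np.diag(cov_classical))
se_hc0 = np.sqrt(np.diag(cov_hc0))
print(np.column_stack([se_classical, se_hc0]))
```

On real data, `sm.OLS(y, X).fit()` and `sm.OLS(y, X).fit(cov_type="HC0")` should report the same quantities in their `bse` attribute; the coefficients are identical in both fits and only the standard errors change.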

In [22]:
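import pandas as pd

# Load the monthly Fama-French data set
ff = pd.read_hdf("data/ff.h5", "ff")

# The first three columns hold the factors: Mkt-RF, SMB and HML
factors = ff.iloc[:, :3]
# Take the columns from position 4 onward, then keep the four extreme portfolios
portfolios = ff.iloc[:, 4:]
portfolios = portfolios[["SMALL LoBM", "SMALL HiBM", "BIG LoBM", "BIG HiBM"]]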

In [23]:
import statsmodels.api as sm
factors = sm.add_constant(factors)
all_results = {}
homosked_results = {}
factors

,const,Mkt-RF,SMB,HML
1926-07-31,1.0,2.96,-2.30,-2.87
1926-08-31,1.0,2.64,-1.40,4.19
1926-09-30,1.0,0.36,-1.32,0.01
1926-10-31,1.0,-3.24,0.04,0.51
1926-11-30,1.0,2.53,-0.20,-0.35
...,...,...,...,...
2019-03-31,1.0,1.10,-3.13,-4.07
2019-04-30,1.0,3.96,-1.68,1.93
2019-05-31,1.0,-6.94,-1.20,-2.39
2019-06-30,1.0,6.93,0.33,-1.08


In [24]:
mod = sm.OLS(portfolios["SMALL LoBM"], factors)
res = mod.fit(cov_type="HC0")
all_results["SMALL LoBM"] = res
homosked_results["SMALL LoBM"] = mod.fit()
res.summary()

Dep. Variable:,SMALL LoBM,R-squared:,0.655
Model:,OLS,Adj. R-squared:,0.654
Method:,Least Squares,F-statistic:,180.8
Date:,"Mon, 21 Dec 2020",Prob (F-statistic):,1.77e-95
Time:,14:35:18,Log-Likelihood:,-3776.8
No. Observations:,1117,AIC:,7562.0
Df Residuals:,1113,BIC:,7582.0
Df Model:,3,,
Covariance Type:,HC0,,

,coef,std err,z,P>|z|,[0.025,0.975]
const,-0.4355,0.172,-2.526,0.012,-0.773,-0.098
Mkt-RF,1.2832,0.119,10.766,0.000,1.050,1.517
SMB,1.4336,0.177,8.120,0.000,1.088,1.780
HML,0.4214,0.274,1.536,0.125,-0.116,0.959

Omnibus:,896.514,Durbin-Watson:,2.065
Prob(Omnibus):,0.0,Jarque-Bera (JB):,102194.78
Skew:,2.973,Prob(JB):,0.0
Kurtosis:,49.48,Cond. No.,5.68
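
The columns of the coefficient table are linked by simple formulas, and a short check can make that explicit. The sketch below takes the `const` row of the SMALL LoBM regression, using the rounded coefficient from the summary and the unrounded White standard error printed in the Exercise 33 output, and recomputes the z-statistic, the two-sided p-value and the 95% confidence interval. It assumes, as the `z` and `P>|z|` column headers suggest, that inference with a robust covariance is based on the standard normal distribution. The variable names are illustrative.

```python
from statistics import NormalDist

# Values copied from the output in this section
coef = -0.4355        # const, SMALL LoBM (rounded, from the summary table)
std_err = 0.172414    # White (HC0) standard error of const

std_normal = NormalDist()

# z-statistic: coefficient divided by its standard error
z = coef / std_err
# Two-sided p-value under the standard normal distribution
p_value = 2 * (1 - std_normal.cdf(abs(z)))
# 95% confidence interval: coef +/- critical value * std err
crit = std_normal.inv_cdf(0.975)
ci_low, ci_high = coef - crit * std_err, coef + crit * std_err

print(f"z = {z:.3f}, p = {p_value:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

Up to rounding of the inputs, the printed values should agree with the `const` row of the table above.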


In [25]:
mod = sm.OLS(portfolios["SMALL HiBM"], factors)
res = mod.fit(cov_type="HC0")
all_results["SMALL HiBM"] = res
homosked_results["SMALL HiBM"] = mod.fit()
res.summary()

Dep. Variable:,SMALL HiBM,R-squared:,0.939
Model:,OLS,Adj. R-squared:,0.938
Method:,Least Squares,F-statistic:,941.0
Date:,"Mon, 21 Dec 2020",Prob (F-statistic):,1.19e-304
Time:,14:35:18,Log-Likelihood:,-2506.2
No. Observations:,1117,AIC:,5020.0
Df Residuals:,1113,BIC:,5041.0
Df Model:,3,,
Covariance Type:,HC0,,

,coef,std err,z,P>|z|,[0.025,0.975]
const,0.3696,0.062,6.008,0.000,0.249,0.490
Mkt-RF,0.9830,0.024,40.929,0.000,0.936,1.030
SMB,1.3001,0.065,20.068,0.000,1.173,1.427
HML,0.9124,0.058,15.624,0.000,0.798,1.027

Omnibus:,589.467,Durbin-Watson:,2.272
Prob(Omnibus):,0.0,Jarque-Bera (JB):,15321.552
Skew:,1.888,Prob(JB):,0.0
Kurtosis:,20.747,Cond. No.,5.68


In [26]:
mod = sm.OLS(portfolios["BIG LoBM"], factors)
res = mod.fit(cov_type="HC0")
all_results["BIG LoBM"] = res
homosked_results["BIG LoBM"] = mod.fit()
res.summary()

Dep. Variable:,BIG LoBM,R-squared:,0.952
Model:,OLS,Adj. R-squared:,0.952
Method:,Least Squares,F-statistic:,4050.0
Date:,"Mon, 21 Dec 2020",Prob (F-statistic):,0.0
Time:,14:35:19,Log-Likelihood:,-1758.6
No. Observations:,1117,AIC:,3525.0
Df Residuals:,1113,BIC:,3545.0
Df Model:,3,,
Covariance Type:,HC0,,

,coef,std err,z,P>|z|,[0.025,0.975]
const,0.3594,0.035,10.270,0.000,0.291,0.428
Mkt-RF,1.0229,0.009,109.366,0.000,1.005,1.041
SMB,-0.1485,0.023,-6.524,0.000,-0.193,-0.104
HML,-0.2629,0.016,-16.645,0.000,-0.294,-0.232

Omnibus:,109.852,Durbin-Watson:,1.791
Prob(Omnibus):,0.0,Jarque-Bera (JB):,248.017
Skew:,0.578,Prob(JB):,1.39e-54
Kurtosis:,4.998,Cond. No.,5.68


In [27]:
mod = sm.OLS(portfolios["BIG HiBM"], factors)
res = mod.fit(cov_type="HC0")
all_results["BIG HiBM"] = res
homosked_results["BIG HiBM"] = mod.fit()
print(res.summary())

                            OLS Regression Results                            
Dep. Variable:               BIG HiBM   R-squared:                       0.838
Model:                            OLS   Adj. R-squared:                  0.838
Method:                 Least Squares   F-statistic:                     376.3
Date:                Mon, 21 Dec 2020   Prob (F-statistic):          1.04e-168
Time:                        14:35:19   Log-Likelihood:                -2956.0
No. Observations:                1117   AIC:                             5920.
Df Residuals:                    1113   BIC:                             5940.
Df Model:                           3                                         
Covariance Type:                  HC0                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.1042      0.103      1.016      0.3

In [28]:
het_res = {}
hom_res = {}

# Preferred approach: one loop over the four portfolios gives the same results
# as the four separate cells above, with far less repetition.
for model in portfolios.columns:
    mod = sm.OLS(portfolios[model], factors)
    hom_res[model] = mod.fit()                # homoskedastic covariance
    het_res[model] = mod.fit(cov_type="HC0")  # White's covariance estimator

In [29]:
het_res["SMALL HiBM"].summary()

Dep. Variable:,SMALL HiBM,R-squared:,0.939
Model:,OLS,Adj. R-squared:,0.938
Method:,Least Squares,F-statistic:,941.0
Date:,"Mon, 21 Dec 2020",Prob (F-statistic):,1.19e-304
Time:,14:35:19,Log-Likelihood:,-2506.2
No. Observations:,1117,AIC:,5020.0
Df Residuals:,1113,BIC:,5041.0
Df Model:,3,,
Covariance Type:,HC0,,

,coef,std err,z,P>|z|,[0.025,0.975]
const,0.3696,0.062,6.008,0.000,0.249,0.490
Mkt-RF,0.9830,0.024,40.929,0.000,0.936,1.030
SMB,1.3001,0.065,20.068,0.000,1.173,1.427
HML,0.9124,0.058,15.624,0.000,0.798,1.027

Omnibus:,589.467,Durbin-Watson:,2.272
Prob(Omnibus):,0.0,Jarque-Bera (JB):,15321.552
Skew:,1.888,Prob(JB):,0.0
Kurtosis:,20.747,Cond. No.,5.68


### Exercise 33
Are the parameter standard errors similar using the two covariance estimators?
If not, what does this mean? 

In [37]:
from IPython.display import display, HTML
for key in all_results:
    white_res = all_results[key]
    homosk_res = homosked_results[key]
    std_err = pd.DataFrame({"White":white_res.bse,"Homo":homosk_res.bse})
    display(HTML(key))
    display(std_err)

SMALL LoBM
,White,Homo
const,0.172414,0.215569
Mkt-RF,0.119187,0.043167
SMB,0.176554,0.070797
HML,0.274367,0.063235


SMALL HiBM
,White,Homo
const,0.061514,0.069118
Mkt-RF,0.024017,0.013841
SMB,0.064785,0.022699
HML,0.058401,0.020275


BIG LoBM
,White,Homo
const,0.034997,0.035391
Mkt-RF,0.009353,0.007087
SMB,0.022765,0.011623
HML,0.015797,0.010382


BIG HiBM
,White,Homo
const,0.102525,0.103384
Mkt-RF,0.038067,0.020702
SMB,0.069341,0.033953
HML,0.072789,0.030327
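
One way to read the four tables above is to look at the ratio of the White standard error to the homoskedastic one for each parameter. The sketch below does this with the printed values; it assumes the tables appear in the insertion order of `all_results` (SMALL LoBM, SMALL HiBM, BIG LoBM, BIG HiBM), which matches the standard errors in the regression summaries. The names `std_errs` and `ratios` are illustrative.

```python
# Standard errors copied from the tables above as (White, homoskedastic) pairs
std_errs = {
    "SMALL LoBM": {"const": (0.172414, 0.215569), "Mkt-RF": (0.119187, 0.043167),
                   "SMB": (0.176554, 0.070797), "HML": (0.274367, 0.063235)},
    "SMALL HiBM": {"const": (0.061514, 0.069118), "Mkt-RF": (0.024017, 0.013841),
                   "SMB": (0.064785, 0.022699), "HML": (0.058401, 0.020275)},
    "BIG LoBM": {"const": (0.034997, 0.035391), "Mkt-RF": (0.009353, 0.007087),
                 "SMB": (0.022765, 0.011623), "HML": (0.015797, 0.010382)},
    "BIG HiBM": {"const": (0.102525, 0.103384), "Mkt-RF": (0.038067, 0.020702),
                 "SMB": (0.069341, 0.033953), "HML": (0.072789, 0.030327)},
}

# Ratio of White to homoskedastic standard error for every parameter
ratios = {
    portfolio: {name: white / homo for name, (white, homo) in params.items()}
    for portfolio, params in std_errs.items()
}

for portfolio, params in ratios.items():
    line = ", ".join(f"{name}: {ratio:.2f}" for name, ratio in params.items())
    print(f"{portfolio:<11} {line}")
```

For every slope coefficient the ratio is above one, and for SMALL LoBM it is well above two, while the ratio for the constant is slightly below one. Differences of this size are usually taken as evidence of heteroskedasticity, in which case the homoskedastic standard errors are likely to overstate the precision of the slope estimates and inference should rely on the White standard errors.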


### Exercise 34
How much of the variation is explained by these three regressors?

In [31]:
r_square = {}
for key in portfolios:
    r_square[key] = all_results[key].rsquared
pd.Series(r_square,name="R2").to_frame()

,R2
SMALL LoBM,0.654851
SMALL HiBM,0.938606
BIG LoBM,0.951723
BIG HiBM,0.838409


In [32]:
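# For comparison: CAPM regressions using only the constant and the market factor
capm_factor = factors.iloc[:, :2]
capm_r_square = {}
for key in portfolios:
    res = sm.OLS(portfolios[key], capm_factor).fit()
    capm_r_square[key] = res.rsquared
pd.Series(capm_r_square, name="R2").to_frame()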

,R2
SMALL LoBM,0.508956
SMALL HiBM,0.629589
BIG LoBM,0.915187
BIG HiBM,0.674243
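
The two R2 tables can be combined to show how much explanatory power the size and value factors add beyond the market factor. The following sketch uses the values printed above, where the second table comes from regressions on the constant and `Mkt-RF` only (`factors.iloc[:, :2]`). The dictionary names are illustrative.

```python
# R-squared values copied from the two tables above
r2_three_factor = {
    "SMALL LoBM": 0.654851,
    "SMALL HiBM": 0.938606,
    "BIG LoBM": 0.951723,
    "BIG HiBM": 0.838409,
}
r2_capm = {
    "SMALL LoBM": 0.508956,
    "SMALL HiBM": 0.629589,
    "BIG LoBM": 0.915187,
    "BIG HiBM": 0.674243,
}

# Increase in R-squared from adding SMB and HML to the market factor
gain = {key: r2_three_factor[key] - r2_capm[key] for key in r2_three_factor}

for key, value in gain.items():
    print(f"{key:<11} CAPM {r2_capm[key]:.3f} -> three-factor "
          f"{r2_three_factor[key]:.3f} (gain {value:.3f})")
```

The gain is largest for the small-cap portfolios and smallest for BIG LoBM, whose returns are already explained almost entirely by the market factor.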
