# F-test: overall significance of a regression

In [1]:
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import f # F-distribution

## sleep equation 1
Regression of `sleep` on `totwrk, age, male, smsa`

Non-robust test

In [4]:
sleep_df = pd.read_csv('https://raw.githubusercontent.com/artamonoff/Econometrica/master/python-notebooks/data-csv/sleep75.csv')
mod1 = smf.ols(formula='sleep~totwrk+age+male+smsa', data=sleep_df).fit()
print(mod1.summary(slim=True))

                            OLS Regression Results                            
Dep. Variable:                  sleep   R-squared:                       0.123
Model:                            OLS   Adj. R-squared:                  0.118
No. Observations:                 706   F-statistic:                     24.68
Covariance Type:            nonrobust   Prob (F-statistic):           3.89e-19
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   3494.2231     68.889     50.722      0.000    3358.970    3629.477
totwrk        -0.1677      0.018     -9.337      0.000      -0.203      -0.132
age            2.8065      1.390      2.020      0.044       0.078       5.535
male          86.9084     34.266      2.536      0.011      19.632     154.185
smsa         -75.2858     32.103     -2.345      0.019    -138.315     -12.257

Notes:
[1] Standard Errors assume that the covarian

Overall significance of the regression: $H_0:\beta_{totwrk}=\beta_{age}=\beta_{male}=\beta_{smsa}=0$

$F=24.68$, $P=3.89\cdot 10^{-19}$

Significance level: $\alpha=5\%=0.05$

Conclusion: $P<\alpha$ => the regression is significant! ($H_0$ is rejected)
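The reported statistic and P-value can be cross-checked by hand. The sketch below hard-codes the rounded values from the summary table ($R^2=0.123$, $n=706$, $k=4$), so it only approximates the exact output; the variable names are illustrative. It relies on the standard identity $F=\frac{R^2/k}{(1-R^2)/(n-k-1)}$ for a regression with an intercept, and on the upper tail of the $F(k,\,n-k-1)$ distribution for the P-value.

```python
from scipy.stats import f  # F-distribution

# Rounded values copied from the summary table (assumption: rounding error is acceptable)
r2, n, k = 0.123, 706, 4
df1, df2 = k, n - k - 1            # 4 and 701

# F-statistic for overall significance expressed through R^2
F_from_r2 = (r2 / df1) / ((1 - r2) / df2)

# P-value = upper-tail probability P(F(df1, df2) > F_obs)
F_obs = 24.68
p_value = f.sf(F_obs, df1, df2)

print(f"F from R^2: {F_from_r2:.2f}")
print(f"P-value:    {p_value:.3g}")
print("reject H0 at 5%:", p_value < 0.05)
```

Because $R^2$ is rounded to three decimals, the recomputed statistic differs slightly from the reported 24.68.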

In [5]:
# Critical value: df1 = k = 4 = .df_model, df2 = n-k-1 = 706-4-1 = 701 = .df_resid
f.ppf(q=1-0.05, dfn=mod1.df_model, dfd=mod1.df_resid)

2.384637913666586

$F=24.68>F_{cr}=2.38$ => the regression is significant!
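The P-value rule and the critical-value rule always lead to the same decision, since $F>F_{cr}$ exactly when $P<\alpha$. A small sketch that applies both rules side by side; the helper `f_test_decision` is a hypothetical name introduced here for illustration, and the inputs are the rounded numbers from the output above.

```python
from scipy.stats import f  # F-distribution

def f_test_decision(F_obs, dfn, dfd, alpha=0.05):
    """Return (F_cr, p_value, reject) for an upper-tail F-test."""
    F_cr = f.ppf(1 - alpha, dfn, dfd)      # critical value
    p_value = f.sf(F_obs, dfn, dfd)        # upper-tail probability
    reject_by_crit = F_obs > F_cr
    reject_by_p = p_value < alpha
    # The two rules are equivalent, so they must agree
    assert reject_by_crit == reject_by_p
    return F_cr, p_value, reject_by_crit

F_cr, p_value, reject = f_test_decision(24.68, dfn=4, dfd=701)
print(f"F_cr = {F_cr:.2f}, reject H0: {reject}")
```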

Robust F-test

In [6]:
sleep_df = pd.read_csv('https://raw.githubusercontent.com/artamonoff/Econometrica/master/python-notebooks/data-csv/sleep75.csv')
mod2 = smf.ols(formula='sleep~totwrk+age+male+smsa', data=sleep_df).fit(cov_type='HC3')
print(mod2.summary(slim=True))

                            OLS Regression Results                            
Dep. Variable:                  sleep   R-squared:                       0.123
Model:                            OLS   Adj. R-squared:                  0.118
No. Observations:                 706   F-statistic:                     19.88
Covariance Type:                  HC3   Prob (F-statistic):           1.63e-15
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   3494.2231     72.998     47.868      0.000    3351.150    3637.296
totwrk        -0.1677      0.020     -8.213      0.000      -0.208      -0.128
age            2.8065      1.366      2.054      0.040       0.129       5.484
male          86.9084     35.438      2.452      0.014      17.451     156.366
smsa         -75.2858     31.476     -2.392      0.017    -136.978     -13.593

Notes:
[1] Standard Errors are heteroscedasticity r

Conclusion: $F=19.88$, $P=1.63\cdot 10^{-15}<\alpha=0.05$ => the regression is significant!

## sleep equation 2
Regression of `sleep` on `union, south, marr, log(hrwage)`

Non-robust test

In [7]:
sleep_df = pd.read_csv('https://raw.githubusercontent.com/artamonoff/Econometrica/master/python-notebooks/data-csv/sleep75.csv')
mod3 = smf.ols(formula='sleep~union+south+marr+np.log(hrwage)', data=sleep_df).fit()
print(mod3.summary(slim=True))

                            OLS Regression Results                            
Dep. Variable:                  sleep   R-squared:                       0.013
Model:                            OLS   Adj. R-squared:                  0.006
No. Observations:                 532   F-statistic:                     1.746
Covariance Type:            nonrobust   Prob (F-statistic):              0.138
                     coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
Intercept       3249.7952     60.892     53.370      0.000    3130.175    3369.415
union             21.3754     45.135      0.474      0.636     -67.292     110.043
south             68.1007     46.083      1.478      0.140     -22.429     158.631
marr              72.4004     48.346      1.498      0.135     -22.575     167.376
np.log(hrwage)   -47.7370     29.681     -1.608      0.108    -106.044      10.570

Notes:
[1] Standard Err

$P=0.138=13.8\%$

Significance level: $\alpha=5\%$

$P>\alpha$ => the regression is insignificant! ($H_0$ is not rejected)

In [8]:
# Critical value
f.ppf(q=1-0.05, dfn=mod3.df_model, dfd=mod3.df_resid)

2.388849582894386

$F=1.746<F_{cr}=2.39$ => the regression is insignificant!
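The conclusion for this equation does not hinge on the choice of 5%: the same comparison can be repeated at other conventional levels. A short sketch using the rounded $F=1.746$ and the degrees of freedom $(4,\ 532-4-1=527)$ from the output above; the set of levels is an illustrative choice.

```python
from scipy.stats import f  # F-distribution

F_obs, dfn, dfd = 1.746, 4, 527

decisions = {}
for alpha in (0.10, 0.05, 0.01):
    F_cr = f.ppf(1 - alpha, dfn, dfd)      # critical value at this level
    decisions[alpha] = F_obs > F_cr        # True means H0 is rejected
    print(f"alpha={alpha:.2f}: F_cr={F_cr:.2f}, reject H0: {decisions[alpha]}")
```

Since $P=0.138$ exceeds even 10%, $H_0$ is not rejected at any of these levels.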

Robust test

In [9]:
sleep_df = pd.read_csv('https://raw.githubusercontent.com/artamonoff/Econometrica/master/python-notebooks/data-csv/sleep75.csv')
mod4 = smf.ols(formula='sleep~union+south+marr+np.log(hrwage)', data=sleep_df).fit(cov_type='HC3')
print(mod4.summary(slim=True))

                            OLS Regression Results                            
Dep. Variable:                  sleep   R-squared:                       0.013
Model:                            OLS   Adj. R-squared:                  0.006
No. Observations:                 532   F-statistic:                     1.611
Covariance Type:                  HC3   Prob (F-statistic):              0.170
                     coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------------------
Intercept       3249.7952     62.741     51.797      0.000    3126.825    3372.765
union             21.3754     44.723      0.478      0.633     -66.281     109.032
south             68.1007     44.879      1.517      0.129     -19.861     156.063
marr              72.4004     49.853      1.452      0.146     -25.310     170.111
np.log(hrwage)   -47.7370     30.501     -1.565      0.118    -107.518      12.044

Notes:
[1] Standard Err