<style>
div.blue{
    background-color:#e6f0ff; 
    border-radius: 5px; 
    padding: 20px;}
</style> 

<style>
div.warn {    
    background-color: #fcf2f2;
    border-color: #dFb5b4;
    border-left: 5px solid #dfb5b4;
    padding: 0.5em;
    }
 </style>
    
<h1 style="text-align: center; color: purple;" markdown="1">Econ 320 Python: Inference t and F test </h1>
<h2 style="text-align: center; color: purple;" markdown="1">Handout 12 </h2>


**The package setup**

In [44]:
from scipy.stats import norm
from scipy.stats import t
import scipy.stats as stats
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt 
import wooldridge as woo
import statsmodels.formula.api as smf
from stargazer.stargazer import Stargazer
from IPython.display import HTML, display

## The *t*-test 

After learning the magnitude and sign of the estimated coefficients in a regression, the next step in empirical research is investigating the statistical significance of these estimates. 

### General Setup

We are often interested in testing whether there is any relation at all between the dependent variable $y$ and a regressor $x_j$, without imposing a sign on the partial effect *a priori*. 
The null hypothesis then takes the form
$$ H_0: \beta_j = a_j $$ 
where $a_j$ is some given number, very often $a_j=0$. In the standard two-tailed test the alternative hypothesis is 
$$ H_1: \beta_j \neq a_j $$
and in a one-tailed test it is either one of 

$$H_1: \beta_j < a_j  \;\;\;\;  \text{or} \;\;\;\;   H_1: \beta_j > a_j$$ 

The hypotheses can be conveniently tested using a *t* test, which is based on the statistic

$$ t= \frac{\hat{\beta_j} - a_j}{se(\hat{\beta_j})}$$
If $H_0$ is in fact true and the CLM assumptions hold, then this statistic has a *t* distribution with $n-k-1$ degrees of freedom.

Since the standard case of a *t* test, $a_j=0$, is so common, `statsmodels` reports the relevant *t* and *p* values directly in the **summary** of the estimation results. 

In other words, the formula is: 
$$ t= \frac{\text{estimate} - \text{hypothesized value}}{\text{standard error}}$$
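The formula can be checked by hand for any hypothesized value, not only zero. The sketch below is a minimal illustration that plugs in the rounded `hsGPA` estimate (0.412) and standard error (0.094) reported in the regression tables of this handout, together with the 137 residual degrees of freedom of that regression. The helper name `t_test` and the non-zero null value 0.4 are hypothetical choices made only for this example.

```python
from scipy import stats

def t_test(estimate, hypothesized, std_error, df):
    """Return the t statistic with its two-sided and one-sided p-values."""
    t_stat = (estimate - hypothesized) / std_error
    p_two_sided = 2 * stats.t.sf(abs(t_stat), df)   # H1: beta != a
    p_greater = stats.t.sf(t_stat, df)              # H1: beta > a
    p_less = stats.t.cdf(t_stat, df)                # H1: beta < a
    return t_stat, p_two_sided, p_greater, p_less

# Rounded values for hsGPA from the colGPA regression (n - k - 1 = 137)
b_hat, se_hat, df_resid = 0.412, 0.094, 137

# Standard case, H0: beta = 0
t0, p0, _, _ = t_test(b_hat, 0.0, se_hat, df_resid)
print(f'H0: beta = 0   -> t = {t0:.3f}, two-sided p = {p0:.5f}')

# Hypothetical non-zero null, H0: beta = 0.4
t1, p1, p1_greater, p1_less = t_test(b_hat, 0.4, se_hat, df_resid)
print(f'H0: beta = 0.4 -> t = {t1:.3f}, two-sided p = {p1:.3f}')
```

Because the inputs are rounded, the t statistic for the zero null will differ slightly from the one statsmodels reports from the unrounded estimates.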


### t-test in the regression results

The summary of the `smf.ols()` results and the stargazer table both provide inference analysis. They contain the same information; it is just displayed differently. 

`.summary()` provides a table with five relevant columns for each coefficient:

| Indep variables | Estimate | Std. Error | t-stat value | **$Pr(>\lvert t \rvert)$** | [0.025, 0.975] | 
|-----------------|-------------|------------------|-----------------|-------------|---------------------|
| Var names       | Estimated $\beta$'s | Standard errors of the $\beta$'s | $(\hat\beta - 0) / se$ | p-value for $\beta$ | 95% confidence interval |





In [45]:
gpa1 = woo.dataWoo('gpa1')

# store results:
reg = smf.ols(formula='colGPA ~ hsGPA + ACT + skipped', data=gpa1)
results = reg.fit()
# display summary results 
print(results.summary())


                            OLS Regression Results                            
Dep. Variable:                 colGPA   R-squared:                       0.234
Model:                            OLS   Adj. R-squared:                  0.217
Method:                 Least Squares   F-statistic:                     13.92
Date:                Fri, 11 Apr 2025   Prob (F-statistic):           5.65e-08
Time:                        10:52:34   Log-Likelihood:                -41.501
No. Observations:                 141   AIC:                             91.00
Df Residuals:                     137   BIC:                             102.8
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      1.3896      0.332      4.191      0.0

`Stargazer([results])` provides a table with one column per model 

|Independent variables |  Estimate|
|---------------------|------------|
|Variable name| Estimated $\beta$'s, with stars that indicate statistical significance at certain levels of $\alpha$|
|                  |(standard errors in parentheses)| 
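To make the meaning of the stars concrete, here is a small sketch that maps p-values to stars. It assumes the thresholds are the stargazer package defaults (`*` for p<0.1, `**` for p<0.05, `***` for p<0.01); check the note at the bottom of your own table, since these cutoffs can be changed. The function name `significance_stars` is hypothetical, and the p-values are the rounded ones from the step-by-step calculation in this handout.

```python
def significance_stars(p_value):
    """Translate a p-value into significance stars (assumed default cutoffs)."""
    if p_value < 0.01:
        return '***'
    if p_value < 0.05:
        return '**'
    if p_value < 0.1:
        return '*'
    return ''

# Rounded two-sided p-values from the colGPA regression
p_values = {'hsGPA': 0.0000, 'ACT': 0.1657, 'skipped': 0.0017}

stars = {name: significance_stars(p) for name, p in p_values.items()}
for name, s in stars.items():
    print(f'{name:8s} p = {p_values[name]:.4f} {s}')
```

This agrees with the stargazer output: `hsGPA` and `skipped` carry three stars, while `ACT` carries none.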

In [46]:
# Show the regression results in a stargazer table. Why are the stars there? What do they show?
stargazer = Stargazer([results])
stargazer.title('Regression Results')
# choose the covariates to display and their order
stargazer.covariate_order(['hsGPA', 'ACT', 'skipped'])
HTML(stargazer.render_html())

| | Dependent variable: colGPA |
|---|---|
| | (1) |
| hsGPA | 0.412*** |
| | (0.094) |
| ACT | 0.015 |
| | (0.011) |
| skipped | -0.083*** |




The examples in this section come from Wooldridge, Chapter 4, and use the `gpa1` data. 

### t-test step-by-step

You will rarely use this method in practice, but it is crucial to understand how these numbers are calculated in order to read the regression summary table, where the same calculations are done for you. Run both and compare the results with the regression summary table. 

In [47]:
# manually confirm the formulas, i.e. extract coefficients and SE:
b = results.params
se = results.bse

# degrees of freedom of the t distribution: n - k - 1
df_resid = results.df_resid
# reproduce t statistic:
tstat = b/se
print(f'tstat: \n{tstat}\n')


# reproduce p value: multiply by two because this is a two-tailed test. 

pval = pd.Series(2 * stats.t.cdf(-abs(tstat), df_resid), index=b.index)
# use the np.around() function to round the values 

print(f'pval vector: \n{np.around(pval,4)}\n')


tstat: 
Intercept    4.191039
hsGPA        4.396260
ACT          1.393319
skipped     -3.196840
dtype: float64

pval vector: 
Intercept    0.0000
hsGPA        0.0000
ACT          0.1657
skipped      0.0017
dtype: float64



<font color='blue'>
    The results above, both from the step-by-step method and from `summary()`, are equal and provide the same information. Remember that $|t|>2$ is the rule of thumb for rejecting the null hypothesis. 

   For $\alpha =0.05$ and $n-k-1=137$ degrees of freedom, the two-sided critical value is about 1.977. The t-stats for High School GPA and skipped are larger than this number in absolute value, so $H_0: \beta_j = 0$ is rejected in favor of $H_1: \beta_j \neq 0$ for these two variables, while it is not rejected for ACT. 
    
   The p-values of these two variables are also close to zero, which tells us about their level of significance. High School GPA, for example, is significant at the 1% level.
    
</font> 
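The critical value behind this decision rule can be obtained from the t distribution directly. The following sketch uses `scipy.stats.t.ppf` with the 137 residual degrees of freedom of the `colGPA` regression and compares it to the t statistics printed above; the dictionary names are just illustrative.

```python
from scipy import stats

alpha = 0.05
df_resid = 137                       # n - k - 1 = 141 - 3 - 1

# two-sided test: put alpha/2 in each tail
c = stats.t.ppf(1 - alpha / 2, df_resid)
print(f'critical value: {c:.4f}')

# t statistics copied from the step-by-step output
t_stats = {'hsGPA': 4.396260, 'ACT': 1.393319, 'skipped': -3.196840}

# reject H0: beta_j = 0 when |t| exceeds the critical value
reject = {name: abs(t) > c for name, t in t_stats.items()}
for name, decision in reject.items():
    print(f'{name:8s} reject H0: {decision}')
```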


## Confidence Intervals

Confidence intervals for regression coefficients are closely related to the t test. The 95% confidence interval for the parameter $\beta_j$ is $\hat{\beta_j}\pm c \cdot se(\hat{\beta_j})$, 
where $c$ is the same critical value as for the two-sided t test at significance level $\alpha=5\%$.
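As a quick check of this formula, the sketch below builds the interval by hand for `hsGPA`, using the rounded estimate (0.412) and standard error (0.094) from the regression tables and 137 degrees of freedom. The helper `conf_interval` is a hypothetical name for this illustration; because the inputs are rounded, the bounds will only approximately match what statsmodels reports.

```python
from scipy import stats

def conf_interval(estimate, std_error, df, alpha=0.05):
    """Confidence interval: estimate +/- c * se, with c from the t distribution."""
    c = stats.t.ppf(1 - alpha / 2, df)
    return estimate - c * std_error, estimate + c * std_error

# Rounded hsGPA estimate and standard error, n - k - 1 = 137
lo95, hi95 = conf_interval(0.412, 0.094, 137)
lo99, hi99 = conf_interval(0.412, 0.094, 137, alpha=0.01)

print(f'95% CI: [{lo95:.3f}, {hi95:.3f}]')
print(f'99% CI: [{lo99:.3f}, {hi99:.3f}]')
```

Note that the 99% interval is wider than the 95% one: a higher confidence level requires a larger critical value.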

In Python you can calculate confidence intervals for all coefficients at once: 
store the regression results in an *object* and then use the method **.conf_int()** to obtain a table of 95% confidence intervals. 


You can use the option `alpha=value` to choose other levels. For example, use **`results.conf_int(alpha=0.01, cols=None)`** to find a 99% confidence interval.

In [48]:
# 95% CI:
# the default alpha is 0.05, no need to specify it
print('95% CI:',results.conf_int())
# 99% CI:
# pass the alpha argument to choose a different confidence level
print('99% CI:', results.conf_int(alpha=0.01,cols = None))

95% CI:                   0         1
Intercept  0.733930  2.045178
hsGPA      0.226582  0.597050
ACT       -0.006171  0.035612
skipped   -0.134523 -0.031703
99% CI:                   0         1
Intercept  0.523472  2.255635
hsGPA      0.167121  0.656511
ACT       -0.012877  0.042318
skipped   -0.151026 -0.015200



## Linear restrictions: The *F* test 

This test allows you to test the joint significance of several parameters at the same time. The test statistic of the *F* test is based on the relative difference between the sum of squared residuals in the general (unrestricted) model and in a restricted model in which the hypotheses are imposed, $SSR_{ur}$ and $SSR_{r}$, respectively. 
$$F=\frac{SSR_{r}-SSR_{ur}}{SSR_{ur}}\cdot\frac{n-k-1}{q}= \frac{R^2_{ur}-R^2_{r}}{1-R^2_{ur}}\cdot\frac{n-k-1}{q}$$

where $q$ is the number of restrictions, $n$ the number of observations and $k$ the number of regressors in the unrestricted model. 
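To see that the two versions of the formula give the same number, here is a minimal sketch on simulated data (not the Wooldridge data). Everything in it is an assumption made for illustration: the seed, the sample size, the true coefficients, and the helper `fit_ssr`, which runs OLS with an intercept using plain `numpy`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(320)
n, k, q = 200, 3, 2                  # observations, regressors, restrictions
X = rng.normal(size=(n, k))
y = 1.0 + 0.5 * X[:, 0] + rng.normal(size=n)   # x2 and x3 have no true effect

def fit_ssr(regressors, y):
    """OLS with an intercept; return the sum of squared residuals."""
    Z = np.column_stack([np.ones(len(y)), regressors])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return resid @ resid

ssr_ur = fit_ssr(X, y)               # unrestricted: x1, x2, x3
ssr_r = fit_ssr(X[:, :1], y)         # restricted: beta_2 = beta_3 = 0
sst = np.sum((y - y.mean()) ** 2)
r2_ur, r2_r = 1 - ssr_ur / sst, 1 - ssr_r / sst

# SSR form and R-squared form of the F statistic
f_ssr = (ssr_r - ssr_ur) / ssr_ur * (n - k - 1) / q
f_r2 = (r2_ur - r2_r) / (1 - r2_ur) * (n - k - 1) / q

# p-value from the F distribution with (q, n - k - 1) degrees of freedom
p_value = stats.f.sf(f_ssr, q, n - k - 1)
print(f'F (SSR form): {f_ssr:.4f}, F (R2 form): {f_r2:.4f}, p-value: {p_value:.4f}')
```

The two forms agree because both models share the same total sum of squares, so it cancels out of the ratio.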

Let's play with this regression, from Wooldridge Chapter 4.  
$$\log(salary)=\beta_0 + \beta_1\cdot years + \beta_2 \cdot gamesyr + \beta_3 \cdot bavg + \beta_4 \cdot hrunsyr +\beta_5 \cdot rbisyr +u$$


### F test using `f_test` 

Python provides a convenient way to do this, but it is always best to understand what is behind those magical functions.  The `statsmodels` results object provides the method **`f_test`** that allows you to perform this kind of test. Once you have estimated your regression and stored your results in an object, here named **results**, use

> `hypothesis = ["var_name1 = 0", "var_name2 = 0", ...]`

> `ftest = results.f_test(hypothesis)`

where **`hypothesis`** represents the null hypothesis to be tested. It is a list of length $q$ where each restriction is described as text in which the variable name takes the place of its parameter. 
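One way to convince yourself of what `f_test` does is to compare it with the hand calculation from the SSR formula. The sketch below does this on simulated data; the variable names `x1`, `x2`, `x3`, the seed and the coefficients are all made up for illustration. It estimates the restricted model explicitly and checks that both routes give the same statistic.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# simulated data: x2 has a small effect, x3 has none
rng = np.random.default_rng(320)
n = 200
df = pd.DataFrame({'x1': rng.normal(size=n),
                   'x2': rng.normal(size=n),
                   'x3': rng.normal(size=n)})
df['y'] = 1.0 + 0.5 * df['x1'] + 0.1 * df['x2'] + rng.normal(size=n)

# unrestricted and restricted models
res_ur = smf.ols('y ~ x1 + x2 + x3', data=df).fit()
res_r = smf.ols('y ~ x1', data=df).fit()

# manual F statistic: q = 2 restrictions, df_resid = n - k - 1
q = 2
f_manual = (res_r.ssr - res_ur.ssr) / res_ur.ssr * res_ur.df_resid / q

# automated F test on the unrestricted model
ftest = res_ur.f_test(['x2 = 0', 'x3 = 0'])
f_auto = float(np.squeeze(ftest.statistic))   # scalar or array, depending on version
p_auto = float(np.squeeze(ftest.pvalue))

print(f'manual F: {f_manual:.4f}, f_test F: {f_auto:.4f}, p-value: {p_auto:.4f}')
```

With `f_test` there is no need to estimate the restricted model yourself, which is what makes it convenient.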

In [49]:
mlb1 = woo.dataWoo('mlb1')
# OLS regression:
reg = smf.ols(formula='lsalary ~ years + gamesyr + bavg + hrunsyr + rbisyr', data=mlb1).fit()
# display summary results
display(reg.summary())



| | | | |
|---|---|---|---|
| Dep. Variable: | lsalary | R-squared: | 0.628 |
| Model: | OLS | Adj. R-squared: | 0.622 |
| Method: | Least Squares | F-statistic: | 117.1 |
| Date: | Fri, 11 Apr 2025 | Prob (F-statistic): | 2.94e-72 |
| Time: | 10:52:34 | Log-Likelihood: | -385.11 |
| No. Observations: | 353 | AIC: | 782.2 |
| Df Residuals: | 347 | BIC: | 805.4 |
| Df Model: | 5 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 11.1924 | 0.289 | 38.752 | 0.000 | 10.624 | 11.760 |
| years | 0.0689 | 0.012 | 5.684 | 0.000 | 0.045 | 0.093 |
| gamesyr | 0.0126 | 0.003 | 4.742 | 0.000 | 0.007 | 0.018 |
| bavg | 0.0010 | 0.001 | 0.887 | 0.376 | -0.001 | 0.003 |
| hrunsyr | 0.0144 | 0.016 | 0.899 | 0.369 | -0.017 | 0.046 |
| rbisyr | 0.0108 | 0.007 | 1.500 | 0.134 | -0.003 | 0.025 |

| | | | |
|---|---|---|---|
| Omnibus: | 6.816 | Durbin-Watson: | 1.265 |
| Prob(Omnibus): | 0.033 | Jarque-Bera (JB): | 10.197 |
| Skew: | -0.068 | Prob(JB): | 0.0061 |
| Kurtosis: | 3.821 | Cond. No. | 2090.0 |


In [50]:
# automated F test:
# first we need to define the null hypothesis 
# with the coefficients we want to test for joint significance;
# in this case we test whether the coefficients on bavg, hrunsyr, and rbisyr are all equal to zero, 
# meaning that performance-related ability is not relevant for salary.
hypothesis = ['bavg = 0', 'hrunsyr = 0', 'rbisyr = 0']
ftest = reg.f_test(hypothesis)
fstat = ftest.statistic
fpval = ftest.pvalue
# Because fstat comes from an array we need np.around() to round it. 

print(f'Fstat: {np.around(fstat, 3)}\n')
print(f'Fpval: {np.around(fpval,3)}\n')

Fstat: 9.55

Fpval: 0.0
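The printed p-value of 0.0 is only the result of rounding to three decimals. To read the result more carefully, the sketch below takes the reported statistic (9.55) with $q=3$ restrictions and $n-k-1=347$ residual degrees of freedom, and uses `scipy.stats.f` to compute the 1% critical value and the unrounded p-value.

```python
from scipy import stats

f_stat = 9.55        # F statistic reported above
q = 3                # restrictions: bavg, hrunsyr, rbisyr
df_resid = 347       # n - k - 1 = 353 - 5 - 1

# critical value at the 1% significance level
crit_1pct = stats.f.ppf(0.99, q, df_resid)

# p-value: probability of an F at least this large under H0
p_value = stats.f.sf(f_stat, q, df_resid)

reject_h0 = f_stat > crit_1pct
print(f'1% critical value: {crit_1pct:.3f}')
print(f'p-value: {p_value:.2e}')
print(f'reject H0 at 1%: {reject_h0}')
```

So although none of the three variables is individually significant in the t tests, they are jointly significant: the null hypothesis that all three coefficients are zero is rejected.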






&nbsp;
<hr />
<p style="font-family:palatino; text-align: center;font-size: 15px">ECON320 Python Programming Laboratory</p>
<p style="font-family:palatino; text-align: center;font-size: 15px">Professor <em> Paloma Lopez de mesa Moyano</em></p>
<p style="font-family:palatino; text-align: center;font-size: 15px"><span style="color: #6666FF;"><em>paloma.moyano@emory.edu</em></span></p>

<p style="font-family:palatino; text-align: center;font-size: 15px">Department of Economics</p>
<p style="font-family:palatino; text-align: center; color: #012169;font-size: 15px">Emory University</p>

&nbsp;