# 9 Hypothesis Test

## 1 What Is a Hypothesis Test

The goal of analyzing a dataset (a sample) is to make inferences about **population parameters**. For example, we've learned how to obtain **point estimators** $\hat{\beta}$. These estimators are used to infer the population parameters of interest, $\beta$.

**Hypothesis testing** is a different way to make inferences: an analyst first makes an assumption (called the **null hypothesis**) about one (or several) population parameter(s), and then checks, by an argument similar to proof by contradiction, whether the data give enough evidence to reject it. If they do not, we fail to reject the null hypothesis, which is not the same as proving it true.

## 2 The Most Famous Example - t-Test

The t-test (aka significance test) is the most famous hypothesis test in econometrics. It is used to infer whether a population coefficient $\beta_j=0$. In other words, we want to test whether there is any (partial) linear relation between the dependent variable $y$ and a regressor $x_j$. In the standard setup, we have

$$H_0: \beta_j=0$$
$$H_1: \beta_j\neq 0$$

### Reasoning Step 1

If the Gauss-Markov assumptions hold, we have $\hat{\beta_j}\sim N(\beta_j, \sigma_{\hat{\beta_j}}^2)$, exactly when the errors are also normally distributed and approximately in large samples by the central limit theorem. Hence

$$z_{\hat{\beta_j}}\equiv \frac{\hat{\beta_j}-\beta_j}{\sigma_{\hat{\beta_j}}} \sim N(0,1)$$

However, there is one small problem. Recall the formula for $\sigma_{\hat{\beta_j}}$:

$$Var(\hat{\beta_j}) = \frac{\sigma^2}{nVar(x_j)}\cdot\frac{1}{1-R_j^2}$$
$$\sigma_{\hat{\beta_j}} = \sqrt{Var(\hat{\beta_j})}$$

where $R_j^2$ is the $R^2$ from regressing $x_j$ on the other regressors. To calculate $\sigma_{\hat{\beta_j}}$, we need to know $\sigma^2$, the population variance of $u$. But $\sigma^2$ is usually unknown.

If we replace $\sigma^2$ with its sample estimate $s^2 = \frac{\sum \hat{u}^2}{n-k-1}$, the resulting estimated standard deviation of $\hat{\beta_j}$ is called the **standard error** of $\hat{\beta_j}$. This estimated version of $z$ follows a different distribution, the t-distribution:

$$t_{\hat{\beta_j}}\equiv \frac{\hat{\beta_j}-\beta_j}{s.e.(\hat{\beta_j})} \sim t(n-k-1)$$
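
To make the standard error formula concrete, here is a minimal sketch on simulated data. The data-generating process, the sample size, and all variable names are assumptions chosen for illustration and are not part of the wage example. It computes the standard error of one coefficient in two ways, from the scalar formula with $R_j^2$ and from the diagonal of $s^2(X'X)^{-1}$, and the two routes should agree.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 2                       # n observations, k regressors (assumed values)
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)  # correlated with x1, so R_j^2 > 0
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u

X = np.column_stack([np.ones(n), x1, x2])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b_hat
s2 = resid @ resid / (n - k - 1)    # sample estimate of sigma^2

# Route 1: square root of the diagonal of s^2 (X'X)^{-1}
se_matrix = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

# Route 2: the scalar formula for the coefficient on x1
Z = np.column_stack([np.ones(n), x2])             # the other regressors
g, *_ = np.linalg.lstsq(Z, x1, rcond=None)
r2_1 = 1 - np.sum((x1 - Z @ g) ** 2) / np.sum((x1 - x1.mean()) ** 2)
var_x1 = np.mean((x1 - x1.mean()) ** 2)           # Var(x_j) with divisor n
se_formula = np.sqrt(s2 / (n * var_x1) / (1 - r2_1))

t_x1 = b_hat[1] / se_formula        # t-statistic for H0: beta_1 = 0
print(se_matrix[1], se_formula, t_x1)
```

Note that $Var(x_j)$ is computed here with divisor $n$, so that $nVar(x_j)$ equals the total sum of squares of $x_j$.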

### Reasoning Step 2

If $\beta_j = 0$, as the null hypothesis states, we can plug 0 into the t formula and obtain the t-statistic.
$$t_{\hat{\beta_j}}\equiv \frac{\hat{\beta_j}}{s.e.(\hat{\beta_j})} \sim t(n-k-1)$$
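
One way to see what this distributional claim means is a small simulation. The sketch below is illustrative only: the sample size, number of replications, and normal errors are assumptions. It generates many datasets in which the true slope is 0, computes the t-statistic each time, and counts how often $|t|$ exceeds the 97.5% quantile of $t(n-k-1)$. If the t-statistic really follows that distribution under the null, this should happen in roughly 5% of the samples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, reps = 50, 2000                  # assumed sample size and number of replications
dof = n - 1 - 1                     # n - k - 1 with k = 1 regressor
crit = stats.t.ppf(0.975, dof)      # 97.5% quantile of t(dof)

t_stats = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    y = 1.0 + 0.0 * x + rng.normal(size=n)   # true slope is 0, so H0 holds
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / dof
    t_stats[r] = b[1] / np.sqrt(s2 * XtX_inv[1, 1])

# Share of samples in which |t| lands beyond the quantile
reject_rate = np.mean(np.abs(t_stats) > crit)
print(reject_rate)
```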

### Reasoning Step 3

If $t_{\hat{\beta_j}}\sim t(n-k-1)$, then we can calculate $2\cdot P(T>|t_{\hat{\beta_j}}|)$ from the t-distribution, where $T$ denotes a random variable with that distribution. We use the absolute value and double the tail probability because (1) the t-distribution is symmetric, and (2) we want the probability of a value at least as extreme as the one observed, in either direction (very large or very small).

If this p-value is low (it is usually compared with three benchmarks: 0.1, 0.05, and 0.01), a t-statistic this extreme would be unlikely if the null hypothesis were true. We therefore reject the null hypothesis and conclude that $x_j$ is statistically significant, i.e. that $\beta_j$ differs from 0.
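
The decision rule can be sketched in a few lines. The t-statistic 2.73 and the degrees of freedom 526 - 3 - 1 are taken from the intercept row of the wage regression reported in this chapter. The use of `scipy.stats.t.sf` (the upper-tail probability) is a choice made for this sketch.

```python
from scipy import stats

t_stat = 2.73           # rounded t-statistic of the intercept in the wage example
dof = 526 - 3 - 1       # n - k - 1 for that regression

# Two-sided p-value: probability of a value at least as extreme as |t_stat|
p_value = 2 * stats.t.sf(abs(t_stat), dof)

# Compare against the three usual benchmarks
for alpha in (0.10, 0.05, 0.01):
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: {decision}")
print(round(p_value, 4))
```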

### Python Implementation 1 - smf package

Specification
$$ln(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tenure +u$$

In [2]:
import wooldridge as woo
import statsmodels.formula.api as smf
import numpy as np
df = woo.data("wage1")
res = smf.ols("np.log(wage) ~ educ + exper + tenure", data=df).fit()
print(res.tvalues)
print(res.pvalues.round(2))

Intercept     2.729230
educ         12.555246
exper         2.391437
tenure        7.133070
dtype: float64
Intercept    0.01
educ         0.00
exper        0.02
tenure       0.00
dtype: float64


In [10]:
res.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | np.log(wage) | R-squared: | 0.316 |
| Model: | OLS | Adj. R-squared: | 0.312 |
| Method: | Least Squares | F-statistic: | 80.39 |
| Date: | Tue, 29 Mar 2022 | Prob (F-statistic): | 9.13e-43 |
| Time: | 13:11:20 | Log-Likelihood: | -313.55 |
| No. Observations: | 526 | AIC: | 635.1 |
| Df Residuals: | 522 | BIC: | 652.2 |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.2844 | 0.104 | 2.729 | 0.007 | 0.080 | 0.489 |
| educ | 0.0920 | 0.007 | 12.555 | 0.000 | 0.078 | 0.106 |
| exper | 0.0041 | 0.002 | 2.391 | 0.017 | 0.001 | 0.008 |
| tenure | 0.0221 | 0.003 | 7.133 | 0.000 | 0.016 | 0.028 |

| | | | |
|---|---|---|---|
| Omnibus: | 11.534 | Durbin-Watson: | 1.769 |
| Prob(Omnibus): | 0.003 | Jarque-Bera (JB): | 20.941 |
| Skew: | 0.021 | Prob(JB): | 2.84e-05 |
| Kurtosis: | 3.977 | Cond. No. | 135.0 |


> [df.apply()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html) is a useful function that applies a function along an axis of a DataFrame: to each column by default, or to each row with `axis=1`.

### Python Implementation 2 - Calculate Using Matrix

In [2]:
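import patsy as pt
# build the outcome vector y and the design matrix X (with intercept) from the formula
y,X = pt.dmatrices("np.log(wage) ~ educ + exper + tenure", data=df)
b_h = np.linalg.inv(X.T@X)@X.T@y # OLS estimates (X'X)^{-1}X'y
y_h = X@b_h # fitted values
u_h = y-y_h # residuals
s2 = sum(u_h**2)/(X.shape[0]-X.shape[1]) # s^2 = SSR/(n-k-1)
vcov = s2*np.linalg.inv(X.T@X) # estimated variance-covariance matrix of b_h
bse = np.sqrt(np.diagonal(vcov)) # standard errors
t = b_h[:,0]/bse # b_h is a one-column matrix; extract the first column as a vector to avoid irregular results
t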

array([ 2.72923048, 12.55524558,  2.39143711,  7.13307039])

The two-sided p-value is $2\cdot P(T>|t|) = 2\cdot(1-P(T\le|t|))$.

In [3]:
# probability calculation in Python
from scipy import stats
dof = X.shape[0]-X.shape[1] # degrees of freedom = n-p
p_le_t = stats.t.cdf(np.abs(t),dof) # P(T <= |t|) for each coefficient
p = (1 - p_le_t)*2
p.round(2)

array([0.01, 0.  , 0.02, 0.  ])

In [5]:
# the same calculation for a single t-statistic (the intercept, rounded)
t_intercept = 2.73
(1-stats.t.cdf(t_intercept,len(df)-3-1))*2

0.006547382710030636
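
Several p-values above print as 0 after rounding. When a t-statistic is large, the expression `1 - cdf` also loses precision on its own, because the CDF is so close to 1 in floating point. As a hedged alternative (not what the cells above use), SciPy's survival function `stats.t.sf` computes the upper tail directly. The sketch below compares the two for the `educ` t-statistic reported earlier.

```python
from scipy import stats

t_educ = 12.555246      # t-statistic for educ reported above
dof = 526 - 3 - 1       # n - k - 1

p_via_cdf = 2 * (1 - stats.t.cdf(t_educ, dof))  # limited by rounding of the CDF near 1
p_via_sf = 2 * stats.t.sf(t_educ, dof)          # upper tail computed directly

print(p_via_cdf, p_via_sf)
```

Both versions lead to the same decision here; the survival function simply reports a tiny but nonzero p-value.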

## 3 Apply the Idea to Test Other Estimators

Following the logic of the t-test - (1) find the distribution (2) plug in the hypothesis (3) find a probability - we can carry out a lot more hypothesis tests.

### 1 F-test

$$H_0: \beta_1=\beta_2=\dots=\beta_k =0$$

If the Gauss-Markov assumptions hold, then after plugging in the null hypothesis we have
$$F \equiv \frac{(SST-SS_{error})/k}{SS_{error}/(n-k-1)} \sim F(k,n-k-1)$$

$P(F>F_{stat})$ is the p-value for F. Because the F-distribution is not symmetric and takes only non-negative values, we use a one-tailed (upper-tail) test rather than a two-tailed one.
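
As a minimal sketch of this calculation, the block below builds the F statistic from $SST$ and $SS_{error}$ on simulated data. The data, coefficients, and variable names are assumptions made for illustration. It also rewrites the same statistic in terms of $R^2$, which is an algebraically equivalent form.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k = 120, 3                                # assumed sample size and regressor count
X_raw = rng.normal(size=(n, k))
y = 0.5 + X_raw @ np.array([0.4, 0.0, -0.3]) + rng.normal(size=n)

X = np.column_stack([np.ones(n), X_raw])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
ss_error = np.sum((y - X @ b) ** 2)          # residual sum of squares
sst = np.sum((y - y.mean()) ** 2)            # total sum of squares

F = ((sst - ss_error) / k) / (ss_error / (n - k - 1))
p_value = stats.f.sf(F, k, n - k - 1)        # upper tail only

# The same statistic written in terms of R^2
r2 = 1 - ss_error / sst
F_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))
print(F, p_value)
```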

In [4]:
res.fvalue

80.39091867158423

In [5]:
res.f_pvalue.round(2)

0.0

In [6]:
# Verify
sst = res.centered_tss
ssr = res.ssr
F_num = (sst-ssr)/res.df_model
F_denom = ssr/res.df_resid
F = F_num/F_denom
F

80.39091867158423

In [7]:
(1-stats.f.cdf(F,res.df_model,res.df_resid)).round(2)

0.0

### 2 More General F-Test

The F-test can be generalized to test a subset of the $\beta$s. As an example, consider the following specification

$$ln(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tenure +u$$

Instead of assuming that all slope parameters are equal to 0, we impose only **q** restrictions (here $q=2$):
$$H_0: \beta_2 = \beta_3 =0$$

With Gauss-Markov Assumptions and the null hypothesis, we have

$$F \equiv \frac{(SS_{error}^r-SS_{error})/q}{SS_{error}/(n-k-1)} \sim F(q,n-k-1)$$

where $SS_{error}^r$ is the sum of squared errors (residuals) from the restricted model
$$ln(wage) = \beta_0 + \beta_1 educ+u$$
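
The restricted-versus-unrestricted comparison can be sketched without any regression package. The example below uses simulated data, with an assumed data-generating process and a hypothetical helper `ssr`. It fits both models by least squares and plugs the two sums of squared residuals into the formula.

```python
import numpy as np
from scipy import stats

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return resid @ resid

rng = np.random.default_rng(2)
n, k, q = 150, 3, 2                          # q = number of restrictions tested
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.8 * x1 + 0.3 * x2 + 0.0 * x3 + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x1, x2, x3])   # unrestricted model
X_restr = np.column_stack([np.ones(n), x1])          # imposes beta_2 = beta_3 = 0

ssr_u = ssr(X_full, y)
ssr_r = ssr(X_restr, y)

F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k - 1))
p_value = stats.f.sf(F, q, n - k - 1)
print(F, p_value)
```

Dropping regressors can never lower the sum of squared residuals, so the numerator is non-negative by construction.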

In [6]:
# smf implementation
hypotheses = ["exper=0","tenure=0"]
ftest = res.f_test(hypotheses)

In [9]:
ftest.statistic

array([[49.68515733]])

In [9]:
fpval = ftest.pvalue
fpval.round(2)

0.0

In [10]:
# verify
ssr = res.ssr
res2 = smf.ols("np.log(wage)~educ",data=df).fit()
ssrr = res2.ssr

In [11]:
fstat = (ssrr-ssr)/2/(ssr/res.df_resid)
fstat

49.685157328648856

In [12]:
fpval = (1-stats.f.cdf(fstat,2,res.df_resid))
fpval.round(2)

0.0

### 3 Even More General F-test

In the null hypothesis, we do not need to equate $\beta_j$ to 0. We can test any set of linear restrictions on the coefficients. For example, in the model

$$ln(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tenure +u$$

We can assume

$$H_0: \beta_1=0.3,\text{ and }\beta_2=\beta_3$$
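
Linear restrictions like these can be written in matrix form as $R\beta = r$, with one row of $R$ per restriction. The sketch below is illustrative: the simulated data and the names `R`, `r`, and `beta_true` are assumptions, and the matrix expression for the F statistic is the standard form for linear restrictions rather than a formula stated in this section. It encodes the hypothesis above and computes the F statistic directly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 0.3, 0.5, 0.5])   # chosen so that H0 holds in the simulation
y = X @ beta_true + rng.normal(size=n)

# H0: beta_1 = 0.3 and beta_2 - beta_3 = 0, written as R beta = r
R = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
r = np.array([0.3, 0.0])
q = R.shape[0]                               # number of restrictions

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b
s2 = resid @ resid / (n - k - 1)

# F = (Rb - r)' [s^2 R (X'X)^{-1} R']^{-1} (Rb - r) / q
diff = R @ b - r
F = diff @ np.linalg.inv(R @ (s2 * XtX_inv) @ R.T) @ diff / q
p_value = stats.f.sf(F, q, n - k - 1)
print(F, p_value)
```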

In [13]:
hypotheses = ["exper=0","exper+tenure=0"] # restrictions may be any linear combinations (this list is not used below)

In [14]:
# smf
hypothesis=["educ=0.3","exper=tenure"]
ftest = res.f_test(hypothesis)
fstat = ftest.statistic[0,0]
fpval = ftest.pvalue
print("F statistic = {:.2f}, and its p-value = {:.2f}".format(fstat,fpval))

F statistic = 404.07, and its p-value = 0.00


Since the p-value is below 0.01, we reject the null hypothesis.

In [13]:
# another example: H0: 4*beta_tenure = beta_educ and beta_exper = 0
f = res.f_test(["4*tenure = educ","exper=0"])

In [14]:
f.statistic

array([[3.76853997]])

In [15]:
f.pvalue

array(0.02371631)

In [16]:
1-stats.f.cdf(f.statistic,2,len(df)-3-1)

array([[0.02371631]])

## Extra - Wald Test

The Wald test, due to Wald (1943), is the preeminent test for joint restrictions. The null and alternative hypotheses of the Wald test are written as

$$H_0: R\beta = q$$
$$H_1: R\beta \neq q$$

where, in the notation used here, there are $h$ restrictions, $R$ is an $h\times p$ matrix of full rank $h$, $\beta$ is the $p\times 1$ parameter vector, and $q$ is an $h\times 1$ vector of constants.

The reasoning starts from $\hat{\beta}\sim \mathcal{N}(\beta, \sigma^2(X'X)^{-1})$. Hence, under $H_0$,

$$R\hat{\beta}-q \sim \mathcal{N}(0, \sigma^2R(X'X)^{-1}R')$$

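Because a multivariate normal vector is harder to test directly, we collapse $R\hat{\beta}-q$ into a single $\chi^2$ statistic by normalizing it with its covariance matrix:

$$W = (R\hat{\beta}-q)'[\sigma^2R(X'X)^{-1}R']^{-1}(R\hat{\beta}-q)$$

> This step is similar to $(\frac{x-\mu}{\sigma})^2$

Under $H_0$, the Wald statistic satisfies $W \sim \chi^2(h)$. In practice $\sigma^2$ is replaced by its estimate $s^2$, so the $\chi^2(h)$ distribution holds approximately in large samples.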
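
Here is a minimal sketch of the Wald statistic on simulated data. The data-generating process and all names are assumptions, and $\sigma^2$ is replaced by the estimate $s^2$, which makes the $\chi^2(h)$ distribution an approximation. It also divides the statistic by $h$ to obtain the F statistic for the same restrictions, so the two p-values can be compared.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p = 300, 4                                # p parameters including the intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([0.5, 1.0, 3.0, 1.0])   # satisfies both restrictions below
y = X @ beta_true + rng.normal(size=n)

R = np.array([[0.0, 1.0, 0.0, 0.0],          # beta_1 = 1
              [0.0, 0.0, 1.0, -1.0]])        # beta_2 - beta_3 = 2
q = np.array([1.0, 2.0])
h = R.shape[0]                               # number of restrictions

XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ y
resid = y - X @ b_hat
s2 = resid @ resid / (n - p)                 # estimate used in place of sigma^2

# Quadratic form d' [s^2 R (X'X)^{-1} R']^{-1} d, solved without an explicit inverse
d = R @ b_hat - q
W = d @ np.linalg.solve(s2 * R @ XtX_inv @ R.T, d)
p_chi2 = stats.chi2.sf(W, df=h)

# Dividing W by h gives the F statistic for the same restrictions
p_F = stats.f.sf(W / h, h, n - p)
print(W, p_chi2, p_F)
```

With a few hundred observations the two p-values are typically close, since $F(h, n-p)$ approaches $\chi^2(h)/h$ as $n$ grows.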

### Implement Using smf

Assume we want to test if $\beta_1=1$ and $\beta_2-\beta_3=2$. We have h=2, and we can rewrite the restrictions as

$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1  \end{bmatrix} \beta - \begin{bmatrix} 1 \\ 2  \end{bmatrix}=0$$

In [15]:
wtest = res.wald_test(([[0,1,0,0],[0,0,1,-1]], [1,2]),use_f=False)
print(wtest.statistic)
print(wtest.pvalue)

[[227564.92149323]]
0.0


In [16]:
# Extra:
# squeeze() removes axis with length 1. 
# In this example we have only one row, so there is no need to use a matrix
wtest.statistic.squeeze()

array(227564.92149323)

In [17]:
wtest

<class 'statsmodels.stats.contrast.ContrastResults'>
<Wald test (chi2): statistic=[[227564.92149323]], p-value=0.0, df_denom=2>

### Implement Using Matrix

In [18]:
R = np.array([[0,1,0,0],[0,0,1,-1]])
q = np.array([1,2])
b_h = res.params
w = (R@b_h-q).T@np.linalg.inv(R@vcov@R.T)@(R@b_h-q)

In [19]:
w

array(227564.92149323)

In [20]:
import scipy.stats as stats

In [21]:
#pvalue
1-stats.chi2.cdf(w,df=2)

0.0