### F-statistic ###

To perform a **joint** hypothesis test on the parameters $\beta_j$ (estimated by $b_j$), define the null hypothesis:

$$H_0 : R \beta = r$$

$$H_0 : R \beta - r= 0$$

where $R_{g \times k}$ is a matrix encoding $g$ linear restrictions on the $k$ parameters, and $r_{g \times 1}$ is the vector of values these restrictions must equal (usually zero).

For example, if $k = 3$, $g=2$, $r=0$ and $R$ is:

$$R = \begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
\end{bmatrix}$$

The null hypothesis is

$$
H_0 : \begin{bmatrix}
\beta_1 \\
\beta_2 \\
\end{bmatrix}
=
\begin{bmatrix}
0 \\
0 \\
\end{bmatrix}
$$

Remember: $R$ and $r$ need not contain only zeros and ones; any set of linear restrictions can be tested. For example, $\beta_1 = \beta_2 + 1$, or $4\beta_5 = \beta_1 + 3\beta_3 - 2$.
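As a sketch, assume $k = 5$ with parameters $\beta_1, \dots, \beta_5$, and test the two restrictions above jointly ($g = 2$). Moving every parameter to the left-hand side and every constant to the right turns them into $\beta_1 - \beta_2 = 1$ and $-\beta_1 - 3\beta_3 + 4\beta_5 = -2$. Each restriction then supplies one row of $R$ and one entry of $r$:

$$
R = \begin{bmatrix}
1 & -1 & 0 & 0 & 0 \\
-1 & 0 & -3 & 0 & 4
\end{bmatrix},
\qquad
r = \begin{bmatrix}
1 \\
-2
\end{bmatrix}
$$

Multiplying out $R\beta = r$ reproduces exactly the two restrictions.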

To test $R\beta - r = 0$, we need a pivotal quantity: a function of the data and the parameters whose distribution does not depend on any unknown parameter. Recall that under assumptions 1-7, the distribution of $b$ is

$$b \sim N_k(\beta, \sigma^2(X^TX)^{-1})$$

$$Rb \sim N_g(R\beta, \sigma^2R(X^TX)^{-1}R^T)$$

$$Rb - r \sim N_g(R\beta - r, \sigma^2R(X^TX)^{-1}R^T)$$
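The two linear-transformation steps above can be checked by simulation. The NumPy sketch below is illustrative only: the design matrix `X`, the values of `beta` and `sigma`, and the number of replications `reps` are arbitrary assumptions, and `R` is the $2\times 3$ matrix from the example. Holding $X$ fixed and redrawing normal errors, the sample mean and covariance of $Rb$ across replications should be close to $R\beta$ and $\sigma^2R(X^TX)^{-1}R^T$.

```python
import numpy as np

# Monte Carlo sketch with assumed values: X is held fixed, normal errors are
# redrawn, and b is re-estimated in every replication.
rng = np.random.default_rng(0)
n, k, sigma, reps = 200, 3, 2.0, 10_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 0.5, -0.3])
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

XtX_inv = np.linalg.inv(X.T @ X)
Y = (X @ beta)[:, None] + sigma * rng.normal(size=(n, reps))  # one sample per column
B = XtX_inv @ X.T @ Y          # k x reps: one OLS estimate b per column
Rb_draws = (R @ B).T           # reps x g: one draw of Rb per row

theoretical_mean = R @ beta
theoretical_cov = sigma**2 * R @ XtX_inv @ R.T
empirical_mean = Rb_draws.mean(axis=0)
empirical_cov = np.cov(Rb_draws, rowvar=False)
print(theoretical_mean, empirical_mean)
print(theoretical_cov)
print(empirical_cov)
```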

Under $H_0$, $R\beta - r = 0$, so

$$Rb - r \sim N_g(0, \sigma^2R(X^TX)^{-1}R^T)$$

$$\frac{Rb - r}{\sigma} \sim N_g(0, R(X^TX)^{-1}R^T)$$

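Denote $C_{g\times g} = R(X^TX)^{-1}R^T$. This matrix is symmetric, and it is positive definite when $R$ has full row rank $g$ (no redundant restrictions) and $X$ has full column rank. It therefore has a symmetric square root $C^{\frac{1}{2}}$, with $C = C^{\frac{1}{2}}C^{\frac{1}{2}}$, whose inverse is $C^{-\frac{1}{2}}$.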

$$\frac{Rb - r}{\sigma} \sim N_g(0, C^{\frac{1}{2}}C^{\frac{1}{2}})$$

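Premultiplying by $C^{-\frac{1}{2}}$ standardizes the vector, since its covariance becomes $C^{-\frac{1}{2}}CC^{-\frac{1}{2}} = I_g$:

$$C^{-\frac{1}{2}}\frac{Rb - r}{\sigma} \sim N_g(0, I_g)$$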
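One way to build $C^{\frac{1}{2}}$, sketched here with NumPy under the assumption that $C$ is symmetric positive definite, is the eigendecomposition $C = Q\,\text{diag}(w)\,Q^T$: take $C^{\frac{1}{2}} = Q\,\text{diag}(\sqrt{w})\,Q^T$ and $C^{-\frac{1}{2}} = Q\,\text{diag}(1/\sqrt{w})\,Q^T$. The helper `sym_sqrt_and_inv_sqrt` and the simulated `X` are hypothetical. The final check confirms that $C^{-\frac{1}{2}}CC^{-\frac{1}{2}} = I_g$, which is what makes the premultiplied vector standard normal.

```python
import numpy as np

def sym_sqrt_and_inv_sqrt(C):
    """Return (C^{1/2}, C^{-1/2}) for a symmetric positive definite C,
    using the eigendecomposition C = Q diag(w) Q^T."""
    w, Q = np.linalg.eigh(C)               # eigenvalues w, orthonormal eigenvectors Q
    if np.any(w <= 0):
        raise ValueError("C must be positive definite")
    C_half = Q @ np.diag(np.sqrt(w)) @ Q.T
    C_neg_half = Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T
    return C_half, C_neg_half

# Illustrative design matrix (simulated) and the 2 x 3 R from the example
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
C = R @ np.linalg.inv(X.T @ X) @ R.T

C_half, C_neg_half = sym_sqrt_and_inv_sqrt(C)
print(np.allclose(C_half @ C_half, C))                       # True: C = C^{1/2} C^{1/2}
print(np.allclose(C_neg_half @ C @ C_neg_half, np.eye(2)))   # True: covariance becomes I_g
```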

We could test using the $N_g(0,I_g)$ distribution, but this is not yet a pivotal quantity, because it depends on the unknown parameter $\sigma$. With a single restriction ($g = 1$), replacing $\sigma$ by its estimate $s$ would give a $T$ random variable. Here, however, the standardized quantity is a $g$-dimensional vector, and a joint test needs a single scalar statistic. We obtain one by summing the squares of its $g$ independent standard normal components, which gives a Chi squared random variable:

$$C^{-\frac{1}{2}}\frac{Rb - r}{\sigma} \sim N_g(0, I_g) \quad \text{, then $\Bigg[C^{-\frac{1}{2}}\frac{Rb - r}{\sigma}\Bigg]^TC^{-\frac{1}{2}}\frac{Rb - r}{\sigma} \sim \chi^2(g)$}$$


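Since $C^{-\frac{1}{2}}$ is symmetric, $\big(C^{-\frac{1}{2}}\big)^TC^{-\frac{1}{2}} = C^{-1}$, so the quadratic form simplifies to

$$\frac{(Rb - r)^TC^{-1}(Rb - r)}{\sigma^2} \sim \chi^2(g)$$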
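A quick simulation, under assumed values, supports the $\chi^2(g)$ result. The sketch draws $Rb - r$ directly from $N_g(0, \sigma^2C)$ for an arbitrary symmetric positive definite $C$ (here $g = 3$, $\sigma = 1.5$) rather than from a fitted regression. The quadratic form should then have mean $g$, variance $2g$, and exceed the $\chi^2(3)$ 95% quantile (about $7.815$) roughly 5% of the time. NumPy is assumed.

```python
import numpy as np

# Simulation sketch with assumed values: under H0, Rb - r ~ N_g(0, sigma^2 C).
rng = np.random.default_rng(2)
g, sigma, reps = 3, 1.5, 200_000
A = rng.normal(size=(g, g))
C = A @ A.T + g * np.eye(g)            # an arbitrary symmetric positive definite C

u = rng.multivariate_normal(np.zeros(g), sigma**2 * C, size=reps)  # each row: one Rb - r
C_inv = np.linalg.inv(C)
q = np.einsum("ij,jk,ik->i", u, C_inv, u) / sigma**2  # (Rb - r)^T C^{-1} (Rb - r) / sigma^2

chi2_3_q95 = 7.8147                    # 95% quantile of chi^2(3)
print(q.mean(), q.var())               # close to g = 3 and 2g = 6
print(np.mean(q > chi2_3_q95))         # close to 0.05
```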

If $\sigma$ were a known parameter, we could use this $\chi^2(g)$ statistic directly. Since it is not, we eliminate $\sigma^2$ by taking the ratio of two independent Chi squared random variables, each divided by its degrees of freedom; the result is an $F$ random variable.

For independent Chi squared random variables $V_1 \sim \chi^2(k_1)$ and $V_2 \sim \chi^2(k_2)$:

$$ \frac{V_1/k_1}{V_2/k_2} \sim F(k_1,k_2) \quad \text{, an $F$ distribution with $k_1, k_2$ degrees of freedom} $$
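The definition can be illustrated with NumPy (an assumed dependency; $k_1 = 4$ and $k_2 = 20$ are arbitrary choices): form the ratio from independent Chi squared draws and compare it with NumPy's built-in $F$ sampler `Generator.f`. Both should have mean close to $k_2/(k_2 - 2)$ and similar upper quantiles.

```python
import numpy as np

# Simulation sketch: ratio of independent chi-squared draws, each scaled by
# its degrees of freedom, versus NumPy's direct F(k1, k2) sampler.
rng = np.random.default_rng(3)
k1, k2, reps = 4, 20, 200_000
V1 = rng.chisquare(k1, size=reps)
V2 = rng.chisquare(k2, size=reps)      # drawn independently of V1
F_ratio = (V1 / k1) / (V2 / k2)

F_direct = rng.f(k1, k2, size=reps)    # NumPy's own F sampler, for comparison
print(F_ratio.mean(), F_direct.mean(), k2 / (k2 - 2))
print(np.quantile(F_ratio, 0.95), np.quantile(F_direct, 0.95))
```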

For the denominator we use $\frac{e^Te}{\sigma^2}$, which, from previous results, follows a $\chi^2(n-k)$ distribution and is independent of $b$ (hence of the numerator). Then


$$\frac{V_1/k_1}{V_2/k_2} = \frac{(Rb - r)^TC^{-1}(Rb - r)}{\sigma^2}\frac{1}{g}\Bigg[\frac{e^Te}{\sigma^2}\frac{1}{n-k}\Bigg]^{-1} = \frac{(Rb - r)^TC^{-1}(Rb - r)/g}{e^Te/(n-k)} \sim F(g,n-k)$$

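where $C = R(X^TX)^{-1}R^T$. The unknown $\sigma^2$ cancels, so this statistic is a pivotal quantity computable from the data alone.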
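A direct computation of this statistic might look like the following NumPy sketch. The function name `wald_f_stat` and the simulated data are hypothetical; solving $Cx = Rb - r$ with `np.linalg.solve` avoids forming $C^{-1}$ explicitly. As a sanity check, for a single restriction ($g = 1$) the statistic equals the square of the usual $t$ statistic.

```python
import numpy as np

def wald_f_stat(X, y, R, r):
    """F = [(Rb - r)^T C^{-1} (Rb - r) / g] / [e^T e / (n - k)],
    with C = R (X^T X)^{-1} R^T. Returns (F, g, n - k)."""
    n, k = X.shape
    g = R.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y              # unrestricted OLS estimator
    e = y - X @ b                      # unrestricted residuals
    d = R @ b - r
    C = R @ XtX_inv @ R.T
    numerator = d @ np.linalg.solve(C, d) / g
    denominator = e @ e / (n - k)
    return numerator / denominator, g, n - k

# Simulated example (assumed values); H0: beta_1 = beta_2 = 0 is false here
rng = np.random.default_rng(4)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 0.8, 0.0]) + rng.normal(size=n)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
r = np.zeros(2)
F, df1, df2 = wald_f_stat(X, y, R, r)
print(F, df1, df2)
```

To turn $F$ into a decision, compare it with the upper critical value of $F(g, n-k)$; if SciPy is available, `scipy.stats.f.sf(F, df1, df2)` gives the $p$-value.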



### Restricted regression ###

The restricted least squares estimator $b_R$ minimizes the sum of squared residuals subject to a constraint of the form $Rb_R = r$:

\begin{equation*}
\begin{aligned}
& \underset{b_R}{\text{minimize}}
& & S(b_R) = (y-Xb_R)^T(y-Xb_R) \\
& \text{subject to}
& & Rb_R - r = g(b_R) = 0.
\end{aligned}
\end{equation*}

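The Karush-Kuhn-Tucker conditions (with only equality constraints, these reduce to the usual Lagrange conditions) are:

$$\nabla S(b_R) = \nabla g(b_R) \lambda $$

$$g(b_R) =0  $$

Since $\nabla S(b_R) = -2X^T(y-Xb_R)$ and $\nabla g(b_R) = R^T$, the first condition reads $-2X^T(y-Xb_R) = R^T\lambda$. Absorbing the constant factor $-\frac{1}{2}$ into the multiplier $\lambda$ (which only rescales it):

$$ X^T(y-Xb_R) = R^T \lambda $$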

$$ (X^TX)b_R = X^Ty - R^T\lambda$$

$$ b_R = (X^TX)^{-1}X^Ty - (X^TX)^{-1}R^T\lambda$$

where $b = (X^TX)^{-1}X^Ty$ is the unrestricted OLS estimator, so

$$ b_R = b - (X^TX)^{-1}R^T\lambda$$

Premultiplying by $R$ and subtracting $r$:

$$ Rb_R - r = Rb - R(X^TX)^{-1}R^T\lambda - r $$

Since the constraint imposes $Rb_R - r = 0$, the left-hand side vanishes and

$$R(X^TX)^{-1}R^T\lambda = Rb - r $$

$$\lambda = (R(X^TX)^{-1}R^T)^{-1}(Rb - r) $$

Substituting $\lambda$ back into the expression for $b_R$:

$$ b_R = b - (X^TX)^{-1}R^T(R(X^TX)^{-1}R^T)^{-1}(Rb - r)$$
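The closed form for $b_R$ translates almost line by line into code. In this NumPy sketch, the helper `restricted_ols`, the simulated data, and the true coefficient values are assumptions for illustration; the restrictions are the two examples from earlier ($\beta_1 = \beta_2 + 1$ and $4\beta_5 = \beta_1 + 3\beta_3 - 2$, with $k = 5$). The returned $b_R$ should satisfy $Rb_R = r$ up to rounding, the multiplier should satisfy $X^T(y - Xb_R) = R^T\lambda$, and the restricted SSR should not be smaller than the unrestricted one.

```python
import numpy as np

def restricted_ols(X, y, R, r):
    """Restricted least squares:
    b_R = b - (X^T X)^{-1} R^T C^{-1} (Rb - r),  with C = R (X^T X)^{-1} R^T.
    Returns the unrestricted b, the restricted b_R and the multiplier lambda."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                   # unrestricted OLS estimator
    C = R @ XtX_inv @ R.T
    lam = np.linalg.solve(C, R @ b - r)     # lambda = C^{-1} (Rb - r)
    b_R = b - XtX_inv @ R.T @ lam
    return b, b_R, lam

# Simulated data (assumed values), k = 5 parameters beta_1..beta_5
rng = np.random.default_rng(5)
n = 80
X = rng.normal(size=(n, 5))
y = X @ np.array([2.0, 1.0, 0.5, -1.0, 0.375]) + rng.normal(size=n)

# beta_1 = beta_2 + 1  and  4 beta_5 = beta_1 + 3 beta_3 - 2
R = np.array([[1.0, -1.0, 0.0, 0.0, 0.0],
              [-1.0, 0.0, -3.0, 0.0, 4.0]])
r = np.array([1.0, -2.0])

b, b_R, lam = restricted_ols(X, y, R, r)
ssr = np.sum((y - X @ b) ** 2)
ssr_R = np.sum((y - X @ b_R) ** 2)
print(R @ b_R)        # reproduces r up to rounding error
print(ssr, ssr_R)     # the restricted SSR is never smaller
```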

Premultiplying by $X$, and then using the definitions of the unrestricted and restricted residuals, $y = Xb + e$ and $y = Xb_R + e_R$:

$$ Xb_R = Xb - X(X^TX)^{-1}R^T(R(X^TX)^{-1}R^T)^{-1}(Rb - r)$$

$$ y - e_R = y - e - X(X^TX)^{-1}R^T(R(X^TX)^{-1}R^T)^{-1}(Rb - r)$$

$$ e_R = e + X(X^TX)^{-1}R^T(R(X^TX)^{-1}R^T)^{-1}(Rb - r)$$

$$ e_R^T = e^T + (Rb - r)^T(R(X^TX)^{-1}R^T)^{-1}R(X^TX)^{-1}X^T $$

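Multiplying $e_R^T$ by $e_R$, the two cross terms vanish because $X^Te = 0$ (equivalently, $e^TX = 0$). We again write $C = R(X^TX)^{-1}R^T$ for the simplification that follows.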

$$e_R^Te_R = e^Te + (Rb - r)^T(R(X^TX)^{-1}R^T)^{-1}R(X^TX)^{-1}X^TX(X^TX)^{-1}R^T(R(X^TX)^{-1}R^T)^{-1}(Rb - r)   $$

(Since $C^{-1}$ is positive definite, the following also proves that restricting a model never decreases its SSR: $e_R^Te_R \ge e^Te$, with equality only when $Rb = r$.)

$$e_R^Te_R = e^Te + (Rb - r)^TC^{-1}CC^{-1}(Rb - r)   $$

$$e_R^Te_R - e^Te = (Rb - r)^TC^{-1}(Rb - r)   $$
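The identity $e_R^Te_R - e^Te = (Rb - r)^TC^{-1}(Rb - r)$ can be verified numerically. The NumPy sketch below uses simulated data and two arbitrary restrictions ($\beta_2 = \beta_3$ and $\beta_4 = 0.5$, indexing from 1 as in the text); both sides should agree up to floating-point error and be nonnegative.

```python
import numpy as np

# Simulated data (assumed values); indexing of beta from 1, as in the text
rng = np.random.default_rng(6)
n, k = 60, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.3, -0.2, 0.7]) + rng.normal(size=n)
R = np.array([[0.0, 1.0, -1.0, 0.0],     # beta_2 - beta_3 = 0
              [0.0, 0.0, 0.0, 1.0]])     # beta_4 = 0.5
r = np.array([0.0, 0.5])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                    # unrestricted estimator
C = R @ XtX_inv @ R.T
d = R @ b - r
b_R = b - XtX_inv @ R.T @ np.linalg.solve(C, d)   # restricted estimator

e, e_R = y - X @ b, y - X @ b_R
lhs = e_R @ e_R - e @ e                  # SSR_R - SSR
rhs = d @ np.linalg.solve(C, d)          # (Rb - r)^T C^{-1} (Rb - r)
print(lhs, rhs)                          # equal up to floating-point error
```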

Finally, substituting this result into the $F$ statistic:

$$F = \frac{(Rb - r)^TC^{-1}(Rb - r)/g}{e^Te/(n-k)} = \frac{(e_R^Te_R - e^Te)/g}{e^Te/(n-k)}    \sim F(g,n-k)$$

In terms of the sums of squared residuals of the restricted ($SSR_R$) and unrestricted ($SSR$) models:

$$F = \frac{(SSR_R - SSR)/g}{SSR/(n-k)}   $$

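So the **joint** hypothesis can be tested using only the $SSR$ of the two specifications, restricted and unrestricted. $H_0$ is rejected at significance level $\alpha$ when $F$ exceeds the $(1-\alpha)$ quantile of the $F(g, n-k)$ distribution.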
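When every restriction sets a coefficient to zero, the restricted model is just the regression with those regressors dropped, so the $SSR$ form needs nothing beyond two ordinary fits. The NumPy sketch below (simulated data, hypothetical helper `ssr`, $H_0: \beta_2 = \beta_3 = 0$ with indexing from 1) computes $F$ both from the two $SSR$s and from $(Rb - r)^TC^{-1}(Rb - r)$; the two values should coincide. For general linear restrictions, $SSR_R$ would instead come from the restricted estimator $b_R$ derived above.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

# Simulated data (assumed values); H0: beta_2 = beta_3 = 0, indexing from 1
rng = np.random.default_rng(7)
n, k, g = 120, 4, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.0, 0.0, 0.6]) + rng.normal(size=n)

# Restricted model: imposing beta_2 = beta_3 = 0 drops the 2nd and 3rd columns
X_R = np.delete(X, [1, 2], axis=1)
SSR, SSR_R = ssr(X, y), ssr(X_R, y)
F_ssr = ((SSR_R - SSR) / g) / (SSR / (n - k))

# Same hypothesis in the (Rb - r)^T C^{-1} (Rb - r) form, with r = 0
R = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
d = R @ b
F_wald = (d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / g) / (SSR / (n - k))
print(F_ssr, F_wald)   # the two forms agree up to rounding error
```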