# 11. $t$ and $F$ Distributions
<hr>

A statistic is any real-valued function of the observable random variables in a sample. For example, the sample mean $\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$ is a statistic because it is a function of only the observed values $X_i$. One of the goals of statistical theory is to estimate the unknown population parameters by statistics, e.g., $\mu$ can be estimated by $\bar{X}$. The probability distribution of a statistic (such as $\bar{X}$) is called the **sampling distribution** of that statistic. 

To find the sampling distributions of statistics, we use the three methods discussed in notebook 10.

## 11.1 Sampling Distributions Related to the Normal
<hr>

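We know that if $X_1, X_2, \cdots, X_n$ are independent with $X_i \sim N(\mu_i, \sigma^2_i)$, then any linear combination $U = a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$ is also normal, $U \sim N(\mu_U, \sigma^2_U)$, with $\mu_U = \sum_{i=1}^n a_i \mu_i$ and $\sigma^2_U = \sum_{i=1}^n a_i^2 \sigma_i^2$.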

<br>

**Theorem:** Let $Y_1, Y_2, \cdots, Y_n$ be a random sample of size $n$ from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then,

$$\bar{Y} = \frac{1}{n} \sum_{i=1}^n Y_i = \frac{1}{n}Y_1 + \frac{1}{n}Y_2 + \cdots + \frac{1}{n}Y_n = a_1 Y_1 + a_2 Y_2 + \cdots + a_n Y_n$$

where $a_i = \frac{1}{n}$ for $i = 1, 2, \cdots, n$.

Thus $\bar{Y}$ is a linear combination of $Y_1, Y_2, \cdots, Y_n$.

$$E(\bar{Y}) = E\left[ \frac{1}{n}Y_1 + \cdots + \frac{1}{n}Y_n \right] = \frac{1}{n} (\mu) + \cdots + \frac{1}{n} (\mu) = \frac{n \mu}{n} = \mu$$

$$V(\bar{Y}) = V\left[ \frac{1}{n}Y_1 + \cdots + \frac{1}{n}Y_n \right] = \frac{1}{n^2} (\sigma^2) + \cdots + \frac{1}{n^2}(\sigma^2) = \frac{\sigma^2}{n}$$

The variance calculation uses the independence of the $Y_i$, so no covariance terms appear. Therefore, if $Y_1, Y_2, \cdots, Y_n$ are drawn from a normal distribution with mean $\mu$ and variance $\sigma^2$, then the sample mean $\bar{Y}$ is also normally distributed:

$$\bar{Y} = \frac{1}{n} \sum_{i=1}^n Y_i \sim N \left(\mu, \frac{\sigma^2}{n} \right)$$

Standardizing $\bar{Y}$ gives a standard normal random variable:

$$Z = \frac{\bar{Y} - \mu_{\bar{Y}}}{\sigma_{\bar{Y}}} = \frac{\bar{Y} - \mu}{\left( \frac{\sigma}{\sqrt{n}} \right)} = \sqrt{n} \left( \frac{\bar{Y} - \mu}{\sigma} \right) \sim N(0, 1)$$
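The result above can be checked numerically. The following is a minimal simulation sketch using only the Python standard library; the values of `mu`, `sigma`, `n`, the number of repetitions, and the helper name `simulate_sample_means` are illustrative assumptions, not part of the derivation.

```python
import math
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma^2); return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


# Illustrative parameter values (assumptions, not taken from the text).
mu, sigma, n = 10.0, 2.0, 16
means = simulate_sample_means(mu, sigma, n, reps=20_000)

mean_of_means = statistics.fmean(means)
var_of_means = statistics.variance(means)
print("mean of Y-bar:    ", mean_of_means, "theory:", mu)
print("variance of Y-bar:", var_of_means, "theory:", sigma**2 / n)

# Standardize each sample mean: Z = sqrt(n) * (Y-bar - mu) / sigma.
z = [math.sqrt(n) * (m - mu) / sigma for m in means]
var_of_z = statistics.variance(z)
print("variance of Z:    ", var_of_z, "theory: 1")
```

The simulated mean and variance of $\bar{Y}$ should land close to $\mu$ and $\sigma^2/n$, and the standardized values should have variance close to 1. The agreement is only approximate because the number of repetitions is finite.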

<br>

**Theorem:** Let $Y_1, Y_2, \cdots, Y_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then,

$$\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2} \sum_{i=1}^n (Y_i - \bar{Y})^2 \sim \chi^2_{(n-1)}$$

Here $S^2 = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar{Y})^2$ is the sample variance. Also, $\bar{Y}$ and $S^2$ are independent random variables.

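*Proof sketch.* Consider the case $n=2$. Then $\bar{Y} = \frac{Y_1 + Y_2}{2}$, so $S^2 = (Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 = \frac{(Y_1 - Y_2)^2}{2}$ and

$$\frac{(n-1)S^2}{\sigma^2} = \frac{(Y_1 - Y_2)^2}{2\sigma^2} = \left( \frac{Y_1 - Y_2}{\sqrt{2\sigma^2}} \right)^2 = Z^2 \sim \chi^2_{(1)}$$

where $Z = \frac{Y_1 - Y_2}{\sqrt{2\sigma^2}}$ is standard normal because $Y_1 - Y_2 \sim N(0, 2\sigma^2)$, and the square of a standard normal variable has a $\chi^2$ distribution with 1 degree of freedom.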
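For general $n$, the theorem can be explored by simulation. The sketch below compares the simulated mean and variance of $(n-1)S^2/\sigma^2$ with $n-1$ and $2(n-1)$, the mean and variance of a $\chi^2$ distribution with $n-1$ degrees of freedom. The sample size, $\sigma$, and the helper name `scaled_sample_variances` are illustrative assumptions.

```python
import random
import statistics


def scaled_sample_variances(mu, sigma, n, reps, seed=1):
    """Return `reps` simulated values of (n - 1) * S^2 / sigma^2."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = sum(sample) / n
        # (n - 1) * S^2 is the sum of squared deviations from the sample mean.
        sum_sq_dev = sum((y - ybar) ** 2 for y in sample)
        values.append(sum_sq_dev / sigma**2)
    return values


# Illustrative parameter values (assumptions, not taken from the text).
n = 6
w = scaled_sample_variances(mu=0.0, sigma=3.0, n=n, reps=20_000)

w_mean = statistics.fmean(w)
w_var = statistics.variance(w)
print("mean:    ", w_mean, "theory:", n - 1)
print("variance:", w_var, "theory:", 2 * (n - 1))
```

Matching the first two moments is consistent with the $\chi^2_{(n-1)}$ claim but does not prove it; a histogram or a goodness-of-fit test would give a fuller comparison.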

## 11.2 The $t$ Distribution
<hr>

Let $Z$ be a standard normal random variable and let $W$ be a $\chi^2$-distributed variable with $\nu$ degrees of freedom. Then, if $Z$ and $W$ are independent,

$$T = \frac{Z}{\sqrt{W/\nu}}$$

is said to have a $t$ distribution with $\nu$ degrees of freedom.

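For a random sample of size $n$ from a normal distribution, $Z = \sqrt{n} \left( \frac{\bar{Y} - \mu}{\sigma} \right)$ is standard normal and $W = \frac{(n-1)S^2}{\sigma^2}$ is $\chi^2$-distributed with $\nu = n-1$ degrees of freedom. Since $\bar{Y}$ and $S^2$ are independent, the statistic

$$T = \frac{\sqrt{n} \left( \frac{\bar{Y} - \mu}{\sigma} \right)}{\sqrt{\frac{\left( \frac{(n-1)S^2}{\sigma^2} \right)}{n-1}}} = \sqrt{n} \left( \frac{\bar{Y} - \mu}{S} \right)$$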

has a $t$ distribution with $n-1$ degrees of freedom.

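Here $\nu$ denotes the degrees of freedom of $T$ (so $\nu = n - 1$ for the statistic above):

$$E(T) = 0 \quad (\nu > 1), \qquad V(T) = \begin{cases}
\frac{\nu}{\nu-2} & \nu \geq 3 \\
\infty & \nu = 2 \\
\end{cases}
$$

As $\nu \rightarrow \infty$, $V(T) \rightarrow 1$, the variance of the standard normal distribution.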
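A $t$ distribution with $\nu$ degrees of freedom has mean 0 (for $\nu > 1$) and variance $\nu/(\nu-2)$ (for $\nu > 2$). The following sketch simulates $T = \sqrt{n}(\bar{Y} - \mu)/S$ from normal samples and compares its moments with those values for $\nu = n - 1$. The parameter values and the helper name `t_statistics` are illustrative assumptions.

```python
import math
import random
import statistics


def t_statistics(mu, sigma, n, reps, seed=2):
    """Return `reps` simulated values of sqrt(n) * (Y-bar - mu) / S."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        ybar = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
        values.append(math.sqrt(n) * (ybar - mu) / s)
    return values


# Illustrative parameter values (assumptions, not taken from the text).
n = 11
nu = n - 1  # degrees of freedom of T
t = t_statistics(mu=5.0, sigma=2.0, n=n, reps=20_000)

t_mean = statistics.fmean(t)
t_var = statistics.variance(t)
print("mean:    ", t_mean, "theory: 0")
print("variance:", t_var, "theory:", nu / (nu - 2))
```

Note that $\sigma$ is used only to generate the data; the statistic itself depends on $S$, which is why $T$ can be computed when $\sigma$ is unknown.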

## 11.3 The F Distribution
<hr>

Suppose that we want to compare the variances of two normal populations based on information contained in independent random samples from the two populations. Samples of sizes $n_1$ and $n_2$ are taken from the two populations with variances $\sigma_1^2$ and $\sigma_2^2$, respectively. If we calculate $S_1^2$ from the observations in sample 1, then $S_1^2$ estimates $\sigma_1^2$. Similarly, $S_2^2$, calculated from the observations in sample 2, estimates $\sigma_2^2$. Thus, it seems intuitive that the ratio $\frac{S_1^2}{S_2^2}$ could be used to make inferences about the relative magnitudes of $\sigma_1^2$ and $\sigma_2^2$. If we divide each $S_i^2$ by $\sigma_i^2$, then the resulting ratio:

$$\frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} = \left( \frac{\sigma_2^2}{\sigma_1^2} \right) \left( \frac{S_1^2}{S_2^2} \right)$$

has an F distribution with $n_1 - 1$ numerator degrees of freedom and $n_2 - 1$ denominator degrees of freedom. 
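As a rough numerical check of this claim, the sketch below simulates the scaled variance ratio from two independent normal samples and compares its average with $\nu_2/(\nu_2 - 2)$, the mean of an F distribution with $\nu_2 > 2$ denominator degrees of freedom. The sample sizes, standard deviations, and helper names are illustrative assumptions.

```python
import random
import statistics


def sample_variance(rng, sigma, n):
    """Sample variance S^2 of n draws from N(0, sigma^2)."""
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    return statistics.variance(sample)


def scaled_variance_ratios(sigma1, sigma2, n1, n2, reps, seed=3):
    """Return `reps` values of (S1^2 / sigma1^2) / (S2^2 / sigma2^2)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        s1_sq = sample_variance(rng, sigma1, n1)
        s2_sq = sample_variance(rng, sigma2, n2)
        ratios.append((s1_sq / sigma1**2) / (s2_sq / sigma2**2))
    return ratios


# Illustrative parameter values (assumptions, not taken from the text).
n1, n2 = 8, 13
nu1, nu2 = n1 - 1, n2 - 1
f = scaled_variance_ratios(sigma1=2.0, sigma2=5.0, n1=n1, n2=n2, reps=20_000)

f_sim_mean = statistics.fmean(f)
print("simulated mean:", f_sim_mean, "theory:", nu2 / (nu2 - 2))
```

The two populations deliberately have different variances here: dividing each $S_i^2$ by its own $\sigma_i^2$ is what removes the dependence on the unknown variances.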

### Definition

Let $W_1$ and $W_2$ be independent $\chi^2$-distributed random variables with $\nu_1$ and $\nu_2$ df, respectively. Then,

$$F = \frac{W_1 / \nu_1}{W_2 / \nu_2}$$

is said to have an F distribution with $\nu_1$ numerator degrees of freedom and $\nu_2$ denominator degrees of freedom. A random variable with an $F_{(\nu_1, \nu_2)}$ distribution takes values in the range $[0, \infty)$.

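When the F distribution is used to compare group means in a one-way analysis of variance (ANOVA), the degrees of freedom are:

$$\nu_1 = \text{number of groups being compared} - 1$$

$$\nu_2 = \text{number of data points} - \text{number of groups}$$

In that setting, typically $\nu_2 > \nu_1$.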

<br>

$$E(F) = \frac{\nu_2}{\nu_2 - 2}, \quad \nu_2 > 2$$

$$V(F) = \frac{2 \nu_2^2 (\nu_1 + \nu_2 - 2)}{\nu_1 (\nu_2 - 2)^2 (\nu_2 - 4)}, \quad \nu_2>4$$
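The two moment formulas are easy to evaluate directly. The helpers below (`f_mean` and `f_variance` are hypothetical names chosen for this sketch) compute them for given degrees of freedom and raise an error when the corresponding moment does not exist.

```python
def f_mean(nu2):
    """Mean of an F distribution; requires nu2 > 2."""
    if nu2 <= 2:
        raise ValueError("E(F) is only defined for nu2 > 2")
    return nu2 / (nu2 - 2)


def f_variance(nu1, nu2):
    """Variance of an F distribution; requires nu2 > 4."""
    if nu2 <= 4:
        raise ValueError("V(F) is only defined for nu2 > 4")
    numerator = 2 * nu2**2 * (nu1 + nu2 - 2)
    denominator = nu1 * (nu2 - 2) ** 2 * (nu2 - 4)
    return numerator / denominator


# Illustrative degrees of freedom (assumptions, not taken from the text).
nu1, nu2 = 3, 10
print(f_mean(nu2))           # 10 / 8 = 1.25
print(f_variance(nu1, nu2))  # 2200 / 1152, approximately 1.9097
```

The mean depends only on the denominator degrees of freedom $\nu_2$ and approaches 1 as $\nu_2$ grows, while the variance depends on both $\nu_1$ and $\nu_2$.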