# Standard Error of OLS parameter
* Suppose we estimate $\hat\beta = 10$.
* Is this value reliable?
* What if the estimates obtained from different random samples vary a lot, for example with a variance of 100, so that some samples even give a negative value? Can we still believe that the true value of $\beta$ is not 0?
* The standard error helps answer this question: it measures how much $\hat\beta$ varies from sample to sample.

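$$
\hat\beta = \beta_{pop} + \displaystyle \frac{\sum_{i=1}^N(x_i-\bar{x})u_i}{SST_x}, \qquad SST_x = \sum_{i=1}^N(x_i-\bar{x})^2 \\
Var\left(\displaystyle \frac{\sum_{i=1}^N(x_i-\bar{x})u_i}{SST_x}\,\Big|\,X\right) = \displaystyle\frac{\sum_{i=1}^NVar((x_i-\bar{x})u_i|X)}{SST_x^2}~(\because Cov(u_i,u_j)=0)\\ = \frac{\sum_{i=1}^N(x_i-\bar{x})^2\sigma^2}{SST_x^2} = \frac{\sigma^2}{SST_x}
$$

Note that the denominator $SST_x$ is the sum of squared deviations of $x$, not the sample variance, and that the last step assumes homoskedasticity, $Var(u_i|X)=\sigma^2$ for every $i$.

Since we don't know the theoretical population $\sigma^2$, we use the estimated one from the sample: $\hat\sigma^2$<br>

$\therefore \widehat{Var}(\hat\beta|X) = \displaystyle\frac{\hat\sigma^2}{SST_x}$, and the standard error is its square root: $se(\hat\beta) = \displaystyle\sqrt{\frac{\hat\sigma^2}{SST_x}}$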

### How to get the value of $\hat\sigma^2$

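In the sample, the naive estimator averages the squared residuals:
$\tilde\sigma^2 = \displaystyle\frac{1}{N}\displaystyle\sum_{i=1}^N\hat{u}^2_i = \frac{1}{N}(\hat{u}^2_1 + \hat{u}^2_2 + ... + \hat{u}^2_N)$

However, the residuals are not all free to vary. With $k$ estimated coefficients (including the intercept), the OLS first-order conditions impose $k$ linear restrictions on the residuals (for example, $\sum_i \hat u_i = 0$ when the model has an intercept), so only $N-k$ of them are free. Dividing by these $N-k$ degrees of freedom gives the unbiased estimator:

$\hat\sigma^2 = \displaystyle\frac{1}{N-k}\displaystyle\sum_{i=1}^N\hat{u}^2_i$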

* Intuition: if the variation of $x$ in the sample is large relative to the error variance, then the standard error is low. Widely spread values of $x$ pin down the slope more precisely.
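The steps above can be sketched in a few lines of plain Python for a simple regression with one regressor and an intercept (so $k = 2$). The function name `ols_slope_se` and the small data set are made up for illustration; this is a minimal sketch, not a replacement for a statistics library.

```python
import math

def ols_slope_se(x, y):
    """Simple regression y = a + b*x. Returns (b_hat, sigma2_hat, se_b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of x (the denominator of the variance formula)
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    a_hat = y_bar - b_hat * x_bar
    residuals = [yi - a_hat - b_hat * xi for xi, yi in zip(x, y)]
    k = 2  # estimated coefficients: intercept and slope
    sigma2_hat = sum(u ** 2 for u in residuals) / (n - k)
    se_b = math.sqrt(sigma2_hat / sst_x)
    return b_hat, sigma2_hat, se_b

# Illustrative data (invented for this example)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b_hat, sigma2_hat, se_b = ols_slope_se(x, y)
print(b_hat, sigma2_hat, se_b)
```

Spreading the same five $x$ values further apart increases the sum of squared deviations and therefore shrinks `se_b`, which is the intuition stated above.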

### Heteroskedasticity's role in standard error

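Under heteroskedasticity the error variance is a function of $X$, $Var(u_i|x_i) = \sigma_i^2$, so $\sigma^2$ can no longer be factored out of the sum. The variance of $\hat\beta$ becomes $\displaystyle\frac{\sum_{i=1}^N(x_i-\bar{x})^2\sigma_i^2}{\left(\sum_{i=1}^N(x_i-\bar{x})^2\right)^2}$.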
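Because each error has its own unknown variance under heteroskedasticity, a common plug-in approach (White's heteroskedasticity-consistent estimator, often labelled HC0) replaces each unknown error variance with the squared residual $\hat u_i^2$. The sketch below compares it with the classical estimate for a simple regression. The function name and the data are assumptions made for illustration.

```python
def slope_variance_estimates(x, y):
    """Return (classical, robust_hc0) variance estimates of the OLS slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    a_hat = y_bar - b_hat * x_bar
    residuals = [yi - a_hat - b_hat * xi for xi, yi in zip(x, y)]
    # Classical estimate: one common error variance, n - 2 degrees of freedom
    classical = (sum(u ** 2 for u in residuals) / (n - 2)) / sst_x
    # HC0 estimate: each squared residual stands in for its own error variance
    robust_hc0 = sum(((xi - x_bar) ** 2) * (u ** 2)
                     for xi, u in zip(x, residuals)) / sst_x ** 2
    return classical, robust_hc0

# Illustrative data (invented for this example)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
classical, robust_hc0 = slope_variance_estimates(x, y)
print(classical, robust_hc0)
```

The two numbers differ whenever the squared residuals are not evenly spread across the $x$ values. HC0 is only justified asymptotically, so with a handful of observations like this the comparison is purely illustrative.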

### Serial Correlation's role in standard error

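With serial correlation the covariances between errors are not zero, so the covariance terms $Cov(u_i, u_j)$ no longer drop out of the variance of $\hat\beta$. The formula above ignores them, and with positively correlated errors it is likely to underestimate the standard error.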

### Getting the sample variance
* The sample variance is $s^2 = \displaystyle\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar x)^2$, where $n$ is the sample size
* Why $n-1$ instead of $n$?

proof:

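Let $x_1, \dots, x_n$ be a random sample of size $n$ drawn from a population with mean $\mu$ and variance $\sigma^2$.<br>
$$
\bar x = \displaystyle\frac{1}{n}\sum_{i=1}^nx_i, ~~\mathbb{E}[\bar x] = \mu, ~~Var(\bar x) = \frac{\sigma^2}{n}
$$

Consider the naive estimator that divides by $n$:
$$
\tilde s^2 = \displaystyle\frac{1}{n}((x_1-\bar x)^2+(x_2-\bar x)^2+...+(x_{n}-\bar x)^2)
$$

Since $\sum_{i=1}^n(x_i-\bar x)^2 = \sum_{i=1}^n(x_i-\mu)^2 - n(\bar x-\mu)^2$, taking expectations gives<br>
$$
\mathbb{E}[\tilde s^2] = \displaystyle\frac{1}{n}\left(n\sigma^2 - n\cdot\frac{\sigma^2}{n}\right)=\frac{n-1}{n}\sigma^2 \\
\therefore s^2 = \frac{n}{n-1}\tilde s^2 = \frac{1}{n-1}\sum_{i=1}^n(x_i-\bar x)^2, ~~\mathbb{E}[s^2] = \sigma^2
$$

The naive estimator is biased downward because the deviations are measured from $\bar x$, which is itself fitted to the sample. Dividing by $n-1$ removes the bias.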
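The result can be checked exactly, without simulation, by enumerating every possible sample. The sketch below assumes a tiny made-up population of four values and draws samples of size 3 with replacement, so that all draws are independent and every ordered sample is equally likely.

```python
from itertools import product

# Tiny hypothetical population (values invented for illustration)
population = [1.0, 2.0, 4.0, 7.0]
mu = sum(population) / len(population)
sigma2 = sum((v - mu) ** 2 for v in population) / len(population)

n = 3
# Every ordered sample of size n drawn with replacement (4**3 = 64 samples)
samples = list(product(population, repeat=n))

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    m = sum(sample) / len(sample)
    return sum((v - m) ** 2 for v in sample)

# Average each estimator over all equally likely samples = its expectation
mean_biased = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
mean_unbiased = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma2, mean_biased, mean_unbiased)
```

The average of the divide-by-$n$ estimator comes out as $\frac{n-1}{n}$ times the population variance, while the divide-by-$(n-1)$ estimator averages to the population variance itself.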

# Hypothesis Testing

**What is hypothesis testing?**
* Statistics is all about confidence.
* Most of the time we cannot measure a statistic on the whole population and have to work with samples.
* Can we then infer the properties of the whole population from a small sample? Are the properties derived from the sample reliable? In other words, do they apply to the whole population too?
* Hypothesis testing answers questions like these.


An essential term before we start: <br>
**Null hypothesis** : Claim that the effect being studied does not exist.
<br>

For a result to count as evidence of an effect, **the null hypothesis must be rejected** at a high confidence level. A hypothesis test specifies when we reject the null hypothesis and with how much confidence.

## Tail Tests

Tail tests are used in hypothesis testing to determine if there is a significant difference between a sample statistic and a population parameter in a specific direction.

### One-Tailed Test

In a one-tailed test, we're testing for the possibility of the relationship in one direction only.

1. Right-Tailed Test:
   $$
   H_0: \mu \leq \mu_0 \\
   H_1: \mu > \mu_0
   $$

2. Left-Tailed Test:
   $$
   H_0: \mu \geq \mu_0 \\
   H_1: \mu < \mu_0
   $$

### Two-Tailed Test

In a two-tailed test, we're testing for the possibility of the relationship in both directions.

$$
H_0: \mu = \mu_0 \\
H_1: \mu \neq \mu_0
$$


## Z-Test

The z-test is a statistical test used to determine whether a population mean differs from a hypothesized value, or whether two population means differ, when the population variances are known and the sample size is large.

### Z-Test Statistic

The z-test statistic is calculated as:

$$
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
$$


Where:
- $\bar{x}$ is the sample mean
- $\mu_0$ is the population mean under the null hypothesis
- $\sigma$ is the population standard deviation
- $n$ is the sample size

### Z-Test for Two Populations

For comparing two population means:

$$
z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}
$$


Where:
- $\bar{x}_1$ and $\bar{x}_2$ are the sample means
- $\mu_1$ and $\mu_2$ are the population means
- $\sigma_1$ and $\sigma_2$ are the population standard deviations
- $n_1$ and $n_2$ are the sample sizes

### Decision Rule

For a significance level $\alpha$:

1. For a right-tailed test, reject $H_0$ if $z > z_{\alpha}$
2. For a left-tailed test, reject $H_0$ if $z < -z_{\alpha}$
3. For a two-tailed test, reject $H_0$ if $|z| > z_{\alpha/2}$

Where $z_{\alpha}$ and $z_{\alpha/2}$ are the upper-tail critical values of the standard normal distribution, that is, $P(Z > z_{\alpha}) = \alpha$.
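The one-sample statistic and the three decision rules can be sketched with Python's standard library, where `statistics.NormalDist().inv_cdf` supplies the critical values. The function name `z_test`, the `tail` argument and the example numbers are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample z-test. Returns (z, reject_h0)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif tail == "left":
        reject = z < -std_normal.inv_cdf(1 - alpha)
    elif tail == "two":
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return z, reject

# Hypothetical numbers: sample mean 103 from n = 100, known sigma = 15
z, reject = z_test(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(z, reject)
```

Here the statistic is $3 / 1.5 = 2$, which exceeds the two-tailed critical value of about 1.96 at $\alpha = 0.05$, so the null hypothesis is rejected.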

## **T test**
**IMPORTANT!**

The t-test is similar to the z-test, but it is used when the population standard deviation is unknown and has to be estimated from the sample. Since the population parameters are rarely available in practice, the t-test is the more widely used of the two.

For example, suppose I want to compare mean math ability between Korean and Japanese students, and 10 students from each nation take a test. The Korean students score 71 on average and the Japanese students 70. Can I say that Korean students are better at math on the basis of a 1-point difference? Maybe the difference is just sampling error. And if I do claim that Korean students are better at math, how confident can I be?

The Student's t-test is used to determine if there is a significant difference between the means of two groups. The test statistic is calculated as:

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{2}{n}}}
$$


where:

- $\bar{x}_1$ and $\bar{x}_2$ are the sample means
- $s_p$ is the pooled standard deviation
- $n$ is the sample size (assuming equal sample sizes)

The pooled standard deviation is calculated as:

$$
s_p = \sqrt{\frac{s_1^2 + s_2^2}{2}}
$$


where $s_1^2$ and $s_2^2$ are the sample variances.

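For unequal sample sizes (still assuming equal population variances), the formula becomes:

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}
$$


The degrees of freedom for this pooled test are calculated as:

$$
df = n_1 + n_2 - 2
$$

If the equal-variance assumption is doubtful, Welch's t-test uses the denominator $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ instead, with degrees of freedom given by the Welch-Satterthwaite approximation rather than $n_1 + n_2 - 2$.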


The null hypothesis ($H_0$) and alternative hypothesis ($H_1$) are typically stated as:

$$
\begin{align*}
H_0&: \mu_1 = \mu_2 \\
H_1&: \mu_1 \neq \mu_2
\end{align*}
$$


where $\mu_1$ and $\mu_2$ are the population means.

The calculated t-value is compared to the critical t-value from the t-distribution table, or the p-value is computed to make a decision about the null hypothesis.

So in the math score example above, the 1-point difference in means is not enough on its own: I also need the variation of scores within each sample to decide whether the difference is significant.
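As a sketch of the math score example, the pooled t-statistic can be computed with the standard library. The individual scores below are invented so that the group means are 71 and 70, and the weighted pooled variance is used so the function also handles unequal sample sizes (with equal sizes it reduces to the formula with $\sqrt{2/n}$).

```python
import math
from statistics import mean, variance

def pooled_t_statistic(sample1, sample2):
    """Two-sample t-statistic assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)  # n - 1 denominators
    df = n1 + n2 - 2
    s_p = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df)
    t = (mean(sample1) - mean(sample2)) / (s_p * math.sqrt(1 / n1 + 1 / n2))
    return t, df

# Hypothetical scores (invented): means are 71 and 70
korean = [65, 78, 71, 60, 82, 74, 69, 73, 66, 72]
japanese = [68, 75, 70, 62, 80, 71, 66, 74, 64, 70]

t, df = pooled_t_statistic(korean, japanese)
T_CRIT_18_DF = 2.101  # two-tailed critical value, alpha = 0.05, from a t-table
print(t, df, abs(t) > T_CRIT_18_DF)
```

With these made-up scores the statistic is far below the critical value for 18 degrees of freedom, so the 1-point gap would not be judged significant.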

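**How can we use the t-test to evaluate a regression model?**
1. Set the null hypothesis $H_{0k} : \beta_k = 0$
2. Compute the t-statistic $t = \hat\beta_k / se(\hat\beta_k)$ and compare it with the t-distribution with $N-k$ degrees of freedom, where $k$ is the number of estimated coefficients
3. If the null hypothesis is rejected, we conclude that $\beta_k$ is statistically different from zero at the chosen significance level.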

## **F test**

The t-test is a powerful way to measure how significant a difference in means is. However, it compares only two groups, or tests one coefficient, at a time. When judging the explanatory power of a regression model, we often want the joint effect of several regressors rather than each one individually.

The F-test is used to compare the variances of two populations or to test the overall significance of a regression model. The test statistic follows an F-distribution under the null hypothesis.

### F-Statistic

The F-statistic is calculated as the ratio of two variances:

$$
F = \frac{s_1^2}{s_2^2}
$$


where $s_1^2$ and $s_2^2$ are the sample variances.

### Degrees of Freedom

The F-distribution has two parameters for degrees of freedom:

- $df_1 = n_1 - 1$ (numerator degrees of freedom)
- $df_2 = n_2 - 1$ (denominator degrees of freedom)

where $n_1$ and $n_2$ are the sample sizes.

### Hypothesis Testing

For a two-tailed test of equal variances:

- Null hypothesis: $H_0: \sigma_1^2 = \sigma_2^2$
- Alternative hypothesis: $H_1: \sigma_1^2 \neq \sigma_2^2$

### Critical Value

The critical value $F_{\alpha, df_1, df_2}$ is found from the F-distribution table or calculator, where $\alpha$ is the significance level.

### Decision Rule

- If $F < F_{\alpha/2, df_1, df_2}$ or $F > F_{1-\alpha/2, df_1, df_2}$, reject $H_0$ (here $F_{q, df_1, df_2}$ denotes the $q$-quantile of the F-distribution)
- Otherwise, fail to reject $H_0$

### F-Test in Regression

* The F-test for regression essentially tests whether the regression using $x$ to predict $y$ fits better than simply predicting every $y_i$ with the sample mean $\bar y$.

We need some metrics for this:<br>
Restricted sum of squared residuals (all slope coefficients set to zero, so the prediction is $\bar y$): $SSR_R = \displaystyle\sum_{i=1}^N(y_i-\bar{y})^2$<br>
Unrestricted sum of squared residuals (the full model): $SSR_{UR} = \displaystyle\sum_{i=1}^N(y_i-\hat{y}_i)^2$<br>
and the F statistic is:
$$F = \frac{(SSR_R-SSR_{UR})/P_R}{SSR_{UR}/(N-P-1)}$$
where $P_R$ is the numerator degrees of freedom, equal to the number of linear restrictions in $H_0$ (the number of betas set to zero), and $P$ is the number of regressors in the unrestricted model, so that $N-P-1$ is the denominator degrees of freedom.
The same statistic tests the joint significance of any subset of betas: $SSR_R$ is then the sum of squared residuals of the model re-estimated with those betas set to zero.
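For a simple regression with one regressor ($P = 1$, $P_R = 1$) the statistic can be sketched in plain Python. In this special case the F-statistic equals the square of the t-statistic of the slope, which the code also computes as a cross-check. The function name and the data are made up for illustration.

```python
import math

def regression_f_and_t(x, y):
    """Simple regression y = a + b*x. Returns (F, t) for H0: b = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    a_hat = y_bar - b_hat * x_bar
    # Restricted model: predict every y with the sample mean
    ssr_r = sum((yi - y_bar) ** 2 for yi in y)
    # Unrestricted model: predict with the fitted line
    ssr_ur = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    p, p_r = 1, 1  # one regressor, one restriction (b = 0)
    f_stat = ((ssr_r - ssr_ur) / p_r) / (ssr_ur / (n - p - 1))
    # t-statistic of the slope, using the same residual variance estimate
    t_stat = b_hat / math.sqrt((ssr_ur / (n - p - 1)) / sst_x)
    return f_stat, t_stat

# Illustrative data (invented for this example)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
f_stat, t_stat = regression_f_and_t(x, y)
print(f_stat, t_stat ** 2)
```

The statistic would then be compared with the critical value of the F-distribution with $(P_R, N-P-1) = (1, 3)$ degrees of freedom.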

# **Normality Tests**

## 1. Jarque-Bera Test
* Used to check whether the residuals from a regression are consistent with a normally distributed error term.
* It is a general normality test, not one specific to regression.
* **Uses skewness and kurtosis to figure out if the distribution is similar to the normal distribution.**

The Jarque-Bera test statistic is:

$$
JB = \frac{n}{6}\left(S^2 + \frac{1}{4}(K-3)^2\right)
$$


where:

- $n$ is the number of observations
- $S$ is the sample skewness
- $K$ is the sample kurtosis

The sample skewness $S$ and kurtosis $K$ are calculated as:

$$
S = \frac{\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^3}{(\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2)^{3/2}}
$$


$$
K = \frac{\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^4}{(\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2)^2}
$$


Under the null hypothesis of normality, the JB statistic asymptotically follows a chi-squared distribution with two degrees of freedom:

$$
JB \sim \chi^2_2
$$


The null hypothesis is rejected if the JB statistic is greater than the critical value from the chi-squared distribution with 2 degrees of freedom at the chosen significance level (generally 0.05 or 0.01, corresponding to 95% or 99% confidence).
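The statistic is easy to compute directly from the formulas above. The sketch below also returns an asymptotic p-value, using the fact that the chi-squared distribution with 2 degrees of freedom has survival function $e^{-x/2}$. The function name and the toy data are assumptions, and the asymptotic approximation is known to be poor for small samples, so the p-value here is only illustrative.

```python
import math

def jarque_bera(data):
    """Return (JB, skewness, kurtosis, asymptotic p-value)."""
    n = len(data)
    x_bar = sum(data) / n
    # Central moments with 1/n denominators, as in the formulas above
    m2 = sum((v - x_bar) ** 2 for v in data) / n
    m3 = sum((v - x_bar) ** 3 for v in data) / n
    m4 = sum((v - x_bar) ** 4 for v in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Chi-squared with 2 df: P(X > x) = exp(-x / 2)
    p_value = math.exp(-jb / 2)
    return jb, skew, kurt, p_value

# Toy symmetric data (invented): skewness is exactly zero
jb, skew, kurt, p_value = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(jb, skew, kurt, p_value)
```

For residual diagnostics, `data` would be the list of regression residuals.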

## 2. Shapiro-Wilk test

* Another test that can be used to check whether the error term is normally distributed.
* Uses the ordered sample values (quantiles), much like a Q-Q plot, but summarized in a single number.

The Shapiro-Wilk test is a statistical test of the hypothesis that the sample was drawn from a normally distributed population. It is defined as:

$$
W = \frac{(\sum_{i=1}^n a_i x_{(i)})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}
$$


where:

- $x_{(i)}$ are the ordered sample values ($x_{(1)}$ is the smallest)
- $\bar{x}$ is the sample mean
- $a_i$ are constants generated from the means, variances and covariances of the order statistics of a sample of size n from a normal distribution

The coefficients $a_i$ are calculated as:

$$
(a_1, \ldots, a_n) = \frac{m^T V^{-1}}{(m^T V^{-1} V^{-1} m)^{1/2}}
$$


where:

- $m = (m_1, \ldots, m_n)^T$
- $m_i$ are the expected values of the order statistics of independent and identically distributed random variables sampled from the standard normal distribution
- $V$ is the covariance matrix of those order statistics

The null hypothesis of this test is that the population is normally distributed. Thus, if the p-value is less than the chosen alpha level, then the null hypothesis is rejected and there is evidence that the data tested are not from a normally distributed population.

The W statistic is always greater than zero and less than or equal to one. Values close to one are consistent with normality, while small values are evidence against it.

## 3. Kolmogorov-Smirnov Test

* Uses the largest difference between the CDFs of two distributions
* Can be used to compare a sample with any reference distribution, not only the normal.

The Kolmogorov-Smirnov (K-S) test is a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution or to compare two samples.

### One-Sample K-S Test

For a one-sample K-S test, the test statistic is defined as:

$$
D_n = \sup_x |F_n(x) - F(x)|
$$


where:

- $F_n(x)$ is the empirical distribution function of the sample
- $F(x)$ is the cumulative distribution function of the reference distribution
- $\sup_x$ is the supremum of the set of distances
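Because the empirical distribution function is a step function that jumps by $1/n$ at each data point, the supremum can only be reached just before or at one of the sorted sample values. A minimal sketch under that observation, assuming a continuous reference CDF passed in as a Python callable (the function name is hypothetical):

```python
from statistics import NormalDist

def ks_one_sample(sample, cdf):
    """One-sample K-S statistic D_n against a continuous reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # ECDF equals i/n at x and (i-1)/n just to the left of x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Against the uniform(0, 1) CDF, F(x) = x
d_uniform = ks_one_sample([0.1, 0.4, 0.7], lambda v: v)

# Against the standard normal CDF
d_normal = ks_one_sample([0.0], NormalDist().cdf)
print(d_uniform, d_normal)
```

To test regression residuals for normality this way, the residuals would first be standardized. Note that estimating the mean and variance from the same data changes the null distribution of $D_n$, so standard K-S critical values are then only approximate.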

### Two-Sample K-S Test

For a two-sample K-S test, the test statistic is:

$$
D_{n,m} = \sup_x |F_{1,n}(x) - F_{2,m}(x)|
$$


where $F_{1,n}$ and $F_{2,m}$ are the empirical distribution functions of the first and second sample respectively, and $n$ and $m$ are the sample sizes.

### Critical Values

For the two-sample test with large samples, the critical value of the K-S test statistic is approximately:

$$
D_{\alpha} = c(\alpha) \sqrt{\frac{n+m}{nm}}
$$


where $c(\alpha)$ is a coefficient that depends on the significance level $\alpha$. For $\alpha = 0.05$, $c(\alpha) \approx 1.36$. For the one-sample test the corresponding large-sample critical value is $c(\alpha)/\sqrt{n}$.

### Null Hypothesis

The null hypothesis of the K-S test is:

$$
H_0: F_1(x) = F_2(x) \text{ for all } x
$$


The null hypothesis is rejected if the test statistic $D$ is greater than the critical value $D_{\alpha}$.
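A sketch of the two-sample decision, using only the standard library. The coefficient is computed with the commonly used large-sample approximation $c(\alpha) = \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}$, which is an assumption added here and gives about 1.36 at $\alpha = 0.05$. The function names and the toy samples are invented, and the approximation is rough for samples this small.

```python
import math
from bisect import bisect_right

def c_alpha(alpha):
    """Large-sample coefficient c(alpha) for the K-S critical value."""
    return math.sqrt(-0.5 * math.log(alpha / 2))

def ks_two_sample(sample1, sample2, alpha=0.05):
    """Return (D, critical value, reject_h0) for the two-sample K-S test."""
    xs, ys = sorted(sample1), sorted(sample2)
    n, m = len(xs), len(ys)
    # Both ECDFs are step functions, so the largest gap occurs at a data point.
    # bisect_right(xs, v) / n is the ECDF of sample1 evaluated at v.
    d = max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m)
            for v in xs + ys)
    d_crit = c_alpha(alpha) * math.sqrt((n + m) / (n * m))
    return d, d_crit, d > d_crit

# Toy samples (invented): completely separated, so D = 1
d, d_crit, reject = ks_two_sample([1, 2, 3, 4], [5, 6, 7, 8])
print(d, d_crit, reject)
```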

### Asymptotic Distribution

For large sample sizes, the asymptotic distribution of the K-S test statistic under the null hypothesis is given by:

$$
\lim_{n \to \infty} P(\sqrt{n}D_n \leq x) = 1 - 2\sum_{k=1}^{\infty} (-1)^{k-1}e^{-2k^2x^2}
$$


This distribution is known as the Kolmogorov distribution.
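The series converges quickly for moderate $x$, so it can be summed directly. The sketch below stops once the terms become negligible and derives an asymptotic one-sample p-value as $1 - K(\sqrt{n}D_n)$. Function names, tolerance and the example numbers are assumptions made for illustration.

```python
import math

def kolmogorov_cdf(x, tol=1e-16, max_terms=100000):
    """CDF of the Kolmogorov distribution via its alternating series."""
    if x <= 0:
        return 0.0
    total = 0.0
    for k in range(1, max_terms + 1):
        term = math.exp(-2.0 * k * k * x * x)
        # Sign (-1)**(k-1): positive for odd k, negative for even k
        total += term if k % 2 == 1 else -term
        if term < tol:
            break
    # Clamp to [0, 1] to absorb floating-point rounding
    return min(1.0, max(0.0, 1.0 - 2.0 * total))

def ks_asymptotic_p_value(d_n, n):
    """Large-sample p-value for the one-sample K-S statistic."""
    return 1.0 - kolmogorov_cdf(math.sqrt(n) * d_n)

# The 95% point of the Kolmogorov distribution is about 1.358
print(kolmogorov_cdf(1.358))
# Hypothetical example: D_n = 0.1358 with n = 100 gives p close to 0.05
print(ks_asymptotic_p_value(0.1358, 100))
```

This ties back to the critical value coefficient: $c(0.05) \approx 1.36$ is the point where the Kolmogorov CDF reaches 0.95.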