# T-test and T-distribution
### T-distribution
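A random variable $T$ follows the $t$-distribution with $v$ degrees of freedom if
$$T = \frac{Z}{\sqrt{V/v}}$$
where $Z$ is a standard normal random variable, $V$ is a chi-square random variable with $v$ degrees of freedom, and $Z$ and $V$ are independent. Its density is bell-shaped and symmetric about zero like the standard normal density, but with heavier tails; it converges to the standard normal distribution as $v \to \infty$.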
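
The definition can be checked numerically. The sketch below is a minimal illustration, assuming an arbitrary choice of $v = 5$, a fixed seed, and 100,000 draws (none of which come from the text above): it builds $T$ from independent draws of $Z$ and $V$, compares its empirical quantiles with those of `scipy.stats.t`, and compares a tail probability with that of the standard normal.

```python
import numpy as np
import scipy.stats as stats

v = 5                # degrees of freedom (arbitrary choice for illustration)
n_draws = 100_000    # number of simulated values (assumption)
rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# build T = Z / sqrt(V / v) from independent Z ~ N(0, 1) and V ~ chi-square(v)
Z = rng.standard_normal(n_draws)
V = rng.chisquare(v, n_draws)
T = Z / np.sqrt(V / v)

# compare empirical quantiles of T with the quantiles of scipy's t-distribution
probs = np.array([0.05, 0.25, 0.5, 0.75, 0.95])
empirical_q = np.quantile(T, probs)
theoretical_q = stats.t.ppf(probs, df=v)
print("empirical quantiles:  ", np.round(empirical_q, 3))
print("theoretical quantiles:", np.round(theoretical_q, 3))

# two-sided tail probability beyond 3 for the t-distribution and the standard normal
t_tail = 2 * stats.t.sf(3, df=v)
normal_tail = 2 * stats.norm.sf(3)
print(f"P(|T| > 3) = {t_tail:.4f}, P(|Z| > 3) = {normal_tail:.4f}")
```

The two quantile rows should agree up to simulation noise, while the tail probability of the $t$-distribution is larger than that of the standard normal.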

### T-test
Let $X_1,\ldots, X_n$ be independent random variables sampled from a normal distribution $N(\mu, \sigma^2)$ with unknown $\mu$ and $\sigma$. We would like to test the null hypothesis
$$H_0: \mu = \mu_0$$

With sample mean $\overline{X}$ and sample standard deviation $s$, the $t$-statistic is defined as
$$t = \frac{\overline{X}-\mu_0}{s/\sqrt{n}} = \frac{\overline{X}-\mu_0}{\sigma/\sqrt{n}} \Big/ \frac{s}{\sigma}$$

Under the null hypothesis, the term $\frac{\overline{X}-\mu_0}{\sigma/\sqrt{n}}$ follows the standard normal distribution, and $s/\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^n\frac{(X_i-\overline{X})^2}{\sigma^2}}$, where $\sum_{i=1}^n\frac{(X_i-\overline{X})^2}{\sigma^2}$ follows a chi-square distribution with $n-1$ degrees of freedom and is independent of $\overline{X}$ (Cochran's theorem). Hence, under $H_0$, $t$ follows the $t$-distribution with $n-1$ degrees of freedom.
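
The decomposition above can be checked by simulation. The following is a minimal sketch, assuming illustrative values ($n = 10$, $\mu_0 = 1$, $\sigma = 2$, 20,000 replications, a fixed seed) that are not taken from the text: it verifies that the scaled sum of squares has the mean and variance of a chi-square variable with $n-1$ degrees of freedom, that $\overline{X}$ and $s^2$ are uncorrelated, and that the simulated $t$-statistics are close to the $t$-distribution with $n-1$ degrees of freedom.

```python
import numpy as np
import scipy.stats as stats

n = 10          # sample size (assumption, kept small so t differs visibly from normal)
mu_0 = 1.0      # true mean, equal to the hypothesized mean so that H_0 holds
sigma = 2.0     # true standard deviation (assumption)
n_rep = 20_000  # number of simulated samples (assumption)
rng = np.random.default_rng(1)

# each row is one sample X_1, ..., X_n from N(mu_0, sigma^2)
X = rng.normal(mu_0, sigma, size=(n_rep, n))
X_bar = X.mean(axis=1)
s = X.std(axis=1, ddof=1)

# Q = sum_i (X_i - X_bar)^2 / sigma^2, which should be chi-square with n - 1 df
Q = (n - 1) * s**2 / sigma**2
print(f"mean of Q: {Q.mean():.3f} (chi-square mean is {n - 1})")
print(f"variance of Q: {Q.var():.3f} (chi-square variance is {2 * (n - 1)})")

# independence of X_bar and s^2 implies zero correlation
corr = np.corrcoef(X_bar, s**2)[0, 1]
print(f"correlation between X_bar and s^2: {corr:.4f}")

# Kolmogorov-Smirnov distance between the simulated t statistics and t_{n-1}
t = (X_bar - mu_0) / (s / np.sqrt(n))
ks = stats.kstest(t, stats.t(df=n - 1).cdf)
print(f"KS statistic against t with {n - 1} df: {ks.statistic:.4f}")
```

A near-zero correlation is only a necessary consequence of independence, not a proof of it; the check is meant as a sanity test of the statement, not a substitute for Cochran's theorem.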

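The t-test is not sensitive to the normality assumption as long as the sample size is large: if the $X_i$ are independent and identically distributed with finite variance, the central limit theorem makes $\overline{X}$ approximately normal and $s$ converges to $\sigma$, so the $t$-statistic is approximately standard normal under the null hypothesis.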

In [26]:
import numpy as np
import scipy.stats as stats

sample_size = 500
mu = 0.2
sigma = 2

# generate the random sequence X from a normal distribution
rng = np.random.default_rng()
X = rng.normal(mu, sigma, sample_size)
X_mean = X.mean()

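# sample standard deviation with Bessel's correction (same as X.std(ddof=1))
s = np.sqrt(((X-X_mean) ** 2).sum()/(sample_size - 1))
# t statistic for the null hypothesis mu = mu_0, here with mu_0 = 0
mu_0 = 0
t = (X_mean - mu_0)/(s/np.sqrt(sample_size))
print(f"The estimator for mu is {X_mean:.4f}")
print(f"The t statistic is {t:.4f}")
# one-sided p value P(T >= t) for the alternative mu > mu_0;
# the survival function is numerically safer than 1 - cdf
p = stats.t.sf(t, df = sample_size - 1)
print(f"The p value for the t-test is {p:.4f}")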

The estimator for mu is 0.1959
The t statistic is 2.1731
The p value for the t-test is 0.0151
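
As a cross-check, the manual computation can be compared with `scipy.stats.ttest_1samp`. This is a minimal sketch, assuming SciPy 1.6 or later (where the `alternative` argument is available), a hypothesized mean `mu_0 = 0`, and a fixed seed that is not part of the cell above, so its numbers will differ from the printed output.

```python
import numpy as np
import scipy.stats as stats

sample_size = 500
mu = 0.2
sigma = 2
mu_0 = 0.0  # hypothesized mean under H_0 (assumption, matches the implicit choice above)

rng = np.random.default_rng(42)  # fixed seed, chosen only for reproducibility
X = rng.normal(mu, sigma, sample_size)

# manual one-sided t-test
s = X.std(ddof=1)
t_manual = (X.mean() - mu_0) / (s / np.sqrt(sample_size))
p_manual = stats.t.sf(t_manual, df=sample_size - 1)

# the same test with SciPy's built-in routine
result = stats.ttest_1samp(X, popmean=mu_0, alternative="greater")

print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  t = {result.statistic:.4f}, p = {result.pvalue:.4f}")
```

With the default `alternative="two-sided"`, the reported p value would instead be twice the smaller of the two tail probabilities.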


In [28]:
# Another example where X is not sampled from a normal distribution
sample_size = 500

# generate the random sequence X from a non-normal distribution:
# uniform on [-4.9, 5.1), whose true mean is 0.1
rng = np.random.default_rng()
X = 10 * rng.random(sample_size) - 4.9
X_mean = X.mean()

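# sample standard deviation with Bessel's correction (same as X.std(ddof=1))
s = np.sqrt(((X-X_mean) ** 2).sum()/(sample_size - 1))
# t statistic for the null hypothesis mu = mu_0, here with mu_0 = 0
mu_0 = 0
t = (X_mean - mu_0)/(s/np.sqrt(sample_size))
print(f"The estimator for mu is {X_mean:.4f}")
print(f"The t statistic is {t:.4f}")
# one-sided p value P(T >= t) for the alternative mu > mu_0;
# the survival function is numerically safer than 1 - cdf
p = stats.t.sf(t, df = sample_size - 1)
print(f"The p value for the t-test is {p:.4f}")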

The estimator for mu is 0.2414
The t statistic is 1.8519
The p value for the t-test is 0.0323
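
A single sample cannot show whether the test keeps its nominal level, so a small Monte Carlo experiment may help. The sketch below is an illustration under stated assumptions: the helper `rejection_rate` is hypothetical (it is not part of SciPy), the significance level of 0.05, the 10,000 replications, and the seed are arbitrary, and the exponential case with $n = 5$ is included only for contrast. In each replication the null hypothesis is true, so the rejection rate estimates the type I error of the one-sided test.

```python
import numpy as np
import scipy.stats as stats

def rejection_rate(draw, true_mean, n, n_rep=10_000, alpha=0.05):
    """Fraction of replications in which the one-sided t-test rejects a true H_0.

    `draw(size)` is a hypothetical sampler returning an array of the given shape.
    """
    X = draw((n_rep, n))                 # one simulated sample per row
    s = X.std(axis=1, ddof=1)            # sample standard deviation of each row
    t = (X.mean(axis=1) - true_mean) / (s / np.sqrt(n))
    p = stats.t.sf(t, df=n - 1)          # one-sided p values, alternative mu > mu_0
    return np.mean(p < alpha)

rng = np.random.default_rng(2)  # fixed seed, chosen only for reproducibility

# uniform on [-4.9, 5.1): non-normal but symmetric, true mean 0.1, large sample
rate_uniform = rejection_rate(lambda size: 10 * rng.random(size) - 4.9, 0.1, n=500)

# exponential with mean 1: strongly skewed, very small sample, shown for contrast
rate_expon_small = rejection_rate(lambda size: rng.exponential(1.0, size), 1.0, n=5)

print(f"uniform, n = 500:   rejection rate = {rate_uniform:.4f}")
print(f"exponential, n = 5: rejection rate = {rate_expon_small:.4f}")
```

For the large uniform sample the rate should be close to the nominal 0.05; for the small skewed sample it may drift away from it, which is the sense in which the robustness depends on the sample size.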
