---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2021-03-12 01:20:00 -0800

title: "Two-sample t-test for independent observations"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Univariate Gaussian"
theorem: "Two-sample t-test"

sources:

proof_id: "P205"
shortcut: "ug-ttest2"
username: "JoramSoch"
---
Theorem: Let

$$ \label{eq:ug} \begin{split} y_{1i} &\sim \mathcal{N}(\mu_1, \sigma^2), \quad i = 1, \ldots, n_1 \\ y_{2i} &\sim \mathcal{N}(\mu_2, \sigma^2), \quad i = 1, \ldots, n_2 \end{split} $$

be two univariate Gaussian data sets representing two groups of unequal size $n_1$ and $n_2$, but with equal unknown variance $\sigma^2$. Then, the test statistic

$$ \label{eq:t} t = \frac{(\bar{y}_1-\bar{y}_2)-\mu_\Delta}{s_p \cdot \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} $$

with sample means $\bar{y}_1$ and $\bar{y}_2$ and pooled standard deviation $s_p$ follows a Student's t-distribution with $n_1+n_2-2$ degrees of freedom

$$ \label{eq:t-dist} t \sim \mathrm{t}(n_1+n_2-2) $$

under the null hypothesis

$$ \label{eq:ttest2-h0} H_0: \; \mu_1 - \mu_2 = \mu_\Delta \; . $$
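As a numerical sanity check, the test statistic from the theorem can be computed directly from two samples. The following sketch is illustrative only: the function name `two_sample_t` and the example data are assumptions, not part of the source.

```python
import math

def two_sample_t(y1, y2, mu_delta=0.0):
    """Pooled two-sample t-statistic, assuming equal variances (hypothetical helper)."""
    n1, n2 = len(y1), len(y2)
    ybar1, ybar2 = sum(y1) / n1, sum(y2) / n2
    # unbiased sample variances of the two groups
    s2_1 = sum((y - ybar1) ** 2 for y in y1) / (n1 - 1)
    s2_2 = sum((y - ybar2) ** 2 for y in y2) / (n2 - 1)
    # pooled standard deviation with n1 + n2 - 2 degrees of freedom
    s_p = math.sqrt(((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2))
    return ((ybar1 - ybar2) - mu_delta) / (s_p * math.sqrt(1 / n1 + 1 / n2))

print(two_sample_t([1, 2, 3, 4], [2, 4, 6]))  # ≈ -1.218
```

Swapping the two groups flips the sign of the statistic, as expected from the numerator $(\bar{y}_1-\bar{y}_2)$.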
Proof: The sample means are given by
$$ \label{eq:mean-samp} \begin{split} \bar{y}_1 &= \frac{1}{n_1} \sum_{i=1}^{n_1} y_{1i} \\ \bar{y}_2 &= \frac{1}{n_2} \sum_{i=1}^{n_2} y_{2i} \end{split} $$
and the pooled standard deviation is given by

$$ \label{eq:std-pool} s_p = \sqrt{\frac{(n_1-1) s^2_1 + (n_2-1) s^2_2}{n_1+n_2-2}} $$

with the sample variances
$$ \label{eq:var-samp} \begin{split} s^2_1 &= \frac{1}{n_1-1} \sum_{i=1}^{n_1} (y_{1i} - \bar{y}_1)^2 \\ s^2_2 &= \frac{1}{n_2-1} \sum_{i=1}^{n_2} (y_{2i} - \bar{y}_2)^2 \; . \end{split} $$
Using the linear combination formula for normal random variables, the sample means follow normal distributions with the following parameters:
$$ \label{eq:mean-samp-dist} \begin{split} \bar{y}_1 &= \frac{1}{n_1} \sum_{i=1}^{n_1} y_{1i} \sim \mathcal{N}\left( \frac{1}{n_1} n_1 \mu_1, \left(\frac{1}{n_1}\right)^2 n_1 \sigma^2 \right) = \mathcal{N}\left( \mu_1, \sigma^2/n_1 \right) \\ \bar{y}_2 &= \frac{1}{n_2} \sum_{i=1}^{n_2} y_{2i} \sim \mathcal{N}\left( \frac{1}{n_2} n_2 \mu_2, \left(\frac{1}{n_2}\right)^2 n_2 \sigma^2 \right) = \mathcal{N}\left( \mu_2, \sigma^2/n_2 \right) \; . \end{split} $$
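The distributional statement above, $\bar{y} \sim \mathcal{N}(\mu, \sigma^2/n)$, can be checked by Monte Carlo simulation. This is a minimal sketch; the values of $\mu$, $\sigma$ and $n$ are arbitrary illustrative choices, not taken from the proof.

```python
import random
import statistics

random.seed(1)
mu, sigma, n = 1.0, 2.0, 5   # arbitrary illustrative parameters
# draw 20000 sample means, each over n i.i.d. N(mu, sigma^2) observations
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(20000)]
print(statistics.fmean(means))      # close to mu = 1.0
print(statistics.variance(means))   # close to sigma**2 / n = 0.8
```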
Again employing the linear combination theorem and applying the null hypothesis from \eqref{eq:ttest2-h0}, the distribution of the standardized mean difference $Z$ is obtained as
$$ \label{eq:Z-dist} Z = \frac{(\bar{y}_1-\bar{y}_2)-\mu_\Delta}{\sigma \cdot \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \sim \mathcal{N}\left( \frac{(\mu_1-\mu_2)-\mu_\Delta}{\sigma \cdot \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}, \left(\frac{1}{\sigma \cdot \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\right)^2 \left( \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2} \right) \right) \overset{H_0}{=} \mathcal{N}\left( 0, 1 \right) \; . $$
Because sample variances calculated from independent normal random variables follow a chi-squared distribution, the distribution of the pooled variance term $V$ is

$$ \label{eq:V-dist} V = \frac{(n_1-1) s^2_1 + (n_2-1) s^2_2}{\sigma^2} \sim \chi^2(n_1+n_2-2) \; . $$
Finally, since the ratio of a standard normal random variable and the square root of an independent chi-squared random variable divided by its degrees of freedom follows a t-distribution, the distribution of the test statistic is given by
$$ \label{eq:t-dist-qed} t = \frac{(\bar{y}_1-\bar{y}_2)-\mu_\Delta}{s_p \cdot \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} = \frac{Z}{\sqrt{V / (n_1+n_2-2)}} \sim \mathrm{t}(n_1+n_2-2) \; . $$
This means that the null hypothesis can be rejected when $t$ is as extreme or more extreme than the critical value obtained from the $\mathrm{t}(n_1+n_2-2)$ distribution using a significance level $\alpha$.
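The rejection rule can be illustrated by simulation under a true null hypothesis: with $\mu_\Delta = 0$ and the small group sizes below (all values are illustrative assumptions), roughly 95% of simulated t-statistics should fall inside the two-tailed 5% critical value of $\mathrm{t}(5)$, which is approximately 2.571.

```python
import math
import random

random.seed(42)
n1, n2 = 4, 3                 # illustrative group sizes
df = n1 + n2 - 2              # = 5 degrees of freedom

def t_stat(y1, y2):
    # pooled two-sample t-statistic under H0: mu1 - mu2 = 0
    m1, m2 = sum(y1) / n1, sum(y2) / n2
    s2_1 = sum((y - m1) ** 2 for y in y1) / (n1 - 1)
    s2_2 = sum((y - m2) ** 2 for y in y2) / (n2 - 1)
    s_p = math.sqrt(((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / df)
    return (m1 - m2) / (s_p * math.sqrt(1 / n1 + 1 / n2))

# simulate both groups from the same normal distribution, so H0 holds
ts = [t_stat([random.gauss(0, 1) for _ in range(n1)],
             [random.gauss(0, 1) for _ in range(n2)])
      for _ in range(50000)]
crit = 2.5706                 # 97.5% quantile of t(5)
coverage = sum(abs(t) <= crit for t in ts) / len(ts)
print(coverage)               # close to 0.95
```

Under the null hypothesis the test then falsely rejects in about 5% of simulations, matching the nominal significance level.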