---
layout: proof
mathjax: true

author: "Joram Soch"
affiliation: "BCCN Berlin"
e_mail: "joram.soch@bccn-berlin.de"
date: 2021-03-12 01:20:00 -0800

title: "Two-sample t-test for independent observations"
chapter: "Statistical Models"
section: "Univariate normal data"
topic: "Univariate Gaussian"
theorem: "Two-sample t-test"

sources:

proof_id: "P205"
shortcut: "ug-ttest2"
username: "JoramSoch"
---
Theorem: Let

$$ \label{eq:ug} \begin{split} y_{1i} &\sim \mathcal{N}(\mu_1, \sigma^2), \quad i = 1, \ldots, n_1 \\ y_{2i} &\sim \mathcal{N}(\mu_2, \sigma^2), \quad i = 1, \ldots, n_2 \end{split} $$

be two univariate Gaussian data sets representing two groups of unequal size $n_1$ and $n_2$, but with equal unknown variance $\sigma^2$. Then, the test statistic

$$ \label{eq:t} t = \frac{(\bar{y}_1-\bar{y}_2)-\mu_\Delta}{s_p \cdot \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} $$

with sample means $\bar{y}_1$ and $\bar{y}_2$ and pooled standard deviation $s_p$ follows a Student's t-distribution with $n_1+n_2-2$ degrees of freedom

$$ \label{eq:t-dist} t \sim \mathrm{t}(n_1+n_2-2) $$

under the null hypothesis

$$ \label{eq:ttest2-h0} H_0: \; \mu_1 - \mu_2 = \mu_\Delta \; . $$
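As a numerical sanity check, the test statistic from the theorem can be computed directly from two samples. The following sketch is illustrative only: the function name `two_sample_t` and the example data are assumptions, not part of the source.

```python
import math

def two_sample_t(y1, y2, mu_delta=0.0):
    """Pooled two-sample t-statistic, assuming equal variances (hypothetical helper)."""
    n1, n2 = len(y1), len(y2)
    ybar1, ybar2 = sum(y1) / n1, sum(y2) / n2
    # unbiased sample variances of the two groups
    s2_1 = sum((y - ybar1) ** 2 for y in y1) / (n1 - 1)
    s2_2 = sum((y - ybar2) ** 2 for y in y2) / (n2 - 1)
    # pooled standard deviation with n1 + n2 - 2 degrees of freedom
    s_p = math.sqrt(((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / (n1 + n2 - 2))
    return ((ybar1 - ybar2) - mu_delta) / (s_p * math.sqrt(1 / n1 + 1 / n2))

print(two_sample_t([1, 2, 3, 4], [2, 4, 6]))  # ≈ -1.218
```

Swapping the two groups flips the sign of the statistic, as expected from the numerator $(\bar{y}_1-\bar{y}_2)$.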
Proof: The sample means are given by
$$ \label{eq:mean-samp} \begin{split} \bar{y}_1 &= \frac{1}{n_1} \sum_{i=1}^{n_1} y_{1i} \\ \bar{y}_2 &= \frac{1}{n_2} \sum_{i=1}^{n_2} y_{2i} \end{split} $$
and the pooled standard deviation is given by

$$ \label{eq:std-pool} s_p = \sqrt{\frac{(n_1-1) s^2_1 + (n_2-1) s^2_2}{n_1+n_2-2}} $$

with the sample variances
$$ \label{eq:var-samp} \begin{split} s^2_1 &= \frac{1}{n_1-1} \sum_{i=1}^{n_1} (y_{1i} - \bar{y}_1)^2 \\ s^2_2 &= \frac{1}{n_2-1} \sum_{i=1}^{n_2} (y_{2i} - \bar{y}_2)^2 \; . \end{split} $$
Using the linear combination formula for normal random variables, the sample means follow normal distributions with the following parameters:
$$ \label{eq:mean-samp-dist} \begin{split} \bar{y}_1 &= \frac{1}{n_1} \sum_{i=1}^{n_1} y_{1i} \sim \mathcal{N}\left( \frac{1}{n_1} n_1 \mu_1, \left(\frac{1}{n_1}\right)^2 n_1 \sigma^2 \right) = \mathcal{N}\left( \mu_1, \sigma^2/n_1 \right) \\ \bar{y}_2 &= \frac{1}{n_2} \sum_{i=1}^{n_2} y_{2i} \sim \mathcal{N}\left( \frac{1}{n_2} n_2 \mu_2, \left(\frac{1}{n_2}\right)^2 n_2 \sigma^2 \right) = \mathcal{N}\left( \mu_2, \sigma^2/n_2 \right) \; . \end{split} $$
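The distributional statement above, $\bar{y} \sim \mathcal{N}(\mu, \sigma^2/n)$, can be checked by Monte Carlo simulation. This is a minimal sketch; the values of $\mu$, $\sigma$ and $n$ are arbitrary illustrative choices, not taken from the proof.

```python
import random
import statistics

random.seed(1)
mu, sigma, n = 1.0, 2.0, 5   # arbitrary illustrative parameters
# draw 20000 sample means, each over n i.i.d. N(mu, sigma^2) observations
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(20000)]
print(statistics.fmean(means))      # close to mu = 1.0
print(statistics.variance(means))   # close to sigma**2 / n = 0.8
```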
Again employing the linear combination theorem and applying the null hypothesis from \eqref{eq:ttest2-h0}, the distribution of the standardized mean difference $Z$ is obtained as
$$ \label{eq:Z-dist} Z = \frac{(\bar{y}_1-\bar{y}_2)-\mu_\Delta}{\sigma \cdot \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} \sim \mathcal{N}\left( \frac{(\mu_1-\mu_2)-\mu_\Delta}{\sigma \cdot \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}, \left(\frac{1}{\sigma \cdot \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\right)^2 \left( \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2} \right) \right) \overset{H_0}{=} \mathcal{N}\left( 0, 1 \right) \; . $$
Because sample variances calculated from independent normal random variables follow a chi-squared distribution, the distribution of the pooled variance term $V$ is

$$ \label{eq:V-dist} V = \frac{(n_1-1) s^2_1 + (n_2-1) s^2_2}{\sigma^2} \sim \chi^2(n_1+n_2-2) \; . $$
Finally, since the ratio of a standard normal random variable and the square root of an independent chi-squared random variable divided by its degrees of freedom follows a t-distribution, the distribution of the test statistic is given by
$$ \label{eq:t-dist-qed} t = \frac{(\bar{y}_1-\bar{y}_2)-\mu_\Delta}{s_p \cdot \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} = \frac{Z}{\sqrt{V / (n_1+n_2-2)}} \sim \mathrm{t}(n_1+n_2-2) \; . $$
This means that the null hypothesis can be rejected when $t$ is as extreme or more extreme than the critical value obtained from the $\mathrm{t}(n_1+n_2-2)$ distribution using a significance level $\alpha$.
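The rejection rule can be illustrated by simulation under a true null hypothesis: with $\mu_\Delta = 0$ and the small group sizes below (all values are illustrative assumptions), roughly 95% of simulated t-statistics should fall inside the two-tailed 5% critical value of $\mathrm{t}(5)$, which is approximately 2.571.

```python
import math
import random

random.seed(42)
n1, n2 = 4, 3                 # illustrative group sizes
df = n1 + n2 - 2              # = 5 degrees of freedom

def t_stat(y1, y2):
    # pooled two-sample t-statistic under H0: mu1 - mu2 = 0
    m1, m2 = sum(y1) / n1, sum(y2) / n2
    s2_1 = sum((y - m1) ** 2 for y in y1) / (n1 - 1)
    s2_2 = sum((y - m2) ** 2 for y in y2) / (n2 - 1)
    s_p = math.sqrt(((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / df)
    return (m1 - m2) / (s_p * math.sqrt(1 / n1 + 1 / n2))

# simulate both groups from the same normal distribution, so H0 holds
ts = [t_stat([random.gauss(0, 1) for _ in range(n1)],
             [random.gauss(0, 1) for _ in range(n2)])
      for _ in range(50000)]
crit = 2.5706                 # 97.5% quantile of t(5)
coverage = sum(abs(t) <= crit for t in ts) / len(ts)
print(coverage)               # close to 0.95
```

Under the null hypothesis the test then falsely rejects in about 5% of simulations, matching the nominal significance level.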