## 3.4 Comparing Means from Different Populations

- Suppose you are interested in the means of two different populations, denoted $\mu_1$ and $\mu_2$.
- More specifically, you want to know whether these population means differ from each other, and you plan to test this with a hypothesis test based on independent sample data from both populations.
- Let $d_0$ denote the hypothesized difference in means (so $d_0 = 0$ when the null hypothesis is that the means are equal). A suitable pair of hypotheses is
\begin{equation}
H_0: \mu_1 - \mu_2 = d_0 \ \ \text{vs.} \ \ H_1: \mu_1 - \mu_2 \neq d_0
\end{equation}
- Using the two-sample t-test, $H_0$ can be tested with the t-statistic
\begin{equation}
t=\frac{(\overline{Y}_1 - \overline{Y}_2) - d_0}{SE(\overline{Y}_1 - \overline{Y}_2)}
\end{equation}
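where the standard error is computed from the sample variances $s_1^2, s_2^2$ and the sample sizes $n_1, n_2$ as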
\begin{equation}
SE(\overline{Y}_1 - \overline{Y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.
\end{equation}

- For large $n_1$ and $n_2$, the above t-statistic is approximately standard normal under the null hypothesis.
- Analogously to the one-sample t-test, we can compute confidence intervals for the true difference in population means. A 95% confidence interval for $\mu_1 - \mu_2$ is:

\begin{equation}
(\overline{Y}_1 - \overline{Y}_2) \pm 1.96 \times SE(\overline{Y}_1 - \overline{Y}_2)
\end{equation}
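The formulas above can be checked by hand. The following is a minimal sketch, not part of HypothesisTests: `two_sample_t` is a hypothetical helper name, and the two tiny samples are made up purely to keep the arithmetic easy to follow (with only five observations each, the normal approximation would not be trusted in practice).

```julia
using Statistics  # standard library: mean, var

# Hypothetical helper (not from any package): large-sample two-sample t-test.
function two_sample_t(y1, y2; d0 = 0.0, z = 1.96)
    diff = mean(y1) - mean(y2)                              # point estimate of μ₁ - μ₂
    se = sqrt(var(y1) / length(y1) + var(y2) / length(y2))  # SE from the sample variances
    t = (diff - d0) / se                                    # t-statistic under H₀
    ci = (diff - z * se, diff + z * se)                     # 95% interval when z = 1.96
    return (diff = diff, se = se, t = t, ci = ci, reject = abs(t) > z)
end

# Made-up samples, used only to illustrate the arithmetic.
y1 = [12.0, 15.0, 11.0, 14.0, 13.0]
y2 = [9.0, 10.0, 8.0, 11.0, 12.0]

result = two_sample_t(y1, y2)
println(result)
```

For these numbers the sample means are 13 and 10, both sample variances are 2.5, so the standard error is 1, the t-statistic is 3, and the interval is roughly (1.04, 4.96).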

- In Julia, we can perform this unequal-variance two-sample t-test (sometimes known as Welch's t-test) using the function UnequalVarianceTTest() from the HypothesisTests package. Given two samples x and y, it tests the null hypothesis that x and y come from distributions with equal means against the alternative hypothesis that the distributions have different means.

In [1]:
using HypothesisTests

random_data_1 = vec(rand(range(0, 20), 1, 100)) # vec flattens the 1×100 matrix into a vector, as UnequalVarianceTTest requires
random_data_2 = vec(rand(range(0, 20), 1, 100)) # drawn from the same range, so both population means are equal

UnequalVarianceTTest(random_data_1, random_data_2)

Two sample t-test (unequal variance)
------------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          0.5099999999999998
    95% confidence interval: (-1.2509, 2.2709)

Test summary:
    outcome with 95% confidence: fail to reject h_0
    two-sided p-value:           0.5686

Details:
    number of observations:   [100,100]
    t-statistic:              0.5711402777721202
    degrees of freedom:       197.67829223239636
    empirical standard error: 0.8929505059411782


- Here the two-sample t-test does not reject the (true) null hypothesis that $\mu_1 - \mu_2 = d_0 = 0$.
- This is expected: both samples are drawn from the same distribution (integers chosen uniformly from 0 to 20), so the population means are equal and the sample means are similar.
- If we instead draw the two samples from different ranges, the population means differ, and we should expect the sample means to be further apart and the null hypothesis to be rejected.

In [10]:
using HypothesisTests

random_data_1 = vec(rand(range(0, 25), 1, 100)) # integers from 0 to 25, population mean 12.5
random_data_2 = vec(rand(range(0, 15), 1, 100)) # integers from 0 to 15, population mean 7.5

UnequalVarianceTTest(random_data_1, random_data_2)

Two sample t-test (unequal variance)
------------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          5.050000000000001
    95% confidence interval: (3.2571, 6.8429)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-6

Details:
    number of observations:   [100,100]
    t-statistic:              5.563188549589401
    degrees of freedom:       158.50289394887977
    empirical standard error: 0.9077528030885675
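The hypotheses at the start of this section allow a nonzero $d_0$. As a rough sketch, the hypothesized difference can be passed as a third positional argument to UnequalVarianceTTest(). The seed below is arbitrary, and the field names `xbar`, `stderr`, `t` and `μ0` are read from the test object and may differ between package versions.

```julia
using HypothesisTests, Random

Random.seed!(42)     # arbitrary seed, only for reproducibility
x = rand(0:25, 100)  # population mean 12.5
y = rand(0:15, 100)  # population mean 7.5

test0 = UnequalVarianceTTest(x, y)     # H₀: μ₁ - μ₂ = 0
test5 = UnequalVarianceTTest(x, y, 5)  # H₀: μ₁ - μ₂ = 5, the true difference here

println(pvalue(test0))  # p-value for the test of no difference
println(pvalue(test5))  # p-value for the test of a difference of 5
```

Both tests share the same point estimate and standard error; only the value subtracted in the numerator of the t-statistic changes. Since the true difference in population means is 5, the second null hypothesis is true and would usually not be rejected.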
