
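The two-sample t-test compares the means of two independent samples when the population standard deviations are not known and the samples are relatively small. For a two-sided test, the hypotheses are: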

$$
H_0: \mu_1 = \mu_2\\
H_a: \mu_1 \neq \mu_2
$$

## Q: Are the variances of the two samples equal?
## Yes:
$$
t_{cal} = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
$$
$$
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{df}
$$
$$
df = n_1 + n_2 - 2
$$
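
As a rough check on the pooled-variance formulas above, the statistic can be computed by hand with the standard library. This is only a sketch: the function name `pooled_t_statistic` is made up for illustration, and the data are the two machine samples used in the equal-variance cell of this notebook.

```python
import math
import statistics


def pooled_t_statistic(sample1, sample2):
    """Return (t, df) for the equal-variance two-sample t-test.

    Hypothetical helper written for illustration; it follows the
    pooled-variance formulas term by term.
    """
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    # statistics.variance is the sample variance (divides by n - 1)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)

    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


m1 = [150, 152, 154, 152, 151]
m2 = [156, 155, 158, 155, 154]

t_cal, df = pooled_t_statistic(m1, m2)
print(round(t_cal, 4), df)  # -4.0056 8
```

The result should agree with the statistic that `stats.ttest_ind(..., equal_var=True)` reports for the same data.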

## No:

$$
t_{cal} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
$$
$$
df = \left\lfloor \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(\frac{s_1^2}{n_1}\right)^2}{n_1-1} + \frac{\left(\frac{s_2^2}{n_2}\right)^2}{n_2-1}} \right\rfloor
$$
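
The unequal-variance formulas can be sketched the same way. The helper name `welch_t_statistic` is hypothetical, and the data are the samples from the unequal-variance cell of this notebook. The function returns both the unrounded degrees of freedom and the floored value from the formula above. As far as I know, SciPy's `ttest_ind(..., equal_var=False)` works with the unrounded value, so a p-value read from a t-table at the floored df may differ slightly from SciPy's.

```python
import math
import statistics


def welch_t_statistic(sample1, sample2):
    """Return (t, df_exact, df_floor) for the unequal-variance t-test.

    Hypothetical helper written for illustration; it follows the
    formulas above term by term.
    """
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    # Per-sample contribution to the squared standard error: s^2 / n
    v1 = statistics.variance(sample1) / n1
    v2 = statistics.variance(sample2) / n2

    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df_exact = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df_exact, math.floor(df_exact)


m1 = [150, 152, 154, 152, 151]
m3 = [144, 162, 177, 150, 140]

t_cal, df_exact, df_floor = welch_t_statistic(m1, m3)
print(round(t_cal, 4), round(df_exact, 3), df_floor)  # -0.4146 4.078 4
```

Note how far the degrees of freedom fall below the pooled value of 8: with one very noisy sample, the test has roughly the information of the smaller, noisier group alone.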

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
from statsmodels.stats import weightstats

sns.set_style("darkgrid")

In [4]:
# EQUAL VARIANCES (approximately)

m1 = [150, 152, 154, 152, 151]
m2 = [156, 155, 158, 155, 154]

# e.g. volume of perfume produced by two machines: is there a statistically significant difference in mean volume?

In [7]:
stats.ttest_ind(m1, m2, equal_var=True)

# p ≈ 0.0039 < 0.05, so we reject the null: the mean outputs of the two machines differ significantly

Ttest_indResult(statistic=-4.005551702879929, pvalue=0.003919295477128331)

In [8]:
# UNEQUAL VARIANCES

m1 = [150, 152, 154, 152, 151]
m3 = [144, 162, 177, 150, 140]

In [9]:
stats.ttest_ind(m1, m3, equal_var=False)

# p ≈ 0.70 > 0.05, so we fail to reject the null: no evidence that the two means differ

Ttest_indResult(statistic=-0.4146442144313621, pvalue=0.699289145758865)
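
The labels "equal variances (approximately)" and "unequal variances" on the samples above can be sanity-checked by comparing the sample variances directly. The sketch below uses only the standard library. It is an informal eyeball check on the three samples, not a formal test of equal variances.

```python
import statistics

samples = {
    "m1": [150, 152, 154, 152, 151],
    "m2": [156, 155, 158, 155, 154],
    "m3": [144, 162, 177, 150, 140],
}

# Sample variance (n - 1 in the denominator) for each machine
variances = {name: statistics.variance(data) for name, data in samples.items()}

for name, var in variances.items():
    print(name, round(var, 2))
# m1 2.2
# m2 2.3
# m3 225.8
```

The variances of `m1` and `m2` are nearly identical, which supports the pooled test, while `m3` has a variance roughly a hundred times larger than `m1`, which is why `equal_var=False` is used for that comparison.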