- Hypothesis testing
  - Parametric tests
    - Tests for a single normal population
    - Tests for two normal populations
  - Non-parametric tests
    - One sample
      - Chi-squared test ($\chi^2$ test)
      - Binomial test
      - One-sample K-S test
      - Wilcoxon signed-rank test
      - Runs test
    - Comparing two populations
      - Independent samples
        - Mann-Whitney U test
        - K-S test
        - Wald-Wolfowitz test
        - Moses extreme reaction test
      - Related samples
        - Sign test
        - McNemar test
        - Wilcoxon signed-rank test
        - Marginal homogeneity test
    - Comparing several populations
      - Independent samples
        - Median test
        - Kruskal-Wallis one-way ANOVA
        - Jonckheere-Terpstra test for ordered alternatives
      - Related samples
        - Friedman rank-sum test
        - Kendall's coefficient of concordance test
        - Cochran Q test
- Analysis of variance (ANOVA)

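# Hypothesis testing
State a hypothesis about the population and test it against sampled data. The conclusion is one of two outcomes: accept or reject the hypothesis.

Notation: $H_0$ is the null hypothesis and $H_1$ is the alternative hypothesis.

Examples: $H_0$: $p \geq 5\%$, $H_1$: $p < 5\%$

$H_0$: $\mu=\mu_0$, $H_1$: $\mu \neq \mu_0$

## Basic idea of a test
1. State a definite hypothesis $H$.
2. Fix an upper bound on the tolerable probability of wrongly rejecting $H$.
3. Under that bound, decide whether the evidence against $H$ is significant.
4. If the evidence against $H$ is not significant, accept $H$.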

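## Errors and risks
Decision | $H_0$ is true | $H_1$ is true
-------|---------|-------
Accept $H_0$ | correct | Type II error
Reject $H_0$ | Type I error | correct

Rejecting the null hypothesis may discard a true hypothesis, which is a Type I error.

Accepting the null hypothesis may retain a false hypothesis, which is a Type II error.

Type I risk: the probability of committing a Type I error.

Type II risk: the probability of committing a Type II error.

Principle 1: protect $H_0$, that is, keep the Type I risk no larger than a chosen level $\alpha$.

Principle 2: optimal test. Subject to the Type I risk being at most $\alpha$, make the Type II risk as small as possible.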
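The two risks can be estimated by simulation. The sketch below is only an illustration: it reuses a two-sided z-test setting ($\mu_0 = 2$, $\sigma = 0.2$, $n = 100$, $\alpha = 0.05$), and the alternative $\mu = 2.05$, the seed, and the number of trials are assumptions chosen for the demonstration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)  # arbitrary fixed seed (assumption)
alpha, mu_0, sigma, n, trials = 0.05, 2.0, 0.2, 100, 20000
critical = norm.ppf(1 - alpha / 2)  # two-sided critical value


def rejection_rate(true_mu):
    """Fraction of simulated samples whose z statistic lands in the rejection region."""
    samples = rng.normal(true_mu, sigma, size=(trials, n))
    z = (samples.mean(axis=1) - mu_0) / (sigma / np.sqrt(n))
    return np.mean(np.abs(z) >= critical)


# H0 true: every rejection is a Type I error
type_1_risk = rejection_rate(mu_0)
# H1 true (hypothetical mu = 2.05): every non-rejection is a Type II error
type_2_risk = 1 - rejection_rate(mu_0 + 0.05)

print('estimated Type I risk: {:.3f}'.format(type_1_risk))
print('estimated Type II risk: {:.3f}'.format(type_2_risk))
```

The estimated Type I risk should sit close to $\alpha$, while the Type II risk depends on how far the assumed alternative is from $\mu_0$.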

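Steps:
1. Assume $H_0$ is true and construct a test statistic.
2. Use the statistic to define an event (equivalently, the rejection region of $H_0$) that has small probability when $H_0$ is true.
3. Run the experiment and use the sample to check whether the small-probability event occurred. If it did, reject $H_0$.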


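# Parametric tests
## Single normal population

Example 1 (two-sided). Let $\mu_0 = 2$, $n = 100$, $\bar{X} = 1.978$, $\sigma = 0.2$, $\alpha = 0.05$. Is $\mu$ equal to $\mu_0$?

Solution. $H_0$: $\mu = \mu_0$, $H_1$: $\mu \neq \mu_0$

Since $\bar{X}$ fluctuates around $\mu$, reject $H_0$ when $|\bar{X} - \mu_0| \geq C$ and accept $H_0$ when $|\bar{X} - \mu_0| < C$.

The event $|\frac{\bar{X} - \mu_0}{\frac{\sigma}{\sqrt{n}}}| \geq \mu_{1 - \frac{\alpha}{2}}$, under which $H_0$ is rejected, is called the rejection region, where $\mu_{1 - \frac{\alpha}{2}}$ is the $1 - \frac{\alpha}{2}$ quantile of the standard normal distribution. Equivalently, compute the two-sided tail probability (the p-value) of the observed
$|\frac{\bar{X} - \mu_0}{\frac{\sigma}{\sqrt{n}}}|$ and compare it with $\alpha$.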

In [1]:
import numpy as np
from scipy.stats import norm

f_norm = norm()
alpha = 0.05
x_bar = 1.978
sigma = 0.2
mu_0 = 2
n = 100
mu_975 = norm.ppf(0.975)

threshold = np.abs((x_bar - mu_0)/(sigma/np.sqrt(n)))
print('threshold is: {:.2f}'.format(threshold))
print('threshold >= mu_975: {}'.format(threshold >= mu_975))

p_value = 2 * (1 - norm.cdf(threshold))  # two-sided tail probability of |Z|
print('p_value: {:.2f}'.format(p_value))
print('p_value < alpha: {}'.format(p_value < alpha))

print('therefore, do not reject H0')

threshold is: 1.10
threshold >= mu_975: False
p_value: 0.27
p_value < alpha: False
therefore, do not reject H0


Example 2 (one-sided). Let $X \sim \mathcal{N}(\mu, 2^2)$ with $\mu_0 = 40$, $n = 25$, $\bar{x} = 41.25$, $\sigma = 2$, $\alpha = 0.05$. Is $\mu > \mu_0$?

Solution. Let $H_0$: $\mu \leq \mu_0 = 40$, $H_1$: $\mu > \mu_0$

Analysis: if $H_0$ holds, $\bar{X} - \mu_0$ tends to be at most 0.

Reject $H_0$ when $\bar{x} - \mu_0 > C$ and accept $H_0$ when $\bar{x} - \mu_0 \leq C$.

Because $\mu \leq \mu_0$, we have $\mu_0 - \mu \geq 0$, so $\bar{X} - \mu_0 \leq \bar{X} - \mu$.

Therefore $$P\{\frac{\bar{X} - \mu_0}{\frac{\sigma}{\sqrt{n}}} > \frac{C}{\frac{\sigma}{\sqrt{n}}}| \mu \leq \mu_0 \} \leq P\{\frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}} > \frac{C}{\frac{\sigma}{\sqrt{n}}}| \mu \leq \mu_0 \} = \alpha$$

Setting $\frac{C}{\frac{\sigma}{\sqrt{n}}} = \mu_{1-\alpha}$ gives the rejection region
$$\frac{\bar{X} - \mu_0}{\frac{\sigma}{\sqrt{n}}} \geq \mu_{1-\alpha}$$

In [3]:
f_norm = norm()
alpha = 0.05
x_bar = 41.25
sigma = 2
mu_0 = 40
n = 25
mu_95 = norm.ppf(1-alpha)

threshold = (x_bar - mu_0)/(sigma/np.sqrt(n))
print('threshold is: {:.2f}'.format(threshold))
print('threshold >= mu_95: {}'.format(threshold >= mu_95))

p_value = 1 - norm.cdf(threshold)
print('p_value: {:.2f}'.format(p_value))
print('p_value < alpha: {}'.format(p_value < alpha))

print('therefore, reject H0')

threshold is: 3.12
threshold >= mu_95: True
p_value: 0.00
p_value < alpha: True
therefore, reject H0


### Test of the mean (variance known)
Population $X \sim \mathcal{N}(\mu, \sigma^2)$, $\frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}} \sim \mathcal{N}(0, 1)$. In the p-value formulas, threshold is the observed value of the test statistic.
1. $H_0$: $\mu = \mu_0$, $H_1$: $\mu \neq \mu_0$, rejection region $|\frac{\bar{X} - \mu_0}{\frac{\sigma}{\sqrt{n}}}| \geq \mu_{1 - \frac{\alpha}{2}}$, p_value = 2 * (1 - norm.cdf(abs(threshold))), Type I risk = $\alpha$
2. $H_0$: $\mu \leq \mu_0$, $H_1$: $\mu > \mu_0$, rejection region $\frac{\bar{X} - \mu_0}{\frac{\sigma}{\sqrt{n}}} \geq \mu_{1 - \alpha}$, p_value = 1 - norm.cdf(threshold), Type I risk $\leq \alpha$
3. $H_0$: $\mu \geq \mu_0$, $H_1$: $\mu < \mu_0$, rejection region $\frac{\bar{X} - \mu_0}{\frac{\sigma}{\sqrt{n}}} \leq \mu_{\alpha}$, p_value = norm.cdf(threshold), Type I risk $\leq \alpha$
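The three cases above can be collected into one helper. This is a minimal sketch, not a library API: the function name `z_test_mean` and its `alternative` argument are hypothetical, and the two calls reuse the numbers from Example 1 and Example 2.

```python
import numpy as np
from scipy.stats import norm


def z_test_mean(x_bar, mu_0, sigma, n, alternative='two-sided'):
    """Return (z statistic, p-value) for a test of the mean with known sigma."""
    z = (x_bar - mu_0) / (sigma / np.sqrt(n))
    if alternative == 'two-sided':    # H1: mu != mu_0
        p_value = 2 * norm.sf(abs(z))
    elif alternative == 'greater':    # H1: mu > mu_0
        p_value = norm.sf(z)
    elif alternative == 'less':       # H1: mu < mu_0
        p_value = norm.cdf(z)
    else:
        raise ValueError('unknown alternative: {}'.format(alternative))
    return z, p_value


z_1, p_1 = z_test_mean(1.978, 2, 0.2, 100)             # Example 1, two-sided
z_2, p_2 = z_test_mean(41.25, 40, 2, 25, 'greater')    # Example 2, right-tailed
print('example 1: z = {:.2f}, p = {:.4f}'.format(z_1, p_1))
print('example 2: z = {:.2f}, p = {:.4f}'.format(z_2, p_2))
```

`norm.sf` is the survival function `1 - cdf`, which is numerically more accurate in the far tail.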

### Test of the mean (variance unknown, one-sample t-test)
$\frac{\bar{X} - \mu}{\frac{S}{\sqrt{n}}} \sim t(n - 1)$
1. $H_0$: $\mu = \mu_0$, $H_1$: $\mu \neq \mu_0$, rejection region $|\frac{\bar{X} - \mu_0}{\frac{S}{\sqrt{n}}}| \geq t_{1 - \frac{\alpha}{2}}(n - 1)$, p_value = 2 * (1 - t.cdf(abs(threshold), n - 1)), Type I risk = $\alpha$
2. $H_0$: $\mu \leq \mu_0$, $H_1$: $\mu > \mu_0$, rejection region $\frac{\bar{X} - \mu_0}{\frac{S}{\sqrt{n}}} \geq t_{1 - \alpha}(n - 1)$, p_value = 1 - t.cdf(threshold, n - 1), Type I risk $\leq \alpha$
3. $H_0$: $\mu \geq \mu_0$, $H_1$: $\mu < \mu_0$, rejection region $\frac{\bar{X} - \mu_0}{\frac{S}{\sqrt{n}}} \leq t_{\alpha}(n - 1)$, p_value = t.cdf(threshold, n - 1), Type I risk $\leq \alpha$
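A hand-rolled version of the three t-test cases might look like the sketch below. The helper name `t_test_mean` and the sample values are made up for illustration; the two-sided result is compared with `scipy.stats.ttest_1samp` as a sanity check.

```python
import numpy as np
from scipy.stats import t, ttest_1samp


def t_test_mean(X, mu_0, alternative='two-sided'):
    """Return (t statistic, p-value) for a test of the mean with unknown variance."""
    X = np.asarray(X, dtype=float)
    n = X.size
    stat = (X.mean() - mu_0) / (X.std(ddof=1) / np.sqrt(n))
    if alternative == 'two-sided':    # H1: mu != mu_0
        p_value = 2 * t.sf(abs(stat), n - 1)
    elif alternative == 'greater':    # H1: mu > mu_0
        p_value = t.sf(stat, n - 1)
    elif alternative == 'less':       # H1: mu < mu_0
        p_value = t.cdf(stat, n - 1)
    else:
        raise ValueError('unknown alternative: {}'.format(alternative))
    return stat, p_value


# made-up measurements, used only to exercise the function
sample = np.array([39.8, 40.1, 40.3, 39.9, 40.2, 40.0, 39.7, 40.4])
stat, p_two_sided = t_test_mean(sample, 40)
_, p_greater = t_test_mean(sample, 40, 'greater')
_, p_less = t_test_mean(sample, 40, 'less')
reference = ttest_1samp(sample, 40)   # two-sided by default

print('manual: t = {:.4f}, p = {:.4f}'.format(stat, p_two_sided))
print('scipy:  t = {:.4f}, p = {:.4f}'.format(reference.statistic, reference.pvalue))
```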

In [19]:
# use premade function for 2-sided test
from scipy.stats import norm, ttest_1samp
np.random.seed(100)

mu_0 = 40
mu = 40
sigma = 0.2
alpha = 0.05
X = norm.rvs(loc=mu, scale=sigma, size=50)
print(X.mean(), X.var(ddof=1))
result = ttest_1samp(X, mu_0)
print('t statistic: {:.02f}'.format(result[0]))
print('p_value: {:.2f}'.format(result[1]))
print('therefore, do not reject H0. There is no evidence that mu differs from mu_0')

39.99183985842521 0.03615898698684999
t statistic: -0.30
p_value: 0.76
therefore, do not reject H0. There is no evidence that mu differs from mu_0


### Test of the variance (mean unknown)
$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$. The test statistic replaces $\sigma^2$ with the hypothesized $\sigma_0^2$.
1. $H_0$: $\sigma^2 \leq \sigma_0^2$, $H_1$: $\sigma^2 > \sigma_0^2$, rejection region $\frac{(n-1)S^2}{\sigma_0^2} \geq \chi_{1-\alpha}^2(n-1)$, p_value = 1 - chi2.cdf(threshold, n - 1), Type I risk $\leq \alpha$
2. $H_0$: $\sigma^2 \geq \sigma_0^2$, $H_1$: $\sigma^2 < \sigma_0^2$, rejection region $\frac{(n-1)S^2}{\sigma_0^2} \leq \chi_{\alpha}^2(n - 1)$, p_value = chi2.cdf(threshold, n - 1), Type I risk $\leq \alpha$
3. $H_0$: $\sigma^2 = \sigma_0^2$, $H_1$: $\sigma^2 \neq \sigma_0^2$, rejection region $\frac{(n-1)S^2}{\sigma_0^2} \geq \chi_{1-\frac{\alpha}{2}}^2(n-1) \cup \frac{(n-1)S^2}{\sigma_0^2} \leq \chi_{\frac{\alpha}{2}}^2(n - 1)$, p_value = 2 * min(chi2.cdf(threshold, n - 1), 1 - chi2.cdf(threshold, n - 1)), Type I risk = $\alpha$

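In [None]:
# scipy.stats.chisquare is Pearson's goodness-of-fit test, not a variance test,
# so the statistic is computed by hand from the chi2 distribution
from scipy.stats import norm, chi2
np.random.seed(100)

mu = 40
sigma = 0.2
sigma_0 = 0.25
alpha = 0.05
n = 50
X = norm.rvs(loc=mu, scale=sigma, size=n)
print(X.mean(), X.var(ddof=1))
# left-tailed case: H0: sigma^2 >= sigma_0^2, H1: sigma^2 < sigma_0^2
threshold = (n - 1) * X.var(ddof=1) / sigma_0**2
p_value = chi2.cdf(threshold, n - 1)
print('chi2 statistic: {:.02f}'.format(threshold))
print('p_value: {:.2f}'.format(p_value))
print('reject H0' if p_value < alpha else 'do not reject H0')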

### Test of the variance (mean known)
$\frac{\sum_{i=1}^n(X_i - \mu)^2}{\sigma^2} \sim \chi^2(n)$. With $\mu$ known, no degree of freedom is spent on estimating it, so the distribution has $n$ degrees of freedom.
1. $H_0$: $\sigma^2 \leq \sigma_0^2$, $H_1$: $\sigma^2 > \sigma_0^2$, rejection region $\frac{\sum_{i=1}^n(X_i - \mu)^2}{\sigma_0^2} \geq \chi_{1-\alpha}^2(n)$, p_value = 1 - chi2.cdf(threshold, n), Type I risk $\leq \alpha$
2. $H_0$: $\sigma^2 \geq \sigma_0^2$, $H_1$: $\sigma^2 < \sigma_0^2$, rejection region $\frac{\sum_{i=1}^n(X_i - \mu)^2}{\sigma_0^2} \leq \chi_{\alpha}^2(n)$, p_value = chi2.cdf(threshold, n), Type I risk $\leq \alpha$
3. $H_0$: $\sigma^2 = \sigma_0^2$, $H_1$: $\sigma^2 \neq \sigma_0^2$, rejection region $\frac{\sum_{i=1}^n(X_i - \mu)^2}{\sigma_0^2} \geq \chi_{1-\frac{\alpha}{2}}^2(n) \cup \frac{\sum_{i=1}^n(X_i - \mu)^2}{\sigma_0^2} \leq \chi_{\frac{\alpha}{2}}^2(n)$, p_value = 2 * min(chi2.cdf(threshold, n), 1 - chi2.cdf(threshold, n)), Type I risk = $\alpha$
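Both variance tests (mean unknown and mean known) differ only in the sum of squares and the degrees of freedom, so one helper can cover them. The sketch below is an assumption-laden illustration: `variance_test` is a hypothetical name, the data are simulated, and the two-sided p-value uses the common convention of doubling the smaller tail.

```python
import numpy as np
from scipy.stats import chi2


def variance_test(X, sigma_0, mu=None, alternative='two-sided'):
    """Return (statistic, degrees of freedom, p-value) for a chi-squared variance test."""
    X = np.asarray(X, dtype=float)
    n = X.size
    if mu is None:
        # mean unknown: (n - 1) S^2 / sigma_0^2 with n - 1 degrees of freedom
        stat, df = (n - 1) * X.var(ddof=1) / sigma_0**2, n - 1
    else:
        # mean known: sum (X_i - mu)^2 / sigma_0^2 with n degrees of freedom
        stat, df = np.sum((X - mu)**2) / sigma_0**2, n
    lower, upper = chi2.cdf(stat, df), chi2.sf(stat, df)
    if alternative == 'greater':      # H1: sigma^2 > sigma_0^2
        p_value = upper
    elif alternative == 'less':       # H1: sigma^2 < sigma_0^2
        p_value = lower
    elif alternative == 'two-sided':  # H1: sigma^2 != sigma_0^2
        p_value = min(1.0, 2 * min(lower, upper))
    else:
        raise ValueError('unknown alternative: {}'.format(alternative))
    return stat, df, p_value


rng = np.random.default_rng(1)               # arbitrary seed (assumption)
X = rng.normal(40, 0.2, size=50)             # simulated sample
stat_u, df_u, p_u = variance_test(X, 0.25)             # mean unknown
stat_k, df_k, p_k = variance_test(X, 0.25, mu=40)      # mean known
print('mean unknown: stat = {:.2f}, df = {}, p = {:.4f}'.format(stat_u, df_u, p_u))
print('mean known:   stat = {:.2f}, df = {}, p = {:.4f}'.format(stat_k, df_k, p_k))
```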

## Two normal populations
### Test of the difference of means
#### Equal variances
$X \sim \mathcal{N}(\mu_1, \sigma^2)$, $Y \sim \mathcal{N}(\mu_2, \sigma^2)$, with $X$ and $Y$ independent
1. $H_0$: $\mu_1 = \mu_2$, $H_1$: $\mu_1 \neq \mu_2$. The rejection region is
$$\frac{\bar{X} - \bar{Y}}{S_\omega\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \leq t_{\frac{\alpha}{2}}(n_1 + n_2 -2 ) \cup \frac{\bar{X} - \bar{Y}}{S_\omega\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \geq t_{1 - \frac{\alpha}{2}}(n_1 + n_2 -2 )$$ where $S_\omega^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 -2}$
p_value = 2 * (1 - t.cdf(abs(threshold), n1 + n2 - 2))
2. $H_0$: $\mu_1 \leq \mu_2$, $H_1$: $\mu_1 > \mu_2$. The rejection region is
$$\frac{\bar{X} - \bar{Y}}{S_\omega\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \geq t_{1 - \alpha}(n_1 + n_2 -2 ) $$
p_value = 1 - t.cdf(threshold, n1 + n2 - 2)
3. $H_0$: $\mu_1 \geq \mu_2$, $H_1$: $\mu_1 < \mu_2$. The rejection region is
$$\frac{\bar{X} - \bar{Y}}{S_\omega\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \leq t_{\alpha}(n_1 + n_2 -2 ) $$
p_value = t.cdf(threshold, n1 + n2 - 2)
4. $H_0$: $\mu_1 - \mu_2 \geq \delta$, $H_1$: $\mu_1 - \mu_2 < \delta$. The rejection region is
$$\frac{\bar{X} - \bar{Y} - \delta}{S_\omega\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \leq t_{\alpha}(n_1 + n_2 -2 )$$
p_value = t.cdf(threshold, n1 + n2 - 2)
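Case 4, with a nonzero margin $\delta$, is easiest to handle by computing the pooled statistic directly from summary statistics. The following is a sketch under stated assumptions: `pooled_t_test` is a hypothetical helper, the summary statistics are the ones used in the cells of this section, and $\delta = -1$ is an arbitrary margin picked for illustration.

```python
import numpy as np
from scipy.stats import t


def pooled_t_test(x_bar, y_bar, s1_2, s2_2, n1, n2, delta=0.0, alternative='two-sided'):
    """Pooled-variance t-test of mu_1 - mu_2 against delta, from summary statistics."""
    df = n1 + n2 - 2
    s_omega_2 = ((n1 - 1) * s1_2 + (n2 - 1) * s2_2) / df   # pooled variance
    stat = (x_bar - y_bar - delta) / np.sqrt(s_omega_2 * (1 / n1 + 1 / n2))
    if alternative == 'two-sided':    # H1: mu_1 - mu_2 != delta
        p_value = 2 * t.sf(abs(stat), df)
    elif alternative == 'greater':    # H1: mu_1 - mu_2 > delta
        p_value = t.sf(stat, df)
    elif alternative == 'less':       # H1: mu_1 - mu_2 < delta
        p_value = t.cdf(stat, df)
    else:
        raise ValueError('unknown alternative: {}'.format(alternative))
    return stat, p_value


# delta = 0: the plain equal-variance comparison of the two means
stat_0, p_0 = pooled_t_test(76.23, 79.43, 3.325, 2.225, 10, 10)
# hypothetical margin delta = -1: H0: mu_1 - mu_2 >= -1, H1: mu_1 - mu_2 < -1
stat_d, p_d = pooled_t_test(76.23, 79.43, 3.325, 2.225, 10, 10,
                            delta=-1.0, alternative='less')
print('delta = 0:  t = {:.4f}, two-sided p = {:.6f}'.format(stat_0, p_0))
print('delta = -1: t = {:.4f}, left-tailed p = {:.6f}'.format(stat_d, p_d))
```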

In [4]:
# X and Y independent, sigma1 = sigma2; left-tailed test of H0: mu1 >= mu2
from scipy.stats import t

x_bar = 76.23
y_bar = 79.43
s1_2 = 3.325
s2_2 = 2.225
n1 = 10
n2 = 10
alpha = 0.05
s_omega_2 = ((n1 - 1)* s1_2 + (n2 - 1)*s2_2) / (n1 + n2 - 2)
threshold = (x_bar - y_bar) / (np.sqrt(s_omega_2) * np.sqrt(1/n1 + 1/n2))
f_t = t(df=n1+n2 -2)
t_18 = f_t.ppf(alpha)

print('threshold is: {:.2f}'.format(threshold))
print('threshold <= t_18: {}'.format(threshold <= t_18))

p_value = t.cdf(threshold, n1 + n2 - 2)
print('p_value: {:.2f}'.format(p_value))
print('p_value < alpha: {}'.format(p_value < alpha))
print('therefore, reject H0')

threshold is: -4.30
threshold <= t_18: True
p_value: 0.00
p_value < alpha: True
therefore, reject H0


In [29]:
from scipy.stats import ttest_ind, norm
np.random.seed(10)

x_bar = 76.23
y_bar = 79.43
s1_2 = 3.325
s2_2 = 2.225
n1 = 10
n2 = 10
alpha = 0.05
f_norm = norm()
X = norm.rvs(x_bar, np.sqrt(s1_2), size=n1)
Y = norm.rvs(y_bar, np.sqrt(s2_2), size=n2)
ttest_ind(X, Y, equal_var=True)

Ttest_indResult(statistic=-5.224632872788141, pvalue=5.7241658501511684e-05)

In [31]:
from scipy.stats import ttest_ind_from_stats
np.random.seed(10)

x_bar = 76.23
y_bar = 79.43
s1_2 = 3.325
s2_2 = 2.225
n1 = 10
n2 = 10
alpha = 0.05

ttest_ind_from_stats(mean1=x_bar, std1=np.sqrt(s1_2), nobs1=n1,
                     mean2=y_bar, std2=np.sqrt(s2_2), nobs2=n2,
                     equal_var=True)

Ttest_indResult(statistic=-4.295398753369759, pvalue=0.0004355175480216593)

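#### Unequal variances
$X \sim \mathcal{N}(\mu_1, \sigma_1^2), Y \sim \mathcal{N}(\mu_2, \sigma_2^2)$, with $X$ and $Y$ independent and equal sample sizes $n_1 = n_2 = n$

Pair the observations and let $Z_i = X_i - Y_i$, so that $Z \sim \mathcal{N}(\mu_1 - \mu_2, \sigma_1^2 + \sigma_2^2)$ and the problem reduces to a one-sample t-test on $\mu = \mu_1 - \mu_2$.

$H_0$: $\mu_1 = \mu_2 \to \mu = 0$, $H_1$: $\mu_1 \neq \mu_2 \to \mu \neq 0$

The rejection region is
$$|t| = |\frac{\bar{Z}}{\frac{S_z}{\sqrt{n}}}| \geq t_{1-\frac{\alpha}{2}}(n - 1)$$
p_value = 2 * (1 - t.cdf(abs(threshold), n - 1))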

In [35]:
from scipy.stats import t


X = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
Y = np.array([0.1, 0.21, 0.52, 0.32, 0.78, 0.59, 0.68, 0.77, 0.89])
Z = X - Y
alpha = 0.01
f_t = t(Z.size - 1)  # 9 pairs, so 8 degrees of freedom

threshold = np.abs(Z.mean() / (np.sqrt(Z.var(ddof=1) / Z.size)))
t_8 = f_t.ppf(1 - alpha/2)
print('threshold is: {:.2f}'.format(threshold))
print('threshold >= t_8: {}'.format(threshold >= t_8))
p_value = 2 * (1 - f_t.cdf(threshold))  # two-sided tail probability of |t|
print('p_value: {:.2f}'.format(p_value))
print('p_value < alpha: {}'.format(p_value < alpha))
print('therefore, do not reject H0')

threshold is: 1.47
threshold >= t_8: False
p_value: 0.18
p_value < alpha: False
therefore, do not reject H0


In [33]:
from scipy.stats import ttest_ind, norm
np.random.seed(10)

X = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
Y = np.array([0.1, 0.21, 0.52, 0.32, 0.78, 0.59, 0.68, 0.77, 0.89])

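# Welch's t-test: treats X and Y as unpaired independent samples with unequal
# variances, so its statistic differs from the differencing approach above
ttest_ind(X, Y, equal_var=False)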

Ttest_indResult(statistic=0.46352358236214464, pvalue=0.6492323448586885)
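The differencing approach $Z = X - Y$ is numerically the paired t-test, which SciPy exposes as `scipy.stats.ttest_rel`. The short sketch below, using the same two arrays, contrasts it with Welch's test; which of the two is appropriate depends on whether the observations are genuinely paired, which the notes do not state.

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_rel

X = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
Y = np.array([0.1, 0.21, 0.52, 0.32, 0.78, 0.59, 0.68, 0.77, 0.89])

# one-sample t-test on the differences X_i - Y_i (df = n - 1)
paired = ttest_rel(X, Y)
# Welch's test on the two samples without pairing (df from the Welch-Satterthwaite formula)
welch = ttest_ind(X, Y, equal_var=False)

print('paired: t = {:.4f}, p = {:.4f}'.format(paired.statistic, paired.pvalue))
print('Welch:  t = {:.4f}, p = {:.4f}'.format(welch.statistic, welch.pvalue))
```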

### Test of variances

When $\sigma_1^2 = \sigma_2^2$, $\frac{S_1^2}{S_2^2} \sim F(n_1 - 1, n_2 - 1)$

1. $H_0$: $\sigma_1^2 \leq \sigma_2^2$, $H_1$: $\sigma_1^2 > \sigma_2^2$, rejection region $\frac{S_1^2}{S_2^2} \geq F_{1-\alpha}(n_1 - 1, n_2 - 1)$
2. $H_0$: $\sigma_1^2 \geq \sigma_2^2$, $H_1$: $\sigma_1^2 < \sigma_2^2$, rejection region $\frac{S_1^2}{S_2^2} \leq F_{\alpha}(n_1 - 1, n_2 - 1)$
3. $H_0$: $\sigma_1^2 = \sigma_2^2$, $H_1$: $\sigma_1^2 \neq \sigma_2^2$, rejection region $\frac{S_1^2}{S_2^2} \geq F_{1-\frac{\alpha}{2}}(n_1 - 1, n_2 - 1) \cup \frac{S_1^2}{S_2^2} \leq F_{\frac{\alpha}{2}}(n_1 - 1, n_2 - 1)$

In [6]:
from scipy.stats import f


S_1_2 = 2500
S_2_2 = 400
n_1 = 10
n_2 = 12
alpha = 0.01
f_f = f(n_1 -1, n_2 - 1)
threshold = S_1_2/S_2_2
f_99 = f_f.ppf(1- alpha)
print('threshold is: {:.2f}'.format(threshold))
print('threshold >= f_99: {}'.format(threshold >= f_99))

p_value = 1 - f.cdf(threshold, n_1 - 1, n_2 - 1)  # degrees of freedom are n - 1 per sample
print('p_value: {:.2f}'.format(p_value))
print('p_value < alpha: {}'.format(p_value < alpha))
print('therefore, reject H0')

threshold is: 6.25
threshold >= f_99: True
p_value: 0.00
p_value < alpha: True
therefore, reject H0
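The three F-test cases listed in this subsection can be wrapped in a single helper. This is a sketch rather than a SciPy function: `variance_ratio_test` is a hypothetical name, the two-sided p-value doubles the smaller tail, and the example reuses the summary statistics from the cell above.

```python
from scipy.stats import f


def variance_ratio_test(s1_2, s2_2, n1, n2, alternative='two-sided'):
    """F-test comparing two normal variances from sample variances and sizes."""
    stat = s1_2 / s2_2
    dfn, dfd = n1 - 1, n2 - 1
    lower, upper = f.cdf(stat, dfn, dfd), f.sf(stat, dfn, dfd)
    if alternative == 'greater':      # H1: sigma_1^2 > sigma_2^2
        p_value = upper
    elif alternative == 'less':       # H1: sigma_1^2 < sigma_2^2
        p_value = lower
    elif alternative == 'two-sided':  # H1: sigma_1^2 != sigma_2^2
        p_value = min(1.0, 2 * min(lower, upper))
    else:
        raise ValueError('unknown alternative: {}'.format(alternative))
    return stat, p_value


stat, p_greater = variance_ratio_test(2500, 400, 10, 12, 'greater')
_, p_two_sided = variance_ratio_test(2500, 400, 10, 12)
print('F = {:.2f}'.format(stat))
print('right-tailed p = {:.4f}, two-sided p = {:.4f}'.format(p_greater, p_two_sided))
```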


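# Non-parametric tests
## Pearson's $\chi^2$ test
### Testing whether a population follows a given distribution
$H_0$: $P\{X = a_i\} = p_i$ for all $i = 1, \dots, k$, $H_1$: $P\{X = a_i\} \neq p_i$ for at least one $i$, where $\sum_{i=1}^{k}p_i = 1$

If $H_0$ holds, $\chi^2 = \sum_{i=1}^{k}\frac{(n_i - np_i)^2}{np_i}$ should be small, where $n_i$ is the observed count of $a_i$ among $n$ observations.
1. In general, when $n \geq 50$, $\chi^2$ is treated as approximately $\chi^2(k-1)$ distributed.
2. The rejection region of $H_0$ is $\sum_{i=1}^{k}\frac{(n_i - np_i)^2}{np_i} \geq \chi_{1-\alpha}^2(k-1)$
3. A continuous population is discretized first, by binning it into $k$ intervals.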
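The discretization step can be sketched as follows. Everything here is illustrative: the sample is simulated, the cut points $-1, 0, 1$ are arbitrary, and $H_0$ is a fully specified $\mathcal{N}(0, 1)$, so no parameters are estimated and the degrees of freedom stay at $k - 1$.

```python
import numpy as np
from scipy.stats import chi2, chisquare, norm

rng = np.random.default_rng(0)                # arbitrary seed (assumption)
sample = rng.normal(0, 1, size=200)           # simulated data; H0: X ~ N(0, 1)

cuts = np.array([-1.0, 0.0, 1.0])             # interior cut points, giving k = 4 cells
# observed counts n_i per cell
observed = np.bincount(np.digitize(sample, cuts), minlength=cuts.size + 1)
# cell probabilities p_i under H0, from differences of the normal CDF
edges = np.concatenate(([-np.inf], cuts, [np.inf]))
cell_probs = np.diff(norm.cdf(edges))
expected = sample.size * cell_probs           # expected counts n * p_i

stat = np.sum((observed - expected)**2 / expected)
p_value = chi2.sf(stat, observed.size - 1)    # chi2(k - 1) approximation
reference = chisquare(observed, expected)     # same test via SciPy

print('observed counts:', observed)
print('manual: chi2 = {:.4f}, p = {:.4f}'.format(stat, p_value))
print('scipy:  chi2 = {:.4f}, p = {:.4f}'.format(reference.statistic, reference.pvalue))
```

If the parameters of the hypothesized distribution were estimated from the same data, the degrees of freedom would need to be reduced accordingly, for example through the `ddof` argument of `chisquare`.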

In [7]:
from scipy.stats import chisquare


X = np.array([74, 92, 83, 79, 80, 73, 77, 75, 76, 91])
X_exp = np.array([80]*10)

result = chisquare(X, X_exp)

print('chi_square statistic: {:.02f}'.format(result[0]))
print('p_value: {:.2f}'.format(result[1]))
print('therefore, do not reject H0')

chi_square statistic: 5.12
p_value: 0.82
therefore, do not reject H0


In [8]:
# chi-squared goodness-of-fit check of normality on binned counts;
# a skewness-and-kurtosis normality test (normaltest, shown below) is usually preferable
from scipy.stats import chisquare


X = np.array([8, 20, 21, 11])
X_exp = np.array([8.952, 21.048, 21.048, 8.952])
result = chisquare(X, X_exp)
alpha = 0.05

print('chi_square statistic: {:.02f}'.format(result[0]))
print('p_value: {:.2f}'.format(result[1]))
print('therefore, do not reject H0. X follows normal distribution')

chi_square statistic: 0.62
p_value: 0.89
therefore, do not reject H0. X follows normal distribution


In [9]:
from scipy.stats import normaltest
from scipy.stats import norm

nrv = norm(loc=80, scale=9.6)  # scale is the standard deviation, not the variance
X = nrv.rvs(size=60, random_state=123)
result = normaltest(X)
print('p_value: {:.2f}'.format(result[1]))
print('therefore, do not reject H0. X follows normal distribution')

p_value: 0.63
therefore, do not reject H0. X follows normal distribution


### $\chi^2$ test of independence
Example: is high blood pressure related to excessive salt intake?

In [10]:
from scipy.stats import chi2_contingency

X = np.array([[43, 13], [162, 121]])
alpha = 0.02

chi2, p_value, dof, ex = chi2_contingency(X, correction=False)
print('p_value: {:.2f}'.format(p_value))
print('therefore, reject H0. High blood pressure and excessive salt intake are not independent')
print(dof)
print(ex)

p_value: 0.01
therefore, reject H0. High blood pressure and excessive salt intake are not independent
1
[[ 33.86430678  22.13569322]
 [171.13569322 111.86430678]]
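To see where the expected counts and the degrees of freedom come from, the test can be reproduced by hand. In this sketch the expected count of each cell is (row total × column total) / grand total, and the result is compared with `chi2_contingency` without the continuity correction; the variable names are chosen for illustration only.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

observed = np.array([[43, 13], [162, 121]])

row_totals = observed.sum(axis=1, keepdims=True)    # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)    # shape (1, 2)
# expected counts under independence: E_ij = row_i * col_j / n
expected = row_totals * col_totals / observed.sum()

stat = np.sum((observed - expected)**2 / expected)
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = chi2.sf(stat, dof)

ref_stat, ref_p, ref_dof, ref_expected = chi2_contingency(observed, correction=False)

print(expected)
print('manual: chi2 = {:.4f}, dof = {}, p = {:.4f}'.format(stat, dof, p_value))
print('scipy:  chi2 = {:.4f}, dof = {}, p = {:.4f}'.format(ref_stat, ref_dof, ref_p))
```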


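# Analysis of variance

$H_0$: $\mu_1 = \mu_2 = \dots = \mu_r$, $H_1$: $\mu_1, \mu_2, \dots, \mu_r$ are not all equal

$S_t = S_A + S_e$, where $S_t=\sum_{j=1}^{r}\sum_{i=1}^{n_j}(X_{ij} - \bar{X})^2$ is the total sum of squares, $S_A=\sum_{j=1}^{r}\sum_{i=1}^{n_j}(\bar{X}_j - \bar{X})^2$ is the factor-effect sum of squares, and $S_e = \sum_{j=1}^{r}\sum_{i=1}^{n_j}(X_{ij} - \bar{X}_j)^2$ is the error sum of squares, accumulated from the random error within each group.

When $H_0$ is true, $\frac{S_A}{\sigma^2} \sim \chi^2(r-1)$, where $r$ is the number of levels.

It always holds that
$$\frac{S_e}{\sigma^2} \sim \chi^2(n-r)$$

Therefore, when $H_0$ is true,
$$\frac{\frac{S_A}{r-1}}{\frac{S_e}{n-r}} \sim F(r-1, n-r)$$
and $H_0$ is rejected at significance level $\alpha$ when this ratio is at least $F_{1-\alpha}(r-1, n-r)$.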

In [11]:
from scipy.stats import f_oneway

#X = np.matrix([[87, 90, 56, 92, 75], [85, 88, 62, 99, 72], [80, 87, np.nan, 95, 81], [np.nan, 94, np.nan, 91, np.nan]])
X_1 = np.array([87, 85, 80])
X_2 = np.array([90, 88, 87, 94])
X_3 = np.array([56, 62])
X_4 = np.array([92, 99, 95, 91])
X_5 = np.array([75, 72, 81])
result = f_oneway(X_1, X_2, X_3, X_4, X_5)

print('ANOVA statistic: {:.02f}'.format(result[0]))
print('p_value: {:.2f}'.format(result[1]))
print('therefore, reject H0. The means are not all equal')

ANOVA statistic: 35.62
p_value: 0.00
therefore, reject H0. The means are not all equal
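The decomposition $S_t = S_A + S_e$ and the F ratio can be checked by hand on the same five groups. This is a minimal sketch with illustrative variable names; it compares the manual statistic with `f_oneway`.

```python
import numpy as np
from scipy.stats import f, f_oneway

groups = [
    np.array([87, 85, 80]),
    np.array([90, 88, 87, 94]),
    np.array([56, 62]),
    np.array([92, 99, 95, 91]),
    np.array([75, 72, 81]),
]

all_values = np.concatenate(groups)
n, r = all_values.size, len(groups)          # total sample size, number of levels
grand_mean = all_values.mean()

S_t = np.sum((all_values - grand_mean)**2)                        # total
S_A = sum(g.size * (g.mean() - grand_mean)**2 for g in groups)    # between groups
S_e = sum(np.sum((g - g.mean())**2) for g in groups)              # within groups

F_stat = (S_A / (r - 1)) / (S_e / (n - r))
p_value = f.sf(F_stat, r - 1, n - r)
reference = f_oneway(*groups)

print('S_t = {:.2f}, S_A + S_e = {:.2f}'.format(S_t, S_A + S_e))
print('manual: F = {:.2f}, p = {:.6f}'.format(F_stat, p_value))
print('scipy:  F = {:.2f}, p = {:.6f}'.format(reference.statistic, reference.pvalue))
```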
