## Task 2(2)

### One-way analysis of variance

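#### Model:
> $Y_{ij} = \mu_j + \varepsilon_{ij}$

where
* $Y_{ij}$ - the $i$-th observation of the dependent variable at level $j$ of the factor
* $1 \leq j \leq J$ - index of the factor level
* $1 \leq i \leq I_j$ - index of the observation within level $j$
* $\mu_j$ - mean of the dependent variable at level $j$
* $\varepsilon_{ij}$ - random error, assumed independent and normally distributed with zero mean and a common variance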

#### Test:
> Fisher's F-test

#### Hypotheses:
> $H_0$: $\mu_1 = \mu_2 = \ldots = \mu_J$
>
> $H_1$: $\neg H_0$, i.e. $\mu_j \neq \mu_k$ for at least one pair $j \neq k$

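#### Statistic:
> $F = \frac{S_b^2}{S_w^2} \cdot \frac{df_w}{df_b} \sim F(df_b, df_w)$ under $H_0$

where
* $S_b^2 = \sum_{j=1}^J I_j (\overline{Y_{*j}} - \overline{Y})^2$ - between-group sum of squares
* $S_w^2 = \sum_{j=1}^J \sum_{i=1}^{I_j} (Y_{ij} - \overline{Y_{*j}})^2$ - within-group sum of squares
* $\overline{Y_{*j}}$ - sample mean at level $j$, $\overline{Y}$ - overall sample mean
* $df_b = J - 1$
* $df_w = I - J$, where $I = \sum_{j=1}^J I_j$ is the total number of observations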
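As a quick sanity check of these formulas, here is a minimal sketch on hypothetical toy data (the three groups below are made up for illustration and are not taken from the dataset). It computes both sums of squares directly from their definitions and checks that they add up to the total sum of squares around the overall mean.

```python
import numpy as np

# Hypothetical toy data: three factor levels with unequal group sizes.
groups = [
    np.array([4.0, 5.0, 6.0]),
    np.array([7.0, 8.0, 9.0, 10.0]),
    np.array([1.0, 2.0, 3.0]),
]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()
I = all_obs.size   # total number of observations
J = len(groups)    # number of factor levels

# Between-group sum of squares: sum_j I_j * (level mean - overall mean)^2
ss_b = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: squared deviations from each level's own mean
ss_w = sum(((g - g.mean()) ** 2).sum() for g in groups)
# Total sum of squares around the overall mean
ss_total = ((all_obs - grand_mean) ** 2).sum()

# The two components should add up to the total.
assert np.isclose(ss_b + ss_w, ss_total)

df_b, df_w = J - 1, I - J
f_stat = (ss_b / df_b) / (ss_w / df_w)
print(f"S_b^2 = {ss_b}, S_w^2 = {ss_w}, F = {f_stat}")
```

Dividing each sum of squares by its degrees of freedom before taking the ratio is the same as multiplying $S_b^2 / S_w^2$ by $df_w / df_b$, as in the statistic above.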

#### Test type:
> right-tailed ($H_0$ is rejected for large values of $F$)
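For a right-tailed test the p-value is the upper-tail probability $P(F(df_b, df_w) \geq F_{obs})$. The sketch below illustrates this with `scipy.stats.f`. The degrees of freedom and the observed statistic are hypothetical placeholders, not values from the dataset. It also shows that comparing the statistic with the critical value and comparing the p-value with $\alpha$ lead to the same decision.

```python
from scipy.stats import f

alpha = 0.05
df_b, df_w = 4, 995   # hypothetical degrees of freedom
f_obs = 3.0           # hypothetical observed statistic

# Right-tailed p-value: sf is the survival function, i.e. 1 - cdf.
p_value = f.sf(f_obs, df_b, df_w)
# Critical value: the (1 - alpha) quantile of F(df_b, df_w).
f_crit = f.ppf(1 - alpha, df_b, df_w)

# Both decision rules agree.
reject = f_obs > f_crit
assert reject == (p_value < alpha)
print(f"critical value: {f_crit}, p-value: {p_value}, reject H0: {reject}")
```

Using `sf` instead of `1 - cdf` is mainly a numerical precaution: for very small p-values, the subtraction from 1 loses significant digits.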

In [73]:
import pandas as pd
from scipy.stats import f

In [74]:
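def main():
    alpha = 0.05  # significance level

    data = pd.read_csv('exams_dataset.csv')
    # Dependent variable: total score over the three exams.
    data['score'] = data['math score'] + data['reading score'] + data['writing score']
    
    n = len(data)
    mean = data['score'].mean()  # overall mean
    
    target = data['score']
    factor = data['race/ethnicity']
    
    levels = factor.unique()
    lvl_count = factor.value_counts()  # I_j for each level
    lvl_mean = data.groupby('race/ethnicity')['score'].mean()  # mean of each level
    
    print(f"Mean: {mean}")
    print(f"Levels mean: ")
    for lvl in levels:
        print(f"  {lvl} | {lvl_mean[lvl]}")
    
    # Between-group sum of squares and its degrees of freedom.
    ssb = sum(lvl_count[lvl] * (lvl_mean[lvl] - mean)**2 for lvl in levels)
    dfb = len(levels) - 1
    
    # Within-group sum of squares; zip pairs each score with its own level,
    # so the result does not depend on the DataFrame index.
    ssw = sum((y - lvl_mean[g])**2 for y, g in zip(target, factor))
    dfw = n - len(levels)
    
    stat = (ssb / dfb) / (ssw / dfw)
    # Right-tailed p-value: P(F(dfb, dfw) >= stat).
    pvalue = 1 - f(dfb, dfw).cdf(stat)
    # H0 is never "accepted": we either reject it or fail to reject it.
    result = "fail to reject" if pvalue > alpha else "reject"
    
    print(f"Statistic: {stat}")
    print(f"P-value: {pvalue}")
    print(f"Result: {result}")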
    
if __name__ == '__main__':
    main()

Mean: 202.404
Levels mean: 
  group B | 195.06372549019608
  group D | 209.53639846743295
  group A | 191.66233766233765
  group C | 194.9814814814815
  group E | 223.80597014925374
Statistic: 14.31081484758895
P-value: 2.303324198038581e-11
Result: reject
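One way to cross-check a hand-rolled computation like this is to compare it with `scipy.stats.f_oneway`. The sketch below is an illustration under stated assumptions, not the notebook's own code: the helper `one_way_anova`, the column names `level` and `score`, and the synthetic data are all hypothetical, and the real `exams_dataset.csv` is not used.

```python
import numpy as np
import pandas as pd
from scipy.stats import f, f_oneway


def one_way_anova(df, factor_col, target_col):
    """Return (F statistic, right-tailed p-value) for a one-way layout."""
    grand_mean = df[target_col].mean()
    grouped = df.groupby(factor_col)[target_col]
    # Mean of each row's own level, aligned with the original index.
    row_level_mean = grouped.transform("mean")
    ss_b = (grouped.count() * (grouped.mean() - grand_mean) ** 2).sum()
    ss_w = ((df[target_col] - row_level_mean) ** 2).sum()
    n_levels = df[factor_col].nunique()
    df_b = n_levels - 1
    df_w = len(df) - n_levels
    stat = (ss_b / df_b) / (ss_w / df_w)
    return stat, f.sf(stat, df_b, df_w)


# Synthetic data with three levels of unequal size (seeded for reproducibility).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "level": np.repeat(["a", "b", "c"], [20, 30, 25]),
    "score": np.concatenate([
        rng.normal(10.0, 2.0, 20),
        rng.normal(11.0, 2.0, 30),
        rng.normal(13.0, 2.0, 25),
    ]),
})

stat, pvalue = one_way_anova(demo, "level", "score")
# Reference implementation: one array of observations per level.
ref = f_oneway(*(g.to_numpy() for _, g in demo.groupby("level")["score"]))

assert np.isclose(stat, ref.statistic)
assert np.isclose(pvalue, ref.pvalue)
```

Applied to the notebook's data, the same comparison would presumably use `'race/ethnicity'` as the factor column and the total `'score'` as the target.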
