### **Analysis of Variance (ANOVA)**

**Introduction to ANOVA (Analysis of Variance)**

ANOVA (Analysis of Variance) was developed by Ronald A. Fisher in the 1920s to compare the means of multiple populations.

The basic principle is to compare the variance between groups and the variance within groups. By examining variability within and between groups, ANOVA allows us to determine whether observed differences between sample means are statistically significant.

**Key Concepts of ANOVA**

- **Variance**: A measure of the dispersion of data around the mean.
- **Total Variance (TSV)**: Variance of all observations from the overall mean.
- **Between-Group Variance (BV)**: Variance due to differences between group means.
- **Within-Group Variance (WV)**: Variance due to differences between observations within each group.

**ANOVA Assumptions**

1. **Independence of observations**: The samples must be independent of each other.
2. **Normality**: The data for each group must follow a normal distribution.
3. **Homogeneity of variances**: The variances of the groups must be similar.
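The normality and equal-variance assumptions can be screened before running the test. The following is a minimal sketch, assuming hypothetical sample data (`samples`, not from this notebook) and using `scipy.stats.shapiro` (Shapiro-Wilk) and `scipy.stats.levene`. Independence depends on how the data were collected and cannot be checked this way.

```python
from scipy import stats

# Hypothetical measurements for three independent groups (illustrative only)
samples = {
    "A": [4.1, 5.0, 4.7, 5.3, 4.9, 5.1],
    "B": [5.8, 6.1, 5.5, 6.4, 6.0, 5.7],
    "C": [6.9, 7.4, 7.1, 6.6, 7.2, 7.0],
}

# Normality: Shapiro-Wilk test per group (H0: the sample is normally distributed)
normality_p = {}
for name, values in samples.items():
    _, p = stats.shapiro(values)
    normality_p[name] = p
    print(f"Shapiro-Wilk p-value, group {name}: {p:.3f}")

# Homogeneity of variances: Levene's test across groups (H0: equal variances)
levene_stat, levene_p = stats.levene(*samples.values())
print(f"Levene p-value: {levene_p:.3f}")
```

A p-value above the chosen threshold (for example 0.05) gives no evidence against the assumption. With samples this small, both tests have little power, so the result is only a rough screen.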

**Elements of ANOVA**

**SSW (Within-Group Sum of Squares)**: Measures the variability of observations relative to their own group mean.

**SSG (Between-Group Sum of Squares)**: Measures the variability of group means relative to the overall mean.

**SST (Total Sum of Squares)**: Measures the total variability of the data, and is the sum of SSW and SSG:

``SST = SSW + SSG``
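Written out, the three sums of squares can be expressed as follows. The notation is introduced here for illustration: $K$ groups, $n_j$ observations $x_{ij}$ in group $j$, group means $\bar{x}_j$ and overall mean $\bar{x}$.

```latex
\begin{aligned}
SSW &= \sum_{j=1}^{K} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}_j\right)^2 \\
SSG &= \sum_{j=1}^{K} n_j \left(\bar{x}_j - \bar{x}\right)^2 \\
SST &= \sum_{j=1}^{K} \sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}\right)^2 = SSW + SSG
\end{aligned}
```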

**F-Statistic Formula**:

``F = MSG/MSW = (SSG/(K - 1)) / (SSW/(n - K))``

- MSG, MSW: Between-group and within-group mean squares.
- K: Number of groups.
- n: Total number of observations (sample size).

Test Hypotheses:
- Null Hypothesis (H0): The population means of all groups are equal. (H0): μ1 = μ2 = μ3
- Alternative Hypothesis (H1): At least one mean is different. (H1): μi ≠ μj for at least one pair of groups i, j

**One-Way ANOVA with Python (SciPy, statsmodels)**

Import Libraries

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats as stats
import statsmodels.api as sm
from statsmodels.formula.api import ols

In [2]:
# Set Data
group_1 = [3, 2, 1]
group_2 = [5, 3, 4]
group_3 = [5, 6, 7]

In [None]:
# Calculate the mean of each group and the overall mean
mean_1 = np.mean(group_1)
mean_2 = np.mean(group_2)
mean_3 = np.mean(group_3)
overall_mean = np.mean(group_1 + group_2 + group_3)  # list concatenation pools all observations
for i, mean in enumerate([mean_1, mean_2, mean_3], start=1):
    print(f"Mean of group {i}: {mean}")
print(f"Overall mean: {overall_mean}")

Mean of group 1: 2.0
Mean of group 2: 4.0
Mean of group 3: 6.0
Overall mean: 4.0


In [None]:
# Sum of Squares Between Groups (SSG)
SSG = len(group_1) * (mean_1 - overall_mean) ** 2 + \
      len(group_2) * (mean_2 - overall_mean) ** 2 + \
      len(group_3) * (mean_3 - overall_mean) ** 2
print(f"SSG: {SSG}")

SSG: 24.0


In [24]:
# Within-Group Sum of Squares (SSW)
SSW = sum((x - mean_1) ** 2 for x in group_1) + \
      sum((x - mean_2) ** 2 for x in group_2) + \
      sum((x - mean_3) ** 2 for x in group_3)
print(f"SSW: {SSW}")


SSW: 6.0


In [None]:
# Total Sum of Squares (SST)
SST = SSG + SSW
print(f"SST: {SST}")

SST: 30.0


In [63]:
# Mean Squares and the F-Statistic
k = 3
n = len(group_1) + len(group_2) + len(group_3)

MSG = SSG / (k - 1)
MSW = SSW / (n - k)
F_stat = MSG / MSW
print(f"Degrees of freedom: {(k-1, n-k)}")
print(f"F-statistic: {F_stat}")

Degrees of freedom: (2, 6)
F-statistic: 12.0


This means that the between-group mean square (MSG) is 12 times the within-group mean square (MSW): the group means vary far more than the variation within groups alone would explain.
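The manual steps above generalize to any number of groups. The following is a minimal sketch; the helper name `one_way_anova` is hypothetical and introduced only for illustration, and it assumes every group has at least one observation and that n > K.

```python
import numpy as np

def one_way_anova(*groups):
    """Return (F, df_between, df_within) computed from the sums of squares."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                      # number of groups (K)
    n = sum(len(g) for g in groups)      # total number of observations (n)
    overall_mean = np.concatenate(groups).mean()

    # Between-group and within-group sums of squares
    ssg = sum(len(g) * (g.mean() - overall_mean) ** 2 for g in groups)
    ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)

    f_stat = (ssg / (k - 1)) / (ssw / (n - k))
    return f_stat, k - 1, n - k

f_manual, df_between, df_within = one_way_anova([3, 2, 1], [5, 3, 4], [5, 6, 7])
print(f"F = {f_manual}, degrees of freedom = ({df_between}, {df_within})")
```

Applied to the three groups of this example, the helper reproduces F = 12 with (2, 6) degrees of freedom.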

In [59]:
from scipy import stats

# Perform ANOVA with scipy
F_scipy, p_value = stats.f_oneway(group_1, group_2, group_3)

# Display the results
print(f"F-statistic (scipy): {F_scipy}")
print(f"p-value (scipy): {p_value.round(4)}")

F-statistic (scipy): 12.0
p-value (scipy): 0.008


A p-value of 0.008 means that, if all group means were truly equal, an F-statistic of 12 or larger would occur in only about 0.8% of samples. Since this is below the usual 0.05 threshold, it is strong evidence that at least one group mean differs from the others.
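The p-value is the upper-tail probability of the F distribution at the observed statistic. As a small sketch, it can be recomputed from the F-statistic and the degrees of freedom of this example with `scipy.stats.f.sf` (the survival function, 1 - CDF):

```python
from scipy import stats

F_stat = 12.0                   # observed F-statistic from the example
df_between, df_within = 2, 6    # K - 1 and n - K for 3 groups and 9 observations

# P(F >= F_stat) under the null hypothesis
p_value = stats.f.sf(F_stat, df_between, df_within)
print(f"p-value: {p_value:.4f}")
```

This gives the same value as `stats.f_oneway`, since both rely on the same F distribution.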

In [61]:
# Interpret the Result
alpha = 0.05
if p_value < alpha:
    print("We reject the null hypothesis H0. At least one group mean is significantly different.")
else:
    print("We do not have enough evidence to reject the null hypothesis H0.")

We reject the null hypothesis H0. At least one group mean is significantly different.


In [62]:
from scipy.stats import f

# ANOVA parameters
alpha = 0.05
df1 = 2       # Between-group degrees of freedom (k - 1)
df2 = 6       # Within-group degrees of freedom (n - k)

# Compute the critical F value
F_critique = f.ppf(1 - alpha, df1, df2)
print(f"Critical F value (F_critique): {F_critique:.4f}")

Critical F value (F_critique): 5.1433
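The critical value gives an equivalent decision rule: reject H0 when the observed F-statistic exceeds it. The following is a short sketch for the three-group example (K = 3, n = 9), with degrees of freedom K - 1 = 2 and n - K = 6:

```python
from scipy.stats import f

alpha = 0.05
k, n = 3, 9        # number of groups and total observations in the example
F_stat = 12.0      # observed F-statistic from the example

# Upper-tail critical value of the F distribution
F_crit = f.ppf(1 - alpha, k - 1, n - k)
reject_h0 = F_stat > F_crit
print(f"Critical F: {F_crit:.4f}, reject H0: {reject_h0}")
```

Comparing F with the critical value and comparing the p-value with alpha always lead to the same conclusion.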
