# ANOVA Test

## Formulas

1. **Sum of Squares Between Groups (SST)**  
   The SST measures the variability due to differences between the group means.  
   
   $$
   SST = \sum_{i=1}^k n_i (\bar{x}_i - \bar{x}_{\text{overall}})^2
   $$

   Where:  
   - $k$: Number of groups  
   - $n_i$: Number of observations in group $i$  
   - $\bar{x}_i$: Mean of group $i$  
   - $\bar{x}_{\text{overall}}$: Overall mean of all observations  

2. **Sum of Squares Within Groups (SSE)**  
   The SSE, also known as the residual sum of squares, measures the variability within each group.  

   $$
   SSE = \sum_{i=1}^k \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2
   $$

   Where:  
   - $x_{ij}$: The $j$-th observation in group $i$  
   - $\bar{x}_i$: Mean of group $i$  

3. **Mean Squares**  
   Dividing each sum of squares by its degrees of freedom turns it into a mean square, which is a variance estimate.  

   - **Mean Square for Treatments, Between Groups (MST):**  
     $$
     MST = \frac{SST}{k-1}
     $$

   - **Mean Square Error, Within Groups (MSE):**  
     $$
     MSE = \frac{SSE}{N-k}
     $$

   Where:  
   - $N$: Total number of observations  
   - $k$: Number of groups  

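4. **F-Statistic**  
   The F-statistic is the ratio of the between-group mean square to the within-group mean square. Under the null hypothesis that all group means are equal, it follows an F-distribution with $k-1$ and $N-k$ degrees of freedom.  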

   $$
   F = \frac{MST}{MSE}
   $$

---
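As a concrete check of the formulas above, the following minimal sketch evaluates each quantity with NumPy. The three toy groups and all variable names are assumptions chosen so the arithmetic is easy to follow by hand.

```python
import numpy as np

# Hypothetical toy data: three groups of three observations each.
groups = [
    np.array([1.0, 2.0, 3.0]),
    np.array([2.0, 3.0, 4.0]),
    np.array([5.0, 6.0, 7.0]),
]

k = len(groups)                                # number of groups
N = sum(len(g) for g in groups)                # total number of observations
overall_mean = np.concatenate(groups).mean()   # mean of all observations

# Between-group sum of squares: n_i * (group mean - overall mean)^2
sst = sum(len(g) * (g.mean() - overall_mean) ** 2 for g in groups)

# Within-group sum of squares: squared deviations from each group's own mean
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)

mst = sst / (k - 1)    # mean square between groups
mse = sse / (N - k)    # mean square within groups
f_stat = mst / mse

print(f"SST={sst:.3f}, SSE={sse:.3f}, MST={mst:.3f}, MSE={mse:.3f}, F={f_stat:.3f}")
```

For this toy data the group means are 2, 3 and 6, so the script prints `SST=26.000, SSE=6.000, MST=13.000, MSE=1.000, F=13.000`.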

## Comparing F with the Critical Value
Compare the calculated $F$-statistic with the critical value $F_{\text{critical}}$, read from the F-distribution table for $k-1$ and $N-k$ degrees of freedom at the chosen significance level ($\alpha$).  

---
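Instead of reading a printed table, the critical value can be computed with `scipy.stats.f.ppf`. The sketch below assumes a hypothetical design with 4 groups and 20 observations in total, and a significance level of 0.05.

```python
from scipy.stats import f

alpha = 0.05    # significance level (assumed for illustration)
k, N = 4, 20    # hypothetical design: 4 groups, 20 observations in total

dfn = k - 1     # numerator (between-group) degrees of freedom
dfd = N - k     # denominator (within-group) degrees of freedom

# Upper-tail critical value: the point with probability alpha to its right
f_critical = f.ppf(1 - alpha, dfn, dfd)
print(f"F_critical({dfn}, {dfd}) at alpha={alpha}: {f_critical:.3f}")
```

With 3 and 16 degrees of freedom this should agree with the tabulated value of about 3.24.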

## Decision Rule

- If $F > F_{\text{critical}}$: Reject the null hypothesis. At least one group mean differs significantly from the others.  
- If $F \leq F_{\text{critical}}$: Fail to reject the null hypothesis. The data do not show a significant difference between the group means.  
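
The decision rule can be sketched as a small function. `anova_decision` is a hypothetical helper, not part of SciPy, and the inputs passed to it are illustrative. It also returns the p-value, since rejecting when $F > F_{\text{critical}}$ is equivalent to rejecting when the p-value is below $\alpha$.

```python
from scipy.stats import f


def anova_decision(f_stat, k, n_total, alpha=0.05):
    """Hypothetical helper: apply the one-way ANOVA decision rule."""
    dfn, dfd = k - 1, n_total - k
    f_critical = f.ppf(1 - alpha, dfn, dfd)
    p_value = f.sf(f_stat, dfn, dfd)   # P(F >= f_stat) under the null hypothesis
    reject = bool(f_stat > f_critical)
    return reject, f_critical, p_value


# Illustrative inputs: F = 0.739 from 4 groups and 20 observations in total
reject, f_critical, p_value = anova_decision(f_stat=0.739, k=4, n_total=20)
print("Reject H0" if reject else "Fail to reject H0")
```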



In [1]:
import pandas as pd
import numpy as np
from scipy.stats import f

In [2]:
class Anova:
    """
    A class to perform a one-way Analysis of Variance (ANOVA) test.

    Attributes:
    -----------
    groups : list of arrays
        A list where each element is a group of sample data.

    Methods:
    --------
    get_sizes():
        Returns the number of observations in each group.

    get_k_n():
        Returns the number of groups (k) and total number of observations (n).

    sample_mean_groups():
        Computes the mean of each group and the overall mean.

    sum_total_squares():
        Computes the between-group sum of squares (SST).

    sum_squared_errors():
        Computes the sum of squared errors (SSE) within groups.

    mean_squared_total():
        Computes the between-group mean square (MST).

    mean_squared_error():
        Computes the mean squared error (MSE).

    f_value():
        Computes the F-statistic for the ANOVA test.

    check_hyphotethis(alpha):
        Checks whether to reject or fail to reject the null hypothesis based on the F-statistic and critical value.

    generate_anova_table():
        Generates an ANOVA table with degrees of freedom (DF), sum of squares (SS), mean squares (MSS), and F-value.
    """

    def __init__(self, *groups):
        """
        Initializes the ANOVA test with the provided groups of data.

        Parameters:
        -----------
        *groups : array-like
            Two or more groups of sample data to perform the ANOVA test on.
        """
        self.groups = groups

    def get_sizes(self):
        """
        Returns the number of observations in each group.

        Returns:
        --------
        list : The sizes of each group.
        """
        return [len(group) for group in self.groups]

    def get_k_n(self):
        """
        Returns the number of groups (k) and the total number of observations (n).

        Returns:
        --------
        tuple : (k, n) where k is the number of groups and n is the total number of observations.
        """
        k = len(self.get_sizes())
        n = sum(self.get_sizes())
        return k, n

    def sample_mean_groups(self):
        """
        Computes the mean of each group and the overall mean.

        Returns:
        --------
        tuple : (means, overall_mean) where means is a list of group means and
                overall_mean is the mean of all observations.
        """
        means = [np.mean(group) for group in self.groups]
        sizes = self.get_sizes()
        numerator = sum(size * mean for mean, size in zip(means, sizes))
        denominator = sum(sizes)
        return means, numerator / denominator

    def sum_total_squares(self):
        """
        Computes the between-group sum of squares (SST), a measure of how far
        the group means lie from the overall mean.

        Returns:
        --------
        float : The between-group sum of squares (SST).
        """
        means, overall_mean = self.sample_mean_groups()
        sizes = self.get_sizes()
        total = []
        for mean, size in zip(means, sizes):
            total.append(((mean - overall_mean) ** 2) * size)
        return sum(total)

    def sum_squared_errors(self):
        """
        Computes the sum of squared errors (SSE), a measure of the variability within each group.

        Returns:
        --------
        float : The sum of squared errors (SSE).
        """
        total = 0
        for group in self.groups:
            m = np.mean(group)
            for n in group:
                total += (n - m) ** 2
        return round(total, 2)

    def mean_squared_total(self):
        """
        Computes the between-group mean square (MST): SST divided by its
        degrees of freedom, k - 1.

        Returns:
        --------
        float : The between-group mean square (MST), rounded to two decimal places.
        """
        sst = self.sum_total_squares()
        k, n = self.get_k_n()
        return round(sst / (k - 1), 2)

    def mean_squared_error(self):
        """
        Computes the mean squared error (MSE): SSE divided by its degrees of
        freedom, n - k.

        Returns:
        --------
        float : The mean squared error (MSE), rounded to two decimal places.
        """
        sse = self.sum_squared_errors()
        k, n = self.get_k_n()
        return round(sse / (n - k), 2)

    def f_value(self):
        """
        Computes the F-statistic for the ANOVA test, the ratio of MST to MSE.

        Returns:
        --------
        float : The F-statistic value.
        """
        mst = self.mean_squared_total()
        mse = self.mean_squared_error()
        return round(mst / mse, 3)

    def check_hyphotethis(self, alpha):
        """
        Checks whether to reject or fail to reject the null hypothesis.

        Parameters:
        -----------
        alpha : float
            The significance level for the test (commonly 0.05).

        Returns:
        --------
        str : A message indicating whether to reject or fail to reject the null hypothesis.
        """
        k, n = self.get_k_n()
        f_critical = f.ppf(1 - alpha, k - 1, n - k)
        return "Fails to Reject Null Hypothesis" if self.f_value() <= f_critical else "Reject Null Hypothesis"

    def generate_anova_table(self):
        """
        Generates an ANOVA table containing degrees of freedom (DF), sum of squares (SS),
        mean squares (MSS), and the F-statistic.

        Returns:
        --------
        pd.DataFrame : The ANOVA table as a DataFrame.
        """
        k, n = self.get_k_n()
        sst = self.sum_total_squares()
        sse = self.sum_squared_errors()
        mst = self.mean_squared_total()
        mse = self.mean_squared_error()
        f_value = self.f_value()

        # Mean squares are not additive and the F-statistic belongs to the
        # treatment row only, so the Total row leaves both cells empty.
        data = {
            "Source": ["DF", "SS", "MSS", "F"],
            "Treatment": [k - 1, sst, mst, f_value],
            "Errors": [n - k, sse, mse, np.nan],
            "Total": [n - 1, sst + sse, np.nan, np.nan]
        }

        return pd.DataFrame(data)


In [3]:

group1 = [42, 30, 39, 28, 29]
group2 = [28, 36, 31, 32, 27]
group3 = [24, 36, 28, 28, 33]
group4 = [20, 32, 38, 28, 25]

anova = Anova(group1, group2, group3, group4)
alpha = 0.05

means, overall_mean = anova.sample_mean_groups()
sst = anova.sum_total_squares()
sse = anova.sum_squared_errors()
mst = anova.mean_squared_total()
mse = anova.mean_squared_error()
f_value = anova.f_value()
testing = anova.check_hyphotethis(alpha)

In [4]:
print("Sample Means: ", means)
print("Overall Mean: ", overall_mean)
print("SST: ", sst)
print("SSE: ", sse)
print("MST: ", mst)
print("MSE: ", mse)
print("F: ", f_value)
print(testing)

Sample Means:  [33.6, 30.8, 29.8, 28.6]
Overall Mean:  30.7
SST:  68.2
SSE:  492.0
MST:  22.73
MSE:  30.75
F:  0.739
Fails to Reject Null Hypothesis


In [5]:
anova_table = anova.generate_anova_table()

anova_table.T

|  | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Source | DF | SS | MSS | F |
| Treatment | 3.0 | 68.2 | 22.73 | 0.739 |
| Errors | 16.0 | 492.0 | 30.75 | NaN |
| Total | 19.0 | 560.2 | NaN | NaN |
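
As an independent cross-check, SciPy's `scipy.stats.f_oneway` runs the same one-way ANOVA directly on the groups. The sketch below reuses the four groups from the example above. It is a sanity check, not a replacement for the `Anova` class.

```python
from scipy.stats import f_oneway

group1 = [42, 30, 39, 28, 29]
group2 = [28, 36, 31, 32, 27]
group3 = [24, 36, 28, 28, 33]
group4 = [20, 32, 38, 28, 25]

# f_oneway returns the F-statistic and its p-value
result = f_oneway(group1, group2, group3, group4)
print(f"F = {result.statistic:.3f}, p-value = {result.pvalue:.3f}")
```

The F-statistic should match the 0.739 computed by hand, and a p-value above 0.05 leads to the same conclusion: fail to reject the null hypothesis.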
