Example: A company produces three fertilizers, A, B, and C, and wants to determine whether they all have the same effect on plant growth or whether their effects differ. Because we are comparing the means of three independent groups, we use a one-way ANOVA. The null hypothesis is that all three group means are equal; the alternative is that at least one group mean differs from the others.
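
For reference, the quantities computed in the steps that follow can be written out as below. The notation is an assumption made here for illustration: $y_{ij}$ is the $i$-th observation in group $j$, $n_j$ is the size of group $j$, $\bar{y}_j$ is the group mean, $\bar{y}$ is the overall mean, $k$ is the number of groups, and $N$ is the total number of observations.

```latex
\begin{aligned}
H_0 &: \mu_A = \mu_B = \mu_C \\
H_1 &: \text{at least one } \mu_j \text{ differs from the others} \\[4pt]
\mathrm{SSB} &= \sum_{j=1}^{k} n_j \left(\bar{y}_j - \bar{y}\right)^2 \\
\mathrm{SSW} &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(y_{ij} - \bar{y}_j\right)^2 \\
\mathrm{SST} &= \mathrm{SSB} + \mathrm{SSW} \\
F &= \frac{\mathrm{SSB} / (k - 1)}{\mathrm{SSW} / (N - k)}
\end{aligned}
```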

In [2]:
import numpy as np
import pandas as pd
from scipy import stats

In [3]:
# Given data
fertilizer_A = np.array([55, 60, 52, 58, 62])
fertilizer_B = np.array([65, 70, 68, 66, 72])
fertilizer_C = np.array([75, 78, 74, 76, 80])

In [4]:
# Step 1: Compute the Group Means
mean_A = np.mean(fertilizer_A)
mean_B = np.mean(fertilizer_B)
mean_C = np.mean(fertilizer_C)

In [5]:
# Overall (grand) mean across all observations from the three groups
data_combined = np.concatenate([fertilizer_A, fertilizer_B, fertilizer_C])
overall_mean = np.mean(data_combined)

In [6]:
# Step 2: Compute Sum of Squares Between Groups (SSB)
n_A = len(fertilizer_A)
n_B = len(fertilizer_B)
n_C = len(fertilizer_C)

In [7]:
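# Each group contributes its size times the squared distance
# between its mean and the overall mean
SSB = (n_A * (mean_A - overall_mean) ** 2 +
       n_B * (mean_B - overall_mean) ** 2 +
       n_C * (mean_C - overall_mean) ** 2)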

In [8]:
# Step 3: Compute Sum of Squares Within Groups (SSW)
SSW_A = np.sum((fertilizer_A - mean_A) ** 2)
SSW_B = np.sum((fertilizer_B - mean_B) ** 2)
SSW_C = np.sum((fertilizer_C - mean_C) ** 2)
SSW = SSW_A + SSW_B + SSW_C

In [9]:
# Step 4: Compute Total Sum of Squares (SST)
# SST is obtained here from the partition identity SST = SSB + SSW
SST = SSB + SSW
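
As a sanity check on Step 4, SST can also be computed directly from its definition (the sum of squared deviations of every observation from the overall mean) and compared with SSB + SSW. The sketch below is self-contained and repeats the data from above; the name `SST_direct` is introduced here only for illustration.

```python
import numpy as np

# Same data as in the example above
fertilizer_A = np.array([55, 60, 52, 58, 62])
fertilizer_B = np.array([65, 70, 68, 66, 72])
fertilizer_C = np.array([75, 78, 74, 76, 80])

groups = [fertilizer_A, fertilizer_B, fertilizer_C]
data_combined = np.concatenate(groups)
overall_mean = data_combined.mean()

# Direct definition: squared deviation of every observation from the overall mean
SST_direct = np.sum((data_combined - overall_mean) ** 2)

# Partition: between-group and within-group sums of squares
SSB = sum(len(g) * (g.mean() - overall_mean) ** 2 for g in groups)
SSW = sum(np.sum((g - g.mean()) ** 2) for g in groups)

print("SST (direct):   ", SST_direct)
print("SSB + SSW:      ", SSB + SSW)
print("Identity holds: ", np.isclose(SST_direct, SSB + SSW))
```

If the two values disagree, one of the sums of squares has been computed incorrectly.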

In [10]:
# Step 5: Compute Mean Squares (MSB and MSW)
k = 3  # Number of groups (fertilizers)
N = len(data_combined)  # Total number of observations

In [11]:
MSB = SSB / (k - 1)
MSW = SSW / (N - k)

In [12]:
# Step 6: Compute F-statistic
F_statistic = MSB / MSW

In [13]:
# Step 7: Compute p-value
# The survival function (1 - CDF) avoids precision loss for very small p-values
p_value = stats.f.sf(F_statistic, k - 1, N - k)

In [14]:
# Organizing the results into a readable DataFrame
anova_manual_results = pd.DataFrame({
    "Sum of Squares": [SSB, SSW, SST],
    "Degrees of Freedom": [k - 1, N - k, N - 1],
    "Mean Squares": [MSB, MSW, None],
    "F-Statistic": [F_statistic, None, None],
    "P-Value": [p_value, None, None]
}, index=["Between Groups (SSB)", "Within Groups (SSW)", "Total (SST)"])


In [15]:
# Display the detailed manual ANOVA calculation results
print(anova_manual_results)

                      Sum of Squares  Degrees of Freedom  Mean Squares  \
Between Groups (SSB)           926.4                   2    463.200000   
Within Groups (SSW)            119.2                  12      9.933333   
Total (SST)                   1045.6                  14           NaN   

                      F-Statistic   P-Value  
Between Groups (SSB)    46.630872  0.000002  
Within Groups (SSW)           NaN       NaN  
Total (SST)                   NaN       NaN  
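
An equivalent way to reach the decision is to compare the F-statistic with the critical value of the F-distribution at the chosen significance level. The sketch below assumes a significance level of 0.05 and copies the F-statistic and degrees of freedom from the table above; `f_critical` and `reject_null` are names introduced here for illustration.

```python
from scipy import stats

alpha = 0.05               # significance level (assumed)
k, N = 3, 15               # number of groups and total observations
F_statistic = 46.630872    # value taken from the ANOVA table above

# Critical value: the point of F(k - 1, N - k) with upper-tail area alpha
f_critical = stats.f.ppf(1 - alpha, k - 1, N - k)

# Reject the null hypothesis when the observed F exceeds the critical value
reject_null = F_statistic > f_critical

print(f"Critical F({k - 1}, {N - k}) at alpha = {alpha}: {f_critical:.3f}")
print("Reject the null hypothesis:", reject_null)
```

The p-value rule and the critical-value rule always lead to the same decision, because both are based on the same F-distribution.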


In [18]:
# Final Interpretation
if p_value < 0.05:
    print("Since the p-value is < 0.05, we reject the null hypothesis. At least one fertilizer's mean plant growth differs from the others.")
else:
    print("Since the p-value is ≥ 0.05, we fail to reject the null hypothesis. There is no significant evidence that the fertilizers differ in their effect on plant growth.")

Since the p-value is < 0.05, we reject the null hypothesis. At least one fertilizer's mean plant growth differs from the others.
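
The manual calculation can be cross-checked against SciPy's built-in one-way ANOVA, `scipy.stats.f_oneway`. This is a minimal sketch that repeats the data so it runs on its own; it should reproduce the F-statistic and p-value from the table above.

```python
import numpy as np
from scipy import stats

# Same data as in the example above
fertilizer_A = np.array([55, 60, 52, 58, 62])
fertilizer_B = np.array([65, 70, 68, 66, 72])
fertilizer_C = np.array([75, 78, 74, 76, 80])

# One-way ANOVA across the three groups
result = stats.f_oneway(fertilizer_A, fertilizer_B, fertilizer_C)

print(f"F-statistic: {result.statistic:.6f}")
print(f"p-value:     {result.pvalue:.2e}")
```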


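We reject the null hypothesis: the mean plant growth is not the same for fertilizers A, B, and C, so at least one fertilizer differs from the others. The ANOVA by itself does not say which one. The group means (A: 57.4, B: 68.2, C: 76.6) suggest that C performs best, but a post-hoc comparison is needed to confirm which pairs of fertilizers actually differ.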
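
The F-test only shows that the group means are not all equal. One common follow-up, which is not part of the analysis above, is Tukey's HSD test for pairwise comparisons. The sketch below assumes SciPy 1.8 or later, where `scipy.stats.tukey_hsd` is available; the `labels` list is introduced here only for printing.

```python
import numpy as np
from scipy import stats

# Same data as in the example above
fertilizer_A = np.array([55, 60, 52, 58, 62])
fertilizer_B = np.array([65, 70, 68, 66, 72])
fertilizer_C = np.array([75, 78, 74, 76, 80])

# Tukey's HSD compares every pair of groups while controlling
# the family-wise error rate (requires SciPy >= 1.8)
res = stats.tukey_hsd(fertilizer_A, fertilizer_B, fertilizer_C)

labels = ["A", "B", "C"]  # hypothetical labels, in the order the groups were passed
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        diff = res.statistic[i, j]   # mean of group i minus mean of group j
        p = res.pvalue[i, j]         # adjusted p-value for this pair
        print(f"{labels[i]} vs {labels[j]}: mean difference = {diff:.1f}, p = {p:.4f}")
```

A pair with a p-value below 0.05 is treated as significantly different at that level; the sign of the mean difference shows which of the two fertilizers had the higher average.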