# ANOVA Test (Analysis of Variance, F-Test)
📌 What It Does

It checks whether the mean of a continuous feature differs significantly across groups (classes).
In ML, we use it to test whether a numerical feature helps separate the classes in a classification problem.

📌 Formula for F-statistic
$$F = \frac{\text{Variance Between Groups}}{\text{Variance Within Groups}}$$

If between-group variance is high compared to within-group variance → feature is useful.

If not → feature doesn’t vary much across classes → less useful.
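To make the ratio concrete, here is a minimal sketch that computes the F-statistic by hand for each Iris feature and compares it with `f_classif`. It assumes the usual one-way ANOVA definitions: the between-group variance is the between-group sum of squares divided by (number of classes − 1), and the within-group variance is the within-group sum of squares divided by (number of samples − number of classes). The helper name `anova_f` is made up for this illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif


def anova_f(values, labels):
    """One-way ANOVA F-statistic for a single feature (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n, k = values.size, classes.size
    grand_mean = values.mean()

    ss_between = 0.0  # spread of the group means around the grand mean
    ss_within = 0.0   # spread of the samples around their own group mean
    for c in classes:
        group = values[labels == c]
        ss_between += group.size * (group.mean() - grand_mean) ** 2
        ss_within += ((group - group.mean()) ** 2).sum()

    ms_between = ss_between / (k - 1)  # variance between groups
    ms_within = ss_within / (n - k)    # variance within groups
    return ms_between / ms_within


iris = load_iris()
manual_f = np.array(
    [anova_f(iris.data[:, j], iris.target) for j in range(iris.data.shape[1])]
)
sklearn_f, _ = f_classif(iris.data, iris.target)

print(np.round(manual_f, 3))
print("matches f_classif:", np.allclose(manual_f, sklearn_f))
```

The hand-computed values should agree with the scores reported by `f_classif`, which suggests the library score is the same between/within ratio described above.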

📌 Requirements

Feature should be continuous (Iris features are continuous).

Target should be categorical (Iris species are categorical).

Null Hypothesis $H_0$: All groups (species) have the same mean feature value.

Alternative Hypothesis $H_1$: At least one group has a different mean.
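As a rough sketch of how these hypotheses can be tested for a single feature, the snippet below splits petal length by species and runs a one-way ANOVA with `scipy.stats.f_oneway`. The choice of petal length and the 0.05 significance level (`alpha`) are assumptions made for illustration.

```python
from scipy.stats import f_oneway
from sklearn.datasets import load_iris

iris = load_iris()
petal_length = iris.data[:, 2]  # column 2 is 'petal length (cm)'

# One array of petal lengths per species (targets are coded 0, 1, 2)
groups = [petal_length[iris.target == c] for c in range(3)]

f_stat, p_val = f_oneway(*groups)

alpha = 0.05  # assumed significance level
reject_h0 = p_val < alpha
print(f"F = {f_stat:.3f}, p = {p_val:.3e}, reject H0: {reject_h0}")
```

Rejecting $H_0$ here only says that at least one species has a different mean petal length; it does not say which species differ from each other.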

📌 Interpretation

High F-score with p-value < 0.05 → the feature's mean differs significantly between species → good for classification.

Low F-score with a high p-value → the feature's mean does not differ much between species → less useful.

In [3]:
from sklearn.datasets import load_iris
import pandas as pd
from sklearn.feature_selection import f_classif

# Load Iris: four continuous features (x) and a categorical species label (y)
iris = load_iris()
x = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target, name='Species')

In [4]:
# f_classif returns one ANOVA F-score and one p-value per feature column
f_score, p_value = f_classif(x, y)

In [5]:
anova_result = pd.DataFrame({
    'feature' : x.columns,
    'f-score' : f_score,
    'p-value' : p_value
})

In [6]:
print("\nANOVA test results (Iris):")
print(anova_result)


ANOVA test results (Iris):
             feature      f-score       p-value
0  sepal length (cm)   119.264502  1.669669e-31
1   sepal width (cm)    49.160040  4.492017e-17
2  petal length (cm)  1180.161182  2.856777e-91
3   petal width (cm)   960.007147  4.169446e-85


In [18]:
anova_result

             feature      f-score       p-value
0  sepal length (cm)   119.264502  1.669669e-31
1   sepal width (cm)    49.160040  4.492017e-17
2  petal length (cm)  1180.161182  2.856777e-91
3   petal width (cm)   960.007147  4.169446e-85


In [53]:
# Print every feature whose ANOVA p-value is below the 0.05 significance level
for feature in anova_result.loc[anova_result['p-value'] < 0.05, 'feature']:
    print(feature)

sepal length (cm)
sepal width (cm)
petal length (cm)
petal width (cm)
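Since all four features pass the 0.05 threshold, the p-value cut-off alone does not narrow anything down on this dataset. One possible follow-up, sketched below, is to rank features by F-score and keep only the top few with scikit-learn's `SelectKBest`. The choice of `k=2` is an arbitrary assumption for illustration, not a recommendation.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
x = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target, name='Species')

# Keep the k features with the highest ANOVA F-scores (k=2 is an assumed value)
selector = SelectKBest(score_func=f_classif, k=2).fit(x, y)

# get_support() returns a boolean mask over the original columns
selected = list(x.columns[selector.get_support()])
print(selected)

# Reduced feature matrix containing only the selected columns
x_reduced = selector.transform(x)
print(x_reduced.shape)
```

With the F-scores shown earlier, the two petal measurements rank highest, so they are the ones this sketch would keep.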
