#### ANOVA (Analysis of Variance)

ANOVA is used to test for differences among the means of two or more groups. The general one-way ANOVA formula is:

$$F = \frac{\text{between-group variance}}{\text{within-group variance}}$$

Where:
- F is the F-statistic, which follows an F-distribution when the null hypothesis is true.
- Between-group variance represents the variability between the group means.
- Within-group variance represents the variability within each group.
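
The ratio above can be computed by hand. The following is a minimal sketch, assuming three small made-up groups; the `groups` values and the helper name `one_way_f` are illustrative and do not come from the notebook. It computes the between-group and within-group mean squares and divides one by the other.

```python
def one_way_f(groups):
    """Return the one-way ANOVA F-statistic for a list of numeric groups."""
    k = len(groups)                            # number of groups
    n_total = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: spread of the group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)          # between-group variance
    ms_within = ss_within / (n_total - k)      # within-group variance
    return ms_between / ms_within


# Illustrative toy data (hypothetical values, not the iris measurements)
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = one_way_f(groups)
print("F-statistic:", f_value)
```

For this toy data the group means are 2, 3 and 6, the between-group mean square is 13 and the within-group mean square is 1, so F comes out at 13 (up to floating-point rounding).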

In [5]:
from sklearn.datasets import load_iris
from scipy.stats import f_oneway

#loading dataset
iris = load_iris()

data = iris.data
target = iris.target

#extract data for each species
setosa_data = data[target == 0]
versicolor_data = data[target == 1]
virginica_data = data[target == 2]

#implementing ANOVA
f_statistic, p_value = f_oneway(setosa_data, versicolor_data, virginica_data)
#f_oneway performs a one-way ANOVA test to compare the means of the groups
#each group here is a 2-D array (samples x 4 features), so the test runs separately on each feature column
#F-statistic: the ratio of the variance between groups to the variance within groups
#p-value: the probability of observing an F-statistic at least this large if the null hypothesis (all group means are equal) is true; if p < alpha, reject the null hypothesis
#output
print("F-statistic: ", f_statistic)
print("p-value: ", p_value)

#interpret results
alpha = 0.05 #alpha: significance level
for p in p_value:
    if p < alpha:
        print("Reject the null hypothesis. There are significant differences among the means")
    else:
        print("Fail to reject the null hypothesis. There are no significant differences among the means")


F-statistic:  [ 119.26450218   49.16004009 1180.16118225  960.0071468 ]
p-value:  [1.66966919e-31 4.49201713e-17 2.85677661e-91 4.16944584e-85]
Reject the null hypothesis. There are significant differences among the means
Reject the null hypothesis. There are significant differences among the means
Reject the null hypothesis. There are significant differences among the means
Reject the null hypothesis. There are significant differences among the means


In [6]:

# The above output contains one F-statistic and one p-value per feature, because
# f_oneway tests each column of the 2-D arrays separately across the three species:
# Value 1: sepal length (cm)
# Value 2: sepal width (cm)
# Value 3: petal length (cm)
# Value 4: petal width (cm)
# Each test compares Setosa, Versicolor, and Virginica together; none of the values is a pairwise comparison.
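
Because `f_oneway` works column by column on 2-D input, each of the four printed values belongs to one iris feature. The sketch below pairs the values printed above with the feature names, assuming they follow the column order of `iris.feature_names`; the variable names `results` and `strongest` are illustrative only.

```python
# Feature names in the column order used by sklearn's iris dataset
feature_names = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]

# Values copied from the ANOVA output printed above
f_statistics = [119.26450218, 49.16004009, 1180.16118225, 960.0071468]
p_values = [1.66966919e-31, 4.49201713e-17, 2.85677661e-91, 4.16944584e-85]

alpha = 0.05  # significance level

# Map each feature to its F-statistic, p-value, and test decision
results = {
    name: (f, p, p < alpha)
    for name, f, p in zip(feature_names, f_statistics, p_values)
}

for name, (f, p, significant) in results.items():
    decision = "reject H0" if significant else "fail to reject H0"
    print(f"{name}: F = {f:.2f}, p = {p:.3g} -> {decision}")

# Feature whose group means are separated most strongly relative to within-group spread
strongest = max(results, key=lambda name: results[name][0])
print("Largest F-statistic:", strongest)
```

Labelling the results this way makes it clear that petal length has the largest F-statistic, and that the null hypothesis is rejected for every feature at the 0.05 level.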

#### HYPOTHESIS

In [8]:
#Null Hypothesis (H0): the means of all groups are equal
#Alternative Hypothesis (H1): at least one group's mean is different from the others

#ANOVA hypothesis test, using the values computed above:
f_statistic = [119.26450218, 49.16004009, 1180.16118225, 960.0071468]
p_value = [1.66966919e-31, 4.49201713e-17, 2.85677661e-91, 4.16944584e-85]
alpha = 0.05 #alpha: significance level
for p in p_value:
    if p < alpha:
        print("Reject the null hypothesis. There are significant differences among the means")
    else:
        print("Fail to reject the null hypothesis. There are no significant differences among the means")


Reject the null hypothesis. There are significant differences among the means
Reject the null hypothesis. There are significant differences among the means
Reject the null hypothesis. There are significant differences among the means
Reject the null hypothesis. There are significant differences among the means


#### T-Test Hypothesis

In [11]:
#T-test: compares the means of a single feature between two species
#consider comparing sepal lengths of setosa and versicolor
#performing two-sample independent t-test

from scipy.stats import ttest_ind
from sklearn.datasets import load_iris

#loading the dataset
iris = load_iris()
data = iris.data
target = iris.target
feature_names = iris.feature_names

#extract sepal lengths for setosa and versicolor species
setosa_length = data[target == 0, feature_names.index('sepal length (cm)')]
versicolor_length = data[target == 1, feature_names.index('sepal length (cm)')]

#two-sample t-test (ttest_ind assumes equal variances by default; pass equal_var=False for Welch's t-test)
t_statistic, p_value = ttest_ind(setosa_length, versicolor_length)

print("t-statistic: ", t_statistic)
print("p-value: ", p_value)

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference in sepal lengths between Setosa and Versicolor species.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in sepal lengths between Setosa and Versicolor species.")
    



t-statistic:  -10.52098626754911
p-value:  8.985235037487079e-18
Reject the null hypothesis. There is a significant difference in sepal lengths between Setosa and Versicolor species.
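
To show where a t-statistic like the one above comes from, here is a minimal sketch of the equal-variance (pooled) two-sample formula, which is what `ttest_ind` uses by default. The helper name `pooled_t_statistic` and the toy samples are assumptions made for illustration; they are not the iris sepal lengths.

```python
import math


def pooled_t_statistic(a, b):
    """Return the two-sample t-statistic assuming equal population variances."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b

    # Unbiased sample variances (divide by n - 1)
    var_a = sum((x - mean_a) ** 2 for x in a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n_b - 1)

    # Pooled variance: the two variances weighted by their degrees of freedom
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)

    # Standard error of the difference between the two means
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error


# Illustrative toy samples (hypothetical values)
sample_a = [1, 2, 3]
sample_b = [4, 5, 6]
t_value = pooled_t_statistic(sample_a, sample_b)
print("t-statistic:", t_value)
```

The sign of the statistic follows the order of the arguments: it is negative here because the first sample has the smaller mean, just as the Setosa versus Versicolor result above is negative.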
