In [None]:
_='''
Linear Regression Assumptions:

Linearity: The expected value of the response is a linear function of the predictors (linear in the coefficients).
Independence: Residuals (errors) should be independent of each other.
Homoscedasticity: Residuals have constant variance across all fitted (predicted) values.
Normality: Residuals should have an approximately normal distribution.
No Multicollinearity: Predictors should not be highly correlated with one another.
Additivity: The effect of each predictor on the response does not depend on the values of the other predictors (no unmodeled interactions).
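
A minimal sketch of checking several of these assumptions with NumPy and SciPy. It uses synthetic data with two deliberately correlated predictors. The helper variance_inflation_factors and the Spearman correlation between |residual| and fitted values (used as a rough homoscedasticity check) are illustrative choices, not a standard API. statsmodels offers formal versions such as variance_inflation_factor and het_breuschpagan.

```python
import numpy as np
from scipy import stats

# Synthetic data (illustrative only): x2 is partly correlated with x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=1.0, size=n)

# Fit ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# Normality: Shapiro-Wilk test on the residuals (H0: residuals come from a normal distribution).
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Homoscedasticity (rough check): a strong rank correlation between |residual|
# and the fitted values suggests the residual spread changes with the prediction.
spread_rho, spread_p = stats.spearmanr(np.abs(residuals), fitted)


def variance_inflation_factors(predictors):
    # Hypothetical helper: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
    # regressing column j on the remaining columns plus an intercept.
    vifs = []
    for j in range(predictors.shape[1]):
        target = predictors[:, j]
        others = np.column_stack([np.ones(len(target)), np.delete(predictors, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        r_squared = 1.0 - np.var(target - others @ coef) / np.var(target)
        vifs.append(1.0 / (1.0 - r_squared))
    return vifs


# Multicollinearity: VIFs well above roughly 5 to 10 are a common warning sign.
vifs = variance_inflation_factors(np.column_stack([x1, x2]))

print(f"Shapiro-Wilk p-value: {shapiro_p:.3f}")
print(f"Spearman rho(|residual|, fitted): {spread_rho:.3f} (p = {spread_p:.3f})")
print("VIFs:", [round(v, 2) for v in vifs])
```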


ANOVA Assumptions:

Independence: Observations must be independent within and across groups.
Random Sampling: Each group's data should be a random sample from its population.
Normality: Residuals should be approximately normal.
Scale: The dependent variable should be measured on an interval or ratio scale.
Homogeneity of Variances: Variances should be equal across groups.
Absence of Outliers: Extreme outliers can distort group means and variances and should be investigated and addressed.
Interactions: In a two-way ANOVA, interactions between factors should be tested rather than ignored.
Equal Group Sizes: Not strictly required, but roughly equal (balanced) group sizes make ANOVA more robust to unequal variances.
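
Independence and random sampling follow from the study design and cannot be confirmed from the data alone. Normality and homogeneity of variances, however, can be checked. Below is a minimal sketch using scipy.stats.shapiro and scipy.stats.levene. It reuses the exam-score data from the teaching-method example later in this notebook. A small p-value from either test is evidence against the corresponding assumption.

```python
import numpy as np
from scipy import stats

# Exam scores copied from the teaching-method example later in this notebook.
groups = {
    "Traditional": [85, 88, 82, 90, 87, 85, 84, 88, 86, 87, 86, 85, 87, 84, 85, 85, 88, 84, 86, 87, 86, 84, 86, 87, 88],
    "Hybrid": [89, 90, 88, 91, 88, 92, 90, 89, 90, 91, 89, 90, 89, 91, 88, 89, 91, 89, 90, 88, 89, 90, 90, 89, 91],
    "Online": [80, 82, 79, 81, 82, 80, 83, 81, 80, 82, 80, 79, 80, 82, 79, 80, 81, 82, 80, 79, 81, 80, 82, 80, 79],
}

# Normality: pool the residuals (each score minus its own group mean) and apply Shapiro-Wilk.
residuals = np.concatenate([np.asarray(s, dtype=float) - np.mean(s) for s in groups.values()])
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Homogeneity of variances: Levene's test (SciPy centers on the median by default,
# which is the Brown-Forsythe variant).
levene_stat, levene_p = stats.levene(*groups.values())

# Sample variances (ddof=1) give a quick sense of how different the spreads are.
variances = {name: np.var(s, ddof=1) for name, s in groups.items()}

print(f"Shapiro-Wilk on residuals: W = {shapiro_stat:.3f}, p = {shapiro_p:.4f}")
print(f"Levene: statistic = {levene_stat:.3f}, p = {levene_p:.4f}")
print("Sample variances:", {name: round(v, 2) for name, v in variances.items()})
```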

Violating the assumptions of ANOVA or linear regression can compromise the accuracy of estimates and standard errors, making predictions
and interpretations less reliable. It can also produce misleading p-values: a test run at a nominal 5% level may reject a true null hypothesis
far more (or less) often than 5% of the time, increasing the risk of incorrect conclusions about statistical significance. These issues
undermine the trustworthiness and generalizability of the results.
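
The point about misleading p-values can be illustrated with a small Monte Carlo sketch. The group sizes, standard deviations, seed, and number of simulations are arbitrary illustrative choices. All three simulated populations share the same mean, so every rejection is a false positive. Because the smaller groups have the larger variance, the observed false-positive rate typically lands well above the nominal 5%.

```python
import numpy as np
from scipy import stats

# H0 is true: every population has mean 0, but the spreads differ
# and the two small groups have the larger standard deviation.
rng = np.random.default_rng(42)
sizes = [10, 10, 40]
sds = [3.0, 3.0, 1.0]
alpha = 0.05
n_sims = 2000

rejections = 0
for _ in range(n_sims):
    samples = [rng.normal(loc=0.0, scale=sd, size=n) for n, sd in zip(sizes, sds)]
    _, p = stats.f_oneway(*samples)
    rejections += int(p < alpha)  # any rejection here is a Type I error

false_positive_rate = rejections / n_sims
print(f"Nominal alpha: {alpha}, observed Type I error rate: {false_positive_rate:.3f}")
```

Swapping the standard deviations so that the large group has the larger variance tends to make the test conservative instead, rejecting less often than 5% of the time.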

In Python, a one-way ANOVA can be run with the f_oneway function from the scipy.stats library:
f_statistic, p_value = stats.f_oneway(df['group 1'], df['group 2'], df['group 3'])
This function computes the F-statistic (the ratio of the between-group mean square to the within-group mean square) and the associated
p-value, testing the null hypothesis that all group means are equal. It does not check the assumptions listed above; those need to be
assessed separately.
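
A minimal sketch that rebuilds what f_oneway computes from sums of squares, using the exam-score data from the teaching-method example. The formula is F = [SS_between / (k - 1)] / [SS_within / (N - k)], where k is the number of groups and N is the total number of observations. The p-value is the upper-tail probability of the F(k - 1, N - k) distribution. The manual result should agree with scipy.stats.f_oneway.

```python
import numpy as np
from scipy import stats

# Exam scores from the teaching-method example in this notebook.
groups = [
    np.array([85, 88, 82, 90, 87, 85, 84, 88, 86, 87, 86, 85, 87, 84, 85, 85, 88, 84, 86, 87, 86, 84, 86, 87, 88], dtype=float),
    np.array([89, 90, 88, 91, 88, 92, 90, 89, 90, 91, 89, 90, 89, 91, 88, 89, 91, 89, 90, 88, 89, 90, 90, 89, 91], dtype=float),
    np.array([80, 82, 79, 81, 82, 80, 83, 81, 80, 82, 80, 79, 80, 82, 79, 80, 81, 82, 80, 79, 81, 80, 82, 80, 79], dtype=float),
]

k = len(groups)                           # number of groups
n_total = sum(len(g) for g in groups)     # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of the group means around the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within
# Upper-tail probability of the F(k - 1, n_total - k) distribution.
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"manual F = {f_manual:.4f}, scipy F = {f_scipy:.4f}")  # both approximately 272.53
print(f"manual p = {p_manual:.3e}, scipy p = {p_scipy:.3e}")
```

For these data k = 3 and N = 75, so the reference distribution is F(2, 72).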
'''

In [2]:
import pandas as pd
import scipy.stats as stats

# Assumes the ANOVA assumptions listed above hold for these data, so a one-way ANOVA is the appropriate test

# H0: The teaching methods (Traditional, Hybrid, Online) have no effect on exam scores.
# The means of the exam scores are equal across all three groups.

# H1: At least one teaching method leads to different exam scores.
# The mean exam score is not the same for all groups. At least one group mean is different.

# Exam scores for 25 students under each teaching method
data = {
    'Traditional': [85, 88, 82, 90, 87, 85, 84, 88, 86, 87, 86, 85, 87, 84, 85, 85, 88, 84, 86, 87, 86, 84, 86, 87, 88],
    'Hybrid': [89, 90, 88, 91, 88, 92, 90, 89, 90, 91, 89, 90, 89, 91, 88, 89, 91, 89, 90, 88, 89, 90, 90, 89, 91],
    'Online': [80, 82, 79, 81, 82, 80, 83, 81, 80, 82, 80, 79, 80, 82, 79, 80, 81, 82, 80, 79, 81, 80, 82, 80, 79]
}

# Convert data to a DataFrame
df = pd.DataFrame(data)

# Perform an ANOVA test to compare means
f_statistic, p_value = stats.f_oneway(df['Traditional'], df['Hybrid'], df['Online'])

print(f'F-statistic: {f_statistic}')
print(f'P-value: {p_value}')

# Compare the p-value with the significance level
alpha = 0.05
if p_value < alpha:
    print("There's a statistically significant difference between the groups.")
else:
    print("No statistically significant difference between the groups was detected (fail to reject H0).")


F-statistic: 272.52668213457105
P-value: 2.5843527350432272e-34
There's a statistically significant difference between the groups.
