Q1: Assumptions Required to Use ANOVA 
ANOVA (Analysis of Variance) requires certain assumptions to be met to ensure the validity of the results:
    Independence of Observations: Observations must be independent of one another, both within and across groups.
    Violation Example: If the same subjects appear in more than one group, the observations are not independent.
    Normality: The data within each group should be approximately normally distributed.
    Violation Example: If the data are heavily skewed, the normality assumption is violated.
    Homogeneity of Variances (Homoscedasticity): The variances of the groups should be approximately equal.
    Violation Example: If one group has a much larger variance than the others, the assumption of homogeneity of variances is violated.
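
The normality and equal-variance assumptions above can be screened with standard tests. The following is a minimal sketch, assuming SciPy is available; the three groups are hypothetical values made up for illustration. Independence cannot be tested this way, since it depends on how the data were collected.

```python
from scipy import stats

# Hypothetical measurements for three independent groups (illustrative only)
groups = {
    "A": [23, 20, 21, 22, 24],
    "B": [25, 26, 24, 27, 25],
    "C": [18, 19, 17, 20, 18],
}

# Normality: Shapiro-Wilk test within each group (H0: the data are normal)
normality_p = {}
for name, values in groups.items():
    _, p = stats.shapiro(values)
    normality_p[name] = p

# Homogeneity of variances: Levene's test across groups (H0: equal variances)
levene_stat, levene_p = stats.levene(*groups.values())

for name, p in normality_p.items():
    print(f"Shapiro-Wilk, group {name}: p = {p:.3f}")
print(f"Levene: statistic = {levene_stat:.3f}, p = {levene_p:.3f}")
```

A small p-value in either test suggests the corresponding assumption may not hold. With samples this small the tests have little power, so plots of the data are worth checking as well.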
    
    
Q2: Types of ANOVA
One-Way ANOVA: Used when comparing the means of three or more independent groups based on one factor.
Situation: Comparing the mean test scores of students taught with different teaching methods.
Two-Way ANOVA: Used when comparing means across two factors; it can also assess the interaction between these factors.
Situation: Comparing the productivity of employees based on different software programs and different experience levels.
Repeated Measures ANOVA: Used when the same subjects are measured multiple times under different conditions.
Situation: Measuring the effect of different diets on the same group of participants over time.

Q3: Partitioning of Variance in ANOVA
In ANOVA, the total variation is partitioned into components:
Total Sum of Squares (SST): Measures the total variation in the data.
Explained Sum of Squares (SSE): Measures the variation explained by the group differences (between groups).
Residual Sum of Squares (SSR): Measures the variation within the groups.
The components satisfy SST = SSE + SSR. Note that some texts swap the last two abbreviations, using SSE for the error (within-group) sum of squares and SSR for the regression (explained) sum of squares.
Understanding these components is crucial because it shows how much of the total variation is explained by the factors being studied.

Q4: Calculating SST, SSE, and SSR in One-Way ANOVA Using Python

import numpy as np
import pandas as pd
import scipy.stats as stats

# Sample data
data = {
    'group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'value': [23, 20, 21, 25, 26, 24, 18, 19, 17]
}
df = pd.DataFrame(data)

# Calculate group means and overall mean
group_means = df.groupby('group')['value'].mean()
overall_mean = df.value.mean()

# Calculate SST (Total Sum of Squares)
sst = sum((df.value - overall_mean) ** 2)

# Calculate SSE (Explained Sum of Squares)
sse = sum(df.groupby('group').size() * (group_means - overall_mean) ** 2)

# Calculate SSR (Residual Sum of Squares)
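# transform('mean') broadcasts each group's mean back to its rows, so the
# result is a single scalar (the original wrapped a scalar in sum(), which fails)
ssr = ((df['value'] - df.groupby('group')['value'].transform('mean')) ** 2).sum()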

print(f"SST: {sst}, SSE: {sse}, SSR: {ssr}")
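
As a sanity check on the partition, the sums of squares should satisfy SST = SSE + SSR, and the F-statistic built from them should match a library result. The sketch below is self-contained and reuses the same sample values; it follows this document's naming (SSE is the explained, between-group part and SSR is the residual, within-group part) and uses `scipy.stats.f_oneway` only as a cross-check.

```python
import numpy as np
from scipy import stats

# Same sample values as the cell above, one array per group
groups = {
    "A": np.array([23, 20, 21]),
    "B": np.array([25, 26, 24]),
    "C": np.array([18, 19, 17]),
}
all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()

# Total, explained (between-group), and residual (within-group) sums of squares
sst = ((all_values - grand_mean) ** 2).sum()
sse = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
ssr = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

# F = (explained mean square) / (residual mean square)
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_manual = (sse / df_between) / (ssr / df_within)

# Cross-check against SciPy's one-way ANOVA
f_scipy, p_scipy = stats.f_oneway(*groups.values())

print(f"SST = {sst:.4f}, SSE + SSR = {sse + ssr:.4f}")
print(f"F (manual) = {f_manual:.4f}, F (scipy) = {f_scipy:.4f}, p = {p_scipy:.5f}")
```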


Q5: Calculating Main Effects and Interaction Effects in Two-Way ANOVA Using Python

import statsmodels.api as sm
from statsmodels.formula.api import ols

# Sample data
data = {
    'program': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'experience': ['novice', 'experienced', 'novice', 'experienced', 'novice', 'experienced', 'novice', 'experienced', 'novice'],
    'time': [30, 25, 28, 27, 26, 22, 35, 32, 33]
}
df = pd.DataFrame(data)

# Fit the two-way ANOVA model
model = ols('time ~ C(program) + C(experience) + C(program):C(experience)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)

Q6: Interpretation of One-Way ANOVA Results
If you obtained an F-statistic of 5.23 and a p-value of 0.02:
Conclusion: Since the p-value (0.02) is less than the significance level (0.05), you reject the null hypothesis that all group means are equal.
This indicates a statistically significant difference among the group means.

Interpretation: There is evidence that at least one group's mean differs from at least one other. The F-test alone does not identify which groups differ; a post-hoc test is needed for that.
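
The decision rule can be sketched in code. The question does not give the degrees of freedom, so the design below (3 groups, 18 observations in total) is a hypothetical choice, picked only because it yields a p-value close to the reported 0.02 for F = 5.23.

```python
from scipy import stats

f_stat = 5.23
alpha = 0.05

# Hypothetical design (not given in the question): 3 groups, 18 observations
df_between = 3 - 1
df_within = 18 - 3

# p-value: upper-tail probability of the F distribution at the observed statistic
p_value = stats.f.sf(f_stat, df_between, df_within)

# Equivalent check: compare the statistic with the critical value at level alpha
critical_f = stats.f.ppf(1 - alpha, df_between, df_within)

reject_null = p_value < alpha
print(f"p = {p_value:.4f}, critical F = {critical_f:.2f}, reject H0: {reject_null}")
```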

Q7: Handling Missing Data in Repeated Measures ANOVA
Handling missing data in repeated measures ANOVA can be challenging, because the classical analysis needs a complete set of measurements for every subject.

Methods:
Listwise Deletion: Remove any subject that has a missing value.
Imputation: Replace missing values with estimated ones (mean, median, etc.).
Mixed-Effects Models: Use models that can handle missing data without imputation.

Consequences:
Listwise Deletion: Loss of data can reduce the power and generalizability of the results.
Imputation: Improper imputation can bias the results; simple mean or median imputation also understates the variability in the data.
Mixed-Effects Models: These can handle missing data better but are more complex to implement and interpret.
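
The first two methods can be sketched with pandas. The data frame below is hypothetical (four subjects, three time points, two missing values) and exists only to show how much data each approach keeps; the mixed-effects approach is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one row per subject, one column per time point
wide = pd.DataFrame(
    {
        "t1": [5.0, 6.0, 7.0, 8.0],
        "t2": [6.0, np.nan, 8.0, 9.0],
        "t3": [7.0, 8.0, np.nan, 10.0],
    },
    index=pd.Index(["s1", "s2", "s3", "s4"], name="subject"),
)

# Listwise deletion: keep only subjects with a value at every time point
complete_cases = wide.dropna()

# Mean imputation: fill each gap with the mean of its time point (column)
mean_imputed = wide.fillna(wide.mean())

print(f"Subjects kept after listwise deletion: {len(complete_cases)} of {len(wide)}")
print(mean_imputed)
```

Here listwise deletion discards half of the subjects, while mean imputation keeps all of them at the cost of inserting values that sit exactly at the column mean.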

Q8: Common Post-Hoc Tests Used After ANOVA
Tukey's HSD (Honestly Significant Difference): Used to compare all possible pairs of means.
Situation: Comparing the mean scores of different teaching methods to identify which pairs differ.
Bonferroni Correction: Divides the significance level by the number of comparisons to control the family-wise Type I error rate.
Situation: Multiple comparisons in clinical trials.
Scheffé Test: More conservative and suitable for complex comparisons, such as contrasts involving more than two means.
Situation: Comparing multiple treatment effects in agricultural studies.
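
As an illustration of the first two tests, the sketch below runs Tukey's HSD with `pairwise_tukeyhsd` from statsmodels and computes the Bonferroni-adjusted significance level by hand. The scores for the three teaching methods are hypothetical.

```python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical scores for three teaching methods (illustrative only)
df = pd.DataFrame({
    'method': ['A'] * 3 + ['B'] * 3 + ['C'] * 3,
    'score': [5, 6, 7, 8, 7, 6, 4, 3, 5],
})

# Tukey's HSD: all pairwise comparisons at a family-wise level of 0.05
tukey = pairwise_tukeyhsd(endog=df['score'], groups=df['method'], alpha=0.05)
print(tukey.summary())

# Bonferroni: each of the m pairwise tests is judged against alpha / m
n_groups = df['method'].nunique()
n_comparisons = n_groups * (n_groups - 1) // 2
bonferroni_alpha = 0.05 / n_comparisons
print(f"Bonferroni-adjusted significance level: {bonferroni_alpha:.4f}")
```

The `reject` attribute of the Tukey result holds one boolean per pair, in the same order as the rows of the summary table.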

Q9: One-Way ANOVA Using Python

import statsmodels.api as sm
from statsmodels.formula.api import ols

# Sample data
data = {
    'diet': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'weight_loss': [5, 6, 7, 8, 7, 6, 4, 3, 5]
}
df = pd.DataFrame(data)

# Fit the one-way ANOVA model
model = ols('weight_loss ~ C(diet)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)


Q10: Two-Way ANOVA Using Python 

import statsmodels.api as sm
from statsmodels.formula.api import ols

# Sample data
data = {
    'program': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'experience': ['novice', 'experienced', 'novice', 'experienced', 'novice', 'experienced', 'novice', 'experienced', 'novice'],
    'time': [30, 25, 28, 27, 26, 22, 35, 32, 33]
}
df = pd.DataFrame(data)

# Fit the two-way ANOVA model
model = ols('time ~ C(program) + C(experience) + C(program):C(experience)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)

Q11: Two-Sample T-Test Using Python

from scipy import stats

# Sample data
control_group = [70, 72, 68, 74, 73]
experimental_group = [75, 78, 76, 77, 79]

# Perform two-sample t-test
t_stat, p_val = stats.ttest_ind(control_group, experimental_group)

print(f"T-statistic: {t_stat}, p-value: {p_val}")
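
By default `ttest_ind` assumes the two groups have equal variances. If that assumption is doubtful, Welch's t-test is a common alternative; the sketch below reruns the comparison on the same sample data with `equal_var=False`. Whether Welch's version is needed here is a judgment call, not something the question specifies.

```python
from scipy import stats

# Same sample data as the cell above
control_group = [70, 72, 68, 74, 73]
experimental_group = [75, 78, 76, 77, 79]

# Welch's t-test: does not assume equal variances in the two groups
t_welch, p_welch = stats.ttest_ind(control_group, experimental_group, equal_var=False)

print(f"Welch T-statistic: {t_welch:.3f}, p-value: {p_welch:.4f}")
```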

Q12: Repeated Measures ANOVA Using Python

import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Sample data
data = {
    'store': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'sales': [200, 210, 205, 190, 195, 200, 220, 215, 225],
    'day': [1, 2, 3, 1, 2, 3, 1, 2, 3]
}
df = pd.DataFrame(data)

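# Fit the repeated measures ANOVA model.
# The third argument to AnovaRM is the subject identifier. Here each day is the
# repeated unit (observed once for every store) and 'store' is the within factor,
# so the test compares mean sales across stores.
model = AnovaRM(df, 'sales', 'day', within=['store'])
anova_table = model.fit()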

print(anova_table)