Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact
the validity of the results.


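*Assumptions*:
1. Independence of observations: each observation is unrelated to the others, within and across groups.
2. Normality within each group: the dependent variable (equivalently, the residuals) is approximately normally distributed in every group.
3. Homogeneity of variances: all groups share roughly the same population variance.

*Violations*:
1. Non-independence (e.g., measuring the same participants repeatedly but treating the measurements as separate groups) typically inflates the Type I error rate.
2. Non-normality (e.g., strongly skewed data or outliers in small samples) makes the p-value of the F-test unreliable.
3. Heteroscedasticity (e.g., one group with a much larger variance than the others, especially with unequal group sizes) distorts the F-statistic and may lead to incorrect conclusions.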
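
The normality and equal-variance assumptions can be screened before running ANOVA. The sketch below uses the Shapiro-Wilk and Levene tests from SciPy on simulated data; the group means, spread, and sample sizes are assumptions chosen only for illustration. Independence cannot be tested this way, since it depends on how the data were collected.

```python
import numpy as np
from scipy import stats

# Hypothetical data: three groups of 20 simulated measurements each.
rng = np.random.default_rng(42)
groups = [rng.normal(loc=mu, scale=2.0, size=20) for mu in (10.0, 11.0, 12.0)]

# Normality within each group: Shapiro-Wilk test (null: the sample is normal).
shapiro_p = []
for g in groups:
    _, p = stats.shapiro(g)
    shapiro_p.append(p)

# Homogeneity of variances: Levene's test (null: all group variances are equal).
levene_stat, levene_p = stats.levene(*groups)

for i, p in enumerate(shapiro_p, start=1):
    print(f"Group {i} Shapiro-Wilk p-value: {p:.3f}")
print(f"Levene p-value: {levene_p:.3f}")
```

A small p-value in either test is a warning sign for the corresponding assumption, not proof that ANOVA is unusable.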


Q2. What are the three types of ANOVA, and in what situations would each be used?



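1. One-way ANOVA: Compares means across three or more independent groups defined by one independent variable (e.g., weight loss under three diets).
2. Two-way ANOVA: Examines the main effects of two independent variables on one dependent variable, plus their interaction (e.g., diet and exercise level on weight loss).
3. Repeated Measures ANOVA: Analyzes the same subjects measured over time or under different conditions (e.g., blood pressure before, during, and after treatment).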

Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

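Total variation (SST) = explained, between-group variation (SSB) + unexplained, within-group variation (SSW).

Understanding this helps identify the proportion of variance due to the independent variable (SSB / SST). It also explains the F-statistic, which is the ratio of the between-group mean square to the within-group mean square.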
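
The partition can be checked numerically. The following is a minimal sketch on made-up scores for three groups (the values are illustrative assumptions): it computes the total, between-group (explained), and within-group (unexplained) sums of squares by hand and compares the resulting F-statistic with `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for three groups (illustrative values only).
groups = [
    np.array([4.0, 5.0, 6.0, 5.0]),
    np.array([6.0, 7.0, 8.0, 7.0]),
    np.array([9.0, 8.0, 10.0, 9.0]),
]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Total variation: squared deviations of every value from the grand mean.
ss_total = ((all_values - grand_mean) ** 2).sum()
# Explained variation: squared deviations of group means from the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Unexplained variation: squared deviations of values from their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_manual = (ss_between / df_between) / (ss_within / df_within)
eta_squared = ss_between / ss_total  # proportion of variance explained

f_scipy, _ = stats.f_oneway(*groups)
print(f"Total = {ss_total:.3f}, between + within = {ss_between + ss_within:.3f}")
print(f"F (manual) = {f_manual:.3f}, F (scipy) = {f_scipy:.3f}")
print(f"Proportion explained = {eta_squared:.3f}")
```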


Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?



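ANS 6 : With an F-statistic of 5.23 and a p-value of 0.02, we reject the null hypothesis of equal group means at the 0.05 significance level (0.02 < 0.05). At least one group mean differs significantly from the others. The F-test alone does not show which groups differ; a post-hoc test is needed for that.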
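
The link between an F-statistic and its p-value can be sketched with the F distribution in SciPy. The question does not state the degrees of freedom, so the design below (3 groups, 17 observations in total) is purely an assumption, picked because it yields a p-value close to the reported 0.02.

```python
from scipy import stats

f_statistic = 5.23

# Assumed design (not given in the question): 3 groups, 17 observations in total.
n_groups, n_total = 3, 17
df_between = n_groups - 1
df_within = n_total - n_groups

# p-value: probability of an F at least this large if all group means are equal.
p_value = stats.f.sf(f_statistic, df_between, df_within)
# Critical value: the F that would have to be exceeded at alpha = 0.05.
alpha = 0.05
f_critical = stats.f.ppf(1 - alpha, df_between, df_within)

reject_null = p_value < alpha
print(f"p-value: {p_value:.4f}")
print(f"Critical F at alpha = {alpha}: {f_critical:.3f}")
print(f"Reject the null hypothesis: {reject_null}")
```

Comparing the p-value with alpha and comparing F with the critical value are equivalent decisions.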

Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential
consequences of using different methods to handle missing data?


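Options include listwise deletion (dropping every subject with a missing measurement), imputation (e.g., mean imputation), and mixed-effects models that use all available observations. The choice can change the results. Listwise deletion reduces sample size and power, and it can bias estimates if the data are not missing completely at random. Mean imputation underestimates variability, which makes standard errors too small.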
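
The effect of mean imputation on variability can be seen on a toy example. The sketch below uses a hypothetical table of 6 subjects measured at 3 time points with two values missing (all numbers are made up), and compares listwise deletion with mean imputation using pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical repeated measures: 6 subjects, 3 time points, 2 missing values.
scores = pd.DataFrame({
    "t1": [10.0, 12.0, 9.0, 14.0, 11.0, 13.0],
    "t2": [11.0, np.nan, 10.0, 16.0, 12.0, 15.0],
    "t3": [13.0, 15.0, np.nan, 18.0, 14.0, 17.0],
})

# Listwise deletion: keep only subjects with complete data (loses sample size).
complete_cases = scores.dropna()

# Mean imputation: replace each missing value with its column mean.
mean_imputed = scores.fillna(scores.mean())

# pandas skips NaN by default, so this is the variance of the observed values.
var_observed = scores.var()
var_imputed = mean_imputed.var()

print(f"Subjects kept by listwise deletion: {len(complete_cases)} of {len(scores)}")
print("Variance of observed values:")
print(var_observed)
print("Variance after mean imputation:")
print(var_imputed)
```

The imputed values sit exactly at the column mean, so they add observations without adding spread, which is why the variance shrinks in the columns that had missing data.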


Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide
an example of a situation where a post-hoc test might be necessary.



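Common tests include Tukey's HSD, Bonferroni, and Sidak. They are used to identify specific group differences after finding a significant overall effect in ANOVA. Tukey's HSD suits all pairwise comparisons between groups. Bonferroni is a conservative correction that suits a small number of planned comparisons. Sidak is slightly less conservative than Bonferroni. For example, if a one-way ANOVA shows that mean weight loss differs across three diets, a post-hoc test is needed to find out which pairs of diets differ.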
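
As an illustration, Tukey's HSD can be run with `pairwise_tukeyhsd` from statsmodels. The data below are simulated scores for three hypothetical groups labelled A, B, and C; the means, spread, and group sizes are assumptions for the sake of the sketch.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical data: simulated scores for three groups of 15 (not real results).
rng = np.random.default_rng(0)
scores = np.concatenate([
    rng.normal(70, 5, 15),  # group A
    rng.normal(72, 5, 15),  # group B
    rng.normal(85, 5, 15),  # group C
])
labels = np.repeat(["A", "B", "C"], 15)

# Tukey's HSD compares every pair of groups while controlling the
# family-wise error rate at alpha.
tukey = pairwise_tukeyhsd(endog=scores, groups=labels, alpha=0.05)
print(tukey.summary())
```

Each row of the summary covers one pair of groups, and the `reject` column indicates whether that pair differs significantly.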

Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
to determine if there are any significant differences between the mean weight loss of the three diets.
Report the F-statistic and p-value, and interpret the results.



In [2]:
#Ans 9
import scipy.stats as stats

diet_A = [5,8,3,4,6,7,4,5,6,7,6,7,9,2,6,8,9]
diet_B = [6,8,5,3,4,8,6,5,7,4,5,6,5,6,6,8,9]
diet_C = [6,7,8,7,6,7,7,6,7,7,6,8,5,4,5,5]

f_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

print(f"F-statistic: {f_statistic}")
print(f"P-value: {p_value}")

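F-statistic: 0.2419799272727273
P-value: 0.7860439452153402

Since the p-value (0.786) is greater than 0.05, we fail to reject the null hypothesis. There is no significant difference in mean weight loss between the three diets.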


Q10. A company wants to know if there are any significant differences in the average time it takes to
complete a task using three different software programs: Program A, Program B, and Program C. They
randomly assign 30 employees to one of the programs and record the time it takes each employee to
complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or
interaction effects between the software programs and employee experience level (novice vs.
experienced). Report the F-statistics and p-values, and interpret the results.



In [4]:
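import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Placeholder data, simulated (replace with the recorded times): 30 employees, 5 per Software x Experience cell
rng = np.random.default_rng(0)
df = pd.DataFrame({'Software': np.repeat(['A', 'B', 'C'], 10),
                   'Experience': np.tile(np.repeat(['novice', 'experienced'], 5), 3),
                   'Time': rng.normal(loc=20, scale=3, size=30)})

# Fit the model: C(Software) * C(Experience) expands to both main effects plus their interaction
model = ols('Time ~ C(Software) * C(Experience)', data=df).fit()

# Perform ANOVA (Type II sums of squares); each row reports F and PR(>F) for one effect
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)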

Q11. An educational researcher is interested in whether a new teaching method improves student test
scores. They randomly assign 100 students to either the control group (traditional teaching method) or the
experimental group (new teaching method) and administer a test at the end of the semester. Conduct a
two-sample t-test using Python to determine if there are any significant differences in test scores
between the two groups. If the results are significant, follow up with a post-hoc test to determine which
group(s) differ significantly from each other.



In [None]:
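import numpy as np
import scipy.stats as stats

# Placeholder scores, simulated (replace with the real test scores): assumed 50 students per group
rng = np.random.default_rng(0)
control_scores = rng.normal(loc=70, scale=10, size=50)
experimental_scores = rng.normal(loc=70, scale=10, size=50)

# Two-sample t-test; with only two groups, a significant result already identifies the pair that differs, so no post-hoc test is needed
t_statistic, p_value = stats.ttest_ind(control_scores, experimental_scores)

print(f"T-statistic: {t_statistic}")
print(f"P-value: {p_value}")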

Q12. A researcher wants to know if there are any significant differences in the average daily sales of three
retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store
on those days. Conduct a repeated measures ANOVA using Python to determine if there are any
significant differences in sales between the three stores. If the results are significant, follow up with a
post-hoc test to determine which store(s) differ significantly from each other.

In [None]:
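import pandas as pd
import pingouin as pg

# Sample wide-format data, one row per day (replace this with the actual 30 days of sales)
data = {'Store_A': [100, 120, 90, 110, 130],
        'Store_B': [90, 80, 100, 110, 120],
        'Store_C': [120, 110, 100, 130, 140]}

df = pd.DataFrame(data)

# rm_anova with dv/within/subject expects long format: one row per (day, store) pair
df_long = (df.rename_axis('Day').reset_index()
             .melt(id_vars='Day', var_name='Store', value_name='Sales'))

# Perform Repeated Measures ANOVA; each day is the repeated "subject" measured at all three stores
rm_anova = pg.rm_anova(data=df_long, dv='Sales', within='Store', subject='Day')

print(rm_anova)

# If the ANOVA is significant, compare stores pairwise with a Bonferroni correction
# (pairwise_tests requires pingouin >= 0.5.2; older releases name it pairwise_ttests)
posthoc = pg.pairwise_tests(data=df_long, dv='Sales', within='Store', subject='Day', padjust='bonf')

print(posthoc)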