Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact
the validity of the results.
Q2. What are the three types of ANOVA, and in what situations would each be used?
Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?
Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual
sum of squares (SSR) in a one-way ANOVA using Python?
Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?
Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?
Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential
consequences of using different methods to handle missing data?
Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide
an example of a situation where a post-hoc test might be necessary.
Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
to determine if there are any significant differences between the mean weight loss of the three diets.
Report the F-statistic and p-value, and interpret the results.
Q10. A company wants to know if there are any significant differences in the average time it takes to
complete a task using three different software programs: Program A, Program B, and Program C. They
randomly assign 30 employees to one of the programs and record the time it takes each employee to
complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or
interaction effects between the software programs and employee experience level (novice vs.
experienced). Report the F-statistics and p-values, and interpret the results.
Q11. An educational researcher is interested in whether a new teaching method improves student test
scores. They randomly assign 100 students to either the control group (traditional teaching method) or the
experimental group (new teaching method) and administer a test at the end of the semester. Conduct a
two-sample t-test using Python to determine if there are any significant differences in test scores
between the two groups. If the results are significant, follow up with a post-hoc test to determine which
group(s) differ significantly from each other.
Q12. A researcher wants to know if there are any significant differences in the average daily sales of three
retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store
on those days. Conduct a repeated measures ANOVA using Python to determine if there are any
significant differences in sales between the three stores. If the results are significant, follow up with a
post-hoc test to determine which store(s) differ significantly from each other.

Q1. Assumptions of ANOVA:
   - **Independence**: The observations within each group are independent of each other.
   - **Normality**: The residuals (differences between observed values and their group means) are approximately normally distributed within each group.
   - **Homogeneity of variances**: The variance of the residuals is roughly equal across all groups (homoscedasticity).
   
   Examples of violations:
   - Independence: Measuring the same subjects repeatedly over time, as in a repeated measures design, makes the observations correlated. Treating them as independent in a standard one-way ANOVA typically understates the standard errors and inflates the Type I error rate.
   - Normality: Strongly skewed or heavy-tailed residuals, especially with small samples, can distort the p-values of the F-test. With larger samples, ANOVA is fairly robust to moderate departures from normality.
   - Homogeneity of variances: If one group's variance is much larger than another's, the F-test becomes unreliable, particularly when the group sizes are unequal.
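
The normality and equal-variance assumptions can be checked numerically. The following is a minimal sketch, assuming small illustrative groups (hypothetical values, not real study data) and using the Shapiro-Wilk and Levene tests from SciPy. Independence cannot be tested this way; it has to be judged from the study design.

```python
import numpy as np
from scipy import stats

# Illustrative data (hypothetical values, not from a real study)
group1 = [5, 7, 9, 8, 6]
group2 = [10, 12, 14, 11, 13]
group3 = [15, 17, 19, 16, 18]
groups = [np.asarray(g, dtype=float) for g in (group1, group2, group3)]

# Residuals: each observation minus its own group mean
residuals = np.concatenate([g - g.mean() for g in groups])

# Normality of residuals (Shapiro-Wilk); a small p-value suggests non-normality
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Homogeneity of variances (Levene); a small p-value suggests unequal variances
levene_stat, levene_p = stats.levene(*groups)

print("Shapiro-Wilk p-value:", shapiro_p)
print("Levene p-value:", levene_p)
```

With very small samples these tests have little power, so a residual plot is usually worth inspecting alongside them.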

Q2. Types of ANOVA:
   - **One-way ANOVA**: Used to compare the means of three or more independent groups defined by a single factor.
   - **Two-way ANOVA**: Used to examine the effects of two factors on the response at the same time, including whether the factors interact.
   - **Repeated measures ANOVA**: Used to compare means when the same subjects are measured at different time points or under different conditions.

   Each type is used based on the experimental design and the number of independent variables being studied.

Q3. Partitioning of variance in ANOVA:
   - ANOVA decomposes the total variance observed in the data into different sources, such as the variance explained by the factors and the variance due to random error.
   - Understanding this partitioning helps in assessing the relative importance of different sources of variation and in interpreting the results of ANOVA.
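
As a small illustration of this partitioning, the sketch below computes the between-group and within-group sums of squares directly and checks that they add up to the total. The data are illustrative (hypothetical values), and the variable names `ss_between` and `ss_within` are chosen here for readability; they correspond to the explained and residual sums of squares.

```python
import numpy as np
from scipy import stats

# Illustrative data (hypothetical values)
groups = [
    np.array([5, 7, 9, 8, 6], dtype=float),
    np.array([10, 12, 14, 11, 13], dtype=float),
    np.array([15, 17, 19, 16, 18], dtype=float),
]
all_data = np.concatenate(groups)
grand_mean = all_data.mean()

# Total variation: every observation against the grand mean
ss_total = np.sum((all_data - grand_mean) ** 2)

# Between-group variation: group means against the grand mean, weighted by group size
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group variation: every observation against its own group mean
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)

# The F-statistic is the ratio of the two mean squares
df_between = len(groups) - 1
df_within = len(all_data) - len(groups)
f_manual = (ss_between / df_between) / (ss_within / df_within)

# Cross-check against SciPy's one-way ANOVA
f_scipy, p_scipy = stats.f_oneway(*groups)

print("Total:", ss_total, "= between:", ss_between, "+ within:", ss_within)
print("F (manual):", f_manual, "F (scipy):", f_scipy)
```

The identity total = between + within is what lets ANOVA attribute a share of the variation to the factor and the remainder to random error.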

Q4. Calculation of SST, SSE, and SSR in one-way ANOVA using Python:
```python
import numpy as np
from scipy import stats

# Example data
group1 = [5, 7, 9, 8, 6]
group2 = [10, 12, 14, 11, 13]
group3 = [15, 17, 19, 16, 18]

# Combine data
all_data = np.concatenate([group1, group2, group3])

# Calculate mean
grand_mean = np.mean(all_data)

# Calculate total sum of squares (SST)
SST = np.sum((all_data - grand_mean) ** 2)

# Calculate explained (between-group) sum of squares (SSE)
# Note: some texts use SSE for the error sum of squares; here it means "explained"
SSE = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in (group1, group2, group3))

# Calculate residual (within-group) sum of squares (SSR)
SSR = SST - SSE

print("SST:", SST)
print("SSE:", SSE)
print("SSR:", SSR)
```

Q5. Calculation of main effects and interaction effects in two-way ANOVA using Python:
```python
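import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data (illustrative values): 'data' is a pandas DataFrame with
# categorical columns 'factor1' and 'factor2' and a numeric column 'response'
data = pd.DataFrame({
    'factor1': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
    'factor2': ['x', 'x', 'y', 'y', 'x', 'x', 'y', 'y'],
    'response': [4.1, 3.9, 6.2, 5.8, 5.0, 5.4, 9.1, 8.7],
})

# Fit two-way ANOVA model; C() marks each factor as categorical
model = ols('response ~ C(factor1) + C(factor2) + C(factor1):C(factor2)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Extract main effects (sum of squares, df, F, and p-value for each factor)
main_effects = anova_table.loc[['C(factor1)', 'C(factor2)']]

# Extract interaction effect
interaction_effect = anova_table.loc['C(factor1):C(factor2)']

print("Main effects:\n", main_effects)
print("Interaction effect:\n", interaction_effect)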
```

These calculations provide insights into the contributions of different factors and their interactions to the overall variance observed in the data.



Q6. For the one-way ANOVA:
   - The F-statistic tests whether there are significant differences in the means of the groups.
   - The F-statistic of 5.23 is the ratio of the between-group mean square to the within-group mean square. A value well above 1 means the group means vary more than within-group noise alone would predict.
   - The associated p-value of 0.02 is below the conventional significance level of 0.05, so we reject the null hypothesis that all group means are equal.
   - Therefore, we can conclude that at least one group mean differs from the others.
   - However, further post-hoc tests may be needed to determine which specific groups differ from each other.
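
The link between an F-statistic and its p-value can be sketched with the F distribution in SciPy. The question does not state the degrees of freedom, so the values below (3 groups, 15 observations) are purely an assumption chosen for illustration.

```python
from scipy import stats

f_statistic = 5.23

# ASSUMPTION: the degrees of freedom are not given in the question;
# these hypothetical values correspond to 3 groups and 15 observations
df_between = 2
df_within = 12

# p-value: probability of an F at least this large if all group means are equal
p_value = stats.f.sf(f_statistic, df_between, df_within)

# Critical value at the 5% significance level
f_critical = stats.f.ppf(0.95, df_between, df_within)

print("p-value:", p_value)
print("Critical F at alpha = 0.05:", f_critical)
print("Reject the null hypothesis:", f_statistic > f_critical)
```

Under these assumed degrees of freedom the p-value comes out close to the reported 0.02; with different group counts or sample sizes the same F-statistic would give a different p-value.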

Q7. Handling missing data in repeated measures ANOVA:
   - One approach is to remove every subject with any missing observation (listwise deletion), which is what a classical repeated measures ANOVA requires. This reduces statistical power and introduces bias if the data are not missing at random.
   - Another approach is to impute missing values using methods such as mean imputation, regression imputation, or multiple imputation.
   - The choice of method can affect the results and conclusions of the analysis. Mean imputation artificially reduces variability, while regression imputation may introduce bias if the imputation model is misspecified.
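
To make the trade-off concrete, here is a minimal sketch comparing listwise deletion with mean imputation in pandas. The DataFrame `scores` and its columns `t1`, `t2`, `t3` are hypothetical names with illustrative values.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one row per subject, one column per time point
scores = pd.DataFrame({
    't1': [10.0, 12.0, 11.0, 14.0, 13.0],
    't2': [11.0, np.nan, 12.0, 15.0, 14.0],
    't3': [12.0, 14.0, np.nan, 17.0, 15.0],
})

# Listwise deletion: drop every subject with at least one missing value
complete_cases = scores.dropna()

# Mean imputation: replace each missing value with the mean of its column
mean_imputed = scores.fillna(scores.mean())

print("Subjects kept after listwise deletion:", len(complete_cases), "of", len(scores))
print("Variance of observed t2 values:", scores['t2'].var())
print("Variance of t2 after mean imputation:", mean_imputed['t2'].var())
```

Listwise deletion discards two of the five subjects here, while mean imputation keeps all of them but shrinks the variance of the imputed column.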

Q8. Common post-hoc tests after ANOVA:
   - **Tukey's Honestly Significant Difference (HSD)**: Used to determine which specific groups differ significantly from each other. It controls the familywise error rate.
   - **Bonferroni correction**: Adjusts the significance level for multiple comparisons to avoid Type I errors. It is more conservative than Tukey's HSD.
   - **Scheffe's method**: Suitable for complex comparisons (contrasts involving more than two group means), including comparisons chosen after looking at the data. For simple pairwise comparisons it is more conservative, and therefore less powerful, than Tukey's HSD.
   - **Dunnett's test**: Used for comparing multiple treatments to a control group. It adjusts for the increased risk of Type I error due to multiple comparisons.

   Example scenario: After conducting a one-way ANOVA to compare the effectiveness of three different treatments, you find a significant difference. To identify which treatments are different from each other, you conduct Tukey's HSD test.
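
A scenario like this one could be sketched with `pairwise_tukeyhsd` from statsmodels. The treatment labels and response values below are illustrative assumptions, not real measurements.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Illustrative responses (hypothetical values), five per treatment
values = np.array([5, 7, 9, 8, 6,
                   10, 12, 14, 11, 13,
                   15, 17, 19, 16, 18], dtype=float)
labels = np.repeat(['treatment_1', 'treatment_2', 'treatment_3'], 5)

# Tukey's HSD compares every pair of groups while controlling the familywise error rate
tukey = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)

# The summary lists each pair, the mean difference, a confidence interval,
# and whether the null hypothesis of equal means is rejected
print(tukey.summary())
```

Each row of the summary corresponds to one pair of treatments, so it shows directly which pairs drive the significant ANOVA result.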
   

Q9. Here's how you can conduct a one-way ANOVA in Python to compare the mean weight loss of three diets:

```python
import scipy.stats as stats

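# Illustrative weight loss data for three diets (30 values per diet here;
# replace with the measurements from the study's 50 participants)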
diet_A = [5, 6, 7, 8, 6, 7, 5, 6, 7, 8, 6, 7, 5, 6, 7, 8, 6, 7, 5, 6, 7, 8, 6, 7, 5, 6, 7, 8, 6, 7]
diet_B = [4, 5, 6, 5, 6, 7, 5, 6, 7, 8, 6, 7, 5, 6, 7, 8, 6, 7, 5, 6, 7, 8, 6, 7, 5, 6, 7, 8, 6, 7]
diet_C = [3, 4, 5, 6, 5, 6, 7, 5, 6, 7, 8, 6, 7, 5, 6, 7, 8, 6, 7, 5, 6, 7, 8, 6, 7, 5, 6, 7, 8, 6]

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

# Print results
print("F-statistic:", f_statistic)
print("p-value:", p_value)

# Interpretation
if p_value < 0.05:
    print("There are significant differences between the mean weight loss of the three diets.")
else:
    print("There are no significant differences between the mean weight loss of the three diets.")
```

Q10. Conducting a two-way ANOVA in Python to analyze the effects of software programs and employee experience level:

```python
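import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Create a DataFrame with software programs, employee experience level, and task completion times.
# The times below are simulated placeholders (30 employees, 5 per program/experience cell);
# replace them with the recorded data.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    'software_program': np.repeat(['A', 'B', 'C'], 10),
    'experience_level': np.tile(np.repeat(['novice', 'experienced'], 5), 3),
    'completion_time': rng.normal(loc=20, scale=3, size=30),
})

# Perform two-way ANOVA
model = ols('completion_time ~ C(software_program) + C(experience_level) + C(software_program):C(experience_level)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Print ANOVA table (the 'F' and 'PR(>F)' columns hold the F-statistics and p-values)
print(anova_table)

# Interpretation
# A p-value below 0.05 in a factor's row indicates a significant main effect;
# a p-value below 0.05 in the interaction row means the effect of the software
# program depends on the employee's experience level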

Q11. Conducting a two-sample t-test in Python to compare test scores between the control and experimental groups:

```python
from scipy import stats

# Test scores for control and experimental groups
control_scores = [85, 88, 82, 90, 78, 86, 92, 80, 87, 83]
experimental_scores = [92, 95, 88, 96, 85, 91, 97, 89, 93, 90]

# Perform two-sample t-test
t_statistic, p_value = stats.ttest_ind(control_scores, experimental_scores)

# Print results
print("t-statistic:", t_statistic)
print("p-value:", p_value)

# Interpretation
if p_value < 0.05:
    print("There is a significant difference in test scores between the control and experimental groups.")
else:
    print("There is no significant difference in test scores between the control and experimental groups.")
```
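
With only two groups, a significant t-test already identifies which groups differ, so a separate post-hoc test adds nothing. A more informative follow-up is to report the direction and size of the difference. The sketch below is one way to do that, assuming the same illustrative scores; it uses Welch's t-test (which does not assume equal variances) and Cohen's d with a pooled standard deviation.

```python
import numpy as np
from scipy import stats

# Same illustrative scores as in the t-test example
control_scores = np.array([85, 88, 82, 90, 78, 86, 92, 80, 87, 83], dtype=float)
experimental_scores = np.array([92, 95, 88, 96, 85, 91, 97, 89, 93, 90], dtype=float)

# Welch's t-test: a variant of the two-sample t-test without the equal-variance assumption
t_welch, p_welch = stats.ttest_ind(control_scores, experimental_scores, equal_var=False)

# Cohen's d: mean difference divided by the pooled standard deviation
n1, n2 = len(control_scores), len(experimental_scores)
pooled_sd = np.sqrt(((n1 - 1) * control_scores.var(ddof=1) +
                     (n2 - 1) * experimental_scores.var(ddof=1)) / (n1 + n2 - 2))
cohens_d = (experimental_scores.mean() - control_scores.mean()) / pooled_sd

print("Welch t-statistic:", t_welch)
print("Welch p-value:", p_welch)
print("Mean difference (experimental - control):", experimental_scores.mean() - control_scores.mean())
print("Cohen's d:", cohens_d)
```

A positive mean difference and a positive Cohen's d indicate that the experimental group scored higher than the control group.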

Q12. Conducting a repeated measures ANOVA in Python to analyze daily sales data for three retail stores:

```python
# Create a DataFrame with daily sales data for three stores
# Perform repeated measures ANOVA
# Follow up with a post-hoc test if the results are significant (e.g., pairwise paired
# t-tests with a Bonferroni correction, since the same days are measured for every store)
```
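
One possible way to fill in these steps is sketched below, using `AnovaRM` from statsmodels with the day as the repeated "subject" and the store as the within-subject factor. The sales figures are simulated placeholders, and the column names `day`, `store`, and `sales` are assumptions made for this example. The post-hoc step uses paired t-tests with a Bonferroni correction because the measurements are paired by day.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

# Simulated (hypothetical) daily sales for 30 days; replace with the recorded data
rng = np.random.default_rng(42)
day_effect = rng.normal(0, 50, size=30)  # variation shared by all stores on a given day
sales = pd.DataFrame({
    'day': np.tile(np.arange(1, 31), 3),
    'store': np.repeat(['A', 'B', 'C'], 30),
    'sales': np.concatenate([
        1000 + day_effect + rng.normal(0, 30, size=30),
        1050 + day_effect + rng.normal(0, 30, size=30),
        1100 + day_effect + rng.normal(0, 30, size=30),
    ]),
})

# Repeated measures ANOVA: each day is measured once under every store
result = AnovaRM(data=sales, depvar='sales', subject='day', within=['store']).fit()
anova_table = result.anova_table
print(anova_table)

# Post-hoc: paired t-tests on each pair of stores, Bonferroni-adjusted
p_overall = anova_table.loc['store', 'Pr > F']
if p_overall < 0.05:
    wide = sales.pivot(index='day', columns='store', values='sales')
    pairs = [('A', 'B'), ('A', 'C'), ('B', 'C')]
    for first, second in pairs:
        t_stat, p_raw = stats.ttest_rel(wide[first], wide[second])
        p_adjusted = min(p_raw * len(pairs), 1.0)
        print(f"Store {first} vs Store {second}: t = {t_stat:.2f}, adjusted p = {p_adjusted:.4f}")
```

`AnovaRM` requires complete, balanced data (exactly one observation per day and store), so any days with missing sales would need to be handled before fitting.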