Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact
the validity of the results.
Q2. What are the three types of ANOVA, and in what situations would each be used?
Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?
Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual
sum of squares (SSR) in a one-way ANOVA using Python?
Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?
Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?
Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential
consequences of using different methods to handle missing data?
Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide
an example of a situation where a post-hoc test might be necessary.
Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
to determine if there are any significant differences between the mean weight loss of the three diets.
Report the F-statistic and p-value, and interpret the results.
Q10. A company wants to know if there are any significant differences in the average time it takes to
complete a task using three different software programs: Program A, Program B, and Program C. They
randomly assign 30 employees to one of the programs and record the time it takes each employee to
complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or
interaction effects between the software programs and employee experience level (novice vs.
experienced). Report the F-statistics and p-values, and interpret the results.
Q11. An educational researcher is interested in whether a new teaching method improves student test
scores. They randomly assign 100 students to either the control group (traditional teaching method) or the
experimental group (new teaching method) and administer a test at the end of the semester. Conduct a
two-sample t-test using Python to determine if there are any significant differences in test scores
between the two groups. If the results are significant, follow up with a post-hoc test to determine which
group(s) differ significantly from each other.
Q12. A researcher wants to know if there are any significant differences in the average daily sales of three
retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store
on those days. Conduct a repeated measures ANOVA using Python to determine if there are any
significant differences in sales between the three stores. If the results are significant, follow up with a post-hoc test to determine which store(s) differ significantly from each other.

Solution

Let's address each of your questions regarding ANOVA step by step:

### Q1. Assumptions of ANOVA and Violations
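The assumptions of ANOVA (Analysis of Variance) include:
1. **Independence:** Observations are independent within and between groups.
2. **Normality:** The residuals (equivalently, the dependent variable within each group) are approximately normally distributed.
3. **Homogeneity of Variances:** The population variances of the dependent variable are equal across all groups.

Examples of violations that can impact the validity of ANOVA results:
- **Independence Violation:** Measuring the same participants under every condition, or sampling students clustered in the same classrooms, and then analyzing the data as if every observation were independent. This typically understates the error variance and inflates the Type I error rate; a repeated measures or mixed design is needed instead.
- **Normality Violation:** Strongly skewed outcomes (e.g., reaction times or incomes) or heavy outliers, particularly with small samples. With moderate-to-large, balanced samples ANOVA is fairly robust to non-normality, but with small samples the actual Type I error rate and power can differ noticeably from their nominal values.
- **Homogeneity of Variances Violation:** One group being much more variable than the others, e.g., a treatment that helps some participants a lot and others not at all. Unequal variances distort the F-statistic and p-values, especially when group sizes are also unequal; Welch's ANOVA is a common remedy.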
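Normality and equal variances can be checked in Python before running the ANOVA. The following is a minimal sketch using SciPy's `shapiro` and `levene` on hypothetical groups (the three samples are made up for illustration). Independence cannot be tested this way; it follows from how the data were collected.

```python
import numpy as np
from scipy import stats

# Hypothetical samples for three groups (illustration only)
groups = [
    np.array([10, 12, 15, 18, 20], dtype=float),
    np.array([8, 11, 14, 16, 19], dtype=float),
    np.array([9, 13, 16, 17, 21], dtype=float),
]

# Normality: Shapiro-Wilk test on the pooled residuals (each observation minus its group mean)
residuals = np.concatenate([g - g.mean() for g in groups])
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Homogeneity of variances: Levene's test; center="median" (SciPy's default) is the Brown-Forsythe variant
levene_stat, levene_p = stats.levene(*groups, center="median")

print(f"Shapiro-Wilk on residuals: W = {shapiro_stat:.3f}, p = {shapiro_p:.3f}")
print(f"Levene (median-centered): stat = {levene_stat:.3f}, p = {levene_p:.3f}")
```

A small p-value from either test is a reason to inspect the data more closely (for example with residual plots), not an automatic verdict; with large samples these tests flag even harmless departures.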

### Q2. Types of ANOVA and Situations
The three types of ANOVA are:
1. **One-Way ANOVA:** Compares the means of three or more independent groups defined by a single factor (e.g., weight loss under diets A, B, and C).
2. **Two-Way ANOVA:** Examines two factors at once, testing the main effect of each factor and the interaction between them (e.g., software program and employee experience level).
3. **Repeated Measures ANOVA:** Compares means when the same subjects are measured under every condition or at several time points (e.g., the same patients before, during, and after treatment).

The choice follows from the research design: one factor with independent groups calls for a one-way ANOVA, two crossed factors for a two-way ANOVA, and repeated measurements on the same subjects for a repeated measures ANOVA.

### Q3. Partitioning of Variance in ANOVA
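The partitioning of variance in ANOVA decomposes the total variability into additive components, SST = SSE + SSR:
- **Total Sum of Squares (SST):** Squared deviations of every observation from the grand mean.
- **Explained Sum of Squares (SSE):** Squared deviations of the group means from the grand mean, weighted by group size (between-group variation explained by the factor).
- **Residual Sum of Squares (SSR):** Squared deviations of each observation from its own group mean (within-group, or error, variation).

Naming conventions vary: many texts write SSB and SSW, or use SSE for the *error* sum of squares. Here, following the question, SSE denotes the explained and SSR the residual sum of squares.

Understanding this concept is crucial because the F-test is built directly from it: F compares the explained variation per degree of freedom with the residual variation per degree of freedom, so a factor is judged significant only when it accounts for substantially more variation than chance alone would. In designs with several factors, the explained part is split further into main-effect and interaction components.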
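For a one-way design with k groups, n_i observations in group i, N observations in total, group means $\bar{y}_i$ and grand mean $\bar{y}$, the partition and the F-statistic built from it can be written as follows (the standard one-way ANOVA formulas, using the SSE = explained, SSR = residual naming above):

$$
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}\right)^2}_{SST}
= \underbrace{\sum_{i=1}^{k} n_i\left(\bar{y}_i-\bar{y}\right)^2}_{SSE}
+ \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij}-\bar{y}_i\right)^2}_{SSR},
\qquad
F = \frac{SSE/(k-1)}{SSR/(N-k)}
$$

The identity holds because, after writing $y_{ij}-\bar{y} = (y_{ij}-\bar{y}_i) + (\bar{y}_i-\bar{y})$ and squaring, the cross term vanishes: the deviations from each group mean sum to zero within that group.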

### Q4. Calculation of SST, SSE, and SSR in One-Way ANOVA using Python
In Python, you can compute SST, SSE, and SSR directly with NumPy and cross-check the resulting F-statistic against `scipy.stats.f_oneway`.

For example:
```python
import numpy as np
from scipy import stats

# Sample data for groups A, B, and C
data_A = [10, 12, 15, 18, 20]
data_B = [8, 11, 14, 16, 19]
data_C = [9, 13, 16, 17, 21]
groups = [np.asarray(g, dtype=float) for g in (data_A, data_B, data_C)]

# Combine data into one array and compute the grand mean
all_data = np.concatenate(groups)
mean_total = np.mean(all_data)

# Total Sum of Squares (SST): every observation vs. the grand mean
SST = np.sum((all_data - mean_total) ** 2)

# Explained (between-group) Sum of Squares (SSE): group means vs. grand mean, weighted by group size
SSE = sum(len(g) * (g.mean() - mean_total) ** 2 for g in groups)

# Residual (within-group) Sum of Squares (SSR): observations vs. their own group mean
SSR = sum(np.sum((g - g.mean()) ** 2) for g in groups)

# The partition SST = SSE + SSR holds up to floating-point error
assert np.isclose(SST, SSE + SSR)

# F = (SSE / df_between) / (SSR / df_within) should match SciPy's one-way ANOVA
k, N = len(groups), len(all_data)
F = (SSE / (k - 1)) / (SSR / (N - k))
F_scipy, p_scipy = stats.f_oneway(*groups)

print("Total Sum of Squares (SST):", SST)
print("Explained Sum of Squares (SSE):", SSE)
print("Residual Sum of Squares (SSR):", SSR)
print("F (manual):", F, "| F (scipy):", F_scipy, "| p:", p_scipy)
```

### Q5. Calculation of Main Effects and Interaction Effects in Two-Way ANOVA using Python
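In a two-way ANOVA, a main effect is the effect of one factor averaged over the levels of the other factor, and an interaction effect means that the effect of one factor depends on the level of the other.

The usual way to test these effects in Python is to fit a linear model with `statsmodels` and pass it to `anova_lm`, which reports a sum of squares, F-statistic, and p-value for each main effect and for the interaction.

Here's an example using `statsmodels`:
```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Illustrative data: factor A (3 levels) x factor B (2 levels), 2 replicates per cell.
# Replication is required: with one observation per cell, the model with
# an interaction term leaves no residual degrees of freedom.
data = {'A': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
        'B': [1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2],
        'Y': [10, 11, 12, 14, 15, 16, 18, 19, 20, 21, 8, 9]}

df = pd.DataFrame(data)

# C() treats the numeric codes as categorical factors, not continuous predictors
model = ols('Y ~ C(A) + C(B) + C(A):C(B)', data=df).fit()

# ANOVA table: sum of squares, F-statistic and p-value for each
# main effect (C(A), C(B)) and for the interaction (C(A):C(B))
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)
```

Note that `model.params` holds regression coefficients, whose values depend on how the factors are coded; the ANOVA table, not the coefficients, is what tests the main and interaction effects.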
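The effects themselves can also be computed by hand from marginal and cell means, which shows what the ANOVA table is summarizing. The following is a minimal sketch for a balanced design only (equal replicates per cell), on hypothetical data; the variable names are illustrative, not a library API.

```python
import numpy as np
import pandas as pd

# Hypothetical balanced 3 x 2 design with 2 replicates per cell (illustration only)
df = pd.DataFrame({
    "A": ["a1"] * 4 + ["a2"] * 4 + ["a3"] * 4,
    "B": ["b1", "b1", "b2", "b2"] * 3,
    "Y": [10, 11, 12, 14, 15, 16, 18, 19, 20, 21, 8, 9],
})

grand = df["Y"].mean()
mean_a = df.groupby("A")["Y"].mean()        # marginal means of factor A
mean_b = df.groupby("B")["Y"].mean()        # marginal means of factor B
cell = df.groupby(["A", "B"])["Y"].mean()   # cell means

# Main effects: deviation of each marginal mean from the grand mean
effect_a = mean_a - grand
effect_b = mean_b - grand

# Interaction effects: what remains of each cell mean after removing both main effects
interaction = pd.Series(
    {(a, b): cell[(a, b)] - mean_a[a] - mean_b[b] + grand for a, b in cell.index}
)

# Sums of squares (these formulas assume a balanced design)
n_per_cell = len(df) // len(cell)
ss_a = n_per_cell * len(mean_b) * (effect_a ** 2).sum()
ss_b = n_per_cell * len(mean_a) * (effect_b ** 2).sum()
ss_ab = n_per_cell * (interaction ** 2).sum()
fitted = df.groupby(["A", "B"])["Y"].transform("mean")   # each row's cell mean
ss_error = ((df["Y"] - fitted) ** 2).sum()
ss_total = ((df["Y"] - grand) ** 2).sum()

print("Main effects of A:\n", effect_a)
print("Main effects of B:\n", effect_b)
print("Interaction effects:\n", interaction)
print("SS_A, SS_B, SS_AB, SS_error:", ss_a, ss_b, ss_ab, ss_error)
```

In a balanced design these sums of squares add up to the total sum of squares, extending the one-way partition from Q3 to two factors; for unbalanced data the decomposition is no longer unique, which is why `anova_lm` offers different `typ` options.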

### Q6. Interpretation of One-Way ANOVA Results (F-statistic and p-value)
For a one-way ANOVA where you obtained an F-statistic of 5.23 and a p-value of 0.02:

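- **F-statistic:** The ratio of the between-group mean square (explained variation per degree of freedom) to the within-group mean square (unexplained variation per degree of freedom). Larger values indicate that the group means differ more than within-group noise would suggest, but whether 5.23 is "large" depends on the degrees of freedom.
- **P-value:** The probability of obtaining an F-statistic at least as large as the observed one if the null hypothesis (all group means equal) were true. A p-value below the chosen significance level (e.g., α = 0.05) leads to rejecting the null hypothesis.

In this case:
- The p-value of 0.02 is below 0.05, so the null hypothesis of equal means is rejected at the 5% level (though it would not be at the 1% level).
- The F-statistic of 5.23 is what produced that p-value given the (unstated) degrees of freedom; on its own it does not establish significance.

Therefore, you would conclude that at least one group mean differs from the others. The ANOVA does not tell you which groups differ or by how much; a post-hoc test (see Q8) and an effect size such as η² are needed for that.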
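The link between F, the degrees of freedom, and the p-value can be made concrete with SciPy's F distribution. This is a minimal sketch; the question does not state the degrees of freedom, so k = 3 groups and N = 30 observations below are assumptions, and the resulting p-value will not necessarily equal 0.02.

```python
from scipy import stats

# Observed statistic from the question
f_observed = 5.23

# Hypothetical design (not given in the question): k = 3 groups, N = 30 observations
k, N = 3, 30
df_between, df_within = k - 1, N - k

# p-value = P(F >= f_observed) under H0: the upper-tail area of the F distribution
p_value = stats.f.sf(f_observed, df_between, df_within)

# Critical value: H0 is rejected at alpha = 0.05 exactly when f_observed exceeds it
f_critical = stats.f.ppf(0.95, df_between, df_within)

print(f"p = {p_value:.4f}, F critical (alpha = 0.05) = {f_critical:.3f}")
```

Changing `df_between` and `df_within` shows why the same F value can be significant in one study and not in another.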

### Q7. Handling Missing Data in Repeated Measures ANOVA
In a repeated measures ANOVA, missing data can be handled using various methods:
1. **Complete Case Analysis:** Exclude cases with missing data (listwise deletion).
2. **Mean Imputation:** Replace missing values with the mean of the observed values.
3. **Last Observation Carried Forward (LOCF):** Use the last observed value for missing data.
4. **Interpolation:** Estimate missing values based on neighboring data points.
5. **Multiple Imputation:** Generate multiple imputed datasets to account for uncertainty.

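The consequences of using different methods include:
- Complete case analysis discards every subject with any missing measurement, which reduces statistical power and can bias the results unless the data are missing completely at random (MCAR).
- Mean imputation keeps the sample size but shrinks the variability and weakens the correlations between repeated measures, which can make tests overly liberal.
- LOCF assumes a subject's value would have stayed unchanged after the last observation; this is rarely plausible and can bias estimates in either direction.
- Interpolation assumes values change smoothly between measured time points and cannot genuinely fill gaps at the start or end of a series.
- Multiple imputation reflects the uncertainty in the imputed values and gives valid inference when data are missing at random (MAR), at the cost of a more complex analysis.

Alternatively, a linear mixed-effects model uses all available observations without imputing anything and is also valid under MAR.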
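Three of the simpler strategies can be compared directly with pandas. This is a minimal sketch on hypothetical wide-format data (one row per subject, one column per time point, `NaN` for a missing measurement); multiple imputation is omitted because it needs a dedicated imputation model.

```python
import numpy as np
import pandas as pd

# Hypothetical repeated measures data: subjects in rows, time points in columns
wide = pd.DataFrame(
    {"t1": [10.0, 12.0, 9.0, 11.0, 13.0],
     "t2": [11.0, np.nan, 10.0, 12.0, 15.0],
     "t3": [13.0, 14.0, np.nan, 13.0, 16.0]},
    index=["s1", "s2", "s3", "s4", "s5"],
)

# 1. Complete case analysis: drop every subject with any missing value
complete = wide.dropna()

# 2. Mean imputation: fill each time point with the mean of its observed values
mean_imputed = wide.fillna(wide.mean())

# 3. LOCF: carry each subject's last observed value forward along the row
locf = wide.ffill(axis=1)

print("Subjects kept by complete-case analysis:", len(complete), "of", len(wide))
print("Observed SD at t2:     ", round(wide["t2"].std(), 3))
print("Mean-imputed SD at t2: ", round(mean_imputed["t2"].std(), 3))
print(locf)
```

The two standard deviations illustrate the point above: imputing the mean adds a value with zero deviation, so the spread at t2 shrinks even though no new information was gained.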

### Q8. Common Post-hoc Tests after ANOVA and Usage
Common post-hoc tests after ANOVA include:
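1. **Tukey's Honestly Significant Difference (HSD):** Compares all pairs of group means while controlling the familywise error rate; the standard choice when every pairwise comparison is of interest and variances are roughly equal.
2. **Bonferroni Correction:** Tests each of m comparisons at α/m to control the familywise Type I error rate; simple and usable for any set of planned comparisons, but conservative when m is large.
3. **Duncan's Multiple Range Test:** Similar in purpose to Tukey's HSD but less conservative, so it detects more differences at the price of a higher familywise Type I error rate.
4. **Scheffé's Test:** Controls the error rate for all possible contrasts, including complex ones (e.g., the average of groups A and B vs. group C), and handles unequal sample sizes; it is conservative for simple pairwise comparisons.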

You would use post-hoc tests when ANOVA indicates significant differences among group means but does not specify which groups differ significantly. For example, after conducting a one-way ANOVA comparing mean weight loss among three diets, you might use Tukey's HSD to identify specific pairwise differences.
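For the diet example, Tukey's HSD and a Bonferroni-corrected set of t-tests can be run with SciPy. This is a minimal sketch on hypothetical weight-loss values; `scipy.stats.tukey_hsd` requires SciPy 1.8 or later.

```python
import numpy as np
from itertools import combinations
from scipy import stats

# Hypothetical weight loss (kg) for three diets (illustration only)
diets = {
    "A": [5, 4, 6, 5, 7],
    "B": [3, 2, 4, 3, 2],
    "C": [6, 5, 7, 8, 6],
}
names = list(diets)

# Tukey's HSD: res.statistic[i, j] is mean_i - mean_j, res.pvalue[i, j] its adjusted p-value
res = stats.tukey_hsd(*diets.values())
for i, j in combinations(range(len(names)), 2):
    print(f"Tukey {names[i]} vs {names[j]}: diff = {res.statistic[i, j]:.2f}, p = {res.pvalue[i, j]:.4f}")

# Bonferroni: ordinary two-sample t-tests, each judged at alpha / number of comparisons
alpha = 0.05
pairs = list(combinations(names, 2))
for a, b in pairs:
    t, p = stats.ttest_ind(diets[a], diets[b])
    print(f"Bonferroni {a} vs {b}: p = {p:.4f}, significant = {p < alpha / len(pairs)}")
```

Both approaches control the familywise error rate; Tukey's HSD is usually somewhat more powerful when all pairwise comparisons are tested.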

### Q9. One-Way ANOVA to Compare Mean Weight Loss of Three Diets

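```python
import pandas as pd
import scipy.stats as stats

# Illustrative data: 3 participants per diet
# (the study in the question has 50 participants; substitute the real data here)
data = {'Diet': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'Weight Loss': [5, 4, 6, 3, 2, 4, 6, 5, 7]}

df = pd.DataFrame(data)

# Perform one-way ANOVA: one array of weight-loss values per diet
f_statistic, p_value = stats.f_oneway(df[df['Diet'] == 'A']['Weight Loss'],
                                      df[df['Diet'] == 'B']['Weight Loss'],
                                      df[df['Diet'] == 'C']['Weight Loss'])

print("F-statistic:", f_statistic)
print("P-value:", p_value)

# Interpretation at the 5% significance level
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: mean weight loss differs for at least one of the three diets.")
else:
    print("Fail to reject the null hypothesis: the data do not show a significant difference in mean weight loss among the three diets.")
```

With the illustrative data above, F = 7.0 on (2, 6) degrees of freedom and p ≈ 0.027, so the null hypothesis of equal mean weight loss is rejected at α = 0.05. A post-hoc test such as Tukey's HSD would then identify which diets differ.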
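A significant F says nothing about how large the differences are, so it is common to report an effect size as well. This is a minimal sketch computing eta-squared (η² = SSE / SST, the share of total variation explained by diet) for the same nine illustrative values.

```python
import numpy as np
from scipy import stats

# Same illustrative weight-loss values as above, grouped by diet
groups = {"A": [5, 4, 6], "B": [3, 2, 4], "C": [6, 5, 7]}
arrays = [np.asarray(g, dtype=float) for g in groups.values()]
all_obs = np.concatenate(arrays)
grand_mean = all_obs.mean()

# Explained (between-diet) and total sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in arrays)
ss_total = ((all_obs - grand_mean) ** 2).sum()

# Eta-squared: proportion of total variation explained by diet
eta_squared = ss_between / ss_total

f_stat, p_val = stats.f_oneway(*arrays)
print(f"F = {f_stat:.3f}, p = {p_val:.4f}, eta^2 = {eta_squared:.3f}")
```

Here diet accounts for about 70% of the variation in these made-up values; with real data from 50 participants the figure would of course differ.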

### Q10. Two-Way ANOVA for Software Programs and Employee Experience Level

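```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Illustrative data: 30 employees, 5 per Software x Experience cell (balanced design)
data = {'Software': (['A'] * 5 + ['B'] * 5 + ['C'] * 5) * 2,
        'Experience': ['Novice'] * 15 + ['Experienced'] * 15,
        'Time': [10, 12, 11, 13, 9, 11, 14, 15, 13, 12,
                 8, 10, 9, 11, 10, 13, 12, 11, 9, 10,
                 16, 18, 17, 19, 15, 16, 18, 17, 16, 15]}

df = pd.DataFrame(data)

# Fit Two-Way ANOVA model with both main effects and their interaction
# (string columns are treated as categorical factors automatically)
model = ols('Time ~ Software * Experience', data=df).fit()

# ANOVA table; in a balanced design, Type I, II and III sums of squares coincide
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)
```

The table reports `sum_sq`, `df`, `F`, and `PR(>F)` for the main effects (Software, Experience), for the interaction (Software:Experience), and for the residual. A p-value below 0.05 for a row indicates a significant effect; if the interaction is significant, interpret the main effects with care, because the effect of the software then depends on experience level.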
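Interpreting an interaction is easier with the cell means in front of you. This is a minimal sketch using a pandas pivot table on a balanced arrangement of the 30 example completion times (5 employees per cell, an assumption made for illustration).

```python
import pandas as pd

# Balanced arrangement of the 30 example times: 5 employees per Software x Experience cell
df = pd.DataFrame({
    "Software": (["A"] * 5 + ["B"] * 5 + ["C"] * 5) * 2,
    "Experience": ["Novice"] * 15 + ["Experienced"] * 15,
    "Time": [10, 12, 11, 13, 9, 11, 14, 15, 13, 12,
             8, 10, 9, 11, 10, 13, 12, 11, 9, 10,
             16, 18, 17, 19, 15, 16, 18, 17, 16, 15],
})

# Cell means: one row per program, one column per experience level
cell_means = df.pivot_table(index="Software", columns="Experience", values="Time", aggfunc="mean")

# Experience gap within each program: similar gaps suggest little interaction,
# clearly different gaps are what a Software x Experience interaction looks like
cell_means["Gap (Experienced - Novice)"] = cell_means["Experienced"] - cell_means["Novice"]
print(cell_means)
```

Whether gaps that differ this much are statistically meaningful is what the interaction row of the ANOVA table tests.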

### Q11. Two-Sample T-Test for Test Scores

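```python
from scipy.stats import ttest_ind

# Illustrative data: 10 scores per group (the study in the question has 100 students in total)
control_scores = [80, 85, 82, 78, 86, 84, 83, 79, 81, 87]
experimental_scores = [75, 79, 77, 73, 78, 76, 74, 80, 72, 81]

# Perform two-sample (Student's) t-test
t_statistic, p_value = ttest_ind(control_scores, experimental_scores)

print("t-statistic:", t_statistic)
print("P-value:", p_value)

# Interpretation at the 5% significance level
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: mean test scores differ significantly between the control and experimental groups.")
else:
    print("Fail to reject the null hypothesis: the data do not show a significant difference in mean test scores between the two groups.")
```

Because there are only two groups, no post-hoc test is needed: a significant t-test already says that the two groups differ, and the sign of the t-statistic gives the direction (a positive t here means the control group scored higher). Post-hoc tests become relevant only when an ANOVA compares three or more groups.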
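Two common additions to the basic t-test are Welch's version, which drops the equal-variance assumption, and an effect size. This is a minimal sketch using `ttest_ind(..., equal_var=False)` and Cohen's d with the pooled standard deviation, on the same illustrative scores.

```python
import numpy as np
from scipy import stats

# Same illustrative scores as above
control = np.array([80, 85, 82, 78, 86, 84, 83, 79, 81, 87], dtype=float)
experimental = np.array([75, 79, 77, 73, 78, 76, 74, 80, 72, 81], dtype=float)

# Welch's t-test: does not assume equal population variances
t_welch, p_welch = stats.ttest_ind(control, experimental, equal_var=False)

# Cohen's d: mean difference divided by the pooled standard deviation
n1, n2 = len(control), len(experimental)
pooled_sd = np.sqrt(((n1 - 1) * control.var(ddof=1) + (n2 - 1) * experimental.var(ddof=1)) / (n1 + n2 - 2))
cohens_d = (control.mean() - experimental.mean()) / pooled_sd

print(f"Welch t = {t_welch:.3f}, p = {p_welch:.5f}, Cohen's d = {cohens_d:.2f}")
```

With equal group sizes, as here, Welch's and Student's t-statistics coincide and only the degrees of freedom (and hence the p-value) differ slightly.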

### Q12. Repeated Measures ANOVA for Daily Sales of Retail Stores

For conducting a repeated measures ANOVA and post-hoc tests, you can use libraries like `statsmodels` or `pingouin`.

Here's an example using `pingouin` for repeated measures ANOVA and pairwise comparisons:

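```python
import numpy as np
import pandas as pd
import pingouin as pg

# Placeholder data: random daily sales for three stores over 30 days
# (replace with the recorded sales; the seed only makes the example reproducible)
rng = np.random.default_rng(0)
data = {'Day': np.repeat(np.arange(30), 3),
        'Store': np.tile(['A', 'B', 'C'], 30),
        'Sales': rng.integers(50, 100, size=90)}

df = pd.DataFrame(data)

# Repeated Measures ANOVA: each day acts as a "subject" measured under every store
rm_anova = pg.rm_anova(data=df, dv='Sales', within='Store', subject='Day')
print(rm_anova)

# Post-hoc pairwise comparisons with Holm correction
# (older pingouin releases name this function pairwise_ttests)
posthoc = pg.pairwise_tests(data=df, dv='Sales', within='Store', subject='Day', padjust='holm')
print(posthoc)
```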

This code performs a repeated measures ANOVA on the daily sales data of three retail stores and then conducts post-hoc pairwise comparisons using the Holm correction method.
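To see what the repeated measures ANOVA computes, the one-way case can be written out with NumPy. This is a minimal sketch that assumes sphericity (no Greenhouse-Geisser correction) and complete data; `rm_anova_oneway` is a hypothetical helper, not a pingouin function. Its F should agree with the uncorrected F that pingouin reports for the same data.

```python
import numpy as np
from scipy import stats

def rm_anova_oneway(wide):
    """One-way repeated measures ANOVA, sphericity assumed (no correction).

    wide: 2-D array with one row per subject (day) and one column per condition (store).
    """
    wide = np.asarray(wide, dtype=float)
    n, k = wide.shape
    grand = wide.mean()
    ss_total = ((wide - grand) ** 2).sum()
    # Between-conditions variation: store means around the grand mean
    ss_cond = n * ((wide.mean(axis=0) - grand) ** 2).sum()
    # Between-subjects variation: day means around the grand mean, removed from the error term
    ss_subj = k * ((wide.mean(axis=1) - grand) ** 2).sum()
    # What is left (condition x subject variation) is the error term for the F-test
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (n - 1) * (k - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    p_value = stats.f.sf(f_stat, df_cond, df_error)
    return {"SS_cond": ss_cond, "SS_subj": ss_subj, "SS_error": ss_error,
            "df": (df_cond, df_error), "F": f_stat, "p": p_value}

# Placeholder data in wide form: 30 days x 3 stores of random sales
rng = np.random.default_rng(0)
sales = rng.integers(50, 100, size=(30, 3))
result = rm_anova_oneway(sales)
print(result)
```

Removing the between-days variation from the error term is what distinguishes this from a one-way ANOVA on independent groups, and it is why a repeated measures design is usually more powerful.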
