# ADVANCED STATS 6

# QUESTIONS:

Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact 
the validity of the results.

Q2. What are the three types of ANOVA, and in what situations would each be used?

Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual 
sum of squares (SSR) in a one-way ANOVA using Python?

Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?

Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02. 
What can you conclude about the differences between the groups, and how would you interpret these 
results?

Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential 
consequences of using different methods to handle missing data?

Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide 
an example of a situation where a post-hoc test might be necessary.

Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from 
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python 
to determine if there are any significant differences between the mean weight loss of the three diets. 
Report the F-statistic and p-value, and interpret the results.

Q10. A company wants to know if there are any significant differences in the average time it takes to 
complete a task using three different software programs: Program A, Program B, and Program C. They 
randomly assign 30 employees to one of the programs and record the time it takes each employee to 
complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or 
interaction effects between the software programs and employee experience level (novice vs. 
experienced). Report the F-statistics and p-values, and interpret the results.

Q11. An educational researcher is interested in whether a new teaching method improves student test 
scores. They randomly assign 100 students to either the control group (traditional teaching method) or the 
experimental group (new teaching method) and administer a test at the end of the semester. Conduct a 
two-sample t-test using Python to determine if there are any significant differences in test scores 
between the two groups. If the results are significant, follow up with a post-hoc test to determine which 
group(s) differ significantly from each other.

Q12. A researcher wants to know if there are any significant differences in the average daily sales of three 
retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store 
on those days. Conduct a repeated measures ANOVA using Python to determine if there are any 
significant differences in sales between the three stores. If the results are significant, follow up with a post-hoc test to determine which store(s) differ significantly from each other.

# SOLUTIONS:


Q1. Assumptions of ANOVA and Violations:
Assumptions of ANOVA include:
1. **Homogeneity of Variance**: The variances of the groups being compared should be approximately equal.
2. **Independence**: Observations are assumed to be independent, both within each group and across groups.
3. **Normality**: The data within each group (equivalently, the model residuals) should be approximately normally distributed.

Violations that could impact validity:
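1. **Homogeneity of Variance Violation**: Unequal variances distort the F-test and post-hoc comparisons, especially when group sizes also differ. Example: one group's standard deviation is several times larger than the others'.
2. **Independence Violation**: Non-independent observations lead to pseudoreplication and an underestimated error term. Example: several measurements from the same subject are treated as separate, unrelated observations.
3. **Normality Violation**: Departures from normality can make p-values and confidence intervals unreliable, particularly in small samples. Example: strongly skewed data such as reaction times or incomes.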
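
The normality and equal-variance assumptions can be screened in Python before running the F-test. The following is a minimal sketch on synthetic data: the group names, sizes, and seed are illustrative assumptions, not part of the original exercise. Independence cannot be checked this way; it has to come from the study design.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed; synthetic data for illustration only
groups = {
    "a": rng.normal(loc=10.0, scale=2.0, size=30),
    "b": rng.normal(loc=11.0, scale=2.0, size=30),
    "c": rng.normal(loc=12.0, scale=2.0, size=30),
}

# Shapiro-Wilk test per group: a small p-value suggests a departure from normality.
normality_p = {name: stats.shapiro(values).pvalue for name, values in groups.items()}

# Levene's test across groups: a small p-value suggests unequal variances.
levene_stat, levene_p = stats.levene(*groups.values())

print("Shapiro-Wilk p-values:", normality_p)
print("Levene p-value:", levene_p)
```

These tests are only a screen. With large samples they flag trivial departures, so a look at residual plots is usually a sensible complement.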

Q2. Types of ANOVA and When to Use:
1. **One-Way ANOVA**: Used when comparing means of three or more groups for a single categorical independent variable.
2. **Two-Way ANOVA**: Used when there are two independent variables, allowing analysis of main effects and interaction effects.
3. **Repeated Measures ANOVA**: Used when the same subjects are measured at multiple time points or under multiple conditions.

Q3. Partitioning of Variance and Importance:
Partitioning of variance decomposes the total variation in the data into components attributed to different sources. It's important to understand because it helps identify the sources of variability and the proportion of variability explained by factors, aiding interpretation and hypothesis testing.
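
For the one-way case the decomposition can be written out explicitly. The notation here is an assumption made for illustration: $k$ groups, $n_i$ observations in group $i$, $N$ observations in total, group mean $\bar{y}_i$, and grand mean $\bar{y}$. The sums of squares use the names from the Q4 question (SST total, SSE explained, SSR residual).

```latex
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2}_{\text{SST (total)}}
= \underbrace{\sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2}_{\text{SSE (explained, between groups)}}
+ \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}_{\text{SSR (residual, within groups)}}

F = \frac{\text{SSE} / (k - 1)}{\text{SSR} / (N - k)}
```

The F-statistic is the ratio of the two mean squares, which is why the partition sits at the center of the hypothesis test.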

Q4. Calculation of SST, SSE, and SSR in One-Way ANOVA (Python):
```python
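import numpy as np

data = [group_data_1, group_data_2, group_data_3]  # Replace with your data (one sequence per group)
overall_mean = np.mean(np.concatenate(data))

# SST: total variation of every observation around the overall mean
sst = sum((x - overall_mean) ** 2 for group in data for x in group)
# SSE (explained, between groups): variation of the group means around the overall mean
sse = sum(len(group) * (np.mean(group) - overall_mean) ** 2 for group in data)
# SSR (residual, within groups): variation of observations around their own group mean
ssr = sum((x - np.mean(group)) ** 2 for group in data for x in group)
# Sanity check: sst should equal sse + ssr up to floating-point error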
```

Q5. Calculation of Main and Interaction Effects in Two-Way ANOVA (Python):
```python
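import statsmodels.api as sm
from statsmodels.formula.api import ols

# C(...) marks each factor as categorical; '*' expands to both main effects plus their interaction
model = ols('response ~ C(factor1) * C(factor2)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)  # Type II sums of squares

# Each row of the table holds sum_sq, df, F, and PR(>F) for one effect
main_effects = anova_table.loc[['C(factor1)', 'C(factor2)'], ['F', 'PR(>F)']]
interaction_effect = anova_table.loc['C(factor1):C(factor2)', ['F', 'PR(>F)']]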
```

Q6. Interpretation of F-statistic and p-value in One-Way ANOVA:
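The F-statistic is the ratio of the variance between groups to the variance within groups, so F = 5.23 means the between-group mean square is about 5.23 times the within-group mean square. Because p = 0.02 is below the conventional 0.05 threshold, you reject the null hypothesis that all group means are equal and conclude that at least one group mean differs significantly from the others. The result would not be significant at the stricter 0.01 level. The ANOVA alone does not show which groups differ; a post-hoc test is needed for that.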
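
The link between an F-statistic and its p-value can be sketched with `scipy.stats.f`. The question does not state the degrees of freedom, so the values below (3 groups, 15 observations in total) are a hypothetical choice for illustration, and the resulting p-value will not match the reported 0.02 exactly.

```python
from scipy import stats

f_statistic = 5.23
df_between, df_within = 2, 12  # hypothetical: 3 groups, 15 observations in total

# The p-value is the upper-tail area of the F distribution beyond the observed statistic.
p_value = stats.f.sf(f_statistic, df_between, df_within)

# Critical value at alpha = 0.05: reject the null when the F-statistic exceeds it.
critical_value = stats.f.ppf(0.95, df_between, df_within)

print(f"p-value: {p_value:.4f}, critical F at alpha=0.05: {critical_value:.2f}")
```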

Q7. Handling Missing Data in Repeated Measures ANOVA:
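Common options are listwise deletion (dropping every subject with a missing measurement), imputation (mean imputation, last observation carried forward, or multiple imputation), and mixed-effects models, which use all observed measurements. The consequences differ by method: listwise deletion reduces power and introduces bias unless the data are missing completely at random; single imputation understates variability, which can inflate Type I error rates; multiple imputation and mixed-effects models remain valid under the weaker missing-at-random assumption but depend on a correctly specified model.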
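
The contrast between listwise deletion and a mixed-effects model can be sketched as follows. The data are synthetic and the column names (`subject`, `condition`, `score`) are assumptions for illustration. The model is a simple random-intercept fit via statsmodels' `mixedlm`, shown as one possible approach rather than a recommended analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)  # synthetic data for illustration only
n_subjects = 20
conditions = ["t1", "t2", "t3"]

rows = []
for subject in range(n_subjects):
    subject_effect = rng.normal(0.0, 1.0)  # shared offset for this subject
    for shift, condition in enumerate(conditions):
        score = 10.0 + shift + subject_effect + rng.normal(0.0, 1.0)
        rows.append({"subject": subject, "condition": condition, "score": score})
long_df = pd.DataFrame(rows)

# Knock out a few observations to mimic missing measurements.
missing_idx = rng.choice(long_df.index, size=6, replace=False)
long_df.loc[missing_idx, "score"] = np.nan

# Listwise deletion: keep only subjects observed under every condition.
observed = long_df.dropna()
complete = observed.groupby("subject").filter(lambda g: len(g) == len(conditions))

# Mixed model: uses every observed row, with a random intercept per subject.
mixed = smf.mixedlm("score ~ C(condition)", data=observed, groups=observed["subject"]).fit()

print("Subjects kept by listwise deletion:", complete["subject"].nunique(), "of", n_subjects)
print("Rows used by the mixed model:", len(observed))
print(mixed.params)
```

Listwise deletion discards whole subjects, while the mixed model keeps their remaining measurements, which is where the difference in power comes from.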

Q8. Common Post-Hoc Tests and Usage:
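- **Tukey's HSD**: Used to compare all pairwise group means while controlling the familywise error rate; it assumes roughly equal variances across groups.
- **Bonferroni Correction**: Controls the familywise error rate by dividing the significance level by the number of comparisons; useful for a small set of planned comparisons, but conservative when there are many.
- **Dunn's Test**: Nonparametric alternative for pairwise comparisons, typically used after a Kruskal-Wallis test when the normality assumption does not hold.

A post-hoc test is necessary when the ANOVA is significant and there are more than two groups. For example, if a one-way ANOVA comparing three diets gives p < 0.05, it shows only that at least one mean differs; Tukey's HSD would then identify which pairs of diets differ.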
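
A pairwise comparison of this kind can be sketched with `pairwise_tukeyhsd` from statsmodels. The three groups below are synthetic, with group C deliberately shifted upward; the labels and sample sizes are assumptions for illustration.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(2)  # synthetic data for illustration only
values = np.concatenate([
    rng.normal(5.0, 1.0, 25),  # group A
    rng.normal(5.2, 1.0, 25),  # group B, close to A
    rng.normal(7.0, 1.0, 25),  # group C, clearly higher
])
labels = np.repeat(["A", "B", "C"], 25)

# One row per pair of groups, with the adjusted p-value and a reject flag.
tukey = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print(tukey.summary())
```

The `reject` attribute of the result holds one boolean per pair, in the order A-B, A-C, B-C for these labels.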

Q9. One-Way ANOVA in Python:
```python
import scipy.stats as stats

data = [data_group_a, data_group_b, data_group_c]  # Replace with your data
f_statistic, p_value = stats.f_oneway(*data)

print("F-statistic:", f_statistic)
print("P-value:", p_value)

if p_value < 0.05:
    print("There are significant differences between the mean weight loss of the three diets.")
else:
    print("There are no significant differences between the mean weight loss of the three diets.")
```

Q10. Two-Way ANOVA in Python:
```python
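import statsmodels.api as sm
from statsmodels.formula.api import ols

# df needs one row per employee with columns: time, program, experience
model = ols('time ~ C(program) * C(experience)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)  # Type II sums of squares

print("Main effects (F-statistic and p-value):")
print(anova_table.loc[['C(program)', 'C(experience)'], ['F', 'PR(>F)']])
print("Interaction effect (F-statistic and p-value):")
print(anova_table.loc[['C(program):C(experience)'], ['F', 'PR(>F)']])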
```

Q11. Two-Sample t-test in Python:
```python
import scipy.stats as stats

group_control = scores_control_group  # Replace with your data (a 1-D sequence of scores)
group_experimental = scores_experimental_group  # Replace with your data

t_statistic, p_value = stats.ttest_ind(group_control, group_experimental)

print("T-statistic:", t_statistic)
print("P-value:", p_value)

if p_value < 0.05:
    # With only two groups, a significant t-test already identifies which groups differ,
    # so no separate post-hoc test is needed; the sign of t gives the direction.
    print("There is a significant difference in test scores between the two groups.")
else:
    print("There is no significant difference in test scores between the two groups.")
```

Q12. Repeated Measures ANOVA in Python:
```python
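import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Replace with your data: one sequence of 30 daily sales figures per store, in the same day order
sales = {'A': sales_store_a, 'B': sales_store_b, 'C': sales_store_c}

# Long format: one row per (Day, Store) pair with columns ['Day', 'Store', 'Sales']
df = pd.DataFrame(
    [{'Day': day, 'Store': store, 'Sales': value}
     for store, values in sales.items()
     for day, value in enumerate(values)]
)

# Days are the repeated "subjects"; Store is the within-subject factor
anovarm = AnovaRM(data=df, depvar='Sales', subject='Day', within=['Store'])
results = anovarm.fit()

print(results)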
```
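
For the post-hoc step that the question asks for, one common option is pairwise paired t-tests with a Bonferroni adjustment. The sketch below uses synthetic sales figures: the store means, the shared day effect, and the seed are assumptions for illustration, not data from the exercise.

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)  # synthetic data for illustration only
n_days = 30
day_effect = rng.normal(0.0, 50.0, n_days)  # day-to-day variation shared by all stores
sales = {
    "A": 1000.0 + day_effect + rng.normal(0.0, 20.0, n_days),
    "B": 1010.0 + day_effect + rng.normal(0.0, 20.0, n_days),
    "C": 1080.0 + day_effect + rng.normal(0.0, 20.0, n_days),
}

pairs = list(combinations(sales, 2))
adjusted_p = {}
for first, second in pairs:
    # Paired test: both stores are observed on the same days.
    p_value = stats.ttest_rel(sales[first], sales[second]).pvalue
    # Bonferroni: multiply by the number of comparisons, capped at 1.
    adjusted_p[(first, second)] = min(p_value * len(pairs), 1.0)

for pair, p in adjusted_p.items():
    print(pair, f"adjusted p = {p:.4f}")
```

Pairs with an adjusted p-value below 0.05 are the stores whose sales differ significantly.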

Remember to adapt the code snippets to your specific data and needs.

# ---------------------------------------------------END-----------------------------