### Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact the validity of the results.

ANOVA (Analysis of Variance) requires the following assumptions:

1. **Independence**: Observations within each group must be independent of each other. Violations might occur if the data were collected in a way that introduces dependence, such as repeated measures without accounting for this in the model.

2. **Normality**: The residuals (equivalently, the data within each group) should be approximately normally distributed. Violations occur when the data are heavily skewed or contain extreme outliers, which can distort the F-test, particularly with small samples.

3. **Homogeneity of Variances**: The variances among the groups should be approximately equal. Violations occur if one group has a much higher or lower variance than the others, which can inflate the Type I error rate, especially when group sizes are unequal.

**Examples of Violations**:
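- **Independence**: Analyzing before/after measurements from the same participants as if they came from separate groups.
- **Normality**: Comparing small samples of strongly right-skewed data, such as incomes or reaction times.
- **Homogeneity of Variances**: Comparing groups where one group's standard deviation is several times larger than another's.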
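
The normality and equal-variance assumptions can be checked before running the ANOVA. The following is a minimal sketch, assuming simulated data for three hypothetical groups (`A`, `B`, `C`); it uses the Shapiro-Wilk test for normality within each group and Levene's test for homogeneity of variances. Independence cannot be tested this way and has to be judged from the study design.

```python
import numpy as np
from scipy import stats

# Hypothetical data: three groups of 30 simulated observations each
rng = np.random.default_rng(0)
groups = {
    "A": rng.normal(50, 5, 30),
    "B": rng.normal(52, 5, 30),
    "C": rng.normal(55, 5, 30),
}

# Shapiro-Wilk test per group: a small p-value suggests non-normality
normality_p = {}
for name, values in groups.items():
    _, p = stats.shapiro(values)
    normality_p[name] = p
    print(f"Group {name}: Shapiro-Wilk p = {p:.3f}")

# Levene's test across groups: a small p-value suggests unequal variances
levene_stat, levene_p = stats.levene(*groups.values())
print(f"Levene's test: W = {levene_stat:.3f}, p = {levene_p:.3f}")
```

A p-value above the chosen significance level means the check found no evidence against the assumption, not that the assumption is proven.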

### Q2. What are the three types of ANOVA, and in what situations would each be used?

1. **One-Way ANOVA**: Used when comparing the means of three or more independent groups based on one factor. Example: Comparing test scores from three different teaching methods.

2. **Two-Way ANOVA**: Used when comparing the means across two factors, allowing for the analysis of interaction effects between the factors. Example: Evaluating the effects of diet and exercise on weight loss.

3. **Repeated Measures ANOVA**: Used when comparing means where the same subjects are measured multiple times or under different conditions. Example: Measuring patients' blood pressure before treatment and at several follow-up times.

### Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

In ANOVA, variance is partitioned into:
- **Total Sum of Squares (SST)**: The total variability in the data.
- **Explained Sum of Squares (SSE)**: The variability explained by the model (between-group variability).
- **Residual Sum of Squares (SSR)**: The variability within groups (error or unexplained variability).

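Understanding this concept is crucial because it shows how much of the total variability is accounted for by the model and how much remains unexplained, which is key to interpreting the ANOVA results. Note that naming conventions differ: many texts call the between-group term SSB (or SSR, for "regression") and the within-group term SSW (or SSE, for "error"), so check the definitions before comparing formulas.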
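
For a one-way design, the partition can be written out explicitly. The notation here is an assumption introduced for illustration: $k$ groups, $n_j$ observations in group $j$, $N$ observations in total, $\bar{y}_j$ the mean of group $j$, and $\bar{y}$ the overall mean. The abbreviations follow this document's usage (SSE = explained, SSR = residual).

$$
\begin{aligned}
SST &= \sum_{j=1}^{k}\sum_{i=1}^{n_j} (y_{ij} - \bar{y})^2 \\
SSE &= \sum_{j=1}^{k} n_j\,(\bar{y}_j - \bar{y})^2 \\
SSR &= \sum_{j=1}^{k}\sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2 \\
SST &= SSE + SSR, \qquad F = \frac{SSE/(k-1)}{SSR/(N-k)}
\end{aligned}
$$

The F-statistic is the ratio of the two components after each is divided by its degrees of freedom, which is why the partition drives the test.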

### Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual sum of squares (SSR) in a one-way ANOVA using Python?

```python
import numpy as np
import pandas as pd

# Sample data: three groups of 10 observations (seeded so the output is reproducible)
rng = np.random.default_rng(42)
df = pd.DataFrame({
    'Group': ['A']*10 + ['B']*10 + ['C']*10,
    'Value': rng.normal(0, 1, 30)
})

# Group means, group sizes, and the overall (grand) mean
group_means = df.groupby('Group')['Value'].mean()
group_sizes = df.groupby('Group').size()
overall_mean = df['Value'].mean()

# SST (Total Sum of Squares): squared deviations of every value from the grand mean
SST = ((df['Value'] - overall_mean) ** 2).sum()

# SSE (Explained Sum of Squares): squared deviation of each group mean
# from the grand mean, weighted by group size, then summed over groups
SSE = (group_sizes * (group_means - overall_mean) ** 2).sum()

# SSR (Residual Sum of Squares): what remains after removing the explained part
SSR = SST - SSE

print(f'Total Sum of Squares (SST): {SST}')
print(f'Explained Sum of Squares (SSE): {SSE}')
print(f'Residual Sum of Squares (SSR): {SSR}')
```

### Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?

To calculate main effects and interaction effects in a two-way ANOVA, you can use the `statsmodels` library in Python.

Here’s how you can do it:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Create a sample dataset
data = pd.DataFrame({
    'Factor_A': ['Low', 'Low', 'High', 'High', 'Low', 'Low', 'High', 'High'],
    'Factor_B': ['Type_1', 'Type_2', 'Type_1', 'Type_2', 'Type_1', 'Type_2', 'Type_1', 'Type_2'],
    'Outcome': [25, 30, 35, 40, 20, 27, 37, 44]
})

# Fit the two-way ANOVA model
model = ols('Outcome ~ C(Factor_A) + C(Factor_B) + C(Factor_A):C(Factor_B)', data=data).fit()

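# Perform the ANOVA (Type II sums of squares).
# Rows C(Factor_A) and C(Factor_B) are the main effects;
# row C(Factor_A):C(Factor_B) is the interaction effect.
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)
```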

### Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02. What can you conclude about the differences between the groups, and how would you interpret these results?

The p-value of 0.02 is below the conventional significance level of 0.05, so you reject the null hypothesis that all group means are equal. The F-statistic of 5.23 means the between-group mean square is about 5.2 times the within-group mean square. Together these indicate that at least one group mean differs from the others. However, ANOVA does not tell you which groups differ, so a post-hoc test such as Tukey's HSD is needed to identify the specific group differences.
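
The link between an F-statistic and its p-value can be sketched with `scipy.stats.f`. The question does not state the degrees of freedom, so the values below (2 and 14, which would correspond to 3 groups and 17 observations) are purely an assumption, chosen because they yield a p-value of roughly 0.02 for F = 5.23.

```python
from scipy import stats

f_stat = 5.23
# Hypothetical degrees of freedom (not given in the question):
# 3 groups -> df_between = 2; 17 observations -> df_within = 14
df_between, df_within = 2, 14

# p-value: probability of an F at least this large if all group means are equal
p_value = stats.f.sf(f_stat, df_between, df_within)

# Critical value: the F that would have to be exceeded at alpha = 0.05
f_crit = stats.f.ppf(0.95, df_between, df_within)

print(f"p-value: {p_value:.3f}")
print(f"Critical F at alpha = 0.05: {f_crit:.2f}")
print(f"Reject H0: {f_stat > f_crit}")
```

Comparing the p-value with 0.05 and comparing F with the critical value are two views of the same decision.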

### Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential consequences of using different methods to handle missing data?

In a repeated measures ANOVA, missing data can be handled in several ways:

- **Listwise Deletion**: Removing all data for any participant with missing values. This reduces the sample size and statistical power, and can bias the results if the data are not missing completely at random (MCAR).
- **Mean Imputation**: Replacing missing values with the mean of the available data. This artificially reduces variability and leads to underestimated standard errors.
- **Multiple Imputation**: Creating several imputed datasets, analyzing each one, and pooling the results. This is more robust because it carries the uncertainty about the missing values through to the final estimates.

Each method has different implications, and improper handling of missing data can lead to biased or incorrect conclusions.
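
The first two approaches can be illustrated with pandas. This is a minimal sketch on made-up data: the column names `t1`, `t2`, `t3` (three measurement occasions) and the values are hypothetical. It shows that listwise deletion shrinks the sample while mean imputation shrinks the variance.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: 5 subjects measured at 3 time points
wide = pd.DataFrame({
    "subject": [1, 2, 3, 4, 5],
    "t1": [10.0, 12.0, 11.0, 14.0, 13.0],
    "t2": [11.0, np.nan, 12.0, 15.0, 14.0],
    "t3": [12.0, 14.0, np.nan, 16.0, 15.0],
}).set_index("subject")

# Listwise deletion: drop every subject with at least one missing value
listwise = wide.dropna()
print(f"Subjects kept after listwise deletion: {len(listwise)} of {len(wide)}")

# Mean imputation: fill each gap with that time point's observed mean
mean_imputed = wide.fillna(wide.mean())

# Variance of t2 before (observed values only) and after mean imputation
print(f"t2 variance, observed values: {wide['t2'].var():.2f}")
print(f"t2 variance, mean-imputed:    {mean_imputed['t2'].var():.2f}")
```

The drop in variance after mean imputation is the mechanism behind the underestimated standard errors mentioned above.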

### Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide an example of a situation where a post-hoc test might be necessary.

Common post-hoc tests include:

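- **Tukey's HSD**: Used when you want to compare all pairwise group differences while controlling the family-wise error rate.
- **Bonferroni Correction**: Adjusts the significance threshold (or the p-values) for the number of comparisons to control the family-wise Type I error rate. It is best suited to a small number of planned comparisons.
- **Scheffé Test**: A more conservative test suited to complex contrasts, such as comparing the average of two groups against a third.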

Example: After conducting a one-way ANOVA comparing the mean scores of students across three different teaching methods, if the ANOVA shows significant differences, you might perform Tukey's HSD to determine which pairs of teaching methods differ significantly.
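
The teaching-methods example could be run with `pairwise_tukeyhsd` from `statsmodels`. This is a sketch only: the scores and the method labels `A`, `B`, `C` are made-up values for illustration, not data from a real study.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical test scores for three teaching methods (5 students each)
scores = np.array([70, 72, 68, 71, 69,   # method A
                   80, 82, 78, 81, 79,   # method B
                   90, 92, 88, 91, 89])  # method C
methods = np.repeat(["A", "B", "C"], 5)

# Tukey's HSD compares every pair of methods at a family-wise alpha of 0.05
tukey = pairwise_tukeyhsd(endog=scores, groups=methods, alpha=0.05)
print(tukey.summary())

# One boolean per pair: True where the pair's means differ significantly
print(tukey.reject)
```

Each row of the summary is one pair of methods, with the mean difference, an adjusted p-value, and a confidence interval for the difference.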

### Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from 50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python to determine if there are any significant differences between the mean weight loss of the three diets. Report the F-statistic and p-value, and interpret the results.

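```python
import pandas as pd
import scipy.stats as stats

# Small illustrative sample (3 participants per diet), standing in for the
# full 50-participant dataset described in the question
data = {'Diet': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'Weight_Loss': [5, 6, 7, 4, 5, 6, 3, 4, 5]}

df = pd.DataFrame(data)

# One-way ANOVA: pass one series of weight-loss values per diet
groups = [values for _, values in df.groupby('Diet')['Weight_Loss']]
f_stat, p_value = stats.f_oneway(*groups)

print(f'F-statistic: {f_stat:.2f}, p-value: {p_value:.3f}')
```

For this illustrative sample, F = 3.00 and p = 0.125. Because the p-value is above 0.05, we fail to reject the null hypothesis: the sample provides no evidence that mean weight loss differs between the three diets. The same code applies to the full dataset, where the conclusion depends on the resulting p-value.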

### Q10. A company wants to know if there are any significant differences in the average time it takes to complete a task using three different software programs: Program A, Program B, and Program C. They randomly assign 30 employees to one of the programs and record the time it takes each employee to complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or interaction effects between the software programs and employee experience level (novice vs. experienced). Report the F-statistics and p-values, and interpret the results.

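```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Illustrative data: 30 employees, 10 per program,
# with 5 novice and 5 experienced employees within each program
rng = np.random.default_rng(0)
data = pd.DataFrame({
    'Program': np.repeat(['A', 'B', 'C'], 10),
    'Experience': np.tile(np.repeat(['Novice', 'Experienced'], 5), 3),
})

# Simulated completion times centered on assumed program means
program_means = {'A': 28, 'B': 32, 'C': 42}
data['Time'] = data['Program'].map(program_means) + rng.normal(0, 3, len(data))

# Two-way ANOVA with both main effects and their interaction
model = ols('Time ~ C(Program) + C(Experience) + C(Program):C(Experience)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)
```

In the resulting table, the `C(Program)` and `C(Experience)` rows give the F-statistic and p-value for each main effect, and the `C(Program):C(Experience)` row gives them for the interaction. A p-value below 0.05 in a row indicates that the corresponding effect is statistically significant. A significant interaction means the effect of the program depends on experience level, so the main effects should then be interpreted with caution.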


### Q11. An educational researcher is interested in whether a new teaching method improves student test scores. They randomly assign 100 students to either the control group (traditional teaching method) or the experimental group (new teaching method) and administer a test at the end of the semester. Conduct a two-sample t-test using Python to determine if there are any significant differences in test scores between the two groups. If the results are significant, follow up with a post-hoc test to determine which group(s) differ significantly from each other.

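```python
import numpy as np
from scipy import stats

# Simulated data: 50 students per group (seeded so the output is reproducible)
rng = np.random.default_rng(0)
control_scores = rng.normal(75, 10, 50)  # Control group (traditional)
experimental_scores = rng.normal(80, 10, 50)  # Experimental group (new method)

# Two-sample t-test (assumes equal variances by default)
t_stat, p_value = stats.ttest_ind(control_scores, experimental_scores)

print(f"T-statistic: {t_stat}, P-value: {p_value}")
```

If the p-value is below 0.05, the mean test scores of the two groups differ significantly. With only two groups, a significant t-test already identifies the pair that differs, so no post-hoc test is needed; comparing the two group means shows which method scored higher.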


### Q12. A researcher wants to know if there are any significant differences in the average daily sales of three retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store on those days. Conduct a repeated measures ANOVA using Python to determine if there are any significant differences in sales between the three stores. If the results are significant, follow up with a post-hoc test to determine which store(s) differ significantly from each other.

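```python
import numpy as np
import pandas as pd
from itertools import combinations
from scipy import stats
from statsmodels.stats.anova import AnovaRM

# Simulated data in long format: each of the 30 days is the "subject",
# measured once per store (the within-subject factor)
rng = np.random.default_rng(0)
data = pd.DataFrame({
    'Store': np.repeat(['A', 'B', 'C'], 30),
    'Day': np.tile(np.arange(1, 31), 3),
    'Sales': np.concatenate([rng.normal(100, 10, 30),
                             rng.normal(120, 10, 30),
                             rng.normal(110, 10, 30)])
})

# Repeated measures ANOVA (requires exactly one observation per Day and Store)
aovrm = AnovaRM(data, 'Sales', 'Day', within=['Store']).fit()
print(aovrm.summary())

# Post-hoc: paired t-tests for each pair of stores,
# with p-values Bonferroni-adjusted for the 3 comparisons
wide = data.pivot(index='Day', columns='Store', values='Sales')
for a, b in combinations(wide.columns, 2):
    t_stat, p = stats.ttest_rel(wide[a], wide[b])
    print(f'{a} vs {b}: t = {t_stat:.2f}, adjusted p = {min(p * 3, 1.0):.4f}')
```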
