Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact
the validity of the results.

Analysis of Variance (ANOVA) is a statistical technique used to compare the means of three or more groups to determine whether there are statistically significant differences among them. However, ANOVA has certain assumptions that need to be met for the results to be valid. Violations of these assumptions can impact the validity of the ANOVA results. The key assumptions of ANOVA are:

1. Independence of Observations:
   - Assumption: Observations within and between groups must be independent of each other. This means that the values in one group should not be related to the values in another group.
   - Violation Example: In a repeated measures design where the same subjects are measured multiple times, the assumption of independence is violated because observations within the same subject are correlated.

2. Homogeneity of Variance (Homoscedasticity):
   - Assumption: The variances of the different groups being compared are equal (homoscedastic). In other words, the spread of the data within each group is roughly the same.
   - Violation Example: In an ANOVA, if one group has much larger variability (spread) compared to the other groups, this is a violation of the homogeneity of variance assumption. This can lead to inflated Type I error rates and affect the validity of ANOVA results.

3. Normally Distributed Residuals:
   - Assumption: The residuals (the differences between the observed values and the group means) are normally distributed within each group. This assumption is more important when sample sizes are small.
   - Violation Example: If the residuals within a group do not follow a normal distribution, it can affect the validity of the F-test and p-values in ANOVA.

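4. Balanced Group Sizes (a design consideration rather than a formal assumption):
   - Consideration: ANOVA does not strictly require equal group sizes, but a balanced design makes the F-test more robust to unequal variances and gives the most power for a fixed total sample size.
   - Violation Example: When group sizes are very unequal and the variances also differ, the Type I error rate can be inflated (smaller groups have the larger variances) or deflated (larger groups have the larger variances). Welch's ANOVA is a common alternative in that situation.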

Examples of violations of ANOVA assumptions and their impacts on validity:

1. Heteroscedasticity: If the assumption of equal variances is violated (i.e., groups have substantially different variances), the nominal error rates of the F-test no longer hold, which can lead to false positives or false negatives, especially when group sizes are unequal.

2. Non-Normal Residuals: If the assumption of normality is violated and the residuals are not normally distributed, the F-statistic may not follow an F-distribution, leading to incorrect p-values. Transformation of data or using non-parametric tests may be considered in such cases.

3. Outliers: Outliers in the data can distort the ANOVA results. They can increase the variability within groups, impact group means, and lead to false conclusions.

4. Unequal Group Sizes: Unequal group sizes are not a violation in themselves, but they reduce power relative to a balanced design with the same total sample size and make the F-test more sensitive to unequal variances.

In cases where ANOVA assumptions are violated, alternative statistical tests or data transformations may be considered to address the issues. For example, non-parametric tests like the Kruskal-Wallis test can be used when the assumption of normality is not met, and tests that do not assume equal variances, such as Welch's ANOVA, can be used when homoscedasticity is violated. Additionally, data transformations (for example, a log transformation of right-skewed data) can sometimes help stabilize variances and make the data more normally distributed.
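
The checks described above can be sketched in code. This is a minimal illustration, not a complete diagnostic workflow: the three groups are simulated (hypothetical data, with the third group deliberately given a larger spread), and the choice of Shapiro-Wilk and Levene's test is one common option among several.

```python
import numpy as np
from scipy import stats

# Hypothetical data: three groups, the third with a much larger spread
rng = np.random.default_rng(42)
group_a = rng.normal(loc=50, scale=5, size=30)
group_b = rng.normal(loc=52, scale=5, size=30)
group_c = rng.normal(loc=55, scale=15, size=30)
groups = [group_a, group_b, group_c]

# Normality: Shapiro-Wilk test on the residuals (each value minus its group mean)
residuals = np.concatenate([g - g.mean() for g in groups])
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Homogeneity of variance: Levene's test (median-centered by default in SciPy)
levene_stat, levene_p = stats.levene(*groups)

# Non-parametric alternative to one-way ANOVA when normality is doubtful
kruskal_stat, kruskal_p = stats.kruskal(*groups)

print(f"Shapiro-Wilk p-value (residuals): {shapiro_p:.4f}")
print(f"Levene p-value (equal variances): {levene_p:.4f}")
print(f"Kruskal-Wallis p-value: {kruskal_p:.4f}")
```

A small p-value from Shapiro-Wilk or Levene's test is evidence against normality or equal variances, respectively. With large samples these tests flag even trivial departures, so residual plots are usually worth inspecting alongside them.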

Q2. What are the three types of ANOVA, and in what situations would each be used?

Analysis of Variance (ANOVA) is a statistical technique used to compare the means of three or more groups to determine whether there are statistically significant differences among them. There are three main types of ANOVA, each used in different situations:

1. One-Way ANOVA:
   - Situation: One-Way ANOVA is used when you have one categorical independent variable with three or more levels (groups or categories), and you want to determine if there are significant differences in the means of a single dependent variable among these groups.
   - Example: An experiment testing the effect of different fertilizer types (A, B, C, and D) on plant growth. The independent variable is the type of fertilizer, and the dependent variable is plant height.

2. Two-Way ANOVA:
   - Situation: Two-Way ANOVA is used when you have two categorical independent variables, and you want to investigate their combined effects on a single dependent variable. It helps you determine whether there are main effects for each independent variable and whether there is an interaction effect between the two variables.
   - Example: An experiment examining the effect of both temperature (low, high) and humidity (low, high) on the growth of plants. The independent variables are temperature and humidity, and the dependent variable is plant growth.

3. Three-Way (or Higher) ANOVA:
   - Situation: Three-Way ANOVA, and higher-order ANOVAs, are used when you have three or more categorical independent variables and want to investigate their combined effects on a single dependent variable. They are less common and used in complex experimental designs with multiple factors.
   - Example: A study investigating the effects of factors like type of exercise (running, swimming, weightlifting), gender (male, female), and age group (young, middle-aged, senior) on cardiovascular fitness as the dependent variable.

In summary:

- One-Way ANOVA is used when you have one categorical independent variable with multiple levels.
- Two-Way ANOVA is used when you have two categorical independent variables and want to assess their main effects and interaction effect.
- Three-Way ANOVA and higher-order ANOVAs are used for experimental designs with three or more categorical independent variables.

It's important to choose the appropriate type of ANOVA based on your research question and the design of your study. The choice of ANOVA type depends on the number of factors you are investigating and their interaction. Additionally, it's essential to check and meet the assumptions of ANOVA to ensure the validity of the results.

Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

The partitioning of variance in Analysis of Variance (ANOVA) is a fundamental concept that involves breaking down the total variability in a dataset into different components to understand the sources of variation. This concept is crucial in ANOVA for several reasons:

1. **Understanding the Sources of Variation:**
   - ANOVA helps to identify and quantify the sources of variation in a dataset. It decomposes the total variance into components attributed to different factors, such as the main effects and interactions of independent variables.
   - By understanding these sources of variation, researchers can determine which factors or conditions contribute significantly to differences in the dependent variable. This information is essential for drawing meaningful conclusions from the analysis.

2. **Assessing the Significance of Factors:**
   - ANOVA provides a way to assess the statistical significance of the effects of different factors or independent variables. It helps answer questions like, "Is there a significant difference between groups?" or "Do certain factors influence the dependent variable?"
   - Partitioning the variance allows researchers to calculate F-statistics and p-values to determine whether the observed differences are statistically significant.

3. **Comparing Variability:**
   - ANOVA compares the variability between groups to the variability within groups. The partitioning of variance helps to assess the ratio of these two types of variability.
   - The F-statistic, which is calculated by dividing the between-group mean square (explained variance) by the within-group mean square (unexplained variance), is a key metric in ANOVA. Each mean square is a sum of squares divided by its degrees of freedom. A high F-statistic suggests that the differences between groups are larger than what would be expected due to random variation.

4. **Interpreting Interaction Effects:**
   - In the case of two-way or higher-order ANOVA, partitioning variance is essential for understanding interaction effects. Interaction effects occur when the combined effect of two or more independent variables is not simply the sum of their individual effects.
   - By breaking down the variance and examining the interactions, researchers can identify how different factors work together to influence the dependent variable.

5. **Modeling and Prediction:**
   - ANOVA helps to build models that describe the relationships between independent and dependent variables. These models can be used for predictions, hypothesis testing, and further exploration of the data.
   - Understanding the partitioning of variance is essential for model development and validation.

In summary, the partitioning of variance in ANOVA is crucial for dissecting the total variation in data, identifying the contributions of different factors, and assessing their statistical significance. It provides a structured framework for hypothesis testing, model building, and drawing valid conclusions about the effects of independent variables on the dependent variable.
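
For a one-way design, the partition can be written out explicitly. The notation here is an assumption introduced for illustration: $k$ groups, $n_j$ observations in group $j$, $N$ observations in total, $\bar{y}_j$ the mean of group $j$, and $\bar{y}$ the overall mean.

$$
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(y_{ij}-\bar{y}\right)^2}_{SS_{\text{total}}}
=
\underbrace{\sum_{j=1}^{k} n_j\left(\bar{y}_j-\bar{y}\right)^2}_{SS_{\text{between}}}
+
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(y_{ij}-\bar{y}_j\right)^2}_{SS_{\text{within}}}
$$

$$
F = \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(N-k)}
$$

The degrees of freedom partition in the same way: $N-1 = (k-1) + (N-k)$.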

Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual
sum of squares (SSR) in a one-way ANOVA using Python?

In a one-way Analysis of Variance (ANOVA), you can calculate the Total Sum of Squares (SST), Explained Sum of Squares (SSE), and Residual Sum of Squares (SSR) using Python by following these steps:

1. Calculate the Total Sum of Squares (SST):
   - SST measures the total variability in the dependent variable. It is the sum of squared differences between each data point and the overall mean.

```python
import numpy as np

# Example data for one-way ANOVA
data = [12, 14, 16, 15, 11, 13, 17, 19, 20, 18]

# Calculate the overall mean
overall_mean = np.mean(data)

# Calculate the Total Sum of Squares (SST)
SST = sum((x - overall_mean) ** 2 for x in data)
```

2. Calculate the Explained Sum of Squares (SSE):
   - SSE measures the variability explained by the group means in a one-way ANOVA. It is the sum of squared differences between each group mean and the overall mean, weighted by the group sample sizes.

```python
from collections import Counter

# Example data for one-way ANOVA with groups
data = [12, 14, 16, 15, 11, 13, 17, 19, 20, 18]
groups = ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'D', 'D']

# Calculate the group means and sample sizes
group_means = {group: np.mean([data[i] for i in range(len(data)) if groups[i] == group]) for group in set(groups)}
group_sizes = dict(Counter(groups))

# Calculate the Explained Sum of Squares (SSE)
SSE = sum(group_sizes[group] * (group_means[group] - overall_mean) ** 2 for group in set(groups))
```

3. Calculate the Residual Sum of Squares (SSR):
   - SSR measures the unexplained variability within each group. It is the sum of squared differences between each data point and its group mean.

```python
# Calculate the Residual Sum of Squares (SSR)
SSR = sum((data[i] - group_means[groups[i]]) ** 2 for i in range(len(data)))
```

Now you have the SST, SSE, and SSR values. These values can be used to calculate the F-statistic for the one-way ANOVA and perform hypothesis testing to assess whether there are significant differences between the groups.
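
As a hedged follow-up, the sketch below recomputes the three sums of squares for the same example data in vectorized form, checks that SST = SSE + SSR, and derives the F-statistic and p-value. The comparison against `scipy.stats.f_oneway` is only a sanity check added for illustration.

```python
import numpy as np
from scipy import stats

# Same example data and group labels as above
data = np.array([12, 14, 16, 15, 11, 13, 17, 19, 20, 18])
groups = np.array(['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'D', 'D'])
labels = np.unique(groups)

overall_mean = data.mean()
SST = ((data - overall_mean) ** 2).sum()
SSE = sum(len(data[groups == g]) * (data[groups == g].mean() - overall_mean) ** 2
          for g in labels)
SSR = sum(((data[groups == g] - data[groups == g].mean()) ** 2).sum() for g in labels)

# Degrees of freedom: k - 1 between groups, n - k within groups
k, n = len(labels), len(data)
F = (SSE / (k - 1)) / (SSR / (n - k))
p_value = stats.f.sf(F, k - 1, n - k)

# Sanity check against SciPy's built-in one-way ANOVA
f_check, p_check = stats.f_oneway(*(data[groups == g] for g in labels))

print(f"SST = {SST:.2f}, SSE = {SSE:.2f}, SSR = {SSR:.2f}")
print(f"F = {F:.4f}, p = {p_value:.4f}")
```

For this data set SST is 82.5, SSE is 62.5, and SSR is 20.0, which gives F = (62.5 / 3) / (20 / 6) = 6.25.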

Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?

In a two-way ANOVA, you can calculate the main effects and interaction effects using Python by following these steps. A two-way ANOVA involves two independent variables, and it assesses the main effects of each variable and their interaction effect.

Here's an example using Python:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data for a two-way ANOVA: scores under three treatments, plus the gender for each row
data = {
    'Treatment_A': [10, 12, 14, 15, 16, 18],
    'Treatment_B': [8, 9, 10, 12, 13, 14],
    'Treatment_C': [11, 12, 13, 14, 15, 16],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female']
}

# Reshape to long format: one row per observation, with Treatment and Gender as the two factors
df = pd.DataFrame(data).melt(id_vars='Gender', var_name='Treatment', value_name='Value')

# Perform a two-way ANOVA with both main effects and their interaction
formula = 'Value ~ C(Treatment) + C(Gender) + C(Treatment):C(Gender)'
model = ols(formula, data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

# Extract the p-values for the main effects
main_effect_treatment = anova_table.loc['C(Treatment)', 'PR(>F)']
main_effect_gender = anova_table.loc['C(Gender)', 'PR(>F)']

# Extract the p-value for the interaction effect
interaction = anova_table.loc['C(Treatment):C(Gender)', 'PR(>F)']

# Print the results
print("\nMain Effects:")
print(f"Main Effect of Treatment: p-value = {main_effect_treatment:.4f}")
print(f"Main Effect of Gender: p-value = {main_effect_gender:.4f}")

print("\nInteraction Effect:")
print(f"Treatment x Gender: p-value = {interaction:.4f}")
```

In this code:

- The example data start in wide format (one column per treatment), so we reshape them with `melt` into long format. The model needs one response column (`Value`) and one column per factor (`Treatment` and `Gender`). The 18 values are treated as independent observations.

- We use the `statsmodels` library to perform the two-way ANOVA. The model formula includes the main effect of Treatment, the main effect of Gender, and their interaction.

- The `ols` function fits the model, and the `anova_lm` function returns the ANOVA table with sums of squares, degrees of freedom, F-statistics, and p-values.

- We read the main effects from the `C(Treatment)` and `C(Gender)` rows of the table and the interaction effect from the `C(Treatment):C(Gender)` row.

This code reports the two main effects and the interaction effect for the two-way ANOVA. The F-statistics and p-values in the table indicate whether each factor, and the combination of the two, has a statistically significant effect on the response.

Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?

In a one-way ANOVA, the F-statistic and its associated p-value are used to determine whether there are significant differences between the groups. Here's how you can interpret the results based on the F-statistic and p-value:

1. **F-Statistic (5.23)**: The F-statistic measures the ratio of the variation between the group means to the variation within the groups. It quantifies the extent to which the group means differ from the overall mean. A larger F-statistic indicates larger differences between group means.

2. **P-Value (0.02)**: The p-value associated with the F-statistic tells you the probability of observing an F-statistic at least this large if the null hypothesis were true. In a one-way ANOVA, the null hypothesis states that all group means are equal.

Interpretation:

- A small p-value (0.02 in this case) indicates that the observed F-statistic is statistically significant.

- You would conclude that there are significant differences between the groups. In other words, at least one group mean is different from the others.

- Since the p-value is less than the chosen significance level (commonly 0.05), you would reject the null hypothesis.

- In practical terms, it means that the independent variable (the factor or treatment) has a significant effect on the dependent variable, leading to variations in group means that are unlikely to have occurred due to random chance alone.

- If the ANOVA compares three or more groups, you would follow a significant result with post hoc pairwise comparisons (e.g., Tukey's HSD or t-tests with a Bonferroni correction) to determine which specific groups are significantly different from each other.

- It's important to note that the ANOVA itself doesn't tell you which group(s) is/are different from the others, only that there are significant differences somewhere among the groups. Post hoc tests help identify those specific differences.

In summary, with an F-statistic of 5.23 and a p-value of 0.02, you can conclude that there are significant differences between the groups, and you would reject the null hypothesis. Further post hoc testing is typically necessary to identify which groups are different from each other.
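
The link between an F-statistic and its p-value can be sketched with SciPy's F distribution. The question does not state the degrees of freedom, so the values below (3 groups and 30 observations) are assumptions for illustration only; as a result, the computed p-value is close to, but not exactly, the 0.02 quoted in the question.

```python
from scipy import stats

f_statistic = 5.23
df_between = 2    # assumption: 3 groups, so k - 1 = 2
df_within = 27    # assumption: 30 observations in total, so N - k = 27
alpha = 0.05

# p-value: probability of an F at least this large under the null hypothesis
p_value = stats.f.sf(f_statistic, df_between, df_within)

# Critical value: the F-statistic must exceed this to reject at the chosen alpha
f_critical = stats.f.ppf(1 - alpha, df_between, df_within)

# Eta squared: proportion of total variance explained, recovered from F and the dfs
eta_squared = (f_statistic * df_between) / (f_statistic * df_between + df_within)

print(f"p-value: {p_value:.4f}")
print(f"Critical F at alpha={alpha}: {f_critical:.2f}")
print(f"Eta squared: {eta_squared:.3f}")
print("Reject H0" if f_statistic > f_critical else "Fail to reject H0")
```

Reporting an effect size such as eta squared alongside the p-value helps distinguish a statistically significant result from a practically important one.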

Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential
consequences of using different methods to handle missing data?

Handling missing data in a repeated measures ANOVA is an important consideration, as missing data can impact the validity of the results and the interpretation of the analysis. The choice of how to handle missing data should be based on the nature of the data and the reasons for missing values. There are several methods for handling missing data in a repeated measures ANOVA, each with its potential consequences:

1. **Complete Case Analysis (Listwise Deletion):**
   - In this method, any case (subject) with missing data on any variable is removed from the analysis.
   - Pros:
     - It is straightforward and easy to implement.
     - No imputation is needed.
   - Cons:
     - Reduced sample size and potentially reduced statistical power.
     - May introduce bias if the missing data are not missing completely at random (MCAR).

2. **Imputation Methods:**
   - Imputation methods replace missing values with estimated values based on observed data. Common imputation methods include mean imputation, median imputation, and regression imputation.
   - Pros:
     - Retains cases with missing data, preserving the sample size.
     - Addresses issues related to missing data.
   - Cons:
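     - Single imputation treats imputed values as if they were observed, so it understates uncertainty unless a method such as multiple imputation is used.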
     - The choice of imputation method may impact results.
     - The validity of imputation depends on the assumption that the data are missing at random (MAR).

3. **Mixed Effects Models (Longitudinal Data Analysis):**
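   - Mixed effects models handle missing data by modeling the within-subject correlation structure and using every available observation from each subject. A subject with some missing time points still contributes to the estimates, and no values need to be imputed.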
   - Pros:
     - Retains cases with missing data.
     - Accounts for the correlation between repeated measures within the same subjects.
     - Robust against missing data under the missing at random (MAR) assumption.
   - Cons:
     - Can be computationally intensive.
     - Requires an understanding of mixed effects modeling.

Potential Consequences of Using Different Methods:

- **Complete Case Analysis:** This method may lead to a loss of statistical power and potentially biased results if the missing data are not MCAR. It's generally not recommended when missing data are not completely random.

- **Imputation Methods:** Imputation can bias results if the imputation model is misspecified, and the choice of method matters. For instance, mean imputation shrinks the variance and weakens correlations between repeated measures, which can distort the F-test. Imputation also assumes MAR, which may not always hold.

- **Mixed Effects Models:** These models are generally more robust and can handle missing data under the MAR assumption. However, they can be more complex to implement and require a good understanding of mixed effects modeling.

The choice of how to handle missing data in a repeated measures ANOVA should be made with careful consideration of the data and research objectives. It is essential to document and justify the chosen method in your analysis to ensure transparency and reproducibility.
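
The three approaches can be compared on a small simulated data set. This is a sketch under stated assumptions: the data (20 hypothetical subjects measured at three time points), the positions of the missing values, and the column names are all made up for illustration, and the mixed model uses only a random intercept per subject.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical repeated measures data in wide format: one row per subject
rng = np.random.default_rng(1)
n_subjects = 20
subject_effect = rng.normal(0, 5, size=n_subjects)
wide = pd.DataFrame({
    f't{t}': 50 + 2 * t + subject_effect + rng.normal(0, 2, size=n_subjects)
    for t in (1, 2, 3)
})

# Remove a few values to mimic missing measurements
wide.loc[[0, 3, 7], 't2'] = np.nan
wide.loc[[5, 11], 't3'] = np.nan

# 1. Listwise deletion: drop every subject with any missing value
complete_cases = wide.dropna()

# 2. Mean imputation: fill each gap with that time point's mean
imputed = wide.fillna(wide.mean())

# 3. Mixed model: reshape to long format and drop only the missing rows
long = (wide.rename_axis('subject').reset_index()
            .melt(id_vars='subject', var_name='time', value_name='score')
            .dropna())
mixed = smf.mixedlm('score ~ C(time)', data=long, groups=long['subject']).fit()

print(f"Subjects kept by listwise deletion: {len(complete_cases)} of {n_subjects}")
print(f"Variance of t2 before/after mean imputation: "
      f"{wide['t2'].var():.2f} / {imputed['t2'].var():.2f}")
print(f"Observations used by the mixed model: {len(long)} of {wide.size}")
print(mixed.params)
```

Listwise deletion discards five subjects here, mean imputation shrinks the variance of the imputed column, and the mixed model keeps 55 of the 60 observations.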

Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide
an example of a situation where a post-hoc test might be necessary.

Post-hoc tests are used after conducting an analysis of variance (ANOVA) to make pairwise comparisons between groups when the ANOVA reveals a significant overall difference among groups. They help identify which specific groups differ from each other. Common post-hoc tests used after ANOVA include:

1. **Tukey's Honestly Significant Difference (Tukey's HSD):**
   - Use when you want to compare every pair of means among three or more groups while controlling the familywise error rate. Tukey's HSD is designed for all pairwise comparisons and is usually more powerful than a Bonferroni correction in that setting.
   - Example: In a study comparing the effect of four different treatments on blood pressure, you conduct an ANOVA and find a significant difference. You use Tukey's HSD to determine which treatment groups have significantly different means.

2. **Bonferroni Correction:**
   - Use when you have three or more groups and you want to control the familywise error rate, but you are willing to accept a more conservative correction.
   - Example: In a survey, you analyze the average ratings of five different smartphone brands. After ANOVA, you find a significant difference. You use the Bonferroni correction to adjust for multiple comparisons when testing pairwise differences.

3. **Scheffé's Method:**
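   - Use when you have three or more groups and you want to test arbitrary contrasts, not only pairwise differences, including contrasts chosen after looking at the data. For pairwise comparisons alone it is more conservative, and therefore less powerful, than Tukey's HSD.
   - Example: In a psychology experiment, you assess the impact of three different teaching methods on students' test scores. After ANOVA, you find a significant difference. You use Scheffé's method to compare pairs of methods as well as combined contrasts, such as one method against the average of the other two.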

4. **Dunnett's Test:**
   - Use when you have one control group and multiple treatment groups. Dunnett's test compares each treatment group to the control group while controlling for the overall Type I error rate.
   - Example: In a drug trial, you have a control group and four different dosages of a new medication. You use Dunnett's test to determine if any of the medication dosages have a different effect compared to the control group.

5. **Holm-Bonferroni Method:**
   - Use when you have three or more groups and you want to control the familywise error rate while being less conservative than Bonferroni.
   - Example: In a marketing study, you examine the effectiveness of different advertising strategies in increasing sales. After ANOVA, you find a significant difference. You use the Holm-Bonferroni method to adjust for multiple comparisons when comparing specific advertising strategies.

6. **Fisher's Least Significant Difference (LSD):**
   - Use when you have three or more groups and you don't need stringent control over the familywise error rate. It's relatively less conservative.
   - Example: In an agricultural study, you test the yield of four different fertilizer treatments. After ANOVA, you find a significant difference. You use Fisher's LSD to identify which pairs of treatments have significantly different yields.

The choice of post-hoc test depends on your specific research question, the number of groups, and the desired level of control over the Type I error rate. It's important to choose a post-hoc test that aligns with your research objectives and statistical considerations to make valid pairwise comparisons between groups.
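
A hedged sketch of two of these procedures for the blood pressure example: Tukey's HSD via `statsmodels`, and Bonferroni and Holm adjustments applied to unadjusted pairwise t-test p-values. The four treatment groups are simulated (the names, means, and sample sizes are hypothetical).

```python
from itertools import combinations

import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.multitest import multipletests

# Hypothetical blood pressure readings for four treatments, 25 patients each
rng = np.random.default_rng(7)
treatments = {
    'placebo': rng.normal(140, 8, 25),
    'drug_1': rng.normal(135, 8, 25),
    'drug_2': rng.normal(128, 8, 25),
    'drug_3': rng.normal(127, 8, 25),
}
values = np.concatenate(list(treatments.values()))
labels = np.repeat(list(treatments.keys()), 25)

# Tukey's HSD: all pairwise comparisons with familywise error control
tukey = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print(tukey.summary())

# Bonferroni and Holm: adjust the unadjusted pairwise t-test p-values
pairs = list(combinations(treatments, 2))
raw_p = [stats.ttest_ind(treatments[a], treatments[b]).pvalue for a, b in pairs]
_, p_bonf, _, _ = multipletests(raw_p, alpha=0.05, method='bonferroni')
_, p_holm, _, _ = multipletests(raw_p, alpha=0.05, method='holm')

for (a, b), pb, ph in zip(pairs, p_bonf, p_holm):
    print(f"{a} vs {b}: Bonferroni p = {pb:.4f}, Holm p = {ph:.4f}")
```

The Holm-adjusted p-values are never larger than the Bonferroni-adjusted ones, which is why Holm is described as less conservative while still controlling the familywise error rate.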

Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
to determine if there are any significant differences between the mean weight loss of the three diets.
Report the F-statistic and p-value, and interpret the results.

```python
import numpy as np
import scipy.stats as stats

# Example weight loss data for three diets (A, B, and C)
# Note: the example uses 20 values per diet (60 in total), not the 50 participants in the prompt
diet_A = [2.1, 1.8, 2.0, 1.9, 2.2, 1.7, 2.3, 2.0, 1.8, 2.1, 2.0, 1.9, 2.1, 2.0, 1.8, 2.2, 1.7, 2.3, 2.0, 1.9]
diet_B = [2.5, 2.3, 2.4, 2.6, 2.7, 2.5, 2.3, 2.4, 2.6, 2.7, 2.5, 2.3, 2.4, 2.6, 2.7, 2.5, 2.3, 2.4, 2.6, 2.7]
diet_C = [2.0, 2.1, 2.3, 2.0, 2.2, 2.0, 2.1, 2.3, 2.0, 2.2, 2.0, 2.1, 2.3, 2.0, 2.2, 2.0, 2.1, 2.3, 2.0, 2.2]

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

# Interpret the results
alpha = 0.05  # Set the significance level
if p_value < alpha:
    print("Result: The one-way ANOVA is statistically significant.")
    print(f"F-statistic: {f_statistic:.2f}")
    print(f"P-value: {p_value:.4f}")
    print("There is at least one significant difference between the mean weight loss of the three diets.")
else:
    print("Result: The one-way ANOVA is not statistically significant.")
    print(f"F-statistic: {f_statistic:.2f}")
    print(f"P-value: {p_value:.4f}")
```

Output:

```
Result: The one-way ANOVA is statistically significant.
F-statistic: 62.07
P-value: 0.0000
There is at least one significant difference between the mean weight loss of the three diets.
```
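
The ANOVA result says that at least one diet differs but not which one. As a hedged follow-up on the same example data, the sketch below reports the group means, an eta squared effect size, and Tukey's HSD pairwise comparisons. It assumes SciPy 1.8 or later, where `scipy.stats.tukey_hsd` is available.

```python
import numpy as np
import scipy.stats as stats

# Same example data as above (diets B and C repeat a block of five values four times)
diet_A = [2.1, 1.8, 2.0, 1.9, 2.2, 1.7, 2.3, 2.0, 1.8, 2.1,
          2.0, 1.9, 2.1, 2.0, 1.8, 2.2, 1.7, 2.3, 2.0, 1.9]
diet_B = [2.5, 2.3, 2.4, 2.6, 2.7] * 4
diet_C = [2.0, 2.1, 2.3, 2.0, 2.2] * 4
samples = {'A': np.array(diet_A), 'B': np.array(diet_B), 'C': np.array(diet_C)}

# Group means show the direction of the differences
means = {name: values.mean() for name, values in samples.items()}

# Eta squared: share of the total variation explained by diet
all_values = np.concatenate(list(samples.values()))
ss_total = ((all_values - all_values.mean()) ** 2).sum()
ss_between = sum(len(v) * (v.mean() - all_values.mean()) ** 2 for v in samples.values())
eta_squared = ss_between / ss_total

# Tukey's HSD: pairwise comparisons (indices 0, 1, 2 correspond to A, B, C)
tukey = stats.tukey_hsd(*samples.values())

print({name: round(m, 2) for name, m in means.items()})
print(f"Eta squared: {eta_squared:.3f}")
print(tukey)
```

In this example, diet B has the highest mean weight loss (2.50 versus 1.99 for A and 2.12 for C), and its difference from each of the other two diets is large relative to the within-group spread.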


Q10. A company wants to know if there are any significant differences in the average time it takes to
complete a task using three different software programs: Program A, Program B, and Program C. They
randomly assign 30 employees to one of the programs and record the time it takes each employee to
complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or
interaction effects between the software programs and employee experience level (novice vs.
experienced). Report the F-statistics and p-values, and interpret the results.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Example data for two-way ANOVA: 30 employees, 5 per Software x Experience cell
# Times are simulated from one normal distribution, so no true effects are built in
data = {
    'Software': ['A', 'B', 'C'] * 10,
    'Experience': ['Novice', 'Experienced'] * 15,
    'Time': np.random.normal(loc=30, scale=5, size=30)
}

# Create a DataFrame
df = pd.DataFrame(data)

# Perform a two-way ANOVA
formula = 'Time ~ C(Software) + C(Experience) + C(Software):C(Experience)'
model = ols(formula, df).fit()
anova_table = anova_lm(model, typ=2)

# Interpret the results
alpha = 0.05  # Set the significance level

# Main effects and interaction effects
main_effect_software_pvalue = anova_table['PR(>F)']['C(Software)']
main_effect_experience_pvalue = anova_table['PR(>F)']['C(Experience)']
interaction_effect_pvalue = anova_table['PR(>F)']['C(Software):C(Experience)']

print(f"Main Effect of Software: p-value = {main_effect_software_pvalue:.4f}")
print(f"Main Effect of Experience: p-value = {main_effect_experience_pvalue:.4f}")
print(f"Interaction Effect: p-value = {interaction_effect_pvalue:.4f}")

# Interpret the results based on p-values
if main_effect_software_pvalue < alpha:
    print("There is a significant main effect of Software.")
else:
    print("There is no significant main effect of Software.")

if main_effect_experience_pvalue < alpha:
    print("There is a significant main effect of Experience.")
else:
    print("There is no significant main effect of Experience.")

if interaction_effect_pvalue < alpha:
    print("There is a significant interaction effect between Software and Experience.")
else:
    print("There is no significant interaction effect between Software and Experience.")
```

Because the example data are drawn at random without a fixed seed, the numbers change from run to run. One run produced:

```
Main Effect of Software: p-value = 0.9960
Main Effect of Experience: p-value = 0.5849
Interaction Effect: p-value = 0.2251
There is no significant main effect of Software.
There is no significant main effect of Experience.
There is no significant interaction effect between Software and Experience.
```

In this run, none of the p-values falls below 0.05, so there is no evidence that completion time depends on the software program, on experience level, or on their combination. That is the expected outcome here, because every simulated time comes from the same distribution.
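
The prompt also asks for F-statistics, which the printed p-values leave out. A minimal sketch, assuming the same simulated design with a fixed seed (the seed value 0 is an arbitrary choice, so the numbers will not match the run shown above): the full table returned by `anova_lm` already contains the F-statistic for each effect.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Same simulated design as above, with a fixed seed for reproducibility
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Software': ['A', 'B', 'C'] * 10,
    'Experience': ['Novice', 'Experienced'] * 15,
    'Time': rng.normal(loc=30, scale=5, size=30),
})

# Confirm the design is balanced: 5 employees in each of the 6 cells
cell_counts = df.groupby(['Software', 'Experience']).size()

model = ols('Time ~ C(Software) * C(Experience)', data=df).fit()
anova_table = anova_lm(model, typ=2)

# Report the degrees of freedom, F-statistic, and p-value for each effect
print(cell_counts)
print(anova_table[['df', 'F', 'PR(>F)']])
```

The `Residual` row has no F-statistic or p-value; its sum of squares and degrees of freedom form the denominator for the other rows.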


Q11. An educational researcher is interested in whether a new teaching method improves student test
scores. They randomly assign 100 students to either the control group (traditional teaching method) or the
experimental group (new teaching method) and administer a test at the end of the semester. Conduct a
two-sample t-test using Python to determine if there are any significant differences in test scores
between the two groups. If the results are significant, follow up with a post-hoc test to determine which
group(s) differ significantly from each other.

To test whether test scores differ between the control group (traditional teaching method) and the experimental group (new teaching method), you can run a two-sample t-test with `scipy.stats.ttest_ind`. With only two groups, a significant t-test already identifies which groups differ, so a separate post-hoc test such as Tukey's HSD is not needed. Post-hoc tests apply when three or more groups are compared. The natural follow-up here is to report the direction and size of the difference.

Here's how you can perform the t-test and follow up on a significant result:

```python
import numpy as np
import scipy.stats as stats

# Example test scores for the control and experimental groups (simulated, 50 students each)
np.random.seed(0)
control_group = np.random.normal(loc=75, scale=10, size=50)
experimental_group = np.random.normal(loc=80, scale=10, size=50)

# Perform a two-sample t-test
t_statistic, p_value = stats.ttest_ind(control_group, experimental_group)

# Interpret the results of the t-test
alpha = 0.05  # Set the significance level

print("Two-Sample T-Test Results:")
print(f"T-Statistic: {t_statistic:.4f}")
print(f"P-Value: {p_value:.4f}")

if p_value < alpha:
    print("The two-sample t-test is statistically significant.")
    print("There is a significant difference in test scores between the control and experimental groups.")

    # With two groups there is only one pairwise comparison,
    # so the follow-up is to report the direction of the difference
    mean_difference = np.mean(experimental_group) - np.mean(control_group)
    higher_group = "experimental" if mean_difference > 0 else "control"
    print(f"Mean difference (experimental - control): {mean_difference:.2f}")
    print(f"The {higher_group} group scored higher on average.")
else:
    print("The two-sample t-test is not statistically significant.")
    print("There is no significant difference in test scores between the groups.")
```

In this code:

- We generate example test scores for the control and experimental groups, with a fixed seed so the results are reproducible.

- We perform a two-sample t-test using the `scipy.stats.ttest_ind` function to compare the means of the two groups.

- We set a significance level (alpha) of 0.05 and interpret the results based on the t-statistic and p-value.

- If the t-test is significant (p-value < alpha), we conclude that the two group means differ and report which group scored higher.

- No post-hoc test is run, because two groups allow only one pairwise comparison. If the study had three or more groups, you would run an ANOVA first and then a post-hoc test such as Tukey's HSD.

Please adjust the code with your actual data for a meaningful analysis.
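
Two common additions to the t-test above are Welch's variant, which does not assume equal variances, and an effect size. The sketch below is illustrative only: it simulates its own data with the same parameters as the example (the seed is an arbitrary choice), and Cohen's d with a pooled standard deviation is one of several possible effect size measures.

```python
import numpy as np
import scipy.stats as stats

# Simulated scores with the same parameters as the example above
rng = np.random.default_rng(0)
control_group = rng.normal(loc=75, scale=10, size=50)
experimental_group = rng.normal(loc=80, scale=10, size=50)

# Welch's t-test: set equal_var=False when the group variances may differ
welch_t, welch_p = stats.ttest_ind(experimental_group, control_group, equal_var=False)

# Cohen's d: mean difference divided by the pooled standard deviation
n1, n2 = len(experimental_group), len(control_group)
pooled_sd = np.sqrt(((n1 - 1) * experimental_group.var(ddof=1)
                     + (n2 - 1) * control_group.var(ddof=1)) / (n1 + n2 - 2))
cohens_d = (experimental_group.mean() - control_group.mean()) / pooled_sd

print(f"Welch t = {welch_t:.4f}, p = {welch_p:.4f}")
print(f"Cohen's d = {cohens_d:.2f}")
```

By a common rule of thumb, values of d around 0.2, 0.5, and 0.8 are read as small, medium, and large effects, although what counts as meaningful depends on the field.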

Q12. A researcher wants to know if there are any significant differences in the average daily sales of three
retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store
on those days. Conduct a repeated measures ANOVA using Python to determine if there are any
significant differences in sales between the three stores. If the results are significant, follow up with a
post-hoc test to determine which store(s) differ significantly from each other.

A repeated measures ANOVA is used when the same subjects or units are measured under every condition. In this scenario the 30 days play the role of subjects: each day contributes one sales figure per store, so the three measurements from the same day are related (for example, through weekday or holiday effects). A repeated measures ANOVA with day as the subject and store as the within-subject factor accounts for this, whereas a one-way ANOVA would wrongly treat the 90 observations as independent.

Here's how you can perform a repeated measures ANOVA in Python with `AnovaRM` from `statsmodels`. If the result is significant, paired t-tests with a Bonferroni correction identify which stores differ significantly from each other:

```python
from itertools import combinations

import numpy as np
import pandas as pd
import scipy.stats as stats
from statsmodels.stats.anova import AnovaRM

# Example daily sales data for three stores (A, B, C) on the same 30 days
np.random.seed(0)
n_days = 30
n_stores = 3

data = {
    'Day': list(range(1, n_days + 1)) * n_stores,
    'Store': ['A'] * n_days + ['B'] * n_days + ['C'] * n_days,
    'Sales': np.random.randint(1000, 2000, size=n_days * n_stores)
}

# Create a DataFrame in long format: one row per (day, store) pair
df = pd.DataFrame(data)

# Perform a repeated measures ANOVA with Day as the subject and Store as the within factor
result = AnovaRM(df, depvar='Sales', subject='Day', within=['Store']).fit()
anova_table = result.anova_table

# Interpret the results
alpha = 0.05  # Set the significance level
f_statistic = anova_table.loc['Store', 'F Value']
p_value = anova_table.loc['Store', 'Pr > F']

print("Repeated Measures ANOVA Results:")
print(f"F-Statistic: {f_statistic:.4f}")
print(f"P-Value: {p_value:.4f}")

if p_value < alpha:
    print("The repeated measures ANOVA is statistically significant.")
    print("There is a significant difference in daily sales between the stores.")

    # Post-hoc: paired t-tests on the same days, with a Bonferroni correction
    wide = df.pivot(index='Day', columns='Store', values='Sales')
    pairs = list(combinations(wide.columns, 2))
    print("\nPost-Hoc Paired t-Tests (Bonferroni-adjusted):")
    for store_1, store_2 in pairs:
        t_stat, p_raw = stats.ttest_rel(wide[store_1], wide[store_2])
        p_adjusted = min(p_raw * len(pairs), 1.0)
        print(f"Store {store_1} vs Store {store_2}: t = {t_stat:.4f}, adjusted p = {p_adjusted:.4f}")
else:
    print("The repeated measures ANOVA is not statistically significant.")
    print("There is no significant difference in daily sales between the stores.")
```

In this code:

- We generate example daily sales data for three stores (A, B, C) over the same 30 days. The values are random, so no true store differences are built in.

- We perform a repeated measures ANOVA using `AnovaRM` from the `statsmodels` library, with `Day` as the subject and `Store` as the within-subject factor.

- We set a significance level (alpha) of 0.05 and interpret the results based on the F-statistic and p-value.

- If the ANOVA is significant, we follow up with paired t-tests (`scipy.stats.ttest_rel`) for each pair of stores and multiply each p-value by the number of comparisons (Bonferroni correction). Paired tests are used instead of Tukey's HSD because Tukey's HSD assumes independent groups.

Please adjust the code with your actual data for a meaningful analysis.