In [None]:
Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact
the validity of the results.



Answer:
    Analysis of Variance (ANOVA) is a statistical technique used to compare means among multiple groups, typically three or more. To use ANOVA effectively and ensure the validity of the results, certain assumptions need to be met. These assumptions include:

1. **Independence**: Observations are assumed to be independent of each other, both within and across groups. The value of one observation should not influence, or be influenced by, the value of any other observation.

2. **Normality**: The data within each group (equivalently, the model residuals) should be approximately normally distributed. This assumption matters most when sample sizes are small; with larger samples, ANOVA becomes robust to moderate departures from normality because the group means are approximately normal by the Central Limit Theorem.

3. **Homogeneity of Variance (Homoscedasticity)**: The variances of the groups should be roughly equal. In other words, the spread of the data within each group should be similar. Homoscedasticity ensures that the variation within groups is consistent across all groups.

Now, let's discuss violations of these assumptions and how they can impact the validity of ANOVA results:

1. **Independence Violation**: If observations are not independent, it can lead to pseudoreplication, where correlated measurements are treated as if they were independent replicates. For example, if you measure the same subjects under several conditions and analyze the data as separate groups without accounting for the dependency, the standard errors are misestimated and the Type I error rate can be inflated. A repeated measures design is the appropriate choice in that case.

2. **Normality Violation**: When the normality assumption is violated, the F-statistic used in ANOVA might not follow an F-distribution as assumed. This can affect the Type I error rate (false positive rate) and lead to incorrect conclusions. However, ANOVA is somewhat robust to mild departures from normality, especially with larger sample sizes.

3. **Homoscedasticity Violation**: If the group variances are unequal, the pooled within-group variance misrepresents the individual groups, so the F-statistic may be unreliable and the Type I error rate can be inflated or deflated. The problem is most severe when group sizes are also unequal; with balanced groups, ANOVA tolerates moderate differences in variance.

It's worth noting that the impact of these violations can vary based on factors such as sample size, the degree of violation, and the specific type of ANOVA being used (e.g., one-way ANOVA, two-way ANOVA).

When these assumptions are seriously violated, alternative approaches might be more appropriate. For example, the non-parametric Kruskal-Wallis test can be used when normality is doubtful, and Welch's ANOVA can be used when variances are unequal. Neither fixes a lack of independence, which has to be handled in the design or the model.

In practice, it's important to assess these assumptions before interpreting ANOVA results. This can involve techniques like visual inspection of data distributions, normality tests (e.g., Shapiro-Wilk test), and tests for homogeneity of variances (e.g., Levene's test). If assumptions are violated, careful consideration and potentially alternative methods should be employed to ensure the reliability of your statistical analysis.
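
The checks described above can be sketched in a few lines. This is a minimal illustration on simulated data, not a recommended decision procedure: the group parameters, the seed, and the simple "fall back to Kruskal-Wallis if any check fails" rule are all assumptions made for the example.

```python
import numpy as np
from scipy import stats

# Simulated data (hypothetical): three groups of 20 drawn from normal
# distributions with equal spread; the seed is fixed for reproducibility.
rng = np.random.default_rng(0)
groups = [rng.normal(loc=mu, scale=2.0, size=20) for mu in (10.0, 11.0, 12.0)]

# Normality: Shapiro-Wilk on each group (a small p suggests non-normality).
shapiro_p = [stats.shapiro(g).pvalue for g in groups]

# Homogeneity of variance: Levene's test across all groups.
levene_p = stats.levene(*groups).pvalue

# Choose the test: fall back to Kruskal-Wallis if either check fails.
alpha = 0.05
assumptions_ok = min(shapiro_p) > alpha and levene_p > alpha
if assumptions_ok:
    stat, p_value = stats.f_oneway(*groups)
    test_used = "one-way ANOVA"
else:
    stat, p_value = stats.kruskal(*groups)
    test_used = "Kruskal-Wallis"

print("Shapiro-Wilk p-values:", np.round(shapiro_p, 3))
print("Levene p-value:", round(levene_p, 3))
print(f"{test_used}: statistic={stat:.3f}, p={p_value:.4f}")
```

Formal tests like these are best read alongside plots (histograms, Q-Q plots, boxplots), since with large samples they flag trivial deviations and with small samples they have little power.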

In [None]:
Q2. What are the three types of ANOVA, and in what situations would each be used?

Answer: 

There are three main types of Analysis of Variance (ANOVA) techniques, each designed to address different research questions and experimental designs:

1. **One-Way ANOVA**:
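   - **Situation**: When you have one independent variable (factor) with two or more levels (groups), typically three or more, and you want to compare means across those groups. With exactly two groups it is equivalent to an independent two-sample t-test with equal variances assumed.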
   - **Example**: An experiment that investigates the effect of different types of exercise (aerobic, strength training, flexibility) on cardiovascular fitness by measuring participants' heart rates after a certain period.

2. **Two-Way ANOVA**:
   - **Situation**: When you have two independent variables (factors) and you want to examine their individual and interaction effects on a dependent variable.
   - **Example**: A study that examines how both gender and treatment type (e.g., medication, therapy) influence the reduction in symptoms of anxiety in patients.

3. **Three-Way ANOVA**:
   - **Situation**: When you have three independent variables (factors) and you want to study their individual and interaction effects on a dependent variable.
   - **Example**: An agricultural study investigating the effects of three factors—fertilizer type, water availability, and sunlight exposure—on crop yield.

In summary:

- **One-Way ANOVA** is used when you have one independent variable and want to compare means across multiple groups.
- **Two-Way ANOVA** is used when you have two independent variables and want to study their individual and combined effects on a dependent variable.
- **Three-Way ANOVA** is used when you have three independent variables and want to examine their individual and combined effects on a dependent variable.

Each type of ANOVA assesses the variation between and within groups or combinations of factors. The analysis allows you to determine whether the observed differences in means are statistically significant or if they could be due to random chance. If significant differences are found, further post hoc tests might be performed to identify which specific groups or combinations of factors differ from each other.

It's important to ensure that the assumptions of ANOVA are met before interpreting the results. If the assumptions are not met, alternatives like non-parametric tests or transformations of data might be considered.


In [None]:
Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?





Answer:
    The partitioning of variance in ANOVA refers to the breakdown of the total variance observed in a dataset into different components that can be attributed to various sources of variation. This breakdown helps to quantify the extent to which different factors or sources contribute to the observed variability in the data. Understanding the partitioning of variance is crucial because it allows researchers to determine the significance of these sources and make informed conclusions about the effects being studied.

In ANOVA, the total variance of the data is divided into two main components:

1. **Between-Group Variance (Systematic Variance)**: This component represents the variability between different groups or levels of the independent variable(s). It measures how much the means of the groups differ from each other. The larger the between-group variance relative to the within-group variance, the more likely it is that there is a significant effect of the independent variable on the dependent variable.

2. **Within-Group Variance (Error Variance)**: This component represents the variability within each group or level of the independent variable(s). It accounts for the random variability or noise in the data that cannot be explained by the factors under study.

The ANOVA process compares the two components using the F-statistic, the ratio of the between-group mean square to the within-group mean square (each sum of squares divided by its degrees of freedom). If this ratio is large enough that it would be unlikely under the null hypothesis, it suggests systematic differences between the groups, and the null hypothesis (all group means are equal) can be rejected.

Understanding the partitioning of variance is important for several reasons:

1. **Hypothesis Testing**: ANOVA helps researchers test hypotheses about the effects of different factors on the dependent variable. By partitioning the variance, researchers can determine if the observed differences are statistically significant or if they could have occurred by chance.

2. **Identifying Important Factors**: By quantifying the contributions of different factors to the variance, researchers can identify which factors have the most significant influence on the dependent variable.

3. **Experimental Design**: Partitioning variance can guide experimental design by showing which factors should be controlled or manipulated to minimize within-group variance and maximize between-group variance.

4. **Interpretation**: Understanding the partitioning of variance helps researchers interpret the results of ANOVA correctly and make more informed conclusions about the relationships between variables.

In summary, the partitioning of variance in ANOVA allows researchers to analyze the sources of variability in their data, assess the significance of effects, and draw meaningful conclusions about the relationships between variables.
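
As a small worked illustration of the partition, the sketch below computes each component directly and checks that the total sum of squares equals the between-group plus within-group sums of squares. The data values are made up for the example, and `scipy.stats.f_oneway` is used only as a cross-check on the manually computed F-statistic.

```python
import numpy as np
from scipy import stats

# Hypothetical data: three groups of four observations each.
groups = [
    np.array([4.0, 5.0, 6.0, 5.0]),
    np.array([7.0, 8.0, 9.0, 8.0]),
    np.array([5.0, 6.0, 7.0, 6.0]),
]
all_data = np.concatenate(groups)
grand_mean = all_data.mean()

# Total variability around the grand mean.
ss_total = np.sum((all_data - grand_mean) ** 2)
# Between-group: group means around the grand mean, weighted by group size.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group: observations around their own group mean.
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)

# Degrees of freedom: k - 1 between groups, N - k within groups.
k, n_total = len(groups), len(all_data)
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within

f_scipy, p_scipy = stats.f_oneway(*groups)
print("SS_total:", ss_total)
print("SS_between + SS_within:", ss_between + ss_within)
print("F (manual):", f_manual, "F (scipy):", f_scipy)
```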

In [1]:
# Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual
# sum of squares (SSR) in a one-way ANOVA using Python?



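# In a one-way ANOVA, the Total Sum of Squares (SST), Explained Sum of Squares (SSE), and Residual Sum of Squares (SSR) can be computed directly with `numpy`, step by step.
# Naming note: this question uses SSE for the explained (between-group) sum of squares and SSR for the residual (within-group) one. Many textbooks use the abbreviations the other way round (SSE = error, SSR = regression), so check the convention before comparing results.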

# Let's assume you have a dataset with multiple groups and a corresponding array for each group's data points.


import numpy as np

# Simulated data for three groups
group1 = np.array([10, 12, 14, 15, 18])
group2 = np.array([20, 22, 24, 25, 28])
group3 = np.array([30, 32, 34, 35, 38])

# Combine all data into one array
all_data = np.concatenate([group1, group2, group3])

# Calculate overall mean
overall_mean = np.mean(all_data)

# Calculate the Total Sum of Squares (SST)
sst = np.sum((all_data - overall_mean)**2)

# Calculate the Explained Sum of Squares (SSE): between-group variability
groups = [group1, group2, group3]
group_means = [np.mean(group) for group in groups]
sse = np.sum([len(group) * (mean - overall_mean)**2 for group, mean in zip(groups, group_means)])

# Calculate the Residual Sum of Squares (SSR): within-group variability, using SST = SSE + SSR
ssr = sst - sse

print("Total Sum of Squares (SST):", sst)
print("Explained Sum of Squares (SSE):", sse)
print("Residual Sum of Squares (SSR):", ssr)


# In this example, `sst` represents the total variability in the data, `sse` represents the variability explained by the group means, and `ssr` represents the unexplained variability, or error, in the data.

# Keep in mind that in a real-world scenario, you might want to read the data from a file or database, and you may also want to conduct formal statistical analysis using libraries like `scipy.stats` to perform hypothesis testing and obtain p-values for ANOVA.

Total Sum of Squares (SST): 1110.4
Explained Sum of Squares (SSE): 999.9999999999997
Residual Sum of Squares (SSR): 110.40000000000043


In [2]:
# Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?



# In a two-way ANOVA, the main effects and the interaction effect can be estimated in Python by fitting a linear model with `statsmodels` (note that `scipy.stats` only provides one-way ANOVA, via `f_oneway`). Here's an example:

# Assuming you have a dataset with two factors (Factor A and Factor B) and a corresponding array of data points for each combination of factors:


import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Simulated data for a 2x2 factorial design (10 observations; the cells are unbalanced, with 3, 2, 2, and 3 observations)
data = {
    'FactorA': np.repeat(['A1', 'A2'], 5),
    'FactorB': np.tile(['B1', 'B2'], 5),
    'Values': [10, 12, 15, 18, 20, 22, 24, 25, 28, 30]
}

df = pd.DataFrame(data)

# Fit a two-way ANOVA model
model = ols('Values ~ FactorA + FactorB + FactorA:FactorB', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)

# In the ANOVA table produced by this code, you will see:

# - **Main Effects**: Look at the rows for 'FactorA' and 'FactorB'. These represent the main effects of each factor. If the p-values for these effects are small (typically less than 0.05), it indicates that the corresponding factor has a significant effect on the dependent variable.

# - **Interaction Effect**: Look at the row for 'FactorA:FactorB'. This represents the interaction effect between Factor A and Factor B. If the p-value for this interaction is small, it suggests that the interaction effect is significant. An interaction effect indicates that the combined effect of the two factors is not simply additive and that their influence on the dependent variable depends on each other.

# Remember that a significant interaction effect might lead to additional analysis to understand the nature of the interaction, possibly involving post hoc tests or graphical exploration.

# It's worth noting that interpreting interaction effects can be more complex than interpreting main effects. Understanding the context of the study and visualizing the data can aid in interpreting the results accurately.

                     sum_sq   df          F    PR(>F)
FactorA          281.666667  1.0  15.552147  0.007593
FactorB            0.066667  1.0   0.003681  0.953592
FactorA:FactorB    0.066667  1.0   0.003681  0.953592
Residual         108.666667  6.0        NaN       NaN


In [None]:
Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?




Answer:
    In a one-way ANOVA, the F-statistic and its associated p-value are used to assess whether there are statistically significant differences in means between the groups. The F-statistic is the ratio of the variability between the groups to the variability within the groups (between-group mean square divided by within-group mean square). The p-value is the probability of obtaining an F-statistic at least this large if the null hypothesis were true.

Given your information of an F-statistic of 5.23 and a p-value of 0.02, here's how you can interpret these results:

1. **F-Statistic Value (5.23)**:
   The F-statistic measures how large the variability between the group means is relative to the variability within the groups. Under the null hypothesis it is expected to be close to 1, so a value of 5.23 means the between-group variability is about five times the within-group variability.

2. **P-Value (0.02)**:
   The p-value associated with the F-statistic is 0.02. This p-value represents the probability of observing an F-statistic as extreme as 5.23 (or more extreme) under the assumption that there are no real differences between group means (null hypothesis).

**Interpretation**:
With a p-value of 0.02, the result is below the commonly used significance level of 0.05. This gives you enough evidence to reject the null hypothesis, which states that all group means are equal.

Therefore, you can conclude that there are statistically significant differences between at least some of the groups in your dataset. However, the ANOVA itself doesn't tell you which specific groups are different from each other—further post hoc tests or pairwise comparisons might be needed to determine which groups are driving this difference.

Remember that statistical significance doesn't necessarily imply practical significance. It's important to consider the effect size and the context of the study to understand the practical implications of the observed differences between groups.
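
To make the link between the F-statistic, the p-value, and effect size concrete, here is a short sketch. The question does not report degrees of freedom, so the values used (3 groups, 17 observations in total, giving df of 2 and 14) are a hypothetical choice that happens to reproduce a p-value of about 0.02; a different design would give a different p-value and effect size for the same F.

```python
from scipy import stats

f_stat = 5.23
# Hypothetical design: 3 groups and 17 observations in total, which gives
# degrees of freedom (2, 14). The question does not state these values.
df_between, df_within = 2, 14

# p-value: upper-tail probability of the F distribution at the observed F.
p_value = stats.f.sf(f_stat, df_between, df_within)

# Eta squared (share of total variance explained), recovered from F and df.
eta_squared = (f_stat * df_between) / (f_stat * df_between + df_within)

print(f"p-value: {p_value:.4f}")
print(f"eta squared: {eta_squared:.3f}")
```

Under these assumed degrees of freedom, eta squared is roughly 0.43, which would count as a large effect; with many more observations the same F would correspond to a much smaller one.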

In [None]:
Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential
consequences of using different methods to handle missing data?



Answer: 
    Handling missing data in a repeated measures ANOVA is essential for obtaining accurate and reliable results. Missing data can arise for various reasons, such as participant dropouts, technical errors, or incomplete responses. The way you handle missing data can impact the validity of your analysis and the conclusions you draw. Here are common methods for handling missing data in repeated measures ANOVA and their potential consequences:

1. **Complete Case Analysis (Listwise Deletion)**:
   In this approach, any subject with a missing value at any measurement occasion is excluded from the analysis. This reduces the sample size and therefore the power of the analysis, making it less likely to detect true effects. It can also introduce bias unless the data are missing completely at random (MCAR).

2. **Mean Imputation**:
   Mean imputation involves replacing missing values with the mean of the observed values for that variable. While this is a simple approach, it can distort the variance and covariance structures in the data, leading to biased estimates and underestimated standard errors. It also doesn't account for the uncertainty introduced by imputation.

3. **Last Observation Carried Forward (LOCF)**:
   LOCF involves replacing missing values with the last observed value for that individual. This method assumes that the missing data pattern follows the trend of the last observed value. However, it might not accurately capture the true trajectory of the data and can lead to biased results.

4. **Linear Interpolation**:
   Linear interpolation estimates missing values based on the surrounding observed values. While it can provide more realistic estimates compared to mean imputation or LOCF, it assumes linear trends, which might not be appropriate for all types of data.

5. **Multiple Imputation**:
   Multiple imputation creates multiple plausible imputed datasets based on the observed data's distribution and correlation structure. Analyzing each imputed dataset separately and combining the results can provide more accurate parameter estimates, standard errors, and p-values. However, this approach can be computationally intensive and might require specialized software.

6. **Model-Based Imputation**:
   This involves fitting a model to the observed data and using the model to predict the missing values. While this approach can produce reasonable estimates when the model assumptions are met, it can introduce bias if the model is misspecified.

The potential consequences of using different methods to handle missing data include bias, inaccurate estimates, incorrect p-values, and reduced power. It's important to choose a method that aligns with the underlying assumptions of your data and the research question. Furthermore, sensitivity analyses, which involve using different methods and comparing their impact on results, can help assess the robustness of your conclusions.

Whichever method you choose, transparency about your approach and a discussion of the potential limitations of handling missing data are crucial when reporting the results of your repeated measures ANOVA.
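
The first two methods can be compared on a toy dataset. The sketch below uses made-up scores for six subjects measured three times, and shows the two consequences described above: listwise deletion shrinks the sample, and mean imputation shrinks the variance of any column that had missing values.

```python
import numpy as np

# Hypothetical wide-format data: rows are subjects, columns are the three
# repeated measurements; np.nan marks a missing observation.
scores = np.array([
    [5.0, 6.0, 7.0],
    [4.0, np.nan, 6.5],
    [6.0, 7.5, 8.0],
    [5.5, 6.5, np.nan],
    [4.5, 5.0, 6.0],
    [6.5, 7.0, 9.0],
])

# Listwise deletion: keep only subjects with all three measurements.
complete_rows = ~np.isnan(scores).any(axis=1)
listwise = scores[complete_rows]

# Mean imputation: replace each missing value with its column mean.
col_means = np.nanmean(scores, axis=0)
imputed = np.where(np.isnan(scores), col_means, scores)

# Compare sample size and per-column variance (ddof=1 for sample variance).
observed_var = np.nanvar(scores, axis=0, ddof=1)
imputed_var = imputed.var(axis=0, ddof=1)

print("Subjects kept by listwise deletion:", len(listwise), "of", len(scores))
print("Variance of observed values:", np.round(observed_var, 3))
print("Variance after mean imputation:", np.round(imputed_var, 3))
```

The imputed values sit exactly at the column mean, so they add nothing to the sum of squares while increasing the count, which is why the variance (and hence the standard errors) comes out too small.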

In [None]:
Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide
an example of a situation where a post-hoc test might be necessary.


Answer:
    Post-hoc tests are used after performing an Analysis of Variance (ANOVA) to make pairwise comparisons between groups when a significant overall difference has been found. These tests help determine which specific groups differ from each other. Some common post-hoc tests include:

1. **Tukey's Honestly Significant Difference (HSD)**:
   - **When to Use**: Tukey's HSD is a conservative method suitable for comparing all possible pairs of group means. It controls the familywise error rate, making it a good choice when you have a large number of pairwise comparisons.
   - **Example**: In a study comparing the effects of three different diets on weight loss, a significant difference was found in the ANOVA. Tukey's HSD can be used to identify which specific diets resulted in significantly different weight loss.

2. **Bonferroni Correction**:
   - **When to Use**: The Bonferroni correction is a straightforward method that involves dividing the desired significance level by the number of comparisons to control the familywise error rate. It's suitable when you have a small number of pairwise comparisons.
   - **Example**: In a study comparing the effectiveness of different teaching methods in three classrooms, a significant difference was found in the ANOVA. The Bonferroni correction can be used to adjust the significance level for each pairwise comparison.

3. **LSD (Least Significant Difference)**:
   - **When to Use**: LSD is less conservative than Tukey's HSD and is suitable when you have a small number of pairwise comparisons. It's important to note that using LSD without a significant omnibus test (like ANOVA) can inflate the Type I error rate.
   - **Example**: In a clinical trial comparing three different medications for pain relief, a significant difference was found in the ANOVA. LSD can help identify which specific pairs of medications have significantly different effects on pain relief.

4. **Dunnett's Test**:
   - **When to Use**: Dunnett's test is used when you have one control group and want to compare the other groups to the control group. It's often used in situations where you're interested in whether treatments differ from a baseline or standard treatment.
   - **Example**: In a study comparing the effectiveness of new medical treatments to a standard treatment, Dunnett's test can be used to determine if any of the new treatments have significantly different effects from the standard treatment.

5. **Scheffe's Test**:
   - **When to Use**: Scheffe's test is a more conservative post-hoc test designed for testing arbitrary contrasts (for example, the average of two groups against a third), not only pairwise differences. It works with unequal group sizes and controls the familywise error rate across all possible contrasts, at the cost of wider confidence intervals than other methods when only pairwise comparisons are needed.
   - **Example**: In a psychological study with several treatment groups, Scheffe's test can be used to compare the combined mean of two therapy groups against the control group, a contrast chosen after looking at the data.

It's important to select the appropriate post-hoc test based on the design of your study, the number of comparisons, and the level of control you want over the Type I error rate. Post-hoc tests help avoid making false conclusions about group differences and provide more specific insights into the relationships between groups in your data.
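
As an illustration of the Bonferroni approach, the sketch below runs an omnibus one-way ANOVA and then all pairwise t-tests with adjusted p-values. The diet data are simulated with arbitrary means and a fixed seed, so the numbers are purely illustrative; pairwise t-tests with Bonferroni are one simple option among those listed, not a substitute for Tukey's HSD where that is preferred.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Simulated weight-loss data (hypothetical) for three diets, fixed seed.
rng = np.random.default_rng(42)
diets = {
    "A": rng.normal(loc=2.0, scale=0.5, size=15),
    "B": rng.normal(loc=2.1, scale=0.5, size=15),
    "C": rng.normal(loc=3.0, scale=0.5, size=15),
}

# Omnibus test first; pairwise comparisons are a follow-up.
f_stat, p_omnibus = stats.f_oneway(*diets.values())
print(f"ANOVA: F={f_stat:.3f}, p={p_omnibus:.4g}")

# Bonferroni: multiply each raw p-value by the number of comparisons
# (equivalent to dividing alpha by that number), capping at 1.
pairs = list(combinations(diets, 2))
adjusted = {}
for a, b in pairs:
    _, p_raw = stats.ttest_ind(diets[a], diets[b])
    adjusted[(a, b)] = min(1.0, p_raw * len(pairs))
    print(f"{a} vs {b}: raw p={p_raw:.4g}, Bonferroni p={adjusted[(a, b)]:.4g}")
```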


In [3]:
# Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
# 50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
# to determine if there are any significant differences between the mean weight loss of the three diets.
# Report the F-statistic and p-value, and interpret the results.


import numpy as np
from scipy import stats

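# Simulated weight loss data for three diets (A, B, C).
# Note: the arrays hold 50, 49, and 48 values, so the simulated sample is larger than
# the 50 participants in the question and the groups are slightly unbalanced.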
diet_a = np.array([1.5, 2.0, 1.8, 2.2, 2.5, 2.3, 1.7, 1.9, 2.1, 1.8,
                   2.4, 2.6, 2.0, 1.6, 1.9, 2.2, 1.7, 1.8, 2.3, 1.5,
                   2.1, 2.0, 1.9, 1.8, 1.7, 1.6, 2.2, 2.1, 1.8, 1.9,
                   2.0, 2.3, 1.5, 2.1, 1.8, 2.2, 2.5, 2.3, 1.7, 1.9,
                   2.4, 2.6, 2.0, 1.6, 1.9, 2.2, 1.7, 1.8, 2.3, 1.5])

diet_b = np.array([1.2, 1.5, 1.8, 2.1, 1.7, 1.9, 2.2, 1.5, 2.0, 2.3,
                   1.6, 1.9, 2.4, 2.6, 2.0, 1.6, 1.9, 2.2, 1.7, 1.8,
                   2.3, 1.5, 2.1, 2.0, 1.9, 1.8, 1.7, 2.2, 2.1, 1.8,
                   1.9, 2.0, 2.3, 1.5, 2.1, 1.8, 2.2, 1.7, 1.9, 2.4,
                   2.6, 2.0, 1.6, 1.9, 2.2, 1.7, 1.8, 2.3, 1.5])

diet_c = np.array([2.0, 2.3, 1.7, 1.9, 2.1, 1.8, 2.2, 2.5, 2.3, 1.7,
                   1.8, 2.3, 1.5, 2.1, 2.0, 1.9, 1.8, 1.7, 1.6, 2.2,
                   2.1, 1.8, 1.9, 2.0, 2.3, 1.5, 2.1, 1.8, 2.2, 1.7,
                   1.9, 2.4, 2.6, 2.0, 1.6, 1.9, 2.2, 1.7, 1.8, 2.3,
                   1.5, 2.0, 1.8, 2.1, 1.7, 1.9, 2.2, 2.5])

# Combine all data into one array (not needed by f_oneway; kept for follow-up analyses)
all_data = np.concatenate([diet_a, diet_b, diet_c])

# Group labels matching all_data, one label per observation
group_labels = ['A'] * len(diet_a) + ['B'] * len(diet_b) + ['C'] * len(diet_c)

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(diet_a, diet_b, diet_c)

print("F-statistic:", f_statistic)
print("p-value:", p_value)

# Interpret the results
alpha = 0.05
if p_value < alpha:
    print("There are significant differences between the mean weight loss of the three diets.")
else:
    print("There are no significant differences between the mean weight loss of the three diets.")


F-statistic: 0.4333013206515489
p-value: 0.6492076183304043
There are no significant differences between the mean weight loss of the three diets.


In [5]:
# Q10. A company wants to know if there are any significant differences in the average time it takes to
# complete a task using three different software programs: Program A, Program B, and Program C. They
# randomly assign 30 employees to one of the programs and record the time it takes each employee to
# complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or
# interaction effects between the software programs and employee experience level (novice vs.
# experienced). Report the F-statistics and p-values, and interpret the results.



import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

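# Simulated data for the task completion time with factors: Software, Experience.
# Note: this simulates 90 observations (15 per Software x Experience cell), not the
# 30 employees in the question, and all cells share the same true mean.
data = {
    'Software': np.repeat(['A', 'B', 'C'], 30),
    'Experience': np.tile(['Novice', 'Experienced'], 45),
    # No random seed is set, so the ANOVA table changes from run to run
    'Time': np.random.normal(loc=15, scale=3, size=90)  # Simulated time data
}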

df = pd.DataFrame(data)

# Fit a two-way ANOVA model with interaction
model = ols('Time ~ Software * Experience', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)




# In this example, Software and Experience are treated as categorical variables. The Software variable has three levels (A, B, C), and the Experience variable has two levels (Novice, Experienced). The simulated Time data represents the time taken to complete the task.

# The Software * Experience term in the formula expands to both main effects plus the interaction (Software + Experience + Software:Experience). The typ=2 argument in the anova_lm function requests Type II sums of squares.

# The output of the anova_table will provide you with F-statistics and p-values for the main effects of Software and Experience, as well as the interaction effect.

# Interpretation:

# Check the p-values for the main effects of Software and Experience. If any p-value is below your chosen significance level (e.g., 0.05), you would conclude that there is a significant main effect for that factor.

# Look at the p-value for the interaction effect (the Software:Experience row). If this p-value is below your significance level, it suggests that there is a significant interaction effect. This means that the effect of one factor on the outcome variable depends on the level of the other factor.

# Remember that significant interactions can complicate the interpretation of main effects. Further post hoc analyses or graphical exploration might be needed to understand the nature of the interaction and the direction of the effects. In the run shown below, all three p-values are far above 0.05 (about 0.79, 0.94, and 0.98), so there is no evidence of main effects or an interaction, as expected for data simulated from a single distribution.


                         sum_sq    df         F    PR(>F)
Software               3.896300   2.0  0.237096  0.789442
Experience             0.042025   1.0  0.005115  0.943157
Software:Experience    0.358944   2.0  0.021842  0.978400
Residual             690.203193  84.0       NaN       NaN


In [6]:
# Q11. An educational researcher is interested in whether a new teaching method improves student test
# scores. They randomly assign 100 students to either the control group (traditional teaching method) or the
# experimental group (new teaching method) and administer a test at the end of the semester. Conduct a
# two-sample t-test using Python to determine if there are any significant differences in test scores
# between the two groups. If the results are significant, follow up with a post-hoc test to determine which
# group(s) differ significantly from each other.


import numpy as np
from scipy import stats

# Simulated test scores for the control (group 0) and experimental (group 1) groups
control_scores = np.array([70, 75, 80, 65, 72, 78, 82, 68, 74, 76,
                           85, 71, 79, 68, 75, 70, 73, 81, 79, 67,
                           72, 77, 76, 82, 74, 70, 78, 75, 83, 68,
                           71, 72, 69, 80, 74, 76, 70, 81, 75, 68,
                           73, 77, 79, 67, 72, 78, 75, 82, 69, 73])

experimental_scores = np.array([76, 82, 88, 72, 79, 85, 90, 74, 81, 84,
                                93, 78, 86, 75, 82, 77, 80, 87, 85, 73,
                                78, 83, 82, 88, 81, 77, 85, 82, 89, 74,
                                77, 78, 76, 87, 81, 83, 77, 88, 82, 75,
                                80, 84, 86, 73, 78, 85, 82, 89, 76, 80])

# Perform two-sample t-test
t_statistic, p_value = stats.ttest_ind(control_scores, experimental_scores)

print("Two-sample t-test results:")
print("t-statistic:", t_statistic)
print("p-value:", p_value)

# Interpret the t-test results
alpha = 0.05
if p_value < alpha:
    print("There is a significant difference in test scores between the control and experimental groups.")
else:
    print("There is no significant difference in test scores between the control and experimental groups.")

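# Cross-check with a non-parametric test (Mann-Whitney U). With only two groups, a significant
# t-test already identifies which groups differ, so no post-hoc test is actually needed.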
if p_value < alpha:
    mwu_statistic, mwu_p_value = stats.mannwhitneyu(control_scores, experimental_scores)
    print("\nMann-Whitney U test results:")
    print("Mann-Whitney U statistic:", mwu_statistic)
    print("p-value:", mwu_p_value)
    if mwu_p_value < alpha:
        print("There is a significant difference between the groups based on the Mann-Whitney U test.")
    else:
        print("There is no significant difference between the groups based on the Mann-Whitney U test.")


Two-sample t-test results:
t-statistic: -6.766067958788337
p-value: 9.722492183014506e-10
There is a significant difference in test scores between the control and experimental groups.

Mann-Whitney U test results:
Mann-Whitney U statistic: 444.0
p-value: 2.6858883224271995e-08
There is a significant difference between the groups based on the Mann-Whitney U test.


In [None]:
# Q12. A researcher wants to know if there are any significant differences in the average daily sales of three
# retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store
# on those days. Conduct a repeated measures ANOVA using Python to determine if there are any
# significant differences in sales between the three stores. If the results are significant, follow up with a
# post-hoc test to determine which store(s) differ significantly from each other.
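
One possible way to approach this question is sketched below. No sales data are given, so the figures are simulated (store means, noise levels, and the seed are all assumptions), and the repeated measures ANOVA is computed by hand from its sums of squares, treating each day as the "subject". The sphericity assumption is not checked here. A library routine such as `AnovaRM` in `statsmodels` could be used instead of the manual calculation.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Simulated sales (hypothetical): 30 days (rows) x 3 stores (columns).
# A shared per-day effect makes the three measurements on a day correlated.
rng = np.random.default_rng(1)
n_days, stores = 30, ["A", "B", "C"]
day_effect = rng.normal(0, 20, size=(n_days, 1))
store_means = np.array([200.0, 210.0, 230.0])
sales = store_means + day_effect + rng.normal(0, 10, size=(n_days, 3))

k = sales.shape[1]
grand_mean = sales.mean()

# Partition: SS_total = SS_stores + SS_days + SS_error.
ss_total = np.sum((sales - grand_mean) ** 2)
ss_stores = n_days * np.sum((sales.mean(axis=0) - grand_mean) ** 2)
ss_days = k * np.sum((sales.mean(axis=1) - grand_mean) ** 2)
ss_error = ss_total - ss_stores - ss_days

df_stores, df_error = k - 1, (k - 1) * (n_days - 1)
f_stat = (ss_stores / df_stores) / (ss_error / df_error)
p_value = stats.f.sf(f_stat, df_stores, df_error)
print(f"Repeated measures ANOVA: F({df_stores}, {df_error})={f_stat:.3f}, p={p_value:.4g}")

# Post-hoc: paired t-tests (same days) with a Bonferroni adjustment.
if p_value < 0.05:
    pairs = list(combinations(range(k), 2))
    for i, j in pairs:
        _, p_raw = stats.ttest_rel(sales[:, i], sales[:, j])
        p_adj = min(1.0, p_raw * len(pairs))
        print(f"Store {stores[i]} vs {stores[j]}: Bonferroni p={p_adj:.4g}")
```

Removing the day-to-day variability (`ss_days`) from the error term is what distinguishes this from a one-way ANOVA on the same numbers, and it is why paired rather than independent t-tests are used for the follow-up.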




