Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact
the validity of the results.

In [None]:
Analysis of Variance (ANOVA) is a statistical test used to analyze whether there are statistically significant differences among the means of two or more groups. However, ANOVA comes with certain assumptions that need to be met for the results to be valid. Violations of these assumptions can impact the validity of the ANOVA results. The assumptions required to use ANOVA are:

Independence: The observations should be independent of one another, both within each group and across groups. The value of one data point should not be related to or influenced by the value of any other data point.

Example of a violation: If you are comparing the test scores of students from different schools and some students from the same school collaborated on their tests, their scores are correlated, which violates the independence assumption.

Homogeneity of Variance (Homoscedasticity): The variance of the dependent variable should be roughly equal across all groups. In other words, the spread of data points in one group should be similar to the spread in other groups.

Example of a violation: If you are comparing the weights of animals from different species, and one species has a much larger variation in weight compared to the others, this violates the homogeneity of variance assumption.

Normally Distributed Residuals: The residuals (the differences between the observed values and the values predicted by the model, which in a one-way ANOVA are the group means) should follow a normal distribution. This assumption concerns the residuals, not the pooled raw data.

Example of a violation: If you are comparing the response time of participants under different experimental conditions, and the residuals from one condition follow a non-normal distribution, this can impact the validity of ANOVA results.

If these assumptions are violated, the ANOVA results may not be reliable. In such cases, you might need to consider alternative statistical methods or transformations of the data to address the violations. Alternatively, non-parametric tests, which do not rely on the same assumptions as ANOVA, could be used when the assumptions cannot be met.

It's essential to assess and address these assumptions when performing ANOVA or any statistical analysis to ensure the reliability and validity of the results. Depending on the nature of the data and the specific context of the analysis, there are various techniques and tests available to assess the assumptions and handle violations if they occur.
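
The normality and equal-variance assumptions can be screened with standard tests. The sketch below is one possible approach rather than a required procedure: it runs the Shapiro-Wilk test on the residuals and Levene's test across groups. The data are simulated and the names `groups` and `residuals` are chosen here for illustration only.

```python
import numpy as np
from scipy import stats

# Hypothetical data: three groups simulated from normal populations with equal spread
rng = np.random.default_rng(42)
groups = [rng.normal(loc=mu, scale=5.0, size=30) for mu in (50, 55, 60)]

# In a one-way ANOVA the residuals are deviations from each group's own mean
residuals = np.concatenate([g - g.mean() for g in groups])

# Shapiro-Wilk: the null hypothesis is that the residuals are normally distributed
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Levene: the null hypothesis is that all groups share the same variance
levene_stat, levene_p = stats.levene(*groups)

print(f"Shapiro-Wilk p-value: {shapiro_p:.3f}")
print(f"Levene p-value: {levene_p:.3f}")
```

A small p-value from either test is a warning sign for the corresponding assumption. Independence cannot be tested this way; it has to be argued from how the data were collected.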


Q2. What are the three types of ANOVA, and in what situations would each be used?

In [None]:
Analysis of Variance (ANOVA) is a statistical technique used to analyze whether there are statistically significant differences among the means of two or more groups. There are three main types of ANOVA, each of which is used in different situations:

One-Way ANOVA:

Used when you have one categorical independent variable (factor) with two or more levels or groups.
It assesses whether there are statistically significant differences in the means of the dependent variable among the different groups.
Example: To compare the average test scores of students from three different schools (School A, School B, and School C) to determine if there are significant differences in student performance.
Two-Way ANOVA:

Used when you have two categorical independent variables, often referred to as factors, and one dependent variable.
It assesses the impact of both factors on the dependent variable and checks for interactions between the two factors.
Example: To investigate the effects of both diet type (Factor 1: Diet A, Diet B) and exercise level (Factor 2: Low, Moderate, High) on weight loss. This can help determine whether diet, exercise, or their interaction significantly affects weight loss.
Multivariate Analysis of Variance (MANOVA):

Used when you have two or more dependent variables (multivariate data) and one or more categorical independent variables.
MANOVA is an extension of ANOVA and allows you to examine the joint effect of the independent variables on multiple dependent variables simultaneously.
Example: To analyze the effect of two types of treatment (Factor 1: Treatment X and Treatment Y) on three health outcomes (dependent variables: blood pressure, cholesterol levels, and BMI) in a clinical study. MANOVA can determine if the treatments have a significant effect across all three health outcomes.
In summary, you would choose the type of ANOVA to use based on the number of independent variables and dependent variables in your study:

One-Way ANOVA is appropriate when you have one categorical independent variable.
Two-Way ANOVA is used when you have two categorical independent variables.
MANOVA is employed when you have multiple dependent variables and one or more categorical independent variables.
Selecting the appropriate ANOVA method is crucial to ensure that the analysis fits the specific research question and data structure. It's also important to consider the assumptions of each type of ANOVA and verify that they are met for valid results.
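
Of the three types, MANOVA is the least familiar in code, so a minimal sketch may help. It assumes the statsmodels `MANOVA` class and mirrors the treatment example above; the column names and the simulated values are hypothetical.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

# Hypothetical data: two treatments and three health outcomes (values are simulated)
rng = np.random.default_rng(0)
n = 40
df = pd.DataFrame({
    "treatment": np.repeat(["X", "Y"], n // 2),
    "blood_pressure": rng.normal(120, 10, n),
    "cholesterol": rng.normal(200, 20, n),
    "bmi": rng.normal(25, 3, n),
})

# The left side of the formula lists the dependent variables, the right side the factor
manova = MANOVA.from_formula(
    "blood_pressure + cholesterol + bmi ~ treatment", data=df
)
mv = manova.mv_test()
print(mv)
```

The printed summary reports multivariate statistics such as Wilks' lambda and Pillai's trace for the treatment term.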

Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

In [None]:
The partitioning of variance is the central idea behind Analysis of Variance (ANOVA): the total variation in the data is decomposed into components attributable to different sources, so that each source can be tested for significance. In a one-way ANOVA, the total sum of squares splits into a between-group part and a within-group part (SST = SSB + SSW). The three quantities are:

Total Variation (Total Sum of Squares, SST):

Total variation represents the total dispersion or variability in the dataset. It quantifies how much the individual data points differ from the overall mean.
SST = Σ(yᵢ - ȳ)², where yᵢ represents each data point, and ȳ is the overall mean.
Between-Group Variation (Between-Group Sum of Squares, SSB):

This component measures the variation between different groups or levels of the independent variable. It assesses whether the means of the groups are significantly different.
SSB = Σ(nᵢ * (ȳᵢ - ȳ)²), where nᵢ is the sample size of each group, ȳᵢ is the mean of each group, and ȳ is the overall mean.
Within-Group Variation (Within-Group Sum of Squares, SSW):

This component quantifies the variation within each group or level of the independent variable. It measures the random variability or error within each group.
SSW = ΣΣ(yᵢⱼ - ȳᵢ)², where yᵢⱼ represents each data point within each group, ȳᵢ is the mean of each group, and the double summation considers all data points within all groups.
The importance of understanding the partitioning of variance in ANOVA lies in several aspects:

Hypothesis Testing: ANOVA tests whether there are statistically significant differences between the group means. The test compares the between-group variability with the within-group variability after each sum of squares is divided by its degrees of freedom: F = MSB / MSW, where MSB = SSB / (k - 1) and MSW = SSW / (N - k) for k groups and N observations. An F value much larger than 1 suggests that the group means differ by more than random variation would explain.

Effect Size: By examining the partition of variance, you can calculate effect size measures such as eta-squared (η²) or partial eta-squared (η²p). These measures help you quantify the proportion of total variance explained by the independent variable, providing a better understanding of the practical significance of the findings.

Interpretation: Understanding how variance is partitioned helps in interpreting the results. It allows you to determine which component contributes the most to the variation in the dependent variable and assess the relative importance of different factors or groups.

Model Assessment: It helps in evaluating the goodness of fit of the ANOVA model. By comparing the explained variation (SSB) to the unexplained variation (SSW), you can assess how well the model accounts for the observed data.

In summary, the partitioning of variance in ANOVA is essential for hypothesis testing, effect size estimation, interpretation of results, and model assessment. It provides insights into the sources of variability in the data and helps researchers draw meaningful conclusions about the relationships between the independent and dependent variables.
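
A small numeric sketch can make the decomposition concrete. The three groups below are hypothetical values picked so that the arithmetic is easy to follow by hand; the variable names are illustrative.

```python
import numpy as np

# Hypothetical scores for three groups of three observations each
groups = [
    np.array([4.0, 5.0, 6.0]),
    np.array([6.0, 7.0, 8.0]),
    np.array([8.0, 9.0, 10.0]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Total, between-group, and within-group sums of squares
sst = np.sum((all_values - grand_mean) ** 2)
ssb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ssw = sum(np.sum((g - g.mean()) ** 2) for g in groups)

# Proportion of total variation explained by group membership
eta_squared = ssb / sst

print(f"SST = {sst}, SSB = {ssb}, SSW = {ssw}")  # SST = 30.0, SSB = 24.0, SSW = 6.0
print(f"eta-squared = {eta_squared}")            # eta-squared = 0.8
```

Here SSB + SSW reproduces SST exactly, and 80% of the total variation is attributable to differences between the group means.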

Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual
sum of squares (SSR) in a one-way ANOVA using Python?

In [None]:
In a one-way ANOVA, you can calculate the Total Sum of Squares (SST), Explained Sum of Squares (SSE), and Residual Sum of Squares (SSR) using Python. Here's how you can do it:

Let's assume you have a dataset with a single dependent variable (Y) and a single categorical independent variable (X) with K levels or groups.

Total Sum of Squares (SST):
SST represents the total variation in the dependent variable Y. It quantifies how much individual data points differ from the overall mean of Y.

SST = Σ(yᵢ - ȳ)², where yᵢ represents each data point, and ȳ is the overall mean of Y.

Explained Sum of Squares (SSE):
SSE represents the variation in Y explained by the differences between the group means. It measures the impact of the independent variable X on Y.

SSE = Σ(nᵢ * (ȳᵢ - ȳ)²), where nᵢ is the sample size of each group, ȳᵢ is the mean of each group, and ȳ is the overall mean of Y.

Residual Sum of Squares (SSR):
SSR represents the unexplained or residual variation in Y within each group. It quantifies the random variability or error within each group.

SSR = ΣΣ(yᵢⱼ - ȳᵢ)², where yᵢⱼ represents each data point within each group, ȳᵢ is the mean of each group, and the double summation considers all data points within all groups.

To calculate these sums of squares in Python, you would need your data and perform the following steps:

Calculate the overall mean (ȳ) of the dependent variable Y.
Calculate the group means (ȳᵢ) for each group.
Calculate the sums of squares using the formulas mentioned above.
Here's an example of how you can calculate SST, SSE, and SSR in Python using sample data:

import numpy as np

# Sample data for each group
group_1 = np.array([40, 45, 50, 55, 60])
group_2 = np.array([65, 70, 75, 80, 85])

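# Calculate the overall (grand) mean from all observations pooled together,
# so that it is also correct when the groups have different sizes
overall_mean = np.mean(np.concatenate([group_1, group_2]))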

# Calculate group means
mean_group_1 = np.mean(group_1)
mean_group_2 = np.mean(group_2)

# Calculate SST, SSE, and SSR
SST = np.sum((group_1 - overall_mean)**2) + np.sum((group_2 - overall_mean)**2)
SSE = (len(group_1) * (mean_group_1 - overall_mean)**2) + (len(group_2) * (mean_group_2 - overall_mean)**2)
SSR = np.sum((group_1 - mean_group_1)**2) + np.sum((group_2 - mean_group_2)**2)

print("Total Sum of Squares (SST):", SST)
print("Explained Sum of Squares (SSE):", SSE)
print("Residual Sum of Squares (SSR):", SSR)
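
As a sanity check, the sums of squares can be turned into an F-statistic and compared with `scipy.stats.f_oneway`. This is a sketch that reuses the same two sample groups; the names `f_manual` and `f_scipy` are illustrative.

```python
import numpy as np
from scipy import stats

group_1 = np.array([40, 45, 50, 55, 60])
group_2 = np.array([65, 70, 75, 80, 85])
groups = [group_1, group_2]

grand_mean = np.concatenate(groups).mean()
sse = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # explained
ssr = sum(np.sum((g - g.mean()) ** 2) for g in groups)            # residual

# Divide each sum of squares by its degrees of freedom to get mean squares
k = len(groups)
n_total = sum(len(g) for g in groups)
f_manual = (sse / (k - 1)) / (ssr / (n_total - k))

# Compare with SciPy's one-way ANOVA
f_scipy, p_value = stats.f_oneway(*groups)

print("Manual F:", f_manual)  # 25.0
print("SciPy F: ", f_scipy)
```

For these data SSE is 1562.5 and SSR is 500, so the manual F-statistic is 25 and agrees with the SciPy result up to floating-point error.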


Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?

In [None]:
In a two-way ANOVA, you can calculate the main effects and interaction effects using Python by performing the analysis and interpreting the results. Here's a general outline of how you can calculate and interpret these effects:

Calculate and Set Up Your Data:

You need to have a dataset with two categorical independent variables (factors) and one dependent variable. Organize your data into a suitable format for analysis.

Perform the Two-Way ANOVA:

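You can use the statsmodels library to perform the two-way ANOVA. The scipy.stats.f_oneway function only handles a single factor, so it cannot separate main effects from an interaction.

For example, using statsmodels:

In [None]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Assuming you have your data in a DataFrame 'df' with columns 'A', 'B', and 'Y'
# 'C(A) * C(B)' expands to both main effects plus the A:B interaction
model = ols('Y ~ C(A) * C(B)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)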


In [None]:
Interpret the Results:

Main Effects:

The main effect of Factor A: This represents the impact of Factor A on the dependent variable Y, regardless of the presence of Factor B. It can be assessed from the F-statistic and p-value associated with Factor A.
The main effect of Factor B: Similarly, this represents the impact of Factor B on the dependent variable Y, regardless of the presence of Factor A. It can be assessed from the F-statistic and p-value associated with Factor B.
Interaction Effect:

The interaction effect between Factor A and Factor B: This effect measures whether the combination of both factors has a significant impact on the dependent variable Y. It can be assessed from the F-statistic and p-value associated with the interaction term (Factor A:B).
Evaluate Significance and Interpretation:

If the p-value for a main effect or interaction effect is less than your chosen significance level (e.g., α = 0.05), you can conclude that the effect is statistically significant.
If an effect is statistically significant, you can further interpret it by examining the means or effect size measures (e.g., eta-squared or partial eta-squared).
It's essential to analyze and interpret the results comprehensively to understand the impact of each factor and their interaction on the dependent variable. Interaction plots and post hoc tests can also help build a deeper understanding of the effects in a two-way ANOVA.

Keep in mind that Python libraries, such as statsmodels, offer more detailed output and tools for post hoc tests and interaction plots, which can aid in the interpretation of results. The specific implementation may vary depending on the library you choose and the structure of your dataset.
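
Before reading the F-tests, it often helps to look at the cell means directly. The sketch below uses a hypothetical balanced 2x2 dataset (the values and level names are made up) and computes a simple difference of differences, which is zero when there is no interaction in the sample.

```python
import pandas as pd

# Hypothetical balanced 2x2 data: two observations per cell
df = pd.DataFrame({
    "A": ["a1", "a1", "a2", "a2", "a1", "a1", "a2", "a2"],
    "B": ["b1", "b1", "b1", "b1", "b2", "b2", "b2", "b2"],
    "Y": [10.0, 12.0, 14.0, 16.0, 11.0, 13.0, 21.0, 23.0],
})

# Mean of Y in each A x B cell (rows: levels of A, columns: levels of B)
cell_means = df.groupby(["A", "B"])["Y"].mean().unstack("B")
print(cell_means)

# Effect of moving from a1 to a2, computed separately within each level of B
effect_of_a = cell_means.loc["a2"] - cell_means.loc["a1"]

# If this differs from zero, the effect of A depends on B in the sample
interaction_contrast = effect_of_a["b2"] - effect_of_a["b1"]
print("Interaction contrast:", interaction_contrast)  # 6.0
```

In this made-up example the effect of A is 4 at level b1 and 10 at level b2, so the lines in an interaction plot would not be parallel. Whether such a difference is statistically significant is what the interaction F-test assesses.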

Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?

In [None]:
In a one-way ANOVA, the F-statistic is used to test whether there are statistically significant differences among the means of the groups. The p-value associated with the F-statistic tells you whether those differences are statistically significant. Here's how you can interpret the results of your one-way ANOVA:

F-Statistic: The F-statistic is a measure of the ratio of between-group variability to within-group variability. It quantifies whether the variation between the group means is significantly greater than the variation within each group.

P-Value: The p-value is a measure of the evidence against the null hypothesis. It tells you the probability of observing the results (or more extreme results) if there were no true differences between the groups.

In your case, you obtained an F-statistic of 5.23 and a p-value of 0.02. Here's how to interpret these results:

F-Statistic (5.23): The F-statistic suggests that there are differences between the group means. However, the F-statistic alone does not tell you if those differences are significant.

P-Value (0.02): The p-value is below the conventional significance level of 0.05. Results at least this extreme would be expected only about 2% of the time if the null hypothesis (all group means are equal) were true, so the null hypothesis is rejected at the 5% level.

Interpretation:
Since the p-value is less than 0.05 (or your chosen significance level), you can conclude that there are statistically significant differences between at least some of the groups. In other words, the one-way ANOVA indicates that not all group means are equal, but it doesn't specify which specific groups are different from each other.

If you want to identify which groups are different, you can perform post hoc tests (e.g., Tukey's HSD, Bonferroni, or Scheffé tests) to conduct pairwise comparisons. These tests will help you determine where the differences lie.

In summary, your one-way ANOVA suggests that there are statistically significant differences among the groups. To gain a better understanding of which specific groups differ, you should follow up with appropriate post hoc tests or pairwise comparisons.
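
The link between an F-statistic and its p-value depends on the degrees of freedom, which the question does not state. The sketch below therefore assumes 3 groups and 30 observations purely for illustration, so the p-value it prints need not equal the 0.02 quoted above.

```python
from scipy import stats

f_statistic = 5.23

# Assumed degrees of freedom (NOT given in the question): 3 groups, 30 observations
df_between = 3 - 1
df_within = 30 - 3

# Upper-tail probability of the F distribution at the observed statistic
p_value = stats.f.sf(f_statistic, df_between, df_within)

# Smallest F that would be significant at the 5% level for these degrees of freedom
critical_value = stats.f.ppf(0.95, df_between, df_within)

print(f"p-value: {p_value:.4f}")
print(f"critical F at alpha = 0.05: {critical_value:.3f}")
```

Under these assumed degrees of freedom the observed statistic exceeds the critical value, which is the same decision as comparing the p-value with 0.05.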

Q7. In a repeated measures ANOVA, how would you handle missing data?

In [None]:
Handling missing data in a repeated measures ANOVA can be challenging, as the repeated measures design assumes that each subject provides data for all time points or conditions. When data is missing, it can affect the validity of the analysis and the interpretation of results. Here are some strategies to handle missing data in repeated measures ANOVA:

Listwise Deletion:

This is the simplest method but can result in a substantial loss of data.
Subjects with missing data at any time point or condition are entirely removed from the analysis.
Use this method only when few subjects are affected and the data are missing completely at random; otherwise it reduces power and can bias the results.
Pairwise Deletion:

Also known as available case analysis.
Subjects with missing data at specific time points or conditions are excluded only from those particular comparisons.
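This method retains more data, but each comparison is then based on a different subset of subjects, which makes the results harder to compare and can produce inconsistent estimates.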
Imputation:

Imputation methods involve estimating missing values based on observed data.
Common single-imputation methods include mean imputation, last observation carried forward (LOCF), and linear interpolation.
Imputation retains the sample size, but single-imputation methods treat the filled-in values as if they were observed, so they tend to understate variability and can bias the results.
Linear Mixed-Effects Models:

Unlike a classical repeated measures ANOVA, a linear mixed-effects model can use every available observation, so subjects with some missing time points still contribute to the analysis.
The estimates are valid when the data are missing at random (the missingness depends only on observed values), which makes this approach a common alternative to deleting or imputing cases.
Robust Methods:

Robust ANOVA methods are less sensitive to violations of distributional assumptions such as non-normality or outliers, but they do not by themselves resolve missing data.
They are best treated as a complement to one of the strategies above, for example when the remaining or imputed data also violate the usual repeated measures assumptions.
Multiple Imputation:

Multiple imputation is a more advanced technique that generates multiple complete datasets with imputed values and combines results to account for the uncertainty introduced by imputation.
This method is suitable for handling missing data when simple imputation may not be adequate.
The choice of how to handle missing data depends on the nature and extent of missingness in your dataset, the research question, and the assumptions of the repeated measures ANOVA. It's crucial to be transparent in your reporting about the method used for handling missing data and to acknowledge potential limitations associated with the chosen approach. Additionally, consulting with a statistician or data analyst experienced in repeated measures designs can be valuable in making informed decisions about handling missing data.
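
The two simplest strategies, listwise deletion and mean imputation, can be sketched with pandas. The wide-format table below is hypothetical (one row per subject, one column per time point), and the column names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one row per subject, one column per time point
wide = pd.DataFrame({
    "subject": [1, 2, 3, 4],
    "t1": [5.0, 6.0, 7.0, 8.0],
    "t2": [5.5, np.nan, 7.5, 8.5],
    "t3": [6.0, 6.5, np.nan, 9.0],
})

# Listwise deletion: keep only subjects observed at every time point
complete_cases = wide.dropna()
print("Subjects kept:", complete_cases["subject"].tolist())  # [1, 4]

# Mean imputation: replace each missing value with that time point's mean
time_cols = ["t1", "t2", "t3"]
mean_imputed = wide.copy()
mean_imputed[time_cols] = wide[time_cols].fillna(wide[time_cols].mean())
print(mean_imputed)
```

Listwise deletion halves this small sample, while mean imputation keeps all four subjects at the cost of shrinking the variability at the imputed time points.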

Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one?

In [None]:
Post-hoc tests are used after conducting an Analysis of Variance (ANOVA) to make pairwise comparisons between groups when the ANOVA indicates that there are significant differences among groups. These tests help you identify which specific groups are different from each other. Common post-hoc tests used after ANOVA include:

Tukey's Honestly Significant Difference (HSD):

Used to identify differences between all pairs of groups.
Appropriate when you have multiple groups and want to control the overall familywise error rate.
Example: In a one-way ANOVA comparing the test scores of students from three different schools, Tukey's HSD can be used to determine which pairs of schools have significantly different mean scores.
Bonferroni Correction:

Adjusts the significance level to control the familywise error rate, making it more conservative.
Appropriate when you have multiple groups and want to reduce the risk of Type I errors.
Example: When comparing the effectiveness of four different drug treatments in a clinical trial using ANOVA, Bonferroni-corrected t-tests can be used to compare pairs of treatments while controlling for familywise error.
Scheffé Test:

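Used when you want to test any contrast among the group means, including complex comparisons (such as one group against the average of two others), not only pairwise differences.
It is the most conservative of the common tests, so for simple pairwise comparisons it has less power than Tukey's HSD.
Appropriate when comparisons are chosen after looking at the data and group sizes are unequal; like ANOVA itself, it assumes equal variances.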
Example: In a one-way ANOVA comparing the performance of multiple teams in a sports tournament, the Scheffé test can be used to compare any pair of teams.
Dunnett's Test:

Used when you have one control group and want to compare it to multiple treatment groups.
Appropriate for situations where there is a single reference group.
Example: In a one-way ANOVA comparing the effects of different doses of a drug (with a control group receiving a placebo), Dunnett's test can be used to compare each dose group to the control group.
Games-Howell Test:

Used when the assumption of equal variances is violated.
It does not assume equal variances among groups, making it more robust.
Appropriate when you want to make pairwise comparisons between groups with unequal variances.
Example: In a one-way ANOVA comparing the yields of different crop varieties, where some varieties have far more variable yields than others, the Games-Howell test can be used to compare each pair of varieties.
When to use each post-hoc test depends on the specific characteristics of your data, including the number of groups, whether group variances are equal, and the nature of the research question. It's essential to select the post-hoc test that best fits your data and research design while also considering the control of Type I errors and the specific hypotheses you want to test. Additionally, you should report the chosen post-hoc test and any adjustments made to the significance level to account for multiple comparisons in your research findings.
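
As an illustration of one of these options, the sketch below runs Bonferroni-corrected pairwise t-tests. It assumes the `multipletests` helper from statsmodels; the school scores are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Hypothetical test scores for three schools
groups = {
    "School A": np.array([70, 72, 68, 75, 71, 69]),
    "School B": np.array([78, 80, 77, 82, 79, 81]),
    "School C": np.array([71, 73, 70, 74, 72, 69]),
}

# One independent-samples t-test per pair of schools
pairs = list(combinations(groups, 2))
raw_p = [stats.ttest_ind(groups[a], groups[b]).pvalue for a, b in pairs]

# Bonferroni multiplies each p-value by the number of comparisons (capped at 1)
reject, adjusted_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")

for (a, b), p_adj, significant in zip(pairs, adjusted_p, reject):
    print(f"{a} vs {b}: adjusted p = {p_adj:.4f}, significant = {significant}")
```

The same loop works for any number of groups, although the correction becomes increasingly conservative as the number of pairs grows.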


Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
to determine if there are any significant differences between the mean weight loss of the three diets.
Report the F-statistic and p-value, and interpret the results.

In [None]:
import numpy as np
import scipy.stats as stats

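# Illustrative sample data for each diet (group sizes here are 25, 20, and 20;
# substitute the study's actual measurements for the 50 participants)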
diet_A = np.array([2.5, 3.0, 2.8, 2.6, 2.7, 2.9, 3.2, 2.4, 2.8, 3.1,
                   3.0, 2.7, 2.5, 2.6, 2.8, 2.7, 2.9, 3.2, 2.8, 3.0,
                   3.1, 2.7, 2.8, 2.9, 2.6])
diet_B = np.array([3.5, 3.9, 3.7, 3.8, 3.5, 4.0, 3.6, 3.4, 3.7, 3.9,
                   3.5, 3.8, 3.9, 3.6, 3.7, 3.8, 3.5, 3.6, 3.7, 3.9])
diet_C = np.array([4.2, 4.5, 4.1, 4.4, 4.3, 4.0, 4.2, 4.6, 4.3, 4.5,
                   4.0, 4.1, 4.4, 4.2, 4.3, 4.5, 4.1, 4.2, 4.4, 4.3])

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

# Print the results
print("F-Statistic:", f_statistic)
print("p-Value:", p_value)

# Interpretation
if p_value < 0.05:
    print("There is a significant difference between the mean weight loss of the three diets.")
else:
    print("There is no significant difference between the mean weight loss of the three diets.")


In [None]:
F-Statistic: This statistic tests whether there are significant differences among the group means. In this case, it quantifies whether the mean weight loss of the three diets is significantly different.
p-Value: The p-value associated with the F-statistic. If it is less than your chosen significance level (e.g., 0.05), you can conclude that there are significant differences between the diets.
In this example, if the p-value is less than 0.05, you would interpret it as follows:

"There is a significant difference between the mean weight loss of the three diets (A, B, and C)."
If the p-value is greater than or equal to 0.05, you would interpret it as:

"There is no significant difference between the mean weight loss of the three diets."
The one-way ANOVA helps you determine whether there are statistically significant differences in the mean weight loss among the three diets.

Q10. A company wants to know if there are any significant differences in the average time it takes to
complete a task using three different software programs: Program A, Program B, and Program C. They
randomly assign 30 employees to one of the programs and record the time it takes each employee to
complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or
interaction effects between the software programs and employee experience level (novice vs.
experienced). Report the F-statistics and p-values, and interpret the results.

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Simulated sample data (illustrative only; replace with the recorded times)
np.random.seed(0)  # For reproducibility
n = 30  # Number of employees

# Balanced design: 5 employees in each software/experience cell
experience = np.repeat(['novice', 'experienced'], n // 2)
software = np.tile(['A', 'B', 'C'], n // 3)
completion_time = np.random.normal(30, 5, n)  # Simulated completion times

# Create a DataFrame
data = pd.DataFrame({'Experience': experience, 'Software': software, 'CompletionTime': completion_time})

# Fit a two-way between-subjects ANOVA with interaction.
# Each employee is measured once, so a repeated measures model (AnovaRM) does not apply.
model = ols('CompletionTime ~ C(Software) * C(Experience)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Print the full ANOVA table
print(anova_table)

# Report and interpret each effect
effects = {
    'C(Software)': 'main effect of software',
    'C(Experience)': 'main effect of experience',
    'C(Software):C(Experience)': 'interaction effect between software and experience',
}
for term, label in effects.items():
    f_value = anova_table.loc[term, 'F']
    p_value = anova_table.loc[term, 'PR(>F)']
    print(f"{label}: F = {f_value:.3f}, p = {p_value:.4f}")
    if p_value < 0.05:
        print(f"There is a significant {label}.")
    else:
        print(f"There is no significant {label}.")

Q11. An educational researcher is interested in whether a new teaching method improves student test
scores. They randomly assign 100 students to either the control group (traditional teaching method) or the
experimental group (new teaching method) and administer a test at the end of the semester. Conduct a
two-sample t-test using Python to determine if there are any significant differences in test scores
between the two groups. If the results are significant, follow up with a post-hoc test to determine which
group(s) differ significantly from each other.

In [2]:
import numpy as np
import scipy.stats as stats
import statsmodels.stats.multicomp as multi

# Sample data (test scores for the two groups)
control_group = np.array([85, 89, 78, 92, 80, 88, 75, 90, 82, 86,
                          79, 91, 84, 87, 76, 93, 81, 89, 77, 94])
experimental_group = np.array([91, 93, 88, 96, 89, 94, 87, 95, 90, 92,
                              86, 97, 88, 91, 85, 98, 89, 93, 87, 95])

# Perform a two-sample t-test
t_statistic, p_value = stats.ttest_ind(control_group, experimental_group)

# Print the results
print("Two-Sample T-Test:")
print("T-Statistic:", t_statistic)
print("P-Value:", p_value)

# Interpretation
if p_value < 0.05:
    print("There is a significant difference in test scores between the control and experimental groups.")
else:
    print("There is no significant difference in test scores between the groups.")

# If the results are significant, follow up with a post-hoc test (e.g., Tukey's HSD).
# With only two groups this repeats the single comparison already made by the t-test.
if p_value < 0.05:
    data = np.concatenate([control_group, experimental_group])
    group_labels = ['Control'] * len(control_group) + ['Experimental'] * len(experimental_group)
    
    tukey_result = multi.MultiComparison(data, group_labels).tukeyhsd()
    print("Tukey's HSD Post-Hoc Test:")
    print(tukey_result)

Two-Sample T-Test:
T-Statistic: -4.025739757040968
P-Value: 0.00026160659757677957
There is a significant difference in test scores between the control and experimental groups.
Tukey's HSD Post-Hoc Test:
   Multiple Comparison of Means - Tukey HSD, FWER=0.05   
 group1    group2    meandiff p-adj  lower  upper  reject
---------------------------------------------------------
Control Experimental      6.4 0.0003 3.1817 9.6183   True
---------------------------------------------------------


Q12. A researcher wants to know if there are any significant differences in the average daily sales of three
retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store
on those days. Conduct a repeated measures ANOVA using Python to determine if there are any
significant differences in sales between the three stores. If the results are significant, follow up with a post-hoc test to determine which store(s) differ significantly from each other.

In [None]:
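from itertools import combinations

import numpy as np
import pandas as pd
import scipy.stats as stats
from statsmodels.stats.anova import AnovaRM

# Simulated sample data (sales for three stores on the same 30 days)
np.random.seed(0)  # For reproducibility
n_days = 30
sales = {
    'Store A': np.random.normal(1000, 50, n_days),
    'Store B': np.random.normal(1050, 60, n_days),
    'Store C': np.random.normal(1100, 70, n_days),
}

# Long format: one row per (day, store); the day plays the role of the subject
data = pd.DataFrame({
    'Day': np.tile(np.arange(n_days), len(sales)),
    'Store': np.repeat(list(sales), n_days),
    'Sales': np.concatenate(list(sales.values())),
})

# Perform a repeated measures ANOVA with Store as the within-subject factor
results = AnovaRM(data, depvar='Sales', subject='Day', within=['Store']).fit()
f_statistic = results.anova_table.loc['Store', 'F Value']
p_value = results.anova_table.loc['Store', 'Pr > F']

# Print the results
print("Repeated Measures ANOVA:")
print("F-Statistic:", f_statistic)
print("P-Value:", p_value)

# Interpretation
if p_value < 0.05:
    print("There is a significant difference in average daily sales between the stores.")
else:
    print("There is no significant difference in average daily sales between the stores.")

# If the results are significant, follow up with paired t-tests.
# Tukey's HSD assumes independent groups, so Bonferroni-corrected paired tests are used here.
if p_value < 0.05:
    pairs = list(combinations(sales, 2))
    print("Bonferroni-Corrected Paired T-Tests:")
    for first, second in pairs:
        t_stat, p_pair = stats.ttest_rel(sales[first], sales[second])
        p_adjusted = min(p_pair * len(pairs), 1.0)
        print(f"{first} vs {second}: t = {t_stat:.3f}, adjusted p = {p_adjusted:.4f}")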
