Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact the validity of the results.

ANOVA (Analysis of Variance) is a statistical technique used to compare the means of two or more groups to determine if there are any statistically significant differences between them. To obtain valid and reliable results from ANOVA, several assumptions need to be met. Violations of these assumptions can impact the validity of the results and lead to erroneous conclusions.

Assumptions of ANOVA:

1. Independence: The observations are independent of each other, both within each group and across groups. This means that no data point should influence or depend on any other data point, whether it belongs to the same group or to a different one.

2. Normality: The data within each group (equivalently, the model residuals) follow an approximately normal distribution. Normality implies that the data points are symmetrically distributed around the mean, forming a bell-shaped curve.

3. Homogeneity of Variance (Homoscedasticity): The variance of the data within each group is approximately equal. This assumption means that the variability in the data points is similar across all groups.

4. Random Sampling: The data should be collected using a random sampling method to ensure that the results can be generalized to the larger population.

Examples of Violations and their Impact:

1. Violation of Independence: If the observations within groups are not independent, it can lead to biased estimates and incorrect conclusions. For example, if data points within groups are paired (e.g., repeated measures), such as comparing the performance of individuals before and after an intervention, the independence assumption is violated.

2. Violation of Normality: If the data within groups are not normally distributed, the ANOVA results may be unreliable. Non-normality can lead to inaccurate p-values and incorrect conclusions about group differences. For instance, if the data are heavily skewed or have extreme outliers, the normality assumption is violated.

3. Violation of Homogeneity of Variance: Unequal variances across groups can lead to biased and unreliable results. When variances are not homogeneous, the F-statistic no longer follows its nominal F distribution, so the Type I error rate can be inflated or the power of the test (ability to detect true effects) reduced, especially when group sizes are unequal. For example, if one group has much larger variability than others, the assumption of homogeneity of variance is violated.

4. Non-Random Sampling: If the data collection process is not random, the results of ANOVA may not be generalizable to the larger population. Biased sampling methods can lead to overestimation or underestimation of group differences.

When any of these assumptions are violated, it is essential to consider alternative statistical methods or transformations of the data to ensure the validity of the analysis. If the violations are severe, non-parametric tests (e.g., the Kruskal-Wallis test, which compares rank distributions rather than means) or other robust methods may be more appropriate for comparing groups. It is also crucial to interpret the results cautiously and consider the potential impact of the assumption violations on the conclusions drawn from the analysis.
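The checks described above can be sketched in Python with SciPy. This is a minimal illustration, assuming three small hypothetical groups (the values are made up for this example): the Shapiro-Wilk test probes normality within each group, Levene's test probes equality of variances, and the Kruskal-Wallis test is shown as the non-parametric fallback.

```python
from scipy import stats

# Hypothetical sample data for three groups (illustrative values only)
group_a = [10, 15, 12, 13, 11, 14, 12, 13]
group_b = [20, 25, 22, 24, 21, 23, 22, 24]
group_c = [30, 35, 32, 33, 31, 34, 32, 33]
groups = [group_a, group_b, group_c]

# Normality: Shapiro-Wilk test per group (H0: the data are normally distributed)
for name, values in zip("ABC", groups):
    w_stat, shapiro_p = stats.shapiro(values)
    print(f"Shapiro-Wilk, group {name}: W={w_stat:.3f}, p={shapiro_p:.3f}")

# Homogeneity of variance: Levene's test (H0: all group variances are equal)
levene_stat, levene_p = stats.levene(*groups)
print(f"Levene: statistic={levene_stat:.3f}, p={levene_p:.3f}")

# Non-parametric alternative to one-way ANOVA: Kruskal-Wallis H-test
kw_stat, kw_p = stats.kruskal(*groups)
print(f"Kruskal-Wallis: H={kw_stat:.3f}, p={kw_p:.5f}")
```

A small p-value in the Shapiro-Wilk or Levene test suggests the corresponding assumption may not hold. With small samples these tests have little power, so plots (histograms, Q-Q plots) are a useful complement.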

Q2. What are the three types of ANOVA, and in what situations would each be used?

The three types of ANOVA (Analysis of Variance) are:

1. One-Way ANOVA:

One-Way ANOVA is used when there is a single categorical independent variable (also called a factor) with two or more levels (in practice usually three or more, since two groups can be compared with a t-test), and the dependent variable is continuous. The purpose of One-Way ANOVA is to compare the means of the dependent variable across the different levels of the independent variable. It helps determine if there are any statistically significant differences between the means of the groups. This test is appropriate when you want to compare the effects of a single factor on the dependent variable.

Example:

Suppose we want to compare the average test scores of students from three different schools (A, B, and C) to see if there are any significant differences in their academic performance.

2. Two-Way ANOVA:

Two-Way ANOVA is used when there are two categorical independent variables (factors) and one continuous dependent variable. The purpose of Two-Way ANOVA is to analyze the main effects of each independent variable and their interaction effect on the dependent variable. It allows us to determine if there are any significant differences based on the two factors and if there is an interaction between them.

Example:

Consider a study where the effects of both gender (male and female) and study method (online and in-person) on exam scores are analyzed. Two-Way ANOVA would be used to examine the main effects of gender and study method and their interaction effect on exam scores.

3. N-Way ANOVA (or Three-Way ANOVA and beyond):

N-Way ANOVA is a generalization of Two-Way ANOVA to situations where there are more than two categorical independent variables (factors). It can handle scenarios with three or more independent variables and one continuous dependent variable. N-Way ANOVA is used to assess the main effects and interactions among multiple factors.

Example:

Suppose we want to analyze the impact of factors like age group (young, middle-aged, and elderly), educational level (high school, college, graduate), and geographic location (urban, suburban, rural) on annual income. In this case, N-Way ANOVA would be appropriate to examine the main effects and interactions of all three factors on income.

In summary, the choice of ANOVA depends on the number of categorical independent variables and the complexity of the study design. One-Way ANOVA is used when there is a single factor, Two-Way ANOVA when there are two factors, and N-Way ANOVA for scenarios with three or more factors. Each type of ANOVA allows researchers to explore and analyze different aspects of the relationship between categorical and continuous variables.

Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

The partitioning of variance in ANOVA refers to the process of breaking down the total variance observed in the data into different components that can be attributed to specific sources or factors. In the context of ANOVA, the total variance in the dependent variable is divided into two main components:

1. Between-Group Variance (or Treatment Variance):
This component represents the variability in the dependent variable that can be attributed to the differences between the groups or levels of the independent variable (factor). It measures the effect of the independent variable on the dependent variable and is also known as the treatment effect. In other words, it shows how much of the variation in the dependent variable is explained by the grouping factor.

2. Within-Group Variance (or Error Variance):
This component represents the variability in the dependent variable that cannot be explained by the differences between the groups. It is the variability that remains after accounting for the effect of the independent variable. Within-group variance is also referred to as error variance because it includes random errors, individual differences, and other factors that are not part of the grouping variable.

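Strictly speaking, it is the sums of squares that add up: the total sum of squares equals the between-group sum of squares plus the within-group sum of squares. Dividing each sum of squares by its degrees of freedom gives the corresponding variance estimate (mean square).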
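In symbols, the decomposition can be written as follows. The notation is introduced here for illustration and does not appear in the original text: k is the number of groups, n_i the number of observations in group i, N the total number of observations, \bar{x}_i the mean of group i, and \bar{x} the grand mean.

```latex
\begin{aligned}
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2}_{SS_{\text{total}}}
&= \underbrace{\sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{x})^2}_{SS_{\text{between}}}
 + \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}_{SS_{\text{within}}} \\[4pt]
F &= \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
\end{aligned}
```

The F-statistic is the ratio of the two mean squares, so a large F means the group means vary more than the within-group scatter alone would suggest.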

Why is it important to understand the partitioning of variance in ANOVA?

1. Identifying Significant Effects: By partitioning the total variance into between-group and within-group components, ANOVA allows us to determine if the variation between groups is statistically significant. If the between-group variance is much larger than the within-group variance, it indicates that the independent variable has a significant effect on the dependent variable.

2. Assessing Group Differences: ANOVA helps to assess whether there are meaningful differences between the groups. If the between-group variance is significant, it implies that there are significant differences in the means of the groups. This is crucial in many research studies to understand the impact of different factors on the outcome.

3. Interpretation of Results: Understanding the partitioning of variance helps researchers interpret the ANOVA results. It provides insights into the relative importance of the independent variable (treatment effect) and the unexplained variability (error) in the data.

4. Designing Better Experiments: By knowing how much of the variability in the dependent variable can be attributed to the grouping factor and how much is due to random error, researchers can design better experiments and improve the study's validity and reliability.

5. Basis for Further Analysis: The partitioning of variance lays the groundwork for post hoc tests (e.g., Tukey's HSD, Bonferroni correction) to identify which specific groups differ significantly from each other when the overall ANOVA is significant.

In summary, understanding the partitioning of variance in ANOVA is crucial for drawing meaningful conclusions from the analysis, interpreting the results correctly, and making informed decisions based on the impact of the independent variable on the dependent variable. It forms the foundation for the inference and interpretation of results in ANOVA.

Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual sum of squares (SSR) in a one-way ANOVA using Python?

In [1]:
import numpy as np

# Assuming sample data for three groups 
group1 = [10, 15, 12, 13, 11]
group2 = [20, 25, 22, 24, 21]
group3 = [30, 35, 32, 33, 31]

# Combine all the data into one array
all_data = np.concatenate((group1, group2, group3))

# Calculate the overall mean (grand mean)
overall_mean = np.mean(all_data)

# Calculate the sample means for each group
group1_mean = np.mean(group1)
group2_mean = np.mean(group2)
group3_mean = np.mean(group3)

# Calculate the total sum of squares (SST)
SST = np.sum((all_data - overall_mean) ** 2)

# Calculate the residual (within-group) sum of squares (SSR):
# squared deviations of each observation from its own group mean
SSR = np.sum((group1 - group1_mean) ** 2) + np.sum((group2 - group2_mean) ** 2) + np.sum((group3 - group3_mean) ** 2)

# Calculate the explained (between-group) sum of squares (SSE)
# from the partition SST = SSE + SSR
SSE = SST - SSR

print("Total Sum of Squares (SST):", SST)
print("Explained Sum of Squares (SSE):", SSE)
print("Residual Sum of Squares (SSR):", SSR)


Total Sum of Squares (SST): 1046.9333333333332
Explained Sum of Squares (SSE): 1000.1333333333332
Residual Sum of Squares (SSR): 46.8
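As a cross-check, the between-group and within-group sums of squares for the same three groups can be computed directly and turned into an F-statistic, which should agree with `scipy.stats.f_oneway`. This is a sketch; the names `ss_between`, `ss_within`, and `f_manual` are chosen here for illustration.

```python
import numpy as np
from scipy import stats

group1 = np.array([10, 15, 12, 13, 11])
group2 = np.array([20, 25, 22, 24, 21])
group3 = np.array([30, 35, 32, 33, 31])
groups = [group1, group2, group3]

all_data = np.concatenate(groups)
grand_mean = all_data.mean()

# Between-group (explained) sum of squares: group size times squared
# deviation of each group mean from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group (residual) sum of squares: squared deviations from group means
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Total sum of squares: squared deviations from the grand mean
ss_total = ((all_data - grand_mean) ** 2).sum()

# Degrees of freedom: k - 1 between groups, N - k within groups
df_between = len(groups) - 1
df_within = len(all_data) - len(groups)

# F is the ratio of the two mean squares
f_manual = (ss_between / df_between) / (ss_within / df_within)
f_scipy, p_scipy = stats.f_oneway(*groups)

print("SS between + SS within:", ss_between + ss_within)
print("SS total:", ss_total)
print("F (manual):", f_manual)
print("F (scipy): ", f_scipy)
```

Computing both components directly, rather than obtaining one by subtraction, makes it possible to verify that they add up to the total.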


Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?

In [4]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

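# Assuming sample data as a pandas DataFrame.
# Note: Group1 and Group2 are numeric here, so the formula below treats them as
# continuous predictors; wrap categorical factors in C(...) for a factorial ANOVA.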
data = {
    'Group1': [10, 12, 15, 11, 13, 14],
    'Group2': [20, 22, 24, 21, 23, 25],
    'Value': [30, 32, 35, 31, 33, 34]
}
df = pd.DataFrame(data)

# Create a formula for the two-way ANOVA
formula = 'Value ~ Group1 + Group2 + Group1:Group2'

# Fit the two-way ANOVA model
model = ols(formula, data=df).fit()

# Get the ANOVA table
anova_table = sm.stats.anova_lm(model, typ=2)

# Extract the mean square (sum of squares / degrees of freedom) for each term.
# These measure the variation attributed to each term; the F and PR(>F)
# columns of anova_table give the corresponding significance tests.
mean_sq_group1 = anova_table.loc['Group1', 'sum_sq'] / anova_table.loc['Group1', 'df']
mean_sq_group2 = anova_table.loc['Group2', 'sum_sq'] / anova_table.loc['Group2', 'df']
mean_sq_interaction = anova_table.loc['Group1:Group2', 'sum_sq'] / anova_table.loc['Group1:Group2', 'df']

print("Mean Square for Group1:", mean_sq_group1)
print("Mean Square for Group2:", mean_sq_group2)
print("Mean Square for Group1:Group2 interaction:", mean_sq_interaction)


Mean Square for Group1: 1.9428571428566626
Mean Square for Group2: 6.22301509746355e-27
Mean Square for Group1:Group2 interaction: 4.162749391477013e-28


Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these results?

In a one-way ANOVA, the F-statistic is used to test the null hypothesis that the means of all the groups are equal. The F-statistic measures the ratio of the variance between the groups (explained variance) to the variance within the groups (residual or unexplained variance). The p-value associated with the F-statistic indicates the probability of obtaining the observed F-statistic or a more extreme value under the assumption that the null hypothesis is true.

Given the F-statistic of 5.23 and a p-value of 0.02, we can interpret the results as follows:

1. The F-Statistic: The F-statistic is 5.23. This value represents the ratio of the variation between the group means to the variation within the groups. A higher F-statistic suggests that there is more variability between the group means compared to the variability within the groups.

2. The P-Value: The p-value associated with the F-statistic is 0.02. This p-value represents the probability of observing an F-statistic as extreme as 5.23, or more extreme, if the null hypothesis is true. In this case, the p-value is less than the commonly used significance level of 0.05 (or 5%), indicating that the result is statistically significant.

Interpretation:
Since the p-value (0.02) is less than the chosen significance level (e.g., 0.05), we reject the null hypothesis. In other words, we have enough evidence to conclude that there are statistically significant differences between the means of at least two of the groups. The one-way ANOVA has detected a significant effect of the independent variable (the factor with multiple groups) on the dependent variable.

Keep in mind that rejecting the null hypothesis in ANOVA only indicates that there is evidence of a significant difference between at least two groups, but it does not tell us which specific groups are different. For that, additional post hoc tests (e.g., Tukey's HSD, Bonferroni correction) may be performed to identify specific pairwise differences between groups.

In summary, the results of the one-way ANOVA with an F-statistic of 5.23 and a p-value of 0.02 indicate that there are statistically significant differences between the means of the groups being compared. Further analysis can be conducted to determine which specific groups differ significantly from each other.
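The link between the F-statistic and its p-value can be illustrated with SciPy. The question does not state the degrees of freedom, so the values below (2 and 14, i.e. three groups and 17 observations) are assumptions chosen only because they give a p-value close to the reported 0.02.

```python
from scipy import stats

f_statistic = 5.23

# Assumed degrees of freedom (NOT given in the question):
# 3 groups -> df_between = 2; 17 observations -> df_within = 14
df_between = 2
df_within = 14

# p-value: probability of an F at least this large under the null hypothesis
p_value = stats.f.sf(f_statistic, df_between, df_within)

# Critical value at the 5% level: reject H0 when F exceeds it
f_critical = stats.f.ppf(0.95, df_between, df_within)

print(f"p-value: {p_value:.4f}")
print(f"critical F at alpha=0.05: {f_critical:.3f}")
print("reject H0:", f_statistic > f_critical)
```

Comparing the p-value with the significance level and comparing F with the critical value are two equivalent ways of reaching the same decision.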

Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential consequences of using different methods to handle missing data?

Handling missing data in a repeated measures ANOVA is essential to ensure the accuracy and validity of the analysis. Different methods can be used to address missing data, and each method comes with its potential consequences. Let's explore some common approaches and their potential consequences:

1. Complete Case Analysis (Listwise Deletion):
In this approach, any participant with missing data on any of the repeated measures is excluded from the analysis. This method is straightforward, but it can lead to a reduction in sample size and potential bias if the missing data are not missing completely at random (MCAR). Complete case analysis may lead to less representative results if participants with missing data have different characteristics from those with complete data.

2. Mean Imputation:
Mean imputation involves replacing missing values with the mean of the available data for that variable. While this approach is simple, it can distort the distribution and variance of the variable, leading to underestimated standard errors and biased results. It also does not account for any relationship between variables, potentially leading to erroneous conclusions.

3. Last Observation Carried Forward (LOCF) or Next Observation Carried Backward (NOCB):
These methods impute missing values with the last observed value or the next observed value, respectively. While they are straightforward to implement, they assume that a participant's value stays constant between the observed and the missing time point, which is rarely true when the outcome changes over time. LOCF and NOCB can therefore bias estimates of change and understate variability, producing standard errors that are too small.

4. Multiple Imputation:
Multiple imputation is a more sophisticated approach that creates multiple plausible imputed datasets based on the observed data. Each dataset is analyzed separately, and the results are combined to provide unbiased estimates and appropriate standard errors. Multiple imputation accounts for the uncertainty introduced by missing data, resulting in more reliable and robust results. However, it requires more computational resources and may be complex to implement.

5. Maximum Likelihood Estimation (MLE):
MLE estimates model parameters by maximizing the likelihood of the observed data. Rather than imputing values, it uses every available observation from each participant, including those with incomplete records, typically by fitting a linear mixed-effects model instead of a classical repeated measures ANOVA. It gives valid estimates when data are missing at random (MAR) and can accommodate different patterns of missingness. However, it may require specialized software and relies on a correctly specified model and on the assumed missingness mechanism.

Consequences of Using Different Methods:
Using inappropriate methods to handle missing data can lead to biased results, inflated or underestimated standard errors, incorrect conclusions, and reduced statistical power. Additionally, the choice of method may impact the precision and generalizability of the findings.

It is crucial to carefully assess the nature of missing data, explore patterns of missingness, and choose a method that is appropriate for the specific dataset and research question. Researchers should transparently report how missing data were handled, including any sensitivity analyses to assess the impact of different approaches on the results. Consulting with statisticians or using specialized software can be helpful in making informed decisions about handling missing data in repeated measures ANOVA.
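The three simplest strategies can be sketched with pandas. The data frame below is hypothetical: four participants (`p1` to `p4`) measured at three time points (`t1` to `t3`), with two values missing.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one row per participant, one column per time point
df = pd.DataFrame(
    {
        "t1": [10.0, 12.0, 11.0, 13.0],
        "t2": [11.0, np.nan, 12.0, 14.0],
        "t3": [12.0, 14.0, np.nan, 15.0],
    },
    index=["p1", "p2", "p3", "p4"],
)

# 1. Complete case analysis: drop every participant with any missing value
complete_cases = df.dropna()

# 2. Mean imputation: replace each missing value with its column (time point) mean
mean_imputed = df.fillna(df.mean())

# 3. LOCF: carry each participant's last observed value forward across time points
locf = df.ffill(axis=1)

print("Participants kept by listwise deletion:", len(complete_cases))
print("Variance of t2, observed values only:", df["t2"].var())
print("Variance of t2 after mean imputation:", mean_imputed["t2"].var())
print(locf)
```

Listwise deletion halves the sample here, and mean imputation shrinks the variance of `t2`, which is the mechanism behind the underestimated standard errors mentioned above.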

Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide an example of a situation where a post-hoc test might be necessary.

After conducting an ANOVA and finding a significant overall effect, post-hoc tests are often used to determine which specific groups differ significantly from each other. Post-hoc tests are necessary because ANOVA only tells us that there is a significant difference somewhere among the groups, but it does not identify which pairs of groups are significantly different. Some common post-hoc tests include:

1. Tukey's Honestly Significant Difference (HSD) Test:
Tukey's HSD test is widely used and controls the family-wise error rate, making it appropriate for multiple pairwise comparisons. It compares all possible pairs of group means and determines if their differences are statistically significant. Tukey's HSD test is best suited when the number of comparisons is moderate and when you want to protect against making any false positive (Type I error).

2. Bonferroni Correction:
The Bonferroni correction is a conservative method that divides the significance level by the number of comparisons to control the overall family-wise error rate. It is simple and works well for a small number of planned comparisons, but it becomes overly conservative and loses power as the number of comparisons grows.

3. Scheffé's Test:
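Scheffé's test is more conservative than Tukey's HSD for pairwise comparisons, but it controls the family-wise error rate for all possible contrasts, not just pairwise ones. It is most suitable when complex comparisons (e.g., the average of two groups versus a third) are of interest and the comparisons were not planned in advance.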

4. Dunnett's Test:
Dunnett's test is used when you have one control group and you want to compare the other groups to the control group. It adjusts the significance level to control the family-wise error rate for multiple comparisons against the control group.

5. Fisher's Least Significant Difference (LSD) Test:
Fisher's LSD test is a less conservative alternative to Tukey's HSD, but it does not control the family-wise error rate. It is more appropriate when you have a small number of comparisons and are not concerned about multiple testing.

Example of a Situation Requiring Post-Hoc Test:

Suppose a researcher conducts an ANOVA to compare the effectiveness of three different exercise programs (A, B, and C) on weight loss. The ANOVA shows a significant difference in weight loss between the groups. However, the researcher does not know which specific exercise program(s) lead to significantly different outcomes.

In this case, a post-hoc test like Tukey's HSD or Scheffé's test would be used to perform pairwise comparisons between the three exercise programs. The post-hoc test would determine if there are significant differences in weight loss between any of the pairs of exercise programs (e.g., A vs. B, B vs. C, A vs. C). The post-hoc test helps to identify which exercise programs lead to significantly different weight loss results and provides more detailed insights than the ANOVA alone.
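A minimal sketch of this follow-up with statsmodels is shown below. The weight-loss values are hypothetical and were chosen so that program B differs from A and C.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical weight loss (kg) for five participants per exercise program
weight_loss = np.array([
    2.1, 2.5, 1.9, 2.8, 2.3,  # Program A
    3.9, 4.2, 4.5, 3.8, 4.1,  # Program B
    2.2, 2.6, 2.0, 2.4, 2.7,  # Program C
])
programs = ["A"] * 5 + ["B"] * 5 + ["C"] * 5

# Tukey's HSD compares every pair of programs while controlling
# the family-wise error rate at alpha
tukey = pairwise_tukeyhsd(endog=weight_loss, groups=programs, alpha=0.05)
print(tukey)

# tukey.reject holds one boolean per pair, in the order A-B, A-C, B-C
print("Reject H0 per pair:", list(tukey.reject))
```

Each row of the printed table gives the mean difference for one pair, its adjusted p-value, a confidence interval, and whether the null hypothesis of equal means is rejected.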

Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from 50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python to determine if there are any significant differences between the mean weight loss of the three diets. Report the F-statistic and p-value, and interpret the results.

In [6]:
import numpy as np
import scipy.stats as stats

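# Assuming sample data for weight loss in each diet (40 illustrative values per
# diet; the 50 participants described in the question are not reproduced)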
diet_A = [2, 3, 4, 5, 4, 3, 6, 7, 8, 5, 6, 4, 3, 4, 5, 6, 7, 8, 5, 4,
          3, 4, 5, 6, 3, 4, 5, 4, 3, 5, 6, 4, 5, 3, 4, 6, 7, 5, 4, 3]
diet_B = [1, 2, 3, 2, 3, 1, 2, 3, 4, 3, 2, 3, 2, 1, 2, 3, 2, 4, 3, 2,
          3, 2, 1, 2, 3, 1, 2, 4, 3, 2, 3, 2, 1, 2, 3, 2, 3, 1, 2, 3]
diet_C = [3, 4, 5, 3, 4, 5, 4, 3, 2, 5, 4, 3, 5, 4, 3, 4, 5, 6, 4, 5,
          3, 4, 3, 5, 4, 3, 5, 6, 4, 5, 4, 3, 5, 4, 3, 4, 5, 4, 3, 5]

# Combine all the data into one array
all_data = np.concatenate((diet_A, diet_B, diet_C))

# Create a list of group labels
group_labels = ['A'] * len(diet_A) + ['B'] * len(diet_B) + ['C'] * len(diet_C)

# Perform one-way ANOVA
F_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

print("F-statistic:", F_statistic)
print("p-value:", p_value)


F-statistic: 47.94271713416693
p-value: 6.199190263311225e-16


Interpretation of Results:

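Since the p-value (approximately 6.2e-16) is far smaller than the significance level (e.g., 0.05), we reject the null hypothesis. There is strong evidence that the mean weight loss differs between at least two of the three diets. A post-hoc test (e.g., Tukey's HSD) would be needed to identify which specific diets differ from each other.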

Q10. A company wants to know if there are any significant differences in the average time it takes to complete a task using three different software programs: Program A, Program B, and Program C. They randomly assign 30 employees to one of the programs and record the time it takes each employee to complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or interaction effects between the software programs and employee experience level (novice vs. experienced). Report the F-statistics and p-values, and interpret the results.

In [8]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Sample data as a pandas DataFrame (replace this with your data)
data = {
    'Software': ['A', 'B', 'C'] * 10,
    'Experience': ['Novice'] * 15 + ['Experienced'] * 15,
    'Time': [12, 10, 11, 15, 13, 14, 9, 11, 10, 12, 13, 14, 18, 17, 16,
             25, 24, 22, 27, 26, 28, 20, 21, 19, 31, 30, 32, 29, 30, 31]
}
df = pd.DataFrame(data)

# Convert Experience column to categorical type for correct ANOVA
df['Experience'] = df['Experience'].astype('category')

# Create a formula for the two-way ANOVA
formula = 'Time ~ Software + Experience + Software:Experience'

# Fit the two-way ANOVA model
model = ols(formula, data=df).fit()

# Get the ANOVA table
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)


                          sum_sq    df          F        PR(>F)
Software                0.466667   2.0   0.015521  9.846086e-01
Experience           1333.333333   1.0  88.691796  1.565765e-09
Software:Experience     0.066667   2.0   0.002217  9.977854e-01
Residual              360.800000  24.0        NaN           NaN


Interpretation of Results:

The ANOVA table provides the F-statistics and p-values for each main effect (Software, Experience) and the interaction effect (Software:Experience).


1. Main Effects:
   a. Software: The F-statistic for the main effect of Software is approximately 0.0155, and the associated p-value is approximately 0.985. Since the p-value is much greater than the significance level (e.g., 0.05), we fail to reject the null hypothesis. This indicates that there is no significant difference in the average time taken to complete the task among the three software programs (A, B, and C).

   b. Experience: The F-statistic for the main effect of Experience is approximately 88.692, and the associated p-value is approximately 1.566e-09, far below the significance level (e.g., 0.05). Therefore, we reject the null hypothesis and conclude that there is a significant difference in the average time taken to complete the task between novice and experienced employees.

2. Interaction Effect:
   The F-statistic for the interaction effect between Software and Experience is approximately 0.0022, and the associated p-value is approximately 0.998. Since the p-value is much greater than the significance level (e.g., 0.05), we fail to reject the null hypothesis. This indicates that there is no significant interaction effect between the software programs and employee experience level in influencing the time taken to complete the task. In other words, the effect of one variable (Software) on the dependent variable (Time) does not depend on the level of the other variable (Experience).

3. Residual:
   The residual sum of squares represents the unexplained variability in the data after considering the effects of the independent variables. It is expected to be non-zero, and its degrees of freedom equal the total sample size minus the number of estimated model parameters (here 30 - 6 = 24, since the 3 x 2 design has six cell means).

In summary, the results suggest that there is a significant main effect of Experience, indicating that employee experience level (novice vs. experienced) has a significant impact on the time taken to complete the task. However, there are no significant main effects for the Software variable, and there is no significant interaction effect between Software and Experience. It appears that the software programs used do not significantly affect the time taken to complete the task, and the effect of software does not depend on the employee's experience level.

Q11. An educational researcher is interested in whether a new teaching method improves student test scores. They randomly assign 100 students to either the control group (traditional teaching method) or the experimental group (new teaching method) and administer a test at the end of the semester. Conduct a two-sample t-test using Python to determine if there are any significant differences in test scores between the two groups. If the results are significant, follow up with a post-hoc test to determine which group(s) differ significantly from each other.

In [9]:
import numpy as np
import scipy.stats as stats

# Assumed sample data for test scores in the control group 
control_group_scores = [85, 78, 90, 79, 88, 84, 91, 82, 87, 80, 83, 86, 75, 77, 89, 81, 76, 85, 78, 80]

# Assumed sample data for test scores in the experimental group 
experimental_group_scores = [90, 88, 95, 85, 92, 89, 94, 86, 93, 87, 84, 91, 84, 82, 96, 88, 85, 90, 87, 88]

# Perform two-sample t-test
t_statistic, p_value = stats.ttest_ind(control_group_scores, experimental_group_scores)

print("t-statistic:", t_statistic)
print("p-value:", p_value)

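# Perform post-hoc test (e.g., Tukey's HSD) if the t-test results are significant.
# Note: with only two groups there is a single pairwise comparison, so this
# reproduces the t-test result rather than adding new information.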
if p_value < 0.05:
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    # Combine all the data into one array
    all_scores = np.concatenate((control_group_scores, experimental_group_scores))

    # Create a list of group labels ('Control' or 'Experimental')
    group_labels = ['Control'] * len(control_group_scores) + ['Experimental'] * len(experimental_group_scores)

    # Perform Tukey's HSD post-hoc test
    tukey_result = pairwise_tukeyhsd(endog=all_scores, groups=group_labels, alpha=0.05)

    print(tukey_result)


t-statistic: -4.315953079030419
p-value: 0.00010940882215084969
   Multiple Comparison of Means - Tukey HSD, FWER=0.05   
 group1    group2    meandiff p-adj  lower  upper  reject
---------------------------------------------------------
Control Experimental      6.0 0.0001 3.1857 8.8143   True
---------------------------------------------------------
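The `ttest_ind` call above uses SciPy's default, which assumes equal variances in the two groups. A variant that drops this assumption is Welch's t-test, sketched here on the same scores; the variable names `t_welch`, `p_welch`, and `mean_diff` are introduced for this example.

```python
import numpy as np
from scipy import stats

control_group_scores = [85, 78, 90, 79, 88, 84, 91, 82, 87, 80,
                        83, 86, 75, 77, 89, 81, 76, 85, 78, 80]
experimental_group_scores = [90, 88, 95, 85, 92, 89, 94, 86, 93, 87,
                             84, 91, 84, 82, 96, 88, 85, 90, 87, 88]

# Welch's t-test: equal_var=False drops the equal-variance assumption
t_welch, p_welch = stats.ttest_ind(
    control_group_scores, experimental_group_scores, equal_var=False
)

# Difference in sample means (experimental minus control)
mean_diff = np.mean(experimental_group_scores) - np.mean(control_group_scores)

print("Welch t-statistic:", t_welch)
print("Welch p-value:", p_welch)
print("Mean difference (experimental - control):", mean_diff)
```

With equal group sizes the Welch and pooled t-statistics coincide and only the degrees of freedom differ, so the conclusion here is the same: the experimental group scores higher on average.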


Q12. A researcher wants to know if there are any significant differences in the average daily sales of three retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store on those days. Conduct a repeated measures ANOVA using Python to determine if there are any significant differences in sales between the three stores. If the results are significant, follow up with a post-hoc test to determine which store(s) differ significantly from each other.

In [10]:
import numpy as np
import scipy.stats as stats

# Assume sample data for daily sales in Store A, Store B, and Store C 
store_A_sales = [200, 220, 250, 210, 230, 190, 240, 210, 230, 215, 220, 200, 240, 235, 225, 210, 205, 225, 235, 215, 230, 220, 210, 235, 215, 240, 225, 230, 220, 210]
store_B_sales = [180, 190, 200, 205, 195, 210, 185, 195, 205, 180, 200, 190, 210, 200, 195, 190, 185, 200, 210, 220, 185, 195, 190, 200, 195, 185, 200, 210, 220, 185]
store_C_sales = [250, 260, 270, 280, 260, 255, 265, 270, 280, 260, 250, 275, 265, 260, 270, 280, 270, 250, 265, 260, 255, 270, 280, 260, 255, 270, 265, 280, 260, 255]

# Combine all the data into one array
all_sales = np.concatenate((store_A_sales, store_B_sales, store_C_sales))

# Create a list of group labels
group_labels = ['Store A'] * len(store_A_sales) + ['Store B'] * len(store_B_sales) + ['Store C'] * len(store_C_sales)

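# Perform one-way ANOVA.
# Note: f_oneway treats the three stores' samples as independent and ignores
# the fact that the same 30 days were measured for every store, so it is an
# approximation rather than a true repeated measures ANOVA.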
F_statistic, p_value = stats.f_oneway(store_A_sales, store_B_sales, store_C_sales)

print("F-statistic:", F_statistic)
print("p-value:", p_value)


F-statistic: 262.3038791575067
p-value: 1.4371847637303197e-37
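A repeated measures analysis that treats each day as the repeated unit can be sketched with `AnovaRM` from statsmodels, followed by Bonferroni-adjusted paired t-tests as the post-hoc step. This is one possible approach under the assumption that the i-th value in each list belongs to the same day; the column names `day`, `store`, and `sales` are chosen here for illustration.

```python
from itertools import combinations

import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

sales_by_store = {
    "A": [200, 220, 250, 210, 230, 190, 240, 210, 230, 215, 220, 200, 240, 235, 225,
          210, 205, 225, 235, 215, 230, 220, 210, 235, 215, 240, 225, 230, 220, 210],
    "B": [180, 190, 200, 205, 195, 210, 185, 195, 205, 180, 200, 190, 210, 200, 195,
          190, 185, 200, 210, 220, 185, 195, 190, 200, 195, 185, 200, 210, 220, 185],
    "C": [250, 260, 270, 280, 260, 255, 265, 270, 280, 260, 250, 275, 265, 260, 270,
          280, 270, 250, 265, 260, 255, 270, 280, 260, 255, 270, 265, 280, 260, 255],
}
n_days = 30

# Long format: one row per (day, store) pair, as AnovaRM expects
long_df = pd.DataFrame({
    "day": list(range(1, n_days + 1)) * len(sales_by_store),
    "store": [s for s in sales_by_store for _ in range(n_days)],
    "sales": [v for values in sales_by_store.values() for v in values],
})

# Repeated measures ANOVA: day is the subject, store is the within-subject factor
rm_table = AnovaRM(data=long_df, depvar="sales", subject="day",
                   within=["store"]).fit().anova_table
print(rm_table)

# Post-hoc: paired t-tests on the same days, Bonferroni-adjusted for 3 comparisons
pairs = list(combinations(sales_by_store, 2))
adjusted_p = {}
for a, b in pairs:
    t_stat, p_raw = stats.ttest_rel(sales_by_store[a], sales_by_store[b])
    adjusted_p[(a, b)] = min(p_raw * len(pairs), 1.0)
    print(f"Store {a} vs Store {b}: t={t_stat:.2f}, adjusted p={adjusted_p[(a, b)]:.3g}")
```

If the overall test is significant, the adjusted p-values indicate which pairs of stores differ. Paired tests are used because each day contributes one observation per store.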
