Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact
the validity of the results.


ANOVA (Analysis of Variance) is a statistical method used to compare means across two or more groups to determine if there are statistically significant differences between them. To use ANOVA effectively, several assumptions must be met:


- Independence: Observations within and between groups must be independent of each other. This means that the data points in one group should not be influenced by or dependent on the data points in another group. Violations of independence can occur in clustered or hierarchical data, where observations within a group may be correlated.

    - Example of violation: In a study measuring the effectiveness of different teaching methods on student performance, if each method is delivered to an intact class, students in the same class share a teacher and classroom environment, so their scores are correlated with one another, violating the assumption of independence.


- Normality: The data within each group should be normally distributed. This means that the distribution of the data points should resemble a bell-shaped curve when plotted. While ANOVA is robust to violations of normality, especially with larger sample sizes, severe deviations from normality can impact the validity of the results.

    - Example of violation: In a study comparing test scores between different schools, if the test scores within each school are highly skewed or do not follow a normal distribution, it may violate the assumption of normality.


- Homogeneity of Variances (Homoscedasticity): The variance of the data points within each group should be approximately equal across all groups. This means that the spread of the data points should be consistent across groups. Violations of homogeneity of variances can affect the accuracy of the F-test in ANOVA, leading to incorrect conclusions.

  -  Example of violation: In a study comparing the effectiveness of two medications on blood pressure reduction, if the variance of blood pressure measurements within one medication group is much larger than the other, it may violate the assumption of homogeneity of variances.


- Equal Sample Sizes (a recommendation rather than a formal assumption): In one-way ANOVA (comparing means across multiple groups), it is preferable to have equal sample sizes in each group. ANOVA can still be used with unequal sample sizes, but a balanced design gives the most power for a given total sample size and makes the F-test less sensitive to unequal variances.

  - Example of violation: In a study comparing the performance of three different exercise programs on weight loss, if one exercise program has significantly fewer participants than the others, it may impact the validity of the results, especially if there are other violations present.

- Violations of these assumptions can lead to biased estimates of the treatment effects and inaccurate conclusions. However, ANOVA is relatively robust, meaning that minor violations may not severely affect the results, especially with larger sample sizes. Nonetheless, researchers should always check for these assumptions and consider alternative methods if they are substantially violated.
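
The normality and homogeneity-of-variance checks described above can be sketched with SciPy. This is a minimal illustration on simulated data; the group means, spread, and sample size are assumptions chosen for the example, and formal tests like these are usually read alongside plots rather than on their own.

```python
import numpy as np
from scipy import stats

# Hypothetical data: three groups drawn from normal distributions with equal spread
rng = np.random.default_rng(0)
groups = [rng.normal(loc=mu, scale=2.0, size=40) for mu in (10.0, 12.0, 15.0)]

# Normality within each group: Shapiro-Wilk (H0: the sample is normally distributed)
shapiro_p = [stats.shapiro(g).pvalue for g in groups]

# Homogeneity of variances: Levene's test (H0: all group variances are equal)
levene_stat, levene_p = stats.levene(*groups)

for i, p in enumerate(shapiro_p, start=1):
    print(f"Group {i} Shapiro-Wilk p-value: {p:.3f}")
print(f"Levene p-value: {levene_p:.3f}")
```

A small p-value in either test suggests the corresponding assumption may not hold. Independence cannot be tested this way and has to be judged from the study design.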

Q2. What are the three types of ANOVA, and in what situations would each be used?

- The three main types of ANOVA are:

- One-way ANOVA: This type of ANOVA is used when there is one categorical independent variable (also known as a factor) with three or more levels (groups), and the dependent variable is continuous. One-way ANOVA is used to determine whether there are statistically significant differences in the means of the dependent variable across the different levels of the independent variable.

    -  Example: A researcher wants to compare the effectiveness of three different teaching methods (e.g., traditional lectures, online tutorials, and interactive workshops) on student exam scores. The independent variable is the teaching method (with three levels), and the dependent variable is the exam score.

- Two-way ANOVA: This type of ANOVA is used when there are two categorical independent variables (factors), and the dependent variable is continuous. Two-way ANOVA examines the main effects of each independent variable as well as any interaction effects between them. The interaction effect occurs when the effect of one independent variable on the dependent variable depends on the level of another independent variable.

    - Example: A researcher wants to investigate the effects of both gender and socioeconomic status (SES) on academic achievement. Gender (male vs. female) and SES (low vs. high) are the two independent variables, and academic achievement is the dependent variable. Two-way ANOVA would examine whether there are main effects of gender and SES on academic achievement, as well as whether there is an interaction effect between gender and SES.

- Repeated Measures ANOVA (or Within-subjects ANOVA): This type of ANOVA is used when the same participants are measured under different conditions or at different time points. Repeated measures ANOVA allows researchers to examine within-subject differences while controlling for individual differences between subjects. It is particularly useful when studying changes over time or when comparing different conditions within the same participants.

    - Example: A researcher wants to investigate the effects of a new drug on blood pressure over time. Blood pressure measurements are taken from the same group of participants at baseline (before taking the drug) and at multiple time points after taking the drug. Repeated measures ANOVA would be used to analyze whether there are significant changes in blood pressure over time due to the drug treatment.

- Each type of ANOVA has its specific use cases depending on the research design and the nature of the independent variables. Choosing the appropriate type of ANOVA ensures that the statistical analysis is aligned with the research questions and hypotheses.

Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

- The partitioning of variance in ANOVA refers to the decomposition of the total variance observed in the data into different components that can be attributed to various sources or factors in the experimental design. Understanding this concept is crucial because it allows researchers to quantify the amount of variability in the dependent variable that can be explained by the independent variables (factors) being studied, as well as any variability that remains unexplained. This breakdown of variance helps researchers assess the significance of the effects of the independent variables on the dependent variable and draw valid conclusions from their analyses.

- The partitioning of variance in ANOVA typically involves three main components:

- Between-group sum of squares (SS_between): This component represents the variability in the dependent variable that can be attributed to differences between the group means. It sums the squared deviations of each group mean from the grand mean, weighted by group size.

- Within-group sum of squares (SS_within or SS_error): This component represents the variability in the dependent variable that is not accounted for by the differences between the group means. It sums the squared deviations of each observation from its own group mean and includes random error as well as individual differences within the groups.

- Total sum of squares (SS_total): This is the overall variability observed in the dependent variable across all observations, measured as squared deviations from the grand mean. It equals SS_between + SS_within.

- The partitioning of variance is important for several reasons:

- Assessing the significance of group differences: By comparing the between-group variance to the within-group variance, researchers can determine whether the differences observed between the group means are statistically significant or simply due to random variability.

- Calculating the F-statistic and conducting hypothesis tests: Each sum of squares is divided by its degrees of freedom to give a mean square, and the ratio of the between-group mean square to the within-group mean square is the F-statistic in ANOVA. This statistic is used to test the null hypothesis that all group means are equal. Understanding the partitioning of variance is essential for interpreting the results of hypothesis tests and determining whether to reject or fail to reject the null hypothesis.

- Understanding the relative importance of factors: By examining the proportion of variance explained by each factor (or interaction), researchers can assess the relative importance of different variables in influencing the dependent variable. This information can help prioritize factors for further investigation or intervention.

- Overall, understanding the partitioning of variance in ANOVA provides insights into the underlying structure of the data and helps researchers make valid inferences about the effects of the independent variables on the dependent variable.
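
The partition can be checked by hand on a very small data set. The sketch below uses three made-up groups of three values each, chosen only so that the arithmetic is easy to follow.

```python
import numpy as np

# Made-up data: group means are 2, 5 and 8; the grand mean is 5
groups = [np.array([1.0, 2.0, 3.0]),
          np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0])]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-group: squared distance of each group mean from the grand mean, weighted by group size
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group: squared distance of each value from its own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
# Total: squared distance of each value from the grand mean
ss_total = ((all_values - grand_mean) ** 2).sum()

# Mean squares divide each sum of squares by its degrees of freedom
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(ss_between, ss_within, ss_total)  # 54.0 6.0 60.0
print(f_stat)  # 27.0
```

Here 54 + 6 = 60, so the between-group and within-group sums of squares account for all of the total variability.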

Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual
sum of squares (SSR) in a one-way ANOVA using Python?


In a one-way ANOVA, the total sum of squares (SST), explained sum of squares (SSE), and residual sum of squares (SSR) can be calculated in Python with NumPy. The abbreviations follow the question; many textbooks instead use SSE for the error (residual) sum of squares and SSR for the regression (explained) sum of squares, so it is worth checking which convention a source uses. Below is a step-by-step calculation:




In [1]:
import numpy as np
from scipy import stats

# Generate sample data for demonstration
np.random.seed(42)  # for reproducibility
group1 = np.random.normal(loc=10, scale=2, size=30)
group2 = np.random.normal(loc=12, scale=2, size=30)
group3 = np.random.normal(loc=15, scale=2, size=30)

# Combine data into a single array
data = np.concatenate([group1, group2, group3])

# Compute group means
group_means = [np.mean(group1), np.mean(group2), np.mean(group3)]

# Compute overall mean
overall_mean = np.mean(data)

# Calculate Total Sum of Squares (SST)
SST = np.sum((data - overall_mean) ** 2)

# Calculate Explained Sum of Squares (SSE)
SSE = np.sum([len(group) * (mean - overall_mean) ** 2 for group, mean in zip([group1, group2, group3], group_means)])

# Calculate Residual Sum of Squares (SSR)
SSR = SST - SSE

print("Total Sum of Squares (SST):", SST)
print("Explained Sum of Squares (SSE):", SSE)
print("Residual Sum of Squares (SSR):", SSR)


Total Sum of Squares (SST): 752.8407434049392
Explained Sum of Squares (SSE): 444.1655291701484
Residual Sum of Squares (SSR): 308.6752142347908


- First generate sample data for demonstration purposes. Here, we have three groups with normally distributed data.
- Then concatenate the data from all groups into a single array.
- Next, compute the group means for each group and the overall mean of the data.
- Using these means, calculate the Total Sum of Squares (SST), which measures the total variability in the data.
- Then calculate the Explained Sum of Squares (SSE), which represents the variability explained by the group means.
- Finally, compute the Residual Sum of Squares (SSR), which is the unexplained variability in the data after accounting for the group means.

These calculations show how much of the total variability in the data is explained by the group means and how much is left unexplained.
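
As a cross-check, the residual sum of squares can also be computed directly from the within-group deviations instead of by subtraction, and the resulting F-statistic compared with SciPy's `f_oneway`. The sketch below uses its own simulated data (the seed and group parameters are assumptions), so its numbers will differ from the output shown above.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
groups = [rng.normal(loc=mu, scale=2.0, size=30) for mu in (10.0, 12.0, 15.0)]
data = np.concatenate(groups)
overall_mean = data.mean()

sst = np.sum((data - overall_mean) ** 2)
ss_explained = sum(len(g) * (g.mean() - overall_mean) ** 2 for g in groups)
# Residual sum of squares computed directly from within-group deviations
ss_residual = sum(np.sum((g - g.mean()) ** 2) for g in groups)

# F is the ratio of mean squares: each sum of squares divided by its degrees of freedom
k, n = len(groups), len(data)
f_manual = (ss_explained / (k - 1)) / (ss_residual / (n - k))
f_scipy, p_scipy = f_oneway(*groups)

print(np.isclose(sst, ss_explained + ss_residual))  # the partition holds
print(np.isclose(f_manual, f_scipy))                # manual F matches SciPy
```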

Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?


In a two-way ANOVA, the main effects and the interaction effect can be estimated in Python with statsmodels, which fits the model through a formula interface and returns an ANOVA table. (SciPy's f_oneway covers only the one-way case.)



In [2]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Generate sample data for demonstration
np.random.seed(42)  # for reproducibility
size = 50
temperature = np.random.choice(['High', 'Low'], size=size)
humidity = np.random.choice(['High', 'Low'], size=size)
data = pd.DataFrame({'Temperature': temperature, 'Humidity': humidity,
                     'Measurement': np.random.normal(loc=10, scale=2, size=size)})

# Fit the two-way ANOVA model
formula = 'Measurement ~ C(Temperature) + C(Humidity) + C(Temperature):C(Humidity)'
model = ols(formula, data=data).fit()

# Perform ANOVA
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

# Share of the summed sums of squares attributable to each effect
total_ss = anova_table['sum_sq'].sum()
main_effects = anova_table.loc[['C(Temperature)', 'C(Humidity)'], 'sum_sq'] / total_ss
interaction_effect = anova_table.loc['C(Temperature):C(Humidity)', 'sum_sq'] / total_ss

print("Main Effects:")
print(main_effects)
print("Interaction Effect:")
print(interaction_effect)


                                sum_sq    df         F    PR(>F)
C(Temperature)                2.573266   1.0  0.739546  0.394266
C(Humidity)                   1.664595   1.0  0.478398  0.492625
C(Temperature):C(Humidity)    1.792908   1.0  0.515274  0.476494
Residual                    160.057974  46.0       NaN       NaN
Main Effects:
C(Temperature)    0.015493
C(Humidity)       0.010022
Name: sum_sq, dtype: float64
Interaction Effect:
0.01079487755808216


- First generate sample data for demonstration purposes. Here, we have two categorical variables, "Temperature" and "Humidity", and a continuous dependent variable "Measurement".
- Then fit a two-way ANOVA model using the ols function from statsmodels.formula.api module.
- The formula specifies the model with main effects of "Temperature" and "Humidity", as well as their interaction effect.
- Use the anova_lm function from statsmodels.stats.anova module to perform ANOVA and obtain the ANOVA table.
- From the ANOVA table, extract the sum of squares for each effect and divide it by the sum of the sum_sq column to get the proportion of variability attributed to each effect (an eta-squared-style effect size).
- For each main effect, this is the sum of squares of that effect divided by the total.
- For the interaction effect, this is the sum of squares of the interaction term divided by the same total. Statistical significance is read from the F and PR(>F) columns; in this simulated example all three p-values exceed 0.05, so none of the effects is significant.
- Finally, print out the main effects and interaction effect.
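
Main effects and interactions can also be read off the cell means directly. The sketch below uses a small made-up balanced 2x2 data set (two observations per cell, values chosen for easy arithmetic) and is meant only to show what the ANOVA terms are measuring, not to replace the model above.

```python
import pandas as pd

# Made-up balanced 2x2 design with two observations per cell
df = pd.DataFrame({
    'Temperature': ['High'] * 4 + ['Low'] * 4,
    'Humidity': ['High', 'High', 'Low', 'Low'] * 2,
    'Measurement': [10.0, 12.0, 14.0, 16.0, 8.0, 10.0, 9.0, 11.0],
})

# Cell means: rows are Temperature levels, columns are Humidity levels
cell_means = df.groupby(['Temperature', 'Humidity'])['Measurement'].mean().unstack()

# Main effects compare the marginal means of each factor
temperature_means = df.groupby('Temperature')['Measurement'].mean()
humidity_means = df.groupby('Humidity')['Measurement'].mean()

# Interaction: does the humidity effect differ between temperature levels?
humidity_effect = cell_means['Low'] - cell_means['High']
interaction_contrast = humidity_effect['High'] - humidity_effect['Low']

print(cell_means)
print(temperature_means['High'] - temperature_means['Low'])  # 3.5
print(interaction_contrast)  # 3.0
```

A non-zero interaction contrast means the lines in an interaction plot would not be parallel; whether it is statistically significant is what the interaction row of the ANOVA table tests.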

Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?

In the context of a one-way ANOVA, the F-statistic and its associated p-value are used to assess whether there are statistically significant differences in the means of the groups being compared. Here's how to interpret the results:

F-statistic: The F-statistic is a ratio of the between-group variance to the within-group variance. It measures the extent to which the means of the groups differ from each other relative to the variability within each group. In this case, the obtained F-statistic is 5.23.

p-value: The p-value associated with the F-statistic is the probability of observing an F-statistic at least as large as the one obtained, assuming the null hypothesis that all group means are equal is true. A low p-value suggests that the observed differences between the groups are unlikely to be due to random chance alone.

Based on the F-statistic of 5.23 and the p-value of 0.02:

- Since the p-value (0.02) is less than the conventional significance level of 0.05, we reject the null hypothesis that all group means are equal. At a stricter level such as 0.01, the same result would not be significant.
- We can therefore conclude that at least one group mean differs from the others.
- The one-way ANOVA does not identify which groups differ. Because the overall test is significant, post-hoc tests (e.g., Tukey's HSD or Bonferroni-corrected pairwise comparisons) can be conducted to determine which specific group means are significantly different from each other.

In summary, with an F-statistic of 5.23 and a p-value of 0.02, we reject the null hypothesis at the 0.05 level and conclude that at least one group mean differs significantly from the others.
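
The p-value follows from the F-statistic and its degrees of freedom, which the question does not state. The sketch below assumes, purely for illustration, 3 groups and 17 observations (degrees of freedom 2 and 14), a combination for which F = 5.23 gives a p-value close to 0.02.

```python
from scipy import stats

f_statistic = 5.23
df_between = 2    # assumed: 3 groups, so 3 - 1
df_within = 14    # assumed: 17 observations in total, so 17 - 3

# Upper-tail probability of the F distribution at the observed statistic
p_value = stats.f.sf(f_statistic, df_between, df_within)
# Smallest F that would be significant at alpha = 0.05 for these degrees of freedom
critical_f = stats.f.ppf(0.95, df_between, df_within)

print(f"p-value: {p_value:.4f}")
print(f"Critical F at alpha = 0.05: {critical_f:.2f}")
```

With different degrees of freedom the same F-statistic would give a different p-value, which is why both are normally reported together.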

Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential
consequences of using different methods to handle missing data?

Handling missing data in a repeated measures ANOVA is crucial to ensure the accuracy and validity of the results. There are several methods for handling missing data, each with its own potential consequences:

1. Complete Case Analysis (CCA):
    - In CCA, only cases with complete data across all time points or conditions are included in the analysis.
    - Pros: Simple to implement; does not require imputation of missing values.
    - Cons: Reduces sample size and may introduce bias if missingness is related to the outcome or other variables of interest.

2. Mean Imputation:
    - Missing values are replaced with the mean of the observed values for that variable.
    - Pros: Simple and preserves the sample size.
    - Cons: May underestimate variability and bias results if data are not missing at random. Can distort relationships between variables.

3. Last Observation Carried Forward (LOCF):
    - The last observed value is used to impute missing values for subsequent time points.
    - Pros: Easy to implement and maintains time series structure.
    - Cons: Assumes that the last observed value is representative, which may not always be the case. Can underestimate variability and bias results.

4. Linear Interpolation:
    - Missing values are estimated based on linear interpolation between adjacent observed values.
    - Pros: Preserves the trend and temporal ordering of data.
    - Cons: Assumes a linear relationship between adjacent time points, which may not always be valid. Can lead to biased estimates if the underlying relationship is nonlinear.

5. Multiple Imputation:
    - Missing values are imputed multiple times to generate multiple complete datasets, and the results are combined using appropriate statistical methods.
    - Pros: Accounts for uncertainty due to missing data; produces unbiased estimates under certain conditions.
    - Cons: More computationally intensive and requires assumptions about the missing data mechanism. May be sensitive to model specification and assumptions.

6. Mixed Effects Models:
    - Mixed effects models can handle missing data by using maximum likelihood estimation to estimate model parameters based on available data.
    - Pros: Accounts for the correlation structure within subjects; flexible in handling missing data.
    - Cons: Requires more advanced statistical knowledge and software; assumptions about the missing data mechanism still apply.

The choice of method for handling missing data should be guided by the nature of the data, the missing data mechanism, and the assumptions underlying each method. It is essential to perform sensitivity analyses to assess the robustness of results to different methods of handling missing data and to report any potential biases or limitations introduced by the chosen approach.
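
The four simplest strategies above can be sketched with pandas on a small wide-format table (one row per subject, one column per time point). The subjects and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up repeated measures: 4 subjects, 3 time points, 2 missing values
wide = pd.DataFrame(
    {'t1': [10.0, 11.0, 9.0, 12.0],
     't2': [12.0, np.nan, 10.0, 13.0],
     't3': [14.0, 15.0, np.nan, 16.0]},
    index=['s1', 's2', 's3', 's4'],
)

complete_cases = wide.dropna()                             # complete case analysis
mean_imputed = wide.fillna(wide.mean())                    # column (time point) means
locf = wide.ffill(axis=1)                                  # carry last value forward in time
interpolated = wide.interpolate(method='linear', axis=1)   # linear between neighbours

print(len(complete_cases))             # 2 subjects remain
print(mean_imputed.loc['s3', 't3'])    # 15.0, the mean of 14, 15 and 16
print(locf.loc['s2', 't2'])            # 11.0, copied from t1
print(interpolated.loc['s2', 't2'])    # 13.0, halfway between 11 and 15
```

Even on four subjects the trade-offs are visible: complete case analysis halves the sample, while the three imputation methods each fill the same gap with a different value.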





Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide
an example of a situation where a post-hoc test might be necessary.

Post-hoc tests are used after conducting an analysis of variance (ANOVA) to determine which specific group means are significantly different from each other when the overall ANOVA test indicates a significant difference between groups. Here are some common post-hoc tests and situations where each one might be appropriate:

- Tukey's Honestly Significant Difference (HSD):
    - Tukey's HSD test is used when all pairwise comparisons between group means need to be examined.
    - It controls the family-wise error rate, making it suitable for situations where multiple pairwise comparisons are conducted.
    - Example: Suppose a researcher conducts a one-way ANOVA to compare the effectiveness of four different teaching methods on student exam scores. Tukey's HSD test can be used to determine which pairs of teaching methods have significantly different mean exam scores.

- Bonferroni Correction:
    - Bonferroni correction adjusts the significance level for each pairwise comparison to control the family-wise error rate.
    - It is conservative but effective in controlling the Type I error rate, especially when conducting a large number of pairwise comparisons.
    - Example: In a study comparing the effects of a new drug across multiple doses, Bonferroni correction can be used to adjust the significance level for each dose comparison to account for multiple testing.


- Dunnett's Test:
    - Dunnett's test is used when one group (usually a control group) is compared against all other groups.
    - It is appropriate when the primary interest is in comparing all treatment groups to a control group.
    - Example: In a clinical trial comparing the efficacy of three different treatments to a placebo control, Dunnett's test can be used to determine whether each treatment group differs significantly from the control group.

- Scheffé's Test:
    - Scheffé's test is a conservative post-hoc test that can be used for all pairwise comparisons or for comparing combinations of means (complex contrasts).
    - It is useful when sample sizes are unequal or when contrasts beyond simple pairwise differences are of interest. Like the ANOVA F-test, it assumes equal variances across groups.
    - Example: In a study comparing the effects of different interventions on patient outcomes across hospitals that contribute different numbers of patients, Scheffé's test can be used to compare individual interventions as well as combinations, such as the average of two interventions against a third.


- Games-Howell Test:
    - The Games-Howell test is a parametric post-hoc test, based on Welch's correction, that does not assume equal variances or equal sample sizes.
    - It is suitable when the equal-variance assumption of other post-hoc tests (such as Tukey's HSD) is violated.
    - Example: In a study comparing the effects of different treatments where the outcome is far more variable in one treatment group than in the others, the Games-Howell test can be used to conduct pairwise comparisons while accounting for unequal variances or sample sizes.

Post-hoc tests are necessary to provide more detailed information about group differences after conducting an ANOVA and to identify which specific group means differ significantly from each other. The choice of post-hoc test depends on the research question, the experimental design, and the assumptions underlying each test.
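
As an illustration of the most common case, Tukey's HSD can be run with the `pairwise_tukeyhsd` function from statsmodels. The data below are simulated; the three teaching methods, their means, and the spread are assumptions made for the example.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Simulated exam scores for three hypothetical teaching methods
rng = np.random.default_rng(2)
scores = np.concatenate([
    rng.normal(loc=70, scale=5, size=30),   # method A
    rng.normal(loc=72, scale=5, size=30),   # method B
    rng.normal(loc=85, scale=5, size=30),   # method C
])
methods = np.repeat(['A', 'B', 'C'], 30)

# All pairwise comparisons with the family-wise error rate held at 5%
tukey = pairwise_tukeyhsd(endog=scores, groups=methods, alpha=0.05)
print(tukey.summary())
```

The summary lists one row per pair (A-B, A-C, B-C) with the mean difference, adjusted p-value, confidence interval, and a reject flag.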

Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
to determine if there are any significant differences between the mean weight loss of the three diets.
Report the F-statistic and p-value, and interpret the results

To conduct a one-way ANOVA in Python to compare the mean weight loss of three diets (A, B, and C), we can use the f_oneway function from the scipy.stats module:


In [3]:
import numpy as np
from scipy.stats import f_oneway

# Generate sample weight loss data for demonstration
np.random.seed(42)  # for reproducibility
diet_A = np.random.normal(loc=5, scale=2, size=50)  # Mean weight loss of 5 kg, SD of 2 kg
diet_B = np.random.normal(loc=4, scale=1.5, size=50)  # Mean weight loss of 4 kg, SD of 1.5 kg
diet_C = np.random.normal(loc=6, scale=2.5, size=50)  # Mean weight loss of 6 kg, SD of 2.5 kg

# Perform one-way ANOVA
f_statistic, p_value = f_oneway(diet_A, diet_B, diet_C)

# Report the results
print("F-statistic:", f_statistic)
print("p-value:", p_value)

# Interpret the results
if p_value < 0.05:
    print("The p-value is less than 0.05, so we reject the null hypothesis.")
    print("There is sufficient evidence to conclude that there are significant differences between the mean weight loss of the three diets.")
else:
    print("The p-value is greater than or equal to 0.05, so we fail to reject the null hypothesis.")
    print("There is insufficient evidence to conclude that there are significant differences between the mean weight loss of the three diets.")


F-statistic: 12.056355514095166
p-value: 1.4176763826158173e-05
The p-value is less than 0.05, so we reject the null hypothesis.
There is sufficient evidence to conclude that there are significant differences between the mean weight loss of the three diets.


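We first generate sample weight loss data for each diet group (A, B, and C) using numpy's random.normal function. For simplicity the simulation draws 50 values per diet (150 in total), whereas the question describes 50 participants split across the three diets.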

We then use the f_oneway function from scipy.stats to perform the one-way ANOVA test.

The function returns the F-statistic and the p-value.

We report the F-statistic and p-value: here F ≈ 12.06 and p ≈ 1.4e-05.

Based on the p-value, we interpret the results. If the p-value is less than 0.05 (or any chosen significance level), we reject the null hypothesis and conclude that there are significant differences between the mean weight loss of the three diets.

Otherwise, if the p-value is greater than or equal to 0.05, we fail to reject the null hypothesis due to insufficient evidence to conclude that there are significant differences.
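
A design closer to the question, with 50 participants in total randomly assigned to the three diets, might look like the sketch below. The group sizes (17, 17, 16), the true mean weight losses, and the common standard deviation are all assumptions made for the simulation.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(3)

# Randomly assign 50 participants to diets A, B and C (17, 17 and 16 people)
assignment = rng.permutation(np.repeat(['A', 'B', 'C'], [17, 17, 16]))

# Assumed true mean weight loss (kg) per diet, with a common SD of 2 kg
true_means = {'A': 5.0, 'B': 4.0, 'C': 6.0}
weight_loss = np.array([rng.normal(loc=true_means[d], scale=2.0) for d in assignment])

# Split the outcomes by diet and run the one-way ANOVA
samples = [weight_loss[assignment == d] for d in ('A', 'B', 'C')]
f_statistic, p_value = f_oneway(*samples)

print("Group sizes:", [len(s) for s in samples])
print("F-statistic:", f_statistic)
print("p-value:", p_value)
```

With fewer participants per group the test has less power than the 150-observation simulation, so the same true differences may or may not reach significance.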

Q10. A company wants to know if there are any significant differences in the average time it takes to
complete a task using three different software programs: Program A, Program B, and Program C. They
randomly assign 30 employees to one of the programs and record the time it takes each employee to
complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or
interaction effects between the software programs and employee experience level (novice vs.
experienced). Report the F-statistics and p-values, and interpret the results.


To conduct a two-way ANOVA in Python to analyze the effects of software programs (Program A, Program B, and Program C) and employee experience level (novice vs. experienced) on task completion time, we can use the statsmodels library:

In [4]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Generate sample data for demonstration
np.random.seed(42)  # for reproducibility

# Create a DataFrame with software programs, employee experience level, and task completion time
data = pd.DataFrame({
    'Software': np.random.choice(['A', 'B', 'C'], size=90),
    'Experience': np.random.choice(['Novice', 'Experienced'], size=90),
    'CompletionTime': np.random.normal(loc=10, scale=2, size=90)  # Mean completion time of 10 units, SD of 2 units
})

# Fit the two-way ANOVA model
model = ols('CompletionTime ~ C(Software) + C(Experience) + C(Software):C(Experience)', data=data).fit()

# Perform ANOVA
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)


                               sum_sq    df         F    PR(>F)
C(Software)                  1.334021   2.0  0.193670  0.824297
C(Experience)                5.096305   1.0  1.479736  0.227223
C(Software):C(Experience)    8.396750   2.0  1.219018  0.300694
Residual                   289.301266  84.0       NaN       NaN


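We first generate sample data for demonstration purposes. We create a DataFrame with columns for the software programs, employee experience level, and task completion time. The simulation uses 90 rows rather than the 30 employees described in the question, and software and experience are assigned at random, so the cells are not perfectly balanced.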

The task completion time data is generated from a normal distribution with a mean of 10 units and a standard deviation of 2 units.

We fit a two-way ANOVA model using the ols function from statsmodels.formula.api.

The formula specifies the model with main effects of software programs and experience level, as well as their interaction effect.

We use the anova_lm function from statsmodels.stats.anova to perform ANOVA and obtain the ANOVA table.
The ANOVA table contains the F-statistics and p-values for the main effects of software programs and experience level, as well as the interaction effect between them.


Interpretation of the results:

Look at the F-statistics and p-values in the ANOVA table to determine the significance of main effects and interaction effects.

A significant main effect of software programs indicates that there are differences in task completion time between at least two software programs.

A significant main effect of experience level indicates that there are differences in task completion time between novice and experienced employees.

A significant interaction effect between software programs and experience level indicates that the effect of software programs on task completion time varies depending on the experience level of employees, and vice versa.

In this output, none of the effects is significant at the 0.05 level: software (F = 0.19, p = 0.82), experience (F = 1.48, p = 0.23), and their interaction (F = 1.22, p = 0.30). This is expected, because the simulated completion times were drawn from a single normal distribution regardless of software or experience.
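
Reading the table can be automated with a small helper. The sketch below rebuilds the F and p-value columns from the output printed above and applies a hypothetical function, `summarize_effects`, which is not part of statsmodels and is defined here only for illustration.

```python
import numpy as np
import pandas as pd

# F-statistics and p-values copied from the ANOVA table printed above
anova_table = pd.DataFrame(
    {'F': [0.193670, 1.479736, 1.219018, np.nan],
     'PR(>F)': [0.824297, 0.227223, 0.300694, np.nan]},
    index=['C(Software)', 'C(Experience)', 'C(Software):C(Experience)', 'Residual'],
)

def summarize_effects(table, alpha=0.05):
    """Return one verdict line per effect, skipping the Residual row."""
    lines = []
    for effect, row in table.drop(index='Residual').iterrows():
        verdict = 'significant' if row['PR(>F)'] < alpha else 'not significant'
        lines.append(f"{effect}: F = {row['F']:.2f}, p = {row['PR(>F)']:.3f} ({verdict})")
    return lines

summary_lines = summarize_effects(anova_table)
print("\n".join(summary_lines))
```

The same function could be passed the `anova_table` returned by `anova_lm` directly, since it only relies on the `F` and `PR(>F)` columns and the `Residual` row label.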

Q11. An educational researcher is interested in whether a new teaching method improves student test
scores. They randomly assign 100 students to either the control group (traditional teaching method) or the
experimental group (new teaching method) and administer a test at the end of the semester. Conduct a
two-sample t-test using Python to determine if there are any significant differences in test scores
between the two groups. If the results are significant, follow up with a post-hoc test to determine which
group(s) differ significantly from each other.


To conduct a two-sample t-test in Python and follow up with a post-hoc test, you can use libraries like SciPy and StatsModels. First, let's simulate some data and then perform the analysis.

In [5]:
import numpy as np
from scipy import stats
import statsmodels.stats.multicomp as mc

# Simulating data
np.random.seed(42)  # for reproducibility
control_scores = np.random.normal(loc=70, scale=10, size=100)
experimental_scores = np.random.normal(loc=75, scale=10, size=100)

# Performing two-sample t-test
t_stat, p_value = stats.ttest_ind(control_scores, experimental_scores)
print("Two-sample t-test results:")
print("t-statistic:", t_stat)
print("p-value:", p_value)

# Follow up with post-hoc test if significant difference exists
if p_value < 0.05:
    print("\nSince p-value is significant (< 0.05), performing post-hoc test...")
    all_scores = np.concatenate([control_scores, experimental_scores])
    groups = ['control'] * len(control_scores) + ['experimental'] * len(experimental_scores)
    posthoc_result = mc.MultiComparison(all_scores, groups).tukeyhsd()
    print(posthoc_result)
else:
    print("\nNo significant difference found. Post-hoc test is not required.")


Two-sample t-test results:
t-statistic: -4.754695943505281
p-value: 3.819135262679478e-06

Since p-value is significant (< 0.05), performing post-hoc test...
  Multiple Comparison of Means - Tukey HSD, FWER=0.05   
 group1    group2    meandiff p-adj lower  upper  reject
--------------------------------------------------------
control experimental   6.2615   0.0 3.6645 8.8585   True
--------------------------------------------------------


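- First, simulate two sets of test scores for the control and experimental groups (100 scores per group here, whereas the question describes 100 students in total).
- Then, perform a two-sample t-test using stats.ttest_ind.
- If the p-value is significant (less than 0.05), indicating a significant difference in means, proceed with a post-hoc Tukey's HSD test using statsmodels.stats.multicomp.MultiComparison. With only two groups the post-hoc test is redundant: it compares the same single pair as the t-test and reaches the same conclusion.
- Finally, print the results of the t-test and, if applicable, the post-hoc test. Here t ≈ -4.75 with p ≈ 3.8e-06, and Tukey's HSD reports a mean difference of about 6.26 (experimental minus control), so the experimental group scored significantly higher.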
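
With exactly two groups, the pooled-variance t-test and a one-way ANOVA are the same test: the F-statistic equals the square of the t-statistic and the p-values coincide. The sketch below checks this on simulated scores (the seed, means, and sample sizes are assumptions), which is why no separate post-hoc step adds information in the two-group case.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
control = rng.normal(loc=70, scale=10, size=50)
experimental = rng.normal(loc=75, scale=10, size=50)

# Pooled-variance two-sample t-test and one-way ANOVA on the same two groups
t_stat, t_p = stats.ttest_ind(control, experimental)
f_stat, f_p = stats.f_oneway(control, experimental)

print(np.isclose(t_stat ** 2, f_stat))  # True
print(np.isclose(t_p, f_p))             # True
```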




Q12. A researcher wants to know if there are any significant differences in the average daily sales of three
retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store
on those days. Conduct a repeated measures ANOVA using Python to determine if there are any
significant differences in sales between the three stores. If the results are significant,


To conduct a repeated measures ANOVA in Python, you can use the AnovaRM class from the statsmodels library. Each day is treated as the repeated "subject", and the store is the within-subject factor. Here's how you can perform the analysis:

In [6]:
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Simulating data
np.random.seed(42)  # for reproducibility
n_days = 30
stores = ['A', 'B', 'C']
store_means = {'A': 100, 'B': 110, 'C': 120}  # simulated mean daily sales per store

# Long format: one row per (day, store) pair, so every day has a value for each store
data = pd.DataFrame({
    'Day': np.repeat(np.arange(1, n_days + 1), len(stores)),
    'Store': np.tile(stores, n_days),
})
# One mean per row; loc=[100, 110, 120] with size=90 cannot be broadcast and raises ValueError
data['Sales'] = np.random.normal(loc=data['Store'].map(store_means).to_numpy(), scale=10)

# Perform repeated measures ANOVA with Day as the subject and Store as the within factor
results = AnovaRM(data, depvar='Sales', subject='Day', within=['Store']).fit()

print("Repeated Measures ANOVA Results:")
print(results)
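
The question text is cut off after "If the results are significant," so the intended follow-up is not known. One common option after a significant repeated measures ANOVA is a set of paired t-tests with a Bonferroni adjustment, sketched below on simulated sales. The shared day effect, store means, and noise level are assumptions made for the example.

```python
from itertools import combinations

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n_days = 30

# Each day contributes a shared effect, so the three stores' sales are paired by day
day_effect = rng.normal(loc=0, scale=5, size=n_days)
sales = {
    'A': 100 + day_effect + rng.normal(scale=10, size=n_days),
    'B': 110 + day_effect + rng.normal(scale=10, size=n_days),
    'C': 120 + day_effect + rng.normal(scale=10, size=n_days),
}

# Paired t-test for every pair of stores, with a Bonferroni adjustment
pairs = list(combinations(sales, 2))
results = {}
for first, second in pairs:
    t_stat, p_raw = stats.ttest_rel(sales[first], sales[second])
    results[(first, second)] = min(p_raw * len(pairs), 1.0)

for pair, p_adj in results.items():
    print(pair, f"adjusted p = {p_adj:.4f}")
```

Paired tests are used rather than independent ones because the same 30 days are observed for every store.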