Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact
the validity of the results.

Ans--

Analysis of Variance (ANOVA) is a statistical method used to compare means across two or more groups to determine whether there are any statistically significant differences among them. However, ANOVA comes with certain assumptions that need to be met for the results to be valid. These assumptions are crucial because violations can lead to inaccurate or misleading conclusions. The assumptions for ANOVA are as follows:

**1. Independence:** The observations within each group and between groups should be independent of each other. This means that the data points in one group should not be influenced by or related to the data points in other groups.

**2. Normality:** The residuals (the differences between observed values and predicted values, which in ANOVA are the group means) should follow a normal distribution within each group. The normality assumption ensures that the sampling distribution of the means is also normal, which is important for making valid inferences. ANOVA is fairly robust to mild departures from normality when the groups are reasonably large.

**3. Homogeneity of Variance (Homoscedasticity):** The variance of the residuals should be roughly constant across all levels of the independent variable. In other words, the spread of data points around the mean should be approximately equal across different groups.

**4. Equal Sample Sizes (for one-way ANOVA):** Balanced groups are not a strict assumption of one-way ANOVA, but roughly equal sample sizes in each group are desirable. With a balanced design the F-test is more robust to moderate violations of the equal-variance assumption.

**5. Random Sampling:** The data should be collected using random sampling methods, so the results can be generalized to the larger population from which the samples were drawn.

Examples of violations that could impact the validity of ANOVA results:

**1. Non-Independence:** If the data points within or between groups are not independent, the F-test tends to understate the true uncertainty. For instance, measuring the same participants in several groups, or sampling students who all share one classroom and teacher, makes observations correlated and violates the assumption of independence.

**2. Non-Normality:** If the residuals within each group do not follow a normal distribution, it can lead to incorrect p-values and confidence intervals. This can happen when the data is heavily skewed or has outliers.

**3. Violation of Homoscedasticity:** When the variability of residuals differs significantly across groups, the assumption of homogeneity of variance is violated. This can impact the accuracy of the F-test and the validity of the ANOVA results.

**4. Unequal Sample Sizes:** Unequal group sizes do not invalidate a one-way ANOVA on their own, but they reduce the power of the test and make the F-test more sensitive to unequal variances, particularly when the smaller groups have the larger variances. It's generally recommended to have approximately equal sample sizes in each group.

**5. Non-Random Sampling:** If the data is not collected using proper random sampling techniques, the results might not be generalizable to the larger population.

When these assumptions are violated, it's important to consider alternative statistical methods or transformations of the data that might be more appropriate for your analysis. Additionally, it's always good practice to visually inspect your data using diagnostic plots to assess the assumptions before relying on ANOVA results.
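The normality and equal-variance checks described above can be sketched with SciPy. This is a minimal illustration, not a complete diagnostic workflow: the three arrays are hypothetical sample data, and the choice of the Shapiro-Wilk and Levene tests is an assumption made here (other tests and diagnostic plots are equally valid).

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for three independent groups (illustrative only)
group_a = np.array([20, 25, 30, 28, 32], dtype=float)
group_b = np.array([15, 18, 22, 20, 25], dtype=float)
group_c = np.array([10, 12, 15, 11, 18], dtype=float)
groups = [group_a, group_b, group_c]

# Residuals: each observation minus its own group mean
residuals = np.concatenate([g - g.mean() for g in groups])

# Shapiro-Wilk test on the pooled residuals (H0: residuals are normal)
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Levene's test (H0: all groups share the same variance)
levene_stat, levene_p = stats.levene(*groups)

print("Shapiro-Wilk p-value:", shapiro_p)
print("Levene p-value:", levene_p)
```

A small p-value in either test is a signal to look more closely at the data. With small samples these tests have little power, so they complement diagnostic plots rather than replace them.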

Q2. What are the three types of ANOVA, and in what situations would each be used?

Ans--

There are three main types of Analysis of Variance (ANOVA) techniques, each designed for specific types of experimental designs and research questions:

**1. One-Way ANOVA:**
One-Way ANOVA is used when you have one independent variable (factor) with three or more levels (groups) and you want to determine if there are any significant differences in means among these groups. This is suitable when you're comparing means across multiple categories.

Example: A pharmaceutical company wants to test the effectiveness of three different doses of a new drug by administering it to three different groups of patients.

**2. Two-Way ANOVA:**
Two-Way ANOVA is used when you have two independent variables (factors) and you want to assess the effects of each factor on the dependent variable, as well as any interaction between the factors. It's suitable for designs where you're interested in studying the combined effects of two factors.

Example: A study examines the effects of both gender and diet on weight loss. The study has two independent variables: gender (male or female) and diet type (low-carb, high-protein, balanced). Two-Way ANOVA would help determine the individual and combined effects of gender and diet on weight loss.

**3. Repeated Measures ANOVA:**
Repeated Measures ANOVA (also known as Within-Subjects ANOVA) is used when the same subjects are measured under different conditions or at different time points. It's used to analyze within-subject variations over time or across conditions.

Example: A study measures the performance of the same group of students on a test before training, after one week of training, and after two weeks of training. Repeated Measures ANOVA would be used to analyze whether there are any significant differences in test scores across the different time points.

In summary:

- **One-Way ANOVA**: Used for comparing means across three or more independent groups.
- **Two-Way ANOVA**: Used for studying the effects of two independent variables and their interactions on a dependent variable.
- **Repeated Measures ANOVA**: Used for analyzing within-subject variations across different conditions or time points.

It's important to choose the appropriate type of ANOVA based on your research design and objectives. Additionally, it's always a good practice to check for the assumptions of ANOVA before interpreting the results. If the assumptions are violated, alternative methods or transformations may be necessary.
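To make the repeated measures case more concrete, here is a minimal NumPy sketch of how it removes between-subject variability before testing the condition effect. The score matrix is hypothetical, the variable names are chosen for illustration, and the sketch assumes a complete, balanced design and does not check sphericity.

```python
import numpy as np
from scipy import stats

# Hypothetical test scores: rows are students, columns are
# (before training, after one week, after two weeks)
scores = np.array([
    [60, 65, 70],
    [55, 62, 66],
    [70, 72, 78],
    [65, 70, 74],
    [58, 61, 69],
], dtype=float)
n_subjects, n_conditions = scores.shape
grand_mean = scores.mean()

# Split the total variability into conditions, subjects, and error
ss_total = np.sum((scores - grand_mean) ** 2)
ss_conditions = n_subjects * np.sum((scores.mean(axis=0) - grand_mean) ** 2)
ss_subjects = n_conditions * np.sum((scores.mean(axis=1) - grand_mean) ** 2)
ss_error = ss_total - ss_conditions - ss_subjects

# The condition effect is tested against the error term only
df_conditions = n_conditions - 1
df_error = (n_subjects - 1) * (n_conditions - 1)
f_stat = (ss_conditions / df_conditions) / (ss_error / df_error)
p_val = stats.f.sf(f_stat, df_conditions, df_error)

print("F-statistic:", f_stat)
print("p-value:", p_val)
```

Because stable differences between students are moved into `ss_subjects`, they no longer inflate the error term, which is the main reason a within-subjects design can be more sensitive than comparing independent groups.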

Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

Ans--

Partitioning of variance in ANOVA refers to the process of decomposing the total variability in the data into different sources of variability or "components." This decomposition helps us understand the contributions of different factors or sources to the overall variability observed in the dependent variable. The partitioning of variance is a fundamental concept in ANOVA, and it provides valuable insights into the sources of variation and the significance of those sources.

In a one-way ANOVA, the total variance of the dependent variable is divided into two main components:

1. **Between-Groups Variance (Variability between groups):** This component represents the variability in the dependent variable that can be attributed to differences between the group means. It is the part of the variability explained by group membership.

2. **Within-Groups Variance (Error or residual variance):** This component represents the variability in the dependent variable within each group. It reflects natural variability and measurement error, and it is the unexplained variability that cannot be attributed to the factor being studied.

Designs with more than one factor split the explained part further (for example into main effects and an interaction), but the unexplained part is still the residual variance.

Mathematically, the total variance (Total SS) can be partitioned as follows:
Total SS = Between-Groups SS + Within-Groups SS
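
Written out in full, the identity looks as follows. The notation is an assumption introduced here for illustration: $k$ groups, $n_j$ observations in group $j$, $N$ observations in total, $y_{ij}$ the $i$-th observation in group $j$, $\bar{y}_j$ the mean of group $j$, and $\bar{y}$ the overall mean.

```latex
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(y_{ij} - \bar{y}\right)^2}_{\text{Total SS}}
=
\underbrace{\sum_{j=1}^{k} n_j \left(\bar{y}_j - \bar{y}\right)^2}_{\text{Between-Groups SS}}
+
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(y_{ij} - \bar{y}_j\right)^2}_{\text{Within-Groups SS}}

F = \frac{\text{Between-Groups SS} / (k - 1)}{\text{Within-Groups SS} / (N - k)}
```

The degrees of freedom split the same way: $N - 1 = (k - 1) + (N - k)$.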

Understanding the partitioning of variance is important for several reasons:

1. **Interpretation of Results:** By decomposing the total variance into these components, ANOVA allows you to assess the contributions of different factors to the overall variability. This helps you understand which factors are significant in explaining the differences among groups.

2. **Hypothesis Testing:** ANOVA uses the partitioning of variance to perform hypothesis tests. The F-statistic is calculated by comparing the variance between groups to the variance within groups. This F-test helps determine whether the group means are significantly different.

3. **Model Validation:** By examining the portion of variability that is unexplained (error variance), you can assess how well the model fits the data. If the error variance is small compared to the total variance, it suggests that the model is explaining a substantial portion of the variability.

4. **Identifying Sources of Variation:** Partitioning of variance helps researchers identify which factors are contributing the most to the observed variation. This can guide further investigation and help make informed decisions.

5. **Design and Planning:** When designing experiments, understanding how variance is partitioned can guide decisions about how to allocate resources and determine sample sizes.

In summary, the partitioning of variance in ANOVA provides a structured framework for analyzing and understanding the sources of variability in data. It's a key concept that underlies the hypothesis testing and interpretation process in ANOVA analyses.

Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual
sum of squares (SSR) in a one-way ANOVA using Python?

Ans--

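In a one-way ANOVA, you can calculate the Total Sum of Squares (SST), Explained Sum of Squares (SSE), and Residual Sum of Squares (SSR) to analyze the variance within the data. The abbreviations follow the question: SSE here is the explained (between-groups) sum of squares and SSR is the residual (within-groups) sum of squares. Many textbooks use these two abbreviations the other way round (SSE for error, SSR for regression), so check the definitions when comparing sources. You can use Python and libraries like NumPy and SciPy to perform these calculations: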

In [1]:
import numpy as np
from scipy import stats

# Sample data for each group
group1 = np.array([20, 25, 30, 28, 32])
group2 = np.array([15, 18, 22, 20, 25])
group3 = np.array([10, 12, 15, 11, 18])

# Combine all data into a single array
all_data = np.concatenate([group1, group2, group3])

# Calculate the overall mean
overall_mean = np.mean(all_data)

# Calculate the Total Sum of Squares (SST)
sst = np.sum((all_data - overall_mean)**2)

# Calculate the group means
group1_mean = np.mean(group1)
group2_mean = np.mean(group2)
group3_mean = np.mean(group3)

# Calculate the Explained Sum of Squares (SSE)
sse = len(group1) * (group1_mean - overall_mean)**2 + \
      len(group2) * (group2_mean - overall_mean)**2 + \
      len(group3) * (group3_mean - overall_mean)**2

# Calculate the Residual Sum of Squares (SSR)
ssr = np.sum((group1 - group1_mean)**2) + \
      np.sum((group2 - group2_mean)**2) + \
      np.sum((group3 - group3_mean)**2)

# Print the results
print("Total Sum of Squares (SST):", sst)
print("Explained Sum of Squares (SSE):", sse)
print("Residual Sum of Squares (SSR):", ssr)

# Calculate the degrees of freedom for SST, SSE, and SSR
df_total = len(all_data) - 1
df_groups = 3 - 1  # Number of groups minus 1
df_residual = len(all_data) - 3  # Total - Number of groups

# Calculate the Mean Square for groups (MSG) and residuals (MSR)
msg = sse / df_groups
msr = ssr / df_residual

# Calculate the F-statistic
f_statistic = msg / msr

# Calculate the p-value using the F-distribution
p_value = 1 - stats.f.cdf(f_statistic, df_groups, df_residual)

# Print the F-statistic and p-value
print("F-statistic:", f_statistic)
print("p-value:", p_value)

Total Sum of Squares (SST): 664.9333333333333
Explained Sum of Squares (SSE): 476.1333333333334
Residual Sum of Squares (SSR): 188.8
F-statistic: 15.131355932203391
p-value: 0.0005240145076064184


In this example, replace the group1, group2, and group3 arrays with your own data for each group. The code calculates the SST, SSE, and SSR, and then it calculates the F-statistic and p-value for hypothesis testing. Make sure you have the NumPy and SciPy libraries installed to run this code.
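As a sanity check on the manual calculation, the same F-statistic can be compared against `scipy.stats.f_oneway`, and the sums of squares can be turned into an effect size. This is a sketch on the same sample groups; the names `ss_between`, `ss_within`, and `eta_squared` are chosen here for illustration and correspond to the question's SSE, SSR, and the ratio SSE/SST.

```python
import numpy as np
from scipy import stats

# Same sample groups as in the cell above
group1 = np.array([20, 25, 30, 28, 32])
group2 = np.array([15, 18, 22, 20, 25])
group3 = np.array([10, 12, 15, 11, 18])
groups = [group1, group2, group3]

all_data = np.concatenate(groups)
grand_mean = all_data.mean()

# Between-groups (explained) and within-groups (residual) sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)
ss_total = np.sum((all_data - grand_mean) ** 2)

df_between = len(groups) - 1
df_within = len(all_data) - len(groups)
f_manual = (ss_between / df_between) / (ss_within / df_within)

# Reference implementation from SciPy
f_scipy, p_scipy = stats.f_oneway(*groups)

# Eta squared: share of the total variability explained by group membership
eta_squared = ss_between / ss_total

print("Manual F:", f_manual, "SciPy F:", f_scipy)
print("Eta squared:", eta_squared)
```

The two F values should agree up to floating-point rounding, and `ss_between + ss_within` should reproduce `ss_total`.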

Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?

Ans--

In a two-way ANOVA, you analyze the effects of two independent variables (factors) on a dependent variable. You can calculate the main effects of each factor and the interaction effect between the factors. Here's how you can calculate these effects using Python and the statsmodels library:

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Sample data
data = {
    'Factor1': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'Factor2': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'Dependent': [10, 12, 15, 18, 20, 23, 28, 30, 33]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Perform two-way ANOVA
model = ols('Dependent ~ Factor1 + Factor2 + Factor1:Factor2', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Print the ANOVA table
print(anova_table)

                       sum_sq   df             F        PR(>F)
Factor1          4.860000e+02  1.0  9.720000e+02  6.373278e-07
Factor2          3.750000e+01  1.0  7.500000e+01  3.392559e-04
Factor1:Factor2  4.892170e-29  1.0  9.784340e-29  1.000000e+00
Residual         2.500000e+00  5.0           NaN           NaN


In this example, replace the sample data in the data dictionary with your own data. The Factor1 and Factor2 columns represent the levels of the two independent variables, and the Dependent column represents the dependent variable.

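The ols function from the statsmodels library is used to fit the ANOVA model. The formula 'Dependent ~ Factor1 + Factor2 + Factor1:Factor2' specifies the model with main effects of Factor1 and Factor2, as well as their interaction (Factor1:Factor2). Because Factor1 and Factor2 are stored as numbers and the formula does not wrap them in C(), statsmodels fits them as continuous predictors, which is why each effect in the table above has 1 degree of freedom. To treat them as categorical factors, write C(Factor1) and C(Factor2) in the formula. The interaction can then only be tested if each combination of factor levels has more than one observation, which this 3x3 sample with one value per cell does not provide.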

The sm.stats.anova_lm function calculates the ANOVA table, which includes the main effects of each factor and the interaction effect. The typ=2 argument specifies the type of sum of squares calculation to be used.

The resulting ANOVA table will provide you with the F-statistics, degrees of freedom, p-values, and other information for each effect. This can help you assess the significance of the main effects and interaction effect in your two-way ANOVA.
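For categorical factors with replicates in every cell, the main effects and the interaction can also be computed by hand. The following is a minimal sketch, assuming a balanced design (the same number of observations in every cell). The data array and all variable names are hypothetical and not part of the example above.

```python
import numpy as np
from scipy import stats

# Hypothetical balanced data: axis 0 = levels of factor A (2),
# axis 1 = levels of factor B (3), axis 2 = replicates per cell (2)
y = np.array([
    [[10, 12], [14, 15], [18, 17]],
    [[20, 19], [23, 25], [30, 28]],
], dtype=float)
a, b, r = y.shape
grand = y.mean()

mean_a = y.mean(axis=(1, 2))   # one mean per level of A
mean_b = y.mean(axis=(0, 2))   # one mean per level of B
cell = y.mean(axis=2)          # one mean per A x B cell

# Sums of squares for the two main effects, the interaction, and the error
ss_a = b * r * np.sum((mean_a - grand) ** 2)
ss_b = a * r * np.sum((mean_b - grand) ** 2)
ss_ab = r * np.sum((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2)
ss_err = np.sum((y - cell[:, :, None]) ** 2)
ss_total = np.sum((y - grand) ** 2)

df_a, df_b = a - 1, b - 1
df_ab = df_a * df_b
df_err = a * b * (r - 1)
ms_err = ss_err / df_err

# F-statistic and p-value for each effect, tested against the error term
results = {}
for name, ss, df in [("A", ss_a, df_a), ("B", ss_b, df_b), ("A:B", ss_ab, df_ab)]:
    f_val = (ss / df) / ms_err
    results[name] = (f_val, stats.f.sf(f_val, df, df_err))
    print(name, "F =", f_val, "p =", results[name][1])
```

In a balanced design like this one the four sums of squares add up exactly to the total, and the Type I, II, and III sums of squares coincide. With unequal cell sizes that is no longer true, and a model-based routine such as anova_lm is the safer choice.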

Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?

Ans--

In a one-way ANOVA, the F-statistic is used to test whether there are significant differences in means among the groups. The associated p-value is the probability of observing an F-statistic at least this large if there were no true differences between the group means (the null hypothesis). A small p-value suggests that the observed differences are unlikely to have occurred by chance alone, leading to the rejection of the null hypothesis.

In your scenario, you obtained an F-statistic of 5.23 and a p-value of 0.02. Here's how you can interpret these results:

1. **F-Statistic:** The F-statistic value of 5.23 indicates the ratio of variability between the group means to the variability within the groups. A larger F-statistic suggests a larger difference between group means relative to the variability within the groups.

2. **P-Value:** The p-value of 0.02 is below the conventional significance level of 0.05. This indicates that the probability of observing such a large F-statistic under the assumption of no true differences between the group means is only 0.02. In other words, the p-value is small enough to suggest that the observed differences are statistically significant.

3. **Conclusion:** Given the small p-value, you would reject the null hypothesis at the 0.05 level. This means that at least one group mean differs significantly from the others. The F-test alone does not identify which groups differ; a post-hoc test is needed for that. In practical terms, you have evidence to suggest that the groups are not all equal with respect to the variable being studied.

4. **Practical Significance:** While statistical significance is important, it's also essential to consider the practical significance of the differences. Even though the differences are statistically significant, they might not be large enough to have meaningful implications in the real world. Interpretation of results should always take into account both statistical and practical significance.

In summary, with an F-statistic of 5.23 and a p-value of 0.02, you can conclude that there are statistically significant differences between the groups' means. This result suggests that the groups are not all equal with respect to the variable you examined in your analysis.
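The link between an F-statistic and its p-value depends on the degrees of freedom, which the question does not state. The sketch below therefore assumes hypothetical values (3 groups with 5 observations each) purely to show the mechanics, so its p-value is not expected to equal the 0.02 quoted in the question.

```python
from scipy import stats

f_statistic = 5.23

# Hypothetical degrees of freedom (not given in the question):
# 3 groups and 15 observations in total
df_between = 3 - 1
df_within = 15 - 3

# Upper-tail probability of the F-distribution gives the p-value
p_value = stats.f.sf(f_statistic, df_between, df_within)

# Critical value at the 5% significance level
critical_f = stats.f.ppf(0.95, df_between, df_within)

# Eta squared recovered from F and the degrees of freedom
eta_squared = (f_statistic * df_between) / (f_statistic * df_between + df_within)

print("p-value:", p_value)
print("Critical F at alpha = 0.05:", critical_f)
print("Eta squared:", eta_squared)
```

Rejecting the null hypothesis when `p_value < 0.05` is equivalent to rejecting it when `f_statistic > critical_f`. The effect size speaks to practical significance, which the p-value alone does not.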

Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential
consequences of using different methods to handle missing data?

Ans--

Handling missing data in a repeated measures ANOVA is important to ensure the accuracy and reliability of your analysis. Different methods for handling missing data can have varying effects on the results and conclusions of your study. Here are some common methods for handling missing data in a repeated measures ANOVA and their potential consequences:

**1. Listwise Deletion:**
Listwise deletion involves removing participants who have missing data on any of the measured variables. While it's a straightforward approach, it can lead to a loss of statistical power and potential bias if the missing data are not missing completely at random (MCAR). This method can result in a smaller sample size and potentially biased estimates of means and variances.

**2. Pairwise Deletion:**
Pairwise deletion involves using only the available data for each specific analysis, discarding incomplete cases on a pairwise basis. While this retains more data compared to listwise deletion, it can still lead to biased estimates and standard errors if the missing data are not MCAR. Additionally, the results may vary depending on which pairs of observations are used for analysis.

**3. Imputation Methods:**
Imputation involves estimating missing values based on observed values and certain assumptions. Common imputation methods include mean imputation (replacing missing values with the mean of the variable), regression imputation (predicting missing values based on other variables), and multiple imputation (creating multiple imputed datasets to account for uncertainty).

   - **Potential Consequences:** While imputation preserves sample size and can reduce bias compared to deletion methods, it introduces bias if the imputation model is misspecified. Single imputation (mean or regression imputation) also treats the filled-in values as if they had been observed, which understates the variability in the data and leads to underestimated standard errors and p-values that are too small. Multiple imputation is designed to carry that uncertainty through to the final estimates.

**4. Mixed Models (Longitudinal Models):**
Mixed models (also known as hierarchical linear models or multilevel models) are advanced statistical methods that can handle missing data by utilizing all available data while accounting for the underlying structure of repeated measures data. These models can estimate both within-subject and between-subject effects and handle missing data patterns more flexibly.

   - **Potential Consequences:** Mixed models provide a more accurate representation of the data's structure and can yield unbiased estimates when data are missing at random (MAR), not only when they are missing completely at random (MCAR). They do not by themselves correct for data that are missing not at random (MNAR). They can also be more complex to implement and still require assumptions about the missing data mechanism.

**5. Sensitivity Analysis:**
Performing sensitivity analyses involves assessing how different methods of handling missing data affect the results. By comparing the outcomes using different approaches, you can assess the robustness of your conclusions.

In summary, the choice of method for handling missing data in a repeated measures ANOVA depends on the nature of the missing data and the assumptions you can reasonably make about the data mechanism. It's essential to carefully consider the potential consequences of each method and to report your chosen method and its rationale in your research findings.
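The difference between listwise deletion and mean imputation can be seen on a small example. This is a sketch on hypothetical scores, with `np.nan` marking missed measurements. It only illustrates the mechanics and is not a recommendation for either method.

```python
import numpy as np

# Hypothetical repeated measures: rows are subjects, columns are time points
scores = np.array([
    [60.0, 65.0, 70.0],
    [55.0, np.nan, 66.0],
    [70.0, 72.0, 78.0],
    [65.0, 70.0, np.nan],
    [58.0, 61.0, 69.0],
])

# Listwise deletion: keep only subjects observed at every time point
complete_rows = ~np.isnan(scores).any(axis=1)
listwise = scores[complete_rows]

# Mean imputation: fill each gap with the mean of the observed values
# at that time point
col_means = np.nanmean(scores, axis=0)
imputed = np.where(np.isnan(scores), col_means, scores)

# Sample variance per time point, before and after imputation
var_observed = np.nanvar(scores, axis=0, ddof=1)
var_imputed = imputed.var(axis=0, ddof=1)

print("Subjects kept by listwise deletion:", listwise.shape[0], "of", scores.shape[0])
print("Variance of observed values:", var_observed)
print("Variance after mean imputation:", var_imputed)
```

Listwise deletion discards two of the five subjects, while mean imputation keeps all five but shrinks the variance at every time point that had a gap. That shrinkage is the mechanism behind the underestimated standard errors mentioned above.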

Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide
an example of a situation where a post-hoc test might be necessary.

Ans--


After conducting an ANOVA and finding a significant overall effect, post-hoc tests are used to determine which specific group means are significantly different from each other. Post-hoc tests help avoid the issue of inflating the Type I error rate when conducting multiple pairwise comparisons. There are several post-hoc tests available, and the choice depends on the design and assumptions of your study. Here are some common post-hoc tests and when to use them:

**1. Tukey's Honestly Significant Difference (HSD):**
Tukey's HSD is often used when you have a balanced design (equal sample sizes) and you want to compare all possible pairs of group means. It controls the familywise error rate, making it suitable for making multiple comparisons.

**2. Bonferroni Correction:**
The Bonferroni correction is a conservative approach that divides the desired alpha level by the number of comparisons being made. It's commonly used to control the overall Type I error rate when making multiple comparisons. It can be appropriate when you're concerned about making a large number of comparisons.

**3. Scheffe's Method:**
Scheffe's method is a more conservative post-hoc test that can be used with any design, including unbalanced designs. It provides a more general control of the Type I error rate, making it suitable for situations where you have complex designs or unequal sample sizes.

**4. Dunnett's Test:**
Dunnett's test is used when you have one control group and you want to compare each treatment group to the control group. This is particularly useful in situations where you have a predefined control group and are interested in assessing how other groups differ from it.

**5. Games-Howell Test:**
The Games-Howell test is appropriate when you have unequal variances and potentially unequal sample sizes among groups. It's a more robust alternative when the assumption of equal variances is violated.

Example situation where a post-hoc test might be necessary:

Suppose you're conducting a study to compare the effectiveness of three different teaching methods (A, B, and C) on student performance. After performing a one-way ANOVA, you find a significant difference in means (p < 0.05). Now, you want to know which specific teaching methods are different from each other.

Here's how you might use a post-hoc test:

1. Conduct the one-way ANOVA to determine overall differences.
2. Since you have three teaching methods, you decide to perform a post-hoc test to compare the group means.
3. Depending on the assumptions of your data (balanced vs. unbalanced, equal variances vs. unequal variances), you choose an appropriate post-hoc test like Tukey's HSD, Scheffe's method, or Games-Howell.
4. Perform the post-hoc test to identify which specific pairs of teaching methods have significantly different means.
5. Interpret the results to draw conclusions about the relative effectiveness of the teaching methods.

Post-hoc tests provide valuable insights by helping you identify significant differences among group means while controlling for the increased chance of Type I error due to multiple comparisons.
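As one concrete instance of the steps above, the Bonferroni approach can be sketched with pairwise t-tests from SciPy. The scores for the three teaching methods are hypothetical, and Bonferroni is chosen here only because it is the simplest correction to write out, not because it is the best choice for this design.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Hypothetical scores for three teaching methods (illustrative only)
scores = {
    "A": np.array([78, 82, 85, 80, 79, 84], dtype=float),
    "B": np.array([88, 90, 86, 91, 89, 87], dtype=float),
    "C": np.array([79, 81, 83, 80, 82, 78], dtype=float),
}

# All pairwise comparisons: (A, B), (A, C), (B, C)
pairs = list(combinations(scores, 2))

# Bonferroni: divide the familywise alpha by the number of comparisons
alpha = 0.05
adjusted_alpha = alpha / len(pairs)

results = {}
for first, second in pairs:
    t_stat, p_val = stats.ttest_ind(scores[first], scores[second])
    significant = bool(p_val < adjusted_alpha)
    results[(first, second)] = (p_val, significant)
    print(f"{first} vs {second}: p = {p_val:.4f}, significant = {significant}")
```

Each pair is judged against `adjusted_alpha` rather than 0.05, which keeps the chance of at least one false positive across the three comparisons at or below 5%.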

Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
to determine if there are any significant differences between the mean weight loss of the three diets.
Report the F-statistic and p-value, and interpret the results.

Ans--

Here's an example of how you can conduct a one-way ANOVA using Python to analyze the mean weight loss of three diets (A, B, and C):

In [2]:
import numpy as np
import scipy.stats as stats

# Sample data for weight loss in each diet group
diet_A = np.array([2.5, 3.0, 2.2, 2.8, 2.6, 2.7, 2.9, 3.2, 2.8, 3.1,
                   2.4, 2.7, 2.9, 2.5, 2.6, 2.8, 2.3, 2.5, 2.4, 2.6,
                   2.7, 2.6, 2.8, 3.0, 2.9, 2.7, 2.4, 2.3, 2.5, 2.8,
                   3.0, 2.7, 2.6, 2.4, 2.5, 2.9, 3.2, 2.3, 2.8, 2.5,
                   2.6, 2.4, 2.2, 2.7, 2.8, 3.1, 2.9, 2.5, 2.6, 2.4])

diet_B = np.array([1.8, 2.0, 1.9, 2.1, 1.7, 2.2, 2.5, 1.9, 2.3, 2.1,
                   1.6, 2.0, 2.2, 2.3, 2.1, 1.8, 2.0, 2.3, 2.2, 2.1,
                   2.0, 1.8, 1.9, 2.2, 2.1, 2.3, 2.0, 1.7, 2.2, 1.8,
                   2.1, 1.9, 2.0, 2.3, 1.8, 2.1, 2.2, 2.0, 2.3, 2.1,
                   2.0, 1.7, 1.9, 2.1, 1.8, 2.2, 2.0, 2.3, 2.1, 2.2])

diet_C = np.array([0.5, 0.8, 0.7, 0.9, 0.6, 0.8, 0.9, 0.7, 0.6, 0.5,
                   0.8, 0.9, 0.7, 0.6, 0.5, 0.9, 0.7, 0.8, 0.6, 0.7,
                   0.8, 0.9, 0.5, 0.6, 0.7, 0.8, 0.9, 0.7, 0.5, 0.6,
                   0.8, 0.9, 0.7, 0.6, 0.5, 0.8, 0.7, 0.6, 0.9, 0.8,
                   0.5, 0.7, 0.9, 0.6, 0.8, 0.7, 0.5, 0.6, 0.9, 0.8])

# Combine data from all diet groups
all_data = np.concatenate([diet_A, diet_B, diet_C])

# Group labels for each observation
group_labels = np.array(['A'] * len(diet_A) + ['B'] * len(diet_B) + ['C'] * len(diet_C))

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

# Print the results
print("F-statistic:", f_statistic)
print("p-value:", p_value)

# Interpret the results
alpha = 0.05
if p_value < alpha:
    print("There are significant differences in mean weight loss among the diet groups.")
else:
    print("There are no significant differences in mean weight loss among the diet groups.")

F-statistic: 1219.2978207428148
p-value: 2.9827680935806507e-92
There are significant differences in mean weight loss among the diet groups.


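In this example, the np.array objects diet_A, diet_B, and diet_C represent the weight loss data for each diet group. The stats.f_oneway function is used to perform the one-way ANOVA, and the F-statistic and p-value are printed. With F ≈ 1219.3 and p ≈ 2.98e-92, far below 0.05, the null hypothesis of equal mean weight loss is rejected: at least one diet differs from the others. The test does not say which diets differ; a post-hoc test such as Tukey's HSD would be needed for that. Note that the sample arrays hold 50 values per diet (150 in total), whereas the question describes 50 participants overall, so the numbers are illustrative only.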

Remember that you'll need the NumPy and SciPy libraries installed to run this code.

Q10. A company wants to know if there are any significant differences in the average time it takes to
complete a task using three different software programs: Program A, Program B, and Program C. They
randomly assign 30 employees to one of the programs and record the time it takes each employee to
complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or
interaction effects between the software programs and employee experience level (novice vs.
experienced). Report the F-statistics and p-values, and interpret the results.

Ans--

To conduct a two-way ANOVA in Python, you can use the statsmodels library. Here's how you can perform the analysis for this scenario:

In [4]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Sample data
data = {
    'Software': np.repeat(['A', 'B', 'C'], 30),
    'Experience': np.tile(['Novice', 'Experienced'], 45),
    'Time': np.random.normal(loc=15, scale=2, size=90)  # Simulated time data
}

# Create a DataFrame
df = pd.DataFrame(data)

# Convert Experience column to a categorical variable
df['Experience'] = pd.Categorical(df['Experience'])

# Perform two-way ANOVA
model = ols('Time ~ C(Software) + C(Experience) + C(Software):C(Experience)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Print the ANOVA table
print(anova_table)

                               sum_sq    df         F    PR(>F)
C(Software)                  2.922222   2.0  0.285864  0.752093
C(Experience)                0.099920   1.0  0.019549  0.889138
C(Software):C(Experience)    1.950725   2.0  0.190828  0.826632
Residual                   429.341987  84.0       NaN       NaN


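In this example, replace the simulated time data with your own time measurements. The Software and Experience columns represent the software program and employee experience level, respectively. Note that the question describes 30 employees in total, whereas this sample simulates 90 observations (30 per program). Because no random seed is set, the numbers in the table will change on every run.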

The C() function is used to treat the variables as categorical factors. The interaction term C(Software):C(Experience) captures the interaction effect between software programs and experience levels.

The anova_lm function calculates the ANOVA table, which includes the main effects of software programs, experience levels, and the interaction effect. The typ=2 argument specifies the type of sum of squares calculation.

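Interpretation of the results:
- Check the p-values associated with each effect (Software, Experience, Interaction).
- If any p-value is below the significance level (e.g., 0.05), you can conclude that there is a significant effect. The F-statistic measures the variance explained by that effect relative to the residual variance.
- If the interaction effect is significant, it suggests that the combined effects of software programs and experience levels are not additive; they interact in influencing the task completion time.
- In the table above, all three p-values (about 0.75 for Software, 0.89 for Experience, and 0.83 for the interaction) are well above 0.05, so there is no evidence of main effects or of an interaction. That is expected here, because every simulated time is drawn from the same normal distribution regardless of program or experience level.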

Remember that this example uses simulated data, so make sure to replace it with your actual data. Additionally, you need to have the numpy, pandas, and statsmodels libraries installed to run the code.

Q11. An educational researcher is interested in whether a new teaching method improves student test
scores. They randomly assign 100 students to either the control group (traditional teaching method) or the
experimental group (new teaching method) and administer a test at the end of the semester. Conduct a
two-sample t-test using Python to determine if there are any significant differences in test scores
between the two groups. If the results are significant, follow up with a post-hoc test to determine which
group(s) differ significantly from each other.

Ans--

You can use Python and the scipy library to conduct a two-sample t-test to compare the test scores of the control and experimental groups. With only two groups, a significant t-test already identifies the one pair that differs, so a post-hoc test is not strictly needed; it is shown here only because the question asks for it. Here's how you can do it:

In [None]:
import numpy as np
import scipy.stats as stats

# Sample data
control_group = np.array([85, 78, 92, 70, 88, 80, 95, 72, 75, 85,
                          88, 90, 78, 82, 80, 85, 76, 80, 84, 88,
                          82, 86, 75, 79, 81, 77, 89, 74, 83, 79,
                          85, 73, 81, 76, 88, 84, 82, 90, 86, 79,
                          81, 84, 89, 83, 77, 80, 75, 88, 82, 86])

experimental_group = np.array([92, 89, 98, 82, 94, 87, 96, 80, 85, 92,
                               94, 96, 88, 90, 84, 91, 82, 86, 89, 93,
                               90, 91, 81, 85, 88, 83, 95, 82, 88, 84,
                               91, 79, 88, 82, 96, 90, 88, 97, 91, 85,
                               89, 92, 95, 90, 84, 86, 82, 96, 88, 91])

# Perform two-sample t-test
t_statistic, p_value = stats.ttest_ind(control_group, experimental_group)

# Print the results of the t-test
print("t-statistic:", t_statistic)
print("p-value:", p_value)

# Check if the results are significant
alpha = 0.05
if p_value < alpha:
    print("There is a significant difference in test scores between the two groups.")
else:
    print("There is no significant difference in test scores between the two groups.")

# Perform post-hoc test (if significant)
if p_value < alpha:
    # Tukey's HSD via scipy.stats.tukey_hsd (available in SciPy 1.8 or later).
    # With only two groups there is a single pairwise comparison, so this
    # examines the same pair as the t-test above.
    posthoc_results = stats.tukey_hsd(control_group, experimental_group)
    print("Post-hoc test results:")
    print(posthoc_results)

In this example, the control_group and experimental_group arrays represent the test scores of the control and experimental groups, respectively.

The stats.ttest_ind function performs the two-sample t-test to compare the means of the two groups. The p-value is then compared to the significance level (alpha) to determine if there is a significant difference in test scores between the groups.

If the results are significant, a post-hoc test (Tukey's HSD, through stats.tukey_hsd) is performed. Because there are only two groups, it compares the same single pair as the t-test and serves mainly as a template for designs with three or more groups. Make sure you have the numpy and scipy libraries installed to run the code; stats.tukey_hsd requires a recent SciPy release.
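
For a two-group comparison, an effect size and a variance-robust test are often more informative follow-ups than a post-hoc test. The sketch below uses only the first ten scores from each array above to stay short. The choice of Welch's t-test and Cohen's d is an assumption made here, not something the question requires.

```python
import numpy as np
from scipy import stats

# First ten scores of each group from the example above (subset for brevity)
control = np.array([85, 78, 92, 70, 88, 80, 95, 72, 75, 85], dtype=float)
experimental = np.array([92, 89, 98, 82, 94, 87, 96, 80, 85, 92], dtype=float)

# Welch's t-test: does not assume equal variances in the two groups
t_welch, p_welch = stats.ttest_ind(control, experimental, equal_var=False)

# Cohen's d: mean difference in units of the pooled standard deviation
n1, n2 = len(control), len(experimental)
pooled_var = ((n1 - 1) * control.var(ddof=1)
              + (n2 - 1) * experimental.var(ddof=1)) / (n1 + n2 - 2)
cohens_d = (experimental.mean() - control.mean()) / np.sqrt(pooled_var)

print("Welch t-statistic:", t_welch)
print("Welch p-value:", p_welch)
print("Cohen's d:", cohens_d)
```

The sign of `t_welch` is negative because the control group is passed first and has the lower mean. Cohen's d is computed as experimental minus control, so a positive value means the new method scored higher.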

Q12. A researcher wants to know if there are any significant differences in the average daily sales of three
retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store
on those days. Conduct a repeated measures ANOVA using Python to determine if there are any significant differences in sales between the three stores. If the results are significant, follow up with a post-hoc test to determine which store(s) differ significantly from each other.