## Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact the validity of the results.

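ANOVA rests on three assumptions: observations are independent, residuals are approximately normally distributed, and the variance is roughly equal across groups (homogeneity of variances). Typical violations include correlated samples (for example, repeated measurements on the same subject analyzed as if they were independent), strongly skewed or heavy-tailed residuals, and one group whose spread is much larger than the others. Any of these can distort the Type I error rate of the F-test and lead to misleading conclusions; unequal variances are most harmful when group sizes also differ.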
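
The normality and equal-variance assumptions can be checked before running the F-test. The following is a minimal sketch, assuming simulated data in which the third group is deliberately given a larger spread; the group values and the seed are hypothetical, and Shapiro-Wilk and Levene are only one reasonable choice of tests.

```python
import numpy as np
from scipy.stats import levene, shapiro

rng = np.random.default_rng(42)

# Hypothetical data: three groups, the third with a deliberately larger spread
groups = [
    rng.normal(loc=10, scale=1, size=40),
    rng.normal(loc=11, scale=1, size=40),
    rng.normal(loc=12, scale=5, size=40),
]

# Normality is assessed on the residuals (each observation minus its own group mean)
residuals = np.concatenate([g - g.mean() for g in groups])
shapiro_stat, shapiro_p = shapiro(residuals)

# Levene's test: the null hypothesis is that all groups share the same variance
levene_stat, levene_p = levene(*groups)

print(f"Shapiro-Wilk p-value: {shapiro_p:.4f}")
print(f"Levene p-value: {levene_p:.4f}")
```

A small Levene p-value signals unequal variances, in which case Welch's ANOVA or a nonparametric alternative such as Kruskal-Wallis may be more appropriate. Independence cannot be tested this way; it has to follow from the study design.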

## Q2. What are the three types of ANOVA, and in what situations would each be used?

The three types of ANOVA are one-way ANOVA (used when comparing means across three or more independent groups), two-way ANOVA (used when examining the interaction between two independent variables and their effects on a dependent variable), and repeated measures ANOVA (used for analyzing the same subjects across multiple conditions or time points). Each type is selected based on the design of the experiment and the nature of the data being analyzed.

## Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

Partitioning of variance in ANOVA involves breaking down the total variance into components attributable to different sources, such as between-group variance and within-group variance. Understanding this concept is crucial as it helps determine the extent to which group differences are significant relative to the natural variability within groups.
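
For a one-way design, the partition can be written out explicitly. The notation below is an assumption introduced for illustration: $k$ groups, $n_j$ observations in group $j$, $N$ observations in total, $\bar{y}_j$ the mean of group $j$, and $\bar{y}$ the grand mean.

$$
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(y_{ij}-\bar{y}\right)^2}_{\text{total}}
=
\underbrace{\sum_{j=1}^{k} n_j\left(\bar{y}_j-\bar{y}\right)^2}_{\text{between groups}}
+
\underbrace{\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(y_{ij}-\bar{y}_j\right)^2}_{\text{within groups}}
$$

$$
F = \frac{\text{between-group sum of squares}/(k-1)}{\text{within-group sum of squares}/(N-k)}
$$

The F-statistic is therefore the ratio of the two mean squares: a large value means the group means are spread out more than the within-group noise alone would explain.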

## Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual sum of squares (SSR) in a one-way ANOVA using Python?

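In [6]:
import numpy as np

# Illustrative data: three groups of four observations each
group1 = np.array([4.0, 5.0, 6.0, 5.0])
group2 = np.array([7.0, 8.0, 6.0, 7.0])
group3 = np.array([5.0, 6.0, 7.0, 6.0])

data = [group1, group2, group3]
grand_mean = np.mean(np.concatenate(data))
SST = np.sum((np.concatenate(data) - grand_mean) ** 2)  # total
SSE = np.sum([len(group) * (np.mean(group) - grand_mean) ** 2 for group in data])  # explained (between groups)
SSR = SST - SSE  # residual (within groups)

print(f"SST: {SST:.2f}, SSE: {SSE:.2f}, SSR: {SSR:.2f}")


SST: 14.00, SSE: 8.00, SSR: 6.00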
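
As a sanity check, the sums of squares can be turned into an F-statistic by hand and compared with `scipy.stats.f_oneway`. This is a sketch on small hypothetical data (the three arrays are made up for illustration); here the within-group sum of squares is computed directly rather than by subtraction, so the partition itself is also verified.

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical measurements for three groups
group1 = np.array([4.0, 5.0, 6.0, 5.0])
group2 = np.array([7.0, 8.0, 6.0, 7.0])
group3 = np.array([5.0, 6.0, 7.0, 6.0])
data = [group1, group2, group3]

all_values = np.concatenate(data)
grand_mean = all_values.mean()
k, n_total = len(data), len(all_values)

# Total, between-group (explained) and within-group (residual) sums of squares
sst = np.sum((all_values - grand_mean) ** 2)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in data)
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in data)

# F is the ratio of the two mean squares
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
f_scipy, p_scipy = f_oneway(*data)

print(f"manual F = {f_manual:.4f}, scipy F = {f_scipy:.4f}, p = {p_scipy:.4f}")
```

For this data the between-group sum of squares is 8 and the within-group sum is 6, so F = (8 / 2) / (6 / 9) = 6.0, which agrees with the value returned by `f_oneway`.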

## Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?

In [9]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Illustrative balanced 2x2 design with three replicates per cell
data = pd.DataFrame({
    'DependentVar': [10, 12, 11, 14, 15, 13, 20, 22, 21, 30, 29, 31],
    'Factor1': ['a'] * 6 + ['b'] * 6,
    'Factor2': (['x'] * 3 + ['y'] * 3) * 2,
})

# Factor1 * Factor2 expands to both main effects plus the Factor1:Factor2 interaction
model = ols('DependentVar ~ Factor1 * Factor2', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

## Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02. What can you conclude about the differences between the groups, and how would you interpret these results?

With an F-statistic of 5.23 and a p-value of 0.02, you can conclude that there are statistically significant differences between the groups, as the p-value is less than the common alpha level of 0.05. This suggests that at least one group mean differs from the others, warranting further investigation through post-hoc tests to identify which groups are significantly different.
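
The link between the F-statistic and the p-value can be made concrete with the F distribution. The sketch below assumes degrees of freedom of 2 and 27 (three groups, 30 observations); the question does not state them, so the p-value computed here will not necessarily equal the reported 0.02.

```python
from scipy.stats import f

f_statistic = 5.23
alpha = 0.05

# Assumed degrees of freedom (not given in the question): 3 groups, 30 observations
df_between, df_within = 2, 27

# p-value: probability of an F at least this large if all group means are equal
p_value = f.sf(f_statistic, df_between, df_within)

# Critical value: the F above which the null hypothesis is rejected at level alpha
f_critical = f.ppf(1 - alpha, df_between, df_within)

reject_null = p_value < alpha
print(f"p-value: {p_value:.4f}, critical F: {f_critical:.2f}, reject H0: {reject_null}")
```

Comparing the p-value with alpha and comparing the observed F with the critical F are two views of the same decision rule.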

## Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential consequences of using different methods to handle missing data?

In a repeated measures ANOVA, you can handle missing data by using techniques such as Last Observation Carried Forward (LOCF), multiple imputation, or mixed-effects models that can accommodate missingness. The choice of method can impact the validity of results; for instance, LOCF may bias results by not accounting for variability, while multiple imputation may provide more accurate estimates but relies on assumptions about the missing data mechanism.
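
The trade-offs can be seen on a small example. The following sketch uses simulated scores for 20 hypothetical subjects at three time points, with a few values removed by hand; the column names, effect sizes, and missingness pattern are all assumptions. It contrasts complete-case analysis, LOCF, and a random-intercept mixed model fitted with `statsmodels`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subjects, times = 20, ["t1", "t2", "t3"]

# Hypothetical scores: a per-subject offset plus a small upward trend over time
subject_effect = rng.normal(0, 2, n_subjects)
wide = pd.DataFrame(
    {t: 10 + i + subject_effect + rng.normal(0, 1, n_subjects) for i, t in enumerate(times)}
)
wide.index.name = "subject"

# Remove a few measurements to mimic dropout and a skipped visit
wide.loc[[1, 5, 9], "t3"] = np.nan
wide.loc[[3], "t2"] = np.nan

# Option 1: complete-case analysis discards every subject with any missing value
complete_cases = wide.dropna()

# Option 2: LOCF copies the previous time point forward, which understates change
locf = wide.ffill(axis=1)

# Option 3: a mixed model uses all observed rows in long format
long_df = (
    wide.reset_index()
    .melt(id_vars="subject", var_name="time", value_name="score")
    .dropna()
)
mixed = smf.mixedlm("score ~ C(time)", long_df, groups=long_df["subject"]).fit()

print(f"complete cases: {len(complete_cases)} of {n_subjects} subjects")
print(f"rows used by the mixed model: {len(long_df)}")
print(mixed.params)
```

Complete-case analysis loses four subjects here, while the mixed model keeps their observed measurements. The mixed model's estimates are trustworthy only if the data are missing at random given the observed values, which is itself an assumption that cannot be verified from the data alone.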

## Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide an example of a situation where a post-hoc test might be necessary.

Common post-hoc tests after ANOVA include Tukey's HSD (used when you want to compare all pairs of group means while controlling for Type I error), Bonferroni correction (used for a smaller number of comparisons to reduce the risk of false positives), and Scheffé's test (used for comparing complex contrasts among group means). For example, after finding significant differences in a one-way ANOVA comparing the effectiveness of three different teaching methods, a post-hoc test would be necessary to identify which specific methods differ from each other.
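
The teaching-methods example can be sketched with Tukey's HSD from `statsmodels`. The scores below are simulated, with method C given a higher mean on purpose; the method labels, sample sizes, and seed are hypothetical.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(7)

# Hypothetical test scores for three teaching methods (C simulated with a higher mean)
scores = np.concatenate([
    rng.normal(70, 5, 30),  # method A
    rng.normal(71, 5, 30),  # method B
    rng.normal(80, 5, 30),  # method C
])
methods = np.repeat(["A", "B", "C"], 30)

# All pairwise comparisons with the family-wise error rate held at 5%
tukey = pairwise_tukeyhsd(endog=scores, groups=methods, alpha=0.05)
print(tukey.summary())
```

Each row of the summary is one pair of methods, with the mean difference, the adjusted p-value, a confidence interval, and a reject flag.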

## Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from 50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python to determine if there are any significant differences between the mean weight loss of the three diets. Report the F-statistic and p-value, and interpret the results.

In [14]:
import numpy as np
from scipy.stats import f_oneway

diet_A = np.random.normal(loc=5, scale=1, size=50)  
diet_B = np.random.normal(loc=7, scale=1, size=50)  
diet_C = np.random.normal(loc=6, scale=1, size=50)  

F_statistic, p_value = f_oneway(diet_A, diet_B, diet_C)

print(f"F-statistic: {F_statistic:.2f}, p-value: {p_value:.4f}")


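F-statistic: 48.81, p-value: 0.0000

The p-value prints as 0.0000 only because it is rounded to four decimals; it should be reported as p < 0.0001. Since this is far below 0.05, we reject the null hypothesis that all three diets produce the same mean weight loss: at least one diet differs from the others. The F-test does not say which one, so a post-hoc test such as Tukey's HSD would be the next step. Note that the code simulates 50 observations per diet (150 in total) rather than 50 participants overall, and no random seed is set, so the exact values change from run to run.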


## Q10. A company wants to know if there are any significant differences in the average time it takes to complete a task using three different software programs: Program A, Program B, and Program C. They randomly assign 30 employees to one of the programs and record the time it takes each employee to complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or interaction effects between the software programs and employee experience level (novice vs. experienced). Report the F-statistics and p-values, and interpret the results.

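In [16]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Simulated placeholder data: 30 employees, 5 per Program x Experience cell
rng = np.random.default_rng(0)
data = pd.DataFrame({
    'Program': np.repeat(['A', 'B', 'C'], 10),
    'Experience': np.tile(np.repeat(['Novice', 'Experienced'], 5), 3),
})
# Hypothetical completion times; no real effect is built into the simulation
data['Time'] = rng.normal(loc=30, scale=5, size=len(data))

model = ols('Time ~ Program * Experience', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Rows: Program and Experience (main effects), Program:Experience (interaction)
print(anova_table)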

## Q11. An educational researcher is interested in whether a new teaching method improves student test scores. They randomly assign 100 students to either the control group (traditional teaching method) or the experimental group (new teaching method) and administer a test at the end of the semester. Conduct a two-sample t-test using Python to determine if there are any significant differences in test scores between the two groups. If the results are significant, follow up with a post-hoc test to determine which group(s) differ significantly from each other.

In [18]:
import numpy as np
from scipy.stats import ttest_ind
import statsmodels.api as sm
from statsmodels.stats.multicomp import pairwise_tukeyhsd

control_group = np.random.normal(loc=75, scale=10, size=50)  
experimental_group = np.random.normal(loc=80, scale=10, size=50)  

t_stat, p_value = ttest_ind(control_group, experimental_group)

if p_value < 0.05:
    all_scores = np.concatenate([control_group, experimental_group])
    groups = ['Control'] * len(control_group) + ['Experimental'] * len(experimental_group)
    post_hoc = pairwise_tukeyhsd(all_scores, groups)
    print(post_hoc)
else:
    print(f"T-test p-value: {p_value:.4f} (not significant)")


   Multiple Comparison of Means - Tukey HSD, FWER=0.05   
 group1    group2    meandiff p-adj  lower  upper  reject
---------------------------------------------------------
Control Experimental    6.476 0.0011 2.6459 10.306   True
---------------------------------------------------------

The Tukey table reports a mean difference of 6.476 points (experimental minus control), with an adjusted p-value of 0.0011 and a 95% confidence interval of [2.65, 10.31] that excludes zero, so the difference is statistically significant. With only two groups there is a single pairwise comparison, so the post-hoc test adds nothing beyond the t-test itself; post-hoc tests become necessary only with three or more groups. As no random seed is set, these numbers vary between runs.


## Q12. A researcher wants to know if there are any significant differences in the average daily sales of three retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store on those days. Conduct a repeated measures ANOVA using Python to determine if there are any significant differences in sales between the three stores. If the results are significant, follow up with a posthoc test to determine which store(s) differ significantly from each other.

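In [20]:
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Simulated placeholder data: 30 days, with every store observed on every day
rng = np.random.default_rng(0)
data = pd.DataFrame({
    'Day': np.repeat(np.arange(1, 31), 3),
    'Store': np.tile(['A', 'B', 'C'], 30),
})
data['Sales'] = rng.normal(loc=1000, scale=100, size=len(data))

# Day plays the role of the repeated "subject"; Store is the within-subject factor
anova_results = AnovaRM(data, 'Sales', 'Day', within=['Store']).fit()
print(anova_results)

# The fitted result has no .pvalue attribute; read the p-value from its ANOVA table
p_value = anova_results.anova_table['Pr > F'].iloc[0]
if p_value < 0.05:
    # Caution: Tukey's HSD treats the stores as independent samples and ignores the pairing by day
    posthoc = pairwise_tukeyhsd(data['Sales'], data['Store'])
    print(posthoc)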
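
Because the three stores are measured on the same days, a post-hoc procedure that respects the pairing may be preferable to Tukey's HSD. One common option is paired t-tests with a Bonferroni correction, sketched below on simulated sales in which Store C is given a higher mean; the figures, seed, and column names are hypothetical.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import ttest_rel
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
n_days = 30

# Day-to-day variation shared by all stores (this is what the pairing removes)
day_effect = rng.normal(0, 50, n_days)

# Hypothetical daily sales: Store C is simulated with a higher mean
sales = pd.DataFrame({
    "A": 1000 + day_effect + rng.normal(0, 20, n_days),
    "B": 1000 + day_effect + rng.normal(0, 20, n_days),
    "C": 1100 + day_effect + rng.normal(0, 20, n_days),
})

# One paired t-test per pair of stores, then a Bonferroni adjustment across the three tests
pairs = list(combinations(sales.columns, 2))
raw_p = [ttest_rel(sales[a], sales[b]).pvalue for a, b in pairs]
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")

for (a, b), p, r in zip(pairs, adj_p, reject):
    print(f"{a} vs {b}: adjusted p = {p:.4f}, reject H0 = {r}")
```

The wide layout (one column per store, one row per day) keeps each day's three measurements aligned, which is what `ttest_rel` requires.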