Q1: Explain the assumptions required to use ANOVA and provide examples of violations that could impact the validity of the results.
Assumptions for ANOVA (Analysis of Variance):
Independence: Observations within and between groups are independent. Violation Example: Data points within a group are correlated,
such as repeated measurements on the same subjects analyzed without accounting for the correlation.
Normality: Residuals (the differences between observed values and their group means) are approximately normally distributed within each group.
Violation Example: Residuals are strongly skewed or heavy-tailed within one or more groups.
Homogeneity of Variance: The variance of the residuals is approximately equal across all groups (homoscedasticity).
Violation Example: One group has a much larger variance than the others, leading to unequal spread in the data.
Homogeneity of Regression Slopes (applies to ANCOVA, where a continuous covariate is added, rather than to plain factorial ANOVA):
The relationship between the covariate and the dependent variable is the same in every group. Violation Example: The slope of
the covariate on the dependent variable differs across groups.
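
The normality and homogeneity-of-variance assumptions can be checked before running the ANOVA. The following is a minimal sketch, assuming three small groups of made-up scores (the values and the names group_a, group_b, and group_c are hypothetical). It uses the Shapiro-Wilk test on the pooled residuals and Levene's test across groups, which are common choices rather than the only ones.

```python
from scipy import stats

# Hypothetical scores for three groups (illustrative values only).
group_a = [82, 85, 88, 90, 79, 84]
group_b = [75, 78, 80, 74, 77, 79]
group_c = [88, 92, 95, 91, 89, 94]
groups = [group_a, group_b, group_c]

# Normality: Shapiro-Wilk test on the residuals (each value minus its group mean).
residuals = [x - sum(g) / len(g) for g in groups for x in g]
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Homogeneity of variance: Levene's test across the groups.
levene_stat, levene_p = stats.levene(*groups)

print(f"Shapiro-Wilk p-value: {shapiro_p:.3f}")
print(f"Levene p-value: {levene_p:.3f}")
```

A small p-value in either test suggests the corresponding assumption may be violated. Independence cannot be tested this way; it has to be judged from how the data were collected.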

Q2: What are the three types of ANOVA, and in what situations would each be used?
One-Way ANOVA: Used to compare means of three or more groups (levels) of one independent variable. For example, comparing the test scores of students in different classes (Class A, Class B, Class C).
Two-Way ANOVA: Used to analyze the influence of two categorical independent variables on a continuous dependent variable. It tests the main effect of each factor and the interaction effect between the two factors. For example, examining how both gender and age group impact test scores.
Repeated Measures ANOVA: Used when the same subjects are used for each treatment (within-subjects design). It assesses the impact of one or more independent variables over multiple time points or conditions. For example, studying the effect of a drug on patients' health over time.

Q3: What is the partitioning of variance in ANOVA, and why is it important to understand this concept?
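The partitioning of variance in ANOVA involves breaking down the total variance in a dataset into different components:
Total Sum of Squares (SST): Measures the total variability of the observations around the grand mean.
Explained Sum of Squares (SSE): Measures the variability explained by the factors or independent variables (the between-group variability).
Residual Sum of Squares (SSR): Measures the unexplained or error variability in the data (the within-group variability).
These components add up: SST = SSE + SSR. Some textbooks use the abbreviations the other way around (SSE for the error sum of squares and SSR for the regression sum of squares), so check which convention a source follows.
Understanding this concept is important because it allows researchers to assess how much of the total variance in the dependent variable can be attributed to the factors being studied (SSE), and how much is due to random variation or unexplained factors (SSR). The F-statistic is the ratio of these two components, each divided by its degrees of freedom, so the partition is what determines the significance of the factors and the overall model fit.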
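
For the one-way case, the partition can be written out explicitly. The notation here is an assumption introduced for illustration: k groups, n_i observations in group i, N observations in total, y_ij the j-th observation in group i, \bar{y}_i the mean of group i, and \bar{y} the grand mean. The labels SST, SSE, and SSR follow the definitions given above.

```latex
\begin{aligned}
SST &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 \\
SSE &= \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2 \\
SSR &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \\
SST &= SSE + SSR \\
F &= \frac{SSE / (k - 1)}{SSR / (N - k)}
\end{aligned}
```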

#Q4
from statistics import mean

# Illustrative sample data for each group (placeholder values; replace with your own data)
group1 = [23, 25, 27, 22, 24]
group2 = [30, 32, 29, 31, 33]
group3 = [26, 28, 27, 25, 29]
groups = [group1, group2, group3]
all_data = group1 + group2 + group3

# Calculate the grand mean
grand_mean = mean(all_data)

# Calculate Total Sum of Squares (SST)
SST = sum((x - grand_mean) ** 2 for x in all_data)

# Calculate Explained Sum of Squares (SSE):
# group size times the squared deviation of each group mean from the grand mean
SSE = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Calculate Residual Sum of Squares (SSR)
SSR = SST - SSE

print("SST:", SST)
print("SSE:", SSE)
print("SSR:", SSR)

#Q5
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Assuming you have a DataFrame df with columns for the two independent variables (factor1 and factor2) and the dependent variable (y)

# Fit the two-way ANOVA model with both main effects and their interaction
model = ols('y ~ C(factor1) * C(factor2)', data=df).fit()

# Build the ANOVA table: one row per main effect, one for the interaction, one for the residual
anova_table = sm.stats.anova_lm(model, typ=2)

# Extract the p-values for the main effects and the interaction effect
main_effect_factor1 = anova_table.loc['C(factor1)', 'PR(>F)']
main_effect_factor2 = anova_table.loc['C(factor2)', 'PR(>F)']
interaction_effect = anova_table.loc['C(factor1):C(factor2)', 'PR(>F)']

print("Main Effect of Factor 1:", main_effect_factor1)
print("Main Effect of Factor 2:", main_effect_factor2)
print("Interaction Effect:", interaction_effect)


Q6: Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02. What can you conclude about the differences between the groups, and how would you interpret these results?

In a one-way ANOVA, the F-statistic tests whether there are significant differences between the means of three or more groups. In your case, with an F-statistic of 5.23 and a p-value of 0.02, you would interpret the results as follows:
Null Hypothesis (H0): The means of all groups are equal.
Alternative Hypothesis (Ha): At least one group mean is different from the others.
Since the p-value (0.02) is less than the chosen significance level (e.g., 0.05), you would reject the null hypothesis. Therefore, you can conclude that there are statistically significant differences between the groups. However, you would need to conduct post-hoc tests to determine which specific groups differ from each other.
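
The decision rule can also be expressed through the F distribution. The sketch below assumes a hypothetical design with 3 groups and 30 observations in total, because the question does not state the degrees of freedom. The p-value it prints therefore will not match the 0.02 quoted in the question; it only illustrates how an F-statistic, its degrees of freedom, and a significance level fit together.

```python
from scipy import stats

f_statistic = 5.23  # from the question
alpha = 0.05

# Hypothetical design (not given in the question): 3 groups, 30 observations in total.
k, n_total = 3, 30
df_between = k - 1
df_within = n_total - k

# Critical F value at alpha and the upper-tail p-value for the observed F.
f_critical = stats.f.ppf(1 - alpha, df_between, df_within)
p_value = stats.f.sf(f_statistic, df_between, df_within)

print(f"Critical F({df_between}, {df_within}) = {f_critical:.2f}")
print(f"p-value for F = {f_statistic}: {p_value:.4f}")
print("Reject H0" if f_statistic > f_critical else "Fail to reject H0")
```

Comparing the F-statistic with the critical value and comparing the p-value with alpha are equivalent ways of making the same decision.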

Q7: In a repeated measures ANOVA, how would you handle missing data, and what are the potential consequences of using different methods to handle missing data?

Handling missing data in a repeated measures ANOVA can be critical to obtaining valid results. There are several methods to handle missing data:
Listwise Deletion: Remove any cases with missing data. This approach can lead to reduced sample size and potential bias if the data is not missing completely at random.
Mean Imputation: Replace missing values with the mean of the available data. This method shrinks the variance and weakens the correlations between time points, which can distort the test results.
Interpolation: Use interpolation techniques to estimate missing values based on the values of adjacent time points. This works best when the outcome changes smoothly over time and only isolated time points are missing.
Multiple Imputation: Generate multiple imputed datasets and conduct the analysis separately on each dataset. Pool the results to obtain valid estimates and standard errors.
The choice of method should depend on the nature and pattern of missing data and should be justified in the analysis. Using inappropriate methods can lead to biased or misleading results.
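
The three simpler methods can be sketched with pandas. This is a minimal illustration on a made-up wide-format table (one row per subject, one column per time point); the subject labels, column names, and values are all hypothetical. Multiple imputation is not shown because it needs a dedicated imputation model.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one row per subject, one column per time point.
wide = pd.DataFrame(
    {
        "t1": [5.0, 6.0, 7.0, 5.5],
        "t2": [5.5, np.nan, 7.5, np.nan],
        "t3": [6.0, 7.0, 8.0, 6.5],
    },
    index=["s1", "s2", "s3", "s4"],
)

# Listwise deletion: keep only subjects with complete data.
listwise = wide.dropna()

# Mean imputation: fill each time point's gaps with that column's mean.
mean_imputed = wide.fillna(wide.mean())

# Linear interpolation across time points within each subject (row-wise).
interpolated = wide.interpolate(axis=1)

print(listwise)
print(mean_imputed)
print(interpolated)
```

Listwise deletion drops half of the subjects here, which shows how quickly the sample shrinks. Mean imputation gives both incomplete subjects the same value, while interpolation keeps each subject's own trend.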

Q8: What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide an example of a situation where a post-hoc test might be necessary.

Common post-hoc tests used after ANOVA include:
Tukey's Honestly Significant Difference (Tukey's HSD): Used to compare all possible pairs of group means. It is suitable when you have three or more groups, and you want to identify which specific pairs of groups differ significantly.
Bonferroni Correction: Adjusts the significance level to control for multiple comparisons. It is conservative but effective in controlling the familywise error rate.
Dunnett's Test: Used when you have one control group and want to compare it to multiple treatment groups. It controls the overall Type I error rate.
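Scheffe's Test: Used to test any contrast among the group means, including complex comparisons such as one group against the average of two others. It can be used with unequal group sizes but is more conservative than Tukey's HSD when only pairwise comparisons are of interest.
Holm-Bonferroni Method: A step-down method that controls the familywise error rate by testing the p-values in order from smallest to largest against progressively less strict thresholds. It is never less powerful than the plain Bonferroni correction.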
Example situation: Suppose you conducted a one-way ANOVA to compare the performance of students in three different teaching methods. After obtaining a significant ANOVA result, you want to determine which specific pairs of teaching methods lead to significantly different outcomes. In this case, you would perform a post-hoc test (e.g., Tukey's HSD) to identify the pairwise differences.
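
The teaching-method example could be followed up as sketched below. The scores and the method labels (lecture, flipped, project) are hypothetical and chosen only for illustration; the sketch uses pairwise_tukeyhsd from statsmodels.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical exam scores for three teaching methods (illustrative values only).
scores = np.array([72, 75, 78, 74, 71,
                   80, 83, 85, 82, 84,
                   90, 88, 92, 91, 89])
methods = np.repeat(["lecture", "flipped", "project"], 5)

# Compare every pair of teaching methods while controlling the familywise error rate.
tukey = pairwise_tukeyhsd(endog=scores, groups=methods, alpha=0.05)
print(tukey.summary())
```

Each row of the summary covers one pair of methods and reports the mean difference, the adjusted p-value, the confidence interval, and whether the null hypothesis of equal means is rejected for that pair.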

#Q9
import scipy.stats as stats

# Illustrative weight loss data (kg) for each diet group (placeholder values; replace with your own data)
diet_A = [2.1, 3.4, 1.8, 2.9, 3.1]
diet_B = [4.2, 3.9, 4.8, 5.1, 4.4]
diet_C = [1.5, 2.2, 1.9, 2.6, 2.0]

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

alpha = 0.05

print("F-Statistic:", f_statistic)
print("p-value:", p_value)

if p_value < alpha:
    print("Reject the null hypothesis: There are significant differences between the mean weight loss of the diets.")
else:
    print("Fail to reject the null hypothesis: There are no significant differences between the mean weight loss of the diets.")


#Q10
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Assuming you have a DataFrame df with columns program (software program), experience (employee experience level), and time (task completion time)

# Fit the two-way ANOVA model
model = ols('time ~ C(program) * C(experience)', data=df).fit()

# Perform two-way ANOVA
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)

#Q11
import scipy.stats as stats
import pandas as pd
from statsmodels.stats.multicomp import MultiComparison

# Assuming you have a DataFrame df with columns for group (control/experimental) and test scores

# Separate data into control and experimental groups
control_group = df[df['group'] == 'control']['test_scores']
experimental_group = df[df['group'] == 'experimental']['test_scores']

# Perform two-sample t-test
t_statistic, p_value = stats.ttest_ind(control_group, experimental_group)

alpha = 0.05

print("Two-Sample T-Test - t-statistic:", t_statistic)
print("Two-Sample T-Test - p-value:", p_value)

if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference in test scores between the control and experimental groups.")
    
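    # If significant, perform post-hoc test (e.g., Tukey's HSD).
    # With only two groups this repeats the single pairwise comparison tested above,
    # so it mainly adds the mean difference and its confidence interval.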
    mc = MultiComparison(df['test_scores'], df['group'])
    posthoc_result = mc.tukeyhsd()
    print(posthoc_result)
else:
    print("Fail to reject the null hypothesis: There is no significant difference in test scores between the groups.")

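#Q12
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Assuming you have a long-format DataFrame df with columns store (Store A, Store B, Store C) and sales (daily sales),
# plus a day column (an assumed identifier for the day on which all three stores were measured)

# Set up the repeated measures ANOVA: each day is the repeated unit and store is the within-subject factor.
# AnovaRM requires balanced data, i.e. exactly one sales value per store for every day.
model = AnovaRM(data=df, depvar='sales', subject='day', within=['store'])

# Perform repeated measures ANOVA
repeated_measures_anova_result = model.fit()

print(repeated_measures_anova_result)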
