Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact the validity of the results.
 Q2. What are the three types of ANOVA, and in what situations would each be used?
 Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?
 Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual sum of squares (SSR) in a one-way ANOVA using Python?
 Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?
 Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02. What can you conclude about the differences between the groups, and how would you interpret these results?
 Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential consequences of using different methods to handle missing data?
 Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide an example of a situation where a post-hoc test might be necessary.
 Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from 50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python to determine if there are any significant differences between the mean weight loss of the three diets. Report the F-statistic and p-value, and interpret the results.
 Q10. A company wants to know if there are any significant differences in the average time it takes to complete a task using three different software programs: Program A, Program B, and Program C. They randomly assign 30 employees to one of the programs and record the time it takes each employee to complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or interaction effects between the software programs and employee experience level (novice vs. experienced). Report the F-statistics and p-values, and interpret the results.
 Q11. An educational researcher is interested in whether a new teaching method improves student test scores. They randomly assign 100 students to either the control group (traditional teaching method) or the experimental group (new teaching method) and administer a test at the end of the semester. Conduct a two-sample t-test using Python to determine if there are any significant differences in test scores between the two groups. If the results are significant, follow up with a post-hoc test to determine which group(s) differ significantly from each other.
 Q12. A researcher wants to know if there are any significant differences in the average daily sales of three retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store on those days. Conduct a repeated measures ANOVA using Python to determine if there are any significant differences in sales between the three stores. If the results are significant, follow up with a post hoc test to determine which store(s) differ significantly from each other.

Q1. Assumptions for ANOVA
Assumptions:

Independence: Observations must be independent of one another, both within and between groups.
Normality: The residuals (equivalently, the data within each group) should be approximately normally distributed.
Homogeneity of Variances (Homoscedasticity): The population variances of the groups should be approximately equal.

Examples of Violations:

Independence: If data points are related (e.g., repeated measures from the same subjects analyzed as if they were separate groups), the error variance is typically underestimated and the Type I error rate is inflated.
Normality: If data in any group is heavily skewed or contains extreme outliers, the F-test can be unreliable, especially with small samples.
Homogeneity of Variances: If one group has a much larger variance than the others, the F-test can become too liberal or too conservative, particularly when group sizes are unequal.
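
The normality and equal-variance assumptions can be screened before running ANOVA. The following is a minimal sketch, assuming the Shapiro-Wilk test for normality and Levene's test for homogeneity of variances (these specific tests are a choice made here, not something the assignment prescribes). It reuses the sample diet values from the Q4 cell.

```python
from scipy import stats

# Illustrative weight-loss values for three diet groups (same sample data as the Q4 cell)
groups = {
    "Diet_A": [2.5, 3.6, 2.9, 3.2, 2.7],
    "Diet_B": [3.8, 3.1, 3.3, 3.7, 3.0],
    "Diet_C": [2.9, 3.4, 3.1, 3.3, 3.5],
}

# Normality: Shapiro-Wilk test per group (null hypothesis: the data are normal)
shapiro_p = {name: stats.shapiro(values)[1] for name, values in groups.items()}

# Homogeneity of variances: Levene's test (null hypothesis: equal variances)
levene_stat, levene_p = stats.levene(*groups.values())

for name, p in shapiro_p.items():
    print(f"{name}: Shapiro-Wilk p = {p:.3f}")
print(f"Levene p = {levene_p:.3f}")
```

A small p-value in either test suggests the corresponding assumption may not hold. Independence cannot be tested this way; it has to follow from the study design.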


Q2. Types of ANOVA
One-Way ANOVA: Used when comparing the means of three or more independent groups based on one factor.
Example: Comparing test scores of students from three different teaching methods.
Two-Way ANOVA: Used when comparing means based on two factors, allowing for the assessment of interaction effects.
Example: Comparing test scores based on teaching method and gender.
Repeated Measures ANOVA: Used when the same subjects are measured multiple times under different conditions.
Example: Measuring the body weight of the same participants at several monthly time points during a diet program.
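
One-way and two-way ANOVA are worked through in later cells, so the sketch below covers only the repeated measures case. It assumes statsmodels' `AnovaRM` class and a small, hypothetical long-format dataset (the subjects, months, and weights are invented for illustration). `AnovaRM` needs exactly one observation per subject per condition.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical data: 4 subjects, each weighed at 3 monthly time points
long_df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "month":   ["m1", "m2", "m3"] * 4,
    "weight":  [80, 78, 77, 90, 89, 86, 70, 69, 69, 85, 82, 80],
})

# 'month' is the within-subject factor; each subject acts as their own control
rm_result = AnovaRM(data=long_df, depvar="weight", subject="subject", within=["month"]).fit()
print(rm_result.anova_table)
```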


Q3. Partitioning of Variance in ANOVA
Partitioning of Variance:

Total Sum of Squares (SST): The total variability of all observations around the overall mean.
Between-Group Sum of Squares (SSB): The variability due to differences between the group means and the overall mean.
Within-Group Sum of Squares (SSW): The variability of observations around their own group mean.

Importance:
Understanding the partition (SST = SSB + SSW) shows how much of the total variability is explained by group differences versus random error. The F-statistic is the ratio of these two components after each is divided by its degrees of freedom, so the partition directly determines whether the group differences are judged significant.
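
Written out, the three components look as follows. The notation is an assumption introduced here for illustration: k groups, n_i observations in group i, y_ij the j-th observation in group i, \bar{y}_i the mean of group i, and \bar{y} the overall mean.

```latex
\begin{aligned}
SST &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 \\
SSB &= \sum_{i=1}^{k} n_i \, (\bar{y}_i - \bar{y})^2 \\
SSW &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \\
SST &= SSB + SSW
\end{aligned}
```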

In [1]:
# Q4. Calculating Sum of Squares in One-Way ANOVA using Python
import numpy as np
import pandas as pd
from scipy import stats

# Sample data
data = {
    'Diet_A': [2.5, 3.6, 2.9, 3.2, 2.7],
    'Diet_B': [3.8, 3.1, 3.3, 3.7, 3.0],
    'Diet_C': [2.9, 3.4, 3.1, 3.3, 3.5]
}
df = pd.DataFrame(data)

# Calculating means
overall_mean = df.values.flatten().mean()
group_means = df.mean()

# SST (Total Sum of Squares)
sst = np.sum((df.values.flatten() - overall_mean) ** 2)

# SSB (Between-Group Sum of Squares): the "explained" sum of squares asked for in Q4
ssb = sum(df.count() * (group_means - overall_mean) ** 2)

# SSW (Within-Group Sum of Squares): the "residual" sum of squares asked for in Q4.
# Summing over the underlying array gives one scalar instead of per-column sums.
ssw = np.sum(((df - group_means) ** 2).values)

print(f"SST: {sst:.3f}, SSB: {ssb:.3f}, SSW: {ssw:.3f}")

SST: 1.900, SSB: 0.412, SSW: 1.488
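
As a follow-up, the sums of squares can be turned into an F-statistic by dividing each by its degrees of freedom. This is a minimal sketch on the same sample data; the variable names `msb`, `msw`, and `f_manual` are introduced here for illustration, and `scipy.stats.f_oneway` is used only as a cross-check.

```python
import pandas as pd
from scipy import stats

# Same sample data as the Q4 cell
df = pd.DataFrame({
    "Diet_A": [2.5, 3.6, 2.9, 3.2, 2.7],
    "Diet_B": [3.8, 3.1, 3.3, 3.7, 3.0],
    "Diet_C": [2.9, 3.4, 3.1, 3.3, 3.5],
})

k = df.shape[1]   # number of groups
n_total = df.size  # total number of observations

overall_mean = df.to_numpy().mean()
group_means = df.mean()

ssb = float((df.count() * (group_means - overall_mean) ** 2).sum())
ssw = float(((df - group_means) ** 2).to_numpy().sum())

# Mean squares: each sum of squares divided by its degrees of freedom
msb = ssb / (k - 1)
msw = ssw / (n_total - k)
f_manual = msb / msw

# Cross-check against SciPy's one-way ANOVA
f_scipy, p_scipy = stats.f_oneway(df["Diet_A"], df["Diet_B"], df["Diet_C"])
print(f"F (manual): {f_manual:.4f}, F (scipy): {f_scipy:.4f}, p-value: {p_scipy:.4f}")
```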


In [None]:
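# Q5. Main Effects and Interaction Effects in Two-Way ANOVA using Python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Illustrative sample data: 2 x 2 design with three replicates per cell.
# All columns must have the same length, and each cell needs more than one
# observation so that the residual variance can be estimated.
data = {
    'Software': ['A'] * 6 + ['B'] * 6,
    'Experience': (['Novice'] * 3 + ['Experienced'] * 3) * 2,
    'Time': [45, 47, 44, 40, 41, 39, 50, 52, 51, 45, 44, 46]
}
df = pd.DataFrame(data)

# Two-way ANOVA: main effect of each factor plus their interaction
model = ols('Time ~ C(Software) + C(Experience) + C(Software):C(Experience)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Rows C(Software) and C(Experience) are the main effects;
# row C(Software):C(Experience) is the interaction effect.
print(anova_table)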

Q6. Interpreting One-Way ANOVA Results
With an F-statistic of 5.23 and a p-value of 0.02:

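Since the p-value (0.02) is less than the conventional significance level (0.05), we reject the null hypothesis that all group means are equal.
This indicates that at least one group mean differs significantly from the others. The F-test does not identify which groups differ, so a post-hoc test is needed to locate the differences.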
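
The link between the F-statistic and the p-value can be sketched as below. The question does not state the group count or sample size, so the degrees of freedom here are an assumption (3 groups, 15 observations in total), chosen only to illustrate the calculation.

```python
from scipy import stats

f_stat = 5.23
alpha = 0.05

# Assumed design (not given in the question): 3 groups, 15 observations in total
k, n_total = 3, 15
dfn = k - 1        # between-group degrees of freedom
dfd = n_total - k  # within-group degrees of freedom

# p-value = upper-tail probability of the F distribution at the observed statistic
p_value = stats.f.sf(f_stat, dfn, dfd)
reject_null = p_value < alpha

print(f"p-value: {p_value:.4f}, reject H0 at alpha={alpha}: {reject_null}")
```

Under these assumed degrees of freedom the p-value comes out close to the reported 0.02; a different design would give a different p-value for the same F.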

Q7. Handling Missing Data in Repeated Measures ANOVA
Methods:

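Listwise Deletion: Removing every subject that has any missing measurement.
Pairwise Deletion: Using all available data for each pairwise comparison.
Imputation: Replacing missing data with estimated values, for example:
Mean Imputation: Replacing a missing value with the mean of the group.
Last Observation Carried Forward (LOCF): Replacing a missing value with the subject's last observed value.

Consequences:

Listwise Deletion: Loss of data and reduced statistical power; results can also be biased unless the data are missing completely at random.
Pairwise Deletion: Different comparisons rest on different subsets of subjects, and results can be biased if missingness is not random.
Imputation: Risk of introducing bias, especially if assumptions about the missing data mechanism are incorrect. Mean imputation understates the variability in the data, and LOCF assumes a subject's value does not change after the last observation.
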
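The three simplest strategies above can be sketched with pandas. The wide-format table below is hypothetical (one row per subject, one column per time point), and the mean used for imputation is the mean of each time point, which is one possible reading of "mean of the group".

```python
import numpy as np
import pandas as pd

# Hypothetical repeated measures data with two missing values
wide = pd.DataFrame(
    {
        "t1": [5.0, 6.0, 7.0, 8.0],
        "t2": [5.5, np.nan, 7.5, 8.5],
        "t3": [6.0, 6.5, np.nan, 9.0],
    },
    index=["s1", "s2", "s3", "s4"],
)

# Listwise deletion: keep only subjects with complete data
listwise = wide.dropna()

# Mean imputation: fill each gap with the mean of that time point
mean_imputed = wide.fillna(wide.mean())

# LOCF: carry each subject's previous measurement forward along the row
locf = wide.ffill(axis=1)

print(listwise, mean_imputed, locf, sep="\n\n")
```

Listwise deletion halves the sample here, which illustrates the loss of power noted above.
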
Q8. Common Post-Hoc Tests after ANOVA
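Tukey's HSD: Compares all pairs of group means while controlling the family-wise error rate. It is designed for equal group sizes; the Tukey-Kramer variant handles unequal sizes.
Bonferroni Correction: Divides the significance level by the number of comparisons. It is simple but conservative, so it suits a small number of planned comparisons.
Scheffé Test: The most conservative of the three, used for complex comparisons (contrasts involving more than two group means).

Example:

If ANOVA shows significant differences among three diets, a post-hoc test like Tukey's HSD can identify which specific pairs of diets differ.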
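
A minimal sketch of such a follow-up, assuming statsmodels' `pairwise_tukeyhsd` function. The values are the sample diet data from the Q4 cell, reused only to show the call; in practice the post-hoc test would follow a significant ANOVA result.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Sample weight-loss values from the Q4 cell, flattened into one array
weight_loss = np.array([
    2.5, 3.6, 2.9, 3.2, 2.7,  # Diet A
    3.8, 3.1, 3.3, 3.7, 3.0,  # Diet B
    2.9, 3.4, 3.1, 3.3, 3.5,  # Diet C
])
diet = np.repeat(["A", "B", "C"], 5)  # group label for each observation

# All pairwise comparisons (A-B, A-C, B-C) at a family-wise alpha of 0.05
tukey = pairwise_tukeyhsd(endog=weight_loss, groups=diet, alpha=0.05)
print(tukey.summary())
```

Each row of the summary gives the mean difference for one pair of diets, its confidence interval, and whether the null hypothesis of equal means is rejected for that pair.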