Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact the validity of the results.

Ans:

Assumptions of ANOVA

Independence of Observations:

Assumption: The observations within each group and between groups are independent of each other. This means that the value of one observation neither influences nor is related to the value of any other observation.

Violation Example: If participants in a study are measured multiple times or if there is a clustering effect (e.g., students within the same classroom), the independence assumption may be violated.
Normality:

Assumption: The data in each group should be approximately normally distributed. ANOVA is robust to minor deviations from normality, especially with larger sample sizes, but severe deviations can affect the results.

Violation Example: If the data are highly skewed or have heavy tails, the assumption of normality is violated. For instance, if the reaction times in a study have extreme outliers, the normality assumption may be compromised.
Homogeneity of Variances (Homoscedasticity):

Assumption: The variances among the groups should be approximately equal. This means that the spread or dispersion of data points around the mean is similar across all groups.

Violation Example: If one group has a much larger variance than the others (e.g., one treatment group shows much more variability in response compared to others), the assumption of homogeneity of variances is violated. This can be tested using Levene's test or Bartlett's test.


Examples of Violations and Their Impact

Independence of Observations:

Example: In a clinical trial, if patients are treated in groups and their responses are correlated due to the same environment or treatment, the independence assumption is violated. This can lead to inflated Type I error rates or reduced power of the test.
Normality:

Example: If test scores are not normally distributed but are instead heavily skewed (e.g., many students score very low or very high), the results of ANOVA might be misleading. Transformations (e.g., log or square root) or non-parametric alternatives (e.g., Kruskal-Wallis test) might be needed.
Homogeneity of Variances:

Example: In an educational study comparing test scores among different teaching methods, if one method produces very diverse outcomes while others are more consistent, the assumption of equal variances is violated. The F-test's actual Type I error rate can then drift away from the nominal level, especially when group sizes are unequal, which can lead to incorrect conclusions. Welch's ANOVA is a common alternative in this situation.
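The checks mentioned above can be sketched in Python. This is a minimal illustration, not a full diagnostic workflow: the three groups are simulated (hypothetical data, with group C deliberately given a larger spread), and it uses scipy.stats.shapiro for normality, scipy.stats.levene for equal variances, and scipy.stats.kruskal as the non-parametric fallback.

```python
import numpy as np
from scipy import stats

# Hypothetical data: three groups, the third with a much larger spread
rng = np.random.default_rng(42)
samples = {
    "A": rng.normal(loc=50, scale=5, size=30),
    "B": rng.normal(loc=52, scale=5, size=30),
    "C": rng.normal(loc=55, scale=15, size=30),
}

# Normality within each group (Shapiro-Wilk): a small p-value suggests non-normality
shapiro_p = {}
for name, values in samples.items():
    _, shapiro_p[name] = stats.shapiro(values)
    print(f"Shapiro-Wilk, group {name}: p = {shapiro_p[name]:.3f}")

# Homogeneity of variances (Levene): a small p-value suggests unequal variances
levene_stat, levene_p = stats.levene(*samples.values())
print(f"Levene: W = {levene_stat:.3f}, p = {levene_p:.4f}")

# Non-parametric alternative when the assumptions look doubtful
kruskal_stat, kruskal_p = stats.kruskal(*samples.values())
print(f"Kruskal-Wallis: H = {kruskal_stat:.3f}, p = {kruskal_p:.4f}")
```

Independence cannot be tested this way; it has to be judged from how the data were collected.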

Q2. What are the three types of ANOVA, and in what situations would each be used?

Ans:

1. One-Way ANOVA
Purpose: Tests for differences in the means of three or more independent groups based on one factor.

When to Use:

Single Factor: When you have one independent variable (factor) with three or more levels (groups).
Example: Comparing the effectiveness of three different diets (low-carb, low-fat, Mediterranean) on weight loss.


2. Two-Way ANOVA
Purpose: Tests for differences in the means of groups based on two independent factors and evaluates the interaction between these factors.

When to Use:

Two Factors: When you have two independent variables and want to understand their individual effects and the interaction effect on a dependent variable.
Example: Studying the effect of diet type (low-carb vs. low-fat) and exercise level (high vs. low) on weight loss.


3. Repeated Measures ANOVA
Purpose: Tests for differences in the means of three or more related groups (i.e., measurements taken on the same subjects at different times or under different conditions).

When to Use:

Related Groups: When the same subjects are measured multiple times or under different conditions.
Example: Measuring the blood pressure of patients before, during, and after a treatment to assess how treatment affects blood pressure over time.

Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

Ans:

Partitioning of Variance
Total Variance: The overall variability in the data.

Between-Group Variance: Measures how much the means of different groups differ from each other. It shows the effect of the different treatments or conditions.

Within-Group Variance: Measures the variability within each group. It shows how much individual observations in the same group differ from their group's mean.

Importance
Identifying Sources of Variability: Helps determine how much of the total variability is due to differences between groups versus differences within groups.

Assessing Factors' Effectiveness: Shows whether the differences between group means are significant compared to the variability within groups.

Interpreting Results: The F-ratio, which compares between-group variance to within-group variance, helps to decide if the group means are significantly different.

Guiding Further Analysis: Helps in planning additional tests if significant differences are found, to find out which specific groups differ.
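The partition described above can be written compactly. The notation is an assumption introduced here for illustration: k groups, n_i observations in group i, N observations in total, y_ij the j-th observation in group i, \bar{y}_i the mean of group i, and \bar{y} the overall mean.

```latex
\underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2}_{\text{total}}
  = \underbrace{\sum_{i=1}^{k} n_i\,(\bar{y}_i - \bar{y})^2}_{\text{between groups}}
  + \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2}_{\text{within groups}}

F = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
```

Each sum of squares is divided by its degrees of freedom before the ratio is formed, so the F-ratio compares mean squares rather than raw sums of squares.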

Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual sum of squares (SSR) in a one-way ANOVA using Python?

Ans:

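Total Sum of Squares (SST): Measures the total variability in the data.
Explained Sum of Squares (SSE): Measures the variability explained by the group differences (the between-group sum of squares).
Residual Sum of Squares (SSR): Measures the variability within groups, not explained by the group differences.

Note: The abbreviations follow the wording of the question. Many textbooks use them the other way round (SSE for the error sum of squares, SSR for the regression sum of squares), so check the definitions when comparing with other sources. Whatever the labels, SST = explained + residual.

The sample code in the next cell computes all three.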

In [1]:
import numpy as np
import pandas as pd
from scipy import stats

# Sample data
data = {
    'Group': ['A']*5 + ['B']*5 + ['C']*5,
    'Value': [23, 20, 22, 24, 25, 30, 29, 31, 32, 28, 35, 34, 33, 36, 37]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Calculate group means
group_means = df.groupby('Group')['Value'].mean()

# Calculate overall mean
overall_mean = df['Value'].mean()

# Calculate SST (Total Sum of Squares)
SST = np.sum((df['Value'] - overall_mean) ** 2)

# Calculate SSE (Explained Sum of Squares)
SSE = np.sum(df.groupby('Group').size() * (group_means - overall_mean) ** 2)

# Calculate SSR (Residual Sum of Squares)
SSR = np.sum((df['Value'] - df.groupby('Group')['Value'].transform('mean')) ** 2)

print("Total Sum of Squares (SST):", SST)
print("Explained Sum of Squares (SSE):", SSE)
print("Residual Sum of Squares (SSR):", SSR)


Total Sum of Squares (SST): 410.93333333333334
Explained Sum of Squares (SSE): 376.1333333333333
Residual Sum of Squares (SSR): 34.8
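As a follow-up, the sums of squares can be turned into the F-statistic by hand and compared with scipy.stats.f_oneway. This is a small sketch on the same sample values as above; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Same sample values as in the cell above
groups = {
    "A": [23, 20, 22, 24, 25],
    "B": [30, 29, 31, 32, 28],
    "C": [35, 34, 33, 36, 37],
}
all_values = np.concatenate([np.asarray(v, dtype=float) for v in groups.values()])
overall_mean = all_values.mean()
k = len(groups)             # number of groups
n_total = all_values.size   # total number of observations

# Explained (between-group) and residual (within-group) sums of squares
ss_explained = sum(len(v) * (np.mean(v) - overall_mean) ** 2 for v in groups.values())
ss_residual = sum(((np.asarray(v) - np.mean(v)) ** 2).sum() for v in groups.values())
ss_total = ((all_values - overall_mean) ** 2).sum()

# Mean squares = sum of squares / degrees of freedom
ms_explained = ss_explained / (k - 1)
ms_residual = ss_residual / (n_total - k)

f_manual = ms_explained / ms_residual
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)  # upper tail of the F distribution

f_scipy, p_scipy = stats.f_oneway(*groups.values())
print("Manual F:", f_manual, "p:", p_manual)
print("SciPy  F:", f_scipy, "p:", p_scipy)
```

The two F-statistics should agree up to floating-point rounding, which is a quick way to confirm that the partition was computed correctly.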


Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?

Ans:

Main Effects:

Main Effect of Factor A: Measures how the levels of Factor A (e.g., different teaching methods) impact the dependent variable. It is calculated by comparing the average outcomes across the levels of Factor A, averaging over all levels of Factor B.
Main Effect of Factor B: Measures how the levels of Factor B (e.g., different study times) impact the dependent variable. It is calculated by comparing the average outcomes across the levels of Factor B, averaging over all levels of Factor A.

Interaction Effect:

Interaction Effect of Factors A and B: Measures how the effect of one factor (e.g., teaching method) changes depending on the level of the other factor (e.g., study time). It is calculated by examining whether the differences in the outcomes between levels of Factor A are consistent across the levels of Factor B and vice versa.
To calculate these effects, you use the sum of squares for each factor and their interaction, which can be obtained through an ANOVA table. The main effects and interaction effects are assessed by comparing their corresponding mean squares to the residual mean square (error variance) to determine statistical significance.
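The calculation described above can be sketched by hand for a balanced design. Everything in the example is an assumption made for illustration: the factor names (Method, StudyTime), the cell means, and the simulated scores. The shortcut formulas for the sums of squares used here are only valid when every cell has the same number of observations.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
n = 6  # observations per cell (balanced design)

# Hypothetical cell means for Factor A (Method) x Factor B (StudyTime)
cell_means = {
    ("Lecture", "Short"): 65, ("Lecture", "Long"): 72,
    ("Interactive", "Short"): 70, ("Interactive", "Long"): 82,
}
rows = [
    {"Method": m, "StudyTime": s, "Score": score}
    for (m, s), mu in cell_means.items()
    for score in rng.normal(loc=mu, scale=8, size=n)
]
scores = pd.DataFrame(rows)

grand = scores["Score"].mean()
mean_a = scores.groupby("Method")["Score"].mean()        # marginal means of A
mean_b = scores.groupby("StudyTime")["Score"].mean()     # marginal means of B
mean_cell = scores.groupby(["Method", "StudyTime"])["Score"].mean()
a, b = mean_a.size, mean_b.size

# Sums of squares for the main effects, the interaction, and the residual
ss_a = n * b * ((mean_a - grand) ** 2).sum()
ss_b = n * a * ((mean_b - grand) ** 2).sum()
ss_ab = n * ((mean_cell - grand) ** 2).sum() - ss_a - ss_b
fitted = scores.groupby(["Method", "StudyTime"])["Score"].transform("mean")
ss_resid = ((scores["Score"] - fitted) ** 2).sum()
ss_total = ((scores["Score"] - grand) ** 2).sum()

# Each effect's mean square is compared with the residual mean square
df_resid = a * b * (n - 1)
ms_resid = ss_resid / df_resid
effects = [("Method", ss_a, a - 1), ("StudyTime", ss_b, b - 1),
           ("Method x StudyTime", ss_ab, (a - 1) * (b - 1))]
for name, ss, dof in effects:
    f_value = (ss / dof) / ms_resid
    p_value = stats.f.sf(f_value, dof, df_resid)
    print(f"{name}: SS = {ss:.2f}, F = {f_value:.3f}, p = {p_value:.4f}")
```

In practice the same table is usually obtained from a fitted model (for example statsmodels' ols followed by anova_lm), which also copes with unbalanced data.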





Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?

Ans:

Statistical Significance:

P-Value: The p-value of 0.02 is less than the common significance level of 0.05. This indicates that there is a statistically significant difference between the means of at least two of the groups.
Conclusion: Since the p-value is below 0.05, you reject the null hypothesis, which states that all group means are equal. This suggests that at least one group mean is significantly different from the others.
F-Statistic:

F-Statistic: The F-statistic of 5.23 quantifies how much the group means differ from each other relative to the variability within the groups. A higher F-statistic generally indicates a larger difference between group means compared to within-group variability. The test does not show which groups differ or how large the differences are, so a post-hoc test is the usual next step.

Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential
consequences of using different methods to handle missing data?

Ans:

Handling Missing Data

Complete Case Analysis (Listwise Deletion)

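Description: Only include subjects with complete data for all measurements. Any subject with missing data is excluded from the analysis. Classical repeated measures ANOVA needs a complete set of measurements for every subject, so this is often what happens by default.
Consequences:
Pros: Simple and straightforward; every part of the analysis uses the same set of subjects.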
Cons: Can lead to reduced sample size and loss of statistical power. May introduce bias if the missing data is not missing completely at random (MCAR).

Pairwise Deletion

Description: Use all available data for each analysis. For each pair of variables, use only the cases with non-missing values for those variables.
Consequences:
Pros: Utilizes more of the available data than complete case analysis, potentially preserving statistical power.
Cons: Results can vary depending on the amount and pattern of missing data. May complicate interpretation and affect the consistency of results.

Imputation Methods

Description: Replace missing data with estimated values based on other available data. Common methods include mean imputation, regression imputation, and multiple imputation.
Consequences:
Pros: Can provide a complete dataset and preserve sample size, improving statistical power. Multiple imputation accounts for uncertainty by creating several imputed datasets.
Cons: Imputation can introduce its own biases and inaccuracies. Simple imputation methods (e.g., mean imputation) may underestimate variability and reduce the reliability of statistical tests.

Model-Based Approaches

Description: Use statistical models that handle missing data explicitly, such as maximum likelihood estimation or Bayesian methods. For repeated measures, a linear mixed-effects model is the most common choice because it uses every available measurement from every subject.
Consequences:
Pros: Often provide robust results and incorporate the uncertainty of missing data. Maximum likelihood gives valid estimates when data are missing at random (MAR), a weaker requirement than MCAR.
Cons: More complex to implement and interpret. Requires careful model specification and assumptions.

Potential Consequences of Different Methods
Bias: Depending on the missing data mechanism (Missing Completely at Random, Missing at Random, Missing Not at Random), different methods may introduce bias. For instance, listwise deletion may introduce bias if the missing data is related to the outcome.

Statistical Power: Methods that reduce sample size, such as listwise deletion, can lower the power of statistical tests and increase Type II errors. Imputation and model-based approaches can help retain sample size and power.

Validity of Results: Imputation methods and model-based approaches can help maintain the validity of results, provided they are applied correctly and the assumptions are met. Incorrect imputation or model specification can lead to misleading conclusions.

Complexity: Advanced methods like multiple imputation or model-based approaches are more complex to implement and require careful consideration of assumptions and data patterns.
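The trade-off between the two simplest options can be illustrated with pandas. This is only a sketch on simulated blood-pressure readings (hypothetical values, with a few entries removed by hand); it shows how listwise deletion shrinks the sample and how mean imputation shrinks the variance.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one row per subject, one column per time point
rng = np.random.default_rng(3)
n_subjects = 20
wide = pd.DataFrame(
    rng.normal(loc=[120, 115, 110], scale=8, size=(n_subjects, 3)),
    columns=["before", "during", "after"],
)

# Remove a few readings to mimic missing data
wide.loc[[2, 7, 11], "during"] = np.nan
wide.loc[[5, 11], "after"] = np.nan

# Listwise deletion: keep only subjects with all three measurements
complete_cases = wide.dropna()
print("Subjects before deletion:", len(wide))
print("Subjects after listwise deletion:", len(complete_cases))

# Mean imputation: fill each gap with that column's observed mean
mean_imputed = wide.fillna(wide.mean())
var_observed = wide["during"].var()        # variance of the observed values only
var_imputed = mean_imputed["during"].var() # variance after imputation
print("Variance of 'during' (observed):", var_observed)
print("Variance of 'during' (mean-imputed):", var_imputed)
```

The imputed column keeps all 20 subjects but has a smaller variance than the observed values, which is why mean imputation tends to make tests look more precise than they are.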

Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide
an example of a situation where a post-hoc test might be necessary.

Ans:

Tukey’s Honestly Significant Difference (HSD) Test

When to Use: When you want to compare all possible pairs of group means and control for Type I error rate. It is used when the assumption of equal variances (homogeneity of variances) is met.
Example: After finding significant differences among three diets (low-carb, low-fat, Mediterranean) in weight loss, Tukey's HSD can identify which specific diet pairs (e.g., low-carb vs. Mediterranean) show significant differences.

Bonferroni Correction

When to Use: When you have multiple comparisons and want to control the family-wise error rate. It divides the significance level by the number of comparisons, so each individual test uses a stricter threshold, which reduces the likelihood of Type I errors.
Example: If you test the effects of four different teaching methods on student performance, there are 6 pairwise comparisons, so each is tested at 0.05 / 6 ≈ 0.0083. This controls the increased risk of false positives due to multiple comparisons.

Scheffé’s Test

When to Use: When you need a more conservative test for comparing multiple group means, particularly when dealing with unequal sample sizes. It is useful for complex comparisons (e.g., comparing combinations of groups). Like Tukey's HSD, it still assumes equal variances.
Example: If you are comparing several different therapeutic interventions and their combinations on recovery rates, Scheffé’s test can handle complex comparisons and is less likely to produce false positives.

Dunnett’s Test

When to Use: When comparing multiple treatment groups to a single control group. It is specifically designed for comparing each treatment group against a control.
Example: In a study evaluating the effectiveness of new medications compared to a standard treatment, Dunnett’s test helps determine which new medications are significantly different from the standard treatment.

Newman-Keuls Test

When to Use: When you want a less conservative test compared to Tukey’s HSD. It is sequential and more powerful but may increase the risk of Type I errors.
Example: In a study with multiple treatment levels, Newman-Keuls test can identify differences between groups with greater sensitivity, but with a slightly higher risk of Type I errors.

Example of Using a Post-Hoc Test
Scenario: Suppose you conduct a one-way ANOVA to examine the effect of three different types of exercise (yoga, running, swimming) on stress levels. The ANOVA result is significant, indicating that there are differences in stress levels among the three exercise types.

Need for Post-Hoc Test:

Since the ANOVA only tells you that not all group means are equal but not which specific groups differ, you need a post-hoc test.
Choice: If you want to compare each pair of exercise types while controlling for Type I error, you might use Tukey’s HSD.
Action: Perform Tukey’s HSD to find out if the difference in stress levels between yoga and running is significant, or if yoga differs from swimming, and so on.
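The scenario above can be sketched with statsmodels' pairwise_tukeyhsd. The stress scores are simulated (hypothetical means and spread chosen for illustration), so the printed table only demonstrates the mechanics, not a real finding.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical stress scores for 25 people per exercise type
rng = np.random.default_rng(7)
n_per_group = 25
stress = np.concatenate([
    rng.normal(loc=20, scale=4, size=n_per_group),  # yoga
    rng.normal(loc=24, scale=4, size=n_per_group),  # running
    rng.normal(loc=22, scale=4, size=n_per_group),  # swimming
])
exercise = np.repeat(["yoga", "running", "swimming"], n_per_group)

# All pairwise comparisons with the family-wise error rate held at 5%
tukey = pairwise_tukeyhsd(endog=stress, groups=exercise, alpha=0.05)
print(tukey.summary())

# One True/False flag per pair: True means the pair differs significantly
print("Reject H0 per pair:", tukey.reject)
```

With three groups there are three pairs, so the summary has three rows, each with the mean difference, an adjusted p-value, and a confidence interval.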

Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
to determine if there are any significant differences between the mean weight loss of the three diets.
Report the F-statistic and p-value, and interpret the results.

In [2]:
import numpy as np
from scipy import stats


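# Simulated weight-loss data for 50 participants (17 + 16 + 17).
# No random seed is set, so the numbers printed below change on every run.
diet_A = np.random.normal(loc=5, scale=1, size=17)  # Sample data for Diet A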
diet_B = np.random.normal(loc=6, scale=1, size=16)  # Sample data for Diet B
diet_C = np.random.normal(loc=5.5, scale=1, size=17)  # Sample data for Diet C

F_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

print("F-statistic:", F_statistic)
print("P-value:", p_value)


F-statistic: 2.794572222829868
P-value: 0.07132490182725253


Interpretation

F-Statistic: The F-statistic of 2.7946 represents the ratio of the variance between the group means to the variance within the groups. This value indicates how much the group means differ relative to the variability within each group.

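P-Value: The p-value of 0.0713 is greater than the common significance level of 0.05, so we fail to reject the null hypothesis that the three diets have the same mean weight loss. For this sample there is not enough evidence to say the diets differ, and no post-hoc test is needed.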

Q10. A company wants to know if there are any significant differences in the average time it takes to
complete a task using three different software programs: Program A, Program B, and Program C. They
randomly assign 30 employees to one of the programs and record the time it takes each employee to
complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or
interaction effects between the software programs and employee experience level (novice vs.
experienced). Report the F-statistics and p-values, and interpret the results.

In [4]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.formula.api import ols

np.random.seed(0)

# Generate data
programs = ['A', 'B', 'C']
experience_levels = ['Novice', 'Experienced']
n_per_group = 10

data = {
    'Time': np.concatenate([
        np.random.normal(loc=30, scale=5, size=n_per_group),
        np.random.normal(loc=25, scale=5, size=n_per_group),
        np.random.normal(loc=28, scale=5, size=n_per_group),
        np.random.normal(loc=22, scale=5, size=n_per_group),
        np.random.normal(loc=27, scale=5, size=n_per_group),
        np.random.normal(loc=21, scale=5, size=n_per_group)
    ]),
    'Program': (['A']*n_per_group + ['A']*n_per_group +
                ['B']*n_per_group + ['B']*n_per_group +
                ['C']*n_per_group + ['C']*n_per_group),
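    # NOTE: these labels do not line up with 'Program' above. The first 30 rows
    # (A, A, B) are Novice and the last 30 (B, C, C) are Experienced, so the
    # A/Experienced and C/Novice cells are empty. A fully crossed design would use
    # (['Novice']*n_per_group + ['Experienced']*n_per_group) * 3 instead.
    'Experience': (['Novice']*n_per_group*3 + ['Experienced']*n_per_group*3)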
}

df = pd.DataFrame(data)

model = ols('Time ~ C(Program) * C(Experience)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)


                               sum_sq    df         F    PR(>F)
C(Program)                   0.149574   2.0  0.002406  0.961057
C(Experience)                     NaN   1.0       NaN       NaN
C(Program):C(Experience)   216.995689   2.0  3.489854  0.066981
Residual                  1741.012312  56.0       NaN       NaN


  F /= J


Interpretation:
Effect of Program (C(Program)):

Sum of Squares (sum_sq): 0.149574
Degrees of Freedom (df): 2
F-Statistic (F): 0.002406
P-Value (PR(>F)): 0.961057
Interpretation: The p-value for the effect of the software program is 0.961, which is much greater than the typical significance level of 0.05, so taken at face value there is no evidence of differences in task completion times among the three software programs. This row should be read with caution, however: the design has empty cells (see the Experience effect below), and the reported p-value matches one numerator degree of freedom rather than the two listed in the df column.

Effect of Experience (C(Experience)):

Sum of Squares (sum_sq): NaN
Degrees of Freedom (df): 1
F-Statistic (F): NaN
P-Value (PR(>F)): NaN
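Interpretation: The values for C(Experience) are NaN because the design is not fully crossed. In the code, the first 30 rows are labelled Novice and the last 30 Experienced, while Program runs A, A, B, B, C, C in blocks of 10. As a result every Program A employee is a novice and every Program C employee is experienced, so the A/Experienced and C/Novice cells are empty. Experience is then partly confounded with Program, the sum of squares for Experience cannot be computed, and the stray "F /= J" line under the table is the tail of the resulting runtime warning. Building the Experience column as (['Novice']*n_per_group + ['Experienced']*n_per_group) * 3 would fill all six cells.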

Interaction Effect (C(Program):C(Experience)):

Sum of Squares (sum_sq): 216.995689
Degrees of Freedom (df): 2
F-Statistic (F): 3.489854
P-Value (PR(>F)): 0.066981
Interpretation: The p-value for the interaction effect is 0.067, which is above the 0.05 significance level, so the interaction between software programs and experience levels is not statistically significant at the 5% level. This row is also unreliable: with two empty cells the interaction cannot be fully estimated, and F = 3.49 on the listed 2 and 56 degrees of freedom would correspond to a p-value below 0.05, so the reported 0.067 again matches a single numerator degree of freedom. The question of whether the effect of the software program depends on experience should be revisited with a fully crossed design.

Residual (Error) Terms:

Sum of Squares (sum_sq): 1741.012312
Degrees of Freedom (df): 56
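Interpretation: The residual sum of squares represents the variation not explained by the model. The degrees of freedom for residuals reflect the number of observations minus the number of parameters estimated: 60 - 4 = 56 here. A fully crossed 3 x 2 design would estimate six cell means and leave 54, so the value of 56 confirms that only four of the six Program/Experience cells contain data.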

Q11. An educational researcher is interested in whether a new teaching method improves student test
scores. They randomly assign 100 students to either the control group (traditional teaching method) or the
experimental group (new teaching method) and administer a test at the end of the semester. Conduct a
two-sample t-test using Python to determine if there are any significant differences in test scores
between the two groups. If the results are significant, follow up with a post-hoc test to determine which
group(s) differ significantly from each other.

In [5]:
import numpy as np
from scipy import stats

np.random.seed(0)
control_group_scores = np.random.normal(loc=75, scale=10, size=50)
experimental_group_scores = np.random.normal(loc=78, scale=10, size=50)

t_statistic, p_value = stats.ttest_ind(control_group_scores, experimental_group_scores)

print("T-statistic:", t_statistic)
print("P-value:", p_value)


T-statistic: -0.6823599641573114
P-value: 0.4966210864762014


Interpretation:

T-Statistic:

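The t-statistic of -0.6824 indicates the direction and magnitude of the difference between the means of the two groups. Because the control group is passed first, the negative sign means the control mean is lower than the experimental mean, i.e. the experimental group scored slightly higher in this sample, but the difference is small relative to the variability.
P-Value:

The p-value of 0.4966 is much greater than the common significance level of 0.05, so we fail to reject the null hypothesis of equal mean test scores. There is no evidence in this sample that the new teaching method changes test scores. With only two groups the t-test already compares the single possible pair, so no post-hoc test is needed.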

Q12. A researcher wants to know if there are any significant differences in the average daily sales of three
retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store
on those days. Conduct a repeated measures ANOVA using Python to determine if there are any
significant differences in sales between the three stores. If the results are significant, follow up with a
post-hoc test to determine which store(s) differ significantly from each other.

In [10]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.anova import AnovaRM
import pingouin as pg

# Generate example data
np.random.seed(0)
days = 30
sales_a = np.random.normal(loc=1000, scale=100, size=days)
sales_b = np.random.normal(loc=1100, scale=100, size=days)
sales_c = np.random.normal(loc=1050, scale=100, size=days)

# Create a DataFrame
data = {
    'Day': np.tile(np.arange(1, days + 1), 3),
    'Store': ['A']*days + ['B']*days + ['C']*days,
    'Sales': np.concatenate([sales_a, sales_b, sales_c])
}

df = pd.DataFrame(data)

# Perform repeated measures ANOVA
aovrm = AnovaRM(df, 'Sales', 'Day', within=['Store'])
anova_results = aovrm.fit()

# Print ANOVA results summary
print(anova_results)

# Extract p-value from ANOVA results
p_value = anova_results.anova_table['Pr > F']['Store']
print("P-value:", p_value)

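# If significant, follow up with paired post-hoc comparisons. The same days are
# measured in every store, so each pair is compared with a paired t-test and a
# Bonferroni adjustment (pg.pairwise_tests needs pingouin >= 0.5.2).
if p_value < 0.05:
    posthoc = pg.pairwise_tests(data=df, dv='Sales', within='Store',
                                subject='Day', padjust='bonf')
    print(posthoc)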


               Anova
      F Value Num DF  Den DF Pr > F
-----------------------------------
Store  0.9395 2.0000 58.0000 0.3967

P-value: 0.39667683341525484


Interpretation:
F-Statistic:

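The F-value of 0.9395 is the ratio of the variance between the store means to the residual variance that remains after day-to-day differences are removed (the Store-by-Day variation). A value below 1 means the store means differ by less than would be expected from that residual noise alone.
P-Value:

The p-value of 0.3967 is much greater than the typical significance level of 0.05, so we fail to reject the null hypothesis that the three stores have equal mean daily sales. Because the overall test is not significant, the post-hoc step in the code is skipped.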