In [None]:
Q1. Assumptions for using ANOVA:

Independence: Observations within and between groups are independent.
Normality: The dependent variable follows a normal distribution within each group.
Homogeneity of Variance: Variances of the dependent variable are equal across all groups.
Examples of violations:

Violation of independence: Participants in one group influencing those in another group.
Non-normality: Skewed or heavily-tailed distributions within groups.
Violation of homogeneity of variance: One group having significantly larger variance than others.
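
The normality and equal-variance assumptions above can be checked before running ANOVA. The following is a minimal sketch, assuming three small hypothetical samples (the values are illustrative only), that uses SciPy's Shapiro-Wilk test for normality and Levene's test for homogeneity of variance. Independence cannot be tested this way; it has to be judged from the study design.

```python
from scipy import stats

# Hypothetical samples for three groups (illustrative values only).
group_1 = [2.1, 2.9, 3.4, 2.5, 3.0, 2.8]
group_2 = [3.2, 4.1, 4.8, 3.9, 4.4, 4.0]
group_3 = [1.5, 2.2, 1.8, 2.9, 2.0, 2.4]
groups = [group_1, group_2, group_3]

# Shapiro-Wilk test per group: the null hypothesis is that the sample
# comes from a normal distribution.
normality_p_values = [stats.shapiro(g)[1] for g in groups]

# Levene's test: the null hypothesis is that all groups have equal variance.
levene_stat, levene_p = stats.levene(*groups)

for i, p in enumerate(normality_p_values, start=1):
    print(f"Group {i} Shapiro-Wilk p-value: {p:.3f}")
print(f"Levene p-value: {levene_p:.3f}")
```

A small p-value in either test is evidence against the corresponding assumption, although with samples this small both tests have limited power.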

In [None]:
Q2. Three types of ANOVA:

One-way ANOVA: Compares the means of two or more groups (typically three or more) defined by one independent variable.
Two-way ANOVA: Examines the influence of two independent variables on one dependent variable.
Repeated Measures ANOVA: Analyzes within-subject changes across different conditions or time points.

In [None]:
Q3. Partitioning of variance:
ANOVA partitions the total variance of the dependent variable into components attributable to different sources like treatment effects and random error.
Understanding this helps in assessing the relative importance of various factors influencing the outcome.

In [None]:
Q4. Calculation of SST, SSE, and SSR in one-way ANOVA using Python:
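SST (total sum of squares) = sum((x - grand_mean)²) over all observations
SSE (error, or within-group, sum of squares) = sum((x - group_mean)²), using each observation's own group mean
SSR (between-group sum of squares) = SST - SSE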
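
The formulas above can be sketched in NumPy as follows. The three groups are assumed, illustrative weight-loss values, and the variable names (`sst`, `sse`, `ssr`) simply follow the notation used in this answer. The F-statistic is then the ratio of the between-group and within-group mean squares.

```python
import numpy as np

# Illustrative weight-loss values for three diets (assumed data).
groups = {
    "diet_A": np.array([2, 3, 4, 2, 3], dtype=float),
    "diet_B": np.array([3, 4, 5, 4, 5], dtype=float),
    "diet_C": np.array([1, 2, 1, 3, 2], dtype=float),
}

all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()

# Total variation of every observation around the grand mean.
sst = np.sum((all_values - grand_mean) ** 2)

# Within-group variation: each observation around its own group mean.
sse = sum(np.sum((g - g.mean()) ** 2) for g in groups.values())

# Between-group variation is what remains.
ssr = sst - sse

# F = (between-group mean square) / (within-group mean square).
k = len(groups)        # number of groups
n = all_values.size    # total number of observations
f_statistic = (ssr / (k - 1)) / (sse / (n - k))

print("SST:", sst)
print("SSE:", sse)
print("SSR:", ssr)
print("F-statistic:", f_statistic)
```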

In [None]:
Q5. Calculation of main effects and interaction effects in two-way ANOVA using Python:
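Main effects: Differences in the mean of the dependent variable across the levels of one independent variable, averaged over the levels of the other.
Interaction effects: The extent to which the effect of one independent variable changes across the levels of the other.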
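
As a rough illustration of these definitions, the sketch below computes main and interaction effects by hand for a hypothetical balanced 2x2 design with two observations per cell. The data and the factor names A and B are assumptions made for the example. In a balanced design the sums of squares for A, B, the interaction, and the error add up to the total sum of squares.

```python
import numpy as np

# Hypothetical balanced 2x2 design: axis 0 = levels of factor A,
# axis 1 = levels of factor B, axis 2 = replicate observations per cell.
y = np.array([
    [[10.0, 12.0], [14.0, 16.0]],
    [[11.0, 13.0], [21.0, 23.0]],
])

grand_mean = y.mean()
cell_means = y.mean(axis=2)

# Main effects: marginal mean of each level minus the grand mean.
a_effects = y.mean(axis=(1, 2)) - grand_mean
b_effects = y.mean(axis=(0, 2)) - grand_mean

# Interaction: what is left of each cell mean after removing both main effects.
interaction = cell_means - grand_mean - a_effects[:, None] - b_effects[None, :]

n_a, n_b, n_rep = y.shape
ss_a = n_b * n_rep * np.sum(a_effects ** 2)
ss_b = n_a * n_rep * np.sum(b_effects ** 2)
ss_ab = n_rep * np.sum(interaction ** 2)
ss_error = np.sum((y - cell_means[:, :, None]) ** 2)
ss_total = np.sum((y - grand_mean) ** 2)

print("SS_A:", ss_a, "SS_B:", ss_b, "SS_AB:", ss_ab)
print("SS_error:", ss_error, "SS_total:", ss_total)
```

For unbalanced designs this simple decomposition no longer holds exactly, which is one reason to fit a linear model instead.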

In [None]:
Q6. Interpretation of one-way ANOVA results:
With an F-statistic of 5.23 and a p-value of 0.02, we reject the null hypothesis of equal group means at the 0.05 significance level. This suggests that at least one group mean differs from the others.
Further post-hoc tests can determine which groups differ significantly.

In [None]:
Q7. Handling missing data in repeated measures ANOVA:
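Missing data can be handled by imputation or by methods that accommodate missingness directly, such as mixed-effects models. A standard repeated measures ANOVA needs complete data for every subject, so subjects with any missing measurement are usually dropped. This reduces sample size and power, and it biases the estimates if the data are not missing completely at random.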
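
To make the cost of dropping incomplete subjects concrete, here is a small pandas sketch on hypothetical data (the subject IDs, time points, and scores are all made up). It compares complete-case deletion with keeping every observed measurement in long format, which is the layout a mixed-effects model would typically use.

```python
import numpy as np
import pandas as pd

# Hypothetical scores for four subjects at three time points; NaN marks a missing value.
wide = pd.DataFrame(
    {
        "t1": [5.0, 6.0, 7.0, 5.5],
        "t2": [6.0, np.nan, 8.0, 6.5],
        "t3": [7.0, 7.5, np.nan, 7.0],
    },
    index=pd.Index(["s1", "s2", "s3", "s4"], name="subject"),
)

# Complete-case analysis: any subject with a missing value is removed entirely.
complete_cases = wide.dropna()

# Long format keeps every observed measurement, including those from
# subjects who are missing other time points.
long = (
    wide.reset_index()
    .melt(id_vars="subject", var_name="time", value_name="score")
    .dropna()
)

print("Subjects kept by complete-case analysis:", len(complete_cases))
print("Observations kept in long format:", len(long))
```

Here complete-case deletion discards half of the subjects even though only two of the twelve measurements are missing.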

In [None]:
Q8. Common post-hoc tests:

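Tukey's HSD: Controls the family-wise Type I error rate across all pairwise comparisons.
Bonferroni correction: Divides the significance threshold by the number of comparisons (equivalently, multiplies each p-value by it).
Scheffe's method: Suited to complex contrasts involving combinations of group means, at the cost of lower power for simple pairwise comparisons.
Post-hoc tests are necessary when ANOVA indicates significant differences among groups, because the F-test alone does not specify which groups differ.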
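
As one possible illustration, the sketch below applies a Bonferroni correction to pairwise independent t-tests using SciPy. The three samples and their labels are assumed, illustrative values; Tukey's HSD would be a common alternative when all pairwise comparisons are of interest.

```python
from itertools import combinations

from scipy import stats

# Hypothetical samples for three groups (illustrative values only).
samples = {
    "A": [2, 3, 4, 2, 3],
    "B": [3, 4, 5, 4, 5],
    "C": [1, 2, 1, 3, 2],
}

pairs = list(combinations(samples, 2))
adjusted_p_values = {}
for first, second in pairs:
    _, p = stats.ttest_ind(samples[first], samples[second])
    # Bonferroni: multiply each p-value by the number of comparisons, capped at 1.
    adjusted_p_values[(first, second)] = min(p * len(pairs), 1.0)

for (first, second), p in adjusted_p_values.items():
    print(f"{first} vs {second}: adjusted p-value = {p:.4f}")
```

Pairs whose adjusted p-value falls below the chosen threshold (for example 0.05) are reported as significantly different.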

In [None]:
Q9. One-way ANOVA in Python:

In [1]:
import scipy.stats as stats

# Data for weight loss
diet_A = [2, 3, 4, 2, 3]
diet_B = [3, 4, 5, 4, 5]
diet_C = [1, 2, 1, 3, 2]

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)
print("F-statistic:", f_statistic)
print("p-value:", p_value)


F-statistic: 10.380952380952385
p-value: 0.0024147524494592763


In [None]:
Q10: Two-Way ANOVA

In [14]:
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Example data for completion times
completion_times = [10, 12, 15, 11, 13, 14, 18, 20, 22]  # Replace with your data

# Example data for program and experience_level (one label per observation)
programs = ["A", "A", "B", "B", "C", "C", "A", "B", "C"]
# Alternate the two levels and trim to the number of observations (illustrative assignment)
experience_levels = (["novice", "experienced"] * 5)[:len(completion_times)]

# Check if all arrays have the same length
if len(completion_times) == len(programs) == len(experience_levels):
    # Create the DataFrame
    data = pd.DataFrame({
        "program": programs,
        "experience_level": experience_levels,
        "completion_time": completion_times
    })

    # Fit the model (the formula interface is in statsmodels.formula.api, not statsmodels.api)
    model = smf.ols(formula="completion_time ~ program + experience_level + program:experience_level", data=data)
    results = model.fit()

    # Print the ANOVA table with one row per main effect and one for the interaction
    print(sm.stats.anova_lm(results, typ=2))
else:
    print("Lengths of arrays do not match.")


In [None]:
Interpretation:

Main Effects:
Examine p-values for program and experience_level. If p < 0.05, there's a significant main effect, indicating program or experience level impacts completion time.
Look at mean completion times by program and experience level to understand the nature of the effect.
Interaction Effect:
Check the p-value for program:experience_level. If p < 0.05, there's an interaction effect, meaning the effect of one factor depends on the level of the other.
Explore the interaction using visualizations or further comparisons.
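
One simple way to look at the cell means mentioned above is a pandas pivot table. This is a sketch on assumed data: the completion times and program labels mirror the example values, while the experience-level labels are an illustrative assignment, not real measurements.

```python
import pandas as pd

# Illustrative data; the experience-level assignment is an assumption.
data = pd.DataFrame({
    "program": ["A", "A", "B", "B", "C", "C", "A", "B", "C"],
    "experience_level": ["novice", "experienced", "novice", "experienced", "novice",
                         "experienced", "novice", "experienced", "novice"],
    "completion_time": [10, 12, 15, 11, 13, 14, 18, 20, 22],
})

# Mean completion time for every program x experience-level cell.
cell_means = data.pivot_table(
    index="program",
    columns="experience_level",
    values="completion_time",
    aggfunc="mean",
)
print(cell_means)

# Marginal means help describe each main effect.
print(data.groupby("program")["completion_time"].mean())
print(data.groupby("experience_level")["completion_time"].mean())
```

If the gap between novice and experienced users differs noticeably from one program to another in the cell-means table, that pattern is what an interaction effect would reflect.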

In [None]:
Q11: Two-Sample t-test and Post-hoc Test

In [16]:
import pandas as pd
import scipy.stats as stats

# Example data for test scores
experimental_scores = [85, 90, 88, 92, 87]  # Replace with your experimental group scores
control_scores = [78, 82, 80, 85, 79]  # Replace with your control group scores

# Perform an independent two-sample t-test (the two groups do not need equal sizes)
t_statistic, p_value = stats.ttest_ind(experimental_scores, control_scores)
print("t-statistic:", t_statistic)
print("p-value:", p_value)


t-statistic: 4.387862045841161
p-value: 0.0023241881225952343


In [None]:
Post-hoc Test (if p < 0.05):

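With only two groups, a significant t-test already shows which groups differ, so no post-hoc test is needed for this comparison.
If the design had three or more groups, run a one-way ANOVA first, then choose a suitable post-hoc test (e.g., Tukey's HSD, Games-Howell) based on assumptions and group sizes to identify the significantly different groups.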

In [None]:
Q12: Repeated Measures ANOVA and Post-hoc Test

In [25]:
import pandas as pd
from statsmodels.stats.anova import AnovaRM
import warnings
warnings.filterwarnings('ignore')
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Example data
store_data = ['A'] * 9 + ['B'] * 9 + ['C'] * 9  # 9 days for each of 3 stores
day_data = list(range(1, 10)) * 3  # days 1-9, repeated for each store
sales_data = [100, 120, 110, 80, 85, 90, 150, 140, 130] * 3  # Example sales data, repeated for each store

# Create DataFrame
sales_df = pd.DataFrame({
    'store': store_data,
    'day': day_data,
    'sales': sales_data
})

# Perform repeated measures ANOVA
rm_anova = AnovaRM(data=sales_df, depvar='sales', within=['store'], subject='day').fit()
print(rm_anova.summary())

# If the overall repeated measures ANOVA is significant, perform post-hoc Tukey HSD test
# (AnovaResults stores its results in the DataFrame `anova_table`; it has no `pvalues` attribute)
if rm_anova.anova_table.loc['store', 'Pr > F'] < 0.05:
    posthoc = pairwise_tukeyhsd(sales_df['sales'], sales_df['store'], alpha=0.05)
    print(posthoc)


               Anova
      F Value Num DF  Den DF Pr > F
-----------------------------------
store  0.2286 2.0000 16.0000 0.7982