In [1]:
# Q1: Assumptions of ANOVA
# The assumptions required to use ANOVA are:

# Independence of Observations: Observations must be independent of each other, both within and across groups.
# Normality: The residuals (equivalently, the data within each group) should be approximately normally distributed.
# Homogeneity of Variance: The population variances of the groups should be approximately equal.

# Examples of Violations:

# Independence: Measuring the same subjects more than once, or sampling members of the same cluster, makes the data points correlated.
# Normality: Heavily skewed data, or data with extreme outliers within a group, violates the normality assumption.
# Homogeneity of Variance: One group having a much larger variance than another violates the homogeneity of variance assumption.

# Q2: Types of ANOVA
# One-Way ANOVA: Used to compare the means of three or more independent groups defined by a single factor.
# Two-Way ANOVA: Used to examine the effects of two factors on one outcome, covering the main effect of each factor and the interaction between them.
# Repeated Measures ANOVA: Used when the same subjects are measured under every treatment or time point, so the measurements within a subject are correlated.
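
# Q3: Partitioning of Variance in ANOVA
# In ANOVA, the total variability in the data is partitioned into an explained part and a residual part, so that SST = SSE + SSR:

# Total Sum of Squares (SST): Squared deviations of every observation from the grand mean.
# Explained Sum of Squares (SSE): Squared deviations of the group means from the grand mean, weighted by group size (between-group variability).
# Residual Sum of Squares (SSR): Squared deviations of each observation from its own group mean (within-group, unexplained variability).

# Note: Many textbooks use these abbreviations the other way around (SSR for the regression or explained sum of squares, SSE for the error sum of squares), so check the definitions before comparing formulas.
# Understanding this partition shows how much of the variability in the data is explained by the factor being studied. The F-statistic is the ratio of the explained mean square to the residual mean square.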

# Q4: Calculating SST, SSE, and SSR in One-Way ANOVA

import numpy as np
import pandas as pd
from scipy import stats

# Example data
data = {
    'Diet': np.repeat(['A', 'B', 'C'], 50),
    'Weight_Loss': np.random.randn(150)  # Random, unseeded data for illustration; results change on every run
}

df = pd.DataFrame(data)

# One-way ANOVA (f_oneway returns only the F-statistic and p-value, not SST, SSE, or SSR)
anova_results = stats.f_oneway(df[df['Diet'] == 'A']['Weight_Loss'],
                               df[df['Diet'] == 'B']['Weight_Loss'],
                               df[df['Diet'] == 'C']['Weight_Loss'])

anova_results.statistic, anova_results.pvalue


(0.10809733005570983, 0.8976115129829062)
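
The partition described in Q3 can also be computed by hand. The following is a minimal sketch, assuming the same layout as the example above (three diets, 50 observations each) and following this notebook's naming, where SSE is the explained (between-group) sum of squares and SSR is the residual (within-group) sum of squares. The seed and the names `rng`, `f_manual`, and `p_manual` are assumptions added for reproducibility and illustration; they are not part of the original cell.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)  # assumed seed, only so the sketch is reproducible
df = pd.DataFrame({
    'Diet': np.repeat(['A', 'B', 'C'], 50),
    'Weight_Loss': rng.standard_normal(150),  # random data for illustration
})

grand_mean = df['Weight_Loss'].mean()
grouped = df.groupby('Diet')['Weight_Loss']

# SST: squared deviations of every observation from the grand mean
sst = ((df['Weight_Loss'] - grand_mean) ** 2).sum()

# SSE (explained): group size times squared deviation of each group mean from the grand mean
sse = (grouped.count() * (grouped.mean() - grand_mean) ** 2).sum()

# SSR (residual): squared deviations of each observation from its own group mean
ssr = ((df['Weight_Loss'] - grouped.transform('mean')) ** 2).sum()

# F-statistic as the ratio of the explained to the residual mean square
k = grouped.ngroups          # number of groups
n = len(df)                  # total number of observations
f_manual = (sse / (k - 1)) / (ssr / (n - k))
p_manual = stats.f.sf(f_manual, k - 1, n - k)

# Cross-check against scipy's one-way ANOVA
f_scipy, p_scipy = stats.f_oneway(*[g.to_numpy() for _, g in grouped])

print(f"SST={sst:.4f}  SSE={sse:.4f}  SSR={ssr:.4f}")
print(f"manual F={f_manual:.4f}, p={p_manual:.4f}")
print(f"scipy  F={f_scipy:.4f}, p={p_scipy:.4f}")
```

If the partition is computed correctly, `sse + ssr` matches `sst` up to floating-point error, and the manual F-statistic agrees with the one from `stats.f_oneway`.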

In [2]:
# Q5: Main Effects and Interaction Effects in Two-Way ANOVA
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data
data = {
    'Program': np.repeat(['A', 'B', 'C'], 30),
    'Experience': np.tile(['Novice', 'Experienced'], 45),
    'Time': np.random.randn(90)  # Random data for illustration
}

df = pd.DataFrame(data)

# Two-way ANOVA calculation
model = ols('Time ~ C(Program) * C(Experience)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

anova_table


                             sum_sq    df         F    PR(>F)
C(Program)                 1.715942   2.0  0.837566  0.436344
C(Experience)              0.299262   1.0  0.292144  0.590280
C(Program):C(Experience)   0.516912   2.0  0.252309  0.777591
Residual                  86.046417  84.0       NaN       NaN
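
When reading main and interaction effects, it can help to look at the cell sizes and cell means next to the ANOVA table. The following is a minimal sketch, assuming the same factor layout as the example above; the seed and the names `cell_counts`, `cell_means`, and `experience_gap` are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumed seed, only so the sketch is reproducible
df = pd.DataFrame({
    'Program': np.repeat(['A', 'B', 'C'], 30),
    'Experience': np.tile(['Novice', 'Experienced'], 45),
    'Time': rng.standard_normal(90),  # random data for illustration
})

# Number of observations in each Program x Experience cell (balanced if all equal)
cell_counts = pd.crosstab(df['Program'], df['Experience'])

# Mean completion time in each cell
cell_means = df.pivot_table(index='Program', columns='Experience',
                            values='Time', aggfunc='mean')

# Novice minus Experienced gap within each program
experience_gap = cell_means['Novice'] - cell_means['Experienced']

print(cell_counts)
print(cell_means.round(3))
print(experience_gap.round(3))
```

Gaps of similar size across programs are what the absence of an interaction looks like; gaps that differ markedly between programs point toward one. With purely random data, as here, any differences in the gaps are noise, which is consistent with the large p-values in the table.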


In [4]:

# Q6: Interpretation of F-Statistic and P-Value
# Given:

# F-statistic: 5.23
# P-value: 0.02

# Conclusion:
# Since the p-value (0.02) is less than the significance level (0.05), we reject the null hypothesis that all group means are equal. At least one group mean differs from the others, but the F-test alone does not identify which one; a post-hoc test is needed for that.

# Q7: Handling Missing Data in Repeated Measures ANOVA
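# In repeated measures ANOVA, every subject needs a value in every condition, so missing data is commonly handled by:

# Listwise Deletion: Removing all data for any participant with a missing value.
# Mean Substitution: Replacing each missing value with the mean of the observed values.
# Multiple Imputation: Creating several completed datasets with plausible imputed values, analyzing each one, and pooling the results.

# Potential Consequences:

# Listwise Deletion: Reduces the sample size and the power of the test, and can bias the results if the data are not missing completely at random.
# Mean Substitution: Underestimates the variability and distorts the correlations between the repeated measurements.
# Multiple Imputation: More complex to carry out, but it reflects the uncertainty about the missing values and is generally less biased than the other two when the imputation model is reasonable.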

# Q8: Common Post-Hoc Tests
# Tukey's HSD: Used to compare all pairs of group means while controlling the family-wise error rate.
# Bonferroni Correction: Divides the significance level by the number of comparisons to control for multiple comparisons.
# Scheffé Test: More conservative; suited to testing many or complex contrasts, not only pairwise differences.
# Example: After a significant F-statistic in a one-way ANOVA comparing three diets, Tukey's HSD can show which specific pairs of diets differ.

# Q9: One-Way ANOVA for Weight Loss Diets
# Example data
data = {
    'Diet': np.repeat(['A', 'B', 'C'], 50),
    'Weight_Loss': np.random.randn(150)  # Random, unseeded data for illustration; results change on every run
}

df = pd.DataFrame(data)

# One-way ANOVA calculation
anova_results = stats.f_oneway(df[df['Diet'] == 'A']['Weight_Loss'],
                               df[df['Diet'] == 'B']['Weight_Loss'],
                               df[df['Diet'] == 'C']['Weight_Loss'])

anova_results.statistic, anova_results.pvalue


(0.44582906451384186, 0.6411560688892415)
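
The p-value above (about 0.64) is well above 0.05, so with this random data there is no evidence that the diets differ, and a post-hoc test would not normally follow. Purely to illustrate the Bonferroni correction from Q8, the following is a minimal sketch of pairwise comparisons on the same data layout. The seed, the significance level `alpha = 0.05`, and the names `groups`, `pairs`, and `posthoc` are assumptions introduced for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)  # assumed seed, only so the sketch is reproducible
df = pd.DataFrame({
    'Diet': np.repeat(['A', 'B', 'C'], 50),
    'Weight_Loss': rng.standard_normal(150),  # random data for illustration
})

alpha = 0.05  # assumed family-wise significance level

# One array of weight-loss values per diet
groups = {name: g.to_numpy() for name, g in df.groupby('Diet')['Weight_Loss']}

# All pairwise comparisons: A vs B, A vs C, B vs C
pairs = list(combinations(groups, 2))

# Bonferroni: divide alpha by the number of comparisons
alpha_adjusted = alpha / len(pairs)

rows = []
for a, b in pairs:
    t_stat, p_value = stats.ttest_ind(groups[a], groups[b])
    rows.append({
        'pair': f'{a} vs {b}',
        't': t_stat,
        'p': p_value,
        'reject': p_value < alpha_adjusted,  # compare against the adjusted level
    })

posthoc = pd.DataFrame(rows)
print(f"adjusted alpha = {alpha_adjusted:.4f}")
print(posthoc)
```

Pairwise t-tests with a Bonferroni threshold are a simple stand-in here; recent SciPy releases also include `scipy.stats.tukey_hsd` for Tukey's HSD.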

In [None]:
# Q10: Two-Way ANOVA for Task Completion Time