In [1]:
#ans 1
ANOVA (Analysis of Variance) relies on several key assumptions:

Independence of Observations: Each observation must be independent of every other observation, both within and between groups.

Violation Example: Measuring the same people under several conditions and then analyzing the data as if the groups were unrelated, ignoring the dependency between repeated measures.
Normality: The data within each group (equivalently, the model residuals) should be approximately normally distributed.

Violation Example: Data in one or more groups are highly skewed or contain extreme outliers.
Homogeneity of Variances (Homoscedasticity): The variance among the groups should be approximately equal.

Violation Example: If one group has much larger variance than the others, it could violate this assumption, affecting the validity of the ANOVA results.
Violations of these assumptions can lead to inaccurate F-statistics and p-values, resulting in incorrect conclusions.
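
The normality and equal-variance assumptions above can be screened before running ANOVA. The following is a minimal sketch, assuming SciPy is available; the scores are hypothetical values invented purely for illustration, and the Shapiro-Wilk and Levene tests are one common choice among several.

```python
from scipy.stats import shapiro, levene

# Hypothetical scores for three independent groups (illustrative only)
groups = {
    'A': [23, 25, 27, 24, 26, 22, 28],
    'B': [31, 33, 35, 30, 34, 32, 36],
    'C': [40, 42, 44, 39, 43, 41, 45],
}

# Shapiro-Wilk test per group: a small p-value suggests non-normality
normality_p = {}
for name, values in groups.items():
    _, p = shapiro(values)
    normality_p[name] = p

# Levene's test across groups: a small p-value suggests unequal variances
levene_stat, levene_p = levene(*groups.values())

print("Shapiro-Wilk p-values:", normality_p)
print("Levene p-value:", levene_p)
```

Independence cannot be tested this way; it has to be judged from how the data were collected.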

In [None]:
#ans 2
One-Way ANOVA: Used when comparing the means of three or more independent groups based on one factor.

Situation: Comparing test scores among students from different teaching methods.
Two-Way ANOVA: Used when comparing means across the levels of two independent factors, including the interaction between them.

Situation: Evaluating the effect of teaching method and study time on test scores.
Repeated Measures ANOVA: Used when the same subjects are measured under different conditions.

Situation: Comparing the effect of a drug at different time points on the same group of patients.

In [None]:
#ans 3
Partitioning of variance in ANOVA refers to breaking down the total variability in the data into components:

Total Sum of Squares (SST): Total variability in the data.
Explained Sum of Squares (SSE): Variability explained by the model (differences between group means).
Residual Sum of Squares (SSR): Variability within the groups (unexplained by the model).
The components add up: SST = SSE + SSR. Understanding this concept is crucial because it shows how much of the total variability is explained by the factors being studied, which is central to significance testing in ANOVA.
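
Written out for a one-way design, the three components can be sketched as follows. The notation is an assumption introduced here: k groups, n_j observations in group j, y_ij the i-th observation in group j, \bar{y}_j the group mean, and \bar{y} the grand mean.

```latex
\begin{aligned}
\mathrm{SST} &= \sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(y_{ij} - \bar{y}\right)^2 \\
\mathrm{SSE} &= \sum_{j=1}^{k} n_j \left(\bar{y}_j - \bar{y}\right)^2 \\
\mathrm{SSR} &= \sum_{j=1}^{k}\sum_{i=1}^{n_j} \left(y_{ij} - \bar{y}_j\right)^2 \\
\mathrm{SST} &= \mathrm{SSE} + \mathrm{SSR}
\end{aligned}
```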

In [2]:
#ans 4
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

# Example dataset
data = {
    'Group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
    'Score': [23, 25, 27, 31, 33, 35, 40, 42, 44]
}
df = pd.DataFrame(data)

# Grand Mean
grand_mean = np.mean(df['Score'])

# SST: Total Sum of Squares
sst = np.sum((df['Score'] - grand_mean) ** 2)

# Group Means
group_means = df.groupby('Group')['Score'].mean()

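# SSE: Explained (between-group) Sum of Squares
# Weight each group's squared deviation from the grand mean by its size.
# Both Series are indexed by group label, so they align correctly.
group_sizes = df.groupby('Group').size()
sse = np.sum(group_sizes * (group_means - grand_mean) ** 2)

# SSR: Residual Sum of Squares
ssr = sst - sse

print(f"SST: {sst:.2f}, SSE: {sse:.2f}, SSR: {ssr:.2f}")


SST: 458.00, SSE: 434.00, SSR: 24.00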
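
As a cross-check on the partition, the F-statistic can be built by hand from the between-group and within-group sums of squares and compared with `scipy.stats.f_oneway`. This is a minimal sketch on the same nine scores; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import f_oneway

# Same scores as the example dataset, split by group
groups = {
    'A': np.array([23, 25, 27]),
    'B': np.array([31, 33, 35]),
    'C': np.array([40, 42, 44]),
}
all_scores = np.concatenate(list(groups.values()))
grand_mean = all_scores.mean()

# Between-group (explained) and within-group (residual) sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

k = len(groups)       # number of groups
n = len(all_scores)   # total number of observations

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))
f_scipy, p_scipy = f_oneway(*groups.values())

print(f"Manual F: {f_manual:.2f}, SciPy F: {f_scipy:.2f}, p-value: {p_scipy:.5f}")
```

The two F values should agree, which confirms that the explained and residual components were computed consistently.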


In [3]:
#ans 5
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example dataset
data = {
    'Factor1': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C'],
    'Factor2': ['X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'Y', 'X', 'Y'],
    'Score': [23, 25, 27, 31, 33, 35, 21, 29, 32, 38, 40, 42]
}
df = pd.DataFrame(data)

# ANOVA model
model = ols('Score ~ C(Factor1) + C(Factor2) + C(Factor1):C(Factor2)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)


                           sum_sq   df          F    PR(>F)
C(Factor1)             340.666667  2.0  10.645833  0.010626
C(Factor2)              48.000000  1.0   3.000000  0.133975
C(Factor1):C(Factor2)    6.000000  2.0   0.187500  0.833706
Residual                96.000000  6.0        NaN       NaN


In [None]:
#ans 6
Given an F-statistic of 5.23 and a p-value of 0.02 in a one-way ANOVA:

Conclusion: The p-value is 0.02, which is less than the typical significance level of 0.05. This suggests that there is a statistically significant difference between the group means.
Interpretation: We reject the null hypothesis, which states that all group means are equal, and conclude that at least one group mean is different from the others.
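
The p-value is the upper-tail area of the F distribution at the observed statistic. The sketch below shows that relationship; the degrees of freedom are hypothetical (the question does not state them), so the computed p-value will not necessarily equal the 0.02 quoted above.

```python
from scipy.stats import f

f_statistic = 5.23
df_between = 2    # hypothetical: 3 groups, so k - 1 = 2
df_within = 27    # hypothetical: 30 observations, so N - k = 27

# Survival function = P(F >= observed value) under the null hypothesis
p_value = f.sf(f_statistic, df_between, df_within)

alpha = 0.05
reject_null = p_value < alpha

print(f"p-value: {p_value:.4f}, reject H0 at alpha={alpha}: {reject_null}")
```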

In [None]:
#ans 7
Handling missing data in repeated measures ANOVA can be challenging:

Methods:

Listwise Deletion: Remove every subject with any missing measurement. This shrinks the sample and can bias results unless the data are missing completely at random.
Imputation: Estimate missing values using methods such as mean substitution, regression imputation, or multiple imputation.
Mixed-Effects Models: These use all available observations from each subject, so they handle missing data without requiring imputation.
Consequences: Different methods can lead to different results. Simple imputation (for example, mean substitution) understates variability and can introduce bias, while listwise deletion reduces statistical power.
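
The difference between the first two approaches can be seen on a small table. This is a minimal sketch with hypothetical wide-format data (one row per subject, one column per time point); the column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical repeated measures: 5 subjects, 3 time points, 2 missing values
wide = pd.DataFrame({
    'subject': [1, 2, 3, 4, 5],
    't1': [5.0, 6.0, 5.5, 7.0, 6.5],
    't2': [5.5, np.nan, 6.0, 7.5, 7.0],
    't3': [6.0, 6.5, np.nan, 8.0, 7.5],
}).set_index('subject')

# Listwise deletion: drop every subject with at least one missing value
listwise = wide.dropna()

# Mean substitution: fill each gap with that time point's observed mean
mean_imputed = wide.fillna(wide.mean())

print(f"Subjects kept after listwise deletion: {len(listwise)} of {len(wide)}")
print(mean_imputed)
```

Listwise deletion keeps only complete subjects, while mean substitution keeps every subject but pulls the filled values toward the column mean.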

In [None]:
#ans 8
Common post-hoc tests include:

Tukey's HSD (Honestly Significant Difference): Used to compare all pairs of group means while controlling the family-wise Type I error rate.

Example: Used after a one-way ANOVA with three or more groups to determine which specific group means are significantly different from each other.
Bonferroni Correction: Adjusts the significance level when performing multiple comparisons to reduce the risk of Type I error.

Example: Used when several pairwise t-tests are conducted after ANOVA.
Scheffé Test: A more conservative post-hoc test, useful when making complex comparisons among means.

Example: Applied in exploratory analyses when many comparisons are made among group means.
A post-hoc test might be necessary when ANOVA shows a significant difference but doesn’t indicate which specific groups differ.
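
As an illustration of the Bonferroni approach, the sketch below runs all pairwise t-tests on the three groups from the small example dataset used earlier in this notebook and multiplies each p-value by the number of comparisons. It is a simple hand-rolled correction, not a substitute for a dedicated Tukey HSD routine.

```python
from itertools import combinations
from scipy.stats import ttest_ind

# Scores for the three groups in the earlier example dataset
groups = {
    'A': [23, 25, 27],
    'B': [31, 33, 35],
    'C': [40, 42, 44],
}

pairs = list(combinations(groups, 2))  # (A, B), (A, C), (B, C)

# Bonferroni: multiply each raw p-value by the number of comparisons, cap at 1
adjusted_p = {}
for a, b in pairs:
    _, p_raw = ttest_ind(groups[a], groups[b])
    adjusted_p[(a, b)] = min(p_raw * len(pairs), 1.0)

for (a, b), p in adjusted_p.items():
    print(f"{a} vs {b}: Bonferroni-adjusted p = {p:.4f}")
```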

In [4]:
#ans 9
import pandas as pd
from scipy.stats import f_oneway

# Example data: Weight loss in pounds for 50 participants across three diets
data = {
    'Diet': ['A']*17 + ['B']*17 + ['C']*16,
    'WeightLoss': [4.5, 5.2, 6.1, 3.9, 5.6, 4.8, 5.3, 5.1, 6.0, 4.7, 5.4, 6.2, 5.9, 5.7, 4.6, 4.9, 5.0,
                   3.2, 3.8, 4.5, 4.7, 3.6, 3.9, 4.0, 4.3, 4.4, 4.2, 3.7, 3.9, 4.1, 4.2, 3.5, 3.6, 3.7, 
                   7.1, 7.3, 6.9, 7.0, 7.4, 6.8, 7.2, 6.7, 7.5, 6.6, 7.0, 7.1, 6.8, 7.3, 7.4, 6.7]
}
df = pd.DataFrame(data)

# Performing one-way ANOVA
f_statistic, p_value = f_oneway(df[df['Diet'] == 'A']['WeightLoss'],
                                df[df['Diet'] == 'B']['WeightLoss'],
                                df[df['Diet'] == 'C']['WeightLoss'])

print(f"F-statistic: {f_statistic}, p-value: {p_value}")


F-statistic: 183.22393063583797, p-value: 6.432977662891749e-23


In [5]:
#ans 10
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data: Time to complete the task in minutes.
# Assumed layout: 10 times per program, the first 5 from novices and the
# last 5 from experienced employees, so every column has 30 entries.
data = {
    'Program': ['A']*10 + ['B']*10 + ['C']*10,
    'Experience': (['Novice']*5 + ['Experienced']*5) * 3,
    'Time': [22, 21, 23, 24, 25, 22, 20, 21, 23, 24, 28, 27, 29, 30, 28, 25, 24, 26, 27, 25, 30, 31, 29, 32, 30,
             26, 27, 28, 29, 26]
}
df = pd.DataFrame(data)

# Performing two-way ANOVA
model = ols('Time ~ C(Program) + C(Experience) + C(Program):C(Experience)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)

In [8]:
#ans 11
from scipy.stats import ttest_ind

# Example data: Test scores for control and experimental groups.
# 30 control scores and 21 experimental scores are listed (split assumed at
# the jump in values); group labels are built from the list lengths so the
# two columns always match.
control_scores = [72, 74, 69, 71, 73, 75, 68, 70, 74, 72, 70, 69, 73, 71, 74, 72, 70, 68, 73, 74, 69, 71, 72, 70, 68, 74,
                  72, 73, 71, 75]
experimental_scores = [85, 83, 84, 82, 85, 87, 81, 83, 86, 85, 82, 83, 86, 84, 85, 87, 82, 81, 83, 84, 88]
data = {
    'Group': ['Control']*len(control_scores) + ['Experimental']*len(experimental_scores),
    'TestScore': control_scores + experimental_scores
}

df = pd.DataFrame(data)

# Performing two-sample t-test
t_statistic, p_value = ttest_ind(df[df['Group'] == 'Control']['TestScore'],
                                 df[df['Group'] == 'Experimental']['TestScore'])


print(f"T-statistic: {t_statistic}, p-value: {p_value}")

In [7]:
#ans 12
from statsmodels.stats.anova import AnovaRM

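# Example data: Sales in dollars for 30 days across three stores.
# NOTE: 'Day' and 'Store' each have 90 entries, but only 61 sales values are
# listed below. The remaining 29 values are missing, which is what raises
# the ValueError when the DataFrame is built.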
data = {
    'Day': list(range(1, 31)) * 3,
    'Store': ['A']*30 + ['B']*30 + ['C']*30,
    'Sales': [200, 210, 190, 215, 220, 205, 210, 195, 205, 215, 220, 200, 210, 220, 230, 240, 225, 215, 220, 235, 230,
              210, 205, 195, 200, 210, 215, 225, 235, 240, 150, 160, 170, 180, 190, 175, 180, 170, 185, 190, 200, 210,
              190, 180, 170, 185, 175, 200, 195, 185, 190, 200, 190, 175, 185, 180, 170, 185, 180, 195, 200]
}
df = pd.DataFrame(data)

# Performing repeated measures ANOVA
aovrm = AnovaRM(df, 'Sales', 'Day', within=['Store'])
res = aovrm.fit()

print(res)


ValueError: All arrays must be of the same length
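
For reference, `AnovaRM` needs balanced long-format data with exactly one observation per subject and within-factor level. The following is a minimal sketch, assuming statsmodels is installed; the sales figures are hypothetical values for 5 days and 3 stores, invented only to show the call shape, with each day treated as the repeated "subject".

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical, made-up sales for 5 days at 3 stores (illustrative only)
sales = {
    'A': [200, 210, 190, 215, 220],
    'B': [150, 160, 170, 180, 190],
    'C': [230, 225, 240, 235, 245],
}

# Long format: one row per (Day, Store) combination
rows = [
    {'Day': day, 'Store': store, 'Sales': value}
    for store, values in sales.items()
    for day, value in enumerate(values, start=1)
]
long_df = pd.DataFrame(rows)

# Each day is measured at every store, so Store is the within-subject factor
res = AnovaRM(long_df, depvar='Sales', subject='Day', within=['Store']).fit()

print(res.anova_table)
```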