Q1. Assumptions for ANOVA:

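Independence of observations
Normally distributed residuals
Homogeneity of variances (equal variance in every group)

Examples of violations:

Independence: Repeated measurements taken over time from the same individuals are correlated with one another.
Normality: Residuals follow a strongly skewed or heavy-tailed distribution.
Homogeneity of variances: The spread of the outcome differs markedly between groups, for example one group's variance is several times another's.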
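
Two of these assumptions can be screened numerically before running the test. The following is a minimal sketch, assuming three hypothetical groups of made-up values: it applies the Shapiro-Wilk test (`scipy.stats.shapiro`) to the residuals and Levene's test (`scipy.stats.levene`) to the groups. Independence depends on how the data were collected and is not checked here.

```python
import numpy as np
from scipy.stats import shapiro, levene

# Hypothetical measurements for three groups (illustrative values only)
group_1 = np.array([4.1, 5.0, 4.8, 5.3, 4.6, 5.1])
group_2 = np.array([5.9, 6.2, 5.5, 6.8, 6.1, 5.7])
group_3 = np.array([4.9, 5.4, 5.2, 6.0, 5.6, 5.1])
groups = [group_1, group_2, group_3]

# Residuals: each observation minus the mean of its own group
residuals = np.concatenate([g - g.mean() for g in groups])

# Normality of residuals (null hypothesis: residuals are normally distributed)
shapiro_stat, shapiro_p = shapiro(residuals)

# Homogeneity of variances (null hypothesis: all groups have equal variance)
levene_stat, levene_p = levene(*groups)

print("Shapiro-Wilk p-value:", shapiro_p)
print("Levene p-value:", levene_p)
```

A small p-value from either test is evidence against the corresponding assumption; with small samples these tests have little power, so plots of the residuals are a useful complement.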

Q2. Three types of ANOVA:

One-way ANOVA: Used when comparing means of three or more groups on a single factor.
Two-way ANOVA: Examines the influence of two categorical independent variables on one continuous dependent variable.
Repeated measures ANOVA: Used when the same subjects are measured at multiple time points or under multiple conditions.

Q3. Partitioning of variance:

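Total sum of squares (SST): all variation of the observations around the overall mean
Explained sum of squares (SSE): variation between groups, accounted for by the model
Residual sum of squares (SSR): variation within groups, left unexplained
These satisfy SST = SSE + SSR. Understanding the partition shows how much of the total variance the model accounts for (the ratio SSE / SST) and underlies the F-test. Note that some textbooks swap the abbreviations, using SSE for the error (residual) sum of squares and SSR for the regression (explained) sum of squares; the labels above are the ones used in these answers.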

Q4. Calculation of SST, SSE, and SSR in one-way ANOVA:

SST: Sum of squared deviations of each observation from the overall mean.
SSE: Sum over groups of the group size times the squared deviation of the group mean from the overall mean.
SSR: Sum of squared deviations of each observation from its group mean.
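
The three definitions can be checked by hand on a small data set. This is a sketch on hypothetical toy data (three groups of three observations), using this section's labels (SSE for explained, SSR for residual); it also forms the F-statistic from the two mean squares.

```python
import numpy as np

# Hypothetical toy data: three groups of three observations each
groups = [
    np.array([1.0, 2.0, 3.0]),
    np.array([2.0, 3.0, 4.0]),
    np.array([5.0, 6.0, 7.0]),
]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()

# SST: every observation against the overall mean
sst = ((all_obs - grand_mean) ** 2).sum()

# SSE (explained): each group mean against the overall mean, weighted by group size
sse = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# SSR (residual): every observation against its own group mean
ssr = sum(((g - g.mean()) ** 2).sum() for g in groups)

k = len(groups)   # number of groups
n = len(all_obs)  # total number of observations

# F = mean square explained / mean square residual
f_stat = (sse / (k - 1)) / (ssr / (n - k))

print("SST:", sst, "SSE:", sse, "SSR:", ssr)
print("F-statistic:", f_stat)
```

For this toy data SST is 32, SSE is 26 and SSR is 6, so the partition SST = SSE + SSR holds and F = (26 / 2) / (6 / 6) = 13.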

Q5. Calculation of main effects and interaction effects in two-way ANOVA:

Main effects: Differences among the levels of one independent variable regardless of the levels of the other independent variable.
Interaction effects: When the effect of one independent variable depends on the level of another independent variable.
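
To make the two ideas concrete, here is a small sketch on a hypothetical 2x2 table of cell means. It assumes a balanced design (equal cell sizes), so that marginal means are plain averages of the cell means; the factor names A and B and all values are made up for illustration.

```python
import numpy as np

# Hypothetical cell means for a balanced 2x2 design
# rows: levels of factor A (a1, a2); columns: levels of factor B (b1, b2)
cell_means = np.array([
    [10.0, 12.0],
    [14.0, 20.0],
])

# Marginal means: average over the levels of the other factor
a_means = cell_means.mean(axis=1)  # one mean per level of A
b_means = cell_means.mean(axis=0)  # one mean per level of B

# Main effects: difference between the marginal means of each factor
main_effect_a = a_means[1] - a_means[0]
main_effect_b = b_means[1] - b_means[0]

# Interaction: the effect of B at a2 minus the effect of B at a1
effect_b_at_a1 = cell_means[0, 1] - cell_means[0, 0]
effect_b_at_a2 = cell_means[1, 1] - cell_means[1, 0]
interaction = effect_b_at_a2 - effect_b_at_a1

print("Main effect of A:", main_effect_a)
print("Main effect of B:", main_effect_b)
print("Interaction (difference of differences):", interaction)
```

A nonzero difference of differences means the effect of B is not the same at every level of A; whether it is statistically significant is what the interaction F-test in a two-way ANOVA assesses.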

Q6. Interpretation of F-statistic and p-value:

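The F-statistic is the ratio of between-group variance (mean square explained) to within-group variance (mean square residual); values well above 1 indicate that the group means differ more than chance alone would predict.
The p-value is the probability of obtaining an F-statistic at least as large as the observed one if the null hypothesis (all group means equal) is true.
A low p-value (conventionally < 0.05) leads to rejecting the null hypothesis, indicating that at least one group mean differs from the others; it does not say which groups differ.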
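
The link between the F-statistic and the p-value can be sketched with the F distribution in `scipy.stats`. The example assumes an F of about 10.25 from a one-way ANOVA on three groups of ten observations, which gives 2 and 27 degrees of freedom; the variable names are illustrative.

```python
from scipy.stats import f

f_stat = 10.247191011235957  # assumed observed F-statistic
df_between = 3 - 1           # number of groups minus 1
df_within = 30 - 3           # total observations minus number of groups

# p-value: upper-tail probability of the F distribution at the observed value
p_value = f.sf(f_stat, df_between, df_within)

# Critical value: the F that would be exceeded only 5% of the time under the null
critical_value = f.ppf(0.95, df_between, df_within)

print("P-value:", p_value)
print("Critical F at alpha = 0.05:", critical_value)
print("Reject null hypothesis:", f_stat > critical_value)
```

Comparing the p-value with 0.05 and comparing the F-statistic with the critical value are two views of the same decision.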

Q7. Handling missing data in repeated measures ANOVA:

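Dropping subjects with missing values (listwise deletion) reduces sample size and can bias results when the data are not missing completely at random.
Simple methods include mean imputation and last observation carried forward (LOCF), both of which can distort the variance of the data. Statistical techniques such as multiple imputation generally handle missing values more reliably.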
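
The simpler strategies can be illustrated with pandas. This sketch assumes hypothetical wide-format data (one row per subject, one column per time point) with two missing values; multiple imputation is not shown because it requires a fitted imputation model.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: rows are subjects, columns are time points
scores = pd.DataFrame(
    {
        "t1": [10.0, 12.0, 11.0, 13.0],
        "t2": [11.0, np.nan, 12.0, 14.0],
        "t3": [12.0, 14.0, np.nan, 16.0],
    },
    index=["s1", "s2", "s3", "s4"],
)

# Listwise deletion: keep only subjects observed at every time point
complete_cases = scores.dropna()

# Mean imputation: replace each missing value with the mean of its time point
mean_imputed = scores.fillna(scores.mean())

# LOCF: carry each subject's previous observation forward along the row
locf = scores.ffill(axis=1)

print("Subjects kept by listwise deletion:", list(complete_cases.index))
print(mean_imputed)
print(locf)
```

Here listwise deletion discards half of the subjects, which shows why imputation is often considered even though each simple method makes its own strong assumption about the missing values.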

Q8. Common post-hoc tests:

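Tukey's HSD: Compares all possible pairs of means while controlling the family-wise error rate.
Bonferroni correction: Divides the significance threshold by the number of comparisons; simple but conservative when there are many comparisons.
Scheffé test: Covers all possible contrasts, not only pairwise comparisons; conservative, and usable with unequal sample sizes.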
Post-hoc tests help identify which specific groups differ significantly after finding a significant omnibus ANOVA result.
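
As an illustration, Tukey's HSD can be run with `pairwise_tukeyhsd` from statsmodels. This is a sketch, not the only way to do it; it uses the weight loss values for diets A, B and C from the one-way ANOVA example in this notebook.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Weight loss values for the three diet groups
weight_loss = {
    "A": [2, 3, 4, 5, 2, 3, 4, 5, 3, 4],
    "B": [1, 2, 3, 2, 3, 4, 3, 2, 3, 4],
    "C": [3, 4, 5, 4, 5, 6, 5, 4, 5, 6],
}

# Flatten into one array of values and one array of matching group labels
values = np.concatenate([np.array(v, dtype=float) for v in weight_loss.values()])
labels = np.repeat(list(weight_loss), [len(v) for v in weight_loss.values()])

# Compare every pair of groups at a family-wise significance level of 0.05
tukey = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)

print(tukey.summary())
```

The summary lists one row per pair of groups with the mean difference, an adjusted p-value, a confidence interval, and whether the null hypothesis of equal means is rejected for that pair.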

In [3]:
## 9
from scipy.stats import f_oneway

# Example data for weight loss in three diet groups: A, B, and C
weight_loss_A = [2, 3, 4, 5, 2, 3, 4, 5, 3, 4]  # List of weight loss values for group A
weight_loss_B = [1, 2, 3, 2, 3, 4, 3, 2, 3, 4]  # List of weight loss values for group B
weight_loss_C = [3, 4, 5, 4, 5, 6, 5, 4, 5, 6]  # List of weight loss values for group C

# Performing one-way ANOVA
f_statistic, p_value = f_oneway(weight_loss_A, weight_loss_B, weight_loss_C)

print("F-statistic:", f_statistic)
print("P-value:", p_value)

if p_value < 0.05:
    print("Reject null hypothesis: There are significant differences between the mean weight loss of the three diets.")
else:
    print("Fail to reject null hypothesis: There are no significant differences between the mean weight loss of the three diets.")


F-statistic: 10.247191011235957
P-value: 0.0004883605342438076
Reject null hypothesis: There are significant differences between the mean weight loss of the three diets.


In [11]:
## 10
import pandas as pd

# Creating a DataFrame for task completion time with two factors: software program and experience level
data = pd.DataFrame({
    'program': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C'],
    'experience_level': ['novice', 'experienced'] * 6,
    'time': [10, 12, 15, 14, 9, 8, 11, 13, 16, 15, 10, 9]  # Sample completion times
})

print(data)





   program experience_level  time
0        A           novice    10
1        A      experienced    12
2        B           novice    15
3        B      experienced    14
4        C           novice     9
5        C      experienced     8
6        A           novice    11
7        A      experienced    13
8        B           novice    16
9        B      experienced    15
10       C           novice    10
11       C      experienced     9


In [7]:
## 11
from scipy.stats import ttest_ind

# Example data for test scores in control and experimental groups
control_scores = [85, 87, 88, 84, 90, 91, 83, 82, 89, 86]  # List of test scores for the control group
experimental_scores = [88, 84, 82, 90, 87, 85, 91, 89, 86, 83]  # List of test scores for the experimental group

# Performing two-sample t-test
t_statistic, p_value = ttest_ind(control_scores, experimental_scores)

print("T-statistic:", t_statistic)
print("P-value:", p_value)

if p_value < 0.05:
    print("Reject null hypothesis: There is a significant difference in test scores between the control and experimental groups.")
else:
    print("Fail to reject null hypothesis: There is no significant difference in test scores between the control and experimental groups.")


T-statistic: 0.0
P-value: 1.0
Fail to reject null hypothesis: There is no significant difference in test scores between the control and experimental groups.


In [8]:
## 12
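import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Synthetic daily sales for three retail stores over 30 days.
# The 10-day pattern is repeated to cover 30 days, and seeded random noise is added
# so that the stores' series are not identical (otherwise the residual variance is zero).
rng = np.random.default_rng(0)
pattern = [100, 110, 105, 120, 115, 118, 105, 115, 110, 125] * 3  # 30 days
sales_data = pd.DataFrame({
    'store': ['A'] * 30 + ['B'] * 30 + ['C'] * 30,
    'day': list(range(1, 31)) * 3,
    'sales': np.tile(pattern, 3) + rng.normal(0, 5, size=90)  # 90 values: 30 days per store
})

# Performing repeated measures ANOVA: each store is a subject, day is the within-subject factor
rm_anova = AnovaRM(sales_data, 'sales', 'store', within=['day']).fit()

print(rm_anova.summary())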