Q1: Assumptions and Violations in ANOVA:
Assumptions required for the analysis of variance (ANOVA) include:
1. Independence: Observations within and between groups should be independent.
2. Normality: The residuals (differences between observed and predicted values) should follow a normal distribution.
3. Homogeneity of Variance: The variance of the residuals should be approximately equal for all groups (homoscedasticity).

Violations of these assumptions can impact the validity of ANOVA results. For example:
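- Violation of independence can lead to pseudoreplication, where non-independent observations (e.g., repeated measurements of the same unit) are treated as independent replicates, which inflates the effective sample size and the Type I error rate.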
- Violation of normality can affect the validity of p-values and confidence intervals.
- Violation of homogeneity of variance can lead to unreliable F-statistics and p-values.
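
The normality and homogeneity assumptions above can be checked before running the ANOVA. The following is a minimal sketch, assuming the three groups used in Q4 of this notebook; it applies scipy.stats.shapiro to the pooled residuals and scipy.stats.levene across the groups. The variable names are illustrative, and a non-significant result does not prove that an assumption holds.

```python
import numpy as np
from scipy import stats

# The three groups from Q4, reused here only for illustration
groups = [
    np.array([5, 7, 8, 6, 5]),
    np.array([9, 11, 12, 10, 9]),
    np.array([14, 16, 17, 15, 14]),
]

# Residuals of a one-way ANOVA are deviations from each group's own mean
residuals = np.concatenate([g - g.mean() for g in groups])

# Shapiro-Wilk: the null hypothesis is that the residuals are normally distributed
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Levene: the null hypothesis is that all groups have equal variance
levene_stat, levene_p = stats.levene(*groups)

print(f"Shapiro-Wilk p-value: {shapiro_p:.3f}")
print(f"Levene p-value: {levene_p:.3f}")
```

A small p-value from either test suggests the corresponding assumption may be violated. Independence cannot be tested this way; it has to come from the study design.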

Q2: Types of ANOVA:
There are three main types of ANOVA:
1. One-Way ANOVA: Used to compare the means of three or more groups when you have one categorical independent variable.
2. Two-Way ANOVA: Used when there are two categorical independent variables (factors), and you want to examine their main effects and interaction.
3. N-Way ANOVA: Generalization of two-way ANOVA to more than two independent variables.

In a one-way ANOVA, you compare the means of different groups. In a two-way ANOVA, you explore the influence of two factors and their interaction on a dependent variable.

Q3: Partitioning of Variance in ANOVA:
The partitioning of variance in ANOVA involves breaking down the total variability in the data into different sources:
- Total Sum of Squares (SST): The total variability in the data.
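- Explained Sum of Squares (SSR, also called the between-group sum of squares): The variability attributed to the independent variable(s) or factors.
- Residual Sum of Squares (SSE, also called the within-group or error sum of squares): The unexplained variability or error in the model.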

Understanding this partitioning is important because it allows you to assess the proportion of variance explained by the factors, helping you determine if the differences are statistically significant.
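
For the one-way case, the decomposition can be written out as below. This is a sketch in notation introduced here, not taken from the text above: $k$ groups, $n_j$ observations in group $j$, $N$ observations in total, $\bar{y}_j$ for a group mean, $\bar{y}$ for the grand mean, and the neutral labels $SS_{\text{between}}$ (explained) and $SS_{\text{within}}$ (residual).

```latex
\begin{aligned}
SS_{\text{total}}
  &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left( y_{ij} - \bar{y} \right)^2 \\
  &= \underbrace{\sum_{j=1}^{k} n_j \left( \bar{y}_j - \bar{y} \right)^2}_{SS_{\text{between}}}
   + \underbrace{\sum_{j=1}^{k} \sum_{i=1}^{n_j} \left( y_{ij} - \bar{y}_j \right)^2}_{SS_{\text{within}}} \\[4pt]
F &= \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
\end{aligned}
```

The proportion of variance explained is $SS_{\text{between}} / SS_{\text{total}}$, and the F-statistic compares the two parts after each is divided by its degrees of freedom.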


In [1]:
#Q4
import numpy as np
from scipy.stats import f_oneway

# Data for different groups
group1 = [5, 7, 8, 6, 5]
group2 = [9, 11, 12, 10, 9]
group3 = [14, 16, 17, 15, 14]

# Combine data into a single array
data = np.concatenate([group1, group2, group3])

# Calculate one-way ANOVA
f_statistic, p_value = f_oneway(group1, group2, group3)

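# Calculate SST (total), SSE (within-group / error), and SSR (between-group / explained)
grand_mean = np.mean(data)
sst = np.sum((data - grand_mean) ** 2)
# SSE: squared deviations of each observation from its own group mean
sse = sum(np.sum((np.asarray(g) - np.mean(g)) ** 2) for g in (group1, group2, group3))
# SSR: what remains of the total after removing the within-group part
ssr = sst - sse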
print(sst,sse,ssr)

223.73333333333335 20.400000000000002 203.33333333333334
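
As a cross-check, the F-statistic can be rebuilt from the within-group and between-group sums of squares printed above (20.4 and 203.33) and compared with what f_oneway returns. This is a minimal sketch using the same three groups; names such as `ss_within` and `f_manual` are introduced here for illustration.

```python
import numpy as np
from scipy.stats import f, f_oneway

groups = [
    np.array([5, 7, 8, 6, 5]),
    np.array([9, 11, 12, 10, 9]),
    np.array([14, 16, 17, 15, 14]),
]
all_values = np.concatenate(groups)
k = len(groups)        # number of groups
N = all_values.size    # total number of observations

# Within-group (residual) and between-group (explained) sums of squares
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_between = sum(g.size * (g.mean() - all_values.mean()) ** 2 for g in groups)

# Mean squares: each sum of squares divided by its degrees of freedom
ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)

f_manual = ms_between / ms_within
p_manual = f.sf(f_manual, k - 1, N - k)  # upper-tail probability of the F distribution

f_scipy, p_scipy = f_oneway(*groups)
print(f"manual F = {f_manual:.4f}, scipy F = {f_scipy:.4f}")
print(f"manual p = {p_manual:.3e}, scipy p = {p_scipy:.3e}")
```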


In [2]:
#Q5
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Create a DataFrame with your data
import pandas as pd
data = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                     'B': [1, 2, 1, 2, 1, 2],
                     'Y': [5, 7, 6, 8, 9, 10]})

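# Fit a two-way ANOVA model.
# Note: A and B are numeric columns, so this formula treats them as continuous
# predictors (one degree of freedom each). To treat them as categorical factors,
# use C(A) * C(B); that requires more than one observation per cell.
model = ols('Y ~ A * B', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Extract the p-values for the two main effects and the interaction effect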
main_effect_A = anova_table.loc['A', 'PR(>F)']
main_effect_B = anova_table.loc['B', 'PR(>F)']
interaction_effect = anova_table.loc['A:B', 'PR(>F)']

print(main_effect_A)
print(main_effect_B)
print(interaction_effect)

0.03237107707340793
0.08712907082472318
0.5196155385847366



Q6: Interpretation of ANOVA Results:
In a one-way ANOVA with an F-statistic of 5.23 and a p-value of 0.02, you can conclude the following:

- The F-statistic measures the ratio of variance explained by the group differences to the variance within the groups.
- The p-value (0.02) is less than the chosen significance level (e.g., 0.05), indicating that there is evidence to reject the null hypothesis.

Interpretation:
- There are statistically significant differences between at least two of the groups.
- In other words, the grouping factor (in this example, teaching method) has a statistically significant effect on the outcome (student performance).
- Further post-hoc tests or pairwise comparisons can help identify which groups differ significantly.

Keep in mind that a significant ANOVA result indicates group differences but doesn't specify which group(s) are different from the others.

Q7: Handling Missing Data in Repeated Measures ANOVA:
Handling missing data in a repeated measures ANOVA can be challenging but is essential to maintain the integrity of the analysis. Some common methods to handle missing data include:

1. Listwise Deletion: Exclude participants with any missing data. This method is straightforward but reduces the sample size and can bias the results if the data are not missing completely at random.

2. Pairwise Deletion: Include all available data points for each comparison, even if it means different subjects for different comparisons. This method uses all available data but can lead to variability in sample sizes across comparisons.

3. Imputation: Fill in missing values with estimates (e.g., mean, median, regression-based) to complete the dataset. Imputation methods assume that the data are missing at random. However, the choice of imputation method can affect the results.

Consequences of using different methods:
- Listwise deletion can reduce sample size, potentially leading to lower statistical power.
- Pairwise deletion uses all available data but can lead to varying sample sizes and may not account for missing data patterns.
- Imputation can introduce bias if the imputation model is not accurate and does not fully account for missing data patterns.

The choice of method should be based on the nature of the data and the assumptions made about missing data.
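
The difference between listwise deletion and simple mean imputation can be sketched with pandas. The DataFrame below is hypothetical (four made-up subjects measured at three time points, in wide format) and only illustrates the mechanics; it is not a recommendation of mean imputation, which tends to understate variability.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one row per subject, one column per time point
scores = pd.DataFrame(
    {
        "t1": [10.0, 12.0, 9.0, 11.0],
        "t2": [11.0, np.nan, 10.0, 13.0],
        "t3": [13.0, 14.0, np.nan, 15.0],
    },
    index=["s1", "s2", "s3", "s4"],
)

# Listwise deletion: drop every subject that has at least one missing value
listwise = scores.dropna()

# Mean imputation: replace each missing value with the mean of its time point
mean_imputed = scores.fillna(scores.mean())

print(f"Subjects before: {len(scores)}, after listwise deletion: {len(listwise)}")
print(mean_imputed)
```

Here listwise deletion discards half of the subjects, while imputation keeps all of them at the cost of inserting values that were never observed.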

Q8: Common Post-Hoc Tests:
Common post-hoc tests used after ANOVA include:
1. Tukey's HSD (Honestly Significant Difference): Used to compare all pairs of groups to identify which specific pairs differ significantly.
2. Bonferroni Correction: Adjusts the significance level for multiple comparisons to control for the family-wise error rate.
3. Sidak Correction: Similar to Bonferroni, but it is slightly less conservative.
4. Scheffé's Test: A conservative test that covers all possible contrasts among the group means, not only pairwise comparisons, and can be used when sample sizes are unequal.

You would use a post-hoc test when you have conducted an ANOVA and found a significant difference among groups. Post-hoc tests help pinpoint which specific groups are different from each other.
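
To see how the Bonferroni and Sidak corrections compare, the per-comparison significance level can be computed directly. This is a small sketch assuming three groups and a family-wise level of 0.05; both numbers are chosen only for illustration.

```python
from math import comb

alpha = 0.05     # desired family-wise error rate (assumed)
k = 3            # number of groups (assumed)
m = comb(k, 2)   # number of pairwise comparisons among k groups

# Bonferroni: divide the family-wise level evenly across the comparisons
alpha_bonferroni = alpha / m

# Sidak: exact per-comparison level if the comparisons were independent
alpha_sidak = 1 - (1 - alpha) ** (1 / m)

print(f"comparisons: {m}")
print(f"Bonferroni per-comparison alpha: {alpha_bonferroni:.5f}")
print(f"Sidak per-comparison alpha:      {alpha_sidak:.5f}")
```

The Sidak threshold is always a little larger than the Bonferroni one, which is why it is described as slightly less conservative.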


In [3]:
#Q9
from scipy import stats

# Sample data for each diet
diet_A = [4, 5, 6, 4, 3, 5, 6, 7, 3, 5]
diet_B = [5, 6, 7, 4, 6, 5, 7, 8, 6, 5]
diet_C = [6, 7, 8, 7, 9, 8, 7, 6, 8, 7]

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

# Report the results
print(f"F-statistic: {f_statistic}")
print(f"P-value: {p_value}")

# Interpretation
if p_value < 0.05:
    print("There is a significant difference between at least one pair of diets.")
else:
    print("There is no significant difference between the diets.")


F-statistic: 11.581967213114753
P-value: 0.00023341261881643774
There is a significant difference between at least one pair of diets.
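
Since the ANOVA above is significant, a post-hoc test can indicate which diets differ. The sketch below applies Tukey's HSD to the same three samples using scipy.stats.tukey_hsd, which assumes SciPy 1.8 or newer is installed; the `labels` list is introduced here for printing only.

```python
from scipy import stats

# Same samples as in the Q9 cell above
diet_A = [4, 5, 6, 4, 3, 5, 6, 7, 3, 5]
diet_B = [5, 6, 7, 4, 6, 5, 7, 8, 6, 5]
diet_C = [6, 7, 8, 7, 9, 8, 7, 6, 8, 7]

# Tukey's HSD compares every pair of groups while controlling the family-wise error rate
result = stats.tukey_hsd(diet_A, diet_B, diet_C)

labels = ["A", "B", "C"]
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        # result.pvalue is a square matrix indexed by group position
        print(f"Diet {labels[i]} vs Diet {labels[j]}: p = {result.pvalue[i, j]:.4f}")
```

Pairs with a p-value below the chosen significance level are the ones that differ.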


In [None]:
#Q10
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Create a DataFrame with your data
import pandas as pd

# Assuming you have a DataFrame 'data' with columns: 'Time', 'Program', and 'ExperienceLevel'
# C() in the formula marks a variable as categorical
model = ols('Time ~ C(Program) * C(ExperienceLevel)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Report the results
print(anova_table)

# Replace 'data' with a DataFrame holding the actual time data for each employee and program.


In [5]:
#Q11
from scipy import stats

# Sample data for the two groups
control_group = [85, 88, 90, 82, 87, 89, 84, 86, 92, 88]
experimental_group = [90, 92, 94, 85, 88, 91, 93, 89, 95, 92]

# Perform a two-sample t-test
t_statistic, p_value = stats.ttest_ind(control_group, experimental_group)

# Report the results
print(f"T-statistic: {t_statistic}")
print(f"P-value: {p_value}")

# Interpretation
if p_value < 0.05:
    print("There is a significant difference in test scores between the control and experimental groups.")
else:
    print("There is no significant difference in test scores between the groups.")


T-statistic: -2.85178292166329
P-value: 0.010591036481113826
There is a significant difference in test scores between the control and experimental groups.


In [None]:
#Q12
import pandas as pd
import pingouin as pg
from statsmodels.stats.anova import AnovaRM

# Assuming you have a DataFrame with columns 'Store', 'Day', and 'Sales'
# where 'Store' has values 'A', 'B', 'C', and 'Day' has values from 1 to 30
# and 'Sales' contains the daily sales data

# Example data creation
# Placeholder: fill in the 90 observed sales values
# (30 days for store A, then 30 for store B, then 30 for store C)
sales = []
data = {'Store': ['A']*30 + ['B']*30 + ['C']*30,
        'Day': list(range(1, 31))*3,
        'Sales': sales}

df = pd.DataFrame(data)

# Repeated measures ANOVA
aovrm = AnovaRM(df, 'Sales', 'Day', within=['Store'])
res = aovrm.fit()

# Print ANOVA results
print(res)

# Post-hoc test (pairwise_tests replaces the deprecated pairwise_ttests in recent pingouin releases)
posthoc_res = pg.pairwise_tests(data=df, dv='Sales', within='Store', subject='Day', padjust='bonf')

# Print post-hoc results
print(posthoc_res)

# Replace the Sales placeholder above with the actual sales data for each day and store.
# The within parameter in the ANOVA indicates that you are comparing measurements within the 'Store' variable.
