# ANOVA

# Ans1-

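Assumptions for ANOVA:

Independence: Observations should be independent of one another, both within and between groups.

Homogeneity of variances: The groups should have roughly equal variances.

Normality: The residuals (equivalently, the data within each group) should be approximately normally distributed.

Violations:

Independence: Dependent observations (e.g., repeated measurements on the same subject treated as independent) make the F-test invalid.

Homogeneity: Unequal variances distort the F-test and can inflate or deflate the Type I error rate, particularly when group sizes are unequal.

Normality: Departure from normality may affect the reliability of ANOVA, especially with small sample sizes; with large samples the F-test is fairly robust to it.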
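The variance and normality assumptions above can be checked before running the ANOVA. The following is a minimal sketch, assuming three simulated groups (the group names, sizes, and means are hypothetical): Levene's test checks equal variances, and the Shapiro-Wilk test checks normality within each group.

```python
import numpy as np
from scipy import stats

# Hypothetical simulated groups; replace with real data
rng = np.random.default_rng(0)
groups = {
    "A": rng.normal(0, 1, 50),
    "B": rng.normal(2, 1, 50),
    "C": rng.normal(5, 1, 50),
}

# Levene's test: the null hypothesis is that all groups have equal variances
levene_stat, levene_p = stats.levene(*groups.values())
print(f"Levene: statistic={levene_stat:.3f}, p-value={levene_p:.3f}")

# Shapiro-Wilk test per group: the null hypothesis is that the data are normal
shapiro_p = {}
for name, values in groups.items():
    stat, p = stats.shapiro(values)
    shapiro_p[name] = p
    print(f"Shapiro-Wilk group {name}: statistic={stat:.3f}, p-value={p:.3f}")
```

A small p-value in either test suggests the corresponding assumption may be violated; independence cannot be tested this way and has to follow from the study design.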

# Ans2-

Three types of ANOVA:

One-Way ANOVA:

Use: To compare means of three or more independent groups.

Example: Comparing average scores of students in different teaching methods.

Two-Way ANOVA:
Use: Examines the influence of two different categorical independent variables on a dependent variable.

Example: Studying the effects of both gender and treatment on test scores.

Repeated Measures ANOVA:
Use: Analyzes data where the same subjects are used for each treatment or measurement.

Example: Evaluating the performance of individuals under different conditions over time.

# Ans3-

Partitioning of Variance in ANOVA:
In ANOVA, variance is divided into different components to assess the contributions of various sources to the overall variability in the data. These components typically include variance between groups and variance within groups.

Importance:
Understanding the partitioning of variance helps identify:

Between-Group Differences: Determines if there are significant differences among group means.

Within-Group Variability: Assesses the consistency or variability of scores within each group.
Overall Significance: Enables researchers to make informed conclusions about the impact of factors being studied.
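For a one-way ANOVA the partition can be written out explicitly. The notation below is an assumption made for illustration: $k$ groups, $n_i$ observations in group $i$, $N$ observations in total, group means $\bar{y}_i$, and grand mean $\bar{y}$.

```latex
\begin{aligned}
SST &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}\right)^2
      && \text{(total)} \\
SSR &= \sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{y}\right)^2
      && \text{(between groups, explained)} \\
SSE &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2
      && \text{(within groups, residual)} \\
SST &= SSR + SSE, \qquad
F = \frac{SSR/(k-1)}{SSE/(N-k)}
\end{aligned}
```

A large F means the between-group variability is large relative to the within-group variability.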

# Ans4-

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Simulated data
data = {'GroupA': np.random.normal(0, 1, 50),
        'GroupB': np.random.normal(2, 1, 50),
        'GroupC': np.random.normal(5, 1, 50)}

df = pd.DataFrame(data)

# Flatten the data for OLS
stacked_data = df.stack().reset_index()
stacked_data.columns = ['Index', 'Group', 'Value']

# Fit the OLS model
model = ols('Value ~ Group', data=stacked_data).fit()

# Calculate SST, SSE, and SSR
SST = np.sum((stacked_data['Value'] - np.mean(stacked_data['Value']))**2)
SSE = np.sum(model.resid**2)
SSR = SST - SSE

print(f'Total Sum of Squares (SST): {SST}')
print(f'Residual Sum of Squares (SSE): {SSE}')
print(f'Explained Sum of Squares (SSR): {SSR}')


Total Sum of Squares (SST): 759.1576480861104
Residual Sum of Squares (SSE): 122.75890510307667
Explained Sum of Squares (SSR): 636.3987429830337


# Ans5-

With an F-statistic of 5.23 and a p-value of 0.02 in a one-way ANOVA:

Statistical Significance:

The F-statistic is the ratio of between-group variance to within-group variance. A value of 5.23 means the variability between group means is about five times the variability within groups.

Interpretation:

Since the p-value (0.02) is less than the commonly chosen significance level of 0.05, you would reject the null hypothesis that all group means are equal.
This suggests that at least one group mean is significantly different from the others.

Conclusion:

There is evidence of statistically significant differences between the groups. The ANOVA alone does not show which groups differ; that requires a post-hoc test.
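The link between the F-statistic and the p-value can be sketched with the F distribution in SciPy. The degrees of freedom are not given above, so the values used here (3 groups with 15 observations in total) are purely hypothetical; they happen to give a p-value close to 0.02.

```python
from scipy import stats

f_statistic = 5.23
df_between = 2   # hypothetical: k - 1 with k = 3 groups
df_within = 12   # hypothetical: N - k with N = 15 observations
alpha = 0.05

# p-value: probability of an F at least this large if all group means are equal
p_value = stats.f.sf(f_statistic, df_between, df_within)

# Critical value: the F-statistic must exceed this to reject at level alpha
critical_value = stats.f.ppf(1 - alpha, df_between, df_within)

reject_null = p_value < alpha
print(f"p-value: {p_value:.4f}")
print(f"critical F at alpha={alpha}: {critical_value:.3f}")
print(f"reject null hypothesis: {reject_null}")
```

Comparing the p-value with alpha and comparing the F-statistic with the critical value are equivalent decisions.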

# Ans7-

Handling Missing Data in Repeated Measures ANOVA:

Complete Case Analysis (CCA):

Use only cases with complete data for all time points.
Potential consequence: Loss of data may lead to reduced sample size and biased results if missingness is not completely at random.
Maximum Likelihood Estimation (MLE):

Utilize all available data to estimate parameters, for example by fitting a linear mixed-effects model instead of a classical repeated measures ANOVA.
Potential consequence: Assumes data are missing at random (MAR); estimates are unbiased if MAR holds, but can be biased if missingness depends on the unobserved values themselves.
Multiple Imputation:

Generate multiple datasets with imputed values and combine results.
Potential consequence: Requires more computational resources but provides more robust estimates, accounting for uncertainty related to missing data.

Consequences of Different Methods:

Biased Results: Choosing an inappropriate method can introduce bias into the analysis.

Loss of Power: Complete case analysis may lead to reduced statistical power due to smaller sample sizes.
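The sample-size cost of complete case analysis is easy to see on a small example. This is a minimal sketch, assuming a hypothetical wide-format table with one row per subject and one column per time point (all names and values are made up).

```python
import numpy as np
import pandas as pd

# Hypothetical repeated-measures data: rows are subjects, columns are time points
scores = pd.DataFrame(
    {
        "t1": [5.0, 6.0, 7.0, 5.5, 6.5],
        "t2": [5.5, np.nan, 7.5, 6.0, 7.0],
        "t3": [6.0, 6.5, np.nan, 6.5, 7.5],
    },
    index=pd.Index(["s1", "s2", "s3", "s4", "s5"], name="subject"),
)

# Complete case analysis keeps only subjects observed at every time point
complete_cases = scores.dropna()
n_dropped = len(scores) - len(complete_cases)

print(f"Missing values per time point:\n{scores.isna().sum()}")
print(f"Subjects kept: {len(complete_cases)} of {len(scores)} ({n_dropped} dropped)")
```

Here two isolated missing values remove two of five subjects entirely, which is the loss of power described above; likelihood-based methods and multiple imputation would retain the observed values for those subjects.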

# Ans8-

Common Post-Hoc Tests After ANOVA:

Tukey's Honestly Significant Difference (HSD):

Use: To identify which specific groups differ from each other after a significant ANOVA result.
Example: Comparing the mean scores of multiple teaching methods to determine if any pair shows a significant difference.
Bonferroni Correction:

Use: Controls the familywise error rate when conducting multiple pairwise comparisons.
Example: Examining the differences between several drug treatments to avoid Type I errors in multiple comparisons.
Scheffe's Test:

Use: Conservative test for multiple comparisons, suitable when sample sizes are unequal.
Example: Analyzing the performance of different training programs in a workplace with varying participant numbers.
Duncan's Multiple Range Test:

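Use: A stepwise test that ranks group means and identifies which differ significantly. It is more powerful than Tukey's HSD but less strict about the familywise error rate, and in its original form it assumes equal sample sizes.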
Example: Assessing the yields of different fertilizer treatments in agriculture.
When to Use Post-Hoc Tests:

After a significant ANOVA result, post-hoc tests help pinpoint which specific groups differ from each other.
Use when you have more than two groups and want to explore pairwise differences.
Necessary when there are multiple comparisons to control for Type I errors.
Example Situation:
Suppose you conducted a one-way ANOVA comparing the effectiveness of three different exercise programs on weight loss. If the ANOVA yields a significant result, a post-hoc test (e.g., Tukey's HSD) would be necessary to identify which exercise programs lead to significantly different weight loss outcomes. This helps in making more detailed and nuanced interpretations beyond the overall ANOVA result.
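The example situation above can be sketched with Tukey's HSD from statsmodels. The data are simulated, and the program labels, sample sizes, and mean weight losses are hypothetical.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical weight-loss data for three exercise programs, 50 people each
rng = np.random.default_rng(0)
weight_loss = np.concatenate([
    rng.normal(2, 1, 50),  # program A
    rng.normal(3, 1, 50),  # program B
    rng.normal(6, 1, 50),  # program C
])
programs = np.repeat(["A", "B", "C"], 50)

# All pairwise comparisons with the familywise error rate held at 5%
tukey = pairwise_tukeyhsd(endog=weight_loss, groups=programs, alpha=0.05)
print(tukey.summary())
```

The summary lists each pair of programs with the mean difference, an adjusted p-value, a confidence interval, and whether the null hypothesis of equal means is rejected for that pair.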

# Ans9-

import scipy.stats as stats
import numpy as np

# Simulated data (replace with actual data)
np.random.seed(42)  # for reproducibility
data_A = np.random.normal(2, 1, 50)
data_B = np.random.normal(3, 1, 50)
data_C = np.random.normal(4, 1, 50)

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(data_A, data_B, data_C)

# Print results
print(f'F-statistic: {f_statistic}')
print(f'p-value: {p_value}')

# Interpretation
if p_value < 0.05:
    print("There is significant evidence to reject the null hypothesis.")
    print("At least one diet has a different mean weight loss.")
else:
    print("There is not enough evidence to reject the null hypothesis.")
    print("There may not be a significant difference in mean weight loss between the diets.")


F-statistic: 67.61854911979148
p-value: 1.5055246613126342e-21
There is significant evidence to reject the null hypothesis.
At least one diet has a different mean weight loss.
