#Ans1.) 

Assumptions for ANOVA:

Independence: Observations within each group are independent of each other.
Normality: The dependent variable follows a normal distribution within each group.
Homogeneity of variances: The variance of the dependent variable is equal across all groups.
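
Examples of violations:

Independence: Repeated measurements on the same subjects, or data from clustered units such as students in the same classroom, produce correlated observations.
Normality: Strongly skewed data or extreme outliers within a group violate this assumption, although ANOVA is fairly robust to mild departures when group sizes are large.
Homogeneity of variances: Unequal variances across groups, especially combined with unequal group sizes, distort the Type I error rate of the F-test.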
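
The normality and equal-variance assumptions above can be checked before running ANOVA. The sketch below is one possible approach, using the Shapiro-Wilk test and Levene's test from SciPy. The three groups are synthetic and purely illustrative, and the variable names are assumptions rather than part of the original answer.

```python
import numpy as np
from scipy import stats

# Synthetic, illustrative groups (not real measurements)
rng = np.random.default_rng(42)
groups = [rng.normal(loc=mu, scale=1.0, size=30) for mu in (5.0, 5.5, 6.0)]

# Normality: Shapiro-Wilk test within each group (index 1 is the p-value)
shapiro_pvalues = [stats.shapiro(g)[1] for g in groups]

# Homogeneity of variances: Levene's test across all groups
levene_stat, levene_p = stats.levene(*groups)

for i, p in enumerate(shapiro_pvalues, start=1):
    print(f"Group {i} Shapiro-Wilk p-value: {p:.3f}")
print(f"Levene p-value: {levene_p:.3f}")
```

A small p-value in either test is evidence against the corresponding assumption. Independence cannot be tested this way and has to be judged from the study design.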

#Ans2.)

Types of ANOVA:

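One-way ANOVA: Compares the means of two or more independent groups defined by a single categorical factor. With exactly two groups it is equivalent to an independent-samples t-test.

Two-way ANOVA: Examines the effects of two independent categorical variables, and their interaction, on a dependent variable.

N-way ANOVA: Generalizes to more than two independent variables.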

#Ans3.)

Partitioning of variance:

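The total sum of squares (SST) is partitioned into the explained, between-group sum of squares (SSE) and the unexplained, within-group (residual) sum of squares (SSR), so that SST = SSE + SSR. Some textbooks swap the SSE and SSR labels, so check which convention is in use.
Understanding this partition shows how much of the variability in the dependent variable is explained by the independent variables, and the F-statistic is built from the ratio of the explained to the unexplained mean squares.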
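
For a one-way design, the partition can be written out explicitly. The notation below is an assumption introduced for illustration: k groups, n_i observations in group i, group means \bar{y}_i, and overall mean \bar{y}. SSE denotes the explained part and SSR the residual part, following the labels used in this answer.

```latex
\begin{align}
\mathrm{SST} &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}\right)^2 \\
\mathrm{SSE} &= \sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{y}\right)^2 \\
\mathrm{SSR} &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2 \\
\mathrm{SST} &= \mathrm{SSE} + \mathrm{SSR}
\end{align}
```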

In [1]:
#Ans4.)

import numpy as np
import scipy.stats as stats

def one_way_anova(data):
    # Pool all observations so that groups of unequal size are handled correctly
    all_values = np.concatenate(data)
    overall_mean = np.mean(all_values)

    # Total sum of squares (SST): squared deviations from the overall mean
    sst = np.sum((all_values - overall_mean)**2)

    # Explained (between-group) sum of squares (SSE):
    # squared deviation of each group mean from the overall mean, weighted by group size
    sse = sum(len(group) * (np.mean(group) - overall_mean)**2 for group in data)

    # Residual (within-group) sum of squares (SSR)
    ssr = sst - sse

    return sst, sse, ssr

# Example usage
data = [np.array([1, 2, 3]), np.array([2, 3, 4]), np.array([3, 4, 5])]
sst, sse, ssr = one_way_anova(data)
print("SST:", sst)
print("SSE:", sse)
print("SSR:", ssr)


SST: 12.0
SSE: 6.0
SSR: 6.0
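
The sums of squares lead directly to the F-statistic. As a minimal sketch, the code below recomputes the between-group and within-group sums of squares for the same three example groups, divides each by its degrees of freedom, and compares the result with `scipy.stats.f_oneway`. The names `ss_between` and `ss_within` are chosen here for clarity and do not appear in the original code.

```python
import numpy as np
from scipy import stats

groups = [np.array([1, 2, 3]), np.array([2, 3, 4]), np.array([3, 4, 5])]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-group (explained) and within-group (residual) sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Degrees of freedom: k - 1 between groups, N - k within groups
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F is the ratio of the two mean squares
f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(f_manual, df_between, df_within)

# Cross-check against SciPy's built-in one-way ANOVA
f_scipy, p_scipy = stats.f_oneway(*groups)
print("Manual F:", f_manual, "SciPy F:", f_scipy)
print("Manual p:", p_manual, "SciPy p:", p_scipy)
```

For this data both sums of squares equal 6, so F = (6 / 2) / (6 / 6) = 3.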


In [2]:
#Ans5.)

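import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data: synthetic placeholder values, replace with your own DataFrame
rng = np.random.default_rng(0)
data = pd.DataFrame({
    'independent_variable_1': np.repeat(['a', 'b'], 10),
    'independent_variable_2': np.tile(np.repeat(['x', 'y'], 5), 2),
    'dependent_variable': rng.normal(loc=10, scale=2, size=20),
})

# Fit two-way ANOVA model (string columns are treated as categorical factors)
model = ols('dependent_variable ~ independent_variable_1 * independent_variable_2', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Main effects (sums of squares)
main_effect_1 = anova_table.loc['independent_variable_1', 'sum_sq']
main_effect_2 = anova_table.loc['independent_variable_2', 'sum_sq']

# Interaction effect (sum of squares)
interaction_effect = anova_table.loc['independent_variable_1:independent_variable_2', 'sum_sq']

print("Main effect 1 (sum of squares):", main_effect_1)
print("Main effect 2 (sum of squares):", main_effect_2)
print("Interaction effect (sum of squares):", interaction_effect)

# The F-statistics and p-values for each effect are in the full table
print(anova_table)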

#Ans6.)

Interpretation of one-way ANOVA results:

With an F-statistic of 5.23 and a p-value of 0.02, we reject the null hypothesis of equal group means at the 0.05 significance level, since 0.02 < 0.05.
This suggests that at least one group mean differs significantly from the others.
However, the F-test does not show which groups differ, so post-hoc tests are needed to identify the specific pairs.

#Ans7.)

Handling missing data in repeated measures ANOVA:

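Common methods include deleting incomplete cases (listwise deletion), imputation (e.g., mean imputation, regression imputation), or using statistical techniques that can handle missing data directly (e.g., mixed-effects models).
The consequences depend on the method: listwise deletion reduces statistical power and can bias estimates unless data are missing completely at random; mean imputation understates variability and so produces standard errors that are too small; mixed-effects models use all available observations and remain valid under the weaker missing-at-random assumption.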
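
One of these consequences is easy to demonstrate numerically. The sketch below uses a small hypothetical set of scores with two missing values and shows that filling the gaps with the mean lowers the sample variance, because the imputed points add no spread while increasing the sample size. The data and variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical scores with two missing values (np.nan); illustrative only
scores = np.array([4.0, 5.0, np.nan, 7.0, 8.0, np.nan, 6.0])

# Keep only the observed values, then build a mean-imputed copy
observed = scores[~np.isnan(scores)]
imputed = np.where(np.isnan(scores), observed.mean(), scores)

# Sample variance (ddof=1) before and after imputation
var_observed = observed.var(ddof=1)
var_imputed = imputed.var(ddof=1)

print("Variance of observed values:", var_observed)
print("Variance after mean imputation:", var_imputed)
```

The sum of squared deviations is the same in both cases, but it is divided by a larger n - 1 after imputation, so the variance shrinks.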

#Ans8.)

Common post-hoc tests after ANOVA:

Tukey's HSD (Honestly Significant Difference): Compares all possible pairs of means while controlling the family-wise error rate.
Bonferroni correction: Divides the significance level by the number of comparisons to control for multiple comparisons.
Scheffe's method: Applicable to any number of comparisons, including complex contrasts, but tends to be more conservative.
Dunn's test: Non-parametric post-hoc test, typically used after a Kruskal-Wallis test when the data are not normally distributed.

Example situation: After conducting ANOVA on the effects of different teaching methods on exam scores, significant differences are found. A post-hoc test like Tukey's HSD would be necessary to determine which specific teaching methods significantly differ from each other.
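
The teaching-methods situation can be sketched with Tukey's HSD as implemented in statsmodels (`pairwise_tukeyhsd`). The exam scores below are synthetic and the method names (`lecture`, `online`, `hybrid`) are hypothetical labels chosen for illustration.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Synthetic exam scores for three hypothetical teaching methods (illustrative only)
rng = np.random.default_rng(7)
scores = np.concatenate([
    rng.normal(loc=70, scale=5, size=20),
    rng.normal(loc=75, scale=5, size=20),
    rng.normal(loc=82, scale=5, size=20),
])
methods = np.repeat(["lecture", "online", "hybrid"], 20)

# Compare every pair of methods while controlling the family-wise error rate
tukey = pairwise_tukeyhsd(endog=scores, groups=methods, alpha=0.05)
print(tukey.summary())
```

Each row of the summary reports one pair of methods, the difference in means, and whether the null hypothesis of equal means is rejected for that pair.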

In [3]:
#Ans9.)

import scipy.stats as stats

# Sample data (mean weight loss for each diet)
diet_A = [2.5, 3.0, 2.8, 2.9, 3.2, 2.6, 3.1, 2.7, 2.8, 2.9,
          2.7, 3.2, 2.8, 3.0, 2.6, 3.1, 2.9, 2.7, 3.2, 2.8,
          3.0, 2.6, 2.8, 2.7, 2.9, 2.8, 3.1, 2.5, 3.0, 2.9]
diet_B = [3.5, 3.2, 3.6, 3.4, 3.7, 3.3, 3.5, 3.6, 3.4, 3.3,
          3.7, 3.4, 3.6, 3.2, 3.5, 3.3, 3.6, 3.4, 3.5, 3.2,
          3.4, 3.7, 3.5, 3.3, 3.6, 3.4, 3.2, 3.6, 3.5, 3.3]
diet_C = [2.0, 2.1, 1.8, 2.2, 2.0, 2.3, 2.1, 1.9, 2.2, 2.0,
          2.1, 1.8, 2.3, 2.0, 2.2, 2.1, 1.9, 2.3, 2.0, 2.2,
          2.1, 1.8, 2.2, 2.0, 2.3, 2.1, 1.9, 2.2, 2.0, 2.1]

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

print("F-statistic:", f_statistic)
print("p-value:", p_value)


F-statistic: 483.11226611226573
p-value: 7.752208638675557e-48


Interpretation: The p-value (about 7.75e-48) is far below 0.05, so we reject the null hypothesis and conclude that mean weight loss differs significantly among the three diets. The F-test alone does not show which diets differ; a post-hoc test is needed for that.

In [4]:
#Ans10.)

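import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Sample data: synthetic placeholder values, replace with your own DataFrame
rng = np.random.default_rng(0)
data = pd.DataFrame({
    'software_program': np.repeat(['A', 'B', 'C'], 10),
    'experience_level': np.tile(np.repeat(['novice', 'experienced'], 5), 3),
    'completion_time': rng.normal(loc=20, scale=3, size=30),
})

# Fit two-way ANOVA model (string columns are treated as categorical factors)
model = ols('completion_time ~ software_program * experience_level', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)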

Interpretation: Look for significant p-values for main effects and interaction effects. For example, if the p-value for the software_program is significant, it indicates differences in completion time among software programs. Similarly, a significant interaction effect suggests that the effect of software programs on completion time depends on experience level.

In [5]:
#Ans11.)

# Assuming data contains test scores for control and experimental groups

import numpy as np
import scipy.stats as stats

# Synthetic placeholder scores, replace with the real group data
rng = np.random.default_rng(0)
control_scores = rng.normal(loc=70, scale=10, size=50)
experimental_scores = rng.normal(loc=75, scale=10, size=50)

# Two-sample t-test for independent groups
t_statistic, p_value = stats.ttest_ind(control_scores, experimental_scores)

print("t-statistic:", t_statistic)
print("p-value:", p_value)

If the p-value is significant (typically < 0.05), the mean test scores of the two groups differ significantly. With only two groups, the t-test itself identifies which groups differ; a post-hoc test such as Tukey's HSD only becomes necessary when three or more groups are compared.

In [6]:
#Ans12.)

# Assuming data contains daily sales for each store over 30 days

# Perform repeated measures ANOVA
# Followed by post-hoc tests if the results are significant


For repeated measures ANOVA and post-hoc tests, we can use libraries such as pingouin or statsmodels. The ANOVA tests whether mean sales differ among the three stores, and post-hoc pairwise comparisons then identify which stores differ significantly from each other.
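
As a minimal sketch of this analysis, the code below uses `AnovaRM` from statsmodels, treating each day as the repeated unit and the store as the within factor, followed by Bonferroni-corrected paired t-tests. The sales figures are synthetic, and the store labels, means, and column names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

# Synthetic daily sales (illustrative only): 30 days x 3 stores, long format
rng = np.random.default_rng(1)
store_means = {"A": 100.0, "B": 105.0, "C": 110.0}  # hypothetical values
long_df = pd.DataFrame([
    {"day": day, "store": store, "sales": rng.normal(loc=mean, scale=10.0)}
    for day in range(1, 31)
    for store, mean in store_means.items()
])

# Repeated measures ANOVA: every day is observed once for each store
rm_result = AnovaRM(long_df, depvar="sales", subject="day", within=["store"]).fit()
print(rm_result.anova_table)

# Post-hoc: paired t-tests for each pair of stores with a Bonferroni correction
wide = long_df.pivot(index="day", columns="store", values="sales")
pairs = [("A", "B"), ("A", "C"), ("B", "C")]
adjusted_p = {}
for first, second in pairs:
    _, p = stats.ttest_rel(wide[first], wide[second])
    adjusted_p[(first, second)] = min(p * len(pairs), 1.0)
    print(f"{first} vs {second}: Bonferroni-adjusted p = {adjusted_p[(first, second)]:.4f}")
```

The post-hoc comparisons are only meaningful if the ANOVA itself is significant. `AnovaRM` requires a balanced design with exactly one observation per day and store.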