## ANOVA
- ANOVA (Analysis of Variance) is a statistical test used to compare the means of three or more independent groups. It decides whether the differences between the group means are significant by comparing the variation between groups with the variation within groups.
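For reference, here is the standard one-way F statistic that `scipy.stats.f_oneway` reports. It covers $k$ groups, where group $i$ has $n_i$ observations $x_{ij}$ and mean $\bar{x}_i$, with grand mean $\bar{x}$ and $N$ observations in total. The notation is ours, not the notebook's:

$$
SS_B = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2, \qquad
SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2
$$

$$
F = \frac{SS_B / (k-1)}{SS_W / (N-k)}
$$

Under Ho, $F$ follows an F distribution with $k-1$ and $N-k$ degrees of freedom. A large $F$ (small p-value) means the group means are further apart than the within-group scatter alone would explain.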

## Q1 
- A manufacturing company has purchased three new machines, A, B and C, of different makes. It wishes to determine whether any of them is faster than the others at producing a certain item. Five hourly production figures are observed at random from each machine; the results are given below.
- Use ANOVA to test whether the machines differ significantly.
- ![image.png](attachment:42f62b18-67bf-4f79-ab73-70d1659ed2bd.png)
- This is a one-way ANOVA problem: a single factor (machine) with three levels.
- We test whether the mean hourly production of machines A, B and C differs significantly.
- Hypotheses: Ho: mu1 = mu2 = mu3; Ha: at least one mean is different

In [3]:
import scipy.stats as stats
import numpy as np

In [4]:
# Hourly production figures observed for each machine
A = [20, 21, 23, 16, 20]
B = [18, 20, 17, 25, 15]
C = [25, 28, 22, 28, 32]

In [5]:
print(np.mean(A), np.mean(B), np.mean(C))

20.0 19.0 27.0


In [6]:
# One-way ANOVA
f_stat, p_value = stats.f_oneway(A, B, C)

print(f"F-statistic = {f_stat:.3f}")
print(f"p-value = {p_value:.4f}")

F-statistic = 8.143
p-value = 0.0058
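The F statistic above can be cross-checked by computing the between-group and within-group sums of squares directly. This is a minimal sketch using only NumPy and `scipy.stats.f` for the tail probability. The variable names are illustrative, not from the original notebook:

```python
import numpy as np
from scipy import stats

# Hourly production figures from Q1
A = [20, 21, 23, 16, 20]
B = [18, 20, 17, 25, 15]
C = [25, 28, 22, 28, 32]
groups = [np.asarray(g, dtype=float) for g in (A, B, C)]

k = len(groups)                      # number of groups
N = sum(len(g) for g in groups)      # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of each observation around its own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
f_manual = ms_between / ms_within
# Upper-tail probability of the F(k-1, N-k) distribution
p_manual = stats.f.sf(f_manual, k - 1, N - k)

print(f"SSB = {ss_between:.1f}, SSW = {ss_within:.1f}")
print(f"F = {f_manual:.3f}, p = {p_manual:.4f}")
```

Here SSB = 190 and SSW = 140, so F = (190/2) / (140/12) ≈ 8.143 with 2 and 12 degrees of freedom, matching `f_oneway`.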


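- p-value = 0.0058 < 0.05, so we reject Ho at the 5% significance level.
- At least one machine's mean hourly production differs significantly from the others.
- Machine C has the highest sample mean (27 units per hour, against 20 for A and 19 for B), so it is likely the fastest. ANOVA alone does not say which pairs differ. Confirming that C outperforms both A and B requires a post-hoc pairwise test such as Tukey's HSD.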
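The claim that C outproduces A and B is pairwise, and the omnibus F test does not establish it on its own. A minimal sketch of a post-hoc check follows. It assumes SciPy 1.8 or later, where `scipy.stats.tukey_hsd` is available, and runs Tukey's HSD on the same three samples:

```python
from scipy import stats

A = [20, 21, 23, 16, 20]
B = [18, 20, 17, 25, 15]
C = [25, 28, 22, 28, 32]

# Tukey's HSD tests every pair of groups while controlling the family-wise error rate
res = stats.tukey_hsd(A, B, C)

labels = ["A", "B", "C"]
for i in range(3):
    for j in range(i + 1, 3):
        # res.statistic[i, j] is mean(group i) - mean(group j)
        print(f"{labels[i]} - {labels[j]}: diff = {res.statistic[i, j]:+.1f}, "
              f"p = {res.pvalue[i, j]:.4f}")
```

With these data, C versus A and C versus B should come out significant at the 5% level, while A versus B should not. That is the evidence needed to single out C as the fastest machine.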

## Q2: Two-Way ANOVA without Replication
- The following data give the number of units of a product produced by 3 different workers using 3 different types of machine.
- ![image.png](attachment:72b2bf23-8784-4462-8102-4f001cb613d2.png)
- Test
    - Whether the mean productivity is the same for the different machine types
    - Whether the 3 workers differ with respect to mean productivity
- We have:
    - Factor 1: machine type (A, B, C), stored in the `Detergent` column of the DataFrame below
    - Factor 2: worker (X, Y, Z)
    - Each cell contains one observation (number of units produced).
- We want to test:
    1. Whether mean productivity differs across machine types.
    2. Whether mean productivity differs across workers.

- Hypotheses
    - Machine types: Ho: mean productivity is the same for all machine types; Ha: at least one machine type differs
    - Workers: Ho: mean productivity is the same for all workers; Ha: at least one worker differs
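The test uses $r$ workers (rows), $c$ machine types (columns) and one observation $x_{ij}$ per cell. Row means are $\bar{x}_{i\cdot}$, column means are $\bar{x}_{\cdot j}$ and the grand mean is $\bar{x}$. The total variation splits into a row part, a column part and a residual. Because there is no replication, any worker × machine interaction cannot be estimated separately and is absorbed into the residual, which serves as the error term. The notation is ours, not the notebook's:

$$
SS_T = \sum_{i=1}^{r}\sum_{j=1}^{c} (x_{ij} - \bar{x})^2 = SS_R + SS_C + SS_E
$$

$$
SS_R = c \sum_{i=1}^{r} (\bar{x}_{i\cdot} - \bar{x})^2, \qquad
SS_C = r \sum_{j=1}^{c} (\bar{x}_{\cdot j} - \bar{x})^2
$$

$$
F_R = \frac{SS_R/(r-1)}{SS_E/\big((r-1)(c-1)\big)}, \qquad
F_C = \frac{SS_C/(c-1)}{SS_E/\big((r-1)(c-1)\big)}
$$

For this 3 × 3 table, both F ratios are compared with the F(2, 4) distribution.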

In [11]:
import pandas as pd
from statsmodels.stats.anova import anova_lm
import statsmodels.api as sm
from statsmodels.formula.api import ols

In [12]:
# One observation per (worker, machine type) cell; the 'Detergent' column holds the machine type
data = {
    'Worker': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z'],
    'Detergent': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'Units': [8, 32, 20, 28, 36, 38, 6, 28, 14]
}
df = pd.DataFrame(data)
df

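|   | Worker | Detergent | Units |
|---|--------|-----------|-------|
| 0 | X | A | 8 |
| 1 | X | B | 32 |
| 2 | X | C | 20 |
| 3 | Y | A | 28 |
| 4 | Y | B | 36 |
| 5 | Y | C | 38 |
| 6 | Z | A | 6 |
| 7 | Z | B | 28 |
| 8 | Z | C | 14 |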
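The notebook imports `ols` and `anova_lm` but does not show the model fit behind the conclusion that follows. The sketch below is a minimal version of that step and assumes statsmodels with its formula interface is installed. It fits an additive model with no interaction term, since each cell holds a single observation. The string columns `Worker` and `Detergent` are treated as categorical automatically, so the formula avoids patsy's `C()` helper. That matters because the list named `C` from Q1 may shadow `C()` when both questions run in the same kernel.

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Same data as above: one observation per (worker, machine type) cell
df = pd.DataFrame({
    'Worker': ['X', 'X', 'X', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z'],
    'Detergent': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'Units': [8, 32, 20, 28, 36, 38, 6, 28, 14],
})

# Additive two-way model: no Worker:Detergent interaction, because without
# replication the interaction would leave no degrees of freedom for error
model = ols('Units ~ Worker + Detergent', data=df).fit()

# Sequential (type I) sums of squares; identical to type II here because the design is balanced
table = anova_lm(model)
print(table)
```

For these data the table should show sums of squares of 536 (Worker), 488 (Detergent) and 104 (Residual), with 2, 2 and 4 degrees of freedom.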


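Both factors are significant at the 5% level (critical value of F(2, 4) ≈ 6.94):
- Machine type (the `Detergent` column): F ≈ 9.38, p ≈ 0.031, so mean productivity differs across machine types.
- Worker: F ≈ 10.31, p ≈ 0.026, so the workers also differ in mean productivity.

Productivity therefore depends on both the worker and the machine type.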
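As a cross-check that needs only NumPy and SciPy, the two-way decomposition can be computed by hand from the 3 × 3 table. This is a minimal sketch: rows are workers X, Y, Z and columns are machine types A, B, C. The variable names are illustrative, not from the original notebook.

```python
import numpy as np
from scipy import stats

# Rows = workers X, Y, Z; columns = machine types A, B, C (one observation per cell)
x = np.array([[8, 32, 20],
              [28, 36, 38],
              [6, 28, 14]], dtype=float)
r, c = x.shape
grand = x.mean()

ss_total = ((x - grand) ** 2).sum()
ss_rows = c * ((x.mean(axis=1) - grand) ** 2).sum()   # between workers
ss_cols = r * ((x.mean(axis=0) - grand) ** 2).sum()   # between machine types
ss_error = ss_total - ss_rows - ss_cols               # residual (absorbs any interaction)

df_rows, df_cols = r - 1, c - 1
df_error = df_rows * df_cols
ms_error = ss_error / df_error

f_workers = (ss_rows / df_rows) / ms_error
f_machines = (ss_cols / df_cols) / ms_error
# Upper-tail probabilities of the F distribution with the matching degrees of freedom
p_workers = stats.f.sf(f_workers, df_rows, df_error)
p_machines = stats.f.sf(f_machines, df_cols, df_error)

print(f"Workers:  F = {f_workers:.3f}, p = {p_workers:.4f}")
print(f"Machines: F = {f_machines:.3f}, p = {p_machines:.4f}")
```

The residual mean square is 104 / 4 = 26. This gives F = 268 / 26 ≈ 10.31 for workers and F = 244 / 26 ≈ 9.38 for machine types.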

In [None]:
## end of ANOVA