Lab | Inferential statistics - ANOVA

State the null hypothesis
State the alternate hypothesis
What is the significance level?
What are the degrees of freedom of the model, the error terms, and the total DoF?

ANOVA (Analysis of Variance) Test

The purpose of an ANOVA test is to determine whether there are any statistically significant differences between the means of three or more independent groups.

Hypotheses

Null Hypothesis (H0): The means of the etching rates for different power levels are equal.
Alternate Hypothesis (H1): At least one of the means of the etching rates for different power levels is different.

Significance Level

We use the conventional significance level α = 0.05.

Degrees of Freedom

Degrees of Freedom for the Model (DF between or DFmodel): This is equal to the number of groups (power levels) minus 1. In your data, there are 3 unique power levels (160 W, 180 W, 200 W), so

DF model=3−1=2.

Degrees of Freedom for the Error (DF within or DFerror): This is equal to the total number of observations minus the number of groups. In your data, there are 15 observations and 3 groups, so

DF error =15−3=12.

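Total Degrees of Freedom (DF total): This is equal to the total number of observations minus 1, which is also the sum of the model and error degrees of freedom, so

DF total = 15 − 1 = 14.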
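The degrees-of-freedom arithmetic above can be sketched in a few lines of Python. The variable names (`n_groups`, `n_obs`, and so on) are illustrative and not part of the original notebook.

```python
# Degrees of freedom for a one-way ANOVA (illustrative variable names).
n_groups = 3   # power levels: 160 W, 180 W, 200 W
n_obs = 15     # total number of observations

df_model = n_groups - 1      # between-group degrees of freedom
df_error = n_obs - n_groups  # within-group (residual) degrees of freedom
df_total = n_obs - 1         # total degrees of freedom

# The model and error degrees of freedom always add up to the total.
assert df_model + df_error == df_total
print(df_model, df_error, df_total)
```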

In [1]:
import pandas as pd
import numpy as np
from scipy import stats
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
df = pd.read_excel('anova_lab_data.xlsx')

In [3]:
df

Unnamed: 0,Power,Etching Rate
0,160 W,5.43
1,180 W,6.24
2,200 W,8.79
3,160 W,5.71
4,180 W,6.71
5,200 W,9.2
6,160 W,6.22
7,180 W,5.98
8,200 W,7.9
9,160 W,6.01


In [5]:
df.describe()

Unnamed: 0,Etching Rate
count,15.0
mean,6.782667
std,1.228643
min,5.43
25%,5.845
50%,6.24
75%,7.725
max,9.2


In [7]:
# The raw column name has a trailing space ('Power '); it is renamed further down.
df.groupby('Power ').agg('mean')

Power,Etching Rate
160 W,5.792
180 W,6.238
200 W,8.318


The group means indicate that the etching rate rises with power, with the highest mean rate (8.318) observed at 200 W.

Conducting an ANOVA test:

Null Hypothesis (H0): The mean etching rates are equal across the power levels; changing the power has no statistically significant effect.
Alternate Hypothesis (H1): The mean etching rates are not all equal; at least one power level has a mean that differs from the others.

In [8]:
# Remove the space in 'Etching Rate' and the trailing space in 'Power '
# so both columns can be used in a statsmodels formula.
df.rename(columns={'Etching Rate': 'Etching_Rate', 'Power ': 'Power'}, inplace=True)

In [10]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

model = ols('Etching_Rate ~ C(Power)', data=df).fit()
sm.stats.anova_lm(model)

Unnamed: 0,df,sum_sq,mean_sq,F,PR(>F)
C(Power),2.0,18.176653,9.088327,36.878955,8e-06
Residual,12.0,2.95724,0.246437,,



At a significance level of 0.05, the p-value (PR(>F) ≈ 0.000008, with F = 36.88 on 2 and 12 degrees of freedom) falls far below the threshold, so we reject the null hypothesis. There is evidence that at least one power level has a mean etching rate that differs from the others. The ANOVA alone does not say which one; pairwise follow-up comparisons are needed to determine that.
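As a rough check on the table, F and its p-value can be recomputed from the reported sums of squares. This sketch takes the numbers printed by `anova_lm` as given; the variable names are illustrative, and `scipy.stats.f.sf` returns the upper-tail probability of the F distribution.

```python
from scipy import stats

# Values copied from the anova_lm output above.
ss_model, df_model = 18.176653, 2
ss_error, df_error = 2.95724, 12

ms_model = ss_model / df_model   # mean square between groups
ms_error = ss_error / df_error   # mean square within groups (residual)
f_stat = ms_model / ms_error

# Upper-tail probability of F(df_model, df_error) at the observed statistic.
p_value = stats.f.sf(f_stat, df_model, df_error)

alpha = 0.05
print(f"F = {f_stat:.3f}, p = {p_value:.2e}, reject H0: {p_value < alpha}")
```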

In [12]:
df.pivot(columns='Power').describe()

Unnamed: 0_level_0,Etching_Rate,Etching_Rate,Etching_Rate
Power,160 W,180 W,200 W
count,5.0,5.0,5.0
mean,5.792,6.238,8.318
std,0.319875,0.434304,0.669604
min,5.43,5.66,7.55
25%,5.59,5.98,7.9
50%,5.71,6.24,8.15
75%,6.01,6.6,8.79
max,6.22,6.71,9.2
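
Because each group has exactly five observations, the min, 25%, 50%, 75%, and max rows above are the five sorted values of that group. Under that reading, the groups can be rebuilt by hand and the ANOVA cross-checked with `scipy.stats.f_oneway`. This is only a sketch: the lists are transcribed from the summary table rather than loaded from `anova_lab_data.xlsx`, and the variable names are illustrative.

```python
from scipy.stats import f_oneway

# Sorted values per group, transcribed from the describe() rows
# (min, 25%, 50%, 75%, max) rather than read from the spreadsheet.
rate_160 = [5.43, 5.59, 5.71, 6.01, 6.22]
rate_180 = [5.66, 5.98, 6.24, 6.60, 6.71]
rate_200 = [7.55, 7.90, 8.15, 8.79, 9.20]

# One-way ANOVA across the three power levels.
f_stat, p_value = f_oneway(rate_160, rate_180, rate_200)
print(f"F = {f_stat:.3f}, p = {p_value:.2e}")
```

The F statistic from this call should agree with the `anova_lm` table, since both fit the same one-way model.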


In [15]:
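from scipy.stats import ttest_ind

# Reference group: etching rates measured at 200 W.
power_a = df[df['Power'] == "200 W"]['Etching_Rate']

# Two-sample t-test of the 200 W group against each power level.
# The loop also compares 200 W with itself, which trivially gives t = 0 and p = 1.
for power in df['Power'].unique():
    power_b = df[df['Power'] == power]['Etching_Rate']
    print(power, ttest_ind(power_a, power_b))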

160 W Ttest_indResult(statistic=7.611403634613074, pvalue=6.237977344615716e-05)
180 W Ttest_indResult(statistic=5.827496614588661, pvalue=0.0003926796476049085)
200 W Ttest_indResult(statistic=0.0, pvalue=1.0)


In [16]:
power_a

2     8.79
5     9.20
8     7.90
11    8.15
14    7.55
Name: Etching_Rate, dtype: float64

Based on the data, there is strong evidence that the etching rate at 200 W differs significantly from the rate at either 160 W or 180 W: the p-values of both t-tests (about 0.00006 and 0.0004) fall well below the significance threshold of 0.05. The 160 W and 180 W groups were not compared with each other here.
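
Running several t-tests inflates the chance of a false positive, so one common safeguard is a Bonferroni adjustment, which multiplies each p-value by the number of comparisons. The sketch below applies it to the two p-values printed above; the choice of Bonferroni and the variable names are assumptions for illustration, not part of the original analysis.

```python
# p-values copied from the t-test output above (200 W vs. each other group).
p_values = {
    "200 W vs 160 W": 6.237977344615716e-05,
    "200 W vs 180 W": 0.0003926796476049085,
}
alpha = 0.05
n_comparisons = len(p_values)

# Bonferroni: scale each p-value by the number of comparisons, capped at 1.
adjusted = {pair: min(p * n_comparisons, 1.0) for pair, p in p_values.items()}

for pair, p_adj in adjusted.items():
    print(f"{pair}: adjusted p = {p_adj:.6f}, significant: {p_adj < alpha}")
```

Both adjusted p-values remain below 0.05, so the conclusion is unchanged under this correction.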

Moreover, the descriptive statistics reinforce this finding. The mean etching rate for the 200 W group (8.318) is notably higher than that for the 160 W (5.792) and 180 W (6.238) groups. The within-group standard deviations (0.32, 0.43, and 0.67) are small relative to the gap of roughly 2 to 2.5 units between the 200 W mean and the other two means, which suggests that the differences are not merely the result of random variability but reflect genuine differences in etching performance at different power levels.