# ANOVA Analysis for Plasma Beam Etching Rate

## Context and Hypotheses

There are several data sets, each corresponding to a different power level of the plasma beam. The data analyzed below covers three levels: 160 W, 180 W, and 200 W.

- **Null Hypothesis**: Changing the power of the plasma beam has no effect on the mean etching rate; all power levels share the same mean.
- **Alternative Hypothesis**: At least one power level has a mean etching rate that differs from the others.
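
In symbols, the two hypotheses can be sketched as follows. The notation is an assumption introduced here, not taken from the original notebook: \(\mu_k\) denotes the mean etching rate at the \(k\)-th power level.

\[
H_0: \mu_1 = \mu_2 = \mu_3
\qquad \text{versus} \qquad
H_1: \mu_i \neq \mu_j \ \text{for at least one pair } i \neq j
\]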

## Significance Level

The significance level is often set at 5% (0.05). This means you accept a 5% risk of rejecting the null hypothesis when it is actually true (a false positive).

## Degrees of Freedom

### Model Degrees of Freedom

The model degrees of freedom equal the number of groups (plasma beam power levels) minus one. In our example, there are 3 power levels, so the model degrees of freedom are:

\[
3 - 1 = 2
\]

### Error Degrees of Freedom

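The error degrees of freedom equal the total number of observations minus the number of groups. As an illustration, suppose you have 10 measurements for each power level, making a total of 30 observations (the data set analyzed below has 5 measurements per level, which gives 15 - 3 = 12). The error degrees of freedom would then be: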

\[
30 - 3 = 27
\]

### Total Degrees of Freedom

The total number of observations minus one, which is:

\[
30 - 1 = 29
\]

These degrees of freedom are important because they help determine the critical values in statistical tables, which in turn help you decide whether or not to reject the null hypothesis.
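
The three formulas above can be collected into a small helper. This is a minimal sketch: the function name `anova_degrees_of_freedom` is hypothetical and does not come from any library.

```python
def anova_degrees_of_freedom(n_groups, n_total):
    """Return (model, error, total) degrees of freedom for a one-way ANOVA."""
    if n_groups < 2 or n_total <= n_groups:
        raise ValueError("need at least 2 groups and more observations than groups")
    df_model = n_groups - 1        # number of groups minus one
    df_error = n_total - n_groups  # observations minus number of groups
    df_total = n_total - 1         # observations minus one
    return df_model, df_error, df_total


# Illustrative case from the text: 3 power levels, 10 measurements each
print(anova_degrees_of_freedom(3, 30))  # (2, 27, 29)

# Data set used in this notebook: 3 power levels, 5 measurements each
print(anova_degrees_of_freedom(3, 15))  # (2, 12, 14)
```

Note that the model and error degrees of freedom always add up to the total degrees of freedom.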


In [1]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

In [2]:
data = pd.read_excel('anova_lab_data.xlsx', sheet_name='data_collected')
data

Unnamed: 0,Power,Etching Rate
0,160 W,5.43
1,180 W,6.24
2,200 W,8.79
3,160 W,5.71
4,180 W,6.71
5,200 W,9.2
6,160 W,6.22
7,180 W,5.98
8,200 W,7.9
9,160 W,6.01


In [3]:
data.describe()

Unnamed: 0,Etching Rate
count,15.0
mean,6.782667
std,1.228643
min,5.43
25%,5.845
50%,6.24
75%,7.725
max,9.2


In [9]:
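# The column name in the spreadsheet has a trailing space ('Power '); it is renamed in a later cell.
data.groupby('Power ').agg('mean')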

Power,Etching Rate
160 W,5.792
180 W,6.238
200 W,8.318


The group means show that the etching rate increases with power, and the highest mean rate (8.318) is obtained at 200 W.

Testing with ANOVA:
H0 = the group means are equal, i.e. the power level has no statistically detectable effect on the etching rate;
H1 = at least one power level has a mean etching rate different from the others.

In [21]:
data.rename(columns={'Etching Rate': 'Etching_Rate'}, inplace=True)
data.rename(columns={'Power ': 'Power'}, inplace=True)

In [22]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

model = ols('Etching_Rate ~ C(Power)', data=data).fit()
sm.stats.anova_lm(model)

Unnamed: 0,df,sum_sq,mean_sq,F,PR(>F)
C(Power),2.0,18.176653,9.088327,36.878955,8e-06
Residual,12.0,2.95724,0.246437,,


At a significance level of 0.05, the p-value (0.000008) is far below the threshold, so we reject the null hypothesis. We can conclude that at least one power level has a mean etching rate different from the others, although ANOVA alone does not tell us which one; that requires further analysis.
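
To make the ANOVA table less of a black box, the sums of squares and the F statistic can be recomputed by hand from the per-group count, mean, and standard deviation reported in this notebook. The sketch below uses only the standard library. The helper name `anova_from_summary` is hypothetical, and the inputs are the rounded summary values, so the results match the `anova_lm` output only up to rounding.

```python
# Per-group summary statistics as reported in this notebook:
# power level -> (mean, sample standard deviation, number of observations)
summary = {
    "160 W": (5.792, 0.319875, 5),
    "180 W": (6.238, 0.434304, 5),
    "200 W": (8.318, 0.669604, 5),
}


def anova_from_summary(summary):
    """One-way ANOVA sums of squares and F statistic from group summaries."""
    n_total = sum(n for _, _, n in summary.values())
    grand_mean = sum(m * n for m, _, n in summary.values()) / n_total

    # Between-group (model) variation: group means around the grand mean
    ss_model = sum(n * (m - grand_mean) ** 2 for m, _, n in summary.values())
    # Within-group (residual) variation: recovered from the sample variances
    ss_error = sum((n - 1) * sd ** 2 for _, sd, n in summary.values())

    df_model = len(summary) - 1
    df_error = n_total - len(summary)
    f_stat = (ss_model / df_model) / (ss_error / df_error)
    return ss_model, ss_error, f_stat


ss_model, ss_error, f_stat = anova_from_summary(summary)
print(f"SS model = {ss_model:.3f}, SS error = {ss_error:.3f}, F = {f_stat:.2f}")
```

The F statistic is the ratio of the model mean square to the residual mean square, so a large value means the spread between group means is large compared with the spread within groups.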

In [23]:
data.pivot(columns='Power').describe()

Unnamed: 0_level_0,Etching_Rate,Etching_Rate,Etching_Rate
Power,160 W,180 W,200 W
count,5.0,5.0,5.0
mean,5.792,6.238,8.318
std,0.319875,0.434304,0.669604
min,5.43,5.66,7.55
25%,5.59,5.98,7.9
50%,5.71,6.24,8.15
75%,6.01,6.6,8.79
max,6.22,6.71,9.2


In [27]:
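from scipy.stats import ttest_ind

# Reference group: etching rates measured at 200 W
power_a = data[data['Power'] == "200 W"]['Etching_Rate']

# Two-sample t-test of the 200 W group against each power level.
# The 200 W vs 200 W row is a self-comparison and is trivially t = 0, p = 1.
for power in data['Power'].unique():
    power_b = data[data['Power'] == power]['Etching_Rate']
    print(power, ttest_ind(power_a, power_b))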

160 W Ttest_indResult(statistic=7.611403634613074, pvalue=6.237977344615716e-05)
180 W Ttest_indResult(statistic=5.827496614588661, pvalue=0.0003926796476049085)
200 W Ttest_indResult(statistic=0.0, pvalue=1.0)


In [28]:
power_a

2     8.79
5     9.20
8     7.90
11    8.15
14    7.55
Name: Etching_Rate, dtype: float64

The data strongly suggest that the mean etching rate at 200 W differs from that at 160 W and at 180 W: the p-values of both t-tests are well below the 0.05 significance threshold.
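
Running several t-tests on the same data raises the chance of at least one false positive. A Bonferroni adjustment is one simple, conservative guard against this. It is not part of the original analysis and is shown here only as a sketch: each p-value reported above is multiplied by the number of comparisons and then compared with the same threshold.

```python
alpha = 0.05

# p-values copied from the t-test output above (self-comparison excluded)
p_values = {
    "200 W vs 160 W": 6.237977344615716e-05,
    "200 W vs 180 W": 0.0003926796476049085,
}

# Bonferroni: multiply each p-value by the number of comparisons, cap at 1
n_comparisons = len(p_values)
adjusted = {pair: min(1.0, p * n_comparisons) for pair, p in p_values.items()}

for pair, p_adj in adjusted.items():
    verdict = "significant" if p_adj < alpha else "not significant"
    print(f"{pair}: adjusted p = {p_adj:.2e} ({verdict})")
```

Both adjusted p-values remain below 0.05, so the conclusion does not change under this correction.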

The descriptive statistics further support this, showing that the mean etching rate for the 200 W group (8.318) is higher than that for the 160 W (5.792) and 180 W (6.238) groups. The group standard deviations (about 0.32 to 0.67) are small relative to the differences between the means (about 2.1 to 2.5), which is consistent with the differences not being due to random variability alone.
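
The link between the descriptive statistics and the t-tests can be made explicit by recomputing the t statistics from the group means and standard deviations alone. The sketch below assumes the pooled-variance (equal variance) form of the two-sample t-test, which is what `ttest_ind` uses by default. The helper name `pooled_t_statistic` is hypothetical.

```python
from math import sqrt


def pooled_t_statistic(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample t statistic assuming equal variances in both groups."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error


# (mean, standard deviation, count) taken from the describe() output above
summary = {
    "160 W": (5.792, 0.319875, 5),
    "180 W": (6.238, 0.434304, 5),
    "200 W": (8.318, 0.669604, 5),
}

reference = summary["200 W"]
for label in ("160 W", "180 W"):
    t_stat = pooled_t_statistic(*reference, *summary[label])
    print(f"200 W vs {label}: t = {t_stat:.2f}")
```

The results should agree with the `ttest_ind` statistics printed earlier (roughly 7.61 and 5.83): each mean difference is several standard errors wide.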