**<font size="5">Part 1: Steps for setting up ANOVA</font>**

**Null Hypothesis ($H_0$):**
The null hypothesis states that the population mean etching rate is the same at every power level of the plasma beam.

**Alternate Hypothesis ($H_1$):**
The alternate hypothesis states that the mean etching rate differs for at least one power level of the plasma beam.

**Level of Significance ($\alpha$):**
The level of significance is typically set beforehand and represents the probability of making a Type I error (rejecting the null hypothesis when it's true). Common choices for $\alpha$ are 0.05 or 0.01.

**Test Statistic:**
For ANOVA, the test statistic used is the F-statistic. It is the ratio of the variance between group means (mean square between) to the variance within groups (mean square within). A larger F-statistic means the group means are spread further apart than the within-group scatter alone would explain, which is stronger evidence against the null hypothesis.
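As a rough illustration of that ratio, the sketch below computes the F-statistic by hand with NumPy. The etching rates are copied from the table printed in Part 2; the variable names (`ss_between`, `ms_within`, and so on) are chosen here for illustration and are not part of the original notebook.

```python
import numpy as np

# Etching rates copied from the table printed in Part 2, grouped by power level.
groups = {
    "160 W": np.array([5.43, 5.71, 6.22, 6.01, 5.59]),
    "180 W": np.array([6.24, 6.71, 5.98, 5.66, 6.60]),
    "200 W": np.array([8.79, 9.20, 7.90, 8.15, 7.55]),
}

all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()
k = len(groups)            # number of groups (power levels)
n_total = all_values.size  # total number of observations

# Between-group sum of squares: how far each group mean sits from the grand mean.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups.values())
# Within-group sum of squares: scatter of observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

ms_between = ss_between / (k - 1)        # mean square between
ms_within = ss_within / (n_total - k)    # mean square within
f_stat = ms_between / ms_within

print(f"F = {f_stat:.2f}")
```

The result should agree with the value that `f_oneway` reports in Part 2.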

**P-value:**
The p-value is the probability of observing a test statistic at least as extreme as the one calculated, under the assumption that the null hypothesis is true. A p-value smaller than the chosen $\alpha$ indicates that the data provide enough evidence to reject the null hypothesis.
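For a one-way ANOVA the p-value is the upper-tail area of the F-distribution beyond the observed statistic. A minimal sketch, assuming the F-statistic reported in Part 2 and degrees of freedom of 2 and 12 (three power levels, fifteen observations), uses the survival function of `scipy.stats.f`:

```python
from scipy.stats import f

# Assumed inputs: F-statistic as printed in Part 2, DoF from 3 groups and 15 observations.
f_stat = 36.87895470100505
dfn, dfd = 2, 12  # model DoF, error DoF

# Survival function = P(F >= f_stat) when the null hypothesis is true.
p_value = f.sf(f_stat, dfn, dfd)
print("P-value:", p_value)
```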

**F Table:**
An F table is a statistical table that provides critical values of the F-distribution for different degrees of freedom and significance levels. The null hypothesis is rejected when the calculated F-statistic exceeds the critical value for the chosen $\alpha$.
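The same critical value can be obtained in code instead of from a printed table. The sketch below assumes $\alpha = 0.05$ and degrees of freedom of 2 and 12 (three power levels, fifteen observations, as in the Part 2 data); `f_crit` is a name chosen here for illustration.

```python
from scipy.stats import f

alpha = 0.05      # assumed significance level
dfn, dfd = 2, 12  # assumed model and error degrees of freedom

# Critical value: the point that leaves an area of alpha in the upper tail.
f_crit = f.ppf(1 - alpha, dfn, dfd)
print(f"Critical F({dfn}, {dfd}) at alpha={alpha}: {f_crit:.3f}")
```

Any calculated F-statistic above this value would be significant at the chosen level.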

**Degrees of Freedom (DoF):**

- Model DoF: Equal to the number of groups minus 1. In this case, if you have 'n' different power levels, the model DoF would be 'n - 1'.
- Error Terms DoF: Equal to the total number of data points minus the total number of groups. If you have 'N' total data points and 'n' groups, the error terms DoF would be 'N - n'.
- Total DoF: Equal to the total number of data points minus 1, that is 'N - 1'. It is the sum of the model DoF and the error terms DoF.
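The three quantities can be checked with a few lines of arithmetic. This sketch assumes the Part 2 data set, which has 3 power levels and 15 measurements; the variable names are illustrative.

```python
n_groups = 3   # 'n': number of power levels (160 W, 180 W, 200 W)
n_total = 15   # 'N': total number of etching-rate measurements

dof_model = n_groups - 1        # between-group (model) DoF
dof_error = n_total - n_groups  # within-group (error) DoF
dof_total = n_total - 1         # total DoF

# The model and error DoF always add up to the total DoF.
assert dof_model + dof_error == dof_total
print(dof_model, dof_error, dof_total)
```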

**<font size="5">PART 2</font>**

In [1]:
import pandas as pd
from scipy.stats import f_oneway

In [2]:
data = pd.read_excel('anova_lab_data.xlsx')

In [3]:
print(data)

   Power   Etching Rate
0   160 W          5.43
1   180 W          6.24
2   200 W          8.79
3   160 W          5.71
4   180 W          6.71
5   200 W          9.20
6   160 W          6.22
7   180 W          5.98
8   200 W          7.90
9   160 W          6.01
10  180 W          5.66
11  200 W          8.15
12  160 W          5.59
13  180 W          6.60
14  200 W          7.55


In [5]:
print(data.columns)

Index(['Power ', 'Etching Rate'], dtype='object')


In [8]:
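# The 'Power ' column name carries a trailing space (see data.columns above).
groups = data['Power ']
etching_rates = data['Etching Rate']

# One-way ANOVA: pass one series of etching rates per power level.
F_statistic, p_value = f_oneway(*[etching_rates[groups == group] for group in groups.unique()])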

print("F-statistic:", F_statistic)
print("P-value:", p_value)

F-statistic: 36.87895470100505
P-value: 7.506584272358903e-06


The **F-statistic** is about **36.88**: the variance between power levels is roughly 37 times the variance within a power level, far more than we would expect if the mean etching rates were equal.

The **p-value** is about **$7.5 \times 10^{-6}$**, far below the usual significance levels of 0.05 or 0.01. If the mean etching rates were truly equal, an F-statistic this large would be very unlikely to occur by chance.

In simple terms, the etching rate changes with power by more than run-to-run variation can explain.

In [11]:
alpha = 0.05  # chosen level of significance
if p_value < alpha:
    print("Reject the null hypothesis. There is a significant difference in mean etching rates.")
else:
    print("Fail to reject the null hypothesis. No significant difference in mean etching rates was detected.")

Reject the null hypothesis. There is a significant difference in mean etching rates.


At the 5% significance level, we can conclude that changing the power of the plasma beam makes a significant difference in how fast the machine etches.
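The ANOVA result says that at least one power level differs, not which one. As a simple descriptive follow-up, the sketch below computes the mean etching rate per power level. It rebuilds the data inline from the table printed above so that it runs on its own, and it uses a cleaned column name `Power` (without the trailing space), which is an assumption made here for readability.

```python
import pandas as pd

# Data copied from the printed table; 'Power' is a cleaned-up column name (assumption).
df = pd.DataFrame({
    "Power": ["160 W", "180 W", "200 W"] * 5,
    "Etching Rate": [5.43, 6.24, 8.79, 5.71, 6.71, 9.20, 6.22, 5.98,
                     7.90, 6.01, 5.66, 8.15, 5.59, 6.60, 7.55],
})

# Mean etching rate for each power level.
group_means = df.groupby("Power")["Etching Rate"].mean()
print(group_means.round(3))
```

A formal comparison between specific pairs of power levels would need a post-hoc test, which is outside what this notebook covers.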