# One-way ANOVA: Car oil

Researchers took 20 cars of the same model to take part in a study. Each car is randomly assigned one of four engine oils and allowed to run freely for 100 kilometers. At the end of the run, the performance of each car is recorded.
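The study design depends on random assignment. The sketch below shows one way 20 cars could be split into four balanced groups of five. It is an illustration, not the researchers' procedure. The car IDs 1 to 20 and the seed 42 are hypothetical.

```python
import random

oils = ["oil1", "oil2", "oil3", "oil4"]
car_ids = list(range(1, 21))  # hypothetical IDs for the 20 cars

rng = random.Random(42)  # fixed (hypothetical) seed so the assignment is reproducible
rng.shuffle(car_ids)

# Slice the shuffled list into four consecutive groups of five cars, one per oil
assignment = {oil: sorted(car_ids[i * 5:(i + 1) * 5]) for i, oil in enumerate(oils)}

for oil, cars in assignment.items():
    print(oil, cars)
```

Shuffling first and then slicing gives every car the same chance of receiving any oil, and each oil ends up with exactly five cars.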

```python
# Requires scipy, pandas and numpy (for example: pip install scipy pandas numpy)
import pandas as pd
import numpy as np
```

### Performance data: five cars for each of the four oils

```python
performance1 = [89, 89, 88, 78, 79]
performance2 = [93, 92, 94, 89, 88]
performance3 = [89, 88, 89, 93, 90]
performance4 = [81, 78, 81, 92, 82]

# One row per oil, then transpose so that each oil becomes a column
performances_df = pd.DataFrame(
    [performance1, performance2, performance3, performance4],
    index=["oil1", "oil2", "oil3", "oil4"],
)
performances_df = performances_df.transpose()
print(performances_df)
```

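### Calculate the overall mean
Every group contains the same number of measurements (five). Because of that, the overall (grand) mean of all 20 values equals the mean of the four sample means.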
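The shortcut depends on the groups having equal size. The self-contained check below recomputes both means from the same data. It then uses a small made-up pair of unequal groups, `[1, 2, 3]` and `[10]`, to show that the sample means must be weighted by group size once the groups differ in size.

```python
import numpy as np

groups = [
    [89, 89, 88, 78, 79],
    [93, 92, 94, 89, 88],
    [89, 88, 89, 93, 90],
    [81, 78, 81, 92, 82],
]

grand_mean = np.mean(np.concatenate(groups))           # mean of all 20 values
mean_of_means = np.mean([np.mean(g) for g in groups])  # unweighted mean of the 4 sample means
print(grand_mean, mean_of_means)  # both approximately 87.1

# Hypothetical unequal groups: the unweighted mean of means no longer matches
uneven = [[1, 2, 3], [10]]
print(np.mean(np.concatenate(uneven)))        # 4.0
print(np.mean([np.mean(g) for g in uneven]))  # 6.0

# Weighting each sample mean by its group size recovers the grand mean
sizes = [len(g) for g in uneven]
print(np.average([np.mean(g) for g in uneven], weights=sizes))  # 4.0
```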

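```python
# Mean of each oil (column), computed before any helper columns are added
means_of_samples = performances_df.mean()
# Grand mean: valid as the mean of the sample means because all groups have 5 values
total_mean = means_of_samples.mean()
print(means_of_samples)
print(f"total mean: {total_mean}")
```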

### Calculate SST
The SST (total sum of squares) is the sum of the squared differences between each individual value and the grand mean. It is taken over all values, regardless of sample. It is represented mathematically with the formula:
![SST formula](img/sst-formula.png)
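SST can also be computed without helper columns. The sketch below pools all 20 values into one NumPy array. It re-enters the data so that it runs on its own.

```python
import numpy as np

groups = [
    [89, 89, 88, 78, 79],
    [93, 92, 94, 89, 88],
    [89, 88, 89, 93, 90],
    [81, 78, 81, 92, 82],
]

values = np.concatenate(groups)  # all 20 measurements, regardless of oil
grand_mean = values.mean()

# Squared distance of every value from the grand mean, summed
SST = np.sum((values - grand_mean) ** 2)
print(SST)  # approximately 525.8
```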

```python
# Squared deviation of every value from the grand mean
performances_df["SST_1"] = (performances_df["oil1"] - total_mean)**2
performances_df["SST_2"] = (performances_df["oil2"] - total_mean)**2
performances_df["SST_3"] = (performances_df["oil3"] - total_mean)**2
performances_df["SST_4"] = (performances_df["oil4"] - total_mean)**2

# Add up all 20 squared deviations
SST = np.sum(performances_df[["SST_1", "SST_2", "SST_3", "SST_4"]].sum(axis="index"))

print(SST)
```


### Calculate SSW
The SSW (within-groups sum of squares) is the sum, over all values, of the squared differences between each value and the mean of its own sample.
![SSW formula](img/ssw-formula.png)
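Here is a compact, self-contained version of the same calculation. Each group is centered on its own mean before squaring, so differences between the oils do not contribute to SSW.

```python
import numpy as np

groups = [
    [89, 89, 88, 78, 79],
    [93, 92, 94, 89, 88],
    [89, 88, 89, 93, 90],
    [81, 78, 81, 92, 82],
]

# For each group: squared deviations from that group's own mean; then add everything up
SSW = sum(np.sum((np.asarray(g) - np.mean(g)) ** 2) for g in groups)
print(SSW)  # approximately 281.6
```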

```python
# Squared deviation of every value from the mean of its own oil
performances_df["SSW_1"] = (performances_df["oil1"] - means_of_samples["oil1"])**2
performances_df["SSW_2"] = (performances_df["oil2"] - means_of_samples["oil2"])**2
performances_df["SSW_3"] = (performances_df["oil3"] - means_of_samples["oil3"])**2
performances_df["SSW_4"] = (performances_df["oil4"] - means_of_samples["oil4"])**2

SSW = np.sum(performances_df[["SSW_1", "SSW_2", "SSW_3", "SSW_4"]].sum(axis="index"))

print(SSW)
```

### Calculate SSB
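The SSB (between-groups sum of squares) is the sum, over all values, of the squared differences between the mean of that value's sample and the grand mean (the mean of all values regardless of sample). Every value in a sample shares the same sample mean. Each sample therefore contributes its size times the squared difference between its mean and the grand mean.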
![SSB formula](img/ssb-formula.png)
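A self-contained sketch of the group-size-weighted form of SSB follows. Each group adds `len(g)` copies of the same squared distance between its mean and the grand mean.

```python
import numpy as np

groups = [
    [89, 89, 88, 78, 79],
    [93, 92, 94, 89, 88],
    [89, 88, 89, 93, 90],
    [81, 78, 81, 92, 82],
]

grand_mean = np.mean(np.concatenate(groups))

# Each group contributes (group size) * (group mean - grand mean)^2
SSB = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
print(SSB)  # approximately 244.2
```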


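```python
# Squared deviation of each sample mean from the grand mean.
# Assigning a scalar fills the whole column, so each term is counted once per value (n = 5 times).
performances_df["SSB_1"] = (means_of_samples["oil1"] - total_mean)**2
performances_df["SSB_2"] = (means_of_samples["oil2"] - total_mean)**2
performances_df["SSB_3"] = (means_of_samples["oil3"] - total_mean)**2
performances_df["SSB_4"] = (means_of_samples["oil4"] - total_mean)**2

SSB = np.sum(performances_df[["SSB_1", "SSB_2", "SSB_3", "SSB_4"]].sum(axis="index"))

print(SSB)
print(performances_df)
```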
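The three sums of squares are linked. The total variation splits exactly into a between-groups part and a within-groups part, SST = SSB + SSW. For this data that is 525.8 = 244.2 + 281.6. A self-contained sketch that checks the identity:

```python
import numpy as np

groups = [
    [89, 89, 88, 78, 79],
    [93, 92, 94, 89, 88],
    [89, 88, 89, 93, 90],
    [81, 78, 81, 92, 82],
]

values = np.concatenate(groups)
grand_mean = values.mean()

SST = np.sum((values - grand_mean) ** 2)                                   # total
SSW = sum(np.sum((np.asarray(g) - np.mean(g)) ** 2) for g in groups)       # within groups
SSB = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)         # between groups

print(SST, SSB + SSW)
print(np.isclose(SST, SSB + SSW))  # True
```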

### F-statistic

```python
# number of measurements per group
n = 5

# number of groups compared
m = 4

# F = variance between groups / variance within groups
#   = (SSB / (m - 1)) / (SSW / (m * (n - 1)))
# a large F is evidence that the population means are not all equal
F = (SSB / (m - 1)) / (SSW / (m * (n - 1)))

print(F)
```
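The denominators are degrees of freedom. m - 1 is the between-groups degrees of freedom. m(n - 1) is the within-groups degrees of freedom, N - m, in the special case where every group has n values (N is the total number of values, here 20 - 4 = 16). The sketch below is a helper that accepts groups of any size. The name `one_way_f` is hypothetical and not part of any library.

```python
import numpy as np

def one_way_f(*groups):
    """Return the one-way ANOVA F-statistic for any number of groups of any size."""
    arrays = [np.asarray(g, dtype=float) for g in groups]
    values = np.concatenate(arrays)
    N, m = len(values), len(arrays)  # total number of values, number of groups
    grand_mean = values.mean()

    ssb = sum(len(a) * (a.mean() - grand_mean) ** 2 for a in arrays)
    ssw = sum(np.sum((a - a.mean()) ** 2) for a in arrays)

    msb = ssb / (m - 1)  # mean square between, df = m - 1
    msw = ssw / (N - m)  # mean square within, df = N - m
    return msb / msw

F = one_way_f([89, 89, 88, 78, 79], [93, 92, 94, 89, 88],
              [89, 88, 89, 93, 90], [81, 78, 81, 92, 82])
print(F)  # approximately 4.625
```

With equal group sizes, N - m reduces to m(n - 1), so the helper agrees with the cell above.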


### Conclusion

```python
from scipy.stats import f

# Probability of an F at least this large if all four population means were equal
p_value = f.sf(F, m - 1, m * (n - 1))

print(p_value)
```

#### The p-value is less than 0.05, so at a significance level of 0.05 we reject the null hypothesis that all four oils give the same mean performance.
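There is an equivalent way to decide. Comparing the p-value with 0.05 gives the same decision as comparing F with the critical value of the F distribution with (m - 1, m(n - 1)) = (3, 16) degrees of freedom. The sketch below uses `scipy.stats.f.ppf` (inverse CDF) and `f.sf` (survival function, 1 - CDF). It hard-codes F = 4.625, the value this data gives.

```python
from scipy.stats import f

alpha = 0.05
dfn, dfd = 3, 16  # m - 1 and m * (n - 1) for 4 oils with 5 cars each
F = 4.625         # F-statistic for this data

critical = f.ppf(1 - alpha, dfn, dfd)  # cut-off for the upper 5% tail, about 3.24
p_value = f.sf(F, dfn, dfd)            # P(F >= observed F) under the null hypothesis

print(f"critical F = {critical:.3f}, p = {p_value:.4f}")
# Both decision rules agree: F beyond the cut-off <=> p below alpha
print(F > critical, p_value < alpha)
```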

### Tip: SciPy's `f_oneway` runs the whole test in a few lines

```python
from scipy.stats import f_oneway

performance1 = [89, 89, 88, 78, 79]
performance2 = [93, 92, 94, 89, 88]
performance3 = [89, 88, 89, 93, 90]
performance4 = [81, 78, 81, 92, 82]

print(f_oneway(performance1, performance2, performance3, performance4))
```
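`f_oneway` takes one argument per group, so a list of groups can be unpacked with `*`. It returns a result whose `statistic` and `pvalue` attributes hold F and the p-value. The sketch below shows that usage and cross-checks the p-value against the F distribution with (m - 1, N - m) degrees of freedom.

```python
import numpy as np
from scipy.stats import f, f_oneway

groups = [
    [89, 89, 88, 78, 79],
    [93, 92, 94, 89, 88],
    [89, 88, 89, 93, 90],
    [81, 78, 81, 92, 82],
]

result = f_oneway(*groups)  # unpack the list: one argument per group
print(result.statistic, result.pvalue)

# Cross-check: the p-value is the upper tail of F with (m - 1, N - m) degrees of freedom
m = len(groups)
N = sum(len(g) for g in groups)
print(np.isclose(result.pvalue, f.sf(result.statistic, m - 1, N - m)))  # True
```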


Data from: https://www.geeksforgeeks.org/how-to-perform-a-one-way-anova-in-python/