### Part A:
Six different machines are being considered for use in manufacturing rubber seals. The machines are being compared with respect to the tensile strength of the product. A random sample of four seals from each machine is used to determine whether the mean tensile strength varies from machine to machine. The tensile-strength measurements, in kilograms per square centimeter, are in the file Data.xlsx (sheet "Part A").
Perform the analysis of variance at the 0.05 level of significance and indicate whether or not the mean tensile strengths differ significantly for the six machines.
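A one-way ANOVA compares the between-group mean square with the within-group mean square, F = MSB / MSW, on k - 1 and N - k degrees of freedom (k groups, N observations). The following is a minimal sketch of that computation from sums of squares, cross-checked against `scipy.stats.f_oneway`. The helper name `one_way_anova` and the three small data lists are hypothetical and are not taken from Data.xlsx.

```python
from scipy import stats
import numpy as np

def one_way_anova(*groups):
    """Return (F, p) for a one-way ANOVA built from sums of squares."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                              # number of groups (machines)
    n_total = sum(len(g) for g in groups)        # total number of observations
    grand_mean = np.concatenate(groups).mean()

    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)  # MSB / MSW
    p_value = stats.f.sf(f_stat, df_between, df_within)           # upper tail
    return f_stat, p_value

# Hypothetical tensile strengths for three machines (NOT the Data.xlsx values)
machine_a = [20, 18, 17, 19]
machine_b = [16, 18, 17, 15]
machine_c = [19, 17, 18, 20]

f_manual, p_manual = one_way_anova(machine_a, machine_b, machine_c)
f_scipy, p_scipy = stats.f_oneway(machine_a, machine_b, machine_c)
print(f"manual: F = {f_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  F = {f_scipy:.4f}, p = {p_scipy:.4f}")
```

Both lines should agree, which is a quick way to confirm what `f_oneway` is computing.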

In [62]:
import pandas as pd
from scipy.stats import f_oneway
import numpy as np

In [21]:
workbook = pd.ExcelFile('Data.xlsx')
workbook.sheet_names

['Part A', 'Part B']

In [22]:
df = pd.read_excel('Data.xlsx', sheet_name='Part A')

In [26]:
df.head()

   Machine  Measurement
0        1           18
1        1           17
2        1           16
3        1           19
4        2           16


In [24]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24 entries, 0 to 23
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   Machine      24 non-null     int64
 1   Measurement  24 non-null     int64
dtypes: int64(2)
memory usage: 512.0 bytes


In [34]:
print(df['Machine'].unique())

[1 2 3 4 5 6]


In [37]:
# Tensile-strength measurements for each machine
machine1 = df.query("Machine==1")['Measurement']
machine2 = df.query("Machine==2")['Measurement']
machine3 = df.query("Machine==3")['Measurement']
machine4 = df.query("Machine==4")['Measurement']
machine5 = df.query("Machine==5")['Measurement']
machine6 = df.query("Machine==6")['Measurement']
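Six separate `query` calls work, but the groups can also be built in one pass with `groupby`, which keeps working if the number of machines changes. A sketch on a small hypothetical DataFrame that only reuses the column names `Machine` and `Measurement`:

```python
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical long-format data with the same columns as sheet "Part A"
df = pd.DataFrame({
    "Machine": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Measurement": [18, 17, 16, 16, 18, 17, 19, 17, 18],
})

# One array of measurements per machine, ordered by machine number
groups = [g.to_numpy() for _, g in df.groupby("Machine")["Measurement"]]

# Unpack the list so f_oneway receives one argument per machine
f_statistic, p_value = f_oneway(*groups)
print(len(groups), f_statistic, p_value)
```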

In [40]:
# Perform ANOVA at the 0.05 level of significance and indicate whether or not 
# the mean tensile strengths differ significantly for the six machines.

In [42]:
f_statistic, p_value = f_oneway(machine1, machine2, machine3, machine4, machine5, machine6)
print(f"f-statistic: {f_statistic}")
print(f"p-value: {p_value}")

f-statistic: 0.4363636363636363
p-value: 0.8173294233639146


Results:
    
    F-statistic: 0.4364 is below 1, so the variation between the machine means is smaller than the variation within the machines
    (the between-group mean square is smaller than the within-group mean square).
    Since the p-value (0.8173) is greater than 0.05, we fail to reject the null hypothesis that all six machine means are equal.
    Hence, there is not sufficient evidence at the 0.05 level to conclude that the mean tensile strengths of the six machines differ.
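The same decision can be reached with the critical-value approach. With 6 machines and 24 seals, the F statistic has k - 1 = 5 and N - k = 18 degrees of freedom. The sketch below takes the F value printed above and, using `scipy.stats.f`, computes the 5% critical value and the upper-tail p-value; the p-value should match the one reported by `f_oneway`.

```python
from scipy import stats

# Numbers taken from the Part A output above: F statistic, 6 machines, 24 seals
f_statistic = 0.4363636363636363
k, n_total = 6, 24
df_between, df_within = k - 1, n_total - k   # 5 and 18

# Critical value: the F that cuts off the upper 5% of the F(5, 18) distribution
f_critical = stats.f.ppf(0.95, df_between, df_within)

# p-value: probability of an F at least this large if all means are equal
p_value = stats.f.sf(f_statistic, df_between, df_within)

print(f"F critical (alpha = 0.05): {f_critical:.4f}")
print(f"p-value: {p_value:.4f}")
print("Reject H0" if f_statistic > f_critical else "Fail to reject H0")
```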

### Part B:
A study measured the sorption (either absorption or adsorption) rates of three different types of organic chemical solvents. These solvents are used to clean industrial fabricated-metal parts and are potentially hazardous waste. Independent samples from each type of solvent were tested, and their sorption rates were recorded as a mole percentage. Is there a significant difference in the mean sorption rates for the three solvents? Use a P-value for your conclusions. Which solvent would you use?

In [43]:
data = pd.read_excel('Data.xlsx', sheet_name='Part B')

In [45]:
data.head()

   Solvent  Samples
0        1     1.06
1        1     0.79
2        1     0.82
3        1     0.89
4        1     1.05


In [47]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Solvent  32 non-null     int64  
 1   Samples  32 non-null     float64
dtypes: float64(1), int64(1)
memory usage: 640.0 bytes


In [48]:
print(data['Solvent'].unique())

[1 2 3]


In [52]:
solvent1 = data.query("Solvent == 1")['Samples']
solvent2 = data.query("Solvent == 2")['Samples']
solvent3 = data.query("Solvent == 3")['Samples']

In [53]:
solvent3

17    0.29
18    0.06
19    0.44
20    0.55
21    0.61
22    0.43
23    0.51
24    0.10
25    0.53
26    0.34
27    0.06
28    0.09
29    0.17
30    0.17
31    0.60
Name: Samples, dtype: float64

In [55]:
# Perform one-way ANOVA
f_statistic, p_value = f_oneway(solvent1, solvent2, solvent3)

In [56]:
print(f"f_statistic: {f_statistic}")
print(f"p-value: {p_value}")

f_statistic: 24.51150480130298
p-value: 5.855201452781716e-07


Conclusion:

Since the p-value (about 5.86 × 10⁻⁷) is far below 0.05, we reject the null hypothesis that the mean sorption rates of the three solvents are equal. There is therefore strong evidence that the mean sorption rates differ among the three types of organic chemical solvents, that is, at least one solvent's mean differs from the others.
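Assuming all 32 rows of sheet "Part B" enter the test (3 solvents), the F statistic has 2 and 29 degrees of freedom. With 2 numerator degrees of freedom the F tail probability has the closed form P(F > x) = (1 + 2x / df_within) ^ (-df_within / 2), so the reported p-value can be checked by hand as well as with `scipy.stats.f.sf`. A short sketch:

```python
from scipy import stats

# Numbers taken from the Part B output above: F statistic, 3 solvents, 32 samples
f_statistic = 24.51150480130298
k, n_total = 3, 32
df_between, df_within = k - 1, n_total - k   # 2 and 29

# Upper-tail probability of F(2, 29) beyond the observed statistic
p_value = stats.f.sf(f_statistic, df_between, df_within)

# Closed form valid only for 2 numerator degrees of freedom
p_closed_form = (1 + 2 * f_statistic / df_within) ** (-df_within / 2)

# 5% critical value for comparison
f_critical = stats.f.ppf(0.95, df_between, df_within)

print(f"p-value (scipy):       {p_value:.6e}")
print(f"p-value (closed form): {p_closed_form:.6e}")
print(f"F critical (alpha = 0.05): {f_critical:.4f}")
```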



To choose a solvent for cleaning industrial fabricated-metal parts on the basis of sorption rate alone, one would pick the solvent with the highest mean sorption rate, assuming that a higher sorption rate indicates better cleaning efficiency.

In [64]:
print("Mean sorption rates:")
print(f"Solvent1: {np.round(solvent1.mean(), 2)}")
print(f"Solvent2: {np.round(solvent2.mean(), 2)}")
print(f"Solvent3: {np.round(solvent3.mean(), 2)}")

Mean sorption rates:
Solvent1: 0.94
Solvent2: 1.01
Solvent3: 0.33
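As a quick arithmetic check, the Solvent 3 mean can be recomputed from the 15 values printed for `solvent3` earlier: they sum to 4.95, and 4.95 / 15 = 0.33, roughly a third of the other two means.

```python
import numpy as np

# The 15 Solvent 3 sorption rates (mole %) listed in the solvent3 output above
solvent3_values = np.array([0.29, 0.06, 0.44, 0.55, 0.61, 0.43, 0.51, 0.10,
                            0.53, 0.34, 0.06, 0.09, 0.17, 0.17, 0.60])

total = solvent3_values.sum()              # sum of the 15 rates
mean = total / len(solvent3_values)        # arithmetic mean
print(f"n = {len(solvent3_values)}, sum = {total:.2f}, mean = {mean:.2f}")
```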


Since Solvent 2 has the highest mean sorption rate (1.01, compared with 0.94 for Solvent 1 and 0.33 for Solvent 3), I would use Solvent 2.
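The ANOVA shows only that the three means are not all equal. Solvent 1 (0.94) and Solvent 2 (1.01) are close, so preferring Solvent 2 over Solvent 1 would ideally rest on a pairwise post-hoc comparison. One option is Tukey's HSD, available as `scipy.stats.tukey_hsd` in SciPy 1.8 and later. The sketch below runs it on hypothetical data; with the notebook's data one would pass `solvent1`, `solvent2` and `solvent3` instead.

```python
from scipy.stats import tukey_hsd  # requires SciPy >= 1.8

# Hypothetical sorption rates (mole %), NOT the Data.xlsx values
rates = {
    "Solvent 1": [0.95, 0.88, 1.02, 0.91, 0.97, 0.90],
    "Solvent 2": [1.04, 0.98, 1.10, 0.95, 1.01, 0.99],
    "Solvent 3": [0.30, 0.12, 0.45, 0.51, 0.28, 0.36],
}
names = list(rates)
result = tukey_hsd(*rates.values())

# result.pvalue[i, j] is the family-wise adjusted p-value for group i vs group j
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} vs {names[j]}: p = {result.pvalue[i, j]:.4f}")
```

A small p-value for "Solvent 1 vs Solvent 2" would support choosing Solvent 2 over Solvent 1; a large one would mean the data cannot separate those two.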