The production department at Greenside Corporation, a manufacturer of lawn equipment, has devised a new manual assembly method for its lawn tractors. It now wishes to determine whether it is reasonable to conclude that the mean assembly time under the new method is less than under the old method. Accordingly, it has randomly sampled assembly times (in minutes) from 40 tractors built using the old method and 32 tractors built using the new method. The data are saved in the Tractor_Times.csv file. Use the data to test whether the method makes a significant difference to the assembly times.

In [6]:
import pandas as pd
import scipy.stats as stats


df = pd.read_csv("/content/sample_data/Tractor_Times.csv", sep=",")
df.head()

Unnamed: 0,Method,Times
0,Old,32
1,Old,36
2,Old,34
3,Old,33
4,Old,36


Here is a detailed breakdown of each part of the line _, p_value = stats.ttest_ind(old, new, equal_var=False), which appears in the code cell that follows:
stats.ttest_ind(...): This calls the ttest_ind function from the scipy.stats library. The function calculates the t-test for the means of two independent samples of scores. The null hypothesis of this test is that the two independent samples (old and new times) have identical average (expected) values.
(old, new, equal_var=False): These are the arguments passed to the function:
old and new are the two data samples (pandas Series) containing the recorded tractor times for each method.
equal_var=False is an important parameter that tells the function not to assume that the variances of the two populations are equal. When it is set to False, the function uses Welch's t-test, which is more robust when the sample sizes or variances differ.
_, p_value = ...: This uses Python's tuple unpacking feature to assign the values returned by the function to two variables:
The stats.ttest_ind function returns a result object that unpacks like a tuple into two values: the calculated t-statistic and the p-value.
The _ (underscore) is a Python convention for a "throwaway" or dummy variable. By assigning the t-statistic to _, the programmer indicates that they are not interested in using the t-statistic value itself later in the script.
p_value is the variable where the relevant p-value is stored.
In summary: The sole purpose of this line within the script is to execute the statistical comparison and capture only the p-value needed to decide whether the new method's average time is significantly different from the old method's average time.
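To make the role of equal_var=False more concrete, the following is a minimal sketch of Welch's t-test computed by hand and compared against stats.ttest_ind. The data are synthetic stand-ins generated with NumPy, not the values in Tractor_Times.csv, and the names welch_t_test, old_demo, and new_demo are hypothetical helpers introduced only for this illustration.

```python
import numpy as np
import scipy.stats as stats

# Synthetic stand-in samples (assumption: NOT the Tractor_Times.csv data).
rng = np.random.default_rng(0)
old_demo = rng.normal(loc=34.0, scale=3.0, size=40)
new_demo = rng.normal(loc=33.0, scale=2.0, size=32)


def welch_t_test(a, b):
    """Two-sided Welch t-test computed from first principles (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Estimated variance of each sample mean, using the unbiased (ddof=1) variance.
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    # The t-statistic does not pool the two variances.
    t_stat = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    # Two-sided p-value: probability mass in both tails beyond |t|.
    p_val = 2.0 * stats.t.sf(abs(t_stat), dof)
    return t_stat, dof, p_val


t_manual, dof_manual, p_manual = welch_t_test(old_demo, new_demo)
t_scipy, p_scipy = stats.ttest_ind(old_demo, new_demo, equal_var=False)

print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```

The two printed lines should agree up to floating-point error, which suggests that equal_var=False amounts to using unpooled variances together with the Welch-Satterthwaite degrees of freedom.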

In [9]:
old = df.loc[df['Method'] == "Old", "Times"]
new = df.loc[df['Method'] == "New", "Times"]


# Perform an independent-samples t-test to determine whether there is a statistically
# significant difference between the mean 'Old' and 'New' assembly times.
# Only the resulting p-value is extracted and stored for the decision below.
_, p_value = stats.ttest_ind(old, new, equal_var=False)

print("P-value: ", p_value)

if p_value < 0.05:
    print("Reject the null hypothesis: Assembly times are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference.")


P-value:  0.15574050354289481
Fail to reject the null hypothesis: No significant difference.
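The test above is two-sided, while the problem statement also asks whether the new method's mean time is less than the old method's. A possible one-sided variant is sketched below. It assumes SciPy 1.6 or later, where ttest_ind accepts the alternative parameter, and it uses synthetic stand-in data (old_demo and new_demo are hypothetical names), so its output says nothing about the actual Tractor_Times.csv result.

```python
import numpy as np
import scipy.stats as stats

# Synthetic stand-in samples (assumption: NOT the Tractor_Times.csv data).
rng = np.random.default_rng(1)
old_demo = rng.normal(loc=34.0, scale=3.0, size=40)
new_demo = rng.normal(loc=33.0, scale=2.0, size=32)

# Two-sided Welch test, with the new method passed as the first sample.
t_two, p_two = stats.ttest_ind(new_demo, old_demo, equal_var=False)

# One-sided Welch test. alternative="less" tests whether the mean of the FIRST
# sample (new) is less than the mean of the SECOND sample (old).
t_one, p_one = stats.ttest_ind(
    new_demo, old_demo, equal_var=False, alternative="less"
)

print("Two-sided p-value:", p_two)
print("One-sided p-value:", p_one)
```

The order of the arguments matters here: swapping the two samples while keeping alternative="less" would test the opposite direction. When the t-statistic is negative, the one-sided p-value is half of the two-sided one.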
