Understanding the Situation:
A company wants to know if a new machine is faster at packing jars into boxes than the old machine. They recorded 10 packing times for each machine. They want to see if the new machine is really quicker or not.

They have two guesses:

Guess 1 (Null Hypothesis): Maybe the new machine is not really faster, and the time it takes is about the same as the old machine.
Guess 2 (Alternative Hypothesis): But maybe, just maybe, the new machine is actually faster, and it takes less time.
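The two guesses can also be written in symbols. This is only a notational sketch: the symbols below are our own labels (not taken from the data file), where each mu stands for the true average packing time of a machine.

```latex
% \mu_{\text{old}}, \mu_{\text{new}}: true mean packing times (our notation, an assumption)
H_0:\ \mu_{\text{new}} = \mu_{\text{old}}
\qquad\text{versus}\qquad
H_1:\ \mu_{\text{new}} < \mu_{\text{old}}
```

Because Guess 2 only looks in one direction (the new machine takes less time), this is a one-sided test.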

Calculating:

They look at how much time each machine took and find some numbers:

For the old machine, the average time is about 43.2 seconds.
For the new machine, the average time is about 42.1 seconds.

In [5]:
import pandas as pd
from scipy.stats import t

In [17]:
data = pd.read_csv('machine.txt', sep='\t', encoding='utf-16')

In [18]:
print(data.head())

   New machine      Old machine
0         42.1             42.7
1         41.0             43.6
2         41.3             43.8
3         41.8             43.3
4         42.4             42.5


In [21]:
print(data.columns)

Index(['New machine', '    Old machine'], dtype='object')


In [25]:
# Calculate means and standard deviations
mean_old = data['    Old machine'].mean()
mean_new = data['New machine'].mean()
std_old = data['    Old machine'].std()
std_new = data['New machine'].std()

print("Mean of Old Machine:", mean_old)
print("Mean of New Machine:", mean_new)
print("Standard Deviation of Old Machine:", std_old)
print("Standard Deviation of New Machine:", std_new)

Mean of Old Machine: 43.230000000000004
Mean of New Machine: 42.14
Standard Deviation of Old Machine: 0.7498888806572157
Standard Deviation of New Machine: 0.6834552736727638


Time to Pack:

On average, the old machine takes around 43.23 seconds to pack, while the new machine takes about 42.14 seconds.

Consistency:

The times for both machines vary around their averages.
For the old machine, the times are typically about 0.75 seconds away from its average.
For the new machine, the times are typically about 0.68 seconds away from its average.
So, the new machine is about 1.09 seconds faster on average and has slightly more consistent packing times than the old machine.
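The "seconds away from its average" figures are sample standard deviations, which is what pandas' `.std()` returns by default (it divides by n - 1 rather than n). Here is a minimal sketch of that calculation using only the five rows printed by `data.head()`, so the results will not match the values for the full 10 rows. The helper name `sample_std` is ours, for illustration.

```python
import math
import statistics

# Only the five rows shown by data.head(); the full file has 10 rows per machine,
# so these values differ from the 0.75 and 0.68 reported for the full sample.
old_head = [42.7, 43.6, 43.8, 43.3, 42.5]
new_head = [42.1, 41.0, 41.3, 41.8, 42.4]

def sample_std(values):
    """Sample standard deviation: sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1))

for label, values in [("Old machine", old_head), ("New machine", new_head)]:
    # statistics.stdev uses the same n - 1 definition, so the two should agree
    print(label, round(sample_std(values), 3), round(statistics.stdev(values), 3))
```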

In [24]:
# Calculate pooled standard deviation
n1 = n2 = 10  # Both samples have 10 observations
# Pooled variance: each sample variance weighted by its degrees of freedom
pooled_var = ((n1 - 1) * std_old**2 + (n2 - 1) * std_new**2) / (n1 + n2 - 2)
pooled_std = pooled_var**0.5
print("Pooled Standard Deviation:", pooled_std)

Pooled Standard Deviation: 0.7174414416676962


Pooled Standard Deviation:

Think of the pooled standard deviation as one combined measure of how much the packing times of the two machines spread out around their own averages. It blends the two separate spreads (0.75 and 0.68), on the assumption that both machines really have about the same variability.
The value of 0.72 (approximately) sits between the two separate spreads and tells us how far a typical packing time lands from its machine's average.
In simpler words, the pooled standard deviation gives us an idea of how consistent the packing times are for both machines. A smaller value means the times are closer to the average, and a larger value means they are more spread out. In this case, 0.72 seconds is small next to averages of about 42 to 43 seconds, so the packing times for both machines are fairly consistent.
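Written out as a formula, the pooled calculation looks like this. The symbols (s for each machine's standard deviation, n for each sample size) are our own notation; the numbers are the rounded squares of the standard deviations printed earlier.

```latex
s_p = \sqrt{\frac{(n_1 - 1)\, s_{\text{old}}^2 + (n_2 - 1)\, s_{\text{new}}^2}{n_1 + n_2 - 2}}
    = \sqrt{\frac{9 \times 0.5623 + 9 \times 0.4671}{18}}
    \approx 0.717
```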

In [26]:
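# Calculate t-test statistic for (old - new); a positive value means the old machine is slower
# The general form sqrt(1/n1 + 1/n2) also covers samples of different sizes
t_statistic = (mean_old - mean_new) / (pooled_std * (1 / n1 + 1 / n2)**0.5)
print("Calculated t-test statistic:", t_statistic)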

Calculated t-test statistic: 3.3972307061176026


Calculated t-test Statistic:

This number, 3.40 (approximately), is like a special value we calculated to see if the new machine is really faster than the old machine.
It's the gap between the two average packing times (old minus new, about 1.09 seconds) divided by how much that gap could wobble just by chance, given how much the times vary within each group.
Think of it like a score. Because we subtracted new from old, a positive score means the new machine took less time, and a bigger score means it is more noticeably faster. This score helps us figure out if the difference we see in packing times is just due to chance or if it's actually a meaningful difference. A score near 0 would be easy to get by luck alone; a score as high as 3.40 suggests a real difference between the two machines' packing times.
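The same score can be rebuilt from the summary numbers alone. This sketch assumes the equal-variance (pooled) form used in this notebook; the function name `pooled_t_statistic` and its parameter names are ours, and the inputs are copied from the printed means and standard deviations.

```python
import math

def pooled_t_statistic(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Two-sample t statistic for (mean_a - mean_b), assuming equal variances."""
    pooled_var = ((n_a - 1) * std_a**2 + (n_b - 1) * std_b**2) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var) * math.sqrt(1 / n_a + 1 / n_b)
    return (mean_a - mean_b) / standard_error

# Summary statistics copied from the printed output earlier in the notebook
t_stat = pooled_t_statistic(
    mean_a=43.23, std_a=0.7498888806572157, n_a=10,  # old machine
    mean_b=42.14, std_b=0.6834552736727638, n_b=10,  # new machine
)
print(round(t_stat, 2))
```

Swapping the two machines in the call flips the sign of the score, so it helps to keep track of which machine was subtracted from which.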

In [27]:
# Degrees of freedom for t-distribution
degrees_of_freedom = n1 + n2 - 2
print("Degrees of Freedom:", degrees_of_freedom)

Degrees of Freedom: 18


Degrees of Freedom:

Imagine you're solving a puzzle, and the pieces you have are like bits of information.
The "Degrees of Freedom" here, which is 18, tells us how many pieces of information we really have when we're comparing the two machines' packing times: 20 measurements in total, minus the 2 averages we had to work out from them.
In simple words, it tells us which version of the t-distribution to use. With more measurements we can trust a smaller score; with fewer measurements we need a bigger score before deciding that one machine is really better than the other.

In [28]:
# Critical value for a one-sided test at the 0.05 significance level.
# t_statistic was computed as (old - new), so a faster new machine shows up as a
# large positive value and we need the upper-tail cutoff. The t-distribution is
# symmetric, so that cutoff is the negative of the lower 5% quantile.
critical_value = -t.ppf(0.05, df=degrees_of_freedom)
print("Calculated critical value:", critical_value)

Calculated critical value: 1.734063606617536


Calculated Critical Value:

Think of this number, 1.73 (approximately), as a special number that helps us make a decision.
Imagine you're at a crossroads and this number tells you which path to take.

In simpler terms, if the t-test statistic (which we calculated earlier) is bigger than the critical value, it's like saying, "Okay, it's safe to take the new path. The new machine is likely faster." But if it's smaller, it might mean, "Stick to the old path. The machines might not be that different." If the two machines were really the same speed, a score above 1.73 would turn up only about 5% of the time, so it's a way of helping us decide if the difference in packing times between the old and new machines is something we can trust or if it might just be a random thing.

In [29]:
# Compare t-statistic with critical value (upper-tail test, because t is old - new)
if t_statistic > critical_value:
    result = "Yes, the new machine is faster."
else:
    result = "No, there's not enough evidence that the new machine is faster."

print(result)

Yes, the new machine is faster.


Overall Conclusion:

After looking at all the numbers and calculations, we can say that there is good evidence that the new machine is faster than the old machine.

Why?

We calculated things like average packing times and how much the times vary.
We found a special number (the t-test statistic) that helps us decide if the new machine is truly quicker.
Then, we compared it with another special number (the critical value).

The Decision:

In our case, the t-test statistic we got (3.40) is quite a bit bigger than the critical value (1.73).
This means a gap this large would be very unlikely to show up just by chance if the two machines were really the same speed.
It's like saying, "Hey, the new machine seems faster, and the difference is too big to blame on luck."
So, based on what we've figured out, at the 0.05 significance level we set aside Guess 1 and go with Guess 2: the new machine is faster, by about 1.09 seconds on average. Keep in mind this rests on only 10 timings per machine and on the assumption that both machines vary by about the same amount.
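A p-value gives another way to read the same comparison: it is the chance of getting a score at least as large as 3.40 if the two machines were really the same speed. The sketch below computes that upper-tail probability with the standard library only, by numerically integrating the t-distribution's density. The function names `t_pdf` and `upper_tail_p` and the `steps` parameter are ours, for illustration; with SciPy available, `t.sf(t_statistic, df)` should give the same tail probability in one call.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t_stat, df, steps=2000):
    """P(T > t_stat) for t_stat >= 0, using Simpson's rule on [0, t_stat]."""
    if t_stat < 0:
        raise ValueError("this sketch only handles non-negative t_stat")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = t_stat / steps
    total = t_pdf(0.0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    # The density is symmetric about 0, so half of the probability lies above 0
    return 0.5 - area_zero_to_t

# t statistic (old - new) and degrees of freedom from the notebook output
p_value = upper_tail_p(3.3972307061176026, 18)
print("One-sided p-value:", round(p_value, 4))
```

A p-value below 0.05 points the same way as a score beyond the 5% cutoff: both say that a gap this large would be unusual if the machines were equally fast.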