### Case:
In a packing plant, a machine packs cartons with jars. A new machine is expected to pack faster on average than the machine currently in use. To test that hypothesis, the times it takes each machine to pack ten cartons are recorded. We will use the data from machine.txt and assume that the conditions for a t-test are met. Does the data provide sufficient evidence that one machine is better than the other?

### Hypothesis Testing:

- Step 1: Define the hypothesis:

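Let μ(new) and μ(old) be the mean packing times of the new and old machines. Because the data are times, a faster machine has a smaller mean.

Null Hypothesis (H0): 
μ(new) ≥ μ(old) (The new machine is slower than or as fast as the old machine.)

Alternative Hypothesis (H1): 
μ(new) < μ(old) (The new machine packs faster than the old machine.)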

- Step 2: Choosing the significance level:

We will assume a significance level of α=0.05, for a one-sided test.

- Step 3: Selecting the test statistic:

Since we are comparing the means of two independent samples with equal variances, we will use a pooled two-sample t-test.

- Step 4: Setting up the decision Rule:

For a one-sided test at a significance level of 0.05, find the critical t-value with degrees of freedom equal to n(total) − 2, where n(total) is the total number of observations (here 10 + 10 − 2 = 18).

- Step 5: Calculate the test statistics:

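The formula will be:

$$t = \frac{\bar{x}_{\text{new}} - \bar{x}_{\text{old}}}{\sqrt{\dfrac{\text{psv}}{n_{\text{new}}} + \dfrac{\text{psv}}{n_{\text{old}}}}}$$

where psv is the pooled sample variance:

$$\text{psv} = \frac{(n_{\text{new}} - 1)\,s_{\text{new}}^2 + (n_{\text{old}} - 1)\,s_{\text{old}}^2}{n_{\text{new}} + n_{\text{old}} - 2}$$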

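- Step 6: Making the decision:

Compare the calculated t statistic with the critical value (equivalently, the p-value with α). Reject H0 if the statistic falls in the rejection region; otherwise, fail to reject H0.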

Now, we will apply all of that in Python:

In [13]:
import numpy as np
import pandas as pd
from scipy.stats import t

# Read data from the text file with specified encoding
# (forward slash avoids the invalid '\m' escape and works on every OS)
data = pd.read_csv('files_for_lab/machine.txt', header=None, names=['New Machine', 'Old Machine'], delimiter='\t', encoding='latin1')

# Convert the data to numeric values
data['New Machine'] = pd.to_numeric(data['New Machine'], errors='coerce')
data['Old Machine'] = pd.to_numeric(data['Old Machine'], errors='coerce')

# Drop any rows with missing values
data = data.dropna()

# Extract data into NumPy arrays
new_machine_times = data['New Machine'].values
old_machine_times = data['Old Machine'].values

# Calculate sample statistics
n_new = len(new_machine_times)
n_old = len(old_machine_times)
mean_new = np.mean(new_machine_times)
mean_old = np.mean(old_machine_times)
std_new = np.std(new_machine_times, ddof=1)
std_old = np.std(old_machine_times, ddof=1)

# Calculate pooled standard deviation
pooled_std = np.sqrt(((n_new - 1) * std_new**2 + (n_old - 1) * std_old**2) / (n_new + n_old - 2))

# Calculate the test statistic
t_statistic = (mean_new - mean_old) / np.sqrt((pooled_std**2 / n_new) + (pooled_std**2 / n_old))

# Degrees of freedom
df = n_new + n_old - 2

# Calculate critical t-value for a one-sided test
alpha = 0.05
critical_t_value = t.ppf(1 - alpha, df)

# Print results
print("Calculated t-statistic:", t_statistic)
print("Critical t-value:", critical_t_value)

# Compare t-statistic with critical t-value (lower-tailed test:
# shorter times mean faster packing, so reject when t is far below zero)
if t_statistic < -critical_t_value:
    print("Reject the null hypothesis. There is sufficient evidence that the new machine packs faster.")
else:
    print("Fail to reject the null hypothesis. There is not sufficient evidence that the new machine packs faster.")


Calculated t-statistic: nan
Critical t-value: nan
Fail to reject the null hypothesis. There is not sufficient evidence that the new machine packs faster.


(NumPy also emitted RuntimeWarnings from its mean and variance routines.)


Reading machine.txt directly gave me NaN for both values. A NaN critical value means the degrees of freedom were not positive, so both arrays were empty after `dropna()`: every entry was coerced to NaN, probably because the file's header, delimiter, or encoding does not match the `read_csv` arguments. I will create the arrays manually and check if it works that way.
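One way to make the file route more robust is to parse the lines directly and fail loudly when nothing numeric is found, instead of silently computing on empty arrays. The sketch below is only an illustration: `load_machine_times` is a hypothetical helper, and it assumes each data line holds two numbers (with a decimal point) separated by whitespace, commas, or semicolons. The real layout of machine.txt is not shown here, so those assumptions may need adjusting.

```python
import re


def load_machine_times(path, encoding="latin1"):
    """Hypothetical helper: return (new_times, old_times) parsed from a text file.

    Assumes two numeric columns per data line; lines that do not parse
    (for example a header row) are skipped.
    """
    new_times, old_times = [], []
    with open(path, encoding=encoding) as fh:
        for line in fh:
            # Split on runs of whitespace, commas, or semicolons
            tokens = [tok for tok in re.split(r"[\s,;]+", line.strip()) if tok]
            if len(tokens) != 2:
                continue
            try:
                new_val, old_val = float(tokens[0]), float(tokens[1])
            except ValueError:
                continue  # non-numeric line such as a header
            new_times.append(new_val)
            old_times.append(old_val)
    if not new_times:
        # Fail early rather than letting NaN propagate into the test
        raise ValueError(f"No numeric rows found in {path!r}; check delimiter and encoding")
    return new_times, old_times
```

Calling `load_machine_times('files_for_lab/machine.txt')` would then either return two equal-length lists or raise an error that points at the parsing step.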

In [14]:
import numpy as np
from scipy.stats import t

# Data
new_machine_times = np.array([42.1, 41, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7])
old_machine_times = np.array([42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44, 44.1])

# Calculate sample statistics
n_new = len(new_machine_times)
n_old = len(old_machine_times)
mean_new = np.mean(new_machine_times)
mean_old = np.mean(old_machine_times)
std_new = np.std(new_machine_times, ddof=1)
std_old = np.std(old_machine_times, ddof=1)

# Calculate pooled standard deviation
pooled_std = np.sqrt(((n_new - 1) * std_new**2 + (n_old - 1) * std_old**2) / (n_new + n_old - 2))

# Calculate the test statistic
t_statistic = (mean_new - mean_old) / np.sqrt((pooled_std**2 / n_new) + (pooled_std**2 / n_old))

# Degrees of freedom
df = n_new + n_old - 2

# Calculate critical t-value for a one-sided test
alpha = 0.05
critical_t_value = t.ppf(1 - alpha, df)

# Print results
print("Calculated t-statistic:", t_statistic)
print("Critical t-value:", critical_t_value)

# Compare t-statistic with critical t-value (lower-tailed test:
# shorter times mean faster packing, so reject when t is far below zero)
if t_statistic < -critical_t_value:
    print("Reject the null hypothesis. There is sufficient evidence that the new machine packs faster.")
else:
    print("Fail to reject the null hypothesis. There is not sufficient evidence that the new machine packs faster.")


Calculated t-statistic: -3.3972307061176026
Critical t-value: 1.7340636066175354
Reject the null hypothesis. There is sufficient evidence that the new machine packs faster.


The statistic is negative because the new machine's mean time (42.14) is lower than the old machine's (43.23). Since shorter times mean faster packing, the rejection region is the lower tail: t = -3.397 is below -1.734, so we reject the null hypothesis. There is sufficient evidence that the new machine packs faster than the old one.
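The critical-value comparison can also be expressed as a p-value. As a rough sketch, and assuming the lower-tailed alternative (shorter times mean faster packing), the p-value is the area of the t distribution to the left of the observed statistic. The statistic and degrees of freedom are copied from the output above; `p_lower` is just an illustrative name.

```python
from scipy.stats import t

# Values taken from the manual calculation above
t_statistic = -3.3972307061176026
df = 18  # n_new + n_old - 2 = 10 + 10 - 2

# Lower-tail p-value: P(T <= t_statistic) under H0
p_lower = t.cdf(t_statistic, df)

alpha = 0.05
print("Lower-tail p-value:", round(p_lower, 4))
print("Reject H0:", p_lower < alpha)
```

A p-value below α leads to the same decision as finding the statistic below the negative critical value.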

Now I will use `ttest_ind` and p-values:

In [17]:
from scipy.stats import ttest_ind

# Create the data
new_machine = np.array([42.1, 41, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7])
old_machine = np.array([42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44, 44.1])

# Perform the pooled two-sample t-test (lower-tailed: H1 is mean_new < mean_old,
# because shorter times mean faster packing)
t_statistic, p_value = ttest_ind(new_machine, old_machine, alternative='less')

# Print the results
print("Calculated t-statistic:", t_statistic)
print("P-value:", round(p_value, 4))

# Compare p-value with significance level
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis. There is sufficient evidence that the new machine packs faster.")
else:
    print("Fail to reject the null hypothesis. There is not sufficient evidence that the new machine packs faster.")


Calculated t-statistic: -3.3972307061176026
P-value: 0.0016
Reject the null hypothesis. There is sufficient evidence that the new machine packs faster.
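
Step 3 assumes equal variances. A quick way to see how much the conclusion depends on that assumption is to rerun the test with Welch's correction (`equal_var=False`), which does not pool the variances. This is a sensitivity check added for illustration, not part of the original analysis; it uses the lower-tailed alternative because shorter times mean faster packing.

```python
import numpy as np
from scipy.stats import ttest_ind

new_machine = np.array([42.1, 41, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7])
old_machine = np.array([42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44, 44.1])

# Pooled (equal variances) and Welch (unequal variances) versions of the same test
t_pooled, p_pooled = ttest_ind(new_machine, old_machine, equal_var=True, alternative='less')
t_welch, p_welch = ttest_ind(new_machine, old_machine, equal_var=False, alternative='less')

print("Pooled: t =", round(t_pooled, 3), "p =", round(p_pooled, 4))
print("Welch:  t =", round(t_welch, 3), "p =", round(p_welch, 4))
```

With equal sample sizes the two statistics coincide; only the degrees of freedom, and therefore the p-value, can differ.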
