In [62]:
import pandas as pd
import numpy as np
import scipy.stats
from math import sqrt

# Hypothesis testing lab

1. It is assumed that the mean systolic blood pressure is `μ = 120 mm Hg`. In the Honolulu Heart Study, a sample of `n = 100` people had an average systolic blood pressure of 130.1 mm Hg with a standard deviation of 21.21 mm Hg. Is the group significantly different (with respect to systolic blood pressure!) from the regular population?

   - Set up the hypothesis test.
   - Write down all the steps followed for setting up the test.
   - Calculate the test statistic by hand and also code it in Python. It should be 4.76190. What decision can you make based on this calculated value?
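The steps, worked by hand. This uses the sample standard deviation $s$ in place of the population $\sigma$ (the usual large-sample approximation, reasonable for $n = 100$) and a two-tailed test at $\alpha = 0.05$, the same choices as the code cell below.

1. Hypotheses: $H_0: \mu = 120$ mm Hg against $H_1: \mu \neq 120$ mm Hg.
2. Significance level $\alpha = 0.05$, two-tailed, so the critical value is $z_{1-\alpha/2} \approx 1.96$.
3. Test statistic:

$$
z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{130.1 - 120}{21.21/\sqrt{100}} = \frac{10.1}{2.121} \approx 4.76190
$$

4. Decision: since $|z| \approx 4.76190 > 1.96$, we reject $H_0$; the group's mean systolic blood pressure differs significantly from 120 mm Hg.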

In [56]:
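x = 130.1          # sample mean (mm Hg)
mu = 120           # population mean under the null hypothesis (mm Hg)
std_value = 21.21  # sample standard deviation (mm Hg)
N = 100            # sample size
alpha = 0.05       # significance level
# Test statistic: z = |x_bar - mu| / (s / sqrt(n))
z = abs((x - mu) / (std_value/N**0.5))
# Two-tailed critical value z_{1 - alpha/2}
zc = scipy.stats.norm.ppf(1-(alpha/2))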
print(f"The z value is {z:.5f}, whereas the critical z for a two-tailed \
test at a significance level of {int(alpha*100)} % is {zc:.5f}.")
print(f"Reject the null hypothesis: {z > zc}.")

The z value is 4.76190, whereas the critical z for a two-tailed test at a significance level of 5 % is 1.95996.
Reject the null hypothesis: True.
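The same decision can be phrased through a p-value. A minimal sketch, assuming SciPy is available as in the rest of this notebook; the variable names are new to this sketch. `norm.sf` is the upper-tail probability of the standard normal, so doubling it at $|z|$ gives the two-sided p-value.

```python
from math import sqrt
from scipy.stats import norm

x_bar = 130.1   # sample mean (mm Hg)
mu_0 = 120      # mean under H0 (mm Hg)
s = 21.21       # sample standard deviation, standing in for sigma
n = 100         # sample size
alpha = 0.05    # significance level

# Signed z statistic
z = (x_bar - mu_0) / (s / sqrt(n))
# Two-sided p-value: P(|Z| >= |z|) under H0
p_value = 2 * norm.sf(abs(z))

print(f"z = {z:.5f}, two-sided p-value = {p_value:.2e}")
print(f"Reject H0 at alpha = {alpha}: {p_value < alpha}")
```

Comparing the p-value with $\alpha$ and comparing $|z|$ with the critical value always lead to the same decision for a two-tailed test.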


2. Another simple example, this time a two-sample t-test. This time the question is one-sided.
In a packing plant, a machine packs cartons with jars. A new machine is supposed to pack faster, on average, than the machine currently used. To test this hypothesis, the times it takes each machine to pack ten cartons are recorded. The results, in seconds, are given in the file Data_Machine.txt. Assuming the conditions for a t-test are met, does the data provide sufficient evidence that one machine is better than the other?
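Written out, with $\mu_{old}$ and $\mu_{new}$ the mean packing times in seconds, the one-sided hypotheses for the claim that the new machine is faster are as follows (a smaller mean time means faster packing):

$$
H_0: \mu_{new} \geq \mu_{old} \qquad H_1: \mu_{new} < \mu_{old}
$$

The two-sided version instead tests $H_0: \mu_{new} = \mu_{old}$ against $H_1: \mu_{new} \neq \mu_{old}$.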

In [57]:
df = pd.read_csv("Data_Machine.txt")

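We use `scipy.stats.ttest_ind`, SciPy's independent two-sample t-test (it assumes equal variances by default). The cells below run it two-sided, i.e. they test whether the mean packing times of the two machines differ at all.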

In [69]:
alpha = 0.05
test, pvalue = scipy.stats.ttest_ind(df["Old_Machine"], df["New_Machine"], alternative="two-sided")

In [71]:
print(test, pvalue)

3.3972307061176026 0.0032111425007745158


In [76]:
print(f"The p-value for the test is {(pvalue*100):.3f} %. The significance level is {int(alpha*100)} %.")
print(f"The p-value is smaller than alpha: {pvalue<alpha}.")
print(f"Reject the null hypothesis (equal means) in favour of the alternative (different means): {pvalue<alpha}.")

The p-value for the test is 0.321 %. The significance level is 5 %.
The p-value is smaller than alpha: True.
Reject the null hypothesis (equal means) in favour of the alternative (different means): True.
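For the one-sided question in the exercise (is the new machine faster?), `ttest_ind` accepts `alternative="greater"` (SciPy 1.6 or later), which tests whether the mean of the first sample exceeds the mean of the second. A minimal sketch; the function name `new_machine_faster` is hypothetical.

```python
from scipy import stats

def new_machine_faster(old_times, new_times, alpha=0.05):
    """One-sided pooled t-test of H1: mean(old) > mean(new), i.e. the new machine is faster."""
    # alternative="greater": under H1 the first sample (old) has the larger mean time
    result = stats.ttest_ind(old_times, new_times, alternative="greater")
    return result.statistic, result.pvalue, result.pvalue < alpha
```

Usage with this notebook's data would be `new_machine_faster(df["Old_Machine"], df["New_Machine"])`. Since the t statistic above is positive, the one-sided p-value is exactly half the two-sided one (about 0.0016), so the conclusion at the 5 % level does not change.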


Another way of doing it: compute the t statistic by hand and compare it with the critical value of the t distribution.

In [73]:
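# Sample means and sample standard deviations (ddof=1) of the packing times
mu_old = df["Old_Machine"].mean()
mu_new = df["New_Machine"].mean()
std_old = np.std(df["Old_Machine"], ddof=1)
std_new = np.std(df["New_Machine"], ddof=1)
N = len(df)          # observations per machine (both columns have the same length)
degrees = 2 * N - 2  # degrees of freedom of the pooled two-sample t-test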

In [74]:
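# Two-sided critical value at significance level alpha
tc = scipy.stats.t.ppf(1-(alpha/2), df=degrees)
# Pooled-variance t statistic; with equal sample sizes the standard error
# reduces to sqrt((s_old^2 + s_new^2) / N)
t = abs(mu_old - mu_new)/sqrt((std_old**2 + std_new**2)/N)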
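For reference, the pooled-variance statistic computed in this cell, with equal sample sizes $n_{old} = n_{new} = n$:

$$
s_p^2 = \frac{(n-1)s_{old}^2 + (n-1)s_{new}^2}{2n-2} = \frac{s_{old}^2 + s_{new}^2}{2},
\qquad
t = \frac{\bar{x}_{old} - \bar{x}_{new}}{\sqrt{s_p^2\left(\frac{1}{n} + \frac{1}{n}\right)}} = \frac{\bar{x}_{old} - \bar{x}_{new}}{\sqrt{(s_{old}^2 + s_{new}^2)/n}}
$$

with $2n - 2 = 18$ degrees of freedom for $n = 10$.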

In [75]:
print(f"Since t = {t:.4f} is greater than tc = {tc:.4f}, we reject the null hypothesis at the {int(alpha*100)} % significance level: the machines have different mean packing times.")

Since t = 3.3972 is greater than tc = 2.1009, we reject the null hypothesis at the 5 % significance level: the machines have different mean packing times.
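The by-hand approach can also be adapted to the one-sided question. A minimal sketch, assuming equal variances as above; the function name `pooled_t_one_sided` is hypothetical. Two things change: the sign of $t$ matters (no `abs`), and all of $\alpha$ sits in the upper tail, so the critical value uses `1 - alpha` rather than `1 - alpha/2`.

```python
import numpy as np
from scipy import stats

def pooled_t_one_sided(a, b, alpha=0.05):
    """By-hand pooled two-sample t-test of H1: mean(a) > mean(b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    dof = n1 + n2 - 2
    # Pooled variance from the two sample variances (ddof=1)
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / dof
    # Signed t statistic: positive when mean(a) > mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
    # One-sided critical value: the whole of alpha in the upper tail
    tc_one = stats.t.ppf(1 - alpha, df=dof)
    return t, tc_one, t > tc_one
```

Called as `pooled_t_one_sided(df["Old_Machine"], df["New_Machine"])`, it should reproduce t ≈ 3.3972, now compared with a one-sided critical value of about 1.734 for 18 degrees of freedom.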
