In [62]:
import numpy as np
from scipy.stats import t
import pandas as pd

# Lab | Hypothesis Testing


## Instructions

 ### It is assumed that the mean systolic blood pressure is `μ = 120 mm Hg`. In the Honolulu Heart Study, a sample of `n = 100` people had an average systolic blood pressure of 130.1 mm Hg with a standard deviation of 21.21 mm Hg. Is the group significantly different (with respect to systolic blood pressure!) from the regular population?

   - Set up the hypothesis test.
   - Write down all the steps followed for setting up the test.
   - Calculate the test statistic by hand and also code it in Python. It should be 4.76190. What decision can you make based on this calculated value? 



1. First, we state the null (H0) and alternative (Ha) hypotheses:

    - H0: the mean systolic blood pressure of the selected group is 120 mm Hg ($\mu = 120$)
    - Ha: the mean systolic blood pressure of the selected group is not 120 mm Hg ($\mu \neq 120$)

2. We choose a significance level of α = 0.05, i.e. the commonly used 95% confidence level.

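3. We compute the test statistic for our sample, the t statistic. We use the t distribution rather than the normal distribution because the population standard deviation is unknown and has to be estimated from the sample. A sample size above 30 does not change that, although with n = 100 the two distributions are already very close: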

$$t = \frac{\bar{x}-\mu}{s/\sqrt{n}}$$

- $\bar{x}$ = sample mean = 130.1 mm Hg
- $\mu$ = population mean = 120 mm Hg
- $s$ = corrected sample standard deviation = 21.21 mm Hg
- $n$ = sample size = 100
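Worked by hand with the values above (with df = n - 1 = 99 degrees of freedom), the statistic matches the expected 4.76190:

$$t = \frac{130.1 - 120}{21.21/\sqrt{100}} = \frac{10.1}{2.121} = \frac{100}{21} \approx 4.76190$$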


In [63]:
sample_mean = 130.1
population_mean = 120
sample_std = 21.21
n = 100

In [64]:
t_score = (sample_mean - population_mean)/(sample_std/np.sqrt(n))
t_score

4.761904761904759

4. We compare our t score with the critical t value for a two-tailed test at a 95% confidence level.

5. From a t table, the critical value is approximately 1.984 (reading the row for 100 degrees of freedom instead of 99). We can also compute it with scipy's percent point function `t.ppf`, the inverse of the cumulative distribution function:

In [65]:
tc = t.ppf(0.975, n-1)
tc

1.9842169515086827
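A small sketch, assuming only `scipy.stats.t` and `scipy.stats.norm`, showing why the table value (df = 100) and the exact value (df = 99) barely differ, how close both are to the normal critical value, and that `ppf` really is the inverse of `cdf`:

```python
from scipy.stats import t, norm

alpha = 0.05
tc_99 = t.ppf(1 - alpha / 2, 99)    # exact degrees of freedom, n - 1
tc_100 = t.ppf(1 - alpha / 2, 100)  # row typically read from a printed t table
z_c = norm.ppf(1 - alpha / 2)       # normal-approximation critical value

# ppf inverts the CDF: feeding the critical value back in returns 0.975
assert abs(t.cdf(tc_99, 99) - (1 - alpha / 2)) < 1e-9

print(f"df=99: {tc_99:.4f}, df=100: {tc_100:.4f}, z: {z_c:.4f}")
```

The three values agree to about two decimal places, which is why reading the df = 100 row is an acceptable approximation here.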

6. Our t score (4.76) is well above the critical value (1.98), so it falls in the rejection region and we reject H0.
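The same decision can be reached through a p-value instead of a critical value. This is a minimal sketch reusing the summary statistics of the study; it assumes a two-sided test and uses `t.sf` (the survival function, 1 - CDF):

```python
import numpy as np
from scipy.stats import t

sample_mean, population_mean, sample_std, n = 130.1, 120, 21.21, 100
t_score = (sample_mean - population_mean) / (sample_std / np.sqrt(n))

# Two-sided p-value: probability of observing |t| at least this large if H0 holds
p_value = 2 * t.sf(abs(t_score), df=n - 1)

print(f"t = {t_score:.5f}, p = {p_value:.2e}")
```

A p-value below α = 0.05 leads to the same rejection of H0 as the comparison with the critical value.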



7. Conclusion: at the 5% significance level, the mean systolic blood pressure of the selected group is significantly different from that of the general population.


### In a packing plant, a machine packs cartons with jars. It is supposed that a new machine will pack faster on the average than the machine currently used. To test that hypothesis, the times it takes each machine to pack ten cartons are recorded. The results, in seconds, are shown in the tables in the file `Data/machine.txt`. Assuming the conditions for a t test are met, does the data provide sufficient evidence that one machine is better than the other?

- H0: The new machine is not faster than the old one, $\mu_{new} \geq \mu_{old}$
- Ha: The new machine is faster than the old one, $\mu_{new} < \mu_{old}$

Here $\mu$ is the mean time (in seconds) to pack ten cartons, so a faster machine has a *smaller* mean time.


In [66]:
df = pd.read_csv("Data/Data_Machine.txt", sep="\t")
df = df.rename(columns = {'  Old Machine': 'Old Machine'})

In [67]:
n = 10
new_mean = round(np.mean(df["New Machine"]),2)
old_mean = round(np.mean(df["Old Machine"]),2)
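# np.std defaults to ddof=0 (population formula); ddof=1 would give the
# corrected sample standard deviation usually used in a t test
new_std = round(np.std(df["New Machine"]),2)
old_std = round(np.std(df["Old Machine"]),2)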

In [68]:
print(f'The parameters of the new machine sample are: mean = {new_mean} and std = {new_std}')
print(f'The parameters of the old machine sample are: mean = {old_mean} and std = {old_std}')

The parameters of the new machine sample are: mean = 42.14 and std = 0.65
The parameters of the old machine sample are: mean = 43.23 and std = 0.71


In [69]:
t_value = round((new_mean - old_mean)/np.sqrt((new_std**2/n)+(old_std**2/n)),2)
t_value

-3.58

In [70]:
print(f'The t value of the two samples is t = {t_value}')

The t value of the two samples is t = -3.58
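As a cross-check, scipy can compute the same Welch (unequal-variance) statistic directly from summary statistics with `scipy.stats.ttest_ind_from_stats` (the `alternative` argument needs scipy 1.6 or later). This sketch plugs in the rounded values printed above; note that the function documents `std1`/`std2` as ddof=1 standard deviations, whereas these were computed with ddof=0, so it only reproduces the notebook's calculation rather than the textbook one:

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics printed above (means and np.std values, n = 10 each)
res = ttest_ind_from_stats(
    mean1=42.14, std1=0.65, nobs1=10,   # new machine
    mean2=43.23, std2=0.71, nobs2=10,   # old machine
    equal_var=False,                    # Welch's t test, same formula as t_value
    alternative="less",                 # Ha: new machine has a smaller mean time
)
print(f"t = {res.statistic:.2f}, one-sided p = {res.pvalue:.4f}")
```

scipy also uses the Welch approximation for the degrees of freedom (here between 9 and 18), so its p-value differs slightly from one based on df = n - 1.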


In [71]:
tc = round(t.ppf(0.95,n-1),2)
tc

1.83

In [72]:
print(f'The one-tailed critical t value at a 95% confidence level is {tc}')

The one-tailed critical t value at a 95% confidence level is 1.83


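Because "better" means packing faster, i.e. a *smaller* mean time, the relevant alternative is $\mu_{new} < \mu_{old}$ and the test is left-tailed. By the symmetry of the t distribution, the rejection region is t < -1.83. Our t value of -3.58 falls inside this region, so we reject the null hypothesis.

Conclusion: at the 5% significance level, the data provide sufficient evidence that the new machine packs faster, on average, than the old one.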
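The left-tailed decision rule can be written out explicitly. This is a sketch assuming the statistic t = -3.58 and the df = n - 1 = 9 used in the notebook; that df is smaller than both the pooled value (18) and the Welch value, which makes the test conservative:

```python
from scipy.stats import t

t_value = -3.58  # two-sample statistic computed in the notebook
df = 10 - 1      # degrees of freedom as used above (conservative choice)
alpha = 0.05

# Left-tailed test: reject H0 (mu_new >= mu_old) when t is below the lower critical value
tc_lower = t.ppf(alpha, df)        # equals -t.ppf(1 - alpha, df) by symmetry
p_one_sided = t.cdf(t_value, df)   # P(T <= t_value) if H0 holds at its boundary

reject_h0 = t_value < tc_lower
print(f"critical = {tc_lower:.2f}, p = {p_one_sided:.4f}, reject H0: {reject_h0}")
```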