Instructions

It is assumed that the mean systolic blood pressure is μ = 120 mm Hg. In the Honolulu Heart Study, a sample of n = 100 people had an average systolic blood pressure of 130.1 mm Hg with a standard deviation of 21.21 mm Hg. Is the group significantly different (with respect to systolic blood pressure!) from the regular population?

- Set up the hypothesis test.
- Write down all the steps followed for setting up the test.
- Calculate the test statistic by hand and also code it in Python. It should be 4.76190. What decision can you make based on this calculated value?

Hint: look up the critical value in a Student's t table.

In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [2]:
# H0: μ = 120
# H1: μ != 120 

In [3]:
samp_size = 100
pop_mean = 120
sam_mean = 130.1
sam_std = 21.21

In [4]:
# Standard error of the mean: sample standard deviation / sqrt(n)
sem = sam_std/np.sqrt(samp_size)
sem

2.121

In [5]:
t = (sam_mean-pop_mean)/sem
print("The t score of our sample is: {:.2f}".format(t))

The t score of our sample is: 4.76


Let's fix our confidence level at 95%, which is the same as setting alpha = 1 - 0.95 = 0.05. For a two-sided test, that leaves 0.025 in each tail.
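As an alternative to reading the critical value from a printed table, it can be computed with `scipy.stats.t.ppf`. The snippet below is a minimal sketch; the names `alpha`, `df` and `t_crit` are illustrative and are not used elsewhere in the notebook.

```python
from scipy import stats

alpha = 0.05      # significance level chosen in the text
df = 100 - 1      # degrees of freedom for a one-sample t-test: n - 1

# Two-sided test: alpha/2 in each tail, so take the 1 - alpha/2 quantile
t_crit = stats.t.ppf(1 - alpha / 2, df)
print("Critical t value: {:.3f}".format(t_crit))
```

Rounded to three decimals, the result should agree with the table value of 1.984 for 99 degrees of freedom.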

In [6]:
# Critical t value for 99 degrees of freedom, two-sided test (0.025 in each tail): 1.984
t_value = 1.984
# Two-sided test: compare the absolute value of t with the critical value
if abs(t) > t_value:
    print('H0 is rejected; the group is significantly different from the population')
else:
    print('H0 is not rejected; the group is not significantly different from the population')

H0 is rejected; the group is significantly different from the population
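The same decision can also be reached through a p-value instead of a critical value. The sketch below recomputes the test statistic from the summary statistics given in the exercise and gets the two-sided p-value from `scipy.stats.t.sf`. The variable names are illustrative and chosen so that the block runs on its own.

```python
import numpy as np
from scipy import stats

# Summary statistics from the exercise statement
n = 100
pop_mean = 120
sample_mean = 130.1
sample_std = 21.21

# One-sample t statistic: (sample mean - hypothesised mean) / standard error
t_stat = (sample_mean - pop_mean) / (sample_std / np.sqrt(n))

# Two-sided p-value: probability of a |t| at least this large under H0
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

print("t = {:.5f}, p-value = {:.2e}".format(t_stat, p_value))
if p_value < 0.05:
    print("H0 is rejected at the 5% level")
else:
    print("H0 is not rejected at the 5% level")
```

A p-value below alpha = 0.05 leads to the same conclusion as comparing |t| with the critical value.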


Optional

In a packing plant, a machine packs cartons with jars. It is supposed that a new machine will pack faster on average than the machine currently used. To test that hypothesis, the times it takes each machine to pack ten cartons are recorded. The results, in seconds, are shown in the tables in the file Data/machine.txt. Assuming the conditions for a t-test are met, do the data provide sufficient evidence that one machine is better than the other?
Hint: use two sample t-test

t= (sample_mean(x1) - sample_mean(x2)) / sqrt(square(s1)/n1 + square(s2)/n2 )

In [7]:
machines_df = pd.read_csv('Data_machine.txt', sep='\t')
machines_df

Unnamed: 0,New Machine,Old Machine
0,42.1,42.7
1,41.0,43.6
2,41.3,43.8
3,41.8,43.3
4,42.4,42.5
5,42.8,43.5
6,43.2,43.1
7,42.3,41.7
8,41.8,44.0
9,42.7,44.1


In [8]:
machines_df = machines_df.rename(columns={'New Machine':'new_machine','  Old Machine':'old_machine'})
machines_df

Unnamed: 0,new_machine,old_machine
0,42.1,42.7
1,41.0,43.6
2,41.3,43.8
3,41.8,43.3
4,42.4,42.5
5,42.8,43.5
6,43.2,43.1
7,42.3,41.7
8,41.8,44.0
9,42.7,44.1


In [9]:
# H0: μ1 = μ2
# H1: μ1 != μ2 

In [10]:
ss = 10
mean1 = machines_df['old_machine'].mean()
mean2 = machines_df['new_machine'].mean()
std1 = machines_df['old_machine'].std()
std2 = machines_df['new_machine'].std()
print(mean1, mean2, std1, std2)

43.230000000000004 42.14 0.7498888806572157 0.6834552736727638


In [11]:
t2 = (mean1 - mean2) / np.sqrt((std1**2)/ss + (std2**2)/ss)
print("The t score is: {:.2f}".format(t2))

The t score is: 3.40

In [12]:
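# Critical t value for 18 (10+10-2) degrees of freedom, two-sided test (0.025 in each tail): 2.101
t_value2 = 2.101
# Two-sided test: compare the absolute value of t with the critical value
if abs(t2) > t_value2:
    print('H0 is rejected; the machines have significantly different mean packing times')
else:
    print('H0 is not rejected; there is no evidence that one machine is better than the other')

H0 is rejected; the machines have significantly different mean packing times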
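As a cross-check on the hand calculation, `scipy.stats.ttest_ind` can run the two-sample test directly. The sketch below hard-codes the packing times from the table shown earlier so that it runs without the data file. It passes `equal_var=False` (Welch's test), because that matches the unpooled formula in the hint; this means the degrees of freedom SciPy uses will differ slightly from the 18 assumed above.

```python
from scipy import stats

# Packing times in seconds, copied from the table displayed above
new_machine = [42.1, 41.0, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7]
old_machine = [42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44.0, 44.1]

# Welch's two-sample t-test (variances not assumed equal), two-sided by default
result = stats.ttest_ind(old_machine, new_machine, equal_var=False)

print("t = {:.3f}, p-value = {:.4f}".format(result.statistic, result.pvalue))
if result.pvalue < 0.05:
    print("H0 is rejected: the mean packing times differ")
else:
    print("H0 is not rejected")
```

Since the old machine's sample mean is the larger of the two, a rejection here points towards the new machine packing faster.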
