# Lab Hypothesis Testing

It is assumed that the mean systolic blood pressure is μ = 120 mm Hg. In the Honolulu Heart Study, a sample of n = 100 people had an average systolic blood pressure of 130.1 mm Hg with a standard deviation of 21.21 mm Hg. Is the group significantly different (with respect to systolic blood pressure!) from the regular population?

Set up the hypothesis test.
Write down all the steps followed for setting up the test.
Calculate the test statistic by hand and also code it in Python. It should be 4.76190. What decision can you make based on this calculated value?

In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [84]:
mu = 120
n = 100
x_ = 130.1
s = 21.21   # sample standard deviation (population sigma unknown), so use the t statistic

# is the group statistically different from the regular population?

# H0: mu = 120
# H1: mu ≠ 120

In [85]:
t_score = (x_ - mu)/(s/np.sqrt(n))
t_score

4.761904761904759
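
The exercise also asks for the calculation by hand. A worked version of the same one-sample t statistic, using only the values given in the problem statement, could look like this:

$$
t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{130.1 - 120}{21.21 / \sqrt{100}} = \frac{10.1}{2.121} \approx 4.7619
$$

with $n - 1 = 99$ degrees of freedom.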

In [79]:
cv = stats.t.ppf(0.975, 99)
cv

1.9842169515086827

In [82]:
print(abs(t_score) > cv)   # two-sided test: compare |t| with the critical value
print('H0 can be rejected at the 5% significance level: the mean systolic blood pressure in the Honolulu Heart Study sample differs significantly from 120 mm Hg (it is higher)')

True
H0 can be rejected at the 5% significance level: the mean systolic blood pressure in the Honolulu Heart Study sample differs significantly from 120 mm Hg (it is higher)
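
As a complement to the critical-value comparison, the same decision can be reached through a p-value. The following is a minimal sketch, assuming a two-sided test at alpha = 0.05; the names `x_bar` and `p_value` are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np
from scipy import stats

# Values taken from the exercise text
mu, n, x_bar, s = 120, 100, 130.1, 21.21

# One-sample t statistic
t_score = (x_bar - mu) / (s / np.sqrt(n))

# Two-sided p-value: probability of a |t| at least this large under H0 (df = n - 1)
p_value = 2 * stats.t.sf(abs(t_score), df=n - 1)

print(f"t = {t_score:.5f}, p = {p_value:.2e}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

A p-value below 0.05 leads to the same conclusion as `abs(t_score) > cv`.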


## optional

In [None]:
# Two-sample t-test: test whether the means of two samples differ

# H0: mu_new - mu_old = 0  / both machines are equally fast
# H1: mu_new - mu_old < 0  / the new machine is faster (shorter packing time)

# t = (mean(x1) - mean(x2)) / sqrt(s1**2/n1 + s2**2/n2)

In a packing plant, a machine packs cartons with jars. A new machine is expected to pack faster on average than the machine currently in use. To test this hypothesis, the time each machine takes to pack ten cartons was recorded. The results, in seconds, are shown in the tables in the file Data/machine.txt. Assuming the conditions for a t-test are met, do the data provide sufficient evidence that one machine is better than the other?

In [50]:
# Use a context manager so the file is closed automatically after reading
with open("/Users/katharina/Documents/GitHub/DA-class-materials/Class_Materials/Statistics/Lab/Data/Data_Machine.txt", "r") as file:
    print(file.read())

New Machine	  Old Machine
42.1	        42.7
41	          43.6
41.3	        43.8
41.8	        43.3
42.4	        42.5
42.8	        43.5
43.2	        43.1
42.3	        41.7
41.8	        44
42.7	        44.1



In [77]:
file_path="/Users/katharina/Documents/GitHub/DA-class-materials/Class_Materials/Statistics/Lab/Data/Data_Machine.txt"

df = pd.read_csv(file_path, sep="\t")  # Use sep="\t" if your file is tab-separated
df

Unnamed: 0,New Machine,Old Machine
0,42.1,42.7
1,41.0,43.6
2,41.3,43.8
3,41.8,43.3
4,42.4,42.5
5,42.8,43.5
6,43.2,43.1
7,42.3,41.7
8,41.8,44.0
9,42.7,44.1


In [41]:
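def lower_case_column_names(df):
    # Lower-case all column names (modifies df in place and returns it)
    df.columns = [i.lower() for i in df.columns]
    return df

def replace_white_spaces(df):
    # Replace spaces in column names with underscores
    df.columns = [i.replace(" ", "_") for i in df.columns]
    return df

df = lower_case_column_names(df)
df = replace_white_spaces(df)

# The "Old Machine" header appears to carry leading spaces in the raw file,
# which become "__" after the step above; strip that prefix
df = df.rename(columns=lambda x: x.replace("__", ""))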


In [76]:
#mean_new = np.mean(df['new_machine'])

new = df.new_machine
old = df.old_machine

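x_new_ = np.mean(df['new_machine'])
x_old_ = np.mean(df['old_machine']).round(2)

# Note: np.std defaults to ddof=0 (population formula);
# pass ddof=1 to get the sample standard deviation instead
s_new = round(np.std(df.new_machine),2)
s_old = round(np.std(df.old_machine),2)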

print('x_ new machine: ',x_new_)
print('x_ old machine: ',x_old_)
print('sample std new machine: ',s_new)
print('sample std old machine: ',s_old)

x_ new machine:  42.14
x_ old machine:  43.23
sample std new machine:  0.65
sample std old machine:  0.71
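
One caveat on the values labelled "sample std": `np.std` divides by n by default (`ddof=0`), whereas the sample standard deviation divides by n - 1 (`ddof=1`). The sketch below compares the two, with the packing times typed in from the table above so that it runs without the data file.

```python
import numpy as np

# Packing times in seconds, copied from the table in this notebook
new = np.array([42.1, 41.0, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7])
old = np.array([42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44.0, 44.1])

for name, x in [("new", new), ("old", old)]:
    pop_std = np.std(x)             # ddof=0: divides by n
    sample_std = np.std(x, ddof=1)  # ddof=1: divides by n - 1
    print(f"{name}: ddof=0 -> {pop_std:.2f}, ddof=1 -> {sample_std:.2f}")
```

With only ten observations per machine the difference is noticeable, and using `ddof=1` makes the t statistic somewhat smaller in absolute value.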


In [74]:
# H0: mu_new - mu_old = 0  / both machines are equally fast
# H1: mu_new - mu_old < 0  / the new machine is faster

t = (x_new_ - x_old_) / np.sqrt(np.square(s_new)/len(new) + np.square(s_old)/len(old))

# Left-tailed test at alpha = 0.05: the critical value is the 5% quantile (negative).
# df = n - 1 = 9 is a conservative choice for two samples of size 10.
cv = stats.t.ppf(0.05, len(new)-1)

reject = t < cv

if reject:
    print('H0 can be rejected at the 5% significance level. The new machine is significantly faster.')
else:
    print('H0 cannot be rejected at the 5% significance level. There is no significant evidence that the new machine is faster.')

H0 can be rejected at the 5% significance level. The new machine is significantly faster.


In [78]:
t

-3.5808023511626366
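
As a cross-check on the manual calculation, SciPy offers a built-in two-sample test. The sketch below is one possible variant, not the notebook's original method: it assumes Welch's t-test (`equal_var=False`) with a one-sided alternative, which requires SciPy 1.6 or later for the `alternative` argument. The data are typed in from the table above. Because `ttest_ind` uses the sample variance (n - 1 in the denominator), its statistic differs slightly from the value computed by hand with `np.std`.

```python
import numpy as np
from scipy import stats

# Packing times in seconds, copied from the table in this notebook
new = np.array([42.1, 41.0, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7])
old = np.array([42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44.0, 44.1])

# H1: mean(new) < mean(old), i.e. the new machine packs faster
result = stats.ttest_ind(new, old, equal_var=False, alternative="less")
t_welch, p_welch = result.statistic, result.pvalue

print(f"t = {t_welch:.4f}, one-sided p = {p_welch:.4f}")
print("reject H0" if p_welch < 0.05 else "fail to reject H0")
```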