## Instructions

It is assumed that the mean systolic blood pressure is μ = 120 mm Hg. In the Honolulu Heart Study, a sample of n = 100 people had an average systolic blood pressure of 130.1 mm Hg with a standard deviation of 21.21 mm Hg. Is the group significantly different (with respect to systolic blood pressure!) from the regular population?

Set up the hypothesis test.
Write down all the steps followed for setting up the test.
Calculate the test statistic by hand and also code it in Python. It should be 4.76190. What decision can you make based on this calculated value?


In [34]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

## Problem statement:

The mean systolic blood pressure is assumed to be μ = 120 mm Hg.
However, in the Honolulu Heart Study, a sample of n = 100 people
had an average systolic blood pressure of 130.1 mm Hg
with a standard deviation of 21.21 mm Hg.

Is the group significantly different from the assumed population with respect to systolic blood pressure?

- H0: μ = 120 mm Hg
- Ha: μ != 120 mm Hg

In [3]:
a_mean_blood_p= 120
s_size= 100
#population_stand_dev= not known
sample_mean_blood_p= 130.1
sample_stand_dev= 21.21
sem = sample_stand_dev/np.sqrt(s_size)
sem

2.121

In [4]:
z = (sample_mean_blood_p-a_mean_blood_p)/sem
print("The z score of our sample is: {:.2f}".format(z)) # almost 5 standard errors above the assumed mean

The z score of our sample is: 4.76
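
Worked by hand, the same calculation reads as follows. Because the population standard deviation is unknown and is estimated by the sample standard deviation s, the statistic is written here as t; the notebook's variable `z` holds the same number.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}
  = \frac{130.1 - 120}{21.21 / \sqrt{100}}
  = \frac{10.1}{2.121}
  \approx 4.7619
```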


In [5]:
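# Two-sided test at alpha = 0.05: upper critical value of the t distribution with n - 1 = 99 degrees of freedom
zc = stats.t.ppf(1-(0.05/2), df = 99)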
print("The critical value corresponding to a 0.95 area of a t distribution is: {:.2f}".format(zc))

The critical value corresponding to a 0.95 area of a t distribution is: 1.98
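
The decision rule for the two-sided test can be sketched as a direct comparison of the test statistic with the critical value. This is a minimal sketch; the names `mu_0`, `x_bar`, `t_stat`, `t_crit` and `reject_h0` are illustrative, and the significance level `alpha = 0.05` is an assumption carried over from the cell above.

```python
import numpy as np
from scipy import stats

mu_0 = 120.0   # mean under H0 (mm Hg)
x_bar = 130.1  # sample mean (mm Hg)
s = 21.21      # sample standard deviation (mm Hg)
n = 100        # sample size
alpha = 0.05   # assumed significance level

# Test statistic: distance of the sample mean from mu_0 in standard errors
t_stat = (x_bar - mu_0) / (s / np.sqrt(n))

# Upper critical value of the t distribution for a two-sided test
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)

# Reject H0 when the statistic falls outside [-t_crit, t_crit]
reject_h0 = bool(abs(t_stat) > t_crit)
print(f"t = {t_stat:.4f}, critical value = {t_crit:.4f}, reject H0: {reject_h0}")
```

Since the statistic lies far beyond the critical value, the null hypothesis is rejected at the 5% level.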


In [7]:
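p_value = stats.t.sf(4.76, df = 99) # one-tailed p-value: P(T >= 4.76) if H0 is true
print(p_value* 100) # expressed in percent
# If H0 were true (mu = 120), a test statistic this large would occur with a probability of about 0.0003%.
# For the two-sided test the p-value is twice that (about 0.0007%), still far below alpha = 5%, so we reject H0.
# Conclusion: the mean systolic blood pressure of the Honolulu group differs significantly from 120 mm Hg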

0.00033066342302957467
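
Because the alternative hypothesis is two-sided, the p-value should count both tails. A minimal sketch, assuming a 5% significance level; `p_two_sided` and `decision` are illustrative names that do not appear in the notebook.

```python
import numpy as np
from scipy import stats

# Test statistic recomputed from the summary values in the problem statement
t_stat = (130.1 - 120.0) / (21.21 / np.sqrt(100))
dof = 100 - 1

# Two-sided p-value: probability of a statistic at least this extreme in either tail
p_two_sided = 2 * stats.t.sf(abs(t_stat), df=dof)

alpha = 0.05  # assumed significance level
decision = "reject H0" if p_two_sided < alpha else "fail to reject H0"
print(f"two-sided p-value = {p_two_sided:.2e} -> {decision}")
```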


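In [None]:
"""We can not only reject H0 but also build a confidence interval: we are 95% confident that the true mean lies between roughly 126 and 134 mm Hg.

The interval extends about 2 standard errors to the left and right of the sample mean (1.96 for a normal distribution, 1.98 for a t distribution with 99 degrees of freedom)."""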
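
The interval described above can be computed rather than estimated by eye. A minimal sketch using `scipy.stats.t.interval`, with the confidence level passed positionally; `lower` and `upper` are illustrative names.

```python
import numpy as np
from scipy import stats

x_bar, s, n = 130.1, 21.21, 100
sem = s / np.sqrt(n)  # standard error of the mean

# 95% confidence interval for the mean, based on the t distribution with n - 1 degrees of freedom
lower, upper = stats.t.interval(0.95, n - 1, loc=x_bar, scale=sem)
print(f"95% CI: ({lower:.2f}, {upper:.2f}) mm Hg")

# The hypothesised mean of 120 lies outside the interval, consistent with rejecting H0
print("120 inside interval:", lower <= 120 <= upper)
```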

## Optional
In a packing plant, a machine packs cartons with jars. It is supposed that a new machine will pack faster on average than the machine currently used. To test that hypothesis, the times it takes each machine to pack ten cartons are recorded. The results, in seconds, are shown in the tables in the file Data/machine.txt. Assuming the conditions for a t-test are met, do the data provide sufficient evidence that one machine is better than the other?
Hint: use a two-sample t-test

t= (sample_mean(x1) - sample_mean(x2)) / sqrt(square(s1)/n1 + square(s2)/n2 )
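
The formula above can be sketched directly in NumPy. The two arrays are typed in from the values displayed when the data file is loaded, and `two_sample_t` is a hypothetical helper name, not part of any library. The sample variances use `ddof=1`, which is an assumption about what `s1` and `s2` denote (sample standard deviations).

```python
import numpy as np

# Packing times in seconds, copied from the data file
new_machine = np.array([42.1, 41.0, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7])
old_machine = np.array([42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44.0, 44.1])

def two_sample_t(x1, x2):
    """Two-sample t statistic with unpooled (per-sample) variances."""
    n1, n2 = len(x1), len(x2)
    s1_sq = np.var(x1, ddof=1)  # sample variance of the first group
    s2_sq = np.var(x2, ddof=1)  # sample variance of the second group
    return (np.mean(x1) - np.mean(x2)) / np.sqrt(s1_sq / n1 + s2_sq / n2)

t_stat = two_sample_t(new_machine, old_machine)
print(f"t = {t_stat:.3f}")
```

A negative statistic means the new machine's average packing time is lower than the old machine's.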

In [13]:
import pandas as pd
url= "https://raw.githubusercontent.com/repicao/IH_AB_DA_FT_JAN_2024/main/Class_Materials/Statistics/Lab/Data/Data_Machine.txt"
df = pd.read_csv(url)
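df.describe  # without parentheses this returns the bound method (its repr shows the whole frame); call df.describe() for summary statistics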

<bound method NDFrame.describe of   New Machine\t  Old Machine
0         42.1\t        42.7
1         41\t          43.6
2         41.3\t        43.8
3         41.8\t        43.3
4         42.4\t        42.5
5         42.8\t        43.5
6         43.2\t        43.1
7         42.3\t        41.7
8           41.8\t        44
9         42.7\t        44.1>
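
The output above shows both machines in a single column, because the file is tab-separated and `read_csv` splits on commas by default. A minimal sketch of reading such data into two numeric columns, using an in-memory sample of the first three rows in place of the URL; `sample_text` and `machines` are illustrative names.

```python
import io
import pandas as pd

# First rows of the file as displayed above: a tab followed by padding spaces
sample_text = (
    "New Machine\t  Old Machine\n"
    "42.1\t        42.7\n"
    "41\t          43.6\n"
    "41.3\t        43.8\n"
)

# sep="\t" splits on tabs; skipinitialspace=True drops the spaces after each tab
machines = pd.read_csv(io.StringIO(sample_text), sep="\t", skipinitialspace=True)

# Rename explicitly so later cells can use simple identifiers
machines.columns = ["new_machine", "old_machine"]
print(machines.dtypes)
```

Read this way, both columns arrive as floats, so no string splitting or type conversion is needed afterwards.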

In [14]:
df[['new_machine', 'old_machine']] = df['New Machine\t  Old Machine'].str.split('\t', expand=True)

In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 3 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   New Machine	  Old Machine  10 non-null     object
 1   new_machine                10 non-null     object
 2   old_machine                10 non-null     object
dtypes: object(3)
memory usage: 368.0+ bytes


In [33]:
# Assuming 'new_machine' and 'old_machine' contain string representations of numbers
df['new_machine'] = df['new_machine'].astype(float)
df['old_machine'] = df['old_machine'].astype(float)
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 3 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   New Machine	  Old Machine  10 non-null     object 
 1   new_machine                1 non-null      float64
 2   old_machine                0 non-null      float64
dtypes: float64(2), object(1)
memory usage: 368.0+ bytes


In [None]:
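# Sample size and spread for each machine (population standard deviations not known)
n_new = df['new_machine'].count()
n_old = df['old_machine'].count()
s_new = df['new_machine'].std() # sample standard deviation (ddof=1)
s_old = df['old_machine'].std()
# Standard error of the difference between the two sample means
sem_diff = np.sqrt(s_new**2/n_new + s_old**2/n_old)
sem_diff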

In [35]:
nm_mean= df['new_machine'].mean()
om_mean= df['old_machine'].mean()

41.0
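
To finish the exercise, the test itself can be run with `scipy.stats.ttest_ind`. This is a sketch under stated assumptions: the values are typed in from the data file, `equal_var=False` selects the unpooled-variance form that matches the hinted formula, and the one-sided alternative (the new machine is faster, so its mean time is lower) is derived by halving the two-sided p-value.

```python
from scipy import stats

# Packing times in seconds, copied from the data file
new_machine = [42.1, 41.0, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7]
old_machine = [42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44.0, 44.1]

# Two-sample t-test with unpooled variances (two-sided by default)
t_stat, p_two_sided = stats.ttest_ind(new_machine, old_machine, equal_var=False)

# One-sided p-value for Ha: mean(new) < mean(old)
p_one_sided = p_two_sided / 2 if t_stat < 0 else 1 - p_two_sided / 2

alpha = 0.05  # assumed significance level
print(f"t = {t_stat:.3f}, one-sided p = {p_one_sided:.4f}")
print("new machine is faster" if p_one_sided < alpha else "no significant difference")
```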