The following demonstrates a statistical method for determining whether two samples can be said to come from populations with the same mean.

In research published in the summer of 2005, Chinese researchers determined the concentrations of several organochlorine contaminants in male and female adults. One class of contaminants of concern was the alpha isomers of hexachlorocyclohexanes, identified here as α-HCHs. Samples were collected from 16 male and 15 female adults. The results are shown below under the variables datam (males) and dataf (females). Concentration units are ng/g (ppb).

Part A- Test the following hypothesis scheme at the 5% level of significance where µm is the true average concentration of α-HCHs in males and µf is the true average concentration in females. Use a one-tailed test in performing your analysis and use the t value as the test statistic. What is the value of the critical value? What is the value of the test statistic based upon sampling data? 

Ho: µm = µf,    H1: µm > µf

In [28]:
import numpy as np
from scipy import stats as sp

In [33]:
datam = [1.1,1.9,1.0,1.4,4.4,2.7,1.1,1.3,1.6,4.7,1.0,1.6,4.1,7.6,7.7,3.3]
dataf = [.93,.89,.89,.78,1.0,.93,1.4,1.7,1.1,1.9,2.6,2.9,2.4,1.0,2.9]

#Sample variances (ddof=1 divides by n-1) and sample sizes
s1 = np.var(datam, ddof=1)
s2 = np.var(dataf, ddof=1)
n1 = len(datam)
n2 = len(dataf)

print('The values of s1^2, s2^2, n1 and n2 are:', s1, s2, n1, n2)

The values of s1^2, s2^2, n1 and n2 are: 5.00995833333 0.619826666667 16 15


In [38]:
#Calculating the Welch-Satterthwaite degrees of freedom, denoted by ν ("nu")
nu = ((s1/n1)+(s2/n2))**2 / (((s1/n1)**2)/(n1-1) + ((s2/n2)**2)/(n2-1))
print('The degrees of freedom (nu) is:', round(nu, 2))

The degrees of freedom (nu) is: 18.87
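The degrees of freedom come from the Welch-Satterthwaite approximation, nu = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1-1) + (s2²/n2)²/(n2-1) ]. As a cross-check, the same quantity can be sketched with only the Python standard library. The helper name welch_df is hypothetical and not part of the original notebook.

```python
from statistics import variance

def welch_df(x, y):
    """Welch-Satterthwaite approximation to the degrees of freedom."""
    n1, n2 = len(x), len(y)
    a = variance(x) / n1  # s1^2 / n1, using the sample (n-1) variance
    b = variance(y) / n2  # s2^2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

datam = [1.1, 1.9, 1.0, 1.4, 4.4, 2.7, 1.1, 1.3, 1.6, 4.7, 1.0, 1.6, 4.1, 7.6, 7.7, 3.3]
dataf = [.93, .89, .89, .78, 1.0, .93, 1.4, 1.7, 1.1, 1.9, 2.6, 2.9, 2.4, 1.0, 2.9]

nu_check = welch_df(datam, dataf)
print('Welch-Satterthwaite degrees of freedom:', nu_check)
```

The result is generally not an integer, so it is rounded before looking up a critical value.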


In [42]:
#Finding the one-tailed critical value of the t distribution at α=.05, v=19 (nu rounded to the nearest integer)
print(sp.t.isf(.05,19))

1.72913281152


In [43]:
#Running Welch's t test (equal_var=False) to determine the value of the test statistic; the p-value returned is two-sided
print(sp.ttest_ind(datam, dataf, equal_var = False))

Ttest_indResult(statistic=2.27022502614536, pvalue=0.035107644314830447)
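The test statistic can be reproduced by hand as the difference of sample means divided by the standard error sqrt(s1²/n1 + s2²/n2). The sketch below uses only the standard library, and it takes the two-sided p-value from the output above rather than recomputing it. The variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

datam = [1.1, 1.9, 1.0, 1.4, 4.4, 2.7, 1.1, 1.3, 1.6, 4.7, 1.0, 1.6, 4.1, 7.6, 7.7, 3.3]
dataf = [.93, .89, .89, .78, 1.0, .93, 1.4, 1.7, 1.1, 1.9, 2.6, 2.9, 2.4, 1.0, 2.9]

# Standard error of the difference of means with unequal variances
std_err = sqrt(variance(datam) / len(datam) + variance(dataf) / len(dataf))
t_stat = (mean(datam) - mean(dataf)) / std_err
print('t statistic:', t_stat)

# Two-sided p-value copied from the ttest_ind output above.
two_sided_p = 0.035107644314830447
# Halving is valid here only because t_stat is positive, i.e. the observed
# difference lies in the direction of H1: µm > µf.
one_sided_p = two_sided_p / 2
print('one-sided p-value:', one_sided_p)
```

Recent SciPy releases also accept an alternative='greater' argument to ttest_ind, which should return the one-sided p-value directly. Whether it is available depends on the installed version.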


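2.27 > 1.73, and the two-sided p-value of 3.5% corresponds to a one-sided p-value of about 1.8%.

The test statistic falls in the rejection region. Therefore, we reject the null hypothesis that µm = µf in favor of µm > µf at the 5% level of significance.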

Part B- Develop a one-sided 95% confidence interval for the difference of the population means.

In [49]:
#Solving for the one-sided 95% lower confidence bound on µm - µf:
#the margin is the critical value times the standard error of the difference
tcrit = sp.t.isf(.05,19)
diff_means = np.mean(datam)-np.mean(dataf)
std_err = np.sqrt((s1/n1)+(s2/n2))
lower = diff_means - tcrit*std_err

print('The one-sided 95% confidence interval is:', round(lower, 3), '< µm - µf')

The one-sided 95% confidence interval is: 0.322 < µm - µf


The confidence interval provides further evidence that the difference between the two means is not 0, as the 95% confidence interval does not contain 0. 
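The same conclusion can be checked against a two-sided 95% interval, which uses the upper 2.5% point of the t distribution instead of the upper 5% point. The following is a minimal standard-library sketch. The critical value 2.093 is an assumed table value for 19 degrees of freedom (sp.t.isf(.025, 19) should give the same number to three decimals), and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

datam = [1.1, 1.9, 1.0, 1.4, 4.4, 2.7, 1.1, 1.3, 1.6, 4.7, 1.0, 1.6, 4.1, 7.6, 7.7, 3.3]
dataf = [.93, .89, .89, .78, 1.0, .93, 1.4, 1.7, 1.1, 1.9, 2.6, 2.9, 2.4, 1.0, 2.9]

# Assumed table value: upper 2.5% point of t with 19 degrees of freedom
T_CRIT_TWO_SIDED = 2.093

diff_means = mean(datam) - mean(dataf)
std_err = sqrt(variance(datam) / len(datam) + variance(dataf) / len(dataf))
margin = T_CRIT_TWO_SIDED * std_err  # critical value times standard error

lower_two_sided = diff_means - margin
upper_two_sided = diff_means + margin
print(round(lower_two_sided, 3), '< mu_m - mu_f <', round(upper_two_sided, 3))
```

Both endpoints are positive, so this wider interval also excludes 0.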