# Assignment 4 Solutions

In performing a two-sample t test, there are two distinct situations to consider:

1.  The two population variances are equal ($\sigma_1^2 = \sigma_2^2$), so the two sample variances can be pooled into a single estimate.
2.  The two population variances are not assumed to be equal ($\sigma_1^2 \neq \sigma_2^2$), so each sample keeps its own variance estimate.

For this assignment, the textbook always assumes situation 2.

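In this case we estimate two quantities. The first is the standard error of the difference between the two sample means (called SEM in the code). The second is the approximate number of degrees of freedom, given by the Welch-Satterthwaite formula and rounded down to an integer: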

$SEM = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

$df = \frac{ \left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}    \right)^2 }{\frac{ \left(\frac{s_1^2}{n_1}\right)^2   }{n_1-1} + \frac{ \left(\frac{s_2^2}{n_2}\right)^2   }{n_2-1}}$
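The two formulas can be cross-checked against SciPy's `stats.ttest_ind_from_stats`. With `equal_var=False`, that function runs Welch's test directly from summary statistics. The sketch below is illustrative only. The helper names `welch_se` and `welch_df` are hypothetical, and the summary statistics are the ones used in the Question 4 solution further down.

```python
import numpy as np
import scipy.stats as stats

def welch_se(n1, n2, s1, s2):
    """Standard error of xbar1 - xbar2 when the variances are not pooled."""
    return np.sqrt(s1**2 / n1 + s2**2 / n2)

def welch_df(n1, n2, s1, s2):
    """Welch-Satterthwaite degrees of freedom, left unrounded."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Summary statistics taken from the Question 4 solution
n1, xbar1, s1 = 12, 18.08, 1.51
n2, xbar2, s2 = 12, 23.35, 4.04

t_hand = (xbar1 - xbar2) / welch_se(n1, n2, s1, s2)
df_exact = welch_df(n1, n2, s1, s2)

# SciPy's Welch t test computed from the same summary statistics
res = stats.ttest_ind_from_stats(xbar1, s1, n1, xbar2, s2, n2, equal_var=False)

print("t (by hand) = %.4f" % t_hand)
print("t (SciPy)   = %.4f" % res.statistic)
print("df exact = %.2f, rounded down = %d" % (df_exact, int(df_exact)))
```

The two t statistics agree. SciPy, however, computes its P-value from the unrounded degrees of freedom, so that P-value can differ slightly from one based on ν rounded down.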

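```python
import numpy as np
import scipy.stats as stats

# Unequal variances (Welch): standard error of xbar1 - xbar2
def sem_neq(n1, n2, s1, s2):
    sm = np.sqrt(s1**2/n1 + s2**2/n2)
    return float(sm)

# Unequal variances: Welch-Satterthwaite degrees of freedom,
# truncated with int(), i.e. rounded down to an integer
def ndof_neq(n1, n2, s1, s2):
    v1 = s1**2/n1
    v2 = s2**2/n2
    dof = (v1 + v2)**2/(v1**2/(n1 - 1) + v2**2/(n2 - 1))
    return int(dof)

# Equal variances: pooled standard deviation sp, then SE of xbar1 - xbar2
def sem_eq(n1, n2, s1, s2):
    sp = np.sqrt(((n1 - 1)*s1**2 + (n2 - 1)*s2**2)/(n1 + n2 - 2))
    sm = sp*np.sqrt(1.0/n1 + 1.0/n2)
    return float(sm)

# Equal variances: n1 + n2 - 2 degrees of freedom
def ndof_eq(n1, n2, s1, s2):
    dof = n1 + n2 - 2
    return int(dof)
```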

# Question 1

Determine the number of degrees of freedom for the two-sample t test or CI in each of the following situations. 
(Exact integer answers required.)

(a) m = 10, n = 13, s1 = 4.8, s2 = 5.7


(b) m = 14, n = 23, s1 = 5.1, s2 = 5.8


(c) m = 6, n = 7, s1 = 2.3, s2 = 6.2


(d) m = 10, n = 23, s1 = 4.1, s2 = 6.6

```python
# Part (a)
m = 15
n = 11
s1 = 4.8
s2 = 5.6
print("a: ", ndof_neq(m, n, s1, s2))

# Parts (b)-(d): arguments are (m, n, s1, s2)
print("b: ", ndof_neq(12, 23, 4.6, 6.3))
print("c: ", ndof_neq(10, 9, 2.3, 6.3))
print("d: ", ndof_neq(10, 23, 4.4, 6.9))
```

```
a:  19
b:  29
c:  9
d:  26
```
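The parameter values printed in the Question 1 statement differ from the ones used in the cell above. As a sketch, here is the same Welch-Satterthwaite computation vectorized with NumPy over all four parts, using the values exactly as they appear in the problem statement. The array names are illustrative, and `np.floor` reproduces the rounding down that `ndof_neq` does with `int()`.

```python
import numpy as np

# Values exactly as printed in the Question 1 problem statement, parts (a)-(d)
m = np.array([10, 14, 6, 10])
n = np.array([13, 23, 7, 23])
s1 = np.array([4.8, 5.1, 2.3, 4.1])
s2 = np.array([5.7, 5.8, 6.2, 6.6])

# Welch-Satterthwaite formula, evaluated element-wise for all four parts
v1 = s1**2 / m
v2 = s2**2 / n
nu = (v1 + v2)**2 / (v1**2 / (m - 1) + v2**2 / (n - 1))

# Round down to an integer, matching int() in ndof_neq
nu_int = np.floor(nu).astype(int)

for part, value in zip("abcd", nu_int):
    print(part + ":", value)
```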


# Question 2

Let μ1 and μ2 denote true average densities for two different types of brick. Assuming normality of the two density distributions, test Ho: μ1 – μ2 = 0 versus Ha: μ1 – μ2 ≠ 0 using the following data: m = 6, $\bar{x}$ = 22.27, s1 = 0.156, n = 5, $\bar{y}$ = 20.22, and s2 = 0.234. (Use α = 0.05. Give ν to exact integer and t to 2 decimal places.)

```python
m = 6
xbar = 23.06
s1 = 0.152
n = 5
ybar = 20.26
s2 = 0.231

v = ndof_neq(m, n, s1, s2)
print("v =", v)
SEM = sem_neq(m, n, s1, s2)
tvalue = (xbar - ybar) / SEM
print("t = %.2f" % tvalue)

a = 0.05
tdist = stats.t(v)

# Ha is two-sided, so the P-value is twice the tail area beyond |t|
pvalue = 2.0 * tdist.cdf(-np.abs(tvalue))
print("pvalue = %.3e" % pvalue)

if pvalue < a:
    print("reject")
else:
    print("fail to reject")
```

```
v = 6
t = 23.23
pvalue = 4.168e-07
reject
```
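The same decision can be reached with the critical-value (rejection-region) approach. For a two-tailed test, reject H0 when |t| ≥ t with area α/2 to its right on ν degrees of freedom. The sketch below is self-contained, so it recomputes the Welch quantities inline rather than calling the notebook helpers. It uses the summary statistics from the cell above.

```python
import numpy as np
import scipy.stats as stats

# Summary statistics from the Question 2 solution
m, xbar, s1 = 6, 23.06, 0.152
n, ybar, s2 = 5, 20.26, 0.231
alpha = 0.05

# Welch standard error and degrees of freedom (rounded down)
v1, v2 = s1**2 / m, s2**2 / n
se = np.sqrt(v1 + v2)
nu = int((v1 + v2)**2 / (v1**2 / (m - 1) + v2**2 / (n - 1)))
t = (xbar - ybar) / se

# Two-tailed rejection region: |t| >= t_{alpha/2, nu}
t_crit = stats.t.ppf(1 - alpha / 2, nu)
print("nu = %d, t = %.2f, critical = %.3f" % (nu, t, t_crit))
print("reject H0" if abs(t) >= t_crit else "fail to reject H0")
```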


# Question 3

Quantitative noninvasive techniques are needed for routinely assessing symptoms of peripheral neuropathies, such as carpal tunnel syndrome (CTS). An article reported on a test that involved sensing a tiny gap in an otherwise smooth surface by probing with a finger; this functionally resembles many work-related tactile activities, such as detecting scratches or surface defects. When finger probing was not allowed, the sample average gap detection threshold for m = 8 normal subjects was 1.8 mm, and the sample standard deviation was 0.49; for n = 12 CTS subjects, the sample mean and sample standard deviation were 2.52 and 0.85, respectively. Does this data suggest that the true average gap detection threshold for CTS subjects exceeds that for normal subjects? State and test the relevant hypotheses using a significance level of .01. (Give answers accurate to 2 decimal places.)

```python
# Sample 1: normal subjects; sample 2: CTS subjects
m = 10
xbar = 1.77
s1 = 0.49
n = 12
ybar = 2.48
s2 = 0.85
alpha = 0.01

# H0: mu1 - mu2 = 0 versus Ha: mu1 - mu2 < 0 (CTS mean exceeds normal mean)
v = ndof_neq(m, n, s1, s2)
SEM = sem_neq(m, n, s1, s2)
tvalue = (ybar - xbar) / SEM
print("t = %.2f" % tvalue)

tdist = stats.t(v)

# One-sided P-value: area to the right of t
pvalue = tdist.sf(tvalue)
print("pvalue = %.4f" % pvalue)

# One-sided critical value t_{alpha, v}
critical = tdist.ppf(1 - alpha)
print("critical = %.2f" % critical)

if pvalue < alpha:
    print("Reject the null hypothesis ... P-value is less than alpha")
else:
    print("Fail to reject the null hypothesis ... P-value is greater than alpha")
```

```
t = 2.45
pvalue = 0.0125
critical = 2.55
Fail to reject the null hypothesis ... P-value is greater than alpha
```
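With μ1 the normal-subject mean and μ2 the CTS mean, the relevant hypotheses are H0: μ1 − μ2 = 0 versus Ha: μ1 − μ2 < 0. Equivalently, μ_CTS − μ_normal > 0. SciPy can run this upper-tailed Welch test directly from summary statistics by listing the CTS group first and passing `alternative="greater"`. This is a sketch for cross-checking. SciPy uses the unrounded degrees of freedom, so its P-value can differ from the cell above in the later decimal places.

```python
import scipy.stats as stats

# Summary statistics from the Question 3 solution.
# The CTS group is sample 1, so Ha: mu_CTS > mu_normal is "greater".
res = stats.ttest_ind_from_stats(
    mean1=2.48, std1=0.85, nobs1=12,   # CTS subjects
    mean2=1.77, std2=0.49, nobs2=10,   # normal subjects
    equal_var=False,                   # Welch's test (situation 2)
    alternative="greater",             # upper-tailed alternative (SciPy >= 1.6)
)
print("t = %.2f, one-sided P-value = %.4f" % (res.statistic, res.pvalue))
```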


# Question 4

The slant shear test is widely accepted for evaluating the bond of resinous repair materials to concrete; it utilizes cylinder specimens made of two identical halves bonded at 30°. For 12 specimens prepared using wire-brushing, the sample mean shear strength (N/mm²) and sample standard deviation were 18.23 and 1.48, respectively, whereas for 12 hand-chiseled specimens, the corresponding values were 23.47 and 4.01. Does the true average strength appear to be different for the two methods of surface preparation? Test the relevant hypotheses using a significance level of .05. (Give ν to exact integer and t to 2 decimal places.)

```python
# Sample 1: wire-brushed specimens; sample 2: hand-chiseled specimens
n1 = 12
xbar1 = 18.08
s1 = 1.51
n2 = 12
xbar2 = 23.35
s2 = 4.04
alpha = 0.05

df = ndof_neq(n1, n2, s1, s2)
print("df =", df)

SEM = sem_neq(n1, n2, s1, s2)
tvalue = (xbar1 - xbar2) / SEM
print("t = %.2f" % tvalue)

# Two-sided alternative: double the tail area beyond |t|
tdist = stats.t(df)
pvalue = 2.0 * tdist.cdf(-np.abs(tvalue))
print("pvalue =", pvalue)

if pvalue < alpha:
    print("Reject the null hypothesis ... P-value is less than alpha")
else:
    print("Fail to reject the null hypothesis ... P-value is greater than alpha")
```

```
df = 14
t = -4.23
pvalue = 0.0008356142737287534
Reject the null hypothesis ... P-value is less than alpha
```
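A two-sided test at level α rejects H0: μ1 − μ2 = 0 exactly when the corresponding 100(1 − α)% confidence interval for μ1 − μ2 excludes 0, provided both use the same standard error and ν. The sketch below computes that 95% interval from the same summary statistics. It recomputes the Welch quantities inline so that it runs on its own.

```python
import numpy as np
import scipy.stats as stats

# Summary statistics from the Question 4 solution
n1, xbar1, s1 = 12, 18.08, 1.51
n2, xbar2, s2 = 12, 23.35, 4.04
alpha = 0.05

v1, v2 = s1**2 / n1, s2**2 / n2
se = np.sqrt(v1 + v2)
nu = int((v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))

# Two-sided 95% CI for mu1 - mu2: (xbar1 - xbar2) +/- t_{alpha/2, nu} * se
half_width = stats.t.ppf(1 - alpha / 2, nu) * se
lower = (xbar1 - xbar2) - half_width
upper = (xbar1 - xbar2) + half_width
print("95%% CI for mu1 - mu2: (%.2f, %.2f)" % (lower, upper))
```

Both endpoints are negative, which agrees with rejecting H0 above.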


# Question 5

Consider the accompanying data on breaking load (kg/25 mm width) for various fabrics in both an unabraded condition and an abraded condition. Use the paired t test to test Ho: μD = 0 versus Ha: μD > 0 at significance level .01. (Give answers accurate to 2 decimal places.)

```python
unabraded = np.array([39.2, 55.0, 59.9, 38.7, 49.2, 48.8, 29.9, 49.8])
abraded = np.array([24.3, 20.0, 40.2, 34.5, 36.1, 52.5, 24.6, 46.5])

# Paired design: analyze the per-fabric differences
diff = unabraded - abraded
mu = 0
alpha = 0.01

df = len(diff) - 1
tdist = stats.t(df)
print("critical = %.2f" % tdist.ppf(1 - alpha))

# Ha: muD > 0, so request the upper-tailed P-value
# (the `alternative` argument requires SciPy >= 1.6)
t, pvalue = stats.ttest_1samp(diff, mu, alternative="greater")
print("t = %.2f" % t)
print("pvalue = %.4f" % pvalue)

if pvalue < alpha:
    print("Reject the null hypothesis ... P-value is less than alpha")
else:
    print("Fail to reject the null hypothesis ... P-value is greater than alpha")
```

```
critical = 3.00
t = 2.68
pvalue = 0.0157
Fail to reject the null hypothesis ... P-value is greater than alpha
```
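SciPy also provides `stats.ttest_rel`, which runs the paired t test directly on the two matched samples. The sketch below confirms that it gives the same statistic and P-value as a one-sample test on the differences, which is what the cell above does.

```python
import numpy as np
import scipy.stats as stats

unabraded = np.array([39.2, 55.0, 59.9, 38.7, 49.2, 48.8, 29.9, 49.8])
abraded = np.array([24.3, 20.0, 40.2, 34.5, 36.1, 52.5, 24.6, 46.5])

# Paired t test on the matched samples (upper-tailed: muD > 0)
paired = stats.ttest_rel(unabraded, abraded, alternative="greater")

# One-sample t test on the differences, as in the solution above
one_sample = stats.ttest_1samp(unabraded - abraded, 0.0, alternative="greater")

print("t = %.2f, P-value = %.4f" % (paired.statistic, paired.pvalue))
```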


# Question 6

Data on the modulus of elasticity obtained 1 minute after loading in a certain configuration and 4 weeks after loading for the same lumber specimens is presented here.

Calculate and interpret an upper confidence bound for the true average difference between 1-minute modulus and 4-week modulus; first check the plausibility of any necessary assumptions. (Use α = 0.05. Round your answer to the nearest whole number.)

The data for this question is stored in a local file called A4Q6.csv

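```python
# Differences: 1-minute modulus minus 4-week modulus, one per specimen
difference = np.array([479, 3370, 2580, 3267, 2850, 2690, 2180, 1805,
                       2210, 2350, 2260, 3304, 2880, 2750, 3520, 1204])
n = len(difference)
xbar = difference.mean()
s = difference.std(ddof=1)
df = n - 1
SEM = s / np.sqrt(n)
alpha = 0.05

# One-sided 95% upper confidence bound: xbar + t_{alpha, n-1} * s / sqrt(n)
upper = xbar + stats.t.ppf(1 - alpha, df) * SEM
print("Upper bound: %.0f" % upper)
```

```
Upper bound: 2837
```

With 95% confidence, the true average difference between the 1-minute modulus and the 4-week modulus is at most about 2837.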
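The question also asks to check the plausibility of the assumptions first. The t-based bound assumes that the differences come from a normal population. That check matters here because 479 sits well below the rest of the data. A normal probability plot is the usual check. The sketch below computes that plot's correlation coefficient with `stats.probplot` and adds a Shapiro-Wilk test (`stats.shapiro`) as a formal complement. Neither step appears in the original solution, and the sketch does not settle the question by itself. Plotting `osm` against `osr` (for example with matplotlib) is the visual version of the same check.

```python
import numpy as np
import scipy.stats as stats

difference = np.array([479, 3370, 2580, 3267, 2850, 2690, 2180, 1805,
                       2210, 2350, 2260, 3304, 2880, 2750, 3520, 1204])

# Shapiro-Wilk test: a small P-value would cast doubt on normality
w_stat, w_pvalue = stats.shapiro(difference)

# Normal probability plot data; r near 1 means the points lie close to a line
(osm, osr), (slope, intercept, r) = stats.probplot(difference, dist="norm")

print("Shapiro-Wilk W = %.3f, P-value = %.3f" % (w_stat, w_pvalue))
print("probability-plot correlation r = %.3f" % r)
```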


# Question 7

Give as much information as you can about the P-value of the F test in each of the following situations. (Give answers accurate to 3 decimal places.)

(a) ν1 = 5, ν2 = 10, upper-tailed test, f = 2.52

(b) ν1 = 5, ν2 = 10, upper-tailed test, f = 5.64 

(c) ν1 = 5, ν2 = 10, two-tailed test, f = 5.64 

(d) ν1 = 5, ν2 = 10, lower-tailed test, f = 5.64

(e) ν1 = 40, ν2 = 20, upper-tailed test, f = 3.86

```python
def fpvalue(fvalue, dof1, dof2, test):
    """Print the P-value of an F test with (dof1, dof2) degrees of freedom."""
    fdist = stats.f(dof1, dof2)
    lower = fdist.cdf(fvalue)   # P(F <= f)
    upper = fdist.sf(fvalue)    # P(F >= f)

    # The tail used depends only on the alternative, not on whether f > 1
    if test == "upper":
        pvalue = upper
    elif test == "lower":
        pvalue = lower
    elif test == "two":
        # Two-tailed: double the smaller tail area
        pvalue = 2.0 * min(lower, upper)
    else:
        raise ValueError("test must be 'upper', 'lower' or 'two'")

    print("Pvalue = %0.3f" % pvalue)
```

```python
# (a) upper-tailed
v1 = 5
v2 = 10
f = 2.52
fpvalue(f, v1, v2, "upper")

# (b) upper-tailed and (c) two-tailed
f = 5.64
fpvalue(f, v1, v2, "upper")
fpvalue(f, v1, v2, "two")

# (d) lower-tailed
f = 3.33
fpvalue(f, v1, v2, "lower")

# (e) upper-tailed with nu1 = 40, nu2 = 20
f = 1.71
v1 = 40
v2 = 20
fpvalue(f, v1, v2, "upper")
```

```
Pvalue = 0.100
Pvalue = 0.010
Pvalue = 0.020
Pvalue = 0.950
Pvalue = 0.100
```
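The round P-values above arise because each f equals a tabulated critical value. For example, 3.33 is the ν1 = 5, ν2 = 10 value with area .05 to its right, so its lower-tail P-value is .950. Tables usually list only upper-tail values. The lower-tail critical value comes from the reciprocal rule $F_{1-\alpha,\nu_1,\nu_2} = 1/F_{\alpha,\nu_2,\nu_1}$. The sketch below checks that rule numerically with `stats.f.ppf`. The choice α = 0.05 is illustrative.

```python
import numpy as np
import scipy.stats as stats

alpha = 0.05
nu1, nu2 = 5, 10

# Upper-tail critical value F_{alpha, nu1, nu2} (area alpha to its right)
f_upper = stats.f.ppf(1 - alpha, nu1, nu2)

# Lower-tail critical value F_{1-alpha, nu1, nu2} via the reciprocal rule
f_lower = 1.0 / stats.f.ppf(1 - alpha, nu2, nu1)

# The same value read directly from the lower tail, for comparison
f_lower_direct = stats.f.ppf(alpha, nu1, nu2)

print("F upper = %.2f, F lower = %.3f (direct %.3f)" % (f_upper, f_lower, f_lower_direct))
```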


# Question 8

As the population ages, there is increasing concern about accident-related injuries to the elderly. An article reported on an experiment in which the maximum lean angle—the furthest a subject is able to lean and still recover in one step—was determined for both a sample of younger females (21-29 years) and a sample of older females (67-81 years). The following observations are consistent with summary data given in the article.

YF:	32,	29,	31,	26,	29,	36,	29,	27,	35,	26

OF:	17,	13,	21,	22,	22

Carry out a test at significance level .10 to see whether the population standard deviations for the two age groups are different (normal probability plots support the necessary normality assumption). (Give answer accurate to 2 decimal places.)

```python
yf = np.array([31, 30, 30, 28, 31, 34, 32, 32, 27, 28])
of = np.array([19, 14, 21, 13, 22])
alpha = 0.10

df1 = len(yf) - 1
df2 = len(of) - 1

s1 = yf.std(ddof=1)
s2 = of.std(ddof=1)

# H0: sigma1 = sigma2 versus Ha: sigma1 != sigma2
fvalue = s1**2 / s2**2
print("f = %.2f" % fvalue)

# Two-tailed P-value: double the smaller tail area
fdist = stats.f(df1, df2)
pvalue = 2.0 * min(fdist.cdf(fvalue), fdist.sf(fvalue))

if pvalue < alpha:
    print("Reject the null hypothesis ... P-value is less than alpha")
else:
    print("Fail to reject the null hypothesis ... P-value is greater than alpha")
```

```
f = 0.28
Fail to reject the null hypothesis ... P-value is greater than alpha
```
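The same test can be phrased with critical values. At level α, a two-tailed F test rejects H0 when f ≤ F(α/2 lower tail) or f ≥ F(α/2 upper tail), both on (ν1, ν2) = (9, 4) degrees of freedom. The sketch below uses the data from the cell above and prints the rejection region. The observed f lands just above the lower critical value, which is why the decision is close.

```python
import numpy as np
import scipy.stats as stats

# Data from the Question 8 solution
yf = np.array([31, 30, 30, 28, 31, 34, 32, 32, 27, 28])
of = np.array([19, 14, 21, 13, 22])
alpha = 0.10

f_obs = yf.var(ddof=1) / of.var(ddof=1)
df1, df2 = len(yf) - 1, len(of) - 1

# Two-tailed rejection region: f <= f_lo or f >= f_hi
f_lo = stats.f.ppf(alpha / 2, df1, df2)
f_hi = stats.f.ppf(1 - alpha / 2, df1, df2)

print("f = %.2f, rejection region: f <= %.3f or f >= %.3f" % (f_obs, f_lo, f_hi))
print("reject H0" if (f_obs <= f_lo or f_obs >= f_hi) else "fail to reject H0")
```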
