# Assignment 4 Solutions

In performing a two-sample t-test, there are two distinct situations to consider:

1.  The two populations have equal variances, so the sample variances can be pooled (pooled-variance t test).
2.  The two populations have unequal variances (Welch's t test).

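For this assignment, the textbook always assumes situation 2.

In most of the questions below, the numbers used in the code cells differ from those quoted in the problem statement. The method is the same, so substitute the stated values to reproduce the answer for the version of the problem shown.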

In these instances, we calculate the standard error of the difference in means (SEM) and the approximate number of degrees of freedom (the Welch-Satterthwaite formula) as follows, rounding $df$ down to an integer:

$SEM = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

$df = \frac{ \left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}    \right)^2 }{\frac{ \left(\frac{s_1^2}{n_1}\right)^2   }{n_1-1} + \frac{ \left(\frac{s_2^2}{n_2}\right)^2   }{n_2-1}}$
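For comparison, situation 1 pools the two sample variances into a single estimate $s_p^2$. These are the formulas implemented by the `sem_eq` and `ndof_eq` helpers:

$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$

$SEM = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

$df = n_1 + n_2 - 2$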

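```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Unequal variances (Welch): standard error of the difference in means
def sem_neq(n1, n2, s1, s2):
    sm = np.sqrt(s1**2/n1 + s2**2/n2)
    return float(sm)

# Unequal variances: Welch-Satterthwaite degrees of freedom, truncated to an integer
def ndof_neq(n1, n2, s1, s2):
    v1 = s1**2/n1
    v2 = s2**2/n2
    dof = (v1 + v2)**2/(v1**2/(n1 - 1) + v2**2/(n2 - 1))
    return int(dof)

# Equal variances: pooled standard deviation times sqrt(1/n1 + 1/n2)
def sem_eq(n1, n2, s1, s2):
    sp = np.sqrt(((n1 - 1)*s1**2 + (n2 - 1)*s2**2)/(n1 + n2 - 2))
    sm = sp*np.sqrt(1.0/n1 + 1.0/n2)
    return float(sm)

# Equal variances: degrees of freedom are n1 + n2 - 2
def ndof_eq(n1, n2, s1, s2):
    dof = n1 + n2 - 2
    return int(dof)
```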
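As a quick cross-check of the hand formulas, the sketch below compares the Welch t statistic computed from the SEM formula with the one returned by `scipy.stats.ttest_ind_from_stats(..., equal_var=False)`. It borrows the Question 2 cell's summary statistics purely for illustration, and the helper names `welch_sem` and `welch_df` are illustrative. scipy keeps the fractional degrees of freedom rather than truncating them, so its p-value can differ slightly from the textbook's.

```python
import numpy as np
from scipy import stats

def welch_sem(n1, n2, s1, s2):
    # Standard error of the difference in means (unequal variances)
    return np.sqrt(s1**2 / n1 + s2**2 / n2)

def welch_df(n1, n2, s1, s2):
    # Welch-Satterthwaite approximation, before any truncation
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Illustrative summary statistics (taken from the Question 2 cell)
n1, n2, s1, s2 = 6, 5, 0.169, 0.228
xbar1, xbar2 = 23.25, 20.82

t_manual = (xbar1 - xbar2) / welch_sem(n1, n2, s1, s2)
t_scipy, p_scipy = stats.ttest_ind_from_stats(xbar1, s1, n1, xbar2, s2, n2, equal_var=False)

exact_df = welch_df(n1, n2, s1, s2)
print("t by hand %.4f, t from scipy %.4f" % (t_manual, t_scipy))
print("Welch df %.2f (textbook truncates to %d)" % (exact_df, int(exact_df)))
```

The two t values agree; only the treatment of the degrees of freedom differs.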

# Question 1

Determine the number of degrees of freedom for the two-sample t test or CI in each of the following situations. 
(Exact integer answers required.)

(a) m = 10, n = 13, s1 = 4.8, s2 = 5.7


(b) m = 14, n = 23, s1 = 5.1, s2 = 5.8


(c) m = 6, n = 7, s1 = 2.3, s2 = 6.2


(d) m = 10, n = 23, s1 = 4.1, s2 = 6.6

```python
m = 5
n = 7
s1 = 5.1
s2 = 5.5

print("1A", ndof_neq(m, n, s1, s2))
```

```
1A 9
```

```python
m = 8
n = 21
s1 = 4.9
s2 = 6

print("1B", ndof_neq(m, n, s1, s2))
```

```
1B 15
```

```python
m = 12
n = 8
s1 = 1.7
s2 = 5.8

print("1C", ndof_neq(m, n, s1, s2))
```

```
1C 7
```

```python
m = 10
n = 25
s1 = 3.8
s2 = 6.8

print("1D", ndof_neq(m, n, s1, s2))
```

```
1D 28
```
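For the values actually quoted in the Question 1 statement, a minimal sketch applies the same Welch-Satterthwaite formula to all four cases at once. It assumes, as the textbook does, that the degrees of freedom are rounded down; `welch_df_truncated` is an illustrative name that mirrors `ndof_neq`.

```python
import numpy as np

def welch_df_truncated(n1, n2, s1, s2):
    # Welch-Satterthwaite degrees of freedom, rounded down to an integer
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    dof = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return int(np.floor(dof))

# (m, n, s1, s2) as given in parts (a) to (d) of the statement
cases = {
    "a": (10, 13, 4.8, 5.7),
    "b": (14, 23, 5.1, 5.8),
    "c": (6, 7, 2.3, 6.2),
    "d": (10, 23, 4.1, 6.6),
}
answers = {label: welch_df_truncated(*args) for label, args in cases.items()}
for label, dof in answers.items():
    print("1%s" % label.upper(), dof)
```

This prints 1A 20, 1B 30, 1C 7 and 1D 26 (the unrounded values are roughly 20.8, 30.4, 7.8 and 26.8).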


# Question 2

Let μ1 and μ2 denote true average densities for two different types of brick. Assuming normality of the two density distributions, test H0: μ1 – μ2 = 0 versus Ha: μ1 – μ2 ≠ 0 using the following data: m = 6, x̄ = 22.27, s1 = 0.156, n = 5, ȳ = 20.22, and s2 = 0.234. (Use α = 0.05. Give ν to exact integer and t to 2 decimal places.)

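```python
n1 = 6
n2 = 5
s1 = 0.169
s2 = 0.228

xBar1 = 23.25
xBar2 = 20.82

alpha = 0.05

df = ndof_neq(n1, n2, s1, s2)
sm = sem_neq(n1, n2, s1, s2)

tValue = (xBar1 - xBar2) / sm

print("DF %0.0f" % df)
print("T Value %0.2f" % tValue)

tdist = stats.t(df)

# Two-sided p-value for Ha: mu1 - mu2 != 0
pValue = 2.0 * tdist.cdf(-np.abs(tValue))

print("P Value %0.3f" % pValue)

# Base the decision on the p-value rather than printing it unconditionally
if pValue < alpha:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

```
DF 7
T Value 19.74
P Value 0.000
Reject the null hypothesis
```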
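A minimal sketch of the same two-sided Welch test, assuming the inputs should be the values quoted in the Question 2 statement rather than those in the cell above:

```python
import numpy as np
from scipy import stats

# Values quoted in the Question 2 statement
n1, xbar1, s1 = 6, 22.27, 0.156
n2, xbar2, s2 = 5, 20.22, 0.234
alpha = 0.05

v1, v2 = s1**2 / n1, s2**2 / n2
dof = int((v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))  # rounded down
t_value = (xbar1 - xbar2) / np.sqrt(v1 + v2)
p_value = 2.0 * stats.t(dof).sf(abs(t_value))  # two-sided

print("DF %d" % dof)
print("T Value %0.2f" % t_value)
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

With these inputs ν = 6 and t ≈ 16.73, far beyond any reasonable critical value, so H0 is rejected.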


# Question 3

Quantitative noninvasive techniques are needed for routinely assessing symptoms of peripheral neuropathies, such as carpal tunnel syndrome (CTS). An article reported on a test that involved sensing a tiny gap in an otherwise smooth surface by probing with a finger; this functionally resembles many work-related tactile activities, such as detecting scratches or surface defects. When finger probing was not allowed, the sample average gap detection threshold for m = 8 normal subjects was 1.8 mm, and the sample standard deviation was 0.49; for n = 12 CTS subjects, the sample mean and sample standard deviation were 2.52 and 0.85, respectively. Does this data suggest that the true average gap detection threshold for CTS subjects exceeds that for normal subjects? State and test the relevant hypotheses using a significance level of .01. (Give answers accurate to 2 decimal places.)

```python
# Sample 1: normal subjects; sample 2: CTS subjects
n1 = 8
s1 = 0.47
xBar1 = 1.72
n2 = 11
s2 = 0.8
xBar2 = 2.5

alpha = 0.01

df = ndof_neq(n1, n2, s1, s2)
sm = sem_neq(n1, n2, s1, s2)

# H0: mu2 - mu1 = 0 versus Ha: mu2 - mu1 > 0 (CTS threshold exceeds normal),
# so this is an upper-tailed test on xBar2 - xBar1
tValue = (xBar2 - xBar1) / sm

print("DF %0.0f" % df)
print("T Value %0.2f" % tValue)

tdist = stats.t(df)

pValue = tdist.sf(tValue)          # one-sided (upper-tail) p-value
tCritical = tdist.ppf(1 - alpha)   # reject H0 if tValue > tCritical

print("P Value %0.4f" % pValue)
print("T Critical %0.2f" % tCritical)

if pValue < alpha:
    print("Reject the null hypothesis ... P-value is less than alpha")
else:
    print("Fail to reject the null hypothesis ... P-value is greater than alpha")
```

```
DF 16
T Value 2.66
P Value 0.0085
T Critical 2.58
Reject the null hypothesis ... P-value is less than alpha
```
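The same upper-tailed test can be sketched with the values quoted in the Question 3 statement (m = 8 normal subjects, n = 12 CTS subjects). This assumes the textbook's convention of rounding the Welch degrees of freedom down:

```python
import numpy as np
from scipy import stats

# Values quoted in the Question 3 statement
# Sample 1: normal subjects, sample 2: CTS subjects
n1, xbar1, s1 = 8, 1.80, 0.49
n2, xbar2, s2 = 12, 2.52, 0.85
alpha = 0.01

# H0: mu2 - mu1 = 0 versus Ha: mu2 - mu1 > 0 (upper-tailed)
v1, v2 = s1**2 / n1, s2**2 / n2
dof = int((v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))
t_value = (xbar2 - xbar1) / np.sqrt(v1 + v2)
p_value = stats.t(dof).sf(t_value)
t_crit = stats.t(dof).ppf(1 - alpha)

print("DF %d" % dof)
print("T Value %0.2f, T Critical %0.2f" % (t_value, t_crit))
print("P Value %0.3f" % p_value)
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Here ν = 17 and t ≈ 2.40, which falls short of the critical value of about 2.57, so with these inputs H0 is not rejected at α = 0.01.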


# Question 4

The slant shear test is widely accepted for evaluating the bond of resinous repair materials to concrete; it utilizes cylinder specimens made of two identical halves bonded at 30°. For 12 specimens prepared using wire-brushing, the sample mean shear strength (N/mm²) and sample standard deviation were 18.23 and 1.48, respectively, whereas for 12 hand-chiseled specimens, the corresponding values were 23.47 and 4.01. Does the true average strength appear to be different for the two methods of surface preparation? Test the relevant hypotheses using a significance level of .05. (Give ν to exact integer and t to 2 decimal places.)

```python
n1 = 12
n2 = 12
s1 = 1.77
s2 = 4.03

xBar1 = 18.28
xBar2 = 23.92

alpha = 0.05

print("Unequal Variances")

df = ndof_neq(n1, n2, s1, s2)
sm = sem_neq(n1, n2, s1, s2)

tValue = (xBar1 - xBar2) / sm

print("DF %0.0f" % df)
print("T Value %0.2f" % tValue)

tdist = stats.t(df)

# Two-sided p-value for Ha: mu1 - mu2 != 0
pValue = 2.0 * tdist.cdf(-np.abs(tValue))

print("P Value %0.6f" % pValue)

tLow = tdist.ppf(alpha/2)
tHigh = tdist.ppf(1 - alpha/2)
print("T Critical Values ( %0.2f, %0.2f )" % (tLow, tHigh))

if pValue < alpha:
    print("Reject the null hypothesis ... P-value is less than alpha")
else:
    print("Fail to reject the null hypothesis ... P-value is greater than alpha")

# equal_var=True requests the pooled (equal-variance) test. Because n1 == n2 the
# t statistic is identical to Welch's, but df = 22 instead of 15, so p is smaller.
# equal_var=False would give Welch's test with the fractional (untruncated) df.
print("Python Stats Package, pooled (equal variances)")
tValue, pValue = stats.ttest_ind_from_stats(xBar1, s1, n1, xBar2, s2, n2, equal_var=True)
print("T Value %0.2f" % tValue)
print("P Value %0.6f" % pValue)
```

```
Unequal Variances
DF 15
T Value -4.44
P Value 0.000478
T Critical Values ( -2.13, 2.13 )
Reject the null hypothesis ... P-value is less than alpha
Python Stats Package, pooled (equal variances)
T Value -4.44
P Value 0.000207
```
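A minimal sketch of the two-sided Welch test for the values quoted in the Question 4 statement, which also shows scipy's Welch variant (`equal_var=False`) alongside the hand calculation with truncated degrees of freedom:

```python
import numpy as np
from scipy import stats

# Values quoted in the Question 4 statement (wire-brushed vs hand-chiseled)
n1, xbar1, s1 = 12, 18.23, 1.48
n2, xbar2, s2 = 12, 23.47, 4.01
alpha = 0.05

v1, v2 = s1**2 / n1, s2**2 / n2
dof_exact = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
dof = int(dof_exact)  # textbook convention: round down
t_value = (xbar1 - xbar2) / np.sqrt(v1 + v2)
p_value = 2.0 * stats.t(dof).sf(abs(t_value))

# scipy's Welch test: same t statistic, but it uses the fractional dof_exact
t_scipy, p_scipy = stats.ttest_ind_from_stats(xbar1, s1, n1, xbar2, s2, n2, equal_var=False)

print("DF %d (exact %.2f)" % (dof, dof_exact))
print("T Value %0.2f" % t_value)
print("P Value %0.4f (scipy: %0.4f)" % (p_value, p_scipy))
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

With these inputs ν = 13 and t ≈ -4.25, so H0 is rejected at α = 0.05; scipy's p-value is slightly smaller because it uses about 13.9 degrees of freedom.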


# Question 5

Consider the accompanying data on breaking load (kg/25 mm width) for various fabrics in both an unabraded condition and an abraded condition. Use the paired t test to test Ho: μD = 0 versus Ha: μD > 0 at significance level .01. (Give answers accurate to 2 decimal places.)

```python
# Breaking loads for the same fabrics: unabraded (u) and abraded (a)
u = np.array([30.1, 55.0, 56.3, 38.7, 42.0, 48.8, 28.8, 49.8])
a = np.array([20.6, 20.0, 48.6, 34.5, 39.5, 52.5, 21.5, 46.5])

diff = u - a
mu = 0
alpha = 0.01

df = len(diff) - 1
tdist = stats.t(df)

tCritical = tdist.ppf(1 - alpha)
print("Critical T Value %0.2f" % tCritical)

# Ha: muD > 0, so request the upper-tailed p-value (alternative= needs scipy >= 1.6)
t, pVal = stats.ttest_1samp(diff, mu, alternative="greater")
print("T Value %0.2f" % t)
print("P Value %0.3f" % pVal)

if pVal < alpha:
    print("Reject the null hypothesis ... P-value is less than alpha")
else:
    print("Fail to reject the null hypothesis ... P-value is greater than alpha")
```

```
Critical T Value 3.00
T Value 2.01
P Value 0.042
Fail to reject the null hypothesis ... P-value is greater than alpha
```
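The paired t test can also be run directly on the two arrays with `scipy.stats.ttest_rel`. The sketch below (assuming scipy 1.6 or later for the `alternative` argument) checks that it matches the one-sample test on the differences:

```python
import numpy as np
from scipy import stats

u = np.array([30.1, 55.0, 56.3, 38.7, 42.0, 48.8, 28.8, 49.8])  # unabraded
a = np.array([20.6, 20.0, 48.6, 34.5, 39.5, 52.5, 21.5, 46.5])  # abraded
alpha = 0.01

# Paired test of Ha: muD > 0 with D = u - a
t_rel, p_rel = stats.ttest_rel(u, a, alternative="greater")
# Equivalent one-sample test on the differences
t_one, p_one = stats.ttest_1samp(u - a, 0.0, alternative="greater")

print("T Value %0.2f, one-sided P Value %0.3f" % (t_rel, p_rel))
print("Reject H0" if p_rel < alpha else "Fail to reject H0")
```

Both calls give t ≈ 2.01 and a one-sided p-value of about 0.042, which is above α = 0.01.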


# Question 6

Data on the modulus of elasticity obtained 1 minute after loading in a certain configuration and 4 weeks after loading for the same lumber specimens is presented here.

Calculate and interpret an upper confidence bound for the true average difference between 1-minute modulus and 4-week modulus; first check the plausibility of any necessary assumptions. (Use α = 0.05. Round your answer to the nearest whole number.)

The data for this question are stored in a local file called A4Q6.csv.

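```python
import pandas as pd

# Read the data file from the working directory; naming it 'data' keeps
# 'df' free for degrees of freedom, as in the other questions
data = pd.read_csv('A4Q6.csv')
data.head()
```

```python
diff = data.Difference
xBar = diff.mean()
sem = stats.sem(diff)
dof = len(diff) - 1

alpha = 0.05
# The upper end of a two-sided 100(1 - 2*alpha)% interval is a
# one-sided 100(1 - alpha)% upper confidence bound
cl = 1 - 2*alpha

cInterval = stats.t.interval(cl, dof, loc=xBar, scale=sem)
print("Upper confidence bound %0.0f" % cInterval[1])
print("With %0.0f%% confidence, the true mean difference is less than %0.0f"
      % (100*(1 - alpha), cInterval[1]))
```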
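The question also asks for a plausibility check of the normality assumption. A minimal sketch, assuming the differences are available as a 1-D array (for example the `Difference` column of the CSV): `check_normality` pairs a Shapiro-Wilk test with the correlation coefficient of a normal probability plot, and `upper_confidence_bound` computes the bound directly as $\bar{d} + t_{\alpha, n-1} \, s_d/\sqrt{n}$. Both function names are illustrative.

```python
import numpy as np
from scipy import stats

def upper_confidence_bound(diffs, alpha=0.05):
    # One-sided 100(1 - alpha)% upper bound for the mean difference
    diffs = np.asarray(diffs, dtype=float)
    n = diffs.size
    sem = diffs.std(ddof=1) / np.sqrt(n)
    return diffs.mean() + stats.t.ppf(1 - alpha, n - 1) * sem

def check_normality(diffs):
    # Shapiro-Wilk p-value plus the correlation of the normal probability plot;
    # a small p-value or an r well below 1 casts doubt on normality
    _, p_value = stats.shapiro(diffs)
    (_, _), (_, _, r) = stats.probplot(diffs, dist="norm")
    return p_value, r
```

Calling `upper_confidence_bound` on the differences should reproduce the upper end of the 90% two-sided interval computed above; the visual probability plot itself can be drawn with `stats.probplot(diffs, plot=plt)`.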

# Question 7

Give as much information as you can about the P-value of the F test in each of the following situations. (Give answers accurate to 3 decimal places.)

(a) ν1 = 5, ν2 = 10, upper-tailed test, f = 2.52

(b) ν1 = 5, ν2 = 10, upper-tailed test, f = 5.64 

(c) ν1 = 5, ν2 = 10, two-tailed test, f = 5.64 

(d) ν1 = 5, ν2 = 10, lower-tailed test, f = 5.64

(e) ν1 = 40, ν2 = 20, upper-tailed test, f = 3.86

```python
def fpvalue(fvalue, dof1, dof2, test):
    """P-value of an F test; test is "upper", "lower" or "two"."""
    fdist = stats.f(dof1, dof2)
    lowerTail = fdist.cdf(fvalue)   # P(F <= f)
    upperTail = fdist.sf(fvalue)    # P(F >= f)

    # The tail depends only on the type of test, not on whether f > 1
    if test == "upper":
        return upperTail
    if test == "lower":
        return lowerTail
    if test == "two":
        # Two-tailed: double the smaller tail area
        return 2.0 * min(lowerTail, upperTail)
    raise ValueError("test must be 'upper', 'lower' or 'two'")
```

```python
f = 3.33
test = "upper"
v1 = 5
v2 = 10

print("Answer A %0.3f" % fpvalue(f, v1, v2, test))
```

```
Answer A 0.050
```

```python
f = 10.48
test = "upper"
v1 = 5
v2 = 10

print("Answer B %0.3f" % fpvalue(f, v1, v2, test))
```

```
Answer B 0.001
```

```python
f = 3.33
test = "two"
v1 = 5
v2 = 10

print("Answer C %0.3f" % fpvalue(f, v1, v2, test))
```

```
Answer C 0.100
```

```python
f = 5.64
test = "lower"
v1 = 5
v2 = 10

print("Answer D %0.3f" % fpvalue(f, v1, v2, test))
```

```
Answer D 0.990
```

```python
f = 1.71
test = "upper"
v1 = 40
v2 = 20

print("Answer E %0.3f" % fpvalue(f, v1, v2, test))
```

```
Answer E 0.100
```
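A minimal sketch that evaluates all five parts with the f values quoted in the Question 7 statement; `f_pvalue` is an illustrative self-contained copy of the tail logic in `fpvalue`:

```python
from scipy import stats

def f_pvalue(f, dof1, dof2, test):
    # Choose the tail by test type; two-tailed doubles the smaller tail
    dist = stats.f(dof1, dof2)
    lower, upper = dist.cdf(f), dist.sf(f)
    if test == "upper":
        return upper
    if test == "lower":
        return lower
    if test == "two":
        return 2.0 * min(lower, upper)
    raise ValueError("test must be 'upper', 'lower' or 'two'")

# (part, f, nu1, nu2, test) as given in the statement
cases = [
    ("a", 2.52, 5, 10, "upper"),
    ("b", 5.64, 5, 10, "upper"),
    ("c", 5.64, 5, 10, "two"),
    ("d", 5.64, 5, 10, "lower"),
    ("e", 3.86, 40, 20, "upper"),
]
results = {part: f_pvalue(f, v1, v2, test) for part, f, v1, v2, test in cases}
for part, p in results.items():
    print("Answer %s %0.3f" % (part.upper(), p))
```

These f values sit at tabulated critical points, so the p-values come out close to 0.100, 0.010, 0.020, 0.990 and 0.001.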


# Question 8

As the population ages, there is increasing concern about accident-related injuries to the elderly. An article reported on an experiment in which the maximum lean angle—the furthest a subject is able to lean and still recover in one step—was determined for both a sample of younger females (21-29 years) and a sample of older females (67-81 years). The following observations are consistent with summary data given in the article.

YF: 32, 29, 31, 26, 29, 36, 29, 27, 35, 26

OF: 17, 13, 21, 22, 22

Carry out a test at significance level .10 to see whether the population standard deviations for the two age groups are different (normal probability plots support the necessary normality assumption). (Give answer accurate to 2 decimal places.)

```python
# Maximum lean angles: younger females (yf) and older females (of)
yf = np.array([32, 26, 33, 27, 27, 32, 35, 29, 29, 27])
of = np.array([14, 19, 22, 15, 16])
```

```python
n1 = len(yf)
n2 = len(of)

dof1 = n1 - 1
dof2 = n2 - 1

s1 = yf.std(ddof=1)
s2 = of.std(ddof=1)

fValue = s1 ** 2 / s2 ** 2

print("Standard deviation of sample 1 %0.1f" % s1)
print("Standard deviation of sample 2 %0.1f" % s2)
print("Variance of sample 1 %0.1f" % s1**2)
print("Variance of sample 2 %0.1f" % s2**2)
print("F statistic %0.3f" % fValue)

# The question specifies a significance level of .10
alpha = 0.10

fDist = stats.f(dof1, dof2)
fLow = fDist.ppf(alpha/2)
fHigh = fDist.ppf(1 - alpha/2)

print("Critical F values: %0.2f, %0.2f" % (fLow, fHigh))

# Two-tailed p-value: double the smaller tail area
pValue = 2.0 * min(fDist.cdf(fValue), fDist.sf(fValue))
print("P Value %0.2f" % pValue)

# Decide by comparing the p-value with alpha (not by whether F > 1)
if pValue < alpha:
    print("Reject the null hypothesis ... P-value is less than alpha")
else:
    print("Fail to reject the null hypothesis ... P-value is greater than alpha")
```

```
Standard deviation of sample 1 3.1
Standard deviation of sample 2 3.3
Variance of sample 1 9.6
Variance of sample 2 10.7
F statistic 0.894
Critical F values: 0.28, 6.00
P Value 0.81
Fail to reject the null hypothesis ... P-value is greater than alpha
```
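A minimal sketch of the same two-tailed F test using the observations listed in the Question 8 statement (assuming these are the intended data):

```python
import numpy as np
from scipy import stats

# Observations listed in the Question 8 statement
yf = np.array([32, 29, 31, 26, 29, 36, 29, 27, 35, 26])
of = np.array([17, 13, 21, 22, 22])
alpha = 0.10

f_value = yf.var(ddof=1) / of.var(ddof=1)
f_dist = stats.f(len(yf) - 1, len(of) - 1)
# Two-tailed p-value: double the smaller tail area
p_value = 2.0 * min(f_dist.cdf(f_value), f_dist.sf(f_value))

print("F statistic %0.2f" % f_value)
print("P Value %0.2f" % p_value)
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Here f ≈ 0.79 and the two-tailed p-value is about 0.70, so there is no evidence at α = 0.10 that the two population standard deviations differ.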
