# Problem Set 11, Part One: Due Thursday, April 24 by 8am Eastern Daylight Time

## Name: David Millard

**Show your work on all problems!** Be sure to give credit to any
collaborators, or outside sources used in solving the problems. Note
that if using an outside source to do a calculation, you should use it
as a reference for the method, and actually carry out the calculation
yourself; it’s not sufficient to quote the results of a calculation
contained in an outside source.

Fill in your solutions in the notebook below, inserting markdown and/or code cells as needed.  Try to do reasonably well with the typesetting, but don't feel compelled to replicate my formatting exactly.  **You do NOT need to make random variables blue!**

In [1]:
%matplotlib inline

In [2]:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (8.0,5.0)
plt.rcParams['font.size'] = 14

## Conover Problems on Cochran’s $Q$ Test

#### Exercise 4.6.1:

The relative effectiveness of two different sales techniques was tested
on 12 volunteer housewives. Each housewife was exposed to each sales
technique and asked to buy a certain product, the same product in all
cases. At the end of each exposure, each housewife rated the technique
with a 1 if she felt she would have agreed to buy the product and a 0 if
she probably would not have bought the product.

| Housewife     | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| ------------- | - | - | - | - | - | - | - | - | - | -- | -- | -- |
| *Technique 1* | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 |
| *Technique 2* | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |

**(a)** Use Cochran’s Test.

$H_0$: $p_1 = p_2$, where $p_j$ is the probability that a housewife rates technique $j$ with a 1

$H_1$: $p_1 \neq p_2$

$Q = 4.0$

$p$-value = $0.0455$

Since $0.0455 < \alpha = 0.05$, we reject $H_0$ at the 5% level.
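Working Cochran's statistic by hand gives the same value as the code cell that follows. The notation $C_j$ (column total) and $N$ (grand total) follows Conover's row/column-total convention and is assumed here. The tallies are $C_1 = 8$ and $C_2 = 4$, with $N = 12$. Only housewives 1, 4, 5 and 10 have $R_i = 1$, so they are the only rows with $R_i(c - R_i) \neq 0$, and each of them contributes 1:

\begin{align}
Q &= \frac{c(c-1)\sum_{j=1}^c \left(C_j - \frac{N}{c}\right)^2}{\sum_{i=1}^r R_i(c-R_i)} \\
&= \frac{2\cdot 1\cdot\left[(8-6)^2 + (4-6)^2\right]}{4} = \frac{16}{4} = 4.0
\end{align}

and $p = P(\chi^2_1 \ge 4.0) \approx 0.0455$.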

In [4]:
X_ij = np.array([
    [1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1]
]).T

r,c = X_ij.shape; r, c

(12, 2)

In [5]:
r_i = X_ij.sum(axis=-1)
c_j = X_ij.sum(axis=0)
N = np.sum(X_ij)

In [7]:
Q = c*(c-1)*np.sum((c_j-N/c)**2)/np.sum(r_i*(c-r_i))

p = stats.chi2(df=c-1).sf(Q)

print(Q, p)

4.0 0.04550026389635857
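The same three steps reappear in Exercise 4.6.2, so they could be wrapped in a small reusable helper. This is a sketch along those lines. The name `cochran_q` and the check for the degenerate case where every block is all 0s or all 1s (which makes the denominator vanish) are hypothetical additions. The $p$-value uses the same large-sample $\chi^2_{c-1}$ approximation as the cell above.

```python
import numpy as np
from scipy import stats


def cochran_q(X):
    """Cochran's Q for an r x c array of 0/1 responses (rows = blocks, columns = treatments).

    Returns (Q, p), with p from the large-sample chi-square approximation on c - 1 df.
    """
    X = np.asarray(X)
    r, c = X.shape
    R = X.sum(axis=1)   # row totals R_i
    C = X.sum(axis=0)   # column totals C_j
    N = X.sum()         # grand total
    denom = np.sum(R * (c - R))
    if denom == 0:
        raise ValueError("every block is all 0s or all 1s; Q is undefined")
    Q = c * (c - 1) * np.sum((C - N / c) ** 2) / denom
    return Q, stats.chi2(df=c - 1).sf(Q)


# Exercise 4.6.1 data: rows are housewives, columns are techniques 1 and 2
X = np.array([
    [1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1],
]).T
Q, p = cochran_q(X)
print(Q, p)
```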


**(b)** Rearrange the data and use McNemar’s test in the large sample form
suggested by Equation 3.5.1.

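$H_0$: $p_1 = p_2$

$H_1$: $p_1 \neq p_2$

Rearranged as a $2 \times 2$ table of paired responses (one count per housewife):

|                 | Technique 2 = 0 | Technique 2 = 1 |
| --------------- | :-------------: | :-------------: |
| Technique 1 = 0 | $a = 4$         | $b = 0$         |
| Technique 1 = 1 | $c = 4$         | $d = 4$         |

$T = \frac{(b-c)^2}{b+c} = \frac{(0-4)^2}{0+4} = 4.0$

$p$-value = $0.0455$

Since $0.0455 < \alpha = 0.05$, we reject $H_0$.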

In [17]:
myX_ij = X_ij
myr,myc = myX_ij.shape

myr_i = myX_ij.sum(axis=-1)
myc_j = myX_ij.sum(axis=0)
myN=np.sum(myX_ij)

In [18]:
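# Cell counts of McNemar's 2x2 table: (technique 1 response, technique 2 response)
thisa = np.sum((1-myX_ij[:,0]) * (1-myX_ij[:,1]))  # (0, 0)
thisb = np.sum((1-myX_ij[:,0]) * myX_ij[:,1])      # (0, 1)
thisc = np.sum(myX_ij[:,0] * (1-myX_ij[:,1]))      # (1, 0)
thisd = np.sum(myX_ij[:,0] * myX_ij[:,1])          # (1, 1)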

In [19]:
thisT = (thisb-thisc)**2/(thisb+thisc)  # McNemar statistic (b - c)^2 / (b + c)

p = stats.chi2(df=1).sf(thisT)  # large-sample chi-square approximation, 1 df

print(thisT, p)

4.0 0.04550026389635857
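The McNemar statistic reproduces Cochran's $Q$ exactly, and this holds in general for two treatments. With $c = 2$, the numerator of $Q$ reduces to $(C_1 - C_2)^2$, where $C_j$ are the column totals; this is the squared difference of the two discordant counts. The term $R_i(2 - R_i)$ equals 1 for a discordant housewife ($R_i = 1$) and 0 otherwise, so the denominator is simply the number of discordant pairs. The sketch below checks this identity numerically on random $0/1$ data. The helper names `cochran_q_stat` and `mcnemar_stat`, the seed, and the $12 \times 2$ shape are arbitrary choices for illustration.

```python
import numpy as np


def cochran_q_stat(X):
    """Cochran's Q statistic for an r x c array of 0/1 responses."""
    c = X.shape[1]
    R = X.sum(axis=1)   # row totals
    C = X.sum(axis=0)   # column totals
    N = X.sum()         # grand total
    return c * (c - 1) * np.sum((C - N / c) ** 2) / np.sum(R * (c - R))


def mcnemar_stat(X):
    """McNemar's (b - c)^2 / (b + c) for an r x 2 array of 0/1 responses."""
    b = np.sum((X[:, 0] == 0) & (X[:, 1] == 1))  # 0 on treatment 1, 1 on treatment 2
    c = np.sum((X[:, 0] == 1) & (X[:, 1] == 0))  # 1 on treatment 1, 0 on treatment 2
    return (b - c) ** 2 / (b + c)


# Exercise 4.6.1 data (rows = housewives, columns = techniques)
housewives = np.array([
    [1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1],
]).T
print(cochran_q_stat(housewives), mcnemar_stat(housewives))

# Random 0/1 data: the two statistics should agree whenever b + c > 0
rng = np.random.default_rng(0)
max_diff = 0.0
for _ in range(1000):
    X = rng.integers(0, 2, size=(12, 2))
    if np.all(X[:, 0] == X[:, 1]):
        continue  # no discordant pairs: both statistics are 0/0
    max_diff = max(max_diff, abs(cochran_q_stat(X) - mcnemar_stat(X)))
print(max_diff)
```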


**(c)** Ignore the blocking effect in this experiment and treat the data as if
24 different housewives were used. Analyze the data using the test for
differences in probabilities given in Section 4.1. Compare with
Cochran’s test and discuss.

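$H_0$: $p_1 = p_2$

$H_1$: $p_1 \neq p_2$

$T = 2.667$

$p$-value = $0.1025$

Since $0.1025 > 0.05$, we fail to reject $H_0$.

Ignoring the blocking changes the conclusion. Cochran's test gave $p = 0.0455$, but this test gives $p = 0.1025$. Each housewife's two ratings are strongly associated: 8 of the 12 housewives gave both techniques the same rating. Cochran's test (equivalently McNemar's) bases its decision only on the 4 housewives whose ratings differed, and all 4 of them preferred technique 1. Treating the 24 ratings as independent discards this pairing. As a result, the same 8-versus-4 split in the totals is judged against a larger variance and is no longer significant at the 5% level.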

In [20]:
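myO1_j = myc_j          # observed number of 1 ratings for each technique
myE1j = myN/myc         # expected number of 1 ratings under H0
myO2_j = myr - myc_j    # observed number of 0 ratings for each technique
myE2j = myr-myN/myc     # expected number of 0 ratings under H0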

print(myO1_j, myE1j, myO2_j, myE2j)

[8 4] 6.0 [4 8] 6.0


In [21]:
# Chi-square statistic summed over both rows (1s and 0s) of the 2 x c table
myT = np.sum((myO1_j-myE1j)**2/myE1j) + np.sum((myO2_j-myE2j)**2/myE2j)

p = stats.chi2(df=myc-1).sf(myT)

print(myT, p)

2.6666666666666665 0.10247043485974942
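As a cross-check on the hand-rolled statistic, SciPy's `scipy.stats.chi2_contingency` can be applied to the $2 \times 2$ table of counts (technique by response). It should reproduce $T$, provided Yates' continuity correction is turned off with `correction=False`. SciPy applies that correction by default when the table has 1 degree of freedom.

```python
import numpy as np
from scipy import stats

# Rows: technique 1, technique 2; columns: number of 1 ratings, number of 0 ratings.
# The 24 ratings are treated as if they came from 24 different housewives.
table = np.array([[8, 4],
                  [4, 8]])

chi2_stat, p, dof, expected = stats.chi2_contingency(table, correction=False)
print(chi2_stat, p, dof)
print(expected)
```

With the default `correction=True`, the reported statistic would be the Yates-corrected one. It would therefore no longer match the Section 4.1 statistic computed above.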


#### Exercise 4.6.2:

On a ship, 12 groups with three sailors in each group were chosen in a
random manner, where the sailors in each group did similar work and were
in the same division aboard the ship. In a random manner the sailors in
each group were given treatment 1, 2, or 3, no two sailors from the same
group receiving the same treatment. Treatment 1 was a “flu shot”,
treatment 2 was a “flu pill”, and treatment 3 was a promise of 2 weeks
extra leave if they did not catch the flu. As each sailor reported to
sick bay with the flu, a report to the experimenter was made. At the end
of the winter, these were the results.

| Group | Sailors with the Flu (by Treatment Number) |
| ----: | :----------------------------------------- |
|    1  | 2                                          |
|    2  | 1, 2                                       |
|    3  | 1, 2, 3                                    |
|    4  | 2, 3                                       |
|    5  | 2                                          |
|    6  | None                                       |
|    7  | 1, 2                                       |
|    8  | 1, 2                                       |
|    9  | 1                                          |
|   10  | 2                                          |
|   11  | 1, 2, 3                                    |
|   12  | 2                                          |

Do these results indicate significant difference between the various
treatments? [compute a $p$-value]

$H_0$: $p_1 = p_2 = p_3$

$H_1$: At least one $p_i$ differs.

$Q = 8.222$

$p$-value = $0.0164$

Since $0.0164 < \alpha = 0.05$, we reject $H_0$.
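Working the statistic by hand from the table, with $C_j$ and $N$ denoting the column totals and grand total in Conover's notation (assumed here), gives $C_1 = 6$, $C_2 = 10$, $C_3 = 3$ and $N = 19$. Groups 3, 6 and 11 have $R_i = 3$ or $R_i = 0$, so they contribute nothing to the denominator. Each of the other nine groups has $R_i = 1$ or $R_i = 2$ and contributes $R_i(3 - R_i) = 2$:

\begin{align}
Q &= \frac{3\cdot 2\left[\left(6-\frac{19}{3}\right)^2 + \left(10-\frac{19}{3}\right)^2 + \left(3-\frac{19}{3}\right)^2\right]}{9\cdot 2} \\
&= \frac{6\cdot\frac{1 + 121 + 100}{9}}{18} = \frac{148}{18} \approx 8.222
\end{align}

Since a $\chi^2_2$ variable has survival function $e^{-x/2}$, the $p$-value is $e^{-8.222/2} \approx 0.0164$.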

In [23]:
X_ij = np.array([
    [0, 1, 0], 
    [1, 1, 0], 
    [1, 1, 1], 
    [0, 1, 1], 
    [0, 1, 0], 
    [0, 0, 0], 
    [1, 1, 0], 
    [1, 1, 0], 
    [1, 0, 0], 
    [0, 1, 0], 
    [1, 1, 1], 
    [0, 1, 0], 
])
r,c = X_ij.shape; r, c

(12, 3)

In [24]:
r_i = X_ij.sum(axis=-1)
c_j = X_ij.sum(axis=0)
N = np.sum(X_ij)

In [25]:
Q = c*(c-1)*np.sum((c_j-N/c)**2)/np.sum(r_i*(c-r_i))

p = stats.chi2(df=c-1).sf(Q)

print(Q, p)

8.222222222222221 0.016389553790213608


#### Problem:

If we define $K^{(i)}_j$ to be the rank within block $i$ of the response
$X_{ij}$ to treatment $j$ (using $K^{(i)}_j$ rather than $R_{ij}$ to
avoid notational confusion with the row sums in this chapter), the
Friedman test statistic, adjusted for ties, can be written, in the
present notation,
$$T_1 = \frac{\sum_{j=1}^c \left(K_j-\frac{r(c+1)}{2}\right)^2}
  {\frac{1}{c-1}\sum_{i=1}^r\sum_{j=1}^c \left(K^{(i)}_j-\frac{c+1}{2}\right)^2}$$
where $K_j=\sum_{i=1}^r K^{(i)}_j$ is the rank-sum in column $j$.
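The tie-adjusted $T_1$ above can be computed directly from within-block midranks. This is a minimal sketch assuming SciPy's `scipy.stats.rankdata`, whose default `method='average'` assigns midranks to ties; its `axis` argument needs SciPy 1.4 or later. The function name `friedman_t1` is hypothetical. On the Exercise 4.6.2 data the result should agree with the Cochran $Q = 8.222$ found there, which is what part (c) asks us to show. The $p$-value uses the large-sample $\chi^2_{c-1}$ approximation.

```python
import numpy as np
from scipy import stats


def friedman_t1(X):
    """Tie-adjusted Friedman statistic T_1 in the form written above.

    X is an r x c array (rows = blocks, columns = treatments).
    """
    X = np.asarray(X)
    r, c = X.shape
    K = stats.rankdata(X, axis=1)           # midranks K_j^(i) within each block
    K_j = K.sum(axis=0)                     # rank sums by column
    num = np.sum((K_j - r * (c + 1) / 2) ** 2)
    den = np.sum((K - (c + 1) / 2) ** 2) / (c - 1)
    return num / den


# Exercise 4.6.2 data (rows = groups, columns = treatments 1, 2, 3)
X = np.array([
    [0, 1, 0], [1, 1, 0], [1, 1, 1], [0, 1, 1], [0, 1, 0], [0, 0, 0],
    [1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 1, 0], [1, 1, 1], [0, 1, 0],
])
T1 = friedman_t1(X)
print(T1, stats.chi2(df=X.shape[1] - 1).sf(T1))
```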

**(a)** Work out expressions for the ranks $K^{(i)}(0)$ and $K^{(i)}(1)$ of $0$
and $1$ responses within block $i$, which will depend on the row sum
$R_i$, which is the total number of $1$ observations in row $i$.

\begin{align}
K^{(i)}(0) &= \frac{c - R_i + 1}{2} \\
K^{(i)}(1) &= \frac{2c - R_i + 1}{2}
\end{align}
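A quick numerical check of these two expressions covers every possible $0/1$ row with $c = 5$ entries; the value 5 is an arbitrary choice. The midranks produced by `scipy.stats.rankdata` (default `method='average'`) should match $(c - R_i + 1)/2$ for the zeros and $(2c - R_i + 1)/2$ for the ones.

```python
import itertools

import numpy as np
from scipy import stats

c = 5  # number of treatments; arbitrary choice for the check
mismatches = 0
for row in itertools.product([0, 1], repeat=c):
    x = np.array(row)
    R = x.sum()                  # number of 1 responses in the block
    ranks = stats.rankdata(x)    # midranks, ties averaged
    expected = np.where(x == 0, (c - R + 1) / 2, (2 * c - R + 1) / 2)
    if not np.allclose(ranks, expected):
        mismatches += 1

print(mismatches)
```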

**(b)** Use the expression $K^{(i)}_j=(1-X_{ij})K^{(i)}(0)+X_{ij}K^{(i)}(1)$ to
work out the value of $K_j$.

\begin{align}
K_j &= \sum_{i=1}^r K^{(i)}_j \\
&= \sum_{i=1}^r \left[ (1 - X_{ij})K^{(i)}(0) + X_{ij}K^{(i)}(1) \right] \\
&= \sum_{i=1}^r (1 - X_{ij})K^{(i)}(0) + \sum_{i=1}^r X_{ij}K^{(i)}(1)
\end{align}
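The last line can be simplified further. Write $C_j = \sum_{i=1}^r X_{ij}$ for the column total and $N = \sum_{i=1}^r R_i$ for the grand total; this is Conover's notation, assumed here since it is not defined above. Part (a) gives $K^{(i)}(1) - K^{(i)}(0) = \frac{c}{2}$. One way to finish is then:

\begin{align}
K_j &= \sum_{i=1}^r K^{(i)}(0) + \sum_{i=1}^r X_{ij}\left[K^{(i)}(1) - K^{(i)}(0)\right] \\
&= \frac{1}{2}\sum_{i=1}^r (c - R_i + 1) + \frac{c}{2}\sum_{i=1}^r X_{ij} \\
&= \frac{r(c+1) - N}{2} + \frac{c}{2}\,C_j
\end{align}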


**(c)** Show that the Friedman test statistic $T_1$ is equal to the Cochran $Q$
statistic, and that therefore the Cochran test is equivalent to the
Friedman test with ties applied to the $0$ and $1$ data.

\begin{align}
T_1 &= \frac{\sum_{j=1}^c \left( K_j - \frac{r(c+1)}{2} \right)^2}
{\frac{1}{c-1} \sum_{i=1}^r \sum_{j=1}^c \left( K^{(i)}_j - \frac{c+1}{2} \right)^2} \\
&= \frac{\sum_{j=1}^c \left( \frac{c}{2} \sum_{i=1}^r X_{ij} - \left[ \frac{r(c+1)}{2} - \frac{1}{2} \sum_{i=1}^r (c - R_i + 1) \right] \right)^2}
{\frac{1}{c-1} \sum_{i=1}^r \sum_{j=1}^c \left( K^{(i)}_j - \frac{c+1}{2} \right)^2}
\end{align}
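Here is one way to finish the argument, again writing $C_j = \sum_{i=1}^r X_{ij}$ and $N = \sum_{i=1}^r R_i$ (assumed notation).

In the numerator, the bracketed term is $\frac{r(c+1)}{2} - \frac{r(c+1) - N}{2} = \frac{N}{2}$. Each summand is therefore $\frac{c}{2}\left(C_j - \frac{N}{c}\right)$.

In the denominator, block $i$ has $c - R_i$ zeros and $R_i$ ones. Each zero deviates from $\frac{c+1}{2}$ by $-\frac{R_i}{2}$, and each one deviates by $\frac{c - R_i}{2}$. The block's contribution is therefore $(c-R_i)\frac{R_i^2}{4} + R_i\frac{(c-R_i)^2}{4} = \frac{c\,R_i(c-R_i)}{4}$. Hence

\begin{align}
T_1 &= \frac{\frac{c^2}{4}\sum_{j=1}^c \left(C_j - \frac{N}{c}\right)^2}{\frac{1}{c-1}\sum_{i=1}^r \frac{c\,R_i(c - R_i)}{4}} \\
&= \frac{c(c-1)\sum_{j=1}^c \left(C_j - \frac{N}{c}\right)^2}{\sum_{i=1}^r R_i(c - R_i)} = Q,
\end{align}

which is Cochran's $Q$ in the form used in the code cells above. The two statistics are identical and both are referred to $\chi^2_{c-1}$. Cochran's test is therefore the Friedman test with ties applied to the $0/1$ data.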

