In [1]:
import numpy as np
import scipy
import pandas as pd

We first build a dataframe holding the trial data.

In [2]:
names = ['Placebo', 'Chlorpromazine', 'Dimenhydrinate', 'Pentobarbital (100 mg)', 'Pentobarbital (150 mg)']

In [3]:
df = pd.DataFrame({
    'Name': names,
    'Number of Patients': [80, 75, 85, 67, 85],
    'Incidence of Nausea': [45, 26, 52, 35, 37]
})

In [4]:
display(df)

,Name,Number of Patients,Incidence of Nausea
0,Placebo,80,45
1,Chlorpromazine,75,26
2,Dimenhydrinate,85,52
3,Pentobarbital (100 mg),67,35
4,Pentobarbital (150 mg),85,37


Part (a)
--

We test each drug against the placebo using the Wald test. In a given trial we have random variables $X_1,\ldots,X_n$, where $X_i=1$ if patient $i$ had nausea and $X_i=0$ otherwise. We assume that the $X_i$ are independent and that $X_i\sim \operatorname{Bern}(p)$ for a common parameter $p$. We estimate $p$ by the sample mean $\hat{p}$ of the $X_i$, whose standard error is estimated as $\sqrt{\hat{p}(1-\hat{p})/n}$, where $n$ is the number of patients in the trial. We likewise compute $\hat{p}_0$, the estimate for the placebo group, and the difference $\hat{\theta} = \hat{p} - \hat{p}_0$. Treating the two groups as independent, the standard error of $\hat{\theta}$ is estimated as $\widehat{\operatorname{se}} = \sqrt{\widehat{\operatorname{se}}(\hat{p})^2 + \widehat{\operatorname{se}}(\hat{p}_0)^2}$. The Wald statistic is $W = \hat{\theta}/\widehat{\operatorname{se}}$, and the two-sided p-value is $2\Phi(-|W|)$, where $\Phi$ is the standard normal CDF.
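As a compact illustration of the test just described, here is a minimal sketch of the computation for a single drug. The function name `wald_two_proportions` is hypothetical (it does not appear in the notebook), and the sketch uses `math.erfc` rather than `scipy.stats.norm` so that it runs with the standard library alone; the two agree because $2\Phi(-|w|) = \operatorname{erfc}(|w|/\sqrt{2})$.

```python
import math

def wald_two_proportions(x1, n1, x0, n0):
    """Two-sided Wald test for p1 - p0 (hypothetical helper, not from the notebook).

    x1, n1: nausea count and group size for the drug
    x0, n0: nausea count and group size for the placebo
    """
    p1, p0 = x1 / n1, x0 / n0
    theta_hat = p1 - p0
    # Standard error of the difference: root of the summed squared standard errors
    se_hat = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    W = theta_hat / se_hat
    # 2 * Phi(-|W|) written via the complementary error function
    p_value = math.erfc(abs(W) / math.sqrt(2))
    return theta_hat, se_hat, W, p_value

# Chlorpromazine (26 of 75) against the placebo (45 of 80)
theta_hat, se_hat, W, p_value = wald_two_proportions(26, 75, 45, 80)
print('W = {0:.5f}, p-value = {1:.5f}'.format(W, p_value))
```

The cells below carry out the same calculation for all four drugs with NumPy and `scipy.stats.norm`.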

In [5]:
n_0 = df.loc[0, 'Number of Patients']
p_0 = df.loc[0, 'Incidence of Nausea'] / n_0
se_hat_0 = np.sqrt(p_0*(1-p_0)/n_0)

In [6]:
print('Estimated mean for placebo: {0:.5f}'.format(p_0))

Estimated mean for placebo: 0.56250


In [7]:
from scipy.stats import norm
p_vals = np.zeros(4)

In [8]:
for i, name in enumerate(names[1:]):
    print(name)
    # Sample proportion of nausea for this drug (row 0 of df is the placebo)
    n = df.loc[i+1, 'Number of Patients']
    p = df.loc[i+1, 'Incidence of Nausea'] / n
    print('Estimated mean for {0}: {1:.5f}'.format(name, p))
    # Difference from the placebo and its estimated standard error
    theta_hat = p - p_0
    print('Estimated difference in mean from placebo: {0:.5f}'.format(theta_hat))
    se_hat_1 = np.sqrt(p*(1-p)/n)
    se_hat = np.sqrt(se_hat_1**2 + se_hat_0**2)
    # Wald statistic and two-sided p-value
    W = theta_hat / se_hat
    print('Observed Wald test statistic: {0:.5f}'.format(W))
    p_val = 2*norm.cdf(-abs(W))
    print('p-value: {0:.5f}'.format(p_val))
    print('')
    p_vals[i] = p_val

Chlorpromazine
Estimated mean for Chlorpromazine: 0.34667
Estimated difference in mean from placebo: -0.21583
Observed Wald test statistic: -2.76436
p-value: 0.00570

Dimenhydrinate
Estimated mean for Dimenhydrinate: 0.61176
Estimated difference in mean from placebo: 0.04926
Observed Wald test statistic: 0.64299
p-value: 0.52023

Pentobarbital (100 mg)
Estimated mean for Pentobarbital (100 mg): 0.52239
Estimated difference in mean from placebo: -0.04011
Observed Wald test statistic: -0.48643
p-value: 0.62666

Pentobarbital (150 mg)
Estimated mean for Pentobarbital (150 mg): 0.43529
Estimated difference in mean from placebo: -0.12721
Observed Wald test statistic: -1.64661
p-value: 0.09964



The p-value for Chlorpromazine (0.00570) is below 0.05, so at the 5% level we reject the null hypothesis that the rate of nausea is the same under Chlorpromazine and the placebo. For each of the other three drugs the p-value exceeds 0.05, so we fail to reject the null hypothesis.

Part (b)
--

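Now we adjust for multiple testing using the Bonferroni method and the FDR (Benjamini-Hochberg) method. We start with the Bonferroni method: with $m = 4$ tests at level $\alpha = 0.05$, each p-value is compared against the threshold $\alpha/m = 0.0125$.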

In [16]:
for i, name in enumerate(names[1:]):
    print(name)
    print('p-value: {0:.5f}'.format(p_vals[i]))
    print('Bonferroni method threshold: {0:.5f}'.format(0.05/4))
    print('')

Chlorpromazine
p-value: 0.00570
Bonferroni method threshold: 0.01250

Dimenhydrinate
p-value: 0.52023
Bonferroni method threshold: 0.01250

Pentobarbital (100 mg)
p-value: 0.62666
Bonferroni method threshold: 0.01250

Pentobarbital (150 mg)
p-value: 0.09964
Bonferroni method threshold: 0.01250



The only p-value smaller than the threshold specified by the Bonferroni method is for Chlorpromazine. So we reject the null hypothesis for Chlorpromazine and fail to reject it for all the other drugs.
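The Bonferroni comparison above can also be written as a single filtering step. The following is a minimal sketch; the dictionary `p_values` is a hypothetical name, and its entries are the rounded p-values copied from the printed output above rather than recomputed.

```python
alpha = 0.05

# Rounded p-values taken from the printed output above (not recomputed here)
p_values = {
    'Chlorpromazine': 0.00570,
    'Dimenhydrinate': 0.52023,
    'Pentobarbital (100 mg)': 0.62666,
    'Pentobarbital (150 mg)': 0.09964,
}

m = len(p_values)
threshold = alpha / m  # Bonferroni: split alpha evenly across the m tests

# Reject exactly those hypotheses whose p-value falls below alpha / m
rejected = [name for name, p in p_values.items() if p < threshold]
print('Threshold: {0:.5f}'.format(threshold))
print('Rejected:', rejected)
```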

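Now we try the FDR method, following the steps in the box on page 167. The array `ls` below holds the numbers $\ell_i = i\alpha/m$ defined on page 167 (here with $\alpha = 0.05$, $m = 4$, and $C_m = 1$), and `R` and `T` are defined as on that page. Note that the code computes `R` as a zero-based index into the sorted p-values, so the printed value 0 corresponds to $R = 1$ in the book's one-based notation.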

In [17]:
sorted_p_vals = sorted(p_vals)

In [18]:
ls = np.array([i * 0.05 / 4 for i in range(1,5)])

In [19]:
ls

array([0.0125, 0.025 , 0.0375, 0.05  ])

In [20]:
R = np.max(np.flatnonzero(sorted_p_vals < ls))

In [21]:
T = sorted_p_vals[R]

In [22]:
print('Value of R: {}'.format(R))
print('Value of BH rejection threshold T: {0:.5f}'.format(T))

Value of R: 0
Value of BH rejection threshold T: 0.00570


In [23]:
for i, name in enumerate(names[1:]):
    print(name)
    print('p-value: {0:.5f}'.format(p_vals[i]))
    print('FDR method threshold: {0:.5f}'.format(T))
    print('')

Chlorpromazine
p-value: 0.00570
FDR method threshold: 0.00570

Dimenhydrinate
p-value: 0.52023
FDR method threshold: 0.00570

Pentobarbital (100 mg)
p-value: 0.62666
FDR method threshold: 0.00570

Pentobarbital (150 mg)
p-value: 0.09964
FDR method threshold: 0.00570



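Thus the FDR method rejects the null hypothesis (that the rate of nausea equals the rate under the placebo) only for Chlorpromazine, whose p-value equals the threshold $T$. For the other three drugs the p-value exceeds $T$, so we fail to reject. The Bonferroni and FDR methods therefore reach the same conclusions on this data.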