# 10. Statistical Inference
Answers to all exercises from Chapter 10 of M.G. Bulmer's Principles of Statistics (1965)

---

Set up the local environment

In [1]:
import numpy as np
import pandas as pd
from scipy import stats

# Exercise 10.1
Use the data in Table 3 on p. 18 to find a 95 per cent confidence interval for the probability of throwing a six with the white die.

*Solution*

In 20,000 rolls, the white die showed a six 3,932 times. The question asks for the probability of throwing a six, not the probability of throwing 3,932 sixes. We therefore make an inference on the observed proportion of sixes, using the normal approximation to the binomial distribution.

In [2]:
# Number of observations
n_obs = 20000

# Observed proportion of successes
p_obs = 3932 / n_obs

# Standard error of the observed proportion
s_obs = np.sqrt((p_obs * (1 - p_obs)) / n_obs)

# Declare normal RV
X = stats.norm(loc=p_obs, scale=s_obs)

# Calculate 95% CI (the argument is the probability WITHIN the interval; it is
# passed positionally because newer SciPy releases renamed `alpha` to `confidence`)
X.interval(0.95)

(0.19109204017782955, 0.20210795982217045)
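The interval above is simply the observed proportion plus or minus a normal quantile times its standard error. A minimal sketch of the same calculation done by hand, assuming only the counts quoted in the solution (the names `z`, `lower` and `upper` are illustrative, not part of the original notebook):

```python
import numpy as np
from scipy import stats

n_obs = 20000
p_obs = 3932 / n_obs

# Standard error of a binomial proportion
s_obs = np.sqrt(p_obs * (1 - p_obs) / n_obs)

# A two-sided 95% interval leaves 2.5% in each tail
z = stats.norm.ppf(0.975)

lower, upper = p_obs - z * s_obs, p_obs + z * s_obs
print(lower, upper)
```

This should reproduce the limits returned by `X.interval`, which makes it easier to see where the width of the interval comes from.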

# Exercise 10.2
Use the data in Table 2 on p.13 to find 95 percent confidence limits (a) for the stillbirth rate in males, (b) for the stillbirth rate in females, (c) for the sex difference in the stillbirth rate. [In (c), estimate the variance of $p_1 - p_2$ as $\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}$.]

In [3]:
df = pd.DataFrame({'liveborn':[359881,340454],'stillborn':[8609,7796]},
                  index=['male','female'])
df

Unnamed: 0,liveborn,stillborn
male,359881,8609
female,340454,7796


(a) The number of male stillbirths can be regarded as a sample from a binomial distribution, so the distribution of the observed stillbirth rate can be approximated by a normal distribution.

In [4]:
# Number of males
n_m = df.loc['male'].sum()

# Probability of stillbirth given male
p_sm = df.loc['male', 'stillborn'] / n_m

# Standard error of the observed rate
s_sm = np.sqrt((p_sm * (1 - p_sm)) / n_m)

# Normal RV
X = stats.norm(loc=p_sm, scale=s_sm)

# Confidence interval (level passed positionally; newer SciPy no longer accepts `alpha=`)
X.interval(0.95)

(0.022875199374866848, 0.023850627648932986)

(b) Likewise, the number of female stillbirths can be regarded as a sample from a binomial distribution, so the distribution of the observed stillbirth rate can be approximated by a normal distribution.

In [5]:
# Number of females
n_f = df.loc['female'].sum()

# Probability of stillbirth given female
p_sf = df.loc['female', 'stillborn'] / n_f

# Standard error of the observed rate
s_sf = np.sqrt((p_sf * (1 - p_sf)) / n_f)

# Normal RV
X = stats.norm(loc=p_sf, scale=s_sf)

# Confidence interval (level passed positionally; newer SciPy no longer accepts `alpha=`)
X.interval(0.95)

(0.02189488311385085, 0.022877550482703353)

(c) The sex difference in stillbirth rates is the random variable $p_1 - p_2$, whose variance is estimated as suggested in the exercise.

In [6]:
# Subtract both ratios
diff_p = p_sm - p_sf

# Variance
diff_v = ((p_sm * (1 - p_sm)) / n_m) + ((p_sf * (1 - p_sf)) / n_f)

# Normal RV
X = stats.norm(loc=diff_p, scale=np.sqrt(diff_v))

# Confidence interval (level passed positionally; newer SciPy no longer accepts `alpha=`)
X.interval(0.95)

(0.00028440062890632304, 0.0016689927983393094)
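The three parts of this exercise repeat the same recipe, so it can help to see it wrapped in two small functions. The sketch below is one possible packaging, not the notebook's own code: `proportion_ci` and `proportion_diff_ci` are hypothetical helper names, and the counts are copied from the table above.

```python
import numpy as np
from scipy import stats

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation interval for a binomial proportion."""
    p = successes / n
    se = np.sqrt(p * (1 - p) / n)
    z = stats.norm.ppf(0.5 + confidence / 2)
    return p - z * se, p + z * se

def proportion_diff_ci(s1, n1, s2, n2, confidence=0.95):
    """Interval for p1 - p2, with variance p1*q1/n1 + p2*q2/n2."""
    p1, p2 = s1 / n1, s2 / n2
    se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = stats.norm.ppf(0.5 + confidence / 2)
    return (p1 - p2) - z * se, (p1 - p2) + z * se

# Totals are liveborn + stillborn for each sex
n_male, n_female = 359881 + 8609, 340454 + 7796

male_ci = proportion_ci(8609, n_male)
female_ci = proportion_ci(7796, n_female)
diff_ci = proportion_diff_ci(8609, n_male, 7796, n_female)
print(male_ci, female_ci, diff_ci)
```

Note that the interval for the difference lies entirely above zero, which suggests a higher stillbirth rate in males at this confidence level.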

# Exercise 10.3
Find a 95 per cent confidence interval for the mean of the distribution of litter size in rats given in Table 8 on p. 29. [Use the normal approximation. The observed mean and variance have been calculated on p. 46 and in Exercise 4.3 respectively.]

In [7]:
df = pd.DataFrame({'litter':range(1, 13), 'obs':[7,33,58,116,125,126,121,107,56,37,25,4]})
df

Unnamed: 0,litter,obs
0,1,7
1,2,33
2,3,58
3,4,116
4,5,125
5,6,126
6,7,121
7,8,107
8,9,56
9,10,37
10,11,25
11,12,4


In [8]:
# Number of observations
n = df['obs'].sum()

# Mean
m_obs = (df['litter'] * df['obs']).sum() / n

# Variance
v_obs = ((df['litter'] - m_obs).pow(2) * df['obs']).sum() / (n - 1)

# Declare normal random variable (remember that the variance of the mean is s^2 / n)
X = stats.norm(loc=m_obs, scale=np.sqrt(v_obs / n))

# Find confidence interval (level passed positionally; newer SciPy no longer accepts `alpha=`)
X.interval(0.95)

(5.968922110102722, 6.281384638363536)
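The weighted mean and variance above can be cross-checked by expanding the frequency table into one value per litter and using the ordinary sample statistics. This is only a sanity-check sketch; the variable names (`sample`, `se`, `z`) are illustrative.

```python
import numpy as np
from scipy import stats

litter = np.arange(1, 13)
obs = np.array([7, 33, 58, 116, 125, 126, 121, 107, 56, 37, 25, 4])

# One entry per observed litter, e.g. seven 1s, thirty-three 2s, ...
sample = np.repeat(litter, obs)

m = sample.mean()
# Standard error of the mean from the unbiased sample variance
se = sample.std(ddof=1) / np.sqrt(sample.size)

z = stats.norm.ppf(0.975)
lower, upper = m - z * se, m + z * se
print(sample.size, lower, upper)
```

With 815 litters in the sample, the normal approximation to the distribution of the mean is a reasonable choice here.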

# Exercise 10.4
In a haemocytometer (see p.93) each square has sides of length 0.05 mm and thickness of 0.01 mm. Suppose that a suspension of cells is introduced into the haemocytometer and that counts over 20 squares give a mean value of 5.2 cells per square. Find a 95 per cent confidence interval for the number of cells per c.c. in the suspension. [Use the normal approximation and estimate the variance of the distribution of cell counts from its mean.]

*Solution*

On p. 93, Student found that the number of cells per square follows a Poisson distribution. Hence the variance of the count in a single square can be estimated by its mean.

In [9]:
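# Observed mean count per square and number of squares counted
m_obs = 5.2
n_sq = 20

# Poisson assumption: the variance of a single count is estimated by the mean,
# so the variance of the mean of n_sq counts is m_obs / n_sq
v_obs = m_obs / n_sq

# Number of squares that fit in one mm3 times the number of mm3 that fit in one cm3
factor = 40000 * 1000

# Random variable: scaling the mean by `factor` scales its standard deviation by `factor`
X = stats.norm(loc=m_obs * factor, scale=np.sqrt(v_obs) * factor)

# Confidence interval, printed to four significant figures
print("({:.3e}, {:.3e})".format(*X.interval(0.95)))

(1.680e+08, 2.480e+08)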
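The conversion factor `40000 * 1000` used above follows from the dimensions given in the exercise. A short sketch of that arithmetic, with illustrative variable names that are not part of the original notebook:

```python
import math

# Dimensions of one haemocytometer square, in mm
side_mm = 0.05
depth_mm = 0.01

# Volume of liquid above one square, in mm^3
volume_mm3 = side_mm ** 2 * depth_mm

# How many such volumes fit in 1 mm^3, and in 1 c.c. (= 1000 mm^3)
squares_per_mm3 = 1 / volume_mm3
squares_per_cc = squares_per_mm3 * 1000

print(squares_per_mm3, squares_per_cc)
```

Up to floating-point rounding, this gives 40,000 squares per cubic millimetre and 40 million per c.c., which is the factor by which the per-square mean and its standard error are scaled.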

# Exercise 10.5
Use the data in Table 19 on p. 149 to find 95 per cent confidence intervals for the average number of hours of sleep gained by the use of (a) hyoscyamine, (b) hyoscine, and (c) for the superiority of the latter over the former drug.

*Solution*

Since $n=10$, we must use the $t$ distribution with $9$ degrees of freedom.

In [10]:
df = pd.DataFrame({'hyoscyamine':[.7,-1.6,-.2,-1.2,-.1,3.4,3.7,.8,0,2],
                   'hyoscine':[1.9,.8,1.1,.1,-.1,4.4,5.5,1.6,4.6,3.4]})
# Superiority of hyoscine over hyoscyamine
df['difference'] = df['hyoscine'] - df['hyoscyamine']
df

Unnamed: 0,hyoscyamine,hyoscine,difference
0,0.7,1.9,1.2
1,-1.6,0.8,2.4
2,-0.2,1.1,1.3
3,-1.2,0.1,1.3
4,-0.1,-0.1,0.0
5,3.4,4.4,1.0
6,3.7,5.5,1.8
7,0.8,1.6,0.8
8,0.0,4.6,4.6
9,2.0,3.4,1.4


95% CI for hyoscyamine

In [11]:
# Observed mean
m_obs = df['hyoscyamine'].mean()

# Estimated variance of the mean
v_obs = df['hyoscyamine'].var(ddof=1) / 10

# Distribution
t = stats.t(df=9, loc=m_obs, scale=np.sqrt(v_obs))

# Interval (level passed positionally; newer SciPy no longer accepts `alpha=`)
t.interval(0.95)

(-0.5297804134938646, 2.0297804134938646)

95% CI for hyoscine

In [12]:
# Observed mean
m_obs = df['hyoscine'].mean()

# Estimated variance of the mean
v_obs = df['hyoscine'].var(ddof=1) / 10

# Distribution
t = stats.t(df=9, loc=m_obs, scale=np.sqrt(v_obs))

# Interval (level passed positionally; newer SciPy no longer accepts `alpha=`)
t.interval(0.95)

(0.897677539412931, 3.762322460587068)

95% CI for difference

In [13]:
# Observed mean
m_obs = df['difference'].mean()

# Estimated variance of the mean
v_obs = df['difference'].var(ddof=1) / 10

# Distribution
t = stats.t(df=9, loc=m_obs, scale=np.sqrt(v_obs))

# Interval (level passed positionally; newer SciPy no longer accepts `alpha=`)
t.interval(0.95)

(0.7001142367452712, 2.459885763254729)
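The three cells above differ only in the column they use, so the calculation can be written once. The sketch below uses a hypothetical helper, `t_mean_ci`, which builds the interval directly from the $t$ quantile instead of a frozen distribution; the data are copied from the table above.

```python
import numpy as np
from scipy import stats

def t_mean_ci(sample, confidence=0.95):
    """t-based confidence interval for the mean of a small sample."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    m = sample.mean()
    se = sample.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(0.5 + confidence / 2, df=n - 1)
    return m - t_crit * se, m + t_crit * se

hyoscyamine = np.array([.7, -1.6, -.2, -1.2, -.1, 3.4, 3.7, .8, 0, 2])
hyoscine = np.array([1.9, .8, 1.1, .1, -.1, 4.4, 5.5, 1.6, 4.6, 3.4])

ci_a = t_mean_ci(hyoscyamine)
ci_b = t_mean_ci(hyoscine)
# Paired design: work with the per-patient differences
ci_c = t_mean_ci(hyoscine - hyoscyamine)
print(ci_a, ci_b, ci_c)
```

The interval for the difference is narrower than either single-drug interval because the two measurements come from the same patients, so taking differences removes much of the variation between patients.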

# Exercise 10.6
Use the data in Table 19 on p. 149 to find the 95 per cent confidence intervals for the standard deviation of hours of sleep gained by the use of hyoscine.

*Solution*

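The sum of squared deviations from the sample mean, $S^2$, divided by the population variance $\sigma^2$ follows a $\chi^2$ distribution with $n - 1$ degrees of freedom. Writing $\chi^2_0$ and $\chi^2_1$ for the lower and upper 2.5 per cent points of that distribution: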

$\text{Prob}(\chi^2_0 \leq \frac{S^2}{\sigma^2} \leq \chi^2_1) = 0.95$

Hence $\sigma^2 \leq \frac{S^2}{\chi^2_0}$ and $\sigma^2 \geq \frac{S^2}{\chi^2_1}$

Thus $\frac{S^2}{\chi^2_1} \leq \sigma^2 \leq \frac{S^2}{\chi^2_0}$

Thus $\sqrt{\frac{S^2}{\chi^2_1}} \leq \sigma \leq \sqrt{\frac{S^2}{\chi^2_0}}$

In [14]:
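# Sample size (n_obs still holds the 20,000 rolls from Exercise 10.1, so it must not be reused)
n = len(df)

# Calculate S^2, the sum of squared deviations from the mean
S2 = df['hyoscine'].var(ddof=1) * (n - 1)

# Declare distribution
X = stats.chi2(df=n - 1)

# Central 95% range of the chi-square distribution
c = X.interval(0.95)

# CI for the standard deviation, printed to three decimal places
print("({:.3f}, {:.3f})".format(np.sqrt(S2 / c[1]), np.sqrt(S2 / c[0])))

(1.377, 3.655)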
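The derivation above relies on $S^2 / \sigma^2$ following a $\chi^2$ distribution with $n - 1$ degrees of freedom, which holds for normally distributed data. A quick simulation can check that the resulting interval covers the true $\sigma$ about 95 per cent of the time. This is a sketch under assumed settings: the true standard deviation, the number of trials and the seed are arbitrary choices, not values from the book.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, sigma, trials = 10, 2.0, 20000  # assumed settings for the check

# Each row is one simulated sample of size n
samples = rng.normal(loc=0.0, scale=sigma, size=(trials, n))

# Sum of squared deviations from each sample's own mean
S2 = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

# Lower and upper 2.5% points of chi-square with n - 1 degrees of freedom
chi2_lo, chi2_hi = stats.chi2.ppf([0.025, 0.975], df=n - 1)

lower = np.sqrt(S2 / chi2_hi)
upper = np.sqrt(S2 / chi2_lo)

# Fraction of simulated intervals that contain the true sigma
coverage = np.mean((lower <= sigma) & (sigma <= upper))
print(f"coverage: {coverage:.3f}")
```

The printed coverage should be close to 0.95. For clearly non-normal data the same interval can be badly off, so the check is only as good as the normality assumption.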

# Exercise 10.7
Use the data in Table 22 on p. 210 to find 95 per cent confidence intervals (a) for the increase in comb-growth in capons receiving 4 mg androsterone, (b) for the difference in comb-growth between capons receiving 4 mg and 8 mg.