# Chapter 9. Tests of Significance

Solutions to all exercises in Chapter 9 (Tests of Significance) of *Principles of Statistics* (M. G. Bulmer, 1965).


---

Set up the local environment

In [1]:
import numpy as np
import pandas as pd
from scipy import stats

Define a helper function that reports whether the null hypothesis is rejected at significance level $\alpha$, i.e. whether the p-value is at most $\alpha$.

In [2]:
def test(alpha=0.05, p_value=0):
    if p_value <= alpha:
        print('Reject null hypothesis with a significance level of', str(alpha))
    else:
        print('Unable to reject the null hypothesis with a significance level of', str(alpha))

# Exercise 9.1
In one of his experiments Mendel observed 705 plants with purple flowers and 224 plants with white flowers in plants bred from a purple-flowered $\times$ white-flowered hybrid. Test the hypothesis that the probability of a purple-flowered plant is $\frac{3}{4}$.

*Solution*

There is no prior reason to expect $p$ to be either greater or less than $\frac{3}{4}$, so we use the binomial distribution to perform a two-tailed test.

$H_0$: $p=\frac{3}{4}$

In [3]:
# stats.binom_test has been removed from recent SciPy releases; binomtest replaces it
test(alpha=0.05, p_value=stats.binomtest(k=705, n=705 + 224, p=0.75).pvalue)

Unable to reject the null hypothesis with a significance level of 0.05
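As a cross-check, the same hypothesis can be tested with the normal approximation to the binomial: under $H_0$ the number of purple-flowered plants has mean $np = 696.75$ and variance $npq = 174.1875$. The sketch below computes the resulting $z$ statistic and a two-sided p-value; it is an approximation, not a replacement for the exact test above.

```python
import numpy as np
from scipy import stats

# Mendel's counts and the hypothesised probability of a purple flower
x, n, p = 705, 705 + 224, 0.75

# Normal approximation to the binomial: mean np, variance npq
mean = n * p
sd = np.sqrt(n * p * (1 - p))
z = (x - mean) / sd

# Two-sided p-value from the standard normal distribution
p_normal = 2 * stats.norm.sf(abs(z))
print(round(z, 3), round(p_normal, 3))
```

The deviation of 8.25 plants is well under one standard deviation, so the approximation agrees with the exact test.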


# Exercise 9.2
200 women are each given a sample of butter and a sample of margarine and asked to identify the butter; 120 of them do so correctly. Can women tell butter from margarine?

*Solution*

If women were unable to tell butter from margarine, each would pick the butter with probability 0.5, so the number of correct identifications would follow a binomial distribution with $n = 200$ and $p = 0.5$. We test this hypothesis with a two-tailed binomial test.

$H_0$: Women cannot tell butter from margarine ($p=0.5$)

In [4]:
test(alpha=0.05, p_value=stats.binomtest(k=120, n=200, p=0.5, alternative='two-sided').pvalue)

Reject null hypothesis with a significance level of 0.05
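The question asks whether women *can* tell butter from margarine, so a one-tailed alternative ($p > 0.5$) is also defensible. A minimal sketch using the `alternative` argument of `scipy.stats.binomtest`: because the binomial distribution with $p = 0.5$ is symmetric, the one-tailed p-value is half the two-tailed one, and the conclusion at the 5% level is unchanged.

```python
from scipy import stats

# Two-tailed and one-tailed (H_1: p > 0.5) exact binomial tests
two_sided = stats.binomtest(k=120, n=200, p=0.5, alternative='two-sided').pvalue
greater = stats.binomtest(k=120, n=200, p=0.5, alternative='greater').pvalue

print(two_sided, greater)
```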


# Exercise 9.3
In a similar test among 200 men, 108 identify the butter correctly; is there a sex difference in taste discrimination?

*Solution*

In Exercises 9.1 and 9.2 we compared an observed count with its expected value under a hypothesised probability. Here we compare two observed proportions with each other.

If there is no sex difference in taste discrimination, then the theoretical proportion of women who identify the butter correctly ($P_w$) should equal that of men ($P_m$). In other words, we are testing the hypothesis that $P_w = P_m = P$.

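If $X$ is a binomial random variable with $n$ trials and probability $P$, the observed proportion is obtained by dividing it by the number of trials:

$Y = \frac{X}{n} \implies E(Y) = P, \quad V(Y) = \frac{PQ}{n}, \quad Q = 1 - P$

Since the two samples are independent, under $H_0$ the difference $p_w - p_m$ is approximately normally distributed with mean 0 and variance $PQ(\frac{1}{n_w} + \frac{1}{n_m})$; dividing it by its standard deviation gives an approximately standard normal statistic.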

We don't know the true value of $P$, so we estimate it with the pooled proportion $\frac{x_w + x_m}{n_w + n_m}$, where $x_w$ and $x_m$ are the numbers of correct identifications.
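The variance formula can be checked by simulation. A minimal sketch, assuming for illustration a common $P = 0.57$ (the pooled estimate) and $n_w = n_m = 200$: it draws many pairs of binomial proportions under $H_0$ and compares the empirical variance of their difference with $PQ(\frac{1}{n_w} + \frac{1}{n_m})$. The seed and the number of replications are arbitrary choices.

```python
import numpy as np

# Illustrative values: common P under H_0 and the two sample sizes
rng = np.random.default_rng(seed=0)
P, n_w, n_m, reps = 0.57, 200, 200, 200_000

# Simulate the two observed proportions with the same P for women and men
p_w_sim = rng.binomial(n=n_w, p=P, size=reps) / n_w
p_m_sim = rng.binomial(n=n_m, p=P, size=reps) / n_m
diff = p_w_sim - p_m_sim

# Theoretical variance of the difference
theory_var = P * (1 - P) * (1 / n_w + 1 / n_m)

print(diff.mean(), diff.var(), theory_var)
```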

In [5]:
# Observed proportions
p_w = 120 / 200
p_m = 108 / 200

# Global proportion
p = (120 + 108) / (200 + 200)

# Standard error of p_w - p_m under H_0, using the pooled proportion
sigma = np.sqrt(p * (1 - p) * ((1 / 200) + (1 / 200)))

# Statistic
s = (p_w - p_m) / sigma
s

1.2119357448701642

In [6]:
# Two-tailed p-value (sf = 1 - cdf; abs() keeps this valid if s is negative)
p_value = 2 * stats.norm.sf(abs(s))
test(alpha=0.05, p_value=p_value)

Unable to reject the null hypothesis with a significance level of 0.05
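The same comparison can be framed as a $\chi^2$ test on the $2 \times 2$ table of correct and incorrect identifications by sex. Without Yates' continuity correction, Pearson's $\chi^2$ statistic for a $2 \times 2$ table equals the square of the pooled two-proportion $z$ statistic, and the two p-values coincide. A short check with `scipy.stats.chi2_contingency`:

```python
import numpy as np
from scipy import stats

# Rows: women, men; columns: correct, incorrect identifications
table = np.array([[120, 80],
                  [108, 92]])

# Pearson chi-square without Yates' continuity correction
chi2, p_chi2, dof, expected = stats.chi2_contingency(table, correction=False)

# Pooled two-proportion z statistic, as computed above
p_pool = table[:, 0].sum() / table.sum()
se = np.sqrt(p_pool * (1 - p_pool) * (1 / 200 + 1 / 200))
z = (120 / 200 - 108 / 200) / se
p_z = 2 * stats.norm.sf(abs(z))

print(chi2, z ** 2, p_chi2, p_z)
```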


# Exercise 9.4
To test whether it is of advantage to kiln-dry barley before sowing, eleven varieties of barley were sown, each both kiln-dried and not kiln-dried. Test whether there is any advantage in kiln-drying.

The yields, in lb. head corn per acre, are given below:

In [7]:
df = pd.DataFrame({'dried':[2009,1915,2011,2463,2180,1925,2122,1482,1542,1443,1535],
                   'not_dried':[1903,1935,1910,2496,2108,1961,2060,1444,1612,1316,1511]})
df

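|    | dried | not_dried |
|---:|------:|----------:|
| 0 | 2009 | 1903 |
| 1 | 1915 | 1935 |
| 2 | 2011 | 1910 |
| 3 | 2463 | 2496 |
| 4 | 2180 | 2108 |
| 5 | 1925 | 1961 |
| 6 | 2122 | 2060 |
| 7 | 1482 | 1444 |
| 8 | 1542 | 1612 |
| 9 | 1443 | 1316 |
| 10 | 1535 | 1511 |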


$H_0: \mu_{\text{dried}} = \mu_{\text{not dried}}$

We use a two-tailed $t$ test, since it is possible that kiln-drying reduces the yield. The two yields in each row come from the same variety of barley, so the samples are paired, *not* independent: we test whether the mean of the within-variety differences is zero.

In [8]:
# New column with difference
df['diff'] = df['dried'] - df['not_dried']

# Use ttest_1samp since samples are not independent
t_test = stats.ttest_1samp(df['diff'], popmean=0)

# Test using p-value
test(alpha=0.05, p_value=t_test[1])

Unable to reject the null hypothesis with a significance level of 0.05
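`scipy.stats.ttest_rel` performs the paired test directly and gives the same result as a one-sample test on the differences. For contrast, the sketch also runs `ttest_ind`, which wrongly treats the two columns as independent samples here: the large differences in yield between varieties inflate its standard error and give a much larger p-value, which is why the pairing matters.

```python
import pandas as pd
from scipy import stats

df = pd.DataFrame({'dried': [2009, 1915, 2011, 2463, 2180, 1925, 2122, 1482, 1542, 1443, 1535],
                   'not_dried': [1903, 1935, 1910, 2496, 2108, 1961, 2060, 1444, 1612, 1316, 1511]})

# Paired t test and the equivalent one-sample test on the differences
paired = stats.ttest_rel(df['dried'], df['not_dried'])
one_sample = stats.ttest_1samp(df['dried'] - df['not_dried'], popmean=0)

# For contrast only: a two-sample test that ignores the pairing
unpaired = stats.ttest_ind(df['dried'], df['not_dried'])

print(paired.statistic, paired.pvalue)
print(unpaired.statistic, unpaired.pvalue)
```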


# Exercise 9.5
Use the data in Table 22 on p. 210 to test (a) whether there is an increase in comb-growth in capons receiving $\frac{1}{2}$ mg androsterone, (b) whether there is an increase in capons receiving 4 mg, (c) whether there is any difference between capons receiving 4 mg and 8 mg.


In [9]:
df = pd.DataFrame({'half':[8,1,1,3,1], 'four':[17,14,14,19,13],'eight':[17,17,20,18,15]})
df

|   | half | four | eight |
|---:|---:|---:|---:|
| 0 | 8 | 17 | 17 |
| 1 | 1 | 14 | 17 |
| 2 | 1 | 14 | 20 |
| 3 | 3 | 19 | 18 |
| 4 | 1 | 13 | 15 |


Parts (a) and (b) call for one-tailed tests, but `scipy.stats.ttest_1samp` reports a two-sided p-value by default. We therefore take its $t$ statistic and compute the upper-tail probability from the $t$ distribution with $n - 1$ degrees of freedom.
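Recent SciPy versions also accept an `alternative` argument in `ttest_1samp`, which returns the one-tailed p-value directly. A minimal check on the $\frac{1}{2}$ mg data, assuming a SciPy version that supports this argument:

```python
import numpy as np
from scipy import stats

half = np.array([8, 1, 1, 3, 1])

# One-tailed p-value computed by hand from the t distribution
T = stats.ttest_1samp(half, popmean=0).statistic
p_manual = stats.t(df=len(half) - 1).sf(T)

# Same test through the `alternative` argument
p_builtin = stats.ttest_1samp(half, popmean=0, alternative='greater').pvalue

print(T, p_manual, p_builtin)
```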

(a) $H_0: \mu_{0.5mg} = 0$, $H_1: \mu_{0.5mg} > 0$

In [10]:
# Declare statistic
T = stats.ttest_1samp(df['half'], popmean=0)[0]

# Declare p-value
p = 1 - stats.t(df=len(df) - 1, loc=0, scale=1).cdf(T)

# Test H_0
test(alpha=0.05, p_value=p)

Unable to reject the null hypothesis with a significance level of 0.05


(b) $H_0: \mu_{4mg} = 0, H_1: \mu_{4mg} > 0$

In [11]:
# Declare statistic
T = stats.ttest_1samp(df['four'], popmean=0)[0]

# Declare p-value
p = 1 - stats.t(df=len(df) - 1, loc=0, scale=1).cdf(T)

# Test H_0
test(alpha=0.05, p_value=p)

Reject null hypothesis with a significance level of 0.05


(c) $H_0: \mu_{4mg} = \mu_{8mg}, H_1: \mu_{4mg} \neq \mu_{8mg}$

The two groups of capons are independent, so we use a two-sample $t$ test, assuming that both samples come from normal distributions with a common variance but possibly different means.


In [12]:
# Declare t_test
t_test = stats.ttest_ind(a=df['four'], b=df['eight'], equal_var=True)

# Test H_0
test(alpha=0.05, p_value=t_test[1])

Unable to reject the null hypothesis with a significance level of 0.05
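The pooled-variance $t$ statistic computed by `ttest_ind(..., equal_var=True)` can be written out by hand: the common variance is estimated as $s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$, and $t = (\bar{x}_1 - \bar{x}_2)/\sqrt{s^2(\frac{1}{n_1} + \frac{1}{n_2})}$ has $n_1 + n_2 - 2$ degrees of freedom. A sketch that checks the hand calculation against SciPy:

```python
import numpy as np
from scipy import stats

four = np.array([17, 14, 14, 19, 13])
eight = np.array([17, 17, 20, 18, 15])
n1, n2 = len(four), len(eight)

# Pooled estimate of the common variance
s2 = ((n1 - 1) * four.var(ddof=1) + (n2 - 1) * eight.var(ddof=1)) / (n1 + n2 - 2)

# t statistic for the difference in means and its two-sided p-value
t_stat = (four.mean() - eight.mean()) / np.sqrt(s2 * (1 / n1 + 1 / n2))
p_value = 2 * stats.t(df=n1 + n2 - 2).sf(abs(t_stat))

# Reference result from SciPy
res = stats.ttest_ind(four, eight, equal_var=True)
print(s2, t_stat, p_value)
```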


# Exercise 9.6
For Weldon's data in Exercise 6.4, (a) test whether $P = \frac{1}{2}$ by comparing the total number of successes with its Expected value, (b) test whether the data follow a binomial distribution with $P$ equal to the observed proportion of successes, $p$.

---

Weldon threw 12 dice 4096 times, a throw of 4, 5 or 6 being called a success, and obtained the following results ($X$ is the number of successes in one throw of 12 dice):

In [13]:
df = pd.DataFrame({'X':range(13),
                   'obs_freq':[0,7,60,198,430,731,948,847,536,257,71,11,0]})
df

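|   | X | obs_freq |
|---:|---:|---:|
| 0 | 0 | 0 |
| 1 | 1 | 7 |
| 2 | 2 | 60 |
| 3 | 3 | 198 |
| 4 | 4 | 430 |
| 5 | 5 | 731 |
| 6 | 6 | 948 |
| 7 | 7 | 847 |
| 8 | 8 | 536 |
| 9 | 9 | 257 |
| 10 | 10 | 71 |
| 11 | 11 | 11 |
| 12 | 12 | 0 |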


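(a) $H_0: P = \frac{1}{2}$, where $P$ is the probability of a success on a single die

Under $H_0$, the number of successes in one throw is $X \sim B(12, \frac{1}{2})$, so the total number of successes in 4096 throws is approximately normally distributed with mean $4096 \times E(X)$ and variance $4096 \times V(X)$. Hence the observed minus the expected total, divided by $\sqrt{4096 \times V(X)}$, is approximately standard normal.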

In [14]:
# Declare random variable under H_0
X = stats.binom(n=12, p=0.5)

# Total number of successes in 4096 repetitions
obs = (df['X'] * df['obs_freq']).sum()

# Expected number of successes in 4096 repetitions
exp = X.stats('m').item() * 4096

# Variance in 4096 repetitions
var = X.stats('v').item() * 4096

# Calculate statistic
z = (obs - exp) / np.sqrt(var)

# Calculate p-value
p_value = (1 - stats.norm(loc=0, scale=1).cdf(z)) * 2

# Test
test(alpha=0.05, p_value=p_value)

Reject null hypothesis with a significance level of 0.05
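For reference, the arithmetic behind this test. The observed total number of successes is $\sum x f = 25145$, and under $H_0$ a single throw of 12 dice has $E(X) = 12 \times \frac{1}{2} = 6$ and $V(X) = 12 \times \frac{1}{2} \times \frac{1}{2} = 3$, so

$$
z = \frac{25145 - 4096 \times 6}{\sqrt{4096 \times 3}} = \frac{25145 - 24576}{\sqrt{12288}} = \frac{569}{110.85} \approx 5.13
$$

which lies far in the upper tail of the standard normal distribution: the dice show more successes than $P = \frac{1}{2}$ predicts.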


(b) $H_0$: the number of successes per throw follows a binomial distribution with $n = 12$ and $P$ equal to the observed proportion of successes, $p$. We test the goodness of fit with a $\chi^2$ statistic.

In [15]:
# Calculate observed proportion of successes
p_obs = obs / 4096 / 12

# Delcare binomial distribution with P = p_obs
X = stats.binom(n=12, p=p_obs)

# Declare expected number of successes in 4096 repetitions
df['exp_freq'] = X.pmf(df['X']) * 4096

# Chi-square test
chi_test = stats.chisquare(f_exp=df['exp_freq'], f_obs=df['obs_freq'])

# Test H_0
test(alpha=0.05, p_value=chi_test[1])

Unable to reject the null hypothesis with a significance level of 0.05


# Exercise 9.7
Test the goodness of fit of the Poisson distribution to the data: (a) in Table 13 on p. 92, (b) in Table 15 on p. 96, (c) in Exercise 6.6. In calculating $\chi^2$ remember that no class should have an Expected value less than 5.
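The pooling rule can be written as a small helper. The function `pool_small_classes` below is a hypothetical sketch, not part of SciPy or pandas: it assumes the classes are ordered counts whose small expected frequencies occur only in the tails, and folds each tail class into its neighbour until every expected frequency is at least 5. The cells below pool the classes by hand instead.

```python
import numpy as np


def pool_small_classes(obs, exp, min_exp=5.0):
    """Merge tail classes into their neighbours until every expected count >= min_exp.

    Hypothetical helper: assumes the classes are ordered (0, 1, 2, ...) and that
    small expected frequencies occur only in the lower and upper tails.
    """
    obs = [float(o) for o in obs]
    exp = [float(e) for e in exp]

    # Lower tail: fold the first class into the second while it is too small
    while len(exp) > 1 and exp[0] < min_exp:
        exp[1] += exp[0]
        obs[1] += obs[0]
        del exp[0], obs[0]

    # Upper tail: fold the last class into the one before it while it is too small
    while len(exp) > 1 and exp[-1] < min_exp:
        exp[-2] += exp[-1]
        obs[-2] += obs[-1]
        del exp[-1], obs[-1]

    return np.array(obs), np.array(exp)


# Example: the horse-kick frequencies from Table 13
o, e = pool_small_classes([144, 91, 32, 11, 2, 0], [139, 97, 34, 8, 1, 0])
print(o, e)
```

With these frequencies the helper reproduces the grouping used in part (a): 0, 1, 2 and "3 or more" deaths.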

(a) $H_0:$ Deaths from horse kicks follow a Poisson distribution with $\mu = 0.7$

In [16]:
df = pd.DataFrame({'deaths':range(6), 'obs':[144,91,32,11,2,0], 'exp':[139,97,34,8,1,0]})
df

|   | deaths | obs | exp |
|---:|---:|---:|---:|
| 0 | 0 | 144 | 139 |
| 1 | 1 | 91 | 97 |
| 2 | 2 | 32 | 34 |
| 3 | 3 | 11 | 8 |
| 4 | 4 | 2 | 1 |
| 5 | 5 | 0 | 0 |


In [17]:
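# Recompute expected frequencies from the Poisson pmf: the tabulated values are
# rounded and sum to 279, not 280, and recent SciPy versions raise an error in
# chisquare when the observed and expected totals disagree
n = df['obs'].sum()
X = stats.poisson(mu=0.7)
df['exp'] = X.pmf(df['deaths']) * n

# Combine groups such that df['exp'] >= 5; the last class is "3 or more deaths"
df = pd.concat([df[df['deaths'] <= 2],
                df[df['deaths'] > 2].sum().to_frame().transpose()])
df.iloc[-1, df.columns.get_loc('exp')] = X.sf(2) * n

# Chi-square test with ddof = 1 because mu was estimated from the data
chi_test = stats.chisquare(f_obs=df['obs'], f_exp=df['exp'], ddof=1)

# Test H_0
test(alpha=0.05, p_value=chi_test[1])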

Unable to reject the null hypothesis with a significance level of 0.05
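To make the degrees of freedom explicit: with $k$ pooled classes, one degree of freedom is lost because the expected frequencies must sum to the total, and one more because $\mu$ was estimated from the data, leaving $k - 2$. `stats.chisquare` starts from $k - 1$ and subtracts `ddof`. A minimal sketch reproducing the test by hand, with expected frequencies from the Poisson pmf and the last class taken as the upper tail $P(X \ge 3)$:

```python
import numpy as np
from scipy import stats

# Pooled horse-kick classes: 0, 1, 2 and "3 or more" deaths
obs = np.array([144, 91, 32, 13])
n = obs.sum()

# Expected frequencies under Poisson(0.7); the last class is the tail P(X >= 3)
X = stats.poisson(mu=0.7)
exp = n * np.append(X.pmf([0, 1, 2]), X.sf(2))

# Pearson statistic with k - 1 - 1 degrees of freedom (1 estimated parameter)
chi2_stat = ((obs - exp) ** 2 / exp).sum()
dof = len(obs) - 1 - 1
p_manual = stats.chi2.sf(chi2_stat, df=dof)

# SciPy equivalent: ddof=1 is subtracted from the default k - 1
res = stats.chisquare(f_obs=obs, f_exp=exp, ddof=1)
print(chi2_stat, p_manual)
```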


(b) $H_0:$ The number of accidents follows a Poisson distribution with $\mu = 0.47$

In [18]:
df = pd.DataFrame({'accidents':range(7), 'obs':[447,132,42,21,3,2,0], 'exp':[406,189,44,7,1,0,0]})
df

|   | accidents | obs | exp |
|---:|---:|---:|---:|
| 0 | 0 | 447 | 406 |
| 1 | 1 | 132 | 189 |
| 2 | 2 | 42 | 44 |
| 3 | 3 | 21 | 7 |
| 4 | 4 | 3 | 1 |
| 5 | 5 | 2 | 0 |
| 6 | 6 | 0 | 0 |


In [19]:
# Combine groups such that df['exp'] is always >= 5 (3 or more accidents)
df = pd.concat([df[df['accidents'] <= 2],
                df[df['accidents'] > 2].sum().to_frame().transpose()])

# Chi-square test (ddof = 1 because mu was estimated from data)
chi_test = stats.chisquare(f_obs=df['obs'], f_exp=df['exp'], ddof=1)

# Test H_0
test(alpha=0.05, p_value=chi_test[1])

Reject null hypothesis with a significance level of 0.05
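To see where the Poisson model fails, it helps to look at each class's contribution $(O - E)^2/E$ to $\chi^2$. A short sketch using the pooled observed and tabulated expected frequencies from the table above (0, 1, 2 and "3 or more" accidents):

```python
import numpy as np
from scipy import stats

# Pooled classes: 0, 1, 2 and "3 or more" accidents (tabulated expected values)
obs = np.array([447, 132, 42, 26])
exp = np.array([406, 189, 44, 8])

# Contribution of each class to the chi-square statistic
contrib = (obs - exp) ** 2 / exp
chi2_stat = contrib.sum()

# Two degrees of freedom: 4 classes - 1 - 1 estimated parameter
p_value = stats.chi2.sf(chi2_stat, df=2)
print(contrib.round(2), round(chi2_stat, 2), p_value)
```

Most of the statistic comes from the "3 or more" class (26 observed against 8 expected), with a further large share from the 1-accident class, where there are fewer cases than the Poisson model predicts.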


(c) $H_0:$ Yeast cell counts follow a Poisson distribution, with $\mu$ estimated by the sample mean ($529/400 = 1.3225$)

In [20]:
df = pd.DataFrame({'cells':range(7), 'obs':[103,143,98,42,8,4,2]})
df

|   | cells | obs |
|---:|---:|---:|
| 0 | 0 | 103 |
| 1 | 1 | 143 |
| 2 | 2 | 98 |
| 3 | 3 | 42 |
| 4 | 4 | 8 |
| 5 | 5 | 4 |
| 6 | 6 | 2 |


In [21]:
# Calculate mean
mu = (df['cells'] * df['obs']).sum() / df['obs'].sum()

# Fit Poisson
X = stats.poisson(mu=mu)
df['exp'] = X.pmf(df['cells']) * df['obs'].sum()
df

|   | cells | obs | exp |
|---:|---:|---:|---:|
| 0 | 0 | 103 | 106.587319 |
| 1 | 1 | 143 | 140.96173 |
| 2 | 2 | 98 | 93.210944 |
| 3 | 3 | 42 | 41.090491 |
| 4 | 4 | 8 | 13.585544 |
| 5 | 5 | 4 | 3.593376 |
| 6 | 6 | 2 | 0.79204 |


In [22]:
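# Group classes together such that df['exp'] always >= 5 (4 or more cells)
n = df['obs'].sum()
df = pd.concat([df[df['cells'] <= 3],
                df[df['cells'] > 3].sum().to_frame().transpose()])

# The last class is "4 or more cells", so use the upper tail P(X >= 4); this also
# makes the expected total equal the observed total, as recent SciPy requires
df.iloc[-1, df.columns.get_loc('exp')] = X.sf(3) * n

# Chi-square test (ddof = 1 because mu was estimated from data)
chi_test = stats.chisquare(f_obs=df['obs'], f_exp=df['exp'], ddof=1)

# Test H_0
test(alpha=0.05, p_value=chi_test[1])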

Unable to reject the null hypothesis with a significance level of 0.05
