# Introduction to Sampling and Hypothesis Testing

## Hypothesis Tests | Parametric tests

Parametric tests rely on a probability distribution of known form as a model for the null hypothesis. 

Here we will look at some commonly-encountered examples.

### How surprising is my result? Calculating a p-value

There are many circumstances where we simply want to check whether an observation is compatible with the null hypothesis, $H_{0}$.

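Having decided on a significance level $\alpha$ and whether the situation warrants a one-tailed or a two-tailed test, we can use the cumulative distribution function (cdf) of the null distribution to calculate a p-value for the observation: the probability, under $H_{0}$, of a result at least as extreme as the one observed. If the p-value is below $\alpha$, we reject $H_{0}$.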
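As a rough sketch of this calculation for a binomial null distribution, the two one-tailed cases can be written with R's `pbinom`. The helper name `binom_pvalue` and its arguments are illustrative assumptions, not part of the original notes.

```r
# Hypothetical helper: one-tailed p-value for x successes in n trials
# under H0: success probability = p0.
binom_pvalue <- function(x, n, p0, tail = c("upper", "lower")) {
  tail <- match.arg(tail)
  if (tail == "upper") {
    # P(X >= x) = P(X > x - 1), hence the x - 1
    pbinom(x - 1, n, p0, lower.tail = FALSE)
  } else {
    # P(X <= x)
    pbinom(x, n, p0)
  }
}

# Illustrative numbers only: 3 successes in 10 trials with p0 = 0.5
binom_pvalue(3, 10, 0.5, tail = "lower")
binom_pvalue(3, 10, 0.5, tail = "upper")
```

When the null distribution is symmetric, a two-tailed p-value is commonly taken as twice the smaller of the two tails.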

#### Example: probability of rolling a six

Your arch-nemesis Blofeld always seems to win at ludo, and you have started to suspect him of using a loaded die.

You observe the following outcomes from 100 rolls of his die:


In [1]:
data <- c(6, 1, 5, 6, 2, 6, 4, 3, 4, 6, 1, 2, 5, 6, 6, 3, 6, 2, 6, 4, 6, 2,
       5, 4, 2, 3, 3, 6, 6, 1, 2, 5, 6, 4, 6, 2, 1, 3, 6, 5, 4, 5, 6, 3,
       6, 6, 1, 4, 6, 6, 6, 6, 6, 2, 3, 1, 6, 4, 3, 6, 2, 4, 6, 6, 6, 5,
       6, 2, 1, 6, 6, 4, 3, 6, 5, 6, 6, 2, 6, 3, 6, 6, 1, 4, 6, 4, 2, 6,
       6, 5, 2, 6, 6, 4, 3, 1, 6, 6, 5, 5)

Do you have enough evidence to confront him?

In [2]:
# We will work with the binomial distribution for the observed number of sixes

# Write down the hypotheses
# H0: p = 1/6
# H1: p > 1/6

# choose a significance level
# alpha = 0.01
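The hypotheses and significance level above already fix a rejection region before any data are examined. The following is a minimal sketch of how it could be found, assuming 100 rolls as in the example; the variable names `n_rolls`, `p0` and `k_crit` are illustrative.

```r
n_rolls <- 100    # number of rolls in the example
p0      <- 1/6    # probability of a six under H0
alpha   <- 0.01   # significance level chosen above

# qbinom returns the smallest q with P(X <= q) >= 1 - alpha,
# so P(X >= q + 1) <= alpha: reject H0 if at least k_crit sixes are seen
k_crit <- qbinom(1 - alpha, n_rolls, p0) + 1
k_crit

# actual size of the one-tailed test (at most alpha, because X is discrete)
pbinom(k_crit - 1, n_rolls, p0, lower.tail = FALSE)
```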

In [3]:
# code the data as 6=success and {1-5}=failure
six <- data==6
print(six)

# how many sixes were observed?
x <- sum(six)
x

# check number of trials
n <- length(data)
n

  [1]  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE
 [13] FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE
 [25] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE
 [37] FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE
 [49]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE
 [61] FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE
 [73] FALSE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE
 [85]  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE
 [97]  TRUE  TRUE FALSE FALSE


In [4]:
# now use H0 to find the p-value of the observed number of sixes: P(X >= x)
# pbinom(k, ..., lower.tail = FALSE) gives P(X > k), so use k = (observed value - 1)
pval <- pbinom(x - 1, n, 1/6, lower.tail = FALSE)
pval

In [5]:
# pval is less than alpha = 0.01, so we reject H0 in favour of H1: the die appears to be biased towards six.
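As a cross-check, R's built-in `binom.test` carries out the same exact one-sided test. The sketch below assumes the counts found above (43 sixes in 100 rolls) and restates them so that it runs on its own.

```r
x <- 43    # observed number of sixes (sum(data == 6) in the example)
n <- 100   # number of rolls

# exact one-sided binomial test of H0: p = 1/6 against H1: p > 1/6
res <- binom.test(x, n, p = 1/6, alternative = "greater")
res$p.value

# the same tail probability computed directly, P(X >= x)
pbinom(x - 1, n, 1/6, lower.tail = FALSE)
```

Using `lower.tail = FALSE` rather than `1 - pbinom(...)` avoids losing precision when the tail probability is very small, as it is here.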

#### Example: is the coin fair?

Dr Vogel has challenged you to a game of ludo and you have agreed to flip a coin to decide who starts.

You're not sure whether the coin she is using is fair or not.

She flips it 50 times for you, with the following results (1 = heads, 0 = tails):


In [6]:
data <- c(1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1)

Does the coin appear to be fair? If not, should you pick heads or tails?

In [7]:
# Again, this is a binomial distribution for the observed number of heads, but this time the test is 2-tailed

# Write down the hypotheses
# H0: p = 1/2
# H1: p != 1/2

# choose a significance level
# alpha = 0.05

In [8]:
# find the number of heads
h <- sum(data)
h

# check number of trials
n <- length(data)
n

In [9]:
# now use H0 to find the p-value of the observed number of heads
ex <- n * 0.5 # the expected number of heads under H0
ex

In [10]:
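# h = 20 is below the expected value, so the observation lies in the lower tail
x1 <- h
p1 <- pbinom(x1, n, 0.5)  # P(X <= 20)
pval <- 2 * p1 # double the one-tailed p-value: the null distribution is symmetric
print(pval)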

[1] 0.2026388


In [11]:
# pval is greater than alpha, so we fail to reject H0: there is insufficient evidence at the 5% level that the coin is biased.
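Doubling the lower tail works here because the null distribution with $p = 1/2$ is symmetric about its mean. A small sketch that adds up both tails explicitly and compares the result with R's built-in `binom.test`, assuming the counts found above (20 heads in 50 flips):

```r
h  <- 20        # observed number of heads
n  <- 50        # number of flips
ex <- n * 0.5   # expected number of heads under H0

# lower tail: P(X <= 20); mirrored upper tail: P(X >= 30)
p_lower <- pbinom(h, n, 0.5)
p_upper <- pbinom(2 * ex - h - 1, n, 0.5, lower.tail = FALSE)
p_lower + p_upper   # equals 2 * p_lower by symmetry

# cross-check with the built-in exact test
binom.test(h, n, p = 0.5, alternative = "two.sided")$p.value
```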

### Difference between two means: independent 2-sample t-test

We use the **t-test** to assess whether two independent samples taken from normal distributions have significantly different means.

The test statistic follows a Student's t-distribution, provided that the variances of the two groups are equal.

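Other variants of the t-test apply under different conditions, for example Welch's t-test when the variances are unequal and the paired t-test when the samples are not independent.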

The test statistic is

$$ t = \frac{\bar{X}_{1} - \bar{X}_{2}}{s_p \cdot \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}} $$

where

$$ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} $$

is the pooled standard deviation, an estimate of the standard deviation common to both groups.

Under the null hypothesis of equal means, the statistic follows a Student's t-distribution with $(n_{1} + n_{2} - 2)$ degrees of freedom.
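The two formulas translate directly into R. The helper below is a sketch only: the name `pooled_t` and the sample values `a` and `b` are made up for illustration and do not come from the notes.

```r
# Hypothetical helper implementing the pooled two-sample t statistic
pooled_t <- function(x1, x2) {
  n1 <- length(x1)
  n2 <- length(x2)
  # pooled standard deviation s_p
  sp <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
  # test statistic t
  t  <- (mean(x1) - mean(x2)) / (sp * sqrt(1 / n1 + 1 / n2))
  list(t = t, df = n1 + n2 - 2, sp = sp)
}

# Made-up numbers, for illustration only
a <- c(5.1, 4.9, 5.6, 5.8, 5.2)
b <- c(4.4, 4.7, 5.0, 4.2)
pooled_t(a, b)
```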

#### Example: difference in birth weight

The birth weights of babies (in kg) have been measured for a sample of mothers split into two categories: nonsmoking and heavy smoking.

- The two categories are measured independently of each other.
- Both samples are assumed to come from normal distributions.
- The two groups are assumed to have the same unknown variance.




In [12]:
data_nonsmoking <- c(3.99, 3.79, 3.60, 3.73, 3.21, 3.60, 4.08, 3.61, 3.83, 3.31, 4.13, 3.26, 3.54)
data_heavysmoking <- c(3.18, 2.84, 2.90, 3.27, 3.85, 3.52, 3.23, 2.76, 3.60, 3.75, 3.59, 3.63, 2.38, 2.34, 2.44)


We want to know whether there is a significant difference in mean birth weight between the two categories.


In [13]:
# Write down the hypotheses
# H0: there is no difference in mean birth weight between the groups: d = mu_ns - mu_hs = 0
# H1: there is a difference: d != 0

# choose a significance level
# alpha = 0.05

In [14]:

n_ns <- length(data_nonsmoking)
n_hs <- length(data_heavysmoking)

mean_ns <- mean(data_nonsmoking)
mean_hs <- mean(data_heavysmoking)

s_ns <- sd(data_nonsmoking)
s_hs <- sd(data_heavysmoking)

paste("non-smoking: n =", n_ns, ", mean =", mean_ns, ", SD =",s_ns)
paste("heavy smoking: n =", n_hs, ", mean =", mean_hs, ", SD =",s_hs)

In [15]:
# difference between the two sample means:
d_obs <- mean_ns - mean_hs
d_obs

In [16]:
# the pooled standard deviation
sp <- sqrt(((n_ns - 1)*s_ns^2 + (n_hs - 1)*s_hs^2)/(n_ns + n_hs - 2))
sp

In [17]:
# the test statistic
t_obs <- d_obs/(sp * sqrt(1/n_ns + 1/n_hs))
t_obs

In [18]:
# degrees of freedom is given by n1 + n2 - 2
df <- n_ns + n_hs - 2
df

In [19]:
# find the two-tailed critical value at alpha = 0.05
t95 <- qt(1-0.05/2,df) # the central 95% of the t-distribution lies in [-t95, t95]
t95

In [20]:
# |t_obs| > t95, i.e. t_obs lies outside the acceptance region [-t95, t95], so we reject H0 at the 5% level
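The manual calculation can be cross-checked with R's built-in `t.test`, which also reports a p-value. A minimal sketch, restating the data from the example so that it runs on its own:

```r
data_nonsmoking <- c(3.99, 3.79, 3.60, 3.73, 3.21, 3.60, 4.08, 3.61, 3.83,
                     3.31, 4.13, 3.26, 3.54)
data_heavysmoking <- c(3.18, 2.84, 2.90, 3.27, 3.85, 3.52, 3.23, 2.76, 3.60,
                       3.75, 3.59, 3.63, 2.38, 2.34, 2.44)

# pooled-variance (Student) two-sample t-test; var.equal = TRUE matters,
# because t.test defaults to Welch's unequal-variance version
res <- t.test(data_nonsmoking, data_heavysmoking, var.equal = TRUE)
res$statistic   # t
res$parameter   # degrees of freedom
res$p.value     # two-tailed p-value

# the same p-value from the cdf of the t-distribution
2 * pt(-abs(unname(res$statistic)), df = unname(res$parameter))
```

Comparing the p-value with alpha is equivalent to comparing `t_obs` with the critical value, so both routes should lead to the same decision.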