<H1> Multiple Testing

We'll begin by recreating the examples from this morning's lecture:

<H2> Coin Toss Experiments

In [38]:
set.seed(231)
x=sample(c("H","T"),10,replace=TRUE,prob=c(1/3,2/3)) # Create a sample of size 10, with probability of "H" = 1/3
                                                     # and probability of "T" = 2/3. Clearly, this is a biased coin.
x

In [39]:
# Test for bias

binom.test(sum(x=='T'), n=length(x), p = 0.5)


	Exact binomial test

data:  sum(x == "T") and length(x)
number of successes = 9, number of trials = 10, p-value = 0.02148
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.5549839 0.9974714
sample estimates:
probability of success 
                   0.9 


Since the p-value (0.021) is below 0.05, the test rejects the null hypothesis of a fair coin and concludes that the coin is biased. Now we toss two coins:

In [46]:
set.seed(231)
x1=sample(c("H","T"),10,replace=TRUE,prob=c(1/3,2/3))
x2=sample(c("H","T"),10,replace=TRUE,prob=c(1/3,2/3))
x1
x2

In [47]:
# Test for bias
binom.test(sum(x1=='T'), n=length(x1), p = 0.5)


	Exact binomial test

data:  sum(x1 == "T") and length(x1)
number of successes = 9, number of trials = 10, p-value = 0.02148
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.5549839 0.9974714
sample estimates:
probability of success 
                   0.9 


In [48]:
binom.test(sum(x2=='T'), n=length(x2), p = 0.5)


	Exact binomial test

data:  sum(x2 == "T") and length(x2)
number of successes = 9, number of trials = 10, p-value = 0.02148
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.5549839 0.9974714
sample estimates:
probability of success 
                   0.9 


In [43]:
?binom.test


binom.test {stats}    R Documentation

Arguments

x             number of successes, or a vector of length 2 giving the numbers of successes and failures, respectively.
n             number of trials; ignored if x has length 2.
p             hypothesized probability of success.
alternative   indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter.
conf.level    confidence level for the returned confidence interval.

Value

statistic     the number of successes.
parameter     the number of trials.
p.value       the p-value of the test.
conf.int      a confidence interval for the probability of success.
estimate      the estimated probability of success.
null.value    the probability of success under the null, p.
alternative   a character string describing the alternative hypothesis.
method        the character string "Exact binomial test".
data.name     a character string giving the names of the data.
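
The returned components documented in this help page can be extracted from the result with `$`. The following is a small sketch, not part of the lecture code: the names `result` and `p.greater` are introduced here for illustration, and the counts (9 tails in 10 tosses) are taken from the example above.

```r
# 9 tails in 10 tosses, as in the single-coin example
result <- binom.test(9, n = 10, p = 0.5)

result$p.value    # two-sided p-value
result$conf.int   # 95% confidence interval for the probability of success
result$estimate   # observed proportion of successes

# One-sided alternative: is the probability of tails greater than 0.5?
p.greater <- binom.test(9, n = 10, p = 0.5, alternative = "greater")$p.value
p.greater
```

The loop further down relies on the same `$p.value` component to decide whether each test rejects.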


In [49]:
# Suppose we toss 100 fair coins
count.reject=0
for (i in 1:100){
    x2=sample(c("H","T"),10,replace=TRUE,prob=c(1/2,1/2))
    result=binom.test(sum(x2=='T'), n=length(x2), p = 0.5)
    print(result$p.value)
    if (result$p.value<.05) {count.reject=count.reject+1}
    }
count.reject

[1] 0.34375
[1] 1
[1] 0.109375
[1] 1
[1] 1
[1] 0.109375
[1] 0.34375
[1] 0.34375
[1] 1
[1] 0.34375
[1] 0.7539063
[1] 0.7539063
[1] 0.34375
[1] 1
[1] 0.02148438
[1] 0.34375
[1] 0.34375
[1] 0.7539063
[1] 0.7539063
[1] 0.7539063
[1] 0.7539063
[1] 0.34375
[1] 0.7539063
[1] 0.7539063
[1] 1
[1] 0.34375
[1] 0.7539063
[1] 0.7539063
[1] 0.34375
[1] 0.7539063
[1] 1
[1] 0.34375
[1] 0.34375
[1] 1
[1] 1
[1] 0.7539063
[1] 0.7539063
[1] 0.109375
[1] 1
[1] 0.34375
[1] 0.7539063
[1] 1
[1] 0.34375
[1] 1
[1] 0.34375
[1] 0.34375
[1] 0.7539063
[1] 0.109375
[1] 1
[1] 0.109375
[1] 1
[1] 1
[1] 1
[1] 0.7539063
[1] 0.34375
[1] 1
[1] 0.109375
[1] 0.7539063
[1] 0.7539063
[1] 1
[1] 0.7539063
[1] 0.109375
[1] 0.34375
[1] 0.7539063
[1] 1
[1] 0.7539063
[1] 0.34375
[1] 0.109375
[1] 0.34375
[1] 0.34375
[1] 0.109375
[1] 0.7539063
[1] 0.7539063
[1] 0.7539063
[1] 0.7539063
[1] 1
[1] 1
[1] 0.34375
[1] 0.7539063
[1] 1
[1] 1
[1] 1
[1] 1
[1] 0.7539063
[1] 0.34375
[1] 0.02148438
[1] 0.109375
[1] 0.34375
[1] 0.7539063
[1] 0.3437

We can think of the number of tosses as the 'sample size' and the number of coins tossed as the number of hypotheses tested. Here, the sample size is small compared to the number of hypotheses tested. In genomic data, the sample size is the number of patients and the number of hypotheses tested is the number of genes (or SNPs) analyzed. Setting a significance level of $0.05$ means that, when the null hypothesis is true, each test has at most a 5% chance of producing a false positive. So in our experiment with 100 fair coins, we would expect to wrongly conclude that a coin is biased up to about 5 times out of 100.
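
The arithmetic behind this expectation can be sketched in R. This is a minimal illustration rather than the lecture's own code: the names `alpha`, `m`, and `n.toss` are introduced here, and the tests are assumed to be independent. It computes the expected number of false positives at the nominal level, the chance of at least one false positive among all the tests, and the actual rejection rate of the exact test with 10 tosses.

```r
alpha  <- 0.05   # nominal significance level
m      <- 100    # number of hypotheses (coins); assumed independent
n.toss <- 10     # tosses per coin

# Expected number of false positives if each test rejected with probability alpha
expected.fp <- m * alpha

# Probability of at least one false positive among m independent tests
p.any.fp <- 1 - (1 - alpha)^m

# Actual rejection rate of the exact test for a fair coin:
# add up the probabilities of every outcome whose p-value falls below alpha
k <- 0:n.toss
p.vals <- sapply(k, function(s) binom.test(s, n.toss, p = 0.5)$p.value)
actual.size <- sum(dbinom(k[p.vals < alpha], n.toss, 0.5))

expected.fp
p.any.fp
actual.size
m * actual.size
```

Because the binomial distribution is discrete, the test with 10 tosses rejects only when 0, 1, 9, or 10 tails occur, so its actual rejection rate for a fair coin is 22/1024 (about 0.0215), below the nominal 0.05. This is why "5 out of 100" is an upper bound here rather than an exact expectation.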

In [45]:
# What happens if we increase the number of tosses?
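
One way to explore this question is sketched below. It is an illustration under stated assumptions, not the lecture's own solution: the helper `count.rejections` and the choice of 1000 tosses are introduced here, and `rbinom` is used to draw the number of tails directly instead of sampling individual "H"/"T" outcomes.

```r
# Hypothetical helper: toss n.coins coins n.toss times each and count how many
# exact binomial tests reject the null of a fair coin at level alpha
count.rejections <- function(n.coins, n.toss, prob.tail, alpha = 0.05) {
    tails <- rbinom(n.coins, size = n.toss, prob = prob.tail)  # tails per coin
    p.vals <- sapply(tails, function(s) binom.test(s, n.toss, p = 0.5)$p.value)
    sum(p.vals < alpha)
}

set.seed(231)

# Fair coins: every rejection is a false positive
fair.10   <- count.rejections(100, 10,   prob.tail = 1/2)
fair.1000 <- count.rejections(100, 1000, prob.tail = 1/2)

# Biased coins with P(T) = 2/3: every rejection is a correct detection
biased.10   <- count.rejections(100, 10,   prob.tail = 2/3)
biased.1000 <- count.rejections(100, 1000, prob.tail = 2/3)

c(fair.10 = fair.10, fair.1000 = fair.1000)
c(biased.10 = biased.10, biased.1000 = biased.1000)
```

For fair coins, the number of rejections should stay at around 5 per 100 or fewer on average however many tosses are used, since the per-test false positive rate is capped by the significance level. For biased coins, more tosses give the test more power, so many more of them should be detected.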
