# Further Hypothesis Testing

In [1]:
# Select this cell and type Ctrl-Enter to execute the code below.

library(tidyverse)

set_plot_dimensions <- function(width_choice, height_choice) {
    options(repr.plot.width=width_choice, repr.plot.height=height_choice)
}

cbPal <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#CC79A7", "#0072B2", "#D55E00")

set_plot_dimensions(5, 4)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
✔ ggplot2 3.3.0     ✔ purrr   0.3.4
✔ tibble  3.0.1     ✔ dplyr   0.8.5
✔ tidyr   1.1.0     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.4.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()


In [None]:
# You should see "Attaching packages" and some ticks by the packages loaded.
# The "Conflicts" aren't a problem.

# Other problems loading the library? Try running this cell.

install.packages('tidyverse')

library(tidyverse)


## 7 - Correcting for multiple hypothesis tests

In [2]:
# Run this cell to load the data.

data <- read_csv("stars.csv")

type_key <- c('Brown Dwarf', 'Red Dwarf', 'White Dwarf', 'Main Sequence', 'Supergiant','Hypergiant')
spectral_classes <- c('O','B','A','F','G','K','M')

data$type <- factor(data$type)
data$spectral_class <- factor(data$spectral_class, levels=spectral_classes)


Parsed with column specification:
cols(
  temperature = col_double(),
  luminosity = col_double(),
  radius = col_double(),
  spectral_class = col_character(),
  type = col_double()
)


Unfortunately, there is a problem with the previous analysis.

Recall that the significance level, $\alpha$, is defined as the probability of incorrectly rejecting $H_0$ when it is actually true (i.e. the probability of a Type I error).

When we perform [*multiple related hypothesis tests*](https://en.wikipedia.org/wiki/Multiple_comparisons_problem), we increase the chance of producing at least one such Type I error.

For example, if $\alpha=0.05$ and we perform 100 tests in which $H_0$ is actually true, we *expect* to generate 5 Type I errors. This can be a serious problem when large numbers of hypothesis tests are carried out simultaneously, for example when screening thousands of genes for association with a disease.
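The size of the problem can be checked numerically. The sketch below is illustrative only: it assumes that all 100 null hypotheses are true and that the tests are independent, so that each p-value is uniformly distributed on [0, 1]. The names `alpha_example`, `n_tests_example` and `n_sims` are introduced here for the example and are not used elsewhere in this notebook.

```r
alpha_example <- 0.05
n_tests_example <- 100

# Expected number of Type I errors when every H0 is true.
expected_errors <- alpha_example * n_tests_example

# Probability of at least one Type I error, assuming independent tests.
fwer_analytic <- 1 - (1 - alpha_example)^n_tests_example

# Check by simulation: under H0 the p-values are uniform on [0, 1].
set.seed(1)
n_sims <- 10000
p_sim <- matrix(runif(n_sims * n_tests_example), nrow = n_sims)
fwer_simulated <- mean(rowSums(p_sim < alpha_example) >= 1)

print(expected_errors)            # 5
print(signif(fwer_analytic, 3))   # 0.994
print(signif(fwer_simulated, 3))  # close to the analytic value
```

Under these assumptions, at least one false positive is almost certain, even though each individual test has only a 5% chance of producing one.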

We therefore need a strategy to control the rate of Type I errors. A very simple approach is given by the [*Bonferroni correction*](https://en.wikipedia.org/wiki/Bonferroni_correction):

### Bonferroni correction

#### Theory

When conducting $n$ related hypothesis tests, we reduce the significance level for each test to $\alpha/n$.

The probability of making at least one Type I error *over the whole set of tests* (known as the *family-wise error rate*, FWER) is then at most $\alpha$.
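Why the correction works can be seen from the union bound (Boole's inequality). As a short sketch, let $A_i$ denote the event that test $i$ incorrectly rejects a true $H_0$ (the symbol $A_i$ is introduced here for illustration). Each test is run at level $\alpha/n$, so $P(A_i) \le \alpha/n$, and

$$
\mathrm{FWER} = P\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} P(A_i) \le n \cdot \frac{\alpha}{n} = \alpha
$$

The bound does not require the tests to be independent. If they are independent and every $H_0$ is true, the FWER is exactly $1 - (1 - \alpha/n)^n$, which is approximately $0.049$ for $\alpha = 0.05$ and $n = 6$.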

#### Application

In [3]:
print("Shapiro-Wilk test for normality")
print("")

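p_values <- c()
alpha <- 0.05
n <- 6   # number of star types, coded 0 to 5 in the data

# Test log(temperature) for normality within each star type.
# (Names chosen to avoid masking the base R functions t() and sample().)
for(type_code in seq(0,n-1)){
    log_temps <- data %>%
        filter(type == type_code) %>%
        pull(temperature) %>%
        log
    p_values <- append(p_values, shapiro.test(log_temps)$p.value)
}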

print(paste("with uncorrected alpha =",signif(alpha,3),":"))
for(i in seq(1,n)){
    result <- ""
    if(p_values[i] < alpha) result <- "*** REJECT H0 ***"
    print(paste(i, type_key[i], ": p =", signif(p_values[i],3), result))
}

print("")    

print(paste("with Bonferroni correction, alpha/n =",signif(alpha/n,3),":"))
for(i in seq(1,n)){
    result <- ""
    if(p_values[i] < alpha/n) result <- "*** REJECT H0 ***"
    print(paste(i, type_key[i], ": p =", signif(p_values[i],3), result))
}

    

[1] "Shapiro-Wilk test for normality"
[1] ""
[1] "with uncorrected alpha = 0.05 :"
[1] "1 Brown Dwarf : p = 0.00216 *** REJECT H0 ***"
[1] "2 Red Dwarf : p = 0.0174 *** REJECT H0 ***"
[1] "3 White Dwarf : p = 0.143 "
[1] "4 Main Sequence : p = 0.0582 "
[1] "5 Supergiant : p = 0.00366 *** REJECT H0 ***"
[1] "6 Hypergiant : p = 5.31e-07 *** REJECT H0 ***"
[1] ""
[1] "with Bonferroni correction, alpha/n = 0.00833 :"
[1] "1 Brown Dwarf : p = 0.00216 *** REJECT H0 ***"
[1] "2 Red Dwarf : p = 0.0174 "
[1] "3 White Dwarf : p = 0.143 "
[1] "4 Main Sequence : p = 0.0582 "
[1] "5 Supergiant : p = 0.00366 *** REJECT H0 ***"
[1] "6 Hypergiant : p = 5.31e-07 *** REJECT H0 ***"



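After correcting for multiple hypothesis testing, the red dwarf p-value is no longer significant.

We should report to Professor Xu that we reject the hypothesis that log(temperature) is normally distributed for the brown dwarf, supergiant and hypergiant types. For the other three types we have not shown normality; we have only failed to find significant evidence against it.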
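An equivalent way to apply the Bonferroni correction is to adjust the p-values rather than the threshold: each p-value is multiplied by $n$ (capped at 1) and compared with the original $\alpha$. The sketch below uses `p.adjust()` with `method = "bonferroni"`. The vector `p_printed` is a hypothetical name holding the p-values copied from the printed output above, rounded to 3 significant figures, so results may differ slightly from those computed on the unrounded `p_values`.

```r
# Rounded p-values copied from the printed output (3 significant figures).
p_printed <- c(0.00216, 0.0174, 0.143, 0.0582, 0.00366, 5.31e-07)
alpha_level <- 0.05

# Approach 1: compare the raw p-values with the reduced threshold alpha/n.
reject_threshold <- p_printed < alpha_level / length(p_printed)

# Approach 2: multiply the p-values by n (capped at 1) and compare with alpha.
p_bonferroni <- p.adjust(p_printed, method = "bonferroni")
reject_adjusted <- p_bonferroni < alpha_level

print(signif(p_bonferroni, 3))
print(identical(reject_threshold, reject_adjusted))  # TRUE for these values
```

Both approaches reject $H_0$ for the same three star types here. Reporting adjusted p-values has the practical advantage that readers can compare them directly with whatever $\alpha$ they prefer.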


### Alternative methods for multiple testing correction

The Bonferroni correction is simple to apply, but it may be too conservative when there is a very large number of tests, or when the tests are not independent (for example, genes are often related to other genes, so they are likely to share properties).

The [*Benjamini-Hochberg procedure*](https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini–Hochberg_procedure) is an alternative approach. Instead of controlling the FWER, this method controls the *expected proportion of the positive tests that are incorrect*, i.e. the expected proportion of rejected $H_0$'s that are Type I errors. This is known as the *false-discovery rate*, FDR.

If a vector of p-values is available, the `p.adjust()` function will compute the *adjusted* p-values according to the Benjamini-Hochberg method (or several other available methods). Adjusted p-values are sometimes called *q-values*. These can then be compared to the original $\alpha$ value.

In [4]:
# In our example, the Benjamini-Hochberg method is less conservative than Bonferroni: 
# The red dwarf p-value still appears to be significant when we use this method.

q_values <- p.adjust(p_values, method='BH')

print(paste("with Benjamini-Hochberg correction, alpha =",signif(alpha,3),":"))
for(i in seq(1,n)){
    result <- ""
    if(q_values[i] < alpha) result <- "*** REJECT H0 ***"
    print(paste(i, type_key[i], ": q =", signif(q_values[i],3), result))
}

[1] "with Benjamini-Hochberg correction, alpha = 0.05 :"
[1] "1 Brown Dwarf : q = 0.00647 *** REJECT H0 ***"
[1] "2 Red Dwarf : q = 0.0261 *** REJECT H0 ***"
[1] "3 White Dwarf : q = 0.143 "
[1] "4 Main Sequence : q = 0.0699 "
[1] "5 Supergiant : q = 0.00732 *** REJECT H0 ***"
[1] "6 Hypergiant : q = 3.19e-06 *** REJECT H0 ***"
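The adjusted values printed above can be reproduced by hand, which shows what the Benjamini-Hochberg adjustment does. The sketch below is an illustration rather than the internal code of `p.adjust()`: it ranks the p-values, scales each one by $n/\text{rank}$, and then takes a running minimum so that the adjusted values keep the same ordering as the raw ones. The names `p_printed`, `n_hyp` and `q_manual` are introduced for this example, and `p_printed` holds the p-values rounded to 3 significant figures from the earlier printed output, so the last digit may differ slightly from the q-values above.

```r
# Rounded p-values copied from the earlier printed output.
p_printed <- c(0.00216, 0.0174, 0.143, 0.0582, 0.00366, 5.31e-07)
n_hyp <- length(p_printed)

# Step 1: order the p-values from largest to smallest.
ord <- order(p_printed, decreasing = TRUE)
ranks <- n_hyp:1   # ascending-order rank of each sorted p-value

# Step 2: scale each p-value by n / rank.
scaled <- p_printed[ord] * n_hyp / ranks

# Step 3: running minimum from the largest p-value downwards, capped at 1.
q_sorted <- pmin(1, cummin(scaled))

# Step 4: put the adjusted values back in the original order.
q_manual <- numeric(n_hyp)
q_manual[ord] <- q_sorted

print(signif(q_manual, 3))
print(all.equal(q_manual, p.adjust(p_printed, method = "BH")))  # TRUE
```

The smallest p-value is multiplied by $n$, exactly as in the Bonferroni correction, but larger p-values are multiplied by progressively smaller factors. This is why the red dwarf result stays below $\alpha$ under this method.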


---