# Tutorial 10: Errors in Inference

From this section, students are expected to be able to:

- Define type I & II errors.
- Describe responsible use and reporting of p-values from hypothesis tests.
- Discuss how these errors are linked to a "reproducibility crisis".
- Measure how these errors amplify when performing multiple hypothesis testing, in the context of multiple comparisons.

In [None]:
# Run this cell before continuing.
library(tidyverse)
library(datateachr)
library(repr)
library(digest)
library(infer)
library(testthat)

## 1. Types of Errors

**Question 1.1**
<br>{points: 3}

Pfizer, a pharmaceutical company, wants to develop a diagnostic test for hemochromatosis, a genetic condition where the body accumulates iron. They suspect that people with this condition end up producing considerably less transferrin (a protein that binds to iron) than healthy patients. To investigate this claim, they want to conduct a hypothesis test. 

Describe what a Type I error and a Type II error would be in this case.


DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

**Question 1.2**
<br>{points: 1}

After careful investigation, Pfizer concluded that transferrin levels in people with hemochromatosis are significantly lower than those in healthy people. Pfizer might be committing:

A. Type I Error
 
B. Type II Error

C. None of the above.

_Assign your answer to an object called `answer1.2`. Your answer should be a single character surrounded by quotes._

In [None]:
# answer1.2 <- ...

# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer1.2"', {
  expect_true(exists("answer1.2"))
})
test_that('Solution should be a single character ("A", "B", or "C")', {
  expect_match(answer1.2, "^[abc]$", ignore.case = TRUE)
})

**Question 1.3**
<br>{points: 2}

Suppose the transferrin level in patients with hemochromatosis follows $N(260 \text{ mg/dL}, 35 \text{ mg/dL})$ and in healthy people follows $N(275 \text{ mg/dL}, 35 \text{ mg/dL})$, where the second parameter is the standard deviation.

If Pfizer tested $H_0: \mu = 275$ vs $H_1: \mu < 275$ at the 5% significance level, using a sample of 30 patients with hemochromatosis, what would be:

1. the probability of a Type I error?
2. the probability of a Type II error?
3. the power of the test?

_Assign your data frame to an object called `pfizer_errors`. The data frame should have three columns, `type_I_error`, `type_II_error`, and `power_of_test`, and one row with the corresponding probability._

In [None]:
# n <- 30
# pfizer_errors <- tibble(type_I_error = ...,
#                         type_II_error = ..., 
#                         power_of_test = ...)

# your code here
fail() # No Answer - remove if you provide an answer
head(pfizer_errors)

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "pfizer_errors"', {
  expect_true(exists("pfizer_errors"))
})
test_that("Solution should be a data frame", {
  expect_true("data.frame" %in% class(pfizer_errors))
})

expected_colnames <- c("type_I_error", "type_II_error", "power_of_test")
given_colnames <- colnames(pfizer_errors)
test_that("Data frame does not have the correct columns", {
  expect_equal(length(setdiff(
    union(expected_colnames, given_colnames),
    intersect(expected_colnames, given_colnames)
  )), 0)
})
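
An analytic power calculation for a one-sided test with known standard deviation can be sanity-checked by simulation. The sketch below uses hypothetical values (`mu0 = 100`, `mu1 = 95`, `sigma = 10`, `n = 25`) that are unrelated to the transferrin example; the seed and the number of replicates are also arbitrary choices.

```r
set.seed(301) # arbitrary seed, for reproducibility only

# Hypothetical values (assumptions for illustration, not the tutorial's numbers)
mu0 <- 100   # mean under H0
mu1 <- 95    # true mean under the alternative
sigma <- 10  # known population standard deviation
n <- 25      # sample size
alpha <- 0.05

# Lower-tailed z-test: reject H0 when the sample mean falls below `crit`
se <- sigma / sqrt(n)
crit <- mu0 + qnorm(alpha) * se

# Analytic power = P(reject H0 | true mean is mu1); Type II error = 1 - power
power_analytic <- pnorm(crit, mean = mu1, sd = se)
type_II_analytic <- 1 - power_analytic

# Simulated power: fraction of samples drawn under mu1 that reject H0
sample_means <- replicate(10000, mean(rnorm(n, mean = mu1, sd = sigma)))
power_simulated <- mean(sample_means < crit)

c(analytic = power_analytic, simulated = power_simulated)
```

If the two numbers disagree by much more than simulation noise, the analytic calculation (or the direction of the rejection region) is worth rechecking.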


**Question 1.4**
<br>{points: 2}

Suppose that Pfizer knew that the mean transferrin level in healthy people was $\mu_{healthy} = 275$ and the standard deviation was $\sigma = 35$ for both healthy people and patients with hemochromatosis. They want to reject $H_0: \mu = 275$ in favour of $H_1: \mu < 275$ at least 95% of the time if the difference in transferrin level between healthy people and patients with hemochromatosis is 5 mg/dL or more. Using a 5% significance level, what is the smallest sample size that satisfies this requirement?


DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

## 2. Reporting Hypothesis Tests

Unfortunately, it is common for studies to report only p-values. P-values are indeed an important part of hypothesis testing, but they do not show the full picture. Let's look at an example \[[Amrhein et al.](https://www.nature.com/articles/d41586-019-00857-9), 2019; [Schmidt & Rothman](https://www-sciencedirect-com.ezproxy.library.ubc.ca/science/article/pii/S0167527314019251?via%3Dihub), 2014\].

Two studies examined the effect of anti-inflammatory drugs on the risk of atrial fibrillation, a possible side effect \[[1](https://www-sciencedirect-com.ezproxy.library.ubc.ca/science/article/pii/S0167527314019251?via%3Dihub#bb0010)\]\[[2](https://www.sciencedirect.com/science/article/pii/S0167527312011643)\]. The comparison is expressed as a risk ratio, so a ratio of 1 means that both groups (people exposed to anti-inflammatory drugs and people not exposed) have the same risk of atrial fibrillation. A ratio higher than 1, say 1.2, means that people exposed to anti-inflammatory drugs have a 20% higher risk of atrial fibrillation. Schmidt et al.'s study obtained a p-value of 0.0003, while Chao et al.'s study obtained a p-value of 0.091.
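
To make the ratio concrete, here is a minimal sketch with made-up counts (they are hypothetical and do not come from either study): 120 cases among 1000 exposed people and 100 cases among 1000 unexposed people.

```r
# Hypothetical counts, chosen only to illustrate a ratio of 1.2
cases_exposed <- 120
n_exposed <- 1000
cases_unexposed <- 100
n_unexposed <- 1000

# Risk in each group = proportion of people who develop the condition
risk_exposed <- cases_exposed / n_exposed
risk_unexposed <- cases_unexposed / n_unexposed

# Ratio of the two risks; values above 1 mean higher risk in the exposed group
risk_ratio <- risk_exposed / risk_unexposed
risk_ratio
```

With these counts the exposed group's risk is 12% and the unexposed group's risk is 10%, so the exposed group has a 20% higher risk in relative terms.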

**Question 2.1**
<br>{points: 1}

True or false?

Based on these p-values, at the 5% significance level, Schmidt et al.'s study rejected the null hypothesis of the same risk for both groups in favour of $H_1$, while Chao et al.'s study did not reject the null hypothesis.

_Assign your answer to an object called `answer2.1`. Your answer should be either "true" or "false", surrounded by quotes._

In [None]:
# answer2.1 <- ...

# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer2.1"', {
  expect_true(exists("answer2.1"))
})
test_that('Answer should be "true" or "false"', {
  expect_match(answer2.1, "true|false", ignore.case = TRUE)
})

Looking further, we have the following additional information:

Study | Observed risk ratio | Confidence interval | p-value | Claim
------|---------------------|---------------------|---------|------
Schmidt et al. | 1.2 | \[1.09; 1.33\] | 0.0003 | significant risk difference
Chao et al. | 1.2 | \[0.97; 1.48\] | 0.091 | non-significant risk difference

As the table above shows, both studies observed exactly the same risk ratio. In addition, although the confidence interval of Chao et al.'s study contains 1 (hence the non-significant conclusion), it ranges from a mere 3% below 1 to a considerable 48% above 1 (that is, up to a 48% higher risk). Looking at the complete picture, the studies seem to corroborate rather than contradict each other. The hypothesis tests reach different conclusions because Schmidt et al.'s study had much higher precision (compare the widths of the confidence intervals).
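
The link between precision and p-value can be sketched numerically. The code below assumes that both intervals are 95% confidence intervals built with a normal approximation on the log risk-ratio scale. The source does not state this, so treat it as an assumption. Under it, the standard error can be backed out of each interval and an approximate two-sided p-value recomputed.

```r
# Values taken from the table above
studies <- data.frame(
  study = c("Schmidt et al.", "Chao et al."),
  rr = c(1.2, 1.2),
  ci_lower = c(1.09, 0.97),
  ci_upper = c(1.33, 1.48)
)

# Assumption: 95% intervals of the form log(rr) +/- z_crit * se
z_crit <- qnorm(0.975)
studies$se_log_rr <- (log(studies$ci_upper) - log(studies$ci_lower)) / (2 * z_crit)

# Approximate z statistic and two-sided p-value for H0: risk ratio = 1
studies$z <- log(studies$rr) / studies$se_log_rr
studies$p_value <- 2 * pnorm(-abs(studies$z))

studies
```

Under these assumptions the recomputed p-values come out close to the reported 0.0003 and 0.091, and the standard error in Chao et al.'s study is roughly twice that of Schmidt et al.'s. The same estimate with a larger standard error gives a larger p-value.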

**Question 2.2**
<br>{points: 1}

The problem is that, by reporting only the p-value, we are missing information on (select all that apply):

A. the observed effect size;

B. the error associated with the statistic;

C. the ability to decide if $H_0$ is rejected or not for a given significance level.

_Assign your answer to an object called `answer2.2`. Your answer should be a sequence of characters surrounded by quotes._

In [None]:
# answer2.2 <- ...

# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer2.2"', {
  expect_true(exists("answer2.2"))
})


This example illustrates the fundamental importance of reporting all the quantities associated with a hypothesis test: (1) the observed statistic (effect size); (2) a measure of its error, either a confidence interval or a standard error; (3) the p-value. With all these quantities, we can properly compare multiple studies and draw better-informed conclusions about the significance of the observed differences.
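
As a minimal sketch of such a report, the code below runs a one-sample t-test on simulated data and collects the estimate, its standard error, the confidence interval, and the p-value in one row. The data, the seed, and the column names are hypothetical choices made for illustration.

```r
set.seed(2021) # arbitrary seed, for reproducibility only

# Hypothetical measurements (simulated, not real study data)
x <- rnorm(40, mean = 0.4, sd = 1)

# Two-sided one-sample t-test of H0: mu = 0
res <- t.test(x, mu = 0)

# Gather everything a reader needs, not just the p-value
report <- data.frame(
  estimate = unname(res$estimate),      # observed sample mean
  std_error = sd(x) / sqrt(length(x)),  # standard error of the mean
  ci_lower = res$conf.int[1],           # 95% confidence interval (t.test default)
  ci_upper = res$conf.int[2],
  p_value = res$p.value
)
report
```

Reporting the whole row lets a reader judge both the size of the effect and how precisely it was estimated.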

**Question 2.3**
<br>{points: 1}

As the last exercise, read these two articles:

1. [Amrhein et al., 2019](https://www.nature.com/articles/d41586-019-00857-9)
2. [ASA Statement on Statistical Significance and P-Values](https://amstat.tandfonline.com/doi/full/10.1080/00031305.2016.1154108#.YFgHX0NKho_)

True or false?

I read both articles. 

_Assign your answer to an object called `answer2.3`. Your answer should be either "true" or "false", surrounded by quotes._

In [None]:
# answer2.3 <- ...

# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
test_that('Did not assign answer to an object called "answer2.3"', {
    expect_true(exists("answer2.3"))
})

test_that('Solution should be "true" or "false"', {
    expect_match(answer2.3, "true|false", ignore.case = TRUE)
})

answer_hash <- digest(tolower(answer2.3))
if (answer_hash == "d2a90307aac5ae8d0ef58e2fe730d38b") {
    print("You really should read these articles. :) ")
}

test_that("Solution is incorrect", {
    expect_equal(answer_hash, "05ca18b596514af73f6880309a21b5dd")
})

print("Success!")
