# Tutorial 6: Hypothesis Testing

### Lecture and Tutorial Learning Goals
From this section, students are expected to be able to:

1.	Give an example of a question you could answer with a hypothesis test.
2.	Identify potential limitations in the data, arising from the methods of data collection, to answer the question.
3.	Specify a null and alternative hypothesis.
4.	Given an inferential question, formulate hypotheses to be used in a hypothesis test.
5.	Identify the correct steps and components of a basic hypothesis test.
6.	Write computer scripts to perform hypothesis testing via simulation, randomization and bootstrapping approaches, as well as interpret the output.
7.	Identify the advantages of simulation/randomization tests when estimating parameters different from proportions and means.
8.	Describe the relationship between confidence intervals and hypothesis testing.
9.	Discuss the potential limitations of these methods.

In [None]:
# Run this cell before continuing.
library(cowplot)
library(datateachr)
library(digest)
library(infer)
library(repr)
library(taxyvr)
library(tidyverse)
library(dplyr)
penguins <- read.csv("https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv")
source("tests_tutorial_06.R")

## 1. Annual Maximum Flow Rate of Bow River

&emsp; When the snow melts in spring and summer, large volumes of water are released into the rivers, and flooding can occur. One preventative measure is to keep track of the maximum flow of a river each year. For this question, we study the annual maximum daily discharge (in $m^3/s$) at a hydrometric station called <i>Bow River at Banff</i>, which is near Banff, Alberta. The data are saved in the data table <i>flow_sample</i>. Let's preview this dataset.

In [None]:
?flow_sample

In [None]:
head(flow_sample)

A village downstream wants to build a dam to mitigate the effects of annual flooding. To design this dam, we’re interested in studying the distribution of the maximum flow of Bow River at this station. A retired employee, who monitored many hydrometric stations in the area, claims that the annual maximum flow is typically around $210 m^3/s$. However, residents in the area claim that the annual maximum flow is typically higher than $210 m^3/s$.

<b>Question 1.1: Selecting Parameter</b><br>
{points: 2}

Which of the parameters below would be most suitable to investigate and ultimately test the residents’ claim? (Select all that apply)

A. The mean of the annual maximum flow distribution at Bow River

B. The median of the annual maximum flow distribution at Bow River

C. The variance of the annual maximum flow distribution at Bow River

D. The proportion of annual maximum flow values at Bow River exceeding the residents’ claim

_Assign your answer to an object called `answer1.1`. Your answer should be a sequence of characters surrounded by quotes (e.g., "ABCD")._

In [None]:
#answer1.1 <- ""

# your code here
fail() # No Answer - remove if you provide an answer

answer1.1

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.

test_that('Did not assign answer to an object called "answer1.1"', {
    expect_true(exists("answer1.1"))
})



&emsp; For now, let us focus on the mean of the annual maximum flow. We want to test hypotheses about the mean <b>at the 5% significance level</b>. Here we assume that the annual maximum flow data come from a distribution that does not change over the years (for example, because of climate change or tectonic activity).

<b>Question 1.2: Null Hypothesis</b><br>
{points: 2}

Which of the following would be an appropriate null hypothesis for us to set, given the residents’ and retired employee’s claims?

A. $H_0$: The mean of the annual maximum flow at Bow River is equal to $210 m^3/s$.

B. $H_0$: The mean of the annual maximum flow at Bow River is greater than $210 m^3/s$.

C. $H_0$: The mean of the annual maximum flow at Bow River is greater than or equal to $210 m^3/s$.

D. $H_0$: The mean of the annual maximum flow at Bow River is NOT equal to $210 m^3/s$.

Your answer should be a string containing one letter.

_Assign your answer to an object called `answer1.2`. Your answer should be a single character surrounded by quotes._

In [None]:
#answer1.2 <-""

# your code here
fail() # No Answer - remove if you provide an answer

answer1.2

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.

test_that('Did not assign answer to an object called "answer1.2"', {
expect_true(exists("answer1.2"))
})



<b>Question 1.3: Alternative Hypothesis </b><br>
{points: 2}

Which of the following would be an appropriate alternative hypothesis for us to set, given the residents’ and retired employee’s claims?

A. $H_1$: The mean of the annual maximum flow at Bow River is less than $210 m^3/s$.

B. $H_1$: The mean of the annual maximum flow at Bow River is greater than $210 m^3/s$.

C. $H_1$: The mean of the annual maximum flow at Bow River is greater than or equal to $210 m^3/s$.

D. $H_1$: The mean of the annual maximum flow at Bow River is <b>NOT</b> equal to $210 m^3/s$.

Your answer should be a string containing one letter.

_Assign your answer to an object called `answer1.3`. Your answer should be a single character surrounded by quotes._

In [None]:
#answer1.3 <-""

# your code here
fail() # No Answer - remove if you provide an answer
answer1.3

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.


test_that('Did not assign answer to an object called "answer1.3"', {
    expect_true(exists("answer1.3"))
})


&emsp; Now we keep only the annual maximum records and retain the year and flow columns, renaming `flow` to `maximum_flow`.

In [None]:
# Run this code before continuing
max_flow_sample <- 
    flow_sample %>%
    filter(extreme_type == 'maximum') %>%
    select(year, flow) %>% 
    rename(maximum_flow = flow)

head(max_flow_sample)


<b> Question 1.4</b> <br>
{points: 3}

Calculate the observed test statistic (the sample mean) from `max_flow_sample` with the `infer` package: `specify` the response, then use the `calculate` function. Leave your answer as a 1x1 tibble with a column named `stat`.

_Assign your data frame to an object called `observed_mean`. Your data frame should have only one column, `stat`, and one row._

In [None]:
#observed_mean <-

# your code here
fail() # No Answer - remove if you provide an answer

observed_mean 

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.

test_that('Did not assign answer to an object called "observed_mean"', {
    expect_true(exists("observed_mean"))
})

test_that("Solution should be a data frame", {
    expect_true("data.frame" %in% class(observed_mean))
})



<b>Question 1.5: Simulating from the null distribution</b> <br>
{points: 3}

Using the `infer` workflow, generate 1000 samples from the null distribution. Remember the steps:

1. `specify` the response;
2. `hypothesize`;
3. `generate` 1000 samples; 
4. and `calculate` the mean of each sample. 

_Assign your data frame to an object called `null_max_flow`. Your data frame should have two columns: `replicate` and  `stat`._
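
For intuition, the four steps above can be sketched in base R without `infer`. This is only an illustrative sketch on synthetic data: the vector `x`, the null value `mu0`, and the recentring approach (shifting the sample so that its mean equals the null value before resampling) are assumptions made for this example, not the solution to the question.

```r
# Illustrative sketch on synthetic data (not the Bow River sample).
set.seed(2024)
x   <- rnorm(50, mean = 12, sd = 3)  # hypothetical sample
mu0 <- 10                            # hypothetical null value for the mean

# "hypothesize": recentre the sample so that its mean equals mu0
x_null <- x - mean(x) + mu0

# "generate" + "calculate": resample with replacement and take each mean
null_means <- replicate(1000, mean(sample(x_null, replace = TRUE)))

# Right-tailed p-value: share of simulated means at least as large as observed
p_value <- mean(null_means >= mean(x))

summary(null_means)
p_value
```

The simulated means are centred near `mu0` by construction, so the p-value measures how unusual the observed mean would be if the null hypothesis were true.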

In [None]:
set.seed(1432) # Do not change this

#null_max_flow <-

# your code here
fail() # No Answer - remove if you provide an answer

head(null_max_flow)

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.


test_that('Did not assign answer to an object called "null_max_flow"', {
    expect_true(exists("null_max_flow"))
})

test_that("Solution should be a data frame", {
    expect_true("data.frame" %in% class(null_max_flow))
})



<b>Question 1.6</b><br>
{points: 3}

Plot the result of the hypothesis test using `visualize` with 10 bins, put a vertical bar for the observed test statistic, and shade the tail(s). Label the x-axis as `Mean`.

```r
max_flow_result_plot <- 
    null_max_flow %>% 
    visualize(bins = ...) + 
    shade_p_value(obs_stat = ..., direction = ...) +
    xlab(...)
```

<i>Assign your answer to an object called </i>`max_flow_result_plot`.

In [None]:
#max_flow_result_plot <-

# your code here
fail() # No Answer - remove if you provide an answer

max_flow_result_plot

In [None]:
test_1.6()

<b>Question 1.7</b><br>
{points: 3}

Use the `get_p_value` function from the `infer` package to get the p-value from `null_max_flow`. 

```r
answer1.7 <- 
    ... %>% 
    get_p_value(obs_stat = ..., direction = ...)
```
<i>Assign your answer to an object called </i>`answer1.7`.

In [None]:
#answer1.7 <-

# your code here
fail() # No Answer - remove if you provide an answer
answer1.7

In [None]:
test_1.7()

<b>Question 1.8: Conclusion of the test </b><br>
{points: 3}

What can we conclude based on the result of the hypothesis test?

A. Given a p-value of 0.369 we do not reject the null hypothesis.

B. Given a p-value of 0.369 we reject the null hypothesis.

C. Given a p-value of 0.369 we do not reject the null hypothesis at the 5% significance level.

D. Given a p-value of 0.369 we reject the null hypothesis at the 5% significance level.

_Assign your answer to an object called `answer1.8`. Your response should be a single character surrounded by quotes._

In [None]:
#answer1.8 <-

# your code here
fail() # No Answer - remove if you provide an answer

answer1.8

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.

test_that('Did not assign answer to an object called "answer1.8"', {
    expect_true(exists("answer1.8"))
})

test_that('Solution should be a single character ("A", "B", "C", or "D")', {
    expect_match(answer1.8, "a|b|c|d", ignore.case = TRUE)
})



<b>Question 1.9: Conclusion at a different significance level</b><br>
{points: 3}

If we conducted the test at the 10% significance level instead, would our conclusion have been different?

A. Yes, it would have, the null hypothesis would be rejected.

B. Yes, it would have, the null hypothesis would be accepted.

C. Yes, it would have, the null hypothesis would NOT be rejected.

D. No, it wouldn’t.

Your answer should be a string containing one letter.

_Assign your answer to an object called `answer1.9`. Your answer should be a single character surrounded by quotes._

In [None]:
#answer1.9 <-

# your code here
fail() # No Answer - remove if you provide an answer
answer1.9

In [None]:
test_1.9()

<b> Question 1.10</b> <br>
{points: 3}

Now we would like to find the 90% confidence interval for the mean. First, let's find the bootstrap distribution for the mean by generating 1000 samples. Use the `infer` package and `max_flow_sample` to specify the response, generate 1000 samples, and calculate the mean. 


_Assign your data frame to an object called `mean_max_bootstrap_dist`. Your data frame should have two columns: `replicate` and  `stat`._

In [None]:
set.seed(6882) # Do not change this

#mean_max_bootstrap_dist <-

# your code here
fail() # No Answer - remove if you provide an answer

head(mean_max_bootstrap_dist)

In [None]:
test_1.10()

<b> Question 1.11 </b> <br>
{points: 2}

Using the bootstrap distribution `mean_max_bootstrap_dist`, find the 90% confidence interval bounded by the 0.1-quantile and the 1-quantile (the maximum) of the bootstrap distribution. 

```r
mean_max_flow_ci <- 
    ... %>% 
    summarise(lower_ci = ..., upper_ci = ...)
```

_Assign your data frame to an object called `mean_max_flow_ci`. Your data frame should have two columns: `lower_ci` and  `upper_ci`._
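
As a rough illustration of quantile-based intervals, the base-R sketch below builds a bootstrap distribution from synthetic data and reads an interval off its quantiles. The sample `x`, the vector `boot_means` (standing in for a `stat` column), and the helper `quantile_ci` are all hypothetical names introduced for this example.

```r
# Illustrative sketch on synthetic data; `boot_means` stands in for a `stat` column.
set.seed(11)
x <- rexp(40, rate = 1 / 5)  # hypothetical sample
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))

# Hypothetical helper: interval between two quantiles of the bootstrap statistics
quantile_ci <- function(stats, lower, upper) {
  c(lower_ci = unname(quantile(stats, lower)),
    upper_ci = unname(quantile(stats, upper)))
}

quantile_ci(boot_means, 0.05, 0.95)  # two-sided 90% percentile interval
quantile_ci(boot_means, 0.10, 1.00)  # 0.1-quantile up to the maximum
```

Both intervals cover 90% of the bootstrap statistics; the second leaves all of the excluded 10% in the lower tail, which mirrors a one-sided test.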

In [None]:
# mean_max_flow_ci <-

# your code here
fail() # No Answer - remove if you provide an answer

mean_max_flow_ci

In [None]:
test_1.11()

<b> Question 1.12 </b> <br>
{points: 2}

Using the `infer` package, visualize the confidence interval `mean_max_flow_ci` with the bootstrap distribution `mean_max_bootstrap_dist`.

<i>Assign your plot to an object called </i>`mean_flow_ci_plot`.

In [None]:
# mean_flow_ci_plot <- 

# your code here
fail() # No Answer - remove if you provide an answer

mean_flow_ci_plot

In [None]:
test_1.12()

## 2.  Flipper Lengths of Penguins

The dataset `penguins` contains size measurements for adult foraging penguins near Palmer Station, Antarctica. First, let's take a look at the first few rows of this dataset.

In [None]:
head(penguins)

&emsp; We want to study how Adelie and Chinstrap penguins are different. First, we study their flipper lengths (in mm).

<b> Question 2.1: Pre-processing</b> <br>
{points: 2}

Filter the `penguins` dataset to remove all rows with `NA` in `flipper_length_mm`, keep only the `Adelie` and `Chinstrap` species, and select the two columns `species` and `flipper_length_mm`.

_Assign your data frame to an object called `adelie_chinstrap_flipper`. Your data frame should have only two columns, `species` and `flipper_length_mm`._ 

In [None]:
#adelie_chinstrap_flipper <-

# your code here
fail() # No Answer - remove if you provide an answer

head(adelie_chinstrap_flipper)

In [None]:
test_2.1()

<b>Question 2.2: Null hypothesis</b> <br>
{points: 2}

&emsp; An ecologist suspects that flipper length affects a penguin's ability to swim. But do flipper lengths differ between the species? Looking at photos of the two penguin species, some claim that their flippers are generally the same length, whereas the ecologist hypothesizes that they may not be. To study the distributions of the flipper lengths of the two species, let's conduct a hypothesis test to examine their <b>difference in medians</b>.

Which of the following would be an appropriate null hypothesis for us to set, given the above situation?

A. $H_0$: The median flipper length of the Adelie penguins is the same as the median flipper length of the Chinstrap penguins.

B. $H_0$: The mean flipper length of the Adelie penguins is the same as the mean flipper length of the Chinstrap penguins.

C. $H_0$: The median flipper length of the Adelie penguins is different from the median flipper length of the Chinstrap penguins.

D. $H_0$: The median flipper length of the Adelie penguins is greater than the median flipper length of the Chinstrap penguins.

Your answer should be a string containing one letter.

_Assign your answer to an object called `answer2.2`. Your answer should be a single character surrounded by quotes._

In [None]:
#answer2.2 <-

# your code here
fail() # No Answer - remove if you provide an answer

answer2.2

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.

test_that('Did not assign answer to an object called "answer2.2"', {
    expect_true(exists("answer2.2"))
})

test_that('Solution should be a single character ("A", "B", "C", or "D")', {
    expect_match(answer2.2, "a|b|c|d", ignore.case = TRUE)
})


<b>Question 2.3: Alternative Hypothesis</b><br>
{points: 2}

Which of the following would be an appropriate alternative hypothesis for us to set, given the above situation?

A. $H_1$: The median flipper length of the Adelie penguins is the same as the median flipper length of the Chinstrap penguins.

B. $H_1$: The mean flipper length of the Adelie penguins is different from the mean flipper length of the Chinstrap penguins.

C. $H_1$: The median flipper length of the Adelie penguins is different from the median flipper length of the Chinstrap penguins.

D. $H_1$: The median flipper length of the Adelie penguins is less than the median flipper length of the Chinstrap penguins.

Your answer should be a string containing one letter.

_Assign your answer to an object called `answer2.3`. Your answer should be a single character surrounded by quotes._

In [None]:
#answer2.3 <-

# your code here
fail() # No Answer - remove if you provide an answer

answer2.3

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.

test_that('Did not assign answer to an object called "answer2.3"', {
    expect_true(exists("answer2.3"))
})

test_that('Solution should be a single character ("A", "B", "C", or "D")', {
    expect_match(answer2.3, "a|b|c|d", ignore.case = TRUE)
})


<b> Question 2.4 </b> <br>
{points: 2}

Count the numbers of Adelie penguins and Chinstrap penguins examined in `adelie_chinstrap_flipper`.

```r
penguin_count <-
    ... %>% 
    count(...)
```

_Assign your data frame to an object called `penguin_count`. Your data frame should have only two columns: `species` and `n`._

In [None]:
# penguin_count <-

# your code here
fail() # No Answer - remove if you provide an answer

penguin_count

In [None]:
test_2.4()

<b> Question 2.5</b><br>
{points: 3}

Calculate the observed test statistic with the `infer` package. Use `adelie_chinstrap_flipper` to specify the response and explanatory variables, and calculate Adelie's median minus Chinstrap's median. 

_Assign your data frame to an object called `obs_diff_in_medians`. Your data frame should have only one column, `stat`, and one row._

In [None]:
#obs_diff_in_medians <- 

# your code here
fail() # No Answer - remove if you provide an answer

obs_diff_in_medians

In [None]:
test_2.5()

<b>Question 2.6: Simulating from the null distribution</b> <br>
{points: 3}

Using the `infer` package, generate 1000 samples from the null distribution. Use `adelie_chinstrap_flipper` to specify the response and explanatory variables, hypothesize, generate 1000 samples and calculate Adelie's median minus Chinstrap's median.

_Assign your data frame to an object called `null_diff_in_medians`. Your data frame should have only two columns: `replicate` and `stat`._
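
To see the idea behind a randomization (permutation) test for a difference in medians, here is a minimal base-R sketch on synthetic data. The vectors `group` and `value`, the helper `diff_in_medians`, and the two-sided p-value convention used at the end are assumptions for illustration; `infer` may compute its two-sided p-value differently.

```r
# Illustrative sketch on synthetic data (not the penguin measurements).
set.seed(7)
group <- rep(c("A", "B"), times = c(30, 25))   # hypothetical group labels
value <- c(rnorm(30, mean = 190, sd = 6),
           rnorm(25, mean = 196, sd = 7))      # hypothetical measurements

# Statistic: median of group A minus median of group B
diff_in_medians <- function(v, g) median(v[g == "A"]) - median(v[g == "B"])

obs_stat <- diff_in_medians(value, group)

# Under the null of no difference the labels are exchangeable, so shuffle them
null_stats <- replicate(1000, diff_in_medians(value, sample(group)))

# One common two-sided convention: share of shuffled statistics at least as
# extreme (in absolute value) as the observed one
p_value <- mean(abs(null_stats) >= abs(obs_stat))

obs_stat
p_value
```

Shuffling the labels keeps the group sizes fixed while breaking any link between group and measurement, which is what the null hypothesis asserts.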

In [None]:
set.seed(5437) # Do not change this

#null_diff_in_medians <-

# your code here
fail() # No Answer - remove if you provide an answer

head(null_diff_in_medians)

In [None]:
test_2.6()

<b>Question 2.7</b> <br>
{points: 3}

Plot the result of the hypothesis test with `visualize` with 10 bins, put a vertical bar for the observed test statistic `obs_diff_in_medians`, and shade the tail(s).

_Assign your plot to an object called `diff_in_medians_plot`._

In [None]:
#diff_in_medians_plot <-

# your code here
fail() # No Answer - remove if you provide an answer

diff_in_medians_plot

In [None]:
test_2.7()

<b>Question 2.8</b> <br>
{points: 3}

Obtain the p-value of `obs_diff_in_medians` from `null_diff_in_medians`. Leave your answer as a $1 \times 1$ tibble with column name `p_value`.

_Assign your data frame to an object called `answer2.8`. Your data frame should have only one column: `p_value`._

In [None]:
#answer2.8 <-

# your code here
fail() # No Answer - remove if you provide an answer

answer2.8

In [None]:
test_2.8()

<b> Question 2.9 </b> <br>
{points: 2}

We should never report a p-value of exactly 0: doing so would suggest that a Type I error is impossible, which is too bold a claim.

What would be the best way to report the p-value? Think about the smallest nonzero p-value that can be obtained when the null distribution is built from 1000 replicates.

A. The p-value is < 0.05

B. The p-value is < 0.01

C. The p-value is < 0.001

D. The p-value is < 0.0001


_Assign your answer to an object called `answer2.9`. Your answer should be a string containing one letter._
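
The resolution of a simulation-based p-value can be checked with a short calculation. The sketch below assumes the p-value is computed as a count of replicates divided by the number of replicates; the names `reps`, `possible_p`, and `smallest_nonzero` are introduced only for this example.

```r
# A simulation-based p-value is a count of replicates divided by their number,
# so it can only take the values 0/reps, 1/reps, ..., reps/reps.
reps <- 1000                     # number of simulated replicates
possible_p <- (0:reps) / reps    # every p-value such a simulation can produce

# Smallest value the simulation can report other than exactly 0
smallest_nonzero <- min(possible_p[possible_p > 0])
smallest_nonzero
```

A simulated p-value of 0 therefore only tells us that the true p-value is probably below this smallest attainable value.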

In [None]:
#answer2.9 <-

# your code here
fail() # No Answer - remove if you provide an answer
answer2.9

In [None]:
test_2.9()

<b> Question 2.10: Conclusion of the test </b> <br>
{points: 3}

What can we conclude based on the result of the hypothesis test?

A. Given a p-value < 0.001 we reject the null hypothesis.

B. Given a p-value < 0.001 we accept the alternative hypothesis at the 5% significance level.

C. Given a p-value < 0.001 we do not reject the null hypothesis at the 5% significance level.

D. Given a p-value < 0.001 we reject the null hypothesis at the 5% significance level.


_Assign your answer to an object called `answer2.10`. Your answer should be a string containing one letter._

In [None]:
#answer2.10 <-

# your code here
fail() # No Answer - remove if you provide an answer

answer2.10

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.

test_that('Did not assign answer to an object called "answer2.10"', {
    expect_true(exists("answer2.10"))
})

test_that('Solution should be a single character ("A", "B", "C", or "D")', {
    expect_match(answer2.10, "a|b|c|d", ignore.case = TRUE)
})



<b> Question 2.11</b><br>
{points: 3}

Now we would like to find the 90% confidence interval for the difference in medians. First, let's find the bootstrap distribution for the difference in medians with the `infer` package. Use `adelie_chinstrap_flipper` to specify the response and explanatory variables, generate 1000 samples, and calculate Adelie's median minus Chinstrap's median. 

_Assign your data frame to an object called `diff_in_medians_bootstrap_dist`. Your data frame should have only two columns: `replicate` and `stat`._

In [None]:
set.seed(9263) # Do not change this

#diff_in_medians_bootstrap_dist <-

# your code here
fail() # No Answer - remove if you provide an answer

head(diff_in_medians_bootstrap_dist)

In [None]:
test_2.11()

<b> Question 2.12 </b><br>
{points: 2}

Use `diff_in_medians_bootstrap_dist` to find the 90% confidence interval.

_Assign your data frame to an object called `diff_in_medians_ci`. Your data frame should have two columns: `lower_ci` and  `upper_ci`._

In [None]:
#diff_in_medians_ci <-

# your code here
fail() # No Answer - remove if you provide an answer

diff_in_medians_ci

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.

test_that('Did not assign answer to an object called "diff_in_medians_ci"', {
expect_true(exists("diff_in_medians_ci"))
})

test_that("Solution should be a data frame", {
expect_true("data.frame" %in% class(diff_in_medians_ci))
})

expected_colnames <- c("lower_ci", "upper_ci")
given_colnames <- colnames(diff_in_medians_ci)
test_that("Data frame does not have the correct columns", {
    expect_equal(length(setdiff(
      union(expected_colnames, given_colnames),
      intersect(expected_colnames, given_colnames)
    )), 0)
})


<b> Question 2.13 </b><br>
{points: 2}

Visualize the confidence interval `diff_in_medians_ci` with the bootstrap distribution `diff_in_medians_bootstrap_dist`.

<i>Assign your plot to an object called </i>`diff_in_medians_ci_plot`.

In [None]:
# diff_in_medians_ci_plot <-

# your code here
fail() # No Answer - remove if you provide an answer

diff_in_medians_ci_plot

In [None]:
test_2.13()

## 3. Breast Cancer and Radiation Therapy

&emsp; For this question, we will use the dataset found at https://archive.ics.uci.edu/ml/datasets/Breast+Cancer. The dataset contains information on 286 breast cancer patients, including variables on tumour size, tumour location, radiation therapy, cancer recurrence, and other basic medical history data. Given this dataset, we want to investigate whether there is a significant difference in the proportions of cancer recurrence between patients who were treated with experimental radiation therapy and patients who were not (i.e. received an alternate treatment). We will assume that the patients have been randomized into each of these two treatment groups.

&emsp; Let's load this dataset. Note that the `irradiat` column indicates whether or not the patient received radiation therapy, while the `class` column indicates whether or not the patient experienced a cancer recurrence event.

In [None]:
breast_cancer <- read.csv(url("https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer/breast-cancer.data"),header=FALSE)
colnames(breast_cancer) <- c("class", "age", "menopause", "tumor-size", "inv-nodes", "node-caps", "deg-malig", "breast", "breast-quad", "irradiat")
head(breast_cancer)

In [None]:
recurrence_irradiat <- 
    breast_cancer %>%
    select(class, irradiat)

head(recurrence_irradiat)

&emsp; Let's group by `class` and `irradiat` and tally how many patients are in each group.

In [None]:
recurrence_irradiat %>%
    group_by(irradiat, class) %>%
    tally() %>%
    spread(irradiat, n)

<b>Question 3.1</b><br>
{points: 3}

Let $p_{1}$ be the proportion of patients who received radiation therapy (`irradiat` is "yes") and subsequently experienced cancer recurrence, and let $p_{2}$ be the proportion of patients who did not receive radiation therapy (`irradiat` is "no") and subsequently experienced cancer recurrence. 

We want to test $$H_0: p_{1} = p_{2},$$ and $$H_a: p_{1} \neq p_{2}.$$

Calculate the observed test statistic $\hat{p}_1 - \hat{p}_2$ using `recurrence_irradiat` by first specifying the response and explanatory variables.

_Assign your data frame to an object called `obs_diff_prop`. Your data frame should have only one column, `stat`, and one row._
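
The statistic $\hat{p}_1 - \hat{p}_2$ and its behaviour under the null hypothesis can be illustrated with a small base-R sketch on synthetic data. The vectors `treated` and `event`, the helper `diff_in_props`, and the group sizes and event rates are all made up for this example and are unrelated to the breast cancer dataset.

```r
# Illustrative sketch on synthetic data (not the breast cancer dataset).
set.seed(99)
treated <- rep(c("yes", "no"), times = c(60, 140))   # hypothetical groups
event   <- c(rbinom(60, 1, 0.45),
             rbinom(140, 1, 0.25)) == 1               # hypothetical outcomes

# Statistic: event proportion among "yes" minus event proportion among "no"
diff_in_props <- function(e, g) mean(e[g == "yes"]) - mean(e[g == "no"])

obs_stat <- diff_in_props(event, treated)

# Null of equal proportions: shuffle the group labels and recompute
null_stats <- replicate(1000, diff_in_props(event, sample(treated)))

# One common two-sided convention for the p-value
p_value <- mean(abs(null_stats) >= abs(obs_stat))

obs_stat
p_value
```

The order of subtraction matters for the sign of the statistic, so it should match the definition of $p_1$ and $p_2$ in the hypotheses.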

In [None]:
#obs_diff_prop <- 

# your code here
fail() # No Answer - remove if you provide an answer

obs_diff_prop 

In [None]:
test_3.1()

<b>Question 3.2: Null Distribution</b><br>
{points: 3}

Generate 1000 samples from the null distribution. Use `recurrence_irradiat` to specify the response and explanatory variables, hypothesize, generate 1000 samples and calculate the proportion of irradiated patients having recurrent cancer minus the proportion of non-irradiated patients having recurrent cancer. 

_Assign your data frame to an object called `irradiat_null_distribution`. Your data frame should have only two columns: `replicate` and `stat`._

In [None]:
set.seed(3526)
#irradiat_null_distribution <-

# your code here
fail() # No Answer - remove if you provide an answer

head(irradiat_null_distribution)

In [None]:
test_3.2()

<b>Question 3.3</b><br>
{points: 3}

Plot the result of the hypothesis test using `visualize` with 10 bins, put a vertical bar for the observed test statistic `obs_diff_prop`, and shade the tail(s).

<i>Assign your answer to an object called </i>`irradiate_result_plot`.

In [None]:
#irradiate_result_plot <-

# your code here
fail() # No Answer - remove if you provide an answer

irradiate_result_plot 

In [None]:
test_3.3()

<b>Question 3.4: Calculate p-value</b> <br>
{points: 3}

Obtain the p-value from `irradiat_null_distribution`. Leave your answer as a $1 \times 1$ tibble with column name `p_value`.

<i>Assign your answer to an object called </i>`answer3.4`.

In [None]:
#answer3.4<-

# your code here
fail() # No Answer - remove if you provide an answer
answer3.4

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.

test_that('Did not assign answer to an object called "answer3.4"', {
    expect_true(exists("answer3.4"))
})

test_that("Solution should be a data frame", {
    expect_true("data.frame" %in% class(answer3.4))
})

expected_colnames <- c("p_value")
given_colnames <- colnames(answer3.4)
test_that("Data frame does not have the correct columns", {
    expect_equal(length(setdiff(
      union(expected_colnames, given_colnames),
      intersect(expected_colnames, given_colnames)
    )), 0)
})


&emsp; Thus, given the p-value above, we reject the null hypothesis at the 5% significance level.

&emsp; Given this result and the test statistic that we observed in Question 3.1, there is evidence to suggest that cancer recurrence is associated with the type of treatment received. Specifically, patients who received the experimental radiation therapy may be more likely to experience cancer recurrence than patients who did not. One possible explanation is that the experimental therapy is less effective at eliminating the cancer than the alternative treatments.