# Tutorial 8: Classical Tests Based on Normal and t-Distributions

#### Lecture and Tutorial Learning Goals:
From this section, students are expected to be able to:

1. Describe a t-distribution and its relationship with the normal distribution.
2. Use results from the assumption of normality or the Central Limit Theorem to perform estimation and hypothesis testing.
3. Compare and contrast the parts of estimation and hypothesis testing that differ between simulation- and resampling-based approaches and approaches based on the assumption of normality or the Central Limit Theorem.
4. Write a computer script to perform hypothesis testing based on results from the assumption of normality or the Central Limit Theorem.
5. Discuss the potential limitations of these methods.
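To make the first learning goal concrete, here is a small sketch (not part of the graded work) comparing the 97.5th percentile of t-distributions with increasing degrees of freedom to the same percentile of the standard normal distribution. The degrees-of-freedom values are arbitrary choices for illustration.

```r
# Upper 2.5% critical values of t-distributions with increasing degrees of freedom
dfs    <- c(5, 10, 30, 100, 1000)
t_crit <- qt(0.975, df = dfs)

# The matching critical value of the standard normal distribution
z_crit <- qnorm(0.975)

# Heavier tails give larger critical values; they shrink toward z_crit as df grows
print(data.frame(df = dfs, t_crit = round(t_crit, 4)))
print(round(z_crit, 4))
```

As the degrees of freedom grow, the t critical values decrease toward roughly 1.96, which is why the normal approximation works well for large samples.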

In [1]:
# Run this cell before continuing.
library(cowplot)
library(datateachr)
library(digest)
library(infer)
library(repr)
library(taxyvr)
library(tidyverse)
library(broom)
library(testthat)
source("tests_tutorial_08.r")



## 1. Lotto 6/42

Although lotteries are generally fair, many people believe that winning numbers are not evenly distributed. Some lottery enthusiasts spend years studying patterns in the numbers and develop complex theories for winning the grand prize in the next draw. In this question, we will study past winning numbers of Lotto 6/42, a lottery in Ireland. In each draw, six winning numbers are drawn without replacement from a pool of 42 numbers. You can view the description of the data [here](http://jse.amstat.org/datasets/lotto.txt). Let's load and preview the data.

In [2]:
lotto6_42 <- read.delim("http://jse.amstat.org/datasets/lotto.dat.txt", header = FALSE, sep = "\t", dec = ".")
colnames(lotto6_42) <- c("code", "first", "second", "third", "fourth", "fifth", "sixth")
lotto6_42 <- lotto6_42 %>%
                filter(code == 2) %>% ## keep only the actual lottery numbers
                select(-code)
head(lotto6_42)

,first,second,third,fourth,fifth,sixth
,<int>,<int>,<int>,<int>,<int>,<int>
1,4,17,37,10,21,29
2,21,39,10,15,42,27
3,5,42,27,29,28,20
4,35,40,32,19,30,10
5,28,41,12,29,19,35
6,31,42,30,21,24,33


One claim is that winning numbers are usually two-digit numbers and that, in most draws, there is at most one single-digit winning number. Looking at the first six rows, we may be inclined to agree. But we know better than to rely on anecdotal evidence. Let's test the hypothesis that the probability of having more than one single-digit winning number in a draw differs from what it would be if the lottery were fair. What does the data tell us?

Note that there are 9 single-digit numbers (1 to 9), so in total, ${42 - 9 \choose 6}+{42 - 9 \choose 5}{9 \choose 1} = 3,243,592$ of the ${42 \choose 6} = 5,245,786$ possible draws have zero or one single-digit winning number. If the lottery is fair, every possible draw is equally likely.
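The counts in this note can be reproduced with base R's `choose()`. The sketch below assumes, as the note does, that the single-digit numbers are 1 to 9, and it also computes the complement: the number of draws with at least two single-digit numbers.

```r
# Single-digit numbers are 1-9 (9 of them); the other 42 - 9 = 33 have two digits
n_single <- 9
n_double <- 42 - n_single

# Draws of 6 numbers with zero or exactly one single-digit number
zero_or_one <- choose(n_double, 6) + choose(n_double, 5) * choose(n_single, 1)

# All possible draws of 6 numbers out of 42 (order ignored, no replacement)
total <- choose(42, 6)

# Complement: draws with at least two single-digit numbers
at_least_two <- total - zero_or_one

print(c(zero_or_one = zero_or_one, total = total, at_least_two = at_least_two))
```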

**Question 1.1**
<br>{points: 1}

Let $p$ be the probability of having at least two single-digit winning numbers in a draw. 
Considering the scenario above, the null hypothesis is:

A. $H_0: p = \frac{2,002,194}{5,245,786}$

B. $H_0: p = \frac{3,243,592}{5,245,786}$

C. $H_0: p = \frac{2,002,194}{3,243,592}$

D. $H_0: p = \frac{9}{42}$

_Assign your answer to an object called `answer1.1`. Your answer should be a single character surrounded by quotes. Also, create a variable `lotto_p0` and assign it the value of $p$ under the null hypothesis._

In [3]:
# answer1.1 <- ...
# lotto_p0 <- ...

### BEGIN SOLUTION
answer1.1 <- "A"
lotto_p0 <- 2002194/5245786
### END SOLUTION

lotto_p0

In [4]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer1.1"', {
  expect_true(exists("answer1.1"))
})

test_that('Solution should be a single character ("A", "B", "C", or "D")', {
  expect_match(answer1.1, "a|b|c|d", ignore.case = TRUE)
})

test_that('Did not assign answer to an object called "lotto_p0"', {
  expect_true(exists("lotto_p0"))
})

answer_as_numeric <- as.numeric(lotto_p0)
test_that("Solution should be a number", {
  expect_false(is.na(answer_as_numeric))
})

test_that("Solution is incorrect", {
  expect_equal(digest(as.integer(answer_as_numeric * 10e6)), "089e824515c530884e22bea2dd98a447")
})

### BEGIN HIDDEN TESTS
test_that("Solution is incorrect", {
  expect_equal(digest(tolower(answer1.1)), "127a2ec00989b9f7faf671ed470be7f8")
})
print("Success!")
### END HIDDEN TESTS

[1] "Success!"


<b> Question 1.2 </b>
<br> {points: 1}

What is the correct alternative hypothesis?

A. $H_a: p > \frac{2,002,194}{5,245,786}$

B. $H_a: p \neq \frac{2,002,194}{5,245,786}$

C. $H_a: p < \frac{2,002,194}{5,245,786}$

D. $H_a: p > \frac{3,243,592}{5,245,786}$

E. $H_a: p \neq \frac{3,243,592}{5,245,786}$

F. $H_a: p < \frac{3,243,592}{5,245,786}$

G. $H_a: p < \frac{2,002,194}{3,243,592}$

H. $H_a: p < \frac{9}{42}$

<i>Assign your answer to an object called `answer1.2`. Your answer should be a single character surrounded by quotes.</i>

In [5]:
#answer1.2 <-

### BEGIN SOLUTION
answer1.2 <- "B"
### END SOLUTION

answer1.2

In [6]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer1.2"', {
  expect_true(exists("answer1.2"))
})
test_that('Solution should be a single character ("A", "B", "C", "D", "E", "F", "G", or "H")', {
  expect_match(answer1.2, "a|b|c|d|e|f|g|h", ignore.case = TRUE)
})
### BEGIN HIDDEN TESTS
test_that("Solution is incorrect", {
  expect_equal(digest(tolower(answer1.2)), "ddf100612805359cd81fdc5ce3b9fbba")
})
print("Success!")
### END HIDDEN TESTS

[1] "Success!"


<b> Question 1.3</b>
<br> {points: 1}

Calculate the sample proportion, $\hat{p}$, of draws with at least two single-digit numbers. 
(Hint: take a look at the function `rowSums`.)

<i>Assign your answer to an object called `lotto_p_hat`. Your answer should be a single number.</i>
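As a sketch of how the hint works, the code below rebuilds the six rows printed by `head(lotto6_42)` as a small data frame (`first_six` is a name introduced only for this illustration) and counts the single-digit numbers in each draw. Comparing a data frame with `< 10` gives a logical matrix, and `rowSums()` counts the `TRUE` values in each row.

```r
# The six draws shown in the head(lotto6_42) output above
first_six <- data.frame(
  first  = c(4, 21, 5, 35, 28, 31),
  second = c(17, 39, 42, 40, 41, 42),
  third  = c(37, 10, 27, 32, 12, 30),
  fourth = c(10, 15, 29, 19, 29, 21),
  fifth  = c(21, 42, 28, 30, 19, 24),
  sixth  = c(29, 27, 20, 10, 35, 33)
)

# Number of single-digit winning numbers in each draw
n_single_digits <- rowSums(first_six < 10)

# Proportion of these six draws with at least two single-digit numbers
p_hat_first_six <- mean(n_single_digits > 1)

print(n_single_digits)
print(p_hat_first_six)
```

Applied to the full `lotto6_42` data frame, the same two lines give $\hat{p}$.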

In [7]:
# lotto_p_hat <- 
    

### BEGIN SOLUTION
lotto_p_hat <- 
    lotto6_42 %>% 
    rowwise() %>%
    summarise(n_single_digits = sum(c_across(everything()) < 10), .groups = "drop") %>% 
    filter(n_single_digits > 1) %>% 
    nrow() / nrow(lotto6_42)

# Equivalently, in one line of base R:
lotto_p_hat <- mean(rowSums(lotto6_42 < 10) > 1)

### END SOLUTION

lotto_p_hat

In [8]:
test_1.3()

[1] "Success!"


<b>Question 1.4</b>
<br> {points: 1}

Calculate the standard error of $\hat{p}$ under the null model.

<i>Assign your answer to an object called `lotto_std_error`. Your answer should be a single number.</i>

In [9]:
#lotto_std_error <-

### BEGIN SOLUTION
lotto_std_error <- sqrt(lotto_p0 * (1 - lotto_p0) / nrow(lotto6_42))
### END SOLUTION

lotto_std_error

In [10]:
test_1.4()

[1] "Success!"


<b>Question 1.5: Calculate p-value</b>
<br> {points: 1}

Check whether the conditions for the CLT hold (for example, independent draws with $np_0 \ge 10$ and $n(1-p_0) \ge 10$). Then calculate the p-value.

<i>Assign your answer to an object called </i>`lotto_p_value`.
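A minimal sketch of the CLT-based test, written from summary numbers only. It assumes the 264 draws and the observed $\hat{p} = 0.375$ quoted in Question 1.6 (so 99 draws with at least two single-digit numbers), and it uses the common rule of thumb that $np_0$ and $n(1-p_0)$ should both be at least 10. It then checks that the normal-approximation p-value agrees with `prop.test(..., correct = FALSE)`.

```r
# Summary numbers for the Lotto 6/42 data (n and p_hat as reported in this tutorial)
n     <- 264
x     <- 99                      # draws with at least two single-digit numbers
p_hat <- x / n                   # 0.375
p0    <- 2002194 / 5245786       # null value of p

# Rule-of-thumb check for the normal approximation
clt_ok <- n * p0 >= 10 && n * (1 - p0) >= 10

# Standard error under H0 and the z statistic
se0 <- sqrt(p0 * (1 - p0) / n)
z   <- (p_hat - p0) / se0

# Two-sided p-value: double the tail area beyond |z|
p_value <- 2 * pnorm(-abs(z))

# prop.test without continuity correction reports z^2 as a chi-squared statistic
pt_res <- prop.test(x, n, p = p0, correct = FALSE)

print(c(clt_ok = clt_ok, z = z, p_value = p_value, prop_test_p = pt_res$p.value))
```

Using `-abs(z)` keeps the two-sided p-value correct whether $\hat{p}$ falls above or below $p_0$.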

In [11]:
# lotto_p_value<-

### BEGIN SOLUTION
# Two-sided p-value: double the tail area beyond the observed distance from p0
lotto_p_value <- 2 * pnorm(-abs(lotto_p_hat - lotto_p0), 0, lotto_std_error)
### END SOLUTION

lotto_p_value

In [12]:
test_1.5()

[1] "Success!"


<b>Question 1.6: Conclusion</b>
<br> {points: 1}

What can we conclude from this test?

A. It would be very unlikely to observe $\hat{p} = 0.375$ in 264 draws, if the true proportion were $p =\frac{2,002,194}{5,245,786}$. Therefore, we reject $H_0$ and conclude that the lottery is not fair.

B. It would be very unlikely to observe $\hat{p} = 0.375$ in 264 draws, if the true proportion were $p =\frac{2,002,194}{5,245,786}$. Therefore, we accept $H_0$ and conclude that the lottery is fair.

C. It is quite plausible to observe $\hat{p} = 0.375$ in 264 draws, if the true proportion were $p =\frac{2,002,194}{5,245,786}$. Therefore, we have enough evidence to reject $H_0$ and conclude that the lottery is not fair.

D. It is quite plausible to observe $\hat{p} = 0.375$ in 264 draws, if the true proportion were $p =\frac{2,002,194}{5,245,786}$. Therefore, we do not have enough evidence to reject $H_0$ and conclude that the lottery is not fair.

In [13]:
# answer1.6 <- ...

### BEGIN SOLUTION
answer1.6 <- "D"
### END SOLUTION

answer1.6

In [14]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer1.6"', {
  expect_true(exists("answer1.6"))
})
test_that('Solution should be a single character ("A", "B", "C", or "D")', {
  expect_match(answer1.6, "a|b|c|d", ignore.case = TRUE)
})
### BEGIN HIDDEN TESTS
test_that("Solution is incorrect", {
  expect_equal(digest(tolower(answer1.6)), "d110f00cfb1b248e835137025804a23b")
})
print("Success!")
### END HIDDEN TESTS

[1] "Success!"


<b>Question 1.7</b>
<br> {points: 1}

Use R's `prop.test` function to test the hypothesis. Make sure to set `correct = FALSE`, and use `broom::tidy()` to get a more organized result.

<i>Assign your answer to an object called </i>`lotto_prop_test`.

In [15]:
#lotto_prop_test <- ...

### BEGIN SOLUTION
lotto_prop_test <- tidy(prop.test(lotto_p_hat * nrow(lotto6_42),
                             n = nrow(lotto6_42), 
                             p = lotto_p0, correct = FALSE))
### END SOLUTION

lotto_prop_test

estimate,statistic,p.value,parameter,conf.low,conf.high,method,alternative
<dbl>,<dbl>,<dbl>,<int>,<dbl>,<dbl>,<chr>,<chr>
0.375,0.04986654,0.8232957,1,0.3187869,0.4347987,1-sample proportions test without continuity correction,two.sided


In [16]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "lotto_prop_test"', {
  expect_true(exists("lotto_prop_test"))
})
test_that("Solution should be the output of prop.test", {
    expect_true("data.frame" %in% class(lotto_prop_test))
})
### BEGIN HIDDEN TESTS
  test_that("Wrong statistic value", {
    expect_equal(digest(as.integer(lotto_prop_test$statistic*10e6)), "b9a549653e162eaca853bd9978a27ca6")
  })
  
  test_that("Wrong p-value", {
    expect_equal(digest(as.integer(lotto_prop_test$p.value*10e6)), "83e9ccfb5fc36266bd1f437fe5529eb0")
  })
  test_that("Wrong estimate", {
    expect_equal(digest(as.integer(lotto_prop_test$estimate*10e6)), "ab8316d1cadf0166e59aad6f3aba8e7f")
  })
  test_that("Wrong parameter", {
    expect_equal(digest(lotto_prop_test$parameter), "cff4821792ec6a04a0622fe7246d298a")
  })
  test_that("Wrong alternative hypothesis", {
    expect_equal(digest(lotto_prop_test$alternative), "d095e00cba6f80bf1d5ad5db0400f812")
  })

print("Success!")
### END HIDDEN TESTS

[1] "Success!"


## 2. Jellyfish

For this question, we will study the length of jellyfish from the Hawkesbury River in New South Wales, Australia. First, let's load and preview the data.

In [17]:
jellyfish <- read_csv("data/jellyfish.csv")
head(jellyfish)

Parsed with column specification:
cols(
  Location = col_character(),
  Width = col_double(),
  Length = col_double()
)



Location,Width,Length
<chr>,<dbl>,<dbl>
Dangar,6.0,9
Dangar,6.5,8
Dangar,6.5,9
Dangar,7.0,9
Dangar,7.0,10
Dangar,7.0,11


Let $\mu$ be the mean length of jellyfish in Dangar. We will test hypotheses about $\mu$ at a <b>10% significance level</b>. The null hypothesis is that the mean length is 11 cm ($H_0: \mu = 11$), and the alternative hypothesis is that it is not ($H_a: \mu \neq 11$).

**Question 2.1** 
<br> {points: 1}

Keep only the jellyfish from Dangar and select the `Length` column. Also, create a variable `dangar_mu0` to store the hypothesized value of $\mu$.

_Assign your data frame to an object called `dangar`._

In [18]:
#dangar <- 
#dangar_mu0 <-
### BEGIN SOLUTION
dangar <-
    jellyfish %>%
    filter(Location == "Dangar") %>%
    select(Length)

dangar_mu0 <- 11
### END SOLUTION

head(dangar)

Length
<dbl>
9
8
9
9
10
11


In [19]:
test_2.1()

[1] "Success!"


<b>Question 2.2: Calculate the observed test statistic</b>
<br>{points: 1}

Calculate the observed mean length. 

_Assign your answer to an object called `dangar_x_bar`. Your answer should be a single number._

In [20]:
#dangar_x_bar <-

### BEGIN SOLUTION
dangar_x_bar <- mean(dangar$Length)
### END SOLUTION

dangar_x_bar

In [21]:
test_2.2()

[1] "Success!"


<b>Question 2.3: Standard Error</b>
<br> {points: 1}

Calculate the standard error of the test statistic. 

_Assign your answer to an object called `dangar_std_error`. Your answer should be a single number._

In [22]:
#dangar_std_error <-

### BEGIN SOLUTION
dangar_std_error <- sd(dangar$Length) / sqrt(nrow(dangar))
### END SOLUTION

dangar_std_error

In [23]:
test_2.3()

[1] "Success!"


<b>Question 2.4: P-Value</b>
<br> {points: 1}

Calculate the p-value of the observed test statistic using the t-distribution.

_Assign your answer to an object called `dangar_p_value`. Your answer should be a single number._
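To see how the pieces fit together, here is a sketch that computes a two-sided one-sample t-test p-value by hand and checks it against `t.test()`. As an assumption for illustration, `x` holds only the six Dangar lengths printed by `head(dangar)` above, not the full sample, so its result is not the answer to this question.

```r
# Toy sample: the first six Dangar lengths shown above (not the full data)
x   <- c(9, 8, 9, 9, 10, 11)
mu0 <- 11

n      <- length(x)
se     <- sd(x) / sqrt(n)          # estimated standard error of the sample mean
t_stat <- (mean(x) - mu0) / se     # one-sample t statistic

# Two-sided p-value from a t-distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# Built-in equivalent
p_builtin <- t.test(x, mu = mu0)$p.value

print(c(t = t_stat, p_manual = p_manual, p_builtin = p_builtin))
```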

In [24]:
#dangar_p_value <-

### BEGIN SOLUTION
# Two-sided p-value: compare |t| with a t-distribution on n - 1 degrees of freedom
dangar_p_value <- 2 * pt(abs(dangar_x_bar - dangar_mu0) / dangar_std_error, df = nrow(dangar) - 1, lower.tail = FALSE)
### END SOLUTION

dangar_p_value

In [25]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "dangar_p_value"', {
  expect_true(exists("dangar_p_value"))
})
answer_as_numeric <- as.numeric(dangar_p_value)
test_that("Solution should be a number", {
  expect_false(is.na(answer_as_numeric))
})
### BEGIN HIDDEN TESTS
test_that("Solution is incorrect", {
  expect_equal(digest(as.integer(answer_as_numeric * 10e6)), "998dd14467338ebb1f7cd9938bf705b5")
})
print("Success!")
### END HIDDEN TESTS

[1] "Success!"


<b>Question 2.5: Confidence Interval </b>
<br> {points:1}

Calculate the 90% confidence interval of the population mean $\mu$. 

Use the scaffolding below:

`dangar_mean_ci <- tibble(lower_ci = ..., upper_ci = ...)`

(Hint: the function `qt` can help you).

_Assign your data frame to an object called `dangar_mean_ci`._
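A sketch of how `qt()` enters the interval, again using only the six lengths shown above as a toy sample (so the numbers are illustrative). A 90% interval leaves 5% in each tail, so the critical value is `qt(0.95, df = n - 1)`; the result should match the interval reported by `t.test(..., conf.level = 0.90)`.

```r
# Toy sample: the first six Dangar lengths shown above (not the full data)
x <- c(9, 8, 9, 9, 10, 11)
n <- length(x)

x_bar  <- mean(x)
se     <- sd(x) / sqrt(n)
t_star <- qt(0.95, df = n - 1)   # 5% in the upper tail for a 90% interval

ci_manual  <- c(lower = x_bar - t_star * se, upper = x_bar + t_star * se)
ci_builtin <- t.test(x, conf.level = 0.90)$conf.int

print(ci_manual)
print(ci_builtin)
```

Because the t-distribution is symmetric, `qt(0.05, df)` equals `-qt(0.95, df)`, so either form gives the lower limit.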

In [26]:
#dangar_mean_ci <- tibble(
#    lower_ci = ...,
#    upper_ci = ...
#)

### BEGIN SOLUTION
dangar_mean_ci <- tibble(
    lower_ci = qt(0.05, df = nrow(dangar) - 1) * dangar_std_error + dangar_x_bar,
    upper_ci = qt(0.95, df = nrow(dangar) - 1) * dangar_std_error + dangar_x_bar
)
### END SOLUTION

dangar_mean_ci

lower_ci,upper_ci
<dbl>,<dbl>
11.2383,13.44352


In [27]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "dangar_mean_ci"', {
  expect_true(exists("dangar_mean_ci"))
})
test_that("Solution should be a data frame", {
  expect_true("data.frame" %in% class(dangar_mean_ci))
})

expected_colnames <- c("lower_ci", "upper_ci")
given_colnames <- colnames(dangar_mean_ci)
test_that("Data frame does not have the correct columns", {
  expect_equal(length(setdiff(
    union(expected_colnames, given_colnames),
    intersect(expected_colnames, given_colnames)
  )), 0)
})

### BEGIN HIDDEN TESTS

test_that("Data frame does not contain the correct number of rows", {
  expect_equal(digest(as.integer(nrow(dangar_mean_ci))), "4b5630ee914e848e8d07221556b0a2fb")
})

test_that("Data frame does not contain the correct data", {
  expect_equal(digest(as.integer(dangar_mean_ci$lower_ci + dangar_mean_ci$upper_ci) * 10e6), "57c7ad99882de228268d0b4f5134e785")
})

print("Success!")
### END HIDDEN TESTS

[1] "Success!"


**Question 2.6: Conclusion**
<br>{points: 1}

What can we conclude from this test?

A. Since the p-value is lower than 10%, we don't have enough evidence to reject $H_0$ and conclude that the mean length of jellyfish in Dangar is 11 cm.

B. Since the p-value is lower than 10%, we have enough evidence to reject $H_0$ and conclude that the mean length of jellyfish in Dangar is not 11 cm.

_Assign your answer to an object called `answer2.6`. Your answer should be a single character surrounded by quotes._

In [28]:
# answer2.6 <- ...

### BEGIN SOLUTION
answer2.6 <- "B"
### END SOLUTION

In [29]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer2.6"', {
  expect_true(exists("answer2.6"))
})
test_that('Solution should be a single character ("A" or "B")', {
  expect_match(answer2.6, "a|b", ignore.case = TRUE)
})
### BEGIN HIDDEN TESTS
test_that("Solution is incorrect", {
  expect_equal(digest(tolower(answer2.6)), "ddf100612805359cd81fdc5ce3b9fbba")
})
print("Success!")
### END HIDDEN TESTS

[1] "Success!"


**Question 2.7** 
<br> {points: 1}

Use R's `t.test` function to test $H_0: \mu = 11$ against $H_a: \mu \neq 11$. Make sure to use `broom::tidy()` to get a more organized result.

_Assign your data frame to an object called `dangar_t_test`._

In [30]:
#dangar_t_test <-

### BEGIN SOLUTION
dangar_t_test <- tidy(t.test(dangar$Length, mu = dangar_mu0))
### END SOLUTION

dangar_t_test

estimate,statistic,p.value,parameter,conf.low,conf.high,method,alternative
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>
12.34091,2.09264,0.04870617,21,11.00835,13.67347,One Sample t-test,two.sided


In [31]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "dangar_t_test"', {
  expect_true(exists("dangar_t_test"))
})
test_that("Solution should be the output of t.test", {
    expect_true("data.frame" %in% class(dangar_t_test))
})
### BEGIN HIDDEN TESTS
  test_that("Wrong statistic value", {
    expect_equal(digest(as.integer(dangar_t_test$statistic*10e6)), "d8cdbf79e18367b1993eeb6f880737df")
  })
  
  test_that("Wrong p-value", {
    expect_equal(digest(as.integer(dangar_t_test$p.value*10e6)), "998dd14467338ebb1f7cd9938bf705b5")
  })
  test_that("Wrong estimate", {
    expect_equal(digest(as.integer(dangar_t_test$estimate*10e6)), "85b4fb3fc7bcebf153c433fac3244fdc")
  })
  test_that("Wrong parameter", {
    expect_equal(digest(dangar_t_test$parameter), "00363a4a30820157d44d41ac4a3c6de3")
  })
  test_that("Wrong alternative hypothesis", {
    expect_equal(digest(dangar_t_test$alternative), "d095e00cba6f80bf1d5ad5db0400f812")
  })

print("Success!")
### END HIDDEN TESTS

[1] "Success!"


## 3. Chronic Bronchial Reaction to Dust

In this section, we will study the `dust` dataset, which records survey results on chronic bronchial reactions among employees of a Munich factory. Let's load and preview the dataset.

In [32]:
dust <- read_csv("data/dust.csv")
head(dust %>% slice_sample(n = 6)) 

Parsed with column specification:
cols(
  bronch = col_double(),
  dust = col_double(),
  smoke = col_double(),
  years = col_double()
)



bronch,dust,smoke,years
<dbl>,<dbl>,<dbl>,<dbl>
0,0.79,0,33
0,0.71,1,14
0,0.49,1,47
0,0.56,1,9
0,0.39,1,18
1,8.0,1,29


Let's assume that all of the employees are exposed to the same dust concentration within the factory. Let $p_{0}$ be the proportion of non-smoking employees (`smoke==0`) that have a chronic bronchial reaction, and let $p_{1}$ be the proportion of smoking employees (`smoke==1`) that have a chronic bronchial reaction. We will perform hypothesis testing on the difference in proportions $p_{1}-p_{0}$ at a 5% significance level.

Our null hypothesis is that smoking is unrelated to chronic bronchitis in this factory ($p_{1}=p_{0}$), while the alternative hypothesis is that there is a difference in proportions of employees having chronic bronchitis between the smokers and the non-smokers ($p_{1} \neq p_{0}$).

In [33]:
smoke_bronch <- dust %>%
                    select("bronch", "smoke") %>%
                    filter(!is.na(smoke))  %>%
                    mutate(smoke = recode(smoke, `0`="non_smoker", `1`="smoker"),
                           bronch = fct_recode(factor(bronch), "no_reaction" = "0", "reaction" = '1'))
head(smoke_bronch)
table(smoke_bronch)

bronch,smoke
<fct>,<chr>
no_reaction,smoker
no_reaction,smoker
no_reaction,smoker
no_reaction,smoker
no_reaction,smoker
no_reaction,smoker


             smoke
bronch        non_smoker smoker
  no_reaction        274    680
  reaction            51    241

<b>Question 3.1: Observed test statistic</b>
<br> {points:1}

Calculate the observed test statistic $\hat{p}_1-\hat{p}_0$. 

_Assign your data frame to an object called `dust_summary`. The data frame should contain five columns: `n_non_smoker`, `n_smoker`, `p_hat_non_smoker`, `p_hat_smoker`, and `prop_diff`._

In [34]:
# dust_summary <-
#     smoke_bronch %>% 
#     group_by(...) %>% 
#     summarise(n = ..., 
#               p_hat = ...,  
#              `.groups` = "drop") %>% 
#     pivot_wider(names_from = smoke, values_from = c(n, p_hat)) %>% 
#     mutate(...)

### BEGIN SOLUTION
dust_summary <-
    smoke_bronch %>% 
    group_by(smoke) %>% 
    summarise(n = n(), p_hat = mean(bronch=="reaction"), `.groups` = "drop") %>% 
    pivot_wider(names_from = smoke, values_from = c(n, p_hat)) %>% 
    mutate(prop_diff = p_hat_smoker-p_hat_non_smoker)
### END SOLUTION

dust_summary

n_non_smoker,n_smoker,p_hat_non_smoker,p_hat_smoker,prop_diff
<int>,<int>,<dbl>,<dbl>,<dbl>
325,921,0.1569231,0.2616721,0.104749


In [35]:
test_3.1()

[1] "Success!"


<b>Question 3.2</b>
<br> {points:1}

Add a sixth column to `dust_summary`, named `null_std_error`, with the standard error of the observed test statistic under the null model. 
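Under $H_0: p_1 = p_0$ both groups share one proportion, so the standard error uses the pooled estimate $\hat{p}_{\text{pool}} = (x_1 + x_0)/(n_1 + n_0)$, where $x_1, x_0$ are the numbers of reactions: $SE_0 = \sqrt{\hat{p}_{\text{pool}}(1-\hat{p}_{\text{pool}})(1/n_1 + 1/n_0)}$. The sketch below plugs in the counts from the `table(smoke_bronch)` output above (241 of 921 smokers and 51 of 325 non-smokers with a reaction).

```r
# Counts from table(smoke_bronch): reactions and group sizes
x_smoker     <- 241; n_smoker     <- 680 + 241
x_non_smoker <- 51;  n_non_smoker <- 274 + 51

# Pooled proportion under H0: p1 = p0
p_pool <- (x_smoker + x_non_smoker) / (n_smoker + n_non_smoker)

# Standard error of p_hat_1 - p_hat_0 under the null model
null_se <- sqrt(p_pool * (1 - p_pool) * (1 / n_smoker + 1 / n_non_smoker))

print(c(p_pool = p_pool, null_se = null_se))
```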

In [36]:

### BEGIN SOLUTION
dust_summary <-
    dust_summary %>% 
    mutate(p = (p_hat_smoker * n_smoker + p_hat_non_smoker * n_non_smoker)/(n_non_smoker+n_smoker),
           null_std_error = sqrt(p*(1-p)*(1/n_smoker+1/n_non_smoker))) %>% 
    select(-p)
### END SOLUTION

dust_summary

n_non_smoker,n_smoker,p_hat_non_smoker,p_hat_smoker,prop_diff,null_std_error
<int>,<int>,<dbl>,<dbl>,<dbl>,<dbl>
325,921,0.1569231,0.2616721,0.104749,0.02732971


In [37]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
  test_that('Did not assign answer to an object called "dust_summary"', {
    expect_true(exists("dust_summary"))
  })

  test_that("Solution should be a data frame", {
    expect_true("data.frame" %in% class(dust_summary))
  })

  expected_colnames <- c("n_non_smoker", "n_smoker", "p_hat_non_smoker", "p_hat_smoker", "prop_diff", "null_std_error")
  given_colnames <- colnames(dust_summary)
  test_that("Data frame does not have the correct columns", {
    expect_equal(length(setdiff(
      union(expected_colnames, given_colnames),
      intersect(expected_colnames, given_colnames)
    )), 0)
  })

  test_that("Data frame does not contain the correct number of rows", {
    expect_equal(digest(as.integer(nrow(dust_summary))), "4b5630ee914e848e8d07221556b0a2fb")
  })

  test_that("Data frame does not contain the correct data", {
    expect_equal(digest(as.integer(sum(dust_summary$n_non_smoker))), "87f3407882014ba8f18016b8e408ad35")
    expect_equal(digest(as.integer(sum(dust_summary$n_smoker))), "76bb48cccd68153482cda2b8ebfecd93")
    expect_equal(digest(as.integer(sum(dust_summary$p_hat_non_smoker) * 10e6)), "e772aa3c54a729784d27e153931979c7")
    expect_equal(digest(as.integer(sum(dust_summary$p_hat_smoker) * 10e6)), "65105150a89cf0f1e8f0b93ae773058d")
    expect_equal(digest(as.integer(sum(dust_summary$prop_diff) * 10e6)), "2b59c114711ded35f1991bdb2cfe5562")
  })

### BEGIN HIDDEN TESTS
test_that("Solution is incorrect", {
  expect_equal(digest(as.integer(sum(dust_summary$null_std_error) * 10e6)), "cc782cbe4d3b401bd627f6992b356310")
})
print("Success!")
### END HIDDEN TESTS

[1] "Success!"


<b>Question 3.3:</b>
<br> {points: 1}

Add another column to `dust_summary`, named `p_value`, with the p-value of the observed test statistic calculated using the CLT.
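A sketch of the two-sided CLT p-value for the difference in proportions, using the counts from the `table(smoke_bronch)` output above. It also confirms that `prop.test(..., correct = FALSE)` reports the square of this z statistic as its chi-squared statistic, with the same p-value.

```r
# Counts from table(smoke_bronch)
x <- c(smoker = 241, non_smoker = 51)
n <- c(smoker = 921, non_smoker = 325)

p_hat  <- x / n
p_pool <- sum(x) / sum(n)
se0    <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))

# z statistic for p_hat_1 - p_hat_0 and its two-sided p-value
z       <- unname((p_hat["smoker"] - p_hat["non_smoker"]) / se0)
p_value <- 2 * pnorm(-abs(z))

pt_res <- prop.test(x, n, correct = FALSE)

print(c(z = z, p_value = p_value, chisq = unname(pt_res$statistic), prop_test_p = pt_res$p.value))
```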

In [38]:
### BEGIN SOLUTION
dust_summary <- 
    dust_summary  %>% 
    mutate(p_value = 2 * pnorm(-abs(prop_diff), 0, null_std_error))  # two-sided: tail area beyond |prop_diff|, doubled
### END SOLUTION

dust_summary

n_non_smoker,n_smoker,p_hat_non_smoker,p_hat_smoker,prop_diff,null_std_error,p_value
<int>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
325,921,0.1569231,0.2616721,0.104749,0.02732971,0.0001266988


In [39]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
  test_that('Did not assign answer to an object called "dust_summary"', {
    expect_true(exists("dust_summary"))
  })

  test_that("Solution should be a data frame", {
    expect_true("data.frame" %in% class(dust_summary))
  })

  expected_colnames <- c("n_non_smoker", "n_smoker", "p_hat_non_smoker", "p_hat_smoker", "prop_diff", "null_std_error", "p_value")
  given_colnames <- colnames(dust_summary)
  test_that("Data frame does not have the correct columns", {
    expect_equal(length(setdiff(
      union(expected_colnames, given_colnames),
      intersect(expected_colnames, given_colnames)
    )), 0)
  })

  test_that("Data frame does not contain the correct number of rows", {
    expect_equal(digest(as.integer(nrow(dust_summary))), "4b5630ee914e848e8d07221556b0a2fb")
  })

  test_that("Data frame does not contain the correct data", {
    expect_equal(digest(as.integer(sum(dust_summary$n_non_smoker))), "87f3407882014ba8f18016b8e408ad35")
    expect_equal(digest(as.integer(sum(dust_summary$n_smoker))), "76bb48cccd68153482cda2b8ebfecd93")
    expect_equal(digest(as.integer(sum(dust_summary$p_hat_non_smoker) * 10e6)), "e772aa3c54a729784d27e153931979c7")
    expect_equal(digest(as.integer(sum(dust_summary$p_hat_smoker) * 10e6)), "65105150a89cf0f1e8f0b93ae773058d")
    expect_equal(digest(as.integer(sum(dust_summary$prop_diff) * 10e6)), "2b59c114711ded35f1991bdb2cfe5562")
  })

### BEGIN HIDDEN TESTS
test_that("Solution is incorrect", {
  expect_equal(digest(as.integer(sum(dust_summary$null_std_error) * 10e6)), "cc782cbe4d3b401bd627f6992b356310")
  expect_equal(  digest(as.integer(sum(dust_summary$p_value) * 10e6)), "40a71fcefb374785bd9c9e10386e24e5")
})

print("Success!")
### END HIDDEN TESTS

[1] "Success!"


**Question 3.4** 
<br> {points: 1}

Use R's `prop.test` function to test the hypothesis. Make sure to set `correct = FALSE`, and use `broom::tidy()` to get a more organized result.

_Assign your data frame to an object called `dust_prop_test`._

In [40]:
### BEGIN SOLUTION
dust_prop_test <- 
    tidy(prop.test(x = c(dust_summary$n_smoker * dust_summary$p_hat_smoker,           # reactions among smokers
                         dust_summary$n_non_smoker * dust_summary$p_hat_non_smoker),  # reactions among non-smokers
                   n = c(dust_summary$n_smoker, dust_summary$n_non_smoker),
                   correct = FALSE))
### END SOLUTION

dust_prop_test

estimate1,estimate2,statistic,p.value,parameter,conf.low,conf.high,method,alternative
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>
0.2616721,0.1569231,14.69027,0.0001266988,1,0.05607071,0.1534273,2-sample test for equality of proportions without continuity correction,two.sided


In [41]:
test_3.4()

[1] "Success!"


**Question 3.5**
<br>{points: 1}

What can we conclude from this test?

A. It would be very unlikely to observe a difference in proportions $\hat{p}_1 - \hat{p}_0 = 0.104749$ among 921 smokers and 325 non-smokers if the proportions were the same. Therefore, we reject $H_0$ at the 5% significance level and conclude that smokers have a higher chance of having a reaction.

B. It would be very unlikely to observe a difference in proportions $\hat{p}_1 - \hat{p}_0 = 0.104749$ among 921 smokers and 325 non-smokers if the proportions were the same. Therefore, we accept $H_0$ at the 5% significance level and conclude that smokers do not have a higher chance of having a reaction.

C. It is quite plausible to observe a difference in proportions $\hat{p}_1 - \hat{p}_0 = 0.104749$ among 921 smokers and 325 non-smokers if the proportions were the same. Therefore, we do not reject $H_0$ at the 5% significance level and conclude that smokers do not have a higher chance of having a reaction.

D. It is quite plausible to observe a difference in proportions $\hat{p}_1 - \hat{p}_0 = 0.104749$ among 921 smokers and 325 non-smokers if the proportions were the same. Therefore, we reject $H_0$ at the 5% significance level and conclude that smokers have a higher chance of having a reaction.

_Assign your answer to an object called `answer3.5`. Your answer should be a single character surrounded by quotes._

In [42]:
# answer3.5 <- ...

### BEGIN SOLUTION
answer3.5 <- "A"
### END SOLUTION

In [43]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer3.5"', {
  expect_true(exists("answer3.5"))
})
test_that('Solution should be a single character ("A", "B", "C", or "D")', {
  expect_match(answer3.5, "a|b|c|d", ignore.case = TRUE)
})
### BEGIN HIDDEN TESTS
test_that("Solution is incorrect", {
  expect_equal(digest(tolower(answer3.5)), "127a2ec00989b9f7faf671ed470be7f8")
})
print("Success!")
### END HIDDEN TESTS

[1] "Success!"


## 4. The difference in lengths of fish species

We will use the `fishcatch` dataset, which records measurements of 159 fish of 7 species caught in Lake Laengelmavesi near Tampere, Finland. More details about the dataset are available at http://jse.amstat.org/datasets/fishcatch.txt.

In [44]:
fish <- read.table("http://jse.amstat.org/datasets/fishcatch.dat.txt", header=FALSE)
colnames(fish) <- c("Obs", "Species", "Weight", "Length1", "Length2", "Length3", "Height", "Width", "Sex")
head(fish)

,Obs,Species,Weight,Length1,Length2,Length3,Height,Width,Sex
,<int>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<int>
1,1,1,242,23.2,25.4,30.0,38.4,13.4,
2,2,1,290,24.0,26.3,31.2,40.0,13.8,
3,3,1,340,23.9,26.5,31.1,39.8,15.1,
4,4,1,363,26.3,29.0,33.5,38.0,13.3,
5,5,1,430,26.5,29.0,34.0,36.6,15.1,
6,6,1,450,26.8,29.7,34.7,39.2,14.2,


We are interested in whether the length from the nose to the end of the tail (`Length3`) differs significantly between two species: Bream (`Species==1`) and Roach (`Species==3`). Let $\mu_1$ be the mean `Length3` of Bream, and let $\mu_2$ be the mean `Length3` of Roach. We will test hypotheses about $\mu_1 - \mu_2$ at the 5% significance level, with null hypothesis $H_0: \mu_1 = \mu_2$.

<b>Question 4.1: Alternative Hypothesis</b>
<br> {points: 1}

What is an appropriate alternative hypothesis?

A. $\mu_1 > \mu_2$

B. $\mu_1 \neq \mu_2$

C. $\mu_1 < \mu_2$

<i>Your answer should be a string with one letter assigned to the variable </i>`answer4.1`.

In [45]:
# answer4.1 <-

### BEGIN SOLUTION
answer4.1 <- "B"
### END SOLUTION
answer4.1

In [46]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer4.1"', {
  expect_true(exists("answer4.1"))
})
test_that('Solution should be a single character ("A", "B", or "C")', {
  expect_match(answer4.1, "a|b|c", ignore.case = TRUE)
})
### BEGIN HIDDEN TESTS
test_that("Solution is incorrect", {
  expect_equal(digest(tolower(answer4.1)), "ddf100612805359cd81fdc5ce3b9fbba")
})
print("Success!")
### END HIDDEN TESTS

[1] "Success!"


<b>Question 4.2</b>
<br> {points:1}

Filter the `fish` data set to only keep the Bream (`Species==1`) and Roach (`Species==3`) species, and only keep the `Species` and `Length3` columns. Replace the species code with the name of the species. 

_Assign your data frame to an object called `bream_roach`. The data frame should contain two columns: `Species` and `Length3`._
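A note on the filtering step: to keep rows whose `Species` is either 1 or 3, use `%in%` (as in the solution below) rather than `==`. The comparison `Species == c(1, 3)` is evaluated element by element with recycling, so it can silently drop matching rows. The vector `species` below is a small hypothetical stand-in for the `Species` column, used only to illustrate the difference.

```r
# Hypothetical species codes, standing in for fish$Species
species <- c(1, 3, 3, 1, 2, 3)

# %in% checks each element against the whole set {1, 3}
keep_in <- species %in% c(1, 3)

# == recycles c(1, 3) to c(1, 3, 1, 3, 1, 3) and compares position by position,
# so the third and fourth elements are wrongly dropped
keep_eq <- species == c(1, 3)

sum(keep_in)  # 5 rows kept
sum(keep_eq)  # 3 rows kept
```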

In [47]:
# bream_roach <- 
#     fish %>%
#     select(...) %>%
#     filter(...) %>%
#     mutate(Species = fct_recode(as_factor(Species), Bream='1', Roach='3'))


### BEGIN SOLUTION
bream_roach <- 
    fish %>%
    select("Species", "Length3") %>%
    filter(Species %in% c(1, 3)) %>%
    mutate(Species = fct_recode(as_factor(Species), Bream='1', Roach='3'))
### END SOLUTION

head(bream_roach)

|   | Species | Length3 |
|---|---|---|
|   | `<fct>` | `<dbl>` |
| 1 | Bream | 30.0 |
| 2 | Bream | 31.2 |
| 3 | Bream | 31.1 |
| 4 | Bream | 33.5 |
| 5 | Bream | 34.0 |
| 6 | Bream | 34.7 |


In [48]:
test_4.2()

[1] "Success!"


<b> Question 4.3: Observed Test Statistic </b>
<br> {points: 1}

In this exercise, you need to:

1. obtain the sample sizes of the `Bream` and `Roach` species;
2. calculate the sample average and standard deviation of `Length3` for the `Bream` species;
3. calculate the sample average and standard deviation of `Length3` for the `Roach` species;
4. calculate the observed test statistic $\bar{X}_1 - \bar{X}_2$, where $\bar{X}_1$ and $\bar{X}_2$ are the sample averages of `Length3` for the `Bream` and `Roach` species, respectively.

_Assign your data frame to an object called `bream_roach_summary`. The data frame should contain seven columns: `n_Bream`, `n_Roach`, `x_bar_Bream`, `x_bar_Roach`, `sd_Bream`, `sd_Roach`, and `mean_diff`._

In [49]:
# bream_roach_summary <-
#     bream_roach %>% 
#     group_by(...) %>% 
#     summarise(n = ...,
#               x_bar = ...,
#               sd = ...,
#               `.groups` = "drop") %>% 
#     pivot_wider(names_from = Species, values_from = c(n, x_bar, sd)) %>% 
#     mutate(...)

### BEGIN SOLUTION
bream_roach_summary <-
    bream_roach %>% 
    group_by(Species) %>% 
    summarise(n = n(),
              x_bar = mean(Length3),
              sd = sd(Length3),
              `.groups` = "drop") %>% 
    pivot_wider(names_from = Species, values_from = c(n, x_bar, sd)) %>% 
    mutate(mean_diff = x_bar_Bream - x_bar_Roach)

### END SOLUTION

bream_roach_summary

| n_Bream | n_Roach | x_bar_Bream | x_bar_Roach | sd_Bream | sd_Roach | mean_diff |
|---|---|---|---|---|---|---|
| `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 35 | 20 | 38.35429 | 24.97 | 4.157866 | 4.031599 | 13.38429 |


In [50]:
test_4.3()

[1] "Success!"


<b> Question 4.4: Standard deviation of the observed test statistic </b>
<br> {points:1}

Add another column to `bream_roach_summary`, named `null_std_error`, containing the standard error of the test statistic $\bar{X}_1 - \bar{X}_2$ under the null model.
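For reference, the standard error computed in the solution below is the unpooled one, which does not assume that Bream and Roach lengths have the same variance. With sample standard deviations $S_1, S_2$ and sample sizes $n_1, n_2$, the standard error and the standardized test statistic under $H_0: \mu_1 = \mu_2$ are:

$$
\text{SE}(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}, \qquad T = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}
$$

Plugging in the summary values above gives $\sqrt{4.157866^2/35 + 4.031599^2/20} \approx 1.143$, the value that appears in `null_std_error` below. Under $H_0$, $T$ is approximately t-distributed (Welch's approximation).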

In [51]:

### BEGIN SOLUTION
bream_roach_summary <-
    bream_roach_summary %>% 
    mutate(null_std_error = sqrt(sd_Bream^2/n_Bream + sd_Roach^2/n_Roach))
### END SOLUTION

bream_roach_summary

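| n_Bream | n_Roach | x_bar_Bream | x_bar_Roach | sd_Bream | sd_Roach | mean_diff | null_std_error |
|---|---|---|---|---|---|---|---|
| `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 35 | 20 | 38.35429 | 24.97 | 4.157866 | 4.031599 | 13.38429 | 1.143078 |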


In [52]:
test_4.4()

[1] "Success!"


<b> Question 4.5: Obtaining p-value </b>
<br> {points:1}

Add another column to `bream_roach_summary`, named `p_value`, containing the test's p-value based on the t-distribution. To help you, we have already calculated the approximate degrees of freedom: 40.7105 (for details on how to obtain this value, see Question 3.3.4 of Worksheet_08).
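The value 40.7105 comes from the Welch-Satterthwaite approximation. Below is a minimal sketch that reproduces the degrees of freedom, the standardized test statistic and the two-sided p-value, assuming only the rounded summary statistics printed in `bream_roach_summary` above (so the last digits may differ slightly from the full-precision results).

```r
# Summary statistics copied from the bream_roach_summary output above
n1 <- 35; s1 <- 4.157866   # Bream
n2 <- 20; s2 <- 4.031599   # Roach
mean_diff <- 13.38429

# Estimated variance of each sample mean
v1 <- s1^2 / n1
v2 <- s2^2 / n2

# Unpooled standard error and Welch-Satterthwaite degrees of freedom
std_error <- sqrt(v1 + v2)
welch_df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Standardized statistic and two-sided p-value from the t-distribution
t_stat <- mean_diff / std_error
p_value <- 2 * pt(abs(t_stat), df = welch_df, lower.tail = FALSE)

round(c(std_error = std_error, df = welch_df, t = t_stat), 4)
p_value
```

These values should agree, up to rounding, with `null_std_error` above and with the `statistic` and `parameter` columns that `t.test` reports in the next question.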

In [53]:
### BEGIN SOLUTION
bream_roach_summary <- 
    bream_roach_summary %>% 
    mutate(p_value = 2 * pt(mean_diff/null_std_error, df=40.7105, lower.tail=FALSE))
### END SOLUTION

bream_roach_summary

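| n_Bream | n_Roach | x_bar_Bream | x_bar_Roach | sd_Bream | sd_Roach | mean_diff | null_std_error | p_value |
|---|---|---|---|---|---|---|---|---|
| `<int>` | `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 35 | 20 | 38.35429 | 24.97 | 4.157866 | 4.031599 | 13.38429 | 1.143078 | 1.304443e-14 |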


In [54]:
test_4.5()

[1] "Success!"


**Question 4.6** 
<br> {points: 1}

Use R's `t.test` function to test the hypothesis. Make sure to use `broom::tidy()` to get a more organized result.

_Assign your data frame to an object called `bream_roach_t_test`._
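`t.test` also has a formula interface, which avoids filtering and pulling each group by hand. With the real data the analogous call would be `t.test(Length3 ~ Species, data = bream_roach)`. The sketch below uses a hypothetical data frame `toy` with simulated lengths, generated only to show that both interfaces agree. By default `var.equal = FALSE`, so both calls run Welch's test. The group order follows the factor levels, so `Bream` must be the first level for the estimate to be Bream minus Roach.

```r
# Hypothetical simulated data with the same column names as bream_roach
set.seed(2021)
toy <- data.frame(
  Species = factor(rep(c("Bream", "Roach"), times = c(35, 20)),
                   levels = c("Bream", "Roach")),
  Length3 = c(rnorm(35, mean = 38, sd = 4), rnorm(20, mean = 25, sd = 4))
)

# Two-vector interface, as in the solution below
vector_version <- t.test(toy$Length3[toy$Species == "Bream"],
                         toy$Length3[toy$Species == "Roach"])

# Formula interface: response ~ two-level grouping factor
formula_version <- t.test(Length3 ~ Species, data = toy)

# Both give the same Welch statistic, degrees of freedom and p-value
c(statistic = unname(formula_version$statistic),
  df = unname(formula_version$parameter),
  p_value = formula_version$p.value)
```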

In [55]:
# bream_roach_t_test <-

### BEGIN SOLUTION
bream_roach_t_test <- 
    tidy(t.test(bream_roach %>% filter(Species == "Bream") %>% pull(Length3),
           bream_roach %>% filter(Species == "Roach") %>% pull(Length3)))
### END SOLUTION

bream_roach_t_test

| estimate | estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` |
| 13.38429 | 38.35429 | 24.97 | 11.70898 | 1.304444e-14 | 40.7105 | 11.07529 | 15.69328 | Welch Two Sample t-test | two.sided |


In [56]:
test_4.6()

[1] "Success!"


<b>Question 4.7: Conclusion of the test</b>
<br>{points: 1}

It would be unlikely to observe a difference of 13.38 in average length if both species had the same mean length. Therefore, at which of the following significance levels do we reject $H_0$?

A. 10%

B. 5%

C. 1%
 
D. 0.01%

E. All the above.

_Assign your answer to an object called `answer4.7`. Your answer should be a single character surrounded by quotes._
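The decision rule behind this question is that we reject $H_0$ at significance level $\alpha$ whenever the p-value is smaller than $\alpha$. A quick check, using the p-value reported by `t.test` above and writing each option as a proportion (0.01% is 0.0001):

```r
# Two-sided p-value reported by t.test above for Bream vs Roach
p_value <- 1.304444e-14

# Candidate significance levels from options A-D, as proportions
alpha <- c(A = 0.10, B = 0.05, C = 0.01, D = 0.0001)

# TRUE means H0 is rejected at that level
p_value < alpha
```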


In [57]:
# answer4.7 <- ...

### BEGIN SOLUTION
answer4.7 <- "E"
### END SOLUTION

In [58]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer4.7"', {
  expect_true(exists("answer4.7"))
})
test_that('Solution should be a single character ("A", "B", "C", "D", or "E")', {
  expect_match(answer4.7, "a|b|c|d|e", ignore.case = TRUE)
})
### BEGIN HIDDEN TESTS
test_that("Solution is incorrect", {
  expect_equal(digest(tolower(answer4.7)), "93a9078c6326f37b481d3e99b60ad987")
})
print("Success!")
### END HIDDEN TESTS

[1] "Success!"


## 5. Soybean: stress vs non-stress

For this question, we will use the `soybean` dataset. First, let's load and preview the data. Since the dataset is small, we can examine the full data.

In [59]:
soybean <- read_csv("data/soybean.csv")
soybean

Parsed with column specification:
cols(
  pair = col_double(),
  stress = col_double(),
  nostress = col_double()
)



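| pair | stress | nostress |
|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` |
| 1 | 205 | 193 |
| 2 | 220 | 168 |
| 3 | 230 | 187 |
| 4 | 234 | 210 |
| 5 | 235 | 207 |
| 6 | 237 | 200 |
| 7 | 246 | 220 |
| 8 | 258 | 219 |
| 9 | 261 | 217 |
| 10 | 269 | 240 |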


Here is the description of the data set, taken directly from the documentation of the `isdals` package:

>An experiment was carried out with 26 soybean plants. The plants were pairwise genetically identical, so there were 13 pairs in total. For each pair, one of the plants was 'stressed' by being shaken daily, whereas the other plant was not shaken. After a period the plants were harvested and the total leaf area was measured for each plant.

We would like to investigate whether the stress induced by shaking the plants daily affects their total leaf area. Let $\mu_1$ be the mean total leaf area of stressed plants, and let $\mu_0$ be the mean total leaf area of unstressed plants. We will test hypotheses about $\mu_1-\mu_0$ at the 5% significance level, with null hypothesis $H_0: \mu_1=\mu_0$. Because the plants come in genetically identical pairs, we test the difference in paired means.
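Because each stressed plant has a genetically identical unstressed partner, the paired test works with the within-pair differences $D_i = X_{1i} - X_{0i}$, $i = 1, \dots, n$, where $X_{1i}$ and $X_{0i}$ are the total leaf areas of the stressed and unstressed plant in pair $i$. It is a one-sample t-test on those differences, assuming they are approximately normal:

$$
\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i, \qquad \text{SE}(\bar{D}) = \frac{S_D}{\sqrt{n}}, \qquad T = \frac{\bar{D} - 0}{S_D/\sqrt{n}} \sim t_{n-1} \text{ under } H_0,
$$

where $S_D$ is the sample standard deviation of the differences.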

<b>Question 5.1: Mean and standard deviation</b>
<br> {points: 1}

Let's create a summary data frame for the `soybean` data set. Your job is to:

1. calculate the difference between the plants in each pair (stressed minus unstressed), and save it in a column named `d`;
2. calculate the mean and standard deviation of the differences, stored in columns `d_bar` and `sd`, respectively;
3. calculate the standard error of the mean difference and store it in a column named `std_error`;
4. finally, store the sample size in a column named `n`.

_Assign your data frame to an object called `soybean_summary`._

In [60]:
# soybean_summary <-
#     soybean %>% 
#     mutate(...) %>% 
#     summarise(n = n(), 
#               d_bar = ..., 
#               sd =...,
#               std_error = ...)

### BEGIN SOLUTION
soybean_summary <-
    soybean %>% 
    mutate(d = stress - nostress) %>% 
    summarise(n = n(), d_bar = mean(d), sd = sd(d), std_error = sd/sqrt(n))
### END SOLUTION

soybean_summary

| n | d_bar | sd | std_error |
|---|---|---|---|
| `<int>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 13 | 32.38462 | 11.38375 | 3.157284 |


In [61]:
test_5.1()

[1] "Success!"


<b> Question 5.2: P-value </b>
<br> {points: 1}

Add another column to `soybean_summary`, named `p_value`, containing the two-sided p-value associated with the observed mean difference `d_bar`.
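A minimal sketch of the same calculation from the rounded values printed in `soybean_summary` above (so the final digits may differ slightly), which also computes the 95% confidence interval that `t.test` reports in Question 5.3:

```r
# Summary values copied from the soybean_summary output above
n <- 13
d_bar <- 32.38462
sd_d <- 11.38375

# Standard error of the mean difference and the paired t statistic
std_error <- sd_d / sqrt(n)
t_stat <- d_bar / std_error

# Two-sided p-value with n - 1 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

# 95% confidence interval for mu_1 - mu_0
ci <- d_bar + c(-1, 1) * qt(0.975, df = n - 1) * std_error

c(t = t_stat, p_value = p_value)
ci
```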

In [62]:

### BEGIN SOLUTION
soybean_summary <-
    soybean_summary %>% 
    mutate(p_value = 2 * pt(d_bar/std_error, df = n-1, lower.tail=FALSE))
### END SOLUTION

soybean_summary

| n | d_bar | sd | std_error | p_value |
|---|---|---|---|---|
| `<int>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 13 | 32.38462 | 11.38375 | 3.157284 | 2.720343e-07 |


In [63]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "soybean_summary"', {
  expect_true(exists("soybean_summary"))
})

test_that("Solution should be a data frame", {
  expect_true("data.frame" %in% class(soybean_summary))
})

expected_colnames <- c("n", "d_bar", "sd", "std_error", "p_value")
given_colnames <- colnames(soybean_summary)
test_that("Data frame does not have the correct columns", {
  expect_equal(length(setdiff(
    union(expected_colnames, given_colnames),
    intersect(expected_colnames, given_colnames)
  )), 0)
})
test_that("Data frame does not contain the correct number of rows", {
  expect_equal(digest(as.integer(nrow(soybean_summary))), "4b5630ee914e848e8d07221556b0a2fb")
})

### BEGIN HIDDEN TESTS

test_that("Data frame does not contain the correct data", {
    expect_equal(digest(as.integer(sum(soybean_summary$p_value) * 10e6)), "c01f179e4b57ab8bd9de309e6d576c48")
})

print("Success!")
### END HIDDEN TESTS

[1] "Success!"


**Question 5.3** 
<br> {points: 1}

Use R's `t.test` function to test the hypotheses. Make sure to use `broom::tidy()` to get a more organized result.

_Assign your data frame to an object called `soybean_t_test`._
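A paired t-test is exactly a one-sample t-test on the within-pair differences, which is why the hand calculation in Question 5.2 and `t.test(..., paired = TRUE)` agree. The sketch below checks this identity using only the ten pairs printed in the data preview above. The full data set has 13 pairs, so these particular numbers will not match the results in this section.

```r
# The ten pairs shown in the data preview above (not the full 13-pair data set)
stress   <- c(205, 220, 230, 234, 235, 237, 246, 258, 261, 269)
nostress <- c(193, 168, 187, 210, 207, 200, 220, 219, 217, 240)

# Paired two-sample call
paired_test <- t.test(stress, nostress, paired = TRUE)

# One-sample t-test on the differences, H0: mean difference = 0
one_sample_test <- t.test(stress - nostress, mu = 0)

# Same statistic (and also the same degrees of freedom and p-value)
c(paired = unname(paired_test$statistic),
  one_sample = unname(one_sample_test$statistic))
```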

In [64]:
### BEGIN SOLUTION
soybean_t_test <- 
    tidy(t.test(soybean$stress, soybean$nostress, paired = TRUE))
### END SOLUTION

soybean_t_test

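| estimate | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` |
| 32.38462 | 10.25711 | 2.720343e-07 | 12 | 25.50548 | 39.26375 | Paired t-test | two.sided |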


In [65]:
test_5.3()

[1] "Success!"


**Question 5.4**
<br>{points: 1}

True or false?

At the 5% significance level, we reject the null hypothesis that the mean total leaf area is the same for stressed and non-stressed soybean plants.

_Assign your answer to an object called `answer5.4`. Your answer should be either "true" or "false", surrounded by quotes._

In [66]:
# answer5.4 <- ...

### BEGIN SOLUTION
answer5.4 <- "true"
### END SOLUTION

In [67]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer5.4"', {
  expect_true(exists("answer5.4"))
})
test_that('Answer should be "true" or "false"', {
  expect_match(answer5.4, "true|false", ignore.case = TRUE)
})
### BEGIN HIDDEN TESTS
test_that("Solution is incorrect", {
  expect_equal(digest(tolower(answer5.4)), "05ca18b596514af73f6880309a21b5dd")
})
print("Success!")
### END HIDDEN TESTS

[1] "Success!"
