# Tutorial 7: Classical Tests Based on Normal and t-Distributions

#### Lecture and Tutorial Learning Goals:
After completing this week's lecture and tutorial work, you will be able to:


1.	Use results from the assumption of normality or the Central Limit Theorem to perform hypothesis testing.
2.	Compare and contrast how estimation and hypothesis testing differ between simulation- and resampling-based approaches and approaches based on the assumption of normality or the Central Limit Theorem.
3.	Write a computer script to perform hypothesis testing based on results from the assumption of normality or the Central Limit Theorem.
4.	Discuss the potential limitations of these methods.

In [None]:
# Run this cell before continuing.
library(cowplot)
library(datateachr)
library(digest)
library(infer)
library(repr)
library(taxyvr)
library(tidyverse)
library(broom)
library(testthat)
source("tests_tutorial_07.R")

## 1. Lotto 6/42

Though the lottery is fair in general, many believe that the distribution of winning numbers is not even. Some lottery enthusiasts spend years studying the pattern of numbers and come up with complex theories to win the grand prize in the next draw. For this question, we will study past winning numbers of Lotto 6/42, which is a lottery in Ireland. Every time, six winning numbers are drawn from a pool of 42 numbers without replacement. You can view the description of the data [here](http://jse.amstat.org/datasets/lotto.txt). Let's load and preview the data.

In [None]:
lotto6_42 <- read.delim("http://jse.amstat.org/datasets/lotto.dat.txt", header = FALSE, sep = "\t", dec = ".")
colnames(lotto6_42) <- c("code", "first", "second", "third", "fourth", "fifth", "sixth")
lotto6_42 <- lotto6_42 %>%
                filter(code == 2) %>% ## to look at actual lottery numbers
                subset(select=-c(code))
head(lotto6_42)

Some claim that the winning numbers are usually double-digit numbers, and that in most cases there is at most one single-digit winning number in each draw. Others claim that single-digit numbers are lucky, and that draws contain more than one single-digit winning number more often than would be expected from a fair lottery. Looking at the first six rows, we may agree with the first claim. But we know better than to rely on anecdotal evidence. Let's test whether the lottery is fair by testing the hypothesis that the probability of having more than one single-digit winning number in a draw differs from what it would be if the lottery were fair. What do the data tell us?

Note that, in total, ${42 - 9 \choose 6}+{42 - 9 \choose 5}{9 \choose 1} = 3,243,592$ out of ${42 \choose 6} = 5,245,786$ games have zero or one single-digit winning number. If the lottery is fair, we expect all the games to have the same chance of occurrence.  
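
The counts quoted above can be checked with base R's `choose()`. This is only a sanity check of the arithmetic in the text; the object names are illustrative and are not part of the assignment.

```r
# Illustrative check of the counts quoted above (object names are hypothetical).
# Zero single-digit numbers: all six come from the 33 double-digit numbers.
zero_single <- choose(42 - 9, 6)
# Exactly one single-digit number: five double-digit numbers and one of the nine single-digit ones.
one_single <- choose(42 - 9, 5) * choose(9, 1)

at_most_one <- zero_single + one_single
total_draws <- choose(42, 6)

at_most_one  # 3243592
total_draws  # 5245786
```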

**Question 1.1**
<br>{points: 3}

Let $p$ be the probability of having at least two single-digit winning numbers in a draw. 
Considering the scenario above, the null hypothesis is:

A. $H_0: p = \frac{2,002,194}{5,245,786}$

B. $H_0: p = \frac{3,243,592}{5,245,786}$

C. $H_0: p = \frac{2,002,194}{3,243,592}$

D. $H_0: p = \frac{9}{42}$

_Assign your answer to an object called `answer1.1`. Your answer should be a single character surrounded by quotes. Also, create a variable `lotto_p0` and assign to it the probability hypothesized under $H_0$._

In [None]:
# answer1.1 <- ...
# lotto_p0 <- ...

# your code here
fail() # No Answer - remove if you provide an answer

lotto_p0

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer1.1"', {
  expect_true(exists("answer1.1"))
})

test_that('Solution should be a single character ("A", "B", "C", or "D")', {
  expect_match(answer1.1, "a|b|c|d", ignore.case = TRUE)
})

test_that('Did not assign answer to an object called "lotto_p0"', {
  expect_true(exists("lotto_p0"))
})

answer_as_numeric <- as.numeric(lotto_p0)
test_that("Solution should be a number", {
  expect_false(is.na(answer_as_numeric))
})

test_that("Solution is incorrect", {
  expect_equal(digest(as.integer(answer_as_numeric * 10e6)), "089e824515c530884e22bea2dd98a447")
})


<b> Question 1.2 </b>
<br> {points: 3}

What is the correct alternative hypothesis?

A. $H_a: p > \frac{2,002,194}{5,245,786}$

B. $H_a: p \neq \frac{2,002,194}{5,245,786}$

C. $H_a: p < \frac{2,002,194}{5,245,786}$

D. $H_a: p > \frac{3,243,592}{5,245,786}$

E. $H_a: p \neq \frac{3,243,592}{5,245,786}$

F. $H_a: p < \frac{3,243,592}{5,245,786}$

G. $H_a: p < \frac{2,002,194}{3,243,592}$

H. $H_a: p < \frac{9}{42}$

<i>Assign your answer to an object called</i> `answer1.2`<i>. Your answer should be a single character surrounded by quotes.</i>

In [None]:
#answer1.2 <-

# your code here
fail() # No Answer - remove if you provide an answer

answer1.2

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer1.2"', {
  expect_true(exists("answer1.2"))
})
test_that('Solution should be a single character ("A", "B", "C", "D", "E", "F", "G", or "H")', {
  expect_match(answer1.2, "a|b|c|d|e|f|g|h", ignore.case = TRUE)
})

<b> Question 1.3</b>
<br> {points: 1}

Calculate the sample proportion, $\hat{p}$, of draws with at least two single-digit numbers. 
(Hint: take a look at the function `rowSums`.)

<i>Assign your answer to an object called `lotto_p_hat`. Your answer should be a single number.</i>
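
As a rough illustration of the `rowSums` hint, the sketch below counts small values per row in a made-up data frame. The object `toy_draws` and its values are hypothetical and are not the lottery data.

```r
# Hypothetical toy draws (NOT the lottery data): three draws of six numbers each.
toy_draws <- data.frame(
  first  = c(3, 12, 1),
  second = c(7, 15, 18),
  third  = c(21, 22, 25),
  fourth = c(30, 31, 33),
  fifth  = c(35, 36, 38),
  sixth  = c(40, 41, 42)
)

# `toy_draws < 10` is a logical matrix; rowSums() counts the TRUEs in each row.
n_single_digit <- rowSums(toy_draws < 10)
n_single_digit  # counts per row: 2, 0, 1

# Taking the mean of a logical vector gives the proportion of TRUEs,
# here the share of rows with at least two single-digit values.
prop_two_or_more <- mean(n_single_digit >= 2)
```

Comparing a data frame with a number works element by element, which is why a single call to `rowSums()` is enough to get one count per draw.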

In [None]:
# lotto_p_hat <- mean(rowSums(... < ...) > 1)

# your code here
fail() # No Answer - remove if you provide an answer

lotto_p_hat

In [None]:
test_1.3()

<b>Question 1.4</b>
<br> {points: 1}

Calculate the standard error of $\hat{p}$ under the null model.

<i>Assign your answer to an object called `lotto_std_error`. Your answer should be a single number.</i>

In [None]:
#lotto_std_error <-

# your code here
fail() # No Answer - remove if you provide an answer

lotto_std_error

In [None]:
test_1.4()

<b>Question 1.5: Calculate p-value</b>
<br> {points: 1}

Check if the assumptions for the CLT hold. Calculate the p-value.

<i>Assign your answer to an object called </i>`lotto_p_value`.
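
A minimal sketch of a one-proportion z-test based on the CLT, using made-up numbers rather than the lottery data. The values of `n`, `p0`, and `p_hat` are hypothetical, and the success-failure rule of thumb ($np_0 \geq 10$ and $n(1-p_0) \geq 10$) is one common check; your course may use a different threshold.

```r
# Hypothetical inputs (not the tutorial's data).
n     <- 100    # number of observations
p0    <- 0.5    # proportion under H0
p_hat <- 0.58   # observed sample proportion

# Rule-of-thumb check for the normal approximation (the threshold of 10 is an assumption).
clt_ok <- (n * p0 >= 10) && (n * (1 - p0) >= 10)

# The standard error under the null model uses p0, not p_hat.
null_se <- sqrt(p0 * (1 - p0) / n)

# Standardised test statistic and two-sided p-value from the standard normal.
z <- (p_hat - p0) / null_se
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)
```

Using `abs(z)` with `lower.tail = FALSE` and doubling the result gives the two-sided p-value regardless of the sign of `z`.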

In [None]:
# lotto_p_value<-

# your code here
fail() # No Answer - remove if you provide an answer

lotto_p_value

In [None]:
test_1.5()

<b>Question 1.6: Conclusion</b>
<br> {points: 3}

What can we conclude from this test? Note that our significance level is 0.05.

A. It would be very unlikely to observe $\hat{p} = 0.375$ in 264 draws, if the true proportion were $p =\frac{2,002,194}{5,245,786}$. Therefore, we reject $H_0$ and conclude that the lottery is not fair.

B. It would be very unlikely to observe $\hat{p} = 0.375$ in 264 draws, if the true proportion were $p =\frac{2,002,194}{5,245,786}$. Therefore, we accept $H_0$ and conclude that the lottery is fair.

C. It is quite plausible to observe $\hat{p} = 0.375$ in 264 draws, if the true proportion were $p =\frac{2,002,194}{5,245,786}$. Therefore, we have enough evidence to reject $H_0$ and conclude that the lottery is not fair.

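D. It is quite plausible to observe $\hat{p} = 0.375$ in 264 draws, if the true proportion were $p =\frac{2,002,194}{5,245,786}$. Therefore, we do not have enough evidence to reject $H_0$ or to conclude that the lottery is not fair.

_Assign your answer to an object called `answer1.6`. Your answer should be a single character surrounded by quotes._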

In [None]:
# answer1.6 <- ...

# your code here
fail() # No Answer - remove if you provide an answer

answer1.6

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer1.6"', {
  expect_true(exists("answer1.6"))
})
test_that('Solution should be a single character ("A", "B", "C", or "D")', {
  expect_match(answer1.6, "a|b|c|d", ignore.case = TRUE)
})

<b>Question 1.7</b>
<br> {points: 3}

Use R's `prop.test` function to test the hypothesis. Make sure to set `correct = FALSE` and to use `broom::tidy()` to get a more organized result.

<i>Assign your answer to an object called </i>`lotto_prop_test`.
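
For orientation, here is a sketch of a one-sample `prop.test` call on made-up counts (58 successes in 100 trials, tested against a hypothetical null value of 0.5). It uses base R only; the counts and object names are assumptions for illustration.

```r
# Hypothetical counts (not the tutorial's data): 58 successes in 100 trials.
result <- prop.test(x = 58, n = 100, p = 0.5,
                    alternative = "two.sided", correct = FALSE)

# Without the continuity correction, the chi-squared statistic is the square of
# the z statistic computed with the null standard error.
z <- (58 / 100 - 0.5) / sqrt(0.5 * (1 - 0.5) / 100)
same_statistic <- all.equal(unname(result$statistic), z^2)
same_p_value   <- all.equal(result$p.value, 2 * pnorm(abs(z), lower.tail = FALSE))
```

Passing `result` to `broom::tidy()` would collect the estimate, statistic, p-value, and confidence interval in a one-row data frame.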

In [None]:
#lotto_prop_test <- ...

# your code here
fail() # No Answer - remove if you provide an answer

lotto_prop_test

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "lotto_prop_test"', {
  expect_true(exists("lotto_prop_test"))
})
test_that("Solution should be a data frame (the tidied output of prop.test)", {
    expect_true("data.frame" %in% class(lotto_prop_test))
})

## 2. Jellyfish

For this question, we will study the length of jellyfish from Hawkesbury River in New South Wales, Australia. First, let's load and preview the data.

In [None]:
jellyfish <- read_csv("data/jellyfish.csv")
head(jellyfish)

Let $\mu$ be the average length of jellyfish in Dangar. We will perform hypothesis testing on $\mu$ at a <b>10% significance level</b>. Our null hypothesis is that the average length $\mu$ is 11 cm, while the alternative hypothesis is that $\mu \neq 11$.

**Question 2.1** 
<br> {points: 1}

Keep only the jellyfish in Dangar and select the `length` column. Also, create a variable `dangar_mu0` to store the hypothesized $\mu$.

_Assign your data frame to an object called `dangar`._

In [None]:
#dangar <- 
#dangar_mu0 <-
# your code here
fail() # No Answer - remove if you provide an answer

head(dangar)

In [None]:
test_2.1()

<b>Question 2.2: Calculate the observed test statistic</b>
<br>{points: 1}

Calculate the observed mean length. 

_Assign your answer to an object called `dangar_x_bar`. Your answer should be a single number._

In [None]:
#dangar_x_bar <-

# your code here
fail() # No Answer - remove if you provide an answer

dangar_x_bar

In [None]:
test_2.2()

<b>Question 2.3: Standard Error</b>
<br> {points: 1}

Calculate the standard error of the test statistic. 

_Assign your answer to an object called `dangar_std_error`. Your answer should be a single number._

In [None]:
#dangar_std_error <-

# your code here
fail() # No Answer - remove if you provide an answer

dangar_std_error

In [None]:
test_2.3()

<b>Question 2.4: P-Value</b>
<br> {points: 3}

Calculate the p-value of the observed test statistic using the t-distribution.

_Assign your answer to an object called `dangar_p_value`. Your answer should be a single number._
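
The sketch below walks through a two-sided one-sample t-test by hand on a made-up sample. The vector `x` and the value `mu0` are hypothetical and are not the jellyfish data.

```r
# Hypothetical sample (not the jellyfish data) and hypothesised mean.
x   <- c(10.2, 11.5, 12.1, 9.8, 11.9, 12.4, 10.7, 11.3)
mu0 <- 11

n       <- length(x)
x_bar   <- mean(x)
std_err <- sd(x) / sqrt(n)  # estimated standard error of the sample mean

# t statistic and two-sided p-value with n - 1 degrees of freedom.
t_stat  <- (x_bar - mu0) / std_err
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)
```

Because the population standard deviation is replaced by the sample value `sd(x)`, the reference distribution is a t-distribution with `n - 1` degrees of freedom rather than the standard normal.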

In [None]:
#dangar_p_value <-

# your code here
fail() # No Answer - remove if you provide an answer

dangar_p_value

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "dangar_p_value"', {
  expect_true(exists("dangar_p_value"))
})
answer_as_numeric <- as.numeric(dangar_p_value)
test_that("Solution should be a number", {
  expect_false(is.na(answer_as_numeric))
})

<b>Question 2.5: Confidence Interval </b>
<br> {points: 3}

Calculate the 90% confidence interval for the population mean $\mu$. 

Use the scaffolding below:

```
dangar_mean_ci <- tibble(
    lower_ci = ...,
    upper_ci = ...
)
``` 

(Hint: the function `qt` can help you).

_Assign your data frame to an object called `dangar_mean_ci`._
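
To illustrate the `qt` hint, here is a sketch of a t-based confidence interval on a made-up sample. The vector `x` is hypothetical, and the result is kept as a plain named vector rather than the tibble the question asks for.

```r
# Hypothetical sample (not the jellyfish data).
x          <- c(10.2, 11.5, 12.1, 9.8, 11.9, 12.4, 10.7, 11.3)
conf_level <- 0.90

n       <- length(x)
std_err <- sd(x) / sqrt(n)

# A 90% interval leaves 5% in each tail, so the quantile is taken at 0.95.
t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)

ci <- c(lower = mean(x) - t_crit * std_err,
        upper = mean(x) + t_crit * std_err)
```

The interval is centred on the sample mean, not on the hypothesized value, and its half-width is the critical value times the standard error.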

In [None]:
#dangar_mean_ci <- tibble(
#    lower_ci = ...,
#    upper_ci = ...
#)

# your code here
fail() # No Answer - remove if you provide an answer

dangar_mean_ci

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "dangar_mean_ci"', {
  expect_true(exists("dangar_mean_ci"))
})
test_that("Solution should be a data frame", {
  expect_true("data.frame" %in% class(dangar_mean_ci))
})

expected_colnames <- c("lower_ci", "upper_ci")
given_colnames <- colnames(dangar_mean_ci)
test_that("Data frame does not have the correct columns", {
  expect_equal(length(setdiff(
    union(expected_colnames, given_colnames),
    intersect(expected_colnames, given_colnames)
  )), 0)
})


**Question 2.6: Conclusion**
<br>{points: 3}

What can we conclude from this test?

A. Since the `p-value` is lower than 10%, we don't have enough evidence to reject $H_0$ and conclude that the average jellyfish length in Dangar is 11 cm.

B. Since the `p-value` is lower than 10%, we have enough evidence to reject $H_0$ and conclude that the average jellyfish length in Dangar is not 11 cm.

_Assign your answer to an object called `answer2.6`. Your answer should be a single character surrounded by quotes._

In [None]:
# answer2.6 <- ...

# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer2.6"', {
  expect_true(exists("answer2.6"))
})
test_that('Solution should be a single character ("A" or "B")', {
  expect_match(answer2.6, "a|b", ignore.case = TRUE)
})

**Question 2.7** 
<br> {points: 3}

Use R's `t.test` function to test $H_0: \mu = 11$ against $H_a: \mu \neq 11$. Make sure to use `broom::tidy()` to get a more organized result.

_Assign your data frame to an object called `dangar_t_test`._

In [None]:
#dangar_t_test <-

# your code here
fail() # No Answer - remove if you provide an answer

dangar_t_test

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "dangar_t_test"', {
  expect_true(exists("dangar_t_test"))
})
test_that("Solution should be the output of t.test", {
    expect_true("data.frame" %in% class(dangar_t_test))
})

## 3. Chronic Bronchial Reaction to Dust

In this section, we will study the `dust` dataset, which records the results of a survey on chronic bronchial reactions among employees of a Munich factory. Let's load and preview the dataset.

In [None]:
dust <- read_csv("data/dust.csv")
head(dust %>% slice_sample(n = 6)) 

Let's assume that all of the employees are exposed to the same dust concentration within the factory. Let $p_{0}$ be the proportion of non-smoking employees (`smoke==0`) that have a chronic bronchial reaction, and let $p_{1}$ be the proportion of smoking employees (`smoke==1`) that have a chronic bronchial reaction. We will perform hypothesis testing on the difference in proportions $p_{1}-p_{0}$ at a 5% significance level.

Our null hypothesis is that smoking is unrelated to chronic bronchitis in this factory ($p_{1}=p_{0}$), while the alternative hypothesis is that there is a difference in proportions of employees having chronic bronchitis between the smokers and the non-smokers ($p_{1} \neq p_{0}$).

In [None]:
smoke_bronch <- dust %>%
                    select("bronch", "smoke") %>%
                    filter(!is.na(smoke))  %>%
                    mutate(smoke = recode(smoke, `0`="non_smoker", `1`="smoker"),
                           bronch = fct_recode(factor(bronch), "no_reaction" = "0", "reaction" = '1'))
head(smoke_bronch)
table(smoke_bronch)

<b>Question 3.1: Observed test statistic</b>
<br> {points: 1}

Calculate the observed test statistic $\hat{p}_1-\hat{p}_0$. 

_Assign your data frame to an object called `dust_summary`. The data frame should contain five columns: `n_non_smoker`,	`n_smoker`,	`p_hat_non_smoker`,	`p_hat_smoker`, and `prop_diff`._

In [None]:
# dust_summary <-
#     smoke_bronch %>% 
#     group_by(...) %>% 
#     summarise(n = ..., 
#               p_hat = ...,  
#              `.groups` = "drop") %>% 
#     pivot_wider(names_from = smoke, values_from = c(n, p_hat)) %>% 
#     mutate(...)

# your code here
fail() # No Answer - remove if you provide an answer

dust_summary

In [None]:
test_3.1()

<b>Question 3.2</b>
<br> {points: 1}

Add a sixth column to `dust_summary`, named `null_std_error`, with the standard error of the test statistic under the null model. 

In [None]:

# your code here
fail() # No Answer - remove if you provide an answer

dust_summary

In [None]:
test_3.2()

<b>Question 3.3</b>
<br> {points: 3}

Add another column to `dust_summary`, named `p_value`, with the p-value of the test statistic calculated using the CLT.
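
A sketch of the two-proportion z-test under the null model, using made-up counts. The counts `x1`, `n1`, `x0`, `n0` are hypothetical and are not the dust data; the calculation uses plain scalars rather than columns of `dust_summary`.

```r
# Hypothetical counts (not the dust data).
x1 <- 45; n1 <- 150   # group 1: number with the outcome, group size
x0 <- 60; n0 <- 300   # group 0: number with the outcome, group size

p_hat1 <- x1 / n1
p_hat0 <- x0 / n0

# Under H0: p1 = p0, both groups share one proportion, estimated by pooling.
p_pool  <- (x1 + x0) / (n1 + n0)
null_se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n0))

# Standardised difference and two-sided p-value from the standard normal.
z       <- (p_hat1 - p_hat0) / null_se
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)
```

Pooling is what makes this the standard error under the null model: if the two proportions are equal, the combined sample gives the best single estimate of that common value.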

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

dust_summary

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
  test_that('Did not assign answer to an object called "dust_summary"', {
    expect_true(exists("dust_summary"))
  })

  test_that("Solution should be a data frame", {
    expect_true("data.frame" %in% class(dust_summary))
  })

  expected_colnames <- c("n_non_smoker", "n_smoker", "p_hat_non_smoker", "p_hat_smoker", "prop_diff", "null_std_error", "p_value")
  given_colnames <- colnames(dust_summary)
  test_that("Data frame does not have the correct columns", {
    expect_equal(length(setdiff(
      union(expected_colnames, given_colnames),
      intersect(expected_colnames, given_colnames)
    )), 0)
  })

  test_that("Data frame does not contain the correct number of rows", {
    expect_equal(digest(as.integer(nrow(dust_summary))), "4b5630ee914e848e8d07221556b0a2fb")
  })

  test_that("Data frame does not contain the correct data", {
    expect_equal(digest(as.integer(sum(dust_summary$n_non_smoker))), "87f3407882014ba8f18016b8e408ad35")
    expect_equal(digest(as.integer(sum(dust_summary$n_smoker))), "76bb48cccd68153482cda2b8ebfecd93")
    expect_equal(digest(as.integer(sum(dust_summary$p_hat_non_smoker) * 10e6)), "e772aa3c54a729784d27e153931979c7")
    expect_equal(digest(as.integer(sum(dust_summary$p_hat_smoker) * 10e6)), "65105150a89cf0f1e8f0b93ae773058d")
    expect_equal(digest(as.integer(sum(dust_summary$prop_diff) * 10e6)), "2b59c114711ded35f1991bdb2cfe5562")
  })


**Question 3.4** 
<br> {points: 1}

Use R's `prop.test` function to test the hypothesis. Make sure to set `correct = FALSE` and to use `broom::tidy()` to get a more organized result.

_Assign your data frame to an object called `dust_prop_test`._
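
As a rough guide to the two-sample form of `prop.test`, the sketch below passes made-up success counts and group sizes as vectors. The numbers are hypothetical and are not the dust data.

```r
# Hypothetical counts (not the dust data): one entry per group.
successes <- c(45, 60)
sizes     <- c(150, 300)

result <- prop.test(x = successes, n = sizes, correct = FALSE)

result$estimate   # the two sample proportions, in the order supplied
result$p.value    # two-sided p-value for H0: equal proportions

# With correct = FALSE, the square root of the chi-squared statistic equals
# the absolute value of the pooled z statistic.
abs_z <- sqrt(unname(result$statistic))
```

The order of the groups in `x` and `n` determines the sign convention of the reported difference, so it is worth checking `result$estimate` before interpreting the confidence interval.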

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

dust_prop_test

In [None]:
test_3.4()

**Question 3.5**
<br>{points: 3}

What can we conclude from this test?

A. It would be very unlikely to observe a difference in proportions $\hat{p}_1 - \hat{p}_0 = 0.104749$ among 325 smokers and 921 non-smokers, if the proportions were the same. Therefore, we reject $H_0$ at 5% significance level and conclude that smokers have a higher chance of having a reaction.

B. It would be very unlikely to observe a difference in proportions $\hat{p}_1 - \hat{p}_0 = 0.104749$ among 325 smokers and 921 non-smokers, if the proportions were the same. Therefore, we accept $H_0$ at 5% significance level and conclude that smokers do not have a higher chance of having a reaction.

C. It is quite plausible to observe a difference in proportions $\hat{p}_1 - \hat{p}_0 = 0.104749$ among 325 smokers and 921 non-smokers, if the proportions were the same. Therefore, we do not reject $H_0$ at 5% significance level and conclude that smokers do not have a higher chance of having a reaction.

D. It is quite plausible to observe a difference in proportions $\hat{p}_1 - \hat{p}_0 = 0.104749$ among 325 smokers and 921 non-smokers, if the proportions were the same. Therefore, we reject $H_0$ at 5% significance level and conclude that smokers have a higher chance of having a reaction.

_Assign your answer to an object called `answer3.5`. Your answer should be a single character surrounded by quotes._

In [None]:
# answer3.5 <- ...

# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer3.5"', {
  expect_true(exists("answer3.5"))
})
test_that('Solution should be a single character ("A", "B", "C", or "D")', {
  expect_match(answer3.5, "a|b|c|d", ignore.case = TRUE)
})

## 4. The difference in lengths of fish species

We will use the `fishcatch` dataset, which records measurements of 159 fish of 7 species caught in Lake Laengelmavesi near Tampere, Finland. More details about the dataset are at http://jse.amstat.org/datasets/fishcatch.txt.

In [None]:
fish <- read.table("http://jse.amstat.org/datasets/fishcatch.dat.txt", header=FALSE)
colnames(fish) <- c("Obs", "Species", "Weight", "Length1", "Length2", "Length3", "Height", "Width", "Sex")
head(fish)

We want to know whether there is a significant difference in the length from the nose to the end of the tail (`Length3`) between two species, Bream (`Species==1`) and Roach (`Species==3`). Let $\mu_1$ be the mean length of Bream, and let $\mu_2$ be the mean length of Roach. We will perform hypothesis testing on $\mu_1 - \mu_2$ at a 5% significance level. The null hypothesis is $\mu_1 = \mu_2$.

<b>Question 4.1: Alternative Hypothesis</b>
<br> {points: 3}

What is an appropriate alternative hypothesis?

A. $\mu_1 > \mu_2$

B. $\mu_1 \neq \mu_2$

C. $\mu_1 < \mu_2$

<i>Your answer should be a string with one letter assigned to the variable </i>`answer4.1`.

In [None]:
# answer4.1 <-

# your code here
fail() # No Answer - remove if you provide an answer
answer4.1

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer4.1"', {
  expect_true(exists("answer4.1"))
})
test_that('Solution should be a single character ("A", "B", or "C")', {
  expect_match(answer4.1, "a|b|c", ignore.case = TRUE)
})

<b>Question 4.2</b>
<br> {points: 1}

Filter the `fish` data set to only keep the Bream (`Species==1`) and Roach (`Species==3`) species, and only keep the `Species` and `Length3` columns. Replace the species code with the name of the species.

_Assign your data frame to an object called `bream_roach`. The data frame should contain two columns: `Species` and `Length3`._

In [None]:
# bream_roach <- 
#     fish %>%
#     select(...) %>%
#     filter(...) %>%
#     mutate(Species = fct_recode(as_factor(Species), Bream='1', Roach='3'))


# your code here
fail() # No Answer - remove if you provide an answer

head(bream_roach)

In [None]:
test_4.2()

<b> Question 4.3: Observed Test Statistic </b>
<br> {points: 1}

In this exercise, you need to:

1. obtain the sample size of `Bream` and `Roach` species. 
2. calculate the sample average and standard deviation of `Length3` for `Bream` species.
3. calculate the sample average and standard deviation of `Length3` for `Roach` species.
4. calculate the observed test statistic $\bar{X}_1 - \bar{X}_2$, where $\bar{X}_1$ and $\bar{X}_2$ are the sample averages of `Length3` for the `Bream` and `Roach` species, respectively. 

_Assign your data frame to an object called `bream_roach_summary`. The data frame should contain seven columns: `n_Bream`,	`n_Roach`, `x_bar_Bream`, `x_bar_Roach`, `sd_Bream`, `sd_Roach`, and `mean_diff`._

In [None]:
# bream_roach_summary <-
#     bream_roach %>% 
#     group_by(...) %>% 
#     summarise(n = ...,
#               x_bar = ...,
#               sd = ...,
#               `.groups` = "drop") %>% 
#     pivot_wider(names_from = Species, values_from = c(n, x_bar, sd)) %>% 
#     mutate(...)


# your code here
fail() # No Answer - remove if you provide an answer

bream_roach_summary

In [None]:
test_4.3()

<b> Question 4.4: Standard error of the test statistic </b>
<br> {points: 1}

Add another column to `bream_roach_summary`, named `null_std_error`, with the standard error of the test statistic $\bar{X}_1 - \bar{X}_2$ under the null model. 

In [None]:

# your code here
fail() # No Answer - remove if you provide an answer

bream_roach_summary

In [None]:
test_4.4()

<b> Question 4.5: Obtaining p-value </b>
<br> {points: 1}

Add another column to `bream_roach_summary`, named `p_value`, with the test's p-value using the t-distribution. To help you, we calculated the approximate degrees of freedom for you: 40.7105 (for details on how to obtain this value, see Question 2.3 from Worksheet_07). 
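
The sketch below shows one way to compute a two-sample t-test by hand, assuming the unequal-variance (Welch) form, which the non-integer degrees of freedom quoted above suggest. The samples `x1` and `x2` are made up and are not the fish data.

```r
# Hypothetical samples (not the fish data).
x1 <- c(30.1, 32.4, 35.0, 33.2, 36.8, 31.5, 34.9)
x2 <- c(22.3, 24.1, 21.8, 25.6, 23.0, 24.7)

n1 <- length(x1)
n2 <- length(x2)

# Estimated variance of each sample mean.
v1 <- var(x1) / n1
v2 <- var(x2) / n2

# Standard error of the difference in sample means.
std_err <- sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# t statistic and two-sided p-value.
t_stat  <- (mean(x1) - mean(x2)) / std_err
p_value <- 2 * pt(abs(t_stat), df = df_welch, lower.tail = FALSE)
```

`pt()` accepts non-integer degrees of freedom, so the approximate value can be passed in directly without rounding.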

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

bream_roach_summary

In [None]:
test_4.5()

**Question 4.6** 
<br> {points: 1}

Use R's `t.test` function to test the hypothesis. Make sure to use `broom::tidy()` to get a more organized result.

_Assign your data frame to an object called `bream_roach_t_test`._

In [None]:
# bream_roach_t_test <-

# your code here
fail() # No Answer - remove if you provide an answer

bream_roach_t_test

In [None]:
test_4.6()

<b>Question 4.7: Conclusion of the test</b>
<br>{points: 3}

It would be unlikely to observe a difference in the average length of 13.38 if both species had the same mean length. Therefore, we reject $H_0$ at the following significance levels:

A. 10%

B. 5%

C. 1%
 
D. 0.01%

E. All the above.

_Assign your answer to an object called `answer4.7`. Your answer should be a single character surrounded by quotes._


In [None]:
# answer4.7 <- ...

# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer4.7"', {
  expect_true(exists("answer4.7"))
})
test_that('Solution should be a single character ("A", "B", "C", "D", or "E")', {
  expect_match(answer4.7, "a|b|c|d|e", ignore.case = TRUE)
})

## 5. Soybean: stress vs non-stress

For this question, we will use the `soybean` dataset. First, let's load and preview the data. Since the dataset is small, we can examine the full data.

In [None]:
soybean <- read_csv("data/soybean.csv")
soybean

Here's the description of the data set taken directly from the package `isdals`' documentation:

>An experiment was carried out with 26 soybean plants. The plants were pairwise genetically identical, so there were 13 pairs in total. For each pair, one of the plants was 'stressed' by being shaken daily, whereas the other plant was not shaken. After a period the plants were harvested and the total leaf area was measured for each plant.

We would like to investigate whether the stress induced by shaking the plants daily affects the total leaf area. Let $\mu_1$ be the mean total leaf area of stressed plants, and let $\mu_0$ be the mean total leaf area of unstressed plants. We will perform hypothesis testing on $\mu_1-\mu_0$ at a 5% significance level, with null hypothesis $\mu_1=\mu_0$. Because the plants are paired, we test the mean of the paired differences.

<b>Question 5.1: Mean and standard deviation</b>
<br> {points: 1}

Let's create a summary data frame for the `soybean` data set. Your job is to:

1. calculate the difference between the pairs and save it in a column named `d`;
2. calculate the mean and standard deviation of the differences, stored in columns `d_bar` and `sd`, respectively; 
3. calculate the standard error of the mean difference and store it in a column named `std_error`;
4. finally, store the sample size in a column named `n`.

_Assign your data frame to an object called `soybean_summary`._

In [None]:
# soybean_summary <-
#     soybean %>% 
#     mutate(...) %>% 
#     summarise(n = n(), 
#               d_bar = ..., 
#               sd = ...,
#               std_error = ...)

# your code here
fail() # No Answer - remove if you provide an answer

soybean_summary

In [None]:
test_5.1()

<b> Question 5.2: P-value </b>
<br> {points: 3}

Add another column to `soybean_summary`, named `p_value`, with the p-value associated with the observed test statistic `d_bar`. 
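
A sketch of a paired t-test computed by hand on made-up measurements. The vectors `treated` and `control` are hypothetical and are not the soybean data; the point is that a paired test reduces to a one-sample t-test on the within-pair differences.

```r
# Hypothetical paired measurements (not the soybean data); position i in each
# vector belongs to the same pair.
treated <- c(12.1, 10.4, 11.8, 9.9, 13.2, 10.7)
control <- c(13.0, 11.1, 11.5, 11.2, 14.0, 11.9)

# Work with the within-pair differences only.
d       <- treated - control
n       <- length(d)
d_bar   <- mean(d)
std_err <- sd(d) / sqrt(n)

# Under H0 the mean difference is 0; use n - 1 degrees of freedom.
t_stat  <- d_bar / std_err
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)
```

Here `n` is the number of pairs, not the number of plants, which is why the degrees of freedom are one less than the number of pairs.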

In [None]:

# your code here
fail() # No Answer - remove if you provide an answer

soybean_summary

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
  test_that('Did not assign answer to an object called "soybean_summary"', {
    expect_true(exists("soybean_summary"))
  })

  test_that("Solution should be a data frame", {
    expect_true("data.frame" %in% class(soybean_summary))
  })

  expected_colnames <- c("n", "d_bar", "sd", "std_error", "p_value")
  given_colnames <- colnames(soybean_summary)
  test_that("Data frame does not have the correct columns", {
    expect_equal(length(setdiff(
      union(expected_colnames, given_colnames),
      intersect(expected_colnames, given_colnames)
    )), 0)
  })
  test_that("Data frame does not contain the correct number of rows", {
    expect_equal(digest(as.integer(nrow(soybean_summary))), "4b5630ee914e848e8d07221556b0a2fb")
  })


**Question 5.3** 
<br> {points: 1}

Use R's `t.test` function to test the hypotheses. Make sure to use `broom::tidy()` to get a more organized result.

_Assign your data frame to an object called `soybean_t_test`._

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

soybean_t_test

In [None]:
test_5.3()

**Question 5.4**
<br>{points: 3}

True or false?

At the 5% significance level, we reject the null hypothesis that the mean total leaf area is the same for stressed and non-stressed soybeans. 

_Assign your answer to an object called `answer5.4`. Your answer should be either "true" or "false", surrounded by quotes._

In [None]:
# answer5.4 <- ...

# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
# Here we check to see if you have given your answer the correct object name
# and if your answer is plausible. However, all other tests have been hidden
# so you can practice deciding when you have the correct answer.
test_that('Did not assign answer to an object called "answer5.4"', {
  expect_true(exists("answer5.4"))
})
test_that('Answer should be "true" or "false"', {
  expect_match(answer5.4, "true|false", ignore.case = TRUE)
})