# Tutorial: Statistical Inference

This tutorial covers the [Statistical inference](https://datasciencebook.ca/inference.html) chapter of the online textbook, which also lists the learning objectives for this tutorial. You should read the textbook chapter before attempting this tutorial.

In [None]:
### Run this cell before continuing.
library(tidyverse)
library(repr)
library(infer)
options(repr.matrix.max.rows = 6)
source('cleanup.R')

### Virtual sampling simulation

In this tutorial you will study samples and sample means generated from different distributions. In real life, we rarely, if ever, have measurements for our entire population. Here, however, we will make simulated datasets so we can understand the behaviour of sample means.

Suppose we had the data science final grades for a large population of students. 

In [None]:
# run this cell to simulate a finite population
set.seed(20201) # DO NOT CHANGE
students_pop <- tibble(grade = (rnorm(mean = 70, sd = 8, n = 10000)))
students_pop

**Question 1.0** 
<br> {points: 1}

Visualize the distribution of the population (`students_pop`) that was just created by plotting a histogram, passing `binwidth = 1` as an argument to `geom_histogram`. Name the plot `pop_dist` and give the x-axis a descriptive label.
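
As a rough illustration of the plotting pattern described above, the sketch below builds a histogram from a made-up dataset. The names `toy_pop`, `height`, and `toy_hist`, the seed, and all numeric values are hypothetical and are not part of this tutorial's data.

```r
library(ggplot2)
library(tibble)

# Hypothetical toy population (not the students data)
set.seed(1)
toy_pop <- tibble(height = rnorm(n = 500, mean = 170, sd = 10))

# Histogram with an explicit bin width, axis label, and title
toy_hist <- ggplot(toy_pop, aes(x = height)) +
    geom_histogram(binwidth = 2) +
    xlab("Height (cm)") +
    ggtitle("Toy population distribution")
toy_hist
```

The bin width is chosen relative to the scale of the variable; a value that suits one dataset may be too coarse or too fine for another.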

In [None]:
options(repr.plot.width = 8, repr.plot.height = 6)
# ... <- ggplot(..., ...) + 
#    geom_...(...) +
#    ... +
#    ggtitle("Population distribution")

# your code here
fail() # No Answer - remove if you provide an answer
pop_dist

In [None]:
library(digest)
stopifnot("type of plot is not correct (if you are using two types of geoms, try flipping the order of the geom objects!)"= setequal(digest(paste(toString(sapply(seq_len(length(pop_dist$layers)), function(i) {c(class(pop_dist$layers[[i]]$geom))[1]})), "2af5c")), "5eeca1fbe1586f9858cac03a42a2c6d8"))
stopifnot("variable x is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(pop_dist$layers)), function(i) {rlang::get_expr(c(pop_dist$layers[[i]]$mapping, pop_dist$mapping)$x)}), as.character))), "2af5c")), "2d35d584ff41f55cc43d60fe2a5a3733"))
stopifnot("variable y is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(pop_dist$layers)), function(i) {rlang::get_expr(c(pop_dist$layers[[i]]$mapping, pop_dist$mapping)$y)}), as.character))), "2af5c")), "53fe06ea3501d8f4f17dedd0d728580f"))
stopifnot("x-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(pop_dist$layers[[1]]$mapping, pop_dist$mapping)$x)!= pop_dist$labels$x), "2af5c")), "318ba09d770af05e32813fdd8d859ac9"))
stopifnot("y-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(pop_dist$layers[[1]]$mapping, pop_dist$mapping)$y)!= pop_dist$labels$y), "2af5c")), "53fe06ea3501d8f4f17dedd0d728580f"))
stopifnot("incorrect colour variable in pop_dist, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(pop_dist$layers[[1]]$mapping, pop_dist$mapping)$colour)), "2af5c")), "53fe06ea3501d8f4f17dedd0d728580f"))
stopifnot("incorrect shape variable in pop_dist, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(pop_dist$layers[[1]]$mapping, pop_dist$mapping)$shape)), "2af5c")), "53fe06ea3501d8f4f17dedd0d728580f"))
stopifnot("the colour label in pop_dist is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(pop_dist$layers[[1]]$mapping, pop_dist$mapping)$colour) != pop_dist$labels$colour), "2af5c")), "53fe06ea3501d8f4f17dedd0d728580f"))
stopifnot("the shape label in pop_dist is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(pop_dist$layers[[1]]$mapping, pop_dist$mapping)$colour) != pop_dist$labels$shape), "2af5c")), "53fe06ea3501d8f4f17dedd0d728580f"))
stopifnot("fill variable in pop_dist is not correct"= setequal(digest(paste(toString(quo_name(pop_dist$mapping$fill)), "2af5c")), "506c90e5392fd49779dcd9b6bc6f2cf4"))
stopifnot("fill label in pop_dist is not informative"= setequal(digest(paste(toString((quo_name(pop_dist$mapping$fill) != pop_dist$labels$fill)), "2af5c")), "53fe06ea3501d8f4f17dedd0d728580f"))
stopifnot("position argument in pop_dist is not correct"= setequal(digest(paste(toString(class(pop_dist$layers[[1]]$position)[1]), "2af5c")), "02b9ff8a71ac55d7874c75ef213f9515"))

stopifnot("pop_dist$data should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(pop_dist$data)), "2af5d")), "f4968873ba787aece546714d4e1a6b85"))
stopifnot("dimensions of pop_dist$data are not correct"= setequal(digest(paste(toString(dim(pop_dist$data)), "2af5d")), "77fbf2851b2c2966c3a5ae64b99a2e5d"))
stopifnot("column names of pop_dist$data are not correct"= setequal(digest(paste(toString(sort(colnames(pop_dist$data))), "2af5d")), "f18a311faf67ee2d4f3209b2a8eea1b1"))
stopifnot("types of columns in pop_dist$data are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(pop_dist$data, class)))), "2af5d")), "58dfdd80038ee65d08f6ed9080730cb3"))
stopifnot("values in one or more numerical columns in pop_dist$data are not correct"= setequal(digest(paste(toString(if (any(sapply(pop_dist$data, is.numeric))) sort(round(sapply(pop_dist$data[, sapply(pop_dist$data, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "2af5d")), "8d36033f4f54f161719d24043ca6517b"))
stopifnot("values in one or more character columns in pop_dist$data are not correct"= setequal(digest(paste(toString(if (any(sapply(pop_dist$data, is.character))) sum(sapply(pop_dist$data[sapply(pop_dist$data, is.character)], function(x) length(unique(x)))) else 0), "2af5d")), "bbf6499f01b827484ab6cb7ffd2788ba"))
stopifnot("values in one or more factor columns in pop_dist$data are not correct"= setequal(digest(paste(toString(if (any(sapply(pop_dist$data, is.factor))) sum(sapply(pop_dist$data[, sapply(pop_dist$data, is.factor)], function(col) length(unique(col)))) else 0), "2af5d")), "bbf6499f01b827484ab6cb7ffd2788ba"))

print('Success!')

**Question 1.1** 
<br> {points: 3}

Describe the distribution above in words. Comment on its shape, center, and spread.

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

**Question 1.2** 
<br> {points: 1}

Use `summarise` to calculate the following population parameters from the `students_pop` population:
- mean (use the `mean` function)
- median (use the `median` function)
- standard deviation (use the `sd` function)

Name this data frame `pop_parameters`. It should have the column names `pop_mean`, `pop_med` and `pop_sd`.
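
The following sketch shows the general shape of a `summarise` call that computes several statistics at once. It uses a small made-up tibble; `toy_data`, `x`, `toy_summary`, and the `toy_*` column names are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical five-value dataset
toy_data <- tibble(x = c(2, 4, 4, 6, 9))

# One row, one column per summary statistic
toy_summary <- toy_data |>
    summarise(toy_mean = mean(x),
              toy_med = median(x),
              toy_sd = sd(x))
toy_summary
```

Each `name = expression` pair inside `summarise` becomes one column of the one-row result.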

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer
pop_parameters

In [None]:
library(digest)
stopifnot("pop_parameters should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(pop_parameters)), "35fe6")), "951532348101e2e828d7743e14999f5a"))
stopifnot("dimensions of pop_parameters are not correct"= setequal(digest(paste(toString(dim(pop_parameters)), "35fe6")), "d8020d88ddaa7e0ec00bd3ad05eff98a"))
stopifnot("column names of pop_parameters are not correct"= setequal(digest(paste(toString(sort(colnames(pop_parameters))), "35fe6")), "ad6bfe90b7b5f13641cee07fcb0f09e5"))
stopifnot("types of columns in pop_parameters are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(pop_parameters, class)))), "35fe6")), "0d6331618e49bc60df5e22bc19a52d69"))
stopifnot("values in one or more numerical columns in pop_parameters are not correct"= setequal(digest(paste(toString(if (any(sapply(pop_parameters, is.numeric))) sort(round(sapply(pop_parameters[, sapply(pop_parameters, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "35fe6")), "496ee2b78a41bdbd468302d7d75a607c"))
stopifnot("values in one or more character columns in pop_parameters are not correct"= setequal(digest(paste(toString(if (any(sapply(pop_parameters, is.character))) sum(sapply(pop_parameters[sapply(pop_parameters, is.character)], function(x) length(unique(x)))) else 0), "35fe6")), "1bf4c6514e5485309876b0029cf330dd"))
stopifnot("values in one or more factor columns in pop_parameters are not correct"= setequal(digest(paste(toString(if (any(sapply(pop_parameters, is.factor))) sum(sapply(pop_parameters[, sapply(pop_parameters, is.factor)], function(col) length(unique(col)))) else 0), "35fe6")), "1bf4c6514e5485309876b0029cf330dd"))

print('Success!')

**Question 1.2.1** 
<br> {points: 1}

Draw one random sample of 5 students from our population of students (`students_pop`). 
Use `summarize` to calculate the mean, median, and standard deviation for these 5 students.

Name this data frame `ests_5`. It should have the column names `mean_5`, `med_5` and `sd_5`. Use the seed `4321`.
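
A minimal sketch of drawing a single random sample and computing point estimates from it, using `rep_sample_n` from the `infer` package on a made-up population. The names `toy_pop`, `value`, `one_sample`, and `toy_ests`, as well as the seed, are hypothetical. Calling `ungroup()` here is one possible choice for dropping the `replicate` grouping that `rep_sample_n` adds; it is not necessarily what the graded answer expects.

```r
library(dplyr)
library(tibble)
library(infer)

# Hypothetical population of eight values
toy_pop <- tibble(value = c(3, 8, 1, 9, 4, 7, 2, 6))

set.seed(1)  # hypothetical seed, for reproducibility only
# reps defaults to 1, so this draws a single sample of 3 rows
one_sample <- rep_sample_n(toy_pop, size = 3)

# Point estimates computed from the sample
toy_ests <- one_sample |>
    ungroup() |>
    summarise(toy_mean = mean(value),
              toy_med = median(value),
              toy_sd = sd(value))
toy_ests
```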

In [None]:
set.seed(4321) # DO NOT CHANGE!
# your code here
fail() # No Answer - remove if you provide an answer
ests_5

In [None]:
library(digest)
stopifnot("ests_5 should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(ests_5)), "8a38c")), "ef25d7eff409df26397ce03aa2cc27de"))
stopifnot("dimensions of ests_5 are not correct"= setequal(digest(paste(toString(dim(ests_5)), "8a38c")), "f83bcd8246d29f3b8e4aaa28786fbfcd"))
stopifnot("column names of ests_5 are not correct"= setequal(digest(paste(toString(sort(colnames(ests_5))), "8a38c")), "ec2fd72f4b6901c180c4156150a0a3e0"))
stopifnot("types of columns in ests_5 are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(ests_5, class)))), "8a38c")), "18e4eff0d41c1cb86bc74a6e02025413"))
stopifnot("values in one or more numerical columns in ests_5 are not correct"= setequal(digest(paste(toString(if (any(sapply(ests_5, is.numeric))) sort(round(sapply(ests_5[, sapply(ests_5, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "8a38c")), "da9ed8946d952147a458d63fdedebad7"))
stopifnot("values in one or more character columns in ests_5 are not correct"= setequal(digest(paste(toString(if (any(sapply(ests_5, is.character))) sum(sapply(ests_5[sapply(ests_5, is.character)], function(x) length(unique(x)))) else 0), "8a38c")), "31b9d832adbea8a600e730d6f1567a38"))
stopifnot("values in one or more factor columns in ests_5 are not correct"= setequal(digest(paste(toString(if (any(sapply(ests_5, is.factor))) sum(sapply(ests_5[, sapply(ests_5, is.factor)], function(col) length(unique(col)))) else 0), "8a38c")), "31b9d832adbea8a600e730d6f1567a38"))

print('Success!')

**Question 1.2.2** Multiple Choice:
<br> {points: 1}

Which of the following is the point estimate for the average final grade for the population of data science students (rounded to two decimal places)? 

A. 70.03 

B. 69.76

C. 73.52

D. 8.05 

*Assign your answer to an object called `answer1.2.2`. Your answer should be a single character surrounded by quotes.*

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
library(digest)
stopifnot("type of answer1.2.2 is not character"= setequal(digest(paste(toString(class(answer1.2.2)), "a4059")), "b6de5a825d157e1a403a418a290a9d7f"))
stopifnot("length of answer1.2.2 is not correct"= setequal(digest(paste(toString(length(answer1.2.2)), "a4059")), "eaa914bd68cef23d2918a3d460df7631"))
stopifnot("value of answer1.2.2 is not correct"= setequal(digest(paste(toString(tolower(answer1.2.2)), "a4059")), "38717fb4eb9fbfd4c9f09ecae862d994"))
stopifnot("letters in string value of answer1.2.2 are correct but case is not correct"= setequal(digest(paste(toString(answer1.2.2), "a4059")), "eb5f719cd828787849b4233166c46eaa"))

print('Success!')

**Question 1.2.3** 
<br> {points: 1}

Draw one random sample of 100 students from our population of students (`students_pop`). Use `summarize` to calculate the mean, median and standard deviation for these 100 students.

Name this data frame `ests_100`. It should have the column names `mean_100`, `med_100` and `sd_100`. Use the seed `4321`.

In [None]:
set.seed(4321) # DO NOT CHANGE!
# your code here
fail() # No Answer - remove if you provide an answer
ests_100

In [None]:
library(digest)
stopifnot("ests_100 should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(ests_100)), "db718")), "876ad9e7264b268c09ce45305055ba1a"))
stopifnot("dimensions of ests_100 are not correct"= setequal(digest(paste(toString(dim(ests_100)), "db718")), "d227f467d7a7a69559e20da5e397b83d"))
stopifnot("column names of ests_100 are not correct"= setequal(digest(paste(toString(sort(colnames(ests_100))), "db718")), "1d4e48352ea9395156c8dd0ac714c9c1"))
stopifnot("types of columns in ests_100 are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(ests_100, class)))), "db718")), "9655b2d35ca9e5f4d96f58016d4b9e12"))
stopifnot("values in one or more numerical columns in ests_100 are not correct"= setequal(digest(paste(toString(if (any(sapply(ests_100, is.numeric))) sort(round(sapply(ests_100[, sapply(ests_100, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "db718")), "43ed7b2a16e9b79042f8e356115bac7b"))
stopifnot("values in one or more character columns in ests_100 are not correct"= setequal(digest(paste(toString(if (any(sapply(ests_100, is.character))) sum(sapply(ests_100[sapply(ests_100, is.character)], function(x) length(unique(x)))) else 0), "db718")), "a3f47c3b2b6712b5c19d123d6f57e9b6"))
stopifnot("values in one or more factor columns in ests_100 are not correct"= setequal(digest(paste(toString(if (any(sapply(ests_100, is.factor))) sum(sapply(ests_100[, sapply(ests_100, is.factor)], function(col) length(unique(col)))) else 0), "db718")), "a3f47c3b2b6712b5c19d123d6f57e9b6"))

print('Success!')

### Exploring the sampling distribution of the sample mean for different populations
We will approximate the sampling distribution of the sample mean by taking 1500 random samples of size 5 from the student population (`students_pop`) and then visualizing the distribution of the sample means.


**Question 1.3** 
<br> {points: 1}

Draw 1500 random samples from our population of students (`students_pop`). Each sample should have 5 observations. Name the data frame `samples` and use the seed `4321`.
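
To show what repeated sampling produces, the sketch below draws a few small samples from a made-up population. The names `toy_pop`, `value`, and `toy_samples`, the seed, and the sizes are hypothetical and deliberately tiny so the output is easy to inspect.

```r
library(tibble)
library(infer)

# Hypothetical population of 50 values
toy_pop <- tibble(value = 1:50)

set.seed(1)  # hypothetical seed
# 3 samples of 4 observations each, stacked into one data frame
toy_samples <- rep_sample_n(toy_pop, size = 4, reps = 3)
toy_samples
dim(toy_samples)
```

The result has `size * reps` rows and an extra `replicate` column that records which sample each row belongs to.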

In [None]:
# ... <- rep_sample_n(..., size = ..., reps = ...)
set.seed(4321) # DO NOT CHANGE!
# your code here
fail() # No Answer - remove if you provide an answer
head(samples)
tail(samples)
dim(samples)

In [None]:
library(digest)
stopifnot("samples should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(samples)), "53644")), "82aad5b174c504f698c7eb5b6477268b"))
stopifnot("dimensions of samples are not correct"= setequal(digest(paste(toString(dim(samples)), "53644")), "f5853c8eee0d47a4ff798671be1b775e"))
stopifnot("column names of samples are not correct"= setequal(digest(paste(toString(sort(colnames(samples))), "53644")), "d0013e6354b4c715911be3fd75f540bb"))
stopifnot("types of columns in samples are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(samples, class)))), "53644")), "84f7c8cf91c30a7a3afd5df213ef62c0"))
stopifnot("values in one or more numerical columns in samples are not correct"= setequal(digest(paste(toString(if (any(sapply(samples, is.numeric))) sort(round(sapply(samples[, sapply(samples, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "53644")), "c40a96e040d422fa1a2f96f07495dc47"))
stopifnot("values in one or more character columns in samples are not correct"= setequal(digest(paste(toString(if (any(sapply(samples, is.character))) sum(sapply(samples[sapply(samples, is.character)], function(x) length(unique(x)))) else 0), "53644")), "674add4269674d459483903ebff34346"))
stopifnot("values in one or more factor columns in samples are not correct"= setequal(digest(paste(toString(if (any(sapply(samples, is.factor))) sum(sapply(samples[, sapply(samples, is.factor)], function(col) length(unique(col)))) else 0), "53644")), "674add4269674d459483903ebff34346"))

print('Success!')

**Question 1.4** 
<br> {points: 1}

Group by the sample replicate number, and then for each sample, calculate the mean. Name the data frame `sample_estimates`. The data frame should have the column names `replicate` and `mean_grade`.
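
The grouping step can be sketched on a made-up population as follows. The names `toy_pop`, `value`, `toy_means`, and `mean_value` are hypothetical. The output of `rep_sample_n` is already grouped by `replicate`, so the explicit `group_by` is included here only to make the step visible.

```r
library(dplyr)
library(tibble)
library(infer)

# Hypothetical population of 50 values
toy_pop <- tibble(value = 1:50)

set.seed(1)  # hypothetical seed
toy_means <- toy_pop |>
    rep_sample_n(size = 4, reps = 3) |>
    group_by(replicate) |>
    summarise(mean_value = mean(value))  # one mean per sample
toy_means
```

The result has one row per replicate, which is the form needed to plot a histogram of sample means.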

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer
head(sample_estimates)
tail(sample_estimates)

In [None]:
library(digest)
stopifnot("sample_estimates should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(sample_estimates)), "b8b25")), "2d408ca6be1f498f982fd9853d5b8d81"))
stopifnot("dimensions of sample_estimates are not correct"= setequal(digest(paste(toString(dim(sample_estimates)), "b8b25")), "0db6b5a4f5c0913349534cbf095b43a2"))
stopifnot("column names of sample_estimates are not correct"= setequal(digest(paste(toString(sort(colnames(sample_estimates))), "b8b25")), "c626bedeef2ad142f74b2f7e64239988"))
stopifnot("types of columns in sample_estimates are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(sample_estimates, class)))), "b8b25")), "861e757d3fcdcddb7cb5ae9786d95da6"))
stopifnot("values in one or more numerical columns in sample_estimates are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_estimates, is.numeric))) sort(round(sapply(sample_estimates[, sapply(sample_estimates, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "b8b25")), "aff07305228faf965309058dff7f1b86"))
stopifnot("values in one or more character columns in sample_estimates are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_estimates, is.character))) sum(sapply(sample_estimates[sapply(sample_estimates, is.character)], function(x) length(unique(x)))) else 0), "b8b25")), "543908673ae6bcce404a28d84002169a"))
stopifnot("values in one or more factor columns in sample_estimates are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_estimates, is.factor))) sum(sapply(sample_estimates[, sapply(sample_estimates, is.factor)], function(col) length(unique(col)))) else 0), "b8b25")), "543908673ae6bcce404a28d84002169a"))

print('Success!')

**Question 1.5** 
<br> {points: 1}

Visualize the distribution of the sample estimates (`sample_estimates`) you just calculated by plotting a histogram, passing `binwidth = 1` as an argument to `geom_histogram`. Name the plot `sampling_distribution_5`, give it a title (using `ggtitle`), and give the x-axis a descriptive label.

In [None]:
options(repr.plot.width = 8, repr.plot.height = 6)
# your code here
fail() # No Answer - remove if you provide an answer
sampling_distribution_5

In [None]:
library(digest)
stopifnot("type of plot is not correct (if you are using two types of geoms, try flipping the order of the geom objects!)"= setequal(digest(paste(toString(sapply(seq_len(length(sampling_distribution_5$layers)), function(i) {c(class(sampling_distribution_5$layers[[i]]$geom))[1]})), "194a")), "f6cdcc9f1ad20ef4268b8da3bf704f64"))
stopifnot("variable x is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(sampling_distribution_5$layers)), function(i) {rlang::get_expr(c(sampling_distribution_5$layers[[i]]$mapping, sampling_distribution_5$mapping)$x)}), as.character))), "194a")), "bd1c2d095cb7789cf55039ad70e75e62"))
stopifnot("variable y is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(sampling_distribution_5$layers)), function(i) {rlang::get_expr(c(sampling_distribution_5$layers[[i]]$mapping, sampling_distribution_5$mapping)$y)}), as.character))), "194a")), "48a1bdaabffa1ef80c59b40894904b8b"))
stopifnot("x-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_5$layers[[1]]$mapping, sampling_distribution_5$mapping)$x)!= sampling_distribution_5$labels$x), "194a")), "9108628c07afcff10b527ccd6fcbf78c"))
stopifnot("y-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_5$layers[[1]]$mapping, sampling_distribution_5$mapping)$y)!= sampling_distribution_5$labels$y), "194a")), "48a1bdaabffa1ef80c59b40894904b8b"))
stopifnot("incorrect colour variable in sampling_distribution_5, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_5$layers[[1]]$mapping, sampling_distribution_5$mapping)$colour)), "194a")), "48a1bdaabffa1ef80c59b40894904b8b"))
stopifnot("incorrect shape variable in sampling_distribution_5, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_5$layers[[1]]$mapping, sampling_distribution_5$mapping)$shape)), "194a")), "48a1bdaabffa1ef80c59b40894904b8b"))
stopifnot("the colour label in sampling_distribution_5 is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_5$layers[[1]]$mapping, sampling_distribution_5$mapping)$colour) != sampling_distribution_5$labels$colour), "194a")), "48a1bdaabffa1ef80c59b40894904b8b"))
stopifnot("the shape label in sampling_distribution_5 is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_5$layers[[1]]$mapping, sampling_distribution_5$mapping)$colour) != sampling_distribution_5$labels$shape), "194a")), "48a1bdaabffa1ef80c59b40894904b8b"))
stopifnot("fill variable in sampling_distribution_5 is not correct"= setequal(digest(paste(toString(quo_name(sampling_distribution_5$mapping$fill)), "194a")), "ae6c1b9637f38b610cebd8e4201263e6"))
stopifnot("fill label in sampling_distribution_5 is not informative"= setequal(digest(paste(toString((quo_name(sampling_distribution_5$mapping$fill) != sampling_distribution_5$labels$fill)), "194a")), "48a1bdaabffa1ef80c59b40894904b8b"))
stopifnot("position argument in sampling_distribution_5 is not correct"= setequal(digest(paste(toString(class(sampling_distribution_5$layers[[1]]$position)[1]), "194a")), "b987e29b0b4940fe7c8a6612e352905d"))

stopifnot("sampling_distribution_5$data should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(sampling_distribution_5$data)), "194b")), "ad9b82f68492d221b8b0f63b49572f27"))
stopifnot("dimensions of sampling_distribution_5$data are not correct"= setequal(digest(paste(toString(dim(sampling_distribution_5$data)), "194b")), "e3665850f5c1e50ad04cd937dccb5dfb"))
stopifnot("column names of sampling_distribution_5$data are not correct"= setequal(digest(paste(toString(sort(colnames(sampling_distribution_5$data))), "194b")), "c88c821b0073942b0e36abcd7abea5b4"))
stopifnot("types of columns in sampling_distribution_5$data are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(sampling_distribution_5$data, class)))), "194b")), "5634a3b5fb62a7d9eb9d4d947b411a71"))
stopifnot("values in one or more numerical columns in sampling_distribution_5$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sampling_distribution_5$data, is.numeric))) sort(round(sapply(sampling_distribution_5$data[, sapply(sampling_distribution_5$data, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "194b")), "cda1e069f30482abfd6bb32c143cd7ce"))
stopifnot("values in one or more character columns in sampling_distribution_5$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sampling_distribution_5$data, is.character))) sum(sapply(sampling_distribution_5$data[sapply(sampling_distribution_5$data, is.character)], function(x) length(unique(x)))) else 0), "194b")), "db6732fd99a68bd10764ecd58ab79979"))
stopifnot("values in one or more factor columns in sampling_distribution_5$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sampling_distribution_5$data, is.factor))) sum(sapply(sampling_distribution_5$data[, sapply(sampling_distribution_5$data, is.factor)], function(col) length(unique(col)))) else 0), "194b")), "db6732fd99a68bd10764ecd58ab79979"))

stopifnot("type of is.character(sampling_distribution_5$labels$title) is not logical"= setequal(digest(paste(toString(class(is.character(sampling_distribution_5$labels$title))), "194c")), "19c2ca441ec9415386ab32d68d8436f1"))
stopifnot("logical value of is.character(sampling_distribution_5$labels$title) is not correct"= setequal(digest(paste(toString(is.character(sampling_distribution_5$labels$title)), "194c")), "44be72a226d98d2946087a51278fc86b"))

print('Success!')

**Question 1.6** 
<br> {points: 3}

Describe the distribution above in words. Comment on its shape, center, and spread. Compare this sampling distribution to the population distribution of students' grades above.

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

**Question 1.6.1** 
<br> {points: 3}

Repeat **Q1.3 - 1.5**, but now for 100 observations:  
1. Draw 1500 random samples from our population of students (`students_pop`). Each sample should have 100 observations. Use the seed `4321`.
2. Group by the sample replicate number, and then for each sample, calculate the mean (call this column `mean_grade_100`).
3. Visualize the distribution of the sample estimates you calculated by plotting a histogram, passing `binwidth = 0.5` as an argument to `geom_histogram`. Name the plot `sampling_distribution_100`, give it a title (using `ggtitle`), and give the x-axis a descriptive label.

***Note: This question has hidden tests. The "success" message here only means that you've created the right objects that will be autograded.***

In [None]:
set.seed(4321) # DO NOT CHANGE!
# your code here
fail() # No Answer - remove if you provide an answer
sampling_distribution_100

In [None]:
library(digest)
stopifnot("type of exists('sampling_distribution_100') is not logical"= setequal(digest(paste(toString(class(exists('sampling_distribution_100'))), "ce6da")), "1413a586ec4ecd27de0adeebcd136d5b"))
stopifnot("logical value of exists('sampling_distribution_100') is not correct"= setequal(digest(paste(toString(exists('sampling_distribution_100')), "ce6da")), "07f48e2616e3ecb3dfdf08ead2655e19"))

print('Success!')

**Question 1.6.2** 
<br> {points: 3}

*Suppose we do not know the parameter value for the population of data science students (as is usually the case in real life).* Compare your point estimates for the population mean from **Q1.2.1 and 1.2.3** above. Which of the two point estimates is more likely to be closer to the actual value of the average final grade of the population of data science students? Briefly explain. (Hint: look at the sampling distributions for your samples of size 5 and size 100 to help you answer this question).
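
One way to reason about this question, offered as a sketch rather than a model answer: for independent draws from a population with standard deviation $\sigma$, the standard deviation of the sample mean (its standard error) for samples of size $n$ is

$$\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$

With a hypothetical $\sigma = 10$ (not the value for `students_pop`), samples of size 4 give $10/\sqrt{4} = 5$, while samples of size 25 give $10/\sqrt{25} = 2$. The formula assumes each sample is small relative to the population, so that drawing without replacement makes little difference.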

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

**Question 1.7**
<br> {points: 3}

Let's create a simulated dataset of the number of cups of coffee drunk per week by our population of students. Run the cell below, then describe the resulting distribution in words, commenting on its shape, center, and spread.

In [None]:
# run this cell to simulate a finite population
set.seed(2020) # DO NOT REMOVE
coffee_data <- tibble(cups = rexp(n = 2000, rate = 0.34))

coffee_dist <- ggplot(coffee_data, aes(cups)) + 
    geom_histogram(binwidth = 0.5) +
    xlab("Cups of coffee per week") +
    ggtitle("Population distribution") +
    theme(text = element_text(size = 20))
coffee_dist

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

**Question 1.8**
<br> {points: 1}

Draw 1500 random samples from `coffee_data`. Each sample should have 5 observations. Assign this data frame to an object called `coffee_samples_5`.

Group by the sample replicate number, and then for each sample, calculate the mean. Name the data frame `coffee_sample_estimates_5`. The data frame should have the column names `replicate` and `coffee_mean_cups_5`.

Finally, create a plot of the sampling distribution called `coffee_sampling_distribution_5`.

> Hint: a binwidth of 1 is a little too big for this data, try a binwidth of 0.5 instead.

In [None]:
set.seed(4321) # DO NOT CHANGE!

# your code here
fail() # No Answer - remove if you provide an answer
coffee_sampling_distribution_5

In [None]:
library(digest)
stopifnot("coffee_samples_5 should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(coffee_samples_5)), "eccbf")), "80c1e906bd1b660e98e11552d1591fde"))
stopifnot("dimensions of coffee_samples_5 are not correct"= setequal(digest(paste(toString(dim(coffee_samples_5)), "eccbf")), "7870666fd2b61b601129f0774fb5dbb6"))
stopifnot("column names of coffee_samples_5 are not correct"= setequal(digest(paste(toString(sort(colnames(coffee_samples_5))), "eccbf")), "d16147ee2609cd875e46479737c9dd8b"))
stopifnot("types of columns in coffee_samples_5 are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(coffee_samples_5, class)))), "eccbf")), "032012da900f7176cccae62be5ee5f59"))
stopifnot("values in one or more numerical columns in coffee_samples_5 are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_samples_5, is.numeric))) sort(round(sapply(coffee_samples_5[, sapply(coffee_samples_5, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "eccbf")), "2970d5f057e2f74f2b39d6d6a9a9aeab"))
stopifnot("values in one or more character columns in coffee_samples_5 are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_samples_5, is.character))) sum(sapply(coffee_samples_5[sapply(coffee_samples_5, is.character)], function(x) length(unique(x)))) else 0), "eccbf")), "ef99664bac3628d26c7f85c735b2cd18"))
stopifnot("values in one or more factor columns in coffee_samples_5 are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_samples_5, is.factor))) sum(sapply(coffee_samples_5[, sapply(coffee_samples_5, is.factor)], function(col) length(unique(col)))) else 0), "eccbf")), "ef99664bac3628d26c7f85c735b2cd18"))

stopifnot("coffee_sample_estimates_5 should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(coffee_sample_estimates_5)), "eccc0")), "fca6b2a0ec735e748bdb971113ffb0d1"))
stopifnot("dimensions of coffee_sample_estimates_5 are not correct"= setequal(digest(paste(toString(dim(coffee_sample_estimates_5)), "eccc0")), "0fe7ce94655686c6166c66b01c8a5dc9"))
stopifnot("column names of coffee_sample_estimates_5 are not correct"= setequal(digest(paste(toString(sort(colnames(coffee_sample_estimates_5))), "eccc0")), "8566bc90c1f5df8f08e70484068bfacc"))
stopifnot("types of columns in coffee_sample_estimates_5 are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(coffee_sample_estimates_5, class)))), "eccc0")), "84d2965293c165b4b78175475199c6a8"))
stopifnot("values in one or more numerical columns in coffee_sample_estimates_5 are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_sample_estimates_5, is.numeric))) sort(round(sapply(coffee_sample_estimates_5[, sapply(coffee_sample_estimates_5, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "eccc0")), "1800109deab3d5f2503465e91b2702df"))
stopifnot("values in one or more character columns in coffee_sample_estimates_5 are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_sample_estimates_5, is.character))) sum(sapply(coffee_sample_estimates_5[sapply(coffee_sample_estimates_5, is.character)], function(x) length(unique(x)))) else 0), "eccc0")), "66d402522b60d0c5e2341134eef6a612"))
stopifnot("values in one or more factor columns in coffee_sample_estimates_5 are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_sample_estimates_5, is.factor))) sum(sapply(coffee_sample_estimates_5[, sapply(coffee_sample_estimates_5, is.factor)], function(col) length(unique(col)))) else 0), "eccc0")), "66d402522b60d0c5e2341134eef6a612"))

stopifnot("type of plot is not correct (if you are using two types of geoms, try flipping the order of the geom objects!)"= setequal(digest(paste(toString(sapply(seq_len(length(coffee_sampling_distribution_5$layers)), function(i) {c(class(coffee_sampling_distribution_5$layers[[i]]$geom))[1]})), "eccc1")), "5f35759b44ee47a40d66f3bef50abf88"))
stopifnot("variable x is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(coffee_sampling_distribution_5$layers)), function(i) {rlang::get_expr(c(coffee_sampling_distribution_5$layers[[i]]$mapping, coffee_sampling_distribution_5$mapping)$x)}), as.character))), "eccc1")), "c791f6951bea0b6813f756fdaac2d285"))
stopifnot("variable y is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(coffee_sampling_distribution_5$layers)), function(i) {rlang::get_expr(c(coffee_sampling_distribution_5$layers[[i]]$mapping, coffee_sampling_distribution_5$mapping)$y)}), as.character))), "eccc1")), "3a95eb8a356f60ec3fc91472ba5cda31"))
stopifnot("x-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(coffee_sampling_distribution_5$layers[[1]]$mapping, coffee_sampling_distribution_5$mapping)$x)!= coffee_sampling_distribution_5$labels$x), "eccc1")), "de4e3dbe04761d62e8b127406496f942"))
stopifnot("y-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(coffee_sampling_distribution_5$layers[[1]]$mapping, coffee_sampling_distribution_5$mapping)$y)!= coffee_sampling_distribution_5$labels$y), "eccc1")), "3a95eb8a356f60ec3fc91472ba5cda31"))
stopifnot("incorrect colour variable in coffee_sampling_distribution_5, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(coffee_sampling_distribution_5$layers[[1]]$mapping, coffee_sampling_distribution_5$mapping)$colour)), "eccc1")), "3a95eb8a356f60ec3fc91472ba5cda31"))
stopifnot("incorrect shape variable in coffee_sampling_distribution_5, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(coffee_sampling_distribution_5$layers[[1]]$mapping, coffee_sampling_distribution_5$mapping)$shape)), "eccc1")), "3a95eb8a356f60ec3fc91472ba5cda31"))
stopifnot("the colour label in coffee_sampling_distribution_5 is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(coffee_sampling_distribution_5$layers[[1]]$mapping, coffee_sampling_distribution_5$mapping)$colour) != coffee_sampling_distribution_5$labels$colour), "eccc1")), "3a95eb8a356f60ec3fc91472ba5cda31"))
stopifnot("the shape label in coffee_sampling_distribution_5 is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(coffee_sampling_distribution_5$layers[[1]]$mapping, coffee_sampling_distribution_5$mapping)$shape) != coffee_sampling_distribution_5$labels$shape), "eccc1")), "3a95eb8a356f60ec3fc91472ba5cda31"))
stopifnot("fill variable in coffee_sampling_distribution_5 is not correct"= setequal(digest(paste(toString(quo_name(coffee_sampling_distribution_5$mapping$fill)), "eccc1")), "e9182bcaae9b7c059ad01180b2ce005b"))
stopifnot("fill label in coffee_sampling_distribution_5 is not informative"= setequal(digest(paste(toString((quo_name(coffee_sampling_distribution_5$mapping$fill) != coffee_sampling_distribution_5$labels$fill)), "eccc1")), "3a95eb8a356f60ec3fc91472ba5cda31"))
stopifnot("position argument in coffee_sampling_distribution_5 is not correct"= setequal(digest(paste(toString(class(coffee_sampling_distribution_5$layers[[1]]$position)[1]), "eccc1")), "c9f701f7577122e6ad26d413d5d58c36"))

stopifnot("coffee_sampling_distribution_5$data should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(coffee_sampling_distribution_5$data)), "eccc2")), "32f691a3b331aa0ee677f730b86380f6"))
stopifnot("dimensions of coffee_sampling_distribution_5$data are not correct"= setequal(digest(paste(toString(dim(coffee_sampling_distribution_5$data)), "eccc2")), "33d2f9443f2f62cb094d707d680e08f9"))
stopifnot("column names of coffee_sampling_distribution_5$data are not correct"= setequal(digest(paste(toString(sort(colnames(coffee_sampling_distribution_5$data))), "eccc2")), "b2f14d16449b0eabcdd9ead37bdedd81"))
stopifnot("types of columns in coffee_sampling_distribution_5$data are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(coffee_sampling_distribution_5$data, class)))), "eccc2")), "fb77731395d98f2dfba211730ac9819e"))
stopifnot("values in one or more numerical columns in coffee_sampling_distribution_5$data are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_sampling_distribution_5$data, is.numeric))) sort(round(sapply(coffee_sampling_distribution_5$data[, sapply(coffee_sampling_distribution_5$data, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "eccc2")), "3d2a31b7a80fa127177ce8ce9395eeb8"))
stopifnot("values in one or more character columns in coffee_sampling_distribution_5$data are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_sampling_distribution_5$data, is.character))) sum(sapply(coffee_sampling_distribution_5$data[sapply(coffee_sampling_distribution_5$data, is.character)], function(x) length(unique(x)))) else 0), "eccc2")), "91db80112f34a6e7e1ce5fe8403a7f97"))
stopifnot("values in one or more factor columns in coffee_sampling_distribution_5$data are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_sampling_distribution_5$data, is.factor))) sum(sapply(coffee_sampling_distribution_5$data[, sapply(coffee_sampling_distribution_5$data, is.factor)], function(col) length(unique(col)))) else 0), "eccc2")), "91db80112f34a6e7e1ce5fe8403a7f97"))

stopifnot("type of is.character(coffee_sampling_distribution_5$labels$title) is not logical"= setequal(digest(paste(toString(class(is.character(coffee_sampling_distribution_5$labels$title))), "eccc3")), "3643482ed82fdfcd93fb9065de87115d"))
stopifnot("logical value of is.character(coffee_sampling_distribution_5$labels$title) is not correct"= setequal(digest(paste(toString(is.character(coffee_sampling_distribution_5$labels$title)), "eccc3")), "5f75655c3d5bc5dfb32c71764eadd9cd"))

print('Success!')

**Question 1.9** 
<br> {points: 3}

Describe the distribution above in words: comment on its shape, its center, and how spread out it is. Then compare this sampling distribution to the population distribution above.

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

**Question 2.0** 
<br> {points: 1}

Draw 1500 random samples from `coffee_data`. Each sample should have 30 observations. Assign this data frame to an object called `coffee_samples_30`.

Group by the sample replicate number, and then for each sample, calculate the mean. Name the data frame `coffee_sample_estimates_30`. The data frame should have the column names `replicate` and `coffee_mean_cups_30`.

Finally, create a plot of the sampling distribution called `coffee_sampling_distribution_30`.

> Hint: use `xlim` to control the x-axis limits so that they are similar to those in the histogram above. This will make it easier to compare this histogram with that one.
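
The same three steps (draw repeated samples, summarize each sample, plot the estimates) can be sketched on a made-up population. This is only an illustration under stated assumptions: `toy_pop`, `toy_samples`, `toy_estimates`, `toy_mean`, the seed, the sample sizes, and the axis limits are all hypothetical and are not the values this question asks for.

```r
library(tidyverse)
library(infer)

set.seed(1) # arbitrary seed, used only for this illustration

# Hypothetical right-skewed toy population (not the worksheet's coffee_data)
toy_pop <- tibble(value = rexp(n = 5000, rate = 1 / 3))

# Draw 200 samples of 10 observations each;
# rep_sample_n() adds a `replicate` column identifying each sample
toy_samples <- rep_sample_n(toy_pop, size = 10, reps = 200)

# Compute one mean per sample
toy_estimates <- toy_samples %>%
    group_by(replicate) %>%
    summarize(toy_mean = mean(value))

# Histogram of the sample means, with the x-axis limits fixed via xlim()
# so that plots for different sample sizes share the same scale
toy_sampling_distribution <- ggplot(toy_estimates, aes(x = toy_mean)) +
    geom_histogram(bins = 30) +
    xlab("Sample mean of value") +
    xlim(0, 10) +
    ggtitle("Sampling distribution of toy sample means (n = 10)")
toy_sampling_distribution
```

Fixing the limits with `xlim` means observations outside the range are dropped (ggplot2 prints a warning when that happens), so choose limits wide enough to cover both histograms you want to compare.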

In [None]:
set.seed(4321) # DO NOT CHANGE!

# your code here
fail() # No Answer - remove if you provide an answer
coffee_sampling_distribution_30

In [None]:
library(digest)
stopifnot("coffee_samples_30 should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(coffee_samples_30)), "4cfac")), "c83e699a536f8e1bcb819767c3dffaf9"))
stopifnot("dimensions of coffee_samples_30 are not correct"= setequal(digest(paste(toString(dim(coffee_samples_30)), "4cfac")), "964c47b99b6f1ebd64b273b9f3dbb711"))
stopifnot("column names of coffee_samples_30 are not correct"= setequal(digest(paste(toString(sort(colnames(coffee_samples_30))), "4cfac")), "6d4e8395355ab9a8779e2644786b2bbc"))
stopifnot("types of columns in coffee_samples_30 are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(coffee_samples_30, class)))), "4cfac")), "01180166042829a71827f9551c8584c7"))
stopifnot("values in one or more numerical columns in coffee_samples_30 are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_samples_30, is.numeric))) sort(round(sapply(coffee_samples_30[, sapply(coffee_samples_30, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "4cfac")), "7092ef75e92465325950d0d6f5f2015f"))
stopifnot("values in one or more character columns in coffee_samples_30 are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_samples_30, is.character))) sum(sapply(coffee_samples_30[sapply(coffee_samples_30, is.character)], function(x) length(unique(x)))) else 0), "4cfac")), "20c41b829968e8a674a03493584eeb38"))
stopifnot("values in one or more factor columns in coffee_samples_30 are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_samples_30, is.factor))) sum(sapply(coffee_samples_30[, sapply(coffee_samples_30, is.factor)], function(col) length(unique(col)))) else 0), "4cfac")), "20c41b829968e8a674a03493584eeb38"))

stopifnot("coffee_sample_estimates_30 should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(coffee_sample_estimates_30)), "4cfad")), "7b76d9cf5ce86469c24eb35ef0c81cb0"))
stopifnot("dimensions of coffee_sample_estimates_30 are not correct"= setequal(digest(paste(toString(dim(coffee_sample_estimates_30)), "4cfad")), "0a819cf77ac5f14f7293f384573f8de4"))
stopifnot("column names of coffee_sample_estimates_30 are not correct"= setequal(digest(paste(toString(sort(colnames(coffee_sample_estimates_30))), "4cfad")), "7be19bdc916d1847a0022d49d866df64"))
stopifnot("types of columns in coffee_sample_estimates_30 are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(coffee_sample_estimates_30, class)))), "4cfad")), "d97886b6c2acecf91cb5ec648033a233"))
stopifnot("values in one or more numerical columns in coffee_sample_estimates_30 are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_sample_estimates_30, is.numeric))) sort(round(sapply(coffee_sample_estimates_30[, sapply(coffee_sample_estimates_30, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "4cfad")), "cd3cd563b6b7024a4864ba13f99c5102"))
stopifnot("values in one or more character columns in coffee_sample_estimates_30 are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_sample_estimates_30, is.character))) sum(sapply(coffee_sample_estimates_30[sapply(coffee_sample_estimates_30, is.character)], function(x) length(unique(x)))) else 0), "4cfad")), "c95de78a71dc6caa5ef3c06eea91045c"))
stopifnot("values in one or more factor columns in coffee_sample_estimates_30 are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_sample_estimates_30, is.factor))) sum(sapply(coffee_sample_estimates_30[, sapply(coffee_sample_estimates_30, is.factor)], function(col) length(unique(col)))) else 0), "4cfad")), "c95de78a71dc6caa5ef3c06eea91045c"))

stopifnot("type of plot is not correct (if you are using two types of geoms, try flipping the order of the geom objects!)"= setequal(digest(paste(toString(sapply(seq_len(length(coffee_sampling_distribution_30$layers)), function(i) {c(class(coffee_sampling_distribution_30$layers[[i]]$geom))[1]})), "4cfae")), "53613c596abe4c660c90c16c85a401d1"))
stopifnot("variable x is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(coffee_sampling_distribution_30$layers)), function(i) {rlang::get_expr(c(coffee_sampling_distribution_30$layers[[i]]$mapping, coffee_sampling_distribution_30$mapping)$x)}), as.character))), "4cfae")), "41baf5978b5ab0cde5c622bd26f3baa4"))
stopifnot("variable y is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(coffee_sampling_distribution_30$layers)), function(i) {rlang::get_expr(c(coffee_sampling_distribution_30$layers[[i]]$mapping, coffee_sampling_distribution_30$mapping)$y)}), as.character))), "4cfae")), "0ff3d0a60a22cd3162eafdb46dd46a12"))
stopifnot("x-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(coffee_sampling_distribution_30$layers[[1]]$mapping, coffee_sampling_distribution_30$mapping)$x)!= coffee_sampling_distribution_30$labels$x), "4cfae")), "5716d46a14bf91774c862b591c1c03d5"))
stopifnot("y-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(coffee_sampling_distribution_30$layers[[1]]$mapping, coffee_sampling_distribution_30$mapping)$y)!= coffee_sampling_distribution_30$labels$y), "4cfae")), "0ff3d0a60a22cd3162eafdb46dd46a12"))
stopifnot("incorrect colour variable in coffee_sampling_distribution_30, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(coffee_sampling_distribution_30$layers[[1]]$mapping, coffee_sampling_distribution_30$mapping)$colour)), "4cfae")), "0ff3d0a60a22cd3162eafdb46dd46a12"))
stopifnot("incorrect shape variable in coffee_sampling_distribution_30, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(coffee_sampling_distribution_30$layers[[1]]$mapping, coffee_sampling_distribution_30$mapping)$shape)), "4cfae")), "0ff3d0a60a22cd3162eafdb46dd46a12"))
stopifnot("the colour label in coffee_sampling_distribution_30 is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(coffee_sampling_distribution_30$layers[[1]]$mapping, coffee_sampling_distribution_30$mapping)$colour) != coffee_sampling_distribution_30$labels$colour), "4cfae")), "0ff3d0a60a22cd3162eafdb46dd46a12"))
stopifnot("the shape label in coffee_sampling_distribution_30 is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(coffee_sampling_distribution_30$layers[[1]]$mapping, coffee_sampling_distribution_30$mapping)$shape) != coffee_sampling_distribution_30$labels$shape), "4cfae")), "0ff3d0a60a22cd3162eafdb46dd46a12"))
stopifnot("fill variable in coffee_sampling_distribution_30 is not correct"= setequal(digest(paste(toString(quo_name(coffee_sampling_distribution_30$mapping$fill)), "4cfae")), "2a27d7a283d35712b0b9b6808ae8b027"))
stopifnot("fill label in coffee_sampling_distribution_30 is not informative"= setequal(digest(paste(toString((quo_name(coffee_sampling_distribution_30$mapping$fill) != coffee_sampling_distribution_30$labels$fill)), "4cfae")), "0ff3d0a60a22cd3162eafdb46dd46a12"))
stopifnot("position argument in coffee_sampling_distribution_30 is not correct"= setequal(digest(paste(toString(class(coffee_sampling_distribution_30$layers[[1]]$position)[1]), "4cfae")), "886506c1678babffd783afc38b95a28e"))

stopifnot("coffee_sampling_distribution_30$data should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(coffee_sampling_distribution_30$data)), "4cfaf")), "e480c43ce5c4c0e12b36acbef75ec8a4"))
stopifnot("dimensions of coffee_sampling_distribution_30$data are not correct"= setequal(digest(paste(toString(dim(coffee_sampling_distribution_30$data)), "4cfaf")), "f642045877f22fa7f466b2f256411c43"))
stopifnot("column names of coffee_sampling_distribution_30$data are not correct"= setequal(digest(paste(toString(sort(colnames(coffee_sampling_distribution_30$data))), "4cfaf")), "bfa96c87389de9898a622f79318d04cd"))
stopifnot("types of columns in coffee_sampling_distribution_30$data are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(coffee_sampling_distribution_30$data, class)))), "4cfaf")), "803762b9fa066b56cbc888d1e9b62f47"))
stopifnot("values in one or more numerical columns in coffee_sampling_distribution_30$data are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_sampling_distribution_30$data, is.numeric))) sort(round(sapply(coffee_sampling_distribution_30$data[, sapply(coffee_sampling_distribution_30$data, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "4cfaf")), "a970ca2837ef19ae729b3c56d941dc18"))
stopifnot("values in one or more character columns in coffee_sampling_distribution_30$data are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_sampling_distribution_30$data, is.character))) sum(sapply(coffee_sampling_distribution_30$data[sapply(coffee_sampling_distribution_30$data, is.character)], function(x) length(unique(x)))) else 0), "4cfaf")), "ea923c37773f1f11729231b40f33a481"))
stopifnot("values in one or more factor columns in coffee_sampling_distribution_30$data are not correct"= setequal(digest(paste(toString(if (any(sapply(coffee_sampling_distribution_30$data, is.factor))) sum(sapply(coffee_sampling_distribution_30$data[, sapply(coffee_sampling_distribution_30$data, is.factor)], function(col) length(unique(col)))) else 0), "4cfaf")), "ea923c37773f1f11729231b40f33a481"))

stopifnot("type of is.character(coffee_sampling_distribution_30$labels$title) is not logical"= setequal(digest(paste(toString(class(is.character(coffee_sampling_distribution_30$labels$title))), "4cfb0")), "ec1e932af1638659a19c4851c7117643"))
stopifnot("logical value of is.character(coffee_sampling_distribution_30$labels$title) is not correct"= setequal(digest(paste(toString(is.character(coffee_sampling_distribution_30$labels$title)), "4cfb0")), "34fe66557b4aeabc5de2b4e8d81b65a9"))

stopifnot("type of !is.null(ggplot_build(coffee_sampling_distribution_30)$layout$panel_scales_x[[1]]$limits) is not logical"= setequal(digest(paste(toString(class(!is.null(ggplot_build(coffee_sampling_distribution_30)$layout$panel_scales_x[[1]]$limits))), "4cfb1")), "ccfe98d987d26353500fa0ef9de4b5d1"))
stopifnot("logical value of !is.null(ggplot_build(coffee_sampling_distribution_30)$layout$panel_scales_x[[1]]$limits) is not correct"= setequal(digest(paste(toString(!is.null(ggplot_build(coffee_sampling_distribution_30)$layout$panel_scales_x[[1]]$limits)), "4cfb1")), "db1963bb844af324693805000fd6dd8c"))

print('Success!')

**Question 2.1** 
<br> {points: 3}

Describe the distribution above in words: comment on its shape, its center, and how spread out it is. Then compare this sampling distribution (samples of size 30) to the sampling distribution for samples of size 5.

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

In [None]:
source('cleanup.R')