# Worksheet 11 - Introduction to Statistical Inference

### Lecture and Tutorial Learning Goals:

After completing this week's lecture and tutorial work, you will be able to:
- Describe real-world examples of questions that can be answered with statistical inference methods.
- Name common population parameters (e.g., mean, proportion, median, variance, standard deviation) that are often estimated using sample data, and use computation to estimate these.
- Define the following statistical sampling terms (population, sample, population parameter, point estimate, sampling distribution).
- Explain the difference between a population parameter and a sample point estimate.
- Use computation to draw random samples from a finite population.
- Use computation to create a sampling distribution from a finite population.
- Describe how sample size influences the sampling distribution.

This worksheet covers parts of [the Inference chapter](https://datasciencebook.ca/inference.html) of the online textbook. You should read this chapter before attempting the worksheet.

In [None]:
### Run this cell before continuing.
library(tidyverse)
library(repr)
library(infer)
library(cowplot)
options(repr.matrix.max.rows = 6)
source('cleanup.R')

**Question 1.1** Matching:
<br> {points: 1}

Read the mixed-up table below and assign each variable in the code cell below the number that matches the term to its correct definition. Do not put quotation marks around the number or include words in the answer; the assigned values must be numbers.

| Terms |  Definitions |
|----------------|------------|
| <p align="left">point estimate | <p align="left">1. the entire set of entities/objects of interest |
| <p align="left">population | <p align="left">2. selecting a subset of observations from a population where each observation is equally likely to be selected at any point during the selection process|
| <p align="left">random sampling | <p align="left">3. a numerical summary value about the population |
| <p align="left">representative sampling | <p align="left">4. a distribution of point estimates, where each point estimate was calculated from a different random sample from the same population |
| <p align="left">population parameter | <p align="left">5. a collection of observations from a population |
| <p align="left">sample |  <p align="left">6. a single number calculated from a random sample that estimates an unknown population parameter of interest |
| <p align="left">observation | <p align="left">7. selecting a subset of observations from a population where the sample’s characteristics are a good representation of the population’s characteristics |
| <p align="left">sampling distribution | <p align="left">8. a quantity or a quality (or set of these) we collect from a given entity/object |

In [None]:
point_estimate <- NULL
population <- NULL
random_sampling <- NULL
representative_sampling <- NULL
population_parameter <- NULL
sample <- NULL
observation <- NULL
sampling_distribution <- NULL

# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
library(digest)
stopifnot("type of point_estimate is not numeric"= setequal(digest(paste(toString(class(point_estimate)), "9b235")), "781ed9ff784a2f7eba5148838e0a6689"))
stopifnot("value of point_estimate is not correct (rounded to 2 decimal places)"= setequal(digest(paste(toString(round(point_estimate, 2)), "9b235")), "8aa91a1ff866cd59356d3efb1bab8238"))
stopifnot("length of point_estimate is not correct"= setequal(digest(paste(toString(length(point_estimate)), "9b235")), "dc04dba09f286a3fcdc10fab298ac846"))
stopifnot("values of point_estimate are not correct"= setequal(digest(paste(toString(sort(point_estimate)), "9b235")), "8aa91a1ff866cd59356d3efb1bab8238"))

stopifnot("type of population is not numeric"= setequal(digest(paste(toString(class(population)), "9b236")), "5532024e26640aa2dfafcf8e1d314281"))
stopifnot("value of population is not correct (rounded to 2 decimal places)"= setequal(digest(paste(toString(round(population, 2)), "9b236")), "f8ff610d8f13df6b8182cccf6941f82e"))
stopifnot("length of population is not correct"= setequal(digest(paste(toString(length(population)), "9b236")), "f8ff610d8f13df6b8182cccf6941f82e"))
stopifnot("values of population are not correct"= setequal(digest(paste(toString(sort(population)), "9b236")), "f8ff610d8f13df6b8182cccf6941f82e"))

stopifnot("type of random_sampling is not numeric"= setequal(digest(paste(toString(class(random_sampling)), "9b237")), "4067661c5db4f7e82275342f83861c6e"))
stopifnot("value of random_sampling is not correct (rounded to 2 decimal places)"= setequal(digest(paste(toString(round(random_sampling, 2)), "9b237")), "a1e27ac484d1d714a302f07c2812de9a"))
stopifnot("length of random_sampling is not correct"= setequal(digest(paste(toString(length(random_sampling)), "9b237")), "0180179ac103a97c8299cef7ac335e52"))
stopifnot("values of random_sampling are not correct"= setequal(digest(paste(toString(sort(random_sampling)), "9b237")), "a1e27ac484d1d714a302f07c2812de9a"))

stopifnot("type of representative_sampling is not numeric"= setequal(digest(paste(toString(class(representative_sampling)), "9b238")), "49e4823dacd4fa53a9fe6c1c7e467d6e"))
stopifnot("value of representative_sampling is not correct (rounded to 2 decimal places)"= setequal(digest(paste(toString(round(representative_sampling, 2)), "9b238")), "cdd34d048e854877b7725cf210b7bb0b"))
stopifnot("length of representative_sampling is not correct"= setequal(digest(paste(toString(length(representative_sampling)), "9b238")), "2aeb13d4bc5c27ceb602ac74f28d1880"))
stopifnot("values of representative_sampling are not correct"= setequal(digest(paste(toString(sort(representative_sampling)), "9b238")), "cdd34d048e854877b7725cf210b7bb0b"))

stopifnot("type of population_parameter is not numeric"= setequal(digest(paste(toString(class(population_parameter)), "9b239")), "c3b0ee61e40abd16028e07dbaf612136"))
stopifnot("value of population_parameter is not correct (rounded to 2 decimal places)"= setequal(digest(paste(toString(round(population_parameter, 2)), "9b239")), "f753a32addd9a8a4db0b4f93619aa8af"))
stopifnot("length of population_parameter is not correct"= setequal(digest(paste(toString(length(population_parameter)), "9b239")), "44b695dc8fbf3e757d984f6a2c9f577e"))
stopifnot("values of population_parameter are not correct"= setequal(digest(paste(toString(sort(population_parameter)), "9b239")), "f753a32addd9a8a4db0b4f93619aa8af"))

stopifnot("type of sample is not numeric"= setequal(digest(paste(toString(class(sample)), "9b23a")), "979a9c67bc8b1fa370ce7836a4603e38"))
stopifnot("value of sample is not correct (rounded to 2 decimal places)"= setequal(digest(paste(toString(round(sample, 2)), "9b23a")), "392c8035eea59e8c23c949b377466553"))
stopifnot("length of sample is not correct"= setequal(digest(paste(toString(length(sample)), "9b23a")), "dc1fecc530a25ac0b386c1ef2e635922"))
stopifnot("values of sample are not correct"= setequal(digest(paste(toString(sort(sample)), "9b23a")), "392c8035eea59e8c23c949b377466553"))

stopifnot("type of observation is not numeric"= setequal(digest(paste(toString(class(observation)), "9b23b")), "ec4d0b07d8e307ffa955469a0dde68b8"))
stopifnot("value of observation is not correct (rounded to 2 decimal places)"= setequal(digest(paste(toString(round(observation, 2)), "9b23b")), "0b1e164096e8291ca5c820356adac9d0"))
stopifnot("length of observation is not correct"= setequal(digest(paste(toString(length(observation)), "9b23b")), "7353b306a548bc19c81d784442f1b21e"))
stopifnot("values of observation are not correct"= setequal(digest(paste(toString(sort(observation)), "9b23b")), "0b1e164096e8291ca5c820356adac9d0"))

stopifnot("type of sampling_distribution is not numeric"= setequal(digest(paste(toString(class(sampling_distribution)), "9b23c")), "fb8706a1ecb6ab2c9c4fc4aaf65af966"))
stopifnot("value of sampling_distribution is not correct (rounded to 2 decimal places)"= setequal(digest(paste(toString(round(sampling_distribution, 2)), "9b23c")), "0cec308c1ecf854a4f4f55b3a95c3722"))
stopifnot("length of sampling_distribution is not correct"= setequal(digest(paste(toString(length(sampling_distribution)), "9b23c")), "231ff24a49b61eac2d39b764ad40fada"))
stopifnot("values of sampling_distribution are not correct"= setequal(digest(paste(toString(sort(sampling_distribution)), "9b23c")), "0cec308c1ecf854a4f4f55b3a95c3722"))

print('Success!')

###  Virtual sampling simulation

In real life, we rarely, if ever, have measurements for our entire population. Here, however, we will pretend that we somehow were able to ask every single Canadian senior what their age is. We will do this so that we can experiment to learn about sampling and how this relates to estimation.

Here we make a simulated dataset of ages for our population (all Canadian seniors) bounded by realistic values ($\geq$ 65 and $\leq$ 117):

In [None]:
# run this cell to simulate a finite population
set.seed(4321) # DO NOT CHANGE
can_seniors <- tibble(age = (rexp(2000000, rate = 0.1)^2) + 65) |> 
    filter(age <= 117, age >= 65)
can_seniors
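
To see what each step of the simulation contributes, here is a minimal sketch of the same recipe on a much smaller scale. The names `toy_draws` and `toy_ages`, the seed, and the size of 1000 are assumptions made for this illustration only; they are not used anywhere else in the worksheet.

```r
library(tidyverse)

set.seed(1) # arbitrary seed, used for this illustration only

# Step 1: draw non-negative, right-skewed values from an exponential distribution
toy_draws <- rexp(1000, rate = 0.1)

# Step 2: square the draws and shift them so that no age falls below 65
toy_ages <- tibble(age = toy_draws^2 + 65) |>
    # Step 3: keep only ages inside the realistic range
    filter(age <= 117, age >= 65)

# smallest and largest simulated ages that survived the filter
range(toy_ages$age)
# number of rows left after filtering (at most 1000)
nrow(toy_ages)
```

Because the filter removes values above 117, the resulting data frame has fewer rows than the number of draws requested. The same applies to `can_seniors`, which ends up with fewer than 2,000,000 rows.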

**Question 1.2** 
<br> {points: 1}

A distribution defines all the possible values (or intervals) of the data and how often they occur. Visualize the distribution of the population (`can_seniors`) that was just created by plotting a histogram, passing `binwidth = 1` to `geom_histogram`. Name the plot `pop_dist` and give the x-axis a descriptive label.

In [None]:
options(repr.plot.width = 8, repr.plot.height = 7)
# ... <- ggplot(..., ...) + 
#    geom_...(...) +
#    ... +
#    ggtitle("Population distribution")

# your code here
fail() # No Answer - remove if you provide an answer
pop_dist

In [None]:
library(digest)
stopifnot("type of plot is not correct (if you are using two types of geoms, try flipping the order of the geom objects!)"= setequal(digest(paste(toString(sapply(seq_len(length(pop_dist$layers)), function(i) {c(class(pop_dist$layers[[i]]$geom))[1]})), "412cb")), "d3cd534e820a68733f60716c854f7e85"))
stopifnot("variable x is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(pop_dist$layers)), function(i) {rlang::get_expr(c(pop_dist$layers[[i]]$mapping, pop_dist$mapping)$x)}), as.character))), "412cb")), "0920807061fdc4c3151c0cb626840389"))
stopifnot("variable y is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(pop_dist$layers)), function(i) {rlang::get_expr(c(pop_dist$layers[[i]]$mapping, pop_dist$mapping)$y)}), as.character))), "412cb")), "903648ce87afa74198f17a0d4ba3f39b"))
stopifnot("x-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(pop_dist$layers[[1]]$mapping, pop_dist$mapping)$x)!= pop_dist$labels$x), "412cb")), "ec8527910809970d9e144c5eae4ef081"))
stopifnot("y-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(pop_dist$layers[[1]]$mapping, pop_dist$mapping)$y)!= pop_dist$labels$y), "412cb")), "903648ce87afa74198f17a0d4ba3f39b"))
stopifnot("incorrect colour variable in pop_dist, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(pop_dist$layers[[1]]$mapping, pop_dist$mapping)$colour)), "412cb")), "903648ce87afa74198f17a0d4ba3f39b"))
stopifnot("incorrect shape variable in pop_dist, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(pop_dist$layers[[1]]$mapping, pop_dist$mapping)$shape)), "412cb")), "903648ce87afa74198f17a0d4ba3f39b"))
stopifnot("the colour label in pop_dist is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(pop_dist$layers[[1]]$mapping, pop_dist$mapping)$colour) != pop_dist$labels$colour), "412cb")), "903648ce87afa74198f17a0d4ba3f39b"))
stopifnot("the shape label in pop_dist is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(pop_dist$layers[[1]]$mapping, pop_dist$mapping)$colour) != pop_dist$labels$shape), "412cb")), "903648ce87afa74198f17a0d4ba3f39b"))
stopifnot("fill variable in pop_dist is not correct"= setequal(digest(paste(toString(quo_name(pop_dist$mapping$fill)), "412cb")), "998d49a135882936405d3e1143deeb87"))
stopifnot("fill label in pop_dist is not informative"= setequal(digest(paste(toString((quo_name(pop_dist$mapping$fill) != pop_dist$labels$fill)), "412cb")), "903648ce87afa74198f17a0d4ba3f39b"))
stopifnot("position argument in pop_dist is not correct"= setequal(digest(paste(toString(class(pop_dist$layers[[1]]$position)[1]), "412cb")), "3564d7df2985eeae823c3a2359804970"))

stopifnot("pop_dist$data should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(pop_dist$data)), "412cc")), "e31a64ea69aed860d2c6cbba95f8169e"))
stopifnot("dimensions of pop_dist$data are not correct"= setequal(digest(paste(toString(dim(pop_dist$data)), "412cc")), "1590c0bd36db02a0291fa566500562d0"))
stopifnot("column names of pop_dist$data are not correct"= setequal(digest(paste(toString(sort(colnames(pop_dist$data))), "412cc")), "fc916efdd23651968a0bd602aca824d0"))
stopifnot("types of columns in pop_dist$data are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(pop_dist$data, class)))), "412cc")), "0c9a7aa95cbb2e190ded1ce870b54826"))
stopifnot("values in one or more numerical columns in pop_dist$data are not correct"= setequal(digest(paste(toString(if (any(sapply(pop_dist$data, is.numeric))) sort(round(sapply(pop_dist$data[, sapply(pop_dist$data, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "412cc")), "84fe2bae08a9c6717398b07e0cc3115e"))
stopifnot("values in one or more character columns in pop_dist$data are not correct"= setequal(digest(paste(toString(if (any(sapply(pop_dist$data, is.character))) sum(sapply(pop_dist$data[sapply(pop_dist$data, is.character)], function(x) length(unique(x)))) else 0), "412cc")), "d7d133806d686e3ba2e7e036a482c2a5"))
stopifnot("values in one or more factor columns in pop_dist$data are not correct"= setequal(digest(paste(toString(if (any(sapply(pop_dist$data, is.factor))) sum(sapply(pop_dist$data[, sapply(pop_dist$data, is.factor)], function(col) length(unique(col)))) else 0), "412cc")), "d7d133806d686e3ba2e7e036a482c2a5"))

print('Success!')

**Question 1.3** 
<br> {points: 1}

Distributions are complicated to communicate, so we often summarize them with a single value or a small number of values. Common choices include the mean, median, and standard deviation.

Use `summarize` to calculate the following population parameters from the `can_seniors` population:
- mean (use the `mean` function)
- median (use the `median` function)
- standard deviation (use the `sd` function)

Name the resulting data frame `pop_parameters`. It should have the column names `pop_mean`, `pop_med` and `pop_sd`.
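
As a reminder of how `summarize` collapses a column into one-row summaries, here is a small sketch on a made-up data frame. The data frame `toy_pop`, its `height` column, and the output column names are hypothetical and chosen only for this illustration.

```r
library(tidyverse)

# hypothetical toy population of six heights (cm), made up for illustration
toy_pop <- tibble(height = c(150, 160, 165, 170, 175, 200))

# each named argument of summarize becomes one column in a one-row data frame
toy_parameters <- toy_pop |>
    summarize(toy_mean = mean(height),
              toy_med = median(height),
              toy_sd = sd(height))
toy_parameters
```

Note that R's `sd` function divides by n - 1 rather than n. For a population as large as `can_seniors` the difference is negligible.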

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer
pop_parameters

In [None]:
library(digest)
stopifnot("pop_parameters should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(pop_parameters)), "68af6")), "4e65a30af85031d595db75996a676fe2"))
stopifnot("dimensions of pop_parameters are not correct"= setequal(digest(paste(toString(dim(pop_parameters)), "68af6")), "54722d32a7899e90a036093bbd66c853"))
stopifnot("column names of pop_parameters are not correct"= setequal(digest(paste(toString(sort(colnames(pop_parameters))), "68af6")), "403cec9900fc7098b86523768f2bd015"))
stopifnot("types of columns in pop_parameters are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(pop_parameters, class)))), "68af6")), "b3bbc86f8bc85670ccb3f571a6861559"))
stopifnot("values in one or more numerical columns in pop_parameters are not correct"= setequal(digest(paste(toString(if (any(sapply(pop_parameters, is.numeric))) sort(round(sapply(pop_parameters[, sapply(pop_parameters, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "68af6")), "f95ce656c4f3d0fd52addb94aa4cfabe"))
stopifnot("values in one or more character columns in pop_parameters are not correct"= setequal(digest(paste(toString(if (any(sapply(pop_parameters, is.character))) sum(sapply(pop_parameters[sapply(pop_parameters, is.character)], function(x) length(unique(x)))) else 0), "68af6")), "37897467ad480d53cfa9a62769c1d11c"))
stopifnot("values in one or more factor columns in pop_parameters are not correct"= setequal(digest(paste(toString(if (any(sapply(pop_parameters, is.factor))) sum(sapply(pop_parameters[, sapply(pop_parameters, is.factor)], function(col) length(unique(col)))) else 0), "68af6")), "37897467ad480d53cfa9a62769c1d11c"))

print('Success!')

**Question 1.4** 
<br> {points: 1}

In real life, we can usually collect only a single sample from the population. We use that sample to try to infer what the population looks like.

Take a single random sample of 40 observations from the Canadian seniors population (`can_seniors`). Name it `sample_1`. Use 4321 as your seed.
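
The sketch below shows what `rep_sample_n` from the `infer` package returns when it is applied to a small made-up population. The names `toy_pop` and `toy_sample`, the seed, and the sample size of 5 are assumptions for this illustration, not part of the question.

```r
library(tidyverse)
library(infer)

set.seed(1) # arbitrary seed, used for this illustration only

# hypothetical toy population: the integers 1 to 100
toy_pop <- tibble(value = 1:100)

# draw one random sample of 5 rows, without replacement (the default)
toy_sample <- toy_pop |>
    rep_sample_n(size = 5)
toy_sample
```

The result contains an extra column named `replicate` that identifies which sample each row belongs to. This is why a later cell in this worksheet drops that column with `select(-replicate)` before displaying the estimates.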

In [None]:
set.seed(4321) # DO NOT CHANGE!
# ... <- ... |> 
#    rep_sample_n(...)
# your code here
fail() # No Answer - remove if you provide an answer
sample_1

In [None]:
library(digest)
stopifnot("sample_1 should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(sample_1)), "b9471")), "5422d1a740943519b5b51d87d9e5a710"))
stopifnot("dimensions of sample_1 are not correct"= setequal(digest(paste(toString(dim(sample_1)), "b9471")), "34a4f02949b878f85f8f53523f56bd4e"))
stopifnot("column names of sample_1 are not correct"= setequal(digest(paste(toString(sort(colnames(sample_1))), "b9471")), "dfe04e57d7f9512b8123d8c99aa41eef"))
stopifnot("types of columns in sample_1 are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(sample_1, class)))), "b9471")), "9a51ad8e9e763778f695e88744253d1e"))
stopifnot("values in one or more numerical columns in sample_1 are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_1, is.numeric))) sort(round(sapply(sample_1[, sapply(sample_1, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "b9471")), "765568358c1cc76f03598b2fadb7b47f"))
stopifnot("values in one or more character columns in sample_1 are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_1, is.character))) sum(sapply(sample_1[sapply(sample_1, is.character)], function(x) length(unique(x)))) else 0), "b9471")), "1aa2d6772b2405fd36a3e10f6cc29947"))
stopifnot("values in one or more factor columns in sample_1 are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_1, is.factor))) sum(sapply(sample_1[, sapply(sample_1, is.factor)], function(col) length(unique(col)))) else 0), "b9471")), "1aa2d6772b2405fd36a3e10f6cc29947"))

print('Success!')

**Question 1.5** 
<br> {points: 1}

Visualize the distribution of the random sample you just took (`sample_1`) by plotting a histogram, passing `binwidth = 1` to `geom_histogram`. Name the plot `sample_1_dist`, give it the title "Sample 1 Distribution" (using `ggtitle`), and give the x-axis a descriptive label.
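
For reference, this is how the pieces of such a plot fit together on a made-up data frame. The names `toy_values` and `toy_hist`, the data, and the label text are hypothetical and serve only as an illustration of the layering pattern.

```r
library(tidyverse)

# hypothetical measurements, made up for illustration
toy_values <- tibble(value = c(1.2, 1.8, 2.1, 2.4, 2.9, 3.3, 3.7, 5.5))

toy_hist <- ggplot(toy_values, aes(x = value)) +
    geom_histogram(binwidth = 1) +      # each bar covers an interval of width 1
    xlab("Measured value (units)") +    # descriptive x-axis label
    ggtitle("Toy distribution")         # plot title
toy_hist
```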

In [None]:
options(repr.plot.width = 8, repr.plot.height = 7)
# your code here
fail() # No Answer - remove if you provide an answer
sample_1_dist

In [None]:
library(digest)
stopifnot("type of plot is not correct (if you are using two types of geoms, try flipping the order of the geom objects!)"= setequal(digest(paste(toString(sapply(seq_len(length(sample_1_dist$layers)), function(i) {c(class(sample_1_dist$layers[[i]]$geom))[1]})), "bf136")), "1cc69c21da0eca6c4749e256537b7574"))
stopifnot("variable x is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(sample_1_dist$layers)), function(i) {rlang::get_expr(c(sample_1_dist$layers[[i]]$mapping, sample_1_dist$mapping)$x)}), as.character))), "bf136")), "9ee712ab5d6d4a68220c903d7c32ba6d"))
stopifnot("variable y is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(sample_1_dist$layers)), function(i) {rlang::get_expr(c(sample_1_dist$layers[[i]]$mapping, sample_1_dist$mapping)$y)}), as.character))), "bf136")), "e310f2d6b2602242bf49a01b09eafc97"))
stopifnot("x-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sample_1_dist$layers[[1]]$mapping, sample_1_dist$mapping)$x)!= sample_1_dist$labels$x), "bf136")), "b310b38e204cd97b363f66822b9b94b4"))
stopifnot("y-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sample_1_dist$layers[[1]]$mapping, sample_1_dist$mapping)$y)!= sample_1_dist$labels$y), "bf136")), "e310f2d6b2602242bf49a01b09eafc97"))
stopifnot("incorrect colour variable in sample_1_dist, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(sample_1_dist$layers[[1]]$mapping, sample_1_dist$mapping)$colour)), "bf136")), "e310f2d6b2602242bf49a01b09eafc97"))
stopifnot("incorrect shape variable in sample_1_dist, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(sample_1_dist$layers[[1]]$mapping, sample_1_dist$mapping)$shape)), "bf136")), "e310f2d6b2602242bf49a01b09eafc97"))
stopifnot("the colour label in sample_1_dist is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sample_1_dist$layers[[1]]$mapping, sample_1_dist$mapping)$colour) != sample_1_dist$labels$colour), "bf136")), "e310f2d6b2602242bf49a01b09eafc97"))
stopifnot("the shape label in sample_1_dist is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sample_1_dist$layers[[1]]$mapping, sample_1_dist$mapping)$colour) != sample_1_dist$labels$shape), "bf136")), "e310f2d6b2602242bf49a01b09eafc97"))
stopifnot("fill variable in sample_1_dist is not correct"= setequal(digest(paste(toString(quo_name(sample_1_dist$mapping$fill)), "bf136")), "5f4f3d0ed53a59b3bb5093d1bba7ca18"))
stopifnot("fill label in sample_1_dist is not informative"= setequal(digest(paste(toString((quo_name(sample_1_dist$mapping$fill) != sample_1_dist$labels$fill)), "bf136")), "e310f2d6b2602242bf49a01b09eafc97"))
stopifnot("position argument in sample_1_dist is not correct"= setequal(digest(paste(toString(class(sample_1_dist$layers[[1]]$position)[1]), "bf136")), "01e6171583d0bddca81fd880bc6f1d5d"))

stopifnot("sample_1_dist$data should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(sample_1_dist$data)), "bf137")), "f434c8d889b792854a88d460cdb49cdd"))
stopifnot("dimensions of sample_1_dist$data are not correct"= setequal(digest(paste(toString(dim(sample_1_dist$data)), "bf137")), "a840be5640908b4d88feb0b900b9e9af"))
stopifnot("column names of sample_1_dist$data are not correct"= setequal(digest(paste(toString(sort(colnames(sample_1_dist$data))), "bf137")), "aa2045dd3ddc237dd65d149740b9dc56"))
stopifnot("types of columns in sample_1_dist$data are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(sample_1_dist$data, class)))), "bf137")), "52f2d393215ca26b73021e0aaa580c5d"))
stopifnot("values in one or more numerical columns in sample_1_dist$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_1_dist$data, is.numeric))) sort(round(sapply(sample_1_dist$data[, sapply(sample_1_dist$data, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "bf137")), "5202440751495da774a9a78378c01d1c"))
stopifnot("values in one or more character columns in sample_1_dist$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_1_dist$data, is.character))) sum(sapply(sample_1_dist$data[sapply(sample_1_dist$data, is.character)], function(x) length(unique(x)))) else 0), "bf137")), "915ef1993b73b8683f97bcbb61339c21"))
stopifnot("values in one or more factor columns in sample_1_dist$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_1_dist$data, is.factor))) sum(sapply(sample_1_dist$data[, sapply(sample_1_dist$data, is.factor)], function(col) length(unique(col)))) else 0), "bf137")), "915ef1993b73b8683f97bcbb61339c21"))

stopifnot("type of sample_1_dist$labels$title is not character"= setequal(digest(paste(toString(class(sample_1_dist$labels$title)), "bf138")), "12c75b2d1b6669a7781ce0cfc688ef64"))
stopifnot("length of sample_1_dist$labels$title is not correct"= setequal(digest(paste(toString(length(sample_1_dist$labels$title)), "bf138")), "235a543792293513b35243f795b7f9fe"))
stopifnot("value of sample_1_dist$labels$title is not correct"= setequal(digest(paste(toString(tolower(sample_1_dist$labels$title)), "bf138")), "edcb4bd06031860f0bf2b75ab5cd0c81"))
stopifnot("letters in string value of sample_1_dist$labels$title are correct but case is not correct"= setequal(digest(paste(toString(sample_1_dist$labels$title), "bf138")), "253126a492a49d7178ca03458a52a748"))

print('Success!')

**Question 1.6** 
<br> {points: 1}

Use `summarize` to calculate the following point estimates from the random sample you just took (`sample_1`):

* mean
* median
* standard deviation

Name the resulting data frame `sample_1_estimates`. It should have the column names `sample_1_mean`, `sample_1_med` and `sample_1_sd`.

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer
sample_1_estimates

In [None]:
library(digest)
stopifnot("sample_1_estimates should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(sample_1_estimates)), "22ff5")), "73a308f9ab5e9319ee9f11f8227b8a1d"))
stopifnot("dimensions of sample_1_estimates are not correct"= setequal(digest(paste(toString(dim(sample_1_estimates)), "22ff5")), "659ac7531f78dad1b6e36849e230f4c6"))
stopifnot("column names of sample_1_estimates are not correct"= setequal(digest(paste(toString(sort(colnames(sample_1_estimates))), "22ff5")), "58ada95b097a47eccf7eeca94afc165a"))
stopifnot("types of columns in sample_1_estimates are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(sample_1_estimates, class)))), "22ff5")), "e0e4c45d7ef56048c5f802f141696325"))
stopifnot("values in one or more numerical columns in sample_1_estimates are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_1_estimates, is.numeric))) sort(round(sapply(sample_1_estimates[, sapply(sample_1_estimates, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "22ff5")), "345d78552f7ab44fe042a8d4a577fc3a"))
stopifnot("values in one or more character columns in sample_1_estimates are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_1_estimates, is.character))) sum(sapply(sample_1_estimates[sapply(sample_1_estimates, is.character)], function(x) length(unique(x)))) else 0), "22ff5")), "bc4e5dd4170958a97ab0a34c0745a82e"))
stopifnot("values in one or more factor columns in sample_1_estimates are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_1_estimates, is.factor))) sum(sapply(sample_1_estimates[, sapply(sample_1_estimates, is.factor)], function(col) length(unique(col)))) else 0), "22ff5")), "bc4e5dd4170958a97ab0a34c0745a82e"))

print('Success!')

Let's now compare our random sample to the population from which it was drawn. In `ggplot`, it is possible to display multiple charts together by using the function `plot_grid` from a separate package called `cowplot`. We can use the `ncol` parameter to control how many columns of plots the grid contains. Since we want to compare the distributions' shape and position on the x-axis, it is most effective to concatenate these charts vertically in a single column.

In [None]:
# run this code cell
options(repr.plot.width = 7, repr.plot.height = 7)
plot_grid(pop_dist, sample_1_dist, ncol = 1)

And now let's compare the point estimates (mean, median and standard deviation) with the true population parameters we were trying to estimate:

In [None]:
# run this cell
pop_parameters
sample_1_estimates |> select(-replicate)

**Question 1.7** Multiple Choice
<br> {points: 1}

After comparing the population and sample distributions above, and the true population parameters and the sample point estimates, which statement below **is not** correct?

A. The sample point estimates are close to the values for the true population parameters we are trying to estimate

B. The sample distribution is of a similar shape to the population distribution

C. The sample point estimates are identical to the values for the true population parameters we are trying to estimate

*Assign your answer to an object called `answer1.7`. Your answer should be a single character surrounded by quotes.*

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
library(digest)
stopifnot("type of answer1.7 is not character"= setequal(digest(paste(toString(class(answer1.7)), "4105c")), "059db055e1774479ef0b37f984a38c45"))
stopifnot("length of answer1.7 is not correct"= setequal(digest(paste(toString(length(answer1.7)), "4105c")), "843aa44a5c1e469e51a30133bebd035a"))
stopifnot("value of answer1.7 is not correct"= setequal(digest(paste(toString(tolower(answer1.7)), "4105c")), "7a07943b3940cc40511f4d53bd6fc5b2"))
stopifnot("letters in string value of answer1.7 are correct but case is not correct"= setequal(digest(paste(toString(answer1.7), "4105c")), "02b8ab9620a6b84d43a24479926f3f4c"))

print('Success!')

**Question 1.8.0** 
<br> {points: 1}

What if we took another sample? What would we expect? Let's try! 

Take another random sample of size 40 from the population (using a different random seed this time so that you get a different sample), visualize its distribution (give the plot the title "Sample 2 Distribution" using `ggtitle`), and calculate the point estimates for the sample mean, median and standard deviation. Name your random sample `sample_2`, your visualization `sample_2_dist`, and your estimates `sample_2_estimates`. The estimates data frame should have the column names `sample_2_mean`, `sample_2_med` and `sample_2_sd`.

In [None]:
set.seed(2020) # DO NOT CHANGE!

# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
library(digest)
stopifnot("sample_2 should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(sample_2)), "c82ab")), "56c901bd657c7c813ea90d93eb147b64"))
stopifnot("dimensions of sample_2 are not correct"= setequal(digest(paste(toString(dim(sample_2)), "c82ab")), "dc4e31fe8682fd3b7dae96b1a1946f7b"))
stopifnot("column names of sample_2 are not correct"= setequal(digest(paste(toString(sort(colnames(sample_2))), "c82ab")), "09d1ae647663a1ac3f5143a0cccad0f0"))
stopifnot("types of columns in sample_2 are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(sample_2, class)))), "c82ab")), "354827244bac68b1048cdaaf2ec49f68"))
stopifnot("values in one or more numerical columns in sample_2 are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_2, is.numeric))) sort(round(sapply(sample_2[, sapply(sample_2, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "c82ab")), "fbd18ffe1484bdc193d7151b6947291f"))
stopifnot("values in one or more character columns in sample_2 are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_2, is.character))) sum(sapply(sample_2[sapply(sample_2, is.character)], function(x) length(unique(x)))) else 0), "c82ab")), "3c1e00b2dcd7c2dfa3df377fe58f5c2a"))
stopifnot("values in one or more factor columns in sample_2 are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_2, is.factor))) sum(sapply(sample_2[, sapply(sample_2, is.factor)], function(col) length(unique(col)))) else 0), "c82ab")), "3c1e00b2dcd7c2dfa3df377fe58f5c2a"))

stopifnot("type of plot is not correct (if you are using two types of geoms, try flipping the order of the geom objects!)"= setequal(digest(paste(toString(sapply(seq_len(length(sample_2_dist$layers)), function(i) {c(class(sample_2_dist$layers[[i]]$geom))[1]})), "c82ac")), "32a806e56d3aeb9a18b827775e873bf7"))
stopifnot("variable x is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(sample_2_dist$layers)), function(i) {rlang::get_expr(c(sample_2_dist$layers[[i]]$mapping, sample_2_dist$mapping)$x)}), as.character))), "c82ac")), "d125d6d704ee48f070873b70bc2ac792"))
stopifnot("variable y is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(sample_2_dist$layers)), function(i) {rlang::get_expr(c(sample_2_dist$layers[[i]]$mapping, sample_2_dist$mapping)$y)}), as.character))), "c82ac")), "f3e7d5d77b4ab5aff30f0f2670103bb9"))
stopifnot("x-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sample_2_dist$layers[[1]]$mapping, sample_2_dist$mapping)$x)!= sample_2_dist$labels$x), "c82ac")), "46be0e6e00678cb2c091b353c6234520"))
stopifnot("y-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sample_2_dist$layers[[1]]$mapping, sample_2_dist$mapping)$y)!= sample_2_dist$labels$y), "c82ac")), "f3e7d5d77b4ab5aff30f0f2670103bb9"))
stopifnot("incorrect colour variable in sample_2_dist, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(sample_2_dist$layers[[1]]$mapping, sample_2_dist$mapping)$colour)), "c82ac")), "f3e7d5d77b4ab5aff30f0f2670103bb9"))
stopifnot("incorrect shape variable in sample_2_dist, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(sample_2_dist$layers[[1]]$mapping, sample_2_dist$mapping)$shape)), "c82ac")), "f3e7d5d77b4ab5aff30f0f2670103bb9"))
stopifnot("the colour label in sample_2_dist is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sample_2_dist$layers[[1]]$mapping, sample_2_dist$mapping)$colour) != sample_2_dist$labels$colour), "c82ac")), "f3e7d5d77b4ab5aff30f0f2670103bb9"))
stopifnot("the shape label in sample_2_dist is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sample_2_dist$layers[[1]]$mapping, sample_2_dist$mapping)$colour) != sample_2_dist$labels$shape), "c82ac")), "f3e7d5d77b4ab5aff30f0f2670103bb9"))
stopifnot("fill variable in sample_2_dist is not correct"= setequal(digest(paste(toString(quo_name(sample_2_dist$mapping$fill)), "c82ac")), "9171292a914ceebc299f583237fdabea"))
stopifnot("fill label in sample_2_dist is not informative"= setequal(digest(paste(toString((quo_name(sample_2_dist$mapping$fill) != sample_2_dist$labels$fill)), "c82ac")), "f3e7d5d77b4ab5aff30f0f2670103bb9"))
stopifnot("position argument in sample_2_dist is not correct"= setequal(digest(paste(toString(class(sample_2_dist$layers[[1]]$position)[1]), "c82ac")), "f92fb60a77b73873f58386dca1cf1a3c"))

stopifnot("sample_2_dist$data should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(sample_2_dist$data)), "c82ad")), "bfc7ea6fef3f419db9b34786fbc8512c"))
stopifnot("dimensions of sample_2_dist$data are not correct"= setequal(digest(paste(toString(dim(sample_2_dist$data)), "c82ad")), "302b554eb397f6bdb9e733ad0197e10a"))
stopifnot("column names of sample_2_dist$data are not correct"= setequal(digest(paste(toString(sort(colnames(sample_2_dist$data))), "c82ad")), "c7cdc4f0a27b8ffcb9c71c51fb7d62c3"))
stopifnot("types of columns in sample_2_dist$data are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(sample_2_dist$data, class)))), "c82ad")), "1433cac2d6f619c126bf621d69fc0532"))
stopifnot("values in one or more numerical columns in sample_2_dist$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_2_dist$data, is.numeric))) sort(round(sapply(sample_2_dist$data[, sapply(sample_2_dist$data, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "c82ad")), "0939f8a1ebdb946623fddb9ef2e19103"))
stopifnot("values in one or more character columns in sample_2_dist$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_2_dist$data, is.character))) sum(sapply(sample_2_dist$data[sapply(sample_2_dist$data, is.character)], function(x) length(unique(x)))) else 0), "c82ad")), "6c912de5a65a0a71384e7c0f622ac96c"))
stopifnot("values in one or more factor columns in sample_2_dist$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_2_dist$data, is.factor))) sum(sapply(sample_2_dist$data[, sapply(sample_2_dist$data, is.factor)], function(col) length(unique(col)))) else 0), "c82ad")), "6c912de5a65a0a71384e7c0f622ac96c"))

stopifnot("sample_2_estimates should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(sample_2_estimates)), "c82ae")), "ba4596159b2953a5e7e757063d6f24fc"))
stopifnot("dimensions of sample_2_estimates are not correct"= setequal(digest(paste(toString(dim(sample_2_estimates)), "c82ae")), "eac8f8ddfb782a4d1b5160dc30a3ecdb"))
stopifnot("column names of sample_2_estimates are not correct"= setequal(digest(paste(toString(sort(colnames(sample_2_estimates))), "c82ae")), "c4d864a2e88b70c30e4cde5aa2eae164"))
stopifnot("types of columns in sample_2_estimates are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(sample_2_estimates, class)))), "c82ae")), "20d7fe9d344e1cf348ee4df3f5b8a98f"))
stopifnot("values in one or more numerical columns in sample_2_estimates are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_2_estimates, is.numeric))) sort(round(sapply(sample_2_estimates[, sapply(sample_2_estimates, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "c82ae")), "1cbc82c5ef4749a0a14b86d406248f0e"))
stopifnot("values in one or more character columns in sample_2_estimates are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_2_estimates, is.character))) sum(sapply(sample_2_estimates[sapply(sample_2_estimates, is.character)], function(x) length(unique(x)))) else 0), "c82ae")), "58aaa6366524d881457e00e5e00bbee9"))
stopifnot("values in one or more factor columns in sample_2_estimates are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_2_estimates, is.factor))) sum(sapply(sample_2_estimates[, sapply(sample_2_estimates, is.factor)], function(col) length(unique(col)))) else 0), "c82ae")), "58aaa6366524d881457e00e5e00bbee9"))

print('Success!')

**Question 1.8.1** 
<br> {points: 1}

After comparing the distribution and point estimates of this second random sample with those of the first random sample and the population, which of the following statements **is not** correct?

A. The sample distributions from different random samples are similar in shape to the population distribution, but they vary a bit depending on which values are captured in the sample.

B. The sample point estimates from different random samples are close to the true values of the population parameters we are trying to estimate, but they vary a bit depending on which values are captured in the sample.

C. Every random sample from the same population should have an identical set of values and yield identical point estimates.

*Assign your answer to an object called `answer1.8.1`. Your answer should be a single character surrounded by quotes.*

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
library(digest)
stopifnot("type of answer1.8.1 is not character"= setequal(digest(paste(toString(class(answer1.8.1)), "1849a")), "d41006676be3b4bc535fdc2e3cf05b55"))
stopifnot("length of answer1.8.1 is not correct"= setequal(digest(paste(toString(length(answer1.8.1)), "1849a")), "a6c726633580f74e138a942a9ff8518f"))
stopifnot("value of answer1.8.1 is not correct"= setequal(digest(paste(toString(tolower(answer1.8.1)), "1849a")), "cdc56a0c123017a9fd8bdd5bbcf0e530"))
stopifnot("letters in string value of answer1.8.1 are correct but case is not correct"= setequal(digest(paste(toString(answer1.8.1), "1849a")), "eb3532c01f3046d7fbf03780344ecb79"))

print('Success!')

### Exploring the sampling distribution of an estimate

Just how much should we expect the point estimates of our random samples to vary? To build an intuition for this, let's experiment a little more with our population of Canadian seniors. We will take 1500 random samples and, for each sample, calculate the point estimate we are interested in (let's choose the mean for this example). Finally, we will visualize the distribution of these sample point estimates. This distribution tells us how much we should expect the point estimates to vary across random samples of size 40 drawn from this population.
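To make the procedure concrete before you try it on `can_seniors`, here is a minimal sketch in base R. It is an illustration under stated assumptions, not the worksheet's solution: the population is made up, the names `toy_population` and `toy_means` are hypothetical, and base R's `sample()` and `replicate()` stand in for `rep_sample_n()`.

```r
# Hypothetical population: 10,000 made-up ages (NOT the can_seniors data).
set.seed(1)
toy_population <- round(runif(10000, min = 65, max = 100))

# Draw 1500 samples of size 40 (without replacement within each sample)
# and record the mean of each one.
toy_means <- replicate(1500, mean(sample(toy_population, size = 40)))

# The spread of these means shows how much a point estimate varies
# from one random sample to the next.
length(toy_means)
summary(toy_means)
sd(toy_means)
```

Each element of `toy_means` is one point estimate, so a histogram of `toy_means` would be a sampling distribution for this toy population.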

**Question 1.9** 
<br> {points: 1}

Draw 1500 random samples from our population of Canadian seniors (`can_seniors`). Each sample should have 40 observations. Name the data frame `samples` and use the seed `4321`. Here we use the functions `head()`, `tail()` and `dim()` to view the first few rows, the last few rows and the dimension of the data set respectively. 

In [None]:
set.seed(4321) # DO NOT CHANGE!
# ... <- rep_sample_n(..., size = ..., reps = ...)
# your code here
fail() # No Answer - remove if you provide an answer
head(samples)
tail(samples)
dim(samples)

In [None]:
library(digest)
stopifnot("samples should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(samples)), "d4e61")), "b59549be47e374a55d03adc74d58a2c6"))
stopifnot("dimensions of samples are not correct"= setequal(digest(paste(toString(dim(samples)), "d4e61")), "6e238e3e012febab6b248461cb80f28c"))
stopifnot("column names of samples are not correct"= setequal(digest(paste(toString(sort(colnames(samples))), "d4e61")), "ec677e866597dbbea17afed3e591e3f1"))
stopifnot("types of columns in samples are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(samples, class)))), "d4e61")), "979dfa0823872a77fd6c5d20e88a3af8"))
stopifnot("values in one or more numerical columns in samples are not correct"= setequal(digest(paste(toString(if (any(sapply(samples, is.numeric))) sort(round(sapply(samples[, sapply(samples, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "d4e61")), "5d2fb03e92dfb5aed74820c76d7ab64a"))
stopifnot("values in one or more character columns in samples are not correct"= setequal(digest(paste(toString(if (any(sapply(samples, is.character))) sum(sapply(samples[sapply(samples, is.character)], function(x) length(unique(x)))) else 0), "d4e61")), "ca83b7b70d0bff348b8059c43f37b0d9"))
stopifnot("values in one or more factor columns in samples are not correct"= setequal(digest(paste(toString(if (any(sapply(samples, is.factor))) sum(sapply(samples[, sapply(samples, is.factor)], function(col) length(unique(col)))) else 0), "d4e61")), "ca83b7b70d0bff348b8059c43f37b0d9"))

print('Success!')

**Question 2.0** 
<br> {points: 1}

Group by the sample replicate number, and then for each sample, calculate the mean as the point estimate. Name the data frame `sample_estimates`. The data frame should have the column names `replicate` and `mean_age`.

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer
head(sample_estimates)
tail(sample_estimates)

In [None]:
library(digest)
stopifnot("sample_estimates should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(sample_estimates)), "2045")), "b681cd4d2a06880a43e45587db19f080"))
stopifnot("dimensions of sample_estimates are not correct"= setequal(digest(paste(toString(dim(sample_estimates)), "2045")), "eb78d16d7dbf5ffa69dad627e292728d"))
stopifnot("column names of sample_estimates are not correct"= setequal(digest(paste(toString(sort(colnames(sample_estimates))), "2045")), "82e784c00014c369de019f2cf55fd703"))
stopifnot("types of columns in sample_estimates are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(sample_estimates, class)))), "2045")), "1de55370ffc9005300404a4e5491cec1"))
stopifnot("values in one or more numerical columns in sample_estimates are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_estimates, is.numeric))) sort(round(sapply(sample_estimates[, sapply(sample_estimates, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "2045")), "f981e203dbaad7ec081f5410e1accb11"))
stopifnot("values in one or more character columns in sample_estimates are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_estimates, is.character))) sum(sapply(sample_estimates[sapply(sample_estimates, is.character)], function(x) length(unique(x)))) else 0), "2045")), "2771708e720c5aa618f030e4ec5f7ccc"))
stopifnot("values in one or more factor columns in sample_estimates are not correct"= setequal(digest(paste(toString(if (any(sapply(sample_estimates, is.factor))) sum(sapply(sample_estimates[, sapply(sample_estimates, is.factor)], function(col) length(unique(col)))) else 0), "2045")), "2771708e720c5aa618f030e4ec5f7ccc"))

print('Success!')

**Question 2.1** 
<br> {points: 1}

Visualize the distribution of the sample estimates (`sample_estimates`) you just calculated by plotting a histogram, passing `binwidth = 1` as an argument to `geom_histogram`. Name the plot `sampling_distribution`. Give the plot the title "Sampling Distribution of the Sample Means" using `ggtitle`, and give the x-axis a descriptive label.

In [None]:
options(repr.plot.width = 8, repr.plot.height = 7)
# your code here
fail() # No Answer - remove if you provide an answer
sampling_distribution

In [None]:
library(digest)
stopifnot("type of plot is not correct (if you are using two types of geoms, try flipping the order of the geom objects!)"= setequal(digest(paste(toString(sapply(seq_len(length(sampling_distribution$layers)), function(i) {c(class(sampling_distribution$layers[[i]]$geom))[1]})), "ae8ed")), "83b6aa3c135ee97044661ca42767a0e8"))
stopifnot("variable x is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(sampling_distribution$layers)), function(i) {rlang::get_expr(c(sampling_distribution$layers[[i]]$mapping, sampling_distribution$mapping)$x)}), as.character))), "ae8ed")), "35b175b7bca47af1ff3102f49d826cb8"))
stopifnot("variable y is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(sampling_distribution$layers)), function(i) {rlang::get_expr(c(sampling_distribution$layers[[i]]$mapping, sampling_distribution$mapping)$y)}), as.character))), "ae8ed")), "a40fba9a45f867cfc3856cc2979d48b6"))
stopifnot("x-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution$layers[[1]]$mapping, sampling_distribution$mapping)$x)!= sampling_distribution$labels$x), "ae8ed")), "7390d87bd1fee041de89fb13d8391876"))
stopifnot("y-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution$layers[[1]]$mapping, sampling_distribution$mapping)$y)!= sampling_distribution$labels$y), "ae8ed")), "a40fba9a45f867cfc3856cc2979d48b6"))
stopifnot("incorrect colour variable in sampling_distribution, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution$layers[[1]]$mapping, sampling_distribution$mapping)$colour)), "ae8ed")), "a40fba9a45f867cfc3856cc2979d48b6"))
stopifnot("incorrect shape variable in sampling_distribution, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution$layers[[1]]$mapping, sampling_distribution$mapping)$shape)), "ae8ed")), "a40fba9a45f867cfc3856cc2979d48b6"))
stopifnot("the colour label in sampling_distribution is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution$layers[[1]]$mapping, sampling_distribution$mapping)$colour) != sampling_distribution$labels$colour), "ae8ed")), "a40fba9a45f867cfc3856cc2979d48b6"))
stopifnot("the shape label in sampling_distribution is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution$layers[[1]]$mapping, sampling_distribution$mapping)$colour) != sampling_distribution$labels$shape), "ae8ed")), "a40fba9a45f867cfc3856cc2979d48b6"))
stopifnot("fill variable in sampling_distribution is not correct"= setequal(digest(paste(toString(quo_name(sampling_distribution$mapping$fill)), "ae8ed")), "a60f0ce747d9b15b0d506d482219c636"))
stopifnot("fill label in sampling_distribution is not informative"= setequal(digest(paste(toString((quo_name(sampling_distribution$mapping$fill) != sampling_distribution$labels$fill)), "ae8ed")), "a40fba9a45f867cfc3856cc2979d48b6"))
stopifnot("position argument in sampling_distribution is not correct"= setequal(digest(paste(toString(class(sampling_distribution$layers[[1]]$position)[1]), "ae8ed")), "d6968bc6de81667d25faae5ea0dc0c3e"))

stopifnot("type of sampling_distribution$labels$title is not character"= setequal(digest(paste(toString(class(sampling_distribution$labels$title)), "ae8ee")), "207d4bef5a0c18d6e6d52f957ac9bffd"))
stopifnot("length of sampling_distribution$labels$title is not correct"= setequal(digest(paste(toString(length(sampling_distribution$labels$title)), "ae8ee")), "6cc67bc04992c12a2a2d9fd8143434c1"))
stopifnot("value of sampling_distribution$labels$title is not correct"= setequal(digest(paste(toString(tolower(sampling_distribution$labels$title)), "ae8ee")), "ce2d919bca1b251486f9d8b3593e3b49"))
stopifnot("letters in string value of sampling_distribution$labels$title are correct but case is not correct"= setequal(digest(paste(toString(sampling_distribution$labels$title), "ae8ee")), "adead84bd8a87d0ef024ebdeedd7747e"))

stopifnot("sampling_distribution$data should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(sampling_distribution$data)), "ae8ef")), "337f01a54acf7cc0a0a01637b7b5e727"))
stopifnot("dimensions of sampling_distribution$data are not correct"= setequal(digest(paste(toString(dim(sampling_distribution$data)), "ae8ef")), "99d144ca0598e2ac2e3ea64de2342850"))
stopifnot("column names of sampling_distribution$data are not correct"= setequal(digest(paste(toString(sort(colnames(sampling_distribution$data))), "ae8ef")), "daf3728eab7016b8d14bf3424a5a95f1"))
stopifnot("types of columns in sampling_distribution$data are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(sampling_distribution$data, class)))), "ae8ef")), "86a361ec6cfff98ebdffde9db43ad25d"))
stopifnot("values in one or more numerical columns in sampling_distribution$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sampling_distribution$data, is.numeric))) sort(round(sapply(sampling_distribution$data[, sapply(sampling_distribution$data, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "ae8ef")), "e38e33ab88392c7169e85044c69c2fb2"))
stopifnot("values in one or more character columns in sampling_distribution$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sampling_distribution$data, is.character))) sum(sapply(sampling_distribution$data[sapply(sampling_distribution$data, is.character)], function(x) length(unique(x)))) else 0), "ae8ef")), "dd5ebef6e9c782dc577de380f42d7f5a"))
stopifnot("values in one or more factor columns in sampling_distribution$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sampling_distribution$data, is.factor))) sum(sapply(sampling_distribution$data[, sapply(sampling_distribution$data, is.factor)], function(col) length(unique(col)))) else 0), "ae8ef")), "dd5ebef6e9c782dc577de380f42d7f5a"))

print('Success!')

**Question 2.2** 
<br> {points: 1}

Let's refresh our memories: what is the mean age of the whole population (we calculated this above)? *Assign your answer to an object called `answer2.2`. Your answer should be a single number reported to two decimal places.*


In [None]:
# your code here
fail() # No Answer - remove if you provide an answer
answer2.2

In [None]:
library(digest)
stopifnot("type of answer2.2 is not numeric"= setequal(digest(paste(toString(class(answer2.2)), "47686")), "955a3b68b850c5412b950aad72b641ca"))
stopifnot("value of answer2.2 is not correct (rounded to 2 decimal places)"= setequal(digest(paste(toString(round(answer2.2, 2)), "47686")), "567d7223481526901f9a61e0a776b163"))
stopifnot("length of answer2.2 is not correct"= setequal(digest(paste(toString(length(answer2.2)), "47686")), "19c9f6cb90f0a16f51da70184c6e4fe4"))
stopifnot("values of answer2.2 are not correct"= setequal(digest(paste(toString(sort(answer2.2)), "47686")), "567d7223481526901f9a61e0a776b163"))

print('Success!')

**Question 2.3** Multiple Choice
<br> {points: 1}

Considering the true value of the population mean and the sampling distribution you created and visualized in **Question 2.1**, which of the following statements **is not** correct?

A. The sampling distribution is centered at the true population mean

B. All the sample means are the same value as the true population mean

C. Most sample means are at or very near the value of the true population mean

D. A few sample means are far away from the value of the true population mean

*Assign your answer to an object called `answer2.3`. Your answer should be a single character surrounded by quotes.*

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer
answer2.3

In [None]:
library(digest)
stopifnot("type of answer2.3 is not character"= setequal(digest(paste(toString(class(answer2.3)), "3bc4b")), "31640ba597cd692a87ca7b9235c16676"))
stopifnot("length of answer2.3 is not correct"= setequal(digest(paste(toString(length(answer2.3)), "3bc4b")), "d103cabf249a396b065d2001c3bf3732"))
stopifnot("value of answer2.3 is not correct"= setequal(digest(paste(toString(tolower(answer2.3)), "3bc4b")), "c1357ff8f85bba489b4d3a382f48d3e6"))
stopifnot("letters in string value of answer2.3 are correct but case is not correct"= setequal(digest(paste(toString(answer2.3), "3bc4b")), "309613f595acf71014b6290830b6e150"))

print('Success!')

**Question 2.4** True/False
<br> {points: 1}

Taking a random sample and calculating a point estimate is a good way to get a "best guess" of the population parameter you are interested in. True or False?

*Assign your answer to an object called `answer2.4`. Your answer should be either "True" or "False", surrounded by quotes.*

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer
answer2.4

In [None]:
library(digest)
stopifnot("type of answer2.4 is not character"= setequal(digest(paste(toString(class(answer2.4)), "a1f85")), "776cfa0eba589545ef8d698a8a01f38c"))
stopifnot("length of answer2.4 is not correct"= setequal(digest(paste(toString(length(answer2.4)), "a1f85")), "37b271dd78faa3855ea9937f77eb033a"))
stopifnot("value of answer2.4 is not correct"= setequal(digest(paste(toString(tolower(answer2.4)), "a1f85")), "bba4a5f3be9c0666ebb761243ab1f4f6"))
stopifnot("letters in string value of answer2.4 are correct but case is not correct"= setequal(digest(paste(toString(answer2.4), "a1f85")), "125a4dffe92127d1ea781f45129306f5"))

print('Success!')

### The influence of sample size on the sampling distribution

What happens to our point estimate when we change the sample size? Let's answer this question by experimenting! We will create 3 different sampling distributions of sample means, each using a different sample size. As we did above, we will draw samples from our Canadian seniors population. We will then visualize these sampling distributions and look for a pattern as the sample size varies.
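The experiment described above can also be summarized numerically. The sketch below is a rough illustration in base R on a made-up population. The helper `spread_of_means`, the object names, and the sample sizes are all hypothetical choices, not part of the worksheet. It reports the standard deviation of the sample means for each sample size.

```r
# Hypothetical population: 10,000 made-up ages (NOT the can_seniors data).
set.seed(1)
toy_population <- round(runif(10000, min = 65, max = 100))

# Hypothetical helper: standard deviation of `reps` sample means,
# each computed from a random sample of size n.
spread_of_means <- function(n, reps = 1500) {
  sd(replicate(reps, mean(sample(toy_population, size = n))))
}

# Compare the spread of the sampling distribution across
# arbitrarily chosen sample sizes.
sizes <- c(10, 50, 200)
spreads <- sapply(sizes, spread_of_means)
data.frame(sample_size = sizes, sd_of_sample_means = spreads)
```

Running it prints one spread per sample size, which you can compare with the pattern you observe in your own histograms.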

**Question 2.5** 
<br> {points: 1}

Using the same strategy as you did above, draw 1500 random samples from the Canadian seniors population (`can_seniors`), each of size 20. For each sample, calculate the mean age and store it in a column called `mean_age`.

Then, visualize the distribution of the sample estimates (means) you just calculated by plotting a histogram, passing `binwidth = 1` as an argument to `geom_histogram`. Name the plot `sampling_distribution_20`. Give the plot the title "Sampling Distribution (n=20)" using `ggtitle`, and give the x-axis a descriptive label. Also set the x-axis limits to 65 and 95 using `xlim(c(65, 95))`.

Set the seed to 4321 when you collect your samples.

In [None]:
set.seed(4321) # DO NOT CHANGE THIS!
options(repr.plot.width = 8, repr.plot.height = 7)

# your code here
fail() # No Answer - remove if you provide an answer
sampling_distribution_20

In [None]:
library(digest)
stopifnot("type of plot is not correct (if you are using two types of geoms, try flipping the order of the geom objects!)"= setequal(digest(paste(toString(sapply(seq_len(length(sampling_distribution_20$layers)), function(i) {c(class(sampling_distribution_20$layers[[i]]$geom))[1]})), "8ad21")), "8f8cc9367108265799db27d7d4b4ad7f"))
stopifnot("variable x is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(sampling_distribution_20$layers)), function(i) {rlang::get_expr(c(sampling_distribution_20$layers[[i]]$mapping, sampling_distribution_20$mapping)$x)}), as.character))), "8ad21")), "4978582df96027225aac249f3ab2cf37"))
stopifnot("variable y is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(sampling_distribution_20$layers)), function(i) {rlang::get_expr(c(sampling_distribution_20$layers[[i]]$mapping, sampling_distribution_20$mapping)$y)}), as.character))), "8ad21")), "e84253c5c4007fcfe2086bb5852eecc0"))
stopifnot("x-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_20$layers[[1]]$mapping, sampling_distribution_20$mapping)$x)!= sampling_distribution_20$labels$x), "8ad21")), "2204c9e5409f8f5e29509c27aef96d08"))
stopifnot("y-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_20$layers[[1]]$mapping, sampling_distribution_20$mapping)$y)!= sampling_distribution_20$labels$y), "8ad21")), "e84253c5c4007fcfe2086bb5852eecc0"))
stopifnot("incorrect colour variable in sampling_distribution_20, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_20$layers[[1]]$mapping, sampling_distribution_20$mapping)$colour)), "8ad21")), "e84253c5c4007fcfe2086bb5852eecc0"))
stopifnot("incorrect shape variable in sampling_distribution_20, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_20$layers[[1]]$mapping, sampling_distribution_20$mapping)$shape)), "8ad21")), "e84253c5c4007fcfe2086bb5852eecc0"))
stopifnot("the colour label in sampling_distribution_20 is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_20$layers[[1]]$mapping, sampling_distribution_20$mapping)$colour) != sampling_distribution_20$labels$colour), "8ad21")), "e84253c5c4007fcfe2086bb5852eecc0"))
stopifnot("the shape label in sampling_distribution_20 is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_20$layers[[1]]$mapping, sampling_distribution_20$mapping)$colour) != sampling_distribution_20$labels$shape), "8ad21")), "e84253c5c4007fcfe2086bb5852eecc0"))
stopifnot("fill variable in sampling_distribution_20 is not correct"= setequal(digest(paste(toString(quo_name(sampling_distribution_20$mapping$fill)), "8ad21")), "b658f7f61ca1ec54fa3ee7cf60da5875"))
stopifnot("fill label in sampling_distribution_20 is not informative"= setequal(digest(paste(toString((quo_name(sampling_distribution_20$mapping$fill) != sampling_distribution_20$labels$fill)), "8ad21")), "e84253c5c4007fcfe2086bb5852eecc0"))
stopifnot("position argument in sampling_distribution_20 is not correct"= setequal(digest(paste(toString(class(sampling_distribution_20$layers[[1]]$position)[1]), "8ad21")), "30fa1d892ee409b6c181163edeef3696"))

stopifnot("type of sampling_distribution_20$labels$title is not character"= setequal(digest(paste(toString(class(sampling_distribution_20$labels$title)), "8ad22")), "5ad8469a4083dfb0ce11557d3985880f"))
stopifnot("length of sampling_distribution_20$labels$title is not correct"= setequal(digest(paste(toString(length(sampling_distribution_20$labels$title)), "8ad22")), "9cd4c9dc82ba108f63e1bfa86d742059"))
stopifnot("value of sampling_distribution_20$labels$title is not correct"= setequal(digest(paste(toString(tolower(sampling_distribution_20$labels$title)), "8ad22")), "948c769c7502a9886638ea45b8f678e4"))
stopifnot("letters in string value of sampling_distribution_20$labels$title are correct but case is not correct"= setequal(digest(paste(toString(sampling_distribution_20$labels$title), "8ad22")), "36c92956e1e26415ffb9f22bc5f2f209"))

stopifnot("sampling_distribution_20$data should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(sampling_distribution_20$data)), "8ad23")), "8b15950c56394f2d8c034981ace8b01d"))
stopifnot("dimensions of sampling_distribution_20$data are not correct"= setequal(digest(paste(toString(dim(sampling_distribution_20$data)), "8ad23")), "23bab558f0b89ad9fc394084e94ee47e"))
stopifnot("column names of sampling_distribution_20$data are not correct"= setequal(digest(paste(toString(sort(colnames(sampling_distribution_20$data))), "8ad23")), "16107bb4ad77186575d9635a04888185"))
stopifnot("types of columns in sampling_distribution_20$data are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(sampling_distribution_20$data, class)))), "8ad23")), "288c768f08ce10ee2084c62a6ebbaf04"))
stopifnot("values in one or more numerical columns in sampling_distribution_20$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sampling_distribution_20$data, is.numeric))) sort(round(sapply(sampling_distribution_20$data[, sapply(sampling_distribution_20$data, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "8ad23")), "89980033df168617354159156887c16e"))
stopifnot("values in one or more character columns in sampling_distribution_20$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sampling_distribution_20$data, is.character))) sum(sapply(sampling_distribution_20$data[sapply(sampling_distribution_20$data, is.character)], function(x) length(unique(x)))) else 0), "8ad23")), "5786fb1781279ca2b1561ea53538430f"))
stopifnot("values in one or more factor columns in sampling_distribution_20$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sampling_distribution_20$data, is.factor))) sum(sapply(sampling_distribution_20$data[, sapply(sampling_distribution_20$data, is.factor)], function(col) length(unique(col)))) else 0), "8ad23")), "5786fb1781279ca2b1561ea53538430f"))

print('Success!')

**Question 2.6** 
<br> {points: 1}

Using the same strategy as you did above, draw 1500 random samples from the Canadian seniors population (`can_seniors`), each of size 100. For each sample, calculate the mean age and store it in a column called `mean_age`.

Then, visualize the distribution of the sample estimates (means) you just calculated by plotting a histogram, passing `binwidth = 1` as an argument to `geom_histogram`. Name the plot `sampling_distribution_100`. Give the plot the title "Sampling Distribution (n=100)" using `ggtitle`, and give the x-axis a descriptive label. Also set the x-axis limits to 65 and 95 using `xlim(c(65, 95))`.

Set the seed to 4321 when you collect your samples.

In [None]:
set.seed(4321) # DO NOT CHANGE THIS!
options(repr.plot.width = 8, repr.plot.height = 7)
# your code here
fail() # No Answer - remove if you provide an answer
sampling_distribution_100

In [None]:
library(digest)
stopifnot("type of plot is not correct (if you are using two types of geoms, try flipping the order of the geom objects!)"= setequal(digest(paste(toString(sapply(seq_len(length(sampling_distribution_100$layers)), function(i) {c(class(sampling_distribution_100$layers[[i]]$geom))[1]})), "6be2d")), "05a74f89507e69bd8c1c6e4180827862"))
stopifnot("variable x is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(sampling_distribution_100$layers)), function(i) {rlang::get_expr(c(sampling_distribution_100$layers[[i]]$mapping, sampling_distribution_100$mapping)$x)}), as.character))), "6be2d")), "dc5af9ae6d4002ecb4713a6f82dce9bb"))
stopifnot("variable y is not correct"= setequal(digest(paste(toString(unlist(lapply(sapply(seq_len(length(sampling_distribution_100$layers)), function(i) {rlang::get_expr(c(sampling_distribution_100$layers[[i]]$mapping, sampling_distribution_100$mapping)$y)}), as.character))), "6be2d")), "4e597b86a2b4069866d2c7e88ea2dae5"))
stopifnot("x-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_100$layers[[1]]$mapping, sampling_distribution_100$mapping)$x)!= sampling_distribution_100$labels$x), "6be2d")), "b227094f91354b24837870fd175e6a3d"))
stopifnot("y-axis label is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_100$layers[[1]]$mapping, sampling_distribution_100$mapping)$y)!= sampling_distribution_100$labels$y), "6be2d")), "4e597b86a2b4069866d2c7e88ea2dae5"))
stopifnot("incorrect colour variable in sampling_distribution_100, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_100$layers[[1]]$mapping, sampling_distribution_100$mapping)$colour)), "6be2d")), "4e597b86a2b4069866d2c7e88ea2dae5"))
stopifnot("incorrect shape variable in sampling_distribution_100, specify a correct one if required"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_100$layers[[1]]$mapping, sampling_distribution_100$mapping)$shape)), "6be2d")), "4e597b86a2b4069866d2c7e88ea2dae5"))
stopifnot("the colour label in sampling_distribution_100 is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_100$layers[[1]]$mapping, sampling_distribution_100$mapping)$colour) != sampling_distribution_100$labels$colour), "6be2d")), "4e597b86a2b4069866d2c7e88ea2dae5"))
stopifnot("the shape label in sampling_distribution_100 is not descriptive, nicely formatted, or human readable"= setequal(digest(paste(toString(rlang::get_expr(c(sampling_distribution_100$layers[[1]]$mapping, sampling_distribution_100$mapping)$shape) != sampling_distribution_100$labels$shape), "6be2d")), "4e597b86a2b4069866d2c7e88ea2dae5"))
stopifnot("fill variable in sampling_distribution_100 is not correct"= setequal(digest(paste(toString(quo_name(sampling_distribution_100$mapping$fill)), "6be2d")), "fe60acb79f09ca26d66b2cea54a093f4"))
stopifnot("fill label in sampling_distribution_100 is not informative"= setequal(digest(paste(toString((quo_name(sampling_distribution_100$mapping$fill) != sampling_distribution_100$labels$fill)), "6be2d")), "4e597b86a2b4069866d2c7e88ea2dae5"))
stopifnot("position argument in sampling_distribution_100 is not correct"= setequal(digest(paste(toString(class(sampling_distribution_100$layers[[1]]$position)[1]), "6be2d")), "f692140a95143e90bd2985a94961319f"))

stopifnot("type of sampling_distribution_100$labels$title is not character"= setequal(digest(paste(toString(class(sampling_distribution_100$labels$title)), "6be2e")), "5fe8ac3ed3db8b9530764e79720fc41a"))
stopifnot("length of sampling_distribution_100$labels$title is not correct"= setequal(digest(paste(toString(length(sampling_distribution_100$labels$title)), "6be2e")), "1d09df540e59f9def945af75909bd8b6"))
stopifnot("value of sampling_distribution_100$labels$title is not correct"= setequal(digest(paste(toString(tolower(sampling_distribution_100$labels$title)), "6be2e")), "c6f74a74996e5a1ecab38a6441593631"))
stopifnot("letters in string value of sampling_distribution_100$labels$title are correct but case is not correct"= setequal(digest(paste(toString(sampling_distribution_100$labels$title), "6be2e")), "bb041b588b96394c3b18c7a13099e1df"))

stopifnot("sampling_distribution_100$data should be a data frame"= setequal(digest(paste(toString('data.frame' %in% class(sampling_distribution_100$data)), "6be2f")), "165f79539ce4d1e30d2c01cc14b6618a"))
stopifnot("dimensions of sampling_distribution_100$data are not correct"= setequal(digest(paste(toString(dim(sampling_distribution_100$data)), "6be2f")), "414c5d79f7dd081999bf8ee19236ecda"))
stopifnot("column names of sampling_distribution_100$data are not correct"= setequal(digest(paste(toString(sort(colnames(sampling_distribution_100$data))), "6be2f")), "b5e45fafce6bd526f899ec5e31384198"))
stopifnot("types of columns in sampling_distribution_100$data are not correct"= setequal(digest(paste(toString(sort(unlist(sapply(sampling_distribution_100$data, class)))), "6be2f")), "74502a2338b60bae0cb1c6a9a8baa1c6"))
stopifnot("values in one or more numerical columns in sampling_distribution_100$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sampling_distribution_100$data, is.numeric))) sort(round(sapply(sampling_distribution_100$data[, sapply(sampling_distribution_100$data, is.numeric)], sum, na.rm = TRUE), 2)) else 0), "6be2f")), "63e30da677ee59450411d2368944be82"))
stopifnot("values in one or more character columns in sampling_distribution_100$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sampling_distribution_100$data, is.character))) sum(sapply(sampling_distribution_100$data[sapply(sampling_distribution_100$data, is.character)], function(x) length(unique(x)))) else 0), "6be2f")), "aeb6c9b45b5b360362d85e156d8bf601"))
stopifnot("values in one or more factor columns in sampling_distribution_100$data are not correct"= setequal(digest(paste(toString(if (any(sapply(sampling_distribution_100$data, is.factor))) sum(sapply(sampling_distribution_100$data[, sapply(sampling_distribution_100$data, is.factor)], function(col) length(unique(col)))) else 0), "6be2f")), "aeb6c9b45b5b360362d85e156d8bf601"))

print('Success!')

In [None]:
# run this cell to change the sampling distribution plot created
# earlier in the notebook so that its x-axis has the same limits
# as the other two plots you just made, and so that its title is
# "Sampling Distribution (n=40)"
sampling_distribution <- sampling_distribution + 
    xlim(c(65, 95))
sampling_distribution$labels$title <- "Sampling Distribution (n=40)"

**Question 2.7** 
<br> {points: 1}

Fill in the blanks in the code below to use `plot_grid` to stack the three sampling distributions vertically. Order them from the smallest sample size on the top to the largest sample size on the bottom. Name the final panel figure `sampling_distribution_panel`.
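
For reference, here is a minimal sketch of how `plot_grid` from the `cowplot` package arranges plots. It uses two small made-up plots (`plot_a` and `plot_b`, built from a hypothetical `toy_data` data frame) rather than the sampling distributions from this worksheet.

```r
library(ggplot2)
library(cowplot)

# Hypothetical toy data, used only to have something to plot
toy_data <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6))

plot_a <- ggplot(toy_data, aes(x = x, y = y)) +
    geom_point() +
    ggtitle("Plot A")

plot_b <- ggplot(toy_data, aes(x = x, y = y)) +
    geom_line() +
    ggtitle("Plot B")

# Plots are placed in the order they are listed; with ncol = 1 they
# are stacked in a single column, so plot_a ends up above plot_b
toy_panel <- plot_grid(plot_a, plot_b, ncol = 1)
toy_panel
```

Because the plots are laid out in the order given, the first argument ends up at the top of a single-column panel.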

In [None]:
options(repr.plot.width = 6)
# sampling_distribution_panel <- plot_grid(
#     ...,
#     ...,
#     ...,
#     ncol = 1
# )

# your code here
fail() # No Answer - remove if you provide an answer

In [None]:
library(digest)
stopifnot("type of unlist(sapply(sampling_distribution_panel$layers, function(x) x$geom_params$grob$grobs[[which(x$geom_params$grob$layout$name == \"title\")]]$children[[1]]$label)) is not character"= setequal(digest(paste(toString(class(unlist(sapply(sampling_distribution_panel$layers, function(x) x$geom_params$grob$grobs[[which(x$geom_params$grob$layout$name == "title")]]$children[[1]]$label)))), "ed003")), "a66ee208c03848cdc9923b8e9e07b850"))
stopifnot("length of unlist(sapply(sampling_distribution_panel$layers, function(x) x$geom_params$grob$grobs[[which(x$geom_params$grob$layout$name == \"title\")]]$children[[1]]$label)) is not correct"= setequal(digest(paste(toString(length(unlist(sapply(sampling_distribution_panel$layers, function(x) x$geom_params$grob$grobs[[which(x$geom_params$grob$layout$name == "title")]]$children[[1]]$label)))), "ed003")), "f0fbf359cf86bbbb0b435ff8f8b087a9"))
stopifnot("value of unlist(sapply(sampling_distribution_panel$layers, function(x) x$geom_params$grob$grobs[[which(x$geom_params$grob$layout$name == \"title\")]]$children[[1]]$label)) is not correct"= setequal(digest(paste(toString(tolower(unlist(sapply(sampling_distribution_panel$layers, function(x) x$geom_params$grob$grobs[[which(x$geom_params$grob$layout$name == "title")]]$children[[1]]$label)))), "ed003")), "4d3fcf1a06e6a857cbd26a5441070921"))
stopifnot("letters in string value of unlist(sapply(sampling_distribution_panel$layers, function(x) x$geom_params$grob$grobs[[which(x$geom_params$grob$layout$name == \"title\")]]$children[[1]]$label)) are correct but case is not correct"= setequal(digest(paste(toString(unlist(sapply(sampling_distribution_panel$layers, function(x) x$geom_params$grob$grobs[[which(x$geom_params$grob$layout$name == "title")]]$children[[1]]$label))), "ed003")), "32113e1822fdd024d0b2080f9625d150"))

print('Success!')

**Question 2.8** Multiple Choice
<br> {points: 1}

Considering the panel figure you created above in **Question 2.7**, which of the following statements **is not** correct?

A. As the sample size increases, the sampling distribution of the point estimate becomes narrower.

B. As the sample size increases, more sample point estimates are closer to the true population mean.

C. As the sample size decreases, the sample point estimates become more variable (spread out).

D. As the sample size increases, the sample point estimates become more variable (spread out).

*Assign your answer to an object called `answer2.8`. Your answer should be a single character surrounded by quotes.*

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer
answer2.8

In [None]:
library(digest)
stopifnot("type of answer2.8 is not character"= setequal(digest(paste(toString(class(answer2.8)), "1295c")), "30545fdc69b8c2f46e5c56fbec6e6b63"))
stopifnot("length of answer2.8 is not correct"= setequal(digest(paste(toString(length(answer2.8)), "1295c")), "16d39cd20bf6827654dcbe362ca3ea1b"))
stopifnot("value of answer2.8 is not correct"= setequal(digest(paste(toString(tolower(answer2.8)), "1295c")), "3190c6a35d7613fb9cacdbe9314f54de"))
stopifnot("letters in string value of answer2.8 are correct but case is not correct"= setequal(digest(paste(toString(answer2.8), "1295c")), "e28bd1157ab35285c0ae12901244abc4"))

print('Success!')

**Question 2.9** True/False
<br> {points: 1}

Given what you observed above, and considering the real life scenario where you will only have one sample, answer the True/False question below:

The smaller your random sample, the better your sample point estimate reflects the true population parameter you are trying to estimate. True or False?

*Assign your answer to an object called `answer2.9`. Your answer should be either "true" or "false", surrounded by quotes.*

In [None]:
# your code here
fail() # No Answer - remove if you provide an answer
answer2.9

In [None]:
library(digest)
stopifnot("type of answer2.9 is not character"= setequal(digest(paste(toString(class(answer2.9)), "2c130")), "52308d931cbec43d926baa2a9c27ec70"))
stopifnot("length of answer2.9 is not correct"= setequal(digest(paste(toString(length(answer2.9)), "2c130")), "198667e56627dd537241529e777f6ecc"))
stopifnot("value of answer2.9 is not correct"= setequal(digest(paste(toString(tolower(answer2.9)), "2c130")), "e8cc029f6348b382afad35667a0e0fc8"))
stopifnot("letters in string value of answer2.9 are correct but case is not correct"= setequal(digest(paste(toString(answer2.9), "2c130")), "e8cc029f6348b382afad35667a0e0fc8"))

print('Success!')

In [None]:
source('cleanup.R')