# Worksheet 11 - Introduction to Statistical Inference

### Lecture and Tutorial Learning Goals:

After completing this week's lecture and tutorial work, you will be able to:
- Describe real-world examples of questions that can be answered with statistical inference methods.
- Name common population parameters (e.g., mean, proportion, median, variance, standard deviation) that are often estimated using sample data, and use computation to estimate these.
- Define the following statistical sampling terms (population, sample, population parameter, point estimate, sampling distribution).
- Explain the difference between a population parameter and sample point estimate.
- Use computation to draw random samples from a finite population.
- Use computation to create a sampling distribution from a finite population.
- Describe how sample size influences the sampling distribution.

This worksheet covers parts of [Chapter 10](https://python.datasciencebook.ca/inference) of the online textbook. You should read this chapter before attempting this assignment. Any place you see `___`, you must fill in the function, variable, or data to complete the code. Replace `raise NotImplementedError` with your completed code and answers, then run the cell.

In [None]:
### Run this cell before continuing.
import altair as alt
import numpy as np
import pandas as pd

# Simplify working with large datasets in Altair
alt.data_transformers.enable('vegafusion')

**Question 1.0** Multiple Choice:
<br> {points: 1}

In which of the following questions would inferential methods (e.g., estimation or hypothesis testing) be appropriate?

A. Does treating a corn crop with Roundup cause greater yields compared to corn crops that are not treated with pesticides in Saskatchewan?

B. Are yields of corn crops which are treated with Roundup different than corn crops which are not treated with pesticides in Saskatchewan?

C. What will be the yield of a corn crop in Saskatchewan if we treat it with Roundup next year?

D. Are yields of corn crops which are treated with Roundup different than corn crops which are not treated with pesticides in the data set collected from the Rural Municipality of Cymri No. 36 in Saskatchewan?

*Assign your answer to an object called `answer1_0`. Your answer should be a single character surrounded by quotes.*

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer1_0)).encode("utf-8")+b"6c022").hexdigest() == "710687cbebd515a0c2e73c181d3a15fa68b8aac7", "type of answer1_0 is not str. answer1_0 should be an str"
assert sha1(str(len(answer1_0)).encode("utf-8")+b"6c022").hexdigest() == "c6c1af78ca7f2b4ff3e0ddfbe7536cc249417671", "length of answer1_0 is not correct"
assert sha1(str(answer1_0.lower()).encode("utf-8")+b"6c022").hexdigest() == "2e7cd0787e08e6702f7dc13c59cbc123fe5f3d2c", "value of answer1_0 is not correct"
assert sha1(str(answer1_0).encode("utf-8")+b"6c022").hexdigest() == "a04408574441765026696d6ba305cbc57c25d38e", "correct string value of answer1_0 but incorrect case of letters"

print('Success!')

**Question 1.1** Matching:
<br> {points: 1}

Read the mixed-up table below and assign each variable in the code cell below a number that matches the term to its correct definition. Do not put quotation marks around the number or include words in the answer; we expect the assigned values to be numbers.

| Terms |  Definitions |
|----------------|------------|
| <p align="left">point estimate | <p align="left">1. the entire set of entities/objects of interest |
| <p align="left">population | <p align="left">2. selecting a subset of observations from a population where each observation is equally likely to be selected at any point during the selection process|
| <p align="left">random sampling | <p align="left">3. a numerical summary value about the population |
| <p align="left">representative sampling | <p align="left">4. a distribution of point estimates, where each point estimate was calculated from a different random sample from the same population |
| <p align="left">population parameter | <p align="left">5. a collection of observations from a population |
| <p align="left">sample |  <p align="left">6. a single number calculated from a random sample that estimates an unknown population parameter of interest |
| <p align="left">observation | <p align="left">7. selecting a subset of observations from a population where the sample’s characteristics are a good representation of the population’s characteristics |
| <p align="left">sampling distribution | <p align="left">8. a quantity or a quality (or set of these) we collect from a given entity/object |

In [None]:
point_estimate = None
population = None
random_sampling = None
representative_sampling = None
population_parameter = None
sample = None
observation = None
sampling_distribution = None

# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(point_estimate)).encode("utf-8")+b"268ba").hexdigest() == "0e6ff717338c470d50b8eba39b10f54913ad654d", "type of point_estimate is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(point_estimate).encode("utf-8")+b"268ba").hexdigest() == "fabcd5914d8b8e1c68d1a9e037ae4cfe7f7c928d", "value of point_estimate is not correct"

assert sha1(str(type(population)).encode("utf-8")+b"268bb").hexdigest() == "76ed8878f9e81c2a8fed4444bc112c353893974c", "type of population is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(population).encode("utf-8")+b"268bb").hexdigest() == "272dd7dcd6f4349eddf1dc1b333ac0bcf964a53d", "value of population is not correct"

assert sha1(str(type(random_sampling)).encode("utf-8")+b"268bc").hexdigest() == "d6d521ce9291a2641f9581011e1339fd203b718e", "type of random_sampling is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(random_sampling).encode("utf-8")+b"268bc").hexdigest() == "59d2c9ac442fa0cb147b730e421ee2496503f416", "value of random_sampling is not correct"

assert sha1(str(type(representative_sampling)).encode("utf-8")+b"268bd").hexdigest() == "d9793518bd12033183ca1669dd12db5c33259365", "type of representative_sampling is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(representative_sampling).encode("utf-8")+b"268bd").hexdigest() == "35465a75d685bf93047f450184b2f1f13c0beb9e", "value of representative_sampling is not correct"

assert sha1(str(type(population_parameter)).encode("utf-8")+b"268be").hexdigest() == "e431ea1e158754d22bb2f0d0616f2cd7acab0f96", "type of population_parameter is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(population_parameter).encode("utf-8")+b"268be").hexdigest() == "096f3f9ad7d8a00b4f377fc5d2927ecd6fedda35", "value of population_parameter is not correct"

assert sha1(str(type(sample)).encode("utf-8")+b"268bf").hexdigest() == "7ef603cc6f9b0eaf6384f815b16d375dfc941640", "type of sample is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sample).encode("utf-8")+b"268bf").hexdigest() == "d4bc1bba1235f2cead1cdc66641b958ef411c158", "value of sample is not correct"

assert sha1(str(type(observation)).encode("utf-8")+b"268c0").hexdigest() == "4d9764d06f14314ac4d9ad92f2a37bc4b2e67470", "type of observation is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(observation).encode("utf-8")+b"268c0").hexdigest() == "f62f0c6a1964d48ad8cba312e3762b71b36f692a", "value of observation is not correct"

assert sha1(str(type(sampling_distribution)).encode("utf-8")+b"268c1").hexdigest() == "dc9d77781abb23b739fa06fdb2828914aae09247", "type of sampling_distribution is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sampling_distribution).encode("utf-8")+b"268c1").hexdigest() == "c6a30513e572fb285b2c89d38fdc8a4b0f8bdd48", "value of sampling_distribution is not correct"

print('Success!')

###  Virtual sampling simulation

In real life, we rarely, if ever, have measurements for our entire population. Here, however, we will pretend that we were somehow able to ask every single Canadian senior what their age is. We will do this so that we can experiment to learn about sampling and how it relates to estimation.

Here we make a simulated dataset of ages for our population (all Canadian seniors) bounded by realistic values ($\geq$ 65 and $\leq$ 118):

In [None]:
# Run this cell to simulate a large finite population
# Don't change the seed!
np.random.seed(4321)

can_seniors = pd.DataFrame({
    'age': np.random.exponential(1 / 0.1, 2_000_000) ** 2 + 65,
}).query(
    "65 <= age <= 118"
)

can_seniors

**Question 1.2** 
<br> {points: 1}

A distribution defines all the possible values (or intervals) of the data and how often they occur. Visualize the distribution of the population (`can_seniors`) that was just created by plotting a histogram using `maxbins=30` in the `alt.Bin` argument. *Name the plot `pop_dist` and give the x-axis a descriptive label.*
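
To make the idea of a distribution concrete, here is a minimal sketch on a small made-up array (`toy_ages` is hypothetical and not part of the worksheet data). It uses `np.histogram` to count how many values fall into each interval, which is the same binning idea a histogram draws as bars. It is only an illustration; the question itself still asks for an Altair chart.

```python
import numpy as np

# Hypothetical toy data, not the worksheet population
toy_ages = np.array([65, 66, 66, 70, 71, 75, 80, 80, 81, 95])

# Count the values in 3 equal-width intervals spanning 65 to 95
counts, bin_edges = np.histogram(toy_ages, bins=3, range=(65, 95))

# Each interval includes its left edge; the last one also includes its right edge
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {count} values")
```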

In [None]:
# ___ = alt.Chart(___, title='Population distribution').mark_bar().encode(
#     x=alt.X(___)
#         .title(___)
#         .bin(___=30),
#     y=___
# )

# your code here
raise NotImplementedError
pop_dist

In [None]:
from hashlib import sha1
assert sha1(str(type(pop_dist.mark)).encode("utf-8")+b"f2970").hexdigest() == "15fafeb147ef2c09c0fbbdf9c142e600d9524daf", "type of pop_dist.mark is not str. pop_dist.mark should be an str"
assert sha1(str(len(pop_dist.mark)).encode("utf-8")+b"f2970").hexdigest() == "ecd99844572865e942a6ddc3ce95d077cfb9d658", "length of pop_dist.mark is not correct"
assert sha1(str(pop_dist.mark.lower()).encode("utf-8")+b"f2970").hexdigest() == "e0f59c6b5c45338b3d987d901686942463e09f3c", "value of pop_dist.mark is not correct"
assert sha1(str(pop_dist.mark).encode("utf-8")+b"f2970").hexdigest() == "e0f59c6b5c45338b3d987d901686942463e09f3c", "correct string value of pop_dist.mark but incorrect case of letters"

assert sha1(str(type(pop_dist.encoding['x']['shorthand'])).encode("utf-8")+b"f2971").hexdigest() == "c9242a560fd5b932262330d8253213b20ddc3648", "type of pop_dist.encoding['x']['shorthand'] is not str. pop_dist.encoding['x']['shorthand'] should be an str"
assert sha1(str(len(pop_dist.encoding['x']['shorthand'])).encode("utf-8")+b"f2971").hexdigest() == "275f39a0860f8dcc71bedb5fdd5f8aa18d773da1", "length of pop_dist.encoding['x']['shorthand'] is not correct"
assert sha1(str(pop_dist.encoding['x']['shorthand'].lower()).encode("utf-8")+b"f2971").hexdigest() == "b3b20a82f02b04e38a7aac7d4dd72df6bcdd8083", "value of pop_dist.encoding['x']['shorthand'] is not correct"
assert sha1(str(pop_dist.encoding['x']['shorthand']).encode("utf-8")+b"f2971").hexdigest() == "b3b20a82f02b04e38a7aac7d4dd72df6bcdd8083", "correct string value of pop_dist.encoding['x']['shorthand'] but incorrect case of letters"

assert sha1(str(type(pop_dist.encoding['y']['shorthand'])).encode("utf-8")+b"f2972").hexdigest() == "6904b732d746c7a333c26bd94b7e538c39456a1c", "type of pop_dist.encoding['y']['shorthand'] is not str. pop_dist.encoding['y']['shorthand'] should be an str"
assert sha1(str(len(pop_dist.encoding['y']['shorthand'])).encode("utf-8")+b"f2972").hexdigest() == "bbe7d226ca1c4ca8872f4baae98a335ad177c0ee", "length of pop_dist.encoding['y']['shorthand'] is not correct"
assert sha1(str(pop_dist.encoding['y']['shorthand'].lower()).encode("utf-8")+b"f2972").hexdigest() == "3b236bb221a60e492d3be0b31f02f3e707099e3e", "value of pop_dist.encoding['y']['shorthand'] is not correct"
assert sha1(str(pop_dist.encoding['y']['shorthand']).encode("utf-8")+b"f2972").hexdigest() == "3b236bb221a60e492d3be0b31f02f3e707099e3e", "correct string value of pop_dist.encoding['y']['shorthand'] but incorrect case of letters"

assert sha1(str(type(isinstance(pop_dist.encoding.x['title'], str))).encode("utf-8")+b"f2973").hexdigest() == "7a14e0c6dfb1dac88f4356a70c1c40dfc19d0205", "type of isinstance(pop_dist.encoding.x['title'], str) is not bool. isinstance(pop_dist.encoding.x['title'], str) should be a bool"
assert sha1(str(isinstance(pop_dist.encoding.x['title'], str)).encode("utf-8")+b"f2973").hexdigest() == "c35201a52f2f699d2f23609d99e9c6df2a14d7d3", "boolean value of isinstance(pop_dist.encoding.x['title'], str) is not correct"

assert sha1(str(type(pop_dist.data.equals(can_seniors))).encode("utf-8")+b"f2974").hexdigest() == "77f5962941dce89617db238c73d176955fa3373e", "type of pop_dist.data.equals(can_seniors) is not bool. pop_dist.data.equals(can_seniors) should be a bool"
assert sha1(str(pop_dist.data.equals(can_seniors)).encode("utf-8")+b"f2974").hexdigest() == "81be1cab6ed220fe013c62d755662fdf19a9c3ce", "boolean value of pop_dist.data.equals(can_seniors) is not correct"

print('Success!')

**Question 1.3** 
<br> {points: 1}

We often want to represent distributions by a single value or a small number of values. Common values used for this include the mean, the median, and the standard deviation.

Use the `agg` method to calculate the following population parameters from the `can_seniors` population:

- mean (`mean`)
- median (`median`)
- standard deviation (`std`)

*Name the resulting data frame `pop_parameters` (it should have only one column, called `age`, and one population parameter per row).*
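
As a rough sketch of how `agg` behaves, the example below applies it to a small made-up data frame (`toy` and its `height` column are hypothetical, not worksheet data). Passing a list of function names returns a data frame with one row per summary and one column per original column.

```python
import pandas as pd

# Hypothetical toy data frame, not the worksheet population
toy = pd.DataFrame({"height": [150.0, 160.0, 170.0, 180.0]})

# A list of function names gives one row per summary statistic
toy_summary = toy.agg(["mean", "median", "std"])
print(toy_summary)
```

Note that the pandas `std` function divides by `n - 1` by default.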

In [None]:
# your code here
raise NotImplementedError
pop_parameters

In [None]:
from hashlib import sha1
assert sha1(str(type(pop_parameters.shape[0])).encode("utf-8")+b"aacb2").hexdigest() == "17a303a4b5ab04f7673e26b5ce8eed7570396891", "type of pop_parameters.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(pop_parameters.shape[0]).encode("utf-8")+b"aacb2").hexdigest() == "0025421d3f5650cda2103768219a24f5919cf11c", "value of pop_parameters.shape[0] is not correct"

assert sha1(str(type(pop_parameters.shape[1])).encode("utf-8")+b"aacb3").hexdigest() == "91e71c590fa0439fc5a005ff092d20cde06d2e2c", "type of pop_parameters.shape[1] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(pop_parameters.shape[1]).encode("utf-8")+b"aacb3").hexdigest() == "5f4630f75d0230da6bc489a10ffd014ec6dcaf9f", "value of pop_parameters.shape[1] is not correct"

assert sha1(str(type("".join(pop_parameters.columns.values))).encode("utf-8")+b"aacb4").hexdigest() == "56cee7a511c136ad9ced5bf89b8bbef9e5510e78", "type of \"\".join(pop_parameters.columns.values) is not str. \"\".join(pop_parameters.columns.values) should be an str"
assert sha1(str(len("".join(pop_parameters.columns.values))).encode("utf-8")+b"aacb4").hexdigest() == "02ec09cdb05f671b4ea924e96bebbc13c4e92a28", "length of \"\".join(pop_parameters.columns.values) is not correct"
assert sha1(str("".join(pop_parameters.columns.values).lower()).encode("utf-8")+b"aacb4").hexdigest() == "59006204a44676616ad79c6363ca3878f111cc6a", "value of \"\".join(pop_parameters.columns.values) is not correct"
assert sha1(str("".join(pop_parameters.columns.values)).encode("utf-8")+b"aacb4").hexdigest() == "59006204a44676616ad79c6363ca3878f111cc6a", "correct string value of \"\".join(pop_parameters.columns.values) but incorrect case of letters"

print('Success!')

**Question 1.4** 
<br> {points: 1}

In real life, we can usually collect only a single sample from the population. We use that sample to try to infer what the population looks like.

Take a single random sample of 40 observations using `sample` from the Canadian seniors population (`can_seniors`). Name it `sample_1`. Use 4321 as your `random_state`.
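
The sketch below shows the general behaviour of `sample` on a made-up data frame (`toy_population` is hypothetical and unrelated to `can_seniors`). By default rows are drawn without replacement, and fixing `random_state` makes the draw reproducible.

```python
import pandas as pd

# Hypothetical toy population of 100 rows
toy_population = pd.DataFrame({"value": range(100)})

# Two draws with the same random_state select the same rows
draw_a = toy_population.sample(5, random_state=1)
draw_b = toy_population.sample(5, random_state=1)

print(draw_a.equals(draw_b))
```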

In [None]:
# ___ = ___.sample(___, random_state=4321)

# your code here
raise NotImplementedError
sample_1.head()

In [None]:
from hashlib import sha1
assert sha1(str(type(sample_1.shape[0])).encode("utf-8")+b"cae72").hexdigest() == "50509324252bd7838583ef43ac08ad2afe19a919", "type of sample_1.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sample_1.shape[0]).encode("utf-8")+b"cae72").hexdigest() == "4bbf559d3ec30d905f63a070228f9b05f14b3b34", "value of sample_1.shape[0] is not correct"

assert sha1(str(type(sample_1.shape[1])).encode("utf-8")+b"cae73").hexdigest() == "4277017b144852fadc8f11b8448bb5d7002424ef", "type of sample_1.shape[1] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sample_1.shape[1]).encode("utf-8")+b"cae73").hexdigest() == "6d083819c316eedb144babfc5452c917eb08c91c", "value of sample_1.shape[1] is not correct"

assert sha1(str(type("".join(sample_1.columns.values))).encode("utf-8")+b"cae74").hexdigest() == "643b374efc48b91e4dc8808819edd76b1e1e0943", "type of \"\".join(sample_1.columns.values) is not str. \"\".join(sample_1.columns.values) should be an str"
assert sha1(str(len("".join(sample_1.columns.values))).encode("utf-8")+b"cae74").hexdigest() == "1ba40395bc384d814f7397cd6c9efbc94065cd9f", "length of \"\".join(sample_1.columns.values) is not correct"
assert sha1(str("".join(sample_1.columns.values).lower()).encode("utf-8")+b"cae74").hexdigest() == "a7c7c8ffe46db06791bb8b9e32535560e5caaf08", "value of \"\".join(sample_1.columns.values) is not correct"
assert sha1(str("".join(sample_1.columns.values)).encode("utf-8")+b"cae74").hexdigest() == "a7c7c8ffe46db06791bb8b9e32535560e5caaf08", "correct string value of \"\".join(sample_1.columns.values) but incorrect case of letters"

print('Success!')

**Question 1.5** 
<br> {points: 1}

Visualize the distribution of the random sample you just took (`sample_1`) by plotting a histogram using `maxbins=30`. Just as in the population histogram we created above, give the plot a title; a suitable choice could be `"Sample 1 distribution"`.

*Name the plot `sample_1_dist` and give the plot and the x-axis a descriptive label.*

In [None]:
# your code here
raise NotImplementedError
sample_1_dist

In [None]:
from hashlib import sha1
assert sha1(str(type(sample_1_dist.mark)).encode("utf-8")+b"824df").hexdigest() == "4b504a57c0a66ff643492f8c638bb70f57a07016", "type of sample_1_dist.mark is not str. sample_1_dist.mark should be an str"
assert sha1(str(len(sample_1_dist.mark)).encode("utf-8")+b"824df").hexdigest() == "228eb1095a3e29be22e8626b70ba069ae2473283", "length of sample_1_dist.mark is not correct"
assert sha1(str(sample_1_dist.mark.lower()).encode("utf-8")+b"824df").hexdigest() == "6fbd850255a0b7ef8a8110c4496fb8818511b600", "value of sample_1_dist.mark is not correct"
assert sha1(str(sample_1_dist.mark).encode("utf-8")+b"824df").hexdigest() == "6fbd850255a0b7ef8a8110c4496fb8818511b600", "correct string value of sample_1_dist.mark but incorrect case of letters"

assert sha1(str(type(sample_1_dist.encoding.x['shorthand'])).encode("utf-8")+b"824e0").hexdigest() == "04eef7ecfba46fabc38d5f59308c8d671fb942d1", "type of sample_1_dist.encoding.x['shorthand'] is not str. sample_1_dist.encoding.x['shorthand'] should be an str"
assert sha1(str(len(sample_1_dist.encoding.x['shorthand'])).encode("utf-8")+b"824e0").hexdigest() == "299c56123fc24279e688c098dc2e1f5955f1400b", "length of sample_1_dist.encoding.x['shorthand'] is not correct"
assert sha1(str(sample_1_dist.encoding.x['shorthand'].lower()).encode("utf-8")+b"824e0").hexdigest() == "ffd8b6b4a9c52fbaebe551d4ccee141746ca7875", "value of sample_1_dist.encoding.x['shorthand'] is not correct"
assert sha1(str(sample_1_dist.encoding.x['shorthand']).encode("utf-8")+b"824e0").hexdigest() == "ffd8b6b4a9c52fbaebe551d4ccee141746ca7875", "correct string value of sample_1_dist.encoding.x['shorthand'] but incorrect case of letters"

assert sha1(str(type(sample_1_dist.encoding.y['shorthand'])).encode("utf-8")+b"824e1").hexdigest() == "840f864b8b313efb3ac20c89782298939062e2f3", "type of sample_1_dist.encoding.y['shorthand'] is not str. sample_1_dist.encoding.y['shorthand'] should be an str"
assert sha1(str(len(sample_1_dist.encoding.y['shorthand'])).encode("utf-8")+b"824e1").hexdigest() == "99ef253843c805cb5b3256ef49fcf17deb723e89", "length of sample_1_dist.encoding.y['shorthand'] is not correct"
assert sha1(str(sample_1_dist.encoding.y['shorthand'].lower()).encode("utf-8")+b"824e1").hexdigest() == "0d6fe253c038a91099e10d4244b04d84456d5e9d", "value of sample_1_dist.encoding.y['shorthand'] is not correct"
assert sha1(str(sample_1_dist.encoding.y['shorthand']).encode("utf-8")+b"824e1").hexdigest() == "0d6fe253c038a91099e10d4244b04d84456d5e9d", "correct string value of sample_1_dist.encoding.y['shorthand'] but incorrect case of letters"

assert sha1(str(type(isinstance(sample_1_dist.encoding.x['title'], str))).encode("utf-8")+b"824e2").hexdigest() == "9b322c01efa15bb71b850fb0d96997b5ad53a0e4", "type of isinstance(sample_1_dist.encoding.x['title'], str) is not bool. isinstance(sample_1_dist.encoding.x['title'], str) should be a bool"
assert sha1(str(isinstance(sample_1_dist.encoding.x['title'], str)).encode("utf-8")+b"824e2").hexdigest() == "946e61c4fa16e19569b58b1ad250b44d0829e1cc", "boolean value of isinstance(sample_1_dist.encoding.x['title'], str) is not correct"

assert sha1(str(type(sample_1_dist.data.equals(sample_1))).encode("utf-8")+b"824e3").hexdigest() == "e51d383836e43d4908a3de7c2eb4043f5a25e968", "type of sample_1_dist.data.equals(sample_1) is not bool. sample_1_dist.data.equals(sample_1) should be a bool"
assert sha1(str(sample_1_dist.data.equals(sample_1)).encode("utf-8")+b"824e3").hexdigest() == "af2f119b4ad0e305ee474feb18ecc4645b2f79b8", "boolean value of sample_1_dist.data.equals(sample_1) is not correct"

print('Success!')

**Question 1.6** 
<br> {points: 1}

Use `agg` to calculate the following point estimates from the random sample you just took (`sample_1`):

- mean 
- median 
- standard deviation 

*Name this data frame `sample_1_estimates`.*

In [None]:
# your code here
raise NotImplementedError
sample_1_estimates

In [None]:
from hashlib import sha1
assert sha1(str(type(sample_1_estimates.shape[0])).encode("utf-8")+b"8a164").hexdigest() == "90bf1792a5eb6cad717eac14630455b4f7126c48", "type of sample_1_estimates.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sample_1_estimates.shape[0]).encode("utf-8")+b"8a164").hexdigest() == "30f48e869e0fbc93d2cf585aad17674b1eef86b0", "value of sample_1_estimates.shape[0] is not correct"

assert sha1(str(type(sample_1_estimates.shape[1])).encode("utf-8")+b"8a165").hexdigest() == "5bf50f3aaffbd2d3ddad786322d71765b8783f02", "type of sample_1_estimates.shape[1] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sample_1_estimates.shape[1]).encode("utf-8")+b"8a165").hexdigest() == "7c0d308b2179c0c5d807405a44fc3ca67bb33098", "value of sample_1_estimates.shape[1] is not correct"

assert sha1(str(type(sample_1_estimates.columns.values)).encode("utf-8")+b"8a166").hexdigest() == "c6d62474cba6c3bcc3fa60c6d33c3cebbf802d71", "type of sample_1_estimates.columns.values is not correct"
assert sha1(str(sample_1_estimates.columns.values).encode("utf-8")+b"8a166").hexdigest() == "b94bc33b45edb32d1e8592aab6a2aa073df1f59b", "value of sample_1_estimates.columns.values is not correct"

print('Success!')

Let's now compare our random sample to the population from which it was drawn. In `altair`, it is possible to display multiple charts together by using the concatenation operators. We can use the `&` operator to concatenate charts vertically and `|` to concatenate horizontally. Since we want to compare the distributions' shape and position on the x-axis, it is most effective to concatenate these charts vertically.

In [None]:
# run this code cell
pop_dist & sample_1_dist

And now let's compare the point estimates (mean, median and standard deviation) with the true population parameters we were trying to estimate:
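
One possible way to line up estimates and parameters side by side is sketched below on made-up data (`toy_population`, `toy_sample`, and `comparison` are hypothetical names, not part of the worksheet). It places both sets of summaries in one data frame and adds their difference.

```python
import pandas as pd

# Hypothetical toy population and one random sample drawn from it
toy_population = pd.DataFrame({"value": range(1, 1001)})
toy_sample = toy_population.sample(50, random_state=0)

stats = ["mean", "median", "std"]

# Calling agg on a single column returns a Series indexed by statistic name
comparison = pd.DataFrame({
    "parameter": toy_population["value"].agg(stats),
    "estimate": toy_sample["value"].agg(stats),
})
comparison["difference"] = comparison["estimate"] - comparison["parameter"]
print(comparison)
```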

In [None]:
# run this cell
pop_parameters

In [None]:
# run this cell
sample_1_estimates

**Question 1.7** Multiple Choice
<br> {points: 1}

After comparing the population and sample distributions above, and the true population parameters and the sample point estimates, which statement below **is not** correct?

A. The sample point estimates are close to the values for the true population parameters we are trying to estimate

B. The sample distribution is of a similar shape to the population distribution

C. The sample point estimates are identical to the values for the true population parameters we are trying to estimate

*Assign your answer to an object called `answer1_7`. Your answer should be a single character surrounded by quotes.*

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer1_7)).encode("utf-8")+b"a42").hexdigest() == "155b5d482db5d4eb60ffbce114106185e0e19792", "type of answer1_7 is not str. answer1_7 should be an str"
assert sha1(str(len(answer1_7)).encode("utf-8")+b"a42").hexdigest() == "58e9720da90075311f01576ec4592ae98b8628c3", "length of answer1_7 is not correct"
assert sha1(str(answer1_7.lower()).encode("utf-8")+b"a42").hexdigest() == "953b6b74b6da78e7ea91d548cb4b7e0cf4d91c27", "value of answer1_7 is not correct"
assert sha1(str(answer1_7).encode("utf-8")+b"a42").hexdigest() == "b7d57aa61f39f03f6466b442b96fb7523130a80b", "correct string value of answer1_7 but incorrect case of letters"

print('Success!')

**Question 1.8.0** 
<br> {points: 1}

What if we took another sample? What would we expect? Let's try! Take another random sample of size 40 from the population (using `2020` as the `random_state` so that you get a different sample), visualize its distribution with the title `"Sample 2 distribution"`, and calculate the point estimates for the sample mean, median, and standard deviation.

*Name your random sample of data `sample_2`, name your visualization  `sample_2_dist`, and finally name your estimates `sample_2_estimates`.*
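
To see why the choice of `random_state` matters, here is a minimal sketch on a made-up data frame (`toy_population` is hypothetical, not `can_seniors`). Two draws with different `random_state` values select different rows, so summaries computed from them will generally differ as well.

```python
import pandas as pd

# Hypothetical toy population of 1000 rows
toy_population = pd.DataFrame({"value": range(1000)})

# Different random_state values lead to different selections of rows
first = toy_population.sample(10, random_state=1)
second = toy_population.sample(10, random_state=2)

print(first["value"].mean(), second["value"].mean())
```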

In [None]:
# your code here
raise NotImplementedError
sample_2_dist

In [None]:
# Run this cell
sample_2_estimates

In [None]:
from hashlib import sha1
assert sha1(str(type(sample_2.shape[0])).encode("utf-8")+b"9853f").hexdigest() == "c5072de847046d7c77b9da3cdbc7f3309c000945", "type of sample_2.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sample_2.shape[0]).encode("utf-8")+b"9853f").hexdigest() == "9ceec5c353ec34340daadce485ab8f0b83a81ed3", "value of sample_2.shape[0] is not correct"

assert sha1(str(type(sample_2.shape[1])).encode("utf-8")+b"98540").hexdigest() == "083daadf7f1020733265574680b7046e467d1626", "type of sample_2.shape[1] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sample_2.shape[1]).encode("utf-8")+b"98540").hexdigest() == "06bf52d191266d46698ef228287c0fadfa538193", "value of sample_2.shape[1] is not correct"

assert sha1(str(type("".join(sample_2.columns.values))).encode("utf-8")+b"98541").hexdigest() == "72a59c9e24553d69cc6935abf8728d150f30db1f", "type of \"\".join(sample_2.columns.values) is not str. \"\".join(sample_2.columns.values) should be an str"
assert sha1(str(len("".join(sample_2.columns.values))).encode("utf-8")+b"98541").hexdigest() == "baee856d2fec70ceb80b0bcedc51978a536623a0", "length of \"\".join(sample_2.columns.values) is not correct"
assert sha1(str("".join(sample_2.columns.values).lower()).encode("utf-8")+b"98541").hexdigest() == "346a17a92f8f6a86d964beccbbd42c958b10b50d", "value of \"\".join(sample_2.columns.values) is not correct"
assert sha1(str("".join(sample_2.columns.values)).encode("utf-8")+b"98541").hexdigest() == "346a17a92f8f6a86d964beccbbd42c958b10b50d", "correct string value of \"\".join(sample_2.columns.values) but incorrect case of letters"

assert sha1(str(type(sample_2_estimates.shape[0])).encode("utf-8")+b"98542").hexdigest() == "335cb9c93b9e61ac8da40b5e8d19c0d6f93a2cfb", "type of sample_2_estimates.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sample_2_estimates.shape[0]).encode("utf-8")+b"98542").hexdigest() == "ee364d158d0a70a56fcb1749c79cd0b20c9db37c", "value of sample_2_estimates.shape[0] is not correct"

assert sha1(str(type(sample_2_estimates.shape[1])).encode("utf-8")+b"98543").hexdigest() == "fc6f7277b17f224ac5be2a90ab8f8c048ca135cd", "type of sample_2_estimates.shape[1] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sample_2_estimates.shape[1]).encode("utf-8")+b"98543").hexdigest() == "9b560c15c4ec90a8e6b7d2ff5be1a516cb275914", "value of sample_2_estimates.shape[1] is not correct"

assert sha1(str(type(sample_2_estimates.columns.values)).encode("utf-8")+b"98544").hexdigest() == "f5c6c1069bc9ec741b1fda5bc30778823ddf78fb", "type of sample_2_estimates.columns.values is not correct"
assert sha1(str(sample_2_estimates.columns.values).encode("utf-8")+b"98544").hexdigest() == "59f43d8f0caab9271bbb8199bf3467062c907c45", "value of sample_2_estimates.columns.values is not correct"

assert sha1(str(type(sample_2_dist.mark)).encode("utf-8")+b"98545").hexdigest() == "a10ebf9d83879eb78b11d881043e3827dec3f849", "type of sample_2_dist.mark is not str. sample_2_dist.mark should be an str"
assert sha1(str(len(sample_2_dist.mark)).encode("utf-8")+b"98545").hexdigest() == "c47e59eb29ddcd7821802945b69f0e5adaaf3d0e", "length of sample_2_dist.mark is not correct"
assert sha1(str(sample_2_dist.mark.lower()).encode("utf-8")+b"98545").hexdigest() == "4b22fda09c0819c6c975d2be30b77b9e74b0faa0", "value of sample_2_dist.mark is not correct"
assert sha1(str(sample_2_dist.mark).encode("utf-8")+b"98545").hexdigest() == "4b22fda09c0819c6c975d2be30b77b9e74b0faa0", "correct string value of sample_2_dist.mark but incorrect case of letters"

assert sha1(str(type(sample_2_dist.encoding.x['shorthand'])).encode("utf-8")+b"98546").hexdigest() == "cfb469dba4d4b3128e059cb93927dc057ff2ecd0", "type of sample_2_dist.encoding.x['shorthand'] is not str. sample_2_dist.encoding.x['shorthand'] should be an str"
assert sha1(str(len(sample_2_dist.encoding.x['shorthand'])).encode("utf-8")+b"98546").hexdigest() == "96d1c5000fa9bb52684fa552b7ad69ce162f4e7b", "length of sample_2_dist.encoding.x['shorthand'] is not correct"
assert sha1(str(sample_2_dist.encoding.x['shorthand'].lower()).encode("utf-8")+b"98546").hexdigest() == "d772317faf043dbe0ddd9f184ee44d2d597fdef3", "value of sample_2_dist.encoding.x['shorthand'] is not correct"
assert sha1(str(sample_2_dist.encoding.x['shorthand']).encode("utf-8")+b"98546").hexdigest() == "d772317faf043dbe0ddd9f184ee44d2d597fdef3", "correct string value of sample_2_dist.encoding.x['shorthand'] but incorrect case of letters"

assert sha1(str(type(sample_2_dist.encoding.y['shorthand'])).encode("utf-8")+b"98547").hexdigest() == "833545da5558b1ccc5314515ad410ee5111be4b8", "type of sample_2_dist.encoding.y['shorthand'] is not str. sample_2_dist.encoding.y['shorthand'] should be an str"
assert sha1(str(len(sample_2_dist.encoding.y['shorthand'])).encode("utf-8")+b"98547").hexdigest() == "30d7640c57abdacf268f2c829ecb65df3ee91d6a", "length of sample_2_dist.encoding.y['shorthand'] is not correct"
assert sha1(str(sample_2_dist.encoding.y['shorthand'].lower()).encode("utf-8")+b"98547").hexdigest() == "bc8e9fbc40eaedb6c23b9ff4307055deb35d93b1", "value of sample_2_dist.encoding.y['shorthand'] is not correct"
assert sha1(str(sample_2_dist.encoding.y['shorthand']).encode("utf-8")+b"98547").hexdigest() == "bc8e9fbc40eaedb6c23b9ff4307055deb35d93b1", "correct string value of sample_2_dist.encoding.y['shorthand'] but incorrect case of letters"

assert sha1(str(type(isinstance(sample_2_dist.encoding.x['title'], str))).encode("utf-8")+b"98548").hexdigest() == "2611d02fcd60c513300f98eb8f3f06851a5e3542", "type of isinstance(sample_2_dist.encoding.x['title'], str) is not bool. isinstance(sample_2_dist.encoding.x['title'], str) should be a bool"
assert sha1(str(isinstance(sample_2_dist.encoding.x['title'], str)).encode("utf-8")+b"98548").hexdigest() == "7aaf457af4b716fb7738160819dce0d74ccca574", "boolean value of isinstance(sample_2_dist.encoding.x['title'], str) is not correct"

assert sha1(str(type(sample_2_dist.data.equals(sample_2))).encode("utf-8")+b"98549").hexdigest() == "b3527788f498084ff05d735c4a2ec3b049617755", "type of sample_2_dist.data.equals(sample_2) is not bool. sample_2_dist.data.equals(sample_2) should be a bool"
assert sha1(str(sample_2_dist.data.equals(sample_2)).encode("utf-8")+b"98549").hexdigest() == "ecc17d5b7a387f409b0687b1fe4680e3a4d7801d", "boolean value of sample_2_dist.data.equals(sample_2) is not correct"

print('Success!')

**Question 1.8.1** 
<br> {points: 1}

After comparing the distribution and point estimates of this second random sample with those of the first random sample and the population, which of the following statements **is not** correct?

A. The sample distributions from different random samples are of a similar shape to the population distribution, but they vary a bit depending on which values are captured in the sample.

B. The sample point estimates from different random samples are close to the values for the true population parameters we are trying to estimate, but they vary a bit depending on which values are captured in the sample.

C. Every random sample from the same population should have an identical set of values and yield identical point estimates.

*Assign your answer to an object called `answer1_8_1`. Your answer should be a single character surrounded by quotes.*

In [None]:
# your code here
raise NotImplementedError

In [None]:
from hashlib import sha1
assert sha1(str(type(answer1_8_1)).encode("utf-8")+b"55942").hexdigest() == "0a32a5651865684e1340a709cb9ffb4709638b5b", "type of answer1_8_1 is not str. answer1_8_1 should be an str"
assert sha1(str(len(answer1_8_1)).encode("utf-8")+b"55942").hexdigest() == "748539f0bebe8f8ebd69979110de4e50fcc7c2b7", "length of answer1_8_1 is not correct"
assert sha1(str(answer1_8_1.lower()).encode("utf-8")+b"55942").hexdigest() == "f86428e64fdfb6963da7f52769179c8a4830576c", "value of answer1_8_1 is not correct"
assert sha1(str(answer1_8_1).encode("utf-8")+b"55942").hexdigest() == "d9cc21b4c43b5d163ec453d881ae44e21b6c8ddf", "correct string value of answer1_8_1 but incorrect case of letters"

print('Success!')

### Exploring the sampling distribution of an estimate

Just how much should we expect the point estimates of our random samples to vary? To build an intuition for this, let's experiment a little more with our population of Canadian seniors. We will take 1000 random samples, and then calculate the point estimate we are interested in (let's choose the mean for this example) for each sample. Finally, we will visualize the distribution of the sample point estimates. This distribution tells us how much we should expect the point estimates to vary across random samples of size 40 (the size of our samples) drawn from this population.
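To make the procedure concrete before you try it on `can_seniors`, here is a minimal sketch on a made-up population. Everything named `toy_*` is hypothetical and is not part of the worksheet data; the population is simulated with NumPy, and the sample size and number of replicates are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in population (NOT the can_seniors data): 10,000 simulated ages
rng = np.random.default_rng(2024)
toy_population = pd.DataFrame({"age": rng.normal(loc=80, scale=7, size=10_000)})

# Draw 200 random samples of size 40 and label each one with a replicate number
toy_samples = pd.concat([
    toy_population.sample(40, random_state=rep).assign(replicate=rep)
    for rep in range(200)
])

# Calculate one point estimate (the mean) per replicate
toy_estimates = (
    toy_samples.groupby("replicate")["age"]
    .mean()
    .reset_index()
    .rename(columns={"age": "mean_age"})
)

print(toy_estimates.head())
print("Standard deviation of the sample means:", toy_estimates["mean_age"].std())
```

The resulting `toy_estimates` data frame has one row per sample, which is the shape you would pass to a histogram to visualize a sampling distribution.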

**Question 1.9** 
<br> {points: 1}

Draw 1000 random samples from our population of Canadian seniors (`can_seniors`). Each sample should have 40 observations. Use a list comprehension wrapped in `pd.concat` as in the textbook, and name the resulting data frame `samples`.

In [None]:
np.random.seed(4321) # DO NOT CHANGE

# ___ = pd.concat([
#     can_seniors.sample(___).assign(replicate=___)
#     for n in range(___)
# ])

# your code here
raise NotImplementedError
samples

In [None]:
from hashlib import sha1
assert sha1(str(type(samples.shape[0])).encode("utf-8")+b"a4e60").hexdigest() == "6a9ebc46a9865b53e9e5ad9e182cb4f3a95717d5", "type of samples.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(samples.shape[0]).encode("utf-8")+b"a4e60").hexdigest() == "f7081fd7d8068b01dcfd5a29339d931caaa6ed9b", "value of samples.shape[0] is not correct"

assert sha1(str(type(samples.shape[1])).encode("utf-8")+b"a4e61").hexdigest() == "30a3f05ee645d0e7b68d351417beecc2cdc59acd", "type of samples.shape[1] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(samples.shape[1]).encode("utf-8")+b"a4e61").hexdigest() == "42e901b0771952e6d4cdcd6c315777be81f31c35", "value of samples.shape[1] is not correct"

assert sha1(str(type("".join(samples.columns.values))).encode("utf-8")+b"a4e62").hexdigest() == "e1d9442d51bee68e1cc3512558ee84057138fda8", "type of \"\".join(samples.columns.values) is not str. \"\".join(samples.columns.values) should be an str"
assert sha1(str(len("".join(samples.columns.values))).encode("utf-8")+b"a4e62").hexdigest() == "b7880d272c77d8f28b99f04d721ea8c85cf8128c", "length of \"\".join(samples.columns.values) is not correct"
assert sha1(str("".join(samples.columns.values).lower()).encode("utf-8")+b"a4e62").hexdigest() == "c085d7ce73de25eae491805ffc1ff179163af7de", "value of \"\".join(samples.columns.values) is not correct"
assert sha1(str("".join(samples.columns.values)).encode("utf-8")+b"a4e62").hexdigest() == "c085d7ce73de25eae491805ffc1ff179163af7de", "correct string value of \"\".join(samples.columns.values) but incorrect case of letters"

print('Success!')

**Question 2.0** 
<br> {points: 1}

Group by the sample replicate number, and then for each sample, calculate the mean as the point estimate. Name the data frame `sample_estimates`. Use `reset_index` and `rename(columns=___)`, so that the final data frame has the column names `replicate` and `mean_age`.

In [None]:
# your code here
raise NotImplementedError
sample_estimates

In [None]:
from hashlib import sha1
assert sha1(str(type(sample_estimates.shape[0])).encode("utf-8")+b"c99e0").hexdigest() == "b1550079421ba81aabccc92916777d5b3f1ec347", "type of sample_estimates.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sample_estimates.shape[0]).encode("utf-8")+b"c99e0").hexdigest() == "2da4838800f21c0e4077af4dee8532704d0d9913", "value of sample_estimates.shape[0] is not correct"

assert sha1(str(type(sample_estimates.shape[1])).encode("utf-8")+b"c99e1").hexdigest() == "fec73d948b54d05ca4b00573996ed912666dd77a", "type of sample_estimates.shape[1] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sample_estimates.shape[1]).encode("utf-8")+b"c99e1").hexdigest() == "c87ecd6c47dea0b9ff782fc86ad0686c31de1d06", "value of sample_estimates.shape[1] is not correct"

assert sha1(str(type("".join(sample_estimates.columns.values))).encode("utf-8")+b"c99e2").hexdigest() == "1c9f9d76e3962f22539984fbbec492aaf30c4e17", "type of \"\".join(sample_estimates.columns.values) is not str. \"\".join(sample_estimates.columns.values) should be an str"
assert sha1(str(len("".join(sample_estimates.columns.values))).encode("utf-8")+b"c99e2").hexdigest() == "f29cb013bf1c1e70d88576bdf5a12efc3482f1a8", "length of \"\".join(sample_estimates.columns.values) is not correct"
assert sha1(str("".join(sample_estimates.columns.values).lower()).encode("utf-8")+b"c99e2").hexdigest() == "5c0215c561b6c44e9affb1a9efde4e89739d74d7", "value of \"\".join(sample_estimates.columns.values) is not correct"
assert sha1(str("".join(sample_estimates.columns.values)).encode("utf-8")+b"c99e2").hexdigest() == "5c0215c561b6c44e9affb1a9efde4e89739d74d7", "correct string value of \"\".join(sample_estimates.columns.values) but incorrect case of letters"

print('Success!')

**Question 2.1** 
<br> {points: 1}

Visualize the distribution of the sample estimates (`sample_estimates`) you just calculated by plotting a histogram using `maxbins=30`. Name the plot `sampling_distribution`, title the plot `"Sampling distribution of the sample means"` and give the x-axis a descriptive label.

In [None]:
# your code here
raise NotImplementedError
sampling_distribution

In [None]:
from hashlib import sha1
assert sha1(str(type(sampling_distribution.mark)).encode("utf-8")+b"8307b").hexdigest() == "975043e7c2a3e77d3048d57dbf6890ed3df4ea83", "type of sampling_distribution.mark is not str. sampling_distribution.mark should be an str"
assert sha1(str(len(sampling_distribution.mark)).encode("utf-8")+b"8307b").hexdigest() == "022fd38ae33951e649be80b4a47b682ab6f4ef91", "length of sampling_distribution.mark is not correct"
assert sha1(str(sampling_distribution.mark.lower()).encode("utf-8")+b"8307b").hexdigest() == "1b97e9205b0de050dd838f14dc7109d899e7636a", "value of sampling_distribution.mark is not correct"
assert sha1(str(sampling_distribution.mark).encode("utf-8")+b"8307b").hexdigest() == "1b97e9205b0de050dd838f14dc7109d899e7636a", "correct string value of sampling_distribution.mark but incorrect case of letters"

assert sha1(str(type(sampling_distribution.encoding.x['shorthand'])).encode("utf-8")+b"8307c").hexdigest() == "c00969d2ec534f2c9caab7cc6182b134d710140b", "type of sampling_distribution.encoding.x['shorthand'] is not str. sampling_distribution.encoding.x['shorthand'] should be an str"
assert sha1(str(len(sampling_distribution.encoding.x['shorthand'])).encode("utf-8")+b"8307c").hexdigest() == "1a1249ec5de81f1b0b09f3d59d8cbfffc874f342", "length of sampling_distribution.encoding.x['shorthand'] is not correct"
assert sha1(str(sampling_distribution.encoding.x['shorthand'].lower()).encode("utf-8")+b"8307c").hexdigest() == "acde2158bbb1530285e8c63525369d2f2dd8b343", "value of sampling_distribution.encoding.x['shorthand'] is not correct"
assert sha1(str(sampling_distribution.encoding.x['shorthand']).encode("utf-8")+b"8307c").hexdigest() == "acde2158bbb1530285e8c63525369d2f2dd8b343", "correct string value of sampling_distribution.encoding.x['shorthand'] but incorrect case of letters"

assert sha1(str(type(sampling_distribution.encoding.y['shorthand'])).encode("utf-8")+b"8307d").hexdigest() == "aff98e708235790227c179e84b6eb4c5b0d3282e", "type of sampling_distribution.encoding.y['shorthand'] is not str. sampling_distribution.encoding.y['shorthand'] should be an str"
assert sha1(str(len(sampling_distribution.encoding.y['shorthand'])).encode("utf-8")+b"8307d").hexdigest() == "c6751c85081472de1203a9c3ad004cc97399f8a9", "length of sampling_distribution.encoding.y['shorthand'] is not correct"
assert sha1(str(sampling_distribution.encoding.y['shorthand'].lower()).encode("utf-8")+b"8307d").hexdigest() == "bee4310dbd764c0d3feedc93c2eaf741d7d70956", "value of sampling_distribution.encoding.y['shorthand'] is not correct"
assert sha1(str(sampling_distribution.encoding.y['shorthand']).encode("utf-8")+b"8307d").hexdigest() == "bee4310dbd764c0d3feedc93c2eaf741d7d70956", "correct string value of sampling_distribution.encoding.y['shorthand'] but incorrect case of letters"

assert sha1(str(type(isinstance(sampling_distribution.encoding.x['title'], str))).encode("utf-8")+b"8307e").hexdigest() == "2d829ac6017b2d6788a8b3960088f1981c046608", "type of isinstance(sampling_distribution.encoding.x['title'], str) is not bool. isinstance(sampling_distribution.encoding.x['title'], str) should be a bool"
assert sha1(str(isinstance(sampling_distribution.encoding.x['title'], str)).encode("utf-8")+b"8307e").hexdigest() == "239517bc469c7808e94df2beeb029dc3ffe9f0e7", "boolean value of isinstance(sampling_distribution.encoding.x['title'], str) is not correct"

assert sha1(str(type(sampling_distribution.data.equals(sample_estimates))).encode("utf-8")+b"8307f").hexdigest() == "0650cbc423a0bb84a2718b383ef79eea6f4ca846", "type of sampling_distribution.data.equals(sample_estimates) is not bool. sampling_distribution.data.equals(sample_estimates) should be a bool"
assert sha1(str(sampling_distribution.data.equals(sample_estimates)).encode("utf-8")+b"8307f").hexdigest() == "81ae7650a12b3c4baceebbb53b25e8d64316bebd", "boolean value of sampling_distribution.data.equals(sample_estimates) is not correct"

print('Success!')

**Question 2.2** 
<br> {points: 1}

Let's refresh our memories: what is the mean age of the whole population (we calculated this above)? *Assign your answer to an object called `answer2_2`. Your answer should be a single number reported to two decimal places.*


In [None]:
# your code here
raise NotImplementedError
answer2_2

In [None]:
from hashlib import sha1
assert sha1(str(type(round(answer2_2, 1))).encode("utf-8")+b"36722").hexdigest() == "7447b95eabb83d6b33ff87ba34059acd7a38c479", "type of round(answer2_2, 1) is not float. Please make sure it is float and not np.float64, etc. You can cast your value into a float using float()"
assert sha1(str(round(round(answer2_2, 1), 2)).encode("utf-8")+b"36722").hexdigest() == "ae22674522c9d76c0c23c3e7a3e66cc42168d64d", "value of round(answer2_2, 1) is not correct (rounded to 2 decimal places)"

print('Success!')

**Question 2.3** Multiple Choice
<br> {points: 1}

Considering the true value of the population mean and the sampling distribution you created and visualized in **question 2.1**, which of the following statements **is not** correct?

A. The sampling distribution is centered at the true population mean.

B. All the sample means are the same value as the true population mean.

C. Most sample means are at or very near the true population mean.

D. A few sample means are far away from the true population mean.

*Assign your answer to an object called `answer2_3`. Your answer should be a single character surrounded by quotes.*

In [None]:
# your code here
raise NotImplementedError
answer2_3

In [None]:
from hashlib import sha1
assert sha1(str(type(answer2_3)).encode("utf-8")+b"3bf83").hexdigest() == "fcfe7d9f912b4bc9db38d1b7664c1c3154dcbb48", "type of answer2_3 is not str. answer2_3 should be an str"
assert sha1(str(len(answer2_3)).encode("utf-8")+b"3bf83").hexdigest() == "e352b2688135ddafcd9240bac4495b865413845e", "length of answer2_3 is not correct"
assert sha1(str(answer2_3.lower()).encode("utf-8")+b"3bf83").hexdigest() == "0065989114e3a5499d4d4cf3ec55691e2fd614a2", "value of answer2_3 is not correct"
assert sha1(str(answer2_3).encode("utf-8")+b"3bf83").hexdigest() == "5c7ddb6fc3c5c90f3facc86a2720ec0785a9bedd", "correct string value of answer2_3 but incorrect case of letters"

print('Success!')

**Question 2.4** True/False
<br> {points: 1}

Taking a random sample and calculating a point estimate is a good way to get a "best guess" of the population parameter you are interested in. True or False?

*Assign your answer to an object called `answer2_4`. Your answer should be a boolean, i.e., `True` or `False`.*

In [None]:
# your code here
raise NotImplementedError
answer2_4

In [None]:
from hashlib import sha1
assert sha1(str(type(answer2_4)).encode("utf-8")+b"b97b3").hexdigest() == "4dff6c58b215c772eb37ce825a7166961795261f", "type of answer2_4 is not bool. answer2_4 should be a bool"
assert sha1(str(answer2_4).encode("utf-8")+b"b97b3").hexdigest() == "825ff218be45994ce37ba446e5a07380f70fee07", "boolean value of answer2_4 is not correct"

print('Success!')

### The influence of sample size on the sampling distribution

What happens to our point estimate when we change the sample size? Let's answer this question by experimenting! We will create 3 different sampling distributions of sample means, each using a different sample size. As we did above, we will draw samples from our Canadian seniors population. We will visualize these sampling distributions and see if we can see a pattern when we vary the sample size.
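Alongside the histograms, you may find it useful to summarize each sampling distribution with a single number. The sketch below is one possible way to do that, again on a simulated population: the helper `sample_means`, the `toy_population` data frame, and the choice of 500 replicates are all assumptions made for illustration and do not come from the worksheet.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in population (NOT the can_seniors data)
rng = np.random.default_rng(7)
toy_population = pd.DataFrame({"age": rng.normal(loc=80, scale=7, size=10_000)})


def sample_means(population, n, reps=500):
    """Return a data frame with the mean age of `reps` random samples of size `n`."""
    samples = pd.concat([
        population.sample(n, random_state=rep).assign(replicate=rep)
        for rep in range(reps)
    ])
    return (
        samples.groupby("replicate")["age"]
        .mean()
        .reset_index()
        .rename(columns={"age": "mean_age"})
    )


# Standard deviation of the sample means for each sample size
spread_by_n = {
    n: sample_means(toy_population, n)["mean_age"].std()
    for n in (20, 40, 100)
}

for n, spread in spread_by_n.items():
    print(f"n = {n:>3}: standard deviation of sample means = {spread:.3f}")
```

Comparing the printed values across sample sizes gives a numeric complement to the visual comparison you will make with the histograms.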

**Question 2.5** 
<br> {points: 1}

Using the same strategy as you did above, draw 1000 random samples from the Canadian seniors population (`can_seniors`), each of size **20**. For each sample, calculate the mean age and assign this data frame to an object called `sample_estimates_20`. As before, make sure you use `reset_index` and `rename` so that the data frame has the columns `replicate` and `mean_age`.

Then, visualize the distribution of the sample estimates (means) you just calculated by plotting a histogram using `maxbins=30`. Name the plot variable `sampling_distribution_20` and give the x-axis a descriptive label. Give the plot the title `"n = 20"`.

In [None]:
np.random.seed(4321)  # DO NOT CHANGE

# your code here
raise NotImplementedError
sampling_distribution_20

In [None]:
from hashlib import sha1
assert sha1(str(type(sampling_distribution_20.mark)).encode("utf-8")+b"c028e").hexdigest() == "88b6a8dc819f9ada8c80b3aa46964e96f75b78ea", "type of sampling_distribution_20.mark is not str. sampling_distribution_20.mark should be an str"
assert sha1(str(len(sampling_distribution_20.mark)).encode("utf-8")+b"c028e").hexdigest() == "f6e862c0901a3e57f90bfec85e62cc83964c8683", "length of sampling_distribution_20.mark is not correct"
assert sha1(str(sampling_distribution_20.mark.lower()).encode("utf-8")+b"c028e").hexdigest() == "e0b150bd4efa4667259a17ad692574fd72ab90b2", "value of sampling_distribution_20.mark is not correct"
assert sha1(str(sampling_distribution_20.mark).encode("utf-8")+b"c028e").hexdigest() == "e0b150bd4efa4667259a17ad692574fd72ab90b2", "correct string value of sampling_distribution_20.mark but incorrect case of letters"

assert sha1(str(type(sampling_distribution_20.encoding.x['shorthand'])).encode("utf-8")+b"c028f").hexdigest() == "04416bba23dc709133fa45a5e60f8ace799f4d58", "type of sampling_distribution_20.encoding.x['shorthand'] is not str. sampling_distribution_20.encoding.x['shorthand'] should be an str"
assert sha1(str(len(sampling_distribution_20.encoding.x['shorthand'])).encode("utf-8")+b"c028f").hexdigest() == "4030ed8c49da8b2dace89f2c3f94415c3996b26b", "length of sampling_distribution_20.encoding.x['shorthand'] is not correct"
assert sha1(str(sampling_distribution_20.encoding.x['shorthand'].lower()).encode("utf-8")+b"c028f").hexdigest() == "3ff3be4dba46dbb644ef53c2a0b3926b10abb526", "value of sampling_distribution_20.encoding.x['shorthand'] is not correct"
assert sha1(str(sampling_distribution_20.encoding.x['shorthand']).encode("utf-8")+b"c028f").hexdigest() == "3ff3be4dba46dbb644ef53c2a0b3926b10abb526", "correct string value of sampling_distribution_20.encoding.x['shorthand'] but incorrect case of letters"

assert sha1(str(type(sampling_distribution_20.encoding.y['shorthand'])).encode("utf-8")+b"c0290").hexdigest() == "25170c3f1f5a124a932dc5b23bd20268710cd75c", "type of sampling_distribution_20.encoding.y['shorthand'] is not str. sampling_distribution_20.encoding.y['shorthand'] should be an str"
assert sha1(str(len(sampling_distribution_20.encoding.y['shorthand'])).encode("utf-8")+b"c0290").hexdigest() == "9243352947241c7d45a41ca4735956b17b688f47", "length of sampling_distribution_20.encoding.y['shorthand'] is not correct"
assert sha1(str(sampling_distribution_20.encoding.y['shorthand'].lower()).encode("utf-8")+b"c0290").hexdigest() == "b58c113f69f88b53ad2ea9df53e2eadd42d7099f", "value of sampling_distribution_20.encoding.y['shorthand'] is not correct"
assert sha1(str(sampling_distribution_20.encoding.y['shorthand']).encode("utf-8")+b"c0290").hexdigest() == "b58c113f69f88b53ad2ea9df53e2eadd42d7099f", "correct string value of sampling_distribution_20.encoding.y['shorthand'] but incorrect case of letters"

assert sha1(str(type(isinstance(sampling_distribution_20.encoding.x['title'], str))).encode("utf-8")+b"c0291").hexdigest() == "d65db7b7d6174605157b57ad9de65b0bdaacf9f0", "type of isinstance(sampling_distribution_20.encoding.x['title'], str) is not bool. isinstance(sampling_distribution_20.encoding.x['title'], str) should be a bool"
assert sha1(str(isinstance(sampling_distribution_20.encoding.x['title'], str)).encode("utf-8")+b"c0291").hexdigest() == "2afdf3077eccffba194b5f36acba924a5d82ab7b", "boolean value of isinstance(sampling_distribution_20.encoding.x['title'], str) is not correct"

assert sha1(str(type(sampling_distribution_20.data.equals(sample_estimates_20))).encode("utf-8")+b"c0292").hexdigest() == "064c8d068f0dcb6369c77cadefc587bdac866a94", "type of sampling_distribution_20.data.equals(sample_estimates_20) is not bool. sampling_distribution_20.data.equals(sample_estimates_20) should be a bool"
assert sha1(str(sampling_distribution_20.data.equals(sample_estimates_20)).encode("utf-8")+b"c0292").hexdigest() == "c34d993cc9d24cdbb373813efc2dff6539437050", "boolean value of sampling_distribution_20.data.equals(sample_estimates_20) is not correct"

print('Success!')

**Question 2.6** 
<br> {points: 1}

Using the same strategy as you did above, draw 1000 random samples from the Canadian seniors population (`can_seniors`), each of size **100**. For each sample, calculate the mean age and assign this data frame to an object called `sample_estimates_100`. As before, make sure you use `reset_index` and `rename` so that the data frame has the columns `replicate` and `mean_age`.

Then, visualize the distribution of the sample estimates (means) you just calculated by plotting a histogram using `maxbins=30`. Name the plot variable `sampling_distribution_100` and give the x-axis a descriptive label. Give the plot the title `"n = 100"`.

In [None]:
np.random.seed(4321)  # DO NOT CHANGE

# your code here
raise NotImplementedError
sampling_distribution_100

In [None]:
from hashlib import sha1
assert sha1(str(type(sampling_distribution_100.mark)).encode("utf-8")+b"8766c").hexdigest() == "51fe5d19d7ce659727ea0af031066ec944aa40ac", "type of sampling_distribution_100.mark is not str. sampling_distribution_100.mark should be an str"
assert sha1(str(len(sampling_distribution_100.mark)).encode("utf-8")+b"8766c").hexdigest() == "dfed1589a1cc3bbe1d8d956bff8981511b169410", "length of sampling_distribution_100.mark is not correct"
assert sha1(str(sampling_distribution_100.mark.lower()).encode("utf-8")+b"8766c").hexdigest() == "438f9fc17883eebae875137839fa2a4ac1b997f4", "value of sampling_distribution_100.mark is not correct"
assert sha1(str(sampling_distribution_100.mark).encode("utf-8")+b"8766c").hexdigest() == "438f9fc17883eebae875137839fa2a4ac1b997f4", "correct string value of sampling_distribution_100.mark but incorrect case of letters"

assert sha1(str(type(sampling_distribution_100.encoding.x['shorthand'])).encode("utf-8")+b"8766d").hexdigest() == "6502e3a871c1bb307fe6c81226241b3c324ad273", "type of sampling_distribution_100.encoding.x['shorthand'] is not str. sampling_distribution_100.encoding.x['shorthand'] should be an str"
assert sha1(str(len(sampling_distribution_100.encoding.x['shorthand'])).encode("utf-8")+b"8766d").hexdigest() == "6e9fc8f27ec8475d57582193e1acb41f3ec32250", "length of sampling_distribution_100.encoding.x['shorthand'] is not correct"
assert sha1(str(sampling_distribution_100.encoding.x['shorthand'].lower()).encode("utf-8")+b"8766d").hexdigest() == "5e7a0427ecf5127f21906a34fa2bc4b0c7612e7d", "value of sampling_distribution_100.encoding.x['shorthand'] is not correct"
assert sha1(str(sampling_distribution_100.encoding.x['shorthand']).encode("utf-8")+b"8766d").hexdigest() == "5e7a0427ecf5127f21906a34fa2bc4b0c7612e7d", "correct string value of sampling_distribution_100.encoding.x['shorthand'] but incorrect case of letters"

assert sha1(str(type(sampling_distribution_100.encoding.y['shorthand'])).encode("utf-8")+b"8766e").hexdigest() == "82b845a887a7cdd80377f2f771e62549988c9c8c", "type of sampling_distribution_100.encoding.y['shorthand'] is not str. sampling_distribution_100.encoding.y['shorthand'] should be an str"
assert sha1(str(len(sampling_distribution_100.encoding.y['shorthand'])).encode("utf-8")+b"8766e").hexdigest() == "bd21ea7481ab63adc12a3c28882e36a0bcafdcb1", "length of sampling_distribution_100.encoding.y['shorthand'] is not correct"
assert sha1(str(sampling_distribution_100.encoding.y['shorthand'].lower()).encode("utf-8")+b"8766e").hexdigest() == "27639958f2cb6283cdea73a0098ba79b2d4c448b", "value of sampling_distribution_100.encoding.y['shorthand'] is not correct"
assert sha1(str(sampling_distribution_100.encoding.y['shorthand']).encode("utf-8")+b"8766e").hexdigest() == "27639958f2cb6283cdea73a0098ba79b2d4c448b", "correct string value of sampling_distribution_100.encoding.y['shorthand'] but incorrect case of letters"

assert sha1(str(type(isinstance(sampling_distribution_100.encoding.x['title'], str))).encode("utf-8")+b"8766f").hexdigest() == "7cddea9db0e3838627043213c3552ace51a838e7", "type of isinstance(sampling_distribution_100.encoding.x['title'], str) is not bool. isinstance(sampling_distribution_100.encoding.x['title'], str) should be a bool"
assert sha1(str(isinstance(sampling_distribution_100.encoding.x['title'], str)).encode("utf-8")+b"8766f").hexdigest() == "3190073ef24598b19f3b14604e71d23ed7116201", "boolean value of isinstance(sampling_distribution_100.encoding.x['title'], str) is not correct"

assert sha1(str(type(sampling_distribution_100.data.equals(sample_estimates_100))).encode("utf-8")+b"87670").hexdigest() == "7754d494a096f89691dd2d6eae1cbebfa09e3fd9", "type of sampling_distribution_100.data.equals(sample_estimates_100) is not bool. sampling_distribution_100.data.equals(sample_estimates_100) should be a bool"
assert sha1(str(sampling_distribution_100.data.equals(sample_estimates_100)).encode("utf-8")+b"87670").hexdigest() == "d23424811ac1e55f3ecaffc53746651f0e22f568", "boolean value of sampling_distribution_100.data.equals(sample_estimates_100) is not correct"

print('Success!')

**Question 2.7** 
<br> {points: 1}

Next, let's compare the three sampling distributions together. To do this more effectively, we need to change the histograms' x-axes to span the same range. We have previously seen how to change the axis range using `domain`, but since we also want the histogram bins to be recalculated with the new axis range, we need to use a slightly different method here where we change the `extent` over which the bins are created. We have already done this in the cell below, where we also change the title of our initial chart to indicate the sample size we used there.

Your task is to fill in the names of the variables for the charts that we are concatenating vertically. Start with the one with the smallest `n` on top, and put the one with the largest `n` on the bottom. Name the final panel figure `sampling_distribution_panel`.
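To see why a shared bin extent matters, here is a small NumPy analogue. This is not Altair code and is not needed for your answer; it only illustrates the idea with `np.histogram`, whose `range` argument plays a role similar to `extent`. The two arrays are simulated stand-ins for sample means, and the bin count of 22 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "sample means" with different spreads (hypothetical data)
narrow = rng.normal(loc=80, scale=0.7, size=1000)
wide = rng.normal(loc=80, scale=1.6, size=1000)

# Without a shared extent, each histogram derives its bin edges from its own data range
_, edges_narrow_auto = np.histogram(narrow, bins=22)
_, edges_wide_auto = np.histogram(wide, bins=22)

# With a shared extent, both histograms are binned over identical edges
counts_narrow, edges_narrow = np.histogram(narrow, bins=22, range=(70, 92))
counts_wide, edges_wide = np.histogram(wide, bins=22, range=(70, 92))

print("Automatic bin edges match:", np.array_equal(edges_narrow_auto, edges_wide_auto))
print("Shared-extent bin edges match:", np.array_equal(edges_narrow, edges_wide))
```

When the bin edges are identical, bar heights in different panels refer to the same intervals, so the panels can be compared bar by bar.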

In [None]:
# Change the range of the x-axis to be the same for all charts
sampling_distribution.title = "n = 40"
sampling_distribution.encoding.x['bin']['extent'] = (70, 92)
sampling_distribution_20.encoding.x['bin']['extent'] = (70, 92)
sampling_distribution_100.encoding.x['bin']['extent'] = (70, 92)

# ___ = (
#     ___.properties(height=150)
#     & ___.properties(height=150)
#     & ___.properties(height=150)
# ).resolve_scale(
#     y='shared'  # Set the same y-axis range for all charts
# )

# your code here
raise NotImplementedError
sampling_distribution_panel

In [None]:
from hashlib import sha1
assert sha1(str(type(sampling_distribution_panel)).encode("utf-8")+b"41b2d").hexdigest() == "71bb53fcb653acc7ab0eeaee6e3908fcae25017b", "type of type(sampling_distribution_panel) is not correct"

assert sha1(str(type(sampling_distribution_panel.vconcat[0]['title'])).encode("utf-8")+b"41b2e").hexdigest() == "7e03a0f48273f294b993c8044d608beec2583d77", "type of sampling_distribution_panel.vconcat[0]['title'] is not str. sampling_distribution_panel.vconcat[0]['title'] should be an str"
assert sha1(str(len(sampling_distribution_panel.vconcat[0]['title'])).encode("utf-8")+b"41b2e").hexdigest() == "fb8e4906fbae10e890618a9b71725a83b0046c36", "length of sampling_distribution_panel.vconcat[0]['title'] is not correct"
assert sha1(str(sampling_distribution_panel.vconcat[0]['title'].lower()).encode("utf-8")+b"41b2e").hexdigest() == "7bd90ec2ef75db682d028a0df8f2169cdb1bcf15", "value of sampling_distribution_panel.vconcat[0]['title'] is not correct"
assert sha1(str(sampling_distribution_panel.vconcat[0]['title']).encode("utf-8")+b"41b2e").hexdigest() == "7bd90ec2ef75db682d028a0df8f2169cdb1bcf15", "correct string value of sampling_distribution_panel.vconcat[0]['title'] but incorrect case of letters"

assert sha1(str(type(sampling_distribution_panel.vconcat[1]['title'])).encode("utf-8")+b"41b2f").hexdigest() == "d74047dd71009392392c1655271b9f0302102ad2", "type of sampling_distribution_panel.vconcat[1]['title'] is not str. sampling_distribution_panel.vconcat[1]['title'] should be an str"
assert sha1(str(len(sampling_distribution_panel.vconcat[1]['title'])).encode("utf-8")+b"41b2f").hexdigest() == "154b8d69d71f59eb9b8224510946cbee7c81b9ff", "length of sampling_distribution_panel.vconcat[1]['title'] is not correct"
assert sha1(str(sampling_distribution_panel.vconcat[1]['title'].lower()).encode("utf-8")+b"41b2f").hexdigest() == "501b1ad93e16420a9a3ecd9b9ffe7539938e3c1e", "value of sampling_distribution_panel.vconcat[1]['title'] is not correct"
assert sha1(str(sampling_distribution_panel.vconcat[1]['title']).encode("utf-8")+b"41b2f").hexdigest() == "501b1ad93e16420a9a3ecd9b9ffe7539938e3c1e", "correct string value of sampling_distribution_panel.vconcat[1]['title'] but incorrect case of letters"

assert sha1(str(type(sampling_distribution_panel.vconcat[2]['title'])).encode("utf-8")+b"41b30").hexdigest() == "a6ae099dff9c30359d8ff92abdf5187f2e304b40", "type of sampling_distribution_panel.vconcat[2]['title'] is not str. sampling_distribution_panel.vconcat[2]['title'] should be an str"
assert sha1(str(len(sampling_distribution_panel.vconcat[2]['title'])).encode("utf-8")+b"41b30").hexdigest() == "0dde8df98596b0f7553946f0fabbf336c0d4b74d", "length of sampling_distribution_panel.vconcat[2]['title'] is not correct"
assert sha1(str(sampling_distribution_panel.vconcat[2]['title'].lower()).encode("utf-8")+b"41b30").hexdigest() == "033e4a9ce6acbf9ce8f99b2299bcce3eea495a1b", "value of sampling_distribution_panel.vconcat[2]['title'] is not correct"
assert sha1(str(sampling_distribution_panel.vconcat[2]['title']).encode("utf-8")+b"41b30").hexdigest() == "033e4a9ce6acbf9ce8f99b2299bcce3eea495a1b", "correct string value of sampling_distribution_panel.vconcat[2]['title'] but incorrect case of letters"

assert sha1(str(type(len(sampling_distribution_panel.vconcat))).encode("utf-8")+b"41b31").hexdigest() == "124a7af552fb62a649576f786d9b45aed34edf4c", "type of len(sampling_distribution_panel.vconcat) is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(len(sampling_distribution_panel.vconcat)).encode("utf-8")+b"41b31").hexdigest() == "d63aff658c0ec0029caa540b13e3c92a74a6df98", "value of len(sampling_distribution_panel.vconcat) is not correct"

print('Success!')

**Question 2.8** Multiple Choice
<br> {points: 1}

Considering the panel figure you created above in **question 2.7**, which of the following statements **is not** correct?

A. As the sample size increases, the sampling distribution of the point estimate becomes narrower.

B. As the sample size increases, more sample point estimates are closer to the true population mean.

C. As the sample size decreases, the sample point estimates become more variable.

D. As the sample size increases, the sample point estimates become more variable.

*Assign your answer to an object called `answer2_8`. Your answer should be a single character surrounded by quotes.*

In [None]:
# your code here
raise NotImplementedError
answer2_8

In [None]:
from hashlib import sha1
assert sha1(str(type(answer2_8)).encode("utf-8")+b"4d87").hexdigest() == "609204ef6413c931f65ec9bf71d605b79b694e2a", "type of answer2_8 is not str. answer2_8 should be an str"
assert sha1(str(len(answer2_8)).encode("utf-8")+b"4d87").hexdigest() == "ee52adb4d4af3f16ffed5cb71a4dca92e3e1a9d8", "length of answer2_8 is not correct"
assert sha1(str(answer2_8.lower()).encode("utf-8")+b"4d87").hexdigest() == "dab045ddec0066d70c471e7edea0397325e7cbd8", "value of answer2_8 is not correct"
assert sha1(str(answer2_8).encode("utf-8")+b"4d87").hexdigest() == "6b7b68dcaa09859f7104ecf7cfaff3171e67f447", "correct string value of answer2_8 but incorrect case of letters"

print('Success!')

**Question 2.9** True/False
<br> {points: 1}

Given what you observed above, and considering the real life scenario where you will only have one sample, answer the True/False question below:

The smaller your random sample, the better your sample point estimate reflects the true population parameter you are trying to estimate. True or False?

*Assign your answer to an object called `answer2_9`. Your answer should be a boolean, i.e., `True` or `False`.*

In [None]:
# your code here
raise NotImplementedError
answer2_9

In [None]:
from hashlib import sha1
assert sha1(str(type(answer2_9)).encode("utf-8")+b"19cb").hexdigest() == "93bfb5501b88f939f7861bd364faeee342d794d2", "type of answer2_9 is not bool. answer2_9 should be a bool"
assert sha1(str(answer2_9).encode("utf-8")+b"19cb").hexdigest() == "e4c96356336098472378b9841c394d8de1cae8cf", "boolean value of answer2_9 is not correct"

print('Success!')