# Tutorial 11 - Introduction to Statistical Inference

### Lecture and Tutorial Learning Goals:

After completing this week's lecture and tutorial work, you will be able to:
- Describe real-world examples of questions that can be answered with statistical inference methods.
- Name common population parameters (e.g., mean, proportion, median, variance, standard deviation) that are often estimated using sample data, and use computation to estimate these.
- Define the following statistical sampling terms (population, sample, population parameter, point estimate, sampling distribution).
- Explain the difference between a population parameter and sample point estimate.
- Use computation to draw random samples from a finite population.
- Use computation to create a sampling distribution from a finite population.
- Describe how sample size influences the sampling distribution.

This worksheet covers parts of [Chapter 10](https://python.datasciencebook.ca/inference) of the online textbook. You should read this chapter before attempting this assignment. Any place you see `___`, you must fill in the function, variable, or data to complete the code. Replace `raise NotImplementedError` with your completed code and answers, then run the cell.

In [None]:
### Run this cell before continuing.
import altair as alt
import numpy as np
import pandas as pd

# Simplify working with large datasets in Altair
alt.data_transformers.enable('vegafusion')

### Virtual sampling simulation

In this tutorial you will study samples and sample means generated from different distributions. In real life, we rarely, if ever, have measurements for our entire population. Here, however, we will make simulated datasets so we can understand the behaviour of sample means.
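
To make the distinction between a population parameter and a point estimate concrete, here is a minimal sketch on a small made-up population. The names `toy_pop`, `value`, and `toy_sample`, the seed, and the distribution settings are illustrative assumptions and are not part of the worksheet's graded code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy population (not the worksheet's data)
rng = np.random.default_rng(1)
toy_pop = pd.DataFrame({"value": rng.normal(loc=50, scale=10, size=1_000)})

# Population parameter: computed from every member of the population
pop_mean = toy_pop["value"].mean()

# Point estimate: computed from one random sample of 20 observations
toy_sample = toy_pop.sample(n=20, random_state=1)
sample_mean = toy_sample["value"].mean()

print(f"population mean (parameter): {pop_mean:.2f}")
print(f"sample mean (point estimate): {sample_mean:.2f}")
```

Rerunning the sampling step with a different `random_state` would give a different point estimate, while the population parameter stays fixed.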

Suppose we had the data science final grades for a large population of students. 

In [None]:
# run this cell to simulate a finite population
np.random.seed(20201)  # DO NOT CHANGE
students_pop = pd.DataFrame({
    "grade": np.random.normal(size=10_000, loc=70, scale=8)
})
students_pop

**Question 1.0** 
<br> {points: 1}

Visualize the distribution of the population (`students_pop`) that was just created by plotting a histogram with `maxbins=30`. Name the plot `pop_dist` and give the x-axis a descriptive label.

In [None]:
# ___ = alt.Chart(___, title="Population distribution").___().encode(
#     x=alt.X(___)
#         .title(___)
#         .___(___=30),
#     y=___
# )

# your code here
raise NotImplementedError
pop_dist

In [None]:
from hashlib import sha1
assert sha1(str(type(pop_dist.encoding.x['shorthand'])).encode("utf-8")+b"3a54a").hexdigest() == "04c864fa9a8ae5f85af366d4480a88beedd1f270", "type of pop_dist.encoding.x['shorthand'] is not str. pop_dist.encoding.x['shorthand'] should be an str"
assert sha1(str(len(pop_dist.encoding.x['shorthand'])).encode("utf-8")+b"3a54a").hexdigest() == "b5dff484dca8e9dc3d69a63991c5b38c371835b5", "length of pop_dist.encoding.x['shorthand'] is not correct"
assert sha1(str(pop_dist.encoding.x['shorthand'].lower()).encode("utf-8")+b"3a54a").hexdigest() == "422891c4fa64af2778038599e852389e519ff498", "value of pop_dist.encoding.x['shorthand'] is not correct"
assert sha1(str(pop_dist.encoding.x['shorthand']).encode("utf-8")+b"3a54a").hexdigest() == "422891c4fa64af2778038599e852389e519ff498", "correct string value of pop_dist.encoding.x['shorthand'] but incorrect case of letters"

assert sha1(str(type(pop_dist.encoding.y['shorthand'])).encode("utf-8")+b"3a54b").hexdigest() == "eb7bcabc5de1e4f1e57917a44913065c34dc9c91", "type of pop_dist.encoding.y['shorthand'] is not str. pop_dist.encoding.y['shorthand'] should be an str"
assert sha1(str(len(pop_dist.encoding.y['shorthand'])).encode("utf-8")+b"3a54b").hexdigest() == "bc4cec541660aa2d4afc54cba18e9337b98ee0a9", "length of pop_dist.encoding.y['shorthand'] is not correct"
assert sha1(str(pop_dist.encoding.y['shorthand'].lower()).encode("utf-8")+b"3a54b").hexdigest() == "dc56262ceefefce213ca86da3de4f86c48ec1dc7", "value of pop_dist.encoding.y['shorthand'] is not correct"
assert sha1(str(pop_dist.encoding.y['shorthand']).encode("utf-8")+b"3a54b").hexdigest() == "dc56262ceefefce213ca86da3de4f86c48ec1dc7", "correct string value of pop_dist.encoding.y['shorthand'] but incorrect case of letters"

assert sha1(str(type(pop_dist.mark)).encode("utf-8")+b"3a54c").hexdigest() == "245fb3e2bc9a0c31291fd896149268b80f262d68", "type of pop_dist.mark is not str. pop_dist.mark should be an str"
assert sha1(str(len(pop_dist.mark)).encode("utf-8")+b"3a54c").hexdigest() == "95abc548f5dd44fdb60eaa3f7d82ba26d0d1d082", "length of pop_dist.mark is not correct"
assert sha1(str(pop_dist.mark.lower()).encode("utf-8")+b"3a54c").hexdigest() == "f7f0a095fd536a2a37b290dc5988aa04cbe03c55", "value of pop_dist.mark is not correct"
assert sha1(str(pop_dist.mark).encode("utf-8")+b"3a54c").hexdigest() == "f7f0a095fd536a2a37b290dc5988aa04cbe03c55", "correct string value of pop_dist.mark but incorrect case of letters"

assert sha1(str(type(pop_dist.data.shape[0])).encode("utf-8")+b"3a54d").hexdigest() == "7d467bfae02a79bffed9a56010adf20c3cd502c5", "type of pop_dist.data.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(pop_dist.data.shape[0]).encode("utf-8")+b"3a54d").hexdigest() == "0d9f9829362a7975701b3390ff31a10e132bf74c", "value of pop_dist.data.shape[0] is not correct"

assert sha1(str(type(round(sum(pop_dist.data.grade), 2))).encode("utf-8")+b"3a54e").hexdigest() == "71788ab15ef26bead52b7ef16e808da660f7d862", "type of round(sum(pop_dist.data.grade), 2) is not float. Please make sure it is float and not np.float64, etc. You can cast your value into a float using float()"
assert sha1(str(round(round(sum(pop_dist.data.grade), 2), 2)).encode("utf-8")+b"3a54e").hexdigest() == "8026d360b9459e8812540a43e1565cd4bdc66fd6", "value of round(sum(pop_dist.data.grade), 2) is not correct (rounded to 2 decimal places)"

print('Success!')

**Question 1.1** 
<br> {points: 3}

Describe the distribution above in words. Comment on its shape, center, and spread.

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

**Question 1.2** 
<br> {points: 1}

Use `agg` to calculate the following population parameters from the `students_pop` population:

- mean 
- median 
- standard deviation 

Name this data frame `pop_parameters`.
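
As a reminder of how `agg` behaves, here is a minimal sketch on a made-up data frame. The data frame `toy` and its `height` column are hypothetical and unrelated to `students_pop`; the output shape shown here is only what this particular call produces.

```python
import pandas as pd

# Hypothetical toy data frame (not the worksheet's population)
toy = pd.DataFrame({"height": [150.0, 160.0, 165.0, 170.0, 180.0]})

# Passing a list of function names returns one row per summary statistic
toy_summary = toy.agg(["mean", "median", "std"])
print(toy_summary)
```

Note that the pandas `std` method divides by n - 1 by default (the sample formula). For a large population the difference from dividing by n is very small.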

In [None]:
# your code here
raise NotImplementedError
pop_parameters

In [None]:
from hashlib import sha1
assert sha1(str(type(pop_parameters.shape[0])).encode("utf-8")+b"a2397").hexdigest() == "c36d0cddb5c7771a665d999c7fdc21549781978b", "type of pop_parameters.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(pop_parameters.shape[0]).encode("utf-8")+b"a2397").hexdigest() == "fed24554b732b9f344b265419996164a549ecbd2", "value of pop_parameters.shape[0] is not correct"

assert sha1(str(type(pop_parameters.shape[1])).encode("utf-8")+b"a2398").hexdigest() == "7d830e0022e2545c05c3d62de4ebd495a56a6ead", "type of pop_parameters.shape[1] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(pop_parameters.shape[1]).encode("utf-8")+b"a2398").hexdigest() == "341991977cc80ada38c41b9b214f2be10bb71e09", "value of pop_parameters.shape[1] is not correct"

assert sha1(str(type("".join(pop_parameters.columns.values))).encode("utf-8")+b"a2399").hexdigest() == "2735f0461ff049de0c2212e55c78629337bdddb0", "type of \"\".join(pop_parameters.columns.values) is not str. \"\".join(pop_parameters.columns.values) should be an str"
assert sha1(str(len("".join(pop_parameters.columns.values))).encode("utf-8")+b"a2399").hexdigest() == "fae476edc83dd096ba23dcafa792e04915e65517", "length of \"\".join(pop_parameters.columns.values) is not correct"
assert sha1(str("".join(pop_parameters.columns.values).lower()).encode("utf-8")+b"a2399").hexdigest() == "a6f63acb5f7302e91a7cc847728ef355c52a34e0", "value of \"\".join(pop_parameters.columns.values) is not correct"
assert sha1(str("".join(pop_parameters.columns.values)).encode("utf-8")+b"a2399").hexdigest() == "a6f63acb5f7302e91a7cc847728ef355c52a34e0", "correct string value of \"\".join(pop_parameters.columns.values) but incorrect case of letters"

print('Success!')

### Exploring the sampling distribution of the sample mean for different populations
We will approximate the sampling distribution of the sample mean by taking 1500 random samples of size 5 from this population and then visualizing the distribution of the sample means.
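
The idea of repeated sampling can be sketched with NumPy alone on a made-up population. This is a minimal illustration, not the data frame workflow that the questions below ask for; `toy_pop`, the seed, and the choice of 1000 samples of size 10 are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical finite population of 5,000 values
toy_pop = rng.uniform(low=0, high=100, size=5_000)

n_samples, sample_size = 1_000, 10

# Each row is one random sample drawn without replacement from the population
toy_samples = np.array([
    rng.choice(toy_pop, size=sample_size, replace=False)
    for _ in range(n_samples)
])

# One sample mean per row; together they approximate the sampling distribution
sample_means = toy_samples.mean(axis=1)

print("population mean:     ", round(toy_pop.mean(), 2))
print("mean of sample means:", round(sample_means.mean(), 2))
print("population std:      ", round(toy_pop.std(), 2))
print("std of sample means: ", round(sample_means.std(), 2))
```

Comparing the printed centers and spreads is one way to relate a sampling distribution to the population it was drawn from.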


**Question 1.3** 
<br> {points: 1}

Draw 1500 random samples from our population of students (`students_pop`). Each sample should have 5 observations. Name the data frame `samples`.

In [None]:
np.random.seed(4321)  # DO NOT CHANGE

# ___ = pd.concat([
#     ___.sample(___).___(replicate=n)
#     for n in range(___)
# ])

# your code here
raise NotImplementedError
samples

In [None]:
from hashlib import sha1
assert sha1(str(type(samples.shape[0])).encode("utf-8")+b"424d7").hexdigest() == "ac208bb5ced8f90dd2e37c8457e2846b4b5824a7", "type of samples.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(samples.shape[0]).encode("utf-8")+b"424d7").hexdigest() == "5a34ea675f3bbb9ceb6d926c62f3073bf76be3cf", "value of samples.shape[0] is not correct"

assert sha1(str(type(samples.shape[1])).encode("utf-8")+b"424d8").hexdigest() == "207826ea2ed30fab262d40d867dfc57915f41e0d", "type of samples.shape[1] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(samples.shape[1]).encode("utf-8")+b"424d8").hexdigest() == "17fc6fb5381006f6463448881da2d77ac532a0b2", "value of samples.shape[1] is not correct"

assert sha1(str(type("".join(samples.columns))).encode("utf-8")+b"424d9").hexdigest() == "3381151766e29a7167200273ec6b682b0c5b0f5c", "type of \"\".join(samples.columns) is not str. \"\".join(samples.columns) should be an str"
assert sha1(str(len("".join(samples.columns))).encode("utf-8")+b"424d9").hexdigest() == "ec91ea9f7d18576a9d7389c4817a3dc6519d6430", "length of \"\".join(samples.columns) is not correct"
assert sha1(str("".join(samples.columns).lower()).encode("utf-8")+b"424d9").hexdigest() == "fb2cadd9a7a6832072d98e8db0b0fe1b5caad96c", "value of \"\".join(samples.columns) is not correct"
assert sha1(str("".join(samples.columns)).encode("utf-8")+b"424d9").hexdigest() == "fb2cadd9a7a6832072d98e8db0b0fe1b5caad96c", "correct string value of \"\".join(samples.columns) but incorrect case of letters"

assert sha1(str(type(sum(samples.replicate.unique()))).encode("utf-8")+b"424da").hexdigest() == "bcadc01cf6116b1bd1d13573a011eaa5f1515cab", "type of sum(samples.replicate.unique()) is not correct"
assert sha1(str(sum(samples.replicate.unique())).encode("utf-8")+b"424da").hexdigest() == "3a33b37b9745f4b6f7eb5de444f86d794631384e", "value of sum(samples.replicate.unique()) is not correct"

print('Success!')

**Question 1.4** 
<br> {points: 1}

Group by the sample replicate number, and then for each sample, calculate the mean. Name the data frame `sample_estimates`. The data frame should have the column names `replicate` and `mean_grade`.

In [None]:
# your code here
raise NotImplementedError
sample_estimates

In [None]:
from hashlib import sha1
assert sha1(str(type(sample_estimates.shape[0])).encode("utf-8")+b"87a20").hexdigest() == "649825d9a6c76c03b1f6e410815aa5efdb38e3c4", "type of sample_estimates.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sample_estimates.shape[0]).encode("utf-8")+b"87a20").hexdigest() == "368bbacee0c0735da54e80bcde01d7275882481c", "value of sample_estimates.shape[0] is not correct"

assert sha1(str(type(sample_estimates.shape[1])).encode("utf-8")+b"87a21").hexdigest() == "70ffc7228045c01e28562294bcee5e417666e2d9", "type of sample_estimates.shape[1] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sample_estimates.shape[1]).encode("utf-8")+b"87a21").hexdigest() == "d51a50eaa3d72e3bb56a7f22a4c91d53fbbb1cef", "value of sample_estimates.shape[1] is not correct"

assert sha1(str(type("".join(sample_estimates.columns))).encode("utf-8")+b"87a22").hexdigest() == "6de7480a2e480eccb91549395f6844730c7450f8", "type of \"\".join(sample_estimates.columns) is not str. \"\".join(sample_estimates.columns) should be an str"
assert sha1(str(len("".join(sample_estimates.columns))).encode("utf-8")+b"87a22").hexdigest() == "23b842aa57380439320adb790af93112dda2038d", "length of \"\".join(sample_estimates.columns) is not correct"
assert sha1(str("".join(sample_estimates.columns).lower()).encode("utf-8")+b"87a22").hexdigest() == "8ccba9b5e969f4bf0bb4864e1de7f15ffff4f60a", "value of \"\".join(sample_estimates.columns) is not correct"
assert sha1(str("".join(sample_estimates.columns)).encode("utf-8")+b"87a22").hexdigest() == "8ccba9b5e969f4bf0bb4864e1de7f15ffff4f60a", "correct string value of \"\".join(sample_estimates.columns) but incorrect case of letters"

print('Success!')

**Question 1.5** 
<br> {points: 1}

Visualize the distribution of the sample estimates (`sample_estimates`) you just calculated by plotting a histogram with `maxbins=30`. Name the plot `sampling_distribution_5`, give the plot a descriptive title, and give the x-axis a descriptive label.

In [None]:
# your code here
raise NotImplementedError
sampling_distribution_5

In [None]:
from hashlib import sha1
assert sha1(str(type(sampling_distribution_5.encoding.x['shorthand'])).encode("utf-8")+b"f00f5").hexdigest() == "3dc47af2fef6ec8bca25d4c48e6e9bfd96559d0f", "type of sampling_distribution_5.encoding.x['shorthand'] is not str. sampling_distribution_5.encoding.x['shorthand'] should be an str"
assert sha1(str(len(sampling_distribution_5.encoding.x['shorthand'])).encode("utf-8")+b"f00f5").hexdigest() == "a879a086ef42c6807067544dcec52ff746e01c5f", "length of sampling_distribution_5.encoding.x['shorthand'] is not correct"
assert sha1(str(sampling_distribution_5.encoding.x['shorthand'].lower()).encode("utf-8")+b"f00f5").hexdigest() == "208850815cfdcc4bbe1e67f08a45289f9f287daa", "value of sampling_distribution_5.encoding.x['shorthand'] is not correct"
assert sha1(str(sampling_distribution_5.encoding.x['shorthand']).encode("utf-8")+b"f00f5").hexdigest() == "208850815cfdcc4bbe1e67f08a45289f9f287daa", "correct string value of sampling_distribution_5.encoding.x['shorthand'] but incorrect case of letters"

assert sha1(str(type(sampling_distribution_5.encoding.y['shorthand'])).encode("utf-8")+b"f00f6").hexdigest() == "1e9a0dbd3ec9a8617f78a628e656b8cb8eb2414f", "type of sampling_distribution_5.encoding.y['shorthand'] is not str. sampling_distribution_5.encoding.y['shorthand'] should be an str"
assert sha1(str(len(sampling_distribution_5.encoding.y['shorthand'])).encode("utf-8")+b"f00f6").hexdigest() == "6ddac2518ed3828b6dc10021d9c63657709b317d", "length of sampling_distribution_5.encoding.y['shorthand'] is not correct"
assert sha1(str(sampling_distribution_5.encoding.y['shorthand'].lower()).encode("utf-8")+b"f00f6").hexdigest() == "1a29132afba8c30f893161c0d2bff20b27a21342", "value of sampling_distribution_5.encoding.y['shorthand'] is not correct"
assert sha1(str(sampling_distribution_5.encoding.y['shorthand']).encode("utf-8")+b"f00f6").hexdigest() == "1a29132afba8c30f893161c0d2bff20b27a21342", "correct string value of sampling_distribution_5.encoding.y['shorthand'] but incorrect case of letters"

assert sha1(str(type(sampling_distribution_5.mark)).encode("utf-8")+b"f00f7").hexdigest() == "08e979771666c2510e4ded84c49659a533b094fa", "type of sampling_distribution_5.mark is not str. sampling_distribution_5.mark should be an str"
assert sha1(str(len(sampling_distribution_5.mark)).encode("utf-8")+b"f00f7").hexdigest() == "a6f134f9b78cee9712a7261acafedfd987295457", "length of sampling_distribution_5.mark is not correct"
assert sha1(str(sampling_distribution_5.mark.lower()).encode("utf-8")+b"f00f7").hexdigest() == "6181ac5bcdafffca92df059012cc3f3a61b957d8", "value of sampling_distribution_5.mark is not correct"
assert sha1(str(sampling_distribution_5.mark).encode("utf-8")+b"f00f7").hexdigest() == "6181ac5bcdafffca92df059012cc3f3a61b957d8", "correct string value of sampling_distribution_5.mark but incorrect case of letters"

assert sha1(str(type(sampling_distribution_5.data.shape[0])).encode("utf-8")+b"f00f8").hexdigest() == "3a44d8a903d05fe34eb3f511a49df8fe469064ec", "type of sampling_distribution_5.data.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sampling_distribution_5.data.shape[0]).encode("utf-8")+b"f00f8").hexdigest() == "a25b5aacdfb28e4523bc330727ccb684de273b05", "value of sampling_distribution_5.data.shape[0] is not correct"

assert sha1(str(type(round(sum(sampling_distribution_5.data.sum()), 2))).encode("utf-8")+b"f00f9").hexdigest() == "939ccf85926a4bc969d708e7418f590950c816e4", "type of round(sum(sampling_distribution_5.data.sum()), 2) is not float. Please make sure it is float and not np.float64, etc. You can cast your value into a float using float()"
assert sha1(str(round(round(sum(sampling_distribution_5.data.sum()), 2), 2)).encode("utf-8")+b"f00f9").hexdigest() == "a5974054af0efe0dc2b3fc56cf24bbce0ece84a8", "value of round(sum(sampling_distribution_5.data.sum()), 2) is not correct (rounded to 2 decimal places)"

assert sha1(str(type(isinstance(sampling_distribution_5.encoding.x['title'], str))).encode("utf-8")+b"f00fa").hexdigest() == "343455f8bd30887a2632e8032591c7910569f627", "type of isinstance(sampling_distribution_5.encoding.x['title'], str) is not bool. isinstance(sampling_distribution_5.encoding.x['title'], str) should be a bool"
assert sha1(str(isinstance(sampling_distribution_5.encoding.x['title'], str)).encode("utf-8")+b"f00fa").hexdigest() == "11b0ddc0138975a09c828626a5bca712a42f0161", "boolean value of isinstance(sampling_distribution_5.encoding.x['title'], str) is not correct"

assert sha1(str(type(isinstance(sampling_distribution_5.encoding.y['title'], str))).encode("utf-8")+b"f00fb").hexdigest() == "6a93cf6b9d5ebfae682b8c3920544a4fb9dcfb6e", "type of isinstance(sampling_distribution_5.encoding.y['title'], str) is not bool. isinstance(sampling_distribution_5.encoding.y['title'], str) should be a bool"
assert sha1(str(isinstance(sampling_distribution_5.encoding.y['title'], str)).encode("utf-8")+b"f00fb").hexdigest() == "afe0254b0f8b456c2bf986a4f40ce14341040eec", "boolean value of isinstance(sampling_distribution_5.encoding.y['title'], str) is not correct"

print('Success!')

**Question 1.6** 
<br> {points: 3}

Describe the distribution above in words. Comment on its shape, center, and spread. Compare this sampling distribution to the population distribution of students' grades above.

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

**Question 1.7**
<br> {points: 1}

Let's create a simulated dataset of the number of cups of coffee drunk per week for our population of students.
Describe the resulting population distribution in words. Comment on its shape, center, and spread.

In [None]:
# run this cell to simulate a finite population
coffee_data = pd.DataFrame({
    "cups": np.random.exponential(size=2000, scale=1 / 0.34)
})

pop_dist = alt.Chart(coffee_data, title="Population distribution").mark_bar().encode(
    x=alt.X("cups")
        .title("Cups of coffee per week")
        .bin(maxbins=30),
    y="count()"
)
pop_dist

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

**Question 1.8**
<br> {points: 1}

Repeat the steps in Questions 1.3 to 1.5 with a sample size of 5 for this population. Remember to rename the column containing the mean number of cups per replicate to `mean_cups`. You should end up with a plot of the sampling distribution called `sampling_distribution_5`. Set the `maxbins` of this plot to 30 and the title to `"Sampling distribution of the sample means"`.

In [None]:
np.random.seed(4321)  # DO NOT CHANGE!

# your code here
raise NotImplementedError
sampling_distribution_5

In [None]:
from hashlib import sha1
assert sha1(str(type(sampling_distribution_5.encoding.x['shorthand'])).encode("utf-8")+b"1875b").hexdigest() == "f83eeb2a020f3e15c2423fbb712563bb9eecf517", "type of sampling_distribution_5.encoding.x['shorthand'] is not str. sampling_distribution_5.encoding.x['shorthand'] should be an str"
assert sha1(str(len(sampling_distribution_5.encoding.x['shorthand'])).encode("utf-8")+b"1875b").hexdigest() == "50ff712c4df349281a1535069f38382e97fee750", "length of sampling_distribution_5.encoding.x['shorthand'] is not correct"
assert sha1(str(sampling_distribution_5.encoding.x['shorthand'].lower()).encode("utf-8")+b"1875b").hexdigest() == "9a1e7f3a80a203fca88868d19a454b0e31ac0884", "value of sampling_distribution_5.encoding.x['shorthand'] is not correct"
assert sha1(str(sampling_distribution_5.encoding.x['shorthand']).encode("utf-8")+b"1875b").hexdigest() == "9a1e7f3a80a203fca88868d19a454b0e31ac0884", "correct string value of sampling_distribution_5.encoding.x['shorthand'] but incorrect case of letters"

assert sha1(str(type(sampling_distribution_5.encoding.y['shorthand'])).encode("utf-8")+b"1875c").hexdigest() == "6f3fdd8c026a5e22d4bf55de577b5cc81dc73ef5", "type of sampling_distribution_5.encoding.y['shorthand'] is not str. sampling_distribution_5.encoding.y['shorthand'] should be an str"
assert sha1(str(len(sampling_distribution_5.encoding.y['shorthand'])).encode("utf-8")+b"1875c").hexdigest() == "322efa6f6a1f0ca7bb97c2ee59f91c4da0780571", "length of sampling_distribution_5.encoding.y['shorthand'] is not correct"
assert sha1(str(sampling_distribution_5.encoding.y['shorthand'].lower()).encode("utf-8")+b"1875c").hexdigest() == "496e4c7a81fb52855937791d97331baed07b9447", "value of sampling_distribution_5.encoding.y['shorthand'] is not correct"
assert sha1(str(sampling_distribution_5.encoding.y['shorthand']).encode("utf-8")+b"1875c").hexdigest() == "496e4c7a81fb52855937791d97331baed07b9447", "correct string value of sampling_distribution_5.encoding.y['shorthand'] but incorrect case of letters"

assert sha1(str(type(sampling_distribution_5.mark)).encode("utf-8")+b"1875d").hexdigest() == "dc6a28125f18f8b492e357c73d3eb54508609413", "type of sampling_distribution_5.mark is not str. sampling_distribution_5.mark should be an str"
assert sha1(str(len(sampling_distribution_5.mark)).encode("utf-8")+b"1875d").hexdigest() == "ef5baf2890ffc7a5cda0b006cbc6fb74698ea1c0", "length of sampling_distribution_5.mark is not correct"
assert sha1(str(sampling_distribution_5.mark.lower()).encode("utf-8")+b"1875d").hexdigest() == "b21767f2a4ef5e345abb6df2a26f50b72c5babe6", "value of sampling_distribution_5.mark is not correct"
assert sha1(str(sampling_distribution_5.mark).encode("utf-8")+b"1875d").hexdigest() == "b21767f2a4ef5e345abb6df2a26f50b72c5babe6", "correct string value of sampling_distribution_5.mark but incorrect case of letters"

assert sha1(str(type(sampling_distribution_5.data.shape[0])).encode("utf-8")+b"1875e").hexdigest() == "dd3752aff175262fa7ef4f0f991391fe5745c108", "type of sampling_distribution_5.data.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sampling_distribution_5.data.shape[0]).encode("utf-8")+b"1875e").hexdigest() == "826d463fb42bc73b55fafdbc1f24d9901bc14609", "value of sampling_distribution_5.data.shape[0] is not correct"

assert sha1(str(type(round(sum(sampling_distribution_5.data.sum()), 2))).encode("utf-8")+b"1875f").hexdigest() == "d920a4d9581ace9ede9d85ec015214295ed31b8b", "type of round(sum(sampling_distribution_5.data.sum()), 2) is not float. Please make sure it is float and not np.float64, etc. You can cast your value into a float using float()"
assert sha1(str(round(round(sum(sampling_distribution_5.data.sum()), 2), 2)).encode("utf-8")+b"1875f").hexdigest() == "dde070ba6dd0c06bb190d7d0f513cb4e9588583d", "value of round(sum(sampling_distribution_5.data.sum()), 2) is not correct (rounded to 2 decimal places)"

assert sha1(str(type(isinstance(sampling_distribution_5.encoding.x['title'], str))).encode("utf-8")+b"18760").hexdigest() == "069f690cd8c977e868985aa02ee459ba99763950", "type of isinstance(sampling_distribution_5.encoding.x['title'], str) is not bool. isinstance(sampling_distribution_5.encoding.x['title'], str) should be a bool"
assert sha1(str(isinstance(sampling_distribution_5.encoding.x['title'], str)).encode("utf-8")+b"18760").hexdigest() == "0ee8ae9bbe8f97eb462f5c25938e71c00b48400e", "boolean value of isinstance(sampling_distribution_5.encoding.x['title'], str) is not correct"

assert sha1(str(type(isinstance(sampling_distribution_5.encoding.y['title'], str))).encode("utf-8")+b"18761").hexdigest() == "0e46c30b57a19750300c761662f70f9dc98cb502", "type of isinstance(sampling_distribution_5.encoding.y['title'], str) is not bool. isinstance(sampling_distribution_5.encoding.y['title'], str) should be a bool"
assert sha1(str(isinstance(sampling_distribution_5.encoding.y['title'], str)).encode("utf-8")+b"18761").hexdigest() == "072b8d3e2f580bf7a8ba98730f3978a09c0b9937", "boolean value of isinstance(sampling_distribution_5.encoding.y['title'], str) is not correct"

assert sha1(str(type(sampling_distribution_5.title is not None)).encode("utf-8")+b"18762").hexdigest() == "71343490cfffe85f9023402bf0ad1eb8b5c91bc6", "type of sampling_distribution_5.title is not None is not bool. sampling_distribution_5.title is not None should be a bool"
assert sha1(str(sampling_distribution_5.title is not None).encode("utf-8")+b"18762").hexdigest() == "8b39549f7508f25150ce6e924dbb1b5eb9970eed", "boolean value of sampling_distribution_5.title is not None is not correct"

assert sha1(str(type(sum(samples.replicate.unique()))).encode("utf-8")+b"18763").hexdigest() == "fc82ec786a1d957653a20ee648a3952d38318b24", "type of sum(samples.replicate.unique()) is not correct"
assert sha1(str(sum(samples.replicate.unique())).encode("utf-8")+b"18763").hexdigest() == "356e8fa3ed96357b948e522926b3283a9ea8b6c4", "value of sum(samples.replicate.unique()) is not correct"

print('Success!')

**Question 1.9** 
<br> {points: 3}

Describe the distribution above in words. Comment on its shape, center, and spread. Compare this sampling distribution to the population distribution above.

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.

**Question 2.0** 
<br> {points: 1}

Repeat the steps in Questions 1.3 to 1.5 using a sample size of 30 for this coffee population. Remember to rename the column containing the mean number of cups per replicate to `mean_cups`. You should end up with a plot of the sampling distribution called `sampling_distribution_30`. Set the `maxbins` of this plot to 30 and the title to `"Sampling distribution of the sample means"`.

In [None]:
np.random.seed(4321)  # DO NOT CHANGE!

# your code here
raise NotImplementedError
sampling_distribution_30

In [None]:
from hashlib import sha1
assert sha1(str(type(sampling_distribution_30.encoding.x['shorthand'])).encode("utf-8")+b"82fa5").hexdigest() == "03ad64982694698b60b68524320dba0a99d1e346", "type of sampling_distribution_30.encoding.x['shorthand'] is not str. sampling_distribution_30.encoding.x['shorthand'] should be an str"
assert sha1(str(len(sampling_distribution_30.encoding.x['shorthand'])).encode("utf-8")+b"82fa5").hexdigest() == "7513741475f0b9d6b78e781fd6f1c225631fcd35", "length of sampling_distribution_30.encoding.x['shorthand'] is not correct"
assert sha1(str(sampling_distribution_30.encoding.x['shorthand'].lower()).encode("utf-8")+b"82fa5").hexdigest() == "a15f0227e8a59da24861f24550beb57a08ff5058", "value of sampling_distribution_30.encoding.x['shorthand'] is not correct"
assert sha1(str(sampling_distribution_30.encoding.x['shorthand']).encode("utf-8")+b"82fa5").hexdigest() == "a15f0227e8a59da24861f24550beb57a08ff5058", "correct string value of sampling_distribution_30.encoding.x['shorthand'] but incorrect case of letters"

assert sha1(str(type(sampling_distribution_30.encoding.y['shorthand'])).encode("utf-8")+b"82fa6").hexdigest() == "1feefddd9c6d09cc2d06754cd5f74385e2672cc0", "type of sampling_distribution_30.encoding.y['shorthand'] is not str. sampling_distribution_30.encoding.y['shorthand'] should be an str"
assert sha1(str(len(sampling_distribution_30.encoding.y['shorthand'])).encode("utf-8")+b"82fa6").hexdigest() == "4144e65388059f9ec743cd449527d47926e22bd4", "length of sampling_distribution_30.encoding.y['shorthand'] is not correct"
assert sha1(str(sampling_distribution_30.encoding.y['shorthand'].lower()).encode("utf-8")+b"82fa6").hexdigest() == "b01d36c832723f6e7441ee9e401be1e0ec04d09d", "value of sampling_distribution_30.encoding.y['shorthand'] is not correct"
assert sha1(str(sampling_distribution_30.encoding.y['shorthand']).encode("utf-8")+b"82fa6").hexdigest() == "b01d36c832723f6e7441ee9e401be1e0ec04d09d", "correct string value of sampling_distribution_30.encoding.y['shorthand'] but incorrect case of letters"

assert sha1(str(type(sampling_distribution_30.mark)).encode("utf-8")+b"82fa7").hexdigest() == "e3cbfae32bb8466f58122f54de1cc3ce1a65b05d", "type of sampling_distribution_30.mark is not str. sampling_distribution_30.mark should be an str"
assert sha1(str(len(sampling_distribution_30.mark)).encode("utf-8")+b"82fa7").hexdigest() == "47aa9a6b8e084d37aa9f16835fa3b0792fe067e1", "length of sampling_distribution_30.mark is not correct"
assert sha1(str(sampling_distribution_30.mark.lower()).encode("utf-8")+b"82fa7").hexdigest() == "8c840c128fbcee7c4c2bc28979abe968ec72e91b", "value of sampling_distribution_30.mark is not correct"
assert sha1(str(sampling_distribution_30.mark).encode("utf-8")+b"82fa7").hexdigest() == "8c840c128fbcee7c4c2bc28979abe968ec72e91b", "correct string value of sampling_distribution_30.mark but incorrect case of letters"

assert sha1(str(type(sampling_distribution_30.data.shape[0])).encode("utf-8")+b"82fa8").hexdigest() == "1df4480ed39592a4612147f9d90189f6142ce13e", "type of sampling_distribution_30.data.shape[0] is not int. Please make sure it is int and not np.int64, etc. You can cast your value into an int using int()"
assert sha1(str(sampling_distribution_30.data.shape[0]).encode("utf-8")+b"82fa8").hexdigest() == "931f885e2b802797c7df8f285277953214b0a3ee", "value of sampling_distribution_30.data.shape[0] is not correct"

assert sha1(str(type(round(sum(sampling_distribution_30.data.sum()), 2))).encode("utf-8")+b"82fa9").hexdigest() == "aec83499f940d3c1e7412e570dabe38ea6879fc5", "type of round(sum(sampling_distribution_30.data.sum()), 2) is not float. Please make sure it is float and not np.float64, etc. You can cast your value into a float using float()"
assert sha1(str(round(round(sum(sampling_distribution_30.data.sum()), 2), 2)).encode("utf-8")+b"82fa9").hexdigest() == "6122763160962d25ad89eacc57ad5517653a2470", "value of round(sum(sampling_distribution_30.data.sum()), 2) is not correct (rounded to 2 decimal places)"

assert sha1(str(type(isinstance(sampling_distribution_30.encoding.x['title'], str))).encode("utf-8")+b"82faa").hexdigest() == "d45185131a0ac0bdeb318d8e8cde305aac36776f", "type of isinstance(sampling_distribution_30.encoding.x['title'], str) is not bool. isinstance(sampling_distribution_30.encoding.x['title'], str) should be a bool"
assert sha1(str(isinstance(sampling_distribution_30.encoding.x['title'], str)).encode("utf-8")+b"82faa").hexdigest() == "2e3a19b6d68df06fd00180b28e0472c7c7b53c40", "boolean value of isinstance(sampling_distribution_30.encoding.x['title'], str) is not correct"

assert sha1(str(type(isinstance(sampling_distribution_30.encoding.y['title'], str))).encode("utf-8")+b"82fab").hexdigest() == "308df46db2674caa25320a89908bb456c15306fc", "type of isinstance(sampling_distribution_30.encoding.y['title'], str) is not bool. isinstance(sampling_distribution_30.encoding.y['title'], str) should be a bool"
assert sha1(str(isinstance(sampling_distribution_30.encoding.y['title'], str)).encode("utf-8")+b"82fab").hexdigest() == "91444e51cf943fe55f72634711411559230b1811", "boolean value of isinstance(sampling_distribution_30.encoding.y['title'], str) is not correct"

assert sha1(str(type(sampling_distribution_30.title is not None)).encode("utf-8")+b"82fac").hexdigest() == "d2fe6ce6cfd89777d5364d85cdf8ca150049b59b", "type of sampling_distribution_30.title is not None is not bool. sampling_distribution_30.title is not None should be a bool"
assert sha1(str(sampling_distribution_30.title is not None).encode("utf-8")+b"82fac").hexdigest() == "4593fe45e7cfb05dc192ed3c4f2bdf42eed43c33", "boolean value of sampling_distribution_30.title is not None is not correct"

assert sha1(str(type(sum(samples.replicate.unique()))).encode("utf-8")+b"82fad").hexdigest() == "b31c595054c2df47e783cbe990bcc2c6de580084", "type of sum(samples.replicate.unique()) is not correct"
assert sha1(str(sum(samples.replicate.unique())).encode("utf-8")+b"82fad").hexdigest() == "b30e25a92687098c817bfaf7a94c170a2c5ff3d7", "value of sum(samples.replicate.unique()) is not correct"

print('Success!')

**Question 2.1** 
<br> {points: 3}

Describe the distribution above in words. Comment on its shape, center, and spread. Compare this sampling distribution, built from samples of size 30, to the sampling distribution built from samples of size 5.

DOUBLE CLICK TO EDIT **THIS CELL** AND REPLACE THIS TEXT WITH YOUR ANSWER.