# Symbulate Lab 5 - Central Limit Theorem

This Jupyter notebook provides a template for you to fill in.  Read the notebook from start to finish, completing the parts as indicated.  To run a cell, make sure the cell is highlighted by clicking on it, then press SHIFT + ENTER on your keyboard.  (Alternatively, you can click the "play" button in the toolbar above.)

In this lab you will use the Symbulate package. You have seen most of the commands that you will use in previous labs, but remember to refer to the [documentation](https://dlsun.github.io/symbulate/index.html) for help. In particular, read the documentation on [Normal distributions](https://dlsun.github.io/symbulate/common_continuous.html#normal), the [standardize method](https://dlsun.github.io/symbulate/rv.html#standardize), and the [`**` (exponentiation) notation](https://dlsun.github.io/symbulate/probspace.html#Independent-probability-spaces) for drawing multiple values independently from a distribution. **You should use Symbulate commands whenever possible.** If you find yourself writing long blocks of Python code, you are probably doing something wrong. For example, you should not need to write any `for` loops.

**Warning:** You may notice that many of the cells in this notebook are not editable. This is intentional and for your own safety. We have made these cells read-only so that you don't accidentally modify or delete them. However, you should still be able to execute the code in these cells.

In [1]:
from symbulate import *
%matplotlib inline

## Setup

A random sample of $n$ customers at the Avenue is selected.  Let $\bar{X}$ represent the mean dollar amount spent by the $n$ customers in the sample.  In this lab, you will investigate the distribution of $\bar{X}$: how does the mean dollar amount spent vary over many samples of size $n$?

Each of the parts assumes a different distribution for dollar amounts spent by individual customers.  Within each part you will investigate how the distribution of the sample mean changes as the sample size increases.

In each simulation, you should first define a probability space so that an outcome represents the $n$ individual dollar amounts spent by the customers in a random sample. You can assume the dollar amounts spent are independent from customer to customer, and that each amount is drawn from the specified distribution. (We say that the dollar amounts in a random sample are **independent and identically distributed (i.i.d.)**.)
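To make the idea of an outcome concrete, here is a minimal plain-Python sketch of what one simulated outcome and its sample mean look like. It is only an illustration of the i.i.d. setup, not a substitute for the Symbulate commands the lab asks you to use. The helper names `draw_sample`, `simulate_sample_means`, and `draw_amount` are made up for this sketch, and the population used is the one from Part II.

```python
import random
import statistics

def draw_sample(n, draw_one, rng):
    """Return one outcome: a list of n independent draws from the same distribution."""
    return [draw_one(rng) for _ in range(n)]

def simulate_sample_means(n, draw_one, reps, seed=0):
    """Repeat the experiment `reps` times and record the sample mean each time."""
    rng = random.Random(seed)
    return [statistics.fmean(draw_sample(n, draw_one, rng)) for _ in range(reps)]

def draw_amount(rng):
    """One customer's amount: equally likely to be 4, 5, 6, 7, 8, or 9 dollars (Part II)."""
    return rng.choice([4, 5, 6, 7, 8, 9])

# 1000 simulated values of the sample mean for samples of size n = 5.
means = simulate_sample_means(n=5, draw_one=draw_amount, reps=1000)
print(len(means), min(means), max(means))
```

Each entry of `means` corresponds to one random sample of 5 customers, so the list as a whole approximates the distribution of $\bar{X}$ for $n=5$.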

In much of this lab, you will only need to make small modifications from question to question.  But do make sure you take time to think about the output of each part before moving on.  In particular, be sure to note the scale on the horizontal axis on your plots.

You will run a simulation for each question, but there are some parts for which you should be able to derive the distribution analytically.  You are encouraged to do this outside of class for practice.
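As a starting point for the analytic derivations, the following standard facts may help. They assume the amounts $X_1, \dots, X_n$ are i.i.d. with mean $\mu$ and finite standard deviation $\sigma$; the symbols $\mu$ and $\sigma$ are introduced here for convenience and are not used elsewhere in the lab.

$$
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
E[\bar{X}] = \mu, \qquad
\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}, \qquad
\mathrm{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}}.
$$

These formulas describe the center and spread of $\bar{X}$ only. The shape of its distribution depends on the population and on $n$, which is what the simulations explore.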

Some of the simulations will take a minute or two to run, especially for the larger values of $n$, so please be patient. You might want to run `.sim(10)` first to make sure your code works, and then change to `.sim(10000)`.

## Part I

Assume dollar amounts spent by individual customers can be modeled with a Normal distribution with mean 6.50 and standard deviation 1.71.
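Part I is one of the cases where the distribution of the sample mean can be derived exactly, because the mean of independent Normal amounts is again Normal. The sketch below uses Python's standard-library `statistics.NormalDist` (not Symbulate) to tabulate the theoretical values you might compare your simulation results against. The helper name `sample_mean_dist` is made up for this sketch.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 6.50, 1.71  # population mean and SD from Part I

def sample_mean_dist(n):
    """Distribution of the mean of n i.i.d. Normal(MU, SIGMA) amounts."""
    return NormalDist(mu=MU, sigma=SIGMA / sqrt(n))

for n in (1, 2, 5, 30, 100):
    dist = sample_mean_dist(n)
    threshold = dist.mean + 2 * dist.stdev
    tail = 1 - dist.cdf(threshold)  # P(sample mean > its mean + 2 of its SDs)
    print(f"n={n:3d}  SD={dist.stdev:.4f}  P(more than 2 SD above mean)={tail:.4f}")
```

Simulated estimates will not match these values exactly, but they should be close when the number of repetitions is large.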

## a)

First assume just a single customer is selected at random, and let $X$ represent the dollar amount spent.  Use simulation to:

- Plot the approximate distribution of $X$
- Estimate its expected value and standard deviation
- Estimate the probability that $X$ is more than 2 standard deviations greater than its expected value.
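The phrase "more than 2 standard deviations greater than its expected value" can be read as "standardized value greater than 2". The following plain-Python sketch spells out that calculation step by step for the Part I model, so you can see what a standardize-then-count approach computes. It uses only the standard library and a fixed seed chosen for this illustration; your lab answer should still use Symbulate commands.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the sketch is reproducible
N = 10000

# Simulate N dollar amounts from the Part I model.
x = [rng.gauss(6.50, 1.71) for _ in range(N)]

# Standardize using the simulated mean and SD.
m = statistics.fmean(x)
s = statistics.pstdev(x)
z = [(value - m) / s for value in x]

# Relative frequency of values more than 2 SDs above the mean.
p_hat = sum(zi > 2 for zi in z) / N
print(round(m, 2), round(s, 2), p_hat)
```

The same three steps (simulate, standardize, count) carry over to $\bar{X}$ in the later parts; only the simulated values change.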

In [2]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## b)

Now $n=2$ customers are selected at random, and $\bar{X}$ represents the mean dollar amount spent for the two customers.  Use simulation to:

- Plot the approximate distribution of $\bar{X}$
- Estimate its expected value and standard deviation
- Estimate the probability that $\bar{X}$ is more than 2 standard deviations greater than its expected value.

In [3]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## c)

Repeat part b) with $n=5$

In [4]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## d)

Repeat part b) with $n=30$

In [5]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## e)

Repeat part b) with $n=100$

In [6]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## f)

How does increasing the sample size $n$ affect the distribution of $\bar{X}$? 

**TYPE YOUR RESPONSE HERE.**

## Part II

Assume the dollar amount spent by any individual customer is equally likely to be 4, 5, 6, 7, 8, or 9.
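For small $n$, the exact distribution of $\bar{X}$ in this part can be found by listing every equally likely outcome. The sketch below does this with the standard library as a way to check simulation results. The helper names `exact_mean_dist` and `summarize` are made up for this sketch, and full enumeration is only practical for small $n$ (there are $6^n$ outcomes).

```python
from fractions import Fraction
from itertools import product
from math import sqrt

VALUES = [4, 5, 6, 7, 8, 9]  # equally likely dollar amounts from Part II

def exact_mean_dist(n):
    """Exact distribution of the sample mean of n draws, by enumerating all outcomes."""
    dist = {}
    weight = Fraction(1, len(VALUES) ** n)
    for outcome in product(VALUES, repeat=n):
        xbar = Fraction(sum(outcome), n)
        dist[xbar] = dist.get(xbar, 0) + weight
    return dist

def summarize(dist):
    """Return (mean, variance, P(value > mean + 2 SD)) for a distribution dict."""
    mean = sum(v * p for v, p in dist.items())
    var = sum((v - mean) ** 2 * p for v, p in dist.items())
    threshold = float(mean) + 2 * sqrt(var)
    tail = sum(p for v, p in dist.items() if v > threshold)
    return mean, var, tail

for n in (1, 2, 5):
    mean, var, tail = summarize(exact_mean_dist(n))
    print(n, mean, var, tail)
```

Exact fractions are used so that the results can be compared directly with hand calculations.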

## a)

First assume just a single customer is selected at random, and let $X$ represent the dollar amount spent.  Use simulation to:

- Plot the approximate distribution of $X$
- Estimate its expected value and standard deviation
- Estimate the probability that $X$ is more than 2 standard deviations greater than its expected value.

In [7]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## b)

Now $n=2$ customers are selected at random, and $\bar{X}$ represents the mean dollar amount spent for the two customers.  Use simulation to:

- Plot the approximate distribution of $\bar{X}$
- Estimate its expected value and standard deviation
- Estimate the probability that $\bar{X}$ is more than 2 standard deviations greater than its expected value.

In [8]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## c)

Repeat part b) with $n=5$

In [9]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## d)

Repeat part b) with $n=30$

In [10]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## e)

Repeat part b) with $n=100$

In [11]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## f)

How does increasing the sample size $n$ affect the distribution of $\bar{X}$? 

**TYPE YOUR RESPONSE HERE.**

## Part III

Assume the dollar amount spent by any individual customer has an Exponential distribution with mean 6.50.
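An Exponential distribution with mean $m$ has rate $1/m$ and standard deviation $m$, so the single-customer case can be worked out exactly. The sketch below computes the exact tail probability and cross-checks it with a standard-library simulation (`random.expovariate`, which takes the rate). It is a check on the arithmetic under those facts, not the Symbulate solution; the seed and the number of draws are arbitrary choices.

```python
import math
import random

MEAN = 6.50          # mean dollar amount from Part III
RATE = 1 / MEAN      # an Exponential with mean m has rate 1/m
SD = MEAN            # an Exponential's SD equals its mean

# Exact tail probability: P(X > t) = exp(-RATE * t) for t >= 0.
threshold = MEAN + 2 * SD
exact = math.exp(-RATE * threshold)   # equals exp(-3)

# Simulation cross-check with the standard library (not Symbulate).
rng = random.Random(5)
N = 100000
draws = [rng.expovariate(RATE) for _ in range(N)]
estimate = sum(d > threshold for d in draws) / N

print(f"exact={exact:.4f}  simulated={estimate:.4f}")
```

When you set up the distribution in Symbulate, check the documentation to see whether it expects the rate or the mean as its parameter.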

## a)

First assume just a single customer is selected at random, and let $X$ represent the dollar amount spent.  Use simulation to:

- Plot the approximate distribution of $X$
- Estimate its expected value and standard deviation
- Estimate the probability that $X$ is more than 2 standard deviations greater than its expected value.

In [12]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## b)

Now $n=2$ customers are selected at random, and $\bar{X}$ represents the mean dollar amount spent for the two customers.  Use simulation to:

- Plot the approximate distribution of $\bar{X}$
- Estimate its expected value and standard deviation
- Estimate the probability that $\bar{X}$ is more than 2 standard deviations greater than its expected value.

In [13]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## c)

Repeat part b) with $n=5$

In [14]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## d)

Repeat part b) with $n=30$

In [15]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## e)

Repeat part b) with $n=100$

In [16]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## f)

How does increasing the sample size $n$ affect the distribution of $\bar{X}$? 

**TYPE YOUR RESPONSE HERE.**

## Part IV

Now suppose that for 99% of the customers, the amounts spent follow the distribution in Part II, but the remaining 1% of customers spend 31 dollars (maybe they treat a few friends to lunch).  (Hint: use a BoxModel.)
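One way to think about the box is as a set of tickets whose proportions match the stated percentages. The sketch below, using only the standard library, writes down the probabilities, builds one possible box of tickets with those proportions, and computes the exact mean, standard deviation, and tail probability for a single customer. The ticket counts are one assumed choice among many that give the same proportions.

```python
from fractions import Fraction
from math import sqrt

# Probability of each amount: 99% split equally over 4..9, 1% on 31.
probs = {v: Fraction(99, 100) / 6 for v in (4, 5, 6, 7, 8, 9)}
probs[31] = Fraction(1, 100)
assert sum(probs.values()) == 1

mean = sum(v * p for v, p in probs.items())
var = sum((v - mean) ** 2 * p for v, p in probs.items())
sd = sqrt(var)

# P(X is more than 2 SDs above its expected value).
threshold = float(mean) + 2 * sd
tail = sum(p for v, p in probs.items() if v > threshold)

# Equivalent box of tickets: 99 tickets for each of 4..9 plus 6 tickets marked 31.
box = [v for v in (4, 5, 6, 7, 8, 9) for _ in range(99)] + [31] * 6

print(float(mean), round(sd, 4), tail)
```

Comparing `sd` here with the standard deviation from Part II shows how much a rare large value inflates the spread.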

## a)

First assume just a single customer is selected at random, and let $X$ represent the dollar amount spent.  Use simulation to:

- Plot the approximate distribution of $X$
- Estimate its expected value and standard deviation
- Estimate the probability that $X$ is more than 2 standard deviations greater than its expected value.

In [17]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## b)

Now $n=2$ customers are selected at random, and $\bar{X}$ represents the mean dollar amount spent for the two customers.  Use simulation to:

- Plot the approximate distribution of $\bar{X}$
- Estimate its expected value and standard deviation
- Estimate the probability that $\bar{X}$ is more than 2 standard deviations greater than its expected value.

In [18]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## c)

Repeat part b) with $n=5$

In [19]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## d)

Repeat part b) with $n=30$

In [20]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.

## e)

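Repeat part b) with $n=100$

In [21]:
# Type all of your code for this problem in this cell.
# Feel free to add additional cells for scratch work, but they will not be graded.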

## f)

How does increasing the sample size $n$ affect the distribution of $\bar{X}$? 

**TYPE YOUR RESPONSE HERE.**

## Part V

Review your work from the previous parts.  Write a few sentences summarizing what you have learned about the distribution of the sample mean of a random sample.  Be sure to consider shape, expected value, and standard deviation of the distribution.

**TYPE YOUR RESPONSE HERE.**

## Submission Instructions

Before you submit this notebook, click the "Kernel" drop-down menu at the top of this page and select "Restart & Run All". This will ensure that all of the code in your notebook executes properly. Please fix any errors, and repeat the process until the entire notebook executes without any errors.