1\. From sample mean to population mean
---------------------------------------

00:00 - 00:13

Now we're going to study some patterns that we can observe in the sample mean when the sample size becomes larger. These patterns form the basis of the law of large numbers.

2\. Sample mean review
----------------------

00:13 - 00:27

Jakob Bernoulli developed the law of large numbers in his book Ars Conjectandi (1713). The law states that the sample mean tends to the expected value as the sample grows larger.

3\. Sample mean review (Cont.)
------------------------------

00:27 - 00:33

For example, we calculate the sample mean of two values by adding the values and dividing by two.

4\. Sample mean review (Cont.)
------------------------------

00:33 - 00:39

For three values, we add up the values and divide by three.

5\. Sample mean review (Cont.)
------------------------------

00:39 - 00:45

If we have n samples, we add the n values and divide by n.
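As a minimal sketch of that rule, the helper below adds the values and divides by how many there are. The name `sample_mean` and the input lists are hypothetical, chosen only to illustrate the calculation.

```python
def sample_mean(values):
    """Return the sum of the values divided by the number of values."""
    if len(values) == 0:
        raise ValueError("sample_mean needs at least one value")
    return sum(values) / len(values)


# Hypothetical coin-flip outcomes (1 = heads, 0 = tails)
two_values = [1, 0]
three_values = [1, 0, 1]

print(sample_mean(two_values))    # add two values, divide by two
print(sample_mean(three_values))  # add three values, divide by three
```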

6\. Sample mean review (Cont.)
------------------------------

00:45 - 00:54

As the sample becomes larger, the sample mean gets nearer to the population mean. Let's code a bit.

7\. Generating the sample
-------------------------

00:54 - 01:34

To generate a sample of coin flips, we will use the binomial distribution. First we import the binom object and the describe function from scipy dot stats, then we generate the sample using binom dot rvs. We specify n as 1 coin flip and p as the probability of success (0.5 for a fair coin), then we specify the sample size as 250 and set random_state so we can reproduce our results. After that, we print the first 100 values from our sample.
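The steps just described might look like the sketch below. The transcript does not state the value passed to random_state, so the seed 42 is an assumption; any fixed integer makes the run reproducible.

```python
from scipy.stats import binom, describe

# n=1: each draw is a single coin flip; p=0.5: a fair coin
# size=250: number of flips in the sample
# random_state=42 is an assumed seed, not taken from the transcript
samples = binom.rvs(n=1, p=0.5, size=250, random_state=42)

# Print the first 100 values (1 = success, 0 = failure)
print(samples[0:100])
```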

8\. Calculating the sample mean
-------------------------------

01:34 - 01:53

To calculate the sample mean, we pass the sample to describe and read the mean attribute of the result. We slice the samples from 0 to 10, and we see that for the first 10 values the sample mean is 0.6. Now let's see what this process looks like with an animation.
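A short sketch of this step follows. The seed is again an assumption, so the printed value depends on it and need not match the 0.6 mentioned above.

```python
from scipy.stats import binom, describe

# Assumed seed; the transcript does not give the random_state value
samples = binom.rvs(n=1, p=0.5, size=250, random_state=42)

# describe returns a result object; its mean attribute holds the sample mean
first_ten_mean = describe(samples[0:10]).mean
print(first_ten_mean)
```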

9\. Sample mean of coin flips (Cont.)
-------------------------------------

01:53 - 02:31

In this animation you see how we take the sample mean of the first 2 values, then the first 3, and so on up to all 250, using the describe function. The red line represents the population mean, in this case 0.5, and the blue line is the sample mean. As you'll notice, due to the randomness of the data, the sample mean fluctuates around the population mean, but as more data becomes available, the sample mean approaches the population mean. Let's see another example with the normal distribution.

10\. Sample mean of normal distribution
---------------------------------------

02:31 - 03:19

Now we have three animated plots. At the top left we have our sample data from a normal distribution. We use one dot for each sample. At the top right we've plotted a histogram of the sample data, and at the bottom we've plotted the sample mean. In all the plots the population mean is represented with a black line and the sample mean is drawn using a red line. You can see how the red line moves and gets nearer to the population mean as more data becomes available. Enjoy the animations for a bit, and get some perspective. Now let's move on and learn how to plot the sample mean with Python.
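To see the same behavior in numbers rather than in an animation, here is a rough sketch that tracks the running mean of normally distributed data. The population mean of 0, standard deviation of 1, sample size of 1000, and seed are all assumptions, and the running mean is computed with NumPy's cumsum instead of repeated calls to describe.

```python
import numpy as np
from scipy.stats import norm

# Assumed parameters: none of these values come from the transcript
population_mean = 0.0
samples = norm.rvs(loc=population_mean, scale=1.0, size=1000, random_state=42)

# Running mean: cumulative sum divided by the number of values seen so far
running_mean = np.cumsum(samples) / np.arange(1, len(samples) + 1)

# Compare the sample mean with the population mean at a few sample sizes
for k in (10, 100, 1000):
    print(k, running_mean[k - 1])
```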

11\. Plotting the sample mean
-----------------------------

03:19 - 03:45

First we import the binom object and describe from scipy dot stats, along with matplotlib dot pyplot as plt. Then we initialize the variables, setting coin_flips to 1, p to 0.5, sample_size to 1000, and averages to an empty list.

12\. Plotting the sample mean (Cont.)
-------------------------------------

03:45 - 04:06

Next, we loop i over the range from 2 to sample_size plus 1 and use describe to calculate the mean of the sample values from index 0 up to i. We store each result in the averages list using append, then we print the first 10 values.
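Put together, the setup and the loop might look like this sketch. The transcript does not show the line that generates the sample, so the binom.rvs call and its seed are assumptions.

```python
from scipy.stats import binom, describe

# Variables as initialized in the previous step
coin_flips, p, sample_size = 1, 0.5, 1000
averages = []

# Assumed sampling call and seed (not shown in the transcript)
samples = binom.rvs(n=coin_flips, p=p, size=sample_size, random_state=42)

# For each i, take the mean of the first i values and store it
for i in range(2, sample_size + 1):
    averages.append(describe(samples[0:i]).mean)

print(averages[0:10])
```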

13\. Plotting the sample mean (Cont.)
-------------------------------------

04:06 - 04:20

We add a red line with plt dot axhline at the population mean and plot the averages. Then we add a legend in the upper-right corner and show our plot.
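One possible version of the plotting step is sketched below. The sampling call and seed are assumptions, and the legend entries are set through label arguments, which is one of several ways to name the lines.

```python
import matplotlib.pyplot as plt
from scipy.stats import binom, describe

coin_flips, p, sample_size = 1, 0.5, 1000

# Assumed sampling call and seed (not shown in the transcript)
samples = binom.rvs(n=coin_flips, p=p, size=sample_size, random_state=42)
averages = [describe(samples[0:i]).mean for i in range(2, sample_size + 1)]

# Red horizontal line at the population mean of the binomial distribution
plt.axhline(binom.mean(n=coin_flips, p=p), color="red", label="Population mean")

# Sample mean as more values are included
plt.plot(averages, "-", label="Sample mean")

plt.legend(loc="upper right")
ax = plt.gca()  # keep a handle to the axes for later tweaks
plt.show()
```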

14\. Sample mean plot
---------------------

04:20 - 04:25

The result is this beautiful plot that shows the law of large numbers in action.

15\. Let's practice!
--------------------

04:25 - 04:32

Let's get some hands-on practice with the law of large numbers.

Generating a sample
===================

A hospital's planning department is investigating different treatments for newborns. As a data scientist you are hired to simulate the sex of 250 newborn children, and you are told that on average 50.50% are males.

Instructions
------------

-   Import the `binom` object from `scipy.stats`.
-   Generate a sample of 250 newborns with 50.50% probability of being male.
-   Print the sample.

In [None]:
# Import the binom object
from scipy.stats import binom

# Generate a sample of 250 newborn children
sample = binom.rvs(n=1, p=0.505, size=250, random_state=42)

# Show the sample values
print(sample)

Calculating the sample mean
===========================

Now you can calculate the sample mean for this generated sample by taking some elements from the sample.

Using the `sample` variable you just created, you'll calculate the sample means of the first 10, 50, and 250 samples.

The `binom` object and `describe()` function from `scipy.stats` have been imported for your convenience.

Instructions 1/3
----------------

Print the sample mean of the first 10 samples.

In [None]:
# Print the sample mean of the first 10 samples
print(describe(sample[0:10]).mean)

Instructions 2/3
----------------

-   Print the sample mean of the first 50 samples.

In [None]:
# Print the sample mean of the first 50 samples
print(describe(sample[0:50]).mean)

Instructions 3/3
----------------

-   Print the sample mean of the first 250 samples.

In [None]:
# Print the sample mean of the first 250 samples
print(describe(sample[0:250]).mean)

Plotting the sample mean
========================

Now let's plot the sample mean, so you can see more clearly how it evolves as more data becomes available.

For this exercise we'll again use the sample you generated earlier, which is available in the `sample` variable. The `binom` object and `describe()` function have already been imported for you from `scipy.stats`, and `matplotlib.pyplot` is available as `plt`.

Instructions 1/3
----------------

In a `for` statement for `i` in a range that goes from `2` to `251`, do the following:

-   Calculate the sample mean for the first `i` values.
-   Use `append` to add the value to the `averages` list.

In [None]:
# Calculate the sample mean of the first i values and store it in the averages list
averages = []
for i in range(2, 251):
    averages.append(describe(sample[0:i]).mean)

Instructions 2/3
----------------

Add a horizontal line at the mean value of the binomial distribution with `n=1` and `p=0.505`.

In [None]:
# Calculate the sample mean of the first i values and store it in the averages list
averages = []
for i in range(2, 251):
    averages.append(describe(sample[0:i]).mean)

# Add population mean line and sample mean plot
plt.axhline(binom.mean(n=1, p=0.505), color='red')
plt.plot(averages, '-')

Instructions 3/3
----------------

Add a legend with labels `Population mean` and `Sample mean` and show the plot.

In [None]:
# Calculate the sample mean of the first i values and store it in the averages list
averages = []
for i in range(2, 251):
    averages.append(describe(sample[0:i]).mean)

# Add population mean line and sample mean plot
plt.axhline(binom.mean(n=1, p=0.505), color='red')
plt.plot(averages, '-')

# Add legend
plt.legend(("Population mean","Sample mean"), loc='upper right')
plt.show()