1\. Normal distributions
------------------------

00:00 - 00:10

It's time to study the most important and widely used type of distribution in probability and statistics: the normal distribution.

2\. Modeling for measures
-------------------------

00:10 - 00:28

Normal distributions, also called Gaussian distributions after mathematician Johann Carl Friedrich Gauss, allow you to model many real-world situations. For instance, you can use them to model measurements such as clothing sizes, speeds, or product weights.

3\. Adults' heights example
---------------------------

00:28 - 00:54

Let's look at an example. The heights of adults aged between 18 and 35 years are normally distributed. The mean height of adult males in this age group is 70 inches. Adult females have a mean height of 65 inches. You can see the probability density in the plot. Let's take a look at how this works.

4\. Probability density
-----------------------

00:54 - 01:13

The probability density is a function that assigns a relative likelihood to each possible outcome in the sample space. In normal distributions, the probability density has a bell shape. The curve is highest near the mean and symmetric around it.

5\. Probability density examples
--------------------------------

01:13 - 01:49

For instance, in the plot on the left you can see that the probability density at -1 is roughly 0.24. On the other hand, the probability density at 0 is roughly 0.4, as seen in the plot on the right. You can compare the probability densities for different values of the random variable to determine which is relatively more likely. In this case, 0 is more likely than -1. But what would the probability of getting a value between -1 and 0 be?

6\. Probability density and probability
---------------------------------------

01:49 - 02:08

To calculate such a probability, we need to calculate the area under the probability density curve between -1 and 0. We do this by subtracting the cdf at -1 from the cdf at 0. The probability is about 0.34.
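As a minimal sketch of the calculation just described, assuming a standard normal distribution (mean 0, standard deviation 1) and illustrative variable names, the cdf difference can be cross-checked by numerically integrating the density over the same interval:

```python
from scipy.integrate import quad
from scipy.stats import norm

# Probability of a value between -1 and 0, as a difference of two cdf values
prob_from_cdf = norm.cdf(0) - norm.cdf(-1)

# Cross-check: area under the probability density curve between -1 and 0
prob_from_area, _abs_error = quad(norm.pdf, -1, 0)

print(round(prob_from_cdf, 4))   # 0.3413
print(round(prob_from_area, 4))  # 0.3413
```

Both routes give the same number because the cdf is, by definition, the accumulated area under the density curve.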

7\. Symmetry
------------

02:08 - 02:21

Normal distributions are symmetric around the mean. That means that the probability of getting a value below the mean is the same as the probability of getting a value above the mean: 0.5.
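Here is a small sketch of that symmetry. It reuses the mean of 70 inches from the heights example; the standard deviation of 3 is a hypothetical value chosen only for illustration, since the example does not state one.

```python
from scipy.stats import norm

mean = 70  # mean height in inches, from the heights example
std = 3    # hypothetical standard deviation, not given in the example

# Probability of a value below the mean, and of a value above it
prob_below = norm.cdf(mean, loc=mean, scale=std)
prob_above = 1 - prob_below

# The density is also the same at equal distances on either side of the mean
density_left = norm.pdf(mean - 2, loc=mean, scale=std)
density_right = norm.pdf(mean + 2, loc=mean, scale=std)

print(prob_below, prob_above)  # 0.5 0.5
```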

8\. Mean
--------

02:21 - 02:39

Because the bell-shaped probability density function is symmetric and has a single peak, the mean is the value with the highest probability density. In this plot you can see that the mean of the probability density indicated by the green dotted line is 0.

9\. Mean (Cont.)
----------------

02:39 - 02:49

The probability density indicated by the blue dashed line has a mean of 1. You can see how the curve is moved to the right.

10\. Mean (Cont.)
-----------------

02:49 - 02:58

The probability density indicated by the red solid line has a mean of -2; it's moved to the left.
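To see the shift numerically, here is a rough sketch that evaluates the density on a grid for the three means used in the plots (0, 1 and -2) and locates the peak of each curve. The grid and the fixed standard deviation of 1 are assumptions made for this illustration.

```python
import numpy as np
from scipy.stats import norm

# Evenly spaced grid from -6 to 6 with a step of 0.01
x = np.linspace(-6, 6, 1201)

peaks = {}
for mean in (0, 1, -2):
    density = norm.pdf(x, loc=mean, scale=1)
    # Grid point where this density curve is highest
    peaks[mean] = round(float(x[np.argmax(density)]), 2)

print(peaks)
```

In each case the highest point of the curve sits at the mean, so changing `loc` slides the whole curve left or right without changing its shape.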

11\. Standard deviation
-----------------------

02:58 - 03:20

The standard deviation is a measure of how spread out the probability density is. For different standard deviations the curve concentrates more or less probability density around the mean. In this plot you can see a red solid line representing a probability density with mean 0 and standard deviation 0.64.

12\. Standard deviation (Cont.)
-------------------------------

03:20 - 03:27

And here you can see a green dotted line with a standard deviation of 1.

13\. Standard deviation (Cont.)
-------------------------------

03:27 - 03:46

Finally, the blue dashed line shows a standard deviation of 2. The lower the value of the standard deviation, the more concentrated the probability density is around the mean. Let's take a look at some other important properties.
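The effect of the standard deviation can be sketched with the three values from the plots (0.64, 1 and 2). The dictionary names and the choice of the interval from -1 to 1 are illustrative assumptions.

```python
from scipy.stats import norm

peak_density = {}
prob_near_mean = {}
for std in (0.64, 1, 2):
    # Height of the curve at the mean (here 0)
    peak_density[std] = norm.pdf(0, loc=0, scale=std)
    # Probability of landing between -1 and 1
    prob_near_mean[std] = norm.cdf(1, loc=0, scale=std) - norm.cdf(-1, loc=0, scale=std)

for std in peak_density:
    print(std, round(peak_density[std], 3), round(prob_near_mean[std], 3))
```

A smaller standard deviation gives a taller peak and puts more probability in the same fixed interval around the mean.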

14\. One standard deviation
---------------------------

03:46 - 04:05

From a statistical point of view, it's interesting to know how far the data is from the mean in terms of standard deviations. In any normal distribution, about 0.68 of the probability lies within one standard deviation of the mean.

15\. Two standard deviations
----------------------------

04:05 - 04:12

About 0.95 of the probability lies within two standard deviations of the mean.

16\. Three standard deviations
------------------------------

04:12 - 04:18

About 0.997 of the probability lies within three standard deviations of the mean.
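These three figures can be checked with a short sketch. The helper `prob_within` is a hypothetical name introduced here, not part of scipy; it only wraps two `norm.cdf` calls.

```python
from scipy.stats import norm


def prob_within(k, mean=0.0, std=1.0):
    """Probability of a value within k standard deviations of the mean."""
    upper = norm.cdf(mean + k * std, loc=mean, scale=std)
    lower = norm.cdf(mean - k * std, loc=mean, scale=std)
    return upper - lower


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The result does not depend on the particular mean and standard deviation: `prob_within(2, mean=168, std=12)` returns the same value as `prob_within(2)`.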

17\. Normal sampling
--------------------

04:18 - 04:59

What if you want to generate a sample from a normal distribution? First, you import norm from scipy dot stats, matplotlib dot pyplot as plt, and seaborn as sns. Then you use norm dot rvs, with the loc parameter as the mean and the scale parameter as the standard deviation, and specify the size of the sample. Use random_state to reproduce the results. Finally, you use sns dot distplot to plot the sample.
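A minimal sketch of the sampling step, leaving the plot aside. The mean, standard deviation, sample size and seed below are arbitrary values picked for illustration.

```python
from scipy.stats import norm

# Draw 10,000 values from a normal distribution with mean 0 and std 1
sample = norm.rvs(loc=0, scale=1, size=10000, random_state=42)

# Drawing again with the same random_state reproduces the same sample
sample_again = norm.rvs(loc=0, scale=1, size=10000, random_state=42)

# The sample statistics should land close to loc and scale
sample_mean = sample.mean()
sample_std = sample.std(ddof=1)
print(round(sample_mean, 2), round(sample_std, 2))
```

The sample mean and standard deviation are only close to `loc` and `scale`, not equal to them, because the sample is random.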

18\. Normal sampling (Cont.)
----------------------------

04:59 - 05:08

We got this beautiful probability density plot. You can plot any normal probability density just by knowing its mean and standard deviation.

19\. Let's do some exercises with normal distributions
------------------------------------------------------

05:08 - 05:15

We already know the fundamentals about normal distributions. Now let's see what we can do with them.

Range of values
===============

Suppose the scores on a given academic test are normally distributed, with a mean of 65 and standard deviation of 10.

What would be the range of scores **two** standard deviations from the mean?

Instructions
------------

### Possible answers

10 and 65

55 and 75

[x] 45 and 85

35 and 95
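The arithmetic behind the marked answer can be written out in a couple of lines; the variable names are illustrative.

```python
mean = 65  # mean test score
std = 10   # standard deviation of the scores
k = 2      # number of standard deviations

# Range of scores within k standard deviations of the mean
lower = mean - k * std
upper = mean + k * std
print(lower, upper)  # 45 85
```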

Plotting normal distributions
=============================

A certain restaurant chain has been collecting data about customer spending. The data shows that the spending is approximately normally distributed, with a mean of $3.15 and a standard deviation of $1.50 per customer.

Instructions
------------

-   Import `norm` from `scipy.stats`, `matplotlib.pyplot` as `plt`, and `seaborn` as `sns`.
-   Generate a normal distribution sample with mean `3.15` and standard deviation `1.5`.
-   Plot the sample generated.

In [None]:
# Import norm, matplotlib.pyplot, and seaborn
from scipy.stats import norm
import matplotlib.pyplot as plt 
import seaborn as sns

# Create the sample using norm.rvs()
sample = norm.rvs(loc=3.15, scale=1.5, size=10000, random_state=13)

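# Plot the sample (distplot is deprecated since seaborn 0.11;
# sns.histplot(sample, kde=True, stat="density") is a close replacement)
sns.distplot(sample)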
plt.show()

Within three standard deviations
================================

The height of every employee in a company has been measured, and the heights are normally distributed with a mean of 168 cm and a standard deviation of 12 cm.

-   What is the probability of getting a height within three standard deviations of the mean?

##### Answer the question

#### Possible Answers

Select one answer

-   68%
-   95%
-   [x] 99.7%
-   30%

1\. Normal probabilities
------------------------

00:00 - 00:09

You're familiar with the fundamentals of normal distributions; now we're going to calculate probabilities. Let's do it!

2\. Probability density
-----------------------

00:09 - 00:50

Before we start, we have to import the norm object from the scipy dot stats module. This has to be done every time we need to use norm; in the rest of the lesson we will assume it is already imported. To calculate the probability density at a given value we use the probability density function, pdf. We pass the value at which to evaluate the density, with the loc parameter for the mean and the scale parameter for the standard deviation. By default, loc is 0 and scale is 1 in all the functions available in the norm object.
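As a short sketch of these calls (the variable names are illustrative), here is `norm.pdf` with the default parameters, with the same parameters spelled out, and with the mean and standard deviation from the restaurant spending exercise:

```python
from scipy.stats import norm

# Default parameters: loc=0 (mean) and scale=1 (standard deviation)
density_default = norm.pdf(-1)

# The same call with the defaults written explicitly
density_explicit = norm.pdf(-1, loc=0, scale=1)

# Density at the mean of a distribution with mean 3.15 and std 1.5
density_spending = norm.pdf(3.15, loc=3.15, scale=1.5)

print(round(density_default, 3))   # 0.242
print(round(density_spending, 3))  # 0.266
```

Keep in mind that these are densities, not probabilities. To get a probability you need an area under the curve, which is what the cdf provides.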

3\. pdf() vs. cdf()
-------------------

00:50 - 01:37

Consider these two plots. What if we want to calculate the probability of getting a value below -1? The plot on the left is the probability density, with a green area. This area represents the probability of getting a value less than -1. On the right we have a plot of the cumulative distribution function (cdf), which gives us the probability of a value being in the green area. In this case, it's about 0.16. The cumulative distribution function is an S-shaped function that allows us to calculate the probability of getting a value less than a given x.

4\. pdf() vs. cdf() (Cont.)
---------------------------

01:37 - 01:55

Let's look at another example. On the left we can see the area we want to calculate, which is the probability of getting a value less than 1.5. On the right we can see that the result of the cdf is 0.93.

5\. pdf() vs. cdf() (Cont.)
---------------------------

01:55 - 02:05

And finally, for the area under the curve to the left of 5, we can see that the result of the cdf is almost 1.

6\. Cumulative distribution function examples
---------------------------------------------

02:05 - 02:28

We've seen that if you calculate norm dot cdf for -1 you get about 0.16. If you want to know how probable it is to get a value less than 0.5, you can do that with norm dot cdf too: in this case the probability is 0.69.
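The cdf values mentioned so far can be reproduced with a brief sketch, assuming the default standard normal distribution (`loc=0`, `scale=1`). The dictionary name is illustrative.

```python
from scipy.stats import norm

# Probability of getting a value less than each x
cdf_values = {x: norm.cdf(x) for x in (-1, 0.5, 1.5, 5)}

for x, prob in cdf_values.items():
    print(x, round(prob, 4))
# -1 0.1587
# 0.5 0.6915
# 1.5 0.9332
# 5 1.0
```

The last value is not exactly 1; it only rounds to 1.0 at four decimal places.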

7\. The percent point function (ppf)
------------------------------------

02:28 - 03:07

If instead you want to know the value below which a given probability is accumulated, you use the percent point function, norm dot ppf. Notice the direction of the arrows from probability to values in the plot. For example, if you want the value below which 0.2 of the probability lies, you use norm dot ppf of 0.2 and you get -0.8416. For 0.55 probability, you get 0.1257.

8\. ppf() is the inverse of cdf()
---------------------------------

03:07 - 03:22

As you've seen, we can take values and get probabilities with norm dot cdf and we can take probabilities to get values with norm dot ppf. One is the inverse of the other.
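A small sketch of that inverse relationship, again assuming the default standard normal distribution and illustrative variable names:

```python
from scipy.stats import norm

# Value below which 0.2 of the probability accumulates
value = norm.ppf(0.2)
print(round(value, 4))  # -0.8416

# Feeding that value back into cdf recovers the probability
recovered_prob = norm.cdf(value)

# The round trip also works in the other direction
recovered_value = norm.ppf(norm.cdf(1.5))

print(round(recovered_prob, 4))   # 0.2
print(round(recovered_value, 4))  # 1.5
```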

9\. Probability between two values
----------------------------------

03:22 - 03:36

If we want the probability of getting a value between -1 and 1, we take the value of cdf for 1 and subtract the value for -1, and we get 0.68.

10\. Tail probability
---------------------

03:36 - 03:59

If we instead want the probability of a random variable being greater than a given value, we can use norm dot sf with the desired value. sf stands for survival function, which is the complement of the cdf: sf of x equals 1 minus cdf of x. The probability of getting a value greater than 1 is about 0.16.
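Here is a rough sketch comparing `norm.sf` with the complement of the cdf. The far-tail comparison at 10 is an extra illustration that is not part of the lesson.

```python
from scipy.stats import norm

# Probability of a value greater than 1, computed two ways
tail_sf = norm.sf(1)
tail_complement = 1 - norm.cdf(1)
print(round(tail_sf, 4))  # 0.1587

# Far in the tail, sf keeps precision that the subtraction loses:
# cdf(10) is so close to 1 that 1 - cdf(10) becomes exactly 0.0
far_tail_sf = norm.sf(10)
far_tail_complement = 1 - norm.cdf(10)
print(far_tail_sf > 0, far_tail_complement)  # True 0.0
```

For everyday values both routes agree, but `sf` is the safer choice when studying very rare events.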

11\. Tails
----------

03:59 - 04:16

What if we want to calculate the probability of getting a value less than -2 or greater than 2? We just add the probabilities of each tail using cdf and sf.

12\. Tails (Cont.)
------------------

04:16 - 04:28

The result is about 0.045, which means there's only a 4.5% probability of a value being more than two standard deviations away from the mean. Tail probabilities are important to study extreme events.
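The two-tail calculation can be sketched as follows, assuming the default standard normal distribution; the variable names are illustrative.

```python
from scipy.stats import norm

# Left tail: probability of a value less than -2
left_tail = norm.cdf(-2)

# Right tail: probability of a value greater than 2
right_tail = norm.sf(2)

both_tails = left_tail + right_tail
print(round(both_tails, 4))  # 0.0455

# Equivalent view: everything that is NOT between -2 and 2
outside_interval = 1 - (norm.cdf(2) - norm.cdf(-2))
```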

13\. Intervals
--------------

04:28 - 04:50

Finally, if we want to know the interval where any given probability concentrates, we can use norm dot interval and specify the probability. For 0.95, we get -1.96 and 1.96.
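A minimal sketch of `norm.interval`. The probability is passed positionally here because the keyword name for it has changed between scipy versions, and the comparison with `norm.ppf` is an added illustration.

```python
from scipy.stats import norm

# Central interval that holds 0.95 of the probability
lower, upper = norm.interval(0.95)
print(round(lower, 2), round(upper, 2))  # -1.96 1.96

# The same endpoints from ppf: 0.025 is left out in each tail
ppf_lower = norm.ppf(0.025)
ppf_upper = norm.ppf(0.975)

# loc and scale work here too, for example with mean 3.15 and std 1.5
spend_lower, spend_upper = norm.interval(0.95, loc=3.15, scale=1.5)
```

The interval is centered on the mean, so the leftover 0.05 is split evenly between the two tails.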

14\. On to some practice!
-------------------------

04:50 - 04:53

Now let's calculate some normal probabilities.

Restaurant spending example
===========================

Let's go back to the example of the restaurant chain that has been collecting data about customer spending. Recall that the data shows that the spending is approximately normally distributed, with a **mean of 3.15** and a **standard deviation of 1.5** per customer, as pictured in the plot.


[Plot: probability density of a normal distribution with mean = 3.15 and stddev = 1.5. x axis: `x` (0 to 6); y axis: `Probability density of x` (0.00 to 0.25).]


We can use the already imported `norm` object from `scipy.stats` to answer several questions about customer spending at this restaurant chain.

Instructions 1/4
----------------

-   What is the probability that a customer will spend $3 or less?

In [None]:
# Probability of spending $3 or less
spending = norm.cdf(3, loc=3.15, scale=1.5)
print(spending)

Instructions 2/4
----------------

-   What is the probability that a customer will spend more than $5?

In [None]:
# Probability of spending more than $5
spending = norm.sf(5, loc=3.15, scale=1.5)
print(spending)

Instructions 3/4
----------------

-   What is the probability that a customer will spend more than $2.15 and $4.15 or less?

In [None]:
# Probability of spending more than $2.15 and $4.15 or less
spending_4 = norm.cdf(4.15, loc=3.15, scale=1.5)
spending_2 = norm.cdf(2.15, loc=3.15, scale=1.5)
print(spending_4 - spending_2)

Instructions 4/4
----------------

-   What is the probability that a customer will spend $2.15 or less or more than $4.15?

In [None]:
# Probability of spending $2.15 or less or more than $4.15
spending_2 = norm.cdf(2.15, loc=3.15, scale=1.5)
spending_over_4 = norm.sf(4.15, loc=3.15, scale=1.5) 
print(spending_2 + spending_over_4)