1\. Normal distributions
------------------------

00:00 - 00:10

It's time to study the most important and widely used type of distribution in probability and statistics: the normal distribution.

2\. Modeling for measures
-------------------------

00:10 - 00:28

Normal distributions, also called Gaussian distributions after mathematician Johann Carl Friedrich Gauss, allow you to model many situations. For instance, you can use them to model different measures like clothing sizes, speed measures or product weights.

3\. Adults' heights example
---------------------------

00:28 - 00:54

Let's look at an example. The heights of adults aged between 18 and 35 years are normally distributed. The mean height of adult males in this age group is 70 inches. Adult females have a mean height of 65 inches. You can see the probability density in the plot. Let's take a look at how this works.

4\. Probability density
-----------------------

00:54 - 01:13

The probability density is a function that assigns a relative likelihood to each possible outcome in the sample space. In normal distributions, the probability density has a bell shape. The density is concentrated near the mean and symmetric around it.

5\. Probability density examples
--------------------------------

01:13 - 01:49

For instance, in the plot on the left you can see that the probability density of getting -1 is roughly 0.24. On the other hand, the probability density of getting 0 is 0.4, as seen in the plot on the right. You can compare the probability densities for different values of the random variable to determine which is more likely. In this case, 0 is more likely than -1. But what would the probability of getting a value between -1 and 0 be?
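A minimal sketch of reading those two densities off the curve, assuming the plots show the standard normal distribution (mean 0, standard deviation 1), which is what the quoted values suggest:

```python
from scipy.stats import norm

# Assumption: standard normal, so loc=0 and scale=1 (the defaults)
density_at_minus_1 = norm.pdf(-1)
density_at_0 = norm.pdf(0)

print(round(density_at_minus_1, 2))  # 0.24
print(round(density_at_0, 2))        # 0.4
```

A higher density at 0 than at -1 is what makes values near 0 more likely than values near -1.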

6\. Probability density and probability
---------------------------------------

01:49 - 02:08

To calculate such a probability, we need to calculate the area under the probability density curve between -1 and 0. We do this by subtracting the cdf for -1 from cdf for 0. The probability is 0.34.
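The subtraction described here can be sketched as follows, again assuming the standard normal distribution for the plot:

```python
from scipy.stats import norm

# Area under the standard normal density between -1 and 0
# (assumption: loc=0, scale=1, matching the plot described above)
prob_between = norm.cdf(0) - norm.cdf(-1)

print(round(prob_between, 2))  # 0.34
```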

7\. Symmetry
------------

02:08 - 02:21

Normal distributions are symmetric around the mean. That means that the probability of getting a value below the mean is the same as the probability of getting a value above the mean: 0.5.

8\. Mean
--------

02:21 - 02:39

Because the bell-shaped probability density is symmetric and has a single peak, the mean is the value with the highest probability density. In this plot you can see that the mean of the probability density indicated by the green dotted line is 0.

9\. Mean (Cont.)
----------------

02:39 - 02:49

The probability density indicated by the blue dashed line has a mean of 1. You can see how the curve is moved to the right.

10\. Mean (Cont.)
-----------------

02:49 - 02:58

The probability density indicated by the red solid line has a mean of -2; it's moved to the left.
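One way to check that changing the mean only slides the curve is to locate the peak of each density numerically. This is a rough sketch: the means come from the three curves above, while the standard deviation of 1 and the grid are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-6, 6, 1201)  # evaluation grid with step 0.01

# Means taken from the three curves; scale=1 is an assumption
peaks = {}
for mean in (0, 1, -2):
    density = norm.pdf(x, loc=mean, scale=1)
    peaks[mean] = x[np.argmax(density)]  # x value with the highest density

print(peaks)
```

Each peak should sit at (or within one grid step of) the corresponding mean.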

11\. Standard deviation
-----------------------

02:58 - 03:20

The standard deviation is a measure of how spread out the probability density is. For different standard deviations the curve concentrates more or less probability density around the mean. In this plot you can see a red solid line representing a probability density with mean 0 and standard deviation 0.64.

12\. Standard deviation (Cont.)
-------------------------------

03:20 - 03:27

And here you can see a green dotted line with a standard deviation of 1.

13\. Standard deviation (Cont.)
-------------------------------

03:27 - 03:46

Finally, the blue dashed line shows a standard deviation of 2. The lower the value of the standard deviation, the more concentrated the probability density is around the mean. Let's take a look at some other important properties.
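To put numbers on "more concentrated", here is a small sketch that compares the height of the peak and the probability within one unit of the mean for the three standard deviations mentioned above. The choice of "within one unit" is just an illustrative yardstick, not something from the lesson.

```python
from scipy.stats import norm

sds = (0.64, 1, 2)  # standard deviations of the three curves, all with mean 0

# Density at the mean: taller peaks mean more concentration
peak_heights = [norm.pdf(0, loc=0, scale=sd) for sd in sds]

# Probability of landing between -1 and 1 (illustrative yardstick)
within_one_unit = [
    norm.cdf(1, loc=0, scale=sd) - norm.cdf(-1, loc=0, scale=sd) for sd in sds
]

for sd, height, prob in zip(sds, peak_heights, within_one_unit):
    print(sd, round(height, 3), round(prob, 3))
```

Both columns shrink as the standard deviation grows.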

14\. One standard deviation
---------------------------

03:46 - 04:05

From a statistical point of view, it's interesting to know how far the data is from the mean in terms of standard deviations. In any normal distribution, about 0.68 of the probability lies within one standard deviation of the mean.

15\. Two standard deviations
----------------------------

04:05 - 04:12

About 0.95 of the probability lies within two standard deviations of the mean.

16\. Three standard deviations
------------------------------

04:12 - 04:18

0.997 probability is within three standard deviations.
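These three figures can be checked with the cdf. The sketch below uses an illustrative mean and standard deviation (the values are assumptions; any others should give the same coverage):

```python
from scipy.stats import norm

# Illustrative parameters; the coverage does not depend on them
mean, sd = 65, 10

coverage = {}
for k in (1, 2, 3):
    upper = norm.cdf(mean + k * sd, loc=mean, scale=sd)
    lower = norm.cdf(mean - k * sd, loc=mean, scale=sd)
    coverage[k] = upper - lower  # probability within k standard deviations

for k, prob in coverage.items():
    print(k, round(prob, 3))
```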

17\. Normal sampling
--------------------

04:18 - 04:59

What if you want to generate a sample from a normal distribution? First, you import norm from scipy dot stats, matplotlib dot pyplot as plt, and seaborn as sns. Then you use norm dot rvs, with the loc parameter as the mean and the scale parameter as the standard deviation, and specify the size of the sample. Use random_state to reproduce the results. Finally, you use sns dot distplot to plot the sample.

18\. Normal sampling (Cont.)
----------------------------

04:59 - 05:08

We got this beautiful probability density plot. You can plot any normal probability density just knowing the mean and standard deviation.
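Before plotting, it can help to confirm that a generated sample really reflects the parameters you passed in. A minimal sketch, with mean 0 and standard deviation 1 chosen purely for illustration:

```python
from scipy.stats import norm

# Illustrative parameters (not from the lesson): mean 0, standard deviation 1
sample = norm.rvs(loc=0, scale=1, size=10000, random_state=13)

# The sample mean and standard deviation should land close to loc and scale
print(sample.mean(), sample.std())
```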

19\. Let's do some exercises with normal distributions
------------------------------------------------------

05:08 - 05:15

We already know the fundamentals about normal distributions. Now let's see what we can do with them.

Range of values
===============

Suppose the scores on a given academic test are normally distributed, with a mean of 65 and standard deviation of 10.

What would be the range of scores **two** standard deviations from the mean?

Instructions
------------

### Possible answers

10 and 65

55 and 75

[x] 45 and 85

35 and 95

Plotting normal distributions
=============================

A certain restaurant chain has been collecting data about customer spending. The data shows that the spending is approximately normally distributed, with a mean of $3.15 and a standard deviation of $1.50 per customer.

Instructions
------------

-   Import `norm` from `scipy.stats`, `matplotlib.pyplot` as `plt`, and `seaborn` as `sns`.
-   Generate a normal distribution sample with mean `3.15` and standard deviation `1.5`.
-   Plot the sample generated.

In [None]:
# Import norm, matplotlib.pyplot, and seaborn
from scipy.stats import norm
import matplotlib.pyplot as plt 
import seaborn as sns

# Create the sample using norm.rvs()
sample = norm.rvs(loc=3.15, scale=1.5, size=10000, random_state=13)

# Plot the sample
sns.distplot(sample)
plt.show()

Within three standard deviations
================================

The heights of every employee in a company have been measured, and they are distributed normally with a mean of 168 cm and a standard deviation of 12 cm.

-   What is the probability of getting a height within three standard deviations of the mean?

##### Answer the question

#### Possible Answers

Select one answer

-   68%
-   95%
-   [x] 99.7%
-   30%

1\. Normal probabilities
------------------------

00:00 - 00:09

You're familiar with the fundamentals of normal distributions; now we're going to calculate probabilities. Let's do it!

2\. Probability density
-----------------------

00:09 - 00:50

Before we start, we have to import the norm object from the scipy dot stats library. This has to be done every time we need to use norm. In the rest of the lesson we will assume it is already imported. To calculate the probability density of a given value we use the probability density function, pdf. We pass the value we want to calculate, with the loc parameter for the mean and the scale parameter for the standard deviation. By default, loc is 0 and scale is 1 on all the functions available in the norm object.
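As a quick illustration of the defaults, the sketch below calls pdf with and without loc and scale. The non-standard distribution (mean 3, standard deviation 2) is a made-up example.

```python
from scipy.stats import norm

# With no loc/scale, norm.pdf uses the standard normal (loc=0, scale=1)
default_density = norm.pdf(0)
explicit_density = norm.pdf(0, loc=0, scale=1)

# Hypothetical non-standard example: density at the mean of a normal
# distribution with mean 3 and standard deviation 2
shifted_density = norm.pdf(3, loc=3, scale=2)

print(default_density, explicit_density, shifted_density)
```

The first two values match, and the third is half as tall because doubling the standard deviation spreads the same probability over twice the width.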

3\. pdf() vs. cdf()
-------------------

00:50 - 01:37

Consider these two plots. What if we want to calculate the probability of getting a value below -1? The plot on the left is the probability density, with a green area. This area represents the probability of getting a value less than -1. On the right we have a plot of the cumulative distribution function (cdf), which gives us the probability of a value being in the green area. In this case, it's about 0.16. The cumulative distribution function is an S-shaped function that allows us to calculate the probability of getting a value less than a given x.

4\. pdf() vs. cdf() (Cont.)
---------------------------

01:37 - 01:55

Let's look at another example. We can see on the left the area we want to calculate, which is the probability of getting a value less than 1.5. On the right we can see that the result of the cdf is 0.93.

5\. pdf() vs. cdf() (Cont.)
---------------------------

01:55 - 02:05

And finally, for the area below the curve less than 5, we can see that the result of the cdf is almost 1.

6\. Cumulative distribution function examples
---------------------------------------------

02:05 - 02:28

We've seen that if you calculate norm dot cdf for -1 you get about 0.16. If you want to know how probable it is to get a value less than 0.5, you can do that with norm dot cdf too: in this case the probability is 0.69.
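The cdf values quoted over the last few slides can be reproduced with a short loop, assuming the standard normal distribution throughout:

```python
from scipy.stats import norm

# Probability of getting a value below each x on the standard normal
cdf_values = {x: norm.cdf(x) for x in (-1, 0.5, 1.5, 5)}

for x, prob in cdf_values.items():
    print(x, round(prob, 4))
```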

7\. The percent point function (ppf)
------------------------------------

02:28 - 03:07

If instead you want to know the value below which a given probability is accumulated, you use the percent point function, norm dot ppf. Notice the direction of the arrows from probability to values in the plot. For example, if you want the value below which 0.2 of the probability lies, you use norm dot ppf of 0.2 and you get -0.8416. For 0.55 probability, you get 0.1257.

8\. ppf() is the inverse of cdf()
---------------------------------

03:07 - 03:22

As you've seen, we can take values and get probabilities with norm dot cdf and we can take probabilities to get values with norm dot ppf. One is the inverse of the other.
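A small round trip makes the inverse relationship concrete. This sketch uses the standard normal and the 0.2 probability from the earlier example:

```python
from scipy.stats import norm

value = norm.ppf(0.2)          # value below which 0.2 of the probability lies
round_trip = norm.cdf(value)   # back from the value to the probability

print(round(value, 4))       # -0.8416
print(round(round_trip, 4))  # 0.2
```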

9\. Probability between two values
----------------------------------

03:22 - 03:36

If we want the probability of getting a value between -1 and 1, we take the value of cdf for 1 and subtract the value for -1, and we get 0.68.

10\. Tail probability
---------------------

03:36 - 03:59

If we instead want the probability of a random variable being greater than a given value, we can use norm dot sf with the desired value. sf stands for survival function, which is the complement of the cdf. The probability of getting a value greater than 1 is about 0.16.
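To see that sf really is the complement of the cdf, you can compute the upper tail both ways (standard normal assumed):

```python
from scipy.stats import norm

upper_tail = norm.sf(1)       # P(X > 1) on the standard normal
complement = 1 - norm.cdf(1)  # the same quantity via the cdf

print(round(upper_tail, 4))  # 0.1587
```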

11\. Tails
----------

03:59 - 04:16

What if we want to calculate the probability of getting a value less than -2 or greater than 2? We just add the probabilities of each tail using cdf and sf.

12\. Tails (Cont.)
------------------

04:16 - 04:28

The result is 0.045, which means there's only about a 4.5% probability of a value being more than two standard deviations away from the mean. Tail probabilities are important for studying extreme events.
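The two-tail calculation can be sketched like this, assuming the standard normal so that 2 and -2 are exactly two standard deviations from the mean:

```python
from scipy.stats import norm

lower_tail = norm.cdf(-2)  # P(X < -2)
upper_tail = norm.sf(2)    # P(X > 2)
both_tails = lower_tail + upper_tail

print(round(both_tails, 4))  # 0.0455
```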

13\. Intervals
--------------

04:28 - 04:50

Finally, if we want to know the interval where any given probability concentrates, we can use norm dot interval and specify the probability. For 0.95, we get -1.96 and 1.96.
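A minimal sketch of that call on the standard normal. The probability is passed positionally here, since the name of that first argument has differed between SciPy versions:

```python
from scipy.stats import norm

# Central interval holding 0.95 of the probability (standard normal)
lower, upper = norm.interval(0.95)

print(round(lower, 2), round(upper, 2))  # -1.96 1.96
```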

14\. On to some practice!
-------------------------

04:50 - 04:53

Now let's calculate some normal probabilities.

Restaurant spending example
===========================

Let's go back to the example of the restaurant chain that has been collecting data about customer spending. Recall that the data shows that the spending is approximately normally distributed, with a **mean of 3.15** and a **standard deviation of 1.5** per customer, as pictured in the plot.

[Plot: probability density of customer spending, a bell curve centered at 3.15 with a standard deviation of 1.5.]

We can use the already imported `norm` object from `scipy.stats` to answer several questions about customer spending at this restaurant chain.

Instructions 1/4
----------------

What is the probability that a customer will spend $3 or less?

In [None]:
# Probability of spending $3 or less
spending = norm.cdf(3, loc=3.15, scale=1.5)
print(spending)

Instructions 2/4
----------------

-   What is the probability that a customer will spend more than $5?

In [None]:
# Probability of spending more than $5
spending = norm.sf(5, loc=3.15, scale=1.5)
print(spending)

Instructions 3/4
----------------

-   What is the probability that a customer will spend more than $2.15 and $4.15 or less?

In [None]:
# Probability of spending more than $2.15 and $4.15 or less
spending_4 = norm.cdf(4.15, loc=3.15, scale=1.5)
spending_2 = norm.cdf(2.15, loc=3.15, scale=1.5)
print(spending_4 - spending_2)

Instructions 4/4
----------------

-   What is the probability that a customer will spend $2.15 or less or more than $4.15?

In [None]:
# Probability of spending $2.15 or less or more than $4.15
spending_2 = norm.cdf(2.15, loc=3.15, scale=1.5)
spending_over_4 = norm.sf(4.15, loc=3.15, scale=1.5) 
print(spending_2 + spending_over_4)

Smartphone battery example
==========================

One of the most important things to consider when buying a smartphone is how long the battery will last.

Suppose the period of time between charges can be modeled with a normal distribution with a **mean of 5 hours** and a **standard deviation of 1.5 hours**.

A friend wants to buy a smartphone and is asking you the following questions.

`norm` is already imported from `scipy.stats`.

Instructions 1/3
----------------

What is the probability that the battery will last less than 3 hours?

In [None]:
# Probability that battery will last less than 3 hours
less_than_3h = norm.cdf(3, loc=5, scale=1.5)
print(less_than_3h)

Instructions 2/3
----------------

-   What is the probability that the battery will last more than 3 hours?

In [None]:
# Probability that battery will last more than 3 hours
more_than_3h = norm.sf(3, loc=5, scale=1.5)
print(more_than_3h)

Instructions 3/3
----------------

-   What is the probability that the battery will last between 5 and 7 hours?

In [None]:
# Probability that battery will last between 5 and 7 hours
P_less_than_7h = norm.cdf(7, loc=5, scale=1.5)
P_less_than_5h = norm.cdf(5, loc=5, scale=1.5)
print(P_less_than_7h - P_less_than_5h)

Adults' heights example
=======================

The heights of adults aged between 18 and 35 years are normally distributed. For males, the **mean height is 70 inches with a standard deviation of 4**. Adult females have a **mean height of 65 inches with a standard deviation of 3.5**. You can see how the heights are distributed in this plot:


```
Probability density of x
        |
   0.12 |                   -----
        |                 /       \         ---------
   0.10 |               /           \      /         \
        |             /              \    /           \
   0.08 |           /                 \  /             \
        |         /                   \/               \
   0.06 |        /                                     /
        |       /                                     /
   0.04 |      /                                     /
        |     /                                     /
   0.02 |    /                                     /
        |___/_____________________________________/
        | 55    60    65    70    75    80    85    x

Legend:
    Female height (red dashed line)
    Male height (blue dotted line)
```

Using the previous information, complete the following exercises.

For your convenience, `norm` has been imported from the library `scipy.stats`.

Instructions 1/4
----------------

Print the range of female heights one standard deviation from the mean.

In [None]:
# Values one standard deviation from mean height for females
interval = norm.interval(0.68, loc=65, scale=3.5)
print(interval)

Instructions 2/4
----------------

-   Print the value where the tallest males fall with 0.01 probability.

In [None]:
# Value where the tallest males fall with 0.01 probability
tallest = norm.ppf(0.99, loc=70, scale=4)
print(tallest)

Instructions 3/4
----------------

-   Print the probability of being taller than 73 inches for a male and for a female.

In [None]:
# Probability of being taller than 73 inches for males and females
P_taller_male = norm.sf(73, loc=70, scale=4)
P_taller_female = norm.sf(73, loc=65, scale=3.5)
print(P_taller_male, P_taller_female)

Instructions 4/4
----------------

-   Print the probability of being shorter than 61 inches for a male and for a female.

In [None]:
# Probability of being shorter than 61 inches for males and females
P_shorter_male = norm.cdf(61, loc=70, scale=4)
P_shorter_female = norm.cdf(61, loc=65, scale=3.5)
print(P_shorter_male, P_shorter_female)

1\. Poisson distributions
-------------------------

00:00 - 00:11

The Poisson distribution is a very useful type of probability distribution that can model the frequency with which an event occurs during a fixed interval of time.

2\. Poisson modeling
--------------------

00:11 - 00:34

Suppose the mean number of call center calls per minute is 2.2. What is the probability of having 3 calls in any minute? Calling a call center is an example of a Poisson process. Other examples include visiting a bank branch, finishing a course on DataCamp, and so on.

3\. Poisson distribution properties
-----------------------------------

00:34 - 01:15

Before doing any calculations, let's study the most important properties. In Poisson distributions, your outcomes can be classified as successes or failures, and the average number of successful events per unit is known. In the call center example, a call is a success and the average number of successful events per unit is the number of calls per minute. In this lesson, we will not go into detail about the mathematical formulas for the Poisson distribution. If you're interested, you can find more information online.

4\. Probability mass function (pmf)
-----------------------------------

01:15 - 01:22

Let's do a few probability calculations with the Poisson distribution. First, we'll calculate the pmf.

5\. Probability mass function (pmf) (Cont.)
-------------------------------------------

01:22 - 02:00

Suppose we know that the average number of calls per minute to the call center is 2.2, and we want to know the probability of having 3 phone calls in a minute. To find this we use the probability mass function and specify mu, the mean of the distribution. First we import the poisson object from scipy dot stats, then we call poisson dot pmf with k equals 3 and mu equals 2.2. We will use the same mu throughout the lesson. The result is 0.196.

6\. pmf examples
----------------

02:00 - 02:20

If we want the probability of having no calls in a minute, we call poisson dot pmf with k equals 0, and we get 0.11. If we instead want the probability of having 6 calls in a minute, we get 0.017.
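The three pmf values mentioned so far can be computed in one pass. This sketch uses the mean of 2.2 calls per minute from the example:

```python
from scipy.stats import poisson

mu = 2.2  # mean number of calls per minute, from the example

# Probability of exactly k calls in a minute
pmf_values = {k: poisson.pmf(k, mu) for k in (0, 3, 6)}

for k, prob in pmf_values.items():
    print(k, round(prob, 3))
```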

7\. Different means
-------------------

02:20 - 02:38

Take a look at these plots. You'll notice that for different means, the shape of the distribution varies. When the mean is small, the probability of having 0 events is higher. As the mean gets higher, the curve moves to the right. Let's study the cdf now.

8\. Cumulative distribution function (cdf)
------------------------------------------

02:38 - 03:06

If we want to know the probability of having 2 or fewer phone calls in a minute, we use cdf. In the plot on the left, we call poisson dot cdf and specify k equals 2 to get 0.62. On the right, to find the probability of having 5 or fewer calls in a minute, we specify k equals 5 to get 0.97.
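Since a Poisson variable only takes whole-number values, the cdf is just a running sum of the pmf. A short sketch of that idea with the same mean of 2.2:

```python
from scipy.stats import poisson

mu = 2.2  # mean number of calls per minute, from the example

two_or_fewer = poisson.cdf(2, mu)
# Same quantity built by adding the pmf for 0, 1 and 2 calls
summed = sum(poisson.pmf(k, mu) for k in range(3))

five_or_fewer = poisson.cdf(5, mu)

print(round(two_or_fewer, 2), round(summed, 2), round(five_or_fewer, 3))
```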

9\. Survival function and percent point function (ppf)
------------------------------------------------------

03:06 - 03:39

To calculate the probability of having more than 2 calls in a minute, we use the survival function, sf. With k equals 2, we get 0.38. If we instead want the value where we accumulate a given probability, we use the percent point function, ppf. For 0.5 probability we get a value of 2. In the plot, notice the arrow that goes from the probability to the associated value.
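Both calls can be sketched as follows, again with a mean of 2.2 calls per minute:

```python
from scipy.stats import poisson

mu = 2.2  # mean number of calls per minute, from the example

more_than_2 = poisson.sf(2, mu)      # P(X > 2)
median_calls = poisson.ppf(0.5, mu)  # smallest k whose cdf reaches 0.5

print(round(more_than_2, 2))  # 0.38
print(median_calls)           # 2.0
```

Note that ppf returns a float even though the number of calls is a whole number.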

10\. Sample generation (rvs)
----------------------------

03:39 - 04:23

Finally, suppose we want to generate 10,000 samples of a Poisson random variable with mean 2.2. We use the rvs function for this. We first import poisson from scipy dot stats, matplotlib dot pyplot as plt, and seaborn as sns. Then we call rvs and specify mu, the size of the sample, and random_state equals 13. We generate the plot by calling sns dot distplot with sample as a parameter and kde equals False. The result is...

11\. Sample generation (Cont.)
------------------------------

04:23 - 04:34

This beautiful plot with the frequency of each possible outcome in each bar. Notice that the sum of all the frequencies is 10,000.
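If you want the bar heights as numbers rather than a picture, one option is to count the outcomes directly. The use of `np.bincount` here is an illustrative choice, not something from the lesson.

```python
import numpy as np
from scipy.stats import poisson

sample = poisson.rvs(mu=2.2, size=10000, random_state=13)

# Frequency of each outcome 0, 1, 2, ... (the bar heights in the plot)
frequencies = np.bincount(sample)

print(frequencies)
print(frequencies.sum())  # 10000
```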

12\. Let's practice with Poisson
--------------------------------

04:34 - 04:41

You're doing great -- now let's practice with Poisson.

ATM example
===========

If you know how many specific events occurred per unit of measure, you can assume that the distribution of the random variable follows a Poisson distribution to study the phenomenon.

Consider an ATM (automatic teller machine) at a very busy shopping mall. The bank wants to avoid making customers wait in line to use the ATM. It has been observed that the average number of customers making withdrawals between 10:00 a.m. and 10:05 a.m. on any given day is 1.

As a data analyst at the bank, you are asked what the probability is that the bank will need to install another ATM to handle the load.

To answer the question, you need to calculate the probability of getting more than one customer during that time period.

Instructions
------------

-   Import `poisson` from `scipy.stats`.
-   Calculate the probability of having more than one customer visiting the ATM in this 5-minute period.

In [None]:
# Import poisson from scipy.stats
from scipy.stats import poisson

# Probability of more than 1 customer
probability = poisson.sf(k=1, mu=1)

# Print the result
print(probability)

Highway accidents example
=========================

On a certain turn on a very busy highway, there are 2 accidents per day on average. Let's assume the number of accidents per day can be modeled as a Poisson random variable and is distributed as in the following plot:

poisson.pmf(k, mu=2)

|  k  | pmf(k) | Bar Representation     |
|:---:|:------:|:------------------------|
|  0  | 0.135  | ████                    |
|  1  | 0.271  | ██████████              |
|  2  | 0.271  | ██████████              |
|  3  | 0.180  | ███████                 |
|  4  | 0.090  | ███                     |
|  5  | 0.036  | █                       |
|  6  | 0.012  |                         |

*Bar lengths are scaled for visual representation.*


For your convenience, the `poisson` object has already been imported from the `scipy.stats` library.

Aiming to improve road safety, the transportation agency of the regional government has assigned you the following tasks.

Instructions 1/4
----------------

Determine and print the probability of there being 5 accidents on any day.

In [None]:
# Import the poisson object
from scipy.stats import poisson

# Probability of 5 accidents any day
P_five_accidents = poisson.pmf(k=5, mu=2)

# Print the result
print(P_five_accidents)

Instructions 2/4
----------------

-   Determine and print the probability of having 4 or 5 accidents on any day.

In [None]:
# Import the poisson object
from scipy.stats import poisson

# Probability of having 4 or 5 accidents on any day
P_less_than_6 = poisson.cdf(k=5, mu=2)
P_less_than_4 = poisson.cdf(k=3, mu=2)

# Print the result
print(P_less_than_6 - P_less_than_4)

Instructions 3/4
----------------

-   Determine and print the probability of having more than 3 accidents on any day.

In [None]:
# Import the poisson object
from scipy.stats import poisson

# Probability of more than 3 accidents any day
P_more_than_3 = poisson.sf(k=3, mu=2)

# Print the result
print(P_more_than_3)

Instructions 4/4
----------------

-   Determine and print the number of accidents that is likely to happen with 0.75 probability.

In [None]:
# Import the poisson object
from scipy.stats import poisson

# Number of accidents with 0.75 probability
accidents = poisson.ppf(q=0.75, mu=2)

# Print the result
print(accidents)

Generating and plotting Poisson distributions
=============================================

In the previous exercise, you calculated some probabilities. Now let's plot that distribution.

Recall that on a certain highway turn, there are 2 accidents per day on average. Assuming the number of accidents per day can be modeled as a Poisson random variable, let's plot the distribution.

Instructions
------------

-   Import `poisson` from `scipy.stats`, `matplotlib.pyplot` as `plt`, and `seaborn` as `sns`.
-   Generate a Poisson distribution sample with `size=10000` and `mu=2`.
-   Plot the sample generated.

In [None]:
# Import poisson, matplotlib.pyplot, and seaborn
from scipy.stats import poisson
import matplotlib.pyplot as plt 
import seaborn as sns

# Create the sample
sample = poisson.rvs(mu=2, size=10000, random_state=13)

# Plot the sample
sns.distplot(sample, kde=False)
plt.show()