# The Normal Distribution 

### Introduction

In a previous lesson we saw that, because of the central limit theorem, the distribution of sample means tends toward a normal distribution (informally a “bell curve”) as the sample size grows, irrespective of the shape of the population distribution.  We also used a distribution to serve as a model of an underlying population.

Now it's time to learn a little bit more about the normal distribution, and how we can put it to use.

### The normal distribution

<img src="./pdf-normal.png" width="40%">

Above is the normal distribution.  More specifically, it's the probability density function of the normal distribution.  At the center we see $\mu$, the mean of a normally distributed population.  

Take note of the percentages in different regions of the plot above.  Each percentage indicates the area under the curve in that region.  So in a normal distribution:
* About 68% of the values are within one standard deviation, plus or minus, of the mean
* About 95% of the values are within two standard deviations of the mean
* About 99.7% of the values are within three standard deviations of the mean
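
These three percentages can be checked numerically. The sketch below assumes `scipy` is installed and uses the standard normal distribution (mean 0, standard deviation 1); the helper name `fraction_within` is only illustrative.

```python
from scipy.stats import norm

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = norm(0, 1)


def fraction_within(k):
    """Area under the curve between -k and +k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    # Prints roughly 0.6827, 0.9545 and 0.9973.
    print(k, round(fraction_within(k), 4))
```

Because the z-score rescales any normal distribution to this standard one, the same fractions hold whatever the mean and standard deviation are.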

This becomes useful because, if we want to see how rare a value is in our distribution, we can count how many standard deviations it is from the mean.

For example:
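* What's the probability that a value is 3 or more standard deviations greater than the mean?

> Well, it's roughly $0.1$%.

* What's the probability that a value is 3 or more standard deviations away from the mean in either direction?
> Well, it's roughly $0.3$%, since about 99.7% of values fall within three standard deviations.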

* Now try to answer the following question: what's the probability of a value being larger than the mean by one standard deviation or more?
> Give it a shot.
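
The tail areas in the first two questions can also be computed rather than read off the chart. This is a minimal sketch, assuming `scipy` is available; `norm.sf` is the survival function, which returns the area to the right of a given z-score on the standard normal curve. The chart's percentages are rounded, so the computed values differ slightly from them.

```python
from scipy.stats import norm

# Area to the right of z = 3 on the standard normal curve.
upper_tail_3 = norm.sf(3)

# The curve is symmetric, so both tails together are twice one tail.
both_tails_3 = 2 * norm.sf(3)

print(f"{upper_tail_3:.3%}")   # about 0.135%
print(f"{both_tails_3:.3%}")   # about 0.270%
```

Calling `norm.sf` with a different z-score is one way to check your answer to the last question.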

### Calculating the Z-Score

<img src="./pdf-normal.png" width="40%">

So far we've seen that calculating how many standard deviations a value is from the mean can be very useful.  This number is called the z-score. 

> A Z-score is a numerical measurement used in statistics of a value's relationship to the mean (average) of a group of values, measured in terms of standard deviations from the mean. If a Z-score is 0, it indicates that the data point's score is identical to the mean score.

Let's learn about it by way of example.

The weights of adult males are normally distributed with a mean of 172 pounds and a standard deviation of 29 pounds.  Sam has a weight of 143 pounds.  If we randomly select a male adult, what's the probability his weight will be 143 pounds or less?

Remember, the idea is to find the number of standard deviations the value is from the mean.  To do this, we first find the difference from the mean, and then divide by the standard deviation.

```python
# Difference from the mean, divided by the standard deviation
(143 - 172) / 29
# Output: -1.0
```

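So 143 is one standard deviation below the mean.  That $-1$ is called the z-score: the sign tells us the direction (negative means below the mean) and the magnitude tells us the distance in standard deviations.  The general formula is 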

$z = \frac{(x - \mu)}{\sigma}$
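
As a minimal sketch, the formula can be wrapped in a small helper. The function name `z_score` and its parameter names are illustrative choices, not part of any library.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Adult male weights: mean 172 pounds, standard deviation 29 pounds.
print(z_score(143, 172, 29))  # -1.0, one standard deviation below the mean
print(z_score(201, 172, 29))  # 1.0, one standard deviation above the mean
print(z_score(172, 172, 29))  # 0.0, exactly at the mean
```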

Once we calculate the z-score, we can use our graph to find the probability of selecting a value at or below that z-score.  


<img src="./pdf-normal.png" width="40%">

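For example, in our case the probability of selecting a male adult who weighs 143 pounds or less is about 16%.  Half of the values lie below the mean, and about 34% lie between the mean and one standard deviation below it, which leaves 50% - 34% = 16%.

> Because this probability is exactly what the cumulative distribution function (CDF) returns, we can also use `scipy.stats` to answer this for us.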

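```python
from scipy.stats import norm

# Normal distribution with mean 172 and standard deviation 29
norm_weight_male = norm(172, 29)

# Probability of selecting a weight of 143 pounds or less
norm_weight_male.cdf(143)
# Output: 0.15865525393145707
```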
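
The z-score route and the direct route give the same answer. The sketch below, which again assumes `scipy`, standardizes the weight first and looks it up on the standard normal distribution, then compares the result with the CDF of the weight distribution itself. The variable names are illustrative.

```python
from scipy.stats import norm

mu, sigma = 172, 29   # mean and standard deviation of adult male weights
x = 143               # the weight we are asking about

# Route 1: convert to a z-score, then use the standard normal CDF.
z = (x - mu) / sigma
p_from_z = norm.cdf(z)

# Route 2: ask the weight distribution directly.
p_direct = norm(mu, sigma).cdf(x)

print(p_from_z, p_direct)  # both are about 0.1587
```

This is why the single chart of standard deviations works for any normal distribution: once a value is expressed as a z-score, the original units no longer matter.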

### Summary

In this lesson we learned about the normal distribution.  We saw that we can find the probability that a value falls at or below a certain threshold by using the z-score.  The z-score tells us how many standard deviations a value is from the mean of our normal distribution, and we calculate it with the formula: 

$z = \frac{(x - \mu)}{\sigma}$

Once we make this calculation, we can use our chart to calculate the corresponding probabilities.

<img src="./pdf-normal.png" width="40%">