Discrete probability distributions can’t handle every situation.

So far we’ve looked at probability distributions where we’ve been able to specify exact
values, but this isn’t the case for every set of data. Some types of data just don’t fit the
probability distributions we’ve encountered so far. In this chapter, we’ll take a look at
how continuous probability distributions work, and introduce you to one of the most
important probability distributions in town—the normal distribution.

## Discrete data takes exact values…

So far we’ve looked at probability distributions where the data is
**discrete**. By this we mean the data is composed of distinct numeric
values, and we’ve been able to calculate the probability of each of
these values. As an example, when we looked at the probability
distribution for the winnings on a slot machine, the possible amounts
we could win on each game were very precise. We knew exactly what
amounts of money we could win, and we knew we’d win one of them.

If data is discrete, it’s numeric and can take only exact values. It’s
often data that can be *counted* in some way, such as the number of
gumballs in a gumball machine, the number of questions answered
correctly in a game show, or the number of breakdowns in a particular
period.

![001](001.png)

![001](002.png)

## …but not all numeric data is discrete

It’s not always possible to say what all the values should be in a set of
data. Sometimes data covers a range, where any value within that range
is possible. As an example, suppose you were asked to accurately measure
pieces of string that are between 10 inches and 11 inches long. You could
have measurements of 10 inches, 10.1 inches, 10.01 inches, and so on, as the
length could be anything within that range.

Numeric data like this is called **continuous**. It’s frequently data that is
*measured* in some way rather than counted, and a lot depends on the degree
of precision you need to measure to.

![3](003.png)

![3](004.png)

But why should I care
about continuous data?

The type of data you have affects how you find probabilities.

So far we’ve only looked at probability distributions that deal with discrete data.
Using these probability distributions, we’ve been able to find the probabilities of
exact discrete values.

The problem is that a lot of real-world problems involve continuous data, and
discrete probability distributions just don’t work with this sort of data. To find
probabilities for continuous data, you need to know about continuous data and
continuous probability distributions.

Meanwhile, someone has a problem...

## What’s the delay?

Julie is a student, and her best friend keeps trying to get her fixed up on
blind dates in the hope that she’ll find that special someone. The only
trouble is that not many of her dates are punctual—or indeed turn up.

Julie hates waiting alone for her date to arrive, so she’s made herself a rule:
if her date hasn’t turned up after 20 minutes, then she leaves.

I have another date tonight.
I definitely won’t wait for more than 20
minutes, but I hate standing around. What’s
the probability I’ll be left waiting for more
than 5 minutes? Can you help?

Here’s a sketch of the frequency showing the amount of time Julie
spends waiting for her date to arrive:

![5](005.png)

We need to find probabilities for the amount of time Julie spends waiting for
her date. Is the amount of time discrete or continuous? Why? How do you
think we can go about finding probabilities?

## We need a probability distribution for continuous data

We need to find the probability that Julie will have to wait for more than 5
minutes for her date to turn up. The trouble is, the amount of time Julie has to
wait is continuous data, which means the probability distributions we’ve learned
thus far don’t apply.

When we were dealing with discrete data, we were able to produce a specific
probability distribution. We could do this by either showing the probability of
each value in a table, or by specifying whether it followed a defined probability
distribution, such as the binomial or Poisson distribution. By doing this, we
were able to specify the probability of each possible value. As an example, when
we found the probability distribution for the winnings per game for one of Fat
Dan’s slot machines, we knew all of the possible values for the winnings and
could calculate the probability of each one.

With discrete data, we could give
the probability of each value.

| x | -1 | 4 | 9 | 14 | 19 |
|------|------|------|------|------|------|
| P(X = x) | 0.977 | 0.008 | 0.008 | 0.006 | 0.001 |
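
As a quick check on the table above, here is a minimal Python sketch. The probabilities are copied from the table; the name `slot_machine_pmf` is just an illustrative choice, not something from the earlier chapters.

```python
import math

# Probability of each possible winning amount, copied from the table above.
slot_machine_pmf = {-1: 0.977, 4: 0.008, 9: 0.008, 14: 0.006, 19: 0.001}

# With discrete data we can read off the probability of one exact value...
p_win_4 = slot_machine_pmf[4]

# ...and add up exact values to get the probability of a set of values.
p_win_something = sum(p for x, p in slot_machine_pmf.items() if x > 0)

# The probabilities of all the possible values must add up to 1.
total = sum(slot_machine_pmf.values())

print(p_win_4)
print(round(p_win_something, 3))
print(math.isclose(total, 1.0))
```

This only works because we can list every possible value, which is exactly what continuous data won’t let us do.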

For continuous data, it’s a different matter. We can no longer give the probability
of each value because it’s impossible to say what each of these precise values is.
As an example, Julie’s date might turn up after 4 minutes, 4 minutes 10 seconds,
or 4 minutes 10.5 seconds. Counting the number of possible options would be
impossible. Instead, we need to focus on a particular level of accuracy and the
probability of getting a **range** of values.

For discrete probability
distributions, we look at the probability of
getting a **particular value**; for continuous
probability distributions, we look at the
probability of getting a **particular range**.

## Probability density functions can be used for continuous data

We can describe the probability distribution of a continuous random
variable using a **probability density function**.

A probability density function f(x) is a function that you can use to find the
probabilities of a continuous variable across a range of values. It tells us
what the shape of the probability distribution is.

Here’s a sketch of the probability density function for the amount of time
Julie spends waiting for her date to turn up:

![006](006.png)

This line is the probability density function for
the amount of time Julie waits for her dates.
The probability density is constant for the first 20
minutes, and then it drops to 0 because she leaves.

![007](007.png)

These are the
same basic shape.

Can you see how it matches the shape of the frequency? This isn’t
just a coincidence.

Probability is all about how likely things are to happen, and the
frequency tells you how often values occur. The higher the relative
frequency, the higher the probability of that value occurring. As
the frequency for the amount of time Julie has to wait is constant
across the 20 minute period, this means that the probability density
function is constant too.
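
If you’d like to see the link between frequency and probability density for yourself, here is a rough simulation sketch. It assumes every waiting time between 0 and 20 minutes is equally likely, as in the sketch of the frequency; the sample size, the number of bars, and the random seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable

# Simulate many waiting times, assuming every time between 0 and 20
# minutes is equally likely (the flat shape in the sketch).
waits = rng.uniform(low=0.0, high=20.0, size=100_000)

# density=True rescales the bar heights so that the total bar area is 1,
# which turns the frequency histogram into an estimate of the density.
heights, edges = np.histogram(waits, bins=4, range=(0.0, 20.0), density=True)

print(np.round(heights, 3))               # the bars are all roughly the same height
print(np.sum(heights * np.diff(edges)))   # total area of the bars
```

The bars come out at roughly the same height because the frequency is flat, and rescaling them to a total area of 1 gives an estimate of the probability density.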

## Probability = area

For continuous random variables, probabilities are given by area. To find the
probability of getting a particular range of values, we start off by sketching the
probability density function. The probability of getting a particular range of values
is given by the area under the line between those values.

As an example, we want to find the probability that Julie has to wait for between 5
and 20 minutes for her date to turn up. We can find this probability by sketching
the probability density function, and then working out the area under it where x is
between 5 and 20.

![8](008.png)

The total area under the line must be equal to 1, as the total area represents the
total probability. This is because for any probability distribution, the total probability
must be equal to 1, and, therefore, the area must be too.

![9](009.png)

Let’s use this to help us find the probability that Julie will need to wait for over 5
minutes for her date to arrive.

The total area under the line must be 1. What’s the value of f(x)?

Hint: It’s a
constant value.

## To calculate probability, start by finding f(x)…

Before we can find probabilities for Julie, we need to find f(x), the probability
density function.

So far, we know that f(x) is a constant value, and we know that the total area under
it must be equal to 1. If you look at the sketch of f(x), the area under it forms
a rectangle where the width of the base is 20. If we can find the height of the
rectangle, we’ll have the value of f(x).

![10](010.png)

We find the area of a rectangle by multiplying its width and height together.
This means that
1 = 20 × height

height = 1/20

= 0.05

This means that f(x) must be equal to 0.05, as that ensures the total area under it
will be 1. In other words,

f(x) = 0.05 where x is between 0 and 20
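
The same reasoning can be written as a short Python sketch. The names `lower`, `upper`, and `height` are illustrative; the only facts used are that Julie waits between 0 and 20 minutes and that the total area must be 1.

```python
# Julie waits at most 20 minutes, so the density is flat between 0 and 20.
lower, upper = 0.0, 20.0

# Total area = width × height = 1, so height = 1 / width.
height = 1.0 / (upper - lower)


def f(x):
    """Probability density of Julie's waiting time x (in minutes)."""
    return height if lower <= x <= upper else 0.0


print(height)                    # 0.05
print((upper - lower) * height)  # total area under f(x): 1.0
```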

Here’s a sketch:

![11](011.png)

Now that we’ve found the probability density function, we can find P(X > 5).

## …then find probability by finding the area

The area under the probability density line between 5 and 20 is a rectangle.
This means that calculating the area of this rectangle will give us the probability
P(X > 5).

P(X > 5) = (20 - 5) × 0.05 (Area of rectangle = base × height)

= 0.75 

So the probability that Julie will have to wait for more than 5 minutes is 0.75.
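
You can check this with SciPy. The sketch below assumes Julie’s waiting time is modeled by `scipy.stats.uniform`, which is a flat density over the interval from `loc` to `loc + scale`; the variable names are only illustrative.

```python
from scipy import stats

# scipy's uniform distribution covers the interval [loc, loc + scale].
wait = stats.uniform(loc=0, scale=20)

# Area of the rectangle between 5 and 20: base × height.
p_by_hand = (20 - 5) * 0.05

# sf is the "survival function", 1 - cdf, which is P(X > x).
p_scipy = wait.sf(5)

print(round(p_by_hand, 2), round(p_scipy, 2))
```

Both routes give the same area, so either one can be used to find P(X > 5).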

![012](012.png)

Do I have to use area to find
probability? Can’t I just pick all the
exact values in that range and add their
probabilities together? That’s what we
did for discrete probabilities.

That doesn’t work for continuous probabilities.

For continuous probabilities, we have to find the probability by calculating the
area under the probability density line.

We can’t add together the probability of getting each value within the range
as there are an infinite number of values. It would take forever.

The only way we can find the probability for continuous probability
distributions is to work out the area underneath the curve formed by the
probability density function.

When dealing with
continuous data, you
calculate probabilities
for a range of values.

**Q: So there’s a function called the
probability density function. What’s
probability density?**

A: Probability density tells you how
high probabilities are across ranges, and
it’s described by the probability density
function. It’s very similar to frequency density.

Probability density uses area to tell you
about probabilities, and frequency density
uses area to tell you about frequencies.

**Q: So aren’t probability density and
probability the same thing?**

A: Probability density gives you a
means of finding probability, but it’s not the
probability itself. The probability density
function is the line on the graph, and the
probability is given by the area underneath it
for a specific range of values.

**Q: I see, so if you have a chart
showing a probability density function,
you find the probability by looking at area,
instead of reading it directly off the chart.**

A: Exactly. For continuous data, you
need to find probability by calculating area.
Reading probabilities directly off a chart only
works for discrete probabilities.

**Q: Doesn’t finding the probability
get complicated if you have to calculate
areas? I mean, what if the probability
density function is a curve and not a
straight line?**

A: It’s still possible to do it, but you
need to use calculus, which is why we’re
not expecting you to do that in this book.
The key thing is that you see where the
probabilities come from and how to interpret
them.

If you’re really interested in working out
probabilities using calculus, by all means,
give it a go. We don’t want to hold you back.
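
If you’re curious, a computer can do the calculus numerically for you. Here is a hedged sketch that uses `scipy.integrate.quad` on a made-up density, g(x) = x/200 between 0 and 20. This density is not Julie’s; it was chosen only because it is not flat and its total area is 1.

```python
from scipy import integrate


def g(x):
    """A made-up density that rises in a straight line from 0 to 20."""
    return x / 200 if 0 <= x <= 20 else 0.0


# quad returns the estimated integral and an estimate of its numerical error.
total_area, _ = integrate.quad(g, 0, 20)
p_over_5, _ = integrate.quad(g, 5, 20)

print(round(total_area, 4))  # a valid density has total area 1
print(round(p_over_5, 4))    # area under g(x) between 5 and 20
```

The idea is the same as before: the probability of a range is the area under the density over that range, whatever shape the density has.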

**Q: You’ve talked a lot about
probability ranges. How do I find the
probability of a precise value?**

A: When you’re dealing with continuous
data, you’re really talking about acceptable
degrees of accuracy, and you form a range
based on these values. Let’s look at an
example:

Suppose you wanted a piece of string that’s
10 inches long to the nearest inch. It would
be tempting to say that you need a piece of
string that’s exactly 10 inches long, but that’s
not entirely accurate. What you’re really after
is a piece of string that’s between 9.5 inches
and 10.5 inches, as you want string that’s 10
inches in length to the nearest inch. In other
words, you want to find the probability of the
length being in the range 9.5 inches to 10.5
inches.

**Q: But what if I want to find the
probability of a precise single value?**

A: This may not sound intuitive at first,
but it’s actually 0. What you’re really talking
about is the probability that you have a
precise value to an infinite number of
decimal places.

If we go back to the string length example,
what would happen if you needed a piece
of string exactly 10 inches long? You would
need to have a length of string measuring
10 inches long to the nearest atom and
examined under a powerful microscope.

The string being precisely
10 inches long is virtually impossible.
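
Here is a small sketch of the same idea. We don’t have a distribution for string lengths, so it borrows Julie’s waiting time instead (flat between 0 and 20 minutes, an assumption carried over from earlier) and asks for the probability of waiting within some tolerance of exactly 10 minutes.

```python
from scipy import stats

wait = stats.uniform(loc=0, scale=20)  # Julie's waiting time in minutes

# Probability of waiting within `tolerance` minutes of exactly 10 minutes.
for tolerance in [1, 0.1, 0.001, 0.000001]:
    p = wait.cdf(10 + tolerance) - wait.cdf(10 - tolerance)
    print(tolerance, p)

# With no tolerance at all the range has zero width, so the area is 0.
p_exact = wait.cdf(10) - wait.cdf(10)
print(p_exact)
```

As the tolerance shrinks, the range gets narrower and the area under the density shrinks with it.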

**Q: But I’m sure that degree of
accuracy isn’t needed. Surely it would
be enough to measure it to the nearest
hundredth of an inch?**

A: Ah, but that brings us back to the
degree of accuracy you need in order for
the length to pass as 10 inches, rather
than finding the probability of a value to an
infinite degree of precision. You use your
degree of accuracy to construct your range
of acceptable measurements so that you can
work out the probability.

* Discrete data is composed of distinct numeric values.
* Continuous data covers a range, where any value within that range is possible. It’s frequently data that is measured in some way, rather than counted.
* Continuous probability distributions can be described with a probability density function.
* You find the probability for a range of values by calculating the area under the probability density function between those values. So to find P(a < X < b), you need to calculate the area under the probability density function between a and b.
* The total area under the probability density function must equal 1.

## We’ve found the probability

So far, we’ve looked at how you can use probability density functions
to find probabilities for continuous data. We’ve found that the
probability that Julie will have to wait for more than 5 minutes for her
date to turn up is 0.75.

That’s great, at least
now I have an idea of
how long I’ll be waiting.
But what about my shoes?

## Searching for a sole mate

As well as preferring men who are punctual, Julie has preconceived ideas
about what the love of her life should be like.

I need a man who’ll be taller
than me when I wear my
highest heels. Shoes definitely
come first.

Julie loves wearing high-heeled shoes, and the higher the heel, the
happier she is. The only problem is that she insists that her dates
should be taller than her when she’s wearing her most extreme set of
heels, and she’s running out of suitable men.

Unfortunately, the last couple of times Julie was sent on a blind date,
the guys fell short of her expectations. She’s wondering how many
men out there are taller than her and what the probability is that her
dates will be tall enough for her high standards.

So how can we work out the probability this time?

## Male modelling

So far we’ve looked at very simple continuous distributions, but it’s
unlikely these will model the heights of the men Julie might be dating.
It’s likely we’ll have several men who are quite a bit shorter than average,
a few really tall ones, and a lot of men somewhere in between. We can
expect most of the men to be average height.

![13](013.png)

Given this pattern, the probability density of the height of the men is likely
to look something like this.

![14](014.png)

This shape of distribution is actually fairly common and can be
applied to lots of situations. It’s called the **normal distribution**.

## The normal distribution is an “ideal” model for continuous data

The normal distribution is called normal because it’s seen as an ideal. It’s
what you’d “normally” expect to see in real life for a lot of continuous data
such as measurements.

The normal distribution is in the shape of a bell curve. The curve is
symmetrical, with the highest probability density in the center of the curve.
The probability density decreases the further away you get from the mean.
Both the mean and median are at the center and have the highest probability
density.

The normal distribution is defined by two parameters, $\mu$ and $\sigma^2$. $\mu$ tells you
where the center of the curve is, and $\sigma$ gives you the spread. If a continuous
random variable X follows a normal distribution with mean $\mu$ and standard
deviation $\sigma$, this is generally written $X \sim N(\mu, \sigma^2)$.

![15](015.png)

So what effect do $\mu$ and $\sigma$ really have on the shape of the normal distribution?

We said that $\mu$ tells you where the center of the curve is, and $\sigma^2$ indicates the
spread of values. In practice, this means that as $\sigma^2$ gets larger, the normal curve
becomes flatter and wider.

![16](016.png)
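
To put numbers on this, here is a small sketch that compares three normal curves with the same mean and different spreads. The values of σ are arbitrary illustrative choices. Note that SciPy’s `scale` argument is the standard deviation σ, not the variance σ².

```python
from scipy import stats

mu = 0
rows = []
for sigma in [0.5, 1, 2]:
    # scipy's `scale` is the standard deviation sigma, not the variance.
    curve = stats.norm(loc=mu, scale=sigma)
    peak = curve.pdf(mu)                                 # height at the center
    p_near_mean = curve.cdf(mu + 1) - curve.cdf(mu - 1)  # area within 1 of mu
    rows.append((sigma, peak, p_near_mean))
    print(sigma, round(peak, 3), round(p_near_mean, 3))
```

As σ grows, the peak gets lower and less of the area sits close to the mean, which is what “flatter and wider” looks like in numbers.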

If the probability density decreases the further you get from μ, when does it reach 0?

No matter how far you go out on the graph, the
probability density never equals 0.

The probability density gets closer and closer to 0, but never quite
reaches it. If you looked at the probability density curve a very long
way from μ, you’d find that the curve just skims above 0.

Another way of looking at this is that events become more and more
unlikely to occur, but there’s always a tiny chance they might.
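
You can watch the density shrink without reaching 0 in a quick sketch. It uses the standard normal distribution N(0, 1) purely as a convenient example; the distances from the mean are arbitrary.

```python
from scipy import stats

# Probability density of Z ~ N(0, 1) further and further from the mean.
distances = [1, 5, 10, 20]
densities = [stats.norm.pdf(z) for z in distances]
for z, density in zip(distances, densities):
    print(z, density)

# Area in the tail beyond 10 standard deviations: tiny, but still above 0.
tail_area = stats.norm.sf(10)
print(tail_area)
```

Go far enough out and floating-point arithmetic will eventually round the density to 0.0, but that is a limit of the computer, not of the distribution.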

## So how do we find normal probabilities?

As with any other continuous probability distribution, you find
probabilities by calculating the area under the curve of the
distribution. The curve gives the probability density, and the
probability is given by the area between particular ranges. If, for
instance, you wanted to find the probability that a variable X lies
between a and b, you’d need to find the area under the curve between
points a and b.

![17](017.png)

Sound complicated? Don’t worry, it’s easier than you might think.

Working out the area under the normal curve would be difficult if you had to
do it all by yourself, but fortunately you have a helping hand in the form of
probability tables. All you need to do is work out the range of the area you
want to find, and then look up the corresponding probability in the table.

## Three steps to calculating normal probabilities

There are a few steps you need to take in order to find normal
probabilities. We’ll guide you through the process, but for now here’s
a roadmap of where we’re headed.


1. Grab your distribution and range.
If the normal distribution applies to
your situation, see if you can find what
the mean and standard deviation are.
You’ll need these before you can find
your probabilities. You also need to
figure out what area you need to find.

2. Standardize it.
Don’t worry about this for now;
we’ll show you how to do this
really soon.

3. Look up the probabilities

## Step 1: Determine your distribution

The first thing we need to do is determine the distribution of the data.

Julie has been given the mean and variance of the heights of eligible
men in Statsville. The mean is 71 inches, and the variance is 20.25 square inches. This
means that if X represents the heights of the men, X ~ N(71, 20.25).

This is shorthand for “The variable
X follows a normal distribution, and
has a mean of 71 and a variance of
20.25.”
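
If you’re following along in Python, here is one way this distribution might be set up. The thing to watch is that the N(μ, σ²) notation uses the variance, whereas SciPy’s `scale` argument expects the standard deviation. The variable names are illustrative.

```python
import math
from scipy import stats

mean = 71          # inches
variance = 20.25   # square inches

# N(mu, sigma^2) is written with the variance, but scipy's `scale`
# argument expects the standard deviation, so take the square root first.
heights = stats.norm(loc=mean, scale=math.sqrt(variance))

print(heights.mean(), heights.std(), heights.var())
```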

![18](018.png)

We also need to know which range of values will give us the right probability
area. In this case, we need to find the probability that Julie’s blind date will be
sufficiently tall.

That’s easy. Julie wants her date to be taller
than her, so we can work out probabilities based
on her height.

Julie is 64 inches tall, so we’ll find the probability that her date is taller.
Here’s a sketch:

![19](019.png)

## Step 2: Standardize to N(0, 1)

The next step is to standardize our variable X so that the mean becomes 0
and the standard deviation 1. This gives us a standardized normal variable
Z where Z ~ N(0, 1).

![20](020.png)

Being able to use a standard normal distribution means that we can use the
same set of probability tables for all possible values of $\mu$ and $\sigma^2$. There’s just
one question: how do we convert our normal distribution into a standard
form?

How do you think we might be able to standardize our normal distribution?

### To standardize, first move the mean…

Let’s start off by transforming our normal distribution so that the mean
becomes 0 rather than 71. To do this, we move the curve to the left by 71.

![021](021.png)

This gives us a new distribution of
X - 71 ~ N(0, 20.25)

### …then squash the width

We also need to adjust the variance. To do this, we “squash” our distribution
by dividing by the standard deviation. We know the variance is 20.25, so the
standard deviation is 4.5.

Recall that the standard
deviation is the square root of
the variance.

![022](022.png)

Doing this gives us

$ \frac{X-71}{4.5} \sim N(0,1)$

or Z ~ N(0, 1) where

$ Z = \frac{X-71}{4.5}$

Look familiar? This is the standard score we encountered when
we first looked at the standard deviation. In general,
you can find the standard score for any normal variable X using

$ Z = \frac{X-\mu}{\sigma}$

## Now find Z for the specific value you want to find probability for

So far we’ve looked at how our probability distribution can be standardized to
get from $X \sim N(\mu, \sigma^2)$ to Z ~ N(0, 1). What we’re most interested in is actual
probabilities. What we need to do is take the range of values we want to find
probabilities for, and find the standard score of the limit of this range. Then
we can look up the probability for our standard score using normal probability
tables.

In our situation, we want to find the probability that Julie’s date is taller than
her. Since Julie is 64 inches tall, we need to find P(X > 64). The limit of this
range is 64, so if we calculate the standard score z of 64, we’ll be able to use
this to find our probability.

![023](023.png)

Let’s find the standard score of 64.

$z = \frac{x-\mu}{\sigma} = \frac{64-71}{4.5} \approx -1.56$ (to 2 decimal places)

So -1.56 is the standard score of 64, using the mean and standard
deviation of the men’s heights in Statsville.
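
The calculation is easy to wrap up in a small helper. The function name `standard_score` is our own choice for this sketch, not a library function.

```python
def standard_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma


z = standard_score(64, mu=71, sigma=4.5)
print(round(z, 2))  # -1.56
```

The negative sign tells you that 64 inches is below the mean height.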

Now that we have this, we can move on to the final step: using
tables to look up the probability.

**Q: Is this the same standard score that we saw before?**

A: Yes it is. It has more uses than just the normal distribution, but
it’s particularly useful here as it allows us to use standard normal
probability tables.

**Q: Is the probability for my standardized range really the
same as for my original distribution? How does that work?**

A: The probabilities work out the same, but using the probability
tables is a lot more convenient.

When we standardize our original normal distribution, everything
keeps the same proportion. The overall area doesn’t grow or shrink,
and as it’s area that gives the probability, the probability stays the
same too.
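
Here is a quick sketch that checks this claim for Julie’s problem, using SciPy in place of tables. It finds P(X > 64) directly from N(71, 20.25) and then again from N(0, 1) using the standard score; the variable names are illustrative.

```python
from scipy import stats

mu, sigma = 71, 4.5
z = (64 - mu) / sigma  # standard score of 64, kept unrounded

# P(X > 64) on the original scale, X ~ N(71, 20.25).
p_original = stats.norm.sf(64, loc=mu, scale=sigma)

# P(Z > z) on the standardized scale, Z ~ N(0, 1).
p_standard = stats.norm.sf(z)

print(round(p_original, 4), round(p_standard, 4))
```

The two results agree. Rounding z to two decimal places, as a table lookup requires, changes the answer only slightly.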

## Look up the probability

P(Z > –1.56) = 1 - P(Z < –1.56)

    from scipy import stats

    # cdf(z) gives P(Z < z), so subtract it from 1 to get P(Z > z)
    1 - stats.norm.cdf(-1.56, loc=0, scale=1)   # 0.940620059405207

In other words, the probability that Julie’s date is taller than her is about 0.94.

There’s a 94%
chance my date
will be taller than
me? I like those
odds!

Note that stats.norm.pdf gives the probability density at -1.56, which is the height of the curve at that point and not a probability:

    stats.norm.pdf(-1.56, loc=0, scale=1)   # 0.11815729505958228

If we want the probability up to -1.56, that is P(Z < -1.56), we need to use the cdf:

![middle](025.gif)

## Probability Tables Up Close

Probability tables allow you to look up the probability P(Z < z) where
z is some value. The problem is you don’t always want to find this
sort of probability; sometimes you want to find the probability that a
continuous random variable is greater than z, or between two values.

How can you use probability tables to find the probability you need?
The big trick is to find a way of using the probability tables to get to
what you want, usually by finding a whole area and then subtracting
what you don’t need.

![26](026.png)

### Finding P(Z > z)

We can find probabilities of the form P(Z > z) using

P(Z > z) = 1 - P(Z < z)

We’ve already used this to find
the probability that Julie’s date
is taller than her.

In other words, take the area where Z < z away from the total probability.

![27](027.png)

### Finding P(a < Z < b)

This sort of probability is slightly more complicated to calculate,
but it’s still possible. You can calculate it using

P(a < Z < b) = P(Z < b) - P(Z < a)

You could use this to find the
probability that the height of Julie’s
date is within a particular range.

In other words, calculate P(Z < b), and take away the area for P(Z < a).

![28](028.png)
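
Both tricks translate directly into code. In this sketch, `stats.norm.cdf` plays the role of the probability table, since it returns P(Z < z); the helper names `p_greater` and `p_between` are our own, chosen for illustration.

```python
from scipy import stats


def p_greater(z):
    """P(Z > z) = 1 - P(Z < z)."""
    return 1 - stats.norm.cdf(z)


def p_between(a, b):
    """P(a < Z < b) = P(Z < b) - P(Z < a)."""
    return stats.norm.cdf(b) - stats.norm.cdf(a)


print(round(p_greater(-1.56), 4))  # matches the lookup for Julie's date
print(round(p_between(-1, 1), 4))  # within one standard deviation of the mean
```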

**Q: I’ve heard of the term “Gaussian.”
What’s that?**

A: Another name for the normal
distribution is the Gaussian distribution. If
you hear someone talking about a Gaussian
distribution, they’re talking about the same
thing as the normal distribution.

**Q: Finding the probability of a range
looks kinda tricky. How do I do it?**

A: The big thing here is to think about
how you can get the area you want using
the probability tables. Probability tables
generally only give probabilities in the form
P(Z < z) where z is some value. The big trick,
then, is to rewrite your probability only in
these terms.

If you’re dealing with a probability in the form
P(a < Z < b)—that is, some sort of range—
you’ll have two probabilities to look up, one
for P(Z < a) and the other for P(Z < b). Once
you have these probabilities, subtract the
smaller from the larger.

**Q: Do continuous distributions have
a mode? Can you find the mode of the
normal distribution?**

A: Yes. The mode of a continuous
probability distribution is the value where
the probability density is highest. If you draw
the probability density, it’s the value of the
highest point of the curve.

If you look at the curve of the normal
distribution, the highest point is in the middle.
The mode of the normal distribution is μ.

**Q: What about the median?**

A: The median of a continuous probability
distribution is the value a where
P(X < a) = 0.5. In other words, it’s the value
that splits the area under the probability
density curve in half.

For the normal distribution, the median is
also μ. The median and mode don’t get used
much when we’re dealing with continuous
probability distributions. Expectation and
variance are more important.
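
As a rough check, here is a sketch that finds the median and looks at the mode for the heights distribution N(71, 20.25). It uses SciPy’s `ppf`, which is the inverse of the cdf; the variable names are illustrative.

```python
from scipy import stats

heights = stats.norm(loc=71, scale=4.5)  # X ~ N(71, 20.25)

# ppf is the inverse of the cdf: it returns the value a with P(X < a) = q.
median = heights.ppf(0.5)

# For the mode, compare the density at the mean with points on either side.
density_at_mean = heights.pdf(71)
density_nearby = [heights.pdf(70), heights.pdf(72)]

print(median)
print(density_at_mean > max(density_nearby))
```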

**Q: What’s a standard score?**

A: The standard score of a variable is
what you get if you subtract its mean and
divide by its standard deviation. It’s a way
of standardizing normal distributions so
that they are transformed into an N(0, 1)
distribution, and that gives you a way of
comparing them. Standard scores are
useful when you’re dealing with the normal
distribution because it means you can look
up the probability of a range using standard
normal probability tables.

The standard score of a particular value also
describes how many standard deviations
away from the mean the value is, which
gives you an idea of its relative proximity to
the mean.