If only all probability distributions were normal. Life can be so much simpler with the normal distribution. Why spend all your time
working out individual probabilities when you can look up entire ranges in one swoop, and
still leave time for game play? In this chapter, you’ll see how to solve more complex
problems in the blink of an eye, and you’ll also find out how to bring some of that normal
goodness to other probability distributions.

## Love is a roller coaster

The wedding market is big business nowadays, and Dexter has an idea for
making that special day truly memorable. Why get married on the ground when
you can get married on a roller coaster?

Dexter’s convinced there’s a lot of money to be made from his innovative Love
Train ride, if only it passes the health and safety regulations.

I need to make sure
the combined weight of
the bride and groom won’t
be above 380 pounds.
Think you can help?

Before Dexter can go any further, he needs to make
sure that his special ride can cope with the weight
of the bride and groom, and he’s asked if you can
help him.

The ride he has in mind can cope with combined weights of up to 380
pounds. What’s the probability that the combined weight will be less
than this?

## All aboard the Love Train

Before we start, we need to know how the weights of brides and grooms in
Statsville are distributed, taking into account the weight of all their wedding
clothes. Both follow a normal distribution, with the bride weight distributed
as N(150, 400) and the groom weight as N(190, 500). Their weights are
measured in pounds.

![01](001.png)

We need to use these two probability distributions to somehow work out
the probability that the weight of a bride and groom will be less than the
maximum weight allowance on the ride. If the probability is sufficiently
high, we can be confident the ride is feasible.

We can calculate
this probability if we
know what the combined
probability distribution is,
but what’s that?

How do you think we can find the probability
distribution for the combined weights of the bride and
groom? What sort of distribution do you think this
might be? Why?

## Normal bride + normal groom

Let’s start by taking a closer look at how the weights of the bride and groom
are distributed.

As you know, the weights follow normal distributions like this:

![02](002.png)

What we’re really after, though, is the probability distribution of the
combined weight of the bride and groom. In other words, we want to find
the probability distribution of the weight of the bride added to the weight
of the groom.

Bride weight + Groom weight ~ ?

Assuming the weights of the bride and groom are independent, the
shape of the distribution should look something like this:

![03](003.png)

## It’s still just weight

Can you remember when we first looked at continuous data and looked at how data
such as height and weight tend to be distributed? We found that data such as height
and weight are continuous, and they also tend to follow a normal distribution.
This time we’re looking at the combined weight of the happy couple. Even though
it’s combined weight, it’s still just weight, and we already know how weight tends to be
distributed. The combined weight is still **continuous**. What’s more, the combined
weight is still **distributed normally**. In other words, the combined weight of the
bride and groom follows a normal distribution.

Knowing that the combined weight of the bride and groom follows a normal
distribution helps us a lot. It means that we’ll be able to use probability tables just
like we did before to look up probabilities, which means we’ll be able to look up the
probability that the combined weight is less than 380 pounds—just what we need for
the ride.

There’s only one problem—before we can go any further, we need to know the mean
and variance of the combined weight of the bride and groom. How can we find this?

Bride weight + Groom weight ~ N(?, ?)

The combined weight
of the bride and
groom follows a normal
distribution, but what’s
the mean and variance?

It’s time for a trip down memory lane. Can you remember the
discrete shortcuts for the following formulas? Assume X and Y are
independent.

1. E(X + Y)

E(X + Y) = E(X) + E(Y)

2. Var(X + Y)

Var(X + Y) = Var(X) + Var(Y)

3. E(X - Y)

E(X - Y) = E(X) - E(Y)

4. Var(X - Y)

Var(X - Y) = Var(X) + Var(Y)

Remember that we
ADD the variances, even
though it’s for X - Y.

I don’t see how these
shortcuts help us. They’re
for discrete data, and we’re
dealing with continuous now.

The shortcuts apply to continuous data too.
When we originally encountered these shortcuts, we were dealing with discrete data.
Fortunately, the same rules and shortcuts also apply to continuous data.
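
If you'd like to see this for yourself, here's a rough simulation sketch in Python. The distributions N(10, 4) and N(3, 9), the sample size, and the helper name `mean_and_variance` are all made up for illustration; they don't come from the Love Train problem. The sketch draws independent continuous values and checks that the sample means and variances of X + Y and X - Y land close to what the shortcuts predict.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable

def mean_and_variance(values):
    """Return the mean and (population-style) variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance

N = 100_000
# Assumed distributions for illustration: X ~ N(10, 4), Y ~ N(3, 9).
# random.gauss takes the standard deviation, not the variance.
xs = [random.gauss(10, 2) for _ in range(N)]
ys = [random.gauss(3, 3) for _ in range(N)]

mean_sum, var_sum = mean_and_variance([x + y for x, y in zip(xs, ys)])
mean_diff, var_diff = mean_and_variance([x - y for x, y in zip(xs, ys)])

# Shortcuts predict: E(X + Y) = 13, E(X - Y) = 7,
# and both variances equal 4 + 9 = 13.
print(f"X + Y: mean {mean_sum:.2f}, variance {var_sum:.2f}")
print(f"X - Y: mean {mean_diff:.2f}, variance {var_diff:.2f}")
```

Because this is a simulation, the printed values will only be close to 13, 7, and 13, not exactly equal.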

How do you think we can use these shortcuts to find the probability
distribution of the weight of the bride + the weight of the groom?

## How’s the combined weight distributed?

So far, we’ve found that the combined weight of the bride and groom is
normally distributed, and this means we can use probability tables to look up
the probability of the combined weight being less than a certain amount.

Let’s try rewriting the bride and groom weight distributions in terms of X
and Y. If X represents the weight of the bride and Y the weight of the groom,
and X and Y are independent, then we want to find μ and σ where

$X + Y \sim N(\mu, \sigma^2)$

X + Y means “the weight of the bride + the
weight of the groom.” But how do we know
what the mean and variance are?

In other words, before we go any further we need to find the mean and variance of
X + Y. But how?

Take a look at the answers to the last exercise. When we were working with discrete
probability distributions, we saw that as long as X and Y are independent we could
work out E(X + Y) and Var(X + Y) by using

E(X + Y) = E(X) + E(Y)

Var(X + Y) = Var(X) + Var(Y)

So if we know what the expectation and variance of X and Y are, we can use these
to work out the expectation and variance of X + Y.

That means that if we
know the distribution of X
and Y, we can figure out the
distribution of X + Y too.

We can use what we already know to figure out
what we don’t.

Because we know how the weight of the bride and the weight of the
groom are distributed, we can find the distribution of the combined
weight of the bride and groom.

Let’s look at this in more detail.

## X + Y Distribution Up Close

Being able to find the distribution of X + Y is useful if you’re working
with combinations of normal variables. If independent random variables
X and Y are normally distributed, then X + Y is normal too. What’s
more, you can use the mean and variance of X and Y to calculate the
distribution of X + Y.

Remember, two variables are
independent if they have no impact
on each other’s probabilities.

To find the mean and variance of X + Y, you can use the same formulae
that we used for discrete probability distributions. In other words, if

$X \sim N(\mu_x, \sigma_x^2)$
and
$Y \sim N(\mu_y, \sigma_y^2)$
then

$X + Y \sim N(\mu, \sigma^2)$

where

$\mu = \mu_x + \mu_y$

$\sigma^2 = \sigma_x^2 + \sigma_y^2$

If you add the means of X and Y together,
you get the mean of X + Y. Similarly,
summing the variances of X and Y gives you
the variance of X + Y

We can use these shortcuts if
X and Y are independent, which
makes life very easy indeed

In other words, the mean of X + Y is equal to the mean of X plus the
mean of Y, and the variance of X + Y is equal to the variance of X plus
the variance of Y.

Let’s look at a sketch of this. What do you notice about the variance of
X + Y?

![4](004.png)

The variance of X + Y is greater than the variance of X and also
greater than the variance of Y, which means that the curve of X + Y
is more elongated than either. This is true for any independent normal X and Y.

By adding the two variables together, you are in effect increasing the
amount of variability, and this elongates the shape of the distribution.

This in turn means that the shape of the distribution gets flatter so that
the total area under the curve is still 1.
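
Here's a small Python sketch of that idea using the bride and groom distributions from earlier. The function name `sum_of_independent_normals` is our own invention, and the sketch assumes the two weights are independent. It adds the means and variances, then compares the peak height of each curve to show that the wider X + Y curve is also the flattest.

```python
from math import sqrt
from statistics import NormalDist

def sum_of_independent_normals(mean_x, var_x, mean_y, var_y):
    """Return (mean, variance) of X + Y for independent normal X and Y."""
    return mean_x + mean_y, var_x + var_y

# Bride ~ N(150, 400), groom ~ N(190, 500); the second number is the variance.
mean_sum, var_sum = sum_of_independent_normals(150, 400, 190, 500)
print(mean_sum, var_sum)  # 340 900

# NormalDist expects a standard deviation, so take the square root
# of each variance.
bride = NormalDist(150, sqrt(400))
groom = NormalDist(190, sqrt(500))
combined = NormalDist(mean_sum, sqrt(var_sum))

# Height of each curve at its own mean: more spread means a lower peak,
# because the total area under the curve has to stay at 1.
peak_bride = bride.pdf(bride.mean)
peak_groom = groom.pdf(groom.mean)
peak_combined = combined.pdf(combined.mean)
print(peak_bride, peak_groom, peak_combined)
```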

## X – Y Distribution Up Close

Sometimes X + Y just won’t give you the sorts of probabilities you’re
after. If you need to find probabilities involving the difference between two
variables, you’ll need to use X - Y instead.

X - Y follows a normal distribution if X and Y are independent random
variables and are both normally distributed. These are exactly the same
criteria as for X + Y.

To find the mean and variance, we again use the same shortcuts that we
used for discrete probability distributions. If

$X \sim N(\mu_x, \sigma_x^2)$
and
$Y \sim N(\mu_y, \sigma_y^2)$
then

$X - Y \sim N(\mu, \sigma^2)$

where

$\mu = \mu_x - \mu_y$

$\sigma^2 = \sigma_x^2 + \sigma_y^2$

We ADD the variances together,
just like we did for discrete
probability distributions.

In other words, the mean of X – Y is equal to the mean of Y subtracted
from the mean of X, and you find the variance of X – Y by adding the X
and Y variances together.

Adding the variances together may not make intuitive sense at first,
but it’s exactly the same as when we worked with discrete probability
distributions. Even though we’re subtracting Y from X, we’re actually
still increasing the amount of variability. Adding the variances together
reflects this. As with the X + Y distribution, this leads to a flatter, more
elongated shape than either X or Y.

![5](005.png)

If you look at the actual shape of the X - Y distribution, it’s the same
shape curve as for X + Y distribution, except that the center has moved.
The two distributions have the same variances, but different means.
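
To see X - Y in action, here's a hedged Python sketch that reuses the bride and groom distributions. The question it answers, the chance that a bride weighs more than her groom, is a hypothetical one we've added for illustration, and `difference_of_independent_normals` is our own helper name. As before, it assumes the two weights are independent.

```python
from math import sqrt
from statistics import NormalDist

def difference_of_independent_normals(mean_x, var_x, mean_y, var_y):
    """Return (mean, variance) of X - Y for independent normal X and Y."""
    # The means subtract, but the variances still ADD.
    return mean_x - mean_y, var_x + var_y

# Bride X ~ N(150, 400), groom Y ~ N(190, 500).
mean_diff, var_diff = difference_of_independent_normals(150, 400, 190, 500)
print(mean_diff, var_diff)  # -40 900

diff = NormalDist(mean_diff, sqrt(var_diff))

# Hypothetical question: P(bride heavier than groom) = P(X - Y > 0).
p_bride_heavier = 1 - diff.cdf(0)
print(round(p_bride_heavier, 4))
```

Notice that the variance, 900, is the same as it would be for X + Y. Only the mean has moved.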

## Finding probabilities

Now that we know how to calculate the distribution of X + Y, we
can look at how to use it to calculate probabilities. Here are the
steps you need to go through.

1. Work out the distribution and range

We know we need to use X + Y, and
we have a way of working out the
mean and variance

2. Standardize it

Once we know the distribution
and the range, we standardize it.

3. Look up the probabilities

We can then look up the
probability in standard
normal probability tables.

Sound familiar? These are exactly the same steps that
we went through in the previous chapter for the normal
distribution.
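
As a sketch, here are those three steps applied to the Love Train problem in Python. It assumes the bride and groom weights are independent, and it uses the standard library's `NormalDist().cdf` as a stand-in for standard normal probability tables.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: work out the distribution and range.
# Bride X ~ N(150, 400), groom Y ~ N(190, 500), assumed independent.
mean = 150 + 190       # 340
variance = 400 + 500   # 900
limit = 380            # we want P(X + Y < 380)

# Step 2: standardize it.
z = (limit - mean) / sqrt(variance)

# Step 3: look up the probability. NormalDist() with no arguments is the
# standard normal N(0, 1), so its cdf plays the role of the table.
probability = NormalDist().cdf(z)

print(f"z = {z:.2f}, P(X + Y < 380) = {probability:.4f}")
```

The probability comes out at roughly 0.91. A printed table, where z is rounded to 1.33, gives a value that differs slightly in the later decimal places.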

**Q: Remind me, why did we need to
find the distribution of X + Y?**

A: We’re looking for the probability that
the combined weight of a bride and groom
will be less than 380 pounds, which means
we need to know how the combined weight
is distributed. We’re using X to represent the
weight of the bride, and Y to represent the
weight of the groom, which means we need
to use the distribution of X + Y.

**Q: You say we can look up
probabilities for X + Y using probability
tables. How?**

A: In exactly the same way as we did
before. We take our probability distribution,
calculate the standard score, and then look
this value up in probability tables.
Looking up probabilities for X + Y is no
different from looking up probabilities for
anything else. Just find the standard score,
look it up, and that gives you your probability.

**Q: So do all of the shortcuts we
learned for discrete data apply to
continuous data too?**

A: Yes, they do. This means we have an
easy way of combining random variables
and finding out how they’re distributed,
which in turn means we can solve more
complex problems.

The key thing to remember is that these
shortcuts apply as long as the random
variables are independent.

**Q: Can you remind me what
independent means?**

A: If two variables are independent, then
their probabilities are not affected by each
other. In our case, we’re assuming that the
weight of the bride is not influenced by the
weight of the groom.

**Q: What if X and Y aren’t independent?
What then?**

A: If X and Y aren’t independent, then
we can’t use these shortcuts. We’d need to
do a lot more work to find out how X + Y is
distributed because you’d have to find out
what the relationship is between X and Y.

## More people want the Love Train

It looks like there’s a good chance that the combined weight of the happy
couple will be less than the maximum the ride can take. But why restrict
the ride to the bride and groom?

Customers are demanding
that we allow more members of the
wedding party to join the ride, and
they’ll pay good money. That’s great, but
will the Love Train be able to handle
the extra load?

Let’s see what happens if we add another car for four more members of
the wedding party. These could be parents, bridesmaids, or anyone else
the bride and groom want along for the ride.

The car will hold a total weight of 800 pounds, and we’ll assume the
weight of an adult in pounds is distributed as

X ~ N(180, 625)

where X represents the weight of an adult. But how can we work out the
probability that the combined weight of four adults will be less than 800
pounds?

Think back to the shortcuts you can use when you calculate expectation
and variance. What’s the difference between independent observations and
linear transformations? What effect does each have on the expectation and
variance? Which is more appropriate for this problem?

## Linear transforms describe underlying changes in values…

Let’s start off by looking at the probability distribution of 4X, where X is
the weight of one adult. Is 4X appropriate for describing the probability
distribution for the weight of 4 people?

The distribution of 4X is actually a linear transform of X. It’s a
transformation of X in the form aX + b, where a is equal to 4, and b is equal
to 0. This is exactly the same sort of transform as we encountered earlier with
discrete probability distributions.

Linear transforms describe underlying changes to the size of the values in the
probability distribution. This means that 4X actually describes the weight of
an individual adult whose weight has been multiplied by 4.

The 4X probability
distribution describes
adults whose weights have
been multiplied by 4. The
weight is changed, not the
number of adults.

![6](006.png)

What we wanted was 4 adults,
not 1 adult 4 times actual size.

### So what’s the distribution of a linear transform?

Suppose you have a linear transform of X in the form aX + b, where
$X \sim N(\mu, \sigma^2)$. As X is distributed normally, this means that aX + b is distributed
normally too. But what’s the expectation and variance?

Let’s start with the expectation. When we looked at discrete probability
distributions, we found that E(aX + b) = aE(X) + b. Now, X follows a normal
distribution where E(X) = μ, so this gives us E(aX + b) = aμ + b.

We can take a similar approach with the variance. When we looked at discrete
probability distributions, we found that $\mathrm{Var}(aX + b) = a^2 \mathrm{Var}(X)$. We know that
Var(X) in this case is given by Var(X) = $\sigma^2$, so this means that Var(aX + b) = $a^2\sigma^2$.

$aX + b \sim N(a\mu + b, a^2\sigma^2)$

The new variance is the SQUARE
of a multiplied by the original
variance

In other words, the new mean becomes aμ + b, and the new variance becomes $a^2\sigma^2$.
So what about independent observations?
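
Before we answer that, here's a short Python sketch of the linear transform rule applied to 4X, where X ~ N(180, 625) is the weight of one adult. The helper name `linear_transform` is ours, and the second example, a fixed 5-pound shift, is hypothetical and only there to show that b moves the mean without touching the variance.

```python
def linear_transform(a, b, mean, variance):
    """Return (mean, variance) of aX + b when X ~ N(mean, variance)."""
    # The mean is scaled and shifted; the variance is scaled by a squared.
    return a * mean + b, a ** 2 * variance

# One adult's weight: X ~ N(180, 625). What does 4X look like?
scaled = linear_transform(4, 0, 180, 625)
print(scaled)   # (720, 10000)

# Hypothetical shift: add a fixed 5 pounds to every adult.
shifted = linear_transform(1, 5, 180, 625)
print(shifted)  # (185, 625)
```

So 4X ~ N(720, 10000). The variance has gone up by a factor of 16, not 4.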

## …and independent observations describe how many values you have

Rather than transforming the weight of each adult, what we really need to figure out is the probability distribution for the combined weight of four separate adults. In other words, we need to work out the probability distribution of four independent observations of X.

![07](007.png)

The weight of each adult is an observation of X, so this means that the
weight of each adult is described by the probability distribution of X. We
need to find the probability distribution of four independent observations of
X, so this means we need to find the probability distribution of

$X_1 + X_2 + X_3 + X_4$

where $X_1$, $X_2$, $X_3$ and $X_4$ are independent observations of X.

Each adult’s weight is an
independent observation of X.

## Expectation and variance for independent observations

When we looked at the expectation and variance of independent observations of
discrete random variables, we found that

$E(X_1 + X_2 + \dots + X_n) = nE(X)$

and

$\mathrm{Var}(X_1 + X_2 + \dots + X_n) = n\mathrm{Var}(X)$

As you’d expect, these same calculations work for continuous random variables too.

This means that if $X \sim N(\mu, \sigma^2)$, then

$X_1 + X_2 + \dots + X_n \sim N(n\mu, n\sigma^2)$
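
Here's a sketch of how this plays out for the extra car, assuming the four adults' weights are independent observations of X ~ N(180, 625). For contrast, it also works out the probability you'd get if you used the linear transform 4X by mistake. The variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

n, mean, variance = 4, 180, 625   # four adults, each X ~ N(180, 625)
limit = 800                       # the car holds up to 800 pounds

# Four independent observations: X1 + X2 + X3 + X4 ~ N(n*mean, n*variance).
sum_dist = NormalDist(n * mean, sqrt(n * variance))
p_sum = sum_dist.cdf(limit)

# For contrast, the linear transform 4X ~ N(4*mean, 4**2 * variance).
scaled_dist = NormalDist(n * mean, sqrt(n ** 2 * variance))
p_scaled = scaled_dist.cdf(limit)

print(f"Four separate adults: P(total < 800) = {p_sum:.4f}")
print(f"One adult times four:  P(4X < 800)    = {p_scaled:.4f}")
```

Both distributions have a mean of 720, but the variances are 2500 and 10000. That gives standard scores of 1.6 and 0.8, so the probabilities are about 0.95 and 0.79, which is a big difference for a safety check.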

**Q: So what’s the difference between
linear transforms and independent
observations?**

A: Linear transforms affect the underlying
values in your probability distribution. As
an example, if you have a length of rope of
a particular length, then applying a linear
transform affects the length of the rope.
Independent observations have to do with
the quantity of things you’re dealing with.
As an example, if you have n independent
observations of a piece of rope, then you’re
talking about n pieces of rope.
In general, if the quantity changes, you’re
dealing with independent observations. If the
underlying values change, then you’re dealing
with a transform.

**Q: Do I really have to know which is
which? What difference does it make?**

A: You have to know which is which
because it makes a difference in your
probability calculations. You calculate
the expectation for linear transforms and
independent observations in the same
way, but there’s a big difference in the way
the variance is calculated. If you have n
independent observations then the variance
is n times the original. If you transform your
probability distribution as aX + b, then your
variance becomes $a^2$ times the original.

**Q: Can I have both independent
observations and linear transforms in the
same probability distribution?**

A: Yes you can. To work out the probability
distribution, just follow the basic rules for
calculating expectation and variance. You
use the same rules for both discrete and
continuous probability distributions.
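
As a rough illustration of mixing the two, suppose we wanted the total weight of four adults in kilograms rather than pounds. That scenario, the conversion factor of 0.4536, and the variable names below are all our own assumptions. Converting units is a linear transform (a = 0.4536, b = 0), and adding up four separate adults is a case of independent observations, so the sketch applies one rule and then the other.

```python
from math import sqrt
from statistics import NormalDist

LB_TO_KG = 0.4536                 # approximate conversion factor (our assumption)
n, mean_lb, var_lb = 4, 180, 625  # four adults, each X ~ N(180, 625) in pounds

# Linear transform first: weight in kg is aX + b with a = LB_TO_KG, b = 0.
mean_kg = LB_TO_KG * mean_lb
var_kg = LB_TO_KG ** 2 * var_lb

# Then independent observations: add up four separate adults.
total = NormalDist(n * mean_kg, sqrt(n * var_kg))

# The 800-pound limit, expressed in kilograms.
limit_kg = LB_TO_KG * 800
z = (limit_kg - total.mean) / total.stdev
p = total.cdf(limit_kg)

print(f"z = {z:.2f}, P(total < limit) = {p:.4f}")
```

Changing the units shouldn't change the probability, and it doesn't: the standard score is still 1.6, just as it is when you work in pounds.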

## The Normal Distribution Exposed

Us: Hey, Normal, glad you could make it
on the show.

Normal: Thanks for inviting me, Head First.

Us: Now, my first question is about your
name. Why are you called Normal?

Normal: It’s really because I’m so representative
of a lot of types of data. They have a probability
distribution with a distinctive, smooth,
bell-curved shape, and that’s me. I’m
something of an ideal.

Us: Can you give me an example?

Normal: Sure. Imagine you have a baker’s shop that
sells loaves of bread. Now, each loaf of a particular
sort of bread should theoretically weigh about the
same, but in practice, the actual weight of each loaf
of bread will vary.

Us: But surely they’ll all weigh about the
same?

Normal: More or less, but with variation. I model
that variation.

Us: So why’s that so important?

Normal: Well, it means that you can use me to work
out probabilities. Say you want to find the probability
of a randomly chosen loaf of bread being below a
particular weight. That sounds like something that
could be quite difficult, but with me, it’s easy.

Us: Easy? How do you mean?

Normal: With a lot of the other probability
distributions, there can be lots of complicated
calculations involved. With Binomial you have
factorials, and with Poisson you have to work with
exponentials. With me there’s none of that. Just look
me up in a table and away you go.

Us: Surely it’s not quite as simple as that?

Normal: Well, you do have to convert me to a
standard score first, but that’s nothing, not in the
grand scheme of things.

Us: So tell me, do you think you’re better
than the other probability distributions?

Normal: I wouldn’t say that I’m better as such,
but I’m a lot more flexible, and I’m useful in lots
of situations. I’m also a lot more robust. When
the numbers get high for Poisson and Binomial
distributions, they run into trouble. Mind you, I do
what I can to help out.

Us: You do? How?

Normal: Well under certain circumstances both
Binomial and Poisson look like me. It’s uncanny;
they’re often stopped at parties by people asking
them if they’re Normal. I tell them to take it as a
compliment.

Us: So how does that help?

Normal: Well, because they look like me, it means
that you can actually use my probability tables to
work out their probabilities. How cool is that? No
more late nights slaving over a calculator; just look it
up.

Us: I’m afraid that’s all we’ve got time for
tonight. Normal, thanks for coming along, it’s been a
pleasure.

Normal: You’re welcome, Head First.