Unlikely events happen, but what are the consequences? So far we’ve looked at how probabilities tell you how likely certain events are. What
probability doesn’t tell you is the overall impact of these events, and what it means
to you. Sure, you’ll sometimes make it big on the roulette table, but is it really worth it
with all the money you lose in the meantime? In this chapter, we’ll show you how you
can use probability to predict long-term outcomes, and also measure the certainty
of these predictions.

# Back at Fat Dan’s Casino

Have you ever felt mesmerized by the
flashing lights of a slot machine? Well,
you’re in luck. At Fat Dan’s Casino, there’s
a full row of shiny slot machines just waiting
to be played. Let’s play one of them, which
costs $1 per game (pull of the lever). Who
knows, maybe you’ll hit the jackpot!

The slot machine has three windows, and
if all three windows line up in the right way,
the cash will come cascading out.

![1](001.png)

The amount of money
you can win looks tempting, but
I’d like to know the probability of
getting any of these combinations
before playing.

This sounds like something we can calculate.
Here are the probabilities of a particular image
appearing in a particular window:

| $ | Cherry | Lemon | Other |
|------|------|------|------|
| 0.1 | 0.2 | 0.2 | 0.5 |

The three windows are independent of each other,
which means that the image that appears in one of
the windows has no effect on the images that appear
in any of the others.

## BE the gambler
Your job is to play like you’re the
gambler and work out the probability of getting
each combination on the poster. What’s the
probability of not winning anything?

Exercise 01

## We can compose a probability distribution for the slot machine

Here are the probabilities of the different winning combinations
on the slot machine.

| Combination | None | Lemons | Cherries | Dollars/cherry | Dollars |
|------|------|------|------|------|------|
| Probability  | 0.977 | 0.008 | 0.008 | 0.006 | 0.001 |
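
If you'd like to check these figures, here's a minimal Python sketch. The poster only appears as an image, so the winning combinations below are an assumption: three dollars, two dollars plus a cherry in any order, three cherries, and three lemons. The variable names are made up for illustration.

```python
# Probability of each image appearing in any one window (from the table above).
p_dollar, p_cherry, p_lemon, p_other = 0.1, 0.2, 0.2, 0.5

# Sanity check: the four image probabilities should cover every possibility.
assert abs(p_dollar + p_cherry + p_lemon + p_other - 1) < 1e-9

# ASSUMED winning combinations. The windows are independent,
# so the probabilities for each window multiply together.
p_dollars = p_dollar ** 3                        # dollar, dollar, dollar
p_dollars_cherry = 3 * p_dollar ** 2 * p_cherry  # two dollars, one cherry (cherry in any of 3 windows)
p_cherries = p_cherry ** 3                       # cherry, cherry, cherry
p_lemons = p_lemon ** 3                          # lemon, lemon, lemon

# Not winning anything is the complement of all the winning combinations.
p_none = 1 - (p_dollars + p_dollars_cherry + p_cherries + p_lemons)

for name, prob in [
    ("None", p_none),
    ("Lemons", p_lemons),
    ("Cherries", p_cherries),
    ("Dollars/cherry", p_dollars_cherry),
    ("Dollars", p_dollars),
]:
    print(f"{name}: {prob:.3f}")
```

Under those assumptions, the printed values line up with the table.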

This looks useful, but I wonder if we can take it one
step further. We’ve found the probabilities of getting
each of the winning combinations, but what we’re
really interested in is how much we’ll win or lose.

We don’t just want to know the probability of
winning, we want to know how much we stand to
win.

The probabilities are currently written in terms of combinations of
symbols, which makes it hard to see at a glance what our gain will be.

We don’t have to write them like this though. Instead of writing the
probabilities in terms of slot machine images, we can write them in
terms of how much we win or lose on each game. All we need to do
is take the amount we’ll win for each combination, and subtract the
amount we’ve paid for the game.

| Combination | None | Lemons | Cherries | Dollars/cherry | Dollars |
|------|------|------|------|------|------|
| Gain in \\$  | -1 | 4 | 9 | 14 | 19 |
| Probability  | 0.977 | 0.008 | 0.008 | 0.006 | 0.001 |

The table gives us the probability distribution of the
winnings, a set of the probabilities for every possible gain
or loss for our slot machine.
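
The "winnings minus cost" step can be sketched in Python like this. The payout figures are an assumption: they're worked backwards from the gains in the table by adding the \\$1 cost back on, and the dictionary names are just for illustration.

```python
cost_per_game = 1  # dollars per pull of the lever

# ASSUMED payouts, inferred as (gain in the table) + (cost of the game).
payouts = {"None": 0, "Lemons": 5, "Cherries": 10, "Dollars/cherry": 15, "Dollars": 20}

# Probabilities of each combination, copied from the table.
probabilities = {
    "None": 0.977,
    "Lemons": 0.008,
    "Cherries": 0.008,
    "Dollars/cherry": 0.006,
    "Dollars": 0.001,
}

# Gain = amount won minus the amount paid to play.
gains = {combo: payout - cost_per_game for combo, payout in payouts.items()}

# The probability distribution maps each possible gain to its probability.
distribution = {gains[combo]: probabilities[combo] for combo in payouts}

# A valid probability distribution must account for every outcome.
total_probability = sum(distribution.values())

print(distribution)
print(f"Total probability: {total_probability:.3f}")
```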

## Probability Distributions Up Close

When you derived the probabilities of the slot machine, you calculated the
probability of making each gain or loss. In other words, you calculated the
probability distribution of a **random variable**, which is a variable that can
take on a set of values, where each value is associated with a specific probability.
In the case of Fat Dan’s slot machine, the random variable represents the
amount we’ll gain in each game.

When we want to refer to a random variable, it’s usual to represent it by a capital
letter, like X or Y. The particular values that the variable can take are represented
by a lowercase letter—for example, x or y. Using this notation, P(X = x) is a way
of saying “the probability that the variable X takes a particular value x.”

Here’s our slot machine probability distribution written using this notation:

| Combination | None | Lemons | Cherries | Dollars/cherry | Dollars |
|------|------|------|------|------|------|
| x  | -1 | 4 | 9 | 14 | 19 |
| P(X=x)  | 0.977 | 0.008 | 0.008 | 0.006 | 0.001 |

P(X=9) = 0.008

The probability of Random Variable X taking the value of 9 is 0.008

The variable is discrete. This means that it can only take exact values.

As well as giving a table of the probability distribution, we can also show the
distribution on a chart to help us visualize it. Here is a bar chart showing the slot
machine probabilities.

![002](002.png)

Why should I care about probability
distributions? All I want to know is
how much I’ll win on the slot machine.
Can I calculate that?

Once you’ve calculated a probability
distribution, you can use this information
to determine the expected outcome.
In the case of Fat Dan’s slot machine, we can use our
probability distribution to determine how much you can
expect to win or lose long-term.

**Q: Why couldn’t we have just used
the symbols instead of winnings? I’m not
sure we’ve really gained that much.**

A: We could have, but we can do more
things if we have numeric data because we
can use it in calculations. You’ll see shortly
how we can use numeric data to work out
how much we can expect to win on each
game, for instance. We couldn’t have done
that if we had just used symbols.

**Q: What if I want to show probability
distributions on a Venn diagram?**

A: It’s not that appropriate to show
probability distributions like that. Venn
diagrams and probability trees are useful if
you want to calculate probabilities. With a
probability distribution, the probabilities have
already been calculated.

**Q: Can you use any letter to represent
a variable?**

A: Yes, you can, as long as you don’t
confuse it with anything else. It’s most
common to use letters towards the end of
the alphabet, though, such as X and Y.

**Q: Should I use the same letter for the
variable and the values? Would I ever use
X for the variable and y for the values?**

A: Theoretically, there’s nothing to
stop you, but in practice you’ll find it more
confusing if you use different letters. It’s best
to stick to using the same letter for each.

**Q: You said that a discrete random
variable is one where you can say
precisely what the values are. Isn’t that
true of every variable?**

A: No, it’s not. With the slot machine
winnings, you know precisely what the
winnings are going to be for each symbol
combination. You can’t get any more precise,
and it wouldn’t matter how many times you
played. For each game the possible values
remain the same.
Sometimes you’re given a range of values
where any value within the range is possible.
As an example, suppose you were asked to
measure pieces of string that are between
10 inches and 11 inches long. The length
could be literally any value within that range.
Don’t worry about the distinction too much
for now; we’ll look at this in more detail
later on in the book. For now, every random
variable we look at will be discrete.

## Expectation gives you a prediction of the results…

You have a probability distribution for the amount you could
gain on the slot machines, but now you need to know how much
you can expect to win or lose long-term. You can do this by
calculating how much you can typically expect to win or lose in
each game. In other words, you can find the **expectation.**

The expectation of a variable X is a bit like the mean, but
for probability distributions. You even calculate it in a similar
way. To find the expectation, you multiply each value x by the
probability of getting that value, and then sum the results.


The expectation of a variable X is usually written E(X), but
you’ll sometimes see it written as μ, the symbol for the mean.
Think of the expectation and mean as twins separated at birth.

Here’s the equation for working out E(X):

$\mu = E(X) = \sum{xP(X=x)}$

Let’s use this to calculate the expectation of the slot machine
gain. Here’s a reminder of our probability distribution:

| x  | -1 | 4 | 9 | 14 | 19 |
|------|------|------|------|------|------|
| P(X=x)  | 0.977 | 0.008 | 0.008 | 0.006 | 0.001 |

E(X) = (-1 × 0.977) + (4 × 0.008) + (9 × 0.008) + (14 × 0.006) + (19 × 0.001)

= -0.977 + 0.032 + 0.072 + 0.084 + 0.019

= -0.77

This is the amount in \\$’s you can expect to gain on each pull of the lever—and it’s negative!

In other words, over a large number of games, you can expect
to lose \\$0.77 for each game. This means that if you played the
slot machine 100 times, you could expect to lose \\$77.
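
Here's a small Python sketch of the same calculation, followed by a rough simulation of the "large number of games" idea. The function name `expectation`, the seed, and the number of simulated games are arbitrary choices for illustration.

```python
import random

# Gain in dollars -> probability, from the slot machine's distribution.
distribution = {-1: 0.977, 4: 0.008, 9: 0.008, 14: 0.006, 19: 0.001}


def expectation(dist):
    """Return E(X): each value x times P(X = x), summed over all values."""
    return sum(x * p for x, p in dist.items())


mu = expectation(distribution)
print(f"E(X) = {mu:.2f}")
print(f"Expected gain over 100 games = {100 * mu:.0f}")

# Simulate many games to see the average gain per game settle near E(X).
rng = random.Random(0)  # fixed seed so the run is repeatable
n_games = 200_000
results = rng.choices(
    list(distribution.keys()), weights=list(distribution.values()), k=n_games
)
average_gain = sum(results) / n_games
print(f"Average gain per game over {n_games} simulated games = {average_gain:.3f}")
```

Any single run of the simulation will be a little off from E(X); the point is only that the average gets close when the number of games is large.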

## …and variance tells you about the spread of the results

The expectation tells you how much on average you can expect to win or
lose with each game. If you lost this amount every single time, where would
the fun be, and who would play?

Just because you can expect to lose each time you play doesn’t mean there
isn’t a small chance you’ll win big. Just like the mean, the expectation doesn’t
give the full story as the amount you stand to gain on each game could vary
a lot. How do you think we can measure this?

![3](003.png)

I wonder...if expectation
is like the mean, can we
use some sort of variance?
That’s what we did before.

Probability distributions have variance.
The expectation gives the typical or average value of
a variable but it doesn’t tell you anything about how
the values are spread out. For our slot machine, knowing
the spread will tell us more about the variation of our potential
winnings.

We can use variance to
measure this spread. Let’s see how we can do this.

## Variances and probability distributions

Earlier, we calculated the variance of a set of numbers. We worked
out $(x - \mu)^2$ for each number, and then we took the average of these results.

We can do something similar to work out the variance of a variable X. Instead
of finding the average of $(X - \mu)^2$, we find its expectation. We use this
formula:

$Var(X) = E(X - \mu)^2$

So how do we calculate $E(X - \mu)^2$?

Finding $E(X - \mu)^2$ is actually quite similar to finding E(X).

When we calculate E(X), we take each value in the probability distribution, multiply it by its probability, and then add the results together. In other words, we use the calculation

$E(X) = \sum{xP(X=x)}$

When we calculate the variance of X, we calculate $(x - \mu)^2$ for every value
x, multiply it by the probability of getting that value x, and then add the
results together.

$E(X-\mu)^2 = \sum{ (x-\mu)^2 P(X=x)}$

In other words, instead of multiplying x by its probability, you multiply $(x - \mu)^2$ by the probability of getting that value of x.

## Let’s calculate the slot machine’s variance

Let’s see if we can use this to calculate the variance of
the slot machine. To do this, we subtract μ from each
value, square the result, and then multiply each one by
the probability. As a reminder, E(X) or $\mu$ is -0.77.

| x  | -1 | 4 | 9 | 14 | 19 |
|------|------|------|------|------|------|
| P(X=x)  | 0.977 | 0.008 | 0.008 | 0.006 | 0.001 |

$Var(X) = E(X - \mu)^2$

$= (-1+0.77)^2 \times 0.977 + (4+0.77)^2 \times 0.008 + (9+0.77)^2 \times 0.008 + (14+0.77)^2 \times 0.006 + (19+0.77)^2 \times 0.001$

= 2.6971

This means that while the expectation of our winnings is -0.77, the
variance is 2.6971.
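
The same arithmetic as a short Python sketch (the helper names `expectation` and `variance` are just illustrative):

```python
# Gain in dollars -> probability, from the slot machine's distribution.
distribution = {-1: 0.977, 4: 0.008, 9: 0.008, 14: 0.006, 19: 0.001}


def expectation(dist):
    """E(X): sum of x * P(X = x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X): sum of (x - mu)^2 * P(X = x), where mu = E(X)."""
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


# Show each term of the sum so it can be compared with the working above.
mu = expectation(distribution)
for x, p in distribution.items():
    print(f"({x} - ({mu:.2f}))^2 * {p} = {(x - mu) ** 2 * p:.7f}")

var_x = variance(distribution)
print(f"Var(X) = {var_x:.4f}")
```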

What about the standard deviation? Can we calculate that too?

As well as having a variance, probability
distributions have a standard deviation.
It serves a similar function to the standard deviation of a set of values.
It’s a way of measuring how far away from the center you can expect
your values to be.

As before, the standard deviation is calculated by taking the square
root of the variance like this:

$\sigma = \sqrt{Var(X)}$

This means that the standard deviation of the slot machine winnings is $\sqrt{2.6971}$, or 1.642. You can think of this as a measure of how far from the expectation of -0.77 our winnings per game tend to be.
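
As a quick check, here's a minimal Python sketch that goes from the distribution to the standard deviation in one go. The function name `standard_deviation` is an illustrative choice.

```python
import math

# Gain in dollars -> probability, from the slot machine's distribution.
distribution = {-1: 0.977, 4: 0.008, 9: 0.008, 14: 0.006, 19: 0.001}


def standard_deviation(dist):
    """Square root of Var(X), where Var(X) = sum of (x - mu)^2 * P(X = x)."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return math.sqrt(var)


sigma = standard_deviation(distribution)
print(f"sigma = {sigma:.3f}")
```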

Would you prefer to play on a slot machine
with a high or low variance? Why?

**Q: So expectation is a lot like the
mean. Is there anything for probability
distributions that’s like the median or
mode?**

A: You can work out the most likely
value, which would be a bit like the
mode, but you won’t normally have to do this.
When it comes to probability distributions,
the measure that statisticians are most
interested in is the expectation.

**Q: Shouldn’t the expectation be one of
the values that X can take?**

A: It doesn’t have to be. Just as the mean
of a set of values isn’t necessarily the same
as one of the values, the expectation of a
probability distribution isn’t necessarily one
of the values X can take.

**Q: Are the variance and standard
deviation the same as we had before
when we were dealing with values?**

A: They’re the same, except that this time
we’re dealing with probability distributions.
The variance and standard deviation of a
set of values are ways of measuring how
far values are spread out from the mean.
The variance and standard deviation of
a probability distribution measure how
the probabilities of particular values are
dispersed.

**Q: I find the concept of $E(X - \mu)^2$
confusing. Is it the same as finding
$E(X - \mu)$ and then squaring the end result?**

A: No, these are two different calculations.
$E(X - \mu)^2$ means that you find the square of
$X - \mu$ for each value of X, and then find the
expectation of all the results. If you calculate
$E(X - \mu)$ and then square the result, you’ll get
a completely different answer.
Technically speaking, you’re working out
$E((X - \mu)^2)$, but it’s not often written that way.
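
To see the difference with real numbers, here's a small Python sketch using the slot machine distribution. The variable names are only for illustration.

```python
# Gain in dollars -> probability, from the slot machine's distribution.
distribution = {-1: 0.977, 4: 0.008, 9: 0.008, 14: 0.006, 19: 0.001}

mu = sum(x * p for x, p in distribution.items())

# Square each (x - mu) first, THEN take the expectation: this is the variance.
expectation_of_squares = sum((x - mu) ** 2 * p for x, p in distribution.items())

# Take the expectation of (x - mu) first, THEN square the single result.
# The deviations from the mean cancel out, so this is zero apart from
# floating-point rounding.
square_of_expectation = sum((x - mu) * p for x, p in distribution.items()) ** 2

print(f"E((X - mu)^2) = {expectation_of_squares:.4f}")
print(f"(E(X - mu))^2 = {square_of_expectation:.4f}")
```

Squaring first keeps every deviation positive, which is why the first figure measures spread and the second doesn't.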

**Q: So what’s the difference between a
slot machine with a low variance and one
with a high variance?**

A: A slot machine with a high variance
means that there’s a lot more variability in
your overall winnings. The amount you could
win overall is less predictable.
In general, the smaller the variance is, the
closer your average winnings per game are
likely to be to the expectation. If you play on
a slot machine with a larger variance, your
overall winnings will be less reliable.

## Every pull of the lever is an independent observation

When we play multiple games on the slot machine, each game
is called an **event**, and the outcome of each game is called an
**observation**. Each observation has the same expectation and
variance, but their outcomes can be different. You may not gain
the same amount in each game.

We need some way of differentiating between the different
games or observations. If the probability distribution of the slot
machine gains is represented by X, we call the first observation
$X_1$ and the second observation $X_2$.

![4](004.png)

$X_1$ and $X_2$ have the same probabilities, possible values,
expectation and variance as X. In other words, they have the
same probability distribution, even though they are separate
observations and their outcomes can be different.

![5](005.png)

When we want to find the expectation and variance of two
games on the slot machine, what we really want to find is
the expectation and variance of $X_1 + X_2$. Let’s take a look at
some shortcuts.

### Observation shortcuts

Let’s find the expectation and variance of $X_1 + X_2$.

### Expectation
First of all, let’s deal with $E(X_1 + X_2)$.

$E(X_1 + X_2) = E(X_1) + E(X_2)$

$= E(X) + E(X)$

$= 2E(X)$

$E(X_1)$ and $E(X_2)$ are both
equal to E(X) as $X_1$ and $X_2$
follow the same probability
distribution as X.

In other words, if we want the expectation of two observations, we
multiply E(X) by 2. This means that if we were to play two games
on a slot machine where E(X) = -0.77, the expectation would be
-0.77×2, or -1.54.

We can extend this to deal with multiple observations. If we want to
find the expectation of n observations, we can use

$E(X_1 + X_2 + ... + X_n) = nE(X)$

$X_1 + X_2$ is not the same as 2X.

$X_1 + X_2$ means you are considering
two observations
of X. 2X means you have one
observation, but the possible
values have doubled.

### Variance
So what about $Var(X_1 + X_2)$? Here’s the calculation.

$Var(X_1 + X_2) = Var(X_1) + Var(X_2)$<br>
= Var(X) + Var(X)<br>
= 2Var(X)<br>

$Var(X_1)$ and $Var(X_2)$ are the same as Var(X) as $X_1$ and $X_2$ follow the same probability distribution as X.

This means that if we were to play two games on a slot machine where
Var(X) = 2.6971, the variance would be 2.6971×2, or 5.3942.

We can extend this for any number of independent observations. If we
have n independent observations of X, then

$Var(X_1 + X_2 + ... + X_n) = nVar(X)$

In other words, to find the expectation and variance of multiple
observations, just multiply E(X) and Var(X) by the number of observations.
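
If you want to convince yourself that the shortcuts work, one way is to build the full distribution of two independent games the long way and compare. This is only a sketch; the helper `sum_of_independent` is a made-up name for illustration.

```python
from collections import defaultdict
from itertools import product

# Gain in dollars -> probability, from the slot machine's distribution.
distribution = {-1: 0.977, 4: 0.008, 9: 0.008, 14: 0.006, 19: 0.001}


def expectation(dist):
    return sum(x * p for x, p in dist.items())


def variance(dist):
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def sum_of_independent(dist_a, dist_b):
    """Distribution of A + B for independent A and B.

    Every pair of outcomes is possible, and because the games are
    independent the probability of a pair is the product of the two.
    """
    total = defaultdict(float)
    for (a, pa), (b, pb) in product(dist_a.items(), dist_b.items()):
        total[a + b] += pa * pb
    return dict(total)


two_games = sum_of_independent(distribution, distribution)  # X1 + X2

print(f"E(X1 + X2)   = {expectation(two_games):.4f}")
print(f"2E(X)        = {2 * expectation(distribution):.4f}")
print(f"Var(X1 + X2) = {variance(two_games):.4f}")
print(f"2Var(X)      = {2 * variance(distribution):.4f}")
```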

**Q: Isn’t $E(X_1 + X_2)$ the same as E(2X)?**

A: They look similar but they’re actually
two different concepts.

With E(2X), you want to find the expectation
of a variable where the underlying values
have been doubled. In other words, there’s
only one variable, but the values are twice
the size.

With $E(X_1 + X_2)$, you’re looking at two
separate instances of X, and you’re looking
at the joint expectation. As an example, if X
represents the distribution of a game, then
$X_1 + X_2$ represents the distribution of two
games.
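
A short Python sketch makes the distinction concrete. Here 2X is modelled by doubling every value of one game, while the figures for $X_1 + X_2$ come from the shortcut for two independent games. The two expectations turn out to be the same number, but the variances don't, which shows these really are different distributions.

```python
# Gain in dollars -> probability, from the slot machine's distribution.
distribution = {-1: 0.977, 4: 0.008, 9: 0.008, 14: 0.006, 19: 0.001}


def expectation(dist):
    return sum(x * p for x, p in dist.items())


def variance(dist):
    mu = expectation(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


# 2X: still ONE game, but every possible gain is twice the size.
doubled = {2 * x: p for x, p in distribution.items()}

# X1 + X2: TWO independent games, using the shortcuts for observations.
e_two_games = 2 * expectation(distribution)
var_two_games = 2 * variance(distribution)

print(f"E(2X) = {expectation(doubled):.4f}, E(X1 + X2) = {e_two_games:.4f}")
print(f"Var(2X) = {variance(doubled):.4f}, Var(X1 + X2) = {var_two_games:.4f}")
```

Doubling the values multiplies the variance by 4 (because the deviations are squared), whereas playing two independent games only multiplies it by 2.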

**Q: So are $X_1$ and $X_2$ the same?**

A: They follow the same distribution, but
they’re different instances or observations.
As an example, $X_1$ could refer to game
1, and $X_2$ to game 2. They both have the
same probability distribution, but the actual
outcome of each might be different.

## New slot machine on the block

Fat Dan has brought in a new model slot machine. Each
game costs more, but if you win you’ll win big. Here’s the
probability distribution:

| x  | -5 | 395 |
|------|------|------|
| P(X=x)  | 0.99 | 0.01 |

Each game costs more
than the other slot
machine, but just look
at the jackpot!

We’ve looked at the expectation and variance of playing a
single machine, and also for playing several independent games
on the same machine. What happens if we play two different
machines at once?

In this situation, we have two different, independent probability
distributions for our machines:

These are the current
gains of Fat Dan’s new
slot machine:

| x  | -5 | 395 |
|------|------|------|
| P(X=x)  | 0.99 | 0.01 |

These are the
current gains of our
original slot machine:

| y  | -2 | 23 | 48 | 73 | 98 |
|------|------|------|------|------|------|
| P(Y=y)  | 0.977 | 0.008 | 0.008 | 0.006 | 0.001 |

So how can we find the expectation and variance of playing
one game each on both machines?

We could work out the probability
distribution of X + Y, but that would be time-consuming,
and we might make a mistake. I
wonder if we can take another shortcut?

## Add E(X) and E(Y) to get E(X + Y)…

We want to find the expectation and variance of playing one game each
on both of the slot machines. In other words, we want to find E(X + Y)
and Var(X + Y) where X and Y are random variables representing the
two machines. X and Y are independent.

One way of doing this would be to calculate the probability distribution
of X + Y, and then calculate the expectation and variance.

![6](006.png)

Fortunately we don’t have to do this. To find E(X + Y), all we
need to do is add together E(X) and E(Y).

E(X + Y) = E(X) + E(Y)

Intuitively this makes sense. If, for example, you were playing
two games where you would expect to win \\$5 in one game
and \\$10 in the other, you would expect to win \\$15 overall—
\\$5 + \\$10.

We can do something similar with the variance. To find
Var(X + Y), we add the two variances together. This works for
all independent random variables.

Var(X + Y) = Var(X) + Var(Y)

![7](007.png)

Adding the
variances
together only
works for
independent
random variables

If X and Y are not independent,
then Var(X + Y) is no longer
equal to Var(X) + Var(Y).
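
Here's a sketch that compares the shortcut with the long way round for the two machines. It assumes X stands for the new machine and Y for the original machine with its current gains; the helper `sum_of_independent` is a made-up name for illustration.

```python
from collections import defaultdict
from itertools import product

new_machine = {-5: 0.99, 395: 0.01}  # assumed to be X
original_machine = {-2: 0.977, 23: 0.008, 48: 0.008, 73: 0.006, 98: 0.001}  # assumed to be Y


def expectation(dist):
    return sum(v * p for v, p in dist.items())


def variance(dist):
    mu = expectation(dist)
    return sum((v - mu) ** 2 * p for v, p in dist.items())


def sum_of_independent(dist_a, dist_b):
    """Distribution of A + B, multiplying probabilities because A and B are independent."""
    total = defaultdict(float)
    for (a, pa), (b, pb) in product(dist_a.items(), dist_b.items()):
        total[a + b] += pa * pb
    return dict(total)


# The shortcut: just add the expectations and add the variances.
shortcut_e = expectation(new_machine) + expectation(original_machine)
shortcut_var = variance(new_machine) + variance(original_machine)

# The long way: build the distribution of X + Y, then calculate from it.
both = sum_of_independent(new_machine, original_machine)

print(f"E(X) + E(Y)     = {shortcut_e:.4f}   E(X + Y)   = {expectation(both):.4f}")
print(f"Var(X) + Var(Y) = {shortcut_var:.4f}   Var(X + Y) = {variance(both):.4f}")
```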

## …and subtract E(X) and E(Y) to get E(X – Y)

You’re not just limited to adding random variables; you
can also subtract one from the other. Instead of using the
probability distribution of X + Y, we can use X – Y.

If you’re dealing with the difference between two random
variables, it’s easy to find the expectation. To find E(X – Y),
we subtract E(Y) from E(X).

Finding the variance of X – Y is less intuitive. To find
Var(X – Y), we add the two variances together.

E(X - Y) = E(X) – E(Y)

Var(X - Y) = Var(X) + Var(Y)

(We add the variances, so be careful!)

But that doesn’t make
sense. Why should we
add the variances?

Because the variability increases.
When we subtract one random variable
from another, the variance of the probability
distribution still increases.

If you’re
subtracting
two random
variables, add
the variances.

It’s easy to make this
mistake as at first glance it
seems counterintuitive. Just
remember that if the two
variables are independent,
Var(X - Y) = Var(X) + Var(Y)

Subtracting independent
random variables still
increases the variance.

When we subtract independent random variables, the
variance is exactly the same as if we’d added them together.
The amount of variability can only increase.
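
If adding the variances for a subtraction still feels wrong, you can check it by building the distribution of X – Y directly. This sketch assumes X is the new machine and Y is the original machine with its current gains; `difference_of_independent` is a made-up helper name.

```python
from collections import defaultdict
from itertools import product

new_machine = {-5: 0.99, 395: 0.01}  # assumed to be X
original_machine = {-2: 0.977, 23: 0.008, 48: 0.008, 73: 0.006, 98: 0.001}  # assumed to be Y


def expectation(dist):
    return sum(v * p for v, p in dist.items())


def variance(dist):
    mu = expectation(dist)
    return sum((v - mu) ** 2 * p for v, p in dist.items())


def difference_of_independent(dist_a, dist_b):
    """Distribution of A - B for independent A and B."""
    diff = defaultdict(float)
    for (a, pa), (b, pb) in product(dist_a.items(), dist_b.items()):
        diff[a - b] += pa * pb
    return dict(diff)


x_minus_y = difference_of_independent(new_machine, original_machine)

# Expectations subtract...
print(f"E(X - Y)   = {expectation(x_minus_y):.4f}")
print(f"E(X) - E(Y) = {expectation(new_machine) - expectation(original_machine):.4f}")

# ...but variances still add.
print(f"Var(X - Y)      = {variance(x_minus_y):.4f}")
print(f"Var(X) + Var(Y) = {variance(new_machine) + variance(original_machine):.4f}")
```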

![8](008.png)

**Q: So if X and Y are games, does
aX + bY mean a games of X and b games
of Y?**

A: aX + bY actually refers to two linear
transforms added together. In other words,
the underlying values of X and Y are
changed. This is different from independent
observations, where each game would be an
independent observation.

**Q: I can’t see when I’d ever want to
use X – Y. Does it have a purpose?**

A: X – Y is really useful if you want to find
the difference between two variables.
E(X – Y) is a bit like saying “What do you
expect the difference between X and Y to
be?”, and Var(X – Y) tells you the variance of that difference.

**Q: Why do you add the variances for
X – Y? Surely you’d subtract them?**

A: At first it sounds counterintuitive,
but when you subtract one variable from
another, you actually increase the amount
of variability, and so the variance increases.
The variability of subtracting a variable is
actually the same as adding it.
Another way of thinking of it is that
calculating the variance squares the
underlying values. Var(X + bY) is equal to
$Var(X) + b^2 Var(Y)$, and if b is -1, this gives us
Var(X - Y). As $(-1)^2 = 1$, this means that
Var(X - Y) = Var(X) + Var(Y).

**Q: Can we do this if X and Y aren’t
independent?**

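A: Only partly. The expectation rules still work:
E(X + Y) = E(X) + E(Y) and E(X – Y) = E(X) – E(Y)
hold whether or not X and Y are independent.
The variance rules only apply if X and
Y are independent. If you need to find the
variance of X + Y where there’s dependence,
you’ll have to calculate the probability
distribution from scratch.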
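
Here's a deliberately extreme, hypothetical example of dependence, sketched in Python. Suppose Y isn't a separate game at all, but always shows exactly the same result as X. Then X + Y is always double whatever X was, and adding the variances gives the wrong answer.

```python
# Gain in dollars -> probability for X, from the first slot machine.
distribution = {-1: 0.977, 4: 0.008, 9: 0.008, 14: 0.006, 19: 0.001}


def expectation(dist):
    return sum(v * p for v, p in dist.items())


def variance(dist):
    mu = expectation(dist)
    return sum((v - mu) ** 2 * p for v, p in dist.items())


# HYPOTHETICAL dependence: Y always equals X, so the only possible
# totals are x + x, each with the probability of that x.
dependent_sum = {x + x: p for x, p in distribution.items()}

var_from_scratch = variance(dependent_sum)          # calculated from the real distribution
var_if_independent = 2 * variance(distribution)     # what Var(X) + Var(Y) would claim

print(f"Var(X + Y) from scratch = {var_from_scratch:.4f}")
print(f"Var(X) + Var(Y)         = {var_if_independent:.4f}")
```

In this made-up case the true variance is twice what the shortcut suggests, which is why the distribution has to be worked out from scratch when there's dependence.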

**Q: It looks like the same rules apply
for X + Y as $X_1 + X_2$. Is this correct?**

A: Yes, that’s right, as long as X, Y, $X_1$
and $X_2$ are all independent.