Calculating probability distributions takes time. So far we’ve looked at how to calculate and use probability distributions, but wouldn’t it be
nice to have something easier to work with, or just quicker to calculate? In this chapter,
we’ll show you some special probability distributions that follow very definite patterns.
Once you know these patterns, you’ll be able to use them to calculate probabilities,
expectations, and variances in record time. Read on, and we’ll introduce you to the
geometric, binomial and Poisson distributions.

It’s time to exercise your probability skills. The probability of Chad
making a successful run down the slopes is 0.2 for any given trial
(assume trials are independent). What’s the probability he’ll need
two trials? What’s the probability he’ll make a successful run down
the slope in one or two trials? Remember, when he’s had his first
successful run, he’s going to stop.

Chad is remarkably resilient,
and any collisions in a given run
don’t affect his performance in
future trials.

Hint: You may want to draw a
probability tree to help visualize
the problem.

Here’s a probability tree for the first two trials, as these are all that’s needed to work out the
probabilities.

![1](001.png)

If we say X is the number of trials needed to get down the slopes, then

P(X = 1) = P(Success in trial 1)

= 0.2

P(X = 2) = P(Success in trial 2 ∩ Failure in trial 1)

= 0.2 x 0.8
= 0.16

P(X ≤ 2) = P(X = 1) + P(X = 2)

= 0.2 + 0.16

= 0.36

We can add these probabilities because the two events are mutually exclusive: Chad can’t need exactly one trial and exactly two trials at the same time.
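The tree calculation above can be checked with a few lines of Python. This is only a minimal sketch, and the variable names are our own rather than anything defined in the chapter.

```python
# Probability of a successful run on any single trial (from the problem).
p_success = 0.2
p_failure = 1 - p_success

# X = 1: success on the first trial.
p_x1 = p_success
# X = 2: failure on the first trial, then success on the second.
p_x2 = p_failure * p_success
# X = 1 and X = 2 cannot both happen, so their probabilities add.
p_x_le_2 = p_x1 + p_x2

print(round(p_x1, 2), round(p_x2, 2), round(p_x_le_2, 2))  # 0.2 0.16 0.36
```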

### We need to find Chad’s probability distribution

So far you’ve found the probability that Chad will need fewer than three
attempts to make it down the slope. But what if you needed to look at the
probability of him needing fewer than 10 attempts (for insurance reasons),
or even 20 or 100?

Rather than work out the probabilities from scratch every time, it would
be useful if we could use a probability distribution. To do this, we need
to work out the probability for every single possible number of attempts
Chad needs to get down the slope.

Hang on. If we have to work
out every single probability, we’ll
be here forever.

### There’s a problem because the number of possibilities is neverending.
Chad will continue with his attempts to make it down the slope until he is
successful. This could take him 1 attempt, 10 attempts, 100 attempts, or
even 1,000 attempts. There are no guarantees about exactly when Chad
will first successfully make it down the slopes.

So you expect me to come up
with the probability distribution
of something that’s neverending? Is
that your idea of a joke?

### Even though it’s neverending, there’s still a way of figuring out this type of probability distribution.
This is actually a special kind of probability distribution, with special
properties that make it easy to calculate probabilities, along with the
expectation and variance.
Let’s see if we can figure it out.

## There’s a pattern to this probability distribution

Let’s define the variable X to be the number of trials needed for
Chad to make a successful run down the slope. Chad only needs
to make one successful run, and then he’ll stop.

Let’s start off by examining the first four trials so that we can
calculate probabilities for the first four values of X. By doing this,
we can see if there’s some sort of pattern that will help us to easily
work out the probabilities of other values.

![2](002.png)

| x | P(X=x) |
|------|------|
| 1 | 0.2 |
| 2 | 0.8 × 0.2 = 0.16 |
| 3 | 0.8 × 0.8 × 0.2 = 0.128 |
| 4 | 0.8 × 0.8 × 0.8 × 0.2 = 0.1024 |

These probabilities are
calculated using the
probability tree.

Notice each probability is
composed by multiplying
different powers of 0.8 and
0.2 together.

When we use P(X = x), we’re using it to demonstrate x taking on any value
in the probability distribution. In the table above, we show various values
of x, and we calculate the probability of getting each of these values.

When we use P(X = r), x takes on the particular value r. We’re looking
for the probability of getting this specific value. It’s just that we haven’t
specified what the value of r is so that we can come up with a generalized
calculation for the probability.

It’s a bit like saying that x can take on any value, including the fixed value r.

## The probability distribution can be represented algebraically

As you can see, the probabilities of Chad’s snowboarding trials follow a
particular pattern. Each probability is a power of 0.8 multiplied by 0.2.
You can quickly work out the probabilities for any value r by using:

P(X = r) = $0.8^{r-1}$ × 0.2

In other words, if you want to find P(X = 100), you don’t have to draw an
enormous probability tree to work out the probability, or think your way
through exactly what happens in every trial. Instead, you can use:

P(X = 100) = $0.8^{99}$ × 0.2

We can generalize this even further. If the probability of success in a trial
is represented by p and the probability of failure is 1 - p, which we’ll call
q, we can work out any probability of this nature by using:

$P(X = r) = q^{r-1} × p$

This formula is called the **geometric distribution**.

(r - 1) failures and 1 success.
In our case, p = 0.2 and
q = 0.8.

q is equal to 1 - p. If p
represents the probability
of success, then q represents
the probability of failure.
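As a rough illustration, the formula can be turned into a small Python function. The name `geometric_pmf` is our own choice, not something from the chapter, and the sketch assumes r is a positive whole number.

```python
def geometric_pmf(r, p):
    """P(X = r): first success on trial r, after r - 1 failures."""
    if r < 1:
        raise ValueError("r must be a positive whole number")
    q = 1 - p  # probability of failure in a single trial
    return q ** (r - 1) * p

# Reproduce the table for Chad (p = 0.2).
for r in range(1, 5):
    print(r, round(geometric_pmf(r, 0.2), 4))

# P(X = 100) without drawing a 100-level probability tree.
print(geometric_pmf(100, 0.2))
```

The first four lines printed should match the table of probabilities worked out from the probability tree.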

**Q: What’s the point in generalizing
this? It’s just one particular problem
we’re dealing with.**

A: We’re generalizing it so that we can
apply the results to other similar problems. If
we can generalize the results for this kind of
problem, it will be quicker to use it for other
similar situations in the future.

**Q: You said we needed to find an
expression for P(X = r). What’s r?**

A: P(X = r) means “the probability that X
is equal to value r,” where r is the number of
trials we need to get the first success.
If you wanted to find, say, P(X = 20), you
could substitute 20 for r. This would give you
a quick way of finding the probability.

**Q: Why is it the letter r? Why not some
other letter?**

A: We used the letter r so that we could
generalize the result for any particular
number. We could have used practically any
other letter, but using r is common.

**Q: How can we have a probability
distribution if the number of possibilities
is endless?**

A: We don’t have to specify a probability
distribution by physically listing the
probability of every possible outcome. The
key thing is that we need a way of describing
every possibility, which we can do with a
formula for computing the probability.

**Q: Wouldn’t Chad’s snowboarding
skills eventually improve? Is it realistic to
say the probability of success is 0.2 for
every trial?**

A: It’s fair to think his skills might improve. But
in this problem, Chad is truly hapless when
it comes to snowboarding, and we have to
assume that his skills won’t improve. That
means the number of trials he needs for his first
success follows the geometric distribution.

## Geometric Distribution Up Close

We said that Chad’s snowboarding exploits are an example of the geometric
distribution. The geometric distribution covers situations where:

* You run a series of independent trials.
* There can be either a success or failure for each trial, and the probability of success is the same for each trial.
* The main thing you’re interested in is how many trials are needed in order to get the first successful outcome.

So if you have a situation that matches this set of criteria, you can use the
geometric distribution to help you take a few shortcuts. The important thing
to be aware of is that we use the word “success” to mean that the event
we’re interested in happens. If we’re looking for an event that has negative
connotations, in statistical terms it’s still counted as a success.

Let’s use the variable X to represent the number of trials needed to
get the first successful outcome—in other words, the number of trials
needed for the event we’re interested in to happen.

To find the probability of X taking a particular value r, you can get a quick
result by using:

$P(X = r) = p q^{r-1}$

where p is the probability of success, and q = 1 – p, the probability of failure.

In other words, to get a success on the rth attempt, there must first have been
(r – 1) failures.

The geometric distribution has a distinctive shape.

P(X = r) is at its highest when r = 1, and it gets lower and
lower as r increases. Notice that the probability of getting
a success is highest for the first trial. This means that the
mode of any geometric distribution is always 1,
as this is the value with the highest probability.

This may sound counterintuitive, but it’s most likely that
only one attempt will be needed for a successful outcome.

![3](003.png)

### The geometric distribution also works with inequalities

As well as finding exact probabilities for the geometric distribution, there’s also
a quick way of finding probabilities that deal with inequalities.

Let’s start with P(X > r).

P(X > r) is the probability that more than r trials will be needed in order to get
the first successful outcome. In order for more than r trials to be needed, this
means that the first r trials must have ended in failure. This means that you
find the probability by multiplying the probability of failure together r times.

$P(X > r) = q^r$

For the number of trials needed for a success to be greater
than r, there must have been r failures.

We don’t need p in this formula because
we don’t need to know exactly which trial
was successful, just that there must be
more than r trials.

We can use this to find $P(X \leqslant r)$, the probability that r or fewer trials are
needed in order for there to be a successful outcome.

If we add together $P(X \leqslant r)$ and P(X > r), the total must be 1. This means that

$P(X \leqslant r) + P(X > r) = 1$

or

$P(X \leqslant r) = 1 - P(X > r)$

This gives us

$P(X \leqslant r) = 1 - q^r$
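One way to convince yourself that the shortcut works is to compare it with the long way of adding up individual probabilities. The sketch below does this for p = 0.2; the function names are hypothetical helpers of our own.

```python
def geometric_pmf(r, p):
    """P(X = r): r - 1 failures followed by a success."""
    return (1 - p) ** (r - 1) * p

def prob_more_than(r, p):
    """P(X > r): the first r trials all fail."""
    return (1 - p) ** r

def prob_at_most(r, p):
    """P(X <= r) = 1 - P(X > r)."""
    return 1 - prob_more_than(r, p)

p = 0.2
# Summing the individual probabilities agrees with the shortcut.
long_way = sum(geometric_pmf(k, p) for k in range(1, 11))
print(round(long_way, 4), round(prob_at_most(10, p), 4))
print(round(prob_at_most(2, p), 2))  # 0.36, matching the earlier tree result
```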

If a variable X follows a geometric distribution where the probability of
success in a trial is p, this can be written as

$X \sim Geo(p)$

This is a quick way of saying “X follows a geometric
distribution where the probability of success is p.”

I’m getting bruised! How
many attempts do you
expect me to have to make
before I make it down the
slope OK?

## The pattern of expectations for the geometric distribution

So far we’ve found probabilities for the number of attempts Chad
needs to make before he successfully makes it down the slope, but what
if we want to find the expectation and variance? If we know the
expectation, for instance, we’ll be able to say how many attempts we
expect Chad to make before he’s successful.

As a reminder, expectation is the average
value that you expect to get, a bit like the
mean but for probability distributions.

Variance is a measure of how much you can
expect this to vary.

We find E(X) by calculating $\sum xP(X = x)$. The probabilities in this
case go on forever, but let’s start by working out the first few values to
see if there’s some sort of pattern.

Here are the first few values of x, where X ~ Geo(0.2)

![4](004.png)

Can you see what happens to the values of xP(X = x)?

The values of xP(X = x) start off small, and then they get larger until x = 5. When
x is larger than 5, the values start decreasing again, and keep on decreasing as x
gets larger. As x gets larger, xP(X = x) becomes smaller and smaller until it makes
virtually no difference to the running total.

We can see this more clearly if we chart the cumulative total of xP(X = x):

![5](005.png)

### Expectation is 1/p

Drawing the chart for the running total of xP(X = x) shows you that as x gets
larger, the running total gets closer and closer to a particular value, 5. In fact, the
running total of xP(X = x) for an infinite number of trials is 5 itself. This means
that

E(X) = 5

This makes intuitive sense. The probability of a successful outcome is 0.2. This is a
bit like saying that 1 in 5 attempts tend to be successful, so we can expect Chad to
make 5 attempts before he is successful.

We can generalize this for any value p. If X ~ Geo(p) then

$E(X) = \frac{1}{p}$ The expectation is 1 divided by the probability of success.
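The running total described above can be reproduced numerically. This is a sketch for X ~ Geo(0.2) that stops at 200 terms, a cutoff we picked ourselves because later terms are too small to matter.

```python
p = 0.2
q = 1 - p

running_total = 0.0
for x in range(1, 201):
    # Add the next term x * P(X = x) to the running total.
    running_total += x * q ** (x - 1) * p
    if x in (5, 10, 20, 50, 200):
        print(x, round(running_total, 4))

print(1 / p)  # the shortcut E(X) = 1/p
```

The printed totals should creep up toward 5, which is the value the shortcut gives directly.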

We’re not just limited to finding the expectation of the geometric distribution,
we can find the variance too.

Let’s see if we can find an expression for the variance of the
geometric distribution in the same way that we did for the
expectation. Complete the table below. What do you notice?

![6](006.png)

![7](007.png)

$x^2 P(X = x)$ gets larger and larger up until a certain point, and then it starts
decreasing again. Eventually it becomes very close to 0.

![8](008.png)

## Finding the variance for our distribution

So how does this help us find the variance of the number of trials it takes
Chad to make a successful run down the slopes?

We find the variance of a probability distribution by calculating

$Var(X) = E(X^2) - E^2(X)$

This means that we calculate $\sum x^2 P(X = x)$, and then subtract E(X) squared.
By graphing the running total against the values of x, you can see how the
result settles down as x increases. Here’s the graph of the running total of $x^2P(X = x)$, minus $E^2(X)$:

![9](009.png)

As x gets larger, the running total of $x^2P(X = x)$, minus $E^2(X)$, gets closer and closer to a
particular value, this time 20.

As with the expectation, we can generalize this. If X ~ Geo(p) then

$Var(X) = \frac{q}{p^2}$

Even though there’s no fixed number of
trials, you can still work out what the
expectation and variance are.
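The same numerical check works for the variance. The sketch below sums $x^2 P(X = x)$ for X ~ Geo(0.2), subtracts the square of the expectation, and compares the result with $q/p^2$. The cutoff of 400 terms is our own assumption.

```python
p = 0.2
q = 1 - p
expectation = 1 / p

# Approximate E(X^2) by summing the first 400 terms of x^2 * P(X = x).
sum_x2 = 0.0
for x in range(1, 401):
    sum_x2 += x ** 2 * q ** (x - 1) * p

variance_estimate = sum_x2 - expectation ** 2
print(round(variance_estimate, 4), q / p ** 2)
```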

## A quick guide to the geometric distribution

Here’s a quick summary of everything you could possibly need to know about the geometric distribution.

### When do I use it?

Use the geometric distribution if you’re running independent trials, each one can have a success or failure, and
you’re interested in how many trials are needed to get the first successful outcome.

### How do I calculate probabilities?

Use the following handy formulae. p is the probability of success in a trial, q = 1 - p, and X is the number of
trials needed in order to get the first successful outcome. We say X ~ Geo(p).

The probability of the first
success being in the r’th trial

$P(X = r) = p q^{r - 1}$

The probability you’ll need more than
r trials to get your first success

$P(X > r) = q^r$

The probability you’ll need r trials
or less to get your first success

$P(X \leqslant r) = 1 - q^r$

### What about the expectation and variance?
$E(X) = \frac{1}{p}$

$Var(X) = \frac{q}{p^2}$

**Q: Can I trust these formulae? Can
I use them any time I need to find
probabilities and expectations?**

A: You can use these shortcuts whenever
you’re dealing with the geometric distribution,
as they’re shortcuts for that probability
distribution. If you’re dealing with a situation
that can’t be modelled by the geometric
distribution, don’t use these shortcuts.
Remember, the geometric distribution is
used for situations where you’re running
independent trials (so the probability stays
the same for each one), each trial ends in
either success or failure, and the thing you’re
interested in is how many trials are needed
to get the first successful outcome.

**Q: What about if my circumstances
are different? What if I have a fixed
number of trials and I want to find the
number of successful outcomes?**

A: You can’t use the geometric
distribution to model this sort of situation, but
don’t worry, there are other methods.

**Q: Why does the distribution use the
letters p and q?**

A: The letter p stands for probability. In
this case, it’s the probability of getting a
successful outcome in one trial.
The letter q is often used in statistics to
represent 1 - p, or p'.

We’ve got some great questions for you today,
so let’s get started. In Round One I’m going to ask you
three questions, and for each question there are four possible
answers. You can quit now and walk away with the consolation
prize, but if you play on and beat your competitors, you’ll move
on to the next round and be one step closer to winning a swivel
chair. The title of Round One is “All About Me.” Good luck!

![10](010.png)

**Q: What’s a quiz show doing in the middle of my chapter? I
thought we were talking about probability distributions.**

A: We still are. This situation is ideal for another sort of probability
distribution. Keep reading and everything will become clear.

**Q: I don’t know the answers to these questions. What should
I do?**

A: If you don’t know the answers you’ll have to answer them at
random. Give it your best shot - you might win a swivel chair.

## Should you play, or walk away?

It’s unlikely you’ll know the game show host well enough to answer these
questions, so let’s see if we can find the probability distribution for the number
of questions you’ll get correct if you choose answers at random. That should
help you decide whether or not to play on.

Here’s a probability tree for the three questions:

![011](011.png)

What are the probabilities for this problem? What sort of pattern
can you see? We’re using X to represent the number of questions
you get correct out of three.

![012](012.png)
![013](013.png)

You’ve got a 42% chance of
getting one question right, and a
14% chance of getting two right.
Those aren’t bad odds. I suggest
you go for it and guess.

Think back to when you looked at permutations
and combinations. How do you think
they might help you with this sort of problem?

## Generalizing the probability for three questions

So far we’ve looked at the probability distribution of X, the number of
questions we answer correctly out of three.

Just as with the geometric distribution, there seems to be a pattern in
the way the probabilities are formed. Each probability contains different
powers of 0.75 and 0.25. As x increases, the power of 0.75 decreases
while the power of 0.25 increases.

In general, P(X = r) is given by:

$P(X = r) = ? × 0.25^r × 0.75^{3 - r}$

r is the number of
questions we get right

There are 3 questions

In other words, to find the probability of getting exactly r questions right,
we calculate $0.25^r$, multiply it by $0.75^{3-r}$, and then multiply the whole lot by
some number. But what?

What’s the missing number?

For each probability, we need to answer a certain number of questions
correctly, and there are different ways of achieving this. As an example,
there are three different ways of answering exactly one question correctly
out of three questions. Another way of looking at this is that there are 3
different combinations.

Just to remind you, a combination $^nC_r$ is the number of ways of choosing r
objects from n, without needing to know the exact order. This is exactly the
situation we have here. We need to choose r correct questions from 3.

This means that the probability of getting r questions correct out of 3 is
given by

$P(X = r) =  {^3C_r} × 0.25^r × 0.75^{3 - r}$

So, by this formula, the probability of getting 1 question
correct is:

$P(X = 1) =  {^3C_1} × 0.25^1 × 0.75^{3 - 1} = 0.422$

This is the same result we got using
our chart.
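The claim that the missing number is ${^3C_r}$ can be checked by brute force. The sketch below walks every path through the three-question tree, groups the paths by number of correct answers, and compares the totals with the formula. The variable names are our own.

```python
from itertools import product
from math import comb

p_correct = 0.25
p_wrong = 0.75

# Add up the probability of every path through the three-question tree,
# grouped by the number of correct answers on that path.
tree_totals = {r: 0.0 for r in range(4)}
for path in product([True, False], repeat=3):
    path_probability = 1.0
    for answered_correctly in path:
        path_probability *= p_correct if answered_correctly else p_wrong
    tree_totals[sum(path)] += path_probability

# Compare with the formula: 3Cr x 0.25^r x 0.75^(3 - r).
for r in range(4):
    formula = comb(3, r) * p_correct ** r * p_wrong ** (3 - r)
    print(r, round(tree_totals[r], 3), round(formula, 3))
```

Each row should show the same value twice, once from the tree and once from the formula.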

Round Two of Who Wants To Win A Swivel Chair
is called “More About Me.” This time I’ll ask you five
questions. As before, there are four possible answers
to each question. Do you want to play on?

It looks like these questions are just as obscure as the ones in the previous round, so
you’ll have to answer questions at random again.

Let’s see if we can work out the probability distribution for this new set of questions.

![14](014.png)

## Let’s generalize the probability further

So far you’ve seen that the probability of getting r questions correct out
of 3 is given by

$P(X = r) =  {^3C_r} × 0.25^r × 0.75^{3 - r}$

where the probability of answering a question correctly is 0.25, and the
probability of answering incorrectly is 0.75.

The next round of Who Wants To Win A Swivel Chair has 5 questions
instead of 3. Rather than rework this probability for 5 questions, let’s
rework it for n questions instead. That way we’ll be able to use the same
formula for every round of Who Wants To Win A Swivel Chair.

So what’s the formula for the probability of getting r questions right out of
n? It’s actually

$P(X = r) =  {^nC_r} × 0.25^r × 0.75^{n - r}$

What if the probability of getting a
question right changes? I wonder if we
can generalize this further.

Yes, we can generalize this further.
Imagine the probability of getting a question right is given by p, and
the probability of getting a question wrong is given by 1 – p, or q. The
probability of getting r questions right out of n is given by

$P(X = r) =  {^nC_r} × p^r × q^{n - r}$

This sort of problem is called the **binomial distribution**. Let’s take a
closer look.

## Binomial Distribution Up Close

Guessing the answers to the questions on Who Wants To Win A Swivel
Chair is an example of the binomial distribution. The binomial
distribution covers situations where

1. You’re running a series of independent trials.
2. There can be either a success or failure for each trial, and the probability of success is the same for each trial.
3. There is a fixed, finite number of trials.

1 and 2 are like the Geometric distribution.

3 is different.

Just like the geometric distribution, you’re running a series of independent
trials, and each one can result in success or failure. The difference is that
this time you’re interested in the number of successes.

Let’s use the variable X to represent the number of successful
outcomes out of n trials. To find the probability there are r successes,
use:

$P(X = r) =  {^nC_r}  p^r  q^{n - r}$

p is the probability of a successful outcome in each trial, and n is the number
of trials. We can write this as

X ~ B(n, p)
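As a sketch, the general formula translates into Python like this. `binomial_pmf` is a name we made up for illustration, and `math.comb` (Python 3.8 or later) supplies ${^nC_r}$.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p): exactly r successes in n trials."""
    q = 1 - p  # probability of failure in a single trial
    return comb(n, r) * p ** r * q ** (n - r)

# Round Two: five questions, each guessed with probability 0.25 of being right.
round_two = [binomial_pmf(r, 5, 0.25) for r in range(6)]
for r, probability in enumerate(round_two):
    print(r, round(probability, 3))
```

Because the function takes n and p as arguments, the same code covers any round of the quiz, not just the five-question one.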

The exact shape of the binomial distribution varies
according to the values of n and p. The closer to 0.5 p is,
the more symmetrical the shape becomes. In general it is
skewed to the right when p is below 0.5, and skewed to the
left when p is greater than 0.5.

![15](015.png)


## What’s the expectation and variance?

So far we’ve looked at how to use the binomial distribution to find basic
probabilities, which allows us to calculate the probability of getting a certain
number of questions correct. But how many questions can we actually expect to
get right if we choose the answers at random? That will help us decide
whether we should answer the next round of questions.

Let’s see if we can find a general expression for the expectation and variance.
We’ll start by working out the expectation and variance for a single trial, and then
see if we can extend it to n independent trials.

### Let’s look at one trial
Suppose we conduct just one trial. Each trial can only result in success or
failure, so in one trial, it’s possible to have 0 or 1 successes. If X ~ B(1, p),
the probability of 1 success is p, and the probability of 0 successes is q.

We can use this to find the expectation and variance of X. Let’s start with the expectation.

| x | 0 | 1 |
|------|------|------|
| P(X=x) | q | p |

E(X) = 0q + 1p
= p

$Var(X) = E(X^2) - E(X)^2$

$= (0^2 q + 1^2 p) - p^2$

$= p - p^2$

$= p(1 - p)$

= pq

So for a single trial, E(X) = p and Var(X) = pq. But what if there are n trials?

In general, what happens to the expectation and variance when there are n
independent observations? How can this help us now?
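One way to reason about this, sketched here with our own notation, is to write X as the sum of n independent single-trial variables $X_1, X_2, \dots, X_n$, each following B(1, p). Expectations of a sum always add, and variances add when the variables are independent, so:

$E(X) = E(X_1) + E(X_2) + \dots + E(X_n) = p + p + \dots + p = np$

$Var(X) = Var(X_1) + Var(X_2) + \dots + Var(X_n) = pq + pq + \dots + pq = npq$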

## Binomial expectation and variance

Let’s summarize what we just did. First of all, we looked at one trial, where
the probability of success is p, and where the distribution is binomial.
Using this, we found the expectation and variance of a single trial.

We then considered n independent trials, and used shortcuts to find the
expectation and variance of n trials. We found that if X ~ B(n, p)

E(X) = np

Var(X) = npq

These formulae work for any binomial distribution.

This is useful to know as it gives us a quick way of finding the expectation
and variance of any binomial distribution, without us having to work out
lots of individual probabilities.
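To see the shortcuts agree with the long way, the sketch below computes the expectation and variance of B(5, 0.25) by summing over every possible number of successes, then compares them with np and npq. The helper `binomial_pmf` is our own illustrative function.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 5, 0.25
q = 1 - p

# The long way: sum over every possible number of successes.
expectation = sum(r * binomial_pmf(r, n, p) for r in range(n + 1))
e_x_squared = sum(r ** 2 * binomial_pmf(r, n, p) for r in range(n + 1))
variance = e_x_squared - expectation ** 2

print(round(expectation, 4), n * p)      # both 1.25
print(round(variance, 4), n * p * q)     # both 0.9375
```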

**Q: The geometric distribution and the
binomial distribution seem similar. What’s
the difference between them? Which one
should I use when?**

A: The geometric and binomial
distributions do have some things
in common. Both of them deal with
independent trials, and each trial can result
in success or failure. The difference between
them lies in what you actually need to find
out, and this dictates which probability
distribution you need to use.

If you have a fixed number of trials and you
want to know the probability of getting a
certain number of successes, you need to
use the binomial distribution. You can also
use this to find out how many successes you
can expect to have in your n trials.

If you’re interested in how many trials you’ll
need before you have your first success,
then you need to use the geometric
distribution instead.

**Q: The geometric distribution has a
mode. Does the binomial distribution?**

A: Yes, it does. The mode of a probability
distribution is the value with the highest
probability. If p is 0.5 and n is even, the
mode is np. If p is 0.5 and n is odd it has two
modes, the two values either side of np. For
other values of n and p, finding the mode is
a matter of trial and error, but it’s generally
fairly close to np.

**Q: So for both the geometric and the
binomial distributions you run a series
of trials. Does the probability of success
have to be the same for each trial?**

A: In order for the geometric or binomial
distribution to be applicable, the probability
of success in each trial must be the same.
If it’s not, then neither the geometric nor
binomial distribution is appropriate.

**Q: I’ve tried calculating E(X) and
it’s not a value that’s in the probability
distribution. Did I do something wrong?**

A: When you calculate E(X), the result
may not be a possible value in your
probability distribution. It may not be a value
that can actually occur. If you get a result
like this, it doesn’t mean that you’ve made a
mistake, so don’t worry.

## Your quick guide to the binomial distribution

Here’s a quick summary of everything you could possibly need to know about the binomial distribution.

### When do I use it?
Use the binomial distribution if you’re running a fixed number of independent trials, each one can have a success or failure, and you’re interested in the number of successes or failures.

### How do I calculate probabilities?

$P(X = r) =  {^nC_r}  p^r  q^{n - r}$

where p is the probability of success in a trial, q = 1 - p, n is the number of trials, and X is the number of
successes in the n trials.

### What about the expectation and variance?

E(X) = np 

Var(X) = npq

## The Statsville Cinema has a problem

It’s a fact of life that cinemagoers like popcorn.

The trouble is that the popcorn machine at the Statsville Cinema keeps
breaking down, and the customers aren’t happy.

The cinema has a big promotion on next week, and the cinema manager
needs everything to be perfect. He doesn’t want the popcorn machine to
break down during the week, or people won’t come back.

The mean number of popcorn machine malfunctions per week, or rate of
malfunctions, is 3.4. What’s the probability that it won’t break down at all
next week?

If they expect the machine to break down more than a few times next week,
the Statsville Cinema will buy a new popcorn machine, but if not, they’ll
stick with the current one and run the risk of a breakdown.

### It’s a different sort of distribution
This is a different sort of problem from the ones we’ve encountered so far.
This time there’s no series of attempts or trials. Instead, we have a situation
where we know the rate at which malfunctions happen, and where
malfunctions occur at random.

### So how do we find probabilities?
The trouble with this sort of problem is that while we know the mean
number of popcorn machine malfunctions per week, the actual number
of breakdowns varies each week. On the whole we can expect 3 or 4
malfunctions per week, but in a bad week there’ll be far more, and in a good
week there might be none at all.

We need to find the probability that the popcorn machine won’t break down
next week.

Sound difficult? Don’t worry, there’s a probability distribution that’s
designed for just this sort of situation. It’s called the **Poisson distribution**.

## Poisson Distribution Up Close

The Poisson distribution covers situations where:

1. Individual events occur at random and independently in a given interval. This can be an interval of time or space—for example, during a week, or per mile.
2. You know the mean number of occurrences in the interval or the rate of occurrences, and it’s finite. The mean number of occurrences is normally represented by the Greek letter $\lambda$ (lambda).

Let’s use the variable X to represent the number of occurrences in
the given interval, for instance the number of breakdowns in a week. If
X follows a Poisson distribution with a mean of $\lambda$ occurrences per interval
or rate, we write this as:

X ~ Po($\lambda$)

We’re not going to derive it here, but to find the probability that there are r
occurrences in a specific interval, use the formula:

$P(X = r) = \frac{e^{-\lambda} \lambda^r}{r!}$

e is a mathematical
constant. It’s approximately
2.718, so you
can just substitute in
this number for e in the
Poisson formula.

As an example, if X ~ Po(2)

$P(X=3) = \frac{e^{-2} 2^3}{3!}$

= 0.18
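Here is a minimal Python sketch of the Poisson formula. The function name `poisson_pmf` and the parameter name `lam` are our own choices; `lam` stands in for λ because `lambda` is a reserved word in Python.

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam): exactly r occurrences in the interval."""
    return exp(-lam) * lam ** r / factorial(r)

print(round(poisson_pmf(3, 2), 2))    # 0.18, the worked example
# Probability of no popcorn machine malfunctions in a week (rate 3.4).
print(round(poisson_pmf(0, 3.4), 3))  # 0.033
```

Using `math.exp` avoids having to type in a rounded value for e.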

So if X follows a Poisson distribution, what’s its expectation and variance?
It’s easier than you might think...

## Expectation and variance for the Poisson distribution

Finding the expectation and variance for the Poisson distribution is a lot easier
than finding it for other distributions.

If X ~ Po($\lambda$), E(X) is the number of occurrences we can expect to have in a
given interval, so for the popcorn machine, it’s the number of breakdowns we
can expect to have in a typical week. In other words, E(X) is the mean number
of occurrences in the given interval.

Now, if X ~ Po(λ), then the mean number of occurrences is given by λ. In
other words, E(X) is equal to λ, the parameter that defines our Poisson
distribution.

To make things even simpler, the variance of the Poisson distribution is also
given by λ, so if X ~ Po(λ)

E(X) = λ 

Var(X) = λ

In other words, if you’re given a Poisson distribution Po(λ), you don’t have to
calculate anything at all to find the expectation and variance. It’s the parameter
of the Poisson distribution itself.
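If you’d like to see this for yourself, the sketch below sums the Poisson probabilities for the popcorn machine’s rate of 3.4 and checks that both the expectation and the variance come out at λ. The cutoff at r = 100 and the names used are our own assumptions.

```python
from math import exp, factorial

def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam)."""
    return exp(-lam) * lam ** r / factorial(r)

lam = 3.4
# Terms beyond r = 100 are far too small to affect the totals here.
values = range(101)
expectation = sum(r * poisson_pmf(r, lam) for r in values)
e_x_squared = sum(r ** 2 * poisson_pmf(r, lam) for r in values)
variance = e_x_squared - expectation ** 2

print(round(expectation, 6), round(variance, 6))
```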

## What does the Poisson distribution look like?

The shape of the Poisson distribution varies depending on the value of λ. If
λ is small, then the distribution is skewed to the right, but it becomes more
symmetrical as λ gets larger.

If λ is an integer, then there are two modes, λ and λ - 1. If λ is not an integer,
then the mode is λ rounded down to the nearest whole number.
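A quick numerical check of the mode, as a sketch: the helper `most_likely_value` is hypothetical and simply searches the values 0 to 50 for the one with the highest probability.

```python
from math import exp, factorial, isclose

def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam)."""
    return exp(-lam) * lam ** r / factorial(r)

def most_likely_value(lam, upper=50):
    """Return the r in 0..upper with the highest probability (first one on ties)."""
    return max(range(upper + 1), key=lambda r: poisson_pmf(r, lam))

print(most_likely_value(3.4))  # 3
# With a whole-number rate, two neighbouring values tie for the top spot.
print(isclose(poisson_pmf(1, 2), poisson_pmf(2, 2)))  # True
```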

![016](016.png)

**Q: How come we use λ to represent
the mean for the Poisson distribution?
Why not use μ like we do elsewhere?**

A: We use λ because for the Poisson
distribution, the parameter of the distribution,
expectation and variance are all the same.
It’s a way of making sure we keep everything
neutral.

**Q: Where does the formula for the
Poisson distribution come from?**

A: It can actually be derived from the
other distributions, but the mathematics
are quite involved. In practice it’s best to
just accept the formula, and remember the
situations in which it’s useful.

**Q: What’s the difference between
the Poisson distribution and the other
probability distributions?**

A: The key difference is that the Poisson
distribution doesn’t involve a series of
trials. Instead, it models the number of
occurrences in a particular interval.

**Q: Does λ have to be an integer?**

A: Not at all. λ can be any non-negative
number. It can’t be negative as it’s the mean
number of occurrences in an interval, and
it doesn’t make sense to have a negative
number of occurrences.

**Q: What’s that “e” in the formula all
about?**

A: e is a constant in mathematics that
is approximately equal to 2.718. So you can
substitute in 2.718 for e in the formula for
calculating Poisson probabilities.
The constant e is used frequently in calculus,
and it also has many other applications
in everything from calculating compound
interest to advanced probability theory.
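If you're curious about where the formula comes from but don't want to wade through the derivation, a numerical check can give you a feel for it. The Poisson distribution is what a binomial distribution approaches when the number of trials n gets very large while the success probability λ/n gets very small. The sketch below compares the two for one value of r; the function names and the particular values of n are assumptions chosen for illustration.

```python
import math

def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** r / math.factorial(r)

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p)."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

lam, r = 3.4, 2
for n in (10, 100, 10_000):
    # Keep the mean n * p fixed at lam while n grows.
    print(n, round(binomial_pmf(r, n, lam / n), 5))
print("Poisson", round(poisson_pmf(r, lam), 5))
```

As n grows, the binomial probabilities should settle toward the Poisson value on the last line.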

## The Statsville Cinema has another problem.
It’s not just the popcorn machine that keeps breaking down; now the drinks
machine has begun malfunctioning too. The mean number of breakdowns
per week of the drinks machine is 2.3.

The cinema manager can’t afford for anything to go wrong next week when
the promotion is on. What’s the probability that there will be no breakdowns
next week, with either the popcorn machine or the drinks machine?

What’s the probability distribution of the drinks
machine? How can we find the probability that
neither the popcorn machine nor the drinks
machine goes wrong next week?

## So what’s the probability distribution?

Let’s take a closer look at this situation.
We have two machines, a popcorn machine and a drinks machine, and we know
the mean number of breakdowns of each machine in a week. We want to find
the probability that there will be no breakdowns next week.

Here are the distributions of the two machines:

The mean number of
breakdowns per week of
the popcorn machine is 3.4.

X ~ Po(3.4)

The mean number of
breakdowns per week of
the drinks machine is 2.3.

Y ~ Po(2.3)

If X represents the number of breakdowns of the popcorn machine and Y
represents the number of breakdowns of the drinks machine, then both X and
Y follow Poisson distributions. What’s more, X and Y are independent. In other
words, the popcorn machine breaking down has no impact on the probability
that the drinks machine will malfunction, and the drinks machine breaking down
has no impact on the probability that the popcorn machine will malfunction.

We need to find the probability that the total number of malfunctions next week
is 0. In other words, we need to find

P(X + Y = 0)

Think back to the chapter on probabilities. If X and Y are independent variables,
how can we find probabilities for X + Y?

## Combine Poisson variables

You saw in previous chapters that if X and Y are independent random
variables, then

E(X + Y) = E(X) + E(Y)

Var(X + Y) = Var(X) + Var(Y)

This means that if X ~ Po($\lambda_x$) and Y ~ Po($\lambda_y$),

X + Y ~ Po($\lambda_x$ + $\lambda_y$)

In other words, if X and Y are independent and both follow Poisson
distributions, then so does X + Y. We can use our knowledge of the way
both X and Y are distributed to find probabilities for X + Y.
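One way to check this result numerically is to work out P(X + Y = r) the long way, by adding up every way of splitting r between X and Y, and compare it with the probability from Po($\lambda_x$ + $\lambda_y$). This is only a sketch: the function names are assumptions, and the values 3.4 and 2.3 are the cinema's two breakdown rates.

```python
import math

def poisson_pmf(r, lam):
    """P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** r / math.factorial(r)

def sum_pmf(total, lam_x, lam_y):
    """P(X + Y = total) for independent X ~ Po(lam_x) and Y ~ Po(lam_y),
    adding up P(X = k) * P(Y = total - k) over every possible k."""
    return sum(poisson_pmf(k, lam_x) * poisson_pmf(total - k, lam_y)
               for k in range(total + 1))

lam_x, lam_y = 3.4, 2.3
for total in range(5):
    long_way = sum_pmf(total, lam_x, lam_y)
    shortcut = poisson_pmf(total, lam_x + lam_y)
    print(total, round(long_way, 6), round(shortcut, 6))
```

The two columns should agree on every row, which is what you'd expect if X + Y really does follow a single Poisson distribution.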

If X is the number of times the popcorn machine malfunctions
and Y is the number of times the drinks machine malfunctions,
then X ~ Po(3.4) and Y ~ Po(2.3).

1. What’s the distribution of X + Y?
X + Y ~ Po(3.4 + 2.3)
X + Y ~ Po(5.7)

2. Once you’ve found how X + Y is distributed, you can use it to find probabilities. What’s P(X + Y = 0)?

$P(X=r) = \frac{e^{-\lambda} \lambda^r}{r!}$

$P(X+Y=0) = \frac{e^{-5.7} 5.7^0}{0!} = 0.003$

Only a .003 chance
of no breakdowns next
week? Guess we better
get some new machines
after all.
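The calculation for the cinema can be reproduced in a few lines. This sketch, with variable names chosen for illustration, also multiplies together the two separate probabilities of no breakdowns, which should match because the machines are independent.

```python
import math

lam_popcorn = 3.4  # mean breakdowns per week, popcorn machine
lam_drinks = 2.3   # mean breakdowns per week, drinks machine
lam_total = lam_popcorn + lam_drinks

# For r = 0 the formula reduces to e^(-lam), since lam^0 = 1 and 0! = 1.
p_no_breakdowns = math.exp(-lam_total)

# Same answer from P(X = 0) * P(Y = 0), using independence.
p_product = math.exp(-lam_popcorn) * math.exp(-lam_drinks)

print(round(p_no_breakdowns, 3), round(p_product, 3))
```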

**Q: Does that mean that the probability
and expectation shortcuts we saw
earlier in the book work for the Poisson
distribution too?**

A: Yes they do. X and Y are independent
random variables, because the popcorn
machine malfunctioning does not affect
the probability that the drinks machine will
malfunction, and vice versa. This means that
we can use all of the shortcuts that apply to
independent variables.

**Q: Why does X + Y follow a Poisson
distribution?**

A: X + Y follows a Poisson distribution
because X and Y are independent, and
they both follow a Poisson distribution.
The popcorn machine and the drinks
machine each malfunction at random but at
a mean rate. This means that together they
also break down at random and at a mean
rate. Together, they still meet the criteria for
the Poisson distribution.

**Q: So can we use the distribution of
X + Y in the same way we would any other
Poisson distribution?**

A: Yes, we use it in exactly the same way,
so once you know what the parameter λ is,
you can use it to find probabilities.

## Anyone for popcorn?

You’ve covered a lot of ground in this chapter. You’ve built on your
existing knowledge of probability and statistics by tackling three of the
most important discrete probability distributions. Moreover, you’ve
gained a deeper understanding of how probability distributions work
and the sort of shortcuts you can make to save yourself time and
produce reliable results, skills that will come in useful in the rest of the
book.

So sit back and enjoy the popcorn — you’ve earned it.

## Your quick guide to the Poisson distribution

Here’s a quick summary of everything you could possibly need to know about the Poisson distribution.

### When do I use it?

Use the Poisson distribution if you have independent events such as malfunctions occurring in a given interval,
and you know λ, the mean number of occurrences in a given interval. You’re interested in the number of
occurrences in one particular interval.

### How do I calculate probabilities, and the expectation and variance?

$P(X=r) = \frac{e^{-\lambda} \lambda^r}{r!}$

E(X) = λ

Var(X) = λ

### How do I combine independent random variables?

If X ~ Po($\lambda_x$) and Y ~ Po($\lambda_y$),

X + Y ~ Po($\lambda_x$ + $\lambda_y$)