# 3.5 The Binomial Distribution #

## Objectives ##
- Apply probability and counting rules (including addition and multiplication rules) as pertaining to both continuous random variables and discrete random variables.
- Analyze an application in the disciplines of business, social sciences, psychology, life sciences, health science, and education, and utilize the correct statistical processes to arrive at a solution.

## The Binomial Distribution ##
There are three characteristics of a binomial experiment.

- There is a fixed number of trials. Think of trials as repetitions of an experiment. The letter $n$ denotes the number of trials.
- There are only two possible outcomes, called "success" and "failure," for each trial. The letter $p$ denotes the probability of a success on one trial, and $q$ denotes the probability of a failure on one trial. $p + q = 1$.
- The $n$ trials are independent and are repeated under identical conditions. Because the $n$ trials are independent, the outcome of one trial does not help in predicting the outcome of another trial. Another way of saying this is that, for each individual trial, the probability, $p$, of a success and the probability, $q$, of a failure remain the same. For example, randomly guessing at a true-false statistics question has only two outcomes. If a success is guessing correctly, then a failure is guessing incorrectly. Suppose Joe guesses correctly on any statistics true-false question with probability $p = 0.6$. Then $q = 0.4$. This means that for every true-false statistics question Joe answers, his probability of success ($p = 0.6$) and his probability of failure ($q = 0.4$) remain the same.

The outcomes of a binomial experiment fit a **binomial probability distribution**, which is a special kind of discrete probability distribution. The random variable $X$ = the number of successes obtained in the $n$ independent trials.

The mean, $\mu$, and variance, $\sigma^2$, for the binomial probability distribution are $\mu = np$ and $\sigma^2 = npq$. The standard deviation, $\sigma$, is then $\sigma =  \sqrt{npq}$.

***


### Example 3.5.1 ###
At ABC College, the withdrawal rate from an elementary physics course is 30% for any given term. This implies that, for any given term, 70% of the students stay in the class for the entire term. A "success" could be defined as an individual who withdrew. The random variable $X$ = the number of students who withdraw from the randomly selected elementary physics class. The probability of success is $p = 0.30$ and the probability of failure is $q = 0.70$. If we track 20 students to see whether or not they withdraw, then this is a binomial experiment with $n = 20$.
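
As a quick check on the formulas for the mean and standard deviation, the values for this example can be computed in R. This is a minimal sketch; the helper name `binom_summary` is hypothetical and is introduced here only for illustration.

```R
# Hypothetical helper (not part of base R): summary measures of B(n, p)
binom_summary <- function(n, p) {
  q <- 1 - p                   # probability of failure
  c(mean = n * p,              # mu = np
    variance = n * p * q,      # sigma^2 = npq
    sd = sqrt(n * p * q))      # sigma = sqrt(npq)
}

# Example 3.5.1: n = 20 students, p = 0.30 withdrawal rate
withdrawals <- binom_summary(n = 20, p = 0.30)
print(withdrawals)
```

Under these assumptions, we would expect about 6 of the 20 students to withdraw, since $np = 20(0.30) = 6$.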


***

### Example 3.5.2
Suppose you play a game that you can only either win or lose. The probability that you win any game is 55%, and the probability that you lose is 45%. Each game you play is independent. You play the game 20 times. 

1. Write the function that describes the probability that you win 15 of the 20 times. 
2. Find the expected value of the distribution.
3. Find the standard deviation of the distribution.

#### Solution
Here, if you define $X$ as the number of wins, then $X$ takes on the values 0, 1, 2, 3, ..., 20. The probability of a success is $p = 0.55$. The probability of a failure is $q = 0.45$. 

##### Step 1
The number of trials is $n = 20$. The probability question can be stated mathematically as $P(x = 15)$.
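
This section does not state the binomial probability formula explicitly. Assuming the standard formula $P(x) = \binom{n}{x} p^x q^{n-x}$, the function asked for in this step can be written out as

$$ P(x = 15) = \binom{20}{15} (0.55)^{15} (0.45)^{5}. $$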

##### Step 2
The expected value of the distribution is
$$ \mu = np = 20(0.55) = 11. $$

##### Step 3
The standard deviation of the distribution is
$$ \sigma = \sqrt{npq} = \sqrt{20(0.55)(0.45)} \approx 2.2249. $$
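
One way to sanity-check Steps 2 and 3 is by simulation. The sketch below uses base R's `rbinom` to simulate many rounds of 20 games and compares the simulated mean and standard deviation with the formulas. The seed, the number of simulated rounds, and the variable names are arbitrary choices made for illustration.

```R
set.seed(1)   # arbitrary seed, so the simulation is reproducible

# Number of wins in each of 100,000 simulated rounds of 20 games
wins <- rbinom(100000, size = 20, prob = 0.55)

sim_mean <- mean(wins)   # should be close to np = 11
sim_sd   <- sd(wins)     # should be close to sqrt(npq), about 2.2249
c(simulated_mean = sim_mean, simulated_sd = sim_sd)
```

The simulated values will not match the formulas exactly, but they should land close to them when the number of simulated rounds is large.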

***


### Example 3.5.3
Approximately 70% of statistics students do their homework in time for it to be collected and graded. Each student does homework independently. In a statistics class of 50 students, what is the probability that at least 40 will do their homework on time? Students are selected randomly.

1. Is this a binomial problem? Why or why not?
2. How do we define $X$?
3. What values does $x$ take on?
4. What is "failure," in words?
5. What is $n$? What is $p$? What is $q$?
6. State the probability question mathematically.
7. What is the expected value of the distribution?
8. What is the standard deviation of the distribution?

#### Solution
##### Part 1
This is a binomial problem because there are a fixed number of trials, there is only a success or a failure for each trial, and the trials are independent (that is, the probability of success doesn't change from trial to trial).

##### Part 2
$X$ = the number of students who do their homework on time.

##### Part 3
$x = 0, 1, 2, 3, \ldots, 50$

##### Part 4
Failure is defined as a student who does not complete his or her homework on time.

##### Part 5
The number of trials is $n = 50$.

The probability of success is $p = 0.70$.

The probability of failure is $q = 0.30$.

##### Part 6
We want to know the probability that **at least** 40 will do their homework on time. We write this mathematically as $P(x \geq 40)$.

##### Part 7
The expected value is
$$ \mu = np = 50(0.70) = 35 $$

##### Part 8
The standard deviation of the distribution is
$$ \sigma = \sqrt{npq} = \sqrt{50(0.70)(0.30)} \approx 3.2404 $$


***

## Notation and Calculation ##
When we want to say that $X$ is a random variable with a binomial distribution, we write
$$ X \sim B(n, p) $$
where $n$ is the number of trials and $p$ is the probability of success of any one trial.

We can calculate binomial probabilities using the R function
```R
dbinom(x, size, prob)
```
where <code>x</code> is the value or vector of values we want to calculate the probability for, <code>size</code> is the number of trials $n$, and <code>prob</code> is the probability of success $p$ of one trial.
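
To see what `dbinom` computes, the sketch below compares it with the standard binomial formula $\binom{n}{x} p^x q^{n-x}$ (an assumption here, since this section does not state the formula) using the numbers from Example 3.5.2. It then evaluates the probability $P(x \geq 40)$ that Example 3.5.3 set up but did not compute. The variable names are illustrative.

```R
# Example 3.5.2: P(x = 15) with n = 20 and p = 0.55
by_formula <- choose(20, 15) * 0.55^15 * 0.45^5
by_dbinom  <- dbinom(x = 15, size = 20, prob = 0.55)
all.equal(by_formula, by_dbinom)   # TRUE when the two agree up to rounding error

# Example 3.5.3: P(x >= 40) with n = 50 and p = 0.70
# 40:50 is shorthand for c(40, 41, ..., 50)
p_at_least_40 <- sum(dbinom(x = 40:50, size = 50, prob = 0.70))
p_at_least_40
```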

***


### Example 3.5.4
It has been stated that about 41% of adult workers have a high school diploma but do not pursue any further education. 20 adult workers are randomly selected.

1. Find the probability that exactly 7 of the 20 workers have a high school diploma but do not pursue further education.
2. Find the probability that at most 4 workers have a high school diploma but do not pursue further education.
3. Find the probability that more than 15 workers have a high school diploma but do not pursue further education.
4. Find the probability that more than 8 but fewer than 18 workers have a high school diploma but do not pursue further education.

#### Solution
For this problem, $X$ = the number of workers, of the 20 selected, that have a high school diploma but do not pursue further education. The number of trials is $n = 20$, and the probability of "success" is $p = 0.41$. So
$$ X \sim B(20, 0.41) $$

##### Part 1
We want to find $P(x = 7)$. Simply use the <code>dbinom</code> function.

```R
dbinom(x = 7, size = 20, prob = 0.41)
```

The probability that exactly 7 of the 20 adults have a high school diploma but do not pursue further education is $P(x = 7) = 0.1585$, or 15.85%.

##### Part 2
We want to find $P(x \leq 4)$. In other words, we want to know the probability that $x$ is 0 or 1 or 2 or 3 or 4. We will again use the <code>dbinom</code> function, but instead of passing a single value, we will pass a vector of values. Let's first see how this works.

```R
values = c(0, 1, 2, 3, 4)
dbinom(x = values, size = 20, prob = 0.41)
```

This gives us a vector of 5 probabilities, one for each value we passed. For example, the last entry shows that $P(x = 4) = 0.0295$.

These outcomes are all mutually exclusive: the number of adults who have a high school diploma but do not pursue further education cannot be both 2 and 3 at the same time. So

$$ P(x \leq 4) = P(x = 0) + P(x = 1) + P(x = 2) + P(x = 3) + P(x = 4); $$

that is, we just need to add up these probabilities.

```R
values = c(0, 1, 2, 3, 4)
probs = dbinom(x = values, size = 20, prob = 0.41)
sum(probs)
```

So $P(x \leq 4) = 0.0423$. That is, there is a 4.23% chance that no more than 4 of the 20 adults selected have a high school diploma but do not pursue further education.
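
The same cumulative probability can also be obtained in a single call. As a sketch, base R's `pbinom(q, size, prob)` returns $P(X \leq q)$ directly, so it should agree with the sum of the individual probabilities. The variable names are illustrative.

```R
# Add up P(x = 0), ..., P(x = 4) by hand
p_sum <- sum(dbinom(x = 0:4, size = 20, prob = 0.41))

# pbinom gives the cumulative probability P(X <= 4) directly
p_cdf <- pbinom(q = 4, size = 20, prob = 0.41)

round(c(p_sum, p_cdf), 4)
```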

##### Part 3
We want $P(x > 15)$.

```R
values = c(16, 17, 18, 19, 20)
probs = dbinom(x = values, size = 20, prob = 0.41)
sum(probs)
```

So $P(x > 15) = 0.0004$. There is only a 0.04% chance that more than 15 of the 20 adults selected have a high school diploma but do not pursue further education.
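
Because the probabilities over all values 0 through 20 add up to 1, $P(x > 15)$ can also be found with the complement rule, $P(x > 15) = 1 - P(x \leq 15)$. A short sketch comparing the two routes (the variable names are illustrative):

```R
# Direct route: add the probabilities for 16, 17, 18, 19, 20
p_direct <- sum(dbinom(x = 16:20, size = 20, prob = 0.41))

# Complement route: 1 minus the probability of 15 or fewer
p_complement <- 1 - sum(dbinom(x = 0:15, size = 20, prob = 0.41))

all.equal(p_direct, p_complement)   # TRUE when the two agree up to rounding error
```

The complement route is most useful when the event of interest covers many values and its complement covers only a few.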

##### Part 4
We want $P(8 < x < 18)$.

```R
values = c(9, 10, 11, 12, 13, 14, 15, 16, 17)
probs = dbinom(x = values, size = 20, prob = 0.41)
sum(probs)
```

So $P(8 < x < 18) = 0.4406$. There is a 44.06% chance that more than 8 but fewer than 18 of the 20 adults selected have a high school diploma but do not pursue further education.
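
Typing out every value between the endpoints is easy to get wrong, especially with strict inequalities. One alternative, sketched here with illustrative variable names, is to compute the probability of every possible value and let a logical condition that mirrors $8 < x < 18$ pick out the ones to add.

```R
x <- 0:20                                        # every possible value of x
probs <- dbinom(x = x, size = 20, prob = 0.41)   # P(x = 0), ..., P(x = 20)

keep <- x > 8 & x < 18                           # TRUE exactly for 9, 10, ..., 17
p_between <- sum(probs[keep])
p_between
```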

***


### Example 3.5.5
The lifetime risk of developing pancreatic cancer is about one in 78 (1.28%). Suppose we randomly sample 200 people. Let $X$ = the number of people who will develop pancreatic cancer.

1. Is it more likely that five or six people will develop pancreatic cancer? Justify your answer numerically.
2. Find the probability that fewer than eight people develop pancreatic cancer.

#### Solution
For this problem, $X \sim B(200, 0.0128)$.

##### Part 1
We need to find $P(x = 5)$ and $P(x = 6)$ and see which probability is greater.

```R
values = c(5, 6)
dbinom(x = values, size = 200, prob = 0.0128)
```

So $P(x = 5) = 0.0707$ and $P(x = 6) = 0.0298$. Since the probability is greater when $x = 5$ than when $x = 6$, it is more likely that 5 people develop pancreatic cancer than it is for 6 people to develop pancreatic cancer.
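
The comparison can also be checked without software. Assuming the standard binomial formula $P(x) = \binom{n}{x} p^x q^{n-x}$ (not stated in this section), the ratio of consecutive probabilities simplifies to $\frac{P(x+1)}{P(x)} = \frac{n - x}{x + 1} \cdot \frac{p}{q}$, so

$$ \frac{P(x = 6)}{P(x = 5)} = \frac{200 - 5}{6} \cdot \frac{0.0128}{0.9872} \approx 0.4214 < 1, $$

which agrees with $P(x = 6)$ being smaller than $P(x = 5)$.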

##### Part 2
We want to find $P(x < 8)$.

```R
values = c(0, 1, 2, 3, 4, 5, 6, 7)
probs = dbinom(x = values, size = 200, prob = 0.0128)
sum(probs)
```

So $P(x < 8) = 0.9954$. There is a 99.54% chance that fewer than 8 of the 200 people sampled develop pancreatic cancer.

***


<small style="color:gray"><b>License:</b> This work is licensed under a [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/) license.</small>

<small style="color:gray"><b>Author:</b> Taylor Baldwin, Mt. San Jacinto College</small>

<small style="color:gray"><b>Adapted From:</b> <i>Introductory Statistics</i>, by Barbara Illowsky and Susan Dean. Access for free at [https://openstax.org/books/introductory-statistics/pages/1-introduction](https://openstax.org/books/introductory-statistics/pages/1-introduction).</small>