# 4.3 The Central Limit Theorem for Means

## Objectives

- Compute probabilities using the Central Limit Theorem and demonstrate the ability to interpret sampling distributions of both population proportions and means.
- Analyze an application in the disciplines of business, social sciences, psychology, life sciences, health science, and education, and use the correct statistical processes to arrive at a solution.

## The Central Limit Theorem

Suppose $X$ is a random variable with a distribution that may be known or unknown (it can be *any* distribution with a finite standard deviation). Using a subscript that matches the random variable, suppose:

- $\mu_X$ = the mean of $X$
- $\sigma_X$ = the standard deviation of $X$

If you draw random samples of size $n$, then as $n$ increases, the random variable $\overline{X}$, which consists of sample means, tends to be **normally distributed** and

$$ \overline{X} \sim N\left(\mu_X, \frac{\sigma_X}{\sqrt{n}}\right). $$

The **central limit theorem** for sample means says that if you keep drawing larger and larger samples (such as rolling one, two, five, and finally, ten dice) and **calculating their means**, the sample means tend to form their own **normal distribution** (the sampling distribution). This sampling distribution has the same mean as the original distribution and a variance equal to the original variance divided by the sample size. Because standard deviation is the square root of variance, the standard deviation of the sampling distribution is the standard deviation of the original distribution divided by $\sqrt{n}$. Here $n$ is the number of values averaged together in each sample, not the number of times the experiment is repeated.

To put it more formally, if you draw random samples of size $n$, the distribution of the random variable  $\overline{X}$, which consists of sample means, is called the sampling distribution of the mean. The sampling distribution of the mean approaches a normal distribution as $n$, the sample size, increases.

The random variable $\overline{X}$ has a different $z$-score associated with it from that of the random variable $X$, because the two have different standard deviations. The mean $\bar{x}$ is the value of $\overline{X}$ in one sample, and its $z$-score is

$$ z = \frac{\bar{x} - \mu_{\overline{X}}}{\sigma_{\overline{X}}} $$

$\mu_{\overline{X}} = \mu_X$; that is, $X$ and $\overline{X}$ have the same mean.

$\sigma_{\overline{X}} = \frac{\sigma_X}{\sqrt{n}}$ is the standard deviation of $\overline{X}$ and is called the **standard error of the mean**.
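As a small sketch of the two formulas above, the R function below computes the standard error and then the $z$-score for a sample mean. The name `z_for_mean` and its arguments are hypothetical, chosen only for this illustration.

```r
# Hypothetical helper: z-score of a sample mean xbar, given the population
# mean mu, population standard deviation sigma, and sample size n
z_for_mean <- function(xbar, mu, sigma, n) {
  se <- sigma / sqrt(n)   # standard error of the mean
  (xbar - mu) / se
}

# Die example from this section: 50 rolls, sample mean 3.0
z_for_mean(xbar = 3.0, mu = 3.5, sigma = 1.7078, n = 50)   # about -2.070
```

This result differs from the $z$-score in Example 4.3.1 only in the fourth decimal place, because the example rounds the standard error to 0.2415 before dividing.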

To illustrate this principle, consider a fair six-sided die. Since each outcome is equally likely, the expected value of rolling the die is

$$ \mu_X = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5. $$

We can also calculate that the population standard deviation is $\sigma_X = 1.7078$.
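The value of $\sigma_X$ can be checked from the definition of the population standard deviation, $\sigma_X = \sqrt{\sum (x - \mu_X)^2 P(x)}$. A minimal R sketch, assuming a fair die so that each face has probability $1/6$:

```r
faces <- 1:6              # possible outcomes of one roll
prob <- rep(1/6, 6)       # fair die: each face equally likely

mu_X <- sum(faces * prob)                        # expected value
sigma_X <- sqrt(sum((faces - mu_X)^2 * prob))    # population standard deviation

mu_X      # 3.5
sigma_X   # about 1.7078
```

R's built-in `sd()` divides by $n - 1$ (the sample standard deviation), so `sd(1:6)` would not give the population value here; that is why the formula is written out.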

What does this mean? If we roll a die once, we cannot get 3.5, since every outcome is a whole number. But if we roll the die several times, we expect the average of those rolls to be reasonably close to 3.5, and the more times we roll the die, the closer we expect that average to be to 3.5. The Central Limit Theorem makes this idea precise by describing how those averages are distributed around 3.5.

The Central Limit Theorem tells us approximately how likely or unlikely it is to get a particular average from a sample. Imagine we roll a die 50 times; that is, we "sample" 50 rolls. The Central Limit Theorem tells us that the means of *all* possible samples of 50 rolls approximately follow the normal distribution $\overline{X} \sim N(3.5, 1.7078/\sqrt{50})$. We can use this to determine how likely or unlikely it is to randomly obtain a sample with a certain mean.
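One way to see this claim in action is a quick simulation: draw many samples of 50 rolls, average each sample, and compare the mean and standard deviation of those averages with $3.5$ and $1.7078/\sqrt{50} \approx 0.2415$. The seed and the number of simulated samples below are arbitrary choices; this is an illustrative sketch, not part of the original solution.

```r
set.seed(42)        # arbitrary seed, for reproducibility only
n <- 50             # rolls averaged together in each sample
reps <- 10000       # number of simulated samples (arbitrary)

# Each replicate rolls a fair die n times and records the average roll
sample_means <- replicate(reps, mean(sample(1:6, size = n, replace = TRUE)))

mean(sample_means)  # close to mu_X = 3.5
sd(sample_means)    # close to 1.7078 / sqrt(50), about 0.2415
# hist(sample_means, breaks = 40)  # uncomment to see the roughly bell-shaped histogram
```

Because the simulation is random, the printed values change slightly with the seed, but they should stay close to the theoretical values.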

***

### Example 4.3.1
The expected value of a fair six-sided die roll is $\mu_X = 3.5$. The standard deviation is $\sigma_X = 1.7078$.

1. If we roll the die 50 times, what is the probability that the sample mean is smaller than $3.0$?
2. If we roll the die 100 times, what is the probability that the sample mean is smaller than $3.0$?
3. Is it more likely that our sample mean is smaller than $3.0$ if we roll the die 50 times or 100 times? Why?

#### Solution

##### Part 1

By the Central Limit Theorem, we know that the means of samples of size 50 are approximately normally distributed with mean

$$\mu_{\overline{X}} = \mu_X = 3.5$$ 

and with standard deviation

$$\sigma_{\overline{X}} = \frac{\sigma_X}{\sqrt{n}} = \frac{1.7078}{\sqrt{50}} = 0.2415$$

We want to find $P(\bar{x} < 3.0)$. Since $\overline{X} \sim N(3.5, 0.2415)$, we first find the $z$-score for $\bar{x} = 3.0$.

$$z = \frac{\bar{x} - \mu_{\overline{X}}}{\sigma_{\overline{X}}} = \frac{3.0 - 3.5}{0.2415} = -2.0704. $$

So $P(\bar{x} < 3.0) = P(z < -2.0704)$. We use R to find the probability.

```r
# Area under the standard normal curve to the left of z = -2.0704
pnorm(q = -2.0704, lower.tail = TRUE)
```

So $P(\bar{x} < 3.0) = P(z < -2.0704) = 0.0192$. That is, there is only a 1.92% chance that our 50 die rolls have an average of less than 3.0.
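The $z$-score step can also be skipped: `pnorm()` accepts `mean` and `sd` arguments, so it can evaluate the normal distribution of $\overline{X}$ directly. A sketch for Part 1, using the same values of $\mu_X$ and $\sigma_X$:

```r
mu_X <- 3.5
sigma_X <- 1.7078
n <- 50
se <- sigma_X / sqrt(n)   # standard error, about 0.2415

# P(sample mean < 3.0) under the normal approximation from the CLT
p_below <- pnorm(q = 3.0, mean = mu_X, sd = se, lower.tail = TRUE)
p_below                   # about 0.0192
```

Any difference from the value above appears only beyond the fourth decimal place, since this version does not round the standard error or the $z$-score.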

##### Part 2
By the Central Limit Theorem, we know that the means of samples of size 100 are approximately normally distributed with mean

$$ \mu_{\overline{X}} = \mu_X = 3.5 $$

and with standard deviation

$$ \sigma_{\overline{X}} = \frac{\sigma_{X}}{\sqrt{n}} = \frac{1.7078}{\sqrt{100}} = 0.1708. $$

We want to find $P(\bar{x} < 3.0)$. Since $\overline{X} \sim N(3.5, 0.1708)$, we first need to find the $z$-score associated with $\bar{x} = 3.0$.

$$ z = \frac{\bar{x} - \mu_{\overline{X}}}{\sigma_{\overline{X}}} = \frac{3.0 - 3.5}{0.1708} = -2.9274. $$

So $P(\bar{x} < 3.0) = P(z < -2.9274)$. We use R to find the probability.

```r
# Area under the standard normal curve to the left of z = -2.9274
pnorm(q = -2.9274, lower.tail = TRUE)
```

So $P(\bar{x} < 3.0) = P(z < -2.9274) = 0.0017$. That is, there is only a 0.17% chance that our 100 die rolls have an average of less than 3.0.

##### Part 3
The probability of getting a sample mean of less than 3.0 after 50 die rolls is 1.92%, but the probability of getting a sample mean of less than 3.0 after 100 die rolls is only 0.17%, less than one-tenth as large. We are much more likely to have a sample mean of less than 3.0 after only 50 die rolls than after 100 die rolls.

This is because the standard error $\sigma_X/\sqrt{n}$ shrinks as the sample size grows, so the means of larger samples cluster more tightly around the expected value of 3.5.
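To see the pattern beyond the two sample sizes in this example, the sketch below repeats the same calculation for several values of $n$. The sample sizes are illustrative choices; as $n$ grows, the standard error $\sigma_X/\sqrt{n}$ shrinks and so does $P(\bar{x} < 3.0)$.

```r
mu_X <- 3.5
sigma_X <- 1.7078
sizes <- c(25, 50, 100, 200)   # illustrative sample sizes
se <- sigma_X / sqrt(sizes)    # standard error for each n

# P(sample mean < 3.0) for each n, using the CLT normal approximation;
# pnorm() is vectorized, so one call handles every standard error at once
p_below <- pnorm(q = 3.0, mean = mu_X, sd = se)

round(data.frame(n = sizes, se = se, p = p_below), 4)
```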

***

### Example 4.3.2
The length of time taken on the SAT for a group of students is normally distributed with a mean of 2.5 hours and a standard deviation of 0.85 hours. A sample of size $n = 30$ is drawn randomly from the population. Find the probability that the sample mean is between 2.3 hours and 2.7 hours.

#### Solution
By the Central Limit Theorem, sample means of samples of size $n = 30$ are normally distributed with

$$ \mu_{\overline{X}} = \mu_X = 2.5 $$

and a standard deviation of

$$ \sigma_{\overline{X}} = \frac{\sigma_X}{\sqrt{n}} = \frac{0.85}{\sqrt{30}} = 0.1552; $$

that is, $\overline{X} \sim N(2.5, 0.1552)$.

We want to find $P(2.3 < \bar{x} < 2.7)$. To do so, we first must find the $z$-scores associated with $\bar{x} = 2.3$ and $\bar{x} = 2.7$. We calculate

$$ z = \frac{\bar{x} - \mu_{\overline{X}}}{\sigma_{\overline{X}}} = \frac{2.3 - 2.5}{0.1552} = -1.2887, $$

$$ z = \frac{\bar{x} - \mu_{\overline{X}}}{\sigma_{\overline{X}}} = \frac{2.7 - 2.5}{0.1552} = 1.2887. $$

Then $P(2.3 < \bar{x} < 2.7) = P(-1.2887 < z < 1.2887)$. We will use R to find the probability. We will first find *all* the area to the left of $z = 1.2887$, then subtract off the excess area to the left of $z = -1.2887$.

```r
# Area to the left of z = 1.2887 minus the area to the left of z = -1.2887
pnorm(q = 1.2887, lower.tail = TRUE) - pnorm(q = -1.2887, lower.tail = TRUE)
```

So $P(2.3 < \bar{x} < 2.7) = P(-1.2887 < z < 1.2887) = 0.8025$. There is an 80.25% chance that the sample of 30 individuals has a mean test time of between 2.3 and 2.7 hours.
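`pnorm()` can also be given the mean and standard error of $\overline{X}$ directly, skipping the $z$-scores. A sketch that should reproduce the answer above:

```r
mu_X <- 2.5       # population mean test time (hours)
sigma_X <- 0.85   # population standard deviation (hours)
n <- 30
se <- sigma_X / sqrt(n)   # standard error, about 0.1552

# Area to the left of 2.7 minus the area to the left of 2.3
p_between <- pnorm(2.7, mean = mu_X, sd = se) - pnorm(2.3, mean = mu_X, sd = se)
p_between                 # about 0.8025
```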

***

### Example 4.3.3

#**VID=pQmC_Ft1huQ**#

***

### Example 4.3.4

#**VID=_SrobSlBqjQ**#

***

<small style="color:gray"><b>License:</b> This work is licensed under a [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/) license.</small>

<small style="color:gray"><b>Author:</b> Taylor Baldwin, Mt. San Jacinto College</small>

<small style="color:gray"><b>Adapted From:</b> <i>Introductory Statistics</i>, by Barbara Illowsky and Susan Dean. Access for free at [https://openstax.org/books/introductory-statistics/pages/1-introduction](https://openstax.org/books/introductory-statistics/pages/1-introduction).</small>