# 2.2 Measures of Central Tendency #

## Objectives ##
- Perform descriptive data analysis on collected data including calculation and interpretation of measures of central tendency, measures of variation, measures of position, detection of outliers, and applying the Empirical Rule to bell-shaped data.
- Estimate population parameters using both point estimates and confidence interval estimates using both the normal and the Student $t$-distribution.
- Analyze an application in the disciplines of business, social sciences, psychology, life sciences, health science, and education, and utilize the correct statistical processes to arrive at a solution.

## Mean and Median ##
The "center" of a data set is also a way of describing location. The two most widely used measures of the "center" of the data are the **mean** (average) and the **median**. To calculate the mean weight of 50 people, add the 50 weights together and divide by 50. To find the median weight of the 50 people, order the data and find the number that splits the data into two equal parts. The median is generally a better measure of the center when there are extreme values or outliers because it is not affected by the precise numerical values of the outliers. The mean is the most common measure of the center.
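As a quick sketch of this point, the R code below uses two small made-up data sets (hypothetical values chosen only for illustration) that differ only in their largest value. R's built-in `mean` and `median` functions show that replacing 9 with the outlier 900 pulls the mean far upward, while the median does not move at all.

```R
# Two hypothetical data sets that differ only in their largest value
data1 <- c(2, 3, 5, 7, 9)
data2 <- c(2, 3, 5, 7, 900)   # the 9 has been replaced by an extreme outlier

# The mean uses every value, so the outlier drags it upward
mean(data1)    # 5.2
mean(data2)    # 183.4

# The median depends only on the middle of the ordered data, so it is unchanged
median(data1)  # 5
median(data2)  # 5
```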

**Note:** The words “mean” and “average” are often used interchangeably, and substituting one for the other is common practice. The technical term is “arithmetic mean,” and “average” is technically a center location. However, in practice among non-statisticians, “average” is commonly accepted for “arithmetic mean.”

When values in the data set repeat, the mean can be calculated by multiplying each distinct value by its frequency (the number of times it occurs), adding these products, and then dividing the sum by the total number of data values. The letter used to represent the sample mean is an x with a bar over it (pronounced “x bar”): $\bar{x}$.

The Greek letter $\mu$ (pronounced "mew" and spelled in English "mu") represents the population mean. One of the requirements for the sample mean to be a good estimate of the population mean is for the sample taken to be truly random.
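The role of random sampling can be sketched with a small simulation. Everything here is hypothetical: the "population" is just the whole numbers 1 to 1000, so its mean is known to be $\mu = 500.5$, and base R's `sample` function draws a simple random sample of 50 of them. The seed value is arbitrary; a different seed gives a different sample and a slightly different $\bar{x}$, which illustrates that $\bar{x}$ varies from sample to sample around $\mu$.

```R
# Hypothetical population: the whole numbers 1 through 1000
population <- 1:1000
mu <- mean(population)        # population mean, 500.5

# Draw a simple random sample of 50 values (without replacement)
set.seed(42)                  # arbitrary seed so the result is reproducible
s <- sample(population, size = 50)

# The sample mean is our estimate of the population mean
xbar <- mean(s)
mu
xbar
```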

To see that adding every data value and using frequencies give the same mean, consider the sample:
<center>
    1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4
</center>
<br/>

$$ \bar{x} = \frac{1 + 1 + 1 + 2 + 2 + 3 + 4 + 4 + 4 + 4 + 4}{11} = \frac{30}{11} \approx 2.727$$
<br/>

$$ \bar{x} = \frac{3(1) + 2(2) + 1(3) + 5(4)}{11} = \frac{30}{11} \approx 2.727$$
<br/>

In the second calculation, the frequencies are 3, 2, 1, and 5 because the data contain 3 ones, 2 twos, 1 three, and 5 fours.

In R, these calculations look like:

```R
xbar = (1 + 1 + 1 + 2 + 2 + 3 + 4 + 4 + 4 + 4 + 4)/11
xbar
```

```R
xbar = (3*1 + 2*2 + 1*3 + 5*4)/11
xbar
```

**Note**: In each case, we first calculate the mean and store the value in the variable <code>xbar</code>. To have the computer display the value that we stored in <code>xbar</code>, we simply type <code>xbar</code> by itself on the next line.
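In the second calculation the frequencies were counted by hand. As a sketch of how R can count them for us, base R's `table` function tallies how many times each distinct value appears; the names `freq`, `distinct`, and `counts` are just illustrative variable names.

```R
values <- c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4)

# table() counts how many times each distinct value appears
freq <- table(values)
freq

# The distinct values are stored as the table's names (as text),
# so convert them back to numbers; the counts become a plain vector
distinct <- as.numeric(names(freq))
counts <- as.vector(freq)

# Multiply each distinct value by its frequency, add the products,
# and divide by the total number of data values
xbar <- sum(distinct * counts) / sum(counts)
xbar
```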

Observe that all we do to calculate the mean is add all the data values together, then divide by the number of data values. This concept can be more succinctly expressed using the formula

$$ \bar{x} = \frac{\sum x}{n} $$

We use $\sum$ (the capital Greek letter sigma) when we want to add up, or find the sum of, values. In this case, the formula tells us to add up all the $x$'s, where $x$ is a placeholder for our data values. Then we divide the sum of the $x$'s by $n$, the number of data values we have. Note that although we've written the formula for the sample mean $\bar{x}$, the population mean $\mu$ is calculated the same way: add up all the values in the population and divide by the number of values in the population (often written $N$).

R makes this easy. We can use the <code>sum</code> function to add up the values in a list, which gives us $\sum x$, and the <code>length</code> function to find out how many numbers are in a list, which gives us $n$. Both the <code>sum</code> function and the <code>length</code> function take just one argument:

```R
sum(x)
```

```R
length(x)
```

In both cases, <code>x</code> is a list of data values (in R, a vector, such as one created with the <code>c</code> function).

So for the above sample data, we can calculate the mean using R as follows:

```R
values = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4)
n = length(values)

xbar = sum(values)/n
xbar
```
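R also has a built-in `mean` function that carries out exactly this calculation. The short check below compares it against the formula version; both should give about 2.727 for the sample above.

```R
values <- c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4)

# Formula version: add up the x's, then divide by n
xbar_formula <- sum(values) / length(values)

# Built-in version: mean() computes the same arithmetic mean
xbar_builtin <- mean(values)

xbar_formula
xbar_builtin
```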

We've already discussed the median in Section 2.1: the median $M$ is the same as $Q_2$, the second quartile, or the 50th percentile. The median is the 'middle value' of the ordered data: at least half the data are less than or equal to the median, and at least half the data are greater than or equal to it. We've seen that we can find the median in R using <code>quantile(x, probs=0.50)</code>. But the median is such an important value that R has an even simpler function to calculate it:

```R
median(x)
```
where, as usual, <code>x</code> is the list of data values that we want the median of.

For example, to find the median of the data in our example above, we would type:

```R
values = c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4)

median(values)
```
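To see what `median` is doing behind the scenes, here is a hand-written version for illustration only (`my_median` is a hypothetical name, not part of R). It sorts the data, returns the middle value when the number of values $n$ is odd, and averages the two middle values when $n$ is even. In practice, use the built-in `median`.

```R
# Illustrative only: a hand-written median (use median() in practice)
my_median <- function(x) {
  x <- sort(x)                       # order the data from smallest to largest
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                   # odd n: the single middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2    # even n: average of the two middle values
  }
}

values <- c(1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4)
my_median(values)         # 3, the 6th of the 11 ordered values
median(values)            # 3
my_median(c(1, 2, 3, 4))  # 2.5, the average of 2 and 3
```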

***


### Example 2.2.1 ###
AIDS data indicating the number of years a patient with AIDS lives after taking a new antibody drug are as follows:
<center>
    3, 4, 8, 8, 10, 11, 12, 13, 14, 15, 15, 16, 16, 17, 17, 18, 21, 22, 22, 24, 24, 25, 26, 26, 27, 27, 29, 29, 31, 32, 33, 33, 34, 34, 35, 37, 40, 44, 44, 47
</center>
Calculate the mean and median.

#### Solution ####

```R
years = c(3, 4, 8, 8, 10, 11, 12, 13, 14, 15, 15, 16, 16, 17, 17, 18, 21, 22, 22, 24, 24, 25, 26, 26, 27, 27, 29, 29, 31, 32, 33, 33, 34, 34, 35, 37, 40, 44, 44, 47)

# Find the Mean
n = length(years)

xbar = sum(years)/n
xbar

# Find the Median
M = median(years)
M
```

So the mean is $\bar{x} = 23.575$  years and the median is $M = 24$ years.
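Because $n = 40$ is even, there is no single middle value; the median is the average of the 20th and 21st values in the ordered list. The check below, a sketch that re-enters the `years` data so it runs on its own, pulls out those two values with `sort` and averages them.

```R
years <- c(3, 4, 8, 8, 10, 11, 12, 13, 14, 15, 15, 16, 16, 17, 17, 18, 21, 22, 22, 24,
           24, 25, 26, 26, 27, 27, 29, 29, 31, 32, 33, 33, 34, 34, 35, 37, 40, 44, 44, 47)

n <- length(years)                          # 40 values, an even count
middle <- sort(years)[c(n / 2, n / 2 + 1)]  # the 20th and 21st ordered values
middle                                      # 24 24
mean(middle)                                # 24, matching median(years)
```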

***


### Example 2.2.2 ###
The following data show the number of months patients typically wait on a transplant list before getting surgery. The data are ordered from smallest to largest. Calculate the mean and median.
<center>
    3, 4, 5, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12, 13, 14, 14, 15, 15, 17, 17, 18, 19, 19, 19, 21, 21, 22, 22, 23, 24, 24, 24, 24
</center>

#### Solution ####

```R
months = c(3, 4, 5, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12, 13, 14, 14, 15, 15, 17, 17, 18, 19, 19, 19, 21, 21, 22, 22, 23, 24, 24, 24, 24)

# Find the Mean
n = length(months)

xbar = sum(months)/n
xbar

# Find the Median
M = median(months)
M
```

So the mean is $\bar{x} \approx 13.949$ months and the median is $M = 13$ months.
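Here $n = 39$ is odd, so the median is a single data value: the one in position $(n + 1)/2 = 20$ of the ordered list. Since these data are already ordered, `sort` is not strictly needed, but keeping it makes the sketch correct for unordered data too.

```R
months <- c(3, 4, 5, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 10, 10, 10, 11, 12, 12, 13,
            14, 14, 15, 15, 17, 17, 18, 19, 19, 19, 21, 21, 22, 22, 23, 24, 24, 24, 24)

n <- length(months)      # 39 values, an odd count
pos <- (n + 1) / 2       # position of the middle value
pos                      # 20
sort(months)[pos]        # 13, matching median(months)
```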

***


### Example 2.2.3 ###
Suppose that in a small town of 50 people, one person earns \$5,000,000 per year and the other 49 each earn \$30,000. Which is the better measure of the "center": the mean or the median?

#### Solution ####
Since there is one person who earns \$5,000,000 per year and 49 people who earn \$30,000 per year, we calculate the mean as follows:

```R
xbar <- (5000000 + 49*30000)/50
xbar
```

So the mean is \$129,400 per year.

To calculate the median, imagine lining up our data from smallest to largest. Clearly, the value in the middle, the median, would be \$30,000 per year.

The median is a better measure of the "center" than the mean in this case because 49 of the values are \$30,000 and only one is \$5,000,000. The \$5,000,000 value is an outlier, and it pulls the mean far above what almost everyone in the town earns. The median of \$30,000 gives us a better sense of the middle of the data.
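Rather than imagining the lineup, we can build the full data set of 50 incomes in R with `rep`, which repeats a value a given number of times, and let `mean` and `median` do the work. Dropping the single outlier (the first value, removed with `incomes[-1]`) shows how much that one person moves the mean.

```R
# One income of $5,000,000 followed by 49 incomes of $30,000
incomes <- c(5000000, rep(30000, 49))
length(incomes)          # 50

mean(incomes)            # 129400
median(incomes)          # 30000

# Without the outlier, the mean drops to $30,000
mean(incomes[-1])        # 30000
```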

***

### Example 2.2.4 ###

#**VID=FgjyGnTE3mM**#

***

<small style="color:gray"><b>License:</b> This work is licensed under a [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/) license.</small>

<small style="color:gray"><b>Author:</b> Taylor Baldwin, Mt. San Jacinto College</small>

<small style="color:gray"><b>Adapted From:</b> <i>Introductory Statistics</i>, by Barbara Illowsky and Susan Dean. Access for free at [https://openstax.org/books/introductory-statistics/pages/1-introduction](https://openstax.org/books/introductory-statistics/pages/1-introduction).</small>