# "Tutorial 09: Measure of Central Tendency: Ungrouped Data"
> "Central Tendency for Ungrouped Data"

- toc: true 
- badges: true
- comments: true
- categories: [basic-stats]
- sticky_rank: 9
- search_exclude: true
- hide: true

In [1]:
#collapse-hide
## required packages/modules
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rcParams
from IPython.display import display, HTML

## default font style
rcParams["font.family"] = "serif"

## format output
CSS = """
.output {
  margin-left: 20px;
}
"""

HTML('<style>{}</style>'.format(CSS))

# Introduction 

* A **measure of central tendency** is a summary statistic that represents the center value of a dataset.

* These measures indicate where most values in a distribution fall. In statistics, the three most common measures of central tendency are:
<ol>
    <li>Mean</li>
    <li>Median</li>
    <li>Mode</li>
</ol>

* If we have a dataset of test scores for a particular class, the measures of central tendency can yield information such as *the average test-score*, *the middle test-score*, and *the most frequently occurring test-score*.

> Note: Measures of central tendency do not describe the span of the dataset or how far values lie from the center. Central tendency captures only one characteristic of a distribution, i.e. its center value.
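
As a rough illustration of the three measures, the sketch below uses Python's standard `statistics` module on a small set of test scores. The scores are made up for this example and do not come from any real class.

```python
import statistics

# Hypothetical test scores (illustrative values only)
scores = [70, 85, 85, 90, 60, 75, 85]

average_score = statistics.mean(scores)    # arithmetic average
middle_score = statistics.median(scores)   # middle value after sorting
most_common = statistics.mode(scores)      # most frequently occurring value

print(f"Mean   = {average_score:.2f}")
print(f"Median = {middle_score}")
print(f"Mode   = {most_common}")
```

None of these three numbers says how widely the scores are spread, which is the limitation the note above points out.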

# Mean

* The **mean** is the *arithmetic average of a group of observations* (or numbers).

* It is computed by summing all the observations and dividing by the total number of observations.

* The population mean is represented by the Greek letter mu ($\mu$), and the sample mean is represented by $\overline{x}$.

* The formulae for computing the population mean and the sample mean are given below:
    
    * Population Mean: $\mu = \frac{\Sigma x}{N} = \frac{x_{1} + x_{2} + x_{3} + \dots + x_{N}}{N}$
    
    * Sample Mean: $\overline{x} = \frac{\Sigma x}{n} = \frac{x_{1} + x_{2} + x_{3} + \dots + x_{n}}{n}$
    
* Let's break down the formulae:
    
    * The capital Greek letter *sigma* ($\Sigma$) is commonly used in mathematics to represent the summation of all the numbers in a group.

    * *N* is the number of observations in the population, and *n* is the number of observations in the sample.
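
The formula translates directly into code: add up the observations and divide by how many there are. The sketch below is a minimal illustration; the helper name `arithmetic_mean` and the sample values are assumptions made for this example.

```python
def arithmetic_mean(observations):
    """Return sum(x) / n for a non-empty sequence of numbers."""
    n = len(observations)
    if n == 0:
        raise ValueError("mean is undefined for an empty dataset")
    return sum(observations) / n


# Illustrative values only
sample = [7, 3, 5, 2, 9, 4, 1, 2, 3]
print(f"Mean = {arithmetic_mean(sample)}")
```

The same function serves for a population or a sample; only the notation ($\mu$ and $N$ versus $\overline{x}$ and $n$) differs.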

## Example

**Example 01:** The table below lists the number of U.S. cars in service at the top car rental companies in a recent year, according to Auto Rental News.

|Company|Number of Cars in Service|
|:-|-:|
|Enterprise|643,000|
|Hertz|327,000|
|National/Alamo|233,000|
|Avis|204,000|
|Dollar/Thrifty|167,000|
|Budget|144,000|
|Advantage|20,000|
|U-Save|12,000|
|Payless|10,000|
|ACE|9,000|
|Fox|9,000|
|Rent-A-Wreck|7,000|
|Triangle|6,000|

Compute the mean.

* **Solution:**

Here we have a total of 13 observations, so *N=13*.

$\mu = \frac{643000 + 327000 + 233000 + 204000 + 167000 + 144000 + 20000 + 12000 + 10000 + 9000 + 9000 + 7000 + 6000}{13}$

$\mu = \frac{1791000}{13} \approx 137769.23$

* Let's see how to do the same in Python.

In [6]:
#collapse-hide
mean_ = np.mean(
    a=[
        643000, 327000, 233000, 204000, 167000, 144000, 
        20000, 12000, 10000, 9000, 9000, 7000, 6000
    ]
)

print(f"Mean = {round(mean_, 2)}")

Mean = 137769.23


## Properties of Mean

* **Property 01:** In a distribution, *minimum-value* $\leq$ *mean* $\leq$ *maximum-value*

* **Property 02:** The mean of *equal-valued* elements is equal to the element value.
    
    * e.g. if we have the following distribution: 10, 10, 10. The mean will be equal to $\frac{10 + 10 + 10}{3} = 10$ which is equal to the element value itself, i.e. 10
    
* **Property 03:** Given the mean ($\mu$) and the total number of observations ($N$) of a distribution, we can find the sum of all the numbers in the distribution by using the following formula.
    
    * $\Sigma x = \mu * N$
    
* **Property 04:** If a distribution $x_{1}$, $x_{2}$, $x_{3}$, ..., $x_{N}$ has $\mu$ as its mean and we increment each number in the distribution by a value $k$, the mean of the new distribution will be $\mu + k$.
    
    * e.g. Let's say we have the following distribution: 7, 3, 5, 2, 9, 4, 1, 2, 3.
    
    * The mean ($\mu_{1}$) $= \frac{7 + 3 + 5 + 2 + 9 + 4 + 1 + 2 + 3}{9} = 4$
    
    * If we increment each number in the above distribution by *k=2*, the new distribution becomes: 9, 5, 7, 4, 11, 6, 3, 4, 5.
    
    * The mean of new distribution ($\mu_{2}$) $= \frac{9 + 5 + 7 + 4 + 11 + 6 + 3 + 4 + 5}{9} = 6$
    
    * Or it can be easily calculated by the formula in *Property 04*.
        
        * So, $\mu_{2} = \mu_{1} + k = 4 + 2 = 6$
        
* **Property 05:** If a distribution $x_{1}$, $x_{2}$, $x_{3}$, ..., $x_{N}$ has $\mu$ as its mean and we multiply each number in the distribution by a value $k$, the mean of the new distribution will be $\mu * k$.
    
    * e.g. Let's say we have the following distribution: 7, 3, 5, 2, 9, 4, 1, 2, 3.
    
    * The mean ($\mu_{1}$) $= \frac{7 + 3 + 5 + 2 + 9 + 4 + 1 + 2 + 3}{9} = 4$
    
    * If we multiply each number in the above distribution by *k=2*, the new distribution becomes: 14, 6, 10, 4, 18, 8, 2, 4, 6.
    
    * The mean of new distribution ($\mu_{2}$) $= \frac{14 + 6 + 10 + 4 + 18 + 8 + 2 + 4 + 6}{9} = 8$
    
    * Or it can be easily calculated by the formula in *Property 05*.
        
        * So, $\mu_{2} = \mu_{1} * k = 4 * 2 = 8$
        
* **Property 06:** If we have two distributions, the first having *N* elements and a mean of $\mu_{N}$, and the second having *M* elements and a mean of $\mu_{M}$, then the mean of the combined $N + M$ elements is:
    
    * $\mu_{N + M} = \frac{(N * \mu_{N}) + (M * \mu_{M})}{N + M}$
    
    * e.g. Let's say our first distribution (N=9) looks like this: 7, 3, 5, 2, 9, 4, 1, 2, 3. ($\mu_{N} = 4$)
    
    * And our second distribution (M=7) looks like this: 4, 6, 5, 4, 1, 3, 2. ($\mu_{M} \approx 3.57$)
    
    * To calculate the mean of both distributions combined, we can use the above formula.
    
    * $\mu_{N + M} = \frac{(9 * 4) + (7 * 3.57)}{9 + 7} \approx 3.81$
    
* **Property 07:** If we add a new number to a distribution with mean $\mu$, the mean increases when the new number is greater than $\mu$, decreases when it is smaller than $\mu$, and stays the same when it equals $\mu$. The new mean can be calculated by the following formula.

    * $\mu_{new} = \mu + \frac{x_{N + 1} - \mu}{N + 1}$, where $N$ is the number of observations before the addition and $x_{N + 1}$ is the newly added number.

    * e.g. Let's say we have the following distribution: 7, 3, 5, 2, 9, 4, 1, 2, 3. The mean of this given distribution is 4.
    
    * **Case 01:** If we add a number greater than the mean to this distribution, the mean will increase. Let's say we add 6 to this distribution. The new mean can be calculated by the above formula.
    
        * $\mu_{new} = 4 + \frac{6 - 4}{10} = 4.2$
    
        * 4.2 is the new mean when 6 is added to the distribution. Since 6 is greater than the original mean (which is 4), the new mean (4.2) is higher.

    * **Case 02:** If we add a number smaller than the mean to the original distribution, the mean will decrease. Let's say we add 2 to the original distribution. The new mean can be calculated by the above formula.
        
        * $\mu_{new} = 4 + \frac{2 - 4}{10} = 3.8$
    
        * 3.8 is the new mean when 2 is added to the original distribution. Since 2 is smaller than the original mean (which is 4), the new mean (3.8) is lower.
        
    * **Case 03:** If we add a number equal to the mean to the original distribution, the mean remains the same. Let's say we add 4 to the original distribution. The new mean can be calculated by the above formula.
        
        * $\mu_{new} = 4 + \frac{4 - 4}{10} = 4$
    
        * 4 is the new mean when 4 is added to the original distribution. Since 4 is equal to the original mean, the mean is unchanged.

* **Property 08:** If we remove a number from a distribution with mean $\mu$, the mean decreases when the removed number is greater than $\mu$, increases when it is smaller than $\mu$, and stays the same when it equals $\mu$. The new mean can be calculated by the following formula.

    * $\mu_{new} = \frac{(\mu * N) - k}{N - 1}$, where $N$ is the total number of observations in the original distribution and $k$ is the number removed from the distribution.

    * e.g. Removing one 3 from the distribution 7, 3, 5, 2, 9, 4, 1, 2, 3 (mean 4) gives $\mu_{new} = \frac{(4 * 9) - 3}{9 - 1} = 4.125$, which is greater than 4 because 3 is smaller than the original mean.
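
Properties 03 to 06 can be checked numerically. The sketch below reuses the example distributions from the list above; the helper `mean` and the variable names are introduced here only for illustration.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


x = [7, 3, 5, 2, 9, 4, 1, 2, 3]   # first distribution (N = 9)
y = [4, 6, 5, 4, 1, 3, 2]         # second distribution (M = 7)
k = 2

mu_x = mean(x)
mu_y = mean(y)

# Property 03: the sum equals mean * N
assert math.isclose(sum(x), mu_x * len(x))

# Property 04: adding k to every value shifts the mean by k
assert math.isclose(mean([v + k for v in x]), mu_x + k)

# Property 05: multiplying every value by k scales the mean by k
assert math.isclose(mean([v * k for v in x]), mu_x * k)

# Property 06: the combined mean is the size-weighted average of the two means
combined = (len(x) * mu_x + len(y) * mu_y) / (len(x) + len(y))
assert math.isclose(combined, mean(x + y))

print(f"Combined mean = {combined:.4f}")
```

Because the code keeps the second mean unrounded, the combined mean may differ slightly in later decimals from a hand calculation that uses the rounded value 3.57.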

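In [38]:
## Property 08 check: mean of 7, 3, 5, 2, 9, 4, 1, 2, 3 after removing one 3
## expected from the formula: ((4 * 9) - 3) / (9 - 1) = 4.125
np.mean(
    [7, 3, 5, 2, 9, 4, 1, 2]
)

4.125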
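
The update formulas in Properties 07 and 08 can also be written as small helper functions, which avoids recomputing the mean from scratch when a single value is added or removed. This is a minimal sketch; the function names `mean_after_adding` and `mean_after_removing` are hypothetical and chosen for this example.

```python
def mean_after_adding(mu, n, new_value):
    """Property 07: mean after appending one value to n observations."""
    return mu + (new_value - mu) / (n + 1)


def mean_after_removing(mu, n, removed_value):
    """Property 08: mean after removing one value from n observations."""
    if n < 2:
        raise ValueError("need at least two observations to remove one")
    return (mu * n - removed_value) / (n - 1)


data = [7, 3, 5, 2, 9, 4, 1, 2, 3]
n = len(data)
mu = sum(data) / n

# Adding a value above the mean raises it; adding one below lowers it
added_high = mean_after_adding(mu, n, 6)
added_low = mean_after_adding(mu, n, 2)

# Removing a value below the mean raises it; removing one above lowers it
removed_low = mean_after_removing(mu, n, 3)
removed_high = mean_after_removing(mu, n, 9)

print(f"Original mean      = {mu}")
print(f"After adding 6     = {added_high:.3f}")
print(f"After adding 2     = {added_low:.3f}")
print(f"After removing 3   = {removed_low:.3f}")
print(f"After removing 9   = {removed_high:.3f}")
```

Both helpers need only the current mean and the count, so they also work when the individual observations are no longer available.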