# Averages and Variability

> Explore how to summarize distributions using the mean, the median, and the mode. Explore how to measure variability using variance or standard deviation, and how to locate and compare values using z-scores.

- author: Victor Omondi
- toc: true
- comments: true
- categories: [statistics]
- image:

# Libraries

In [1]:
# WARNINGS
import warnings

# MANIPULATION & CLEANING
import pandas as pd
import numpy as np

# VISUALIZATION
import matplotlib.pyplot as plt
import seaborn as sns

## Libraries Configuration

In [2]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

# The mean

We will explore:
* Why the mean is the balance point of a distribution.
* How to distinguish between the sample mean and the population mean.
* Why the sample mean is an unbiased estimator.

We'll explore how to *summarize* the distribution of a variable with a single value. Depending on the characteristics of a distribution, we can summarize it using **the mean**, **the weighted mean**, **the median**, or **the mode**.

We'll also explore how to measure the *variability* in a distribution. If we have a distribution *A* with the values [3, 3, 3, 3], and a distribution *B* with [30, 1, 15, 43], we can clearly see that there's much more variability (diversity) in *B*. We'll learn to quantify variability using measures like **variance** and **standard deviation**.

We can then explore how to *locate a value in a distribution* and determine how it compares to other values. For instance, when we analyze salaries, we might want to find out whether a salary of \$75,000 is common or extreme inside a company. We'll explore how to answer this question with precision using a **z-score**.
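As a quick preview of the variability comparison above, here is a minimal sketch that computes the variance and standard deviation of distributions *A* and *B* with NumPy. It assumes we treat each list as a whole population (NumPy's default, `ddof=0`); the variable names are only illustrative.

```python
import numpy as np

# The two distributions from the text
A = np.array([3, 3, 3, 3])
B = np.array([30, 1, 15, 43])

# Population variance and standard deviation (NumPy's default, ddof=0)
var_a, std_a = A.var(), A.std()
var_b, std_b = B.var(), B.std()

print(f"A: variance={var_a:.2f}, std={std_a:.2f}")
print(f"B: variance={var_b:.2f}, std={std_b:.2f}")
```

Every value in *A* is identical, so both measures are 0 for *A*, while *B* yields clearly larger values, which matches the visual impression that *B* is more spread out.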

Let's say we want to summarize the distribution below with a single value that is representative of the distribution as a whole.

$$
[0, 1, 4, 7, 8, 10]
$$

Intuitively, a summary value that represents the entire distribution should take every value into account *equally*. One way to do this is to sum all the values in the distribution and then divide the total by the number of values:

$$
\frac{0+1+4+7+8+10}{6}=\frac{30}{6}=5
$$

When we compute the summary value of a distribution in this way, we call the value **the arithmetic mean**, or simply **the mean**. For our distribution above, the mean is 5.
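The computation above can be sketched in Python as follows. The block computes the mean by hand and with `np.mean`, then checks the balance-point idea by summing each value's distance from the mean. The variable names are illustrative and not part of the original notebook.

```python
import numpy as np

distribution = [0, 1, 4, 7, 8, 10]

# Manual computation: sum of the values divided by the number of values
manual_mean = sum(distribution) / len(distribution)

# The same computation using NumPy
numpy_mean = np.mean(distribution)

# Balance point check: deviations below and above the mean cancel out
deviations = [x - manual_mean for x in distribution]
total_deviation = sum(deviations)

print(manual_mean, numpy_mean, total_deviation)
```

The deviations below the mean (-5, -4, -1) and above it (2, 3, 5) sum to zero, which is one way to see why the mean can be described as the balance point of a distribution.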