# Measures of Central Tendency

**"The tendency of quantitative data to cluster around some central value."**

Measures of central tendency help you find the middle, or the average, of a data set. The 3 most common measures of central tendency are the mode, median, and mean.

* Mean: the sum of all values divided by the total number of values.
* Median: the middle number in an ordered data set.
* Mode: the most frequent value.
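As a quick illustration of the three definitions above, here is a minimal sketch using Python's standard `statistics` module. The small `data` list is an assumed example chosen so the results are easy to verify by hand.

```python
import statistics

# Assumed example data; sorted it is [1, 3, 3, 6, 7, 8, 9]
data = [6, 3, 8, 1, 7, 9, 3]

mean_value = statistics.mean(data)      # sum of values / number of values = 37 / 7
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # most frequent value

print(mean_value)
print(median_value)  # 6
print(mode_value)    # 3
```

The sections below build the mean and the median step by step instead of relying on a library call.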

## Mean

*The sum of all values divided by the total number of values.*

**Given a set of data, how can we find the mean?**

In [1]:
# Suppose that A is a list of student GPAs, we are going to find the mean:
A = [3.5, 2, 2.8, 2.9, 3.3, 4.0, 2.9, 1.7, 3.4, 3.0, 3.0, 2.6, 3.1, 2.6, 2.0, 3.7]

# First, find the sum of all values in A
# (named `total` to avoid shadowing Python's built-in sum())
total = 0
for value in A:
    total += value

# Then divide the sum by the number of elements in A
mean = total / len(A)

print(mean)

2.9062500000000004


**Using NumPy Arrays:**

Note that using NumPy will make this much easier:

In [3]:
import numpy as np

A = [3.5, 2, 2.8, 2.9, 3.3, 4.0, 2.9, 1.7, 3.4, 3.0, 3.0, 2.6, 3.1, 2.6, 2.0, 3.7]

# Create a NumPy array from the Python list A
A = np.array(A)

# Find the sum using NumPy's built-in sum() method
# (named `total` to avoid shadowing Python's built-in sum())
total = A.sum()

mean = total / len(A)

print(mean)

2.90625


In fact, we can find the mean directly by calling NumPy's `mean` function:

In [4]:
import numpy as np

A = np.array(A)

mean = np.mean(A)

print(mean)

2.90625





## Median

*The middle number in an ordered data set.*

**Finding the median using Python lists:**

In [13]:
def median(x):
    # First, sort the data. sorted() returns a new list,
    # so the caller's list is left unchanged
    x = sorted(x)
    print(x)
    # Finding the middle depends on whether the number
    # of elements in the list is odd or even
    n = len(x)
    middle = n // 2  # e.g. 7 // 2 = 3; this works because indexing starts from 0

    if n % 2 == 1:  # Odd number of elements: take the single middle element
        print("the list has odd elements")
        result = x[middle]    #  x[3]

    else:  # Even number of elements: average the two middle elements
        print("the list has even elements")
        result = (x[middle - 1] + x[middle]) / 2

    print(result)

In [14]:
# Odd
x = [6, 3, 8, 1, 7, 9, 3]
median(x)

# Even
x = [6, 3, 2, 8, 9, 1, 5, 4]
median(x)

[1, 3, 3, 6, 7, 8, 9]
the list has odd elements
6
[1, 2, 3, 4, 5, 6, 8, 9]
the list has even elements
4.5


**Using NumPy to find the median:**

In [3]:
import numpy as np

# Odd
x = np.array([6, 3, 8, 1, 7, 9, 3])
# sorted: [1, 3, 3, 6, 7, 8, 9]
print(np.median(x))

# Even
x = np.array([6, 3, 2, 8, 9, 1, 5, 4])
print(np.median(x))

6.0
4.5


**NumPy makes this kind of operation on data much simpler, which is why we are going to use it from now on.**

*We are not very interested in the mode here, so we will not discuss it further. Feel free to read about it online.*

## Mean VS Median

As mentioned above, measures of central tendency give us a representative value of the data (where the data is centered).

* If the data is symmetric (for example, normally distributed), then **the mean and the median are approximately the same**.
* If the data is skewed, then **the median usually represents the data better**, because the mean is pulled toward the extreme values in the tail.
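To make the comparison concrete, the sketch below computes both measures for a symmetric data set and for a skewed one. Both arrays are made-up illustrative values (the skewed one could be read as salaries in thousands with a single very large value), not data from this notebook.

```python
import numpy as np

# Assumed example: symmetric data, values mirror each other around 3
symmetric = np.array([1, 2, 3, 4, 5])

# Assumed example: right-skewed data with one very large value
skewed = np.array([30, 32, 35, 38, 40, 42, 45, 300])

print(np.mean(symmetric), np.median(symmetric))  # 3.0 3.0
print(np.mean(skewed), np.median(skewed))        # 70.25 39.0
```

In the skewed case a single value of 300 moves the mean above every other observation, while the median stays among the typical values.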

# Standard Deviation σ

*measures the dispersion of a dataset relative to its mean*

Read more [here](https://www.investopedia.com/terms/s/standarddeviation.asp#:~:text=The%20standard%20deviation%20is%20a,square%20root%20of%20the%20variance.&text=If%20the%20data%20points%20are,the%20higher%20the%20standard%20deviation.) to understand the theory behind it and exactly what it means.

**We can find the standard deviation for any array easily using NumPy:**

In [15]:
# Create a random array
np.random.seed(42)
x = np.random.rand(100)

std = np.std(x)

print(std)

0.29599822663249037
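To show what `np.std` computes, here is a minimal sketch that works out the standard deviation by hand as the square root of the average squared distance from the mean. The small array is an assumed example chosen so the arithmetic is easy to follow. By default `np.std` uses the population formula (`ddof=0`), which is what the manual calculation reproduces.

```python
import numpy as np

# Assumed example data: the mean is 5 and the squared deviations sum to 32
x = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mu = x.mean()                      # mean of the data
variance = np.mean((x - mu) ** 2)  # average squared deviation: 32 / 8 = 4
sigma = np.sqrt(variance)          # standard deviation is the square root of the variance

print(sigma)      # 2.0
print(np.std(x))  # 2.0
```

If the sample standard deviation (dividing by n - 1 instead of n) is wanted, `np.std(x, ddof=1)` can be used.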


# Empirical Rule

**The empirical rule states that:**

### *For a normal distribution, almost all observed data will fall within three standard deviations (denoted by σ) of the mean or average (denoted by µ).*


**In particular, the empirical rule predicts that:**
* **About 68% of observations fall within one standard deviation of the mean (µ ± σ)**
* **About 95% fall within two standard deviations (µ ± 2σ)**
* **About 99.7% fall within three standard deviations (µ ± 3σ)**
-----------------------------------------------
**µ and σ can be calculated using the methods above**

**The rule only applies when the data is approximately normally distributed (bell-shaped and symmetric around the mean µ)**

**Read more about the Empirical Rule [here](https://www.investopedia.com/terms/e/empirical-rule.asp)**
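The percentages can be checked empirically. The sketch below draws samples from a normal distribution and counts how many fall within 1, 2, and 3 standard deviations of the mean. The seed, the distribution parameters (`loc=50`, `scale=10`), the sample size, and the helper name `fraction_within` are all arbitrary choices made for illustration.

```python
import numpy as np

# Draw samples from a normal distribution (parameters are assumptions)
rng = np.random.default_rng(0)
samples = rng.normal(loc=50.0, scale=10.0, size=100_000)

mu = samples.mean()    # estimated mean
sigma = samples.std()  # estimated standard deviation


def fraction_within(k):
    """Fraction of samples within k standard deviations of the mean."""
    return np.mean(np.abs(samples - mu) <= k * sigma)


for k in (1, 2, 3):
    print(f"within {k} sigma: {fraction_within(k):.3f}")
```

The printed fractions should come out close to 0.68, 0.95, and 0.997, with small differences caused by random sampling.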

-----------------------------------------------