# Central Tendency

In statistics, a central tendency (or measure of central tendency) is a central or typical value for a probability distribution. It may also be called a center or location of the distribution. ... The most common measures of central tendency are the arithmetic mean, the median, and the mode.

### Mode

The mode is the most frequent observation (or observations) in a sample. If we have the sample `[4, 1, 2, 2, 3, 5]`, then its mode is `2` because `2` appears two times in the sample whereas the other elements only appear once.

The mode doesn't have to be unique. Some samples have more than one mode. Say we have the sample `[4, 1, 2, 2, 3, 5, 4]`. This sample has two modes, `2` and `4`, because they are the values that appear most often and both appear the same number of times (twice each).

The mode is commonly used for `categorical` data. Common categorical data types are:

- `boolean` - Can take only two values, such as true or false, male or female
- `nominal` - Can take more than two values with no inherent order, such as American - European - Asian - African
- `ordinal` - Can take more than two values, and the values have a logical order, such as few - some - many

When we're analyzing a dataset of categorical data, we can use the mode to find out which category is the most common in our data.
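As a minimal sketch of finding the most common category, the example below counts a small nominal sample. The `regions` list is made-up illustration data, not taken from any dataset in this notebook; `collections.Counter` and `statistics.multimode` (Python 3.8+) are standard-library tools.

```python
from collections import Counter
import statistics

# Hypothetical nominal sample (made-up values, for illustration only).
regions = ["Asian", "European", "Asian", "African", "American", "Asian", "European"]

# Counter tallies how often each category occurs.
counts = Counter(regions)

# most_common(1) returns a list with the single (category, count) pair
# that has the highest count.
most_common_category, frequency = counts.most_common(1)[0]

# multimode returns every category tied for the highest count.
modes = statistics.multimode(regions)

print(most_common_category, frequency)
print(modes)
```

Because the mode only needs equality comparisons, it works on labels that cannot be averaged or sorted meaningfully.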

#### Using `statistics` Module

In [1]:
import statistics

'mode' in dir(statistics)  # check that the mode function exists in the statistics module

True

In [3]:
statistics.mode([1,2,1,1,3,4,5,2,1,2,2,4,4])

1

In [4]:
statistics.multimode([1,2,1,1,3,4,5,2,1,2,2,4,4])

[1, 2]

#### Using a user-defined function

In [39]:
def find_mode(sample):
    val_counts_map = {}

    # count how often each value occurs
    for samp in sample:
        if samp in val_counts_map:
            val_counts_map[samp] += 1
        else:
            val_counts_map[samp] = 1

    # find the highest frequency
    max_freq = max(val_counts_map.values())

    # collect every value with that frequency (multimode)
    result = []
    for key, count in val_counts_map.items():
        if count == max_freq:
            result.append(key)

    return result

In [40]:
find_mode([1,2,1,1,3,4,5,2,1,2,2,4,4])

[1, 2]

### Mean

If we have a sample of numeric values, then its mean (or average) is the sum of the values (or observations) divided by the number of values.

Say we have the sample `[4, 8, 6, 5, 3, 2, 8, 9, 2, 5]`. We can calculate its mean by performing the operation:

`(4 + 8 + 6 + 5 + 3 + 2 + 8 + 9 + 2 + 5) / 10 = 5.2`
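The arithmetic above can be checked with a short sketch that computes the mean of the same sample twice, once by hand and once with `statistics.mean`. The variable names are only illustrative.

```python
import statistics

sample = [4, 8, 6, 5, 3, 2, 8, 9, 2, 5]

# Sum of the observations divided by the number of observations.
total = sum(sample)
count = len(sample)
mean_by_hand = total / count

# The standard-library function should give the same result.
mean_from_module = statistics.mean(sample)

print(total, count)
print(mean_by_hand, mean_from_module)
```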

In [43]:
statistics.mean([1,2,3,4,5,6,7,8,9,10])

5.5

In [42]:
def find_mean(sample):
    return sum(sample) / len(sample)

find_mean([1,2,3,4,5,6,7,8,9,10])

5.5

### Median

The median is the middle number in a sorted list of numbers (ascending or descending) and can be more descriptive of the data set than the average, especially when the data contain outliers. ... If there is an odd number of values, the median is the value in the middle, with the same number of values below and above it. If there is an even number of values, the median is the average of the two middle values.
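The sketch below covers both an odd-length and an even-length sample (where the two middle values are averaged), and then shows how a single extreme value shifts the mean far more than the median. The helper name `median_of` and all sample values are hypothetical, chosen only for illustration.

```python
import statistics

def median_of(values):
    """Hypothetical helper: median of a non-empty sequence, without mutating it."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd length: the single middle value.
        return ordered[mid]
    # Even length: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [7, 1, 5, 3, 9]   # sorted: [1, 3, 5, 7, 9]
even_sample = [7, 1, 5, 3]     # sorted: [1, 3, 5, 7]
print(median_of(odd_sample), median_of(even_sample))

# Made-up values: one extreme observation is appended to a tight sample.
typical = [30, 32, 35, 38, 40]
with_outlier = typical + [400]

print(statistics.mean(typical), statistics.median(typical))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

In this made-up sample the outlier pulls the mean well above every typical value, while the median stays close to the bulk of the data.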

|   |   |
|---|---|
|![median](https://upload.wikimedia.org/wikipedia/commons/thumb/c/cf/Finding_the_median.png/220px-Finding_the_median.png)   | ![median](https://www.gstatic.com/education/formulas2/355397047/en/median_formula.svg)  |


In [44]:
statistics.median([1,2,3,4,3,2,2,4,2])

2

In [50]:
sample = [1,2,3,4,5,6,1,4,2]

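def find_median(sample):
    # sort a copy so the caller's list is left unchanged
    ordered = sorted(sample)
    n = len(ordered)
    mid = n // 2

    if n % 2 == 0:
        # even length: average the two middle values (indices mid - 1 and mid)
        return (ordered[mid - 1] + ordered[mid]) / 2
    else:
        # odd length: the single middle value
        return ordered[mid]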

find_median(sample)

3

In [49]:
statistics.median([1,2,3,4,5,6,1,4,2])

3

## Central Tendency Quiz Solution using Pandas DataFrame

[Dataset link - BBC Memory Test](https://docs.google.com/spreadsheets/d/1VqVtu1bszwZUjjigukMcMHQNxFSvviGgjx6REcTUQgc/edit#gid=0)

In [52]:
import pandas as pd

In [56]:
df = pd.read_csv('./datasets/Sample Memory Scores.csv')

In [57]:
df.head()

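| Unnamed: 0 | Recognition_Score | Variance | Squared Variance | Temporal_Memory_Score | Other1 | Other2 |
|---|---|---|---|---|---|---|
| 0 | 91 | 91 | 8281 | 86 | 86 | 7396 |
| 1 | 95 | 95 | 9025 | 78 | 78 | 6084 |
| 2 | 95 | 95 | 9025 | 56 | 56 | 3136 |
| 3 | 91 | 91 | 8281 | 81 | 81 | 6561 |
| 4 | 100 | 100 | 10000 | 75 | 75 | 5625 |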


#### Calculate Mean, Mode, Median

In [67]:
df.Recognition_Score.mean()

93.11555555555556

In [68]:
df.Recognition_Score.mode()

0    100
dtype: int64

In [69]:
df.Recognition_Score.median()

92.0

In [70]:
find_mode(list(df.Recognition_Score))

[100]

In [71]:
statistics.mode(list(df.Recognition_Score))

100
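The pandas and `statistics` results agree here because the column has a single mode. As a hedged sketch of what happens with ties, the example below uses a small made-up Series (not the BBC dataset): `Series.mode()` returns a Series containing every tied value in ascending order, whereas `statistics.mode` returns a single value (the first one encountered, in Python 3.8+).

```python
import statistics

import pandas as pd

# Hypothetical scores (made up for illustration; not the BBC dataset).
scores = pd.Series([90, 95, 95, 100, 100, 80], name="score")

mean_score = scores.mean()
median_score = scores.median()

# Series.mode() returns a Series holding every tied mode, sorted ascending.
modes = scores.mode()

# statistics.mode returns only one value when several are tied.
single_mode = statistics.mode(scores.tolist())

print(mean_score, median_score)
print(modes.tolist(), single_mode)
```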

### Other Pandas DataFrame Operations

In [72]:
df.Recognition_Score.sum()

20951

In [73]:
df.Recognition_Score.min()

10

In [74]:
df.Recognition_Score.max()

200

In [76]:
df.Recognition_Score.size

225

In [77]:
# Calculate mean ... using sum() and size
df_mean = df.Recognition_Score.sum() / df.Recognition_Score.size
df_mean

93.11555555555556

In [94]:
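# Frequency of each score, most frequent first. Note that `.values` and `.index`
# on the result are attributes, not methods; calling `.values()` raises
# TypeError: 'numpy.ndarray' object is not callable
df.Recognition_Score.value_counts()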
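A minimal sketch of how `value_counts()` relates to the mode, using a small made-up Series rather than the dataset above: the counts are indexed by the distinct values, so the modes are the index entries whose count equals the maximum. The variable names are illustrative.

```python
import pandas as pd

# Hypothetical scores (made up for illustration; not the BBC dataset).
scores = pd.Series([90, 95, 95, 100, 100, 80])

# value_counts() returns a Series: index = distinct values, data = frequencies.
counts = scores.value_counts()

# The highest frequency, and every value that reaches it.
top_frequency = counts.max()
modes_from_counts = sorted(counts[counts == top_frequency].index.tolist())

# `.values` is an attribute holding a NumPy array, so it is read without parentheses.
frequencies = counts.values

print(top_frequency, modes_from_counts)
print(sorted(frequencies.tolist(), reverse=True))
```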