# Measures of Central Tendency

A measure of central tendency is a summary statistic that represents the central point, or typical value, of a dataset.

These measures indicate where most of the values in a distribution fall and are also known as the **central location of a distribution.**

You can think of it as the tendency of data to cluster around a middle value.

# Uses of Measures of Central Tendency

A measure of central tendency can be used as a standard for judging the relative positions of other items in the same set of data.

A measure of central tendency can be used to compare the relative sizes of two different sets of data (for example: compare the average of two sets of data).

A measure of central tendency is also the reference point for describing the variability (spread) of the data: dispersion measures how closely the individual observations group around the average. This helps us judge how consistent the observations are.
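The first two uses can be illustrated with a minimal sketch. The two lists of scores are hypothetical values chosen only for illustration; they do not come from any dataset in this notebook.

```python
import statistics as st

# Hypothetical test scores for two classes (illustrative values only)
class_a = [62, 70, 75, 81, 92]
class_b = [55, 60, 68, 71, 76]

mean_a = st.mean(class_a)  # 380 / 5
mean_b = st.mean(class_b)  # 330 / 5

# Use 1: judge the relative position of one item within its own dataset
score = 81
position = "above" if score > mean_a else "at or below"
print(f"A score of {score} is {position} the class A mean of {mean_a}")

# Use 2: compare the relative sizes of two datasets through their averages
print(f"Class A mean: {mean_a}, class B mean: {mean_b}")
print(f"Class A is higher on average by {mean_a - mean_b}")
```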

# Types of Central Tendency

1. Mean
2. Median
3. Mode

**Mean, median, and mode** are different measures of center in a numerical dataset. Each one tries to summarize the dataset with a single number that represents a "typical" data point.

# Mean

The mean is the "average": the sum of all data points divided by the number of data points.

The mean is the arithmetic average. It is the value at which the dataset balances, also called the **"point of balance"**, when each data value is stacked on a number line.

<img src="images/population_mean.png">

<img src="images/sample_mean.png">
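The "point of balance" idea can be checked numerically: the deviations of the data points from the mean cancel out. The sketch below is a minimal illustration in plain Python, using the same nine values as the worked example that follows.

```python
# Same nine values as the worked example in this section
data = [1, 5, 4, 5, 8, 1, 2, 3, 4]

# Mean = sum of all data points / number of data points
mean = sum(data) / len(data)

# Signed distance of each point from the mean
deviations = [x - mean for x in data]
total_deviation = sum(deviations)

print(f"mean = {mean:.3f}")
# The deviations cancel out, up to floating-point rounding error
print(f"deviations sum to zero: {abs(total_deviation) < 1e-9}")
```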

<font color="red">Mathematically solved</font>

1. Find the mean of this data 1,5,4,5,8,1,2,3,4.

Solution:

Add all the numbers, then divide by the number of data points (9):

1+5+4+5+8+1+2+3+4 = 33

Mean = 33/9 ≈ 3.67

<font color="red">Python code </font>

In [2]:
import numpy as np
import pandas as pd
import statistics as st

data = np.array([1, 5, 4, 5, 8, 1, 2, 3, 4])

# np.mean returns a float; st.mean on this integer NumPy array printed a truncated 3
print(f'The mean of the data points is: {np.mean(data):.2f}')

The mean of the data points is: 3.67


<font color="red">Note:</font>
    
The mean is the most common measure of central tendency, but it has a major downside: it is easily affected by outliers, which are values significantly greater or smaller than the other values in the dataset.

For example:

<font color="red">Mathematically solved</font>

1. Find the mean of this data 1,5,4,5,8,1,2,3,4,50.

Solution:

Add all the numbers 

1+5+4+5+8+1+2+3+4+50 = 83

Mean = 83/10 = 8.3

<font color="red">Python code</font>

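In [3]:
data = np.array([1, 5, 4, 5, 8, 1, 2, 3, 4, 50])

# np.mean keeps the fractional part; st.mean on this integer NumPy array printed a truncated 8
print(f'The mean of the data points is: {np.mean(data):.2f}')

The mean of the data points is: 8.30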


**Explanation:**

As you can see, a single outlier (50) pulls the mean from about 3.67 up to 8.3, more than doubling it.
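To make the effect concrete, the sketch below compares the mean with the median (covered in the next section) on the same data with and without the outlier. It uses plain Python lists with the `statistics` module; the variable names are illustrative.

```python
import statistics as st

without_outlier = [1, 5, 4, 5, 8, 1, 2, 3, 4]
with_outlier = without_outlier + [50]

for label, values in [("without outlier", without_outlier),
                      ("with outlier", with_outlier)]:
    # The mean reacts strongly to the extreme value; the median barely moves
    print(f"{label}: mean = {st.mean(values):.2f}, median = {st.median(values)}")
```

The mean moves from about 3.67 to 8.3, while the median stays at 4 in both cases.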

# Median

The median is the middle value that splits the dataset in half: it is the middle value of the distribution when the values are arranged in order of size.

It is the value of the variable that divides a set of data into two equal groups, so that half the observations have values smaller than the median and half have values larger than the median.

The median is the measure of choice when a numerical variable has a few unusually low or high values. In that case the mean is pulled away from the center and no longer represents the majority of the data.

<img src="images/median_odd_formula.png">

<img src="images/median_even_formula.png">
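The usual odd/even rule for the median can be sketched as a small function. `median_by_hand` is a hypothetical helper written for illustration, not a library function; in practice `statistics.median` does the same job.

```python
def median_by_hand(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)  # the rule only works on sorted data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: the mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_result = median_by_hand([1, 5, 4, 8, 1, 2, 3, 9, 7])
even_result = median_by_hand([1, 5, 4, 8, 1, 2, 3, 9])
print(odd_result, even_result)
```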

<font color="red">Mathematical Implementation (Odd number of points)</font>

Question: Find the median of this data: 1,5,4,8,1,2,3,9,7.

Solution:

1. Put the data in order first: 1,1,2,3,4,5,7,8,9

2. There is an odd number of data points (9).

3. So the median is 4, the middle (5th) data point.

<font color="red">Python Code:</font>

In [4]:
data = np.array([1,5,4,8,1,2,3,9,7])

print(f'The median of the data point is {st.median(data)}')

The median of the data point is 4


<font color="red">Mathematical Implementation (Even number of points)</font>

Question: Find the median of this data: 1,5,4,8,1,2,3,9.

Solution:

1. Put the data in order first: 1,1,2,3,4,5,8,9

2. There is an even number of data points (8).

3. The median is the mean of the two middle values, 3 and 4, which is 3.5.

<font color="red">Python Code:</font>

In [5]:
data = np.array([1,5,4,8,1,2,3,9])

print(f'The median of the data point is {st.median(data)}')

The median of the data point is 3.5


# Mode

The mode is the value that occurs the largest number of times in a dataset, or the response category of a variable that is most frequently chosen by the respondents.

In a bar chart or histogram, the mode is the tallest bar.

When a distribution has one mode we say it is unimodal; if it has two modes, it is bimodal; if it has several modes, it is multimodal. If no value repeats, the data has no mode.

### Finding the mode:

The mode is the most commonly occurring data point in a dataset. The mode is useful when there are lots of repeated values in a dataset. There can be no mode, one mode, or multiple modes.
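The three cases (no mode, one mode, multiple modes) can be sketched with a frequency count. `find_modes` is a hypothetical helper for illustration; it assumes that "no mode" means no value occurs more than once, and returns an empty list in that case.

```python
from collections import Counter


def find_modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs exactly once: treat the data as having no mode
        return []
    # Keep every value that reaches the highest frequency, in first-seen order
    return [value for value, count in counts.items() if count == highest]


print(find_modes([1, 5, 4, 8, 1, 2, 3, 9]))  # one mode (unimodal)
print(find_modes([1, 1, 2, 2, 3]))           # two modes (bimodal)
print(find_modes([1, 2, 3]))                 # no repeated value
```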

<font color="red">Mathematical Implementation </font>

Question: Find the mode of the data: 1,5,4,8,1,2,3,9.

Answer:

As we can see, 1 occurs most often (twice), so the mode is 1.

<font color="red">Python Code:</font>

In [6]:
data = np.array([1,5,4,8,1,2,3,9])

print(f'The mode of the data points is {st.mode(data)}')

The mode of the data points is 1


# Measures of Central Tendency by Level of Measurement

### Nominal Data:
For a nominal variable, measures of central tendency are applied to the frequencies found in its different categories.

For nominal data, the mode is the only measure of central tendency that can be used.

### Ordinal Data:
Either the **median** or the **mode** can be used as the measure of central tendency. The median is the preferred measure here.

### Interval or Ratio:
The **mean, median, and mode** may all be used as measures of central tendency.

- For a **normal distribution**, the <font color="red">mean</font> is the preferred measure of central tendency.

- For a **skewed distribution**, the <font color="red">median</font> is preferred over the mean and the mode.

The mode is the only measure of central tendency that can be used for all levels of measurement.
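The rules above can be sketched on small categorical examples. The eye colors and survey responses are hypothetical values, and the low/medium/high ordering is an assumption made for this illustration.

```python
import statistics as st

# Nominal data (hypothetical): categories have no order, so only the mode applies
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]
nominal_mode = st.mode(eye_colors)

# Ordinal data (hypothetical): categories are ordered, so the median also applies
levels = ["low", "medium", "high"]  # assumed ordering, lowest to highest
responses = ["low", "medium", "high", "medium", "low", "medium", "high"]
ranks = [levels.index(r) for r in responses]

# median_low always returns an actual data value, so it maps back to a category
ordinal_median = levels[st.median_low(ranks)]
ordinal_mode = st.mode(responses)

print(f"Nominal mode: {nominal_mode}")
print(f"Ordinal median: {ordinal_median}, ordinal mode: {ordinal_mode}")
```

Taking a mean of the ranks would treat the gaps between categories as equal, which ordinal data does not guarantee; that is why the median is preferred there.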

### Given a set of marks of students in a class. Find mean, median and mode.

### Marks = {40, 32, 42, 40, 15, 25, 40, 10, 32, 40, 37, 23,18, 29,41}

In [7]:
import numpy as np
import statistics as st

data = np.array([40,32,42,40,15,25,40,10,32,40,37,23,18,29,41])

# np.mean keeps the fractional part; st.mean on this integer NumPy array printed a truncated 30
mean = np.mean(data)
median = st.median(data)
mode = st.mode(data)

print(f"The mean of the data points in the data set is {mean:.2f}")
print(f"The median of the data points in the data set is {median}")
print(f"The mode of the data points in the data set is {mode}")

The mean of the data points in the data set is 30.93
The median of the data points in the data set is 32
The mode of the data points in the data set is 40
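To see why 40 is the mode of the marks, a frequency count helps. This is a minimal sketch using `collections.Counter` on the same marks; the variable names are illustrative.

```python
from collections import Counter

marks = [40, 32, 42, 40, 15, 25, 40, 10, 32, 40, 37, 23, 18, 29, 41]

# Count how many times each mark occurs
frequencies = Counter(marks)

# The two most frequent marks, as (mark, count) pairs
top_two = frequencies.most_common(2)
print(top_two)
```

The mark 40 occurs four times and 32 occurs twice; every other mark occurs once.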


In [12]:
# importing the dataset
df = pd.read_csv('Yield_vs_Temperature.csv')
df

Unnamed: 0,Observation Number,Temperature (Xi),Yield (Yi)
0,1,50,122
1,2,53,118
2,3,54,128
3,4,55,121
4,5,56,125
5,6,59,136
6,7,62,144
7,8,65,142
8,9,67,149
9,10,71,161


In [9]:
# finding mean of the ten temperatures shown above (sum 592 / 10)
df['Temperature (Xi)'].mean()

59.2

In [10]:
# finding median (average of the two middle values, 56 and 59)
df['Temperature (Xi)'].median()

57.5

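In [15]:
# finding mode
# Every temperature occurs once, so there is no meaningful mode;
# st.mode (Python 3.8+) then returns the first value it encounters
st.mode(df['Temperature (Xi)'])

50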
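As a cross-check that does not need the CSV file, the sketch below recomputes the three measures with the `statistics` module. The list is copied from the ten `Temperature (Xi)` values displayed in the table above, and `statistics.multimode` requires Python 3.8 or later.

```python
import statistics as st

# Temperature (Xi) values copied from the table displayed above
temperatures = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71]

mean_temp = st.mean(temperatures)      # 592 / 10
median_temp = st.median(temperatures)  # average of the two middle values
modes = st.multimode(temperatures)     # every value tied for most frequent

print(f"mean = {mean_temp}, median = {median_temp}")
print(f"number of tied modes = {len(modes)}")
```

Because every temperature occurs exactly once, `multimode` returns all ten values, which signals that this column has no meaningful mode.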