# Mean
- The mean (or average) is the most popular and well-known measure of central tendency. 
- We can calculate the mean for both discrete and continuous data, although its use is most often with continuous data.
- Formula:
\begin{equation} \overline{x} = \frac{x_{1} + x_{2} + x_{3} + \dots + x_{n}}{n} \end{equation}

- We can also write this equation as
\begin{equation} \overline{x} = \frac{\sum_{i=1}^{n} x_{i}}{n} \end{equation}
    - $\overline{x}$ = mean value of the data
    - $x_1, x_2, x_3, \dots, x_n$ = the $n$ individual values in the data
    - $n$ = total number of values (samples)
- The mean is equal to the sum of all the values in the data set divided by the number of values in the data set.
- One of its important properties is that it minimises the error when a single number is used to predict every value in your data set: of all possible choices, the mean gives the smallest sum of squared deviations, $\sum_{i=1}^{n} (x_i - \overline{x})^2$.
- Another important property of the mean is that it includes every value in your data set as part of the calculation. 
- Also, the mean is the only measure of central tendency where the sum of the deviations of each value from the mean is always zero.
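A quick numerical check of the two properties above, as a sketch that reuses the salary values from the code cell in this notebook. It uses `fractions.Fraction` so the zero-sum check is exact rather than blurred by floating-point rounding. The helper `sum_squared_error` is a hypothetical name introduced only here, and the candidate centres (the mean plus or minus 500, and the median 13500) are arbitrary choices for illustration; the check uses squared error, which is the sense in which the mean minimises error.

```python
from fractions import Fraction

# Salary values reused from the mean example in this notebook
values = [12000, 13000, 12000, 14000, 15000, 13000, 15000, 16000, 17000, 14000, 13000, 12000]

# Exact rational arithmetic, so the check is not affected by floating-point rounding
mean = Fraction(sum(values), len(values))

# Property: deviations from the mean always sum to zero
deviation_sum = sum(x - mean for x in values)
print("Sum of deviations from the mean:", deviation_sum)  # prints 0

def sum_squared_error(data, centre):
    """Hypothetical helper: total squared error when `centre` is used to predict every value."""
    return sum((x - centre) ** 2 for x in data)

# Property: among these candidates, the mean gives the smallest total squared error
candidates = [mean - 500, mean, mean + 500, Fraction(13500)]  # 13500 is the median of this data
for centre in candidates:
    print(f"centre={float(centre):10.2f}  squared error={float(sum_squared_error(values, centre)):,.0f}")

best = min(candidates, key=lambda c: sum_squared_error(values, c))
print("Best centre is the mean:", best == mean)  # prints True
```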


**Advantages**
  - Most popular measure in fields such as business, engineering and computer science.
  - It is unique - there is only one answer.
  - Useful when comparing sets of data

**Disadvantages**
  - Affected by extreme values (outliers)
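To see how strongly a single extreme value pulls the mean, here is a sketch that appends one hypothetical outlier salary (100000, a number invented purely for illustration) to the salary list used in this notebook.

```python
# Salary values reused from this notebook's mean example
values = [12000, 13000, 12000, 14000, 15000, 13000, 15000, 16000, 17000, 14000, 13000, 12000]

# One hypothetical extreme salary, added only to illustrate the effect of an outlier
with_outlier = values + [100000]

mean_before = sum(values) / len(values)
mean_after = sum(with_outlier) / len(with_outlier)

print(f"Mean without the outlier: {mean_before:.2f}")  # 13833.33
print(f"Mean with the outlier:    {mean_after:.2f}")   # 20461.54
```

One value out of thirteen moves the mean by more than 6,600, even though twelve of the thirteen salaries are at most 17000.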

```python
# Let's calculate the mean value
values = [12000, 13000, 12000, 14000, 15000, 13000, 15000, 16000, 17000, 14000, 13000, 12000]

mean = sum(values) / len(values)

print(f"The average salary of people in Surat is {round(mean, 2)}")
```

# Median

- To get the median of a data set, arrange all of its values from smallest to greatest, then take the middle value.
- Let's consider an example: X = [3,1,6,7,2,8,9]
   - The first step is to sort the given values:
   - X = [1,2,3,6,7,8,9]
   - Take the middle value:
       - If the length of X is odd, take the single middle value, so here the median is **6**.
       - If the length of X is even, take the average of the two middle values.
   - The mean of this data is 36 / 7 ≈ **5.14**, which is close to the median.
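The steps above can be sketched as a small function. The name `median` is a hypothetical helper for this illustration, and the even-length list at the end is an invented example showing the "average the two middle values" rule.

```python
def median(data):
    """Hypothetical helper: median of a non-empty list of numbers."""
    ordered = sorted(data)  # step 1: sort (sorted() returns a new list, the input is unchanged)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd length: take the single middle value
        return ordered[mid]
    # Even length: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

X = [3, 1, 6, 7, 2, 8, 9]
print(sorted(X))                  # [1, 2, 3, 6, 7, 8, 9]
print(median(X))                  # 6
print(round(sum(X) / len(X), 2))  # 5.14

# A hypothetical even-length list: sorted it is [1, 2, 3, 6, 7, 8]
print(median([3, 1, 6, 7, 2, 8]))  # (3 + 6) / 2 = 4.5
```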

**Advantages**
  - Extreme values (outliers) do not affect the median as strongly as they do the mean.
  - Useful when comparing sets of data.
  - It is unique - there is only one answer.

**Disadvantages**
  - Not as popular as the mean.

```python
# Let's take the same example
values = [12000, 13000, 12000, 14000, 15000, 13000, 15000, 16000, 17000, 14000, 13000, 12000]
values = sorted(values)
n = len(values)
mid_index = n // 2

if n % 2 == 0:
    # Even number of values: average the two middle values
    median = (values[mid_index - 1] + values[mid_index]) / 2
else:
    # Odd number of values: take the single middle value
    median = values[mid_index]

mean = sum(values) / len(values)
print(f"The median salary of Surat: ${round(median, 2)}")
print(f"The mean salary of Surat: ${round(mean, 2)}")
```

```
The median salary of Surat: $13500.0
The mean salary of Surat: $13833.33
```
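Python's standard library `statistics` module provides `mean` and `median`. This short sketch uses them to cross-check the hand-written calculation on the same salary data.

```python
import statistics

values = [12000, 13000, 12000, 14000, 15000, 13000, 15000, 16000, 17000, 14000, 13000, 12000]

print(statistics.median(values))          # 13500.0 (average of 13000 and 14000)
print(round(statistics.mean(values), 2))  # 13833.33
```

`statistics.median` sorts internally, so, unlike the cell above, there is no need to call `sorted()` first.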


# Mode

- The definition is simple: the mode is the value that occurs most often in the data.
- If you plot your data as a bar chart or histogram, the mode corresponds to the highest bar.
- You can, therefore, sometimes consider the mode as being the most popular option.
- Let's consider an example: X = [3,1,6,1,2,1,9]
    - Find the value that is repeated the most times.
    - The value 1 appears three times, more than any other value, so it is the mode of the data.
- One of the problems with the mode is that it is not necessarily unique, which causes problems when two or more values share the highest frequency.
- Let's consider an example: X = [3,1,6,1,2,9,2]
    - The values 1 and 2 both appear twice, so both are modes of the data.
    - If there is one mode, the data is called **unimodal**.
    - If there are two modes, the data is called **bimodal**.
    - If there are more than two, the data is called **multimodal**.
- Another problem with the mode is that it does not give a good measure of central tendency when the most common value is far away from the rest of the data in the data set.
- It is possible to have more than one mode, and it is possible to have no mode. If there is no mode, write "no mode"; do not write zero (0).

Note that the `Mode` function in the code cell below does not follow this convention: given an array with no repeated values, it returns every value as a mode.
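The examples in the list above can be checked with Python's `statistics.multimode` (available since Python 3.8), which, like the behaviour just described, returns every value when nothing repeats. The wrapper `describe_modes` is a hypothetical helper that applies this section's convention of reporting "no mode" when every value occurs exactly once, and labels the data as unimodal, bimodal or multimodal.

```python
from collections import Counter
from statistics import multimode

def describe_modes(data):
    """Hypothetical helper: return (modes, label) for a non-empty list."""
    if not data:
        raise ValueError("data must be non-empty")
    counts = Counter(data)
    if len(data) > 1 and max(counts.values()) == 1:
        # Every value occurs once: report "no mode" instead of listing every value
        return [], "no mode"
    modes = multimode(data)  # all values sharing the highest count, in first-seen order
    labels = {1: "unimodal", 2: "bimodal"}
    return modes, labels.get(len(modes), "multimodal")

print(describe_modes([3, 1, 6, 1, 2, 1, 9]))  # ([1], 'unimodal')
print(describe_modes([3, 1, 6, 1, 2, 9, 2]))  # ([1, 2], 'bimodal')
print(describe_modes([5, 2, 8, 4]))           # ([], 'no mode')
print(multimode([5, 2, 8, 4]))                # [5, 2, 8, 4]
```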

**Advantages**
   - Extreme values (outliers) do not affect the mode.
   
**Disadvantages**
   - Not as popular as mean and median.
   - Not necessarily unique - may be more than one answer
   - When no values repeat in the data set, the mode is every value and is useless.
   - When there is more than one mode, it is difficult to interpret and/or compare.

```python
from collections import Counter

# Let's calculate the mode(s)
def Mode(values):
    # Count how often each value occurs; unlike list.sort(), this leaves the input unchanged
    frequency = Counter(values)
    highest = max(frequency.values())
    # Every value that reaches the highest count is a mode (sorted for stable output)
    return sorted(value for value, count in frequency.items() if count == highest)

value = [3, 4, 4, 5, 6, 6, 6, 7, 8, 8, 8, 11]
mode = Mode(value)
print("Mode(s) is/are: " + str(mode))
```

```
Mode(s) is/are: [6, 8]
```


# What to use? Mean, median or mode?

- There is no definite answer to this question.
- When to use which one really depends on the particular case.
- There are some rules of thumb, but it rarely makes sense to describe a dataset with only one number.

**Two important points**
- If we add a constant to every point in the dataset, the mean, median and mode are shifted by the same constant. If you add 5 to each data value, you add 5 to the mean, mode and median.
- If we multiply every point in the dataset by a constant, the mean, median and mode are multiplied by the same constant. If you multiply each data value by 2, you multiply the mean, mode and median by 2.
- Let's see an example

| Dataset | Values | Mean | Mode | Median |
|---------|--------|------|------|--------|
| Original data | 6, 7, 8, 10, 12, 14, 14, 15, 16, 20 | 12.2 | 14 | 13 |
| Original data with 5 added to each value | 11, 12, 13, 15, 17, 19, 19, 20, 21, 25 | 17.2 | 19 | 18 |
| Original data with each value multiplied by 2 | 12, 14, 16, 20, 24, 28, 28, 30, 32, 40 | 24.4 | 28 | 26 |
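The numbers in the table can be reproduced with the standard library `statistics` module. This sketch builds the shifted and scaled datasets from the original row and asserts both properties; `math.isclose` is used for the mean because it is a floating-point result.

```python
from math import isclose
from statistics import mean, median, mode

original = [6, 7, 8, 10, 12, 14, 14, 15, 16, 20]
shifted = [x + 5 for x in original]  # add 5 to every value
scaled = [x * 2 for x in original]   # multiply every value by 2

for name, data in [("original", original), ("+5", shifted), ("x2", scaled)]:
    print(f"{name:>8}: mean={mean(data)}, mode={mode(data)}, median={median(data)}")

# Adding a constant shifts every measure by that constant
assert isclose(mean(shifted), mean(original) + 5)
assert median(shifted) == median(original) + 5
assert mode(shifted) == mode(original) + 5

# Multiplying by a constant scales every measure by that constant
assert isclose(mean(scaled), mean(original) * 2)
assert median(scaled) == median(original) * 2
assert mode(scaled) == mode(original) * 2
```

`median` prints 13.0 rather than 13 for the original data because, with an even number of values, it averages the two middle values (12 and 14).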


![alt text](https://blog.minitab.com/hubfs/Imported_Blog_Media/1_curves.png)



If you have understood the points above completely, this image should be self-explanatory.