# The mean, median, and mode
The mean, median, and mode are measures of central tendency. Each one summarizes the distribution of a variable with a single typical value.

 **Mean** is the average value. It can be calculated by adding all the numbers in a dataset and then dividing the sum by the number of values.
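
As a minimal sketch of that definition, the snippet below computes the mean by hand and compares it with the `mean` method. The values are hypothetical and chosen only to keep the arithmetic simple.

```python
import pandas as pd

# Hypothetical values chosen only for illustration.
values = pd.Series([2, 4, 6, 8])

# Add all the values, then divide by how many there are.
manual_mean = values.sum() / len(values)

print(manual_mean)    # 5.0
print(values.mean())  # 5.0
```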
 
 **Median** is the middle value when the values are sorted in ascending or descending order. If the number of values in the dataset is even, the median is the average of the two values in the middle.
 
For example, the following Series has six values. Run the code to verify that the median value is the average of the two values in the middle.

In [1]:
import pandas as pd

myseries = pd.Series([1, 2, 5, 7, 11, 36])

print(myseries.median())

6.0
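
To see where 6.0 comes from, the same calculation can be sketched by hand. The variable names `ordered`, `lower_middle`, and `upper_middle` are introduced here for illustration only.

```python
import pandas as pd

myseries = pd.Series([1, 2, 5, 7, 11, 36])

# Sort the values and reset the index so positions run from 0 to n - 1.
ordered = myseries.sort_values().reset_index(drop=True)
n = len(ordered)

# With an even count, the two middle values sit at positions n/2 - 1 and n/2.
lower_middle = ordered.iloc[n // 2 - 1]  # 5
upper_middle = ordered.iloc[n // 2]      # 7

manual_median = (lower_middle + upper_middle) / 2
print(manual_median)  # 6.0
```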


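**Mode** is the most frequent value in a dataset. A dataset can have more than one mode, so pandas returns the result as a Series rather than a single number. Let’s run a quick example: every value in the Series above occurs exactly once, so all six values tie for most frequent and all of them are returned.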

In [2]:
print(myseries.mode())

0     1
1     2
2     5
3     7
4    11
5    36
dtype: int64


In [4]:
import pandas as pd

myseries_ = pd.Series([1, 4, 6, 6, 6, 11, 11, 24])

print(f"The mode of my series is {myseries_.mode()[0]}")

The mode of my series is 6


In [5]:
print(myseries_.mode())

0    6
dtype: int64


In [6]:
my_series = pd.Series([1, 1, 2, 5, 7, 11, 36])

print(my_series.mode())

0    1
dtype: int64


Mean and median are only applicable to numerical data, while the mode is the only one of the three that also applies to categorical data. For instance, the most popular burger in a restaurant is the mode of the dataset that contains all of the burgers sold in that particular restaurant, not in the entire town.
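
The burger example could look something like the sketch below. The order data is invented for illustration and does not come from any file used in this lesson.

```python
import pandas as pd

# Hypothetical orders from one restaurant, invented for illustration.
burgers = pd.Series(
    ["cheeseburger", "veggie", "cheeseburger", "chicken", "cheeseburger", "veggie"]
)

# mode() works on text values because it only counts occurrences.
print(burgers.mode()[0])  # cheeseburger

# value_counts() shows the frequencies behind that result.
print(burgers.value_counts())
```

Here `value_counts` is used only to make the counts visible. The mode is simply the label with the highest count.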

We might also be interested in the minimum and maximum values when exploring the data.

Let’s calculate the measures of central tendency of the price column in the sales DataFrame, along with the minimum and maximum values.

In [9]:
import pandas as pd

sales = pd.read_csv("sales.csv")

print("mean: ")
print(sales["price"].mean())

print("median: ")
print(sales["price"].median())

print("mode: ")
print(sales["price"].mode()[0])

print("minimum: ")
print(sales["price"].min())

print("maximum: ")
print(sales["price"].max())

mean: 
67.06351000000001
median: 
23.74
mode: 
10.44
minimum: 
0.66
maximum: 
1500.05


The average price is about 67, while the median is about 24. A mean this far above the median indicates that a few products are priced much higher than the rest, which pulls the mean up while leaving the median largely unaffected.
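
The effect of a few expensive products can be reproduced on a small scale. The prices below are hypothetical and are not taken from `sales.csv`.

```python
import pandas as pd

# Hypothetical prices, not taken from sales.csv.
prices = pd.Series([10, 12, 15, 18, 20])
print(prices.mean(), prices.median())  # 15.0 15.0

# Add one very expensive product and recompute.
with_outlier = pd.Series([10, 12, 15, 18, 20, 1500])
print(with_outlier.mean(), with_outlier.median())  # 262.5 16.5
```

A single extreme value moves the mean from 15 to 262.5, while the median only shifts from 15 to 16.5.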

Measures of central tendency are fundamental to descriptive statistics. To better understand the distribution of a variable, we also need to account for the variance or standard deviation.

Variance is a measure of the variation among values. It can be calculated as follows:

1. Find the difference between each value in the dataset and the mean.
2. Take the square of the differences.
3. Find the average of the squared differences.
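
The three steps above can be sketched as follows, using hypothetical values chosen so the arithmetic is easy to check. One caveat: the steps divide by the number of values, which corresponds to `ddof=0` in pandas. Called without arguments, `var` uses `ddof=1` and divides by one less than the number of values, so its result is slightly larger.

```python
import pandas as pd

# Hypothetical values chosen so the arithmetic is easy to follow.
values = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])

# Step 1: difference between each value and the mean (the mean is 5).
differences = values - values.mean()

# Step 2: square the differences.
squared = differences ** 2

# Step 3: average the squared differences.
variance = squared.mean()

print(variance)            # 4.0
print(values.var(ddof=0))  # 4.0, dividing by n like the steps above
print(values.var())        # larger, because the default divides by n - 1
```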

Standard deviation is a measure of how spread out the values are. To be more specific, it’s the square root of variance.

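The `var` and `std` methods of pandas can be used to calculate the variance and standard deviation, respectively. By default, both compute the sample statistic (`ddof=1`), which divides by one less than the number of values rather than by the number of values itself.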

In [10]:
import pandas as pd

sales = pd.read_csv("sales.csv")

print("variance: ")
print(sales["price"].var())

print("standard deviation: ")
print(sales["price"].std())

variance: 
20766.24382460458
standard deviation: 
144.10497501684173


The variance and standard deviation give us an idea of how the values are spread out around the mean. In general, the higher the standard deviation, the more spread out the values are.
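
As a small illustration, the two hypothetical Series below share the same mean but differ in spread. The names `tight` and `wide` are made up for this sketch.

```python
import pandas as pd

# Two hypothetical Series with the same mean (50) but different spread.
tight = pd.Series([48, 49, 50, 51, 52])
wide = pd.Series([10, 30, 50, 70, 90])

print(tight.mean(), wide.mean())  # 50.0 50.0

# The standard deviation separates them even though the means match.
print(tight.std())  # about 1.58
print(wide.std())   # about 31.62
```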