This notebook covers percentiles, the range, and the interquartile range. These concepts fall under the topic of measures of dispersion.

# Percentile

A percentile is the value below which a given percentage of the observations in a dataset fall. The most commonly reported percentiles are the quartiles (25th, 50th and 75th percentiles). The 50th percentile is also known as the median. Percentiles are especially useful for summarizing the tails (outer range) of a data distribution. <br>

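For the example below, the `percentile()` function of the NumPy library is used. <br>
With its default linear interpolation, the percentile is calculated as follows: <br>
Percentile = i + (j - i) * fraction <br>
Here i and j are the two values in the sorted data that surround the requested position, and fraction is the fractional part of that position. <br>
More information on `numpy.percentile`: https://numpy.org/doc/stable/reference/generated/numpy.percentile.html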

In [2]:
import numpy as np

List = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Index distance between the first and the last element (d) = 10 - 1 = 9
# Index of the 25th percentile = 9 * 0.25 = 2.25
# List[2] = 30 and List[3] = 40
# Using linear interpolation with fraction = 0.25:
# 30 + 0.25 * (40 - 30) = 32.5
print("25th Percentile of the List is:", np.percentile(List, 25))

# Index distance between the first and the last element (d) = 10 - 1 = 9
# Index of the 50th percentile = 9 * 0.50 = 4.5
# List[4] = 50 and List[5] = 60
# Using linear interpolation with fraction = 0.50:
# 50 + 0.50 * (60 - 50) = 55
print("50th Percentile of the List is:", np.percentile(List, 50))

# Index distance between the first and the last element (d) = 10 - 1 = 9
# Index of the 75th percentile = 9 * 0.75 = 6.75
# List[6] = 70 and List[7] = 80
# Using linear interpolation with fraction = 0.75:
# 70 + 0.75 * (80 - 70) = 77.5
print("75th Percentile of the List is:", np.percentile(List, 75))

25th Percentile of the List is: 32.5
50th Percentile of the List is: 55.0
75th Percentile of the List is: 77.5
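
To make the interpolation formula concrete, here is a minimal pure-Python sketch of the same calculation. The function name `percentile_linear` is hypothetical and only illustrates the hand calculation in the comments above; it is not NumPy's implementation.

```python
import math

def percentile_linear(values, q):
    """Return the q-th percentile (0 <= q <= 100) by linear interpolation.

    Hypothetical helper for illustration; assumes a non-empty list of numbers.
    """
    data = sorted(values)
    # Fractional position of the percentile within the sorted data
    position = (len(data) - 1) * q / 100
    lower = math.floor(position)   # index of i
    upper = math.ceil(position)    # index of j
    fraction = position - lower    # fractional part of the position
    i, j = data[lower], data[upper]
    return i + (j - i) * fraction

values = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
for q in (25, 50, 75):
    print(str(q) + "th percentile:", percentile_linear(values, q))
```

For this list the sketch reproduces the values printed by `np.percentile` above (32.5, 55.0 and 77.5).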


# Range

The range is the difference between the maximum value and the minimum value in the list. Because it depends only on these two extremes, it is extremely sensitive to outliers and is therefore a poor general-purpose measure of dispersion.

In [4]:
List = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

Range = max(List) - min(List)
print("Range is:" + str(Range))

Range is:90
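
The sensitivity to outliers can be seen with a small sketch. The extra value 1000 and the helper name `value_range` are made up for illustration.

```python
def value_range(data):
    """Difference between the largest and smallest value (hypothetical helper)."""
    return max(data) - min(data)

values = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
with_outlier = values + [1000]  # one made-up extreme value

print("Range without outlier:", value_range(values))
print("Range with outlier:", value_range(with_outlier))
```

A single extreme value moves the range from 90 to 990, even though the other ten values are unchanged.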


# Quartiles and InterQuartile Range (IQR)

Quartiles of an ordered dataset are the three points which split the dataset into four equal groups.
The three quartiles are defined as follows: <br>
Q1 - The first quartile is the middle number between the smallest number and the median of the list. <br>
Q2 - The second quartile is the median of the list. <br>
Q3 - The third quartile is the middle number between the median and the largest number of the list. <br>

The InterQuartile Range is the difference between the third quartile (Q3) and the first quartile (Q1): IQR = Q3 - Q1.

In [5]:
# In this example, the list is already sorted. An unsorted list must be sorted before finding the IQR.

List = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

Q1 = np.median(List[:5])
print("Q1 is:" + str(Q1))

Q3 = np.median(List[5:])
print("Q3 is:" + str(Q3))

IQR = Q3 - Q1
print("InterQuartile Range is:" + str(IQR))

Q1 is:30.0
Q3 is:80.0
InterQuartile Range is:50.0
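
Note that the quartiles above (30 and 80) differ from the 25th and 75th percentiles printed earlier by `np.percentile` (32.5 and 77.5). The two cells use different conventions: medians of the lower and upper halves here, linear interpolation there. The sketch below compares both using only the standard library, and sorts the input first so it also works on unsorted data. The function names are hypothetical, and `statistics.quantiles` with `method="inclusive"` is used as a stand-in that gives the same values as the earlier `np.percentile` cell for this list.

```python
import statistics

def iqr_median_of_halves(values):
    """IQR from the medians of the lower and upper halves.

    Hypothetical helper; assumes at least two values. For an odd number
    of values the middle element is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[-half:]
    return statistics.median(upper) - statistics.median(lower)

def iqr_interpolated(values):
    """IQR from linearly interpolated quartiles (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Same ten values as above, deliberately unsorted
unsorted_values = [70, 10, 100, 40, 20, 90, 60, 30, 80, 50]

print("IQR (median of halves):", iqr_median_of_halves(unsorted_values))
print("IQR (interpolated):", iqr_interpolated(unsorted_values))
```

The first method gives 50 and the second gives 45.0 for this list, so it is worth stating which convention is used when reporting an IQR.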
