# Video: Calculating Quantiles

This video shows an easy way to calculate quantiles in Python, and a faster way using NumPy's quantile functions.

Script:
* Quantile calculations are usually described as sorting the values in a big list, and then picking a position in the list based on the target fraction.
* This description is fine for explaining what the results should be, but it is not very efficient computationally. We'll use it once now, so you'll know how to calculate quantiles easily with minimal library support.
* So here is some data we have from before.


In [None]:
data = [3, 4, 4, 5, 7, 2, 10]

Script:
* Python provides a built-in sorted function that takes any container and returns a new list with the container's contents in sorted order.

In [None]:
sorted_data = sorted(data)

Script:
* So now, we just pick an entry in this sorted list based on which fraction we are targeting for the quantile.
* We can pick the entry blindly, without looking at the data. We just calculate the index, the position in the list, from the target fraction and the number of data points.
* The basic calculation for the index for target fraction p is

In [None]:
p=0.5

In [None]:
p * (len(data) - 1)

3.0
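As a quick check of how this index behaves, here is a small sketch that evaluates the same formula at a few target fractions. The list `fractions` is made up for illustration and is not part of the video.

```python
data = [3, 4, 4, 5, 7, 2, 10]

# Illustrative target fractions (an assumption, not from the video).
fractions = [0.0, 0.25, 0.5, 1.0]

# Same formula as above: the (possibly fractional) position in the sorted list.
positions = [p * (len(data) - 1) for p in fractions]

for p, position in zip(fractions, positions):
    print(p, position)
```

With seven data points, the positions run from 0.0 for the smallest value to 6.0 for the largest, and 0.25 lands at 1.5, halfway between two list entries.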

Script:
* Why minus one? Because list indexes in Python start at zero and end at the length minus one.
* So when p is zero, we want to look at index zero, the first, smallest data value.
* And when p is one, we want to look at index length minus one, the last, largest data value.
* And in between, move smoothly from the first value to the last value.
* Let's do that now.

In [None]:
sorted_data[p * (len(data) - 1)]

TypeError: list indices must be integers or slices, not float

Script:
* Oops, fractional list locations don't work. Python rejects any float as a list index, even a whole number like 3.0. Makes sense, since in general we are asking for a spot between list entries.
* There are two very common ways to deal with this. The first is called the nearest rank method, and it just rounds up to get the next higher list entry.
* So that would be


In [None]:
import math

In [None]:
sorted_data[math.ceil(p * (len(data) - 1))]

4
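The round-up step above can be wrapped in a small function. This is a minimal sketch, assuming the input is a non-empty collection of numbers; the name `quantile_round_up` is hypothetical and not from any library. Textbook definitions of the nearest rank method differ in small details, so treat this as the variant shown in the cell above rather than a canonical definition.

```python
import math


def quantile_round_up(values, p):
    """Hypothetical helper: pick the sorted entry at the rounded-up index."""
    ordered = sorted(values)
    if len(ordered) == 0:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    # Same index formula as above, rounded up to the next whole position.
    index = math.ceil(p * (len(ordered) - 1))
    return ordered[index]


data = [3, 4, 4, 5, 7, 2, 10]
print(quantile_round_up(data, 0.5))  # 4, matching the cell above
```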

Script:
* The other common method is linear interpolation, using the fractional part to blend between the surrounding list entries.
* This is the usual method for the median, where it simplifies a bit.
* If you are calculating the median of an odd number of data points, then you just return the middle one.
* If you are calculating the median of an even number of data points, then you return the average of the middle two data points.
* Again, you'll want to use library functions for this, except for the quickest and dirtiest analysis of a small data set.
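For reference, the linear interpolation described above might look like the following in plain Python. This is a sketch under the assumption of a non-empty collection of numbers; `quantile_linear` is a hypothetical name, not a library function.

```python
import math


def quantile_linear(values, p):
    """Hypothetical helper: blend the two sorted entries around the index."""
    ordered = sorted(values)
    if len(ordered) == 0:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    position = p * (len(ordered) - 1)   # fractional index into the sorted list
    lower = math.floor(position)        # entry just below the position
    upper = math.ceil(position)         # entry just above the position
    fraction = position - lower         # how far to move from lower toward upper
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


data = [3, 4, 4, 5, 7, 2, 10]
print(quantile_linear(data, 0.5))           # 4.0, the middle of seven values
print(quantile_linear([1, 2, 3, 4], 0.5))   # 2.5, the average of the middle two
```

For the median of an odd number of points the fractional part is zero, so the middle entry comes back unchanged. For an even number it is one half, which gives the average of the middle two.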

In [None]:
import numpy as np

In [None]:
np.quantile(data, 0.3)

3.8

In [None]:
np.quantile(data, 0.5)

4.0

In [None]:
np.median(data)

4.0

Script:
* Those NumPy functions will typically be much faster than the quick and dirty sorting methods, especially on large data sets.
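One way to check the speed claim on your own machine is a rough timing sketch like the one below. The data size, the random data, and the helper `median_by_sorting` are all assumptions made for illustration. Timings vary by machine and NumPy version, so no expected numbers are shown. The sketch also converts the data to a NumPy array up front, since passing a plain Python list to `np.quantile` adds conversion time.

```python
import random
import timeit

import numpy as np

random.seed(0)
values = [random.random() for _ in range(100_000)]  # illustrative data
array = np.array(values)  # convert once, outside the timed code


def median_by_sorting(vals):
    """Hypothetical helper: quick and dirty median via sorted()."""
    ordered = sorted(vals)
    position = 0.5 * (len(ordered) - 1)
    lower = int(position)                     # floor of a non-negative position
    upper = min(lower + 1, len(ordered) - 1)  # stay inside the list
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


sorting_seconds = timeit.timeit(lambda: median_by_sorting(values), number=5)
numpy_seconds = timeit.timeit(lambda: np.quantile(array, 0.5), number=5)

print("sorted():    ", sorting_seconds)
print("np.quantile: ", numpy_seconds)
```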
