# Percentiles

A percentile is a statistical measure that indicates the value below which a given percentage of the observations in a dataset fall. More specifically, the *p*th percentile is the value below which *p* percent of the observations fall. <br>
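When *p* percent of the data does not land exactly on an observation, the percentile has to be interpolated, and libraries differ in how they do this. The sketch below assumes the linear-interpolation rule that NumPy uses by default (`method="linear"`). The helper name `percentile_linear` is hypothetical and only for illustration.

```python
import numpy as np

def percentile_linear(values, p):
    """Hypothetical helper: return the pth percentile (0 <= p <= 100) by
    linear interpolation between the two closest ranks."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("values must not be empty")
    # Fractional (0-based) position of the percentile in the sorted data
    h = (n - 1) * p / 100
    lo = int(h)              # observation just below that position
    hi = min(lo + 1, n - 1)  # observation just above it
    frac = h - lo
    return x[lo] + frac * (x[hi] - x[lo])

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(percentile_linear(data, 75))  # 7.75
print(np.percentile(data, 75))      # 7.75
```

For the 75th percentile of ten values the position is 9 × 0.75 = 6.75, so the result lies three quarters of the way from the 7th value (7) to the 8th value (8).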

For example, if the 90th percentile of a dataset is 100, then about 90% of the observations are at or below 100 and the remaining 10% are above it. Similarly, if the 25th percentile is 50, then about 25% of the observations are below 50 and about 75% are above it. <br>
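For data without repeated values, this reading can be checked directly. The sketch uses the integers 1 to 100 as an illustrative dataset and counts the share of observations lying strictly below each percentile. Because a percentile is usually interpolated between two observations, the share matches *p* percent only approximately in general; for this dataset it matches exactly.

```python
import numpy as np

# Illustrative dataset: the integers 1, 2, ..., 100 (no ties)
data = np.arange(1, 101)

for p in (25, 75, 90):
    value = np.percentile(data, p)
    # Fraction of observations strictly below the pth percentile
    share_below = np.mean(data < value)
    print(f"{p}th percentile = {value:.2f}, share of data below it = {share_below:.2f}")

# Expected output:
# 25th percentile = 25.75, share of data below it = 0.25
# 75th percentile = 75.25, share of data below it = 0.75
# 90th percentile = 90.10, share of data below it = 0.90
```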

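Percentiles are often used in data analysis to describe the spread and shape of a distribution and to flag potential outliers. They also make it easy to compare datasets, and several common summary statistics are built from them: the median is the 50th percentile, the first and third quartiles are the 25th and 75th percentiles, and the interquartile range (IQR) is the difference between the 75th and 25th percentiles. <br>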
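As an illustration of the outlier use case, a widely used rule of thumb (Tukey's fences) flags any value more than 1.5 times the IQR below the first quartile or above the third quartile. The dataset below is made up for the example, with one deliberately extreme value, and the 1.5 multiplier is a convention rather than a fixed law.

```python
import numpy as np

# Illustrative data with one extreme value
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1  # interquartile range: spread of the middle 50% of the data

# Tukey's rule of thumb: flag points more than 1.5 * IQR outside the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, q3, iqr)               # 3.25 7.75 4.5
print(lower_fence, upper_fence)  # -3.5 14.5
print(outliers)                  # [100]
```

Because quartiles depend only on the ordering of the middle of the data, the single value 100 barely moves them, which is what makes this rule robust compared with a cutoff based on the mean and standard deviation.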

```python
import numpy as np

# Create sample data
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Calculate the 50th percentile (median)
p50 = np.percentile(data, 50)

# Calculate the 75th percentile (3rd quartile)
p75 = np.percentile(data, 75)

print("50th percentile: ", p50)
print("75th percentile: ", p75)
```
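`np.percentile` also accepts a list of percentiles, and its `method` keyword controls what happens when a percentile falls between two observations (for these ten values, the 25th percentile lies between 3 and 4). This sketch assumes NumPy 1.22 or newer, where the keyword is called `method`; older releases named it `interpolation`.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Several percentiles in one call return an array in the requested order
quartiles = np.percentile(data, [25, 50, 75])
print(quartiles)

# Compare rules for the 25th percentile, which falls between the observations 3 and 4
for method in ("linear", "lower", "higher", "nearest", "midpoint"):
    print(method, np.percentile(data, 25, method=method))
```

With the default `linear` rule the 25th percentile is 3.25; `lower` and `higher` return the neighboring observations 3 and 4, and `midpoint` returns 3.5. Small datasets are where these choices matter most.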


Quantiles are the cut points that divide a sorted dataset into groups of equal size. More specifically, the *q* quantile (with 0 ≤ *q* ≤ 1) is the value below which a fraction *q* of the data falls, so quantiles express the same idea as percentiles on a 0-to-1 scale instead of a 0-to-100 scale. <br>
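The "groups of equal size" idea is what `pd.qcut` does: it computes quantile cut points and assigns each observation to the group it falls in. A minimal sketch with twelve illustrative values split at the quartiles; the group labels `Q1` to `Q4` are arbitrary names chosen for the example.

```python
import pandas as pd

data = pd.Series(range(1, 13))  # 12 observations: 1, 2, ..., 12

# Cut at the 0, 0.25, 0.5, 0.75 and 1 quantiles, giving 4 equal-size groups
groups = pd.qcut(data, q=4, labels=["Q1", "Q2", "Q3", "Q4"])

# Each group holds 3 of the 12 observations
print(groups.value_counts().sort_index())
```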

For example, the median is the 50th percentile (the 0.5 quantile): half of the data falls below it. Similarly, the first quartile is the 25th percentile (the 0.25 quantile), below which 25% of the data falls, and the third quartile is the 75th percentile (the 0.75 quantile), below which 75% of the data falls. <br>
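In pandas, `Series.describe()` reports these three quantities among its summary statistics, in the rows labeled `25%`, `50%` and `75%`. A quick check on the same ten values used in this notebook:

```python
import pandas as pd

data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# describe() includes the quartiles as its "25%", "50%" and "75%" rows
summary = data.describe()
print(summary)

# The median is the 50th percentile
print(data.median() == summary["50%"])  # True
```

With the default settings the `25%`, `50%` and `75%` rows are 3.25, 5.5 and 7.75.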

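In general, the *p*th percentile is the same value as the *p*/100 quantile. For instance, the 90th percentile is the 0.9 quantile, the value below which 90% of the data falls. This is why pandas' `quantile` method takes fractions such as `0.25`, while NumPy's `percentile` takes values such as `25`. <br>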
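The two scales can be checked against each other. This sketch uses NumPy's `np.quantile`, which takes fractions like pandas does, alongside `np.percentile`; for each *p* the three calls should agree up to floating-point rounding.

```python
import numpy as np
import pandas as pd

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# np.percentile uses a 0-100 scale; np.quantile and pandas use a 0-1 scale
for p in (25, 50, 75, 90):
    a = np.percentile(values, p)
    b = np.quantile(values, p / 100)
    c = pd.Series(values).quantile(p / 100)
    print(p, a, b, c)
```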

```python
import pandas as pd

# Create sample data
data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Calculate the 25th percentile (1st quartile)
p25 = data.quantile(0.25)

# Calculate the 50th percentile (median)
p50 = data.quantile(0.5)

# Calculate the 75th percentile (3rd quartile)
p75 = data.quantile(0.75)

# Calculate the 90th percentile
p90 = data.quantile(0.90)

print("25th percentile: ", p25)
print("50th percentile: ", p50)
print("75th percentile: ", p75)
print("90th percentile: ", p90)
```


```
25th percentile:  3.25
50th percentile:  5.5
75th percentile:  7.75
90th percentile:  9.1
```
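`Series.quantile` can also compute several levels in one call: passing a list returns a Series indexed by the requested fractions. Its `interpolation` parameter plays the same role as NumPy's `method` keyword. A short sketch on the same data:

```python
import pandas as pd

data = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# A list of levels returns a Series indexed by 0.25, 0.5, 0.75 and 0.9
levels = data.quantile([0.25, 0.5, 0.75, 0.9])
print(levels)

# `interpolation` chooses how a quantile between two observations is resolved
print(data.quantile(0.25, interpolation="lower"))
print(data.quantile(0.25, interpolation="midpoint"))
```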


Let's plot the percentiles of a larger, randomly generated dataset alongside a box plot.

```python
import numpy as np
import plotly.graph_objects as go

# Generate 1000 normally distributed values with mean 10 and standard deviation 2
# (a fixed seed makes the plot reproducible from run to run)
rng = np.random.default_rng(42)
data = rng.normal(loc=10, scale=2, size=1000)

# Calculate the quartiles
p25 = np.percentile(data, 25)
p50 = np.percentile(data, 50)
p75 = np.percentile(data, 75)

# Create a box plot; the box itself also marks the quartiles,
# computed with Plotly's own `quartilemethod`
fig = go.Figure()
fig.add_trace(go.Box(y=data, name='Data'))

# Add a labeled horizontal line at each percentile
# (add_hline spans the full plot width; it needs plotly 4.12 or newer)
fig.add_hline(y=p25, line_color='red', line_width=2, annotation_text='25th percentile')
fig.add_hline(y=p50, line_color='blue', line_width=2, annotation_text='50th percentile (median)')
fig.add_hline(y=p75, line_color='green', line_width=2, annotation_text='75th percentile')

# Update layout
fig.update_layout(title='Percentile Plot', yaxis_title='Value')

# Show plot
fig.show()
```
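Because the plotted values are drawn from a normal distribution with mean 10 and standard deviation 2, their sample percentiles should sit close to that distribution's theoretical percentiles. This sketch assumes the standard-library `statistics.NormalDist` (Python 3.8+) for the inverse CDF, and uses a larger sample of 10,000 draws with its own seed so the agreement is tighter than with 1000 points.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
sample = rng.normal(loc=10, scale=2, size=10_000)

# Theoretical percentiles of a normal distribution with mean 10 and sd 2
dist = NormalDist(mu=10, sigma=2)

for p in (25, 50, 75):
    empirical = np.percentile(sample, p)
    theoretical = dist.inv_cdf(p / 100)  # inverse CDF at the fraction p/100
    print(f"{p}th percentile: sample {empirical:.3f}, theoretical {theoretical:.3f}")
```

The theoretical quartiles are about 8.651 and 11.349 (10 ∓ 0.674 × 2), with the median at exactly 10; the sample values differ only by random sampling noise.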
