# Quantile-Quantile Plots (QQ plots) 

Quantile-quantile plots, also known as QQ plots, are a useful tool for comparing distributions. This notebook explains what QQ plots are and how to interpret them. It relies on the concept of quantiles: cut points in the rank-ordered values of a distribution, chosen so that a given fraction of the values falls at or below each point. For more information on quantiles, refer to the StatQuest video on quantiles and percentiles.

## What is a QQ plot?

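A QQ plot is a graphical tool for assessing whether a dataset follows a theoretical distribution. It is constructed by plotting the quantiles of the data against the quantiles of the chosen theoretical distribution. If the data follows that distribution with the same location and scale, the points fall roughly on the line $y = x$. If the data follows the same family of distributions but with a different location or scale, the points still fall roughly on a straight line, only with a different intercept or slope.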

## How to construct a QQ plot?

Here are the steps to construct a QQ plot:

1. **Sort your data**: Arrange the data points from least to greatest.
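2. **Calculate quantiles**: Treat the $i$-th of the $n$ sorted values as a sample quantile and assign it a cumulative probability based on its rank, for example $(i - 0.5)/n$ (one common convention among several).
3. **Choose a theoretical distribution and calculate its quantiles**: This can be any distribution, such as the normal or the uniform. Evaluate its quantile function (the inverse CDF) at the same probabilities, so that there is one theoretical quantile for each data point.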
4. **Create the plot**: Plot the quantiles from the data (on y-axis) against the quantiles from the theoretical distribution (on x-axis).
5. **Interpret the plot**: If the data follows the theoretical distribution, the points lie approximately on a straight line, which is the line $y = x$ when the location and scale also match. Systematic departures from a straight line, such as curvature or tails that bend away, suggest that the data may not follow the theoretical distribution.
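
The steps above can be sketched by hand with NumPy and SciPy. This is a minimal illustration rather than a reference implementation: the helper name `qq_points` is hypothetical, the target distribution is assumed to be the standard normal, and the plotting positions $(i - 0.5)/n$ are one convention among several.

```python
import numpy as np
from scipy import stats


def qq_points(data):
    """Return (theoretical, sample) quantile pairs for a normal QQ plot.

    Hypothetical helper for illustration; assumes a standard normal target.
    """
    # Steps 1-2: the sorted values are the sample quantiles.
    sample_q = np.sort(np.asarray(data, dtype=float))
    n = sample_q.size
    # Assumed plotting positions (i - 0.5) / n for i = 1..n.
    probs = (np.arange(1, n + 1) - 0.5) / n
    # Step 3: theoretical quantiles from the inverse CDF at the same probabilities.
    theo_q = stats.norm.ppf(probs)
    return theo_q, sample_q


rng = np.random.default_rng(0)
theo_q, sample_q = qq_points(rng.normal(0, 1, 1000))
```

For step 4, `theo_q` goes on the x-axis and `sample_q` on the y-axis, for instance with `plt.scatter(theo_q, sample_q)`.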

In practice, `scipy.stats.probplot` carries out these steps in a single call. Let's see an example of how to create a QQ plot with it in Python.

In [None]:
# Import necessary libraries
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# Generate some data
np.random.seed(0)
data = np.random.normal(0, 1, 1000)

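# Draw the QQ plot against the normal distribution.
# probplot also returns the plotted points and the fitted line.
(theoretical_q, ordered_data), (slope, intercept, r) = stats.probplot(data, dist="norm", plot=plt)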
plt.show()

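The red line in the plot above is a least-squares line fitted to the points, not the line $y = x$ itself. Because the data here was drawn from a standard normal distribution, the two nearly coincide. The blue dots are the ordered data values plotted against the corresponding theoretical normal quantiles. As we can see, the blue dots closely follow the red line, which is consistent with the data coming from a normal distribution.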
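
Called without the `plot` argument, `probplot` returns the plotted points together with the slope, intercept, and correlation coefficient of a least-squares line fitted to them. The sketch below uses an illustrative sample with location 5 and scale 2 (values chosen here, not taken from the text above) to show that the fitted line tracks the location and scale of the data rather than being fixed at $y = x$.

```python
import numpy as np
from scipy import stats

# Illustrative sample: normal data with assumed location 5 and scale 2.
rng = np.random.default_rng(0)
shifted = rng.normal(loc=5.0, scale=2.0, size=1000)

# No plot argument: only compute the points and the least-squares fit.
(theoretical_q, ordered_data), (slope, intercept, r) = stats.probplot(shifted, dist="norm")

# The slope estimates the scale, the intercept estimates the location.
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.3f}")
```

A correlation `r` close to 1 indicates that the points are nearly collinear, which is the numerical counterpart of the dots hugging the line in the plot.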

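QQ plots can also be used to compare two datasets to check whether they come from the same distribution. To do this, we plot the quantiles of one dataset against the quantiles of the other, evaluated at the same probabilities. If both datasets come from the same distribution, the points fall roughly on the line $y = x$.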
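
A two-sample comparison might be sketched as follows. The helper name `two_sample_qq` is hypothetical, and evaluating both samples on a shared probability grid with `np.quantile` is one assumed way to handle datasets of different sizes.

```python
import numpy as np


def two_sample_qq(a, b, n_points=None):
    """Return matching quantiles of two samples (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if n_points is None:
        # Assumption: use as many points as the smaller sample has.
        n_points = min(a.size, b.size)
    # Shared probability grid, so both samples are compared at the same levels.
    probs = (np.arange(1, n_points + 1) - 0.5) / n_points
    return np.quantile(a, probs), np.quantile(b, probs)


rng = np.random.default_rng(1)
x = rng.normal(0, 1, 500)
y = rng.normal(0, 1, 800)
qx, qy = two_sample_qq(x, y)
```

Plotting `qx` against `qy`, for instance with `plt.scatter(qx, qy)`, gives the two-sample QQ plot.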

Remember that a QQ plot does not prove that your data comes from a certain distribution; it only suggests that it could. Formal tests, such as the Kolmogorov-Smirnov test, can be used to test the distribution of the data statistically.
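
As a rough sketch of such a test, `scipy.stats.kstest` compares a sample with a named distribution, here the standard normal. The two samples below are illustrative assumptions: one drawn from a normal distribution and one from an exponential distribution. Note that the p-value is only reliable when the parameters of the reference distribution are fixed in advance rather than estimated from the same data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_sample = rng.normal(0, 1, 1000)      # illustrative: matches the reference
skewed_sample = rng.exponential(1.0, 1000)  # illustrative: clearly non-normal

# Test each sample against the standard normal distribution.
normal_result = stats.kstest(normal_sample, "norm")
skewed_result = stats.kstest(skewed_sample, "norm")

print(f"normal sample: D={normal_result.statistic:.3f}, p={normal_result.pvalue:.3g}")
print(f"skewed sample: D={skewed_result.statistic:.3f}, p={skewed_result.pvalue:.3g}")
```

A small p-value is evidence against the hypothesized distribution, while a large one only means the test found no evidence against it.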