#### About
> Kullback-Leibler Divergence

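Kullback-Leibler (KL) divergence is a measure of how different two probability distributions are from each other. Specifically, it measures the amount of information lost when a distribution Q is used to approximate a distribution P. It is not symmetric: the divergence from P to Q generally differs from the divergence from Q to P, so it is not a distance metric in the strict sense.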

The KL divergence between two discrete probability distributions P and Q, defined over the same set of outcomes i, is:

$D_{KL}(P||Q) = \sum_{i} P(i) \log\frac{P(i)}{Q(i)}$

The KL divergence is non-negative, and it is equal to zero if and only if P and Q are identical. It is finite only if Q(i) > 0 wherever P(i) > 0; terms with P(i) = 0 are taken to contribute zero.
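The properties above can be checked numerically. The following is a minimal sketch, not a reference implementation: the function name `kl_divergence_safe` and the example distributions are made up for illustration, and it uses only the standard library with the natural logarithm (so results are in nats). It shows that the divergence is zero for identical distributions and that swapping the arguments can change the result.

```python
import math

def kl_divergence_safe(p, q):
    """D_KL(p || q) in nats for two discrete distributions given as sequences."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # convention: a term with p_i = 0 contributes zero
        if q_i == 0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total

# Hypothetical example distributions (not from the notebook cells)
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]

forward = kl_divergence_safe(p, q)  # finite: q is positive wherever p is
reverse = kl_divergence_safe(q, p)  # infinite: q has mass where p is zero
same = kl_divergence_safe(p, p)     # zero: identical distributions

print(forward, reverse, same)
```

Here `forward` equals log 2 (about 0.693 nats), while `reverse` is infinite, which illustrates why the order of the arguments matters.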

Use cases:

1. Model selection: KL divergence can be used to compare the performance of different models. For example, if we have two models that predict the same output, we can use KL divergence to measure the difference between the distributions of the predicted values.

2. Feature selection: KL divergence can be used to measure the information gain of adding a new feature to a model. For example, if we have a classification problem and we want to add a new feature to our model, we can use KL divergence to measure the difference between the class distributions with and without the new feature.

3. Optimization: KL divergence can be used as a loss function in optimization problems. For example, in some unsupervised learning problems, we want to find a distribution that is similar to the data distribution. We can use KL divergence to measure the difference between the data distribution and the model distribution and minimize it using gradient descent.
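The optimization use case can be sketched in a few lines. This is an illustrative example under stated assumptions rather than a recipe from any particular library: the model distribution is assumed to be a softmax over three free parameters `theta`, the target `p` stands in for the data distribution, and the learning rate and step count are arbitrary. For this parameterization, the gradient of the KL divergence from `p` to the softmax with respect to `theta[i]` is `q[i] - p[i]`.

```python
import math

def softmax(theta):
    """Map real-valued parameters to a probability distribution."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    """D_KL(p || q) in nats; terms with p_i = 0 contribute zero."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

p = [0.3, 0.2, 0.5]      # target ("data") distribution, assumed for illustration
theta = [0.0, 0.0, 0.0]  # model parameters; softmax(theta) starts out uniform
lr = 0.5                 # learning rate, chosen arbitrarily

start = kl(p, softmax(theta))
for step in range(500):
    q = softmax(theta)
    # Gradient of D_KL(p || softmax(theta)) w.r.t. theta_i is q_i - p_i
    theta = [t - lr * (q_i - p_i) for t, q_i, p_i in zip(theta, q, p)]

q = softmax(theta)
final = kl(p, q)
print(start, final, q)
```

After the loop, `q` should be very close to `p` and `final` close to zero, since the softmax family can represent any strictly positive distribution over three outcomes.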


Suppose we have two probability distributions P and Q represented by the following arrays:



In [1]:
import numpy as np

P = np.array([0.3, 0.2, 0.5])
Q = np.array([0.4, 0.1, 0.5])


In [2]:
def kl_divergence(P, Q):
    # D_KL(P || Q) in nats (natural log); assumes all entries of P and Q are > 0
    return np.sum(P * np.log(P / Q))

print(kl_divergence(P, Q))


0.05232481437645474


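The KL divergence between P and Q is about 0.0523 nats (the code uses the natural logarithm). The value is close to zero, which indicates that Q approximates P closely; the two distributions differ only in how they split probability between the first two outcomes.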
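As a check on the printed value, the sum can be worked out term by term with the natural logarithm (values rounded to four decimal places):

$D_{KL}(P||Q) = 0.3 \log\frac{0.3}{0.4} + 0.2 \log\frac{0.2}{0.1} + 0.5 \log\frac{0.5}{0.5} \approx -0.0863 + 0.1386 + 0 = 0.0523$

Swapping the roles of P and Q gives a different value:

$D_{KL}(Q||P) = 0.4 \log\frac{0.4}{0.3} + 0.1 \log\frac{0.1}{0.2} + 0.5 \log\frac{0.5}{0.5} \approx 0.1151 - 0.0693 + 0 = 0.0458$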