The Kullback-Leibler divergence, sometimes also called the relative entropy, of a distribution p from a distribution q is defined as:
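\[
D_{KL}(p \,\|\, q) = \sum_{x} p(x) \log_2 \frac{p(x)}{q(x)}
\]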
The Kullback-Leibler divergence quantifies the average number of extra bits required to encode samples from a distribution p when using a code optimized for a distribution q instead. This can be seen through the following identity:
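\[
D_{KL}(p \,\|\, q) = H(p, q) - H(p)
\]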
where the cross entropy H(p, q) quantifies the total cost of encoding p using q, and the entropy H(p) quantifies the true, minimum cost of encoding p. For example, let's consider the cost of representing a biased coin by a fair one:
In [1]: import dit

In [2]: from dit.divergences import kullback_leibler_divergence

In [3]: p = dit.Distribution(['0', '1'], [3/4, 1/4])

In [4]: q = dit.Distribution(['0', '1'], [1/2, 1/2])

In [5]: kullback_leibler_divergence(p, q)
Out[5]: 0.18872187554086717
That is, encoding the biased coin with a code built for the fair one wastes 0.1887 bits per flip on average.
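As a sanity check, the identity above can be verified numerically using the cross entropy and entropy functions referenced earlier, assuming they are importable from dit.divergences and dit.multivariate respectively:

In [6]: from dit.divergences import cross_entropy

In [7]: from dit.multivariate import entropy

In [8]: cross_entropy(p, q) - entropy(p)  # ≈ 0.18872, matching Out[5] above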
Although the Kullback-Leibler divergence is often used to see how "different" two distributions are, it is not a metric. Importantly, it is neither symmetric nor does it obey the triangle inequality. It does, however, have the following property:
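\[
D_{KL}(p \,\|\, q) \geq 0
\]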
with equality if and only if p = q. This makes it a premetric.
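The asymmetry is easy to check with the distributions from the session above; reversing the arguments gives a different value (roughly 0.2075 bits rather than 0.1887):

In [9]: kullback_leibler_divergence(q, p)  # ≈ 0.2075, differs from D_KL(p || q)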