.. py:module:: dit.divergences.cross_entropy

Cross Entropy
=============

The cross entropy between two distributions :math:`p(x)` and :math:`q(x)` is given by:

.. math::

   \xH{p || q} = -\sum_{x \in \mathcal{X}} p(x) \log_2 q(x)

This quantifies the average cost of representing a distribution defined by the probabilities :math:`p(x)` using the probabilities :math:`q(x)`. For example, the cross entropy of a distribution with itself is the entropy of that distribution, because the entropy quantifies the average cost of representing a distribution:

.. ipython::

   In [1]: import dit

   In [2]: from dit.divergences import cross_entropy

   In [3]: p = dit.Distribution(['0', '1'], [1/2, 1/2])

   @doctest float
   In [4]: cross_entropy(p, p)
   Out[4]: 1.0
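
As a sanity check, the definitional sum can also be evaluated directly. The following is a minimal sketch, assuming a ``Distribution`` can be indexed by outcome (e.g. ``p['0']``) and exposes its outcomes via ``p.outcomes``; only NumPy is used beyond ``dit``:

.. code-block:: python

   import numpy as np

   import dit
   from dit.divergences import cross_entropy

   def cross_entropy_by_hand(p, q):
       """Evaluate -sum_x p(x) log2 q(x) directly from the definition."""
       return -sum(p[x] * np.log2(q[x]) for x in p.outcomes)

   p = dit.Distribution(['0', '1'], [1/2, 1/2])

   # Agrees with dit's implementation: both give 1.0 bit for a fair coin.
   assert np.isclose(cross_entropy_by_hand(p, p), cross_entropy(p, p))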

If, however, we attempted to model a fair coin with a biased one, we could compute this mismatch with the cross entropy:

.. ipython::

   In [5]: q = dit.Distribution(['0', '1'], [3/4, 1/4])

   @doctest float
   In [6]: cross_entropy(p, q)
   Out[6]: 1.207518749639422
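
Expanding the definition for this pair of distributions shows where the extra fifth of a bit comes from:

.. math::

   \xH{p || q} = -\frac{1}{2}\log_2\frac{3}{4} - \frac{1}{2}\log_2\frac{1}{4} \approx 0.2075 + 1 = 1.2075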

Meaning, we will on average use about 1.2 bits per flip to represent a fair coin with the biased model. Turning things around, what if we had a biased coin that we attempted to represent with a fair one:

.. ipython::

   @doctest float
   In [7]: cross_entropy(q, p)
   Out[7]: 1.0
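
Expanding the definition again, every outcome of :math:`q` is encoded with :math:`-\log_2(1/2) = 1` bit, regardless of how probable it is:

.. math::

   \xH{q || p} = -\frac{3}{4}\log_2\frac{1}{2} - \frac{1}{4}\log_2\frac{1}{2} = \frac{3}{4} + \frac{1}{4} = 1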

So although the entropy of :math:`q` is less than 1 bit, we will use a full bit to represent its outcomes. Both of these results can easily be seen by considering the following identity:

.. math::

   \xH{p || q} = \H{p} + \DKL{p || q}

So in representing :math:`p` using :math:`q`, we must of course use at least :math:`\H{p}` bits -- the minimum required to represent :math:`p` -- plus the Kullback-Leibler divergence of :math:`q` from :math:`p`.
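
This decomposition can be checked numerically. The sketch below assumes ``dit`` exposes the Shannon entropy as ``dit.shannon.entropy`` and the Kullback-Leibler divergence as ``dit.divergences.kullback_leibler_divergence``, as documented elsewhere in this package:

.. code-block:: python

   import numpy as np

   import dit
   from dit.divergences import cross_entropy, kullback_leibler_divergence
   from dit.shannon import entropy

   p = dit.Distribution(['0', '1'], [1/2, 1/2])
   q = dit.Distribution(['0', '1'], [3/4, 1/4])

   # xH[p || q] = H[p] + DKL[p || q]: ~1.2075 = 1.0 + ~0.2075 bits
   assert np.isclose(cross_entropy(p, q),
                     entropy(p) + kullback_leibler_divergence(p, q))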

API
===

.. autofunction:: cross_entropy