# KL Divergence Visualization

In this notebook, we will run some simple simulations to verify your results.

In particular, we will consider the coin-toss example, using weighted coins.

Let's first set up some utility functions, and generate the true weights of our coins.


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interactive
import ipywidgets as widgets
import warnings
warnings.filterwarnings('ignore')


In [None]:
p_head = 0.60 + np.random.random() * 0.15
print("True P(head) = {}".format(p_head))


In [None]:
def prob_widget(title):
    return widgets.FloatSlider(
        value=p_head,
        min=0.01,
        max=0.99,
        step=0.01,
        description=title,
        disabled=False,
        continuous_update=False,
        orientation='horizontal',
        readout=True,
        readout_format='f')


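def toss(num_coins, p=None):
    # Fall back to the true weight only when no p is given (`p or p_head` would also override p=0).
    p = p_head if p is None else p
    # Each entry is 1 (head) with probability p and 0 (tail) otherwise.
    return np.where(np.random.random(num_coins) > p, 0, 1)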


Now that we've set up our utility functions, let's look at the distribution of our coin tosses.


In [None]:
num_trials = 10000
n = 100

data = np.array([sum(toss(n)) for _ in range(num_trials)])

plt.hist(data, bins=range(n + 2))  # bin edges 0..n+1, so each head count 0..n gets its own bin
plt.show()


This is a nice binomial distribution centered at $p_{head} n$, as we'd expect.
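As a quick numerical companion to the histogram, the sketch below compares the sample mean and standard deviation of simulated head counts with the binomial values $np$ and $\sqrt{np(1-p)}$. It is only an illustration under stated assumptions: it draws counts with NumPy's `Generator.binomial` instead of the notebook's `toss`, and `p_example`, `n_tosses`, and `num_trials_example` are hypothetical names and values, not the notebook's randomly drawn `p_head`.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
p_example = 0.65                 # hypothetical coin weight, not the notebook's p_head
n_tosses = 100                   # tosses per trial
num_trials_example = 10000       # number of repeated trials

# Each entry is the number of heads observed in one trial of n_tosses tosses.
heads = rng.binomial(n_tosses, p_example, size=num_trials_example)

# Mean and standard deviation of a Binomial(n, p) distribution.
expected_mean = n_tosses * p_example
expected_std = np.sqrt(n_tosses * p_example * (1 - p_example))

print("sample mean:", heads.mean(), "expected:", expected_mean)
print("sample std: ", heads.std(), "expected:", expected_std)
```

The two pairs of numbers should be close, which is what "centered at $p_{head} n$" means quantitatively.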

## Part (a)
Let's now look at our distribution from part (a). First, we will plot the (approximate) number of distinct toss sequences that achieve a given number of heads, out of 100 total coin tosses.


In [None]:
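def entropy(p):
    # Shannon entropy in nats; assumes every entry of p is strictly positive.
    return np.sum(p * np.log(1 / p))

def log_num_possibilities(n, num_heads):
    # Entropy approximation to the log of the number of sequences: n * H(f_n / n).
    f_n = np.array([n - num_heads, num_heads])
    return n * entropy(f_n / n)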

def num_possibilities(n, num_heads):
    return np.exp(log_num_possibilities(n, num_heads))

xs = range(1, 100)
plt.plot(xs, [num_possibilities(n, num_heads) for num_heads in xs])
plt.show()
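
The curve above relies on the entropy approximation $\log \binom{n}{k} \approx n H(k/n)$. The following sketch, which is an addition and not part of the original exercise, compares that exponent with the exact log binomial coefficient computed via `math.lgamma`. The helper names `log_binom` and `entropy_exponent` are hypothetical and chosen only for this illustration.

```python
import math

def log_binom(n, k):
    """Exact log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def entropy_exponent(n, k):
    """n * H(k / n) in nats; requires 0 < k < n."""
    f = k / n
    return -n * (f * math.log(f) + (1 - f) * math.log(1 - f))

n_tosses = 100
for k in (10, 30, 50):
    exact = log_binom(n_tosses, k)
    approx = entropy_exponent(n_tosses, k)
    # The approximation overestimates, but by at most log(n + 1).
    print(k, round(exact, 3), round(approx, 3), round(approx - exact, 3))
```

The gap between the two columns stays small compared with the exponents themselves, which is why the approximate curve has essentially the same shape as the exact count.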


Ignoring the vertical scale and focusing only on its relative magnitude, does this distribution match the histogram of the coin toss samples? Why or why not?


### begin (a) ###

### end (a) ###


## Part (b)

We will now compute and plot the probability of our observations (generated using `toss`) under a hypothesized value of $p_{head}$, for example a fair coin (equally likely to be heads or tails). Note that we actually plot the log of the probability, because the probability itself becomes too small to represent accurately as $n$ grows.
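The reason for working with logs can be seen in a tiny example. This sketch is an illustration only, and `n_tosses` is a hypothetical value: the probability of one specific sequence of fair-coin tosses underflows to zero in double precision, while its logarithm remains an ordinary finite number.

```python
import math

n_tosses = 2000                       # hypothetical sequence length

# Probability of one specific sequence of fair tosses, computed directly.
direct = 0.5 ** n_tosses              # too small for a float, so it underflows

# The same quantity in log space stays finite and easy to work with.
log_prob = n_tosses * math.log(0.5)

print("direct probability:", direct)
print("log probability:   ", log_prob)
```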


In [None]:
def log_p_empirical_type(n, num_heads, p_guess):
    return log_num_possibilities(n, num_heads) + num_heads * np.log(p_guess) + (n - num_heads) * np.log(1 - p_guess)

def p_empirical_type(n, num_heads, p_guess):
    return np.exp(log_p_empirical_type(n, num_heads, p_guess))

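def plot_empirical_prob(p, p_guess):
    candidate_ns = range(100, 1000)
    p_observations = []
    for n in candidate_ns:
        num_heads = sum(toss(n, p))
        p_observations.append(log_p_empirical_type(n, num_heads, p_guess))

    plt.plot(candidate_ns, p_observations)
    plt.xlabel("number of tosses n")
    plt.ylabel("log probability of observations")
    plt.show()  # render the figure on every slider update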

p = prob_widget("True probability")
p_guess = prob_widget("Hypothesis probability")
interactive(plot_empirical_prob, p=p, p_guess=p_guess)


Comment on how the slope of the above plot varies with the true and hypothesized $p_{head}$. (Be aware that the y-axis scale will change as you drag the sliders!) When is it steepest? When is it flattest? How does this relate to your observations?


### begin (b) ###

### end (b) ###


The next plot normalizes the above log probability by dividing it by $n$. We return to the hypothesis of a fair coin and the true distribution that we chose at the beginning of the notebook, ignoring the values chosen with the sliders above.

As you proved in the theory section, this quantity should converge to the negation of the KL divergence $D(f \| p)$, where $f$ is the true distribution (which the empirical distribution approaches as $n$ grows) and $p$ is the hypothesis distribution. Look at the plot and check whether it converges to the value we expect (computed below). No response is necessary; just observe the plot.
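
A brief sketch of where the KL divergence comes from, using symbols that are assumptions of this note and not necessarily the notation of the theory section: $h$ is the observed number of heads, $f_n = (1 - h/n,\ h/n)$ is the empirical distribution, $p$ is the hypothesis distribution, and $P$ is the quantity computed by `p_empirical_type`. With the entropy approximation used in `log_num_possibilities`:

$$
\frac{1}{n}\log P \;=\; H(f_n) + \sum_{x} f_n(x)\log p(x) \;=\; -\sum_{x} f_n(x)\log\frac{f_n(x)}{p(x)} \;=\; -D(f_n \,\|\, p)
$$

Since $f_n$ approaches the true distribution as $n$ grows, the normalized log probability approaches the negated KL divergence between the true distribution and the hypothesis.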


In [None]:
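candidate_ns = range(100, 10000)
p_observations = []
# Use a distinct loop variable so the global `n` defined earlier is not overwritten.
for num_tosses in candidate_ns:
    num_heads = sum(toss(num_tosses))
    p_observations.append(log_p_empirical_type(num_tosses, num_heads, 0.5) / num_tosses)

plt.plot(candidate_ns, p_observations)
plt.xlabel("number of tosses n")
plt.ylabel("log probability / n")
plt.show()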


In [None]:
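def compute_kl(p, q):
    # KL divergence D(p || q) in nats; assumes all entries are strictly positive.
    return np.sum(p * np.log(p / q))

f = np.array([1 - p_head, p_head]) # true distribution
p = np.array([0.5, 0.5]) # hypothesis distribution

kl = compute_kl(f, p)
print("KL divergence: {}".format(kl))
print("Expected limit of the plot above: {}".format(-kl))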
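
The convergence described above can also be checked without random sampling. The sketch below is an addition under stated assumptions: it uses the exact binomial coefficient (via `math.lgamma`) instead of the entropy approximation, fixes a hypothetical true weight `f_true = 0.65` instead of the notebook's random `p_head`, and sets the head count to its expected value. The helper names are hypothetical.

```python
import math

def log_binom(n, k):
    """Exact log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def kl_bernoulli(f, q):
    """KL divergence D(Bernoulli(f) || Bernoulli(q)) in nats, for 0 < f, q < 1."""
    return f * math.log(f / q) + (1 - f) * math.log((1 - f) / (1 - q))

f_true = 0.65   # hypothetical true P(head)
q_hyp = 0.5     # fair-coin hypothesis

gaps = []
for n_tosses in (100, 10_000, 1_000_000):
    k = round(f_true * n_tosses)  # head count set to its expected value
    log_prob = (log_binom(n_tosses, k)
                + k * math.log(q_hyp)
                + (n_tosses - k) * math.log(1 - q_hyp))
    per_toss = log_prob / n_tosses
    target = -kl_bernoulli(k / n_tosses, q_hyp)
    gaps.append(target - per_toss)
    print(n_tosses, per_toss, target)
```

The per-toss log probability sits slightly below the negated KL divergence, and the gap shrinks as the number of tosses grows.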
