# Sequential Active Learning Worked Example

## Set up

There are three features $(x_1, x_2, x_3)$ and two labels $(y_0, y_1)$.

The hypothesis space is given by:

$$
h_1 = [1, 1, 1] \\
h_2 = [0, 1, 1] \\
h_3 = [0, 0, 1] \\
h_4 = [0, 0, 0]
$$

The learner's prior over hypotheses is uniform: $p_L(h') = 1/4 \quad \forall h' \in h$.

In [62]:
import numpy as np

def create_boundary_hyp_space(n_features):
    """Creates a hypothesis space of concepts defined by a linear boundary"""
    hyp_space = []
    for i in range(n_features + 1):
        hyp = [1 for _ in range(n_features)]
        hyp[:i] = [0 for _ in range(i)]
        hyp_space.append(hyp)
    hyp_space = np.array(hyp_space)
    return hyp_space

# initialize model
n_features = 3  # number of features
features = np.arange(n_features)  # features
n_labels = 2  # number of labels
labels = np.arange(n_labels)  # labels
hyp_space = create_boundary_hyp_space(n_features)
n_hyp = len(hyp_space)  # number of hypotheses
hyp_shape = (n_hyp, n_features, n_labels)  # shape of structures


# set learner's prior p_L(h) to be uniform over hypotheses
# the prior is replicated over every (feature, label) pair so it lines up with the likelihood
prior = 1 / n_hyp * np.ones(hyp_shape)

assert np.allclose(np.sum(prior, axis=0), 1.0)

The likelihood $p(y|x, h)$ is 1 if hypothesis $h$ assigns label $y$ to feature $x$, and 0 otherwise:

In [63]:
lik = np.ones(hyp_shape)

for i, hyp in enumerate(hyp_space):
    for j, feature in enumerate(features):
        for k, label in enumerate(labels):
            if hyp[feature] == label:
                lik[i, j, k] = 1
            else:
                lik[i, j, k] = 0
                
assert lik.shape == hyp_shape

The posterior follows from Bayes' rule, $p(h|x, y) \propto p(y|x, h)p(h)$, normalized over hypotheses:

In [64]:
posterior = lik * prior
posterior = posterior / np.sum(posterior, axis=0)

assert np.allclose(np.sum(posterior, axis=0), 1.0)
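
As a sanity check on the posterior, the following is a minimal, self-contained sketch that rebuilds the same quantities with broadcasting instead of nested loops. The name `post_x2_y0` is introduced here purely for illustration. It assumes the four-hypothesis space from the set-up and a uniform prior.

```python
import numpy as np

# Hypothesis space from the set-up: rows are h_1..h_4, columns are the three features.
hyp_space = np.array([[1, 1, 1],
                      [0, 1, 1],
                      [0, 0, 1],
                      [0, 0, 0]])
labels = np.arange(2)

# lik[h, x, y] = 1 if hypothesis h assigns label y to feature x, else 0.
lik = (hyp_space[:, :, None] == labels[None, None, :]).astype(float)

# Uniform prior, replicated to the same (hypothesis, feature, label) shape.
prior = np.full(lik.shape, 1 / len(hyp_space))

# Bayes' rule: multiply, then normalize over the hypothesis axis.
posterior = lik * prior
posterior /= posterior.sum(axis=0)

# Posterior over h_1..h_4 after observing label 0 on the second feature (index 1).
post_x2_y0 = posterior[:, 1, 0]
print(post_x2_y0)
```

Only $h_3$ and $h_4$ assign label 0 to the second feature, so the posterior puts probability 1/2 on each of them and 0 on the rest.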

The entropy of a random variable is given by $H(X) = -\sum_x p(x) \log_2(p(x))$.
We calculate the entropy of the prior and posterior using $H(p(h))$ and $H(p(h|x, y))$, respectively.

In [65]:
def entropy(X):
    assert np.isclose(np.sum(X), 1.0)  # check for valid probability distribution
    # 0 * log2(0) evaluates to nan; nansum drops those terms, i.e. treats them as 0
    return -1 * np.nansum(X * np.log2(X))

prior_entropy = np.empty((n_features, n_labels))
posterior_entropy = np.empty((n_features, n_labels))

for i, feature in enumerate(features):
    for j, label in enumerate(labels):
        prior_entropy[feature, label] = entropy(prior[:, i, j])
        posterior_entropy[feature, label] = entropy(posterior[:, i, j])
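
Because several posteriors contain exact zeros, `np.log2` is evaluated at 0 in the cell above, which NumPy reports with runtime warnings even though `np.nansum` then discards the resulting `nan` terms. One possible alternative, sketched here under the usual convention that $0 \log_2 0 = 0$, is to mask out the zero entries before taking the logarithm. The function name `entropy_bits` is hypothetical and not part of the original notebook.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy in bits, treating 0 * log2(0) as 0."""
    p = np.asarray(p, dtype=float)
    assert np.isclose(p.sum(), 1.0)  # must be a valid probability distribution
    nonzero = p[p > 0]               # zero-probability outcomes contribute nothing
    return float(-np.sum(nonzero * np.log2(nonzero)))

uniform_entropy = entropy_bits([0.25, 0.25, 0.25, 0.25])   # uniform prior over 4 hypotheses
point_mass_entropy = entropy_bits([1.0, 0.0, 0.0, 0.0])    # one hypothesis left
three_way_entropy = entropy_bits([0.0, 1/3, 1/3, 1/3])     # three hypotheses left
```

The uniform prior over four hypotheses has 2 bits of entropy, a posterior concentrated on a single hypothesis has none, and a posterior spread evenly over three hypotheses has $\log_2 3$ bits.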



Next, we calculate the information gained from each possible observation, $IG(x, y) = H(p(h)) - H(p(h|x, y))$:

In [66]:
information_gain = prior_entropy - posterior_entropy
print(information_gain)

[[ 0.4150375  2.       ]
 [ 1.         1.       ]
 [ 2.         0.4150375]]
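
Each entry can be checked by hand, since a posterior that is uniform over $k$ consistent hypotheses has entropy $\log_2 k$. Rows of the output correspond to features and columns to labels. Taking the first feature as an example (it is 0 under $h_2, h_3, h_4$ and 1 only under $h_1$), and the second feature with label 0 (consistent with $h_3, h_4$):

$$
IG(x_1, y_0) = \log_2 4 - \log_2 3 \approx 0.415 \\
IG(x_1, y_1) = \log_2 4 - \log_2 1 = 2 \\
IG(x_2, y_0) = \log_2 4 - \log_2 2 = 1
$$

The largest gains come from observations that are consistent with only one hypothesis, but such observations are also the least likely under the prior, which is why the expectation over labels matters.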


Finally, we calculate the expected information gain by averaging over all possible label observations, weighting each by its predictive probability $p(y|x) = \sum_h p(y|x, h)p(h)$: $EIG(x) = H(p(h)) - \sum_y p(y|x) H(p(h|x, y))$

In [70]:
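# predictive distribution p(y|x) = sum_h p(y|x, h) p(h)
label_predictive = np.sum(lik * prior, axis=0)
# the prior entropy is the same for every (x, y), so one column matches the (n_features,) shape
expected_information_gain = prior_entropy[:, 0] - np.sum(label_predictive * posterior_entropy, axis=1)

In [71]:
# normalize so the values sum to 1 across features
expected_information_gain / np.sum(expected_information_gain)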

array([ 0.30934632,  0.38130736,  0.30934632])
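
The whole calculation can also be written directly from the $EIG(x)$ formula, one feature at a time. The block below is a self-contained sketch under the same assumptions (four boundary hypotheses, uniform prior). The names `entropy_bits`, `eig`, `normalized_eig`, and `best_feature` are illustrative and do not appear in the original notebook, and picking the query by `argmax` is one natural use of the values rather than a step the notebook itself performs.

```python
import numpy as np

hyp_space = np.array([[1, 1, 1],
                      [0, 1, 1],
                      [0, 0, 1],
                      [0, 0, 0]])
n_hyp, n_features = hyp_space.shape
prior_h = np.full(n_hyp, 1 / n_hyp)  # p(h), uniform

def entropy_bits(p):
    """Shannon entropy in bits, skipping zero-probability entries."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

eig = np.zeros(n_features)
for x in range(n_features):
    expected_posterior_entropy = 0.0
    for y in (0, 1):
        lik_xy = (hyp_space[:, x] == y).astype(float)  # p(y|x, h) for every h
        p_y = np.sum(lik_xy * prior_h)                 # predictive p(y|x)
        post = lik_xy * prior_h / p_y                  # posterior p(h|x, y)
        expected_posterior_entropy += p_y * entropy_bits(post)
    eig[x] = entropy_bits(prior_h) - expected_posterior_entropy

normalized_eig = eig / eig.sum()      # same normalization as the cell above
best_feature = int(np.argmax(eig))    # feature index with the highest EIG
print(eig, normalized_eig, best_feature)
```

In this sketch the middle feature (index 1) has the highest expected information gain, exactly 1 bit, because either label splits the four hypotheses evenly. The two outer features each yield $2 - \tfrac{3}{4}\log_2 3 \approx 0.811$ bits.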