In [1]:
%matplotlib inline
import itertools
import os
import numpy as np
import gpflow
import gpflow.training.monitor as mon
import numbers
import matplotlib.pyplot as plt
import tensorflow as tf
X = np.random.rand(10000, 1) * 10
Y = np.sin(X) + np.random.randn(*X.shape)
Xt = np.random.rand(10000, 1) * 10
Yt = np.sin(X) + np.random.randn(*X.shape)
np.random.seed(0)

# Demo: `gpflow.training.monitor`
In this notebook we'll demo how to use `gpflow.training.monitor` for logging the optimisation of a GPflow model. The example should cover pretty much all use cases.

## Creating the GPflow model
We first create the GPflow model. Under the hood, GPflow gives a unique name to each model which is used to name the Variables it creates in the TensorFlow graph containing a random identifier. This is useful in interactive sessions, where people may create a few models, to prevent variables with the same name conflicting. However, when loading the model, we need to make sure that the names of all the variables are exactly the same as in the checkpoint. This is why we pass `name="SVGP"` to the model constructor, and why we use `gpflow.defer_build()`.

In [2]:
with gpflow.defer_build():
    m = gpflow.models.SVGP(X, Y, gpflow.kernels.RBF(1), gpflow.likelihoods.Gaussian(),
                           Z=np.linspace(0, 10, 5)[:, None],
                           minibatch_size=100, name="SVGP")
    m.likelihood.variance = 0.01
m.compile()

In [3]:
m.compute_log_likelihood()

-1088595.1395058229

## Setting up the optimisation
Next we need to set up the optimisation process. `gpflow.training.monitor` provides classes that manage the optimsation, and perform certain logging tasks. In this example, we want to:
- log certain scalar parameters in TensorBoard
- log the full optimisation objective (log marginal likelihood bound) periodically, even though we optimise with minibatches
- store a backup of the optimisation process periodically
- log performance for a test set periodically

Because of the integration with TensorFlow ways of storing and logging, we will need to perform a few TensorFlow manipulations outside of GPflow as well.

We start by creating the `global_step` variable. This is not strictly required by TensorFlow optimisers, but they do all have support for it. Its purpose is to track how many optimisation steps have occurred. It is useful to keep this in a TensorFlow variable as this allows it to be restored together with all the parameters of the model.

In [4]:
global_step = tf.Variable(0, trainable=False, name="global_step")
m.enquire_session().run(global_step.initializer)

Next, we create the optimiser action. `make_optimize_action` also creates the optimisation tensor, which is added to the computational graph. Later, the saver will store the whole graph, and so can also restore the exact optimiser state.

In [5]:
adam = gpflow.train.AdamOptimizer(0.01).make_optimize_action(m, global_step=global_step)

## Creating actions for keeping track of the optimisation

In [6]:
m.kern.lengthscales.value

array(1.0)

In [20]:
def cb():
    m.anchor(m.enquire_session())
    print('lengthscales: {}'.format(m.kern.lengthscales.value))

In [7]:
print_lml = mon.PrintAction(itertools.count(), mon.Trigger.ITER, m, "lml", single_line=False)
callback = mon.CallbackAction(itertools.count(step=10), mon.Trigger.ITER,
#                               cb)
                              lambda : print('lengthscales: {}'.format(m.kern.lengthscales.value)))
sleep = mon.SleepAction(itertools.count(), mon.Trigger.ITER, 0.01)
saver = mon.StoreSession(itertools.count(step=10), mon.Trigger.ITER, m.enquire_session(),
                         hist_path="./monitor-saves/checkpoint", global_step=global_step)
actions = [adam, print_lml, callback, sleep, saver]

Restoring session from `./monitor-saves/checkpoint-472`.
INFO:tensorflow:Restoring parameters from ./monitor-saves/checkpoint-472


In [8]:
gpflow.actions.Loop(actions, stop=100)()
# m.anchor(m.enquire_session())  # <-- Why is this needed?

lml: iteration 1 likelihood -67081.8343
lengthscales: 1.0
lml: iteration 2 likelihood -53473.8822
lml: iteration 3 likelihood -58519.8139
lml: iteration 4 likelihood -66894.6986
lml: iteration 5 likelihood -79654.5435
lml: iteration 6 likelihood -85969.7993
lml: iteration 7 likelihood -71594.3317
lml: iteration 8 likelihood -66891.0894
lml: iteration 9 likelihood -48644.3054
lml: iteration 10 likelihood -74417.3452
lml: iteration 11 likelihood -79721.6210
lengthscales: 1.0
lml: iteration 12 likelihood -43877.1873
lml: iteration 13 likelihood -71479.6665
lml: iteration 14 likelihood -53425.0124
lml: iteration 15 likelihood -84990.4086
lml: iteration 16 likelihood -61294.9662
lml: iteration 17 likelihood -53121.8563
lml: iteration 18 likelihood -53664.7576
lml: iteration 19 likelihood -70662.5137
lml: iteration 20 likelihood -74941.6222
lml: iteration 21 likelihood -53200.7872
lengthscales: 1.0
lml: iteration 22 likelihood -67710.9326
lml: iteration 23 likelihood -62780.5405
lml: iterati

In [9]:
saver.r

AttributeError: 'StoreSession' object has no attribute 'r'

Next, we create an instance of `FileWriter`, which will save the TensorBoard logs to a file. This object needs to be shared between all `gpflow_monitor.TensorBoard` objects, if they are to write to the same path.

In [4]:
fw = tf.summary.FileWriter(os.path.join("./results/test/tensorboard/"), m.enquire_session().graph)

Now the TensorFlow side is set up, we can focus on the `gpflow_monitor` part. The optimsation is taken care of by the `ManagedOptimisation` class. This will run the training loop. The `ManagedOptimisation` object will also take care of running `Task`s.

Each `Task` is something that needs to be run periodically during the optimisation. The first and second parameters of all tasks are a generator returning times (either in iterations or time) of when the `Task` needs to be run. The second determines whether a number of iterations (`Trigger.ITER`), an amount of time spent optimising (`Trigger.OPTIMISATION_TIME`), or the wall-clock time (`Trigger.TOTAL_TIME`) triggers the `Task` to be run. The following `Task`s are run once in every 100 or 1000 iterations.

In [5]:
opt_method = ManagedOptimisation(m, gpflow.train.AdamOptimizer(0.01), global_step)
opt_method.tasks += [
    PrintTimings((x * 100 for x in itertools.count()), Trigger.ITER),
    ModelTensorBoard((x * 100 for x in itertools.count()), Trigger.ITER, m, fw),
    LmlTensorBoard((x * 1000 for x in itertools.count()), Trigger.ITER, m, fw, verbose=False),
    StoreSession((x * 1000 for x in itertools.count()), Trigger.ITER, m.enquire_session(), "./results/test/checkpoint")
]

INFO:tensorflow:Summary name full lml is illegal; using full_lml instead.


We may also want to perfom certain tasks that do not have pre-defined `Task` classes. For example, computing the performance on a test set. Here we create such a class by extending `ModelTensorBoard` to log the testing benchmarks in addition to all the scalar parameters.

In [6]:
class TestTensorBoard(ModelTensorBoard):
    def __init__(self, sequence, trigger: Trigger, model, file_writer, Xt, Yt):
        super().__init__(sequence, trigger, model, file_writer)
        self.Xt = Xt
        self.Yt = Yt
        self._full_test_err = tf.placeholder(gpflow.settings.tf_float, shape=())
        self._full_test_nlpp = tf.placeholder(gpflow.settings.tf_float, shape=())

        self.summary = tf.summary.merge([tf.summary.scalar("test_rmse", self._full_test_err),
                                         tf.summary.scalar("test_nlpp", self._full_test_nlpp)])

    def _event_handler(self, manager):
        minibatch_size = 100
        preds = np.vstack([m.predict_y(Xt[mb * minibatch_size:(mb + 1) * minibatch_size, :])[0]
                            for mb in range(-(-len(Xt) // minibatch_size))])
        test_err = np.mean((Yt - preds) ** 2.0)**0.5
        summary, step = m.enquire_session().run([self.summary, global_step],
                                      feed_dict={self._full_test_err: test_err,
                                                 self._full_test_nlpp: 0.0})
        self.file_writer.add_summary(summary, step)

We then add it to the task list.

In [7]:
opt_method.tasks.append(TestTensorBoard((x * 1000 for x in itertools.count()), Trigger.ITER, m, fw, Xt, Yt))

## Running the optimisation
We finally get to running the optimisation. The second time this is run, the session should be restored from a checkpoint created by `StoreSession`. To confirm this, we print out the first value in all TensorFlow tensors. This includes any values used by the optimiser. This is important to ensure that the optimiser starts off from _exactly_ the same state as that it left. If this is not done correctly, models may start diverging after loading.

In [8]:
sess = m.enquire_session()
[u[1] if isinstance(u[1], numbers.Number) else u[1].flatten()[0]  for u in sorted([(v.name, sess.run(v)) for v in tf.global_variables()], key=lambda x: x[0])]

[0.0,
 0.0,
 0.54132327263575086,
 0.0,
 0.0,
 0.54132327263575086,
 0.0,
 0.0,
 -4.6002665251585171,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 1.0,
 0.89999998,
 0.99900001,
 0]

In [None]:
opt_method.minimize(maxiter=8000)

1, 1:	2.16 optimisation iter/s	2.16 total iter/s	0.00 last iter/sFull lml: -1198200.924550 (-1.20e+06)
1000, 1000:	454.39 optimisation iter/s	325.05 total iter/s	576.21 last iter/sFull lml: -30515.580038 (-3.05e+04)
2000, 2000:	506.20 optimisation iter/s	367.32 total iter/s	570.79 last iter/sFull lml: -17402.941729 (-1.74e+04)
3000, 3000:	527.07 optimisation iter/s	384.49 total iter/s	573.41 last iter/sFull lml: -15024.233225 (-1.50e+04)


Here, we print the optimised variables for comparison on the next run.

In [None]:
[u[1] if isinstance(u[1], numbers.Number) else u[1].flatten()[0]  for u in sorted([(v.name, sess.run(v)) for v in tf.global_variables()], key=lambda x: x[0])]