In [1]:
%matplotlib inline
import itertools
import os
os.environ['CUDA_VISIBLE_DEVICES']=""
import numpy as np
import gpflow
import gpflow.training.monitor as mon
import numbers
import matplotlib.pyplot as plt
import tensorflow as tf

# Demo: `gpflow.training.monitor`
In this notebook we'll demo how to use `gpflow.training.monitor` for logging the optimisation of a GPflow model.

## Creating the GPflow model
We first generate some random data and create a GPflow model.

Under the hood, GPflow gives each model a unique name containing a random identifier, and uses that name for the Variables it creates in the TensorFlow graph. This is useful in interactive sessions, where several models may be created, because it prevents variables with the same name from conflicting. However, when restoring a model from a checkpoint, the names of all the variables must be exactly the same as in the checkpoint. This is why we pass `name="SVGP"` to the model constructor, and why we use `gpflow.defer_build()`.

In [2]:
np.random.seed(0)
X = np.random.rand(10000, 1) * 10
Y = np.sin(X) + np.random.randn(*X.shape)
Xt = np.random.rand(10000, 1) * 10
Yt = np.sin(Xt) + np.random.randn(*Xt.shape)

with gpflow.defer_build():
    m = gpflow.models.SVGP(X, Y, gpflow.kernels.RBF(1), gpflow.likelihoods.Gaussian(),
                           Z=np.linspace(0, 10, 5)[:, None],
                           minibatch_size=100, name="SVGP")
    m.likelihood.variance = 0.01
m.compile()

Let's compute the log likelihood before the optimisation.

In [3]:
print('LML before the optimisation: %f' % m.compute_log_likelihood())

LML before the optimisation: -1271605.621944


We will be using a TensorFlow optimiser. All TensorFlow optimisers support a `global_step` variable, which tracks how many optimisation steps have occurred. Keeping this counter in a TensorFlow variable means it is restored together with all the parameters of the model.

The code below creates this variable using a monitor's helper function. It is important to create it before building the monitor in case the monitor includes a checkpoint task. This is because the checkpoint internally uses the TensorFlow Saver which creates a list of variables to save. Therefore all variables expected to be saved by the checkpoint task should exist by the time the task is created.

In [4]:
session = m.enquire_session()
global_step = mon.create_global_step(session)
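
To illustrate why the step counter is kept alongside the parameters, here is a minimal, framework-free sketch. The `save_checkpoint`, `load_checkpoint` and `train` helpers and the JSON layout are hypothetical stand-ins for the TensorFlow Saver and the optimiser; they are not part of GPflow or TensorFlow.

```python
import json
import os
import tempfile


def save_checkpoint(path, params, global_step):
    """Write the parameters and the step counter to one file (hypothetical helper)."""
    with open(path, "w") as f:
        json.dump({"params": params, "global_step": global_step}, f)


def load_checkpoint(path):
    """Read back the parameters and the step counter written by save_checkpoint."""
    with open(path) as f:
        state = json.load(f)
    return state["params"], state["global_step"]


def train(params, global_step, n_steps):
    """Toy optimisation loop: each step shrinks the parameter and bumps the counter."""
    for _ in range(n_steps):
        params["w"] -= 0.1 * params["w"]
        global_step += 1
    return params, global_step


checkpoint_path = os.path.join(tempfile.mkdtemp(), "cp.json")

# First run: 900 steps from scratch, then save parameters and counter together.
params, step = train({"w": 1.0}, 0, 900)
save_checkpoint(checkpoint_path, params, step)

# Second run: restoring the checkpoint also restores the counter,
# so the step numbering continues from 900 instead of restarting at 0.
params, step = load_checkpoint(checkpoint_path)
params, step = train(params, step, 450)
print(step)  # 1350
```

The same effect is visible in the `opt.step` column of the optimisation output later in this notebook, which continues from the restored checkpoint rather than starting at zero.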

## Construct the monitor

Next we need to construct the monitor. `gpflow.training.monitor` provides classes that are building blocks for the monitor. Essentially, a monitor is a function that is provided as a callback to an optimiser. It consists of a number of tasks that may be executed at each step, subject to their running condition.

In this example, we want to:
- log certain scalar parameters in TensorBoard,
- log the full optimisation objective (log marginal likelihood bound) periodically, even though we optimise with minibatches,
- store a backup of the optimisation process periodically,
- log performance for a test set periodically.

We will define these tasks as follows:

In [5]:
print_task = mon.PrintTimingsTask().with_name('print')\
    .with_condition(mon.PeriodicIterationCondition(10))\
    .with_exit_condition(True)

sleep_task = mon.SleepTask(0.01).with_name('sleep')

saver_task = mon.CheckpointTask('./monitor-saves').with_name('saver')\
    .with_condition(mon.PeriodicIterationCondition(10))\
    .with_exit_condition(True)

std_tboard_task = mon.StandardTensorBoardTask(m, './model-tensorboard').with_name('std_tboard')\
    .with_condition(mon.PeriodicIterationCondition(10))\
    .with_exit_condition(True)

lml_tboard_task = mon.LmlTensorBoardTask(m, './model-tensorboard').with_name('lml_tboard')\
    .with_condition(mon.PeriodicIterationCondition(100))\
    .with_exit_condition(True)

As the above code shows, each task can be assigned a name and running conditions. The name will be shown in the task timing summary.

There are two different types of running conditions: `with_condition` controls execution of the task at each iteration in the optimisation loop. `with_exit_condition` is a simple boolean flag indicating that the task should also run at the end of optimisation.
In this example we want to run our tasks periodically, at every iteration or every 10th or 100th iteration.
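
The interplay between the two kinds of condition can be sketched in plain Python. This is a conceptual illustration only: `PeriodicCondition`, `ToyTask` and `ToyMonitor` are hypothetical names made up for this sketch and do not reproduce GPflow's implementation.

```python
class PeriodicCondition:
    """True on every `interval`-th iteration (hypothetical stand-in)."""

    def __init__(self, interval=1):
        self.interval = interval

    def __call__(self, iteration):
        return iteration % self.interval == 0


class ToyTask:
    """A named action with a per-iteration condition and an exit flag."""

    def __init__(self, name, action, condition=None, run_on_exit=False):
        self.name = name
        self.action = action
        self.condition = condition or PeriodicCondition(1)
        self.run_on_exit = run_on_exit


class ToyMonitor:
    """Callable that an optimiser would invoke once per step."""

    def __init__(self, tasks):
        self.tasks = tasks
        self.iteration = 0

    def __call__(self):
        # Per-iteration path: run each task whose condition holds.
        self.iteration += 1
        for task in self.tasks:
            if task.condition(self.iteration):
                task.action(self.iteration)

    def stop(self):
        # Exit path: run the tasks flagged to run at the end of optimisation.
        for task in self.tasks:
            if task.run_on_exit:
                task.action(self.iteration)


calls = []
toy_tasks = [
    ToyTask("every", lambda it: calls.append(("every", it))),
    ToyTask("tenth", lambda it: calls.append(("tenth", it)),
            PeriodicCondition(10), run_on_exit=True),
]
toy_monitor = ToyMonitor(toy_tasks)
for _ in range(25):  # stands in for the optimiser loop
    toy_monitor()
toy_monitor.stop()

tenth_iterations = [it for name, it in calls if name == "tenth"]
print(tenth_iterations)  # [10, 20, 25]
```

The final entry, 25, comes from the exit flag rather than the periodic condition: the last iteration is not a multiple of 10, but the task still runs once when monitoring stops.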


## Custom tasks
We may also want to perform certain tasks that do not have pre-defined `Task` classes. For example, we may want to compute the performance on a test set. Here we create such a class by extending `BaseTensorBoardTask` to log the testing benchmarks in addition to all the scalar parameters.

In [6]:
class CustomTensorBoardTask(mon.BaseTensorBoardTask):
    def __init__(self, model, event_path, Xt, Yt):
        super().__init__(model, event_path)
        self.Xt = Xt
        self.Yt = Yt
        self._full_test_err = tf.placeholder(gpflow.settings.tf_float, shape=())
        self._full_test_nlpp = tf.placeholder(gpflow.settings.tf_float, shape=())
        self._summary = tf.summary.merge([tf.summary.scalar("test_rmse", self._full_test_err),
                                         tf.summary.scalar("test_nlpp", self._full_test_nlpp)])
    
    def run(self, context: mon.MonitorContext, *args, **kwargs) -> None:
        minibatch_size = 100
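        # Predict in minibatches; -(-n // b) is ceiling division, so the last partial batch is included.
        preds = np.vstack([self.model.predict_y(self.Xt[mb * minibatch_size:(mb + 1) * minibatch_size, :])[0]
                            for mb in range(-(-len(self.Xt) // minibatch_size))])
        test_err = np.mean((self.Yt - preds) ** 2.0)**0.5
        # test_nlpp is not computed in this demo; a constant 0.0 is logged as a placeholder.
        self._eval_summary(context, {self._full_test_err: test_err, self._full_test_nlpp: 0.0})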

        
custom_tboard_task = CustomTensorBoardTask(m, './model-tensorboard', Xt, Yt).with_name('custom_tboard')\
    .with_condition(mon.PeriodicIterationCondition(100))\
    .with_exit_condition(True)
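
The `run` method above predicts in minibatches and logs a constant `0.0` for `test_nlpp`. The NumPy-only sketch below isolates the batching logic and shows one way the negative log predictive probability could be computed, assuming Gaussian predictive densities. The `predict` function is a hypothetical stand-in for `model.predict_y` (it returns the noise-free mean and unit variance), and `batched_predict`, `rmse` and `gaussian_nlpp` are names introduced purely for illustration.

```python
import numpy as np


def predict(x):
    """Hypothetical stand-in for model.predict_y: returns (mean, variance)."""
    return np.sin(x), np.full_like(x, 1.0)


def batched_predict(x, batch_size):
    """Predict in chunks; -(-n // b) is ceiling division, so the last partial batch is kept."""
    n_batches = -(-len(x) // batch_size)
    chunks = [predict(x[i * batch_size:(i + 1) * batch_size]) for i in range(n_batches)]
    means = np.vstack([mean for mean, _ in chunks])
    variances = np.vstack([var for _, var in chunks])
    return means, variances


def rmse(y, mean):
    """Root mean squared error between targets and predictive means."""
    return np.sqrt(np.mean((y - mean) ** 2))


def gaussian_nlpp(y, mean, var):
    """Mean negative log density of y under independent N(mean, var) predictions."""
    return np.mean(0.5 * np.log(2 * np.pi * var) + 0.5 * (y - mean) ** 2 / var)


rng = np.random.RandomState(0)
x_test = rng.rand(250, 1) * 10  # 250 is deliberately not a multiple of the batch size
y_test = np.sin(x_test) + rng.randn(*x_test.shape)

mean, var = batched_predict(x_test, batch_size=100)
print(mean.shape)  # (250, 1): all three batches, including the partial one
print(rmse(y_test, mean), gaussian_nlpp(y_test, mean, var))
```

In a real task the two metrics would be fed to the `_full_test_err` and `_full_test_nlpp` placeholders in place of the constant.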

Now we can put all these tasks into a monitor.

In [7]:
monitor_tasks = [print_task, std_tboard_task, lml_tboard_task, custom_tboard_task, saver_task, sleep_task]
monitor = mon.Monitor(monitor_tasks, session, global_step)

## Running the optimisation
We finally get to running the optimisation.

We may want to continue a previously run optimisation by restoring the TensorFlow graph from the latest checkpoint. Otherwise, skip this step.

In [8]:
if os.path.isdir('./monitor-saves'):
    mon.restore_session(session, './monitor-saves')

Restoring session from `./monitor-saves/cp-900`.
INFO:tensorflow:Restoring parameters from ./monitor-saves/cp-900


In [9]:
try:
    optimiser = gpflow.train.AdamOptimizer(0.01)
    monitor.start_monitoring()
    optimiser.minimize(m, step_callback=monitor, maxiter=450, global_step=global_step)
finally:
    monitor.stop_monitoring()
    monitor.print_summary()

Iteration 10	total itr.rate 13.12/s	recent itr.rate 13.12/s	opt.step 910	total opt.rate 14.89/s	recent opt.rate 14.89/s
Iteration 20	total itr.rate 19.26/s	recent itr.rate 36.25/s	opt.step 920	total opt.rate 29.32/s	recent opt.rate 948.09/s
Iteration 30	total itr.rate 24.01/s	recent itr.rate 47.41/s	opt.step 930	total opt.rate 43.14/s	recent opt.rate 754.96/s
Iteration 40	total itr.rate 27.53/s	recent itr.rate 49.13/s	opt.step 940	total opt.rate 56.59/s	recent opt.rate 880.83/s
Iteration 50	total itr.rate 30.03/s	recent itr.rate 47.11/s	opt.step 950	total opt.rate 69.33/s	recent opt.rate 693.44/s
Iteration 60	total itr.rate 32.22/s	recent itr.rate 50.70/s	opt.step 960	total opt.rate 81.52/s	recent opt.rate 678.44/s
Iteration 70	total itr.rate 33.86/s	recent itr.rate 48.76/s	opt.step 970	total opt.rate 93.30/s	recent opt.rate 700.50/s
Iteration 80	total itr.rate 34.21/s	recent itr.rate 36.87/s	opt.step 980	total opt.rate 105.07/s	recent opt.rate 896.01/s
Iteration 90	total itr.rate 35.4

 26%|██▌       | 26/100 [00:00<00:00, 257.39it/s]

Iteration 100	total itr.rate 36.43/s	recent itr.rate 48.94/s	opt.step 1000	total opt.rate 126.02/s	recent opt.rate 522.29/s


100%|██████████| 100/100 [00:00<00:00, 392.21it/s]


Iteration 110	total itr.rate 31.12/s	recent itr.rate 12.67/s	opt.step 1010	total opt.rate 136.35/s	recent opt.rate 755.99/s
Iteration 120	total itr.rate 32.04/s	recent itr.rate 47.28/s	opt.step 1020	total opt.rate 146.30/s	recent opt.rate 743.00/s
Iteration 130	total itr.rate 32.76/s	recent itr.rate 44.95/s	opt.step 1030	total opt.rate 155.63/s	recent opt.rate 663.28/s
Iteration 140	total itr.rate 33.55/s	recent itr.rate 48.99/s	opt.step 1040	total opt.rate 165.07/s	recent opt.rate 779.95/s
Iteration 150	total itr.rate 34.26/s	recent itr.rate 48.54/s	opt.step 1050	total opt.rate 174.93/s	recent opt.rate 1068.59/s
Iteration 160	total itr.rate 35.17/s	recent itr.rate 58.32/s	opt.step 1060	total opt.rate 183.52/s	recent opt.rate 696.94/s
Iteration 170	total itr.rate 35.72/s	recent itr.rate 47.69/s	opt.step 1070	total opt.rate 192.22/s	recent opt.rate 794.86/s
Iteration 180	total itr.rate 36.21/s	recent itr.rate 47.16/s	opt.step 1080	total opt.rate 200.20/s	recent opt.rate 682.01/s
Iterati

 22%|██▏       | 22/100 [00:00<00:00, 219.27it/s]

Iteration 200	total itr.rate 37.03/s	recent itr.rate 47.99/s	opt.step 1100	total opt.rate 215.58/s	recent opt.rate 750.17/s


100%|██████████| 100/100 [00:00<00:00, 250.86it/s]


Iteration 210	total itr.rate 32.94/s	recent itr.rate 10.25/s	opt.step 1110	total opt.rate 223.19/s	recent opt.rate 760.43/s
Iteration 220	total itr.rate 33.41/s	recent itr.rate 47.71/s	opt.step 1120	total opt.rate 230.45/s	recent opt.rate 726.76/s
Iteration 230	total itr.rate 33.84/s	recent itr.rate 47.50/s	opt.step 1130	total opt.rate 237.41/s	recent opt.rate 707.27/s
Iteration 240	total itr.rate 34.27/s	recent itr.rate 48.42/s	opt.step 1140	total opt.rate 244.13/s	recent opt.rate 700.41/s
Iteration 250	total itr.rate 34.66/s	recent itr.rate 47.64/s	opt.step 1150	total opt.rate 250.50/s	recent opt.rate 669.16/s
Iteration 260	total itr.rate 35.07/s	recent itr.rate 49.71/s	opt.step 1160	total opt.rate 258.77/s	recent opt.rate 1484.84/s
Iteration 270	total itr.rate 35.41/s	recent itr.rate 47.30/s	opt.step 1170	total opt.rate 264.99/s	recent opt.rate 706.51/s
Iteration 280	total itr.rate 35.75/s	recent itr.rate 48.57/s	opt.step 1180	total opt.rate 271.45/s	recent opt.rate 793.03/s
Iterati

 36%|███▌      | 36/100 [00:00<00:00, 352.95it/s]

Iteration 300	total itr.rate 36.35/s	recent itr.rate 47.44/s	opt.step 1200	total opt.rate 283.23/s	recent opt.rate 759.64/s


100%|██████████| 100/100 [00:00<00:00, 422.58it/s]


Iteration 310	total itr.rate 34.95/s	recent itr.rate 16.21/s	opt.step 1210	total opt.rate 289.07/s	recent opt.rate 756.62/s
Iteration 320	total itr.rate 35.27/s	recent itr.rate 49.19/s	opt.step 1220	total opt.rate 294.45/s	recent opt.rate 695.87/s
Iteration 330	total itr.rate 35.55/s	recent itr.rate 47.71/s	opt.step 1230	total opt.rate 299.72/s	recent opt.rate 701.22/s
Iteration 340	total itr.rate 35.84/s	recent itr.rate 49.10/s	opt.step 1240	total opt.rate 305.25/s	recent opt.rate 780.17/s
Iteration 350	total itr.rate 36.08/s	recent itr.rate 46.40/s	opt.step 1250	total opt.rate 309.78/s	recent opt.rate 625.71/s
Iteration 360	total itr.rate 36.31/s	recent itr.rate 46.92/s	opt.step 1260	total opt.rate 314.40/s	recent opt.rate 658.33/s
Iteration 370	total itr.rate 36.56/s	recent itr.rate 48.46/s	opt.step 1270	total opt.rate 318.97/s	recent opt.rate 669.18/s
Iteration 380	total itr.rate 36.78/s	recent itr.rate 47.49/s	opt.step 1280	total opt.rate 323.71/s	recent opt.rate 717.86/s
Iteratio

 20%|██        | 20/100 [00:00<00:00, 196.31it/s]

Iteration 400	total itr.rate 37.22/s	recent itr.rate 48.31/s	opt.step 1300	total opt.rate 329.92/s	recent opt.rate 573.24/s


100%|██████████| 100/100 [00:00<00:00, 241.21it/s]


Iteration 410	total itr.rate 35.12/s	recent itr.rate 10.76/s	opt.step 1310	total opt.rate 334.40/s	recent opt.rate 732.61/s
Iteration 420	total itr.rate 35.34/s	recent itr.rate 48.11/s	opt.step 1320	total opt.rate 338.70/s	recent opt.rate 716.14/s
Iteration 430	total itr.rate 35.57/s	recent itr.rate 48.42/s	opt.step 1330	total opt.rate 342.59/s	recent opt.rate 661.55/s
Iteration 440	total itr.rate 35.80/s	recent itr.rate 50.21/s	opt.step 1340	total opt.rate 346.78/s	recent opt.rate 732.64/s
Iteration 450	total itr.rate 36.01/s	recent itr.rate 47.91/s	opt.step 1350	total opt.rate 350.71/s	recent opt.rate 699.44/s


 44%|████▍     | 44/100 [00:00<00:00, 438.82it/s]

Iteration 450	total itr.rate 35.21/s	recent itr.rate 0.00/s	opt.step 1350	total opt.rate 308.71/s	recent opt.rate 0.00/s


100%|██████████| 100/100 [00:00<00:00, 458.09it/s]


Tasks execution time summary:
print:	0.0426 (sec)
std_tboard:	0.1860 (sec)
lml_tboard:	1.5571 (sec)
custom_tboard:	1.4538 (sec)
saver:	4.0203 (sec)
sleep:	4.5388 (sec)


Now let's compute the log likelihood again. Hopefully we will see an increase in its value.

In [10]:
print('LML after the optimisation: %f' % m.compute_log_likelihood())

LML after the optimisation: -13787.010035
