# Xmas Bayesian Optimisation Guessing Game
This notebook implements an Xmas-themed guessing game in which you try to find the maximum of a black-box function.

In [95]:
import ipywidgets as widgets
from IPython.display import display
import numpy as np
from scipy.interpolate import interp1d
import pandas as pd
import altair as alt
import gpflow
import blackbox as bb
from importlib import reload
reload(bb);

## The scenario
It is Xmas Eve and Santa has to deliver his presents to all the kids around the world.
Unfortunately it has been a really cold winter and some houses are completely blanketed by snow.
Santa has a shovel and could dig down to the chimney of each house to deliver his presents if only he knew where the chimneys were.

Fortunately Rudolph can help.
He can measure the depth of the snow at any point with his antlers, but each measurement takes some time and the measurements are noisy.

Santa has only one night to deliver his presents. How can he find the chimneys in time?

Let's examine what a typical house's skyline looks like.
`x` represents the location and `depth` measures the depth below the snowline.

In [103]:
x, depth = bb.random_submerged_skyline(100)
house = pd.DataFrame(dict(x=x, depth=depth))
house_chart = (
    alt.Chart(house)
    .mark_line()
    .encode(x='x', y='depth'))
house_chart

One strategy to find the chimney is to lay down a grid and ask Rudolph to measure the depth at each grid point.

Let's fix the standard deviation of Rudolph's measurement error and choose the number of grid points:

In [112]:
error_sd = widgets.FloatSlider(
    value=.03,
    min=0.02,
    max=.1,
    step=0.01,
    description='error sd:',
    readout_format='.2f',
)
npoints = widgets.IntSlider(
    value=20,
    min=3,
    max=100,
    step=1,
    description='grid size:',
    readout_format='d'
)
display(error_sd)
display(npoints)

FloatSlider(value=0.03, description='error sd:', max=0.1, min=0.02, step=0.01)

IntSlider(value=20, description='grid size:', min=3)

Rudolph makes his measurements.

In [116]:
measurements = pd.DataFrame(dict(x=np.linspace(house['x'].min(), house['x'].max(), npoints.value)))
f = interp1d(house['x'], house['depth'])
measurements['f'] = f(measurements['x'])
measurements['y'] = measurements['f'] + bb.rng.normal(0, error_sd.value, size=npoints.value)
measurements_chart = (
    alt.Chart(measurements)
    .mark_point(color='red')
    .encode(x='x', y='y'))
measurements_chart + house_chart

We can see that many measurements are wasted on locations where we can already be reasonably sure the maximum does not lie.
Even with all these measurements, it is still difficult to be sure where the chimney is.
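
To make the waste concrete, here is a minimal sketch that does not use `bb` at all. It assumes a hypothetical `toy_signal` with a single narrow peak at x = 0.3, measures it on an even grid with Gaussian noise, and counts how many readings fall far below the best one. The 4-sd threshold is an arbitrary choice for illustration.

```python
import numpy as np

def toy_signal(x):
    """Hypothetical stand-in for the quantity to maximise: a narrow peak at x = 0.3."""
    return np.exp(-((x - 0.3) / 0.05) ** 2)

rng = np.random.default_rng(0)
error_sd = 0.03   # same value as the slider default above
n_points = 20     # same value as the grid-size default above

# Noisy measurements on an evenly spaced grid
xs = np.linspace(0.0, 1.0, n_points)
ys = toy_signal(xs) + rng.normal(0.0, error_sd, size=n_points)

# The grid's best guess can only ever be a grid point
best_x = xs[np.argmax(ys)]
spacing = xs[1] - xs[0]

# Readings more than 4 sd below the best reading are very unlikely to sit at the peak
n_wasted = int(np.sum(ys < ys.max() - 4 * error_sd))

print(f"best grid point: {best_x:.3f} (grid spacing {spacing:.3f})")
print(f"{n_wasted} of {n_points} readings are clearly below the best one")
```

In this toy setting nearly every reading lands far from the peak, and the best guess is only as precise as the grid spacing allows.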

How can we do better?

## Bayesian optimisation
This is where Bayesian optimisation comes in.

Bayesian optimisation (BO) requires:
- a prior on the underlying function to be optimised (the house depth)
- a prior on the measurement error (using Rudolph's antlers)
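
As a rough illustration of these two priors, the sketch below draws a few functions from a zero-mean Gaussian process prior and adds Gaussian measurement noise to them. The squared-exponential kernel, its parameter values, the noise level and the helper name `rbf_kernel` are all assumptions made for this example; they are not necessarily what `blackbox` or any particular optimiser uses.

```python
import numpy as np

def rbf_kernel(a, b, variance=1.0, lengthscale=0.4):
    """Squared-exponential covariance between 1-D point sets a and b (assumed kernel)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

rng = np.random.default_rng(1)
xs = np.linspace(-1.0, 1.0, 50)

# Prior on the function: f ~ N(0, K), sampled via a Cholesky factor of K.
# A small jitter on the diagonal keeps the factorisation numerically stable.
cov = rbf_kernel(xs, xs)
chol = np.linalg.cholesky(cov + 1e-6 * np.eye(xs.size))
f_samples = (chol @ rng.standard_normal((xs.size, 3))).T   # three prior draws

# Prior on the measurement error: independent Gaussian noise on each reading
noise_sd = 0.03
y_samples = f_samples + rng.normal(0.0, noise_sd, size=f_samples.shape)

print(f_samples.shape, y_samples.shape)
```

Each row of `f_samples` is one plausible underlying function under the prior, and the matching row of `y_samples` is what noisy measurements of it might look like.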

Given these priors, Bayesian optimisation then repeats the following loop:
- it chooses the next measurement location by maximising an acquisition function, which scores candidate locations with the aim of keeping the total number of measurements small
- it asks the black-box function for a noisy measurement at that location
- it updates its model of the underlying function and measurement error (computes a posterior given the prior and the data)

Once a stopping criterion is met (for example, a fixed budget of measurements), Bayesian optimisation returns the posterior over the underlying function for you to use as you see fit.
In particular you may wish to obtain a posterior over the location of the maximum of the function.
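
The loop above can be sketched in plain NumPy. This is a toy version under stated assumptions, not the implementation used later in this notebook: the objective `toy_black_box` is hypothetical, the model is a zero-mean Gaussian process with a fixed squared-exponential kernel, the acquisition function is an upper confidence bound (one common choice among several), and the stopping criterion is a fixed budget of 15 measurements.

```python
import numpy as np

def rbf_kernel(a, b, variance=1.0, lengthscale=0.2):
    """Squared-exponential covariance between 1-D point sets a and b (assumed kernel)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_obs, y_obs, x_new, noise_sd):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise_sd**2 * np.eye(x_obs.size)
    k_on = rbf_kernel(x_obs, x_new)
    solved = np.linalg.solve(k_oo, k_on)            # K_oo^{-1} K_on
    mean = solved.T @ y_obs
    var = rbf_kernel(x_new, x_new).diagonal() - np.sum(k_on * solved, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def toy_black_box(x, noise_sd, rng):
    """Hypothetical noisy objective whose maximum is at x = 0.3."""
    return np.exp(-((x - 0.3) / 0.2) ** 2) + rng.normal(0.0, noise_sd, size=np.shape(x))

rng = np.random.default_rng(0)
noise_sd = 0.03
candidates = np.linspace(-1.0, 1.0, 201)

# Start from three evenly spread measurements
x_obs = np.array([-1.0, 0.0, 1.0])
y_obs = toy_black_box(x_obs, noise_sd, rng)

for _ in range(15):                                  # fixed measurement budget
    mean, sd = gp_posterior(x_obs, y_obs, candidates, noise_sd)
    ucb = mean + 2.0 * sd                            # acquisition: upper confidence bound
    x_next = candidates[np.argmax(ucb)]              # choose the next location
    y_next = toy_black_box(np.array([x_next]), noise_sd, rng)  # noisy measurement
    x_obs = np.append(x_obs, x_next)                 # update the data; the posterior
    y_obs = np.append(y_obs, y_next)                 # is recomputed on the next pass

mean, sd = gp_posterior(x_obs, y_obs, candidates, noise_sd)
x_best = candidates[np.argmax(mean)]
print(f"location of the posterior-mean maximum: {x_best:.2f}")
```

Here `x_best` is only a point summary, the maximiser of the posterior mean. A full posterior over the location of the maximum would need samples drawn from the joint GP posterior, which this sketch leaves out.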

How does this work in practice?
We create a black-box function that simulates Rudolph's measurements and use a standard Bayesian optimisation implementation.

In [None]:
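def house_bb(x):
    """Simulate Rudolph: the true depth at x plus Gaussian measurement noise."""
    # The noise standard deviation is read from the slider each time the function is called.
    return f(x) + bb.rng.normal(np.zeros_like(x), error_sd.value)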

from dataclasses import astuple

from gpflow.utilities import print_summary, set_trainable
import tensorflow as tf

import trieste
from trieste.bayesian_optimizer import OptimizationResult
from trieste.utils.objectives import branin, mk_observer
from trieste.acquisition.rule import OBJECTIVE

from util.plotting_plotly import plot_function_plotly, plot_gp_plotly, add_bo_points_plotly
from util.plotting import plot_function_2d, plot_bo_points, plot_regret

## 1-dimensional input
Create a scalar-valued black-box function of a 1-dimensional input and a widget to control the point at which to evaluate it.

In [47]:
blackbox1 = bb.GPBlackBox(ndim=1)
x0 = widgets.FloatSlider(
    value=0,
    min=bb.DOMAIN_MIN,
    max=bb.DOMAIN_MAX,
    step=0.01,
    description='x0:',
    readout_format='.2f',
)

Choose which input point (x0) to evaluate the function at:

In [31]:
display(x0)

FloatSlider(value=0.0, description='x0:', max=1.0, min=-1.0, step=0.01)

Evaluate the function and plot all the evaluations so far:

In [36]:
y = blackbox1([x0.value])[0][0]
print(f'Evaluated black box at {x0.value}; result={y}')
blackbox1.plot_xy()

Evaluated black box at 0.2; result=-0.3298341970613642


Once you have finished evaluating the function at different points and are confident where the maximum is, make a guess before executing the cells below.

Now show the function as a line and the noisy data we received as evaluations of it:

In [48]:
f1 = blackbox1.sample_f(100)
chart1f = (
    alt.Chart(f1)
    .mark_line()
    .encode(x='x0', y='f'))
chart1y = blackbox1.plot_xy()
chart1 = alt.layer(chart1y, chart1f)
chart1

## 2-dimensional input
Create a scalar-valued black-box function of a 2-dimensional input and widgets to control the point at which to evaluate it.

In [38]:
blackbox2 = bb.GPBlackBox(ndim=2)
x0 = widgets.FloatSlider(
    value=0,
    min=bb.DOMAIN_MIN,
    max=bb.DOMAIN_MAX,
    step=0.01,
    description='x0:',
    readout_format='.2f',
)
x1 = widgets.FloatSlider(
    value=0,
    min=bb.DOMAIN_MIN,
    max=bb.DOMAIN_MAX,
    step=0.01,
    description='x1:',
    readout_format='.2f',
)
w = widgets.Box([x0, x1])

Choose which input point (x0, x1) to evaluate the function at:

In [39]:
display(w)

Box(children=(FloatSlider(value=0.0, description='x0:', max=1.0, min=-1.0, step=0.01), FloatSlider(value=0.0, …

Evaluate the function and plot all the evaluations so far:

In [43]:
y = blackbox2([x0.value, x1.value])[0][0]
print(f'Evaluated black box at ({x0.value}, {x1.value}); result={y}')
blackbox2.plot_xy()

Evaluated black box at (-0.45, 0.49); result=0.42793430293644613


Once you have finished evaluating the function at different points and are confident where the maximum is, make a guess before executing the cells below.

Now show the underlying function f (without noise) as a heatmap and the noisy data we received as evaluations of it:

In [44]:
f2 = blackbox2.sample_f(45)
chart2f = (
    alt.Chart(f2)
    .mark_square(size=30)
    .encode(x=alt.X('x0:Q', scale=alt.Scale(domain=bb.DOMAIN)),
            y=alt.Y('x1:Q', scale=alt.Scale(domain=bb.DOMAIN)),
            color=alt.Color('f:Q', scale=alt.Scale(scheme=bb.COLOURSCHEME, domainMid=0))))
chart2y = blackbox2.plot_xy()
chart2 = alt.layer(chart2f, chart2y)
chart2

## Gaussian process model underlying our black boxes

In [45]:
gpflow.utilities.print_summary(blackbox1.model)

╒════════════════════════════════════╤═══════════╤══════════════════╤═════════╤═════════════╤═════════╤═════════╤═════════╕
│ name                               │ class     │ transform        │ prior   │ trainable   │ shape   │ dtype   │   value │
╞════════════════════════════════════╪═══════════╪══════════════════╪═════════╪═════════════╪═════════╪═════════╪═════════╡
│ GPR.kernel.kernels[0].variance     │ Parameter │ Softplus         │         │ True        │ ()      │ float64 │    1    │
├────────────────────────────────────┼───────────┼──────────────────┼─────────┼─────────────┼─────────┼─────────┼─────────┤
│ GPR.kernel.kernels[0].lengthscales │ Parameter │ Softplus         │         │ True        │ ()      │ float64 │    0.4  │
├────────────────────────────────────┼───────────┼──────────────────┼─────────┼─────────────┼─────────┼─────────┼─────────┤
│ GPR.kernel.kernels[1].variance     │ Parameter │ Softplus         │         │ True        │ ()      │ float64 │    0.16 │
├───────

In [46]:
gpflow.utilities.print_summary(blackbox2.model)

╒════════════════════════════════════╤═══════════╤══════════════════╤═════════╤═════════════╤═════════╤═════════╤═════════╕
│ name                               │ class     │ transform        │ prior   │ trainable   │ shape   │ dtype   │   value │
╞════════════════════════════════════╪═══════════╪══════════════════╪═════════╪═════════════╪═════════╪═════════╪═════════╡
│ GPR.kernel.kernels[0].variance     │ Parameter │ Softplus         │         │ True        │ ()      │ float64 │    1    │
├────────────────────────────────────┼───────────┼──────────────────┼─────────┼─────────────┼─────────┼─────────┼─────────┤
│ GPR.kernel.kernels[0].lengthscales │ Parameter │ Softplus         │         │ True        │ ()      │ float64 │    0.4  │
├────────────────────────────────────┼───────────┼──────────────────┼─────────┼─────────────┼─────────┼─────────┼─────────┤
│ GPR.kernel.kernels[1].variance     │ Parameter │ Softplus         │         │ True        │ ()      │ float64 │    0.16 │
├───────

## Bayesian optimisation

Use Bayesian optimisation to find a maximum.

In [23]:
blackbox2.x.shape

(6, 2)

In [19]:
f2

            x0   x1         f
0    -1.000000 -1.0 -0.805729
1    -0.954545 -1.0 -0.777584
2    -0.909091 -1.0 -0.668566
3    -0.863636 -1.0 -0.586156
4    -0.818182 -1.0 -0.527920
...        ...  ...       ...
2020  0.818182  1.0  0.905606
2021  0.863636  1.0  1.045910
2022  0.909091  1.0  1.196871
2023  0.954545  1.0  1.296711


In [24]:
x2 = np.concatenate((blackbox2.x, np.array([f2['x0'], f2['x1']]).T))
x2

array([[0.        , 0.        ],
       [0.        , 0.        ],
       [0.49      , 0.49      ],
       ...,
       [0.90909091, 1.        ],
       [0.95454545, 1.        ],
       [1.        , 1.        ]])

In [26]:
y2 = np.array([blackbox2.y, np.zeros_like(blackbox2.y)]).T

In [None]:
gpflow.models.VGP(data, kernel, likelihood)