## Gaussian Process Regression

### Gaussian Process
* A random process that assigns a random variable $\large \mathbb{f}(𝑥)$ to every point $\large 𝑥∈\mathbb{R}^𝑑$
* The joint distribution of any finite number of these variables is Gaussian:

$$\large p(\mathbb{f} \mid X)=\mathcal{N}(\mathbb{f} \mid \mu, K)$$ where
$$ \mathbb{f} = (\mathbb{f}(x_1), \ldots, \mathbb{f}(x_N)) $$
$$ \mu = (m(x_1), \ldots, m(x_N)) $$ with $m$ the mean function
$$ K_{ij} = \kappa(x_i, x_j) $$ where $\kappa$ is a positive semi-definite (PSD) kernel function
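
To make the definition concrete, here is a minimal sketch of drawing realizations from a zero-mean GP on a finite grid. The RBF kernel, the grid, the helper name `rbf_kernel`, and the small diagonal `jitter` (added only for numerical stability) are illustrative assumptions, not part of the definition above.

```python
import numpy as np

def rbf_kernel(xa, xb, sigma=1.0, length=1.0):
    """Squared-exponential kernel; returns a matrix of shape (len(xa), len(xb))."""
    diff = xa[:, np.newaxis] - xb[np.newaxis, :]
    return sigma ** 2 * np.exp(-0.5 * (diff / length) ** 2)

rng = np.random.default_rng(0)
x_grid = np.linspace(-1.0, 2.0, 50)   # the finite set of inputs X
mu = np.zeros(len(x_grid))            # zero mean function, m(x) = 0
K = rbf_kernel(x_grid, x_grid)        # K_ij = kappa(x_i, x_j)

# A tiny diagonal "jitter" keeps the matrix numerically positive definite.
jitter = 1e-8 * np.eye(len(x_grid))
L = np.linalg.cholesky(K + jitter)

# If z ~ N(0, I) then mu + L z ~ N(mu, K); each column is one realization of f.
z = rng.standard_normal((len(x_grid), 3))
f_samples = mu[:, np.newaxis] + L @ z
```

Each column of `f_samples` is one smooth random function evaluated on the grid; a shorter `length` would give wigglier draws.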

### Gaussian Process Regression
* The joint distribution of the function values at the training inputs, $\large \mathbb{f}$, and at the test inputs, $\large \mathbb{f}_*$, is Gaussian (assuming a zero mean function):
$$
\begin{pmatrix} \large \mathbb{f} \\ \large \mathbb{f}_* \end{pmatrix} \sim N\Bigg( \large 0, \begin{pmatrix} K & K_* \\ K_*^T & K_{**} \end{pmatrix} \Bigg)
$$
where $K = \kappa(X, X)$, $K_* = \kappa(X, X_*)$ and $K_{**}=\kappa(X_*, X_*)$

* Posterior/predictive distribution for noisy observations $\large y=\mathbb{f}+\epsilon$ with $\large \epsilon \sim N(0, \sigma_y^2 \mathbb{I})$ is given by

$$ \large p(\mathbb{f}_* \mid X_*, X, y) = N(\mu_*, \Sigma_*)$$
where 
$$\large \mu_* = K_*^T (K+\sigma_y^2 \mathbb{I})^{-1} y$$
$$\large \Sigma_* = K_{**} - K_*^T (K+\sigma_y^2 \mathbb{I})^{-1} K_*$$

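* The regression line is the mean of the posterior distribution, $\large\mu_*$
* The diagonal entries of the covariance matrix $\large \Sigma_*$ are the pointwise posterior variances; their square roots give the standard deviations used for confidence bands (for example $\pm 2$ standard deviations) around the regression line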
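
The posterior formulas can be checked on a tiny example. The sketch below assumes an RBF kernel, three made-up training points and a noise level of 0.1. It uses the shape convention $K_* = \kappa(X, X_*)$ with one row per training point, which is why the transpose $K_*^T$ appears in the mean and covariance. The helper name `rbf_kernel` is hypothetical.

```python
import numpy as np

def rbf_kernel(xa, xb, sigma=1.0, length=1.0):
    """Squared-exponential kernel; returns a matrix of shape (len(xa), len(xb))."""
    diff = xa[:, np.newaxis] - xb[np.newaxis, :]
    return sigma ** 2 * np.exp(-0.5 * (diff / length) ** 2)

X = np.array([-0.5, 0.3, 1.2])        # training inputs (made up)
y = np.sin(3 * X)                     # hypothetical observations
X_star = np.linspace(-1.0, 2.0, 7)    # test inputs
sigma_y = 0.1                         # assumed observation noise

K = rbf_kernel(X, X)                         # shape (N, N)
K_star = rbf_kernel(X, X_star)               # shape (N, N_*)
K_star_star = rbf_kernel(X_star, X_star)     # shape (N_*, N_*)

# Posterior mean and covariance of f_* given the noisy observations y.
A = K + sigma_y ** 2 * np.eye(len(X))
mu_star = K_star.T @ np.linalg.solve(A, y)
Sigma_star = K_star_star - K_star.T @ np.linalg.solve(A, K_star)

# Pointwise +/- 2 standard deviation band around the regression line.
std_star = np.sqrt(np.clip(np.diag(Sigma_star), 0.0, None))
lower, upper = mu_star - 2 * std_star, mu_star + 2 * std_star
```

The clip guards against tiny negative variances caused by round-off. The posterior variance never exceeds the prior variance (here 1), and it is smallest at test points close to a training input.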

## Gaussian Process Regression Dashboard
The dashboard below helps build intuition for GP regression:
* Select a kernel (and set its hyperparameters) from the kernel dropdown
* The `Display 5 Priors?` checkbox shows/hides 5 realizations from the prior distribution
* The ground truth (the function GPR is trying to learn) is shown as a white dash-dotted line. Add training samples by clicking anywhere on the figure, or move existing points by dragging them
* The `Display 5 Posteriors?` checkbox shows/hides 5 realizations from the posterior distribution
* The `Display Std Bands?` checkbox shows/hides bands of ±2 standard deviations around the posterior mean (the regression line)
* The $\sigma_{noise}$ slider controls the assumed noise level around the training samples
* Add a few points close to the white line at different places to see the regression line and the confidence intervals update in real time!

In [None]:
import inspect
from collections import OrderedDict
import numpy as np

import ipywidgets as w
import bqplot.pyplot as plt
from bqplot import *

In [None]:
# kernels
def rbf(x1, x2, sigma=1., l=1.):
    z = (x1 - x2[:, np.newaxis]) / l
    return sigma**2 * np.exp(-.5 * z ** 2)


def linear(x1, x2, sigma=.2, sigma_b=.04, c=0.):
    return sigma_b ** 2 + sigma ** 2 * (x1 - c) * (x2[:, np.newaxis] - c)
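
A quick sanity check on the kernel convention used in this notebook: because `x2` is broadcast along rows, `kernel(x1, x2)` returns a matrix of shape `(len(x2), len(x1))`. The snippet below copies `rbf` from the cell above so it runs on its own; the input arrays are arbitrary examples.

```python
import numpy as np

def rbf(x1, x2, sigma=1., l=1.):
    # copied from the notebook cell so this check is self-contained
    z = (x1 - x2[:, np.newaxis]) / l
    return sigma**2 * np.exp(-.5 * z ** 2)

x_a = np.linspace(0., 1., 3)
x_b = np.linspace(0., 1., 4)

# rows follow the second argument, columns the first
cross = rbf(x_a, x_b)
cross_shape = cross.shape

# on a single set of points the kernel matrix is symmetric and PSD
gram = rbf(x_b, x_b)
min_eig = np.linalg.eigvalsh(gram).min()
```

Here `cross_shape` is `(4, 3)`, and `min_eig` is non-negative up to floating-point round-off, as expected for a PSD kernel.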

In [None]:
def gp_regression(X_train, y_train, X_test,
                  kernel=rbf,
                  sigma_noise=.1,
                  kernel_params=None):
    # avoid a mutable default argument; fall back to the kernel's own defaults
    kernel_params = kernel_params or {}

    # kernel matrices; note that kernel(a, b) has shape (len(b), len(a))
    K = kernel(X_train, X_train, **kernel_params)    # (n, n)
    K_s = kernel(X_train, X_test, **kernel_params)   # (p, n)
    K_ss = kernel(X_test, X_test, **kernel_params)   # (p, p)
    
    n, p = len(X_train), len(X_test)
    
    # posterior mean and cov of f_* given the noisy training targets
    A = K + sigma_noise**2 * np.eye(n)
    mu_s = np.dot(K_s, np.linalg.solve(A, y_train))
    cov_s = K_ss - np.dot(K_s, np.linalg.solve(A, K_s.T))
    if np.any(np.diag(cov_s) < 0):
        print('warning: negative posterior variances (numerical round-off)')
    
    # prior and posterior moments
    mu_prior, cov_prior = np.zeros(p), K_ss
    # predictive covariance of noisy targets: noise variance goes on the diagonal only
    mu_post, cov_post = mu_s, cov_s + sigma_noise**2 * np.eye(p)
    
    return dict(prior=(mu_prior, cov_prior), 
                posterior=(mu_post, cov_post))
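
`gp_regression` solves against the same matrix twice with `np.linalg.solve`. A common alternative, which is not what this notebook uses, is to factor the matrix once with a Cholesky decomposition and reuse the factor. The sketch below assumes `sigma_noise > 0` so that the matrix is positive definite; `gp_posterior_cholesky` is a hypothetical helper name, and `rbf` is copied from the notebook so the block runs on its own.

```python
import numpy as np

def rbf(x1, x2, sigma=1., l=1.):
    # same convention as the notebook kernel: shape (len(x2), len(x1))
    z = (x1 - x2[:, np.newaxis]) / l
    return sigma**2 * np.exp(-.5 * z ** 2)

def gp_posterior_cholesky(X_train, y_train, X_test, sigma_noise=.1):
    """Posterior mean and covariance of f_* using one Cholesky factorization."""
    K = rbf(X_train, X_train)
    K_s = rbf(X_train, X_test)       # shape (p, n)
    K_ss = rbf(X_test, X_test)
    A = K + sigma_noise**2 * np.eye(len(X_train))
    L = np.linalg.cholesky(A)        # A = L L^T, requires A positive definite
    # alpha = A^{-1} y via two solves against the factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    # v = L^{-1} K_s^T, so that v^T v = K_s A^{-1} K_s^T
    v = np.linalg.solve(L, K_s.T)
    return K_s @ alpha, K_ss - v.T @ v

X_train = np.array([-0.5, 0.3, 1.2])
y_train = np.sin(3 * X_train)        # made-up targets
X_test = np.linspace(-1., 2., 5)
mu, cov = gp_posterior_cholesky(X_train, y_train, X_test)
```

Writing the covariance as `K_ss - v.T @ v` keeps it symmetric by construction. `np.linalg.solve` is a general solver and does not exploit the triangular structure of `L`; a triangular solver such as `scipy.linalg.solve_triangular` would, if SciPy is available.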

In [None]:
WIDGETS_MAP = {float: w.FloatText, int: w.IntText, bool: w.Checkbox}
textbox_layout = w.Layout(width="180px")

class KeywordArgsWidget(w.Box):
    """
    automatic keyword args UI for an object (class or func)
    """
    def __init__(self, obj, orientation="horizontal"):
        self.obj = obj
        self.orientation = orientation
        self.widgets_layout = w.Box()

        self.param_wids = {}
        if self.orientation == "horizontal":
            self.widgets_layout = w.HBox()
        else:
            self.widgets_layout = w.VBox()
        self._build_widgets()
        super(KeywordArgsWidget, self).__init__(children=[self.widgets_layout])
    
    def _build_widgets(self):
        if self.obj:
            params = inspect.signature(self.obj).parameters

            self.widgets = OrderedDict({param_name: 
                WIDGETS_MAP.get(type(param.default), w.FloatText)(
                    description=f"$\\{param_name}$" if param_name.startswith('sigma') else f"${param_name}$", 
                    layout=textbox_layout,
                    value=param.default
                )
                for param_name, param in params.items()
                if param.default is not inspect.Parameter.empty})
            self.widgets_layout.children = list(self.widgets.values())
    
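    def get_param_values(self):
        # keep zero-valued params too; a truthiness filter would silently
        # drop them and the kernel would fall back to its defaults
        params = {k: v.value for k, v in self.widgets.items()}
        return params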
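
The core of `KeywordArgsWidget` is reading keyword defaults from a function signature. Here is a widget-free sketch of that step using only the standard library. `keyword_defaults` is a hypothetical helper name, and `linear` is a stand-in with the same signature as the kernel defined earlier.

```python
import inspect

def keyword_defaults(func):
    """Return {name: default} for every parameter of func that has a default."""
    params = inspect.signature(func).parameters
    return {name: p.default for name, p in params.items()
            if p.default is not inspect.Parameter.empty}

def linear(x1, x2, sigma=.2, sigma_b=.04, c=0.):
    # stand-in with the same signature as the notebook's linear kernel
    return sigma_b ** 2 + sigma ** 2 * (x1 - c) * (x2 - c)

defaults = keyword_defaults(linear)

# the type of each default decides which widget class would be used
default_types = {name: type(value) for name, value in defaults.items()}
```

Positional parameters such as `x1` and `x2` have no default and are skipped, so only the kernel hyperparameters get a widget. Signature order is preserved, which is why the widgets appear in the same order as the keyword arguments.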

In [None]:
xmin, xmax = -1, 2
kernel = rbf
params = dict(sigma=1., l=1.)

X_test = np.arange(xmin, xmax, .05)
p = len(X_test)
K_ss = kernel(X_test, X_test, **params)
mu_prior, cov_prior = np.zeros(p), K_ss

N = 5
f_priors = np.random.multivariate_normal(mu_prior, cov_prior, N)

In [None]:
# raw strings keep the LaTeX backslashes intact (no invalid escape sequences)
KERNEL_METADATA = {
    'RBF': dict(func=rbf,
                equation=r"$\kappa(x_1, x_2) = \sigma^2 \exp(-\frac{(x_1 - x_2)^2}{2l^2})$",
                args_widget=KeywordArgsWidget(rbf)),
    'Linear': dict(name='Linear',
                   func=linear,
                   equation=r"$\kappa(x_1, x_2) = \sigma_b^2 + \sigma^2(x_1 - c)(x_2 - c)$",
                   args_widget=KeywordArgsWidget(linear))
}

In [None]:
# kernel controls
# ipywidgets is imported as `w` above, so use that alias consistently
kernel_dropdown = w.Dropdown(description='Kernel', options=KERNEL_METADATA)
equation_label = w.Label()
equation_label.layout.margin = '0px 0px 0px 50px'
param_widget_placeholder = w.Box()

fig_margin=dict(top=60, bottom=40, left=50, right=0)

fig = plt.figure(title='Gaussian Process Regression', 
                 layout=w.Layout(width='1200px', height='700px'),
                 animation_duration=750,
                 fig_margin=fig_margin)

plt.scales(scales={'x': LinearScale(min=xmin, max=xmax),
                   'y': LinearScale(min=-2, max=2)})

# ground truth line
y = -np.sin(3 * X_test) - X_test ** 2 + .3 * X_test + .5
f_line = plt.plot(X_test, y, colors=['white'], line_style='dash_dotted')
std_bands = plt.plot(X_test, [],
                     fill='between',
                     fill_colors=['yellow'],
                     apply_clip=False,
                     fill_opacities=[.2], stroke_width=0)

train_scat = plt.scatter([], [], colors=['magenta'], 
                         enable_move=True,
                         interactions={'click': 'add'},
                         marker_size=1, marker='square')

prior_lines = plt.plot(X_test, f_priors, stroke_width=1, 
                       colors=['#ccc'], apply_clip=False)
posterior_lines = plt.plot(X_test, [], stroke_width=1, apply_clip=False)

mean_line = plt.plot(X_test, [], 'm')

plt.xlabel('X')
plt.ylabel('Y')

# reset btn
reset_button = w.Button(description='Reset Points', button_style='success')
reset_button.layout.margin = '20px 0px 0px 70px'

data_noise_slider = w.FloatSlider(description=r'$\sigma_{noise}$', value=0, step=.01, max=1)

# controls for the plot
f_priors_cb = w.Checkbox(description='Display 5 Priors?')
f_posteriors_cb = w.Checkbox(description='Display 5 Posteriors?')
std_bands_cb = w.Checkbox(description='Display Std Bands?')
check_boxes = [f_priors_cb, f_posteriors_cb, std_bands_cb]

label = w.Label('*Click on the figure to add training samples')
controls = w.VBox(check_boxes + [reset_button, label, data_noise_slider])

# link widgets
_ = w.jslink((f_priors_cb, 'value'), (prior_lines, 'visible'))
_ = w.jslink((f_posteriors_cb, 'value'), (posterior_lines, 'visible'))
_ = w.jslink((std_bands_cb, 'value'), (std_bands, 'visible'))

def update_reg_line(change):    
    X_train = train_scat.x
    y_train = train_scat.y
    
    kernel_metadata = kernel_dropdown.value

    gp_res = gp_regression(X_train, y_train, X_test,
                           sigma_noise=data_noise_slider.value,
                           kernel=kernel_metadata['func'],
                           kernel_params=kernel_metadata['args_widget'].get_param_values())
    mu_post, cov_post = gp_res['posterior']
    
    # simulate N samples from the posterior distribution
    posterior_lines.y = np.random.multivariate_normal(mu_post, cov_post, N)
    sig_post = np.sqrt(np.diag(cov_post))

    # update the regression line to the mean of the posterior distribution
    mean_line.y = mu_post
    
    # update the std bands to +/- 2 sigmas from the posterior mean
    std_bands.y = [mu_post - 2 * sig_post, mu_post + 2 * sig_post]

train_scat.observe(update_reg_line, names=['x', 'y'])
# only react to value changes, not to every trait of the slider
data_noise_slider.observe(update_reg_line, names='value')

# redraw plots whenever kernel params are updated
for kernel_metadata in KERNEL_METADATA.values():
    args_widget = kernel_metadata['args_widget']
    for widget in args_widget.widgets.values():
        widget.observe(update_reg_line, names='value')

def reset_points(*args):
    with train_scat.hold_trait_notifications():
        train_scat.x = []
        train_scat.y = []
reset_button.on_click(lambda btn: reset_points())

def update_kernel_params(*args):
    kernel_metadata = kernel_dropdown.value
    param_widget_placeholder.children = [kernel_metadata['args_widget']]
    equation_label.value = kernel_metadata['equation']
    update_reg_line(None)

# observe only `value`; otherwise one selection fires several callbacks
kernel_dropdown.observe(update_kernel_params, names='value')

fig.on_displayed(update_reg_line)
kernel_controls = w.HBox([kernel_dropdown, param_widget_placeholder, equation_label])
update_kernel_params(None)
w.VBox([kernel_controls, w.HBox([fig, controls])])