In [1]:
# --- sampling
from src.gp_sampling import *
from src.gp_regression import *
# --- style
# If the figures do not display well in your browser, adjust the following settings
rcParams['font.size'] = 12 
rcParams['figure.figsize'] = (15,8)

## Sampling from a GP
Demo explanation:
* After specifying a kernel and its hyperparameters, we would like to sample functions from our GP. There are two strategies for this:
* (1) *Cholesky decomposition*: $f = L\epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$ and $K=LL^\top$
* (2) *Forward sampling*: $p(f_1, \dots, f_N) = \prod_{i=1}^N p(f_i \mid f_{1:i-1})$
* You can sample either directly from the prior or from the posterior (conditioned on observations).
* The lengthscale hyperparameter applies only to the RBF and Matern kernels; sigma scales the linear kernel.
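
The two strategies can be sketched in plain NumPy. This is a minimal illustration and not the code behind `plot_sample()`: the helper names `rbf_kernel`, `sample_cholesky`, and `sample_forward`, the input grid, and the jitter value are all assumptions made for this example. Both functions receive the same standard-normal draw `eps`, which shows that forward sampling reproduces the Cholesky sample.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    """Squared-exponential (RBF) kernel on 1-D inputs."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def sample_cholesky(K, eps):
    """Strategy (1): f = L @ eps with K = L L^T and eps ~ N(0, I)."""
    L = np.linalg.cholesky(K)
    return L @ eps

def sample_forward(K, eps):
    """Strategy (2): draw f_i from p(f_i | f_{1:i-1}) one point at a time."""
    n = len(K)
    f = np.zeros(n)
    for i in range(n):
        if i == 0:
            mean, var = 0.0, K[0, 0]
        else:
            # Gaussian conditioning on the values sampled so far
            w = np.linalg.solve(K[:i, :i], K[:i, i])
            mean = w @ f[:i]
            var = K[i, i] - w @ K[:i, i]
        f[i] = mean + np.sqrt(max(var, 0.0)) * eps[i]
    return f

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 10)                     # assumed input grid
K = rbf_kernel(x, x, lengthscale=1.0) + 1e-8 * np.eye(len(x))  # small jitter for stability
eps = rng.standard_normal(len(x))

f_chol = sample_cholesky(K, eps)
f_fwd = sample_forward(K, eps)
print(np.max(np.abs(f_chol - f_fwd)))              # difference is at the level of round-off
```

This naive forward sampler solves a growing linear system at every step, so it costs more than a single Cholesky factorization. It is mainly useful for seeing the chain-rule factorization at work.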

In [2]:
plot_sample()

## GP regression with different kernel selection

Demo explanation:
* The true function is sinusoidal.
* We obtain observations with i.i.d. additive Gaussian noise.
* We fit a GP with a specified kernel or a composition of kernels. The RBF, quadratic, and Matern kernels have a *lengthscale* parameter.
* Another parameter is the output scale ([explanation](https://docs.gpytorch.ai/en/stable/examples/00_Basic_Usage/Hyperparameters.html)). It is one of the raw parameters that can be learned.
* The likelihood noise variance can be specified as a hyperparameter.
* Total uncertainty: the standard deviation of the label $y$.
* Epistemic uncertainty: comes from the model's confidence region (the standard deviation of $f$).
* Aleatoric uncertainty: the remaining part of the total uncertainty, due to observation noise. In terms of variances, $\mathrm{Var}[y] = \mathrm{Var}[f] + \sigma^2_{\text{noise}}$.
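
The uncertainty decomposition above can be reproduced with a small exact-GP sketch in NumPy. It is not the implementation behind `plot_gp_regression()`: the helpers `rbf_kernel` and `gp_posterior`, the data range, and the hyperparameter values are assumptions chosen for illustration, and the noise variance is treated as known.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, outputscale=1.0):
    """RBF kernel on 1-D inputs, scaled by the output scale."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return outputscale * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise_var, **kernel_args):
    """Exact GP regression: posterior mean and variance of the latent f."""
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, **kernel_args)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    prior_var = np.diag(rbf_kernel(x_test, x_test, **kernel_args))
    var_f = np.maximum(prior_var - np.sum(v**2, axis=0), 0.0)
    return mean, var_f

rng = np.random.default_rng(0)
noise_var = 0.1**2                                   # assumed known noise variance
x_train = rng.uniform(-3.0, 3.0, size=25)
y_train = np.sin(x_train) + np.sqrt(noise_var) * rng.standard_normal(25)
x_test = np.linspace(-6.0, 6.0, 100)                 # extends beyond the training data

mean, var_f = gp_posterior(x_train, y_train, x_test, noise_var, lengthscale=1.0)

epistemic_std = np.sqrt(var_f)                       # standard deviation of f
aleatoric_std = np.sqrt(noise_var) * np.ones_like(var_f)  # observation noise
total_std = np.sqrt(var_f + noise_var)               # standard deviation of y
```

Far from the training inputs the posterior variance of $f$ reverts to the prior variance, while the noise term stays constant. That is why the epistemic part dominates there.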

Play around:
* Change the lengthscale to see how the complexity of the model influences the result.
* Try multiple kernel combinations and think about how to choose them based on the data.
* Think about why epistemic uncertainty dominates in regions with little data.


In [3]:
plot_gp_regression()

## Model Selection: Optimization of Hyperparameters with Type-II MLE
In this demo, the model learns the best hyperparameters by maximizing the marginal likelihood.
The second graph shows the best model with the learned hyperparameters.
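
For a Gaussian likelihood, the quantity being maximized is $\log p(y \mid X, \theta) = -\tfrac{1}{2} y^\top (K_\theta + \sigma^2 I)^{-1} y - \tfrac{1}{2} \log |K_\theta + \sigma^2 I| - \tfrac{n}{2} \log 2\pi$. The sketch below evaluates it for an RBF kernel and picks the lengthscale by a simple grid search. The grid search is a stand-in chosen to keep the example short; the demo itself presumably uses a gradient-based optimizer. The names `rbf_kernel` and `log_marginal_likelihood`, the data, and the fixed noise variance are assumptions for this example.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    """Squared-exponential (RBF) kernel on 1-D inputs."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def log_marginal_likelihood(x, y, lengthscale, noise_var):
    """log p(y | X, theta) for a zero-mean GP with Gaussian noise."""
    K = rbf_kernel(x, x, lengthscale) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    data_fit = -0.5 * y @ alpha
    complexity = -np.sum(np.log(np.diag(L)))         # equals -0.5 * log|K|
    constant = -0.5 * len(x) * np.log(2.0 * np.pi)
    return data_fit + complexity + constant

rng = np.random.default_rng(0)
noise_var = 0.1**2                                   # held fixed in this sketch
x = np.linspace(-3.0, 3.0, 25)
y = np.sin(x) + np.sqrt(noise_var) * rng.standard_normal(25)

# Type-II MLE by grid search over the lengthscale only
lengthscales = np.logspace(-1.5, 1.5, 61)
mll = np.array([log_marginal_likelihood(x, y, ls, noise_var) for ls in lengthscales])
best_lengthscale = lengthscales[np.argmax(mll)]
print(best_lengthscale)
```

The data-fit term rewards models that explain the observations, and the log-determinant term penalizes overly flexible ones, so very short and very long lengthscales both score poorly.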

In [4]:
plot_mll_kernel()

# Predicting Weather with GPs

In this demo we use GPR to solve a real-world problem.

Play around:
* Think about what makes a good prediction
* Try to find the best kernel combination for both yearly and monthly prediction


In [5]:
plot_gp_weather()

# Sparse GPs

Demo guide:
* Here we show different sparse GP (inducing-point) methods:
* (1) [Deterministic Training Conditional (DTC)](http://proceedings.mlr.press/r4/seeger03a/seeger03a.pdf)
* (2) [Subset of Regressors (SOR)](https://www.jmlr.org/papers/volume6/quinonero-candela05a/quinonero-candela05a.pdf)
* (3) [Fully Independent Training Conditional (FITC)](https://papers.nips.cc/paper/2857-sparse-gaussian-processes-using-pseudo-inputs)

Play around:
* Compare the different methods for generating inducing points, and compare their inference times
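
As a rough illustration of how inducing points enter the predictive equations, here is a NumPy sketch of the SoR and DTC predictive distributions for the latent function, following the formulation in the Quiñonero-Candela and Rasmussen paper linked above. It is not the code behind `plot_sparse_gp()`: the function name `sparse_gp_predict`, the evenly spaced inducing inputs `z`, and all hyperparameter values are assumptions. FITC is omitted to keep the example short.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    """Squared-exponential (RBF) kernel on 1-D inputs, with k(x, x) = 1."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def sparse_gp_predict(x_train, y_train, x_test, z, noise_var,
                      lengthscale=1.0, method="DTC", jitter=1e-8):
    """Predictive mean and variance of f at x_test using inducing inputs z."""
    K_uu = rbf_kernel(z, z, lengthscale) + jitter * np.eye(len(z))
    K_uf = rbf_kernel(z, x_train, lengthscale)
    K_us = rbf_kernel(z, x_test, lengthscale)
    # Sigma^{-1} = K_uu + K_uf K_fu / noise_var; only an m x m system is solved
    A = K_uu + K_uf @ K_uf.T / noise_var
    mean = K_us.T @ np.linalg.solve(A, K_uf @ y_train) / noise_var
    var = np.sum(K_us * np.linalg.solve(A, K_us), axis=0)      # SoR variance
    if method == "DTC":
        # DTC adds back the prior variance that the inducing points do not explain
        q_ss = np.sum(K_us * np.linalg.solve(K_uu, K_us), axis=0)
        var = var + (1.0 - q_ss)
    return mean, var

rng = np.random.default_rng(0)
noise_var = 0.1**2
x_train = rng.uniform(-5.0, 5.0, size=500)
y_train = np.sin(x_train) + np.sqrt(noise_var) * rng.standard_normal(500)
x_test = np.linspace(-4.0, 4.0, 50)
z = np.linspace(-5.0, 5.0, 15)           # assumed: evenly spaced inducing inputs

mean_sor, var_sor = sparse_gp_predict(x_train, y_train, x_test, z, noise_var, method="SoR")
mean_dtc, var_dtc = sparse_gp_predict(x_train, y_train, x_test, z, noise_var, method="DTC")
```

SoR and DTC share the same predictive mean and differ only in the variance. With $m$ inducing points and $n$ training points the dominant cost is forming `K_uf @ K_uf.T`, which is $O(nm^2)$ instead of the $O(n^3)$ of exact GP regression.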

In [6]:
plot_sparse_gp()

## Random Feature Approximation 

Demo guide:
* Here we employ three random feature approximations to the RBF kernel
* (1) [Random Fourier Features](https://people.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf)
* (2) [Quadrature Fourier Features](https://papers.nips.cc/paper_files/paper/2018/hash/4e5046fc8d6a97d18a5f54beaed54dea-Abstract.html)
* (3) [Orthogonal Random Features](https://arxiv.org/pdf/1610.09072.pdf)

Play around:

* Compare different kernel approximation methods
* Try changing the number of random features and see how it influences the result
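
To get a feel for the effect of the number of features, the sketch below builds plain Random Fourier Features for a 1-D RBF kernel and measures how well $\phi(x)^\top \phi(x')$ matches the exact kernel. It covers only method (1) and is not the code behind `plot_rff_gp_regression()`; the function name `random_fourier_features` and the chosen grid, lengthscale, and feature counts are assumptions.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    """Squared-exponential (RBF) kernel on 1-D inputs."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def random_fourier_features(x, num_features, lengthscale, rng):
    """Feature map phi with phi(x) @ phi(x').T approximating the RBF kernel."""
    # Frequencies come from the spectral density of the RBF kernel: N(0, 1 / lengthscale^2)
    omega = rng.standard_normal(num_features) / lengthscale
    phase = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(x[:, None] * omega[None, :] + phase[None, :])

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)
K_exact = rbf_kernel(x, x, lengthscale=1.0)

# Maximum absolute error of the approximate kernel matrix per feature count
errors = {}
for num_features in (10, 100, 5000):
    phi = random_fourier_features(x, num_features, 1.0, rng)
    errors[num_features] = np.max(np.abs(phi @ phi.T - K_exact))
print(errors)
```

The approximation is a Monte Carlo average, so its error shrinks roughly like $1/\sqrt{D}$ in the number of features $D$. GP regression on the features then reduces to Bayesian linear regression in $D$ dimensions.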


In [7]:
plot_rff_gp_regression()
