<a href="https://colab.research.google.com/github/ziatdinovmax/gpax/blob/main/examples/compare_GPs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Compare ExactGP and viGP

This notebook compares the timings and results of two commonly used GPs: one trained with the No-U-Turn Sampler (NUTS) and one trained with stochastic variational inference (SVI).

*Prepared by Matthew R. Carbone & Maxim Ziatdinov (2023)*

## Background

Depending on how much data you have, the dimensionality of the inputs, and your time budget for training, you may prefer a GP fit with stochastic variational inference (SVI) over one fit with Markov chain Monte Carlo (MCMC). The lists below compare the strengths and weaknesses of the two methods.

**Hamiltonian Monte Carlo (HMC)/No U-Turn Sampler (NUTS)**

- Sensitivity to Priors: This can be perceived as a strength or a weakness, depending on the context. However, many researchers appreciate it because it offers a more intuitive grasp of the model.
- Reliable Uncertainty Estimates: Offers robust evaluations of uncertainties because it samples directly from the posterior. Variational methods, by contrast, are known to underestimate uncertainties.
- Integration with Classical Bayesian Models: GPs combine naturally with traditional Bayesian models, as demonstrated in structured GP and hypothesis learning.
- Comprehensive Convergence Diagnostics: Provides indicators such as `n_eff`, `r_hat`, and `acc_prob` for the inferred parameters.
- Speed Limitations: One of the primary drawbacks is its computational speed.
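
To make the `r_hat` diagnostic mentioned above more concrete, here is a minimal NumPy sketch of the classic Gelman-Rubin statistic. It is an illustration only, not the implementation used by GPax or NumPyro (library versions often use a "split" variant), and the function name `gelman_rubin` and the synthetic chains are assumptions made for this example.

```python
import numpy as np


def gelman_rubin(chains):
    """Classic (non-split) R-hat for an array of shape (num_chains, num_draws)."""
    chains = np.asarray(chains, dtype=float)
    _, n = chains.shape
    chain_means = chains.mean(axis=1)
    chain_vars = chains.var(axis=1, ddof=1)
    within = chain_vars.mean()                   # W: average within-chain variance
    between = n * chain_means.var(ddof=1)        # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n  # pooled variance estimate
    return np.sqrt(var_hat / within)


rng = np.random.default_rng(0)

# Four synthetic chains that all sample the same distribution
mixed = rng.normal(0.0, 1.0, size=(4, 1000))

# The same chains, but two of them are shifted, as if stuck in another mode
stuck = mixed + np.array([[0.0], [0.0], [3.0], [3.0]])

r_hat_mixed = gelman_rubin(mixed)
r_hat_stuck = gelman_rubin(stuck)
print(f"well-mixed chains: r_hat = {r_hat_mixed:.3f}")
print(f"stuck chains:      r_hat = {r_hat_stuck:.3f}")
```

Values close to 1 suggest the chains agree with each other; values well above 1 suggest they have not converged to the same distribution. This sketch needs at least two chains to compute a between-chain variance.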

**Stochastic Variational Inference (SVI)**

- Efficiency: It is significantly faster and more memory-efficient (it performs equally well with 32-bit precision).
- Acceptable Trade-offs: For many real-world tasks, the slight decrease in the accuracy of predictive uncertainty estimates is either negligible or acceptable.
- Convergence Indicator Limitations: The loss is not always a good indicator of convergence; training can easily overshoot or undershoot.
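
The tendency of variational methods to underestimate uncertainty can be illustrated with a textbook case: for a correlated Gaussian target, the fully factorized (mean-field) Gaussian that minimizes KL(q‖p) has variances equal to the inverse of the diagonal of the precision matrix, which is never larger than the true marginal variance. The sketch below checks this numerically. It illustrates mean-field variational inference in general and is not a statement about the specific variational family used by `viGP`; the correlation value `rho` is chosen arbitrarily.

```python
import numpy as np

rho = 0.9  # correlation between the two variables (arbitrary choice)
cov = np.array([[1.0, rho],
                [rho, 1.0]])
precision = np.linalg.inv(cov)

# True marginal variances of the target distribution
true_marginal_var = np.diag(cov)

# Variances of the optimal mean-field Gaussian approximation: 1 / Lambda_ii
mean_field_var = 1.0 / np.diag(precision)

print("true marginal variance:", true_marginal_var)
print("mean-field variance:   ", mean_field_var)
```

For this two-dimensional example the mean-field variance equals `1 - rho**2`, so the stronger the correlation, the more the approximation underestimates the marginal uncertainty.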

## Install & Import

Install the latest GPax package from PyPI. This is best practice, as it installs the latest deployed and tested version. Please do not install from a GitHub URL!

In [None]:
!pip install gpax

Import needed packages:

In [None]:
try:
    # For use on Google Colab
    import gpax

except ImportError:
    # For use locally (where you're using the local version of gpax)
    print("Assuming notebook is being run locally, attempting to import local gpax module")
    import sys
    sys.path.append("..")
    import gpax

In [None]:
import numpy as np
import matplotlib.pyplot as plt

gpax.utils.enable_x64()  # enable double precision

Enable some pretty plotting.

In [None]:
import matplotlib as mpl

In [None]:
mpl.rcParams['mathtext.fontset'] = 'stix'
mpl.rcParams['font.family'] = 'STIXGeneral'
mpl.rcParams['text.usetex'] = False
plt.rc('xtick', labelsize=12)
plt.rc('ytick', labelsize=12)
plt.rc('axes', labelsize=12)
mpl.rcParams['figure.dpi'] = 200

## Create data

Generate some noisy observations:

In [None]:
np.random.seed(0)

NUM_INIT_POINTS = 25 # number of observation points
NOISE_LEVEL = 0.1 # noise level

# Generate noisy data from a known function
f = lambda x: np.sin(10*x)

X = np.random.uniform(-1., 1., NUM_INIT_POINTS)
y = f(X) + np.random.normal(0., NOISE_LEVEL, NUM_INIT_POINTS)

# Plot generated data
fig, ax = plt.subplots(1, 1, figsize=(6, 2))
ax.set_xlabel("$x$")
ax.set_ylabel("$y$")
ax.scatter(X, y, marker='x', c='k', zorder=1, label='Noisy observations')
ax.set_ylim(-1.8, 2.2)
plt.show()

## Standard `ExactGP`

Next, we initialize and train a GP model. We are going to use an RBF kernel, $k_{RBF}=\sigma \exp\left(-\frac{\|x_i-x_j\|^2}{2l^2}\right)$, which is a "go-to" kernel function for GPs.
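
As a minimal sketch of what the RBF formula computes, the kernel matrix can be written in a few lines of NumPy. This is for illustration only and is not GPax's implementation; the function name `rbf_kernel` and the parameter names `scale` (for $\sigma$) and `length` (for $l$) are assumptions made for this example.

```python
import numpy as np


def rbf_kernel(x1, x2, scale=1.0, length=1.0):
    """RBF kernel matrix: scale * exp(-||x_i - x_j||^2 / (2 * length^2))."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    # Treat 1D inputs as n points with a single feature each
    x1 = x1.reshape(x1.shape[0], -1)
    x2 = x2.reshape(x2.shape[0], -1)
    # Pairwise squared Euclidean distances, shape (len(x1), len(x2))
    sq_dist = ((x1[:, None, :] - x2[None, :, :]) ** 2).sum(axis=-1)
    return scale * np.exp(-0.5 * sq_dist / length**2)


x = np.array([-1.0, 0.0, 1.0])
K = rbf_kernel(x, x, scale=2.0, length=1.0)
print(K)
```

The diagonal equals `scale`, and the off-diagonal entries decay as points move apart, at a rate set by `length`.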

In [None]:
# Get random number generator keys for training and prediction
rng_key, rng_key_predict = gpax.utils.get_keys()

# Initialize model
gp_model_1 = gpax.ExactGP(1, kernel='RBF')

# Run Hamiltonian Monte Carlo to obtain posterior samples for kernel parameters and model noise
gp_model_1.fit(rng_key, X, y, num_chains=1)

## Standard `viGP`

In [None]:
# Get random number generator keys for training and prediction
rng_key, rng_key_predict = gpax.utils.get_keys()

# Initialize model
gp_model_2 = gpax.viGP(1, kernel='RBF')

# Run stochastic variational inference (SVI) to infer kernel parameters and model noise
gp_model_2.fit(rng_key, X, y)

In [None]:
X_test = np.linspace(-1, 1, 100)

In [None]:
y_pred_1, y_sampled_1 = gp_model_1.predict(rng_key_predict, X_test, n=200)

In [None]:
y_pred_2, y_sampled_2 = gp_model_2.predict(rng_key_predict, X_test, n=200)

Note that SVI (the `viGP`) is significantly faster; prefix a command with the `%time` or `%timeit` magic to measure it. SVI is usually the better choice for larger datasets because it scales more easily. In this case, the two methods produce similar results.
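
Outside of IPython, where the timing magics are not available, a small wall-clock helper does the same job. The sketch below uses only the standard library; the helper name `timed` is hypothetical, and the built-in `sum` stands in for a training call so that the example runs on its own.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (elapsed seconds, result)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return elapsed, result


# Stand-in workload; in this notebook one might instead call, for example,
# timed(gp_model_2.fit, rng_key, X, y)
elapsed, total = timed(sum, range(1_000_000))
print(f"elapsed: {elapsed:.4f} s")
```

Keep in mind that JAX-based code is typically compiled on first use, so the first timed call may include compilation overhead.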

In [None]:
y_sampled_1.shape

In [None]:
y_sampled_2.shape  # Note the shape difference: here the second output is one array over the test points (used as a variance below), not a stack of samples
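
The two outputs need different reductions before they can be drawn as uncertainty bands, which is what the plotting cell does. The sketch below reproduces both reductions on synthetic stand-in arrays. The shapes and values are assumptions chosen for illustration, not the actual outputs of the two `predict` methods.

```python
import numpy as np

rng = np.random.default_rng(0)
num_test = 100

# Stand-in for sampled predictions with three axes, the last one indexing test points
samples = rng.normal(0.0, 0.5, size=(50, 20, num_test))
# Collapse both sampling axes to get one standard deviation per test point
std_from_samples = samples.std(axis=(0, 1))

# Stand-in for a predictive variance: one value per test point
variance = np.full(num_test, 0.25)
# A variance only needs a square root to become a standard deviation
std_from_variance = np.sqrt(variance)

print(std_from_samples.shape, std_from_variance.shape)
```

Both results have one entry per test point, so they can be passed to `fill_between` in the same way.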

Plot the obtained results:

In [None]:
_, ax = plt.subplots(1, 1, figsize=(6, 2))
ax.set_xlabel("$x$")
ax.set_ylabel("$y$")

# NUTS/MCMC: mean prediction with a one-standard-deviation band computed from the samples
std_1 = y_sampled_1.std(axis=(0, 1))
ax.plot(X_test, y_pred_1, lw=1.5, zorder=2, c='r', label='NUTS/MCMC')
ax.fill_between(X_test, y_pred_1 - std_1, y_pred_1 + std_1,
                color='r', alpha=0.3, linewidth=0)

# SVI: mean prediction with a one-standard-deviation band from the second predict output
std_2 = np.sqrt(y_sampled_2)
ax.plot(X_test, y_pred_2, lw=1.5, zorder=2, c='b', label='SVI')
ax.fill_between(X_test, y_pred_2 - std_2, y_pred_2 + std_2,
                color='b', alpha=0.3, linewidth=0)

# Overlay the training data
ax.scatter(X, y, marker='x', c='k', zorder=2, label="Noisy observations", alpha=0.7)

ax.set_ylim(-1.8, 2.2)
ax.legend(loc='upper left', ncols=3)

plt.show()