Copyright 2023-2023 Lawrence Livermore National Security, LLC and other MuyGPyS
Project Developers. See the top-level COPYRIGHT file for details.

SPDX-License-Identifier: MIT

# Nonstationary tutorial

This notebook demonstrates how to use hierarchical nonstationary hyperparameters to perform nonstationary regression.

⚠️ _Note that this is still an experimental feature at this point._ ⚠️

In [None]:
import matplotlib.pyplot as plt
import numpy as np

from MuyGPyS.gp import MuyGPS
from MuyGPyS.gp.distortion import IsotropicDistortion, l2, F2
from MuyGPyS.gp.hyperparameter import ScalarHyperparameter
from MuyGPyS.gp.hyperparameter.experimental import (
    sample_knots,
    HierarchicalNonstationaryHyperparameter,
)
from MuyGPyS.gp.kernels import RBF

We will set a random seed here for consistency when building docs.
In practice we would not fix a seed.

In [None]:
np.random.seed(0)

## Preliminary setup

For simplicity, we start with an isotropic distortion so we only need to use a single `HierarchicalNonstationaryHyperparameter`.
Let's also build a GP with a fixed length scale for comparison.

Let's create some training data with a little bit of noise.

In [None]:
data_max = 5
data_count = 500
train_step = 10
train_count = int(data_count / train_step)
test_count = data_count - train_count
x = np.linspace(-data_max, data_max, num=data_count)
y = np.sinc(x) - np.mean(np.sinc(x))
train_features = np.reshape(x[::train_step] + np.random.normal(scale=0.02, size=train_count), (-1, 1))
train_responses = np.reshape(y[::train_step] + np.random.normal(scale=0.02, size=train_count), (-1, 1))
test_features = x[np.mod(np.arange(data_count), train_step) != 0].reshape(test_count, 1)
test_responses = y[np.mod(np.arange(data_count), train_step) != 0].reshape(test_count, 1)
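
The split above keeps every `train_step`-th sample for training and holds out the rest for testing. The following is a minimal standalone sketch of that strided split, using index masks only; the names `is_train`, `train_idx`, and `test_idx` are illustrative and do not appear in the notebook.

```python
import numpy as np

data_count = 500
train_step = 10

# Every `train_step`-th index is a training point; all others are test points.
indices = np.arange(data_count)
is_train = np.mod(indices, train_step) == 0

train_idx = indices[is_train]
test_idx = indices[~is_train]

print(train_idx.size, test_idx.size)  # 50 450
```

Because the two masks are complementary, no test point is ever used for training, although the added feature noise means a training feature no longer sits exactly on the grid.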

We can visualize the true function we are trying to predict, along with the training data with which we will build the model.

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(7,5))
ax.plot(x, y, label="True Response")
ax.plot(train_features, train_responses, '.', label="Training Data")
plt.legend()
plt.show()

We will create a stationary MuyGPs object with a fixed `length_scale` of 5.0.

In [None]:
muygps_fixed = MuyGPS(
    kernel=RBF(
        metric=IsotropicDistortion(
            l2,
            length_scale=ScalarHyperparameter(5.0),
        ),
    ),
)

We will also create a hierarchical nonstationary MuyGPs object. Here we assume that the `length_scale` of the distance function itself varies across the domain according to a Gaussian process conditioned on a set of "knots": locations in the domain of the function where we assume that we know, or can learn, the true value of the `length_scale`. We start by sampling some knots and assigning them initial values.

In [None]:
knot_count = 6
knot_features = sample_knots(feature_count=1, knot_count=knot_count)
knot_features *= data_max * 2
knot_features -= data_max
knot_features = np.array(sorted(knot_features))
knot_values = np.array([10.0, 7.5, 5.0, 5.0, 7.5, 10.0])

We then create a `MuyGPS` object as before, except that now we specify that the `length_scale` is hierarchical and pass it the knots.

In [None]:
high_level_kernel = RBF(
    IsotropicDistortion(
        F2,
        length_scale=ScalarHyperparameter(10.0))
)

muygps = MuyGPS(
    kernel=RBF(
        metric=IsotropicDistortion(
            l2,
            length_scale=HierarchicalNonstationaryHyperparameter(
                knot_features, knot_values, high_level_kernel
            ),
        ),
    ),
)
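
To build intuition for how knot values could turn into a smooth `length_scale` curve, here is a minimal NumPy sketch of Gaussian process interpolation through the knots. It assumes a zero prior mean, an RBF kernel of the form `exp(-d^2 / (2 * l^2))`, evenly spaced knots, and a high-level length scale of 2.0 (chosen for numerical conditioning rather than the 10.0 used above). The helpers `rbf` and `interpolate_from_knots` are hypothetical and are not part of MuyGPyS, whose internal computation may differ.

```python
import numpy as np

def rbf(sq_dists, length_scale):
    """RBF kernel evaluated on squared Euclidean distances."""
    return np.exp(-sq_dists / (2.0 * length_scale**2))

def interpolate_from_knots(query, knot_features, knot_values, length_scale=2.0):
    """Zero-mean GP posterior mean at `query`, conditioned on the knots."""
    # Squared distances: knots to knots, and query points to knots.
    knot_sq = ((knot_features[:, None, :] - knot_features[None, :, :]) ** 2).sum(-1)
    cross_sq = ((query[:, None, :] - knot_features[None, :, :]) ** 2).sum(-1)
    K = rbf(knot_sq, length_scale)
    Kcross = rbf(cross_sq, length_scale)
    # A tiny jitter keeps the solve stable (an assumption of this sketch).
    K += 1e-10 * np.eye(K.shape[0])
    return Kcross @ np.linalg.solve(K, knot_values)

# Illustrative knots: evenly spaced here, whereas the notebook samples them.
knot_features = np.linspace(-5.0, 5.0, 6).reshape(-1, 1)
knot_values = np.array([10.0, 7.5, 5.0, 5.0, 7.5, 10.0])
query = np.linspace(-5.0, 5.0, 11).reshape(-1, 1)

curve = interpolate_from_knots(query, knot_features, knot_values)
at_knots = interpolate_from_knots(knot_features, knot_features, knot_values)
```

In this sketch the interpolated curve passes through the knot values at the knot locations and varies smoothly in between, which is the behavior the hierarchical hyperparameter is meant to capture.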

We can visualize the knots and the resulting `length_scale` surface over the domain of the function.
Unlike `ScalarHyperparameter`, `HierarchicalNonstationaryHyperparameter` takes an array of feature vectors, one for each point at which you would like to evaluate the local value of the hyperparameter.

In [None]:
length_scale_curve = muygps.kernel.distortion_fn.length_scale(x.reshape(data_count, 1))

Since this is a small example, we can evaluate and display the predicted `length_scale` values across the whole domain.

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(7,5))
ax.set_title("Hierarchical Length Scale Surface Over the Domain")
order = np.argsort(knot_features[:,0])
ax.plot(knot_features[order,:], knot_values[order], "*", label="Knot Values")
ax.plot(x, length_scale_curve, label="Interpolated Surface")
plt.legend()
plt.show()

Now we can proceed as usual to generate the nearest neighbors lookup index and tensors.

In [None]:
from MuyGPyS.neighbors import NN_Wrapper

nn_count = 30
nbrs_lookup = NN_Wrapper(train_features, nn_count, nn_method="exact", algorithm="ball_tree")
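
For a dataset this small, an exact nearest neighbors query can also be pictured as a brute-force distance computation. The sketch below is a plain NumPy illustration of that idea, not the `NN_Wrapper` implementation; the function `exact_nearest_neighbors` and the random data are hypothetical.

```python
import numpy as np

def exact_nearest_neighbors(queries, train, nn_count):
    """Return indices of the `nn_count` closest training points per query."""
    # Pairwise squared Euclidean distances, shape (query_count, train_count).
    sq_dists = ((queries[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    # Sort each row by distance and keep the closest `nn_count` columns.
    return np.argsort(sq_dists, axis=1)[:, :nn_count]

rng = np.random.default_rng(0)
train = rng.uniform(-5.0, 5.0, size=(50, 1))
queries = np.array([[0.0], [4.0]])

nn_indices = exact_nearest_neighbors(queries, train, nn_count=3)
print(nn_indices.shape)  # (2, 3)
```

Brute force costs time proportional to the number of query points times the number of training points, which is why tree-based or approximate indices are preferred for larger datasets.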

Note that in this simple example we use all of the test data as batch points, i.e. we are not really batching, since the dataset is very small.

In [None]:
from MuyGPyS.gp.tensors import make_predict_tensors

test_indices = np.arange(test_count)
test_nn_indices, _ = nbrs_lookup.get_nns(test_features)

(
    test_crosswise_diffs,
    test_pairwise_diffs,
    test_nn_targets,
) = make_predict_tensors(
    test_indices,
    test_nn_indices,
    test_features,
    train_features,
    train_responses,
)
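
The tensors returned above hold coordinate-wise differences between each test point and its nearest neighbors, and among those neighbors themselves. The following NumPy sketch shows one way such tensors could be assembled. The shapes `(batch_count, nn_count, feature_count)` and `(batch_count, nn_count, nn_count, feature_count)` are assumptions of this sketch, and `difference_tensors` is a hypothetical helper rather than a MuyGPyS function.

```python
import numpy as np

def difference_tensors(batch_features, train_features, nn_indices):
    """Build crosswise and pairwise difference tensors for a batch."""
    # Gather neighbor features: (batch_count, nn_count, feature_count).
    nn_features = train_features[nn_indices]
    # Crosswise: each batch point minus each of its neighbors.
    crosswise = batch_features[:, None, :] - nn_features
    # Pairwise: differences among the neighbors of the same batch point.
    pairwise = nn_features[:, :, None, :] - nn_features[:, None, :, :]
    return crosswise, pairwise

rng = np.random.default_rng(0)
train_features = rng.normal(size=(20, 2))
batch_features = rng.normal(size=(5, 2))
nn_indices = rng.integers(0, 20, size=(5, 4))

crosswise, pairwise = difference_tensors(batch_features, train_features, nn_indices)
print(crosswise.shape, pairwise.shape)  # (5, 4, 2) (5, 4, 4, 2)
```

Keeping per-feature differences rather than scalar distances lets the distortion function apply its length scale afterwards, which matters when the length scale depends on location.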

Normally we would optimize the hyperparameters at this point, but that step has not been implemented yet, so we skip it for now.

One notable difference when using a hierarchical model is that the kernel takes an additional tensor, the batch tensor, which can be easily obtained using the `batch_features_tensor` helper function.

In [None]:
from MuyGPyS.gp.tensors import batch_features_tensor

batch_features = batch_features_tensor(test_features, test_indices)

Finally, we are ready to realize the kernel tensors and use them to predict the responses of the test data, first using the GP with a fixed length scale.

In [None]:
Kcross_fixed = muygps_fixed.kernel(test_crosswise_diffs)
K_fixed = muygps_fixed.kernel(test_pairwise_diffs)
test_responses_mean_fixed = muygps_fixed.posterior_mean(K_fixed, Kcross_fixed, test_nn_targets)
test_responses_var_fixed = muygps_fixed.posterior_variance(K_fixed, Kcross_fixed)

Then the hierarchical GP.

In [None]:
Kcross = muygps.kernel(test_crosswise_diffs, batch_features)
K = muygps.kernel(test_pairwise_diffs, batch_features)
test_responses_mean = muygps.posterior_mean(K, Kcross, test_nn_targets)
test_responses_var = muygps.posterior_variance(K, Kcross)
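
Conceptually, each prediction conditions on only the nearest neighbors of the test point. The sketch below writes out the standard GP posterior mean and variance for a batch of such local problems in NumPy. It assumes unit prior variance and ignores any noise term; the functions are illustrative stand-ins and are not the MuyGPyS methods of the same name.

```python
import numpy as np

def posterior_mean(K, Kcross, nn_targets):
    """Kcross K^{-1} y per batch element; returns (batch_count, response_count)."""
    solved = np.linalg.solve(K, nn_targets)  # (batch, nn, response)
    return np.einsum("bk,bkr->br", Kcross, solved)

def posterior_variance(K, Kcross, prior_variance=1.0):
    """prior_variance - Kcross K^{-1} Kcross^T per batch element."""
    solved = np.linalg.solve(K, Kcross[:, :, None])[:, :, 0]  # (batch, nn)
    return prior_variance - np.einsum("bk,bk->b", Kcross, solved)

# Four batch elements with three well-separated neighbors each.
points = np.linspace(-1.0, 1.0, 3).reshape(1, 3, 1) + np.arange(4.0).reshape(4, 1, 1)
sq_dists = ((points[:, :, None, :] - points[:, None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / 2.0)  # RBF kernel with unit length scale

# Place each query exactly on its first neighbor.
Kcross = K[:, 0, :].copy()
targets = np.random.default_rng(0).normal(size=(4, 3, 1))

mean = posterior_mean(K, Kcross, targets)
var = posterior_variance(K, Kcross)
```

When a query coincides with one of its neighbors, as in this example, the mean reproduces that neighbor's target and the variance collapses to zero, which is a useful sanity check for any implementation.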

We can visualize the results by plotting the predicted means along with 95% confidence intervals, i.e. 1.96 predicted standard deviations on either side of each mean.

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(7, 5))

ax.plot(x, y, label="truth")
ax.plot(test_features, test_responses_mean_fixed, ".-", label="fixed")
ax.plot(test_features, test_responses_mean, "--", label="hierarchical")
ax.fill_between(
    np.ravel(test_features),
    np.ravel(test_responses_mean_fixed + np.sqrt(test_responses_var_fixed) * 1.96),
    np.ravel(test_responses_mean_fixed - np.sqrt(test_responses_var_fixed) * 1.96),
    facecolor="C1",
    alpha=0.2,
)
ax.fill_between(
    np.ravel(test_features),
    np.ravel(test_responses_mean + np.sqrt(test_responses_var) * 1.96),
    np.ravel(test_responses_mean - np.sqrt(test_responses_var) * 1.96),
    facecolor="C2",
    alpha=0.2,
)
plt.legend()
plt.show()
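
Beyond visual inspection, the two models can be compared numerically. The following sketch computes the root mean squared error and the empirical coverage of the 95% intervals on small made-up arrays; `rmse` and `interval_coverage` are hypothetical helpers, and the numbers are purely illustrative rather than results from this notebook.

```python
import numpy as np

def rmse(predictions, targets):
    """Root mean squared error between flattened predictions and targets."""
    diff = np.ravel(predictions) - np.ravel(targets)
    return float(np.sqrt(np.mean(diff**2)))

def interval_coverage(means, variances, targets, z=1.96):
    """Fraction of targets inside mean +/- z * sqrt(variance)."""
    means, variances, targets = (np.ravel(a) for a in (means, variances, targets))
    half_width = z * np.sqrt(variances)
    return float(np.mean(np.abs(targets - means) <= half_width))

# Made-up values for illustration only.
targets = np.array([0.0, 1.0, 2.0, 3.0])
means = np.array([0.1, 1.0, 2.5, 3.0])
variances = np.full(4, 0.01)

print(rmse(means, targets))
print(interval_coverage(means, variances, targets))  # 0.75
```

In the notebook, the same helpers could be applied to `test_responses_mean`, `test_responses_var`, and `test_responses`, and likewise to the fixed-length-scale predictions, provided the arrays flatten to the same length.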