Copyright 2023-2023 Lawrence Livermore National Security, LLC and other MuyGPyS
Project Developers. See the top-level COPYRIGHT file for details.

SPDX-License-Identifier: MIT

# Nonstationary tutorial

This notebook demonstrates how to use hierarchical nonstationary hyperparameters to perform nonstationary regression.

⚠️ _Note that this is still an experimental feature at this point._ ⚠️

In [None]:
import matplotlib.pyplot as plt
import numpy as np

from MuyGPyS.gp import MuyGPS
from MuyGPyS.gp.deformation import Isotropy, l2, F2
from MuyGPyS.gp.hyperparameter import Parameter, VectorParameter
from MuyGPyS.gp.hyperparameter.experimental import (
    sample_knots,
    HierarchicalParameter,
)
from MuyGPyS.gp.kernels import RBF
from MuyGPyS.gp.noise import HomoscedasticNoise

We will set a random seed here for consistency when building docs.
In practice we would not fix a seed.

In [None]:
np.random.seed(0)

## Preliminary setup

For simplicity, we start with an isotropic deformation so we only need to use a single `HierarchicalParameter`.
Let's also build a GP with a fixed length scale for comparison.

Let's create some training data with a little bit of noise.

In [None]:
data_max = 5
data_count = 500
train_step = 10
train_count = int(data_count / train_step)
test_count = data_count - train_count
noise_prior = 1e-5
noise_actual = 2e-4
xs = np.linspace(-data_max, data_max, num=data_count)
ys = np.sinc(xs) - np.mean(np.sinc(xs))
xs = (xs - np.min(xs)) / (2 * np.max(xs))
train_features = xs[::train_step]
train_responses = ys[::train_step] + np.random.normal(scale=noise_actual, size=train_count)
test_features = xs[np.mod(np.arange(data_count), train_step) != 0]
test_responses = ys[np.mod(np.arange(data_count), train_step) != 0]
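
As a quick sanity check on the split above, the following standalone sketch rebuilds the index arithmetic with NumPy only. It uses the same sizes as the cell above; the `is_train` mask is a name introduced here for illustration.

```python
import numpy as np

data_count = 500
train_step = 10

# Rescaled inputs, as above: linspace(-5, 5) mapped onto [0, 1].
xs = np.linspace(-5, 5, num=data_count)
xs = (xs - np.min(xs)) / (2 * np.max(xs))

# Every train_step-th point is a training point; the rest are test points.
is_train = np.mod(np.arange(data_count), train_step) == 0
train_features = xs[is_train]
test_features = xs[~is_train]

print(train_features.shape, test_features.shape)
print(xs.min(), xs.max())
```

The training set is a regular subsample of the rescaled domain, and the test set is everything in between.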

We can visualize the true function we are trying to predict, along with the training data with which we will optimize a model.

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(7,5))
ax.plot(xs, ys, label="True Response")
ax.plot(train_features, train_responses, '.', label="Training Data")
plt.legend()
plt.show()

We will create a stationary MuyGPs object for reference.

In [None]:
muygps_fixed = MuyGPS(
    kernel=RBF(
        deformation=Isotropy(
            l2,
            length_scale=Parameter(0.5),
        ),
    ),
    noise=HomoscedasticNoise(noise_prior),
)

## Hierarchical Nonstationary MuyGPs

We will also create a hierarchical nonstationary MuyGPs object, in which the `length_scale` of the distance function itself varies according to a Gaussian process. This higher-level process is anchored at "knots": locations in the domain of the function where we assume that we know or can learn the true value of the `length_scale`. We will start by sampling some knots and giving them initial values.

In [None]:
knot_count = 6
knot_features = np.squeeze(sample_knots(feature_count=1, knot_count=knot_count))
knot_features = np.array(sorted(knot_features))
knot_values = VectorParameter(
    Parameter(0.3),
    Parameter(0.4),
    Parameter(0.5),
    Parameter(0.5),
    Parameter(0.4),
    Parameter(0.3),
)

After inspecting the knots, we create a `MuyGPS` object as before, except that the `length_scale` is now a `HierarchicalParameter` built from the knot features, the knot values, and a higher-level kernel.

In [None]:
print(knot_features)

In [None]:
print(knot_values)

In [None]:
high_level_kernel = RBF(
    deformation=Isotropy(
        F2,
        length_scale=Parameter(0.5))
)

muygps = MuyGPS(
    kernel=RBF(
        deformation=Isotropy(
            l2,
            length_scale=HierarchicalParameter(
                knot_features, knot_values, high_level_kernel
            ),
        ),
    ),
    noise=HomoscedasticNoise(noise_prior),
)

We can visualize the knots and the resulting `length_scale` surface over the domain of the function.
Unlike a scalar `Parameter`, a `HierarchicalParameter` takes an array of feature vectors, one for each point where you would like to evaluate the local value of the hyperparameter.

In [None]:
length_scale_curve = muygps.kernel.deformation.length_scale(xs)
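
To build intuition for what evaluating a knot-based `length_scale` at a set of points involves, here is a minimal NumPy-only sketch that interpolates knot values with a GP conditional mean. It is an illustration under assumptions, not MuyGPyS's implementation: the helper names `rbf` and `interpolate_knots`, the evenly spaced knots, and the higher-level length scale of 0.2 are all choices made here for the example.

```python
import numpy as np

def rbf(diffs, length_scale):
    """Squared-exponential kernel on raw differences (illustrative form)."""
    return np.exp(-(diffs**2) / (2.0 * length_scale**2))

def interpolate_knots(points, knot_features, knot_values, length_scale=0.2):
    """GP conditional mean of the knot values, evaluated at `points`."""
    # Kernel among the knots, shape (knot_count, knot_count).
    K_knots = rbf(knot_features[:, None] - knot_features[None, :], length_scale)
    # Kernel between evaluation points and knots, shape (point_count, knot_count).
    K_cross = rbf(points[:, None] - knot_features[None, :], length_scale)
    return K_cross @ np.linalg.solve(K_knots, knot_values)

# Hypothetical knots: evenly spaced here, whereas the tutorial samples them.
knot_features = np.linspace(0.0, 1.0, 6)
knot_values = np.array([0.3, 0.4, 0.5, 0.5, 0.4, 0.3])
points = np.linspace(0.0, 1.0, 101)
curve = interpolate_knots(points, knot_features, knot_values)
```

In this sketch the curve passes through the knot values and varies smoothly between them, so changing a knot value reshapes the local length scale in its neighborhood.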

Since this is a small example, we can evaluate and display the predicted `length_scale` values across the whole domain.

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(7,5))
ax.set_title("Hierarchical Length Scale Surface Over the Domain")
order = np.argsort(knot_features)
ax.plot(knot_features[order], knot_values()[order], "*", label="Knot Values")
ax.plot(xs, length_scale_curve, label="Interpolated Surface")
plt.legend()
plt.show()

Now we can proceed as usual to generate the nearest neighbors lookup index and tensors.

In [None]:
from MuyGPyS.neighbors import NN_Wrapper

nn_count = 30
nbrs_lookup = NN_Wrapper(train_features, nn_count, nn_method="exact", algorithm="ball_tree")
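
For a dataset this small, the effect of an exact nearest neighbors query can be reproduced by brute force. The sketch below uses only NumPy; the function name `brute_force_nns` and the toy data are introduced here for illustration and are not part of MuyGPyS.

```python
import numpy as np

def brute_force_nns(queries, train, nn_count):
    """Return indices and distances of the nn_count nearest training points."""
    # Absolute differences between every query and every training point.
    dists = np.abs(queries[:, None] - train[None, :])
    # Sort each row and keep the nn_count closest columns.
    nn_indices = np.argsort(dists, axis=1)[:, :nn_count]
    nn_dists = np.take_along_axis(dists, nn_indices, axis=1)
    return nn_indices, nn_dists

train = np.linspace(0.0, 1.0, 11)  # 0.0, 0.1, ..., 1.0
queries = np.array([0.27, 0.88])
nn_indices, nn_dists = brute_force_nns(queries, train, nn_count=3)
print(nn_indices)
```

Each row of `nn_indices` lists one query's neighbors from nearest to farthest, which is the kind of index array the tensor construction step consumes.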

Note that in this simple example we're using all of the data as batch points, i.e. we're not really batching, since the dataset is very small.

In [None]:
test_indices = np.arange(test_count)
test_nn_indices, _ = nbrs_lookup.get_nns(test_features)

In [None]:
(
    test_crosswise_dists,
    test_pairwise_dists,
    test_nn_targets,
) = muygps.make_predict_tensors(
    test_indices,
    test_nn_indices,
    test_features,
    train_features,
    train_responses,
)
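
The shapes of these tensors are easier to reason about with a toy example. The sketch below builds crosswise and pairwise difference arrays for scalar features with plain NumPy. The layouts `(batch_count, nn_count)` and `(batch_count, nn_count, nn_count)` are an assumption made for illustration (any trailing feature axis is dropped), and the variable names are hypothetical rather than MuyGPyS's own.

```python
import numpy as np

train = np.linspace(0.0, 1.0, 11)
batch = np.array([0.27, 0.88])
# Hypothetical neighbor indices for each batch point (3 neighbors each).
nn_indices = np.array([[3, 2, 4], [9, 8, 10]])

# Features of each batch point's neighbors, shape (batch_count, nn_count).
nn_features = train[nn_indices]

# Crosswise: each batch point against its own neighbors, shape (2, 3).
crosswise = batch[:, None] - nn_features
# Pairwise: each batch point's neighbors against one another, shape (2, 3, 3).
pairwise = nn_features[:, :, None] - nn_features[:, None, :]

print(crosswise.shape, pairwise.shape)
```

Neighbor responses can be gathered with the same fancy indexing, giving one row of targets per batch point.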

One notable difference when using a hierarchical model is that the kernel takes an additional tensor, the batch tensor, which can be easily obtained using the `batch_features_tensor` helper function.

In [None]:
from MuyGPyS.gp.tensors import batch_features_tensor

batch_test_features = batch_features_tensor(test_features, test_indices)

Finally, we're ready to realize the kernel tensors and use them to predict the response of the test data. First, we use the GP with a fixed length scale.

In [None]:
Kcross_flat_fixed = muygps_fixed.kernel(test_crosswise_dists)
Kin_flat_fixed = muygps_fixed.kernel(test_pairwise_dists)
mean_flat_fixed = muygps_fixed.posterior_mean(
    Kin_flat_fixed, Kcross_flat_fixed, test_nn_targets
)
var_flat_fixed = muygps_fixed.posterior_variance(
    Kin_flat_fixed, Kcross_flat_fixed
)
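
The posterior mean and variance follow the usual GP conditioning formulas, applied independently to each test point's neighborhood. The following NumPy sketch shows those formulas on a toy problem. It is a simplified stand-in, not MuyGPyS's code: `rbf` and `local_posterior` are names introduced here, the prior variance is taken to be 1, and the noise is added to the diagonal of the neighborhood kernel.

```python
import numpy as np

def rbf(dists, length_scale):
    return np.exp(-(dists**2) / (2.0 * length_scale**2))

def local_posterior(Kin, Kcross, nn_targets, noise):
    """Per-point GP posterior from (b, k, k), (b, k), and (b, k) arrays."""
    nn_count = Kin.shape[-1]
    Kin_noisy = Kin + noise * np.eye(nn_count)
    # Solve (Kin + noise * I) w = Kcross for every batch element at once.
    weights = np.linalg.solve(Kin_noisy, Kcross[..., None])[..., 0]
    mean = np.sum(weights * nn_targets, axis=-1)
    variance = 1.0 - np.sum(weights * Kcross, axis=-1)
    return mean, variance

train_x = np.linspace(0.0, 1.0, 11)
train_y = np.sin(2.0 * np.pi * train_x)
test_x = np.array([0.27, 0.5])  # 0.5 coincides with a training point

# Brute-force 3 nearest neighbors of each test point.
nn_indices = np.argsort(np.abs(test_x[:, None] - train_x[None, :]), axis=1)[:, :3]
nn_x = train_x[nn_indices]

Kcross = rbf(test_x[:, None] - nn_x, length_scale=0.1)
Kin = rbf(nn_x[:, :, None] - nn_x[:, None, :], length_scale=0.1)
mean, variance = local_posterior(Kin, Kcross, train_y[nn_indices], noise=1e-6)
```

The variance shrinks toward the noise level where a test point coincides with a training point and grows as the test point moves away from its neighbors.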

Then the hierarchical GP.

In [None]:
Kcross_hierarchical_fixed = muygps.kernel(
    test_crosswise_dists, batch_features=batch_test_features
)
Kin_hierarchical_fixed = muygps.kernel(
    test_pairwise_dists, batch_features=batch_test_features
)
mean_hierarchical_fixed = muygps.posterior_mean(
    Kin_hierarchical_fixed, Kcross_hierarchical_fixed, test_nn_targets
)
var_hierarchical_fixed = muygps.posterior_variance(
    Kin_hierarchical_fixed, Kcross_hierarchical_fixed
)
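
Conceptually, the `batch_features` argument lets the kernel look up a separate `length_scale` for each test point before scaling that point's distances. The sketch below illustrates the idea with NumPy broadcasting. It is an illustration under assumptions rather than the library's code path: `rbf_local` and the example length scales are made up for this purpose.

```python
import numpy as np

def rbf_local(dists, length_scales):
    """RBF kernel where row i of `dists` is scaled by length_scales[i]."""
    # Append singleton axes so length_scales broadcasts over neighbor axes.
    extra_axes = (1,) * (dists.ndim - 1)
    scales = np.reshape(length_scales, (-1,) + extra_axes)
    return np.exp(-(dists**2) / (2.0 * scales**2))

# Two batch points with identical neighbor distances...
crosswise = np.array([[0.05, 0.10, 0.15],
                      [0.05, 0.10, 0.15]])
pairwise = np.abs(crosswise[:, :, None] - crosswise[:, None, :])
# ...but different local length scales.
local_length_scales = np.array([0.1, 0.5])

Kcross = rbf_local(crosswise, local_length_scales)  # shape (2, 3)
Kin = rbf_local(pairwise, local_length_scales)      # shape (2, 3, 3)
```

The first row, with the shorter length scale, yields smaller correlations at the same distances, so its prediction leans more heavily on the closest neighbors.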

And we can visualize the results by plotting the predicted means along with 95% confidence intervals (1.96 predicted standard deviations on either side of the mean).

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(7, 5))

ax.plot(xs, ys, label="truth")
ax.plot(test_features, mean_flat_fixed, ".-", label="flat fixed")
ax.fill_between(
    np.ravel(test_features),
    np.ravel(mean_flat_fixed + np.sqrt(var_flat_fixed) * 1.96),
    np.ravel(mean_flat_fixed - np.sqrt(var_flat_fixed) * 1.96),
    facecolor="C1",
    alpha=0.2,
)
ax.plot(test_features, mean_hierarchical_fixed, "--", label="hierarchical fixed")
ax.fill_between(
    np.ravel(test_features),
    np.ravel(mean_hierarchical_fixed + np.sqrt(var_hierarchical_fixed) * 1.96),
    np.ravel(mean_hierarchical_fixed - np.sqrt(var_hierarchical_fixed) * 1.96),
    facecolor="C2",
    alpha=0.2,
)
plt.legend()
plt.show()
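
The factor 1.96 in the shaded bands corresponds to a two-sided 95% interval under a Gaussian predictive distribution. The sketch below checks that empirically with synthetic draws; the mean, variance, and sample size are arbitrary values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
mean, variance = 0.3, 0.04

# Draw samples from the assumed Gaussian predictive distribution.
samples = rng.normal(loc=mean, scale=np.sqrt(variance), size=100_000)

# Interval bounds computed the same way as the shaded bands in the plot.
lower = mean - 1.96 * np.sqrt(variance)
upper = mean + 1.96 * np.sqrt(variance)
coverage = np.mean((samples >= lower) & (samples <= upper))
print(f"empirical coverage: {coverage:.3f}")
```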

## Optimization

The knot values of hierarchical nonstationary hyperparameters can be optimized like any other hyperparameters, here using the `Bayes_optimize` utility. But first, we need to initialize them as `Parameter`s with bounds rather than as fixed values.

In [None]:
bounds = [0.1, 0.5]
knot_values_to_be_optimized = VectorParameter(
    *[Parameter("sample", bounds) for _ in range(knot_count)]
)

In [None]:
len(knot_values_to_be_optimized)

Let's recreate a MuyGPs object. It's identical to the one we've created before except for the knot values.

In [None]:
hierarchical_to_be_optimized = MuyGPS(
    kernel=RBF(
        deformation=Isotropy(
            l2,
            length_scale=HierarchicalParameter(
                knot_features, knot_values_to_be_optimized, high_level_kernel
            ),
        ),
    ),
    noise=HomoscedasticNoise(noise_prior),
)

Then we use `make_train_tensors` to obtain the training tensors. Once again, we use all the training data instead of batching it due to the small size of the dataset.

In [None]:
train_indices = np.arange(train_count)
train_nn_indices, _ = nbrs_lookup.get_batch_nns(train_indices)

In [None]:
(
    batch_crosswise_diffs,
    batch_pairwise_diffs,
    batch_targets,
    batch_nn_targets,
) = hierarchical_to_be_optimized.make_train_tensors(
    train_indices,
    train_nn_indices,
    train_features,
    train_responses,
)

As before, we need the prediction features (in this case, of the training batch) in order to evaluate the objective function.

In [None]:
batch_features = batch_features_tensor(train_features, train_indices)

In [None]:
from MuyGPyS.optimize import Bayes_optimize
from MuyGPyS.optimize.loss import lool_fn, mse_fn

In [None]:
hierarchical_optimized = Bayes_optimize(
    hierarchical_to_be_optimized,
    batch_targets,
    batch_nn_targets,
    batch_crosswise_diffs,
    batch_pairwise_diffs,
    batch_features=batch_features,
    loss_fn=mse_fn,
    verbose=True,
    random_state=1,
    init_points=5,
    n_iter=45,
)
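
To make the objective concrete, the following NumPy sketch evaluates a batch mean squared error for a flat model and minimizes it over a single `length_scale`. A plain grid search stands in for Bayesian optimization here, and all helper names (`rbf`, `batch_mse`) and the toy data are assumptions made for illustration, not MuyGPyS internals.

```python
import numpy as np

def rbf(dists, length_scale):
    return np.exp(-(dists**2) / (2.0 * length_scale**2))

def batch_mse(length_scale, xs, ys, nn_count=5, noise=1e-5):
    """Mean squared error of predicting each point from its neighbors."""
    dists = np.abs(xs[:, None] - xs[None, :])
    # Column 0 of the sorted indices is the point itself, so skip it.
    nn_indices = np.argsort(dists, axis=1)[:, 1 : nn_count + 1]
    nn_x = xs[nn_indices]
    Kcross = rbf(xs[:, None] - nn_x, length_scale)
    Kin = rbf(nn_x[:, :, None] - nn_x[:, None, :], length_scale)
    Kin = Kin + noise * np.eye(nn_count)
    # Kriging weights for every batch point at once.
    weights = np.linalg.solve(Kin, Kcross[..., None])[..., 0]
    predictions = np.sum(weights * ys[nn_indices], axis=-1)
    return np.mean((predictions - ys) ** 2)

xs = np.linspace(0.0, 1.0, 50)
ys = np.sinc(10.0 * xs - 5.0)

# Grid search over the same bounds used for the knot values.
candidates = np.linspace(0.1, 0.5, 9)
losses = np.array([batch_mse(ls, xs, ys) for ls in candidates])
best_length_scale = candidates[np.argmin(losses)]
print(best_length_scale, losses.min())
```

In the tutorial, `Bayes_optimize` plays the role of the grid search, and the hierarchical model replaces the single `length_scale` with one value per knot.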

`Bayes_optimize` returns a new MuyGPs object for which the knot values have been fixed to their optimal values. Note that we must pass the `batch_features` tensor, this time from the training data.

⚠️ _The next four cells are for testing purposes and should be removed prior to finalizing this tutorial._ ⚠️

The optimized model has set new knot values, which we can use to visualize the learned `length_scale` surface. 

In [None]:
hierarchical_optimized.kernel.deformation.length_scale.knot_values()

In [None]:
length_scale_curve_optimized = hierarchical_optimized.kernel.deformation.length_scale(xs)

In [None]:
knot_features.shape

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(7,5))
ax.set_title("Trained Hierarchical Length Scale Surface Over the Domain")
order = np.argsort(knot_features)
ax.plot(
    knot_features[order], 
    hierarchical_optimized.kernel.deformation.length_scale.knot_values()[order], 
    "*", 
    label="Knot Values")
ax.plot(xs, length_scale_curve_optimized, label="Interpolated Surface")
plt.legend()
plt.show()

We can now use the optimized kernel to predict the test responses.

In [None]:
Kcross_hierarchical_opt = hierarchical_optimized.kernel(test_crosswise_dists, batch_features=batch_test_features)
Kin_hierarchical_opt = hierarchical_optimized.kernel(test_pairwise_dists, batch_features=batch_test_features)
mean_hierarchical_opt = hierarchical_optimized.posterior_mean(
    Kin_hierarchical_opt, Kcross_hierarchical_opt, test_nn_targets
)
var_hierarchical_opt = hierarchical_optimized.posterior_variance(
    Kin_hierarchical_opt, Kcross_hierarchical_opt
)

We also optimize a flat model with the same batch for comparison.

In [None]:
flat_to_be_optimized = MuyGPS(
    kernel=RBF(
        deformation=Isotropy(
            l2,
            length_scale=Parameter("sample", [0.001, 5.0]),
        ),
    ),
    noise=HomoscedasticNoise(noise_prior),
)

In [None]:
flat_optimized = Bayes_optimize(
    flat_to_be_optimized,
    batch_targets,
    batch_nn_targets,
    batch_crosswise_diffs,
    batch_pairwise_diffs,
    batch_features=batch_features,
    loss_fn=mse_fn,
    verbose=True,
    random_state=1,
    init_points=5,
    n_iter=15,
)

In [None]:
Kcross_flat_opt = flat_optimized.kernel(test_crosswise_dists)
Kin_flat_opt = flat_optimized.kernel(test_pairwise_dists)
mean_flat_opt = flat_optimized.posterior_mean(
    Kin_flat_opt, Kcross_flat_opt, test_nn_targets
)
var_flat_opt = flat_optimized.posterior_variance(
    Kin_flat_opt, Kcross_flat_opt
)

And visualize the results.

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(8, 8))

for axi in axes:
    for ax in axi:
        ax.set_ylim([-1, 1.5])
#         ax.plot(xs, ys, label="truth")
        for knot in knot_features:
            ax.axvline(x=knot)
axes[0, 0].set_title("flat fixed")
axes[0, 0].plot(test_features, mean_flat_fixed, "-", label="flat fixed")
axes[0, 0].fill_between(
    np.ravel(test_features),
    np.ravel(mean_flat_fixed + np.sqrt(var_flat_fixed) * 1.96),
    np.ravel(mean_flat_fixed - np.sqrt(var_flat_fixed) * 1.96),
    facecolor="C1",
    alpha=0.2,
)
axes[0, 1].set_title("hierarchical fixed")
axes[0, 1].plot(test_features, mean_hierarchical_fixed, "-", label="hierarchical fixed")
axes[0, 1].fill_between(
    np.ravel(test_features),
    np.ravel(mean_hierarchical_fixed + np.sqrt(var_hierarchical_fixed) * 1.96),
    np.ravel(mean_hierarchical_fixed - np.sqrt(var_hierarchical_fixed) * 1.96),
    facecolor="C2",
    alpha=0.2,
)
axes[1, 0].set_title("flat optimized")
axes[1, 0].plot(test_features, mean_flat_opt, "-", label="flat optimized")
axes[1, 0].fill_between(
    np.ravel(test_features),
    np.ravel(mean_flat_opt + np.sqrt(var_flat_opt) * 1.96),
    np.ravel(mean_flat_opt - np.sqrt(var_flat_opt) * 1.96),
    facecolor="C3",
    alpha=0.2,
)
axes[1, 1].set_title("hierarchical optimized")
axes[1, 1].plot(test_features, mean_hierarchical_opt, "-", label="hierarchical optimized")
axes[1, 1].fill_between(
    np.ravel(test_features),
    np.ravel(mean_hierarchical_opt + np.sqrt(var_hierarchical_opt) * 1.96),
    np.ravel(mean_hierarchical_opt - np.sqrt(var_hierarchical_opt) * 1.96),
    facecolor="C3",
    alpha=0.2,
)
# The knot lines were already drawn on every panel in the loop above.
# Add a legend to every panel, not only the last one.
for ax in axes.flat:
    ax.legend()
plt.show()