Copyright 2021-2023 Lawrence Livermore National Security, LLC and other MuyGPyS
Project Developers. See the top-level COPYRIGHT file for details.

SPDX-License-Identifier: MIT

# Anisotropic Metric Tutorial

This notebook walks through a simple anisotropic regression workflow and illustrates anisotropic features of `MuyGPyS`.

In [None]:
import sys
for m in list(sys.modules.keys()):  # copy the keys so that popping is safe
    if m.startswith("Muy"):
        sys.modules.pop(m)
%env MUYGPYS_BACKEND=numpy
%env MUYGPYS_FTYPE=64

In [None]:
import numpy as np

from MuyGPyS._test.sampler import UnivariateSampler2D, print_results
from MuyGPyS.gp import MuyGPS
from MuyGPyS.gp.distortion import AnisotropicDistortion, IsotropicDistortion, l2
from MuyGPyS.gp.hyperparameter import ScalarHyperparameter
from MuyGPyS.gp.kernels import Matern
from MuyGPyS.gp.noise import HomoscedasticNoise
from MuyGPyS.gp.tensors import make_predict_tensors, make_train_tensors
from MuyGPyS.neighbors import NN_Wrapper
from MuyGPyS.optimize import optimize_from_tensors
from MuyGPyS.optimize.batch import sample_batch
from MuyGPyS.optimize.loss import lool_fn
from MuyGPyS.optimize.sigma_sq import muygps_sigma_sq_optim

We will set a random seed here for consistency when building docs.
In practice we would not fix a seed.

In [None]:
np.random.seed(0)

## Sampling a 2D Surface from a Conventional GP

This notebook will use a simple two-dimensional surface sampled from a conventional Gaussian process.
We will specify the domain as a simple grid over the two-dimensional feature space and divide the observations naively into train and test data.

Feel free to download the source notebook and experiment with different parameters.

First we specify the data size and the proportion of the train/test split.

In [None]:
points_per_dim = 60
train_ratio = 0.2

We will assume that the true data is produced with no noise, so we specify a very small noise prior for numerical stability.
This is an idealized experiment with effectively no instrument error.

In [None]:
nugget_noise = HomoscedasticNoise(1e-14)

We will perturb our simulated observations (the training data) with some i.i.d. Gaussian measurement noise.

In [None]:
measurement_noise = HomoscedasticNoise(1e-7)

Finally, we will specify the `nu` kernel hyperparameter.
`nu` determines how smooth the GP prior is.
The larger `nu` grows, the smoother sampled functions will become.

In [None]:
sim_nu = ScalarHyperparameter(1.5)
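
To make the role of `nu` concrete, the sketch below evaluates the well-known closed forms of the Matérn correlation for `nu` in {0.5, 1.5, 2.5} using plain NumPy.
The function name `matern_closed_form` is hypothetical and this is not the `MuyGPyS` implementation; it assumes unit variance and a distance `d` that has already been divided by the length scale.

```python
import numpy as np


def matern_closed_form(d, nu):
    """Matern correlation at scaled distance d >= 0 for nu in {0.5, 1.5, 2.5}."""
    d = np.asarray(d, dtype=float)
    if nu == 0.5:
        return np.exp(-d)
    if nu == 1.5:
        r = np.sqrt(3.0) * d
        return (1.0 + r) * np.exp(-r)
    if nu == 2.5:
        r = np.sqrt(5.0) * d
        return (1.0 + r + r**2 / 3.0) * np.exp(-r)
    raise ValueError("this sketch only covers nu in {0.5, 1.5, 2.5}")


distances = np.linspace(0.0, 2.0, 5)
for nu in (0.5, 1.5, 2.5):
    # Each row is the correlation as a function of distance for one nu.
    print(nu, np.round(matern_closed_form(distances, nu), 4))
```

Near `d = 0` the correlation for larger `nu` is flatter, which is what makes sampled functions smoother.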

We will use an anisotropic distance metric, where displacements along different dimensions are weighted differently.
Each dimension has a corresponding `length_scale` parameter.

In [None]:
sim_length_scale0 = ScalarHyperparameter(0.1)
sim_length_scale1 = ScalarHyperparameter(0.5)
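
As a minimal sketch of what these two values imply, the snippet below computes an anisotropic distance with plain NumPy.
It assumes that the anisotropic metric divides each coordinate difference by its own length scale before taking the l2 norm; the helper `anisotropic_l2` is hypothetical and is not part of the library.

```python
import numpy as np


def anisotropic_l2(diffs, length_scales):
    """l2 norm of difference vectors after per-dimension division by length scales."""
    scaled = np.asarray(diffs, dtype=float) / np.asarray(length_scales, dtype=float)
    return np.sqrt(np.sum(scaled**2, axis=-1))


length_scales = np.array([0.1, 0.5])  # same values as the simulation length scales
step_dim0 = np.array([0.1, 0.0])  # displacement of 0.1 along dimension 0
step_dim1 = np.array([0.0, 0.1])  # displacement of 0.1 along dimension 1

dist_dim0 = anisotropic_l2(step_dim0, length_scales)
dist_dim1 = anisotropic_l2(step_dim1, length_scales)
print(dist_dim0, dist_dim1)
```

Under this assumption the same step of 0.1 counts as five times farther along dimension 0 than along dimension 1, so correlation decays faster along dimension 0.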

We use all of these parameters to define a Matérn kernel GP and a sampler for convenience.
The `UnivariateSampler2D` class is a convenience class for this tutorial, and is not a part of the library.
We will use an anisotropic distance metric for our kernel.

In [None]:
sampler = UnivariateSampler2D(
    points_per_dim=points_per_dim,
    train_ratio=train_ratio,
    kernel=Matern(
        nu=sim_nu,
        metric=AnisotropicDistortion(
            l2,
            length_scale0=sim_length_scale0,
            length_scale1=sim_length_scale1,
        ),
    ),
    eps=nugget_noise,
    measurement_eps=measurement_noise,
)

Finally, we will sample a surface from this GP prior and visualize it.
Note that we perturb the train responses (the values that our model will actually receive) with Gaussian measurement noise.
Further note that this is not especially fast, as sampling from a conventional Gaussian process requires computing the Cholesky decomposition of a `(data_count, data_count)` matrix.

In [None]:
train_features, test_features = sampler.features()

In [None]:
train_responses, test_responses = sampler.sample()

In [None]:
sampler.plot_sample()

We can observe that our choice of anisotropy has caused the globular features in the sampled surface to "smear" along the axis with the larger `length_scale`, where correlations decay more slowly.
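
The Cholesky-based sampling mentioned above can be sketched on a much smaller grid.
This is not how `UnivariateSampler2D` is implemented; it is a NumPy-only illustration that assumes a unit-variance Matérn kernel with `nu = 1.5`, length scales of 0.1 and 0.5, and a small nugget chosen here for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small 2D grid; the tutorial uses a finer one.
side = np.linspace(0.0, 1.0, 15)
grid = np.stack(np.meshgrid(side, side, indexing="ij"), axis=-1).reshape(-1, 2)

# Scaled pairwise distances: divide each dimension by its length scale.
length_scales = np.array([0.1, 0.5])
diffs = (grid[:, None, :] - grid[None, :, :]) / length_scales
dists = np.sqrt(np.sum(diffs**2, axis=-1))

# Matern nu=1.5 covariance plus a small nugget on the diagonal.
r = np.sqrt(3.0) * dists
cov = (1.0 + r) * np.exp(-r) + 1e-10 * np.eye(grid.shape[0])

# The Cholesky factor satisfies cov = chol @ chol.T, so chol @ z has covariance cov.
chol = np.linalg.cholesky(cov)
sample = (chol @ rng.standard_normal(grid.shape[0])).reshape(15, 15)

# Mean absolute change between adjacent grid points along each dimension.
rough_dim0 = np.mean(np.abs(np.diff(sample, axis=0)))
rough_dim1 = np.mean(np.abs(np.diff(sample, axis=1)))
print(rough_dim0, rough_dim1)
```

We would expect the sample to change more between neighbors along dimension 0, which has the smaller length scale, than along dimension 1.
The factorization costs cubic time in the number of grid points, which is why sampling the full tutorial surface is slow.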

## Training an Anisotropic Model

We will not belabor the details covered in the [Univariate Regression Tutorial](./univariate_regression_tutorial.ipynb).
We must similarly construct a nearest neighbors index and sample a training batch in order to optimize a model.

⚠️ For now, we use isotropic nearest neighbors as we do not have a guess as to the anisotropic scaling. Future versions of the library will use learned anisotropy to modify neighborhood structure during optimization. 

In [None]:
nn_count = 30
nbrs_lookup = NN_Wrapper(train_features, nn_count, nn_method="exact", algorithm="ball_tree")
batch_count = sampler.train_count
batch_indices, batch_nn_indices = sample_batch(
    nbrs_lookup, batch_count, sampler.train_count
)
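
For intuition, an exact isotropic nearest neighbors query can be sketched by brute force in NumPy.
The helper `exact_nn_indices` and the toy arrays are hypothetical; `NN_Wrapper` relies on dedicated index structures rather than this quadratic-memory approach.

```python
import numpy as np


def exact_nn_indices(queries, data, nn_count):
    """Return the indices of the nn_count closest rows of data for each query row."""
    diffs = queries[:, None, :] - data[None, :, :]
    sq_dists = np.sum(diffs**2, axis=-1)  # isotropic squared l2 distances
    return np.argsort(sq_dists, axis=1)[:, :nn_count]


rng = np.random.default_rng(0)
toy_features = rng.uniform(size=(200, 2))  # stand-in for train_features
toy_queries = rng.uniform(size=(5, 2))

toy_nn_indices = exact_nn_indices(toy_queries, toy_features, nn_count=30)
print(toy_nn_indices.shape)  # (5, 30)
```

Because the distances here are isotropic, both dimensions contribute equally to the neighborhoods regardless of the true length scales.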

We construct a `MuyGPS` object with a Matérn kernel.
For simplicity, we will fix `nu` and attempt to optimize the two `length_scale`s.

In [None]:
exp_length_scale0 = ScalarHyperparameter("log_sample", (0.01, 1.0))
exp_length_scale1 = ScalarHyperparameter("log_sample", (0.01, 1.0))
muygps_anisotropic = MuyGPS(
    kernel=Matern(
        nu=sim_nu,
        metric=AnisotropicDistortion(
            l2,
            length_scale0=exp_length_scale0,
            length_scale1=exp_length_scale1,
        ),
    ),
    eps=measurement_noise,
)

⚠️ We will also create a fixed muygps object with the hyperparameters that we used for simulation, as well as an isotropic muygps that will be optimized for comparison.

In [None]:
muygps_fixed = MuyGPS(
    kernel=Matern(
        nu=sim_nu,
        metric=AnisotropicDistortion(
            l2,
            length_scale0=sim_length_scale0,
            length_scale1=sim_length_scale1,
        ),
    ),
    eps=measurement_noise,
)
muygps_isotropic = MuyGPS(
    kernel=Matern(
        nu=sim_nu,
        metric=IsotropicDistortion(
            l2,
            length_scale=ScalarHyperparameter("log_sample", (0.01, 1.0)),
        ),
    ),
    eps=measurement_noise,
)

We build our difference tensors as usual and use Bayesian optimization.

In [None]:
(
    batch_crosswise_diffs,
    batch_pairwise_diffs,
    batch_targets,
    batch_nn_targets,
) = make_train_tensors(
    batch_indices,
    batch_nn_indices,
    train_features,
    train_responses,
)
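
The sketch below illustrates the structure of difference tensors with toy data.
The shapes `(batch_count, nn_count, feature_count)` and `(batch_count, nn_count, nn_count, feature_count)` reflect our understanding of what `make_train_tensors` returns and should be read as an assumption; the toy arrays and neighbor indices are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_features = rng.uniform(size=(100, 2))
toy_batch_indices = np.arange(8)  # 8 batch points
toy_nn_indices = rng.integers(0, 100, size=(8, 5))  # 5 placeholder neighbors each

batch_points = toy_features[toy_batch_indices]  # (8, 2)
nn_points = toy_features[toy_nn_indices]  # (8, 5, 2)

# Differences between each batch point and each of its neighbors.
crosswise = batch_points[:, None, :] - nn_points  # (8, 5, 2)
# Differences among the neighbors of each batch point.
pairwise = nn_points[:, :, None, :] - nn_points[:, None, :, :]  # (8, 5, 5, 2)

print(crosswise.shape, pairwise.shape)
```

Keeping the differences separate per feature dimension, rather than collapsing them to scalar distances, lets the optimizer rescale each dimension by a candidate `length_scale` without recomputing the tensors.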

Keyword arguments for the optimization:

In [None]:
opt_kwargs = {
    "loss_fn": lool_fn,
    "obj_method": "loo_crossval",
    "opt_method": "bayesian",
    "verbose": True,
    "random_state": 1,
    "init_points": 5,
    "n_iter": 30,
    "allow_duplicate_points": True,
}
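
The `lool_fn` loss is a leave-one-out likelihood.
As a hedged illustration, the sketch below scores predictions with a Gaussian negative log-likelihood up to constants, which we assume is close in form to what the library computes; `loo_gaussian_loss` is a hypothetical name and the numbers are made up.

```python
import numpy as np


def loo_gaussian_loss(predictions, targets, variances):
    """Sum of squared residuals scaled by variance, plus log variances."""
    return np.sum((predictions - targets) ** 2 / variances + np.log(variances))


targets = np.array([0.0, 1.0, -1.0])
predictions = np.array([0.1, 0.8, -1.3])

# The same predictions scored under three levels of claimed uncertainty.
overconfident = loo_gaussian_loss(predictions, targets, np.full(3, 1e-4))
calibrated = loo_gaussian_loss(predictions, targets, np.full(3, 0.05))
underconfident = loo_gaussian_loss(predictions, targets, np.full(3, 10.0))
print(overconfident, calibrated, underconfident)
```

Unlike a plain mean squared error, a loss of this form penalizes variances that are too small or too large, so it is sensitive to hyperparameters that mostly affect uncertainty.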

In [None]:
muygps_anisotropic = optimize_from_tensors(
    muygps_anisotropic,
    batch_targets,
    batch_nn_targets,
    batch_crosswise_diffs,
    batch_pairwise_diffs,
    **opt_kwargs,
)

In [None]:
print("BayesianOptimization finds an optimal pair of length scales")
print(f"\tlength_scale0 is {muygps_anisotropic.kernel.distortion_fn.length_scale['length_scale0']()}")
print(f"\tlength_scale1 is {muygps_anisotropic.kernel.distortion_fn.length_scale['length_scale1']()}")

Note that the returned values might differ somewhat from those used to sample the surface, because the two length scales and the `sigma_sq` parameter are mutually unidentifiable.
However, `length_scale0 < length_scale1` as expected.

⚠️ Here we optimize the isotropic benchmark.

In [None]:
muygps_isotropic = optimize_from_tensors(
    muygps_isotropic,
    batch_targets,
    batch_nn_targets,
    batch_crosswise_diffs,
    batch_pairwise_diffs,
    **opt_kwargs,
)

In [None]:
print(f"BayesianOptimization finds that the optimal isotropic length scale is {muygps_isotropic.kernel.distortion_fn.length_scale()}")

In [None]:
muygps_anisotropic = muygps_sigma_sq_optim(
    muygps_anisotropic, 
    batch_pairwise_diffs, 
    batch_nn_targets, 
    sigma_method="analytic"
)
print(f"Optimized anisotropic sigma_sq: {muygps_anisotropic.sigma_sq()}")

⚠️ We similarly optimize `sigma_sq` for the isotropic comparison.

In [None]:
muygps_isotropic = muygps_sigma_sq_optim(
    muygps_isotropic, 
    batch_pairwise_diffs, 
    batch_nn_targets, 
    sigma_method="analytic"
)
print(f"Optimized isotropic sigma_sq: {muygps_isotropic.sigma_sq()}")
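
A closed-form `sigma_sq` estimate can be sketched as the average of `y^T K^{-1} y / nn_count` over all batch neighborhoods.
We assume this matches the spirit of `sigma_method="analytic"`, but the helper `analytic_sigma_sq`, the toy kernel, and the nugget below are illustrative choices rather than the library's code.

```python
import numpy as np


def analytic_sigma_sq(K, nn_targets, eps=0.0):
    """Average of y^T (K + eps I)^{-1} y / nn_count over all neighborhoods."""
    batch_count, nn_count = nn_targets.shape
    K_reg = K + eps * np.eye(nn_count)
    solves = np.linalg.solve(K_reg, nn_targets[:, :, None])[:, :, 0]
    return np.sum(nn_targets * solves) / (batch_count * nn_count)


rng = np.random.default_rng(0)
batch_count, nn_count, true_sigma_sq = 500, 10, 4.0

# Unit-variance Matern nu=1.5 kernels on random 1D neighborhoods.
x = rng.uniform(size=(batch_count, nn_count))
r = np.sqrt(3.0) * np.abs(x[:, :, None] - x[:, None, :]) / 0.2
K = (1.0 + r) * np.exp(-r)

# Draw responses whose true covariance is true_sigma_sq * (K + nugget).
chol = np.linalg.cholesky(K + 1e-6 * np.eye(nn_count))
z = rng.standard_normal((batch_count, nn_count, 1))
y = np.sqrt(true_sigma_sq) * (chol @ z)[:, :, 0]

estimate = analytic_sigma_sq(K, y, eps=1e-6)
print(estimate)
```

In this toy setting the estimate should land close to the scale of 4.0 used to generate the responses.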

## Inference

As in the [Univariate Regression Tutorial](./univariate_regression_tutorial.ipynb), we must realize difference tensors formed from the testing data and apply them to form Gaussian process predictions for our problem.

In [None]:
test_count, _ = test_features.shape
indices = np.arange(test_count)
test_nn_indices, _ = nbrs_lookup.get_nns(test_features)
(
    test_crosswise_diffs,
    test_pairwise_diffs,
    test_nn_targets,
) = make_predict_tensors(
    indices,
    test_nn_indices,
    test_features,
    train_features,
    train_responses,
)
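
The predictions computed next follow the standard Gaussian process posterior equations, applied independently to each test point and its neighborhood.
The sketch below writes them out in NumPy for a unit-variance kernel, before any scaling by `sigma_sq`; `local_posterior` is a hypothetical helper and not the library's implementation.

```python
import numpy as np


def local_posterior(K, Kcross, nn_targets, eps=0.0):
    """Per-point posterior mean and unscaled variance from neighborhood kernels.

    K: (batch, nn, nn), Kcross: (batch, nn), nn_targets: (batch, nn).
    """
    nn_count = K.shape[-1]
    K_reg = K + eps * np.eye(nn_count)
    # weights[i] solves K_reg[i] @ w = Kcross[i].
    weights = np.linalg.solve(K_reg, Kcross[:, :, None])[:, :, 0]
    mean = np.sum(weights * nn_targets, axis=1)
    variance = 1.0 - np.sum(weights * Kcross, axis=1)
    return mean, variance


# One neighborhood of three 1D points under a Matern nu=1.5 kernel.
x = np.array([[0.0, 0.3, 0.7]])
r = np.sqrt(3.0) * np.abs(x[:, :, None] - x[:, None, :])
K = (1.0 + r) * np.exp(-r)
Kcross = K[:, :, 1]  # query placed exactly on the second neighbor
y = np.array([[1.0, -2.0, 0.5]])

mean, variance = local_posterior(K, Kcross, y)
print(mean, variance)
```

With no noise term and a query that coincides with a neighbor, the posterior mean reproduces that neighbor's response and the variance collapses to approximately zero.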

As before we will evaluate our prediction performance in terms of RMSE, mean diagonal posterior variance, the mean 95% confidence interval size, and the coverage, which ideally should be near 95%. 

In [None]:
Kcross_anisotropic = muygps_anisotropic.kernel(test_crosswise_diffs)
K_anisotropic = muygps_anisotropic.kernel(test_pairwise_diffs)

predictions_anisotropic = muygps_anisotropic.posterior_mean(
    K_anisotropic, Kcross_anisotropic, test_nn_targets
)
variances_anisotropic = muygps_anisotropic.posterior_variance(
    K_anisotropic, Kcross_anisotropic
)

confidence_intervals_anisotropic = np.sqrt(variances_anisotropic) * 1.96
coverage_anisotropic = (
    np.count_nonzero(
        np.abs(test_responses - predictions_anisotropic) < confidence_intervals_anisotropic
    ) / test_count
)
print_results(
    "anisotropic", test_responses, predictions_anisotropic, variances_anisotropic, confidence_intervals_anisotropic, coverage_anisotropic
)
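
The metrics reported above can be reproduced with a few lines of NumPy.
The function `summarize` below is a hypothetical stand-in for the tutorial's `print_results` helper, and the synthetic data is drawn so that the claimed variances are exactly right.

```python
import numpy as np


def summarize(targets, predictions, variances, z=1.96):
    """Return RMSE, mean variance, mean CI half-width, and empirical coverage."""
    residuals = np.abs(targets - predictions)
    half_widths = z * np.sqrt(variances)
    return {
        "rmse": float(np.sqrt(np.mean(residuals**2))),
        "mean_variance": float(np.mean(variances)),
        "mean_ci": float(np.mean(half_widths)),
        "coverage": float(np.mean(residuals < half_widths)),
    }


rng = np.random.default_rng(0)
toy_variances = np.full(20000, 0.25)
toy_predictions = np.zeros(20000)
# Targets drawn with exactly the claimed variance, so coverage should be near 0.95.
toy_targets = rng.normal(0.0, np.sqrt(toy_variances))

toy_summary = summarize(toy_targets, toy_predictions, toy_variances)
print(toy_summary)
```

Coverage well below 95% suggests that the model's variances are too small, while coverage near 100% suggests that they are too large.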

⚠️ We also evaluate the fixed and isotropic models.

In [None]:
Kcross_fixed = muygps_fixed.kernel(test_crosswise_diffs)
K_fixed = muygps_fixed.kernel(test_pairwise_diffs)

predictions_fixed = muygps_fixed.posterior_mean(
    K_fixed, Kcross_fixed, test_nn_targets
)
variances_fixed = muygps_fixed.posterior_variance(
    K_fixed, Kcross_fixed
)

confidence_intervals_fixed = np.sqrt(variances_fixed) * 1.96
coverage_fixed = (
    np.count_nonzero(
        np.abs(test_responses - predictions_fixed) < confidence_intervals_fixed
    ) / test_count
)
print_results(
    "fixed anisotropic", test_responses, predictions_fixed, variances_fixed, confidence_intervals_fixed, coverage_fixed
)

In [None]:
Kcross_isotropic = muygps_isotropic.kernel(test_crosswise_diffs)
K_isotropic = muygps_isotropic.kernel(test_pairwise_diffs)

predictions_isotropic = muygps_isotropic.posterior_mean(K_isotropic, Kcross_isotropic, test_nn_targets)
variances_isotropic = muygps_isotropic.posterior_variance(K_isotropic, Kcross_isotropic)

confidence_intervals_isotropic = np.sqrt(variances_isotropic) * 1.96
coverage_isotropic = (
    np.count_nonzero(
        np.abs(test_responses - predictions_isotropic) < confidence_intervals_isotropic
    ) / test_count
)
print_results(
    "isotropic", test_responses, predictions_isotropic, variances_isotropic, confidence_intervals_isotropic, coverage_isotropic
)

This dataset is low-dimensional so we can plot our predictions and visually evaluate their performance. 
We plot below the expected (true) surface, and the surface that our model predicts.
Note that they are visually similar and major trends are captured, although there are some differences. 

In [None]:
sampler.plot_predictions(predictions_anisotropic)

We will also plot information about the errors.
Below we produce three plots that help us to understand our results.
The left plot shows the residual, which is the difference between the true values and our expectations.
The middle plot shows the magnitude of the 95% confidence interval.
The larger the confidence interval, the less certain the model is of its predictions.
Finally, the right plot shows the magnitude of the residual minus the 95% confidence interval length.
All of the points larger than zero (in red) are not captured by the confidence interval.
Hence, this plot shows our coverage.

In [None]:
sampler.plot_errors(
    "Anisotropic",
    predictions_anisotropic,
    confidence_intervals_anisotropic,
    "Fixed",
    predictions_fixed,
    confidence_intervals_fixed,
    "Isotropic",
    predictions_isotropic,
    confidence_intervals_isotropic,
)