## **Gaussian Process Campaign with Functional Data**

Many problems involve *functional* data, where the inputs, the outputs, or both are sampled along some dimension (e.g. time). In `twinlab`, data are presented in column-feature format, so a single functional sample may span hundreds or thousands of columns.

Gaussian Process models do not scale well to such high-dimensional inputs and outputs, so `twinlab` provides the ability to perform dimensionality reduction before model fitting.

This notebook will cover:
- How to decompose functional inputs and outputs

In [None]:
# Third party imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from itertools import product

# twinLab import
import twinlab as tl

### **Problem Formulation**
Here, we define a problem with two input dimensions and one functional output defined over a grid of sample locations.
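
For reference, the model implemented in the code cell below can be written as

$$
y(t; a, b) = (a\,t - 2)^2 \, \sin(b\,t - 4), \qquad t \in [0, 1],
$$

where $a$ = `x1` is drawn from $[4, 8]$, $b$ = `x2` is drawn from $[10, 14]$, and $t$ runs over the 50 grid points returned by `np.linspace(0, 1)`. The symbols $t$, $a$ and $b$ are notation introduced here for illustration; setting $a = 6$ and $b = 12$ recovers the standard Forrester function.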

In [None]:
# Grid over which the functional output is defined (50 points by default)
grid = np.linspace(0, 1)

# True function: Forrester-type function with variable coefficients a = x[0] and b = x[1]
def model(x):
    return (x[0] * grid - 2) ** 2 * np.sin(x[1] * grid - 4)

# Define input data: 100 samples, with x1 uniform on [4, 8] and x2 uniform on [10, 14]
x = np.random.uniform(size=(100, 2))
x[:, 0] = x[:, 0] * 4 + 4
x[:, 1] = x[:, 1] * 4 + 10

# Compute output data
y = np.zeros((x.shape[0], grid.size))

for i, x_i in enumerate(x):
    y[i, :] = model(x_i)

# Convert to one named column per grid point (y_0, y_1, ...)
y = {"y_{}".format(i): y[:, i] for i in range(grid.size)}

# Save to DataFrame
df = pd.DataFrame({"x1": x[:, 0], "x2": x[:, 1], **y})
df.head()

In [None]:
# Define the name of the dataset
dataset_id = "FunctionalGP_Data"

# Upload the dataset to the cloud
tl.upload_dataset(df, dataset_id, verbose=True)

### **Functional Campaign Workflow**

In `twinlab`, dimensionality reduction is implemented as truncated Singular Value Decomposition (tSVD) and is enabled via the convenience keyword parameters `decompose_inputs` and `decompose_outputs`. The proportion of variance retained by the truncation is controlled by the keywords `input_explained_variance` and `output_explained_variance`.

One can decompose the inputs, outputs, or both in the same `Campaign`.
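
To illustrate the idea behind truncating by explained variance, here is a minimal NumPy sketch. It is not `twinlab`'s internal implementation: the helper `truncated_svd` is hypothetical, and it assumes that "explained variance" means the cumulative share of squared singular values of the uncentred data matrix, which may differ from what the library does.

```python
import numpy as np


def truncated_svd(Y, explained_variance):
    """Keep the fewest singular components whose cumulative share of
    squared singular values reaches `explained_variance` (assumed definition)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    # First index at which the cumulative ratio reaches the threshold
    k = int(np.searchsorted(ratio, explained_variance)) + 1
    k = min(k, s.size)
    scores = U[:, :k] * s[:k]  # reduced representation, one row per sample
    return scores, Vt[:k]      # Vt[:k] maps scores back onto the grid


# Synthetic data mirroring the problem formulation in this notebook
rng = np.random.default_rng(0)
grid = np.linspace(0, 1)
a = rng.uniform(4, 8, size=100)
b = rng.uniform(10, 14, size=100)
Y = (a[:, None] * grid - 2) ** 2 * np.sin(b[:, None] * grid - 4)

scores, components = truncated_svd(Y, 0.99999)
Y_rebuilt = scores @ components

print("Retained components:", scores.shape[1], "of", grid.size)
print("Max reconstruction error:", np.abs(Y - Y_rebuilt).max())
```

In this sketch a model would be fitted to the columns of `scores` rather than to every grid point, and predictions would be mapped back to the grid by multiplying with `components`.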

In [None]:
# Initialise campaign
campaign_id = "FunctionalGP"

campaign_params = {
    "dataset_id": dataset_id,                   # Points the campaign to the uploaded dataset
    "inputs": list(df.columns[:2]),             # Input columns (x1, x2), taken from the dataset's column headers
    "outputs": list(df.columns[2:]),            # Output columns (y_0, y_1, ...)
    "test_train_ratio": 0.75,                   # Use 75% of the data for training and 25% for testing
    "estimator": "gaussian_process_regression", # Fit a Gaussian Process regression model
    "decompose_inputs": False,                  # Whether to reduce input dimensions
    "input_explained_variance": 0.99,           # Keep 99% of the variance if inputs are decomposed
    "decompose_outputs": True,                  # Whether to reduce output dimensions
    "output_explained_variance": 0.99999,       # Keep 99.999% of the variance in the outputs
}

# Start a new campaign and train a surrogate model
tl.train_campaign(campaign_params, campaign_id, verbose=True)

In [None]:
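# Grid over which the functional output is defined (matches the training grid)
grid = np.linspace(0, 1)

# Test inputs: all combinations of these x1 and x2 values
x1 = [4, 6, 8]
x2 = [10, 12, 14]

X = np.array(list(product(x1, x2)))
ax_i = list(product([0, 1, 2], [0, 1, 2]))  # (row, column) index of each subplot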

# Create a 3x3 grid of output plots
fig, axes = plt.subplots(figsize=(14, 12), nrows=3, ncols=3)

# Set up legend handles
legend_labels = {}

for i, x_i in enumerate(X):
    r, c = ax_i[i]
    ax = axes[r, c]

    X_test = pd.DataFrame(x_i[np.newaxis, :], columns=["x1", "x2"])
    y_test = model(X_test.values.flatten())

    y_mean, y_stdev = tl.predict_campaign(X_test, campaign_id)
    y_mean = y_mean.values
    y_stdev = y_stdev.values

    ax.set_title("a = {}, b = {}".format(x_i[0], x_i[1]))
    ax.plot(grid, y_mean.flatten(), c="k")
    ax.plot(grid, y_test.flatten(), c="red", linestyle="dashed")
    ax.fill_between(
        grid,
        (y_mean - 1.96 * y_stdev).flatten(),
        (y_mean + 1.96 * y_stdev).flatten(),
        color="k",
        alpha=0.1,
    )

    ax.set_xlabel("x")
    ax.set_ylabel("y")

# Store labels for the legend
legend_labels["Prediction"] = plt.Line2D([0], [0], color="k")
legend_labels["True Function"] = plt.Line2D([0], [0], color="red", linestyle="dashed")

# Add legend to the figure
fig.legend(legend_labels.values(), legend_labels.keys(), loc="upper left")
fig.tight_layout()
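
The shaded band in each panel is the predicted mean plus or minus 1.96 standard deviations, which corresponds to an approximate 95% interval if the predictive distribution is Gaussian. As a rough numerical complement to the plots, the sketch below computes the fraction of true values that fall inside such a band. The helper `interval_coverage` is hypothetical and is not part of `twinlab`, and the arrays are synthetic stand-ins rather than campaign predictions.

```python
import numpy as np


def interval_coverage(y_true, y_mean, y_stdev, z=1.96):
    """Fraction of points where y_true lies within y_mean +/- z * y_stdev."""
    lower = y_mean - z * y_stdev
    upper = y_mean + z * y_stdev
    inside = (y_true >= lower) & (y_true <= upper)
    return float(np.mean(inside))


# Synthetic stand-ins for a true curve and a noisy prediction of it
rng = np.random.default_rng(0)
y_true = np.sin(np.linspace(0, 1))
y_stdev = np.full_like(y_true, 0.1)
y_mean = y_true + rng.normal(scale=0.1, size=y_true.size)

print("Empirical coverage:", interval_coverage(y_true, y_mean, y_stdev))
```

With the notebook's own variables, the same function could be called as `interval_coverage(y_test.flatten(), y_mean.flatten(), y_stdev.flatten())` inside the plotting loop.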

In [None]:
# Delete campaign and dataset
tl.delete_campaign(campaign_id)

tl.delete_dataset(dataset_id)