# More Scalable GPs

In this notebook I demonstrate a few implementations of more advanced, scalable GPs. These methods approximate the full GP using inducing inputs: a small set of $M$ points chosen to represent the entire dataset. This lets the GP algorithms run in $\mathcal{O}(NM^2)$ time instead of $\mathcal{O}(N^3)$, where $N$ is the number of samples and $M$ is the number of inducing inputs (with $M \ll N$).

The algorithms to be used below are as follows:
* Sparse GP - FITC Approximation (**TODO**)
* Sparse GP - VFE Approximation (**Done**)
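
To make the $\mathcal{O}(NM^2)$ claim concrete, here is a minimal NumPy sketch of a sparse predictive mean. It uses the Subset of Regressors (SoR) formula, a simpler relative of the FITC and VFE approximations listed above, and it is not the `SVGP` class used later. The toy 1D data, the `rbf` helper, the fixed hyperparameters, and the evenly spaced inducing inputs are all assumptions made for illustration.

```python
import numpy as np


def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between the rows of a and b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


rng = np.random.default_rng(0)
N, M, noise_var = 500, 15, 0.1  # illustrative sizes, with M much smaller than N

# Toy 1D regression problem (assumed; not the notebook's dataset)
X = rng.uniform(-3.0, 3.0, size=(N, 1))
y = np.sin(X[:, 0]) + np.sqrt(noise_var) * rng.standard_normal(N)
Z = np.linspace(-3.0, 3.0, M)[:, None]       # inducing inputs on a grid
X_new = np.linspace(-3.0, 3.0, 50)[:, None]  # points to predict at

# Sparse (SoR) predictive mean: K_*m (s^2 K_mm + K_mn K_nm)^{-1} K_mn y
K_mm = rbf(Z, Z) + 1e-6 * np.eye(M)          # small jitter for stability
K_mn = rbf(Z, X)
A = noise_var * K_mm + K_mn @ K_mn.T         # M x M; forming it costs O(N M^2)
weights = np.linalg.solve(A, K_mn @ y)       # O(M^3) solve
mean_sparse = rbf(X_new, Z) @ weights

# Full GP predictive mean for comparison: needs an N x N solve, O(N^3)
K_nn = rbf(X, X) + noise_var * np.eye(N)
mean_full = rbf(X_new, X) @ np.linalg.solve(K_nn, y)

max_gap = np.max(np.abs(mean_sparse - mean_full))
print(f"max |sparse - full| on the grid: {max_gap:.2e}")
```

The only matrix that is ever inverted on the sparse path is $M \times M$, which is where the savings come from. In a real model the inducing inputs and kernel hyperparameters would be optimized rather than fixed as they are here.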

In [24]:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.base import clone
from sklearn.decomposition import PCA
from sklearn.multioutput import MultiOutputRegressor
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import time

#### Import Module

Here you need to add the path to the project that contains the models so that they can be imported. There are automated ways to do this, but for a simple case you can manually add the directory whenever you start the notebook, as in the cell below.

In [25]:
import sys

# Add the path to the models
sys.path.insert(0, '/Users/eman/Documents/code_projects/ml4ocean')

# Import the GP Functions
from src.models.gpy import SGP, SVGP
import GPy

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
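
One possible way to automate the path setup is to walk up from the current working directory until a folder containing the package is found. The sketch below assumes the project root is identified by a `src` directory; `find_project_root` and the `marker` argument are hypothetical names introduced for illustration and are not part of the project.

```python
import sys
from pathlib import Path


def find_project_root(start, marker="src"):
    """Return the first directory at or above `start` that contains `marker`.

    Returns None if no such directory exists.
    """
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_dir():
            return candidate
    return None


# Add the project root to the import path, if one was found
root = find_project_root(Path.cwd())
if root is not None and str(root) not in sys.path:
    sys.path.insert(0, str(root))
```

This avoids hard-coding a user-specific absolute path, at the cost of depending on where the notebook is launched from.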


In [26]:
# Make Fake Dataset
X, y = make_regression(
    n_samples=5000, 
    n_features=10,    # Total Features
    n_informative=3,   # Informative Features 
    n_targets=10,
    bias=10,
    noise=0.8,
    random_state=123,
)
train_size = 2000

# Training and Testing
xtrain, xtest, ytrain, ytest = train_test_split(
    X, y, train_size=train_size, random_state=123
)

xtrain.shape, ytrain.shape



((2000, 10), (2000, 10))

### Sparse Variational GP

In [30]:
# Define Kernel Function
input_dimensions = X.shape[1]
kernel = GPy.kern.RBF(
    input_dim=input_dimensions, 
    ARD=False
)

# define GP model
n_inducing = 100
gp_model = SVGP(
    kernel=kernel,
    n_inducing=n_inducing,
    max_iters=300, 
    optimizer='lbfgs',
    verbose=True,
    n_restarts=3
)


# train GP Model
# print(xtrain, ytrain)
t0 = time.time()
gp_model.fit(xtrain, ytrain)
t1 = time.time() - t0

# Predictions
ypred, ystd = gp_model.predict(xtest, return_std=True)



Optimization restart 1/3, f = 26394.87495869398
Optimization restart 2/3, f = 26847.46038106084
Optimization restart 3/3, f = 26004.22769281268


In [31]:
gp_model.display_model()

sparse_gp.,value,constraints,priors
inducing inputs,"(100, 10)",,
rbf.variance,5568539.0740072485,+ve,
rbf.lengthscale,321.43264291430154,+ve,
Gaussian_noise.variance,0.6690238780760511,+ve,


In [32]:
# Get Stats (scikit-learn metrics expect y_true first, then y_pred)
mae = mean_absolute_error(ytest, ypred)
mse = mean_squared_error(ytest, ypred)
rmse = np.sqrt(mse)
r2 = r2_score(ytest, ypred)

print(
    f"MAE: {mae:.3f}\nMSE: {mse:.3f}\nRMSE: {rmse:.3f}\nR2: {r2:.3f}" 
    f" \nTime: {t1:.3} seconds"
)

MAE: 0.641
MSE: 0.643
RMSE: 0.802
R2: 1.000 
Time: 1.44e+02 seconds
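
A short note on argument order: scikit-learn's metric functions are documented as taking `(y_true, y_pred)`. MAE and MSE are symmetric in their arguments, but $R^2$ is not, because it normalizes by the variance of whatever is passed first. The tiny example below uses made-up numbers purely to illustrate the difference.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up values for illustration only
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([2.0, 2.0, 3.0, 3.0])

# MSE does not care about the order of its arguments
mse_a = mean_squared_error(y_true, y_pred)
mse_b = mean_squared_error(y_pred, y_true)

# R2 does: it divides by the spread of the first argument
r2_a = r2_score(y_true, y_pred)  # 1 - 2/5 = 0.6
r2_b = r2_score(y_pred, y_true)  # 1 - 2/1 = -1.0

print(mse_a, mse_b, r2_a, r2_b)
```

When predictions are very close to the targets, as in this notebook, the two orderings give nearly the same $R^2$, so the difference is easy to miss.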


### SVGP - MultiOutput with PCA Transformer (Manual)

In [22]:
# Define Kernel Function
input_dimensions = X.shape[1]
kernel = GPy.kern.RBF(
    input_dim=input_dimensions, 
    ARD=False
)

# define GP model
n_inducing = 100
gp_model = SVGP(
    kernel=kernel,
    n_inducing=n_inducing,
    max_iters=300, 
    optimizer='lbfgs',
    verbose=True
)

# Define target transformer
pca_model = PCA(n_components=3)

# Transform Targets
ytrain_red = pca_model.fit_transform(ytrain)


# train GP Model
t0 = time.time()
gp_model.fit(xtrain, ytrain_red)
t1 = time.time() - t0

# Predictions
ypred_red, ystd = gp_model.predict(xtest, return_std=True)

# Inverse transform predictions
ypred = pca_model.inverse_transform(ypred_red)

Running L-BFGS-B (Scipy implementation) Code:
  runtime   i     f              |g|        
    00s15  001   1.053726e+08   4.435381e+15 
    01s21  013   2.425162e+06   3.022081e+09 
    03s32  033   3.278976e+04   6.144264e+04 
    10s59  112   2.825658e+04   5.701109e+04 
    25s36  257   1.017394e+04   8.367080e+04 
    29s50  291   8.752625e+03   6.245622e+03 
    30s84  302   8.418246e+03   2.687379e+03 
Runtime:     30s84
Optimization status: Maximum number of f evaluations reached



In [23]:
# Get Stats (scikit-learn metrics expect y_true first, then y_pred)
mae = mean_absolute_error(ytest, ypred)
mse = mean_squared_error(ytest, ypred)
rmse = np.sqrt(mse)
r2 = r2_score(ytest, ypred)

print(
    f"MAE: {mae:.3f}\nMSE: {mse:.3f}\nRMSE: {rmse:.3f}\nR2: {r2:.3f}" 
    f" \nTime: {t1:.3} seconds"
)

MAE: 0.640
MSE: 0.640
RMSE: 0.800
R2: 1.000 
Time: 31.1 seconds
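
The manual fit, transform, predict, and inverse-transform steps above can also be bundled with scikit-learn's `TransformedTargetRegressor`, which is imported at the top of the notebook. The sketch below uses `LinearRegression` as a stand-in regressor on a smaller synthetic dataset, so it does not reproduce the numbers above. Whether the `SVGP` wrapper can be dropped in instead depends on whether it follows the scikit-learn estimator interface (for example, support for `clone`), which is an assumption I have not verified here.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Smaller synthetic dataset with the same structure as the one above
X_demo, y_demo = make_regression(
    n_samples=500,
    n_features=10,
    n_informative=3,
    n_targets=10,
    bias=10,
    noise=0.8,
    random_state=123,
)
Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, train_size=300, random_state=123
)

# PCA compresses the 10 targets to 3 components before fitting and
# maps predictions back to the original 10 targets afterwards.
model = TransformedTargetRegressor(
    regressor=LinearRegression(),     # stand-in for the GP model
    transformer=PCA(n_components=3),
    check_inverse=False,              # PCA with fewer components is lossy
)
model.fit(Xtr, ytr)

ypred_demo = model.predict(Xte)       # shape: (n_test, 10)
r2_demo = r2_score(yte, ypred_demo)
print(f"R2: {r2_demo:.3f}")
```

Setting `check_inverse=False` skips scikit-learn's check that the transformer is exactly invertible, which a truncated PCA is not.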
