## System Description
1. We have a set of COFs from a database. Each COF is characterized by a feature vector $$x_{COF} \in X \subset R^d$$ were d=14.


2. We have **two different types** of simulations to calculate **the same material property $S_{Xe/Kr}$**. Therefore, we have a Single-Task/Objective (find the material with the optimal selevtivity), Multi-Fidelity problem. 
    1. low-fidelity  = Henry coefficient calculation - MC integration - cost=1
    2. high-fidelity = GCMC mixture simulation - 80:20 (Kr:Xe) at 298 K and 1.0 bar - cost=30


3. We will initialize the system with *two* COFs at both fidelities in order to initialize the Covariance Matrix.
    - The fist COF will be the one closest to the center of the normalized feature space
    - The second COF will be chosen at random


4. Each surrogate model will **only train on data acquired at its level of fidelity** (Heterotopic data). $$X_{lf} \neq X_{hf} \subset X$$
    1. We are using the augmented EI acquisition function from [here](https://link.springer.com/content/pdf/10.1007/s00158-005-0587-0.pdf)


5. **kernel model**: 
    1.  We need a Gaussian Process (GP) that will give a *correlated output for each fidelity* i.e. we need a vector-valued kernel
    2. Given the *cost aware* acquisition function, we anticipate the number of training points at each fidelity *will not* be equal (asymmetric scenario) $$n_{lf} > n_{hf}$$
        - perhaps we can force the symmetric case, $n_{lf} = n_{hf} = n$, if we can include `missing` or `empty` entries in the training sets.

### Strategy
1. Implement SingleTaskMultiFidelity Gp
2. Get augmented EI working


In [1]:
import torch
import gpytorch
from botorch.models import SingleTaskMultiFidelityGP
from botorch.models.transforms.outcome import Standardize
from gpytorch.mlls import ExactMarginalLogLikelihood
from botorch import fit_gpytorch_model



from scipy.stats import norm
import math 
import numpy as np
import h5py
import matplotlib.pyplot as plt
import os

In [2]:
###
#  import data
###
f = h5py.File("targets_and_normalized_features.jld2", "r")

X = torch.from_numpy(np.transpose(f["X"][:]))
henry_y = torch.from_numpy(np.transpose(f["henry_y"][:]))
gcmc_y  = torch.from_numpy(np.transpose(f["gcmc_y"][:]))
print("raw data - \nX:", X.shape)
print("henry_y:", henry_y.shape)
print("gcmc_y: ", gcmc_y.shape)

raw data - 
X: torch.Size([608, 14])
henry_y: torch.Size([608])
gcmc_y:  torch.Size([608])


In [35]:
###
#  construct initial inputs
#  1. get initial points
#  2. standardize outputs
#  3. stack into tensor
###
nb_COFs = henry_y.shape[0] # total number of COFs data points 
nb_COFs_initialization = 7 # number of COFs to initialize with
ids_acquired = np.random.choice(np.arange((nb_COFs)), size=nb_COFs_initialization, replace=False)

fidelity_acquired = torch.randint(2, (nb_COFs_initialization, 1))
fidelity_acquired

costs_acquired = torch.rand(nb_COFs_initialization) # make this the actual costs
costs_acquired

tensor([0.9729, 0.5439, 0.0461, 0.9468, 0.8175, 0.7543, 0.6284])

In [4]:
def build_X_train(ids_acquired, fidelity_acquired):
    return torch.cat((X[ids_acquired, :], fidelity_acquired), dim=1)

def build_y_train(ids_acquired, fidelity_acquired):
    train_y = torch.tensor((), dtype=torch.float64).new_zeros((ids_acquired.shape[0], 1))
    for i, fid in enumerate(fidelity_acquired):
        if fid == 0:
            train_y[i][0] = henry_y[ids_acquired[i]]
        else:
            train_y[i][0] = gcmc_y[ids_acquired[i]]
    return train_y

X_train = build_X_train(ids_acquired, fidelity_acquired)

In [5]:
y_train = build_y_train(ids_acquired, fidelity_acquired)

In [6]:
X_unsqueezed = X.unsqueeze(1)
X_unsqueezed.shape

torch.Size([608, 1, 14])

In [7]:
###
#  construct surrogate model
#   - pass argument to standardize outputs (zero mean, unit variance)
###
def train_surrogate_model(X_train, y_train):
    model = SingleTaskMultiFidelityGP(
        X_train, 
        y_train, 
        outcome_transform=Standardize(m=1), # m is the output dimension
        data_fidelity=X_train.shape[1] - 1
    )   
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    fit_gpytorch_model(mll)
    return mll, model

mll, model = train_surrogate_model(X_train, y_train)

In [24]:
def mu_sigma(model, X, fidelity):
    f = torch.tensor((), dtype=torch.float64).new_ones((nb_COFs, 1)) * fidelity
    X_f = torch.cat((X, f), dim=1) # last col is associated fidelity
    f_posterior = model.posterior(X_f)
    return f_posterior.mean.squeeze().detach().numpy(), f_posterior.variance.squeeze().detach().numpy()

lf_mu, lf_sigma = mu_sigma(model, X, 1)

In [25]:
def get_y_max(ids_acquired, fidelity_acquired):
    y_max = 0.0
    for i, fid in enumerate(fidelity_acquired):
        if fid == 1:
            if gcmc_y[ids_acquired[i]] > y_max:
                y_max = gcmc_y[ids_acquired[i]]
    return y_max.item()

In [None]:
def cost_ratio(fidelity_acquired, costs_acquired):
    avg_cost_hf = np.mean(costs_acquired[fidelity_acquired == 1])
    avg_cost_lf = np.mean(costs_acquired[fidelity_acquired == 0])
    return avg_cost_hf / avg_cost_lf

In [31]:
def EI_hf(model, X, ids_acquired, fidelity_acquired):
    mu_hf, sigma_hf = mu_sigma(model, X, 1) # only use hf
    y_max = get_y_max(ids_acquired, fidelity_acquired)
    
    z = (mu_hf - y_max) / sigma_hf
    explore_term = sigma_hf * norm.pdf(z)
    exploit_term = (mu_hf - y_max) * norm.cdf(z)
    ei = explore_term + exploit_term
    return np.maximum(ei, np.zeros(nb_COFs))

EI_hf(model, X, ids_acquired, fidelity_acquired)

array([9.10039559e-002, 4.63442215e-002, 2.87192587e-002, 9.32618730e-002,
       2.70825728e-002, 4.60771573e-003, 3.69035438e-002, 1.72999980e-010,
       2.94531283e-005, 8.77585942e-002, 9.54352173e-002, 9.29169471e-002,
       8.54693096e-002, 8.23572154e-002, 1.21507422e-001, 3.29702174e-002,
       6.73277381e-003, 5.97893892e-002, 1.19639368e-001, 4.13518425e-002,
       6.40299509e-021, 2.23374731e-003, 4.09012918e-002, 4.95012200e-003,
       1.28808047e-003, 8.94776540e-002, 6.94663281e-026, 3.28160651e-016,
       1.08437714e-005, 2.78829352e-002, 3.05327172e-004, 1.56571526e-002,
       6.26883159e-003, 3.32240763e-002, 3.40428852e-002, 5.97023438e-002,
       3.00220734e-002, 4.65919310e-002, 2.51996298e-002, 6.58002656e-002,
       4.16539723e-002, 1.21370740e-001, 7.85087494e-002, 7.86544016e-002,
       8.18555883e-002, 9.13978184e-002, 3.38703111e-002, 3.94887014e-002,
       4.51193446e-002, 5.41086445e-005, 1.64571505e-001, 1.76527112e-002,
       3.16138484e-062, 3

In [12]:
mll

ExactMarginalLogLikelihood(
  (likelihood): GaussianLikelihood(
    (noise_covar): HomoskedasticNoise(
      (noise_prior): GammaPrior()
      (raw_noise_constraint): GreaterThan(1.000E-04)
    )
  )
  (model): SingleTaskMultiFidelityGP(
    (likelihood): GaussianLikelihood(
      (noise_covar): HomoskedasticNoise(
        (noise_prior): GammaPrior()
        (raw_noise_constraint): GreaterThan(1.000E-04)
      )
    )
    (mean_module): ConstantMean()
    (covar_module): ScaleKernel(
      (base_kernel): LinearTruncatedFidelityKernel(
        (raw_power_constraint): Positive()
        (power_prior): GammaPrior()
        (covar_module_unbiased): MaternKernel(
          (lengthscale_prior): GammaPrior()
          (raw_lengthscale_constraint): Positive()
          (distance_module): Distance()
        )
        (covar_module_biased): MaternKernel(
          (lengthscale_prior): GammaPrior()
          (raw_lengthscale_constraint): Positive()
          (distance_module): Distance()
        

In [5]:
mll.likelihood.marginal

<bound method _GaussianLikelihoodBase.marginal of GaussianLikelihood(
  (noise_covar): HomoskedasticNoise(
    (noise_prior): GammaPrior()
    (raw_noise_constraint): GreaterThan(1.000E-04)
  )
)>

In [13]:
post = model.posterior(train_x_full) # .mvn, .mean, .variance

In [7]:
###
#  Acquisition function
###
# def u():
#     # utility function: u(x) = -f̂ₘ(x) - csₘ(x) 
#     c = 1.0
#     sm = model.posterior().variance
#     x_star = 
#     return x_star

# def α1():
#     # corr[fₗᵖ(x), fₘᵖ(x)]
#     return

# def α2():
#     #
#     return

# def α3(cm, cl):
#     # cost ratio: cₘ/cₗ
#     cost_ratio = cm / cl
#     return cost_ratio

# EI = (f̂ₘ(x*) - f̂ₘ(m))Φ(z) + sₘ(x)ϕ(z)
# z = (f̂ₘ(x*) - f̂ₘ(x)) / sₘ(x)
# where sₘ(x) = sqrt(cov[fₘᵖ(x), fₘᵖ(x)]) i.e. MSE,
# and x* is the "efective best solution" -> x* = argmax_{x in {xᵢ; i=1,..,n}}[u(x)]
# s.t. u(x) = -f̂ₘ(x) - csₘ(x) is the utility function, c=1.0

### Tutorial 

[link](https://botorch.org/tutorials/discrete_multi_fidelity_bo) to web page

In [8]:
# from botorch.test_functions.multi_fidelity import AugmentedHartmann

# problem = AugmentedHartmann(negate=True).to()
# fidelities = torch.tensor([0.5, 0.75, 1.0])

# def generate_initial_data(n=16):
#     # generate training data
#     train_x = torch.rand(n, 6) # torch.Size([n, 6])
#     train_f = fidelities[torch.randint(2, (n, 1))] # torch.Size([n, 1]), sampled fidelities of training data
#     train_x_full = torch.cat((train_x, train_f), dim=1) # torch.Size([16, 7]), last col is associated fidelity
#     train_obj = problem(train_x_full).unsqueeze(-1) # torch.Size([16, 1]), add output dimension
#     return train_x_full, train_obj
    

# def initialize_model(train_x, train_obj):
#     # define a surrogate model suited for a "training data"-like fidelity parameter
#     # in dimension 6, as in [2]
#     model = SingleTaskMultiFidelityGP(
#         train_x, 
#         train_obj, 
#         outcome_transform=Standardize(m=1),
#         data_fidelity=6
#     )   
#     mll = ExactMarginalLogLikelihood(model.likelihood, model)
#     return mll, model

# train_x, train_obj = generate_initial_data(n=16)
# mll, model = initialize_model(train_x, train_obj)

# fit_gpytorch_model(mll)

In [9]:
# cumulative_cost = 0.0
# N_ITER = 3 if not SMOKE_TEST else 1


# for _ in range(N_ITER):
#     mll, model = initialize_model(train_x, train_obj)
#     fit_gpytorch_model(mll)
#     mfkg_acqf = get_mfkg(model)
#     new_x, new_obj, cost = optimize_mfkg_and_get_observation(mfkg_acqf)
#     train_x = torch.cat([train_x, new_x])
#     train_obj = torch.cat([train_obj, new_obj])
#     cumulative_cost += cost