## System Description
1. We have a set of COFs from a database. Each COF is characterized by a feature vector $$x_{COF} \in X \subset \mathbb{R}^d$$ where $d=14$.


2. We have **two different types** of simulations to calculate **the same material property $S_{Xe/Kr}$**. The search is therefore single-task and single-objective:
$$\arg\max_{x_{COF} \in X}[S_{Xe/Kr}(x_{COF})]$$

3. Multi-Fidelity problem. 
    1. low-fidelity  => Henry coefficient calculation - MC integration: $S_{Xe/Kr} = \frac{H_{Xe}}{H_{Kr}}$
    2. high-fidelity => GCMC mixture simulation - 80:20 (Kr:Xe) at 298 K and 1.0 bar: $S_{Xe/Kr} = \frac{n_{Xe} / n_{Kr}}{y_{Xe}/y_{Kr}}$


4. We will initialize the system with a few COFs at **both** fidelities in order to initialize the covariance matrix.
    1. The first COF will be the one closest to the center of the normalized feature space.
    2. The rest will be chosen to maximize the diversity of the training set.


5. Each surrogate model will **only train on data acquired at its level of fidelity** (heterotopic data): $$X_{lf} \neq X_{hf} \subset X$$
    1. We use the augmented-EI (aEI) acquisition function from [here](https://link.springer.com/content/pdf/10.1007/s00158-005-0587-0.pdf)
    2. BoTorch GP surrogate model: [SingleTaskMultiFidelityGP](https://botorch.org/api/models.html#module-botorch.models.gp_regression_fidelity)
    3. We needed [this](https://botorch.org/api/optim.html#module-botorch.optim.fit) optimizer to work around a Cholesky jitter error
    4. Helpful [tutorial](https://botorch.org/tutorials/discrete_multi_fidelity_bo) for a similar BoTorch Model used
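
The two selectivity definitions from the multi-fidelity item above can be sketched as plain functions. This is only an illustration: the function names are hypothetical and the numeric inputs are made up, not simulation results.

```python
def henry_selectivity(h_xe, h_kr):
    """Low fidelity: ratio of Henry coefficients, S = H_Xe / H_Kr."""
    return h_xe / h_kr


def gcmc_selectivity(n_xe, n_kr, y_xe=0.2, y_kr=0.8):
    """High fidelity: adsorbed-phase ratio divided by gas-phase ratio.

    The defaults y_xe=0.2, y_kr=0.8 correspond to the 80:20 (Kr:Xe) mixture.
    """
    return (n_xe / n_kr) / (y_xe / y_kr)


# Made-up inputs, for illustration only (not taken from the data set).
s_lf = henry_selectivity(h_xe=3.0, h_kr=0.5)
s_hf = gcmc_selectivity(n_xe=1.2, n_kr=0.8)
print("low-fidelity selectivity: ", s_lf)
print("high-fidelity selectivity:", s_hf)
```

Both functions return the same quantity, so the two fidelities can share one objective even though they come from different simulations.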

In [1]:
import torch
import gpytorch
from botorch.models import SingleTaskMultiFidelityGP
from botorch.models.transforms.outcome import Standardize
from gpytorch.mlls import ExactMarginalLogLikelihood
from botorch import fit_gpytorch_model
from botorch.optim.fit import fit_gpytorch_torch # fix Cholesky jitter error
from scipy.stats import norm
from sklearn.decomposition import PCA
import math 
import numpy as np
import matplotlib.pyplot as plt
import pickle
import h5py # for .jld2 files
import os

# config plot settings
plt.rcParams["font.size"] = 16



In [2]:
###
#  Load Data
###
file = h5py.File("targets_and_normalized_features.jld2", "r")
# feature matrix
X = torch.from_numpy(np.transpose(file["X"][:]))
# simulation data
y = [torch.from_numpy(np.transpose(file["henry_y"][:])), 
     torch.from_numpy(np.transpose(file["gcmc_y"][:]))]
# associated simulation costs
cost = [np.transpose(file["henry_total_elapsed_time"][:]), 
        np.transpose(file["gcmc_elapsed_time"][:])]

# total number of COFs in data set
nb_COFs = X.shape[0] 

print("raw data - \n\tX:", X.shape)
for f in range(2):
    print("\tfidelity:", f)
    print("\t\ty:", y[f].shape)
    print("\t\tcost: ", cost[f].shape)
    
print("\nEnsure features are normalized - ")
print("max:\n", torch.max(X, 0).values)
print("min:\n", torch.min(X, 0).values)

raw data - 
	X: torch.Size([608, 14])
	fidelity: 0
		y: torch.Size([608])
		cost:  (608,)
	fidelity: 1
		y: torch.Size([608])
		cost:  (608,)

Ensure features are normalized - 
max:
 tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
       dtype=torch.float64)
min:
 tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       dtype=torch.float64)


In [3]:
print("total high-fidelity cost:", sum(cost[1]).item(), "[min]")
print("total low-fidelity cost: ", sum(cost[0]).item(), "[min]\n")

print("average high-fidelity cost:", np.mean(cost[1]), "[min]")
print("average low-fidelity cost: ", np.mean(cost[0]), "[min]")
print("average cost ratio:\t   ", np.mean(cost[1] / cost[0]))

total high-fidelity cost: 139887.66223703226 [min]
total low-fidelity cost:  10076.305239888028 [min]

average high-fidelity cost: 230.0783918372241 [min]
average low-fidelity cost:  16.57287046034216 [min]
average cost ratio:	    13.444745568580501
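
Note that the "average cost ratio" printed above is the mean of the per-COF ratios, whereas `estimate_cost_ratio` further down divides one average cost by another (a ratio of means). The two statistics generally differ. The sketch below computes the ratio of means from the averages printed above and uses made-up per-COF costs to show the difference; all `toy_*` names and values are hypothetical.

```python
from statistics import mean

# Averages copied from the printed output above [min].
avg_hf = 230.0783918372241
avg_lf = 16.57287046034216
ratio_of_means = avg_hf / avg_lf

# Made-up per-COF costs [min], only to show why the two statistics differ.
toy_hf = [100.0, 400.0]
toy_lf = [5.0, 40.0]
toy_mean_of_ratios = mean(h / l for h, l in zip(toy_hf, toy_lf))
toy_ratio_of_means = mean(toy_hf) / mean(toy_lf)

print("ratio of means (real averages):", ratio_of_means)
print("toy mean of ratios:", toy_mean_of_ratios)
print("toy ratio of means:", toy_ratio_of_means)
```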


## Helper Functions

#### Construct Initial Inputs

In [4]:
# find COF closest to the center of feature space
def get_initializing_COF(X):
    # center of feature space
    feature_center = np.ones(X.shape[1]) * 0.5
    # pick the COF with the smallest Euclidean distance to the center
    return np.argmin(np.linalg.norm(X - feature_center, axis=1))

# yields np.array([25, 494, 523])
def diverse_set(X, train_size):
    # start from the COF closest to the center of feature space; pick the others in a max-diverse fashion
    ids_train = [get_initializing_COF(X)]
    # select remaining training points
    for j in range(train_size - 1):
        # for each point in data set, compute its min dist to training set
        dist_to_train_set = np.linalg.norm(X - X[ids_train, None, :], axis=2)
        assert np.shape(dist_to_train_set) == (len(ids_train), nb_COFs)
        min_dist_to_a_training_pt = np.min(dist_to_train_set, axis=0)
        assert np.size(min_dist_to_a_training_pt) == nb_COFs
        
        # acquire point with max(min distance to train set) i.e. Furthest from train set
        ids_train.append(np.argmax(min_dist_to_a_training_pt))
    assert np.size(np.unique(ids_train)) == train_size # must be unique
    return np.array(ids_train)
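
To make the greedy max-min rule in `diverse_set` easier to follow, here is a dependency-free sketch of the same idea on a handful of 2-D points. The function name and the toy points are hypothetical; the notebook itself works on the 14-dimensional COF feature matrix.

```python
import math


def farthest_point_selection(points, first_id, train_size):
    """Greedy max-min selection: repeatedly add the point farthest from the chosen set."""
    chosen = [first_id]
    while len(chosen) < train_size:
        # distance from every point to its nearest already-chosen point
        min_dists = [min(math.dist(p, points[c]) for c in chosen) for p in points]
        # acquire the point whose nearest chosen neighbor is farthest away
        chosen.append(max(range(len(points)), key=min_dists.__getitem__))
    return chosen


# Toy 2-D "feature space" (hypothetical points, not COF features).
toy_points = [(0.5, 0.5), (0.0, 0.0), (1.0, 0.9), (0.2, 0.1), (0.6, 0.5)]
selected = farthest_point_selection(toy_points, first_id=0, train_size=3)
print(selected)
```

Already-chosen points have a minimum distance of zero, so they are never selected twice as long as the candidates are distinct.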

In [5]:
def initialize_acquired_set(X, y, nb_COFs_initialization, discrete_fidelities):
    cof_ids = diverse_set(X, nb_COFs_initialization) # np.array(ids_train)
    return torch.tensor([[f_id, cof_id] for cof_id in cof_ids for f_id in discrete_fidelities])

In [6]:
# construct feature matrix of acquired points
def build_X_train(acquired_set):
    cof_ids = [a[1] for a in acquired_set]
    f_ids = torch.tensor([a[0] for a in acquired_set])
    return torch.cat((X[cof_ids, :], f_ids.unsqueeze(dim=-1)), dim=1)

# construct output vector for acquired points
def build_y_train(acquired_set):
    return torch.tensor([y[f_id][cof_id] for f_id, cof_id in acquired_set]).unsqueeze(-1)

# construct vector to track accumulated cost of acquired points
def build_cost(acquired_set):
    return torch.tensor([cost[f_id][cof_id] for f_id, cof_id in acquired_set]).unsqueeze(-1)

# construct vector of costs of the acquired points at a given fidelity
def build_cost_fidelity(acquired_set, fidelity):
    return torch.tensor([cost[f_id][cof_id] for f_id, cof_id in acquired_set if f_id == fidelity]).unsqueeze(-1)
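
The builders above turn a list of `(f_id, cof_id)` pairs into training data by appending the fidelity id as an extra feature column. A minimal pure-Python sketch of that layout, using hypothetical toy data (3 materials, 2 features) instead of the real tensors:

```python
# Hypothetical toy data: 3 materials with 2 features each.
toy_X = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
toy_y = [[1.0, 2.0, 3.0],   # fidelity 0 (low)
         [1.5, 2.5, 3.5]]   # fidelity 1 (high)
toy_acquired = [(0, 2), (1, 2), (0, 0)]  # (f_id, cof_id) pairs

# Each training row is the material's features followed by the fidelity id.
toy_X_train = [toy_X[cof_id] + [float(f_id)] for f_id, cof_id in toy_acquired]
# Targets are looked up as y[f_id][cof_id], matching build_y_train.
toy_y_train = [toy_y[f_id][cof_id] for f_id, cof_id in toy_acquired]

print(toy_X_train)
print(toy_y_train)
```

The same material can appear in two rows that differ only in the last column, which is how the surrogate sees one COF at both fidelities.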

### test

In [7]:
def test_initializing_functions(X, y):
    ###
    #  Construct training sets
    ###
    # list of [fid_id, cof_id] pairs
    acquired_set = [[1, 10], [0, 3], [0, 4]]
    
    # Training Sets
    X_train = build_X_train(acquired_set)
    y_train = build_y_train(acquired_set)
    
    ###
    #  Test that the constructor functions are working properly
    ###
    assert np.allclose(X[10, :], X_train[0, :14])
    assert X_train[0, 14] == 1
    assert X_train[1, 14] == 0
    assert y_train[0] == y[1][10] # y[fid_id][cof_id]
    assert y_train[2] == y[0][4]
    return

test_initializing_functions(X, y)

#### Surrogate Model

In [8]:
def train_surrogate_model(X_train, y_train):
    model = SingleTaskMultiFidelityGP(
        X_train, 
        y_train, 
        outcome_transform=Standardize(m=1), # m is the output dimension
        data_fidelity=X_train.shape[1] - 1
    )   
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    fit_gpytorch_model(mll, optimizer=fit_gpytorch_torch)
    return model

#### Acquisition Function

In [9]:
# calculate posterior mean and variance for a given fidelity
def mu_sigma(model, X, fidelity):
    f = torch.tensor((), dtype=torch.float64).new_ones((nb_COFs, 1)) * fidelity
    X_f = torch.cat((X, f), dim=1) # last col is associated fidelity
    f_posterior = model.posterior(X_f)
    return f_posterior.mean.squeeze().detach().numpy(), np.sqrt(f_posterior.variance.squeeze().detach().numpy())

# get the current best y-value of desired_fidelity in the acquired set
def get_y_max(acquired_set, desired_fidelity):
    return np.max([y[f_id][cof_id] for f_id, cof_id in acquired_set if f_id == desired_fidelity])

In [10]:
###
# efficient multi-fidelity correlation function
# corr(y at given fidelity, y at high-fidelity)
# (see notes)
###
def mfbo_correlation_function(model, X, fidelity):
    # given fidelity
    f   = torch.tensor((), dtype=torch.float64).new_ones((nb_COFs, 1)) * fidelity
    X_f = torch.cat((X, f), dim=1) # last col is associated fidelity
    
    #  high-fidelity
    hf   = torch.tensor((), dtype=torch.float64).new_ones((nb_COFs, 1)) 
    X_hf = torch.cat((X, hf), dim=1) # last col is associated fidelity

    # combine into a single tensor
    X_all_fid = torch.cat((X_f, X_hf), dim=0)
    
    # get variance for each fidelity
    var_f = torch.flatten(model.posterior(X_f).variance)
    var_hf = torch.flatten(model.posterior(X_hf).variance) # variance
    
    # posterior covariance 
    cov = torch.diag(model(X_all_fid).covariance_matrix[:X_f.size()[0], X_f.size()[0]:])
    
    corr = cov / (torch.sqrt(var_f) * torch.sqrt(var_hf))
    return corr
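
The last step of `mfbo_correlation_function` is the usual correlation formula, applied per candidate: the posterior covariance between the two fidelities divided by the product of the two posterior standard deviations. A small sketch with made-up posterior moments (all values hypothetical):

```python
import math


def correlation(cov_f_hf, var_f, var_hf):
    """Correlation between a candidate's value at fidelity f and at high fidelity."""
    return cov_f_hf / (math.sqrt(var_f) * math.sqrt(var_hf))


# Made-up posterior moments for three candidates.
cov = [0.30, 0.05, 0.50]
var_f = [0.40, 0.25, 0.50]
var_hf = [0.50, 0.50, 0.50]
corr = [correlation(c, vf, vh) for c, vf, vh in zip(cov, var_f, var_hf)]
print(corr)
```

In the third candidate the covariance equals both variances, which is what happens when the function is called with the high fidelity itself; the correlation is then 1 up to rounding.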

In [11]:
###
#  Cost ratio
###
def estimate_cost_ratio(fidelity, acquired_set):
    avg_cost_f  = torch.mean(build_cost_fidelity(acquired_set, fidelity))
    avg_cost_hf = torch.mean(build_cost_fidelity(acquired_set, 1))
    cr = avg_cost_hf / avg_cost_f
    return cr.item()

###
#  Expected Improvement function, only uses hf
###
def EI_hf(model, X, acquired_set):
    hf_mu, hf_sigma = mu_sigma(model, X, 1)
    y_max = get_y_max(acquired_set, 1)
    
    z = (hf_mu - y_max) / hf_sigma
    explore_term = hf_sigma * norm.pdf(z) 
    exploit_term = (hf_mu - y_max) * norm.cdf(z) 
    ei = explore_term + exploit_term
    return np.maximum(ei, np.zeros(nb_COFs))

###
#  Acquisition function
###
def acquisition_scores(model, X, fidelity, acquired_set):
    # expected improvement for high-fidelity
    ei = EI_hf(model, X, acquired_set) 
    
    # augmenting functions
    corr_f1_f0 = mfbo_correlation_function(model, X, fidelity)
    
    cr = estimate_cost_ratio(fidelity, acquired_set)

    scores = torch.from_numpy(ei) * corr_f1_f0 * cr
    return scores.detach().numpy()

def in_acquired_set(f_id, cof_id, acquired_set):
    for this_f_id, this_cof_id in acquired_set:
        if this_cof_id == cof_id and this_f_id == f_id:
            return True
    return False
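
For reference, the closed-form expected improvement used in `EI_hf` and the product computed in `acquisition_scores` can be written for a single candidate without any dependencies. This is a sketch under the assumption that the posterior standard deviation is strictly positive; the helper names and all numeric inputs are hypothetical.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, y_max):
    """EI for maximization; assumes sigma > 0."""
    z = (mu - y_max) / sigma
    ei = sigma * normal_pdf(z) + (mu - y_max) * normal_cdf(z)
    return max(ei, 0.0)


def acquisition_score(ei, corr, cost_ratio):
    """High-fidelity EI scaled by the fidelity correlation and the cost ratio."""
    return ei * corr * cost_ratio


# Made-up posterior at one candidate: mean equal to the incumbent.
ei = expected_improvement(mu=10.0, sigma=2.0, y_max=10.0)
# Made-up correlation and cost ratio for the low fidelity.
score_lf = acquisition_score(ei, corr=0.9, cost_ratio=13.0)
# At high fidelity the correlation and cost ratio are both 1.
score_hf = acquisition_score(ei, corr=1.0, cost_ratio=1.0)
print(ei, score_lf, score_hf)
```

With these made-up numbers the low-fidelity score is larger, because a strongly correlated and much cheaper simulation is rewarded by the cost ratio.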

# Run MFBO

In [12]:
###
#  construct initial inputs
###
discrete_fidelities = [0, 1] # set of discrete fidelities to select from
nb_COFs_initialization = 3   # at each fidelity, number of COFs to initialize with
nb_iterations = 100          # BO budget, includes initializing COFs

acquired_set = initialize_acquired_set(X, y, nb_COFs_initialization, discrete_fidelities)

X_train = build_X_train(acquired_set)
y_train = build_y_train(acquired_set)

print("Initialization - \n")
print("\tid acquired = ", [acq_[1].item() for acq_ in acquired_set])
print("\tfidelity acquired = ", [acq_[0].item() for acq_ in acquired_set])
print("\tcosts acquired = ", build_cost(acquired_set), " [min]")

print("\n\tTraining data:\n")
print("\t\t X train shape = ", X_train.shape)
print("\t\t y train shape = ", y_train.shape)
print("\t\t training feature vector = \n", X_train)

Initialization - 

	id acquired =  [25, 25, 494, 494, 523, 523]
	fidelity acquired =  [0, 1, 0, 1, 0, 1]
	costs acquired =  tensor([[ 33.2507],
        [399.7577],
        [ 33.2389],
        [171.9985],
        [  6.1207],
        [280.4524]], dtype=torch.float64)  [min]

	Training data:

		 X train shape =  torch.Size([6, 15])
		 y train shape =  torch.Size([6, 1])
		 training feature vector = 
 tensor([[0.1500, 0.4533, 0.1088, 0.5523, 0.4387, 0.1463, 0.3480, 0.2643, 0.0000,
         0.1769, 0.2237, 0.0000, 0.0000, 0.3471, 0.0000],
        [0.1500, 0.4533, 0.1088, 0.5523, 0.4387, 0.1463, 0.3480, 0.2643, 0.0000,
         0.1769, 0.2237, 0.0000, 0.0000, 0.3471, 1.0000],
        [0.8347, 0.9979, 0.9053, 0.0040, 0.0161, 0.0623, 0.0000, 0.0095, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.8347, 0.9979, 0.9053, 0.0040, 0.0161, 0.0623, 0.0000, 0.0095, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 1.0000],
        [0.0509, 0.2570, 0.4148, 0.6881, 0.

In [None]:
###
#  Run Search
###
for i in range(nb_COFs_initialization * len(discrete_fidelities), nb_iterations): 
    print("BO iteration: ", i)
    ###
    #  Train Model
    ###
    model = train_surrogate_model(X_train, y_train)

    ###
    # Acquire new (COF, fidelity) not yet acquired.
    ###
    # entry (fid_id, cof_id) is the acquisition value for fidelity f_id and cof cof_id
    the_acquisition_scores = np.array([acquisition_scores(model, X, f_id, acquired_set) for f_id in discrete_fidelities])
    # overwrite acquired COFs/fidelities with negative infinity to not choose these.
    for f_id, cof_id in acquired_set:
        the_acquisition_scores[f_id, cof_id] = - np.inf
    # select COF/fidelity with highest acquisition score.
    f_id, cof_id = np.unravel_index(np.argmax(the_acquisition_scores), np.shape(the_acquisition_scores))
    assert not in_acquired_set(f_id, cof_id, acquired_set)
    # Update acquired_set
    acq = torch.tensor([[f_id, cof_id]], dtype=int)
    acquired_set = torch.cat((acquired_set, acq))
    
    ###
    #  Report the acquisition
    ###
    print("\tacquired COF ", cof_id, " at fidelity, ", f_id)
    print("\t\ty = ", y[f_id][cof_id].item())
    print("\t\tcost = ", cost[f_id][cof_id])
            
    # Update training sets
    X_train = build_X_train(acquired_set)
    y_train = build_y_train(acquired_set)

BO iteration:  6
Iter 10/100: 8.649342013940094
Iter 20/100: 6.261137550553084
Iter 30/100: 4.990258879375678
Iter 40/100: 4.189199617403134
Iter 50/100: 3.970566587573529
Iter 60/100: 3.861769073961766
Iter 70/100: 3.8169358792521293
Iter 80/100: 3.7957507575624336
Iter 90/100: 3.786184368275768
Iter 100/100: 3.779489479113306




	acquired COF  521  at fidelity,  0
		y =  15.30972918212556
		cost =  6.206935115655263
BO iteration:  7
Iter 10/100: 7.614303650099994
Iter 20/100: 5.5481737176964065
Iter 30/100: 4.4193375753205375
Iter 40/100: 3.6769469794672327
Iter 50/100: 3.513558616402799
Iter 60/100: 3.4289432847700096
Iter 70/100: 3.3896726465679157
Iter 80/100: 3.3695011054234394
Iter 90/100: 3.3575767043886984
Iter 100/100: 3.347487472544626
	acquired COF  522  at fidelity,  0
		y =  15.951650673989429
		cost =  6.0286850492159525
BO iteration:  8
Iter 10/100: 6.850133783119505
Iter 20/100: 5.027511893864176
Iter 30/100: 4.017520396009643
Iter 40/100: 3.334977681387051
Iter 50/100: 3.207249960866922
Iter 60/100: 3.1371892739627727
Iter 70/100: 3.1022503356921383
Iter 80/100: 3.0824577383154748
Iter 90/100: 3.0693220621859405
Iter 100/100: 3.0569681276770346
	acquired COF  583  at fidelity,  0
		y =  2.848405682526983
		cost =  24.38421953121821
BO iteration:  9
Iter 10/100: 6.2718023493026855
Iter 20/100: 4
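
The selection step inside the search loop (mask what was already acquired, then take the arg-max over all fidelity/COF pairs) can be illustrated without NumPy. The score table and the acquired pairs below are made up.

```python
import math

# Hypothetical scores[f_id][cof_id] for 2 fidelities and 4 candidates.
scores = [[0.9, 0.2, 0.4, 0.1],
          [0.3, 0.8, 0.6, 0.2]]
acquired = [(0, 0), (1, 1)]  # (f_id, cof_id) pairs that were already run

# Overwrite acquired entries with -inf so they can never be selected again.
for f_id, cof_id in acquired:
    scores[f_id][cof_id] = -math.inf

# Arg-max over the whole table, returned as an (f_id, cof_id) pair.
best = max(
    ((f, c) for f in range(len(scores)) for c in range(len(scores[f]))),
    key=lambda fc: scores[fc[0]][fc[1]],
)
print(best)
```

Masking is done per (fidelity, COF) pair, so a COF already run at low fidelity remains eligible at high fidelity.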

In [None]:
###
# look at unique COFs acquired
###
cof_ids_acquired = torch.tensor([acq[1] for acq in acquired_set])
n_unique_cofs_acquired = len(np.unique(cof_ids_acquired))
print("total number of unique COFs acquired", n_unique_cofs_acquired)

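###
#  Iteration at which the top COF is first acquired, and accumulated cost up to it
###
top_cof_id = torch.argmax(y[1]).item() # best COF according to the high-fidelity data
is_top_cof = cof_ids_acquired == top_cof_id
if is_top_cof.any():
    print("woo, top COF acquired!")
    # 0-based index of the first acquisition of the top COF (at either fidelity)
    BO_iter_top_cof_acquired = torch.nonzero(is_top_cof)[0].item()
    # include the cost of the acquisition that observed the top COF
    top_cof_acc_cost = build_cost(acquired_set)[:BO_iter_top_cof_acquired + 1].sum().item()
    print("iteration we acquire top COF = ", BO_iter_top_cof_acquired + 1)
    print("accumulated cost up to observation of top COF = ", top_cof_acc_cost, " [min]")
else:
    print("oh no, top COF not acquired!")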

# Store Results

In [None]:
mfbo_res = dict({'acquired_set': acquired_set,
                 'cost_acquired': build_cost(acquired_set) # name to be consistent with other notebooks
                })

with open('search_results/mfbo_results_with_EI.pkl', 'wb') as file:
    pickle.dump(mfbo_res, file)