## System Description
1. We have a set of COFs from a database. Each COF is characterized by a feature vector $$x_{COF} \in X \subset \mathbb{R}^d$$ where $d=14$.


2. We have **two different types** of simulations to calculate **the same material property $S_{Xe/Kr}$**. Therefore, we have a single-task, single-objective problem:
$$\arg\max_{x_{COF} \in X}[S_{Xe/Kr}(x_{COF})]$$

3. Multi-Fidelity problem. 
    1. low-fidelity  => Henry coefficient calculation - MC integration: $S_{Xe/Kr} = \frac{H_{Xe}}{H_{Kr}}$
    2. high-fidelity => GCMC mixture simulation - 80:20 (Kr:Xe) at 298 K and 1.0 bar: $S_{Xe/Kr} = \frac{n_{Xe} / n_{Kr}}{y_{Xe}/y_{Kr}}$


4. We will initialize the system with a few COFs at **both** fidelities in order to initialize the covariance matrix.
    1. The first COF will be the one closest to the center of the normalized feature space
    2. The rest will be chosen to maximize the diversity of the training set


5. Each surrogate model will **only train on data acquired at its level of fidelity** (heterotopic data). $$X_{lf} \neq X_{hf} \subset X$$
    1. We use the augmented-EI (aEI) acquisition function from [here](https://link.springer.com/content/pdf/10.1007/s00158-005-0587-0.pdf)
    2. BoTorch GP surrogate model: [SingleTaskMultiFidelityGP](https://botorch.org/api/models.html#module-botorch.models.gp_regression_fidelity)
    3. We needed [this](https://botorch.org/api/optim.html#module-botorch.optim.fit) optimizer to work around a Cholesky jitter error
    4. A helpful [tutorial](https://botorch.org/tutorials/discrete_multi_fidelity_bo) for a similar BoTorch model
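
The two selectivity definitions in item 3 can be written as plain functions. This is only a minimal sketch: the function names are hypothetical, the numbers are toy values rather than entries from the data set, and the default gas-phase mole fractions assume the 80:20 (Kr:Xe) mixture stated above.

```python
def henry_selectivity(H_xe, H_kr):
    """Low-fidelity estimate: ratio of the Xe and Kr Henry coefficients."""
    return H_xe / H_kr


def gcmc_selectivity(n_xe, n_kr, y_xe=0.2, y_kr=0.8):
    """High-fidelity estimate: adsorbed-phase ratio over gas-phase ratio.

    n_xe, n_kr are adsorbed amounts from a mixture simulation;
    y_xe, y_kr are gas-phase mole fractions (assumed 0.2 and 0.8 here).
    """
    return (n_xe / n_kr) / (y_xe / y_kr)


# toy inputs, chosen only to exercise the two formulas
s_lf = henry_selectivity(6.0, 2.0)
s_hf = gcmc_selectivity(1.5, 0.5)
print("low-fidelity selectivity: ", s_lf)
print("high-fidelity selectivity:", s_hf)
```

Both functions return the same physical quantity, which is why the search is single-objective even though two simulation types are involved.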

In [1]:
import torch
import gpytorch
from botorch.models import SingleTaskMultiFidelityGP
from botorch.models.transforms.outcome import Standardize
from gpytorch.mlls import ExactMarginalLogLikelihood
from botorch import fit_gpytorch_model
from botorch.optim.fit import fit_gpytorch_torch # fix Cholesky jitter error
from scipy.stats import norm
from sklearn.decomposition import PCA
import math 
import numpy as np
import matplotlib.pyplot as plt
import pickle
import h5py # for .jld2 files
import os

# config plot settings
plt.rcParams["font.size"] = 16

In [2]:
###
#  Load Data
###
file = h5py.File("targets_and_normalized_features.jld2", "r")
# feature matrix
X = torch.from_numpy(np.transpose(file["X"][:]))
# simulation data
y = [torch.from_numpy(np.transpose(file["henry_y"][:])), 
     torch.from_numpy(np.transpose(file["gcmc_y"][:]))]
# associated simulation costs
cost = [np.transpose(file["henry_total_elapsed_time"][:]), 
        np.transpose(file["gcmc_elapsed_time"][:])]

# total number of COFs in data set
nb_COFs = X.shape[0] 

print("raw data - \n\tX:", X.shape)
for f in range(2):
    print("\tfidelity:", f)
    print("\t\ty:", y[f].shape)
    print("\t\tcost: ", cost[f].shape)
    
print("\nEnsure features are normalized - ")
print("max:\n", torch.max(X, 0).values)
print("min:\n", torch.min(X, 0).values)
print("width:\n",torch.max(X, 0).values - torch.min(X, 0).values)

raw data - 
	X: torch.Size([608, 14])
	fidelity: 0
		y: torch.Size([608])
		cost:  (608,)
	fidelity: 1
		y: torch.Size([608])
		cost:  (608,)

Ensure features are normalized - 
max:
 tensor([0.7144, 0.4136, 0.4696, 0.6677, 0.9579, 0.8383, 0.3595, 0.3207, 0.9938,
        0.8242, 0.9692, 0.9869, 0.9868, 0.9762], dtype=torch.float64)
min:
 tensor([-0.2856, -0.5864, -0.5304, -0.3323, -0.0421, -0.1617, -0.6405, -0.6793,
        -0.0062, -0.1758, -0.0308, -0.0131, -0.0132, -0.0238],
       dtype=torch.float64)
width:
 tensor([1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000,
        1.0000, 1.0000, 1.0000, 1.0000, 1.0000], dtype=torch.float64)


In [3]:
print("total high-fidelity cost:", sum(cost[1]).item(), "[min]")
print("total low-fidelity cost: ", sum(cost[0]).item(), "[min]\n")

print("average high-fidelity cost:", np.mean(cost[1]), "[min]")
print("average low-fidelity cost: ", np.mean(cost[0]), "[min]")
print("average cost ratio:\t   ", np.mean(cost[1] / cost[0]))

total high-fidelity cost: 139887.66223703226 [min]
total low-fidelity cost:  10076.305239888028 [min]

average high-fidelity cost: 230.0783918372241 [min]
average low-fidelity cost:  16.57287046034216 [min]
average cost ratio:	    13.444745568580501


In [None]:
###
#  Load initializing COFs
###
init_cof_ids_file = pickle.load(open('search_results/initializing_cof_ids.pkl', 'rb'))
init_cof_ids = init_cof_ids_file['init_cof_ids']
init_cof_ids[:5]

## Helper Functions

#### Construct Initial Inputs

In [4]:
# find COF closest to the center of the data in feature space
def get_initializing_COF(X):
    X = np.asarray(X) # accept a torch tensor or a numpy array
    # center of feature space
#     feature_center = np.ones(X.shape[1]) * 0.5
    data_center = X.mean(axis=0) # per-feature mean over all COFs
    return np.argmin(np.linalg.norm(X - data_center, axis=1))

# yields np.array([25, 494, 523])
def diverse_set(X, train_size):
    X = np.asarray(X)
    # initialize with the COF closest to the data center; pick others in a max diverse fashion
    ids_train = [get_initializing_COF(X)]
    # select remaining training points
    for j in range(train_size - 1):
        # for each point in data set, compute its distance to each training point
        dist_to_train_set = np.linalg.norm(X - X[ids_train, None, :], axis=2)
        assert np.shape(dist_to_train_set) == (len(ids_train), nb_COFs)
        min_dist_to_a_training_pt = np.min(dist_to_train_set, axis=0)
        assert np.size(min_dist_to_a_training_pt) == nb_COFs
        
        # acquire point with max(min distance to train set), i.e. furthest from train set
        ids_train.append(np.argmax(min_dist_to_a_training_pt))
    assert np.size(np.unique(ids_train)) == train_size # must be unique
    return np.array(ids_train)

In [7]:
def initialize_acquired_set(X, y, nb_COFs_initialization, discrete_fidelities):
    cof_ids = diverse_set(X, nb_COFs_initialization) # np.array(ids_train)
    return torch.tensor([[f_id, cof_id] for cof_id in cof_ids for f_id in discrete_fidelities])

In [8]:
# construct feature matrix of acquired points
def build_X_train(acquired_set):
    cof_ids = [a[1] for a in acquired_set]
    f_ids = torch.tensor([a[0] for a in acquired_set])
    return torch.cat((X[cof_ids, :], f_ids.unsqueeze(dim=-1)), dim=1)

# construct output vector for acquired points
def build_y_train(acquired_set):
    return torch.tensor([y[f_id][cof_id] for f_id, cof_id in acquired_set]).unsqueeze(-1)

# construct vector of the cost of each acquired point (all fidelities)
def build_cost(acquired_set):
    return torch.tensor([cost[f_id][cof_id] for f_id, cof_id in acquired_set]).unsqueeze(-1)

# construct vector of the cost of each acquired point at the given fidelity only
def build_cost_fidelity(acquired_set, fidelity):
    return torch.tensor([cost[f_id][cof_id] for f_id, cof_id in acquired_set if f_id == fidelity]).unsqueeze(-1)

#### test

In [9]:
def test_initializing_functions(X, y):
    ###
    #  Construct training sets
    ###
    # list of (fid_id, cof_id)'s
    acquired_set = [[1, 10], [0, 3], [0, 4]]
    
    # Training Sets
    X_train = build_X_train(acquired_set)
    y_train = build_y_train(acquired_set)
    
    ###
    #  Test that the constructor functions are working properly
    ###
    assert np.allclose(X[10, :], X_train[0, :14])
    assert X_train[0, 14] == 1
    assert X_train[1, 14] == 0
    assert y_train[0] == y[1][10] # y[fid_id][cof_id]
    assert y_train[2] == y[0][4]
    return

test_initializing_functions(X, y)

### Surrogate Model

In [10]:
def train_surrogate_model(X_train, y_train):
    model = SingleTaskMultiFidelityGP(
        X_train, 
        y_train, 
        outcome_transform=Standardize(m=1), # m is the output dimension
        data_fidelity=X_train.shape[1] - 1
    )   
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    fit_gpytorch_model(mll, optimizer=fit_gpytorch_torch)
    return model

### Acquisition Function

In [11]:
# calculate posterior mean and variance for a given fidelity
def mu_sigma(model, X, fidelity):
    f = torch.tensor((), dtype=torch.float64).new_ones((nb_COFs, 1)) * fidelity
    X_f = torch.cat((X, f), dim=1) # last col is associated fidelity
    f_posterior = model.posterior(X_f)
    return f_posterior.mean.squeeze().detach().numpy(), np.sqrt(f_posterior.variance.squeeze().detach().numpy())

# get the current best y-value of desired_fidelity in the acquired set
def get_y_max(acquired_set, desired_fidelity):
    return np.max([y[f_id][cof_id] for f_id, cof_id in acquired_set if f_id == desired_fidelity])

In [12]:
###
# efficient multi-fidelity correlation function
# corr(y at given fidelity, y at high-fidelity)
# (see notes)
###
def mfbo_correlation_function(model, X, fidelity):
    # given fidelity
    f   = torch.tensor((), dtype=torch.float64).new_ones((nb_COFs, 1)) * fidelity
    X_f = torch.cat((X, f), dim=1) # last col is associated fidelity
    
    #  high-fidelity
    hf   = torch.tensor((), dtype=torch.float64).new_ones((nb_COFs, 1)) 
    X_hf = torch.cat((X, hf), dim=1) # last col is associated fidelity

    # combine into a single tensor
    X_all_fid = torch.cat((X_f, X_hf), dim=0)
    
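    # joint posterior covariance over both fidelities, in the model's (standardized) output space.
    # variances and covariances are read from the same matrix so their scales match;
    # mixing model(...) with model.posterior(...).variance would mix standardized and original units.
    n = X_f.size()[0]
    joint_cov = model(X_all_fid).covariance_matrix
    var = torch.diag(joint_cov)
    var_f, var_hf = var[:n], var[n:]
    
    # posterior covariance between the given fidelity and high-fidelity, per COF
    cov = torch.diag(joint_cov[:n, n:])
    
    corr = cov / (torch.sqrt(var_f) * torch.sqrt(var_hf))
    return corr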
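
To make the indexing in the correlation function easier to check, here is a small NumPy sketch on a hand-written covariance matrix. It assumes the same layout as `X_all_fid`: the first `n` rows are the COFs at the given fidelity and the last `n` rows are the same COFs at high fidelity. The function name and the toy matrix are illustrative only.

```python
import numpy as np


def cross_fidelity_correlation(joint_cov, n):
    """Per-COF correlation between a given fidelity and high fidelity.

    joint_cov: (2n, 2n) covariance matrix; rows 0..n-1 are the given
    fidelity, rows n..2n-1 are the same COFs at high fidelity.
    """
    var = np.diag(joint_cov)
    var_f, var_hf = var[:n], var[n:]
    # entry (i, n + i) is the covariance of COF i across the two fidelities
    cov = np.diag(joint_cov[:n, n:])
    return cov / np.sqrt(var_f * var_hf)


# toy joint covariance for n = 2 COFs (not model output)
joint_cov = np.array([
    [4.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.9],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 0.9, 0.0, 1.0],
])
corr = cross_fidelity_correlation(joint_cov, n=2)
print(corr)
```

Because a correlation is unchanged by rescaling the outputs, it does not matter whether the covariance matrix is in standardized or original units, as long as the variances and covariances come from the same matrix.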

In [13]:
###
#  Cost ratio
###
def estimate_cost_ratio(fidelity, acquired_set):
    avg_cost_f  = torch.mean(build_cost_fidelity(acquired_set, fidelity))
    avg_cost_hf = torch.mean(build_cost_fidelity(acquired_set, 1))
    cr = avg_cost_hf / avg_cost_f
    return cr.item()

###
#  Expected Improvement function, only uses hf
###
def EI_hf(model, X, acquired_set):
    hf_mu, hf_sigma = mu_sigma(model, X, 1)
    y_max = get_y_max(acquired_set, 1)
    
    z = (hf_mu - y_max) / hf_sigma
    explore_term = hf_sigma * norm.pdf(z) 
    exploit_term = (hf_mu - y_max) * norm.cdf(z) 
    ei = explore_term + exploit_term
    return np.maximum(ei, np.zeros(nb_COFs))

###
#  Acquisition function
###
def acquisition_scores(model, X, fidelity, acquired_set):
    # expected improvement for high-fidelity
    ei = EI_hf(model, X, acquired_set) 
    
    # augmenting functions
    corr_f1_f0 = mfbo_correlation_function(model, X, fidelity)
    
    cr = estimate_cost_ratio(fidelity, acquired_set)

    scores = torch.from_numpy(ei) * corr_f1_f0 * cr
    return scores.detach().numpy()

def in_acquired_set(f_id, cof_id, acquired_set):
    for this_f_id, this_cof_id in acquired_set:
        if this_cof_id == cof_id and this_f_id == f_id:
            return True
    return False
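
The score used above is the high-fidelity expected improvement multiplied by the cross-fidelity correlation and by the cost ratio. The sketch below reproduces that arithmetic for a single candidate using only the standard library, so the numbers can be checked by hand. The function names are hypothetical, and the guard for `sigma <= 0` is an added assumption that is not present in `EI_hf`.

```python
import math


def expected_improvement(mu, sigma, y_max):
    """Closed-form EI for maximization at a single point."""
    if sigma <= 0.0:
        # assumption: with no predictive uncertainty, EI is the plain improvement
        return max(mu - y_max, 0.0)
    z = (mu - y_max) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    explore_term = sigma * pdf
    exploit_term = (mu - y_max) * cdf
    return explore_term + exploit_term


def multi_fidelity_score(ei_hf, corr, cost_ratio):
    """EI at high fidelity, scaled by correlation and cost ratio."""
    return ei_hf * corr * cost_ratio


# toy candidate: predicted mean equal to the incumbent, unit uncertainty
ei = expected_improvement(mu=10.0, sigma=1.0, y_max=10.0)
score_lf = multi_fidelity_score(ei, corr=0.5, cost_ratio=13.0)
score_hf = multi_fidelity_score(ei, corr=1.0, cost_ratio=1.0)
print(ei, score_lf, score_hf)
```

In this toy case the low-fidelity score is larger than the high-fidelity one because the cost ratio outweighs the weaker correlation, which is the trade-off the acquisition function is meant to capture.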

# Run MFBO

In [14]:
###
#  construct initial inputs
###
discrete_fidelities = [0, 1] # set of discrete fidelities to select from
nb_COFs_initialization = 3   # at each fidelity, number of COFs to initialize with
nb_iterations = 75          # BO budget, includes initializing COFs

acquired_set = initialize_acquired_set(X, y, nb_COFs_initialization, discrete_fidelities)

X_train = build_X_train(acquired_set)
y_train = build_y_train(acquired_set)

print("Initialization - \n")
print("\tfidelity acquired = ", [acq_[0].item() for acq_ in acquired_set])
print("\tid acquired = ", [acq_[1].item() for acq_ in acquired_set])
print("\tcosts acquired = ", build_cost(acquired_set), " [min]")

print("\n\tTraining data:\n")
print("\t\t X train shape = ", X_train.shape)
print("\t\t y train shape = ", y_train.shape)
print("\t\t training feature vector = \n", X_train)

Initialization - 

	fidelity acquired =  [0, 1, 0, 1, 0, 1]
	id acquired =  [112, 112, 522, 522, 45, 45]
	costs acquired =  tensor([[ 16.0102],
        [ 85.4680],
        [  6.0287],
        [255.8537],
        [  3.5390],
        [ 67.3320]], dtype=torch.float64)  [min]

	Training data:

		 X train shape =  torch.Size([6, 15])
		 y train shape =  torch.Size([6, 1])
		 training feature vector = 
 tensor([[-0.0266,  0.0133,  0.0307, -0.0114, -0.0421, -0.0162,  0.0047,  0.0116,
         -0.0062,  0.0424, -0.0308, -0.0131, -0.0132, -0.0238,  0.0000],
        [-0.0266,  0.0133,  0.0307, -0.0114, -0.0421, -0.0162,  0.0047,  0.0116,
         -0.0062,  0.0424, -0.0308, -0.0131, -0.0132, -0.0238,  1.0000],
        [-0.2399, -0.2826, -0.2483,  0.4336,  0.3246,  0.6383, -0.4210, -0.4127,
         -0.0062, -0.1758, -0.0308,  0.9869,  0.4535, -0.0238,  0.0000],
        [-0.2399, -0.2826, -0.2483,  0.4336,  0.3246,  0.6383, -0.4210, -0.4127,
         -0.0062, -0.1758, -0.0308,  0.9869,  0.4535, -0

In [15]:
###
#  Run Search
###
for i in range(nb_COFs_initialization * len(discrete_fidelities), nb_iterations): 
    print("BO iteration: ", i)
    ###
    #  Train Model
    ###
    model = train_surrogate_model(X_train, y_train)

    ###
    # Acquire new (COF, fidelity) not yet acquired.
    ###
    # entry (fid_id, cof_id) is the acquisition value for fidelity f_id and cof cof_id
    the_acquisition_scores = np.array([acquisition_scores(model, X, f_id, acquired_set) for f_id in discrete_fidelities])
    # overwrite acquired COFs/fidelities with negative infinity to not choose these.
    for f_id, cof_id in acquired_set:
        the_acquisition_scores[f_id, cof_id] = - np.inf
    # select COF/fidelity with highest acquisition score.
    f_id, cof_id = np.unravel_index(np.argmax(the_acquisition_scores), np.shape(the_acquisition_scores))
    assert not in_acquired_set(f_id, cof_id, acquired_set)
    # Update acquired_set
    acq = torch.tensor([[f_id, cof_id]], dtype=int)
    acquired_set = torch.cat((acquired_set, acq))
    
    ###
    #  print useful info
    ###
    print("\tacquired COF ", cof_id, " at fidelity, ", f_id)
    print("\t\ty = ", y[f_id][cof_id].item())
    print("\t\tcost = ", cost[f_id][cof_id])
            
    # Update training sets (perform experiment and incur cost)
    X_train = build_X_train(acquired_set)
    y_train = build_y_train(acquired_set)

BO iteration:  6
Iter 10/100: 8.649966451577285
Iter 20/100: 6.26206755970746
Iter 30/100: 4.9909465606677434
Iter 40/100: 4.18377881356157
Iter 50/100: 3.969026411594752
Iter 60/100: 3.8593285725557678
Iter 70/100: 3.814466705392395
Iter 80/100: 3.7929741679094255
Iter 90/100: 3.7831518794269083
Iter 100/100: 3.7761276175238065
	acquired COF  521  at fidelity,  0
		y =  15.30972918212556
		cost =  6.206935115655263
BO iteration:  7
Iter 10/100: 7.632250650920198
Iter 20/100: 5.572069981168265
Iter 30/100: 4.456094986634148
Iter 40/100: 3.7145561522884245
Iter 50/100: 3.5556454359205065
Iter 60/100: 3.470555955829871
Iter 70/100: 3.4310039381264614
Iter 80/100: 3.411292441021827
Iter 90/100: 3.3999351933715007
Iter 100/100: 3.390423229014288
	acquired COF  523  at fidelity,  0
		y =  20.368232946295223
		cost =  6.120688116550445
BO iteration:  8
Iter 10/100: 6.865587544981905
Iter 20/100: 5.048079508976835
Iter 30/100: 4.053556062111394
Iter 40/100: 3.394821216999607
Iter 50/100: 3.25

Iter 80/100: 1.7599014444535204
Iter 90/100: 1.7579995533619663
Iter 100/100: 1.7572479839263033
	acquired COF  504  at fidelity,  0
		y =  2.3677421486178543
		cost =  54.77730476856232
BO iteration:  29
Iter 10/100: 3.0100642234594828
Iter 20/100: 2.440687071820215
Iter 30/100: 2.1247103167854755
Iter 40/100: 1.9201841298907774
Iter 50/100: 1.7859533161183545
Iter 60/100: 1.758994894910641
Iter 70/100: 1.739628200712164
Iter 80/100: 1.7332281781810512
Iter 90/100: 1.7314436562138935
Iter 100/100: 1.7307216208267546
	acquired COF  237  at fidelity,  1
		y =  18.170979513751256
		cost =  233.49063474734623
BO iteration:  30
Iter 10/100: 2.9558779126603762
Iter 20/100: 2.397765249091492
Iter 30/100: 2.075670423428208
Iter 40/100: 1.8554547359401177
Iter 50/100: 1.7256271762601463
Iter 60/100: 1.6993379838982556
Iter 70/100: 1.6788783159529785
Iter 80/100: 1.6731497462001281
Iter 90/100: 1.6712034601294277
Iter 100/100: 1.6704320247657056
	acquired COF  302  at fidelity,  0
		y =  2.8092

Iter 30/100: 1.7191528467633337
Iter 40/100: 1.5154506275291508
Iter 50/100: 1.4247780726721555
Iter 60/100: 1.4041506561660935
Iter 70/100: 1.3918252828859885
Iter 80/100: 1.3869144127194188
Iter 90/100: 1.3850771627270444
Iter 100/100: 1.3846042911206409
	acquired COF  317  at fidelity,  0
		y =  3.846852134351744
		cost =  17.92598495086034
BO iteration:  49
Iter 10/100: 2.3691792599469546
Iter 20/100: 1.9746527406215428
Iter 30/100: 1.706991113478633
Iter 40/100: 1.5044853555231756
Iter 50/100: 1.4135747148834352
Iter 60/100: 1.3937093942962393
Iter 70/100: 1.3824792915668487
Iter 80/100: 1.3771961223657494
Iter 90/100: 1.37563967035472
Iter 100/100: 1.3750002418343419
	acquired COF  515  at fidelity,  0
		y =  6.114563023506541
		cost =  7.098235750198365
BO iteration:  50
Iter 10/100: 2.350752745605308
Iter 20/100: 1.9619190108999396
Iter 30/100: 1.6967240962853187
Iter 40/100: 1.495767993983852
Iter 50/100: 1.406589755939657
Iter 60/100: 1.3868395565050577
Iter 70/100: 1.3755653

Iter 50/100: 1.2444784414049186
	acquired COF  376  at fidelity,  1
		y =  18.396015574220662
		cost =  1178.0223760167758
BO iteration:  69
Iter 10/100: 2.095368003230686
Iter 20/100: 1.772586704441491
Iter 30/100: 1.5224234153868679
Iter 40/100: 1.2821241745923984
Iter 50/100: 1.1610609255394535
Iter 60/100: 1.1481168673778852
Iter 70/100: 1.141262602513715
Iter 80/100: 1.137331348873333
Iter 90/100: 1.1362628532673984
Iter 100/100: 1.1359703534738992
	acquired COF  340  at fidelity,  0
		y =  3.5070975129309887
		cost =  11.001785151163737
BO iteration:  70
Iter 10/100: 2.0850512489676327
Iter 20/100: 1.76503505811603
Iter 30/100: 1.515587925743378
Iter 40/100: 1.2770245457510205
Iter 50/100: 1.1596739404842715
Iter 60/100: 1.146069837860541
Iter 70/100: 1.139256070765039
Iter 80/100: 1.1353915085362982
Iter 90/100: 1.1343055442921115
Iter 100/100: 1.1340407758932765
	acquired COF  248  at fidelity,  0
		y =  1.7942868968651118
		cost =  16.317000313599905
BO iteration:  71
Iter 10/
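
The selection step inside the loop (mask what has already been acquired, then take the arg-max over the fidelity-by-COF score matrix) can be checked on a toy matrix. This is a sketch with made-up scores; `select_next` is a hypothetical helper, not a function defined in this notebook.

```python
import numpy as np


def select_next(scores, acquired):
    """Return the (f_id, cof_id) pair with the highest score not yet acquired.

    scores: array of shape (n_fidelities, n_cofs)
    acquired: iterable of (f_id, cof_id) pairs already evaluated
    """
    masked = np.array(scores, dtype=float)  # copy so the input is not modified
    for f_id, cof_id in acquired:
        masked[f_id, cof_id] = -np.inf
    f_id, cof_id = np.unravel_index(np.argmax(masked), masked.shape)
    return int(f_id), int(cof_id)


# toy scores: 2 fidelities x 3 COFs
scores = np.array([[0.2, 0.9, 0.1],
                   [0.5, 0.3, 0.8]])
acquired = [(0, 1), (1, 2)]  # the two largest entries are already taken
next_pair = select_next(scores, acquired)
print(next_pair)
```

Masking with negative infinity rather than deleting entries keeps the row and column indices aligned with the fidelity and COF ids.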

The stuff below doesn't check the fidelity, which is causing the indexing error (I think). The top COF (cof_id=375, y=18.53, cost=1002.68) is indeed being identified on iteration 44, but at the low fidelity. It isn't until iteration 46 that it is acquired at the high fidelity (which is the one that matters).

In [16]:
###
# look at unique COFs acquired
###
# cof_ids_acquired = torch.tensor([acq[1] for acq in acquired_set])
n_unique_cofs_acquired = len(np.unique([acq[1] for acq in acquired_set]))
print("total number of unique COFs acquired", n_unique_cofs_acquired)

###
#  Iterations until top COF is acquired at high fidelity, and accumulated cost up to that point
###
cof_id_with_max_selectivity = np.argmax(y[1])
BO_iter_top_cof_acquired = float("inf") # dummy 
for i, (f_id, cof_id) in enumerate(acquired_set):
    if cof_id == cof_id_with_max_selectivity and f_id == 1:
        BO_iter_top_cof_acquired = i
        print("woo, top COF acquired!")
        print("iteration we acquire top COF = ", BO_iter_top_cof_acquired) 
        break
    elif i == len(acquired_set)-1:
        print("oh no, top COF not acquired!")


top_cof_acc_cost = sum(build_cost(acquired_set)[:BO_iter_top_cof_acquired])
print("accumulated cost up to observation of top COF = ", top_cof_acc_cost.item(), " [min]")

total number of unique COFs acquired 66
woo, top COF acquired!
iteration we acquire top COF =  67
accumulated cost up to observation of top COF =  2242.038217175008  [min]
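
One detail worth keeping in mind when reading the accumulated cost: the slice `[:BO_iter_top_cof_acquired]` stops just before the acquisition of the top COF, so the cost of that high-fidelity simulation itself is not included. The sketch below, on toy data, returns both variants. The helper name and the toy costs are assumptions for illustration.

```python
import numpy as np


def cost_until_target(acquired, costs, target_cof, target_fidelity=1):
    """Find the first acquisition of target_cof at target_fidelity.

    Returns (index, cost_before, cost_including), or None if the target
    was never acquired at that fidelity.
    """
    running = np.cumsum(costs)
    for i, (f_id, cof_id) in enumerate(acquired):
        if f_id == target_fidelity and cof_id == target_cof:
            cost_including = float(running[i])
            cost_before = cost_including - float(costs[i])
            return i, cost_before, cost_including
    return None


# toy acquisition history: (fidelity, cof_id) pairs and their costs in minutes
acquired = [(0, 5), (1, 2), (0, 7), (1, 7)]
costs = np.array([10.0, 200.0, 12.0, 250.0])
result = cost_until_target(acquired, costs, target_cof=7)
print(result)
```

Returning `None` when the target is missing also avoids slicing with the `float("inf")` placeholder used above.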


# Store Results

In [19]:
mfbo_res = dict({'acquired_set': acquired_set.detach().numpy(),
                 'cost_acquired': build_cost(acquired_set).flatten().detach().numpy(),
                 'nb_COFs_initialization': nb_COFs_initialization,
                 'BO_iter_top_cof_acquired': BO_iter_top_cof_acquired
                })

with open('search_results/mfbo_results_with_EI.pkl', 'wb') as file:
    pickle.dump(mfbo_res, file)