## TODO:
1. figure out the shape issue with the model inputs
    - currently: `model.num_outputs = 5` (pretty sure this is due to num training pts)
    - what is `task_feature`?
    - maybe look at input_constructor
    - can it handle heterotopic data? We are only training henry on henry data and gcmc on gcmc data.
2. look over GPyTorch models
    - figure out how to define the MultiTask model
    - consider just writing this notebook in python
3. Write the multi-fidelity Expected Improvement acquisition function
4. Perform BO and plot results

## System Description
1. We have a set of COFs from a database. Each COF is characterized by a feature vector $$x_{COF} \in X \subset \mathbb{R}^d$$ where $d=14$.


2. We have **two different types** of simulations to calculate **the same material property $S_{Xe/Kr}$**. Therefore, we have a single-task (single-objective), multi-fidelity problem: find the material with the optimal selectivity.
    1. low-fidelity  = Henry Coefficient calculation - MC integration - cost=1
    2. high-fidelity = GCMC mixture simulation - 80:20 (Kr:Xe) at 298 K and 1.0 bar - cost=30


3. We will initialize the system with *two* COFs at both fidelities in order to initialize the Covariance Matrix.
    - The first COF will be the one closest to the center of the normalized feature space
    - The second COF will be chosen at random


4. Each surrogate model will **only train on data acquired at its level of fidelity** (Heterotopic data). $$X_{lf} \neq X_{hf} \subset X$$
    1. We are using the augmented EI acquisition function from [here](https://link.springer.com/content/pdf/10.1007/s00158-005-0587-0.pdf)


5. **kernel model**: 
    1.  We need a Gaussian Process (GP) that will give a *correlated output for each fidelity* i.e. we need a vector-valued kernel
    2. Given the *cost aware* acquisition function, which imposes a fidelity hierarchy, we anticipate the number of training points at each fidelity *will not* be equal (asymmetric scenario) $$n_{lf} > n_{hf}$$
        - perhaps we can force the symmetric case, $n_{lf} = n_{hf} = n$, if we can include `missing` or `empty` entries in the training sets.


Note: even though we have heterotopic data in an asymmetric scenario (a consequence of the hierarchical, multi-fidelity setup), we can still use a symmetric multi-output GP.
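
For orientation, the augmented EI from the paper linked in item 4 can be written roughly as below. This is a sketch from our reading of that paper, rewritten for maximization; the symbols ($\ell$ for the fidelity level, $f^{*}$ for the current best high-fidelity estimate, $s_{\ell}$ for the predictive standard deviation, $\sigma_{\epsilon}$ for the noise level, $c_{\ell}$ for the cost) are notation introduced here and should be checked against the paper.

$$
\mathrm{EI}_{aug}(x, \ell) = \mathbb{E}\left[\max\left(f_{hf}(x) - f^{*},\, 0\right)\right] \cdot \alpha_1(x, \ell) \cdot \alpha_2(x, \ell) \cdot \alpha_3(\ell)
$$

$$
\alpha_1(x, \ell) = \mathrm{corr}\left[f_{\ell}(x),\, f_{hf}(x)\right], \qquad
\alpha_2(x, \ell) = 1 - \frac{\sigma_{\epsilon}}{\sqrt{s_{\ell}^2(x) + \sigma_{\epsilon}^2}}, \qquad
\alpha_3(\ell) = \frac{c_{hf}}{c_{\ell}}
$$

With the costs listed in item 2 (1 and 30), $\alpha_3$ is 30 for a Henry coefficient calculation and 1 for a GCMC simulation, so a low-fidelity evaluation is favored unless its correlation with the high-fidelity output, $\alpha_1$, is weak.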

#### Strategy
We are going to attempt to use the Intrinsic Coregionalization Model (ICM) -- a symmetric, multi-output GP (MOGP) -- along with the augmented EI acquisition function to identify the optimally selective COF in the database.
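
To make the ICM idea concrete, here is a minimal sketch in plain Julia of how its covariance matrix can be assembled for heterotopic data. It is not the BoTorch implementation: the helper names (`sqexp`, `icm_covariance`), the squared-exponential input kernel, and all numerical values are assumptions chosen for illustration.

```julia
using LinearAlgebra

# Squared-exponential kernel on feature vectors (an illustrative choice of input kernel).
sqexp(x, y; lengthscale=1.0) = exp(-sum(abs2, x .- y) / (2 * lengthscale^2))

# ICM covariance between observations (x_a, task_a) and (x_b, task_b):
#   K[a, b] = B[task_a, task_b] * k(x_a, x_b)
# Each observation carries its own task label, so the tasks need not share inputs.
function icm_covariance(X::AbstractMatrix, tasks::AbstractVector{<:Integer}, B::AbstractMatrix)
    n = size(X, 1)
    return [B[tasks[a], tasks[b]] * sqexp(X[a, :], X[b, :]) for a in 1:n, b in 1:n]
end

# Coregionalization matrix B = W W' + Diagonal(kappa), positive definite by construction.
W = reshape([1.0, 0.8], 2, 1)   # rank-1 mixing weights (made-up values)
kappa = [0.05, 0.05]            # task-specific variances (made-up values)
B = W * W' + Diagonal(kappa)

# Three low-fidelity (task 1) and one high-fidelity (task 2) observation at distinct inputs.
X_demo = [0.1 0.2;
          0.5 0.4;
          0.9 0.7;
          0.3 0.8]
tasks = [1, 1, 1, 2]

K = icm_covariance(X_demo, tasks, B)
```

Because every row carries its own task label, the same construction covers the asymmetric case $n_{lf} > n_{hf}$ without padding the training sets with missing entries.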

In [1]:
using JLD2
using PyPlot
using StatsBase # Statistics
using Distributions

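using PyCall
# `@pyimport` is deprecated in PyCall; use the `pyimport` function instead
torch    = pyimport("torch")
gpytorch = pyimport("gpytorch")
botorch  = pyimport("botorch")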


# config plot settings
PyPlot.matplotlib.style.use("ggplot")
rcParams = PyPlot.PyDict(PyPlot.matplotlib."rcParams")
rcParams["font.size"] = 16;

In [10]:
###
#  example
###
train_X = torch.rand(20, 4)
train_Y = train_X.pow(2).sum(dim=-1, keepdim=true)
# data_fidelity is passed straight to Python, so it is a 0-based column index into train_X
model = botorch.models.gp_regression_fidelity.SingleTaskMultiFidelityGP(train_X, train_Y, data_fidelity=2);

In [11]:
model.num_outputs

1

In [12]:
mll = gpytorch.mlls.ExactMarginalLogLikelihood(model.likelihood, model);

In [13]:
botorch.fit.fit_gpytorch_model(mll, max_retries=5)

PyObject ExactMarginalLogLikelihood(
  (likelihood): GaussianLikelihood(
    (noise_covar): HomoskedasticNoise(
      (noise_prior): GammaPrior()
      (raw_noise_constraint): GreaterThan(1.000E-04)
    )
  )
  (model): SingleTaskMultiFidelityGP(
    (likelihood): GaussianLikelihood(
      (noise_covar): HomoskedasticNoise(
        (noise_prior): GammaPrior()
        (raw_noise_constraint): GreaterThan(1.000E-04)
      )
    )
    (mean_module): ConstantMean()
    (covar_module): ScaleKernel(
      (base_kernel): LinearTruncatedFidelityKernel(
        (raw_power_constraint): Positive()
        (power_prior): GammaPrior()
        (covar_module_unbiased): MaternKernel(
          (lengthscale_prior): GammaPrior()
          (raw_lengthscale_constraint): Positive()
          (distance_module): Distance()
        )
        (covar_module_biased): MaternKernel(
          (lengthscale_prior): GammaPrior()
          (raw_lengthscale_constraint): Positive()
          (distance_module): Distance()

In [7]:
###
#  load data
###
@load joinpath(pwd(), "targets_and_normalized_features.jld2") X henry_y gcmc_y

3-element Vector{Symbol}:
 :X
 :henry_y
 :gcmc_y

In [9]:
X[1, :]

14-element Vector{Float64}:
 0.006426269579626523
 0.19186685507929027
 0.5030387259825977
 0.6277566164161331
 0.5349395204896041
 0.42291262109265054
 0.7543285744454562
 0.0
 0.0
 0.0
 0.9699383580246026
 0.0
 0.0
 0.0

In [5]:
torch.from_numpy(X).shape

(608, 14)

In [12]:
# fidelities = torch.tensor([0.5, 0.75, 1.0], **tkwargs)
c1 = 1       # low-fidelity absolute cost
c2 = 30 * c1 # high-fidelity absolute cost

# cost of a fidelity level relative to the most expensive one
function cost_function(cl::Real, cm::Real)
    return cl / cm
end

fidelity_cost = torch.from_numpy([cost_function(c1, c2), cost_function(c2, c2)])

PyObject tensor([0.0333, 1.0000], dtype=torch.float64)

In [5]:
###
#  Randomly select COFs to train GP
###
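nb_henry_init = 2 # assuming asymmetric, heterotopic scenario
nb_gcmc_init  = 2
# assumption: henry_y and gcmc_y are indexed by the same COFs (same rows as X)
@assert length(henry_y) == length(gcmc_y)
# draw all initial COFs in one call so the two sets cannot overlap
ids_init = StatsBase.sample(1:length(henry_y), nb_henry_init + nb_gcmc_init, replace=false)
ids_acquired_henry = ids_init[1:nb_henry_init]
ids_acquired_gcmc  = ids_init[nb_henry_init+1:end]
@assert isdisjoint(ids_acquired_henry, ids_acquired_gcmc) "data not heterotopic"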
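
The system description calls for the first COF to be the one closest to the center of the normalized feature space, with only the second chosen at random. A minimal sketch of that selection is given here. The helper `closest_to_center` is hypothetical, the demo matrix is made up, and the code assumes every feature is min-max normalized to $[0, 1]$, so that the center is 0.5 in each dimension.

```julia
using LinearAlgebra: norm
using Random

# Hypothetical helper (not part of the notebook): index of the row of X that is
# closest, in Euclidean distance, to the center of the feature space.
# Assumes each feature lies in [0, 1], so the center is 0.5 in every dimension.
function closest_to_center(X::AbstractMatrix{<:Real})
    center = fill(0.5, size(X, 2))
    distances = [norm(X[i, :] .- center) for i in 1:size(X, 1)]
    return argmin(distances)
end

# Tiny made-up feature matrix (3 materials, 2 features) for illustration only.
X_demo = [0.0 0.1;
          0.4 0.6;
          1.0 0.9]

id_center = closest_to_center(X_demo)

# Draw the second material at random from the remaining rows.
rng = MersenneTwister(0)
id_random = rand(rng, setdiff(1:size(X_demo, 1), id_center))
```

If the features were standardized rather than min-max normalized, the center would be the column-wise mean instead of 0.5.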

In [6]:
###
#  construct input tensors
###
train_X1 = torch.from_numpy(X[ids_acquired_henry, :])
train_X2 = torch.from_numpy(X[ids_acquired_gcmc, :])

PyObject tensor([[0.4479, 0.6000, 0.5181, 0.1963, 0.2187, 0.1991, 0.1015, 0.1028, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.2303, 0.9000, 0.5653, 0.3508, 0.4941, 0.3773, 0.0000, 0.0974, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000]], dtype=torch.float64)

In [7]:
train_X = torch.cat((train_X1, train_X2)) # feature vectors

PyObject tensor([[0.3216, 0.7000, 0.6009, 0.3101, 0.3860, 0.3409, 0.0000, 0.0513, 0.0000,
         0.2061, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0101, 0.5000, 0.6240, 0.6346, 0.7929, 0.7044, 0.2284, 0.1157, 0.5909,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.4479, 0.6000, 0.5181, 0.1963, 0.2187, 0.1991, 0.1015, 0.1028, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.2303, 0.9000, 0.5653, 0.3508, 0.4941, 0.3773, 0.0000, 0.0974, 0.0000,
         0.0000, 0.0000, 0.0000, 0.0000, 0.0000]], dtype=torch.float64)

In [8]:
y1 = torch.from_numpy(henry_y[ids_acquired_henry]) # low-fidelity
y2 = torch.from_numpy(gcmc_y[ids_acquired_gcmc])   # high-fidelity

y_acquired = torch.stack((y1, y2))
# standardize outputs using *only currently acquired data*
y_acquired = (y_acquired - torch.mean(y_acquired)) / torch.std(y_acquired)

PyObject tensor([[ 0.3046,  1.1003],
        [-1.2982, -0.1066]], dtype=torch.float64)

In [9]:
y_acquired.size()

(2, 2)

In [10]:
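# NOTE: `:2` is not a slice in Julia; it is just the integer 2, so this selects a
# single row and returns a 1-D tensor (see the size below).
# `train_X.narrow(0, 0, 2)` would keep the first two rows instead.
train_X = train_X[:2]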

PyObject tensor([0.0101, 0.5000, 0.6240, 0.6346, 0.7929, 0.7044, 0.2284, 0.1157, 0.5909,
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000], dtype=torch.float64)

In [11]:
train_X.size()

(14,)

In [12]:
# botorch.models.gp_regression_fidelity.SingleTaskMultiFidelityGP(train_X, train_Y, 
#     iteration_fidelity=None, data_fidelity=None, linear_truncated=True, nu=2.5, likelihood=None, 
#     outcome_transform=None, input_transform=None)
# 
# 
# botorch.models.model_list_gp_regression.ModelListGP()
# botorch.models.gpytorch.MultiTaskGPyTorchModel
# botorch.models.gp_regression.SingleTaskGP(train_X, train_Y, likelihood="None")
# model = botorch.models.gp_regression.SingleTaskGP(train_X, y_acquired)


LoadError: PyError ($(Expr(:escape, :(ccall(#= /home/ng/.julia/packages/PyCall/3fwVL/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'botorch.exceptions.errors.BotorchTensorDimensionError'>
BotorchTensorDimensionError('Expected X and Y to have the same number of dimensions (got X with dimension 1 and Y with dimension 2.')
  File "/home/ng/.local/lib/python3.8/site-packages/botorch/models/gp_regression_fidelity.py", line 112, in __init__
    super().__init__(
  File "/home/ng/.local/lib/python3.8/site-packages/botorch/models/gp_regression.py", line 100, in __init__
    self._validate_tensor_args(X=transformed_X, Y=train_Y)
  File "/home/ng/.local/lib/python3.8/site-packages/botorch/models/gpytorch.py", line 83, in _validate_tensor_args
    raise BotorchTensorDimensionError(message)
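
The dimension error above is consistent with `train_X` having been reduced to a 1-D tensor while `y_acquired` is 2-D. One way to get matching shapes for heterotopic data is a "long" layout: one row per observation, a trailing fidelity column in `train_X`, and an `n x 1` target column. The sketch below builds that layout with plain Julia arrays. The helper `stack_fidelities` and the fidelity encoding (0.0 for low, 1.0 for high) are assumptions for illustration, not something BoTorch prescribes.

```julia
# Hypothetical helper (not from the notebook): stack heterotopic observations into
# one design matrix with a trailing fidelity column and one target column.
function stack_fidelities(X_lf::AbstractMatrix, y_lf::AbstractVector,
                          X_hf::AbstractMatrix, y_hf::AbstractVector;
                          s_lf::Real=0.0, s_hf::Real=1.0)
    @assert size(X_lf, 1) == length(y_lf) && size(X_hf, 1) == length(y_hf)
    train_X = vcat(hcat(X_lf, fill(float(s_lf), size(X_lf, 1))),
                   hcat(X_hf, fill(float(s_hf), size(X_hf, 1))))
    train_Y = reshape(vcat(y_lf, y_hf), :, 1)   # n x 1, so X and Y are both 2-D
    return train_X, train_Y
end

# Made-up data: 3 low-fidelity and 2 high-fidelity materials with d = 2 features.
X_lf = [0.1 0.2; 0.3 0.4; 0.5 0.6]
y_lf = [1.0, 2.0, 3.0]
X_hf = [0.7 0.8; 0.9 1.0]
y_hf = [4.0, 5.0]

train_X, train_Y = stack_fidelities(X_lf, y_lf, X_hf, y_hf)
size(train_X), size(train_Y)   # ((5, 3), (5, 1))
```

After converting both arrays with `torch.from_numpy`, the fidelity column would presumably be passed as `data_fidelity = size(train_X, 2) - 1`, since the index is interpreted on the Python side and is therefore 0-based.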


In [13]:
###
#  Construct GP
#  note - I should be able to include the noise if I add it to the data dictionary -> FixedNoiseMultiTaskGP
###
# task_feature = 1
# model = botorch.models.multitask.MultiTaskGP(train_X, y_acquired, task_feature)

model = botorch.models.gp_regression_fidelity.SingleTaskMultiFidelityGP(train_X, y_acquired, data_fidelity=2)

LoadError: PyError ($(Expr(:escape, :(ccall(#= /home/ng/.julia/packages/PyCall/3fwVL/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'botorch.exceptions.errors.BotorchTensorDimensionError'>
BotorchTensorDimensionError('Expected X and Y to have the same number of dimensions (got X with dimension 1 and Y with dimension 2.')
  File "/home/ng/.local/lib/python3.8/site-packages/botorch/models/multitask.py", line 126, in __init__
    self._validate_tensor_args(X=transformed_X, Y=train_Y)
  File "/home/ng/.local/lib/python3.8/site-packages/botorch/models/gpytorch.py", line 83, in _validate_tensor_args
    raise BotorchTensorDimensionError(message)


In [14]:
model.num_outputs

LoadError: UndefVarError: model not defined

In [15]:
mll = gpytorch.mlls.ExactMarginalLogLikelihood(model.likelihood, model)
# mll = gpytorch.mlls.SumMarginalLogLikelihood(model.likelihood, model)

LoadError: UndefVarError: model not defined

In [16]:
botorch.fit.fit_gpytorch_model(mll, max_retries=5)

LoadError: UndefVarError: mll not defined

In [17]:
optput = model(train_X)
optput.mean

LoadError: UndefVarError: model not defined

In [18]:
optput._covar.size()

LoadError: UndefVarError: optput not defined

#### BO function

In [19]:
# """
# # Arguments
# - `X`: feature matrix
# - `y1`: low-fidelity target vector
# - `y2`: high-fidelity target vector
# - `nb_iterations`: maximum number of BO iterations (experiment budget)
# - `nb_COFs_initialization`: number of COFs used to initialize the search
# - `which_acquisition`: which acquisition function to implement
# - `store_explore_exploit_terms`: whether or not to keep track of the explore and exploit 
#                                  terms from the acquisition for the acquired material at each iteration
# - `sample_gp`: whether or not to store sample GP functions
# - `initialize_with`: specify which and/or how many materials to initialize the search
# - `kwargs`: dictionary of optional keyword arguments
# """
# function run_bayesian_optimization(X, y1, y2, nb_iterations::Int, 
#                                    nb_COFs_initialization::Int;
#                                    which_acquisition::Symbol=:EI,
#                                    store_explore_exploit_terms::Bool=false,
#                                    sample_gp::Bool=false,
#                                    initialize_with::Union{Array{Int, 1}, Nothing}=nothing,
#                                    kwargs::Dict{Symbol, Any}=Dict{Symbol, Any}())
#     # quick checks
#     @assert nb_iterations > nb_COFs_initialization "More initializations than iterations not allowed."
#     @assert which_acquisition in [:EI] "Acquisition function not supported:\t $(which_acquisition)"
    
#     # create array to store explore-exploit terms if needed
#     if store_explore_exploit_terms
#         # store as (explore, exploit, fidelity)
#         explore_exploit_balance = Tuple{Float64, Float64, Int64}[]
#     end
    
#     ###
#     #  1. randomly select COF IDs for training initial GP
#     ###
#     if isnothing(initialize_with)
#         ids_acquired = StatsBase.sample(1:size(X, 1), nb_COFs_initialization, replace=false)
#         @assert length(unique(ids_acquired)) == nb_COFs_initialization
#     else
#         # initialize using a specified set of indices
#         ids_acquired = initialize_with
#         fidelity = 1
#         @assert length(unique(ids_acquired)) == nb_COFs_initialization
#     end
#     # initialize using ONLY the high-fidelity results
#     x = X[ids_acquired, :]
#     train_X = torch.from_numpy(x)       # feature vectors
#     y1 = torch.from_numpy(gcmc_y[ids_acquired])  # low-fidelity slot (filled with high-fidelity values, per the comment above)
#     y2 = torch.from_numpy(gcmc_y[ids_acquired])  # high-fidelity
#     train_Y = torch.stack([y1, y2], -1)
#     # standardize outputs using *only currently acquired data*
#     train_Y = (train_Y - torch.mean(train_Y)) / torch.std(train_Y)
    
#     # uses ICM [here](https://botorch.org/api/models.html#multitaskgp)
#     # botorch.models.multitask.MultiTaskGP(train_X, train_Y, task_feature, 
#     #     task_covar_prior=None, output_tasks=None, 
#     #     rank=None, input_transform=None, outcome_transform=None)
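#     # task_feature = -1: the last column of train_X must hold the task index, and train_Y must be n x 1
#     model = botorch.models.multitask.MultiTaskGP(train_X, train_Y, -1)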
    
# end

## RUN MTBO