# **PART 2B**: Under the hood of MFBO: multi-fidelity Gaussian process regression (MFGPR)

This Jupyter notebook contains more in-depth information on MFGPR and how it can be used in `F3DASM`.

## 1. Import the necessary packages.

In [None]:
import f3dasm
import numpy as np
import pandas as pd

## 2. Define the hyperparameters

In [None]:
dim = 1
fids = [0.5, 1.0]
costs = [0.5, 1.0]
samp_nos = [500, 10] # Explanation: [no. of low fidelity pts., no. of high fidelity pts.]

noise_fix = False
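As a quick sanity check on these settings, the total sampling budget can be worked out by hand. The sketch below assumes that each entry of `costs` is the cost of a single evaluation at the corresponding fidelity; the notebook does not state this explicitly, so treat that interpretation as an assumption.

```python
# Assumption: costs[i] is the cost of ONE evaluation at fidelity fids[i].
fids = [0.5, 1.0]
costs = [0.5, 1.0]
samp_nos = [500, 10]  # [low-fidelity points, high-fidelity points]

# Cost contributed by each fidelity level, and the overall budget
cost_per_level = [c * n for c, n in zip(costs, samp_nos)]
total_cost = sum(cost_per_level)

print(cost_per_level)  # [250.0, 10.0]
print(total_cost)      # 260.0
```

Under that assumption, the low-fidelity samples dominate the budget even though each one is cheaper.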

## 3. Specify the multi-fidelity problem
In Part 1, `create_analytical_mf_problem` was used as a pipeline method to generate the elements of a multi-fidelity problem from analytical functions.

Here, we look more closely at what that method does.

Pick a base function.

In [None]:
base_fun = f3dasm.functions.Periodic(
    dimensionality=dim,
    scale_bounds=np.tile([0.0, 1.0], (dim, 1)),
)

Compile the list of augmented functions that constitute the multi-fidelity function.

In [None]:
funs = []
for fid in fids:
    fun = f3dasm.AugmentedFunction(
        base_fun=base_fun,
        fid=fid,
    )
    funs.append(fun)

Next, the multi-fidelity design space is created as a list of augmentations of the design parameter space.

In [None]:
mf_design_space = []

for fid in fids:
    design = f3dasm.make_nd_continuous_design(
        bounds=np.tile([0.0, 1.0], (dim, 1)),
        dimensionality=dim,
    )
    fidelity_parameter = f3dasm.ConstantParameter(name="fid", constant_value=fid)
    design.add_input_space(fidelity_parameter)

    mf_design_space.append(design)

In [None]:
mf_design_space

Each of the fidelity-augmented design spaces then gives rise to a sampler.

In [None]:
mf_sampler = []
for design in mf_design_space:
    sampler = f3dasm.sampling.SobolSequence(design=design)
    mf_sampler.append(sampler)

Finally, a design of experiments (DoE) is sampled from each augmented design space, and the samples are combined into a single multi-fidelity DoE.

In [None]:
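mf_train_data = []
for sampler, fun, samp_no in zip(mf_sampler, funs, samp_nos):
    # Sample the inputs for this fidelity level and evaluate the matching function
    train_data = sampler.get_samples(numsamples=samp_no)
    train_data.add_output(output=fun(train_data))

    mf_train_data.append(train_data)

# Overwrite the data of the last (high-fidelity) entry with the combined DoE of all fidelities
mf_train_data[-1].data = pd.concat([d.data for d in mf_train_data], ignore_index=True)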

In [None]:
mf_train_data[-1].data
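The structure of the combined DoE can be mirrored with plain NumPy. The sketch below is not how `F3DASM` stores its data: it uses uniform random draws as a stand-in for the Sobol sequence and a bare array instead of the `f3dasm` data object. It only illustrates the idea of appending a constant fidelity column to each sample set and stacking the results.

```python
import numpy as np

dim = 1
fids = [0.5, 1.0]
samp_nos = [500, 10]  # [low-fidelity points, high-fidelity points]

rng = np.random.default_rng(0)

blocks = []
for fid, n in zip(fids, samp_nos):
    # Stand-in for the Sobol sampler: uniform draws in the unit hypercube
    x = rng.uniform(0.0, 1.0, size=(n, dim))
    # Constant fidelity parameter, one value per sample
    fid_col = np.full((n, 1), fid)
    blocks.append(np.hstack((x, fid_col)))

# Stack the per-fidelity blocks into one multi-fidelity input array
mf_inputs = np.vstack(blocks)

print(mf_inputs.shape)  # (510, 2)
```

Each row holds the design variables followed by the fidelity value, so the low- and high-fidelity samples remain distinguishable after stacking.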

## 4. Regression and prediction


A regressor object is built based on the (multi-fidelity) data and the design.

This type of multi-fidelity Gaussian process regression is described in the following article:

```{bibliography}
Wu, J.; Toscano-Palmerin, S.; Frazier, P. I. & Wilson, A. G.
Practical multi-fidelity Bayesian optimization for hyperparameter tuning
Uncertainty in Artificial Intelligence, 2020, 788-798
```

In [None]:
regressor = f3dasm.regression.gpr.Stmf(
    mf_train_data=mf_train_data[-1],
    mf_design=mf_train_data[-1].design,
    noise_fix=noise_fix,
)

The resulting surrogate model is obtained by training the regressor.

In [None]:
surrogate = regressor.train()

This surrogate model can then be used to predict the (high-fidelity) output for any point in the design space.

In [None]:
test_sampler = f3dasm.sampling.LatinHypercube(design=mf_design_space[-1])
test_data = test_sampler.get_samples(numsamples=500)

mean, var = surrogate.predict(test_data)
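To give a feel for what such a prediction involves, here is a minimal NumPy sketch of Gaussian process regression on fidelity-augmented inputs `(x, fid)`. It uses a single RBF kernel with fixed, hand-picked hyperparameters, which is a simplified stand-in and not the kernel used by `Stmf` or in the article cited above. The toy low- and high-fidelity functions and the helper names `rbf_kernel` and `gp_predict` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel between the rows of a and b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_predict(x_train, y_train, x_test, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP with an RBF kernel."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train)
    chol = np.linalg.cholesky(k_tt)
    # alpha = (K + noise * I)^-1 y, computed through the Cholesky factor
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    var = rbf_kernel(x_test, x_test).diagonal() - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off

# Toy data (hypothetical): a shifted sine as low fidelity, a plain sine as high fidelity
x_lo = np.linspace(0.0, 1.0, 20)[:, None]
x_hi = np.linspace(0.0, 1.0, 5)[:, None]
y_lo = np.sin(2 * np.pi * x_lo[:, 0]) + 0.3
y_hi = np.sin(2 * np.pi * x_hi[:, 0])

# Augment every input with its fidelity value, then stack both levels
x_train = np.vstack((
    np.hstack((x_lo, np.full_like(x_lo, 0.5))),
    np.hstack((x_hi, np.full_like(x_hi, 1.0))),
))
y_train = np.concatenate((y_lo, y_hi))

# Predict at the highest fidelity only
x_grid = np.linspace(0.0, 1.0, 50)[:, None]
x_test = np.hstack((x_grid, np.ones_like(x_grid)))
mean, var = gp_predict(x_train, y_train, x_test)
```

Because the fidelity value is part of the input, low-fidelity observations influence the high-fidelity prediction only as far as the kernel correlates the two levels. A dedicated multi-fidelity kernel models that correlation more carefully than this sketch does.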

Finally, let's plot the prediction alongside the exact objective landscape.

In [None]:
import matplotlib.pyplot as plt

# Store the predicted mean as the output and append the variance as an extra column.
# Columns used below: 0 = x0, 2 = predicted mean, 3 = predicted variance.
test_data.add_output(mean)
test_data_var = np.hstack((test_data.data.values, var))
test_sort = test_data_var[test_data_var[:, 0].argsort()]  # sort by x0 for plotting

# Band of two standard deviations around the predicted mean
std = np.sqrt(np.abs(test_sort[:, 3]))
ucb = test_sort[:, 2] + 2 * std
lcb = test_sort[:, 2] - 2 * std

# The last entry of mf_train_data holds the combined DoE; keep only the high-fidelity rows
mf_data = mf_train_data[-1].data
hf_data = mf_data.loc[mf_data['input', 'fid'] == 1.]

plt.scatter(hf_data['input', 'x0'], hf_data['output'], label='High fidelity data', c='b')
plt.plot(test_sort[:, 0], test_sort[:, 2], color='purple', label='High fidelity prediction')
plt.fill_between(test_sort[:, 0].flatten(), lcb, ucb, color='purple', alpha=.25, label='Confidence')
plt.plot(test_sort[:, 0], base_fun(test_sort[:, 0][:, None]), 'b--', label='High fidelity exact')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
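The visual impression from the plot can also be put into a number: the fraction of exact values that fall inside the two-standard-deviation band. The sketch below uses small hand-made arrays rather than the notebook's results, and the helper name `band_coverage` is hypothetical.

```python
import numpy as np

def band_coverage(mean, var, exact, n_std=2.0):
    """Fraction of exact values lying within mean +/- n_std standard deviations."""
    std = np.sqrt(np.abs(var))
    lower = mean - n_std * std
    upper = mean + n_std * std
    inside = (exact >= lower) & (exact <= upper)
    return inside.mean()

# Hand-made example: standard deviation 0.5 everywhere, so the band half-width is 1.0
mean = np.array([0.0, 1.0, 2.0, 3.0])
var = np.array([0.25, 0.25, 0.25, 0.25])
exact = np.array([0.5, 1.9, 3.5, 3.0])

coverage = band_coverage(mean, var, exact)
print(coverage)  # 0.75, since only the third point lies outside its band
```

For a well-calibrated Gaussian prediction, roughly 95% of the exact values would be expected to fall inside a two-standard-deviation band.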

## 5. Exercises.
1. Change the (base) function to the Schwefel function. What do you notice?
2. Use `50` high-fidelity data points and `500` low-fidelity data points. Compare the result with the single-fidelity result obtained with the hyperparameter settings suggested in Exercise 3 in part A. What do you notice?