### Hackathon 3: Optimisation of a Bioprocess with Multifidelity Bayesian Optimisation


#### Hackathon Brief
This hackathon involves optimising a simulated, process-scale bioprocess in which CHO cells produce a desired protein (i.e. growing and feeding cells under precise conditions to obtain the desired product).

#### Inputs and Outputs
The bioprocess has 5 input variables: the temperature [°C], the pH, and the feed concentration [mM] at 3 different timepoints over 150 minutes. The output is the titre, i.e. the concentration of the desired product [g/L]. The goal is to find the input variables that give the highest titre.

The bounds of the inputs are as follows: 

```
temperature [°C]               -> 30 - 40
pH                             -> 6 - 8
first feed concentration [mM]  -> 0 - 50
second feed concentration [mM] -> 0 - 50
third feed concentration [mM]  -> 0 - 50
```

#### Fidelities and Running the simulation
The simulations can be performed at 3 fidelity levels, each with an associated accuracy and cost. Each fidelity corresponds to a different reactor type and scale.

```
Lowest fidelity: 3L reactor with 1 feeding timepoint at 60 mins.
Relative cost: 10
Remarks: The feeding concentration is taken as the second feed concentration. Lowest accuracy, but also lowest cost. 

Middle fidelity: 3L reactor with 3 feeding timepoints at 40, 80, 120 mins.
Relative cost: 575
Remarks: -

Highest fidelity: 15L reactor with 3 feeding timepoints at 40, 80, 120 mins.
Relative cost: 2100
Remarks: Highest accuracy but high cost.
```

To run an experiment, use the `vl.conduct_experiment(X)` function; this is your objective function. Its input is a matrix of shape (N, 6), where N is the number of data points and 6 is the total number of variables, in the following order: `[temperature, pH, feed1, feed2, feed3, fidelity]`. The fidelities are given as integers: `0` is the lowest fidelity, `1` the middle and `2` the highest. An example is shown below.

``` python
import numpy as np
import virtual_lab as vl

def obj_func(X):
    # negated because the optimisation is posed as a minimisation
    return -np.array(vl.conduct_experiment(X))

X_initial = np.array([[33, 6.25, 10, 20, 20, 0],
                      [38, 8, 20, 10, 20, 0]])
Y_initial = vl.conduct_experiment(X_initial)
print(Y_initial)
```
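Since the BO code usually expects outputs as a column array, it can help to coerce whatever the experiment returns into shape (N, 1) inside the objective wrapper. The sketch below is only an illustration: `as_column`, `make_objective` and `fake_experiment` are hypothetical names, and `fake_experiment` is a stand-in that returns a flat list so the example runs without `virtual_lab`. The actual return type of `vl.conduct_experiment` should be checked by printing it.

```python
import numpy as np

def as_column(raw_output):
    """Coerce a list/array of N outputs into a float array of shape (N, 1)."""
    return np.asarray(raw_output, dtype=float).reshape(-1, 1)

def make_objective(experiment, sign=-1.0):
    """Wrap an experiment function; sign=-1 turns maximisation into minimisation."""
    def obj_func(X):
        return sign * as_column(experiment(np.atleast_2d(X)))
    return obj_func

# Hypothetical stand-in for vl.conduct_experiment (NOT the real simulator):
# returns a flat Python list with one value per row.
def fake_experiment(X):
    return [float(row[:5].sum()) for row in X]

obj_func = make_objective(fake_experiment)
Y = obj_func(np.array([[33, 6.25, 10, 20, 20, 0],
                       [38, 8, 20, 10, 20, 0]]))
print(Y.shape)  # (2, 1)
```

In a real run, `make_objective(vl.conduct_experiment)` would replace the stand-in, assuming its output can be converted to a flat numeric array.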

#### Goal and Submission
Your goal is to develop a Bayesian Optimisation class that finds the set of inputs which maximises the titre. You have a budget of 10000 (note the cost of running each fidelity) and a maximum runtime on the instructor's computer (be aware of how large the search space becomes, especially with 6 dimensions!), and you may start with at most 6 training points. (Remember, you need at least 2 points for each variable for the covariance matrix to be calculated.)

As in previous hackathons, please submit your BO class (and GP class) along with the execution block to the Streamlit app. A different cell type (with different simulation parameters and maxima) will be used for scoring.

This hackathon will be scored based on the sum of the normalised maximum titre concentration obtained.

You must stay within the allocated budget! This will be checked, and if exceeded, your submission will be disqualified!
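A simple way to keep track of spending is to sum the relative cost of every row you evaluate. The following is a minimal sketch, assuming the fidelity index is the last column of each row (as in the input format of `vl.conduct_experiment`) and using the relative costs listed above. `experiment_cost` and `can_afford` are hypothetical helper names. Whether the initial training points count towards the budget is not stated in this brief; the sketch simply counts every row it is given.

```python
import numpy as np

# Relative cost per fidelity index, taken from the fidelity table in this brief.
FIDELITY_COST = {0: 10, 1: 575, 2: 2100}
BUDGET = 10000

def experiment_cost(X):
    """Total relative cost of the rows of X (last column = fidelity index)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return sum(FIDELITY_COST[int(f)] for f in X[:, -1])

def can_afford(spent, X_candidate, budget=BUDGET):
    """True if evaluating X_candidate keeps the running total within budget."""
    return spent + experiment_cost(X_candidate) <= budget

X_initial = np.array([[33, 6.25, 10, 20, 20, 0],
                      [38, 8, 20, 10, 20, 0]])
spent = experiment_cost(X_initial)
print(spent)  # 20
print(can_afford(spent, [[35, 7, 10, 10, 10, 2]]))  # True: 20 + 2100 <= 10000
```

Checking `can_afford` before each call to the objective function is one way to make sure the loop stops before the budget is exceeded.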

#### Form of the BO class and execution block
You are allowed to write your own BO class or make modifications to any of the previously seen BO classes. 

You must include the attributes `self.X` and `self.Y`, corresponding to all of your evaluated inputs and outputs, as these will be used to retrieve the information used for scoring.

```python
#submission should look something like the following
class GP: #if you have any separate classes other than the BO class
    def __init__(self, ...):
        ...
#BO class
class BO: 
    def __init__(self, ...):
        self.X = ...  # training data, to which every newly evaluated input is appended
        self.Y = ...  # outputs of the objective function evaluated at self.X

# BO Execution Block
X_training = [...]
X_searchspace = [...]

BO_m = BO(...)
```
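To make the role of `self.X` and `self.Y` concrete, here is a minimal runnable skeleton. It is not a Bayesian optimisation algorithm: the next point is drawn at random where a real implementation would fit a GP and optimise an acquisition function. The constructor arguments and `toy_objective` (a stand-in for the simulator) are assumptions made for illustration only.

```python
import numpy as np

def toy_objective(X):
    """Hypothetical stand-in for the simulator, used only so the sketch runs."""
    X = np.atleast_2d(X)
    return -np.sum((X[:, :5] - 20.0) ** 2, axis=1, keepdims=True)

class BO:
    def __init__(self, X_training, X_searchspace, iterations, objective_func, seed=0):
        # self.X holds every evaluated input, one row of 6 values per experiment
        self.X = np.array(X_training, dtype=float)
        # self.Y holds the matching outputs as a column of shape (n, 1)
        self.Y = np.asarray(objective_func(self.X), dtype=float).reshape(-1, 1)
        rng = np.random.default_rng(seed)
        for _ in range(iterations):
            # Placeholder choice: a real BO would train a GP on (self.X, self.Y)
            # and pick the point that optimises the acquisition function.
            idx = rng.integers(len(X_searchspace))
            x_next = np.asarray(X_searchspace[idx], dtype=float).reshape(1, -1)
            y_next = np.asarray(objective_func(x_next), dtype=float).reshape(-1, 1)
            # Append the new evaluation so scoring can read the full history
            self.X = np.vstack([self.X, x_next])
            self.Y = np.vstack([self.Y, y_next])

X_training = np.array([[33, 6.25, 10, 20, 20, 0],
                       [38, 8, 20, 10, 20, 0]])
X_searchspace = np.array([[35, 7, 10, 10, 10, 0],
                          [36, 7, 25, 25, 25, 1],
                          [37, 7, 30, 30, 30, 2]], dtype=float)
BO_m = BO(X_training, X_searchspace, 3, toy_objective)
print(BO_m.X.shape, BO_m.Y.shape)  # (5, 6) (5, 1)
```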

#### Guidance (Advanced)
You must develop a multifidelity Bayesian Optimisation algorithm. Your score will additionally be penalised by the code runtime, measured in seconds. Make your algorithm as fast as it can go!


#### Guidance (Intermediate) 
It is not mandatory for you to develop a multifidelity BO algorithm. You could, if you choose, use the single or batch BO algorithm developed previously to perform the optimisation. The lowest fidelity experiments do not give accurate outcomes, so you have to choose how many experiments to perform at each fidelity such that you do not exceed your allocated budget.

However, if you do wish to tackle the hackathon with a multifidelity BO by modifying the single batch BO code from the first hackathon, here are some pointers.
1. Observe the output of the `vl.conduct_experiment(X)` function. The output does not have the same array structure/shape as the outputs obtained in the previous sections. You have to modify it to fit what the BO algorithm expects.
2. Create a new acquisition function that is cost aware. We have previously used the Lower Confidence Bound (LCB) to balance exploration and exploitation of the search space. To make this cost aware, we can scale the values obtained from LCB by the cost.

```python
    def MF_lower_confidence_bound(...):
        lower_std = Ysearchspace_mean - acquisition_hyperparam[0]*np.sqrt(Ysearchspace_std)
        # mf_lower_std = lower_std / associated cost for each simulation
        return (X_searchspace[np.argmin(mf_lower_std)])
```
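The hint above can be turned into a standalone function along the following lines. This is a sketch under stated assumptions, not the reference solution: `std` is taken to be the predictive standard deviation, `costs` is an array indexed by the fidelity integer, `kappa` plays the role of `acquisition_hyperparam[0]`, and the fidelity index is the last column of `X_searchspace`.

```python
import numpy as np

def mf_lower_confidence_bound(X_searchspace, mean, std, costs, kappa=2.0):
    """Return the candidate with the lowest cost-scaled LCB (minimisation)."""
    X_searchspace = np.asarray(X_searchspace, dtype=float)
    # Standard LCB for minimisation: mean minus a multiple of the uncertainty
    lcb = np.ravel(mean) - kappa * np.ravel(std)
    # Look up the cost of each candidate from its fidelity index (last column)
    fidelity = X_searchspace[:, -1].astype(int)
    cost_per_point = np.asarray(costs, dtype=float)[fidelity]
    mf_lcb = lcb / cost_per_point
    return X_searchspace[np.argmin(mf_lcb)]

costs = [10, 575, 2100]  # relative costs from the fidelity table
candidates = np.array([[35, 7, 10, 10, 10, 0],
                       [35, 7, 10, 10, 10, 2]], dtype=float)
mean = np.array([-1.0, -5.0])  # predicted negative titre
std = np.array([0.1, 0.5])
x_next = mf_lower_confidence_bound(candidates, mean, std, costs)
print(x_next[-1])  # 0.0
```

Here the cheap candidate wins (scaled value -0.12 against roughly -0.0029) even though the expensive one has the better raw LCB. Note that plain division behaves this way only while the LCB values are negative; for positive values (for example after standardising the outputs) dividing by cost would favour the expensive fidelity, so a different scaling may be needed.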

If this is done successfully, well done! However, you might see that the code runs rather slowly for each iteration (remember how the runtime scales with additional dimensions in the search space). If you are finding it difficult to run the full set of iterations, a recommendation is to lower the total number of points in your search space. For example, if you are using the `np.linspace()` function, start with a very coarse number of points for each dimension (e.g. 3) to develop your code. Once you are happy that the code runs without errors, you can increase the number of points per dimension.
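One possible way to build such a coarse grid is sketched below, assuming the input bounds listed earlier and treating the fidelity as a sixth axis with values 0, 1 and 2. `build_searchspace` is a hypothetical helper name.

```python
import numpy as np

def build_searchspace(points_per_dim=3, fidelities=(0, 1, 2)):
    """Full grid over the 5 inputs plus the fidelity column, shape (n, 6)."""
    # Bounds from the brief: temperature, pH, feed1, feed2, feed3
    bounds = [(30, 40), (6, 8), (0, 50), (0, 50), (0, 50)]
    axes = [np.linspace(lo, hi, points_per_dim) for lo, hi in bounds]
    axes.append(np.array(fidelities, dtype=float))
    # indexing="ij" keeps the column order equal to the order of the axes
    grid = np.meshgrid(*axes, indexing="ij")
    return np.stack([g.ravel() for g in grid], axis=1)

X_searchspace = build_searchspace(3)
print(X_searchspace.shape)  # (729, 6)
```

With 3 points per dimension the grid has 3^5 x 3 = 729 rows; with 10 points per dimension it already has 300000 rows, which illustrates how quickly the cost of evaluating the GP over the whole search space grows.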

#### Package Imports

Packages are limited to the ones listed in the package cell. Talk to one of the instructors to ask if it is possible to import other packages.

```python
# if using Google Colab, run the following pip installs!
!pip install sobol_seq
!pip install plotly
!pip install gpytorch
!pip install rdkit
```

```python
import numpy as np
import numpy.random as rnd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d, Axes3D
import plotly.graph_objs as go
from scipy.integrate import quad
from scipy.spatial.distance import cdist
from scipy.optimize import minimize, differential_evolution, NonlinearConstraint
from sklearn.decomposition import PCA
import math
import time
import sobol_seq
import torch
import gpytorch
import copy

import virtual_lab as vl
import conditions_data as data
from utils import standardize_data, unstandardize_y, train_gp_model, \
    expected_improvement, summed_feeding, optimize_acquisition_function, plot_data
```