### Hackathon 1: Optimisation of a non-convex 2-D function with Bayesian Optimisation

#### Hackathon Brief
This hackathon involves the optimisation of a complex, non-convex function with Bayesian Optimisation. 

#### Optimisation Details and Constraints
Using your knowledge of how Gaussian Processes and Bayesian Optimisation operate (mean function, covariance/kernel functions, initialisation points, acquisition functions etc.), your task is to develop two Python classes: a GP class and a BO class (similar to what was seen previously in sections B and C) to obtain the input values (x1 and x2) which correspond to the minimum of a complex, non-convex 2D function. The exact function is not revealed, but it is based on a modification of either the `Rastrigin, Ackley or Shubert function`. <a href="https://www.sfu.ca/~ssurjano/optimization.html"> See here for equations.</a>

```
Optimisation Task : Minimisation.
Search space      : x1 and x2 both from -10 to 10. 
Constraints       : Budget of 50 iterations. (See guidance for advanced teams.)
Training points   : Allowed a maximum of 15 initial training points.

```
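
For local testing, the unmodified 2-D forms of the three candidate functions can serve as stand-in objectives. The sketch below follows the standard definitions on the linked page; the hidden scoring function is a modification of one of them, so these are test surrogates only, and the function names are illustrative rather than part of the template.

```python
import numpy as np

def rastrigin(x):
    """Standard Rastrigin function; global minimum of 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

def ackley(x, a=20.0, b=0.2, c=2.0 * np.pi):
    """Standard Ackley function; global minimum of 0 at the origin."""
    x = np.asarray(x, dtype=float)
    term1 = -a * np.exp(-b * np.sqrt(np.mean(x**2)))
    term2 = -np.exp(np.mean(np.cos(c * x)))
    return term1 + term2 + a + np.e

def shubert(x):
    """Standard 2-D Shubert function; it has many global minima."""
    x = np.asarray(x, dtype=float)
    j = np.arange(1, 6)
    # For each coordinate x_i: sum over j = 1..5 of j * cos((j + 1) * x_i + j)
    sums = np.sum(j * np.cos((j + 1) * x[:, None] + j), axis=1)
    return np.prod(sums)
```

Each function takes a single point `[x1, x2]`, which matches how the template calls `objective_func(x)` one row at a time. Remember to remove or comment out any such test objective before submitting.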

#### Submission Details
A template and example of a GP and BO class function can be seen below (the same code structure as seen previously). You are allowed to write your own BO class or copy-paste/make modifications to any of the previously seen BO classes. 

You must include the attributes `self.X` and `self.Y`, corresponding to all of your evaluated inputs and outputs, in your BO class, as these will be used to retrieve the information used for scoring. You must also include the input `objective_func` in your BO class, as the instructor will pass the scoring objective function as an input! Please remove/comment out your own test objective functions when submitting!

```python
#submission should look something like the following
class GP: #if you have any separate classes other than the BO class
    def __init__(self, ...):
        ...
#BO class
class BO: 
    def __init__(self, ...):
        self.X = #training data which the evaluated data is to be appended
        self.Y = #evaluated via the objective function using self.X

# BO Execution Block
X_training = [...] #maximum of 15 training points


#Please remove/comment your objective functions when submitting!
#def obj_func(X):
#	return (...)

BO_m = BO(...,
          objective_func = obj_func, #please have objective_func as input and do not change the 'obj_func' variable name!
         ...)
```

Once completed, please upload your classes to the Stremlet submission page where your code will be tested against an objective function. The scoring is based on the lowest output obtained (the lower the better).

#### Guidance (Advanced):
You are encouraged to write your own GP/BO algorithm! You have a range of possibilities, from implementing better kernels to using designer acquisition functions (given some knowledge of the function). There are packages (see below) that can drastically improve tensor manipulation and increase the speed at which your code runs. Please be advised that the code will be run on the instructor's laptop, which does not have any GPUs!

You do not have a maximum iteration budget. However, your score will be penalised by the square of the number of iterations (i.e. your final score will be `min(self.Y) - iteration**2`). Because of this, you *must* define the attribute `self.iterations = # number_of_iterations` in your BO class.

#### Guidance (Intermediates):
You are more than welcome to write your own GP/BO algorithm! You can also use the template given below. You should have some familiarity with how the basic GP and BO classes in the given template work from sections B and C. 

The search space is already defined in the BO class. You can change the number of points per variable, `number_points_pervariable = 200`, within the BO class if you wish. Here is some additional guidance:

##### Form of Training Points Input (X):
If you are using the given template then, as in previous examples, the training points X must be a matrix of shape (N,2), where N is the number of training points. A symmetrical (grid) input can be built with a list comprehension. Tailored matrices can also be written out and passed in directly. Example:

``` python
x1loc  = [1,2]; x2loc = [3,4]
X_training = np.array([[x1,x2] for x1 in x1loc for x2 in x2loc])

# This is equivalent to:
X_training =  [[1, 3], [1, 4], [2, 3], [2, 4]]
```
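
As an alternative to a hand-built grid, the initial points can come from a space-filling random design. The following is a minimal sketch of a Latin hypercube sample using NumPy only; the helper name `latin_hypercube` and the seed are illustrative assumptions, and the `sobol_seq` package from the import cell could be used in a similar way.

```python
import numpy as np

def latin_hypercube(n_points, lower, upper, seed=0):
    """Draw n_points in the box [lower, upper], one point per stratum in each dimension."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    dim = lower.size
    # Stratified samples in [0, 1): one per equal-width bin, for every dimension
    unit = (np.arange(n_points)[:, None] + rng.random((n_points, dim))) / n_points
    # Shuffle each column independently so the dimensions are not correlated
    for d in range(dim):
        unit[:, d] = rng.permutation(unit[:, d])
    return lower + (upper - lower) * unit

# 15 initial points (the maximum allowed) inside the stated search space
X_training = latin_hypercube(15, lower=[-10, -10], upper=[10, 10])
```

The result has shape (15, 2), so it can be passed to the BO class in the same way as the grid example.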

##### Form of GP and BO class
It is advised that you first inspect and modify the key aspects of the classes (e.g. hyperparameters, acquisition function) to obtain a better BO algorithm before changing the mathematical implementation of the GP/BO class. In the BO class, you must keep the step that appends each new training point to `self.X`, as the score ranking depends on identifying which input in `self.X` corresponds to the lowest output in `self.Y`.
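
A quick way to check that `self.X` and `self.Y` stay aligned is to read the best point back out of them yourself. The snippet below is only a sketch: the helper `best_observation` and the demo arrays are hypothetical, and how the instructor's scoring script is actually implemented is not specified here.

```python
import numpy as np

def best_observation(X, Y):
    """Return the evaluated input with the lowest output, together with that output."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(-1)
    best_index = int(np.argmin(Y))
    return X[best_index], Y[best_index]

# Hypothetical stand-in data with the template's shapes: X is (N, 2), Y is (N, 1)
X_demo = np.array([[1.0, 3.0], [1.0, 4.0], [2.0, 3.0]])
Y_demo = np.array([[5.0], [-2.0], [0.5]])

x_best, y_best = best_observation(X_demo, Y_demo)

# Best value found so far after each evaluation, useful for a convergence plot
running_best = np.minimum.accumulate(Y_demo.reshape(-1))
```

With a finished run, the same call would be `best_observation(BO_m.X, BO_m.Y)`. If the rows of `self.X` and `self.Y` ever have different lengths, a point was appended to one but not the other.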

##### Some Starting Points to Help
1. The potential objective functions are given, so look them up! You can plot these functions (or view them on a graphing website like Desmos) to observe their nature (symmetry, number of minima etc.). Use this information to decide your BO strategy.
2. Similar to Section C, modify the code to include plots of how your BO is performing over each iteration. This will help you visualise and evaluate the usefulness of your modifications.
3. Decide the initial training points (Random? Biased? Uniform distribution?) and acquisition function (Greedy? Purely explorative? Lower confidence bound? Expected Improvement?) with the associated hyperparameters.
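
Two of the acquisition functions mentioned above can be sketched as follows for a minimisation problem. This is an illustration under assumptions, not part of the template: the function names, the default `kappa` and `xi` values, and the toy numbers are hypothetical, and both functions expect the GP's predictive standard deviation (not its variance).

```python
import math
import numpy as np

def lower_confidence_bound(mean, std, kappa=2.0):
    """LCB for minimisation: the candidate with the smallest value is chosen."""
    return np.asarray(mean, dtype=float) - kappa * np.asarray(std, dtype=float)

def expected_improvement(mean, std, y_best, xi=0.0):
    """EI for minimisation: the candidate with the largest value is chosen."""
    mean = np.asarray(mean, dtype=float)
    std = np.maximum(np.asarray(std, dtype=float), 1e-12)  # avoid division by zero
    improvement = y_best - mean - xi
    z = improvement / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))  # standard normal CDF
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)            # standard normal PDF
    return improvement * cdf + std * pdf

# Toy predictions at three candidate points (made-up numbers)
mean_demo = np.array([0.0, -1.0, 1.0])
std_demo = np.array([1.0, 0.1, 3.0])

next_by_lcb = int(np.argmin(lower_confidence_bound(mean_demo, std_demo, kappa=2.0)))
next_by_ei = int(np.argmax(expected_improvement(mean_demo, std_demo, y_best=-0.5)))
```

In the template, a new acquisition method would take `X_searchspace`, `Ysearchspace_mean` and `Ysearchspace_std` and return the selected row of `X_searchspace`, in the same way as `greedy_fullexplotative`. A larger `kappa` pushes the search towards exploration.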

Other considerations:
1. Will changing the GP mean function help with efficiency?
2. Are there better kernels for non-convex functions?
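
As a starting point for the kernel question, here are two common alternatives to the squared exponential kernel, written with NumPy only. This is a sketch under assumptions: the function names and default hyperparameters are hypothetical, `lengthscale` here is a plain (not squared) length scale, and whether either kernel actually helps on the hidden function is something to test.

```python
import numpy as np

def scaled_distance(XA, XB, lengthscale):
    """Euclidean distance between the rows of XA and XB after dividing by lengthscale."""
    A = np.asarray(XA, dtype=float) / lengthscale
    B = np.asarray(XB, dtype=float) / lengthscale
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt(np.sum(diff**2, axis=-1))

def matern52(XA, XB, lengthscale=1.0, sf2=1.0):
    """Matern 5/2 kernel: gives rougher sample paths than the squared exponential."""
    r = scaled_distance(XA, XB, lengthscale)
    return sf2 * (1.0 + np.sqrt(5.0) * r + 5.0 * r**2 / 3.0) * np.exp(-np.sqrt(5.0) * r)

def periodic(XA, XB, period=1.0, lengthscale=1.0, sf2=1.0):
    """Periodic (exp-sine-squared) kernel, multiplied across the input dimensions."""
    diff = np.asarray(XA, dtype=float)[:, None, :] - np.asarray(XB, dtype=float)[None, :, :]
    sin2 = np.sin(np.pi * diff / period) ** 2
    return sf2 * np.exp(-2.0 * np.sum(sin2, axis=-1) / lengthscale**2)

# Covariance matrices for a few made-up points
X_demo = np.array([[0.0, 0.0], [0.3, 1.2], [-2.0, 0.5]])
K_matern = matern52(X_demo, X_demo, lengthscale=1.5, sf2=2.0)
K_periodic = periodic(X_demo, X_demo, period=2.0)
```

To use a new kernel in the template, both `Cov_mat` (training covariance) and `calc_cov_sample` (covariance between a test point and the training points) would need a matching branch, otherwise training and prediction would use different kernels.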


#### Package Imports

Packages are limited to the ones listed in the package cell. Talk to one of the instructors if you would like to import other packages.

```python
# If using Google Colab, run the following pip installs!
!pip install sobol_seq
!pip install plotly
!pip install gpytorch
!pip install rdkit
```

```python
import numpy as np
import numpy.random as rnd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d, Axes3D
import plotly.graph_objs as go
from scipy.integrate import quad
from scipy.spatial.distance import cdist
from scipy.optimize import minimize, differential_evolution, NonlinearConstraint
from sklearn.decomposition import PCA
import math
import time
import sobol_seq
import torch
import gpytorch
import copy
```

#### GP class, BO class, BO Execution Template/Example


```python
#GP class
class GP:
    def __init__(self, X, Y, kernel):
        
        self.X, self.Y, self.kernel                                  = X, Y, kernel
        self.number_of_point, self.nx_dimensions, self.ny_dimensions = X.shape[0], X.shape[1], Y.shape[1]
        self.multistart_loops                                        = 3

        self.X_mean, self.X_std     = np.mean(X, axis=0), np.std(X, axis=0)
        self.Y_mean, self.Y_std     = np.mean(Y, axis=0), np.std(Y, axis=0)
        self.X_norm, self.Y_norm    = (X-self.X_mean)/self.X_std, (Y-self.Y_mean)/self.Y_std

        self.hyperparam_optimized , self.inverse_covariance_matrix_opt   = self.determine_hyperparameters()     
        
    def Cov_mat(self, kernel, X_norm, W, sf2):
        if kernel == 'SquaredExponential':
            xixj_euclidean_distance = cdist(X_norm, X_norm, 'seuclidean', V=W)**2 
            cov_matrix = sf2*np.exp(-0.5*xixj_euclidean_distance)
            return (cov_matrix)
        else:
            # Fail loudly instead of returning None, which would cause a confusing error later
            raise ValueError(f'No kernel with name {kernel}')

    def negative_loglikelihood(self, hyper, X, Y):
        n_point, nx_dim = self.number_of_point, self.nx_dimensions
        kernel          = self.kernel
        
        W               = np.exp(2*hyper[:nx_dim])   
        sf2             = np.exp(2*hyper[nx_dim])    
        sn2             = np.exp(2*hyper[nx_dim+1])  

        K       = self.Cov_mat(kernel, X, W, sf2)  
        K       = K + (sn2 + 1e-8)*np.eye(n_point)
        K       = (K + K.T)*0.5                    
        L       = np.linalg.cholesky(K)           
        logdetK = 2 * np.sum(np.log(np.diag(L)))   
        invLY   = np.linalg.solve(L,Y)             
        alpha   = np.linalg.solve(L.T,invLY)       
        NLL     = np.dot(Y.T,alpha) + logdetK      
        return (NLL)

    def determine_hyperparameters(self): 
        lower_bound = np.array([-4.]*(self.nx_dimensions+1) + [-8.])  
        upper_bound = np.array([4.]*(self.nx_dimensions+1) + [ -2.]) 
        bounds      = np.hstack((lower_bound.reshape(self.nx_dimensions+2,1), upper_bound.reshape(self.nx_dimensions+2,1)))
    
        multi_startvec                = sobol_seq.i4_sobol_generate(self.nx_dimensions + 2, self.multistart_loops)
        
        temp_min_hyperparams          = [0.]*self.multistart_loops
        temp_loglikelihood            = np.zeros((self.multistart_loops))
        hyperparam_optimized          = np.zeros((self.nx_dimensions+2, self.ny_dimensions)) #for best solutions
        inverse_covariance_matrix_opt = []
        
        for i in range(self.ny_dimensions):
            for j in range(self.multistart_loops ):
                hyperparams_initialisation   = lower_bound + (upper_bound-lower_bound)*multi_startvec[j,:] # mapping sobol unit cube to bounds
                result  = minimize(self.negative_loglikelihood,
                                   hyperparams_initialisation,
                                   args     = (self.X_norm, self.Y_norm[:,i]),
                                   method   = 'SLSQP',
                                   options  = {'disp':False,'maxiter':10000},
                                   bounds   = bounds,
                                   tol      = 1e-12)
                temp_min_hyperparams[j] = result.x
                temp_loglikelihood[j]   = result.fun  

            minimumloglikelihood_index    = np.argmin(temp_loglikelihood)
            hyperparam_optimized[:,i]     = temp_min_hyperparams[minimumloglikelihood_index  ]
    
            lengthscale_opt         = np.exp(2.*hyperparam_optimized[:self.nx_dimensions,i])
            signalvarience_opt      = np.exp(2.*hyperparam_optimized[self.nx_dimensions,i])
            noise_opt               = np.exp(2.*hyperparam_optimized[self.nx_dimensions+1,i]) + 1e-8
    
            covarience_matrix_opt              = self.Cov_mat(self.kernel, self.X_norm, lengthscale_opt,signalvarience_opt) + noise_opt*np.eye(self.number_of_point)   
            inverse_covariance_matrix_opt     += [np.linalg.solve(covarience_matrix_opt, np.eye(self.number_of_point))]
        return (hyperparam_optimized , inverse_covariance_matrix_opt)


    def calc_cov_sample(self,xnorm,Xnorm,ell,sf2):
        nx_dim     = self.nx_dimensions
        dist       = cdist(Xnorm, xnorm.reshape(1,nx_dim), 'seuclidean', V=ell)**2
        cov_matrix = sf2 * np.exp(-.5*dist)
        return (cov_matrix )         


    def GP_inference_np(self, x):
        nx_dim                   = self.nx_dimensions
        kernel, ny_dim           = self.kernel, self.ny_dimensions
        hypopt, Cov_mat          = self.hyperparam_optimized, self.Cov_mat
        stdX, stdY, meanX, meanY = self.X_std, self.Y_std, self.X_mean, self.Y_mean
        calc_cov_sample          = self.calc_cov_sample
        invKsample               = self.inverse_covariance_matrix_opt
        Xsample, Ysample         = self.X_norm, self.Y_norm

        xnorm = (x - meanX)/stdX
        mean  = np.zeros(ny_dim)
        var   = np.zeros(ny_dim)
        
        for i in range(ny_dim):
            invK           = invKsample[i]
            hyper          = hypopt[:,i]
            ellopt, sf2opt = np.exp(2*hyper[:nx_dim]), np.exp(2*hyper[nx_dim])

            k             = calc_cov_sample(xnorm,Xsample,ellopt,sf2opt)
            raw_mean      = np.matmul(np.matmul(k.T,invK),Ysample[:,i]).item()
            mean[i]       = raw_mean
            raw_var_array = np.maximum(0, sf2opt - (k.T @ invK @ k)).item()
            raw_var       = raw_var_array
            var[i]        = raw_var
    
        mean_sample = mean*stdY + meanY
        var_sample  = var*stdY**2
        
        return (mean_sample, var_sample)

#BO class
class BO: 
    def __init__(self, X, kernel, acquisition_function, objective_func, acquisition_hyperparam, iterations):       
        number_points_pervariable      = 200
        number_points_searchspace      = number_points_pervariable ** (np.shape(X)[1])
        X_searchspace                  = np.linspace(-10, 10, num=number_points_pervariable)
        X_searchspace                  = np.array([[x,y] for x in X_searchspace for y in X_searchspace])
        
        # Use the X argument (not the global X_training) and store it as an array of shape (N, 2)
        self.X, self.iterations  = np.asarray(X, dtype=float), iterations
        Fx_training              = np.array([objective_func(x) for x in self.X])
        self.Y                   = Fx_training.reshape(Fx_training.shape[0],1)
        
        # The objective is not evaluated over the whole search grid here: doing so would cost
        # number_points_pervariable**2 extra evaluations, and the result was never used.
        n_candidates             = X_searchspace.shape[0]
        Ysearchspace_mean        = np.zeros(n_candidates)
        Ysearchspace_std         = np.zeros(n_candidates)
        
        for i in range(iterations):
            GP_m = GP(self.X, self.Y, kernel)
            
            for number in range(len(X_searchspace)):
                m_ii, var_ii   = GP_m.GP_inference_np(X_searchspace[number])  # returns (mean, variance)
                Ysearchspace_mean[number] = m_ii.item()
                Ysearchspace_std[number]  = np.sqrt(var_ii.item())            # convert variance to standard deviation
   
            if acquisition_function == 'greedy':
                X_acquisitionfunc = self.greedy_fullexplotative(X_searchspace,Ysearchspace_mean, Ysearchspace_std )     
            else: 
                print('No acquisition function called ', acquisition_function)
                break
            
            self.X = np.append(self.X, [X_acquisitionfunc],0)
            self.Y = np.append(self.Y, [[objective_func(X_acquisitionfunc)]],0)
    
    def greedy_fullexplotative(self, X_searchspace, Ysearchspace_mean, Ysearchspace_std):
        return (X_searchspace[np.argmin(Ysearchspace_mean)])

# BO Execution Block
X_training = [...]

#Please remove/comment your objective functions when submitting!
#def obj_func(X):
#	return (...)

BO_m = BO(X = ...,  
           kernel = ..., 
           iterations = ..., # maximum of 50 iterations for intermediate teams!
           acquisition_function = ..., 
           objective_func = obj_func, #please have objective_func as input and do not change the 'obj_func' variable name!
           acquisition_hyperparam= [...],
           ...)
```