Explanation and exploration of Bayesian optimization vs. grid search vs. random search for hyperparameter optimization.


Bayesian_Opt_Testing

A simple sample Bayesian optimizer

A number of production-ready libraries exist, such as:
- MOE
- Spearmint
- hyperopt
- GPyOpt

This notebook instead builds a simple algorithm from scratch.

I did not come up with this algorithm; I am simply exploring its functionality!

Simply put, a Bayesian optimizer requires two parts:

A Gaussian process (`sklearn.gaussian_process`, imported as `gp`)

- From which we predict the posterior distribution of the target function (the error of our models as a function of their hyperparameters).
- Because a Gaussian distribution is completely defined by its mean and variance, this model relies on the assumption that a Gaussian process is likewise completely defined by its mean function and covariance function.

  - Until math has another revolution and we discover that we know nothing about math (which I seem to find a lot), this is a pretty safe assumption (GP ~ mean function & covariance function).

  A GP is a popular probability model because it induces a posterior distribution over the loss function that is analytically tractable. This allows us to update our beliefs about what the loss function looks like after we have computed the loss for a new set of hyperparameters. (https://thuijskens.github.io/2016/12/29/bayesian-optimisation/)
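As a minimal sketch of that idea (my own illustration, not the notebook's code): fit a Gaussian process to a few observed (x, loss) pairs and query its posterior, which is fully described by a mean and a standard deviation at every candidate point. The toy function `f` stands in for a model's loss surface.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):
    # Toy target function standing in for a model's loss surface.
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(0)
X_obs = rng.uniform(-2, 2, size=(5, 1))   # 5 observed hyperparameter values
y_obs = f(X_obs).ravel()                  # their observed losses

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X_obs, y_obs)

# The posterior at each query point is a Gaussian: a mean and a std.
X_query = np.linspace(-2, 2, 100).reshape(-1, 1)
mu, sigma = gp.predict(X_query, return_std=True)
print(mu.shape, sigma.shape)   # (100,) (100,)
```

Each new observation refits the GP, tightening the posterior near sampled points and leaving it wide (uncertain) elsewhere.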

An acquisition function

- Which decides at which point of the target function we want to sample next.
- A number of well-documented acquisition functions exist, listed below:

- **Probability of Improvement** (PI)
  - Looks where a function's **improvement is most likely**.
  - Can lead to odd behavior because it relies on the current minimum rather than the magnitude of the possible improvement.

- **Expected Improvement** (EI) (also MEI?)
  - Looks where a function **may most improve**, aka *maximal expected utility*.
  - EI(x) = 𝔼[max{0, f(x) − f(x̂)}]
  - Crowd favorite.

- **Entropy Search**
  - Improves the function by **minimizing the uncertainty** of any predicted optimum.

- **Upper Confidence Bound** (UCB)
  - Trades off exploitation and exploration by favoring points where the upper confidence bound is best, i.e. where the bound is still largely undetermined.

- **Maximum Probability of Improvement** (MPI)

- **PMAX**
- **IEMAX**
- **GP-Hedge**

To follow the crowd, this notebook uses the Expected Improvement function, for reasons I may revisit this notebook to explain.
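The EI formula above has a well-known closed form under a Gaussian posterior. A minimal sketch for the minimization case (the function and variable names are mine, not the notebook's; `xi` is an optional exploration margin):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_loss, xi=0.01):
    """Closed-form EI(x) = E[max(0, best_loss - f(x))], computed from the
    GP's posterior mean `mu` and standard deviation `sigma` at candidates."""
    sigma = np.maximum(sigma, 1e-9)       # guard against zero variance
    improvement = best_loss - mu - xi     # expected gain over the incumbent
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)

# Three candidates: a near-certain improver, an uncertain one, a poor one.
mu = np.array([0.0, 0.5, 1.0])
sigma = np.array([0.1, 0.5, 0.1])
print(expected_improvement(mu, sigma, best_loss=0.4))
```

Note how the middle candidate, despite a worse mean than the incumbent, still earns nonzero EI purely from its large uncertainty; that is the exploration term at work.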

With these two parts, our program should work as follows.

Pseudo-code:

1. Given observed values of the target function f(x), update the posterior expectation of f using the Gaussian process.
2. Find new_x that maximizes the EI: new_x = argmax EI(x).
3. Compute the value of f(new_x).
4. Update the posterior expectation of f.
5. Repeat for n_iterations.
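The steps above can be sketched end-to-end (my own toy illustration, assuming a 1-D loss and a fixed candidate grid; a production optimizer would run a proper inner optimization over EI instead):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):                                    # toy loss function to minimize
    return np.sin(3 * x) + 0.5 * x

def expected_improvement(mu, sigma, best_loss):
    sigma = np.maximum(sigma, 1e-9)
    z = (best_loss - mu) / sigma
    return (best_loss - mu) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(5, 1))          # priors: 5 random initial points
y = f(X).ravel()
candidates = np.linspace(-2, 2, 500).reshape(-1, 1)

for _ in range(10):                          # n_iterations
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)                             # 1. update posterior expectation of f
    mu, sigma = gp.predict(candidates, return_std=True)
    new_x = candidates[np.argmax(expected_improvement(mu, sigma, y.min()))]  # 2.
    X = np.vstack([X, [new_x]])              # 3. compute f(new_x), record it
    y = np.append(y, f(new_x))               # 4./5. refit on next iteration

print("best x:", X[np.argmin(y)].item(), "best loss:", y.min())
```

After the 5 initial points plus 10 acquisitions, the incumbent is the best of 15 evaluations, mirroring the 15-point budget used for grid and random search below.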

Example Function

Ultimately we will be looking at hyperparameters, treating the score or error as a function of the parameters. Here we treat it as an optimization problem over an example function, finding the global minimum (of the error).

*(figure: the example function and its equation, eq_1)*

*(figure: the true function)*

GridSearch & RandomSearch

Using 15 points, finding minima.

*(figures: grid search and random search results)*

Grid search misses the true global minimum, since it lies outside the discrete grid boundaries. Random search also misses the true global minimum, but could have done better than grid.
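The behavior in the figures can be reproduced with a toy 1-D loss (my own example function, not the notebook's), comparing what each strategy finds on the same 15-point budget:

```python
import numpy as np

def f(x):                                         # toy loss standing in for model error
    return np.sin(3 * x) + 0.5 * x

budget = 15
grid_x = np.linspace(-2, 2, budget)               # evenly spaced: may straddle the minimum
rand_x = np.random.default_rng(0).uniform(-2, 2, budget)  # random: luck-dependent

print("grid best loss:  ", f(grid_x).min())
print("random best loss:", f(rand_x).min())
```

Neither strategy uses earlier evaluations to choose later ones, so both can land arbitrarily close to, or far from, the true minimum depending on spacing and luck.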

Bayesian Opt

Generating priors (initial 5 points)

*(figure: Bayesian optimization after 5 points)*

Updating the process with each acquisition. Note that the red line maps to the acquisition function on the bottom and indicates where to look next.

*(figures: Bayesian optimization after each acquisition, points 6 through 15)*

Bayesian optimization finds the global optimum, confirms it is global, and converges toward the true minimum.

Comparing methods with 15 points

*(figures: grid search, random search, Bayesian (15 points), and the true function)*

- Grid searching searches the parameter space symmetrically and systematically.
  - Thorough but inefficient; the uniform spacing between samples may miss details.
  - Weak in higher-dimensional spaces.
- Random searching searches the parameter space randomly.
  - Efficient but less thorough; reliant on sufficient iterations.
  - Stronger in higher-dimensional spaces.

Neither learns from previously selected elements of the parameter space.

- Bayesian optimization, however, does learn from previous elements, and works effectively as dimensionality increases.

2 Parameters

Example Function

Again, we treat the score or error as a function of the parameters, posing it as an optimization problem over an example function, this time finding the global maximum (of the score). In this particular example there are two maxima, which would confuse grid and random search.
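The figure names below suggest this is the six-hump camel function. For reference, a sketch of its standard form, negated here (my assumption) so the task matches the text's maximization of a score:

```python
def camel(x, y):
    """Negated six-hump camel function: score = -f(x, y), so the two
    global minima of the standard form become two global maxima."""
    return -((4 - 2.1 * x**2 + x**4 / 3) * x**2
             + x * y
             + (-4 + 4 * y**2) * y**2)

# The standard form has two symmetric global minima near (0.0898, -0.7126)
# and (-0.0898, 0.7126) with value about -1.0316, so the score peaks there.
print(camel(0.0898, -0.7126))
```

The two equal peaks are exactly what makes this a good stress test: a search strategy that cannot learn from earlier samples has no way to concentrate effort on either basin.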

*(figure: the six-hump camel equation)*

*(figure: the true camel function)*

GridSearch & RandomSearch

Using 25 points, finding maxima.

*(figures: grid search and random search on the camel function)*

Bayesian Opt

Generating priors (initial 5 points)

*(figure: Bayesian optimization after 5 points)*

Updating the process with each acquisition. Note the point on the acquisition map, which indicates where to look next.

Bayesian (10 points) *(figure: camel_10)*

Bayesian (15 points) *(figure: camel_15)*

Bayesian (20 points): at this point, the assumed function is fairly accurate to the true function. *(figure: camel_20)*

Bayesian (25 points) *(figure: camel_25)*

Comparing Methods with 25 points

*(figures: grid search, random search, Bayesian (25 points), and the true camel function)*

Again...

- Grid searching searches the parameter space symmetrically and systematically.
  - Thorough but inefficient; the uniform spacing between samples may miss details.
  - Weak in higher-dimensional spaces.
- Random searching searches the parameter space randomly.
  - Efficient but less thorough; reliant on sufficient iterations.
  - Stronger in higher-dimensional spaces.

Neither learns from previously selected elements of the parameter space.

- Bayesian optimization, however, does learn from previous elements, and works effectively as dimensionality increases.

  At 20 acquisition points, the Bayesian model already closely resembles the true function, compared with grid and random search using 25 points.

Resources:

- https://towardsdatascience.com/shallow-understanding-on-bayesian-optimization-324b6c1f7083
- https://thuijskens.github.io/2016/12/29/bayesian-optimisation/
- https://github.com/fmfn/BayesianOptimization
- https://sheffieldml.github.io/GPyOpt/
