# Bayesian Optimization

#### Bayesian Hyperparameter Optimization
* Why auto-tuning matters:
    * Humans are bad at tuning hyperparameters by hand
    * A simple model with well-tuned hyperparameters can outperform a more complex, state-of-the-art model that is poorly tuned
* Tuning tips:
    * Keep an open mind (explore the full space from the beginning)
        * Don't prejudge hyperparameter possibilities
    * Don't use grid search as your hyperparameter search method
        * In practice you'll have many hyperparameters; some will matter and others will turn out to be mostly irrelevant, so a grid spends much of its budget repeating the same values of the ones that matter
    * Try to eliminate irrelevant hyperparameters where possible
        * The more hyperparameters you have, the harder tuning becomes
    * A clear pattern can take much longer to emerge than you expect
        * If you want really good tuning, be prepared to spend a lot of time on it
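
The argument against grid search can be made concrete with a small sketch. It assumes two hypothetical hyperparameters on the range [0, 1], a budget of 9 evaluations, and that only the first hyperparameter affects the result; the function names are illustrative, not from any library.

```python
import random

def grid_points(values_a, values_b):
    """All combinations of two hyperparameter value lists (grid search)."""
    return [(a, b) for a in values_a for b in values_b]

def random_points(n, low, high, seed=0):
    """n points drawn uniformly from [low, high] x [low, high] (random search)."""
    rng = random.Random(seed)
    return [(rng.uniform(low, high), rng.uniform(low, high)) for _ in range(n)]

# Same budget of 9 evaluations for both strategies.
grid = grid_points([0.0, 0.5, 1.0], [0.0, 0.5, 1.0])
rand = random_points(9, 0.0, 1.0)

# Assumption: only the first hyperparameter matters.
# Count how many distinct values of it each strategy actually tries.
distinct_grid = len({a for a, _ in grid})
distinct_rand = len({a for a, _ in rand})
print(distinct_grid, distinct_rand)
```

With the same budget, the grid tries only three distinct values of the hyperparameter that matters, while random sampling tries a different value at every evaluation.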
        
**Bayesian parameter estimation for automatically tuning hyperparameters:**
* Neural nets have hyperparameters (e.g. learning rate, number of layers) that are not learned by the training procedure
* You can evaluate them using a validation set, but there's still the problem of which values to try:
    * Brute-force search (e.g. grid search, random search) is very expensive and time-consuming
* Hyperparameter tuning is a kind of black-box optimization: you want to minimize a function, but you can only query its value at chosen points, not compute gradients
* Each evaluation is expensive (it means training a model), so we want to use as few evaluations as possible
* You want to query a point which:
    * you expect to be good
    * you are uncertain about
* $\Rightarrow$ **Bayesian regression allows us to predict not just a value, but a distribution.** $\Leftarrow$
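
One common way to combine "expected to be good" and "uncertain" into a single score is a lower confidence bound, $\mu(x) - \kappa\,\sigma(x)$, minimized over candidates. The notes above do not name a specific rule, so this is only an illustrative sketch: the function names, the value of `kappa`, and the predicted means and standard deviations are all made up.

```python
def lower_confidence_bound(mean, std, kappa=2.0):
    """Optimistic estimate of the loss: low when the prediction is good or uncertain."""
    return mean - kappa * std

def next_query(candidates, means, stds, kappa=2.0):
    """Return the candidate with the smallest lower confidence bound."""
    scores = [lower_confidence_bound(m, s, kappa) for m, s in zip(means, stds)]
    best = min(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]

# Hypothetical predictions for three candidate learning rates.
candidates = [0.001, 0.01, 0.1]
means = [0.30, 0.25, 0.35]  # predicted validation loss
stds = [0.01, 0.02, 0.20]   # predictive uncertainty

chosen = next_query(candidates, means, stds)
print(chosen)
```

Setting `kappa=0.0` ignores uncertainty and always picks the lowest predicted loss; larger values favor exploring points the model knows little about.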

**Bayesian Linear Regression**
* We're interested in the uncertainty of each prediction, not just the predicted value
* Bayesian Linear Regression considers various plausible explanations for how the data were generated
* It makes predictions by averaging over all possible regression weights, weighting each by its posterior probability
* We can turn this into non-linear regression using basis functions (e.g. Gaussian basis functions)
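
A minimal sketch of Bayesian linear regression with Gaussian basis functions follows. It assumes a zero-mean Gaussian prior on the weights with precision `alpha` and Gaussian observation noise with known precision `beta`, which gives a closed-form posterior. The toy data, basis centers, width, and precision values are made up for illustration.

```python
import numpy as np

def gaussian_basis(x, centers, width):
    """Map 1-D inputs to Gaussian bump features plus a constant bias feature."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    bumps = np.exp(-0.5 * ((x - centers) / width) ** 2)
    return np.hstack([np.ones((x.shape[0], 1)), bumps])

def fit_posterior(Phi, y, alpha, beta):
    """Posterior N(m, S) over weights for prior N(0, I/alpha) and noise precision beta."""
    S_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    S = np.linalg.inv(S_inv)
    m = beta * S @ Phi.T @ y
    return m, S

def predict(Phi_new, m, S, beta):
    """Predictive mean and standard deviation at new inputs."""
    mean = Phi_new @ m
    # Noise variance plus the variance contributed by weight uncertainty.
    var = 1.0 / beta + np.sum((Phi_new @ S) * Phi_new, axis=1)
    return mean, np.sqrt(var)

# Toy data (made up): noisy samples of sin(x), observed on [0, 3] only.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 3.0, size=20)
y_train = np.sin(x_train) + 0.1 * rng.normal(size=20)

centers = np.linspace(0.0, 6.0, 9)  # assumed basis function centers
width = 0.75                        # assumed basis function width
alpha, beta = 1.0, 100.0            # assumed prior and noise precisions

m, S = fit_posterior(gaussian_basis(x_train, centers, width), y_train, alpha, beta)
# Query one point inside the observed range and one far outside it.
mean, std = predict(gaussian_basis([1.5, 5.5], centers, width), m, S, beta)
print(mean, std)
```

The predictive standard deviation at `x = 5.5`, far from any training point, comes out larger than at `x = 1.5`, which is the uncertainty signal the optimizer needs when deciding where to query next.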