# Bayesian Optimization

We seek to optimize (here, minimize) a black-box function $f(x)$, where $x \in \mathcal{D} \subset \mathbb{R}^D$ for some simple domain $\mathcal{D}$ (e.g. the unit hypercube). Evaluations of $f$ yield no gradient information, and $f$ is typically assumed to be expensive to evaluate. Bayesian optimization (BO) addresses these challenges by placing a prior distribution on $f$, then iteratively selecting input points and updating the distribution on $f$ after each evaluation. The updated distribution in turn guides the selection of the next point. The prior distribution on $f$ is typically a Gaussian process (GP):

$$ y(\cdot) \sim \mathcal{GP}(\mu(\cdot), k(\cdot, \cdot))$$

**TODO: include the generic BO algorithm here.**
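
Pending that, the loop described above can be sketched roughly as follows. This is a minimal skeleton under stated assumptions, not a definitive algorithm: the names `bayes_opt_loop`, `posterior`, and `acquisition` are hypothetical, the acquisition is maximized over a finite candidate set rather than over all of $\mathcal{D}$, and the surrogate model is passed in as a callable so that nothing is assumed about how (or whether) hyperparameters are re-fit.

```python
import numpy as np

def bayes_opt_loop(f, candidates, posterior, acquisition, n_init=3, n_iter=10, seed=0):
    """Generic BO loop for minimization over a finite candidate set.

    f           : objective; maps a length-D array to a scalar float
    candidates  : array of shape (m, D), the points we may evaluate
    posterior   : hypothetical callable (X, y, candidates) -> (mean, variance),
                  each of shape (m,)
    acquisition : hypothetical callable (mean, variance, f_min) -> scores of shape (m,)
    """
    rng = np.random.default_rng(seed)
    # Initial design: evaluate f at a few randomly chosen candidates.
    idx = rng.choice(len(candidates), size=n_init, replace=False)
    X = candidates[idx]
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        # Condition the model on every observation collected so far.
        mean, variance = posterior(X, y, candidates)
        # Score all candidates and take the acquisition maximizer.
        # Note: nothing here prevents a candidate from being selected twice.
        scores = acquisition(mean, variance, y.min())
        x_next = candidates[np.argmax(scores)]
        # Evaluate the expensive objective once and record the result.
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    best = np.argmin(y)
    return X[best], y[best], X, y
```

Any surrogate and acquisition with matching signatures can be plugged in; the loop itself only records data and picks the maximizer.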

**TODO: need to clarify; at each iteration, are we re-fitting hyperparameters?**

We denote the GP conditioned on the first $n$ observations $D_n = \{(x_i, f(x_i))\}_{i=1}^n$ as

$$ y(\cdot)|D_n \sim \mathcal{GP}(\mu_n(\cdot), k_n(\cdot, \cdot))$$
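
To make the conditioning step concrete, here is a minimal NumPy sketch of the posterior mean $\mu_n$ and posterior variance at a set of new points. It rests on several assumptions not stated in the text: a zero prior mean, a squared-exponential kernel with fixed (not re-fit) hyperparameters, and noise-free observations, with a small jitter term added only for numerical stability. The function names `sq_exp_kernel` and `gp_posterior` are illustrative.

```python
import numpy as np

def sq_exp_kernel(A, B, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel between rows of A (n, D) and B (m, D)."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_posterior(X, y, X_new, lengthscale=0.2, variance=1.0, jitter=1e-8):
    """Posterior mean and pointwise variance of a zero-mean GP at X_new.

    X: (n, D) evaluated inputs, y: (n,) observed values, X_new: (m, D).
    """
    K = sq_exp_kernel(X, X, lengthscale, variance) + jitter * np.eye(len(X))
    K_s = sq_exp_kernel(X, X_new, lengthscale, variance)  # shape (n, m)
    # Cholesky factorization K = L L^T avoids forming an explicit inverse.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y
    mean = K_s.T @ alpha
    V = np.linalg.solve(L, K_s)                           # L^{-1} K_s
    # For this stationary kernel the prior variance at any point is `variance`.
    var = variance - np.sum(V**2, axis=0)
    # Clip tiny negative values caused by round-off.
    return mean, np.maximum(var, 0.0)
```

Under these assumptions the posterior interpolates the data: at an already-evaluated input the mean matches the observed value and the variance is close to zero, while far from all data the posterior reverts to the prior.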


# Probability of Improvement

A simple acquisition function is the probability of improvement (PI), which selects the next input $x_{n+1}$ to maximize the probability (under the current GP distribution) that $f(x_{n+1})$ is smaller than the current minimum. We denote the current minimum by 

$$ f^n_{\text{min}} := \min\left\{f(x_1), \dots, f(x_n) \right\}$$

The PI acquisition is thus defined by 
$$
\begin{align*}
a_{PI}(x) &= \mathbb{P}\left(y(x) \leq f^n_{\text{min}} \right) \\
          &= \int_{-\infty}^{f^n_{\text{min}}} \mathcal{N}(y \mid \mu_n(x), k_n(x, x)) \, dy \\
          &= \Phi\left(\frac{f^n_{\text{min}} - \mu_n(x)}{\sqrt{k_n(x, x)}} \right)
\end{align*}
$$

Here $\Phi$ denotes the standard normal CDF, and $k_n(x, x)$ is the posterior variance at $x$.
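
The closed form above is cheap to evaluate once the posterior mean and variance at a point are known. The sketch below uses only the standard library and the identity $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$. The function name and the handling of a zero-variance posterior are assumptions made for illustration.

```python
import math

def probability_of_improvement(mean, var, f_min):
    """PI for minimization: Phi((f_min - mean) / sqrt(var)).

    mean, var : posterior mean and variance at a single point x
    f_min     : smallest objective value observed so far
    """
    if var <= 0.0:
        # Degenerate posterior (e.g. at an already-evaluated point): the
        # model value is deterministic, so PI reduces to an indicator.
        return 1.0 if mean <= f_min else 0.0
    z = (f_min - mean) / math.sqrt(var)
    # Standard normal CDF written in terms of the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

For example, a point whose posterior mean equals the current minimum has PI of exactly 0.5 regardless of its variance, which illustrates that PI measures only the chance of improving, not the size of the improvement.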

In [None]:
import numpy as np
# Objective to try: f(x) = exp(-1.4x) cos(7πx/2)
def f(x):
    return np.exp(-1.4 * x) * np.cos(3.5 * np.pi * x)