# Bayesian Optimization

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ChemAI-Lab/AI4Chem/blob/main/website/modules/06-atomic_simulation_environment.ipynb)

**References:**
1. **Chapters 1-3**: [Bayesian Optimization](https://bayesoptbook.com/book/bayesoptbook.pdf), R. Garnett.
2. **Chapter 6**: [Pattern Recognition and Machine Learning](https://www.microsoft.com/en-us/research/wp-content/uploads/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf), C. M. Bishop.
3. **Chapter 2**: [Gaussian Processes for Machine Learning](https://direct.mit.edu/books/oa-monograph-pdf/2514321/book_9780262256834.pdf), C. E. Rasmussen, C. K. I. Williams.
4. **Chapter 4**: [Machine Learning in Quantum Sciences](https://arxiv.org/pdf/2204.04198)
5. **Chapter 6**: [Probabilistic Machine Learning: An Introduction](https://probml.github.io/pml-book/book1.html), K. P. Murphy.
6. [**The Kernel Cookbook**](https://www.cs.toronto.edu/~duvenaud/cookbook/)

# Optimization without Gradients

During the course, we saw that many scientific problems can be recast as an optimization problem,
$$
\mathbf{x}^* = \arg\min_{\mathbf{x}} f(\mathbf{x})
$$
where $\mathbf{x}^*$ is the **minimizer** of the function $f$. <br>
We have solved this problem using gradient-based methods, where at each step we use the local information carried by the gradient to move "towards" the minimizer of $f$,
$$
\mathbf{x}_{t+1} = \mathbf{x}_{t} - \eta \nabla_{\mathbf{x}}f(\mathbf{x}_{t}),
$$
where $\eta$ is the learning rate (step size) and $\mathbf{x}_{t}$ is the current estimate of the minimizer. This style of method has been successful in scientific frameworks where $f$ is differentiable or its gradient can be easily estimated. However, for systems where $f$ is a **black box** function, gradient-based optimization is infeasible.
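
As a reminder of what the gradient-based update looks like in practice, here is a minimal sketch. The toy objective `f`, its analytic gradient `grad_f`, and the values of `eta` and `n_steps` are assumptions chosen only for illustration; the point is that every step requires access to the gradient.

```python
import numpy as np

# Assumed toy objective (not from the course material): a shifted quadratic bowl
# whose minimizer is known to be x* = (1, -2).
X_STAR = np.array([1.0, -2.0])

def f(x):
    """Toy objective f(x) = ||x - x*||^2."""
    return float(np.sum((np.asarray(x) - X_STAR) ** 2))

def grad_f(x):
    """Analytic gradient of the toy objective."""
    return 2.0 * (np.asarray(x) - X_STAR)

def gradient_descent(grad, x0, eta=0.1, n_steps=100):
    """Repeat the update x_{t+1} = x_t - eta * grad f(x_t) for n_steps iterations."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - eta * grad(x)
    return x

x_min = gradient_descent(grad_f, x0=[0.0, 0.0])
print("estimated minimizer:", x_min)
print("objective value:", f(x_min))
```

If `grad_f` were not available, this loop could not be run as written, which is exactly the situation a black box function puts us in.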

## Black Box Function
* A black box function is a system, algorithm, or piece of code where only the inputs and outputs are visible, while the internal logic, mechanisms, or code structure are hidden or unknown.

```
x ∈ ℝ^d   ─────▶   [   BLACK BOX  f(x)   ]   ─────▶   y = f(x)
(parameters)                                   (objective value)
```

Typical properties of the black box functions we care about:

* Expensive to evaluate (minutes–days per query)
* No gradients available
* Noisy observations


> In **Bayesian Optimization**, we assume we can query the function, but we cannot inspect its internals. <br>
> We cannot differentiate it analytically, and each evaluation may cost minutes, hours, or even days. <br>
> Therefore, we must be strategic about where to sample next.
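
To make the query-only setting concrete, the sketch below wraps a function in a hypothetical `BlackBox` class. The class name, the hidden function, the noise level, and the query budget are all assumptions for illustration. The optimizer only ever sees the returned value `y`, and every call is counted against a limited budget. The naive evenly spaced grid used here is only a baseline, not a Bayesian Optimization strategy.

```python
import time
import numpy as np

class BlackBox:
    """Hypothetical stand-in for an expensive, noisy, gradient-free objective."""

    def __init__(self, noise_std=0.05, delay=0.0, seed=0):
        self.noise_std = noise_std      # standard deviation of the observation noise
        self.delay = delay              # seconds to sleep, mimicking an expensive evaluation
        self.rng = np.random.default_rng(seed)
        self.n_queries = 0              # number of evaluations spent so far
        self.history = []               # list of (x, y) pairs observed so far

    def _hidden(self, x):
        # Internal logic: in a real black box this would not be visible to the user.
        return np.sin(3.0 * x) + 0.5 * x ** 2

    def __call__(self, x):
        time.sleep(self.delay)
        self.n_queries += 1
        y = float(self._hidden(x) + self.noise_std * self.rng.normal())
        self.history.append((float(x), y))
        return y

f = BlackBox()
budget = 10                              # assumed evaluation budget

# Naive baseline: spend the whole budget on an evenly spaced grid.
xs = np.linspace(-2.0, 2.0, budget)
ys = [f(x) for x in xs]

x_best, y_best = min(f.history, key=lambda pair: pair[1])
print(f"queries used: {f.n_queries}, best x: {x_best:.3f}, best y: {y_best:.3f}")
```

With a fixed budget, where each of the `budget` queries is placed matters a great deal, which is the motivation for choosing the next sample point strategically.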
