## Optimization loop

Once we have a [reference state](reference.ipynb) and a variational form (i.e. an [ansatz](ansatz.ipynb)), we have a set of quantum states to explore, and we can start searching for the one that solves our target problem. This is usually done via optimization: we frame the problem so that its solution is the minimum (or maximum) of a cost function $C(\vec\theta)$, and we search for the state in our collection that minimizes (or maximizes) it.

The simplest example of this is finding the ground state of a system. We represent the possible states of the system on our quantum computer (e.g. through some [encoding](https://qiskit.org/documentation/nature/apidocs/qiskit_nature.second_q.mappers.html)), and the cost function to minimize is the expectation value of the observable representing energy (i.e. the Hamiltonian $\hat{\mathcal{H}}$): $C(\vec\theta) \equiv \langle\psi(\vec{\theta})|\hat{\mathcal{H}}|\psi(\vec{\theta})\rangle$. To evaluate this expectation value we can use the [_Estimator_ primitive](https://github.com/qiskit-community/prototype-zne/blob/main/docs/tutorials/0-estimator.ipynb) in any of its flavors, albeit with some [tradeoffs](https://qiskit.org/documentation/partners/qiskit_ibm_runtime/tutorials/Error-Suppression-and-Error-Mitigation.html). If the optimization is successful, it returns a set of optimal parameter values $\vec\theta^*$, from which we can construct the _proposed solution state_ $|\psi(\vec\theta^*)\rangle$ and compute the observed expectation value $C(\vec\theta^*)$.
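
As a minimal sketch of such a cost function, the snippet below simulates a single qubit classically instead of calling the Estimator primitive. The one-parameter ansatz $R_Y(\theta)|0\rangle$, the choice $\hat{\mathcal{H}} = Z$, and the helper names `ansatz_state`, `expectation` and `cost` are assumptions made purely for illustration.

```python
import math

# Toy Hamiltonian (assumption for illustration): the Pauli-Z matrix.
PAULI_Z = [[1, 0], [0, -1]]


def ansatz_state(theta):
    """Amplitudes of R_Y(theta)|0>, a hypothetical one-parameter ansatz."""
    return [math.cos(theta / 2), math.sin(theta / 2)]


def expectation(state, hamiltonian):
    """Return the real part of <state|H|state> for a Hermitian matrix H."""
    total = 0
    for i, amp_i in enumerate(state):
        for j, amp_j in enumerate(state):
            total += complex(amp_i).conjugate() * hamiltonian[i][j] * amp_j
    return total.real


def cost(theta):
    """Cost function C(theta) = <psi(theta)|H|psi(theta)>."""
    return expectation(ansatz_state(theta), PAULI_Z)


for theta in (0.0, math.pi / 2, math.pi):
    print(f"theta = {theta:.3f}  ->  C(theta) = {cost(theta):+.3f}")
```

For this toy model $C(\theta) = \cos\theta$, so the minimum $C(\theta^*) = -1$ is reached at $\theta^* = \pi$, where the proposed solution state is $|1\rangle$.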

Notice that we can only minimize the cost function over the limited set of states that we are considering. If our [choice of ansatz](ansatz.ipynb) did not include the solution state in the set of searchable states, the values returned by the optimization routine will never correspond to the desired solution to our problem. And even if the ground state is included, we still need to find it within the set of searchable states, which is not a trivial task either.
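
To make this limitation concrete, here is a small classical sketch under assumed choices: the Hamiltonian is taken to be the Pauli-$Y$ matrix and the ansatz is $R_Y(\theta)|0\rangle$, which only produces real amplitudes. The ground state of $Y$ has a complex relative phase, so it lies outside the searchable set. All names are hypothetical.

```python
import math

# Toy Hamiltonian (assumption): Pauli-Y, whose lowest eigenvalue is -1.
PAULI_Y = [[0, -1j], [1j, 0]]


def expectation(state, hamiltonian):
    """Return the real part of <state|H|state> for a Hermitian matrix H."""
    total = 0
    for i, amp_i in enumerate(state):
        for j, amp_j in enumerate(state):
            total += complex(amp_i).conjugate() * hamiltonian[i][j] * amp_j
    return total.real


def cost(theta):
    """Energy of the real-amplitude state R_Y(theta)|0> (hypothetical ansatz)."""
    state = [math.cos(theta / 2), math.sin(theta / 2)]
    return expectation(state, PAULI_Y)


# Scan the full parameter range of the ansatz.
thetas = [2 * math.pi * k / 200 for k in range(200)]
best_in_ansatz = min(cost(theta) for theta in thetas)

# The true ground state (|0> - i|1>)/sqrt(2) needs a complex amplitude,
# which R_Y(theta)|0> can never produce.
ground_state = [1 / math.sqrt(2), -1j / math.sqrt(2)]
ground_energy = expectation(ground_state, PAULI_Y)

print(f"best energy reachable by the ansatz: {best_in_ansatz:+.3f}")
print(f"true ground-state energy:            {ground_energy:+.3f}")
```

No matter how good the optimizer is, the scan stays at zero energy, while the true ground state sits at $-1$.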


Oftentimes, optimizers will require some initial value of the parameters $\vec\theta$ to bootstrap the algorithm. We refer to these values as the _initial point_ $\vec\theta_0$, and to $|\psi(\vec\theta_0)\rangle = U_V(\vec\theta_0)|\rho\rangle$ as the _initial state_ (i.e. the particular state obtained by setting the parameter values to the initial point). This is not to be confused with the _reference state_ $|\rho\rangle$, although the two coincide if $U_V(\vec\theta_0) \equiv I$ (i.e. the identity operation).
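
The distinction can be sketched with a single qubit, assuming (for illustration only) that the variational form $U_V(\theta)$ is the rotation $R_Y(\theta)$ and that the reference state is $|0\rangle$. The helper names `apply_ry` and `same_state` are hypothetical.

```python
import math


def apply_ry(theta, state):
    """Apply R_Y(theta), standing in for U_V(theta), to a one-qubit state."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [c * state[0] - s * state[1], s * state[0] + c * state[1]]


def same_state(a, b, tol=1e-12):
    """Compare two amplitude lists entry by entry."""
    return all(abs(x - y) < tol for x, y in zip(a, b))


reference_state = [1.0, 0.0]  # |rho> = |0> (assumption)

# With theta_0 = 0 the rotation is the identity: initial state == reference state.
initial_state = apply_ry(0.0, reference_state)
print("theta_0 = 0     coincides with reference:", same_state(initial_state, reference_state))

# Any other initial point gives an initial state that differs from |rho>.
other_initial_state = apply_ry(math.pi / 3, reference_state)
print("theta_0 = pi/3  coincides with reference:", same_state(other_initial_state, reference_state))
```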

All in all, we will be performing a classical optimization loop while delegating the evaluation of the cost function to a quantum computer. From this perspective, one could think of the optimization as a purely classical endeavor where we call some _black-box quantum oracle_ each time the optimizer needs to evaluate the cost function, the catch being that the _quality_ of those evaluations will affect the performance of the optimizer. For instance, noise will make the retrieved values non-deterministic, leading to random fluctuations which, in turn, will harm (or even completely prevent) convergence of certain optimizers to a proposed solution.
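
The following sketch mimics such a noisy oracle classically. It only models sampling noise from a finite number of measurements (no hardware noise), and it assumes the toy cost $C(\theta) = \langle Z \rangle$ on the state $R_Y(\theta)|0\rangle$. The function names and the shot count are illustrative choices.

```python
import math
import random


def exact_cost(theta):
    """Noise-free value of <Z> on R_Y(theta)|0>, which equals cos(theta)."""
    return math.cos(theta)


def sampled_cost(theta, shots, rng):
    """Estimate <Z> from `shots` simulated measurements of R_Y(theta)|0>."""
    prob_plus = math.cos(theta / 2) ** 2  # probability of measuring outcome +1
    total = 0
    for _ in range(shots):
        total += 1 if rng.random() < prob_plus else -1
    return total / shots


rng = random.Random(1234)  # seeded so the run is reproducible
theta = 1.0

# Repeated calls at the SAME theta return different values.
estimates = [sampled_cost(theta, shots=1000, rng=rng) for _ in range(20)]

print(f"exact value:       {exact_cost(theta):+.4f}")
print(f"smallest estimate: {min(estimates):+.4f}")
print(f"largest estimate:  {max(estimates):+.4f}")
```

An optimizer that compares two such evaluations to decide where to move next may be misled whenever the true difference is smaller than this spread.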

Another issue to deal with is the presence of _local minima_ (i.e. valleys, or regions where the cost function evaluates to smaller values than in its surroundings) that the optimizer could mistake for the _global minimum_ that we are looking for (i.e. the overall minimum value of the cost function). Similarly, some optimizers can get stuck in flat regions of the cost function, which keeps them from converging to a final solution; these are known as [_barren plateaus_](https://learn.qiskit.org/course/machine-learning/variational-classification#variational-31-2) and are among the most difficult obstacles in real-world applications of quantum computing. A lot of research is currently being done to overcome these setbacks, both for specific problems and in more general scenarios.
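
To give a feel for how flat a landscape can become, the sketch below uses a toy cost $C(\vec\theta) = 1 - \prod_i \cos^2(\theta_i/2)$. This cost is an assumption chosen for illustration and does not come from this text. The snippet estimates the typical size of one partial derivative at random points as the number of parameters grows.

```python
import math
import random


def cost(thetas):
    """Toy cost 1 - prod_i cos^2(theta_i / 2) (an illustrative assumption)."""
    overlap = 1.0
    for theta in thetas:
        overlap *= math.cos(theta / 2) ** 2
    return 1.0 - overlap


def partial_derivative(thetas, index, step=1e-5):
    """Central finite-difference estimate of dC/dtheta_index."""
    plus, minus = list(thetas), list(thetas)
    plus[index] += step
    minus[index] -= step
    return (cost(plus) - cost(minus)) / (2 * step)


def mean_abs_gradient(num_params, samples, rng):
    """Average |dC/dtheta_0| over uniformly random parameter vectors."""
    total = 0.0
    for _ in range(samples):
        thetas = [rng.uniform(-math.pi, math.pi) for _ in range(num_params)]
        total += abs(partial_derivative(thetas, 0))
    return total / samples


rng = random.Random(7)  # seeded so the run is reproducible
for num_params in (2, 6, 10):
    value = mean_abs_gradient(num_params, samples=500, rng=rng)
    print(f"{num_params:2d} parameters: mean |dC/dtheta_0| = {value:.2e}")
```

In this toy model the typical slope shrinks quickly as parameters are added, so a gradient-based optimizer started at a random point sees an almost flat cost. Real barren plateaus depend on the ansatz and the cost function, and this sketch only conveys the qualitative picture.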


There are two main types of optimizers:

* **Local**: these optimizers look for a point that minimizes the cost function starting from an initial point $\vec{\theta}_0$ (or several), and move to different points on successive iterations based on what they see in the region where they currently are. This means that these algorithms usually converge quickly, but the result can depend heavily on the initial point. Some of these algorithms use the _gradient_ of the cost function $\nabla C(\vec{\theta})$ (or an approximation of it) to choose the next set of values for the parameters $\vec{\theta}$, while others are based on completely different techniques such as the _simplex_ method. Local optimizers are unable to see beyond the region where they are evaluating, and are therefore especially vulnerable to local minima: they report convergence when they find one, ignoring other states with more favorable evaluations.

* **Global**: these optimizers look for the point that minimizes the cost function over several regions of its domain (i.e. non-locally), evaluating it at each iteration $i$ over a spread of parameter values $\vec{\theta}_{i,j}$ (indexed by $j$). This makes them less susceptible to local minima and somewhat independent of initialization, but also significantly slower to converge to a proposed solution. For this reason, they are often combined with local optimizers: one bootstraps the optimization globally and then refines the convergence locally.
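
The two behaviors can be contrasted on a one-dimensional toy cost with one local and one global minimum. Everything in this sketch is an assumption made for illustration: the polynomial cost, plain gradient descent as the local optimizer, and a coarse grid scan as a stand-in for a global optimizer (real global optimizers are considerably more sophisticated).

```python
def cost(theta):
    """Toy double-well cost with a local and a global minimum (assumption)."""
    return theta**4 - 3 * theta**2 + theta


def gradient(theta):
    """Analytic derivative of the toy cost."""
    return 4 * theta**3 - 6 * theta + 1


def gradient_descent(theta, learning_rate=0.02, steps=500):
    """Local optimizer: follow the negative gradient from an initial point."""
    for _ in range(steps):
        theta -= learning_rate * gradient(theta)
    return theta


def coarse_scan(lower, upper, num_points):
    """Crude global step: evaluate the cost on a grid and keep the best point."""
    grid = [lower + (upper - lower) * k / (num_points - 1) for k in range(num_points)]
    return min(grid, key=cost)


# Local only: the result depends on the basin containing the initial point.
local_from_right = gradient_descent(2.0)
local_from_left = gradient_descent(-2.0)

# Global bootstrap followed by local refinement.
start = coarse_scan(-2.0, 2.0, num_points=9)
refined = gradient_descent(start)

print(f"local, theta_0 = +2:   theta = {local_from_right:+.4f}, C = {cost(local_from_right):+.4f}")
print(f"local, theta_0 = -2:   theta = {local_from_left:+.4f}, C = {cost(local_from_left):+.4f}")
print(f"scan + local refine:   theta = {refined:+.4f}, C = {cost(refined):+.4f}")
```

Starting from $\theta_0 = 2$, gradient descent settles in the shallower valley and reports convergence there, whereas the scan first locates the deeper basin and the local step then refines the result inside it.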

```python
import qiskit.tools.jupyter
%qiskit_copyright
```