## Optimization loops

During this lesson, we'll learn how to use an *optimizer* to iteratively explore our ansatz's parameterized quantum states. Generally, this requires us to:

- Define a *cost function* $C(\vec\theta)$. This is a problem-specific function that encodes the problem's goal as a quantity for the optimizer to minimize (or maximize).
- Pass the cost function output to a classical optimizer, which proposes the next set of parameters to evaluate, until the optimizer converges on an answer.

We'll also explore how to suppress and mitigate noise with Qiskit Runtime primitives (and the speed vs. accuracy tradeoffs involved).
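
The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the Qiskit workflow itself: the cost function is a hypothetical stand-in equal to $\cos\theta_0 + \cos\theta_1$ (the value of $\langle Z_0 + Z_1\rangle$ for the product state $R_Y(\theta_0)|0\rangle \otimes R_Y(\theta_1)|0\rangle$), and the "optimizer" is plain gradient descent with a finite-difference gradient. The names `cost`, `finite_difference_gradient`, and `optimize` are chosen for this example only.

```python
import math

def cost(theta):
    """Toy cost (assumption): <Z0 + Z1> for RY(theta[0])|0> (x) RY(theta[1])|0>."""
    return math.cos(theta[0]) + math.cos(theta[1])

def finite_difference_gradient(f, theta, eps=1e-5):
    """Central-difference approximation of the gradient of f at theta."""
    grad = []
    for k in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[k] += eps
        minus[k] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad

def optimize(f, theta0, learning_rate=0.2, tol=1e-8, max_iters=1000):
    """Repeat: evaluate the cost, propose new parameters, check convergence."""
    theta = list(theta0)
    value = f(theta)
    for iteration in range(1, max_iters + 1):
        grad = finite_difference_gradient(f, theta)
        # Optimizer step: move against the gradient to lower the cost.
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
        new_value = f(theta)
        # Convergence check: stop once the cost has stopped changing.
        if abs(new_value - value) < tol:
            return theta, new_value, iteration
        value = new_value
    return theta, value, max_iters

theta_opt, cost_opt, n_iters = optimize(cost, theta0=[0.5, 1.0])
print(f"theta = {theta_opt}, cost = {cost_opt:.6f}, iterations = {n_iters}")
```

In a real workload, `cost` would instead submit the parameterized circuit to a quantum backend and return the measured expectation value; the loop structure stays the same.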


![Optimization Loop](optimzation.png)

## Bootstrapping Optimization

*Bootstrapping*, or setting the initial values of the parameters $\vec\theta$ based on a prior optimization, can help our optimizer converge on a solution faster. We refer to these initial values as the _initial point_ $\vec\theta_0$, and to $|\psi(\vec\theta_0)\rangle = U_V(\vec\theta_0)|\rho\rangle$ as the _initial state_.

This initial state differs from our *reference state* $|\rho\rangle$, as the former focuses on initial parameters set during our optimization loop, while the latter focuses on using known "reference" solutions. They may coincide if $U_V(\vec\theta_0) \equiv I$ (i.e. the identity operation).
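
To make the benefit of bootstrapping concrete, here is a small sketch under stated assumptions. It uses a hypothetical one-parameter cost family $C_h(\theta) = \cos\theta + h\sin\theta$, first optimizes a "prior" problem with $h = 0.5$, and then reuses that result as the initial point for a nearby problem with $h = 0.6$. The cost family, the helper names, and the step size are all illustrative choices rather than anything prescribed by the lesson.

```python
import math

def make_cost(h):
    """Hypothetical cost family (assumption): C(theta) = cos(theta) + h*sin(theta)."""
    return lambda theta: math.cos(theta) + h * math.sin(theta)

def make_gradient(h):
    """Exact derivative of the cost family above."""
    return lambda theta: -math.sin(theta) + h * math.cos(theta)

def gradient_descent(grad, theta0, learning_rate=0.1, tol=1e-6, max_iters=10_000):
    """Return (theta, iterations) once the gradient magnitude drops below tol."""
    theta = theta0
    for iteration in range(max_iters):
        g = grad(theta)
        if abs(g) < tol:
            return theta, iteration
        theta -= learning_rate * g
    return theta, max_iters

# Prior optimization: solve a nearby problem (h = 0.5) from a generic start.
theta_prior, _ = gradient_descent(make_gradient(0.5), theta0=0.1)

# New problem (h = 0.6): generic initial point versus bootstrapped initial point.
_, cold_iters = gradient_descent(make_gradient(0.6), theta0=0.1)
theta_warm, warm_iters = gradient_descent(make_gradient(0.6), theta0=theta_prior)

print(f"cold start: {cold_iters} iterations")
print(f"bootstrapped start: {warm_iters} iterations")
print(f"final cost: {make_cost(0.6)(theta_warm):.6f}")
```

Because the prior solution already lies close to the new minimum, the bootstrapped run needs fewer iterations than the run from the generic starting point. How much is saved in practice depends on how similar the two problems are.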

## Local and Global Optimizers

There are two main types of optimizers:

### Local Optimizers

Local optimizers look for a point that minimizes the cost function, starting from one or more initial points $\vec{\theta_0}$ and moving to new points on successive iterations based on what they observe in their current region. As a result, these algorithms usually converge quickly, but the outcome can depend heavily on the initial point.

Some of these algorithms use the _gradient_ of the cost function $\nabla C(\vec{\theta})$ (or an approximation) to choose the next set of values for the parameters $\vec{\theta}$, while others are based on completely different techniques. Local optimizers cannot see beyond the region where they are evaluating, which makes them especially vulnerable to local minima: they report convergence when they find one, ignoring other states with more favorable evaluations.
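
The dependence on the initial point is easy to reproduce on a toy landscape. The sketch below assumes a hypothetical one-dimensional cost $C(\theta) = \theta^4 - 3\theta^2 + \theta$, chosen only because it has one local and one global minimum; it is not derived from any quantum circuit. The optimizer is plain gradient descent using the exact derivative.

```python
def cost(theta):
    """Hypothetical 1D landscape (assumption) with a local and a global minimum."""
    return theta**4 - 3 * theta**2 + theta

def gradient(theta):
    """Exact derivative of the toy cost above."""
    return 4 * theta**3 - 6 * theta + 1

def local_optimizer(theta0, learning_rate=0.01, tol=1e-8, max_iters=10_000):
    """Gradient descent: follows the local slope until the gradient vanishes."""
    theta = theta0
    for _ in range(max_iters):
        g = gradient(theta)
        if abs(g) < tol:
            break
        theta -= learning_rate * g
    return theta

theta_right = local_optimizer(theta0=2.0)   # starts in the basin of the local minimum
theta_left = local_optimizer(theta0=-2.0)   # starts in the basin of the global minimum

print(f"start at  2.0 -> theta = {theta_right:.4f}, cost = {cost(theta_right):.4f}")
print(f"start at -2.0 -> theta = {theta_left:.4f}, cost = {cost(theta_left):.4f}")
```

Both runs report convergence, but only the one started on the left reaches the lower of the two minima. The run started on the right has no way of knowing that a better solution exists elsewhere.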

### Global Optimizers

Global optimizers look for the point that minimizes the cost function over several regions of its domain (i.e. non-locally), evaluating it at each iteration $i$ over a set of parameter vectors $\Theta_i := \{\vec\theta_{i,j} \; | \; j \in \mathcal{J}_\text{opt}^i\}$ determined by the optimizer.

This makes them less susceptible to local minima and somewhat independent of initialization, but also significantly slower to converge to a proposed solution. For this reason, they are often combined with local optimizers: a global optimizer warm-starts the search, and a local optimizer then refines the convergence.
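
A rough sketch of this global-then-local combination follows. It assumes a hypothetical toy landscape $C(\theta) = \theta^4 - 3\theta^2 + \theta$ and the simplest possible "global" stage: a single sweep over an evenly spaced set of candidates. Real global optimizers choose their candidate sets $\Theta_i$ far more cleverly and over many iterations; the function names here are illustrative only.

```python
def cost(theta):
    """Hypothetical 1D landscape (assumption) with a local and a global minimum."""
    return theta**4 - 3 * theta**2 + theta

def gradient(theta):
    """Exact derivative of the toy cost above."""
    return 4 * theta**3 - 6 * theta + 1

def global_stage(lower, upper, num_points):
    """Evaluate the cost on an evenly spaced candidate set and keep the best one."""
    step = (upper - lower) / (num_points - 1)
    candidates = [lower + j * step for j in range(num_points)]
    return min(candidates, key=cost)

def local_stage(theta0, learning_rate=0.01, tol=1e-8, max_iters=10_000):
    """Refine a candidate with gradient descent."""
    theta = theta0
    for _ in range(max_iters):
        g = gradient(theta)
        if abs(g) < tol:
            break
        theta -= learning_rate * g
    return theta

theta_coarse = global_stage(-2.0, 2.0, num_points=9)  # warm start from the global sweep
theta_refined = local_stage(theta_coarse)             # local refinement

print(f"coarse: theta = {theta_coarse:.4f}, cost = {cost(theta_coarse):.4f}")
print(f"refined: theta = {theta_refined:.4f}, cost = {cost(theta_refined):.4f}")
```

The coarse sweep costs nine evaluations and only locates the right basin; the local stage then does the precise work. The same split becomes more valuable as the number of parameters grows and exhaustive sweeps become unaffordable.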

In fact, the loss landscape can be quite complicated, as shown by the hills and valleys of the example below. The optimization method navigates the loss landscape in search of the minimum, as shown by the black points and lines. We can see that two of the three searches end up in a local minimum of the landscape rather than the global one.

![Loss Landscape](loss-landscape.png)

Generally, optimization methods can be categorized into two groups: gradient-based and gradient-free methods. To determine an optimal solution, gradient-based methods look for an extreme point at which the gradient is equal to zero, choosing each search direction from the derivative of the loss function. The main disadvantages of this type of optimization are that convergence can be very slow and that there is no guarantee of reaching the optimal solution.

When derivative information is unavailable or impractical to obtain (e.g. when the loss function is expensive to evaluate or somewhat noisy), gradient-free methods can be very useful. Such optimization techniques tend to be more robust at finding global optima, while gradient-based methods tend to converge to local optima. However, gradient-free methods require more computational resources, especially for problems with high-dimensional search spaces.
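
As an illustration of a gradient-free method, the sketch below implements a basic compass (pattern) search, which only ever compares cost values and never computes a derivative. This is a swapped-in textbook technique chosen for its simplicity, not the algorithm behind any particular Qiskit or SciPy optimizer, and the cost $\cos\theta_0 + \cos\theta_1$ is a hypothetical stand-in for a measured expectation value.

```python
import math

def cost(theta):
    """Toy cost (assumption): cos(theta[0]) + cos(theta[1]), minimized at (pi, pi)."""
    return math.cos(theta[0]) + math.cos(theta[1])

def compass_search(f, theta0, step=0.5, min_step=1e-4, max_evals=10_000):
    """Derivative-free search: probe +/- step along each axis, shrink step on failure."""
    theta = list(theta0)
    best = f(theta)
    evals = 1
    while step > min_step and evals < max_evals:
        improved = False
        for k in range(len(theta)):
            for direction in (+1, -1):
                trial = list(theta)
                trial[k] += direction * step
                value = f(trial)
                evals += 1
                if value < best:
                    theta, best, improved = trial, value, True
        if not improved:
            step /= 2  # no neighbor was better: search more finely
    return theta, best, evals

theta_opt, cost_opt, n_evals = compass_search(cost, theta0=[0.5, 0.5])
print(f"theta = {theta_opt}, cost = {cost_opt:.6f}, evaluations = {n_evals}")
```

Each sweep needs two cost evaluations per parameter, which hints at why gradient-free methods become expensive in high-dimensional search spaces.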

![Barren Plateaus](barren-plateaus.png)

Regardless of which type of optimization method is used, if the loss landscape is fairly flat, it can be difficult for the method to determine which direction to search. This situation is called a _[barren plateau](gloss:barren-plateaus)_, where the loss landscape becomes increasingly flat (and the direction to the minimum thus becomes hard to determine). For a wide class of reasonable parameterized quantum circuits, the probability that the gradient along any reasonable direction is non-zero to some fixed precision is exponentially small as a function of the number of qubits.
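
The exponential flattening can be seen in a deliberately simple toy model. The sketch below assumes a product ansatz $\bigotimes_i R_Y(\theta_i)|0\rangle$ measured with the global observable $Z^{\otimes n}$, for which the cost is $C(\vec\theta) = \prod_i \cos\theta_i$. For uniformly random angles, the variance of $\partial C / \partial\theta_0$ is exactly $2^{-n}$. This is only one hand-picked example and not the general class of circuits mentioned above; the sample count and seed are arbitrary choices.

```python
import math
import random
import statistics

def partial_derivative(thetas):
    """d/d(theta_0) of the toy cost C(theta) = prod_i cos(theta_i)."""
    value = -math.sin(thetas[0])
    for t in thetas[1:]:
        value *= math.cos(t)
    return value

def gradient_variance(num_qubits, num_samples=20_000, seed=0):
    """Monte Carlo estimate of Var[dC/d(theta_0)] over uniformly random angles."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        thetas = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_qubits)]
        samples.append(partial_derivative(thetas))
    return statistics.pvariance(samples)

for n in (2, 4, 6, 8):
    estimate = gradient_variance(n)
    print(f"n = {n}: estimated variance = {estimate:.5f}, analytic 2^-n = {0.5**n:.5f}")
```

Since the gradient averages to zero, a shrinking variance means that a randomly chosen point almost always has a gradient too small to distinguish from zero at fixed precision, which is the practical signature of a barren plateau.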

While this is still an area of active research, we have a few recommendations:

- **Bootstrapping** helps the optimization loop avoid getting stuck in a region of parameter space where the gradient is small.
- **Experimenting with hardware-efficient ansatzes**: as we're using a noisy quantum system as a *black-box oracle*, the _quality_ of those evaluations will affect the performance of the optimizer. Using a hardware-efficient ansatz, such as [`EfficientSU2`](https://qiskit.org/documentation/stubs/qiskit.circuit.library.EfficientSU2.html), could avoid producing exponentially small gradients.
- **Experimenting with error suppression and error mitigation**: the Qiskit Runtime primitives offer a simple interface to experiment with a variety of `optimization_level` and `resilience_level` settings, respectively. This can reduce the impact of noise and make the optimization process more efficient.
- **Experimenting with gradient-free optimizers**: unlike gradient-based optimization algorithms, optimizers such as `COBYLA` do not rely on gradient information to update the parameters, and so may be less affected by the barren plateau.

With this lesson, you learned how to define your optimization loop:

- Created a cost function
- Experimented with error mitigation and suppression
- Bootstrapped your parameters
- Explored optimizers and how to avoid barren plateaus

Our high-level variational workload is complete:

![Optimization Loop](circuit_optimization.png)

Next, we'll explore specific variational algorithms with this framework in mind.