# Gradient and Variational Optimization

## Overview

TyxonQ is designed to make optimization of parameterized quantum circuits easy, fast, and convenient. In this note, we review how to obtain circuit gradients and run variational optimization, using either a machine learning backend or SciPy.

## Setup

In [2]:
import numpy as np
import scipy.optimize as optimize
import tensorflow as tf
import tyxonq as tq

K = tq.set_backend("tensorflow")

## PQC (Parameterized Quantum Circuit)

Consider a variational circuit acting on $n$ qubits and consisting of $k$ layers, where each layer comprises parameterized $e^{i\theta X\otimes X}$ gates between neighboring qubits, followed by a sequence of parameterized single-qubit $Z$ and $X$ rotations. We now show how to implement such circuits in TyxonQ, and how to use one of the machine learning backends to compute cost functions and gradients easily and efficiently.

The circuit for general $n,k$ and set of parameters can be defined as follows:

In [8]:
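def qcircuit(n, k, params):
    # Each of the k layers consumes 3 * n - 1 consecutive entries of params.
    c = tq.Circuit(n)
    for j in range(k):
        # n - 1 parameterized two-qubit XX gates on neighboring qubits
        for i in range(n - 1):
            c.exp1(
                i, i + 1, theta=params[j * (3 * n - 1) + i], unitary=tq.gates._xx_matrix
            )
        # parameterized single-qubit RZ and RX rotations on every qubit
        for i in range(n):
            c.rz(i, theta=params[j * (3 * n - 1) + n - 1 + i])
            c.rx(i, theta=params[j * (3 * n - 1) + 2 * n - 1 + i])
    return c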
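
The index arithmetic in `qcircuit` can be easier to follow when spelled out. The sketch below uses a hypothetical helper, `layer_slices` (not part of TyxonQ), that lists which entries of `params` feed the XX, RZ, and RX gates of each layer, assuming the same layout of $3n-1$ parameters per layer.

```python
def layer_slices(n, k):
    """Return, for each layer, the index ranges of its XX, RZ and RX angles."""
    per_layer = 3 * n - 1  # (n - 1) XX angles + n RZ angles + n RX angles
    layout = []
    for j in range(k):
        base = j * per_layer
        layout.append(
            {
                "xx": range(base, base + n - 1),
                "rz": range(base + n - 1, base + 2 * n - 1),
                "rx": range(base + 2 * n - 1, base + 3 * n - 1),
            }
        )
    return layout


layout = layer_slices(n=3, k=2)
for j, layer in enumerate(layout):
    print(j, {name: list(idx) for name, idx in layer.items()})

# Total number of parameters expected by qcircuit(n, k, params)
total = sum(len(idx) for layer in layout for idx in layer.values())
print(total)
```

For $n=3, k=2$ the total is $k(3n-1) = 16$, which is the length of the parameter vector used in the rest of this note.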

As an example, we take $n=3, k=2$ with TensorFlow as our backend, and define an energy cost function to minimize:
$$E = \langle X_0 X_1\rangle_\theta + \langle X_1 X_2\rangle_\theta.$$

In [9]:
n = 3
k = 2


def energy(params):
    c = qcircuit(n, k, params)
    e = c.expectation_ps(x=[0, 1]) + c.expectation_ps(x=[1, 2])
    return K.real(e)
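
As a reference point for the optimizations that follow, the energy can be bounded from below. The short derivation here uses only the fact that a Pauli string has eigenvalues $\pm 1$:

$$-1 \le \langle X_0 X_1\rangle_\theta \le 1, \qquad -1 \le \langle X_1 X_2\rangle_\theta \le 1 \quad\Longrightarrow\quad E \ge -2.$$

The bound is attained, for example, by the product state $|{+}\rangle|{-}\rangle|{+}\rangle$, for which $\langle X_0 X_1\rangle = \langle X_1 X_2\rangle = -1$. Whether a given ansatz can reach this value depends on $n$ and $k$.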

## Grad and JIT

Using the machine learning (ML) backend's support for automatic differentiation, we can now quickly compute both the energy and the gradient of the energy with respect to the parameters.

In [10]:
energy_val_grad = K.value_and_grad(energy)

This creates a function that, given a set of parameters as input, returns both the energy and the gradient of the energy. If only the gradient is needed, it can be computed with ``K.grad(energy)``. The function above can be called directly on a set of parameters, but when the energy is evaluated many times, a just-in-time (JIT) compiled version of the function saves significant time.

In [11]:
energy_val_grad_jit = K.jit(energy_val_grad)

With ``K.jit``, the first evaluation of the energy and gradient may take longer because the function is compiled on that call, but subsequent evaluations will be noticeably faster than non-jitted code. We recommend always using ``jit`` as long as the function is "tensor-in, tensor-out", and we have worked hard to make all aspects of the circuit simulator compatible with JIT.
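
Before relying on a gradient from ``K.value_and_grad`` or ``K.grad``, it can be useful to compare it against central finite differences. The sketch below is backend-agnostic and runs on a toy function rather than the circuit energy; `toy_energy`, `toy_grad`, and `finite_difference_grad` are hypothetical names introduced only for illustration, with `toy_grad` standing in for the automatic-differentiation result.

```python
import math


def toy_energy(params):
    # Hypothetical stand-in for the circuit energy: smooth and cheap to evaluate.
    return math.cos(params[0]) * math.sin(params[1])


def toy_grad(params):
    # Analytic gradient of toy_energy, playing the role of the autodiff gradient.
    return [
        -math.sin(params[0]) * math.sin(params[1]),
        math.cos(params[0]) * math.cos(params[1]),
    ]


def finite_difference_grad(f, params, eps=1e-6):
    # Central differences: (f(x + eps * e_i) - f(x - eps * e_i)) / (2 * eps)
    grad = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad


params = [0.3, -1.2]
exact = toy_grad(params)
approx = finite_difference_grad(toy_energy, params)
max_err = max(abs(a - b) for a, b in zip(exact, approx))
print(max_err < 1e-6)
```

The same comparison applies to the circuit: evaluate `energy` at shifted parameters and compare with the gradient returned by the backend. Finite differences cost two energy evaluations per parameter, so they are suited to spot checks rather than to the optimization loop itself.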

## Optimization via ML (Machine Learning) Backend

With the energy function and gradients available, optimization of the parameters is straightforward. Below is an example of how to do this via gradient descent, using the Keras SGD optimizer.

In [12]:
learning_rate = 2e-2
opt = K.optimizer(tf.keras.optimizers.SGD(learning_rate))


def grad_descent(params, i):
    val, grad = energy_val_grad_jit(params)
    params = opt.update(grad, params)
    if i % 10 == 0:
        print(f"i={i}, energy={val}")
    return params


params = K.implicit_randn(k * (3 * n - 1))
for i in range(100):
    params = grad_descent(params, i)

i=0, energy=0.8478720188140869
i=10, energy=0.48806655406951904
i=20, energy=-0.21994608640670776
i=30, energy=-0.8568923473358154
i=40, energy=-1.1428836584091187
i=50, energy=-1.289249300956726
i=60, energy=-1.3814536333084106
i=70, energy=-1.451552391052246
i=80, energy=-1.5181822776794434
i=90, energy=-1.5902677774429321
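
The update performed in each iteration can be illustrated without any backend. The sketch below assumes the wrapped `tf.keras.optimizers.SGD` keeps its default settings (no momentum), in which case a step is `params - learning_rate * grad`. The helper `sgd_update` and the quadratic `toy_value_and_grad` are hypothetical stand-ins for `opt.update` and `energy_val_grad_jit`.

```python
def sgd_update(params, grad, learning_rate):
    # Plain gradient descent step: move against the gradient.
    return [p - learning_rate * g for p, g in zip(params, grad)]


def toy_value_and_grad(params, target):
    # Hypothetical quadratic cost standing in for the circuit energy.
    value = sum((p - t) ** 2 for p, t in zip(params, target))
    grad = [2 * (p - t) for p, t in zip(params, target)]
    return value, grad


target = [1.0, -2.0]
params = [0.0, 0.0]
learning_rate = 2e-2

start_distance = max(abs(p - t) for p, t in zip(params, target))
for _ in range(100):
    _, grad = toy_value_and_grad(params, target)
    params = sgd_update(params, grad, learning_rate)
end_distance = max(abs(p - t) for p, t in zip(params, target))

# Fraction of the initial distance to the minimum that remains after 100 steps
ratio = end_distance / start_distance
print(ratio)
```

For this quadratic, each step multiplies the distance to the minimum by $1 - 2 \times 0.02 = 0.96$, so 100 steps leave roughly 1.7% of the initial distance. The circuit energy is not quadratic, but a small fixed learning rate is one plausible reason the energy printed above is still decreasing after 100 iterations.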


## Optimization via SciPy Interface

An alternative to using the machine learning backends for the optimization is to use SciPy.
This can be done via the ``scipy_interface`` API call, which allows both gradient-based (e.g. L-BFGS-B) and non-gradient-based (e.g. COBYLA) optimizers to be used; these are not available via the ML backends.

In [13]:
f_scipy = tq.interfaces.scipy_interface(energy, shape=[k * (3 * n - 1)], jit=True)
params = K.implicit_randn(k * (3 * n - 1))
r = optimize.minimize(f_scipy, params, method="L-BFGS-B", jac=True)
r

  message: CONVERGENCE: RELATIVE REDUCTION OF F <= FACTR*EPSMCH
  success: True
   status: 0
      fun: -2.0000009536743164
        x: [ 9.246e-01 -7.569e-01 ... -1.089e+00 -9.391e-01]
      nit: 27
      jac: [ 1.149e-04 -2.503e-06 ...  5.960e-08 -1.192e-07]
     nfev: 61
     njev: 61
 hess_inv: <16x16 LbfgsInvHessProduct with dtype=float64>

The first line above specifies the shape of the parameters to be supplied to the function to be minimized, which here is the energy function. The ``jit=True`` argument automatically takes care of jitting the energy function. Gradient-free optimization can be performed in the same way by supplying the ``gradient=False`` argument to ``scipy_interface``.

In [14]:
f_scipy = tq.interfaces.scipy_interface(
    energy, shape=[k * (3 * n - 1)], jit=True, gradient=False
)
params = K.implicit_randn(k * (3 * n - 1))
r = optimize.minimize(f_scipy, params, method="COBYLA")
r

 message: Return from COBYLA because the trust region radius reaches its lower bound.
 success: True
  status: 0
     fun: -1.9999802112579346
       x: [-7.794e-01  1.571e+00 ...  6.284e-01  1.690e+00]
    nfev: 514
   maxcv: 0.0
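
The two `optimize.minimize` calls above rely on different conventions for the objective. With `jac=True`, SciPy expects the objective to return a `(value, gradient)` pair; without it, the objective returns the value only. The sketch below shows both conventions on a toy cost, using only NumPy and SciPy. `toy_value_and_grad` and `toy_value` are hypothetical stand-ins for the two `f_scipy` functions, not TyxonQ code.

```python
import numpy as np
from scipy import optimize


def toy_value_and_grad(x):
    # Stand-in for f_scipy as used with jac=True:
    # returns the objective value together with its gradient.
    value = float(np.sum(np.cos(x)))
    grad = -np.sin(x)
    return value, grad


def toy_value(x):
    # Stand-in for f_scipy built with gradient=False: returns the value only.
    return float(np.sum(np.cos(x)))


x0 = np.array([2.0, 2.5, 3.5, 4.0])

# Gradient-based: jac=True tells SciPy the objective returns (value, gradient).
r_grad = optimize.minimize(toy_value_and_grad, x0, method="L-BFGS-B", jac=True)

# Gradient-free: COBYLA only ever asks for objective values.
r_free = optimize.minimize(toy_value, x0, method="COBYLA")

print(r_grad.fun, r_free.fun)
```

Both runs should approach the minimum of $-4$ for this toy cost. As in the results above, the gradient-free method typically needs more function evaluations to get there.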