# Calculating Gradient Using Quantum Circuit

<em> Copyright (c) 2021 Institute for Quantum Computing, Baidu Inc. All Rights Reserved. </em>

## Overview

Variational quantum algorithms such as the [Variational Quantum Eigensolver (VQE)](../quantum_simulation/VQE_EN.ipynb) and the [Quantum Approximate Optimization Algorithm (QAOA)](../combinatorial_optimization/QAOA_EN.ipynb) vary the parameters of a quantum circuit to minimize an objective function of interest. This raises an important question: how can we calculate the gradient of a parameterized quantum circuit? Since the objective function is evaluated using a quantum circuit, its gradient has to be evaluated using quantum circuits as well, which is more challenging than computing a gradient classically. Below we describe three different methods to accomplish this task on a quantum computer, demonstrate the code for each, and simulate them using Paddle Quantum.

## Introduction

Suppose the objective function is the typical parameterized cost function used in variational quantum algorithms (VQA): $O(\theta) = \left\langle00\right| U^{\dagger}(\theta)HU(\theta) \left|00\right\rangle$, where $H$ is a Hamiltonian, $U(\theta)$ represents the parameterized quantum circuit, and $\theta = [\theta_1, \theta_2, \dots, \theta_n]$ is a list of trainable parameters in the circuit. Our goal is then to find

$$
\nabla O(\theta) = \begin{bmatrix} \frac{\partial O}{\partial \theta_1} \\ \frac{\partial O}{\partial \theta_2}\\ \vdots\\ \frac{\partial O}{\partial \theta_n} \end{bmatrix}.
\tag{1}
$$

First, let's import all the required packages.

In [1]:
import numpy as np
import paddle
from paddle_quantum.circuit import UAnsatz
from paddle_quantum.utils import pauli_str_to_matrix, Hamiltonian
import warnings
warnings.filterwarnings("ignore")

Then, let's construct our $U(\theta)$ and Hamiltonian $H$ of the objective function $O(\theta) = \left\langle00\right| U^{\dagger}(\theta)HU(\theta) \left|00\right\rangle$.

We will demonstrate our example on a 2-qubit quantum circuit, constructing $U(\theta)$ with four randomly generated parameters $\theta$. We choose our Hamiltonian to be $H = Z \otimes Z$.

In [2]:
# Define Hamiltonian H
pauli_str = [[1.0, 'Z0,Z1']]
H = Hamiltonian(pauli_str)

# Randomly generate parameters in range from 0 to 2*PI
theta_np = np.random.uniform(0, 2 * np.pi, 4)
# Note: if we set stop_gradient=False when defining theta, the parameter is trainable;
# otherwise it is treated as a constant: its gradient is not calculated and it is not updated during training.
theta_tensor = paddle.to_tensor(theta_np, 'float64', stop_gradient=False)

def U_theta(theta):
    cir = UAnsatz(2)
    cir.ry(theta[0], 0)
    cir.ry(theta[1], 1)
    cir.cnot([0, 1])
    cir.cnot([1, 0])
    cir.ry(theta[2], 0)
    cir.ry(theta[3], 1)
    cir.run_state_vector()
    return cir

print('Hamiltonian H: \n', H.construct_h_matrix())
print('\nU(theta):')
print(U_theta(theta_tensor))

Hamiltonian H: 
 [[ 1.+0.j  0.+0.j  0.+0.j  0.+0.j]
 [ 0.+0.j -1.+0.j  0.+0.j  0.+0.j]
 [ 0.+0.j  0.+0.j -1.+0.j  0.+0.j]
 [ 0.+0.j  0.+0.j  0.+0.j  1.+0.j]]

U(theta):
--Ry(5.690)----*----x----Ry(3.107)--
               |    |               
--Ry(2.521)----x----*----Ry(0.437)--
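
As a cross-check on the objective function, here is a minimal NumPy sketch of the same 2-qubit circuit. It is not part of Paddle Quantum: the helper names (`ry`, `objective`) and the convention that qubit 0 is the left factor of each Kronecker product are assumptions made for illustration.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
P0 = np.diag([1.0, 0.0])  # projector |0><0|
P1 = np.diag([0.0, 1.0])  # projector |1><1|
ZZ = np.kron(Z, Z)        # Hamiltonian H = Z (x) Z

# Assumed convention: qubit 0 is the left factor of the Kronecker product.
CNOT_01 = np.kron(P0, I2) + np.kron(P1, X)  # control 0, target 1
CNOT_10 = np.kron(I2, P0) + np.kron(X, P1)  # control 1, target 0


def ry(theta):
    """Matrix of the single-qubit rotation Ry(theta) = exp(-i * theta/2 * Y)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def objective(theta, hamiltonian=ZZ):
    """Evaluate O(theta) = <00| U^dagger(theta) H U(theta) |00>."""
    state = np.zeros(4)
    state[0] = 1.0                                       # |00>
    state = np.kron(ry(theta[0]), ry(theta[1])) @ state  # first Ry layer
    state = CNOT_01 @ state
    state = CNOT_10 @ state
    state = np.kron(ry(theta[2]), ry(theta[3])) @ state  # second Ry layer
    return float(state @ hamiltonian @ state)            # state is real here
```

With all parameters set to zero the circuit leaves $|00\rangle$ unchanged, so `objective([0, 0, 0, 0])` evaluates to 1.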
                                    


## Finite Difference Method

The finite difference method is one of the most traditional and common numerical methods for estimating the gradient of a function. It starts from the definition of the derivative of a function $f(x)$ as a limit: 

$$
f'(x)= \lim_{h \to 0}\frac{f(x+h) - f(x)}{h}.
\tag{2}
$$

By choosing a sufficiently small $h$, we can get a good approximation of the derivative.

For example, by using one type of finite difference, the central finite difference method, the objective function's gradient will be

$$
\nabla O(\theta) \approx \frac{O(\theta+\delta) - O(\theta-\delta)}{2\delta} = \frac{\left\langle00\right| U^{\dagger}(\theta + \delta)HU(\theta + \delta) \left|00\right\rangle - \left\langle00\right| U^{\dagger}(\theta - \delta)HU(\theta-\delta) \left|00\right\rangle}{2\delta}.
\tag{3}
$$

When implementing this, we can simply loop through the parameter list, shift one parameter at a time in the original circuit, and re-evaluate the objective function. There's no need to build extra circuits or use extra qubits.
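
The loop over parameters can be sketched in plain Python. This is only an illustration, not Paddle Quantum's implementation: the toy objective $\cos\theta_0\cos\theta_1$ (the expectation of $Z \otimes Z$ after one $R_y$ gate on each qubit) and the function names are assumptions chosen so that the exact gradient is known.

```python
import math


def objective(theta):
    # Toy stand-in: <00| (Ry x Ry)^dagger (Z x Z) (Ry x Ry) |00> = cos(theta0) * cos(theta1)
    return math.cos(theta[0]) * math.cos(theta[1])


def finite_difference_gradient(f, theta, delta=0.01):
    """Central finite difference: shift one parameter at a time by +/- delta."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += delta
        minus[i] -= delta
        grad.append((f(plus) - f(minus)) / (2 * delta))
    return grad


theta = [0.3, 1.1]
approx = finite_difference_gradient(objective, theta)
exact = [
    -math.sin(theta[0]) * math.cos(theta[1]),
    -math.cos(theta[0]) * math.sin(theta[1]),
]
```

Each parameter costs two evaluations of the objective, and the result `approx` differs from `exact` by an error that shrinks with `delta`.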

Using Paddle Quantum's built-in method, we can build the circuit of $U(\theta)$, and compute the finite-difference gradient easily by passing in the corresponding Hamiltonian H and the delta. Note: The built-in method currently does not support assigning an input state or running on a noisy circuit.

In [3]:
# We reuse the predefined Hamiltonian H and parameters of the circuit. 
# Again, be sure to mark stop_gradient=False when defining tensor for parameters of the circuit

# Constructing circuit U(theta)
cir = U_theta(theta_tensor)
print(cir)

# Calculating finite-difference gradient
gradients = cir.finite_difference_gradient(H, delta=0.01)
print("Gradient of this objective function is: ", gradients.numpy())

--Ry(5.690)----*----x----Ry(3.107)--
               |    |               
--Ry(2.521)----x----*----Ry(0.437)--
                                    
Gradient of this objective function is:  [-0.79156126  0.12584221 -0.26541629  0.78068313]


## Parameter-shift Method
Again, we use the objective function $O(\theta) = \left\langle0\right| U^{\dagger}(\theta)HU(\theta) \left|0\right\rangle$ as our example. If $U(\theta)$ can be written as $e^{-ia\theta G}$, where $G$ has exactly two distinct eigenvalues $\lambda_1$ and $\lambda_2$, we can apply the parameter-shift method to find its gradient [1]:

$$
\nabla O(\theta) = r \left[O(\theta+\frac{\pi}{4r}) - O(\theta-\frac{\pi}{4r})\right],
\tag{4}
$$ 

where the shift constant is $r = \frac{a}{2} (\lambda_2 - \lambda_1)$. Note that this gives a theoretically exact gradient rather than an estimate like the finite-difference gradient. Moreover, this method doesn't require constructing new circuits or adding ancilla qubits: each evaluation only changes the parameters inside the circuit.
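
To see the role of the shift constant $r$, here is a small sketch with a general coefficient $a$. The example is an assumption chosen for illustration: $U(\theta) = e^{-ia\theta Z}$ acting on $|+\rangle$ with observable $X$, for which the objective has the closed form $\cos(2a\theta)$.

```python
import math


def objective(theta, a):
    # <+| U^dagger(theta) X U(theta) |+> with U(theta) = exp(-i * a * theta * Z)
    # evaluates to cos(2 * a * theta).
    return math.cos(2 * a * theta)


def parameter_shift(f, theta, r):
    """Gradient r * [f(theta + pi/(4r)) - f(theta - pi/(4r))]."""
    shift = math.pi / (4 * r)
    return r * (f(theta + shift) - f(theta - shift))


a = 0.8
lambda_1, lambda_2 = -1.0, 1.0      # eigenvalues of the generator Z
r = a / 2 * (lambda_2 - lambda_1)   # shift constant, here r = a

theta = 0.37
grad = parameter_shift(lambda t: objective(t, a), theta, r)
exact = -2 * a * math.sin(2 * a * theta)
```

Unlike a finite difference, the shifted evaluations reproduce `exact` up to floating-point rounding, even though the shift $\frac{\pi}{4r}$ is not small.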

The fundamental rotation gates provided by Paddle Quantum are $R_x(\theta), R_y(\theta), R_z(\theta)$ gates, which can be written as $e^{-i\frac{1}{2}\theta X}, e^{-i\frac{1}{2}\theta Y}, e^{-i\frac{1}{2}\theta Z}$ respectively. Since $X$, $Y$, and $Z$ each have exactly two distinct eigenvalues, $-1$ and $1$, it's not hard to see that $r = \frac{1}{2}$, and the gradient with respect to the parameter of any of these gates is just

$$
\frac{1}{2}[ O(\theta + \frac{\pi}{2}) - O(\theta - \frac{\pi}{2})].
\tag{5}
$$

We will demonstrate this formula's derivation using a single rotation gate $R_x$.

### Derivation

In this section, we will go through all the steps and arrive at the resulting formula for calculating $R_x$ gate's derivative. Using $R_x(\theta)$ as $U(\theta)$, we have

$$
O(\theta) = \left\langle0\right| R_x^{\dagger}(\theta)HR_x(\theta) \left|0\right\rangle.
\tag{6}
$$

Given that $R_x(\theta) = e^{-i\frac{1}{2}\theta X}$, where $X$ is the Pauli-X matrix, we know $\frac{\partial}{\partial \theta}  R_x(\theta) =-i\frac{1}{2}Xe^{-i\frac{\theta}{2}X}=-i\frac{1}{2}XR_x(\theta)$.  Using the product rule, its derivative can be written as

$$
O'(\theta) = \left\langle0\right| [\frac{i}{2}X] R_x^{\dagger}(\theta)HR_x(\theta)\left|0\right\rangle + \left\langle0\right| R_x^{\dagger}(\theta)H [-\frac{i}{2}X] R_x(\theta)\left|0\right\rangle.
\tag{7}
$$

We factor out the shift constant $r$, which is $\frac{1}{2}$ for the $R_x$ gate, so that the equation can be rearranged in the next step. We have

$$
O'(\theta) = r \big[ \left\langle0\right| [\frac{i}{2r}X] R_x^{\dagger}(\theta)HR_x(\theta)\left|0\right\rangle + \left\langle0\right| R_x^{\dagger}(\theta)H [-\frac{i}{2r}X] R_x(\theta)\left|0\right\rangle \big].
\tag{8}
$$

Since, for any operators $U$, $V$, and $Q$ and for an arbitrary state $|\psi\rangle$, 

$$
\langle\psi|U^\dagger QV|\psi\rangle + \langle\psi|V^\dagger QU|\psi\rangle = \frac{1}{2} \big(\langle\psi|(U+V)^\dagger Q(U+V)|\psi\rangle - \langle\psi|(U-V)^\dagger Q(U-V)|\psi\rangle \big),
\tag{9}
$$

we get, taking $U = R_x(\theta)$, $V = -\frac{i}{2r}XR_x(\theta)$, and $Q = H$, 

$$
O'(\theta) = \frac{r}{2} \big( \left\langle0\right|R_x^{\dagger}(\theta) [I + \frac{i}{2r}X]H[I - \frac{i}{2r}X]R_x(\theta)\left|0\right\rangle - \left\langle0\right| R_x^{\dagger}(\theta) [I - \frac{i}{2r}X] H [I+\frac{i}{2r}X] R_x(\theta)\left|0\right\rangle \big).
\tag{10}
$$

Using Euler's identity and the fact that $G$ has two distinct eigenvalues, we can rewrite $U(\theta)$ as $e^{-ia\theta G} = I\cos(r\theta) - i\frac{a}{r}G\sin(r\theta)$ [1]. Thus $R_x(\theta) = I\cos(r\theta) - i\frac{1}{2r}X\sin(r\theta)$, and we notice that 

$$
R_x(\frac{\pi}{4r}) = I\cos(\frac{\pi}{4}) - i\frac{1}{2r}X\sin(\frac{\pi}{4}) = \frac{1}{\sqrt2}(I-\frac{i}{2r}X).
\tag{11}
$$

We can use the same method to get 

$$
R_x(-\frac{\pi}{4r}) = \frac{1}{\sqrt2}(I+\frac{i}{2r}X).
\tag{12}
$$

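Substituting these into the previous expression for $O'(\theta)$, and using $R_x(\alpha)R_x(\theta) = R_x(\theta + \alpha)$, the equation simplifies to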

$$
O'(\theta) = r\big[ \left\langle0\right|R_x^{\dagger}(\theta+\frac{\pi}{4r})HR_x(\theta+\frac{\pi}{4r})\left|0\right\rangle - \left\langle0\right| R_x^{\dagger}(\theta-\frac{\pi}{4r}) H R_x(\theta-\frac{\pi}{4r})\left|0\right\rangle \big],
\tag{13}
$$

which gives the final formula,

$$
O'(\theta) = r\big[O(\theta+\frac{\pi}{4r}) - O(\theta-\frac{\pi}{4r})\big] = \frac{1}{2}\big[ O(\theta + \frac{\pi}{2}) - O(\theta - \frac{\pi}{2})\big].
\tag{14}
$$
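
The result of the derivation can be checked numerically. The following NumPy sketch compares the shifted evaluations with the derivative obtained from the product rule, for a randomly drawn Hermitian $H$; the helper names and the random seed are assumptions for illustration only.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)


def rx(theta):
    """Rx(theta) = exp(-i * theta/2 * X) = cos(theta/2) I - i sin(theta/2) X."""
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * X


def objective(theta, h):
    """O(theta) = <0| Rx^dagger(theta) H Rx(theta) |0>."""
    state = rx(theta)[:, 0]  # Rx(theta)|0> is the first column of Rx(theta)
    return np.real(np.vdot(state, h @ state))


# Random 2x2 Hermitian matrix playing the role of H
rng = np.random.default_rng(0)
m = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
h = (m + m.conj().T) / 2

theta = 1.234

# Parameter-shift rule with r = 1/2
shift_grad = 0.5 * (objective(theta + np.pi / 2, h) - objective(theta - np.pi / 2, h))

# Product rule: O'(theta) = 2 Re <psi| H |d psi>, with |d psi> = -(i/2) X |psi>
state = rx(theta)[:, 0]
exact_grad = 2 * np.real(np.vdot(state, h @ (-0.5j * X @ state)))
```

The two values agree to floating-point precision, independent of the choice of $H$ and $\theta$.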

### Paddle Quantum implementation
Here we demonstrate how to use Paddle Quantum's built-in parameter shift method to calculate the gradient. Note: The built-in method currently does not support assigning an input state or running on a noisy circuit.

In [4]:
# We reuse the predefined Hamiltonian H and parameters of the circuit. 
# Again, be sure to mark stop_gradient=False when defining tensor for parameters of the circuit.

# Constructing circuit U(theta)
cir = U_theta(theta_tensor)
print(cir)

gradients = cir.param_shift_gradient(H)
print("Gradient of this objective function is: ", gradients.numpy())

--Ry(5.690)----*----x----Ry(3.107)--
               |    |               
--Ry(2.521)----x----*----Ry(0.437)--
                                    
Gradient of this objective function is:  [-0.79156457  0.12584274 -0.2654174   0.7806864 ]


## Linear Combination of Unitary Gradients

Building a parameterized circuit $U(\theta)$ using Paddle Quantum typically involves many parameterized one-qubit and two-qubit gates like $R_x$ and $CR_x$. We can therefore write $U(\theta)$ as $U_{1}(\theta_1)U_{2}(\theta_2)\cdots U_{m}(\theta_m)$, where each $U_i(\theta_i)$ is a one-qubit or two-qubit gate and $m$ is the total number of parameterized gates in the circuit $U(\theta)$. Since only $U_i$ depends on $\theta_i$, the partial derivative with respect to an individual parameter is $\frac{\partial U(\theta)}{\partial \theta_i}=U_{1}(\theta_1)U_{2}(\theta_2)\cdots\frac{\partial U_i{(\theta_i)}}{\partial \theta_i}\cdots U_{m}(\theta_m)$. As long as we know $\frac{\partial U_i{(\theta_i)}}{\partial \theta_i}$ for every parameterized gate, we can obtain the gradients for all parameters [2].
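
The product structure can be illustrated with a short NumPy sketch. The three-gate single-qubit circuit $R_x(\theta_1)R_z(\theta_2)R_x(\theta_3)$ is a hypothetical example, and the numerical derivative is included only as a sanity check.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def rx(theta):
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * X


def rz(theta):
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * Z


def d_rz(theta):
    """Derivative of Rz(theta) = exp(-i * theta/2 * Z) with respect to theta."""
    return -0.5j * Z @ rz(theta)


def circuit(theta):
    """Hypothetical circuit U(theta) = Rx(theta_1) Rz(theta_2) Rx(theta_3)."""
    return rx(theta[0]) @ rz(theta[1]) @ rx(theta[2])


theta = [0.4, 1.3, 2.2]

# Replace only the gate that depends on theta_2 by its derivative
analytic = rx(theta[0]) @ d_rz(theta[1]) @ rx(theta[2])

# Numerical derivative of the whole product with respect to theta_2
eps = 1e-6
numeric = (
    circuit([theta[0], theta[1] + eps, theta[2]])
    - circuit([theta[0], theta[1] - eps, theta[2]])
) / (2 * eps)
```

Here `analytic` and `numeric` agree up to the finite-difference error, which confirms that only the differentiated gate needs special treatment.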

### Single qubit gate gradient

Let's consider single qubit gates first. We also take $R_x(\theta)$ as an example. In the previous sections, we've already shown that $\frac{\partial R_x(\theta)}{\partial \theta}=-i\frac{1}{2}XR_x(\theta)$, which can be easily constructed using a circuit. Let's try to implement it using Paddle Quantum.

In [5]:
# Construct the circuit with a single one-qubit gate Rx
theta = paddle.to_tensor(np.pi / 3, 'float64')
cir = UAnsatz(1)
cir.rx(theta, 0)
print('Original circuit: ')
print(cir)

print('The circuit for gradient of Rx: ')
# The first parameter here is the index of the gate, the second parameter is the name of the gate
print(cir.pauli_rotation_gate_partial(0, 'rx'))

Original circuit: 
--Rx(1.047)--
             
The circuit for gradient of Rx: 
------------x----Rx(1.047)--
            |               
--H---SDG---*--------H------
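
To make the role of the ancilla qubit concrete, the printed circuit can be simulated with NumPy. This is a sketch under stated assumptions, not a description of Paddle Quantum's internals: it assumes the bottom wire is the ancilla, that the two-qubit gate is an $X$ on the top wire controlled by the ancilla, and that the gradient is read out as the expectation of the observable on the top wire times $Z$ on the ancilla.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
HAD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
SDG = np.array([[1, 0], [0, -1j]], dtype=complex)  # S^dagger
P0 = np.array([[1, 0], [0, 0]], dtype=complex)
P1 = np.array([[0, 0], [0, 1]], dtype=complex)


def rx(theta):
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * X


def lcu_rx_gradient(theta, obs):
    """Simulate the ancilla circuit; qubit order is (target, ancilla)."""
    state = np.zeros(4, dtype=complex)
    state[0] = 1.0                                 # |0>_target |0>_ancilla
    state = np.kron(I2, HAD) @ state               # H on ancilla
    state = np.kron(I2, SDG) @ state               # S^dagger on ancilla
    controlled_x = np.kron(I2, P0) + np.kron(X, P1)
    state = controlled_x @ state                   # X on target if ancilla is |1>
    state = np.kron(rx(theta), I2) @ state         # original Rx gate
    state = np.kron(I2, HAD) @ state               # H on ancilla
    # Assumed readout: expectation of obs (target) times Z (ancilla)
    return np.real(np.vdot(state, np.kron(obs, Z) @ state))


theta = 0.7
grad = lcu_rx_gradient(theta, Z)
```

For the observable $Z$ the objective is $\left\langle0\right|R_x^{\dagger}(\theta)ZR_x(\theta)\left|0\right\rangle = \cos\theta$, so `grad` should match $-\sin\theta$.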
                            


It's a lot more complicated to do the same for a $u3(\theta, \phi, \lambda)$ gate. But don't worry, we provide built-in methods for generating the circuits needed for calculating gradients of all parameterized single qubit gates in Paddle Quantum, i.e., $R_x$, $R_y$, $R_z$, and $u3$. 

In [6]:
cir = UAnsatz(1)
theta = paddle.uniform([3], min=0.0, max=2*np.pi, dtype='float64')
cir.u3(theta[0], theta[1], theta[2], 0)
print('Original circuit: ')
print(cir)

# Since the u3 gate has three parameters, we need a total of three circuits. Each corresponds to one parameter.
print('Circuits for gradient of u3: ')
# The first parameter here is the index of the gate, the second parameter is the index of parameter
print(cir.u3_partial(0, 0))
print(cir.u3_partial(0, 1))
print(cir.u3_partial(0, 2))

Original circuit: 
--U--
     
Circuits for gradient of u3: 
------------z----U--
            |       
--H---SDG---*----H--
                    
--Rz(4.706)---------y----Ry(3.292)----Rz(1.748)--
                    |                            
------H-------SDG---*--------H-------------------
                                                 
--Rz(4.706)----Ry(3.292)----z----Rz(1.748)--
                            |               
------H-----------SDG-------*--------H------
                                            


### Two-qubit gate gradient

Paddle Quantum provides many two-qubit parameterized gates as well. They can be categorized into two types: controlled rotation gates like $CR_x$, and two-qubit rotation gates like $R_{xx}$. Circuits for gradients of two-qubit rotation gates are easy to construct. Let's take $R_{xx}$ as an example. Following the idea of one-qubit rotation gates, we first write it as $R_{xx}(\theta)=e^{-i\frac{\theta}{2}X\otimes X}$, then get the equation $\frac{\partial R_{xx}(\theta)}{\partial \theta}=-i\frac{1}{2}X\otimes Xe^{-i\frac{\theta}{2}X\otimes X}$, which can be converted into a circuit easily.
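
The derivative of $R_{xx}$ can be verified numerically. The sketch below builds the matrix exponential from the eigendecomposition of $X \otimes X$ (an illustrative choice, not how Paddle Quantum constructs the gate) and compares the analytic derivative with a central finite difference.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
XX = np.kron(X, X)


def rxx(theta):
    """exp(-i * theta/2 * X(x)X), built from the eigendecomposition of X(x)X."""
    eigvals, eigvecs = np.linalg.eigh(XX)
    phases = np.exp(-0.5j * theta * eigvals)
    return eigvecs @ np.diag(phases) @ eigvecs.conj().T


theta = 0.9

# Closed form, valid because (X(x)X)^2 = I
closed_form = np.cos(theta / 2) * np.eye(4) - 1j * np.sin(theta / 2) * XX

# Analytic derivative from the text versus a numerical derivative
analytic_derivative = -0.5j * XX @ rxx(theta)
eps = 1e-6
numeric_derivative = (rxx(theta + eps) - rxx(theta - eps)) / (2 * eps)
```

Because the derivative is again a unitary ($X \otimes X$ followed by $R_{xx}$) times the constant $-\frac{i}{2}$, a single circuit is enough for this gate.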

We need to be careful when calculating the gradients for controlled rotation gates. Usually, we will need two circuits for a controlled rotation gate with one parameter. For example, let's consider $CR_x(\theta)$.

$CR_x(\theta)$ can be written as $\left|0\right>\left<0\right|\otimes I + \left|1\right>\left<1\right|\otimes R_x(\theta)$, so its gradient is: 

$$
\frac{\partial CR_x(\theta)}{\partial \theta}=\left|1\right>\left<1\right|\otimes \frac{\partial R_x(\theta)}{\partial \theta}=-\frac{i}{2}\left|1\right>\left<1\right|\otimes Xe^{-i\frac{\theta}{2}X}.
\tag{15}
$$

However, this equation cannot be represented directly using one circuit. We need to use a tiny 'trick' here. Instead of using this formula directly, we decompose it into two terms 

$$
\frac{\partial CR_x(\theta)}{\partial \theta}=-\frac{i}{4}(\left|0\right>\left<0\right|\otimes I + \left|1\right>\left<1\right|\otimes R_x(\theta))I\otimes X + \frac{i}{4}(\left|0\right>\left<0\right|\otimes I + \left|1\right>\left<1\right|\otimes R_x(\theta))Z\otimes X.
\tag{16}
$$ 

You can easily verify that this formula is equivalent to the former one. By doing so, we can use two circuits to compute the gradients for $CR_x$.
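
The equivalence of the two expressions for $\frac{\partial CR_x(\theta)}{\partial \theta}$ can also be checked numerically. This NumPy sketch assumes the control qubit is the left factor of the Kronecker product, matching the way $CR_x(\theta)$ is written above.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
P0 = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|
P1 = np.array([[0, 0], [0, 1]], dtype=complex)  # |1><1|


def rx(theta):
    return np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * X


def crx(theta):
    """CRx(theta) = |0><0| (x) I + |1><1| (x) Rx(theta)."""
    return np.kron(P0, I2) + np.kron(P1, rx(theta))


theta = 0.77

# Direct form: -(i/2) |1><1| (x) X Rx(theta)
direct = -0.5j * np.kron(P1, X @ rx(theta))

# Decomposed form: two terms, each a constant times a product of unitaries
two_terms = (
    -0.25j * crx(theta) @ np.kron(I2, X)
    + 0.25j * crx(theta) @ np.kron(Z, X)
)

# Numerical derivative of CRx as an independent check
eps = 1e-6
numeric = (crx(theta + eps) - crx(theta - eps)) / (2 * eps)
```

The terms acting on the $\left|0\right>\left<0\right|$ block cancel between the two circuits, leaving only the $\left|1\right>\left<1\right|$ block of the direct form.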

As always, we provide built-in methods for calculating the gradients of all two-qubit parameterized gates in Paddle Quantum, i.e., $R_{xx}$, $R_{yy}$, $R_{zz}$, $CR_x$, $CR_y$, $CR_z$, $CU$. 

In [7]:
theta = paddle.uniform([5], min=0.0, max=2*np.pi, dtype='float64')
cir = UAnsatz(2)
cir.cu(theta[0], theta[1], theta[2], [0, 1])
cir.rzz(theta[3], [0, 1])
cir.cry(theta[4], [0, 1])
print('Original circuit: ')
print(cir)

# The first parameter here is the index of the gate, the second parameter is the index of parameter
# Since we have three parameters for cu gate, and we need two circuits for each parameter, we have a total of 6 circuits.
# Circuits for gradients of cu:
cu3_00 = cir.cu3_partial(0, 0)[0]
cu3_01 = cir.cu3_partial(0, 0)[1]
cu3_10 = cir.cu3_partial(0, 1)[0]
cu3_11 = cir.cu3_partial(0, 1)[1]
cu3_20 = cir.cu3_partial(0, 2)[0]
cu3_21 = cir.cu3_partial(0, 2)[1]

# The first parameter here is the index of the gate, the second parameter is the name of the gate
print('The circuit for gradient of rzz: ')
print(cir.pauli_rotation_gate_partial(1, 'RZZ_gate'))

# The first parameter here is the index of the gate, the second parameter is the name of the gate
print('The circuit for gradient of cry: ')
print(cir.control_rotation_gate_partial(2, 'cry')[0])
print(cir.control_rotation_gate_partial(2, 'cry')[1])

Original circuit: 
--*----Rzz(3.76)--------*------
  |        |            |      
--U----Rzz(3.76)----Ry(2.246)--
                               
The circuit for gradient of rzz: 
--*---------z---------Rzz(3.76)--------*------
  |         |             |            |      
--U---------|----z----Rzz(3.76)----Ry(2.246)--
            |    |                            
--H---SDG---*----*--------H-------------------
                                              
The circuit for gradient of cry: 
--*----Rzz(3.76)----y--------*------
  |        |        |        |      
--U----Rzz(3.76)----|----Ry(2.246)--
                    |               
--H-------SDG-------*--------H------
                                    
--*----Rzz(3.76)---------y--------*------
  |        |             |        |      
--U----Rzz(3.76)----z----|----Ry(2.246)--
                    |    |               
--H--------S--------*----*--------H------
                                         


Now that we have prepared all the individual circuits for calculating gradients, the next step is to obtain the values of the gradients. To do so, we plug these circuits into our objective function; the results are the desired gradients. For gates like $CR_x$, we take the mean of the results from the two circuits as the gradient, as indicated in the analytical formula. We also provide a built-in method (Note: The built-in method currently does not support assigning an input state or running on a noisy circuit):

In [8]:
# Randomly generate parameters for our circuit
theta = paddle.uniform(shape=[8], dtype='float64', min=0.0, max=np.pi * 2)
theta.stop_gradient = False

# Construct circuit of U(theta)
cir = UAnsatz(2)
cir.complex_entangled_layer(theta[:6], 1)
cir.ry(theta=theta[6], which_qubit=0)
cir.ry(theta=theta[7], which_qubit=1)
cir.run_state_vector()
print(cir)

# Calculate gradient using our built-in method
# We pass in our Hamiltonian H used in the objective function
gradient = cir.linear_combinations_gradient(H, shots=0)
print("Gradient of this objective function is: ", gradient.numpy())

--U----*----x----Ry(0.667)--
       |    |               
--U----x----*----Ry(4.807)--
                            
Gradient of this objective function is:  [ 0.         -0.76471634  0.00700639  0.         -0.29062181 -0.01701887
 -0.07729092  0.7766131 ]


## Application: Simulating VQE with Paddle Quantum

Variational Quantum Eigensolver (VQE) [3] is designed to find the ground state energy of a given molecular Hamiltonian using variational quantum circuits. Interested readers can find more details from the previous tutorial [VQE](../quantum_simulation/VQE_EN.ipynb).

We will demonstrate how to use VQE to find the ground state energy for the Hamiltonian of hydrogen molecule $H_2$. In the process, we will use the methods introduced above to calculate the gradient.
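
Before running the full example, the overall loop can be sketched on a toy problem in plain Python. Everything here is an assumption for illustration: the one-qubit Hamiltonian $Z + X$, the ansatz $R_y(\theta)\left|0\right\rangle$, plain gradient descent in place of Adam, and the hyper-parameters.

```python
import math


def energy(theta):
    # <0| Ry^dagger(theta) (Z + X) Ry(theta) |0> = cos(theta) + sin(theta)
    return math.cos(theta) + math.sin(theta)


def param_shift_grad(f, theta):
    """Parameter-shift gradient for a Pauli rotation gate (r = 1/2)."""
    return 0.5 * (f(theta + math.pi / 2) - f(theta - math.pi / 2))


theta = 0.5          # initial parameter (arbitrary choice)
learning_rate = 0.4  # step size of plain gradient descent
for _ in range(80):
    theta -= learning_rate * param_shift_grad(energy, theta)

final_energy = energy(theta)
exact_ground_energy = -math.sqrt(2)  # smallest eigenvalue of Z + X
```

The loop only ever evaluates the objective at shifted parameters, which is what a quantum device would provide; `final_energy` converges to `exact_ground_energy` for this toy Hamiltonian.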

### Using Paddle's Optimizer

First, we will use Paddle's Adam optimizer to run our example. We can choose either the finite difference method or the parameter-shift method to calculate the gradient.

In [9]:
from paddle_quantum.VQE.chemistrysub import H2_generator
from paddle_quantum.expecval import ExpecVal

# Set up our Hamiltonian H
pauli_str, N = H2_generator()
H = Hamiltonian(pauli_str)

# Hyper-parameters
ITR = 80  # Set the number of optimization iterations
LR = 0.4   # Set the learning rate
D = 2      # Set the depth of the repetitive calculation module in QNN

def U_theta(theta, Hamiltonian, N, D):
    """
    Quantum Neural Network
    """
    # Initialize the quantum neural network according to the number of qubits N
    cir = UAnsatz(N)

    # Built-in {R_y + CNOT} circuit template
    theta = paddle.reshape(theta, [D+1, N, 1])
    cir.real_entangled_layer(theta[:D], D)

    # Lay R_y gates in the last row
    for i in range(N):
        cir.ry(theta=theta[D][i][0], which_qubit=i)

    # The quantum neural network acts on the default initial state |0...0>
    cir.run_state_vector()
    
    return cir

In the forward propagation below, we use the updated parameters to calculate the expectation value, with its gradient computed by the parameter-shift rule. You can change the method to 'finite_diff' if you'd like to try the finite difference method instead. 

In [10]:
class StateNet(paddle.nn.Layer):

    def __init__(self, cir):
        super(StateNet, self).__init__()
        
        self.cir = cir
        params = cir.get_param()
        
        # Assign the theta parameter list to be the trainable parameter list of the circuit
        self.theta = self.create_parameter(shape=[len(params)], 
                                           default_initializer=paddle.nn.initializer.Assign(params),
                                           dtype='float32', is_bias=False)
        
    # Define loss function and forward propagation mechanism
    def forward(self):
        # Calculate the loss function/expectation value using Parameter-shift rule to calculate gradient
        loss = ExpecVal.apply(self.cir, self.theta.cast('float64'), 'param_shift', H, shots=0)
        
        return loss, self.cir

In [11]:
# Initialize the theta parameter list and fill the initial value with a uniform distribution of [0, 2*pi]
theta = paddle.to_tensor(np.random.uniform(0.0, 2*np.pi, (D+1) * N), stop_gradient=False)

# Initialize the circuit
cir = U_theta(theta, H, N, D)

# Determine the parameter dimension of the network
net = StateNet(cir)

# The Adam optimizer usually gives relatively good convergence;
# you can change it to SGD or RMSProp.
opt = paddle.optimizer.Adam(learning_rate=LR, parameters=net.parameters())

# Record optimization results
summary_iter, summary_loss = [], []

# Optimization loop
for itr in range(1, ITR + 1):

    # Forward propagation to calculate loss function
    loss, cir = net()

    # Use back propagation to minimize the loss function
    loss.backward()
    opt.minimize(loss)
    opt.clear_grad()

    # Record optimization results
    summary_loss.append(loss.numpy())
    summary_iter.append(itr)

    # Print result
    if itr % 20 == 0:
        print("iter:", itr, "loss:", "%.4f" % loss.numpy())
        print("iter:", itr, "Ground state energy:", "%.4f Ha" 
                                            % loss.numpy())
    if itr == ITR:
        print("\nThe trained circuit:")
        print(cir)

print('\nGround state energy obtained: ', summary_loss[-1], "Ha")
print('Actual ground state energy: ', -1.13618, "Ha")

iter: 20 loss: -1.1114
iter: 20 Ground state energy: -1.1114 Ha
iter: 40 loss: -1.1316
iter: 40 Ground state energy: -1.1316 Ha
iter: 60 loss: -1.1357
iter: 60 Ground state energy: -1.1357 Ha
iter: 80 loss: -1.1361
iter: 80 Ground state energy: -1.1361 Ha

The trained circuit:
--Ry(6.282)----*--------------x----Ry(6.289)----*--------------x----Ry(3.148)--
               |              |                 |              |               
--Ry(3.138)----x----*---------|----Ry(0.207)----x----*---------|----Ry(3.142)--
                    |         |                      |         |               
--Ry(6.278)---------x----*----|----Ry(0.001)---------x----*----|----Ry(3.143)--
                         |    |                           |    |               
--Ry(0.001)--------------x----*----Ry(3.156)--------------x----*----Ry(3.154)--
                                                                               

Ground state energy obtained:  [-1.13609609] Ha
Actual ground state energy:  -1.13618 Ha

We can see that the ground state energy we obtained is close to the theoretical value.

### Using SciPy's Optimizer

We will also demonstrate how to use SciPy's optimizers to run VQE with Paddle Quantum. For this example, we use the Conjugate Gradient (CG) optimizer together with the linear combination method to find the ground state energy of our Hamiltonian. 

Other SciPy methods we support include Newton-CG, Powell, and SLSQP.

In [12]:
from paddle_quantum.optimizer import ConjugateGradient

# Initialize the circuit
cir = U_theta(theta, H, N, D)

optimizer = ConjugateGradient(cir, H, shots=0, grad_func_name='linear_comb')
optimizer.minimize(iterations=80)
print('Actual ground state energy: ', -1.13618, "Ha")

loss:  [-0.91176578]
loss:  [-1.03555093]
loss:  [-1.11965221]
loss:  [-1.13435502]
loss:  [-1.13577104]
loss:  [-1.13615947]
loss:  [-1.13618601]
loss:  [-1.13618942]
loss:  [-1.13618945]
loss:  [-1.13618945]
loss:  [-1.13618945]
Optimization terminated successfully.
Actual ground state energy:  -1.13618 Ha


## Conclusion

As you can see, the finite-difference and parameter-shift methods have similar forms: both require two function evaluations per parameter. Their benefit is that the gradients can be calculated without knowing much about the circuit or the objective function. We can treat them as a black box and get the gradient just by feeding in different parameters. Between the two, we prefer the parameter-shift method because it yields an analytical gradient, while the finite difference method only gives an estimate of the gradient. However, parameter-shift only applies when $U(\theta)$ is generated by a $G$ that has two distinct eigenvalues, $U(\theta) = e^{-ia\theta G}$, or can be decomposed into a product of gates of this form.

Using a linear combination of unitary gates to calculate the gradients of a given circuit is probably the most straightforward analytical method. By differentiating the unitary gates in their mathematical form, we can represent the resulting formula with circuits. As with the other two methods, the number of circuits required is proportional to the number of parameters in the original circuit. For simple gates like $R_x$ and $R_{xx}$, a single circuit is enough to calculate the gradient. However, note that this method uses an ancilla qubit. Moreover, you might have noticed that it takes a long time to run on complex circuits. That's because as the number of qubits increases, the number of circuits needed to represent the first-order derivative of a single multi-qubit gate also increases. 

_______

## References

[1] Crooks, Gavin E. "Gradients of parameterized quantum gates using the parameter-shift rule and gate decomposition." [arXiv preprint arXiv:1905.13311 (2019)](https://arxiv.org/abs/1905.13311).

[2] Somma, Rolando, et al. "Simulating physical phenomena by quantum networks." [Physical Review A 65.4 (2002): 042323](https://arxiv.org/abs/quant-ph/0108146).

[3] Peruzzo, Alberto, et al. "A variational eigenvalue solver on a photonic quantum processor." [Nature communications 5.1 (2014): 1-7](https://www.nature.com/articles/ncomms5213).

[4] Schuld, Maria, et al. "Evaluating analytic gradients on quantum hardware." [Physical Review A 99.3 (2019): 032331](https://arxiv.org/abs/1811.11184).