# Timing benchmark of Refoqus on CPU and GPU

We revisit the VQSE example, this time comparing runtimes on CPU and GPU. The metric is the average number of shots per second executed on each device.

## Variational Quantum State Eigensolver

In this notebook, we optimize a VQSE problem. We are given a set of states $\{\rho_i\}_{i=1}^N$ and we minimize the cost $L(\vec{\theta}) = \sum_{i} p_{i} E_{i}(\vec{\theta})$, where $\vec{\theta}$ are the parameters of a variational circuit $U(\vec{\theta})$, the $p_i$ are weights (uniform, $p_i = 1/N$, in the code of this notebook), and $E_i(\vec{\theta}) = \mathrm{Tr}[H \, U(\vec{\theta}) \rho_i U^\dagger(\vec{\theta})]$ with $H = \mathbb{1} - \sum_j r_j Z_j$, $r_j > 0$.
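
To make the cost concrete: because $H$ is diagonal in the computational basis, each $E_i$ can be computed from the probabilities of measuring each bitstring after the ansatz has been applied. The sketch below does this in plain Python for a toy 2-qubit case. The function names, the two probability distributions and the coefficients `r` are illustrative assumptions; none of them come from Refoqus or PennyLane.

```python
def diagonal_energy(bits, r):
    """Eigenvalue of H = 1 - sum_j r_j Z_j on the basis state given by `bits`."""
    # Z_j has eigenvalue +1 on |0> and -1 on |1>.
    return 1.0 - sum(rj * (1 - 2 * b) for rj, b in zip(r, bits))


def energy_from_probs(probs, r):
    """Expectation value of H from outcome probabilities {bitstring: probability}."""
    return sum(p * diagonal_energy([int(c) for c in s], r) for s, p in probs.items())


def vqse_cost(all_probs, weights, r):
    """Weighted cost L = sum_i p_i E_i over the dataset."""
    return sum(w * energy_from_probs(probs, r) for w, probs in zip(weights, all_probs))


# Toy 2-qubit example (assumed values, for illustration only).
r = [0.4, 0.6]
probs_state_0 = {"00": 1.0}              # always measures 00
probs_state_1 = {"00": 0.5, "11": 0.5}   # 00 or 11 with equal probability
cost = vqse_cost([probs_state_0, probs_state_1], [0.5, 0.5], r)
print(cost)
```

In the notebook itself the same quantity is obtained with `qml.expval` on an analytic device, so no explicit sum over bitstrings is needed.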


In our example, $\{\rho_i\}_{i=1}^N$ is taken from the collection of datasets available in PennyLane. We use the states obtained by running VQE for different bond lengths of the $\mathrm{H}_2$ molecule in the STO-3G basis. This gives a dataset of 42 circuits, loaded as follows:

In [1]:
from time import time
import pennylane as qml
from pennylane import numpy as np
from refoqus import Refoqus

bondlengths = ['0.5', '0.54', '0.58', '0.62', '0.66', '0.7', '0.74', '0.742', '0.78', '0.82', '0.86', '0.9', '0.94', '0.98', '1.02', '1.06', '1.1', '1.14', '1.18', '1.22', '1.26', '1.3', '1.34', '1.38', '1.42', '1.46', '1.5', '1.54', '1.58', '1.62', '1.66', '1.7', '1.74', '1.78', '1.82', '1.86', '1.9', '1.94', '1.98', '2.02', '2.06', '2.1']
moldataset = qml.data.load("qchem", molname="H2", basis="STO-3G", bondlength=bondlengths)
nbdatapoints = len(moldataset)

Now we set the coefficients $r_j$ and the Hamiltonian terms $Z_j$, and define the Hamiltonian of interest as their weighted sum (the constant $\mathbb{1}$ is added later, when the cost is evaluated).

In [2]:
nbqbits = len(moldataset[0].hamiltonian.wires)
coefficients_cost = -np.array(
    [1.0 + (i - 1) * 0.2 for i in range(1, nbqbits + 1)]
)
coefficients_cost /= np.sum(coefficients_cost)

vqse_hamiltonian_term = [qml.PauliZ(i) for i in range(nbqbits)]

hamiltonian_of_interest = qml.Hamiltonian(coefficients_cost, vqse_hamiltonian_term)
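
One detail of the normalization is worth checking by hand: the array is built with a leading minus sign, so its sum is negative as well, and dividing by that sum yields positive entries. The sketch below reproduces the arithmetic in plain Python, assuming `nbqbits = 4` (the qubit count for $\mathrm{H}_2$ in the STO-3G basis); the variable names `raw` and `normalized` are illustrative.

```python
nbqbits = 4  # assumption: H2 in the STO-3G basis maps to 4 qubits

# Unnormalized entries -1.0, -1.2, -1.4, ... as built in the cell above.
raw = [-(1.0 + (i - 1) * 0.2) for i in range(1, nbqbits + 1)]

# Same normalization as in the cell above: divide every entry by the sum.
total = sum(raw)
normalized = [c / total for c in raw]

print(raw)         # all entries negative
print(normalized)  # all entries positive, because `total` is negative
print(sum(normalized))
```

If the intended Hamiltonian is $-\sum_j r_j Z_j$ with $r_j > 0$, dividing by the absolute value of the sum would keep the minus sign; whether that matters for the benchmark is not something this notebook checks.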

## CPU and GPU setting

In [3]:
! neofetch

Next, we define functions that evaluate the exact (analytic) cost during optimization. We write one version for the CPU device and one for the GPU device.

In [4]:
from pennylane.templates.layers import StronglyEntanglingLayers

# Analytic (shots=None) simulators, used only to track the exact cost.
analytic_dev_cpu = qml.device("lightning.qubit", wires=nbqbits, shots=None)
analytic_dev_gpu = qml.device("lightning.gpu", wires=nbqbits, shots=None)

@qml.qnode(analytic_dev_cpu)
def cost_analytic_one_circuit(weights, index_datapoint):
    # Prepare the input state with the stored VQE gates, then apply the ansatz.
    for op in moldataset[index_datapoint].vqe_gates:
        qml.apply(op)

    StronglyEntanglingLayers(weights, wires=analytic_dev_cpu.wires)
    return qml.expval(hamiltonian_of_interest)

def cost_analytic_alldataset(weights):
    # Average over the dataset and add the constant term of H.
    cost = 0.0
    for m in range(nbdatapoints):
        cost += cost_analytic_one_circuit(weights, m)
    cost = 1.0 + cost / nbdatapoints
    return cost

@qml.qnode(analytic_dev_gpu)
def cost_analytic_one_circuit_gpu(weights, index_datapoint):
    # Same circuit as above, evaluated on the GPU device.
    for op in moldataset[index_datapoint].vqe_gates:
        qml.apply(op)

    StronglyEntanglingLayers(weights, wires=analytic_dev_gpu.wires)
    return qml.expval(hamiltonian_of_interest)

# NOTE: this definition reuses the name of the CPU version above and therefore
# replaces it; every later call to cost_analytic_alldataset runs on analytic_dev_gpu.
def cost_analytic_alldataset(weights):
    cost = 0.0
    for m in range(nbdatapoints):
        cost += cost_analytic_one_circuit_gpu(weights, m)
    cost = 1.0 + cost / nbdatapoints
    return cost

The ansatz is built from `StronglyEntanglingLayers`. We also sample initial parameter values and evaluate the corresponding cost.

In [5]:
from pennylane.templates.layers import StronglyEntanglingLayers

# hyperparameter of ansatz
num_layers = 3


param_shape = StronglyEntanglingLayers.shape(n_layers=num_layers, n_wires=nbqbits)
np.random.seed(10)
init_params = np.random.uniform(low=0.0, high=2*np.pi, size=param_shape, requires_grad=True)
cost_analytic_alldataset(init_params)

tensor(1.13628269, requires_grad=True)

Our adaptive optimizer is Refoqus. We construct it with the arguments shown below and perform `niter` iterations per run.

## CPU benchmark

We perform 10 Refoqus runs on CPU and 10 on GPU, recording the total number of shots and the runtime of each run.

In [6]:
res_cpu = []
restime_cpu = []
nbruns = 10  # number of independent Refoqus runs
niter = 20   # optimizer steps per run

for _ in range(nbruns):
    opt = Refoqus(
        nbqbits,
        [m.vqe_gates for m in moldataset],
        vqse_hamiltonian_term,
        coefficients_cost,
        param_shape,
        min_shots=2,
        device_name="lightning.qubit",
    )
    params = init_params

    # Timed region: the optimizer steps plus the analytic cost evaluations used for tracking.
    starttime = time()
    cost_refoqus = [cost_analytic_alldataset(params)]
    shots_refoqus = [0]

    for i in range(niter):
        params = opt.step(params)
        cost_refoqus.append(cost_analytic_alldataset(params))
        shots_refoqus.append(opt.shots_used)
    restime_cpu.append(time() - starttime)
    res_cpu.append([cost_refoqus.copy(), shots_refoqus.copy()])
    print(restime_cpu[-1])  # runtime of this run, in seconds

# Shots per second of each run (final shot count / runtime), averaged over the runs.
average_cpu = np.sum([res_cpu[j][1][-1] / restime_cpu[j] for j in range(nbruns)]) / nbruns

176.74725651741028
222.86281490325928
79.41039514541626
105.3722755908966
186.56707239151
217.61472034454346
84.12965512275696
132.66889429092407
228.71543741226196
173.60100674629211


## GPU benchmark

In [7]:
res_gpu = []
restime_gpu = []

for _ in range(nbruns):
    opt = Refoqus(
        nbqbits,
        [m.vqe_gates for m in moldataset],
        vqse_hamiltonian_term,
        coefficients_cost,
        param_shape,
        min_shots=2,
        device_name="lightning.gpu",
    )
    params = init_params

    # Timed region: the optimizer steps plus the analytic cost evaluations used for tracking.
    starttime = time()
    cost_refoqus = [cost_analytic_alldataset(params)]
    shots_refoqus = [0]

    for i in range(niter):
        params = opt.step(params)
        cost_refoqus.append(cost_analytic_alldataset(params))
        shots_refoqus.append(opt.shots_used)
    restime_gpu.append(time() - starttime)
    res_gpu.append([cost_refoqus.copy(), shots_refoqus.copy()])
    print(restime_gpu[-1])  # runtime of this run, in seconds

# Shots per second of each run (final shot count / runtime), averaged over the runs.
average_gpu = np.sum([res_gpu[j][1][-1] / restime_gpu[j] for j in range(nbruns)]) / nbruns

285.66036677360535
221.94530153274536
176.87832975387573
212.0471911430359
222.54807949066162
69.64625930786133
256.0793368816376
299.05737113952637
62.0515558719635
208.52359890937805
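
The CPU and GPU loops differ only in the `device_name` passed to the optimizer, so they could be folded into one helper. The sketch below shows one way to do that in plain Python. `benchmark` and `DummyOptimizer` are hypothetical names introduced here; the only thing assumed about the optimizer is what the cells above already rely on, namely a `step(params)` method and a cumulative `shots_used` attribute. Unlike the cells above, this sketch times only the optimizer steps and uses `time.perf_counter`, which is intended for measuring intervals.

```python
from time import perf_counter


def benchmark(make_optimizer, init_params, nbruns, niter):
    """Run `nbruns` independent optimizations; return total shots and runtimes.

    `make_optimizer` is a zero-argument callable returning an object with a
    `step(params)` method and a cumulative `shots_used` attribute.
    """
    total_shots, runtimes = [], []
    for _run in range(nbruns):
        opt = make_optimizer()
        params = init_params
        start = perf_counter()
        for _step in range(niter):
            params = opt.step(params)
        runtimes.append(perf_counter() - start)
        total_shots.append(opt.shots_used)
    return total_shots, runtimes


class DummyOptimizer:
    """Stand-in optimizer (hypothetical) that spends 10 shots per step."""

    def __init__(self):
        self.shots_used = 0

    def step(self, params):
        self.shots_used += 10
        return params


shots, times = benchmark(DummyOptimizer, init_params=[0.0], nbruns=3, niter=5)
print(shots)
```

With Refoqus, `make_optimizer` could be a `lambda` that calls the constructor with the desired `device_name`, so the same helper would serve both devices.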


### How many times more shots per second do we get on GPU?

In [8]:
average_gpu, average_cpu, average_gpu / average_cpu

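(1568.135722051434, 1128.5434782479174, 1.3895217617011892)

The tuple reads (GPU shots per second, CPU shots per second, ratio): about 1568 versus 1129 shots per second, i.e. roughly 1.39 times more shots per second on GPU in this benchmark.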
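
The throughput metric used here is the mean over runs of each run's shots divided by its runtime. A minimal sketch of that computation in plain Python follows; the function name `average_shots_per_second` is illustrative and the numbers are made up, not measurements.

```python
def average_shots_per_second(total_shots, runtimes):
    """Mean over runs of (total shots in the run) / (runtime of the run)."""
    rates = [s / t for s, t in zip(total_shots, runtimes)]
    return sum(rates) / len(rates)


# Made-up numbers for illustration only; they are not measurements.
cpu_rate = average_shots_per_second([1000, 3000], [1.0, 2.0])  # (1000 + 1500) / 2
gpu_rate = average_shots_per_second([4000, 6000], [2.0, 2.0])  # (2000 + 3000) / 2
speedup = gpu_rate / cpu_rate
print(cpu_rate, gpu_rate, speedup)
```

Note that a mean of per-run rates generally differs from total shots divided by total time: for the made-up CPU numbers the former is 1250 shots per second, while the latter is 4000 / 3.0, about 1333.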