r"""
.. _quantum_natural_gradient:
Quantum natural gradient
========================
.. meta::
:property="og:description": This demonstration highlights faster optimization
convergence using the quantum natural gradient, which takes into account
the quantum geometry.
:property="og:image": https://pennylane.ai/qml/_images/qng_optimization.png
This example demonstrates the quantum natural gradient optimization technique
for variational quantum circuits, originally proposed in
`Stokes et al. (2019) <https://arxiv.org/abs/1909.02108>`__.
Background
----------
The most successful class of quantum algorithms for use on near-term noisy quantum hardware
is the so-called variational quantum algorithm. As laid out in the
:ref:`Concepts section <glossary_variational_circuit>`, in variational quantum algorithms
a low-depth parametrized quantum circuit ansatz is chosen, and a problem-specific
observable measured. A classical optimization loop is then used to find
the set of quantum parameters that *minimize* a particular measurement expectation value
of the quantum device. Examples of such algorithms include the :doc:`variational quantum
eigensolver (VQE) <tutorial_vqe>`, the
`quantum approximate optimization algorithm (QAOA) <https://arxiv.org/abs/1411.4028>`__,
and :ref:`quantum neural networks (QNN) <quantum_neural_net>`.
Most recent demonstrations
of variational quantum algorithms have used gradient-free classical optimization
methods, such as the Nelder-Mead algorithm. However, the parameter-shift rule
(as implemented in PennyLane) allows the user to automatically compute
analytic gradients of quantum circuits. This opens up the possibility to train
quantum computing hardware using gradient descent---the same method used to train
deep learning models.
One caveat has surfaced with gradient descent, however: how do we choose the optimal
step size for our variational quantum algorithms to ensure successful and
efficient optimization?
The natural gradient
^^^^^^^^^^^^^^^^^^^^
In standard gradient descent, each optimization step is given by
.. math:: \theta_{t+1} = \theta_t -\eta \nabla \mathcal{L}(\theta),
where :math:`\mathcal{L}(\theta)` is the cost as a function of
the parameters :math:`\theta`, and :math:`\eta` is the learning rate
or step size. In essence, each optimization step calculates the
steepest descent direction around the local value of :math:`\theta_t`
in the parameter space, and updates :math:`\theta_t\rightarrow \theta_{t+1}`
by this vector.
The problem with the above approach is that each optimization step
is strongly connected to a *Euclidean geometry* on the parameter space.
The parametrization is not unique, and different parametrizations can distort
distances within the optimization landscape.
For example, consider the following cost function :math:`\mathcal{L}`, parametrized
using two different coordinate systems, :math:`(\theta_0, \theta_1)`, and
:math:`(\phi_0, \phi_1)`:
|
.. figure:: ../demonstrations/quantum_natural_gradient/qng7.png
:align: center
:width: 90%
:target: javascript:void(0)
|
Performing gradient descent in the :math:`(\theta_0, \theta_1)` parameter
space, we are updating each parameter by the same Euclidean distance,
and not taking into account the fact that the cost function might vary at a different
rate with respect to each parameter.
Instead, if we perform a change of coordinate system (re-parametrization)
of the cost function, we might find a parameter space where variations in :math:`\mathcal{L}`
are similar across different parameters. This is the case with the new parametrization
:math:`(\phi_0, \phi_1)`; the cost function is unchanged,
but we now have a nicer geometry in which to perform gradient descent, and a more
informative stepsize. This leads to faster convergence, and can help avoid optimization
becoming stuck in local minima. For a more in-depth explanation,
including why the parameter space might not be best represented by a Euclidean space,
see `Yamamoto (2019) <https://arxiv.org/abs/1909.05074>`__.
However, what if we avoid gradient descent in the parameter space altogether?
If we instead consider the optimization problem as a
probability distribution of possible output values given an input
(i.e., `maximum likelihood estimation <https://en.wikipedia.org/wiki/Likelihood_function>`_),
a better approach is to perform the gradient descent in the *distribution space*, which is
dimensionless and invariant with respect to the parametrization. As a result,
each optimization step will always choose the optimum step-size for every
parameter, regardless of the parametrization.
In classical neural networks, the above process is known as
*natural gradient descent*, and was first introduced by
`Amari (1998) <https://www.mitpressjournals.org/doi/abs/10.1162/089976698300017746>`__.
The standard gradient descent is modified as follows:
.. math:: \theta_{t+1} = \theta_t - \eta F^{-1}\nabla \mathcal{L}(\theta),
where :math:`F` is the `Fisher information matrix <https://en.wikipedia.org/wiki/Fisher_information#Matrix_form>`__.
The Fisher information matrix acts as a metric tensor, transforming the
steepest descent in the Euclidean parameter space to the steepest descent in the
distribution space.
The quantum analog
^^^^^^^^^^^^^^^^^^
In a similar vein, it has been shown that the standard Euclidean geometry
is sub-optimal for optimization of quantum variational algorithms
`(Harrow and Napp, 2019) <https://arxiv.org/abs/1901.05374>`__.
The space of quantum states instead possesses a unique invariant metric
tensor known as the Fubini-Study metric tensor :math:`g_{ij}`, which can be used to
construct a quantum analog to natural gradient descent:
.. math:: \theta_{t+1} = \theta_t - \eta g^{+}(\theta_t)\nabla \mathcal{L}(\theta),
where :math:`g^{+}` refers to the pseudo-inverse.
.. note::
It can be shown that the Fubini-Study metric tensor reduces
to the Fisher information matrix in the classical limit.
Furthermore, in the limit where :math:`\eta\rightarrow 0`,
the dynamics of the system are equivalent to imaginary-time
evolution within the variational subspace, as proposed in
`McArdle et al. (2018) <https://arxiv.org/abs/1804.03023>`__.
"""
##############################################################################
# Block-diagonal metric tensor
# ----------------------------
#
# A block-diagonal approximation to the Fubini-Study metric tensor
# of a variational quantum circuit can be evaluated on quantum hardware.
#
# Consider a variational quantum circuit
#
# .. math::
#
# U(\mathbf{\theta})|\psi_0\rangle = V_L(\theta_L) W_L V_{L-1}(\theta_{L-1}) W_{L-1}
# \cdots V_{\ell}(\theta_{\ell}) W_{\ell} \cdots V_{0}(\theta_{0}) W_{0} |\psi_0\rangle
#
# where
#
# * :math:`|\psi_0\rangle` is the initial state,
# * :math:`W_\ell` are layers of non-parametrized quantum gates,
# * :math:`V_\ell(\theta_\ell)` are layers of parametrized quantum gates
# with :math:`n_\ell` parameters :math:`\theta_\ell = \{\theta^{(\ell)}_0, \dots, \theta^{(\ell)}_n\}`.
#
# Further, assume all parametrized gates can be written in the form
# :math:`X(\theta^{(\ell)}_{i}) = e^{i\theta^{(\ell)}_{i} K^{(\ell)}_i}`,
# where :math:`K^{(\ell)}_i` is the *generator* of the parametrized operation.
#
# For each parametric layer :math:`\ell` in the variational quantum circuit
# the :math:`n_\ell\times n_\ell` block-diagonal submatrix
# of the Fubini-Study tensor :math:`g_{ij}^{(\ell)}` is calculated by:
#
# .. math::
#
# g_{ij}^{(\ell)} = \langle \psi_{\ell-1} | K_i K_j | \psi_{\ell-1} \rangle
# - \langle \psi_{\ell-1} | K_i | \psi_{\ell-1}\rangle
# \langle \psi_{\ell-1} |K_j | \psi_{\ell-1}\rangle
#
# where
#
# .. math::
#
# | \psi_{\ell-1}\rangle = V_{\ell-1}(\theta_{\ell-1}) W_{\ell-1} \cdots V_{0}(\theta_{0}) W_{0} |\psi_0\rangle.
#
# (that is, :math:`|\psi_{\ell-1}\rangle` is the quantum state prior to the application
# of parametrized layer :math:`\ell`), and we have :math:`K_i \equiv K_i^{(\ell)}` for brevity.
#
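The covariance formula above can be written down directly for a statevector and a pair of generator matrices. The helper below is a sketch; the function name `fs_block_entry` and the single-qubit test states are illustrative, not taken from this demo.

```python
import numpy as np

def fs_block_entry(psi, K_i, K_j):
    """Block-diagonal Fubini-Study entry for state psi:
    <psi|K_i K_j|psi> - <psi|K_i|psi><psi|K_j|psi> (real part)."""
    ev_ij = np.vdot(psi, K_i @ K_j @ psi)
    ev_i = np.vdot(psi, K_i @ psi)
    ev_j = np.vdot(psi, K_j @ psi)
    return np.real(ev_ij - ev_i * ev_j)

Z = np.diag([1.0, -1.0])
psi0 = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)

print(fs_block_entry(psi0, Z / 2, Z / 2))  # 0.0: |0> is a Z eigenstate
print(fs_block_entry(plus, Z / 2, Z / 2))  # 0.25: maximal variance of Z/2
```

A diagonal entry reduces to the variance of a single generator, which is exactly what the subcircuits below measure.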
# Let's consider a small variational quantum circuit example coded in PennyLane:
import numpy as np
import pennylane as qml
from pennylane import expval, var
dev = qml.device("default.qubit", wires=3)
@qml.qnode(dev)
def circuit(params):
# |psi_0>: state preparation
qml.RY(np.pi / 4, wires=0)
qml.RY(np.pi / 3, wires=1)
qml.RY(np.pi / 7, wires=2)
# V0(theta0, theta1): Parametrized layer 0
qml.RZ(params[0], wires=0)
qml.RZ(params[1], wires=1)
# W1: non-parametrized gates
qml.CNOT(wires=[0, 1])
qml.CNOT(wires=[1, 2])
# V_1(theta2, theta3): Parametrized layer 1
qml.RY(params[2], wires=1)
qml.RX(params[3], wires=2)
# W2: non-parametrized gates
qml.CNOT(wires=[0, 1])
qml.CNOT(wires=[1, 2])
return qml.expval(qml.PauliY(0))
params = np.array([0.432, -0.123, 0.543, 0.233])
##############################################################################
# The above circuit consists of 4 parameters, with two distinct parametrized
# layers of 2 parameters each.
#
# .. figure:: ../demonstrations/quantum_natural_gradient/qng1.png
# :align: center
# :width: 90%
# :target: javascript:void(0)
#
# |
#
# (Note that in this example, the first non-parametrized layer :math:`W_0`
# is simply the identity.) Since there are two layers, each with two parameters,
# the block-diagonal approximation consists of two
# :math:`2\times 2` matrices, :math:`g^{(0)}` and :math:`g^{(1)}`.
#
# .. figure:: ../demonstrations/quantum_natural_gradient/qng2.png
# :align: center
# :width: 30%
# :target: javascript:void(0)
#
# To compute the first block-diagonal :math:`g^{(0)}`, we create subcircuits consisting
# of all gates prior to the layer, and observables corresponding to
# the *generators* of the gates in the layer:
#
# .. figure:: ../demonstrations/quantum_natural_gradient/qng3.png
# :align: center
# :width: 30%
# :target: javascript:void(0)
g0 = np.zeros([2, 2])
def layer0_subcircuit(params):
"""This function contains all gates that
precede parametrized layer 0"""
qml.RY(np.pi / 4, wires=0)
qml.RY(np.pi / 3, wires=1)
qml.RY(np.pi / 7, wires=2)
##############################################################################
# We then post-process the measurement results in order to determine :math:`g^{(0)}`,
# as follows.
#
# .. figure:: ../demonstrations/quantum_natural_gradient/qng4.png
# :align: center
# :width: 50%
# :target: javascript:void(0)
#
# We can see that the diagonal terms are simply given by the variance:
@qml.qnode(dev)
def layer0_diag(params):
layer0_subcircuit(params)
return var(qml.PauliZ(0)), var(qml.PauliZ(1))
# calculate the diagonal terms
varK0, varK1 = layer0_diag(params)
g0[0, 0] = varK0 / 4
g0[1, 1] = varK1 / 4
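As a quick sanity check, assuming an exact (noise-free) simulation: for :math:`RY(a)|0\rangle` we have :math:`\langle Z\rangle = \cos a`, so :math:`\text{Var}(Z) = \sin^2 a`, and since the generator of ``RZ`` is :math:`Z/2` each diagonal entry equals :math:`\sin^2(a)/4`:

```python
import numpy as np

# Wire 0 is prepared with RY(pi/4), wire 1 with RY(pi/3)
g0_00_analytic = np.sin(np.pi / 4) ** 2 / 4
g0_11_analytic = np.sin(np.pi / 3) ** 2 / 4
print(g0_00_analytic, g0_11_analytic)  # 0.125 0.1875
```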
##############################################################################
# The following two subcircuits are then used to calculate the
# off-diagonal covariance terms of :math:`g^{(0)}`:
@qml.qnode(dev)
def layer0_off_diag_single(params):
layer0_subcircuit(params)
return expval(qml.PauliZ(0)), expval(qml.PauliZ(1))
@qml.qnode(dev)
def layer0_off_diag_double(params):
layer0_subcircuit(params)
ZZ = np.kron(np.diag([1, -1]), np.diag([1, -1]))
return expval(qml.Hermitian(ZZ, wires=[0, 1]))
# calculate the off-diagonal terms
exK0, exK1 = layer0_off_diag_single(params)
exK0K1 = layer0_off_diag_double(params)
g0[0, 1] = (exK0K1 - exK0 * exK1) / 4
g0[1, 0] = (exK0K1 - exK0 * exK1) / 4
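Because wires 0 and 1 are still in a product state before layer 0, :math:`\langle Z_0 Z_1\rangle` factorizes into :math:`\langle Z_0\rangle\langle Z_1\rangle`, so the off-diagonal covariance should come out as zero here. A small NumPy check under that assumption (the helper `ry_state` is illustrative):

```python
import numpy as np

def ry_state(a):
    # RY(a)|0> = [cos(a/2), sin(a/2)]
    return np.array([np.cos(a / 2), np.sin(a / 2)])

psi_w0 = ry_state(np.pi / 4)
psi_w1 = ry_state(np.pi / 3)
Z = np.diag([1.0, -1.0])

ez0 = psi_w0 @ Z @ psi_w0
ez1 = psi_w1 @ Z @ psi_w1

# product state: <Z0 Z1> factorizes, so the covariance vanishes
psi_w01 = np.kron(psi_w0, psi_w1)
ez0z1 = psi_w01 @ np.kron(Z, Z) @ psi_w01
cov01 = (ez0z1 - ez0 * ez1) / 4
print(cov01)  # 0.0 (up to floating-point error)
```

In general the off-diagonal entries of a block are nonzero; they vanish here only because of this particular state preparation.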
##############################################################################
# Note that, by definition, the block-diagonal matrices must be real and
# symmetric.
#
# We can repeat the above process to compute :math:`g^{(1)}`. The subcircuit
# required is given by
#
# .. figure:: ../demonstrations/quantum_natural_gradient/qng8.png
# :align: center
# :width: 50%
# :target: javascript:void(0)
g1 = np.zeros([2, 2])
def layer1_subcircuit(params):
"""This function contains all gates that
precede parametrized layer 1"""
# |psi_0>: state preparation
qml.RY(np.pi / 4, wires=0)
qml.RY(np.pi / 3, wires=1)
qml.RY(np.pi / 7, wires=2)
# V0(theta0, theta1): Parametrized layer 0
qml.RZ(params[0], wires=0)
qml.RZ(params[1], wires=1)
# W1: non-parametrized gates
qml.CNOT(wires=[0, 1])
qml.CNOT(wires=[1, 2])
##############################################################################
# Using this subcircuit, we can now generate the submatrix :math:`g^{(1)}`.
#
# .. figure:: ../demonstrations/quantum_natural_gradient/qng5.png
# :align: center
# :width: 50%
# :target: javascript:void(0)
@qml.qnode(dev)
def layer1_diag(params):
layer1_subcircuit(params)
return var(qml.PauliY(1)), var(qml.PauliX(2))
##############################################################################
# As before, the diagonal terms are given by the variance,
varK0, varK1 = layer1_diag(params)
g1[0, 0] = varK0 / 4
g1[1, 1] = varK1 / 4
##############################################################################
# while the off-diagonal terms require covariance between the two
# observables to be computed.
@qml.qnode(dev)
def layer1_off_diag_single(params):
layer1_subcircuit(params)
return expval(qml.PauliY(1)), expval(qml.PauliX(2))
@qml.qnode(dev)
def layer1_off_diag_double(params):
layer1_subcircuit(params)
X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
YX = np.kron(Y, X)
return expval(qml.Hermitian(YX, wires=[1, 2]))
# calculate the off-diagonal terms
exK0, exK1 = layer1_off_diag_single(params)
exK0K1 = layer1_off_diag_double(params)
g1[0, 1] = (exK0K1 - exK0 * exK1) / 4
g1[1, 0] = g1[0, 1]
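As an optional cross-check, :math:`g^{(1)}` can be reproduced in pure NumPy by constructing :math:`|\psi_1\rangle` explicitly and evaluating the covariance formula with the generators :math:`Y/2` and :math:`X/2`. This sketch assumes the usual convention that wire 0 is the leftmost factor in the Kronecker product:

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])

def ry(a):
    return np.array([[np.cos(a / 2), -np.sin(a / 2)],
                     [np.sin(a / 2), np.cos(a / 2)]], dtype=complex)

def rz(a):
    return np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])

def op(mats):
    return reduce(np.kron, mats)

CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

params_check = np.array([0.432, -0.123, 0.543, 0.233])

# build |psi_1>: state preparation, layer 0, and the entangling gates W_1
psi = np.zeros(8, dtype=complex)
psi[0] = 1.0
psi = op([ry(np.pi / 4), ry(np.pi / 3), ry(np.pi / 7)]) @ psi
psi = op([rz(params_check[0]), rz(params_check[1]), I2]) @ psi
psi = op([CNOT, I2]) @ psi  # CNOT on wires 0, 1
psi = op([I2, CNOT]) @ psi  # CNOT on wires 1, 2

K0 = op([I2, Y, I2]) / 2  # generator of RY on wire 1
K1 = op([I2, I2, X]) / 2  # generator of RX on wire 2

def cov(A, B):
    return np.real(np.vdot(psi, A @ B @ psi)
                   - np.vdot(psi, A @ psi) * np.vdot(psi, B @ psi))

g1_check = np.array([[cov(K0, K0), cov(K0, K1)],
                     [cov(K1, K0), cov(K1, K1)]])
print(np.round(g1_check, 8))
```

The result should be symmetric, with each diagonal entry bounded by 1/4 (the maximal variance of a Pauli divided by four).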
##############################################################################
# Putting this all together, the block-diagonal approximation to the Fubini-Study
# metric tensor for this variational quantum circuit is
from scipy.linalg import block_diag
g = block_diag(g0, g1)
print(np.round(g, 8))
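Once the metric is assembled, a quantum natural gradient step multiplies the ordinary gradient by the pseudo-inverse of :math:`g`. The gradient vector and the :math:`g^{(1)}` block below are illustrative placeholders (in practice both come from quantum evaluations; the :math:`g^{(0)}` entries are the analytic values derived earlier), so this only sketches the update mechanics:

```python
import numpy as np
from scipy.linalg import block_diag

# g0 entries match the analytic values for this circuit; the g1 block
# and the gradient vector are placeholders for illustration only.
g_example = block_diag(np.array([[0.125, 0.0], [0.0, 0.1875]]),
                       np.array([[0.2, 0.01], [0.01, 0.15]]))
grad_example = np.array([0.1, -0.2, 0.05, 0.3])
eta = 0.01

theta_example = np.array([0.432, -0.123, 0.543, 0.233])
theta_updated = theta_example - eta * np.linalg.pinv(g_example) @ grad_example
print(theta_updated)
```

The pseudo-inverse (rather than a plain inverse) keeps the step well-defined even when the metric is singular.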
##############################################################################
# PennyLane QNodes contain a built-in method for computing the Fubini-Study metric
# tensor, :meth:`~.pennylane.QNode.metric_tensor`, which
# we can use to verify this result:
print(np.round(circuit.metric_tensor([params]), 8))
##############################################################################
# As opposed to our manual computation, which required 6 different quantum
# evaluations, the PennyLane Fubini-Study metric tensor implementation
# requires only 2 quantum evaluations, one per layer. This is done by
# automatically detecting the layer structure, and noting that every
# observable that must be measured commutes, allowing for simultaneous measurement.
#
# Therefore, by combining the quantum natural gradient optimizer with the analytic
# parameter-shift rule to optimize a variational circuit with :math:`d` parameters
# and :math:`L` parametrized layers, a total of :math:`2d+L` quantum evaluations
# are required per optimization step.
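Plugging in this circuit's numbers (:math:`d = 4` parameters and :math:`L = 2` parametrized layers):

```python
d, L = 4, 2  # parameters and parametrized layers in the circuit above
evals_per_step = 2 * d + L  # parameter-shift gradient (2d) + metric tensor (L)
print(evals_per_step)  # 10
```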
#
# Note that the :meth:`~.pennylane.QNode.metric_tensor` method also supports computing the diagonal
# approximation to the metric tensor:
print(circuit.metric_tensor([params], diag_approx=True))
##############################################################################
# Quantum natural gradient optimization
# -------------------------------------
#
# PennyLane provides an implementation of the quantum natural gradient
# optimizer, :class:`~.pennylane.QNGOptimizer`. Let's compare the optimization convergence
# of the QNG Optimizer and the :class:`~.pennylane.GradientDescentOptimizer` for the simple variational
# circuit above.
steps = 200
init_params = np.array([0.432, -0.123, 0.543, 0.233])
##############################################################################
# Performing vanilla gradient descent:
gd_cost = []
opt = qml.GradientDescentOptimizer(0.01)
theta = init_params
for _ in range(steps):
theta = opt.step(circuit, theta)
gd_cost.append(circuit(theta))
##############################################################################
# Performing quantum natural gradient descent:
qng_cost = []
opt = qml.QNGOptimizer(0.01)
theta = init_params
for _ in range(steps):
theta = opt.step(circuit, theta)
qng_cost.append(circuit(theta))
##############################################################################
# Plotting the cost vs optimization step for both optimization strategies:
from matplotlib import pyplot as plt
plt.style.use("seaborn")
plt.plot(gd_cost, "b", label="Vanilla gradient descent")
plt.plot(qng_cost, "g", label="Quantum natural gradient descent")
plt.ylabel("Cost function value")
plt.xlabel("Optimization steps")
plt.legend()
plt.show()
##############################################################################
# References
# ----------
#
# 1. Shun-Ichi Amari. "Natural gradient works efficiently in learning."
# `Neural computation 10.2, 251-276 <https://www.mitpressjournals.org/doi/abs/10.1162/089976698300017746>`__, 1998.
#
# 2. James Stokes, Josh Izaac, Nathan Killoran, Giuseppe Carleo.
# "Quantum Natural Gradient." `arXiv:1909.02108 <https://arxiv.org/abs/1909.02108>`__, 2019.
#
# 3. Aram Harrow and John Napp. "Low-depth gradient measurements can improve
# convergence in variational hybrid quantum-classical algorithms."
# `arXiv:1901.05374 <https://arxiv.org/abs/1901.05374>`__, 2019.
#
# 4. Naoki Yamamoto. "On the natural gradient for variational quantum eigensolver."
# `arXiv:1909.05074 <https://arxiv.org/abs/1909.05074>`__, 2019.