/
tutorial_noisy_circuit_optimization.py
532 lines (472 loc) · 20 KB
/
tutorial_noisy_circuit_optimization.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
r"""
Optimizing noisy circuits with Cirq
===================================
.. meta::
:property="og:description": Learn how noise can affect the optimization and training of quantum computations.
:property="og:image": https://pennylane.ai/qml/_images/noisy_circuit_optimization_thumbnail.png
.. figure:: ../demonstrations/noisy_circuit_optimization/noisy_qubit.png
:align: center
:width: 90%
Until we have fault-tolerant quantum computers,
we will have to learn to live with noise. There are lots of exciting
ideas and algorithms in quantum computing and quantum machine learning,
but how well do they survive the reality of today's noisy devices?
Background
----------
Quantum pure-state simulators are great and readily available in
a number of quantum software packages.
They allow us to experiment, prototype, test, and validate algorithms
and research ideas---up to a certain number of qubits, at least.
But present-day hardware is not ideal. We're forced to confront
decoherence, bit flips, amplitude damping, and so on. Does the
presence of noise in near-term devices impact their use in,
for example,
:doc:`variational quantum algorithms </glossary/variational_circuit>`?
Won't our models, trained so carefully in simulators, fall apart
when we run on noisy devices?
In fact, there is some optimism that variational algorithms may be
the best type of algorithms on near-term devices, and could
have an in-built adaptability to noise that more rigid
textbook algorithms do not possess.
Variational algorithms are somewhat robust against the fact
that the device they are run on may not be ideal. Being variational
in nature, they can be tuned to "work around" noise to some extent.
Quantum machine learning leverages a lot of tools from its
classical counterpart. Fortunately, there is great evidence that
machine learning algorithms can not only be robust to noise, but
can even benefit from it!
Examples include the use of
`reduced-precision arithmetic in deep learning <https://dl.acm.org/doi/abs/10.5555/3045118.3045303>`_,
the strong performance of
`stochastic gradient descent <https://en.wikipedia.org/wiki/Stochastic_gradient_descent>`_,
and the use of "dropout" noise to
`prevent overfitting <https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf>`_.
With this evidence to motivate us, we can still hope to find,
extract, and work with the underlying quantum "signal"
that is influenced by a device's inherent noise.
Noisy circuits: creating a Bell state
-------------------------------------
Let's consider a simple quantum circuit which performs a standard
quantum information task: the creation of an entangled
state and the measurement of a
`Bell inequality <https://en.wikipedia.org/wiki/Bell%27s_theorem>`_
(also known as the
`CHSH inequality <https://en.wikipedia.org/wiki/CHSH_inequality>`_).
Since we'll be dealing with noise, we'll need to use a simulator
that supports noise and density-state representations of quantum
states (in contrast to many simulators, which use a pure-state
representation).
Fortunately, `Cirq <https://cirq.readthedocs.io>`_ provides mixed-state
simulators and noisy operations natively, so we can use the
`PennyLane-Cirq plugin <https://pennylane-cirq.readthedocs.io>`_
to carry out our noisy simulations.
"""
import pennylane as qml
from pennylane import numpy as np
import matplotlib.pyplot as plt
dev = qml.device("cirq.mixedsimulator", wires=2)
# CHSH observables
A1 = qml.PauliZ(0)
A2 = qml.PauliX(0)
B1 = qml.Hermitian(np.array([[1, 1], [1, -1]]) / np.sqrt(2), wires=1)
B2 = qml.Hermitian(np.array([[1, -1], [-1, -1]]) / np.sqrt(2), wires=1)
CHSH_observables = [A1 @ B1, A1 @ B2, A2 @ B1, A2 @ B2]
# subcircuit for creating an entangled pair of qubits
def bell_pair():
qml.Hadamard(wires=0)
qml.CNOT(wires=[0, 1])
# circuits for measuring each distinct observable
@qml.qnode(dev)
def measure_A1B1():
bell_pair()
return qml.expval(A1 @ B1)
@qml.qnode(dev)
def measure_A1B2():
bell_pair()
return qml.expval(A1 @ B2)
@qml.qnode(dev)
def measure_A2B1():
bell_pair()
return qml.expval(A2 @ B1)
@qml.qnode(dev)
def measure_A2B2():
bell_pair()
return qml.expval(A2 @ B2)
circuits = qml.QNodeCollection([measure_A1B1,
measure_A1B2,
measure_A2B1,
measure_A2B2])
# now we measure each circuit and construct the CHSH inequality
expvals = circuits()
# The CHSH operator is A1 @ B1 + A1 @ B2 + A2 @ B1 - A2 @ B2
CHSH_expval = np.sum(expvals[:3]) - expvals[3]
print(CHSH_expval)
##############################################################################
# The output here is :math:`2\sqrt{2}`, which is the maximal value of the
# CHSH inequality. States which have a value
# :math:`\langle CHSH \rangle \geq 2` can safely be considered
# "quantum".
#
# .. note:: In this situation "quantum" means that there is
# no `local hidden variable theory <https://en.wikipedia.org/wiki/Local_hidden-variable_theory>`_
# which could produce these measurement outcomes. It does not strictly
# mean the presence of entanglement.
#
# Now let's turn up the noise! 📢 📢 📢
#
# Cirq provides a number of noisy channels that are not part of
# PennyLane core. This is no issue, as the
# `PennyLane-Cirq <https://pennylane-cirq.readthedocs.io>`_
# plugin provides these and allows them to be used directly in PennyLane
# circuit declarations.
from pennylane_cirq import ops as cirq_ops
# Note that the 'Operation' op is a generic base class
# from PennyLane core.
# All other ops are provided by Cirq.
available_ops = [op for op in dir(cirq_ops) if not op.startswith('_')]
print("\n".join(available_ops))
##############################################################################
# PennyLane operations and external framework-specific operations can be
# interwoven freely in circuits that use that plugin's device
# for execution.
# In this case, the Cirq-provided channels can be used with
# Cirq's mixed-state simulator.
#
# We'll use the ``BitFlip`` channel, which has the effect of
# randomly flipping the qubits in the computational basis.
noise_vals = np.linspace(0, 1, 25)
CHSH_vals = []
noisy_expvals = []
for p in noise_vals:
# we overwrite the bell_pair() subcircuit to add
# extra noisy channels after the entangled state is created
def bell_pair():
qml.Hadamard(wires=0)
qml.CNOT(wires=[0, 1])
cirq_ops.BitFlip(p, wires=0)
cirq_ops.BitFlip(p, wires=1)
# measuring the circuits will now use the new noisy bell_pair() function
expvals = circuits()
noisy_expvals.append(expvals)
noisy_expvals = np.array(noisy_expvals)
CHSH_expvals = np.sum(noisy_expvals[:,:3], axis=1) - noisy_expvals[:,3]
# Plot the individual observables
plt.plot(noise_vals, noisy_expvals[:, 0], 'D',
label = r"$\hat{A1}\otimes \hat{B1}$", markersize=5)
plt.plot(noise_vals, noisy_expvals[:, 1], 'x',
label = r"$\hat{A1}\otimes \hat{B2}$", markersize=12)
plt.plot(noise_vals, noisy_expvals[:, 2], '+',
label = r"$\hat{A2}\otimes \hat{B1}$", markersize=10)
plt.plot(noise_vals, noisy_expvals[:, 3], 'v',
label = r"$\hat{A2}\otimes \hat{B2}$", markersize=10)
plt.xlabel('Noise parameter')
plt.ylabel(r'Expectation value $\langle \hat{A}_i\otimes\hat{B}_j\rangle$')
plt.legend()
plt.show()
##############################################################################
# By adding the bit-flip noise, we have degraded the value of the
# CHSH observable. The first two observables
# :math:`\hat{A}_1\otimes \hat{B}_1`
# and :math:`\hat{A}_1\otimes \hat{B}_2` are sensitive to this
# noise parameter. Their value is weakened when the noise parameter is not
# 0 or 1 (note that the the CHSH operator is symmetric with
# respect to bit flips).
#
# The latter two observables, on the other hand, are seemingly
# unaffected by the noise at all.
#
# We can see that even when noise is present, there may still be subspaces
# or observables which are minimally affected or unaffected.
# This gives us some hope that variational algorithms can learn
# to find and exploit such noise-free substructures on otherwise
# noisy devices.
#
# We can also plot the CHSH observable in the noisy case. Remember,
# values greater than 2 can safely be considered "quantum".
plt.plot(noise_vals, CHSH_expvals, label="CHSH")
plt.plot(noise_vals, 2 * np.ones_like(noise_vals),
label="Quantum-classical boundary")
plt.xlabel('Noise parameter')
plt.ylabel('CHSH Expectation value')
plt.legend()
plt.show()
##############################################################################
# Too much noise (around 0.2 in this example), and we lose the
# quantumness we created in our circuit. But if we only have a little
# noise, the quantumness undeniably remains. So there is still hope
# that quantum algorithms can do something useful, even on noisy
# near-term devices, so long as the noise is not high.
#
# .. note::
#
# In Google's quantum supremacy paper [#arute2019]_,
# they were able to show that some
# small signature of quantumness remained in their computations,
# even after a deep many-qubit noisy circuit was executed.
##############################################################################
# Optimizing noisy circuits
# -------------------------
#
# Now, how does noise affect the ability to optimize or train a
# variational circuit?
#
# Let's consider an analog of the basic
# :doc:`qubit rotation tutorial </demos/tutorial_qubit_rotation>`,
# but where we add an extra noise channel after the gates.
#
# .. note:: We model the noise process as the application of ideal noise-free
# gates, followed by the action of a noisy channel. This is a common
# technique for modelling noise, but may not be appropriate for all
# situations.
@qml.qnode(dev)
def circuit(gate_params, noise_param=0.0):
qml.RX(gate_params[0], wires=0)
qml.RY(gate_params[1], wires=0)
cirq_ops.Depolarize(noise_param, wires=0)
return qml.expval(qml.PauliZ(0))
gate_pars = [0.54, 0.12]
print("Expectation value:", circuit(gate_pars))
##############################################################################
# In this case, the depolarizing channel degrades
# the qubit's density matrix :math:`\rho` towards the state
#
# .. math::
#
# \rho' = \tfrac{1}{3}\left[X\rho X + Y\rho Y + Z\rho Z\right]
#
# (at the value :math:`p=\frac{3}{4}`, it passes through the
# maximally mixed state).
# We can see this in our circuit by looking at how the final
# :class:`~pennylane.ops.PauliZ` expectation value changes as
# a function of the noise strength.
noise_vals = np.linspace(0., 1., 20)
expvals = [circuit(gate_pars, noise_param=p) for p in noise_vals]
plt.plot(noise_vals, expvals, label="Expectation value")
plt.plot(noise_vals, np.ones_like(noise_vals), '--', label="Highest possible")
plt.plot(noise_vals, -np.ones_like(noise_vals), '--', label="Lowest possible")
plt.ylabel(r"Expectation value $\langle \hat{Z} \rangle$")
plt.xlabel(r"Noise strength $p$")
plt.legend()
plt.show()
##############################################################################
# Let's fix the noise parameter and see how the noise affects the
# optimization of our circuit. The goal is the same as the
# :doc:`qubit rotation tutorial </demos/tutorial_qubit_rotation>`,
# i.e., to tune the qubit state until it has a ``PauliZ`` expectation value
# of :math:`-1` (the lowest possible).
# declare the cost functions to be optimized
def cost(x):
return circuit(x, noise_param=0.0)
def noisy_cost(x):
return circuit(x, noise_param=0.3)
# initialize the optimizer
opt = qml.GradientDescentOptimizer(stepsize=0.4)
# set the number of steps
steps = 100
# set the initial parameter values
init_params = np.array([0.011, 0.012])
noisy_circuit_params = init_params
params = init_params
for i in range(steps):
# update the circuit parameters
# we can optimize both in the same training loop
params = opt.step(cost, params)
noisy_circuit_params = opt.step(noisy_cost, noisy_circuit_params)
if (i + 1) % 5 == 0:
print("Step {:5d}. Cost: {: .7f}; Noisy Cost: {: .7f}"
.format(i + 1,
cost(params),
noisy_cost(noisy_circuit_params)))
print("\nOptimized rotation angles (noise-free case):")
print("({: .7f}, {: .7f})".format(*params))
print("Optimized rotation angles (noisy case):")
print("({: .7f}, {: .7f})".format(*noisy_circuit_params))
##############################################################################
# There are a couple interesting observations here:
#
# i) The noisy circuit isn't able to achieve the same final
# cost function value as the ideal circuit. This is because
# the noise causes the state to become irreversibly mixed.
# Mixed states can't achieve the same extremal expectation values
# as pure states.
#
# ii) However, both circuits still converge to
# the *same parameter values* :math:`(0,\pi)`, despite having
# different final states.
#
# It could have been the case that noisy devices irreparably
# damage the optimization of variational circuits, steering us
# towards parameter values which are not at all useful.
# Luckily, at least for the simple example above, this is not the case.
# *Optimizations on noisy devices can still lead to similar parameter
# values as when we run on ideal devices.*
##############################################################################
# Understanding the effect of noisy channels
# ------------------------------------------
#
# Let's dig a bit into the underlying quantum information theory
# to understand better what's happening [#meyer2020]_.
# Expectation values of :doc:`variational circuits </glossary/variational_circuit>`,
# like the one we are measuring, are composed of three pieces:
#
# i) an initial quantum state :math:`\rho` (usually the zero state);
# ii) a parameterized unitary transformation :math:`U(\theta)`); and
# iii) measurement of a final observable :math:`\hat{B}`.
#
# The equation for the expectation value is given by the
# `Born rule <https://en.wikipedia.org/wiki/Born_rule>`_:
#
# .. math::
#
# \langle \hat{B} \rangle (\theta) =
# \mathrm{Tr}(\hat{B}U(\theta)\rho U^\dagger(\theta)).
#
# When optimizing, we can compute gradients of many common gates
# using the :doc:`parameter-shift rule </glossary/parameter_shift>`:
#
# .. math::
#
# \nabla_\theta\langle \hat{B} \rangle(\theta)
# = \frac{1}{2}
# \left[
# \langle \hat{B} \rangle\left(\theta + \frac{\pi}{2}\right)
# - \langle \hat{B} \rangle\left(\theta - \frac{\pi}{2}\right)
# \right].
#
# In our example, the parametrized unitary :math:`U(\theta)` is split into two gates,
# :math:`U = U_2 U_1`,
# where :math:`U_1=R_X` and :math:`U_1=R_Y`, and each takes an independent
# parameter :math:`\theta_i`.
#
# What happens when we apply a
# noisy channel :math:`\Lambda` after the gates? In this case,
# the expectation value is now taken with
# respect to the noisy circuit:
#
# .. math::
#
# \langle \hat{B} \rangle (\theta) =
# \mathrm{Tr}\left(\hat{B}\Lambda\left[
# U(\theta)\rho U^\dagger(\theta)
# \right]\right).
#
# Thus, we can treat it as the expectation value of the same
# observable, but with respect to a different state
# :math:`\rho' = \Lambda\left[U(\theta)\rho U^\dagger(\theta)\right]`.
#
# Alternatively, using the Heisenberg picture, we can transfer the channel
# :math:`\Lambda` acting on the state :math:`U(\theta)\rho U^\dagger(\theta)`
# into the *adjoint channel* :math:`\Lambda^\dagger` acting on the
# observable :math:`\hat{B}`, transforming it to a new observable
# :math:`\hat{B} = \Lambda^\dagger[\hat{B}]=\hat{B}'`.
#
# With the channel present, the expectation value can be interpreted
# as if we had the same variational state, but measured a different
# observable:
#
# .. math::
#
# \langle \hat{B} \rangle (\theta) =
# \mathrm{Tr}(\hat{B}'U(\theta)\rho U^\dagger(\theta)) =
# \langle \hat{B}' \rangle (\theta).
#
# This has immediate consequences for the parameter-shift rule. With
# the channel present, we have simply
#
# .. math::
#
# \nabla_\theta\langle \hat{B} \rangle(\theta)
# = \frac{1}{2}
# \left[
# \langle \hat{B}' \rangle\left(\theta + \frac{\pi}{2}\right)
# - \langle \hat{B}' \rangle\left(\theta - \frac{\pi}{2}\right)
# \right].
#
# In other words, the parameter-shift rule continues to hold for all
# gates, even when we have additional noise!
#
# .. note:: In the above derivation, we implicitly assumed that the channel
# does not depend on the variational circuit's parameters. If the
# channel depended on the particular state, or if it depended
# on the parameters :math:`\theta`, we would need to be more careful.
#
#
# Let's confirm the above derivation with an example.
angles = np.linspace(0, 2 * np.pi, 50)
theta2 = np.pi / 4
def param_shift(theta1):
return 0.5 * (noisy_cost([theta1 + np.pi / 2, theta2]) - \
noisy_cost([theta1 - np.pi / 2, theta2]))
noisy_expvals = [noisy_cost([theta1, theta2]) for theta1 in angles]
noisy_param_shift = [param_shift(theta1) for theta1 in angles]
plt.plot(angles, noisy_expvals,
label="Expectation value") # looks like 0.4 * cos(phi)
plt.plot(angles, noisy_param_shift,
label="Parameter-shift value") # looks like -0.4 * sin(phi)
plt.ylabel(r"Expectation value $\langle \hat{Z} \rangle$")
plt.xlabel(r"Angle $\theta_1$")
plt.legend()
plt.show()
##############################################################################
# By inspecting the two curves, we can see that the parameter-shift rule
# gives the correct gradient of the expectation value, even with the presence
# of the noisy channel!
#
# In this example, the influence of the channel is to
# attenuate the maximal amplitude that the qubit state can acheive
# (:math:`\approx 0.4`). But even though the qubit's amplitude is attenuated,
# the gradient computed by the parameter-shift rule still "points in the
# right direction".
#
# This backs up the observation we made earlier that
# the result of the optimization gave the correct values for the angle
# parameters, but the value of the final cost function was lower than
# the noise-free case.
#
##############################################################################
# Interpreting noisy circuit optimizations
# ----------------------------------------
#
# Despite the observations that we can compute gradients for
# noisy channels, and that optimization may lead to the same parameter
# values for both noise-free and noisy circuits, we must still remain
# cautious in how we interpret the results.
#
# We can evaluate the correct gradient for the expectation value
#
# .. math::
#
# \langle \hat{B} \rangle =
# \mathrm{Tr}\left(\hat{B}\Lambda\left[
# U(\theta)\rho U^\dagger(\theta)
# \right]\right),
#
# but, because the noisy channel is present, *this expectation value may not
# reflect the actual expectation value we wanted to compute*. This is
# important to keep in mind for certain algorithms that have a physical
# interpretation for the variational circuit.
#
# For example, in the
# :doc:`variational quantum eigensolver </demos/tutorial_vqe>`, we
# want to find the ground-state energy of a physical system.
# If there is an appreciable amount of noise present,
# the state we are optimizing will necessarily become mixed, and
# we should be careful interpreting the optimum value as the exact
# ground-state energy.
##############################################################################
# References
# ----------
#
# .. [#arute2019]
#
# Frank Arute et al. "Quantum supremacy using a programmable
# superconducting processor."
# Nature, 574(7779), 505-510.
#
# .. [#meyer2020]
#
# Johannes Jakob Meyer, Johannes Borregaard, and Jens Eisert.
# "A variational toolbox for quantum multi-parameter estimation."
# `arXiv:2006.06303
# <https://arxiv.org/abs/2006.06303>`__, 2020.
#