/
tutorial_quantum_transfer_learning.py
599 lines (476 loc) · 22.8 KB
/
tutorial_quantum_transfer_learning.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
r"""
.. _quantum_transfer_learning:
Quantum transfer learning
=========================
.. meta::
:property="og:description": In this demonstration, transfer learning is applied
to a hybrid quantum-classical image classifier using PyTorch and PennyLane.
:property="og:image": https://pennylane.ai/qml/_images/transfer_images.png
In this tutorial we apply a machine learning method, known as *transfer learning*, to an
image classifier based on a hybrid classical-quantum network.
This example follows the general structure of the PyTorch
`tutorial on transfer learning <https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html>`_
by Sasank Chilamkurthy, with the crucial difference of using a quantum circuit to perform the
final classification task.
More details on this topic can be found in the research paper [1] (`Mari et al. (2019) <https://arxiv.org/abs/1912.08278>`_).
Introduction
------------
Transfer learning is a well-established technique for training artificial neural networks (see e.g., Ref. [2]),
which is based on the general intuition that if a pre-trained network is good at solving a
given problem, then, with just a bit of additional training, it can be used to also solve a different
but related problem.
As discussed in Ref. [1], this idea can be formalized in terms of two abstract networks :math:`A`
and :math:`B`, independently from their quantum or classical physical nature.
|
.. figure:: ../demonstrations/quantum_transfer_learning/transfer_learning_general.png
:scale: 45%
:alt: transfer_general
:align: center
|
As sketched in the above figure, one can give the following **general definition of the
transfer learning method**:
1. Take a network :math:`A` that has been pre-trained on a dataset :math:`D_A` and for a given
task :math:`T_A`.
2. Remove some of the final layers. In this way, the resulting truncated network :math:`A'`
can be used as a feature extractor.
3. Connect a new trainable network :math:`B` at the end of the pre-trained network :math:`A'`.
4. Keep the weights of :math:`A'` constant, and train the final block :math:`B` with a
new dataset :math:`D_B` and/or for a new task of interest :math:`T_B`.
When dealing with hybrid systems, depending on the physical nature (classical or quantum) of the
networks :math:`A` and :math:`B`, one can have different implementations of transfer learning as
summarized in following table:
|
.. rst-class:: docstable
+-----------+-----------+-----------------------------------------------------+
| Network A | Network B | Transfer learning scheme |
+===========+===========+=====================================================+
| Classical | Classical | CC - Standard classical method. See e.g., Ref. [2]. |
+-----------+-----------+-----------------------------------------------------+
| Classical | Quantum | CQ - **Hybrid model presented in this tutorial.** |
+-----------+-----------+-----------------------------------------------------+
| Quantum | Classical | QC - Model studied in Ref. [1]. |
+-----------+-----------+-----------------------------------------------------+
| Quantum | Quantum | QQ - Model studied in Ref. [1]. |
+-----------+-----------+-----------------------------------------------------+
Classical-to-quantum transfer learning
--------------------------------------
We focus on the CQ transfer learning scheme discussed in the previous section and we give a specific example.
1. As pre-trained network :math:`A` we use **ResNet18**, a deep residual neural network introduced by
Microsoft in Ref. [3], which is pre-trained on the *ImageNet* dataset.
2. After removing its final layer we obtain :math:`A'`, a pre-processing block which maps any
input high-resolution image into 512 abstract features.
3. Such features are classified by a 4-qubit "dressed quantum circuit" :math:`B`, i.e., a
variational quantum circuit sandwiched between two classical layers.
4. The hybrid model is trained, keeping :math:`A'` constant, on the *Hymenoptera* dataset
(a small subclass of ImageNet) containing images of *ants* and *bees*.
A graphical representation of the full data processing pipeline is given in the figure below.
.. figure:: ../demonstrations/quantum_transfer_learning/transfer_learning_c2q.png
:scale: 55%
:alt: transfer_c2q
:align: center
"""
##############################################################################
# General setup
# ------------------------
#
# .. note::
#
# To use the PyTorch interface in PennyLane, you must first
# `install PyTorch <https://pytorch.org/get-started/locally/#start-locally>`_.
#
# In addition to *PennyLane*, we will also need some standard *PyTorch* libraries and the
# plotting library *matplotlib*.
# Some parts of this code are based on the Python script:
# https://github.com/pytorch/tutorials/blob/master/beginner_source/transfer_learning_tutorial.py
# License: BSD
import time
import os
import copy
# PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torchvision
from torchvision import datasets, transforms
# Pennylane
import pennylane as qml
from pennylane import numpy as np
# Plotting
import matplotlib.pyplot as plt
# OpenMP: number of parallel threads.
os.environ["OMP_NUM_THREADS"] = "1"
##############################################################################
# Setting of the main hyper-parameters of the model
# ------------------------------------------------------------
#
# .. note::
# To reproduce the results of Ref. [1], ``num_epochs`` should be set to ``30`` which may take a long time.
# We suggest to first try with ``num_epochs=1`` and, if everything runs smoothly, increase it to a larger value.
n_qubits = 4 # Number of qubits
step = 0.0004 # Learning rate
batch_size = 4 # Number of samples for each training step
num_epochs = 1 # Number of training epochs
q_depth = 6 # Depth of the quantum circuit (number of variational layers)
gamma_lr_scheduler = 0.1 # Learning rate reduction applied every 10 epochs.
q_delta = 0.01 # Initial spread of random quantum weights
rng_seed = 3 # Seed for random number generator
start_time = time.time() # Start of the computation timer
##############################################################################
# We initialize a PennyLane device with a ``default.qubit`` backend.
dev = qml.device("default.qubit", wires=n_qubits)
##############################################################################
# We configure PyTorch to use CUDA only if available. Otherwise the CPU is used.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
##############################################################################
# Dataset loading
# ------------------------------------------------------------
#
# .. note::
# The dataset containing images of *ants* and *bees* can be downloaded
# `here <https://download.pytorch.org/tutorial/hymenoptera_data.zip>`_ and
# should be extracted in the subfolder ``../_data/hymenoptera_data``.
#
# This is a very small dataset (roughly 250 images), too small for training from scratch a
# classical or quantum model, however it is enough when using *transfer learning* approach.
#
# The PyTorch packages ``torchvision`` and ``torch.utils.data`` are used for loading the dataset
# and performing standard preliminary image operations: resize, center, crop, normalize, *etc.*
data_transforms = {
"train": transforms.Compose(
[
# transforms.RandomResizedCrop(224), # uncomment for data augmentation
# transforms.RandomHorizontalFlip(), # uncomment for data augmentation
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
# Normalize input channels using mean values and standard deviations of ImageNet.
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
]
),
"val": transforms.Compose(
[
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
]
),
}
data_dir = "../_data/hymenoptera_data"
image_datasets = {
x if x == "train" else "validation": datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x])
for x in ["train", "val"]
}
dataset_sizes = {x: len(image_datasets[x]) for x in ["train", "validation"]}
class_names = image_datasets["train"].classes
# Initialize dataloader
dataloaders = {
x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size, shuffle=True)
for x in ["train", "validation"]
}
# function to plot images
def imshow(inp, title=None):
"""Display image from tensor."""
inp = inp.numpy().transpose((1, 2, 0))
# Inverse of the initial normalization operation.
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
inp = std * inp + mean
inp = np.clip(inp, 0, 1)
plt.imshow(inp)
if title is not None:
plt.title(title)
##############################################################################
# Let us show a batch of the test data, just to have an idea of the classification problem.
# Get a batch of training data
inputs, classes = next(iter(dataloaders["validation"]))
# Make a grid from batch
out = torchvision.utils.make_grid(inputs)
imshow(out, title=[class_names[x] for x in classes])
# In order to get reproducible results, we set a manual seed for the
# random number generator and re-initialize the dataloaders.
torch.manual_seed(rng_seed)
dataloaders = {
x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size, shuffle=True)
for x in ["train", "validation"]
}
##############################################################################
# Variational quantum circuit
# ------------------------------------
# We first define some quantum layers that will compose the quantum circuit.
def H_layer(nqubits):
"""Layer of single-qubit Hadamard gates.
"""
for idx in range(nqubits):
qml.Hadamard(wires=idx)
def RY_layer(w):
"""Layer of parametrized qubit rotations around the y axis.
"""
for idx, element in enumerate(w):
qml.RY(element, wires=idx)
def entangling_layer(nqubits):
"""Layer of CNOTs followed by another shifted layer of CNOT.
"""
# In other words it should apply something like :
# CNOT CNOT CNOT CNOT... CNOT
# CNOT CNOT CNOT... CNOT
for i in range(0, nqubits - 1, 2): # Loop over even indices: i=0,2,...N-2
qml.CNOT(wires=[i, i + 1])
for i in range(1, nqubits - 1, 2): # Loop over odd indices: i=1,3,...N-3
qml.CNOT(wires=[i, i + 1])
##############################################################################
# Now we define the quantum circuit through the PennyLane `qnode` decorator .
#
# The structure is that of a typical variational quantum circuit:
#
# * **Embedding layer:** All qubits are first initialized in a balanced superposition
# of *up* and *down* states, then they are rotated according to the input parameters
# (local embedding).
#
# * **Variational layers:** A sequence of trainable rotation layers and constant
# entangling layers is applied.
#
# * **Measurement layer:** For each qubit, the local expectation value of the :math:`Z`
# operator is measured. This produces a classical output vector, suitable for
# additional post-processing.
@qml.qnode(dev, interface="torch")
def quantum_net(q_input_features, q_weights_flat):
"""
The variational quantum circuit.
"""
# Reshape weights
q_weights = q_weights_flat.reshape(q_depth, n_qubits)
# Start from state |+> , unbiased w.r.t. |0> and |1>
H_layer(n_qubits)
# Embed features in the quantum node
RY_layer(q_input_features)
# Sequence of trainable variational layers
for k in range(q_depth):
entangling_layer(n_qubits)
RY_layer(q_weights[k])
# Expectation values in the Z basis
exp_vals = [qml.expval(qml.PauliZ(position)) for position in range(n_qubits)]
return tuple(exp_vals)
##############################################################################
# Dressed quantum circuit
# ------------------------
#
# We can now define a custom ``torch.nn.Module`` representing a *dressed* quantum circuit.
#
# This is a concatenation of:
#
# * A classical pre-processing layer (``nn.Linear``).
# * A classical activation function (``torch.tanh``).
# * A constant ``np.pi/2.0`` scaling.
# * The previously defined quantum circuit (``quantum_net``).
# * A classical post-processing layer (``nn.Linear``).
#
# The input of the module is a batch of vectors with 512 real parameters (features) and
# the output is a batch of vectors with two real outputs (associated with the two classes
# of images: *ants* and *bees*).
class DressedQuantumNet(nn.Module):
"""
Torch module implementing the *dressed* quantum net.
"""
def __init__(self):
"""
Definition of the *dressed* layout.
"""
super().__init__()
self.pre_net = nn.Linear(512, n_qubits)
self.q_params = nn.Parameter(q_delta * torch.randn(q_depth * n_qubits))
self.post_net = nn.Linear(n_qubits, 2)
def forward(self, input_features):
"""
Defining how tensors are supposed to move through the *dressed* quantum
net.
"""
# obtain the input features for the quantum circuit
# by reducing the feature dimension from 512 to 4
pre_out = self.pre_net(input_features)
q_in = torch.tanh(pre_out) * np.pi / 2.0
# Apply the quantum circuit to each element of the batch and append to q_out
q_out = torch.Tensor(0, n_qubits)
q_out = q_out.to(device)
for elem in q_in:
q_out_elem = quantum_net(elem, self.q_params).float().unsqueeze(0)
q_out = torch.cat((q_out, q_out_elem))
# return the two-dimensional prediction from the postprocessing layer
return self.post_net(q_out)
##############################################################################
# Hybrid classical-quantum model
# ------------------------------------
#
# We are finally ready to build our full hybrid classical-quantum network.
# We follow the *transfer learning* approach:
#
# 1. First load the classical pre-trained network *ResNet18* from the ``torchvision.models`` zoo.
# 2. Freeze all the weights since they should not be trained.
# 3. Replace the last fully connected layer with our trainable dressed quantum circuit (``DressedQuantumNet``).
#
# .. note::
# The *ResNet18* model is automatically downloaded by PyTorch and it may take several minutes (only the first time).
#
model_hybrid = torchvision.models.resnet18(pretrained=True)
for param in model_hybrid.parameters():
param.requires_grad = False
# Notice that model_hybrid.fc is the last layer of ResNet18
model_hybrid.fc = DressedQuantumNet()
# Use CUDA or CPU according to the "device" object.
model_hybrid = model_hybrid.to(device)
##############################################################################
# Training and results
# ------------------------
#
# Before training the network we need to specify the *loss* function.
#
# We use, as usual in classification problem, the *cross-entropy* which is
# directly available within ``torch.nn``.
criterion = nn.CrossEntropyLoss()
##############################################################################
# We also initialize the *Adam optimizer* which is called at each training step
# in order to update the weights of the model.
optimizer_hybrid = optim.Adam(model_hybrid.fc.parameters(), lr=step)
##############################################################################
# We schedule to reduce the learning rate by a factor of ``gamma_lr_scheduler``
# every 10 epochs.
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_hybrid, step_size=10, gamma=gamma_lr_scheduler)
##############################################################################
# What follows is a training function that will be called later.
# This function should return a trained model that can be used to make predictions
# (classifications).
def train_model(model, criterion, optimizer, scheduler, num_epochs):
since = time.time()
best_model_wts = copy.deepcopy(model.state_dict())
best_acc = 0.0
best_loss = 10000.0 # Large arbitrary number
best_acc_train = 0.0
best_loss_train = 10000.0 # Large arbitrary number
print("Training started:")
for epoch in range(num_epochs):
# Each epoch has a training and validation phase
for phase in ["train", "validation"]:
if phase == "train":
# Set model to training mode
model.train()
else:
# Set model to evaluate mode
model.eval()
running_loss = 0.0
running_corrects = 0
# Iterate over data.
n_batches = dataset_sizes[phase] // batch_size
it = 0
for inputs, labels in dataloaders[phase]:
since_batch = time.time()
batch_size_ = len(inputs)
inputs = inputs.to(device)
labels = labels.to(device)
optimizer.zero_grad()
# Track/compute gradient and make an optimization step only when training
with torch.set_grad_enabled(phase == "train"):
outputs = model(inputs)
_, preds = torch.max(outputs, 1)
loss = criterion(outputs, labels)
if phase == "train":
loss.backward()
optimizer.step()
# Print iteration results
running_loss += loss.item() * batch_size_
batch_corrects = torch.sum(preds == labels.data).item()
running_corrects += batch_corrects
print(
"Phase: {} Epoch: {}/{} Iter: {}/{} Batch time: {:.4f}".format(
phase,
epoch + 1,
num_epochs,
it + 1,
n_batches + 1,
time.time() - since_batch,
),
end="\r",
flush=True,
)
it += 1
# Print epoch results
epoch_loss = running_loss / dataset_sizes[phase]
epoch_acc = running_corrects / dataset_sizes[phase]
print(
"Phase: {} Epoch: {}/{} Loss: {:.4f} Acc: {:.4f} ".format(
"train" if phase == "train" else "validation ",
epoch + 1,
num_epochs,
epoch_loss,
epoch_acc,
)
)
# Check if this is the best model wrt previous epochs
if phase == "validation" and epoch_acc > best_acc:
best_acc = epoch_acc
best_model_wts = copy.deepcopy(model.state_dict())
if phase == "validation" and epoch_loss < best_loss:
best_loss = epoch_loss
if phase == "train" and epoch_acc > best_acc_train:
best_acc_train = epoch_acc
if phase == "train" and epoch_loss < best_loss_train:
best_loss_train = epoch_loss
# Update learning rate
if phase == "train":
scheduler.step()
# Print final results
model.load_state_dict(best_model_wts)
time_elapsed = time.time() - since
print("Training completed in {:.0f}m {:.0f}s".format(time_elapsed // 60, time_elapsed % 60))
print("Best test loss: {:.4f} | Best test accuracy: {:.4f}".format(best_loss, best_acc))
return model
##############################################################################
# We are ready to perform the actual training process.
model_hybrid = train_model(
model_hybrid, criterion, optimizer_hybrid, exp_lr_scheduler, num_epochs=num_epochs
)
##############################################################################
# Visualizing the model predictions
# ------------------------------------
##############################################################################
# We first define a visualization function for a batch of test data.
def visualize_model(model, num_images=6, fig_name="Predictions"):
images_so_far = 0
_fig = plt.figure(fig_name)
model.eval()
with torch.no_grad():
for _i, (inputs, labels) in enumerate(dataloaders["validation"]):
inputs = inputs.to(device)
labels = labels.to(device)
outputs = model(inputs)
_, preds = torch.max(outputs, 1)
for j in range(inputs.size()[0]):
images_so_far += 1
ax = plt.subplot(num_images // 2, 2, images_so_far)
ax.axis("off")
ax.set_title("[{}]".format(class_names[preds[j]]))
imshow(inputs.cpu().data[j])
if images_so_far == num_images:
return
##############################################################################
# Finally, we can run the previous function to see a batch of images
# with the corresponding predictions.
#
visualize_model(model_hybrid, num_images=batch_size)
plt.show()
##############################################################################
# References
# ------------
#
# [1] Andrea Mari, Thomas R. Bromley, Josh Izaac, Maria Schuld, and Nathan Killoran.
# *Transfer learning in hybrid classical-quantum neural networks*. arXiv:1912.08278 (2019).
#
# [2] Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, and Andrew Y Ng.
# *Self-taught learning: transfer learning from unlabeled data*.
# Proceedings of the 24th International Conference on Machine Learning*, 759–766 (2007).
#
# [3] Kaiming He, Xiangyu Zhang, Shaoqing ren and Jian Sun. *Deep residual learning for image recognition*.
# Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770-778 (2016).
#
# [4] Ville Bergholm, Josh Izaac, Maria Schuld, Christian Gogolin, Carsten Blank, Keri McKiernan, and Nathan Killoran.
# *PennyLane: Automatic differentiation of hybrid quantum-classical computations*. arXiv:1811.04968 (2018).