Speed up default.qubit by using permutations to apply some gates #772

trbromley · 2020-08-21T15:40:29Z

Context:

This PR uses some array/tensor manipulation tricks such as roll, transpose and slicing to speed up the application of some gates:

PauliX
PauliY
PauliZ
Hadamard
SWAP
S
T
CNOT
CZ

For example, PauliX can be achieved simply using np.roll(state, 1, wire_axis). Currently in PL, these gates are applied using the _apply_diagonal_unitary() and _apply_unitary_einsum() methods, which both internally perform a contraction using einsum.
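To make the idea concrete, here is a minimal NumPy sketch (not the code from this PR) that checks that rolling along a wire's axis matches contracting with the PauliX matrix, and that swapping two axes with a transpose acts as SWAP. The helper names `random_state`, `apply_x_contraction`, `apply_x_roll` and `apply_swap_transpose` are made up for illustration, and the state is assumed to be stored as a tensor of shape `(2,) * num_wires`.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])


def random_state(num_wires, seed=0):
    """Random normalized state stored as a tensor of shape (2,) * num_wires."""
    rng = np.random.default_rng(seed)
    shape = (2,) * num_wires
    state = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    return state / np.linalg.norm(state)


def apply_x_contraction(state, axis):
    """Reference: contract the PauliX matrix with the chosen axis."""
    out = np.tensordot(X, state, axes=([1], [axis]))
    # tensordot puts the new index first, so move it back into place.
    return np.moveaxis(out, 0, axis)


def apply_x_roll(state, axis):
    """Each axis has length 2, so rolling by one exchanges |0> and |1>."""
    return np.roll(state, 1, axis)


def apply_swap_transpose(state, axis_a, axis_b):
    """SWAP only relabels two wires, i.e. it exchanges two tensor axes."""
    perm = list(range(state.ndim))
    perm[axis_a], perm[axis_b] = perm[axis_b], perm[axis_a]
    return np.transpose(state, perm)


state = random_state(3)
print(np.allclose(apply_x_roll(state, 1), apply_x_contraction(state, 1)))
```

Neither the roll nor the transpose performs any arithmetic on the amplitudes, which is where the saving over a contraction is expected to come from.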

Description of the Change:

Additional methods have been added to DefaultQubit for each of the above gates, e.g., _apply_x() for PauliX. In the main apply_operation() method, we then dispatch to each of the functions using a dictionary that maps from operation name to corresponding method.
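The dispatch pattern described above can be sketched roughly as follows. `TinyDevice` is a hypothetical stand-in written for this illustration, not PennyLane's `DefaultQubit`; only the idea of a name-to-method dictionary with a generic fallback is taken from the description.

```python
import numpy as np


class TinyDevice:
    """Hypothetical toy simulator illustrating name-based dispatch."""

    def __init__(self, num_wires):
        self._state = np.zeros((2,) * num_wires, dtype=complex)
        self._state[(0,) * num_wires] = 1.0
        # Map from operation name to a specialised method.
        self._apply_ops = {"PauliX": self._apply_x, "PauliZ": self._apply_z}

    def _apply_x(self, state, axes, **kwargs):
        return np.roll(state, 1, axes[0])

    def _apply_z(self, state, axes, **kwargs):
        # Broadcast the phases (+1, -1) along the target axis.
        shape = [2 if i == axes[0] else 1 for i in range(state.ndim)]
        return state * np.array([1, -1]).reshape(shape)

    def _apply_generic(self, state, matrix, axes):
        # Fallback for single-qubit gates: contract the matrix with the axis.
        out = np.tensordot(matrix, state, axes=([1], [axes[0]]))
        return np.moveaxis(out, 0, axes[0])

    def apply(self, name, axes, matrix=None):
        apply_func = self._apply_ops.get(name)
        if apply_func is not None:
            self._state = apply_func(self._state, axes)
            return "fast"
        self._state = self._apply_generic(self._state, matrix, axes)
        return "generic"


dev = TinyDevice(2)
print(dev.apply("PauliX", [1]))
```

Because every specialised method shares one signature, removing an entry from the dictionary is enough to send that gate back through the generic path.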

Benefits:

Increased speed. This can be checked using a benchmark script (see at the bottom). The script makes a trial circuit composed of a single gate that is repeatedly placed on each qubit and then repeated for a chosen depth. The script then times how long multiple runs take to complete. Using this, we can benchmark the new and old approaches in DefaultQubit and find the speedup as a function of qubit number:

We see a speedup, with some gates benefiting more than others. PauliY and Hadamard are the worst performers, possibly because they each combine applications of PauliX and PauliZ.
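One way to see why PauliY and Hadamard cost more is to write them in terms of the cheaper PauliX and PauliZ steps, using the identities Y = i·X·Z and H = (X + Z)/√2. The sketch below is an illustration under that assumption, with made-up function names; it is not claimed to match how the PR implements these gates.

```python
import numpy as np


def apply_x(state, axis):
    return np.roll(state, 1, axis)


def apply_z(state, axis):
    # Multiply the |1> amplitudes on this axis by -1 via broadcasting.
    shape = [1] * state.ndim
    shape[axis] = 2
    return state * np.array([1, -1]).reshape(shape)


def apply_y(state, axis):
    # Y = i * X * Z, so apply Z first, then X, then the global factor i.
    return 1j * apply_x(apply_z(state, axis), axis)


def apply_h(state, axis):
    # H = (X + Z) / sqrt(2): two cheap applications plus an addition.
    return (apply_x(state, axis) + apply_z(state, axis)) / np.sqrt(2)


state = np.arange(8).reshape(2, 2, 2).astype(complex)
y_state = apply_y(state, 0)
h_state = apply_h(state, 2)
```

Each of these touches the full state at least twice, compared with a single pass for PauliX or PauliZ alone.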

Note that the changes in this PR are also compatible with default.qubit.tf and default.qubit.autograd. The speedup plots are shown below:

I'm not sure exactly why the default.qubit.tf plot looks different. Also, these benchmarks are quite artificial in that we restrict to only the gates that we hope to have sped up; practical circuits may include other gates.

Possible Drawbacks:

Edge cases where the new approach is slower than the old?
DefaultQubit feels a bit busy now with a lot of methods. I couldn't pull out these methods as functions in a separate ops file because we need access to, e.g., self._roll for the interface-correct function.
Do we need to test these methods explicitly? We don't for other methods in DefaultQubit.
We don't speed up any gates with trainable parameters such as Rot, which are quite common. I don't think there is any fundamental roadblock to doing this, though; it could be a follow-up PR.

Benchmark script:

import pennylane as qml
import time
import numpy as np

sped_up_gates = [
    qml.PauliX,
    qml.PauliY,
    qml.PauliZ,
    qml.Hadamard,
    qml.SWAP,
    qml.S,
    qml.T,
    qml.CNOT,
    qml.CZ,
]

layers = 5

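def get_time(num_wires, layers, reps, gate):
    """Return the total wall-clock time for ``reps`` runs of the benchmark circuit."""
    gate_num_wires = gate.num_wires
    print("Benchmarking for {} qubits on gate {}".format(num_wires, gate))

    dev = qml.device("default.qubit.tf", wires=num_wires)

    @qml.qnode(dev, interface="tf")
    def benchmark():
        # Place the gate starting on each wire in turn, wrapping around the
        # register for multi-qubit gates, and repeat for the chosen depth.
        for _ in range(layers):
            for i in range(num_wires):
                wires = [w % num_wires for w in range(i, i + gate_num_wires)]
                gate(wires=wires)
        return qml.expval(qml.PauliZ(0))

    # perf_counter is a monotonic, high-resolution clock suited to timing.
    time0 = time.perf_counter()

    for _ in range(reps):
        benchmark()

    time1 = time.perf_counter()

    return time1 - time0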

def get_time_over_qubits(range_qubits, layers, reps, gate):
    return [get_time(qubit, layers, reps, gate) for qubit in range_qubits]

def get_time_over_gates(range_qubits, layers, reps, range_gates):
    return np.array([get_time_over_qubits(range_qubits, layers, reps, gate) for gate in range_gates])

results = get_time_over_gates(range(2, 8), layers, 1000, sped_up_gates)

codecov · 2020-08-21T15:45:55Z

Codecov Report

Merging #772 into master will increase coverage by 0.10%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #772      +/-   ##
==========================================
+ Coverage   93.42%   93.53%   +0.10%     
==========================================
  Files         115      115              
  Lines        7044     7110      +66     
==========================================
+ Hits         6581     6650      +69     
+ Misses        463      460       -3

Impacted Files	Coverage Δ
pennylane/_qubit_device.py	`98.63% <100.00%> (+0.01%)`	⬆️
pennylane/devices/default_qubit.py	`98.83% <100.00%> (+0.51%)`	⬆️
pennylane/devices/default_qubit_autograd.py	`97.67% <100.00%> (+0.45%)`	⬆️
pennylane/devices/default_qubit_tf.py	`88.23% <100.00%> (+1.27%)`	⬆️
pennylane/devices/autograd_ops.py	`97.29% <0.00%> (+8.10%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

trbromley · 2020-08-21T20:47:11Z

pennylane/devices/default_qubit.py

+        apply_func = self._ops_map.get(operation.name, None)
+        if apply_func:
+            self._state = apply_func(self._state, axes, inverse=operation.inverse)
+            return
+


This was partially motivated by pylint complaining about too many return statements, but the other approach also seemed a bit bloated. It was:

if isinstance(operation, Gate): self._state = self._apply_gate(self._state, axes)

repeated 9 times.

The downside of the current approach is that all the methods need a compatible signature, resulting in a **kwargs addition in some cases (which pylint also didn't like 😆, hence the ignore on line 157)

👍

An alternate approach is to change them to methods, with a signature like:

def _apply_y(self, state, axes, roll_fn=np.roll, stack_fn=np.stack):

e.g., you pass the functions to use for rolling and stacking. But I don't think this is any cleaner than the current approach (it's probably worse).

Nice, could also work if we want to make these methods functions, although not sure how it would scale if we want to add other array functions to the argument. I guess we could pass a container of array functions.
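For reference, the alternative discussed in this thread might look something like the sketch below: a free function that receives the array routines as arguments. Only the `roll_fn`/`stack_fn` parameters come from the proposed signature above; the body is an assumption written for illustration, using Y = diag(-i, i)·X.

```python
import numpy as np


def apply_y(state, axes, roll_fn=np.roll, stack_fn=np.stack):
    """Hypothetical interface-agnostic PauliY taking array functions as arguments."""
    axis = axes[0]
    # Rolling by one applies PauliX along the target axis.
    rolled = roll_fn(state, 1, axis)

    # Index tuples selecting the |0> and |1> halves of the target axis.
    idx_0 = [slice(None)] * len(state.shape)
    idx_1 = [slice(None)] * len(state.shape)
    idx_0[axis] = 0
    idx_1[axis] = 1

    # Apply the phases -i and +i, then rebuild the target axis.
    return stack_fn(
        [-1j * rolled[tuple(idx_0)], 1j * rolled[tuple(idx_1)]], axis=axis
    )


state = np.arange(8).reshape(2, 2, 2).astype(complex)
new_state = apply_y(state, [1])
```

A caller targeting another array library would pass that library's roll and stack equivalents, which is the scaling concern raised above: every additional array routine becomes another argument, or an entry in a shared container.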

trbromley · 2020-08-27T15:34:35Z

I did a second round of benchmarks, this time on a server rather than my laptop to minimize the influence of other running processes.

For me, this leaves the following gates up for discussion:

CZ
Hadamard
PauliY

Luckily, CZ and PauliY are less commonly used, but Hadamard is pretty common. One thing we can do is prevent the special methods for these gates from occurring in default.qubit.autograd and/or default.qubit.tf.

For default.qubit.tf, I propose to prevent CZ (1355e97)
For default.qubit.autograd I propose to prevent all three (0d5edb4)
For default.qubit, let's keep all the methods!

josh146 · 2020-08-27T22:13:58Z

pennylane/devices/default_qubit_autograd.py

+        del self._apply_ops["PauliY"]
+        del self._apply_ops["Hadamard"]
+        del self._apply_ops["CZ"]


😆 This is to give better benchmarks?

How so? 😄 Edit: saw the benchmark files. Might be worth leaving a comment here describing that a poorer performance was observed for these gates.

Added: 26e9ab3

josh146

Looks great! 🎉

antalszava · 2020-08-28T15:27:47Z

pennylane/devices/default_qubit.py



+def _get_slice(index, axis, num_axes):
+    """Allows slicing along an arbitrary axis of an array or tensor.


Could be worth adding an example, would also help understanding the exact role of each argument.

Also, in the quantum sense, is it fair to say that this is a _get_subsystem function? If so, and if I didn't miss anything, it might be worth renaming, as that could make it more intuitive when reading how the operations are being applied.

Thanks, have added an example: 9efdbd5.

is it fair to say that this is a _get_subsystem function?

No, it's slightly different. In a quantum sense, suppose we have a three-qubit system with a state tensor of shape (2, 2, 2). Then indexing the state with _get_slice(1, 1, 3) gives us all of the amplitudes that correspond to having a |1> state in qubit 1. Likewise, _get_slice(1, 2, 3) would select all the amplitudes with a |1> state in qubit 2, while _get_slice(0, 1, 3) would select all the amplitudes with a |0> state in qubit 1. So really what _get_slice() allows you to do is slice into the state so that you're selecting |0> or |1> for a given qubit. This is relevant when, for example, we want to do control or when we want to apply a phase on the |1> state.

Unfortunately, I'm not sure of the best name 😆
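A small sketch may help here. The function below is an assumed implementation consistent with the signature `_get_slice(index, axis, num_axes)` and the description above (it is named `get_slice` to make clear it is not copied from the PR): it builds an index tuple that is then used to index the state.

```python
import numpy as np


def get_slice(index, axis, num_axes):
    """Build an index tuple selecting ``index`` along ``axis`` and everything else."""
    idx = [slice(None)] * num_axes
    idx[axis] = index
    return tuple(idx)


# Three-qubit state tensor with distinct entries so the selection is visible.
state = np.arange(8).reshape(2, 2, 2)

sl_1 = get_slice(1, 1, 3)      # same as writing state[:, 1, :]
ones_on_qubit_1 = state[sl_1]  # amplitudes with |1> on qubit 1

# Applying a phase on the |1> part of qubit 1, as an S gate would.
phased = state.astype(complex)
phased[sl_1] = 1j * phased[sl_1]
```

The selected block has one axis fewer than the full state, which matters when a second operation is applied inside the slice.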

antalszava · 2020-08-28T15:38:50Z

pennylane/devices/default_qubit.py

+        # If axis[1] is larger than axis[0], then state[sl_1] will have self.num_wires - 1 axes with
+        # the target axes shifted down by one. Otherwise, if axis[1] is less than axis[0] then its
+        # axis number in state[sl_1] remains unchanged. For example: state has axes [0, 1, 2, 3] and
+        # axis[0] = 1 and axis[1] = 3. Then, state[sl_1] has axes [0, 1, 2] so that the target axis
+        # has shifted from 3 to 2. If axis[0] = 2 and axis[1] = 1, then state[sl_1] has axes
+        # [0, 1, 2] but with the target axis remaining unchanged.
+        if axes[1] > axes[0]:
+            target_axes = [axes[1] - 1]
+        else:
+            target_axes = [axes[1]]


This part is still not entirely clear to me after going through it for some time.

then state[sl_1] will have self.num_wires - 1 axes with
# the target axes shifted down by one

Why is it so?

One piece that is missing is how the first wire ends up being the control (although the operation does seem to be equivalent). It might be worth explaining, perhaps in the main docstring, how using slices and the axes we end up with the same operation.

🤔 I've tried to explain a bit better, and also add to the main docstring.
7059c50

antalszava · 2020-08-28T15:44:18Z

pennylane/devices/default_qubit.py

@@ -142,6 +180,148 @@ def _apply_operation(self, operation):
        else:
            self._apply_unitary(matrix, wires)

+    def _apply_x(self, state, axes, **kwargs):
+        """Applies a PauliX gate by rolling 1 unit along the axis specified in ``axes``.


Might be worth elaborating on why this is equivalent to applying a PauliX

Thanks, I tried to add an explanation, but I have to admit these things are hard to explain!

789a724

trbromley

Thanks @antalszava, good suggestions! I've had a go at making changes and it's ready for another look.

antalszava · 2020-08-28T21:36:59Z

pennylane/devices/default_qubit.py

+        # We will be slicing into the state according to state[sl_1], giving us all of the
+        # amplitudes with a |1> for the control qubit. The resulting array has lost an axis
+        # relative to state and we need to be careful about the axis we apply the PauliX rotation
+        # to. If axes[1] is larger than axes[0], then we need to shift the target axis down by
+        # one, otherwise we can leave as-is. For example: a state has [0, 1, 2, 3], control=1,
+        # target=3. Then, state[sl_1] has 3 axes and target=3 now corresponds to the second axis.


antalszava

Looks good to me! 💯 great one 😊

trbromley added 3 commits August 21, 2020 11:37

Apply SWAP

e20d5fb

Add swap properly

eae685d

Remove extra transpose

e2c6a27

trbromley added the WIP 🚧 Work-in-progress label Aug 21, 2020

trbromley self-assigned this Aug 21, 2020

trbromley added 15 commits August 21, 2020 11:54

Update UI

ff96b9d

Add PauliX

351cb1e

Work on hadamard

4dcccf3

Use stack

7910140

Add PauliZ

badb356

Add PauliY

56c295c

Add T

9c70204

Move toward more static method

945f48a

Use a common UI

be23c65

Add CNOT and CZ

9f0fc29

apply pylint

022794c

fix pylint

65866d9

Improve wording of docstring

a2e14e7

Update order

8009cf6

Improve wording

d4ca1f2

trbromley changed the title ~~[WIP] Speed up default.qubit by using permutations to apply some gates~~ Speed up default.qubit by using permutations to apply some gates Aug 21, 2020

trbromley commented Aug 21, 2020

trbromley added 2 commits August 24, 2020 10:04

Merge branch 'master' into simple_permutations

06f4620

Add test for DiagonalOperation

88db003

trbromley marked this pull request as ready for review August 24, 2020 14:35

trbromley requested review from josh146, co9olguy and antalszava August 24, 2020 14:35

Add _get_slice test

0c74f19

Remove CZ from tf

1355e97

trbromley added 5 commits August 27, 2020 13:55

Merge branch 'master' into simple_permutations

b8f6285

Run black:

9571c0c

Apply new version of black:

4c8178d

Move axes definition

c19c9c6

Run black

82fc011

josh146 reviewed Aug 27, 2020

josh146 approved these changes Aug 27, 2020

Add to changelog

ee99223

antalszava reviewed Aug 28, 2020

antalszava reviewed Aug 28, 2020

trbromley added 6 commits August 28, 2020 15:55

Merge branch 'master' into simple_permutations

a933f35

Add example

9efdbd5

Add explanation

789a724

Fix axis/axes wording

b055136

Clarify exclusion of gates

26e9ab3

Alter explanation

7059c50

trbromley commented Aug 28, 2020

trbromley requested a review from antalszava August 28, 2020 20:34

antalszava reviewed Aug 28, 2020

antalszava approved these changes Aug 28, 2020

trbromley added the merge-ready ✔️ All tests pass and the PR is ready to be merged. label Aug 28, 2020

Merge branch 'master' into simple_permutations

3ceaada

josh146 merged commit 01de8fc into master Aug 31, 2020

josh146 deleted the simple_permutations branch August 31, 2020 14:40

josh146 mentioned this pull request Sep 12, 2020

Adds a mixed-state simulator: part 1 #794

Merged

josh146 mentioned this pull request Oct 1, 2020

Applying operations in default.qubit.tf errors if the number of wires is 9 or larger #835

Closed
