Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up default.qubit by using permutations to apply some gates #772

Merged
merged 41 commits into from
Aug 31, 2020

Conversation

trbromley
Copy link
Contributor

@trbromley trbromley commented Aug 21, 2020

Context:

This PR uses some array/tensor manipulation tricks such as roll, transpose and slicing to speed up the application of some gates:

  • PauliX
  • PauliY
  • PauliZ
  • Hadamard
  • SWAP
  • S
  • T
  • CNOT
  • CZ

For example, PauliX can be achieved simply using np.roll(state, 1, wire_axis). Currently in PL, these gates are applied using the _apply_diagonal_unitary() and _apply_unitary_einsum() methods, which both internally perform a contraction using einsum.

Description of the Change:

Additional methods have been added to DefaultQubit for each of the above gates, e.g., _apply_x() for PauliX. In the main apply_operation() method, we then dispatch to each of the functions using a dictionary that maps from operation name to corresponding method.

Benefits:

Increased speed. This can be checked using a benchmark script (see at the bottom). The script makes a trial circuit composed of a single gate that is repeatedly placed on each qubit and then repeated for a chosen depth. The script then times how long multiple runs take to complete. Using this, we can benchmark the new and old approaches in DefaultQubit and find the speedup as a function of qubit number:
image
We see a speed up, with different gates performing better than others. PauliY and Hadamard are the worst performers, possibly because they combine two applications of PauliX and PauliZ.

Note that the changes in this PR are also compatible with default.qubit.tf and default.qubit.autograd. The speedup plots are shown below:
image
image

I'm not sure exactly why the default.qubit.tf plot looks different. Also, these benchmarks are quite artificial in that we only restrict to the gates that we hope to have sped up - practical circuits may include other gates.

Possible Drawbacks:

  • Edge cases where the new approach is slower than the old?
  • DefaultQubit feels a bit busy now with a lot of methods. I couldn't pull out these methods as functions in a separate ops file because we need access to, e.g., self._roll for the interface-correct function.
  • Do we need to test these methods explicitly? We don't for other methods in DefaultQubit.
  • We don't speed up any gates with trainable parameters such as Rot, which are quite common. I don't think there is any fundamental roadblock for doing this though, could be a follow up PR.

Benchmark script:

import pennylane as qml
import time
import numpy as np

sped_up_gates = [
    qml.PauliX,
    qml.PauliY,
    qml.PauliZ,
    qml.Hadamard,
    qml.SWAP,
    qml.S,
    qml.T,
    qml.CNOT,
    qml.CZ,
]

layers = 5

def get_time(num_wires, layers, reps, gate):

    gate_num_wires = gate.num_wires
    print("Benchmarking for {} qubits on gate {}".format(num_wires, gate))
    
    dev = qml.device("default.qubit.tf", wires=num_wires)
    
    @qml.qnode(dev, interface="tf")
    def benchmark():
        for l in range(layers):
            for i in range(num_wires):
                wires = [w % num_wires for w in range(i, i + gate_num_wires)]
                gate(wires=wires)
        return qml.expval(qml.PauliZ(0))

    time0 = time.time()

    for r in range(reps):
        benchmark()

    time1 = time.time()

    total_time = time1 - time0
    return total_time

def get_time_over_qubits(range_qubits, layers, reps, gate):
    return [get_time(qubit, layers, reps, gate) for qubit in range_qubits]

def get_time_over_gates(range_qubits, layers, reps, range_gates):
    return np.array([get_time_over_qubits(range_qubits, layers, reps, gate) for gate in range_gates])

results = get_time_over_gates(range(2, 8), layers, 1000, sped_up_gates)

@trbromley trbromley added the WIP 🚧 Work-in-progress label Aug 21, 2020
@trbromley trbromley self-assigned this Aug 21, 2020
@codecov
Copy link

codecov bot commented Aug 21, 2020

Codecov Report

Merging #772 into master will increase coverage by 0.10%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #772      +/-   ##
==========================================
+ Coverage   93.42%   93.53%   +0.10%     
==========================================
  Files         115      115              
  Lines        7044     7110      +66     
==========================================
+ Hits         6581     6650      +69     
+ Misses        463      460       -3     
Impacted Files Coverage Δ
pennylane/_qubit_device.py 98.63% <100.00%> (+0.01%) ⬆️
pennylane/devices/default_qubit.py 98.83% <100.00%> (+0.51%) ⬆️
pennylane/devices/default_qubit_autograd.py 97.67% <100.00%> (+0.45%) ⬆️
pennylane/devices/default_qubit_tf.py 88.23% <100.00%> (+1.27%) ⬆️
pennylane/devices/autograd_ops.py 97.29% <0.00%> (+8.10%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bd1ad5a...4c8178d. Read the comment docs.

@trbromley trbromley changed the title [WIP] Speed up default.qubit by using permutations to apply some gates Speed up default.qubit by using permutations to apply some gates Aug 21, 2020
Comment on lines 167 to 171
apply_func = self._ops_map.get(operation.name, None)
if apply_func:
self._state = apply_func(self._state, axes, inverse=operation.inverse)
return

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was partially motivated from pylint complaining about too many return statements, but also it seemed a bit bloated in the other approach which was:

if isinstance(operation, Gate):
    self._state = self._apply_gate(self_state, axes)

x9 times.

The downside of the current approach is that all the methods need a compatible signature, resulting in a **kwargs addition in some cases (which pylint also didn't like 😆, hence the ignore on line 157)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

An alternate approach is to change them to methods, with a signature like:

def _apply_y(self, state, axes, roll_fn=np.roll, stack_fn=np.stack):

e.g., you pass the functions to use for rolling and stacking. But, I don't think this is any cleaner then the current approach (it's probably worse).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, could also work if we want to make these methods functions, although not sure how it would scale if we want to add other array functions to the argument. I guess we could pass a container of array functions.

@trbromley trbromley marked this pull request as ready for review August 24, 2020 14:35
@trbromley
Copy link
Contributor Author

I did a second round of benchmarks, this time on a server rather than my laptop to minimize the influence of other running processes.

image
image
image

For me, this leaves the following gates up for discussion:

  • CZ
  • Hadamard
  • PauliY

Luckily, CZ and PauliY are less commonly used, but Hadamard is pretty common. One thing we can do is prevent the special methods for these gates from occurring in default.qubit.autograd and/or default.qubit.tf.

For default.qubit.tf, I propose to prevent CZ (1355e97)
For default.qubit.autograd I propose to prevent all three (0d5edb4)
For default.qubit, let's keep all the methods!

Comment on lines +114 to +116
del self._apply_ops["PauliY"]
del self._apply_ops["Hadamard"]
del self._apply_ops["CZ"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆 This is to give better benchmarks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes 😅

Copy link
Contributor

@antalszava antalszava Aug 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How so? 😄 Edit: saw the benchmark files. Might be worth leaving a comment here describing that a poorer performance was observed for this gates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added: 26e9ab3

Copy link
Member

@josh146 josh146 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! 🎉



def _get_slice(index, axis, num_axes):
"""Allows slicing along an arbitrary axis of an array or tensor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be worth adding an example, would also help understanding the exact role of each argument.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, in the quantum sense, is it fair to say that this is a _get_subsystem function? If so and didn't miss anything, might be worth a renaming, just because that it might make it more intuitive when reading how the operations are being applied

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, have added an example: 9efdbd5.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it fair to say that this is a _get_subsystem function?

No, it's slightly different. In a quantum sense, suppose we have a three qubit system with state tensor having shape (2, 2, 2). Then _get_slice(1, 1, 3) gives us all of the amplitudes that correspond to having a |1> state in qubit 1. Likewise, _get_slice(1, 2, 3) would be all the amplitudes with a |1> state in qubit 2, while _get_slice(0, 1, 3) would be all the amplitudes with a |0> state in qubit 1. So really what _get_slice() allows you to do is to slice into the state so that your selecting |0> or |1> for a given qubit. This is relevant when, for example, we want to do control or when we want to apply a phase on the |1> state.

Unfortunately, I'm not sure of the best name 😆

Comment on lines 255 to 264
# If axis[1] is larger than axis[0], then state[sl_1] will have self.num_wires - 1 axes with
# the target axes shifted down by one. Otherwise, if axis[1] is less than axis[0] then its
# axis number in state[sl_1] remains unchanged. For example: state has axes [0, 1, 2, 3] and
# axis[0] = 1 and axis[1] = 3. Then, state[sl_1] has axes [0, 1, 2] so that the target axis
# has shifted from 3 to 2. If axis[0] = 2 and axis[1] = 1, then state[sl_1] has axes
# [0, 1, 2] but with the target axis remaining unchanged.
if axes[1] > axes[0]:
target_axes = [axes[1] - 1]
else:
target_axes = [axes[1]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is not entirely clear to me when going for it for some time.

then state[sl_1] will have self.num_wires - 1 axes with
# the target axes shifted down by one

Why is it so?

One-piece that is missing is how the first wire is being the control (although the operation does seem to be equivalent). Might be worth explaining perhaps in the main docstring, how using slices and the axes we can end up with the same operation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I've tried to explain a bit better, and also add to the main docstring.
7059c50

@@ -142,6 +180,148 @@ def _apply_operation(self, operation):
else:
self._apply_unitary(matrix, wires)

def _apply_x(self, state, axes, **kwargs):
"""Applies a PauliX gate by rolling 1 unit along the axis specified in ``axes``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth elaborating on why this is equivalent to applying a PauliX

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I tried to add an explanation, but I have to admit these things are hard to explain!

789a724

Copy link
Contributor Author

@trbromley trbromley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @antalszava, good suggestions! I've had a go at making changes and it's ready for another look.



def _get_slice(index, axis, num_axes):
"""Allows slicing along an arbitrary axis of an array or tensor.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, have added an example: 9efdbd5.



def _get_slice(index, axis, num_axes):
"""Allows slicing along an arbitrary axis of an array or tensor.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it fair to say that this is a _get_subsystem function?

No, it's slightly different. In a quantum sense, suppose we have a three qubit system with state tensor having shape (2, 2, 2). Then _get_slice(1, 1, 3) gives us all of the amplitudes that correspond to having a |1> state in qubit 1. Likewise, _get_slice(1, 2, 3) would be all the amplitudes with a |1> state in qubit 2, while _get_slice(0, 1, 3) would be all the amplitudes with a |0> state in qubit 1. So really what _get_slice() allows you to do is to slice into the state so that your selecting |0> or |1> for a given qubit. This is relevant when, for example, we want to do control or when we want to apply a phase on the |1> state.

Unfortunately, I'm not sure of the best name 😆

@@ -142,6 +180,148 @@ def _apply_operation(self, operation):
else:
self._apply_unitary(matrix, wires)

def _apply_x(self, state, axes, **kwargs):
"""Applies a PauliX gate by rolling 1 unit along the axis specified in ``axes``.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I tried to add an explanation, but I have to admit these things are hard to explain!

789a724

pennylane/devices/default_qubit.py Outdated Show resolved Hide resolved
Comment on lines +114 to +116
del self._apply_ops["PauliY"]
del self._apply_ops["Hadamard"]
del self._apply_ops["CZ"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added: 26e9ab3

Comment on lines 255 to 264
# If axis[1] is larger than axis[0], then state[sl_1] will have self.num_wires - 1 axes with
# the target axes shifted down by one. Otherwise, if axis[1] is less than axis[0] then its
# axis number in state[sl_1] remains unchanged. For example: state has axes [0, 1, 2, 3] and
# axis[0] = 1 and axis[1] = 3. Then, state[sl_1] has axes [0, 1, 2] so that the target axis
# has shifted from 3 to 2. If axis[0] = 2 and axis[1] = 1, then state[sl_1] has axes
# [0, 1, 2] but with the target axis remaining unchanged.
if axes[1] > axes[0]:
target_axes = [axes[1] - 1]
else:
target_axes = [axes[1]]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I've tried to explain a bit better, and also add to the main docstring.
7059c50

Comment on lines +277 to +282
# We will be slicing into the state according to state[sl_1], giving us all of the
# amplitudes with a |1> for the control qubit. The resulting array has lost an axis
# relative to state and we need to be careful about the axis we apply the PauliX rotation
# to. If axes[1] is larger than axes[0], then we need to shift the target axis down by
# one, otherwise we can leave as-is. For example: a state has [0, 1, 2, 3], control=1,
# target=3. Then, state[sl_1] has 3 axes and target=3 now corresponds to the second axis.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

Copy link
Contributor

@antalszava antalszava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! 💯 great one 😊

@trbromley trbromley added the merge-ready ✔️ All tests pass and the PR is ready to be merged. label Aug 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-ready ✔️ All tests pass and the PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants