Add device and gradient expansions to the new batch-execution pipeline #1651
I still need to go through the tests for interfaces from autograd onward.
@@ -106,21 +107,52 @@
significant performance improvement when executing the QNode on remote
quantum hardware.

- When decomposing the circuit, the default decomposition strategy will prioritize
When do decompositions generally occur? E.g., if I was running a simple optimisation with a PL circuit, at what level does this happen on hardware/simulator?
- `expansion_strategy="device"`: decomposition happens in the QNode, during construction, by querying the device for its supported gate set. This is beneficial in terms of overhead (since the decomposition only happens once), but results in future quantum transforms/compilations working with a potentially very big/deep circuit.
- `expansion_strategy="gradient"`: decomposition happens in the QNode, during construction, by querying the gradient transform. Typically, the decomposed circuit will not be as deep as the device-decomposed one, since a lot of complex unitaries have gradient rules defined. Later on, further decompositions may be required on the device to get the circuit down to native gate sets. This is beneficial in terms of a reduction in quantum resources, at the expense of moving the device decomposition down to every evaluation of the device (so additional classical overhead).
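The classical-overhead trade-off between the two strategies can be sketched with a toy counter. This is only an illustration in plain Python: `ToyDevice`, `run_device_strategy` and `run_gradient_strategy` are hypothetical names, gates are plain strings, and only the eight RY gates mentioned for `DoubleExcitation` are modelled.

```python
# Toy model only: none of these names are PennyLane APIs.
# Only the eight parametrized RY gates of the decomposition are modelled.
DEVICE_DECOMPOSITIONS = {"DoubleExcitation": ["RY"] * 8}


class ToyDevice:
    """Counts how often the (hypothetical) device decomposition runs."""

    def __init__(self):
        self.decomposition_calls = 0

    def decompose(self, circuit):
        self.decomposition_calls += 1
        expanded = []
        for gate in circuit:
            expanded.extend(DEVICE_DECOMPOSITIONS.get(gate, [gate]))
        return expanded

    def run(self, circuit):
        # Stand-in for execution: just report how many gates were received.
        return len(circuit)


def run_device_strategy(device, circuit, n_executions):
    # Decompose once at construction, then reuse the expanded circuit.
    expanded = device.decompose(circuit)
    return [device.run(expanded) for _ in range(n_executions)]


def run_gradient_strategy(device, circuit, n_executions):
    # Keep the compact circuit; the device decomposes on every execution.
    return [device.run(device.decompose(circuit)) for _ in range(n_executions)]


dev_a, dev_b = ToyDevice(), ToyDevice()
run_device_strategy(dev_a, ["DoubleExcitation"], n_executions=5)
run_gradient_strategy(dev_b, ["DoubleExcitation"], n_executions=5)
print(dev_a.decomposition_calls, dev_b.decomposition_calls)  # 1 5
```

In this toy, the "device" strategy pays for one decomposition but every later transform sees the eight-gate circuit, while the "gradient" strategy keeps the compact circuit and pays for a decomposition on each execution.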
> This is beneficial in terms of a reduction in quantum resources, at the expense of moving the device decomposition down to every evaluation of the device (so additional classical overhead).
Seems to be yet another benefit for caching parametric circuits to reuse device translations.
Yep, precisely 💯 I would even argue, this is only fully solved by parametric compilation.
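As a rough illustration of the caching idea raised here, the sketch below memoizes a device translation keyed only on circuit structure, so repeated executions with new parameter values reuse it. Everything in it is an assumption for illustration: `translate`, `NATIVE` and the string gate names are hypothetical, and real parametric compilation would also need to track how parameters map onto the native gates.

```python
from functools import lru_cache

# Hypothetical native decompositions, keyed by gate name.
# Parameter values are deliberately not part of the cache key.
NATIVE = {"Rot": ("RZ", "RY", "RZ")}


@lru_cache(maxsize=None)
def translate(structure):
    """Translate a tuple of gate names into native gate names (cached)."""
    out = []
    for name in structure:
        out.extend(NATIVE.get(name, (name,)))
    return tuple(out)


structure = ("Rot", "CNOT", "Rot")
for step in range(100):  # e.g. 100 optimization steps, each with new parameters
    native = translate(structure)

info = translate.cache_info()
print(info.hits, info.misses)  # 99 1
```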
Followup question: suppose I do something like

    @qml.compile()
    @qml.qnode(dev, expansion_strategy="device")
    def some_qnode():
        # stuff

(or alternatively the gradient strategy). When does decomposition happen currently vs. in this new PR w.r.t. the compilation transform? As you suggest @josh146, we would want compilation to happen before either the `device` or `gradient` strategy so that compilation is acting on a smaller circuit rather than the full-depth expanded one, consequently leading to a smaller circuit that gets expanded / gradient transformed. (That said, it's possible that a decomposition leads to optimizations in the compilation pipeline that might not otherwise be recognized...)
Thanks for the detailed explanation @josh146, that makes things very clear! The gradient transform continues to impress me!
@glassnotes, correct me if wrong, but `compile()` is a qfunc transform, not a QNode transform? So the following order is needed:

    @qml.qnode(dev, expansion_strategy="device")
    @qml.compile()
    def some_qnode():
        # stuff

and just based on the ordering, the `compile` transform would always occur prior to the QNode's expansions.
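The ordering argument relies on how Python applies stacked decorators: the one closest to the function runs first. A minimal plain-Python sketch follows, where `toy_compile` and `ToyQNode` are hypothetical stand-ins for a qfunc transform and a QNode, not the PennyLane objects themselves.

```python
import functools

applied = []  # records the order in which the toy decorators wrap the function


def toy_compile(func):
    """Stand-in for a qfunc transform: function in, function out."""
    applied.append("compile")

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    return wrapper


class ToyQNode:
    """Stand-in for a QNode: wraps a function in a callable object."""

    def __init__(self, func):
        applied.append("qnode")
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)


@ToyQNode
@toy_compile
def circuit(x):
    return 2 * x


print(applied)       # ['compile', 'qnode']
print(circuit(0.5))  # 1.0
```

With this ordering the function-level transform has already produced a new function by the time the QNode-like wrapper sees it, which matches the point that anything done inside the wrapper happens afterwards.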
Yes you're 100% correct, my bad 🤦‍♀️
- `Device.expand_fn(tape) -> tape`: expands a tape such that it is supported by the device. By
  default, performs the standard device-specific gate set decomposition done in the default
  QNode. Devices may overwrite this method in order to define their own decomposition logic.
can a user overwrite this logic?
Oh, this is a good point 🤔
At the moment yes, but it looks a bit hacky. You could do something like this:
    >>> def my_custom_expand_fn(tape, **kwargs):
    ...     print("hello")
    ...     return tape
    >>> qnode.device.expand_fn = my_custom_expand_fn
    >>> qnode(0.5)
    hello
    tensor(0.87758256, requires_grad=True)
Hmmm 🤔 Do you think this will be useful?
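For comparison, here is a plain-Python sketch of the two ways such a hook can be replaced: subclassing, or assigning a function onto one instance as in the snippet above. `ToyDevice` and `ReversingDevice` are hypothetical and tapes are just lists of strings. Note that a function assigned to an instance attribute is not bound, which is why the replacement takes no `self`.

```python
class ToyDevice:
    """Hypothetical device whose execute() always routes through expand_fn."""

    def expand_fn(self, tape, **kwargs):
        # Default behaviour in this toy: return the tape unchanged.
        return tape

    def execute(self, tape):
        return self.expand_fn(tape)


class ReversingDevice(ToyDevice):
    """Override by subclassing: applies to every instance of this class."""

    def expand_fn(self, tape, **kwargs):
        return list(reversed(tape))


def my_custom_expand_fn(tape, **kwargs):  # note: no `self`
    return [op.upper() for op in tape]


patched = ToyDevice()
patched.expand_fn = my_custom_expand_fn  # instance attribute shadows the method

print(ReversingDevice().execute(["h", "cnot"]))  # ['cnot', 'h']
print(patched.execute(["h", "cnot"]))            # ['H', 'CNOT']
print(ToyDevice().execute(["h", "cnot"]))        # ['h', 'cnot']
```

The last line shows that patching one instance leaves other instances of the same class untouched.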
@glassnotes I think @anthayes92 might be on to something re: custom decompositions....
I mean, we could even support something like how you currently 'register' QNode execution wrappers while writing a batch transform
    dev = qml.device("default.qubit", wires=2)

    @dev.custom_expand
    def my_expansion_function(tape, **kwargs):
        ...
        return tape

    # from now on, the custom expansion is called whenever
    # the device is executed.
This is more powerful (too powerful?) compared to a dictionary of gates to custom decompositions. But I still have some question marks:
- Should this replace the device expansion?
- If it doesn't, does it come before the device expansion? This way unsupported gates are finally decomposed down to device native gates. Or should it come after the device expansion? Execution would then fail if the custom decomp results in an unsupported gate.
- Rather than changing the device, does it make more sense to pass a custom decomposition to the QNode? E.g.,

      @qml.qnode(dev, expansion_strategy=my_custom_expansion)

      # or

      @existing_qnode.register_expansion
      def my_custom_expansion(...):
> This is more powerful (too powerful?) compared to a dictionary of gates to custom decompositions.
How would custom decompositions be specified in these cases?
What if we did something like this, which combines a few of the ideas floating around:
    custom_decomps = {qml.Hadamard: h_func, qml.CNOT: cnot_func}

    def custom_expand(tape, custom_decomps):
        # applies custom decompositions
        ...

    qnode.device.set_expand_fn(custom_expand)

but where the `device` itself does some sort of internal validation of the decompositions?

    def set_expand_fn(custom_decomps):
        for op, decomp in custom_decomps.items():
            # Ensure all the operations in the decomposition are valid for the device
            ...
            # Register the new decompositions to the operators
            if decomp_is_valid:
                op.register_new_decomposition(decomp)
If we do this kind of validation, it ensures that we can apply the expansion after the gradient tapes have already been constructed, but with the guarantee that they'll still run on the device.
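A minimal, runnable sketch of the validation idea, assuming gates are represented by name only. `ToyDevice`, `set_custom_decomps` and the native gate set are hypothetical; the sketch does not model angles or the `register_new_decomposition` call proposed above.

```python
class ToyDevice:
    """Hypothetical device with a fixed native gate set (names as strings)."""

    def __init__(self, native_gates):
        self.native_gates = set(native_gates)
        self.custom_decomps = {}

    def set_custom_decomps(self, custom_decomps):
        # Validate every entry first so a bad one leaves the device unchanged.
        for op, decomp in custom_decomps.items():
            unsupported = [g for g in decomp if g not in self.native_gates]
            if unsupported:
                raise ValueError(
                    f"Decomposition of {op} uses unsupported gates: {unsupported}"
                )
        self.custom_decomps.update(custom_decomps)

    def expand_fn(self, tape):
        # Replace each op by its registered decomposition, if it has one.
        expanded = []
        for op in tape:
            expanded.extend(self.custom_decomps.get(op, [op]))
        return expanded


dev = ToyDevice(native_gates=["RZ", "RX", "CZ"])
dev.set_custom_decomps({"Hadamard": ["RZ", "RX", "RZ"]})
print(dev.expand_fn(["Hadamard", "CZ"]))  # ['RZ', 'RX', 'RZ', 'CZ']

try:
    # Rejected: "Hadamard" is not itself a native gate of this toy device.
    dev.set_custom_decomps({"CNOT": ["Hadamard", "CZ", "Hadamard"]})
except ValueError as err:
    print(err)
```

Because validation happens at registration time, an expansion applied later (for example after gradient tapes are built) can only ever emit gates from the native set, which is the guarantee described above.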
I really like @glassnotes' suggestion of having expansion after the gradient tapes have already been constructed! So would the logic here look like: if custom gates are unsupported, then decompose to device native gates, and that is where the guarantee that they'll still run on the device comes from?
params = new_tape.get_parameters(trainable_only=False)
new_tape.trainable_params = qml.math.get_trainable_indices(params)
Is this related to the `Unwrap` issue?
Left some initial questions, will give things some time to sink in and come back to it later 🙂
params = new_tape.get_parameters(trainable_only=False)
new_tape.trainable_params = qml.math.get_trainable_indices(params)
> So a lot of the current logic is simply guided by 'this causes the tests to pass, for all interfaces, for all QNode variations, for all order derivatives, for all differentiation methods'.
This is how I feel any time I have to write interface tests 😓
@@ -67,6 +67,7 @@ class StronglyEntanglingLayers(Operation):
    num_params = 1
    num_wires = AnyWires
    par_domain = "A"
    grad_method = None
Re. your comment in the PR description about grad methods,

> This operation supports finite-diff but not parameter-shift. Decompose it for parameter-shift support.

Could we make this parameter an ordered list of grad methods, or tuples with grad methods and additional info? For example, `(grad_method, requires_decomposition_to_do_grad_method)`?
Yes definitely! The `grad_method` and `grad_recipe` attributes are long overdue for an overhaul. I believe they're on the agenda as part of the operator refactor.
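One way the "ordered list of tuples" suggestion might look, purely as a sketch: `GradOption`, `choose_grad_option` and the method names are hypothetical and not part of any existing `Operation` API.

```python
from typing import NamedTuple, Optional, Sequence


class GradOption(NamedTuple):
    """Hypothetical replacement for a single grad_method string."""

    method: str                   # e.g. "finite-diff" or "parameter-shift"
    requires_decomposition: bool  # must the op be decomposed first?


def choose_grad_option(
    options: Sequence[GradOption], requested: str
) -> Optional[GradOption]:
    """Return the first listed option matching the requested method, if any."""
    for option in options:
        if option.method == requested:
            return option
    return None


# Ordered by preference: finite-diff works directly, while parameter-shift
# is only available after decomposing the operation.
approx_time_evolution = [
    GradOption("finite-diff", requires_decomposition=False),
    GradOption("parameter-shift", requires_decomposition=True),
]

choice = choose_grad_option(approx_time_evolution, "parameter-shift")
print(choice.requires_decomposition)  # True
```

This covers the case the three existing `grad_method` values cannot express: an operation that is finite-diff differentiable as is, but needs a decomposition to be parameter-shift differentiable.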
    num_params = 1
    par_domain = "R"

    def expand(self):
Unrelated to this PR, but should all operations eventually have their `decomposition` method replaced by an `expand` like this?
To be decided 🤔 Another task for the Operator refactor story 😆
A few small suggestions, otherwise looking great!
Co-authored-by: Olivia Di Matteo <2068515+glassnotes@users.noreply.github.com>
Co-authored-by: anthayes92 <34694788+anthayes92@users.noreply.github.com>
Context: The beta QNode, introduced in #1642, uses the new batch-execution pipeline internally, but does not yet support decompositions of circuits.

This PR adds circuit decomposition support, but takes a different approach to the existing QNode.

For example, consider the `DoubleExcitation` operation. This operation decomposes down into eight parametrized RY gates, so in the existing pipeline it would require 16 evaluations to compute the gradient.

However, the `DoubleExcitation` operation supports a 4-term parameter-shift rule natively. Performing the device expansion later is therefore preferable.

Thus, in this PR, we add gradient-specific decomposition to the QNode construction step, and move device-specific expansions to the device.
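The evaluation counts can be checked with a little arithmetic. The sketch below assumes the standard two-term parameter-shift rule for an RY gate (two shifted evaluations per gate) and uses cos(θ), the expectation of Z after RY(θ) on |0⟩, as a stand-in for a circuit; `two_term_shift` is a hypothetical helper, not a PennyLane function.

```python
import math


def two_term_shift(f, theta, shift=math.pi / 2):
    """Two-term parameter-shift rule: two evaluations of f per parameter."""
    return (f(theta + shift) - f(theta - shift)) / (2 * math.sin(shift))


# Toy "circuit": <Z> after RY(theta) on |0> is cos(theta),
# so the exact derivative is -sin(theta).
expval = math.cos
theta = 0.5
shift_gradient = two_term_shift(expval, theta)
print(math.isclose(shift_gradient, -math.sin(theta)))  # True

# Counting circuit evaluations for a single DoubleExcitation parameter:
evals_after_decomposition = 8 * 2  # eight RY gates, two shifted evaluations each
evals_native_rule = 4              # four-term rule applied to the gate directly
print(evals_after_decomposition, evals_native_rule)  # 16 4
```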
Description of the Change:

Two new methods were added to the Device API:

- `Device.expand_fn(tape) -> tape`: expands a tape such that it is supported by the device. By default, performs the standard device-specific gate set decomposition done in the default QNode. Can be overwritten by the device. Note that the output is 1-1; the expanded tape returns exactly the same value as the original tape, no post-processing required.
- `Device.batch_transform(tape) -> tapes, processing_fn`: pre-processes the tape in the case where the device needs to generate multiple circuits to execute from the input circuit. The requirement of a post-processing function makes this distinct from the `expand_fn` method above. By default, applies the transform for devices that do not natively support Hamiltonians with non-commuting terms.
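The difference between the two return shapes can be sketched in plain Python. This is an assumption-laden toy, not the device implementation: tapes are tuples of an operation list and an observable label, the Hamiltonian is a list of `(coefficient, observable)` pairs, and the "results" are placeholder numbers.

```python
def expand_fn(tape):
    """1-to-1: one tape in, one equivalent tape out, no post-processing."""
    return list(tape)


def batch_transform(tape_with_hamiltonian):
    """1-to-many: split a weighted sum of observables into one tape per term."""
    ops, hamiltonian = tape_with_hamiltonian
    coeffs = [coeff for coeff, _ in hamiltonian]
    tapes = [(ops, obs) for _, obs in hamiltonian]

    def processing_fn(results):
        # Recombine the per-term results into the Hamiltonian expectation.
        return sum(c * r for c, r in zip(coeffs, results))

    return tapes, processing_fn


hamiltonian = [(0.5, "Z0"), (-1.5, "X0")]
tapes, processing_fn = batch_transform((["RY"], hamiltonian))
placeholder_results = [1.0, 0.5]  # stand-ins for the two measured expectations
print(len(tapes), processing_fn(placeholder_results))  # 2 -0.25
```

The post-processing function is what makes the second hook necessary: the caller cannot interpret the two raw results without it, whereas the output of `expand_fn` can be executed and used directly.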
At the end of `QNode.construct()`, we call `gradient_fn.expand_fn(tape)` to expand out the circuit so that all operations present have gradient rules defined. This only applies if a gradient transform is being used, and gradient transforms specify the expansion logic. E.g., `DoubleExcitation` defines a gradient recipe for parameter-shift and so will be left as is, but `StronglyEntanglingLayers` doesn't, and so will be expanded.

Within `QNode.__call__`, prior to `execute(tapes)` being called, we apply the device's `batch_transform`.

`qml.execute()` is modified to ensure that `device.expand_fn(tape)` is called before passing a tape onto a device for execution.

All templates have `grad_method=None` added, to specify that they do not have a gradient method. This will trigger a decomposition into operations that do have a gradient method. This change is required because, by default, `grad_method="F"`!! Which is a silly default :(

Unit tests have been added to `tests/beta/test_beta_qnode.py`, and integration tests added to `tests/interfaces/batch/test_batch_interface_qnode.py`.

Benefits:

The new QNode will now automatically decompose templates/operations not supported by the device.

If a template/operation has a gradient rule, but is not supported by the device, the gradient logic will be applied prior to decomposition, leading to a significant reduction in circuit evals. In particular, `AllSingleDoubles` will result in far fewer circuit executions.

Devices are now in control of tape expansion, and device developers can overwrite `Device.expand_fn` and `Device.batch_transform` for full control of circuit manipulation.

Possible Drawbacks:

I originally wanted to call the new method `Device.expand()`, but this is already taken by an existing device 🤦

Previously, circuit decomposition for the device gate set was done only once in `QNode.construct()`. However, now that expansion is moved to the device, it happens with every execution. While this has quantum savings with respect to gradient executions, it results in a classical overhead: the decompositions now happen every time a QNode is called.

The `Operation.grad_method` attribute is really old, and predates a lot of quantum gradient research. As a result, it is not flexible enough for our needs. The following options are allowed:

- `grad_method=None`: this operation does not support gradients, attempt to decompose it
- `grad_method="F"`: this operation only supports finite-diff, using parameter-shift will raise an error
- `grad_method="A"`: this operation supports both parameter-shift and finite-diff

However, we are missing an option for:

- This operation supports finite-diff but not parameter-shift. Decompose it for parameter-shift support.

This latter behaviour is needed for operations such as `ApproxTimeEvolution(H, t, L)`. This operation can be differentiated using finite-differences to get a gradient with only 2 evaluations. For the parameter-shift, it can be decomposed, requiring `O(2NL)` (I think?) evaluations to get the gradient. However, there is no value of `grad_method` we can set that will 'unlock' this behaviour currently.

Related GitHub Issues: n/a