
Add caching to the autograd batch interface #1508

Merged · 66 commits merged into master from autograd-caching on Aug 20, 2021

Conversation

@josh146 (Member) commented Aug 9, 2021

Context: In #1501, batch_execute was made differentiable in the Autograd interface using the new qml.gradients subpackage. Since the qml.gradients subpackage is itself differentiable, higher-order derivatives come for free, as long as the new Autograd interface is applied recursively.

As a result, we not only get 3rd-order and higher derivatives, we are also able to:

  • Extend the Hessian to finite differences (previously only parameter-shift was supported)
  • Extend the Hessian to all operations (previously only two-term shift gates were supported)
  • Extend the Hessian to all measurements (previously only expval was supported)

However, the recursive evaluation is not smart; the autodiff frameworks will traverse the recursive structure naively, resulting in redundant evaluations.

This PR is a result of thinking about the following two questions:

  1. How does the performance of the new Autograd batch_execute pipeline compare to master?
  2. How can we keep the recursive evaluation but eliminate redundant device evaluations?

Benchmarking

To test the performance of #1501 vs. master, I ran the following benchmark:

import pennylane as qml
from pennylane import numpy as np
from pennylane.interfaces.batch import execute
import time
from pennylane.interfaces.autograd import AutogradInterface

dev = qml.device("default.qubit", wires=3)


def batch(params):
    with qml.tape.QubitParamShiftTape() as tape1:
        qml.templates.StronglyEntanglingLayers(params, wires=[0, 1, 2])
        qml.expval(qml.PauliZ(0))

    tape1 = tape1.expand(
        stop_at=lambda obj: not isinstance(obj, qml.measure.MeasurementProcess)
        and dev.supports_operation(obj.name)
    )
    tapes = (tape1,)

    res = execute(tapes, dev, qml.gradients.param_shift, max_diff=2)
    return res[0][0]


def master(params):
    with AutogradInterface.apply(qml.tape.QubitParamShiftTape()) as tape1:
        qml.templates.StronglyEntanglingLayers(params, wires=[0, 1, 2])
        qml.expval(qml.PauliZ(0))

    tape1 = tape1.expand(
        stop_at=lambda obj: not isinstance(obj, qml.measure.MeasurementProcess)
        and dev.supports_operation(obj.name)
    )
    return tape1.execute(dev)


params = np.ones([4, 3, 3], requires_grad=True)

t_batch = []
t_master = []

for i in range(20):
    start = time.time()
    qml.jacobian(master)(params)
    t_master.append(time.time() - start)

for i in range(20):
    start = time.time()
    qml.jacobian(batch)(params)
    t_batch.append(time.time() - start)

print("batch_execute:\t", np.min(t_batch))
print("master:\t\t", np.min(t_master))

With the following results:

Recursive evaluation turned off (max_diff=1)
--------------------------------------------
batch_execute:   0.04995298385620117
master:          0.04774165153503418

Recursive evaluation turned on (max_diff=2)
---------------------------------------------
batch_execute:   0.06748104095458984
master:          0.046480655670166016

Interestingly:

  • The batch_execute pipeline is roughly the same speed as master when recursive evaluation is turned off.
  • The batch_execute pipeline is slower than master when recursive evaluation is turned on.

Description of the changes

  • A new argument max_diff is added, which allows the user to specify the 'depth'/'order' at which the recursive evaluation ends. E.g., setting max_diff=1 completely deactivates the recursive evaluation.

  • Caching is added to the qml.interfaces.execute() function, by way of a decorator. This decorator makes use of tape.hash to identify unique tapes.

    • If a tape does not match a hash in the cache, then the tape has not been previously executed. It is executed, and the result added to the cache.

    • If a tape matches a hash in the cache, then the tape has been previously executed. The corresponding cached result is extracted, and the tape is not passed to the execution function.

    • Finally, there might be the case where two or more tapes in the current set of tapes to be executed share a hash. If this is the case, duplicates are removed to avoid redundant evaluations (see the sketch below).
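
For illustration, here is a minimal sketch of the caching and deduplication logic described above. It is not the cache_execute implementation from this PR; it only assumes that tapes expose a hash attribute and that the cache behaves like an OrderedDict used as a bounded FIFO store.

from collections import OrderedDict


def cached_batch_execute(batch_execute_fn, cache, cachesize=10000):
    """Illustrative wrapper: look up tapes by ``tape.hash``, execute each
    unique cache miss exactly once, and reuse cached results otherwise."""

    def wrapper(tapes, **kwargs):
        hashes = [tape.hash for tape in tapes]

        # collect tapes that are neither cached nor duplicated within this batch
        to_execute, seen = [], set()
        for tape, h in zip(tapes, hashes):
            if h not in cache and h not in seen:
                to_execute.append(tape)
                seen.add(h)

        # execute only the cache misses, and store their results
        if to_execute:
            results = batch_execute_fn(to_execute, **kwargs)
            for tape, res in zip(to_execute, results):
                cache[tape.hash] = res
                if len(cache) > cachesize:
                    cache.popitem(last=False)  # evict the oldest cached result

        # assemble the results in the original order, duplicates included
        return [cache[h] for h in hashes]

    return wrapper

For example, cached_batch_execute(dev.batch_execute, OrderedDict())(tapes) would execute each unique tape in tapes at most once.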

Benefits

Caching has a significant effect. E.g., consider the benchmarking example above, modified to compute the Hessian and display the number of executions:

params = np.ones([2, 2, 3], requires_grad=True)

qml.jacobian(qml.grad(master))(params)
print("Master: \t\t\tnum_exections = ", dev.num_executions)

dev._num_executions = 0
qml.jacobian(qml.grad(batch))(params, cache=False)
print("batch_execute (no caching): \tnum_exections = ", dev.num_executions)

dev._num_executions = 0
qml.jacobian(qml.grad(batch))(params, cachesize=1)
print("batch_execute (cachesize=1): \tnum_exections = ", dev.num_executions)

dev._num_executions = 0
qml.jacobian(qml.grad(batch))(params, cachesize=20)
print("batch_execute (cachesize=20): \tnum_exections = ", dev.num_executions)

dev._num_executions = 0
qml.jacobian(qml.grad(batch))(params, cachesize=1000)
print("batch_execute (cachesize=1000): num_exections = ", dev.num_executions)
Master:                         num_exections =  313
batch_execute (no caching):     num_exections =  601
batch_execute (cachesize=1):    num_exections =  576
batch_execute (cachesize=20):   num_exections =  557
batch_execute (cachesize=1000): num_exections =  301

By using a cache, we can reduce the number of evaluations below the minimum we currently achieve on master.

Questions

  • What should the default value of max_diff be?
  • The 'smart defaults' depend on the situation - for a remote device, max_diff>1 is probably fine?

@josh146 josh146 changed the title from "Add caching to the autograd backend" to "[WIP] Add caching to the autograd backend" on Aug 9, 2021
github-actions bot (Contributor) commented Aug 9, 2021

Hello. You may have forgotten to update the changelog!
Please edit .github/CHANGELOG.md with:

  • A one-to-two sentence description of the change. You may include a small working example for new features.
  • A link back to this PR.
  • Your name (or GitHub username) in the contributors section.

Base automatically changed from batch-autograd to master August 16, 2021 14:14
@josh146 josh146 requested a review from trbromley August 16, 2021 17:01
@trbromley (Contributor) left a comment:

Thanks @josh146, this is great! 🚀 I've left a few questions for my understanding.

Comment on lines +254 to +260
# disable caching on the forward pass
execute_fn = cache_execute(device.batch_execute, cache=None)

# replace the backward gradient computation
gradient_fn = device.gradients
gradient_fn = cache_execute(
    device.gradients, cache, pass_kwargs=True, return_tuple=False
)
Contributor:

Probably my unfamiliarity with the recent changes, but do we expect to need caching for device-based gradients? I thought this was mainly for parameter shift.

josh146 (Member, Author):

Caching is only needed for device-based gradients if mode="backwards". Backwards mode essentially means:

  • On the forward pass, only the cost function is computed
  • The gradients are only requested during backpropagation

This means that there will always be 1 additional eval required -- caching therefore reduces the number of evals by 1 😆

Worth it?

I mean, I'd expect 99% of users to use device gradients with mode="forward".
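
For concreteness, a usage sketch of that backward-mode configuration, mirroring the execute signature quoted in this PR (the exact keyword values here, e.g. mode="backward", are my assumption and purely illustrative):

# Backward mode: the forward pass only evaluates the cost function, and the
# device gradient is requested later, during backpropagation. The single tape
# re-executed at that point is exactly the evaluation that caching saves.
res = execute(
    tapes,
    dev,
    gradient_fn="device",
    interface="autograd",
    gradient_kwargs={"method": "adjoint_jacobian"},
    mode="backward",
)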

Contributor:

Sounds good!

Contributor:

Does this supersede #1341?

josh146 (Member, Author):

No, this complements it for now 🙂

#1341 added the use_device_state keyword argument which instructs QubitDevice.adjoint_jacobian() to use the existing device state and avoid a redundant forward pass.

When mode="forward", we can pass this option:

execute(
    tapes,
    dev,
    gradient_fn="device",
    interface="torch",
    gradient_kwargs={"method": "adjoint_jacobian", "use_device_state": True},
    mode="forward"
)

mode="best",
gradient_kwargs=None,
cache=True,
cachesize=10000,
Contributor:

Do we have an idea of the memory implications of this? 🤔

josh146 (Member, Author):

Assuming you do not pass a cache object manually to the execute function, the cache will be created inside execute. This means that as soon as execute has exited, the cache goes out of scope and will be garbage collected by Python.

I am 99.99% sure of this, but don't know how to sanity check 😖

This is from the last time I tried to explore this: #1131 (comment)

Do you have any ideas on how to double check that the cache is deleted after execution?
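
Maybe a check along these lines is what I'm picturing (an illustrative sketch only, not code from this PR): create the cache inside a local scope, keep only a weak reference to it, and confirm the reference is dead once that scope has exited.

import gc
import weakref


class Cache(dict):
    """Plain dicts cannot be weak-referenced, so use a trivial subclass."""


def run_with_local_cache():
    # mimics execute() creating its cache locally
    cache = Cache()
    cache["some_tape_hash"] = "some_result"
    # ... the cached execution would use `cache` here ...
    return weakref.ref(cache)


cache_ref = run_with_local_cache()
gc.collect()

# once the function has returned, the cache is unreachable and collected,
# so the weak reference no longer resolves
assert cache_ref() is None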

@josh146 josh146 requested a review from trbromley August 18, 2021 16:38
@trbromley (Contributor) left a comment:

Thanks @josh146 for the updates and comments! Looks great 💯


Comment on lines +1536 to +1537
qml.RX(np.array(a), wires=[0])
qml.RY(np.array(b), wires=[1])
Contributor:

Is the np.array() left over from the previous test? Though I guess it doesn't matter because the hash should be the same.

josh146 (Member, Author):

oh this was semi-intentional - I was trying to ensure that the datatype of the parameter doesn't affect hashing 😆

Comment on lines +1649 to +1650
"""Tests that the circuit hash of circuits with single-qubit
rotations differing by multiples of 2pi have identical hash"""
Contributor:

Oh wow, that's cool, didn't realize we'd support that!

josh146 (Member, Author):

It's required in order to reduce the number of Hessian evals to the optimum number (I don't think the autodiff frameworks are smart enough to do this cancelling out themselves).

Currently it's hardcoded for the R and CR gates, but it would be cool to add this as an operation property:

class Rot(Operation):
    periodicity = [2 * np.pi, 2 * np.pi, 2 * np.pi]
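
A sketch of how such a property might feed into the hash (illustrative only: op.name, op.wires, and op.parameters are existing operation attributes, while periodicity is the hypothetical property above):

def op_fingerprint(op):
    """Illustrative: build a hashable fingerprint for an operation, folding
    each parameter by its declared period so that angles differing by a
    multiple of 2*pi produce identical fingerprints."""
    periods = getattr(op, "periodicity", None)

    if periods is None:
        params = tuple(float(p) for p in op.parameters)
    else:
        # round to suppress floating-point noise introduced by the modulo
        params = tuple(
            round(float(p) % T, 10) for p, T in zip(op.parameters, periods)
        )

    return (op.name, tuple(op.wires), params)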

josh146 and others added 2 commits August 20, 2021 13:29
Co-authored-by: Tom Bromley <49409390+trbromley@users.noreply.github.com>
@josh146 josh146 merged commit 117599e into master Aug 20, 2021
@josh146 josh146 deleted the autograd-caching branch August 20, 2021 05:47
Labels: review-ready 👌 (PRs which are ready for review by someone from the core team)