### An example of Automatic Differentiation

In [1]:
using PauliPropagation

Note that we will define many variables as constants via the `const` keyword. In Julia, the important effect of this is that the type of the variable is fixed, which lets the compiler generate fast code when global variables are used inside functions.
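To illustrate why this matters, the following minimal sketch compares what the compiler can infer for a function that reads a `const` global with one that reads a plain global. The names `scale_const`, `scale_plain`, `f_const`, and `f_plain` are hypothetical and not part of PauliPropagation.

```julia
# Hypothetical globals for illustration only.
const scale_const = 2.0   # constant binding, so its type is known to the compiler
scale_plain = 2.0         # untyped global: could be rebound to a value of any type

f_const(x) = scale_const * x
f_plain(x) = scale_plain * x

# Ask the compiler which return type it can infer for a Float64 argument.
inferred_const = only(Base.return_types(f_const, (Float64,)))
inferred_plain = only(Base.return_types(f_plain, (Float64,)))

println("with const global:     ", inferred_const)
println("with non-const global: ", inferred_plain)
```

A concretely inferred return type is what allows fast, specialized code; when the type cannot be inferred, Julia has to fall back to slower dynamic dispatch.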

In [2]:
# denote with `const` so the code that uses this global variable remains fast
const nq = 32

# denote with `const` so the code that uses this global variable remains fast
const topology = bricklayertopology(nq);

We define a transverse-field Ising Hamiltonian and will compute the gradient of its expectation value with respect to the circuit parameters. This could be used within a variational energy minimization routine to find its ground state.

The Hamiltonian here reads $H = \sum_{i}X_i + \sum_{\langle i, j\rangle}Z_iZ_j$ where $\langle i, j\rangle$ denotes neighbors on the topology.
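For reference, the quantity that is differentiated in the rest of this example can be written as follows. This assumes the usual Heisenberg-picture convention, in which the observable is propagated through the parametrized circuit $U(\boldsymbol{\theta})$ defined further below and then overlapped with the all-zero state. The symbols $L$ and $m$ are our notation and do not appear in the package.

$$
L(\boldsymbol{\theta}) = \langle 0 |\, U(\boldsymbol{\theta})^\dagger \, H \, U(\boldsymbol{\theta}) \,| 0 \rangle,
\qquad
\nabla_{\boldsymbol{\theta}} L = \left( \frac{\partial L}{\partial \theta_1}, \dots, \frac{\partial L}{\partial \theta_m} \right),
$$

where $m$ is the number of circuit parameters.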

In [3]:
H = PauliSum(nq)

for qind in 1:nq
    add!(H, :X, qind, 1.0)
end

for pair in topology
    add!(H, [:Z, :Z], collect(pair), 1.0)
end

H

PauliSum(nqubits: 32, 63 Pauli terms:
 1.0 * IZZIIIIIIIIIIIIIIIII...
 1.0 * IIIIIIIIIIIIIXIIIIII...
 1.0 * IIIIIXIIIIIIIIIIIIII...
 1.0 * IIIIIIIIIIIIIIIIIIII...
 1.0 * IIIIIIIIIIIIIIIIIIII...
 1.0 * IIIIIIIIIIIIIIIIXIII...
 1.0 * IIIXIIIIIIIIIIIIIIII...
 1.0 * IIIIIIIIIIIIIIXIIIII...
 1.0 * IIIIIIIXIIIIIIIIIIII...
 1.0 * IXIIIIIIIIIIIIIIIIII...
 1.0 * IIIIIIIIIIIIIZZIIIII...
 1.0 * IIIIIIIIIIIIIIIIIIIX...
 1.0 * IIIIIIXIIIIIIIIIIIII...
 1.0 * IIIIIIIIZZIIIIIIIIII...
 1.0 * IIIIXIIIIIIIIIIIIIII...
 1.0 * IIIIIZZIIIIIIIIIIIII...
 1.0 * IIIIIIIZZIIIIIIIIIII...
 1.0 * IIIIIIIIIIIIIIIIIIXI...
 1.0 * IIIIIIIIIIIIIIIIIIII...
 1.0 * IIIIIIIIIIIIIIIIIIII...
  ⋮)

Define a generic parametrized quantum circuit:

In [4]:
nl = 4

# denote with `const` so the code that uses this global variable remains fast
const circuit = hardwareefficientcircuit(nq, nl; topology=topology)
nparams = countparameters(circuit)

508

Importantly, we need to set our truncations. Depending on which package you are using to compute your gradients, you can use different truncations. 

`ReverseDiff`, for example, is a sophisticated package for automatic _reverse-mode_ differentiation. It builds a computational graph that it then differentiates using the chain rule. This is how large-scale neural networks are trained, and it is commonly referred to as gradient _backpropagation_. The challenge here is that the graph for the chain rule is computed once (to the best of our knowledge), which means that only truncations applied during the initial computation will be respected. Truncations that we think work well here are `max_weight`, `max_freq`, and `max_sins`, as they do not depend on the particular parameter values of the quantum circuit. On the other hand, which paths are explored under truncations such as `min_abs_coeff` will not be updated (again, to the best of our knowledge) as the gradients are computed.
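The following toy example sketches why value-dependent truncations can go stale on a pre-recorded tape. It assumes that `ReverseDiff` replays the operations recorded for the original input, which matches our understanding of the package but is not something we have verified in general. The function `branchy` is hypothetical and stands in for a coefficient-dependent truncation.

```julia
using ReverseDiff: GradientTape, gradient, gradient!

# Hypothetical stand-in for a value-dependent truncation:
# "keep" the term only if the parameter is large enough.
branchy(x) = x[1] > 0.5 ? x[1]^2 : 0.0 * x[1]

# Record the tape at a point where the term is kept.
tape = GradientTape(branchy, [1.0])

# Replay the recorded tape at a point where the term should be dropped.
replayed = gradient!(zeros(1), tape, [0.1])

# Re-trace the function from scratch at the same point.
fresh = gradient(branchy, [0.1])

println("replayed tape: ", replayed)  # follows the branch recorded at x = 1.0
println("fresh trace:   ", fresh)     # follows the branch valid at x = 0.1
```

If the two results differ, the tape has frozen the branch taken during recording. Truncations that depend only on counters such as `max_freq` avoid this problem, because the decision is the same for every parameter value.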

Packages such as `ForwardDiff` or manual numerical differentiation, on the other hand, always involve computation of the loss function, which is affected by all truncations. Unfortunately, these methods are slower for circuits with more than several dozen parameters.
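As a rough illustration of the cost argument, here is a minimal sketch of manual numerical differentiation with central finite differences. The helper `finite_difference_gradient` and the toy loss are hypothetical and written in plain Julia; they are not part of any package used here.

```julia
# Central finite differences: a hypothetical helper for illustration.
# Each gradient entry costs two evaluations of `f`, so the total cost
# grows linearly with the number of parameters.
function finite_difference_gradient(f, x::AbstractVector{<:Real}; h=1e-6)
    shifted = float.(collect(x))           # working copy that we perturb
    grad = zeros(eltype(shifted), length(shifted))
    for i in eachindex(shifted)
        original = shifted[i]
        shifted[i] = original + h
        f_plus = f(shifted)
        shifted[i] = original - h
        f_minus = f(shifted)
        shifted[i] = original              # restore before the next entry
        grad[i] = (f_plus - f_minus) / (2h)
    end
    return grad
end

# Toy loss with a known gradient: d/dx_i sum(sin, x) = cos(x_i).
toy_sine_loss(x) = sum(sin, x)
x0 = [0.1, 0.2, 0.3]
fd_grad = finite_difference_gradient(toy_sine_loss, x0)
println(fd_grad)
```

With the 508 parameters of the circuit above, this scheme would need 1016 full loss evaluations per gradient, each of which does re-apply all truncations.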

So let's wrap the coefficients into `PauliFreqTracker`, which keeps track of how many times a path splits at a `PauliRotation`. We will use this to truncate our simulation, i.e., we will set a `max_freq` truncation. One could also truncate on `min_abs_coeff`, but `ReverseDiff` would not continually update which paths are truncated during training based on which coefficients are currently small (at least we think so).

In [5]:
# the fields on PauliFreqTracker
fieldnames(PauliFreqTracker)

(:coeff, :nsins, :ncos, :freq)

The `coeff` field carries the coefficient as you are used to. But the other fields are used to keep track of additional things.

In [6]:
# an example PauliFreqTracker
# actually, PauliFreqTracker(1.0) also initializes the other fields to 0

# PauliFreqTracker(1.0)
PauliFreqTracker(1.0, 0, 0, 0)

PauliFreqTracker{Float64}(coeff=1.0, nsins=0, ncos=0, freq=0)

In [7]:
# create a PauliString with PauliFreqTracker coefficient
wrapped_pstr = PauliString(2, :X, 1, PauliFreqTracker(0.5, 0, 0, 0))

PauliString(nqubits: 2, PauliFreqTracker(0.5) * XI)

In [8]:
# showcase what this PathProperties type tracks
gate = PauliRotation([:Z, :Z], [1, 2])
θ = 0.4
wrapped_psum = propagate(gate, wrapped_pstr, θ)

PauliSum(nqubits: 2, 2 Pauli terms:
 PauliFreqTracker(-0.19471) * YZ
 PauliFreqTracker(0.46053) * XI
)

You can see that we tracked that both Pauli paths branched once (`freq=1`), and they received a `cos` or `sin` coefficient.

In [9]:
coefficients(wrapped_psum)

ValueIterator for a Dict{UInt8, PauliFreqTracker{Float64}} with 2 entries. Values:
  PauliFreqTracker{Float64}(coeff=-0.19470917115432526, nsins=1, ncos=0, freq=1)
  PauliFreqTracker{Float64}(coeff=0.46053049700144255, nsins=0, ncos=1, freq=1)

You can truncate based on these properties. Such truncations remain valid without any knowledge of the circuit parameters.
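The bookkeeping behind such truncations can be sketched in plain Julia. The type `ToyTracker` and its helper functions are hypothetical stand-ins and not the package's implementation; in particular, the sign of the sine branch depends on the gate convention and is ignored here.

```julia
# Hypothetical stand-in for a path-tracking coefficient; not the package's type.
struct ToyTracker
    coeff::Float64
    nsins::Int
    ncos::Int
    freq::Int
end

# A rotation splits one path into a cosine branch and a sine branch.
cos_branch(t::ToyTracker, θ) = ToyTracker(t.coeff * cos(θ), t.nsins, t.ncos + 1, t.freq + 1)
sin_branch(t::ToyTracker, θ) = ToyTracker(t.coeff * sin(θ), t.nsins + 1, t.ncos, t.freq + 1)

# Truncation rule that looks only at the counters, never at the coefficient value.
keep(t::ToyTracker; max_freq=typemax(Int), max_sins=typemax(Int)) =
    t.freq <= max_freq && t.nsins <= max_sins

start = ToyTracker(0.5, 0, 0, 0)
branches = [cos_branch(start, 0.4), sin_branch(start, 0.4)]

kept_freq0 = filter(t -> keep(t; max_freq=0), branches)   # both branches split once
kept_sins0 = filter(t -> keep(t; max_sins=0), branches)   # only the cosine branch survives
println(length(kept_freq0), " ", length(kept_sins0))
```

Because `keep` never inspects `coeff`, the set of surviving paths is the same for every value of the rotation angle.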

In [10]:
# truncates everything that splits more than 0 times - this is everything
wrapped_psum = propagate(gate, wrapped_pstr, θ; max_freq = 0)

PauliSum(nqubits: 2, (no Pauli strings))

In [11]:
# truncates everything with nsins > 0
wrapped_psum = propagate(gate, wrapped_pstr, θ; max_sins = 0)

PauliSum(nqubits: 2, 1 Pauli term: 
 PauliFreqTracker(0.46053) * XI
)

In [12]:
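# these truncations do not work when the coefficients are not properly wrapped
# (note: `pstr` is only defined in the next cell, so this run already fails with an UndefVarError)
propagate(gate, pstr, θ; max_freq = 1, max_sins = 1)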

LoadError: UndefVarError: `pstr` not defined in `Main`
Suggestion: check for spelling errors or missing imports.

Because it can be annoying to define your observable like this, we provide the function `wrapcoefficients()`, which returns a new `PauliString` or `PauliSum` where all coefficients are wrapped in the `PathProperties` type provided.

In [13]:
pstr = PauliString(2, :X, 1, 0.5)
wrapcoefficients(pstr, PauliFreqTracker) == wrapped_pstr

true

Now do this to our Hamiltonian.

In [14]:
wrapped_H = wrapcoefficients(H, PauliFreqTracker)

PauliSum(nqubits: 32, 63 Pauli terms:
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIXIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIXIIII...
 PauliFreqTracker(1.0) * IZZIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIXIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIZZIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIZZIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIXIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIXIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIZZIII...
 PauliFreqTracker(1.0) * IIIIIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIXIIIIIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIIIZZIIIIIIIIIIII...
 PauliFreqTracker(1.0) * IIIIII

Generate some random parameters:

In [15]:
using Random
Random.seed!(42)
thetas = randn(nparams);

Evaluate the expectation value once:

In [16]:
max_freq = 30
max_weight = 5

@time psum = propagate(circuit, wrapped_H, thetas; max_freq, max_weight);
overlapwithzero(psum)

  0.963401 seconds (343.12 k allocations: 27.397 MiB, 0.99% gc time, 27.83% compilation time)


1.057832381193968

Now wrap this into a function that takes only `thetas` as its argument. This is why we declared many global variables as `const`: we use them in here. Alternatively, one could use so-called `let` blocks to create a local variable namespace.

The naive loss function below does not work with `ReverseDiff`, because the package needs to propagate its own custom coefficient type, but `wrapped_H` is already strictly typed. It is therefore not automatically differentiable.

In [17]:
function naivelossfunction(thetas)
    # some truncations
    max_freq = 30
    max_weight = 5
    
    psum = propagate(circuit, wrapped_H, thetas; max_freq, max_weight);
    return overlapwithzero(psum)
end

naivelossfunction (generic function with 1 method)

In [18]:
@time naivelossfunction(thetas)

  1.039100 seconds (2.22 M allocations: 118.814 MiB, 0.89% gc time, 33.48% compilation time)


1.057832381193968
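The `let`-block alternative mentioned above can be sketched as follows. The names `toy_let_loss`, `scale`, and `offset` are hypothetical; `scale` and `offset` stand in for objects such as the circuit or the Hamiltonian that a loss function needs to capture.

```julia
# Hypothetical toy loss built inside a `let` block.
toy_let_loss = let scale = 2.0, offset = 1.0
    # the closure captures the local variables, so it reads no globals
    thetas -> scale * sum(abs2, thetas) + offset
end

println(toy_let_loss([1.0, 2.0]))
```

The captured variables are local to the `let` block, so their types are known inside the closure without declaring anything `const`.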

An example of how it would break:

In [19]:
# using Pkg; Pkg.add("ReverseDiff")
using ReverseDiff: gradient

gradient(naivelossfunction, thetas)

LoadError: ArgumentError: Converting an instance of ReverseDiff.TrackedReal{Float64, Float64, Nothing} to Float64 is not defined. Please use `ReverseDiff.value` instead.
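The same kind of failure can be reproduced in a few lines of plain Julia. This is only an analogy: `ToyTracked` is a hypothetical stand-in for the tracking type of a differentiation package, and the dictionary stands in for a strictly typed Pauli sum.

```julia
# Hypothetical stand-in for the tracking type that an AD package propagates.
struct ToyTracked
    value::Float64
end
Base.:*(a::ToyTracked, b::Float64) = ToyTracked(a.value * b)
Base.:*(a::ToyTracked, b::ToyTracked) = ToyTracked(a.value * b.value)

theta = ToyTracked(0.4)

# A container that is strictly typed to Float64 cannot store the tracked result.
strict_coeffs = Dict{Symbol,Float64}(:X => 1.0)
conversion_failed = try
    strict_coeffs[:X] = theta * strict_coeffs[:X]
    false
catch err
    println("failed with: ", typeof(err))
    true
end

# Building the container with the parameter's type avoids the conversion.
CoeffType = typeof(theta)
generic_coeffs = Dict{Symbol,CoeffType}(:X => CoeffType(1.0))
generic_coeffs[:X] = theta * generic_coeffs[:X]
println(generic_coeffs[:X])
```

The second container works because its value type is derived from the parameter, which is the same idea as using `eltype(thetas)` for the coefficients of the Hamiltonian.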

We now create a loss function that does indeed work. It requires that we build the Hamiltonian with the correct coefficient type, which here is the element type of `thetas`. This will make everything differentiable.

In [20]:
function lossfunction(thetas)
    # differentiation libraries use custom types to trace through the computation
    # we need to make all of our objects typed like that so that nothing breaks
    CoeffType = eltype(thetas)

    # define H again 
    H = PauliSum(nq, CoeffType)
    for qind in 1:nq
        add!(H, :X, qind, CoeffType(1.0))
    end
    for pair in topology
        add!(H, [:Z, :Z], collect(pair), CoeffType(1.0))
    end

    # wrap the coefficients into PauliFreqTracker so that we can use `max_freq` truncation.
    wrapped_H = wrapcoefficients(H, PauliFreqTracker)

    # some truncations, defined locally so that we do not rely on non-const globals
    max_freq = 30
    max_weight = 5

    # we also need to run the in-place version with `!`, because by default we copy the Pauli sum
    wrapped_H = propagate!(circuit, wrapped_H, thetas; max_freq, max_weight);
    return overlapwithzero(wrapped_H)
end

lossfunction (generic function with 1 method)

Evaluating this loss function with plain `Float64` parameters gives the same value as before:

In [21]:
@time lossfunction(thetas)

  0.753571 seconds (208.14 k allocations: 21.325 MiB, 8.81% compilation time)


1.057832381193968

And gradients work.

In [22]:
@time gradient(lossfunction, thetas)

  6.752837 seconds (96.25 M allocations: 4.028 GiB, 21.31% gc time, 13.65% compilation time)


508-element Vector{Float64}:
  0.1853732642158934
  0.1953702986042907
  0.30751296953662316
  0.9695704885915631
 -0.021084983260654746
  0.6163209188019104
  0.9534016924800559
 -1.1217238397902465
  1.0582131923675357
 -0.5999281572290621
 -0.01901068262210987
 -0.7257847903805977
  0.4902123146539921
  ⋮
  0.289271896337007
  0.6294436192719693
  0.43775073471484177
 -0.6755773301482302
  0.3359915238950176
 -0.03296310450242135
 -0.015475732504339073
  0.38865941344257704
 -0.44246991175879197
 -0.372145658742357
 -0.2178728223760708
  0.29274707160701796

Now import further functions from `ReverseDiff` and follow the package's example of pre-recording a gradient tape, which makes repeated gradient evaluations much faster:

In [23]:
using ReverseDiff: GradientTape, gradient!, compile

In [24]:
# this follows a ReverseDiff.jl example

# some inputs and work buffer to play around with
grad_array = similar(thetas);

# pre-record a GradientTape for `lossfunction` using inputs of length `nparams` with Float64 elements
@time const simulation_tape = GradientTape(lossfunction, thetas)

# first evaluation compiles and is slower
@time gradient!(grad_array, simulation_tape, thetas)
# second evaluation
@time gradient!(grad_array, simulation_tape, thetas);

  5.239832 seconds (93.65 M allocations: 3.903 GiB, 32.54% gc time, 0.09% compilation time)
  1.781946 seconds (238.94 k allocations: 11.785 MiB, 5.92% compilation time)
  1.669992 seconds


In [25]:
# compile to make it even faster
@time const compiled_simulation_tape = compile(simulation_tape)

# some inputs and work buffer to play around with
grad_array_compiled = similar(thetas);

# first evaluation compiles and is slower
@time gradient!(grad_array_compiled, compiled_simulation_tape, thetas)
# second evaluation
@time gradient!(grad_array_compiled, compiled_simulation_tape, thetas);

 10.180541 seconds (151.43 M allocations: 6.198 GiB, 23.98% gc time, 2.66% compilation time)
  1.458759 seconds (74.84 k allocations: 3.675 MiB, 4.08% compilation time)
  1.329400 seconds


`grad_array` (and likewise `grad_array_compiled`) carries the gradient result. It is overwritten in place by `gradient!`, so the array does not need to be allocated again for every evaluation.

See how calculating the gradient is only a few times slower than calculating the loss! This is the magic of reverse-mode differentiation.
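As mentioned at the start, such gradients could drive a variational energy minimization. The following minimal sketch shows the pattern of reusing one gradient buffer inside a plain gradient-descent loop. It uses a hypothetical toy loss, `toy_quadratic_loss`, instead of the circuit simulation, and the step size and iteration count are arbitrary choices for illustration.

```julia
using ReverseDiff: GradientTape, gradient!, compile

# Hypothetical toy loss with its minimum at the origin; in the example above
# this role would be played by `lossfunction`.
toy_quadratic_loss(x) = sum(abs2, x)

params = ones(3)
tape = compile(GradientTape(toy_quadratic_loss, params))
grad_buffer = similar(params)   # reused in every iteration

learning_rate = 0.1
for _ in 1:50
    gradient!(grad_buffer, tape, params)      # overwrites grad_buffer in place
    params .-= learning_rate .* grad_buffer   # plain gradient-descent update
end

println(params)
```

For the actual circuit loss, reusing one tape across many parameter updates is only safe as long as the truncations do not depend on the parameter values, as discussed earlier.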