# How to Define Custom Gates
`PauliPropagation.jl` is extensible and allows you to define your own gates. Depending on how much you can or want to code, you can define a gate that _works_ or one that is as fast as it gets. Here we will see what you need to define.

In [1]:
using PauliPropagation

Let us start by defining a `SWAP` gate. It is a `struct` subtyping `StaticGate`, which denotes that it does not take any variable parameters at propagation time: it always acts the same way. In case you are not familiar with Julia, `struct`s are most similar to Python's dataclasses in that they only carry simple information, but they live at a level much closer to the hardware and can be worked with very efficiently if done well.

In [2]:
struct CustomSWAPGate <: StaticGate
    qinds::Tuple{Int, Int}  # The two sites to be swapped
end

The action of a `SWAP` gate on a Pauli string is to swap the Paulis on its two sites. To implement it, we define a method of the function `apply`, which receives four positional arguments in this order, plus optional keyword arguments: `apply(gate::YourGate, pstr, theta, coefficient; kwargs...)`. We can ignore `kwargs` for now, but you can use them to pass arguments from the top level down to your function. `theta` is always passed as well, but a `StaticGate` should ignore it.

This is how you can define `SWAP`:

In [3]:
function PauliPropagation.apply(gate::CustomSWAPGate, pstr, theta, coefficient; kwargs...)
    # get the Pauli on the first site
    pauli1 = getpauli(pstr, gate.qinds[1])
    # get the Pauli on the second site
    pauli2 = getpauli(pstr, gate.qinds[2])
    
    # set the Pauli on the first site to the second Pauli
    pstr = setpauli(pstr, pauli2, gate.qinds[1])
    # set the Pauli on the second site to the first Pauli
    pstr = setpauli(pstr, pauli1, gate.qinds[2])

    return pstr, coefficient
end

This is it, really.
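Under the hood, `getpauli` and `setpauli` read and write single Paulis inside an integer (the `@code_warntype` output further down shows `pstr` arriving as a `UInt64`). The sketch below mimics the SWAP logic with plain bit operations. It assumes, without taking it from the library source, that qubit `i` occupies bits `2(i-1)` and `2(i-1)+1` and that `I=0`, `X=1`, `Y=2`, `Z=3`; the helper names `getpauli_bits`, `setpauli_bits` and `swap_bits` are hypothetical.

```julia
# Hypothetical helpers mimicking `getpauli`/`setpauli` on a 2-bit-per-qubit encoding.
# Assumed layout: qubit q (1-based) lives in bits 2(q-1) and 2(q-1)+1.
# Assumed values: I = 0, X = 1, Y = 2, Z = 3.

function getpauli_bits(term::UInt64, q::Int)
    return (term >> (2 * (q - 1))) & UInt64(0b11)
end

function setpauli_bits(term::UInt64, pauli::Integer, q::Int)
    shift = 2 * (q - 1)
    cleared = term & ~(UInt64(0b11) << shift)   # zero out the two bits of qubit q
    return cleared | (UInt64(pauli) << shift)   # write the new Pauli into them
end

# SWAP: exchange the Paulis on sites q1 and q2; the coefficient is untouched
function swap_bits(term::UInt64, q1::Int, q2::Int)
    p1 = getpauli_bits(term, q1)
    p2 = getpauli_bits(term, q2)
    term = setpauli_bits(term, p2, q1)
    term = setpauli_bits(term, p1, q2)
    return term
end

# Build Z_7 Z_13, then swap sites 7 and 8
z7z13 = setpauli_bits(setpauli_bits(UInt64(0), 3, 7), 3, 13)
swapped = swap_bits(z7z13, 7, 8)
println(getpauli_bits(swapped, 7), " ", getpauli_bits(swapped, 8))  # prints "0 3"
```

Because the whole string is a single machine integer, a gate like SWAP costs only a handful of shifts and masks per Pauli string.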

Now set up the simulation: 25 qubits on a 5 by 5 grid.

In [4]:
nx = 5
ny = 5
nq = nx * ny

topology = get2dtopology(nx, ny);

Build `nl` layers of a circuit consisting of `RX` and `RZZ` Pauli rotations.

In [5]:
nl = 3
base_circuit = tfitrottercircuit(nq, nl; topology=topology);
nparams = countparameters(base_circuit)

195

Define our observable as $Z_7 Z_{13}$.

In [6]:
pstr = PauliString(nq, [:Z, :Z], [7, 13])

PauliString(nqubits: 25, 1.0 * IIIIIIZIIIIIZIIIIIII...)

Draw random circuit parameters with a fixed seed.

In [7]:
using Random
Random.seed!(42)
thetas = randn(nparams);

For this notebook, we will use a minimum coefficient threshold. The results are still almost exact.

In [8]:
min_abs_coeff = 2e-4

0.0002
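Conceptually, the threshold means that Pauli strings whose coefficient magnitude falls below `min_abs_coeff` are discarded during propagation. A minimal sketch of that filtering step, using a plain `Dict` from integer-encoded Pauli strings to coefficients as a stand-in for a Pauli sum. The function name `truncate_small` is hypothetical, and the library may apply its truncation at different points than this sketch suggests.

```julia
# Stand-in for a Pauli sum: Dict mapping integer-encoded Pauli strings to coefficients.
# Hypothetical helper: keep only terms with |coefficient| >= min_abs_coeff.
function truncate_small(psum::Dict{UInt64,Float64}, min_abs_coeff::Float64)
    return filter(kv -> abs(kv.second) >= min_abs_coeff, psum)
end

psum = Dict{UInt64,Float64}(0x01 => 0.5, 0x02 => -1e-5, 0x03 => 3e-4)
truncated = truncate_small(psum, 2e-4)   # the -1e-5 term is dropped
println(length(truncated))               # prints 2
```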

Now insert a 1D line of SWAP gates into the base circuit at index `nparams_per_layer`, i.e., at the end of the first layer, just before its last gate.

In [9]:
nparams_per_layer = Int(length(base_circuit)/nl)

65
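As a sanity check on the number 65: assuming `get2dtopology` returns the nearest-neighbour edges of an open (non-periodic) 5 by 5 grid, each layer holds one `RX` per qubit plus one `RZZ` per edge, so $25 + 40 = 65$ gates per layer and $3 \times 65 = 195$ parameters in total, matching `countparameters` above. The sketch counts the edges without the library; `count_grid_edges` is a hypothetical helper.

```julia
# Count nearest-neighbour edges of an open gx-by-gy grid (assumed to match `get2dtopology`).
function count_grid_edges(gx::Int, gy::Int)
    horizontal = (gx - 1) * gy   # edges along x in every row
    vertical = gx * (gy - 1)     # edges along y in every column
    return horizontal + vertical
end

gx, gy, layers = 5, 5, 3
gates_per_layer = gx * gy + count_grid_edges(gx, gy)  # RX gates + RZZ gates
println(gates_per_layer, " ", layers * gates_per_layer)  # prints "65 195"
```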

In [10]:
ourSWAP_circuit = deepcopy(base_circuit);
for qind in 1:(nq-1)
    insert!(ourSWAP_circuit, nparams_per_layer, CustomSWAPGate((qind, qind+1)))
end
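A note on how `insert!` behaves in this loop: `insert!(v, i, x)` places `x` at index `i` and shifts everything from index `i` onwards one slot to the right. Repeated insertion at the same index therefore stacks the new items in reverse order, directly before the element that originally sat at index `i`. A toy example with symbols standing in for gates:

```julia
# Toy "circuit": integers stand for the original gates, symbols for inserted gates.
circuit = Any[1, 2, 3, 4, 5]
for g in (:swap12, :swap23, :swap34)
    insert!(circuit, 3, g)   # always insert at index 3
end
# circuit now holds 1, 2, :swap34, :swap23, :swap12, 3, 4, 5
println(circuit)
```

Since the Clifford comparison circuit later in this notebook is built with the same loop, both circuits share this ordering and the comparison stays like-for-like.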

Run the circuit

In [11]:
@time ourSWAP_psum = propagate(ourSWAP_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff)

  0.621411 seconds (472.88 k allocations: 73.329 MiB, 1.28% gc time, 77.48% compilation time)


PauliSum(nqubits: 25, 42237 Pauli terms:
 -0.00031294 * ZIIIIXXIIIZXIIIIZIII...
 -0.00066271 * ZIIIIYZIZIZZXYIIIIZI...
 -0.0026411 * IIIIIYZIIIIXXIIZXZII...
 -0.00023134 * YZIZYYZZIZIYZIIIIIII...
 0.0005023 * IYIIIXIZIIIXZIIZXIII...
 0.00038912 * YIIIZXXZZIIYYXZIZZZI...
 -0.00022203 * XIIZXYXIIIIYIIIZXIII...
 -0.00039624 * ZZIIIZXIIIXYZIIZIIII...
 0.0062984 * IIIIIZIIIIZIIIIIXZII...
 -0.00039462 * IIIIIYZZZIZXZYZZYIZI...
 0.0017179 * ZZIIIZXIIIZYIIIIZIII...
 -0.00042019 * YIIIZYXXZIIYIIIZXIII...
 0.00020148 * XYIIZYZZIIIXZIIIYIII...
 0.0005377 * IIIIIYXIZIIIZYIIIZZI...
 0.0015528 * IIIIIZIIZIXXYXZIXZZI...
 0.00038787 * XZIIZIXIIIZXIIIZXIII...
 0.00056033 * IZIIZYIIIIIZZIIIIIII...
 0.00047457 * IIIIIYIIZIXXXXZIXIZI...
 -0.0028782 * YIIIIXYIIIIXXZIIZIII...
 0.00026907 * ZIIIIYIYIIZYYIIIZZII...
  ⋮)

Compute the overlap with the all-zero state:

In [12]:
overlapwithzero(ourSWAP_psum)

0.2837639430898192
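What does this number mean? `overlapwithzero` evaluates $\langle 0\cdots0|O|0\cdots0\rangle$ for the propagated observable $O$. Since $\langle 0|I|0\rangle = \langle 0|Z|0\rangle = 1$ while $\langle 0|X|0\rangle = \langle 0|Y|0\rangle = 0$, only Pauli strings made entirely of `I` and `Z` contribute, each with its coefficient. A sketch on a plain `Dict`, assuming 2 bits per qubit with `I=0`, `X=1`, `Y=2`, `Z=3`; `paulibits`, `isdiagonal` and `overlap_with_zero_sketch` are hypothetical names, not library functions.

```julia
# Assumed encoding: 2 bits per qubit, I = 0, X = 1, Y = 2, Z = 3 (not the library's code).
paulibits(term::UInt64, q::Int) = (term >> (2 * (q - 1))) & UInt64(0b11)

# <0...0| P |0...0> is 1 if every site holds I or Z, and 0 otherwise.
isdiagonal(term::UInt64, nq::Int) = all(q -> paulibits(term, q) in (0, 3), 1:nq)

# Sum the coefficients of all Pauli strings that contain only I and Z.
function overlap_with_zero_sketch(psum::Dict{UInt64,Float64}, nq::Int)
    return sum((coeff for (term, coeff) in psum if isdiagonal(term, nq)); init=0.0)
end

# Two-qubit example: 0.5*ZI + 0.25*XZ + 0.125*II
ZI = UInt64(3)             # Z on qubit 1
XZ = UInt64(1 | (3 << 2))  # X on qubit 1, Z on qubit 2
II = UInt64(0)
psum = Dict(ZI => 0.5, XZ => 0.25, II => 0.125)
println(overlap_with_zero_sketch(psum, 2))  # prints 0.625
```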

This looks okay, but is it correct? You may have noticed that `SWAP` is a Clifford operation, i.e., one that maps each Pauli string to exactly one other Pauli string. The package already provides a Clifford `SWAP`, so we can easily compare.
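Concretely, conjugating a two-qubit Pauli by SWAP just exchanges its two tensor factors, $\mathrm{SWAP}\,(A \otimes B)\,\mathrm{SWAP} = B \otimes A$, which is exactly the site exchange our `apply` performs. A quick numerical check with plain matrices, using only the standard library:

```julia
using LinearAlgebra

X = [0 1; 1 0]
Z = [1 0; 0 -1]
Id = Matrix{Int}(I, 2, 2)

# SWAP in the computational basis |00>, |01>, |10>, |11>
SWAP = [1 0 0 0;
        0 0 1 0;
        0 1 0 0;
        0 0 0 1]

# Conjugation swaps the tensor factors: X⊗Z -> Z⊗X and X⊗I -> I⊗X
println(SWAP' * kron(X, Z) * SWAP == kron(Z, X))    # prints true
println(SWAP' * kron(X, Id) * SWAP == kron(Id, X))  # prints true
```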

In [13]:
cliffSWAP_circuit = deepcopy(base_circuit);
for qind in 1:(nq-1)
    insert!(cliffSWAP_circuit, nparams_per_layer, CliffordGate(:SWAP, (qind, qind+1)))
end

In [14]:
@time cliffSWAP_psum = propagate(cliffSWAP_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff);

  0.234933 seconds (65.84 k allocations: 47.084 MiB, 11.36% gc time, 32.13% compilation time)


Are the results the same?

In [15]:
overlapwithzero(cliffSWAP_psum)

0.2837639430898192

In [16]:
cliffSWAP_psum == ourSWAP_psum

true

Yes!

We can also benchmark the performance.

In [17]:
using BenchmarkTools

In [18]:
@btime propagate($ourSWAP_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  119.028 ms (1140 allocations: 42.75 MiB)


In [19]:
@btime propagate($cliffSWAP_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  120.782 ms (1140 allocations: 42.75 MiB)


Defining our own gate costs nothing in performance here. Why? Because the `apply` function for this gate is *type stable*! Type stability is absolutely crucial in Julia, and codes live and die by it.

In [20]:
@code_warntype apply(CustomSWAPGate((7, 8)), pstr.term, 0.0, 1.0)

MethodInstance for PauliPropagation.apply(::CustomSWAPGate, ::UInt64, ::Float64, ::Float64)
  from apply(gate::CustomSWAPGate, pstr, theta, coefficient; kwargs...) @ Main In[3]:1
Arguments
  #self#::Core.Const(PauliPropagation.apply)
  gate::CustomSWAPGate
  pstr::UInt64
  theta::Float64
  coefficient::Float64
Body::Tuple{UInt64, Float64}
1 ─ %1 = Core.NamedTuple()::Core.Const(NamedTuple())
│   %2 = Base.pairs(%1)::Core.Const(Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}())
│   %3 = Main.:(var"#apply#1")(%2, #self#, gate, pstr, theta, coefficient)::Tuple{UInt64, Float64}
└──      return %3



All types are concrete (`@code_warntype` prints them in blue), which means everything is great! A correctly implemented `apply` is type stable when it always returns the same number of Pauli string and coefficient pairs. Here it is always exactly one pair, because `SWAP` is a Clifford gate.
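To see the difference between a fixed and a variable number of outputs outside the library, compare two toy functions with `Base.return_types`, which reports the return type Julia's compiler infers. The function names `stable_step` and `unstable_step` are made up for illustration.

```julia
# Always returns one (term, coefficient) pair: the inferred type is one concrete Tuple.
stable_step(term::UInt64, coeff::Float64) = (term, coeff)

# Returns one or two pairs depending on a runtime value: inference can only give a Union.
function unstable_step(term::UInt64, coeff::Float64)
    if iseven(term)
        return (term, coeff)
    else
        return (term, coeff / 2, term + 1, coeff / 2)
    end
end

stable_type = only(Base.return_types(stable_step, (UInt64, Float64)))
unstable_type = only(Base.return_types(unstable_step, (UInt64, Float64)))
println(isconcretetype(stable_type))    # prints true
println(isconcretetype(unstable_type))  # prints false
```

A non-concrete return type forces callers to handle several possible tuple shapes at runtime, which is where the extra allocations come from.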

On to an example of a gate that can _split_ a Pauli string into two: the `T` gate.

In [21]:
struct CustomTGate <: StaticGate
    qind::Int
end

A `T` gate is a non-Clifford gate that commutes with `I` and `Z`, splits `X` into `cos(π/4)X - sin(π/4)Y`, and `Y` into `cos(π/4)Y + sin(π/4)X`. 
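These coefficients can be checked with $2 \times 2$ matrices. The sketch assumes the Heisenberg-picture convention $P \mapsto T^\dagger P T$ with $T = \mathrm{diag}(1, e^{i\pi/4})$. This convention is our assumption; it reproduces the signs stated above.

```julia
X = ComplexF64[0 1; 1 0]
Y = ComplexF64[0 -im; im 0]
Z = ComplexF64[1 0; 0 -1]
T = ComplexF64[1 0; 0 exp(im * π / 4)]

conj_by_T(P) = T' * P * T   # Heisenberg-picture conjugation (assumed convention)

c, s = cos(π / 4), sin(π / 4)
println(conj_by_T(X) ≈ c * X - s * Y)  # prints true
println(conj_by_T(Y) ≈ c * Y + s * X)  # prints true
println(conj_by_T(Z) ≈ Z)              # prints true: Z commutes with T
```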

Let's write the code for that.

In [22]:
function PauliPropagation.apply(gate::CustomTGate, pstr, theta, coefficient=1.0; kwargs...)
    # get the Pauli on the site `gate.qind`
    pauli = getpauli(pstr, gate.qind)
    
    if pauli == 0 || pauli == 3  # I or Z commute
        return pstr, coefficient     
    end
    
    if pauli == 1 # X goes to X, -Y
        new_pauli = 2  # Y
        # set the Pauli
        new_pstr = setpauli(pstr, new_pauli, gate.qind)
        # adapt the coefficients
        new_coefficient = -1 * coefficient * sin(π/4)
        coefficient_prime = coefficient * cos(π/4)
        
    else # Y goes to Y, X
        new_pauli = 1  # X
        # set the Pauli
        new_pstr = setpauli(pstr, new_pauli, gate.qind)
        # adapt the coefficients
        new_coefficient = coefficient * sin(π/4)
        coefficient_prime = coefficient * cos(π/4)
    end
    
    return pstr, coefficient_prime, new_pstr, new_coefficient
    
end

Insert a layer of `CustomTGate`s at the same position in the base circuit.

In [23]:
ourT_circuit = deepcopy(base_circuit);
for qind in 1:nq
    insert!(ourT_circuit, nparams_per_layer, CustomTGate(qind))
end

And run:

In [24]:
@time ourT_psum = propagate(ourT_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff)

  0.490175 seconds (2.31 M allocations: 144.226 MiB, 2.00% gc time, 11.30% compilation time)


PauliSum(nqubits: 25, 134750 Pauli terms:
 -0.00061577 * YYIIIIZXIIIXYZIIZZII...
 0.00051862 * IYXIIIZIIIIZYZIIIZII...
 0.00056395 * XXZZIZXYYIIZXZIIIZII...
 -0.00034877 * IYIIIZYZIIIXYZIIZZII...
 0.0014633 * IXZIIYZYXIIZYIIIIZII...
 0.00031332 * IIYIIYZYXIIZXZIIIZII...
 -0.00046793 * ZXIIIXXYIIIZXIIIIYZI...
 -0.00024608 * IZIIIYYIIIIIIZIIIYII...
 -0.00024469 * YZZIIYXXXIIIZZIIIIII...
 0.00032697 * YXZIIZYIIIIZYIIIIZII...
 0.00036629 * ZZIIIYIYZIIIIXZIIIZI...
 -0.00055635 * IXIIIIIXXIIZYZIIIZII...
 -0.00024444 * IIIZIZXZXIIZXXZIIIZI...
 -0.00030366 * IXIZIYYYYIIIZZIIIIII...
 -0.00028156 * ZXZIIXZZZIIIIXIIIIZI...
 -0.00024536 * IIIIIIIIIIIXZYZIZYII...
 0.00021283 * IZZIIZZYIIIIXZIIIXII...
 0.000339 * IYIIIYXYIIIIYIIIIXZI...
 -0.00050988 * ZYZIIIXYXIIZXIIIIYZI...
 -0.0053736 * IZIIIYXYIIIXYZIIZZII...
  ⋮)

In [25]:
overlapwithzero(ourT_psum)

0.2802436569577357

But did it work? Again, we have an implementation of a `TGate` in our library. In case you are interested, we currently implement `T` gates as Pauli `Z` rotations at an angle of `π/4`. Let's compare to that.

In [26]:
frozenZ_circuit = deepcopy(base_circuit);
for qind in 1:nq
    insert!(frozenZ_circuit, nparams_per_layer, TGate(qind))
end

Calling `PauliGate(:Z, qind, parameter)` with a fixed `parameter` creates a so-called `FrozenGate`, which wraps the parametrized `PauliGate` and fixes its angle at circuit-construction time. This is also how `TGate` is represented: as a frozen `Z` rotation with angle `π/4`.

Run it and compare:

In [27]:
@time frozenZ_psum = propagate(frozenZ_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff);

  0.405651 seconds (11.54 k allocations: 83.579 MiB, 1.70% gc time, 2.61% compilation time)


In [28]:
overlapwithzero(frozenZ_psum)

0.2802436569577357

In [29]:
frozenZ_psum == ourT_psum

true

It works! But is it optimal?

In [30]:
using BenchmarkTools

In [31]:
@btime propagate($ourT_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  422.650 ms (2253396 allocations: 140.42 MiB)


In [32]:
@btime propagate($frozenZ_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  382.325 ms (1144 allocations: 82.86 MiB)


No. It is slower and allocates far more (over 2.2 million allocations versus 1144), because `apply` for `CustomTGate` is not type stable.

In [33]:
@code_warntype apply(CustomTGate(7), pstr.term, 0.0, 1.0)

MethodInstance for PauliPropagation.apply(::CustomTGate, ::UInt64, ::Float64, ::Float64)
  from apply(gate::CustomTGate, pstr, theta, coefficient; kwargs...) @ Main In[22]:1
Arguments
  #self#::Core.Const(PauliPropagation.apply)
  gate::CustomTGate
  pstr::UInt64
  theta::Float64
  coefficient::Float64
Body::Union{Tuple{UInt64, Float64}, Tuple{UInt64, Float64, UInt64, Float64}}
1 ─ %1 = Core.NamedTuple()::Core.Const(NamedTuple())
│   %2 = Base.pairs(%1)::Core.Const(Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}())
│   %3 = Main.:(var"#apply#2")(%2, #self#, gate, pstr, theta, coefficient)::Union{Tuple{UInt64, Float64}, Tuple{UInt64, Float64, UInt64, Float64}}
└──      return %3



Depending on the input Pauli, it returns either a tuple `Tuple{UInt64, Float64}` of length 2 or a tuple `Tuple{UInt64, Float64, UInt64, Float64}` of length 4. `@code_warntype` highlights a small `Union` like this in yellow: it may be acceptable (it is not that much slower, after all), but be wary of red. In such cases you may want to define a lower-level function that `propagate` calls internally, for optimal performance. This is how we would do it:

In [34]:
function PauliPropagation.applyandadd!(gate::CustomTGate, pstr, coeff, theta, output_psum, args...; kwargs...)
    
    # get the Pauli that is on index `gate.qind` of the Pauli string `pstr`
    pauli = getpauli(pstr, gate.qind)
    
    if pauli == 0 || pauli == 3  # I or Z commute
        add!(output_psum, pstr, coeff)
        return
    end

    if pauli == 1 # X goes to X, -Y
        new_pauli = 2  # Y
        new_pstr = setpauli(pstr, new_pauli, gate.qind)
        coefficient1 = coeff * cos(π/4)
        coefficient2 = -1 * coeff * sin(π/4)
    else # Y goes to Y, X
        new_pauli = 1  # X
        new_pstr = setpauli(pstr, new_pauli, gate.qind)
        coefficient1 = coeff * cos(π/4)
        coefficient2 = coeff * sin(π/4)
    end

    add!(output_psum, pstr, coefficient1)
    add!(output_psum, new_pstr, coefficient2)

    return
end

Instead of returning a variable number of outputs, `applyandadd!` adds each resulting Pauli string and its coefficient directly to `output_psum` with `add!`. It always returns `nothing`, so there is no type instability left.
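The same accumulate-into-the-output idea can be sketched without the library, using a `Dict{UInt64,Float64}` as a stand-in for a Pauli sum. The names `getp`, `setp`, `addterm!` and `tgate_applyandadd!` are hypothetical, and the 2-bit layout (`I=0`, `X=1`, `Y=2`, `Z=3`, qubit `q` in bits `2(q-1)` and `2(q-1)+1`) is an assumption, not the library's code.

```julia
# Stand-ins (assumptions, not library code): 2 bits per qubit, I=0, X=1, Y=2, Z=3.
getp(term::UInt64, q::Int) = (term >> (2 * (q - 1))) & UInt64(0b11)
setp(term::UInt64, p::Integer, q::Int) =
    (term & ~(UInt64(0b11) << (2 * (q - 1)))) | (UInt64(p) << (2 * (q - 1)))

# Add `coeff` to the entry for `term`, creating it if absent (the role `add!` plays).
function addterm!(out::Dict{UInt64,Float64}, term::UInt64, coeff::Float64)
    out[term] = get(out, term, 0.0) + coeff
    return out
end

# Hypothetical analogue of `applyandadd!` for a T gate on qubit q:
# every resulting (term, coefficient) pair goes straight into `out`.
function tgate_applyandadd!(out::Dict{UInt64,Float64}, term::UInt64, coeff::Float64, q::Int)
    p = getp(term, q)
    if p == 0 || p == 3                  # I or Z: unchanged
        addterm!(out, term, coeff)
    elseif p == 1                        # X -> cos(π/4) X - sin(π/4) Y
        addterm!(out, term, coeff * cos(π / 4))
        addterm!(out, setp(term, 2, q), -coeff * sin(π / 4))
    else                                 # Y -> cos(π/4) Y + sin(π/4) X
        addterm!(out, term, coeff * cos(π / 4))
        addterm!(out, setp(term, 1, q), coeff * sin(π / 4))
    end
    return nothing
end

# Propagate X + 0.5 Z (single qubit) through one T gate on qubit 1
psum = Dict{UInt64,Float64}(UInt64(1) => 1.0, UInt64(3) => 0.5)
out = Dict{UInt64,Float64}()
for (term, coeff) in psum
    tgate_applyandadd!(out, term, coeff, 1)
end
```

Because coefficients are accumulated per key, two input strings that map onto the same output string are merged automatically.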

Let's see if this worked.

In [35]:
@time ourT_psum2 = propagate(ourT_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff);

  0.438981 seconds (25.03 k allocations: 87.096 MiB, 1.47% gc time, 8.38% compilation time: 100% of which was recompilation)


In [36]:
overlapwithzero(ourT_psum2)

0.2802436569577357

In [37]:
ourT_psum == ourT_psum2

true

And check the performance.

In [38]:
@btime propagate($ourT_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  387.889 ms (1168 allocations: 85.52 MiB)


In [39]:
@btime propagate($frozenZ_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  378.418 ms (1144 allocations: 82.86 MiB)


This is almost as fast already, because we fixed the type instability, and the number of memory allocations went down drastically (from about 2.25 million to 1168). But we can do one more thing: define the next higher-level function, `applytoall!()`, which lets us avoid moving Pauli strings around unnecessarily. Notice that in `applyandadd!()` we add every produced Pauli string to the output Pauli sum `output_psum`, even those that do not change under the action of a T gate (I and Z Paulis). Here is how we introduce this additional, slight optimization:

In [40]:
function PauliPropagation.applytoall!(gate::CustomTGate, theta, psum, aux_psum, args...; kwargs...)

    # loop over all Pauli strings and their coefficients in the Pauli sum
    for (pstr, coeff) in psum

        # get the Pauli that is on index `gate.qind` of the Pauli string `pstr`
        pauli = getpauli(pstr, gate.qind)
        
        if pauli == 0 || pauli == 3  # I or Z commute
            # if the T gate commutes with the pauli string, do nothing
            # the Pauli string remains in the old Pauli sum
            continue
        end

        # else we know the gate will split the Pauli string into two
        # copy-paste our code from above
        if pauli == 1 # X goes to X, -Y
            new_pauli = 2  # Y
            new_pstr = setpauli(pstr, new_pauli, gate.qind)
            coefficient1 = coeff * cos(π/4)
            coefficient2 = -1 * coeff * sin(π/4)
        else # Y goes to Y, X
            new_pauli = 1  # X
            new_pstr = setpauli(pstr, new_pauli, gate.qind)
            coefficient1 = coeff * cos(π/4)
            coefficient2 = coeff * sin(π/4)
        end


        # set the coefficient of the original Pauli string in the old Pauli sum that we are looping over
        set!(psum, pstr, coefficient1)

        # set the coefficient of the new Pauli string in the aux_psum
        # we can set the coefficient because PauliRotations create non-overlapping new Pauli strings
        set!(aux_psum, new_pstr, coefficient2)

        # now both Pauli sums contain Pauli strings, which is okay because a higher level function will merge them
        # we want to reduce unnecessary movement of Pauli strings from one Pauli sum to another
    end
    return
end
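The in-place idea behind this `applytoall!` can be mirrored on plain dictionaries. Terms that commute with the gate are never touched, split terms keep their original key in `psum` with an updated coefficient, and only newly created strings go into the auxiliary sum, which is merged back at the end. This is a hypothetical sketch: the names, the 2-bit layout (`I=0`, `X=1`, `Y=2`, `Z=3`) and the final `mergewith!` step, which stands in for the merge the library performs at a higher level, are our assumptions.

```julia
# Stand-ins (assumptions, not library code): 2 bits per qubit, I=0, X=1, Y=2, Z=3.
getp(term::UInt64, q::Int) = (term >> (2 * (q - 1))) & UInt64(0b11)
setp(term::UInt64, p::Integer, q::Int) =
    (term & ~(UInt64(0b11) << (2 * (q - 1)))) | (UInt64(p) << (2 * (q - 1)))

function tgate_applytoall!(psum::Dict{UInt64,Float64}, aux::Dict{UInt64,Float64}, q::Int)
    for (term, coeff) in psum
        p = getp(term, q)
        (p == 0 || p == 3) && continue   # I or Z: the entry stays where it is
        newp = p == 1 ? 2 : 1            # X -> Y, Y -> X
        sgn = p == 1 ? -1.0 : 1.0        # X -> -sin(π/4) Y, Y -> +sin(π/4) X
        # overwrite the value of an existing key: no keys are added or removed here
        psum[term] = coeff * cos(π / 4)
        # the X <-> Y map is one-to-one, so no two terms write the same key in aux
        aux[setp(term, newp, q)] = sgn * coeff * sin(π / 4)
    end
    mergewith!(+, psum, aux)   # add new strings back, summing coefficients on shared keys
    empty!(aux)
    return psum
end

# Example: X + 2Y + 0.5Z on one qubit, T gate on qubit 1
psum = Dict{UInt64,Float64}(UInt64(1) => 1.0, UInt64(2) => 2.0, UInt64(3) => 0.5)
aux = Dict{UInt64,Float64}()
tgate_applytoall!(psum, aux, 1)
```

Only the two split strings cause writes; the `Z` term is skipped entirely, which is the saving this optimization is after.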

In [41]:
@time ourT_psum3 = propagate(ourT_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff);

  0.443565 seconds (24.26 k allocations: 84.342 MiB, 1.03% gc time, 13.52% compilation time: 100% of which was recompilation)


In [42]:
overlapwithzero(ourT_psum3)

0.2802436569577357

In [43]:
ourT_psum == ourT_psum3

true

And check the performance.

In [44]:
@btime propagate($ourT_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  378.449 ms (1144 allocations: 82.86 MiB)


In [45]:
@btime propagate($frozenZ_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  370.374 ms (1144 allocations: 82.86 MiB)


This is how we implement gates in `PauliPropagation.jl`. Enjoy defining custom and high-performance gates! 