# How to Define Custom Gates
`PauliPropagation.jl` is extensible and allows you to define your own gates. Depending on how much you can or want to code, you can define a gate that simply _works_ or one that is as fast as it gets. Here we will see what you need to define.

In [1]:
using PauliPropagation

Let us start by defining a `SWAP` gate. It subtypes `StaticGate`, which denotes that it does not take any variable parameters at propagation time: it always acts the same way.

In [2]:
struct CustomSWAPGate <: StaticGate
    qinds::Tuple{Int, Int}  # The two sites to be swapped
end

The action of a `SWAP` gate on a Pauli string is that it swaps the Paulis on two sites. We can now define a function `apply` for it. In general, `apply` receives these four arguments in this order, as well as potential `kwargs`: `apply(gate::YourGate, pstr, theta, coefficient; kwargs...)`. We can ignore `kwargs` for now, but you can use them to pass arguments from the top level down to your function. If your custom gate is parametrized, the third argument `theta` will be the parameter for that gate. A `StaticGate` has no parameter, so the method below omits `theta` and takes only the gate, the Pauli string, and the coefficient.

This is how you can define `SWAP`:

In [3]:
function PauliPropagation.apply(gate::CustomSWAPGate, pstr, coefficient; kwargs...)
    # get the Pauli on the first site
    pauli1 = getpauli(pstr, gate.qinds[1])
    # get the Pauli on the second site
    pauli2 = getpauli(pstr, gate.qinds[2])
    
    # set the Pauli on the first site to the second Pauli
    pstr = setpauli(pstr, pauli2, gate.qinds[1])
    # set the Pauli on the second site to the first Pauli
    pstr = setpauli(pstr, pauli1, gate.qinds[2])

    return pstr, coefficient
end

This is it, really.
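
As a sanity check that is independent of the package, the claim that `SWAP` exchanges the Paulis on two sites can be verified with explicit matrices. The following is a minimal sketch in plain Julia; the helper `conjugate` and the matrix names are introduced here for illustration and are not part of `PauliPropagation.jl`.

```julia
using LinearAlgebra

# Single-qubit Pauli matrices (I2 avoids a clash with LinearAlgebra.I)
I2 = ComplexF64[1 0; 0 1]
X  = ComplexF64[0 1; 1 0]
Y  = ComplexF64[0 -im; im 0]
Z  = ComplexF64[1 0; 0 -1]

# Two-qubit SWAP in the computational basis
SWAP = ComplexF64[1 0 0 0;
                  0 0 1 0;
                  0 1 0 0;
                  0 0 0 1]

# Conjugation of an operator P by a unitary U (illustrative helper)
conjugate(U, P) = U' * P * U

# For every pair of Paulis, SWAP maps A ⊗ B to B ⊗ A with coefficient 1
paulis = (I2, X, Y, Z)
swap_ok = all(conjugate(SWAP, kron(A, B)) ≈ kron(B, A) for A in paulis, B in paulis)
println(swap_ok)
```

Every one of the 16 two-qubit Pauli products maps to exactly one other product with an unchanged coefficient, which is why the `apply` method above returns a single Pauli string and the same `coefficient`.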

Now set up the simulation: 25 qubits on a 5 by 5 grid.

In [4]:
nx = 5
ny = 5
nq = nx * ny

topology = get2dtopology(nx, ny);

The circuit consists of `nl` layers of `RX` and `RZZ` Pauli rotations.

In [5]:
nl = 3
base_circuit = tfitrottercircuit(nq, nl; topology=topology);
nparams = countparameters(base_circuit)

195

Define our observable as $Z_7 Z_{13}$.

In [6]:
pstr = PauliString(nq, [:Z, :Z], [7, 13])

PauliString(nqubits: 25, 1.0 * IIIIIIZIIIIIZIIIIIII...)

Circuit parameters with a random seed.

In [7]:
using Random
Random.seed!(42)
thetas = randn(nparams);

For this notebook, we will use a minimum coefficient threshold `min_abs_coeff`, below which Pauli strings are truncated. The results are still almost exact.

In [8]:
min_abs_coeff = 5e-3

0.005

Now add a 1D line of SWAP gates after the first layer of gates in the base circuit.

In [9]:
nparams_per_layer = Int(length(base_circuit)/nl)

65

In [10]:
ourSWAP_circuit = deepcopy(base_circuit);
for qind in 1:(nq-1)
    insert!(ourSWAP_circuit, nparams_per_layer, CustomSWAPGate((qind, qind+1)))
end
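
It is worth recalling how Julia's `insert!` behaves when it is called repeatedly with the same index. The sketch below uses a plain vector of symbols as a stand-in for a circuit; the names are purely illustrative.

```julia
# A stand-in "circuit" of four gates
circuit = [:a, :b, :c, :d]

# Repeatedly insert at the same index, as in the loop above
for gate in (:s1, :s2, :s3)
    insert!(circuit, 3, gate)
end

# Each new element lands at index 3 and pushes the earlier ones to the right
println(circuit)  # [:a, :b, :s3, :s2, :s1, :c, :d]
```

In the loop above, each new gate is therefore placed before the gate that currently sits at index `nparams_per_layer`, and the inserted gates end up in the reverse of their insertion order. The SWAP circuits compared in this notebook are built in the same way, so the comparison between them is unaffected.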

Run the circuit

In [11]:
@time ourSWAP_psum = propagate(ourSWAP_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff)

  1.043115 seconds (470.99 k allocations: 32.353 MiB, 99.17% compilation time)


PauliSum(nqubits: 25, 1116 Pauli terms:
 0.016822 * IIIIIXZIIIXIIIIIXIII...
 0.015711 * IIIIZXYIIIIXZIIIYIII...
 0.0050606 * IIIIIXZIIIXZIIIZZIII...
 -0.022839 * ZIIIZXZIIIIYZIIIZIII...
 -0.009093 * XIIIZYZIIIIXZIIIZIII...
 0.011927 * YIIIZXYIIIXXZIIZZIII...
 0.0061591 * IZIIIZIZIIIYZIIIZIII...
 -0.053091 * IZIIZXXIIIIXZIIIZIII...
 0.0052665 * YIIIIXYIIIIYIIIIXIII...
 0.0062184 * IIIIIZIIIIZIIIIIXZII...
 -0.055987 * IIIIIYIIIIXXZIIZZIII...
 -0.01968 * YIIIZYYIIIZZIIIIIIII...
 -0.0068224 * IIIIIIXIZIIIXYIIIZZI...
 -0.01065 * YIIIIXXIIIIZIIIIXIII...
 0.016516 * YIIIIXXIIIIZIIIZXIII...
 -0.018619 * YZIIZZIZIIIYZIIIZIII...
 -0.0057183 * YIIIZYXIIIIXZIIIZIII...
 0.018217 * ZIIIIXIIIIZIIIIZXIII...
 -0.010847 * IIIIIXZIIIXIIIIZXIII...
 0.0068392 * IIIIIZIIIIZZIIIIXIII...
  ⋮)

Overlap with the zero-state

In [12]:
overlapwithzero(ourSWAP_psum)

0.25455408051021855

This looks okay, but is it correct? One thing you may have noticed is that `SWAP` is a Clifford operation, i.e., one that takes one Pauli string to exactly one other Pauli string. The package already provides it as a `CliffordGate`, so we can easily compare.

In [13]:
cliffSWAP_circuit = deepcopy(base_circuit);
for qind in 1:(nq-1)
    insert!(cliffSWAP_circuit, nparams_per_layer, CliffordGate(:SWAP, (qind, qind+1)))
end

In [14]:
@time cliffSWAP_psum = propagate(cliffSWAP_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff);

  0.169071 seconds (67.51 k allocations: 6.353 MiB, 95.25% compilation time)


Are the results the same?

In [15]:
overlapwithzero(cliffSWAP_psum)

0.25455408051021855

In [16]:
cliffSWAP_psum == ourSWAP_psum

true

Yes!

We can also benchmark the performance.

In [17]:
using BenchmarkTools

In [18]:
@btime propagate($ourSWAP_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  7.449 ms (1098 allocations: 1.90 MiB)


In [19]:
@btime propagate($cliffSWAP_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  18.593 ms (1098 allocations: 1.90 MiB)


There is no performance penalty from defining our custom gate. Why? Because the `apply` function for this gate is *type stable*. Type stability is absolutely crucial for performance in Julia, and codes live and die by it.

In [20]:
@code_warntype apply(CustomSWAPGate((7, 8)), pstr.term, 0.0, 1.0)

MethodInstance for PauliPropagation.apply(::CustomSWAPGate, ::UInt64, ::Float64, ::Float64)
  from apply(gate::SG, pstr, theta, coefficient; kwargs...) where SG<:StaticGate @ PauliPropagation ~/.julia/dev/PauliPropagation/src/Gates/Gates.jl:22
Static Parameters
  SG = CustomSWAPGate
Arguments
  #self#::Core.Const(PauliPropagation.apply)
  gate::CustomSWAPGate
  pstr::UInt64
  theta::Float64
  coefficient::Float64
Body::Tuple{UInt64, Float64}
1 ─ %1 = Core.NamedTuple()::Core.Const(NamedTuple())
│   %2 = Base.pairs(%1)::Core.Const(Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}())
│   %3 = PauliPropagation.:(var"#apply#27")(%2, #self#, gate, pstr, theta, coefficient)::Tuple{UInt64, Float64}
└──      return %3



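All blue (the color in which `@code_warntype` prints concretely inferred types in a terminal) means that everything is great! If correctly implemented, `apply` will be type stable if it always returns the same, known number of Pauli string and coefficient pairs. Here it is just 1 because it is a Clifford gate.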
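
The link between the number of returned pairs and type stability can be illustrated in plain Julia, without the package. In this hedged sketch, the functions `one_pair` and `one_or_two_pairs` are hypothetical stand-ins for an `apply` method, with a `UInt8` playing the role of the Pauli string; `Base.return_types` reports what the compiler infers.

```julia
# Always returns one (string, coefficient) pair: a single concrete return type.
function one_pair(p::UInt8, c::Float64)
    return p, c
end

# Returns one pair or two pairs depending on the runtime value of `p`.
# The toy update `p ⊻ 0x03` just maps 1 to 2 and 2 to 1; signs are ignored here.
function one_or_two_pairs(p::UInt8, c::Float64)
    if p == 0x00 || p == 0x03
        return p, c
    end
    return p, c * cos(π / 4), p ⊻ 0x03, c * sin(π / 4)
end

stable_type   = only(Base.return_types(one_pair, (UInt8, Float64)))
unstable_type = only(Base.return_types(one_or_two_pairs, (UInt8, Float64)))

println(stable_type)
println(unstable_type)
println(isconcretetype(stable_type))    # true
println(isconcretetype(unstable_type))  # false
```

A return type that is a `Union` of tuples of different lengths is not concrete, and this is the kind of instability that `@code_warntype` highlights.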

Now on to an example of a gate that can _split_ a Pauli string into two: the `T` gate.

In [21]:
struct CustomTGate <: StaticGate
    qind::Int
end

A `T` gate is a non-Clifford gate that commutes with `I` and `Z`, splits `X` into `cos(π/4)X - sin(π/4)Y`, and `Y` into `cos(π/4)Y + sin(π/4)X`. 
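
These rules can be checked with explicit matrices. The sketch below is plain Julia and assumes the Heisenberg-picture convention $P \mapsto T^\dagger P T$, which is the convention that reproduces the signs quoted above; the helper name `heisenberg` is introduced here for illustration only.

```julia
using LinearAlgebra

# Single-qubit Pauli matrices and the T gate, T = diag(1, exp(iπ/4))
X = ComplexF64[0 1; 1 0]
Y = ComplexF64[0 -im; im 0]
Z = ComplexF64[1 0; 0 -1]
T = ComplexF64[1 0; 0 exp(im * π / 4)]

# Heisenberg-picture action of a unitary U on an operator P (assumed convention)
heisenberg(U, P) = U' * P * U

z_commutes = heisenberg(T, Z) ≈ Z
x_splits   = heisenberg(T, X) ≈ cos(π / 4) * X - sin(π / 4) * Y
y_splits   = heisenberg(T, Y) ≈ cos(π / 4) * Y + sin(π / 4) * X

println((z_commutes, x_splits, y_splits))
```

All three checks hold under this convention. With the opposite ordering, $T P T^\dagger$, the signs of the `sin(π/4)` terms would flip.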

Let's write the code for that.

In [22]:
function PauliPropagation.apply(gate::CustomTGate, pstr, coefficient=1.0; kwargs...)
    # get the Pauli on the site `gate.qind`
    pauli = getpauli(pstr, gate.qind)
    
    if pauli == 0 || pauli == 3  # I or Z commute
        return pstr, coefficient     
    end
    
    if pauli == 1 # X goes to X, -Y
        new_pauli = 2  # Y
        # set the Pauli
        new_pstr = setpauli(pstr, new_pauli, gate.qind)
        # adapt the coefficients
        new_coefficient = -1 * coefficient * sin(π/4)
        coefficient_prime = coefficient * cos(π/4)
        
    else # Y goes to Y, X
        new_pauli = 1  # X
        # set the Pauli
        new_pstr = setpauli(pstr, new_pauli, gate.qind)
        # adapt the coefficients
        new_coefficient = coefficient * sin(π/4)
        coefficient_prime = coefficient * cos(π/4)
    end
    
    return pstr, coefficient_prime, new_pstr, new_coefficient
    
end

Insert a layer of `CustomTGate`s after the first layer of the base circuit.

In [23]:
ourT_circuit = deepcopy(base_circuit);
for qind in 1:nq
    insert!(ourT_circuit, nparams_per_layer, CustomTGate(qind))
end

And run:

In [24]:
@time ourT_psum = propagate(ourT_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff)

  0.149694 seconds (170.94 k allocations: 8.407 MiB, 89.83% compilation time)


PauliSum(nqubits: 25, 1623 Pauli terms:
 -0.0054255 * IZIIIZZIIIIZZIIIIIII...
 0.0050267 * IZIIIYXYIIIZXIIIIZII...
 0.014193 * IIIIIIYIIIIZXIIIIZII...
 0.0056392 * IYIIIIXYIIIZYZIIIZII...
 -0.031645 * IZIIIZYZIIIZYZIIIZII...
 0.0059067 * IZIIIYXZIIIXXIIIZZII...
 -0.0087606 * IIIIIZXIIIIZYZIIIYZI...
 -0.0061325 * IZIIIYXIIIIZXIIIIYZI...
 -0.013054 * IIIIIIXIZIIZYYIIIYII...
 -0.023005 * IIIIIZXIIIIYIIIIZZII...
 -0.007082 * IIIIIIIIZIIYXYIIZXII...
 0.0071913 * IIIIIIIIIIIXXZIIIYZI...
 -0.05163 * IIIIIZXZIIIZXZIIIZII...
 0.0052415 * IIIIIIYIZIIYIYZIZIZI...
 0.006278 * IIIIIIYIZIIIXYZIIZZI...
 0.0070287 * IZIIIYXYIIIZYZIIIZII...
 0.012219 * IIIIIIXIZIIIXYIIIZZI...
 -0.02061 * IIIIIIYIIIIYXZIIIZII...
 0.020048 * IIIIIIXIIIIZXZIIIXZI...
 -0.01359 * IIIIIIZZZIIXXXZIZZZI...
  ⋮)

In [25]:
overlapwithzero(ourT_psum)

0.28722825454884676

But did it work? Again, we have an implementation of a `TGate` in our library. In case you are interested, we currently implement `T` gates as Pauli `Z` rotations at an angle of `π/4`. Let's compare to that.

In [26]:
frozenZ_circuit = deepcopy(base_circuit);
for qind in 1:nq
    insert!(frozenZ_circuit, nparams_per_layer, TGate(qind))
end
tofastgates!(frozenZ_circuit, nq);

If you call `PauliGate(:Z, qind, parameter)`, this will create a so-called `FrozenGate` wrapping the parametrized `PauliGate`, with a fixed `parameter` at the time of circuit construction.
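
That a `T` gate acts like a Pauli `Z` rotation at an angle of `π/4` can also be checked with matrices. This is a small sketch in plain Julia that assumes the rotation convention $R_Z(\theta) = e^{-i \theta Z / 2}$; the function `rz` is defined here for illustration and is not the package's implementation.

```julia
using LinearAlgebra

X = ComplexF64[0 1; 1 0]
Z = ComplexF64[1 0; 0 -1]
T = ComplexF64[1 0; 0 exp(im * π / 4)]

# Z rotation by angle θ, assuming the convention RZ(θ) = exp(-i θ/2 Z)
rz(θ) = exp(-im * θ / 2 * Z)

# The two unitaries agree up to the global phase exp(-iπ/8) ...
same_up_to_phase = rz(π / 4) ≈ exp(-im * π / 8) * T

# ... so they act identically on operators under conjugation
same_action = rz(π / 4)' * X * rz(π / 4) ≈ T' * X * T

println((same_up_to_phase, same_action))
```

Because a global phase cancels under conjugation, both gates transform Pauli strings in the same way.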

Run it and compare

In [27]:
@time frozenZ_psum = propagate(frozenZ_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff);

  0.042447 seconds (11.43 k allocations: 2.197 MiB, 68.43% compilation time)


In [28]:
overlapwithzero(frozenZ_psum)

0.28722825454884676

In [29]:
frozenZ_psum == ourT_psum

true

It works! But is it optimal?

In [30]:
using BenchmarkTools

In [31]:
@btime propagate($ourT_circuit, $pstr, $thetas;min_abs_coeff=$min_abs_coeff);

  14.490 ms (106451 allocations: 4.16 MiB)


In [32]:
@btime propagate($frozenZ_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  52.483 ms (1034 allocations: 1.48 MiB)


No. Note the roughly hundredfold larger number of allocations for the custom gate (106451 versus 1034): `apply` for the `CustomTGate` is not type-stable.

In [33]:
@code_warntype apply(CustomTGate(7), pstr.term, 0.0, 1.0)

MethodInstance for PauliPropagation.apply(::CustomTGate, ::UInt64, ::Float64, ::Float64)
  from apply(gate::SG, pstr, theta, coefficient; kwargs...) where SG<:StaticGate @ PauliPropagation ~/.julia/dev/PauliPropagation/src/Gates/Gates.jl:22
Static Parameters
  SG = CustomTGate
Arguments
  #self#::Core.Const(PauliPropagation.apply)
  gate::CustomTGate
  pstr::UInt64
  theta::Float64
  coefficient::Float64
Body::Union{Tuple{UInt64, Float64}, Tuple{UInt64, Float64, UInt64, Float64}}
1 ─ %1 = Core.NamedTuple()::Core.Const(NamedTuple())
│   %2 = Base.pairs(%1)::Core.Const(Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}())
│   %3 = PauliPropagation.:(var"#apply#27")(%2, #self#, gate, pstr, theta, coefficient)::Union{Tuple{UInt64, Float64}, Tuple{UInt64, Fl

It returns either a `Tuple{UInt64, Float64}` of length 2 or a `Tuple{UInt64, Float64, UInt64, Float64}` of length 4. When this is the case, you may want to define a lower-level function beneath `propagate` for optimal performance, as we do next. Yellow `@code_warntype` output means it might be okay (it is not that much slower after all), but be wary of red.

To avoid such type instabilities, we can overload the slightly higher-level function `applyandadd!()`, which does the job of `apply()` but, as the name hints, also adds the created Pauli strings to the propagating Pauli sum. We can practically copy-paste the code from `apply()`; the only difference is that we return nothing and instead `add!()` the Pauli strings to `output_psum`. Be mindful that the function signature needs to be exactly as shown. Even though you might not need the parameter `theta`, your function must accept it.

In [34]:
function PauliPropagation.applyandadd!(gate::CustomTGate, pstr, coefficient, theta, output_psum, args...; kwargs...)
    
    pauli = getpauli(pstr, gate.qind)
    
    if pauli == 0 || pauli == 3  # I or Z commute
        add!(output_psum, pstr, coefficient)
        return
    end

    if pauli == 1 # X goes to X, -Y
        new_pauli = 2  # Y
        new_pstr = setpauli(pstr, new_pauli, gate.qind)
        new_coefficient = -1 * coefficient * sin(π/4)
    else # Y goes to Y, X
        new_pauli = 1  # X
        new_pstr = setpauli(pstr, new_pauli, gate.qind)
        new_coefficient = coefficient * sin(π/4)
    end

    add!(output_psum, pstr, coefficient * cos(π/4))
    add!(output_psum, new_pstr, new_coefficient)

    return
end

This should resolve the slight type instability. Let's see if it worked and gives the same results.

In [35]:
@time ourT_psum2 = propagate(ourT_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff);

  0.431515 seconds (24.64 k allocations: 3.172 MiB, 86.32% compilation time: 100% of which was recompilation)


In [36]:
overlapwithzero(ourT_psum2)

0.28722825454884676

In [37]:
ourT_psum == ourT_psum2

true

And check the performance.

In [38]:
@btime propagate($ourT_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  11.767 ms (1039 allocations: 1.61 MiB)


In [39]:
@btime propagate($frozenZ_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  10.612 ms (1034 allocations: 1.48 MiB)


This is already much better and quite fast, but it is still a bit slower than the built-in `TGate`. Why? Because we move Pauli strings more than necessary. The runtime of the T-gate simulation is dominated by commuting Pauli strings (`I` is very common for local observables), so we could leave those strings where they are: in their original Pauli sum. For this, we can overload the function `applytoall!()`. It differs in that we perform the loop over the Pauli strings in the Pauli sum ourselves, and can thus use the old Pauli sum more flexibly. Our convention is that anything left in `psum` or `aux_psum` is later merged back into `psum`. Thus, we can simply skip the commuting Pauli strings and edit the coefficients of the remaining Pauli strings in place. See this version of the function:

In [40]:
function PauliPropagation.applytoall!(gate::CustomTGate, theta, psum, aux_psum, args...; kwargs...)
    
    for (pstr, coefficient) in psum 
    
        pauli = getpauli(pstr, gate.qind)

        if pauli == 0 || pauli == 3  # I or Z commute
            # do nothing
            continue
        end

        if pauli == 1 # X goes to X, -Y
            new_pauli = 2  # Y
            new_pstr = setpauli(pstr, new_pauli, gate.qind)
            new_coefficient = -1 * coefficient * sin(π/4)
        else # Y goes to Y, X
            new_pauli = 1  # X
            new_pstr = setpauli(pstr, new_pauli, gate.qind)
            new_coefficient = coefficient * sin(π/4)
        end

        set!(psum, pstr, coefficient * cos(π/4))
        set!(aux_psum, new_pstr, new_coefficient)
    end
    return
end
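
To make the bookkeeping behind this convention concrete, here is a toy model in plain Julia. It is only a sketch under stated assumptions: single-qubit "Pauli strings" are the integers 0 to 3 (I, X, Y, Z), the Pauli sums are ordinary `Dict`s, and the merge at the end is written out by hand. The function `toy_tgate!` is hypothetical and is not how the package implements its Pauli sums.

```julia
# Toy single-qubit model: keys 0, 1, 2, 3 stand for I, X, Y, Z.
function toy_tgate!(psum::Dict{Int,Float64}, aux_psum::Dict{Int,Float64})
    # Iterate over a snapshot of the keys because psum is edited in the loop
    for pauli in collect(keys(psum))
        # I and Z commute: leave them untouched in psum
        (pauli == 0 || pauli == 3) && continue

        coefficient = psum[pauli]
        new_pauli = pauli == 1 ? 2 : 1      # X <-> Y
        sgn = pauli == 1 ? -1.0 : 1.0       # X picks up -Y, Y picks up +X

        psum[pauli] = coefficient * cos(π / 4)               # edit in place
        aux_psum[new_pauli] = sgn * coefficient * sin(π / 4) # new string
    end

    # Assumed final step: merge everything in aux_psum back into psum
    mergewith!(+, psum, aux_psum)
    empty!(aux_psum)
    return psum
end

psum = Dict(1 => 1.0, 3 => 0.5)   # 1.0 * X + 0.5 * Z
toy_tgate!(psum, Dict{Int,Float64}())
println(psum)
```

The `Z` term is never touched, the `X` term is rescaled in place, and only the newly created `Y` term passes through the auxiliary sum. That reduction in data movement is the point of overloading `applytoall!()`.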

In [41]:
@time ourT_psum2 = propagate(ourT_circuit, pstr, thetas; min_abs_coeff=min_abs_coeff);

  0.140407 seconds (23.89 k allocations: 2.942 MiB, 92.05% compilation time: 100% of which was recompilation)


In [42]:
overlapwithzero(ourT_psum2)

0.28722825454884676

In [43]:
ourT_psum == ourT_psum2

true

And check the performance.

In [44]:
@btime propagate($ourT_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  10.550 ms (1034 allocations: 1.48 MiB)


In [45]:
@btime propagate($frozenZ_circuit, $pstr, $thetas; min_abs_coeff=$min_abs_coeff);

  10.577 ms (1034 allocations: 1.48 MiB)


Enjoy defining custom and high-performance gates! 