# Designing and Testing Hardware-Complementary Quantum Circuits

In this notebook, we explore the differences between three specific but equivalent circuits on QuEra's neutral-atom platform.

One key resource in quantum computing is entanglement. What does a maximally entangled state of many qubits look like?

A standard example is the GHZ (Greenberger–Horne–Zeilinger) state. For $N$ qubits, it is defined as:

$$ |GHZ\rangle = \frac{1}{\sqrt{2}} (|0\rangle^{\otimes N} + |1\rangle^{\otimes N}) = \frac{1}{\sqrt{2}} (|00...0\rangle + |11...1\rangle) $$

This state represents a superposition where all qubits are either in the state $|0\rangle$ or all are in the state $|1\rangle$.
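
As a concrete illustration of the definition, the sketch below builds the state vector of an $N$-qubit GHZ state in plain Python. The helper name `ghz_amplitudes` is an illustrative choice and is not part of Bloqade.

```python
import math


def ghz_amplitudes(n_qubits: int) -> list[float]:
    """Return the 2**n_qubits amplitudes of the GHZ state (illustrative helper)."""
    dim = 2**n_qubits
    amps = [0.0] * dim
    amps[0] = 1 / math.sqrt(2)        # amplitude of |00...0>
    amps[dim - 1] = 1 / math.sqrt(2)  # amplitude of |11...1>
    return amps


state = ghz_amplitudes(3)
# Only the all-zeros and all-ones basis states carry weight.
probabilities = [a**2 for a in state]
print(probabilities)
```

Measuring every qubit of this state therefore gives either all zeros or all ones, each with probability 1/2.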

For the purpose of our tutorial, we will consider the case when $N$ is a power of 2. 

Perhaps the simplest way to prepare this state is to apply a single Hadamard gate to one qubit and then extend the entanglement to the remaining qubits, one by one, with CNOT gates.

Let's start by importing the relevant Python packages.

In [2]:
import math
from bloqade.pyqrack import PyQrack

from bloqade import qasm2
from kirin.dialects import ilist

Apply a Hadamard gate to the first qubit of the register, then a chain of CX (CNOT) gates ascending the register.

In [3]:
def ghz_linear():
    n = 3
    n_qubits = int(2**n)

    @qasm2.extended
    def ghz_linear_program():

        qreg = qasm2.qreg(n_qubits)
        # Apply a Hadamard on the first qubit
        qasm2.h(qreg[0])
        # Create a cascading sequence of CX gates
        # necessary for quantum computers that
        # only have nearest-neighbor connectivity between qubits
        for i in range(1, n_qubits):
            qasm2.cx(qreg[i - 1], qreg[i])

    return ghz_linear_program
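
To see why this cascade produces a GHZ state, the sketch below tracks the basis states by hand in plain Python, without Bloqade. It assumes the same gate order as `ghz_linear_program`; the function name `simulate_linear_ghz` and the bitstring representation are illustrative choices.

```python
import math


def simulate_linear_ghz(n_qubits: int) -> dict[str, float]:
    """Track basis-state amplitudes through H on qubit 0 and a CX cascade."""
    amp = 1 / math.sqrt(2)
    # After H on qubit 0: qubit 0 is in (|0> + |1>)/sqrt(2), all others in |0>.
    state = {"0" * n_qubits: amp, "1" + "0" * (n_qubits - 1): amp}
    for i in range(1, n_qubits):
        new_state = {}
        for bits, a in state.items():
            # CX(i-1 -> i): flip qubit i when qubit i-1 is 1.
            if bits[i - 1] == "1":
                flipped = "1" if bits[i] == "0" else "0"
                bits = bits[:i] + flipped + bits[i + 1:]
            # CX only permutes basis states, so no two entries collide here.
            new_state[bits] = a
        state = new_state
    return state


final = simulate_linear_ghz(8)
print(sorted(final))
```

Each CX copies the "1" one position further along the register, so only the all-zeros and all-ones bitstrings remain at the end.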

# Neutral Atom Computing Advantages & Programming Strategies
Neutral-atom quantum computing leverages the flexibility of entangling arbitrary pairs of qubits, high gate parallelization, and a variety of native gates. This leads to a large compilation design space.

Programming strategies for neutral-atom quantum hardware, supported by tools like Bloqade, involve:

* Extending QASM programming with annotations, for loops, and if statements to efficiently represent parallelizable and global gate structures.
   
* Considering specific design rules of neutral-atom systems to optimize circuit layout and performance. 
   
* Using kernel structures to co-design circuits for hardware awareness. 
   
* Using native gate sets like Z-rotations and XY-rotations, with a focus on global operations to minimize errors.
   
* Managing atom shuttling between storage and gate zones for entangling operations, while adhering to rules that prevent atom collisions and maintain order.

# GHZPrep First Optimization
The first optimization is to use global gates instead of local gates wherever possible.
<br>
<br> Notice that if we expand every CNOT in the circuit into a CZ with a Hadamard on the target qubit before and after it, every qubit in the register has a Hadamard gate as its first operation. We can replace all of these single-qubit gates with just one global gate, substituting H with the equivalent U(pi/2, 0, pi) gate.
<br>
There is still more potential for optimization!
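
The equivalence between H and U(pi/2, 0, pi) can be checked numerically. The sketch below assumes the common matrix convention for the U gate (the one used by Qiskit), which agrees with the OpenQASM 2 definition up to a global phase; the helper `u_gate` is illustrative and not a Bloqade function.

```python
import cmath
import math


def u_gate(theta: float, phi: float, lam: float) -> list[list[complex]]:
    """2x2 matrix of U(theta, phi, lambda) in the assumed convention."""
    cos_half = math.cos(theta / 2)
    sin_half = math.sin(theta / 2)
    return [
        [cos_half, -cmath.exp(1j * lam) * sin_half],
        [cmath.exp(1j * phi) * sin_half, cmath.exp(1j * (phi + lam)) * cos_half],
    ]


s = 1 / math.sqrt(2)
hadamard = [[s, s], [s, -s]]

u = u_gate(math.pi / 2, 0.0, math.pi)
# Largest entry-wise deviation between U(pi/2, 0, pi) and H.
max_diff = max(abs(u[r][c] - hadamard[r][c]) for r in range(2) for c in range(2))
print(max_diff < 1e-12)
```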

Now, what happens if we apply this equivalent rewrite to each CNOT gate in the first circuit?

[graphics]

Observe that, quite elegantly, a Hadamard gate appears at the beginning of each qubit line. These can be replaced by a *global* Hadamard gate: a Hadamard gate that acts on all qubits at the same time. In practice it is implemented as a global/parallel U(pi/2, 0, pi) gate, which is the same operation.
<br>
This is one case where QuEra's neutral-atom hardware platform truly shines.

After the global Hadamard gate, we are left with a sequence of controlled-Z (CZ) gates, still arranged in a linear chain connecting adjacent qubits. On each qubit after the first, a Hadamard gate is applied after the qubit has been the target of one CZ and before it acts as the control of the next. While it may seem tempting to "globalize" this second set of Hadamard gates as well, each one is separated from the next by a CZ gate that it does not commute with, so they are left as local gates.

![First optimization](image/IMG_8645.jpg)
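
The rewrite used above relies on the identity CX = (I ⊗ H) · CZ · (I ⊗ H), with the Hadamards acting on the target qubit. A minimal check with explicit 4x4 matrices, written in plain Python with an illustrative `matmul` helper, looks like this:

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


s = 1 / math.sqrt(2)
# Basis order |control, target>: |00>, |01>, |10>, |11>.
h_on_target = [
    [s, s, 0, 0],
    [s, -s, 0, 0],
    [0, 0, s, s],
    [0, 0, s, -s],
]
cz = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]
cx = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

# H on the target, then CZ, then H on the target again.
product = matmul(h_on_target, matmul(cz, h_on_target))
matches = all(abs(product[i][j] - cx[i][j]) < 1e-12
              for i in range(4) for j in range(4))
print(matches)
```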


In [4]:
import math

from bloqade import qasm2
from kirin.dialects import ilist


def ghz_global():
    n = 3
    num_qubits_program = int(2**n)

    @qasm2.extended
    def ghz_global_inner():

        qreg = qasm2.qreg(num_qubits_program)

        # Python's built-in list() cannot be called inside a kernel (it
        # raises a DialectLoweringError), so the qubit list is built with
        # ilist.Map instead. Assumption: this follows the ilist.Map pattern
        # used in Bloqade's GHZ example.
        def get_qubit(x: int):
            return qreg[x]

        all_qubits = ilist.Map(fn=get_qubit, collection=range(num_qubits_program))

        # One global U(pi/2, 0, pi), equivalent to a Hadamard on every qubit
        qasm2.parallel.u(
            qargs=all_qubits,
            theta=math.pi / 2,
            phi=0.0,
            lam=math.pi,
        )

        # Apply CZ and H gates sequentially along the chain
        for i in range(1, num_qubits_program):
            qasm2.cz(qreg[i - 1], qreg[i])
            qasm2.h(qreg[i])

    return ghz_global_inner


# GHZPrep Second Optimization
The GHZ-preparation circuit also has an alternate variant with much lower depth. The depth of a circuit is the number of time steps needed to execute it to completion; deeper circuits generally accumulate more error.
<br><br>
Here the depth can be reduced by changing the order in which the CNOT gates are executed so that some run in parallel: every qubit that is already part of the entangled state acts as the control for a new qubit, so the number of entangled qubits doubles in each layer. The same global-Hadamard optimization applies here as well. The circuit and a diagram are below.
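
To make the layering concrete, the sketch below lists the (control, target) pairs of each layer for $2^n$ qubits in plain Python. The function name `log_depth_layers` and the choice of stride-based indexing are illustrative assumptions about how the layers are laid out on the register.

```python
def log_depth_layers(n: int) -> list[list[tuple[int, int]]]:
    """Return the (control, target) pairs of each layer for 2**n qubits."""
    n_qubits = 2**n
    layers = []
    for layer in range(n):
        # Controls sit `stride` apart; each target is halfway to the next control.
        stride = n_qubits // 2**layer
        pairs = [(ctrl, ctrl + stride // 2) for ctrl in range(0, n_qubits, stride)]
        layers.append(pairs)
    return layers


layers = log_depth_layers(3)
for pairs in layers:
    print(pairs)
# [(0, 4)]
# [(0, 2), (4, 6)]
# [(0, 1), (2, 3), (4, 5), (6, 7)]
```

The pairs within a layer act on disjoint qubits, so they can run in the same time step. For 8 qubits this gives 3 entangling layers, compared with 7 sequential gates in the linear chain.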



In [5]:
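def ghz_parallel():

    n = 3  # the register has 2**n qubits
    num_qubits_program = int(2**n)

    @qasm2.extended
    def ghz_parallel_inner():
        qreg = qasm2.qreg(num_qubits_program)

        # Built-in list() cannot be called inside a kernel, so the qubit
        # list is built with ilist.Map (assumption: same pattern as in
        # Bloqade's GHZ example).
        def get_qubit(x: int):
            return qreg[x]

        all_qubits = ilist.Map(fn=get_qubit, collection=range(num_qubits_program))

        # Apply parallel U gate (a Hadamard on every qubit)
        qasm2.parallel.u(qargs=all_qubits, theta=math.pi / 2, phi=0.0, lam=math.pi)

        # Layer 0: CZ from 0 to N/2. Layer 1: CZ from 0 to N/4 and from
        # N/2 to 3N/4, and so on, where N = 2**n is the number of qubits.
        for layer in range(0, n):
            stride = num_qubits_program // 2**layer
            for gate in range(0, 2**layer):
                ctrl = gate * stride
                targ = ctrl + stride // 2
                qasm2.cz(qreg[ctrl], qreg[targ])
                # Closing Hadamard of the expanded CNOT on the target
                qasm2.h(qreg[targ])

    return ghz_parallel_inner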

![Second optimization](image/IMG_8646.jpg)

Note: The Benchmarker does not work yet, because the noise model and the data-graphing scripts have not been implemented.

In [6]:
# Testing code with the Circuit Benchmarker
from Benchmarker import Benchmark

linear_tester = Benchmark(ghz_linear())
global_tester = Benchmark(ghz_global())
parallel_tester = Benchmark(ghz_parallel())

def main():
    linear_tester.run_benchmark()
    global_tester.run_benchmark()
    parallel_tester.run_benchmark()

b = PyQrack()
b.run(main)

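An earlier run of this cell failed with the error below. The traceback excerpt comes from `ghz_global_inner`, and the callee type `<class 'list'>` indicates that a call to Python's built-in `list(...)` inside a `@qasm2.extended` kernel could not be lowered.

DialectLoweringError: unsupported callee type: <class 'list'>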

# Apply CZ and H gates sequentially
# Loop up to num_qubits_program
for i in range(1, num_qubits_program):
--------^
    qasm2.cz(qreg[i - 1], qreg[i])
    qasm2.h(qreg[i])
