# Introduction to Compiling with BQSKit
---
![bqskit](https://bqskit.lbl.gov/wp-content/uploads/sites/6/2021/02/BQSKit_header-1.png)

The Berkeley Quantum Synthesis Toolkit (BQSKit) is a powerful and portable quantum compiler framework. This tutorial explores how to use BQSKit to compile quantum programs to efficient physical circuits for any QPU. A standard workflow utilizing BQSKit consists of loading a program into the framework, modeling the target QPU, compiling the program, and exporting the resulting circuit. The first section of this tutorial covers the basics of this workflow. Section two explores QPU modeling and describes the built-in machine models, while the third and final section details fine-tuning compilation parameters for better performance and result quality.

To **install** BQSKit and other required packages, follow the instructions in the [README](https://github.com/BQSKit/bqskit-tutorial/blob/main/README.md).

## Section 1: Getting Started

### Loading a Circuit

Before we can compile a program with BQSKit, we first need to load one. There are various ways to accomplish this, but the most common option is to load the circuit from a QASM file:

In [1]:
from bqskit import Circuit
circuit = Circuit.from_file('qasm/heisenberg16.qasm')

We have included many circuits as part of this tutorial series in the `qasm` subdirectory. Feel free to use any of these or one of your own programs interchangeably throughout the tutorial.

If you have another popular circuit framework installed on your system, then the BQSKit extension package, `bqskit.ext`, will automatically detect it and add transfer support for it:

In [2]:
# Build a circuit in Qiskit
from qiskit import QuantumCircuit
qc = QuantumCircuit(2)
qc.h(0)
qc.cx(0, 1)  # `cx` is the CNOT method; the `cnot` alias was removed in newer Qiskit releases

# Load it into BQSKit
from bqskit.ext import qiskit_to_bqskit
bqskit_circuit = qiskit_to_bqskit(qc)

In the above example, `qiskit_to_bqskit` could have been replaced with `qutip_to_bqskit`, `cirq_to_bqskit`, or `pytket_to_bqskit` when coming from the respective framework.

Now that we have a circuit loaded, let's print some information about it.

In [3]:
print("Circuit Statistics")
print("Gate Counts:", circuit.gate_counts)
print("Logical Connectivity:", circuit.coupling_graph)

Circuit Statistics
Gate Counts: {RZGate: 180, CNOTGate: 360, XGate: 8, RXGate: 240, HGate: 240}
Logical Connectivity: CouplingGraph({(0, 1), (9, 10), (13, 14), (1, 2), (10, 11), (3, 4), (12, 13), (2, 3), (6, 7), (4, 5), (8, 9), (11, 12), (5, 6), (14, 15), (7, 8)})


The `gate_counts` property reports how many times each gate appears in the circuit. Additionally, the `coupling_graph` property gives us the logical connectivity of the circuit, that is, the set of qubit pairs that share at least one multi-qubit gate.
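
As a quick sanity check on the connectivity printed above, the following standard-library sketch confirms that those 15 edges form a nearest-neighbor chain over 16 qubits. The edge set is copied from the output above; the helper `is_nearest_neighbor_chain` is a hypothetical name for illustration and is not part of BQSKit.

```python
# Edges copied from the "Logical Connectivity" output above.
edges = {
    (0, 1), (9, 10), (13, 14), (1, 2), (10, 11), (3, 4), (12, 13), (2, 3),
    (6, 7), (4, 5), (8, 9), (11, 12), (5, 6), (14, 15), (7, 8),
}


def is_nearest_neighbor_chain(edges, num_qubits):
    """Return True if the edges are exactly (0,1), (1,2), ..., (n-2, n-1)."""
    # Sort each pair so that (b, a) and (a, b) count as the same edge.
    normalized = {tuple(sorted(edge)) for edge in edges}
    expected = {(i, i + 1) for i in range(num_qubits - 1)}
    return normalized == expected


print(is_nearest_neighbor_chain(edges, 16))
```

A chain like this is the least demanding connectivity a 16-qubit program can have, which is worth keeping in mind when the circuit is later mapped onto restricted hardware topologies.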

### Compiling and Optimizing Circuits

To run a circuit on a specific QPU, we need to ensure the types of gates in the circuit can be executed natively by the QPU and that the logical connectivity in the circuit matches the hardware-provided physical connectivity. More crucially, we must overcome these two restrictions in the fewest number of total operations. Fewer operations require fewer resources. In the NISQ era, this implies less error and a greater chance of success; while in the fault-tolerant era, this will imply fewer physical resources and faster runtimes.

These goals are called retargeting, mapping, and optimization. BQSKit's `compile` function accomplishes all three in one function call:

<div class="alert alert-warning">
<b>NOTE</b> BQSKit's compile function is executed in parallel across your entire system. If you would like to disable this, you can set <i>num_workers=1</i> to execute only on one CPU.
</div>

In [4]:
# Compile the circuit
from bqskit import compile
out_circuit = compile(circuit)

# Print new statistics
print("Compiled Circuit Statistics")
print("Gate Counts:", out_circuit.gate_counts)
print("Logical Connectivity:", out_circuit.coupling_graph)

Compiled Circuit Statistics
Gate Counts: {U3Gate: 525, CNOTGate: 338}
Logical Connectivity: CouplingGraph({(0, 1), (9, 10), (13, 14), (1, 2), (10, 11), (3, 4), (12, 13), (2, 3), (6, 7), (4, 5), (8, 9), (11, 12), (5, 6), (14, 15), (7, 8)})


The logical connectivity of the circuit may or may not have changed in this example, but one thing to notice is that the number of two-qubit gates has most likely gone down. We can also see that all the single-qubit gates have been converted into `U3Gate`s. This is because the `compile` function, by default, targets a machine with all-to-all connectivity and a native gate set including only CNOT gates and U3 gates. We can change the target gate set and connectivity by telling the `compile` function to target a different machine model. This will be explored in detail in the next section.

#### Optimization Levels

The `compile` function has different levels of optimization. Each successive level applies additional algorithms to reduce gate counts further while rebasing and mapping the circuit correctly. The `optimization_level` parameter specifies the level used in compilation. There are four levels: 1, 2, 3, and 4. There is no 0th level since the core synthesis and instantiation algorithms in BQSKit all have the potential to reduce gate counts. Level one is the default and focuses on returning a hardware-compliant circuit in minimal time. Levels two and three apply additional optimizations to reduce gate counts further. Level four aims to produce the best circuit possible without domain-specific information.

In [5]:
# Compile the circuit with optimization level 2
out_circuit = compile(circuit, optimization_level=2)

# Print new statistics
print("Compiled Circuit Statistics")
print("Gate Counts:", out_circuit.gate_counts)
print("Connectivity:", out_circuit.coupling_graph)

Compiled Circuit Statistics
Gate Counts: {U3Gate: 304, CNOTGate: 322}
Connectivity: CouplingGraph({(0, 1), (9, 10), (13, 14), (1, 2), (10, 11), (3, 4), (12, 13), (2, 3), (6, 7), (4, 5), (8, 9), (11, 12), (5, 6), (14, 15), (7, 8)})


You may notice that higher levels of optimization require more compile time. The core algorithms are embarrassingly parallel; in Section 3, we learn how to scale the workload by distributing the standard compilation across a cluster.

With advanced usage of BQSKit, you can build domain-tailored workflows to navigate the trade-offs between compile time and quality in your desired way. Workflow construction is highly situational and out of scope for this tutorial; however, later tutorials will provide examples.

### Exporting a program

When you have finished compiling with BQSKit, you can save a circuit as a qasm file or export it to another framework:

In [6]:
# Save the compiled circuit to a QASM file
out_circuit.save('heisenberg16_out.qasm')

# Convert the compiled circuit to another framework
from bqskit.ext import bqskit_to_qiskit
qc = bqskit_to_qiskit(out_circuit)

Similar to before, `bqskit_to_qiskit` could be replaced with `bqskit_to_qutip`, `bqskit_to_cirq`, or `bqskit_to_pytket` depending on the desired framework.

## Section 2: Portability across Hardware via MachineModels

BQSKit is a highly portable framework supporting a wide variety of quantum processors. The standard compilation workflow can be targeted to a specific QPU by modeling its restrictions. This section demonstrates this feature by building `MachineModel` objects and programming the compiler with them.

![image](https://d1.awsstatic.com/re19/Braket/Rigetti-Scalable-Aspen-chip-architecture.a29e3e5cbc4f7a48ea99f9e0aace6368de395855.png)

Rigetti's M2 QPU, pictured above, uses a gate set containing CZ gates and is fully supported by BQSKit along with any other QPU.

### Gate Set Targeting

The core of BQSKit's portability is the `MachineModel` object, which represents a target machine and its constraints. This includes the QPU's qubit count, connectivity, and supported native gate set. In the following few examples, we build a new gate set, create a machine model with it, and then compile the circuit targeting our new model.

In [7]:
# Building Rigetti's native gate set
from bqskit.ir.gates import CZGate, RZGate, SXGate
gate_set = {CZGate(), RZGate(), SXGate()} 

# Build a MachineModel with this gate set
# and the same number of qubits as the circuit
from bqskit import MachineModel
model = MachineModel(circuit.num_qudits, gate_set=gate_set)

# Compile again and print new gate counts
out_circuit = compile(circuit, model=model)
print("Gate Counts:", out_circuit.gate_counts)

Gate Counts: {RZGate: 1404, CZGate: 226, SqrtXGate: 936}


This time, the compilation resulted in a circuit with gates only from the gate set of the model. Let's try another real gate set:

In [8]:
# Define a function to save space
def compile_to_gateset(circuit, gate_set):
    model = MachineModel(circuit.num_qudits, gate_set=gate_set)
    return compile(circuit, model=model)

# Building a gate set similar to Quantinuum's
from bqskit.ir.gates import ZZGate
quantinuum_like_gate_set = {ZZGate(), RZGate(), SXGate()}

out_circuit = compile_to_gateset(circuit, quantinuum_like_gate_set)
print(out_circuit.gate_counts)

{RZGate: 1416, SqrtXGate: 944, ZZGate: 228}


Any gate set that can express the input circuit will work without any other restriction or user input:

In [9]:
# Completely custom gate set
from bqskit.ir.gates import RYYGate
custom_gate_set = {RYYGate(), RZGate(), SXGate()}

out_circuit = compile_to_gateset(circuit, custom_gate_set)
print(out_circuit.gate_counts)

{RZGate: 768, SqrtXGate: 512, RYYGate: 120}


Even gate sets containing multiple entangling gates or gates acting on 3 or more qudits are supported.

In [10]:
# Gate set with multiple entangling gates
from bqskit.ir.gates import SqrtISwapGate, SycamoreGate
cirq_gate_set = {SqrtISwapGate(), SycamoreGate(), CZGate(), RZGate(), SXGate()}

out_circuit = compile_to_gateset(circuit, cirq_gate_set)
print(out_circuit.gate_counts)

{RZGate: 1542, SycamoreGate: 80, SqrtISwapGate: 96, SqrtXGate: 1028, CZGate: 73}


Different gate sets can lead to drastically different results. This can be very helpful in deciding which computer to run an experiment on.
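
To make that comparison concrete, the sketch below tallies the two-qubit gates from the four outputs printed above. The numbers are copied from this particular run and will differ between runs; the dictionary keys and the helper `two_qubit_total` are hypothetical names chosen for illustration.

```python
# Gate counts copied from the four compilation outputs above.
results = {
    "CZ": {"RZGate": 1404, "CZGate": 226, "SqrtXGate": 936},
    "ZZ": {"RZGate": 1416, "SqrtXGate": 944, "ZZGate": 228},
    "RYY": {"RZGate": 768, "SqrtXGate": 512, "RYYGate": 120},
    "mixed": {
        "RZGate": 1542, "SycamoreGate": 80, "SqrtISwapGate": 96,
        "SqrtXGate": 1028, "CZGate": 73,
    },
}

# The only single-qubit gates that appear in these gate sets.
SINGLE_QUBIT = {"RZGate", "SqrtXGate"}


def two_qubit_total(counts):
    """Sum the counts of every gate that is not a known single-qubit gate."""
    return sum(n for gate, n in counts.items() if gate not in SINGLE_QUBIT)


totals = {name: two_qubit_total(counts) for name, counts in results.items()}
for name, total in sorted(totals.items(), key=lambda item: item[1]):
    print(f"{name}: {total} two-qubit gates")
```

Counting by gate name keeps the sketch free of any BQSKit dependency; with a live circuit, filtering `gate_counts` on each gate's `num_qudits` (as Exercise 2 does) is the more robust approach.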

#### Exercise 1: Define and compile to your own gate set

Try your hand at designing a gate set using the gates supported by BQSKit. Modify the above code to compile a circuit to a gate set of your choice. You can refer to the [documentation](https://bqskit.readthedocs.io/en/latest/source/ir.html#bqskit-gates-bqskit-ir-gates) to find all the built-in gates. If you cannot find your gate in that list and it is constant (not parameterized, or always used with a fixed parameter), then you can use the [`ConstantUnitaryGate`](https://bqskit.readthedocs.io/en/latest/source/autogen/bqskit.ir.gates.ConstantUnitaryGate.html#bqskit.ir.gates.ConstantUnitaryGate) and pass the unitary directly. A later tutorial discusses how to define custom gates, which is necessary for parameterized ones. Try to find the gate set that leads to the shortest circuit.

In [11]:
# The PauliGate parametrizes all possible qubit unitaries of a specified size
# Using the following formulation:
# U = exp(-i * sum_j (theta_j * P_j)) where P_j is a Pauli operator
from bqskit.ir.gates import PauliGate
custom_gate_set = {PauliGate(2), RZGate(), SXGate()}

out_circuit = compile_to_gateset(circuit, custom_gate_set)
print(out_circuit.gate_counts)

{PauliGate(2): 112, RZGate: 720, SqrtXGate: 480}


Since the `PauliGate` is a `GeneralGate`, a gate that parameterizes all unitaries for a given dimension, we should expect it to require the fewest two-qubit gates. This expectation holds because such gates are far more expressive than, for example, a `CNOTGate`. However, they may not be available on most hardware, and where they are, they may cost more in time or noise. Often, it may be better to use instructions that are as low-level as possible, which can sometimes be seemingly random unitaries:

In [12]:
from bqskit.ir.gates import ConstantUnitaryGate
from bqskit.qis import UnitaryMatrix
for i in range(10):
    two_qubit_gate = ConstantUnitaryGate(UnitaryMatrix.random(2))
    custom_gate_set = {two_qubit_gate, RZGate(), SXGate()}

    out_circuit = compile_to_gateset(circuit, custom_gate_set)
    print(f"Circuit compiled to random gate {i} required {out_circuit.count(two_qubit_gate)} two-qubit gates.")

Circuit compiled to random gate 0 required 344 two-qubit gates.
Circuit compiled to random gate 1 required 268 two-qubit gates.
Circuit compiled to random gate 2 required 341 two-qubit gates.
Circuit compiled to random gate 3 required 354 two-qubit gates.
Circuit compiled to random gate 4 required 266 two-qubit gates.
Circuit compiled to random gate 5 required 339 two-qubit gates.
Circuit compiled to random gate 6 required 355 two-qubit gates.
Circuit compiled to random gate 7 required 258 two-qubit gates.
Circuit compiled to random gate 8 required 238 two-qubit gates.
Circuit compiled to random gate 9 required 248 two-qubit gates.


Often, hardware developers fix the parameters of an interaction at specific angles because fixed gates are easier to work with at the algorithm level. BQSKit allows us to decouple these concepts:

In [13]:
from bqskit.ir.gates import RXXGate, FrozenParameterGate, U3Gate
# output from `np.arange(np.pi/6, np.pi*2, np.pi/6)`
# hardcoded to remove dependency on numpy
angles = [
    0.5235987755982988,
    1.0471975511965976,
    1.5707963267948966,
    2.0943951023931953,
    2.617993877991494,
    # 3.141592653589793, # omitted pi
    3.665191429188092,
    4.1887902047863905,
    4.71238898038469,
    5.235987755982988,
    5.759586531581287,
]
for angle in angles:
    two_qubit_gate = FrozenParameterGate(RXXGate(), {0: angle})
    custom_gate_set = {two_qubit_gate, U3Gate()}

    out_circuit = compile_to_gateset(circuit, custom_gate_set)
    print(f"Circuit compiled to {out_circuit.count(two_qubit_gate)} XX({angle}) gates.")

Circuit compiled to 237 XX(0.5235987755982988) gates.
Circuit compiled to 247 XX(1.0471975511965976) gates.
Circuit compiled to 226 XX(1.5707963267948966) gates.
Circuit compiled to 244 XX(2.0943951023931953) gates.
Circuit compiled to 240 XX(2.617993877991494) gates.
Circuit compiled to 239 XX(3.665191429188092) gates.
Circuit compiled to 251 XX(4.1887902047863905) gates.
Circuit compiled to 226 XX(4.71238898038469) gates.
Circuit compiled to 251 XX(5.235987755982988) gates.
Circuit compiled to 239 XX(5.759586531581287) gates.
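
The hardcoded angle list in the cell above can also be regenerated with only the standard library. The sketch below assumes the intent described in the cell's comments: multiples of $\pi/6$ strictly between 0 and $2\pi$, with $\pi$ itself left out. Values may differ from the hardcoded ones in the last floating-point digit.

```python
import math

# Multiples of pi/6 in the open interval (0, 2*pi).
# k = 6 corresponds to pi, which the cell above omits.
angles = [k * math.pi / 6 for k in range(1, 12) if k != 6]

for angle in angles:
    print(angle)
```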


### Topology Mapping

A `MachineModel` is also used to encode hardware connectivity. We define a model's coupling graph with an edge list, where every edge indicates a valid position for a two-qubit gate. In the next example, we construct a 16-qubit model composed of two eight-qubit stars with a single link between them.

The example below demonstrates mapping the default 16-qubit Heisenberg program. If you have loaded a different quantum circuit, make sure to change the coupling graph and the machine model's qubit count accordingly.

In [14]:
coupling_graph = [
    (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7),
    (8, 9), (8, 10), (8, 11), (8, 12), (8, 13), (8, 14), (8, 15),
    (0, 8),
]
model = MachineModel(16, coupling_graph=coupling_graph)
out_circuit = compile(circuit, model=model)
print("Gate Counts:", out_circuit.gate_counts)
print("Connectivity:", out_circuit.coupling_graph)

Gate Counts: {U3Gate: 1561, CNOTGate: 595}
Connectivity: CouplingGraph({(0, 1), (0, 7), (8, 14), (0, 4), (0, 3), (8, 10), (0, 6), (8, 13), (0, 2), (8, 9), (0, 5), (8, 12), (0, 8), (8, 15), (8, 11)})


Notice that the output circuit has a logical connectivity that matches the model's coupling graph:

In [15]:
assert model.is_compatible(out_circuit)
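
The double-star edge list above can also be generated rather than typed out, and the connectivity part of the compatibility check can be reproduced by hand. The sketch below uses only the standard library; `star` and `respects_connectivity` are hypothetical helpers, and the check is a simplification that looks at edges only (the real `is_compatible` method may verify more, such as the gate set and qubit count).

```python
def star(center, leaves):
    """Edges connecting one center qubit to each leaf qubit."""
    return [(center, leaf) for leaf in leaves]


# Two eight-qubit stars centered on qubits 0 and 8, plus one link between them.
coupling_graph = star(0, range(1, 8)) + star(8, range(9, 16)) + [(0, 8)]


def respects_connectivity(circuit_edges, model_edges):
    """True if every edge used by the circuit is allowed by the model."""
    # frozenset makes the comparison independent of qubit order in a pair.
    allowed = {frozenset(edge) for edge in model_edges}
    return all(frozenset(edge) in allowed for edge in circuit_edges)


# Connectivity of the compiled circuit, copied from the output above.
compiled_edges = {
    (0, 1), (0, 7), (8, 14), (0, 4), (0, 3), (8, 10), (0, 6), (8, 13),
    (0, 2), (8, 9), (0, 5), (8, 12), (0, 8), (8, 15), (8, 11),
}
# Connectivity of the original, unmapped program: a 16-qubit chain.
chain_edges = [(i, i + 1) for i in range(15)]

print(respects_connectivity(compiled_edges, coupling_graph))
print(respects_connectivity(chain_edges, coupling_graph))
```

The second check fails because the original chain uses pairs such as (1, 2) that have no direct link in the double-star topology, which is why mapping has to insert extra operations.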

#### Placement Information

Whenever mapping is performed, the order of the qubits may be permuted to save on gate counts. If the initial circuit before compilation has measurements included, the measurements will also be permuted to ensure you do not need to permute bit string outputs. The standard compiler workflow will ensure that the appropriate qubits are measured into the correct classical bits. However, sometimes we would like more detailed information about how the logical circuit was placed on the physical chip. There is a `compile` option that achieves this result:

In [16]:
out_circuit, initial_mapping, final_mapping = compile(circuit, model=model, with_mapping=True)
print("Initial Mapping:", initial_mapping)
print("Final Mapping:", final_mapping)

Initial Mapping: [7, 3, 0, 6, 5, 4, 2, 1, 8, 9, 13, 10, 11, 12, 14, 15]
Final Mapping: [1, 7, 4, 5, 2, 3, 6, 0, 12, 9, 14, 13, 10, 15, 11, 8]


When the `with_mapping` flag is set to `True`, the `compile` function will additionally output two tuples. The first captures the initial placement or mapping of logical qudits to physical qudits. The initial mapping is a tuple where `initial_mapping[i] = j` implies that logical qudit `i` in the input system starts on the physical qudit `j` in the output circuit. Likewise, the final mapping describes where the logical qudits are in the physical circuit at the end of execution. Therefore, the following is true in the context of the above example:

In [17]:
print(f"In out_circuit, the first logical qubit starts execution on physical qubit {initial_mapping[0]} and ends on physical qubit {final_mapping[0]}.")

In out_circuit, the first logical qubit starts execution on physical qubit 7 and ends on physical qubit 1.
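
Both mappings go from logical to physical qubits, so answering the reverse question (which logical qubit sits on a given physical qubit) requires inverting them. The sketch below does this with the values printed above; `invert` is a hypothetical helper, not a BQSKit function.

```python
# Mappings copied from the output above (logical index -> physical qubit).
initial_mapping = [7, 3, 0, 6, 5, 4, 2, 1, 8, 9, 13, 10, 11, 12, 14, 15]
final_mapping = [1, 7, 4, 5, 2, 3, 6, 0, 12, 9, 14, 13, 10, 15, 11, 8]


def invert(mapping):
    """Turn a logical -> physical mapping into a physical -> logical one."""
    inverse = [None] * len(mapping)
    for logical, physical in enumerate(mapping):
        inverse[physical] = logical
    return inverse


physical_to_logical_start = invert(initial_mapping)
physical_to_logical_end = invert(final_mapping)

print("Physical qubit 0 starts with logical qubit", physical_to_logical_start[0])
print("Physical qubit 0 ends with logical qubit", physical_to_logical_end[0])
```

The inverse of the final mapping is the one to use when relating raw physical measurement positions back to logical qubits, for example if measurements are added to the circuit only after compilation.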


### Pre-built Models

Included in the BQSKit extensions package are pre-built `MachineModel`s and `MachineModel`-factories for many QPUs [1, 2, 3, 4]. They are listed below:

In [18]:
# Quantinuum QPU Models
from bqskit.ext import H1_1Model
from bqskit.ext import H1_2Model

# Google Sycamore QPU Models
from bqskit.ext import Sycamore23Model
from bqskit.ext import SycamoreModel

# Factory for converting Qiskit `Backend` objects into `MachineModels`
from bqskit.ext import model_from_backend

# Rigetti QPU Models
from bqskit.ext import Aspen11Model
from bqskit.ext import AspenM2Model

#### Exercise 2: Discover the best model for your circuit

Having a powerful and fully portable compiler enables you to choose the best QPU available today for your specific use case. Try compiling your circuit to various `MachineModel`s and find which one produces the best result. You can use the built-in models above or even define your own custom model.

In [19]:
for model_name, model in [
    ("H1_2", H1_2Model),
    ("Sycamore23", Sycamore23Model),
    ("AspenM2", AspenM2Model),
]:
    out_circuit = compile(circuit, model=model, optimization_level=2)
    two_qubit_gate_counts = sum(c for g,c in out_circuit.gate_counts.items() if g.num_qudits >= 2)
    print(f"{model_name} Two Qubit Gate Count: {two_qubit_gate_counts}")

H1_2 Two Qubit Gate Count: 226
Sycamore23 Two Qubit Gate Count: 250
AspenM2 Two Qubit Gate Count: 232


BQSKit can help you produce high-quality circuits for any QPU, but the analysis doesn't just stop with gate counts. In a real experimental setting, you should consider gate fidelities and other hardware statistics because not all gates are equal. For example, Google's `SycamoreGate` may require more gates (or not), but it is also significantly faster than other instructions.

## Section 3: Under the hood: Synthesis, Verification, and Scaling

Under the hood, BQSKit performs many numerical instantiation and synthesis subroutines to compile an input circuit to a target chip. Several parameters control these subroutines and can improve performance or quality beyond what the optimization level alone offers. This section dives into these trade-offs, starting with synthesis. We then discuss compilation error and verification, and finish the tutorial by explaining how you can distribute BQSKit across a cluster to parallelize a compilation workload efficiently.

### Synthesis

Numerical quantum synthesis, as used in BQSKit, is the process of implementing a quantum circuit from a given unitary matrix, which completely describes the target operation. You can use the standard `compile` function to synthesize a unitary by simply passing one in:

In [20]:
toffoli = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1, 0],
]
toffoli_circuit = compile(toffoli, optimization_level=3)
print(toffoli_circuit.gate_counts)

{U3Gate: 10, CNOTGate: 6}
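
Typing an 8x8 matrix by hand is error-prone, so here is a minimal standard-library sketch that builds the same Toffoli matrix programmatically: start from the identity and swap the last two rows, which exchanges the basis states $|110\rangle$ and $|111\rangle$. The resulting nested list has the same form as the literal passed to `compile` above.

```python
num_qubits = 3
dim = 2 ** num_qubits

# Start from the 8x8 identity matrix.
toffoli = [[1 if row == col else 0 for col in range(dim)] for row in range(dim)]

# Swap the last two rows: the target qubit flips only when both controls are 1.
toffoli[-2], toffoli[-1] = toffoli[-1], toffoli[-2]

for row in toffoli:
    print(row)
```

The same pattern extends to more controls by increasing `num_qubits`, although directly synthesizing wider unitaries quickly becomes expensive.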


The Berkeley Quantum **Synthesis** Toolkit (BQSKit) has support for synthesis that goes far beyond this example. For example, you can also pass in a [`StateVector`](https://bqskit.readthedocs.io/en/latest/source/autogen/bqskit.qis.StateVector.html#bqskit.qis.StateVector) or a [`StateSystem`](https://bqskit.readthedocs.io/en/latest/source/autogen/bqskit.qis.StateSystem.html#bqskit.qis.StateSystem). Additionally, you can easily and quickly select from a variety of state-of-the-art algorithms, define custom synthesis functions, swap out a cost function, integrate domain-specific knowledge, and much more. All of these features will be explored in another tutorial.

The bottom-up synthesis and optimization algorithms in BQSKit are built on top of numerical instantiation. In this process, a numerical optimizer is employed to find the best gate parameters for a specific circuit to minimize the distance to a target matrix. The numerical aspect of this subroutine enables BQSKit to be flexible; however, these algorithms grow exponentially challenging as the system size increases.

To combat this, BQSKit uses circuit partitioning. Partitioning drastically improves performance on medium-sized inputs while enabling instantiation-based compilation of very large ones. Large circuits have their gates grouped into blocks of configurable size, and instantiation-based algorithms are executed only at the block level.

![images/instantiation-unpart.png](images/instantiation-unpart.png)

For example, the above circuit may be partitioned into three-qubit blocks given below:

![images/instantiation-parted.png](images/instantiation-parted.png)

The parameter `max_synthesis_size` determines the maximum width of any block passed to instantiation or synthesis during compilation. It is set to three by default, which is a sweet spot for most use cases. Larger sizes will lead to better results with an exponential time trade-off.

As a note, directly passing unitaries larger than the maximum size to the compile function will log an error. To directly synthesize larger unitaries, you must increase this parameter accordingly.

**The block size 4 example below is very computationally intensive.**

In [21]:
# Block size 4
out_circuit = compile(circuit, max_synthesis_size=4)
print("Gate Counts:", out_circuit.gate_counts)

Gate Counts: {U3Gate: 633, CNOTGate: 257}


In [22]:
# Block size 2
out_circuit = compile(circuit, max_synthesis_size=2)
print("Gate Counts:", out_circuit.gate_counts)

Gate Counts: {U3Gate: 506, CNOTGate: 360}


### Approximate Compilation and Verification

During compilation, BQSKit uses bottom-up synthesis, where circuits are built up one gate at a time. Synthesis is successful when the distance between the target unitary matrix and the one implemented by the circuit is less than some epsilon. In general, the distance function is configurable, but the standard compilation workflows use the following distance function based on the Hilbert-Schmidt inner product:

$$\Delta(U_T, U_C) = 1 - \frac{|Tr(U_T^\dagger U_C)|}{2^n}$$

When an $n$-qubit circuit implementing the unitary $U_C$ satisfies $\Delta(U_T, U_C) < \epsilon$ for the target unitary $U_T$, synthesis is successful. By default, the compilation workflow does not set $\epsilon$ to zero, and as a result, the standard compilation workflow is approximate. The default $\epsilon$ is $10^{-8}$.
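
To get a feel for this distance, the following sketch evaluates the formula above on small matrices using only the standard library. It is an illustration of the formula as written in this tutorial, not BQSKit's internal implementation; `hs_distance` is a hypothetical helper that takes square matrices as nested lists.

```python
import cmath


def hs_distance(u_target, u_circuit):
    """Compute 1 - |Tr(U_T^dagger U_C)| / dim, where dim = 2**n."""
    dim = len(u_target)
    # Tr(A^dagger B) equals the sum over all entries of conj(A[i][j]) * B[i][j].
    trace = sum(
        complex(u_target[i][j]).conjugate() * u_circuit[i][j]
        for i in range(dim)
        for j in range(dim)
    )
    return 1 - abs(trace) / dim


identity = [[1, 0], [0, 1]]
pauli_x = [[0, 1], [1, 0]]
phase = cmath.exp(0.3j)
phased_identity = [[phase, 0], [0, phase]]

print(hs_distance(identity, identity))         # identical unitaries
print(hs_distance(identity, phased_identity))  # differ only by a global phase
print(hs_distance(identity, pauli_x))          # trace of the product is zero
```

Because of the absolute value around the trace, two unitaries that differ only by a global phase are at distance (numerically) zero, while the identity and the Pauli X gate sit at the maximum distance of one.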

When compiling small circuits, this value will cap the final $\Delta$ between the input and output circuits. However, computing $\Delta$ for large circuits is intractable due to the exponential size of the unitary with respect to the number of qubits in the circuit. In order to verify compilation in the general case, we have adapted and integrated methods from the QUEST algorithm [5] to compute an upper bound on circuit error. In practice, this upper bound can be loose; nevertheless, it describes the maximum error introduced by compilation and can be used to verify results.

The `compile` function has "push-button" verification with three parameters that control the approximation and verification mechanisms: 

The `synthesis_epsilon` parameter controls the $\epsilon$ value used during instantiation. This indirectly controls the amount of error introduced during the complete compilation process. Larger values will likely lead to more error in the final output, with the potential of speeding up compilation and reducing gate counts even further. Values closer to zero will produce more accurate results at the cost of compile time and gate count.

The `error_threshold` and `error_sim_size` parameters control compilation verification. By default, no verification is done. If you set the `error_threshold` to a number between 0 and 1, the compiler will calculate an upper bound on $\Delta$ and record a warning if it is greater than the `error_threshold`. Verification is not free. Under the hood, large blocks are formed within the circuit, and their errors are calculated and summed together. The larger the block size used for verification, the tighter the upper bound, but simulating larger blocks is exponentially costly in time. The `error_sim_size` parameter controls the block size used for verification, which is set to 8 by default.

![images/instantiation-verf.png](images/instantiation-verf.png)

The examples below demonstrate the use of these parameters. First, we demonstrate a compilation with large error and the associated warning when verification is used:

In [23]:
# High-error compilation
out_circuit = compile(circuit, synthesis_epsilon=0.5, error_threshold=0)



Increasing synthesis epsilon will lead to more compilation error, but it is not always the case that shorter circuits will be produced. In the below example, we increase the synthesis epsilon significantly but do not see a significant improvement. Your specific circuit may be different, and it is worth testing. Experimentally, we have seen significant benefits here; it is just not always the case.

In [24]:
out_circuit = compile(circuit, synthesis_epsilon=1e-3, error_threshold=5e-2)
print(out_circuit.gate_counts)

{U3Gate: 517, CNOTGate: 344}


The error upper bound is directly related to the number of partitions in a circuit. As the input circuit increases in depth and width, you may need to lower `synthesis_epsilon` to lower the compiler error or increase the `error_sim_size` to tighten the upper bound. In practice, these upper bounds can be pretty loose, but there is value in ensuring that compilation is correct.
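
The relationship between partition count and the error bound can be sketched with simple arithmetic. The example below assumes the bound is the plain sum of per-block distances, as described earlier in this section, and uses made-up block counts; `error_upper_bound` and `passes_verification` are hypothetical helpers, and the actual verification pass may compute the bound differently.

```python
def error_upper_bound(block_errors):
    """Upper bound on total error, assumed here to be the sum of block errors."""
    return sum(block_errors)


def passes_verification(block_errors, error_threshold):
    """True when the bound does not exceed the threshold (no warning expected)."""
    return error_upper_bound(block_errors) <= error_threshold


# Hypothetical worst case: 200 blocks, each exactly at the default epsilon.
block_errors = [1e-8] * 200

print(error_upper_bound(block_errors))
print(passes_verification(block_errors, error_threshold=1e-5))
print(passes_verification(block_errors, error_threshold=1e-8))
```

Under this assumption the bound grows linearly with the number of blocks, so a threshold equal to the per-block epsilon is already exceeded once the circuit has more than one block at that error level.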

#### Exercise 3: Verify Your Compilation

Pick a compilation you previously tried during this tutorial, and repeat it while verifying the result this time.

In [25]:
for model_name, model in [
    ("H1_2", H1_2Model),
    ("Sycamore23", Sycamore23Model),
    ("AspenM2", AspenM2Model),
]:
    out_circuit = compile(circuit, model=model, optimization_level=2, error_threshold=1e-8)
    two_qubit_gate_counts = sum(c for g,c in out_circuit.gate_counts.items() if g.num_qudits >= 2)
    print(f"{model_name} Two Qubit Gate Count: {two_qubit_gate_counts}")

H1_2 Two Qubit Gate Count: 226
Sycamore23 Two Qubit Gate Count: 272
AspenM2 Two Qubit Gate Count: 224


### Scaling Compilation: Tens of Qubits to Thousands of Qubits

BQSKit will quickly partition very large circuits into blocks determined by the `max_synthesis_size` and then distribute the compilation process over a [BQSKit Runtime](https://bqskit.readthedocs.io/en/latest/source/runtime.html). By default, whenever `compile` is called, a cluster is first started on your local machine and then the compilation continues. This usually adds a few seconds of overhead as worker processes and threads are started. However, you can also launch your own cluster, connect to it, and compile with it.

The steps are:
1. Launch your cluster. Refer to the Runtime [documentation](https://bqskit.readthedocs.io/en/latest/source/runtime.html) and [guide](#TODO). If you plan on using a supercomputer or cluster with Slurm support, the guide has a handy Slurm script for reference.
2. Connect to it. The `compile` function will pass all extra arguments and keyword arguments to the relevant [`Compiler`](https://bqskit.readthedocs.io/en/latest/source/autogen/bqskit.compiler.Compiler.html#bqskit.compiler.Compiler) constructor, which can be used to connect to and configure a runtime cluster. Usually, you will use the `ip` and/or `port` keywords to connect to an already running cluster.
3. Compile. Once connected, the `compile` function will do the rest.

#### Compiling Many Circuits Faster

If you plan to compile many inputs one after another on your local machine, you can save a lot of time by caching a `Compiler` object. By default, a new `Compiler` is created every time the `compile` function is called, which has a nontrivial overhead. However, a `Compiler` can be reused without issue. In the below example, we synthesize three single-qubit unitaries one after another. The first attempt doesn't cache the `Compiler` object, while the second does. The compile times are printed:

In [26]:
from timeit import default_timer as timer

from bqskit.compiler import Compiler
from bqskit.qis import UnitaryMatrix

single_qubit_unitaries = [UnitaryMatrix.random(1) for i in range(3)]

# Synthesize the 3 unitaries
start = timer()
for utry in single_qubit_unitaries:
    compile(utry)
end = timer()
print(f'Synthesized 3 unitaries in {end - start} seconds.')

# Now synthesize with a cached Compiler
start = timer()
compiler = Compiler()
for utry in single_qubit_unitaries:
    compile(utry, compiler=compiler)
compiler.close()
end = timer()
print(f'After caching a Compiler, synthesized 3 unitaries in {end - start} seconds.')

Synthesized 3 unitaries in 3.369565630098805 seconds.
After caching a Compiler, synthesized 3 unitaries in 1.4991761350538582 seconds.


In more recent versions of BQSKit, this can be streamlined even more by directly passing the list into the `compile` function. This has the added benefit of parallelizing the compilation tasks.

In [27]:
start = timer()
out_unitaries = compile(single_qubit_unitaries)
end = timer()
print(f'Compiling all single-qubit unitaries at once took {end - start} seconds.')

Compiling all single-qubit unitaries at once took 1.1658039430622011 seconds.


## References

- [1] https://www.quantinuum.com/products/h1
- [2] https://quantumai.google/cirq/google/devices
- [3] https://www.ibm.com/quantum/systems
- [4] https://qcs.rigetti.com/qpus
- [5] Tirthak Patel, Ed Younis, Costin Iancu, Wibe de Jong, and Devesh Tiwari. 2022. QUEST: systematically approximating Quantum circuits for higher output fidelity. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '22). Association for Computing Machinery, New York, NY, USA, 514–528. https://doi.org/10.1145/3503222.3507739